Ivan Erill | 20 Nov 18:42 2014

[Biopython] NCBI e-utils parser upgrade

Hi all,

As part of my work, I need to deal with the new WP protein records at NCBI and, specifically, with the information on their coding sequences. This information is returned by E-utils through an integrated protein report type of view:

which does not use a DTD for the XML, but rather a schema. Although there has been no formal announcement, I've been talking to NCBI people and they tell me that they will progressively be moving to schemas (which provide a more fine-grained validation specification). Specifically, all new XML exports from NCBI will use schemas. I don't believe that existing DTDs are going to be replaced by schemas for now.

My original thought was to branch an update for the current XML parser in Biopython, but it looks like supporting schemas would require a major overhaul of the existing code base, and it might make more sense to develop a parallel parser. So I first wanted to check which approach you would prefer code-wise.



Biopython mailing list  -  Biopython <at> mailman.open-bio.org
Aisling O'Driscoll | 13 Nov 22:32 2014

[Biopython] Blast query


I have a query re BLAST output. I have written a Biopython script that takes the attached betl gene sequence and runs a BLAST search. It returns results for Listeria monocytogenes WSLC1001 complete genome with gi|584465821|gb|CP007160.1.

I ran the same search from the NCBI web interface to verify the output, and it also returns this result.

However, why does it not return the correct match, Listeria monocytogenes EGD-e chromosome with gi|16802048:2172068-2173591, which appears at the top of the attached FASTA file?

Thanks in advance
Attachment (betl.fasta): application/octet-stream, 2209 bytes
Bhushan | 13 Nov 07:50 2014

[Biopython] How to apply symmetry matrix on protein chains


I want to apply the symmetry matrix to PDB entry 1HRI and output the biological assembly (with rotated coordinates). Can anyone suggest a way to do this using Biopython?

Thank you for your time. I would appreciate any kind of help.
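
For reference, the operation being asked about can be sketched in plain Python. This is only a minimal illustration, assuming the symmetry operator is a 3x3 rotation matrix plus a translation vector (the form used by REMARK 350 BIOMT records in PDB files). The function name apply_transform and the operator values are made up for the example and are not taken from 1HRI.

```python
def apply_transform(coords, rotation, translation):
    """Return new (x, y, z) tuples computed as rotation * coord + translation."""
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed


# Hypothetical operator: 90 degree rotation about the z axis,
# followed by a shift of 10 units along x.
rotation = [[0.0, -1.0, 0.0],
            [1.0,  0.0, 0.0],
            [0.0,  0.0, 1.0]]
translation = [10.0, 0.0, 0.0]

coords = [(1.0, 0.0, 0.0), (0.0, 2.0, 5.0)]
moved = apply_transform(coords, rotation, translation)
print(moved)  # [(10.0, 1.0, 0.0), (8.0, 0.0, 5.0)]
```

Bio.PDB atoms also have a transform(rot, tran) method, but it right-multiplies the coordinates by rot, so a matrix read from a BIOMT record would likely need to be transposed first. That is worth checking against the installed Biopython version.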

With Regards,
Radhouane Aniba | 13 Nov 01:08 2014

[Biopython] SeqIO __init__.py typo ?


I was just browsing the source to see how the parse function is implemented, and I noticed this at the start of the docstring:

def parse(handle, format, alphabet=None):
    r"""Turns a sequence file into an iterator returning SeqRecords.

     - handle   - handle to the file, or the filename as a string
                  (note older versions of Biopython only took a handle).
     - format   - lower case string describing the file format.
     - alphabet - optional Alphabet object, useful when the sequence type
                  cannot be automatically inferred from the file itself
                  (e.g. format="fasta" or "tab")

    Typical usage, opening a file to read in, and looping over the record(s):

There is an r before the """.

Is that a typo?
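
For context, an r prefix on a string literal is valid Python: it marks a raw string, in which backslashes are kept literally instead of being read as escape sequences. It is commonly used on docstrings that contain backslashes. Whether that is the reason in this particular file is an assumption. A minimal sketch of the difference, with made-up function names:

```python
def cooked():
    """Columns are separated by \t characters."""


def raw():
    r"""Columns are separated by \t characters."""


# Without the prefix, \t in the docstring becomes a real tab character.
cooked_has_tab = "\t" in cooked.__doc__
# With the r prefix, the backslash and the letter t are both kept.
raw_has_backslash = "\\t" in raw.__doc__
raw_has_tab = "\t" in raw.__doc__

print(cooked_has_tab, raw_has_backslash, raw_has_tab)  # True True False
```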

Radhouane Aniba
Bioinformatics Scientist
BC Cancer Agency, Vancouver, Canada
Sanjeev Sariya | 7 Nov 15:55 2014

[Biopython] Error in pdb superimposer: global name Bio not defined

Hi Team,

I am trying to align two PDB files.
When I try to create a Bio.PDB.Superimposer() object, I get a strange error.

Language specs:
Python version - 2.7
Bio.__version__ - 1.61 # Biopython version
My code looks like:

from Bio.PDB import *
def alignPDB_file(refPDB, samplePDB): # function to align pdb file

     ref_model=PDBParser(QUIET=True).get_structure("reference",refPDB)[0] # get the 0th model
     sam_model=PDBParser(QUIET=True).get_structure("sample",samplePDB)[0] # get the 0th model

     for ref_chain in ref_model:
          for ref_res in ref_chain:
               if not "CA" in ref_res:continue
               else:  ref_atoms.append(ref_res['CA'])

     for sam_chain in sam_model:
          for sam_res in sam_chain:
               if not "CA" in sam_res: continue
               else: sample_atoms.append(sam_res['CA'])

     super_imposer = Bio.PDB.Superimposer()

    super_imposer = Bio.PDB.Superimposer()
NameError: global name 'Bio' is not defined

Kindly advise.
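
A likely explanation, sketched with a standard-library package as a stand-in so it runs without Biopython: a "from package.module import ..." statement (including the star form) binds only the imported names, not the top-level package name. By analogy, after "from Bio.PDB import *" the name Superimposer is available, but Bio itself is not. The email package below is only an illustration.

```python
# Stand-in for "from Bio.PDB import Superimposer": only MIMEText is bound,
# the top-level name "email" is not.
from email.mime.text import MIMEText

try:
    email.mime.text.MIMEText("hello")  # analogous to Bio.PDB.Superimposer()
    package_name_bound = True
except NameError:
    package_name_bound = False

# Fix 1: use the name that was actually imported.
message = MIMEText("hello")

# Fix 2: import the module path itself and qualify the name through it.
import email.mime.text as mime_text
message2 = mime_text.MIMEText("hello")

print(package_name_bound)  # False
```

Applied to the code above, this suggests either calling Superimposer() directly or adding "import Bio.PDB" before using Bio.PDB.Superimposer(). Note also that ref_atoms and sample_atoms are not initialised in the snippet as posted, so they would need to be created as empty lists before the loops.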

Jim Mailman | 30 Oct 11:15 2014

[Biopython] Test feature location type

How can I test a SeqFeature location to find out whether it is an exact location or a non-exact type, and if so which type it is? Thanks, Jim
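
One possible approach, as a sketch that assumes Biopython is installed: the start and end of a location are position objects from Bio.SeqFeature, so isinstance() can distinguish ExactPosition from fuzzy classes such as BeforePosition or AfterPosition. The helper name describe_position and the example feature are made up for illustration.

```python
from Bio.SeqFeature import (
    SeqFeature,
    FeatureLocation,
    ExactPosition,
    BeforePosition,
)


def describe_position(pos):
    """Return "exact" for an ExactPosition, else the position's class name."""
    if isinstance(pos, ExactPosition):
        return "exact"
    return type(pos).__name__


# Hypothetical feature with a fuzzy start (<5) and an exact end (10).
feature = SeqFeature(
    FeatureLocation(BeforePosition(5), ExactPosition(10)), type="CDS"
)

start_kind = describe_position(feature.location.start)
end_kind = describe_position(feature.location.end)
print(start_kind, end_kind)
```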

Sanjeev Sariya | 24 Oct 20:10 2014

[Biopython] Understanding pdb biopython

Hi All,
I'm having a hard time using and understanding Biopython's PDB module.
./read_pdb_file.py 3OE6.pdb

I'm attaching the Python script, PDB file, FASTA file and output to this mail.
I have the following questions:
- When I print the sequence, I get it in broken pieces. Why?
- Also, the sequence printed doesn't match the FASTA file (attached).
- Am I making a silly mistake?

I am running script as:
python read_pdb_file.py 3OE6.pdb 

Kindly help and guide.

Attachment (3OE6.pdb): invalid/x-pdb, 808 KiB
Attachment (read_pdb_file.py): text/x-python, 1018 bytes
Ivan Erill | 24 Oct 17:34 2014

[Biopython] Faculty Position in Biological Data Science

Apologies for cross-posting

The Department of Biological Sciences at UMBC invites applications for a tenure-track Assistant Professor position in biological data science (i.e. bioinformatics, computational, theoretical or quantitative biology) specializing in computational neuroscience, metabolomics, metagenomics, data visualization or evolutionary genomics. The successful applicant will set up a computational laboratory and interact with faculty whose interests span the range from genomics and molecular genetics to evolution and behavior. Applicants must have a Ph.D. in a relevant field, post-doctoral experience in big-data computational or theoretical biology and a strong publication record, and are expected to establish a vigorous, externally funded research program, supervise doctoral-level graduate students, and teach at the undergraduate and graduate levels.

Applicants should submit a cover letter, curriculum vitae, research statement, research plan, a statement of teaching interests and philosophy and three letters of reference.  Application materials and letters of reference should be submitted to apply.interfolio.com/25756 by November 15, 2014.

UMBC is a medium-sized research university in the Baltimore-Washington D.C. area, whose combined excellence in research and outstanding educational programs have earned recognition by US News and World Report as the "#1 Up-and-Coming National University" for five years running. For information about the Department of Biological Sciences and its graduate programs, visit http://www.umbc.edu/biosci/.

The University of Maryland Baltimore County is an Equal Opportunity/Affirmative Action Employer. UMBC values gender, ethnic, and racial diversity; women, members of ethnic minority groups, and individuals with disabilities are strongly encouraged to apply. UMBC is the recipient of an NSF ADVANCE Institutional Transformation Award to increase the participation of women in academic careers.


Thomas Girke | 22 Oct 19:09 2014

[Biopython] Faculty Position in Computational Systems Biology at UC Riverside

University of California, Riverside
AP Recruit job link: https://aprecruit.ucr.edu/apply/JPF00230

We are seeking an Assistant Professor in the field of Computational
Systems Biology. Research should be in systems biology, using
computational approaches or a combination of both experimental and
computational approaches.

Assistant Professor. Appointment and salary will be competitive and
commensurate with accomplishments.

University of California, Riverside.

The successful candidate will hold an academic appointment in the
Department of Botany and Plant Sciences with the option of a secondary
cooperating faculty appointment in a quantitative department such as
statistics, computer science, engineering or related. The candidate will
also join the innovative and multidisciplinary Institute for Integrative
Genome Biology (IIGB) which connects theoretical and experimental
researchers from different departments in Life, Physical and
Mathematical Sciences, Medicine, Engineering and various campus based
Centers. The IIGB is organized around a 10,000 sq.ft. suite of
Instrumentation Facilities that serve as a centralized, shared-use
resource for faculty, staff and students, offering advanced tools in
bioinformatics, microscopy, proteomics and genomics. Its bioinformatic
component is equipped with a modern high-performance compute (HPC)
infrastructure. This position will include an appointment in the
Agricultural Experiment Station, which includes the responsibility to
conduct research and outreach relevant to the mission of the California
Agricultural Experiment Station (http://cnas.ucr.edu/about/anr/).

A Ph.D. in a field related to systems biology is required including
several years of postdoctoral research.

Qualified candidates are expected to have excellent publication and
research records in computational systems biology or related areas, such
as bioinformatics or computational genome biology. The candidate’s
research program should be heavily focused on involving modern
high-throughput data, especially from next generation sequencing
technologies. An additional requirement is a strong commitment to
teaching in computational systems biology.

Review of applications will begin November 7, 2014 and continue until
the position is filled. Interested individuals should: (1) submit a
curriculum vitae, (2) provide a statement of research, and (3) provide
three letters of reference. A teaching statement is strongly recommended.
Please submit your applications through the AP Recruit job link:

Websites: http://www.genomics.ucr.edu, http://cnas.ucr.edu/, http://www.ucr.edu

The University of California is an Equal Opportunity/Affirmative Action
Employer/Veterans Employer. In accordance with Federal law, we are
making available our Campus Security Report to all prospective

Ivan Gregoretti | 21 Oct 22:56 2014

[Biopython] Unix pipes and APIs like NcbiblastxCommandline()

Hello everybody

Can somebody suggest a way to run an API like NcbiblastxCommandline()
but directing the output to standard output?

For instance, this is the conventional execution with output directed
to a file, in this case opuntia.csv:

from Bio.Blast.Applications import NcbiblastxCommandline
blastx_cline = NcbiblastxCommandline("/mnt/shared/ncbi-blast-2.2.29+/blastx",
    query="opuntia.fasta", db="nr", evalue=0.001, outfmt=10,
    out="opuntia.csv")

Now, what I would like to know is how to run this API with something like

out="/dev/stdout" instead of out="opuntia.csv".

In other words, I seek to avoid the creation of opuntia.csv.

Optional context:

I can currently execute local blast from within Python and direct its
output to a pipe (i.e. subprocess.Popen...). I am now interested in
trying the API way as it is likely to be more robust than my
implementation and already tested by a very large number of users.

Thank you,


Ivan Gregoretti, PhD

Ivan Gregoretti | 20 Oct 21:08 2014

[Biopython] Running a specific version of NCBI BLAST from within Biopython

Hello Biopythoneers,

I have multiple versions of NCBI BLAST installed on my Linux box. I
need them all.

How do I tell Biopython which version of local BLAST I want to run each time?

Is it possible to specify the path to the desired executable?

Thank you,


Ivan Gregoretti, PhD