Andrew Dalke | 4 May 09:31 2004

[BioPython] 1 day to BOSC abstracts deadline!

Working on a cool project?  Want to talk about it at BOSC?  Only 1 day
left before the abstract submission deadline.

That's for the full (10 to 30 minute) talks.  If you only want to talk
for 5 minutes or give a demo, there's still time left, but please do
let us know so we can schedule things.

                                dalke <at>
BioPython mailing list  -  BioPython <at>

Brad Chapman | 4 May 18:44 2004

Re: [BioPython]

Hi Chunlei;

>             I just tried "", but it failed for both fasta 
> file and genbank file I tested.
> >>> rec_h=RecordFile.RecordFile(open(r"gb_test.txt" ),'LOCUS','\\')
> or
> >>> rec_h=RecordFile.RecordFile(open(r"gb_test.txt" ),'>','')
> both returned the same error:

Thanks for the report -- it does look like RecordFile is really code
that is not completely finished. I think the best course of action
is to deprecate this module, which is not really used inside of
Biopython. Mostly it might be useful for building up parsers, but we
are currently encouraging the use of Martel for this.

> Actually, I wrote a simple script before using Bio.File's UndoHandle for 
> the same purpose. It looks much simpler, maybe not as powerful as 
>, but it does work for me.  I am posting it here and hope it is 
> worth sharing with you.

Thanks for the code. In many ways, I think both RecordFile and this
kind of duplicate the functionality that already exists in
Biopython. You can get a text record iterator of Fasta and GenBank
files using:

from Bio import Fasta
handle = open("your_file.fasta")
iterator = Fasta.Iterator(handle)
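
For readers without the Bio.Fasta module at hand, the idea of a text
record iterator can be sketched in plain Python. This is only an
illustration of the concept, not Biopython's implementation;
iter_fasta_text is a hypothetical name, and the sketch assumes every
record starts with a '>' title line.

```python
import io

def iter_fasta_text(handle):
    """Yield each FASTA record in `handle` as one block of text."""
    lines = []
    for line in handle:
        # A new '>' title line closes the record collected so far.
        if line.startswith(">") and lines:
            yield "".join(lines)
            lines = []
        # Skip blank lines that appear before the first record.
        if line.strip() or lines:
            lines.append(line)
    if lines:
        yield "".join(lines)

# io.StringIO stands in for an open file handle.
example = io.StringIO(">seq1 first\nACGT\nACGT\n>seq2 second\nTTTT\n")
records = list(iter_fasta_text(example))
print(len(records))
```

Each yielded item is the raw text of one record, title line included,
so it can be handed to whatever parser you prefer.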

Brad Chapman | 4 May 18:55 2004

Re: [BioPython] Solved BioPython & Zope problem and a Bug?

Hi Cristian;

> I could solve my problem with BioPython in Zope.

Thanks for your work on this. I suspect there are not many people
out there (myself included) working with Zope and Biopython, so I
appreciate your work treading into unknown territory.

> The main problem was
> Zope can't import some functions defined in some modules. These modules
> have other modules with relative imports and Zope can't import them. 

Okay, I guess the problem is that both Zope and Biopython are
probably playing around with import hooks. I don't know enough about
all of what Zope does to fix all of the problems, but I guess the
best way is trial and error.

> have the following relative import:
> import _support
> I change it to:
> from Bio.config import _support


> I do the same with the ReseekFile import.

Where specifically are these problems and I can add the relevant

Sebastian Bassi | 5 May 04:02 2004

[BioPython] New version of Tm

I sent this to Brad on 4-18, but it is still not in CVS, so I am
sending it here; maybe the original mail fell into a spam filter :)
Here is a code update (version 1.4). It includes some corrections that
Olivier Friard pointed out in my code.
This should go here: [biopython]/biopython/Bio/SeqUtils/
I am sending it as an attachment.


Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
//=\ Pro secretario ASALUP - - PGP key available //=\
\=// E-mail: sbassi <at> - ICQ UIN: 3356556 -     \=//


Attachment ( application/zip, 2477 bytes
Brad Chapman | 5 May 11:39 2004

Re: [BioPython] PSIBlastParser behavior

Hi John;
Sorry for the delay in getting back to you about this.

> I am observing an unexpected behavior using the PSIBlastParser.  I am
> doing a simple PSI-Blast run:
> SyntaxError: Line does not start with '  Database':
> Results from round 1

Hmmm, I cannot seem to reproduce this. I tried both Blast 2.2.6 and
the most recent 2.2.8 version and both seem to parse fine, using the
following code:

from Bio import db

input_file = "channel.fasta"
fasta_fetch = db["fasta"]
fasta_info = fasta_fetch["CNG4_BOVIN"]
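# Write the fetched record to disk so blastpgp has a query to read.
# Assumption: the lookup above returns a file-like handle; if it
# returns text instead, write fasta_info directly.
input_handle = open(input_file, "w")
input_handle.write(fasta_info.read())
input_handle.close()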

from Bio.Blast import NCBIStandalone

result_handle, error_handle = NCBIStandalone.blastpgp(
        "/usr/local/bin/blastpgp", "swissprot", input_file,
        expectation = 1e-5, npasses = 2)

parser = NCBIStandalone.PSIBlastParser()
rec = parser.parse(result_handle)

Brad Chapman | 5 May 13:58 2004

Re: [BioPython] Looking for functions

Hi Myriam;

> >- convert a swissprot file to a fasta file,
> >I try this one :
> >
> >from Bio.SeqIO import FASTA
> >from Bio.SwissProt import SProt
> >from sys import *
> >
> >def convert_sp_fasta(infile,outfile):
> >   """
> >   convert a SwissProt file into a Fasta formatted file
> >   """
> >   in_h = open(infile)
> >   sp = SProt.Iterator(in_h, SProt.SequenceParser())
> >   out_h = FASTA.FastaWriter(outfile)
> >   sequence =
> >   out_h.write(sequence)
> >   in_h.close()
> >   out_h.close()

The code is fine, as far as the use of Biopython goes. The only
problem is that you are supposed to pass an open file handle to the
FastaWriter, hence the error:

AttributeError: 'str' object has no attribute 'write'

when the code tries to write to the name of the file (outfile)
instead of an open handle. You can fix this by changing:

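
A minimal sketch of the handle-versus-filename point above, runnable
without Biopython. It is not the original fix from this message:
TinyFastaWriter is a hypothetical stand-in for any writer that calls
write() on whatever it is given, and the title and sequence are
placeholders.

```python
import io

class TinyFastaWriter:
    """Hypothetical stand-in for a writer that expects an open handle."""

    def __init__(self, handle):
        self.handle = handle

    def write(self, title, sequence):
        # Fails with AttributeError if `handle` is a file name (a str).
        self.handle.write(">%s\n%s\n" % (title, sequence))

# Passing a file name reproduces the kind of error quoted above.
try:
    TinyFastaWriter("out.fasta").write("example_id", "MKV")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Passing an open handle works; io.StringIO stands in for
# open(outfile, "w").
out_handle = io.StringIO()
TinyFastaWriter(out_handle).write("example_id", "MKV")
fasta_text = out_handle.getvalue()
print(error_message)
print(fasta_text)
```

The same reasoning applies to the quoted function: open the output
file first and hand the resulting handle to the writer.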

Brad Chapman | 5 May 19:54 2004

Re: [BioPython] Robustness of parsing

Hi Stephan;

> I have a question about the robustness of parsing in my case the UniGene 
> at NCBI. 
> I have been assigned to write a parser to put the UniGene flat files 
> into an existing database structure, before starting writing code I thought
> I'd better search the web to find some existing solutions. This is when I 
> opened your site and found a parser for the UniGene.

The Biopython parser is designed to parse the UniGene cluster HTML 
pages which list the EST sequences making up a cluster. Honestly,
I'm not sure if it is well maintained currently and will work with
the current UniGene pages.

I'm not sure exactly what kind of UniGene information you want to 
be parsing. If by flatfiles you are talking about the downloads
available from:

then you can parse this with the standard Fasta parser in Biopython,
but would then need to build up some code to parse the UniGene
specific information out of the Fasta headers.
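
As a rough sketch of that second step, the function below splits a
Fasta title into free text and /key=value fields. The header layout
is an assumption made for illustration, the example title is made up
rather than copied from a real UniGene file, and split_header_fields
is a hypothetical helper, so check it against the files you actually
download.

```python
def split_header_fields(title):
    """Split a FASTA title into free text and /key=value fields."""
    text_parts = []
    fields = {}
    for token in title.split():
        # Assumed convention: annotation tokens look like /key=value.
        if token.startswith("/") and "=" in token:
            key, value = token[1:].split("=", 1)
            fields[key] = value
        else:
            text_parts.append(token)
    return " ".join(text_parts), fields

# Made-up title in the assumed layout.
title = "gnl|UG|Xx#S0000 example description /gb=AB000000 /ug=Xx.1 /len=120"
description, fields = split_header_fields(title)
print(description)
print(fields)
```

Because unknown keys simply end up in the dictionary, a new field in
the headers would not break this kind of parsing.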

> At my work we already have a parser for these flat files written in C, 
> the only problem with this parser is that it will not run anymore 
> if the structure of the UniGene changes. For instance if a new field is 
> added or if relations change from a 1-to-1 to 1-to-many.
> My question about biopython; has it the same problems? If that is the case; 
> in what timespan are updates available?

Brad Chapman | 5 May 19:57 2004

Re: [BioPython] GenBank parser

Hi Leighton;

> I've noticed an oddity in the GenBank FeatureParser (CVS installation
> 19/4).  While parsing the Salmonella typhi file NC_003198.gbk, my way of
> dealing with 'gene' tags fell over.  This turned out to be because the
> GenBank file contains entries with valueless tags such as /partial and
> /pseudo.  The current parser concatenates these tags with the following
> tag.

Ah, good catch. Yes, I was dealing with these incorrectly in the
parsing. The problem, briefly, was that Martel generates two
feature_qualifier_name XML tags in a row (pseudo and gene) without
an intervening XML tag. In these cases, the parsing framework
assumes that the two tags are the same set of information, but split
up over multiple tags. Since XML can split long reams of information
over multiple tags, this is a safe assumption in most places but
falls apart here.

A fix for this was just checked into CVS, in which the
feature_qualifier_name tags are handled correctly.
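
To illustrate the intended behaviour, here is a small plain-Python
sketch that keeps valueless qualifiers such as /pseudo as their own
entries. It is not the Martel-based fix that went into CVS;
parse_qualifiers is a hypothetical helper, the gene name is made up,
and the sketch assumes one qualifier per line with no values wrapped
across lines.

```python
def parse_qualifiers(lines):
    """Return (name, value) pairs; valueless qualifiers get value None."""
    qualifiers = []
    for line in lines:
        text = line.strip()
        if not text.startswith("/"):
            continue
        name, sep, value = text[1:].partition("=")
        # No '=' means a flag such as /pseudo: keep it as its own entry
        # instead of merging its name into the next qualifier.
        qualifiers.append((name, value.strip('"') if sep else None))
    return qualifiers

feature_lines = ['/pseudo', '/gene="abcD"', '/partial']
pairs = parse_qualifiers(feature_lines)
print(pairs)
```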

Thanks for the bug report!

Brad Chapman | 5 May 20:08 2004

Re: [BioPython] FormatIO + Fasta parser + BioDB.

Hi Christian;

> I'm writing a procedure to store files in a BioDB but I have the
> following error:
> """
> ...
>   File "/usr/lib/python2.2/site-packages/BioSQL/", line 209, in
> _load_bioentry_table
>     if'.') >= 0: # try to get a version from the id
> AttributeError: 'NoneType' object has no attribute 'find'
> """
> I feel that is because I don't define the title2ids function for the
> Fasta parser. If I'm right, how can I tell to the FormatIO module to use
> a title2ids function?

Yes, you have figured out the problem exactly. The solution is actually
to not use the FormatIO module. That's more appropriate for
automated format conversions and you will probably need a finer
scale of work here to specifically parse out ids and descriptions
from the Fasta title headers.

To do this, use the standard Fasta.SequenceParser and Fasta.Iterator
classes, along with a title2ids function. The adjusted code which
should work is:

from Bio import Fasta

def your_title_to_ids(title):
        # write this for your specific FASTA titles
        # to return name, id and description
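
One possible shape for such a function, as a standalone sketch. The
title layout "name|id description" is an assumption made for the
example, title_to_ids is a hypothetical name, and the return order
follows the comment above (name, id, description); adapt both to your
own titles and to what the parser expects.

```python
def title_to_ids(title):
    """Split a FASTA title into (name, id, description)."""
    first_word, _, description = title.partition(" ")
    # Assumed layout: "name|id description"; fall back to using the
    # first word for both when there is no '|' separator.
    name, sep, seq_id = first_word.partition("|")
    if not sep:
        seq_id = name
    return name, seq_id, description

ids = title_to_ids("geneA|ID001.1 hypothetical protein")
print(ids)
```

The point of the fallback is that the id is always a string, which
avoids the 'NoneType' error quoted in the traceback.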

Brad Chapman | 5 May 20:56 2004

Re: [BioPython] HSPs in Blast parser

Hi Jawad;
Sorry for the delay in getting back to you.

> I am stuck on parsing a BlastN output and would appreciate some help. 
> I am working on multiple HSPs for a single hit . For example if there 
> are two hsps found for one hit, I need to find where query and subject 
> ends for one hsp and then compare it with the query and subject start 
> for the next hsp
> I have noticed that in the blast parser one can iterate through each 
> hsp for every single hit, but am not too sure how to treat two hsps of 
> a single hit as related and iterate through the two hsps of a single hit 
> in order to find the query (and subject) end of one and query (and subject) 
> start of the other.

You just need to iterate through each alignment and then through the
HSPs in each alignment. Then you can collect up the start and end
coordinates of each HSP and associate them with the overall
alignment. So, step by step:

1. Iterate through each alignment.
2. Create lists of HSP information for queries and subjects
  2a. Iterate through each HSP
  2b. Get the start coordinates from the hsp.query_start and
      hsp.sbjct_start attributes of the HSPs
  2c. Calculate the HSP ends using the query and subject
      information, removing gaps added by BLAST
  2d. Add each start, end to the lists
3. You have the HSP lists, associated with the alignment title.
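
The steps above can be sketched without a BLAST report at hand. The
Hsp and Alignment tuples below are hypothetical stand-ins for the
parsed objects, modelling only the attributes used here, and the
sketch assumes coordinates that increase along both query and subject
(minus-strand hits would need the subject arithmetic reversed).

```python
from collections import namedtuple

# Hypothetical stand-ins for parsed BLAST objects; only the attributes
# used below are modelled.
Hsp = namedtuple("Hsp", "query_start sbjct_start query sbjct")
Alignment = namedtuple("Alignment", "title hsps")

def ungapped_end(start, aligned_seq):
    """End coordinate of an aligned segment, ignoring '-' gap characters."""
    return start + len(aligned_seq.replace("-", "")) - 1

def hsp_coordinates(alignments):
    """Map each alignment title to query and subject (start, end) lists."""
    result = {}
    for alignment in alignments:                               # step 1
        query_spans, sbjct_spans = [], []                      # step 2
        for hsp in alignment.hsps:                             # step 2a
            q_end = ungapped_end(hsp.query_start, hsp.query)   # 2b, 2c
            s_end = ungapped_end(hsp.sbjct_start, hsp.sbjct)
            query_spans.append((hsp.query_start, q_end))       # step 2d
            sbjct_spans.append((hsp.sbjct_start, s_end))
        result[alignment.title] = (query_spans, sbjct_spans)   # step 3
    return result

# Made-up alignment with two HSPs.
hit = Alignment("example hit", [
    Hsp(1, 101, "ACGT-ACGT", "ACGTTACGT"),
    Hsp(20, 130, "GGCC", "GG-C"),
])
spans = hsp_coordinates([hit])
print(spans)
```

With the spans collected per hit, comparing the end of one HSP with
the start of the next is a matter of walking each list in order.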
