1 Jul 2006 23:47
Fasta parser
Michiel de Hoon <mdehoon <at> c2b2.columbia.edu>
2006-07-01 21:47:28 GMT
Hi everybody,
The Biopython documentation shows the following approach to parsing a Fasta file:
>>> from Bio import Fasta
>>> parser = Fasta.RecordParser()
>>> file = open("ls_orchid.fasta")
>>> iterator = Fasta.Iterator(file, parser)
>>> cur_record = iterator.next()
But for large Fasta files, this is very slow compared to file.read(),
which may be due to going through Martel (I believe the same was true
for large GenBank files).
So I'm thinking about writing a simple-minded Fasta parser for better
performance with large files. What I'm wondering about:
1) Is there some advantage that I overlooked of using Martel for parsing
Fasta files?
2) Why is it necessary to create a parser first and then pass it to
Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
other than a Fasta.RecordParser?
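As a rough illustration of what I mean by "simple-minded", something along these lines might do. This is only a sketch: read_fasta is a made-up name, not an existing Biopython function, and it assumes each record starts with a ">" title line followed by plain sequence lines. It reads the file line by line without going through Martel.

```python
import io


def read_fasta(handle):
    """Yield (title, sequence) tuples from a Fasta-formatted file handle.

    Hypothetical helper for illustration; not part of Biopython.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None and line:
            # Sequence line belonging to the current record.
            chunks.append(line)
    # Emit the last record once the file is exhausted.
    if title is not None:
        yield title, "".join(chunks)


# Small in-memory example instead of a real file.
example = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

Since it is a generator, only one record is held in memory at a time, which is what matters for large files. No separate parser object is needed in this version.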
--Michiel.