1 Aug 2006 12:42
Re: Reading sequences: FormatIO, SeqIO, etc
Leighton Pritchard <lpritc <at> scri.sari.ac.uk>
2006-08-01 10:42:37 GMT
On Mon, 2006-07-31 at 12:08 -0400, Marc Colosimo wrote:
> On Jul 31, 2006, at 11:14 AM, Peter (BioPython Dev) wrote:
> >>> The SeqUtils/quick_FASTA_reader is interesting in that it loads the
> >>> entire file into memory in one go, and then parses it. On the other
> > >>> hand it's not perfect: I would use "\n>" as the split marker rather than
> > >>> ">", which could appear in the description of a sequence.
> >>
> > >> I agree (not that it's bitten me, yet), but I'd be inclined to go
> > >> with "%s>" % os.linesep as the split marker, just in case.
> >
> > Good point. I wonder how many people even know this function exists?
> >
>
> The only problem with this is that if someone sends you a file not
> created on your system. [...]
> This has mostly simplified down to two, Unix and Windows, unless the
> person uses a Mac GUI app, some of which use \r (CR) instead of \n
> (LF), whereas Windows uses \r\n (CRLF). I think the standard Python
> distro comes with crlf.py and lfcr.py, which can convert the line endings.
Also a good point. I had a play about with regular expression
splitting/substitution and the SeqUtils.quick_FASTA_reader method to see
if I could capture this variability in line-endings:
import re

def method_quick_FASTA_reader3(filename):
    txt = file(filename).read()
    entries = []
    split_marker = re.compile('^>', re.M)
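For anyone wanting to try the idea discussed in this thread, here is a minimal sketch in Python 3 (the message itself uses Python 2's file()). It is not the truncated method_quick_FASTA_reader3; the names normalise_newlines and split_fasta_text are hypothetical. It assumes the whole file fits in memory, converts CRLF (Windows) and lone CR (classic Mac) line endings to LF first, and then splits only on a '>' at the start of a line, so a '>' inside a description is left alone.

```python
import re

# Assumption: Python 3. '^' with re.M matches at the start of every line,
# so only a '>' that begins a line marks a new record.
RECORD_START = re.compile(r'^>', re.M)


def normalise_newlines(text):
    """Hypothetical helper: turn CRLF and lone CR line endings into LF."""
    return text.replace('\r\n', '\n').replace('\r', '\n')


def split_fasta_text(text):
    """Hypothetical helper: split FASTA text into (title, sequence) tuples."""
    text = normalise_newlines(text)
    records = []
    # Drop the first chunk: it is whatever precedes the first '>' (usually '').
    for chunk in RECORD_START.split(text)[1:]:
        lines = chunk.split('\n')
        title = lines[0].strip()
        # Join the remaining lines, stripping whitespace, to get the sequence.
        sequence = ''.join(line.strip() for line in lines[1:])
        records.append((title, sequence))
    return records


if __name__ == '__main__':
    # Mixed CRLF and CR endings, plus a '>' inside the first description.
    sample = ">seq1 length>100\r\nACGT\r\nGG\r\n>seq2\rTT\r"
    print(split_fasta_text(sample))
    # [('seq1 length>100', 'ACGTGG'), ('seq2', 'TT')]
    # A naive split on '>' wrongly breaks the first description in two:
    print(sample.split('>'))
```

Opening a file in text mode under Python 3 already applies universal-newline translation, so normalise_newlines() mainly matters when the text arrives some other way, for example as bytes you decode yourself.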