Ryan Golhar | 1 Oct 2005 01:16

RE: How to read in FASTA formatted sequence without fastaheader?

True.  Ok, so I have a raw sequence instead of fasta...when I try to
read in the sequence using raw format, it only reads in the first line.

I'm thinking of modifying the raw module and making a multilineraw
module that will stop reading on a newline or EOF.

I don't want to modify the actual files because that might break all
my other scripts.  I could write a script that inserts the FASTA header
into a temporary file and then concatenates the sequence onto it, but it
just doesn't seem like a clean solution to me.
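
For illustration, here is a minimal sketch of the "multi-line raw" idea in plain Perl. It assumes that records are separated by one or more blank lines and that a record may wrap over several lines; the function name read_multiline_raw and the sample data are made up for this example and are not part of BioPerl. It returns plain strings; each one could then be wrapped in a Bio::PrimarySeq object if BioPerl is available.

```perl
use strict;
use warnings;

# Read blank-line-separated raw sequences from a filehandle.
# Returns a list of sequence strings with all whitespace removed.
# (Hypothetical helper, not a BioPerl function.)
sub read_multiline_raw {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one record per blank-line-separated block
    my @seqs;
    while ( my $block = <$fh> ) {
        $block =~ s/\s+//g;    # join wrapped lines into one string
        push @seqs, $block if length $block;
    }
    return @seqs;
}

# Example input held in memory; a real script would open a file instead.
my $data = "ACGTAC\nGGTTAA\n\n\nTTGACA\nCC\n";
open( my $fh, '<', \$data ) or die "cannot open in-memory handle: $!";
my @seqs = read_multiline_raw($fh);
close $fh;

print "$_\n" for @seqs;
```

Setting $/ to the empty string is what makes Perl treat runs of blank lines as a single record separator, so the original files do not have to be touched.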

-----Original Message-----
From: bioperl-l-bounces <at> portal.open-bio.org
[mailto:bioperl-l-bounces <at> portal.open-bio.org] On Behalf Of Richard
Sucgang, PhD
Sent: Friday, September 30, 2005 5:59 PM
To: golharam <at> umdnj.edu
Cc: 'Bioperl List'
Subject: Re: [Bioperl-l] How to read in FASTA formatted sequence without
fastaheader?

Well, maybe I am mistaken, but isn't the header line the item that  
makes a FASTA file a FASTA file?
As in, now you have a raw sequence.

On Sep 30, 2005, at 3:43 PM, Ryan Golhar wrote:

> I'm looking for the easier way to read in a fasta file that doesn't 
> contain the fasta header, ie the ">..." line.
>

Jason Stajich | 1 Oct 2005 05:12

Re: How to read in FASTA formatted sequence without fastaheader?


On Sep 30, 2005, at 7:16 PM, Ryan Golhar wrote:

> True.  Ok, so I have a raw sequence instead of fasta...when I try to
> read in the sequence using raw format, it only reads in the first  
> line.
>
> I'm thinking of modifying the raw module and making a multilineraw
> module that will stop reading on a newline or EOF.
>
Well, technically it will have to detect the presence of multiple
consecutive newlines, as it currently separates on single newlines,
hence your problem.

Seems like it is easier to use a standard file format in the future  
(and dare I say *standard* for anyone who might come along after you  
on a project), but you could probably modify raw.pm locally to  
separate on multiline newline.

Thinking about this, I'm not sure how much help SeqIO is.  You just
need a function that will give you back Bio::PrimarySeq objects, which
isn't much more complicated than the code below.
If you just add this to your Perl script, you will be able to split
sequences on double newlines and use the 'raw' format.

use strict;
use Bio::SeqIO;
use Bio::SeqIO::raw;

sub Bio::SeqIO::raw::next_seq{
[...]

Koen van der Drift | 1 Oct 2005 04:57

Re: RC3 candidates


On Sep 29, 2005, at 10:19 AM, bioperl-l-request <at> portal.open-bio.org  
wrote:

> To avoid any confusion these are both tagged RC3 and should be used
> together.
>
> (BTW Hilmar are you interested in bioperl-db being also tagged as
> 1.5.X and released?)
>
>
> RC3 candidates of both bioperl core and bioperl-run are up.
>
> http://bioperl.org/DIST/bioperl-1.5.1-rc3.tar.gz
> http://bioperl.org/DIST/bioperl-run-1.5.1-rc3.tar.gz
>
> OR zips for those inclined
> http://bioperl.org/DIST/bioperl-1.5.1-rc3.zip
> http://bioperl.org/DIST/bioperl-run-1.5.1-rc3.zip
>
> Test, fix, repeat... =)
>

Both build and install fine on Mac OS X, 10.4.2, using fink.

- Koen.

Stefan Kirov | 1 Oct 2005 18:50

Off topic: 'Intelligent Design' in schools in US

This is off topic, but some people may want to sign...
http://shovelbums.org/index.php?option=com_mospetition&Itemid=506

Hilmar Lapp | 1 Oct 2005 21:48

Re: Re: entrezgene binary ASN

I've tried to listen in on the exchange but I'm not sure I understand 
what the issue is.

I.e., does the parser need to seek in the stream? If yes, then piping 
won't do any good if it works at all. If no, then the parser should be 
perfectly fine with the filename being output from a pipe, and possibly 
accept a file handle in substitution too. In that case, the caller can 
pipe the actual input through any commands he/she wishes by simply 
passing the piped command(s) (e.g. as in "gzip -d -c 
file.asn.gz|gene2xml|").
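
As a rough sketch of that calling pattern in plain Perl: the caller starts the command and hands the resulting filehandle to a reader that only moves forward. The reader read_records and the stand-in child command are invented for this example; in the case discussed here the command would be something like the gzip/gene2xml pipeline above, which is assumed to be installed and is not run by this sketch.

```perl
use strict;
use warnings;

# Stand-in for a stream parser: it only reads forward and never seeks,
# so it works equally well on a file or on the output of a pipe.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @records, $line;
    }
    return @records;
}

# Stand-in command (the running perl itself printing two lines).
# The caller, not the parser, decides what goes into the pipeline.
my @cmd = ( $^X, '-e', 'print "gene1\ngene2\n"' );

open( my $pipe, '-|', @cmd ) or die "cannot start @cmd: $!";
my @records = read_records($pipe);
close $pipe or die "command failed: $!";

print scalar(@records), " records read from the pipe\n";
```

Because the paths of the external tools live in the caller's command, the reader itself carries no platform-specific assumptions.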

The parser doing this auto-magically isn't necessary and doesn't save a 
caller that much. Instead, it exposes the parser to liabilities like 
path of gzip, path of gene2xml and similar stuff which may not be 
identical on all platforms.

What am I missing?

	-hilmar

On Sep 30, 2005, at 12:55 PM, Michael Seewald wrote:

> On 9/30/05, Mingyi Liu <mingyi.liu <at> gpc-biotech.com> wrote:
>>
>> I didn't say indexing would break, but the performance of retrieval
>> would be horrible. That's why in most situations there's no need to 
>> use
>> pipe - after all, any one who needs to use index & ID-based retrieval
>> would convert the binary ASN to text file anyway (using a script,
>> hopefully).

Stefan Kirov | 1 Oct 2005 22:01

Re: Re: entrezgene binary ASN

Hilmar,
As of now the parser does not seek through the stream, but hopefully it 
will as soon as I can sit down and do it (by the way it is weird but 
gene2xml will not parse the gunzipped file, so you should not use gzip -d).
I don't think you are missing anything as far as I can tell.
Stefan

Hilmar Lapp wrote:

> I've tried to listen in on the exchange but I'm not sure I understand 
> what the issue is.
>
> I.e., does the parser need to seek in the stream? If yes, then piping 
> won't do any good if it works at all. If no, then the parser should be 
> perfectly fine with the filename being output from a pipe, and 
> possibly accept a file handle in substitution too. In that case, the 
> caller can pipe the actual input through any commands he/she wishes by 
> simply passing the piped command(s) (e.g. as in "gzip -d -c 
> file.asn.gz|gene2xml|").
>
> The parser doing this auto-magically isn't necessary and doesn't save 
> a caller that much. Instead, it exposes the parser to liabilities like 
> path of gzip, path of gene2xml and similar stuff which may not be 
> identical on all platforms.
>
> What am I missing?
>
>     -hilmar
>
> On Sep 30, 2005, at 12:55 PM, Michael Seewald wrote:

Hilmar Lapp | 2 Oct 2005 03:57

Re: Re: entrezgene binary ASN


On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:

> Hilmar,
> As of now the parser does not seek through the stream, but hopefully 
> it will as soon as I can sit down and do it

What advantage would that have?

Note that not allowing streams first off makes entrezgene different 
from all other formats, and second, together with the gene2xml 
conversion requirement would require you to call it in a different 
manner than all other SeqIO parsers (i.e., just passing a string, w/ or 
w/o trailing pipe, wouldn't suffice; you'd have to do a preprocessing 
step). If seeking in the file can outweigh that with some significant 
advantages, then great, but even then it should be optional if it can 
be within reason.

	-hilmar

>  (by the way it is weird but gene2xml will not parse the gunzipped 
> file, so you should not use gzip -d).
> I don't think you are missing anything as far as I can tell.
> Stefan
>
> Hilmar Lapp wrote:
>
>> I've tried to listen in on the exchange but I'm not sure I understand 
>> what the issue is.
>>

Michael Seewald | 2 Oct 2005 17:27

Re: Re: entrezgene binary ASN

On 10/2/05, Hilmar Lapp <hlapp <at> gmx.net> wrote:
>
>
> On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:
>
> > Hilmar,
> > As of now the parser does not seek through the stream, but hopefully
> > it will as soon as I can sit down and do it
>
> What advantage would that have?

You could create an index and seek by gene id. Could be handy in some cases,
although I wouldn't use it (personally).
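
To make the idea concrete, here is a toy sketch of offset-based indexing in plain Perl. The one-record-per-line layout, the ids, and the helper fetch_by_id are assumptions for illustration only; this is not the Entrez Gene format and not an existing BioPerl module. Note that it relies on seek, which works on a regular file but not on a pipe.

```perl
use strict;
use warnings;

# Toy flat file: one record per line, "id<TAB>payload".
# Layout and ids are invented for this example.
my $data = "g1\talpha\ng2\tbeta\ng3\tgamma\n";
open( my $fh, '<', \$data ) or die "cannot open in-memory handle: $!";

# Pass 1: remember the byte offset at which each record starts.
my %offset_of;
while (1) {
    my $pos  = tell $fh;
    my $line = <$fh>;
    last unless defined $line;
    my ($id) = split /\t/, $line;
    $offset_of{$id} = $pos;
}

# Retrieval: jump straight to a record instead of re-reading the stream.
# (Hypothetical helper; returns undef for an unknown id.)
sub fetch_by_id {
    my ( $handle, $index, $id ) = @_;
    return unless exists $index->{$id};
    seek( $handle, $index->{$id}, 0 ) or die "seek failed: $!";
    my $line = <$handle>;
    chomp $line;
    return $line;
}

print fetch_by_id( $fh, \%offset_of, 'g2' ), "\n";
```

The index is just a hash from id to byte offset, so it could be built once and stored separately from whatever code parses an individual record.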

> Note that not allowing streams first off makes entrezgene different
> from all other formats, and second, together with the gene2xml
> conversion requirement would require you to call it in a different
> manner than all other SeqIO parsers (i.e., just passing a string, w/ or
> w/o trailing pipe, wouldn't suffice; you'd have to do a preprocessing
> step). If seeking in the file can outweigh that with some significant
> advantages, then great, but even then it should be optional if it can
> be within reason.

The "streaming use" shouldn't be affected. BTW, there are other bioperl
modules that use indices.
Best wishes,
Michael

Hilmar Lapp | 2 Oct 2005 21:03

Re: Re: entrezgene binary ASN


On Oct 2, 2005, at 8:27 AM, Michael Seewald wrote:

> On 10/2/05, Hilmar Lapp <hlapp <at> gmx.net> wrote:
>>
>>
>> On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:
>>
>>> Hilmar,
>>> As of now the parser does not seek through the stream, but hopefully
>>> it will as soon as I can sit down and do it
>>
>> What advantage would that have?
>
>
>
> You could create an index and seek by gene id. Could be handy in some 
> cases,
> although I wouldn't use it (personally).
>
> [...]
>
> The "streaming use" shouldn't be affected. BTW, there are other bioperl
> modules that use indices.

Right, but they use a different implementation (module) for indexing, 
not the SeqIO parser (it only parses the entry).

	-hilmar


Stefan Kirov | 2 Oct 2005 21:08

Re: Re: entrezgene binary ASN

Yes, Hilmar is right: there should be a Bio::Index::entrezgene to do that. 
I agree this functionality is better kept separate from the Bio::SeqIO parser.
Stefan

Hilmar Lapp wrote:

>
> On Oct 2, 2005, at 8:27 AM, Michael Seewald wrote:
>
>> On 10/2/05, Hilmar Lapp <hlapp <at> gmx.net> wrote:
>>
>>>
>>>
>>> On Oct 1, 2005, at 1:01 PM, Stefan Kirov wrote:
>>>
>>>> Hilmar,
>>>> As of now the parser does not seek through the stream, but hopefully
>>>> it will as soon as I can sit down and do it
>>>
>>>
>>> What advantage would that have?
>>
>>
>>
>>
>> You could create an index and seek by gene id. Could be handy in some 
>> cases,
>> although I wouldn't use it (personally).
>>
>> [...]