Albert Vilella | 2 Jan 09:57

Downloading from dbEST by taxon range

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that is a bit in between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse it around with the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.
Jason Stajich | 2 Jan 17:35

Re: Downloading from dbEST by taxon range

Did you try Bio::DB::Query::GenBank?
You'd want to use -db => 'nucest' and then just put in an Entrez
query as per the example in the module documentation.  You can include
dates in the query, so a script that runs periodically can update your
locally retrieved data.
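
A minimal sketch of that suggestion, assuming BioPerl is installed and NCBI Entrez is reachable. The query term, the date range, and the variable names are illustrative only and are not taken from the thread; the calls follow the Bio::DB::Query::GenBank synopsis.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;

# Illustrative Entrez query (assumption): a term that works on the dbEST
# web page should be usable here as well.
my $query = Bio::DB::Query::GenBank->new(
    -db      => 'nucest',
    -query   => 'clupeocephala[Organism]',
    -mindate => '2009',    # date limits let a periodic job fetch only new entries
    -maxdate => '2010',
);

print "Matches: ", $query->count, "\n";

# ids() returns the identifiers matched by the query.
my @gi_ids = $query->ids;
print "$_\n" for @gi_ids;

# The same query object can be handed to Bio::DB::GenBank to stream
# the sequence records themselves.
my $gb     = Bio::DB::GenBank->new;
my $stream = $gb->get_Stream_by_query($query);
while ( my $seq = $stream->next_seq ) {
    print $seq->display_id, "\t", $seq->length, "\n";
}
```

For a large clade it is probably worth checking the count first and narrowing the date range before streaming any records.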

-jason
On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:

> Hi all and happy 2010 for those that follow the Gregorian calendar,
>
> A question that is a bit in between bioperl and NCBI. I would like  
> to use
> bioperl to download sequences from dbEST. For that, my idea is to use
> Bio::DB::GenBank and get the sequences by gi id.
>
> Now, I want my script to download sequences for a given NCBI  
> taxonomy clade.
>
> For example, if I want to download all fish (clupeocephala)  
> sequences in dbEST,
> I can browse it around with the dbEST webpage using  
> "clupeocephala[taxonomy]",
> so I am thinking there should be a way to do it programmatically.
>
> How can I query NCBI dbEST through bioperl to give me the list of GI  
> ids I am
> looking for given a taxon id?
>
> Thanks in advance,

Albert Vilella | 3 Jan 10:08

Re: Downloading from dbEST by taxon range

Thanks Jason!
For the sake of completeness, here is the script I needed:

---------------------
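#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Taxonomy;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Bio::SeqIO;
use Getopt::Long;

# Defaults; each can be overridden on the command line.
my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s' => \$db_type,
           'outdir:s' => \$outdir);

# Without a taxon name the Entrez query below would be malformed.
die "usage: $0 -t TAXON_NAME [--keyword_type EST] [--db_type nucest] [--outdir DIR]\n"
  unless defined $taxon_name && length $taxon_name;

# Entrez query, e.g. "clupeocephala[Organism] AND EST[Keyword]".
my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $db = Bio::DB::Query::GenBank->new
  (-db => $db_type,
   -query => $query_string,
   -mindate => '2007',
   -maxdate => '2010');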
[...]


Re: How to read in the whole fasta file in the memory?

> Message: 1
> Date: Thu, 31 Dec 2009 11:26:45 +1800
> From: Peng Yu <pengyu.ut <at> gmail.com>
> Subject: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: bioperl-l <at> lists.open-bio.org
> Message-ID:
> 	<366c6f340912300926k5af5cc88nc3c3babda541fd1 <at> mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the records at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 30 Dec 2009 13:04:53 -0500
> From: Sean Davis <sdavis2 <at> mail.nih.gov>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut <at> gmail.com>
> Cc: "bioperl-l <at> lists.open-bio.org" <bioperl-l <at> lists.open-bio.org>

Jason Stajich | 4 Jan 17:03

Re: How to read in the whole fasta file in the memory?

We typically think of SeqIO as parsing a stream of data rather than
relying on the input being a file, which I think is what these methods
would imply. That sounds a lot like a database: does Bio::DB::Fasta
not provide some of the functionality you need from these methods?  I
realize there isn't a by_order(), but get_by_id() is implemented to
allow random access.
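
A minimal sketch of random access with Bio::DB::Fasta, assuming the lookup referred to above corresponds to the module's get_Seq_by_id() method. The command-line arguments and variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Fasta;

# Hypothetical arguments: a FASTA file and the ID of one record in it.
my ( $fasta_file, $wanted_id ) = @ARGV;
die "usage: $0 FASTA_FILE SEQ_ID\n" unless $fasta_file && $wanted_id;

# Builds an index for the FASTA file on first use and reuses it afterwards.
my $db = Bio::DB::Fasta->new($fasta_file);

# Random access by ID, without reading the whole file into memory.
my $seq = $db->get_Seq_by_id($wanted_id);
die "No sequence with ID '$wanted_id' in $fasta_file\n" unless $seq;
print ">", $seq->display_id, "\n", $seq->seq, "\n";

# All IDs known to the index; the order need not match the file order.
my @ids = $db->get_all_primary_ids;
print scalar(@ids), " sequences indexed\n";
```

Access by position in the file is not covered here; this only shows the lookup by ID.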

-jason

>
> Hi,
>
> I wrote and currently use a module I named Bio::SeqIO::multifasta,  
> which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
>
> It would need review, validation etc. Do I submit it to Bugzilla ?
>
> 	-- jmf
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l <at> lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich <at> gmail.com
jason <at> bioperl.org
http://fungalgenomes.org/

Albert Vilella | 4 Jan 21:00

indexed fastq files

Hi all,

What is the best way to index fastq files so that, once the reads are
clustered, I can provide a list of seq_ids and get them back in fastq
format from the indexed db?

Cheers,

Albert.
Chris Fields | 4 Jan 22:59

Re: indexed fastq files

Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored
FASTQ parsing, so let us know if it doesn't work.
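
A minimal sketch of that approach, untested against the refactored FASTQ parser mentioned above. The index file name, the argument order, and the variable names are hypothetical; the calls follow the Bio::Index::Fastq synopsis.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::Index::Fastq;
use Bio::SeqIO;

# Hypothetical arguments: index file to create, FASTQ file, then the IDs to pull.
my ( $index_file, $fastq_file, @seq_ids ) = @ARGV;
die "usage: $0 INDEX_FILE FASTQ_FILE SEQ_ID [SEQ_ID ...]\n"
    unless $index_file && $fastq_file && @seq_ids;

# Build (or update) the index; -write_flag is needed to write to it.
my $index = Bio::Index::Fastq->new(
    -filename   => $index_file,
    -write_flag => 1,
);
$index->make_index($fastq_file);

# Write the requested records back out in FASTQ format on STDOUT.
my $out = Bio::SeqIO->new( -format => 'fastq', -fh => \*STDOUT );
for my $id (@seq_ids) {
    my $seq = $index->fetch($id);
    if ($seq) {
        $out->write_seq($seq);
    }
    else {
        warn "ID '$id' not found in index\n";
    }
}
```

In practice the index would be built once and then reopened without -write_flag for later lookups.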

chris

On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:

> Hi all,
> 
> What is the best way to index fastq files, so that once clustered, I
> can provide a list of seq_ids and get
> them back in fastq format from the indexed db?
> 
> Cheers,
> 
> Albert.
Chris Fields | 5 Jan 04:54

Re: How to read in the whole fasta file in the memory?

Jean-Marc,

You can do that, yes.  Just curious, but have you looked at the various flat file
indexing modules for FASTA?  Bio::DB::Fasta and Bio::Index::Fasta are commonly used
and allow lookups by primary ID (and I think in some cases secondary IDs).

chris

On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote:

> ...
> 
> Hi,
> 
> I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of
> Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
> 
> It would need review, validation etc. Do I submit it to Bugzilla ?
> 
> 	-- jmf
Frank Schwach | 6 Jan 23:16

Bio::DB::Sam strange behaviour for read pairs

I'm trying to extract paired reads from a BAM file that span a given
region. I would then like to get the two read ends of the sequenced
clone that spans the region.
I use Bio::DB::Sam->get_features_by_location for this. It does give
me the correct read pairs as region matches, but it doesn't give me
both mates of the pair in all cases.

Here is the script:

#!/usr/bin/perl
use strict;
use warnings;

use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);

my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
  print "  pair: id: ".$pair->id.", start:".$pair->start.', end:'.$pair->end."\n";
  my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
  print "    first_mate: start:".$first_mate->start.', 
[...]

Hilmar Lapp | 7 Jan 17:55

Re: Data missing into Annotation object using Bio::SeqIO (Genbank)

I don't know to what extent this was followed up on further and I  
guess it's too long ago to be of much help, but if it hasn't been  
mentioned before I wanted to point out  
Bio::SeqFeature::AnnotationAdaptor which integrates tag/value  
annotation and Bio::Annotation annotation into one  
AnnotationCollection, so it doesn't matter whether something is  
attached as a tag or as an annotation object.
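
A minimal sketch of using the adaptor, assuming a GenBank flat file as input. The file argument and variable names are hypothetical; the constructor call follows the Bio::SeqFeature::AnnotationAdaptor synopsis.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SeqFeature::AnnotationAdaptor;

# Hypothetical input: a GenBank flat file given on the command line.
my $file = shift or die "usage: $0 GENBANK_FILE\n";
my $in = Bio::SeqIO->new( -file => $file, -format => 'genbank' );

while ( my $seq = $in->next_seq ) {
    for my $feat ( $seq->get_SeqFeatures ) {
        print "primary tag: ", $feat->primary_tag, "\n";

        # Wrap the feature so that tag/value pairs and annotation objects
        # are both reached through one AnnotationCollection interface.
        my $anncoll =
            Bio::SeqFeature::AnnotationAdaptor->new( -feature => $feat );

        for my $key ( $anncoll->get_all_annotation_keys ) {
            for my $ann ( $anncoll->get_Annotations($key) ) {
                print "  $key: ", $ann->as_text, "\n";
            }
        }
    }
}
```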

	-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags  
> as Bio::Annotation.  The problem had been the way this was  
> implemented was considered unsatisfactory for various reasons, so we  
> reverted back to using simple tag-value pairs as the default.  You  
> can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>    print "primary tag: ", $feat_object->primary_tag, "\n";
>    for my $tag ($feat_object->get_all_tags) {
>        print "  tag: ", $tag, "\n";
>        for my $value ($feat_object->get_tag_values($tag)) {
>            print "    value: ", $value, "\n";
>        }
>    }
> }
>

