Anand Venkatraman | 1 May 2006 20:36

how to obtain GIs from clone_ids


Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry) 
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get back just the GI
number for that clone_id? Any suggestions?

Thanks in advance.

Anand

		
Cui, Wenwu (NIH/NCI) [F] | 1 May 2006 21:39

Re: how to obtain GIs from clone_ids

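use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Search term as posted; adjust it if Entrez returns no hits.
my $query_string = 'EST["C0005918b04"]';
my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                         -query => $query_string,
                                        );
my $count = $query->count;   # number of matching records

my @ids = $query->ids;       # GI numbers of the matches

# Print one GI per line.
for (@ids) {
  print "$_\n";
}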
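
To run this for a whole file, one clone ID at a time, the query can be wrapped in a loop. The following is only a minimal sketch: the helper names read_clone_ids and gis_for_clone and the file name clone_ids.txt are made up for illustration, and the search term simply follows the pattern in the script above, which may need adjusting for your data.

```perl
use strict;
use warnings;

# Read clone IDs, one per line, skipping blank lines and
# trimming surrounding whitespace.
sub read_clone_ids {
    my ($fh) = @_;
    my @clones;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @clones, $line if length $line;
    }
    return @clones;
}

# Query Entrez for one clone ID and return the matching GI numbers.
# Bio::DB::Query::GenBank is loaded only here, because this step
# needs BioPerl and network access.
sub gis_for_clone {
    my ($clone) = @_;
    require Bio::DB::Query::GenBank;
    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => qq{EST["$clone"]},   # same term pattern as the script above
    );
    return $query->ids;
}

# Usage (clone_ids.txt is a hypothetical input file):
#   open my $fh, '<', 'clone_ids.txt' or die "Cannot open clone_ids.txt: $!";
#   for my $clone (read_clone_ids($fh)) {
#       print join("\t", $clone, gis_for_clone($clone)), "\n";
#   }
```

Keeping the file reading separate from the network call makes it easy to add a pause between queries if the list of clone IDs is long.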

-----Original Message-----
From: Anand Venkatraman [mailto:bioperlanand <at> yahoo.com] 
Sent: Monday, May 01, 2006 2:36 PM
To: bioperl-l <at> lists.open-bio.org
Subject: [Bioperl-l] how to obtain GIs from clone_ids

Hi everybody,

I have a file containing clone_ids (from the Features annotation section of a GenBank entry)
------------------------------------------------------------
FEATURES             Location/Qualifiers
 source          1..707
 /clone="C0005918b04"
------------------------------------------------------------
Is there a way in Bioperl to send a query over the internet (one clone_id at a time) and get out just the GI

Sergei Ryazansky | 1 May 2006 23:55

Re: blast program to run locally on windows

Hi,
Can you post your formatdb.log file here?
Chris Fields | 2 May 2006 06:15

Re: blast program to run locally on windows

We managed to work our way through it.  He hadn't set ncbi.ini to the  
correct directories; the database was formatted correctly.

Chris

On May 1, 2006, at 4:55 PM, Sergei Ryazansky wrote:

> Hi,
> Can you post your formatdb.log file here?
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l <at> lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
Chris Fields | 2 May 2006 18:19

Bio::DB::GenBank and complexity

I ran into some wonkiness with using extra parameters ('seq_start',
'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
gone through, fixed, and committed.  I also have added a few tests to DB.t
for everything (all changes were in Bio::DB::WebDBSeqI and
Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
manage to get it added as well (with tests).  This is how NCBI defines
complexity:

complexity regulates the display:
0 - get the whole blob
1 - get the bioseq for gi of interest (default in Entrez)
2 - get the minimal bioseq-set containing the gi of interest
3 - get the minimal nuc-prot containing the gi of interest
4 - get the minimal pub-set containing the gi of interest
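
For reference, the same values can be passed straight to NCBI's EFetch as the 'complexity' URL parameter. The sketch below only builds such a URL by hand. It is not how Bio::DB::NCBIHelper assembles its request; efetch_url is a hypothetical helper, the base URL and the database name 'nuccore' are assumptions about the current service, and J00522 is just an example accession.

```perl
use strict;
use warnings;

# Meaning of each complexity value, as listed above.
my %COMPLEXITY = (
    0 => 'whole blob',
    1 => 'bioseq for the gi of interest',
    2 => 'minimal bioseq-set',
    3 => 'minimal nuc-prot',
    4 => 'minimal pub-set',
);

# Build an EFetch URL for one ID. Complexity defaults to 1,
# the Entrez default mentioned in the list above.
sub efetch_url {
    my (%arg) = @_;
    my $complexity = defined $arg{complexity} ? $arg{complexity} : 1;
    die "Unknown complexity '$complexity'\n"
        unless exists $COMPLEXITY{$complexity};
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    return sprintf '%s?db=nuccore&id=%s&rettype=fasta&complexity=%d',
        $base, $arg{id}, $complexity;
}

print efetch_url(id => 'J00522', complexity => 0), "\n";
```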

Here's my quandary: when setting complexity to '0', you get a blob back (the
main sequence as well as any subsequences, such as CDS); this is in essence
a sequence stream with multiple alphabet types.  So, I now have it set up to
do this:

my $factory = Bio::DB::GenBank->new(-format => 'fasta',
                                    -complexity => 0
                                   );

my $seqin = $factory->get_Seq_by_acc($acc);

while (my $seq = $seqin->next_seq) {
    $seqout->write_seq($seq);
}


Mark A. Miller | 2 May 2006 13:41

Can't parse bacterial strain from EMBL OS or RC lines

Hello all.

I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
make FASTA subset files for some bacterial strains.  I haven't been
able to parse out the strain information from the OS or RC lines. 
These lines typically look like:

OS Somegenus somespecies subsp. somesubspecies strain ABC123.
RC STRAIN=ABC123.

I'm not especially good with Perl, and I'm definitely weak when it
comes to OOP.

I have included some code I pasted together from various pages on the
bioperl wiki.  In addition to the wiki, I have been making use of 
www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

The code I have so far reports the species but not the subspecies or
variant.  I have also tried walking through all of the feature,
annotation and reference objects, but I still can't parse out
the information I need.  (For brevity, the example I'm including below
only lists the code I used for the annotation objects.)  Also, this
code only prints the information; I know that I'll have to write out a
FASTA sequence object separately.

Any suggestions?

Thanks,
Mark


Chris Fields | 2 May 2006 20:01

Re: Bio::DB::GenBank and complexity

I hate responding to my own post!  Just wanted to add that I'm adding a
warning to the get_Seq* methods, telling the user to call the appropriate
get_Stream* method when complexity == 0, before the Bio::SeqIO object is returned.

CJF

> -----Original Message-----
> From: bioperl-l-bounces <at> lists.open-bio.org [mailto:bioperl-l-
> bounces <at> lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, May 02, 2006 11:20 AM
> To: bioperl-l <at> lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::GenBank and complexity
> 
> I ran into some wonkiness with using extra parameters ('seq_start',
> 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
> gone through, fixed, and committed.  I also have added a few tests to DB.t
> for everything (all changes were in Bio::DB::WebDBSeqI and
> Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
> manage to get it added as well (with tests).  This is how NCBI defines
> complexity:
> 
> complexity regulates the display:
> 0 - get the whole blob
> 1 - get the bioseq for gi of interest (default in Entrez)
> 2 - get the minimal bioseq-set containing the gi of interest
> 3 - get the minimal nuc-prot containing the gi of interest
> 4 - get the minimal pub-set containing the gi of interest
> 
> Here's my quandary: when setting complexity to '0', you get a blob back
> (the

Jason Stajich | 2 May 2006 20:36

Re: Can't parse bacterial strain from EMBL OS or RC lines

This is really a limitation of the EMBL/GenBank format

See this thread:
http://lists.open-bio.org/pipermail/bioperl-l/2006-March/021068.html

or on GMANE
http://comments.gmane.org/gmane.comp.lang.perl.bio.general/10557

I don't know if any of this has been resolved really so hopefully  
James will speak up if he's implemented anything.
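
In the meantime, one workaround is to skip the object model for this one field and pull the strain out of the raw lines with a regular expression. This is only a sketch: strain_from_line is a made-up helper, and it assumes OS lines end in "strain NAME." and RC lines carry a "STRAIN=NAME" token, as in the example lines quoted below. Real entries may format the strain differently, so the patterns will probably need adjusting.

```perl
use strict;
use warnings;

# Return the strain name found on one OS or RC line, or nothing
# if the line does not carry one.
sub strain_from_line {
    my ($line) = @_;

    # RC lines: "RC STRAIN=ABC123." possibly preceded by other tokens
    # that are separated by semicolons.
    if ($line =~ /^RC\s+(?:.*;\s*)?STRAIN=([^;]+)/) {
        my $strain = $1;
        $strain =~ s/[.\s]+$//;    # drop trailing period and whitespace
        return $strain;
    }

    # OS lines: "... strain ABC123." at the end of the line.
    if ($line =~ /^OS\s+.*\bstrain\s+(.+?)\.?\s*$/) {
        return $1;
    }

    return;
}

for my $line (
    'OS Somegenus somespecies subsp. somesubspecies strain ABC123.',
    'RC STRAIN=ABC123.',
) {
    my $strain = strain_from_line($line);
    print defined $strain ? "$strain\n" : "no strain found\n";
}
```

Entries whose strain matches the one you want can then be written out as FASTA in the usual way.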

-jason
On May 2, 2006, at 7:41 AM, Mark A. Miller wrote:

> Hello all.
>
> I have a recently downloaded UniProt/TrEMBL flat file.  I am trying to
> make FASTA subset files for some bacterial strains.  I haven't been
> able to parse out the strain information from the OS or RC lines.
> These lines typically look like:
>
> OS Somegenus somespecies subsp. somesubspecies strain ABC123.
> RC STRAIN=ABC123.
>
> I'm not especially good with Perl, and I'm definitely weak when it
> comes to OOP.
>
> I have included some code I pasted together from various pages on the
> bioperl wiki.  In addition to the wiki, I have been making use of
> www.pasteur.fr/recherche/unites/sis/formation/bioperl/ch02s02.html

Marco Blanchette | 2 May 2006 21:30

Bio::RangeI intersection and Bio::DB::GFF

Dear all--

I have been trying to use the intersection function to extract overlapping
regions from alternatively spliced exons as in the following script. The
returned object from the 'my $overlap = $exon1->intersection($exon2);' is
actually losing the strand of $exon1 if $exon1 is from the negative strand.
Is this behavior expected? Should I check the strand of $exon1 before
working on the object returned by any Bio::RangeI function?

Many thanks 

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GFF;

MAIN:{

    my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
                                -dsn => 'dbi:mysql:database=dmel_43_LS;host=riolab.net',
                                -user => 'guest');
    my $test_db = $db->segment('4');

    # Load up the exons into $exons_p
    for my $gene ($test_db->features(-types => 'gene')){

        my $exons_p = extractExons($gene);

        cluster($exons_p) unless ($#{$exons_p} == -1);

Brian Osborne | 2 May 2006 22:17

Re: Bio::RangeI intersection and Bio::DB::GFF

Marco,

Yes, this is how intersection() is supposed to work. If both of the Range
objects have the same strand then the strand information is returned as part
of the result but if they aren't on the same strand then no strand
information is returned.
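
The rule can be illustrated in plain Perl, without BioPerl. This is only a sketch of the behavior described above, not the Bio::RangeI code itself: intersect_ranges is a hypothetical stand-in that works on simple hashes, and using 0 to mean "no strand information" is an assumption.

```perl
use strict;
use warnings;

# Intersect two ranges given as hashes with start, end and strand keys.
# Returns nothing when the ranges do not overlap.
sub intersect_ranges {
    my ($r1, $r2) = @_;
    my $start = $r1->{start} > $r2->{start} ? $r1->{start} : $r2->{start};
    my $end   = $r1->{end}   < $r2->{end}   ? $r1->{end}   : $r2->{end};
    return if $start > $end;    # no overlap

    # Keep the strand only when both inputs agree; otherwise report 0.
    my $strand = $r1->{strand} == $r2->{strand} ? $r1->{strand} : 0;
    return { start => $start, end => $end, strand => $strand };
}

my $exon1   = { start => 100, end => 200, strand => -1 };
my $exon2   = { start => 150, end => 250, strand => -1 };
my $overlap = intersect_ranges($exon1, $exon2);

# prints "150..200 strand -1"
print "$overlap->{start}..$overlap->{end} strand $overlap->{strand}\n";
```

If the overlap comes back without a strand and you know both exons belong to the same gene, you can copy the strand from $exon1 onto the result yourself.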

Brian O.

On 5/2/06 3:30 PM, "Marco Blanchette" <mblanche <at> berkeley.edu> wrote:

> Dear all--
> 
> I have been trying to use the intersection function to extract overlapping
> regions from alternatively spliced exons as in the following script. The
> returned object from the 'my $overlap = $exon1->intersection($exon2);' is
> actually losing the strand of $exon1 if $exon1 is from the negative strand.
> Is this behavior expected? Should I check the strand of $exon1 before
> working on the object returned by any Bio::RangeI function?
> 
> Many thanks 
> 
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Bio::DB::GFF;
> 
> MAIN:{
> 
>     my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',

