Daniel Renfro | 1 Feb 04:22
Picon
Gravatar

New package to compare two SeqI-implementing objects

Hello all,

A colleague and I have been working on a (Bio)Perl package to compare two
Seq objects. This is in response to a need we found in our lab -- we wanted
to see the changes to GenBank files through time, but wanted an automated
way to do this. This led to what I'm calling the SeqDiff.pm package. I
thought it would be a good idea to inform the community and get some
feedback.

The package takes two Seq objects as arguments, arbitrarily called "old" and
"new." It then matches the features from the old object with the new object.
This is done based on some criteria -- in our case we decided the features
must be of the same type (have the same primary_tag) and have at least one
matching database cross-reference (db_xref) in common.  The left-over
features (ones that did not have a match) are dropped into arrays called
"lost" and "gained." The matching is done in about NlogN time, as each
matching pair are removed from subsequent searches.

The matched features and iterated through and the differences are
calculated. Each feature is examined recursively and any differences are
reported. Optionally you can give the new() method a flag so that everything
is returned (differences and similarities.) You can set callbacks for
different types of objects (like anything that isa('Bio::LocationI')) if you
want a custom comparison for specific BioPerl objects. This comparison step
is the computationally slow part, and currently everything is held in
memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
next() and last() methods.

Maybe this was a little verbose, but that is the SeqDiff package in a
nutshell. I hope to soon release v1.0. If you have any questions or comments
(Continue reading)

Mark A. Jensen | 1 Feb 04:47
Picon
Favicon
Gravatar

Re: New package to compare two SeqI-implementing objects

Daniel-- this sounds interesting and useful, I +1 it. Your intuition about
in-memory vs streaming sounds correct to me; features can be many, and
diffing many (MANY) sequences may bork. Maybe our feature-rich users
can chime in. (...however, I did just hear about a magic spell called 
'File::Map',
might check that out on CPAN.)
cheers- MAJ
----- Original Message ----- 
From: "Daniel Renfro" <bluecurio <at> gmail.com>
To: <bioperl-l <at> lists.open-bio.org>
Sent: Sunday, January 31, 2010 10:22 PM
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects

> Hello all,
>
> A colleague and I have been working on a (Bio)Perl package to compare two
> Seq objects. This is in response to a need we found in our lab -- we wanted
> to see the changes to GenBank files through time, but wanted an automated
> way to do this. This led to what I'm calling the SeqDiff.pm package. I
> thought it would be a good idea to inform the community and get some
> feedback.
>
> The package takes two Seq objects as arguments, arbitrarily called "old" and
> "new." It then matches the features from the old object with the new object.
> This is done based on some criteria -- in our case we decided the features
> must be of the same type (have the same primary_tag) and have at least one
> matching database cross-reference (db_xref) in common.  The left-over
> features (ones that did not have a match) are dropped into arrays called
> "lost" and "gained." The matching is done in about NlogN time, as each
> matching pair are removed from subsequent searches.
(Continue reading)

Jelle Scholtalbers | 1 Feb 13:24
Picon

Bio::DB::SeqFeature::Store::memory -> filter_by_type very slow

Hi,
I used the Bio::DB::SeqFeature::Store::memory module to load in a GFF3 file
which I could then use in my script in a 'queryable' way. To retrieve
features I used for example
        $db->features(-type => 'BAC:FPC', -seq_id=>'chromosome0')
However when doing a profile on my script I found out that 60% of the
running time went into filter_by_type from
Bio::DB::SeqFeature::Store::memory.
Replacing this function with
     my @features = grep{$_->type eq 'BAC:FPC'}
$db->features(-seq_id=>'chromosome0')
which gave me the same results was just a fraction of the earlier run time.
My script went from 60min. to 4min. for the same result and only changing
this function (is called often).
Can/Should this be fixed or is this just the faster way to do it?

Cheers,
Jelle
Chris Fields | 1 Feb 14:23
Favicon
Gravatar

Re: Bio::DB::SeqFeature::Store::memory -> filter_by_type very slow

Jelle,

Seems reasonable, but Lincoln and Scott know that code better and are better suited to comment on it.
Lincoln, Scott?

chris

On Feb 1, 2010, at 6:24 AM, Jelle Scholtalbers wrote:

> Hi,
> I used the Bio::DB::SeqFeature::Store::memory module to load in a GFF3 file
> which I could then use in my script in a 'queryable' way. To retrieve
> features I used for example
>       $db->features(-type => 'BAC:FPC', -seq_id=>'chromosome0')
> However when doing a profile on my script I found out that 60% of the
> running time went into filter_by_type from
> Bio::DB::SeqFeature::Store::memory.
> Replacing this function with
>    my @features = grep{$_->type eq 'BAC:FPC'}
> $db->features(-seq_id=>'chromosome0')
> which gave me the same results was just a fraction of the earlier run time.
> My script went from 60min. to 4min. for the same result and only changing
> this function (is called often).
> Can/Should this be fixed or is this just the faster way to do it?
> 
> Cheers,
> Jelle
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l <at> lists.open-bio.org
(Continue reading)

Mark A. Jensen | 1 Feb 15:39
Picon
Favicon
Gravatar

Re: Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn

This issue appears resolved; will close out #3003--
thanks
MAJ
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington <at> bbsrc.ac.uk>
To: "Mark A. Jensen" <maj <at> fortinbras.us>
Cc: <bioperl-l <at> lists.open-bio.org>; <Brian <at> portal.open-bio.org>; "Osborne" 
<bosborne11 <at> verizon.net>
Sent: Friday, January 29, 2010 8:25 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
error running blastn

Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file 
found for nucleotide database [MouseGenome] in search path 
[/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
(Continue reading)

Jason Stajich | 1 Feb 20:48
Gravatar

Fwd: megablast using bioperl

Xiujing -
Your message is best asked on the mailing list.

Begin forwarded message:

> From: Xiujing Gu <xiujingg <at> yahoo.com>
> Date: February 1, 2010 7:58:55 AM PST
> To: jason <at> bioperl.org
> Subject: megablast using bioperl
>
> Hello Jason:
>
> After numerous googling, I found you might be the only person who  
> knows about this subject. Would you so kindly show me an example of  
> setting up megablast using bioperl Bio::Tools::Run::StandAloneBlast?
>
> Thanks a lot!
>
> Best regards,
> Xiujing
Smithies, Russell | 1 Feb 21:12
Picon

Re: Fwd: megablast using bioperl

Have you read the docs? It's very well documented:
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/StandAloneBlast.pm

Have you searched the BioPerl wiki? Again, well documented with examples:
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast

And if that doesn't help, perhaps you need to improve your Google skills:
http://www.google.com/support/websearch/?ctx=web

--Russell

> -----Original Message-----
> From: bioperl-l-bounces <at> lists.open-bio.org [mailto:bioperl-l-
> bounces <at> lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Tuesday, 2 February 2010 8:48 a.m.
> To: BioPerl List
> Cc: xiujingg <at> yahoo.com
> Subject: [Bioperl-l] Fwd: megablast using bioperl
> 
> Xiujing -
> Your message is best asked on the mailing list.
> 
> Begin forwarded message:
> 
> > From: Xiujing Gu <xiujingg <at> yahoo.com>
> > Date: February 1, 2010 7:58:55 AM PST
> > To: jason <at> bioperl.org
> > Subject: megablast using bioperl
> >
> > Hello Jason:
(Continue reading)

Mark A. Jensen | 1 Feb 21:22
Picon
Favicon
Gravatar

Re: Fwd: megablast using bioperl

Xiujing--
If you choose to use blast+, this is pretty easily done with
Bio::Tools::Run::StandAloneBlastPlus :

 use Bio::Tools::Run::StandAloneBlastPlus;
 $ENV{BLASTPLUSDIR} = "/path/to/blast+/bin";

 $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => '/path/to/blastdb/mydb'
);

 $result = $fac->blastn( -query => 'myqueryseqs.fas' );

 while ($hit = $result->next_hit) {
  # process hits
 }

Here, blastn is used, because in blast+, megablast is the default
algorithm. --to specify others, you set the -task parameter: e.g., 
legacy blastn would be

 $result = $fac->blastn( -query =. 'myqueryseqs.fas',
                                     -task => 'blastn' );

For much more information, please look at 
http://bioperl.org/wiki/HOWTO:SearchIO
and
http://bioperl.org/wiki/HOWTO:BlastPlus

Many questions you might have will be answered in those pages.
(Continue reading)

Abhishek Pratap | 1 Feb 21:36
Picon
Gravatar

Pulling down data from NCBI

Hi All

I looking to batch download some 34K nucleotide sequences,
corresponding to a NCBI accession number. I tired the following and
getting an error. Has it got anything to do with recent update to code
that Chris was discussing.

my $factory = Bio::DB::EUtilities->new  (
					-eutil	=> 'efecth',
					-db 	=>	'nucleotide',
					-retype	=>	'fasta',
					-id		=> $id
				);

----------- EXCEPTION: Bio::Root::Exception -------------
MSG: efecth not supported
STACK: Error::throw
STACK: Bio::Root::Root::throw
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::EUtilParameters::eutil
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Tools/EUtilities/EUtilParameters.pm:452
STACK: Bio::Root::RootI::_set_from_args
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/RootI.pm:546
STACK: Bio::Tools::EUtilities::EUtilParameters::new
/usr/lib/perl5/vendor_perl/5.8.8/Bio/Tools/EUtilities/EUtilParameters.pm:193
STACK: Bio::DB::EUtilities::new
/usr/lib/perl5/vendor_perl/5.8.8/Bio/DB/EUtilities.pm:74
STACK: ./getDatafromNCBI.pl:9

-Abhi
(Continue reading)

Chris Fields | 1 Feb 21:38
Favicon
Gravatar

Re: Pulling down data from NCBI

that would be 'efetch', not 'efecth'.  

chris

On Feb 1, 2010, at 2:36 PM, Abhishek Pratap wrote:

> Hi All
> 
> I looking to batch download some 34K nucleotide sequences,
> corresponding to a NCBI accession number. I tired the following and
> getting an error. Has it got anything to do with recent update to code
> that Chris was discussing.
> 
> 
> 
> my $factory = Bio::DB::EUtilities->new  (
> 					-eutil	=> 'efecth',
> 					-db 	=>	'nucleotide',
> 					-retype	=>	'fasta',
> 					-id		=> $id
> 				);
> 
> 
> ----------- EXCEPTION: Bio::Root::Exception -------------
> MSG: efecth not supported
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::EUtilParameters::eutil
> /usr/lib/perl5/vendor_perl/5.8.8/Bio/Tools/EUtilities/EUtilParameters.pm:452
(Continue reading)


Gmane