Hilmar Lapp | 20 Jul 00:09 2016

Docker containers for running BioPerl programs in a reproducible environment

As one of the results of the 2016 Codefest that took place in Orlando, FL, prior to BOSC, there are now “official” Docker images of and for BioPerl, which let BioPerl-dependent programs run in reproducible software environments.

The images are available from Docker Hub:

The source Dockerfiles are maintained in Github:

At present, there are two image repositories on Docker Hub (see link above). One, bioperl/bioperl, contains a full BioPerl installation, including (presently) the add-on modules Bio::ASN1::EntrezGene and Bio::Phylo. There are currently three tags available for the image: ‘latest’ for a build of the latest changes to master of the bioperl-live repo (provided they pass Travis CI); ‘stable’ for the latest stable release published on CPAN (currently 1.6.924); and ‘release-1-6-924’ for a build of the v1.6.924 release (as published under the corresponding release tag in the bioperl-live repo on GitHub). I expect that new stable releases will receive their own tagged build as well. FYI, rebuilds of the ‘latest’ and ‘stable’ images are triggered automatically by successful Travis CI runs upon corresponding changes to the bioperl-live repo. (*)

These images are built on top of the bioperl/bioperl-deps image, the second BioPerl image repo on Docker Hub. This image uses Ubuntu 14.04 as its base and pre-installs (almost) all dependencies, both mandatory and optional ones. You are welcome to use the bioperl/bioperl-deps image as well if you need a customized build or special version of BioPerl. This includes running the test suite if you’re changing some piece of BioPerl but don’t want to install gazillions of specialized Perl packages onto your laptop (which is where this whole undertaking started at Codefest). 90% of the time needed to install BioPerl is spent on installing dependencies, so using this image as a base will not only help harmonize the software environment in which we build and run BioPerl, but also cut down dramatically on installation time.

If you need

- additional dependencies, such as but not limited to SOAP::Lite (the only optional dependency currently not included in the bioperl/bioperl-deps image);
- images with additional BioPerl modules installed, such as bioperl-run, bioperl-db, or others; 
- images with tagged versions of BioPerl other than the ones already available, such as earlier stable releases;
- a different OS base image than Ubuntu 14.04;

then please post an issue on the tracker of the Dockerfile repo:

The choices we’ve made are mainly because they seemed reasonable and achievable in a short amount of time. This is as much an open-source project as BioPerl itself. Feedback, contributions, and contributors are most welcome. And in case someone is wondering, I’ll be looking into Quay as well.

Thanks to Brad for organizing Codefest every year, and for making it so much fun to attend. And to Chris F for patiently entertaining my occasional rants about Perl, BioPerl, Docker Hub, and the world in general :-)

Cheers,

  -hilmar

(*) If you are curious about how this works, see here: https://github.com/bioperl/bioperl-live/pull/175 (there are some subsequent tweaks to this, but this is the gist of it). (BTW you cannot trigger regular expression pattern-based image tags through this mechanism. Yes, Docker Hub does not document this limitation.)

-- 
Hilmar Lapp -:- lappland.io



_______________________________________________
Bioperl-l mailing list
Bioperl-l <at> mailman.open-bio.org
http://mailman.open-bio.org/mailman/listinfo/bioperl-l
Karsten Sieber | 12 Jul 20:43 2016

Taxonomy->get_taxon() bug

I have found that Bio::DB::Taxonomy::entrez->get_taxon() dies when fetching the taxon object for taxon IDs corresponding to some of the top-level taxa (http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Root&id=1&lvl=3&keep=1&srchmode=1&unlock). Specifically, get_taxon() dies when querying the taxon IDs for "cellular organisms" (131567), viroids (12884), viruses (10239), unclassified (12908), and "other sequences" (28384). Below is code demonstrating the error. I think this is a bug because get_taxon() should always return a taxon object for a valid taxon ID.

Best,
Karsten

## $db->get_taxon() working as intended:
perl -MBio::DB::Taxonomy -we '$db=Bio::DB::Taxonomy->new(-source=>"entrez"); @taxonids=$db->get_taxonids("bacteria"); print join(" : ", @taxonids)."\n"; $taxon=$db->get_taxon(@taxonids); print "Pass:\t"; print $taxon->scientific_name."\n";'
2
Pass: Bacteria

## $db->get_taxon() fails to pull the taxon object for viruses
perl -MBio::DB::Taxonomy -we '$db=Bio::DB::Taxonomy->new(-source=>"entrez"); @taxonids=$db->get_taxonids("viruses"); print join(" : ", @taxonids)."\n"; $taxon=$db->get_taxon(@taxonids); print "Pass:\t"; print $taxon->scientific_name."\n";'
10239
Can't call method "children" on an undefined value at /Users/ksieber/perl5/lib/perl5/Bio/DB/Taxonomy/entrez.pm line 361.

## $db->get_taxon() fails with unclassified
perl -MBio::DB::Taxonomy -we '$db=Bio::DB::Taxonomy->new(-source=>"entrez"); @taxonids=$db->get_taxonids("unclassified"); print join(" : ", @taxonids)."\n"; $taxon=$db->get_taxon(@taxonids); print "Pass:\t"; print $taxon->scientific_name."\n";'
12908
Can't call method "children" on an undefined value at /Users/ksieber/perl5/lib/perl5/Bio/DB/Taxonomy/entrez.pm line 361

## fresh Bio::Perl install from the github repo 07.11.2016

perl -MBio::Perl -e 'print Bio::Perl->VERSION . "\n";'
1.006924
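
Until the underlying bug is fixed, one way to keep a script running over a mixed list of taxon IDs is to trap the exception around each lookup. The sketch below is only an illustration, not a fix for Bio::DB::Taxonomy::entrez itself. `safe_get_taxon` is a hypothetical helper name, and `My::FakeTaxDB` is a stand-in that merely mimics the failure reported above so that the example runs without BioPerl or network access. With a real database handle, the object returned by Bio::DB::Taxonomy->new(-source=>"entrez") would be passed instead.

```perl
use strict;
use warnings;

# Hypothetical helper: call get_taxon() inside eval so that a die inside
# the driver becomes an undef return plus a warning.
sub safe_get_taxon {
    my ($db, $taxonid) = @_;
    my $taxon = eval { $db->get_taxon($taxonid) };
    if (!defined $taxon) {
        my $err = $@ || "no taxon returned\n";
        warn "get_taxon($taxonid) failed: $err";
        return;
    }
    return $taxon;
}

# Stand-in database (an assumption, for illustration only). It dies for
# the taxon IDs listed in the report and returns a hash ref otherwise.
package My::FakeTaxDB;
my %BROKEN = map { $_ => 1 } (131567, 12884, 10239, 12908, 28384);
sub new { return bless {}, shift }
sub get_taxon {
    my ($self, $id) = @_;
    die qq{Can't call method "children" on an undefined value\n} if $BROKEN{$id};
    return { id => $id };
}

package main;

my $db = My::FakeTaxDB->new;
for my $id (2, 10239) {
    my $taxon = safe_get_taxon($db, $id);
    print "$id: ", (defined $taxon ? "ok" : "skipped"), "\n";
}
```

The trade-off is that a failed lookup is now silent apart from the warning, so callers have to check for undef.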
Peter Cock | 22 Jun 16:52 2016

Fwd: [Utilities-announce] NCBI will transition to HTTPS on September 1, 2016

Hello to all the Open Bio* projects etc,

This is just a heads up that (all?) the NCBI website resources will be switching
to HTTPS later this year.

From my testing for Biopython it seems you can already request HTTPS,
and the good news is that Entrez (E-Utils) and QBLAST both already seem
to work:

https://github.com/biopython/biopython/pull/860

Peter

---------- Forwarded message ----------
From:  <utilities-announce <at> ncbi.nlm.nih.gov>
Date: Wed, Jun 22, 2016 at 2:31 PM
Subject: [Utilities-announce] NCBI will transition to HTTPS on September 1, 2016
To: NLM/NCBI List utilities-announce <utilities-announce <at> ncbi.nlm.nih.gov>

Starting on September 1st, when you visit NCBI pages, you’ll see a
green lock and https:// in the address bar instead of http://. This
lets you know that you are really on an NCBI page – that our server
identity is confirmed – and that your communication with our server is
encrypted and private.

Here’s what to expect if you’re a general user or a scripter:

For general users

You will see the changes mentioned above – https:// and a green lock
in the address bar – but you don’t have to update or change anything.

You don’t need to clear your cache or update any links to NCBI pages
that you’ve put on your own webpages or shared with people. We will
redirect all our pages to https://.

For scripters

To keep calls from failing, use https:, not http:.

Scripts that use HTTP POST to send data will not work once we
transition from HTTP to HTTPS on September 1st.
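
For scripts that keep NCBI URLs in configuration or code, the change amounts to swapping the scheme. A minimal Perl sketch, assuming that only hosts under ncbi.nlm.nih.gov should be rewritten; `ncbi_https` is a hypothetical helper name, not part of any NCBI or BioPerl API:

```perl
use strict;
use warnings;

# Hypothetical helper: upgrade http:// to https:// for NCBI hosts only,
# leaving every other URL untouched.
sub ncbi_https {
    my ($url) = @_;
    $url =~ s{^http://((?:[\w-]+\.)*ncbi\.nlm\.nih\.gov)(?=[/:?]|\z)}{https://$1}i;
    return $url;
}

print ncbi_https('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi'), "\n";
print ncbi_https('http://example.org/page'), "\n";
```

The lookahead after the host name keeps look-alike hosts (for example, a name that merely begins with ncbi.nlm.nih.gov) from being rewritten.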

If you'd like to know more about this change to HTTPS, please read The
HTTPS-Only Standard [https://https.cio.gov/ ] from the Federal Chief
Information Officers website.

_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

Lee Katz | 8 Jun 17:32 2016

Tree comparisons

Hi, I am comparing two trees with a metric and would like to make a background distribution of random trees. From there I could calculate a Z score and p value.  I thought that the method to do so would involve feeding Tree::RandomFactory isolate names and branch lengths from the original tree. However, this class mutates the branch lengths according to coalescent methods like Yule and does not accept random branch lengths.

So I'm not really sure where to go from there. Should I be using something like Yule to make a background distribution of random trees? Or should I implement a method to take random branch lengths from the original trees?
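
Whichever way the random trees are generated, the last step is the same: compare the observed metric with the values from the background set. A minimal sketch in plain Perl, assuming the metric has already been computed for each random tree. `z_and_p` is a hypothetical helper and the numbers are made up for illustration. The p value is an empirical one-sided estimate (the fraction of background values at least as large as the observed one, with a +1 correction), and treating larger values as more extreme is itself an assumption that depends on the metric.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Z score and empirical one-sided p value of an
# observed metric against a list of background values.
sub z_and_p {
    my ($observed, @background) = @_;
    my $n = @background;
    die "need at least two background values\n" if $n < 2;
    my $mean = sum(@background) / $n;
    # Sample variance (n - 1 in the denominator).
    my $var = sum(map { ($_ - $mean) ** 2 } @background) / ($n - 1);
    die "background has zero variance\n" if $var == 0;
    my $z = ($observed - $mean) / sqrt($var);
    # Count background values at least as large as the observed one.
    my $ge = grep { $_ >= $observed } @background;
    my $p  = ($ge + 1) / ($n + 1);
    return ($z, $p);
}

my @background = (10, 12, 14, 16, 18);   # made-up metric values
my ($z, $p) = z_and_p(20, @background);
printf "Z = %.3f, p = %.3f\n", $z, $p;
```

With only a handful of random trees the empirical p value is coarse (it can never be smaller than 1/(n+1)), so the size of the background set limits the resolution.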

Mark A. Jensen | 1 Jun 07:06 2016

BP split progress and rationale

All,

I've made some significant progress towards a BP split. I know there 
have been several tries, but I'm willing to take this one to an 
actionable endpoint with YAPC::NA 2016 as a goal date for action.

I have built a graph of all the module dependencies (parent-child and 
horizontal) in Neo4j, and have been using this to design module 
groupings that encompass functional areas and also have hierarchical 
group dependencies such that the dependencies between groups are 
minimized. I'm calling the groupings "packages".

I am using the loose convention that "monophyletic" packages (groups of 
modules that fall within a namespace) are named after the namespace, and 
"polyphyletic" packages are named "BioPerl::<functional name>". The 
following packages are currently pretty solid. The descriptions indicate 
mainly what is encompassed by the contained modules, not rules for 
membership.

BioPerl::Base - Bio::Root::*, general design pattern helpers (i.e., 
many Bio::*I, Bio::Factory::*, Build helper classes.)

BioPerl::Sequence - Bio::Seq, Bio::SeqIO, and SeqIO drivers that can do 
without annotations (e.g., fasta)

BioPerl::Alignment - alignment objects and parsers

BioPerl::Annotation - most annotation modules

BioPerl::SeqFeature - most SeqFeature modules

BioPerl::Tree - most Tree related modules

BioPerl::DB - Most Bio::DB::*, Bio::Das interfaces

BioPerl::Search - The blast parsing and tiling

There are quite a few more. Examples of the logic: BioPerl::Base 
contains all of its dependencies. BioPerl::Sequence requires only 
BioPerl::Base to satisfy all its BP dependencies. BioPerl::Alignment 
requires BioPerl::Base and BioPerl::Sequence. BioPerl::Search requires 
Base, Sequence, and SeqFeature. And so on.
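
To make the layering concrete, here is a small sketch of how an install set could be derived from such a table of prerequisites. It is an illustration only: the hash holds just the relationships spelled out above, `install_set` is a hypothetical helper, and the prerequisites of BioPerl::SeqFeature are not given in this message, so its entry is an empty placeholder.

```perl
use strict;
use warnings;

# Direct prerequisites as described in the message.
my %requires = (
    'BioPerl::Base'       => [],
    'BioPerl::Sequence'   => ['BioPerl::Base'],
    'BioPerl::Alignment'  => ['BioPerl::Base', 'BioPerl::Sequence'],
    'BioPerl::SeqFeature' => [],   # placeholder: not stated in the message
    'BioPerl::Search'     => ['BioPerl::Base', 'BioPerl::Sequence', 'BioPerl::SeqFeature'],
);

# Return the package plus everything it needs, directly or indirectly.
# %$seen guards against visiting a package twice.
sub install_set {
    my ($package, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$package}++;
    my @set = ($package);
    push @set, install_set($_, $seen) for @{ $requires{$package} || [] };
    return @set;
}

my @needed = install_set('BioPerl::Alignment');
print join(', ', sort @needed), "\n";
```

The same walk works for any package in the table, so a CPAN client resolving prerequisites from the metadata would arrive at the same set.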

With a structure like this, a user who just needs Bio::PrimarySeq and 
Bio::SeqIO to read some fasta files can get away with installing 
BioPerl::Base and BioPerl::Sequence, about 141 modules, as opposed to 
the full 805 modules, including that broadly useful one 
"Bio::DB::HIV::HIVQueryHelper".

Once finished, I'll propose setting many of the namespaces free as 
separate CPAN packages - Bio::Restriction, Bio::DB::HIV, and others. 
These can be packaged with their appropriate BioPerl::* prerequisites in 
the metadata. I expect this will allow natural selection to operate much 
more efficiently on the obsolete modules.

I will set up CPAN::Meta compliant metadata for everything.

I have more thoughts but this is already too long.

MAJ
Roy Chaudhuri | 31 May 17:39 2016

Re: issue writing to pipe with Bio::SeqIO

Hi Stephane,

Please remember to copy in the mailing list on replies. In my tests, a 
leading space doesn't make any difference to the issue - Bio::SeqIO 
still tries to read from the pipe rather than writing to it.

Mark - I have opened an issue on github:
https://github.com/bioperl/bioperl-live/issues/153

I think the issue is the regex in Bio::Root::IO->cleanfile not 
recognising a leading pipe character as indicating a "write" filehandle.
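
For reference, the filehandle workaround can be exercised without Bio::SeqIO at all. The sketch below is plain Perl under stated assumptions: `open_pipe_writer` is a hypothetical helper, `cat` stands in for bgzip so that the example runs wherever a POSIX shell is available, and the FASTA record is written by hand. The point is that the command string has to be built so that the variables interpolate (double quotes, not single quotes), and that the three-argument `|-` form of open states the write direction explicitly instead of relying on how a leading pipe in a file name is parsed.

```perl
use strict;
use warnings;

# Hypothetical helper: open a filehandle that writes INTO a shell command
# whose output is redirected to $outfile.
sub open_pipe_writer {
    my ($command, $outfile) = @_;
    # '|-' opens a pipe to the command; the shell handles the redirection.
    # Note: $command and $outfile reach the shell unescaped.
    open(my $fh, '|-', "$command > $outfile")
        or die "Cannot start '$command': $!";
    return $fh;
}

my $outfile = "pipe_demo_$$.fa";
my $fh = open_pipe_writer('cat', $outfile);
print {$fh} ">seq1\nACGT\n";
# close() waits for the command and reports a non-zero exit status.
close($fh) or die "Command failed: $! (status $?)\n";
unlink $outfile;
```

With bgzip the call would take the bgzip path plus `-c` as the command and the .gz file name as the output, and the resulting handle is what gets passed to Bio::SeqIO through -fh, as in the quoted message below.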

Cheers,
Roy.

On 31/05/2016 15:58, Stephane Plaisance | VIB | wrote:
> Thanks a lot Roy,
>
> I found that a leading space before the pipe and possibly the space directly after it were guilty, leading to the executable not found although fully provided with path ??? what works is
>>> -file => " |$bgzip -c >  $outfile\.gz")
>
> but your fh version is precious and I will record it for future use.
>
> Thanks!
> stephane
>
>> On 31 May 2016, at 16:45, Roy Chaudhuri <roy.chaudhuri <at> gmail.com> wrote:
>>
>> Hi Stephane,
>>
>> According to the SeqIO docs your way should work, so this is probably a bug. However, Bio::SeqIO can accept a filehandle instead using the -fh option, so as a workaround you could try something like this:
>>
>> open my $fh, "|$bgzip -c > $outfile.gz" or die $!;
>> $seq_out=Bio::SeqIO->newFh(-format=>'fasta', -fh=>$fh);
>>
>> Cheers,
>> Roy.
>>
>> On 31/05/2016 15:06, Stephane Plaisance | VIB | wrote:
>>> Dear,
>>> I am trying to bgzip my FASTA output but have an issue
>>>
>>>
>>> my $seq_out;
>>> if ( defined($zipit) ) {
>>> 	my $bgzip = `which bgzip`;
>>> 	chomp($bgzip);
>>> 	die "No bgzip command available\n" unless ( $bgzip );
>>> 	$seq_out = Bio::SeqIO -> newFh( -format => 'Fasta', -file => "  | $bgzip -c >  $outfile\.gz");
>>> } else {
>>> 	$seq_out = Bio::SeqIO -> newFh( -format => 'Fasta', -file => ">$outfile" );
>>> }
>>>
>>> The code fails although bgzip is in my path and the $bgzip variable is set correctly.
>>> Any help is very welcome
>>>
>>> Stephane
>>>
>
Stephane Plaisance | VIB | | 31 May 16:06 2016

issue writing to pipe with Bio::SeqIO

Dear,
I am trying to bgzip my FASTA output but have an issue

my $seq_out;
if ( defined($zipit) ) {
	my $bgzip = `which bgzip`;
	chomp($bgzip);
	die "No bgzip command available\n" unless ( $bgzip );
	$seq_out = Bio::SeqIO -> newFh( -format => 'Fasta', -file => "  | $bgzip -c >  $outfile\.gz");
} else {
	$seq_out = Bio::SeqIO -> newFh( -format => 'Fasta', -file => ">$outfile" );
}

The code fails although bgzip is in my path and the $bgzip variable is set correctly.
Any help is very welcome

Stephane
Turnsek, Jernej | 26 May 02:09 2016

Re: Cannot download/find BPbl2seq module


Wonderful. I'll give it a shot with v1.5.2 or lower then.


Thank you all again!


Jernej


From: Fields, Christopher J <cjfields <at> illinois.edu>
Sent: Wednesday, May 25, 2016 5:10:35 PM
To: Turnsek, Jernej
Cc: Mark Jensen; Brian Osborne; bioperl-l <at> bioperl.org
Subject: Re: [Bioperl-l] Cannot download/find BPbl2seq module
 
Yep, those tools used Ian’s old BPLite, see here:


These would last be in the 1.5.2 release series and were removed prior to v1.6 as their functionality was largely subsumed by Bio::SearchIO.

chris

On May 25, 2016, at 4:06 PM, Turnsek, Jernej <turnsek <at> fas.harvard.edu> wrote:

Thanks everyone for your thoughts and suggestions, especially thanks for the github page with older BioPerl releases. I found this website from winter '07 stating:

"Bioperl's older BLAST report parsers - BPlite, BPpsilite, BPbl2seq and Blast.pm - are no longer supported but since legacy Bioperl scripts have been written which use these objects, they are likely to remain within Bioperl for some time."

The error message I see is here:

<OAF_error.png>

Jernej
From: Fields, Christopher J <cjfields <at> illinois.edu>
Sent: Wednesday, May 25, 2016 4:39:28 PM
To: Mark Jensen
Cc: Turnsek, Jernej; bioperl-l <at> bioperl.org
Subject: Re: [Bioperl-l] Cannot download/find BPbl2seq module
 
One thing to note: the OAF tools were last released in 2008.  There is some possibility that newer versions of bioperl may or may not work with this; if you run into problems I suggest using one of the older releases:


chris

On May 25, 2016, at 2:12 PM, Mark A Jensen <maj <at> fortinbras.us> wrote:

Oops, I am wrong about this. 
There is no such module BPbl2seq in any distribution; there is a module Bio::AlignIO::bl2seq. My guess is that this is a module created by the author of the OAF tool that uses bioperl under the hood. You can sometimes get a missing-module error if the module is present but contains a syntax error.
On Wed, May 25, 2016 at 2:45 PM, Mark A Jensen <maj <at> fortinbras.us> wrote:

Jernej, this is a script that should appear in your path if you install bioperl from cpan and choose yes for the question "install scripts?"
Mark
On Wed, May 25, 2016 at 1:21 PM, Turnsek, Jernej <turnsek <at> fas.harvard.edu> wrote:


Dear BioPerl community,


I am trying to use OAF - a Perl-based tool to analyze cDNA data - with a goal to detect antizyme or antizyme-like sequences. The publication describing the tool is available here. I installed Perl (v5.22.2) and BioPerl (v1.6.924) which I believe came with HMMER. I haven't installed FASTA and BLAST locally yet - they are both optional (see this link). What I tried to do next is replicate the "my_sequence" example listed on this website using the attached .pl script and .fasta file, but ended up stuck with the error stating I am missing a necessary BioPerl module - BPbl2seq - which I couldn't download from CPAN. I tried to locate it manually online, but it looks like it doesn't exist anymore. I've talked to some Perl specialists around here and it seems like I should reinstall Perl and BioPerl versions that were present around the time this software was developed (2007/2008) with a rationale that they will carry all the necessary modules including BPbl2seq.

I'd greatly appreciate if you could provide me with some tips on how to proceed. I am working on a Lenovo Yoga PC, 64-bit Windows 8.1.



I look forward to hearing from you.

Thank you and kind regards,

Jernej Turnsek






Jernej Turnsek 
Ph.D. Candidate | Pamela Silver's Lab 
Department of Systems Biology | Harvard Medical School
200 Longwood Ave | Boston, MA 02115
(617) 797-5386 | Web | LinkedIn |  <at> SynEnthu
Matthew | 25 May 22:08 2016

barcode split a paired-end fastq file

Are there any BioPerl modules that would help to split a barcoded fastq 
file ?

I have tried FASTX-toolkit, but it does not work on paired-end data.

The file I have been given is paired-end but a single file, and I have a 
list of 8 or so 4 letter barcodes.

Matthew
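
A plain-Perl sketch of one way to approach this, under assumptions that should be checked against the actual data: the single file is interleaved (each read 1 record is immediately followed by its read 2 mate), every record is exactly four lines, all barcodes have the same length, and the barcode is the leading bases of read 1. `read_record` and `split_by_barcode` are hypothetical helpers, not BioPerl functions. The sketch returns the records grouped by barcode rather than writing files, and pairs whose prefix matches no listed barcode go under 'unmatched'.

```perl
use strict;
use warnings;

# Read one 4-line FASTQ record; returns undef at end of input
# (a truncated final record is also treated as end of input).
sub read_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        return undef unless defined $line;
        push @lines, $line;
    }
    return join '', @lines;
}

# Group interleaved read pairs by the barcode at the start of read 1.
sub split_by_barcode {
    my ($fh, @barcodes) = @_;
    my %known = map { uc($_) => 1 } @barcodes;
    my $len   = length $barcodes[0];
    my %bins;
    while (defined(my $r1 = read_record($fh))) {
        my $r2 = read_record($fh);
        die "Read 1 without a mate\n" unless defined $r2;
        my $seq    = (split /\n/, $r1)[1];      # second line is the sequence
        my $prefix = uc substr($seq, 0, $len);
        my $bin    = $known{$prefix} ? $prefix : 'unmatched';
        push @{ $bins{$bin} }, $r1 . $r2;       # keep mates together
    }
    return \%bins;
}

# Made-up interleaved data: two pairs, one with a listed barcode.
my $fastq = join '',
    "\@pair1/1\nACGTTTTT\n+\nIIIIIIII\n", "\@pair1/2\nGGGGCCCC\n+\nIIIIIIII\n",
    "\@pair2/1\nTTTTAAAA\n+\nIIIIIIII\n", "\@pair2/2\nCCCCGGGG\n+\nIIIIIIII\n";
open(my $in, '<', \$fastq) or die "Cannot open in-memory FASTQ: $!";
my $bins = split_by_barcode($in, qw(ACGT GGCC));
printf "%s: %d pair(s)\n", $_, scalar @{ $bins->{$_} } for sort keys %$bins;
```

For real data the bins would more likely be written straight to one output file per barcode instead of being held in memory, and the barcode bases may need trimming from read 1 before downstream analysis.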
Turnsek, Jernej | 25 May 18:44 2016

Cannot download/find BPbl2seq module

Dear BioPerl community,


I am trying to use OAF - a Perl-based tool to analyze cDNA data - with a goal to detect antizyme or antizyme-like sequences. The publication describing the tool is available here. I installed Perl (v5.22.2) and BioPerl (v1.6.924) which I believe came with HMMER. I haven't installed FASTA and BLAST locally yet - they are both optional (see this link). What I tried to do next is replicate the "my_sequence" example listed on this website using the attached .pl script and .fasta file, but ended up stuck with the error stating I am missing a necessary BioPerl module - BPbl2seq - which I couldn't download from CPAN. I tried to locate it manually online, but it looks like it doesn't exist anymore. I've talked to some Perl specialists around here and it seems like I should reinstall Perl and BioPerl versions that were present around the time this software was developed (2007/2008) with a rationale that they will carry all the necessary modules including BPbl2seq.


I'd greatly appreciate if you could provide me with some tips on how to proceed. I am working on a Lenovo Yoga PC, 64-bit Windows 8.1.


I look forward to hearing from you.

Thank you and kind regards,

Jernej Turnsek




Jernej Turnsek 
Ph.D. Candidate | Pamela Silver's Lab 
Department of Systems Biology | Harvard Medical School
200 Longwood Ave | Boston, MA 02115
(617) 797-5386 | Web | LinkedIn | <at> SynEnthu
Attachment (oaf.pl): application/octet-stream, 12 KiB
Attachment (mysequence.fasta): application/octet-stream, 1289 bytes
James E Keenan | 16 May 01:41 2016

Need CPAN release to clear up CPANtesters test failure reports

The other day I was looking for unanswered Perl-tagged questions on 
stackoverflow.  I came across this one:

http://stackoverflow.com/questions/36985090/extracting-and-joining-exons-from-multiple-sequence-alignments

To answer it I tried to install BioPerl from CPAN but got multiple test 
failures.  I then checked out the BioPerl page on CPANtesters and saw 
that your latest CPAN distribution has experienced massive test failures 
on many versions of Perl and several different operating systems.  See: 
  http://matrix.cpantesters.org/?dist=BioPerl+1.6.924

I subsequently located your github site at 
https://github.com/bioperl/bioperl-live, from which I was able to fork. 
  When I attempted to build locally, I was pleasantly surprised to find 
all tests run by ./Build test were PASSing.  The only problem I saw was 
a "missing or corrupt MANIFEST" message, for which I have supplied this 
pull request:  https://github.com/bioperl/bioperl-live/pull/150.

Your last CPAN release was in July 2014.  I strongly advise that you do 
a CPAN release of BioPerl ASAP so that your project does not suffer 
reputational damage from all those test failure reports.  At the same 
time, I urge you to go through the bug reports at 
https://rt.cpan.org/Public/Dist/Display.html?Name=BioPerl.  It's likely 
that HEAD on github resolves some of these.

Thank you very much.
Jim Keenan
