newtoperlprog | 24 Nov 23:07 2014

Parse GenBank for gene name/ID

Dear All,

I am trying to parse GenBank summary files for the following fields:

Sequence, description, division, GI number, gene ID/name, version and organism.

I adapted a script from the BioPerl website to parse these.

I am having trouble parsing the gene ID/name, version, CDS information and organism.

Could you please help?

Below is the code I am using:

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

my $seqobj;
my $file = "NM_000040.summary";

# Open the GenBank file as a stream of sequence records
my $seqio = Bio::SeqIO->new (-format => 'GenBank',
                             -file   => $file);
print ref($seqio), "\n";
while ($seqobj = $seqio->next_seq ()) {
    printf "Sequence:    %s\n",$seqobj->seq;
    printf "Display ID:  %s\n",$seqobj->display_id;
    printf "Description: %s\n",$seqobj->desc;
    printf "Division:    %s\n",$seqobj->division;
    printf "Accession:   %s\n",$seqobj->accession_number;
    printf "GI number:   %s\n",$seqobj->primary_id;
    # seq_version returns the sequence version, not the definition line
    printf "Version:     %s\n",$seqobj->seq_version;
}

Any help is greatly appreciated.
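
One possible way to reach the remaining fields, offered as a minimal sketch rather than a definitive answer: the organism is available through the species object, while the gene name, GeneID and CDS location sit on the sequence features. The helper name gene_id_from_xrefs and the 'n/a' fallback are invented for illustration, and the sketch assumes the record carries a gene feature with /gene and /db_xref="GeneID:..." qualifiers, which may not hold for every file.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the numeric ID out of db_xref values
# such as "GeneID:12345"; returns undef if none is present.
sub gene_id_from_xrefs {
    my @xrefs = @_;
    for my $xref (@xrefs) {
        return $1 if $xref =~ /^GeneID:(\d+)$/;
    }
    return;
}

# Run as: perl script.pl NM_000040.summary
if (my $file = shift @ARGV) {
    require Bio::SeqIO;    # loaded only when a file name is given
    my $seqio = Bio::SeqIO->new(-format => 'genbank', -file => $file);

    while (my $seqobj = $seqio->next_seq) {
        my $species = $seqobj->species;    # may be undef if no ORGANISM line
        printf "Version:   %s.%s\n", $seqobj->accession_number, $seqobj->seq_version;
        printf "Organism:  %s\n", $species ? $species->binomial : 'n/a';

        for my $feat ($seqobj->get_SeqFeatures) {
            my $tag = $feat->primary_tag;
            if ($tag eq 'gene') {
                # Qualifiers are read with has_tag / get_tag_values
                my ($name) = $feat->has_tag('gene') ? $feat->get_tag_values('gene') : ();
                my $id = $feat->has_tag('db_xref')
                    ? gene_id_from_xrefs($feat->get_tag_values('db_xref'))
                    : undef;
                printf "Gene name: %s\n", defined $name ? $name : 'n/a';
                printf "Gene ID:   %s\n", defined $id   ? $id   : 'n/a';
            }
            elsif ($tag eq 'CDS') {
                printf "CDS:       %s\n", $feat->location->to_FTstring;
            }
        }
    }
}
```

Without a file argument the script only defines the helper and exits, so it can be checked before pointing it at real data.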

Bioperl-l mailing list
Bioperl-l <at>
Jennifer Krauel | 20 Nov 19:45 2014

Re: Error dereferencing sequence in SeqIO object

Well.  I made a very small fastq input file, stripped out all possible code, ran it in the debugger, and then it started working.  Now the original code and data files also work.  File it under "just keep trying stuff".

On Wednesday, November 12, 2014 6:01:01 PM UTC-6, Jennifer Krauel wrote:
I'm having some trouble running a very basic bioperl script to transform a fastq file into a fasta file.  When I try to dereference a sequence object I get this error: "Can't call method "seq" without a package or object reference..."
It's boilerplate code, and the error is on the last line in the snippet below:

my $seq_in = Bio::SeqIO->new(
                             -file   => "<$infile",
                             -format => $infileformat,
                             );
my $i=0;
while (my $this_seq = $seq_in->next_seq && $i < 100)
    { #while more sequences in fastq file
    my $seqstring = $this_seq->seq();

The counter is just to limit the number of reads while I'm testing the code; I don't think it should be causing the problem. I tried to google the error but didn't come up with anything useful except the suggestion that I might not be working with a clean or up-to-date BioPerl installation.

When I try to get the bioperl version using this code:
perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
I get 1.005002102, which seems odd.

When I ask for the version number of SeqIO using
perl -MBio::SeqIO -e 'printf "%vd\n", $Bio::SeqIO::VERSION'
I get

I am using CloudBioLinux on AWS, with whatever standard installation that provides (which is the whole point of using AWS).

Is there something bone-headed I am doing, or is this an issue I should pursue with the Cloud folks?
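
One likely cause, independent of the installation, is operator precedence in the while condition of the snippet above: && binds more tightly than =, so the variable receives the result of the comparison rather than the sequence object. The sketch below reproduces this in plain Perl; next_seq here is a hypothetical stand-in returning hash references, not the Bio::SeqIO method.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence stream: returns a hash ref, or undef at the end.
my @queue = map { +{ seq => $_ } } qw(ACGT GGCC TTAA);
sub next_seq { return shift @queue }

my $i = 0;

# As in the post: this parses as  $wrong = (next_seq() && ($i < 100)),
# so $wrong holds the truth value of the comparison (1), not the record.
my $wrong = next_seq() && $i < 100;
printf "without parentheses: %s\n", ref($wrong) || "plain scalar '$wrong'";

# Parenthesising the assignment keeps the record in the variable.
# Using the low-precedence 'and' instead of '&&' has the same effect.
my @seqs;
while ((my $this_seq = next_seq()) && $i < 100) {
    push @seqs, $this_seq->{seq};
    $i++;
}
print "with parentheses: @seqs\n";
```

Calling a method on the plain scalar 1 produces exactly the "without a package or object reference" message quoted above.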


Fields, Christopher J | 17 Nov 16:50 2014

Next bioperl release

I will likely start working on a new BioPerl release in the next week or two to attempt fixing some of the CPAN indexing problems that have plagued us in the last few releases (final version to be mentioned when that is fixed :).

Along those lines, a few things when committing code:

1) Add yourself to the AUTHORS file if you have been making code changes.  We like to give credit where credit is due!
2) Please make sure to update the ‘Changes’ file if appropriate.  This is particularly important if the changes are addressing bugs or adding functionality.
3) Make sure anything potentially code-breaking is made on a branch first and merged in.  This hasn’t been a problem recently (the Coveralls work that Francisco is doing is unfortunately easier to deal with on master, so that is a clear exception).


Doug Hershberger | 14 Nov 00:04 2014

Looking for recommendations for a Perl web security consultant

Our company is beefing up our overall computer security. This includes making modifications to many custom legacy web-based Perl CGI bioinformatics databases, as well as refactoring a few of these applications using the Perl Catalyst framework. We are looking for a consultant with expertise in Perl web security, preferably someone who also has experience with Catalyst best practices, to review our code and make recommendations and/or rewrite some necessary pieces.  Any relevant web application security credentials would be a plus.

Thanks in advance for any help you can provide in our search.
Doug Hershberger
Cacau Centurion | 10 Oct 19:46 2014

Problems when using PAML::yn00

Hi All,

I tried to use Bio::Tools::Run::Phylo::PAML::Yn00 to run yn00 and parse the result. However, no results were returned. Does anyone know what the problem might be?

The following code is obtained from 

use Bio::Tools::Run::Phylo::PAML::Yn00;
use Bio::AlignIO;

# Read the alignment named on the command line
my $alignio = Bio::AlignIO->new(
    -format => 'fasta',
    -file   => "$ARGV[0]"
);
my $aln = $alignio->next_aln;

my $yn = Bio::Tools::Run::Phylo::PAML::Yn00->new();
$yn->alignment($aln);    # hand the alignment to the wrapper before running
my ( $rc, $parser ) = $yn->run;
while ( my $result = $parser->next_result ) {
    my @otus     = $result->get_seqs();
    my $MLmatrix = $result->get_MLmatrix();

    # 0 and 1 correspond to the 1st and 2nd entry in the @otus array
    my $dN   = $MLmatrix->[0]->[1]->{dN};
    my $dS   = $MLmatrix->[0]->[1]->{dS};
    my $kaks = $MLmatrix->[0]->[1]->{omega};
    print "Ka = $dN Ks = $dS Ka/Ks = $kaks\n";
}


Andreas Prlic | 7 Oct 20:52 2014

The NIH Software Discovery Index | We invite your comments -- a system for linking software, publications and users in the research community.

Greetings Everyone,


On behalf of a number of software developers, end-users, and publishers associated with the scientific analysis community, we would like to invite all of you to review a document generated as a result of an NIH BD2K-supported meeting that focused on the opportunities and challenges of developing a software management ecosystem that could be valuable for finding and linking software, publications and users in the research community. You may also be aware of a related project, the Data Discovery Index, which will be fully integrated with the software system.

The product of this workshop and the subsequent discussion is a document which details the opportunities and challenges of developing a Software Discovery Index that would enable researchers to find, cite, and link software and analysis tools, publications, and researchers. To ensure that the opportunities, challenges, and recommendations detailed in the document reflect the breadth of experience from the community, we are seeking your input.  In conjunction with related efforts already under way at NIH, including the development of a Data Discovery Index, the final document will be used by the NIH Office of the Associate Director for Data Science (ADDS) to inform a strategy for the development of a Software Discovery Index and a commons ecosystem for data, software, and resources.


We need your help to ensure that this critical task is achieved: to guide the development of a community-based system that gives credit and acknowledgment to the builders and maintainers of the software we all depend on! We invite all users, software developers, publishers, and software repository administrators to review our report prior to its submission to the NIH. Please complete your review and post comments by November 1, 2014.


The link to the report is here:


On behalf of the organizing committee, thank you for your assistance!


Organizing Committee

Owen White
Director of Bioinformatics, University of Maryland, Baltimore, School of Medicine
Co-Chair of NIH BD2K Software Index Workshop

Asif Dhar
Principal & Chief Medical Informatics Officer
Co-Chair of NIH BD2K Software Index Workshop

Vivien Bonazzi
Senior Advisor for Data Science Technologies (ADDS)
Co-Chair of BD2K Software and Methods Group

Jennifer Couch
Chief, Structural Biology and Molecular Applications Branch
NCI Co-Chair of BD2K Software and Methods Group

Chris Wellington
Program Director (NHGRI)


Alexey Morozov | 30 Sep 08:20 2014

Getting pairwise alignment scores for existing multiple alignment

Dear colleagues,
Is there a method in BioPerl that will calculate pairwise alignment scores for any given pair of genes in an MSA (according to a given matrix and gap opening/extension costs)? It seems that the Bio::SimpleAlign methods only work with a score if one has been given in the MSA file, and can only hold a single overall score for the multiple sequence alignment.
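
If no ready-made method turns up, the score can be computed directly from two rows of the alignment. The following is a minimal sketch in plain Perl, not a BioPerl API: the function name pairwise_score, the hash-of-hashes matrix layout and the toy DNA matrix are all assumptions made for illustration. It also assumes a particular gap convention (the first gap position of a run costs the opening penalty, each further position costs the extension penalty) and ignores columns that are gaps in both rows.

```perl
use strict;
use warnings;

# Score two rows of the same alignment column by column.
# $matrix->{X}{Y} is the substitution score for residues X and Y.
sub pairwise_score {
    my ($row_a, $row_b, $matrix, $gap_open, $gap_ext) = @_;
    die "rows differ in length\n" unless length($row_a) == length($row_b);

    my $score  = 0;
    my $in_gap = '';    # row holding the current gap run: 'a', 'b' or ''
    for my $col (0 .. length($row_a) - 1) {
        my $x = uc substr($row_a, $col, 1);
        my $y = uc substr($row_b, $col, 1);
        my $gap_x = $x =~ /[-.]/;
        my $gap_y = $y =~ /[-.]/;

        next if $gap_x && $gap_y;    # gap in both rows: column does not count

        if ($gap_x || $gap_y) {
            my $side = $gap_x ? 'a' : 'b';
            # Extending a run in the same row is cheaper than opening a new one
            $score -= ($in_gap eq $side) ? $gap_ext : $gap_open;
            $in_gap = $side;
        }
        else {
            my $s = $matrix->{$x} && $matrix->{$x}{$y};
            die "no matrix entry for $x/$y\n" unless defined $s;
            $score += $s;
            $in_gap = '';
        }
    }
    return $score;
}

# Toy matrix for demonstration only: match = 1, mismatch = -1
my %dna;
for my $x (qw(A C G T)) {
    for my $y (qw(A C G T)) {
        $dna{$x}{$y} = $x eq $y ? 1 : -1;
    }
}

print pairwise_score('ACG--T', 'AC-TTT', \%dna, 5, 1), "\n";
```

The aligned strings for each row can be taken from a Bio::SimpleAlign object by looping over each_seq and calling seq on every member.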

Alexey Morozov,
LIN SB RAS, bioinformatics group.
Irkutsk, Russia.
Mark A. Jensen | 30 Sep 05:41 2014

Spankin' new (alpha) build system for Bioperl-Run

All (esp. George)-

My work on Issue #11 has metastasized.

The proximate problem was tests that fail because of once-local prerequisites. The ultimate problems are

- Why should I have to install every single wrapper when I only want X?
- Why should I care about any test that doesn't deal with X?
- Why doesn't X bring along its own prereq metadata (including Bio 
   rather than tag along with the distro and hope for the best?

(And I think these are the ultimate problems across BioPerl in terms of 

My solution was

- Add to the distro real, manually prepared metadata on prerequisites for all the tools
- Add an interactive selector that allows a user to pick their desired tools at perl Build.PL-time
- Have Module::Build check only (and ALL) the prereqs of the desired tools, and inform user of missing ones at perl Build.PL-time
- Make use of the persistence of the config information to skip/run .t 
files as
- Update ALL the tests to check whether to skip based on user selection
- Make M::B install only the relevant distro modules and documentation, not everything, at ./Build install-time

This is ready for brave alpha-testers at
Just do 'perl Build.PL'.

Pod below has some more details-- comments very welcome


     Bio::Tools::Run::Build - Instrument the build for features



     Bio::Tools::Run::Build is a subclass of Module::Build that allows an
     author to offer users the ability to select and install
     subsets of modules that are packaged in a single large M::B-based
     distribution.

     Grouping and selection of distro modules is driven by the optional
     features concept as defined in CPAN::Meta::Spec and used by

     The subclass provides the following:

     *   Author specification of features and their prereqs

         The build author develops metadata files in json that follow
         "optional_features" in CPAN::Meta::Spec to group distribution
         modules and dependencies as selectable features.

     *   Interactive user selection of features

         The user can be presented with an interactive selector during
         Build.PL runs.

     *   Prereq checking of user selected features only

         M::B only checks for the presence of selected feature prereqs.

     *   Build-persistent recording of user selections

         The build object records the selection of features in the
         $build->feature field. This can be used in test files to determine
         whether tests should be skipped (and not failed). See

     *   Installation only of selected feature modules

         Bio::Tools::Run::Build adds a build action, "deselect", which runs
         after the "code" and "docs" actions. "deselect" removes unselected
         modules from the blib/lib directory and unneeded documentation from
         the blib/libdoc directory. This keeps the "install" action from
         installing unwanted files.

     The BioPerl-Run distribution contains a large variety of wrappers and
     parsers that handle the execution and output of many different
     bioinformatics tools. It has been provided as a large distro that
     installs and attempts to test all of its modules. Many users need
     only a small fraction of the functionality BioPerl-Run provides, relevant
     to the tools they have installed. On the other hand, managing many
     different packages is unwieldy and uninviting for volunteer 

     The system described here is a compromise that enables a user to
     test and install only those modules that meet the need, yet reduces the
     maintenance effort to the management of a set of metadata files in a
     single distribution.
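
To make the "check only the prereqs of the selected features" idea concrete, here is a minimal sketch in plain Perl. It is not the code from Bio::Tools::Run::Build: the feature names, the function names prereqs_for and missing_modules, and the module No::Such::Module::ForDemo are hypothetical. Only the nesting of the metadata follows "optional_features" in CPAN::Meta::Spec.

```perl
use strict;
use warnings;

# Hypothetical metadata shaped like "optional_features" in CPAN::Meta::Spec:
# feature => { description, prereqs => { phase => { relationship => { module => version } } } }
my %optional_features = (
    clustalw => {
        description => 'Wrapper for clustalw (demo entry)',
        prereqs     => {
            runtime => { requires => { 'File::Spec' => 0, 'No::Such::Module::ForDemo' => 0 } },
        },
    },
    muscle => {
        description => 'Wrapper for muscle (demo entry)',
        prereqs     => { runtime => { requires => { 'List::Util' => 0 } } },
    },
);

# Collect the runtime requirements of the selected features only.
# Note: if two features name the same module, the later version simply wins here.
sub prereqs_for {
    my ($features, @selected) = @_;
    my %need;
    for my $name (@selected) {
        my $feature  = $features->{$name} or die "unknown feature '$name'\n";
        my $requires = $feature->{prereqs}{runtime}{requires} || {};
        @need{ keys %$requires } = values %$requires;
    }
    return \%need;
}

# Report which of those modules cannot be loaded on this system.
sub missing_modules {
    my ($need) = @_;
    my @missing = grep { !eval "require $_; 1" } sort keys %$need;
    return @missing;
}

my $need = prereqs_for(\%optional_features, 'clustalw');
print "missing: $_\n" for missing_modules($need);
```

A user who selects only 'muscle' would never be told about the prerequisites of 'clustalw', which is the behaviour described in the pod above.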

Adam Sjøgren | 29 Sep 17:17 2014

Invalid EMBL files generated in rare circumstances; line wrapping


If you craft a tag on a feature sneakily (or if you are unlucky)
Bio::SeqIO will create invalid EMBL, separating the "/" from the
qualifier name:

    ID   unknown; SV 1; linear; unassigned DNA; STD; UNC; 4 BP.
    AC   unknown;
    FH   Key             Location/Qualifiers
    FT   CDS             1..4
    FT                   /
    FT                   X"
    SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
         actg                                                                      4

In this example "/" and "note" are on separate lines, which is wrong; at
least BioPerl does not accept it itself.

Here is a script to create the above output (BioPerl 1.6.901 used):


    use strict;
    use warnings;

    use Bio::Seq::RichSeq;
    use Bio::SeqFeature::Generic;
    use IO::String;
    use Bio::SeqIO;

    my $seq=Bio::Seq::RichSeq->new(-display_id=>'TEST', -seq=>'actg');
    my $cds=Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', -start=>1, -end=>4);

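    $seq->add_SeqFeature($cds);

    # Write the record into an in-memory string and print it
    my $string;
    my $str=IO::String->new($string);
    my $io=Bio::SeqIO->new(-fh=>$str, -format=>'embl');
    $io->write_seq($seq);
    print $string;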

Changing the position of the space in the note makes a/the difference.

Maybe there is a bug lurking in the line wrapping/formatting code.

Does this sound like a bug to anyone else?

  Best regards,



                                                          Adam Sjøgren
                                                    adsj <at>

Daniel Lang | 22 Sep 20:13 2014

Parent/parent_id attribute


I'm using bioperl 1.6.923-1 (Ubuntu Trusty package) and
Bio::DB::SeqFeature to store and manipulate GFF3 files.

I'm wondering why the "Parent" GFF3 attributes are stored as parent_id
values in the feature objects, but not returned as such in the gff3_string?

Chr01   transdecoder    mRNA    5216    5627    .       +       .

Example debugger trace after fetching stored feature:

x $f
0  Bio::DB::SeqFeature=HASH(0x3e3a798)
   'attributes' => HASH(0x3e3a858)
      'Alias' => ARRAY(0x3e3a8b8)
         0  'T1.asmbl_1|m.6484'
         1  'T1.ORF'
      'load_id' => ARRAY(0x3e3aca8)
         0  'T1.Chr01.mRNA.1'
      'parent_id' => ARRAY(0x3e3acf0)
         0  'T1.Chr01.gene.1'
   'is_circular' => 0
   'name' => 'T1.Chr01.mRNA.1'
   'phase' => undef
   'primary_id' => 2428
   'ref' => 'Chr01'
   'score' => undef
   'source' => 'transdecoder'
   'start' => 5216
   'stop' => 5627
   'store' => Bio::DB::SeqFeature::Store::DBI::mysql=HASH(0x39b95d0)
      'class_loaded' => HASH(0x3e3a2b8)
         'Bio::DB::SeqFeature' => 1
      'dbh' => DBI::db=HASH(0x3dc1e40)
           empty hash
      'dumpdir' => '/tmp'
      'is_temp' => undef
      'namespace' => undef
      'seqfeatureclass' => 'Bio::DB::SeqFeature'
      'settings_cache' => HASH(0x3dc1d98)
         'autoindex' => 1
         'compress' => 0
         'index_subfeatures' => 1
         'serializer' => 'Storable'
      'writeable' => undef
   'strand' => 1
   'type' => 'mRNA'

x $f->gff3_string

What is the best practice to store parentage? I'm currently adding an
additional "Parent" value using add_tag_value.

Or is this a bug in the version I'm using?
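
The workaround mentioned above (copying parent_id into a "Parent" attribute with add_tag_value) could be wrapped up roughly as follows. This is only a sketch: add_parent_tag is a made-up helper name, and Demo::Feature is a hypothetical stand-in class so that the example runs without a database. It assumes the real feature object offers has_tag, get_tag_values and add_tag_value; writing the change back to the store is not shown.

```perl
use strict;
use warnings;

# Copy parent_id values into a GFF3 "Parent" attribute unless one is already set.
# Returns the number of values added.
sub add_parent_tag {
    my ($feature) = @_;
    return 0 unless $feature->has_tag('parent_id');
    return 0 if $feature->has_tag('Parent');
    my $added = 0;
    for my $parent ($feature->get_tag_values('parent_id')) {
        $feature->add_tag_value(Parent => $parent);
        $added++;
    }
    return $added;
}

# Hypothetical stand-in with the three tag methods used above.
package Demo::Feature;
sub new            { my ($class, %attr) = @_; return bless { attributes => {%attr} }, $class }
sub has_tag        { my ($self, $tag) = @_; return exists $self->{attributes}{$tag} }
sub get_tag_values { my ($self, $tag) = @_; return @{ $self->{attributes}{$tag} || [] } }
sub add_tag_value  { my ($self, $tag, @v) = @_; push @{ $self->{attributes}{$tag} }, @v; return }

package main;

my $f = Demo::Feature->new(parent_id => ['T1.Chr01.gene.1']);
my $added = add_parent_tag($f);
print "Parent=", join(',', $f->get_tag_values('Parent')), "\n";
```

Calling the helper a second time on the same feature adds nothing, so it should be safe to run over features that already carry a Parent attribute.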



Dr. Daniel Lang
University of Freiburg, Plant Biotechnology
Schaenzlestr. 1, D-79104 Freiburg
fax:        +49 761 203 6945
phone:      +49 761 203 6989
e-mail:     daniel.lang <at>

My software never has bugs.
It just develops random features.
Fields, Christopher J | 17 Sep 00:39 2014

Re: Whither Bio::FeatureIO?

It *might* be possible to set this up on Travis-CI independently for Bio::FeatureIO, which would be beneficial from a testing viewpoint (as we need to track what works w/ refactored FeatureIO vs what doesn’t).

I suppose what we need to check with a refactor (master branch) is:

1) Maintaining a sane amount of compat. with Chado.  ‘Sane’ meaning Bio::SF::Annotated will need to be chucked or completely reimplemented from scratch, as it is much less than sane now.
2) If needed, having a concurrently developed version of Chado to make it work.

It may not require much on #2 if Chado isn’t reliant on some of the less API-friendly parts of Bio::SF::Annotated (namely the heavy annotation associated with it).  


On Sep 16, 2014, at 4:53 PM, George Hartzell wrote:

@scott, do you have a test setup for the GMOD stuff?


On Tue, Sep 16, 2014 at 1:41 PM, Fields, Christopher J <cjfields <at>> wrote:
Cool!  I guess I could probably announce this as being released at some point now :)


PS - I may have a decent test environment set up for longer-term evaluation, but it would be nice to see if we can get something working with travis-ci or a smoker setup, just so I can check whether the main branch refactoring is clobbering chado (as I suspect it is).  

On Sep 16, 2014, at 1:50 PM, George Hartzell wrote:

Hi All,

It took a while, but I was finally able to run my little litmus test and the good news is that it appears to pass.

I modified my ansible playbook that implements the steps described in INSTALL.Chado so that it uses the version of Bio::FeatureIO that is now on CPAN instead of pulling the github master.

The resulting installation ran to completion and then was able to load the yeast gff3 file:

cp /vagrant/saccharomyces_cerevisiae.gff . --gfffile saccharomyces_cerevisiae.gff --outfile saccharomyces_cerevisiae.sorted.gff --organism yeast --gfffile saccharomyces_cerevisiae.gff.sorted

and the resulting database seems to be stitched together reasonably (though I’m not a particularly informed judge of its character).

@chris thanks for the help on this!!!!


On Sat, Aug 30, 2014 at 9:24 PM, George Hartzell <hartzell <at>> wrote:
Fields, Christopher J writes:
 > Just a quick update on this: I released a separate Bio::FeatureIO
 > release to CPAN that represents the code split out from the core
 > modules:
 > I had to do some cleanup to get code to work and tests passing with
 > some sanity.  A *lot* of things were not passing tests when we
 > moved this over.
 > This should represent what was last working with Chado though.
 > However, I haven’t officially announced anything yet b/c I would
 > like to shake bugs out of it. Can either of you try this out on a
 > Chado run to make sure everything is up to snuff (or at least point
 > out issues)?  Time depending, I would like to get something running
 > on (for instance) Travis-CI, maybe including some optional
 > Chado-related stuff.  This would also help so that we can work on
 > merging what has been done on master so that these pass the same
 > tests.

I can't do anything until Tuesday, but will be happy to run it through
the standard Chado build process when I get back to work.

Thanks for digging into it.

