Fields, Christopher J | 23 May 2013 15:56
Favicon
Gravatar

Re: Speed issues with making IUPAC consensus from alignment

(keep the list cc'd)

On May 22, 2013, at 6:31 PM, Senanu <senanu.junk <at> gmail.com> wrote:

> On May 22, 2013, at 4:17 PM, Fields, Christopher J wrote:
> 
>> Hi all,
>>> 
>>> I am wondering if the consensus_iupac method of Bio::Align is known to be extremely slow, or if I'm doing
something wrong. 
>> 
>> Probably the former, but...
>> 
>>> I have bacterial whole-genome alignments (~7 Mbases) that I made in progressiveMauve and wish to get an
IUPAC consensus. (I know that progressiveMauve uses a non-standard XMFA format, but Bio::AlignIO seems
to read them just fine.) The code below takes more than all night to make a consensus. It works fine on tiny
test alignments. 
>> 
>> It shouldn't take that long, 7 Mb isn't that large.  Or is that 7 Mb for one genome?
> 
> It is 7Mb per genome, but there are only 2 genomes in the alignment, and the sequences are very similar to one
another. 
> 
>> 
>>> Is this a known problem? Is there another way to generate such a consensus?
>> 
>> The code isn't really optimized for this, but again this isn't terribly large.  Is the bottleneck reading
the alignment in, or is it the consensus_iupac() step?  Hard to say w/o seeing the alignment data itself.
> 
> The bottleneck is definitely with the consensus_iupac step. Reading the alignment in takes a few
(Continue reading)

Senanu | 22 May 2013 22:15
Picon

Speed issues with making IUPAC consensus from alignment

Hi all,

I am wondering if the consensus_iupac method of Bio::Align is known to be extremely slow, or if I'm doing
something wrong. 

I have bacterial whole-genome alignments (~7 Mbases) that I made in progressiveMauve and wish to get an
IUPAC consensus. (I know that progressiveMauve uses a non-standard XMFA format, but Bio::AlignIO seems
to read them just fine.) The code below takes more than all night to make a consensus. It works fine on tiny
test alignments. 

Is this a known problem? Is there another way to generate such a consensus?

my $in = Bio::AlignIO->new(-file => $files[0],
                           -format => 'XMFA');
while  (my $aln = $in->next_aln()) {
    foreach  my $seq ($aln->each_seq) {
        $seq->alphabet('dna');
    }
    my $con = $aln->consensus_iupac();
}

Thanks in advance.
Ngwenyama
Bernard Cohen | 20 May 2013 20:49

Phylip format error

Hello!

I happen to have checked to see what the PERL webpage says about Phylip format for DNA alignment files and see
that it is erroneous. 

I am not a PERL user and do not want to be bothered to register or otherwise learn how to make an official
comment, so forward this for someone to pick up.

Phylip format allows up to 10 spaces for taxon names; the data must start in the 11th space. This can be
checked on Jo Felsenstein's site.

The PERL page accessed by searching for "Phylip format PERL" allows only 8 spaces for the name. 

B. L. Cohen
shalabh sharma | 17 May 2013 16:54
Picon

Downloading annotated Gene nucleotide fasta files

HI,
                 First of all i am really sorry for sending this mail here,
i am not sure if this is the right forum. I know lot of people work on
similar stuff.
I wrote to NCBI but nobody replied.

Actually i am looking for all bacterial/microbial gene annotation
nucleotide fasta files.
Does anyone knows where to download these kind of files.
I tried *ffn files but they are not annotated.
Or is there any module in bioperl that i can use ?
I would really appreciate your help.

Thanks
Shalabh

--

-- 
Shalabh Sharma
Scientific Computing Professional Associate (Bioinformatics Specialist)
Department of Marine Sciences
University of Georgia
Athens, GA 30602-3636
Miquel Ràmia | 16 May 2013 12:42
Favicon

Problem compiling Bio::DB::Sam

Hi all,

I get this message when compiling Bio::DB::Sam:

Building Bio-SamTools

gcc -g -Wall -O2 -fPIC -o bam2bedgraph bam2bedgraph.o 
-L/var/lib/gbrowse2/databases/samtools/samtools-0.1.19 -lbam -lm -lz

/var/lib/gbrowse2/databases/samtools/samtools-0.1.19/libbam.a(bgzf.o): In function `mt_destroy':

/var/lib/gbrowse2/databases/samtools/samtools-0.1.19/bgzf.c:458: undefined reference to `pthread_join'

/var/lib/gbrowse2/databases/samtools/samtools-0.1.19/libbam.a(bgzf.o): In function `bgzf_mt':

/var/lib/gbrowse2/databases/samtools/samtools-0.1.19/bgzf.c:445: undefined reference to `pthread_create'

collect2: ld returned 1 exit status

make: *** [bam2bedgraph] Error 1

Is this error related to the module or some dependencies? or maybe a 
problem with my system?

Any help appreciated, thanks!

--

-- 
Miquel Ràmia Jesús
PhD. candidate (PIF)
Evolutionary Bioinformatics Group
(Continue reading)

Carnë Draug | 16 May 2013 03:53
Picon
Gravatar

sets of sequences - how to read?

Hi

when accessing entrez gene using eutils to get multiple genes, NCBI
now returns an Entrezgene-Set[1] rather than a list of EntrezGene.
This change must have happened sometime on the last 2 months. Compare:

use Bio::DB::EUtilities;

my %sets = (
  eutil   => 'efetch',
  db      => 'gene',
  retmode => 'text',
  rettype => 'asn1',
  email   => 'bioperl-l <at> lists.open-bio.org',
);

## this mimics the previous behaviour of the NCBI server but the
multiple requests will annoy their servers
my  <at> ids = (3014, 85235);
my $response;
foreach ( <at> ids) {
  my $fetcher = Bio::DB::EUtilities->new(%sets, id => $_);
  $response .= $fetcher->get_Response->content;
}
print $fetcher->get_Response->content;

## this used to be the right way to do it, but now returns an Entrezgene-Set
my $fetcher = Bio::DB::EUtilities->new(%sets, id => \ <at> ids);
$response .= $fetcher->get_Response->content;
print $fetcher->get_Response->content;
(Continue reading)

Wenlan Tian | 29 Apr 2013 20:17
Picon

blastxml

Hi,all,
I had done a blastall search and got a xml output file. How could i parse 
it to other format? I've found many scripts online, but no one worked. I am 
new to linux. Can anyone give me a good script to use?

Thanks,
Vivi
Hilmar Lapp | 15 May 2013 22:44
Gravatar

Workshop on Sustainable Software for Science: Practice and Experiences

FYI, if you haven't seen this yet:

http://wssspe.researchcomputing.org.uk/

It seems to me that the Bio* projects, perhaps led by BioPerl as the oldest and thus longest running
(nowadays more fancily called "sustained") of them would have a lot to say about the subject. Anyone
interested in a joint submission?

Also, I notice that Biojava's Andreas is on the organizing committee, so maybe he's been conspiring on
something already :-)

	-hilmar
--

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================

_______________________________________________
Bioperl-l mailing list
Bioperl-l <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
WoA | 8 May 2013 21:24
Picon

Extracting matching subsequence from pairwise alignment

Hello All,

I've a pairwise global alignemnet of two DNA sequences generated by the
program NEEDLE of EMBOSS package. I wish to extract the sub-sequence that
matches/aligns to a given region of the other sequence. In  this alignment
<http://pastebin.com/wrhEDzJU>   (Pastebin Link) the given region (actually
the CDS) falls between base number 24:485 in the original sequence with ID
'XM_001005073.' I wish to extract the sub-sequence in the sequence ID
'Homolog' that aligns with that 24:485 region of the other sequence.

I'm using Bioperl to parse the alignment. I find out the the alignment
column numbers corresponding to 24:485 region in the particular sequence,
using 'column_from_residue_number'. Then I extract the sub-sequence from the
'aligned' other sequence(containing gaps) using the corresponding column
numbers. Finally I remove the gap characters.

Am I doing this thing correctly and are there any pitfalls ? Is there any
better way to do it by (Bio)Perl/Python? The code goes here:
use strict;
use warnings;
use Bio::AlignIO;

# read in an alignment generated by the EMBOSS program Needle
my $in = new Bio::AlignIO(-format => 'emboss',
                  -file   => 'test_needle.aln');

while( my $aln = $in->next_aln ) {
          #Seqnames: 'XM_001005073.'(CDS:24-485),'Homolog'
          my ($cds_start,$cds_end)=(24,485);#
          my $col_cdsstart = $aln->column_from_residue_number(
(Continue reading)

Veronica A. | 8 May 2013 02:32
Picon
Favicon

Bio::SeqIO doesn't write all gbk sequences from Bio::DB::GenBank

Hello,
I'm currently running Bio::Perl 1.6.1 on Ubuntu 12.04.2 LTS and have noticed a problem with Bio::SeqIO
when writing genbank files using the write_seq() function;  some of the files do not include an 'ORIGIN'
tag or the sequence.

I am using GI#s (50 at a time every 2 minutes) to retrieve genbank files via Bio::DB::GenBank.

----------------------------------------START CODE----------------------------------
my $gb = Bio::DB::GenBank->new(-verbose=>-1);my $seqout = Bio::SeqIO->new(-file=>">$fileName",
'-format'=>'Genbank', -alphabet=>'dna', -flush=>0, -verbose=>-1);while( <at> ids){     my  <at> batchArray =
splice( <at> ids, 0, 50);     my $batchArrayRef = \ <at> batchArray;
     my $streamObj;     my $pid = fork();     if($pid == 0){          eval{               $streamObj =
$gb->get_Stream_by_id($batchArrayRef);          };          if($ <at> ){               print "Error: ".$ <at> ."\n";          }          else{               while(my
$seqObj = $streamObj->next_seq()){                    unless($seqObj->accession_number() =~ /N[A-Z]\_/){                        #print
"ID: ".$seqObj->id()."\n";                        #print "Seq:\n".$seqObj->seq()."\n";                         $seqout->write_seq($seqObj);                    }               
}          }          exit 0;     }}waitpid($pid,0);sleep(120);
----------------------------------------END CODE----------------------------------
Most of the Genbank files written to the output file have sequences, but there is a small portion that do not,
even though they should.  For example, JX287367, in NCBI includes an 'ORIGIN' tag and sequence and when I
use the print function before writing to file, the sequence is printed to STDOUT, but the 'ORIGIN' tag and
sequence are not written to the output gbk file.  The following is found in the final output file:
----------------------------------START GBK-----------------------------------------
LOCUS       JX287367                 588 bp    DNA     linear   BCT 19-DEC-2012DEFINITION  Chlamydia trachomatis strain UW-5/CX
pyruvoyl-dependent arginine            decarboxylase (aaxB) gene, complete cds.ACCESSION   JX287367VERSION    
JX287367.1  GI:404351720KEYWORDS    .SOURCE      Chlamydia trachomatis  ORGANISM  Chlamydia trachomatis           
Bacteria; Chlamydiae; Chlamydiales; Chlamydiaceae;            Chlamydia/Chlamydophila group;
Chlamydia.REFERENCE   1  (bases 1 to 588)  AUTHORS   Bliven,K.A., Fisher,D.J. and Maurelli,A.T.  TITLE    
Characterization of the activity and expression of arginine            decarboxylase in human and animal
Chlamydia pathogens  JOURNAL   FEMS Microbiol. Lett. 337 (2), 140-146 (2012)   PUBMED   23043454REFERENCE   2 
(bases 1 to 588)  AUTHORS   Bl
(Continue reading)

Chris Maloney | 5 May 2013 06:03
Picon
Gravatar

Wiki work, Template:Doclink

The module pages on the wiki could look a little better, like this one
for example:  http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast.
 There used to be a bunch of extra whitespace at the top of the page,
which was caused by extra line breaks in Template:Doclink, which I
just removed.

But, I think there are other improvements that could be made.  I would
like to turn this into an infobox -- which are the helpful informative
tables of info on Wikipedia that appear on many articles on the upper
right.  That would allow us to add more links -- like to metacpan, for
example.

It is not completely trivial to import infoboxes into a wiki though, I
just discovered.  I just went through the exercise on my home wiki,
and it involves importing a lot of templates from Wikipedia, and
fixing up the common.css.  You can see the full list of imported
templates here:
http://chrismaloney.org/wiki/index.php?title=Special:RecentChanges&limit=100.
 I don't *think* this should cause any problems, but I'm not 100%
sure.  On the other hand, if it does, it should be easy to roll back
-- it's a wiki, after all.

Does anybody have a problem if I do this?  I'll wait a day for
responses, and tackle this tomorrow, if no one objects.

-- Chris M.

Gmane