Sean O'Keeffe | 16 May 20:05

Re: fastq splitter - working but not before xmas!!

So now I've got a bunch of fastq files, all about 17GB in size. The script is
puttering away, but it is tediously slow.
I tried the fastq-dump tool from the SRA Toolkit, but it didn't like my
command (fastq-dump --split-files <input_fastq_file>) - my ignorance, no
doubt.

Any ideas out there on speeding up Bio::SeqIO::fastq output?
Thanks.

On 1 March 2012 03:16, Joel Martin <j_martin <at> lbl.gov> wrote:

> Just a caution to double check that the read1 and read2 names match after
> splitting.  I don't know if this thread jinxed me or what, but I just for
> the first time received a concatenated fastq file formatted as you
> describe, except the first read1 doesn't match the first read2.  zut alors!
>
> came up with converting to scarf, /usr/bin/sort the scarf, then read that
> with tossing into single or paired files and reconverting to fastq in the
> process.  it wasn't too bad, but I don't think bioperl has a scarf
> conversion, it's basically fastq with : substituted for \n.  most
> delimiters that aren't : would work better but i already had a fastq2scarf
> from early solexa days ( i think ).
>
> # this was the last step, if it's handy for this plague of hideous files,
> # the fixed fields for : would need adjusting
> use strict;
>
> open( my $oph, '>', 'paired.fq' ) or die $!;
> open( my $osh, '>', 'single.fq' ) or die $!;
>

Horácio Montenegro | 21 May 19:28

BioPerl 1.6.901 and prot4est

    Dear all,

   I am trying to set up prot4est and ran into problems with the
bioperl from the Debian testing repositories (1.6.901). It breaks one of
the scripts (constructSMAT.pl) from prot4est:

~/bin/p4e3.1b/exampleData$ ../bin/constructSMAT.pl --access
p4e_access.txt --config ALC_smat.config
CLEAN => 1
SPECIES => Ascaris lumbricoides
FSA_FILE => ~/bin/p4e3.1b/exampleData/./A.lumbricoides_sim.fsa
EMBL_SEARCH => 1
This dataset is from Ascaris lumbricoides (6252)
Can't call method "ancestor" on an undefined value at
/usr/local/share/perl/5.12.4/Bio/Taxon.pm line 513, <GEN0> line 1.

    The culprit is sub ancestor in Taxon.pm. Debugging a bit, I found
that a call to write_seq($emblO) in sub fsa2embl (from emblConnect.pl,
another prot4est script) triggers the bug. Anyway, the workaround so far
is to use bioperl 1.6.1. In fact, if I use bioperl 1.6.901 but
manually replace sub ancestor with the one from bioperl 1.6.1,
prot4est runs normally.

   I do not know if this is a new bioperl bug, or if the changes in
sub ancestor revealed some bug in emblConnect.pl.

    best regards,
          Horacio
yang liu | 19 May 16:34

modify sequence names

Dear colleagues,

Would anyone please help me to modify sequence names with bioperl? I am
editing them manually now; is there an easier way?
I have a bunch of sequences in the format:

>lcl|NC_017840.1_cdsid_YP_006280919.1 [gene=cox1] [protein=cytochrome c
oxidase subunit 1] [protein_id=YP_006280919.1] [location=1..1584]
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT

>lcl|NC_017840.1_cdsid_YP_006280920.1 [gene=ccmFn] [protein=cytochrome c
biogenesis FN] [protein_id=YP_006280920.1] [location=2225..3940]
ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC

I hope to keep only the gene name, which means the word after "gene=",
like:
>cox1
ATGACAAATCCGGTCCGATGGCTGTTCTCCACTAACCACAAGGATATAGGTACTCTATATTTCATCTTCG
GTGCCATTGCTGGAGTGATGGGCACATGCTTCTCAGTACTGATTCGTATGGAATTAGCACGACCCGGCGA
TCAAATTCTTGGTGGGAATCATCAACTTTATAATGTTTTAATAACGGCTCACGCTTTTTTAATGATCTTT

>ccmFn
ATGTCAATAAATGCATTTTCTCATTATTCGTTCTTTCCGGGTCTTTTCGTTGCATTCACTTACAACAAGA
AAGAACCACCAGCGTTTGGTGCAGCCCCTGCATTTTGGTGCATTCTTCTTTCTTTCCTTGGTCTTTCGTT
CCGTCATATTCCTAATAACTTATCCAATTACAGCGTATTAACCGCTAATGCACCTTTCTTTTATCAAATC
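
One possible approach is a plain-Perl filter, sketched below. It assumes that
each FASTA header sits on a single line in the real file (the headers shown
above look wrapped by the mail client) and that the gene name is always given
as a [gene=...] tag. The sub names gene_name_from_header and rename_line are
made up for this illustration and are not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the value of the [gene=...] tag in a FASTA header line,
# or undef if the header carries no such tag.
sub gene_name_from_header {
    my ($header) = @_;
    return $header =~ /\[gene=([^\]]+)\]/ ? $1 : undef;
}

# Rewrite one line: a header with a gene tag becomes ">gene"; sequence
# lines and headers without a tag pass through unchanged.
sub rename_line {
    my ($line) = @_;
    if ( $line =~ /^>/ ) {
        my $gene = gene_name_from_header($line);
        return ">$gene\n" if defined $gene;
    }
    return $line;
}

# Process every FASTA file named on the command line and print the
# renamed records to standard output.
for my $file (@ARGV) {
    open( my $in, '<', $file ) or die "Cannot open $file: $!";
    print rename_line($_) while <$in>;
    close $in;
}
```

Saved under a name of your choice, it would be run as
"perl rename_genes.pl in.fasta > out.fasta". With Bio::SeqIO the same regular
expression could presumably be applied to $seq->desc before setting
$seq->display_id, but for a pure renaming job the line filter is likely
enough.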


Jim Hu | 18 May 07:07

Bio::Seq->subseq documentation

In the page for Bio::Seq, 

	http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Seq.html#POD5

I think the usage should match the documentation for Bio::PrimarySeq

	http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/PrimarySeq.html#POD4

indicating that the arguments can be integers OR location objects.  Is that correct?

Jim
=====================================
Jim Hu
Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
Carnë Draug | 13 May 15:58

Writing bp_grep

Hi everyone

I'm starting to write a grep tool for sequences (bp_grep). The idea is
to have something just like grep but for DNA and protein sequences
with most of the options that make sense in this context (print the
filename or sequence name only, position, without match search, count,
etc). I was wondering if anyone has any piece of code that could fit
in it or started something similar but just never finished.

Thanks,
Carnë
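
As a possible starting point, here is a minimal plain-Perl sketch of the core
search: it reads FASTA records, joins wrapped sequence lines, and reports the
name and 1-based position of the first match in each matching record. It is
not existing BioPerl code; the sub name grep_fasta, the case-insensitive
matching and the return format are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Scan FASTA text from a filehandle and return one [name, position] pair
# for every record whose sequence matches $pattern (case-insensitive).
# The position is the 1-based start of the first match in that record.
sub grep_fasta {
    my ( $fh, $pattern ) = @_;
    my ( @hits, $name, $seq );

    # Test the record collected so far and remember it if it matches.
    my $check = sub {
        return unless defined $name;
        if ( $seq =~ /$pattern/i ) {
            push @hits, [ $name, $-[0] + 1 ];
        }
    };

    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;
        if ( my ($id) = $line =~ /^>(\S+)/ ) {
            $check->();                    # finish the previous record
            ( $name, $seq ) = ( $id, '' );
        }
        elsif ( defined $name ) {
            $seq .= $line;                 # join wrapped sequence lines
        }
    }
    $check->();                            # finish the last record
    return @hits;
}
```

Options such as a count, names only, or an inverted match could then be
layered on top of the returned list.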
Jason Stajich | 11 May 21:07

small project

HMMER folks have contributed a module to BioJava to simplify submission of a protein sequence to the HMMER
RESTful API
http://xfam.wordpress.com/2012/05/09/pdb-pfam-mapping/
http://hmmer.janelia.org/help/api

Perhaps a similar BioPerl module could wrap up the existing code to submit
Bio::Seq objects, similar to other Bio::DB:: modules, or probably better as
Bio::Tools interfaces for interaction with remote web services. The code to
add this to bioperl is basically already written - the question would be
whether you wanted to populate a Bio::Search object with the XML results
(which means writing a parser for the XML).

http://hmmer.janelia.org/help/api#sending

Jason

Jason Stajich
jason.stajich <at> gmail.com
jason <at> bioperl.org
Peter Cock | 9 May 19:44

BioPerl BuildBot

Hi all,

I've retitled this and sent it to the BioPerl list, continuing from
this thread on the BioRuby list:

http://lists.open-bio.org/pipermail/bioruby/2012-May/002247.html

On Wed, May 9, 2012 at 6:35 PM, Pjotr Prins <pjotr.public14 <at> thebird.nl> wrote:
> On Wed, May 09, 2012 at 05:29:49PM +0000, Fields, Christopher J wrote:
>> *sigh*
>>
>> Anyone know of a way I can clone myself a few times, so one of my clones can get bioperl set up on buildbot? :P
>
> Peter knows someone in Scotland who can help! Now I got to see a man
> about a sheep...
>
> Pj.

You mean Dolly The Sheep? ;)

Tiago or I can assist on the BuildBot server side for BioPerl - in fact Tiago
had already made a start (CC'd).

We'll need help from a BioPerl developer with a spare machine or two
to use as a buildslave (and I can probably borrow some of my employer's,
which are already running nightly tests) to help with how we set up the
BuildSlaves - essentially how to get BioPerl and relevant dependencies
installed, and then what needs to be done from a fresh git checkout to build
and run the tests. Tiago has got this currently:

Tristan Lefebure | 9 May 17:23

Codon bootstrapping

Hi there,

Just submitted the following patch to do codon bootstrapping:

https://redmine.open-bio.org/issues/3350

I'd appreciate your comments on this tiny proposed addition to
Bio::Align::Utilities

Thanks!

--
Tristan Lefebure
Hermann Norpois | 8 May 22:30

bioperl 1.6. and Perl API

Hello,

I installed bioperl 1.6.901-1 on Ubuntu. Is it compatible with the Ensembl
Perl API? Ensembl seems to prefer an older version:
http://www.ensembl.org/info/docs/api/api_installation.html. I downloaded
the four API packages and put them in /usr/share/perl5 (the location of the
already installed BioPerl modules).
When I run my test script I get:

Can't locate Bio/EnsEMBL/Registry.pm in @INC (@INC contains: /etc/perl
/usr/local/lib/perl/5.12.4 /usr/local/share/perl/5.12.4 /usr/lib/perl5
/usr/share/perl5 /usr/lib/perl/5.12 /usr/share/perl/5.12
/usr/local/lib/site_perl .)

But @INC contains:
/etc/perl
/usr/local/lib/perl/5.12.4
/usr/local/share/perl/5.12.4
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.12
/usr/share/perl/5.12
/usr/local/lib/site_perl

So in principle the module should be found.
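
One thing that may be worth checking: Perl looks for Bio/EnsEMBL/Registry.pm
directly beneath each @INC entry. If the API packages were unpacked into their
own directories (for example /usr/share/perl5/ensembl/modules/Bio/EnsEMBL/),
then having /usr/share/perl5 in @INC would not be enough. The sketch below
reports which directory, if any, provides the module. The directory layout is
an assumption, not something stated in the post, and find_module is a
hypothetical helper written for this example.

```perl
use strict;
use warnings;

# Assumed layout (not confirmed by the post): each API package unpacked
# into its own directory under /usr/share/perl5, with code in modules/.
my $base = '/usr/share/perl5';
my @ensembl_dirs = map { "$base/$_/modules" }
    qw(ensembl ensembl-compara ensembl-variation ensembl-funcgen);

# Return the full path of the first file in @dirs that provides $module,
# or undef (an empty list in list context) if no directory contains it.
sub find_module {
    my ( $module, @dirs ) = @_;
    ( my $rel = "$module.pm" ) =~ s{::}{/}g;
    for my $dir (@dirs) {
        return "$dir/$rel" if -e "$dir/$rel";
    }
    return;
}

# Compare the search path as it is with the path extended by the
# assumed modules/ directories.
my $module = 'Bio::EnsEMBL::Registry';
for my $set ( [ '@INC as is', @INC ],
              [ 'with modules/ dirs', @INC, @ensembl_dirs ] ) {
    my ( $label, @dirs ) = @$set;
    my $path = find_module( $module, @dirs );
    printf "%-20s %s\n", $label, defined $path ? $path : 'not found';
}
```

If only the second line reports a path, adding "use lib" lines for those
directories to the test script, or listing them in PERL5LIB, should let Perl
locate the module.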

Thanks
Hermann Norpois
Thomas, Dallas | 8 May 22:04

hmmer3 to hmmer2

Hello,

I was wondering if you could use the updated Bio::SearchIO::hmmer to
take the output file of hmmer3 as input and write its equivalent in hmmer2
format.

Thanks

Dallas
Hermann Norpois | 8 May 18:16

some contigs do not work for sequence retrieval

Hello,

To get a sequence 5 prime upstream of the TTS, I wrote a script that works
for some geneids but not for all. I always get a contig and coordinates, but
I have no idea why I do not get a sequence (I only get fasta
headers). The sequence ID should not matter if a contig is detected.
Does anybody have an idea?

Thanks
Hermann Norpois

#!/bin/perl -w
use strict;
use Bio::DB::EntrezGene;
use Bio::SeqIO;
use Bio::DB::GenBank;

# Works with geneid 18619 (Penk1) but not with 54161 (copg) or 12064 (bdnf)
my $id = "12064";

# Output stream for the retrieved sequence
my $seqio_obj = Bio::SeqIO->new(-file => ">bdnf.fasta", -format => 'fasta');

my $db = Bio::DB::EntrezGene->new;

# Fetch the Entrez Gene record for this gene ID
my $seq = $db->get_Seq_by_id($id);

my $ac = $seq->annotation;

# Walk the database cross-references attached to the gene record
for my $ann ($ac->get_Annotations('dblink')) {
(Continue reading)

