Bruno Vecchi | 1 Aug 2008 05:15

Bio::Biblio doesn't find articles

Hi everyone,
I am trying to retrieve bibliographic data using Bio::Biblio, but so far 
I haven't had any luck.

The following code prints zero results whatever the keyword ("atom" in 
this example) you choose. Could anyone please point me to my mistake? 
There are no errors in the output, just no articles found.

# Beginning of script
#!/usr/bin/perl -w
use strict;
use Bio::Biblio;

# No arguments: the constructor falls back to its default access method
my $bib_obj = Bio::Biblio->new();

my $biblio_results = $bib_obj->find("atom");
print $biblio_results->get_count, "\n";
# End of script

Calling the Bio::Biblio constructor method without parameters sets them 
to default values, which are:

access: soap
location: 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS'

I also tried using the example script at

http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/biblio/biblio.PLS 
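with the same result. For instance, trying:

biblio.PLS -find Java -find perl

Gave the following output:

Looking for 'Java'...
        Found 0
Looking for 'perl'...
        Found 0

Maybe the URL of the service is out of date?

Thanks a lot in advance!

Bruno.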

Bruno Vecchi | 1 Aug 2008 06:16

Bio::Biblio doesn't find articles [SOLVED]


Sorry to abuse this mailing list!
My problem is solved: I changed the '-access' flag to 'eutils' and it worked just fine. Apparently the EBI services are temporarily down.

Greetings,

Bruno.

-------- Original Message --------
Subject: Bio::Biblio doesn't find articles
Date: Fri, 01 Aug 2008 00:15:19 -0300
From: Bruno Vecchi <brunovecchi <at> yahoo.com.ar>
To: bioperl-l <at> lists.open-bio.org

Hi everyone,
I am trying to retrieve bibliographic data using Bio::Biblio, but so far I haven't had any luck.

The following code prints zero results whatever the keyword ("atom" in this example) you choose. Could anyone please point me to my mistake? There are no errors in the output, just no articles found.

# Beginning of script
#!/usr/bin/perl -w
use strict;
use Bio::Biblio;

my $bib_obj = Bio::Biblio->new();

my $biblio_results = $bib_obj->find("atom");
print $biblio_results->get_count;
# End of script

Calling the Bio::Biblio constructor method without parameters sets them to default values, which are:

access: soap
location: 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS'

I also tried using the example script at

http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/biblio/biblio.PLS
with the same result. For instance, trying:

biblio.PLS -find Java -find perl

Gave the following output:

Looking for 'Java'...
        Found 0
Looking for 'perl'...
        Found 0

Maybe the URL of the service is out of date?

Thanks a lot in advance!

Bruno.
_______________________________________________
Bioperl-l mailing list
Bioperl-l <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Clancy, Kevin | 2 Aug 2008 00:30

Reference to a staden module under Bio::SeqIO.pm

Hi Folks
I am using the windows version of Bioperl 1.5.2_100. I recently was
compiling a tool using ActiveState's PerlApp which included Bioperl
modules. I received an error for the Bio::SeqIO module, which was
calling for the Bio::SeqIO::staden::read method(?) on line 312 - 314 of
the Bio::SeqIO.pm module. I don't appear to have a copy of the staden
module under the Bio::SeqIO directory and it doesn't appear to be
present in the current BioPerl trunk. I simply commented this out of my
SeqIO.pm file to perform my build and it's all running normally. Was this
simply a reference to a nonexistent module, or am I missing something?
Thank you for your help.
kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy <at> invitrogen.com 
Jason Stajich | 2 Aug 2008 14:58

Re: Inframe stop codon

[regarding PAML analyses]

You would need to translate the cDNA sequence and identify where the  
stop codon is, then remove that codon or remove that sequence from  
your bulk analyses.  It depends on why you think the stop codon is in  
the sequence - mis-annotation, a pseudogene, or what?  If  
this is a small percentage of a lot of sequences I would probably  
just skip these.  If this is the terminal stop codon that is being  
included in the sequences, you just need to remove the last codon  
from the sequences before providing them to PAML.  The Seq HOWTO has  
many examples of how to manipulate a sequence object with substr,  
trunc, as well as just the simple seq() method that gives you the  
sequence as a string, which you can manipulate, then update the  
sequence object afterwards.  As in:
my $str = $seq->seq;
# remove the last codon from this cDNA sequence
substr($str, -3, 3,'');
$seq->seq($str);

Alternatively you can use trunc to truncate the sequence
my $trunc = $seq->trunc(1,$seq->length -3);
$seq = $trunc;

You can translate the sequence with the $seq->translate method, then  
test for the presence of a stop codon (this is exactly the code that  
runs in the pairwise_kaks script in the scripts/utilities/  
directory).  If you have a stop codon you need to figure out whether it  
is at the end of the sequence or not.  If it is the terminal codon,  
you can just lop off the last codon on all your sequences, but if it  
is internal, you need to decide what you want to do with this sequence.
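
The terminal-versus-internal decision above can be sketched in plain Perl, without BioPerl. This is only an illustration under stated assumptions: the standard genetic code (TAA, TAG, TGA as stops), sequences already in frame, and hypothetical helper names (stop_codon_positions, classify_cds) that are not part of BioPerl or of the pairwise_kaks script.

```perl
use strict;
use warnings;

# Assumption: standard genetic code, DNA alphabet.
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Return the 0-based codon indices of every in-frame stop codon in $cds.
sub stop_codon_positions {
    my ($cds) = @_;
    my @codons = uc($cds) =~ /(.{3})/g;    # split into complete codons
    return grep { $is_stop{ $codons[$_] } } 0 .. $#codons;
}

# Classify a coding sequence as 'clean' (no stops), 'terminal' (the only
# stop is the last codon) or 'internal' (anything else).
sub classify_cds {
    my ($cds) = @_;
    my @stops = stop_codon_positions($cds);
    return 'clean' unless @stops;
    my $last = int(length($cds) / 3) - 1;
    return (@stops == 1 && $stops[0] == $last) ? 'terminal' : 'internal';
}

for my $cds (qw(ATGGCCTAA ATGTAAGCC ATGGCCGCC)) {
    my $class = classify_cds($cds);
    # Only a terminal stop is trimmed automatically; internal stops are
    # left untouched so they can be inspected by hand.
    my $trimmed = $class eq 'terminal' ? substr($cds, 0, -3) : $cds;
    print "$cds\t$class\t$trimmed\n";
}
```

Sequences reported as 'internal' are the ones that need a decision (skip them, or treat them as possible pseudogenes) before anything is handed to PAML.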

If there are multiple stop codons, I am not sure it is appropriate to  
run PAML here, unless you are interested in some sort of pseudo-rate  
calculation that has many of the codons omitted.  Otherwise you may  
just want to calculate a DNA substitution rate for the sequences to  
make comparison.

I suggest working a single file by hand to get the appropriate steps  
down and then coding it up will be easier.

I am sure folks on the list can help too so it is important to post  
to the mailing list - I don't see any messages from you on the list  
about this query.

-jason
On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

>
> Hi Jason,
>
> Please suggest me how to filter the inframe stop codons,  
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
>
> Dr. Tannistha Nandi
> email: tannistha3 <at> yahoo.com
>
>
>
Dave Messina | 3 Aug 2008 21:10

Re: Reference to a staden module under Bio::SeqIO.pm

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

says:

> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>

In any case, the staden module (and associated external libraries) is used
only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
binary formats. So your edit shouldn't cause you any problems otherwise.

Dave
Chris Fields | 3 Aug 2008 22:20

Re: Reference to a staden module under Bio::SeqIO.pm

This seems to be a problem with PerlApp and eval{}; judging by a quick  
Google search this isn't the only module affected.  The line in  
question is wrapped in an eval{} to check for the availability of  
Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2,  
so the eval{} is checked when the module is loaded dynamically (i.e.  
when a format requiring it is passed in).  It was causing other issues  
with ActivePerl installations and was redundant, so it was removed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2295
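
For readers unfamiliar with the idiom, here is a minimal sketch of an eval-wrapped optional load. It is not the actual Bio::SeqIO code; the helper name module_available and the module name No::Such::Module::Here are hypothetical and used only to show the failure path.

```perl
use strict;
use warnings;

# Try to load a module by name; return 1 on success, 0 if it cannot be
# loaded. The eval{} traps the fatal error that require would raise.
sub module_available {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $path; 1 } ? 1 : 0;
}

# File::Spec ships with Perl; the second name is deliberately bogus.
for my $module ('File::Spec', 'No::Such::Module::Here') {
    printf "%-24s %s\n", $module,
        module_available($module) ? 'available' : 'missing (skipping)';
}
```

The point of the pattern is that a missing optional module leaves the script running, so only code that really needs the module has to care about its absence.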

chris

On Aug 3, 2008, at 2:10 PM, Dave Messina wrote:

> Hi Kevin,
>
> The staden module is an oddball one, to be sure.
>
> A search on the BioPerl website turns up this FAQ entry:
> http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F
>
> Also the Windows install page
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> says:
>
>> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
>> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
>> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
>> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>
> In any case, the staden module (and associated external libraries) is used
> only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
> binary formats. So your edit shouldn't cause you any problems otherwise.
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign
Benbo | 2 Aug 2008 22:05

Finding possible primers regex


Hi there, 
I'm trying to write a perl script to scan an aligned multiple entry fasta
file and find possible primers. So far I've produced a string which contains
bases which match all sequences and * where they don't match e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in
length and have no more than 4 * in them.

So far, I've got this:

while($fragment_match =~ /([GTAC*]{18,21})/g){
print "$1\n";
}

hoping to match all fragments 18-21 characters in length. However even that
doesn't work as it has essentially chunked it into 21 char blocks, rather
than what I hoped for of
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one
would go about it with regex. Sadly I'm fairly new to perl and getting to
grips with BioPerl, so please treat me gently :).

Many thanks,

Ben

-- 
View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
Chris Fields | 4 Aug 2008 06:08

Re: Finding possible primers regex

On Aug 2, 2008, at 3:05 PM, Benbo wrote:

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry  
> fasta
> file and find possible primers. So far I've produced a string which  
> contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are  
> 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However  
> even that
> doesn't work as it has essentially chunked it into 21 char blocks,  
> rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or  
> how one
> would go about it with regex. Sadly I'm fairly new to perl and  
> getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben

There is a trick to this which is discussed more extensively in  
'Mastering Regular Expressions'.  Essentially you have to embed code  
into the regex and trick the parser into backtracking using a negative  
lookahead.  The match itself fails (i.e. no match is returned), but  
the embedded code is executed for each match attempt.

The following script is a slight modification of one I used.  It  
checks the consensus string from the input alignment (in aligned FASTA  
format here), extracts the alignment slice using that match, then spits  
the alignment out to STDOUT in clustalw format.  This should work for  
perl 5.8 and up, but it's only been tested on perl 5.10.  You should  
be able to adapt this to fit what you want.

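use Bio::AlignIO;

# $file holds the path to the aligned FASTA input
my $in = Bio::AlignIO->new(-file => $file,
                            -format => 'fasta');
my $out = Bio::AlignIO->new(-fh => \*STDOUT,
                            -format => 'clustalw');

while (my $aln = $in->next_aln) {
     # columns below 100% identity come back as '?'
     my $c = $aln->consensus_string(100);
     my @matches;
     # (?!) always fails, forcing the engine to backtrack through every
     # 18-21 character window; the (?{...}) block records the good ones
     $c =~ m/
         ([GTAC?]{18,21})
         (?{my $match = check_match($1);
            push @matches, [$match,
                            pos(),
                            length($match)]
               if defined $match;})
         (?!)
         /xig;
     for my $match (@matches) {
         # pos() is the end of the window; derive the 1-based start from it
         my ($hit, $st, $end) = ($match->[0],
                                 $match->[1] - $match->[2] + 1,
                                 $match->[1]);
         my $newaln = $aln->slice($st, $end);
         $out->write_aln($newaln);
     }
}

# Return the candidate if it has at most four '?' columns, else undef
sub check_match {
     my $match = shift;
     return unless $match;
     my $ct = $match =~ tr/?/?/;
     return $match if $ct <= 4;
     return;  # explicit undef; falling off the end would return a defined false value
}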
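
If the embedded-code regex feels too clever, the same enumeration can be done with two plain loops and substr. This is a different technique from the backtracking trick above, offered as a sketch: the function name candidate_windows and its parameter order are made up for illustration, and the toy limits (3-4 characters, at most one '*') only keep the output short. For the original question the call would use 18, 21 and 4.

```perl
use strict;
use warnings;

# Return [start, end, fragment] for every window of $min..$max characters
# that contains at most $max_wild copies of the wildcard character $wild.
# Coordinates are 1-based and inclusive.
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild, $wild) = @_;
    my @hits;
    my $len = length $consensus;
    for my $start (0 .. $len - $min) {
        for my $size ($min .. $max) {
            last if $start + $size > $len;    # window would run off the end
            my $frag  = substr($consensus, $start, $size);
            my $count = () = $frag =~ /\Q$wild\E/g;    # wildcards in window
            push @hits, [$start + 1, $start + $size, $frag]
                if $count <= $max_wild;
        }
    }
    return @hits;
}

# Toy run on the example consensus from the question.
my @hits = candidate_windows('TTA*C**AA', 3, 4, 1, '*');
print join("\t", @$_), "\n" for @hits;
```

Each start position is tried with every allowed length, which gives the overlapping 0-18, 0-19, ... 1-19, 1-20 style windows the question asks for.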

chris
Heikki Lehvaslaiho | 4 Aug 2008 08:42

Re: Bio::Coordinate::Pair

Prashanth,

Your example coordinates do not do the conversion but more or less report the 
locations of your features in some third coordinate system.

The way to think of coordinate pairs is as HSPs. You tell the pair 
object what the matching segment is in each of the two sequences.

The synopsis in Bio::Coordinate::Pair class file gives the following example:

use Bio::Location::Simple;
use Bio::Coordinate::Pair;

my $match1 = Bio::Location::Simple->new
    (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
my $match2 = Bio::Location::Simple->new
    (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
my $pair = Bio::Coordinate::Pair->new(-in => $match1,
				      -out => $match2
    );
# location to match
my $pos = Bio::Location::Simple->new
    (-start => 25, -end => 25, -strand=> -1 );

my $res = $pair->map($pos);
print $res->match->start; # 5

In other words, region 21-40 in the propeptide matches locations 1-20 in the 
final peptide. Therefore conversion from 25 gives 5:

     signalp        21  25             40
--------------------|---|--------------|
                    1   5  pep         20

I hope this clarifies it.
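
The arithmetic behind that conversion can be written out in a few lines of plain Perl. This is a sketch of the offset calculation only, assuming a same-strand, ungapped pair; it is not how Bio::Coordinate::Pair is implemented, and the name map_position is hypothetical.

```perl
use strict;
use warnings;

# Map a position from the input range onto the output range.
# Returns undef when the position lies outside the input range,
# because there is then nothing to map it onto.
sub map_position {
    my ($pos, $in_start, $in_end, $out_start) = @_;
    return if $pos < $in_start || $pos > $in_end;
    return $pos - $in_start + $out_start;
}

# Propeptide 21-40 paired with peptide 1-20, as in the synopsis.
my $mapped = map_position(25, 21, 40, 1);
print defined $mapped ? "25 maps to $mapped\n" : "25 has no match\n";

# The tutorial pair (1000-2000 -> 1100-2100) queried with position 500,
# which falls outside the input range.
my $outside = map_position(500, 1000, 2000, 1100);
print defined $outside ? "500 maps to $outside\n" : "500 has no match\n";
```

Under these assumptions a query only converts when it falls inside the input segment; with the tutorial pair, a position such as 1500 would map to 1600.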

The advantage of using these objects over manual conversion is that the code 
has been debugged (none of the all-too-easy +/-1 errors) and that they can be 
chained together.

Yours,

      -Heikki

On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module- Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
> (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
> (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
> (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

Shaohua Fan | 5 Aug 2008 09:36

how to remove identical sequences from a dataset

Hi there,

I have a sequence dataset which contains about 200 sequences. Some of the sequences in it are identical.
Are there any BioPerl modules which can remove those identical sequences?

thanks a lot. 
yours,
shaohua
