Erik | 1 Jan 21:17

Re: acquiring a local refseq + index

> Agree with Hilmar, in that we need examples.

Another problematic one was NC_004822. However, most other problems I
referred to were ones that did *not* stop the DBD indexing. (Of course, I
do not know how many error-throwing entries there still are in the files
that are not yet indexed: ca. 75%.)

The most common error was pollution of the 'binomial' name with
classification lines as a result of faulty parsing. If Bio::Species is
deprecated (I hadn't noticed that before) then these problems are of
course correspondingly less important.

>> If you are referring to your submitted bug:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2167

Yes, the above was one that stopped refseq indexing (there is one more
that I will stick into bugzilla in a minute). Thanks for the commit.
>
> we could add this in as long as it passes (I'll try giving it a
> workout with my local bacterial seqs tonight or tomorrow).  However,
> in the not-too-distant future your patch would likely be rendered
> obsolete, as any parsing in Bio::SeqIO modules pertaining to
> Bio::Species-related matters will be deprecated in favor of simple
> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
> related to it are considered marked for deprecation.  Fair warning...

What does simple parsing mean? Just returning the whole ORGANISM string,
and leaving further parsing to application side?


Chris Fields | 2 Jan 00:19

Re: acquiring a local refseq + index


On Jan 1, 2007, at 2:17 PM, Erik wrote:

>> we could add this in as long as it passes (I'll try giving it a
>> workout with my local bacterial seqs tonight or tomorrow).  However,
>> in the not-too-distant future your patch would likely be rendered
>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>> Bio::Species-related matters will be deprecated in favor of simple
>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
>> related to it are considered marked for deprecation.  Fair warning...
>
> What does simple parsing mean? Just returning the whole ORGANISM  
> string,
> and leaving further parsing to application side?

Current parsing behavior tries to determine genus/species from the
data in the sequence record alone, which has become increasingly
difficult and unreliable over the years.  Since a perfectly valid
source for taxonomic information exists (NCBI Taxonomy), and each
GenBank/EMBL sequence record is tagged with a relevant TaxID, it makes
more sense to base reliable parsing of taxonomic data on that resource.
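
To illustrate the point about the TaxID tag, here is a minimal sketch,
assuming a GenBank flat-file record whose source feature carries a
/db_xref="taxon:..." qualifier. The helper name taxid_from_genbank and
the record fragment are made up for this example; it is plain Perl and
not how Bio::SeqIO or Bio::Taxon actually do it.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the NCBI TaxID out of a GenBank flat-file record
# by looking for the /db_xref="taxon:NNNN" qualifier of the source feature.
# Returns undef when no such qualifier is present.
sub taxid_from_genbank {
    my ($record) = @_;
    return $record =~ m{/db_xref="taxon:(\d+)"} ? $1 : undef;
}

# Abbreviated, made-up record fragment, for illustration only.
my $record = <<'END';
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria; Gammaproteobacteria.
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Escherichia coli"
                     /db_xref="taxon:562"
END

my $taxid = taxid_from_genbank($record);
print defined $taxid ? "TaxID: $taxid\n" : "no TaxID found\n";
```

The number can then be looked up in NCBI Taxonomy, rather than trusting
the free-text ORGANISM and classification lines.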

Sendu has essentially set up Bio::Taxon for that reason; Bio::Species
has been changed to inherit from Bio::Taxon (which is also a
Bio::Tree::Node) but still exhibits the older behavior (i.e. it retains
the old API).  It will gradually be phased out in favor of Bio::Taxon by
rel 1.8.  We hope.


Karen Buysse | 2 Jan 15:59

repeatmasker

Dear all,

I want to use the BioPerl repeatmasker program.
However, when I run the following program (=first lines of the synopsis):

use Bio::Tools::Run::RepeatMasker;

  my @params=("mam" => 1,"noint"=>1);
  my $factory = Bio::Tools::Run::RepeatMasker->new(@params);

I get the error message: *RepeatMasker program not found as  or not 
executable.*
I have the file *RepeatMasker.pm* in the following directories:
C:/Perl/lib/Bio/Tools/Run
C:/Perl/site/lib/Bio/Tools/Run

Can anyone please help me with this?

Many thanks in advance and a happy new year,
Karen

-- 
ir. Karen Buysse
Center for Medical Genetics Ghent (CMGG)
Ghent University Hospital
Medical Research Building (MRB), 2nd floor, room 120.050
De Pintelaan 185, B-9000 Ghent, Belgium
+32 9 240 39 46 (phone) 
+32 9 240 65 49 (fax)
http://medgen.ugent.be


Re: repeatmasker

Hi Karen,

It seems like you don't have the RepeatMasker executable installed on
your machine and available for BioPerl to use. You'll need to download
and install it first. You can get it from here:

http://www.repeatmasker.org/

I don't know if there's a compiled version of RM available for Windows.
Does anyone know about this? Chris, Nathan?

Regards,
Mauricio.

Karen Buysse wrote:
> Dear all,
> 
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the synopsis):
> 
> use Bio::Tools::Run::RepeatMasker;
> 
>   my @params=("mam" => 1,"noint"=>1);
>   my $factory = Bio::Tools::Run::RepeatMasker->new(@params);
> 
> I get the error message: *RepeatMasker program not found as  or not 
> executable.*
> I have the file *RepeatMasker.pm* in the following directories:
> C:/Perl/lib/Bio/Tools/Run
> C:/Perl/site/lib/Bio/Tools/Run

Sendu Bala | 2 Jan 17:16

Re: repeatmasker

Karen Buysse wrote:
> Dear all,
> 
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the synopsis):
> 
> use Bio::Tools::Run::RepeatMasker;
> 
>   my @params=("mam" => 1,"noint"=>1);
>   my $factory = Bio::Tools::Run::RepeatMasker->new(@params);
> 
> I get the error message: *RepeatMasker program not found as  or not 
> executable.*
> I have the file *RepeatMasker.pm* in the following directories:
> C:/Perl/lib/Bio/Tools/Run
> C:/Perl/site/lib/Bio/Tools/Run
> 
> Can anyone please help me with this?

BioPerl run modules are not programs but front-ends ('wrappers') to
external programs. You need to install the program that corresponds to
the module before it will work.

Visit http://www.repeatmasker.org/ to get the program and database.

Erik | 2 Jan 17:23

Re: acquiring a local refseq + index


That seems like a real improvement over parsing the name out of the
text entry. I'll use taxid = $seq->species->ncbi_taxid from now on.

Thanks for that elucidation. :)

That leaves the error-throwing problem in Bio::DB::Flat, which I
encountered while making a local RefSeq BerkeleyDB index.

I suppose it remains worthwhile to prevent the indexing from breaking on
Bio::SeqIO instantiation (at least for the RefSeq entry set), so I have
put a simple fix on bugzilla that prevents one more problem entry
(NC_004822) from breaking the indexing process.

Thanks,

Erikjan

> On Jan 1, 2007, at 2:17 PM, Erik wrote:
>
>>> we could add this in as long as it passes (I'll try giving it a
>>> workout with my local bacterial seqs tonight or tomorrow).  However,
>>> in the not-too-distant future your patch would likely be rendered
>>> obsolete, as any parsing in Bio::SeqIO modules pertaining to
>>> Bio::Species-related matters will be deprecated in favor of simple
>>> parsing (more foolproof, less uncertainty) and Bio::Taxon (which has
>>> optional db lookups using NCBI Taxonomy).  Bio::Species and anything
>>> related to it are considered marked for deprecation.  Fair warning...
>>
>> What does simple parsing mean? Just returning the whole ORGANISM

Chris Fields | 2 Jan 17:21

Re: repeatmasker

RepeatMasker.pm is a wrapper for the RepeatMasker program.  This is
from the POD:

DESCRIPTION
        RepeatMasker is a program that screens DNA sequences for interspersed
        repeats known to exist in mammalian genomes as well as for low
        complexity DNA sequences. For more information, on the program and its
        usage, please refer to http://repeatmasker.genome.washington.edu/.

Newer versions are available here:

http://www.repeatmasker.org/

You need to install RepeatMasker in your PATH in order to use the  
wrapper.
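
A quick way to check whether the executable is visible before calling
the wrapper is sketched below, assuming the program file is simply named
RepeatMasker. The helper find_in_path is made up for this example and
uses only core Perl; it is not the lookup the wrapper itself performs.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# called $name found in the PATH directories, or nothing if there is none.
sub find_in_path {
    my ($name) = @_;
    # On Windows, also try the usual executable extensions.
    my @exts = $^O eq 'MSWin32' ? ('', '.exe', '.bat', '.cmd') : ('');
    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

my $exe = find_in_path('RepeatMasker');
print defined $exe ? "Found: $exe\n" : "RepeatMasker is not on PATH\n";
```

If this reports that nothing was found, the wrapper has no program to
run, however many copies of RepeatMasker.pm are installed.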

chris

On Jan 2, 2007, at 8:59 AM, Karen Buysse wrote:

> Dear all,
>
> I want to use the BioPerl repeatmasker program.
> However, when I run the following program (=first lines of the  
> synopsis):
>
> use Bio::Tools::Run::RepeatMasker;

Fairley, Derek | 2 Jan 17:13

Re: repeatmasker

Hi Karen,

RepeatMasker isn't a BioPerl program - although the
Bio::Tools::Run::RepeatMasker module can call it if it's properly
installed on your system. RepeatMasker also requires either Cross_Match
or WUBlast to be installed, in addition to a local database containing
repeat data.

Can you run RepeatMasker from the command line okay?

Derek.

-----Original Message-----
From: bioperl-l-bounces <at> lists.open-bio.org
[mailto:bioperl-l-bounces <at> lists.open-bio.org] On Behalf Of Karen Buysse
Sent: 02 January 2007 15:00
To: bioperl-l <at> lists.open-bio.org
Subject: [Bioperl-l] repeatmasker

Dear all,

I want to use the BioPerl repeatmasker program.
However, when I run the following program (=first lines of the
synopsis):

use Bio::Tools::Run::RepeatMasker;

  my @params=("mam" => 1,"noint"=>1);
  my $factory = Bio::Tools::Run::RepeatMasker->new(@params);


Nathan S. Haigh | 2 Jan 17:44

Re: repeatmasker

Mauricio Herrera Cuadra wrote:
> Hi Karen,
>
> It seems like you don't have the RepeatMasker executable installed in 
> your machine and available for BioPerl to make use of it. You'll need to 
> download and install it first. You can do it from here:
>
> http://www.repeatmasker.org/
>
> I don't know if there's a compiled version of RM available for Windows. 
> Does anyone knows about this? Chris, Nathan?
>
> Regards,
> Mauricio.
>
>   

Ah yes, I didn't see the Windows file paths! It doesn't look like there
is an executable available from their website. Karen - it might be worth
contacting them to check!

Nath

Marian Thieme | 2 Jan 17:48

store variations, generate sequences

Hi all,

I am quite new to BioPerl and I have a question about sequence data: I
am working on a resequencing project. Here we have resequenced 1000
genes of a certain gene. My question: what is the easiest way to store
each discovered variation of each individual and get a FASTA sequence
for an arbitrary individual?

I would expect that there is some way to set up a reference sequence and
store all variations relative to this reference sequence. Afterwards it
should be possible to generate sequences for each individual in
question, right?

My approach was the following:

I have created a SeqDiff object:

$seqDiff = Bio::Variation::SeqDiff->new (...)

and I have assigned the reference sequence to that object via:

$seqDiff->dna_ori('atgcgtatatg');

Now I thought I could create some variations via a DNAMutation object:

$dnamut = Bio::Variation::DNAMutation->new (
  -start => 6,
  -end => 6,
  -length => 1,
  -isMutation => 1,