Chris Fields | 1 Sep 01:23 2006

Re: Memory requirements for conversion from embl to genbank

>> ...
>> We'll try to do what we can, but only within the bounds of not  
>> breaking
>> code, performance, losing sanity, etc.
>
> I have never argued. I'll stop poisoning your mailboxes. Thanks for  
> all the fixes.
> Will upload my current patch for your convenience, I hope.
> M.

My point is not to stop your posts.  What I'm trying to get across to  
you is we can only do so much with sequence input that is already bad  
(i.e. misformatted) to begin with.  We can't break working code to  
fix bad sequences.

Regardless, please remember what I posted about submitting bugs.   
Your recent additions to the various bug reports in Bugzilla have  
little to do with your original bug summaries, so should be either  
posted as new bugs or sent to the mail list.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
Chris Fields | 1 Sep 01:26 2006

Re: Memory requirements for conversion from embl to genbank

Yet another problem?!?  Martin, I'll ask this again, mainly out of  
curiosity: have you ever contacted the people who generated this to  
let them know of the problems?

This one is definitely not valid: can't have a lineage w/o an organism!
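
One way to screen records for this particular problem before handing them to a parser is to check for OC lines that appear without any OS line. The following is only a minimal sketch in plain Perl, not BioPerl's own validation; the sub name is made up for illustration, and it assumes the lines of a single record are already in an array.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the record has OC (lineage) lines
# but no OS (organism) line, 0 otherwise.
sub lineage_without_organism {
    my @lines = @_;
    my $os_count = grep { /^OS\s/ } @lines;   # grep in scalar context counts matches
    my $oc_count = grep { /^OC\s/ } @lines;
    return ($oc_count > 0 && $os_count == 0) ? 1 : 0;
}

# A few lines taken from the record quoted in this thread.
my @record = (
    'ID   5MLE000012 standard; mRNA; VRL; 421 BP.',
    'AC   BB136482;',
    'OC   Viruses; Retro-transcribing viruses; Retroviridae;',
    'OC   Gammaretrovirus.',
);
print "lineage without organism\n" if lineage_without_organism(@record);
```

Records flagged this way could be set aside and reported to whoever generated them, rather than worked around in the parser.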

Chris

On Aug 31, 2006, at 5:04 PM, Martin MOKREJŠ wrote:

> Sendu,
>   one more problem with the taxonomic code:
>
> ID   5MLE000012 standard; mRNA; VRL; 421 BP.
> XX
> AC   BB136482;
> XX
> DT   26-JUL-2001 (Rel. 15, Created)
> DT   26-JUL-2001 (Rel. 15, Last updated, Version 1)
> XX
> DE   5'UTR in Murine leukemia virus Mo Ampho MCF recombinant gPr80  
> envelope
> DE   polyprotein (env) gene, complete cds.
> XX
> DR   EMBL; U36991;
> DR   UTR; CC147674;
> XX
> OC   Viruses; Retro-transcribing viruses; Retroviridae;  
> Orthoretrovirinae;
> OC   Gammaretrovirus.

Caleb F. Davis | 1 Sep 02:00 2006

trouble with pairwise_kaks.PLS on Cygwin/XP platform

Hi folks,

Thank you for all your hard work with Bioperl.  My name is Caleb Davis, 
a first year grad student at Baylor College of Medicine.  I'm trying to 
use bioperl on a WinXP/cygwin configuration.  I've got it working great 
for everything up to this point.  I can read and write files, use the 
wrappers around bl2seq, etc.  However, I am now getting started with 
PAML, and I am stuck.  I could not find anyone with my problem on the 
mailing list so I thought I'd ask you.

I'm using pairwise_kaks.PLS code and sample data to test my machine 
configuration first.  The code fails in various ways, and I can't figure 
out why.  I'm guessing it has something to do with the tmpdir 
environmental variable and the differences between windows and unix 
paths, but that's a stab in the dark really.

Will you point me in the right direction?  Thanks,
--Caleb

Included below are:

********
output of the command:
pairwise_kaks.PLS -i ~/src/bioperl/bioperl-live/t/data/worm_fam_2785.cdna
********
my environmental variables set in .bashrc
********

//////////////////////////////////////////////


Caleb Davis | 1 Sep 01:53 2006

Re: UCSC database backend

Hi folks, first time caller here.  Love the show!

I just started going through the archive and saw this thread.  I vote in 
favor of this interface, for what it's worth.  What about doing it this 
way?:

$objSeqIO  = Bio::SeqIO->new(-file => '~/seq/myseqCustomTrack.bed',
                         -format => 'bed',
                         -assembly => 'hg18',
                         -track => 'hg18_myfavgenes');    #see example below

If this worked you could have any genomic sequence from any assembly at 
your fingertips.  The custom track format is a common way to specify 
arbitrary genomic coordinates and region identifiers.  There are a few 
different formats, but BED is my current favorite 
(http://genome.ucsc.edu/goldenPath/help/customTrack.html#format).  The 
nice thing about custom tracks is that you can use them as hooks to 
retrieve assembly sequence, as well as to visualize the regions of 
interest directly in the browser.

A brief tutorial for using the table browser to retrieve 
sequence...  Here's an example of a plausible myCustomTrack.bed.  In this 
example the coordinates happen to correspond to genes, but they could 
point to arbitrary genomic segments:
---------
browser position chr1:1940702-1952050
track name=hg18_myfavgenes description="my favorite genes"
chr1    1940702    1952050    GABRD|NM_000815    1000    +    1940722    1951581    0,0,0    9    88,113,68,221,83,138,156,212,769    0,5538,5930,6114,8173,8751,9707,10147,10579
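---------

For readers who want to see what a parser would get out of such a line, here is a rough sketch in plain Perl. It is not an existing Bio::SeqIO driver (the 'bed' format and the -assembly/-track options above are a proposal); the sub name is hypothetical. The field names follow the UCSC BED description, and the sketch assumes the feature sits on one whitespace-separated line.

```perl
use strict;
use warnings;

# The twelve BED columns, in order, as named in the UCSC format description.
my @BED_FIELDS = qw(chrom chromStart chromEnd name score strand
                    thickStart thickEnd itemRgb blockCount
                    blockSizes blockStarts);

# Hypothetical helper: split one BED feature line into a hash reference.
# BED requires at least the first three columns and allows at most twelve.
sub parse_bed_line {
    my ($line) = @_;
    chomp $line;
    my @values = split ' ', $line;            # split on any run of whitespace
    die "expected 3 to 12 BED columns, got " . scalar(@values) . "\n"
        if @values < 3 || @values > @BED_FIELDS;
    my %feature;
    @feature{ @BED_FIELDS[0 .. $#values] } = @values;
    return \%feature;
}

my $line = join "\t", qw(chr1 1940702 1952050 GABRD|NM_000815 1000 +
                         1940722 1951581 0,0,0 9
                         88,113,68,221,83,138,156,212,769
                         0,5538,5930,6114,8173,8751,9707,10147,10579);
my $feature = parse_bed_line($line);
printf "%s %s:%d-%d (%d blocks)\n",
    @{$feature}{qw(name chrom chromStart chromEnd blockCount)};
```

The "browser" and "track" lines at the top of a custom track file are not features and would need to be skipped or handled separately.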

Brian Osborne | 1 Sep 03:11 2006

Re: trouble with pairwise_kaks.PLS on Cygwin/XP platform

Caleb,

In Cygwin there are 3 possible formats for a given path. Examples:

/cygdrive/c/cygwin/tmp
c:/cygwin/tmp
/tmp

What works will depend on the application and how it was compiled. Have you
tried all variations?
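
If it helps when trying the variations, the first two forms can be converted into each other with simple string handling. This is just a sketch with made-up sub names; Cygwin's own cygpath utility does this properly. The third form (/tmp) depends on where Cygwin is installed, so it cannot be derived from the string alone.

```perl
use strict;
use warnings;

# Hypothetical helper: c:/cygwin/tmp (or c:\cygwin\tmp) -> /cygdrive/c/cygwin/tmp.
# Paths without a leading drive letter are returned with slashes normalised only.
sub to_cygdrive {
    my ($path) = @_;
    $path =~ s{\\}{/}g;                       # turn backslashes into forward slashes
    if ($path =~ m{^([A-Za-z]):/(.*)$}) {
        return '/cygdrive/' . lc($1) . '/' . $2;
    }
    return $path;
}

# Hypothetical helper: /cygdrive/c/cygwin/tmp -> c:/cygwin/tmp.
# Anything else is returned unchanged.
sub to_mixed {
    my ($path) = @_;
    if ($path =~ m{^/cygdrive/([A-Za-z])/(.*)$}) {
        return lc($1) . ':/' . $2;
    }
    return $path;
}

print to_cygdrive('c:/cygwin/tmp'), "\n";
print to_mixed('/cygdrive/c/cygwin/tmp'), "\n";
```

Setting the temporary-directory variable to each form in turn and rerunning the script should show which one the external program accepts.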

Brian O.

On 8/31/06 8:00 PM, "Caleb F. Davis" <cdavis <at> bcm.tmc.edu> wrote:

> out why.  I'm guessing it has something to do with the tmpdir
> environmental variable and the differences between windows and unix

Martin MOKREJŠ | 1 Sep 09:24 2006

Re: Memory requirements for conversion from embl to genbank

This problem did not appear with 1.5.1. I will contact the people for sure
next week.
M.

Chris Fields wrote:
> Yet another problem?!?  Martin, I'll ask this again, mainly out of 
> curiosity: have you ever contacted the people who generated this to  let
> them know of the problems?
> 
> This one is definitely not valid: can't have a lineage w/o an organism!
> 
> Chris
> 
> On Aug 31, 2006, at 5:04 PM, Martin MOKREJŠ wrote:
> 
>> Sendu,
>>   one more problem with the taxonomic code:
>>
>> ID   5MLE000012 standard; mRNA; VRL; 421 BP.
>> XX
>> AC   BB136482;
>> XX
>> DT   26-JUL-2001 (Rel. 15, Created)
>> DT   26-JUL-2001 (Rel. 15, Last updated, Version 1)
>> XX
>> DE   5'UTR in Murine leukemia virus Mo Ampho MCF recombinant gPr80 
>> envelope
>> DE   polyprotein (env) gene, complete cds.
>> XX
>> DR   EMBL; U36991;

Daniel Lang | 1 Sep 12:11 2006

retrieval of PRELIMINARY uniprot sequences using Bio::Registry fails

Hi,

when using Bio::Registry (bioperl-live) to fetch UniProt entries from
locally indexed UniProt *.dat files, I found that several entries
could not be retrieved even though they are present in the
files! A closer look reveals that they have the status PRELIMINARY:

uniprot_trembl.dat:ID   Q16EZ1_AEDAE   PRELIMINARY;   PRT;   222 AA.

Grepping my CVS checkout for PRELIMINARY turns up nothing.
I also can't retrieve the sequences from the online database defined as
follows:
[swissprot_ebi]
protocol=biofetch
location=http://www.ebi.ac.uk/cgi-bin/dbfetch
dbname=swall

Is this a bug or a feature? If it's a feature, how can I bypass it?

Thanks in advance,
Daniel
-- 

Daniel Lang
University of Freiburg, Plant Biotechnology
Schaenzlestr. 1, D-79104 Freiburg
fax: +49 761 203 6945
phone: +49 761 203 6974
homepage:  http://www.plant-biotech.net/
e-mail: daniel.lang <at> biologie.uni-freiburg.de

Sean Davis | 1 Sep 13:53 2006

Re: UCSC database backend

On Thursday 31 August 2006 19:53, Caleb Davis wrote:
> Hi folks, first time caller here.  Love the show!
>
> I just started going through the archive and saw this thread.  I vote in
> favor of this interface, for what it's worth.  What about doing it this
> way?:
>
> $objSeqIO  = Bio::SeqIO->new(-file => '~/seq/myseqCustomTrack.bed',
>                          -format => 'bed',
>                          -assembly => 'hg18',
>                          -track => 'hg18_myfavgenes');    #see example

Hi, Caleb.  Welcome to the list.  

What you are proposing seems to be two separate but related tasks.  First, 
parse bed-format files into bioperl-compatible sequence objects.  Second, 
once those are in, pull sequence if desired from UCSC.  

For the first, you could certainly write a parser for bed format that would 
give back sequence objects.  You might also want to look at the GFF format, 
as there are quite a few tools for GFF parsing, formatting, and sequence 
retrieval from local databases.  

For the second task, if what you are after is a straightforward way of 
retrieving arbitrary sequences based on location, then you might want to look 
at the DAS service set up by UCSC.  Doing what you propose would be as simple 
as reading in a format of your choice and then constructing a URL like:

http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr1:1,5000;segment=chr10:52000,53000
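
Building such a URL from a list of regions takes only a few lines. The sketch below uses a made-up sub name and only assembles the string; it does not fetch anything, and it assumes each region is given as [chromosome, start, end] with coordinates already in the form the DAS server expects.

```perl
use strict;
use warnings;

# Hypothetical helper: build a DAS "dna" request URL for one assembly
# from a list of [chromosome, start, end] array references.
sub das_dna_url {
    my ($assembly, @segments) = @_;
    my $query = join ';',
        map { sprintf 'segment=%s:%d,%d', @$_ } @segments;
    return "http://genome.ucsc.edu/cgi-bin/das/$assembly/dna?$query";
}

print das_dna_url('hg18', ['chr1', 1, 5000], ['chr10', 52000, 53000]), "\n";
```

Note that BED start coordinates are zero-based, so regions read from a BED file would probably need their start shifted by one before being passed in; check the DAS documentation to confirm the convention.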


Sendu Bala | 1 Sep 14:25 2006

Re: SiteMatrix changes

Sendu Bala wrote:
> skirov wrote:
>> Sounds OK with me. To summarize:
>> 1. Correction is disabled by default.
>> 2. Correction should be applied to all positions.
>> 3. Thresholds for IUPAC consensus can be user defined.
>> 4. A fix for IUPAC consensus calculation: change the default behavior.
>> 5. Document the options
>> Does this sound right?
> 
> Yes, sounds good to me. I'll code those up shortly.

This is now done. I didn't quite do it the way you suggested: the 
'threshold' for IUPAC consensus is implemented as a significance level for 
rounding the frequencies. This way we don't have to suffer some 
arbitrary cutoff that does unexpected things. You may also want to check 
the changes to the documentation (mostly in new() and the description) 
to make sure I understood and explained everything well enough.

The consensus string now also enforces the supplied or default 
threshold, and treats the threshold the way most people might think of 
such a thing - the minimum acceptable value (inclusive). This doesn't 
seem to actually change the answer for the few test matrices used by the 
test scripts (though the test script answers have changed since we're 
not doing pseudo-count correction anymore).
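
To make "minimum acceptable value (inclusive)" concrete: a base whose frequency exactly equals the threshold still passes. The sketch below is a simplified illustration and not the SiteMatrix code; the sub name, the hash layout and the fallback to 'N' are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the most frequent base in one column if its
# frequency is at least $threshold (inclusive), otherwise 'N'.
# $freq is a hash reference mapping base => frequency.
sub consensus_base {
    my ($freq, $threshold) = @_;
    # Highest frequency first; ties broken alphabetically for a stable result.
    my ($best) = sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq;
    return $freq->{$best} >= $threshold ? $best : 'N';
}

my %column = (A => 0.7, C => 0.1, G => 0.1, T => 0.1);
print consensus_base(\%column, 0.7), "\n";   # threshold met exactly
print consensus_base(\%column, 0.8), "\n";   # threshold not met
```

With an exclusive threshold the first call would fall back to 'N' as well, which is the difference the paragraph above describes.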

One issue is that there's no way for the user to decide to do 
pseudo-count correction or not when using the PSM::IO modules. The 
correction should probably be farmed out to a separate method. I don't 
plan to do this myself.

skirov | 1 Sep 15:14 2006

Re: SiteMatrix changes

>===== Original Message From Sendu Bala <bix <at> sendu.me.uk> =====
>Sendu Bala wrote:
>> skirov wrote:
>>> Sounds OK with me. To summarize:
>>> 1. Correction is disabled by default.
>>> 2. Correction should be applied to all positions.
>>> 3. Thresholds for IUPAC consensus can be user defined.
>>> 4. A fix for IUPAC consensus calculation: change the default behavior.
>>> 5. Document the options
>>> Does this sound right?
>>
>> Yes, sounds good to me. I'll code those up shortly.
>
>This is now done. I didn't quite do it the way you suggested: the
>'threshold' for IUPAC consensus is implemented as a significance level for
>rounding the frequencies. This way we don't have to suffer some
>arbitrary cutoff that does unexpected things. You may also want to check
>the changes to the documentation (mostly in new() and the description)
>to make sure I understood and explained everything well enough.
>
Thanks Sendu. I will do that.

>The consensus string now also enforces the supplied or default
>threshold, and treats the threshold the way most people might think of
>such a thing - the minimum acceptable value (inclusive). This doesn't
>seem to actually change the answer for the few test matrices used by the
>test scripts (though the test script answers have changed since we're
>not doing pseudo-count correction anymore).
>
>One issue is that there's no way for the user to decide to do

