Hoebeke Mark | 1 Apr 2004 21:06

Parsing GenBank files in Threads

Hi,

I was wondering if Sequence objects are thread-safe.

In a pipeline I am developing I parse a set of GenBank flat files in
several simultaneously executing threads (one file per thread,
obviously), to feed them into a database.

On execution I get erratic ArrayIndexOutOfBoundsExceptions when invoking the
SimpleSequence.getString() method. The indices in question are mostly
negative. The SimpleSequence instance is created through a
SequenceIterator obtained with SeqIOTools.readGenbank().

When I prefix the method making this call with 'static synchronized' the
errors disappear.

I leafed through 6 months' worth of mail archives looking for clues, to
no avail.

My guess is that there could be something odd happening in the
SeqIOTools.readGenbank() method, which is declared as a static method.

Any help would be greatly appreciated.

Thanks in advance,

Mark 

-- 
--------------------------Mark.Hoebeke <at> jouy.inra.fr----------------------
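
A minimal sketch of why prefixing a method with 'static synchronized' makes
this kind of error disappear: the keyword takes one class-wide lock, so only
one thread at a time can run the method. SharedBufferParser and its scratch
buffer are hypothetical stand-ins for any parser with shared mutable state;
they are not BioJava code, and nothing here shows where the actual problem
in SeqIOTools.readGenbank() lies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parser with shared mutable state (NOT BioJava code).
class SharedBufferParser {
    // One buffer shared by every caller: unsafe if two threads use it at once.
    private static final StringBuilder scratch = new StringBuilder();

    // 'static synchronized' locks on the class, so calls run one at a time.
    static synchronized String extract(String record) {
        scratch.setLength(0);
        for (char c : record.toCharArray()) {
            scratch.append(Character.toLowerCase(c));
        }
        return scratch.toString();
    }
}

class StaticSynchronizedDemo {
    // Starts one thread per record and returns the results in input order.
    static List<String> extractAll(List<String> records) {
        String[] results = new String[records.size()];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            final int idx = i;
            Thread t = new Thread(() -> results[idx] = SharedBufferParser.extract(records.get(idx)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() also makes each thread's write to results visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return Arrays.asList(results);
    }

    public static void main(String[] args) {
        System.out.println(extractAll(Arrays.asList("ACGT", "TTGA", "GGCC"))); // [acgt, ttga, ggcc]
    }
}
```

The cost is that every thread queues on the same lock, so the guarded step
no longer runs in parallel.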

Thomas Down | 1 Apr 2004 21:18

Dazzle 1.00 release

I've just put out a new release of the Dazzle modular DAS server.  The 
new release features improvements to the plugin API, updates to work 
with the BioJava 1.4 release, and many minor fixes.

You can download either source or a prebuilt web-application skeleton 
from:

      http://www.biojava.org/download/dazzle/

If you have the Subversion client available, you can also check it out 
from:

      http://www.derkholm.net/svn/repos/dazzle/tags/dazzle-1.00

Installation instructions can be found at:

      http://www.biojava.org/dazzle/deploy.html

Please note that this release *will not* work with the old ensembl-das 
plugins.  I've been working on a major re-write of the biojava-ensembl 
adaptors, including support for the schema changes in Ensembl release 
20 and some substantially improved DAS plugins.  There should be a 
release sometime next week.  In the meantime, development code is 
available from the Subversion repository at:

     http://www.derkholm.net/svn/repos/biojava-ensembl/trunk

Thomas.


Matthew Pocock | 1 Apr 2004 21:31

Re: Parsing GenBank files in Threads

Hi,

The biojava policy on synchronization is that we try to make things safe 
if possible, but expect the user to synchronize sanely. Unfortunately, 
this is usually not documented anywhere. I could not guarantee that 
GenbankFormat is threadsafe - it would be sensible for it to be, but the 
particular implementation may not be. To help us track this, could you 
include some example stack traces of erratic behavior?

Matthew

Hoebeke Mark wrote:

>Hi,
>
>I was wondering if Sequence objects are thread-safe.
>
>In a pipeline I am developing I parse a set of GenBank flat files in
>several simultaneously executing threads (one file per thread,
>obviously), to feed them in a database.
>
>On execution I get erratic ArrayOutOfBoundsExceptions when invoking the
>SimpleSequence.getString() method. The indices in question are mostly
>negative. The SimpleSequence instance is created trough a
>SequenceIterator obtained  with SeqIOTools.readGenbank().
>
>When I prefix the method making this call with 'static synchronized' the
>errors disappear.
>
>I leafed through 6 months worth of mail archives looking for clues, to

Hoebeke Mark | 2 Apr 2004 08:41

Re: Parsing GenBank files in Threads

Hi Matthew,

I just finished some further investigation, strengthening my feeling
that using SeqIOTools.readGenbank() might not be thread-safe.

The strongest point is that the errors appear less frequently on
uniprocessor machines than on multiprocessor ones.

As you requested, below is a snippet of the exception stack with 
the bio.* related part delimited by ===========. This pattern repeats
itself for different Genbank files except for the actual value of the
corrupt(?) index.

Note that the problem is solved by prefixing the method calling
Sequence.seqString() with static synchronized, but that takes all the
fun out of the pipeline ;)

If needed, I can hand you the complete source file but I thought I'd
better not spam biojava-l with it.

Thanks for your support.

Mark

     [java] org.quartz.JobExecutionException: java.lang.Exception:
Unable to extract sequence from entry BA000019 [See nested exception:
java.lang.Exception: Unable to extract sequence from entry BA000019]
     [java] 	at pipeline.jobs.EntryFeeder.execute(EntryFeeder.java:241)
     [java] 	at org.quartz.core.JobRunShell.run(JobRunShell.java:178)
     [java] 	at

Richard Bruskiewich | 2 Apr 2004 13:42

Anybody working on a Java binding to CHADO?

Hi folks (especially Matt and Thomas...),

I've finally decided to subscribe to this list since we're getting more
heavily into Java here at IRRI.

First question: who is working on a Java binding to CHADO?

Cheers
Richard Bruskiewich
Bioinformatics Specialist
International Rice Research Institute (IRRI)

_______________________________________________
Biojava-l mailing list  -  Biojava-l <at> biojava.org
http://biojava.org/mailman/listinfo/biojava-l

David Huen | 2 Apr 2004 14:20

Re: Anybody working on a Java binding to CHADO?

On Friday 02 Apr 2004 12:42 pm, Richard Bruskiewich wrote:
> Hi folks (especially Matt and Thomas...),
>
> I've finally decided to subscribe to this list since we're getting more
> heavily into Java here at IRRI.
>
> First question: who is working on a Java binding to CHADO?
>
Hi Richard,
I worked on one for some time before suspending my efforts.

The main problem is that the fit between the Chado data model and the 
BioJava one is particularly poor.  BioJava has a hierarchical feature 
model in which features have subfeatures etc.  All features are associated 
with a location on a sequence.  This made sense at the time BJ was being 
developed and is also inherent in a BioSQL type of world.

Chado, however, unlinks features from sequences and from each other.  Now, 
features can be associated with multiple locations on different sequences.  
Also, features have relationships to each other that are potentially 
non-hierarchical (DAG for example).  The Chado data model is probably more 
powerful and expressive.
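
A minimal sketch of the mismatch between the two models described above.
TreeFeature and GraphFeature are hypothetical types written for this
illustration; they are neither BioJava nor Chado classes, and the
relationship names are only examples. In the tree model every feature has at
most one parent, so a parent lookup needs no arguments. In the graph model a
"parent" exists only relative to a relationship type, and there may be
several of them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree-style feature: at most one parent, so getParent() is unambiguous.
class TreeFeature {
    final String name;
    private final TreeFeature parent; // null for a top-level feature

    TreeFeature(String name, TreeFeature parent) {
        this.name = name;
        this.parent = parent;
    }

    TreeFeature getParent() {
        return parent;
    }
}

// Hypothetical graph-style feature: any number of typed links to other features.
class GraphFeature {
    final String name;
    private final Map<String, List<GraphFeature>> links = new HashMap<>();

    GraphFeature(String name) {
        this.name = name;
    }

    // Records a typed relationship from this feature to the target.
    void relate(String type, GraphFeature target) {
        links.computeIfAbsent(type, k -> new ArrayList<>()).add(target);
    }

    // The caller has to name the relationship; zero, one or many features may match.
    List<GraphFeature> related(String type) {
        return links.getOrDefault(type, Collections.<GraphFeature>emptyList());
    }
}

class FeatureModelDemo {
    public static void main(String[] args) {
        TreeFeature gene = new TreeFeature("gene", null);
        TreeFeature exon = new TreeFeature("exon", gene);
        System.out.println(exon.getParent().name); // gene

        GraphFeature exonG = new GraphFeature("exon");
        exonG.relate("part_of", new GraphFeature("transcript-1"));
        exonG.relate("part_of", new GraphFeature("transcript-2"));
        exonG.relate("derives_from", new GraphFeature("clone"));
        System.out.println(exonG.related("part_of").size()); // 2
    }
}
```

Any wrapper that maps the graph onto the tree has to choose one relationship
type and one target per feature, which is where the information gets lost.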

I made one attempt to shoehorn Chado's predecessor into BioJava by 
forcing particular relationships into particular positions in the BJ 
hierarchy but the results are ugly and fragile.  With Chado being more 
generalised than its predecessor, the results have been even more messy.  
For example, BJ uses getParent() to move up the hierarchy but an attempt to 
use this in Chado would be ambiguous - by what relationship would the other 
object be a parent?  Similarly, all our feature selection through filter() 

Hoebeke Mark | 2 Apr 2004 15:11

Re: Parsing GenBank files in Threads

Wow,

replacing the biojava-1.3.1.jar with the one recompiled from the
biojava-live CVS tree seems to solve the problem.

Many thanks for suggesting just that, Matthew, and sorry I didn't start
by checking that. But heck, I'm a Biojava newbie after all.

Mark

-- 
--------------------------Mark.Hoebeke <at> jouy.inra.fr----------------------
Unité Statistique & Génome                                      Unité MIG
+33 (0)1 60 87 38 03                   Tél.          +33 (0)1 34 65 28 85
+33 (0)1 60 87 38 09                   Fax.          +33 (0)1 34 65 29 01
Tour Evry 2, 523 pl. des Terrasses              INRA - Domaine de Vilvert
F - 91000 Evry                              F - 78352 Jouy-en-Josas CEDEX


Christian Gruber | 2 Apr 2004 15:20

Blast Parser oddity

Hi!

I am currently evaluating the XML output of NCBI Blast, and the ability 
of BioJava to parse this output. For this purpose, I ran identical 
blastp and blastn searches twice (i.e. the same sequence against the same 
database with the same parameters), once with the standard output 
and once with XML output ("-m 7"). I then parsed the files either 
with BlastLikeSAXParser (original output), or with BlastXMLParserFacade 
(XML output) and compared the outcome. Surprisingly, I got two different 
results...
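
A minimal sketch of the kind of comparison described above. It does not call
BlastLikeSAXParser or BlastXMLParserFacade; it assumes each parsed result has
already been flattened by hand into a map from field name to value. FieldDiff
is a hypothetical helper and the values in main() are placeholders, not real
parser output.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: compares two flattened result maps field by field.
class FieldDiff {
    // Returns the sorted names of fields whose values differ,
    // including fields that are present on only one side.
    static Set<String> differingKeys(Map<String, String> a, Map<String, String> b) {
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Set<String> diff = new TreeSet<>();
        for (String key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                diff.add(key);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Placeholder values only; fill these in from the two parsed results.
        Map<String, String> fromText = new TreeMap<>();
        fromText.put("program", "value-1");
        fromText.put("queryId", "value-2");

        Map<String, String> fromXml = new TreeMap<>();
        fromXml.put("program", "value-1");
        fromXml.put("queryId", "value-3");
        fromXml.put("version", "value-4");

        System.out.println(differingKeys(fromText, fromXml)); // [queryId, version]
    }
}
```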

Here is a list of the fields that are different:

SeqSimilaritySearchResult:
   Annotation:
     databaseId
     program
     queryId
     version

SeqSimilaritySearchHit:
   subjectId
   queryStrand
   subjectStrand
   Annotation:
     subjectDescription
     subjectId

SeqSimilaritySearchSubHit:
   queryStrand

Wiepert, Mathieu | 7 Apr 2004 16:21

RE: Blast Parser oddity

Hi,

The problem is likely with the BLAST output and not the parsers.  This was addressed on the bioperl list in
January, and on the biojava list back in 2002.  Check out some of the postings at

http://www.biojava.org/pipermail/biojava-l/2002-March/002311.html
http://bioperl.org/pipermail/bioperl-l/2004-January/014749.html

There are more postings on the bioperl archive for January as well, if you need more information
http://bioperl.org/pipermail/bioperl-l/2004-January/

And note also that Jason pointed out other differences

http://bioperl.org/pipermail/bioperl-l/2004-February/014769.html

HTH,

-mat

> -----Original Message-----
> From: biojava-l-bounces <at> portal.open-bio.org
> [mailto:biojava-l-bounces <at> portal.open-bio.org]On Behalf Of Christian
> Gruber
> Sent: Friday, April 02, 2004 7:20 AM
> To: biojava-l <at> open-bio.org
> Subject: [Biojava-l] Blast Parser oddity
> 
> 
> Hi!
> 

Khalil El Mazouari | 8 Apr 2004 18:38

RestrictionEnzymeManager

Hi
I am trying to use the RestrictionEnzymeManager with a new REBASE file. 
I followed the recommendation concerning the 
RestrictionEnzymeManager.properties file and the CLASSPATH, but the 
RestrictionEnzymeManager still loads the default 
RestrictionEnzymeManager.properties file that came with biojava. Any 
idea on what is going wrong?
Thank you.
Khalil
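
A minimal diagnostic sketch for this kind of problem, using only the standard
library. It lists every location on the classpath that provides a given
resource name, in the order the class loader reports them. ResourceLocator is
a hypothetical helper, and the default resource name below is taken from the
message above: how BioJava itself locates the file (including any package
prefix) is an assumption that should be checked against the BioJava source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: shows which copies of a resource are visible on the classpath.
class ResourceLocator {
    // Returns every classpath location providing the resource, in class-loader order.
    static List<URL> locate(String resourceName) {
        try {
            return Collections.list(ClassLoader.getSystemResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed name; pass the real path (with any package prefix) as the first argument.
        String name = args.length > 0 ? args[0] : "RestrictionEnzymeManager.properties";
        List<URL> hits = locate(name);
        if (hits.isEmpty()) {
            System.out.println("not found on classpath: " + name);
        }
        for (URL url : hits) {
            System.out.println(url);
        }
    }
}
```

If more than one location is printed, the copy listed first is normally the
one a plain resource lookup returns, so a default file inside the BioJava jar
can shadow a customised one that sits later on the CLASSPATH.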

-----------------------------------------------------------------------
THIS E-MAIL MESSAGE IS INTENDED ONLY FOR THE USE OF THE INDIVIDUAL OR
ENTITY TO WHICH IT IS ADDRESSED AND MAY CONTAIN INFORMATION THAT IS
PRIVILEGED, CONFIDENTIAL AND EXEMPT FROM DISCLOSURE.
If the reader of this E-mail message is not the intended recipient, you
are hereby notified that any dissemination, distribution or copying of
this communication is strictly prohibited.  If you have received this
communication in error, please notify us immediately at
ablynx <at> ablynx.com. Thank you for your co-operation.
-----------------------------------------------------------------------

