andrea | 4 Nov 2002 12:14

Re: [BioPython] Re: BioPython -- confirmation of subscription -- request 820429

confirm 820429

_______________________________________________
BioPython mailing list  -  BioPython <at> biopython.org
http://biopython.org/mailman/listinfo/biopython

Gavin Sherlock | 4 Nov 2002 23:17

[BioPython] 4th MGED Programming Jamboree

Hi all,

	I am pleased to announce that the 4th MGED (Microarray Gene Expression
Data Society) programming jamboree will be taking place at Stanford
University, between December 6th and 10th, 2002.  The first day is
intended as a tutorial to educate people about the MIAME standard, which
indicates what information should be recorded about a microarray
experiment, and MAGE, which consists of an Object Model, an XML format,
and a software toolkit.  Nature, Science, Cell and the Lancet recently
adopted MIAME as a standard which microarray papers should strive for.  
The rest of the jamboree will be devoted to improving the software
toolkit, for which Perl, Java and Python versions exist, at various stages
of maturity.  There is also intended to be a C++ toolkit, though not much
has happened on that front yet.  We also hope to develop use cases for as
much of the model as possible during the jamboree, to ease its adoption.

The web page announcing the jamboree is at:

http://www.dnachip.org/mged/

The driving directions aren't quite accurate yet.

The best (though not necessarily cheapest) hotels, in terms of
location, are the Sheraton in Palo Alto, and the Cardinal.

There is a short registration form so that I can estimate the number of
attendees and book an appropriate room for the tutorial day.  For the
jamboree proper, we will be limited to 24 Net connections, though not
everyone needs to be online all the time.  To be able to give you an IP
address, Stanford does need the hardware address of your laptop's ethernet

Andreas Kuntzagk | 11 Nov 2002 11:34

[BioPython] Change in NCBI efetch cgi?

Just a note for anybody using GenBank.NCBIDictionary or
WWW.NCBI.efetch().

I'm not sure, but I think there was a change in this cgi.

As far as I can remember, until a few days ago

>>> from Bio import GenBank
>>> n = GenBank.NCBIDictionary()
>>> n["AA000001"]

would give a record in GenBank format.
Now I get a different format (which I don't recognise).
If I want GenBank format, I have to use

>>> n=GenBank.NCBIDictionary(database="nucleotide")

My (educated?) guess is that NCBI changed the "native" rettype for the
"sequence" db in efetch.cgi (which are the default values NCBIDictionary
uses.)
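
For anyone who would rather not depend on whatever the server treats as the default, here is a minimal sketch that names the database and return type explicitly when building the request. It only constructs the URL and does not contact NCBI. The endpoint below is the present-day E-utilities address rather than the 2002 efetch.cgi, and the function name build_efetch_url is hypothetical, not part of Biopython.

```python
from urllib.parse import urlencode

# Present-day E-utilities endpoint (an assumption; the 2002 CGI lived elsewhere).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nucleotide", rettype="gb", retmode="text"):
    """Return an efetch URL that states db and rettype instead of relying on defaults.

    Hypothetical helper for illustration only.
    """
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    )
    return EFETCH_URL + "?" + query


# Build (but do not fetch) a request for the accession used above.
print(build_efetch_url("AA000001"))
```

Spelling out rettype means a change in the server-side default should not silently change what the client receives.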

ciao, Andreas

Btw, there seems to be very low traffic on this list lately (except the
usual spam :-( ).
Is there any reason for this? (A meeting of biopython hackers, another
mailing list, biopython being abandoned?)


Estienne Swart | 12 Nov 2002 11:29

[BioPython] Biopython object serialization

Hi All,

I've been wondering about a decent way of storing biopython objects for some time now. It looks like there has been some progress (CVS) on interfacing Biopython with a relational DB system, but is this the best approach? For instance, say you'd actually like to store sequences within the database (which require one of the large text field types), you then find yourself having to deal with relatively long data retrieval times (if memory serves me right, it takes on the order of a couple seconds to retrieve a single sequence from a database containing a few thousand entries, with sequences stored in the medium text field).

Have any of the biopython developers attempted/considered using an object database, such as ZODB, or at least assessed the relative merits of some different approaches to data storage/object persistence?

I recently came across an article about object persistence (http://www-106.ibm.com/developerworks/linux/library/l-pypers.html), by Patrick O'Brien (the name should ring a bell to those of you that read his O'Reilly article on Bioinformatics). He advocates the use of his own solution to persistence, PyPerSyst, which is supposedly faster than ZODB, and simpler to implement too.

Do you think that some benchmarking would be in order (not that I'm volunteering)?

What course will Biopython be pursuing in the near future (as far as object serialization is concerned)? Is there room for alternatives besides those that use relational databases, i.e. will they be competitive as far as performance is concerned?

Cheers

Estienne

--
Estienne Swart
SANBI, UWC Private Bag X17, Bellville 7535
estienne <at> sanbi.ac.za
tel work: +27 21 959 3908
tel home: +27 21 448 8118
fax work: +27 21 959 2512

Patrick K. O'Brien | 12 Nov 2002 19:50

Re: [BioPython] Biopython object serialization

On Tuesday 12 November 2002 04:29 am, Estienne Swart wrote:
> I recently came across an article about object persistence
> (http://www-106.ibm.com/developerworks/linux/library/l-pypers.html),
> by Patrick O'Brien (the name should ring a bell to those of you that
> read his O'Reilly article on Bioinformatics). He advocates the use of
> his own solution to persistence,
> PyPerSyst <http://sourceforge.net/projects/pypersyst/>, which is
> supposedly faster than ZODB, and simpler to implement too.

Thanks for mentioning the article. I hope you liked it.

I think a couple of caveats are in order concerning the PyPerSyst 
approach. First, there is only a basic implementation available right 
now. I think it needs more layers of abstraction to be truly useful to 
support persistence in a completely robust and transparent fashion. 
That will take a while. I'm a perfectionist and I want to do it right.

Second, the entire approach relies on keeping all objects in memory. 
That's why it is faster and simpler than a database and ZODB. Once you 
decide you need to support more objects than can fit in RAM, you need 
to add quite a lot of overhead to check if an object is in memory and 
what state it is in, swapping objects to and from disk and various 
states, etc. All those mechanisms are necessary but also add complexity 
and slow things down. That's why ZODB is bigger and slower.

So while I am an advocate of the PyPerSyst/Prevayler approach, I'm also 
targeting applications that can fit within the primary constraint of 
physical RAM, and Python developers who recognize that this is a new 
project that doesn't yet come with all the bells and whistles that we'd 
like it to have some day. Of course, how long that takes depends in 
part on getting other people involved with the project, so I have to do 
some amount of promotion.

I hope that helps clarify where PyPerSyst might be useful and where it 
might not.
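
To make the "everything lives in RAM" idea concrete, here is a very small sketch of the general pattern: every change is appended to a log on disk, reads are served purely from memory, and the log is replayed at start-up. This is only an illustration of the approach. The class TinyPrevalence and its methods are hypothetical and are not the PyPerSyst API.

```python
import os
import pickle
import tempfile


class TinyPrevalence:
    """Keep all objects in a dict in RAM; log every change so it can be replayed.

    Hypothetical illustration of the pattern, not the PyPerSyst API.
    """

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            self._replay()

    def _replay(self):
        # Rebuild the in-memory state by re-applying each logged change in order.
        with open(self.log_path, "rb") as log:
            while True:
                try:
                    key, value = pickle.load(log)
                except EOFError:
                    break
                self.data[key] = value

    def put(self, key, value):
        # Write the change to disk first, then apply it in memory.
        with open(self.log_path, "ab") as log:
            pickle.dump((key, value), log)
        self.data[key] = value

    def get(self, key):
        # Reads never touch the disk.
        return self.data[key]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "objects.log")
    store = TinyPrevalence(path)
    store.put("AA000001", "ACGTACGT")
    # A fresh instance recovers the state from the log alone.
    reopened = TinyPrevalence(path)
    print(reopened.get("AA000001"))
```

The sketch also shows the limit described above: the whole dict must fit in memory, and nothing here pages objects in or out.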

-- 
Patrick K. O'Brien
Orbtech      http://www.orbtech.com/web/pobrien
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------


Jeffrey Chang | 13 Nov 2002 02:03

Re: [BioPython] Change in NCBI efetch cgi?

On Mon, Nov 11, 2002 at 11:34:03AM +0100, Andreas Kuntzagk wrote:
[GenBank.NCBIDictionary returns a weird format]

> My (educated?) guess is that NCBI changed the "native" rettype for the
> "sequence" db in efetch.cgi (which are the default values NCBIDictionary
> uses.)

Yep, that seems reasonable.  I've gone through and changed the
NCBIDictionary code to use GenBank "gb" format by default.  Thanks
for the bug report and diagnosis!

> Btw, there seems to be very low traffic on this list lately (except the
> usual spam :-( ).
> Is there any reason for this? (A meeting of biopython hackers, another
> mailing list, biopython being abandoned?)

No reason, except that biopython developers are busy working on their
day jobs.  We're still here!

Jeff


Jeffrey Chang | 14 Nov 2002 04:02

Re: [BioPython] Biopython object serialization

On Tue, Nov 12, 2002 at 12:29:33PM +0200, Estienne Swart wrote:
> I've been wondering about a decent way of storing biopython objects for 
> some time now. It looks like there has been some progress (CVS) on 
> interfacing Biopython with a relational DB system, but is this the best 
> approach? For instance, say you'd actually like to store sequences 
> within the database (which require one of the large text field types), 
> you then find yourself having to deal with relatively long data 
> retrieval times (if memory serves me right, it takes on the order of a 
> couple seconds to retrieve a single sequence from a database containing 
> a few thousand entries, with sequences stored in the medium text field).

Aahhh, you're referring to the BioSQL project, I think.  Is this a
theoretical concern, or are you really having problems with the
performance with relational databases?
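
One way to tell a theoretical concern from a real one is to time a lookup directly. The sketch below does that with the sqlite3 module from the Python standard library as a stand-in. It is not the BioSQL schema: the table name, the column names and the accession pattern are all made up for illustration, and an in-memory SQLite database will not behave like a networked MySQL server, so the numbers are not comparable. It only shows the shape of such a measurement, with the sequence stored in a text column and fetched by an indexed key.

```python
import random
import sqlite3
import time


def build_sequence_db(num_entries=3000, seq_length=2000, seed=0):
    """Create an in-memory table of random DNA sequences keyed by accession.

    Table and column names are hypothetical, not taken from BioSQL.
    """
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence (accession TEXT PRIMARY KEY, residues TEXT)"
    )
    rows = (
        ("ACC%06d" % i, "".join(rng.choices("ACGT", k=seq_length)))
        for i in range(num_entries)
    )
    conn.executemany("INSERT INTO sequence VALUES (?, ?)", rows)
    conn.commit()
    return conn


def fetch_sequence(conn, accession):
    """Return the stored sequence for one accession, or None if it is absent."""
    row = conn.execute(
        "SELECT residues FROM sequence WHERE accession = ?", (accession,)
    ).fetchone()
    return None if row is None else row[0]


conn = build_sequence_db()
start = time.perf_counter()
residues = fetch_sequence(conn, "ACC001500")
elapsed = time.perf_counter() - start
print("fetched %d residues in %.6f s" % (len(residues), elapsed))
```

If a lookup by primary key really takes seconds on a table of a few thousand rows, a missing index or the client connection is a more likely suspect than the text column type.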

> Have any of the biopython developers attempted/considered using an 
> object database, such as ZODB, or at least assessed the relative merits 
> of some different approaches to data storage/object persistence?

No, although perhaps it is warranted.

The BioSQL project was originally started by Ewan Birney with bioperl.
Since then, interfaces to it have been written in Python and Java.
Thus, one of the requirements is that the data stored must be
accessible from those other languages as well.  This ruled out ZODB
and many other systems that were not as well supported across
different languages.

That's not to say that Biopython wouldn't benefit from supporting
other types of data storage systems that might have different
performance characteristics than relational databases.  However, unless
you're volunteering, I'm not optimistic that that's going to happen,
since the relational DB approach works well enough for most people.
;)

> I recently came across an article about object persistence 
> (http://www-106.ibm.com/developerworks/linux/library/l-pypers.html), by 
> Patrick O'Brien (the name should ring a bell to those of you that read 
> his O'Reilly article on Bioinformatics). He advocates the use of his own 
> solution to persistence,
> PyPerSyst <http://sourceforge.net/projects/pypersyst/>, which is 
> supposedly faster than ZODB, and simpler to implement too.
> 
> Do you think that some benchmarking would be in order (not that I'm 
> volunteering)?

Dang, not volunteering.

> What course will Biopython be pursuing in the near future (as far as 
> object serialization is concerned)? Is there room for alternatives 
> besides those that use relational databases, i.e. will they be 
> competitive as far as performance is concerned?

For the foreseeable future, we'll be working on improving BioSQL
support.  It seems to be working well enough that there are
no imminent plans to change.

Jeff


Mike Sears | 15 Nov 2002 00:33

[BioPython] NeuralNetworks

Can anyone point me to some example code for a simple BPN (back-propagation
network) using the Bio classes provided in Biopython?

Thanks,

Mike Sears

Michael Sears, Ph.D.
Department of Life Sciences
Indiana State University
Terre Haute, IN 47809

812-237-9638

JP Glutting | 15 Nov 2002 17:07

Re: [BioPython] NeuralNetworks

Hi Mike,

I wrote a little script to try to do MHC binding prediction. It does not 
work as a predictor, but it is an example. There are also some good 
tidbits of information in the modules themselves (like different issues 
related to increases in errors in the validation set).
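
For readers following the attached script: it feeds each peptide to the network as one flat one-hot vector, with 20 positions per residue, so an 8-residue peptide becomes 8 x 20 = 160 inputs. That is presumably why the network is built with 160 input nodes (the 9-residue peptides in the second list would need 180). Below is a standalone sketch of that encoding, written for current Python. It reuses the amino-acid order from the script, but the function name one_hot_peptide is hypothetical.

```python
# Same residue order as the `aas` list in the attached script.
AMINO_ACIDS = "GAVILMFYWHCPKRDEQNST"


def one_hot_peptide(peptide, alphabet=AMINO_ACIDS):
    """Encode a peptide as a flat list of 0/1 values, 20 per residue."""
    vector = []
    for residue in peptide:
        code = [0] * len(alphabet)
        code[alphabet.index(residue)] = 1  # raises ValueError on unknown letters
        vector.extend(code)
    return vector


encoded = one_hot_peptide("SIINFEKL")
print(len(encoded))  # 160 inputs for an 8-residue peptide
print(sum(encoded))  # 8, one "hot" position per residue
```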

Let me know if you have any questions - I have not looked at this in 
months, but I still remember some of the reasons I set it up this way.

And, if anyone out there really knows something about Neural Networks, I 
would love to get some good feedback.

Cheers,

JP

Mike Sears wrote:
> Can anyone point me to some example code for a simple BPN using the Bio 
> classes provided in Biopython?
> 
> Thanks,
> 
> Mike Sears
> 
> Michael Sears, Ph.D.
> Department of Life Sciences
> Indiana State University
> Terre Haute, IN 47809
> 
> 812-237-9638

#!/usr/bin/env python
import random

aas =  ['G','A','V','I','L','M','F','Y','W','H','C','P','K','R','D','E','Q','N','S','T']
datamask = [0] * len(aas)  # all-zero template, copied and set to 1 at one residue's index
kbpeps = ['SIINFEKL', 'HIYEFPQL', 'SNYLFTKL', 'SSLPFQNI', 'RCQIFANI', 'SSWDFITV', 'SSISFCGV',
'LSPFPFDL', 'INFDFPKL', 'QTFDFGRL', 'INYEYAIV', 'TTIHYKYM', 'NAIVFKGL', 'AVYNFATC', 'VYDFFVWL',
'EQYKFYSV', 'AQYKFIYV', 'ANYDFICV', 'VDYNFTIV', 'RGYVYQGL', 'RTYTYEKL', 'RFYRTCKL', 'YAMIYRNL',
'IIYRFLLI', 'SMGIYQIL', 'APGNYPAL', 'KSPWFTTL', 'GVYINTAL', 'ICPMYARV', 'GGPIYRRV', 'GLEEYSAM',
'VYIEVLHL', 'SFIRGTKV', 'VGPRYTNL', 'GAYEFTTL', 'IMIKFNRL', 'QAPGFTYT']
dbpeps = ['FQPQNGQFI', 'GRPKNGCIV', 'VNIRNCCYI', 'FGISNYCQI', 'ASNENMDAM', 'ASNENMETM',
'KVPRNQDWL', 'EGSRNQDWL', 'TSPRNSTVL', 'GILGFVFTL', 'RPAPGSTAP', 'APGSTAPPA', 'FAPGNYPAL',
'RMFPNAPYL', 'QGINNLDNL', 'ILNHNFCNL', 'TNLLNDRVL', 'AMGVNLTSM', 'CCLCLTVFL', 'CSLWNGPHL',
'CKGVNKEYL', 'SAINNYAQK', 'SQVTNPANI', 'IQVGNTRTI', 'SSVVGVWYL', 'KAVYNFATC', 'RAHYNIVTF',
'FTFPNEFPF', 'FKHINHQVV', 'MHYTNWTHI', 'WMHHNMDLI', 'HAGSLLVFM', 'WSKDNLPNG', 'GQAPGFTYT']

def mknet(opn=1, hn=210, ipn=160):
    from Bio.NeuralNetwork.BackPropagation import Layer
    from Bio.NeuralNetwork.BackPropagation.Network import BasicNetwork
    output = Layer.OutputLayer(opn)
    hidden = Layer.HiddenLayer(hn, output)
    input = Layer.InputLayer(ipn, hidden)
    network = BasicNetwork(input, hidden, output)
    return network

def mkinput():
    import sys
    from Bio.NeuralNetwork.Training import TrainingExample
    from Bio.NeuralNetwork.Training import ExampleManager
    ipdata = []
    examples = []
    ipdata += rankpeps(kbpeps, 1)
    #print ipdata
    #print len(ipdata)
    ipdata += mkbaddata(8, 100)
    random.shuffle(ipdata)
    #print ipdata
    #print len(ipdata)
    for ip in ipdata:
    #    print len(ip[0])
        examples.append(TrainingExample(ip[0], ip[1]))
    manager = ExampleManager(training_percent = 0.4, validation_percent = 0.4)
    manager.add_examples(examples)
    return manager

def rankpeps(peps, value=1):
    from copy import copy
    peps.sort()
    outdata = []
    for pep in peps:
        tempdata = []
        for r in pep:
            pcode = copy(datamask)
            pcode[aas.index(r)] = 1
            tempdata += pcode
            #tempdata.append(pcode)
        #print 'Outdata: %s' %(outdata)
        outdata.append([tempdata, [value]])
    return outdata

def mkstop(max_iter=200, min_iter=50, verbose=1):
    from Bio.NeuralNetwork import StopTraining
    stopper = StopTraining.ValidationIncreaseStop(max_iter, min_iter, verbose)
    return stopper

def mkbaddata(plen=8, piter=100):
    from copy import copy
    outdata = []
    for i in range(piter):
        tempdata = []
        for ii in range(plen):
            pcode = copy(datamask)
            pcode[random.randrange(0,20)] = 1
            tempdata += pcode
            #tempdata.append(pcode)
        outdata.append([tempdata, [0]])
    return outdata

def demo():
    network = mknet()
    manager = mkinput()
    stopper = mkstop()
    # Arguments: training examples, validation examples, stop function,
    # learning rate, momentum.
    network.train(manager.train_examples, manager.validation_examples,
                  stopper.stopping_criteria, 0.6, 0.5)
    for test_example in manager.test_examples:
        prediction = network.predict(test_example.inputs)
        print "expected %s, got %s" %(test_example.outputs, prediction)

if __name__ == "__main__":
    demo()

JINLING HUANG | 19 Nov 2002 15:25

[BioPython] Biopython GenBank

Hello,

I am writing a script to parse GenBank records using biopython.
Essentially I just copied the script from the biopython tutorial as
follows:

from Bio import GenBank

gb_file = 'ncbiSamples'

gb_handle = open(gb_file, 'r')
fp1 = open('ncbiSeq', 'w+')

feature_parser = GenBank.FeatureParser()

gb_iterator = GenBank.Iterator(gb_handle, feature_parser)

while 1:
    cur_record = gb_iterator.next()
    if cur_record is None:
        break
    print cur_record.id
    sequence=cur_record.seq.data
    print sequence
    fp1.write('>'+cur_record.id[:-2]+'\n'+sequence+'\n')

The script worked well for some records. Then I got this error message:

Traceback (most recent call last):
  File "getSeq.py", line 13, in ?
    cur_record = gb_iterator.next()
  File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 183, in next
    return self._parser.parse(File.StringHandle(data))
  File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 268, in parse
    self._scanner.feed(handle, self._consumer)
  File "/bio/python2.2/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1252, in feed
    self._parser.parseFile(handle)
  File "/bio/python2.2/lib/python2.2/site-packages/Martel/Parser.py", line 338, in parseFile
    self.parseString(fileobj.read())
  File "/bio/python2.2/lib/python2.2/site-packages/Martel/Parser.py", line 366, in parseString
    self._err_handler.fatalError(result)
  File "/bio/python2.2/lib/python2.2/xml/sax/handler.py", line 38, in fatalError
    raise exception
Martel.Parser.ParserPositionException: error parsing at or beyond character 1525

Has anyone encountered the same problem? How do we get rid of this
problem?
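
One way to narrow this down is to look at the text the parser was reading when it gave up. The traceback shows the iterator handing the parser one record at a time (File.StringHandle(data)), so the character offset in the message is probably counted from the start of the failing record rather than from the start of the file. That is an assumption worth checking. The sketch below uses only the standard library and is written for current Python. Both helper names are hypothetical and neither is part of Biopython.

```python
def split_genbank_records(text):
    """Split GenBank flat-file text into records, each ending at its '//' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip() == "//":
            records.append("".join(current))
            current = []
    return records


def line_at_offset(record_text, offset):
    """Return (line_number, line) for the line holding character `offset` (0-based)."""
    position = 0
    lines = record_text.splitlines(keepends=True)
    for number, line in enumerate(lines, start=1):
        if position + len(line) > offset:
            return number, line.rstrip("\n")
        position += len(line)
    raise ValueError("offset is past the end of the record")


# Tiny made-up input, just to show the calls; real records are much longer.
sample = "LOCUS       A\nORIGIN\n//\nLOCUS       B\n//\n"
records = split_genbank_records(sample)
print(len(records))
print(line_at_offset(records[0], 14))
```

With the real file, count how many records were printed before the crash, take the next one from split_genbank_records, and call line_at_offset on it with 1525 to see which line is at or just before the point where parsing stopped.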

Best wishes,

Jinling Huang
