Paul B | 1 Nov 20:50 2009

Questions on StructureBuilder, MMCIFParser, and MMCIFlex


Hi,
 
I'm a computer science guy trying to figure out some chemistry logic to support my thesis, so bear with me!
:-) To sum it up, I'm not sure MMCIFParser is handling ATOM and MODEL records correctly because of this code
in MMCIFParser:
            if fieldname=="HETATM":
                hetatm_flag="H"
            else:
                hetatm_flag=" "
This causes ATOM (and potentially MODEL) records to die as seen in the exception below (I think!)
 
My questions are:
1. Am I correct that the current code is insufficient?
2. What additional logic beyond just recognizing whether it's a HETATM, ATOM or MODEL record needs to be added?
 
Thanks!
 
Paul

 
Background:
I understand MMCIFlex.py et cetera is commented out in the Windows setup.py package due to difficulties
compiling it. So I re-wrote MMCIFlex strictly in Python to emulate what I THINK the original MMCIFlex did.
My version processes a .cif file one line at a time (readline()) then passes tokens back to MMCIF2Dict at 
each call to get_token(). That seems to work fine for unit testing of my MMCIFlex and MMCIFDict which I had
to slightly re-write (to ensure it handled how I passed SEMICOLONS line back etc).
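
For readers following along, here is a minimal sketch of the kind of line-at-a-time tokenizer described above. It is not Paul's attached MMCIFlex.py, and the names iter_cif_tokens and _split_line are hypothetical. It assumes only the basic CIF lexical rules: whitespace-separated tokens, single- or double-quoted values, '#' comments, and multi-line text fields delimited by a ';' in the first column.

```python
import io


def _split_line(line):
    """Yield the tokens found on one ordinary (non text-field) line."""
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch == "#":
            # A '#' at the start of a token begins a comment: ignore the rest.
            return
        elif ch in "'\"":
            # A quoted value ends at the same quote character followed by
            # whitespace or the end of the line.
            j = i + 1
            while j < n and not (
                line[j] == ch and (j + 1 == n or line[j + 1].isspace())
            ):
                j += 1
            yield line[i + 1:j]
            i = j + 1
        else:
            # A bare token runs until the next whitespace character.
            j = i
            while j < n and not line[j].isspace():
                j += 1
            yield line[i:j]
            i = j


def iter_cif_tokens(handle):
    """Yield CIF tokens one at a time, reading the handle line by line."""
    in_text_field = False
    text_lines = []
    for line in handle:
        line = line.rstrip("\n")
        if in_text_field:
            if line.startswith(";"):
                # A closing ';' ends the field: emit it as a single token.
                yield "\n".join(text_lines)
                in_text_field = False
                text_lines = []
            else:
                text_lines.append(line)
        elif line.startswith(";"):
            # An opening ';' in column one starts a multi-line text field.
            in_text_field = True
            text_lines = [line[1:]]
        else:
            yield from _split_line(line)


# Hand-written sample, not taken from the real 2beg file.
sample = """\
data_2BEG
_entry.id 2BEG
_citation.title 'An "example" title'  # trailing comment
_struct.title
;first line
second line
;
loop_
_atom_site.group_PDB
_atom_site.id
ATOM 1
"""

tokens = list(iter_cif_tokens(io.StringIO(sample)))
```

A consumer such as MMCIF2Dict could pull from the generator with next(), which plays roughly the role of the get_token() call mentioned above.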
 
However when I try and use this with MMCIFParser against the 2beg .cif file which has no HETATM records and,
as I understand the definition, no disordered atoms I get:
[...]

Peter | 1 Nov 22:28 2009

Re: Questions on StructureBuilder, MMCIFParser, and MMCIFlex

On Sun, Nov 1, 2009 at 7:50 PM, Paul B <tallpaulinjax <at> yahoo.com> wrote:
>
> Hi,
>
> I'm a computer science guy trying to figure out some chemistry logic
> to support my thesis, so bear with me! :-) To sum it up, I'm not sure
> MMCIFParser is handling ATOM and MODEL records correctly
> because of this code in MMCIFParser:
>             if fieldname=="HETATM":
>                 hetatm_flag="H"
>             else:
>                 hetatm_flag=" "
> This causes ATOM (and potentially MODEL) records to die as seen
> in the exception below (I think!)

I'll answer that below.

> My questions are:
> 1. Am I correct that the current code is insufficient?
> 2. What additional logic beyond just recognizing whether it's a
> HETATM, ATOM or MODEL record needs to be added?
>
> Thanks!
>
> Paul
>
>
> Background:
> I understand MMCIFlex.py et cetera is commented out in the
> Windows setup.py package due to difficulties compiling it.
[...]

Paul B | 2 Nov 14:21 2009

Re: Questions on StructureBuilder, MMCIFParser, and MMCIFlex

I'll use the conventional response technique in future emails! :-)
 
Hi Peter,
 
1. "Did you mean to not CC the list?": Sorry, I replied to your email 
address instead of the CC: address! 
2. Peter: "I should be able to run the flex code and your new code side by side,
for testing and profiling. Not sure when I'll find the time exactly, but
we'll see. Examples will help as while I know plenty about PDB files,
I've not used CIF at all": I'd be glad to run the tests myself as well 
and I have the time! :-) But without the flex module installed and 
operational the only way I can think of is with pickle'd .cif dicts.
3. Peter: "P.S. Are you OK with making this contribution under the Biopython
license?" Absolutely I'd be glad to contribute to biopython!
 
This was in response to my followup email to Peter:
"Hi Peter:

Paul: So I re-wrote MMCIFlex strictly in Python to emulate (the lex based MMCIFlex)

Peter: Now that would be very handy (IMO), if you can get it working.
Have you benchmarked it against the flex code? Have you been able 
to test the flex code? If not, could you give me a tiny script using the 
2beg cif file which should work? If that works, then the problem is in 
your flex replacement code.

Paul: It already works, but I have no way to benchmark it against the
flex code myself. Perhaps someone could pickle a half dozen PDB .cif files and 
send me the resultant files? I can then run a test against each one. 
I'll also clean up the code on both the new MMCIFlex.py as well as the 
[...]

Paul B | 2 Nov 23:03 2009

Re: Questions on StructureBuilder, MMCIFParser, and MMCIFlex

Hi Peter,
 
I have attached drafts of MMCIFlex.py and MMCIFParser.py. They have __main__ methods that perform decent
testing.  On my system, I have replaced their same-named counterparts  in the appropriate folders.
Please note, however, this version of MMCIFlex.py and MMCIFParser.py must work together as a pair! So,
I don't know how you guys handle that: give them new names, or replace old files?
 
I can't test them further right now because I believe MMCIFParser needs corrections. For example, the
PDBParser.py calls the following methods in its StructureBuilder object:
structure_builder.init_structure
structure_builder.set_header
structure_builder.set_line_counter
structure_builder.init_model
structure_builder.init_seg
structure_builder.init_chain
structure_builder.init_residue
structure_builder.init_atom
structure_builder.set_anisou
structure_builder.set_siguij
structure_builder.set_sigatm

 
However, MMCIFParser only calls:
structure_builder.init_structure
structure_builder.init_model
structure_builder.init_seg
structure_builder.init_chain
structure_builder.init_residue
structure_builder.init_atom
structure_builder.set_anisou
[...]

bugzilla-daemon | 3 Nov 14:20 2009

[Bug 2929] NCBIXML PSI-Blast parser should gather all information from XML blastgpg output

http://bugzilla.open-bio.org/show_bug.cgi?id=2929

------- Comment #4 from biopython-bugzilla <at> maubp.freeserve.co.uk  2009-11-03 08:20 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > What specifically is our parser failing to extract from this example PSI
> > BLAST XML file?
> 
> (Sorry, I've been away)
> Well, currently the code tries to get several pieces of information from the
> Blast.Record.PSIBlast (brecord):
> 
> brecord.converged

There is a <Iteration_message>CONVERGED</Iteration_message> line in the XML we
should be able to use here. I don't recall seeing this in blastpgp output from
older versions of BLAST.
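
As an illustration only, the check for that line could look something like the sketch below. It uses the standard library's xml.etree.ElementTree rather than Bio.Blast.NCBIXML, the sample XML is hand-written and heavily trimmed, and psiblast_converged is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed stand-in for real PSI-BLAST XML output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_message>CONVERGED</Iteration_message>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def psiblast_converged(xml_text):
    """Return True if any <Iteration_message> element says CONVERGED."""
    root = ET.fromstring(xml_text)
    return any(
        (message.text or "").strip() == "CONVERGED"
        for message in root.iter("Iteration_message")
    )


converged = psiblast_converged(SAMPLE_XML)
```

Whether the real parser should expose this as brecord.converged or in some other way is a separate design question that this sketch does not settle.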

> brecord.query
> brecord.query_letters

Those work (query and query_letters).

> brecord.rounds
> brecord.rounds.alignments
> brecord.rounds.alignments.title
> brecord.rounds.alignments.hsps

Those also work but not via rounds, but as separate BLAST record objects.
See mailing list discussion regarding PSI-BLAST and multiple BLAST queries.
[...]

Paul B | 3 Nov 17:36 2009

Re: Questions on StructureBuilder, MMCIFParser, and MMCIFlex


Hi,
 
I have found the reason why MMCIFParser is dying. It has no provision for more than one model, so when a second
model comes around with the same chain and residue the program throws an exception.
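
To illustrate the clash, here is a small standalone sketch that does not use Bio.PDB at all. The column lists stand in for values read from the mmCIF dictionary (the model numbers would come from "_atom_site.pdbx_PDB_model_num"); the data and the group_atoms_by_model helper are made up for the example.

```python
from collections import OrderedDict

# Made-up columns for two NMR models containing the same chain and residue.
model_list = ["1", "1", "2", "2"]
chain_list = ["A", "A", "A", "A"]
resseq_list = ["17", "17", "17", "17"]
atom_list = ["N", "CA", "N", "CA"]

# Without the model number, the second model repeats keys already seen,
# which is the situation that makes a single-model builder complain.
flat_keys = list(zip(chain_list, resseq_list, atom_list))
has_clash = len(set(flat_keys)) < len(flat_keys)


def group_atoms_by_model(models, chains, resseqs, atoms):
    """Group atom names by model number, then by (chain, residue)."""
    grouped = OrderedDict()
    for model_num, chain_id, resseq, atom_name in zip(
        models, chains, resseqs, atoms
    ):
        residues = grouped.setdefault(model_num, OrderedDict())
        residues.setdefault((chain_id, resseq), []).append(atom_name)
    return grouped


by_model = group_atoms_by_model(
    model_list, chain_list, resseq_list, atom_list
)
```

In the parser itself, the equivalent step would presumably be to start a new model whenever the model number changes, before adding that row's chain and residue.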
 
I will be joining github to submit the required changes. I haven't used github before, and this is my first
open source project so please give me a few days to acclimate.
 
My mods so far are as follows in MMCIFParser.py (and require the MMCIFlex.py and MMCIF2Dict.py files I will
be submitting via github, and have submitted to Peter privately.)
 
Change the __doc__ setting:
#Mod by Paul T. Bathen to reflect MMCIFlex built solely in Python
__doc__="mmCIF parser (implemented solely in Python, no lex/flex/C code needed)" 

Insert the following model_list line:
        occupancy_list=mmcif_dict["_atom_site.occupancy"]
        fieldname_list=mmcif_dict["_atom_site.group_PDB"]
        #Added by Paul T. Bathen Nov 2009
        model_list=mmcif_dict["_atom_site.pdbx_PDB_model_num"]
        try:

 
Make the following changes:
        #Modified by Paul T. Bathen Nov 2009: comment out this line
        #current_model_id=0
        structure_builder=self._structure_builder
        structure_builder.init_structure(structure_id)
        #Modified by Paul T. Bathen Nov 2009: comment out this line
[...]

Kyle Ellrott | 3 Nov 17:46 2009

Re: [Biopython] Using SeqLocation to extract subsequence

(Moving this thread to Biopython-dev)

I've hacked together some code, and tested it against the bacterial genome
library I had on hand (of course, eukaryotic features will be more
complicated, so will need to test against them next).  Examples of 'exotic'
feature location would be helpful.
I've posted the code below.  I'll be moving it into my git fork, and add
some testing.  Any thoughts where it should go?  It seems like it would best
work as a SeqRecord method.

def FeatureIDGuess( feature ):
    id = "N/A"
    try:
        id = feature.qualifiers['locus_tag'][0]
    except KeyError:
        try:
            id = feature.qualifiers['plasmid'][0]
        except KeyError:
            pass
    return id

def FeatureDescGuess( feature ):
    desc = ""
    try:
        desc=feature.qualifiers['product'][0]
    except KeyError:
        pass
    return desc

def ExtractFeatureDNA( record, feature ):
[...]

Peter | 3 Nov 18:09 2009

Re: [Biopython] Using SeqLocation to extract subsequence

On Tue, Nov 3, 2009 at 4:46 PM, Kyle Ellrott <kellrott <at> gmail.com> wrote:
> (Moving this thread to Biopython-dev)
>
> I've hacked together some code, and tested it against the bacterial genome
> library I had on hand (of course, eukaryotic features will be more
> complicated, so will need to test against them next).  Examples of 'exotic'
> feature location would be helpful.
> I've posted the code below.  I'll be moving it into my git fork, and add
> some testing.  Any thoughts where it should go?  It seems like it would best
> work as a SeqRecord method.

i.e. Option (4) of this list of ideas?
http://lists.open-bio.org/pipermail/biopython-dev/2009-October/006922.html

Peter

P.S.

def FeatureDescGuess( feature ):
   desc = ""
   try:
       desc=feature.qualifiers['product'][0]
   except KeyError:
       pass
   return desc

Could be just:

def FeatureDescGuess( feature ):
   return feature.qualifiers.get('product', [""])[0]
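
The same idea could be applied to FeatureIDGuess. The sketch below is only a suggestion in the same spirit, not code from the thread: FakeFeature is a hypothetical stand-in for a SeqFeature with a qualifiers dictionary, and unlike the original it skips a qualifier whose value list is empty instead of raising IndexError.

```python
class FakeFeature(object):
    """Hypothetical stand-in for a feature with a qualifiers dictionary."""

    def __init__(self, qualifiers):
        self.qualifiers = qualifiers


def feature_id_guess(feature):
    """Return the first locus_tag, else the first plasmid, else 'N/A'."""
    for key in ("locus_tag", "plasmid"):
        values = feature.qualifiers.get(key)
        if values:
            return values[0]
    return "N/A"


with_locus = feature_id_guess(FakeFeature({"locus_tag": ["b0001"]}))
with_plasmid = feature_id_guess(FakeFeature({"plasmid": ["pXO1"]}))
with_neither = feature_id_guess(FakeFeature({"product": ["thrL"]}))
```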
[...]

Peter | 3 Nov 18:13 2009

Re: Questions on StructureBuilder, MMCIFParser, and MMCIFlex

On Tue, Nov 3, 2009 at 4:36 PM, Paul B <tallpaulinjax <at> yahoo.com> wrote:
>
> Hi,
>
> I have found the reason why MMCIFParser is dying. It has no provision
> for more than one model, so when a second model comes around with
> the same chain and residue the program throws an exception.

Please file a bug report on bugzilla for that. I guess no-one has tried
NMR CIF data with the parser before (!).

> I will be joining github to submit the required changes. I haven't used
> github before, and this is my first open source project so please give
> me a few days to acclimate.

If you like - great. Otherwise we can manage with patches via an
enhancement bug on bugzilla.

> My mods so far are as follows in MMCIFParser.py (and require the
> MMCIFlex.py and MMCIF2Dict.py files I will be submitting via github,
> and have submitted to Peter privately.)

Actually, I think that ended up on the mailing list:
http://lists.open-bio.org/pipermail/biopython-dev/2009-November/006938.html

> The only difference is the PDBParser incorrectly states the first model as 0
> when it should be 1: there is an explicit MODEL line in pdb2beg.ent. So all
> the models are off by one in 2beg when parsed by PDBParser.py. I can
> look into the bug in PDBParser.py and submit it if desired?

[...]

bugzilla-daemon | 3 Nov 18:19 2009

[Bug 2731] Adding .upper() and .lower() methods to the Seq object

http://bugzilla.open-bio.org/show_bug.cgi?id=2731

------- Comment #5 from biopython-bugzilla <at> maubp.freeserve.co.uk  2009-11-03 12:19 EST -------
Created an attachment (id=1389)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1389&action=view)
Patch to Bio/Seq.py

Compared to the earlier patch, this takes the less invasive approach of only
editing Bio/Seq.py (covering both Seq and UnknownSeq, with doctests), but has
the downside that it is not easy to deal with gapped alphabets etc nicely.

Adding (private) upper/lower methods as outlined in the earlier patch does seem
a better plan.

