Kristian Rother | 1 Dec 11:18 2010

Re: Features of the GSOC branch ready to be merged


Hi Joao,

Do you have a separate GIT branch for these three features?

I would volunteer to pull them, try a local merge, and run auto & manual
tests.

Best regards,
   Kristian

> Hello all,
>
> I've been looking at the code I wrote for the GSOC to see what is ready to
> be merged in the main branch. I have to thank Kristian and whoever
> participated in the Python & Friends for the input.
>
> From what I gathered, and from my own tests, I believe the following
> functions are solid enough:
>
>
>    1. Bio/PDB/Atom.py
>       <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Atom.py#L75-105>:
>       automatically guessing atom element from atom name
>    2. Bio/PDB/Structure.py
>       1. Building biological unit from REMARK 350 in the header
>          (link <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Structure.py#L78-110>)
>       2. Renumbering residues
>          (link <https://github.com/JoaoRodrigues/biopython/blob/GSOC2010/Bio/PDB/Structure.py#L66-76>

Peter | 1 Dec 11:33 2010

Re: Features of the GSOC branch ready to be merged

On Wed, Dec 1, 2010 at 10:18 AM, Kristian Rother <krother <at> rubor.de> wrote:
>
> Hi Joao,
>
> Do you have a separate GIT branch for these three features?
>
> I would volunteer to pull them, try a local merge, and run auto & manual
> tests.
>
> Best regards,
>   Kristian

I think Joao just has the one branch at the moment. If it
would be feasible to split out the functionality it would be
easier to merge incrementally.

For example, a new branch (from the master) just for
the atom element stuff in Bio.PDB shouldn't be too hard.
If while working on the GSoC changes you didn't mix
up changes in single commits then you (Joao) might
find "git cherry-pick" useful. Otherwise doing a "git diff"
between the GSoC branch and the master for the
Bio.PDB files only could give you a useful patch to
start from. Does any of that make sense?

Peter

João Rodrigues | 1 Dec 12:13 2010

Re: Features of the GSOC branch ready to be merged

Sorry Peter, your email got completely hidden in my mailbox.. gmail bug.

I told Kristian I wouldn't mind at all creating a new branch just for these
features but I really don't know how to do it. I'll look into that git
cherry-pick command and see what I can do :)

Thanks!

João [...] Rodrigues
http://doeidoei.wordpress.com

On Wed, Dec 1, 2010 at 11:33 AM, Peter <biopython <at> maubp.freeserve.co.uk> wrote:

> On Wed, Dec 1, 2010 at 10:18 AM, Kristian Rother <krother <at> rubor.de> wrote:
> >
> > Hi Joao,
> >
> > Do you have a separate GIT branch for these three features?
> >
> > I would volunteer to pull them, try a local merge, and run auto & manual
> > tests.
> >
> > Best regards,
> >   Kristian
>
> I think Joao just has the one branch at the moment. If it
> would be feasible to split out the functionality it would be
> easier to merge incrementally.
>
> For example, a new branch (from the master) just for

João Rodrigues | 1 Dec 13:12 2010

Re: Features of the GSOC branch ready to be merged

Ok, I managed to branch it. There were some other files needing attention
other than Atom.py and IUPACData.py so it took a while to pinpoint them
all.. lesson learned to be careful with commits :)

If you want to test it yourselves, here it is:

https://github.com/JoaoRodrigues/biopython/tree/atom-element/

Best! And thanks for the help :)

João

_______________________________________________
Biopython-dev mailing list
Biopython-dev <at> lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev

João Rodrigues | 1 Dec 18:01 2010

Re: Features of the GSOC branch ready to be merged

Following Peter's comments I changed some stuff.

I also noticed one thing: in the atom name field, ions like CA and CL have
their names starting one column before regular C and N atoms. That allows some
discrimination between CA (alpha carbon) and CA (calcium), for example. I'd
never noticed this before, so I was relying on the hetero_flag to try and
exclude such ions (HETATM), because their guessed element would likely be
wrong in an ambiguous case like this. I have therefore removed the hetero_flag
I'd added to Atom objects and expanded the element guessing logic to all atoms.
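
A minimal sketch of that column convention, not the code in the branch. It
assumes the raw 4-character atom name field is available (leading blanks kept).
The function name guess_element, the small TWO_LETTER_ELEMENTS set and the rule
for wrapped hydrogen names such as "HG11" are assumptions made for
illustration only.

```python
# Hypothetical, deliberately small set of two-letter element symbols.
TWO_LETTER_ELEMENTS = {"CA", "CL", "CU", "FE", "HG", "MG", "MN", "NA", "ZN"}


def guess_element(fullname):
    """Guess an element symbol from the raw 4-character atom name field."""
    if len(fullname) != 4:
        raise ValueError("expected the raw 4-character atom name field")
    name = fullname.upper()
    # Assumption: a name starting with H that fills all four columns is a
    # wrapped hydrogen name (e.g. "HG11"), not mercury.
    if name[0] == "H" and " " not in name:
        return "H"
    # A blank or a digit in the first column means a one-letter element
    # whose symbol sits in the second column (" CA " is an alpha carbon).
    if name[0] == " " or name[0].isdigit():
        return name[1]
    # A name starting in the first column may carry a two-letter symbol
    # ("CA  " is calcium).
    if name[:2] in TWO_LETTER_ELEMENTS:
        return name[:2]
    return name[0]


print(guess_element(" CA "))  # alpha carbon
print(guess_element("CA  "))  # calcium
```

Files written by third-party tools may not follow the convention, so a guess
like this is only as good as the column alignment of the input.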

I also changed the tests in test_PDB.py to reflect this.

Best! And thanks Peter for the comments!

Peter | 1 Dec 18:15 2010

Re: Features of the GSOC branch ready to be merged

On Wed, Dec 1, 2010 at 5:01 PM, João Rodrigues <anaryin <at> gmail.com> wrote:
> I also noticed one thing: metal ions like CA and CL have their names
> starting one character before regular C and N atoms. That allows some
> discrimination between CA (alpha carbon) and CA (calcium) for example. I'd
> never noticed this before, ...

Is this documented in the PDB format definition? More importantly,
do third party tools follow this rule? They are the only reason we
need the code to guess the element in the first place, right? (Since
the PDB provided files should all have the element column).

Peter

Eric Talevich | 1 Dec 18:29 2010

Re: Features of the GSOC branch ready to be merged

On Wed, Dec 1, 2010 at 12:15 PM, Peter <biopython <at> maubp.freeserve.co.uk> wrote:

> On Wed, Dec 1, 2010 at 5:01 PM, João Rodrigues <anaryin <at> gmail.com> wrote:
> > I also noticed one thing: metal ions like CA and CL have their names
> > starting one character before regular C and N atoms. That allows some
> > discrimination between CA (alpha carbon) and CA (calcium) for example. I'd
> > never noticed this before, ...
>
> Is this documented in the PDB format definition? More importantly,
> do third party tools follow this rule? They are the only reason we
> need the code to guess the element in the first place, right? (Since
> the PDB provided files should all have the element column).
>
>
I think we can rely on this convention. I'd read this somewhere else (maybe on
one of Andrew Dalke's pages) but didn't think to apply it to João's problem.

Here's a reference:
http://bmerc-www.bu.edu/needle-doc/latest/atom-format.html#pdb-atom-name-anomalies

-Eric

João Rodrigues | 1 Dec 19:34 2010

Re: Features of the GSOC branch ready to be merged

http://www.wwpdb.org/documentation/format32/sect9.html

Well, there doesn't seem to be a written rule, but it is shown in the
documentation of the format.

Also, do you think it's worth including a sanity check for those elements
that have been assigned? For example when parsing a file checking if the
assigned element truly corresponds to what it should be and issuing a
warning or even an exception if otherwise?
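
A rough sketch of what such a check could look like, with the warn-or-raise
choice exposed as a flag. The names check_element, first_two_columns and
strict are hypothetical and are not part of Bio.PDB. The default guess simply
reads the first two columns of the atom name field, so it would need a smarter
guess function for names that break the column convention.

```python
import warnings


def first_two_columns(fullname):
    """Hypothetical guess: the symbol held in the first two name columns."""
    return fullname[:2].strip().upper()


def check_element(fullname, element, guess=first_two_columns, strict=False):
    """Compare the recorded element with the one implied by the atom name.

    Returns True when they agree. On a mismatch it raises ValueError if
    strict is set, otherwise it issues a warning and returns False.
    """
    recorded = element.strip().upper()
    expected = guess(fullname)
    if recorded == expected:
        return True
    message = "Atom %r: element column says %r, name implies %r" % (
        fullname, recorded, expected)
    if strict:
        raise ValueError(message)
    warnings.warn(message)
    return False


check_element("CA  ", "CA")  # calcium, consistent
check_element(" CA ", " C")  # alpha carbon, consistent
```

Passing the guess function in as a parameter keeps the check separate from
whatever guessing rule ends up being used.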

Nick Loman | 2 Dec 11:50 2010

Bio.AlignIO, Bio.Nexus, MrBayes, polymorphic sites, maximum line length

Hi there

Two questions for the developers.

1) I wanted to extract polymorphic sites from a multiple alignment and 
ended up with some code like this:

    from Bio import AlignIO

    alignment = AlignIO.read(fn, "nexus")
    rows = len(alignment)
    new_alignment = None
    for n in range(alignment.get_alignment_length()):
        aln = alignment[:,n]  # column n as a string
        # Keep the column unless every row has the same character
        if aln[0] * rows != aln:
            if new_alignment:
                new_alignment += alignment[:,n:n+1]
            else:
                new_alignment = alignment[:,n:n+1]
    if new_alignment:
        # Passing a filename lets AlignIO open and close the file itself
        AlignIO.write(new_alignment, fn + ".ply", "nexus")

Is this the best way of doing it? Would a method call in AlignIO to do 
the same thing be useful to others?

2) When outputting long alignments in Nexus format, MrBayes refuses to 
read the resulting files saying that the maximum line length is 19900 
characters. I'm assuming that is not the maximum input to MrBayes and 
that it can handle longer alignments if they are split in some way. 
Would it be possible for Bio.Nexus to split alignments in the 
appropriate format?
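
For the second question, one way long rows are usually split in NEXUS is the
interleaved layout, where the matrix is written in blocks of a fixed width.
The sketch below only builds the matrix body lines from plain (name, sequence)
pairs. The function name interleaved_matrix_lines and the default width are
assumptions, and whether MrBayes accepts the result would still need testing.

```python
def interleaved_matrix_lines(rows, width=70):
    """Yield matrix body lines for (name, sequence) pairs in blocks.

    Each block holds at most `width` columns of every sequence, and blocks
    are separated by one blank line.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    rows = list(rows)
    if not rows:
        return
    length = len(rows[0][1])
    if any(len(seq) != length for _, seq in rows):
        raise ValueError("all sequences must have the same length")
    # Pad names so the sequence segments line up in every block.
    pad = max(len(name) for name, _ in rows) + 2
    for start in range(0, length, width):
        if start:
            yield ""
        for name, seq in rows:
            yield name.ljust(pad) + seq[start:start + width]


for line in interleaved_matrix_lines([("a", "ACGTAC"), ("b", "ACGTTT")], 4):
    print(line)
```

A complete file would also need the interleave setting declared in the FORMAT
line of the data block.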


Peter | 2 Dec 12:43 2010

Re: Bio.AlignIO, Bio.Nexus, MrBayes, polymorphic sites, maximum line length

On Thu, Dec 2, 2010 at 10:50 AM, Nick Loman <n.j.loman <at> bham.ac.uk> wrote:
> Hi there
>
> Two questions for the developers.
>
> 1) I wanted to extract polymorphic sites from a multiple alignment and ended
> up with some code like this:
>
>   alignment = AlignIO.read(fn, "nexus")
>   rows = len(alignment)
>   new_alignment = None
>   for n in xrange(alignment.get_alignment_length()):
>       aln = alignment[:,n]
>       if aln[0] * rows != aln:
>           if new_alignment:
>               new_alignment += alignment[:,n:n+1]
>           else:
>               new_alignment = alignment[:,n:n+1]
>   if new_alignment:
>       AlignIO.write([new_alignment], open(fn + ".ply", "w"), "nexus")
>
> Is this the best way of doing it? Would a method call in AlignIO to
> do the same thing be useful to others?

I've got some code somewhere for iterating over the columns of
the alignment, and think I filed an enhancement bug for this.
Would that do what you want?
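
To illustrate the idea of iterating over columns (this is not the code
mentioned above), here is a small sketch on plain strings. The names
iter_columns and polymorphic_positions are hypothetical, and the rows are
assumed to have equal length.

```python
def iter_columns(sequences):
    """Yield each alignment column as a string, left to right."""
    # zip(*rows) transposes the rows; it stops at the shortest row.
    for column in zip(*sequences):
        yield "".join(column)


def polymorphic_positions(sequences):
    """Return the indices of columns with more than one distinct character."""
    return [index for index, column in enumerate(iter_columns(sequences))
            if len(set(column)) > 1]


rows = ["ACGT", "ACGA", "TCGT"]
print(polymorphic_positions(rows))
```

Gap and ambiguity characters count as distinct characters here, so a real
implementation would have to decide how to treat them.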

> 2) When outputting long alignments in Nexus format, MrBayes refuses to read
> the resulting files saying that the maximum line length is 19900 characters.

