Peter Cock | 3 Oct 2011 13:20

Enhancements to Bio.Graphics.BasicChromosome

Hi Brad (et al),

You might have seen on Twitter at the end of last week I mentioned
some work to extend Brad's Bio.Graphics.BasicChromosome to allow
features within a chromosome segment, optionally with labels.

The branch is here:
https://github.com/peterjc/biopython/tree/chr_diag

I put together a non-trivial example of showing the tRNA genes in
Arabidopsis as a unit test in test_GraphicsChromosome.py - this is
deliberately showing too many features in order to check the label
placement algorithm:

http://twitpic.com/6sgr1m
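
For readers wondering what a label placement step has to do, here is a minimal sketch. It is not the code from the branch: the function name spread_labels and its parameters are made up for illustration, and it assumes labels are stacked along one axis and only need a minimum gap between them.

```python
def spread_labels(positions, min_gap, lower, upper):
    """Return label positions at least min_gap apart within [lower, upper].

    Hypothetical helper for illustration only, not the Biopython code.
    The result is in sorted order of the input positions.
    """
    if not positions:
        return []
    if (len(positions) - 1) * min_gap > upper - lower:
        raise ValueError("Too many labels for the available space")
    placed = []
    # Forward pass: push each label just far enough to clear the previous one.
    for pos in sorted(positions):
        pos = max(pos, lower)
        if placed:
            pos = max(pos, placed[-1] + min_gap)
        placed.append(pos)
    # Backward pass: pull back any labels that ran past the upper bound.
    limit = upper
    for i in range(len(placed) - 1, -1, -1):
        if placed[i] > limit:
            placed[i] = limit
        limit = placed[i] - min_gap
    return placed
```

Each label stays as close to its feature as the gap allows, which is why a dense cluster of features fans out into the "legs" of the diagram.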

This kind of figure is also used for showing SNP placement and genetic
marker loci used in breeding etc.

If I had put more (or a more uniform set of) features you'd get
something worthy of the nickname "millipede diagram", looking like a
segmented body (the chromosome) with thousands of legs (the lines for
the labels).

This isn't quite backwards compatible - the old code draws the
chromosomes left aligned within their allocated space, while I put
them centrally in order to draw labels on either side.

Iddo sounded enthusiastic on Twitter. Does this look worth including
as is? Would someone (doesn't have to be Brad) like to test/review it
[...]

Kevin Jacobs | 3 Oct 2011 23:28

Re: Enhancements to Bio.Graphics.BasicChromosome

On Mon, Oct 3, 2011 at 7:20 AM, Peter Cock <p.j.a.cock <at> googlemail.com> wrote:

> You might have seen on Twitter at the end of last week I mentioned
> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
> features within a chromosome segment, optionally with labels.
>
>

This looks to be extremely useful.  Is there any support for layouts to
stack or pack chromosomes?  I'm thinking of diagrams for humans, where we
don't fit as well in linear displays.  I also think supporting chromosome
bands would be extremely useful.  These could include full cytobands,
centromeres, euchromatic vs heterochromatic regions, user configurable bands
(e.g. linkage regions, IBD blocks, etc.)

The figure shows off what I'm thinking about the banding and layout, even
though it uses colored circles instead of text labels:
http://www.genome.gov/multimedia/illustrations/GWAS_2011_1.pdf

If there is interest, I may have some time to work on these features once
the basic infrastructure is stable.

Best regards,
-Kevin

Peter Cock | 4 Oct 2011 00:24

Re: Enhancements to Bio.Graphics.BasicChromosome

On Monday, October 3, 2011, Kevin Jacobs <jacobs <at> bioinformed.com> <bioinformed <at> gmail.com> wrote:
> On Mon, Oct 3, 2011 at 7:20 AM, Peter Cock <p.j.a.cock <at> googlemail.com> wrote:
>
>> You might have seen on Twitter at the end of last week I mentioned
>> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
>> features within a chromosome segment, optionally with labels.
>>
>>
>
> This looks to be extremely useful.  Is there any support for layouts to
> stack or pack chromosomes?  I'm thinking of diagrams for humans, where we
> don't fit as well in linear displays.  I also think supporting chromosome
> bands would be extremely useful.  These could include full cytobands,
> centromeres, euchromatic vs heterochromatic regions, user configurable bands
> (e.g. linkage regions, IBD blocks, etc.)
>
> The figure shows off what I'm thinking about the banding and layout, even
> though it uses colored circles instead of text labels:
> http://www.genome.gov/multimedia/illustrations/GWAS_2011_1.pdf
>
> If there is interest, I may have some time to work on these features once
> the basic infrastructure is stable.
>
> Best regards,
> -Kevin

Hi Kevin,
[...]

Keith Hughitt | 4 Oct 2011 13:31

Creating a NCBIFastaIterator

Hi all,

I was thinking recently that it would be nice if the FASTA file reader were
able to check for known formats (e.g. NCBI) and then use that information to
choose better values for name, id, etc.

After some discussion with Peter Cock on GitHub, however, he convinced me
that this would be problematic in terms of backwards compatibility, and that
instead a better approach might be to add a new sub-format ("fasta-ncbi") to
the list of supported format readers.

This could go something like:

1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
files. Add it to the mapping of iterators.
2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
the result by assigning a new id, name, etc. (other suggestions?)
3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)
4. Modify code that interacts with NCBI services which return FASTA files
and have it return an NCBIFastaIterator (first use a deprecation warning to
let users know of the pending change?)
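
Steps 1 and 2 could be sketched roughly as follows. This is a standard-library-only illustration, not Biopython code: simple_fasta_iterator, ncbi_fasta_iterator, FORMAT_TO_ITERATOR and parse are hypothetical names standing in for the real FastaIO functions and the SeqIO format mapping, and records are plain dicts rather than SeqRecord objects. Using the accession without its version as the name is also just an assumption.

```python
import io


def simple_fasta_iterator(handle):
    """Yield (title, sequence) pairs from a FASTA handle (stand-in parser)."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def ncbi_fasta_iterator(handle):
    """Wrap the plain iterator and assign id/name from NCBI-style titles."""
    for title, seq in simple_fasta_iterator(handle):
        first_word = title.split(None, 1)[0] if title.strip() else ""
        fields = first_word.split("|")
        # NCBI-style first word: gi|<number>|<db>|<accession.version>|
        if len(fields) >= 4 and fields[0] == "gi":
            rec_id = fields[3]
        else:
            # Other headers pass through unchanged, to keep the sketch short.
            rec_id = first_word
        yield {"id": rec_id, "name": rec_id.split(".")[0],
               "description": title, "seq": seq}


# Hypothetical mapping of format names to iterators, including the new sub-format.
FORMAT_TO_ITERATOR = {
    "fasta": simple_fasta_iterator,
    "fasta-ncbi": ncbi_fasta_iterator,
}


def parse(handle, fmt):
    """Look up the iterator registered for fmt and apply it to the handle."""
    return FORMAT_TO_ITERATOR[fmt](handle)


example = ">gi|12345|ref|NM_000001.1| example\nACGT\nTT\n"
for record in parse(io.StringIO(example), "fasta-ncbi"):
    print(record["id"], record["seq"])
```

Because the existing "fasta" entry is untouched, current users see no change; only callers who ask for "fasta-ncbi" get the new ids.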

Does this sound like it would be a useful feature? What about the basic
approach outlined above? Any suggestions?

Keith

Peter Cock | 4 Oct 2011 13:46

Re: Creating a NCBIFastaIterator

On Tue, Oct 4, 2011 at 12:31 PM, Keith Hughitt <keith.hughitt <at> gmail.com> wrote:
> Hi all,
>
> I was thinking recently that it would be nice if the FASTA file reader were
> able to check for known formats (e.g. NCBI) and then use that information to
> choose better values for name, id, etc.
>
> After some discussion with Peter Cock on GitHub, however, he convinced me
> that this would be problematic in terms of backwards compatibility, and that
> instead a better approach might be to add a new sub-format ("fasta-ncbi") to
> the list of supported format readers.
>
> This could go something like:
>
> 1. Create a new function in SeqIO.FastaIO for parsing NCBI-formatted FASTA
> files. Add it to the mapping of iterators.

Yes.

> 2. FastaIO.NCBIFastaIterator will simply call FastaIterator and then modify
> the result by assigning a new id, name, etc. (other suggestions?)

Store the GI number in the SeqRecord's annotation under key "gi"
to match the GenBank parser. There may be other things like this.

If the FASTA header does not match the NCBI style, that should
probably trigger an exception.
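
As a rough sketch of those two points (standard library only; parse_ncbi_title and the regular expression are illustrative assumptions, not existing Biopython code), a header parser could pull out the GI number and refuse anything that is not NCBI style. Keeping the GI as a string is an assumption here too.

```python
import re

# Assumed layout: gi|<number>|<db>|<accession.version>|<optional locus> description
_NCBI_TITLE = re.compile(
    r"^gi\|(?P<gi>\d+)\|(?P<db>[A-Za-z]+)\|(?P<acc>[^|\s]+)"
    r"\|(?P<locus>[^|\s]*)\s*(?P<desc>.*)$"
)


def parse_ncbi_title(title):
    """Return (accession, description, annotations) for an NCBI-style title.

    Raises ValueError if the title does not match the assumed NCBI layout.
    """
    match = _NCBI_TITLE.match(title)
    if match is None:
        raise ValueError("Not an NCBI-style FASTA header: %r" % title)
    # Store the GI number under the key "gi", as suggested above.
    annotations = {"gi": match.group("gi")}
    return match.group("acc"), match.group("desc"), annotations


print(parse_ncbi_title("gi|12345|ref|NM_000001.1| example gene"))
```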

> 3. FastaIO.NCBIFastaWriter (modify and subclass FastaIO.FastaWriter?)

[...]

Brad Chapman | 5 Oct 2011 14:03

Re: Enhancements to Bio.Graphics.BasicChromosome


Peter;

> >> You might have seen on Twitter at the end of last week I mentioned
> >> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
> >> features within a chromosome segment, optionally with labels.

This is awesome, thanks for extending it. All of your tweaks are good
improvements, and I'm +1 for including it in the next release. Please
improve away.

Thanks much,
Brad

Kevin Jacobs | 5 Oct 2011 15:16

Re: Enhancements to Bio.Graphics.BasicChromosome

On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock <p.j.a.cock <at> googlemail.com> wrote:

> Notches in the chromosome which I assume are centromeres
> (I can see how that might be added to the Bio code as another
> segment type, similar to the telomeres).
>

Yes-- although the visual style for centromeres need not be precisely as
shown in my example.

> Coloured background regions in the chromosome (should be
> able to do this already), some of which are hatched (not possible
> right now... would have to look into ReportLab's capabilities here).
> This is what you meant by banding?
>

Yes-- being able to show cytobands and custom bands to designate regions
will be very useful for me.  As before, I'm not wed to the cross-hatching,
in fact the standard displays use only grayscale.

> Multiple coloured dots for labels. Doable, but a nice API might
> be tricky.
>

I don't much care about those -- I'd be happy with text labels.

> For layout did you mean the fact this isn't just a row of
> chromosomes left to right, but here there are two rows?
> I'm inclined to say the user should just move things in
> the PDF for a final version using Adobe or Inkscape ;)
[...]

Peter Cock | 5 Oct 2011 15:32

Re: Enhancements to Bio.Graphics.BasicChromosome

On Wed, Oct 5, 2011 at 2:16 PM, Kevin Jacobs wrote:
> On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock wrote:
>>
>> Notches in the chromosome which I assume are centromeres
>> (I can see how that might be added to the Bio code as another
>> segment type, similar to the telomeres).
>
> Yes-- although the visual style for centromeres need not be precisely as
> shown in my example.
>
>>
>> Coloured background regions in the chromosome (should be
>> able to do this already), some of which are hatched (not possible
>> right now... would have to look into ReportLab's capabilities here).
>> This is what you meant by banding?
>
> Yes-- being able to show cytobands and custom bands to designate regions
> will be very useful for me.  As before, I'm not wed to the cross-hatching,
> in fact the standard displays use only grayscale.

OK - simple colours are easy, I can add that to the test case example.

>>
>> Multiple coloured dots for labels. Doable, but a nice API might
>> be tricky.
>
> I don't much care about those -- I'd be happy with text labels.
>

Good.
[...]

Peter Cock | 5 Oct 2011 16:40

Re: Enhancements to Bio.Graphics.BasicChromosome

On Wed, Oct 5, 2011 at 1:03 PM, Brad Chapman <chapmanb <at> 50mail.com> wrote:
>
> Peter;
>
>> >> You might have seen on Twitter at the end of last week I mentioned
>> >> some work to extend Brad's Bio.Graphics.BasicChromosome to allow
>> >> features within a chromosome segment, optionally with labels.
>
> This is awesome, thanks for extending it. All of your tweaks are good
> improvements, and I'm +1 for including it in the next release. Please
> improve away.

Awesome. I've applied the current branch to the trunk, although
I'm not promising there won't be changes to the new stuff between
now and the next release.

In particular, doing the labels (and their placement) for the whole
of a chromosome (and not just for a segment) would allow us to
squeeze in more labels (e.g. in the example I showed, using the
vertical space currently reserved for the telomeres).

Peter

Peter Cock | 5 Oct 2011 23:17

Re: Enhancements to Bio.Graphics.BasicChromosome

On Wed, Oct 5, 2011 at 2:32 PM, Peter Cock <p.j.a.cock <at> googlemail.com> wrote:
> On Wed, Oct 5, 2011 at 2:16 PM, Kevin Jacobs wrote:
>> On Mon, Oct 3, 2011 at 6:24 PM, Peter Cock wrote:
>>> Coloured background regions in the chromosome (should be
>>> able to do this already), some of which are hatched (not possible
>>> right now... would have to look into ReportLab's capabilities here).
>>> This is what you meant by banding?
>>
>> Yes-- being able to show cytobands and custom bands to designate regions
>> will be very useful for me.  As before, I'm not wed to the cross-hatching,
>> in fact the standard displays use only grayscale.
>
> OK - simple colours are easy, I can add that to the test case example.

Done, using some random placements - I didn't manage to find
the real Arabidopsis cytoband data which would have been nicer.
https://github.com/biopython/biopython/commit/24deaca63ba55e28519a4c85650ad74e849f203e

Peter
