FW: Proposed changes to map module
Cannon, Ethalinda K [GDCBA] <ekcannon <at> iastate.edu>
2013-05-03 13:57:09 GMT
Sorry, I sent the response below directly to Josh by accident but would like to see if there are additional
reactions, so reposting to the list.
After some more talk amongst ourselves, we're proposing taking this a step further and making two changes
to featurepos:
- add a type_id field
- change the type of mappos to numeric
I think we could also accept the featureposprop solution that Sook uses, but would then like to see that
table become part of the chado schema rather than an add-on table that everyone who needs to store start
and end genetic coordinates will have to create.
In either case, changing the type of featurepos.mappos may be advisable as one of our objectives is to be
able to retrieve features within a range of coordinates, which will mean numeric comparisons. It sounds
like there's a trade-off regarding the data type, with numeric (arbitrary precision) fields being more
accurate and floating-point fields being the faster of the two. I suggest that while speed is desirable,
accuracy is more important for genetic markers and QTLs.
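
To make the range-retrieval goal concrete, here is a minimal sketch of the kind of query we have in mind. It is
only an illustration under several assumptions: an in-memory SQLite database stands in for PostgreSQL, the
featurepos table is cut down to a few columns, and the integer codes START_TYPE and END_TYPE are hypothetical
placeholders for whatever cvterm ids the proposed type_id field would actually reference.

```python
import sqlite3

# Hypothetical stand-ins for cvterm ids; a real Chado type_id would point at cvterm.
START_TYPE = 1
END_TYPE = 2

conn = sqlite3.connect(":memory:")
# Simplified, assumed version of featurepos with the proposed type_id field added.
conn.execute("""
    CREATE TABLE featurepos (
        featurepos_id INTEGER PRIMARY KEY,
        featuremap_id INTEGER NOT NULL,
        feature_id    INTEGER NOT NULL,
        type_id       INTEGER NOT NULL,
        mappos        NUMERIC NOT NULL
    )
""")

# Made-up example positions (in cM) for three features on map set 1.
rows = [
    (1, 101, START_TYPE, 12.5), (1, 101, END_TYPE, 18.25),
    (1, 102, START_TYPE, 40.0), (1, 102, END_TYPE, 55.75),
    (1, 103, START_TYPE, 17.0), (1, 103, END_TYPE, 17.0),
]
conn.executemany(
    "INSERT INTO featurepos (featuremap_id, feature_id, type_id, mappos) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

def features_in_range(conn, featuremap_id, lo, hi):
    """Return feature_ids on one map whose [start, end] interval overlaps [lo, hi]."""
    sql = """
        SELECT s.feature_id
        FROM featurepos AS s
        JOIN featurepos AS e
          ON e.feature_id = s.feature_id
         AND e.featuremap_id = s.featuremap_id
        WHERE s.type_id = ? AND e.type_id = ?
          AND s.featuremap_id = ?
          AND s.mappos <= ? AND e.mappos >= ?
        ORDER BY s.feature_id
    """
    params = (START_TYPE, END_TYPE, featuremap_id, hi, lo)
    return [row[0] for row in conn.execute(sql, params)]

hits = features_in_range(conn, 1, 15.0, 20.0)
print(hits)
```

SQLite does not have PostgreSQL's exact numeric type, so this shows only the shape of the query (a self-join
pairing the start and end rows), not the accuracy behaviour discussed above.
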
Obviously, we don't want to break anything. Is there any sense of how many databases are using the
featurepos table, in addition to GDR and CottonGen? How have changes to existing tables been handled in
From: Cannon, Ethalinda K [GDCBA]
Sent: Thursday, May 02, 2013 3:11 PM
To: Josh Goodman
Subject: RE: [Gmod-schema] Proposed changes to map module
Thanks for your response, Josh. I can see the reasons for your discomfort with non-integer fields.
The reason we would like some sort of non-integer data type is that genetic positions are not integers. How
do you store non-integer positions and map values?
We could multiply them by 100 (or 1000, or 10,000) and store them as integers but then we'd need to be clear
that's what we did so they could be converted back to the proper values for display or calculations. The
numeric type does look better than double precision.
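
As a rough sketch of what the multiply-and-store-as-integer approach would involve, assuming a scale factor of
1000 (the constant SCALE and both function names are made up for illustration, not anything in Chado):

```python
from decimal import Decimal

SCALE = 1000  # hypothetical scale factor; would have to be documented alongside the data

def to_scaled_int(position_cm: str) -> int:
    """Convert a position such as "12.345" (cM) to an integer for storage."""
    scaled = Decimal(position_cm) * SCALE
    # Refuse values that carry more decimal places than the scale can hold.
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{position_cm} has more precision than 1/{SCALE}")
    return int(scaled)

def from_scaled_int(stored: int) -> Decimal:
    """Convert a stored integer back to the original position for display."""
    return Decimal(stored) / SCALE

stored = to_scaled_int("12.345")
restored = from_scaled_int(stored)
print(stored, restored)
```

Every reader and writer of the table would have to agree on SCALE, which is the bookkeeping burden mentioned
above.
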
Not being able to test for equality isn't an issue because genetic positions are approximations and
testing if two are equivalent doesn't seem to make much sense.
Another reason to not use the featureloc table is the need to link the position back to a specific map set
(featuremap) and its unit (usually cM in our case).
We need one-to-two relationships if using the featurepos table (which records only one position) so that
we can record beginning and end coordinates of QTLs and linkage group maps.
At this point, as we talk over the options in our earlier note, we are liking option 2 best. That way
featureloc will be unchanged (and not slowed down by numeric fields) and featurepos already contains a
float field (mappos).
From: Josh Goodman [jogoodma <at> indiana.edu]
Sent: Thursday, May 02, 2013 2:10 PM
To: Cannon, Ethalinda K [GDCBA]
Cc: GMOD Schema/Chado List
Subject: Re: [Gmod-schema] Proposed changes to map module
Hi Ethy, Naama and Steven,
How does the existing one-to-many relationship between feature and
featureloc not meet your needs for modeling ranges or multiple genetic or
cytological positions? In FlyBase, we have many features that have
multiple locations, so I'm not sure I understand what it is you are
trying to address. Perhaps you can give us a use case?
What is your reason for wanting to convert fmin/fmax in featureloc
from an integer to a float? Float types in PostgreSQL come with very
dire warnings about their use.
8.1.3. Floating-Point Types
The data types real and double precision are inexact,
variable-precision numeric types....
Inexact means that some values cannot be converted exactly to the
internal format and are stored as approximations, so that storing and
retrieving a value might show slight discrepancies.
*Comparing two floating-point values for equality might not always
work as expected.*
That last statement makes this a non-starter for me. The better type
to use in place of a float is a numeric, but that is not without
pitfalls of its own.
8.1.2. Arbitrary Precision Numbers
..."However, arithmetic on numeric values is very slow compared to the
integer types, or to the floating-point types described in the next
Doing anything that might slow down location queries is not ideal
unless the benefits outweigh the costs.
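
The equality caveat quoted above is easy to reproduce outside the database. The following is a small
illustration in Python, on the assumption that Python's float behaves like PostgreSQL's double precision
(both are IEEE 754 binary floating point) and that decimal.Decimal is a reasonable analogy for, but not the
same thing as, the numeric type:

```python
import math
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
float_sum = 0.1 + 0.2
float_equal = (float_sum == 0.3)

# Decimal arithmetic on the same literal values is exact.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# With floats, comparisons generally need an explicit tolerance instead.
close_enough = math.isclose(float_sum, 0.3, rel_tol=1e-9)

print(float_equal, decimal_equal, close_enough)  # False True True
```
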
On Thu, May 2, 2013 at 1:52 PM, Cannon, Ethalinda K [GDCBA]
<ekcannon <at> iastate.edu> wrote:
> Since storing a range, or multiple genetic or cytological positions per feature is so common,
> we'd like to propose one of the following changes:
> 1. Change data type for fmin and fmax in featureloc to float.
> + table is already set up for min/max coordinates
> - table is not tied to a featuremap and therefore coordinate unit is unknown
> 2. Add a field to existing featurepos, type_id, to indicate what sort of
> position (e.g. start, end).
> + takes advantage of existing table, minimal change, adding a field
> shouldn't break existing code, views, triggers, et cetera.
> - ?
> 3. Create a new table, featureinterval with these fields:
> featuremap_id (map set, to get coordinate units)
> feature_id (object feature being placed)
> srcfeature_id (target feature)
> startpos (double precision)
> endpos (double precision)
> + straight-forward to help newbies get started
> - duplicates some information already provided by featurepos table
> Ethy Cannon
> Naama Menda
> Steven Cannon
> Gmod-schema mailing list
> Gmod-schema <at> lists.sourceforge.net