clement fanteria | 15 May 2013 16:35

Chado genomic rearrangement

Hello,

I'm trying to store genomic rearrangements such as insertions, deletions, CNVs or trisomies in the Chado database.
I have some problems with CNVs and trisomies, where I want to store something like a "gain" to express this genomic variation.

The feature and featureloc tables don't allow this kind of information. Unfortunately, the section on Genomic Rearrangements is empty for this specific kind of rearrangement:
http://gmod.org/wiki/Chado_Best_Practices#xxx_Genomic_Rearrangements

Thank you for your help

Clément
Krzysztof Lubieniecki | 14 May 2013 07:24

Re: chado question BLAST

Hi Scott, 
As you suggested, I have started using the gmod-schema mailing list.
The data were loaded as GFF3 according to "Chado best practice - results from BLAST"; we do not use Tripal.
I was able to display the BLAST results in GBrowse, but the problem I still have is the label: I can display only
uniquenames. I can pull out the appropriate name for each uniquename using a mysql query (table join),
but it does not work when I do it as a subroutine in the Gbrowse.conf file.

Thanks

Krzysztof

Hi Krzysztof, 

First let me suggest that we take this conversation to the Chado mailing list,
gmod-schema <at> lists.sourceforge.net . Subsequent replies can trim Lincoln off the cc list and add the
schema mailing list.

Do you know how the data were loaded? A typical way to load the data would be via GFF (where the blast reports
were converted to GFF3). Another way would be via Tripal, but it doesn't look like ASalBase uses Tripal. If you
used GFF, the cigar string typically isn't saved, so the exact alignment generally can't be recovered;
only information about the individual HSPs can. It depends on what was originally stored from the GFF.

Another question is, how do you want to display it? If it's in the context of GBrowse, the Chado GBrowse
adaptor should recreate the alignments in the gbrowse_details page (it doesn't use the alignment from
blast but rather does an on the fly local alignment between the match feature and the contig). 

Scott 

Akiff Manji | 6 May 2013 22:48

Assigning feature relationships in Chado

Hi everyone,

I've recently put together a Chado database to store bacterial genome sequences in the form of multi-FASTA files. I've been using the Perl bulk loader scripts (gmod_fasta2gff3.pl, gmod_bulk_load_gff3.pl) to upload the sequences to the database. Everything works pretty well and I'm able to tag attributes into the feature property tables. In fact, so far we have been using only the featureprop table to assign contigs from a multi-FASTA file to a genome. This essentially creates the same name entry for every single contig we upload for a particular genome.

I realize this is not the right approach, and that we should instead be defining relationships according to the RDF model and the feature_relationship table. My question is, how exactly can we add these relationships during the sequence file uploads (i.e. without having to define the relationships later with SQL)? What is the usual approach to implementing relationships within the Chado database? Is there a bulk uploader that does this?

Cheers,

Akiff Manji



FW: Proposed changes to map module

Sorry, I sent the response below directly to Josh by accident but would like to see if there are additional
reactions, so reposting to the list.

After some more talk amongst ourselves, we're proposing taking this a step further and making two changes
to featurepos: 
  - add a type_id field
  - change the type of mappos to numeric

I think we could also accept the featureposprop solution that Sook uses, but would then like to see that
table become part of the Chado schema rather than an add-on table that everyone who needs to store start
and end genetic coordinates will have to create.

In either case, changing the type of featurepos.mappos may be advisable as one of our objectives is to be
able to retrieve features within a range of coordinates, which will mean numeric comparisons. It sounds
like there's a trade-off regarding the data type, with numeric (arbitrary-precision) fields being more
accurate and floating-point fields being the faster of the two. I suggest that while speed is desirable,
accuracy is more important for genetic markers and QTLs.
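
As a rough illustration of the range retrieval described above, here is a minimal Python sketch. It assumes featurepos-like rows that carry a start/end type, as proposed; the `MapPosition` class, its field names and the sample QTL names are invented for the example and are not Chado's actual columns. `Decimal` stands in for an exact numeric column.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MapPosition:
    """One featurepos-like row; all field names are illustrative only."""
    feature: str      # stands in for feature_id
    featuremap: str   # stands in for featuremap_id
    pos_type: str     # stands in for the proposed type_id ("start" or "end")
    mappos: Decimal   # position in map units, e.g. cM


def features_in_range(rows, featuremap, low, high):
    """Return names of features on one map whose interval overlaps [low, high]."""
    bounds = {}
    for row in rows:
        if row.featuremap == featuremap:
            bounds.setdefault(row.feature, {})[row.pos_type] = row.mappos
    hits = []
    for feature, b in bounds.items():
        start = b.get("start")
        if start is None:
            continue
        end = b.get("end", start)  # a point feature has only a "start" row
        if start <= high and end >= low:
            hits.append(feature)
    return sorted(hits)


# Made-up positions, purely for demonstration.
rows = [
    MapPosition("qtl_A", "map1", "start", Decimal("12.5")),
    MapPosition("qtl_A", "map1", "end", Decimal("18.25")),
    MapPosition("qtl_B", "map1", "start", Decimal("40.0")),
    MapPosition("qtl_B", "map1", "end", Decimal("44.1")),
    MapPosition("marker_1", "map1", "start", Decimal("18.25")),
    MapPosition("qtl_C", "map2", "start", Decimal("15.0")),
    MapPosition("qtl_C", "map2", "end", Decimal("16.0")),
]

print(features_in_range(rows, "map1", Decimal("18.25"), Decimal("30")))
```

Because each row is tied to a map, the same lookup never mixes positions from different map sets, which is one of the reasons given for preferring featurepos over featureloc.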

Obviously, we don't want to break anything. Is there any sense for how many databases are using the
featurepos table, in addition to GDR and CottonGen? How have changes to existing tables been handled in
the past?

Ethy

________________________________________
From: Cannon, Ethalinda K [GDCBA]
Sent: Thursday, May 02, 2013 3:11 PM
To: Josh Goodman
Subject: RE: [Gmod-schema] Proposed changes to map module

Thanks for your response, Josh, I can see the reasons for your discomfort with non-integer fields.

The reason we would like some sort of non-integer data type is that genetic positions are not integers. How
do you store non-integer positions and map values?

We could multiply them by 100 (or 1000, or 10,000) and store them as integers but then we'd need to be clear
that's what we did so they could be converted back to the proper values for display or calculations. The
numeric type does look better than double precision.
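
To make the scaled-integer option concrete, a small Python sketch follows. The scale factor of 100 and both function names are assumptions made for illustration; the point is only that the factor has to be recorded somewhere and applied consistently in both directions.

```python
from decimal import Decimal

SCALE = 100  # assumed factor: keeps two decimal places of a cM position


def to_scaled_int(position: str, scale: int = SCALE) -> int:
    """Convert a decimal position string to a scaled integer.

    Raises ValueError instead of rounding silently when the position
    has more decimal places than the scale can hold.
    """
    scaled = Decimal(position) * scale
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{position} needs more precision than scale {scale}")
    return int(scaled)


def from_scaled_int(value: int, scale: int = SCALE) -> Decimal:
    """Convert a stored integer back to the original map position."""
    return Decimal(value) / scale


stored = to_scaled_int("12.35")
print(stored, from_scaled_int(stored))
```

The round trip is exact only for positions that fit the chosen scale, so a database using this approach would have to pick the factor once and document it.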

Not being able to test for equality isn't an issue because genetic positions are approximations and
testing if two are equivalent doesn't seem to make much sense.

Another reason to not use the featureloc table is the need to link the position back to a specific map set
(featuremap) and its unit (usually cM in our case).

We need one-to-two relationships if using the featurepos table (which records only one position) so that
we can record beginning and end coordinates of QTLs and linkage group maps.

At this point, as we talk over the options in our earlier note, we are liking option 2 best. That way
featureloc will be unchanged (and not slowed down by numeric fields) and featurepos already contains a
float field (mappos).

Ethy
Naama
Steven

________________________________________
From: Josh Goodman [jogoodma <at> indiana.edu]
Sent: Thursday, May 02, 2013 2:10 PM
To: Cannon, Ethalinda K [GDCBA]
Cc: GMOD Schema/Chado List
Subject: Re: [Gmod-schema] Proposed changes to map module

Hi Ethy, Naama and Steven,

How does the existing one to many relationship between feature and
featureloc not meet your needs for modeling ranges or multi genetic or
cytological positions?  In FlyBase, we have many features that have
multiple locations, so I'm not sure I understand what it is you are
trying to address.  Perhaps you can give us a use case?

What is your reason for wanting to convert fmin/fmax in featureloc
from an integer to a float?  Float types in PostgreSQL come with very
dire warnings about their use.

http://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-FLOAT

8.1.3. Floating-Point Types
The data types real and double precision are inexact,
variable-precision numeric types....
Inexact means that some values cannot be converted exactly to the
internal format and are stored as approximations, so that storing and
retrieving a value might show slight discrepancies.
...
*Comparing two floating-point values for equality might not always
work as expected.*

That last statement makes this a non starter for me.  The better type
to use in place of a float is a numeric, but that is not without
pitfalls of its own.

http://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL

8.1.2. Arbitrary Precision Numbers

..."However, arithmetic on numeric values is very slow compared to the
integer types, or to the floating-point types described in the next
section."

Doing anything that might slow down location queries is not ideal
unless the benefits outweigh the costs.
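
The equality caveat quoted above is easy to reproduce outside the database. A minimal Python sketch, assuming Python's `float` (an IEEE 754 double on common platforms) as a stand-in for double precision and `decimal.Decimal` as a stand-in for numeric; neither is PostgreSQL itself:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the sum differs from the literal in the last bits.
float_sum = 0.1 + 0.2
floats_equal = float_sum == 0.3

# Decimal arithmetic on the same values is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimals_equal = decimal_sum == Decimal("0.3")

print(floats_equal, decimals_equal)  # False True
```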

Cheers,
Josh
FlyBase

On Thu, May 2, 2013 at 1:52 PM, Cannon, Ethalinda K [GDCBA]
<ekcannon <at> iastate.edu> wrote:
> Since storing a range, or multiple genetic or cytological positions per feature is so common,
> we'd like to propose one of the following changes:
>
> 1. Change data type for fmin and fmax in featureloc to float.
>
>     + table is already set up for min/max coordinates
>     - table is not tied to a featuremap and therefore coordinate unit is unknown
>
> 2. Add a field to existing featurepos, type_id, to indicate what sort of
>    position (e.g. start, end).
>
>     + takes advantage of existing table, minimal change, adding a field
>       shouldn't break existing code, views, triggers, et cetera.
>     - ?
>
> 3. Create a new table, featureinterval with these fields:
>      featureinterval_id
>      featuremap_id (map set, to get coordinate units)
>      feature_id (object feature being placed)
>      srcfeature_id (target feature)
>      startpos (double precision)
>      endpos (double precision)
>
>     + straight-forward to help newbies get started
>     - duplicates some information already provided by featurepos table
>
>
> Ethy Cannon
> Naama Menda
> Steven Cannon
>
>

Proposed changes to map module

Since storing a range, or multiple genetic or cytological positions per feature is so common, 
we'd like to propose one of the following changes:

1. Change data type for fmin and fmax in featureloc to float.

    + table is already set up for min/max coordinates
    - table is not tied to a featuremap and therefore coordinate unit is unknown

2. Add a field to existing featurepos, type_id, to indicate what sort of
   position (e.g. start, end).

    + takes advantage of existing table, minimal change, adding a field 
      shouldn't break existing code, views, triggers, et cetera.
    - ?

3. Create a new table, featureinterval with these fields:
     featureinterval_id
     featuremap_id (map set, to get coordinate units)
     feature_id (object feature being placed)
     srcfeature_id (target feature)
     startpos (double precision)
     endpos (double precision)

    + straight-forward to help newbies get started
    - duplicates some information already provided by featurepos table

Ethy Cannon
Naama Menda
Steven Cannon


Storing BLAST results in Chado

Hello Scott and Members

Thank you for the prompt and very informative responses!

Our group is considering using Chado's schema and tools for storing BLAST results.

I have some concerns and questions regarding this.

I see some of the BLAST results captured in the analysisfeature table. Ideally it would be great to also be able to store information like "Match Length", "Number of Identicals", "Number of Positives" and "Reading Frame of the match" that the BLAST results file provides.

I don't see these captured directly by the schema (especially for pre-stored features like a gene, I don't see an easy way of deducing the alignment length of a match, as the sequence length in the feature table is that of the whole gene).

I am assuming we will have to add a table to capture these if we decide to use Chado, and then a tool to load these values into the proper table.
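
One way to picture the split between what analysisfeature already holds and what would need a home elsewhere is the sketch below. It assumes the standard 12-column BLAST tabular output and separates each HSP into analysisfeature-style scores and leftover key/value properties. The function name, the property names, the choice of putting the bit score in `normscore`, and the sample line are all assumptions for illustration; whether the extra values belong in featureprop on the match feature or in a new table is exactly the open question here.

```python
# Column order of the default 12-column BLAST tabular output.
BLAST_TABULAR_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def split_hsp(line):
    """Split one tabular BLAST line into score fields and extra properties."""
    values = dict(zip(BLAST_TABULAR_COLUMNS, line.rstrip("\n").split("\t")))
    # Fields that have an obvious column in analysisfeature.
    # Mapping bitscore to normscore is an assumption, not a Chado rule.
    analysisfeature = {
        "identity": float(values["pident"]),
        "significance": float(values["evalue"]),
        "normscore": float(values["bitscore"]),
    }
    length = int(values["length"])
    # Values with no dedicated column; stored here as key/value pairs.
    extra_props = {
        "match_length": length,
        "mismatches": int(values["mismatch"]),
        "gap_opens": int(values["gapopen"]),
        # Approximate: pident is already rounded in the report.
        "identical_estimate": round(float(values["pident"]) * length / 100),
    }
    return analysisfeature, extra_props


# Made-up HSP line, not taken from a real report.
sample = "query1\tgeneA\t98.00\t150\t3\t0\t1\t150\t201\t350\t1e-50\t270"
scores, props = split_hsp(sample)
print(scores)
print(props)
```

The number of positives and the reading frame are not among these 12 default columns, so they would have to come from a richer report format before they could be stored anywhere.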

Any advice/suggestion for such a scenario would be great!

Thanks
Ganesh

storing feature sequences - best practices ?

Hi Members

I am trying to load prokaryotic and viral GFF features into a local Chado installation. I made the GFF files from the GenBank files (.gb) using a tool, and these GFF files don't contain the sequences of the features.
So my guess is that the residues column would be empty for these features?

What is the best practice in Chado for storing the actual sequences/residues for the features and for the whole genome?
I need to be able to pull them out for an interface in the future and for comparing annotations.
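
For what it is worth, if the residues are stored only on the reference sequence, a feature's sequence can be derived from its coordinates. The sketch below assumes GFF3's 1-based inclusive start/end and converts them to the interbase fmin/fmax convention that featureloc uses. The function names and the toy reference string are invented for the example.

```python
# Complement table for reverse-strand features (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def gff_to_interbase(start, end):
    """Convert GFF3 1-based inclusive coordinates to interbase (fmin, fmax)."""
    return start - 1, end


def feature_sequence(reference, start, end, strand):
    """Slice a feature's residues out of its reference sequence."""
    fmin, fmax = gff_to_interbase(start, end)
    seq = reference[fmin:fmax]
    if strand == "-":
        # Reverse complement for features on the minus strand.
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


reference = "ATGCCGTAAGG"  # toy sequence, not real data
print(feature_sequence(reference, 1, 3, "+"))
print(feature_sequence(reference, 4, 9, "-"))
```

This only covers a single contiguous location; features with several locations (for example spliced ones) would need the pieces joined in order.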

Thanks
Ganesh
Cannon, Ethalinda K [GDCBA] | 25 Apr 2013 15:43

Saving QTL data in Chado

Hello,

We are working on a new database for a set of crop plants, using Chado and Tripal, particularly focusing on
QTL data with its associated maps, mapping populations, mapping analysis, markers, publications, and
traits. How to load QTL data into Chado is proving to be far from obvious so we have been communicating with
people from SGN, Knowpulse, CacaoGenomedb, and CottonGen and are trying to arrive at a "consensus"
solution so that our data can look as much like other plant QTL data as possible.

Are there any other groups out there (plant or otherwise) who have loaded QTL data into Chado? If so, would
you be willing to share your road map?

Alternatively, has there been any talk of creating a QTL module? The existing tables can certainly be
fitted together to handle QTL data, but it could be more straight-forward.

Below is our current road map, subject to change:

1. Data is mined from publications and entered into a human-readable spreadsheet.

2. QTLs are represented by feature records and associated with publications (feature_pub), markers
(feature_relationship where markers are also features), locations (featureloc), with all additional
information through the featureprop table.

3. The analysis method and associated values are attached to QTLs via the analysis and analysisfeature tables.

4. The maps the QTLs are placed on are represented by feature records, one per linkage group.

5. Mapping population (parents and the population itself) are represented via the stock and
stock_relationship tables.

6. We would like to associate QTLs with their mapping population through a feature_stock table, which
doesn't exist in the core Chado schema. Alternatively we could use existing tables, including the
Natural Diversity module, to create a chain to connect stock records (mapping population) to feature
records for the QTL through: nd_experiment_stock, nd_experiment, nd_experiment_genotype,
genotype, genotype_feature.

7. We would also like to associate a set of QTLs with their associated study (one publication may report on
multiple QTL studies) via the project table, though this appears to also need a new association table,
feature_project, or another long chain of connecting tables that we haven’t worked out yet.

Ethy Cannon
Nathan Weeks
Steven Cannon

Amelia Ireland | 24 Apr 2013 17:04

Re: newbie help

Hello Susanne,

Thanks for your email and apologies that it is not clearer how to set up a Chado database. Have you had a look at the Chado tutorial, http://gmod.org/wiki/Chado_Tutorial? The second part may be of interest as it's about the more practical side, adding data to the db, etc. I've cc'd the Chado mailing list on this email so that other Chado users can give their input.

You might also like to consider using Tripal, http://tripal.info, a web-based interface to a Chado database that provides a very nice, customizable website for display and managing your data. You can also integrate tools like GBrowse (sequence visualizer) directly into Tripal for a smooth, integrated experience. Have a look at some of the sites using Tripal (links on the front page of the Tripal website), particularly the Banana Genome Hub; they've been able to set up a comprehensive resource with built-in tools very quickly. Tripal has a bulk loader that allows you to insert data into Chado as tab-separated values and similar simple formats.

We are also running the GMOD Summer School in July (http://gmod.org/wiki/2013_GMOD_Summer_School), which provides training on setting up and configuring GMOD tools, including a Chado DB.

I hope someone on the Chado list can respond directly to your Chado question, but please feel free to email the help desk again if you have any more questions.

Best wishes,
Amelia
GMOD Help Desk and Community Support



On Wed, Apr 24, 2013 at 7:37 AM, Howard, Susanne F <SusanneHoward <at> missouristate.edu> wrote:
Hi,
I am a non-programmer, non-bioinformatics and, worst of all, non-Linux (Ubuntu) person, but I have been put in the position of working on a complete genome sequencing project which in the end will also be made available to the public. The organism is grape (Vitis), and the end goal is something modelled after the Genoscope Vitis site. That will happen a few years from now.
Planning ahead like this led us to try using the Chado database system right from the start. All I need to do right now is to put 2 sets of reads into a db, then later the scaffolds, chromosomes etc., and much later annotation and other actual gene-related features.

I believe it is the main sequence module I need to use, but I am ignorant enough not to even know how to start/run a database.
I can use MS Access dbs, create my own dbs, edit the underlying tables etc., and I do know what relational dbs imply. I can generally follow the theoretical descriptions found in the various Chado tutorials, about relationships, keys, IDs and so on; what I am missing is perhaps a practical example of how to enter simple data?
Usually I can find examples on the web by googling things like how to import FASTA and FASTQ data into Chado, but I drew a complete blank this time. I tried just about every tutorial I could find, but they are either speaking a foreign language to me or just explain the structure/schema. I have not found anything related to actually populating a database, or how to modify a table (since I am not sure that something like reads is an option with the existing schema).
Any help you can give will be GREATLY appreciated!!
Susanne Howard
Research Specialist, Missouri State University, Mountain Grove Campus
susannehoward <at> missouristate.edu
417-496-0707



--
Amelia Ireland
GMOD Community Support
http://gmod.org ||  <at> gmodproject

Adam Witney | 10 Apr 2013 18:17

failing to upload bacterial genome GFF from NCBI


Has anyone managed to get NCBI GFF files for bacterial genomes uploaded
into chado? I seem to be running into constant errors.

I have removed this header line:

##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=282458

I have removed the Is_circular=true tag.

But now I am getting these errors:

$ trunk/chado/load/bin/gmod_bulk_load_gff3.pl --organism "Staphylococcus
aureus" --gfffile NC_002952.gff --recreate_cache
(Re)creating the uniquename cache in the database...
Creating table...
Populating table...
Creating indexes...
Adjusting the primary key sequences (if necessary)...Done.
Preparing data for inserting into the chado database
(This may take a while ...)
Unable to find srcfeature NC_002952.2 in the database.
Perhaps you need to rerun your data load with the '--recreate_cache'
option. at
/opt/perlbrew/perls/perl-5.14.2/lib/site_perl/5.14.2/Bio/GMOD/DB/Adapter.pm
line 4599, <GEN0> line 5.

Bio::GMOD::DB::Adapter::src_second_chance('Bio::GMOD::DB::Adapter=HASH(0x1a74f588)',
'Bio::SeqFeature::Annotated=HASH(0x1bcd5db0)') called at
/homedirs8/share/Tools/GMOD/trunk/chado/load/bin/gmod_bulk_load_gff3.pl
line 851

Abnormal termination, trying to clean up...

Attempting to clean up the loader temp table (so that --recreate_cache
won't be needed)...
Trying to remove the run lock (so that --remove_lock won't be needed)...
Exiting...

I am new to chado (although not to other gmod tools), but I can't seem
to find any GFF files that will upload.
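
One thing that might be worth checking, offered as an assumption rather than a confirmed diagnosis: the loader reports that it cannot find a srcfeature named after the sequence ID in column 1, so it may help to see whether the file contains any feature whose ID attribute equals that sequence ID. The function name and the sample lines below are made up for illustration.

```python
def seqids_without_feature(gff_lines):
    """Return column-1 sequence IDs that never appear as a feature ID."""
    seqids, feature_ids = set(), set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqids.add(cols[0])
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "ID":
                feature_ids.add(value)
    return sorted(seqids - feature_ids)


# Hypothetical lines in the general shape of a GFF3 file.
sample = [
    "##gff-version 3",
    "NC_002952.2\tRefSeq\tregion\t1\t1000\t.\t+\t.\tID=id0;Name=example",
    "NC_002952.2\tRefSeq\tgene\t10\t200\t.\t+\t.\tID=gene0;Name=geneX",
]
print(seqids_without_feature(sample))
```

An empty result would rule this particular explanation out; a non-empty one only shows that the reference sequence is not declared under its own name in the file.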

Thanks for any help

Adam

Ganesh S Moorthy | 8 Apr 2013 22:53

Strain table in organism.sql - trunk - not installed with Chado

Hello Scott and other members,
 
I just saw today that there is SQL to create a set of Strain tables in the organism.sql file that I checked out from trunk before installation (Feb 2013), but the tables were not created in the Chado installation.

I did not find any mention or description of this set of tables as part of the organism module in any of the schema/table description pages either.

What is the purpose of this strain table?
Is it meant to be created with the default Chado installation (i.e. does my installation have issues)?
Is there a plan to be able to store strain-level annotations/features without having to resort to the species+strain concatenation that Chado recommends in several places for such a scenario? (I do see a strain_feature table that seems to go in this direction.)
 
Ganesh