FW:dna/gc content
-----Original Message-----
From: Jayaraman, Pushkala
Sent: Monday, May 03, 2010 11:34 AM
To: Scott Cain
Cc: gmod-gbrowse@...
Subject: RE: [Gmod-gbrowse] Gmod-gbrowse Digest, Vol 48, Issue 1
Hi Scott,
Suppose my gff3 file were set up so that the header of the fasta file matches the "ID". If that isn't the
way to do it, then for a gff3 file like the one below, should the fasta header be the reference name "Chr1"?
I know this may be a repeated question, but if that were the case, how would GBrowse know what coordinates to
match?
Chr1 GenBank chromosome 17295177 19253295 . + .
ID=NW_047547;Alias=1;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 1 genomic contig%2C
reference assembly (based on RGSC v3.4).,GENOME ANNOTATION REFSEQ: Features on this sequence have been
produced for build 4 version 1 of the NCBI's genome annotation [see documentation]. The DNA sequence is
version 3.4 'November 2004 Update' of the Rat genome assembly,produced by the Baylor College of Medicine
Human Genome Sequencing Center (BCM- HGSC) as part of the Rat Genome Sequencing Consortium (RGSC). This
supersedes all prior versions of the Rat assembly (2.0,2.1,3.0,3.1 3.2 and 3.3) (see
http://www.hgsc.bcm.tmc.edu/projects/rat/). ;chromosome=1;comment1=GENOME ANNOTATION REFSEQ:
Features on this sequence have been produced for build 4 version 1 of the NCBI's genome annotation [see
documentation]. The DNA sequence is version 3.4 'November 2004 Update' of the Rat genome assembly%2C
produced by the Baylor College of Medicine Human Genome Sequencing Center (BCM- HGSC) as part of the Rat
Genome Sequencing Consortium (RGSC). This supersedes all prior versions of the Rat assembly (2.0%2C
2.1%2C 3.0%2C 3.1 3.2 and 3.3) (see http://www.hgsc.bcm.tmc.edu/projects/rat/).
;date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW;
Chr1 GenBank exon 123057586 123057716 . + . Parent=LOC680906.t01;gene=LOC680906;Dbxref=GeneID:680906,RGD:1585883,GI:109458899;
Chr1 GenBankRGDgene exon 123057586 123057716 . + . Dbxref=GeneID:680906,RGD:1585883;ID=exon18358;Parent=RGD1585883;gene=LOC680906;Name=LOC680906;
Chr1 GenBank exon 123061579 123061678 . + . Parent=LOC680906.t01;gene=LOC680906;Dbxref=GeneID:680906,RGD:1585883,GI:109458899;
Chr1 GenBankRGDgene exon 123061579 123061678 . + . Dbxref=GeneID:680906,RGD:1585883;ID=exon18359;Parent=RGD1585883;gene=LOC680906;Name=LOC680906;
Chr1 GenBank exon 123114649 123114691 . + . Parent=LOC680906.t01;gene=LOC680906;Dbxref=GeneID:680906,RGD:1585883,GI:109458899;
Chr1 GenBankRGDgene exon 123114649 123114691 . + . Dbxref=GeneID:680906,RGD:1585883;ID=exon18360;Parent=RGD1585883;gene=LOC680906;Name=LOC680906;
Chr1 GenBank pseudogene 3595 24944 . + . ID=LOC688504;Dbxref=GeneID:688504;Note=Derived by automated
computational analysis using gene prediction method: GNOMON. Supporting evidence includes
similarity to: 165 ESTs%2C 3 Proteins;gene=LOC688504;pseudo=_no_value;
Chr1 GenBank pseudogene 36020 37359 . + . ID=LOC365518;Dbxref=GeneID:365518;Note=Derived by
automated computational analysis using gene prediction method: GNOMON. Supporting evidence includes
similarity to: 1 mRNA%2C 177 ESTs%2C 6 Proteins;gene=LOC365518;pseudo=_no_value;
Chr1 GenBank pseudogene 47021 50492 . - . ID=LOC688542;Dbxref=GeneID:688542;Note=Derived by
automated computational analysis using gene prediction method: GNOMON. Supporting evidence includes
similarity to: 2 ESTs%2C 14 Proteins;gene=LOC688542;pseudo=_no_value;
Pushkala Jayaraman
Programmer Analyst II
Rat Genome Database
Medical College of Wisconsin, Wauwatosa, WI
-----Original Message-----
From: Scott Cain [mailto:scott@...]
Sent: Mon 5/3/2010 11:31 AM
To: Jayaraman, Pushkala
Cc: gmod-gbrowse@...
Subject: Re: [Gmod-gbrowse] Gmod-gbrowse Digest, Vol 48, Issue 1
Hi Pushkala,
It looks like the names of your chromosomes in your database don't
match the names in the fasta file. The screenshot shows that you are
looking at a region of "chr5", but your fasta file has names like
"NW_047714". They need to be the same for GBrowse to figure out that they
go together.
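
A quick way to check for this kind of mismatch is to compare the sequence names in the fasta headers against the names in the first column of the gff3 file. The following is only a minimal sketch, not part of GBrowse or BioPerl: the function names are made up for illustration, and it assumes the fasta ID is the first whitespace-delimited token after ">" and that the gff3 reference name is the first column of each feature line.

```python
def fasta_ids(lines):
    """Collect sequence IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff3_seqids(lines):
    """Collect the column-1 reference names from gff3 feature lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        ids.add(line.split()[0])
    return ids


def compare_names(fasta_lines, gff3_lines):
    """Return (names only in the fasta, names only in the gff3), both sorted."""
    in_fasta = fasta_ids(fasta_lines)
    in_gff3 = gff3_seqids(gff3_lines)
    return sorted(in_fasta - in_gff3), sorted(in_gff3 - in_fasta)


if __name__ == "__main__":
    # Tiny inline example using names that appear in this thread.
    fasta = [">NW_047714\n", "TGGGTTTCTATTGTTC\n"]
    gff3 = [
        "##gff-version 3\n",
        "Chr1\tGenBank\tpseudogene\t3595\t24944\t.\t+\t.\tID=LOC688504\n",
    ]
    only_fasta, only_gff3 = compare_names(fasta, gff3)
    print("only in fasta:", only_fasta)
    print("only in gff3:", only_gff3)
```

For real files, the same functions can be given open file handles, for example compare_names(open("my.fa"), open("my.gff3")), where both file names are placeholders. Any name reported on only one side is a candidate for the mismatch described above.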
Scott
On Mon, May 3, 2010 at 12:18 PM, Jayaraman, Pushkala <pjayaraman@...> wrote:
> Hi,
> I had a problem with the DNA/GC content not being displayed properly.
>
>
> This is what I did:
>
> 1. downloaded the fasta file from here:
> ftp://ftp.ncbi.nlm.nih.gov/genomes/R_norvegicus/CHR_01/rn_ref_chr1.gbk.gz
> 2. used grep and pattern matches to have the fasta sequence in a separate file.
> 3. used sed to remove comments until the "head" of my fasta file looked like this:
>
> >NW_047714
> TGGGTTTCTATTGTTCTTCCCACTGTTACAACTGAATGCCTGTGCTATTAGTACAATCGT
> ATGGACATCACTATCACCATACTAGTAGTATTATATCTCTACCATTACTCCATTAGTTCT
> GTTATTACTTGCACATTTTTCATTTATACCACCTGTAACCAGAGAGCTTTTGATTCTAGA
> CAAAATTTTGTGTATATAGACCCTTAATCACCTTGTCCACTGGGGTATAGTTTCTGTTAT
> TTTTGCCATTTTATTCTTTAGGAAACTGAGCATTAGAGAGCTTATGCAACCTGCTTAAAA
> GACTGAGGCTGGAATTCTAATGCTGTAACTGAAGAACTTATATTTCTCCTAGGGCCCTTG
> TGGAGATATATGAGATACATGAATAGAAGATTGGTTGATTGAATAAAATGTGGCTCTTTT
> TGAAGTGTCATTTTGATAGGAAATCATAGGGAGGTGAATACTGACGAGGGGAATAAGTGT
> GGCAATATAAATCAGGGTTATCGGACAGTCTATTTCTATCAACGTAGAAACCCTTATTTT
>
>
> 4. I then used bp_seqfeature_load.pl to load the fasta file into the database that already held the gene gff3 files.
> 5. I added the track:
>
> [DNA]
> glyph = dna
> global feature = 1
> height = 40
> do_gc = 1
> gc_window = auto
> fgcolor = red
> axis_color = blue
> strand = both
> category = DNA
> key = DNA/GC Content
>
> to my .conf file and reloaded the web page. A screenshot of what I see is attached.
>
> I do not see any DNA/GC content. Where have I gone wrong?
>
>
> Pushkala Jayaraman
> Programmer Analyst II
> Rat Genome Database
> Medical College of Wisconsin, Wauwatosa, WI
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research
------------------------------------------------------------------------------