genome | 24 Apr 19:23 2015

Digest for genome <at> soe.ucsc.edu - 8 updates in 7 topics

Jasmine Brown <jasminebro2 <at> gmail.com>: Apr 24 10:11AM -0500

Hello. I am currently trying to determine the best queries to use from my
blat output.psl file.
 
I used pslFilter and it worked on two of my files; however, on the third
file it returned this error:
 
[jbrown <at> master Blat]$ pslFilter AotusBlat.psl AotusFilter.psl
Filtering AotusBlat.psl to AotusFilter.psl
sizeOne tStarts 4 bs 9
pslFilter: psl.c:77: pslLoad: Assertion `sizeOne == ret->blockCount' failed.
 
 
I'm not sure whether something is wrong with my AotusBlat.psl file, or what
the failed blockCount assertion means. When I looked at the AotusFilter.psl
file, it showed only 3 queries, so I'm assuming pslFilter did not finish
running. Help would be greatly appreciated.
 
Thanks,
 
Jasmine N. Brown, MS
PhD Student
Batzer Laboratory of Comparative Genomics
https://biosci-batzerlab.biology.lsu.edu/
 
Department of Biological Sciences
A653 Life Sciences Building
Louisiana State University
Baton Rouge, LA, 70803
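
The assertion in the error above appears to compare a row's blockCount with the number of entries parsed from one of its block lists. A minimal Python sketch, assuming a standard tab-separated 21-column PSL file, can list the rows where those counts disagree. The function name `find_block_mismatches` is hypothetical and is not part of the UCSC tools.

```python
def find_block_mismatches(lines):
    """Return (line_number, list_name, block_count, entries_found) tuples
    for PSL rows whose block lists do not match blockCount.

    Assumes the standard PSL layout: blockCount is column 18, followed by
    the comma-separated blockSizes, qStarts and tStarts columns.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Skip header lines and anything that is not a full alignment row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        block_count = int(fields[17])
        for name, column in (("blockSizes", 18), ("qStarts", 19), ("tStarts", 20)):
            # The lists normally end with a trailing comma, so drop empty items.
            entries = [item for item in fields[column].split(",") if item]
            if len(entries) != block_count:
                problems.append((line_number, name, block_count, len(entries)))
    return problems
```

Running it over `open("AotusBlat.psl")` would print nothing for a well-formed file; any tuple it returns points at a row worth inspecting by hand, for example one that was truncated while the file was being written.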
James Kozubek <jkozubek <at> broadinstitute.org>: Apr 24 10:54AM -0400

Hi,
 
This is probably a common problem. I am blatting a virus against HG38 and
if I use a small section of my sequence I get an interesting hit, but if I
blat the entire virus I no longer get the hit. Am I doing something wrong?
 
 
 
If I submit a short sequence from my HCV virus...
 
GGCGCACAGGTAGAGGAAGAC
 
I get this hit...
 
BLAT Search Results
 
ACTIONS          QUERY    SCORE  START    END  QSIZE  IDENTITY  CHRO  STRAND      START        END  SPAN
--------------------------------------------------------------------------------------------------------
browser details  YourSeq     21      1     21     21    100.0%     2       +   24872687   24872707    21
    browser <https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr2:24872687-24872707&db=hg38&ss=../trash/hgSs/hgSs_genome_192c_a564a0.pslx+../trash/hgSs/hgSs_genome_192c_a564a0.fa&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
    details <https://genome.ucsc.edu/cgi-bin/hgc?o=24872686&g=htcUserAli&i=../trash/hgSs/hgSs_genome_192c_a564a0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_192c_a564a0.fa+YourSeq&c=chr2&l=24872686&r=24872707&db=hg38&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
 
 
 
...But if I submit my entire HCV virus I no longer get that hit
 
 
ACTIONS          QUERY    SCORE  START    END  QSIZE  IDENTITY  CHRO  STRAND      START        END  SPAN
--------------------------------------------------------------------------------------------------------
browser details  YourSeq     25   9545   9578   9678     85.2%     1       -   99120181   99120212    32
    browser <https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr1:99120181-99120212&db=hg38&ss=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+../trash/hgSs/hgSs_genome_237f_a57cf0.fa&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
    details <https://genome.ucsc.edu/cgi-bin/hgc?o=99120180&g=htcUserAli&i=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_237f_a57cf0.fa+YourSeq&c=chr1&l=99120180&r=99120212&db=hg38&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
browser details  YourSeq     21   7654   7674   9678    100.0%     4       -   51867001   51867021    21
    browser <https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr4:51867001-51867021&db=hg38&ss=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+../trash/hgSs/hgSs_genome_237f_a57cf0.fa&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
    details <https://genome.ucsc.edu/cgi-bin/hgc?o=51867000&g=htcUserAli&i=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_237f_a57cf0.fa+YourSeq&c=chr4&l=51867000&r=51867021&db=hg38&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
browser details  YourSeq     21    979    999   9678    100.0%    16       -   17970801   17970821    21
    browser <https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr16:17970801-17970821&db=hg38&ss=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+../trash/hgSs/hgSs_genome_237f_a57cf0.fa&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
    details <https://genome.ucsc.edu/cgi-bin/hgc?o=17970800&g=htcUserAli&i=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_237f_a57cf0.fa+YourSeq&c=chr16&l=17970800&r=17970821&db=hg38&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
browser details  YourSeq     21   1800   1826   9678     88.9%    12       +  117124846  117124872    27
    browser <https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr12:117124846-117124872&db=hg38&ss=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+../trash/hgSs/hgSs_genome_237f_a57cf0.fa&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
    details <https://genome.ucsc.edu/cgi-bin/hgc?o=117124845&g=htcUserAli&i=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_237f_a57cf0.fa+YourSeq&c=chr12&l=117124845&r=117124872&db=hg38&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
browser details  YourSeq     20   3261   3298   9678     76.4%     7       -    3875585    3875622    38
    browser <https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr7:3875585-3875622&db=hg38&ss=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+../trash/hgSs/hgSs_genome_237f_a57cf0.fa&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
    details <https://genome.ucsc.edu/cgi-bin/hgc?o=3875584&g=htcUserAli&i=../trash/hgSs/hgSs_genome_237f_a57cf0.pslx+..%2Ftrash%2FhgSs%2FhgSs_genome_237f_a57cf0.fa+YourSeq&c=chr7&l=3875584&r=3875622&db=hg38&hgsid=424074869_Yc5f3nHoAsKAIFR4pxuiASePEOhl>
 
 
ACCCGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTGTCTTCACGCAGAAAGCGTCTA
GCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCCTCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTG
AGTACACCGGAATTGCCGGGAAGACTGGGTCCTTTCTTGGATAAACCCACTCTATGCCCGGCCATTTGGGCGTGCCCCCG
CAAGACTGCTAGCCGAGTAGCGTTGGGTTGCGAAAGGCCTTGTGGTACTGCCTGATAGGGTGCTTGCGAGTGCCCCGGGA
GGTCTCGTAGACCGTGCACCATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACCAACCGTCGCCCACAA
GACGTTAAGTTTCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGGCCCCAGGTTGGGTGTGCG
CGCGACAAGGAAGACTTCGGAGCGGTCCCAGCCACGTGGAAGGCGCCAGCCCATCCCTAAAGATCGGCGCTCCACTGGCA
AATCCTGGGGAAAACCAGGATACCCCTGGCCCCTATACGGGAATGAGGGACTCGGCTGGGCAGGATGGCTCCTGTCCCCC
CGAGGTTCCCGTCCCTCTTGGGGCCCCAATGACCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTCATCGATACCCTAAC
GTGCGGCTTTGCCGACCTCATGGGGTACATCCCTGTCGTGGGCGCCCCGCTCGGCGGCGTCGCCAGAGCTCTCGCGCATG
GCGTGAGAGTCCTGGAGGACGGGGTTAATTTTGCAACAGGGAACTTACCCGGTTGCTCCTTTTCTATCTTCTTGCTGGCC
CTGCTGTCCTGCATCACCACCCCGGTCTCCGCTGCCGAAGTGAAGAACATCAGTACCGGCTACATGGTGACTAACGACTG
CACCAATGACAGCATTACCTGGCAGCTCCAGGCTGCTGTCCTCCACGTCCCCGGGTGCGTCCCGTGCGAGAAAGTGGGGA
ATGCATCTCAGTGCTGGATACCGGTCTCACCGAATGTGGCCGTGCAGCGGCCCGGCGCCCTCACGCAGGGCTTGCGGACG
CACATCGACATGGTTGTGATGTCCGCCACGCTCTGCTCTGCCCTCTACGTGGGGGACCTCTGCGGTGGGGTGATGCTCGC
AGCCCAAATGTTCATTGTCTCGCCGCAGCACCACTGGTTTGTCCAAGACTGCAATTGCTCCATCTACCCTGGTACCATCA
CTGGACACCGCATGGCATGGGACATGATGATGAACTGGTCGCCCACGGCTACCATGATCTTGGCGTACGCGATGCGTGTC
CCCGAGGTCATTATAGACATCATTAGCGGGGCTCATTGGGGCGTCATGTTCGGCTTGGCCTACTTCTCTATGCAGGGAGC
GTGGGCGAAAGTCGTTGTCATCCTTCTGTTGGCCGCCGGGGTGGACGCGCGCACCCATACTGTTGGGGGTTCTGCCGCGC
AGACCACCGGGCGCCTCACCAGCTTATTTGACATGGGCCCCAGGCAGAAAATCCAGCTCGTTAACACCAATGGCAGCTGG
CACATCAACCGCACCGCCCTGAACTGCAATGACTCCTTGCACACCGGCTTTATCGCGTCTCTGTTCTACACCCACAGCTT
CAACTCGTCAGGATGTCCCGAACGCATGTCCGCCTGCCGCAGTATCGAGGCCTTCCGGGTGGGATGGGGCGCCTTGCAAT
ATGAGGATAATGTCACCAATCCAGAGGATATGAGACCCTATTGCTGGCACTACCCACCAAGGCAGTGTGGCGTGGTCTCC
GCGAAGACTGTGTGTGGCCCAGTGTACTGTTTCACCCCCAGCCCAGTGGTAGTGGGCACGACCGACAGGCTTGGAGCGCC
CACTTACACGTGGGGGGAGAATGAGACAGATGTCTTCCTATTGAACAGCACTCGACCACCGCTGGGGTCATGGTTCGGCT
GCACGTGGATGAACTCTTCTGGCTACACCAAGACTTGCGGCGCACCACCCTGCCGTACTAGAGCTGACTTCAACGCCAGC
ACGGACCTGTTGTGCCCCACGGACTGTTTTAGGAAGCATCCTGATACCACTTACCTCAAATGCGGCTCTGGGCCCTGGCT
CACGCCAAGGTGCCTGATCGACTACCCCTACAGGCTCTGGCATTACCCCTGCACAGTTAACTATACCATCTTCAAAATAA
GGATGTATGTGGGAGGGGTTGAGCACAGGCTCACGGCTGCATGCAATTTCACTCGTGGGGATCGTTGCAACTTGGAGGAC
AGAGACAGAAGTCAACTGTCTCCTTTGTTGCACTCCACCACGGAATGGGCCATTTTACCTTGCTCTTACTCGGACCTGCC
CGCCTTGTCGACTGGTCTTCTCCACCTCCACCAAAACATCGTGGACGTACAATTCATGTATGGCCTATCACCTGCCCTCA
CAAAATACATCGTCCGATGGGAGTGGGTAATACTCTTATTCCTGCTCTTAGCGGACGCCAGGGTTTGCGCCTGCTTATGG
ATGCTCATCTTGTTGGGCCAGGCCGAAGCAGCACTAGAGAAGCTGGTCATCTTGCACGCTGCGAGCGCAGCTAGCTGCAA
TGGCTTCCTATATTTTGTCATCTTTTTCGTGGCTGCTTGGTACATCAAGGGTCGGGTAGTCCCCTTAGCTACCTATTCCC
TCACTGGCCTGTGGTCCTTTAGCCTACTGCTCCTAGCATTGCCCCAACAGGCTTATGCTTATGACGCATCTGTGCATGGC
CAGATAGGAGCGGCTCTGCTGGTAATGATCACTCTCTTTACTCTCACCCCCGGGTATAAGACCCTTCTCAGCCGGTTTTT
GTGGTGGTTGTGCTATCTCCTGACCCTGGGGGAAGCCATGATTCAGGAGTGGGTACCACCCATGCAGGTGCGCGGCGGCC
GCGATGGCATCGCGTGGGCCGTCACTATATTCTGCCCGGGTGTGGTGTTTGACATTACCAAATGGCTTTTGGCGTTGCTT
GGGCCTGCTTACCTCTTAAGGGCCGCTTTGACACATGTGCCGTACTTCGTCAGAGCTCACGCTCTGATAAGGGTATGCGC
TTTGGTGAAGCAGCTCGCGGGGGGTAGGTATGTTCAGGTGGCGCTATTGGCCCTTGGCAGGTGGACTGGCACCTACATCT
ATGACCACCTCACACCTATGTCGGACTGGGCCGCTAGCGGCCTGCGCGACTTAGCGGTCGCCGTGGAACCCATCATCTTC
AGTCCGATGGAGAAGAAGGTCATCGTCTGGGGAGCGGAGACGGCTGCATGTGGGGACATTCTACATGGACTTCCCGTGTC
CGCCCGACTCGGCCAGGAGATCCTCCTCGGCCCAGCTGATGGCTACACCTCCAAGGGGTGGAAGCTCCTTGCTCCCATCA
CTGCTTATGCCCAGCAAACACGAGGCCTCCTGGGCGCCATAGTGGTGAGTATGACGGGGCGTGACAGGACAGAACAGGCC
GGGGAAGTCCAAATCCTGTCCACAGTCTCTCAGTCCTTCCTCGGAACAACCATCTCGGGGGTTTTGTGGACTGTTTACCA
CGGAGCTGGCAACAAGACTCTAGCCGGCTTACGGGGTCCGGTCACGCAGATGTACTCGAGTGCTGAGGGGGACTTGGTAG
GCTGGCCCAGCCCCCCTGGGACCAAGTCTTTGGAGCCGTGCAAGTGTGGAGCCGTCGACCTATATCTGGTCACGCGGAAC
GCTGATGTCATCCCGGCTCGGAGACGCGGGGACAAGCGGGGAGCATTGCTCTCCCCGAGACCCATTTCGACCTTGAAGGG
GTCCTCGGGGGGGCCGGTGCTCTGCCCTAGGGGCCACGTCGTTGGGCTCTTCCGAGCAGCTGTGTGCTCTCGGGGCGTGG
CCAAATCCATCGATTTCATCCCCGTTGAGACACTCGACGTTGTTACAAGGTCTCCCACTTTCAGTGACAACAGCACGCCA
CCGGCTGTGCCCCAGACCTATCAGGTCGGGTACTTGCATGCTCCAACTGGCAGTGGAAAGAGCACCAAGGTCCCTGTCGC
GTATGCCGCCCAGGGGTACAAAGTACTAGTGCTTAACCCCTCGGTAGCTGCCACCCTGGGGTTTGGGGCGTACCTATCCA
AGGCACATGGCATCAATCCCAACATTAGGACTGGAGTCAGGACCGTGATGACCGGGGAGGCCATCACGTACTCCACATAT
GGCAAATTTCTCGCCGATGGGGGCTGCGCTAGCGGCGCCTATGACATCATCATATGCGATGAATGCCACGCTGTGGATGC
TACCTCCATTCTCGGCATCGGAACGGTCCTTGATCAAGCAGAGACAGCCGGGGTCAGACTAACTGTGCTGGCTACGGCCA
CACCCCCCGGGTCAGTGACAACCCCCCATCCCGATATAGAAGAGGTAGGCCTCGGGCGGGAGGGTGAGATCCCCTTCTAT
GGGAGGGCGATTCCCCTATCCTGCATCAAGGGAGGGAGACACCTGATTTTCTGCCACTCAAAGAAAAAGTGTGACGAGCT
CGCGGCGGCCCTTCGGGGCATGGGCTTGAATGCCGTGGCATACTATAGAGGGTTGGACGTCTCCATAATACCAGCTCAGG
GAGATGTGGTGGTCGTCGCCACCGACGCCCTCATGACGGGGTACACTGGAGACTTTGACTCCGTGATCGACTGCAATGTA
GCGGTCACCCAAGCTGTCGACTTCAGCCTGGACCCCACCTTCACTATAACCACACAGACTGTCCCACAAGACGCTGTCTC
ACGCAGTCAGCGCCGCGGGCGCACAGGTAGAGGAAGACAGGGCACTTATAGGTATGTTTCCACTGGTGAACGAGCCTCAG
GAATGTTTGACAGTGTAGTGCTTTGTGAGTGCTACGACGCAGGGGCTGCGTGGTACGATCTCACACCAGCGGAGACCACC
GTCAGGCTTAGAGCGTATTTCAACACGCCCGGCCTACCCGTGTGTCAAGACCATCTTGAATTTTGGGAGGCAGTTTTCAC
CGGCCTCACACACATAGACGCCCACTTCCTCTCCCAAACAAAGCAAGCGGGGGAGAACTTCGCGTACCTAGTAGCCTACC
AAGCTACGGTGTGCGCCAGAGCCAAGGCCCCTCCCCCGTCCTGGGACGCCATGTGGAAGTGCCTGGCCCGACTCAAGCCT
ACGCTTGCGGGCCCCACACCTCTCCTGTACCGTTTGGGCCCTATTACCAATGAGGTCACCCTCACACACCCTGGGACGAA
GTACATCGCCACATGCATGCAAGCTGACCTTGAGGTCATGACCAGCACGTGGGTCCTAGCTGGAGGAGTCCTGGCAGCCG
TCGCCGCATATTGCCTGGCGACTGGATGCGTTTCCATCATCGGCCGCTTGCACGTCAACCAGCGAGTCGTCGTTGCGCCG
GATAAGGAGGTCCTGTATGAGGCTTTTGATGAGATGGAGGAATGCGCCTCTAGGGCGGCTCTCATCGAAGAGGGGCAGCG
GATAGCCGAGATGTTGAAGTCCAAGATCCAAGGCTTGCTGCAGCAGGCCTCTAAGCAGGCCCAGGACATACAACCCGCTA
TGCAGGCTTCATGGCCCAAAGTGGAACAATTTTGGGCCAGACACATGTGGAACTTCATTAGCGGCATCCAATACCTCGCA
GGATTGTCAACACTGCCAGGGAACCCCGCGGTGGCTTCCATGATGGCATTCAGTGCCGCCCTCACCAGTCCGTTGTCGAC
CAGTACCACCATCCTTCTCAACATCATGGGAGGCTGGTTAGCGTCCCAGATCGCACCACCCGCGGGGGCCACCGGCTTTG
TCGTCAGTGGCCTGGTGGGGGCTGCCGTGGGCAGCATAGGCCTGGGTAAGGTGCTGGTGGACATCCTGGCAGGATATGGT
GCGGGCATTTCGGGGGCCCTCGTCGCATTCAAGATCATGTCTGGCGAGAAGCCCTCTATGGAAGATGTCATCAATCTACT
GCCTGGGATCCTGTCTCCGGGAGCCCTGGTGGTGGGGGTCATCTGCGCGGCCATTCTGCGCCGCCACGTGGGACCGGGGG
AGGGCGCGGTCCAATGGATGAACAGGCTTATTGCCTTTGCTTCCAGAGGAAACCACGTCGCCCCTACTCACTACGTGACG
GAGTCGGATGCGTCGCAGCGTGTGACCCAACTACTTGGCTCTCTTACTATAACCAGCCTACTCAGAAGACTCCACAATTG
GATAACTGAGGACTGCCCCATCCCATGCTCCGGATCCTGGCTCCGCGACGTGTGGGACTGGGTTTGCACCATCTTGACAG
ACTTCAAAAATTGGCTGACCTCTAAATTGTTCCCCAAGCTGCCCGGCCTCCCCTTCATCTCTTGTCAAAAGGGGTACAAG
GGTGTGTGGGCCGGCACTGGCATCATGACCACGCGCTGCCCTTGCGGCGCCAACATCTCTGGCAATGTCCGCCTGGGCTC
TATGAGGATCACAGGGCCTAAAACCTGCATGAACACCTGGCAGGGGACCTTTCCTATCAATTGCTACACGGAGGGCCAGT
GCGCGCCGAAACCCCCCACGAACTACAAGACCGCCATCTGGAGGGTGGCGGCCTCGGAGTACGCGGAGGTGACGCAGCAT
GGGTCGTACTCCTATGTAACAGGACTGACCACTGACAATCTGAAAATTCCTTGCCAACTACCTTCTCCAGAGTTTTTCTC
CTGGGTGGACGGTGTGCAGATCCATAGGTTTGCACCCACACCAAAGCCGTTTTTCCGGGATGAGGTCTCGTTCTGCGTTG
GGCTTAATTCCTATGCTGTCGGGTCCCAGCTTCCCTGTGAACCTGAGCCCGACGCAGACGTATTGAGGTCCATGCTAACA
GATCCGCCCCACATCACGGCGGAGACTGCGGCGCGGCGCTTGGCACGGGGATCACCTCCATCTGAGGCGAGCTCCTCAGT
GAGCCAGCTATCAGCACCGTCGCTGCGGGCCACCTGCACCACCCACAGCAACACCTATGACGTGGACATGGTCGATGCCA
ACCTGCTCATGGAGGGCGGTGTGGCTCAGACAGAGCCTGAGTCCAGGGTGCCCGTTCTGGACTTTCTCGAGCCAATGGCC
GAGGAAGAGAGCGACCTTGAGCCCTCAATACCATCGGAGTGCATGCTCCCCAGGAGCGGGTTTCCACGGGCCTTACCGGC
TTGGGCACGGCCTGACTACAACCCGCCGCTCGTGGAATCGTGGAGGAGGCCAGATTACCAACCGCCCACCGTTGCTGGTT
GTGCTCTCCCCCCCCCCAAGAAGGCCCCGACGCCTCCCCCAAGGAGACGCCGGACAGTGGGTCTGAGCGAGAGCACCATA
TCAGAAGCCCTCCAGCAACTGGCCATCAAGACCTTTGGCCAGCCCCCCTCGAGCGGTGATGCAGGCTCGTCCACGGGGGC
GGGCGCCGCCGAATCCGGCGGTCCGACGTCCCCTGGTGAGCCGGCCCCCTCAGAGACAGGTTCCGCCTCCTCTATGCCCC
CCCTCGAGGGGGAGCCTGGAGATCCGGACCTGGAGTCTGATCAGGTAGAGCTTCAACCTCCCCCCCAGGGGGGGGGGGTA
GCTCCCGGTTCGGGCTCGGGGTCTTGGTCTACTTGCTCCGAGGAGGACGATACCACCGTGTGCTGCTCCATGTCATACTC
CTGGACCGGGGCTCTAATAACTCCCTGTAGCCCCGAAGAGGAAAAGTTGCCAATCAACCCTTTGAGTAACTCGCTGTTGC
GATACCATAACAAGGTGTACTGTACAACATCAAAGAGCGCCTCACAGAGGGCTAAAAAGGTAACTTTTGACAGGACGCAA
GTGCTCGACGCCCATTATGACTCAGTCTTAAAGGACATCAAGCTAGCGGCTTCCAAGGTCAGCGCAAGGCTCCTCACCTT
GGAGGAGGCGTGCCAGTTGACTCCACCCCATTCTGCAAGATCCAAGTATGGATTCGGGGCCAAGGAGGTCCGCAGCTTGT
CCGGGAGGGCCGTTAACCACATCAAGTCCGTGTGGAAGGACCTCCTGGAAGACCCACAAACACCAATTCCCACAACCATC
ATGGCCAAAAATGAGGTGTTCTGCGTGGACCCCGCCAAGGGGGGTAAGAAACCAGCTCGCCTCATCGTTTACCCTGACCT
CGGCGTCCGGGTCTGCGAGAAAATGGCCCTCTATGACATTACACAAAAGCTTCCTCAGGCGGTAATGGGAGCTTCCTATG
GCTTCCAGTACTCCCCTGCCCAACGGGTGGAGTATCTCTTGAAAGCATGGGCGGAAAAGAAGGACCCCATGGGTTTTTCG
TATGATACCCGATGCTTCGACTCAACCGTCACTGAGAGAGACATCAGGACCGAGGAGTCCATATACCAGGCCTGCTCCCT
GCCCGAGGAGGCCCGCACTGCCATACACTCGCTGACTGAGAGACTTTACGTAGGAGGGCCCATGTTCAACAGCAAGGGTC
AAACCTGCGGTTACAGACGTTGCCGCGCCAGCGGGGTGCTAACCACTAGCATGGGTAACACCATCACATGCTATGTGAAA
GCCCTAGCGGCCTGCAAGGCTGCGGGGATAGTTGCGCCCACAATGCTGGTATGCGGCGATGACCTAGTAGTCATCTCAGA
AAGCCAGGGGACTGAGGAGGACGAGCGGAACCTGAGAGCCTTCACGGAGGCCATGACCAGGTACTCTGCCCCTCCTGGTG
ATCCCCCCAGACCGGAATATGACCTGGAGCTAATAACATCCTGTTCCTCAAATGTGTCTGTGGCGTTGGGCCCGCGGGGC
CGCCGCAGATACTACCTGACCAGAGACCCAACCACTCCACTCGCCCGGGCTGCCTGGGAAACAGTTAGACACTCCCCTAT
CAATTCATGGCTGGGAAACATCATCCAGTATGCTCCAACCATATGGGTTCGCATGGTCCTAATGACACACTTCTTCTCCA
TTCTCATGGTCCAAGACACCCTGGACCAGAACCTCAACTTTGAGATGTATGGATCAGTATACTCCGTGAATCCTTTGGAC
CTTCCAGCCATAATTGAGAGGTTACACGGGCTTGACGCCTTTTCTATGCACACATACTCTCACCACGAACTGACGCGGGT
GGCTTCAGCCCTCAGAAAACTTGGGGCGCCACCCCTCAGGGTGTGGAAGAGTCGGGCTCGCGCAGTCAGGGCGTCCCTCA
TCTCCCGTGGAGGGAAAGCGGCCGTTTGCGGCCGATATCTCTTCAATTGGGCGGTGAAGACCAAGCTCAAACTCACTCCA
TTGCCGGAGGCGCGCCTACTGGACTTATCCAGTTGGTTCACCGTCGGCGCCGGCGGGGGCGACATTTTTCACAGCGTGTC
GCGCGCCCGACCCCGCTCATTACTCTTCGGCCTACTCCTACTTTTCGTAGGGGTAGGCCTCTTCCTACTCCCCGCTCGGT
AGAGCGGCACACACTAGGTACACTCCATAGCTAACTGTTCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTCTTTTTTTTTTTTTTCCCTCTTTCTTCCCTTCTCATCTTATTCTACTTTCTTTCTTGGTGGCTCCATCTTAGCCCT
AGTCACGGCTAGCTGTGAAAGGTCCGTGAGCCGCATGACTGCAGAGAGTGCCGTAACTGGTCTCTCTGCAGATCATGT
James Kozubek <jkozubek <at> broadinstitute.org>: Apr 24 11:11AM -0400

OK- I see. "it may miss sequences less than 25 bases"
 
On Fri, Apr 24, 2015 at 10:54 AM, James Kozubek <jkozubek <at> broadinstitute.org
#CHEN GUOHAO# <JCHEN015 <at> e.ntu.edu.sg>: Apr 24 09:04AM

Dear admins
 
 
I uploaded my data to Galaxy (it is already sorted):


[attached screenshot]


But when I clicked on "display at UCSC main", I was taken to the UCSC browser, which showed the error below:


[attached screenshot]
 
 
May I know what could be the problem?
 
 
Regards,
 
Julius
kuhl <kuhl <at> molgen.mpg.de>: Apr 24 10:55AM +0200

Dear UCSC staff,
 
I just realized that you have set up genome preview browsers for genomes
we recently published:
 
European sea bass
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=dicLab1
 
Canary
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=serCan1
 
For these projects we have established local UCSC browsers at our
institutes, which have lots of additional annotation tracks. You might be
interested in adding these tracks to your genome preview browser versions.
 
http://seabass.mpipz.mpg.de/cgi-bin/hgGateway
http://public-genomes-ngs.molgen.mpg.de/cgi-bin/hgGateway?db=serCan1
 
Best wishes, Heiner
 
 
---------------------------------------------------------------
Dr. Heiner Kuhl
MPI Molecular Genetics Tel: + 49 + 30 / 8413 1776
Next Generation Sequencing
Ihnestrasse 73 email: kuhl <at> molgen.mpg.de
D-14195 Berlin http://www.molgen.mpg.de/SeqCore
---------------------------------------------------------------
#CHEN GUOHAO# <JCHEN015 <at> e.ntu.edu.sg>: Apr 24 08:25AM

Hi all
 
 
I set up a Genome Browser in a Box on my local workstation. Initially it was only for my own use, but now I would like to share it with my friends so that we can use it together.
 
 
I followed these instructions:

[attached screenshot]


Result:

[attached screenshot]
 
 
Now I can't even access the browser myself (I used to access it via 127.0.0.1:1234).


I read a bit online, and some say it may be because I did not allow incoming connections to port 1234, so I allowed them as follows:


[attached screenshot]

[attached screenshot]


It still fails. Please advise.
 
 
Regards
 
Julius
Vinayak Kulkarni <vkullu <at> gmail.com>: Apr 23 06:20PM -0700

Dear UCSC folks,
In the past I have used the file
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsSelf/hg19.hg19.all.chain.gz
and it has been very useful for my analysis. Do you have a similar file for
hg38?
Many thanks,
Vinayak.
Matthew Speir <mspeir <at> soe.ucsc.edu>: Apr 23 01:16PM -0700

Hi Roger,
 
Thank you for your question about finding these different bat genomes in
the UCSC Genome Browser. These genomes are available on our preview
server as:
 
* David's myotis (bat) (Myotis davidii):
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=myoDav1
* Big brown bat (Eptesicus fuscus):
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=eptFus1
* Black flying-fox (Pteropus alecto):
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=pteAle1
 
You can also find minimal downloads for these species on our preview
download server at:
 
* http://hgdownload-test.soe.ucsc.edu/goldenPath/myoDav1/
* http://hgdownload-test.soe.ucsc.edu/goldenPath/eptFus1/
* http://hgdownload-test.soe.ucsc.edu/goldenPath/pteAle1/
 
Any data on the preview server is still under development and subject to
change at any time. If you decide to use these data, please do so with
caution.
 
You are also welcome to create assembly hubs for all of these species.
Our assembly hub feature allows users to host their genome and related
annotations on a publicly-accessible web server and then visualize these
within our browser. You can find information about creating assembly
hubs on the following help pages:
 
* http://genomewiki.ucsc.edu/index.php/Assembly_Hubs
* http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html
 
Please review this previously answered mailing list question for
additional information on creating and maintaining assembly hubs:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/ozSm1vjaxRY/yZNRpWHRcvQJ
 
If you do create an assembly hub containing these species and host the
data, we can add it to our public hubs page alongside other externally
hosted hubs: http://genome.ucsc.edu/cgi-bin/hgHubConnect?
 
I hope this is helpful. If you have any further questions, please reply
to genome <at> soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
Matthew Speir
UCSC Genome Bioinformatics Group
 
 
On 4/22/15 7:48 PM, Roger Long wrote:
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to genome+unsubscribe <at> soe.ucsc.edu.

genome | 31 Mar 19:19 2015

Digest for genome <at> soe.ucsc.edu - 6 updates in 5 topics

Charlotte Hor <Charlotte.Hor <at> crg.eu>: Mar 31 03:20PM

Dear UCSC Browser Team,
 
Is it possible to find out where the SNP, CNV, etc. data for the DBA/2J mouse strain come from? Is there a publication linked with these data?
 
Thank you for your awesome resource and best regards,
Charlotte Hor
Anaïs Gouin <anais.gouin <at> irisa.fr>: Mar 31 12:08PM +0200

Good morning,
 
I would like to get the reciprocal best chains for my alignments. I realized that your pipeline ( http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best ) starts from the best chains in one direction (genomeA as reference, genomeB as query).
I looked at this page http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto where it is said that we have to use axtSort and axtBest to "keep only the longest chains". Is that the way to get the best chains?
 
But these two tools work on axt files. So should I use them directly on the alignments in axt format, or should I generate the chains with axtChain, convert the chain format to axt (if that is possible), and then apply this filtering step to the chains?
 
Thanks very much in advance for your help.
 
Best,
 
Anaïs
"Yang, Haiwang (NIH/NIDDK) [F]" <haiwang.yang <at> nih.gov>: Mar 31 02:17AM

Dear Madam/Sir,
 
I contacted Ann Zweig today, and heard that the pairwise alignment between Dmel and 11 other Drosophila species will not be updated.
 
http://hgdownload.soe.ucsc.edu/goldenPath/dm6/vsDroSim1/
 
Therefore, I have a plan to do it myself, and here I have some questions:
 
I noticed in the pipeline that blastz was used instead of the updated version, lastz. However, blastz is marked as deprecated on its website.
Is blastz really the tool that was used?
 
Did you use the masked genomes or the unmasked genomes?
 
Many thanks!
 
Best,
Haiwang
dennis <dennis <at> email.unc.edu>: Mar 29 05:33PM -0400

I have seen this question discussed but all the answers I have found
just quote the blat web page. Suppose I have a 120 nt query and get a 28 nt
hit, with 24 nt being a 100% match and a 3 nt gap. Blat reports that score
as 24, but 28*2-64-x is a negative number. This is based on the
2*match-mismatch-gap_penalty formula described on the web page. Can one
of you folks explain where I am making a mistake?
"Steve Heitner" <steve <at> soe.ucsc.edu>: Mar 30 12:33PM -0700

Hello, Dennis.
 
The calculation you should be using is referenced in http://genome.ucsc.edu/FAQ/FAQblat.html#blat4. The relevant portion of the script is:
 
my $pslScore = $sizeMul * ($matches + ( $repMatches >> 1) ) - $sizeMul * $misMatches - $qNumInsert - $tNumInsert;
 
The value of sizeMul is 3 if your query is a protein sequence and 1 otherwise. Since we're not dealing with a protein sequence, the value of sizeMul is 1, so the formula is essentially:
 
pslScore = #matches - #misMatches - #qInserts - #tInserts
 
Based on what you described, it sounds like the score is roughly what it should be.
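
As a worked illustration, the Perl expression quoted above can be transcribed into a small Python function. This is only a sketch of that one line: the function name `psl_score` and the `is_protein` flag are assumptions made here, and the arguments correspond to the PSL columns matches, repMatches, misMatches, qNumInsert and tNumInsert.

```python
def psl_score(matches, rep_matches, mis_matches, q_num_insert, t_num_insert,
              is_protein=False):
    """Transcription of the quoted pslScore expression.

    q_num_insert and t_num_insert are the numbers of inserts (gaps),
    not the numbers of inserted bases.
    """
    size_mul = 3 if is_protein else 1
    return (size_mul * (matches + (rep_matches >> 1))
            - size_mul * mis_matches
            - q_num_insert
            - t_num_insert)
```

For a nucleotide hit with 24 matching bases and no mismatches or inserts, `psl_score(24, 0, 0, 0, 0)` gives 24; each mismatch or insert then lowers the score by one.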
 
If there is still confusion, could you let us know whether you're using gfServer with hgBlat, standalone blat or something else? Also, please provide your query sequence and the name of the assembly you're querying so we can attempt to replicate your query.
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 
-----Original Message-----
From: dennis [mailto:dennis <at> email.unc.edu]
Sent: Sunday, March 29, 2015 2:34 PM
To: genome <at> soe.ucsc.edu
Subject: [genome] Understanding Blat Score Calculation
 
I have seen this question discussed but all the answers I have found just quote the blat web page. Suppose I have a 120 nt query and get a 28 nt hit, with 24 nt being a 100% match and a 3 nt gap. Blat reports that score as 24, but 28*2-64-x is a negative number. This is based on the 2*match-mismatch-gap_penalty formula described on the web page. Can one of you folks explain where I am making a mistake?
 
--
"Bin Ahmad Zabidi,Muhammad Mamduh" <muhammad.zabidi <at> imp.ac.at>: Mar 30 12:34PM

Hi Brian,
 
It finally works! I’m using the genome-euro.ucsc.edu site since it’s a bit faster here.
 
Sincerely,
-Mamduh
 
 
On Mar 27, 2015, at 6:55 PM, Brian Lee <brianlee <at> soe.ucsc.edu> wrote:
 
Dear Mamduh,
 
Thank you for using the UCSC Genome Browser and your question about track hubs.
 
Are you still seeing the error? Perhaps you are visiting a mirror of the UCSC Genome Browser and not our official site, http://genome.ucsc.edu/.
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to genome <at> soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
On Wed, Mar 25, 2015 at 3:31 AM, Bin Ahmad Zabidi,Muhammad Mamduh <muhammad.zabidi <at> imp.ac.at> wrote:
 
Hi there,
 
I’ve been trying to load my own tracks to the UCSC genome browser, but the Track hubs button has not worked since yesterday.
A colleague of mine here is having the same problem.
 
It also doesn’t work in Chrome, Safari or Firefox on my computer.
 
Do you have any suggestion on how we could get around this?
 
Many thanks!
 
Sincerely,
-Mamduh
 
genome | 28 Mar 18:10 2015

Digest for genome <at> soe.ucsc.edu - 3 updates in 3 topics

Brian Lee <brianlee <at> soe.ucsc.edu>: Mar 27 11:25AM -0700

Dear Kumar,
 
Thank you for using the UCSC Genome Browser and your question about a
bedgraph memory issue when uploading a file.
 
Please see this recent conversation:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/Eybt7oo8m5A/dhdFwDB8uMEJ
 
What would be best is to create a binary file that you can host on the
internet that the browser can access so that only bits of the file are
transferred when browsed, instead of the entire file.
 
For an overview you might want to look at this helpful wikipage:
http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
 
What you would do is get the bedGraphToBigWig utility for your system. You
can do uname -a and pick the utility from the matching directory here:
http://hgdownload.soe.ucsc.edu/admin/exe/
 
Then you would get the chrom.sizes file for the matching assembly, which
appears to be hg19 in this case. You can obtain it here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
 
Then you could run a command like:
bedGraphToBigWig test.bedGraph hg19.chrom.sizes out.test.bw
 
If some annotations extend past the ends of chromosomes, you can avoid
that problem by using the -clip option to clip these final entries.
 
Our engineer shares that the chrM message you see could be a symptom of how
our hg19 assembly has a non-standard chrM, and that you might want to strip
those entries out. You could do that with:
cat test.bedGraph | grep -v "chrM" > test.edit.bedGraph
If you have more questions about the chrM coordinates, please feel free to
reply.
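
The two clean-up steps above (dropping chrM entries and trimming entries that run past the end of a chromosome) can also be sketched in Python before running bedGraphToBigWig. This is a minimal sketch under stated assumptions: the function names `load_chrom_sizes` and `clean_bedgraph` are hypothetical, the input is plain four-column bedGraph, and chromosomes are matched by exact name rather than by the substring match that grep performs.

```python
def load_chrom_sizes(lines):
    """Parse chrom.sizes lines ("name<TAB>size") into a dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def clean_bedgraph(lines, chrom_sizes, drop_chroms=("chrM",)):
    """Yield bedGraph rows with unwanted chromosomes removed and
    end coordinates clipped to the chromosome size."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # blank, track, or malformed line
        chrom, value = fields[0], fields[3]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # not a data row
        if chrom in drop_chroms:
            continue
        size = chrom_sizes.get(chrom)
        if size is not None:
            if start >= size:
                continue  # entry lies entirely past the chromosome end
            end = min(end, size)
        yield f"{chrom}\t{start}\t{end}\t{value}"
```

The cleaned rows can be written to a new file (for example test.edit.bedGraph) and passed to bedGraphToBigWig together with hg19.chrom.sizes as in the command shown earlier.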
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
On Tue, Mar 10, 2015 at 6:50 PM, Anandh Kumar <anandhakumar86 <at> gmail.com>
wrote:
 
Brian Lee <brianlee <at> soe.ucsc.edu>: Mar 27 10:55AM -0700

Dear Mamduh,
 
Thank you for using the UCSC Genome Browser and your question about track
hubs.
 
Are you still seeing the error? Perhaps you are visiting a mirror of the
UCSC Genome Browser and not our official site, http://genome.ucsc.edu/.
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
On Wed, Mar 25, 2015 at 3:31 AM, Bin Ahmad Zabidi,Muhammad Mamduh <
Brian Lee <brianlee <at> soe.ucsc.edu>: Mar 27 10:46AM -0700

Dear Dan,
 
Thank you for using the UCSC Genome Browser and your question about your
custom tracks that are not loading.
 
We have heard reports from other users also experiencing this issue with
copy.com/files that used to work on the browser. Other free services, like
Dropbox, that once worked on the browser changed their configurations to
disallow the byte range requests needed for the data to be accessed by the
browser, and it appears copy.com may have likewise changed some
configurations recently. Byte range requests are very efficient for
internet access, as they allow the browser to read arbitrary bits of the
file without having to read the entire file, but some of these free
providers find all these requests expensive for their servers to honor.
However, when big files represent several gigabytes of data, a byte range
request that fetches only the tiny bits of the huge files to be displayed
is much less taxing overall, so some system administrators are willing to
enable these requests once they understand the bigger picture. You might
want to contact copy.com and ask them about changes they may have made.
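
Whether a host still honors byte range requests can be checked from a script. The sketch below uses only the Python standard library and relies on standard HTTP behavior: a server that honors a Range header answers with status 206 and a Content-Range header, while one that ignores it usually returns the whole file with status 200. The function names and the example URL are hypothetical.

```python
import urllib.request


def build_range_request(url, first_byte=0, last_byte=99):
    """Create a request asking only for bytes first_byte..last_byte."""
    headers = {"Range": f"bytes={first_byte}-{last_byte}"}
    return urllib.request.Request(url, headers=headers)


def honors_byte_range(status, content_range):
    """Interpret a response: 206 plus a Content-Range header means the
    server returned just the requested slice."""
    return (status == 206
            and content_range is not None
            and content_range.startswith("bytes "))


def check_url(url):
    """Fetch the first 100 bytes of url and report whether the server
    honored the range request (needs network access)."""
    with urllib.request.urlopen(build_range_request(url), timeout=30) as resp:
        return honors_byte_range(resp.status, resp.headers.get("Content-Range"))
```

Calling `check_url("http://example.com/myTrack.bw")` with the real location of a hosted file returns True when range requests work. A False result suggests the host is sending the entire file instead, which matches the behavior described above.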
 
It is definitely worth investigating with copy.com why these changes are
happening, but it is likely that, like Dropbox, they have discontinued
some aspect of support for files they are storing for free. We've seen
some people have success with an Amazon Cloud account using S3 storage,
and often, if you are a paying client of these services, they will work to
resolve your needs.
 
Another option, if you do not need to share these files remotely, is to
use our now-available virtual machine, Genome Browser in a Box (GBiB): you
install a VM of the UCSC Browser and share files locally on your laptop,
skipping the need to run anything over the internet. GBiB is free for
non-commercial use; read more here:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/xhucY1YVv88/K2gyAwEgEmcJ
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
genome | 19 Mar 18:22 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

Bogdan Tanasa <tanasa <at> gmail.com>: Mar 19 08:05AM -0700

Dear all,
 
please could you advise on the following :
(possibly based on http://hgdownload.soe.ucsc.edu/goldenPath/ce10/bigZips/)
 
<> what collection of sequences of mRNAs of C elegans ce10 genome shall I
use ?
 
<> how could I relate their NM/NR codes to the actual gene names (e.g.
daf-16) ?
 
thank you,
 
bogdan
Karol Nowicki-Osuch <karolno <at> gmail.com>: Mar 18 06:06PM

Hello,
 
I would like to unshare (delete) a session which I have shared with other
parties. I used the 'Bookmark the session' option to share it with others.
 
I thought that simply deleting the session from my saved sessions would do
the trick. However, when I follow the original share link, I can still
access the session. Can you please advise how the original link to the
session can be deleted?
 
Thanks a lot,
 
Karol
genome | 7 Mar 18:19 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

Jonathan Casper <jcasper <at> soe.ucsc.edu>: Mar 06 05:05PM -0800

Hello Manasa,
 
Thank you for your question about obtaining genomic sequence around the
location of your SNPs. It is straightforward to obtain genomic sequence in
a region 60 bp up and downstream of your SNPs, but tying that together with
reading frame information may be difficult.
 
To obtain genomic sequence for your SNPs, you will need to load them into
the UCSC Genome Browser as a custom track. If your data are in pgSNP or VCF
format, you can load that file directly as a custom track. Otherwise, you
may need to convert your SNP data into a simple coordinate format like BED (
http://genome.ucsc.edu/FAQ/FAQformat.html#format1). After that, open the
UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables and select
your custom track. Select the region "genome" and output format "sequence",
then click "get output". On the next page, you can fill in the boxes to add
60 bases up and downstream from your SNPs.
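
As a rough sketch of the BED conversion step, assuming your SNPs are given as 1-based single-base positions (an assumption; check your own data), each SNP becomes a BED line with a 0-based start and an end one greater. The function name and the rsID in the usage note are made up for illustration. The flank argument is optional, since the Table Browser can add the 60 bases for you.

```python
def snp_to_bed(chrom, pos, name, flank=0):
    """Convert a 1-based SNP position into a tab-separated BED4 line.

    BED uses a 0-based start and an exclusive end, so a SNP at 1-based
    position `pos` covers the interval [pos - 1, pos).
    """
    if pos < 1:
        raise ValueError("pos must be a 1-based coordinate (>= 1)")
    start = max(0, pos - 1 - flank)  # clamp at the chromosome start
    end = pos + flank                # not clamped: chromosome size is unknown here
    return "\t".join([chrom, str(start), str(end), name])
```

For example, snp_to_bed("chr1", 100, "rs0000") gives the interval 99-100, and with flank=60 the interval 39-160, which is 121 bases: the SNP plus 60 on each side.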
 
Reading frame information is difficult because it is tied to particular
gene definitions, and for some species we have many gene tracks. The
Variant Annotation Integrator tool at http://genome.ucsc.edu/cgi-bin/hgVai will
allow you to submit variants in VCF or pgSNP format (or just as a list of
rsIDs from dbSNP), and generate consequences with respect to a gene set of
your choice. For missense variants, that includes a short display of any
codon changes. It will not, however, extend to 60 bases and there is
currently no way to change that setting. Perhaps you can combine this
output with the sequence from the Table Browser to get what you need?
 
I hope this is helpful. If you have any further questions, please reply to
genome <at> soe.ucsc.edu or genome-mirror <at> soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genome-www <at> soe.ucsc.edu.
 
--
Jonathan Casper
UCSC Genome Bioinformatics Group
 
Valya Burskaya <valya.burskaya <at> gmail.com>: Mar 07 03:14AM +0300

Hello!
 
I am interested in working with the alignments of 99 vertebrate genomes with human
genes. There are three variants of the alignments available for download in FASTA
format: knownGene, knownCanonical and RefSeq. Could you please help me
distinguish the contents of these files, or tell me where to look for this information?
 
At the moment I understand their contents as follows:
 
The knownGene file contains exons of approximately 64,000 genes; it should be
all isoforms of human protein genes, pseudogenes and RNA genes. (So
each exon could be represented in the alignment many times if it belongs to
different splice variants.)
The RefSeq file should be more or less the same, but the set of gene IDs was taken
from the RefSeq database.
The knownCanonical file contains ~21,000 genes: only protein-coding
genes, with the longest CDS for each set of alternatively transcribed genes.
(So each exon should be represented in the alignment only once.)
 
And one more question.
Even in the knownCanonical file there are stop codons in the middle of human
(~20 sequences) and vertebrate (~300,000 sequences) genes. Was there any
filtering of such cases when the alignment was performed?
 
 
I'll be grateful for any answer.
 
Valya Burskaya
graduate student
Moscow State University
Biology department
genome | 28 Feb 18:22 2015

Digest for genome <at> soe.ucsc.edu - 4 updates in 3 topics

Jennifer Tom <tom.jennifer <at> gene.com>: Feb 27 04:08PM -0800

Hi!
 
I've been working with the UCSC self chain track and would like to include
a transformed version of it in an R/bioconductor package. I would
obviously cite UCSC and provide a link to the original file. I just want
to make sure this is something that UCSC allows with its data. Thanks!
 
Jen
James Studd <James.Studd <at> icr.ac.uk>: Feb 27 11:30AM

Hi,
 
I was hoping to find a bed file which contains the mapped motifs for transcription factors. It's not immediately clear from looking at the bed file above or the parent directory 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/' which file has this information. Is it contained within the wgEncodeRegTfbsClusteredV3.bed.gz file? If so how can you distinguish ChIP seq peaks from motifs?
 
Many thanks
 
James Studd | Postdoctoral Research Fellow
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey | SM2 5NG
T +44 208 722 4113 | E James.Studd <at> icr.ac.uk | W www.icr.ac.uk | Twitter <at> ICR_London
Facebook www.facebook.com/theinstituteofcancerresearch
Making the discoveries that defeat cancer
Brian Lee <brianlee <at> soe.ucsc.edu>: Feb 27 03:20PM -0800

Dear James,
 
Thank you for using the UCSC Genome Browser and your question about
obtaining a bed file for the mapped motifs for transcription factors in the
TFBS Clusters track:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3
 
Usually where you looked would be the right place to find the downloadable
files. However, to avoid the kind of confusion you are hinting at,
and since the motifs are not universal to all clusters, there was
discussion of releasing them to a downloads location for a separate track
devoted to these factorbook-generated motifs, but that hasn't happened yet.
You can obtain this information now, however, from either the Table Browser
or the public MySQL server.
 
To use the MySQL server, please see the resource page,
http://genome.ucsc.edu/goldenPath/help/mysql.html, and then you could use a
command like the following, or a variation of it, to obtain parts or all of
the tables of interest:
 
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne 'select * from factorbookMotifPos;' hg19 > factorbookMotifPos
 
The other tables of interest would be the factorbookMotifPwm table and the
two tables that help map the factorbook TFBS terms to the names used
at UCSC: factorbookMotifCanonical and factorbookGeneAlias. Those tables
help translate terms; for example, both UCSC and factorbook use BRCA1, but
they differ on CTBP2 versus CtBP2, and on EP300 versus P300.
 
You can also access all these tables from the Table Browser:
http://genome.ucsc.edu/cgi-bin/hgTables
 
1. Select the hg19 human assembly.
2. Set "group:" to "All Tables"
3. From "table:" select the factorbookMotifPos table.
*At this point you could use the filter or intersection tools to limit the
output to factors or locations of interest (for example, via a BED file of
coordinates). See more about the Table Browser here:
http://www.openhelix.com/cgi/tutorialInfo.cgi?id=28
4. Set "output format:" to either "custom track" or "BED" and click "get
output".
 
If what you want is just the motifs displayed in the
wgEncodeRegTfbsClusteredV3 track, it becomes a little more complicated, as
algorithms are applied to limit the display of motifs from the original
factorbookMotifPos table to the highest score per region. Here is a session
link that illustrates this issue, where multiple motifs for NFY exist (NFYB
at UCSC) and only one is mapped to the cluster:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.factorbookMotifPos
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
STIRPARO Giuliano Giuseppe RIC <Giuliano.Stirparo <at> humanitasresearch.it>: Feb 27 12:58PM +0100

Hi, I have some data like this:
 
file_reverse.BedGraph
 
track type=bedGraph smoothingWindow=16
1 3201074 3201125 -0.06
1 3201215 3201266 -0.06
1 3201839 3201844 -0.06
1 3201845 3201891 -0.06
1 3201947 3201998 -0.06
1 3210947 3210998 -0.06
1 3211290 3211341 -0.06
1 3216209 3216260 -0.13
1 3216369 3216420 -0.13
1 3254766 3254817 -0.06
 
file_Forward.bedGraph
 
track type=bedGraph smoothingWindow=16
1 3152649 3152700 0.06
1 3152983 3153030 0.06
1 3153031 3153035 0.06
1 3215255 3215306 0.06
1 3215307 3215358 0.06
1 4243564 4243575 0.06
1 4243575 4243615 0.13
1 4243615 4243626 0.06
1 4496431 4496482 0.06
 
I have two questions.
1) I would like to merge the data into a single file and represent the coverage of each strand with a different color.
2) I would like to smooth my data (I have already tried smoothingWindow=16, but the track looks the same as with smoothingWindow=off).
 
Thanks for ideas or suggestions.
 
Best
 
Giuliano
 
genome | 17 Feb 18:23 2015

Digest for genome <at> soe.ucsc.edu - 1 update in 1 topic

Brian Lee <brianlee <at> soe.ucsc.edu>: Feb 16 11:00AM -0800

Dear Penny,
 
Thank you for using the UCSC Genome Browser and your question about
netClass usage.
 
The tDb and qDb items, where you have entered maskt and maskq, are meant to
represent MySQL databases that should exist on a MySQL server. The names you
have entered, maskt and maskq, do not exist as databases, which explains the
error you are seeing, "Unknown database 'maskt'".
 
In the example on the wikipage (momentarily unavailable), netClass -noAr
noClass.net ci2 cioSav2 cioSav2.net, the "ci2" and "cioSav2" represent two
assemblies that netClass accesses on a MySQL server.
 
To see more about using the public access MySQL database, and adding a
.hg.conf file, see this page:
http://genome.ucsc.edu/goldenPath/help/mysql.html
 
Example usage:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne 'show tables' ci2
 
It may be helpful to note that the cioSav2 assembly, in the above example,
is one that has been added to the development MySQL environment,
http://genome-preview.ucsc.edu/cgi-bin/hgTracks?db=cioSav2, but is not
available from the Public MySQL server.
 
We have mailing list archives where people using netClass have asked
questions that may have helpful answers for your situation. Please review
this answer and search the google group archive for similar questions
before mailing the list again:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/BYb35wDEwKI/-PKU8C9VI_MJ
 
Note the information in that previous answer: if you are
aligning assemblies that are not hosted by UCSC, you could set up your
own MySQL server and put the information for accessing it into a .hg.conf file
in your home directory, or skip this netClass step of the alignment process.
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please first search the google group archive
for similar questions before mailing the list at genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
genome | 16 Feb 18:26 2015

Digest for genome <at> soe.ucsc.edu - 1 update in 1 topic

"淚鱼" <liang_ping_ping <at> 163.com>: Feb 15 02:07PM +0800

Dear all, I have a problem with netClass while doing a whole-genome alignment
(http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto). Can you help me?
---penny
 
 

genome | 14 Feb 18:08 2015

Digest for genome <at> soe.ucsc.edu - 5 updates in 2 topics

"seirana.hashemi" <seirana.hashemi <at> ut.ac.ir>: Feb 13 11:16PM +0330

Hello,
 
Dear Sir/Madam,
 
I emailed you before to ask some questions about GRCh37. Your answers
were very useful, and I really appreciate them. I now have some
further questions:
 
There are txStart and txEnd columns in the UCSC Genes track, knownGene
table. What are these? There are also cdsStart and cdsEnd; are they
abbreviations for "start of coding regions" and "end of coding regions"?
Another question: how can I convert the minus strand to the plus strand,
because I have to work only on the plus strand? Is this correct: chromosome
size - txStart on the minus strand = txStart on the plus strand?
 
Looking forward to receiving your kind reply at your earliest
convenience
 
Yours sincerely,
Seirana Hashemi
"Steve Heitner" <steve <at> soe.ucsc.edu>: Feb 13 02:43PM -0800

Hello, Seirana.
 
Thank you for your kind words. We certainly always try to be as helpful as possible. :)
 
The txStart and txEnd fields indicate the start and end of transcription, which includes both coding and non-coding regions. As you properly identified, the cdsStart and cdsEnd fields indicate the start and end of the coding regions only.
 
Note that even transcripts on the - strand have their txStart, txEnd, cdsStart and cdsEnd fields defined in terms of the + strand. If your goal is to get everything in terms of + strand coordinates, then most of the work is already done for you. The only important thing to understand is that because everything is defined in terms of the + strand, what we list as txStart for - strand items is actually the end of transcription and what we list as txEnd for - strand items is actually the start of transcription. It is organized in this manner for purposes of drawing the items in our display.
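
As a small illustration of the point above, here is a sketch of how one might recover the biological start and end of transcription from a knownGene row. The function name and the example coordinates are made up; the only logic is the swap for - strand items, and no coordinate arithmetic against the chromosome size is needed.

```python
def transcription_bounds(tx_start, tx_end, strand):
    """Return (start_of_transcription, end_of_transcription).

    Both values stay in + strand coordinates, exactly as stored in the
    table; for - strand items the two fields simply trade roles.
    """
    if strand == "+":
        return tx_start, tx_end
    if strand == "-":
        return tx_end, tx_start
    raise ValueError("strand must be '+' or '-', got %r" % (strand,))
```

For a hypothetical - strand row with txStart=1000 and txEnd=5000, transcription_bounds(1000, 5000, "-") returns (5000, 1000): transcription begins at the position listed as txEnd and ends at the position listed as txStart.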
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 

 
From: seirana.hashemi [mailto:seirana.hashemi <at> ut.ac.ir]
Sent: Friday, February 13, 2015 11:46 AM
To: genome <at> soe.ucsc.edu
Subject: [genome] some questions about UCSC Genes track, knownGene table
 

 
 
--
"Steve Heitner" <steve <at> soe.ucsc.edu>: Feb 13 10:02AM -0800

Hello, Bogdan.
 
A very similar question was asked recently. Please see https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/IEP_aZjQp_s/-XB_nPwWcEMJ. You can also view the schema of the RepeatMasker track by viewing the track description page (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=rmsk) and clicking the “View table schema” link.
 
I believe your question about the reference sequences of transposons that the program uses would be best directed to the folks at RepeatMasker.
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 

 
From: Bogdan Tanasa [mailto:tanasa <at> gmail.com]
Sent: Thursday, February 12, 2015 10:26 PM
To: genome <at> soe.ucsc.edu
Subject: [genome] about RepeatMasker
 

 
Dear all,
 
please could you advise about the coordinates repStart, repEnd and repLeft in the output of RepeatMasker, and about the Reference Sequences of Retrotransposons that the program uses. thank you,
 
-- bogdan
 
--
Bogdan Tanasa <tanasa <at> gmail.com>: Feb 13 10:10AM -0800

Hi Steve,
 
thank you for your reply. If I may add one more question, this one is about
the data from ENCODE and Roadmap Epigenomics:
are the latest tracks from ENCODE and Roadmap Epigenomics already available
in the UCSC Genome Browser? Thanks, and happy weekend,
 
bogdan
 
Brian Lee <brianlee <at> soe.ucsc.edu>: Feb 13 10:45AM -0800

Dear Bogdan,
 
Thank you for your question about the latest tracks from ENCODE or Roadmap
Epigenomics. For the latest information from both these sources you should
contact them directly.
 
At UCSC, only ENCODE data from 2003 to 2012, capturing the pilot project
and previous “ENCODE2” phase, are accessible currently. See the top of our
ENCODE page, http://genome.ucsc.edu/ENCODE/, for more information.
 
The new ENCODE portal, https://www.encodeproject.org, includes any new data
produced from 2013 to present day, as well as all publicly available ENCODE
data. For ENCODE data questions, please contact the ENCODE portal at
encode-help <at> lists.stanford.edu
 
At UCSC, for the Roadmap Epigenomics project, we feature two Public Hubs,
http://genome.ucsc.edu/cgi-bin/hgHubConnect, externally maintained and
updated by the Roadmap Project. You can find one hub for their Complete
Collection and a second Integrative Analysis Hub:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://vizhub.wustl.edu/VizHub/RoadmapIntegrative.txt
 
For all Roadmap Epigenomics data questions, please contact their website,
http://www.roadmapepigenomics.org/
 
In summary, you should contact ENCODE or the Roadmap
Epigenomics Group directly with any questions about their very latest information.
If you have any further UCSC questions, please reply to genome <at> soe.ucsc.edu.
All messages sent to that address are archived on a publicly-accessible
forum. If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
genome | 24 Jan 18:23 2015

Digest for genome <at> soe.ucsc.edu - 1 update in 1 topic

asma.mistadi <at> yahoo.com: Jan 23 05:59PM -0500

I am very interested in knowing several things about the UCSC Genome Browser, such as the following:
 
What is the biggest data set that has been used in this genome browser? What is its size?
 
How many concurrent users can your genome browser handle?
 
How fast are the navigation and uploading functions?
 
How many tracks can be shown? Is there an upper limit?
 
Best Regards
Asma Mistadi
genome | 23 Jan 18:07 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Jan 23 09:04AM -0800

Hello Havard,
 
Thank you for your question. For each item in a BED file, the program
reports the highest and lowest scores that were found within the boundaries of
the BED item. However, if you are looking for maxima from overlapping
tiles (if this is what you mean by "the maximum found of every 10 bp tile
within this region"), then you can use the -sampleAroundCenter option. The
-sampleAroundCenter=N option takes a sample of N bases around the center of
each BED item. To do this, your BED file should contain one entry for each
position, like so:
 
chr17 1 2
chr17 2 3
chr17 3 4
chr17 4 5
 
Use -sampleAroundCenter=10. Note, it is integer arithmetic, so fractions
are truncated (4.5 is turned into 4).
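
If it helps, a BED file with one entry per position, like the one above, can be generated with a few lines of script. This is only a sketch: the function name is hypothetical, and it writes tab-separated lines rather than the space-separated ones shown above.

```python
def per_base_bed(chrom, start, end):
    """Yield one single-base BED line for every position in [start, end)."""
    for pos in range(start, end):
        yield "%s\t%d\t%d" % (chrom, pos, pos + 1)


if __name__ == "__main__":
    # Reproduces the four example entries from this message.
    for line in per_base_bed("chr17", 1, 5):
        print(line)
```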
 
Alternatively, you may be interested in bwtool from Andy Pohl, which can
manage any kind of intersection with bigWig files:
https://github.com/CRG-Barcelona/bwtool
 
If you have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
 
 
On Wed, Jan 21, 2015 at 2:24 AM, Håvard Aanes <Havard.Aanes <at> rr-research.no>
wrote:
 
Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Jan 23 08:32AM -0800

Hello Rhianna,
 
Thank you for your question. You can read more about the CpG Islands track
and how it was constructed on the description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cpgIslandSuper. CpG
islands are regions with a high frequency of CG nucleotides which may or
may not have nearby transcription factor binding sites.
 
If you have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
 
 
On Wed, Jan 21, 2015 at 8:32 AM, Sundsbak, Rhianna S. <
