Digest for genome <at> soe.ucsc.edu - 12 Messages in 11 Topics
<genome <at> soe.ucsc.edu>
2013-04-05 17:36:51 GMT
Group: http://groups.google.com/a/soe.ucsc.edu/group/genome/topics
chenjh <chenjhbio <at> 163.com> Apr 05 04:24PM +0800
Dear Sir,
Very sorry to trouble you.
We have run into a problem: could you tell us how to download human dbSNP 137?
Many thanks in advance.
Luvina Guruvadoo <luvina <at> soe.ucsc.edu> Apr 05 09:50AM -0700
Hello,
You can download dbSNP 137 tables directly from our downloads server.
From the main page, click on 'Downloads' (located in the left-side
navigation menu). On the following page, click on 'Human' then
'Annotation Database'. This will take you here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/. Scroll down
and you will find all download files associated with dbSNP 137.
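For scripted downloads, the file locations can be built from the directory URL above. This is only a sketch: it assumes that the main dbSNP 137 table is named `snp137` and that each table in the directory is published as a `.sql` schema file plus a `.txt.gz` data file. The function name `table_download_urls` is made up for illustration.

```python
# Directory URL taken from the reply above.
BASE_URL = "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/"


def table_download_urls(table, base_url=BASE_URL):
    """Return (schema_url, data_url) for one annotation database table.

    Assumption: each table is stored as <table>.sql and <table>.txt.gz.
    """
    if not table or "/" in table:
        raise ValueError(f"not a plain table name: {table!r}")
    return f"{base_url}{table}.sql", f"{base_url}{table}.txt.gz"


# "snp137" is an assumed table name; check the directory listing first.
schema_url, data_url = table_download_urls("snp137")
print(schema_url)
print(data_url)
```

The resulting URLs can then be passed to any HTTP client, for example `wget` or Python's `urllib.request.urlretrieve`.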
Please contact us again at genome <at> soe.ucsc.edu if you have any further
questions.
---
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
On 4/5/2013 1:24 AM, chenjh wrote:
Peng Yu <pengyu.ut <at> gmail.com> Apr 05 11:31AM -0500
Hi,
The following mysql code shows that many proteins appear more than
once in knownGene.
~~~
mysql> select count(*) from knownGene where proteinID!="";
+----------+
| count(*) |
+----------+
|    41735 |
+----------+
1 row in set (0.32 sec)

mysql> select count(distinct proteinID) from knownGene where proteinID!="";
+---------------------------+
| count(distinct proteinID) |
+---------------------------+
|                     34117 |
+---------------------------+
1 row in set (0.23 sec)

mysql> select proteinID from knownGene group by proteinID having count(*) >=2 limit 5;
+-----------+
| proteinID |
+-----------+
|           |
| A0ELI5    |
| A0JNT0    |
| A0JNY8    |
| A0JNZ2    |
+-----------+
5 rows in set (0.26 sec)

mysql> select * from knownGene where proteinID="A0ELI5";
+------------+-------+--------+----------+----------+----------+----------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------+------------+
| name       | chrom | strand | txStart  | txEnd    | cdsStart | cdsEnd   | exonCount | exonStarts                                                      | exonEnds                                                        | proteinID | alignID    |
+------------+-------+--------+----------+----------+----------+----------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------+------------+
| uc009pvp.1 | chr9  | +      | 57556375 | 57597969 | 57561204 | 57596302 |         7 | 57556375,57561187,57563754,57574992,57587696,57592391,57595967, | 57556506,57561368,57564074,57575328,57587850,57592609,57597969, | A0ELI5    | uc009pvp.1 |
| uc009pvq.1 | chr9  | +      | 57556375 | 57597969 | 57575057 | 57596302 |         6 | 57556375,57561187,57574992,57587696,57592391,57595967,          | 57556506,57561368,57575328,57587850,57592609,57597969,          | A0ELI5    | uc009pvq.1 |
+------------+-------+--------+----------+----------+----------+----------+-----------+-----------------------------------------------------------------+-----------------------------------------------------------------+-----------+------------+
2 rows in set (0.07 sec)
~~~
The following shows that UniProt has more mouse proteins than the ones
in knownGene.
~~~
~$ wget -qO- ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/proteomes/MOUSE.fasta.gz | gunzip | grep '^>' | wc -l
50798
~~~
Does anybody know the reason for either of these observations?
--
Regards,
Peng
"Steve Heitner" <steve <at> soe.ucsc.edu> Apr 05 08:19AM -0700
Hello, Shijian.
If you are unable to solve the connection problem at your institution, you
may consider one of the many public storage solutions available. Also, we
recommend using HTTP rather than FTP. An HTTP URL will operate more
efficiently, requiring only one connection and a single request rather than
two connections and several setup requests.
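As an illustration of that recommendation, the sketch below builds a BAM custom track line that only accepts HTTP(S) URLs and also returns the index URL expected next to the BAM file. The host `example.org` and the function name `bam_track_line` are hypothetical. The track line layout follows the one quoted in the original question further down.

```python
from urllib.parse import urlparse


def bam_track_line(name, bam_url):
    """Return (track_line, index_url) for a BAM custom track.

    Rejects non-HTTP(S) URLs, following the advice to prefer HTTP over FTP.
    """
    scheme = urlparse(bam_url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got scheme {scheme!r}")
    if not bam_url.endswith(".bam"):
        raise ValueError("bigDataUrl should point at a .bam file")
    track_line = f'track type=bam name="{name}" bigDataUrl={bam_url}'
    # The index must sit beside the BAM file, with ".bai" appended.
    index_url = bam_url + ".bai"
    return track_line, index_url


# example.org is a placeholder host, not a real data server.
line, index_url = bam_track_line("My BAM", "http://example.org/adipose.sorted.bam")
print(line)
print(index_url)
```

Both URLs should open in a web browser before the track line is submitted.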
Please contact us again at genome <at> soe.ucsc.edu if you have any further
questions.
---
Steve Heitner
UCSC Genome Bioinformatics Group
From: genome <at> soe.ucsc.edu [mailto:genome <at> soe.ucsc.edu] On Behalf Of Zhangsj
Sent: Monday, April 01, 2013 4:39 PM
To: genome
Subject: Re:RE: [genome] Custom Track URL error
Sorry, I should have mentioned that I had already tried this and could view my
data in a browser, and that I have not set listen_address for vsftp. So I guess
the problem is not that UCSC's IP is denied access to my FTP server.
------------------ Original ------------------
From: "Steve Heitner"<steve <at> soe.ucsc.edu>;
Date: Tue, Apr 2, 2013 01:59 AM
To: "'Zhangsj'"<zhangsjsky <at> foxmail.com>; "'genome'"<genome <at> soe.ucsc.edu>;
Subject: RE: [genome] Custom Track URL error
Hello, Shijian.
I just attempted to browse to your FTP site and it was unavailable. If you
are unable to view the contents of the directory in a web browser, you will
likewise be unable to load your files as a custom track. Make sure that
when you browse to ftp://162.105.138.90, you can see your files in the
directory. Once you can do that, try loading your custom track again.
Please contact us again at genome <at> soe.ucsc.edu if you have any further
questions.
---
Steve Heitner
UCSC Genome Bioinformatics Group
From: genome <at> soe.ucsc.edu [mailto:genome <at> soe.ucsc.edu] On Behalf Of Zhangsj
Sent: Monday, April 01, 2013 8:39 AM
To: genome
Subject: [genome] Custom Track URL error
Hi, dear UCSC members
I want to upload a BAM file to UCSC, but got the following error:
Can't access My BAM's bigDataUrl ftp://162.105.138.90/adipose.sorted.bam
and/or the associated index file
ftp://162.105.138.90/adipose.sorted.bam.bai: TCP non-blocking connect() to
162.105.138.90 timed-out in select() after 10000 milliseconds - Cancelling!
The track line is: track type=bam name="My BAM"
bigDataUrl=ftp://162.105.138.90/adipose.sorted.bam
I have set up an FTP server on 162.105.138.90 with vsFTP and configured it for
passive connection mode, as UCSC requires. Anonymous FTP is on. The
adipose.sorted.bam.bai file is in the same directory as adipose.sorted.bam.
Did I miss something else? I look forward to your reply.
------------------
With best wishes,
Shijian Sky Zhang
---------------------------------------
Institute of Molecular Medicine, Peking University
Yingjie Exchange Center
Beijing 100871, China
E-mail: zhangsjsky <at> foxmail.com
zhangsjsky <at> pku.edu.cn
Pauline Fujita <pauline <at> soe.ucsc.edu> Apr 04 05:13PM -0700
Hello Mark,
We think we might have corrected the issue. If you are still having
trouble with this please contact the mailing list again at
genome <at> soe.ucsc.edu.
Best regards,
Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Pauline Fujita <pauline <at> soe.ucsc.edu> Apr 04 01:51PM -0700
Hello Herty,
If you want to draw graphs of signal data from ENCODE, you probably
want to use the ENCODE uniform signals, which are in bigWig format. These
are available from the ENCODE downloads page:
http://encodeproject.org/ENCODE/downloads.html
You may also find our documentation on ENCODE file formats useful:
http://encodeproject.org/FAQ/FAQformat.html#ENCODE
As far as R packages are concerned, unfortunately that is beyond the scope
of this list. There are quite a few packages available online; you will
have to see which best fits your needs. A good starting point might be the
bioconductor site:
http://www.bioconductor.org/help/search/index.html?q=R
Best of luck with your research! If you have further questions regarding
Genome Browser usage, please feel free to contact the mailing list again at
genome <at> soe.ucsc.edu.
Best regards,
Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
On Thu, Apr 4, 2013 at 2:45 AM, Herty LIANY (GIS)
Pauline Fujita <pauline <at> soe.ucsc.edu> Apr 04 01:25PM -0700
Hello Sarah,
You can find the files you're looking for in the downloads area of our test site:
http://hgdownload-test.soe.ucsc.edu/goldenPath/nemVec1/bigZips/
Note that in general, items on our test server have not been through
our quality assurance process and are subject to change.
Best regards,
Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu
Kate Rosenbloom <kate <at> soe.ucsc.edu> Apr 04 12:59PM -0700
Hello Brenden,
The Txn Factor ChIP-seq browser track contains clusters of enriched
sites from ChIP-seq performed in cell types from
all 3 tiers of ENCODE. You can see the complete list of cell types
included in this track on the track description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV2
The table you pulled from the Table Browser includes information for
each cluster indicating which cell types contributed to the cluster, and
what the signal strength for that cell type was in the peak contributing
to the cluster, but not the specific coordinates for the peaks on a
cell-specific basis. For that, you would need the individual
per-cell-type datasets. The uniform ChIP-seq peaks that were used to
generate this version of the Txn Factor ChIP track are available at the
ENCODE Analysis Track Hub, for downloads, and access with the Table
Browser (and Genome Browser). For a limited number of datasets, you can
readily extract the peak coordinates for your region of interest using
the Table Browser after loading the track hub. You can do this directly
from the browser Track Connect page, or indirectly from the ENCODE
downloads page (click the TFBS link in the Uniform Peaks section
header): http://encodeproject.org/ENCODE/downloads.html
Unfortunately the ENCODE data is so voluminous that often downloading
the files and processing at your site is necessary to extract data for
your specific needs. To extract all ChIP-seq peaks in your region of
interest from the 400+ datasets represented here, you would need to
download the TFBS Peaks (SPP) via the download link, and filter the
resulting narrowPeak BED files for your regions of interest.
Fortunately the Peak files are the most compact of the ENCODE files, so
should be relatively easy to download.
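The filtering step could look roughly like the sketch below. It assumes tab-separated narrowPeak records whose first three columns are chrom, chromStart and chromEnd in BED-style coordinates (0-based start, exclusive end). The function name `filter_peaks` and the example rows are invented for illustration and are not ENCODE data.

```python
def filter_peaks(lines, chrom, region_start, region_end):
    """Yield narrowPeak records that overlap chrom:region_start-region_end.

    Coordinates are BED-style: 0-based start, end-exclusive.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and any header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        # Two half-open intervals overlap when each starts before the other ends.
        if fields[0] == chrom and start < region_end and end > region_start:
            yield fields


# Made-up records in narrowPeak column order, for demonstration only.
example = [
    "chr1\t100\t200\tpeakA\t1000\t.\t50.0\t-1\t3.2\t40",
    "chr1\t400\t500\tpeakB\t800\t.\t30.0\t-1\t2.1\t55",
    "chr2\t150\t300\tpeakC\t900\t.\t45.0\t-1\t2.9\t70",
    "chr1\t350\t450\tpeakD\t700\t.\t25.0\t-1\t1.8\t20",
]
hits = list(filter_peaks(example, "chr1", 150, 400))
print([fields[3] for fields in hits])
```

For a downloaded file, the same function can be given a handle from `gzip.open(path, "rt")`, where `path` is whichever peak file you fetched.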
Note that the datasets represented here are based on data submitted for
the ENCODE January 2011 data freeze, with analysis reported by the
ENCODE Analysis Working Group in coordinated publications in September
2012: http://encodeproject.org/ENCODE/analysis.html.
Cheers,
Kate
---
Kate Rosenbloom
UCSC Genome Bioinformatics
On 4/4/13 8:32 AM, Chen, Brenden wrote:
Brooke Rhead <rhead <at> soe.ucsc.edu> Apr 04 12:18PM -0700
Hi Bob,
Input signal values are multiplied by a normalization factor calculated
as the ratio of the maximum score value (1000) to the signal value at 1
standard deviation from the mean, with values exceeding 1000 capped at
1000. This has the effect of distributing scores up to mean + 1 standard
deviation across the score range, while assigning the maximum score to all values above that.
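A minimal sketch of this scaling, not the actual pipeline code: it assumes that "the signal value at 1 standard deviation from the mean" means mean + 1 standard deviation, and it uses the population standard deviation. The function name `normalize_scores` is hypothetical.

```python
from statistics import mean, pstdev

MAX_SCORE = 1000


def normalize_scores(signals):
    """Scale raw signal values into scores, capping them at MAX_SCORE."""
    # Assumption: the reference signal is mean + 1 (population) std.
    reference = mean(signals) + pstdev(signals)
    if reference <= 0:
        raise ValueError("mean + 1 std must be positive to scale the signals")
    factor = MAX_SCORE / reference
    # Anything at or above the reference signal is assigned the maximum score.
    return [min(MAX_SCORE, s * factor) for s in signals]


scores = normalize_scores([0, 10, 20, 30, 40])
print(scores)
```

Only the largest input (40) lies above mean + 1 std here, so it is the only value capped at 1000.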
We are not aware of a standard threshold for weak interactions.
If you have further questions, please feel free to contact us again at
genome <at> soe.ucsc.edu.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
On 4/2/13 3:58 PM, Robert O'Connor wrote:
Brooke Rhead <rhead <at> soe.ucsc.edu> Apr 04 12:00PM -0700
Hi Navin,
If you are using the Table Browser and the "CDS FASTA" output format,
then the CDS alignments from the UCSC Genes track (table: knownGene) are
in phase. If you are using RefSeq Genes (table: refGene), a few of the
refGene exons might not be in phase after the first exon since the mRNA
may have bases in it that aren't in the reference sequence, but the huge
majority will be in phase.
--
Brooke Rhead
UCSC Genome Bioinformatics Group
On 4/3/13 12:15 PM, Navin Rustagi wrote:
Brian Lee <brianlee <at> soe.ucsc.edu> Apr 04 11:47AM -0700
Dear Mark,
Thank you for using the UCSC Genome Browser and for your question about
obtaining the chromosomal locations of genes from unique identifiers such
as UCSC transcripts and Entrez Gene accession numbers.
You can use our Table Browser "identifiers (names/accessions):"
function to upload a list of genes and select the desired output from
the corresponding track. For example, you can pull the coordinates
from the UCSC genes knownGene table if you upload a file of UCSC gene
names similar to this:
uc007aet.1
uc007aeu.1
uc007aev.1
...
1. Navigate to the Table Browser Tool,
http://genome.ucsc.edu/cgi-bin/hgTables, and make the following
selections:
Clade: Mammal
Genome: Mouse
Assembly: mm9
Group: Genes and Gene Prediction Tracks
Track: UCSC Genes
Table: knownGene
Region: genome
2. Click the "upload list" button to use your UCSC genes identifiers
file in the format described above.
3. Set "output format:" to "selected fields from primary and related
tables" and click "get output".
4. On the "Select Fields from mm9.knownGene" page, select your desired
information. For example you can select "name", "chrom", "txStart",
and "txEnd" and then click "get output" to get information similar to
this:
#name chrom txStart txEnd
uc007aet.1 chr1 3195984 3205713
uc007aeu.1 chr1 3204562 3661579
uc007aev.1 chr1 3638391 3648985
...
If you have Entrez Gene identifiers such as "382301", which
corresponds to UCSC identifier "uc009vfk.1", you can get the same
output using the Table Browser "filter" tool.
1. Repeat step 1 above.
2. Click the "clear list" button to remove the previous identifier
operation. Next to "filter:", click the "create" button.
3. Scroll down to the "Linked Tables" section and put a check mark
next to the "knownToLocusLink" table and click the "allow filtering
using fields in checked tables" button.
4. Under "mm9.knownToLocusLink based filters", find the "values does
match" line. Replace the "*" with a list of your Entrez Gene
identifiers; for example, paste "18019 11735 11496 16709 12916 ..."
The box is small, but it will accept a list of identifiers.
5. Repeat step 3 from above.
6. Repeat step 4 from above. Here you could use the "Linked Tables" to
add the "value" from knownToLocusLink in your output and you will get
output like:
#mm9.knownGene.name mm9.knownGene.chrom mm9.knownGene.txStart mm9.knownGene.txEnd mm9.knownToLocusLink.value
uc008oav.1 chr2 168301909 168415848 18019
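If you have downloaded the tables instead, the same lookup can be done offline. The sketch below assumes you already have knownGene rows reduced to (name, chrom, txStart, txEnd) and knownToLocusLink rows as (name, value) pairs. The function name `coordinates_by_entrez` is hypothetical, and the sample rows are the ones shown in the outputs above.

```python
from collections import defaultdict


def coordinates_by_entrez(gene_rows, link_rows, entrez_ids):
    """Map each Entrez Gene ID to a list of (name, chrom, txStart, txEnd).

    gene_rows: (name, chrom, txStart, txEnd) tuples from knownGene.
    link_rows: (name, value) tuples from knownToLocusLink.
    """
    # One Entrez Gene ID can correspond to several UCSC transcripts.
    names_by_entrez = defaultdict(list)
    for name, value in link_rows:
        names_by_entrez[value].append(name)

    coords = {name: (chrom, int(start), int(end))
              for name, chrom, start, end in gene_rows}

    return {
        entrez_id: [(name, *coords[name])
                    for name in names_by_entrez.get(entrez_id, [])
                    if name in coords]
        for entrez_id in entrez_ids
    }


gene_rows = [
    ("uc007aet.1", "chr1", "3195984", "3205713"),
    ("uc008oav.1", "chr2", "168301909", "168415848"),
]
link_rows = [("uc008oav.1", "18019")]
result = coordinates_by_entrez(gene_rows, link_rows, ["18019", "11735"])
print(result)
```

IDs with no linked transcript in the supplied rows, like "11735" here, map to an empty list rather than raising an error.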
When selecting a table to browse in the Table Browser, by clicking the
"describe table schema" button you can learn more about other tables.
For example, you can also change the Table Browser settings in the
above examples to "Track: RefSeq Genes" and "Table: refGene" if you are
using identifiers like "NM_001195025". Or you can make similar filter
selections using Linked Tables based on RefSeq's refGene table rather
than UCSC knownGene to obtain RefSeq coordinate output. For example,
by linking through "knownToRefSeq" and then "knownToLocusLink", you can
filter on Entrez Gene IDs and select output like:
#mm9.refGene.name mm9.refGene.chrom mm9.refGene.txStart mm9.refGene.txEnd mm9.knownToLocusLink.value
NM_010899 chr2 168301909 168415783 18019
Unfortunately, it looks like we do not have any tracks or tables that
contain MGI identifiers of the type "MGI:1341105".
Thank you again for your inquiry and using the UCSC Genome Browser. If
you have further questions please feel free to contact the mailing
list again at genome <at> soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genome Bioinformatics Group