genome | 21 Jan 18:23 2016

Digest for genome <at> soe.ucsc.edu - 5 updates in 5 topics

Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Jan 20 03:23PM -0800

Hello Electra,
 
Thank you for your question. We encourage the use of HTTP over FTP simply
because FTP is an older system. In terms of performance, the hub.txt and
genome.txt files use a relatively small amount of data and are used less
intensively. Large files such as bigWig, etc should be mounted on HTTP for
better performance.
 
Also note, hub support for CRAM has not yet been released. It will be a few
weeks before this feature is available on our public site.
 
 
*If you have any further questions, please reply to genome <at> soe.ucsc.edu
<genome <at> soe.ucsc.edu>. All messages sent to that address are archived on a
publicly-accessible forum. If your question includes sensitive data, you
may send it instead to genome-www <at> soe.ucsc.edu <genome-www <at> soe.ucsc.edu>.*
Regards,
Luvina
 
--
Luvina Guruvadoo
UCSC Genome Browser
http://genome.ucsc.edu
 
 
 
 
On Tue, Jan 12, 2016 at 6:16 AM, Electra Tapanari. <tapanari <at> ebi.ac.uk>
wrote:
 
 
"Wang, Yuan (Simon)" <Yuan.Wang2 <at> ucsf.edu>: Jan 20 09:33PM

Dear staff,
 
I am trying to upload a bigWig file to UCSC but have not been able to do so through a URL link from Dropbox. I am struggling with this. Could you give me some suggestions? Thanks a lot!
 
Best regards,
Simon
Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Jan 20 03:07PM -0800

Hello,
 
Thank you for your question. We used Pfam version 27.
 
 
Regards,
Luvina
 
--
Luvina Guruvadoo
UCSC Genome Browser
http://genome.ucsc.edu
 
 
 
 
On Tue, Jan 19, 2016 at 1:38 AM, Genetics Savvy <genetics.savvy <at> gmail.com>
wrote:
 
 
Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Jan 20 03:06PM -0800

Hello Summer,
 
Thank you for contacting us. To answer your question, it really depends on
who generated the data we display in the Genome Browser.
 
For example, VCF files (e.g. 1000 Genomes Variants) left-align indels on
the forward strand of the reference genome. This is a convention, but not a
requirement, of VCF. You can read more about it here:
http://sourceforge.net/p/vcftools/mailman/message/34477041/
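
To make the left-alignment convention concrete, here is a minimal Python sketch of shifting a simple indel as far left as possible. It is an illustration only, not the code used by any of the tools mentioned in this thread; the function name, the 1-based `pos` convention, and the toy `genome` string are assumptions made for the example.

```python
def left_align(pos, ref, alt, genome):
    """Shift a simple indel as far left as it can go.

    pos    -- 1-based position of the first base of `ref` (assumed convention)
    ref    -- reference allele, including an anchor base
    alt    -- alternate allele, including an anchor base
    genome -- reference sequence as a plain string (toy input for this sketch)
    """
    ref, alt = ref.upper(), alt.upper()
    genome = genome.upper()
    changed = True
    while changed:
        changed = False
        # Drop a trailing base that both alleles share.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in one reference base on the left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, but keep at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A "CA" deletion inside a CA repeat, written at its right-most position.
toy_genome = "GGGCACACACAGGG"
print(left_align(9, "ACA", "A", toy_genome))
```

In a repeat such as the one above, every placement of the deletion yields the same resulting sequence, which is why a shared convention (left-aligned for VCF, 3'-most for HGVS) is needed before two variant calls can be compared.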
 
dbSNP's mapping processes are complex and they may map to the forward or
reverse strand of the reference genome, although in their VCF they have to
use forward strand coordinates. We don't use their VCF. We suggest you
contact them directly for more information about their policy on left or
right alignment.
 
HGVS notation requires that the indel be right-aligned on the strand of
transcription (3'-most representation). There is one variant annotation
tool, CAVA, that internally right-aligns indels. See
http://www.genomemedicine.com/content/pdf/s13073-015-0195-6.pdf.
 
Here are a few other links that may be of interest:
 
Description of the vt tool's algorithm for left-aligning variants for
consistent representation in VCF
http://genome.sph.umich.edu/wiki/Variant_Normalization
 
"Indel Left/Right Alignment" discussion
https://www.biostars.org/p/66843/
 
ClinVar right-aligns for HGVS, but not dbSNP's VCF
http://www.ncbi.nlm.nih.gov/clinvar/docs/faq/#leftright
 
A discussion of various tools' alignment behavior in the context of HGVS
notation
http://blog.goldenhelix.com/ajesaitis/variant-notation-in-simplicity-we-find-complexity/
 
 
 
Regards,
Luvina
 
--
Luvina Guruvadoo
UCSC Genome Browser
http://genome.ucsc.edu
 
 
 
 
On Tue, Jan 12, 2016 at 2:04 PM, Summer Elasady <
 
Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Jan 20 01:26PM -0800

Hello Hyunho,
 
Thank you for your question. We recommend using BLAT for same species
liftOver and the lastz pipeline for more distant species. Please see the
following wiki page for more information:
http://genomewiki.ucsc.edu/index.php/Same_species_lift_over_construction.
Hopefully, this will help you decide which method to use.
 
 
Regards,
Luvina
 
--
Luvina Guruvadoo
UCSC Genome Browser
http://genome.ucsc.edu
 
 
 
 
 
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to genome+unsubscribe <at> soe.ucsc.edu.
genome | 31 Dec 18:07 2015

Digest for genome <at> soe.ucsc.edu - 1 update in 1 topic

Robert Kuhn <kuhn <at> soe.ucsc.edu>: Dec 30 10:17AM -0800

Hello, Laura,
 
Thanks for your report. We are looking into it. Things are slow here
until January, however, as the University is closed.
 
regards,
 
--b0b kuhn
ucsc genome bioinformatics group
 
 
On Tue, Dec 29, 2015 at 1:56 PM, 'Laura Smith' via UCSC Genome Browser
genome | 18 Dec 18:20 2015

Digest for genome <at> soe.ucsc.edu - 7 updates in 5 topics

"Yu, Hui" <hui.yu <at> Vanderbilt.Edu>: Dec 17 02:08PM

Dear UCSC genome support,
 
As you can see, kgIDs look like uc011mwm.1, uc011mwn.1, uc001adr.2, uc001ahe.3, etc. I'm wondering what those .1, .2, .3 mean. When I tried to map kgIDs to gene symbols, I found that my kgIDs do not always find a match in the current kgXref table. Could I ignore the digit after the period? Do these single numbers indicate the sequence version?
 
Thanks,
 
Hui Yu, PhD
Research Fellow
Center for Quantitative Sciences
2220 Pierce Avenue, 482T PRB
Nashville, TN 37232-6848
Phone: +1 615 875 9689
"Steve Heitner" <steve <at> soe.ucsc.edu>: Dec 17 02:30PM -0800

Hello, Hui.
 
For an ID like uc011mwn.1, the .1 represents the revision number of the transcript. When a new version of UCSC Genes is released, a transcript like uc011mwn.1 could possibly remain the same, it could become uc011mwn.2, it could receive a new transcript ID entirely or it could disappear altogether from the new version of UCSC Genes.
 
On hg38, the current version of UCSC Genes is version 9. The current version of UCSC Genes is always contained in the table knownGene. When version 8 was replaced by version 9, the old version 8 tables were renamed knownGeneOld8 and kgXrefOld8. The differences in transcripts between version 8 and version 9 are tracked with the table kg8ToKg9. This same schema is repeated every time UCSC Genes is updated, so on hg19, you will find several knownGeneOld# tables, several kgXrefOld# tables and several kg#ToKg# tables.
 
You could possibly ignore the revision number and still get a match, but that will only work if a transcript retained the same transcript ID with a new revision number (e.g., uc011mwn.1 to uc011mwn.2). For transcript IDs that changed or disappeared entirely, this will not work. Note the following:
 
mysql> select oldId,newId from kg8ToKg9 limit 5;
+------------+------------+
| oldId      | newId      |
+------------+------------+
| uc001aaa.3 |            |
| uc001aab.3 |            |
| uc010nxq.1 |            |
| uc001aae.4 |            |
| uc009vit.3 | uc031tla.1 |
+------------+------------+
5 rows in set (0.00 sec)
 
Note that the first 4 IDs in the list disappeared entirely from version 8 to version 9 and the last changed from uc009vit.3 to uc031tla.1.
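
As a rough illustration of the points above, the following Python sketch splits a kgID into its accession and revision number and looks an old ID up in a small mapping. The dictionary holds only the five rows shown in the query output above; the function names and the status labels are hypothetical and chosen for this example, not part of any UCSC tool.

```python
def split_kg_id(kg_id):
    """Split an ID such as 'uc011mwn.1' into ('uc011mwn', 1)."""
    accession, _, revision = kg_id.partition(".")
    return accession, int(revision)


# The five oldId -> newId rows from the kg8ToKg9 output above.
# An empty newId means the transcript did not carry over to version 9.
KG8_TO_KG9 = {
    "uc001aaa.3": "",
    "uc001aab.3": "",
    "uc010nxq.1": "",
    "uc001aae.4": "",
    "uc009vit.3": "uc031tla.1",
}


def track_id(old_id, mapping):
    """Return (status, new_id) for an old ID; the labels are illustrative."""
    if old_id not in mapping:
        return ("not listed", None)
    new_id = mapping[old_id]
    if not new_id:
        return ("dropped", None)
    if new_id == old_id:
        return ("unchanged", new_id)
    same_accession = split_kg_id(old_id)[0] == split_kg_id(new_id)[0]
    return ("revised" if same_accession else "renamed", new_id)


print(split_kg_id("uc011mwn.1"))
print(track_id("uc009vit.3", KG8_TO_KG9))
print(track_id("uc001aaa.3", KG8_TO_KG9))
```

Only the "revised" case is one where dropping the revision number would still produce a match; for the "renamed" and "dropped" cases the mapping table has to be consulted.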
 
For IDs you are having problems mapping, you can try querying kgXrefOld# or you can track the ID changes from version to version through the kg#ToKg# tables. You can download the tables in their entirety from http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/ or http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/. You can also query the tables on our public MySQL server: https://genome.ucsc.edu/goldenPath/help/mysql.html
 
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 

 
From: Yu, Hui [mailto:hui.yu <at> Vanderbilt.Edu]
Sent: Thursday, December 17, 2015 6:08 AM
To: genome <at> soe.ucsc.edu
Subject: [genome] UCSC kgID string format
 

 
"Dong, Xianjun" <XDONG <at> RICS.BWH.HARVARD.EDU>: Dec 17 04:17PM

Dear Brian,
 
Thank you for explaining this in such detail. Now I understand that the factorbookMotifPos table has positions for all canonical motifs, and the “Transcription Factor ChIP-seq (161 factors) from ENCODE with Factorbook Motifs” track only shows some of them (the so-called “significant” motifs in your email).

I understand that UCSC selects the “significant” motifs based on:
1. If there are multiple hits for the same factor, pick the one with the highest score in the factorbookMotifPos table. This is the case for SRF.
2. If the factor is a tethered binder, it won’t show in the track, e.g. GATA3. (Just to be on the same page: the current display of GATA3 in the track is a bug which you will fix later.)
3. Only the primary canonical motif in Factorbook Table S1 will be shown. This is the case for v-JUN.

Is it accurate to say so?

I’m also curious whether you calculate rules #2 and #3 on the fly when displaying the track or use some internal MySQL table, as I cannot find such information in the MySQL tables.
 
Thanks,
Xianjun
 
Brian Lee <brianlee <at> soe.ucsc.edu>: Dec 17 11:20AM -0800

Dear Xianjun,
 
Thank you for your message. The engineer for this track will be contacting
you directly soon to help clarify your remaining questions.
 
All the best,
Brian Lee
 
On Thu, Dec 17, 2015 at 8:17 AM, Dong, Xianjun <XDONG <at> rics.bwh.harvard.edu>
wrote:
 
Warren Anderson <warren.anderson <at> jefferson.edu>: Dec 17 11:02AM -0500

General question: why does the UCSC genome browser give two sets of
chromosomal coordinates for the same RefSeq ID, and why do SNPs from one of
the sets fail to show up in dbSNP?

For example, when I search for rat Tnf (NM_012675) in the UCSC genome
browser, I get these results:
 
NM_012675 at chr20:4855829-4858446
<https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr20:4855829-4858446&hgsid=461518423_9vPDeSxeWtQAB0rA1IOIHPjpgJOM&refGene=pack&hgFind.matches=NM_012675,>
NM_012675 at chr20:5189383-5192000
<https://genome.ucsc.edu/cgi-bin/hgTracks?position=chr20:5189383-5192000&hgsid=461518423_9vPDeSxeWtQAB0rA1IOIHPjpgJOM&refGene=pack&hgFind.matches=NM_012675,>
 
 
 
However, when I search for SNPs from one strand in dbSNP, it does not
identify the SNP in the Tnf gene:
 
rs198435874 [Rattus norvegicus]
<http://www.ncbi.nlm.nih.gov/snp/198435874>
<http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=198435874>

TCCCTCCCTCAGCAAACCACCAAGC[A/G]GAGGAGCAGCTGGAGTGGCTGAGCC

Chromosome: 20:4857207
Gene: LOC103694380 (GeneView
<http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?geneId=103694380>)
Functional Consequence: synonymous codon
Validated: by frequency

This was the case for many other genes as well. Can someone please explain?
Ben Shirley <bshirley1234 <at> gmail.com>: Dec 17 12:12AM -0500

UCSC support group,
We have a website where as part of a larger process a user generates a set
of genome browser tracks. They're joined together into a single gff file
and we'd like to allow users to view this gff file on UCSC by clicking a
single link. Using the URL below, this works correctly and the tracks are
displayed.
 
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&org=human&position=chr22&hgt.customText=http://path-to-our-gff-file
 
The issue is that the track names in the gff file are dynamically generated
and describe the contents of the track. These names change for each
execution of our tool. So if a user views the tracks on UCSC after one
execution of the tool and then views the tracks after a second execution
using different input, all of the tracks from both executions are visible,
both in the genome browser itself and when the "manage custom tracks"
button is clicked.
 
Is there a way to create a URL that is self-contained? As in, a URL users
could visit which would not contain other custom tracks except those in the
current file specified by hgt.customText? I looked into sessions as a way
to do this, as it would seem to be the logical way to do it, but I wasn't
able to find a way to create a new (temporary) session for each results set
using a URL.
 
Ben
Christopher Lee <chmalee <at> ucsc.edu>: Dec 17 09:07AM -0800

Dear Pratibha,
 
Thank you for your question about quantifying the elevated H3K27Ac
signal for a specific region.
 
One tool you can use to accomplish this task is the Data Integrator
(http://genome.ucsc.edu/cgi-bin/hgIntegrator).
 
Once on the Data Integrator page, follow these steps to obtain
quantified signal information:
 
1) Under the "Select Genome Assembly and Region" section:
a) Choose your genome and assembly of interest
b) In the "region to annotate" text box paste in the genomic
location of your variant of interest. If you do not know the
specific location of your variant you may paste in a dbSNP
rs number or gene name as well.
 
2) Under the "Configure Data Sources" section:
a) From the "track group" drop down menu select "Regulation"
b) From "track" select "ENCODE Regulation - Layered H3K27Ac.."
c) From "subtrack" select the specific signal you are interested
in quantifying. For a list of what each subtrack is you can
review this page:
 
http://genome.ucsc.edu/cgi-bin/hgTrackUi?&db=hg19&g=wgEncodeRegMarkH3k27ac
d) Click the "Add" button
 
3) Under the "Output Options" section:
a) Choose whether you would like to download the output to a file
or view it as plain text in your browser.
b) Click "Get output"
 
You should now have a file containing the chromosome, chromosome
start, chromosome end, and a list of values that represent
the data points in the H3K27Ac signal.
 
Thank you again for your inquiry and for using the UCSC Genome Browser.
 
genome | 17 Dec 18:06 2015

Digest for genome <at> soe.ucsc.edu - 3 updates in 3 topics

Brian Lee <brianlee <at> soe.ucsc.edu>: Dec 16 03:40PM -0800

Hi Xianjun,
 
Thank you for your follow-up question.
 
You are correct that the factorbookMotifPos table is actually for positions
of all motifs, and not just the significant motifs viewed in
wgEncodeRegTfbsClusteredV3.
 
When this version of the clustered TFBS track was built, there was a desire
to simultaneously release an accompanying Factorbook track. You can find it
on our genome-preview server (without any Track Description details), and
it may help clarify things:
http://genome-preview.cse.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.factorBook.preview
 
There are indeed some behind-the-scenes calculations taking place so that
each cluster with a significant motif shows only a single green highlight
bar, with the highest value selected. There were other additional steps
involved. For example, some motifs like GATA3 were meant to be removed
(these factors are annotated solely as tethered binders, where a factor
binds to another that in turn binds to DNA, and can be found listed in the
Genome Research paper in Table S3). Decisions of this kind were made over
email with the Factorbook group, including the feedback on the mapping of
terms used to build associated tables like factorbookMotifCanonical and
factorbookGeneAlias that translate our existing Target terms like JUN,
JUND, and JUNB to AP1. In the factorbookMotifCanonical table you will find
entries like the following:
mysql> select * from factorbookMotifCanonical where target like "%JUN%"\G
*************************** 1. row ***************************
target: JUN
 motif: AP1
*************************** 2. row ***************************
target: JUNB
 motif: AP1
*************************** 3. row ***************************
target: JUND
 motif: AP1
 
These were used to map items to the parent table,
wgEncodeRegTfbsClusteredV3, controlling the display of blocks in this
cluster track.
mysql> select distinct name from wgEncodeRegTfbsClusteredV3 where name like "%JUN%"\G
*************************** 1. row ***************************
name: JUND
*************************** 2. row ***************************
name: JUN
*************************** 3. row ***************************
name: JUNB
 
In brief, the v-JUN item you point to was not included in this mapping
because, in the supplementary table information from Factorbook, AP1 was
listed first under "canonical motif", and any secondary canonical motif,
like v-JUN, was not used.
 
Some examples from Table S1:

ENCODE filename in the UCSC database          HGNC ID   common name   canonical motif
wgEncodeSydhTfbsHuvecCjunStdAlnRep0           JUN       CJUN          AP-1;v-JUN
wgEncodeUchicagoTfbsK562EjunbControlAlnRep0   JUNB      JUNB          AP-1;v-JUN
wgEncodeSydhTfbsHepg2JundIggrabAlnRep0        JUND      JUND          AP-1;v-JUN
 
So, for all JUND, JUN, and JUNB clusters in wgEncodeRegTfbsClusteredV3, the
corresponding motif AP1 was used. An example can be seen at
chr1:116221702-116221981 for JunD, where a selection also takes place so
that only one AP1 motif is displayed:
http://genome-preview.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.factorBook.preview2
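
The "first listed canonical motif wins" rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this message, not the code used to build the track; the function name and the small dictionary (copied from the Table S1 examples) are assumptions for the example.

```python
def primary_motif(canonical_field):
    """Return the first motif in a Table S1 'canonical motif' field.

    The field is assumed to be a ';'-separated list such as 'AP-1;v-JUN'.
    """
    return canonical_field.split(";")[0].strip()


# HGNC ID -> canonical motif field, taken from the Table S1 examples above.
TABLE_S1_EXAMPLES = {
    "JUN": "AP-1;v-JUN",
    "JUNB": "AP-1;v-JUN",
    "JUND": "AP-1;v-JUN",
}

for target, field in TABLE_S1_EXAMPLES.items():
    # Secondary motifs such as v-JUN are ignored by this rule.
    print(target, primary_motif(field))
```

Note that Table S1 writes the motif as AP-1 while the factorbookMotifCanonical table stores it as AP1, so any comparison between the two would need to account for that spelling difference.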
 
If you click into the JUND clusters, the factorbookGeneAlias table is what
allows the details link on that page to be built correctly to Factorbook
as JunD.
mysql> select * from factorbookGeneAlias where name like "%JUN%"\G
*************************** 1. row ***************************
 name: JUN
value: c-Jun
*************************** 2. row ***************************
 name: JUNB
value: JunB
*************************** 3. row ***************************
 name: JUND
value: JunD
 
Your suggestion to add additional notes to the description page has been
recorded.
 
Thank you again for your inquiry and for using the UCSC Genome Browser.
 
All the best,
 
Brian Lee
UCSC Genomics Institute
 
On Mon, Dec 14, 2015 at 1:34 PM, Dong, Xianjun <XDONG <at> rics.bwh.harvard.edu>
wrote:
 
Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Dec 16 03:32PM -0800

Hello Gary,
 
Thank you for your question. The refGene and refFlat tables are both in
genePred format. However, refGene contains additional info. Please see the
following links for a description of each table:
 
refGene:
http://genome.ucsc.edu/cgi-bin/hgTables?hgta_doSchemaDb=hg38&hgta_doSchemaTable=refGene
 
refFlat:
http://genome.ucsc.edu/cgi-bin/hgTables?hgta_doSchemaDb=hg38&hgta_doSchemaTable=refFlat
 
Note, you can view the description for any table by clicking the "describe
table schema" button on the Table Browser
<http://genome.ucsc.edu/cgi-bin/hgTables>.
 
 
 
Regards,
Luvina
--
Luvina Guruvadoo
UCSC Genome Browser
http://genome.ucsc.edu
 
 
 
 
On Tue, Dec 15, 2015 at 1:01 PM, Yung-Chih Lai <yungchihlai <at> gmail.com>
wrote:
 
Christopher Lee <chmalee <at> ucsc.edu>: Dec 16 09:37AM -0800

Dear Andy,
 
Thank you for your question about the missing gene descriptions
from the table browser output and drop-down search. All the
discrepancies you have noted arise because different tables are
updated at different times.
 
The reason the kgXref table is missing certain gene descriptions
is because it was built along with the knownGene table, which
for mm9 has not been updated for a few years. This causes
kgXref to become out of sync with the refGene and refLink tables,
both of which are updated almost weekly. Thus in your mysql query
when you try to use kgXref.description there will be some gene
descriptions missing.
 
Descriptions appear in the drop-down for some items and not for
others because the drop-down description comes from the
knownCanonical table, which does not contain the items you have
mentioned. However, when you click on a refGene item in the
RefSeq track and see the description, that description comes from
the refLink table. refLink and refGene are updated frequently,
as mentioned above, and so contain the genes you are
looking for.
I hope this is helpful.
 
genome | 14 Dec 18:13 2015

Digest for genome <at> soe.ucsc.edu - 3 updates in 3 topics

Muhammad Sohail Raza <muhammadsohailraza <at> live.com>: Dec 14 02:50AM

Hi, I have some questions about the liftOver tool.
I want to lift over from the hg38 assembly to hg19. I have some genome-wide annotations in BED format on the human hg38 assembly. When I looked at the sample liftOver command, it was:
liftOver oldFile map.chain newFile unMapped

What are the additional files (i.e. map.chain) that I need in order to run liftOver, and where can I get them? Also, why do we get unmapped regions when we use the liftOver tool?

Many thanks!
Sohail

Muhammad Sohail Raza
Center of Genome Variation and Biomedicine
Beijing Institute of Genomics, CAS
Beijing, China
Phone: +8613552957083
email: sohail <at> big.ac.cn muhammadsohailraza <at> live.com
常磊 <reniorchang <at> 163.com>: Dec 13 03:29PM +0800

Hi,
I want to use the txCdsPredict score to determine whether my assembled transcripts are protein-coding or non-coding, but I could not find txCdsPredict at UCSC. Can you tell me how to find it and use it?
Thank you sincerely.
LEI CHANG
"Vasileios Panagiotis Lenis [vpl]" <vpl <at> aber.ac.uk>: Dec 12 10:28AM

Hello UCSC genome browser people,
 
I am trying to investigate the role of the highly conserved elements in whole genome alignments.
I found the magnificent data set of the 99-species multiple alignment with human as the reference on your browser and I would like to use it.
Currently I am working with alignments that use cattle (Bos taurus) as the reference, so I am trying to change the reference in your alignments.
I found mafOrder in Kent's pipeline to put the cattle genome first, but the problem is that each maf file, being based on a human chromosome, contains alignments to more than one cattle chromosome. (For example, in chr10.maf I have alignments of bosTau8.chr8, bosTau8.chr3, etc.)
Is there any way to change the reference using all the cattle chromosomes and then somehow concatenate the alignments for each chromosome?
What I am trying to say is:
Let's say that I end up with alignments for bosTau8.chr8 in 5 different human chromosomes. Can I concatenate them to get a chr8.maf for cattle?

I know that when you change the reference in a reference-based alignment you are in practice losing alignments, but I don't mind that at this phase.
 
Thank you very much,
Vasilis.
genome | 11 Dec 18:07 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

liuy <liuyanhu005 <at> 163.com>: Dec 11 04:51PM +0800

Dear all,
I used liftOver to convert conservation data from canFam2 to canFam3,
but I find that some sites appear more than once.
For example, one copy of a site comes from chr01 and another comes from chrX.
Is something wrong?
Thanks,
Yanhu Liu
Bogdan Tanasa <tanasa <at> gmail.com>: Dec 10 03:54PM -0800

Dear all,
 
On a simple note, what is the difference between the hg38.p2 and p3 patches
of hg38? Are the genomic coordinates still the same? And what is the current
patch version for https://genome.ucsc.edu/cgi-bin/hgGateway?db=hg38 ?
 
thanks,
 
bogdan
genome | 3 Dec 18:16 2015

Digest for genome <at> soe.ucsc.edu - 3 updates in 3 topics

"Steve Heitner" <steve <at> soe.ucsc.edu>: Dec 02 01:03PM -0800

Hello, Qian.
 
Thank you for pointing this out to us. You are correct that this is an error. The alignments were indeed created using GRCh38, but the README file incorrectly lists the accession ID for GRCh37.p1. We will correct this.
 
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 

 
From: qianzhengzong [mailto:qianzhengzong <at> picb.ac.cn]
Sent: Saturday, November 28, 2015 5:38 AM
To: genome <at> soe.ucsc.edu
Subject: [genome] GRCh38 GenBank Assembly Accession problem
 

 
 
 
Hi,
 
As I'm using the Human (hg38)/Chimp pairwise alignment file under the directory (http://hgdownload.soe.ucsc.edu/goldenPath/hg38/vsPanTro4/), I find that the GenBank Assembly Accession information for GRCh38 is not consistent with that shown in the NCBI revision history (http://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.27#/def_asm_Primary_Assembly). This confused me quite a lot. I'm wondering whether you have made a mistake or I am misinterpreting something.
 

 

 
UCSC Genome browser:
 

 
- target/reference: Human
 
(hg38, Dec. 2013 (GRCh38/hg38),
GRCh38 Genome Reference Consortium Human Reference 38 (GCA_000001405.2))

NCBI:
 
<http://www.ncbi.nlm.nih.gov/assembly/6328/> GCA_000001405.2 | n/a | n/a | <http://www.ncbi.nlm.nih.gov/assembly/GCA_000001405.2/> GRCh37.p1 | Chromosome | Replaced GenBank
 

 

 
Best wishes!
 

 

 
Qian
 
 
 
 
 
--
Hiram Clawson <hiram <at> soe.ucsc.edu>: Dec 02 11:13AM -0800

Good Morning Aiguo:
 
When I enter the example URL you provided:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=...your.../hub.txt
 
I don't find any error in the hub. The data displays and the hub functions.
Is this the URL you are using to provide access to your hub data ?
 
You are missing the descriptionUrl E10747_827.html file from your hub.txt reference.
 
I do note that some of your wiggle data has 'nan' for a value instead of
an actual value. Plus, you appear to have too many significant digits
in your values. They are not illegal, but they seem meaningless, I doubt
any experiment has that much resolution.
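
As an illustration of the two points above, here is a small Python sketch that drops 'nan' rows and rounds values in four-column, whitespace-separated data before it is converted to bigWig. The function name, the default of three decimal places, and the sample rows are assumptions for this example; choose a precision that matches your experiment.

```python
import math


def clean_bedgraph_lines(lines, digits=3):
    """Drop rows whose value is NaN and round the rest to `digits` decimals.

    Expects 'chrom start end value' rows; track/browser/comment lines and
    rows with an unexpected column count are passed through unchanged.
    """
    cleaned = []
    for line in lines:
        stripped = line.strip()
        fields = stripped.split()
        is_header = stripped.startswith(("track", "browser", "#"))
        if is_header or len(fields) != 4:
            cleaned.append(stripped)
            continue
        chrom, start, end, value = fields
        number = float(value)  # float() accepts 'nan' as well as numbers
        if math.isnan(number):
            continue
        cleaned.append(f"{chrom}\t{start}\t{end}\t{number:.{digits}f}")
    return cleaned


# Illustrative rows only.
sample = [
    "chr1 0 914078 -0.39102792739868164",
    "chr1 914078 3628156 nan",
    "chr1 3628156 3647104 0.05502462387084961",
]
for row in clean_bedgraph_lines(sample):
    print(row)
```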
 
--Hiram
 
On 12/2/15 8:19 AM, Li, Aiguo (NIH/NCI) [E] wrote:
Brian Lee <brianlee <at> soe.ucsc.edu>: Dec 02 10:51AM -0800

Dear Tao,
 
Thank you for using the UCSC Genome Browser and the information about the
usage of unplaced and unlocalized on our assembly description pages.
 
We have created a documentation work ticket to make these changes as you
have noted that there is a discrepancy between our usage and the
definitions at NCBI:
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/info/definitions.shtml
 
Thank you again for your message and for helping improve the UCSC Genome
Browser.
 
All the best,
 
Brian Lee
UCSC Genomics Institute
 
On Thu, Nov 19, 2015 at 8:32 PM, Tao Wang <wangtao.shandong <at> gmail.com>
wrote:
 
genome | 2 Dec 18:21 2015

Digest for genome <at> soe.ucsc.edu - 6 updates in 5 topics

"Li, Aiguo (NIH/NCI) [E]" <liai <at> mail.nih.gov>: Dec 02 04:19PM

Dear Sir/Madam,
 
I am writing to ask for assistance in making a track hub. I created the hub.txt, genomes.txt and trackDb.txt files, checked them with hubCheck, and found no errors. I was able to load the http link on the UCSC site without error. However, when I add http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl= in front of my http URL, I get the error message:

ERROR: No Content-Length: returned in header for http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://helix.nih.gov/~NOB2/827hub/hub.txt, can't proceed,
 
 
I have checked that my bigWig file is accessible as a custom track, and it works fine. I checked the bigWig file format and noticed that it does not have a header (see below), but I am not sure whether this could be the issue.

chr1 0 914078 -0.39102792739868164
chr1 914078 3628156 -0.028905002400279045
chr1 3628156 3647104 -0.43936997652053833
chr1 3647104 5295691 0.05502462387084961
chr1 5295691 5634053 0.05502462387084961
 
Any suggestions will be appreciated!
 
Aiguo (Anna) Li
Sr. Bioinformatician
Building 37, rm 1142
NOB/NCI/NIH
301-435-1454 (o)
JONGHUN LEE <christiario <at> gmail.com>: Dec 02 01:55PM +0900

Hi. I am a Ph.D. student at University of Tokyo, Japan.
I am re-analyzing the WGBS data of h1_hESC and gm12878 cells that you
provided in 2012.
Could I know the non-conversion rate of the two samples?
Here are the samples that I am working on.
 
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40832
 
Thank you.
 
 
--
 
JONGHUN LEE
81-80-4735-7565
Department of Medical Genome Science
University of Tokyo
Luvina Guruvadoo <luvina <at> soe.ucsc.edu>: Dec 01 12:26PM -0800

Hello Gary,
 
As my colleague mentioned in his previous email, we only display the data
that Ensembl provides to us. We suggest you contact Ensembl directly for
more information: http://www.ensembl.org/info/about/contact/index.html
 
If you have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
- - -
Luvina Guruvadoo
UCSC Genome Bioinformatics Group
 
 
On Sat, Nov 21, 2015 at 7:58 PM, Yung-Chih Lai <yungchihlai <at> gmail.com>
wrote:
 
Yung-Chih Lai <yungchihlai <at> gmail.com>: Dec 01 05:51PM -0800

Hi Luvina,
 
Thanks for your information.
 
Best,
 
Gary
 
On Tue, Dec 1, 2015 at 12:26 PM, Luvina Guruvadoo <luvina <at> soe.ucsc.edu>
wrote:
 
Robert Kuhn <kuhn <at> soe.ucsc.edu>: Dec 01 10:03AM -0800

Hello, David,
 
Thanks for your question about sharing customized links. The short
version of how to specify which tracks are on/off is to add to your URL
a text string in this format: "&<tablename>=<visibility>". For example,
to turn off the Spliced ESTs track, which is on by default, you would add
to the URL: "&intronEst=hide".
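 
As a rough illustration of the "&<tablename>=<visibility>" pattern described above, here is a minimal Python sketch that assembles such a link. The function name build_browser_link is hypothetical, and the hgTracks base URL and the db=hg19 parameter are assumptions borrowed from other messages in this digest; only the intronEst=hide pair comes from this reply.

```python
from urllib.parse import urlencode

# Assumed base URL of the Genome Browser tracks display.
BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def build_browser_link(db, track_visibility):
    """Return an hgTracks URL with one <tablename>=<visibility> pair per track.

    db               -- assembly name, e.g. "hg19" (an assumption for this example)
    track_visibility -- dict mapping table names to visibility settings
    """
    params = {"db": db}
    params.update(track_visibility)
    # urlencode joins the pairs with "&" and escapes any special characters.
    return BASE_URL + "?" + urlencode(params)


# Turn off the Spliced ESTs track, as in the example above.
link = build_browser_link("hg19", {"intronEst": "hide"})
print(link)
```

Further tracks can be switched in the same link by adding more table names to the dictionary.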
 
Here is a link to a short video describing a number of ways to determine
the tablename for a track:
 
http://genome.ucsc.edu/training/vids/index.html#vid06
 
Here is a link to our online documentation for making links:
 
https://genome.ucsc.edu/FAQ/FAQlink.html
 
And here is a recent blog post that discusses sharing Browser sessions
and constructing links.
 
http://genome.ucsc.edu/blog/how-to-share-your-ucsc-screenthoughts/
 
Thanks for being a Browser user. I hope this helps, but do not hesitate
to contact us via the mailing list for more details if my reply does not
get you exactly where you wish to be.
 
regards,
 
--b0b kuhn
ucsc genome bioinformatics group
 
 
 
 
On Tue, Dec 1, 2015 at 2:24 AM, David Leader <David.Leader <at> glasgow.ac.uk>
wrote:
 
Brian White <bwhite <at> genome.wustl.edu>: Dec 01 10:17AM -0600

Hello,
 
I would like a list of pfam domains with their mappings to mm9.
 
Such functionality seems to be provided for hg19 in the Pfam in UCSC
Gene track, as described here:
http://redmine.soe.ucsc.edu/forum/index.php?t=msg&goto=10689&S=e142e3f6ebbe77412ce70628d6509d9a
 
And I see that I can get a list of Pfam domains in a gene in mouse, as
described here:
http://redmine.soe.ucsc.edu/forum/index.php?t=msg&goto=11282&S=5f4f3998247f8a15ca208f990c3dbe09
 
But the latter seems to return the coordinates of the gene, not the
coordinates of the domain within the gene.
 
Is there a solution for mouse?
 
Thank you,
Brian
 
--
Brian White, PhD
Assistant Professor
Department of Medicine, Oncology Division / McDonnell Genome Institute
Washington University School of Medicine
 
 
genome | 12 Nov 18:22 2015

Digest for genome <at> soe.ucsc.edu - 5 updates in 5 topics

"Rohit Kolora" <rohit <at> bioinf.uni-leipzig.de>: Nov 12 02:14PM +0100

Dear UCSC developers,
 
I have a few questions regarding the UCSC tools for chain building and
netting from axt alignments.
 
1) blocks after duplicate removal
"12 blocks after duplicate removal" - During axtChain option, there is a
process of removing the duplicates. What kind of duplicates do you refer
to? Are they query-wise duplicates, i.e. hits that are covered by the same
query at multiple places, or target-related duplicates, i.e. regions of the
target that are covered by multiple queries?
How will a region be extended if it is covered by both strands, one
strand by a gap and the other strand by an exact hit? Are they both
reported as different chains?
 
 
2) noSplit option in netToAxt
"Don't split chain when there is an insertion of another chain" - Does
this mean that a particular target sequence when being chained with one
query is split at a point since the target-region has a hit with another
query?
If so, is there an option to net this region but still keep information
regarding these overlaps?
 
 
--
Regards,
Rohit
Chris Cheshire <chris.j.cheshire <at> gmail.com>: Nov 12 09:45AM

Hi there,
 
I am in the process of creating some custom tracks and I would like to
represent score on a coloured element. Currently, I have one greyscale
track for strength and another identical track to indicate colour. Ideally
I would like to combine the two and have the score fade the alpha value of
the colour up and down. Is this possible? I have searched the documentation
but I cannot see how to do this...
 
Thanks,
 
Chris Cheshire (UCL)
"Arumilli, Meharji" <meharji.arumilli <at> helsinki.fi>: Nov 12 11:36AM +0200

Dear all,
 
We are trying to convert Ensembl gene IDs to the corresponding gene names
for canFam3.1.
 
The UCSC table for the "ensGene" track has the gene IDs and the
transcript IDs but not the gene names.
 
ENSCAFG00000000001.3
ENSCAFG00000030108.1
 
And "ensemblToGeneName" table has the transcript ID with gene name.
Could you help us match the gene IDs retrieved from the UCSC browser
with the gene names?
 
Br
Mehar
aboucheha <at> ibisc.univ-evry.fr: Nov 11 11:02AM +0100

Hello,
 
I have a question about the transposons downloaded from the rmsk table for
the human, mouse and Drosophila melanogaster species: are they
experimentally validated, or are they predicted?
 
Thank you in advance.
 
Regards
Anouar boucheham
Haitian Liu <hliu <at> scripps.edu>: Nov 11 10:40AM -0800

Hi,
 
I would like to use your database to search for promoter sequences of genes in Pichia pastoris. When I use BLAT and enter the organism, I can't find Pichia pastoris. Which genome should I use for the search? If no Pichia pastoris genome is available, how can I perform the search? Any suggestions? Many thanks!
 
Haitian
genome | 7 Nov 18:10 2015

Digest for genome <at> soe.ucsc.edu - 5 updates in 2 topics

Muhammad Sohail Raza <muhammadsohailraza <at> live.com>: Nov 06 09:39AM

Hi,
I would like to download the CpG islands of the human genome in order to compare them with my variant data. At http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/
we came across two CpG island files for download, i.e. cpgIslandExt.txt.gz and cpgIslandExtUnmasked.txt.gz.
Can you please tell me what the difference between the two files is and which one would be useful? Many thanks!
 
--
Muhammad Sohail Raza
CAS-TWAS PhD Fellow
Center of Genome Variation and Biomedicine
Beijing Institute of Genomics, CAS
Beijing, China.
Phone: +8613552957083
email: sohail <at> big.ac.cn muhammadsohailraza <at> live.com
Cath Tyner <cath <at> ucsc.edu>: Nov 06 05:14PM -0800

Dear Sohail,
 
Thank you for using the UCSC Genome Browser and for submitting your
question regarding the differences between the following CpG island files:
 
*cpgIslandExt* is the repeat-masked version of these data; repetitive
elements are excluded. In the UCSC Genome Browser, only the masked version
is displayed (by default).
 
*cpgIslandExtUnmasked* is the unmasked version of these data; potential CpG
islands existing in repeat regions are displayed, and would otherwise not
be visible in the repeat-masked version.
 
You can read more at the CpG Islands Track Description page:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=cpgIslandSuper
 
For further information, please see the related publication which is listed
in the Track Description References section:
http://www.sciencedirect.com/science/article/pii/0022283687906899
 
Thank you again for your inquiry and for using the UCSC Genome Browser. If
you have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
Enjoy,
Cath
. . .
Cath Tyner
UC Santa Cruz Genomics Institute
 
 
On Fri, Nov 6, 2015 at 1:39 AM, Muhammad Sohail Raza <
Bogdan Tanasa <tanasa <at> gmail.com>: Nov 06 04:19PM -0800

Dear Steve,
 
 
Thanks again for your help regarding the genePredToGtf utility. For hg19,
may I ask what "collection name" I should use in order to access
the GENCODE collection of hg19? (Using "basic" does not work.)
 
 
genePredToGtf hg19 ??? GenCODE.gtf
 
 
much thanks !
 
 
-- bogdan
Bogdan Tanasa <tanasa <at> gmail.com>: Nov 06 04:21PM -0800

Just noticed that it works with: genePredToGtf hg19 wgEncodeGencodeBasicV19
GenCode_genes.gtf
 
Thanks and happy weekend !
 
Brian Lee <brianlee <at> soe.ucsc.edu>: Nov 06 04:37PM -0800

Dear Bogdan,
 
I am sending this message for those who might see this question in our
archives and wish to see more related information at the
previously-answered question:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/Oj41ZcVXyOc/nt0qTJ8C5_gJ
.
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further
questions. All messages sent to that address are archived on a
publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
All the best,
Brian Lee
UCSC Genomics Institute
 
genome | 28 Oct 18:08 2015

Digest for genome <at> soe.ucsc.edu - 3 updates in 3 topics

Santagostino Marco <marco.santagostino <at> unipv.it>: Oct 28 12:12PM +0100

Dear Sir/Madam,
 
What is the difference between the Basic and Comprehensive Gencode Genes
tracks? According to the track information page, the basic track is a
subset of the comprehensive one; what is missing?
 
Regards,
 
Marco Santagostino
 
 
 
--
Marco Santagostino, PhD
Laboratorio di Biologia Molecolare e Cellulare
Dipartimento di Biologia e Biotecnologie, University of Pavia
Via Ferrata, 9 - 27100 Pavia, Italy
Post Box: Via Ferrata, 1 - 27100 Pavia, Italy
Tel.: +39 0382 985540
Fax: +39 0382 528496
e-mail: marco.santagostino <at> unipv.it
Brian Lee <brianlee <at> soe.ucsc.edu>: Oct 27 03:40PM -0700

Dear Ian,
 
Thank you for using the UCSC Genome Browser and your question about
including X. laevis.
 
One of our engineers took a look and discovered that this assembly is not
available at GenBank and we currently focus only on assemblies that are
available through GenBank. One of our engineers strongly suggests you
prompt the assemblers of this genome to get it submitted to the NCBI
assembly system: http://www.ncbi.nlm.nih.gov/assembly/
 
Once the assembly is available there it will have an increased chance of
being used by NCBI, Ensembl, and UCSC. Also the chance will increase that
other individuals in the wider bioinformatics community may create an
assembly hub through automated processes once it is available through
GenBank.
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genomics Institute
 
Matthew Speir <mspeir <at> soe.ucsc.edu>: Oct 27 10:20AM -0700

Hi Wenze,
 
Thank you for your question about getting the percent identity between
two species. No, this percent identity information cannot be obtained
directly from the net file. If you are using assemblies that are found
in the UCSC Genome Browser, you may be able to use some of our resources
to find this information.
 
First, you may be able to use our "axtNet" files and the tool axtToPsl.
You can download the command line tool axtToPsl here:
http://hgdownload.soe.ucsc.edu/admin/exe/. The axtNet files can be found
in the "Pairwise Alignment" directories for each assembly. For example,
the axtNet files for the pairwise alignment between human (hg19) and
mouse (mm10) are found under the link "Human/Mouse (mm10)" on the
downloads page: http://hgdownload.soe.ucsc.edu/downloads.html. In
these directories, the axtNet files will either be in a directory
labeled "axtNet" or as a single file named something like
hg38.panTro4.net.axt.gz.
 
After downloading the axtNet files for the pairwise alignment of your
choice, you can use axtToPsl to convert the axt files into PSL files.
The PSL files will contain two columns, matches and misMatches, that you
can use to calculate percent ID with the formula:
 
matches / (matches+misMatches)
 
You can read more about the PSL format here:
http://genome.ucsc.edu/FAQ/FAQformat.html#format2.
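 
The formula above could be applied to a PSL file along the lines of the following Python sketch. It is only an illustration under stated assumptions: the function name percent_identity is hypothetical, matches and misMatches are read from the first two tab-separated columns of each record, and the counts are pooled over all records so that longer alignments weigh more. The records in the usage example are made up.

```python
def percent_identity(psl_lines):
    """Return matches / (matches + misMatches) pooled over all PSL records.

    psl_lines -- any iterable of text lines, e.g. an open PSL file.
    Returns None when no alignment records are found.
    """
    matches = 0
    mismatches = 0
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional PSL header and blank lines: data rows have
        # 21 columns and start with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches += int(fields[0])      # column 1: matches
        mismatches += int(fields[1])   # column 2: misMatches
    total = matches + mismatches
    return matches / total if total else None


def fake_record(n_match, n_mismatch):
    # Made-up 21-column row; only the first two columns matter here.
    return "\t".join([str(n_match), str(n_mismatch)] + ["0"] * 19)


example = ["psLayout version 3", fake_record(90, 10), fake_record(45, 5)]
print(percent_identity(example))  # 135 / 150
```

To run this on real data, pass an open file instead of the example list, e.g. percent_identity(open("out.psl")), where out.psl is a placeholder name for the axtToPsl output.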
 
You may also be able to use our "featureBits" utility and tables on our
public MySQL server to get a general sense of the alignment coverage
between two assemblies. This coverage will be a measure of how many
bases are included in an aligned block, not the percent ID, which is a
measure of where the two sequences have identical bases. To get this
measurement of coverage, see the answer to this previously answered
mailing list question:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/LllcgG0uVFg/bAKu5OOEDAAJ
 
I hope this is helpful. If you have any further questions, please reply
to genome <at> soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
Matthew Speir
UCSC Genome Bioinformatics Group
 
 