genome | 26 May 19:16 2015

Digest for genome <at> soe.ucsc.edu - 5 updates in 5 topics

David da Silva Pires <pires <at> iq.usp.br>: May 25 01:44PM

Hello.
 
Maybe some more information could be helpful:
 
* The commands "sed" and "sort" correspond to the versions that are
distributed with Kubuntu Linux 15.04.
* The commands "twoBitInfo" and "bedToBigBed" correspond to the versions
that are distributed with GBiB (Ubuntu 14.04.1 LTS), updated with the
command ~browser/updateBrowser.
 
If you need something more, just tell me.
 
Greetings.
 
--
David da Silva Pires
 
 
On Fri, May 22, 2015 at 4:35 PM David da Silva Pires <pires <at> iq.usp.br>
wrote:
 
Rabail Zehra <rabailzehra <at> gmail.com>: May 24 11:13PM -0700

Hi,
 
I am interested in finding out about pre-computed MKAR values for the human
genome (hg19) at UCSC. Can you please guide me with that?
Natalia Rodchenko <rodchenk <at> usc.edu>: May 24 05:21PM -0700

Hello,
 
I need to download the set of Common SNPs from dbSNP build 137.
 
I have found this link:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/
 
but it only contains the file:
 
*snp138Common.txt.gz*
for the build 138.
 
Please let me know how I can get the same file for build 137.
Thank you, Natalia
 
 
--
Natalia Rodchenko, PhD student
Computational biology and bioinformatics,
University of Southern California,
Office: RRI316C
<sofmaia <at> gmail.com>: May 24 02:07PM

I used Alamut version 2.6 to get the PhyloP score for some variants, and I know PhyloP scores in the Alamut report are obtained from UCSC Genome Browser. Now I need to know how many species and which ones are taken into consideration.
 
 
I thank you in advance.
 
 
 
 
Best regards,
 
 
Sofia Maia
 
 
Andrew Nkurunungi <andrenkun <at> yahoo.com>: May 23 02:07PM

Hello,

I am a bit new to bioinformatics, but I really need help downloading this file: snp132Common.txt. I need it to begin my project work. Every time I try to download it, it says that it could not be found on the server.

Thanks a lot.

Andrew

Andrew Nkurunungi
Graduate Research Assistant
Bioinformatics/Computer Science
Alabama Agricultural and Mechanical University
(256) 479 2957
ankurunu <at> bulldogs.aamu.edu
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to genome+unsubscribe <at> soe.ucsc.edu.

genome | 22 May 19:12 2015

Digest for genome <at> soe.ucsc.edu - 5 updates in 4 topics

"Nuket, Bilgen" <bnuket <at> rvc.ac.uk>: May 21 08:42PM

Hi,
I have two problems.

1- In my data set I used the bosTau7 build as my reference genome. Now I need to lift over variant files to UMD3.1.
To achieve that, I planned to lift over from bosTau7 to bosTau4 and then from bosTau4 to UMD3.1.
Are you planning to make a chain file for bosTau7 to UMD3.1?

2- How many bosTau7 genomes are out there? I downloaded one from http://support.illumina.com/sequencing/sequencing_software/igenome.html and I also use IGV, which has a bosTau7 genome available. While trying to lift over, I realised that those files are not the same. Shouldn't they be the same?

I am very confused.
 
Thank you
 
 
Bad input: the chain file you are using is not compatible with the reference you are trying to lift over to; please use the appropriate chain file for the given reference
 
"Steve Heitner" <steve <at> soe.ucsc.edu>: May 22 09:55AM -0700

Hello, Bilgen.
 
The liftOver files you refer to already exist:
 
bosTau7 to bosTau4/6: http://hgdownload.cse.ucsc.edu/goldenPath/bosTau7/liftOver/
bosTau4 to bosTau6: http://hgdownload.cse.ucsc.edu/goldenPath/bosTau4/liftOver/
 
The UCSC bosTau7 assembly is based on the Baylor Btau_4.6.1 assembly. There is only one Baylor Btau_4.6.1 assembly, but as with any assembly, several institutions have made it available on their sites and each institution has their own method of processing the raw data. The Illumina web site you referred to has download files from Ensembl, NCBI and UCSC. If you are intending to use the UCSC liftOver tool, I would recommend using the UCSC data wherever possible, but your query and target data should certainly be from the same institution. One of the main differences between the UCSC data and the Ensembl/NCBI data is that UCSC chromosome names include “chr” (e.g., “chr1” versus just “1” for chromosome 1). The UCSC liftOver utility requires chromosome names to include the “chr”, so it is best to just use the UCSC data files.
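
If switching to the UCSC files is not practical, the naming difference described above can be bridged by renaming chromosomes before running liftOver. The following is only a minimal sketch: it assumes a tab-delimited BED file, assumes that the mitochondrion is called "MT" in the Ensembl/NCBI-style file and "chrM" at UCSC, and does nothing about unplaced scaffolds, whose names may differ in ways that a simple prefix cannot fix. The function name is made up for illustration.

```python
def add_chr_prefix(lines):
    """Yield BED lines with UCSC-style chromosome names (e.g. "1" -> "chr1")."""
    for line in lines:
        # Pass blank lines, comments, and track/browser header lines through unchanged.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        # Only the first tab-separated field (the chromosome name) is modified.
        chrom, sep, rest = line.partition("\t")
        if not chrom.startswith("chr"):
            # Assumption: "MT" is the mitochondrial name in the non-UCSC file.
            chrom = "chrM" if chrom == "MT" else "chr" + chrom
        yield chrom + sep + rest


if __name__ == "__main__":
    example = ["1\t100\t200\tfeatureA\n", "chrX\t5\t50\tfeatureB\n"]
    for fixed_line in add_chr_prefix(example):
        print(fixed_line, end="")
```

Lines that already carry the "chr" prefix are left alone, so the function can be applied to a file of mixed origin.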
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 

 
From: Nuket, Bilgen [mailto:bnuket <at> rvc.ac.uk]
Sent: Thursday, May 21, 2015 1:42 PM
To: genome <at> soe.ucsc.edu
Subject: [genome] chain file request for bosTau7To UMD3.1
 

 
Yuan Jian <jayuan2008 <at> yahoo.com>: May 22 06:39AM

Hello,

I have downloaded the standalone version of blat. I can blat a 20 bp sequence in the UCSC Genome Browser on the web, but with the standalone version I get no results. How can I query a 20 bp sequence with the standalone version?

Thanks,
Yu
Baochun Zhang <Baochun_Zhang <at> dfci.harvard.edu>: May 21 02:23PM -0400

Hi,
 
I am searching for the distribution of the various isoforms of the mouse Rapgef4 gene in different tissues/cell types. So far I can see such data available for B cells and T cells (linked through Immgen.org). Are there also data for liver and brain tissues? It seems the transcript in B and T cells lacks exons 1-4, so I wonder whether the full-length transcript is expressed in other tissues/cell types, such as liver and brain, which are known to express high levels of total Rapgef4.
 
Thanks in advance for your help,
 
Baochun Zhang, MD, PhD
Assistant Professor, Harvard Medical School
Division of Hematologic Neoplasia
Department of Medical Oncology
Department of Cancer Immunology and AIDS
Dana-Farber Cancer Institute
450 Brookline Ave, Mayer 521B
Boston, MA 02215
 
 
 
 
 
"Steve Heitner" <steve <at> soe.ucsc.edu>: May 21 11:09AM -0700

Hello, Vinay.
 
I assume you are following the instructions at http://genomewiki.ucsc.edu/index.php/Same_species_lift_over_construction. I have a few questions for you regarding this:
 
1. Do you have the additional kent source programs required by these scripts? (twoBitInfo, twoBitToFa, partitionSequence.pl, gensub2, blat, calc, axtChain)
 
2. Do you also have the companion script BlatJob.csh?
 
3. How did you obtain your query.2bit and target.2bit files?
 
4. Are there any error messages that might provide us additional help in diagnosing the problem?
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further questions. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 

 
From: Vinay R S [mailto:vinay.rs <at> leucinerichbio.com]
Sent: Thursday, May 21, 2015 3:10 AM
To: genome <at> soe.ucsc.edu
Subject: [genome] Plasmodium falciparum liftover
 

 
genome | 21 May 19:07 2015

Digest for genome <at> soe.ucsc.edu - 7 updates in 5 topics

Vinay R S <vinay.rs <at> leucinerichbio.com>: May 21 03:40PM +0530

Hi all,
I tried running the available liftOver scripts (SameSpeciesBlatSetup.sh and
SameSpeciesChainNet.sh) for Plasmodium falciparum 2008 to 2015, but
I am not getting the .psl output or the chain outputs.
The chromosome sequence sizes and the job list are printed, but after that I am not
getting any output.
Could someone help me with this?
Thanks in advance.
 
Regards
Vinay R S
Jonathan Casper <jcasper <at> soe.ucsc.edu>: May 20 02:53PM -0700

Hello Amit,
 
Thank you for your question about support for HGVS nomenclature. While we
do have plans to add support for HGVS names to our tools, there is no
timetable for it right now. Some suggestions for working around this
limitation are described in this mailing list question:
https://groups.google.com/a/soe.ucsc.edu/d/topic/genome/oMVq6Z9p_0k/discussion.
Please note that the Variant Annotation Integrator (
http://genome.ucsc.edu/cgi-bin/hgVai) does now support the entry of rs# IDs
from dbSNP - under the "Select Variants" drop down menu, select the
"Variant Identifiers" option.
 
I hope this is helpful. If you have any further questions, please reply to
genome <at> soe.ucsc.edu or genome-mirror <at> soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genome-www <at> soe.ucsc.edu.
 
--
Jonathan Casper
UCSC Genome Bioinformatics Group
 
David da Silva Pires <pires <at> iq.usp.br>: May 20 05:47PM

Hi.
 
I have built a bigBed track called "SMPs v5.2" at the following assembly
hub:
 
http://www.vision.ime.usp.br/~davidsp/hub/geneNetwork2/hub.txt
 
Since this track was obtained from a BED 12 file, every feature has the
information about the number of blocks, block starts and block sizes.
But, when I click on a feature to access its specific information page,
there is no way to download just the coding sequence (CDS). The option that
is displayed is relative to the entire window, including introns and
intergenic regions.
 
What am I supposed to do in order to download just the CDS?
 
Thanks.
 
--
David da Silva Pires
Jonathan Casper <jcasper <at> soe.ucsc.edu>: May 20 01:04PM -0700

Hello David,
 
Thank you for your question about obtaining CDS sequence for items in your
BED 12 track. We would like to improve the DNA retrieval options for the
description pages of individual features, but there is no timetable for it
right now. In the interim, the easiest way to get the sequence filtering
options you describe is to use the Table Browser. First load your hub on
our site, and then follow these steps:
 
1. Open the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables (or
click "Table Browser" from the top "Tools" menu on our site).
2. Select your track hub and track from the drop-down menus, set the region
to "genome", then click the "Identifiers: paste list" button.
3. On the new page, add the name of the BED 12 item that you want sequence
from to the text box. Click "submit".
4. Select the output format "sequence" and click "get output".
 
On the resulting page, you should be able to choose which portions of your
feature to retrieve sequence for (CDS, exons, UTR, etc.).
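
For readers who prefer to work from the BED 12 file directly, the CDS portion of a feature can also be derived from the file itself: in BED 12, thickStart and thickEnd (columns 7 and 8) mark the coding region, and blockSizes and blockStarts (columns 11 and 12) describe the exons relative to chromStart. The sketch below is an illustration under those format rules, not a description of what the Table Browser does internally. The function name and the example feature ("Smp_000001" on "Chr_1") are made up.

```python
def bed12_cds_intervals(line):
    """Return the CDS pieces of one BED12 line as (chrom, start, end) tuples."""
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    thick_start, thick_end = int(fields[6]), int(fields[7])
    # The comma-separated lists may carry a trailing comma, so strip it first.
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]

    pieces = []
    for size, relative_start in zip(block_sizes, block_starts):
        # Block starts are relative to chromStart.
        start = chrom_start + relative_start
        end = start + size
        # Keep only the part of the block inside [thickStart, thickEnd).
        cds_start, cds_end = max(start, thick_start), min(end, thick_end)
        if cds_start < cds_end:
            pieces.append((chrom, cds_start, cds_end))
    return pieces


if __name__ == "__main__":
    # Hypothetical feature with three blocks; the coding region trims the outer two.
    example = "Chr_1\t1000\t2000\tSmp_000001\t0\t+\t1100\t1900\t0\t3\t200,300,200,\t0,400,800,"
    for piece in bed12_cds_intervals(example):
        print(piece)
```

The resulting intervals could then be written out as a plain BED file and passed to a tool such as twoBitToFa to fetch the sequence, keeping in mind that features on the minus strand need to be reverse-complemented after the pieces are joined.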
 
I hope this is helpful. If you have any further questions, please reply to
genome <at> soe.ucsc.edu or genome-mirror <at> soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genome-www <at> soe.ucsc.edu.
 
--
Jonathan Casper
UCSC Genome Bioinformatics Group
 
On Wed, May 20, 2015 at 10:47 AM, David da Silva Pires <pires <at> iq.usp.br>
wrote:
 
David da Silva Pires <pires <at> iq.usp.br>: May 20 05:55PM

Hello.
 
One of my colleagues noted that all the genes for which the search is
successful have names starting with "Chr_" (the longest ones). The searches
that fail are all from scaffolds or mitochondrial DNA (the shortest ones).

Is this an undocumented feature? Should the genome and, subsequently, all
the tracks relative to this genome, have their chromosome names starting
with "Chr_" as a prerequisite in order for the search tool to work?

Thanks.
 
 
On Tue, May 19, 2015 at 10:05 PM David da Silva Pires <pires <at> iq.usp.br>
wrote:
 
Jonathan Casper <jcasper <at> soe.ucsc.edu>: May 20 12:38PM -0700

Hello David,
 
Thank you for your question about a problem with the search index for your
bigBed. We are able to see the issue with your bigBed file, but have been
unable to reproduce it with anything we create ourselves. Are you able to
send us the data files and the program binaries (e.g., bedToBigBed) that
you used to construct your smps.bb file? You can send them to me privately
to avoid sharing with the mailing list if you prefer.
 
One of our engineers notes that search names are not required to start with
"Chr_"; we suspect that another bug is responsible.
 
If you have any further questions, please reply to genome <at> soe.ucsc.edu or
genome-mirror <at> soe.ucsc.edu. Questions sent to those addresses will be
archived in publicly-accessible forums for the benefit of other users. If
your question contains sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
--
Jonathan Casper
UCSC Genome Bioinformatics Group
 
On Wed, May 20, 2015 at 10:55 AM, David da Silva Pires <pires <at> iq.usp.br>
wrote:
 
Brian Lee <brianlee <at> soe.ucsc.edu>: May 20 12:27PM -0700

Dear Jonghun Lee,
 
Thank you for using the UCSC Genome Browser and your question about Whole
Genome Bisulfite Sequencing (WGBS) data from ENCODE.
 
This data was not released to the public UCSC Genome Browser site, due to
the complexity of perfecting alignment methods, but is available at GEO and
also at the current ENCODE Portal:
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1002650
https://www.encodeproject.org/experiments/ENCSR000AJI/
 
On the ENCODE portal you will find a "General protocol" Track Description
document which states these data were produced by the Dr. Richard Myers Lab
at the HudsonAlpha Institute for Biotechnology and lists Dr. Florencia
Pauli, fpauli at hudsonalpha.org, as the contact for this data.
 
Please send any data questions about the track to the source laboratory.
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
genome | 9 May 19:23 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

Jonathan Casper <jcasper <at> soe.ucsc.edu>: May 08 04:36PM -0700

Hello Ivan,
 
Thank you for checking up on this. The mid-April update to build 142 is
something that we would like to incorporate, but at this point build 144 is
likely to be released by the time that we would have the corrected version
of 142 ready for display. The plan for now is that our next update will be
for the release of SNP 144 data. We are keeping a close eye on the progress
of 144 and may reconsider if it is significantly delayed.
 
If you have any further questions, please reply to genome <at> soe.ucsc.edu or
genome-mirror <at> soe.ucsc.edu. Questions sent to those addresses will be
archived in publicly-accessible forums for the benefit of other users. If
your question contains sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
--
Jonathan Casper
UCSC Genome Bioinformatics Group
 
On Mon, May 4, 2015 at 6:43 PM, Ivan Adzhubey <
Matthew Speir <mspeir <at> soe.ucsc.edu>: May 08 11:43AM -0700

Hi Mehar,
 
Thank you for your question about converting your coordinates between
hg19 and canFam3 using liftOver. If you modify your original bed file to
include the hg19 coordinates in the name column, then these will be
carried over to the output file from liftOver. This means that if your
input file contained a line like:
 
chr21 33031596 33041570 SOD1.chr21.33031596.33041570
 
Then your output region would contain "SOD1.chr21.33031596.33041570" in
the name column, thus preserving the hg19 coordinates in the canFam3
output file. You can use a command like
 
awk '{print $1,$2,$3,$4 "." $1 "." $2 "." $3}' yourOriginal.bed > newFileWithPositionNames.bed
 
to modify your original file to include the position in the name column.
You can then use this "newFileWithPositionNames.bed" file as your input
for liftOver. If you already have periods in the names of your regions,
you can replace the periods with anything but spaces.
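
The same idea can be sketched in Python, including the reverse step of splitting the original coordinates back out of the name column once liftOver has run. This is only an illustration: the function names are made up, the canFam3 coordinates in the example are invented rather than real liftOver output, and the parsing assumes that chromosome names contain no periods.

```python
def tag_name_with_position(line):
    """Append the original chrom, start and end to the name column of a BED4 line."""
    chrom, start, end, name = line.split()[:4]
    return "\t".join([chrom, start, end, f"{name}.{chrom}.{start}.{end}"])


def split_lifted_line(line):
    """Recover both coordinate sets from a lifted line whose name was tagged above."""
    chrom, start, end, tagged = line.split()[:4]
    # Split from the right so that periods inside the original name are preserved.
    name, old_chrom, old_start, old_end = tagged.rsplit(".", 3)
    return {
        "name": name,
        "old": (old_chrom, int(old_start), int(old_end)),
        "new": (chrom, int(start), int(end)),
    }


if __name__ == "__main__":
    print(tag_name_with_position("chr21 33031596 33041570 SOD1"))
    # Invented canFam3 coordinates, for illustration only.
    print(split_lifted_line("chr31\t26540000\t26549000\tSOD1.chr21.33031596.33041570"))
```

Splitting from the right means region names that already contain periods do not need to be rewritten, as long as the chromosome names themselves are period-free.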
 
I hope this is helpful. If you have any further questions, please reply
to genome <at> soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
Matthew Speir
UCSC Genome Bioinformatics Group
 
 
On 5/7/15 2:58 PM, mehar wrote:
genome | 8 May 19:10 2015

Digest for genome <at> soe.ucsc.edu - 4 updates in 3 topics

Matt Jones <jonesmr9 <at> gmail.com>: May 07 12:38PM -0600

Hello,
 
I am mapping exome data to a divergent reference genome and I was wondering
if you could provide liftOver files for Chinese hamster (criGri1) to mouse
(mm9) and squirrel (speTri2) to mouse (mm9). Thank you!
 
Best,
Matt
 
--
Matt Jones
Ph.D Student
University of Montana
Department of Organismal Biology and Ecology
matthew2.jones <at> umontana.edu
matthewrjones.com
Matthew Speir <mspeir <at> soe.ucsc.edu>: May 08 10:03AM -0700

Hi Matt,
 
Thank you for your question about these liftOver files. Unfortunately,
we don't have plans to create these liftOver files. However, you can
still carry out these conversions using our existing liftOver files in a
stepwise fashion. To go from speTri2 to mm9, use the following steps:
1. Lift coordinates from speTri2 to mm10 using this file:
http://hgdownload.soe.ucsc.edu/goldenPath/mm10/liftOver/mm10ToSpeTri2.over.chain.gz
2. Then lift from mm10 to mm9 using this file:
http://hgdownload.soe.ucsc.edu/goldenPath/mm10/liftOver/mm10ToMm9.over.chain.gz
 
To convert your coordinates from criGri1 to mm9, use the following steps:
1. Lift coordinates between criGri1 and hg19 using this file:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToCriGri1.over.chain.gz
2. Then lift from hg19 to mm9 using this file:
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToMm9.over.chain.gz
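
A stepwise conversion like the ones above amounts to running liftOver once per chain file, feeding the output of each step into the next. The sketch below only builds the command lines, using the argument order "liftOver oldFile map.chain newFile unMapped". The function names, the intermediate file names (step1.bed and so on) and the chain file names in the example are placeholders, to be replaced with the files for each step.

```python
import subprocess


def chained_liftover_commands(input_bed, chains, final_bed, liftover="liftOver"):
    """Build one liftOver command per chain file, linking each output to the next input."""
    commands = []
    current = input_bed
    for step, chain in enumerate(chains, start=1):
        is_last = step == len(chains)
        # Intermediate results go to placeholder files; the last step writes final_bed.
        output = final_bed if is_last else f"step{step}.bed"
        unmapped = f"step{step}.unmapped.bed"
        commands.append([liftover, current, chain, output, unmapped])
        current = output
    return commands


def run_all(commands):
    """Run the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder chain file names; nothing is executed here, the commands are only printed.
    for command in chained_liftover_commands(
        "input.bed", ["first.over.chain.gz", "second.over.chain.gz"], "output.bed"
    ):
        print(" ".join(command))
```

Each step has its own unmapped file, so regions lost at the first step can be told apart from regions lost at the second.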
 
Alternatively, if you want to go directly from speTri2 or criGri1 to
mm9, you can create the liftOver files yourself. To do so, please refer
to the following GenomeWiki page:
http://genomewiki.cse.ucsc.edu/index.php/Whole_genome_alignment_howto.
 
Lastly, one of our engineers provided the following input on mapping
your exome sequences between species:
"I would recommend they take the exome sequences they have and blat
them against the target of interest."
 
I hope this is helpful. If you have any further questions, please reply
to genome <at> soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
Matthew Speir
UCSC Genome Bioinformatics Group
 
 
On 5/7/15 11:38 AM, Matt Jones wrote:
Joy Lee <joylee555350 <at> gmail.com>: May 07 04:59PM -0700

72 1 1
76 1 1
44 1 1
67 1 1
46 3 1
97 1 1
341 1 1
687 1 1
913 1 1
130 1 1
222 1 1
400 0 1
1661 1 1
810 1 1
1447 1 1
1672 1 1
158 1 1
516 1 1
162 1 1
857 1 1
420 1 1
53 1 1
143 1 1
701 1 1
2 1 1
982 1 1
144 1 1
1361 1 1
367 1 1
185 1 1
138 1 1
395 1 1
24 1 1
605 1 1
239 1 1
10 2 1
5 1 1
33 1 1
96 1 1
198 1 1
50
mehar <meharji.arumilli <at> helsinki.fi>: May 08 12:58AM +0300

Dear all,
 
I have downloaded the liftover executable and hg19ToCanFam3 chain file
to lift hg19 coordinates to canFam3 coordinates in the below shown way:
 
./liftOver hg19.bed hg19ToCanFam3.over.chain.gz canFam3.bed failed.bed
 
The command ran and produced a canFam3.bed file with canFam3
coordinates. However, I would like to have the coordinates of both hg19
and canFam3
in the output files in order to know the corresponding coordinates in both
genome assemblies. Could someone tell me whether there is a way to get this? Thanks.
 
Br
Mehar
genome | 24 Apr 19:23 2015

Digest for genome <at> soe.ucsc.edu - 8 updates in 7 topics

Jasmine Brown <jasminebro2 <at> gmail.com>: Apr 24 10:11AM -0500

Hello. I am currently trying to determine the best queries to use from my
blat output.psl file.
 
I used pslFilter and it worked on two of my files; however, on the third
file it returned this error:
 
[jbrown@master Blat]$ pslFilter AotusBlat.psl AotusFilter.psl
Filtering AotusBlat.psl to AotusFilter.psl
sizeOne tStarts 4 bs 9
pslFilter: psl.c:77: pslLoad: Assertion `sizeOne == ret->blockCount' failed.
 
 
I'm not sure whether there is something wrong with my AotusBlat.psl file or what
the failed blockCount assertion means. When I looked at the AotusFilter.psl file,
it only showed 3 queries, so I'm assuming pslFilter did not finish running.
Help would be greatly appreciated.
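
One way to look for a damaged line in a case like this is to check each alignment row of the PSL file for internal consistency. The assertion compares the length of a comma-separated list with the blockCount column, and the message above appears to say that the tStarts list had 4 entries where blockCount was 9, though that reading is a guess. The sketch below is a stand-alone check, not part of the kent tools. It assumes a plain 21-column PSL file (not pslx), and the function name is made up.

```python
def inconsistent_psl_lines(lines):
    """Report PSL rows whose column count or block lists disagree with blockCount."""
    problems = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Alignment rows start with the numeric "matches" column; skip header lines.
        if not fields[0].isdigit():
            continue
        if len(fields) != 21:
            problems.append((number, "columns", len(fields), 21))
            continue
        block_count = int(fields[17])
        for label, column in (("blockSizes", 18), ("qStarts", 19), ("tStarts", 20)):
            # The lists end with a trailing comma, so ignore empty entries.
            found = len([item for item in fields[column].split(",") if item])
            if found != block_count:
                problems.append((number, label, found, block_count))
    return problems


if __name__ == "__main__":
    with open("AotusBlat.psl") as handle:
        for problem in inconsistent_psl_lines(handle):
            print("line %d: %s has %d, expected %d" % problem)
```

A row that is cut off partway through, for example because a blat job was interrupted or two outputs were concatenated mid-line, would show up here either as a wrong column count or as a short list.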
 
Thanks,
 
Jasmine N. Brown, MS
PhD Student
Batzer Laboratory of Comparative Genomics
https://biosci-batzerlab.biology.lsu.edu/
 
Department of Biological Sciences
A653 Life Sciences Building
Louisiana State University
Baton Rouge, LA, 70803
James Kozubek <jkozubek <at> broadinstitute.org>: Apr 24 10:54AM -0400

Hi,
 
This is probably a common problem. I am blatting a virus against HG38 and
if I use a small section of my sequence I get an interesting hit, but if I
blat the entire virus I no longer get the hit. Am I doing something wrong?
 
 
 
If I submit a short sequence from my HCV virus...
 
GGCGCACAGGTAGAGGAAGAC
 
I get this hit...
 
BLAT Search Results
 
ACTIONS          QUERY    SCORE  START    END  QSIZE  IDENTITY  CHRO  STRAND      START        END  SPAN
--------------------------------------------------------------------------------------------------------
browser details  YourSeq     21      1     21     21    100.0%     2       +   24872687   24872707    21
 
 
 
...But if I submit my entire HCV virus I no longer get that hit
 
 
ACTIONS          QUERY    SCORE  START    END  QSIZE  IDENTITY  CHRO  STRAND      START        END  SPAN
--------------------------------------------------------------------------------------------------------
browser details  YourSeq     25   9545   9578   9678     85.2%     1       -   99120181   99120212    32
browser details  YourSeq     21   7654   7674   9678    100.0%     4       -   51867001   51867021    21
browser details  YourSeq     21    979    999   9678    100.0%    16       -   17970801   17970821    21
browser details  YourSeq     21   1800   1826   9678     88.9%    12       +  117124846  117124872    27
browser details  YourSeq     20   3261   3298   9678     76.4%     7       -    3875585    3875622    38
 
 
ACCCGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGAGGAACTACTGTCTTCACGCAGAAAGCGTCTA
GCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCCTCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTG
AGTACACCGGAATTGCCGGGAAGACTGGGTCCTTTCTTGGATAAACCCACTCTATGCCCGGCCATTTGGGCGTGCCCCCG
CAAGACTGCTAGCCGAGTAGCGTTGGGTTGCGAAAGGCCTTGTGGTACTGCCTGATAGGGTGCTTGCGAGTGCCCCGGGA
GGTCTCGTAGACCGTGCACCATGAGCACAAATCCTAAACCTCAAAGAAAAACCAAAAGAAACACCAACCGTCGCCCACAA
GACGTTAAGTTTCCGGGCGGCGGCCAGATCGTTGGCGGAGTATACTTGTTGCCGCGCAGGGGCCCCAGGTTGGGTGTGCG
CGCGACAAGGAAGACTTCGGAGCGGTCCCAGCCACGTGGAAGGCGCCAGCCCATCCCTAAAGATCGGCGCTCCACTGGCA
AATCCTGGGGAAAACCAGGATACCCCTGGCCCCTATACGGGAATGAGGGACTCGGCTGGGCAGGATGGCTCCTGTCCCCC
CGAGGTTCCCGTCCCTCTTGGGGCCCCAATGACCCCCGGCATAGGTCGCGCAACGTGGGTAAGGTCATCGATACCCTAAC
GTGCGGCTTTGCCGACCTCATGGGGTACATCCCTGTCGTGGGCGCCCCGCTCGGCGGCGTCGCCAGAGCTCTCGCGCATG
GCGTGAGAGTCCTGGAGGACGGGGTTAATTTTGCAACAGGGAACTTACCCGGTTGCTCCTTTTCTATCTTCTTGCTGGCC
CTGCTGTCCTGCATCACCACCCCGGTCTCCGCTGCCGAAGTGAAGAACATCAGTACCGGCTACATGGTGACTAACGACTG
CACCAATGACAGCATTACCTGGCAGCTCCAGGCTGCTGTCCTCCACGTCCCCGGGTGCGTCCCGTGCGAGAAAGTGGGGA
ATGCATCTCAGTGCTGGATACCGGTCTCACCGAATGTGGCCGTGCAGCGGCCCGGCGCCCTCACGCAGGGCTTGCGGACG
CACATCGACATGGTTGTGATGTCCGCCACGCTCTGCTCTGCCCTCTACGTGGGGGACCTCTGCGGTGGGGTGATGCTCGC
AGCCCAAATGTTCATTGTCTCGCCGCAGCACCACTGGTTTGTCCAAGACTGCAATTGCTCCATCTACCCTGGTACCATCA
CTGGACACCGCATGGCATGGGACATGATGATGAACTGGTCGCCCACGGCTACCATGATCTTGGCGTACGCGATGCGTGTC
CCCGAGGTCATTATAGACATCATTAGCGGGGCTCATTGGGGCGTCATGTTCGGCTTGGCCTACTTCTCTATGCAGGGAGC
GTGGGCGAAAGTCGTTGTCATCCTTCTGTTGGCCGCCGGGGTGGACGCGCGCACCCATACTGTTGGGGGTTCTGCCGCGC
AGACCACCGGGCGCCTCACCAGCTTATTTGACATGGGCCCCAGGCAGAAAATCCAGCTCGTTAACACCAATGGCAGCTGG
CACATCAACCGCACCGCCCTGAACTGCAATGACTCCTTGCACACCGGCTTTATCGCGTCTCTGTTCTACACCCACAGCTT
CAACTCGTCAGGATGTCCCGAACGCATGTCCGCCTGCCGCAGTATCGAGGCCTTCCGGGTGGGATGGGGCGCCTTGCAAT
ATGAGGATAATGTCACCAATCCAGAGGATATGAGACCCTATTGCTGGCACTACCCACCAAGGCAGTGTGGCGTGGTCTCC
GCGAAGACTGTGTGTGGCCCAGTGTACTGTTTCACCCCCAGCCCAGTGGTAGTGGGCACGACCGACAGGCTTGGAGCGCC
CACTTACACGTGGGGGGAGAATGAGACAGATGTCTTCCTATTGAACAGCACTCGACCACCGCTGGGGTCATGGTTCGGCT
GCACGTGGATGAACTCTTCTGGCTACACCAAGACTTGCGGCGCACCACCCTGCCGTACTAGAGCTGACTTCAACGCCAGC
ACGGACCTGTTGTGCCCCACGGACTGTTTTAGGAAGCATCCTGATACCACTTACCTCAAATGCGGCTCTGGGCCCTGGCT
CACGCCAAGGTGCCTGATCGACTACCCCTACAGGCTCTGGCATTACCCCTGCACAGTTAACTATACCATCTTCAAAATAA
GGATGTATGTGGGAGGGGTTGAGCACAGGCTCACGGCTGCATGCAATTTCACTCGTGGGGATCGTTGCAACTTGGAGGAC
AGAGACAGAAGTCAACTGTCTCCTTTGTTGCACTCCACCACGGAATGGGCCATTTTACCTTGCTCTTACTCGGACCTGCC
CGCCTTGTCGACTGGTCTTCTCCACCTCCACCAAAACATCGTGGACGTACAATTCATGTATGGCCTATCACCTGCCCTCA
CAAAATACATCGTCCGATGGGAGTGGGTAATACTCTTATTCCTGCTCTTAGCGGACGCCAGGGTTTGCGCCTGCTTATGG
ATGCTCATCTTGTTGGGCCAGGCCGAAGCAGCACTAGAGAAGCTGGTCATCTTGCACGCTGCGAGCGCAGCTAGCTGCAA
TGGCTTCCTATATTTTGTCATCTTTTTCGTGGCTGCTTGGTACATCAAGGGTCGGGTAGTCCCCTTAGCTACCTATTCCC
TCACTGGCCTGTGGTCCTTTAGCCTACTGCTCCTAGCATTGCCCCAACAGGCTTATGCTTATGACGCATCTGTGCATGGC
CAGATAGGAGCGGCTCTGCTGGTAATGATCACTCTCTTTACTCTCACCCCCGGGTATAAGACCCTTCTCAGCCGGTTTTT
GTGGTGGTTGTGCTATCTCCTGACCCTGGGGGAAGCCATGATTCAGGAGTGGGTACCACCCATGCAGGTGCGCGGCGGCC
GCGATGGCATCGCGTGGGCCGTCACTATATTCTGCCCGGGTGTGGTGTTTGACATTACCAAATGGCTTTTGGCGTTGCTT
GGGCCTGCTTACCTCTTAAGGGCCGCTTTGACACATGTGCCGTACTTCGTCAGAGCTCACGCTCTGATAAGGGTATGCGC
TTTGGTGAAGCAGCTCGCGGGGGGTAGGTATGTTCAGGTGGCGCTATTGGCCCTTGGCAGGTGGACTGGCACCTACATCT
ATGACCACCTCACACCTATGTCGGACTGGGCCGCTAGCGGCCTGCGCGACTTAGCGGTCGCCGTGGAACCCATCATCTTC
AGTCCGATGGAGAAGAAGGTCATCGTCTGGGGAGCGGAGACGGCTGCATGTGGGGACATTCTACATGGACTTCCCGTGTC
CGCCCGACTCGGCCAGGAGATCCTCCTCGGCCCAGCTGATGGCTACACCTCCAAGGGGTGGAAGCTCCTTGCTCCCATCA
CTGCTTATGCCCAGCAAACACGAGGCCTCCTGGGCGCCATAGTGGTGAGTATGACGGGGCGTGACAGGACAGAACAGGCC
GGGGAAGTCCAAATCCTGTCCACAGTCTCTCAGTCCTTCCTCGGAACAACCATCTCGGGGGTTTTGTGGACTGTTTACCA
CGGAGCTGGCAACAAGACTCTAGCCGGCTTACGGGGTCCGGTCACGCAGATGTACTCGAGTGCTGAGGGGGACTTGGTAG
GCTGGCCCAGCCCCCCTGGGACCAAGTCTTTGGAGCCGTGCAAGTGTGGAGCCGTCGACCTATATCTGGTCACGCGGAAC
GCTGATGTCATCCCGGCTCGGAGACGCGGGGACAAGCGGGGAGCATTGCTCTCCCCGAGACCCATTTCGACCTTGAAGGG
GTCCTCGGGGGGGCCGGTGCTCTGCCCTAGGGGCCACGTCGTTGGGCTCTTCCGAGCAGCTGTGTGCTCTCGGGGCGTGG
CCAAATCCATCGATTTCATCCCCGTTGAGACACTCGACGTTGTTACAAGGTCTCCCACTTTCAGTGACAACAGCACGCCA
CCGGCTGTGCCCCAGACCTATCAGGTCGGGTACTTGCATGCTCCAACTGGCAGTGGAAAGAGCACCAAGGTCCCTGTCGC
GTATGCCGCCCAGGGGTACAAAGTACTAGTGCTTAACCCCTCGGTAGCTGCCACCCTGGGGTTTGGGGCGTACCTATCCA
AGGCACATGGCATCAATCCCAACATTAGGACTGGAGTCAGGACCGTGATGACCGGGGAGGCCATCACGTACTCCACATAT
GGCAAATTTCTCGCCGATGGGGGCTGCGCTAGCGGCGCCTATGACATCATCATATGCGATGAATGCCACGCTGTGGATGC
TACCTCCATTCTCGGCATCGGAACGGTCCTTGATCAAGCAGAGACAGCCGGGGTCAGACTAACTGTGCTGGCTACGGCCA
CACCCCCCGGGTCAGTGACAACCCCCCATCCCGATATAGAAGAGGTAGGCCTCGGGCGGGAGGGTGAGATCCCCTTCTAT
GGGAGGGCGATTCCCCTATCCTGCATCAAGGGAGGGAGACACCTGATTTTCTGCCACTCAAAGAAAAAGTGTGACGAGCT
CGCGGCGGCCCTTCGGGGCATGGGCTTGAATGCCGTGGCATACTATAGAGGGTTGGACGTCTCCATAATACCAGCTCAGG
GAGATGTGGTGGTCGTCGCCACCGACGCCCTCATGACGGGGTACACTGGAGACTTTGACTCCGTGATCGACTGCAATGTA
GCGGTCACCCAAGCTGTCGACTTCAGCCTGGACCCCACCTTCACTATAACCACACAGACTGTCCCACAAGACGCTGTCTC
ACGCAGTCAGCGCCGCGGGCGCACAGGTAGAGGAAGACAGGGCACTTATAGGTATGTTTCCACTGGTGAACGAGCCTCAG
GAATGTTTGACAGTGTAGTGCTTTGTGAGTGCTACGACGCAGGGGCTGCGTGGTACGATCTCACACCAGCGGAGACCACC
GTCAGGCTTAGAGCGTATTTCAACACGCCCGGCCTACCCGTGTGTCAAGACCATCTTGAATTTTGGGAGGCAGTTTTCAC
CGGCCTCACACACATAGACGCCCACTTCCTCTCCCAAACAAAGCAAGCGGGGGAGAACTTCGCGTACCTAGTAGCCTACC
AAGCTACGGTGTGCGCCAGAGCCAAGGCCCCTCCCCCGTCCTGGGACGCCATGTGGAAGTGCCTGGCCCGACTCAAGCCT
ACGCTTGCGGGCCCCACACCTCTCCTGTACCGTTTGGGCCCTATTACCAATGAGGTCACCCTCACACACCCTGGGACGAA
GTACATCGCCACATGCATGCAAGCTGACCTTGAGGTCATGACCAGCACGTGGGTCCTAGCTGGAGGAGTCCTGGCAGCCG
TCGCCGCATATTGCCTGGCGACTGGATGCGTTTCCATCATCGGCCGCTTGCACGTCAACCAGCGAGTCGTCGTTGCGCCG
GATAAGGAGGTCCTGTATGAGGCTTTTGATGAGATGGAGGAATGCGCCTCTAGGGCGGCTCTCATCGAAGAGGGGCAGCG
GATAGCCGAGATGTTGAAGTCCAAGATCCAAGGCTTGCTGCAGCAGGCCTCTAAGCAGGCCCAGGACATACAACCCGCTA
TGCAGGCTTCATGGCCCAAAGTGGAACAATTTTGGGCCAGACACATGTGGAACTTCATTAGCGGCATCCAATACCTCGCA
GGATTGTCAACACTGCCAGGGAACCCCGCGGTGGCTTCCATGATGGCATTCAGTGCCGCCCTCACCAGTCCGTTGTCGAC
CAGTACCACCATCCTTCTCAACATCATGGGAGGCTGGTTAGCGTCCCAGATCGCACCACCCGCGGGGGCCACCGGCTTTG
TCGTCAGTGGCCTGGTGGGGGCTGCCGTGGGCAGCATAGGCCTGGGTAAGGTGCTGGTGGACATCCTGGCAGGATATGGT
GCGGGCATTTCGGGGGCCCTCGTCGCATTCAAGATCATGTCTGGCGAGAAGCCCTCTATGGAAGATGTCATCAATCTACT
GCCTGGGATCCTGTCTCCGGGAGCCCTGGTGGTGGGGGTCATCTGCGCGGCCATTCTGCGCCGCCACGTGGGACCGGGGG
AGGGCGCGGTCCAATGGATGAACAGGCTTATTGCCTTTGCTTCCAGAGGAAACCACGTCGCCCCTACTCACTACGTGACG
GAGTCGGATGCGTCGCAGCGTGTGACCCAACTACTTGGCTCTCTTACTATAACCAGCCTACTCAGAAGACTCCACAATTG
GATAACTGAGGACTGCCCCATCCCATGCTCCGGATCCTGGCTCCGCGACGTGTGGGACTGGGTTTGCACCATCTTGACAG
ACTTCAAAAATTGGCTGACCTCTAAATTGTTCCCCAAGCTGCCCGGCCTCCCCTTCATCTCTTGTCAAAAGGGGTACAAG
GGTGTGTGGGCCGGCACTGGCATCATGACCACGCGCTGCCCTTGCGGCGCCAACATCTCTGGCAATGTCCGCCTGGGCTC
TATGAGGATCACAGGGCCTAAAACCTGCATGAACACCTGGCAGGGGACCTTTCCTATCAATTGCTACACGGAGGGCCAGT
GCGCGCCGAAACCCCCCACGAACTACAAGACCGCCATCTGGAGGGTGGCGGCCTCGGAGTACGCGGAGGTGACGCAGCAT
GGGTCGTACTCCTATGTAACAGGACTGACCACTGACAATCTGAAAATTCCTTGCCAACTACCTTCTCCAGAGTTTTTCTC
CTGGGTGGACGGTGTGCAGATCCATAGGTTTGCACCCACACCAAAGCCGTTTTTCCGGGATGAGGTCTCGTTCTGCGTTG
GGCTTAATTCCTATGCTGTCGGGTCCCAGCTTCCCTGTGAACCTGAGCCCGACGCAGACGTATTGAGGTCCATGCTAACA
GATCCGCCCCACATCACGGCGGAGACTGCGGCGCGGCGCTTGGCACGGGGATCACCTCCATCTGAGGCGAGCTCCTCAGT
GAGCCAGCTATCAGCACCGTCGCTGCGGGCCACCTGCACCACCCACAGCAACACCTATGACGTGGACATGGTCGATGCCA
ACCTGCTCATGGAGGGCGGTGTGGCTCAGACAGAGCCTGAGTCCAGGGTGCCCGTTCTGGACTTTCTCGAGCCAATGGCC
GAGGAAGAGAGCGACCTTGAGCCCTCAATACCATCGGAGTGCATGCTCCCCAGGAGCGGGTTTCCACGGGCCTTACCGGC
TTGGGCACGGCCTGACTACAACCCGCCGCTCGTGGAATCGTGGAGGAGGCCAGATTACCAACCGCCCACCGTTGCTGGTT
GTGCTCTCCCCCCCCCCAAGAAGGCCCCGACGCCTCCCCCAAGGAGACGCCGGACAGTGGGTCTGAGCGAGAGCACCATA
TCAGAAGCCCTCCAGCAACTGGCCATCAAGACCTTTGGCCAGCCCCCCTCGAGCGGTGATGCAGGCTCGTCCACGGGGGC
GGGCGCCGCCGAATCCGGCGGTCCGACGTCCCCTGGTGAGCCGGCCCCCTCAGAGACAGGTTCCGCCTCCTCTATGCCCC
CCCTCGAGGGGGAGCCTGGAGATCCGGACCTGGAGTCTGATCAGGTAGAGCTTCAACCTCCCCCCCAGGGGGGGGGGGTA
GCTCCCGGTTCGGGCTCGGGGTCTTGGTCTACTTGCTCCGAGGAGGACGATACCACCGTGTGCTGCTCCATGTCATACTC
CTGGACCGGGGCTCTAATAACTCCCTGTAGCCCCGAAGAGGAAAAGTTGCCAATCAACCCTTTGAGTAACTCGCTGTTGC
GATACCATAACAAGGTGTACTGTACAACATCAAAGAGCGCCTCACAGAGGGCTAAAAAGGTAACTTTTGACAGGACGCAA
GTGCTCGACGCCCATTATGACTCAGTCTTAAAGGACATCAAGCTAGCGGCTTCCAAGGTCAGCGCAAGGCTCCTCACCTT
GGAGGAGGCGTGCCAGTTGACTCCACCCCATTCTGCAAGATCCAAGTATGGATTCGGGGCCAAGGAGGTCCGCAGCTTGT
CCGGGAGGGCCGTTAACCACATCAAGTCCGTGTGGAAGGACCTCCTGGAAGACCCACAAACACCAATTCCCACAACCATC
ATGGCCAAAAATGAGGTGTTCTGCGTGGACCCCGCCAAGGGGGGTAAGAAACCAGCTCGCCTCATCGTTTACCCTGACCT
CGGCGTCCGGGTCTGCGAGAAAATGGCCCTCTATGACATTACACAAAAGCTTCCTCAGGCGGTAATGGGAGCTTCCTATG
GCTTCCAGTACTCCCCTGCCCAACGGGTGGAGTATCTCTTGAAAGCATGGGCGGAAAAGAAGGACCCCATGGGTTTTTCG
TATGATACCCGATGCTTCGACTCAACCGTCACTGAGAGAGACATCAGGACCGAGGAGTCCATATACCAGGCCTGCTCCCT
GCCCGAGGAGGCCCGCACTGCCATACACTCGCTGACTGAGAGACTTTACGTAGGAGGGCCCATGTTCAACAGCAAGGGTC
AAACCTGCGGTTACAGACGTTGCCGCGCCAGCGGGGTGCTAACCACTAGCATGGGTAACACCATCACATGCTATGTGAAA
GCCCTAGCGGCCTGCAAGGCTGCGGGGATAGTTGCGCCCACAATGCTGGTATGCGGCGATGACCTAGTAGTCATCTCAGA
AAGCCAGGGGACTGAGGAGGACGAGCGGAACCTGAGAGCCTTCACGGAGGCCATGACCAGGTACTCTGCCCCTCCTGGTG
ATCCCCCCAGACCGGAATATGACCTGGAGCTAATAACATCCTGTTCCTCAAATGTGTCTGTGGCGTTGGGCCCGCGGGGC
CGCCGCAGATACTACCTGACCAGAGACCCAACCACTCCACTCGCCCGGGCTGCCTGGGAAACAGTTAGACACTCCCCTAT
CAATTCATGGCTGGGAAACATCATCCAGTATGCTCCAACCATATGGGTTCGCATGGTCCTAATGACACACTTCTTCTCCA
TTCTCATGGTCCAAGACACCCTGGACCAGAACCTCAACTTTGAGATGTATGGATCAGTATACTCCGTGAATCCTTTGGAC
CTTCCAGCCATAATTGAGAGGTTACACGGGCTTGACGCCTTTTCTATGCACACATACTCTCACCACGAACTGACGCGGGT
GGCTTCAGCCCTCAGAAAACTTGGGGCGCCACCCCTCAGGGTGTGGAAGAGTCGGGCTCGCGCAGTCAGGGCGTCCCTCA
TCTCCCGTGGAGGGAAAGCGGCCGTTTGCGGCCGATATCTCTTCAATTGGGCGGTGAAGACCAAGCTCAAACTCACTCCA
TTGCCGGAGGCGCGCCTACTGGACTTATCCAGTTGGTTCACCGTCGGCGCCGGCGGGGGCGACATTTTTCACAGCGTGTC
GCGCGCCCGACCCCGCTCATTACTCTTCGGCCTACTCCTACTTTTCGTAGGGGTAGGCCTCTTCCTACTCCCCGCTCGGT
AGAGCGGCACACACTAGGTACACTCCATAGCTAACTGTTCCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTCTTTTTTTTTTTTTTCCCTCTTTCTTCCCTTCTCATCTTATTCTACTTTCTTTCTTGGTGGCTCCATCTTAGCCCT
AGTCACGGCTAGCTGTGAAAGGTCCGTGAGCCGCATGACTGCAGAGAGTGCCGTAACTGGTCTCTCTGCAGATCATGT
James Kozubek <jkozubek <at> broadinstitute.org>: Apr 24 11:11AM -0400

OK- I see. "it may miss sequences less than 25 bases"
 
On Fri, Apr 24, 2015 at 10:54 AM, James Kozubek <jkozubek <at> broadinstitute.org
#CHEN GUOHAO# <JCHEN015 <at> e.ntu.edu.sg>: Apr 24 09:04AM

Dear admins
 
 
I got my data uploaded to Galaxy (It is already sorted)
 
 
[inline image not preserved]


But when I clicked on "display at UCSC main", I was taken to the UCSC browser with the error shown below:


[inline image not preserved]
 
 
May I know what could be the problem?
 
 
Regards,
 
Julius
kuhl <kuhl <at> molgen.mpg.de>: Apr 24 10:55AM +0200

Dear UCSC staff,
 
I just realized that you have set up genome preview browsers for genomes
we recently published:
 
European sea bass
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=dicLab1
 
Canary
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=serCan1
 
For these projects we have established local UCSC browsers at our
institutes, which have lots of additional annotation tracks; you might be
interested in adding these tracks to your genome preview browser versions.
 
http://seabass.mpipz.mpg.de/cgi-bin/hgGateway
http://public-genomes-ngs.molgen.mpg.de/cgi-bin/hgGateway?db=serCan1
 
Best wishes, Heiner
 
 
---------------------------------------------------------------
Dr. Heiner Kuhl
MPI Molecular Genetics Tel: + 49 + 30 / 8413 1776
Next Generation Sequencing
Ihnestrasse 73 email: kuhl <at> molgen.mpg.de
D-14195 Berlin http://www.molgen.mpg.de/SeqCore
---------------------------------------------------------------
#CHEN GUOHAO# <JCHEN015 <at> e.ntu.edu.sg>: Apr 24 08:25AM

Hi all
 
 
I set up a Genome Browser in a Box on my local workstation. Initially, it was for my sole use, but now I would like to share it with my friends so that we can use it together.
 
 
I followed the instructions as follows:
 
[inline image not preserved]
 
 
Result:
 
[inline image not preserved]
 
 
Now, I can't even access the browser myself (I used to access via 127.0.0.1:1234).
 
 
I read a bit online, and some say it may be because I did not allow incoming connections to port 1234, hence I did it as follows:
 
 
[inline images not preserved]
 
 
And it still fails; please advise...
 
 
Regards
 
Julius
Vinayak Kulkarni <vkullu <at> gmail.com>: Apr 23 06:20PM -0700

Dear UCSC folks,
In the past I have used the file
http://hgdownload.cse.ucsc.edu/goldenpath/hg19/vsSelf/hg19.hg19.all.chain.gz
and
it's been very very useful for my analysis. Do you have a similar file for
hg38?
Many thanks,
Vinayak.
Matthew Speir <mspeir <at> soe.ucsc.edu>: Apr 23 01:16PM -0700

Hi Roger,
 
Thank you for your question about finding these different bat genomes in
the UCSC Genome Browser. These genomes are available on our preview
server as:
 
* David's myotis (bat) (Myotis davidii):
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=myoDav1
* Big brown bat (Eptesicus fuscus):
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=eptFus1
* Black flying-fox (Pteropus alecto):
http://genome-preview.ucsc.edu/cgi-bin/hgGateway?db=pteAle1
 
You can also find minimal downloads for these species on our preview
download server at:
 
* http://hgdownload-test.soe.ucsc.edu/goldenPath/myoDav1/
* http://hgdownload-test.soe.ucsc.edu/goldenPath/eptFus1/
* http://hgdownload-test.soe.ucsc.edu/goldenPath/pteAle1/
 
Any data on the preview server is still under development and subject to
change at any time. If you decide to use these data, please do so with
caution.
 
You are also welcome to create assembly hubs for all of these species.
Our assembly hub feature allows users to host their genome and related
annotations on a publicly-accessible web server and then visualize these
within our browser. You can find information about creating assembly
hubs on the following help pages:
 
* http://genomewiki.ucsc.edu/index.php/Assembly_Hubs
* http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html
 
Please review this previously answered mailing list question for
additional information on creating and maintaining assembly hubs:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/ozSm1vjaxRY/yZNRpWHRcvQJ
 
If you do create an assembly hub containing these species and host the
data, we can add it to our public hubs page alongside other externally
hosted hubs: http://genome.ucsc.edu/cgi-bin/hgHubConnect?
 
I hope this is helpful. If you have any further questions, please reply
to genome <at> soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
Matthew Speir
UCSC Genome Bioinformatics Group
 
 
On 4/22/15 7:48 PM, Roger Long wrote:
You received this digest because you're subscribed to updates for this group. You can change your settings on the group membership page.
To unsubscribe from this group and stop receiving emails from it send an email to genome+unsubscribe <at> soe.ucsc.edu.

genome | 31 Mar 19:19 2015

Digest for genome <at> soe.ucsc.edu - 6 updates in 5 topics

Charlotte Hor <Charlotte.Hor <at> crg.eu>: Mar 31 03:20PM

Dear UCSC Browser Team,
 
Is it possible to find out where the SNP, CNV etc data from the DBA2/J mouse strain come from? Is there a publication linked with these data?
 
Thank you for your awesome resource and best regards,
Charlotte Hor
Anaïs Gouin <anais.gouin <at> irisa.fr>: Mar 31 12:08PM +0200

Good morning,
 
I would like to get the reciprocal best chains for my alignments. And I realized that your pipeline ( http://genomewiki.ucsc.edu/index.php/HowTo:_Syntenic_Net_or_Reciprocal_Best ) starts from best chains in one way (genomeA-referenced/genomeB as query).
I looked at this page http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto where it is said that we have to use axtSort and axtBest to "keep only the longest chains". Is that the way to get the best chains?

But these two tools work only on axt files. So should I use them directly on the alignments in axt format, or should I generate the chains with axtChain, then convert the chain format to axt (if that is possible), and then apply this filtering step directly to the chains?
 
Thanks very much in advance for your help.
 
Best,
 
Anaïs
"Yang, Haiwang (NIH/NIDDK) [F]" <haiwang.yang <at> nih.gov>: Mar 31 02:17AM

Dear Madam/Sir,
 
I contacted Ann Zweig today, and heard that the pairwise alignment between Dmel and 11 other Drosophila species will not be updated.
 
http://hgdownload.soe.ucsc.edu/goldenPath/dm6/vsDroSim1/
 
Therefore, I have a plan to do it myself, and here I have some questions:
 
I noticed in the pipeline that blastz was used instead of the updated version, lastz. However, blastz is listed as deprecated on its website.
Is blastz really the tool that was used?

Did you use the masked genomes or the unmasked genomes?
 
Many thanks!
 
Best,
Haiwang
dennis <dennis <at> email.unc.edu>: Mar 29 05:33PM -0400

I have seen this question discussed but all the answers I have found
just quote the blat web page. Suppose I have a 120 nt query and get a 28 nt
hit with 24 nt being 100% match and a 3 nt gap. Blat reports that score
as 24 but 28*2-64-x is a negative number. This is based on
2*match-mismatch-gap_penalty that is described on the web page. Can one
of you folks explain where I am making a mistake?
"Steve Heitner" <steve <at> soe.ucsc.edu>: Mar 30 12:33PM -0700

Hello, Dennis.
 
The calculation you should be using is referenced in http://genome.ucsc.edu/FAQ/FAQblat.html#blat4. The relevant portion of the script is:
 
my $pslScore = $sizeMul * ($matches + ( $repMatches >> 1) ) - $sizeMul * $misMatches - $qNumInsert - $tNumInsert;
 
The value of sizeMul is either 3 or 1, depending on whether your query is a protein sequence (3) or not (1). Since we're not dealing with a protein sequence, the value of sizeMul is 1, so the formula is essentially:
 
pslScore = #matches - #misMatches - #qInserts - #tInserts
 
Based on what you described, it sounds like the score is roughly what it should be.
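
As an illustration only, the Perl line quoted above can be restated in Python. This is a minimal sketch and not part of the original script; the function name psl_score, its argument names, and the example numbers are assumptions chosen to mirror the Perl variables.

```python
def psl_score(matches, rep_matches, mis_matches, q_num_insert, t_num_insert,
              is_protein=False):
    """Restate the quoted Perl pslScore expression in Python."""
    # sizeMul is 3 for a protein query and 1 otherwise, as described above.
    size_mul = 3 if is_protein else 1
    return (size_mul * (matches + (rep_matches >> 1))
            - size_mul * mis_matches
            - q_num_insert
            - t_num_insert)


# Hypothetical alignment: 24 matching bases, no mismatches,
# one insert (gap) counted on the query side.
print(psl_score(24, 0, 0, 1, 0))
```

Note that the insert terms count the number of gaps, not the number of gapped bases, so a single 3 nt gap lowers the score by 1 in this formula.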
 
If there is still confusion, could you let us know whether you're using gfServer with hgBlat, standalone blat or something else? Also, please provide your query sequence and the name of the assembly you're querying so we can attempt to replicate your query.
 
Please contact us again at genome <at> soe.ucsc.edu if you have any further questions. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
---
Steve Heitner
UCSC Genome Bioinformatics Group
 
-----Original Message-----
From: dennis [mailto:dennis <at> email.unc.edu]
Sent: Sunday, March 29, 2015 2:34 PM
To: genome <at> soe.ucsc.edu
Subject: [genome] Understanding Blat Score Calculation
 
I have seen this question discussed but all the answers I have found just quote the blat web page. Suppose I have a 120 nt query and get a 28 nt hit with 24 nt being 100% match and a 3 nt gap. Blat reports that score as 24 but 28*2-64-x is a negative number. This is based on 2*match-mismatch-gap_penalty that is described on the web page. Can one of you folks explain where I am making a mistake?
 
--
"Bin Ahmad Zabidi,Muhammad Mamduh" <muhammad.zabidi <at> imp.ac.at>: Mar 30 12:34PM

Hi Brian,
 
It finally works! I’m using the genome-euro.ucsc.edu site since it’s a bit faster here.
 
Sincerely,
-Mamduh
 
 
On Mar 27, 2015, at 6:55 PM, Brian Lee <brianlee <at> soe.ucsc.edu> wrote:
 
Dear Mamduh,
 
Thank you for using the UCSC Genome Browser and your question about track hubs.
 
Are you still seeing the error? Perhaps you are visiting a mirror of the UCSC Genome Browser and not our official site, http://genome.ucsc.edu/.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to genome <at> soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
On Wed, Mar 25, 2015 at 3:31 AM, Bin Ahmad Zabidi,Muhammad Mamduh <muhammad.zabidi <at> imp.ac.at> wrote:
 
Hi there,
 
I’ve been trying to load my own tracks to the UCSC genome browser, but the Track hubs button doesn’t work since yesterday.
A colleague of mine here is also having the same problem.
 
It also doesn’t work in Chrome, Safari or Firefox on my computer.
 
Do you have any suggestion on how we could get around this?
 
Many thanks!
 
Sincerely,
-Mamduh
 
--
genome | 28 Mar 18:10 2015

Digest for genome <at> soe.ucsc.edu - 3 updates in 3 topics

Brian Lee <brianlee <at> soe.ucsc.edu>: Mar 27 11:25AM -0700

Dear Kumar,
 
Thank you for using the UCSC Genome Browser and your question about a
bedgraph memory issue when uploading a file.
 
Please see this recent conversation:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/Eybt7oo8m5A/dhdFwDB8uMEJ
 
What would be best is to create a binary file that you can host on the
internet that the browser can access so that only bits of the file are
transferred when browsed, instead of the entire file.
 
For an overview you might want to look at this helpful wikipage:
http://genomewiki.ucsc.edu/index.php/Selecting_a_graphing_track_data_format
 
What you would do is get the bedGraphToBigWig utility for your system. You
can do uname -a and pick the utility from the matching directory here:
http://hgdownload.soe.ucsc.edu/admin/exe/
 
Then you would get the chrom.sizes file for the matching assembly, in this
case hg19 it appears. You can obtain it here:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
 
Then you could run a command like:
bedGraphToBigWig test.bedGraph hg19.chrom.sizes out.test.bw
 
If there are annotations that extend past the end of chromosomes,
you can avoid that problem by using the -clip option to clip these final
entries.
 
Our engineer shares that the chrM message you see could be a symptom of how
our hg19 assembly has a non-standard chrM, and that you might want to strip
those entries out. You could do that with:
cat test.bedGraph | grep -v "chrM" > test.edit.bedGraph
If you have more questions about the chrM coordinates, please feel free to
reply.
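
The two clean-up steps above (dropping chrM entries and trimming intervals that run past the end of a chromosome) can also be sketched in Python before running the converter. This is only a rough stand-in under stated assumptions: the function names parse_chrom_sizes and clean_bedgraph are hypothetical, the chromosome match is an exact comparison on the first column (unlike grep, which matches anywhere in the line), and the clipping shown here is not claimed to be identical to what the -clip option does.

```python
def parse_chrom_sizes(text):
    """Parse chrom.sizes content (name, size per line) into a dict."""
    sizes = {}
    for line in text.splitlines():
        if line.strip():
            name, size = line.split()[:2]
            sizes[name] = int(size)
    return sizes


def clean_bedgraph(lines, sizes, drop=("chrM",)):
    """Yield bedGraph lines without dropped chromosomes, clipped to chromosome ends."""
    for line in lines:
        fields = line.rstrip("\n").split()
        # Pass through anything that is not a 4-column data line (e.g. "track" lines).
        if len(fields) != 4 or not fields[1].isdigit() or not fields[2].isdigit():
            yield line.rstrip("\n")
            continue
        chrom, start, end, value = fields[0], int(fields[1]), int(fields[2]), fields[3]
        if chrom in drop:
            continue
        limit = sizes.get(chrom)
        # Chromosomes missing from chrom.sizes are left unchanged here.
        if limit is not None:
            if start >= limit:
                continue  # the whole interval lies past the chromosome end
            end = min(end, limit)
        yield f"{chrom}\t{start}\t{end}\t{value}"


# Hypothetical sizes and data, for illustration only.
sizes = parse_chrom_sizes("chr1\t100\nchrM\t50\n")
data = ["chr1\t0\t10\t1.5", "chr1\t90\t120\t2", "chrM\t0\t10\t4"]
for out_line in clean_bedgraph(data, sizes):
    print(out_line)
```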
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
On Tue, Mar 10, 2015 at 6:50 PM, Anandh Kumar <anandhakumar86 <at> gmail.com>
wrote:
 
Brian Lee <brianlee <at> soe.ucsc.edu>: Mar 27 10:55AM -0700

Dear Mamduh,
 
Thank you for using the UCSC Genome Browser and your question about track
hubs.
 
Are you still seeing the error? Perhaps you are visiting a mirror of the
UCSC Genome Browser and not our official site, http://genome.ucsc.edu/.
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
On Wed, Mar 25, 2015 at 3:31 AM, Bin Ahmad Zabidi,Muhammad Mamduh <
Brian Lee <brianlee <at> soe.ucsc.edu>: Mar 27 10:46AM -0700

Dear Dan,
 
Thank you for using the UCSC Genome Browser and your question about your
custom tracks that are not loading.
 
We have heard reports from other users also experiencing this issue with
copy.com/files that used to work in the browser. Other free services, such as
Dropbox, that once worked with the browser changed their configurations to
disallow the byte-range requests needed for the browser to access the data,
and it appears copy.com may have recently made a similar change. Byte-range
requests are very efficient for internet access because they allow the
browser to read arbitrary parts of a file without reading the entire file,
but some of these free providers find the requests expensive for their
servers to honor. However, when big files represent several gigabytes of
data, a byte-range request that fetches only the small portions to be
displayed is much less taxing overall, so some system administrators are
willing to enable these requests once they understand the bigger picture.
You might want to contact copy.com and ask them about changes they may have
made.
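
To make the byte-range idea concrete, here is a minimal Python sketch of how a client asks for only part of a remote file. It builds the request without sending it; the helper name range_request and the example URL are hypothetical, and this is not the browser's own code.

```python
import urllib.request


def range_request(url, offset, length):
    """Build (but do not send) a GET request for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last = offset + length - 1  # HTTP byte ranges are inclusive on both ends
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={offset}-{last}")
    return req


# Hypothetical URL: ask for the first 64 bytes of a hosted bigWig file.
req = range_request("http://example.org/out.test.bw", 0, 64)
print(req.get_header("Range"))
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes; a server that ignores it answers with status 200 and the whole file, which is the behavior that makes large remote files impractical to display.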
 
It is definitely worth investigating with copy.com why these changes are
happening, but it is likely that, as with Dropbox, they have discontinued
some aspect of support for files they store for free. We've seen
some people have success with an Amazon Cloud account on S3 storage, and
often, if you are a paying client of these services, they will work to
resolve your needs.
 
Another option, if you do not need to share these files remotely, is to
use our now-available virtual machine, Genome Browser in a Box (GBiB): you
install a VM of the UCSC Browser and share files locally on your laptop,
skipping the need to run anything over the internet. GBiB is free for
non-commercial use; read more here:
https://groups.google.com/a/soe.ucsc.edu/d/msg/genome/xhucY1YVv88/K2gyAwEgEmcJ
 
Thank you again for your inquiry and using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
genome | 19 Mar 18:22 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

Bogdan Tanasa <tanasa <at> gmail.com>: Mar 19 08:05AM -0700

Dear all,
 
please could you advise on the following :
(possibly based on http://hgdownload.soe.ucsc.edu/goldenPath/ce10/bigZips/)
 
<> what collection of sequences of mRNAs of C elegans ce10 genome shall I
use ?
 
<> how could I relate their NM/NR codes to the actual gene names (eg
daf-16) ?
 
thank you,
 
bogdan
Karol Nowicki-Osuch <karolno <at> gmail.com>: Mar 18 06:06PM

Hello,
 
I would like to unshare (delete) a session which I have shared with other
parties. I used 'Bookmark the session' option to share it with others.
 
I thought that simply deleting the session from my saved sessions would do
the trick. However, when I follow the original share link, I can still
access the session. Can you please advise how the original link to the
session can be deleted?
 
Thanks a lot,
 
Karol
genome | 7 Mar 18:19 2015

Digest for genome <at> soe.ucsc.edu - 2 updates in 2 topics

Jonathan Casper <jcasper <at> soe.ucsc.edu>: Mar 06 05:05PM -0800

Hello Manasa,
 
Thank you for your question about obtaining genomic sequence around the
location of your SNPs. It is straightforward to obtain genomic sequence in
a region 60 bp up and downstream of your SNPs, but tying that together with
reading frame information may be difficult.
 
To obtain genomic sequence for your SNPs, you will need to load them into
the UCSC Genome Browser as a custom track. If your data are in pgSNP or VCF
format, you can load that file directly as a custom track. Otherwise, you
may need to convert your SNP data into a simple coordinate format like BED (
http://genome.ucsc.edu/FAQ/FAQformat.html#format1). After that, open the
UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables and select
your custom track. Select the region "genome" and output format "sequence",
then click "get output". On the next page, you can fill in the boxes to add
60 bases up and downstream from your SNPs.
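
For the conversion to BED mentioned above, a minimal Python sketch follows. It assumes the SNP positions are given as 1-based coordinates; the function name snp_to_bed and the example SNP are hypothetical. BED uses a 0-based start and an exclusive end, so a single-base SNP at 1-based position p becomes the interval [p-1, p).

```python
def snp_to_bed(chrom, pos_1based, name):
    """Return a 4-column BED line for a single-base SNP at a 1-based position."""
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    start = pos_1based - 1  # BED start is 0-based
    end = pos_1based        # BED end is exclusive
    return f"{chrom}\t{start}\t{end}\t{name}"


# Hypothetical SNP, for illustration only.
print(snp_to_bed("chr1", 1000, "snpA"))
```

The resulting lines can be pasted or uploaded as a custom track; the 60 bases of padding are then added in the Table Browser as described above, not in the BED file itself.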
 
Reading frame information is difficult because it is tied to particular
gene definitions, and for some species we have many gene tracks. The
Variant Annotation Integrator tool at http://genome.ucsc.edu/cgi-bin/hgVai will
allow you to submit variants in VCF or pgSNP format (or just as a list of
rsIDs from dbSNP), and generate consequences with respect to a gene set of
your choice. For missense variants, that includes a short display of any
codon changes. It will not, however, extend to 60 bases and there is
currently no way to change that setting. Perhaps you can combine this
output with the sequence from the Table Browser to get what you need?
 
I hope this is helpful. If you have any further questions, please reply to
genome <at> soe.ucsc.edu or genome-mirror <at> soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genome-www <at> soe.ucsc.edu.
 
--
Jonathan Casper
UCSC Genome Bioinformatics Group
 
Valya Burskaya <valya.burskaya <at> gmail.com>: Mar 07 03:14AM +0300

Hello!
 
I am interested in working with alignments of 99 vertebrate genomes with human
genes. There are three variants of alignments for downloading in fasta
format - knownGene, knownCanonical and RefSeq. Can you please help me to
distinguish the content of these files or tell me where to find this information?

Now I understand their content as follows:
 
KnownGene file contains exons of approximately 64.000 genes - it should be
all isoforms of protein genes, pseudogenes and RNA-genes of human. (So,
each exon could be represented in alignment many times if it belongs to
different splice variants.)
RefSeq file should be more or less the same, but the set of gene IDs was taken
from the RefSeq database.
KnownCanonical file contains ~21.000 genes - there are only protein coding
genes, the longest CDS for each set of alternatively transcribed genes.
(So, each exon should be represented in alignment only once).
 
And one more question.
Even in the knownCanonical file there are stop codons in the middle of human
(~20 sequences) and vertebrate (~300.000 sequences) genes. Were such cases
filtered in any way when the alignment was performed?
 
 
I'll be grateful for any answer.
 
Valya Burskaya
graduate student
Moscow State University
Biology department
genome | 28 Feb 18:22 2015

Digest for genome <at> soe.ucsc.edu - 4 updates in 3 topics

Jennifer Tom <tom.jennifer <at> gene.com>: Feb 27 04:08PM -0800

Hi!
 
I've been working with the UCSC self chain track and would like to include
a transformed version of it in an R/bioconductor package. I would
obviously cite UCSC and provide a link to the original file. I just want
to make sure this is something that UCSC allows with its data. Thanks!
 
Jen
James Studd <James.Studd <at> icr.ac.uk>: Feb 27 11:30AM

Hi,
 
I was hoping to find a bed file which contains the mapped motifs for transcription factors. It's not immediately clear from looking at the bed file above or the parent directory 'http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/' which file has this information. Is it contained within the wgEncodeRegTfbsClusteredV3.bed.gz file? If so how can you distinguish ChIP seq peaks from motifs?
 
Many thanks
 
James Studd | Postdoctoral Research Fellow
The Institute of Cancer Research | 15 Cotswold Road | Belmont | Sutton | Surrey | SM2 5NG
T +44 208 722 4113 | E James.Studd <at> icr.ac.uk | W www.icr.ac.uk | Twitter <at> ICR_London
Facebook www.facebook.com/theinstituteofcancerresearch
Making the discoveries that defeat cancer
 
 
Brian Lee <brianlee <at> soe.ucsc.edu>: Feb 27 03:20PM -0800

Dear James,
 
Thank you for using the UCSC Genome Browser and for your question about
obtaining a bed file for the mapped motifs for transcription factors in the
TFBS Clusters track:
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeRegTfbsClusteredV3
 
Usually where you looked would be the right place to find the downloadable
files. However, to avoid the kind of confusion you are hinting at, and
since the motifs are not universal to all clusters, there was discussion of
releasing them to a downloads location for a separate track devoted to
these factorbook-generated motifs, but that hasn't happened yet. In the
meantime, you can obtain this information from either the Table Browser or
the public MySQL server.
 
To use the MySQL server, please see the resource page,
http://genome.ucsc.edu/goldenPath/help/mysql.html, and then you could use a
command like the following, or a variation of it to obtain parts or all of
the tables of interest:
  mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -Ne \
    'select * from factorbookMotifPos;' hg19 > factorbookMotifPos
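
As a minimal sketch of post-processing such a dump, the Python below turns
the tab-separated rows into six-column BED-like lines. It assumes the table
is laid out as a leading bin column followed by chrom, chromStart, chromEnd,
name, score and strand; that column order is an assumption to check against
the table schema, and the function name and sample row are hypothetical.

```python
def dump_to_bed6(dump_text):
    """Convert tab-separated dump rows into six-column BED-like lines.

    Assumed column order (not confirmed here): bin, chrom, chromStart,
    chromEnd, name, score, strand. The leading bin column is dropped.
    """
    bed_lines = []
    for line in dump_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        chrom, start, end, name, score, strand = fields[1:7]
        bed_lines.append("\t".join([chrom, start, end, name, score, strand]))
    return bed_lines


# Made-up row, for illustration only.
example = "585\tchr1\t10000\t10015\tCTCF\t2.5\t+\n"
print("\n".join(dump_to_bed6(example)))
```

Reading the dump with open("factorbookMotifPos").read() and passing it to
the function would give lines ready to write to a .bed file.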
 
The other tables of interest would be the factorbookMotifPwm table and the
two tables that help coordinate the factorbook TFBS terms to the names used
at UCSC: factorbookMotifCanonical and factorbookGeneAlias. Those tables
help translate terms; for example, both UCSC and factorbook use BRCA1, but
they differ with CTBP2 and CtBP2, or EP300 and P300.
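
The kind of name translation described above can be sketched in Python as a
simple lookup with a pass-through default. The pairs below are taken from
the examples in this message, but which spelling belongs to which resource
is an assumption made for illustration, and the helper names are
hypothetical; in practice the pairs would come from the alias tables.

```python
def build_alias_map(pairs):
    """Map each alternate spelling to the name assumed to be used at UCSC."""
    return {alias: ucsc_name for alias, ucsc_name in pairs}


def to_ucsc_name(name, alias_map):
    """Translate a name, leaving names shared by both resources unchanged."""
    return alias_map.get(name, name)


# Direction of each pair is an assumption, for illustration only.
alias_map = build_alias_map([("CtBP2", "CTBP2"), ("P300", "EP300")])
print(to_ucsc_name("P300", alias_map))
print(to_ucsc_name("BRCA1", alias_map))
```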
 
You can also access all these tables from the Table Browser:
http://genome.ucsc.edu/cgi-bin/hgTables
 
1. Select the hg19 human assembly.
2. Set "group:" to "All Tables"
3. From "table:" select the factorbookMotifPos table.
* At this point you could use the filter or intersection tools to limit the
output to factors or locations of interest (for example, via a BED file of
coordinates). See more about the Table Browser here:
http://www.openhelix.com/cgi/tutorialInfo.cgi?id=28
4. Set "output format:" to either "custom track" or "BED" and click "get
output".
 
If what you want is just the motifs displayed in the
wgEncodeRegTfbsClusteredV3 track, it becomes a little more complicated, as
algorithms are applied to limit the display of motifs from the original
factorbookMotifPos table to the highest score per region. Here is a session
link that illustrates this issue, where multiple motifs for NFY exist (NFYB
at UCSC) and only one is mapped to the cluster:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=Brian%20Lee&hgS_otherUserSessionName=hg19.factorbookMotifPos
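
To illustrate the idea of keeping only the highest score per region, here is
a rough Python sketch. It is not the algorithm the browser actually applies;
it simply assumes that, for each cluster, the overlapping motif with the
same factor name and the highest score is kept. The Interval type, the
function names and the coordinates are all hypothetical.

```python
from collections import namedtuple

Interval = namedtuple("Interval", "chrom start end name score")


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def best_motif_per_cluster(clusters, motifs):
    """For each cluster, keep the highest-scoring overlapping motif of the same name."""
    best = {}
    for cluster in clusters:
        hits = [m for m in motifs if m.name == cluster.name and overlaps(m, cluster)]
        if hits:
            best[cluster] = max(hits, key=lambda m: m.score)
    return best


# Made-up coordinates and scores, for illustration only.
clusters = [Interval("chr1", 100, 400, "NFYB", 0.0)]
motifs = [
    Interval("chr1", 120, 135, "NFYB", 1.8),
    Interval("chr1", 300, 315, "NFYB", 2.6),
    Interval("chr2", 300, 315, "NFYB", 9.9),  # other chromosome, ignored
]
for cluster, motif in best_motif_per_cluster(clusters, motifs).items():
    print(cluster.name, motif.start, motif.end, motif.score)
```

The nested scan is quadratic, which is fine for a handful of regions; for
whole-genome tables a sorted sweep or an interval tree would be more
appropriate.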
 
Thank you again for your inquiry and for using the UCSC Genome Browser. If you
have any further questions, please reply to genome <at> soe.ucsc.edu. All
messages sent to that address are archived on a publicly-accessible forum.
If your question includes sensitive data, you may send it instead to
genome-www <at> soe.ucsc.edu.
 
All the best,
 
Brian Lee
UCSC Genome Bioinformatics Group
 
STIRPARO Giuliano Giuseppe RIC <Giuliano.Stirparo <at> humanitasresearch.it>: Feb 27 12:58PM +0100

Hi, I have some data like this:
 
file_reverse.BedGraph
 
track type=bedGraph smoothingWindow=16
1 3201074 3201125 -0.06
1 3201215 3201266 -0.06
1 3201839 3201844 -0.06
1 3201845 3201891 -0.06
1 3201947 3201998 -0.06
1 3210947 3210998 -0.06
1 3211290 3211341 -0.06
1 3216209 3216260 -0.13
1 3216369 3216420 -0.13
1 3254766 3254817 -0.06
 
file_Forward.bedGraph
 
track type=bedGraph smoothingWindow=16
1 3152649 3152700 0.06
1 3152983 3153030 0.06
1 3153031 3153035 0.06
1 3215255 3215306 0.06
1 3215307 3215358 0.06
1 4243564 4243575 0.06
1 4243575 4243615 0.13
1 4243615 4243626 0.06
1 4496431 4496482 0.06
 
I have two questions.
1) I would like to merge the data into a single file and show the coverage of each strand in a different color.
2) I would like to smooth my data (I have already tried smoothingWindow=16, but the file looks the same as with smoothingWindow=off).
 
Thanks for ideas or suggestions.
 
Best
 
Giuliano
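
One possible approach to the first question, sketched in Python: rather than
mixing both strands into one set of rows, write a single custom-track file
that holds two bedGraph track lines, each with its own color. The track
names, the colors and the function name are assumptions chosen for
illustration, and the data rows are passed through unchanged.

```python
def combine_strand_tracks(forward_rows, reverse_rows):
    """Build one custom-track text holding two differently colored bedGraph tracks."""
    lines = ['track type=bedGraph name="forward" color=0,0,255']
    lines += forward_rows
    lines.append('track type=bedGraph name="reverse" color=255,0,0')
    lines += reverse_rows
    return "\n".join(lines) + "\n"


# Two rows taken from the example data in this message.
forward = ["1\t3152649\t3152700\t0.06"]
reverse = ["1\t3201074\t3201125\t-0.06"]
print(combine_strand_tracks(forward, reverse), end="")
```

This sketch does not address the smoothing question.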
 
