do r | 14 Sep 16:42 2014

export function alters ranges in output BED file


I am attempting to use the export() function to generate a BED file from a
GRanges object.
However, the ranges in the output file are altered so that the start
coordinate is reduced by one,
for example:

  [987]        3 [37035154, 37035155]      +   |  Class 4     MLH1    c.116+1G>A
  [988]        3 [37067241, 37067242]      +   |  Class 4     MLH1     c.1153C>T
  [989]        3 [37067125, 37067126]      +   |  Class 4     MLH1   c.1039-2A>T
  [990]        3 [37067125, 37067126]      +   |  Class 4     MLH1   c.1039-2A>G
  [991]        3 [37061954, 37061955]      +   |  Class 4     MLH1   c.1038+1G>C

results in this output:
3	37067240	37067242	.	0	+
3	37067124	37067126	.	0	+
3	37067124	37067126	.	0	+
3	37061953	37061955	.	0	+

Since I intend to search later for intersections between the
ranges in the BED file and variants in a VCF file (using Tabix), I am
afraid that this subtraction may lead to false positives.

What is the reason for this subtraction from the start, and is there
any way to suppress it?

thanks in advance

Dolev Rahat
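
For context, here is a minimal base-R sketch of the coordinate convention that most likely explains the shift. BED stores a 0-based start and an exclusive end, while GRanges displays 1-based, inclusive coordinates, so a BED writer subtracts one from the start and leaves the end unchanged. The data frame, the column names and the helper to_bed() are illustrative assumptions, not part of rtracklayer; the coordinates are taken from the example above.

```r
# 1-based, inclusive ranges as displayed by GRanges (values from the post)
ranges_1based <- data.frame(
  chrom = "3",
  start = c(37067241, 37067125, 37061954),
  end   = c(37067242, 37067126, 37061955)
)

# Hypothetical helper: BED uses a 0-based start and an exclusive end,
# so only the start coordinate shifts
to_bed <- function(df) {
  data.frame(chrom = df$chrom, chromStart = df$start - 1, chromEnd = df$end)
}

bed <- to_bed(ranges_1based)

# Both representations cover the same number of bases
width_1based <- ranges_1based$end - ranges_1based$start + 1
width_bed    <- bed$chromEnd - bed$chromStart
stopifnot(all(width_1based == width_bed))

print(bed)
```

Under this convention the shifted start does not widen the interval, so tools that read the file as BED should see the same bases as the original GRanges.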

Gordon K Smyth | 14 Sep 06:54 2014

Re: Packages for GO and KEGG analysis on RNAseq data

Dear Nicolas,

I think you are suffering from the "paradox of choice", which says that 
too much choice leads to more unhappiness and less action even when all 
the choices are potentially good ones.

Pathway analysis is a very large and very active research area, and there 
are a large number of alternative methods.  This is because there is no 
unique definition of what a pathway is, and because there are a large number 
of different ways to rank pathways, and because there is no consensus as 
to which ranking will best match biological significance.  Perhaps no 
consensus is even possible, because different methods may give better 
results in different circumstances.

Some methods simply count the number of genes associated with each GO term 
in your DE list (overlap methods).  Other methods work with the test 
statistics or fold changes for each gene (gene set tests).  Some methods 
adjust for inter-correlations between genes and some don't.  Some adjust for 
RNA-seq specific biases like gene length.  Some methods test each GO term 
in isolation whereas others test them relative to each other.  It is not 
surprising that methods that use different information will potentially 
give different results.

You are asking which is the most validated method, but in truth there are 
many validated methods. I would however avoid methods which work purely on 
logFC, because low expression genes may have large fold changes from 
RNA-seq just by chance.  Steve suggested roast and camera, which are 

Gordon K Smyth | 14 Sep 05:22 2014

DE analysis of PCR array [was: dataset dim for siggenes]

Hi Fred,

We frequently use limma to analyze ct values from PCR arrays similar to 
yours (although usually with fewer samples), so the analysis that you have 
already done is basically what we recommend.

We use cyclicloess normalization with house-keeping genes up-weighted. 
This gives a nice compromise between global normalization and 
normalization on house-keeping genes.  After normalization, we use MDS 
plots to search for unexpected batch effects.  Then we use the usual limma 
pipeline except that we set robust=TRUE and trend=TRUE when running 
eBayes().  Sometimes we use treat() instead of eBayes() to give more 
emphasis to larger fold changes.

People worry too much about normality.  Limma makes similar assumptions to 
anova, and both are quite robust against non-normality for a two group 

There are more important things to worry about, for example 
heteroscedasticity (large ct values are less precise than small ct 
values), outliers and batch effects.  Setting trend=TRUE is intended to deal 
with heteroscedasticity, robust=TRUE with outliers, and limma allows you to 
correct for batch effects if they exist.
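
The steps described above can be sketched roughly as follows. This is only an illustration on simulated Ct values, not the exact code used for the analysis discussed here. The matrix size, the choice of the first five rows as house-keeping genes, the weight of 10 and the fold-change threshold passed to treat() are all assumptions made for the example. The limma calls are skipped if the package is not installed.

```r
# Simulated Ct matrix: 200 genes x 8 samples (4 control, 4 treated).
# All values here are made up for illustration.
set.seed(1)
n_genes <- 200
ct <- matrix(rnorm(n_genes * 8, mean = 25, sd = 1), nrow = n_genes,
             dimnames = list(paste0("gene", seq_len(n_genes)),
                             paste0("s", 1:8)))
group <- factor(rep(c("control", "treated"), each = 4))

# Up-weight house-keeping genes (assumed to be the first 5 rows;
# the weight of 10 is an arbitrary illustrative choice)
hk_weights <- rep(1, n_genes)
hk_weights[1:5] <- 10

if (requireNamespace("limma", quietly = TRUE)) {
  library(limma)

  # Cyclic loess normalization with per-gene weights
  ct_norm <- normalizeCyclicLoess(ct, weights = hk_weights)

  # MDS plot to look for unexpected batch effects
  plotMDS(ct_norm, labels = colnames(ct_norm))

  design <- model.matrix(~ group)
  fit <- lmFit(ct_norm, design)

  # trend=TRUE for heteroscedasticity, robust=TRUE for outliers
  fit_eb <- eBayes(fit, trend = TRUE, robust = TRUE)
  print(topTable(fit_eb, coef = "grouptreated", number = 5))

  # treat() tests against a minimum log2 fold change (threshold is illustrative)
  fit_tr <- treat(fit, lfc = log2(1.5), trend = TRUE, robust = TRUE)
  print(topTreat(fit_tr, coef = "grouptreated", number = 5))
}
```

A known batch variable could be added as an extra term in the design matrix, for example model.matrix(~ batch + group), where batch is a hypothetical factor.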

Best wishes

> Date: Fri, 12 Sep 2014 19:45:27 -0300 (BRT)
> From: ferreirafm@...
> To: jmacdon <at>

Mahes Muniandy [guest] | 13 Sep 20:31 2014

HGU133Plus2 CDF vs hgu133plus2hsentrezgcdf CDF (30% difference in results)

My name is Mahes Muniandy and I am a doctoral student. I have been analysing Affymetrix HGU133Plus2 CEL
files to determine differential expression in twin pairs (within-pair differences). I have used affy,
gcrma, nsFilter and limma to do my analysis. I have run my analysis using the HGU133plus2 CDF available in
Bioconductor and then tried the whole analysis again using the HGU133plus2 CDF from Brainarray. The limma
results differ significantly (2351 differentially expressed genes for the former and 2700 genes for the
latter analysis). 630 genes (about 30%) from the 2351 genes do not exist in the list of 2700 genes.
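
As a rough illustration of how such a comparison can be made, the sketch below uses base R set operations on two toy gene lists and then repeats the percentage calculation with the counts reported above. The gene identifiers are hypothetical, and the sketch assumes both result lists have already been mapped to a common identifier (for example Entrez Gene IDs), since the two CDFs define probesets differently.

```r
# Hypothetical gene identifiers standing in for the two limma result lists
de_affy_cdf       <- c("g1", "g2", "g3", "g4", "g5")
de_brainarray_cdf <- c("g2", "g3", "g5", "g6", "g7", "g8")

shared     <- intersect(de_affy_cdf, de_brainarray_cdf)
only_first <- setdiff(de_affy_cdf, de_brainarray_cdf)

# Fraction of the first list that is missing from the second (toy data)
frac_missing_toy <- length(only_first) / length(de_affy_cdf)

# The same calculation with the counts reported in the post
frac_missing_reported <- 630 / 2351
print(round(100 * frac_missing_reported, 1))
```

With the reported counts the missing fraction comes out slightly below the quoted 30%.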

I have read "Evolving Gene/Transcript Definitions Significantly Alter the Interpretation of GeneChip
Data" by M. Dai et al. and see some convincing arguments there. However, I am confused about which limma
results to go with. Could you advise me on the guiding principles that I should follow in order to decide which
CDF to use? I do realise that the onus is on me to decide but, sadly, I am quite lost in this matter. I would
appreciate any help available.

Many Thanks,
Mahes Muniandy,
Uni. Helsinki

 -- output of sessionInfo(): 

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C

Chris Clarkson [guest] | 13 Sep 14:56 2014

Bioinformatics researching Schizophrenia

I am an undergraduate student and I have been given a project that involves bioinformatics. My supervisor
researches chromosome abnormalities and gene expression, and he has been given a list of GWAS-generated
SNPs that have been linked to Schizophrenia. I have been asked to generate a list of potential
genes/regions that we could cross-reference with that list, to see if any of these SNPs occur at sites
such as miRNA sites, methyltransferase genes, acetylation and deacetylation genes etc. that could be
implicated in Schizophrenia.
I was hoping you would be able to recommend the necessary R packages for (A) generating a list of potentially
implicated regions and (B) cross-referencing this list with the one I have received.
Thank you


Bo [guest] | 13 Sep 10:29 2014

Asking for help with Rcpp

Does anyone know how to use regex in Rcpp for Bioconductor package development?
Best regards!


stefano romano | 13 Sep 10:13 2014

topGO: how to visualize genes in GO enriched categories?

Hi All,

I am using the topGO package for GO enrichment analysis in non-model organisms.
I found the tool really intuitive and versatile as far as the statistical
tests are concerned.
However, I was wondering if there is a way to display/list the genes
belonging to each enriched GO category.

Thanks in advance.


stefano romano | 13 Sep 10:03 2014

KEGGprofile: "Error in phyper - Non-numeric argument to mathematical function" when using non model organism


I am using KEGGprofile to perform KEGG enrichment for a non-model organism.
Using the data from the R documentation, I obtain the enrichment without problems.

> summary(genes)
   Length     Class      Mode
      300 character character
> is.vector(genes)
[1] TRUE

However, when I use my own dataset, which I submit as a character vector of
NCBI IDs, I get the following error:

Error in phyper(kegg_result_length[x], keggpathway2gene_length[x],
length(unique(unlist(keggpathway2gene))) -  :
  Non-numeric argument to mathematical function

Any suggestions on how to overcome this problem?

Thank you very much.


Vinicius Henrique da Silva | 12 Sep 17:19 2014

Changing the x axis size using tracks (ggbio)


I am using the tracks function from the ggbio package, but I am unable to change the text size on the x axis.

The df dataframe: 
 chr    start     end           id
chr12 72065147 72204484 ENSBTAG00000045751
chr12 72529373 72690449 ENSBTAG00000047181
chr12 73574461 73704802 ENSBTAG00000046041
chr12 73890111 73977633 ENSBTAG00000047764
chr12 74198711 74199129 ENSBTAG00000047360
chr12 74816044 74978179 ENSBTAG00000023309
chr12 75457896 75536848 ENSBTAG00000026070
chr12 75664651 75870596 ENSBTAG00000039714
chr12 76758753 76864805 ENSBTAG00000003568
chr12 76867833 76922959 ENSBTAG00000003569
chr12 76958977 77024110 ENSBTAG00000012065
chr12 77032585 77176916 ENSBTAG00000004401

gr <- makeGRangesFromDataFrame(df, TRUE)

ex <- autoplot(gr, ylab = "Soft") + theme(text = element_text(size=20))


In this example only the y axis is changed by theme (size = 20). The x axis text remains very small.
Interestingly, if I don't use the tracks function, both axes (x and y) change to my desired size (size
= 20) as expected. However, I have several tracks in my actual data, which forces me to use the tracks function.
Any ideas why it is not working?


Marc Carlson | 13 Sep 00:08 2014

important announcement


This is a second warning that in less than a week we plan to roll out
the new support site for Bioconductor.

*Important* Once the support site is 'live', posts to the Bioconductor
mailing list will receive an automatic reply indicating that it is no
longer in service and directing you to the new site. This change
affects the 'bioconductor' mailing list; the 'bioc-devel' mailing list
will continue to function as before.



Victoria Svinti | 12 Sep 21:29 2014

what drives sample outliers in 450k (detectOutlier, lumi)

Hi there

I am finding some sample outliers in the result from the detectOutlier method in lumi. I don’t see
anything strange about these samples when I look at the mean beta values. I wonder how I can find out what
drives these samples to be outliers (what set of probes, etc)? It would be good to know the reason for these,
to determine whether they need to be re-included in a separate experiment. They do not seem to be outliers
in the gender tests .. 

I would appreciate your thoughts on this. 

Many thanks

Victoria Svinti
Colon Cancer Genetics Group
MRC Human Genetics Unit, IGMM
University of Edinburgh, Western General Hospital,
Crewe Road, Edinburgh, EH4 2XU
