Marc Carlson | 16 Sep 00:40 2014

Important announcement about our new support site


We are pleased to announce the official transition to our new support site.

This site replaces the Bioconductor mailing list as the primary source
for user support; it includes the last 11+ years of mailing list

To post to the site, create a new account by visiting the User Login

For those who have posted to the Bioconductor mailing list before,
recover your well-earned reputation from previous posts and answers by
scrolling to the bottom of the login page and clicking the link that
says 'Forgot Password?'.

Frequently Asked Questions about using the support site are available at

The FAQ describes how to merge different names into a single account,
and contains other helpful information about how to get the most out
of the new site.

This is the final post to the Bioconductor mailing list; we look
(Continue reading)

KC [guest] | 15 Sep 23:24 2014

design matrix in limma

Hello BioC forum,

I am building a design matrix for a gene expression analysis that adjusts for the Sex variable. The R code
I am using is below. I want to make sure I am doing this correctly. Thanks for your review and feedback.

> library(limma)
> Sample<-factor(pd$Sample_Group, levels=c("Control","Case"))   ### treatment 
> Sex<-factor(pd$Sex, levels=c("M","F"))                        ### Sex
> design<-model.matrix(~Sample+Sex)
> design
   (Intercept) SampleCase SexF
1            1          0    0
2            1          0    0
3            1          0    0
4            1          0    1
5            1          0    1
6            1          0    1
7            1          1    1
8            1          0    0
9            1          0    1
10           1          1    0
11           1          1    1
12           1          0    1
13           1          0    1
14           1          1    0
15           1          0    0
16           1          1    1
17           1          0    0
18           1          0    0
19           1          1    0
(Continue reading)
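
A minimal sketch of the same design on made-up data may help when checking the columns. The `pd` table below is hypothetical, not the poster's data. With R's default treatment contrasts and the factor levels ordered as in the post, `SampleCase` is the Case-versus-Control effect adjusted for sex, and `SexF` is the female-versus-male effect.

```r
# Hypothetical phenotype table standing in for the poster's `pd`
pd <- data.frame(
  Sample_Group = c("Control", "Control", "Case", "Case", "Control", "Case"),
  Sex          = c("M", "F", "F", "M", "F", "M")
)

# First level of each factor is the reference (Control, M)
Sample <- factor(pd$Sample_Group, levels = c("Control", "Case"))
Sex    <- factor(pd$Sex, levels = c("M", "F"))

design <- model.matrix(~ Sample + Sex)
print(colnames(design))

# With limma, the sex-adjusted treatment effect would typically be read from
# the "SampleCase" coefficient after lmFit() and eBayes() (not run here).
```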

Wolfgang Huber | 15 Sep 19:36 2014

PI position at EMBL-EBI (Cambridge UK)

// see which can also be
reached from the EMBL / Jobs homepage.

We are seeking an outstanding computational biologist to establish a new research group to complement our
current research programme. The EMBL-EBI, located on the Wellcome Trust Genome Campus near Cambridge,
UK, is a global centre for biological data resources and computational biology research. We pursue many
different aspects of biology, exploiting and helping to interpret and integrate the biological data
held in the databases. Current research topics include sequence analysis, computational genomics,
phylogeny, structural biology, regulatory control networks, functional genomics, neurobiology and
literature/text mining. 

The advertised research position is suitable for outstanding candidates from any area of computational
biology, but we are keen to strengthen our research in translational bioinformatics, genomic
sequencing analysis, systems biology, cellular modelling and literature and data integration. The
candidate will join a strong and supportive group of young research group leaders and will interact
closely with the large database service teams, benefiting from their technical expertise and
scientific knowledge.

The EBI, part of the European Molecular Biology Laboratory (EMBL), is a world-leading bioinformatics
centre providing biological data to the scientific community with expertise in data storage, analysis
and representation.
Bioconductor mailing list

zhao shilin | 15 Sep 16:50 2014

Re: KEGGprofile: "Error in phyper - Non-numeric argument to mathematical function" when using non model organism

Dear Stefano,

I've tested the species Pseudovibrio FO-BEG1 ("psf" in the KEGG database) and found
that it was added to the KEGG database in 2013, whereas the KEGG.db package
has only been updated through 2012. I think that is the reason for the error message.
I have now added a new parameter, "download_latest", to the "find_enriched_pathway"
function of the KEGGprofile package; you can find it in KEGGprofile version
1.7.6 on Bioconductor. I think it is still in the devel branch
of Bioconductor, so you may not be able to see it immediately. In the meantime,
you can download it from my GitHub.

Here is the test code:

Please let me know if you have any other questions.


2014-09-14 2:07 GMT-05:00 stefano romano <stfn.romano@...>:

(Continue reading)

Pau Marc Muñoz Torres | 15 Sep 16:40 2014

affymetrix probe databases

Good afternoon to everybody,

 I'm just taking my first steps with Affymetrix data and I have
some questions.

 I started working with CEL files by moving the data contained in them
to a csv file. Then I tried to relate Affymetrix codes to UniProt codes.
I did it as follows:

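library(affy)  # provides ReadAffy() and exprs()
# Read the CEL files in the working directory and extract the intensities
data <- ReadAffy()
my_frame <- data.frame(exprs(data))
# Collapse each annotation map to one comma-separated string per ID
Annot <- data.frame(
  ACCNUM  = sapply(contents(xxxACCNUM), paste, collapse = ", "),
  SYMBOL  = sapply(contents(xxxSYMBOL), paste, collapse = ", "),
  DESC    = sapply(contents(xxxGENENAME), paste, collapse = ", "),
  UNIPROT = sapply(contents(xxxUNIPROT), paste, collapse = ", ")
)
# Merge on row names, keeping unmatched rows from both tables
merged <- merge(Annot, my_frame, by.x = 0, by.y = 0, all = TRUE)
write.csv(merged, file = "xxx.csv")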

where xxx is one of the following databases:


 Unfortunately, the codes contained in the CEL files and in the database do
not match well. Some examples are:
(Continue reading)

Martin Morgan | 15 Sep 16:13 2014

Course: Learning R / Bioconductor for Sequence Analysis, Seattle, WA Oct 27-29

Course: Learning R / Bioconductor for Sequence Analysis

Dates: October 27-29, Seattle, WA.


This course is directed at beginning and intermediate users who would like an 
introduction to the analysis and comprehension of high-throughput sequence data 
using R and Bioconductor. Day 1 focuses on learning essential background: an 
introduction to the R programming language; central concepts for effective use 
of Bioconductor software; and an overview of high-throughput sequence analysis 
work flows. Day 2 emphasizes use of Bioconductor for specific tasks: an RNA-seq 
differential expression work flow; exploratory, machine learning, and other 
statistical tasks; gene set enrichment; and annotation. Day 3 transitions to 
understanding effective approaches for managing larger challenges: strategies 
for working with large data, writing re-usable functions, developing 
reproducible reports and work flows, and visualizing results. The course 
combines lectures with extensive hands-on practicals; students are required to 
bring a laptop with wireless internet access and a modern version of the Chrome 
or Safari web browser.

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

(Continue reading)

Hugo Varet | 15 Sep 15:41 2014

Interaction categorical/continuous variable DESeq2

Dear list, dear Mike Love,

I am using DESeq2 to model counts from an unusual type of experiment and 
I have a question about the strategy I employed. The experiment 
consisted in sequencing 33 samples for which we have the following:
  - group (16 samples from group A and 17 from group B)
  - a continuous variable X almost uniform (variable of interest)

I have to add the group to the design formula because I know it has a 
strong effect on the counts. Then, as my goal is to detect genes which 
vary with the continuous variable X in the same way within both groups A 
and B, I want to exclude genes for which there is an interaction between 
group and X. The design is thus ~ group + X + group:X and I used the 
following lines to test the interaction:

dds <- DESeqDataSetFromMatrix(countData=counts, colData=target, design = 
~ group + X + group:X)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds, name="groupB.X")
sum(res$padj<=0.05, na.rm=TRUE)

As I found no significant interaction (the minimum adjusted p-value is 
about 0.6), I decided to remove the interaction term from the design and 
to use ~ group + X. I can then test for the coefficients of X.

If I do not detect any significant interaction, I think it is due to a 
(Continue reading)
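
To make the two designs concrete, here is a small base R sketch on simulated sample information (the `target` table below is invented; only the 16/17 group sizes come from the post). The extra column in the full design is the difference in the slope of X between group B and group A. Its name, once passed through `make.names()`, matches the `groupB.X` coefficient name used above; that DESeq2 sanitizes names this way is an assumption here, though it is consistent with the post.

```r
set.seed(1)

# Invented sample table: 16 samples in group A, 17 in group B, X roughly uniform
target <- data.frame(
  group = factor(rep(c("A", "B"), times = c(16, 17))),
  X     = runif(33)
)

# Full design (separate slopes per group) and reduced design (common slope)
full    <- model.matrix(~ group + X + group:X, data = target)
reduced <- model.matrix(~ group + X, data = target)
print(colnames(full))

# The only column that differs is the interaction term
interaction_name <- make.names(setdiff(colnames(full), colnames(reduced)))
print(interaction_name)
```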

Chapeaublanc Elodie | 15 Sep 13:23 2014

NA values in Biomart query

I want to retrieve some annotation information ("ensembl gene id", "ensembl transcript id", ...) from
"ensembl exon id" values by using a biomaRt query.
After my "getBM" query, which runs without an error message, my "results" object contains a lot of "NA" values, but if I
re-send a query for the IDs that returned "NA", I successfully retrieve the information.

Example : 
ensembl <-  useMart(host="", biomart="ENSEMBL_MART_ENSEMBL", dataset
= "hsapiens_gene_ensembl")

results <-
getBM(attributes=c("ensembl_exon_id","ensembl_gene_id"),filters="ensembl_exon_id",values=ENSE,mart=ensembl, uniqueRows=TRUE)
My object ENSE contains 512120 exon IDs.

Why do I get a lot of "NA" values when my values argument is large? Do I have to split it, and if so, why?



R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
(Continue reading)
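
If splitting the query turns out to be necessary, one way to do it is sketched below. This only illustrates the chunking and recombining; the IDs, the chunk size, and the `query_chunk()` helper are all hypothetical, and nothing here establishes that query size is the cause of the "NA" values.

```r
# Hypothetical exon IDs standing in for the poster's ENSE vector
ENSE <- sprintf("ENSE%011d", 1:2500)

# Arbitrary chunk size for illustration, not a documented biomaRt limit
chunk_size <- 1000
chunks <- split(ENSE, ceiling(seq_along(ENSE) / chunk_size))

# Placeholder for the real query: with biomaRt this would be the getBM()
# call from the post with values = ids. It returns a data frame so that
# the per-chunk results can be stacked.
query_chunk <- function(ids) {
  data.frame(ensembl_exon_id = ids, stringsAsFactors = FALSE)
}

results <- do.call(rbind, lapply(chunks, query_chunk))
print(sapply(chunks, length))
```

Comparing the set of returned IDs with the input after each chunk makes it easy to see which chunks, if any, came back incomplete.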

stefano romano | 15 Sep 10:46 2014

error: "'names' attribute [16] must be the same length as the vector [2]", using Making Organism packages use of makeOrgPackage()

I am creating my annotation package using makeOrgPackage().
I followed the instruction here:

These are the data frames and the commands I used:

> test3
       GID Gene Name                                               Protein
1 PSE_0724  PSE_0724                           GCN5-related
2 PSE_0725  PSE_0725                           GCN5-related
3 PSE_0726  PSE_0726                           GCN5-related
4 PSE_0727  PSE_0727                           GCN5-related
5 PSE_0728  PSE_0728                             Acetyltransferase, GNAT
6 PSE_0729  PSE_0729                             Acetyltransferase, GNAT
7 PSE_0730  PSE_0730 Protein containing GCN5-related N-acetyltransferase
8 PSE_0731  PSE_0731                 Ribosomal-protein-serine

> test2
       GID         GO EVIDENCE
1 PSE_0724 GO:0008080      IEA
2 PSE_0725 GO:0008080      IEA
(Continue reading)

do r | 14 Sep 16:42 2014

export function alters ranges in output BED file


I am attempting to use the export() function to generate a BED file from a
GRanges object.
However, the ranges in the output file are altered so that the start
coordinate is reduced by one,
for example:

  [987]        3 [37035154, 37035155]      +   |  Class 4     MLH1    c.116+1G>A
  [988]        3 [37067241, 37067242]      +   |  Class 4     MLH1     c.1153C>T
  [989]        3 [37067125, 37067126]      +   |  Class 4     MLH1   c.1039-2A>T
  [990]        3 [37067125, 37067126]      +   |  Class 4     MLH1   c.1039-2A>G
  [991]        3 [37061954, 37061955]      +   |  Class 4     MLH1   c.1038+1G>C

results in this output:
3	37067240	37067242	.	0	+
3	37067124	37067126	.	0	+
3	37067124	37067126	.	0	+
3	37061953	37061955	.	0	+

Since I intend to search later for intersections between the
ranges in the BED file and variants in a VCF file (using Tabix), I am
afraid that this subtraction may lead to false positives.

What is the reason for this subtraction from the start, and is there
any way to suppress it?

thanks in advance

Dolev Rahat
(Continue reading)
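
The shift is most likely a coordinate convention and not an error. The BED format stores the start as 0-based and the end as exclusive, whereas GRanges objects use 1-based, closed intervals, so a BED writer has to subtract one from the start to describe the same bases. The arithmetic can be checked in base R with two of the ranges shown above; that export() applies exactly this conversion is presumed from the output in the post.

```r
# 1-based, closed coordinates as printed for ranges [988] and [991] in the post
granges_start <- c(37067241, 37061954)
granges_end   <- c(37067242, 37061955)

# BED convention: start is 0-based, end is exclusive
bed_start <- granges_start - 1
bed_end   <- granges_end

# Both representations cover the same number of bases
granges_width <- granges_end - granges_start + 1
bed_width     <- bed_end - bed_start

print(data.frame(bed_start, bed_end, granges_width, bed_width))
```

Tools that are told the file is BED should interpret the coordinates accordingly, so the intersections refer to the original positions.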

Gordon K Smyth | 14 Sep 06:54 2014

Re: Packages for GO and KEGG analysis on RNAseq data

Dear Nicolas,

I think you are suffering from the "paradox of choice",
which says that too much choice leads to more unhappiness and less action 
even when all the choices are potentially good ones.

Pathway analysis is a very large and very active research area, and there 
are a large number of alternative methods.  This is because there is no 
unique definition of what a pathway is, and because there are a large number 
of different ways to rank pathways, and because there is no consensus as 
to which ranking will best match biological significance.  Perhaps no 
consensus is even possible, because different methods may give better 
results in different circumstances.

Some methods simply count the number of genes associated with each GO term 
in your DE list (overlap methods).  Other methods work with the test 
statistics or fold changes for each gene (gene set tests).  Some methods 
adjust for inter-correlations between genes and some don't.  Some adjust for 
RNA-seq specific biases like gene length.  Some methods test each GO term 
in isolation whereas others test them relative to each other.  It is not 
surprising that methods that use different information will potentially 
give different results.

You are asking which is the most validated method, but in truth there are 
many validated methods. I would however avoid methods which work purely on 
logFC, because low expression genes may have large fold changes from 
RNA-seq just by chance.  Steve suggested roast and camera, which are 
(Continue reading)
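
As an illustration of the simplest family mentioned above, the overlap methods, here is a minimal sketch of a one-sided hypergeometric test for a single GO term. All counts are invented, and this is the textbook form of the test, not the implementation of any particular package; it ignores gene length bias and inter-gene correlation.

```r
# Invented counts for one GO term
universe_size <- 1000  # genes tested
term_size     <- 50    # genes annotated to the GO term
de_size       <- 100   # genes called differentially expressed
overlap       <- 12    # DE genes annotated to the term

# P(overlap or more) under random sampling without replacement
p_overlap <- phyper(overlap - 1, term_size, universe_size - term_size,
                    de_size, lower.tail = FALSE)

# The same test written as a one-sided Fisher exact test on the 2x2 table
tab <- matrix(c(overlap, term_size - overlap,
                de_size - overlap,
                universe_size - term_size - de_size + overlap), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(phyper = p_overlap, fisher = p_fisher))
```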