Gordon K Smyth | 24 May 2013 04:23

RNAseq less sensitive than microarrays? Is it a statistical issue?

Dear Lucia,

We have compared RNA-seq to microarrays on exactly the same RNA samples 
for several studies.  We consistently find that sequence depths of 10 
million reads or so are sufficient to get substantially more sensitivity 
from RNA-seq than from microarrays, but only when using some RNA-seq 
analysis pipelines.  Our RNA-seq pipeline uses Rsubread and featureCounts 
to get genewise counts, then either voom or edgeR to do the differential 
expression analysis.  Voom is attractive for this comparison because it 
ensures closely comparable analyses for RNA-seq and for the microarrays.
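
As a rough illustration of the counts-to-voom part of the pipeline described above, here is a minimal sketch. It assumes the limma and edgeR packages are installed. The simulated count matrix, the two-group design and all object names are made up for illustration and simply stand in for the genewise counts that featureCounts would return.

```r
library(limma)
library(edgeR)

set.seed(1)
ngenes <- 1000
group <- factor(rep(c("A", "B"), each = 3))

# Simulated genewise counts (hypothetical stand-in for featureCounts output)
counts <- matrix(rnbinom(ngenes * 6, mu = 50, size = 10), nrow = ngenes,
                 dimnames = list(paste0("gene", seq_len(ngenes)),
                                 paste0("sample", 1:6)))
# Make the first 50 genes roughly 4-fold higher in group B
counts[1:50, 4:6] <- counts[1:50, 4:6] * 4L

dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge)          # TMM scale normalization

design <- model.matrix(~ group)
v <- voom(dge, design)               # log2-cpm values with precision weights
fit <- eBayes(lmFit(v, design))      # same linear-model steps as for microarrays
tt <- topTable(fit, coef = "groupB", number = 10)
```

After voom() the data are analysed with lmFit() and eBayes(), which is what makes the RNA-seq and microarray analyses closely comparable.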

See slides 40-43 of a talk I gave last year:

   http://bioinformatics.org.au/ws12/program

for a comparison of Illumina microarrays vs Illumina sequencing for a 
particular set of RNA samples.  This shows that both microarrays and 
sequencing perform well, but RNA-seq gives a greater dynamic range and 
finds more genes.  In particular it finds lots of genes that were not even 
represented on the microarrays.  This is our typical experience.

Best wishes
Gordon

> Date: Wed, 22 May 2013 15:57:52 -0400
> From: Lucia Peixoto <luciap@...>
> To: Wolfgang Huber <whuber@...>
> Cc: bioconductor@... list [bioconductor@...]
> 	<bioconductor@...>
> Subject: Re: [BioC] [Bioc] RNAseq less sensitive than microarrays? Is

Gordon K Smyth | 24 May 2013 04:02

How do I background correct an Illumina eset without using lumiB

Dear Emma,

Do you have detection p-values for each sample?  If you do, then the 
neqc() function in the limma package can background correct the 
Illumina intensities by automatically reconstructing what the negative 
control values must have been for each array.

See

  http://nar.oxfordjournals.org/content/38/22/e204

for a description of the neqc() method.  Also see the Mammary Stem Cell 
case study in the limma User's Guide for an example of its use.  (The 
published article and the case study assume that control probes are 
available, but the usage with detection p-values is similar.)

Best wishes
Gordon

> Date: Wed, 22 May 2013 03:08:56 -0700 (PDT)
> From: "Emma Bell [guest]" <guest@...>
> To: bioconductor@..., e.bell12@...
> Cc: lumi Maintainer <dupan.mail@...>
> Subject: [BioC] How do I background correct an Illumina eset without
> 	using	lumiB?
>
> Hello,
>
> I'm doing some work with publicly available microarray data sets that 
> I've downloaded from GEO. I'm having some trouble using the lumi package 

Thomas Girke | 24 May 2013 00:39

Re: multiple feature/mode counting with summarizeOverlaps

Thanks Valerie for looking into these suggestions and improvements. I
really appreciate the time and effort you keep investing in my
unsolicited questions. Obviously, read counting has become a major
activity for many of us and your software will definitely be put to great
use. In our projects, flexibility and accuracy of the read counter are
most important to us, which summarizeOverlaps and associates address
very well. Since research facilities like ours have to perform those
analyses on very large disk storage devices with several hundred TBs of
space that are shared among many NGS users, disk I/O and network speeds
on compute clusters have become a major bottleneck. Thus, we are also
very interested in solutions that collect counts for many feature types
and counting modes in a single pass through these big files, as
opposed to accessing them over and over again when it is not really
necessary. Usually, I also like to delete the bam files early on in most
of our RNA-Seq workflows to save storage space. However, our
collaborators often keep coming up with all kinds of creative and
exciting new ideas about which features/modes to count. The best
analysis strategy is therefore often to generate counts for many
feature types by default right away (even if I don't intend or recommend
to use them) rather than only one prescribed solution that usually ends
up not being enough. I guess RNA-Seq is still very much in an engineering
stage where most of us depend on software design that provides a very
high level of flexibility.

Again, thanks a lot.

Thomas

On Thu, May 23, 2013 at 09:06:10PM +0000, Valerie Obenchain wrote:
> Hi Thomas,

Steve Lianoglou | 23 May 2013 23:09

Re: Installation Problems

Hi Rishi,

Please use "reply all" when replying to emails on the list so that discussion stays on the list. This way you
will likely get better help since more eyeballs are looking at your problems, and the list can be used as a
resource for people in the future.

On May 23, 2013, at 1:37 PM, RISHI SINHA <rishi.z.sinha@...> wrote:

> Hi Steve,
> 
> Sorry, I guess I really don't know exactly what I'm downloading from Bioconductor. I'm using a book,
Applied Statistics for Bioinformatics Using R, and it says:
> "Bioconductor is primarily based on R and can be installed, as
> follows.
> > source("http://www.bioconductor.org/biocLite.R")
> > biocLite()"

Ahh, I see. The packages that get installed by default have likely changed since that book was published.
Still, you can likely broadly apply much of what's there, although you may run into bumps.

The `biocLite()` function installs things from Bioconductor's repositories. You can use it to specify
packages you'd like, so you could do:

R> source("http://bioconductor.org/biocLite.R")
R> biocLite("multtest")
R> library(multtest)
R> data(golub)

and you can continue on your way, although you will likely need to install some other things. I suspect
"limma" depends on enough other packages that installing it should get you on your way:

Rishi Sinha [guest] | 23 May 2013 22:04

Installation Problems


Hello!

I was recently trying to install Bioconductor using the source() and then biocLite() commands, but
it wasn't able to access:

"http://brainarray.mbni.med.umich.edu/bioc/bin/windows/contrib/2.15"

Upon going to "http://brainarray.mbni.med.umich.edu/bioc/bin/windows/contrib/", I found that the
v2.15 has been removed and only v3.0 is there now.

To download that, how should I change the commands? Or will you guys update it soon?

Also, after using biocLite(), it prompts me about whether I would like to update all/some/none packages
(a/s/n), but it doesn't respond to any inputs ('a', 'all', etc.) and I'm forced to quit it using 'esc'. Any
idea what the problem might be??

Thanks!

 -- output of sessionInfo(): 

> biocLite()
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'Biobase' 'IRanges' 'AnnotationDbi'
Warning: unable to access index for repository http://brainarray.mbni.med.umich.edu/bioc/bin/windows/contrib/2.15
Warning: package ‘Biobase’ is in use and will not be installed
trying URL 'http://bioconductor.org/packages/2.11/bioc/bin/windows/contrib/2.15/IRanges_1.16.6.zip'
Content type 'application/zip' length 2048190 bytes (2.0 Mb)
opened URL

Ina Hoeschele | 23 May 2013 19:49

problem installing limma

Hi,
when I try to install limma, I get the error message below. Can someone please give me a hint - thanks, Ina

 installing *source* package ‘limma’ ...
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c normexp.c -o normexp.o
normexp.c: In function ‘fit_saddle_nelder_mead’:
normexp.c:153: warning: floating constant exceeds range of ‘double’
gcc -std=gnu99 -shared -o limma.so normexp.o -L/usr/lib64/R/lib -lR
installing to /usr/local/lib/R/site-library/limma/libs
** R
** inst
** preparing package for lazy loading
Error : unknown namespace directive: function(lib, pkg) require(methods)
ERROR: lazy loading failed for package ‘limma’
* removing ‘/usr/local/lib/R/site-library/limma’

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:

Natasha Sahgal | 23 May 2013 17:22

Microarray time series experiment

Dear List,

I have an experiment with 3 WT (GBT) and 3 KO (GAT) mice, each with 4 time points: 0h,2h,8h,24h.
The PI is interested in the effect of KO vs WT, both at each time point and overall. From the PCA
plot I could see that time has a large effect compared to the groups.
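
One common way to set up such comparisons is a group-means design with one coefficient per genotype-by-time combination, followed by contrasts for KO vs WT at each time point and averaged over time. The sketch below uses base R only. The sample sheet is a simplified, hypothetical version of the one in this post, it ignores the mouse pairing and chip columns, and the names cont.matrix and gt_levels are made up for illustration.

```r
# Hypothetical sample sheet: 3 KO and 3 WT mice at 0h, 2h, 8h and 24h
times <- c("0h", "2h", "8h", "24h")
s.info <- expand.grid(Rep = 1:3, Time = times, Group = c("KO", "WT"),
                      stringsAsFactors = FALSE)
gt_levels <- paste(rep(c("KO", "WT"), each = 4), times, sep = "_")
s.info$GT <- factor(paste(s.info$Group, s.info$Time, sep = "_"),
                    levels = gt_levels)

# Group-means design: one coefficient per genotype-by-time combination
design <- model.matrix(~ 0 + GT, data = s.info)
colnames(design) <- levels(s.info$GT)

# Contrasts: KO vs WT at each time point, plus the average over time
cont.matrix <- matrix(0, nrow = ncol(design), ncol = length(times) + 1,
                      dimnames = list(colnames(design),
                                      c(paste0("KOvsWT_", times),
                                        "KOvsWT_avg")))
for (tp in times) {
  cont.matrix[paste0("KO_", tp), paste0("KOvsWT_", tp)] <- 1
  cont.matrix[paste0("WT_", tp), paste0("KOvsWT_", tp)] <- -1
}
cont.matrix[, "KOvsWT_avg"] <-
  rowMeans(cont.matrix[, paste0("KOvsWT_", times)])
```

In limma, matrices like these would typically be passed to lmFit() and contrasts.fit(). Repeated measurements on the same mouse would need extra handling that this sketch leaves out.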

I thus generated differential expression analysis in the following manner:
------
>s.info
   Sample_Group Group Time     GT MousePair Chip Rep
1         GAT0h    KO    0  KO_0h         1    1   1
8         GAT0h    KO    0  KO_0h         4    2   2
13        GAT0h    KO    0  KO_0h         5    3   3
3         GAT2h    KO    2  KO_2h         1    1   1
10        GAT2h    KO    2  KO_2h         4    2   2
19        GAT2h    KO    2  KO_2h         5    4   3
5         GAT8h    KO    8  KO_8h         1    1   1
16        GAT8h    KO    8  KO_8h         4    3   2
21        GAT8h    KO    8  KO_8h         5    4   3
11       GAT24h    KO   24 KO_24h         1    2   1
18       GAT24h    KO   24 KO_24h         4    3   2
23       GAT24h    KO   24 KO_24h         5    4   3
2         GBT0h    WT    0  WT_0h         2    1   1
7         GBT0h    WT    0  WT_0h         3    2   2
14        GBT0h    WT    0  WT_0h         6    3   3
4         GBT2h    WT    2  WT_2h         2    1   1
9         GBT2h    WT    2  WT_2h         3    2   2
20        GBT2h    WT    2  WT_2h         6    4   3
6         GBT8h    WT    8  WT_8h         2    1   1
15        GBT8h    WT    8  WT_8h         3    3   2

Gustavo Fernández Bayón | 23 May 2013 17:06

minfi: Problem with GenomicMethylSet

Hi everybody.

After upgrading to R3.0 and Bioc 2.13, some of my scripts broke. 
Currently I have a problem that I'll try to reproduce here with a 
minimal scenario:

I have a MethylSet produced by preprocessSWAN.

 > mset
MethylSet (storageMode: lockedEnvironment)
assayData: 485512 features, 20 samples
   element names: Meth, Unmeth
phenoData
   sampleNames: 8691803020_R03C02 8691803043_R02C01 ...
     8691803052_R05C02 (20 total)
   varLabels: Sample_Name Sample_Well ... filenames (22 total)
   varMetadata: labelDescription
Annotation
   array: IlluminaHumanMethylation450k
   annotation: ilmn.v1.2
Preprocessing
   Method: SWAN (based on a MethylSet preprocesses as 'Raw (no 
normalization or bg correction)'
   minfi version: 1.7.3
   Manifest version: 0.4.0

If I try to extract Meth and Unmeth matrices, and create a 
GenomicMethylSet from them, I get the following error:

 > gmset <- GenomicMethylSet(hm450[featureNames(mset)], getMeth(mset),

Arpit Jain [guest] | 23 May 2013 08:52

How to analyze Agilent DNA Methylation Microarray Analysis


Hi,

I have 18 samples (2 Normal, 16 Cancer data sets) obtained using Agilent DNA Methylation Microarrays. The
data sets were exported in .txt format from Agilent Feature Extraction software.
I want to find the differentially methylated regions/genes in the sample data. 
The Bioconductor packages I have found support data from Illumina methylation arrays.

How should I proceed with the work? I also need to perform pathway analysis and clustering of differentially
methylated miRNAs.

Hope for a positive reply.

 -- output of sessionInfo(): 

None

--
Sent via the guest posting facility at bioconductor.org.

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Mark Dane | 23 May 2013 07:14

cellHTS Repeat Values in Text Output

Hi,

I am seeing a problem with the text output in my multi-channel experiment. I have different types of data so I
want to normalize and score the channels separately. Therefore, I do not want to run
summarizeReplicates. I think the following will show what I am doing:

x <- readPlateList("Platelist.txt", name=experimentName, path=dataPath)
x <- configure(x, "Description.txt", "Plateconf.txt", "Screenlog.txt",
               path=dataPath) 
xn <- normalizePlates(x, scale="multiplicative", method="median", 
                      varianceAdjust="none")
xsc <- scoreReplicates(xn, sign="-", method="zscore") 
xsc@state[3]=TRUE
getTopTable(cellHTSlist=list("raw"=x,"normalized"=xn, "scored"=xsc),
            file="testtable.txt")

The output in testtable.txt (and similarly in writeReport's text output) has repeated values that are not
what is actually in the cellHTS objects.

raw'G01plate	position	well	score	wellAnno	finalWellAnno	raw_r1_ch1	raw_r2_ch1	raw_r1_ch2	raw_r2_ch2	median_ch1	diff_ch1	median_ch2	diff_ch2	raw/PlateMedian_r1_ch1	raw/PlateMedian_r2_ch1	raw/PlateMedian_r1_ch2	raw/PlateMedian_r2_ch2	normalized_r1_ch1	normalized_r2_ch1	normalized_r1_ch2	normalized_r2_ch2
3	145	G01	3.33	sample	sample	80	80	80	80	80	0	80	0	0.0502	0.0502	0.0502	0.0502	0.05	0.05	0.05	0.05
1	7	A07	3.33	sample	sample	80	80	80	80	80	0	80	0	0.051	0.051	0.051	0.051	0.051	0.051	0.051	0.051
3	202	I10	3.27	sample	sample	110	110	110	110	110	0	110	0	0.069	0.069	0.069	0.069	0.069	0.069	0.069	0.069

Is it ok to force the scored state of xsc to TRUE? Please let me know if I'm using this correctly. I really
appreciate your prior quick and helpful responses.

thank you,

Mark Dane

Gordon K Smyth | 23 May 2013 01:53

edgeR MDS

Dear Manoj,

plotMDS does not do PCA.  As the documentation says

"This function is a variation on the usual multidimensional scaling (or 
principal coordinate) plot".

Statisticians are sometimes not very imaginative when choosing names for 
things, but PCA is an abbreviation for "principal component analysis", 
which is not the same as "principal coordinate analysis".
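
To make the distinction concrete, here is a small base R sketch on made-up data. It runs principal component analysis with prcomp() and principal coordinate analysis (classical multidimensional scaling) with cmdscale(), and shows how the eigenvalues give a proportion of variance for each dimension. Note the assumption: the sketch uses plain Euclidean distances, for which the two methods happen to coincide. plotMDS computes its own distances between samples, so its output will not in general match a PCA.

```r
set.seed(1)
# Hypothetical log-expression matrix: 200 genes (rows) by 6 samples (columns)
x <- matrix(rnorm(200 * 6), nrow = 200,
            dimnames = list(NULL, paste0("sample", 1:6)))

# Principal component analysis of the samples
pca <- prcomp(t(x))
pca_var <- pca$sdev^2 / sum(pca$sdev^2)

# Principal coordinate analysis (classical MDS) on Euclidean distances
d <- dist(t(x))
mds <- cmdscale(d, k = 2, eig = TRUE)
pos_eig <- pmax(mds$eig, 0)          # guard against tiny negative eigenvalues
mds_var <- pos_eig / sum(pos_eig)    # proportion explained by each dimension

# With Euclidean distances the coordinates agree up to the sign of each axis
max_diff <- max(abs(abs(mds$points) - abs(pca$x[, 1:2])))
```

With any other distance measure the coordinates and the eigenvalue proportions from cmdscale() would differ from those of prcomp().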

Best wishes
Gordon

> Date: Tue, 21 May 2013 11:55:36 -0700 (PDT)
> From: Manoj Hariharan <h_manoj@...>
> To: "Bioconductor@..." <Bioconductor@...>
> Subject: [BioC] edgeR MDS
>
> Hello,
>
> I am working on edgeR version 3.2.3.
>
> From the documentation, I guess the "plotMDS.DGEList" is similar to 
> PCA. The manual mentions that "Distances on the plot represent 
> coefficient of variation of expression between samples".
>
> Is it possible to get a value of proportion of variance explained from 
> each dimension (component)? Also, is it possible to use the DGEList to 
> make a 3D PCA plot?

