Tina Asante Boahene | 11 Feb 12:57
Picon
Picon
Gravatar

Help baySeq

Hi all,

This is just a quick question regarding the comparison between the classical and your Bayesian approach.
In bayeseq, what FDR cutoff should I choose if I want to make a comparison with edgeR, where I for example take
a 5% cutoff on the adjusted p-values? (in fact, which one did you choose on your bayeseq paper for the real
data analysis?)

Thank you.

Kind Regards

Tina 
_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Alicia Oshlack | 11 Feb 12:56
Picon
Favicon
Gravatar

Re: GOseq error

Hi Iain,

I'm not sure what this error is but how does the pwf look when you run
nullp()? Does it look like a reasonable distribution? Are you using the
Wallenius approximation? Might be worth trying with method="Sampling" and
see if you get the same error.

Cheers,
Alicia

On 11/02/12 10:00 PM, "bioconductor-request@..."
<bioconductor-request@...> wrote:

> Date: Fri, 10 Feb 2012 13:22:17 +0000
> From: Iain Gallagher <iaingallagher@...>
> To: bioconductor <bioconductor@...>
> Subject: [BioC] GOSeq error
> Message-ID:
> <1328880137.28802.YahooMailNeo@...>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Hello List
> 
> I'm using GOSeq to examine the biology behind genes found to be regulated in a
> RNA-Seq study. One of my comparisons yields ~450 regulated genes. Attempting
> to run GOSeq with this list results in the following error:
> 
> Error in if (min(fv) < lower_bound) fv = fv - min(fv) + lower_bound :
> ? missing value where TRUE/FALSE needed
> 
(Continue reading)

Thomas Girke | 11 Feb 04:44
Favicon
Gravatar

Rsubread/Rsamtools mappings outside of chromosome ranges

With some Illumina libraries I have been running into problems importing
the read mappings from Rsubread into Rsamtools. After some testing I
found out that some reads reported in Rsubread's SAM output have their
end positions outside of the chromosome ranges. If those reads are
removed from the SAM file then the import into Rsamtools works just
fine. Below is a reproducible example of this problem.

I could think of several solutions to fix this, e.g. not reporting reads
outside of chromosome ranges or updating their length in the CIAGR string.
An additional option could be to remove invalid mappings during the import
into Rsamtools. 

Thanks,

Thomas

################################
## Read mapping with Rsubread ##
################################
library(Rsubread)
## Reference source: ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/
## Fastq source: ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP%2FSRP009%2FSRP009850/SRR390302/
buildindex(basename="tair10chr.fasta", reference="tair10chr.fasta")
align(index="tair10chr.fasta", readfile1="SRR390302.fastq",
output_file="SRR390302_subread.sam", nthreads=8, indels=1, TH1=2)

#####################################
## Process SAM file with Rsamtools ##
#####################################
library(Rsamtools)
(Continue reading)

Elena Sorokin | 11 Feb 02:06
Picon
Favicon

DEXseq: Making exonCountSet object

Hello again,

As a result of my recent post, I'm now delving into the DEXseq package. 
But I am having struggling with building my exonCountSet object, as 
described in the vignette/ R documentation. First I went through the 
example in the vignette, and was able to complete the pasilla analysis.

With my own data, though, the differential expression testing doesn't 
work, and additionally, the annotation GFF file doesn't seem to have 
loaded. Could somebody kindly write out the exact code for reading in 
the HTSeq counts files using the pasilla input files (p.11 of the 
vignette), or better yet, point out where my code below is wrong?

My goal is to be able to use the neat graphics as well as DE testing on 
my dataset, so I don't want to do the bare-bones loading. My code is 
posted below... I apologize that it's lengthy.

Many thanks in advance to anybody out there who can help me,

Elena
_____________________

# For reference, I have four counts files, called veh1_counts.txt, 
veh2_counts.txt, drug1_counts.txt, drug2_counts.txt that I made using, 
for example:

*python dexseq_count.py DEXSeq_annotations.gff <samfile> veh1_counts.txt *

and one GFF file, called DEXseq_annotations.gff, that I made using:
*
(Continue reading)

Gordon K Smyth | 10 Feb 22:34
Picon
Picon
Favicon

limma barcodeplot with 2 groups

Dear Dario,

The "labels" of the barcodeplot give the names of the two groups you are 
comparing (cancer vs normal for example).  If you are comparing cancer vs 
normal so such positive t-statistics indicate up in cancer, then you would 
put the cancer label on the left and the normal label on the right.  The 
labels are the same for both gene sets, because the same groups are being 
compared in each case.  Hence the labels are at the ends of the plot.

I think that you have assumed that the labels are intended to be for the 
gene sets, which they are not.

Best wishes
Gordon

> Date: Fri, 10 Feb 2012 12:00:12 +1100 (EST)
> From: Dario Strbenac <D.Strbenac@...>
> To: bioconductor@...
> Subject: [BioC] limma barcodeplot with 2 groups
> Message-ID: <20120210120012.BQA38016@...>
> Content-Type: text/plain; charset="us-ascii"
>
> Hello,
>
> The labels that I get are still on the side, as for a 1-sample plot. 
> Would it make more sense graphically to have them at the top and botton 
> of the graphic for 2-sample plots ? In the current format, I cannot tell 
> which label relates to which colour.
>
> I have limma_3.10.2
(Continue reading)

Abhishek Pratap | 10 Feb 20:48
Favicon
Gravatar

gene function assessment

We have identified a set of genes to be differentially expressed between
treatment and control.  The current gene annotation doesnt have functional
information about the gene.

What we would like to do is asses the biological functions of these genes(
may be based on GO and then cluster them). I was wondering if there is
something already out there to do so in semi automated way.

I am thinking of blasting them against pfam  or be myhits dbase but wanted
to take your opinion.

PS  : cross posted on biostar mailing list :
http://biostar.stackexchange.com/questions/17332/functional-assessment-of-genes

Thanks!
-Abhi

	[[alternative HTML version deleted]]

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Robert Scharpf | 10 Feb 18:15
Gravatar

Empty RangedData with value columns?

Hi, 

Would it be possible to have a column (value) in a RangedData object persist even if it has zero length? 

For example,

RangedData(IRanges(), id=character())

has 0 value columns, though I think it would be nice if the 'id' column could persist.  Afterall, it is
permissable to have a DataFrame with values that are of zero length

values <- DataFrame(id=character())
colnames(values)

But, these values disappear in the RangedData constructor:

space <- Rle(factor())
values <- split(values, space)
colnames(values)

where the last line above returns NULL.

Thanks-

Rob

> sessionInfo()
R Under development (unstable) (2012-02-10 r58315)
Platform: x86_64-unknown-linux-gnu (64-bit)

(Continue reading)

Florian Markowetz | 10 Feb 18:14
Picon

Open Position :: Postdoc in systems genetics in Cambridge

Open Position :: Postdoc in systems genetics in Cambridge

The genetics lab of Prof. Sir Bruce Ponder and the computational biology lab of Dr. Florian Markowetz at the
Cancer Research UK Cambridge Research Institute offer a joint position for a postdoctoral researcher
interested in statistical and computational approaches to systems genetics in cancer.
The Ponder laboratory focuses on genetic susceptibility to breast and other common cancers [1] and the
Markowetz laboratory on integrative systemic analyses [2].
Project: The function of the two transcription factors FoxA1 and estrogen receptor (ER) are key drivers of
breast cancer and standard breast cancer therapy is based on inhibition of ER function. We now wish
toexplore to which extent genetic variation in the population affects ER/FoxA1 binding and function in
order to better understand its role in susceptibility to disease as well as response to therapy. In
collaboration with the laboratory of Jason Carroll at CRI existing ChIP-Seq data sets [3] will be mined to
explore genome-wide allele-specific binding and we hope to integrate diverse genomic data sources
(epigenetic marks such chromatin modification and DNA methylation, expression microarrays etc) using
statistical network methods to model a cell’s response to ER. The methods developed in breast cancer
will also be applied to other cancer types, e.g. lung cancer.

The position bridges an experimental and a computational lab and is ideal if you are interested in data
analysis and method development motivated by close collaborations with experimentalists.

The ideal applicant has a strong background in genomic data analysis (e.g. ChIP seq analysis and motif
finding) and statistical modelling (using R or Matlab) as well as data visualization and exploration
(e.g. with Affy's Integrated Genome Browser). Experience in medical research is desirable.

If you are highly motivated to work in an interdisciplinary and very collaborative environment at an
internationally recognized research institute, apply by sending your CV to Florian Markowetz
<http://markowetzlab.org/florian/> .

References

(Continue reading)

Nicolas Delhomme | 10 Feb 17:04
Picon
Gravatar

Integer overflow when summing an 'integer' Rle

Hi all,

While calculating some statistics of an RNA-seq experiment I tumbled onto the following problem.
Applying the IRanges coverage function to my IRanges, I get back an integer Rle object. However trying to
get the mean or sum of that Rle object results in an integer overflow. The following example just exemplify
that overflow.

library(IRanges)
rC <- Rle(values=as.integer(c(1,(2^31)-1,1)))
sum(rC)
mean(rC)

Both result in an integer overflow. 

[1] NA
Warning message:
In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
  Integer overflow - use sum(as.numeric(.))

The solution to  that is to do the following:

sum(as.numeric(runLength(rC) * runValue(rC)))

but IMO it should be handled at the Rle level code; i.e. an integer Rle can clearly have a sum, a mean, etc...
result that involve calculating values outside the integer range. Is there anything that speaks again
having these functions internally converting the integer values to numeric before calculating the sum
or mean?

Looking forward to hearing your thoughts on this,

(Continue reading)

wang peter | 10 Feb 17:05
Picon

about library size and length of gene information in DEseq

i am suing DEseq, but i donot know
how to set length of gene information,
does it use length information in finding differential expressed genes?

and either i donot know how to set library size of each sample
but in edgeR, I can do it by such function
d <- DGEList(counts = d, lib.size =
c(9893630,11055814,11207084,9663487,11455088,8140053), group = group)

--

-- 
shan gao
Room 231(Dr.Fei lab)
Boyce Thompson Institute
Cornell University
Tower Road, Ithaca, NY 14853-1801
Office phone: 1-607-254-1267(day)
Official email:sg839@...
Facebook:http://www.facebook.com/profile.php?id=100001986532253

_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Favicon
Gravatar

Finding the similarity of two microarray experiments

Hello Bioconductor list server,

I am working on a microarray experiment comparing a control to a treatment and the effects on cells. I have
the results from my current experiment and from a past experiment using the same microarrays but a
different treatment. I have already set up the workflow for both experiments. I find the significant
genes using topTable.

> results = topTable(fit2, coef="control-treatment", adjust.method="fdr", lfc=2)

> trimmed_results = results_v4c4[results_v4c4[,5]<0.05,]

I would like to find if the there are similar results from the past experiment to the new experiment. I would
like to say in the paper: treatment A had similar transcripts in treatment B indicating similar mechanism
between the two treatments or (if there are no similar transcripts) the two treatments do not use the the
same mechanism. I would then like to find the transcripts similar in both results and pull out the
transcripts with the log fold change and p-values for both results. The log fold changes and p-values can
be different between the to experiments. What would be the best method for accomplishing this task?

I have read the answers similar to my question on the list server. The main conclusion is to do the contrasts
during the matrix design. I ran the contrasts and compared the results to the different experiments' results.

Result from the contrasts:

            ID                    logFC       AveExpr    t             P.Value          adj.P.Val

49667 240417_at         -2.392707 3.715039 -5.967782 0.0002286033   0.9319275

24271 214975_s_at       2.720147 4.163144  5.574570 0.0003717616   0.9319275

The first ID is only found to be significant in one of the experiments. The second ID is found in both
(Continue reading)


Gmane