Dan Tenenbaum | 30 Sep 21:51

bioc-sig-sequencing list will be removed

Hi all,

We will be closing the bioc-sig-sequencing email list. The reasons are
because sequencing is now a mainstream part of the Bioconductor
project, posts to one list are often relevant to both, new users are
confused by the appropriate list for posting, and a large fraction of
bioc-sig-sequencing subscribers also subscribe to the main
Bioconductor mailing list.

The list archive will continue to be available at

 https://stat.ethz.ch/pipermail/bioc-sig-sequencing/

For continued support, you may use the Bioconductor mailing list

 http://bioconductor.org/help/mailing-list/

including a new feature allowing questions posted as a guest

 http://bioconductor.org/help/mailing-list/mailform/

The bioc-sig-sequencing list will be removed on Tuesday, October 4th.
Dan Tenenbaum | 30 Sep 21:12

Posting to the Bioconductor list without subscribing

Hi all,

We've added a new feature to the Bioconductor web site: a form through which
you can post to the Bioconductor email list without first subscribing to it.
The form contains a captcha to protect against spammers.

The form is here:
http://bioconductor.org/help/mailing-list/mailform/

and it can also be reached through the Mailing Lists link on the main page
at http://bioconductor.org.

Enjoy,
Dan

	[[alternative HTML version deleted]]
Ivan Gregoretti | 30 Sep 16:48
Picon

Another ScanBamParam suggestion

Following Janet's example, I would also like to propose an upgrade to
ScanBamParam:

It would be great if we could tell ScanBamPram that we want to load
only the reads that passed the vendor's quality filter.

In other words, the functionality I am suggesting is analogous to the
filter in readAligned() from the ShortRead library.

With the new release of Illumina sequencing reagents (version 3) you
get 200 million reads per lane from the HiSeq 2000. In my view, with
samples that big becoming popular, any investment in "read in"
efficiency is a good investment. I would be happy to provide a sample
BAM for those interested in addressing this suggestion.

It is also my humble opinion that we should start considering
parallelisation for reading in. I hope that I am not just wishing too
much.

Thank you,

Ivan
Janet Young | 30 Sep 02:26

ScanBamParam suggestion

Hi,  

I have a suggestion to make ScanBamParam easier to use for coding amateurs like myself (I'm still sometimes
confused with the many ways to encode genomic regions):

Is it easy/possible to change bamWhich function to accept GRanges objects, rather than requiring
RangesList?  See below...

thanks, as usual,

Janet

############
library(Rsamtools)
myGR <- GRanges(seqnames="chr1",ranges=IRanges(start=1,end=100))

# I can use GRanges as the "which" argument if I do it when I create the ScanBamParam object
myparams1 <- ScanBamParam(which=myGR)

#but not if I try to set later
myparams2 <- ScanBamParam()
bamWhich(myparams2) <- myGR
### Error in checkSlotAssignment(object, name, value) : 
###   assignment of an object of class "GRanges" is not valid for slot "which" 
### in an object of class "ScanBamParam"; is(value, "RangesList") is not TRUE

## it's OK, though coercion does work.  
bamWhich(myparams2) <- as(myGR,"RangesList")

sessionInfo()
(Continue reading)

wang peter | 28 Sep 22:14
Picon

a wired problem

dear all:
     Using vcountPattern, i found some matched sequences.
but those are not similar to the pattern.
     see such coding

rm(list=ls())
reads <- readFastq(fastqfile);#downloaded from
http://biocluster.ucr.edu/~tbackman/query.fastq
seqs <- sread(reads);
PCR2rc<-DNAString("AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAACAAA")
result <- vcountPattern(PCR2rc, seqs, max.mismatch=1, min.mismatch=0,
with.indels=TRUE, algorithm="indels")
reads <- reads[result]
seqs <- sread(reads)
sum(result)
     then using countPattern, i found they are really not match

subject1 = "GTTGGTGCAAACATTAGTTCTTCTGTTGGTGCAACCTTTG"
result <- countPattern(PCR2rc, subject1, max.mismatch=1, min.mismatch=0,
with.indels=TRUE)
[1] 0

shan gao

	[[alternative HTML version deleted]]
Francesco Lescai | 27 Sep 21:53
Picon
Picon
Favicon

coverage vector longer than covered area (ScanBam and IRanges)?

Hi there,
may be I am missing something obvious in my code.
I extracted information from a bam file on 11 regions.
when I calculate the coverage on one of these regions, I get a vector which is much longer than the number of
bases of the region itself.

I tried several times to replicate this problem on a small IRanges object but - of course - everything was ok.
Anyone has an idea?

that's what I've done:

> regions
   chr     start       end
1    1 195915909 197867662
2    2   1199920   2234892 [...]

scanned my bam within those regions
which=GRanges(regions$chr,IRanges(regions$start,regions$end))
ggf<-scanBam("s_7_1_sequence.txt.novo.rmdup.bam_sorted.bam", 
             param=ScanBamParam(which=which,
                                what = c("pos", "qwidth"),
                                flag =scanBamFlag(isUnmappedQuery =FALSE)))

created a summary function, as suggested in the IRanges vignette
summaryFunction<-function(seqname,bamfile,...){
  x<-bamfile[[seqname]]
  coverage(IRanges(x[["pos"]],width=x[["qwidth"]]))
}

applied to get a list of coverage for each region
(Continue reading)

wang peter | 27 Sep 21:23
Picon

what is

hello every one,

i have such coding

> subject1 =
"ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT"

> pattern1 =
"AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"
> matchPattern(pattern1, subject1, max.mismatch=0.2, min.mismatch=0,
with.indels=TRUE)
  Views on a 81-letter BString subject
subject:
ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT
views: NONE
> trimLRPatterns(Lpattern = pattern1, subject = subject,
max.Lmismatch=0,with.Lindels=TRUE)
[1] "AAAAAAAAAAAATATT"
>

i already allow the indels,but why matchPattern cannot find the pattern in
subject
what does with.indels mean?
i am confused
thx
shan gao

	[[alternative HTML version deleted]]
Anita Lerch | 27 Sep 11:39
Picon
Favicon

Read big fastq files in chunks

Hi,
I should filter a big fastq file (HiSeq ca. >30'000'000, which does not
fit in memory) for low qualities e.g. to many N in the sequence.

I plan to read the file in chunks, filter the ShortReads and write them
into a new file.
This is no problem for fasta files, see the example:

filename <- "data/DmS2DRSC_RNA_seqs_subsample.fa"
eof <- FALSE
append <- FALSE
cycle <- 1L
while(!eof){
  chunk <- readFasta(filename, nrec=nrec, skip=(cycle-1)*nrec)
  nFilt <- nFilter(5)
  writeFasta(chunk[nFilt(chunk)],
             sprintf("%s/filtered_%s", 
                     dirname(filename), basename(filename)),
             append=append)
  append <- TRUE
  cycle <- cycle + 1
  if(length(chunk) == 0L)
      eof <- TRUE
}

If I try to do the same for fastq file e.g.

filename <- "data/test_data.fq"
eof <- FALSE
mode <- "w"
(Continue reading)

wang peter | 26 Sep 22:51
Picon

question about trimLRPatterns

dear harris:
thank you very much for your previous help, but i am still confused by such
problems:
[1] why does the second section of coding can not work,but the first can
subject = "TTTACGT"
Lpattern = "TTTAACGT"
trimLRPatterns(Lpattern = Lpattern, subject = subject,
max.Lmismatch=1,with.Lindels=TRUE)
subject = "TGCATTT"
Rpattern = "TGCAATTT"
trimLRPatterns(Rpattern = Rpattern, subject = subject,
max.Rmismatch=1,with.Rindels=TRUE)
Error in solveUserSEW(width(x), start = start, end = end, width = width) :
  solving row 1: 'allow.nonnarrowing' is FALSE and the supplied start (0) is
< 1

[2] how can i see the code of such functions: which.isMatchingStartingAt,
rev, normargPattern
which are called by Biostrings:::.computeTrimEnd
showMethods("which.isMatchingStartingAt") can not work
[3] max.Rmismatch=0.1 will be replaced by 0.1*nchar(Rpattern) and then by
(-1,-1,...as.integer(0.1*nchar(Rpattern)))
to better control each try, can i use max.Rmismatch=0.1*(1:nchar(Rpattern))

good luck
shan gao

	[[alternative HTML version deleted]]
wang peter | 26 Sep 15:37
Picon

remove adaptor before remove barcode

dear all:
 i found a problem trimLRPatterns cannot allow this situation
that is the Rpattern has a left shift from subject sequence, see below

subject
 ATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT

Rpattern
AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

so i change the code to keep the 5' barcode GCATT, so the pattern can be
recognized

subject
 GCATTATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAATATT

Rpattern
           AATCGAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG

after remove 3' adaptor, i can remove the 5' barcode

any ideas?

shan gao

	[[alternative HTML version deleted]]
wang peter | 26 Sep 01:25
Picon

how to write a fasta file

hi every one:

i read fastq file and save them as fasta
but falied

fastqfile="unmapped"
library(ShortRead)
reads <- readFastq(fastqfile)
seqs <- sread(reads)
ids <- id(reads)

writeFasta(seqs, file="unmapped.fa")
Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function "writeFasta", for
signature "DNAStringSet"

writeFASTA(seqs, file="unmapped.fa")
Error in for (rec in x) { : invalid for() loop sequence

what is the problem
thank you

shan gao

	[[alternative HTML version deleted]]

Gmane