16 May 20:05
Re: fastq splitter - working but not before xmas!!
Sean O'Keeffe <limericksean <at> gmail.com>
2012-05-16 18:05:28 GMT
So now I've got a bunch of fastqs, all about 17GB in size. The script is puttering away, but it is tediously slow. I tried the fastq-dump tool from the SRA Toolkit, but it didn't like my command (fastq-dump --split-files <input_fastq_file>) - my ignorance, no doubt. Any ideas out there on speeding up Bio::SeqIO::fastq output?

Thanks.

On 1 March 2012 03:16, Joel Martin <j_martin <at> lbl.gov> wrote:
> Just a caution to double check that the read1 and read2 names match after
> splitting. I don't know if this thread jinxed me or what, but I just
> received, for the first time, a concatenated fastq file formatted as you
> describe, except that the first read1 doesn't match the first read2.
> Zut alors!
>
> I came up with converting to scarf, running /usr/bin/sort on the scarf,
> then reading that back, tossing reads into single or paired files and
> reconverting to fastq in the process. It wasn't too bad, but I don't think
> bioperl has a scarf conversion; it's basically fastq with : substituted
> for \n. Most delimiters that aren't : would work better, but I already had
> a fastq2scarf from early solexa days (I think).
>
> # this was the last step, if it's handy for this plague of hideous files;
> # the fixed fields for : would need adjusting
> use strict;
>
> open( my $oph, '>', 'paired.fq' ) or die $!;
> open( my $osh, '>', 'single.fq' ) or die $!;
> [...]
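
On the speed question, one option that may help is to skip Bio::SeqIO for the bulk pass and stream the file four lines at a time in plain Perl, which avoids building a sequence object per read. The following is only a minimal sketch, not the script from this thread, and it has not been benchmarked. It assumes unwrapped records (exactly four lines each) and read names ending in /1 or /2; if your headers use a different convention, the regexes need adjusting. The sub names (next_record, pair_key, split_fastq, count_mismatched_pairs) are made up for illustration. The last sub follows Joel's caution and counts positions where the read1 and read2 names disagree.

```perl
use strict;
use warnings;

# Read one 4-line FASTQ record from a filehandle; returns undef at EOF.
# Assumption: records are unwrapped (exactly four lines each).
sub next_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return undef unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "truncated FASTQ record near: $header" unless defined $qual;
    die "bad FASTQ header: $header" unless $header =~ /^\@/;
    return [ $header, $seq, $plus, $qual ];
}

# Read name without the leading '@' and without a trailing /1 or /2.
sub pair_key {
    my ($header) = @_;
    my ($name) = $header =~ /^\@(\S+)/;
    $name =~ s{/[12]\z}{};
    return $name;
}

# Route each record to the read1 or read2 handle by its /1 or /2 suffix.
# Returns the number of records written to each handle.
sub split_fastq {
    my ($in, $out1, $out2) = @_;
    my %count = (1 => 0, 2 => 0);
    while (my $rec = next_record($in)) {
        my ($mate) = $rec->[0] =~ m{^\@\S+/([12])(?:\s|\z)}
            or die "no /1 or /2 suffix in: $rec->[0]";
        print { $mate == 1 ? $out1 : $out2 } @$rec;
        $count{$mate}++;
    }
    return ($count{1}, $count{2});
}

# Walk two split files in step and count positions where names disagree.
sub count_mismatched_pairs {
    my ($fh1, $fh2) = @_;
    my $bad = 0;
    while (1) {
        my $r1 = next_record($fh1);
        my $r2 = next_record($fh2);
        last unless $r1 && $r2;
        $bad++ if pair_key($r1->[0]) ne pair_key($r2->[0]);
    }
    return $bad;
}

# Usage: perl split.pl in.fq read1.fq read2.fq   (file names are examples)
if (@ARGV == 3) {
    my ($in_path, $out1_path, $out2_path) = @ARGV;
    open(my $in,   '<', $in_path)   or die "$in_path: $!";
    open(my $out1, '>', $out1_path) or die "$out1_path: $!";
    open(my $out2, '>', $out2_path) or die "$out2_path: $!";
    my ($n1, $n2) = split_fastq($in, $out1, $out2);
    close($out1) or die $!;
    close($out2) or die $!;
    print STDERR "read1: $n1 records, read2: $n2 records\n";
}
```

If the two counts differ, or count_mismatched_pairs returns a non-zero value when run over the two output files, the mates are not in matching order and something like the sort-based approach quoted above would still be needed.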