Guy Coates | 1 Jun 2010 14:03

Re: cluster scheduler for dynamic tree-structured jobs?

On 15/05/10 11:24, Andrew Piskorski wrote:
> Folks, I could use some advice on which cluster job scheduler (batch
> queuing system) would be most appropriate for my particular needs.
> I've looked through docs for SGE, Slurm, etc., but without first-hand
> experience with each one it's not at all clear to me which I should
> choose...
> 

This may be late in the day but...
If your job dependencies are too complicated for your queuing system to
deal with, you may want to look at the Ensembl Hive system:

http://www.ensembl.org/info/docs/eHive/index.html

It is the system we use in-house for our genome-analysis pipelines,
which have lots of complicated dependencies. It sits on top of a
traditional queuing system which handles job-dispatch etc.

It has been de-coupled from the genome analysis workflow, so you should
(in theory) be able to use it for any analysis.

Cheers,

Guy

-- 
Dr. Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802

Ricardo Reis | 2 Jun 2010 08:44

Top 500 in the BBC


http://infosthetics.com/archives/2010/05/bbc_news_visualizing_the_top_500_supercomputer_report.html

best regards,

  Ricardo Reis

  'Non Serviam'

  PhD candidate  <at>  Lasef
  Computational Fluid Dynamics, High Performance Computing, Turbulence
  http://www.lasef.ist.utl.pt

  Cultural Instigator  <at>  Rádio Zero
  http://www.radiozero.pt

  Keep them Flying! Ajude a/help Aero Fénix!

  http://www.aeronauta.com/aero.fenix

  http://www.flickr.com/photos/rreis/

                            < sent with alpine 2.00 >


Sabuj Pattanayek | 4 Jun 2010 05:01

Re: recommendations for parallel IO

>  We need an open source solution, we are looking into PVFS and Gluster (but
> from what we see, Gluster doesn't quite fit the bill? It's more a distributed
> filesystem than a parallel filesystem... or are we taking the wrong turn on
> our reasoning, somewhere about this?)

Gluster in stripe mode is parallel. It can also be distributed,
distributed and parallel, mirrored, etc.

Paulo Afonso Lopes | 4 Jun 2010 13:45

Re: recommendations for parallel IO

Hi, Ricardo.

>
>   Hi all
>
>   We have a small cluster but some users need to use MPI-IO. We have an
> NFS3 shared partition but you would need to mount it with special options
> which would hurt performance.

Yes... options include all the available ways to enforce "no client
caching" and that is (usually) very bad for performance :-)

There's also NFS4.1, but all I can say about it is that when I last looked
(more than 6 months ago) it was VERY OS dependent (you had to run kernel
2.6.x.y.z); furthermore, I haven't looked at the status of MPI-IO support
on 4.1.

> We are looking into a nice parallel file system to
> deploy in this context. We got 4 boxes with a 500Gb disk in each, for the

Are the 4 boxes just for the filesystem service, or are they "the small
cluster" ?

> moment, connected with Gb. We have another Gb connection dedicated to the
> MPI traffic.
>
>   We need an open source solution, we are looking into PVFS

I am using it and I have very good experiences with PVFS: easy

Gus Correa | 4 Jun 2010 18:43

Re: Top 500 in the BBC

Hello Ricardo,

That's really nice!
Here's the BBC link also:
http://news.bbc.co.uk/2/hi/10187248.stm

There's nothing like radio for spreading information!

Warm regards,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Ricardo Reis wrote:
> 
> http://infosthetics.com/archives/2010/05/bbc_news_visualizing_the_top_500_supercomputer_report.html
> 
> 
> best regards,
> 
>  Ricardo Reis
> 
>  'Non Serviam'
> 
>  PhD candidate  <at>  Lasef
>  Computational Fluid Dynamics, High Performance Computing, Turbulence

David Mathog | 8 Jun 2010 19:44

OT: recoverable optical media archive format?

This is off topic so I will try to keep it short:  is there an
"archival" format for large binary files which contains enough error
correction so that all original data may be recovered even if there is a
little data loss in the storage media?

For my purposes these are disk images, sometimes .tar.gz, other times
gzip -c of dd dumps of whole partitions which have been "cleared" by
filling the empty space with one big file full of zeros and then
deleting that file.  I'm thinking of putting this information on DVDs (only
need to keep it for a few years at a time) but I don't trust that medium
not to lose a sector here or there, having watched far too many
scratched DVD movies with playback problems.

Unlike an SDLT with a bad section, the good parts of a DVD are still
readable when there is a bad block (using dd or ddrescue) but of course
even a single missing chunk makes it impossible to decompress a .gz file
correctly.  So what I'm looking for is some sort of .img.gz.ecc format,
where the .ecc puts in enough redundant information to recover the
underlying img.gz even when sectors or data are missing.   If no such
tool/format exists then two copies should be enough to recover all of an
.img.gz so long as the same data wasn't lost on both media, and if bad
DVD sectors always come back as "failed read", never ever showing up as
a good read but actually containing bad data.  Perhaps the frame
checksum on a DVD is enough to guarantee that?
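
A minimal sketch of the two-copy fallback described above, in Python. It
assumes exactly what the paragraph assumes: every damaged sector is reported
as a failed read, so each copy comes with a set of bad sector indices (for
example derived from a ddrescue map file; how that set is obtained is not
shown). The function name merge_copies and the 2048-byte sector size are
illustrative choices, not part of any existing tool.

```python
SECTOR = 2048  # assumed DVD data sector size in bytes


def merge_copies(copy_a, copy_b, bad_a, bad_b, sector=SECTOR):
    """Rebuild an image from two damaged copies of the same data.

    copy_a, copy_b: bytes objects of equal length; the contents of
                    unreadable sectors may be anything (e.g. zeros).
    bad_a, bad_b:   sets of sector indices that failed to read.
    Returns (merged, lost), where lost lists sectors bad in BOTH copies.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("copies differ in length")
    merged = bytearray(copy_a)
    lost = []
    for idx in sorted(bad_a):
        start = idx * sector
        end = min(start + sector, len(copy_a))
        if idx in bad_b:
            # Same sector lost on both media: nothing to recover from.
            lost.append(idx)
        else:
            # Take the sector from the copy that read cleanly.
            merged[start:end] = copy_b[start:end]
    return bytes(merged), lost
```

If lost comes back empty, the merged image should match the original. The
sketch cannot detect a sector that read "successfully" but holds bad data;
that would need a checksum of the original stored alongside the copies.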

Thanks,

David Mathog
mathog <at> caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Michael Di Domenico | 8 Jun 2010 20:05

Re: OT: recoverable optical media archive format?

What's the ramification of losing a block?  (ie file-system won't
mount, data has a hole)

It's not elegant, but the first thing that comes to mind is using
'split' to chunk the file into many small pieces and then md5 each piece.
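
A rough sketch of that chunk-and-checksum idea in Python rather than with
'split' and 'md5sum'. The function names and the 1 MiB default chunk size are
made up for illustration, and the data is held in memory for brevity; a real
disk image would be read in pieces.

```python
import hashlib


def chunk_digests(data, chunk_size=1 << 20):
    """Return the MD5 hex digest of each fixed-size chunk of data."""
    return [hashlib.md5(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def damaged_chunks(data, expected, chunk_size=1 << 20):
    """Return indices of chunks whose digest no longer matches expected.

    Only chunks present in both lists are compared, so a length change
    has to be checked separately.
    """
    actual = chunk_digests(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

This only tells you which chunks are damaged; it stores no redundant data,
so by itself it cannot repair them.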

On Tue, Jun 8, 2010 at 1:44 PM, David Mathog <mathog <at> caltech.edu> wrote:
> This is off topic so I will try to keep it short:  is there an
> "archival" format for large binary files which contains enough error
> correction so that all original data may be recovered even if there is a
> little data loss in the storage media?
>
> For my purposes these are disk images, sometimes .tar.gz, other times
> gunzip -c of dd dumps of whole partitions which have been "cleared" by
> filling the empty space with one big file full of zero, and then that
> file deleted.  I'm thinking of putting this information on DVD's (only
> need to keep it for a few years at a time) but I don't trust that media
> not to lose a sector here or there - having watched far too many
> scratched DVD movies with playback problems.
>
> Unlike an SDLT with a bad section, the good parts of a DVD are still
> readable when there is a bad block (using dd or ddrescue) but of course
> even a single missing chunk makes it impossible to decompress a .gz file
> correctly.  So what I'm looking for is some sort of .img.gz.ecc format,
> where the .ecc puts in enough redundant information to recover the
> underlying img.gz even when sectors or data are missing.   If no such
> tool/format exists then two copies should be enough to recover all of an
> .img.gz so long as the same data wasn't lost on both media, and if bad
> DVD sectors always come back as "failed read", never ever showing up as
> a good read but actually containing bad data.  Perhaps the frame

Reuti | 8 Jun 2010 21:03

Re: OT: recoverable optical media archive format?

Hi,

Am 08.06.2010 um 19:44 schrieb David Mathog:

> This is off topic so I will try to keep it short:  is there an
> "archival" format for large binary files which contains enough error
> correction so that all original data may be recovered even if there  
> is a
> little data loss in the storage media?
>
> For my purposes these are disk images, sometimes .tar.gz, other times
> gunzip -c of dd dumps of whole partitions which have been "cleared" by
> filling the empty space with one big file full of zero, and then that
> file deleted.  I'm thinking of putting this information on DVD's (only
> need to keep it for a few years at a time) but I don't trust that  
> media
> not to lose a sector here or there - having watched far too many
> scratched DVD movies with playback problems.
>
> Unlike an SDLT with a bad section, the good parts of a DVD are still
> readable when there is a bad block (using dd or ddrescue) but of  
> course
> even a single missing chunk makes it impossible to decompress a .gz  
> file
> correctly.  So what I'm looking for is some sort of .img.gz.ecc  
> format,
> where the .ecc puts in enough redundant information to recover the
> underlying img.gz even when sectors or data are missing.   If no such
> tool/format exists then two copies should be enough to recover all  
> of an

Jesse Becker | 9 Jun 2010 02:49

Re: OT: recoverable optical media archive format?

I came across this page a few years back that discusses this very
problem:

    http://users.softlab.ntua.gr/~ttsiod/rsbep.html

On Tue, Jun 08, 2010 at 01:44:55PM -0400, David Mathog wrote:
>This is off topic so I will try to keep it short:  is there an
>"archival" format for large binary files which contains enough error
>correction so that all original data may be recovered even if there is a
>little data loss in the storage media?  
>
>For my purposes these are disk images, sometimes .tar.gz, other times
>gunzip -c of dd dumps of whole partitions which have been "cleared" by
>filling the empty space with one big file full of zero, and then that
>file deleted.  I'm thinking of putting this information on DVD's (only
>need to keep it for a few years at a time) but I don't trust that media
>not to lose a sector here or there - having watched far too many
>scratched DVD movies with playback problems.
>
>Unlike an SDLT with a bad section, the good parts of a DVD are still
>readable when there is a bad block (using dd or ddrescue) but of course
>even a single missing chunk makes it impossible to decompress a .gz file
>correctly.  So what I'm looking for is some sort of .img.gz.ecc format,
>where the .ecc puts in enough redundant information to recover the
>underlying img.gz even when sectors or data are missing.   If no such
>tool/format exists then two copies should be enough to recover all of an
>.img.gz so long as the same data wasn't lost on both media, and if bad
>DVD sectors always come back as "failed read", never ever showing up as
>a good read but actually containing bad data.  Perhaps the frame
>checksum on a DVD is enough to guarantee that?

Kilian CAVALOTTI | 9 Jun 2010 09:33

Re: OT: recoverable optical media archive format?

On Tue, Jun 8, 2010 at 8:05 PM, Michael Di Domenico
<mdidomenico4 <at> gmail.com> wrote:
> Not that it's elegant, the first thing that pops to mind is using
> 'split' to chunk the file into many little bits and then md5 each bit

While this may let you know that a file has been corrupted, it won't
help you recover that file.

Some archive formats, which can serve purely as storage containers if
you turn compression off, have options to create recovery records. For
instance, in the RAR format (http://en.wikipedia.org/wiki/RAR), you can
choose how much redundant data you want to include in your archive
(whose size will be increased accordingly).
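
To make the idea of redundant recovery data concrete, here is a toy Python
sketch using a single XOR parity block. This is not how RAR implements its
recovery record; it is just the simplest scheme I know of that can rebuild
one lost block, and the function names are made up. It assumes all blocks
have the same length and that you know which block is missing.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_parity(blocks):
    """Redundant block stored alongside the data blocks."""
    return xor_blocks(blocks)


def recover(blocks, parity, missing):
    """Rebuild blocks[missing] from the surviving blocks plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing]
    return xor_blocks(survivors + [parity])
```

One parity block costs 1/N extra space for N data blocks and survives the
loss of any single block; tolerating more damage requires more redundant
data, which is the trade-off the recovery record size controls.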

Excerpt from Alexander Roshal's rar user's manual:

"""
    rr[N]   Add data recovery record. Optionally, redundant information
            (recovery record) may be added to an archive. This will cause
            a small increase of the archive size and helps to recover
            archived files in case of floppy disk failure or data losses of
            any other kind. A recovery record contains up to 524288 recovery
            sectors. The number of sectors may be specified directly in the
            'rr' command (N = 1, 2 .. 524288) or, if it is not specified by
            the user, it will be selected automatically according to the
            archive size: a size of the recovery information will be about
            1% of the total archive size, usually allowing the recovery of
            up to 0.6% of the total archive size of continuously damaged data.
