Chris Samuel | 2 Jan 03:06 2007

Re: Which distro for the cluster?

On Saturday 30 December 2006 04:24, Robert G. Brown wrote:

> On Fri, 29 Dec 2006, Geoff Jacobs wrote:
>
> > What I'd like to see is an interested party which would implement a
> > good, long term security management program for FC(2n+b) releases. RH
> > obviously won't do this.
>
> I thought there was such a party, but I'm too lazy to google for it.

Fedora Legacy.  It's pretty much dead these days. :-(

http://fedoralegacy.org/

 Important Notice: December 12, 2006 

 The current model for supporting maintenance distributions is being 
re-examined. In the meantime, we are unable to extend support to older Fedora 
Core releases as we had planned. As of now, Fedora Core 4 and earlier 
distributions are no longer being maintained. 

-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia


Chris Samuel | 2 Jan 03:10 2007

Re: Which distro for the cluster?

On Friday 29 December 2006 21:05, Geoff Jacobs wrote:

> Here's a bare bones kickstart method (not Kickstart[tm] per se):
> http://linuxmafia.com/faq/Debian/kickstart.html

Good old Rick, he crops up everywhere & is a mine of information. ;-)

> Regarding kickstart, among choices for pre-scripted installers it is one
> of many. I personally favor the likes of SystemImager, even though it's
> not quite in the same category (FAI is though, IMO). Even dd with netcat
> is pretty powerful for homogeneous nodes.

FAI is the one I've heard of before, but I've never had the chance to play 
with it yet.  I hear tell that Warewulf is distro-neutral and will deploy 
J. Random Distro onto hardware (and maybe even 'doze, shudder).
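
Geoff's dd-with-netcat trick can be as simple as the rough sketch below. 
Device names, hostname and port are made up, the target disks must match 
the source, and your netcat's listen syntax may differ:

    # on the source node, serve a raw image of the system disk:
    dd if=/dev/sda bs=1M | nc -l -p 9000

    # on each target node (booted from rescue/PXE media), pull and write it:
    nc sourcehost 9000 | dd of=/dev/sda bs=1M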

> Once you've chosen your distro based on experience/need, there are
> usually a few ways to put it on your spindles.

Oh indeed!

cheers,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia


Chris Dagdigian | 2 Jan 22:06 2007

Re: picking out a job scheduler


For what it's worth, I'm a biased Grid Engine and Platform LSF user...

On Dec 29, 2006, at 11:40 AM, Nathan Moore wrote:

> I've presently set up a cluster of 5 AMD dual-core linux boxes for
> my students (at a small college).  I've got MPICH running, shared
> NIS/NFS home directories etc.  After reading the MPICH installation
> guide and manual, I can't say I understand how to deploy MPICH for
> my students to use.  So far as I can tell, there's no load balancing
> or migration of processes in the library, and so now I'm trying to
> figure out what piece of software to add to the cluster to (for
> example) prevent the starting of an MPI job when there's already
> another job running.
>
> (1) Is openPBS or gridengine the appropriate tool to use for a  
> multi-user system where mpich is available?  Are there better  
> scheduling options?
>

Both should be fine, although if you are considering *PBS you should
look at both Torque (a fork of OpenPBS, I think) and PBSPro
(commercial, but last time I checked they had very good options for
academic sites).  I can't speak intelligently about the PBS variants
these days... it's been too long since I've been hands-on.
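
For anyone who hasn't written one, a minimal Torque/PBS submit script
looks roughly like the sketch below (the executable name is a
placeholder and the syntax is from memory, so check your local docs):

    #!/bin/sh
    #PBS -N mpi_test
    #PBS -l nodes=2:ppn=2,walltime=01:00:00

    # PBS starts the script in $HOME; move to the submission directory
    cd $PBS_O_WORKDIR

    # PBS writes the list of allocated hosts to $PBS_NODEFILE
    mpirun -np 4 -machinefile $PBS_NODEFILE ./my_mpi_program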

Lots of people use Grid Engine with MPICH using both loose and tight
integration methods. The mailing list
(users <at> gridengine.sunsource.net) has a very helpful community with an
excellent signal-to-noise ratio.
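
From the user's side a loosely integrated MPICH job looks much the
same; a rough sketch, assuming your site has configured a parallel
environment named "mpich":

    #!/bin/sh
    #$ -N mpi_test
    #$ -cwd
    #$ -pe mpich 4

    # SGE's mpich integration writes the machines file into the job's $TMPDIR
    mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_program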

Robert G. Brown | 3 Jan 00:32 2007

Re: FW: Which distro for the cluster?

On Thu, 28 Dec 2006, Cunningham, Dave wrote:

> I notice that Scyld is notable by its absence from this discussion.  Is
> that due to cost, or bad/no experience, or other factors?  There is a
> lot of interest in it around my company lately.

Scyld is a fine choice for a cluster, but not usually for a first time
learning cluster for non-professionals.  This is in part because it
costs money, and in part because it is designed to encapsulate a lot of
what one has to do to "make a cluster" to the point where it is nearly
entirely hidden from the user/administrator.  This is desirable from a
corporate point of view (although I personally think that one needs a
certain amount of actual cluster experience to get the most out of even
Scyld) but not so good for poor people seeking to learn.  It also limits
you at least somewhat to the particular parallel computing model that
Scyld itself embraces.

A good friend of mine at Duke uses Scyld for his biochemistry cluster,
and although he's been doing cluster computing for a rather long time
(close to 10 years at a guess, maybe even more) and COULD and HAS IN THE
PAST done it all himself, he really likes Scyld's general cluster
administration and encapsulation features.  Of course the grants that
fund the research are deep-pocketed enough to afford it, as well.  That
isn't always the case in academe, and it really isn't the case at home.

However, Don Becker is on the list and you've given him an open
invitation to present Scyld: who it is really designed and intended for,
and maybe even an overview of how it (currently) works.  Don?

    rgb

Robert G. Brown | 3 Jan 00:44 2007

Re: picking out a job scheduler

On Tue, 2 Jan 2007, Chris Dagdigian wrote:

>> (3) It's likely that in the future I'll have part-time access to another 
>> cluster of dual-boot (XP/linux) machines.  The machines will default to 
>> booting to Linux, but will occasionally (5-20 hours a week) be used as 
>> windows workstations by a console user (when a user is finished, they'll 
>> restart the machine and it will boot back to linux).  If cluster nodes are 
>> available in this sort of unpredictable and intermittent way, can they be 
>> used as compute nodes in some fashion? Will gridengine/PBS/??? take care of 
>> this sort of process migration?
>> 
>
> Grid Engine will not transparently preserve and migrate running jobs off of 
> machines that get bounced suddenly.  This sort of transparent and automatic 
> checkpointing and migration is actually pretty hard to do in practice.  If 
> you know in advance which machines are going to be shut down and rebooted 
> into windows then there are tools in all the common scheduling packages for 
> "draining" a particular machine or queue.  You can also "kill and reschedule"

For what it is worth, the current generation of Condor can, for code
linked with its own migration library, permit transparent
checkpointing and code migration, and it also has a very complex
"policy" engine that lets one specify in great detail how to turn jobs on
and off as user/owners use the systems in the pool.  It has recently
become "true open source" although the download website is still a PITA
to navigate and requires a kind of "registration" and its license is
still not a straight GPL.
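
Using the checkpoint/migration support means relinking against Condor's
library with condor_compile and submitting to the "standard" universe.
A rough sketch, with invented file names:

    # relink the program against Condor's checkpointing library
    condor_compile gcc -o myprog myprog.c

    # minimal submit description file (myprog.sub):
    universe   = standard
    executable = myprog
    output     = myprog.out
    error      = myprog.err
    log        = myprog.log
    queue

    # then submit it with: condor_submit myprog.sub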

This is kind of funny because as I read it, the toolset can now be
wrapped up in source RPMs and distributed as a standard component of

David Simas | 3 Jan 01:03 2007

Re: picking out a job scheduler

On Tue, Jan 02, 2007 at 06:44:50PM -0500, Robert G. Brown wrote:
> 
> One of the bitches that I and many others have about all of the
> alternatives is that they are too damn complicated.  Many sites -- I
> won't say most but many -- have very, very simple needs for a
> scheduler/queuing system.  Needs that could be met without requiring the
> admin to read a 1000 page manual, join a mailing list, work through a
> really complicated build, and try to figure out several distinct
> security models and policy models.  What is really needed is a fully
> open source "scheduler lite" that pretty much sets up a simple queue for
> a simple list of machines with a simple cron-like policy statement,
> maybe all defined with an XMLish config file that permitted classes of
> machines (like a bunch that belong to user A) to share a policy.
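
Presumably something like this purely hypothetical sketch (every
element and attribute name below is invented):

    <scheduler>
      <policy name="nights" start="0 18 * * *" stop="0 8 * * *"/>
      <class name="userA" policy="nights">
        <host>node01</host>
        <host>node02</host>
      </class>
      <queue name="default" class="userA" maxjobs="1"/>
    </scheduler>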

Ruby Queue?

	http://raa.ruby-lang.org/project/rq/
	http://www.artima.com/rubycs/articles/rubyqueue.html

DGS


Chris Samuel | 3 Jan 02:23 2007

Re: picking out a job scheduler

On Wednesday 03 January 2007 08:06, Chris Dagdigian wrote:

> Both should be fine, although if you are considering *PBS you should
> look at both Torque (a fork of OpenPBS, I think)

That's correct; it (and ANU-PBS, another fork) seem to be the de facto queuing 
systems in the state and national HPC centers down here.

Torque is just *so* much better than OpenPBS used to be (not that it was 
particularly hard).

cheers,
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia


Joe Landman | 3 Jan 05:54 2007

OT: Announcing MPI-HMMER

Hi folks:

  Short OT break: http://code.google.com/p/mpihmmer/ is an MPI
implementation of HMMer 2.3.2.

  Back to your regularly scheduled cluster.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman <at> scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
Leif Nixon | 3 Jan 11:54 2007

Re: Which distro for the cluster?

"Robert G. Brown" <rgb <at> phy.duke.edu> writes:

> Also, plenty of folks on this list have done just fine running "frozen"
> linux distros "as is" for years on cluster nodes.  If they aren't broke,
> and live behind a firewall so security fixes aren't terribly important,
> why fix them? 

Because your users will get their passwords stolen.

If your cluster is accessible remotely, that firewall doesn't really
help you very much. The attacker can simply log in as a legitimate user
and proceed to walk through your wide-open local security holes.

But you know this already.

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------
Reuti | 3 Jan 13:01 2007

Re: picking out a job scheduler

Hi,

On 03.01.2007 at 02:23, Chris Samuel wrote:

> On Wednesday 03 January 2007 08:06, Chris Dagdigian wrote:
>
>> Both should be fine, although if you are considering *PBS you should
>> look at both Torque (a fork of OpenPBS, I think)

Although I'm somewhat biased towards suggesting SGE, I also check the
Torque mailing list from time to time.

> That's correct; it (and ANU-PBS, another fork) seem to be the
> de facto queuing systems in the state and national HPC centers down
> here.

Whether any queuing system is a de facto standard might not matter.
More important for choosing one may be the technical points. To
compare SGE and Torque, e.g.:

- Do you need support for tightly integrated Linda (which I think will
most often mean Gaussian) and PVM parallel jobs? Use SGE.

- Do you have some special nodes inside your cluster, and need to
specify the resource requests for a parallel job (i.e. the combination
of different machine types it needs) in a fine-grained manner? Use
Torque (see the sketch below).
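
For instance, Torque lets you combine different node types in a single
request, whereas in SGE you would normally just ask a parallel
environment for a slot count (the node properties and PE name here are
invented):

    qsub -l nodes=4:fast+1:bigmem:ppn=2 job.sh   # Torque
    qsub -pe mpich 10 job.sh                     # SGE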

It's of course impossible to know in advance a) the needs of all the
applications, and b) all the features of the queuing systems,

