Koza Zbigniew | 1 Dec 2004 14:45

clustering: how to start?

We have just bought 4 dual-processor Xeon boxes for a physics
department and want to turn them into a computing cluster.
That is, we want users to communicate with it via
ssh/sftp and see all processors as if they were part of a single virtual
UNIX machine with 8 processors on board.
A typical job we want to run is
(a bunch of) numerical simulation programs that run for 2 weeks or so.
Thus, the cluster will serve only as a numerical workstation.

We have experimented with Debian and OpenMosix. The first outcome of our
tests is that Intel's much-advertised Hyper-Threading technology is a
disaster. Also, OpenMosix seems to have a bug that can bring the whole
system to a halt if one tries to run several memory-consuming processes
simultaneously.
We also have problems compiling a custom Linux kernel so that it
can see all the swap space and memory it should see.

Then I recalled that several years ago I experimented with FreeBSD and I
liked it. Now I hear that FreeBSD supports SMP "natively", so
no kernel recompilation will be necessary.
I thought - maybe we could try and use FreeBSD instead of Debian?
Now my question is: where should I start?
Which docs should I read?
Which program(s) should I use to cluster our 4 FreeBSD boxes?

regards,
Z. Koza

_______________________________________________
freebsd-cluster <at> freebsd.org mailing list

Devon H. O'Dell | 1 Dec 2004 14:51

Re: clustering: how to start?

Koza Zbigniew wrote:
[snip]
> Then I recalled that several years ago I experimented with FreeBSD and I
> liked it. Now I hear that FreeBSD supports SMP "natively", so
> no kernel recompilation will be necessary.
> I thought - maybe we could try and use FreeBSD instead of Debian?
> Now my question is: where should I start?
> Which docs should I read?
> Which program(s) should I use to cluster our 4 FreeBSD boxes?
> 
> regards,
> Z. Koza

FreeBSD won't do process distribution natively. Want a project? :)

Other than that, there are other programs you can use to distribute 
applications in userspace (such as any of the MPI packages; I've had 
good experience with lam-mpi). To monitor the status of your clusters, 
you can use an application called Ganglia.

Good luck!

Devon H. O'Dell


Marc. Perisa | 1 Dec 2004 17:15

Re: clustering: how to start?

On Wed, 01 Dec 2004 14:51:08 +0100, Devon H. O'Dell
<dodell <at> sitetronics.com> wrote:
> Koza Zbigniew wrote:
> [snip]
> 
> > Then I recalled that several years ago I experimented with FreeBSD and I
> > liked it. Now I hear that FreeBSD supports SMP "natively", so
> > no kernel recompilation will be necessary.
> > I thought - maybe we could try and use FreeBSD instead of Debian?
> > Now my question is: where should I start?
> > Which docs should I read?
> > Which program(s) should I use to cluster our 4 FreeBSD boxes?
> >
> 
> FreeBSD won't do process distribution natively. Want a project? :)
> 
> Other than that, there are other programs you can use to distribute
> applications in userspace (such as any of the MPI packages; I've had
> good experience with lam-mpi). To monitor the status of your clusters,
> you can use an application called Ganglia.
>

Hi,

the "normal" way to go is to use MPI for number-crunching operations
on "normal" hardware. Since MPI is pretty much an industry standard, most
programs that span multiple machines support this interface.
To get high real-world performance you should use high-speed,
low-latency interconnects.
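
As a rough illustration of what "using MPI" means for simulation code: each process is given a rank, and it picks its share of the work from that rank. The sketch below shows only this partitioning arithmetic in plain Python and does not call any MPI library. The helper name `block_range`, the 8 ranks (4 boxes x 2 CPUs, as in the original question) and the 100 work items are assumptions made for illustration; in a real MPI program the rank and the process count would come from the library (e.g. MPI_Comm_rank and MPI_Comm_size).

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open range [start, stop) of work items owned by `rank`.

    Items are split into contiguous blocks; when n_items is not divisible
    by n_ranks, the first (n_items % n_ranks) ranks get one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


if __name__ == "__main__":
    n_ranks = 8    # assumption: 4 dual-CPU boxes, one process per CPU
    n_items = 100  # hypothetical number of independent parameter sets
    for rank in range(n_ranks):
        start, stop = block_range(n_items, n_ranks, rank)
        print(f"rank {rank}: items {start}..{stop - 1}")
```

For a bunch of fully independent two-week simulations, the same arithmetic could just as well be used by a simple job script that starts each program on a given node; MPI becomes important once the processes have to exchange data while running.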


Maxim Sobolev | 9 Dec 2004 18:41

Re: ng_one2many heartbeat algorithm for LAN fault tolerance

Hi,

What is the status of those patches? I am using the one2many node to 
implement a simple multilink connection and like your idea.  Why aren't 
they in the tree yet? Are there any problems?

Regards,

Maxim
_______________________________________________
freebsd-cluster <at> freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-cluster
To unsubscribe, send any mail to "freebsd-cluster-unsubscribe <at> freebsd.org"

Evgeny Dolgopiat | 10 Dec 2004 12:22

Re: ng_one2many heartbeat algorithm for LAN fault tolerance

Hi

This patch works on 5.3 and on the -CURRENT branch. I don't know why it's 
not in the tree :(. I sent messages about the patches to Julian Elischer and 
Archie Cobbs but haven't got any reply. I'm interested in getting the patch 
into the tree, but have no way to do it myself. Do you have any idea who could commit it?

Evgeny

>Hi,
>
>What is the status of those patches? I am using the one2many node to 
>implement a simple multilink connection and like your idea.  Why aren't 
>they in the tree yet? Are there any problems?
>
>Regards,
>
>Maxim


Maxim Sobolev | 10 Dec 2004 12:36

Re: ng_one2many heartbeat algorithm for LAN fault tolerance

Evgeny Dolgopiat wrote:
> Hi
> 
> This patch works on 5.3 and on the -CURRENT branch. I don't know why it's 
> not in the tree :(. I sent messages about the patches to Julian Elischer and 
> Archie Cobbs but haven't got any reply. I'm interested in getting the patch 
> into the tree, but have no way to do it myself. Do you have any idea who could commit it?

I can after some testing. ;-)

-Maxim

Evgeny Dolgopiat | 10 Dec 2004 13:45

Re: ng_one2many heartbeat algorithm for LAN fault tolerance

Ok. Do you need something from me (some patches, docs etc.)? 

>
> I can after some testing. ;-)
>
> -Maxim


Maxim Sobolev | 10 Dec 2004 14:10

Re: ng_one2many heartbeat algorithm for LAN fault tolerance

Well, I took a closer look at the patch today and found that two things 
need to be done to get it into committable shape:

1. Make it work independently of the layer on which one2many 
operates. For example, I use it for an IP-over-UDP multilink tunnel 
between two points. Since one2many is protocol-agnostic, the heartbeat 
feature should not make any assumptions about the type of protocol (e.g. 
Ethernet, IP, UDP, etc.).

This is not trivial, I know. Maybe the best way to address it is to make 
the content of the heartbeat packet configurable, so that one can put in 
whatever is appropriate for the task. Then you can put guidelines for 
configuring it for Ethernet into the man page. For example, in my situation 
I can use a single 0 byte as the heartbeat payload, since it will be carried 
as the payload of a UDP packet, so no real header is necessary.

2. It would be very nice to extend the heartbeat algorithm with a recovery 
function, so that it keeps sending heartbeats over links that were 
previously detected as dead, to see whether any of them have recovered.
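
To make the two points above concrete, here is a minimal sketch of the behaviour I have in mind. It is a plain Python model, not netgraph code and not taken from the patch: the names `Link`, `HeartbeatMonitor`, `max_missed`, `tick`, `on_reply` and `usable` are all hypothetical, and the reply handling is assumed to be driven by whatever receives heartbeats from the peer. The payload is an opaque, configurable byte string (a single 0 byte in my UDP case), and dead links keep being probed so that they can come back.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    name: str
    alive: bool = True
    unanswered: int = 0  # heartbeats sent since the last reply


@dataclass
class HeartbeatMonitor:
    payload: bytes = b"\x00"  # configurable; the monitor never inspects it
    max_missed: int = 3       # unanswered heartbeats before a link is dead
    links: dict = field(default_factory=dict)

    def add_link(self, name):
        self.links[name] = Link(name)

    def tick(self, send):
        """Run one heartbeat interval.

        `send(link_name, payload)` is supplied by the caller, so the
        monitor makes no assumption about Ethernet, IP, UDP and so on.
        """
        for link in self.links.values():
            if link.unanswered >= self.max_missed:
                link.alive = False
            # Recovery: dead links are still probed on every interval.
            send(link.name, self.payload)
            link.unanswered += 1

    def on_reply(self, name):
        """A heartbeat came back on `name`: the link is (again) usable."""
        link = self.links[name]
        link.unanswered = 0
        link.alive = True

    def usable(self):
        """Names of links that data may currently be sent over."""
        return [l.name for l in self.links.values() if l.alive]


if __name__ == "__main__":
    sent = []
    mon = HeartbeatMonitor(payload=b"\x00", max_missed=3)
    mon.add_link("a")
    mon.add_link("b")
    for _ in range(4):          # link "b" never answers
        mon.tick(lambda name, data: sent.append((name, data)))
        mon.on_reply("a")
    print(mon.usable())
    mon.tick(lambda name, data: sent.append((name, data)))
    mon.on_reply("b")           # "b" answers a probe and recovers
    print(mon.usable())
```

For Ethernet one would configure `payload` as a complete frame instead of a single byte; the monitor logic itself would not change.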

Please let me know what you think about it.

Regards,

Maxim

Evgeny Dolgopiat wrote:
> Ok. Do you need something from me (some patches, docs etc.)? 
> 
> 