Dan Stromberg | 1 Mar 2006 02:11

Re: g77 limits...


python's the way of the future though :)  Agreed that shell (bash) and
make are important, and regular expressions are a (very powerful)
necessary evil.

(I tend to code in python as my first choice, but some things just work
better as bash or C.  And hopefully someday I'll find time to resume
studying Haskell, since it and OCaml are apparently so good at adapting
to changes in a program's requirements.)

On Tue, 2006-02-28 at 20:30 +0000, Andrew M.A. Cater wrote:
> C first, last and always: somewhere or other, you'll come across C code
> / someone who only knows C and whose high-level pseudocode is all
> "C like". Try hard to stick to portable C and follow the ANSI standards.
> [Note: C99, though it is seven years old now, is not well supported
> except in the newer compilers]
> 
> Shell scripting: quick and dirty hacks used to be done entirely in shell
> script. It's worth knowing enough to be able to read good Bourne shell
> scripts and, by extension, bash scripts - they crop up all over the
> place in Linux and "classical" UNIX.
> 
> Perl: Swiss Army chainsaw - you can do anything script-y in Perl and a
> whole lot more. It is easy to write poor-quality Perl; the canonical
> books are published by O'Reilly and Co and known as "The llama book"
> and "The camel book", aka Learning Perl and Programming Perl. Get the
> latest editions.
> 
> Regular expressions and pattern matching crop up a lot in scripting and 
> Perl. The O'Reilly regexp book by Friedl [Mastering Regular Expressions] 

Stuart Midgley | 1 Mar 2006 02:15

Re: Multidimensional FFTs

Hi Bill

I've tested FFTs rather extensively and run other codes that require
a transpose.  In my experience, a well tuned gig-e network is capable
of giving speedup, though not necessarily scaling that well.  The
most important thing is that you have full bisection bandwidth.
Anything less will reduce your scaling.  That is, if you use gig-e
you can't trunk switches; you will need to stay within a single
switch.  Typically, I've seen a 16 cpu job on gig-e give about a 10
times speedup.  Of course, it is processor/memory/NIC dependent.

I've also run FFTs on Quadrics Elan 3/4, IBM HPS, and SGI Numalink
4.  Since these are considerably higher bandwidth networks, they
perform much better.  On a 16 cpu job I've seen around a 14 times
speedup on these higher bandwidth networks.
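To put those two speedups side by side, here is a minimal Python sketch that turns them into parallel efficiency (speedup divided by cpu count). The inputs are only the approximate 16-cpu figures quoted above, not new measurements, and the helper name is made up for illustration.

```python
# Convert the rough 16-cpu speedups quoted above into parallel efficiency.
# parallel_efficiency() is a hypothetical helper, not from any library.

def parallel_efficiency(speedup: float, n_cpus: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return speedup / n_cpus


if __name__ == "__main__":
    for network, speedup in [("gig-e", 10.0), ("high-bandwidth interconnect", 14.0)]:
        eff = parallel_efficiency(speedup, 16)
        print(f"{network}: {speedup:.0f}x on 16 cpus -> {eff:.1%} efficiency")
```

Read this way, roughly 10x on gig-e is about 62% of linear scaling, while roughly 14x on the faster interconnects is closer to 88%.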

As the size increases (say 256 cpus) the networks that maintain full
bisection bandwidth scale the best.  There are very few reasonably
priced gig-e switches that maintain full bisection bandwidth at 256
cpus, while Quadrics and HPS do (though their starting price is
high, at the larger system sizes they become a realistic
proposition).  Numalink falls away a little due to the weird network
topology (dual plane quad bristle fat tree), which has drops in
network connectivity/cpu as the system gets larger.

If you want to go with gig-e a few things to be aware of:

*The NIC matters (pro1000MT's give 10-15% better performance than
pro1000T's)


Joe Landman | 1 Mar 2006 02:42

Re: g77 limits...

Odd, I thought it was Ruby.  On Rails at that. :^)

Dan Stromberg wrote:
> python's the way of the future though :)  Agreed that shell (bash) and
> make are important, and regular expressions are a (very powerful)
> necessary evil.

I have always had problems with any language that begins its
self-justification with "because it's not X", where X is any of Fortran,
Perl, APL, ...

My choices would be C and Perl to start for systems and lower level 
programming.  Higher level stuff (gluing code / processes together) 
would be Perl and X (pick the X of your choice: Python, Ruby, ...). 
For a fast HLL with a really fast interface to a huge corpus of numerical 
libraries, I would recommend Fortran.  Yeah, well, it works, really well, 
and though some people would rather gnaw off their fingers than allow 
themselves to type a do loop, that doesn't reflect upon the language.

For algorithm design/testing, I might recommend Matlab/Octave.  Yeah, 
lots of people have mixed feelings on this.  I like the immediacy of the 
feedback.  It actually makes life a little easier for the prototyping bit.

For SMP, OpenMP.  It will just make your life easier, unless you are
designing an OS or some sort of complex low level service.  For DMP, MPI,
usually LAM or similar.  Nothing against MPICH; I have just had somewhat
better luck using LAM, and using it to diagnose problems with machines and
applications.

In the end the choice of language depends strongly upon what you are 
doing and what you need to do.  Some languages are uniformly 

DGS | 1 Mar 2006 02:55

Re: g77 limits...

On Tue, Feb 28, 2006 at 08:42:10PM -0500, Joe Landman wrote:
> 
> In the end the choice of language depends strongly upon what you are 
> doing and what you need to do.  Some languages are uniformly 
> ill-equipped for the tasks they are set to.  Some are well equipped, but 
> for various fad reasons, are out of favor in deference to the flavor of 
> the day.

I think that you should try to teach yourself a new programming
language every few years.  Even if you never end up using it
much, it's a good intellectual exercise.

David S.

_______________________________________________
Beowulf mailing list, Beowulf <at> beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Greg Lindahl | 1 Mar 2006 03:34

Re: Multidimensional FFTs

On Tue, Feb 28, 2006 at 01:26:51PM -0500, Bill Rankin wrote:

> There is a research group here at Duke doing some application  
> development and they are looking at implementing their codes in a  
> cluster environment.  The main problem is that 95% of their  
> processing time is taken up by medium to large sized 3D FFTs (minimum  
> 64 elements on an edge, 256k total elements).

That's a fairly small FFT on a parallel cluster. How many cpus do they
imagine using? Perhaps the easiest thing to do is to whip up some code
and invite people to benchmark it. The G-PTRANS and G-FFTE elements of
HPC Challenge are relevant but not many folks have submitted numbers.

Let's see: for 64**3, and 64 cpus with a 1D decomposition, there are
64**2 words per cpu, and a naive Alltoall will have each cpu split its
data into 64 pieces of 64 words and send one piece to each of the 63
other nodes. The message length is then 1024 bytes (double precision
complex, 16 bytes per word). I would disagree with Stu's
recommendations at this size due to the short message length, but I
don't know if 2D would be a better decomposition at this size. FFTW
version 2's MPI routines only do 1D decomposition.
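The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, assuming double precision complex data (16 bytes per word) and a grid that divides evenly across the cpus; `alltoall_sizes` is a hypothetical helper, not part of FFTW or MPI.

```python
# Message sizes for a naive all-to-all transpose in a 1D (slab) decomposed
# 3D FFT. Assumes 16-byte words (double complex) and an even split.

def alltoall_sizes(n_edge: int, n_cpus: int, bytes_per_word: int = 16) -> dict:
    total_words = n_edge ** 3                  # 64**3 = 262144, the "256k" elements
    if total_words % (n_cpus * n_cpus):
        raise ValueError("grid does not split evenly into n_cpus**2 blocks")
    words_per_cpu = total_words // n_cpus      # each cpu owns one slab
    words_per_msg = words_per_cpu // n_cpus    # one block per destination cpu
    return {
        "words_per_cpu": words_per_cpu,
        "messages_off_node": n_cpus - 1,       # the block for itself stays local
        "words_per_message": words_per_msg,
        "bytes_per_message": words_per_msg * bytes_per_word,
    }


if __name__ == "__main__":
    print(alltoall_sizes(64, 64))
```

With 64 cpus each message is only about 1 KB, which is why message latency rather than bandwidth tends to dominate at this problem size.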

-- greg


Robert G. Brown | 1 Mar 2006 10:04

Re: g77 limits...

On Tue, 28 Feb 2006, Dan Stromberg wrote:

>
> python's the way of the future though :)  Agreed that shell (bash) and
> make are important, and regular expressions are a (very powerful)
> necessary evil.

Not just powerful, not just evil.  Regular expressions are one of the
things that give a systems administrator that je ne sais quoi, that
special little something, that aura of invincible power.

After all, to add aligned left-padded line numbers, which one looks more
impressive?

   sed = myfile | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'

or

   cat -n myfile

I rest my case.
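For anyone wondering what the sed incantation actually does, here is a Python re-enactment of its three steps, checked against the boring formatted-print approach. It is a sketch, assuming the six-column right alignment implied by the `.\{6,\}` in the expression; the function names are made up for illustration.

```python
import re


def sed_style_number(lines):
    r"""Emulate: sed = myfile | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'"""
    out = []
    for n, line in enumerate(lines, start=1):
        # `sed =` prints the line number on its own line; `N` then joins it
        # with the following text line, separated by a newline.
        pattern_space = f"{n}\n{line}"
        # s/^/     /  -> prepend five spaces of padding
        pattern_space = "     " + pattern_space
        # s/ *\(.\{6,\}\)\n/\1  /  -> drop just enough leading spaces to leave
        # the number right-aligned in at least six columns, then two spaces
        pattern_space = re.sub(r" *(.{6,})\n", r"\1  ", pattern_space, count=1)
        out.append(pattern_space)
    return out


def plain_number(lines):
    """The boring way: right-align the number in six columns, then two spaces."""
    return [f"{n:>6}  {line}" for n, line in enumerate(lines, start=1)]


if __name__ == "__main__":
    text = ["first line", "second line", "third line"]
    for a, b in zip(sed_style_number(text), plain_number(text)):
        assert a == b
        print(a)
```

GNU `cat -n` produces essentially the same layout, except that it separates the number from the text with a tab rather than two spaces.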

As for the python... well, I just plain like delimiters in my code.  I
might even use it if the authors of python hadn't imposed two pieces of
religion on its users:

   No line terminator (e.g. ;)
   No {} -- all code grouping MUST be accomplished by indentation.

Violators will be shot.  News at 11.

Leif Nixon | 1 Mar 2006 11:11

Re: g77 limits...

"Robert G. Brown" <rgb <at> phy.duke.edu> writes:

> As for the python... well, I just plain like delimiters in my code.  I
> might even use it if the authors of python hadn't imposed two pieces of
> religion on its users:
>
>    No line terminator (e.g. ;)
>    No {} -- all code grouping MUST be accomplished by indentation.

In brace-riddled languages you have to mark up the block structure
twice: first with braces for the sake of the compiler, and then with
indentation for the sake of humans. That's a bit silly and error-prone
in my eyes...

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------

Andrew Piskorski | 1 Mar 2006 12:25

Re: "dual" Quad solution from Tyan

On Mon, Feb 27, 2006 at 08:49:23PM +0000, Ricardo Reis wrote:

> The target machine I was looking for was 8 nodes with 
> dual-processor Opterons (16 CPUs). The machine that was at the presentation 
> is
> 
> http://www.cybernex.co.uk/tyan_vx50_quad_opteron_servers.htm
> 
> which means 8 CPUs, dual core, all connected with HyperTransport, 
> which means I get rid of latency, I figure. So...

> The professor who heads the lab wants me to find out: "if that is so good,
> why isn't everyone buying one?"

Where is the data suggesting that it's "so good"?  And what do you
want to use it for?

> From: Joe Landman <landman <at> scalableinformatics.com>
> See page 2 of ftp://ftp.tyan.com/datasheets/d_s4881_104.pdf

Ah.  Apparently that box uses a Tyan S4881 4-socket motherboard which,
with a special M4881 add-on riser card, can be extended to 8 sockets.
There are other 8-socket Opteron motherboards out there - Iwill makes
one, and there are probably others as well.

So you seem to want to compare a single big 8-socket ccNUMA Opteron
box to a cluster of four 2-socket boxes.

On a per-cpu or per-motherboard basis, the 8xx series Opterons and
4/8-way motherboards are going to be SUBSTANTIALLY more expensive than

Joe Landman | 1 Mar 2006 14:08

Re: g77 limits...


Robert G. Brown wrote:
> On Tue, 28 Feb 2006, Dan Stromberg wrote:
> 
>>
>> python's the way of the future though :)  Agreed that shell (bash) and
>> make are important, and regular expressions are a (very powerful)
>> necessary evil.
> 
> 
> Not just powerful, not just evil.  Regular expressions are one of the
> things that give a systems administrator that je ne sais quoi, that
> special little something, that aura of invincible power.

Folks who don't grok RE are missing quite a bit.  There is a huge amount 
of power in a tiny space.  They ain't extremely easy, but they are 
incredibly powerful.

[...]

> As for the python... well, I just plain like delimiters in my code.  I
> might even use it if the authors of python hadn't imposed two pieces of
> religion on its users:
> 
>   No line terminator (e.g. ;)
>   No {} -- all code grouping MUST be accomplished by indentation.
> 
> Violators will be shot.  News at 11.

Ugh... you just woke the beast.  You will be escorted to the cheeseshop 

Joe Landman | 1 Mar 2006 14:23

Re: "dual" Quad solution from Tyan


Andrew Piskorski wrote:
> On Mon, Feb 27, 2006 at 08:49:23PM +0000, Ricardo Reis wrote:

> 
> Do you NEED low latency?  The 8-way box PROBABLY has better latency,
> but perhaps Infinipath HTX adapters between 2-socket nodes would give
> similar (or possibly better in some cases?)  latency - you need to
> compare actual performance numbers.

Looking at this design, it looks like 2 hops to the furthest memory, 
which gives you a latency of something like 100ns + 150ns*(N_hops). 
Moreover, some of these hops are over a lower speed HT fabric.
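Plugging hop counts into that rough estimate gives a feel for the spread. This sketch treats the 100 ns term as the zero-hop (local) cost and the 150 ns per hop as given; both constants are ballpark figures from the message above, not measurements, and slower HT links would push the remote numbers higher.

```python
# Rough remote-memory latency model from the message above:
#   latency ~ 100 ns + 150 ns * N_hops
# Constants are ballpark assumptions, not measured values.

BASE_NS = 100      # assumed zero-hop (local) component
PER_HOP_NS = 150   # assumed added cost per HyperTransport hop


def memory_latency_ns(n_hops: int) -> int:
    if n_hops < 0:
        raise ValueError("n_hops must be non-negative")
    return BASE_NS + PER_HOP_NS * n_hops


if __name__ == "__main__":
    for hops in range(3):   # 0 = local memory, 2 = furthest memory on this design
        print(f"{hops} hop(s): ~{memory_latency_ns(hops)} ns")
```

Under these assumptions the furthest memory sits around 400 ns, roughly four times the local figure, which is part of why low-level load balancing on this box is tricky.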

This will be a hard machine to optimize well for at a low level 
(lightweight threads and some load balance issues), but should be fine 
at a higher level (MPI shared memory or some OpenMP).

> Note that that Tyan 8-socket box does not have any extra HTX slots, so
> if you wanted to cluster multiple such boxes, you would have to do
> so over its PCI Express slots or built-in Gigabit Ethernet.

If you need huge memories, you can stick up to 16 GB on a normal dual 
processor MB, or 32 GB if you are willing to pay outrageous memory 
prices.  You can go to 32 GB using 2 GB DIMMs on the DK88 board from 
iWill if you don't mind running at DDR/333.  You also don't want to put 
that in a 1U; the memory generates quite a bit of heat, and you need to 
pull that off.  Not to mention my concerns about the electrical stability
of running 8 memory sockets off a single chip.
