David Brown | 1 Jun 2007 01:10

Re: Fwd: Okay some test data and question

> Running the clients and servers on the same machine might actually
> hurt your performance a bit.  Since PVFS doesn't have a native
> quadrics method, maybe you save a lot of overhead skipping the
> tcp-over-quadrics stuff.

Yeah, but I wasn't running over the Quadrics; I was using the
gigabit Ethernet. The data should still be going over the loopback and
then right to disk... One of these days we'll get an InfiniBand box we can
test on ;)

> It might be easier to see patterns if you held servers constant and
> increased the number of clients, or held clients constant while
> varying numbers of servers.  To visualize both you'd end up with a
> 3-d plot...

That'd be interesting, but where would the data go for the odd clients
once you have more clients than servers? And how would you control that
so the graph makes sense?

> I'm looking at 255-test.csv.  That's 256 nodes (acting as servers and
> clients), each client running dd to write 10 GB to a single server?

Yeah, each client dd'd a 10 GB file to itself through PVFS.

> I don't know why that workload would take about a minute for up to 64
> clients, speed up for 65-141 clients, and then go back to being slower
> for the rest of the runs, except for a cluster of fast runs at 173-183
> clients.
>
> Since you've got things set up so each client talks to a single server

Andrew Pochinsky | 5 Jun 2007 20:55

web site down?

Hi, everybody,
     I'm wondering if www.pvfs.org is still the site for PVFS2. If it
is, it appears to be down at the moment; if it is not, would somebody
please point me to the new place?
Thanks,
--andrew
Robert Latham | 6 Jun 2007 16:37

Re: web site down?

On Tue, Jun 05, 2007 at 02:55:14PM -0400, Andrew Pochinsky wrote:
> Hi, everybody,
>     I'm wondering if www.pvfs.org is still the site for PVFS2. If it  
> is, it appears to be down at the moment, if it is not, would somebody  
> point to the new place, please?

Hi Andrew

Clemson had a big power outage for a couple days, but everything seems
to be back on line now.  

Sorry for the troubles.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
Walter B. Ligon III | 6 Jun 2007 16:54

pvfs.org and mailing lists


Hello folks.  Last week there was a failure in a power distribution 
device here at Clemson.  The power grid here on campus is a mess, and as 
a result when this kind of thing happens they end up taking down about 
1/2 of campus to fix it.  It was supposed to be done starting the wee 
hours Monday morning and ending by 1pm on Monday.  Then it was 2pm. 
Then it was 4pm.  At 11pm I came to campus to find about 2/3 of the 
circuits running, no A/C and various other weird things going on.  Turns
out when they powered back up they blew one leg of the 3-phase line
that feeds our building.  They ended up shutting us down, and we 
remained down all day Tuesday.  Some time before 5am this morning they 
got it back up.  Everything should be working OK now.  Sorry for the 
inconvenience.

Walt
-- 
Dr. Walter B. Ligon III
Associate Professor
ECE Department
Clemson University
slang | 6 Jun 2007 19:34

Fwd: [Pvfs2-developers] pvfs.org and mailing lists


Begin forwarded message:

> From: "Walter B. Ligon III" <walt <at> clemson.edu>
> Date: June 6, 2007 9:54:59 AM CDT
> To: PVFS2-developers <pvfs2-developers <at> beowulf-underground.org>,  
> pvfs2-users <pvfs2-users <at> beowulf-underground.org>
> Subject: [Pvfs2-developers] pvfs.org and mailing lists
>
>
> Hello folks.  Last week there was a failure in a power distribution  
> device here at Clemson.  The power grid here on campus is a mess,  
> and as a result when this kind of thing happens they end up taking  
> down about 1/2 of campus to fix it.  It was supposed to be done  
> starting the wee hours Monday morning and ending by 1pm on Monday.   
> Then it was 2pm. Then it was 4pm.  At 11pm I came to campus to find  
> about 2/3 of the circuits running, no A/C and various other weird  
> things going on.  Turns  out when they powered back up they blew  
> one leg of the 3-phase line that feeds our building.  They ended up  
> shutting us down, and we remained down all day Tuesday.  Some time  
> before 5am this morning they got it back up.  Everything should be  
> working OK now.  Sorry for the inconvenience.
>
> Walt
> -- 
> Dr. Walter B. Ligon III
> Associate Professor
> ECE Department
> Clemson University

Murali Vilayannur | 7 Jun 2007 00:13

Re: unable to insert kernel module pvfs2.ko

Tom,
I think you meant to send this to the pvfs2 mailing list as well, so I'm
cc'ing pvfs2-users.
Glad that things are finally working for you!
As part of mount, you need to specify the file system name, not the
storage space directory.
That is why you are seeing this error (-ENODEV). The PVFS FAQ on
www.pvfs.org (if the website is up) is also a good source for common
errors, using the pvfs tools, etc.

mount -t pvfs2 tcp://node5:3334/pvfs2-fs /mnt/pvfs2
should do the trick.
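
To make the distinction concrete, here is a minimal Python sketch (not part of
PVFS2; the helper names are hypothetical) that assembles the device string
from the server host, port, and file system name. It only assumes the
tcp://host:port/fs-name form used in the command above.

```python
from typing import List


def pvfs2_mount_spec(host: str, port: int, fs_name: str, protocol: str = "tcp") -> str:
    """Build the device string for `mount -t pvfs2`.

    fs_name is the file system name (e.g. "pvfs2-fs"), NOT the storage
    space directory on the server.
    """
    return f"{protocol}://{host}:{port}/{fs_name}"


def mount_command(host: str, port: int, fs_name: str, mount_point: str) -> List[str]:
    """Return the argument list for the mount call (hypothetical helper)."""
    return ["mount", "-t", "pvfs2", pvfs2_mount_spec(host, port, fs_name), mount_point]


if __name__ == "__main__":
    # Values taken from this thread: server node5, port 3334, fs name pvfs2-fs.
    print(" ".join(mount_command("node5", 3334, "pvfs2-fs", "/mnt/pvfs2")))
```

Passing the storage space directory name in place of "pvfs2-fs" produces a
device string the client cannot resolve, which matches the "unknown device"
error quoted below.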

pvfs2-ping is definitely one good troubleshooting tool.
I think Sam/Phil also had some other tools (pvfs2-validate?); they are
the most qualified to answer questions about those.
Regards,
Murali

On 6/6/07, Tommy Butler <tommy <at> atrixnet.com> wrote:
> Solution: download and install debian kernel for bigmem, kernel headers for
> bigmem, make kernel modules under /usr/src/linux, make pvfs2 and pvfs2.ko,
> insmod works.
>
> Now... does pvfs2 offer any troubleshooting tools other than pvfs2-ping?
>
> mount -t pvfs2 tcp://node5:3334/pvfs2-storage-space /mnt/pvfs2
> mount: tcp://node5:3334/pvfs2-storage-space: unknown device
>

Tommy Butler | 7 Jun 2007 23:49

Re: unable to insert kernel module pvfs2.ko

I am having more trouble getting pvfs2 set up on my cluster than I've ever had with any other software in my entire career as a Linux sysadmin.  I am growing very frustrated.  One problem is overcome only for another to appear.  Right now I'm at the point where on most (not all) of my systems I can start up the pvfs2 server and the client, and mount the pvfs2 filesystem.  There are servers that still fail to get past starting the server.  But despite all this, pvfs2-ping fails every time, and I can't ever create a file on the pvfs2 fs.

Is there anyone out there who is willing to help me by logging in and investigating my setup?  It would be nice if someone were willing to do it without charge, but I may be able to arrange payment if that is the only way I can get some hands-on assistance.

Please help me, someone.  I set up pvfs2 two years ago on a cluster, and had no problems whatsoever.  None.  The installation and the deployment came off flawlessly and that filesystem is still running in production on the cluster.

This is precisely why I can't understand why I keep failing to get this operational, time and time again, day after day, week after week.

Thanks, all.

--
Tommy Butler

_______________________________________________
Pvfs2-users mailing list
Pvfs2-users <at> beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
Tommy Butler | 8 Jun 2007 04:28

pvfs2-ping says "appears to have problems"

Can anyone help me understand this error (pasted in at the bottom of this email)?  After reading over the FAQ for a second time, as Murali suggested, I still find no clues as to what I should do to fix this.

In order to aid in troubleshooting, I've put all the configuration files I have for the pvfs2 cluster in a tgz file at www.atrixnet.com/tmp/pvfs2-configs.tgz

As a side note, if anyone is wondering why there are only config files for lvsdirector2, node1, node2, and node5, it is because those are the only nodes that will start the pvfs2 server and are able to mount the pvfs2 filesystem, despite identical configurations and identical hardware (all machines are IBM zseries 1U rack mounts with 4 GB RAM and a 2 GHz CPU, IDE bus).

And now, the error from pvfs2-ping when issued from node5:

node5:~# pvfs2-ping -m /mnt/pvfs2/

(1) Parsing tab file...

(2) Initializing system interface...

(3) Initializing each file system found in tab file: /etc/pvfs2tab...

   PVFS2 servers: tcp://node5:3334
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2
   /mnt/pvfs2: Ok

(4) Searching for /mnt/pvfs2/ in pvfstab...

   PVFS2 servers: tcp://node5:3334
   Storage name: pvfs2-fs
   Local mount point: /mnt/pvfs2

   meta servers:
   tcp://lvsdirector2:3334
   tcp://node1:3334
   tcp://node2:3334
   tcp://node5:3334

   data servers:
   tcp://lvsdirector2:3334
   tcp://node1:3334
   tcp://node2:3334
   tcp://node5:3334

(5) Verifying that all servers are responding...

   meta servers:
   tcp://lvsdirector2:3334 Ok
   tcp://node1:3334 Ok
   tcp://node2:3334 Ok
   tcp://node5:3334 Ok

   data servers:
   tcp://lvsdirector2:3334 Ok
   tcp://node1:3334 Ok
   tcp://node2:3334 Ok
   tcp://node5:3334 Ok

(6) Verifying that fsid 1665071444 is acceptable to all servers...

   Ok; all servers understand fs_id 1665071444

(7) Verifying that root handle is owned by one server...

   Root handle: 1048576
Failure: check root handle failed
PVFS_mgmt_setparam_all: Detailed per-server errors are available (error class: 0)

        Exactly one server must own the root handle.
        In this setup, no servers own the root handle.

Per-server errors:
=============================================================

The PVFS2 filesystem at /mnt/pvfs2/ appears to have problems.

--
Tommy Butler

Murali Vilayannur | 8 Jun 2007 05:03

Re: pvfs2-ping says "appears to have problems"

Hi Tommy,
That is really weird.
Your config files all look ok to me.
Were all the servers pointed to the same config files when they were started?
Servers are assigned meta handle ranges and data handle ranges, i.e.
they take ownership of handle ranges, and the client sends requests to
those servers.
What the error message below says is that the handle for the root
inode does not seem to be claimed by any server, which means that it is
outside all of the servers' assigned ranges. That is extremely weird
in this case, since all the config files appear to be ok here.
Something else is going on, or you found an elusive bug in the config
file handling.
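
As a rough illustration of the ownership check described above, here is a
minimal Python sketch. It is not the PVFS2 implementation: the function names
and the range values are hypothetical, and it assumes each server's meta
handle ranges can be written as comma-separated inclusive "first-last" pairs.
Only the root handle value (1048576) comes from the pvfs2-ping output below.

```python
from typing import Dict, List, Tuple

HandleRange = Tuple[int, int]  # inclusive (first, last)


def parse_ranges(spec: str) -> List[HandleRange]:
    """Parse a string such as "4-1000,2000-3000" into inclusive ranges."""
    ranges = []
    for part in spec.split(","):
        first, last = part.strip().split("-")
        ranges.append((int(first), int(last)))
    return ranges


def owners_of(handle: int, meta_ranges: Dict[str, str]) -> List[str]:
    """Return every server whose meta handle ranges contain `handle`."""
    return [
        server
        for server, spec in meta_ranges.items()
        if any(first <= handle <= last for first, last in parse_ranges(spec))
    ]


# Hypothetical ranges for the four servers in this thread; the real values
# live in the server config files and will differ.
meta_ranges = {
    "lvsdirector2": "4-1000000",
    "node1": "1000001-2000000",
    "node2": "2000001-3000000",
    "node5": "3000001-4000000",
}

ROOT_HANDLE = 1048576  # value reported by pvfs2-ping in step (7)

if __name__ == "__main__":
    owners = owners_of(ROOT_HANDLE, meta_ranges)
    if len(owners) == 1:
        print(f"root handle {ROOT_HANDLE} is owned by {owners[0]}")
    else:
        print(f"expected exactly one owner, found {len(owners)}: {owners}")
```

An empty owner list corresponds to the "no servers own the root handle" case
reported by pvfs2-ping; more than one owner would mean overlapping ranges.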
thanks,
Murali

On 6/7/07, Tommy Butler <tommy <at> atrixnet.com> wrote:
> Can anyone help me understand this error (pasted in at the bottom of this
> email)?  After reading over the FAQ for a second time, as Murali suggested,
> I still find no clues as to what I should do to fix this.
>
> In order to aid in troubleshooting, I've put all the configuration files I
> have for the pvfs2 cluster in a tgz file at
> www.atrixnet.com/tmp/pvfs2-configs.tgz
>
> As a side note, if anyone is wondering why there are only config files for
> lvsdirector2, node1, node2, and node5, it is because those are the only
> nodes that will start the pvfs2 server and are able to mount the pvfs2
> filesystem, despite identical configurations and identical hardware (all
> machines are IBM zseries 1U rack mounts with 4 GB RAM and a 2 GHz CPU, IDE bus).
>
> And now, the error from pvfs2-ping when issued from node5:
>
> node5:~# pvfs2-ping -m /mnt/pvfs2/
>
> (1) Parsing tab file...
>
> (2) Initializing system interface...
>
> (3) Initializing each file system found in tab file: /etc/pvfs2tab...
>
>    PVFS2 servers: tcp://node5:3334
>    Storage name: pvfs2-fs
>    Local mount point: /mnt/pvfs2
>    /mnt/pvfs2: Ok
>
> (4) Searching for /mnt/pvfs2/ in pvfstab...
>
>    PVFS2 servers: tcp://node5:3334
>    Storage name: pvfs2-fs
>    Local mount point: /mnt/pvfs2
>
>    meta servers:
>    tcp://lvsdirector2:3334
>    tcp://node1:3334
>    tcp://node2:3334
>    tcp://node5:3334
>
>    data servers:
>    tcp://lvsdirector2:3334
>    tcp://node1:3334
>    tcp://node2:3334
>    tcp://node5:3334
>
> (5) Verifying that all servers are responding...
>
>    meta servers:
>    tcp://lvsdirector2:3334 Ok
>    tcp://node1:3334 Ok
>    tcp://node2:3334 Ok
>    tcp://node5:3334 Ok
>
>    data servers:
>    tcp://lvsdirector2:3334 Ok
>    tcp://node1:3334 Ok
>    tcp://node2:3334 Ok
>    tcp://node5:3334 Ok
>
> (6) Verifying that fsid 1665071444 is acceptable to all servers...
>
>    Ok; all servers understand fs_id 1665071444
>
> (7) Verifying that root handle is owned by one server...
>
>    Root handle: 1048576
> Failure: check root handle failed
> PVFS_mgmt_setparam_all: Detailed per-server errors are available (error
> class: 0)
>
>         Exactly one server must own the root handle.
>         In this setup, no servers own the root handle.
>
> Per-server errors:
> =============================================================
>
> The PVFS2 filesystem at /mnt/pvfs2/ appears to have problems.
>
> --
> Tommy Butler
Tommy Butler | 8 Jun 2007 17:32

apache serving images of pvfs2 not working

I'm just wondering if anyone else has seen this.  I can get to text/html content just fine if I serve files (or php for that matter) off of the pvfs2 filesystem.  But images are another thing.  They all stop after 0 bytes are sent.  It is so strange.

Any thoughts?  I've ***'d out the IP address/domain here...

node5:/var/www# wget http://***.***.***.***/images/store_sign-up.jpg
--10:24:48--  http://***.***.***.***/images/store_sign-up.jpg
           => `store_sign-up.jpg'
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 257,118 (251K) [image/jpeg]

 0% [                                                                                                                                                           ] 0             --.--K/s

10:24:48 (0.00 B/s) - Connection closed at byte 0. Retrying.

--10:24:49--  http://***.***.***.***/images/store_sign-up.jpg
  (try: 2) => `store_sign-up.jpg'
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 257,118 (251K) [image/jpeg]
store_sign-up.jpg has sprung into existence.
Retrying.

--10:24:51--  http://***.***.***.***/images/store_sign-up.jpg
  (try: 3) => `store_sign-up.jpg.1'
Connecting to ***.***.***.***... connected.
HTTP request sent, awaiting response... 200 OK
Length: 257,118 (251K) [image/jpeg]

 0% [                                                                                                                                                           ] 0             --.--K/s

10:24:51 (0.00 B/s) - Connection closed at byte 0. Retrying.

--10:24:54--  http://***.***.***.***/images/store_sign-up.jpg
  (try: 4) => `store_sign-up.jpg.1'
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 257,118 (251K) [image/jpeg]
store_sign-up.jpg.1 has sprung into existence.
Retrying.

--10:24:58--  http://***.***.***.***/images/store_sign-up.jpg
  (try: 5) => `store_sign-up.jpg.2'
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 257,118 (251K) [image/jpeg]

 0% [                                                                                                                                                           ] 0             --.--K/s

10:24:58 (0.00 B/s) - Connection closed at byte 0. Retrying.

--10:25:03--  http://***.***.***.***/images/store_sign-up.jpg
  (try: 6) => `store_sign-up.jpg.2'
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 257,118 (251K) [image/jpeg]
store_sign-up.jpg.2 has sprung into existence.
Retrying.

^C

--
Tommy Butler
