Chuck | 2 Oct 2007 00:33

pvfs2 as 15T filesystem

Hello,
  I am in the process of ordering five 3T 1U machines. I was originally
going to try to use gfs in this cluster to create a large contiguous
filesystem for my users. The gfs irc gents told me that their product
was not designed for this, at least not at my scale. I may need it to
scale out to 120T. I'm thinking of using pvfs2 as my filesystem. I've
looked at gfs, pvfs2, and lustre, but I like pvfs2 the most because I
can actually compile and run it pretty simply. Unfortunately, there
aren't really any system-level fallbacks, so if one of my nodes goes, so
too does my fs, and of course I can't store a db on it, which might be
nice. What do people out there suggest? Is there something else better
for my purposes? I'm not storing homes on this fs, though that might be
nice; mostly I'll be storing images and text files. Thanks for any
comments!
  -Chuck
hyeyoung cho | 3 Oct 2007 08:00

pvfs performance over IB

Hello.

I am testing the performance of PVFS2 over IB (IBGD-1.8.2), but the
performance was terrible: 70~80 MB/s using pvfs2-cp.

Has anybody observed the same behavior before? Can anybody offer some
insight?

When I measured raw IB performance using perf_main, a benchmark tool in
IBGD, it was about 400 MB/s.

My settings:
- AMD Opteron Processor 240, dual CPU
- kernel 2.6.9-55.0.2.ELsmp
- running pvfs2-2.6.3
- IBGD-1.8.2 (Topspin HCA - 2pt, 10GB, PCI-X, 128MB)

I compiled pvfs2 with the following flags:

./configure --with-kernel=/usr/src/linux-2.6 --with-ib=/usr/local/ibgd \
    --with-ib-includes=/usr/local/ibgd/driver/infinihost/include \
    --with-ib-libs=/usr/local/ibgd/driver/infinihost/lib64 \
    --with-gm=/opt/gm --with-gm-libs=/opt/gm/lib64

Configuration: 1 server, 1 client

- Client mount command:

  mount -t pvfs2 ib://c0-10-ib:3335/pvfs2-fs /mnt/pvfs2

- server pvfs2-fs.conf:

  <Aliases>
          Alias c0-10-ib ib://c0-10-ib:3335
  </Aliases>

- pvfs2-server.conf-c0-10-ib:

  StorageSpace /state/pvfs-part/pvfs2-stoarge-space
  HostID "ib://c0-10-ib:3335"
  LogFile /tmp/pvfs2-server-ib1server.log

Test pvfs2-cp:

  [root@compute-0-0 ~]# pvfs2-cp -t /state/partition1/testHY /mnt/pvfs2/5
  Wrote 429588480 bytes in 4.854962 seconds. 84.385318 MB/seconds

  [root@compute-0-0 ~]# pvfs2-cp -t -b 8388608 /state/partition1/testHY /mnt/pvfs2/aa
  Wrote 429588480 bytes in 5.449869 seconds. 75.173826 MB/seconds

  [root@compute-0-0 ~]# pvfs2-cp -t -b 4194304 /state/partition1/testHY /mnt/pvfs2/bb
  Wrote 429588480 bytes in 5.600240 seconds. 73.155347 MB/seconds

Test IB performance (using perf_main):

  perf_main --send -trc -mbw -s10240 -n1000
  BW: 391.8 MBytes/sec [size: 10240 bytes, iter: 1000, total 10240000]
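
For anyone comparing these numbers, the rates can be re-derived from the byte counts and elapsed times. The following is a minimal Python sketch, not part of the original test; the helper names are made up for illustration. It assumes the "MB" printed by pvfs2-cp means 2**20 bytes, which is what the three reported figures work out to, and it compares each run against the perf_main figure without knowing which definition of "MBytes" perf_main uses.

```python
import re

MIB = 1 << 20  # assumption: pvfs2-cp's "MB" is 2**20 bytes (consistent with the figures)
PERF_MAIN_RATE = 391.8  # MBytes/sec as reported by perf_main in this message

def parse_pvfs2_cp(line):
    """Return (bytes, seconds, reported_rate) from a pvfs2-cp -t summary line."""
    m = re.search(r"Wrote (\d+) bytes in ([\d.]+) seconds\. ([\d.]+) MB/seconds", line)
    if m is None:
        raise ValueError("not a pvfs2-cp timing line: %r" % line)
    return int(m.group(1)), float(m.group(2)), float(m.group(3))

def mib_per_second(nbytes, seconds):
    """Throughput in units of 2**20 bytes per second."""
    return nbytes / MIB / seconds

lines = [
    "Wrote 429588480 bytes in 4.854962 seconds. 84.385318 MB/seconds",
    "Wrote 429588480 bytes in 5.449869 seconds. 75.173826 MB/seconds",
    "Wrote 429588480 bytes in 5.600240 seconds. 73.155347 MB/seconds",
]

results = []
for line in lines:
    nbytes, seconds, reported = parse_pvfs2_cp(line)
    rate = mib_per_second(nbytes, seconds)
    fraction = rate / PERF_MAIN_RATE  # rough ratio; units may differ slightly
    results.append((rate, reported, fraction))
    print("%.3f computed, %.3f reported, about %.0f%% of perf_main"
          % (rate, reported, 100 * fraction))
```

On these inputs every pvfs2-cp run lands at roughly a fifth of the raw IB figure, which is the gap the rest of the thread tries to explain.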

 

Thank you in advance for any comments.

Regards,
Hyeyoung Cho

 

_______________________________________________
Pvfs2-users mailing list
Pvfs2-users <at> beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
Rob Ross | 3 Oct 2007 17:13

Re: pvfs performance over IB

you should disable TCP. -- rob

Murali Vilayannur | 3 Oct 2007 19:18

Re: pvfs2 as 15T filesystem

Hi Chuck,

>   I am in the process of ordering 5 3T 1U machines. I was originally
> going to try to use gfs in this cluster to create a large contiguous
> filesystem  for my users. The gfs irc gents told me that their product
> was not designed for this, at least not at my scale. I need it to scale
> out to 120T possibly. I'm thinking of using pvfs2 as my filesystem.

Remember that pvfs2 does not have a native on-disk format and relies on
an underlying local file system on each server. Your scaling will
therefore be limited by the on-disk FS you choose on each of the 3T nodes.
ext3's max volume size ranges from 2TB to 32TB (block sizes 1KB to 8KB).
Keep in mind that the block size cannot exceed the architecture's PAGE
size, so the larger block sizes don't work on IA32.
XFS scales up to 8 EB and hence won't have any problems.
ext4 will probably also scale to this capacity *if* it stabilizes.
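
To make the sizing concrete, here is a small Python sketch of the arithmetic. It is only an illustration: it assumes "3T", "15T" and "120T" are decimal terabytes, the function name is made up, and the volume limits are simply the figures quoted in this message, not independently checked values.

```python
TB = 10**12  # assumption: decimal terabytes
EB = 10**18

def servers_needed(target_bytes, per_server_bytes):
    """Smallest number of servers whose combined capacity reaches the target."""
    return -(-target_bytes // per_server_bytes)  # integer ceiling division

# Limits as quoted in the message above (illustrative, not verified here).
QUOTED_LIMITS = {
    "ext3, 1KB blocks": 2 * TB,
    "ext3, 8KB blocks": 32 * TB,
    "XFS": 8 * EB,
}

per_server = 3 * TB
print("servers for 15T: ", servers_needed(15 * TB, per_server))
print("servers for 120T:", servers_needed(120 * TB, per_server))

# Only the per-server volume has to fit the local file system,
# not the aggregate size of the PVFS2 volume.
fits = {}
for name, limit in QUOTED_LIMITS.items():
    fits[name] = per_server <= limit
    print("%-18s 3T local volume %s" % (name, "fits" if fits[name] else "too large"))
```

The point of the sketch is that growing from 15T to 120T adds servers rather than enlarging any single local file system, so the limit that matters is the one for a 3T volume.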

> looked at gfs, pvfs2, and lustre, but I like pvfs2 the most because I
> can actually compile and run it pretty simply. Unfortunately, there
> isn't really system level fallbacks, so if one of my nodes goes, so too
> does my fs, and of course I cant store a db on it, which might be nice.
> What do people out there suggest? Is there something else better for my
> purposes? I'm not storing homes on this fs, though that might be nice,
> mostly I'll be storing images and text files. Thanks for any comments!

It seems like PVFS2 should do just fine for storing images and text
files. Keep in mind, though, that there are no redundancy options at the
PVFS layer, and if any of the disks go bad or get corrupted there is a
good chance some or all of the FS will be left inaccessible.
Node failures are not that big a problem; it is disk failures that will
leave you in a bad state.

I cannot comment on gfs or lustre since I haven't used them.
Hope this helps,
thanks,
Murali

hyeyoung cho | 4 Oct 2007 03:23

RE: pvfs performance over IB


Hi Rob.
I had already tested with "--without-bmi-tcp"; the flag improved
performance by about 10%.

I think I found my problem.
I suspect PCI-X performance. My testbed is an old system using PCI-X 100MHz,
so PCI-X was probably my bottleneck.

Thank you for your comments.

Regards,
Hyeyoung Cho

Kyle Schochenmaier | 4 Oct 2007 04:03

Re: pvfs performance over IB

Yes, PCI-X should be a huge bottleneck here, but that doesn't explain how
you were able to get 400MB/s with the test utilities and not on the fs.
You should of course see a definite and measurable performance increase
when moving to a PCI-e platform.
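
As a rough check of that reasoning, the theoretical peak of the bus can be put next to the two measurements. This is a back-of-the-envelope Python sketch; the 64-bit bus width is an assumption (the thread only says PCI-X 100MHz), and real throughput is always below the theoretical peak.

```python
def pci_peak_mb_per_s(bus_width_bits, clock_mhz):
    """Theoretical peak of a parallel PCI/PCI-X bus in MB/s (10**6 bytes/s)."""
    return bus_width_bits / 8 * clock_mhz

pcix_peak = pci_peak_mb_per_s(64, 100)   # assumption: 64-bit PCI-X slot
perf_main = 391.8                        # MBytes/sec reported by perf_main
pvfs2_cp = 429588480 / 4.854962 / 1e6    # best pvfs2-cp run, in 10**6 bytes/s

print("PCI-X peak (assumed 64-bit): %.0f MB/s" % pcix_peak)
print("perf_main:                   %.1f MB/s" % perf_main)
print("pvfs2-cp (best run):         %.1f MB/s" % pvfs2_cp)
print("pvfs2-cp / perf_main:        %.2f" % (pvfs2_cp / perf_main))
```

Under these assumptions both measurements crossed the same bus, so the bus alone would not account for the factor of four or more between them.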

Could you let us know if this turns out to be the final solution?

~Kyle

Anthony Tong | 5 Oct 2007 19:00

2.6.3, file hole issues?

I'm getting file holes with pvfs 2.6.3 on linux 2.6 (rhel4 kernels i386,
and a vanilla 2.6 as well) on a test system and can consistently reproduce
them. Holes are about 16k+ of zeros.

Simple setup: 4 io servers, 1 metadata, over TCP, these nodes also mount
the filesystem.

I am writing gigabyte files sequentially from a client via the vfs
interface

I finally had some time to do some debugging this morning and here's
what I have found so far. "io,client" is the gossip mask set for
pvfs2-client-core-threaded.

For the first instance of the hole, I see "Posted UNKNOWN" in
the log. The offset (1301371504) corresponds with where the first
hole is in my test file.

Snippet from output of cmp -l good.file corrupt.file
1301371505 127   0
1301371506 376   0
1301371509 115   0
1301371510 221   0
1301371511 132   0
... (and so forth till)..
1301438544 110   0
1571986033   7   0

Searching for other "Posted UNKNOWN" messages, if there's a
file_req_off nearby, it corresponds to one of the other holes as well.

Gossip snippets

[D 11:09:40.531507] * mem req size is 67040, file_req size is 67040 (bytes)
[D 11:09:40.531534]   bstream_size = 325343856, datafile nr=1, ct=4, file_req_off = 1301371504
[D 11:09:40.531712]   posted flow for context 0xb4bfd720
[D 11:09:40.531790]   preposting write ack for context 0xb4bfd720.
[D 11:09:41.563356] Posted UNKNOWN (waiting for test)
[D 11:09:41.563558] Posted UNKNOWN (waiting for test)
[D 11:09:41.640702] get_config state: server_get_config_setup_msgpair
[D 11:09:41.641900] Posted PVFS_SYS_FS_ADD (waiting for test)
[D 11:09:41.644099] * Adding new dynamic mount point <DYNAMIC-1> [7,0]
[D 11:09:41.644148] PINT_server_config_mgr_add_config: adding config 0x84e6680
[D 11:09:41.644177]     mapped fs_id 1867692515 => config 0x84e6680
[D 11:09:41.644218] Set min handle recycle time to 360 seconds
[D 11:09:41.644249] Reloading handle mappings for fs_id 1867692515
[D 11:09:41.644472] PVFS_isys_io entered [1048186]
[D 11:09:41.644548] (0x84f0c68) io state: io_init
[D 11:09:41.644582] (0x84f0c68) getattr_setup_msgpair
[D 11:09:41.644702] Posted PVFS_SYS_IO (waiting for test)
[D 11:09:41.645097] trying to add object reference to acache
[D 11:09:41.645138] (0x84f0c68) getattr state: getattr_cleanup
[D 11:09:41.645169] (0x84f0c68) io state: io_datafile_setup_msgpairs
[D 11:09:41.645201] - io_find_target_datafiles called
[D 11:09:41.645279] io_find_target_datafiles: datafile[1] might have data (out=1)
[D 11:09:41.645319] io_find_target_datafiles: datafile[2] might have data (out=2)

...

[D 11:09:55.609389] * mem req size is 100272, file_req size is 100272 (bytes)
[D 11:09:55.609417]   bstream_size = 393019392, datafile nr=0, ct=4, file_req_off = 1571986032
[D 11:09:55.609526]   posted flow for context 0xb55fd318
[D 11:09:55.609554]   preposting write ack for context 0xb55fd318.
[D 11:09:56.627065] Posted UNKNOWN (waiting for test)
[D 11:09:56.627238] Posted UNKNOWN (waiting for test)
[D 11:09:56.693300] get_config state: server_get_config_setup_msgpair
[D 11:09:56.694529] Posted PVFS_SYS_FS_ADD (waiting for test)
[D 11:09:56.700558] * Adding new dynamic mount point <DYNAMIC-1> [7,0]
[D 11:09:56.700620] PINT_server_config_mgr_add_config: adding config 0x83c6680
[D 11:09:56.700650]     mapped fs_id 1867692515 => config 0x83c6680
[D 11:09:56.700692] Set min handle recycle time to 360 seconds
[D 11:09:56.700735] Reloading handle mappings for fs_id 1867692515
[D 11:09:56.700954] PVFS_isys_io entered [1048186]
[D 11:09:56.701033] (0x83d0c68) io state: io_init
[D 11:09:56.701066] (0x83d0c68) getattr_setup_msgpair
[D 11:09:56.701188] Posted PVFS_SYS_IO (waiting for test)
[D 11:09:56.701589] trying to add object reference to acache
[D 11:09:56.701631] (0x83d0c68) getattr state: getattr_cleanup
[D 11:09:56.701663] (0x83d0c68) io state: io_datafile_setup_msgpairs
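
The correspondence between the cmp output and the gossip log can be checked with a little arithmetic. The Python sketch below uses only values copied from this message (variable names are illustrative). cmp -l numbers bytes from 1, while file_req_off is taken here to be a 0-based offset, which is an assumption that the numbers happen to support.

```python
# Endpoints taken from the cmp -l output above (cmp counts bytes from 1).
first_diff_1based = 1301371505
last_diff_1based = 1301438544
next_hole_1based = 1571986033

# Values from the gossip lines that precede each "Posted UNKNOWN".
logged_requests = [
    {"file_req_off": 1301371504, "size": 67040},
    {"file_req_off": 1571986032, "size": 100272},
]

def to_offset(cmp_position):
    """Convert a 1-based cmp -l byte number to a 0-based file offset."""
    return cmp_position - 1

hole_start = to_offset(first_diff_1based)
hole_len = last_diff_1based - first_diff_1based + 1  # span of differing bytes

first, second = logged_requests
print("first hole start :", hole_start, hole_start == first["file_req_off"])
print("first hole length:", hole_len, hole_len == first["size"])
print("second hole start:", to_offset(next_hole_1based),
      to_offset(next_hole_1based) == second["file_req_off"])
```

With these numbers the first hole starts at the logged file_req_off and spans exactly the logged request size, which suggests that one whole write request went missing rather than part of one. Note that cmp -l lists only bytes that differ, so positions where the good file already held a zero do not appear in its output.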
Murali Vilayannur | 6 Oct 2007 07:59

Re: 2.6.3, file hole issues?

Hi Anthony,
Argh.. That is really bad..:(

Can you share a snippet of your code so that we can repro it locally,
fix the bug and add it to the nightlies to catch future regressions?
A couple of questions though:
- Is this seen only through the Linux VFS interface? Does it work if
it uses the pvfs system interfaces/MPI-IO?
- What distro and/or glibc version on server?

Regarding this:
> For the first instance of the hole, I see "Posted UNKNOWN" in
> the log. The offset (1301371504) corresponds with where the first
> hole is in my test file.

The message itself is harmless. It is a buglet in
client-state-machine.c's PINT_client_get_name_str():
we don't have an entry for PVFS_CLIENT_PERF_COUNT_TIMER,
hence it prints "UNKNOWN".
This gets called only on pvfs2-client-core startup, though, so what is
weird is that it appears more than twice.
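
For readers unfamiliar with this kind of buglet, here is a toy Python model of a name lookup with a missing entry. It is not the PVFS2 code: the numeric codes and the table are hypothetical, and only the operation names and the log format are taken from this thread.

```python
# Hypothetical operation codes; the real values are defined in the PVFS2 source.
PVFS_SYS_IO = 1
PVFS_SYS_FS_ADD = 2
PVFS_CLIENT_PERF_COUNT_TIMER = 3

# Name table with one entry missing, mirroring the buglet described above.
OP_NAMES = {
    PVFS_SYS_IO: "PVFS_SYS_IO",
    PVFS_SYS_FS_ADD: "PVFS_SYS_FS_ADD",
}

def get_name_str(op):
    """Return the printable name of an operation, or "UNKNOWN" if unlisted."""
    return OP_NAMES.get(op, "UNKNOWN")

for op in (PVFS_CLIENT_PERF_COUNT_TIMER, PVFS_SYS_FS_ADD, PVFS_SYS_IO):
    print("Posted %s (waiting for test)" % get_name_str(op))
```

The operation itself is still posted normally; only its label in the log is wrong, which is why the message is harmless on its own.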

I suspect a bug in the file offset handling in the kernel code, but I
need some information on what interface is being used
(readv/writev/aio/...?) and, if possible, the code itself.
thanks,
Murali

Sam Lang | 6 Oct 2007 15:41

Re: 2.6.3, file hole issues?


On Oct 5, 2007, at 12:00 PM, Anthony Tong wrote:

> I'm getting file holes with pvfs 2.6.3 on linux 2.6 (rhel4 kernels i386,
> and a vanilla 2.6 as well) on a test system and can consistently reproduce
> them. Holes are about 16k+ of zeros.
>
> Simple setup: 4 io servers, 1 metadata, over TCP, these nodes also mount
> the filesystem.
>
> I am writing gigabyte files sequentially from a client via the vfs
> interface
>
> I finally had some time to do some debugging this morning and here's
> what I have found so far. "io,client" is the gossip mask on for
> pvfs2-client-core-threaded.

Hi Anthony,

Are you able to reproduce the problem with pvfs2-client-core?  Also,
are you running pvfs2-client-core-threaded directly or through
pvfs2-client --threaded?

From the traces you've included, it looks like you're
mounting/unmounting the filesystem over and over between each IO.  Any
reason to do that?
-sam

Anthony Tong | 6 Oct 2007 18:21

Re: 2.6.3, file hole issues?

On Sat, Oct 06, 2007 at 08:41:59AM -0500, Sam Lang wrote:
> Hi Anthony,
>
> Are you able to reproduce the problem with pvfs2-client-core?  Also, are 
> you running pvfs2-client-core-threaded directly or through pvfs2-client 
> --threaded?

Just tried pvfs2-client-core for several test runs.
Seems to work ok!

Previously, I had pvfs2-client start the threaded version by:

/usr/local/pvfs2/bin/pvfs2-client -p /usr/local/pvfs2/bin/pvfs2-client-core-threaded

> From the traces you've included below, it looks like you're 
> mounting/unmounting the filesystem over and over between each IO.  Any 
> reason to do that?

I definitely am not conscious of anything doing that.

To clarify what you said & what I see in the logs: before each problematic
IO, it looks like the client daemons are remounting the fs? Maybe
the threaded client core was getting restarted for some reason?

Let me know what debug settings/logs would be useful
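
One way to test the restart theory is to scan the client log for bursts of "Posted UNKNOWN", since that message was described earlier in the thread as appearing only at pvfs2-client-core startup. The following Python sketch is a hypothetical helper, not a PVFS2 tool; it assumes the "[D HH:MM:SS.micros]" line format seen in the gossip snippets, and the sample is an abridged excerpt of those snippets.

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"^\[D (\d{2}:\d{2}:\d{2}\.\d{6})\]\s*(.*)$")

def restart_events(log_lines):
    """Return (timestamp, gap_seconds) for each burst of "Posted UNKNOWN" lines.

    gap_seconds is the time since the previous log line, or None if the
    burst is the first thing in the log.
    """
    events = []
    previous = None
    in_burst = False
    for raw in log_lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # skip lines that are not gossip debug output
        stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        if m.group(2).startswith("Posted UNKNOWN"):
            if not in_burst:
                gap = None if previous is None else (stamp - previous).total_seconds()
                events.append((m.group(1), gap))
            in_burst = True
        else:
            in_burst = False
        previous = stamp
    return events

# Abridged excerpt from the gossip snippets earlier in the thread.
sample = """\
[D 11:09:40.531790]   preposting write ack for context 0xb4bfd720.
[D 11:09:41.563356] Posted UNKNOWN (waiting for test)
[D 11:09:41.563558] Posted UNKNOWN (waiting for test)
[D 11:09:41.640702] get_config state: server_get_config_setup_msgpair
[D 11:09:55.609554]   preposting write ack for context 0xb55fd318.
[D 11:09:56.627065] Posted UNKNOWN (waiting for test)
[D 11:09:56.627238] Posted UNKNOWN (waiting for test)
""".splitlines()

events = restart_events(sample)
for stamp, gap in events:
    print("possible client-core start at %s, %.3f s after previous line" % (stamp, gap))
```

On the excerpt this flags two bursts, each about one second after a write was posted, which would fit a client core that dies mid-write and is started again. Running it over the full log would show whether every hole lines up with such a burst.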
