Kevin Davis | 1 Mar 2010 16:03

Re: Has anyone seen this ndmpcopy error message

Michael,

> We have two netapps, both model FAS270. Netapp1, the source for ndmpcopy,
> runs 7.0.4; netapp2, the destination, runs 7.2.7. We use ndmpcopy
> to copy volumes from netapp1 to netapp2. Intermittently (though it seems
> to occur more frequently now), the ndmpcopy fails to run and gives the
> following message:
>
>  Body error NDMP_ILLEGAL_STATE_ERR in reply message NdmpMessageDataStartRecover
>    from destination
>  Feb 25 06:00:05 CST [ndmpc:233]: Failed to start restore on destination
>
> There is no pattern in which volume, or what size of volume, the ndmpcopy
> operation fails on. It can occur on one of our smaller
> volumes (1g) or on our largest (765g). The error will occur when the

Intermittent failures.  Our favorite kind!

Are you able to reproduce the failure on each filer separately, with the 
source/destination being the same (netapp1<->netapp1, netapp2<->netapp2)?  How 
about in the opposite direction?

What sort of authentication are you using?  Looks like you're digging through 
/etc/log/ndmpdlog.  Is there anything telling in /etc/log/ndmpcopy.<date>?
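The combinations asked about above can be listed with a few lines of script. This is only a sketch: the volume paths are made up for illustration, and any authentication options are left out because they depend on the answer to the authentication question.

```python
from itertools import product

FILERS = ["netapp1", "netapp2"]
SRC_PATH = "/vol/ndmptest_src"  # hypothetical test volume, not from the thread
DST_PATH = "/vol/ndmptest_dst"  # hypothetical test volume, not from the thread

# The direction that already fails intermittently; no need to list it again.
KNOWN_FAILING = {("netapp1", "netapp2")}


def ndmpcopy_test_matrix(filers=FILERS, skip=KNOWN_FAILING):
    """Return one ndmpcopy command line per source/destination pair."""
    commands = []
    for src, dst in product(filers, repeat=2):
        if (src, dst) in skip:
            continue
        commands.append(f"ndmpcopy {src}:{SRC_PATH} {dst}:{DST_PATH}")
    return commands


if __name__ == "__main__":
    for line in ndmpcopy_test_matrix():
        print(line)
```

Running each listed command a number of times should show whether the failure follows one filer or only the netapp1-to-netapp2 direction.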

-Kevin

*------------------------------------------*-----------------------*
| Kevin Davis (UNIX/Storage Sysadmin)      | Natick, Massachusetts |
| 508.647.7660                             |            01760-2098 |

Kevin Davis | 1 Mar 2010 16:16

Re: NetApp scripting question


> I am putting in a new filer to replace three Windows servers and in the process I will have to create
> approx 110 shares. I have heard that there is a way of scripting the process. Does anyone have any
> information that they could provide for me on this process?

It's easy, but like any scripting task, you just need to know what you want 
ahead of time. If these shares are of the most basic sort (share name, path, 
everyone/full control access), then all you need to do is dump all the commands 
into a file and execute them on the filer or via SSH.  If you can handle a 
reboot, just append all of the new shares into <root>/etc/cifsconfig_share.cfg. 
The syntax is straightforward, and the shares will be created upon next boot.  I 
suspect the same would happen if you just restarted CIFS, but I haven't tried 
it.  Take a look at it - it's simple - and you can just as easily set different 
access to each share if you have that information too.
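As a rough illustration of "dump all the commands into a file", here is a sketch that turns a share inventory into one `cifs shares -add` line per share. The CSV layout, share names, and paths are invented for the example, and the command form should be checked against the documentation for your Data ONTAP release before running anything.

```python
import csv
import io

# Hypothetical inventory: share name, path on the filer, optional comment.
SHARES_CSV = """\
name,path,comment
eng,/vol/vol1/eng,Engineering
finance,/vol/vol1/finance,
"""


def share_commands(csv_text):
    """Build one 'cifs shares -add' command per row of the inventory."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cmd = f"cifs shares -add {row['name']} {row['path']}"
        comment = row["comment"]
        if comment:
            cmd += f' -comment "{comment}"'
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Redirect this output to a file, then execute it on the filer or via SSH.
    print("\n".join(share_commands(SHARES_CSV)))
```

Per-share access could be handled the same way by adding columns to the inventory and emitting the matching access commands after each share.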

*------------------------------------------*-----------------------*
| Kevin Davis (UNIX/Storage Sysadmin)      | Natick, Massachusetts |
| 508.647.7660                             |            01760-2098 |
| mailto:kevin.davis <at> mathworks.com         *-----------------------*
| http://www.mathworks.com                 |                       |
*------------------------------------------*-----------------------*

Adam McDougall | 1 Mar 2010 19:34

Slow aggregate/shelf, hot disk

For a long time we've known that backing up our largest volume (3.5T) was
slow.  More recently I've been investigating why, and it seems to be a
problem with only that shelf, or possibly the aggregate.  It is
several times slower than any other shelf/aggregate we have, and it
seems bottlenecked whether I am reading or writing via NFS, NDMP,
reallocate scans, etc.; that shelf is always slower.  I will probably
have a support case opened with NetApp tomorrow, but I wanted to
check with the list to see what else I can find out on my own.
When doing NDMP backups I get only around 230Mbit/sec, as opposed to
800+ on others.  The performance drops distinctly on the hour,
probably for snapshots (see pic).  Details below.  0c.25 seems like
a hot disk, but the activity on that aggr also seems too high given that
the network bandwidth is fairly small.  A 'reallocate measure' on
each of the two large volumes on aggregate hanksata0 returns a score of
'1'.

I guess my two main questions are: how do I figure out what is
causing the activity on hanksata0 (especially the hot disk, which is
sometimes at 100%), and if it's not just activity but an actual
problem, how could I further debug the slow performance to find out
which components are at fault?
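One crude way to single out a hot disk is to compare each data disk's ut% against the median for its raid group. The sketch below uses the ut% figures for the data disks from the statit output quoted later in this thread; the 1.3x threshold is an arbitrary assumption, not a NetApp recommendation.

```python
from statistics import median

# ut% per data disk in /hanksata0/plex0/rg0, copied from the statit output.
# The two parity disks (0c.16, 0c.17) are left out on purpose.
UT_PERCENT = {
    "0c.18": 63, "0c.19": 60, "0c.20": 60, "0c.21": 60,
    "0c.22": 62, "0c.23": 62, "0c.24": 61, "0c.25": 93,
    "0c.26": 61, "0c.27": 59, "0c.28": 63,
}


def hot_disks(ut_by_disk, factor=1.3):
    """Return disks whose utilization exceeds factor x the group median."""
    typical = median(ut_by_disk.values())
    return sorted(d for d, ut in ut_by_disk.items() if ut > factor * typical)


if __name__ == "__main__":
    print(hot_disks(UT_PERCENT))
```

With these numbers the median is 61%, so only 0c.25 at 93% is flagged. That disk does roughly the same number of transfers as its neighbours, which hints at a slow disk rather than an uneven workload.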

I used ndmpcopy to copy a fast volume with large files from another 
filer to a new volume on hanksata0 and hanksata1.  The volume on 
hanksata0 is slow but the one on hanksata1 is not.  Both of those 
aggregates are on the same loop with hanksata1 terminating it.

Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
layout measurement on volume scratchtest.
Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:

David McWilliams | 1 Mar 2010 19:30

Re: NetApp scripting question

This is what I did and it worked great: all qtrees were set up in no time, with all the appropriate perms.

Sláinte,

David

"Build a man a fire he'll be warm for a day, set a man on fire and he'll be warm for the rest of his life" - Terry Pratchett

Checkout my photos - http://www.panoramio.com/user/1113507


On Mon, Mar 1, 2010 at 1:23 PM, <Marek.Stopka <at> tieto.com> wrote:
Or you can put all commands into a single file and go like

toaster> source /home/createallneededcifs.txt

:-)

I like this one more than directly touching the configuration file...
________________________________________
From: owner-toasters <at> mathworks.com [owner-toasters <at> mathworks.com] On Behalf Of Kevin Davis [kdavis <at> mathworks.com]
Sent: Monday, March 01, 2010 5:16 PM
To: David McWilliams
Cc: NetApp list
Subject: Re: NetApp scripting question

> I am putting in a new filer to replace three Windows servers and in the process I will have to create
> approx 110 shares. I have heard that there is a way of scripting the process. Does anyone have any
> information that they could provide for me on this process?

It's easy, but like any scripting task, you just need to know what you want
ahead of time. If these shares are of the most basic sort (share name, path,
everyone/full control access), then all you need to do is dump all the commands
into a file and execute them on the filer or via SSH.  If you can handle a
reboot, just append all of the new shares into <root>/etc/cifsconfig_share.cfg.
The syntax is straightforward, and the shares will be created upon next boot.  I
suspect the same would happen if you just restarted CIFS, but I haven't tried
it.  Take a look at it - it's simple - and you can just as easily set different
access to each share if you have that information too.


*------------------------------------------*-----------------------*
| Kevin Davis (UNIX/Storage Sysadmin)      | Natick, Massachusetts |
| 508.647.7660                             |            01760-2098 |
| mailto:kevin.davis <at> mathworks.com         *-----------------------*
| http://www.mathworks.com                 |                       |
*------------------------------------------*-----------------------*


Adam McDougall | 1 Mar 2010 20:15

Re: Slow aggregate/shelf, hot disk

Approx late Sept-09.  I wouldn't be surprised if it was slow before that
but I have no real data to back that up.
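For reference, the fullness of both aggregates can be worked out from the df figures quoted below. This is a small sketch that assumes the columns are total, used, and available in KB (the usual df default); the function names are just for illustration.

```python
# (total_kb, used_kb) per aggregate, copied from the df output quoted below.
AGGRS = {
    "hanksata0": (6120662048, 6041632124),
    "hanksata1": (8162374688, 2191140992),
}


def percent_used(total_kb, used_kb):
    """Used space as a percentage of the aggregate's total."""
    return 100.0 * used_kb / total_kb


def free_gib(total_kb, used_kb):
    """Remaining space in GiB, assuming the inputs are in KB."""
    return (total_kb - used_kb) / (1024 * 1024)


if __name__ == "__main__":
    for name, (total, used) in AGGRS.items():
        print(f"{name}: {percent_used(total, used):.1f}% used, "
              f"{free_gib(total, used):.1f} GiB free")
```

By these figures hanksata0 is about 98.7% used with roughly 75 GiB free, while hanksata1 is under 27% used.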

On 03/01/10 13:50, Jeff Mohler wrote:
> How long has this aggregate been over 95% full?
>
>
>
> On Mon, Mar 1, 2010 at 10:34 AM, Adam McDougall <mcdouga9 <at> egr.msu.edu
> <mailto:mcdouga9 <at> egr.msu.edu>> wrote:
>
>     For a long time we've known backing up our largest volume (3.5T) was
>     slow.  More recently I've been investigating why and it seems like a
>     problem with only that shelf or possibly aggregate.  Basically it is
>     several times slower than any other shelf/aggregate we have, it
>     seems bottlenecked whether I am reading/writing from nfs, ndmp,
>     reallocate scans, etc, that shelf is always slower.  I will probably
>     have a support case opened tomorrow with netapp but I feel like
>     checking with the list to see what else I can find out on my own.
>     When doing NDMP backups I get only around 230Mbit/sec as opposed to
>     800+ on others.  The performance drops distinctly on the hour
>     probably for snapshots (see pic).  Details below.  0c.25 seems like
>     a hot disk but the activity on that aggr also seems too high since
>     the network bandwidth is fairly small.  A 'reallocate measure' on
>     the two large volumes on aggregate hanksata0 both return a score of
>     '1'.
>
>     I guess my two main questions are, how do I figure out what is
>     causing the activity on hanksata0 (especially the hot disk which is
>     sometimes at 100%) and if it's not just activity but an actual
>     problem, how could I further debug the slow performance to find out
>     what items are at fault?
>
>     I used ndmpcopy to copy a fast volume with large files from another
>     filer to a new volume on hanksata0 and hanksata1.  The volume on
>     hanksata0 is slow but the one on hanksata1 is not.  Both of those
>     aggregates are on the same loop with hanksata1 terminating it.
>
>     Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
>     layout measurement on volume scratchtest.
>     Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:
>     Allocation measurement check on
>     '/vol/scratchtest' is 2.
>
>     ^^^ almost 5 minutes!
>
>     Sun Feb 28 20:13:38 EST [hank: wafl.scan.start:info]: Starting
>     WAFL layout measurement on volume scratchtest2.
>     Sun Feb 28 20:14:12 EST [hank: wafl.reallocate.check.value:info]:
>     Allocation measurement check on
>     '/vol/scratchtest2' is 1.
>
>     ^^^ less than 1 min
>
>     When I write to scratchtest, you can see the network bandwidth jump
>     up for a few seconds then it stalls for about twice as long,
>     presumably so the filer can catch up writing, then it repeats.
>     Speed averages around 30-40MB/sec if that.
>
>     I even tried using the spare sata disk from both of these shelves
>     to make a new volume, copied scratchtest to it (which took 26
>     minutes for around 40G), and reads were equally slow as the existing
>     scratchtest, although I'm not sure if that's because a single disk is
>     too slow to prove anything, or there's a shelf problem.
>
>     hanksata0           6120662048 6041632124   79029924      99%
>     hanksata0/.snapshot  322140104   14465904  307674200       4%
>     hanksata1           8162374688 2191140992 5971233696      27%
>     hanksata1/.snapshot  429598664   39636812  389961852       9%
>
>     hanksata0 and 1 are both ds14mk2 AT but hanksata0 has
>     X268_HGEMI aka X268A-R5 (750m x 14) and hanksata1 has
>     disks X269_HGEMI aka X269A-R5 (1T x 14).  hanksata0 has
>     been around since we got the filer say around 2 years ago,
>     hanksata1 was added within the last half year.  Both
>     shelves have always had 11 data disks, 2 parity, 1 spare,
>     the aggregates were never grown.
>
>     volumes on hanksata0 besides root (all created over a year ago):
>
>     volume 1 (research):
>     NO dedupe (too big)
>     10 million inodes, approx 3.5T, 108G in snapshots
>     endures random user read/write but usually fairly light traffic.
>     Populated initially with rsync then opened to user access via NFS.
>     Sun Feb 28 21:38:11 EST [hank: wafl.reallocate.check.value:info]:
>     Allocation measurement check on '/vol/research' is 1.
>
>     volume 2 (reinstallbackups):
>     dedupe enabled
>     6.6 million files, approx 1.6T, 862G in snapshots
>     volume created over a year ago and has several dozen gigs of windows
>     PC backups written or read multiple times per week using CIFS but
>     otherwise COMPLETELY idle.  Older data is generally deleted to
>     snapshots after some weeks and the snapshots expire after a few weeks.
>     Only accessed via CIFS.
>     Mon Mar  1 12:15:58 EST [hank: wafl.reallocate.check.value:info]:
>     Allocation measurement check on '/vol/reinstallbackups' is 1.
>
>
>     hanksata1 only has one volume besides the small test ones I made,
>     it runs plenty fast.
>     dedupe enabled
>
>     4.3 million files, approx 1.6T, 12G in snapshots
>     created a few months ago on an otherwise unused new aggregate with
>     initial rsync,
>     then daily rsyncs from another fileserver that is not very active
>
>
>
>     disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
>     /hanksata0/plex0/rg0:
>     0c.16              7   5.69    0.94   1.00 55269   3.22   3.02  2439   1.52   2.71   579   0.00   ....     .   0.00   ....     .
>     0c.17              9   6.34    0.94   1.00 74308   3.84   2.86  2228   1.56   2.93   873   0.00   ....     .   0.00   ....     .
>     0c.18             63 121.00  118.86   1.01 30249   1.38   3.26  3516   0.76   5.43  2684   0.00   ....     .   0.00   ....     .
>     0c.19             60 117.74  116.69   1.00 30546   0.40   3.73  5049   0.65   5.56  2840   0.00   ....     .   0.00   ....     .
>     0c.20             60 120.82  119.66   1.02 29156   0.43   5.33  5469   0.72   4.80  3583   0.00   ....     .   0.00   ....     .
>     0c.21             60 119.37  118.25   1.02 29654   0.36   4.60  5870   0.76   5.76  3140   0.00   ....     .   0.00   ....     .
>     0c.22             62 124.87  123.32   1.02 29423   0.62   5.65  5677   0.94   3.58  2710   0.00   ....     .   0.00   ....     .
>     0c.23             62 119.48  118.35   1.03 30494   0.36   4.00  6875   0.76   5.14  3417   0.00   ....     .   0.00   ....     .
>     0c.24             61 119.08  117.96   1.02 29981   0.47   6.92  3289   0.65   3.94  2930   0.00   ....     .   0.00   ....     .
>     0c.25             93 118.17  116.72   1.03 45454   0.58   4.00 17719   0.87   4.63 11658   0.00   ....     .   0.00   ....     .
>     0c.26             61 121.40  120.27   1.04 29271   0.43   7.75  3097   0.69   5.21  2131   0.00   ....     .   0.00   ....     .
>     0c.27             59 115.75  114.81   1.03 29820   0.43   5.50  4530   0.51   6.00  3321   0.00   ....     .   0.00   ....     .
>     0c.28             63 125.53  124.15   1.01 30302   0.65   6.94  3808   0.72   3.40  5191   0.00   ....     .   0.00   ....     .
>
>     Both sata shelves are on controller 0c attached to two 3040.
>     Raid-DP in 13-disk raid groups so we have 2 parity and one spare
>     per shelf.
>     Active-Active single path HA.
>     Latest firmwares/code as of beginning of the year. 7.3.2.
>     no VMs, no snapmirror, nothing fancy that I can think of.
>     wafl scan status only shows 'active bitmap rearrangement' or
>     'container block reclamation'.
>
>     Thanks for thoughts and input!
>
>
>
>
> --
> No Signature Required
> Save The Bits, Save The World!

tmac | 1 Mar 2010 20:49

Re: Slow aggregate/shelf, hot disk

Questions:

What does the raid layout look like on the aggregate (aggr status -r aggrname)?

Did you *ever* let this aggregate fill up or get nearly full (90% or
more) before adding more disks?

If you added more disks, how were they added? In other words, what was
the layout before and after the disk add?

--tmac
         Tim McCarthy
     Principal Consultant

  RedHat Certified Engineer
   804006984323821 (RHEL4)
   805007643429572 (RHEL5)

On Mon, Mar 1, 2010 at 2:15 PM, Adam McDougall <mcdouga9 <at> egr.msu.edu> wrote:
> Approx late Sept-09.  I wouldn't be surprised if it was slow before that
> but I have no real data to back that up.
>
> On 03/01/10 13:50, Jeff Mohler wrote:
>>
>> How long has this aggregate been over 95% full?
>>
>>
>>
>> On Mon, Mar 1, 2010 at 10:34 AM, Adam McDougall <mcdouga9 <at> egr.msu.edu
>> <mailto:mcdouga9 <at> egr.msu.edu>> wrote:
>>
>>    For a long time we've known backing up our largest volume (3.5T) was
>>    slow.  More recently I've been investigating why and it seems like a
>>    problem with only that shelf or possibly aggregate.  Basically it is
>>    several times slower than any other shelf/aggregate we have, it
>>    seems bottlenecked whether I am reading/writing from nfs, ndmp,
>>    reallocate scans, etc, that shelf is always slower.  I will probably
>>    have a support case opened tomorrow with netapp but I feel like
>>    checking with the list to see what else I can find out on my own.
>>    When doing NDMP backups I get only around 230Mbit/sec as opposed to
>>    800+ on others.  The performance drops distinctly on the hour
>>    probably for snapshots (see pic).  Details below.  0c.25 seems like
>>    a hot disk but the activity on that aggr also seems too high since
>>    the network bandwidth is fairly small.  A 'reallocate measure' on
>>    the two large volumes on aggregate hanksata0 both return a score of
>>    '1'.
>>
>>    I guess my two main questions are, how do I figure out what is
>>    causing the activity on hanksata0 (especially the hot disk which is
>>    sometimes at 100%) and if it's not just activity but an actual
>>    problem, how could I further debug the slow performance to find out
>>    what items are at fault?
>>
>>    I used ndmpcopy to copy a fast volume with large files from another
>>    filer to a new volume on hanksata0 and hanksata1.  The volume on
>>    hanksata0 is slow but the one on hanksata1 is not.  Both of those
>>    aggregates are on the same loop with hanksata1 terminating it.
>>
>>    Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
>>    layout measurement on volume scratchtest.
>>    Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:
>>    Allocation measurement check on
>>    '/vol/scratchtest' is 2.
>>
>>    ^^^ almost 5 minutes!
>>
>>    Sun Feb 28 20:13:38 EST [hank: wafl.scan.start:info]: Starting
>>    WAFL layout measurement on volume scratchtest2.
>>    Sun Feb 28 20:14:12 EST [hank: wafl.reallocate.check.value:info]:
>>    Allocation measurement check on
>>    '/vol/scratchtest2' is 1.
>>
>>    ^^^ less than 1 min
>>
>>    When I write to scratchtest, you can see the network bandwidth jump
>>    up for a few seconds then it stalls for about twice as long,
>>    presumably so the filer can catch up writing, then it repeats.
>>    Speed averages around 30-40MB/sec if that.
>>
>>    I even tried using the spare sata disk from both of these shelves
>>    to make a new volume, copied scratchtest to it (which took 26
>>    minutes for around 40G), and reads were equally slow as the existing
>>    scratchtest, although I'm not sure if that's because a single disk is
>>    too slow to prove anything, or there's a shelf problem.
>>
>>    hanksata0           6120662048 6041632124   79029924      99%
>>    hanksata0/.snapshot  322140104   14465904  307674200       4%
>>    hanksata1           8162374688 2191140992 5971233696      27%
>>    hanksata1/.snapshot  429598664   39636812  389961852       9%
>>
>>    hanksata0 and 1 are both ds14mk2 AT but hanksata0 has
>>    X268_HGEMI aka X268A-R5 (750m x 14) and hanksata1 has
>>    disks X269_HGEMI aka X269A-R5 (1T x 14).  hanksata0 has
>>    been around since we got the filer say around 2 years ago,
>>    hanksata1 was added within the last half year.  Both
>>    shelves have always had 11 data disks, 2 parity, 1 spare,
>>    the aggregates were never grown.
>>
>>    volumes on hanksata0 besides root (all created over a year ago):
>>
>>    volume 1 (research):
>>    NO dedupe (too big)
>>    10 million inodes, approx 3.5T, 108G in snapshots
>>    endures random user read/write but usually fairly light traffic.
>>    Populated initially with rsync then opened to user access via NFS.
>>    Sun Feb 28 21:38:11 EST [hank: wafl.reallocate.check.value:info]:
>>    Allocation measurement check on '/vol/research' is 1.
>>
>>    volume 2 (reinstallbackups):
>>    dedupe enabled
>>    6.6 million files, approx 1.6T, 862G in snapshots
>>    volume created over a year ago and has several dozen gigs of windows
>>    PC backups written or read multiple times per week using CIFS but
>>    otherwise COMPLETELY idle.  Older data is generally deleted to
>>    snapshots after some weeks and the snapshots expire after a few weeks.
>>    Only accessed via CIFS.
>>    Mon Mar  1 12:15:58 EST [hank: wafl.reallocate.check.value:info]:
>>    Allocation measurement check on '/vol/reinstallbackups' is 1.
>>
>>
>>    hanksata1 only has one volume besides the small test ones I made,
>>    it runs plenty fast.
>>    dedupe enabled
>>
>>    4.3 million files, approx 1.6T, 12G in snapshots
>>    created a few months ago on an otherwise unused new aggregate with
>>    initial rsync,
>>    then daily rsyncs from another fileserver that is not very active
>>
>>
>>
>>    disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs
>>    cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
>>    /hanksata0/plex0/rg0:
>>    0c.16              7   5.69    0.94   1.00 55269   3.22   3.02  2439
>>       1.52   2.71   579   0.00   ....     .   0.00   ....     .
>>    0c.17              9   6.34    0.94   1.00 74308   3.84   2.86  2228
>>       1.56   2.93   873   0.00   ....     .   0.00   ....     .
>>    0c.18             63 121.00  118.86   1.01 30249   1.38   3.26  3516
>>       0.76   5.43  2684   0.00   ....     .   0.00   ....     .
>>    0c.19             60 117.74  116.69   1.00 30546   0.40   3.73  5049
>>       0.65   5.56  2840   0.00   ....     .   0.00   ....     .
>>    0c.20             60 120.82  119.66   1.02 29156   0.43   5.33  5469
>>       0.72   4.80  3583   0.00   ....     .   0.00   ....     .
>>    0c.21             60 119.37  118.25   1.02 29654   0.36   4.60  5870
>>       0.76   5.76  3140   0.00   ....     .   0.00   ....     .
>>    0c.22             62 124.87  123.32   1.02 29423   0.62   5.65  5677
>>       0.94   3.58  2710   0.00   ....     .   0.00   ....     .
>>    0c.23             62 119.48  118.35   1.03 30494   0.36   4.00  6875
>>       0.76   5.14  3417   0.00   ....     .   0.00   ....     .
>>    0c.24             61 119.08  117.96   1.02 29981   0.47   6.92  3289
>>       0.65   3.94  2930   0.00   ....     .   0.00   ....     .
>>    0c.25             93 118.17  116.72   1.03 45454   0.58   4.00 17719
>>       0.87   4.63 11658   0.00   ....     .   0.00   ....     .
>>    0c.26             61 121.40  120.27   1.04 29271   0.43   7.75  3097
>>       0.69   5.21  2131   0.00   ....     .   0.00   ....     .
>>    0c.27             59 115.75  114.81   1.03 29820   0.43   5.50  4530
>>       0.51   6.00  3321   0.00   ....     .   0.00   ....     .
>>    0c.28             63 125.53  124.15   1.01 30302   0.65   6.94  3808
>>       0.72   3.40  5191   0.00   ....     .   0.00   ....     .
>>
>>    Both sata shelves are on controller 0c attached to two 3040.
>>    Raid-DP in 13-disk raid groups so we have 2 parity and one spare
>>    per shelf.
>>    Active-Active single path HA.
>>    Latest firmwares/code as of beginning of the year. 7.3.2.
>>    no VMs, no snapmirror, nothing fancy that I can think of.
>>    wafl scan status only shows 'active bitmap rearrangement' or
>>    'container block reclamation'.
>>
>>    Thanks for thoughts and input!
>>
>>
>>
>>
>> --
>> No Signature Required
>> Save The Bits, Save The World!
>
>

Adam McDougall | 1 Mar 2010 23:58

Re: Slow aggregate/shelf, hot disk

On 03/01/10 14:49, tmac wrote:
> Questions:
>
> What does the raid layout look like on the aggregate (aggr status -r aggrname)?

hank> aggr status -r hanksata0
Aggregate hanksata0 (online, raid_dp) (block checksums)
   Plex /hanksata0/plex0 (online, normal, active)
     RAID group /hanksata0/plex0/rg0 (normal)

       RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
       --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
       dparity   0c.16   0c    1   0   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       parity    0c.17   0c    1   1   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.18   0c    1   2   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.19   0c    1   3   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.20   0c    1   4   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.21   0c    1   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.22   0c    1   6   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.23   0c    1   7   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.24   0c    1   8   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.25   0c    1   9   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.26   0c    1   10  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.27   0c    1   11  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
       data      0c.28   0c    1   12  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304

>
> Did you *ever* let this aggregate fill up or get nearly full (90% or
> more) before adding more disks?

I have never added more disks to it.  I *attempted* to once, but it 
rejected my request because the aggr would have been over 16T, which is 
why I created a second aggr just like it with bigger disks that seems to 
work just fine:

hank> aggr status -r hanksata1
Aggregate hanksata1 (online, raid_dp) (block checksums)
   Plex /hanksata1/plex0 (online, normal, active)
     RAID group /hanksata1/plex0/rg0 (normal)

       RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
       --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
       dparity   0c.39   0c    2   7   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       parity    0c.38   0c    2   6   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.44   0c    2   12  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.43   0c    2   11  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.37   0c    2   5   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.36   0c    2   4   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.42   0c    2   10  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.35   0c    2   3   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.41   0c    2   9   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.34   0c    2   2   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.40   0c    2   8   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.33   0c    2   1   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
       data      0c.32   0c    2   0   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304

>
> If you added more disks, how were they added? In other words, what was
> the layout before and after the disk add?
>
> --tmac
>           Tim McCarthy
>       Principal Consultant
>
>
>    RedHat Certified Engineer
>     804006984323821 (RHEL4)
>     805007643429572 (RHEL5)
>
>
>
> On Mon, Mar 1, 2010 at 2:15 PM, Adam McDougall<mcdouga9 <at> egr.msu.edu>  wrote:
>> Approx late Sept-09.  I wouldn't be surprised if it was slow before that
>> but I have no real data to back that up.
>>
>> On 03/01/10 13:50, Jeff Mohler wrote:
>>>
>>> How long has this aggregate been over 95% full?
>>>
>>>
>>>
>>> On Mon, Mar 1, 2010 at 10:34 AM, Adam McDougall<mcdouga9 <at> egr.msu.edu
>>> <mailto:mcdouga9 <at> egr.msu.edu>>  wrote:
>>>
>>>     For a long time we've known backing up our largest volume (3.5T) was
>>>     slow.  More recently I've been investigating why and it seems like a
>>>     problem with only that shelf or possibly aggregate.  Basically it is
>>>     several times slower than any other shelf/aggregate we have, it
>>>     seems bottlenecked whether I am reading/writing from nfs, ndmp,
>>>     reallocate scans, etc, that shelf is always slower.  I will probably
>>>     have a support case opened tomorrow with netapp but I feel like
>>>     checking with the list to see what else I can find out on my own.
>>>     When doing NDMP backups I get only around 230Mbit/sec as opposed to
>>>     800+ on others.  The performance drops distinctly on the hour
>>>     probably for snapshots (see pic).  Details below.  0c.25 seems like
>>>     a hot disk but the activity on that aggr also seems too high since
>>>     the network bandwidth is fairly small.  A 'reallocate measure' on
>>>     the two large volumes on aggregate hanksata0 both return a score of
>>>     '1'.
>>>
>>>     I guess my two main questions are, how do I figure out what is
>>>     causing the activity on hanksata0 (especially the hot disk which is
>>>     sometimes at 100%) and if it's not just activity but an actual
>>>     problem, how could I further debug the slow performance to find out
>>>     what items are at fault?
>>>
>>>     I used ndmpcopy to copy a fast volume with large files from another
>>>     filer to a new volume on hanksata0 and hanksata1.  The volume on
>>>     hanksata0 is slow but the one on hanksata1 is not.  Both of those
>>>     aggregates are on the same loop with hanksata1 terminating it.
>>>
>>>     Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
>>>     layout measurement on volume scratchtest.
>>>     Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:
>>>     Allocation measurement check on
>>>     '/vol/scratchtest' is 2.
>>>
>>>     ^^^ almost 5 minutes!
>>>
>>>     Sun Feb 28 20:13:38 EST [hank: wafl.scan.start:info]: Starting
>>>     WAFL layout measurement on volume scratchtest2.
>>>     Sun Feb 28 20:14:12 EST [hank: wafl.reallocate.check.value:info]:
>>>     Allocation measurement check on
>>>     '/vol/scratchtest2' is 1.
>>>
>>>     ^^^ less than 1 min
>>>
>>>     When I write to scratchtest, you can see the network bandwidth jump
>>>     up for a few seconds then it stalls for about twice as long,
>>>     presumably so the filer can catch up writing, then it repeats.
>>>     Speed averages around 30-40MB/sec if that.
>>>
>>>     I even tried using the spare sata disk from both of these shelves
>>>     to make a new volume, copied scratchtest to it (which took 26
>>>     minutes for around 40G), and reads were equally slow as the existing
>>>     scratchtest, although I'm not sure if that's because a single disk is
>>>     too slow to prove anything, or there's a shelf problem.
>>>
>>>     hanksata0           6120662048 6041632124   79029924      99%
>>>     hanksata0/.snapshot  322140104   14465904  307674200       4%
>>>     hanksata1           8162374688 2191140992 5971233696      27%
>>>     hanksata1/.snapshot  429598664   39636812  389961852       9%
>>>
>>>     hanksata0 and 1 are both ds14mk2 AT but hanksata0 has
>>>     X268_HGEMI aka X268A-R5 (750m x 14) and hanksata1 has
>>>     disks X269_HGEMI aka X269A-R5 (1T x 14).  hanksata0 has
>>>     been around since we got the filer say around 2 years ago,
>>>     hanksata1 was added within the last half year.  Both
>>>     shelves have always had 11 data disks, 2 parity, 1 spare,
>>>     the aggregates were never grown.
>>>
>>>     volumes on hanksata0 besides root (all created over a year ago):
>>>
>>>     volume 1 (research):
>>>     NO dedupe (too big)
>>>     10 million inodes, approx 3.5T, 108G in snapshots
>>>     endures random user read/write but usually fairly light traffic.
>>>     Populated initially with rsync then opened to user access via NFS.
>>>     Sun Feb 28 21:38:11 EST [hank: wafl.reallocate.check.value:info]:
>>>     Allocation measurement check on '/vol/research' is 1.
>>>
>>>     volume 2 (reinstallbackups):
>>>     dedupe enabled
>>>     6.6 million files, approx 1.6T, 862G in snapshots
>>>     volume created over a year ago and has several dozen gigs of windows
>>>     PC backups written or read multiple times per week using CIFS but
>>>     otherwise COMPLETELY idle.  Older data is generally deleted to
>>>     snapshots after some weeks and the snapshots expire after a few weeks.
>>>     Only accessed via CIFS.
>>>     Mon Mar  1 12:15:58 EST [hank: wafl.reallocate.check.value:info]:
>>>     Allocation measurement check on '/vol/reinstallbackups' is 1.
>>>
>>>
>>>     hanksata1 only has one volume besides the small test ones I made,
>>>     it runs plenty fast.
>>>     dedupe enabled
>>>
>>>     4.3 million files, approx 1.6T, 12G in snapshots
>>>     created a few months ago on an otherwise unused new aggregate with
>>>     initial rsync,
>>>     then daily rsyncs from another fileserver that is not very active
>>>
>>>
>>>
>>>     disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs
>>>     cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
>>>     /hanksata0/plex0/rg0:
>>>     0c.16              7   5.69    0.94   1.00 55269   3.22   3.02  2439
>>>        1.52   2.71   579   0.00   ....     .   0.00   ....     .
>>>     0c.17              9   6.34    0.94   1.00 74308   3.84   2.86  2228
>>>        1.56   2.93   873   0.00   ....     .   0.00   ....     .
>>>     0c.18             63 121.00  118.86   1.01 30249   1.38   3.26  3516
>>>        0.76   5.43  2684   0.00   ....     .   0.00   ....     .
>>>     0c.19             60 117.74  116.69   1.00 30546   0.40   3.73  5049
>>>        0.65   5.56  2840   0.00   ....     .   0.00   ....     .
>>>     0c.20             60 120.82  119.66   1.02 29156   0.43   5.33  5469
>>>        0.72   4.80  3583   0.00   ....     .   0.00   ....     .
>>>     0c.21             60 119.37  118.25   1.02 29654   0.36   4.60  5870
>>>        0.76   5.76  3140   0.00   ....     .   0.00   ....     .
>>>     0c.22             62 124.87  123.32   1.02 29423   0.62   5.65  5677
>>>        0.94   3.58  2710   0.00   ....     .   0.00   ....     .
>>>     0c.23             62 119.48  118.35   1.03 30494   0.36   4.00  6875
>>>        0.76   5.14  3417   0.00   ....     .   0.00   ....     .
>>>     0c.24             61 119.08  117.96   1.02 29981   0.47   6.92  3289
>>>        0.65   3.94  2930   0.00   ....     .   0.00   ....     .
>>>     0c.25             93 118.17  116.72   1.03 45454   0.58   4.00 17719
>>>        0.87   4.63 11658   0.00   ....     .   0.00   ....     .
>>>     0c.26             61 121.40  120.27   1.04 29271   0.43   7.75  3097
>>>        0.69   5.21  2131   0.00   ....     .   0.00   ....     .
>>>     0c.27             59 115.75  114.81   1.03 29820   0.43   5.50  4530
>>>        0.51   6.00  3321   0.00   ....     .   0.00   ....     .
>>>     0c.28             63 125.53  124.15   1.01 30302   0.65   6.94  3808
>>>        0.72   3.40  5191   0.00   ....     .   0.00   ....     .
>>>
>>>     Both sata shelves are on controller 0c attached to two 3040.
>>>     Raid-DP in 13-disk raid groups so we have 2 parity and one spare
>>>     per shelf.
>>>     Active-Active single path HA.
>>>     Latest firmwares/code as of beginning of the year. 7.3.2.
>>>     no VMs, no snapmirror, nothing fancy that I can think of.
>>>     wafl scan status only shows 'active bitmap rearrangement' or
>>>     'container block reclamation'.
>>>
>>>     Thanks for thoughts and input!
>>>
>>>
>>>
>>>
>>> --
>>> No Signature Required
>>> Save The Bits, Save The World!
>>
>>
>

tmac | 2 Mar 2010 00:27

Re: Slow aggregate/shelf, hot disk

You added more disks after the fact. Data ONTAP would not have laid
out the disks like that if they were all there to begin with.

Some things that *might* help:

1. Shut down your filer, pull half the disks out of each of shelf 1 and
shelf 2, and swap them between the shelves.
2. Make sure you are configured for multipath disk I/O
-> You should have 0a, 0b, 0c & 0d as controllers.
If you can, hook 0a to 1 (module a-in), 0c to 1 (module b-in)
If you can, hook 0b to 2 (module a-in), 0d to 2 (module b-in)
-> this gives two paths to each disk and splits all your disks into 4
paths versus 1.

If you only have two controllers, make sure one is from 0a/0b and the
other is from 0c/0d
Connect one to Shelf 1-A Module-input (then daisy chain to shelf 2)
Connect one to Shelf 2-B Module-input (then daisy chain to shelf 1)

--tmac
         Tim McCarthy
     Principal Consultant

  RedHat Certified Engineer
   804006984323821 (RHEL4)
   805007643429572 (RHEL5)

On Mon, Mar 1, 2010 at 5:58 PM, Adam McDougall <mcdouga9 <at> egr.msu.edu> wrote:
> On 03/01/10 14:49, tmac wrote:
>>
>> Questions:
>>
>> What does the raid layout look like on the aggregate (aggr status -r
>> aggrname)
>
> hank> aggr status -r hanksata0
> Aggregate hanksata0 (online, raid_dp) (block checksums)
>  Plex /hanksata0/plex0 (online, normal, active)
>    RAID group /hanksata0/plex0/rg0 (normal)
>
>      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
>      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
>      dparity   0c.16   0c    1   0   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      parity    0c.17   0c    1   1   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.18   0c    1   2   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.19   0c    1   3   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.20   0c    1   4   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.21   0c    1   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.22   0c    1   6   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.23   0c    1   7   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.24   0c    1   8   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.25   0c    1   9   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.26   0c    1   10  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.27   0c    1   11  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>      data      0c.28   0c    1   12  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>
>>
>> Did you *ever* let this aggregate fill up or get nearly full (90% or
>> more) before adding more disks?
>
> I have never added more disks to it.  I *attempted* to once, but it rejected
> my request because the aggr would have been over 16T, which is why I created
> a second aggr just like it with bigger disks that seems to work just fine:
>
> hank> aggr status -r hanksata1
> Aggregate hanksata1 (online, raid_dp) (block checksums)
>  Plex /hanksata1/plex0 (online, normal, active)
>    RAID group /hanksata1/plex0/rg0 (normal)
>
>      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
>      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
>      dparity   0c.39   0c    2   7   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      parity    0c.38   0c    2   6   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.44   0c    2   12  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.43   0c    2   11  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.37   0c    2   5   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.36   0c    2   4   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.42   0c    2   10  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.35   0c    2   3   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.41   0c    2   9   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.34   0c    2   2   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.40   0c    2   8   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.33   0c    2   1   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>      data      0c.32   0c    2   0   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>
>
>>
>> If you added more disks, how were they added? In other words, what was
>> the layout before and after the disk add?
>>
>> --tmac
>>          Tim McCarthy
>>      Principal Consultant
>>
>>
>>   RedHat Certified Engineer
>>    804006984323821 (RHEL4)
>>    805007643429572 (RHEL5)
>>
>>
>>
>> On Mon, Mar 1, 2010 at 2:15 PM, Adam McDougall<mcdouga9 <at> egr.msu.edu>
>>  wrote:
>>>
>>> Approx late Sept-09.  I wouldn't be surprised if it was slow before that
>>> but I have no real data to back that up.
>>>
>>> On 03/01/10 13:50, Jeff Mohler wrote:
>>>>
>>>> How long has this aggregate been over 95% full?
>>>>
>>>>
>>>>
>>>> On Mon, Mar 1, 2010 at 10:34 AM, Adam McDougall<mcdouga9 <at> egr.msu.edu
>>>> <mailto:mcdouga9 <at> egr.msu.edu>>  wrote:
>>>>
>>>>    For a long time we've known backing up our largest volume (3.5T) was
>>>>    slow.  More recently I've been investigating why and it seems like a
>>>>    problem with only that shelf or possibly aggregate.  Basically it is
>>>>    several times slower than any other shelf/aggregate we have, it
>>>>    seems bottlenecked whether I am reading/writing from nfs, ndmp,
>>>>    reallocate scans, etc, that shelf is always slower.  I will probably
>>>>    have a support case opened tomorrow with netapp but I feel like
>>>>    checking with the list to see what else I can find out on my own.
>>>>    When doing NDMP backups I get only around 230Mbit/sec as opposed to
>>>>    800+ on others.  The performance drops distinctly on the hour
>>>>    probably for snapshots (see pic).  Details below.  0c.25 seems like
>>>>    a hot disk but the activity on that aggr also seems too high since
>>>>    the network bandwidth is fairly small.  A 'reallocate measure' on
>>>>    the two large volumes on aggregate hanksata0 both return a score of
>>>>    '1'.
>>>>
>>>>    I guess my two main questions are, how do I figure out what is
>>>>    causing the activity on hanksata0 (especially the hot disk which is
>>>>    sometimes at 100%) and if it's not just activity but an actual
>>>>    problem, how could I further debug the slow performance to find out
>>>>    what items are at fault?
>>>>
>>>>    I used ndmpcopy to copy a fast volume with large files from another
>>>>    filer to a new volume on hanksata0 and hanksata1.  The volume on
>>>>    hanksata0 is slow but the one on hanksata1 is not.  Both of those
>>>>    aggregates are on the same loop with hanksata1 terminating it.
>>>>
>>>>    Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
>>>>    layout measurement on volume scratchtest.
>>>>    Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:
>>>>    Allocation measurement check on
>>>>    '/vol/scratchtest' is 2.
>>>>
>>>>    ^^^ almost 5 minutes!
>>>>
>>>>    Sun Feb 28 20:13:38 EST [hank: wafl.scan.start:info]: Starting
>>>>    WAFL layout measurement on volume scratchtest2.
>>>>    Sun Feb 28 20:14:12 EST [hank: wafl.reallocate.check.value:info]:
>>>>    Allocation measurement check on
>>>>    '/vol/scratchtest2' is 1.
>>>>
>>>>    ^^^ less than 1 min
>>>>
>>>>    When I write to scratchtest, you can see the network bandwidth jump
>>>>    up for a few seconds then it stalls for about twice as long,
>>>>    presumably so the filer can catch up writing, then it repeats.
>>>>    Speed averages around 30-40MB/sec if that.
>>>>
>>>>    I even tried using the spare sata disk from both of these shelves
>>>>    to make a new volume, copied scratchtest to it (which took 26
>>>>    minutes for around 40G), and reads were equally slow as the existing
>>>>    scratchtest, although I'm not sure if that's because a single disk is
>>>>    too slow to prove anything, or there's a shelf problem.
>>>>
>>>>    hanksata0           6120662048 6041632124   79029924      99%
>>>>    hanksata0/.snapshot  322140104   14465904  307674200       4%
>>>>    hanksata1           8162374688 2191140992 5971233696      27%
>>>>    hanksata1/.snapshot  429598664   39636812  389961852       9%
>>>>
>>>>    hanksata0 and 1 are both ds14mk2 AT but hanksata0 has
>>>>    X268_HGEMI aka X268A-R5 (750G x 14) and hanksata1 has
>>>>    disks X269_HGEMI aka X269A-R5 (1T x 14).  hanksata0 has
>>>>    been around since we got the filer say around 2 years ago,
>>>>    hanksata1 was added within the last half year.  Both
>>>>    shelves have always had 11 data disks, 2 parity, 1 spare,
>>>>    the aggregates were never grown.
>>>>
>>>>    volumes on hanksata0 besides root (all created over a year ago):
>>>>
>>>>    volume 1 (research):
>>>>    NO dedupe (too big)
>>>>    10 million inodes, approx 3.5T, 108G in snapshots
>>>>    endures random user read/write but usually fairly light traffic.
>>>>    Populated initially with rsync then opened to user access via NFS.
>>>>    Sun Feb 28 21:38:11 EST [hank: wafl.reallocate.check.value:info]:
>>>>    Allocation measurement check on '/vol/research' is 1.
>>>>
>>>>    volume 2 (reinstallbackups):
>>>>    dedupe enabled
>>>>    6.6 million files, approx 1.6T, 862G in snapshots
>>>>    volume created over a year ago and has several dozen gigs of windows
>>>>    PC backups written or read multiple times per week using CIFS but
>>>>    otherwise COMPLETELY idle.  Older data is generally deleted to
>>>>    snapshots after some weeks and the snapshots expire after a few
>>>> weeks.
>>>>    Only accessed via CIFS.
>>>>    Mon Mar  1 12:15:58 EST [hank: wafl.reallocate.check.value:info]:
>>>>    Allocation measurement check on '/vol/reinstallbackups' is 1.
>>>>
>>>>
>>>>    hanksata1 only has one volume besides the small test ones I made,
>>>>    it runs plenty fast.
>>>>    dedupe enabled
>>>>
>>>>    4.3 million files, approx 1.6T, 12G in snapshots
>>>>    created a few months ago on an otherwise unused new aggregate with
>>>>    initial rsync,
>>>>    then daily rsyncs from another fileserver that is not very active
>>>>
>>>>
>>>>
>>>>    disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs
>>>>    cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
>>>>    /hanksata0/plex0/rg0:
>>>>    0c.16              7   5.69    0.94   1.00 55269   3.22   3.02  2439
>>>>       1.52   2.71   579   0.00   ....     .   0.00   ....     .
>>>>    0c.17              9   6.34    0.94   1.00 74308   3.84   2.86  2228
>>>>       1.56   2.93   873   0.00   ....     .   0.00   ....     .
>>>>    0c.18             63 121.00  118.86   1.01 30249   1.38   3.26  3516
>>>>       0.76   5.43  2684   0.00   ....     .   0.00   ....     .
>>>>    0c.19             60 117.74  116.69   1.00 30546   0.40   3.73  5049
>>>>       0.65   5.56  2840   0.00   ....     .   0.00   ....     .
>>>>    0c.20             60 120.82  119.66   1.02 29156   0.43   5.33  5469
>>>>       0.72   4.80  3583   0.00   ....     .   0.00   ....     .
>>>>    0c.21             60 119.37  118.25   1.02 29654   0.36   4.60  5870
>>>>       0.76   5.76  3140   0.00   ....     .   0.00   ....     .
>>>>    0c.22             62 124.87  123.32   1.02 29423   0.62   5.65  5677
>>>>       0.94   3.58  2710   0.00   ....     .   0.00   ....     .
>>>>    0c.23             62 119.48  118.35   1.03 30494   0.36   4.00  6875
>>>>       0.76   5.14  3417   0.00   ....     .   0.00   ....     .
>>>>    0c.24             61 119.08  117.96   1.02 29981   0.47   6.92  3289
>>>>       0.65   3.94  2930   0.00   ....     .   0.00   ....     .
>>>>    0c.25             93 118.17  116.72   1.03 45454   0.58   4.00 17719
>>>>       0.87   4.63 11658   0.00   ....     .   0.00   ....     .
>>>>    0c.26             61 121.40  120.27   1.04 29271   0.43   7.75  3097
>>>>       0.69   5.21  2131   0.00   ....     .   0.00   ....     .
>>>>    0c.27             59 115.75  114.81   1.03 29820   0.43   5.50  4530
>>>>       0.51   6.00  3321   0.00   ....     .   0.00   ....     .
>>>>    0c.28             63 125.53  124.15   1.01 30302   0.65   6.94  3808
>>>>       0.72   3.40  5191   0.00   ....     .   0.00   ....     .
>>>>
>>>>    Both sata shelves are on controller 0c attached to two 3040.
>>>>    Raid-DP in 13-disk raid groups so we have 2 parity and one spare
>>>>    per shelf.
>>>>    Active-Active single path HA.
>>>>    Latest firmwares/code as of beginning of the year. 7.3.2.
>>>>    no VMs, no snapmirror, nothing fancy that I can think of.
>>>>    wafl scan status only shows 'active bitmap rearrangement' or
>>>>    'container block reclamation'.
>>>>
>>>>    Thanks for thoughts and input!
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> No Signature Required
>>>> Save The Bits, Save The World!
>>>
>>>
>>
>
>

tmac | 2 Mar 2010 00:29

Re: Slow aggregate/shelf, hot disk

Sorry, I missed the active/active part.
Multipath cabling is slightly different for a clustered system. Please
refer to the multipath I/O guide on NOW for proper cabling techniques.

--tmac
         Tim McCarthy
     Principal Consultant

  RedHat Certified Engineer
   804006984323821 (RHEL4)
   805007643429572 (RHEL5)

On Mon, Mar 1, 2010 at 6:27 PM, tmac <tmacmd <at> gmail.com> wrote:
> You added more disks after the fact. Data ONTAP would not have laid
> out the disks like that if they were all there to begin with.
>
> Some things that *might* help:
>
> 1. Shut down your filer, pull half the disks out of each of shelf 1 and
> shelf 2, and swap them between the shelves.
> 2. Make sure you are configured for multipath disk I/O
> -> You should have 0a, 0b, 0c & 0d as controllers.
> If you can, hook 0a to 1 (module a-in), 0c to 1 (module b-in)
> If you can, hook 0b to 2 (module a-in), 0d to 2 (module b-in)
> -> this gives two paths to each disk and splits all your disks into 4
> paths versus 1.
>
> If you only have two controllers, make sure one is from 0a/0b and the
> other is from 0c/0d
> Connect one to Shelf 1-A Module-input (then daisy chain to shelf 2)
> Connect one to Shelf 2-B Module-input (then daisy chain to shelf 1)
>
> --tmac
>         Tim McCarthy
>     Principal Consultant
>
>  RedHat Certified Engineer
>   804006984323821 (RHEL4)
>   805007643429572 (RHEL5)
>
>
>
> On Mon, Mar 1, 2010 at 5:58 PM, Adam McDougall <mcdouga9 <at> egr.msu.edu> wrote:
>> On 03/01/10 14:49, tmac wrote:
>>>
>>> Questions:
>>>
>>> What does the raid layout look like on the aggregate (aggr status -r
>>> aggrname)
>>
>> hank> aggr status -r hanksata0
>> Aggregate hanksata0 (online, raid_dp) (block checksums)
>>  Plex /hanksata0/plex0 (online, normal, active)
>>    RAID group /hanksata0/plex0/rg0 (normal)
>>
>>      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
>>      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
>>      dparity   0c.16   0c    1   0   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      parity    0c.17   0c    1   1   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.18   0c    1   2   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.19   0c    1   3   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.20   0c    1   4   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.21   0c    1   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.22   0c    1   6   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.23   0c    1   7   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.24   0c    1   8   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.25   0c    1   9   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.26   0c    1   10  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.27   0c    1   11  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>      data      0c.28   0c    1   12  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>
>>>
>>> Did you *ever* let this aggregate fill up or get nearly full (90% or
>>> more) before adding more disks?
>>
>> I have never added more disks to it.  I *attempted* to once, but it rejected
>> my request because the aggr would have been over 16T, which is why I created
>> a second aggr just like it with bigger disks that seems to work just fine:
>>
>> hank> aggr status -r hanksata1
>> Aggregate hanksata1 (online, raid_dp) (block checksums)
>>  Plex /hanksata1/plex0 (online, normal, active)
>>    RAID group /hanksata1/plex0/rg0 (normal)
>>
>>      RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
>>      --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
>>      dparity   0c.39   0c    2   7   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      parity    0c.38   0c    2   6   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.44   0c    2   12  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.43   0c    2   11  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.37   0c    2   5   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.36   0c    2   4   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.42   0c    2   10  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.35   0c    2   3   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.41   0c    2   9   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.34   0c    2   2   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.40   0c    2   8   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.33   0c    2   1   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>      data      0c.32   0c    2   0   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>
>>
>>>
>>> If you added more disks, how were they added? In other words, what was
>>> the layout before and after the disk add?
>>>
>>> --tmac
>>>          Tim McCarthy
>>>      Principal Consultant
>>>
>>>
>>>   RedHat Certified Engineer
>>>    804006984323821 (RHEL4)
>>>    805007643429572 (RHEL5)
>>>
>>>
>>>
>>> On Mon, Mar 1, 2010 at 2:15 PM, Adam McDougall<mcdouga9 <at> egr.msu.edu>
>>>  wrote:
>>>>
>>>> Approx late Sept-09.  I wouldn't be surprised if it was slow before that
>>>> but I have no real data to back that up.
>>>>
>>>> On 03/01/10 13:50, Jeff Mohler wrote:
>>>>>
>>>>> How long has this aggregate been over 95% full?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 1, 2010 at 10:34 AM, Adam McDougall<mcdouga9 <at> egr.msu.edu
>>>>> <mailto:mcdouga9 <at> egr.msu.edu>>  wrote:
>>>>>
>>>>>    For a long time we've known backing up our largest volume (3.5T) was
>>>>>    slow.  More recently I've been investigating why and it seems like a
>>>>>    problem with only that shelf or possibly aggregate.  Basically it is
>>>>>    several times slower than any other shelf/aggregate we have, it
>>>>>    seems bottlenecked whether I am reading/writing from nfs, ndmp,
>>>>>    reallocate scans, etc, that shelf is always slower.  I will probably
>>>>>    have a support case opened tomorrow with netapp but I feel like
>>>>>    checking with the list to see what else I can find out on my own.
>>>>>    When doing NDMP backups I get only around 230Mbit/sec as opposed to
>>>>>    800+ on others.  The performance drops distinctly on the hour
>>>>>    probably for snapshots (see pic).  Details below.  0c.25 seems like
>>>>>    a hot disk but the activity on that aggr also seems too high since
>>>>>    the network bandwidth is fairly small.  A 'reallocate measure' on
>>>>>    the two large volumes on aggregate hanksata0 both return a score of
>>>>>    '1'.
>>>>>
>>>>>    I guess my two main questions are, how do I figure out what is
>>>>>    causing the activity on hanksata0 (especially the hot disk which is
>>>>>    sometimes at 100%) and if it's not just activity but an actual
>>>>>    problem, how could I further debug the slow performance to find out
>>>>>    what items are at fault?
>>>>>
>>>>>    I used ndmpcopy to copy a fast volume with large files from another
>>>>>    filer to a new volume on hanksata0 and hanksata1.  The volume on
>>>>>    hanksata0 is slow but the one on hanksata1 is not.  Both of those
>>>>>    aggregates are on the same loop with hanksata1 terminating it.
>>>>>
>>>>>    Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
>>>>>    layout measurement on volume scratchtest.
>>>>>    Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:
>>>>>    Allocation measurement check on
>>>>>    '/vol/scratchtest' is 2.
>>>>>
>>>>>    ^^^ almost 5 minutes!
>>>>>
>>>>>    Sun Feb 28 20:13:38 EST [hank: wafl.scan.start:info]: Starting
>>>>>    WAFL layout measurement on volume scratchtest2.
>>>>>    Sun Feb 28 20:14:12 EST [hank: wafl.reallocate.check.value:info]:
>>>>>    Allocation measurement check on
>>>>>    '/vol/scratchtest2' is 1.
>>>>>
>>>>>    ^^^ less than 1 min
>>>>>
>>>>>    When I write to scratchtest, you can see the network bandwidth jump
>>>>>    up for a few seconds then it stalls for about twice as long,
>>>>>    presumably so the filer can catch up writing, then it repeats.
>>>>>    Speed averages around 30-40MB/sec if that.
>>>>>
>>>>>    I even tried using the spare sata disk from both of these shelves
>>>>>    to make a new volume, copied scratchtest to it (which took 26
>>>>>    minutes for around 40G), and reads were equally slow as the existing
>>>>>    scratchtest, although I'm not sure if that's because a single disk is
>>>>>    too slow to prove anything, or there's a shelf problem.
>>>>>
>>>>>    hanksata0           6120662048 6041632124   79029924      99%
>>>>>    hanksata0/.snapshot  322140104   14465904  307674200       4%
>>>>>    hanksata1           8162374688 2191140992 5971233696      27%
>>>>>    hanksata1/.snapshot  429598664   39636812  389961852       9%
>>>>>
>>>>>    hanksata0 and 1 are both ds14mk2 AT but hanksata0 has
>>>>>    X268_HGEMI aka X268A-R5 (750G x 14) and hanksata1 has
>>>>>    disks X269_HGEMI aka X269A-R5 (1T x 14).  hanksata0 has
>>>>>    been around since we got the filer say around 2 years ago,
>>>>>    hanksata1 was added within the last half year.  Both
>>>>>    shelves have always had 11 data disks, 2 parity, 1 spare,
>>>>>    the aggregates were never grown.
>>>>>
>>>>>    volumes on hanksata0 besides root (all created over a year ago):
>>>>>
>>>>>    volume 1 (research):
>>>>>    NO dedupe (too big)
>>>>>    10 million inodes, approx 3.5T, 108G in snapshots
>>>>>    endures random user read/write but usually fairly light traffic.
>>>>>    Populated initially with rsync then opened to user access via NFS.
>>>>>    Sun Feb 28 21:38:11 EST [hank: wafl.reallocate.check.value:info]:
>>>>>    Allocation measurement check on '/vol/research' is 1.
>>>>>
>>>>>    volume 2 (reinstallbackups):
>>>>>    dedupe enabled
>>>>>    6.6 million files, approx 1.6T, 862G in snapshots
>>>>>    volume created over a year ago and has several dozen gigs of windows
>>>>>    PC backups written or read multiple times per week using CIFS but
>>>>>    otherwise COMPLETELY idle.  Older data is generally deleted to
>>>>>    snapshots after some weeks and the snapshots expire after a few
>>>>> weeks.
>>>>>    Only accessed via CIFS.
>>>>>    Mon Mar  1 12:15:58 EST [hank: wafl.reallocate.check.value:info]:
>>>>>    Allocation measurement check on '/vol/reinstallbackups' is 1.
>>>>>
>>>>>
>>>>>    hanksata1 only has one volume besides the small test ones I made,
>>>>>    it runs plenty fast.
>>>>>    dedupe enabled
>>>>>
>>>>>    4.3 million files, approx 1.6T, 12G in snapshots
>>>>>    created a few months ago on an otherwise unused new aggregate with
>>>>>    initial rsync,
>>>>>    then daily rsyncs from another fileserver that is not very active
>>>>>
>>>>>
>>>>>
>>>>>    disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs
>>>>>    cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
>>>>>    /hanksata0/plex0/rg0:
>>>>>    0c.16              7   5.69    0.94   1.00 55269   3.22   3.02  2439
>>>>>       1.52   2.71   579   0.00   ....     .   0.00   ....     .
>>>>>    0c.17              9   6.34    0.94   1.00 74308   3.84   2.86  2228
>>>>>       1.56   2.93   873   0.00   ....     .   0.00   ....     .
>>>>>    0c.18             63 121.00  118.86   1.01 30249   1.38   3.26  3516
>>>>>       0.76   5.43  2684   0.00   ....     .   0.00   ....     .
>>>>>    0c.19             60 117.74  116.69   1.00 30546   0.40   3.73  5049
>>>>>       0.65   5.56  2840   0.00   ....     .   0.00   ....     .
>>>>>    0c.20             60 120.82  119.66   1.02 29156   0.43   5.33  5469
>>>>>       0.72   4.80  3583   0.00   ....     .   0.00   ....     .
>>>>>    0c.21             60 119.37  118.25   1.02 29654   0.36   4.60  5870
>>>>>       0.76   5.76  3140   0.00   ....     .   0.00   ....     .
>>>>>    0c.22             62 124.87  123.32   1.02 29423   0.62   5.65  5677
>>>>>       0.94   3.58  2710   0.00   ....     .   0.00   ....     .
>>>>>    0c.23             62 119.48  118.35   1.03 30494   0.36   4.00  6875
>>>>>       0.76   5.14  3417   0.00   ....     .   0.00   ....     .
>>>>>    0c.24             61 119.08  117.96   1.02 29981   0.47   6.92  3289
>>>>>       0.65   3.94  2930   0.00   ....     .   0.00   ....     .
>>>>>    0c.25             93 118.17  116.72   1.03 45454   0.58   4.00 17719
>>>>>       0.87   4.63 11658   0.00   ....     .   0.00   ....     .
>>>>>    0c.26             61 121.40  120.27   1.04 29271   0.43   7.75  3097
>>>>>       0.69   5.21  2131   0.00   ....     .   0.00   ....     .
>>>>>    0c.27             59 115.75  114.81   1.03 29820   0.43   5.50  4530
>>>>>       0.51   6.00  3321   0.00   ....     .   0.00   ....     .
>>>>>    0c.28             63 125.53  124.15   1.01 30302   0.65   6.94  3808
>>>>>       0.72   3.40  5191   0.00   ....     .   0.00   ....     .
>>>>>
>>>>>    Both sata shelves are on controller 0c attached to two 3040.
>>>>>    Raid-DP in 13-disk raid groups so we have 2 parity and one spare
>>>>>    per shelf.
>>>>>    Active-Active single path HA.
>>>>>    Latest firmwares/code as of beginning of the year. 7.3.2.
>>>>>    no VMs, no snapmirror, nothing fancy that I can think of.
>>>>>    wafl scan status only shows 'active bitmap rearrangement' or
>>>>>    'container block reclamation'.
>>>>>
>>>>>    Thanks for thoughts and input!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> No Signature Required
>>>>> Save The Bits, Save The World!
>>>>
>>>>
>>>
>>
>>
>

Adam McDougall | 2 Mar 2010 01:48

Re: Slow aggregate/shelf, hot disk

Okay, you caught me (sort of).  I looked back in my documentation just to
see.  On May 2 2008 I installed this filer by netbooting 7.2.4 and
zeroing all the disks.  The installer set up 3 disks automatically:

Fri May  2 00:26:23 GMT [raid.vol.disk.add.done:notice]: Addition of 
Disk /aggr0/plex0/rg0/0c.18 Shelf 1 Bay 2 [NETAPP   X268_HGEMIT75SSX 
A90A] S/N [P8G8W6ZF] to aggregate aggr0 has completed successfully
Fri May  2 00:26:23 GMT [raid.vol.disk.add.done:notice]: Addition of 
Disk /aggr0/plex0/rg0/0c.17 Shelf 1 Bay 1 [NETAPP   X268_HGEMIT75SSX 
A90A] S/N [P8G8TT2F] to aggregate aggr0 has completed successfully
Fri May  2 00:26:23 GMT [raid.vol.disk.add.done:notice]: Addition of 
Disk /aggr0/plex0/rg0/0c.16 Shelf 1 Bay 0 [NETAPP   X268_HGEMIT75SSX 
A90A] S/N [P8G8WG4F] to aggregate aggr0 has completed successfully

I renamed the aggr to hanksata0, renamed vol0 to root, set the raidsize 
to 13, then I added 10 of the 11 spares on that shelf to hanksata0 with 
this command:
aggr add hanksata0 -d 0c.19 0c.20 0c.21 0c.22 0c.23 0c.24 0c.25 0c.26 0c.27 0c.28

Then I created my data volume:
vol create research -l C hanksata0 2900g
and eventually copied data to it.

I found this in a weekly status email from the filer dated 5/18/2008:

===== DF-R =====
Filesystem                   kbytes       used      avail   reserved  Mounted on
/vol/root/                 20971520     267724   20703796          0  /vol/root/
/vol/root/.snapshot         5242880      57268    5185612          0  /vol/root/.snapshot
/vol/research/           2949644288   54024652 2895619636          0  /vol/research/
/vol/research/.snapshot    91226112      40836   91185276          0  /vol/research/.snapshot

===== DF-A =====
Aggregate               kbytes       used      avail capacity
hanksata0           6120662048 3067381464 3053280584      50%
hanksata0/.snapshot  322140104    1433816  320706288       0%

So as of 5/18/2008 the aggregate existed at its current size with nearly 
nothing on it (about 54 gigs used in research, by the look of it).
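
As a rough check of those numbers, here is a minimal sketch that converts 
the kbytes figures from the df output above.  It assumes df reports kbytes 
in 1024-byte units, and the helper name is made up for illustration.

```python
# Figures copied from the 5/18/2008 df -r / df -A output above (kbytes).
KB = 1024  # assumption: one df "kbyte" is 1024 bytes


def kbytes_to_gib(kbytes):
    """Convert a df 'kbytes' value to GiB (hypothetical helper)."""
    return kbytes * KB / 2**30


research_used = 54024652    # /vol/research/ "used"
aggr_total = 6120662048     # hanksata0 "kbytes"
aggr_used = 3067381464      # hanksata0 "used"

print(f"research used:  {kbytes_to_gib(research_used):.1f} GiB")
print(f"aggregate used: {kbytes_to_gib(aggr_used):.0f} GiB "
      f"({100 * aggr_used / aggr_total:.0f}% of the aggregate)")
```

The 50% aggregate figure is probably the space reserved for the 2900g 
volume plus root rather than stored data; that reading is an assumption, 
not something the status email states.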

I'm using 0a/0b on both filers for connections to two other FC loops 
which were set up at different times, so I did what I could with the 
resources I had.  Yes, ideally I should order more FC interfaces and 
whatnot to have multipath to each loop, but our usage can tolerate a 
cluster failover if I lose a cable.  Smaller stripes would probably be 
better, but our setup can perform much faster than we need it to, so I 
went for shelf-sized aggregates.

I did consider swapping some or all disks between shelves to see what it 
does, but I'm evaluating my options first and starting with gentle 
changes because it's only the backups that are a concern at this time. 
Users are not reporting problems, so I don't want to introduce downtime 
yet.  Thanks.

On 03/01/10 18:27, tmac wrote:
> You added more disks after the fact. Data ONTAP would not have laid
> out the disks like that if they were all there to begin with.
>
> Some things that *might* help:
>
> 1. Shut down your filer. pull half the disks out of shelf 1 and shelf
> two and swap them
> 2. Make sure you are configured for multipath disk I/O
> ->  You should have 0a, 0b, 0c & 0d as controllers.
> If you can, hook 0a to 1 (module a-in), 0c to 1 (module b-in)
> If you can, hook 0b to 2 (module a-in), 0d to 2 (module b-in)
> ->  this gives two paths to each disk and splits all your disks into 4
> paths versus 1.
>
> If you only have two controllers, make sure one is from 0a/0b and the
> other is from 0c/0d
> Connect one to Shelf 1-A Module-input (then daisy chain to shelf 2)
> Connect one to Shelf 2-B Module-input (then daisy chain to shelf 1)
>
> --tmac
>           Tim McCarthy
>       Principal Consultant
>
>    RedHat Certified Engineer
>     804006984323821 (RHEL4)
>     805007643429572 (RHEL5)
>
>
>
> On Mon, Mar 1, 2010 at 5:58 PM, Adam McDougall<mcdouga9 <at> egr.msu.edu>  wrote:
>> On 03/01/10 14:49, tmac wrote:
>>>
>>> Questions:
>>>
>>> What does the raid layout look like on the aggregate (aggr status -r
>>> aggrname)
>>
>> hank>  aggr status -r hanksata0
>> Aggregate hanksata0 (online, raid_dp) (block checksums)
>>   Plex /hanksata0/plex0 (online, normal, active)
>>     RAID group /hanksata0/plex0/rg0 (normal)
>>
>>       RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
>>       --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
>>       dparity   0c.16   0c    1   0   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       parity    0c.17   0c    1   1   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.18   0c    1   2   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.19   0c    1   3   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.20   0c    1   4   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.21   0c    1   5   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.22   0c    1   6   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.23   0c    1   7   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.24   0c    1   8   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.25   0c    1   9   FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.26   0c    1   10  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.27   0c    1   11  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>       data      0c.28   0c    1   12  FC:B   -  ATA   7200 635555/1301618176 635858/1302238304
>>
>>>
>>> Did you *ever* let this aggregate fill up or get nearly full (90% or
>>> more) before adding more disks?
>>
>> I have never added more disks to it.  I *attempted* to once, but it rejected
>> my request because the aggr would have been over 16T, which is why I created
>> a second aggr just like it with bigger disks that seems to work just fine:
>>
>> hank>  aggr status -r hanksata1
>> Aggregate hanksata1 (online, raid_dp) (block checksums)
>>   Plex /hanksata1/plex0 (online, normal, active)
>>     RAID group /hanksata1/plex0/rg0 (normal)
>>
>>       RAID Disk Device  HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
>>       --------- ------  ------------- ---- ---- ---- ----- --------------    --------------
>>       dparity   0c.39   0c    2   7   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       parity    0c.38   0c    2   6   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.44   0c    2   12  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.43   0c    2   11  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.37   0c    2   5   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.36   0c    2   4   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.42   0c    2   10  FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.35   0c    2   3   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.41   0c    2   9   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.34   0c    2   2   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.40   0c    2   8   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.33   0c    2   1   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>       data      0c.32   0c    2   0   FC:B   -  ATA   7200 847555/1735794176 847827/1736350304
>>
>>
>>>
>>> If you added more disks, how were they added? In other words, what was
>>> the layout before and after the disk add?
>>>
>>> --tmac
>>>           Tim McCarthy
>>>       Principal Consultant
>>>
>>>
>>>    RedHat Certified Engineer
>>>     804006984323821 (RHEL4)
>>>     805007643429572 (RHEL5)
>>>
>>>
>>>
>>> On Mon, Mar 1, 2010 at 2:15 PM, Adam McDougall<mcdouga9 <at> egr.msu.edu>
>>>   wrote:
>>>>
>>>> Approx late Sept-09.  I wouldn't be surprised if it was slow before that
>>>> but I have no real data to back that up.
>>>>
>>>> On 03/01/10 13:50, Jeff Mohler wrote:
>>>>>
>>>>> How long has this aggregate been over 95% full?
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 1, 2010 at 10:34 AM, Adam McDougall<mcdouga9 <at> egr.msu.edu
>>>>> <mailto:mcdouga9 <at> egr.msu.edu>>    wrote:
>>>>>
>>>>>     For a long time we've known backing up our largest volume (3.5T) was
>>>>>     slow.  More recently I've been investigating why and it seems like a
>>>>>     problem with only that shelf or possibly aggregate.  Basically it is
>>>>>     several times slower than any other shelf/aggregate we have, it
>>>>>     seems bottlenecked whether I am reading/writing from nfs, ndmp,
>>>>>     reallocate scans, etc, that shelf is always slower.  I will probably
>>>>>     have a support case opened tomorrow with netapp but I feel like
>>>>>     checking with the list to see what else I can find out on my own.
>>>>>     When doing NDMP backups I get only around 230Mbit/sec as opposed to
>>>>>     800+ on others.  The performance drops distinctly on the hour
>>>>>     probably for snapshots (see pic).  Details below.  0c.25 seems like
>>>>>     a hot disk but the activity on that aggr also seems too high since
>>>>>     the network bandwidth is fairly small.  A 'reallocate measure' on
>>>>>     the two large volumes on aggregate hanksata0 both return a score of
>>>>>     '1'.
>>>>>
>>>>>     I guess my two main questions are: how do I figure out what is
>>>>>     causing the activity on hanksata0 (especially the hot disk, which is
>>>>>     sometimes at 100%), and if it's not just activity but an actual
>>>>>     problem, how could I further debug the slow performance to find out
>>>>>     what items are at fault?
>>>>>
>>>>>     I used ndmpcopy to copy a fast volume with large files from another
>>>>>     filer to a new volume on hanksata0 and hanksata1.  The volume on
>>>>>     hanksata0 is slow but the one on hanksata1 is not.  Both of those
>>>>>     aggregates are on the same loop with hanksata1 terminating it.
>>>>>
>>>>>     Sun Feb 28 20:14:20 EST [hank: wafl.scan.start:info]: Starting WAFL
>>>>>     layout measurement on volume scratchtest.
>>>>>     Sun Feb 28 20:19:01 EST [hank: wafl.reallocate.check.value:info]:
>>>>>     Allocation measurement check on
>>>>>     '/vol/scratchtest' is 2.
>>>>>
>>>>>     ^^^ almost 5 minutes!
>>>>>
>>>>>     Sun Feb 28 20:13:38 EST [hank: wafl.scan.start:info]: Starting
>>>>>     WAFL layout measurement on volume scratchtest2.
>>>>>     Sun Feb 28 20:14:12 EST [hank: wafl.reallocate.check.value:info]:
>>>>>     Allocation measurement check on
>>>>>     '/vol/scratchtest2' is 1.
>>>>>
>>>>>     ^^^ less than 1 min
>>>>>
>>>>>     When I write to scratchtest, you can see the network bandwidth jump
>>>>>     up for a few seconds then it stalls for about twice as long,
>>>>>     presumably so the filer can catch up writing, then it repeats.
>>>>>     Speed averages around 30-40MB/sec if that.
>>>>>
>>>>>     I even tried using the spare sata disk from both of these shelves
>>>>>     to make a new volume, copied scratchtest to it (which took 26
>>>>>     minutes for around 40G), and reads were just as slow as on the
>>>>>     existing scratchtest, although I'm not sure if that's because a
>>>>>     single disk is too slow to prove anything, or there's a shelf problem.
>>>>>
>>>>>     hanksata0           6120662048 6041632124   79029924      99%
>>>>>     hanksata0/.snapshot  322140104   14465904  307674200       4%
>>>>>     hanksata1           8162374688 2191140992 5971233696      27%
>>>>>     hanksata1/.snapshot  429598664   39636812  389961852       9%
>>>>>
>>>>>     hanksata0 and 1 are both ds14mk2 AT but hanksata0 has
>>>>>     X268_HGEMI aka X268A-R5 (750m x 14) and hanksata1 has
>>>>>     disks X269_HGEMI aka X269A-R5 (1T x 14).  hanksata0 has
>>>>>     been around since we got the filer say around 2 years ago,
>>>>>     hanksata1 was added within the last half year.  Both
>>>>>     shelves have always had 11 data disks, 2 parity, 1 spare,
>>>>>     the aggregates were never grown.
>>>>>
>>>>>     volumes on hanksata0 besides root (all created over a year ago):
>>>>>
>>>>>     volume 1 (research):
>>>>>     NO dedupe (too big)
>>>>>     10 million inodes, approx 3.5T, 108G in snapshots
>>>>>     endures random user read/write but usually fairly light traffic.
>>>>>     Populated initially with rsync then opened to user access via NFS.
>>>>>     Sun Feb 28 21:38:11 EST [hank: wafl.reallocate.check.value:info]:
>>>>>     Allocation measurement check on '/vol/research' is 1.
>>>>>
>>>>>     volume 2 (reinstallbackups):
>>>>>     dedupe enabled
>>>>>     6.6 million files, approx 1.6T, 862G in snapshots
>>>>>     volume created over a year ago and has several dozen gigs of windows
>>>>>     PC backups written or read multiple times per week using CIFS but
>>>>>     otherwise COMPLETELY idle.  Older data is generally deleted to
>>>>>     snapshots after some weeks and the snapshots expire after a few
>>>>>     weeks.
>>>>>     Only accessed via CIFS.
>>>>>     Mon Mar  1 12:15:58 EST [hank: wafl.reallocate.check.value:info]:
>>>>>     Allocation measurement check on '/vol/reinstallbackups' is 1.
>>>>>
>>>>>
>>>>>     hanksata1 only has one volume besides the small test ones I made,
>>>>>     it runs plenty fast.
>>>>>     dedupe enabled
>>>>>
>>>>>     4.3 million files, approx 1.6T, 12G in snapshots
>>>>>     created a few months ago on an otherwise unused new aggregate with
>>>>>     initial rsync,
>>>>>     then daily rsyncs from another fileserver that is not very active
>>>>>
>>>>>
>>>>>
>>>>>     disk             ut%  xfers  ureads--chain-usecs writes--chain-usecs cpreads-chain-usecs greads--chain-usecs gwrites-chain-usecs
>>>>>     /hanksata0/plex0/rg0:
>>>>>     0c.16              7   5.69    0.94   1.00 55269   3.22   3.02  2439   1.52   2.71   579   0.00   ....     .   0.00   ....     .
>>>>>     0c.17              9   6.34    0.94   1.00 74308   3.84   2.86  2228   1.56   2.93   873   0.00   ....     .   0.00   ....     .
>>>>>     0c.18             63 121.00  118.86   1.01 30249   1.38   3.26  3516   0.76   5.43  2684   0.00   ....     .   0.00   ....     .
>>>>>     0c.19             60 117.74  116.69   1.00 30546   0.40   3.73  5049   0.65   5.56  2840   0.00   ....     .   0.00   ....     .
>>>>>     0c.20             60 120.82  119.66   1.02 29156   0.43   5.33  5469   0.72   4.80  3583   0.00   ....     .   0.00   ....     .
>>>>>     0c.21             60 119.37  118.25   1.02 29654   0.36   4.60  5870   0.76   5.76  3140   0.00   ....     .   0.00   ....     .
>>>>>     0c.22             62 124.87  123.32   1.02 29423   0.62   5.65  5677   0.94   3.58  2710   0.00   ....     .   0.00   ....     .
>>>>>     0c.23             62 119.48  118.35   1.03 30494   0.36   4.00  6875   0.76   5.14  3417   0.00   ....     .   0.00   ....     .
>>>>>     0c.24             61 119.08  117.96   1.02 29981   0.47   6.92  3289   0.65   3.94  2930   0.00   ....     .   0.00   ....     .
>>>>>     0c.25             93 118.17  116.72   1.03 45454   0.58   4.00 17719   0.87   4.63 11658   0.00   ....     .   0.00   ....     .
>>>>>     0c.26             61 121.40  120.27   1.04 29271   0.43   7.75  3097   0.69   5.21  2131   0.00   ....     .   0.00   ....     .
>>>>>     0c.27             59 115.75  114.81   1.03 29820   0.43   5.50  4530   0.51   6.00  3321   0.00   ....     .   0.00   ....     .
>>>>>     0c.28             63 125.53  124.15   1.01 30302   0.65   6.94  3808   0.72   3.40  5191   0.00   ....     .   0.00   ....     .
>>>>>
>>>>>     Both sata shelves are on controller 0c attached to two 3040.
>>>>>     Raid-DP in 13-disk raid groups so we have 2 parity and one spare
>>>>>     per shelf.
>>>>>     Active-Active single path HA.
>>>>>     Latest firmwares/code as of beginning of the year. 7.3.2.
>>>>>     no VMs, no snapmirror, nothing fancy that I can think of.
>>>>>     wafl scan status only shows 'active bitmap rearrangement' or
>>>>>     'container block reclamation'.
>>>>>
>>>>>     Thanks for thoughts and input!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> No Signature Required
>>>>> Save The Bits, Save The World!
>>>>
>>>>
>>>
>>
>>
>

