Tyler J. Wagner | 7 Feb 2013 11:01

Re: (somewhat solved) Backing up many small files

On 2013-02-07 07:15, Sorin Srbu wrote:
>>> /usr/bin/time -v find /home -mtime 1 >/dev/null
>
> Output from the client (/home):
> User time (seconds): 2.12
> Elapsed (wall clock) time (h:mm:ss or m:ss): 10:17.32
> 
> Output from the BPC-server (/bak):
> User time (seconds): 228.31
> Elapsed (wall clock) time (h:mm:ss or m:ss): 14:39:44
> 
> According to 
> <http://linux.about.com/od/commands/a/Example-Uses-Of-The-Command-Time.htm>, 
> the wall clock time indicates how long the process took to run. With that in 
> mind, the backup from the client should take about ten minutes. This is 
> clearly not so according to the BPC logs.

This indicates that I was wrong; directory/inode traversal is not the
issue. The traversal on the client finishes in about ten minutes, which is
speedy enough. The bottleneck is likely the rsync block checksum comparison
after all.
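
To compare the two runs quoted above, the elapsed times can be converted to seconds and set against the user CPU time. This is only a rough sketch: the function names are made up for illustration, and system time is not included because it was not quoted.

```python
def elapsed_to_seconds(text):
    """Convert GNU time's 'h:mm:ss' or 'm:ss.xx' elapsed format to seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_share(user_seconds, elapsed_text):
    """Fraction of wall-clock time spent as user-space CPU time."""
    return user_seconds / elapsed_to_seconds(elapsed_text)


# Figures quoted above: client (/home) and BPC server (/bak).
for label, user, elapsed in [("client", 2.12, "10:17.32"),
                             ("server", 228.31, "14:39:44")]:
    print(f"{label}: {elapsed_to_seconds(elapsed):.0f} s elapsed, "
          f"{cpu_share(user, elapsed):.1%} user CPU")
```

In both runs user CPU is well under one percent of the elapsed time, so find spent nearly all of its time waiting rather than computing.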

Regards,
Tyler

-- 
"Each man must for himself alone decide what is right and what is wrong,
which course is patriotic and which isn't. You cannot shirk this and be
a man."
   -- Mark Twain


Sorin Srbu | 7 Feb 2013 08:15

Re: (somewhat solved) Backing up many small files

> -----Original Message-----
> From: Sorin Srbu [mailto:sorin.srbu <at> orgfarm.uu.se]
> Sent: Wednesday, February 06, 2013 11:02 AM
> To: 'General list for user discussion, questions and support'
> Subject: Re: [BackupPC-users] (somewhat solved) Backing up many small
> files
>
> > No, this is what you want:
> >
> > /usr/bin/time -v find /home -mtime 1 >/dev/null
> >
> > This will generate a list of all files in /home, checking each of their
> > modification times, and throw all the output away. This performs a total
> > directory traversal where each file's inode is checked. This is almost
> > certainly the limiting factor of your rsync.
> >
> > The above spits out a lot of output. You are interested in "User time"
> > (CPU time in userspace) and "Elapsed (wall clock) time". Example from my PC:
> >
> > tyler@baal:~$ /usr/bin/time -v find /home -mtime 1 >/dev/null
> > 	Command being timed: "find /home -mtime 1"
> > 	User time (seconds): 0.14
> > 	System time (seconds): 0.96
> > 	Percent of CPU this job got: 15%

Shawn T Perry | 6 Feb 2013 17:27

shadowmountrsync

I've followed all the instructions here (http://sourceforge.net/apps/mediawiki/backuppc/index.php?title=User_Scripts_-_Client_-_Windows_VSS), and when the script doesn't use VSS, it works fine.  However, when I turn VSS on, it sets everything up, but the ssh session never lets go, so BackupPC cannot continue.  What information do I need to provide to get some help getting this to work?

backuppc server is ubuntu 12.04 64bit
Client is Windows 2003 server running cygwin 1.7

------------------------------------------------------------------------------
Free Next-Gen Firewall Hardware Offer
Buy your Sophos next-gen firewall before the end March 2013 
and get the hardware for free! Learn more.
http://p.sf.net/sfu/sophos-d2d-feb
_______________________________________________
BackupPC-users mailing list
BackupPC-users <at> lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Les Mikesell | 6 Feb 2013 17:14

Re: Backing up many small files

On Wed, Feb 6, 2013 at 3:56 AM, Sorin Srbu <sorin.srbu <at> orgfarm.uu.se> wrote:
>>>
>> ??? 7MB/s (that is 7Mbyte/s) is a usable value for a 100Mb/s (that is 100
>> Mbits/s) connection! 100Mb/s translates to 12MB/s, 10Mb/s would be
>> 1.2MB/s...
>>
>> On a 1G network, you can get similar rates as with local hard disk,
>> that is ~100-120MB/s.
>>
>> So on a 100Mb/s network, the OP can't get higher transfer rates than 12MB/s.
>> It's the physical limit. 7MB/s is a good value.
>>
>
> Thanks for clearing that up!

But with rsync over ssh, that's not very strictly related to
real-world results.   Except in the case of new files, rsync is only
going to send changes that may take little bandwidth but can trigger a
lot of slow disk activity reconstructing the new copy.   And if your
bandwidth is the limiting factor you can add the -C option to ssh for
compression.
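
For reference, the unit conversion in the quoted text can be checked in a few lines. This is a minimal sketch that ignores protocol overhead; the function names are illustrative only.

```python
def link_ceiling_mbytes(megabits_per_second):
    """Theoretical payload ceiling in MB/s, ignoring protocol overhead."""
    return megabits_per_second / 8.0


def link_utilisation(observed_mbytes, megabits_per_second):
    """Observed transfer rate as a fraction of the theoretical ceiling."""
    return observed_mbytes / link_ceiling_mbytes(megabits_per_second)


for mbit in (10, 100, 1000):
    print(f"{mbit} Mbit/s -> {link_ceiling_mbytes(mbit):.2f} MB/s at most")
print(f"7 MB/s on 100 Mbit/s = {link_utilisation(7, 100):.0%} of the ceiling")
```

The exact ceiling for 100 Mbit/s is 12.5 MB/s; real transfers land somewhat lower once Ethernet, IP, TCP and ssh overhead are accounted for.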

-- 
   Les Mikesell
      lesmikesell <at> gmail.com


Tyler J. Wagner | 6 Feb 2013 10:47

Re: (somewhat solved) Backing up many small files

On 2013-02-06 06:56, Sorin Srbu wrote:
>> From: Les Mikesell [mailto:lesmikesell <at> gmail.com]
>> You are testing sustained transfer times there.  The killer with small
>> files is the seek time while the disk head bounces around reading
>> little bits of directory and inode data.    And once you get started,
>> the server has to do approximately the same to check for matches -
>> possibly with other backups running.
> 
> I realized that afterwards. I'll be looking for some kind of random test 
> instead.

No, this is what you want:

/usr/bin/time -v find /home -mtime 1 >/dev/null

This will generate a list of all files in /home, checking each of their
modification times, and throw all the output away. This performs a total
directory traversal where each file's inode is checked. This is almost
certainly the limiting factor of your rsync.

The above spits out a lot of output. You are interested in "User time" (CPU
time in userspace) and "Elapsed (wall clock) time". Example from my PC:

tyler@baal:~$ /usr/bin/time -v find /home -mtime 1 >/dev/null
	Command being timed: "find /home -mtime 1"
	User time (seconds): 0.14
	System time (seconds): 0.96
	Percent of CPU this job got: 15%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.07
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2320
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 1512
	Voluntary context switches: 2398
	Involuntary context switches: 196
	Swaps: 0
	File system inputs: 31000
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
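
If GNU time is not available on a machine, roughly the same measurement can be sketched in Python: walk the tree, stat every entry, and time the whole thing. This is an approximation of the find test above, not an exact equivalent, and `time_traversal` is a name made up for this example.

```python
import os
import time


def time_traversal(root):
    """Walk root, lstat every entry, and return (entries, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                # lstat reads the inode without following symlinks
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            count += 1
    return count, time.perf_counter() - start
```

Calling `time_traversal("/home")` returns the number of entries visited and the wall-clock seconds it took. As with find, a second run is usually much faster because the kernel has cached the directory and inode data.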

Directory and inode data is cached by the kernel so subsequent runs are
faster. You can tune these with sysctl:

vm.swappiness
vm.vfs_cache_pressure
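
The current values of both settings can be read from /proc/sys/vm on Linux. A minimal sketch, assuming the usual procfs layout; `read_vm_setting` is a hypothetical helper, and it returns None where the file is missing (for example on a non-Linux system).

```python
from pathlib import Path


def read_vm_setting(name, base="/proc/sys/vm"):
    """Return the integer value of a vm.* sysctl, or None if unavailable."""
    try:
        return int((Path(base) / name).read_text().strip())
    except (OSError, ValueError):
        return None


for setting in ("swappiness", "vfs_cache_pressure"):
    print(f"vm.{setting} = {read_vm_setting(setting)}")
```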

More information (shameless use of my blog):

http://www.tolaris.com/2008/09/28/making-the-gui-faster-in-ubuntu-linux/

Regards,
Tyler

-- 
"There is no 'eastern' and 'western' medicine. There's 'medicine' and
then there's 'stuff that has not been proven to work.'"
   -- Maki Naro, "The Red Flags of Quackery, v2.0", Sci-ence.org


Les Mikesell | 5 Feb 2013 18:00

Re: Backing up many small files

On Tue, Feb 5, 2013 at 2:36 AM, Sorin Srbu <sorin.srbu <at> orgfarm.uu.se> wrote:
> Hi all,
>
> I see incremental backup times in the 300-400 minutes range every day on this
> particular machine. A full backup is about 28 GB and each daily incremental
> backup is in the 150-250 MB range. The incrementals take like forever (well,
> about 6-7 hrs each).
>
> Is there *anything* I can do to tweak the backup-speed of BPC in order to
> speed up a backup from this machine that contains hundreds of thousands of
> small files? Maybe something on the other machine?

Is it split into some small number of top-level directories?   If so,
you might add additional 'hosts', each configured to point to the same
target machine via ClientNameAlias, but backing up different
directories.   This may not save overall time unless the total number
of files is causing the directory read to run out of RAM and swap, but
it will let you stagger the days on which each part does a full.

-- 
   Les Mikesell
      lesmikesell <at> gmail.com


Les Mikesell | 5 Feb 2013 17:23

Re: (somewhat solved) Backing up many small files

On Tue, Feb 5, 2013 at 10:03 AM, Sorin Srbu <sorin.srbu <at> orgfarm.uu.se> wrote:
>>
> A quick additional note, the drives on the client seem to be pretty fast, even
> compared to the raid0 array...
>
> user@BPC-client ~/ [0]#  hdparm -tT /dev/sd[ab]
>
> /dev/sda:
>  Timing cached reads:   10832 MB in  2.00 seconds = 5416.82 MB/sec
>  Timing buffered disk reads:  208 MB in  3.02 seconds =  68.77 MB/sec
>
> /dev/sdb:
>  Timing cached reads:   11108 MB in  2.00 seconds = 5554.84 MB/sec
>  Timing buffered disk reads:  254 MB in  3.00 seconds =  84.54 MB/sec

You are testing sustained transfer times there.  The killer with small
files is the seek time while the disk head bounces around reading
little bits of directory and inode data.    And once you get started,
the server has to do approximately the same to check for matches -
possibly with other backups running.

-- 
   Les Mikesell
     lesmikesell <at> gmail.com


Sorin Srbu | 5 Feb 2013 17:03

Re: (somewhat solved) Backing up many small files

> -----Original Message-----
> From: Sorin Srbu [mailto:sorin.srbu <at> orgfarm.uu.se]
> Sent: Tuesday, February 05, 2013 4:38 PM
> To: 'General list for user discussion, questions and support'
> Subject: Re: [BackupPC-users] (somewhat solved) Backing up many small
> files
>
> > Oh, yeah. I remember that one now, I followed it rather closely for a
> > while. Thanks for the reminder!
> >
> > Doing the suggested find-operation. Will be back in a while with more
> > info.
>
> Only about 2 million files.
> user@BPC-client ~/ [0]# find /home | wc -l
> 1883781
>
> And the test incremental backup took 118.5 minutes for 61 MB.
>
> Somewhat better, I guess, after the noa* tweaks. I'm inclined to believe that
> the bottleneck is, as Markus points out, that the client maybe isn't too
> top-notch, plus the sheer number of files to be checked.
>
> I'll leave this issue as somewhat solved and maybe wait for BPC to implement
> a more modern rsync strategy, as mentioned in the archived thread above.
>
> Thanks everybody!

A quick additional note, the drives on the client seem to be pretty fast, even 
compared to the raid0 array...

user@BPC-client ~/ [0]#  hdparm -tT /dev/sd[ab]

/dev/sda:
 Timing cached reads:   10832 MB in  2.00 seconds = 5416.82 MB/sec
 Timing buffered disk reads:  208 MB in  3.02 seconds =  68.77 MB/sec

/dev/sdb:
 Timing cached reads:   11108 MB in  2.00 seconds = 5554.84 MB/sec
 Timing buffered disk reads:  254 MB in  3.00 seconds =  84.54 MB/sec

-- 
/Sorin
Sorin Srbu | 5 Feb 2013 16:38

Re: (somewhat solved) Backing up many small files

> -----Original Message-----
> From: Sorin Srbu [mailto:sorin.srbu <at> orgfarm.uu.se]
> Sent: Tuesday, February 05, 2013 12:40 PM
> To: 'General list for user discussion, questions and support'
> Subject: Re: [BackupPC-users] Backing up many small files
> 
> > -----Original Message-----
> > From: Markus [mailto:universe <at> truemetal.org]
> > Sent: Tuesday, February 05, 2013 11:58 AM
> > To: General list for user discussion, questions and support
> > Subject: Re: [BackupPC-users] Backing up many small files
> >
> > I posted basically your same question a few months ago and with the help
> > of the friendly people on this list it turned out that I simply have
> > way too many files on the client, about 25 million actually. Do
> > a "find / | wc -l" to see how many files you got! My find run alone took 8
> > hours if I remember correctly. So how is rsync supposed to do it any
> > faster :)  You can check the thread here, people share some really good
> > info: http://sourceforge.net/mailarchive/message.php?msg_id=30104262
> 
> Oh, yeah. I remember that one now, I followed it rather closely for a
> while. Thanks for the reminder!
> 
> Doing the suggested find-operation. Will be back in a while with more
> info.

Only about 2 million files.
user@BPC-client ~/ [0]# find /home | wc -l
1883781

And the test incremental backup took 118.5 minutes for 61 MB.

Somewhat better, I guess, after the noa* tweaks. I'm inclined to believe that
the bottleneck is, as Markus points out, that the client maybe isn't too
top-notch, plus the sheer number of files to be checked.
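
The figures above can be put into perspective with a quick calculation. This is a rough sketch: it assumes every entry counted by find was checked once during the incremental and takes 1 MB as 10^6 bytes; `rates` is a name made up for this example.

```python
def rates(entries, megabytes, minutes):
    """Return (entries per second, bytes per second) for one backup run."""
    seconds = minutes * 60
    return entries / seconds, megabytes * 1_000_000 / seconds


entries_per_s, bytes_per_s = rates(entries=1_883_781, megabytes=61, minutes=118.5)
print(f"{entries_per_s:.0f} entries/s, {bytes_per_s / 1000:.1f} kB/s")
```

That works out to roughly 265 entries checked per second while under 9 kB/s of data is transferred, which fits the idea that the per-file work, not the data volume, dominates the run time.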

I'll leave this issue as somewhat solved and maybe wait for BPC to implement
a more modern rsync strategy, as mentioned in the archived thread above.

Thanks everybody!

-- 
/Sorin

Adam Goryachev | 5 Feb 2013 13:09

Re: Backing up many small files

On 05/02/13 21:11, Sorin Srbu wrote:
>> Have you done at least two FULL backups since you enabled the
>> checksum-seed option? If not, stop now, and wait until you have.
> I have eighteen full backups online for this particular machine.
Note I said number of full backups since you changed that option, not
just number of fulls. There is a difference!
>> Check the following during an incremental backup:
>> 1) Memory/swap used on both backup server and the backup client. If you
>> are using all available memory, or see memory being paged in/out (use
>> vmstat) then you need to upgrade RAM on that machine, or find a way to
>> backup a smaller number of files (split the client into multiple shares
>> or multiple machines, etc).
> Thanks. Are there any particular limits/numbers I should be aware of, i.e. the 
> rule-of-thumb kind?
No, as long as you have enough. The only way to see if you have enough
in your environment is to watch it during a backup and see.
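
Besides watching vmstat by eye, the swap counters can be compared before and after a backup. A minimal sketch, assuming a Linux /proc/vmstat that exposes the pswpin and pswpout counters; the helper names are illustrative.

```python
def swap_counters(vmstat_text):
    """Extract pswpin/pswpout (pages swapped in/out since boot) from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("pswpin", "pswpout"):
            counters[fields[0]] = int(fields[1])
    return counters


def swapped_between(before, after):
    """Pages swapped in/out between two snapshots of the counters."""
    return {key: after[key] - before[key] for key in before if key in after}
```

Read /proc/vmstat once when the backup starts and once when it ends, pass both texts through `swap_counters`, and hand the results to `swapped_between`. Differences that stay at or near zero suggest the machine was not swapping during the run.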
>> 2) Check disk performance on the backup client. You have a single SATA
>> drive on the client, and this will be slow, you are doing a lot of
>> seeks, not just one big read. Can you enable noatime on the client
>> (probably)? This would decrease the amount of seek and writes on the
>> single SATA drive.
> Noatime on client; Just checked - not set. Setting. Thanks for the reminder!
>
> As for the single drive, I can't do much about that. It's an instrument 
> computer and not really allowed to change the config or the service 
> support-people won't be too happy about it.
Understood, sometimes you just don't get a choice.... In any case, you
do need to check this anyway to find out if it is in fact the
bottleneck. If it is, then there is no point looking or changing
anything else, if not, then you will need to keep looking.

You can keep an eye on the /sys/block/sda/stat file, in particular watch
the activetime value (10th value, in milliseconds). If this is increasing at
nearly the same rate as wall clock time, then it means your drive is basically
100% busy, and therefore the bottleneck. If it is increasing much more slowly
than wall clock time, then your bottleneck is elsewhere... Again, you want to
watch this during a backup, both during the first stage while the client
is building the list of files to back up, and again while the 200M of data
is being transferred.
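
The check described above can be sketched as follows. It assumes the 10th field of /sys/block/<dev>/stat is the time the device was busy, in milliseconds; the function names and the five-second interval are arbitrary choices for this example.

```python
import time


def io_ticks_ms(stat_text):
    """Return the 10th field of a /sys/block/<dev>/stat line (ms the device was busy)."""
    return int(stat_text.split()[9])


def busy_fraction(ticks_before, ticks_after, wall_seconds):
    """Share of the wall-clock interval during which the device was busy."""
    return (ticks_after - ticks_before) / (wall_seconds * 1000.0)


def sample_busy(dev="sda", interval=5.0):
    """Sample the stat file twice, interval seconds apart, and return the busy fraction."""
    path = f"/sys/block/{dev}/stat"
    with open(path) as stat_file:
        before = io_ticks_ms(stat_file.read())
    start = time.monotonic()
    time.sleep(interval)
    with open(path) as stat_file:
        after = io_ticks_ms(stat_file.read())
    return busy_fraction(before, after, time.monotonic() - start)
```

Running `sample_busy("sda")` repeatedly during a backup and seeing values close to 1.0 would point at the drive as the bottleneck; values well below that point elsewhere.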
>> 3) Check disk performance on the backup server
> Any best practices here?
Yes, get as many drives as possible, make each drive as fast as
possible, and combine with a hardware raid card with a Battery Back Up
write cache.
IMHO, also only use RAID level 0, 1 or 10 (i.e., no parity-based RAID
levels like 5 or 6).
In reality, you make do with what you have.
BTW, you might also look into changing the filesystem format from ext3
to something more modern which will probably perform better. Personally,
I use reiserfs, but only because I built this system back when it was
the best performing filesystem. I wouldn't suggest it now for a new system
due to its seemingly unmaintained status... Though it has proved
reliable for me.
>> 4) Check CPU on the backup server, if you have compression enabled, this
>> will really slow things down, consider disabling compression (though
>> this will mess with the pool).
> Is this the (GUI) Edit Config/Backup settings/CompressLevel=3 you're referring 
> to?
Yes, set this to 0 to disable compression, but you might need to delete
all existing backups to really see the effect it will have. Existing
unchanged files will still be stored compressed; only new files will be
stored uncompressed. You will still need to uncompress an old file if it
is updated, but after the first update it will be stored uncompressed.
Also, you will still uncompress a file to do a full comparison (of a
small percentage of files). Watch CPU consumption on the backup server:
if the CPU is busy, then compression is an issue. Remember that compression
will only use a single core even if you have a multi-core CPU.
>> 5) Check bandwidth between the two (least likely to be the culprit, but
>> worth checking).
> user@BPC-server ~/ [0]# lftp -e 'pget 
> http://ftp.sunet.se/pub/os/Linux/distributions/centos/6/isos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso'
> `/pub/os/Linux/distributions/centos/6/isos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso', 
> got 193027041 of 4289386496 (4%) 6.99M/s eta:11m
>
> Speeds varies around 7 M(Mbyte? Mbit?)/s. I guess it's good enough for a 
> 100Mbps-connection.
This is not relevant, I meant to watch what the bandwidth usage was
during a backup. BTW, 7MB/s is a reasonable rate for a 100Mbps connection,
whose theoretical ceiling is about 12MB/s; you would only see 80MB/s or
more on a gigabit network. Try testing with iperf if you want to generate
your own load.
BTW, slower speed with the client compared to the server may point to
CPU or network driver issues on the client (i.e., old crappy network card,
or slow CPU, etc).
>> BTW, are you sure the backup server has 2TB with 4 drives in RAID0 ?
>> That suggests that any one of those 4 drives fail, and you lose ALL of
>> your backups and pool etc... You might confirm you are using RAID0 and
>> not linear, and also check the stripe size. If you are backing up lots
>> of small files, then you want the stripe size to be about the same size
>> as your file size. If your files are between 1 and 2kB each, then you
>> would want a stripe size of 4k, not the current linux default of 512k.
>>
>> I would suggest RAID10 if you want any sort of resilience....
> I was a bit wrong here I see; three drives and 1,4 TB. All seem active. It 
> would seem I also added a drive on the PATA-port, in addition to the 
> SATA-ports. I think the reason for using a PATA-drive at the time was the mobo 
> only had two SATA-ports and I needed more space, thus adding a slower 
> PATA-drive as well. The actual BPC-server was rather old even at the beginning 
> when it was converted to a backup-server.
>
> Maybe PATA would slow things down a bit as well?
So is this 3 x 500G drives?
Try a "hdparm -tT /dev/sd[ab] /dev/hdb" to compare the speeds of them
(while there are no running backups). Maybe you could replace the PATA
drive now with a SATA if you have a spare around? Also might be able to
increase to the 4 x 500G drives you thought you had.... Though none of
this will help if the problem is somewhere up above....

> user@BPC-server ~/ [0]# cat /proc/mdstat
> Personalities : [raid0]
> md0 : active raid0 sdb1[2] sda1[1] hdb1[0]
>       1465151616 blocks 128k chunks
> I'm assuming the above mentioned 128 kB chunks are the same as stripe sizes 
> and can't be changed to the 4 kB size you mention w/o a reformat. Correct?
Correct, can't be changed unless you re-create the raid wiping all the
data in the process.
> Anyway, I know. It's a calculated risk using raid0.
> I'm figuring as these are just casual backups (users always copy their 
> personal data using Winscp to their homefolders, which is being backed up on 
> another more resilient BPC), there's no real need for redundancy - just plenty 
> of space.
No problem... just pointing it out :) You never know what other people
do or don't know.

If the problem is server side, it may be worthwhile to wipe the data and
use a smaller chunk size on the RAID, format with a different
filesystem, and start the backups again. Just remember, don't bother
timing anything until after the second full backup has finished.

Regards,
Adam

-- 
Adam Goryachev
Website Managers
www.websitemanagers.com.au


F.Trojahn | 5 Feb 2013 10:35

Search files in Frontend and/or using command line

Hello list,

I'm looking for a way to search for files or directories (not contents)
within a host or within certain hosts. Especially when restoring parts
of a host this would be useful.

Is there a script for that on the command line, or could this be a feature
for the frontend?

Thanks in advance
Falko

-- 
Yours sincerely - Mit freundlichen Grüßen -
Recevez mes salutations distinguées

Falko Trojahn 
http://www.trojahn.de
