Linda A. W. | 2 Aug 2004 03:20

rsync "-I" vs. "-c"

If I use the "-I" to ignore date and size as quick-check methods of
determining change, what method does it use to determine difference?  If
it falls back to checksumming the entire file, maybe the manpage should
warn that this would be as expensive as using the "-c" option... or not,
depending on what it uses for determining difference at that point.

So exactly how does rsync compare files for differences when date & size are
used but checksumming is not?

Thanks!
-Linda
--

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Chris Shoemaker | 1 Aug 2004 23:33

Re: rsync "-I" vs. "-c"

On Sun, Aug 01, 2004 at 06:20:11PM -0700, Linda A. W. wrote:
> If I use the "-I" to ignore date and size as quick-check methods of
> determining
just modtime.  rsync never ignores size differences.

> change, what method does it use to determine difference?  If it falls
> back to checksumming the entire file, maybe the manpage should warn that
> this would be as expensive as using the "-c" option... or not, depending
> on what it uses for determining difference at that point.

Short answer:  It does fall back to checksum comparison.

Long answer:  I'm not sure, but I suspect that the reason this is not so
explicit in the man page is that it's a bit complicated in the code,
too.  I _think_ that using -I would be even more expensive than -c,
because it will take longer to eventually do the same checksum
comparison.

But, Wayne knows these options like the back of his hand.  Wayne?

-chris

> 
> So exactly how does rsync compare files for differences when date & size are
> used but checksumming is not?
> 
> Thanks!

Wayne Davison | 2 Aug 2004 04:47

Re: rsync "-I" vs. "-c"

On Sun, Aug 01, 2004 at 06:20:11PM -0700, Linda A. W. wrote:
> If I use the "-I" to ignore date and size as quick-check methods of
> determining change, what method does it use to determine difference?

With -I, rsync does no advance determination of sameness, it just
transfers all the files and lets the matching data make the transfer as
small as possible.  This is more expensive than -c if all the files are
the same (since the receiving side reads the file twice and writes it
once for each file), but could be slightly faster than -c if most of the
files have changes (since the -c option makes the receiving side read the
file thrice and write it once when it is different, but only reads the
file once if it is the same).
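To make the trade-off concrete, here is a small counting sketch of the receiver-side file passes described above. It assumes every file costs the same per pass and ignores the sender and the network; the function name is made up for illustration.

```python
def receiver_io(n_files: int, n_changed: int, mode: str) -> tuple[int, int]:
    """Return (file reads, file writes) on the receiving side only.

    -I: every file is read twice and written once.
    -c: unchanged files are read once; changed files are read three
        times and written once.
    """
    if not 0 <= n_changed <= n_files:
        raise ValueError("n_changed must be between 0 and n_files")
    if mode == "-I":
        return 2 * n_files, n_files
    if mode == "-c":
        unchanged = n_files - n_changed
        return unchanged + 3 * n_changed, n_changed
    raise ValueError(f"unknown mode: {mode}")


for changed in (0, 50, 100):
    print(changed, receiver_io(100, changed, "-I"), receiver_io(100, changed, "-c"))
```

In this count, -c does N + 2D reads against -I's 2N, so -c reads less until about half the files differ. -I never writes less than -c, which fits the "slightly faster" hedge.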

> So exactly how does rsync compare files for differences when date &
> size are used but checksumming is not?

When date & size are used, rsync skips the file when the modify time and
the size are identical on the receiver and the sender for a given file.
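The default quick check can be approximated for two local paths as below. This is only a sketch: the real comparison happens between sender and receiver file lists. Comparing times at whole-second granularity is an assumption made here.

```python
import os


def quick_check_says_same(src: str, dst: str) -> bool:
    """Approximate the default quick check for two local paths.

    The destination counts as up to date (and would be skipped) only when
    both the size and the modification time match.
    """
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return False  # nothing on the receiving side yet, so it must be sent
    s = os.stat(src)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)
```

With -I this test is simply not applied, so every file goes through the delta transfer described above.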

..wayne..

Chris Shoemaker | 2 Aug 2004 00:16

reducing memmoves

Attached is a patch that makes window strides constant when files are
walked with a constant block size.  In these cases, it completely
avoids all memmoves.
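Here is a toy model of why constant strides avoid memmoves. It is a simplification, not rsync's actual map_ptr(). The file is walked one block at a time. When a requested block is not entirely inside the buffered window, the window is moved to start at the request, and any bytes that the old and new windows share are counted as memmoved.

```python
def memmoved_bytes(file_size: int, block_size: int, window_size: int) -> int:
    """Total bytes shifted with memmove() in a toy sliding-window model."""
    win_start, win_len = 0, 0
    moved = 0
    for off in range(0, file_size, block_size):
        length = min(block_size, file_size - off)
        if win_start <= off and off + length <= win_start + win_len:
            continue  # block already buffered
        new_len = min(max(window_size, length), file_size - off)
        old_end = win_start + win_len
        if win_start <= off < old_end:
            moved += old_end - off  # overlapping tail slides to the front
        win_start, win_len = off, new_len
    return moved


print(memmoved_bytes(1000, 100, 400))  # window is a multiple of the block size
print(memmoved_bytes(1000, 100, 450))  # window is not a multiple
```

When the window is an exact multiple of the block size, each new window starts exactly where the old one ended, so nothing overlaps. Otherwise the last block straddles the window edge and its leading part has to be moved.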

In my simple local test of rsyncing 57MB of 10 local files, memmoved
bytes went from 18MB to zero.

I haven't tested this for a big variety of file cases.  I think that this
will always reduce the memmoves involved with walking a large file, but
perhaps there's some case I'm not seeing.

Also, the memmove cost is obviously negligible compared to real disk
I/O, so you have pretty much no chance of measuring the difference
unless your files are already starting in cache.

Also, with the new caps on window size, the worst-case
memmoves are quite a bit smaller than they used to be, so the benefit of
avoiding them is commensurately reduced.  Therefore, in order to measure
the difference in terms of actual time to completion, you'd need to be
walking through a lot of cached data.

I don't have enough RAM (I'm at 192MB) to really measure this difference
well.  If you do, feedback from testing is especially welcome.  [glances
in wally's direction]  :)

Overall, I think this should never hurt performance, but with large
datasets and much memory, it should improve performance.

-chris


Wayne Davison | 2 Aug 2004 09:06

Re: reducing memmoves

One comment on eliminating the read-behind in map_ptr():

The sender's read pattern can jump back a blocksize or so when it is
scanning the file using the rolling checksum and it needs to send out
the just-passed data that it didn't find a match for.  We don't want
this data to be re-read from disk, so we should make sure that this
unmatched data is retained in the buffer.

I think a good way to do this (when combined with the removal of the
window_start-kluge in map_ptr() that your patch removed) would be to
have the sender code call map_ptr() with a read range that includes the
start of the unmatched data that it needs up through the rolling-
checksum area that it is processing.  This way it would always tell
map_ptr() what memory must be preserved to prevent a re-read.
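The suggested request range can be sketched as a small helper. The name and signature are illustrative only and are not rsync's code. The range starts at last_match, where the not-yet-sent literal data begins, and runs through the end of the block that the rolling checksum is examining, clipped to the file size.

```python
def map_request(last_match: int, offset: int, block_len: int,
                file_size: int) -> tuple[int, int]:
    """Return (start, length) covering unmatched data plus the current block."""
    if not 0 <= last_match <= offset <= file_size:
        raise ValueError("expected 0 <= last_match <= offset <= file_size")
    end = min(offset + block_len, file_size)
    return last_match, end - last_match


print(map_request(1000, 1700, 700, 10000))  # unmatched bytes 1000..1699 stay mapped
```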

..wayne..

Question about --stats


Is there a way to get rsync to estimate the net change of disk space usage
for a transfer?  I can get the gross amount of files to be transferred using
"--stats", and if I use "-v" I can see a list of files to be deleted.  But
then I'd have to mangle the filenames to the target path (prepend some text
to each line) and do a "du -c | tail -1" to get the disk usage of that list,
then subtract that number from the "Total transferred file size" line of
"--stats".  Not a huge hassle, but some hassle.

If it's easy, maybe this would be a good addition to rsync --stats.  List
the size of the files deleted, and perhaps the net change in disk usage.
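The manual recipe above could be scripted roughly as follows. This is a sketch under stated assumptions. `deleted` is the list of names from "rsync -v" with the leading "deleting " removed, and `transferred_bytes` is the "Total transferred file size" from --stats. Apparent file sizes stand in for du's block counts. Files replaced in place are not credited with their old size, so the result is only an estimate.

```python
import os


def net_change_estimate(dest_root: str, deleted: list[str],
                        transferred_bytes: int) -> int:
    """Rough net change in disk usage: bytes transferred minus bytes deleted."""
    freed = 0
    for rel in deleted:
        path = os.path.join(dest_root, rel)
        if os.path.isfile(path):  # skip names already gone from the target
            freed += os.path.getsize(path)
    return transferred_bytes - freed
```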

I'm interested in this because I rsync data to a raid set for off-site
storage, but I'm running out of space.  I have some data that I want to be
sure is on the raid set, and other data that I care less about and could
delete some of to make sure the important data is transferred.  I do this
weekly to an 800GB raid set, so wiping all the low-priority data,
transferring the high-priority data and then rsyncing as much of
low-priority data as will fit is not an option -- it would take too long.

Thanks!

Bart
---
Bart Brashers, Ph.D.
MFG Inc.
19203 36th Ave W Ste 101
Lynnwood WA 98036
425.921.4000 voice
425.921.4040 fax

Wayne Davison | 2 Aug 2004 19:54

Re: reducing memmoves

On Sun, Aug 01, 2004 at 06:16:05PM -0400, Chris Shoemaker wrote:
> Attached is a patch that makes window strides constant when files are
> walked with a constant block size.  In these cases, it completely
> avoids all memmoves.

Seems like a good start to me.  Here's a patch I created that also makes
these changes:

    - The map_file() function now takes the window-size directly
      rather than the block-size.  This lets the caller choose
      the value.

    - Figure out an appropriate window-size for the receiver,
      sender, generator, and the file_checksum() function to send to
      map_file().

    - Also removed the (offset > 2*CHUNK_SIZE) check in map_ptr().
      (Did you leave this in for a reason?)

    - The sender now calls map_ptr() with a range of memory that
      encompasses both the rolling-checksum data and the data at
      last_match that we may need to reread.

    - Defined MAX_BLOCK_SIZE as a separate value from MAX_MAP_SIZE.

    - Increased the size of MAX_MAP_SIZE.
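One way a caller might pick the window size it passes to map_file() is sketched below. The 256KB cap is a hypothetical stand-in for MAX_MAP_SIZE, not the value in the patch. Keeping the window an exact multiple of the block size gives the constant strides discussed earlier in the thread. The window is capped but always holds at least one block.

```python
MAX_MAP_SIZE = 256 * 1024  # hypothetical cap, for illustration only


def choose_window(block_size: int, max_map: int = MAX_MAP_SIZE) -> int:
    """Largest whole multiple of block_size not above max_map (min one block)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return max(block_size, (max_map // block_size) * block_size)


print(choose_window(700))
```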

I think this should improve several things.  Comments?

..wayne..

Wayne Davison | 2 Aug 2004 20:05

Re: Question about --stats

On Mon, Aug 02, 2004 at 11:01:39AM -0600, Brashers, Bart -- MFG, Inc. wrote:
> If it's easy, maybe this would be a good addition to rsync --stats.  List
> the size of the files deleted, and perhaps the net change in disk usage.

Yes, an addition like that sounds like a good idea to me.  It will
require a protocol bump because of the weird way the stat data moves
around to the process that actually does the output (and the current
stats are all known to the sender but this stat would only be known to
the receiver).  So, a good idea, but it will be a little harder to do
than it might appear at first glance.  I'll consider it.

..wayne..

Wayne Davison | 2 Aug 2004 20:25

Re: HP-UX 11i and largefiles on rsync 2.6.2

Would anyone who is seeing this problem please try out the patch that is
attached to this bugzilla bug:

    https://bugzilla.samba.org/show_bug.cgi?id=1536

You'll need to re-run configure and re-build before re-testing.  (I
appended this patch to the bugzilla bug back on the 30th, but the
bugzilla emails that would have let you know about this are not making
it through to the list at the moment...)

..wayne..

Adams, John | 2 Aug 2004 20:48

Rsync and open files

I didn't see this answered in the FAQ, and it's important to one of my
application owners.

How does rsync handle files that are open or still being written?

Thank you!

J

jadams <at> molex.com
john.adams <at> molex.com
