Martin Pool | 18 Feb 2004 01:46

Re: [rfc] hide rs_buffers_t from public interface

On 18 Feb 2004, Donovan Baarda <abo <at> minkirri.apana.org.au> wrote:
> In any case, flush is potentially useful for different types of flush,
> as used in zlib. I haven't yet hit a case with librsync where I have
> really needed this, but I have used it a fair bit in zlib, so I can
> imagine a need for it. I probably prefer a separate flush method, but
> I guess a flush parameter would be more consistent.

That would be good.  I think that's fairly orthogonal to this
discussion?
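A minimal sketch of the flush-parameter variant, assuming zlib-like semantics (zlib distinguishes Z_NO_FLUSH, Z_SYNC_FLUSH and Z_FINISH). The enum, the job struct and sketch_iter() are hypothetical names for illustration, not librsync API. The toy job holds back input until it has a 16-byte block, so the effect of each flush mode is visible. A separate flush method would instead be a second entry point that runs only the "emit everything pending" branch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flush modes, modelled loosely on zlib's flush values.
 * These names are NOT part of librsync. */
typedef enum {
    SKETCH_NO_FLUSH,   /* job may hold input back internally */
    SKETCH_SYNC_FLUSH, /* emit everything pending; stream stays open */
    SKETCH_FINISH      /* emit everything pending and end the stream */
} sketch_flush_t;

/* Toy job: buffers input until it has a full 16-byte block. */
typedef struct {
    char pending[16];
    size_t npending;
    int finished;
} sketch_job_t;

/* Feed len bytes from in; write output to out (capacity outcap).
 * Returns the number of bytes written. */
size_t sketch_iter(sketch_job_t *job, const char *in, size_t len,
                   char *out, size_t outcap, sketch_flush_t flush)
{
    size_t written = 0;
    assert(!job->finished);
    for (size_t i = 0; i < len; i++) {
        job->pending[job->npending++] = in[i];
        if (job->npending == sizeof job->pending) { /* full block: emit */
            assert(written + job->npending <= outcap);
            memcpy(out + written, job->pending, job->npending);
            written += job->npending;
            job->npending = 0;
        }
    }
    /* Any flush other than NO_FLUSH forces out the partial block. */
    if (flush != SKETCH_NO_FLUSH && job->npending > 0) {
        assert(written + job->npending <= outcap);
        memcpy(out + written, job->pending, job->npending);
        written += job->npending;
        job->npending = 0;
    }
    if (flush == SKETCH_FINISH)
        job->finished = 1;
    return written;
}
```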

> > > 3) It allows the application to be in full control of memory buffers
> > > to control and optimize memory usage.
> > 
> > The library needs to do some dynamic allocation, so the application is
> > not strictly in complete control.
> 
> Yeah, but minimizing how much it allocates helps for those in
> constrained memory land. Once again, not a biggie.

Is anyone actually going to use it in a space where allocating 10s of
kB of buffers is a problem?  There are not many such machines around
anymore.

Presumably the hash lookups are going to be a bigger deal.

> Nooo, not larger buffers.... :-)
> 
> The problem is, as it is currently implemented, you nearly always end
> up with less than a complete block left in the internal buffer, so you
> always have to copy into the internal buffer. The _only_ time you

Martin Pool | 18 Feb 2004 01:54

(Trying to convince myself perhaps.)

The one case where rs_buffers_t is the simplest interface is where the
application has all the input data already in a buffer in memory.
Perhaps it has been mmapped from a file or perhaps it was produced by
some other routine.  

This case ought to be pretty simple: allocate a big output buffer, set
up a rs_buffers_t and just make one call.
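To make the in-memory case concrete, here is a self-contained sketch. sketch_buffers_t mirrors the field names of librsync's rs_buffers_t (next_in, avail_in, eof_in, next_out, avail_out), and sketch_job_iter() is a hypothetical stand-in for the single rs_job_iter() call. The toy job only copies input to output, so the output size is known; for a real delta or patch the caller would have to guess outcap in advance.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in mirroring the field names of librsync's rs_buffers_t. */
typedef struct {
    const char *next_in;
    size_t avail_in;
    int eof_in;
    char *next_out;
    size_t avail_out;
} sketch_buffers_t;

typedef enum { SKETCH_DONE, SKETCH_BLOCKED } sketch_result;

/* Toy job: copies input to output. A real job would transform the
 * data, but the buffer bookkeeping looks the same. */
sketch_result sketch_job_iter(sketch_buffers_t *b)
{
    size_t n = b->avail_in < b->avail_out ? b->avail_in : b->avail_out;
    memcpy(b->next_out, b->next_in, n);
    b->next_in += n;
    b->avail_in -= n;
    b->next_out += n;
    b->avail_out -= n;
    return (b->eof_in && b->avail_in == 0) ? SKETCH_DONE : SKETCH_BLOCKED;
}

/* All input already in memory: one big output buffer, one call.
 * Returns malloc'd output (caller frees) and sets *outlen, or NULL if
 * allocation fails or outcap was too small. */
char *process_in_one_call(const char *in, size_t inlen, size_t outcap,
                          size_t *outlen)
{
    assert(in != NULL || inlen == 0);
    char *out = malloc(outcap ? outcap : 1);
    if (!out)
        return NULL;
    sketch_buffers_t b = { in, inlen, 1, out, outcap };
    if (sketch_job_iter(&b) != SKETCH_DONE) { /* output buffer too small */
        free(out);
        return NULL;
    }
    *outlen = outcap - b.avail_out;
    return out;
}
```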

This is probably not a very good design for most applications, because
it limits the amount of data they can process to the largest buffer
they can allocate.  It is very common today to have x86 machines which
can only allocate contiguous chunks of 1GB or so, but that have disks
of 100s of GB.  It is pretty common to have single files larger than
can be held in memory.

Perhaps some kind of embedded application that needs to decompress
something from ROM/flash?

Even here, output is a problem.  The application needs to either have
a maximum-sized buffer, or do dynamic allocation.  Dynamic allocation
and resizing the buffer might be as easy to do through a callback
rather than buffer structures.
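A sketch of the callback alternative, assuming a hypothetical interface in which the library hands each chunk of output to an application-supplied function (sketch_out_cb, growbuf and produce() are illustrative names, not librsync types). The application grows its buffer with realloc() as output arrives, so it never has to allocate a maximum-sized buffer up front; produce() is a toy stand-in for the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output callback: called once per chunk produced. */
typedef int (*sketch_out_cb)(void *opaque, const char *buf, size_t len);

/* Growable in-memory sink owned by the application. */
typedef struct {
    char *data;
    size_t len, cap;
} growbuf;

int growbuf_write(void *opaque, const char *buf, size_t len)
{
    growbuf *g = opaque;
    assert(g != NULL);
    if (len == 0)
        return 0;
    if (g->len + len > g->cap) {
        size_t newcap = g->cap ? g->cap : 64;
        while (newcap < g->len + len)
            newcap *= 2;                  /* geometric growth */
        char *p = realloc(g->data, newcap);
        if (!p)
            return -1;                    /* library would report an error */
        g->data = p;
        g->cap = newcap;
    }
    memcpy(g->data + g->len, buf, len);
    g->len += len;
    return 0;
}

/* Toy producer standing in for the library: emits `count` copies of
 * `chunk` through the callback, stopping on the first failure. */
int produce(sketch_out_cb cb, void *opaque, const char *chunk, size_t count)
{
    size_t clen = strlen(chunk);
    for (size_t i = 0; i < count; i++)
        if (cb(opaque, chunk, clen) != 0)
            return -1;
    return 0;
}
```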

--

-- 
Martin

Donovan Baarda | 18 Feb 2004 02:15

On Wed, 2004-02-18 at 11:46, Martin Pool wrote:
> On 18 Feb 2004, Donovan Baarda <abo <at> minkirri.apana.org.au> wrote:
> > In any case, flush is potentially useful for different types of flush,
> > as used in zlib. I haven't yet hit a case with librsync where I have
> > really needed this, but I have used it a fair bit in zlib, so I can
> > imagine a need for it. I probably prefer a separate flush method, but
> > I guess a flush parameter would be more consistent.
> 
> That would be good.  I think that's fairly orthogonal to this
> discussion?

Yep... just threw it in while we were talking about API changes :-)

> > If it was implemented right, you would only need a single fixed
> > internal buffer of exactly block_len size. You don't really need data
> > to be contiguous to calculate the block sums, so you can just walk
> > over the fragment in the internal buffer, and onto the supplied input
> > buffer. When you reach the end of the input buffer, you only need to
> > copy the last non-matching block or block fragment into the internal
> > buffer for next time.
> 
> Yes.  I think this can be fixed just in scoop.c?

Yes. The current scoop.c needs to be changed a fair bit. The API needs to
be changed to add get_md4sum(offset, length) and
get_rollsum(offset, length) methods that can span the internal and input
buffers, as well as a few other things. The current delta.c hardly uses
the scoop at all... the copy minimisation it attempted to do was so
broken I just forced it to copy everything to the input buffer and walk
through it directly.
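A minimal sketch of the scoop described above, under stated assumptions: block_len is at most SKETCH_MAX_BLOCK, and the weak sum is a plain Adler-style checksum rather than librsync's exact rollsum. scoop_t, scoop_byte(), get_rollsum() and scoop_carry() are hypothetical names. A window that starts in the held-over fragment is read straight on into the caller's input buffer with no copying; only the unconsumed tail, always shorter than block_len, is copied into the fixed internal buffer for next time. A get_md4sum(offset, length) would walk the bytes the same way through scoop_byte() and is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_BLOCK 64 /* hypothetical upper bound on block_len */

/* Fixed internal buffer, logically followed by the caller's input.
 * The caller sets in/inlen before each pass over new input. */
typedef struct {
    unsigned char internal[SKETCH_MAX_BLOCK];
    size_t ilen;             /* bytes held over from the previous call */
    const unsigned char *in; /* caller's current input buffer */
    size_t inlen;
    size_t block_len;
} scoop_t;

/* Byte at logical offset off: internal fragment first, then input. */
static unsigned char scoop_byte(const scoop_t *s, size_t off)
{
    assert(off < s->ilen + s->inlen);
    return off < s->ilen ? s->internal[off] : s->in[off - s->ilen];
}

/* Adler-style weak sum over [off, off + len), spanning both buffers
 * without copying. Stand-in only, not librsync's rollsum. */
uint32_t get_rollsum(const scoop_t *s, size_t off, size_t len)
{
    uint32_t a = 0, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + scoop_byte(s, off + i)) & 0xffff;
        b = (b + a) & 0xffff;
    }
    return (b << 16) | a;
}

/* End of input: keep the unconsumed tail [pos, end) for next time.
 * The tail is shorter than block_len, so it fits the fixed buffer. */
void scoop_carry(scoop_t *s, size_t pos)
{
    size_t total = s->ilen + s->inlen;
    size_t n = 0;
    assert(s->block_len <= SKETCH_MAX_BLOCK);
    assert(pos <= total && total - pos < s->block_len);
    if (pos < s->ilen) { /* part of the tail is already internal */
        n = s->ilen - pos;
        memmove(s->internal, s->internal + pos, n);
        pos = s->ilen;
    }
    if (total > pos) /* the rest comes from the input buffer */
        memcpy(s->internal + n, s->in + (pos - s->ilen), total - pos);
    s->ilen = n + (total - pos);
    s->in = NULL;
    s->inlen = 0;
}
```

The design choice is that the internal buffer never grows: the only copying per call is the short tail, instead of copying every input byte into a contiguous buffer.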