Juan Gomez | 1 Jun 2002 01:58

Re: NFSv4 Replication and Migration: design team conference call/Draft

A couple of suggestions/questions regarding the draft and the recent
minutes:

1.-Do we want to provide for failure recovery in the migration/replication
protocol? I know that in most cases we should be sending differential
changes to the filesystems, but I keep thinking of the case where we have a
full (initial) migration/replication of a file system and one side involved
in the process fails; do we want to have some type of migration
checkpointing?

2.-I think striving for efficient bandwidth utilization is a great idea, so
I wanted to make the group aware of recent work in this area that may be
relevant to the design of a bandwidth-efficient migration/replication
protocol:

http://www.cs.ucsd.edu/sosp01/papers/mazieres.pdf

I think the bandwidth-saving ideas presented in this paper may work great in
migration/replication. What do you guys think? I am not sure whether any
patents cover these ideas, though.

Juan

From: Robert Thurlow <Robert.Thurlow <at> eng.sun.com>
Date: 05/30/02 09:12 AM

Robert Thurlow | 3 Jun 2002 17:47

Re: NFSv4 Replication and Migration: design team conference call/Draft

> 1.-Do we want to provide for failure recovery in the migration/replication
> protocol? I know that in most cases we should be sending differential
> changes to the filesystems, but I keep thinking of the case where we have a
> full (initial) migration/replication of a file system and one side involved
> in the process fails; do we want to have some type of migration
> checkpointing?

Yes; I believe that restartability in this case is a requirement.
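
For illustration only (nothing here is taken from the draft): one way to
picture restartability is a transfer loop that records how far it got, so
that a restart resumes from the last checkpoint instead of from the
beginning. The function name copy_with_checkpoint, the dict standing in for
persistent checkpoint state, the tiny chunk size and the fail_after
parameter are all hypothetical.

```python
import io

def copy_with_checkpoint(src, dst, checkpoint, chunk_size=4, fail_after=None):
    """Copy src to dst in chunks, recording progress in `checkpoint`.

    `checkpoint` is a hypothetical dict standing in for persistent state;
    `fail_after` simulates a failure after that many chunks were written.
    """
    offset = checkpoint.get("offset", 0)   # resume point (0 on the first run)
    src.seek(offset)
    dst.seek(offset)
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return True                    # transfer complete
        if fail_after is not None and sent >= fail_after:
            return False                   # simulated failure mid-transfer
        dst.write(chunk)
        offset += len(chunk)
        checkpoint["offset"] = offset      # checkpoint after each chunk
        sent += 1

source = io.BytesIO(b"0123456789abcdef")
dest = io.BytesIO()
state = {}
first = copy_with_checkpoint(source, dest, state, fail_after=2)   # interrupted
resumed_from = state["offset"]
second = copy_with_checkpoint(source, dest, state)                # resumes
```

In a real protocol the checkpoint would have to survive a crash on either
side, which is where the persistent-state question comes in.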

> 2.-I think striving for efficient bandwidth utilization is a great idea, so
> I wanted to make the group aware of recent work in this area that may be
> relevant to the design of a bandwidth-efficient migration/replication
> protocol:
> 
> http://www.cs.ucsd.edu/sosp01/papers/mazieres.pdf
> 
> I think the bandwidth-saving ideas presented in this paper may work great in
> migration/replication. What do you guys think?

This is an interesting paper I had not known about, but I believe
it is really only applicable to conventional filesystem traffic.
With replication/migration, you have a largely unidirectional flow
of data, and you can't benefit from caching and block recognition
as the paper describes.  The main thing would seem to be to keep
the pipe full; a streaming protocol which minimizes the risk of a
stall waiting for back-traffic from destination to source would
seem to cover us here.  Compression of data could be a benefit;
are there other ideas in the paper you believe would apply?
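
As a rough sketch of the "keep the pipe full, maybe compress" idea: the
source below compresses chunks as they are produced and never waits for
anything from the destination. It uses Python's standard zlib module purely
for illustration; the function names and sample data are hypothetical and
do not reflect anything in the draft.

```python
import zlib

def compress_stream(chunks, level=6):
    """Yield compressed bytes for an iterable of data chunks (one-way flow)."""
    comp = zlib.compressobj(level)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:                      # the compressor may buffer small inputs
            yield out
    yield comp.flush()               # emit whatever is still buffered

def decompress_stream(chunks):
    """Destination side: rebuild the original bytes from the stream."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail

data = [b"\0" * 8192, b"hello world " * 500]
wire = list(compress_stream(data))
restored = b"".join(decompress_stream(wire))
original_size = sum(len(c) for c in data)
wire_size = sum(len(c) for c in wire)
```

How much this saves depends entirely on the data; the sample above is
deliberately very compressible.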

Rob T

Juan Gomez | 2 Jun 2002 20:14

Re: NFSv4 Replication and Migration: design team conference call/Draft

Rob,

I think I was looking at the paper more from the point of view of
compression: recognizing similar blocks in a filesystem may help us a lot
in compressing what we have to send during a migration. I do not have
numbers, but I think we may expect reasonable block replication on a given
filesystem, enough to be worth exploring.

Also in replication, I think we should have some protocol capable of
shipping only new blocks (i.e. updates) and not the old ones already
present in the other servers. I am not sure if we want to consider the
complications associated with this as it requires persistent storage of
replication-related metadata.

Juan

                                                                                                           
From: Robert Thurlow <Robert.Thurlow <at> eng.sun.com>
To: Juan Gomez/Almaden/IBM <at> IBMUS
cc: nfsv4-wg <at> sunroof.eng.sun.com
Subject: Re: NFSv4 Replication and Migration: design team conference call/Draft
Date: 06/03/02 08:47 AM
Please respond to: Robert Thurlow

> 1.-Do we want to provide for failure recovery in the migration/replication
> protocol? I know that in most cases we should be sending differential
> changes to the filesystems, but I keep thinking of the case where we have a
> full (initial) migration/replication of a file system and one side

Robert Thurlow | 5 Jun 2002 22:34

Re: NFSv4 Replication and Migration: design team conference call/Draft

> I think I was looking at the paper more from the point of view of
> compression: recognizing similar blocks in a filesystem may help us a lot
> in compressing what we have to send during a migration. I do not have
> numbers, but I think we may expect reasonable block replication on a given
> filesystem, enough to be worth exploring.

I think compression could be worthwhile to explore.  It's not clear
that we can take advantage of similar blocks except for blocks of
zeros, which we should certainly optimize.  What amount of block
replication would you expect, and under what circumstances?

> Also in replication, I think we should have some protocol capable of
> shipping only new blocks (i.e. updates) and not the old ones already
> present in the other servers. I am not sure if we want to consider the
> complications associated with this as it requires persistent storage of
> replication-related metadata.

The protocol certainly needs to be able to send only files which have
changed, and also to send only changed regions of individual files.
Otherwise, huge files cannot be dealt with.  The draft covers this.
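
To make "send only changed regions" concrete, here is a minimal sketch. It
assumes the source can compare the old and new contents block by block,
which is an assumption for illustration; how the draft actually identifies
changed regions is not shown in this thread. The function names and the
4-byte block size are hypothetical.

```python
def changed_regions(old, new, block_size=4):
    """Return (offset, data) pairs for blocks of `new` that differ from `old`."""
    regions = []
    for off in range(0, len(new), block_size):
        block = new[off:off + block_size]
        if old[off:off + block_size] != block:
            regions.append((off, block))
    return regions

def apply_regions(old, regions, new_length):
    """Destination side: rebuild the new content from its old copy."""
    # Truncate or zero-extend the old copy to the new length, then patch it.
    buf = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for off, data in regions:
        buf[off:off + len(data)] = data
    return bytes(buf)

old = b"aaaabbbbccccdddd"
new = b"aaaaBBBBccccddddeeee"
delta = changed_regions(old, new)          # only two of five blocks differ
rebuilt = apply_regions(old, delta, len(new))
```

Only the differing blocks and the new file length cross the wire, which is
what makes very large files manageable.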

Rob T

Brent Callaghan | 5 Jun 2002 23:12

Re: NFSv4 Replication and Migration: design team conference call/Draft

> I think compression could be worthwhile to explore.  It's not clear
> that we can take advantage of similar blocks except for blocks of
> zeros, which we should certainly optimize.  What amount of block
> replication would you expect, and under what circumstances?

I'm not sure if I read it in the paper Juan referenced, but
you could in theory avoid some data transfers if you kept
track of each block of transferred data together with a
checksum.

Then when you need to transfer a block of new data in
a file, you checksum the block and see if you've transferred
it previously as part of another file.  If you get a hit,
then instead of transferring the block, you just tell the
other end where it is, and where it needs to go.

I've no idea whether this happens often enough to be
worthwhile - maybe I should go back and read Mazieres' paper
again ...
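
The scheme described above might look roughly like the sketch below. It is
only an illustration: the record layout, the function names and the 4-byte
block size are hypothetical, and it uses a SHA-256 digest from Python's
hashlib rather than a short checksum so that accidental matches between
different blocks are very unlikely.

```python
import hashlib

def encode_blocks(files, block_size=4):
    """Turn {name: bytes} into transfer records.

    ("data", name, offset, block) sends a block in full;
    ("ref", name, offset, (src_name, src_offset, length)) points at a
    block that was already transferred, possibly as part of another file.
    """
    seen = {}       # digest -> (name, offset, length) of the first transfer
    records = []
    for name, content in files.items():
        for off in range(0, len(content), block_size):
            block = content[off:off + block_size]
            digest = hashlib.sha256(block).digest()
            if digest in seen:
                records.append(("ref", name, off, seen[digest]))
            else:
                seen[digest] = (name, off, len(block))
                records.append(("data", name, off, block))
    return records

def decode_blocks(records):
    """Destination side: rebuild {name: bytes} from the records, in order."""
    out = {}
    for kind, name, off, payload in records:
        buf = out.setdefault(name, bytearray())
        if kind == "ref":
            src_name, src_off, length = payload
            block = bytes(out[src_name][src_off:src_off + length])
        else:
            block = payload
        buf[off:off + len(block)] = block
    return {name: bytes(buf) for name, buf in out.items()}

files = {"a": b"AAAABBBBAAAA", "b": b"CCCCBBBB"}
records = encode_blocks(files)
kinds = [r[0] for r in records]
restored = decode_blocks(records)
```

In this toy input two of the five blocks are replaced by references; how
often that happens on real filesystems is exactly the open question.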

	Brent

Peter Staubach | 5 Jun 2002 23:13

Re: NFSv4 Replication and Migration: design team conference call/Draft

> 
> > I think compression could be worthwhile to explore.  It's not clear
> > that we can take advantage of similar blocks except for blocks of
> > zeros, which we should certainly optimize.  What amount of block
> > replication would you expect, and under what circumstances?
> 
> I'm not sure if I read it in the paper Juan referenced, but
> you could in theory avoid some data transfers if you kept
> track of each block of transferred data together with a
> checksum.
> 
> Then when you need to transfer a block of new data in
> a file, you checksum the block and see if you've transferred
> it previously as part of another file.  If you get a hit,
> then instead of transferring the block, you just tell the
> other end where it is, and where it needs to go.
> 
> I've no idea whether this happens often enough to be
> worthwhile - maybe I should go back and read Mazieres' paper
> again ...
> 

Between this and the sparse file detection, the CPU overheads are
possibly starting to become significant.  Looking at a block to see
whether it is all zeros is really expensive.  There needs to be a
cheaper way.

This also doesn't match real life semantics.  We will need to be able
to replicate files created with mkfile(1M) for example.  They tend
to be populated with zeros, but are fully populated for good reasons.
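
For reference, a naive zero-block check looks like the sketch below (the
function names are hypothetical). It has to examine every byte of every
block, which is the cost being described, and it says nothing about whether
a zero-filled block is actually allocated on disk.

```python
def is_all_zeros(block):
    """True if every byte of `block` is zero."""
    # Compares against a zero-filled buffer of the same length; every byte
    # of a zero block still has to be examined before the answer is known.
    return block == bytes(len(block))

def classify(blocks):
    """Label each block as "zeros" or "data"."""
    return ["zeros" if is_all_zeros(b) else "data" for b in blocks]

blocks = [bytes(512), b"\0" * 511 + b"\x01", b"x" * 512]
kinds = classify(blocks)
```

A block that differs from zero early on is rejected quickly, but a block
that really is all zeros is always scanned to the end.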

David Robinson | 6 Jun 2002 00:02

Re: NFSv4 Replication and Migration: design team conference call/Draft

Brent Callaghan wrote:

> I'm not sure if I read it in the paper Juan referenced, but
> you could in theory avoid some data transfers if you kept
> track of each block of transferred data together with a
> checksum.
> 
> Then when you need to transfer a block of new data in
> a file, you checksum the block and see if you've transferred
> it previously as part of another file.  If you get a hit,
> then instead of transferring the block, you just tell the
> other end where it is, and where it needs to go.

Be careful here: checksums are generally designed to detect
errors, not to uniquely identify a block. As an
example, a 2TB storage device is all that is in theory needed
to return any 512 byte block based on a 32-bit checksum.
Clearly all the world's information is not represented
by 2TB of data. By the time you get to a high enough probability
of no error, the size of the checksum is large enough that
it is probably better to just use a lossless compression
algorithm.
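
The 2TB figure can be checked with a little arithmetic, sketched below. The
collision estimate uses the standard birthday approximation and assumes
checksums behave like uniformly random values, which is an assumption; the
block count of 100,000 and the 160-bit width are hypothetical values chosen
only for comparison.

```python
import math

CHECKSUM_BITS = 32
BLOCK_SIZE = 512

# One stored 512-byte block for every possible 32-bit checksum value.
distinct_checksums = 2 ** CHECKSUM_BITS
table_bytes = distinct_checksums * BLOCK_SIZE
table_tib = table_bytes / 2 ** 40          # 2**41 bytes expressed in TiB

def collision_probability(num_blocks, bits):
    """Birthday approximation: chance that at least two of `num_blocks`
    random blocks share a checksum of the given width."""
    pairs = num_blocks * (num_blocks - 1)
    return -math.expm1(-pairs / 2 ** (bits + 1))

p32 = collision_probability(100_000, 32)    # short checksum
p160 = collision_probability(100_000, 160)  # much wider digest
```

With these assumptions a 32-bit checksum is more likely than not to collide
somewhere among 100,000 blocks, while a 160-bit value makes an accidental
collision vanishingly unlikely at the cost of a 20-byte identifier per
block.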

	-David

Brent Callaghan | 6 Jun 2002 00:13

Re: NFSv4 Replication and Migration: design team conference call/Draft

Peter Staubach wrote:
> 
> >
> > > I think compression could be worthwhile to explore.  It's not clear
> > > that we can take advantage of similar blocks except for blocks of
> > > zeros, which we should certainly optimize.  What amount of block
> > > replication would you expect, and under what circumstances?
> 
> Between this and the sparse file detection, the CPU overheads are
> possibly starting to become significant.  Looking at a block to see
> whether it is all zeros is really expensive.  There needs to be a
> cheaper way.

Yes, if moving data over a multi-gigabit network, then it's
likely cheaper just to move the zeros.  But I think a typical
application of migration will be to move the data over a
relatively low speed network, where a little CPU effort
will go a long way.

> This also doesn't match real life semantics.  We will need to be able
> to replicate files created with mkfile(1M) for example.  They tend
> to be populated with zeros, but are fully populated for good reasons.
> 
> A block of zeros does not mean that the block is not populated.

It's not necessary to send the zeros - it's much faster just
to tell the other end to imagine some :-)

Actually, in my experience with ISDN and DSL links, the
link hardware is pretty good at simple compression.

Noveck, Dave | 6 Jun 2002 00:15

RE: NFSv4 Replication and Migration: design team conference call/ Draft

What is worthwhile depends on the environment.  I believe
the environment that the paper was talking about was that
of optimizing transmission over DSL.  This is an environment 
in which the bandwidths are low, so the cost of processing
all of the data you are sending is relatively low, while
the value of optimizing the use of very scarce bandwidth
is pretty high.

I think a considerable portion of the use of the migration
and replication protocols is going to be within a building,
where the considerations are quite different. There will be
a significant portion over distance, so I don't think we want
to be profligate with bandwidth. But I don't think we will be
seeing much use in very-low-bandwidth situations, so I think
the first version at least should be oriented toward optimizing
TTWI (time-to-working-implementation), and we should avoid going
for anything fancy in the bandwidth optimization area.

-----Original Message-----
From: Peter Staubach [mailto:Peter.Staubach <at> sun.com]
Sent: Wednesday, June 05, 2002 5:14 PM
To: Robert.Thurlow <at> eng.sun.com; brent <at> eng.sun.com
Cc: juang <at> us.ibm.com; nfsv4-wg <at> sunroof.eng.sun.com
Subject: Re: NFSv4 Replication and Migration: design team conference
call/Draft

> 
> > I think compression could be worthwhile to explore.  It's not clear
> > that we can take advantage of similar blocks except for blocks of

Noveck, Dave | 6 Jun 2002 00:17

RE: NFSv4 Replication and Migration: design team conference call/ Draft

> Clearly all the world's information is not represented
> by 2TB of data. 

Some random useless (and probably unreliable) information:

In last week's issue of Science there was a little piece
on considering the universe as a quantum computer.  They
had a statement that the universe as a whole contained 
10**120 bits of information while the sum of all computers
contained 10**21 bits of information.  Don't know how
they came up with these.  

-----Original Message-----
From: David Robinson [mailto:David.Robinson <at> sun.com]
Sent: Wednesday, June 05, 2002 6:02 PM
To: Brent Callaghan
Cc: Robert Thurlow; Juan Gomez; nfsv4-wg <at> sunroof.eng.sun.com
Subject: Re: NFSv4 Replication and Migration: design team conference
call/Draft

Brent Callaghan wrote:

> I'm not sure if I read it in the paper Juan referenced, but
> you could in theory avoid some data transfers if you kept
> track of each block of transferred data together with a
> checksum.
> 
> Then when you need to transfer a block of new data in
> a file, you checksum the block and see if you've transferred
> it previously as part of another file.  If you get a hit,

