FW: [Fwd: Review: draft-ietf-rserpool-arch-11.txt] - Baker comments
Ong, Lyndon <Lyong <at> Ciena.com>
2006-07-12 17:11:01 GMT
The attached message is part 1 of a series of comments from
Fred Baker as part of his review of Rserpool work.
> 3. What type of redundancy model will I use, 2N or N+K?
Is 2N a special case of N+K? I suspect that the key difference
between the models is that the K extra servers in N+K cover up to
K failures among the N, whereas in the 2N case each of the first N
servers has a designated failover server (all N could fail, and if
the first N are serving a specific subset of the transaction set each
backup steps directly up to that service set). But I can't tell this
from your text. Should this be reworded to make the models more clear?
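
A minimal sketch of the distinction suspected above, not of anything the draft defines. The function names and server labels are hypothetical, and the 2N pairing and shared-spare behavior follow the interpretation in this comment only:

```python
def failover_2n(primaries, backups, failed):
    """2N reading: each primary has one designated backup for its service set."""
    # Pair primaries and backups one-to-one (assumes equal-length lists).
    designated = dict(zip(primaries, backups))
    return {p: (designated[p] if p in failed else p) for p in primaries}


def failover_n_plus_k(primaries, spares, failed):
    """N+K reading: K shared spares cover up to K failures among the N."""
    free = list(spares)
    assignment = {}
    for p in primaries:
        if p not in failed:
            assignment[p] = p            # primary still serves its own set
        elif free:
            assignment[p] = free.pop(0)  # any free spare takes over
        else:
            assignment[p] = None         # more than K failures: set unserved
    return assignment
```

Under this reading all N primaries can fail in the 2N case and every service set is still covered, while N+K leaves a service set unserved as soon as failures exceed K.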
> 5. How does a server assure that when it fails (or dies), the
> clients will access the "best" server that is able to handle
> failure (or if you will take over for the departed server)?
Is that the server instance's responsibility, or the service's?
Let me give you an example. At one point I was talking with Paul
Vixie about potential models for the DNS Root. At the time, instead
of using distributed anycast servers, he had a number of servers on
the same LAN at F Root and a single router in front of them. The
router distributed load by hashing the source and destination
addresses, and these all responded to the same destination address,
so in effect the router was distributing by source address. I
wondered whether he wanted us to look deeper into the packet and
distribute based on some attribute of the name being looked up, such
as a hash of the characters of the TLD. In this way, his servers
would get a predictable subset. Should one fail, it would no longer
be seen as a "next hop" in routing, and the hash would by definition
distribute the load differently. None of this was ever implemented to
my knowledge, btw. But my point is that in this case the service,
which included a load distribution function in the router,
distributed the load, but the individual servers had no knowledge
that each other were there.
One way that a service can back up servers is to have the servers
identify their backups. My point is that there are other ways, and I
suspect they are more in keeping with your design.
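
The TLD-hash idea from the F Root example can be sketched as follows. This is purely illustrative, since nothing like it was implemented: the function name, the server labels, and the choice of SHA-256 as the hash are all assumptions.

```python
import hashlib


def pick_server(qname, live_servers):
    """Choose a server by hashing the TLD of the name being looked up.

    Hypothetical sketch: live_servers is the list of servers the router
    currently sees as valid next hops.
    """
    if not live_servers:
        raise ValueError("no live servers")
    # Take the last label of the name, ignoring a trailing root dot.
    tld = qname.rstrip(".").rsplit(".", 1)[-1].lower()
    digest = hashlib.sha256(tld.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(live_servers)
    return live_servers[index]
```

Each server sees a predictable subset of TLDs while the list is stable. When a server drops out of the list, the modulus changes and the load is redistributed, and no server needs to know that the others exist.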
> A fault tolerant application needs to deal with these issues and
> more. Often an application is developed and then later, it is
> realized that the application needs to be fault tolerant. The
> response to this new requirement mandates either a hack or re-write
> of the application.
That comma doesn't make any sense. Would "Often an application is
developed, and it is realized later that the application needed to be
> So how can application writers solves these issues and makes it
> for the application designer to add fault tolerance without hacking
> or rewriting the application code?
What, technically, is the difference between "hacking" and
"rewriting" the code? Is "hacking" a bad job?
> A second important point is that this layering no longer requires
> each application to custom programmed for fault tolerance. By
s/to custom/to be custom/?
> 3. An application can be developed without a fault tolerant
> requirement and later in the life cycle, if this requirement
> emerges, it can be met with Rserpool without a costly redesign.
Suggestion: "An application can be developed in a manner that is not
fault tolerant, and fault tolerance introduced later without a costly
redesign by using Rserpool"
> The above summary is the overall goal of Rserpool.
You have six goals, if I understood the text.
"The above summarizes the overall goals of Rserpool."
> Each server pool is identified by a unique identifier which is
> a byte string, called the pool handle. This allows binary
> identifiers to be used.
Dumb question. Does the pool handle identify the pool of servers, or
does it identify a server in the pool? It seems to me that there are
uses for both forms of identifiers.
> These pool handles are not valid in the whole internet but only in
> smaller domains, called the operational scope.
What if the operational scope is the whole Internet?
It seems to me that "the whole internet" is a network layer
description, while the operational scope is an application layer
description. As such - and maybe it's just me - the above doesn't
make a lot of sense.
Example: www.microsoft.com, if you look it up using nslookup, appears
to be a service that is offered through a distributed redirection
service to a variety of individual servers. I have no idea how many
servers are involved or where they are located, but I get the idea
that there is some DNS server that resolves that name with some
knowledge of the source address, yielding something akin to what
Akamai does with names, and without resorting to anycast routing
(which IMHO would be a superior solution, but I digress). Now, the
operational scope of the service is "anyone who might access
www.microsoft.com", which is far too many end systems - it
effectively IS the whole Internet. The pool handle in this case would
seem to be implemented at the network layer, and is the address
returned for the instance of that service that a given user will access.
So help me out here. What in the world is this "operational scope"?
> In each operational scope there must be at least one ENRP server.
> All ENRP servers within the operational scope have knowledge of all
> server pools within the operational scope.
So, coming back to www.microsoft.com, let's suppose that it is
implemented as some number of groups of servers, each group of which
is co-located in one of Microsoft's favorite service providers. You
are presuming that there is at least one redundancy server in the
service, and perhaps one or more in each such group of servers. So
you are calling the groups of servers in a given colo location a
pool, and the entire thing a service? Is that right?
> Pool element: A server entity having registered to a pool.
This is a nice architectural term, I suppose. It doesn't help the
reader much, though. The pool element is an instance of an
application (in a virtual server, there might be many such running on
the same physical hardware) that has registered with a pool server.
Instead of focusing on the hardware it runs on (the server), could
you call it an "application instance" or an "instance of an
application", and the pool itself a pool of application instances?
I'm a humble engineer. I just make stuff work. When I define terms, I
usually try to do so in a manner that will help me point to objects
in pictures. So far, the terminology in this document has me
completely lost. If you were to explain it in English...
> 2.1.2. ENRP Servers
> The second class of entities in the RSerPool architecture is the
> class of ENRP servers. ENRP servers are designed to provide a
> distributed fault-tolerant real-time translation service that
> maps a
> pool handle to set of transport addresses pointing to a specific
> group of networked communication endpoints registered under that
What is a "transport address"?
I can tell you what a TSAP is, or a TCEPID, if that is what you're
getting at. A TSAP is the name a transport client uses to identify
its connection to the transport, and a TCEPID is the name used to
identify the transport in a remote system when talking to the
network layer and asking for connectivity (it translates to an NSAP
and a multiplexing ID). In the IP architecture we don't talk much
about those concepts, though - that's OSI. We generally talk about
the network layer having addresses and the transport layer having
ports, and when talking to a remote application we talk about the
network address and transport port number that the application is using.
If that is what you're calling a "transport address", you'd best
define the term. The text in this section about the destination
address, protocol, and port might be a good starting point.
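
One possible shape for such a definition, as a sketch only: a "transport address" taken to be the triple of network address, transport protocol, and port, with a pool handle (a byte string) mapping to a set of them. The class name, the 192.0.2.10 documentation address, and port 5000 are placeholders, not taken from the draft.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Union


@dataclass(frozen=True)
class TransportAddress:
    """Hypothetical definition: network address + transport protocol + port."""
    address: Union[IPv4Address, IPv6Address]
    protocol: str   # e.g. "sctp" or "tcp"
    port: int

    def __post_init__(self):
        if not 0 <= self.port <= 65535:
            raise ValueError("port out of range")


# What an ENRP-style lookup table might hold: pool handle -> endpoints.
pool = {
    b"File Transfer Pool": {
        TransportAddress(ip_address("192.0.2.10"), "sctp", 5000),
    },
}
```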
Or if you really mean an internet address, say so. We have enough
confusion in the IETF with bellheads that call the physical layer the
"transport". We don't need people misidentifying the internet
sublayer of the network layer as the "transport" too. SCTP and TCP
are transports - that's what the "T" stands for.
> 2.1.3. Pool Users
> A third class of entities in the architecture is the Pool User (PU)
> class, consisting of the clients being served by the PEs of a
What on God's Green Earth is a "user" in this situation? Is it a
person? An application? What?
> 2.2.2. Aggregate Server Access Protocol
> The PU wanting service from the pool uses the Aggregate Server
> Protocol (ASAP) to access members of the pool.
OK. So I'm a pool user (kid in a bathing suit, I think). I want to
use your Rserpool thing to, I dunno, open a BEEP connection to an
application in some other system. That means I'm going to open an
SCTP connection (yes, Marshall never wrote down BEEP/SCTP, but he
should have) to a given Internet address and transport port number,
which will get me to that application. Do I have to also open an ASAP
connection to a server to tell me which one to address? I thought I
got that from DNS?
Or is the pool user someone else? If so, who?
> o The PE can send a business card
> o The PE can send cookies
These are defined terms, right? No, they are not defined in 2.3; 2.3
merely says that they can be shipped around.
To me a business card is a small piece of paper, and a cookie is as
defined in Betty Crocker...
> basic FTP model. These examples use FTP for illustrative purposes.
> FTP was chosen since many of the basic concept are well known and
> should be familiar to readers.
s/basic concept/basic concepts/
It's really great to find the example here. One could imagine it
being in the overview as a motivator for the discussion, showing the
problem space and how the solution addresses it.
> To effect a file transfer the following steps would take place.
> 1. The application in PU(X) sends a login request. The PU(X)'s
> layer sends an ASAP request to an ENRP server to request the
> of pool elements (using (a)). The pool handle to identify the
> pool is "File Transfer Pool". The ASAP layer queues the login
Dumb question from the guy who knows far too much about queues. Does
the ASAP layer queue it (store it in a FIFO, LIFO, or similar data
structure), or does it store it in a database of active requests? I
should think it would do the latter, so that the request could then
be serviced as the necessary resources become available rather than
artificially making it wait for other requests whose resources don't
become available quite as rapidly.
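
A minimal sketch of the "database of active requests" alternative, assuming requests are parked under the pool handle they are waiting on. The class and method names are hypothetical and do not come from the ASAP specification.

```python
from collections import defaultdict


class PendingRequests:
    """Requests parked by pool handle until that handle is resolved."""

    def __init__(self):
        self._waiting = defaultdict(list)

    def park(self, pool_handle, request):
        # Store the request under the handle it is waiting for.
        self._waiting[pool_handle].append(request)

    def resolved(self, pool_handle):
        # Release only the requests for this handle, whatever their arrival
        # order relative to requests for other handles.
        return self._waiting.pop(pool_handle, [])
```

Unlike a single FIFO, a request whose pool handle resolves quickly is not held behind an earlier request that is still waiting on a different handle.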
> 2. The ENRP server returns a list of the three PEs PE(1,A), PE
> and PE(1,C) to the ASAP layer in PU(X) (using (b)).
You need some punctuation there somehow. I tried, I have no idea how.
rserpool mailing list
rserpool <at> ietf.org