The document never uses the term NFSv4.0. As a result, when it says "NFSv4" it is unclear whether NFSv4.0 or all NFSv4 protocols is meant.
I found the treatment of "loose coupling" confusing. "Loose coupling" is described as a situation in which no control protocol is present.
On the other hand, the "control protocol" is defined as "a set of requirements", which I assume applies to all mapping types and both tight and loose coupling. Any particular control protocol for file-based access would also meet the requirements specified in Section 2, in either their tight-coupling or loose-coupling variants.
Comments by Section:
Too much of this is devoted to explaining pNFS, which is not the focus of this document. I suggest that the Abstract only refer briefly to the fact that this document defines a pNFS mapping type. It seems to me that the abstract is trying to be a short introduction, which, in the nature of things, it cannot be.
While the current introduction mentions loose coupling as the only distinctive feature of the flex files layout ("to allow the use of storage devices which need not be tightly coupled to the metadata server"), I don't think this is quite accurate. It seems to me that the mirroring support is important enough to rate a mention in the abstract. Also, while you are using the term "tightly coupled" in line with your later definition, I don't think you can assume that in the abstract. Perhaps you want to say something like "to allow the use of storage devices in a fashion such that they require only a quite limited degree of interaction with the metadata server, using already existing protocols".
In the first paragraph:
- Given that RFC3530bis has been approved for publication, it may make sense to switch the v4.0 reference to refer to the newer document.
- I think you should add a clause allowing use of future minor versions.
Given your desire to standardize on "data storage device", you may (or may not) want to do something about:
There are different Layout Types for different storage systems and methods of arranging data on storage devices. This document defines the Flexible File Layout Type used with file-based data servers (emphases added).
The second paragraph also is afflicted by confusion/uncertainty regarding how to describe issues relating to coupling strength. I think it should be made clearer that:
- The "requirements" embodied by a control protocol (since a control protocol is a set of requirements) apply to both tight and loose coupling and all pNFS mapping types.
- The global stateid model (which should be explained a bit further) applies only to the files mapping type and the tight-coupling variant of this mapping type; it has a more extensive set of requirements, and presumably a more complicated control protocol, with the latter not being specified in this document.
Later (in section 1.2), you try to standardize on "data storage device" as opposed to "data server". It would be worthwhile adjusting the definitions of these terms to match that intention.
I note you define "object layout type" but never use the term. Would it be better to remove this?
You define "mirror" as follows:
is a copy of a file. While mirroring can be used for backing up a file, the copies can be distributed such that each remote site has a locally cached copy. Note that if one copy of the mirror is updated, then all copies must be updated.
I'm wondering about the use of the term "cached" here. I suspect that this is actually a mirrored copy, and the word "cached" will confuse people.
Also, I'm afraid about putting this so early in the document. I'm worried that people's reaction might go:
- I can have a local copy for each of my sites, allowing low-latency access
- Whenever I write to a layout, I (i.e. my client) might have to do a whole bunch of high-latency writes to remote sites before returning any layout.
In a more extended treatment of mirroring (somewhere other than the definition section) there would be room to explain:
- that the MDS might only specify one or two mirrors to someone getting a write layout.
- that in that case, the MDS might take on the job of propagating updates to the other N-1 or N-2 mirrors.
- that the number-of-mirrors hint is going to have some role in letting clients say "don't bury me under a pile of mirrors"
A related concern is how the occurrence of multiple mirrors relates to the issues regarding including "less optimal" mirrors discussed in section 4.2. If one of the mirrors is local it would seem that any other would have to be less optimal, creating a problem if one wants all written data to go to at least two locations.
We defined a data server as a pNFS server, which implies that it can utilize the NFSv4.1 protocol to communicate with the client.
However, in section 1.1 it contradicts that, saying:
Note that while the metadata server is strictly accessed over the NFSv4.1 protocol, depending on the Layout Type, the data server could be accessed via any protocol that meets the pNFS requirements.
I think the basic purpose of this section is to say that you are using the term "storage device" instead of "data server". I think you want to say that there is essentially no difference and you have to pick one term. If you try to explain the supposed differences between these two very similar concepts, you wind up in a swamp. Look out for alligators.
The second paragraph is not really relevant to the topic of this section. It is part of a general introduction to pNFS, as it applies to this mapping type.
As to the first paragraph, I see the basic problem as connected to the multiple use of the phrase "control protocol". Also, I'm troubled by the "MUST be defined" in the last sentence. Am I wrong to suppose that the document is in fact laying down the law not to implementers (which is what the RFC2119 terms are for) but to itself? As a way of checking whether I'm understanding the issues here, what would you say about the following as a replacement?
The coupling of the metadata server with the storage devices can be either tight or loose. In a tight coupling, there is a control protocol, a purpose-built protocol (not specified here) present to manage security, LAYOUTCOMMITs, etc. In the case of loose coupling, the control protocol's functionality is more limited, and a version of NFS might serve the purpose. Given the functionality available in such an arrangement, the semantics with regard to managing security, state, and locking models may be different from those in the case of tight coupling and are described below.
With regard to the first sentence, you need to state somewhere (section 2?) your general policy regarding the inheritance of semantics from the file mapping type. I presume that in the tight coupling case you basically inherit the semantics of the file mapping type.
With regard to the second sentence, you need to address the case in which commit is not needed, e.g. all the WRITEs were synchronous.
In the third sentence, I suggest deleting the "I.e.,"
It isn't clear to me what is intended by the first paragraph. It seems to me that you need to choose between specifying the requirements for MDS-DS interaction or specifying how they are to interact. As of now, the text is somewhere between those two. For example, saying "SETATTR" makes it sound like you are describing a wire protocol. I suspect you don't intend to, and if that's the case, you need to indicate what the real requirements are.
Also, to get back to the issue of storage-device-vs.-data-server, generally storage devices don't have a "filesystem exports level".
The last sentence of the first paragraph (the one after the semicolon) is confusing. It says "hence it provides a level of security equivalent to NFSv3." The (minimal) level of security this provides derives from the use of AUTH_SYS. It has nothing to do with NFSv3. Using NFSv4 or NFSv2 with AUTH_SYS would be exactly the same.
In the second paragraph you stray pretty far into essentially describing a particular implementation. Also, the MDS only needs to propagate to the DS information to determine whether READs or WRITEs are valid. Also, I'm not sure what "go through the same authorization process" means. Do you mean that the client is authorized using the same criteria that would apply to authorizing READs and WRITEs issued through the MDS? Perhaps it would make sense to reference section 13.12 of RFC5661 here, either to incorporate its requirements or to say how those of the flexible files layout are different.
All the other sub-sections of this section are organized around the distinction between tight and loose coupling, while in this sub-section that distinction is not mentioned at all. Instead this is organized by data access protocol, which I found confusing. I wound up with the following conclusions. Are these right?
- That for NFSv3 only the loose coupling model is supported.
- That the second paragraph describes the tight coupling model for NFSv4.0. I had originally thought it described the loose coupling model, but from section 5.1 concluded that it described the tight coupling model and that the loose coupling model for v4.0 was also supported.
- That the third and fourth paragraphs describe the loose and tight coupling models respectively for v4.1
Suggest replacing the first paragraph by:
Locking-related operations are always sent by the clients to the metadata server. These operations include:
- operations relating to opens, such as OPEN, OPEN_DOWNGRADE, and CLOSE;
- operations relating to byte-range locks, such as LOCK, LOCKU, and LOCKT;
- operations relating to delegation management, such as DELEGRETURN;
- miscellaneous operations relating to stateid management, such as TEST_STATEID and FREE_STATEID.
In the second paragraph, suggest replacing "NFSv4" by "NFSv4.0".
In the second paragraph, it says "in response to the state-changing operation" which isn't altogether consistent with the first paragraph lead-in. In particular it isn't clear:
- How byte-range locks are dealt with. I presume mandatory byte-range locks are not supported but if so, the text should say that.
- How changes to open state (e.g. open upgrades and downgrades) are dealt with if a LAYOUTGET does not occur at the right time to propagate the new stateid to the client.
With regard to the fourth paragraph, it says that "NFSv4.1 clustered storage devices ... use a back-end control protocol as described in [RFC5661]". Unfortunately it isn't really described in RFC5661. How about the following as a replacement for this paragraph:
NFSv4.1 clustered storage devices that do identify themselves with the EXCHGID4_FLAG_USE_PNFS_DS flag to EXCHANGE_ID are dealt with using the tight coupling model. This includes implementation of a global stateid model as defined in [RFC5661]. In this case the MDS and storage devices co-operate using a back-end control protocol adequate to meet the requirements of the file mapping type defined in [RFC5661], which apply to the tight-coupling case of the flexible file mapping type as well.
There are a number of locking-related matters that need to be dealt with somewhere, if not necessarily here:
- delegation recall/revocation.
- lease renewal and expiration.
- reboots of MDS, clients, storage devices.
At the end of the second paragraph, suggest replacing "using NFSv4" by "using the specified minor version of NFSv4".
Suggest a paragraph after the current third paragraph to deal with optional features. For example,
In the case in which a minor version contains an OPTIONAL feature that the client might use to communicate with the storage device, the client can check for the presence of support when GETDEVICEINFO returns. If the absence of support for the feature causes the device not to be acceptable, then this needs to be communicated to the MDS by doing a LAYOUTRETURN.
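As a sanity check on the flow proposed above, here is a minimal Python sketch. The operation names GETDEVICEINFO and LAYOUTRETURN are real; everything else, including the function and field names, is hypothetical:

```python
# Hypothetical client-side check of optional-feature support based on
# GETDEVICEINFO results; only the operation names are from the protocol.

returned_layouts = []

def send_layoutreturn(reason):
    # Stand-in for issuing a LAYOUTRETURN to the MDS.
    returned_layouts.append(reason)

def device_acceptable(device_info, required_features):
    """Check GETDEVICEINFO results for the OPTIONAL features we need."""
    missing = [f for f in required_features
               if f not in device_info.get("features", set())]
    if missing:
        # Device not acceptable: inform the MDS via LAYOUTRETURN.
        send_layoutreturn("missing features: " + ", ".join(sorted(missing)))
        return False
    return True
```

The point is only that the client has a well-defined fallback (return the layout) when an optional capability it depends on turns out to be absent.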
Suggest adding the following after all existing paragraphs.
In the case in which the two storage devices do not have the same server owner major ID, the two devices will be accessed over different sessions when access is over NFSv4.1 or a later minor version. This is true whether clientid trunking is present or not. Given the way in which state is managed using the flexible files layout, there is no need for the client to treat state for the separate sessions as shared based on clientid. This is true whether tight or loose coupling is in effect.
- In the case of tight coupling, all state management happens under the aegis of the clientid associated with the MDS.
- In the case of loose coupling, there is no state management in that the clients simply use the stateids already established on the storage devices by the MDS.
Thus, the client may, in this case, accommodate NFSv4.1 multipathing without significant client-based support for clientid trunking. When there are multiple sessions, the only difference between a situation in which there are multiple clientids and one in which there is a single clientid is with regard to lease renewal.
Given the IESG's current attitude regarding RFC2119 terms (that the upper-case keywords need to be clearly called for), perhaps it makes sense to rewrite the following material from the third paragraph:
If some network addresses are less optimal paths to the data than others, then the MDS SHOULD NOT include those network addresses in ffda_netaddrs. If less optimal network addresses exist to provide failover, the RECOMMENDED method to offer the addresses is to provide them in a replacement device-ID-to-device-address mapping, or a replacement device ID
and instead write something like the following.
If some network addresses are less optimal paths to the data than others, then the MDS should not include those network addresses in ffda_netaddrs. If less optimal network addresses exist to provide failover, the appropriate method to offer the addresses is to provide them in a replacement device-ID-to-device-address mapping, or a replacement device ID
The seventh paragraph is confusing:
For tight coupling, ffds_stateid provides the stateid to be used by the client to access the file. For loose coupling and a NFSv4 storage device, the client may use an anonymous stateid to perform I/O on the storage device as there is no use for the metadata server stateid (no control protocol). In such a scenario, the server MUST set the ffds_stateid to be zero.
The problem is with the phrase "may use an anonymous stateid" (emphasis added). The implication (or maybe it's an implicature) is that it might use some other stateid. Given that ffds_stateid MUST be zero, it's hard to imagine what other stateid the client might use.
Is the following replacement correct? If so, I feel it is clearer.
The field ffds_stateid provides the stateid to be used by the client to access the file. For tight coupling this stateid is established using the control protocol. When loose coupling is in effect, ffds_stateid must be set by the MDS to be zero. As a result, client access to the storage device uses the anonymous stateid for NFSv4 protocols.
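If that replacement is on the right track, the client-side rule reduces to something like the following sketch. The stateid size follows NFSv4's 16-byte stateid4; names other than ffds_stateid are illustrative, not from the draft:

```python
# Illustrative client-side choice of stateid, following the rule
# proposed above; names other than ffds_stateid are hypothetical.

ANONYMOUS_STATEID = bytes(16)  # all-zeros anonymous stateid (stateid4 is 16 bytes)

def stateid_for_io(tightly_coupled: bool, ffds_stateid: bytes) -> bytes:
    if tightly_coupled:
        # Tight coupling: use the stateid the MDS established via the
        # control protocol and handed out in the layout.
        return ffds_stateid
    # Loose coupling: the MDS must have set ffds_stateid to zero, and
    # the client uses the anonymous stateid on the storage device.
    assert ffds_stateid == ANONYMOUS_STATEID, \
        "loose coupling requires a zero ffds_stateid"
    return ANONYMOUS_STATEID
```

This makes the point of the proposed rewording concrete: in the loose-coupling case there is never a choice of stateid to make, so "may use" understates the constraint.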
Suggest replacing the second sentence by the following:
Some reasons for this are accommodating clustered servers which share the same filehandle and allowing for multiple read-only copies of the file on the same storage device.
You would need to say something about the case of feature mismatch, as opposed to version mismatch, to match my proposed changes for section 4.1.
Not sure if this fits in this section but there are a number of other sorts of mismatch that you might want to address:
- mismatch as to an acceptable value for ffdv_tightly_coupled (or individualized coupling characteristics flags).
- possible rejection due to mirror count.
I think you need a final paragraph to deal with cases in which the MDS is unable to determine that the appropriate updates have been done. For example, revocation of layouts or client reboot.
Suggest revising the first paragraph to read as follows:
When the LAYOUT4_RET_REC_FILE case of the LAYOUTRETURN operation is being used, the lrf_body field of the layoutreturn_file4 is used to convey layout-type specific information to the server. Some relevant definitions from [RFC5661] are as follows.
You need to give some guidance regarding use (or non-use) of the FSID and ALL cases of LAYOUTRETURN.
I think you are missing a "<>" after ffie_errors.
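For comparison, the corrected declaration would presumably make ffie_errors a variable-length array, along these lines (my reconstruction of the intended XDR, not a quote from the draft):

```
struct ff_ioerr4 {
    offset4        ffie_offset;
    length4        ffie_length;
    stateid4       ffie_stateid;
    device_error4  ffie_errors<>;   /* "<>" marks a variable-length array */
};
```

Without the "<>", ffie_errors would be a single device_error4 rather than a list of them, which doesn't match the surrounding prose.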
Wondering about use of nfstime4 here since it:
- uses up 12 bytes of space;
- has a range of hundreds of billions of years, which I hope is overkill;
- is not convenient for the kinds of calculations you'll be doing with this sort of data.
If any code has been written to use this, it isn't worth changing at this point. However, if it hasn't, the following formats should be considered:
- uint32 milliseconds has reasonable resolution and covers durations up to about 50 days.
- uint64 nanoseconds has adequate resolution for the foreseeable future and covers durations up to hundreds of years.
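The ranges involved are easy to check; a quick back-of-the-envelope computation (plain Python, nothing draft-specific):

```python
# Maximum representable durations for candidate formats; pure arithmetic.

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 86_400 * 365.25

u32_ms_days = (2**32 - 1) / 1_000 / SECONDS_PER_DAY            # uint32 of milliseconds
u32_us_minutes = (2**32 - 1) / 1_000_000 / 60                  # uint32 of microseconds
u64_ns_years = (2**64 - 1) / 1_000_000_000 / SECONDS_PER_YEAR  # uint64 of nanoseconds

print(f"uint32 milliseconds: ~{u32_ms_days:.1f} days")
print(f"uint32 microseconds: ~{u32_us_minutes:.1f} minutes")
print(f"uint64 nanoseconds:  ~{u64_ns_years:.0f} years")
```

Note that a uint32 counting microseconds tops out after roughly 71 minutes; it is a uint32 of milliseconds that reaches roughly 50 days.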
In the first sentence, suggest replacing "differentiates" by "specifies".
In the second sentence suggest replacing "respectively for both read and write operations" by "for read and write operations respectively."
In the third sentence of the penultimate paragraph, suggest replacing "it is infeasable" by "it may be infeasible"
In the last sentence of the penultimate paragraph, suggest deleting the word "contiguous".
In the last sentence of the final paragraph, suggest replacing "between" by "among."
Suggest replacing the last sentence by "The only parameter is optional, so the client need not specify a value for a parameter it does not care about".
You need to have more info about mirror hints somewhere, and this is a likely place. Some issues to address:
- Why a client might choose a particular ffmc_mirrors value
- Why the server might choose to give a different mirror count than hinted by the client
- What options a client has if a layout's mirror count is not acceptable to him
Not sure what you mean by the statement "at the least the server MAY revoke client layouts and/ or device address mappings". If that's what the server may do at the least, what might it do at the most? For example, might it corrupt any file that the client has touched? Or do you mean that the server MUST at least do these things?
I think you're going to have problems with the IESG with the section as written. I suggest rewriting it.
Some suggestions to consider for a replacement are listed below.
- Start with the tight coupling model. Don't mention Kerberos specifically, but say this mapping type has all the security properties of NFSv4 and the pNFS file mapping type.
- With regard to the loose coupling variant:
- Stress that it is intended for, and SHOULD only be used in, appropriate environments, i.e. those characterized by physically isolated networks or VPNs with comparable security characteristics.
- Mention that this should be compared to the block mapping type, and that it has equivalent security against malicious clients and better protection against data corruption due to client mistakes.
- If you are not going to implement or even specify something in an area, either don't mention it or simply say it is not part of this specification. Thinking out loud about why you are not doing something is not a good idea in something that is to be an RFC candidate. In particular, if you are doing revocation by changing ownership in order to accommodate NFSv3, the chances of people implementing NFSv3 servers with RPCSEC_GSSv3 support are near zero. It is better to be honest and say that this is simply a temporary expedient to support existing data servers and that kerberization of this revocation model is not going to happen.
You should also mention the stuff associated with CB_RECALL_ANY.