Re: provisional registration: packaged content over http (5 headers)
Richard Jones <rich.d.jones <at> gmail.com>
2012-01-17 14:30:49 GMT
Hi Folks,
Thanks for the feedback; I've got some inline comments to see if I understand what changes I need to make ...
My comments below notwithstanding, I don't think there are any specific problems for provisional registration (but I think further discussion would be needed for progression to permanent registration).
At this stage we're absolutely only looking for provisional registration; there are other discussions which need to take place before we can see if we have the resources to go full standards track with any of this.
(By way of full disclosure to other readers, I have been a technical advisor to the SWORD project which is presenting these proposals.)
[[
The Packaging header applies to resources delivered over HTTP which are comprised of component resources, and is for uniquely identifying these well structured packaged objects in a similar way that Content-Type does for MIME formats.
]]
I think this would be a good opportunity to canvass for information about whether any other projects are addressing similar issues w.r.t. conveying information about packaging or composite object formats in HTTP. I'm pretty sure this isn't a one-off problem.
Packaging doesn't really fall into the role of a content-type, as it doesn't say anything about the nature or purpose of the packaged content. But it also is not really a content transfer encoding, as it may convey application-relevant metadata in addition to simply encoding content for transfer.
The nearest I can think of that has been addressed in the IETF is the MHTML work from some years ago, which uses multipart/related structures to bundle up the content of a web page (http://tools.ietf.org/html/rfc2557). But that doesn't really work in this case, as SWORD and related applications are already using a number of alternative formats that don't easily map into a multipart/related or similar MIME encapsulation structure.
[[
The Packaging request header SHOULD be used by the client during HTTP POST to give information to the server about the packaging format used to construct the content being POSTed or PUT. Servers SHOULD use this information to unpack the supplied content into its component parts. If the server does not understand the package format it MUST either store the content as delivered without unpacking or respond with 415 (Unsupported Media Type).
]]
It is not clear from this text that the SHOULD here applies to implementations of SWORD. For the header specification document, I think it would be better to avoid such normative claims about its use, which might be read as claiming that any HTTP client SHOULD use the header. e.g. just say "The Packaging request header may be used by a client ..."
Ok, good point. I wanted to try and spin these as not so SWORD specific (and then to re-use them in the SWORD protocol). So just replacing SHOULD with MAY will be sufficient?
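As a rough illustration of the server behaviour described in the quoted Packaging text (unpack if the format is understood, otherwise either store the content as delivered or respond with 415), here is a minimal Python sketch. It is not taken from the SWORD spec: the function name, the set of known formats, the example.org identifier and the store_unknown flag are all hypothetical, and header names are assumed to have been normalised already.

```python
# Hypothetical set of packaging formats this server knows how to unpack.
KNOWN_PACKAGING = {"http://example.org/package/simple-zip"}


def handle_deposit(headers, store_unknown=True):
    """Decide what to do with a deposit, returning (status_code, action).

    `headers` is a plain dict of request headers. `store_unknown` is an
    assumed server policy switch: keep unrecognised packages as opaque
    content, or reject them with 415.
    """
    packaging = headers.get("Packaging")
    if packaging is None:
        # The header is optional, so treat the body as an opaque file.
        return 201, "stored-as-is"
    if packaging in KNOWN_PACKAGING:
        # Format understood: unpack into component parts.
        return 201, "unpacked"
    if store_unknown:
        # Format not understood: store as delivered, without unpacking.
        return 201, "stored-as-is"
    # Format not understood and the server chooses to refuse it.
    return 415, "rejected"
```

The 201 status is only a placeholder for "accepted"; the quoted text fixes only the 415 case.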
[[
The In-Progress request header MAY be used by the client to inform the server that the current content payload is not yet complete in some unspecified way during PUT, POST or DELETE. For example, there may be further content packages that the client plans to deliver to the server before the full content has been delivered, or the client may need to carry out other actions against the server before confirming that the server can proceed to fully process the content. Exact interpretation of this header is left to the server, so it is necessary that server/client pairs will have to have a common understanding of its meaning which is beyond the scope of this document.
]]
This feels to me like an abuse of the HTTP protocol - if it modifies the intent of the method, that would be wrong, but that's not what I think is intended. Rather, it seems to modify the interpretation of the resource identified by the request URI, which makes me wonder if the intent might not be better conveyed by using a different server-designated URI for the parts.
Yeah, this is a tricky one to explain, and clearly needs more work. Let me try again here, and then if that makes sense I can fold it back into the docs:
In-Progress is intended as a guiding hand for the client to use to tell the server that it should expect more content to be delivered to this web resource before it can be considered "complete". The following scenarios would necessitate its usage:
- you are uploading a number of files via a client in a number of discrete upload steps. At each file upload, the file (which may be a package) is sent to the server, which accepts the deposit. But you don't want the server to inject the content into any kind of workflow (e.g. for re-publication on the web) until you have finished uploading all of the files, so the In-Progress header tells the server that it should expect more related content in future requests. On your final upload either omit the header or send In-Progress: false to tell the server that it can go ahead and start its workflows.
- you are uploading content from a number of sources. Perhaps a content package from a research management/information system and related research data from lab equipment. They both pertain to the same "object" on the server, but will be delivered at uncertain times and from different sources. By using In-Progress, both depositors can tell the server that more data is coming for that object and it is not yet finished.
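The two scenarios above come down to the same server-side rule, which can be sketched in Python as below. This is only a sketch under assumptions: the Deposit class, its attribute names and the returned strings are invented for the example, and treating any value other than "true" as "not in progress" is my reading of "omit the header or send In-Progress: false", not something the documents specify.

```python
class Deposit:
    """Hypothetical server-side object that accumulates uploaded parts."""

    def __init__(self):
        self.parts = []        # content received so far, in arrival order
        self.complete = False  # becomes True once workflows may start

    def receive(self, headers, body):
        """Accept one upload and report whether workflows were triggered."""
        self.parts.append(body)
        # An omitted header is treated the same as "In-Progress: false".
        value = headers.get("In-Progress", "false").strip().lower()
        if value == "true":
            # More related content is expected in later requests.
            return "waiting"
        self.complete = True
        return "workflow-started"
```

For example, two uploads sent with In-Progress: true followed by one without the header would leave three parts on the deposit and start the workflow only on the third request.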
We had extensive discussion about how to pitch this, and there was some suggestion about embedding the In-Progress information into the request /body/ rather than the headers, but this would require specific treatment of the package itself, which we wanted to avoid. I think of it as a more decoupled version of Transfer-Encoding: chunked. So, if we can find a way to describe this which makes sense and doesn't appear to impact the other purposes of HTTP, that would be good. Thoughts?
[[
The Metadata-Relevant request header MAY be used by the client to instruct the server to (attempt to) extract metadata from the supplied content package, during PUT, POST or DELETE. Content packages commonly contain both file content and metadata about its contents, and during unpacking servers may process this metadata in a way which is meaningful to them. If the content package is being supplied to an HTTP resource which is not interested in metadata, then it may be that the enclosed information will not be correctly or adequately treated. This directive allows the client to indicate to the server that there is metadata contained within the package which may be of interest to related resources (for example a resource which contains the resource receiving the content), and that the server should be free to update those resources accordingly.
]]
Do you *really* mean for this to be applicable to HTTP DELETE operations?
No, sorry, good catch; will fix in the next revision.
(A bit of Googling - e.g. http://www.spenceruresk.com/2011/11/http-delete-requests-that-include-a-body/, http://stackoverflow.com/questions/299628/is-an-entity-body-allowed-for-an-http-delete-request - suggests that there's no specific prohibition of sending data/metadata with a DELETE request, but that any such attempt is unlikely to be handled dependably by existing software. And it's really not clear what it would mean to send metadata about a resource that is being deleted.)
Did you consider the option that the relevance of metadata might be conveyed by a parameter on the Packaging header field? This header doesn't seem to have any purpose independently of the Packaging hea
I think there's a risk in combining these, because you could imagine that there is relevant metadata in, say, a PDF, and if this was part of the Packaging header you couldn't re-use this header outside of that context.
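To make the PDF case concrete, here is a small Python sketch of a client building request headers where Metadata-Relevant is sent without any Packaging header. The helper name and its parameters are hypothetical, and the "true"/"false" values for Metadata-Relevant are an assumption modelled on the In-Progress: false example earlier in this message.

```python
def deposit_headers(content_type, packaging=None, metadata_relevant=None):
    """Build request headers for a deposit (hypothetical helper).

    `packaging` and `metadata_relevant` are independent: either may be
    given without the other, which is the point of keeping them as
    separate header fields.
    """
    headers = {"Content-Type": content_type}
    if packaging is not None:
        headers["Packaging"] = packaging
    if metadata_relevant is not None:
        # Assumed wire format: the literal strings "true" / "false".
        headers["Metadata-Relevant"] = "true" if metadata_relevant else "false"
    return headers


# A plain PDF: no packaging format, but embedded metadata worth extracting.
pdf_headers = deposit_headers("application/pdf", metadata_relevant=True)
```

If the flag were a parameter on Packaging, this request would have no field to carry it.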
Cheers,
Richard