Jian Li | 1 Jul 02:35 2010

Re: [File API] Recent Updates To Specification + Co-Editor

We have some more questions regarding blob URLs.

1. The spec does not describe how blobs and blob URLs will work in worker and shared worker scenarios. I think we should allow WorkerGlobalScope to be a binding context for blob URLs, like Document. In addition, we should define how a blob object can be passed to a worker via structured cloning: a new blob object should be created that points to the same underlying data.
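A toy sketch of the cloning semantics being proposed here (FakeBlob and structuredCloneBlob are hypothetical names used only for illustration; this is not the real structured clone algorithm):

```javascript
// Toy sketch: structured-cloning a blob produces a NEW wrapper object
// that shares the SAME underlying data, rather than copying it.

function FakeBlob(bytes) {
  // "underlying data": a shared, immutable byte sequence
  this._data = bytes;   // in a real engine this lives outside the JS heap
  this.size = bytes.length;
}

// What a structured clone (e.g. postMessage to a worker) is proposed to do:
function structuredCloneBlob(blob) {
  var clone = Object.create(FakeBlob.prototype);
  clone._data = blob._data;  // share the underlying data, don't copy it
  clone.size = blob.size;
  return clone;
}

var original = new FakeBlob([1, 2, 3]);
var received = structuredCloneBlob(original); // as if it arrived in a worker

console.log(received !== original);             // true: distinct wrappers
console.log(received._data === original._data); // true: same underlying data
```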

2. The current spec says that the lifetime of a blob URL is bound to the lifetime of the spawning context. What happens if we try to access a blob URL from multiple contexts? Say we call "parent.blob.url": per the spec, the lifetime of the URL is bound to the parent context, not the current context. This seems a little unnatural. Could we explicitly provide the context when creating the blob URL, e.g. "window.createBlobUrl(blob)"?

3. Since the lifetime of a blob URL is bound to a context, the blob URL (and the underlying blob data) only gets disposed of when the context dies. With long-lived pages or shared workers, we could end up with "leaked" blob URLs that tie up blob storage indefinitely. It would be nice to add the capability to revoke a blob URL programmatically, e.g. "window.revokeBlobUrl(url)".
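A minimal sketch of how an explicit create/revoke pair might behave, assuming hypothetical window.createBlobUrl/revokeBlobUrl names and a simple Map-based registry (an assumption, not spec behavior):

```javascript
// Hypothetical create/revoke pairing: revocation frees the entry without
// waiting for the spawning context (page, shared worker) to die.

var registry = new Map(); // blob URL string -> blob data
var counter = 0;

function createBlobUrl(blob) {
  var url = "blob:" + (++counter); // a real UA would use an opaque UUID
  registry.set(url, blob);
  return url;
}

function revokeBlobUrl(url) {
  registry.delete(url); // storage can now be reclaimed
}

var url = createBlobUrl({ size: 3 });
console.log(registry.has(url)); // true: resolvable while registered
revokeBlobUrl(url);
console.log(registry.has(url)); // false: no longer resolvable
```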

4. It would be good if the spec said more about the lifetimes of the blob object and the blob URL, since they're somewhat orthogonal: the blob object remains functional as long as it is not garbage-collected, even if the associated context dies.

5. The spec does not explicitly describe transient cases, like "location.href = blob.url". The spec could state that the resource pointed to by a blob URL should load successfully as long as the blob URL is valid at the time the load starts.
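The suggested rule could be sketched like this (registry and startLoad are illustrative names; real loading is asynchronous and more involved):

```javascript
// Sketch of the suggested rule: a load succeeds if the blob URL was valid
// when the load STARTED, even if it is revoked before the load finishes.

var registry = new Map();
registry.set("blob:abc", "underlying data");

function startLoad(url) {
  // Validity is checked once, at load start; the loader then holds its
  // own reference to the underlying data.
  if (!registry.has(url)) return null;           // invalid at start: fail
  var data = registry.get(url);
  return function finishLoad() { return data; }; // completes later
}

var pending = startLoad("blob:abc"); // valid at start
registry.delete("blob:abc");         // revoked mid-load
console.log(pending());              // "underlying data": still succeeds
console.log(startLoad("blob:abc"));  // null: invalid at start
```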



On Mon, Jun 28, 2010 at 2:20 PM, Arun Ranganathan <arun <at> mozilla.com> wrote:
Greetings WebApps WG,

I have made edits to the File API specification [1].  There are a few things of note that I'd like to call the WG's attention to.

1. There is a name change in effect.  FileReader has been re-named BlobReader, upon request from Chrome team folks[2][3].  The name "BlobReader" won't win awards in a beauty pageant, but it tersely describes an object to read Blobs (which could originate from the underlying file system *or* be generated *within* a Web App).  My present understanding is that FileWriter will also undergo a name change.  Naming is really hard.  Firefox already ships with FileReader, but I see the point of having an object named for what it does, which in this case is certainly more than file reading from the underlying file system.  I also abhor bike shedding, especially over naming, but this is something that's exposed to the authors.  I have not renamed FileError or FileException.  In the case of errors and exceptions, I think *most* scenarios will occur as a result of issues with the underlying file system.  These names should remain.

2. I've updated the URL scheme for Blobs using an ABNF that calls for an "opaque string" which is a term I define in the specification.  There was much discussion about this aspect of the File API specification, and I think the existing scheme does allow for user agents to tack on origin information in the URL (this is not something the spec. says you should do).  The actual choice of opaque string is left to implementations, though the specification suggests UUID in its canonical form (and provides an ABNF for this).  I think this is the most any specification has said on the subject of URLs.
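As a sketch of the suggested choice, a UA might mint blob URLs from UUIDs in canonical form; the generator and the validation regex below are an illustrative transcription of that suggestion, not the spec's ABNF:

```javascript
// Mint an opaque string as a canonical-form UUID (8-4-4-4-12 lowercase hex).
function uuid4() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function (c) {
    var r = (Math.random() * 16) | 0;
    var v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

function makeBlobUrl() {
  return "blob:" + uuid4();
}

// Illustrative check that a minted URL matches the canonical UUID shape.
var OPAQUE = /^blob:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;

var url = makeBlobUrl();
console.log(OPAQUE.test(url)); // true
```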

3. There is an additional asynchronous read method on BlobReader, and an additional synchronous read method on BlobReaderSync, namely readAsArrayBuffer.  These use the TypedArrays definition initially defined by the WebGL WG [4].

4. I am moving on from my full-time role at Mozilla to a part-time consulting role.  I'll continue to be an editor of the File API, but I am stepping down as Chair of the WebGL WG.  I'll continue to be active in standards communities, though :-)

5. I spoke to Jonas Sicking, who expressed willingness to be a co-editor of the File API specification.  Most people who work on HTML5 and WebApps know Jonas' contributions to both WGs; with everyone's consent, I'd like to nominate him as co-editor.  His model for an asynchronous event-driven API is what prompted the initial rewrite, and he also works on both File API and IndexedDB implementation (amongst other things).

-- A*

[1] http://dev.w3.org/2006/webapi/FileAPI/
[2] http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0755.html
[3] http://lists.w3.org/Archives/Public/public-webapps/2010AprJun/0716.html
[4] https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/doc/spec/TypedArray-spec.html



Jonas Sicking | 1 Jul 03:17 2010

[IndexedDB] Should .add/.put/.update throw when called in read-only transaction?

Hi All,

Currently the IndexedDB specification is silent on what should happen
if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
transaction. There are two possible ways we can handle this:

1. We can throw an exception.
2. We can return an IDBRequest object and asynchronously fire an 'error'
event on this object.
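The two options can be contrasted with a toy sketch (putThrowing/putFiringEvent are hypothetical stand-ins, not IndexedDB API):

```javascript
// Option 1: a synchronous throw that ordinary try/catch sees.
function putThrowing(store, readOnly) {
  if (readOnly) throw new Error("READ_ONLY transaction");
  return { onerror: null };
}

// Option 2: a request object whose 'error' handler fires later.
function putFiringEvent(store, readOnly) {
  var request = { onerror: null };
  if (readOnly) {
    // "asynchronously" fire an error event on the returned request
    setTimeout(function () {
      if (request.onerror) request.onerror({ type: "error" });
    }, 0);
  }
  return request;
}

// Option 1: existing error infrastructure kicks in.
try {
  putThrowing({}, true);
} catch (e) {
  console.log("caught: " + e.message);
}

// Option 2: the page must remember to attach a handler on every request.
var req = putFiringEvent({}, true);
req.onerror = function (e) { console.log("event: " + e.type); };
```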

The advantage of 1 is that we pretty much know this was an error due
to a bug in the web page, and we always know it synchronously, without
having to consult the database. Throwing means that all the existing
infrastructure for error handling will automatically kick in: any
higher-level try/catch constructs get an opportunity to catch the
error, implementations generally report uncaught exceptions to an
error log, and the browser fires an 'error' event on the window which
the page can use for further logging. Firing an error event, on the
other hand, does not allow the browser to automatically log the error
in a console, as the page hasn't yet had a chance to handle it.

The advantage of 2 is that this is consistent with other error
conditions, such as writing duplicate keys, disk errors during writing
the database to disk, internal errors in the database, etc.

While consistency, and only needing to check for errors one way, are
certainly good arguments, I would argue that people won't need to
check for calling .add on a read-only transaction. For properly
written code it's an error that won't occur, so there is no need to
check for it. In fact, you are probably better off letting the
exception bubble all the way up and get logged or caught by generic
error handlers.

Additionally, the structured clone algorithm defines that an
exception is thrown synchronously if the object is malformed, for
example if it contains a cyclic graph. So .add/.put/.update can
already throw under certain circumstances.
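For instance, in current engines the structured clone operation throws synchronously for uncloneable values such as functions (a sketch using the modern structuredClone global, Node 17+/recent browsers; the 2010 drafts differ in detail):

```javascript
// Cloning can throw synchronously: structuredClone rejects values it
// cannot clone, such as functions, with a DataCloneError. So any API
// that clones its argument can already throw at the call site.

var threw = false;
try {
  structuredClone({ callback: function () {} }); // functions are uncloneable
} catch (e) {
  threw = true;
  console.log(e.name); // "DataCloneError"
}
console.log(threw); // true
```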

Also, compare this to a different API strategy where objectStores and
cursors returned from READ_ONLY transactions simply lack the mutating
functions. In that case, if someone tried to call .put(), it would
also result in an exception from the JS interpreter, stating that
you're calling a function that doesn't exist.

So I would argue that we should throw for at least all transaction
violations. I.e. whenever you try to perform an action not allowed by
the current transaction. This would also cover the case of calling
createObjectStore/removeObjectStore/createIndex/removeIndex during a
non-setVersion-transaction.

There is also another case where we synchronously know that an error
will be reported. We could throw when IDBCursor.update() is called on
an object store that uses in-line keys and the property at the key
path does not match the key at this cursor's position. Here, too, we
immediately know that there is an error without having to consult the
database, and we can generally be sure there is a bug in the web page
that would benefit from being reported like other bugs are.
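That check could be sketched as follows (evaluateKeyPath/updateCursor are illustrative helpers handling only simple single-property key paths, not spec API):

```javascript
// Toy sketch of the proposed synchronous check for IDBCursor.update()
// on a store with in-line keys: if the value's key-path property differs
// from the cursor's position, throw immediately — no database needed.

function evaluateKeyPath(value, keyPath) {
  // only simple single-property key paths, for illustration
  return value[keyPath];
}

function updateCursor(cursor, newValue, keyPath) {
  if (evaluateKeyPath(newValue, keyPath) !== cursor.key) {
    throw new Error("key path value does not match cursor position");
  }
  return { /* stand-in for the IDBRequest of the asynchronous write */ };
}

var cursor = { key: 1 };
updateCursor(cursor, { id: 1, name: "ok" }, "id");    // fine
try {
  updateCursor(cursor, { id: 2, name: "bad" }, "id"); // mismatch
} catch (e) {
  console.log(e.message); // flagged synchronously, at the call site
}
```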

And like stated above, IDBCursor.update() can already throw if the
passed in object can't be structurally cloned.

Jeremy previously asked if there was a test we could use to
clearly/intuitively divide error conditions into two groups: ones
that cause exceptions to be thrown, and ones that cause error events
to be fired. I would say that errors that do not depend on what data
is in the database, but are clearly due to errors at the call site,
should throw an exception.

Let me know what you think.

/ Jonas

Jeremy Orlow | 1 Jul 03:42 2010

Re: [IndexedDB] Should .add/.put/.update throw when called in read-only transaction?



On Thu, Jul 1, 2010 at 11:17 AM, Jonas Sicking <jonas <at> sicking.cc> wrote:
> Hi All,
>
> Currently the IndexedDB specification is silent on what should happen
> if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
> IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
> transaction. There are two possible ways we can handle this:
>
> 1. We can throw an exception.
> 2. We can return a IDBRequest object and asynchronously fire a 'error'
> event on this object.
>
> The advantage of 1 is that we pretty much know that this was an error
> due to a bug in the web page, and we can always know this
> synchronously without having to consult the database. Throwing an
> error means that all the existing infrastructure for error handling
> with automatically kick in. For example any higher-level try/catch
> constructs will have an opportunity to catch the error.
> Implementations generally report uncaught exceptions to an error log.
> The browser will fire an 'error' event on the window which the page
> can use for further logging. Firing an error event on the other hand
> does not allow the browser to automatically log the error in a console
> as the page hasn't yet gotten a chance to handle it.
>
> The advantage of 2 is that this is consistent with other error
> conditions, such as writing duplicate keys, disk errors during writing
> the database to disk, internal errors in the database, etc.
>
> While consistency, and only needing to check for errors one way, is
> certainly good arguments, I would argue that people won't need to
> check for calling-add-on-read-only-transactions. For properly written
> code it's not an error that will occur, and thus there is no need to
> check for it. In fact, you probably are generally better off letting
> the exception bubble all the way up and get logged or caught by
> generic error handlers.
>
> Additionally, the structured clone algorithm, which defines that an
> exception should synchronously be thrown if the object is malformed,
> for example if it consists of a cyclic graph. So .add/.put/.update can
> already throw under certain circumstances.
>
> Also compare to if we were using a different API strategy of making
> objectStores and cursors returned from READ_ONLY transactions not have
> mutating functions. In this case if someone tried to call .put(), that
> also would result in a exception from the JS interpreter stating that
> you're calling a function that doesn't exist.
>
> So I would argue that we should throw for at least all transaction
> violations. I.e. whenever you try to perform an action not allowed by
> the current transaction. This would also cover the case of calling
> createObjectStore/removeObjectStore/createIndex/removeIndex during a
> non-setVersion-transaction.
>
> There is also another case where synchronously know that an error will
> be reported. We could throw when IDBCursor.update() is called when the
> underlying object store uses in-line keys and the property at the key
> path does not match the key in this cursor's position. In this case we
> similarly immediately know that there is an error without having to
> consult the database. We also generally can be sure that there is a
> bug in the web page which would benefit from being reported like other
> bugs are.
>
> And like stated above, IDBCursor.update() can already throw if the
> passed in object can't be structurally cloned.
>
> Jeremy previously asked if there was a test we could use to
> clearly/intuitively break error conditions into two groups. Ones that
> cause exceptions to be thrown, and ones that cause error events to be
> fired. I would say that errors that do not depend on what data is in
> the database, but rather are clearly due to errors at the call site
> should throw an exception.

This would limit us in the future in terms of schema changes.  The current async interface defers starting the transaction until the first call that accesses/modifies data (which are all async).  If we ever allow a schema change to happen without disconnecting all clients, it'd be possible for the objectStore to be deleted between when the call is made and when the transaction is actually allowed to start.

This also limits what can be done on a background thread.  For example, an implementation couldn't serialize the object on a background thread (and even if it did, it would need to make sure the main thread didn't modify the object until serialization finished).

Because of these reasons, I'm not too excited about this particular heuristic for when to throw vs. fire an error callback.  I've thought about it a bit and unfortunately can't think of anything better, though.

I think I'm still slightly in favor of routing all errors through onerror callbacks and never throwing from a function that returns an IDBRequest, but I think there were some good points brought up by Jonas for why throwing on some errors would make sense.
 
Let me know what you think.

/ Jonas


Jonas Sicking | 1 Jul 03:55 2010

Re: [IndexedDB] Should .add/.put/.update throw when called in read-only transaction?

On Wed, Jun 30, 2010 at 6:42 PM, Jeremy Orlow <jorlow <at> chromium.org> wrote:
> On Thu, Jul 1, 2010 at 11:17 AM, Jonas Sicking <jonas <at> sicking.cc> wrote:
>>
>> Hi All,
>>
>> Currently the IndexedDB specification is silent on what should happen
>> if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
>> IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
>> transaction. There are two possible ways we can handle this:
>>
>> 1. We can throw an exception.
>> 2. We can return a IDBRequest object and asynchronously fire a 'error'
>> event on this object.
>>
>> The advantage of 1 is that we pretty much know that this was an error
>> due to a bug in the web page, and we can always know this
>> synchronously without having to consult the database. Throwing an
>> error means that all the existing infrastructure for error handling
>> with automatically kick in. For example any higher-level try/catch
>> constructs will have an opportunity to catch the error.
>> Implementations generally report uncaught exceptions to an error log.
>> The browser will fire an 'error' event on the window which the page
>> can use for further logging. Firing an error event on the other hand
>> does not allow the browser to automatically log the error in a console
>> as the page hasn't yet gotten a chance to handle it.
>>
>> The advantage of 2 is that this is consistent with other error
>> conditions, such as writing duplicate keys, disk errors during writing
>> the database to disk, internal errors in the database, etc.
>>
>> While consistency, and only needing to check for errors one way, is
>> certainly good arguments, I would argue that people won't need to
>> check for calling-add-on-read-only-transactions. For properly written
>> code it's not an error that will occur, and thus there is no need to
>> check for it. In fact, you probably are generally better off letting
>> the exception bubble all the way up and get logged or caught by
>> generic error handlers.
>>
>> Additionally, the structured clone algorithm, which defines that an
>> exception should synchronously be thrown if the object is malformed,
>> for example if it consists of a cyclic graph. So .add/.put/.update can
>> already throw under certain circumstances.
>>
>> Also compare to if we were using a different API strategy of making
>> objectStores and cursors returned from READ_ONLY transactions not have
>> mutating functions. In this case if someone tried to call .put(), that
>> also would result in a exception from the JS interpreter stating that
>> you're calling a function that doesn't exist.
>>
>> So I would argue that we should throw for at least all transaction
>> violations. I.e. whenever you try to perform an action not allowed by
>> the current transaction. This would also cover the case of calling
>> createObjectStore/removeObjectStore/createIndex/removeIndex during a
>> non-setVersion-transaction.
>>
>>
>> There is also another case where synchronously know that an error will
>> be reported. We could throw when IDBCursor.update() is called when the
>> underlying object store uses in-line keys and the property at the key
>> path does not match the key in this cursor's position. In this case we
>> similarly immediately know that there is an error without having to
>> consult the database. We also generally can be sure that there is a
>> bug in the web page which would benefit from being reported like other
>> bugs are.
>>
>> And like stated above, IDBCursor.update() can already throw if the
>> passed in object can't be structurally cloned.
>>
>>
>> Jeremy previously asked if there was a test we could use to
>> clearly/intuitively break error conditions into two groups. Ones that
>> cause exceptions to be thrown, and ones that cause error events to be
>> fired. I would say that errors that do not depend on what data is in
>> the database, but rather are clearly due to errors at the call site
>> should throw an exception.
>
> This would limit us in the future in terms of schema changes.  The current
> async interface differs starting the transaction until the first call that
> accesses/modifies data (which are all async).  If we ever allow a schema
> change to happen without disconnecting all clients, it'd be possible that
> the objectStore could be deleted between when the call is made and when the
> transaction is actually allowed to start.

I'm not quite following here. Even if we in the future allow
objectStores to be deleted while there are transactions open against
them, .add/.put would still know whether we're inside a READ_ONLY or
READ_WRITE transaction, no? And so could still throw an error if we're
in a READ_ONLY transaction.

By the test defined above, .put would in that situation have to fire
an error event, rather than throw, even if properly called on a
READ_WRITE transaction where the objectStore had been deleted. This is
because we would have to check with the database whether the
objectStore was deleted, and thus would fail the "do not depend on
what data is in the database" check.

> This also will limit what can be done on a background thread.  For example,
> an implementation couldn't do serialization of the object on a background
> thread (yes, if you did this, you'd need to make sure the main thread didn't
> modify it until it finished serializing).

For what it's worth, HTML5 already defines that the structured clone
algorithm throws an exception, so that's not really something
introduced by me in this thread. I also think that the problem you
describe in parentheses effectively prevents you from doing background
serialization, at least without first copying so much data that you
could also perform the constraint checks at the same time.

> Because of these reasons, I'm not too excited about this particular
> heuristic for when to throw vs fire an error callback.  I've thought about
> it a bit and can't think of anything better though, unfortunately.
> I think I'm still slightly in favor of routing all errors through onerror
> callbacks and never throwing from a function that returns an IDBResult, but
> I think there were some good points brought up by Jonas for why throwing on
> some errors would make sense.

I think HTML5 already forces us to make .put/.add/.update throw in
certain circumstances. And I think the benefits that come with
exception handling outweigh the theoretical possibility of performing
the structured clone in a background thread.

/ Jonas

Jonas Sicking | 1 Jul 04:07 2010

[IndexedDB] .value of no-duplicate cursors

Hi All,

This was one issue we ran into while implementing IndexedDB. In the
code examples I'll use the mozilla proposed asynchronous APIs, but the
issue applies equally to the spec as it is now, as well as the
synchronous APIs.

Consider an objectStore containing the following objects:

{ id: 1, name: "foo", flags: ["hi", "low"] }
{ id: 2, name: "foo", flags: ["apple", "orange"] }
{ id: 3, name: "foo", flags: ["hello", "world"] }
{ id: 4, name: "bar", flags: ["fahrvergnügen"] }

And an index keyed on the "name" property. What should the following code alert?

results = [];
db.objectStore("myObjectStore").index("nameIndex").openCursor(null,
IDBCursor.NEXT_NO_DUPLICATE).onsuccess = function(e) {
  cursor = e.result;
  if (!cursor) {
    alert(results.length);
    alert(results);
    return; // done iterating; without this we'd dereference a null cursor
  }
  results.push(cursor.value);
  cursor.continue();
};

It's clear that the first alert would display '2', as there are 2
distinct 'name' values in the objectStore. However it's not clear what
the second alert would show. I.e. what would cursor.value be on each
'success' event firing?

We could define that it is one of the rows matching the distinct
value. In that case either "1,4", "2,4" or "3,4" would be valid values
for the second alert. If we choose that solution then ideally we
should define which one and make it consistent in all implementations.
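A toy simulation of the sample data makes the ambiguity concrete; the "first matching row" choice is just one of the valid answers the paragraph above mentions (nextNoDuplicate here is an illustrative stand-in, not the real cursor machinery):

```javascript
// Simulate a NEXT_NO_DUPLICATE walk over the "name" index: each distinct
// name is visited once, and under a "first row wins" policy the row
// returned for "foo" is the one with id 1 (index order puts "bar" first).

var rows = [
  { id: 1, name: "foo" },
  { id: 2, name: "foo" },
  { id: 3, name: "foo" },
  { id: 4, name: "bar" }
];

function nextNoDuplicate(rows, indexKey) {
  var sorted = rows.slice().sort(function (a, b) {
    return a[indexKey] < b[indexKey] ? -1 : a[indexKey] > b[indexKey] ? 1 : 0;
  });
  var out = [], seen = {};
  for (var i = 0; i < sorted.length; i++) {
    var k = sorted[i][indexKey];
    if (!seen[k]) { seen[k] = true; out.push(sorted[i]); } // first row wins
  }
  return out;
}

var results = nextNoDuplicate(rows, "name").map(function (r) { return r.id; });
console.log(results.length);    // 2 distinct names
console.log(results.join(",")); // "4,1" under the first-row choice
```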

Alternatively we could say that .value is null for all *_NO_DUPLICATE cursors.

The question equally applies if the above code used openObjectCursor
rather than openCursor. However if we define that .value is null for
*_NO_DUPLICATE cursors, then openObjectCursor with *_NO_DUPLICATE
doesn't make much sense in that it returns the same thing as
openCursor with *_NO_DUPLICATE.

I personally don't care much which solution we use. I'm unclear on
what the exact use cases for *_NO_DUPLICATE cursors are. However, if
we do say that .value should represent a particular row, then I think
we should define which row is returned.

/ Jonas

Brett Zamir | 1 Jul 04:08 2010

Re: Custom DOM events and privileged browser add-ons; Was: Bubbling/Capturing for XHR + other non-DOM objects

> On 6/29/10 2:36 PM, Chris Wilson wrote:
>> See, this is exactly why we asked the question - because it seems 
>> that behavior is inconsistent, we're not sure what the expectation is.
>
> Note that the Firefox behavior I described is irrelevant to 
> specification efforts, because it's not visible to web pages....

I would really like to know (along the lines of the changed thread 
title) why other browsers are not, as it appears, interested in making 
this part of a specification effort.

Although there might not yet be interest in making an official standard 
for browser add-ons at this point, I would think that in this one 
crucial area, browsers could consider giving extension authors and web 
developers a common means, regardless of browser, of communicating 
back and forth in the same "protocol" by using custom DOM events the 
way Firefox currently does. The ability for websites to communicate 
with add-ons, and in an extensible manner, is no small piece of 
functionality.
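The Firefox-style technique can be sketched with a shared event target; Node's built-in EventTarget stands in for a DOM node here, and the event name and .detail convention are illustrative assumptions, not a specified protocol:

```javascript
// The page and a privileged add-on agree on a custom event name and
// rendezvous on a shared target.

var rendezvous = new EventTarget(); // in a page this would be a DOM node

// the "add-on" side: listens for requests from the page
rendezvous.addEventListener("my-addon-request", function (e) {
  console.log("add-on saw: " + e.detail.query);
  e.detail.answer = "42 results"; // reply by mutating the shared detail
});

// the "page" side: fires the custom event and reads the reply
var ev = new Event("my-addon-request");
ev.detail = { query: "meetings this week" }; // CustomEvent#detail stand-in
rendezvous.dispatchEvent(ev);                // dispatch is synchronous
console.log(ev.detail.answer); // "42 results"
```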

Despite being an advocate of open source and open formats myself, I 
strongly disagree with the sentiment I have heard some express: that 
the ability to make custom formats within X/HTML, whether through 
namespaced elements and attributes or processing instructions (both 
regrettably disallowed by HTML at present, in my view) or through 
custom DOM events, only promotes proprietary formats.

On the contrary, I feel that such ability to experiment allows new 
standards to emerge which meet needs that HTML is not presently willing 
to implement.

Here are just a few use cases, though the list could really go on and on:
1) Allowing websites to interact with client-side chat clients for 
real-time collaboration by site visitors (as the Firefox extension 
SamePlace does)
2) Allowing websites to share and access each other's databases when 
permitted by the site (or even by the user alone)
     a) Adoption of more specific shared database formats in a specific 
genre like the ability to view, schedule, and edit meetings or events
3) Supporting the ability to query XML databases with powerful XQuery 
and update facilities through the browser.

It is great that the big browsers have settled on avoiding overly 
extreme competition via their own new markup, going through standards 
bodies instead, at least in cases like <video/> where there is no need 
for 1000 different ways to express the markup (even if a case could be 
made for allowing 1000 different namespaced attributes on the tag until 
standardization is complete). But smaller sites and special interests, 
if not bigger organizations as well, still need the ability to 
innovate. The web will clearly not settle for perpetually proprietary 
formats anyway, except where the standards have not yet caught up. And 
"proprietary" here is a relative term, since the underlying platform 
(DOM/XML) is itself standardized.

Better to provide the means for innovation in the first place, in a way 
which can be made to work cross-browser, than to shut the door and 
treat web innovators and users paternalistically, if not domineer them 
outright, by maintaining exclusive control in the name of their 
supposed interest. While it is great that web authors have not been 
precipitously forced to use XML, it is a pity, in my view, that the 
extensibility of XML has not been carried over and embraced, even if a 
co-editor of XML himself (reasonably) suggested avoiding creating new 
dialects where possible. Please give us at least one means of extending 
functionality beyond what you currently happen to support or want to 
support...

Brett

Krzysztof Maczyński | 1 Jul 04:08 2010

LCWD comments

Dear WebApps WG,

In this message I state my LC comments on the following documents:
[a] http://www.w3.org/TR/2009/WD-eventsource-20091222/
[b] http://www.w3.org/TR/2009/WD-webstorage-20091222/
[c] http://www.w3.org/TR/2009/WD-workers-20091222/

Some comments on a given draft apply to the drafts listed after it as well.

0. In [a]:
> The W3C Web Apps Working Group
It's either "WebApps" or "Web Applications" according to the charter.

1. [a] references HTML5, which is unlikely to go to Rec any time soon. What path do you envision following to
resolve this? In particular, do you agree that these bits are generic, not specific to HTML, and should therefore
be in separate specs and become Recs sooner (possibly at first with only a subset of what's
currently in the HTML5 draft, given the possibility of multiple reiterations)? (Reminder: the charters of
our groups call explicitly for aligning such issues.)

2. In [a]:
> HTTP 302 Found, 303 See Other, and 307 Temporary Redirect responses must cause the user agent to connect to
the new server-specified URL, but if the user agent needs to again request the resource at a later point, it
must return to the previously specified URL for this event source.
Does this cover only requests for a representation of the resource made by the same EventSource object, requests from the
same browsing context, or requests globally (as long as the UA remembers having requested it before)? Is this
consistent (especially for 303) with HTTP semantics?

3. In [a]:
> Any other HTTP response code not listed here, and any network error that prevents the HTTP connection from
being established in the first place (e.g. DNS errors), must cause the user agent to fail the connection.
I'm unsure whether this violates the semantics of some HTTP codes already defined. But it surely imposes
limits on what future codes may indicate, flying in the face of this extensibility point. All that this
spec should say about codes yet to be defined is that their semantics must be applied as
required for them.

4. text/event-stream uses text as the top-level type, which is inappropriate; it should be application. Entities of
type text must be, at least as a last resort, suitable for rendering to the user as text, which not every
sequence of octets is (text/html qualifies, but application/msword does not).

5. In [a]:
> formats of event framing defined by other applicable specifications may be supported
What is event framing? Additionally, are we going to have an "other applicable specifications"
extension point everywhere? For the moment it's just a beast dwelling in the HTML5 spec, tolerated until
it gets its extensibility story straight, but if it's going to affect other specs in this way, the TAG
should certainly have a look.

6. Section 10 of [a] seems to be something new in W3C's LCs. What is the story behind specifying requirements on
finalization (note that this name is better, since "garbage collection" suggests limiting this
behaviour to environments with a GC) and some rules stating when a spec should include them? Has there been
any architectural discussion about it?

7. In [b]:
> The term "JavaScript" is used to refer to ECMA262, rather than the official term ECMAScript, since the
term JavaScript is more widely known.
The term "JavaScript" already refers to a proprietary implementation of ECMAScript. A confusion should
not be entrenched further just because it's common. Clarity will be appreciated by intended readers of
the spec, of whom knowledge of this distinction can be safely assumed. The term is unused anyway.

8. In [b]:
> To mitigate this, pages can use SSL.
Please change to TLS which is the standard name.

9. [b] states more precise requirements on scripts running in the context of an HTMLDocument than
otherwise. Should some of them apply more widely?

10. In [c]:
> the MIME type of the script is ignored
This is a new spec. It shouldn't be plagued with such idiosyncratic legacy mechanisms.

11. In [c]:
> there is no way to override the type. It's always assumed to be JavaScript.
This is a violation of orthogonality for no real benefit.

12. In [c]:
> If there are any outstanding transactions that have callbacks
What's a transaction, when has it got a callback?

13. Why does [c] define ErrorEvent instead of reusing DOMError? Besides, it uses a misnamed attribute
"filename" and suboptimally spelled "lineno".

14. In [c]:
> Thus, scripts must be external files with the same scheme as the original page: you can't load a script from
a data: URL
Why impose this restriction? Is it that exceptions to the same-origin policy, while it's still new, would be
confusing for specifiers and the audience, so this possibility is postponed to the next version? In any
case, I suggest allowing workers to be instantiated with Function objects (whatever this may be in
language bindings, given positive resolution of 11) as well. Including workers directly inline seems
natural in many scenarios.
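To make the suggestion concrete, here is a minimal sketch of the inline-worker idea. The helper and its name are my own illustration, not anything in the draft under review; the blob-URL usage in the trailing comment assumes host APIs (Blob, URL.createObjectURL, Worker) that are not part of this spec.

```javascript
// Sketch (assumed pattern, not in the draft): serialize a function into
// a source string that could back an inline worker.
function inlineWorkerSource(fn) {
  // Wrap the function in an immediately-invoked expression so the
  // worker would start executing it on load.
  return "(" + fn.toString() + ")();";
}

// In a browser one might then do something like (assumed host APIs):
//   var url = URL.createObjectURL(new Blob([inlineWorkerSource(main)]));
//   var worker = new Worker(url);
```

This is essentially the workaround pattern for "workers directly inline": the function body travels as a string, so it must not close over outside variables.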

15. Why using URI fragments is not allowed when instantiating workers? If it remains so, the spec should say
that used URIs MUST be absolute (keeping unchanged the behaviour in the other case as error recovery).

16. In [c]:
> The DOM APIs (Node objects, Document  objects, etc) are not available to workers in this version of this specification.
I understand this as not being forbidden if an implementation provides such an extension. However, those
APIs would operate on nodes floating in the air, unrelated to any document possibly opened.

17. Isn't there already something like WorkerLocation specced?

Best regards,

Krzysztof Maczyński
Invited Expert, HTML WG

Jeremy Orlow | 1 Jul 05:16 2010

Re: [IndexedDB] Should .add/.put/.update throw when called in read-only transaction?

I've thought about this more and have some additional doubts inline.

On Thu, Jul 1, 2010 at 11:55 AM, Jonas Sicking <jonas <at> sicking.cc> wrote:
On Wed, Jun 30, 2010 at 6:42 PM, Jeremy Orlow <jorlow <at> chromium.org> wrote:
> On Thu, Jul 1, 2010 at 11:17 AM, Jonas Sicking <jonas <at> sicking.cc> wrote:
>>
>> Hi All,
>>
>> Currently the IndexedDB specification is silent on what should happen
>> if IDBObjectStore.add, IDBObjectStore.put, IDBObjectStore.remove,
>> IDBCursor.update or IDBCursor.remove() is called from a READ_ONLY
>> transaction. There are two possible ways we can handle this:
>>
>> 1. We can throw an exception.
>> 2. We can return a IDBRequest object and asynchronously fire a 'error'
>> event on this object.
>>
>> The advantage of 1 is that we pretty much know that this was an error
>> due to a bug in the web page,

I don't see why this is compelling.  Many of the other errors that you still propose we fire via the callback are due to bugs in the page.
 
>> and we can always know this
>> synchronously without having to consult the database.

So?  Just because we can doesn't mean we should.
 
>> Throwing an
>> error means that all the existing infrastructure for error handling
>> will automatically kick in. For example any higher-level try/catch
>> constructs will have an opportunity to catch the error.
>> Implementations generally report uncaught exceptions to an error log.
>> The browser will fire an 'error' event on the window which the page
>> can use for further logging. Firing an error event on the other hand
>> does not allow the browser to automatically log the error in a console
>> as the page hasn't yet gotten a chance to handle it.

Sure, but this doesn't help the majority of error conditions in IndexedDB.  It also ignores the cost of handling errors in 2 different ways.
 
>> The advantage of 2 is that this is consistent with other error
>> conditions, such as writing duplicate keys, disk errors during writing
>> the database to disk, internal errors in the database, etc.

The other problem is that users then need 2 sets of error handling routines for each call.  Given how difficult it is to get web developers to do any error checking, requiring 2 types of checks seems like a big downside.
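To illustrate the cost being debated, here is a minimal sketch (my own helper, not part of any proposal) of what an application would need in order to funnel both error styles into one callback. `fn` is assumed either to throw synchronously (e.g. a READ_ONLY violation under option 1) or to return an IDBRequest-like object that later fires onsuccess/onerror.

```javascript
// Sketch: unify synchronous exceptions and asynchronous error events.
function withRequest(fn, onsuccess, onerror) {
  var req;
  try {
    req = fn();
  } catch (e) {
    onerror(e); // synchronous exception path (option 1)
    return;
  }
  req.onsuccess = onsuccess; // asynchronous paths (option 2)
  req.onerror = onerror;
}
```

The point either way: without such a wrapper, every call site carries both a try/catch and an onerror handler.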
 
>> While consistency, and only needing to check for errors one way, is
>> certainly good arguments, I would argue that people won't need to
>> check for calling-add-on-read-only-transactions. For properly written
>> code it's not an error that will occur, and thus there is no need to
>> check for it. In fact, you probably are generally better off letting
>> the exception bubble all the way up and get logged or caught by
>> generic error handlers.

These are awfully bold assumptions.  Simply not catching the error is not an option for many web applications or libraries.  At very least, they'll need to add finally statements to handle such cases.

>> Additionally, the structured clone algorithm defines that an
>> exception should synchronously be thrown if the object is malformed,
>> for example if it consists of a cyclic graph. So .add/.put/.update can
>> already throw under certain circumstances.
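The synchronous-throw behaviour can be sketched with the `structuredClone` global that modern engines expose (in 2010 the algorithm was only invoked internally, so this is an illustration rather than period-accurate API; note also that current engines clone cyclic graphs successfully, and it is values such as functions that reliably trigger the throw):

```javascript
// Sketch: structured clone throws synchronously on unclonable values.
function canClone(value) {
  try {
    structuredClone(value); // global in modern browsers and Node 17+
    return true;
  } catch (e) {
    return false; // e.g. DataCloneError for a function-valued property
  }
}

console.log(canClone({ id: 1, name: "foo" })); // plain data is clonable
console.log(canClone({ cb: function () {} })); // a function is not
```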
>>
>> Also compare to if we were using a different API strategy of making
>> objectStores and cursors returned from READ_ONLY transactions not have
>> mutating functions. In this case if someone tried to call .put(), that
>> also would result in an exception from the JS interpreter stating that
>> you're calling a function that doesn't exist.
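The alternative API strategy falls out of the language for free: calling a method that was simply never defined on the read-only object throws a TypeError. A minimal sketch (the object shape here is my own illustration, not spec API):

```javascript
// Sketch: a "read-only" store object that simply lacks mutating methods.
var readOnlyStore = {
  get: function (key) { return "value-for-" + key; }
  // no put/add/remove defined at all
};

function tryPut(store, value) {
  try {
    store.put(value); // TypeError: store.put is not a function
    return "ok";
  } catch (e) {
    return e.name; // the JS interpreter reports this without any spec help
  }
}
```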
>>
>> So I would argue that we should throw for at least all transaction
>> violations. I.e. whenever you try to perform an action not allowed by
>> the current transaction. This would also cover the case of calling
>> createObjectStore/removeObjectStore/createIndex/removeIndex during a
>> non-setVersion-transaction.
>>
>>
>> There is also another case where we synchronously know that an error will
>> be reported. We could throw when IDBCursor.update() is called when the
>> underlying object store uses in-line keys and the property at the key
>> path does not match the key in this cursor's position. In this case we
>> similarly immediately know that there is an error without having to
>> consult the database. We also generally can be sure that there is a
>> bug in the web page which would benefit from being reported like other
>> bugs are.
>>
>> And like stated above, IDBCursor.update() can already throw if the
>> passed in object can't be structurally cloned.
>>
>>
>> Jeremy previously asked if there was a test we could use to
>> clearly/intuitively break error conditions into two groups. Ones that
>> cause exceptions to be thrown, and ones that cause error events to be
>> fired. I would say that errors that do not depend on what data is in
>> the database, but rather are clearly due to errors at the call site
>> should throw an exception.
>
> This would limit us in the future in terms of schema changes.  The current
> async interface defers starting the transaction until the first call that
> accesses/modifies data (which are all async).  If we ever allow a schema
> change to happen without disconnecting all clients, it'd be possible that
> the objectStore could be deleted between when the call is made and when the
> transaction is actually allowed to start.

I'm not quite following here. Even if we in the future allow
objectStores to be deleted while there are transactions open against
it, then .add/.put would still know if we're inside a READ_ONLY or
READ_WRITE transaction, no? And so could still throw an error if we're
in a READ_ONLY transaction.

By the test defined above, .put would in that situation have to fire
an error event, rather than throw, if properly called on a READ_WRITE
transaction, but where the objectStore had been deleted. This because
we would have to check with the database if the objectStore was
deleted and thus would fail the "do not depend on what data is in the
database" check.

I don't see how the existence of an objectStore can be considered "data".  Sure an objectStore _contains_ data and that data would no longer be accessible, but I don't see how the existence can be considered anything other than meta data.

> This also will limit what can be done on a background thread.  For example,
> an implementation couldn't do serialization of the object on a background
> thread (yes, if you did this, you'd need to make sure the main thread didn't
> modify it until it finished serializing).

For what it's worth, HTML5 already defines that the structured clone
algorithm throws an exception, so that's not really something
introduced by me in this thread.

Is there any technical reason this has to throw rather than raising the error through the callback?  I don't see one.  We already made the change to WebKit so that we can opt out of throwing on a serialization error.
 
I also think that the problem you
describe in parentheses effectively prevents you from doing background
serialization, at least without first copying so much data that you
could also perform the constraints checks at the same time.

Some implementations might be able to copy on write efficiently.  You could also put a lock around the data, lock it before main thread execution begins again, and have the main thread take the lock before modifying the data.

This was also just one example of processing that could be done in the background.  And even if none of them are very compelling, keep in mind that we're backing ourselves into a corner in terms of what we can add in the future (without breaking this heuristic).
 
> Because of these reasons, I'm not too excited about this particular
> heuristic for when to throw vs fire an error callback.  I've thought about
> it a bit and can't think of anything better though, unfortunately.
> I think I'm still slightly in favor of routing all errors through onerror
> callbacks and never throwing from a function that returns an IDBResult, but
> I think there were some good points brought up by Jonas for why throwing on
> some errors would make sense.

I think HTML5 already forces us to make .put/.add/.update throw in
certain circumstances. And I think the benefits that come with
exception handling outweigh the theoretical possibility of performing
the structured clone in a background thread.

As I mentioned earlier in the thread, these were examples of small problems in the current spec that I expect will really come back to bite us in the future as we try to add more complex features.  And I'm not convinced HTML5 forces us to do anything.

After thinking more about this, I'm not totally against this proposed change, but I also don't find the arguments in its favor very compelling.

J
Jeremy Orlow | 1 Jul 07:42 2010

Re: [IndexedDB] .value of no-duplicate cursors

On Thu, Jul 1, 2010 at 12:07 PM, Jonas Sicking <jonas <at> sicking.cc> wrote:

Hi All,

This was one issue we ran into while implementing IndexedDB. In the
code examples I'll use the mozilla proposed asynchronous APIs, but the
issue applies equally to the spec as it is now, as well as the
synchronous APIs.

Consider an objectStore containing the following objects:

{ id: 1, name: "foo", flags: ["hi", "low"] }
{ id: 2, name: "foo", flags: ["apple", "orange"] }
{ id: 3, name: "foo", flags: ["hello", "world"] }
{ id: 4, name: "bar", flags: ["fahrvergnügen"] }

And an index keyed on the "name" property. What should the following code alert?

results = [];
db.objectStore("myObjectStore").index("nameIndex").openCursor(null,
IDBCursor.NEXT_NO_DUPLICATE).onsuccess = function(e) {
  cursor = e.result;
  if (!cursor) {
    alert(results.length);
    alert(results);
    return; // without this, the lines below would throw on the null cursor
  }
  results.push(cursor.value);
  cursor.continue();
};

It's clear that the first alert would display '2', as there are 2
distinct 'name' values in the objectStore. However it's not clear what
the second alert would show. I.e. what would cursor.value be on each
'success' event firing?

We could define that it is one of the rows matching the distinct
value. In that case either "1,4", "2,4" or "3,4" would be valid values
for the second alert. If we choose that solution then ideally we
should define which one and make it consistent in all implementations.

Alternatively we could say that .value is null for all *_NO_DUPLICATE cursors.

The question equally applies if the above code used openObjectCursor
rather than openCursor. However if we define that .value is null for
*_NO_DUPLICATE cursors, then openObjectCursor with *_NO_DUPLICATE
doesn't make much sense in that it returns the same thing as
openCursor with *_NO_DUPLICATE.

I personally don't care much which solution we use. I'm unclear
on what the exact use cases are for *_NO_DUPLICATE cursors.

This is a very good point.  What are the use cases?  After all, you can easily emulate such a cursor yourself.  Unless there are some compelling use cases, I'd be happy to just get rid of it.
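The emulation can be sketched over plain data. Here `visited` stands in for the (key, value) pairs an ordinary index cursor would visit in key order for the objectStore in the example; the function name and shapes are my own illustration, not spec API:

```javascript
// Sketch: keep only the first value seen for each distinct index key,
// i.e. what a NEXT_NO_DUPLICATE cursor returning "the first row per key"
// would yield.
function firstPerKey(entries) {
  var out = [];
  var lastKey;
  var haveKey = false;
  for (var i = 0; i < entries.length; i++) {
    var e = entries[i];
    if (!haveKey || e.key !== lastKey) {
      out.push(e.value);
      lastKey = e.key;
      haveKey = true;
    }
  }
  return out;
}

// The example objectStore, visited in "name" index order:
var visited = [
  { key: "bar", value: 4 },
  { key: "foo", value: 1 },
  { key: "foo", value: 2 },
  { key: "foo", value: 3 }
];
```

Since "bar" sorts before "foo", `firstPerKey(visited)` yields two values here: the first row for each of the two distinct names.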
 
However if
we do say that .value should represent a particular row, then I think
we should define which row is returned.

Agreed that it should be deterministic.  I'm fine with null, the first value, or the last value.  If we do null, then I think calling openObjectCursor with *_NO_DUPLICATE should be an error.  It seems like there'd be some value in returning the first or last value, though since an app might know that all entries in some particular object store with some particular key will share some other property in common.  (Of course, in this case, it'd probably be better for the application to normalize the data into multiple objectStores, but given that joins will be a pain and doing so might be overkill, it still seems like a use case worth supporting.  Especially since it's little additional effort on our part.)  If it's between first and last, I'd just as soon say that we return the first value associated with each key.  ...assuming we don't just get rid of this feature for v1.

J
