Adam Roach | 5 Apr 2012 17:39

Difficulty using DocumentInfo

At the most recent code sprint, I ran into some difficulty with the 
DocumentInfo class (from ietf/doc/models.py).

As originally envisaged, this model looked well designed. It contained 
all the information you might want to know about a document. In 
particular, if one makes certain assumptions about what is meant by 
field names, it contained both a local filesystem path to the document 
(from "get_file_path") and a handy external URL that could presumably be 
used to access the document from anywhere on the Internet (the field 
"external_url").

The latter is very important for generating links to documents once one 
has retrieved them using whatever query is appropriate for the situation.

However, what has ended up being populated in the "external_url" field 
is emphatically not a URL, and is highly frustrating to use.

For example, if I pull the agenda document for the MMUSIC meeting in 
Paris and check the "external_url" field, I get 
"/proceedings/83/agenda/agenda-83-mmusic.htm". I'm not as bothered that 
it's missing a scheme and an authority as I am that I can't get to a 
document at that path on datatracker. For example, if I use an 
href="/proceedings/83/agenda/agenda-83-mmusic.htm" (which the web 
browser ends up resolving to 
http://datatracker.ietf.org/proceedings/83/agenda/agenda-83-mmusic.htm), 
I get a 404 error.
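
To make the resolution step concrete, here is a minimal sketch of how a path-only reference is combined with the URL of the page that contains it. The page URL is made up for illustration; only its scheme and host matter.

```python
from urllib.parse import urljoin

# Hypothetical page holding the link; only the scheme and host are relevant.
page_url = "http://datatracker.ietf.org/meeting/83/materials.html"

# Value found in the external_url field for the MMUSIC agenda.
external_url = "/proceedings/83/agenda/agenda-83-mmusic.htm"

# A reference starting with "/" replaces the whole path of the base URL.
resolved = urljoin(page_url, external_url)
print(resolved)
```

The result is the datatracker URL quoted above, which is the one that returns the 404.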

We really need to fix this up. Either we need to add URL patterns to the 
datatracker tree that successfully resolve the URLs in this field (which would 
require a new pattern for each document type), or we need to repopulate 

Adam Roach | 5 Apr 2012 19:50

Re: Difficulty using DocumentInfo

On 4/5/12 12:11 PM, Ryan Cross wrote:
> The method get_absolute_url() should return the information you are looking for.
>

Actually, I *did* manage to find that method (even though it's on 
Document instead of DocumentInfo). The problem is that it does not work 
-- the resulting URLs don't resolve, at least not for agendas. 
get_absolute_url uses the following model for building agenda path names:

                 url = '%s/proceedings/%s/%s/%s' % (
                     settings.MEDIA_URL, meeting.number, self.type_id, filename)

It would appear that the proper form for agendas is:

                 url = '%s/meeting/%s/agenda/%s-agenda/' % (
                     settings.MEDIA_URL, meeting.number, session.group.acronym)

And then there are different forms for minutes and slides.

So we're back to the question about whether we should fix 
get_absolute_url() to match ietf/meeting/urls.py or fix 
ietf/meeting/urls.py to match get_absolute_url().
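
For comparison, here is a rough sketch of what the two forms produce for the MMUSIC agenda. The MEDIA_URL value and the two function names are assumptions made for illustration; they are not taken from the datatracker code.

```python
# Assumed value; the real settings.MEDIA_URL may differ.
MEDIA_URL = "http://datatracker.ietf.org"


def url_as_built_today(meeting_number, type_id, filename):
    """Form currently used by get_absolute_url() for meeting materials."""
    return "%s/proceedings/%s/%s/%s" % (MEDIA_URL, meeting_number, type_id, filename)


def url_matching_meeting_urls(meeting_number, acronym):
    """Form that appears to match the agenda pattern in ietf/meeting/urls.py."""
    return "%s/meeting/%s/agenda/%s-agenda/" % (MEDIA_URL, meeting_number, acronym)


current = url_as_built_today("83", "agenda", "agenda-83-mmusic.htm")
wanted = url_matching_meeting_urls("83", "mmusic")
print(current)
print(wanted)
```

The two results share nothing past the host name, so one side or the other has to change.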

/a
Adam Roach | 5 Apr 2012 19:51

Fwd: Re: Difficulty using DocumentInfo

Forwarding back to the tools mailing list, since we need to come to some kind of consensus on this.

/a

-------- Original Message --------
Subject: Re: [Tools-discuss] Difficulty using DocumentInfo
Date: Thu, 5 Apr 2012 10:48:04 -0700
From: Ryan Cross <rcross <at> amsl.com>
To: Adam Roach <adam <at> nostrum.com>

Yes, I agree the field name is inconsistent with how it's used and it should be fixed. "filename" would be better. I'll have to defer to Henrik to respond to model change requests (or he may have another solution).

On Apr 5, 2012, at 10:33 AM, Adam Roach wrote:

> On 4/5/12 12:11 PM, Ryan Cross wrote:
>> Hi Adam,
>>
>> You are correct that meeting material documents (slides, agenda, and minutes) contain just a filename not a qualified URL in the external_url field.
>
> Which is a fine bit of data to store, but -- um -- look at the field name and tell me how that makes any sense. This really makes what is otherwise a nice clean model difficult for beginners to get into.
>
>> When converting the Meeting Material Manager tool I followed the precedent of the migration in the use of this field because we needed a place to store the disk filename 1) because it might differ from the Document.name and 2) to save the file extension.
>
> Can we rename the field -- perhaps to something like "filename" -- before too much code gets written around it? This is the kind of crazy legacy hack we were trying to get away from with the migration.
>
>> The method get_absolute_url() should return the information you are looking for.
>
> Aha! Yes, that looks exactly like the option (c) I describe below. I'm a bit perplexed about why it is present on Document rather than DocumentInfo. It never even occurred to me to look there.
>
> /a
>
_______________________________________________
Tools-discuss mailing list
Tools-discuss <at> ietf.org
https://www.ietf.org/mailman/listinfo/tools-discuss
Adam Roach | 5 Apr 2012 20:02

Re: Fwd: Re: Difficulty using DocumentInfo

So, thinking about Ryan's involvement, I just realized something about 
this issue that makes it far more complicated than it has any right to be.

Ryan is working in the same database as we are, using the same models. 
But his stuff deploys on www.ietf.org

The code I work on deploys on datatracker.ietf.org.

Other stuff, using the same model, deploys on tools.ietf.org.

And then there's a pantheon of esoteric, one-off hosts like pub.ietf.org 
that presumably use the same data model.

None of these hosts have a harmonized URL scheme for accessing documents.

So there's no one method that can be added to the model that works for 
everyone.

We can address this partially by making sure that the main three 
authorities {www,tools,datatracker}.ietf.org use the same path scheme 
for accessing documents. I'm happy to make the changes to datatracker, 
but I *do* note that datatracker was recently changed *away* from using 
the same format as www -- so I want to get a "go ahead" from Henrik 
before I arguably undo someone else's work.

The alternative would be to add something ugly like get_www_url(), 
get_datatracker_url(), get_tools_url(), etc to the DocumentInfo class.

All of those feel like short-term hacks, though. I don't know what to 
do long-term, but the current situation is difficult and will be tricky 
to maintain.

/a
Henrik Levkowetz | 5 Apr 2012 23:18

Re: Difficulty using DocumentInfo

Hi Adam,

On 2012-04-05 17:39 Adam Roach said the following:

<snip very complete presentation of problem>

The thing is, the 'external_url' attribute was not intended to be used
as a way to get the url of a document like a draft, agenda, charter,
etc.; which all reside at well-known paths in the server file-system;
it was meant as a way to record the url at which an _external_ document
(not residing on the datatracker server) could be found.

The conversion code has not adhered to this; which means that I failed
to communicate my intentions for this field.

It also means that both (b) and (c) below should be carried out -- for
(b) this means blanking out this field for docs with known locations.

> (a) Fixing the URL patterns so that the values in 
> DocumentInfo.external_url work
> (b) Fixing the values in DocumentInfo.external_url so they match the 
> existing URL patterns, or
> (c) Adding a new method to DocumentInfo that returns a value that can 
> always be used in an href (and similar constructs); e.g. get_href()
> 
> I can do (a) or (c) myself -- (b) would require that Henrik (or someone 
> with similar access to the database) perform the fix-up.

For (c) I would suggest adding a dictionary to settings.py which
took the document prefix (the result of document.name.split('-',1)[0])
as a key, and returned a path format string, which would use a dictionary
of values relevant to the document to fill in the variable spots.

I'd use PEP 3101 format strings rather than the default %-formatting
in order to be able to express such things as zero-filled 2-digit
revision numbers (see http://www.python.org/dev/peps/pep-3101/)

You could have for instance

DOC_PATHS = {
	...
	"draft": "http://www.ietf.org/id/{doc.name}-{doc.rev:02}.txt",
	...
}

to pick a simple example.

The get_href() would then pick the appropriate format string from the
DOC_PATHS setting, and use .format(doc=doc) on it to fill in the blanks
-- *or*, if there were no pattern for the document prefix, return the
doc.external_url if set.
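
A minimal sketch of such a get_href() might look like the following. The Doc class is a hypothetical stand-in for the real model, rev is assumed to be an integer so that the :02 format spec zero-fills it, and only the "draft" pattern comes from this thread.

```python
from dataclasses import dataclass

# Stand-in for the DOC_PATHS entry in settings.py.
DOC_PATHS = {
    "draft": "http://www.ietf.org/id/{doc.name}-{doc.rev:02}.txt",
}


@dataclass
class Doc:
    """Hypothetical stand-in for the Document model."""
    name: str
    rev: int = 0            # assumed to be an int; :02 would not zero-fill a string
    external_url: str = ""


def get_href(doc):
    """Return a URL for doc, or None if nothing is known about its location."""
    prefix = doc.name.split("-", 1)[0]
    pattern = DOC_PATHS.get(prefix)
    if pattern is not None:
        # The pattern refers to "doc" by name, so it is passed as a keyword.
        return pattern.format(doc=doc)
    # No pattern for this prefix: fall back to external_url if it is set.
    return doc.external_url or None


draft = Doc(name="draft-ietf-mmusic-example", rev=3)
external = Doc(name="liaison-example", external_url="http://example.org/liaison.pdf")
print(get_href(draft))
print(get_href(external))
```

Keeping the patterns in settings means a new document type needs one more dictionary entry rather than a new branch in the model code.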

Some utility functions would probably be needed in order to make it
easy to get things such as meeting and group given a document.

(We probably should have another method to get the path of a document
on the local file system.  That could probably be defined in terms of
get_url() together with an url lookup (or it could be done the other
way around, of course)).

Best regards,

	Henrik
Henrik Levkowetz | 5 Apr 2012 23:22

Re: Fwd: Re: Difficulty using DocumentInfo

Hi Adam, Ryan,

See the longer email I just sent in reply to Adam's first post -- the
use of the external_url field doesn't match the original intentions
for it, and sorting this out will need some cleanup.  We still will
need a way to get at both url and filename for a document, but the
external_url field was *only* intended to provide that for documents
which don't reside on the local file system.

Again, I consider the current mix-up a failure on my part to make clear
the intentions for that field.

Best regards,

	Henrik

On 2012-04-05 19:51 Adam Roach said the following:
> Forwarding back to the tools mailing list, since we need to come to some 
> kind of consensus on this.
> 
> /a
> 
> -------- Original Message --------
> Subject: 	Re: [Tools-discuss] Difficulty using DocumentInfo
> Date: 	Thu, 5 Apr 2012 10:48:04 -0700
> From: 	Ryan Cross <rcross <at> amsl.com>
> To: 	Adam Roach <adam <at> nostrum.com>
> 
> 
> 
> Yes, I agree the field name is inconsistent with how it's used and it should be fixed.  "filename" would be better.  I'll have to defer to Henrik to respond to model change requests (or he may have another solution).
> 
> On Apr 5, 2012, at 10:33 AM, Adam Roach wrote:
> 
>>  On 4/5/12 12:11 PM, Ryan Cross wrote:
>>>  Hi Adam,
>>>
>>>  You are correct that meeting material documents (slides, agenda, and minutes) contain just a filename not a qualified URL in the external_url field.
>>
>>  Which is a fine bit of data to store, but -- um -- look at the field name and tell me how that makes any sense. This really makes what is otherwise a nice clean model difficult for beginners to get into.
>>
>>>  When converting the Meeting Material Manager tool I followed the precedent of the migration in the use of this field because we needed a place to store the disk filename 1) because it might differ from the Document.name and 2) to save the file extension.
>>
>>  Can we rename the field -- perhaps to something like "filename" -- before too much code gets written around it? This is the kind of crazy legacy hack we were trying to get away from with the migration.
>>
>>>    The method get_absolute_url() should return the information you are looking for.
>>
>>  Aha! Yes, that looks exactly like the option (c) I describe below. I'm a bit perplexed about why it is present on Document rather than DocumentInfo. It never even occurred to me to look there.
>>
>>  /a
>>
Henrik Levkowetz | 5 Apr 2012 23:29

Re: Fwd: Re: Difficulty using DocumentInfo

Hi Adam,

On 2012-04-05 20:02 Adam Roach said the following:
> So, thinking about Ryan's involvement, I just realized something about 
> this issue that makes it far more complicated than it has any right to be.
> 
> Ryan is working in the same database as we are, using the same models. 
> But his stuff deploys on www.ietf.org
> 
> The code I work on deploys on datatracker.ietf.org.
> 
> Other stuff, using the same model, deploys on tools.ietf.org.
> 
> And then there's a pantheon of esoteric, one-off hosts like pub.ietf.org 
> that presumably use the same data model.
> 
> None of these hosts have a harmonized URL scheme for accessing documents.

However, there presumably is a canonical location for all documents we
have.  (Which I see, as I read on, is something you propose further down)

But we maybe don't have to go there -- if we define get_path() and get_url()
methods for documents which use DOC_PATH and DOC_URL dictionaries in 
settings.py (which will be different for the different deployments) then
things should still work the same in all deployments.
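
Roughly, and only as a sketch: the two settings dictionaries below stand for the settings.py of two different deployments, and every pattern and file-system path in them is invented for illustration.

```python
# Hypothetical per-deployment settings; each host would carry its own copy.
DATATRACKER = {
    "DOC_URL": {"agenda": "http://datatracker.ietf.org/meeting/{meeting}/agenda/{group}-agenda/"},
    "DOC_PATH": {"agenda": "/srv/datatracker/proceedings/{meeting}/agenda/{filename}"},
}
WWW = {
    "DOC_URL": {"agenda": "http://www.ietf.org/proceedings/{meeting}/agenda/{filename}"},
    "DOC_PATH": {"agenda": "/srv/www/proceedings/{meeting}/agenda/{filename}"},
}


def _lookup(settings, table, values):
    """Pick the pattern for the document prefix and fill in the values."""
    prefix = values["name"].split("-", 1)[0]
    return settings[table][prefix].format(**values)


def get_url(settings, values):
    return _lookup(settings, "DOC_URL", values)


def get_path(settings, values):
    return _lookup(settings, "DOC_PATH", values)


agenda = {
    "name": "agenda-83-mmusic",
    "meeting": "83",
    "group": "mmusic",
    "filename": "agenda-83-mmusic.htm",
}
print(get_url(DATATRACKER, agenda))
print(get_url(WWW, agenda))
print(get_path(WWW, agenda))
```

The calling code is identical on both hosts; only the dictionaries differ.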

> So there's no one method that can be added to the model that works for 
> everyone.

I think my proposal above would work, nevertheless...

> We can address this partially by making sure that the main three 
> authorities {www,tools,datatracker}.ietf.org use the same path scheme 
> for accessing documents. I'm happy to make the changes to datatracker, 
> but I *do* note that datatracker was recently changed *away* from using 
> the same format as www -- so I want to get a "go ahead" from Henrik 
> before I arguably undo someone else's work.
> 
> The alternative would be to add something ugly like get_www_url(), 
> get_datatracker_url(), get_tools_url(), etc to the DocumentInfo class.
> 
> All of those feel like short-term hacks, though. I don't know what to 
> do long-term, but the current situation is difficult and will be tricky 
> to maintain.

Let me know what you think of my proposal in my reply to your first note.

Best regards,

	Henrik
Adam Roach | 5 Apr 2012 23:50

Re: Fwd: Re: Difficulty using DocumentInfo

On 4/5/12 4:29 PM, Henrik Levkowetz wrote:
> However, there presumably is a canonical location for all documents we
> have.  (Which I see, as I read on, is something you propose further down)

Pretty much, yes. But to be clear: I'd like to have the ability to get 
to documents via the same authority as my page (currently, 
datatracker.ietf.org for all the things I work on). For example, in 
order to resize iframes based on their contents, I need to be able to 
get to their DOM -- which I can do only if they are from the same 
origin. So the idea is that the same path gets you to the same document 
on multiple hosts.
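
As a rough illustration of the constraint (this approximates the browser rule in Python and is not browser code): two URLs share an origin only when scheme, host, and port all match. The helper names are made up.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a, b):
    return origin(a) == origin(b)


page = "http://datatracker.ietf.org/meeting/83/"
on_datatracker = "http://datatracker.ietf.org/proceedings/83/agenda/agenda-83-mmusic.htm"
on_www = "http://www.ietf.org/proceedings/83/agenda/agenda-83-mmusic.htm"
print(same_origin(page, on_datatracker))
print(same_origin(page, on_www))
```

Only the first document could have its DOM inspected from a datatracker page.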

/a
Henrik Levkowetz | 6 Apr 2012 12:56

Re: Fwd: Re: Difficulty using DocumentInfo

Hi Adam,

On 2012-04-05 23:50 Adam Roach said the following:
> On 4/5/12 4:29 PM, Henrik Levkowetz wrote:
>> However, there presumably is a canonical location for all documents we
>> have.  (Which I see, as I read on, is something you propose further down)
> 
> Pretty much, yes. But to be clear: I'd like to have the ability to get 
> to documents via the same authority as my page (currently, 
> datatracker.ietf.org for all the things I work on). For example, in 
> order to resize iframes based on their contents, I need to be able to 
> get to their DOM -- which I can do only if they are from the same 
> origin. So the idea is that the same path gets you to the same document 
> on multiple hosts.

Making this part of settings.py (through a per-document-prefix dictionary)
as I proposed in my first response to you should make the same code work
everywhere.

I do wonder, however, if maybe there are different use cases which need
to be accommodated (possibly with different get_*_url() methods) -- there's
your use case, which prefers same-host urls, and there's another where one
may explicitly prefer to point to the authoritative or canonical location
at which to retrieve a document.

Best regards,

	Henrik
Tony Hansen | 6 Apr 2012 16:26

Re: Fwd: Re: Difficulty using DocumentInfo

Are you suggesting that settings.py should have different 
per-document-prefix dictionaries for the different machines that you're 
running on?

If so, that's wrong: it needs to be per-document-prefix dictionaries for 
the different machines that you're publishing the path for.

Another strawman, instead of get_*_url(), perhaps we need something like:

get_url('datatracker', 'agendas', '99/whatever')
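
To flesh out the strawman a little: the table is keyed first by the host the URL is being published for, then by document type. Both patterns below are invented placeholders, not real datatracker or www paths.

```python
# Hypothetical patterns, keyed by publishing host and then by document type.
PUBLISHED_URLS = {
    "datatracker": {"agendas": "http://datatracker.ietf.org/meeting/agenda/{path}"},
    "www": {"agendas": "http://www.ietf.org/proceedings/agenda/{path}"},
}


def get_url(host, doc_type, path):
    """Build the URL for a document as published on the named host."""
    try:
        pattern = PUBLISHED_URLS[host][doc_type]
    except KeyError:
        raise LookupError("no URL pattern for %r on %r" % (doc_type, host))
    return pattern.format(path=path)


print(get_url('datatracker', 'agendas', '99/whatever'))
print(get_url('www', 'agendas', '99/whatever'))
```

The same table can be shipped to every machine, since the host is an argument rather than something implied by where the code runs.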

     Tony Hansen

On 4/6/2012 6:56 AM, Henrik Levkowetz wrote:
>> Pretty much, yes. But to be clear: I'd like to have the ability to get
>> to documents via the same authority as my page (currently,
>> datatracker.ietf.org for all the things I work on). For example, in
>> order to resize iframes based on their contents, I need to be able to
>> get to their DOM -- which I can do only if they are from the same
>> origin. So the idea is that the same path gets you to the same document
>> on multiple hosts.
>
> Making this part of settings.py (through a per-document-prefix dictionary)
> as I proposed in my first response to you should make the same code work
> everywhere.
>
> I do wonder, however, if maybe there are different use cases which need
> to be accommodated (possibly with different get_*_url() methods) -- there's
> your use case, which prefers same-host urls, and there's another where one
> may explicitly prefer to point to the authoritative or canonical location
> at which to retrieve a document.
