Re: Matching metrics (was: Registry in record-jar format)
John Cowan <jcowan <at> reutershealth.com>
2005-04-06 02:04:33 GMT
Frank Ellermann scripsit:
> I've added 0 for '*' = '*' and for no match. Otherwise 8/4/2/1,
> is this what you wanted? Your metric apparently has a
> problem with en-Latn-US-scouse:
I'm not clear on whether * = * should count as a match or a no-match.
Originally I thought it should count as a match, but perhaps not.
> If one side wants en-GB-scouse, and the other side offers
> en-Latn-US-scouse (9) or en-Latn-GB (10), and it also has
> en-Brai-GB-scouse (11), then en-Brai-GB-scouse "wins". All
> in the 2nd column for en-GB-scouse.
Fortunately en-US-scouse doesn't exist.
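
For readers following the arithmetic, here is a minimal sketch of the
8/4/2/1 scoring as it can be inferred from the numbers quoted above. It
assumes the weights apply to language, script, region and variant in that
order, and that '*' against anything (including '*') scores 0. The names
parse_tag and score, and the subtag-length heuristic used to split a tag,
are illustrative assumptions, not anything posted in the thread.

```python
# Weights per field, as inferred from the quoted scores (an assumption).
WEIGHTS = {"language": 8, "script": 4, "region": 2, "variant": 1}


def parse_tag(tag):
    """Split a tag into four fields; absent fields become '*'.

    Heuristic (assumed for this sketch): a 4-letter subtag is a script,
    a 2-letter or 3-digit subtag is a region, anything else is a variant.
    """
    parts = tag.split("-")
    fields = {"language": parts[0].lower(), "script": "*",
              "region": "*", "variant": "*"}
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            fields["script"] = sub.lower()
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            fields["region"] = sub.lower()
        else:
            fields["variant"] = sub.lower()
    return fields


def score(wanted, offered):
    """Sum the weights of fields that match; '*' never counts as a match."""
    w, o = parse_tag(wanted), parse_tag(offered)
    return sum(weight for field, weight in WEIGHTS.items()
               if w[field] != "*" and w[field] == o[field])


for offer in ("en-Latn-US-scouse", "en-Latn-GB", "en-Brai-GB-scouse"):
    print(offer, score("en-GB-scouse", offer))
```

Under these assumptions the three offers score 9, 10 and 11, which
reproduces the ranking in which en-Brai-GB-scouse "wins".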
> Not okay, but not completely unintentional, for some languages
> I can guess what the text is about, as long as it's Latn: The
> combined power of forgotten school Latin plus miserable French
> sometimes helps with es or pt. But with ru I'd be lost - with
> luck I can decode some Cyrl. For fy my chances are lousy, for
> dk or nl it's better than zero.
Fair enough.
> One effect you see with both metrics: if one side wants
> en-scouse, and the other side has only en-Latn-US-scouse and
> en-Brai-GB-scouse, you get a draw. Apparently your algorithm
> cannot completely replace the "default script" approach.
Well, that problem applies at all levels: if you ask for en-AU,
then no algorithm can choose between the offered en-GB and en-US
(except RFC 2616, which will simply fail).
For that matter, if you ask for de and nn and nb are all that's
available, the matching algorithm won't help then either.
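
The draws described above can be reproduced with a small sketch. It
assumes the same 8/4/2/1 weighting over language, script, region and
variant, with '*' scoring 0; the helper names fields, score and best are
hypothetical, and an all-zero result is treated here as "no match", which
is a choice made for this illustration only.

```python
WEIGHTS = (8, 4, 2, 1)  # language, script, region, variant (assumed order)


def fields(tag):
    """Return [language, script, region, variant], with '*' for absent fields."""
    lang, *rest = tag.split("-")
    out = [lang.lower(), "*", "*", "*"]
    for sub in rest:
        if len(sub) == 4 and sub.isalpha():
            out[1] = sub.lower()      # 4 letters: treated as a script
        elif len(sub) == 2 and sub.isalpha():
            out[2] = sub.lower()      # 2 letters: treated as a region
        else:
            out[3] = sub.lower()      # anything else: treated as a variant
    return out


def score(wanted, offered):
    """Add up the weights of fields that are equal and not '*'."""
    return sum(w for w, a, b in zip(WEIGHTS, fields(wanted), fields(offered))
               if a != "*" and a == b)


def best(wanted, offers):
    """Return every offer tied for the top score, or [] if nothing scores."""
    scored = {o: score(wanted, o) for o in offers}
    top = max(scored.values(), default=0)
    return sorted(o for o, s in scored.items() if top > 0 and s == top)


print(best("en-AU", ["en-GB", "en-US"]))
print(best("en-scouse", ["en-Latn-US-scouse", "en-Brai-GB-scouse"]))
print(best("de", ["nn", "nb"]))
```

In this sketch the first two calls return both offers (a draw that the
metric alone cannot break), and the third returns an empty list because
nothing matches even at the language level.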
--
John Cowan  jcowan <at> reutershealth.com
www.ccil.org/~cowan  www.reutershealth.com
Is a chair finely made tragic or comic? Is the portrait of Mona Lisa
good if I desire to see it? Is the bust of Sir Philip Crampton lyrical,
epical or dramatic? If a man hacking in fury at a block of wood make
there an image of a cow, is that image a work of art? If not, why not?
        --Stephen Dedalus