Martin Duerst | 1 Aug 01:49

absent, server down

Dear WG,

I'll be away from email for the next 2.5 days. Also, the
server with the Last Call page for the matching draft
will be down most of that time.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst <at> it.aoyama.ac.jp     

Erkki Kolehmainen | 1 Aug 16:59

Re: ISO CD 639-6 (was: Re: Language and script encoding standards)

I fully agree on this.

Erkki I. Kolehmainen

John Cowan wrote:

> Doug Ewell scripsit:
> 
>>I fully agree that the appropriate question to discuss right now is 
>>whether the new charter should *allow* discussions about the pros and 
>>cons of adding ISO 639-6-based subtags, *when the time comes to have 
>>those discussions*.  We need not, and indeed should not, conduct those 
>>discussions now.
>>
> 
> Just so.  And I say that the charter should allow such discussions,
> because the relevance (as distinct from the desirability) of
> ISO 639-6 is beyond question.
> 
> 


Test suite for language tags?

I just wrote a non-validating parser for language tags and I'm looking
for test data. I want to test bizarre tags to see whether the parser
classifies them properly.

I'm especially interested in badly formed tags: the I-D contains mostly
well-formed tags.

Addison Phillips | 1 Aug 23:06

Re: Test suite for language tags?

 > I just wrote a non-validating parser for language tags and I'm looking
 > for test data. I want to test bizarre tags to see whether the parser
 > classifies them properly.

Good for you!

 > I'm especially interested in badly formed tags: the I-D contains mostly
 > well-formed tags.

Your best bet is probably to generate subtag sequences based on the 
ABNF. Some particular problem cases to check would be:

- singletons in the first position (except for 'x' and the grandfathered 
list)
- overlong subtags (longer than 8 characters)
- more than three extlangs
- misplaced extlang (3ALPHA in the third or later position following any 
of these: 4ALPHA, 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum)[note: stop 
at singleton]
- misplaced script (4ALPHA following any of these: 2ALPHA, 3DIGIT, 
5*8alphanum, DIGIT 3alphanum)[note: stop at singleton]
- misplaced variant (five or more characters, or four or more starting 
with a digit; either occurring before an extlang/script/region is an error).
- non-x singleton followed immediately by a singleton (including 'x')
- missing subtag ("--")
- a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
- digits in the primary (first) subtag
- repeated singleton (note case insensitivity)

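Thus, these are all errors:

"a-foo"
"abcdefghi-012345678"
"ab-abc-abc-abc-abc"
"ab-abcd-abc"
"ab-ab-abc"
"ab-123-abc"
"ab-abcde-abc"
"ab-1abc-abc"
"ab-ab-abcd"
"ab-123-abcd"
"ab-abcde-abcd"
"ab-1abc-abcd"
"ab-a-b"
"ab-a-x"
"ab--ab"
"ab-abc-"
"-ab-abc"
"ab-a-abc-a-abc"

These are not errors:

"ab-x-abc-x-abc" // anything goes after x
"ab-x-abc-a-a"   // ditto
"i-default"      // grandfathered

Hope that helps,

Addison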
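
The problem cases listed in this message can be turned into a small
well-formedness check. What follows is only a sketch in Python, assuming
the ABNF of the draft as later published in RFC 4646. The name
is_well_formed and the three-entry grandfathered set are placeholders
chosen for illustration (the real grandfathered list is longer), and this
is not the actual parser or test code of anyone on the list.

```python
import re

# Regular (non-grandfathered, non-private-use) tags, following the ABNF.
_LANGTAG = re.compile(r"""
    (?: [a-z]{2,3} (?:-[a-z]{3}){0,3}        # language, up to three extlangs
      | [a-z]{4}                             # reserved 4-letter language
      | [a-z]{5,8} )                         # registered language
    (?:-[a-z]{4})?                           # script
    (?:-(?:[a-z]{2}|[0-9]{3}))?              # region
    (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))* # variants
    (?:-[a-wy-z0-9](?:-[a-z0-9]{2,8})+)*     # extensions (singleton + subtags)
    (?:-x(?:-[a-z0-9]{1,8})+)?               # trailing private use
""", re.VERBOSE | re.IGNORECASE)

# A tag that consists only of a private-use sequence.
_PRIVATEUSE = re.compile(r"x(?:-[a-z0-9]{1,8})+", re.IGNORECASE)

# ASSUMPTION: illustrative subset only, not the full grandfathered list.
_GRANDFATHERED = {"i-default", "i-klingon", "art-lojban"}


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` passes the syntactic checks sketched here."""
    lowered = tag.lower()
    if lowered in _GRANDFATHERED or _PRIVATEUSE.fullmatch(tag):
        return True
    if not _LANGTAG.fullmatch(tag):
        return False
    # The regex cannot express "no repeated singleton", so check it here.
    # Everything after the first 'x' is private use and is not inspected.
    seen = set()
    for subtag in lowered.split("-"):
        if subtag == "x":
            break
        if len(subtag) == 1:
            if subtag in seen:
                return False
            seen.add(subtag)
    return True
```

Calling is_well_formed("ab-a-abc-a-abc") exercises the repeated-singleton
check, which is the one case in the list that the regular expression alone
does not reject.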

Mark Davis | 1 Aug 23:30

Re: Test suite for language tags?

What may be useful is that ICU has a test string generator (BNF) that generates strings that match a specified BNF syntax. It augments the regular syntax with percent values that indicate relative weights. That is, if you have

x = (a | b | c)

in the BNF, you can make it

x = (a 25% | b 45% | c 30%)

so that it generates those alternatives with those frequencies.

It is an internal testing class, and doesn't have much documentation, but I thought I'd mention it in case you'd find it useful.

Mark
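
To make the weighting idea concrete, here is a minimal sketch in Python of
picking one alternative of a rule according to percentage weights. It is
not the ICU class and does not reflect its API; the function name
weighted_choice and the (symbol, percent) pair representation are
assumptions made for this illustration.

```python
import random


def weighted_choice(alternatives, rng):
    """Pick one symbol from (symbol, percent) pairs, honouring the weights."""
    symbols = [symbol for symbol, _ in alternatives]
    weights = [percent for _, percent in alternatives]
    # random.Random.choices draws with replacement using relative weights.
    return rng.choices(symbols, weights=weights, k=1)[0]


# x = (a 25% | b 45% | c 30%)
x_rule = [("a", 25), ("b", 45), ("c", 30)]

rng = random.Random(0)  # fixed seed so a test run is repeatable
sample = [weighted_choice(x_rule, rng) for _ in range(10000)]
```

Over a large sample the counts of "a", "b" and "c" should come out close
to the stated 25/45/30 split; a full generator would apply the same choice
recursively to every alternation in the grammar.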

On 8/1/06, Addison Phillips <addison <at> yahoo-inc.com> wrote:
> > I just wrote a non-validating parser for language tags and I'm looking
> > for test data. I want to test bizarre tags to see whether the parser
> > classifies them properly.
>
> Good for you!
>
> > I'm especially interested in badly formed tags: the I-D contains mostly
> > well-formed tags.
>
> Your best bet is probably to generate subtag sequences based on the
> ABNF. Some particular problem cases to check would be:
>
> - singletons in the first position (except for 'x' and the grandfathered
> list)
> - overlong subtags (longer than 8 characters)
> - more than three extlangs
> - misplaced extlang (3ALPHA in the third or later position following any
> of these: 4ALPHA, 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum)[note: stop
> at singleton]
> - misplaced script (4ALPHA following any of these: 2ALPHA, 3DIGIT,
> 5*8alphanum, DIGIT 3alphanum)[note: stop at singleton]
> - misplaced variant (five or more characters, or four or more starting
> with a digit; either occurring before an extlang/script/region is an error).
> - non-x singleton followed immediately by a singleton (including 'x')
> - missing subtag ("--")
> - a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
> - digits in the primary (first) subtag
> - repeated singleton (note case insensitivity)
>
> Thus, these are all errors:
>
> "a-foo"
> "abcdefghi-012345678"
> "ab-abc-abc-abc-abc"
> "ab-abcd-abc"
> "ab-ab-abc"
> "ab-123-abc"
> "ab-abcde-abc"
> "ab-1abc-abc"
> "ab-ab-abcd"
> "ab-123-abcd"
> "ab-abcde-abcd"
> "ab-1abc-abcd"
> "ab-a-b"
> "ab-a-x"
> "ab--ab"
> "ab-abc-"
> "-ab-abc"
> "ab-a-abc-a-abc"
>
> These are not errors:
>
> "ab-x-abc-x-abc" // anything goes after x
> "ab-x-abc-a-a"   // ditto
> "i-default"      // grandfathered
>
> Hope that helps,
>
> Addison
>
> Addison Phillips
> Globalization Architect -- Yahoo! Inc.
>
> Internationalization is an architecture.
> It is not a feature.

_______________________________________________
Ltru mailing list
Ltru <at> ietf.org
https://www1.ietf.org/mailman/listinfo/ltru


Re: Test suite for language tags?

On Tue, Aug 01, 2006 at 02:30:57PM -0700,
 Mark Davis <mark.davis <at> icu-project.org> wrote 
 a message of 117 lines which said:

> What may be useful is that ICU has a test string generator (BNF)
> that generates strings that match a specified BNF syntax.

Interesting. An ABNF2tests that takes RFC 4234 as input and produces
JUnit / PyUnit / HUnit / whatever would certainly be a useful tool
for the IETF. I do not use Java but I'll have a look.


Re: Test suite for language tags?

On Tue, Aug 01, 2006 at 02:06:02PM -0700,
 Addison Phillips <addison <at> yahoo-inc.com> wrote 
 a message of 65 lines which said:

> Some particular problem cases to check would be:

Many thanks for the test cases, I discovered a bug in the parser with
them. Now, it works:

Cases: 58  Tried: 58  Errors: 0  Failures: 0

:-)

Mark Davis | 2 Aug 15:37

Re: Re: Test suite for language tags?

I'm sorry I gave the wrong impression. It doesn't use RFC 4234 syntax; it uses Perl-style syntax, e.g.

x = ( a | b | c ) d* ( e | f )+ ....
instead of
x=("a"/"b"/"c") *d 1*("e"/"f")

It would not take a lot of work to have it accept the older syntax as well, but it doesn't do so out of the box.

Mark
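
The difference between the two notations is mostly where the repetition
operator goes: Perl-style syntax uses a postfix quantifier, while RFC 4234
ABNF uses a prefix repeat count. The following Python sketch shows that
mapping for a single element; the helper name abnf_repetition is
hypothetical and is not part of ICU or of any ABNF tool.

```python
def abnf_repetition(element: str, quantifier: str) -> str:
    """Rewrite a Perl-style postfix quantifier as an ABNF prefix repeat."""
    prefixes = {
        "": "",     # exactly once: no prefix needed
        "*": "*",   # zero or more
        "+": "1*",  # one or more
        "?": "*1",  # zero or one (ABNF also allows [element])
    }
    if quantifier not in prefixes:
        raise ValueError(f"unsupported quantifier: {quantifier!r}")
    return prefixes[quantifier] + element
```

For instance, abnf_repetition("d", "*") yields *d, and applying "+" to the
group ("e"/"f") yields 1*("e"/"f"). Alternation (| versus /) and literal
quoting would need separate handling in a real converter.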


Mark Davis | 2 Aug 15:47

Re: Re: Test suite for language tags?

P.S. It was also ported to C++ (although with some limitations) in wbnf.*

Mark



Addison Phillips | 3 Aug 17:48

Re: Test suite for language tags?

Glad it helped.

I could probably have saved time writing that message if I'd just sent 
my JUnits ;-). Glad to see that others are getting their parsers going.

Just curious: will you also be implementing a validating parser?

Addison

Stephane Bortzmeyer wrote:
> On Tue, Aug 01, 2006 at 02:06:02PM -0700,
>  Addison Phillips <addison <at> yahoo-inc.com> wrote 
>  a message of 65 lines which said:
> 
>> Some particular problem cases to check would be:
> 
> Many thanks for the test cases, I discovered a bug in the parser with
> them. Now, it works:
> 
> Cases: 58  Tried: 58  Errors: 0  Failures: 0
> 
> :-)
> 

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

