Re: Test suite for language tags?
Addison Phillips <addison <at> yahoo-inc.com>
2006-08-01 21:06:02 GMT
> I just wrote a non-validating parser for language tags and I'm looking
> for test data. I want to test bizarre tags to see if the parser does
> classify them properly.
Good for you!
> I'm specially interested in badly-formed tags: the I-D contains mostly
> well-formed tags.
Your best bet is probably to generate subtag sequences based on the
ABNF. Some particular problem cases to check would be:
- singletons in the first position (except for 'x' and the grandfathered
list)
- overlong subtags (longer than 8 characters)
- more than three extlangs
- misplaced extlang (3ALPHA in the third or later position following any
of these: 4ALPHA, 2ALPHA, 3DIGIT, 5*8alphanum, DIGIT 3alphanum) [note: stop
at singleton]
- misplaced script (4ALPHA following any of these: 2ALPHA, 3DIGIT,
5*8alphanum, DIGIT 3alphanum) [note: stop at singleton]
- misplaced variant (five to eight characters, or four characters starting
with a digit; either one occurring before an extlang, script, or region is
an error)
- non-x singleton followed immediately by a singleton (including 'x')
- missing subtag ("--")
- a dangling hyphen ("foo-bar-baz-") or initial hyphen ("-foo-bar-baz")
- digits in the primary (first) subtag
- repeated singleton (note case insensitivity)
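
A rough sketch of how some of the simpler cases in the list above could be
checked, in Python. This is not a full ABNF parser: it covers only the purely
structural problems (hyphens, subtag length, digits in the primary subtag,
singletons) and ignores the extlang/script/region/variant ordering. The
function name find_problems and the problem labels are made up for this
illustration, and treating 'i' as a stand-in for the grandfathered list is a
simplifying assumption.

```python
def find_problems(tag):
    """Return a set of structural problems found in a language tag.

    Sketch only: covers hyphen, length, primary-subtag and singleton
    checks, not the extlang/script/region/variant ordering rules.
    """
    problems = set()

    # Hyphen problems can be spotted on the raw string.
    if tag.startswith("-"):
        problems.add("initial hyphen")
    if tag.endswith("-"):
        problems.add("dangling hyphen")
    if "--" in tag:
        problems.add("missing subtag")

    # Tags are case-insensitive; drop empty pieces so later checks still run.
    subtags = [s for s in tag.lower().split("-") if s]
    if not subtags:
        problems.add("empty tag")
        return problems

    if any(len(s) > 8 for s in subtags):
        problems.add("overlong subtag")

    primary = subtags[0]
    # Assumption: 'i' stands in for the grandfathered list here.
    if len(primary) == 1 and primary not in ("x", "i"):
        problems.add("singleton in first position")
    if any(c.isdigit() for c in primary):
        problems.add("digits in primary subtag")

    # Singleton checks. Everything after 'x' is private use, so stop there.
    if primary != "x":
        seen = set()
        for i in range(1, len(subtags)):
            s = subtags[i]
            if s == "x":
                break
            if len(s) != 1:
                continue
            if s in seen:
                problems.add("repeated singleton")
            seen.add(s)
            if i + 1 < len(subtags) and len(subtags[i + 1]) == 1:
                problems.add("singleton followed by singleton")

    return problems
```

Calling find_problems("en-a-foo-A-bar") reports a repeated singleton because
the tag is lowercased first, and find_problems("foo-bar-baz-") reports only
the dangling hyphen. Returning a set of labels rather than a single verdict
makes it easy to test tags that are broken in more than one way.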
Thus, these are all errors: