7 Mar 2009 18:36
xindy support for Welsh language
David Stone <dfstone <at> lithoi.org.uk>
2009-03-07 17:36:40 GMT
I've been trying to write a Welsh language module for xindy, but am unsure what to do, even after reading the source and documentation (though perhaps I haven't looked in the right place).

Welsh uses the Latin alphabet, but has digraphs ch, dd, ff, ng, ll, ph, rh, th, which occur as separate letters just after c, d, f, g, l, p, r, t respectively. This is rather like Spanish ch, ll with the traditional spelling. Also, vowels including w and y can take accents: acute, grave, circumflex, diaeresis (some combinations are rare; circumflex is common); accents are ignored when sorting. Some other letters are only used in borrowed words. See e.g. http://en.wikipedia.org/wiki/Welsh_alphabet

Because of combinations like w-circumflex, it seemed simplest to use utf8. I've been trying to use make-rules.pl, taking traditional Spanish (spanish/traditional-utf8.pl.in) as a model, and have created welsh/utf8.pl.in locally. But I do not understand why $alphabet seems to have a fixed number of elements, nor how to change it to what I require. The comments suggest that you have to have an element for unused letters, which is left as [] if you do not require it. But there is no element for several of the Welsh digraphs. Should I be setting @letter_group_names? Is there some other array which I need to set?

Eventually I want to use the result with texindy, by making the .xdy file and then specifying it with -M. If the result works, it can of course be included with standard...
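To make the intended ordering concrete, here is a minimal Python sketch of the sort described above. It is only an illustration of the target collation, not xindy or make-rules.pl code. The names WELSH_ORDER, strip_accents and welsh_key are made up for this example, and placing the borrowed letters k, q, v, x, z at their usual Latin positions is an assumption. The greedy digraph matching is also a simplification: it cannot tell a true "ng" from n followed by g (as in "Bangor").

```python
import unicodedata

# Welsh letters in dictionary order; each digraph follows its base letter.
# Assumption: borrowed letters k, q, v, x, z sit at their usual Latin positions.
WELSH_ORDER = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
    "i", "j", "k", "l", "ll", "m", "n", "o", "p", "ph", "q", "r", "rh",
    "s", "t", "th", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ORDER)}
DIGRAPHS = {letter for letter in WELSH_ORDER if len(letter) == 2}


def strip_accents(text):
    """Drop combining marks so that accents are ignored when sorting."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def welsh_key(word):
    """Return a tuple of letter ranks, treating each digraph as one letter."""
    plain = strip_accents(word).lower()
    key = []
    i = 0
    while i < len(plain):
        pair = plain[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after all letters.
            key.append(RANK.get(plain[i], len(WELSH_ORDER)))
            i += 1
    return tuple(key)


words = ["llan", "lôn", "cwm", "chwech", "dŵr", "ddoe", "lwc", "tŷ", "thema"]
print(sorted(words, key=welsh_key))
# ['cwm', 'chwech', 'dŵr', 'ddoe', 'lôn', 'lwc', 'llan', 'tŷ', 'thema']
```

A plain Latin sort would put "chwech" before "cwm" and "llan" before "lwc"; the output above is the order a Welsh xindy module would be expected to produce for these words.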