Dave Pawson | 9 Nov 19:41

IPA and SAMPA

W3C SSML uses IPA for the specification of lexicons.

I prefer the simpler (but less comprehensive) SAMPA.
It's easier to use.

Should SAMPA (http://www.phon.ucl.ac.uk/home/sampa/)
be identified as a variant of IPA please?

regards

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
Michael Everson | 10 Nov 17:26

Phonetic orthographies

It's been brought to my attention that people are still worried about 
tags for phonetic orthographies. Obviously such transcription is 
useful, and it would be helpful to be able to identify languages when 
written in such orthographies. Here are a few off the top of my head:

fonipa International Phonetic Alphabet
fonupa Uralic Phonetic Alphabet
fonweb Webster's phonetic respelling (i-macron = [aj] etc)
fonami Americanist phonetic tradition
fonlep Lepsius' Standard Alphabet

-- 
Michael Everson * http://www.evertype.com
Michael Everson | 10 Nov 18:49

Re: Phonetic orthographies

At 11:49 -0500 2006-11-10, John Cowan wrote:

>Thanks for the list.  The issue currently raging on LTRU is whether
>the degree of difference between phonetic and ordinary orthographies,
>or between one phonetic orthography and another, is properly handled as
>a variant, or whether it rises to the level of a subscript.

The ISO 15924 RA has already received and rejected an application for
a script tag for IPA. It does not meet the criteria established in
ISO 15924. It is simply a large collection of Latin letters,
typically drawn in ordinary Roman or Italic style, with a couple of
Greek letters that arguably ought to have been cloned.

This is not my view only. It was the view of the RA. It is of course
recognized that a tagging mechanism is needed, but ISO 15924 script
codes are not the way to do it.

>There are three positions:
>1) All these orthographies are simply applications of the Latin script;

Yes.

>2) Latin phonetic orthographies are, as a whole, a subscript of Latin;

No. They differ too much.

>3) Each distinct Latin phonetic orthography is a distinct subscript
>     of Latin.

I should think not, really. Websters or Berlitz respellings, or Cut

Peter Constable | 10 Nov 19:49

RE: Phonetic orthographies

-----Original Message-----
From: ietf-languages-bounces <at> alvestrand.no
[mailto:ietf-languages-bounces <at> alvestrand.no] On Behalf Of Michael
Everson

> The ISO 15924 RA has already received and rejected an application for
> a script tag for IPA. It does not meet the criteria established in
> ISO 15924. ... This is not my view only. It was the view of the RA. 
> It is of course recognized that a tagging mechanism is needed, but 
> ISO 15924 script codes are not the way to do it.

Perhaps the ISO 15924 RA would like to suggest an alternative solution to
its user community in view of the request for a solution?

>>2) Latin phonetic orthographies are, as a whole, a subscript of Latin;
>
> No. They differ too much.

They may differ greatly from one another formally, but in terms of
function they clearly form a group that unites them with one another but
differentiates them from Latin practical orthographies in common use.

>>The argument for #2 and #3 is that the degree of unintelligibility of a
>>phonetic orthography to those who know the conventional one is close to
>>that of a script-level transcription or transliteration.
>
> Personally I think this is bogus. Yes, there may be some unfamiliar
> letters in the extended alphabet. That depends greatly on the

John Cowan | 10 Nov 20:49

Re: Phonetic orthographies

Peter Constable scripsit:

> Think of it like creating a filter for you email inbox, and suppose
> these were tags in the subject field: you'd be creating a bunch of
> rules, one for each of these, with a need to keep adding rules as you
> discovered more and more cases; but the alternative would be having one
> tag, Latp, that was getting used with all of these, allowing you to
> write your rule once and never need to update it. That analogy should
> give you a partial picture of how #2 could be useful and solve a need.

In addition, one may be able to discern a phonetic orthography even if
one cannot reliably distinguish Trager/Smith from Kenyon & Knott.
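
The filter analogy quoted above can be sketched in a few lines of
Python. This is only an illustration, not anything specified on this
list: 'Latp' is the script code under discussion (it is not assigned),
the variant names are the ones proposed earlier in this thread, and
'fonxyz' is a made-up name standing in for a phonetic orthography that
the filter author has not heard of yet.

```python
# Variant names proposed earlier in the thread (assumed, not registered).
KNOWN_PHONETIC_VARIANTS = {"fonipa", "fonupa", "fonweb", "fonami", "fonlep"}


def subtags(tag):
    """Split a language tag into lowercase subtags."""
    return tag.lower().split("-")


def is_phonetic_by_variant(tag):
    """Without a blanket code: one rule per known phonetic variant."""
    return any(s in KNOWN_PHONETIC_VARIANTS for s in subtags(tag))


def is_phonetic_by_script(tag):
    """With the hypothetical 'Latp' script code: a single rule."""
    return "latp" in subtags(tag)


# A known variant is caught by the enumerating rule.
print(is_phonetic_by_variant("en-Latn-fonipa"))
# A variant added later ('fonxyz' is invented here) slips through it...
print(is_phonetic_by_variant("fi-Latn-fonxyz"))
# ...but the single 'Latp' rule still matches without being updated.
print(is_phonetic_by_script("fi-Latp-fonxyz"))
```

The enumerating rule has to be edited every time a new orthography
appears; the blanket rule does not, which is the whole of the argument
for #2.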

-- 
John Cowan    cowan <at> ccil.org    http://ccil.org/~cowan
Half the lies they tell about me are true.
        -- Tallulah Bankhead, American actress
Michael Everson | 10 Nov 20:52

RE: Phonetic orthographies

At 10:49 -0800 2006-11-10, Peter Constable wrote:

>> The ISO 15924 RA has already received and rejected an application for
>> a script tag for IPA. It does not meet the criteria established in
>> ISO 15924. ... This is not my view only. It was the view of the RA.
>> It is of course recognized that a tagging mechanism is needed, but
>> ISO 15924 script codes are not the way to do it.
>
>Perhaps the ISO 15924 RA would like to suggest an alternative solution to
>its user community in view of the request for a solution?

It's not the RA's job to do that, really. 
However, I (for my part) did suggest that the 
following might be used:

fonipa International Phonetic Alphabet
fonupa Uralic Phonetic Alphabet
fonweb Webster's phonetic respelling (i-macron = [aj] etc)
fonami Americanist phonetic tradition
fonlep Lepsius' Standard Alphabet
fonmal Landsmaalalfabetet.
fondan Danish dialect alphabet
fonnor Norwegian dialect alphabet

All of these are Latin-script orthographies which 
may be used to write any number of languages.
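
To illustrate the point that one orthography tag serves any number of
languages, here is a minimal Python sketch that combines a language
code, the Latn script code and one of the proposed variants. The
function name and the table are invented for the example, only a
subset of the list above is included, and none of the variants is
assumed to be registered. The syntax check follows the variant rule in
RFC 4646 (5 to 8 alphanumerics, or a digit followed by 3).

```python
import re

# Subset of the variant names proposed above (assumed, not registered).
PROPOSED_VARIANTS = {
    "fonipa": "International Phonetic Alphabet",
    "fonupa": "Uralic Phonetic Alphabet",
    "fonweb": "Webster's phonetic respelling",
    "fonami": "Americanist phonetic tradition",
    "fonlep": "Lepsius' Standard Alphabet",
}

# RFC 4646 variant syntax: 5-8 alphanumerics, or a digit plus 3 alphanumerics.
VARIANT_SYNTAX = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def make_tag(language, variant, script="Latn"):
    """Build language-script-variant, e.g. 'fi-Latn-fonupa'."""
    if not VARIANT_SYNTAX.fullmatch(variant):
        raise ValueError(f"not a well-formed variant subtag: {variant!r}")
    if variant not in PROPOSED_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{language}-{script}-{variant}"


# The same orthography tag applied to three different languages.
tags = [make_tag(lang, "fonipa") for lang in ("en", "fi", "et")]
print(tags)
```

Nothing in the variant itself names a language, so the table stays the
same size however many languages are transcribed.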

As an aside, consider too the following orthography tags:

monoton Greek Monotonic orthography

Randy Presuhn | 10 Nov 21:08

Re: Phonetic orthographies

Hi -

> From: "Michael Everson" <everson <at> evertype.com>
> To: "IETF Languages Discussion" <ietf-languages <at> iana.org>
> Sent: Friday, November 10, 2006 9:49 AM
> Subject: Re: Phonetic orthographies
...
> Personally I think this is bogus. Yes, there may be some unfamiliar
> letters in the extended alphabet. That depends greatly on the
> language. Look at the Finnish and Estonian examples in the 1949 IPA
> handbook. They hardly differ from standard orthography!

I don't know about the specific cases of Finnish & Estonian,
but for the languages I've studied, it makes a big difference
where the transcription lies on the range from broad phonemic
to narrow phonetic transcriptions.  If the transcription is narrow
enough to describe regional "accent" differences, chances are
good that it will be quite far removed from the standard
orthography.

> >The argument for #2 as opposed to #3 is administrative convenience,
> >making 'Latp' a blanket term, a sort of analogue of 'sgn'.
> 
> I don't see how that solves anything. You would still need a tag to
> determine WHICH phonetic orthography it was (apart from the question
> of how to define "phonetic orthography"). Latn-fonipa is no different
> from *Latp-fonipa in that case.
...

I agree that there is great importance in being able to distinguish

Peter Constable | 10 Nov 22:17

RE: Phonetic orthographies

From: ietf-languages-bounces <at> alvestrand.no [mailto:ietf-languages-bounces <at> alvestrand.no] On
Behalf Of Michael Everson

>>Perhaps the ISO 15924 RA would like to suggest an alternative solution to
>>its user community in view of the request for a solution?
>
> It's not the RA's job to do that, really. 

It *is* the RA's job to register tags that users want to use, and to service the user needs for which ISO 15924
was created. If the RA doesn't feel a particular user need should be met using the standard when users are
suggesting that it should, then IMO the RA should be prepared to suggest where an alternative solution
might lie, just as the ISO 639 JAC needs to be prepared to do.

> However, I (for my part) did suggest that the 
> following might be used:

Yes, but users are saying these alone are not considered sufficient for the needs, and you have not provided
a solution to that extent.

>>They may differ greatly from one another 
>>formally, but in terms of function they clearly 
>>form a group that unites them with one another 
>>but differentiates them from Latin practical 
>>orthographies in common use.
>
> ISO 15924 is based on form.

Well, let's consider this. Is Fraser a subset of Latin or a separate script? In terms of form, it is very
clearly a subset of Latin, yet I believe I've heard you say it must be considered a separate script because
of its unicameral behaviour. Phonetic transcriptions -- certainly those I'm familiar with -- are

Mark Davis | 11 Nov 00:52

Re: Phonetic orthographies

+1

On 11/10/06, Peter Constable <petercon <at> microsoft.com> wrote:
From: ietf-languages-bounces <at> alvestrand.no [mailto:ietf-languages-bounces <at> alvestrand.no] On Behalf Of Michael Everson

>>Perhaps the ISO 15924 RA would like to suggest an alternative solution to
>>its user community in view of the request for a solution?
>
> It's not the RA's job to do that, really.

It *is* the RA's job to register tags that users want to use, and to service the user needs for which ISO 15924 was created. If the RA doesn't feel a particular user need should be met using the standard when users are suggesting that it should, then IMO the RA should be prepared to suggest where an alternative solution might lie, just as the ISO 639 JAC needs to be prepared to do.


> However, I (for my part) did suggest that the
> following might be used:

Yes, but users are saying these alone are not considered sufficient for the needs, and you have not provided a solution to that extent.


>>They may differ greatly from one another
>>formally, but in terms of function they clearly
>>form a group that unites them with one another
>>but differentiates them from Latin practical
>>orthographies in common use.
>
> ISO 15924 is based on form.

Well, let's consider this. Is Fraser a subset of Latin or a separate script? In terms of form, it is very clearly a subset of Latin, yet I believe I've heard you say it must be considered a separate script because of its unicameral behaviour. Phonetic transcriptions -- certainly those I'm familiar with -- are absolutely unicameral. (E.g. in Americanist, "a" and "A" represent distinct sounds.) So, by that line of reasoning, you ought equally to consider phonetic transcriptions separate scripts. I think we'd all agree that that's not where we want to go. But I suggest to you it ought to be enough to say that phonetic transcriptions based on Latin have some distinctive behaviour that warrants considering them a script variant.


>>But the functionality of phonetic transcriptions
>>is clearly distinct, and the desirability for a
>>user of getting content in phonetic
>>transcription vs. common practical orthography
>>is in general very real.
>
> That still does not mean that IPA, or UPA, or
> Landsmålsalfabetet, or Webster's spelling, are
> scripts other than Latin. Nor does it mean that
> they belong to some collective variant of Latin

I think you are too swayed by an academic, graphology perspective and have lost sight of the fact that ISO 15924 exists NOT as a form of academic documentation but rather to serve practical IT purposes. (I find this very reminiscent of the es-americas issue: you opposed it because it didn't fit your understanding of dialectology when you were missing the very real practical IT need.)



> I understand that you have a problem because of
> the way that your parsing taxonomy works. I don't
> see how that translates into changing the intent
> of ISO 15924 into

So, let's revisit the intent:

"The codes were devised for use in terminology, lexicography, bibliography, and linguistics, but they may be used for any application requiring the expression of scripts in coded form. This International Standard also includes guidance on the use of script codes in some of these applications."

Again, you've got users saying that they have a need -- including in lexicography and linguistics -- to code Latin-based phonetic transcriptions as a script variant. The intent of the standard is to code just such things, and to provide usage guidance. Please encode "Latp", or please provide guidance as to how the practical need can be better met.



> What script is this in?
>
>       crdiloetis kari da mza k'amatobden tu romeli iqo upro dzlieri.
>
> It's Latin, isn't it?

Yes; and note the complete inappropriateness of

        Crdiloetis kari da mza k'amatobden tu romeli iqo upro dzlieri.

The capitalization has just turned this content into some completely different "orthography" with no known usage. Clearly this is Latin, but with exceptional rules -- i.e. a distinct variant of Latin.



> I comprehend what you are describing. I don't
> think that ISO standards should be, hm, abused in
> this way.

This is not an abuse but a very reasonable and practical IT application. It can only be seen as an abuse if you insist on thinking of the intent of the standard as being to provide academic documentation of scripts, or if you find a much better way to engineer solutions to the IT needs. Again, the RA has not done the latter, so I must assume the RA is doing the former, which is deviating from the intent of the standard.


> *Latp is no different than, say an ISO
> 639 tag *enc, taken to be a variety of "eng"
> 'English' designed for use by speakers of
> varieties of "Commonwealth English" (en-GB,
> en-IE, en-ZA, en-AU, en-NZ) which may share many
> features and be difficult for speakers of other
> varieties of English to understand. It would make
> your filter much easier, but it would be the
> wrong thing to do.

I think a much closer analogy would be an ISO 639 ID zh that encompasses yue, cmn, etc. And ISO 639 does encode zh.



Peter
_______________________________________________
Ietf-languages mailing list
Ietf-languages <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages

Gerard Meijssen | 11 Nov 08:02

Re: Phonetic orthographies

Hoi,
Actually, zh is only to be used within the confines of ISO 639-1 and 
ISO 639-2. The new standard marks zh, and consequently zho, as a 
macrolanguage. Making assumptions on the basis of the zh code is only 
useful for the hopefully short period until zh is retained only for 
historical reasons.

Consequently, using zh as an example for other uses is not a great idea.

Thanks,
     GerardM

