Praveen Prakash | 1 Dec 02:31

Unicode equivalence

Hi,

I am from Malayalam Wikipedia (ml.wikipedia, user:Praveenp), and my
language is Malayalam. Please consider one big problem we have.

After the release of Unicode 5.1.0, some characters of the Malayalam
alphabet have two kinds of encoding (because of backward compatibility).
This causes serious problems with linking, searching, etc. in the
MediaWiki software. Currently Windows 7 is the only operating system
which supports Unicode 5.1.0 (as far as I know), but a lot of
third-party tools for writing and reading Malayalam support the new
version, and a large quantity of data in Wikimedia projects is now in the
new version. It is not possible to link to or search for titles encoded
in pre-5.1.0 Unicode from Unicode 5.1.0 text, or vice versa. Currently one of our
namespace names, വര്‍ഗ്ഗം (Category), also contains one such character, so it is
possible to write വര്‍ഗ്ഗം as വർഗ്ഗം, which renders the same as the first but is
encoded differently. This causes problems in categorization as well.

Is it possible to implement some form of Unicode equivalence
<http://en.wikipedia.org/wiki/Unicode_equivalence> in the MediaWiki
software? We need urgent help.
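Note that the standard Unicode normalization forms (NFC, NFD, NFKC, NFKD), which are what "Unicode equivalence" usually refers to, do not treat the two Malayalam encodings as equivalent: the 5.1 chillu characters were added without decomposition mappings. The Python sketch below, using only the standard `unicodedata` module, illustrates this for CHILLU NN; it assumes a Python whose Unicode database is 5.1 or newer (any current release).

```python
import unicodedata

# Two spellings of the same visible letter, CHILLU NN
OLD_NN = "\u0D23\u0D4D\u200D"  # NNA + VIRAMA + ZWJ (Unicode 5.0 and prior)
NEW_NN = "\u0D7A"              # MALAYALAM LETTER CHILLU NN (Unicode 5.1)

# The atomic chillu has no decomposition mapping, so normalization cannot
# rewrite it into (or out of) the three-code-point sequence.
print(repr(unicodedata.decomposition(NEW_NN)))  # prints ''

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = unicodedata.normalize(form, OLD_NN) == unicodedata.normalize(form, NEW_NN)
    print(form, same)  # False for every form
```

So any equivalence would have to be an explicit, Malayalam-specific mapping rather than a built-in normalization step.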

Please also check
http://unicode.org/versions/Unicode5.1.0/#Malayalam_Chillu_Characters

Character    Representation in 5.0 and prior   Preferred 5.1 representation
CHILLU NN    0D23, 0D4D, 200D                  0D7A
CHILLU N     0D28, 0D4D, 200D                  0D7B
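The rows above give only code points. A short helper (an illustrative sketch using Python's standard `unicodedata` module; `describe` is a hypothetical name) spells each sequence out by character name, which makes it easier to see that the two representations differ only in encoding:

```python
import unicodedata

def describe(seq: str) -> str:
    """Return 'U+XXXX NAME' for each code point in seq, joined by ' + '."""
    return " + ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq)

# (5.0 sequence, 5.1 atomic character) pairs from the table rows
rows = [
    ("\u0D23\u0D4D\u200D", "\u0D7A"),  # CHILLU NN row
    ("\u0D28\u0D4D\u200D", "\u0D7B"),  # CHILLU N row
]
for old, new in rows:
    print(describe(old), "->", describe(new))
```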

Tim Starling | 1 Dec 02:39

Re: Unicode equivalence

Praveen Prakash wrote:
> Currently Windows 7 is the only operating system 
> which supports Unicode 5.1.0. (? according to my knowledge), but lot of 
> third-party tools for writing and reading Malayalam supports new 
> version. 

So do you want everything to be converted to the Unicode 5.0 version,
including page titles, namespaces and article content, and for Unicode
5.1 characters sent by browsers during editing to be converted to
Unicode 5.0 before storage? We can probably set that up.

-- Tim Starling
Praveen Prakash | 1 Dec 02:53

Re: Unicode equivalence

Is it possible to implement some way to tell the server that both characters
are the same? I have heard that more changes are coming in future versions of
Unicode, and almost half of the incoming data is now in the Unicode 5.1
form. I am not sure about converting back to 5.0.

_______________________________________________
Wikitech-l mailing list
Wikitech-l <at> lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Gerard Meijssen | 1 Dec 03:06

Re: Unicode equivalence

Hi,
Given that we should be moving forward, not backward, it makes more sense to
provide Unicode 5.1 characters and webfonts.

The big thing about MediaWiki was that it supported Unicode when this was still
a new thing to do. We should offer the latest and best Unicode
support.

NB: this problem is not limited to Malayalam. Another
script that was updated was Devanagari, used for Hindi, for instance. We
have a request for font support for the Ge'ez script, used for
Amharic and a few other languages.
Thanks,
     GerardM


Tim Starling | 1 Dec 03:25

Re: Unicode equivalence

Praveen Prakash wrote:
> Is it possible to implement some method to tell server both characters 
> are same?? 

No.

> It is heard that more changes coming in future versions of 
> unicode. And now almost half of the data coming is in unicode 5.1 
> version. I am not sure about reverse converting.

That link you gave in your last post has a conversion table; it looks
pretty straightforward:

CHILLU NN -> NNA, VIRAMA, ZWJ
CHILLU N -> NA, VIRAMA, ZWJ
CHILLU RR -> RA, VIRAMA, ZWJ
CHILLU L -> LA, VIRAMA, ZWJ
CHILLU LL -> LLA, VIRAMA, ZWJ

The other new characters would remain unconverted.
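Each row of this table replaces one atomic chillu with a three-code-point sequence, so the conversion is a plain character substitution. A minimal Python sketch of it, assuming the five rows above are the whole mapping (the name `to_unicode50` is hypothetical; the code points are the ones Unicode 5.1 assigns to these letters). Any character not in the table, including the other characters new in 5.1, passes through unchanged, as described above.

```python
VIRAMA = "\u0D4D"
ZWJ = "\u200D"

# Unicode 5.1 atomic chillu -> Unicode 5.0 base consonant (per the table above)
CHILLU_TO_BASE = {
    "\u0D7A": "\u0D23",  # CHILLU NN -> NNA
    "\u0D7B": "\u0D28",  # CHILLU N  -> NA
    "\u0D7C": "\u0D30",  # CHILLU RR -> RA
    "\u0D7D": "\u0D32",  # CHILLU L  -> LA
    "\u0D7E": "\u0D33",  # CHILLU LL -> LLA
}

# str.translate accepts a dict mapping a code point (int) to a replacement string
_TO_50 = {ord(ch): base + VIRAMA + ZWJ for ch, base in CHILLU_TO_BASE.items()}

def to_unicode50(text: str) -> str:
    """Rewrite 5.1 chillu characters as 5.0 sequences; leave everything else as-is."""
    return text.translate(_TO_50)

# A sample word with CHILLU RR: 6 code points in 5.1 form, 8 after conversion
print(len(to_unicode50("\u0D35\u0D7C\u0D17\u0D4D\u0D17\u0D02")))  # 8
```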

Gerard Meijssen wrote:
> Hoi,
> Given that we should be moving forward not backward, it makes more sense to
> provide Unicode 5.1 characters and webfonts.
> 
> The big thing of MediaWiki was that it supported Unicode when this was still
> a new thing to do. We should support the latest and the best Unicode
> support.


William Pietri | 1 Dec 03:31

Re: Unicode equivalence

Gerard Meijssen wrote:
> Hoi,
> Given that we should be moving forward not backward, it makes more sense to
> provide Unicode 5.1 characters and webfonts.

If you'll indulge my curiosity for a moment, how well is this dealt with 
on clients? Presumably webfonts would solve the display issue, but I'm 
wondering about things like copy-pasting, bookmarking, feed readers, and 
the like.

Naively, I'd expect that whatever we ended up storing internally, the 
Robustness Principle would suggest we'd accept either sort of character, 
but emit the older one. But since my work with non-roman character sets 
is modest, naiveté is all I have.
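William's suggestion (accept either character, emit the older one) can be sketched for the linking and searching problem from the first post: fold every incoming title to a single key before lookup. This Python is an illustration only; `title_key` and `TitleIndex` are hypothetical names, not MediaWiki code, and using the 5.0 form as the stored key is an assumption that matches "emit the older one".

```python
from typing import Dict, List, Optional

VIRAMA, ZWJ = "\u0D4D", "\u200D"

# 5.1 chillu -> 5.0 sequence, following the conversion table in this thread
_FOLD = {
    ord(chillu): base + VIRAMA + ZWJ
    for chillu, base in (
        ("\u0D7A", "\u0D23"),  # CHILLU NN -> NNA
        ("\u0D7B", "\u0D28"),  # CHILLU N  -> NA
        ("\u0D7C", "\u0D30"),  # CHILLU RR -> RA
        ("\u0D7D", "\u0D32"),  # CHILLU L  -> LA
        ("\u0D7E", "\u0D33"),  # CHILLU LL -> LLA
    )
}

def title_key(title: str) -> str:
    """Fold either encoding to the 5.0 form, so both spellings map to one key."""
    return title.translate(_FOLD)

class TitleIndex:
    """Hypothetical page index: accepts titles in either form, emits the 5.0 form."""

    def __init__(self) -> None:
        self._pages: Dict[str, str] = {}

    def add(self, title: str, text: str) -> None:
        self._pages[title_key(title)] = text

    def lookup(self, title: str) -> Optional[str]:
        return self._pages.get(title_key(title))

    def titles(self) -> List[str]:
        return list(self._pages)  # stored keys are already in 5.0 form

# Usage: a page saved under the 5.0 spelling is found via the 5.1 spelling.
index = TitleIndex()
index.add("\u0D35\u0D30\u0D4D\u200D\u0D17\u0D4D\u0D17\u0D02", "page text")
print(index.lookup("\u0D35\u0D7C\u0D17\u0D4D\u0D17\u0D02"))  # page text
```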

William
Praveen Prakash | 1 Dec 04:49

Re: Unicode equivalence

On Tue, Dec 1, 2009 at 7:55 AM, Tim Starling <tstarling <at> wikimedia.org> wrote:

Helder Geovane | 1 Dec 12:16

Inconsistent revision history after importing

Hello!

After I imported a template from pt.wikipedia to pt.wikibooks, the revision
history seems to be inconsistent. The import is logged at
http://pt.wikibooks.org/w/index.php?title=Especial%3ARegisto&type=import&user=&page=Predefini%C3%A7%C3%A3o%3APeqind&year=&month=-1&uselang=en
and according to the history at
http://pt.wikibooks.org/w/index.php?title=Predefini%C3%A7%C3%A3o:Peqind&action=history&uselang=en
the last 3 edits are:
----
# (cur) (prev)  2009-12-01T08:48:52 Heldergeovane (Talk | contribs) m (324
bytes) (19 edições de w:Predefinição:Peqind: importando o histórico de
contribuições) (undo)
# (cur) (prev) 2009-11-08T19:17:22 Berganus (Talk | contribs) (324 bytes)
(Criou nova página com '__NOTOC__ {| border="0" id="toc" style="margin: 0
auto;" align=center | '''Índice: ''' A B C D E F G H I [[#J...') (undo)
# (cur) (prev) 2009-07-25T12:30:47 Capmo (Talk | contribs) (673 bytes)
(adicionando {{PAGENAME}}) (undo)
----

When I select the edits from 2009-07-25 and 2009-11-08 and compare them, I get
this diff:
http://pt.wikibooks.org/w/index.php?title=Predefini%C3%A7%C3%A3o%3APeqind&action=historysubmit&diff=142330&oldid=144851&uselang=en
which shows "(18 intermediate revisions not shown)". Moreover, clicking
"Newer edit →" takes me to an edit from 2004:
http://pt.wikibooks.org/w/index.php?title=Predefini%C3%A7%C3%A3o:Peqind&diff=next&oldid=142330&uselang=en

What is wrong?

Helder

Marcus Buck | 1 Dec 12:40

Re: Unicode equivalence

Tim Starling wrote:
> Gerard Meijssen wrote:
>   
>> Hoi,
>> Given that we should be moving forward not backward, it makes more sense to
>> provide Unicode 5.1 characters and webfonts.
>>
>> The big thing of MediaWiki was that it supported Unicode when this was still
>> a new thing to do. We should support the latest and the best Unicode
>> support.
>>     
>
> You did read the post didn't you? Forcing everyone to buy Windows 7 is
> not generally the way we do things. Unless the client situation is not
> as bad as it sounds, we will need to have a transition period where we
> support older clients until their market share falls far lower than
> 50%, which is where, by Praveen's figures, it is now.
>   
I guess you are both right. To me the best solution seems to be: accept
both forms as input (obviously), normalize everything to 5.1, and store it in
that form (so our data is consistently 5.1). For output, convert it to
5.0 to avoid problems with clients not yet ready for 5.1. The advantage
is that our data is stored in the most modern format, while the
clients are still served data that they can process.
If there are performance problems with the conversion when serving, or
anything like that, of course storing the data in 5.0 is still good
enough. More important than discussing the specific technical details is
actually implementing it.
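Marcus's scheme needs conversion in both directions: 5.0 sequences to atomic chillus on input, and the reverse on output. A minimal Python sketch, assuming Tim's five-row table is the complete mapping and that all other text passes through untouched; `to_storage` and `to_output` are hypothetical names, not existing MediaWiki functions.

```python
VIRAMA, ZWJ = "\u0D4D", "\u200D"

# Base consonant -> atomic chillu (Unicode 5.1), following Tim's table
BASE_TO_CHILLU = {
    "\u0D23": "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28": "\u0D7B",  # NA  -> CHILLU N
    "\u0D30": "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32": "\u0D7D",  # LA  -> CHILLU L
    "\u0D33": "\u0D7E",  # LLA -> CHILLU LL
}
_TO_50 = {ord(chillu): base + VIRAMA + ZWJ for base, chillu in BASE_TO_CHILLU.items()}

def to_storage(text: str) -> str:
    """Normalize input in either encoding to 5.1 atomic chillus before saving."""
    for base, chillu in BASE_TO_CHILLU.items():
        text = text.replace(base + VIRAMA + ZWJ, chillu)
    return text

def to_output(stored: str) -> str:
    """Convert stored 5.1 text to 5.0 sequences when serving older clients."""
    return stored.translate(_TO_50)

old_form = "\u0D35\u0D30\u0D4D\u200D\u0D17\u0D4D\u0D17\u0D02"
new_form = "\u0D35\u0D7C\u0D17\u0D4D\u0D17\u0D02"
stored = to_storage(old_form)
print(stored == new_form, to_output(stored) == old_form)  # True True
```

In this design `to_output` runs on every page view, which is the serving cost Roan weighs in the reply that follows.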

Marcus Buck

Roan Kattouw | 1 Dec 14:03

Re: Unicode equivalence

2009/12/1 Marcus Buck <wiki <at> marcusbuck.org>:
> If there are performance problems with the conversion on serving or
> anything like that, of course storing the data in 5.0 is still good
> enough.
You're answering your own question here: converting the data once and
storing it in 5.0 so it's ready-to-serve is of course faster and
easier than juggling between 5.0 and 5.1 all the time.

Roan Kattouw (Catrope)
