20 May 2013 07:55
[Bug 48630] New: Data model needs characters, not code points
<bugzilla-daemon <at> wikimedia.org>
2013-05-20 05:55:47 GMT
https://bugzilla.wikimedia.org/show_bug.cgi?id=48630

       Web browser: ---
            Bug ID: 48630
           Summary: Data model needs characters, not code points
           Product: VisualEditor
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Unprioritized
         Component: Data Model
          Assignee: esanders <at> wikimedia.org
          Reporter: david <at> sheetmusic.org.uk
                CC: jforrester <at> wikimedia.org, roan.kattouw <at> gmail.com, tparscal <at> wikimedia.org
    Classification: Unclassified
   Mobile Platform: ---

At present, the VisualEditor treats UTF-16 code units as if they were synonymous with abstract characters. Here are two cases where this causes bugs:

1) UTF-16 uses a surrogate pair to represent each Unicode character above U+FFFF. For instance, U+282E2 ('elevator' in Cantonese) is a single character, represented in JavaScript as "\uD860\uDEE2". In a plain textarea, this behaves like a single character from the user's point of view. In the VisualEditor, however, cursoring and backspacing require two presses, and after cursoring once, any text typed goes in the middle of the surrogate pair, creating invalid UTF-16. (see The Unicode Standard, Version 6.2, Section 3.8, …
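The surrogate-pair arithmetic behind case 1, and a cursor step that never stops between the two halves of a pair, can be sketched as follows. This is a minimal illustration, not VisualEditor's data model code; the function names `toSurrogatePair`, `nextOffset` and `prevOffset` are hypothetical. It only addresses surrogate pairs (code points above U+FFFF) and does not attempt full grapheme-cluster segmentation.

```typescript
// Hypothetical helpers (not VisualEditor code): surrogate-pair arithmetic
// and cursor steps that treat a surrogate pair as one unit.

/** Split a code point in U+10000..U+10FFFF into its UTF-16 surrogate pair. */
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point must be in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000; // 20-bit value
  const high = 0xd800 + (offset >> 10); // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return [high, low];
}

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

/** Code-unit offset after moving the cursor one code point to the right. */
function nextOffset(text: string, offset: number): number {
  if (offset >= text.length) return text.length;
  if (
    isHighSurrogate(text.charCodeAt(offset)) &&
    offset + 1 < text.length &&
    isLowSurrogate(text.charCodeAt(offset + 1))
  ) {
    return offset + 2; // step over the whole pair
  }
  return offset + 1;
}

/** Code-unit offset after moving one code point to the left (e.g. backspace). */
function prevOffset(text: string, offset: number): number {
  if (offset <= 0) return 0;
  if (
    offset >= 2 &&
    isLowSurrogate(text.charCodeAt(offset - 1)) &&
    isHighSurrogate(text.charCodeAt(offset - 2))
  ) {
    return offset - 2; // step back over the whole pair
  }
  return offset - 1;
}

const units = toSurrogatePair(0x282e2);
const elevator = String.fromCharCode(units[0], units[1]);
console.log(units.map((u) => u.toString(16)).join(" ")); // d860 dee2
console.log(elevator.length); // 2 (UTF-16 code units, one character)
console.log(nextOffset(elevator, 0)); // 2: one keypress crosses the character

// A naive code-unit cursor moves to offset 1, so typed text splits the pair:
const naive = elevator.slice(0, 1) + "x" + elevator.slice(1);
console.log(!isLowSurrogate(naive.charCodeAt(1))); // true: lone high surrogate
```

Keeping offsets in code units while stepping over whole pairs stays compatible with ordinary JavaScript string indexing; only cursor movement and deletion need to consult the surrogate ranges.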