[OSM-dev] Re: Re: [OSM-talk] planet.dump
Erik Johansson <erjohan <at> gmail.com>
2006-08-03 08:05:34 GMT
On 8/3/06, Jonas Svensson <jonass <at> lysator.liu.se> wrote:
> On Wed, 2 Aug 2006, Raphael Jacquot wrote:
> > Jonas Svensson wrote:
> > > Has there been any discussion on how to handle international names and
> > > character encodings? Also things like writing direction (left-to-right,
> > > right-to-left and others)?
> > >
> > > I notice that the MapFeatures-page mentions International name, local name
> > > and regional name so there must have been some thinking on this subject.
> > for starters, the whole thing should be UTF-8
> Yes, wouldn't it be good to change the API to require strings (like names)
> to be UTF-8 when sent to the server/database? If possible also change the
> server to validate strings to be valid UTF-8.
Valid UTF-8 isn't enough. Some time ago someone complained about the
encoding in planet.osm: they gave the example of "Älvsjövägen" and
said it looked horrible in the dump, because it was UTF-8 encoded
with "&"-entities.
That was perfectly valid UTF-8, but perhaps not the thing you want to
have in the DB. And I don't see how you can make sure applications
handle that correctly, because someone will always write a small
one-line script and make a mess.
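To illustrate the difference, here is a rough Python sketch. It assumes the
"&"-entities were HTML/XML-style numeric references such as "&#196;"; I don't
know the exact form that was in the dump. The helper names is_valid_utf8 and
has_entities are made up for this example, and nothing here is part of the
OSM API.

```python
import html


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def has_entities(text: str) -> bool:
    """Return True if unescaping entity references changes the text.

    Heuristic only: a name that legitimately contains something that
    looks like an entity would also be flagged.
    """
    return html.unescape(text) != text


plain = "Älvsjövägen"
# Assumed form of the entity-encoded name (hypothetical, see above).
escaped = "&#196;lvsj&#246;v&#228;gen"

# The escaped form is pure ASCII, so it passes a UTF-8 validity check...
print(is_valid_utf8(escaped.encode("ascii")))
# ...but it is still not what you want stored as the name.
print(has_entities(escaped))
print(has_entities(plain))

# By contrast, Latin-1 bytes for the same name are rejected by the check.
print(is_valid_utf8(plain.encode("latin-1")))
```

So a server-side validity check would catch the Latin-1 case but let the
entity-encoded name straight through; that needs a separate check like
has_entities, or unescaping on input.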
1. only pass valid UTF-8 chars