Stefaan A Eeckels | 5 Oct 23:19 2006

Operator recognition problems with Turkish locale

Hi list,

When using McKoi with a Turkish locale, the internal queries that
create the views don't work, and the database doesn't start. 

The reason is that they are written in upper case, and McKoi's
Operator.get method converts the keywords to lower case to recognise
them.

The Turkish locale is particular in that the lower case of 'I'
isn't 'i', but rather the dotless 'ı' (U+0131), and as a result the 'IN',
'IS', 'IS NOT', 'NOT IN', 'LIKE' and 'NOT LIKE' operators are not
recognised.

The fix is to convert to upper case instead
(/com/mckoi/database/Operator.java line 348):

    // Operators that are words, convert to upper case...
    // Upper case to avoid the Turkish locale problem [sae]
    op = op.toUpperCase();
    if (op.equals("IS")) { return is_op; }
    else if (op.equals("IS NOT")) { return isn_op; }
    else if (op.equals("LIKE")) { return like_op; }
    else if (op.equals("NOT LIKE")) { return nlike_op; }
    else if (op.equals("REGEX")) { return regex_op; }

    else if (op.equals("IN")) { return in_op; }
    else if (op.equals("NOT IN")) { return nin_op; }

    else if (op.equals("NOT")) { return not_op; }
    else if (op.equals("AND")) { return and_op; }
    else if (op.equals("OR")) { return or_op; }

Stefaan A Eeckels | 6 Oct 00:33 2006

Re: Operator recognition problems with Turkish locale

On Thu, 5 Oct 2006 23:19:02 +0200
Stefaan A Eeckels <Stefaan.Eeckels <at> ecc.lu> wrote:

> The fix is to convert to upper case instead
> (/com/mckoi/database/Operator.java line 348):
> 
>     // Operators that are words, convert to upper case...
>     // Upper case to avoid the Turkish locale problem [sae]
>     op = op.toUpperCase();
>     if (op.equals("IS")) { return is_op; }
>     else if (op.equals("IS NOT")) { return isn_op; }
>     else if (op.equals("LIKE")) { return like_op; }
>     else if (op.equals("NOT LIKE")) { return nlike_op; }
>     else if (op.equals("REGEX")) { return regex_op; }
> 
>     else if (op.equals("IN")) { return in_op; }
>     else if (op.equals("NOT IN")) { return nin_op; }
> 
>     else if (op.equals("NOT")) { return not_op; }
>     else if (op.equals("AND")) { return and_op; }
>     else if (op.equals("OR")) { return or_op; }
> 

Actually, it's not that simple. There are other places where
toLowerCase is used (like the functions) which need to be modified as
well. 

I'll see if I can fix this properly (as we have a Turkish customer, it
is rather important).


Stefaan A Eeckels | 11 Oct 01:14 2006

Re: Fix for Operator recognition problems with Turkish locale

On Thu, 5 Oct 2006 23:19:02 +0200
Stefaan A Eeckels <Stefaan.Eeckels <at> ecc.lu> wrote:

> When using McKoi with a Turkish locale, the internal queries that
> create the views don't work, and the database doesn't start. 

I've looked into this more closely, and I now have a better analysis
plus a patch to fix it.

The problem with Turkish is that there are two types of "I", with and
without dot. The two lowercase letters are \u0069 (dotted "i") and
\u0131 (dotless lowercase "i"), and they are totally unrelated. Their
uppercase versions are \u0130 (dotted capital "I") and \u0049, the
dotless uppercase "I". When using toLowerCase(), the dotless "I"
becomes a dotless "i", which is not the same as in English; as a
result, uppercase keywords (and anything else that gets lowercased to
get rid of case sensitivity) become unrecognisable.

Unfortunately, using upper case doesn't solve the problem, because the
lowercase dotted "i" becomes the uppercase dotted "I", which again
causes comparisons to fail. 
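
To make both failure modes concrete, here is a small stand-alone
sketch. It is not part of McKoi; the class and method names are made
up for the illustration, and the Turkish locale is passed explicitly
so that it behaves the same on any machine.

```java
import java.util.Locale;

// Hypothetical demo class; not part of McKoi.
class TurkishCaseDemo {
    // Turkish locale, built from its BCP 47 language tag.
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    static String lowerTurkish(String s) { return s.toLowerCase(TURKISH); }
    static String upperTurkish(String s) { return s.toUpperCase(TURKISH); }

    public static void main(String[] args) {
        // "IN" lowercases to dotless i + "n", so it no longer equals "in".
        System.out.println(lowerTurkish("IN").equals("in"));       // false
        System.out.println(lowerTurkish("IN").equals("\u0131n"));  // true

        // "in" uppercases to dotted capital I + "N", so it no longer equals "IN".
        System.out.println(upperTurkish("in").equals("IN"));       // false
        System.out.println(upperTurkish("in").equals("\u0130N"));  // true
    }
}
```

Keywords without an "I" (such as "AND" or "OR") come through either
conversion unchanged, which is why only some operators break.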

The fix I applied to McKoi is to make sure that the conversions to
lower case (with the exception of the one instance used to implement
the "LOWER()" function) are called with the java.util.Locale.ENGLISH
parameter. All conversions to upper case, with the exception of the one
instance used to implement the "UPPER()" function, should also be
changed to use the java.util.Locale.ENGLISH parameter.
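
As a rough sketch of the idea (this is not the patch itself; the
helper class and its method names are hypothetical), keyword folding
with an explicit locale could look like this:

```java
import java.util.Locale;

// Hypothetical helper; the names are illustrative, not McKoi's.
final class KeywordCase {
    private KeywordCase() {}

    // Case-fold a keyword independently of the JVM's default locale.
    static String fold(String keyword) {
        return keyword.toLowerCase(Locale.ENGLISH);
    }

    // Case-insensitive keyword comparison built on the fold above.
    static boolean isKeyword(String token, String keyword) {
        return fold(token).equals(fold(keyword));
    }

    public static void main(String[] args) {
        // Simulate a JVM running with a Turkish default locale.
        Locale.setDefault(Locale.forLanguageTag("tr-TR"));

        System.out.println(fold("IN"));                       // in
        System.out.println(isKeyword("Like", "LIKE"));        // true
        // The parameterless call still follows the default locale:
        System.out.println("IN".toLowerCase().equals("in"));  // false
    }
}
```

Routing every internal conversion through one such helper also makes
it easier to spot the two calls (LOWER() and UPPER()) that are
deliberately left locale-sensitive.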

Obviously, if you want your application to run properly in the Turkish
[...]

