Martin v. Löwis | 1 Oct 2011 14:52

Re: [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array

>> Do we really need a new file? Why not pyport.h where other compiler stuff
>> goes?
> 
> I'm not sure that pyport.h is the right place to add Py_MIN, Py_MAX, 
> Py_ARRAY_LENGTH. pyport.h looks to be related to all things specific to the 
> platform like INT_MAX, Py_VA_COPY, ... pymacro.h contains platform independent 
> macros.

I'm -1 on additional header files as well. If no other reasonable place
is found, Python.h is still available.
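
For readers who have not seen the change under discussion: the following is a minimal sketch of how an array-length macro can be made to fail at build time when handed a pointer. The names MY_BUILD_ASSERT_EXPR, MY_ARRAY_LENGTH and sum_primes are made up for this illustration, and the check relies on GCC-compatible builtins, so it is an assumption about one possible approach rather than a copy of the committed code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: evaluates to 0 when cond is true; when cond is
   false the array size becomes negative and compilation fails. */
#define MY_BUILD_ASSERT_EXPR(cond) (sizeof(char[1 - 2 * !(cond)]) - 1)

#if defined(__GNUC__)
/* An array and a pointer to its first element have different types.
   For a pointer argument the two types coincide and the build fails. */
#define MY_ARRAY_LENGTH(array)                                  \
    (sizeof(array) / sizeof((array)[0])                         \
     + MY_BUILD_ASSERT_EXPR(!__builtin_types_compatible_p(      \
           __typeof__(array), __typeof__(&(array)[0]))))
#else
/* Fallback for other compilers: same result, but no build-time check. */
#define MY_ARRAY_LENGTH(array) (sizeof(array) / sizeof((array)[0]))
#endif

/* Sums a fixed table, using the macro as the loop bound. */
static int sum_primes(void)
{
    static const int primes[] = {2, 3, 5, 7, 11};
    int total = 0;

    assert(MY_ARRAY_LENGTH(primes) == 5);
    for (size_t i = 0; i < MY_ARRAY_LENGTH(primes); i++)
        total += primes[i];
    return total;
}
```

Passing an `int *` instead of `primes` to MY_ARRAY_LENGTH would be rejected by a GCC-compatible compiler, which is the point of the check.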

Regards,
Martin

Stefan Krah | 1 Oct 2011 15:06

PEP-393: request for keeping PyUnicode_EncodeDecimal()

Hello,

the subject says it all. PyUnicode_EncodeDecimal() is listed among
the deprecated functions. In cdecimal, I'm relying on this function
for a number of reasons:

  * It is not trivial to implement.

  * With the Unicode implementation constantly changing, it is nearly
    impossible to know what input is currently regarded as a decimal
    digit. See also:

       http://bugs.python.org/issue10557
       http://bugs.python.org/issue10557#msg123123

         "The API won't go away (it does have its use and is being
          used in 3rd party extensions) [...]"

Stefan Krah

Martin v. Löwis | 1 Oct 2011 15:26

Re: [Python-checkins] cpython: Implement PEP 393.

On 29.09.2011 01:21, Eric V. Smith wrote:
> Is there some reason str.format had such major surgery done to it?

Yes: I couldn't figure out how to do it any other way. The formatting
code had a few basic assumptions which now break (unless you keep using
the legacy API). Primarily, the assumption is that there is a notion of
a "STRINGLIB_CHAR" which is the element of a string representation. With
PEP 393, no such type exists anymore - it depends on the individual
object what the element type for the representation is.

In other cases, I worked around that by compiling the stringlib three
times, for Py_UCS1, Py_UCS2, and Py_UCS4. For one, this gives
considerable code bloat, which I didn't like for the formatting code
(as that is already a considerable amount of code). More importantly,
this approach wouldn't have worked well, anyway, since the formatting
combines multiple Unicode objects (especially with the OutputString
buffer), and different inputs may have different representations. On
top of that, OutputString needs widening support, starting out with
a narrow string, and widening step-by-step as input strings are wider
than the current output (or not, if the input strings are all
ASCII).
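
To illustrate the widening idea, here is a rough, self-contained sketch of a buffer that starts with one byte per character and moves to two or four bytes only when a wider character arrives. WideBuf, kind_for, wb_read, wb_write and wb_append are hypothetical names for this example; this is not the OutputString code, only an assumption about how such a buffer could behave.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical widening buffer; initialise as {NULL, 0, 0, 1}. */
typedef struct {
    void *data;   /* elements of `kind` bytes each */
    size_t len;   /* number of characters stored */
    size_t cap;   /* capacity in characters */
    int kind;     /* 1, 2 or 4 bytes per character */
} WideBuf;

/* Smallest element width that can hold ch. */
static int kind_for(uint32_t ch)
{
    return ch < 0x100 ? 1 : ch < 0x10000 ? 2 : 4;
}

static uint32_t wb_read(const WideBuf *b, size_t i)
{
    switch (b->kind) {
    case 1:  return ((const uint8_t *)b->data)[i];
    case 2:  return ((const uint16_t *)b->data)[i];
    default: return ((const uint32_t *)b->data)[i];
    }
}

static void wb_write(void *data, int kind, size_t i, uint32_t ch)
{
    switch (kind) {
    case 1:  ((uint8_t *)data)[i] = (uint8_t)ch; break;
    case 2:  ((uint16_t *)data)[i] = (uint16_t)ch; break;
    default: ((uint32_t *)data)[i] = ch; break;
    }
}

/* Append one code point, widening or growing the storage if needed.
   Returns 0 on success, -1 on allocation failure. */
static int wb_append(WideBuf *b, uint32_t ch)
{
    assert(b->kind == 1 || b->kind == 2 || b->kind == 4);
    int kind = kind_for(ch);
    size_t cap = b->cap;

    if (kind < b->kind)
        kind = b->kind;              /* never narrow again */
    if (b->len == cap)
        cap = cap ? cap * 2 : 8;     /* grow when full */
    if (kind != b->kind || cap != b->cap) {
        void *fresh = malloc(cap * (size_t)kind);
        if (fresh == NULL)
            return -1;
        /* Copy element by element, converting to the new width. */
        for (size_t i = 0; i < b->len; i++)
            wb_write(fresh, kind, i, wb_read(b, i));
        free(b->data);
        b->data = fresh;
        b->kind = kind;
        b->cap = cap;
    }
    wb_write(b->data, b->kind, b->len++, ch);
    return 0;
}
```

An all-ASCII sequence of appends never leaves the one-byte representation, which is the case the paragraph above wants to keep cheap.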

It would have been possible to keep the basic structure by doing
all formatting in Py_UCS4. This would cost a significant memory and
runtime overhead.

> In addition, there are outstanding patches that are now broken.

I'm sorry about that. Try applying them to the new files, though - patch
may still be able to figure out how to integrate them, as the
[...]

Martin v. Löwis | 1 Oct 2011 16:14

Re: PEP-393: request for keeping PyUnicode_EncodeDecimal()

> the subject says it all. PyUnicode_EncodeDecimal() is listed among
> the deprecated functions.

Please see the section on deprecation. None of the deprecated functions
will be removed for a period of five years, and afterwards, they will
be kept until usage outside of the core is low. Most likely, this means
they will be kept until Python 4.

>   * It is not trivial to implement.
> 
>   * With the Unicode implementation constantly changing, it is nearly
>     impossible to know what input is currently regarded as a decimal
>     digit. See also:

I still recommend that you come up with your own implementation of that
algorithm. You probably don't need any of the error handler support,
which makes the largest portion of the code. Then, use
Py_UNICODE_TODECIMAL to process individual characters. It's a simple
loop over every character.
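
As a rough illustration of that loop, here is a self-contained sketch that does not depend on Python.h. The function to_decimal is a hypothetical stand-in for Py_UNICODE_TODECIMAL and knows only three digit ranges, whereas the real macro consults the Unicode database; encode_decimal is likewise an invented name. Error handler support is left out, as suggested above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for Py_UNICODE_TODECIMAL (assumption for this sketch):
   returns 0..9 for a decimal digit, -1 otherwise. Only three blocks
   are covered here for illustration. */
static int to_decimal(uint32_t ch)
{
    if (ch >= '0' && ch <= '9')
        return (int)(ch - '0');
    if (ch >= 0x0660 && ch <= 0x0669)      /* Arabic-Indic digits */
        return (int)(ch - 0x0660);
    if (ch >= 0xFF10 && ch <= 0xFF19)      /* fullwidth digits */
        return (int)(ch - 0xFF10);
    return -1;
}

/* Returns a malloc'ed, NUL-terminated ASCII string, or NULL if the
   input contains a non-ASCII character that is not a decimal digit,
   or if allocation fails. The caller frees the result. */
static char *encode_decimal(const uint32_t *in, size_t len)
{
    assert(in != NULL || len == 0);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        int d = to_decimal(in[i]);
        if (d != -1)
            out[i] = (char)('0' + d);      /* any digit becomes ASCII */
        else if (in[i] < 128)
            out[i] = (char)in[i];          /* other ASCII passes through */
        else {
            free(out);
            return NULL;
        }
    }
    out[len] = '\0';
    return out;
}
```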

In addition, you could also take the same approach as decimal.py,
i.e. do

   self._int = str(int(intpart+fracpart))

This would improve compatibility with the decimal.py implementation,
which doesn't use PyUnicode_EncodeDecimal either (but instead goes
through _PyUnicode_TransformDecimalAndSpaceToASCII).

Regards,
[...]

Stefan Krah | 1 Oct 2011 16:58

Re: PEP-393: request for keeping PyUnicode_EncodeDecimal()

"Martin v. Löwis" <martin <at> v.loewis.de> wrote:
> > the subject says it all. PyUnicode_EncodeDecimal() is listed among
> > the deprecated functions.
> 
> Please see the section on deprecation. None of the deprecated functions
> will be removed for a period of five years, and afterwards, they will
> be kept until usage outside of the core is low. Most likely, this means
> they will be kept until Python 4.

I've to confess that I missed that; sounds good.

> In addition, you could also take the same approach as decimal.py,
> i.e. do
> 
>    self._int = str(int(intpart+fracpart))
> 
> This would improve compatibility with the decimal.py implementation,
> which doesn't use PyUnicode_EncodeDecimal either (but instead goes
> through _PyUnicode_TransformDecimalAndSpaceToASCII).

longobject.c still used PyUnicode_EncodeDecimal() until 10 months
ago (8304bd765bcf). I missed the PyUnicode_TransformDecimalToASCII()
commit, probably because #10557 is still open.

That's why I wouldn't like to implement the function myself at least
until the API is settled.

I see this in the new code:

#if 0
[...]

Antoine Pitrou | 1 Oct 2011 17:18

Re: cpython: Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros

On Sat, 01 Oct 2011 16:53:44 +0200
victor.stinner <python-checkins <at> python.org> wrote:
> http://hg.python.org/cpython/rev/4afab01f5374
> changeset:   72565:4afab01f5374
> user:        Victor Stinner <victor.stinner <at> haypocalc.com>
> date:        Sat Oct 01 16:48:13 2011 +0200
> summary:
>   Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros
> 
>  * Rename existing _PyUnicode_UTF8() macro to PyUnicode_UTF8()

Wouldn't this be better called PyUnicode_AS_UTF8()?

Martin v. Löwis | 1 Oct 2011 17:40

Re: PEP-393: request for keeping PyUnicode_EncodeDecimal()

> longobject.c still used PyUnicode_EncodeDecimal() until 10 months
> ago (8304bd765bcf). I missed the PyUnicode_TransformDecimalToASCII()
> commit, probably because #10557 is still open.
> 
> That's why I wouldn't like to implement the function myself at least
> until the API is settled.

I don't understand. If you implement it yourself, you don't have to
worry at all what the API is. Py_UNICODE_TODECIMAL has been around
for a long time, and will stay, no matter how number parsing is
implemented. That's all you need.

   out = malloc(PyUnicode_GET_LENGTH(in)+1);
   for (i = 0; i < PyUnicode_GET_LENGTH(in); i++) {
       Py_UCS4 ch = PyUnicode_READ_CHAR(in, i);
       int d = Py_UNICODE_TODECIMAL(ch);
       if (d != -1) {
          out[i] = '0'+d;
          continue;
       }
       if (ch < 128)
          out[i] = ch;
       else {
          error();
          return;
       }
   }
   out[i] = '\0';

OTOH, *if* number parsing is ever updated (e.g. to consider alternative
[...]

Martin v. Löwis | 1 Oct 2011 17:47

Re: cpython: Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros

On 01.10.2011 17:18, Antoine Pitrou wrote:
> On Sat, 01 Oct 2011 16:53:44 +0200
> victor.stinner <python-checkins <at> python.org> wrote:
>> http://hg.python.org/cpython/rev/4afab01f5374
>> changeset:   72565:4afab01f5374
>> user:        Victor Stinner <victor.stinner <at> haypocalc.com>
>> date:        Sat Oct 01 16:48:13 2011 +0200
>> summary:
>>   Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros
>>
>>  * Rename existing _PyUnicode_UTF8() macro to PyUnicode_UTF8()
> 
> Wouldn't this be better called PyUnicode_AS_UTF8()?

No. _AS_UTF8 would imply that some conversion function is called.
In this case, it's a pure structure accessor macro, that may give
NULL if the pointer is not yet filled out.

It's not called Py_AS_TYPE, but Py_TYPE; likewise not
PyWeakref_AS_OBJECT, but PyWeakref_GET_OBJECT. In this case,
PyUnicode_GET_UTF8 might have been an alternative.
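
The distinction between the two kinds of name can be sketched with a toy type. Str, STR_UTF8 and Str_AsUTF8 are hypothetical names invented for this example and the "conversion" is a plain copy, so this only illustrates the naming argument and is not the PEP 393 structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string object with a lazily filled cache (assumption: the
   canonical text is ASCII, so the conversion is just a copy). */
typedef struct {
    const char *text;   /* canonical representation */
    char *utf8;         /* cache; NULL until first conversion */
} Str;

/* Pure structure accessor: does no work and may yield NULL. */
#define STR_UTF8(s) ((s)->utf8)

/* Conversion function: fills the cache on demand.
   Returns NULL only if allocation fails. */
static const char *Str_AsUTF8(Str *s)
{
    assert(s != NULL && s->text != NULL);
    if (s->utf8 == NULL) {
        size_t n = strlen(s->text) + 1;
        s->utf8 = malloc(n);
        if (s->utf8 == NULL)
            return NULL;
        memcpy(s->utf8, s->text, n);
    }
    return s->utf8;
}
```

A caller of the accessor has to be prepared for NULL, while a caller of the conversion function only sees NULL on failure.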

Regards,
Martin

Antoine Pitrou | 1 Oct 2011 17:48

Re: cpython: Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros

On Sat, 01 Oct 2011 17:47:26 +0200
"Martin v. Löwis" <martin <at> v.loewis.de> wrote:
> On 01.10.2011 17:18, Antoine Pitrou wrote:
> > On Sat, 01 Oct 2011 16:53:44 +0200
> > victor.stinner <python-checkins <at> python.org> wrote:
> >> http://hg.python.org/cpython/rev/4afab01f5374
> >> changeset:   72565:4afab01f5374
> >> user:        Victor Stinner <victor.stinner <at> haypocalc.com>
> >> date:        Sat Oct 01 16:48:13 2011 +0200
> >> summary:
> >>   Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros
> >>
> >>  * Rename existing _PyUnicode_UTF8() macro to PyUnicode_UTF8()
> > 
> > Wouldn't this be better called PyUnicode_AS_UTF8()?
> 
> No. _AS_UTF8 would imply that some conversion function is called.

PyBytes_AS_STRING doesn't call any conversion function, and neither did
PyUnicode_AS_UNICODE.

Martin v. Löwis | 1 Oct 2011 17:52

Re: What it takes to change a single keyword.

> First of all, I am sincerely sorry if this is the wrong mailing list to ask
> this question. I checked out the definitions of a couple of other mailing lists,
> and this one seemed most suitable. Here is my question:

In principle, python-list would be more appropriate, but this really
is a border case. So welcome!

> Let's say I want to change a single keyword, let's say the import keyword,
> to be spelled as something else, like its translation to my language. I
> guess it would be more complicated than modifying Grammar/Grammar, but
> I can't be sure which files should get edited.

Hmm. I also think editing Grammar/Grammar should be sufficient. Try
restricting yourself to ASCII keywords first; this just worked fine for
me.

Of course, if you change a single keyword, none of the existing Python
code will work anymore. See for yourself by changing 'def' to 'fed' (say).

Regards,
Martin
