Mark Hammond | 1 May 2009 02:20

Re: Proposed: add support for UNC paths to all functions in ntpath

Larry Hastings wrote:
> 
> 
> Counting the votes for http://bugs.python.org/issue5799 :
> 
>    +1 from Mark Hammond (via private mail)
>    +1 from Paul Moore (via the tracker)
>    +1 from Tim Golden (in Python-ideas, though what he literally said
>    was "I'm up for it")
>    +1 from Michael Foord
>    +1 from Eric Smith
> 
> There have been no other votes.
> 
> Is that enough consensus for it to go in?  If so, are there any core 
> developers who could help me get it in before the 3.1 feature freeze?  
> The patch should be in good shape; it has unit tests and updated 
> documentation.

I've taken the liberty of explicitly CCing Martin just in case he missed 
the thread with all the noise regarding PEP383.

If there are no objections from Martin or anyone else here, please feel 
free to assign it to me (and mail if I haven't taken action by the day 
before the beta freeze...)

Cheers,

Mark

Steven D'Aprano | 1 May 2009 04:40

Re: PEP 383: Non-decodable Bytes in System Character Interfaces

On Fri, 1 May 2009 06:55:48 am Thomas Breuel wrote:

> You can get the same error on Linux:
>
> $ python
> Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
> [GCC 4.3.3] on linux2
> Type "help", "copyright", "credits" or "license" for more
> information.
>
> >>> f=open(chr(255),'w')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'

Works for me under Fedora using ext3 as the file system.

$ python2.6
Python 2.6.1 (r261:67515, Dec 24 2008, 00:33:13)
[GCC 4.1.2 20070502 (Red Hat 4.1.2-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f=open(chr(255),'w')
>>> f.close()
>>> import os
>>> os.remove(chr(255))
>>>

Given that chr(255) is a valid filename on my file system, I would 
consider it a bug if Python couldn't deal with a file with that name.

Ronald Oussoren | 1 May 2009 07:41

Re: PEP 383: Non-decodable Bytes in System Character Interfaces


On 30 Apr, 2009, at 21:33, Piet van Oostrum wrote:

>>>>>> Ronald Oussoren <ronaldoussoren <at> mac.com> (RO) wrote:
>
>> RO> For what it's worth, the OSX APIs seem to behave as follows:
>> RO> * If you create a file with a non-UTF8 name on a HFS+ filesystem the
>> RO> system automatically encodes the name.
>
>> RO> That is, open(chr(255), 'w') will silently create a file named '%FF'
>> RO> instead of the name you'd expect on a unix system.
>
> Not for me (I am using Python 2.6.2).
>
>>>> f = open(chr(255), 'w')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> IOError: [Errno 22] invalid mode ('w') or filename: '\xff'
>>>>

That's odd. Which version of OSX do you use?

ronald <at> Rivendell-2[0]$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.5.6
BuildVersion:	9G55

[~/testdir]

Zooko O'Whielacronx | 1 May 2009 07:44

Re: PEP 383 and GUI libraries

Folks:

My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
binary names from the filesystem and store them so that I can regenerate
the same byte string later, but it also requires that I *know* whether
what I got was a valid string in the expected encoding (which might be
utf-8) or whether it was not and I need to fall back to storing the
bytes.  So far, it looks like PEP 383 doesn't provide both of these
requirements, so I am going to have to continue working-around the
Python API even after PEP 383.  In fact, it might actually increase the
amount of working-around that I have to do.

If I understand correctly, .decode(encoding, 'strict') will not be
changed by PEP 383.  A new error handler is added, so .decode('utf-8',
'python-escape') performs the utf-8b decoding.  Am I right so far?
Therefore if I have a string of bytes, I can attempt to decode it with
'strict', and if that fails I can set the flag showing that it was not a
valid byte string in the expected encoding, and then I can invoke
.decode('utf-8', 'python-escape') on it.  So far, so good.

(Note that I never want to do .decode(expected_encoding,
'python-escape') -- if it wasn't a valid bytestring in the
expected_encoding, then I want to decode it with utf-8b, regardless of
what the expected encoding was.)
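
A minimal sketch of the strict-then-fallback decoding described above, written for Python 3. It assumes the error handler is spelled 'surrogateescape', which is the name the 'python-escape' handler carries in released versions of Python (3.1 and later); the function names decode_name and encode_name are hypothetical and do not come from Tahoe or from the PEP.

```python
def decode_name(raw, expected_encoding="utf-8"):
    """Return (text, failed_decode) for a filename given as bytes."""
    try:
        # First attempt: the name must be valid in the expected encoding.
        return raw.decode(expected_encoding, "strict"), False
    except UnicodeDecodeError:
        # Fallback: always utf-8 plus the escape handler (utf-8b),
        # regardless of what the expected encoding was.
        return raw.decode("utf-8", "surrogateescape"), True


def encode_name(text, failed_decode, target_encoding="utf-8"):
    """Inverse of decode_name: regenerate bytes from the stored pair."""
    if failed_decode:
        # The escaped form round-trips to exactly the original bytes.
        return text.encode("utf-8", "surrogateescape")
    return text.encode(target_encoding, "strict")


if __name__ == "__main__":
    for raw in (b"caf\xc3\xa9", b"\xff"):
        text, failed = decode_name(raw)
        print(repr(raw), failed, encode_name(text, failed) == raw)
```

The flag returned alongside the text plays the role of the failed_decode flag: it records whether the fallback path was taken, so the original bytes can be regenerated later.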

Anyway, I can use it like this:

class FName:
    def __init__(self, name, failed_decode=False):
        self.name = name

Martin v. Löwis | 1 May 2009 08:25

Re: Proposed: add support for UNC paths to all functions in ntpath

> I've taken the liberty of explicitly CCing Martin just in case he missed
> the thread with all the noise regarding PEP383.
> 
> If there are no objections from Martin

It's fine with me - I just won't have time to look into the details of
that change.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev <at> python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/python-python-dev%40m.gmane.org

Michael Foord | 1 May 2009 11:06

Re: PEP 383 and GUI libraries

Zooko O'Whielacronx wrote:
> [snip...]
> Would it be possible for Python unicode objects to have a flag
> indicating whether the 'python-escape' error handler was present?  That
> would serve the same purpose as my "failed_decode" flag above, and would
> basically allow me to use the Python APIs directly and make all this
> work-around code disappear.
>
> Failing that, I can't see any way to use the os.listdir() in its
> unicode-oriented mode to satisfy Tahoe's requirements.
>
> If you take the above code and then add the fact that you want to use
> the failed_decode flag when *encoding* the d argument to os.listdir(),
> then you get this code: [2].
>
> Oh, I just realized that I *could* use the PEP 383 os.listdir(), like
> this:
>
> def listdir(d):
>     fse = sys.getfilesystemencoding()
>     if fse == 'utf-8b':
>         fse = 'utf-8'
>     ns = []
>     for fn in os.listdir(d):
>         bytes = fn.encode(fse, 'python-escape')
>         try:
>             ns.append(FName(bytes.decode(fse, 'strict')))
>         except UnicodeDecodeError:
>             ns.append(FName(bytes.decode('utf-8', 'python-escape'),
>                       failed_decode=True))

R. David Murray | 1 May 2009 13:13

Re: PEP 383 and GUI libraries

On Thu, 30 Apr 2009 at 23:44, Zooko O'Whielacronx wrote:
> Would it be possible for Python unicode objects to have a flag
> indicating whether the 'python-escape' error handler was present?  That

Unless I'm misunderstanding something, couldn't you implement what you
need by looking in a given string for the half surrogates?  If you find
one, the string was modified by python-escape; if you don't, it wasn't.

What does Tahoe do on Windows when it gets a filename that is not valid
Unicode?  You might not even have to conditionalize the above code
on platform (ie: instead you have a generalized is_valid_unicode test
function that you always use).
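
The half-surrogate check suggested above could look roughly like this in Python 3. It is only a sketch: it relies on PEP 383 mapping each undecodable byte 0x80..0xFF to a lone surrogate in the range U+DC80..U+DCFF, it uses 'surrogateescape' (the released name of the 'python-escape' handler) in the demonstration, and the function name has_escaped_bytes is hypothetical.

```python
def has_escaped_bytes(name):
    """True if name contains a lone surrogate in U+DC80..U+DCFF.

    Those are the code points the escape handler produces for bytes
    that could not be decoded, so their presence marks a modified name.
    """
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


if __name__ == "__main__":
    clean = b"caf\xc3\xa9".decode("utf-8", "surrogateescape")
    dirty = b"a\xff".decode("utf-8", "surrogateescape")
    print(has_escaped_bytes(clean), has_escaped_bytes(dirty))
```

A check like this needs no extra flag on the string object; the information is carried by the string's contents.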

--David

Martin v. Löwis | 1 May 2009 17:16

Deferring PEP 382

During Guido's review, we discovered that PEP 382 doesn't
deal with PEP 302 loaders; I believe that it should, though.

Rather than coming up with an ad-hoc design, I propose to
defer the PEP to Python 3.2 - unless somebody can propose
a straightforward design with not too many new interfaces.

FWIW, my own approach would be to add two new interfaces to
loaders:
1. extend the package path according to .pth files available
   to the loader (alternatively, provide the contents of the
   .pth files of the package in question)
2. search for and execute a package initialization module.

Regards,
Martin

Stephen J. Turnbull | 1 May 2009 17:36

Re: PEP 383: Non-decodable Bytes in System Character Interfaces

James Y Knight writes:

 > in python. It seems like the most common reason why people want to use  
 > SJIS is to make old pre-unicode apps work right in WINE -- in which  
 > case it doesn't actually affect unix python at all.

Mounting external drives, especially USB memory sticks which tend to
be FAT-initialized by the manufacturers, is another common case.

But I don't understand why PEP 383 needs to care at all.

Zooko O'Whielacronx | 1 May 2009 17:31

Re: PEP 383 and GUI libraries

Following-up to my own post to correct a major error:

On Thu, Apr 30, 2009 at 11:44 PM, Zooko O'Whielacronx <zookog <at> gmail.com> wrote:
> Folks:
>
> My use case (Tahoe-LAFS [1]) requires that I am *able* to read arbitrary
> binary names from the filesystem and store them so that I can regenerate
> the same byte string later, but it also requires that I *know* whether
> what I got was a valid string in the expected encoding (which might be
> utf-8) or whether it was not and I need to fall back to storing the
> bytes.

Okay, I am wrong about this.  Having a flag to remember whether I had to
fall back to the utf-8b trick is one method to implement my requirement,
but my actual requirement is this:

Requirement: either the unicode string or the bytes are faithfully
transmitted from one system to another.

That is: if you read a filename from the filesystem, and transmit that
filename to another system and use it, then there are two cases:

Requirement 1: the byte string was valid in the encoding of source
system, in which case the unicode name is faithfully transmitted
(i.e. the bytes that finally land on the target system are the result of
sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding)).

Requirement 2: the byte string was not valid in the encoding of source
system, in which case the bytes are faithfully transmitted (i.e. the
bytes that finally land on the target system are the same as the bytes
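
The two cases, as far as they are stated above, can be sketched in Python 3 as follows. The function name transmit_name is hypothetical, and the sketch makes one assumption the message does not spell out: "valid in the encoding of the source system" is tested with a strict decode.

```python
def transmit_name(source_bytes, source_sys_encoding, target_sys_encoding):
    """Return the bytes that should land on the target system."""
    try:
        # Requirement 1: the name is valid on the source system, so the
        # unicode name is what gets transmitted and is then re-encoded.
        name = source_bytes.decode(source_sys_encoding, "strict")
    except UnicodeDecodeError:
        # Requirement 2: not valid, so the bytes themselves are
        # transmitted unchanged.
        return source_bytes
    return name.encode(target_sys_encoding)


if __name__ == "__main__":
    latin1_name = "caf\u00e9".encode("latin-1")
    print(transmit_name(latin1_name, "latin-1", "utf-8"))
    print(transmit_name(b"\xff", "utf-8", "latin-1"))
```

The sketch does not handle a name that is valid on the source system but cannot be represented in the target encoding; in that case the final encode raises UnicodeEncodeError.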

