Guido van Rossum | 4 Sep 21:10 2002

Re: Signal-resistant code (was: Two random and nearly unrelated ideas)


Tim Peters | 1 Sep 09:04 2002

RE: Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8

[Neil Schemenauer]
> ...
> For whatever reason, setting HAMBIAS to 1.0 seems to produce worse
> results.

It's remarkable.  Graham's scheme is pasted together out of all sorts of
things that shouldn't work <wink>, but this one seems the most mysterious.
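
For readers without Graham's essay to hand, here is a rough sketch of where a ham bias enters the per-word spam probability. It follows the description in "A Plan for Spam" rather than the GBayes.py source, so the function and argument names are hypothetical, and the 0.01/0.99 clamps and the 0.4 value for unseen words are assumptions taken from that essay.

```python
def word_spam_prob(spam_count, ham_count, nspam, nham, ham_bias=2.0):
    """Estimate P(spam | word) in the style of Graham's scheme.

    spam_count / ham_count: occurrences of the word in each corpus.
    nspam / nham: number of messages in each corpus.
    ham_bias: multiplier applied to the ham count (assumed default 2.0).
    """
    if spam_count == 0 and ham_count == 0:
        return 0.4  # assumed value for a word never seen in training

    # Biasing the ham count upward makes every word look more innocent.
    ham_ratio = min(1.0, ham_bias * ham_count / nham)
    spam_ratio = min(1.0, spam_count / nspam)
    prob = spam_ratio / (ham_ratio + spam_ratio)

    # Clamp so that no single word is ever treated as conclusive.
    return min(0.99, max(0.01, prob))


# A word seen equally often in both corpora:
biased = word_spam_prob(10, 10, 100, 100, ham_bias=2.0)
unbiased = word_spam_prob(10, 10, 100, 100, ham_bias=1.0)
```

With the bias at 2.0 an evenly distributed word leans toward ham; at 1.0 it is neutral, which is at least consistent with more messages tipping over into spam territory.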

It has a huge effect in my 5x5 c.l.py test grid.  Combining all unique msgs
identified as false negative or false positive across all 20 test runs,

At HAMBIAS = 1.0
    total false negatives goes down by a factor of 2 (337 -> 166)
    total false positives goes up by a factor of 7.6 (23 -> 174)

and some of the false positives are just amazing -- David Ascher announcing
a Python conference, Laura Creighton pontificating about the GPL, ... it's
hard to fathom!  One innocuous example:

"""
Hello,
        I love all these speed debates but if speed were our only concern we
would all be writing in assembly for all non internet based programs...!

        Thank you,
        Vincent A. Primavera

prob = 0.99918657946
prob('only') = 0.645419
prob('would') = 0.349237

Steve Holden | 1 Sep 11:14 2002

Re: tiny optimization in ceval mainloop

[Michael Hudson]
> "Steve Holden" <sholden <at> holdenweb.com> writes:
>
> > > A bunch of 0.5% improvements add up.  If there's not much cost in
> > > complexity, why not go for it?
> > >
> >
> > Yeah, right, we just need 200 of them and we're laughing. Computation in
> > infinitesimal time.
>
> Multiply up doesn't have the same ring to it, does it?
>
Indeed not. I try to keep my pedantry in control, but it escapes from time
to time.

regards
-----------------------------------------------------------------------
Steve Holden                                  http://www.holdenweb.com/
Python Web Programming                        pydish.holdenweb.com/pwp/
Previous .sig file retired to                    www.homeforoldsigs.com
-----------------------------------------------------------------------

Skip Montanaro | 1 Sep 14:00 2002

Weekly Python Bug/Patch Summary


Bug/Patch Summary
-----------------

282 open / 2810 total bugs (+7)
119 open / 1676 total patches (+10)

New Bugs
--------

textwrap has problems wrapping hyphens (2002-08-17)
	http://python.org/sf/596434
Another dealloc stack killer (2002-08-25)
	http://python.org/sf/600007
Installing w/o admin generates key error (2002-08-27)
	http://python.org/sf/600952
bug in new execvpe (2002-08-27)
	http://python.org/sf/601077
weird header wrapping in email.Generator (2002-08-28)
	http://python.org/sf/601392
xmlrpclib ignores CDATA (2002-08-28)
	http://python.org/sf/601534
some int results that should be bool (2002-08-29)
	http://python.org/sf/601775
smtplib mishandles empty sender (2002-08-29)
	http://python.org/sf/602029
configure finds c++ w/o --with-cxx (2002-08-29)
	http://python.org/sf/602102
os.popen() negative error code IOError (2002-08-29)
	http://python.org/sf/602245

Martin v. Loewis | 1 Sep 23:25 2002

Re: mimetypes patch #554192

Walter Dörwald <walter <at> livinglogic.de> writes:

> >>Even better would be, if we could assign priorities to the mappings,
> >>so that for e.g. image/jpeg the preferred extension is .jpeg.
> >>Then guess_type() and guess_extension() would return the preferred
> >>mimetype/extension.
> > Do you have a specific application for that in mind? It sounds like
> > overkill.
> 
> I'm using a web mirror script which uses the extensions from
> guess_extension to save all downloaded resources, and I hate it
> when the HTML files are named .htm and JPEG images are named .jpe.

Then this is your preference - others might prefer jpg, just because
their file system can deal better with that. If you can agree that
this is your preference, you should put the preference mechanism into
the application.

Maybe your preference can be expressed algorithmically? It might be
that you always want the longest known extension (it is unlikely that
you prefer "jpeg" over "jpg" just because that contains a vowel :-).
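
A minimal sketch of such an application-level rule, assuming the "longest known extension" preference suggested above. The helper names are hypothetical; mimetypes.guess_all_extensions() is the standard library call, and what it returns depends on the MIME tables available on the machine.

```python
import mimetypes


def preferred_extension(candidates):
    """Pick the longest extension from a list; None if the list is empty.

    On a tie, the first of the longest candidates wins.
    """
    if not candidates:
        return None
    return max(candidates, key=len)


def guess_preferred_extension(mime_type):
    """Apply the longest-extension preference to the known extensions."""
    return preferred_extension(mimetypes.guess_all_extensions(mime_type))


ext = guess_preferred_extension("image/jpeg")
```

Keeping the preference in a small function like this leaves the library's own tables untouched, so another application can plug in a different rule (shortest, or a fixed lookup table).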

Regards,
Martin

Martin v. Loewis | 1 Sep 23:31 2002

Re: PyString_DecodeEscape and PEP293

Walter Dörwald <walter <at> livinglogic.de> writes:

> A recent checkin added a function PyString_DecodeEscape()
> to stringobject.c. To make this function PEP293 compatible
> it would need access to unicode_decode_call_errorhandler
> which is defined static in unicodeobject.c. Does
> PyString_DecodeEscape() really need an errors argument?

What do you mean, "really need"? The callers of this function pass the
argument, in particular escape_decode. Is that "real"?

> If yes, we could either move it to unicodeobject.c 

No. It has little to do with Unicode.

> or make unicode_decode_call_errorhandler externally visible.

I don't know this function. What does this have to do with Unicode?

> Another problem that I noticed is that string-escape can't
> be used for encoding Unicode objects:

That is a feature. string-escape has nothing to do with Unicode.

Regards,
Martin

Martin v. Loewis | 1 Sep 23:22 2002

Re: PEP 277 (unicode filenames): please review

Matthias Urlichs <smurf <at> noris.de> writes:

> Linux and MacOSX use UTF-8 and should probably be treated as such, 
> i.e. I want to open("äöü"), not open("äöü".encode("utf-8")).

What would be "äöü" in this context? Your message was encoded as
Latin-1 - was that deliberate?

You could expect that open(u"äöü") works well; for the way you write
it, somebody needs to know what encoding the string has.

Linux does *not* "use" UTF-8. On the file system API, it treats
arbitrary byte sequences as-is, i.e. when you pass "äöü" as Latin-1,
it will put those bytes on disk - if you later use "äöü" in UTF-8,
Linux won't find the file.

Instead, the convention seems to be that file names are in the
locale's encoding - which might be UTF-8, if you use a UTF-8 locale.
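
To make the point concrete, here is a small sketch in present-day Python (where text and bytes are separate types, unlike the Python discussed in this thread). It only shows that the same three characters become different byte sequences under the two encodings; the variable names are illustrative.

```python
import sys

name = "\u00e4\u00f6\u00fc"  # the three characters a-umlaut, o-umlaut, u-umlaut

latin1_bytes = name.encode("latin-1")  # one byte per character
utf8_bytes = name.encode("utf-8")      # two bytes per character

# A file system that stores raw bytes sees two unrelated names here.
same_name_on_disk = latin1_bytes == utf8_bytes

# The encoding Python itself uses for file names on this machine.
fs_encoding = sys.getfilesystemencoding()
```

A file created under the first byte sequence cannot be found by asking for the second, which is why the encoding of a byte-string file name has to be known from somewhere, typically the locale.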

> Byte strings are perfectly OK if they have a common encoding (meaning 
> UTF-8, in some accepted normal form). 

Unfortunately, that precondition is false. There is no common encoding
on Linux.

Regards,
Martin

Martin v. Loewis | 1 Sep 23:57 2002

Re: To commit or not to commit

Guido van Rossum <guido <at> python.org> writes:

> > Any objections against committing the patch?
> 
> What do MvL and MAL say?

I'm still concerned about the massive amounts of C code, most of which
could be expressed far more compactly in Python code. Walter convinced
me that this (the aspect that I picked in a discussion) does have a
real performance impact for real data, so I guess I have to live with
that.

Because of the size, I'm sure there are still bugs in it. I couldn't
spot any by inspection, so I think the patch is ready to be installed.

Regards,
Martin

Delaney, Timothy | 2 Sep 00:53 2002

RE: The first trustworthy <wink> GBayes results

> From: Tim Peters [mailto:tim.one <at> comcast.net]
> 
> Training GBayes is cheap, and the more you feed it the less need to do
> information-destroying transformations (like folding case or ignoring
> punctuation).

Speaking of which, I had a thought this morning (in the shower of course ;)
about a slightly more intelligent tokeniser.

Split on whitespace, then runs of punctuation at the end of "words" are
split off as a separate word.

So:

    a.b.c -> 'a.b.c' (main use: keeps file extensions with filenames)

    A phrase. -> 'A', 'phrase', '.'

    WTF??? -> 'WTF', '???'

    >>> import module -> '>>>', 'import', 'module'

Might this be useful? No code of course ;)
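
For what it's worth, the rule as described can be sketched in a few lines. This is an untested-against-spam illustration, not part of GBayes; the function name is hypothetical, and "punctuation" is assumed to mean the ASCII set in string.punctuation (which includes the underscore).

```python
import string


def tokenize(text):
    """Split on whitespace, then split a trailing run of punctuation
    off each word as a separate token."""
    tokens = []
    for word in text.split():
        stem = word.rstrip(string.punctuation)
        if stem and stem != word:
            # Word followed by punctuation: emit both pieces.
            tokens.append(stem)
            tokens.append(word[len(stem):])
        else:
            # Plain word, or a token made only of punctuation (e.g. '>>>').
            tokens.append(word)
    return tokens


examples = [tokenize(s) for s in ("a.b.c", "A phrase.", "WTF???", ">>> import module")]
```

Only trailing punctuation is split off, so interior dots such as file extensions stay attached to the word.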

Tim Delaney

Brett Cannon | 2 Sep 00:57 2002

Python-dev summary for 2002-08-15 - 2002-09-01

Yes, with Michael's permission, I am attempting to start up the Python-dev
summaries again.  Below is my attempt at summarizing the last half of
August.  It's longer than normal summaries, but that is because I bothered
to include discussions on threads that were not directly relating to the
Python core but are interesting nonetheless (e.g., the whole spambayes
thread).

I am posting to Python-dev first before posting to c.l.py, c.l.py.a (also
lwn.net and probably Slashdot) because I want to get the general okay from
the list that I have done a good enough job to send this out; I don't
want to have a summary that represents the goings-on here without the
general populace (or just the BDFL since he can overrule =) being okay
with it.  I am also curious as to whether I should go into more or less
detail, leave out the summaries that do not directly pertain to the Python
core, etc.

So please read the summary and let me know if you are okay with it.  If so
I will try to do semi-monthly summaries from now on.  Oh, and I am on
vacation right now and will be doing a lot of travelling in the next two
months, so I can't guarantee summaries will be this quick to come out for
a while.  I will do them, though, even if they are a week late.  =)

Oh, and if I do get the okay to do this, expect a lot of dumb questions
from me in the future in terms of clarifying things.  Just remember, it is
for the good of the Python community.  =)

=======================================

This is a summary of traffic on the python-dev mailing list between August
16, 2002 and September 1, 2002 (exclusive).  It is intended to inform the