Tim Peters | 1 May 2003 04:13
Picon

RE: Dictionary tuning

[Raymond Hettinger]
> ...
> I worked on similar approaches last month and found them wanting.
> The concept was that a 64byte cache line held 5.3 dict entries and
> that probing those was much less expensive than making a random
> probe into memory outside of the cache.
>
> The first thing I learned was that the random probes were necessary
> to reduce collisions.  Checking the adjacent space is like a single
> step of linear chaining, it increases the number of collisions.

Yes, I believe that any regularity will.

> That would be fine if the cost were offset by decreased memory
> access time; however, for small dicts, the whole dict is already
> in cache and having more collisions degrades performance
> with no compensating gain.
>
> The next bright idea was to have a separate lookup function for
> small dicts and for larger dictionaries.  I set the large dict lookup
> to search adjacent entries.  The good news is that an artificial
> test of big dicts showed a substantial improvement (around 25%).
> The bad news is that real programs were worse-off than before.

You should qualify that to "some real programs", or perhaps "all real
programs I've tried".  On the other side, I have real programs that access
large dicts in random order, so if you tossed those into your mix, a 25%
gain on those would more than wipe out the 1-2% losses you saw elsewhere.

> A day of investigation showed the cause.  The artificial test
(Continue reading)

Tim Peters | 1 May 2003 04:36
Picon

RE: Dictionary tuning

[Raymond Hettinger]
> ...
> I'm going to write-up an informational PEP to summarize the
> results of research to-date.

I'd suggest instead a text file checked into the Object directory, akin to
the existing listsort.txt -- it's only of interest to the small fraction of
hardcore developers with an optimizing bent.

> After the first draft, I'm sure the other experimenters will each have
> lessons to share.  In addition, I'll attach a benchmarking suite and
> dictionary simulator (fully instrumented).  That way, future generations
> can reproduce the results and pickup where we left-off.

They probably won't, though.  The kind of people attracted to this kind of
micro-level fiddling revel in recreating this kind of stuff themselves.  For
example, you didn't look hard enough to find the sequence of dict simulators
Christian posted last time he got obsessed with this <wink>.  On the chance
that they might, a plain text file-- or a Wiki page! --is easier to update
than a PEP over time.

The benchmarking suite should also be checked in, and should be very
welcome.  Perhaps it's time for a "benchmark" subdirectory under Lib/test?
It doesn't make much sense even now that pystone and sortperf live directly
in the test directory.

> I've decided that this new process should have a name,
> something pithy, yet magical sounding, so it shall be
> dubbed SCIENCE.

(Continue reading)

Tim Peters | 1 May 2003 05:50
Picon

RE: New thread death in test_bsddb3

[Thomas Heller]
> ...
> So is the policy now that it is no longer *allowed* to create another
> thread state, while in previous versions there wasn't any choice,
> because there existed no way to get the existing one?

You can still create all the thread states you like; the new check is in
PyThreadState_Swap(), not in PyThreadState_New().

There was always a choice, but previously Python provided no *help* in
keeping track of whether a thread already had a thread state associated with
it.  That didn't stop careful apps from providing their own mechanisms to do
so.

About policy, yes, it appears to be so now, else Mark wouldn't be raising a
fatal error <wink>.  I view it as having always been the policy (from a
good-faith reading of the previous code), just a policy that was too
expensive for Python to enforce.  There are many policies like that, such as
not passing goofy arguments to macros, and not letting references leak.
Python doesn't currently enforce them because it's currently too expensive
to enforce them.  Over time that can change.

> IMO a fatal error is very harsh, especially there's no problem to
> continue execution - excactly what happens in a release build.

There may or may not be a problem with continued execution -- if you've
associated more than one living thread state with a thread, your app may
very well be fatally confused in a way that's very difficult to diagnose
without this error.

(Continue reading)

Brett Cannon | 1 May 2003 06:38
Picon

RE: Dictionary tuning

[Tim Peters]

> The benchmarking suite should also be checked in, and should be very
> welcome.  Perhaps it's time for a "benchmark" subdirectory under Lib/test?
> It doesn't make much sense even now that pystone and sortperf live directly
> in the test directory.
>

Works for me.  Can we perhaps decide whether we want to do this in the
near future?  I am going to be writing up module docs for the test package
and if we are going to end up moving them I would like to be get this
written into the docs the first time through.

-Brett
Martin v. Löwis | 1 May 2003 06:10
Picon
Gravatar

Re: Initialization hook for extenders

"Patrick J. Miller" <patmiller <at> llnl.gov> writes:

> I actually want this to do some MPI initialization to setup a
> single user prompt with broadcast which has to run after
> Py_Initialize() but before the import of readline.

-1. It is easy enough to copy the code of Py_Main, and customize it
for special requirements. The next user may want to have a hook to put
additional command line options into Py_Main, YAGNI.

Regards,
Martin
Tim Peters | 1 May 2003 08:13
Picon

RE: Re: heaps

[Raymond Hettinger]
>> I'm quite pleased with the version already in CVS.  It is a small
>> masterpiece of exposition, sophistication, simplicity, and speed.
>> A class based interface is not necessary for every algorithm.

[David Eppstein]
> It has some elegance, but omits basic operations that are necessary for
> many heap-based algorithms and are not provided by this interface.

I think Raymond was telling you it isn't intended to be "an interface",
rather it's quite up-front about being a collection of functions that
operate directly on a Python list, implementing a heap in a very
straightforward way, and deliberately not trying to hide any of that.  IOW,
it's a concrete data type, not an abstract one.  I asked, and it doesn't
feel like apologizing for being what it is <wink>.

That's not to say Python couldn't benefit from providing an abstract heap
API too, and umpteen different implementations specialized to different
kinds of heap applications.  It is saying that heapq isn't trying to be
that, so pointing out that it isn't falls kinda flat.

> Specifically, the three algorithms that use heaps in my upper-division
> undergraduate algorithms classes are heapsort (for which heapq works
> fine, but you would generally want to use L.sort() instead), Dijkstra's
> algorithm (and its relatives such as A* and Prim), which needs the
> ability to decrease keys, and event-queue-based plane sweep algorithms
> (e.g. for finding all crossing pairs in a set of line segments) which
> need the ability to delete items from other than the top.

Then some of those will want a different implementation of a heap.  The
(Continue reading)

David Eppstein | 1 May 2003 08:36
Picon

RE: Re: heaps

On 5/1/03 2:13 AM -0400 Tim Peters <tim_one <at> email.msn.com> wrote:
> It surprised me that you tried using heapq at all for this algorithm.  I
> was also surprised that you succeeded <0.9 wink>.

Wink noted, but it surprised me too, a little.  I had thought decrease key 
was a necessary part of the algorithm, not something that could be finessed 
like that.

> You can wrap any interface you like around heapq (that's very easy to do
> in Python), but it won't change that heapq's implementation is poorly
> suited to this application.  priorityDictionary looks like an especially
> nice API for this specific algorithm, but, e.g., impossible to use
> directly for maintaining an N-best queue (priorityDictionary doesn't
> support multiple values with the same priority, right?  if we're trying
> to find the 10,000 poorest people in America, counting only one as dead
> broke would be too Republican for some peoples' tastes <wink>).  OTOH,
> heapq is easy and efficient for *that* class of heap application.

I agree with your main points (heapq's inability to handle certain priority 
queue applications doesn't mean it's useless, and its 
implementation-specific API helps avoid fooling programmers into thinking 
it's any more than what it is).  But I am confused at this example.  Surely 
it's just as easy to store (income,identity) tuples in either data 
structure.

If you mean, you want to find the 10k smallest income values (rather than 
the people having those incomes), then it may be that a better data 
structure would be a simple list L in which the value of L[i] is the count 
of people with income i.

(Continue reading)

Guido van Rossum | 1 May 2003 15:28
Favicon

Re: Dictionary tuning

[Tim]
> > The benchmarking suite should also be checked in, and should be
> > very welcome.  Perhaps it's time for a "benchmark" subdirectory
> > under Lib/test?  It doesn't make much sense even now that pystone
> > and sortperf live directly in the test directory.

> Works for me.  Can we perhaps decide whether we want to do this in the
> near future?  I am going to be writing up module docs for the test package
> and if we are going to end up moving them I would like to be get this
> written into the docs the first time through.
> 
> -Brett

Should the benchmarks directory be part of the distribution, or should
it be in the nondist part of the CVS tree?

--Guido van Rossum (home page: http://www.python.org/~guido/)
Michael Hudson | 1 May 2003 16:07
Favicon

Re: Dictionary tuning

Guido van Rossum <guido <at> python.org> writes:

> [Tim]
>> > The benchmarking suite should also be checked in, and should be
>> > very welcome.  Perhaps it's time for a "benchmark" subdirectory
>> > under Lib/test?  It doesn't make much sense even now that pystone
>> > and sortperf live directly in the test directory.
>
>> Works for me.  Can we perhaps decide whether we want to do this in the
>> near future?  I am going to be writing up module docs for the test package
>> and if we are going to end up moving them I would like to be get this
>> written into the docs the first time through.
>> 
>> -Brett
>
> Should the benchmarks directory be part of the distribution, or should
> it be in the nondist part of the CVS tree?

I can't think why you'd want it in nondist, unless they depend on huge
input files or something.

Cheers,
M.

--

-- 
  Those who have deviant punctuation desires should take care of their
  own perverted needs.                  -- Erik Naggum, comp.lang.lisp
Guido van Rossum | 1 May 2003 16:18
Favicon

Re: Dictionary tuning

> > Should the benchmarks directory be part of the distribution, or should
> > it be in the nondist part of the CVS tree?
> 
> I can't think why you'd want it in nondist, unless they depend on huge
> input files or something.

OK.

--Guido van Rossum (home page: http://www.python.org/~guido/)

Gmane