Paolo Giarrusso | 1 Jan 06:38 2009

Re: [Fwd: [Fwd: Re: Threaded interpretation]]

On Tue, Dec 30, 2008 at 23:37, Antonio Cuni <anto.cuni <at> gmail.com> wrote:
> Hi,
> Antoine Pitrou told me that his mail got rejected by the mailing list, so I'm
> forwarding it.

> -------- Forwarded message --------
> From: Antoine Pitrou <solipsis <at> pitrou.net>
> To: pypy-dev <at> codespeak.net
> Subject: Re: Threaded interpretation
> Date: Fri, 26 Dec 2008 21:16:36 +0000 (UTC)
>
> Hi people,
>
> By reading this thread I had the idea to write a threaded interpretation patch
> for py3k. Speedup on pybench and pystone is 15-20%.
> http://bugs.python.org/issue4753

> Regards

> Antoine.

That's nice!

I didn't have time until now (Erasmus life is pretty demanding), but
it was nice anyway, also because the code is a bit different, and a
bit better than mine (I didn't realize I could fetch the operand after
dispatch instead of before, and this makes a difference, because after
the jump the result of HAS_ARG() becomes known at compile time).
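
To make the point about fetching the operand after dispatch concrete, here is a minimal sketch of threaded dispatch on a toy bytecode. It is not Antoine's patch: the opcodes, `run`, `targets` and `DISPATCH` are hypothetical names made up for illustration, and it assumes the GCC/Clang "labels as values" extension, which is not ISO C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are NOT CPython's real opcode numbers. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Runs a toy bytecode program and returns the value on top of the stack.
 * Relies on the GCC/Clang "labels as values" extension (&&label, goto *). */
int run(const unsigned char *code)
{
    /* One handler address per opcode, indexed by opcode number. */
    static void *const targets[] = { &&op_push, &&op_add, &&op_halt };
    int stack[16];
    size_t sp = 0;
    const unsigned char *pc = code;

/* Jump straight to the next handler; no operand is fetched here. */
#define DISPATCH() goto *targets[*pc++]

    DISPATCH();

op_push:
    /* This handler statically knows its opcode takes an argument, so no
     * HAS_ARG()-style test is needed: the fetch is unconditional. */
    assert(sp < 16);
    stack[sp++] = *pc++;
    DISPATCH();

op_add:
    /* No operand: nothing is fetched and nothing is tested. */
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();

op_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

Running the program `OP_PUSH 2, OP_PUSH 3, OP_ADD, OP_HALT` through `run` leaves the sum of the two operands on the stack. The sketch does no validation of opcode numbers, so it should only be fed well-formed programs.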

Regards

Armin Rigo | 2 Jan 14:23 2009

Re: Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

Hi Paolo,

On Thu, Dec 25, 2008 at 12:42:18AM +0100, Paolo Giarrusso wrote:
> If I ever want to try something without refcounting, I guess I'd turn
> to PyPy, but don't hold your breath for that. The fact that indirect
> threading didn't work, that you're 1.5-2x slower than CPython, and
> that you store locals in frame objects all show that the
> abstraction overhead of the interpreter is too high.

True, but a 1.5x slowdown is not a big deal for many applications; the
blocker is mostly elsewhere.  And on the other hand, we've been working
on the JIT generator -- for a while now, so I cannot make any promise
-- and the goal is to turn this slowish interpreter into a good JITting
one "for free".  As far as I know, this cannot be done so easily if you
start from a lower-level interpreter like CPython.  This is our "real"
goal somehow, or one of our real goals: a JITting virtual machine for
any language that we care to write an interpreter for.

> 3) still, I do believe that working on it was interesting to get
> experience about how to optimize an interpreter.

Sure, it makes some sense if you think about it in this way.  It doesn't
if you think about "I want to give the world a fast Python interpreter",
but you corrected me already: this is not your goal, so it's fine :-)

> And the original idea was to show that real multithreading (without a
> global interpreter lock) cannot be done in Python just because of the
> big design mistakes of CPython.

I disagree on this point.  Jython shows that without redesign of the

Paolo Giarrusso | 3 Jan 05:59 2009

Re: Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

On Fri, Jan 2, 2009 at 14:23, Armin Rigo <arigo <at> tunes.org> wrote:
> Hi Paolo,

> On Thu, Dec 25, 2008 at 12:42:18AM +0100, Paolo Giarrusso wrote:
>> If I ever want to try something without refcounting, I guess I'd turn
>> to PyPy, but don't hold your breath for that. The fact that indirect
>> threading didn't work, that you're 1.5-2x slower than CPython, and
>> that you store locals in frame objects all show that the
>> abstraction overhead of the interpreter is too high.

> True, but a 1.5x slowdown is not a big deal for many applications; the
> blocker is mostly elsewhere.

Let's say 1.5x * 2x = 3x, since CPython is not as fast as it could be,
because of refcounting for instance. The 2x is taken from PyVM
performance reports (see http://www.python.org/dev/implementations/).
And for PyPy a slower interpreter means a slower JIT output, doesn't it?
See below.

Also, according to the CS literature, interpreter performance makes more
of a difference than a JIT. According to the paper about efficient
interpreters that I keep mentioning, an inefficient interpreter
is 1000x slower than C, while an efficient one is only 10x slower.
Python is not that slow, but what you wrote about Psyco seems to imply
that there is still lots of room for improvement.

"You might get 10x to 100x speed-ups. It is theoretically possible to
actually speed up this kind of code up to the performance of C
itself."


Paolo Giarrusso | 3 Jan 07:18 2009

GIL removal in PyPy (was: Re: Threaded interpretation)

On Fri, Jan 2, 2009 at 14:23, Armin Rigo <arigo <at> tunes.org> wrote:

> About PyPy, the lesson that we learned is different: it's that
> implementing a GIL-free model requires a bit of tweaking of the whole
> interpreter -- as opposed to doing it in a nicely separable manner.  So
> far we only implemented threads-with-a-GIL, which can be done in this
> separable way (i.e. without changing a single line of most of the
> interpreter).

> Given enough interest we can implement full GIL-free
> multithreading the slow way: by going over most features of the
> interpreter and thinking.

That'll take more time now than it would have taken to do it in the first place.

IMHO, introducing fine-grained locking is one of the things costing x
if done from the beginning and costing 10x to do afterwards. Switching
from refcounting to GC is another of those things (the factor here is
maybe even higher), and your EU report on GC acknowledges this when
talking about CPython's choice of refcounting. The location of all
pointers on the C stack would have to be registered to convert
CPython.
So, I can understand what you mean by "nicely separable".
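
As a rough illustration of what "registering the location of all pointers on the C stack" could involve, here is a minimal shadow-stack sketch. It is an assumption-laden toy, not how PyPy or any CPython fork does it: `gc_push_root`, `gc_pop_roots`, `gc_count_live_roots` and `example` are hypothetical names, and a real collector would trace or move the objects instead of just counting them.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROOTS 64

/* Hypothetical shadow stack: addresses of local pointer variables that
 * a tracing collector would have to know about. */
static void **gc_roots[MAX_ROOTS];
static size_t gc_nroots = 0;

/* Register the address of one local pointer variable. */
static void gc_push_root(void **slot)
{
    assert(gc_nroots < MAX_ROOTS);
    gc_roots[gc_nroots++] = slot;
}

/* Unregister the n most recently registered locals (on function exit). */
static void gc_pop_roots(size_t n)
{
    assert(n <= gc_nroots);
    gc_nroots -= n;
}

/* Stand-in for a collector's root scan: count registered non-NULL slots. */
static size_t gc_count_live_roots(void)
{
    size_t live = 0;
    for (size_t i = 0; i < gc_nroots; i++)
        if (*gc_roots[i] != NULL)
            live++;
    return live;
}

/* Every C function holding object pointers would need this boilerplate. */
size_t example(void *a, void *b)
{
    void *x = a, *y = b;
    gc_push_root(&x);
    gc_push_root(&y);
    size_t seen = gc_count_live_roots();  /* a collection could run here */
    gc_pop_roots(2);
    return seen;
}
```

The push/pop pair in `example` is the part that would have to be added to every function in the interpreter and in every extension module, which is why doing it after the fact is so much more expensive than designing for it from the start.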

> Note that it would also be possible to do
> that in CPython (but it would be even more work).

That has been proven to be false :-).
It was tried in 2004 (look for "Python free threading", I guess),
but free (i.e. GIL-less) threading requires atomic manipulation of

Leonardo Santagada | 3 Jan 12:05 2009

Re: GIL removal in PyPy (was: Re: Threaded interpretation)


On Jan 3, 2009, at 4:18 AM, Paolo Giarrusso wrote:

> There was, luckily, somebody on the mailing list who said "maybe we
> should drop refcounting", but people didn't listen for some reason.

You repeated this meme many times in your emails, so I thought that  
maybe you really didn't see the full picture. This is what I  
understand from the reasoning behind it.

Dropping refcounting and moving to free threading would completely break
all C modules, so they would have to be rewritten; it would also make the
CPython API much more complex and integration with C libraries hard.
That's why no one took it seriously. Think of it this way: breaking all C
modules would make CPython as usable as Haskell :), or just look at
the number of libraries not available right now for Python 3.0.

It is not some retarded choice made by GvR, but a pragmatic one.  
Python as a language used by millions of people can't completely  
change semantics from version to version.

--
Leonardo Santagada
santagada at gmail.com

_______________________________________________
pypy-dev <at> codespeak.net
http://codespeak.net/mailman/listinfo/pypy-dev


Carl Friedrich Bolz | 3 Jan 13:23 2009

Re: GIL removal in PyPy (was: Re: Threaded interpretation)

Hi Leonardo,

Leonardo Santagada wrote:
> On Jan 3, 2009, at 4:18 AM, Paolo Giarrusso wrote:
> 
>> There was, luckily, somebody on the mailing list who said "maybe we
>> should drop refcounting", but people didn't listen for some reason.
> 
> 
> You repeated this meme many times in your emails, so I thought that  
> maybe you really didn't see the full picture. This is what I  
> understand from the reasoning behind it.
> 
> Dropping refcounting and moving to free threading would completely break
> all C modules, so they would have to be rewritten; it would also make the
> CPython API much more complex and integration with C libraries hard.
> That's why no one took it seriously. Think of it this way: breaking all C
> modules would make CPython as usable as Haskell :), or just look at
> the number of libraries not available right now for Python 3.0.
> 
> It is not some retarded choice made by GvR, but a pragmatic one.  
> Python as a language used by millions of people can't completely  
> change semantics from version to version.

While I understand the reasoning above, I don't agree with the last
paragraph. Switching to a different GC doesn't lead to that huge a
change in semantics. There are subtle differences regarding finalizers (see
the blog posts about that:

http://morepypy.blogspot.com/2008/02/python-finalizers-semantics-part-1.html

Armin Rigo | 3 Jan 15:12 2009

Re: Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

Hi,

Your writing style is too annoying to answer properly :-(
I will not try to answer all of the points you mention;
instead, let's just point out (as others have done) that
the picture is more complex than you suppose, because it
includes a lot of aspects that you completely ignore.  I
know definitely that with changes to the semantics of the
language, Python could be made seriously faster.  PyPy is
not about this; it's a collection of people who either
don't care too much about performance, or believe that
even with these semantics Python *can* be made seriously
fast -- it's just harder, but not impossible.

A bientot,

Armin.

Armin Rigo | 3 Jan 16:33 2009

Re: GIL removal in PyPy (was: Re: Threaded interpretation)

Hi Paolo,

On Sat, Jan 03, 2009 at 07:18:22AM +0100, Paolo Giarrusso wrote:
> > Given enough interest we can implement full GIL-free
> > multithreading the slow way: by going over most features of the
> > interpreter and thinking.
> 
> That'll take more time now than it would have taken to do it in the first place.

Yes, so we probably won't do it (except if we can find another
orthogonal technique to try, i.e. a separable way, which is only at the
state of vague discussion so far).

> > Note that it would also be possible to do
> > that in CPython (but it would be even more work).
> 
> That has been proven to be false :-).

Wrong.  Jython is a counterexample, showing that we "can" do
it even in C (because anything possible in Java is possible in C, given
enough effort).  I know it has been tried several times over the course
of the CPython development, but never succeeded.  That's why I'm saying
it would be even more work in CPython than in PyPy.

A bientot,

Armin.

Paolo Giarrusso | 3 Jan 19:36 2009

Re: GIL removal in PyPy (was: Re: Threaded interpretation)

Hi Armin,

On Sat, Jan 3, 2009 at 16:33, Armin Rigo <arigo <at> tunes.org> wrote:
>> > Note that it would also be possible to do
>> > that in CPython (but it would be even more work).

>> That has been proven to be false :-).

> Wrong.  Jython is a counterexample, showing that we "can" do
> it even in C (because anything possible in Java is possible in C, given
> enough effort).  I know it has been tried several times over the course
> of the CPython development, but never succeeded.  That's why I'm saying
> it would be even more work in CPython than in PyPy.

Obviously what can be done in Java can be done in C, but atomic
refcount manipulation is too expensive, so that, again, would require
dropping refcounting; in fact Jython has no refcounting.
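
To show where the cost comes from, here is a minimal sketch contrasting a plain refcount update with an atomic one, using C11 `<stdatomic.h>`. The struct and function names (`plain_obj`, `shared_obj`, `plain_incref`, `shared_incref`, `shared_decref`) are hypothetical and are not CPython's actual object layout or macros; the sketch makes no claim about how large the slowdown is on any particular machine.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical object headers, for illustration only. */
typedef struct { long refcnt; } plain_obj;          /* protected by a GIL */
typedef struct { atomic_long refcnt; } shared_obj;  /* free-threaded */

/* Plain increment: an ordinary load/add/store, safe only while a global
 * lock guarantees that no other thread touches the counter. */
static inline void plain_incref(plain_obj *o)
{
    o->refcnt++;
}

/* Atomic increment: an indivisible read-modify-write that every thread
 * sharing the object must perform on the same memory location. */
static inline void shared_incref(shared_obj *o)
{
    atomic_fetch_add_explicit(&o->refcnt, 1, memory_order_relaxed);
}

/* Atomic decrement; returns 1 when the last reference was dropped, i.e.
 * when the caller would be responsible for freeing the object. */
static inline int shared_decref(shared_obj *o)
{
    long old = atomic_fetch_sub_explicit(&o->refcnt, 1,
                                         memory_order_acq_rel);
    assert(old > 0);
    return old == 1;
}
```

Since an interpreter touches refcounts on almost every operation, replacing each plain update with the atomic form adds synchronization to the hottest paths, which is the overhead a tracing GC avoids.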

Regards
-- 
Paolo Giarrusso

Paolo Giarrusso | 4 Jan 01:23 2009

Re: Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

On Sat, Jan 3, 2009 at 15:12, Armin Rigo <arigo <at> tunes.org> wrote:
> Hi,

> Your writing style is too annoying to answer properly :-(

Sorry for that, I don't usually do it. I'm trying to advocate my
position, but sometimes I realize there's a subtle boundary with
flaming; and most of that flaming is actually addressed to CPython
developers.

> I will not try to answer all of the points you mention;
> instead, let's just point out (as others have done) that
> the picture is more complex than you suppose, because it
> includes a lot of aspects that you completely ignore.

Tradeoffs are always involved, but in lots of cases, in the Python
community, misaddressed concerns about simplicity of the Python VM,
and ignorance of literature on Virtual Machines, lead to mistakes. And
I think we mostly agree on the facts underlying this statement, even
if we express that with different harshness.

I don't claim to be an expert in VM development, not at all.
But since there is a big disagreement between what I learned during 3
months of lessons (which was a first course on VM development) and
CPython, one of the two is wrong; my professor suggests that the Python
side is wrong and that implementers of Smalltalk, Self, and other
languages have some point. And all the evidence I could find
(including your project) agrees with that, on topics like:

- threading in the interpreter