Dag Sverre Seljebotn | 1 Jun 2012 15:49
Picon
Picon
Gravatar

Re: [Cython] SEP 201 draft: Native callable objects

On 05/31/2012 10:13 PM, Robert Bradshaw wrote:
> On Thu, May 31, 2012 at 12:29 PM, Dag Sverre Seljebotn
> <d.s.seljebotn@...>  wrote:
>> On 05/31/2012 08:50 PM, Robert Bradshaw wrote:
>>>
>>> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn
>>> <d.s.seljebotn@...>    wrote:
>>>>
>>>> [Discussion on numfocus@... please]
>>>>
>>>> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>>>>
>>>> https://github.com/numfocus/sep/blob/master/sep201.rst
>>>>
>>>> """
>>>> Many callable objects are simply wrappers around native code. This holds
>>>> for
>>>> any Cython function, f2py functions, manually written CPython extensions,
>>>> Numba, etc.
>>>>
>>>> Obviously, when native code calls other native code, it would be nice to
>>>> skip the significant cost of boxing and unboxing all the arguments.
>>>> """
>>>>
>>>>
>>>> The thread about this on the Cython list is almost endless:
>>>>
>>>> http://thread.gmane.org/gmane.comp.python.cython.devel/13416/focus=13443
>>>>
>>>> There was a long discussion on the key-comparison vs. interned-string
(Continue reading)

Dag Sverre Seljebotn | 1 Jun 2012 16:25
Picon
Picon
Gravatar

Re: [Cython] SEP 201 draft: Native callable objects

On 06/01/2012 03:49 PM, Dag Sverre Seljebotn wrote:
> On 05/31/2012 10:13 PM, Robert Bradshaw wrote:
>> On Thu, May 31, 2012 at 12:29 PM, Dag Sverre Seljebotn
>> <d.s.seljebotn@...> wrote:
>>> On 05/31/2012 08:50 PM, Robert Bradshaw wrote:
>>>>
>>>> On Thu, May 31, 2012 at 7:04 AM, Dag Sverre Seljebotn
>>>> <d.s.seljebotn@...> wrote:
>>>>>
>>>>> [Discussion on numfocus@... please]
>>>>>
>>>>> I've uploaded a draft-state SEP 201 (previously CEP 1000):
>>>>>
>>>>> https://github.com/numfocus/sep/blob/master/sep201.rst
>>>>>
>>>>> """
>>>>> Many callable objects are simply wrappers around native code. This
>>>>> holds
>>>>> for
>>>>> any Cython function, f2py functions, manually written CPython
>>>>> extensions,
>>>>> Numba, etc.
>>>>>
>>>>> Obviously, when native code calls other native code, it would be
>>>>> nice to
>>>>> skip the significant cost of boxing and unboxing all the arguments.
>>>>> """
>>>>>
>>>>>
>>>>> The thread about this on the Cython list is almost endless:
(Continue reading)

Dag Sverre Seljebotn | 4 Jun 2012 21:44
Picon
Picon
Gravatar

[Cython] Hash-based vtables

Me and Robert had a long discussion on the NumFOCUS list about this
already, but I figured it was better to continue it and provide more
in-depth benchmark results here.

It's basically a new idea of how to provide a vtable based on perfect
hashing, which should be a lot simpler to implement than what I first
imagined.

I'll write down some context first, if you're familiar with this
skip ahead a bit..

This means that you can do fast dispatches *without* the messy
business of binding vtable slots at compile time. To be concrete, this
might e.g. take the form

def f(obj):
     obj.method(3.4) # try to find a vtable with "void method(double)" in it

or, a more typed approach,

# File A
cdef class MyImpl:
     def double method(double x): return x * x

# File B
# Here we never know about MyImpl, hence "duck-typed"
 <at> cython.interface
class MyIntf:
     def double method(double x): pass

(Continue reading)

Dag Sverre Seljebotn | 4 Jun 2012 22:55
Picon
Picon
Gravatar

Re: [Cython] Hash-based vtables

On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
> Me and Robert had a long discussion on the NumFOCUS list about this
> already, but I figured it was better to continue it and provide more
> in-depth benchmark results here.
>
> It's basically a new idea of how to provide a vtable based on perfect
> hashing, which should be a lot simpler to implement than what I first
> imagined.
>
> I'll write down some context first, if you're familiar with this
> skip ahead a bit..
>
> This means that you can do fast dispatches *without* the messy
> business of binding vtable slots at compile time. To be concrete, this
> might e.g. take the form
>
> def f(obj):
> obj.method(3.4) # try to find a vtable with "void method(double)" in it
>
> or, a more typed approach,
>
> # File A
> cdef class MyImpl:
> def double method(double x): return x * x
>
> # File B
> # Here we never know about MyImpl, hence "duck-typed"
>  <at> cython.interface
> class MyIntf:
> def double method(double x): pass
(Continue reading)

Robert Bradshaw | 4 Jun 2012 23:43
Picon

Re: [Cython] Hash-based vtables

On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn
<d.s.seljebotn <at> astro.uio.no> wrote:
> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
>>
>> Me and Robert had a long discussion on the NumFOCUS list about this
>> already, but I figured it was better to continue it and provide more
>> in-depth benchmark results here.
>>
>> It's basically a new idea of how to provide a vtable based on perfect
>> hashing, which should be a lot simpler to implement than what I first
>> imagined.
>>
>> I'll write down some context first, if you're familiar with this
>> skip ahead a bit..
>>
>> This means that you can do fast dispatches *without* the messy
>> business of binding vtable slots at compile time. To be concrete, this
>> might e.g. take the form
>>
>> def f(obj):
>> obj.method(3.4) # try to find a vtable with "void method(double)" in it
>>
>> or, a more typed approach,
>>
>> # File A
>> cdef class MyImpl:
>> def double method(double x): return x * x
>>
>> # File B
>> # Here we never know about MyImpl, hence "duck-typed"
(Continue reading)

Dag Sverre Seljebotn | 5 Jun 2012 00:07
Picon
Picon
Gravatar

Re: [Cython] Hash-based vtables


Robert Bradshaw <robertwb <at> gmail.com> wrote:

>On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn
><d.s.seljebotn <at> astro.uio.no> wrote:
>> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
>>>
>>> Me and Robert had a long discussion on the NumFOCUS list about this
>>> already, but I figured it was better to continue it and provide more
>>> in-depth benchmark results here.
>>>
>>> It's basically a new idea of how to provide a vtable based on
>perfect
>>> hashing, which should be a lot simpler to implement than what I
>first
>>> imagined.
>>>
>>> I'll write down some context first, if you're familiar with this
>>> skip ahead a bit..
>>>
>>> This means that you can do fast dispatches *without* the messy
>>> business of binding vtable slots at compile time. To be concrete,
>this
>>> might e.g. take the form
>>>
>>> def f(obj):
>>> obj.method(3.4) # try to find a vtable with "void method(double)" in
>it
>>>
>>> or, a more typed approach,
(Continue reading)

Robert Bradshaw | 5 Jun 2012 00:30
Picon

Re: [Cython] Hash-based vtables

On Mon, Jun 4, 2012 at 3:07 PM, Dag Sverre Seljebotn
<d.s.seljebotn <at> astro.uio.no> wrote:
>
>
> Robert Bradshaw <robertwb <at> gmail.com> wrote:
>
>>On Mon, Jun 4, 2012 at 1:55 PM, Dag Sverre Seljebotn
>><d.s.seljebotn <at> astro.uio.no> wrote:
>>> On 06/04/2012 09:44 PM, Dag Sverre Seljebotn wrote:
>>>>
>>>> Me and Robert had a long discussion on the NumFOCUS list about this
>>>> already, but I figured it was better to continue it and provide more
>>>> in-depth benchmark results here.
>>>>
>>>> It's basically a new idea of how to provide a vtable based on
>>perfect
>>>> hashing, which should be a lot simpler to implement than what I
>>first
>>>> imagined.
>>>>
>>>> I'll write down some context first, if you're familiar with this
>>>> skip ahead a bit..
>>>>
>>>> This means that you can do fast dispatches *without* the messy
>>>> business of binding vtable slots at compile time. To be concrete,
>>this
>>>> might e.g. take the form
>>>>
>>>> def f(obj):
>>>> obj.method(3.4) # try to find a vtable with "void method(double)" in
(Continue reading)

Stefan Behnel | 5 Jun 2012 09:25
Picon
Favicon

Re: [Cython] Hash-based vtables

Dag Sverre Seljebotn, 04.06.2012 21:44:
>    This can cause crashes/stack smashes
>    etc. if there's lower-64bit-of-md5 collisions, but a) the
>    probability is incredibly small, b) it would only matter in
>    situations that should cause an AttributeError anyway, c) if we
>    really care, we can always use an interning-like mechanism to
>    validate on module loading that its hashes doesn't collide with
>    other hashes (and raise an exception "Congratulations, you've
>    discovered a phenomenal md5 collision, get in touch with cython
>    devs and we'll work around it right away").

I'm not a big fan of such an attitude. If this happens at runtime, it can
induce any cost from cheap-at-test-time to hugely-expensive-in-production.
Thinking with my evil hat on, this can potentially be data triggered from
the outside (e.g. if a JIT compiler is involved at one end), thus possibly
even leading to a security hole.

We should try to produce software that others can build a business on.

Stefan
Stefan Behnel | 5 Jun 2012 10:07
Picon
Favicon

Re: [Cython] Hash-based vtables

Dag Sverre Seljebotn, 05.06.2012 00:07:
> The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no
details, probably just hand-waving.
> 
> 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64
bit subset.

Perfect hashing can be done with any fixed size data set (although it's not
guaranteed to always be the most efficient solution). It doesn't matter if
you use 64 bits or 128 bits. If 4 bits is enough, go with that. The
advantage of perfect hashing of a fixed size data set is that the hash
table has no free space and a match is guaranteed to be exact.

However, the problem in this specific case is that the caller and the
callee do not agree on the same set of entries, so there may be collisions
during the lookup (of a potentially very large set of signatures) that were
not anticipated in the perfect hash table layout (of the much smaller set
of provided signatures). Perfect hashing works here as well, but it looses
one of its main advantage over other hashing schemes. You then have to
compare the entries exactly after the lookup in order to make sure that you
didn't run into a collision, thus loosing time again that you just won with
the hashing.

But at least you only have to do exactly one such comparison, so that's an
advantage over a hashing scheme that allows collisions also in the layout.
Maybe you can even handle mismatches more quickly by adding a dedicated
"empty" entry for them that most (all?) anticipated mismatches would hash to.

Stefan
(Continue reading)

Robert Bradshaw | 5 Jun 2012 11:16
Picon

Re: [Cython] Hash-based vtables

On Tue, Jun 5, 2012 at 1:07 AM, Stefan Behnel <stefan_ml@...> wrote:
> Dag Sverre Seljebotn, 05.06.2012 00:07:
>> The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no
details, probably just hand-waving.
>>
>> 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64
bit subset.
>
> Perfect hashing can be done with any fixed size data set (although it's not
> guaranteed to always be the most efficient solution). It doesn't matter if
> you use 64 bits or 128 bits. If 4 bits is enough, go with that. The
> advantage of perfect hashing of a fixed size data set is that the hash
> table has no free space and a match is guaranteed to be exact.

The hash function is f(h(sig)) where f is parameterized but must be
*extremely* cheap and h is fixed without regard to the entry set. This
is why having 128 bits for the output of h may be an advantage.

> However, the problem in this specific case is that the caller and the
> callee do not agree on the same set of entries, so there may be collisions
> during the lookup (of a potentially very large set of signatures) that were
> not anticipated in the perfect hash table layout (of the much smaller set
> of provided signatures). Perfect hashing works here as well, but it looses
> one of its main advantage over other hashing schemes. You then have to
> compare the entries exactly after the lookup in order to make sure that you
> didn't run into a collision, thus loosing time again that you just won with
> the hashing.
>
> But at least you only have to do exactly one such comparison, so that's an
> advantage over a hashing scheme that allows collisions also in the layout.
(Continue reading)


Gmane