Jieyun Fu | 2 Jan 2012 15:46

Why is one function slower than the other, despite that "cimport numpy" is used?

Hi all,

I am comparing the performance between these two short functions (I
understand both functions can be directly vectorized using
numpy.searchsorted, but I just wanted to compare them nonetheless),
and surprisingly g_cython() is significantly slower than
g_less_cython(). Why is that? I use cimport numpy to allow Cython to
index the data in arrays a and b more efficiently, so shouldn't
g_cython() definitely be faster?

I attach the source code and test cases below. Thanks!

def g_cython(np.ndarray[np.int_t, ndim = 1] a, percentile):
    cdef int i
    cdef int n = len(a)
    cdef np.ndarray[np.int_t, ndim = 1] b = np.zeros(n)
    for i in xrange(n):
        b[i] = np.searchsorted(percentile, a[i])
    return b

def g_less_cython(a, percentile):
    cdef int i
    b = np.zeros_like(a)
    for i in xrange(len(a)):
        b[i] = np.searchsorted(percentile, a[i])
    return b

Test cases:

In [1]: import numpy as np

Robert Bradshaw | 2 Jan 2012 20:51

Re: access to underlying Python/C API?

Much of Python's C API can be cimported from
https://github.com/cython/cython/tree/master/Cython/Includes/cpython ,
and you can write your own. Most of the time you shouldn't need to
interact with the C API directly, but this seems like a valid use case
(though something we might want to consider supporting more natively
in the future).

- Robert

On Fri, Dec 23, 2011 at 12:22 AM, Tay Ray Chuan <rctay89 <at> gmail.com> wrote:
> Hi,
>
> I wish to write code to access the Python's C API functions, how do I do
> that? Or do I have to spin my own "Python.pxd" with the signatures of the
> functions that I need to call?
>
> Background: I need to interact with file objects (PyFileObject). For
> example,
>
> cdef class Foo(object):
>   cdef object _file
>
>   def __init__(self, file):
>     self._file = file
>
>   def read10(self):
>     cdef FILE *fp
>     fp = PyFile_AsFile(self._file)
>     fread(fp...)
>

Rajeev Singh | 3 Jan 2012 12:49

spam on cython wiki

Hi,

I just found a spam page on the Cython wiki:

http://wiki.cython.org/Triactol%20Reviews

Someone should delete it. I don't know how to do it!

Rajeev

Stefan Behnel | 3 Jan 2012 13:12

Re: access to underlying Python/C API?

Tay Ray Chuan, 23.12.2011 09:22:
> Background: I need to interact with file objects (PyFileObject). For
> example,
>
> cdef class Foo(object):
>    cdef object _file
>
>    def __init__(self, file):
>      self._file = file
>
>    def read10(self):
>      cdef FILE *fp
>      fp = PyFile_AsFile(self._file)
>      fread(fp...)

Note that this is not portable to Py3. The "file" type doesn't have a 
(usable) C-API there.
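
One portable alternative, sketched here under the assumption that going
through the object's own read() method is acceptable for the use case.
The class below mirrors the Foo example in plain Python; it is an
illustration, not the original poster's code, and it gives up the
direct FILE* access that PyFile_AsFile provided:

```python
import io


class Foo(object):
    """Wraps a file-like object and reads through its Python-level API."""

    def __init__(self, file):
        # Any object with a read() method works: real files, io.BytesIO, ...
        self._file = file

    def read10(self):
        # Ask the object itself for up to 10 bytes instead of unwrapping
        # a C-level FILE* from it.
        return self._file.read(10)


# Usage with an in-memory binary stream standing in for a real file.
foo = Foo(io.BytesIO(b"0123456789abcdef"))
first = foo.read10()
second = foo.read10()
```

Because it only relies on read(), the same code should run unchanged on
Py2 and Py3, at the cost of one Python method call per read.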

Stefan

Lars Buitinck | 3 Jan 2012 14:01

Re: spam on cython wiki

2012/1/3 Rajeev Singh <rajs2010 <at> gmail.com>:
> I just found a spam on cython wiki -
>
> http://wiki.cython.org/Triactol%20Reviews
>
> Someone should delete it. I don't know how to do it!

You can make an account by clicking "login", then go to the page and
select "Delete page" from the "More actions" drop-down menu.
Apparently, there's a whole lot of spam on this wiki.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

Chris Barker | 3 Jan 2012 18:37

Re: Why is one function slower than the other, despite that "cimport numpy" is used?

On Mon, Jan 2, 2012 at 6:46 AM, Jieyun Fu <jieyunfu <at> gmail.com> wrote:
> I am comparing the performance between these two short functions (I
> understand both functions can be directly vectorized using
> numpy.searchsorted, but I just wanted to compare them nonetheless),
> and surprisingly g_cython() is significantly slower than
> g_less_cython(). Why is that?

I'm not sure why it is so much slower, but I think I know why it isn't
any faster:

>  I uses cimport numpy to allow cython to
> index the data in array a and b more efficiently,so g_cython() should
> definitely be faster?

Cython knows that a and b are ndarrays, but the real work here is
being done by np.searchsorted -- Cython does not reach into numpy
functions and re-write them, so you are getting the same searchsorted
in both cases.

Also, am I missing something? searchsorted assumes that the input
array is already sorted -- it doesn't look like you are doing that.

If you really need to speed it up, you'd need to write your own
version of searchsorted in Cython -- though I doubt you'd get much,
it's already a C loop.
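
For reference, a minimal sketch of what such a hand-written version
computes, in plain Python so it runs without a build step. The names
searchsorted_left and g_pure are made up for this illustration, and it
assumes the default side='left' behaviour of np.searchsorted; a Cython
version would add cdef types to the same loop:

```python
import bisect


def searchsorted_left(sorted_values, x):
    """Return the first index i such that sorted_values[i] >= x."""
    lo = 0
    hi = len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] < x:
            lo = mid + 1   # x belongs strictly to the right of mid
        else:
            hi = mid       # mid is still a candidate insertion point
    return lo


def g_pure(a, percentile):
    # Same per-element loop as g_cython / g_less_cython, without numpy.
    return [searchsorted_left(percentile, x) for x in a]


# Small stand-in data: percentile must already be sorted.
percentile = list(range(0, 101, 10))   # 0, 10, ..., 100
a = [0, 5, 10, 99, 100, 250]
result = g_pure(a, percentile)

# Cross-check against the standard library's binary search.
expected = [bisect.bisect_left(percentile, x) for x in a]
```

Here result and expected agree, since bisect.bisect_left uses the same
"first index not less than x" rule.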

HTH,
  -Chris

> I attach the source code and test cases below. Thanks!

Chris Barker | 3 Jan 2012 19:11

Re: Re: ctypedef from a pxd file.

On Fri, Dec 30, 2011 at 2:44 PM, Chris Barker <chris.barker <at> noaa.gov> wrote:
>> Can you open the glu.h header in your system? I bet the typedef is
>> GLUfuncptr and not _GLUfuncptr.

Thanks for the hint. It turns out that various glu implementations
handle the callback pointers differently. Some use a typedef to
"_GLUfuncptr". However, others use other things.

The glu.h that I was pointing to had no such typedef, but was rather
using GLvoid instead.

However, other glu.h files on my system do other things:

I found "_GLUfuncptr" in the glu.h files that come with X11, but not
in the OS-X SDKs. They have these calls:

typedef GLvoid (*_GLUfuncptr)();
typedef GLvoid (*_GLUfuncptr)(GLvoid);

Those all use: ""

typedef GLvoid (*_GLUfuncptr)();
typedef GLvoid (*_GLUfuncptr)(GLvoid);

So the question for me now is -- how do I fix this? I need this to
work on other platforms, so I'd like a platform independent solution
-- though not easy to check what's up with Windows right now.

Poking around the web, I saw this quote:


Jieyun Fu | 3 Jan 2012 19:18

Re: Why is one function slower than the other, despite that "cimport numpy" is used?

Hi Chris, 


Thank you for your reply. While I understand that np.searchsorted takes most of the time, I imagine that adding the numpy typing also makes the indexing more efficient. In my experience, that should account for quite a bit of the time for an array of 10^6 elements.

Meanwhile, I made sure that the "percentile" array that is passed in is sorted, so the np.searchsorted part should be correct. But I don't think that should affect the performance anyway.

Thanks

On Tue, Jan 3, 2012 at 12:37 PM, Chris Barker <chris.barker <at> noaa.gov> wrote:
On Mon, Jan 2, 2012 at 6:46 AM, Jieyun Fu <jieyunfu <at> gmail.com> wrote:
> I am comparing the performance between these two short functions (I
> understand both functions can be directly vectorized using
> numpy.searchsorted, but I just wanted to compare them nonetheless),
> and surprisingly g_cython() is significantly slower than
> g_less_cython(). Why is that?

I'm not sure why it is so much slower, but I think I know why it isn't
any faster:

>  I uses cimport numpy to allow cython to
> index the data in array a and b more efficiently,so g_cython() should
> definitely be faster?

Cython knows that a and b are ndarrays, but the real work here is
being done by np.searchsorted -- Cython does not reach into numpy
functions and re-write them, so you are getting the same searchsorted
in both cases.

Also, am I missing something? searchsorted assumes that the input
array is already sorted -- it doesn't look like you are doing that.

If you really need to speed it up, you'd need to write your own
version of searchsorted in Cython -- though I doubt you'd get much,
it's already a C loop.

HTH,
 -Chris


> I attach the source code and test cases below. Thanks!
>
> def g_cython(np.ndarray[np.int_t, ndim = 1] a, percentile):
>    cdef int i
>    cdef int n = len(a)
>    cdef np.ndarray[np.int_t, ndim = 1] b = np.zeros(n)
>    for i in xrange(n):
>        b[i] = np.searchsorted(percentile, a[i])
>    return b
>
>
> def g_less_cython(a, percentile):
>    cdef int i
>    b = np.zeros_like(a)
>    for i in xrange(len(a)):
>        b[i] = np.searchsorted(percentile, a[i])
>    return b
>
>
> Test cases:
>
> In [1]: import numpy as np
>
> In [2]: n = 1000000
>
> In [3]: a = np.random.random_integers(0,10000000,n)
>
> In [4]: percentile = np.linspace(0, 10000000, 101 )
>
> In [5]: percentile = np.asarray(percentile, dtype = 'int' )
>
> In [6]: import test_time_c
>
> In [7]: timeit test_time_c.g_less_cython(a, percentile)
> 1 loops, best of 3: 1.58 s per loop
>
> In [8]: timeit test_time_c.g_cython(a, percentile)
> 1 loops, best of 3: 3.63 s per loop
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker <at> noaa.gov

Wonjun, Choi | 4 Jan 2012 05:06

what is the best way to pass c, c++ multi-dimensional array to numpy in cython?

Hello,

What is the best way to pass a C or C++ multi-dimensional array to numpy
in Cython?
I have found several ways to do this, but I want to do it in Cython:
http://stackoverflow.com/questions/5862915/passing-numpy-arrays-to-a-c-function-for-input-and-output
http://stackoverflow.com/questions/5748566/array-of-pointers-from-c-to-numpy-throught-cython
http://stackoverflow.com/questions/4101536/multi-dimensional-char-array-array-of-strings-in-python-ctypes
=> I wonder whether I should use ctypes.

I also found this thread, which is related to this issue:
http://groups.google.com/group/cython-users/browse_thread/thread/97f905aeb6a52c71
In it, Dag Sverre Seljebotn mentioned PyArray_DATA (in the NumPy C API
section on docs.scipy.org) and jasonmccampbell's GitHub (there are
fwrap, numpy-refactor, numpy-refactor-sprint and scipy-refactor there;
I am not sure what the difference among these is, or which one I
should look into).

I want to find the simplest way, one that does not use free, malloc,
memcpy, or the numpy C API.

Wonjun, Choi

mark florisson | 4 Jan 2012 08:17

Re: what is the best way to pass c, c++ multi-dimensional array to numpy in cython?

On 4 January 2012 05:06, Wonjun, Choi <wonjunchoi001 <at> gmail.com> wrote:
> hello,
>
> what is the best way to pass c, c++ multi-dimensional array to numpy
> in cython?
> I have found several way to do this but I want to do this in cython
> http://stackoverflow.com/questions/5862915/passing-numpy-arrays-to-a-c-function-for-input-and-output
> http://stackoverflow.com/questions/5748566/array-of-pointers-from-c-to-numpy-throught-cython
> http://stackoverflow.com/questions/4101536/multi-dimensional-char-array-array-of-strings-in-python-ctypes
> => I wonder I should use ctypes..
>
> and I found this http://groups.google.com/group/cython-users/browse_thread/thread/97f905aeb6a52c71
> which is related to this issue.
> in this article, Dag Sverre Seljebotn mentioned that PyArray_DATA on
> docs.scipy.org in the NumPy C API section and jasonmccampbell's
> github(there are fwrap, numpy-refactor, numpy-refactor-sprint, scipy-
> refactor and I am not sure what is difference among these. and which
> part I should look into).
>
> I want to find the simplest way not using free, malloc, memcpy, numpy
> C API.
>
> Wonjun, Choi

Search the cython-users mailing list for "memoryview numpy" and you
should find a number of relevant discussions.

