1 Jan 05:29 2011

### Rolling window (moving average, moving std, and more)

Hi,

Implementing moving average, moving std and other functions working
over rolling windows using Python for loops is slow. Here is an
effective stride trick I learned from Keith Goodman's
<kwgoodman <at> gmail.com> Bottleneck code, generalized to arrays of
any dimension. This trick allows the loop to be performed in C code
and, hopefully in the future, using multiple cores.

import numpy as np

def rolling_window(a, window):
    """
    Make an ndarray with a rolling window of the last dimension

    Parameters
    ----------
    a : array_like
        Array to add rolling window to
    window : int
        Size of rolling window

    Returns
    -------
    Array that is a view of the original array with an added dimension
    of size `window`.

    Examples
    --------
    >>> x = np.arange(10).reshape((2, 5))
    >>> rolling_window(x, 3)
    array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
           [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

    Calculate rolling mean of last dimension:
    >>> np.mean(rolling_window(x, 3), -1)
    array([[ 1.,  2.,  3.],
           [ 6.,  7.,  8.]])
    """
    if window < 1:
        raise ValueError("`window` must be at least 1.")
    if window > a.shape[-1]:
        raise ValueError("`window` is too long.")
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

Using np.swapaxes(a, -1, axis), rolling aggregations over any axis can
be computed.

I submitted a pull request to add this to the stride_tricks module.

Erik
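One point worth making explicit: as_strided returns a view, so rolling_window itself allocates no new data buffer; only the reduction applied afterwards does. A quick check (np.shares_memory is assumed available, i.e. a reasonably recent numpy):

```python
import numpy as np

def rolling_window(a, window):
    # same stride trick as above: append a window axis as a pure view
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.arange(10).reshape((2, 5))
r = rolling_window(x, 3)
print(r.shape)                  # (2, 3, 3)
print(np.shares_memory(r, x))   # True: no data was copied
print(np.mean(r, -1))           # [[ 1.  2.  3.]
                                #  [ 6.  7.  8.]]
```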
1 Jan 12:23 2011

### Re: Optimization suggestion sought

----- Original Message -----
From: "Robert Bradshaw" <robertwb <at> math.washington.edu>
Sent: Wednesday, December 29, 2010 4:47 PM

[...]
>> Regarding Justin's suggestion, before trying Cython (which, according to
>> http://wiki.cython.org/tutorials/numpy , seems to require a bit of work
>> to
>> handle numpy arrays properly)
>
> Cython doesn't have to be that complicated. For your example, you just
> have to unroll the vectorization (and account for the fact that the
> result is mutated in place, which was your original goal).

Thanks, but the full de-vectorization forces one to give up any use of BLAS
(I suppose that for array products numpy relies on its routines). In my
tests, the speed is more or less the same as the original pure-numpy code
(which may be made less memory-hungry with the chunking suggested by
Josef).
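The chunking mentioned above can be sketched in pure numpy. The helper below is hypothetical (not code from this thread); it shows how evaluating an expression block-by-block with `out=` arguments keeps the temporaries small while still running numpy's C loops:

```python
import numpy as np

def eval_chunked(a, b, c, chunk=4096):
    # Evaluate a - b * c block-by-block along axis 0, reusing `out`
    # slices so that no full-size temporary array is ever allocated.
    out = np.empty_like(a)
    for start in range(0, a.shape[0], chunk):
        s = slice(start, start + chunk)
        np.multiply(b[s], c[s], out=out[s])    # out[s] = b[s] * c[s]
        np.subtract(a[s], out[s], out=out[s])  # out[s] = a[s] - out[s]
    return out

rng = np.random.RandomState(0)
a, b, c = rng.randn(3, 10000, 4)
res = eval_chunked(a, b, c)
print(np.allclose(res, a - b * c))  # True
```

The peak extra memory is one chunk-sized temporary instead of a full-size one, at the cost of a short Python loop over blocks.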

Instead, it would be nice to have a native function able to evaluate
arbitrary numpy expressions without converting the intermediate results to
Python objects (a sort of "better weave.blitz", able to understand slicing,
broadcasting rules etc.). That would give us the best of both worlds: code
execution at BLAS speed, and savings on unnecessary conversions and
temporary allocations. Such a "numpy calculator" could also be a simple
interpreter, avoiding the complexities and site dependencies deriving from
the use of a C compiler: it should build temporary C data structures for the
parameters in input, call the relevant C
ATLAS/BLAS/LAPACK functions in the right order (possibly allocating

1 Jan 12:40 2011

### Re: OS X binaries.

On Sat, Jan 1, 2011 at 5:44 AM, Gideon wrote:
I noticed that 1.5.1 was released, and sourceforge is suggesting I use
the package numpy-1.5.1-py2.6-python.org-macosx10.3.dmg.  However, I
have an OS X 10.6 machine.

Can/should I use this binary?

Yes you can. The naming scheme corresponds to the one used by Python itself on python.org. For 2.6 the ..macosx10.3.dmg works for all supported versions of OS X. For 2.7 you have the choice of two versions if you are on 10.6, depending on whether you want 32-bit or 64-bit.

Cheers,
Ralf

Should I just compile from source?
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

1 Jan 20:23 2011

### Re: Giving numpy the ability to multi-iterate excluding an axis

This thread is a bit old, but since it's not yet possible to use the C-API, is it possible to accomplish the same thing with the Python API?

On Tue, Dec 21, 2010 at 5:12 PM, Mark Wiebe wrote:
On Mon, Dec 20, 2010 at 1:42 PM, John Salvatier wrote:
A while ago, I asked a whether it was possible to multi-iterate over several ndarrays but exclude a certain axis(http://www.mail-archive.com/numpy-discussion <at> scipy.org/msg29204.html), sort of a combination of PyArray_IterAllButAxis and PyArray_MultiIterNew. My goal was to allow creation of relatively complex ufuncs that can allow reduction or directionally dependent computation and still use broadcasting (for example a moving averaging ufunc that can have changing averaging parameters). I didn't get any solutions, which I take to mean that no one knew how to do this.

I am thinking about trying to make a numpy patch with this functionality, and I have some questions: 1) How difficult would this kind of task be for someone with non-expert C knowledge and good numpy knowledge? 2) Does anyone have advice on how to do this kind of thing?

You may be able to do what you would like with the new iterator I've written.  In particular, it supports nesting multiple iterators by providing either pointers or offsets, and allowing you to specify any subset of the axes to iterate.  Here's how the code to do this in a simple 3D case might look, for making axis 1 the inner loop:

PyArrayObject *op[2] = {a, b};
npy_intp axes_outer[2] = {0, 2};
npy_intp *op_axes[2];
npy_intp axis_inner = 1;
npy_uint32 flags[2] = {0, 0}; /* per-operand flags, left implicit in the original snippet */
NpyIter *outer, *inner;
NpyIter_IterNext_Fn oiternext, iiternext;
npy_intp *ooffsets;
char **idataptrs;

op_axes[0] = op_axes[1] = axes_outer;
outer = NpyIter_MultiNew(2, op, NPY_ITER_OFFSETS,
                         NPY_KEEPORDER, NPY_NO_CASTING, flags, NULL, 2, op_axes, 0);
op_axes[0] = op_axes[1] = &axis_inner;
inner = NpyIter_MultiNew(2, op, 0,
                         NPY_KEEPORDER, NPY_NO_CASTING, flags, NULL, 1, op_axes, 0);

oiternext = NpyIter_GetIterNext(outer);
iiternext = NpyIter_GetIterNext(inner);

ooffsets = (npy_intp *)NpyIter_GetDataPtrArray(outer);
idataptrs = NpyIter_GetDataPtrArray(inner);

do {
    do {
        char *a_data = idataptrs[0] + ooffsets[0];
        char *b_data = idataptrs[1] + ooffsets[1];
        /* Do stuff with the data */
    } while (iiternext());
    NpyIter_Reset(inner);
} while (oiternext());

NpyIter_Deallocate(outer);
NpyIter_Deallocate(inner);

Extending to more dimensions, or making both the inner and outer loops have multiple dimensions, isn't too crazy.  Is this along the lines of what you need?

If you check out my code, note that it currently isn't exposed as NumPy API yet, but you can try a lot of things with the Python exposure.
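For what it's worth, this nested-iteration capability is exposed from Python in later numpy releases as np.nested_iters (assuming a numpy recent enough to have it). A minimal single-array sketch, with axes 0 and 2 in the outer loop and axis 1 in the inner loop:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Outer iterator walks the (axis 0, axis 2) positions; the inner
# iterator sweeps axis 1 at each outer position.
outer, inner = np.nested_iters(a, [[0, 2], [1]])

total = 0
for _ in outer:
    for x in inner:      # x is a 0-d view of one element
        total += x
print(total == a.sum())  # True: every element visited exactly once
```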

Cheers,
Mark

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

2 Jan 02:42 2011

### Arrays with aliased elements?

Is there any way, not involving compilation of C code, to define ndarrays
where some rows or columns share the same data buffers? For example,
something built by a hypothetical variant of the np.repeat() function, such
that, if a = array([2,3]), calling:

b = np.aliasedrepeat(a, [1, 2], axis=0)

would return in b:

array([[2, 3],
       [2, 3],
       [2, 3]])

...with the understanding that the three rows would actually share the same
data, so setting e.g.:

b[0,1] = 5

...would change b into:

array([[2, 5],
       [2, 5],
       [2, 5]])

In other words, something with a behaviour similar to a list of lists:

>>> a = [2,3]
>>> b = [a,a,a]
>>> b
[[2, 3], [2, 3], [2, 3]]
>>> b[0][1] = 5
>>> b
[[2, 5], [2, 5], [2, 5]]

This would save memory (and time spent in unnecessary copying) in some
applications with large arrays, and would make it possible to cope with the
current inability of weave.blitz to understand broadcasting rules, e.g. for
calculating outer products (I mentioned this in a previous thread).

Enzo
2 Jan 02:53 2011

### Re: Arrays with aliased elements?

import numpy

def repeat(arr, num):
    arr = numpy.asarray(arr)
    return numpy.ndarray(arr.shape + (num,), dtype=arr.dtype,
                         buffer=arr, strides=arr.strides + (0,))

There are limits to what these sort of stride tricks can accomplish,
but repeating as above, or similar, is feasible.
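To make the aliasing concrete, here is the function above made self-contained, plus a small check that writes really do propagate (the zero stride on the new axis makes every "copy" share the same element):

```python
import numpy

def repeat(arr, num):
    # Zero stride on the appended axis: all `num` copies alias one element.
    arr = numpy.asarray(arr)
    return numpy.ndarray(arr.shape + (num,), dtype=arr.dtype,
                         buffer=arr, strides=arr.strides + (0,))

a = numpy.array([2, 3])
b = repeat(a, 3)
print(b)         # [[2 2 2]
                 #  [3 3 3]]
b[0, 0] = 5      # writing one "copy" ...
print(a)         # [5 3]     ... changes the underlying element,
print(b[0])      # [5 5 5]   and therefore every aliased copy
```

Note this repeats along a new last axis; transposing with b.T gives the row-repeated layout Enzo asked for, still without any copy.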

On Jan 1, 2011, at 8:42 PM, Enzo Michelangeli wrote:

> Is there any way, not involving compilation of C code, to define
> ndarrays
> where some rows or columns share the same data buffers? For example,
> something built by a hypothetical variant of the np.repeat()
> function, such
> that, if a = array([2,3]), calling:
>
>   b = np.aliasedrepeat(x, [1, 2], axis=0)
>
> would return in b:
>
>   array([[2, 3],
>          [2, 3],
>          [2, 3]])
>
> ...with the understanding that the three rows would actually share
> the same
> data, so setting e.g.:
>
>   b[0,1] = 5
>
> ...would change b into:
>
>   array([[2, 5],
>          [2, 5],
>          [2, 5]])
>
> In other words, something with a behaviour similar to a list of lists:
>
>>>> a = [2,3]
>>>> b = [a,a,a]
>>>> b
> [[2, 3], [2, 3], [2, 3]]
>>>> b[0][1] = 5
>>>> b
> [[2, 5], [2, 5], [2, 5]]
>
> This would save memory (and time spent in unnecessary copying) in some
> applications with large arrays, and would allow to cope with the
> current
> inability of weave.blitz to understand broadcasting rules, e.g. for
> calculating outer products (I mentioned this in a previous thread).
>
> Enzo
>
2 Jan 03:08 2011

### Re: Arrays with aliased elements?

On Sat, Jan 1, 2011 at 19:42, Enzo Michelangeli <enzomich <at> gmail.com> wrote:
> Is there any way, not involving compilation of C code, to define ndarrays
> where some rows or columns share the same data buffers? For example,
> something built by a hypothetical variant of the np.repeat() function, such
> that, if a = array([2,3]), calling:
>
>   b = np.aliasedrepeat(x, [1, 2], axis=0)
>
> would return in b:
>
>   array([[2, 3],
>          [2, 3],
>          [2, 3]])
>
> ...with the understanding that the three rows would actually share the same
> data, so setting e.g.:
>
>   b[0,1] = 5
>
> ...would change b into:
>
>   array([[2, 5],
>          [2, 5],
>          [2, 5]])
>
> In other words, something with a behaviour similar to a list of lists:
>
>>>> a = [2,3]
>>>> b = [a,a,a]
>>>> b
> [[2, 3], [2, 3], [2, 3]]
>>>> b[0][1] = 5
>>>> b
> [[2, 5], [2, 5], [2, 5]]
>
> This would save memory (and time spent in unnecessary copying) in some
> applications with large arrays, and would allow to cope with the current
> inability of weave.blitz to understand broadcasting rules, e.g. for
> calculating outer products (I mentioned this in a previous thread).

See numpy.lib.stride_tricks for tools to do this, specifically the
as_strided() function. See numpy.broadcast_arrays() for the latter
functionality.
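A minimal sketch of the as_strided approach, producing exactly the aliased layout from the example above (a stride of 0 along the first axis makes every row view the same data):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([2, 3])
# Three "rows", but a zero stride along axis 0 means they all alias `a`.
b = as_strided(a, shape=(3, 2), strides=(0, a.strides[0]))
print(b)        # [[2 3]
                #  [2 3]
                #  [2 3]]
b[0, 1] = 5     # write through the view...
print(b)        # [[2 5]
                #  [2 5]
                #  [2 5]]
print(a)        # [2 5]  ...and the original array changes too
```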

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
2 Jan 08:01 2011

### Re: Arrays with aliased elements?

Thanks. Meanwhile, I had arrived to a solution similar to the one suggested
by Zachary:

>>> a = array([2,3])
>>> ndarray((3,a.shape[0]), strides=(0,a.itemsize), buffer=a, offset=0,
...         dtype=a.dtype)
array([[2, 3],
       [2, 3],
       [2, 3]])

...but I'd say that numpy.broadcast_arrays is the cleanest way of obtaining
pre-broadcasted views to pass to weave.blitz(). But alas, it appears that
blitz doesn't work well with such non-contiguous views:

tableau = tableau - tsb * pivb

...works, but:

weave.blitz('tableau = tableau - tsb * pivb')

...returns wrong results. And, of course, converting them to contiguous
through the array() function defeats the intended savings in memory and CPU
cycles...

Enzo

----- Original Message -----
From: "Robert Kern" <robert.kern <at> gmail.com>
To: "Discussion of Numerical Python" <numpy-discussion <at> scipy.org>
Sent: Sunday, January 02, 2011 10:08 AM
Subject: Re: [Numpy-discussion] Arrays with aliased elements?

> On Sat, Jan 1, 2011 at 19:42, Enzo Michelangeli <enzomich <at> gmail.com>
> wrote:
>> Is there any way, not involving compilation of C code, to define ndarrays
>> where some rows or columns share the same data buffers? For example,
>> something built by a hypothetical variant of the np.repeat() function,
>> such
>> that, if a = array([2,3]), calling:
>>
>> b = np.aliasedrepeat(x, [1, 2], axis=0)
>>
>> would return in b:
>>
>> array([[2, 3],
>> [2, 3],
>> [2, 3]])
>>
>> ...with the understanding that the three rows would actually share the
>> same
>> data, so setting e.g.:
>>
>> b[0,1] = 5
>>
>> ...would change b into:
>>
>> array([[2, 5],
>> [2, 5],
>> [2, 5]])
>>
>> In other words, something with a behaviour similar to a list of lists:
>>
>>>>> a = [2,3]
>>>>> b = [a,a,a]
>>>>> b
>> [[2, 3], [2, 3], [2, 3]]
>>>>> b[0][1] = 5
>>>>> b
>> [[2, 5], [2, 5], [2, 5]]
>>
>> This would save memory (and time spent in unnecessary copying) in some
>> applications with large arrays, and would allow to cope with the current
>> inability of weave.blitz to understand broadcasting rules, e.g. for
>> calculating outer products (I mentioned this in a previous thread).
>
> See numpy.lib.stride_tricks for tools to do this, specifically the
> as_strided() function. See numpy.broadcast_arrays() for the latter
> functionality.
>
>
> --
> Robert Kern
>
3 Jan 07:46 2011

### Prime size FFT: bluestein transform vs general chirp/z transform ?

Hi,

I finally took the time to clean up my code to speed up prime-size FFTs
(which use an O(N^2) algorithm in both numpy and scipy). The code is here:
https://github.com/cournape/numpy/tree/bluestein (most of the code is
tests, because numpy.fft had almost none).

Bottom line: it is used only for prime sizes, and is faster than the
current code for complex transforms of size > 500. Because of the Python
plus inherent Bluestein overhead, this is mostly useful for "long" ffts,
where the speedup is significant (already a 100x speedup for prime
sizes ~ 50000).

- the overhead is pretty significant (on my machine, the Bluestein
transform is slower for prime sizes < 500)
- it could be used as such for real transforms, but the overhead
would be even more significant (there is no Bluestein transform for
real transforms, so one needs to re-express real transforms in terms
of complex ones, doubling the overhead). There are several
alternatives to make things faster (a Rader-like transform, as used by
fftw), but I think this would be quite hard to do in python without
significant slowdown, because the code cannot be vectorized.
- one could also decide to provide a chirp-z transform, of which the
Bluestein transform is a special case. Maybe this is better suited to
scipy?
- more generic code will require a few simple (but not trivial)
number-theoretic functions (find prime factors, find a generator of the
Z/nZ group with n prime, etc.). Where should I put those?
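For reference, the core of the Bluestein trick uses nk = (n^2 + k^2 - (k-n)^2)/2 to turn the DFT into a convolution with a chirp, which any padded power-of-two FFT can evaluate. A schematic numpy reimplementation (not the code in the branch above):

```python
import numpy as np

def bluestein_fft(x):
    # DFT of arbitrary length N via Bluestein's chirp-z trick:
    #   X_k = w_k * sum_n (x_n * w_n) * conj(w)_{k-n},  w_n = exp(-i*pi*n^2/N)
    x = np.asarray(x, dtype=complex)
    N = x.shape[0]
    n = np.arange(N)
    w = np.exp(-1j * np.pi * n * n / N)       # the chirp
    M = 1 << (2 * N - 2).bit_length()         # power of two >= 2N-1
    a = np.zeros(M, dtype=complex)
    a[:N] = x * w
    b = np.zeros(M, dtype=complex)
    b[:N] = np.conj(w)
    if N > 1:
        b[-(N - 1):] = np.conj(w[1:][::-1])   # wrap the negative indices
    # Linear convolution of a and b via the padded circular convolution
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return w * conv[:N]
```

Against np.fft.fft this agrees to machine precision for prime N, and the cost is three padded power-of-two FFTs instead of the O(N^2) direct sum.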

cheers,

David
3 Jan 11:13 2011

### Re: Rolling window (moving average, moving std, and more)

Hi Erik,
This is really neat! Do I understand correctly that by "stride tricks"
you mean that your rolling_window is _not_ allocating any new memory?
IOW, if I have a large array using 500MB of memory, say float32 of
shape (125, 1000, 1000), and I want a rolling window of size 11 over the
last axis, what would the peak memory usage of that operation be?
How about renaming the option `window` to `window_size`? (At first I was
thinking of things like hamming and hanning windows...)

Thanks,
Sebastian Haase

On Sat, Jan 1, 2011 at 5:29 AM, Erik Rigtorp <erik <at> rigtorp.com> wrote:
> Hi,
>
> Implementing moving average, moving std and other functions working
> over rolling windows using python for loops are slow. This is a
> effective stride trick I learned from Keith Goodman's
> <kwgoodman <at> gmail.com> Bottleneck code but generalized into arrays of
> any dimension. This trick allows the loop to be performed in C code
> and in the future hopefully using multiple cores.
>
> import numpy as np
>
> def rolling_window(a, window):
>    """
>    Make an ndarray with a rolling window of the last dimension
>
>    Parameters
>    ----------
>    a : array_like
>        Array to add rolling window to
>    window : int
>        Size of rolling window
>
>    Returns
>    -------
>    Array that is a view of the original array with a added dimension
>    of size w.
>
>    Examples
>    --------
>    >>> x=np.arange(10).reshape((2,5))
>    >>> rolling_window(x, 3)
>    array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
>           [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])
>
>    Calculate rolling mean of last dimension:
>    >>> np.mean(rolling_window(x, 3), -1)
>    array([[ 1.,  2.,  3.],
>           [ 6.,  7.,  8.]])
>
>    """
>    if window < 1:
>        raise ValueError, "`window` must be at least 1."
>    if window > a.shape[-1]:
>        raise ValueError, "`window` is too long."
>    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
>    strides = a.strides + (a.strides[-1],)
>    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
>
>
> Using np.swapaxes(-1, axis) rolling aggregations over any axis can be computed.
>
> I submitted a pull request to add this to the stride_tricks module.
>
> Erik
