Neil Girdhar | 22 Oct 19:32 2014

Bug in 1.9?


Is this desired behaviour or a regression or a bug?


NumPy-Discussion mailing list
NumPy-Discussion <at>
Daniel Hyams | 22 Oct 05:52 2014

numpy log and exceptions

I would have thought that this snippet would raise an exception:

import numpy
a = numpy.array([1.0,0.0,-1.0])
b = numpy.log(a)

I get as a result (in b): [0, -Inf, NaN]

It's basically the same issue as:

Except that I have explicitly set the error flags to raise exceptions.  It 
works fine for sqrt(), but not for log().  I've checked numpy 1.4.0 and 
1.7.1 and both have the same behavior.

Is there a way to force the log (and log10) function to raise an exception 
on invalid input?
Matt Gregory | 22 Oct 00:18 2014

return unique combinations of stacked arrays - slow

I'm trying to create an output array of integers where each value 
represents a unique combination of values from (1..n) input arrays.  As 
a simple example, given these three arrays:

a = np.array([0, 1, 2, 3, 0, 1, 2, 3])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
c = np.array([0, 1, 1, 0, 0, 1, 0, 1])

I want an output array that holds 'codes' for the unique combinations 
and a dictionary that holds the unique combinations as keys and codes as 

out = np.array([0, 1, 2, 3, 0, 1, 4, 5])
out_dict = {
   (0, 0, 0): 0,
   (1, 1, 1): 1,
   (2, 0, 1): 2,
   (3, 1, 0): 3,
   (2, 0, 0): 4,
   (3, 1, 1): 5,

An additional constraint is that I'm bringing in the (a, b, c) arrays a 
chunk at a time due to memory limits (ie. very large rasters) and so I 
need to retain the mapping between chunks.

My current (very naive and pretty slow) implementation in loop form is:

out_dict = {}
out = np.zeros_like(a)
count = 0
stack = np.vstack((a, b, c)).T
for (i, arr) in enumerate(stack):
     t = tuple(arr)
     if t not in out_dict:
         out_dict[t] = count
         count += 1
     out[i] = out_dict[t]

Thanks for help,
Mark Mikofski | 21 Oct 18:43 2014

MKL not available as separate download since 10/16/2014

the old MKL links point to 404.html

the new MKL link has this link in the right sidebar:
Changes to Intel® Software Development Products | Intel® Developer Zone
Intel® Parallel Studio XE simplification Intel® Parallel Studio XE has been simplified to three editions, Composer, Professional and Cluster. Here’s a table showing the old and new simplified names. More information about Intel Parallel Studio editions is here.
Preview by Yahoo

which has the following statement:

Ingredient product availability changes

New licenses of the following ingredient products will be available only in suites going forward. Current licensees of these ingredient products can continue to renew support maintenance for their existing license or upgrade to a suite. For more info about the ingredients below, click the product name below:
“As I breath in, I calm my body, as I breath out, I smile” - Thich_Nhat_Hanh
NumPy-Discussion mailing list
NumPy-Discussion <at>
Nathaniel Smith | 19 Oct 03:17 2014


Okay! I think I now actually understand what was going on with
np.gradient! The discussion has been pretty confused, and I'm worried
that what's in master right now is not the right solution, which is a
problem because we're supposed to cut 1.9.1 tomorrow.


np.gradient computes gradients using a finite difference method.
Wikipedia has a good explanation of how this works:
The key point is that there are multiple formulas one can use that all
converge to the derivative, e.g.

   (f(x + h) - f(x)) / h


   (f(x + h) - f(x - h)) / 2h

The first is the textbook definition of the derivative. As h -> 0, the
error in that approximation shrinks like O(h). The second formula also
converges to the derivative as h -> 0, but it converges like O(h^2),
i.e., much faster. And there's are many many formulas like this with
different trade-offs:
In practice, given a grid of values of f(x) and a grid stepsize of h,
all of these formulas come down to doing a simple convolution of the
data with certain coefficients (e.g. [-1/h, 1/h] or [-1/2h, 0, 1/2h]
for the two formulas above).

Traditionally np.gradient has used the second formula above, with its
quadratically diminishing error term, for interior points in the grid.
For points on the boundary of the grid, though, this formula has a
problem, b/c it requires you look at points to both the left and the
right of the current point -- but for boundary points one of these
doesn't exist. In such situations np.gradient has traditionally used
the first formula instead (the "forward (or backward) finite
difference approximation with linear error", where
"forward"/"backward" means that it works on the boundary). As the web
page linked above shows, though, there's an easy alternative formula
that works on

The change in numpy 1.9.0 that has caused all the fuss, is that we
switched to using the quadratically accurate approximation instead of
the linearly accurate approximation for boundary points.

This had two effects:

1) it produced (hopefully small) changes in the calculation for the
boundaries. In general these should have made the calculations more
accurate, but apparently this did cause problems for some people who
wanted perfectly reproducible output.

2) it tickled a nasty bug in how gradient and masked arrays work
together. It turns out gradient applied to masked arrays has always
been seriously buggy; there's some horribleness in the masked array
implementation that means the output array that np.gradient is writing
into ends up sharing a mask array with the input. So as we go along
calculating the output, we also mutate the input. This can definitely
corrupt the input, for example here:
...and I strongly suspect it's possible to find cases where it means
you get the wrong answer.

For mysterious reasons involving the order in which values were
calculated or something, the version of np.gradient in 1.9.0 tickled
this bug in a much worse manner, with early mutations to the input
array ended up affecting later calculations using the same input
array, causing cascading errors. This is the cause of the massive
problems that matplotlib encountered in their test suite:

There's a proposed fix for the masked array issues at:
For the reasons described above, I would be very wary about using
*any* currently released version of np.gradient on masked arrays. A
workaround would be to make a local copy of the definition of
np.gradient, and immediately after the 'out' array is allocated do
'out.mask = out.mask.copy()'.

Once this bug is fixed, the "new" gradient code produces results which
are identical to the "old" gradient code on the interior; only the
boundary is different.


So here are my concerns:

- We decided to revert the changes to np.gradient in 1.9.1 (at least
by default). I'm not sure how much of that decision was based on the
issues with masked arrays, though, which turns out to be a different
issue entirely. The previous discussion conflated the two issues

- We decided to gate the more-accurate boundary calculation behind a
kwarg called "edge_order=2". AFAICT now that I actually understand
what this code is doing, this is a terrible name -- we're using a
3-coefficient kernel to compute a quadratically accurate approximation
to a first-order derivative. There probably exist other kernels that
are also quadratically accurate. "Order 2" simply doesn't refer to
this in any unique or meaningful way. And it will be even more
confusing if we ever add the option to select which kernel to use on
the interior, where "quadratically accurate" is definitely not enough
information to uniquely define a kernel. Maybe we want something like
edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not

- If edge_order=2 escapes into a release tomorrow then we'll be stuck with it.

Some possible options for 1.9.1:
- Just revert the entire np.gradient code to match 1.8, and put off a
real fix until 1.9. This at least avoids getting stuck with a poor

- Put the masked array bug fix into 1.9.1, and leave the gradient code
the same as 1.9.0; put off designing a proper API for selecting the
old behavior for 1.9.2. (Or 1.10 I guess.) This doesn't solve the pure
reproduceability problems, but it might be enough to make matplotlib
work again at least?

- Delay 1.9.1 to rehash the issues and make a decision on what we want
to support long-term.



Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
Artur Bercik | 18 Oct 07:58 2014

Extract Indices of Numpy Array Based on Given Bit Information

Dear Python and Numpy Users:

My data are in the form of '32-bit unsigned integer' as follows:

myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32)

I want to get the index of my data where the following occurs:

Bit No. 0–1
Bit Combination: 00

How can I do it? I heard this type of problem first time, please help me.

NumPy-Discussion mailing list
NumPy-Discussion <at>
Stephan Hoyer | 18 Oct 06:56 2014

Add an axis argument to generalized ufuncs?

Yesterday I created a GitHub issue proposing adding an axis argument to numpy's gufuncs:

I was told I should repost this on the mailing list, so here's the recap:

I would like to write generalized ufuncs (probably using numba), to create fast functions such as nanmean (signature '(n)->()') or rolling_mean (signature '(n),()->(n)') that take the axis along which to aggregate as a keyword argument, e.g., nanmean(x, axis=0) or rolling_mean(x, window=5, axis=0).

Of course, I could write my own wrapper for this that reorders dimensions using swapaxes or transpose. But I also think that an "axis" argument to allow for specifying the core dimensions of gufuncs would be more generally useful, and we should consider adding it to numpy.

Nathaniel and Jaime added some good points, noting that such an axis argument should cleanly handle multiple input and output arguments and have a plan for handling optional dimensions (e.g., (m?,n),(n,p?)->(m?,p?) for the new dot).

Here are my initial thoughts on the syntax:

(1) Generally speaking, I think the "nested tuple" syntax (e.g., axis=[(0, 1), (2, 3)]) would be most congruous with the axis arguments numpy already supports.

(2) For gufuncs with simpler signatures, we should support supplying an integer or an unnested tuple, e.g.,
- axis=0 for (n)->()
- axis=(0, 1) for (n)(m)->() or (n,m)->()
- axis=[(0, 1), 2] for (n,m),(o)->().

(3) If we require a full axis specification for core dimensions, we could use the axis argument for unambiguous control of optional core dimensions: e.g., axis=(0, 1) would indicate that you want the "vectorized inner product" version of the new dot operator, rather than matrix multiplication, and axis=[(-2, -1), -1] would mean that you want the "vectorized matrix-vector product". This seems relatively tidy, although I admit I am not convinced that optional core dimensions are necessary.

(4) We can either include the output axis as part of the signature, or add another argument "axis_out" or "out_axis". I think prefer the separate argument, particularly if we require "axis" to specify all core dimensions, which may be a good idea even if we don't use "axis" for controlling optional core dimensions.

NumPy-Discussion mailing list
NumPy-Discussion <at>
Jaime Fernández del Río | 17 Oct 05:43 2014

Passing multiple output arguments to ufunc

There is an oldish feature request in github
(, complaining about it not
being possible to pass multiple output arguments to a ufunc using
keyword arguments.

You can pass them all as positional arguments:

>>> out1 = np.empty(1)
>>> out2 = np.empty(1)
>>> np.modf([1.333], out1, out2)
(array([ 0.333]), array([ 1.]))

You can also pass the first as a kwarg if you leave the others unspecified:

>>> np.modf([1.333], out=out1)
(array([ 0.333]), array([ 1.]))

You can also use None in a positional argument to leave some of the
output arguments unspecified:

>>> np.modf([1.3333], None, out2)
(array([ 0.3333]), array([ 1.]))

But you cannot do something like

>>> np.modf([1.333], out=(out1, out2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: return arrays must be of ArrayType

Would this behavior make sense? The idea would be to allow a tuple as
a valid input for the 'out=' kwarg. It would have to have a length
exactly matching the number of output arguments, and its items would
have to be either arrays or None.

For backwards compatibility we probably should still allow a single
array to mean the first output argument, even if the ufunc has
multiple outputs.

Any other thoughts?



( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
planes de dominación mundial.
NumPy-Discussion mailing list
NumPy-Discussion <at>
Nicola | 15 Oct 16:21 2014

Endianness not detected in OS X Lion

I am trying to install numpy 1.9.0 on OS X Lion 10.7.5 and I get the 
same error as reported in this old thread:

that is,

numpy/core/src/npymath/npy_math_private.h:78:3: error: conflicting types 
for 'ieee_double_shape_type'
 } ieee_double_shape_type;
numpy/core/src/npymath/npy_math_private.h:64:3: note: previous 
declaration of 'ieee_double_shape_type' was here
 } ieee_double_shape_type;
In file included from numpy/core/src/npymath/npy_math.c.src:56:0:
numpy/core/src/npymath/npy_math_private.h:78:3: error: conflicting types 
for 'ieee_double_shape_type'
 } ieee_double_shape_type;
numpy/core/src/npymath/npy_math_private.h:64:3: note: previous 
declaration of 'ieee_double_shape_type' was here
 } ieee_double_shape_type;

no matter whether I use the system's clang or gcc 4.9 (installed with 
Homebrew). Is there a workaround for this?

Neal Becker | 14 Oct 21:36 2014

array indexing question

I'm using np.nonzero to construct the tuple:
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]), array([1, 3, 5, 7, 2, 3, 6, 7, 4, 
5, 6, 7]))

Now what I want is the 2-D index array:


Any ideas?


-- Those who don't understand recursion are doomed to repeat it
Fadzil Mnor | 14 Oct 12:50 2014

numpy.mean slicing in a netCDF file

Hi all,
I wrote a script and plot monthly mean zonal wind (from a netcdf file names and I'm not sure if I'm doing it correctly. What I have:

#this part calculates mean values for january only, from 1980-2010; thus the index looks like this 396:757:12

def ujan():
    f = nc.Dataset('~/data/ncep/')
    u10_1 = f.variables['uwnd']
    u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)
    return u10_2

uJan = ujan() #calling function

#this part is only to define lon, lat and level 
q = nc.Dataset('~/data/ncep/') 

#for some reason I need to define this unless it gave error "length of x must be number of column in z"


#begin plotting

fig = plt.figure(figsize=(11, 8))
ax = fig.add_subplot(111)
ax.axis([97.5, 105., 1000., 10.])
ax.tick_params(direction='out', which='both')
ax.set_xlabel('Lon (degrees)')
ax.set_ylabel('Pressure (mb)')
ax.set_xticks(np.arange(97.5, 105., .5))
ax.set_yticks([1000, 700, 500, 300, 100, 10])
cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic')
plt.title('Zonal winds average (Jan, 1981-2010)')
cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) 
plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight')

the result is attached.
I have no idea how to confirm the result (at least until this email is written) , but I believe the lower altitude values should be mostly negative, because over this region, the zonal wind are usually easterly (thus,negative values), but I got positive values.

Put the information above aside, I just want to know if my slicing in the ujan() function is correct. If it is, then, there must be nothing wrong(except my above mentioned assumption).
The data file dimension is:

This part:
u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)
The line above will calculate the mean of zonal wind (uwnd) in a range of time index 396 to 757 for each year (january only), for all vertical level, at latitude index 38 (5 N) and in between longitude index 39 to 43 (97.5E-105E).
I assume it will calculate a 30-year average of zonal wind for january only.
Is this correct?

Thank you.

NumPy-Discussion mailing list
NumPy-Discussion <at>