Julian Taylor | 18 Jul 16:13 2014

proposal: new commit guidelines for backportable bugfixes

hi,
I have been doing a lot of backporting for the last few bugfix
releases and noticed that our current approach of committing to master
and cherry-picking is not so good for the git history.
When cherry picking a bugfix from master to a maintenance branch, both
branches contain a commit with the same content, but git knows of no
relation between them. This causes unnecessary merge conflicts when
cherry picking two changes that modify the same file. The git version
(1.9.1) I am using is not smart enough to figure out that the changesets
in both leaf commits are the same.
Additionally the output of `git log maintenance/1.9.x..master` becomes
very large as all already backported issues appear again in master.
[0]

To help with this I want to propose new best practices for pull
requests of bugfixes suitable for backporting.
Instead of basing the bugfix on the head commit of master, base it
on the merge base between master and the latest maintenance branch.
This allows merging the PR into both master and the maintenance branch
without pulling in any extra changes from either branch.
Then both branches contain the same commit, git's automerging can
work better, and git log will only show the commits that really are
unique to one branch or the other.

In practice this is very simple. You can still develop your bugfix on
master, but before you push it you just run:

git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^
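
The rewritten commit then sits directly on top of the merge base, so the
same commit can be merged into both branches. A sketch (assuming the fix
lives on a branch named bugfix-xyz):

git checkout maintenance/1.9.x && git merge bugfix-xyz
git checkout master && git merge bugfix-xyz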


josef.pktd | 18 Jul 13:07 2014

problems with mailing list ?

Are there problems with sending out messages on the mailing list?

I'm getting some replies without the original messages, and in some threads I don't get the replies, so I'm missing part of the discussion.


Josef
Robert Lupton the Good | 17 Jul 15:48 2014

Re: Numpy BoF at SciPy 2014 - quick report

Having just re-read the PEP I'm concerned that this proposal leaves at least one major (?) trap for naive
users, namely
	x = np.array([1, 10])
	print(x.T @ x)
which will print 101, not [[1, 10], [10, 100]].

Yes, I know why this is happening but it's still a problem -- the user said, "I'm thinking matrices" when they
wrote @, but the x.T had done the "wrong" thing before the @ kicked in.  And yes, a savvy user would have written
x = np.array([[1, 10]]) (but then np.dot(x, x.T) isn't a scalar).
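
For concreteness, here is the same trap spelled out with np.dot (a quick
sketch; the @ operator won't exist until Python 3.5):

import numpy as np

x = np.array([1, 10])
print(x.T.shape)         # (2,)  -- .T is a no-op on a 1-D array
print(np.dot(x.T, x))    # 101   -- the inner product, not the outer

X = np.array([[1, 10]])  # explicit 2-D row vector
print(np.dot(X.T, X))    # [[  1  10]
                         #  [ 10 100]]
print(np.dot(X, X.T))    # [[101]] -- a (1, 1) array, not a scalar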

This is the way things are at present, but with the new @ syntax coming in I think we should consider fixing it.

I can think of three possibilities:
	1. Leave this as a trap for the unwary, and a reason for people to stick to np.matrix (np.matrix([1, 10])
behaves "correctly")
	2. Make x.T a syntax error for 1-D arrays.  It's a no-op and IMHO a trap. 
	3. Make x.T promote the shape == (2,) array to (1, 2) and return a (2, 1) array.  This may be too magic, but it's
my preferred solution.

						R

> Implementation of @ (matrix multiplication)
>  - will be in 3.5 ~ 18months
>  - no work started yet -- have to make sure we do it.
>  - @@ was not added.
>  - The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint)
Fernando Perez | 17 Jul 05:08 2014

Numpy BoF at SciPy 2014 - quick report

Hi all,

sorry for not posting earlier, post-conference InboxInfinity blues and all that...

The BoF did go as planned, and it was a good discussion, mostly following the tentative agenda outlined here:


Various folks were kind enough to take notes during the conversation on an Etherpad instance:


For the sake of completeness and future reference, below I'm including a copy of the notes in this email.

Other than what's in the notes, my take home from the discussion is mostly that:

- we probably needed a longer slot than 45 minutes to have a chance to dig in a little deeper.

- it would have been more productive if a focused numpy sprint had also been planned, so that there could be more structured follow-up on the ideas that came up.

It would be great to hear from others who were present at the conference. In particular, Chris Barker brought up a number of things regarding datetime and planned on following up during the sprints, but I'm not sure what ended up happening.

Thanks to everyone who participated!

Cheers

f


#### Copy of Etherpad notes as of 7/16/2014:

Notes from BoF:
  1:30, July 19, 2014
  
  
Working with topics on this page: 

chuck: where do we go from here? -- what is the role of numpy now?

Generalized ufuncs -- still some more to do -- (LA stuff - norms)
 - some ufuncs don't implement the array interface -- which are those -- sprint topic?
 - zeros_like, ones_like, more... (duplicate) github issue: https://github.com/numpy/numpy/issues/4862
 
 Here's the original issue: https://github.com/numpy/numpy/issues/3602

Implementation of @ (matrix multiplication)
 - will be in 3.5 ~ 18months
 - no work started yet -- have to make sure we do it.
 - @@ was not added.
 - The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint)
 
 Datetime: 
  - Can it be done? -- too many calendars -- too many time scales, etc.
  -  Can we cover most applications?
  - DynND -- higher abstraction -- convert to back end implementation
  - Also look at what R and Julia do?
  - Maybe fix up the little issues in datetime64, first?
  - Pandas does not use numpy machinery
    - uses an array of objects: those objects are subclassed from datetime.datetime
     - does use int64, but gets unboxed on storage.
  - Root cause is using UTC, rather than a naive time. (See the sketch after these notes.)
   - Naive is not associated with a time zone. Can be interpreted in any way.
    - Ripping out the local timezone on I/O would help.
    - More often than not, using the local timezone is not desired.
   - For example, many experimental datasets do not attach time zones. (Or the wrong timezone.)
   - Consider laboratory time (stopwatch rather than a clock). (timedelta)
   - The C++ committee is standardizing this.
   - A key feature which is missing is being able to choose your epoch.
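
(A quick sketch of the "root cause is UTC" point; behavior as of numpy
1.8/1.9, and the printed offset assumes a US/Eastern locale:)

import numpy as np

# No offset is given, but the string is parsed as *local* time, stored
# as UTC, and printed back with the local offset attached:
t = np.datetime64('2014-07-17T12:00')
print(t)   # 2014-07-17T12:00-0400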

New DTypes
 - Example: quad float types. A solution for missing values? Adding units support.
 - Record & structured arrays play around with dtypes. Needs to be easier to use these.
 - Improve documentation.
 - How to extend to support things like labeled arrays?
  - This is orthogonal to dtypes.
  - Would rather access time column instead of 3rd column. (See the structured-dtype sketch after these notes.)
  - Would provide a better foundation for pandas.
 - Key is to keep inputs simple.
 - Finish the DataArray push?
  - We are very close. It has been sitting there for a while.
  - If interested, talk at sprints on July 10.
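
(Named-field access already exists for structured dtypes; a minimal sketch
of the labeled access asked for above, with made-up field names:)

import numpy as np

events = np.zeros(3, dtype=[('time', 'datetime64[s]'), ('value', 'f8')])
events['value'] = [1.5, 2.5, 3.5]
print(events['time'])    # access the column by name, not by position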

Missing values?
 - maybe improve masked array.
 - give up for now.

Inheriting ndarray
 - introduces many bugs.
 - should discourage this, but make it easier to work with it.

Dynd
 - The issues discussed so far were motivation for starting dynd
  - for example, a pluggable type system
  - adding a categorical type in numpy (at Continuum) broke lots of things. Easier in dynd.
 - Commitment for dynd is to give it a numpy-like API
 - Both need to evolve together.
  - Find ways to make things more uniform (in numpy)
  - Dynd is more an experimental phase, changing quickly.
 - Can we import dynd as np?
  - Not a goal. More exploratory in this phase.
  - Adding a layer like that at a later time would be good. Not there, yet.
  - Do not want to repeat py2->py3 debacle.
 - Buffer protocol:
  - Supported, but dynd extends it.
  - As a pure C++ library, goal is to freeze once stable so systems beyond Python can depend on it as a stable interface for working with array data.

Boost::Python
 - Nothing official from numpy for using numpy arrays in C++
 - Not prioritized.
 - Numpy has gotten better about namespace pollution?
 - It kind of works already. Talk to Mike Droettboom

--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail
Julian Taylor | 17 Jul 00:20 2014

parallel distutils extensions build? use gcc -flto

hi,
I have been playing around a bit with gccs link time optimization
feature and found that using it actually speeds up a from scratch build
of numpy due to its ability to perform parallel optimization and linking.
As a bonus you also should get faster binaries due to the better
optimizations lto allows.

As compiling with lto requires some possibly lesser-known details I
wanted to share them.

Prerequisites are a working gcc toolchain of at least gcc-4.8 and
binutils > 2.21; gcc-4.9 is better as it's faster.

First of all numpy checks the long double representation by compiling a
file and looking at the binary; this won't work as the od -b
reimplementation here does not understand lto objects, so on x86 we must
short-circuit that:
--- a/numpy/core/setup_common.py
+++ b/numpy/core/setup_common.py
@@ -174,6 +174,7 @@ def check_long_double_representation(cmd):
     # We need to use _compile because we need the object filename
     src, object = cmd._compile(body, None, None, 'c')
     try:
+        return 'IEEE_DOUBLE_LE'
         type = long_double_representation(pyod(object))
         return type
     finally:

Next we build numpy as usual but override the compiler, linker and ar to
add our custom flags.
The setup.py call would look like this:

CC='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -O3' \
LDSHARED='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -shared -O3' \
AR=gcc-ar \
python setup.py build_ext

Some explanation:
The ar override is needed as numpy builds a static library and ar needs
to know about lto objects. gcc-ar does exactly that.
-flto=4, the main flag, tells gcc to perform link time optimizations
using 4 parallel processes.
-fno-fat-lto-objects tells gcc to only build lto objects; normally it
builds both an lto object and a normal object for toolchain
compatibility. If our toolchain can handle lto objects this is just a
waste of time and we skip it. (The flag is the default in gcc-4.9 but not 4.8.)
-fuse-linker-plugin directs gcc to run its link time optimizer plugin in
the linking step; the linker must support plugins, which both bfd (> 2.21)
and the gold linker do. This allows for more optimizations.
-O3 has to be added to the linker too as that's where the optimization
occurs. In general a problem with lto is that the compiler options of
all steps must match the flags used for linking.

If you are using C++ or gfortran you also have to override those to use
lto (CXX and FF(?)).

See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for a lot
more details.

For some numbers on my machine a from scratch numpy build with no
caching takes 1min55s, with lto on 4 it only takes 55s. Pretty neat for
a much more involved optimization process.

Concerning the speed gain we get by this, I ran our benchmark suite with
this build; there were no really significant gains, which is somewhat
expected as numpy is simple C code with most function bottlenecks
already inlined.

So conclusion: flto seems to work well with recent gccs and allows for
faster builds using the limited distutils. While probably not useful for
development, where compiler caching (ccache) is of utmost importance, it
is still interesting for projects that do one-shot uncached builds
(Travis-like CI), have huge objects (e.g. swig or cython) and don't want
to change to proper parallel build systems like bento.

PS: As far as I know clang also supports lto but I have never used it.
PPS: using NPY_SEPARATE_COMPILATION=0 crashes gcc-4.9; time for a bug
report.

Cheers,
Julian
Nathaniel Smith | 17 Jul 13:04 2014

Mailing list slowdown (was Re: __numpy_ufunc__)

On 17 Jul 2014 11:51, "Sebastian Berg" <sebastian@sipsolutions.net> wrote:
>
> On Mi, 2014-07-16 at 09:07 +0100, Nathaniel Smith wrote:
> > Weirdly, I never received Chuck's original email in this thread.
> > Should some list admin be informed?
> >
>
> I send some mails yesterday and they never arrived... Not sure if it is
> a problem on my side or not.

I did eventually get Chuck's original message, but not until several days later.

CC'ing postmaster@enthought.com in case they have some insight into what's going on!

-n

Tony Yu | 16 Jul 06:37 2014

`allclose` vs `assert_allclose`

Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail:

    np.testing.assert_allclose(0, 1e-14)
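
For comparison, with the current defaults (rtol=1e-5, atol=1e-8 for `allclose`; rtol=1e-7, atol=0 for `assert_allclose`) the two calls disagree on the same inputs:

    import numpy as np

    print(np.allclose(0, 1e-14))            # True
    np.testing.assert_allclose(0, 1e-14)    # raises AssertionError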

Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior.


It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess changing this would be a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal either.

Thanks,
-Tony
Neil Hodgson | 15 Jul 11:22 2014

Bug in np.cross for 2D vectors

Hi,

We came across this bug while using np.cross on 3D arrays of 2D vectors.
The first example below shows the problem; we looked at the source for np.cross and believe we found the bug: an unnecessary swapaxes when returning the output (comment inserted in the code).

Thanks
Neil

# Example

import numpy as np

shape = (3,5,7,2)

# These are effectively 3D arrays (3*5*7) of 2D vectors
data1 = np.random.randn(*shape)
data2 = np.random.randn(*shape)

# The cross product of data1 and data2 should produce a (3*5*7) array of scalars
cross_product_longhand = data1[:,:,:,0]*data2[:,:,:,1]-data1[:,:,:,1]*data2[:,:,:,0]
print 'longhand output shape:', cross_product_longhand.shape # and it does

cross_product_numpy = np.cross(data1, data2)
print 'numpy output shape:', cross_product_numpy.shape # It seems to have transposed the last 2 dimensions

if (cross_product_longhand == np.transpose(cross_product_numpy, (0,2,1))).all():
    print 'Unexpected transposition in numpy.cross (numpy version %s)' % np.__version__

# np.cross L1464
if axis is not None:
    axisa, axisb, axisc=(axis,)*3
a = asarray(a).swapaxes(axisa, 0)
b = asarray(b).swapaxes(axisb, 0)
msg = "incompatible dimensions for cross product\n"\
      "(dimension must be 2 or 3)"
if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]):
    raise ValueError(msg)
if a.shape[0] == 2:
    if (b.shape[0] == 2):
        cp = a[0]*b[1] - a[1]*b[0]
        if cp.ndim == 0:
            return cp
        else:
            ## WE SHOULD NOT SWAPAXES HERE!
            ## For 2D vectors the first axis has been
            ## collapsed during the cross product
            return cp.swapaxes(0, axisc)
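
A minimal sketch of the fix we have in mind (our reading of the code, not
an official patch): since the vector axis has already been collapsed, just
return cp without the final swap:

if a.shape[0] == 2:
    if b.shape[0] == 2:
        cp = a[0]*b[1] - a[1]*b[0]
        # the 2D-vector axis is gone; returning cp directly keeps the
        # remaining axes in their original order
        return cp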

Charles R Harris | 14 Jul 20:22 2014

__numpy_ufunc__

Hi All,

Julian has raised the question of including __numpy_ufunc__ in numpy 1.9. I don't feel strongly one way or the other, but it doesn't seem to be finished yet, and 1.10 might be a better place to work out the remaining problems, along with the astropy folks testing possible uses.

Thoughts?

Chuck
Charles R Harris | 12 Jul 19:17 2014

String type again.

As previous posts have pointed out, Numpy's `S` type is currently treated as a byte string, which leads to more complicated code in python3. OTOH, the unicode type is stored as UCS4, which consumes a lot of space, especially for ascii strings. This note proposes to repurpose the existing 'a' type letter, currently aliased to 'S', as a new fixed-encoding dtype.

Python 3.3 introduced two one-byte internal representations for unicode strings, ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, whereas latin1 has a few more symbols. Another possibility is to just make it a UTF-8 encoding, but I think this would involve more overhead as Python would need to determine the maximum character size. These are just preliminary thoughts, comments are welcome.
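
To make the two pain points concrete (a quick sketch):

import numpy as np

a = np.array(['abc'], dtype='S3')
print(a[0])         # b'abc' -- comes back as bytes under python3
u = np.array(['abc'], dtype='U3')
print(u.nbytes)     # 12 -- UCS4 spends 4 bytes per character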

Chuck 
Josè Luis Mietta | 12 Jul 18:53 2014

plt.show() and plt.draw() doesnt work

Hi experts!

I have a numpy array M. I generate a graph using NetworkX and then I want to draw this graph:

    import networkx as nx
    import matplotlib.pyplot as plt
    G = nx.Graph(M)
    nx.draw(G)
    plt.draw()

Doing this, no picture appears. In addition, if I do `plt.show()` no picture appears.

Please help!

Best regards
