Neil Girdhar | 9 Apr 01:34 2015

Consider improving numpy.outer's behavior with zero-dimensional vectors

Numpy's outer product works fine with vectors.  However, I always seem to want len(outer(a, b).shape) to equal len(a.shape) + len(b.shape).  The Wolfram Language's Outer (https://reference.wolfram.com/language/ref/Outer.html) behaves this way for tensor outer products.  My suggestion is the definition of outer given below, contrasted with numpy's current outer product.

In [36]: def a(n): return np.ones(n)

In [37]: b = a(())

In [38]: c = a(4)

In [39]: d = a(5)

In [40]: np.outer(b, d).shape
Out[40]: (1, 5)

In [41]: np.outer(c, d).shape
Out[41]: (4, 5)

In [42]: np.outer(c, b).shape
Out[42]: (4, 1)

In [43]: def outer(a, b):
   ....:     return a[(...,) + len(b.shape) * (np.newaxis,)] * b
   ....:

In [44]: outer(b, d).shape
Out[44]: (5,)

In [45]: outer(c, d).shape
Out[45]: (4, 5)

In [46]: outer(c, b).shape
Out[46]: (4,)
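
Here is the same definition as a standalone function, with the shapes above as sanity checks (a minimal sketch; as far as I can tell, np.multiply.outer already produces a.shape + b.shape results in this way):

    import numpy as np

    def outer(a, b):
        # Result shape is a.shape + b.shape: append one new axis per
        # dimension of b so that broadcasting pairs every element of a
        # with every element of b.
        a, b = np.asarray(a), np.asarray(b)
        return a[(...,) + b.ndim * (np.newaxis,)] * b

    assert outer(np.ones(()), np.ones(5)).shape == (5,)
    assert outer(np.ones(4), np.ones(5)).shape == (4, 5)
    assert outer(np.ones(4), np.ones(())).shape == (4,)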

Best,

Neil
Thiago Franco Moraes | 7 Apr 15:54 2015

Research position in the Brazilian Research Institute for Neuroscience and Neurotechnology - BRAINN

Research position in the Brazilian Research Institute for Neuroscience and Neurotechnology – BRAINN

Postdoc researcher to work on software development for medical imaging

The Brazilian Research Institute for Neuroscience and Neurotechnology (BRAINN) (www.brainn.org.br) focuses on the investigation of basic mechanisms leading to epilepsy and stroke, and on the injury mechanisms that follow disease onset and progression. This research has important applications related to prevention, diagnosis, treatment and rehabilitation, and will serve as a model for better understanding normal and abnormal brain function. The BRAINN Institute is composed of 10 institutions from Brazil and abroad and is hosted by the State University of Campinas (UNICAMP). Among the associated institutions is the Renato Archer Information Technology Center (CTI), which has a team specialized in open-source software development for medical imaging (www.cti.gov.br/invesalius) and 3D printing applications for healthcare. CTI is located close to UNICAMP in the city of Campinas, State of São Paulo, in a highly technological region of Brazil, and is looking for a postdoc researcher to work on software development for medical imaging related to the analysis, diagnosis and treatment of brain diseases. The postdoc position is for two years, with the possibility of renewal for two more years.

Education
- PhD in computer science, computer engineering, mathematics, physics or related.

Requirements
- Digital image processing (Medical imaging)
- Computer graphics (basic)

Benefits
- 6.143,40 Reais per month, free of taxes (about US$ 2.000,00);
- 15% technical reserve for conference participation and acquisition of specific materials.

Interested
Send a curriculum to jorge.silva <at> cti.gov.br with the subject “Postdoc position”.
Application reviews will begin on April 30, 2015 and continue until the position is filled.
Andreas Hilboll | 7 Apr 17:14 2015

FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.

Hi all,

I'm commonly using function signatures like

   def myfunc(a, b, c=None):
       if c is None:
           # do something ...
       ...

where c is an optional array argument.  For some time now I've been getting a

   FutureWarning: comparison to `None` will result in an elementwise
   object comparison in the future

from the "c is None" comparison.  I'm wondering what would be the best
way to do this check in a future-proof way?
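
For concreteness, a minimal sketch of the two spellings (my assumption being that the warning really comes from an `==`/`!=` comparison against None somewhere, since a plain `is` identity test shouldn't emit it):

    import numpy as np

    def myfunc(a, b, c=None):
        # Identity test: returns a single Python bool and is already
        # future-proof, so it should not be what triggers the warning.
        if c is None:
            c = np.zeros_like(a)
        return a + b + c

    a = np.arange(3)
    print(myfunc(a, a))   # [0 2 4], no warning

    # An equality comparison against None is what the FutureWarning is
    # about: once the change lands, this evaluates elementwise,
    # e.g. [False False False].
    print(a == None)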

Best,
-- Andreas.
Nathaniel Smith | 8 Apr 03:06 2015

On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal)

On Apr 5, 2015 7:04 AM, "Robert Kern" <robert.kern <at> gmail.com> wrote:
>
> On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith <njs <at> pobox.com> wrote:
> >
> > On Apr 4, 2015 4:12 AM, "Todd" <toddrjen <at> gmail.com> wrote:
> > >
> > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch.  The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible.
> >
> > I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change.
>
> It doesn't take any cleverness. The change in question was to make the default indexing semantics to orthogonal indexing. No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in the broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. That comparison does not bode well for any proposal.

Ok, let me try to make my point another way.

I don't actually care at this stage in the discussion whether the change is ultimately viable. And I don't think you should either. (For values of "you" that includes everyone in the discussion, not picking on Robert in particular :-).)

My point is that rational, effective discussion requires giving ideas room to breathe. Sometimes ideas turn out to be not as bad as they looked. Sometimes it turns out that they are, but there's some clever tweak that gives you 95% of the benefits for 5% of the cost. Sometimes you generate a better understanding of the tradeoffs that subsequently informs later design decisions. Sometimes working through the details makes both sides realize that there's a third option that solves both their problems. Sometimes you merely get a very specific understanding of why the whole approach is unreasonable, which you can then, say, take to the pandas and netcdf developers as evidence that you made a good-faith effort and ask them to meet you halfway. And all these things require understanding the specifics of what *exactly* works or doesn't work about an idea. IMHO it's extremely misleading at this stage to make any assertion about whether Jaime's approach gives the "majority of benefits of such a change": not because it's wrong, but because it totally short-circuits the discussion about what benefits we care about. Jaime's patch certainly has merits, but does it help us make numpy and pandas/netcdf more compatible? Does it make it easier for Eric to teach? Those are some specific merits that we might care about a lot, and for which Jaime's patch may or may not help much. But that kind of nuance gets lost when we jump straight to debating thumbs-up versus thumbs-down.

I cross-my-heart promise that under the current regime, no PR breaking fancy indexing would ever get anywhere *near* numpy master without *extensive* discussion and warnings on the list. The core devs just spent weeks quibbling about whether a PR that adds a new field to the end of the dtype struct would break ABI backcompat (we're now pretty sure it doesn't), and the current standard we enforce is that every PR that touches public API needs a list discussion, even minor extensions with no compatibility issues at all. No one is going to sneak anything by anyone.

Plus, I dunno, our current approach to discussions just seems to make things hostile and shouty and unpleasant. If a grad student or junior colleague comes to you with an idea where you see some potentially critical flaw, do you yell THAT WILL NEVER WORK and kick them out of your office? Or, do you maybe ask a few leading questions and see where they go?

I think things will work better if the next time something like this comes up, *one* person just says "hmm, interesting idea, but the backcompat issues seem pretty severe; do you have any ideas about how to mitigate that?", and then we let that point be taken as having been made and see where the discussion goes. Maybe we can all give it a try?

-n

Nicholas Devenish | 7 Apr 01:49 2015

Multidimensional Indexing

With the indexing example from the documentation:

y = np.arange(35).reshape(5,7)

Why does explicitly selecting an item from every row work as I’d expect:
>>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])]
array([ 0,  7, 14, 21, 28])

But doing so with a full slice (which I would naively expect to mean “Every Row”) has some… other… behaviour:

>>> y[:,np.array([0,0,0,0,0])]
array([[ 0,  0,  0,  0,  0],
       [ 7,  7,  7,  7,  7],
       [14, 14, 14, 14, 14],
       [21, 21, 21, 21, 21],
       [28, 28, 28, 28, 28]])

What is going on in this example, and how do I get what I expect? By explicitly passing in an extra index array whose values equal the row indices? What is the rationale for this difference in behaviour?
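
For concreteness, a minimal sketch of the two behaviours (as far as I can tell, passing an explicit row-index array gives the elementwise pairing I expected):

    import numpy as np

    y = np.arange(35).reshape(5, 7)
    cols = np.array([0, 0, 0, 0, 0])

    # Two index arrays broadcast together and pick one element per
    # (row, column) pair.
    rows = np.arange(y.shape[0])
    print(y[rows, cols])      # [ 0  7 14 21 28]

    # A slice plus an index array: the slice keeps the row axis intact,
    # and the column selection is applied to every row -> shape (5, 5).
    print(y[:, cols].shape)   # (5, 5)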

Thanks,

Nick
Nathaniel Smith | 7 Apr 01:49 2015

OS X wheels: speed versus multiprocessing

Hi all,

Starting with 1.9.1, the official numpy OS X wheels (the ones you get by doing "pip install numpy") have been built to use Apple's Accelerate library for linear algebra. This is fast, but it breaks multiprocessing in obscure ways (e.g. see this user report: https://github.com/numpy/numpy/issues/5752).

Unfortunately, there is no obviously best choice of linear algebra package, so we have to decide which set of compromises we prefer.

Options:

Accelerate: fast, but breaks multiprocessing as above.

OpenBLAS: fast, but Julian raised concerns about its trustworthiness last year (http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069659.html). Possibly things have improved since then (I get the impression that they've gotten some additional developer attention from the Julia community), but I don't know.

Atlas: slower (faster than reference blas but definitely slower than fancy options like the above), but solid.

My feeling is that for wheels in particular it's more important that everything "just work" than that we get the absolute fastest speeds. And this is especially true for the multiprocessing issue, given that it's a widely used part of the stdlib, the failures are really obscure/confusing, and there is no workaround for Python 2, which is where a majority of our users still are. So I'd vote for using either Atlas or OpenBLAS. (And I would defer to Julian and Matthew about which to choose between these.)

Any opinions, objections?

-n

Charles R Harris | 6 Apr 21:22 2015

1.10 release again.

Hi All,

I'd like to mark current PRs for inclusion in 1.10. If there is something that you want to have in the release, please mention it here by PR #. I think new enhancement PRs should be considered for 1.11 rather than 1.10, but bug fixes will go in. There is some flexibility, of course, as there are always last-minute items that come up while release contents are being decided.

Chuck
John Kirkham | 4 Apr 17:52 2015

Re: Fix masked arrays to properly edit views

Hey Eric,

That's a good point. I remember seeing this behavior before and thought it was a bit odd.

Best,
John

> On Mar 16, 2015, at 2:20 AM, numpy-discussion-request <at> scipy.org wrote:
> 
> 
> Today's Topics:
> 
>   1. Re: Fix masked arrays to properly edit views (Eric Firing)
>   2. Rewrite np.histogram in c? (Robert McGibbon)
>   3. numpy.stack -- which function, if any, deserves the name? (Stephan Hoyer)
>   4. Re: Rewrite np.histogram in c? (Jaime Fernández del Río)
>   5. Re: Rewrite np.histogram in c? (Robert McGibbon)
>   6. Re: Rewrite np.histogram in c? (Robert McGibbon)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sat, 14 Mar 2015 14:01:04 -1000
> From: Eric Firing <efiring <at> hawaii.edu>
> Subject: Re: [Numpy-discussion] Fix masked arrays to properly edit
>    views
> To: numpy-discussion <at> scipy.org
> Message-ID: <5504CBC0.1080502 <at> hawaii.edu>
> Content-Type: text/plain; charset=windows-1252; format=flowed
> 
>> On 2015/03/14 1:02 PM, John Kirkham wrote:
>> The sample case of the issue (
>> https://github.com/numpy/numpy/issues/5558 ) is shown below. A proposal
>> to address this behavior can be found here (
>> https://github.com/numpy/numpy/pull/5580 ). Please give me your feedback.
>> 
>> 
>> I tried to change the mask of `a` through a subindexed view, but was
>> unable. Using this setup I can reproduce this in the 1.9.1 version of NumPy.
>> 
>>     import numpy as np
>> 
>>     a = np.arange(6).reshape(2,3)
>>     a = np.ma.masked_array(a, mask=np.ma.getmaskarray(a), shrink=False)
>> 
>>     b = a[1:2,1:2]
>> 
>>     c = np.zeros(b.shape, b.dtype)
>>     c = np.ma.masked_array(c, mask=np.ma.getmaskarray(c), shrink=False)
>>     c[:] = np.ma.masked
>> 
>> This yields what one would expect for `a`, `b`, and `c` (seen below).
>> 
>>      masked_array(data =
>>        [[0 1 2]
>>         [3 4 5]],
>>                   mask =
>>        [[False False False]
>>         [False False False]],
>>              fill_value = 999999)
>> 
>>      masked_array(data =
>>        [[4]],
>>                   mask =
>>        [[False]],
>>              fill_value = 999999)
>> 
>>      masked_array(data =
>>        [[--]],
>>                   mask =
>>        [[ True]],
>>              fill_value = 999999)
>> 
>> Now, it would seem reasonable that to copy data into `b` from `c` one
>> can use `__setitem__` (seen below).
>> 
>>      b[:] = c
>> 
>> This results in new data and mask for `b`.
>> 
>>      masked_array(data =
>>        [[--]],
>>                   mask =
>>        [[ True]],
>>              fill_value = 999999)
>> 
>> This should, in turn, change `a`. However, the mask of `a` remains
>> unchanged (seen below).
>> 
>>      masked_array(data =
>>        [[0 1 2]
>>         [3 0 5]],
>>                   mask =
>>        [[False False False]
>>         [False False False]],
>>              fill_value = 999999)
> 
> I agree that this behavior is wrong.  A related oddity is this:
> 
> In [24]: a = np.arange(6).reshape(2,3)
> In [25]: a = np.ma.array(a, mask=np.ma.getmaskarray(a), shrink=False)
> In [27]: a.sharedmask
> True
> In [28]: a.unshare_mask()
> In [30]: b = a[1:2, 1:2]
> In [31]: b[:] = np.ma.masked
> In [32]: b.sharedmask
> False
> In [33]: a
> masked_array(data =
>  [[0 1 2]
>  [3 -- 5]],
>              mask =
>  [[False False False]
>  [False  True False]],
>        fill_value = 999999)
> 
> It looks like the sharedmask property simply is not being set and 
> interpreted correctly--a freshly initialized array has sharedmask True; 
> and after setting it to False, changing the mask of a new view *does* 
> change the mask in the original.
> 
> Eric
> 
>> 
>> Best,
>> John
>> 
>> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sun, 15 Mar 2015 21:32:49 -0700
> From: Robert McGibbon <rmcgibbo <at> gmail.com>
> Subject: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
>    <CAN4+E8Ff_Ck-9GBRCbSTq6qPiuGxgKeiX3+kKrXn4NM-Lnn6rg <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi,
> 
> Numpy.histogram is implemented in python, and is a little sluggish. This
> has been discussed previously on the mailing list, [1, 2]. It came up in a
> project that I maintain, where a new feature is bottlenecked by
> numpy.histogram, and one developer suggested a faster implementation in
> cython [3].
> 
> Would it make sense to reimplement this function in c? or cython? Is moving
> functions like this from python to c to improve performance within the
> scope of the development roadmap for numpy? I started implementing this a
> little bit in c, [4] but I figured I should check in here first.
> 
> -Robert
> 
> [1]
> http://scipy-user.10969.n7.nabble.com/numpy-histogram-is-slow-td17208.html
> [2] http://numpy-discussion.10968.n7.nabble.com/Fast-histogram-td9359.html
> [3] https://github.com/mdtraj/mdtraj/pull/734
> [4] https://github.com/rmcgibbo/numpy/tree/histogram
> 
> ------------------------------
> 
> Message: 3
> Date: Sun, 15 Mar 2015 22:12:40 -0700
> From: Stephan Hoyer <shoyer <at> gmail.com>
> Subject: [Numpy-discussion] numpy.stack -- which function, if any,
>    deserves the name?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
>    <CAEQ_TvdQwV52_NKnLpM9+cp681NhV5cUEiigmLMtyBkTnzyOcA <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> In the past months there have been two proposals for new numpy functions
> using the name "stack":
> 
> 1. np.stack for stacking like np.asarray(np.bmat(...))
> http://thread.gmane.org/gmane.comp.python.numeric.general/58748/
> https://github.com/numpy/numpy/pull/5057
> 
> 2. np.stack for stacking along an arbitrary new axis (this was my proposal)
> http://thread.gmane.org/gmane.comp.python.numeric.general/59850/
> https://github.com/numpy/numpy/pull/5605
> 
> Both functions generalize the notion of stacking arrays from the existing
> hstack, vstack and dstack, but in two very different ways. Both could be
> useful -- but we can only call one "stack". Which one deserves that name?
> 
> The existing *stack functions use the word "stack" to refer to combining
> arrays in two similarly different ways:
> a. For ND -> ND stacking along an existing dimensions (like
> numpy.concatenate and proposal 1)
> b. For ND -> (N+1)D stacking along new dimensions (like proposal 2).
> 
> I think it would be much cleaner API design if we had different words to
> denote these two different operations. Concatenate for "combine along an
> existing dimension" already exists, so my thought (when I wrote proposal
> 2), was that the verb "stack" could be reserved (going forward) for
> "combine along a new dimension." This also has the advantage of suggesting
> that "concatenate" and "stack" are the two fundamental operations for
> combining N-dimensional arrays. The documentation on this is currently
> quite confusing, mostly because no function like that in proposal 2
> currently exists.
> 
> Of course, the *stack functions have existed for quite some time, and in
> many cases vstack and hstack are indeed used for concatenate like
> functionality (e.g., whenever they are used for 2D arrays/matrices). So the
> case is not entirely clear-cut. (We'll never be able to remove this
> functionality from NumPy.)
> 
> In any case, I would appreciate your thoughts.
> 
> Best,
> Stephan
> 
> ------------------------------
> 
> Message: 4
> Date: Sun, 15 Mar 2015 23:00:33 -0700
> From: Jaime Fernández del Río <jaime.frio <at> gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
>    <CAPOWHWmFckwXLcGy+5tSEyQE8VTOrBg0ubKdYeJ8DZywJL_w3g <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
>> On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon <rmcgibbo <at> gmail.com> wrote:
>> 
>> Hi,
>> 
>> Numpy.histogram is implemented in python, and is a little sluggish. This
>> has been discussed previously on the mailing list, [1, 2]. It came up in a
>> project that I maintain, where a new feature is bottlenecked by
>> numpy.histogram, and one developer suggested a faster implementation in
>> cython [3].
>> 
>> Would it make sense to reimplement this function in c? or cython? Is
>> moving functions like this from python to c to improve performance within
>> the scope of the development roadmap for numpy? I started implementing this
>> a little bit in c, [4] but I figured I should check in here first.
> 
> Where do you think the performance gains will come from? The PR in your
> project that claims a 10x speed-up uses a method that is only fit for
> equally spaced bins. I want to think that implementing that exact same
> algorithm in Python with NumPy would be comparably fast, say within 2x.
> 
> For the general case, NumPy is already doing most of the heavy lifting (the
> sorting and the searching) in C: simply replicating the same algorithmic
> approach entirely in C is unlikely to provide any major speed-up. And if
> the change is to the algorithm, then we should first try it out in Python.
> 
> That said, if you can speed things up 10x, I don't think there is going to
> be much opposition to moving it to C!
> 
> Jaime
> 
> -- 
> (\__/)
> ( O.o)
> ( > <) This is Conejo. Copy Conejo into your signature and help him
> with his plans for world domination.
> 
> ------------------------------
> 
> Message: 5
> Date: Sun, 15 Mar 2015 23:06:43 -0700
> From: Robert McGibbon <rmcgibbo <at> gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
>    <CAN4+E8GXECy8yaJRfN_NA_V8wdOZeBTLiFM0EJKtfuoONZoMvw <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> It might make sense to dispatch to difference c implements if the bins are
> equally spaced (as created by using an integer for the np.histogram bins
> argument), vs. non-equally-spaced bins.
> 
> In that case, getting the bigger speedup may be easier, at least for one
> common use case.
> 
> -Robert
> 
> On Sun, Mar 15, 2015 at 11:00 PM, Jaime Fernández del Río <
> jaime.frio <at> gmail.com> wrote:
> 
>> On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon <rmcgibbo <at> gmail.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> Numpy.histogram is implemented in python, and is a little sluggish. This
>>> has been discussed previously on the mailing list, [1, 2]. It came up in a
>>> project that I maintain, where a new feature is bottlenecked by
>>> numpy.histogram, and one developer suggested a faster implementation in
>>> cython [3].
>>> 
>>> Would it make sense to reimplement this function in c? or cython? Is
>>> moving functions like this from python to c to improve performance within
>>> the scope of the development roadmap for numpy? I started implementing this
>>> a little bit in c, [4] but I figured I should check in here first.
>> 
>> Where do you think the performance gains will come from? The PR in your
>> project that claims a 10x speed-up uses a method that is only fit for
>> equally spaced bins. I want to think that implementing that exact same
>> algorithm in Python with NumPy would be comparably fast, say within 2x.
>> 
>> For the general case, NumPy is already doing most of the heavy lifting
>> (the sorting and the searching) in C: simply replicating the same
>> algorithmic approach entirely in C is unlikely to provide any major
>> speed-up. And if the change is to the algorithm, then we should first try
>> it out in Python.
>> 
>> That said, if you can speed things up 10x, I don't think there is going to
>> be much opposition to moving it to C!
>> 
>> Jaime
>> 
>> --
>> (\__/)
>> ( O.o)
>> ( > <) This is Conejo. Copy Conejo into your signature and help him
>> with his plans for world domination.
>> 
> 
> ------------------------------
> 
> Message: 6
> Date: Sun, 15 Mar 2015 23:19:59 -0700
> From: Robert McGibbon <rmcgibbo <at> gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
>    <CAN4+E8Ewn+tPpZBo866qH9p=1=1vA8i6kLFvrX8XKHWwazv44A <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> My apologies for the typo: 'implements' -> 'implementations'
> 
> -Robert
> 
> 
> ------------------------------
> 
> 
> 
> End of NumPy-Discussion Digest, Vol 102, Issue 21
> *************************************************
Ralf Gommers | 4 Apr 17:52 2015

numpy vendor repo

Hi,

Today I wanted to add something to https://github.com/numpy/vendor and realised that this repo is in pretty bad shape. A couple of years ago Ondrej took a copy of the ATLAS binaries in that repo and started a new repo (not a fork) at https://github.com/certik/numpy-vendor. The latest improvements were made by Julian and live at https://github.com/juliantaylor/numpy-vendor.

I'd like to start from numpy/vendor, then add all commits from Julian's numpy-vendor on top of it, then move things around so we have the binaries/sources/tools layout back and finally update the README so it's clear how to build both the ATLAS binaries and Numpy releases.

Any objections or better ideas?

Ralf



Todd | 4 Apr 13:11 2015

Re: Advanced indexing: "fancy" vs. orthogonal


On Apr 4, 2015 10:54 AM, "Nathaniel Smith" <njs <at> pobox.com> wrote:
>
> On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers <ralf.gommers <at> gmail.com> wrote:
> >
> >
> > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith <njs <at> pobox.com> wrote:
> >>
> >>
> >> But, the real problem here is that we have two different array duck
> >> types that force everyone to write their code twice. This is a
> >> terrible state of affairs! (And exactly analogous to the problems
> >> caused by np.ndarray disagreeing with np.matrix & scipy.sparse about
> >> the the proper definition of *, which PEP 465 may eventually
> >> alleviate.) IMO we should be solving this indexing problem directly,
> >> not applying bandaids to its symptoms, and the way to do that is to
> >> come up with some common duck type that everyone can agree on.
> >>
> >> Unfortunately, AFAICT this means our only options here are to have
> >> some kind of backcompat break in numpy, some kind of backcompat break
> >> in pandas, or to do nothing and continue indefinitely with the status
> >> quo where the same indexing operation might silently return different
> >> results depending on the types passed in. All of these options have
> >> real costs for users, and it isn't at all clear to me what the
> >> relative costs will be when we dig into the details of our various
> >> options.
> >
> >
> > I doubt that there is a reasonable way to quantify those costs, especially
> > those of breaking backwards compatibility. If someone has a good method, I'd
> > be interested though.
>
> I'm a little nervous about how easily this argument might turn into
> "either A or B is better but we can't be 100% *certain* which it is so
> instead of doing our best using the data available we should just
> choose B." Being a maintainer means accepting uncertainty and doing
> our best anyway.

I think the burden of proof needs to be on the side proposing a change, and the more invasive the change the higher that burden needs to be. 

When faced with a situation like this, where the proposed change will cause fundamental alterations to the most basic, high-level operation of numpy, and where there is an alternative approach with no backwards-compatibility issues, I think the burden of proof would necessarily be nearly impossibly large.

> But that said I'm still totally on board with erring on the side of
> caution (in particular, you can never go back and *un*break
> backcompat). An obvious challenge to anyone trying to take this
> forward (in any direction!) would definitely be to gather the most
> useful data possible. And it's not obviously impossible -- maybe one
> could do something useful by scanning ASTs of lots of packages (I have
> a copy of pypi if anyone wants it, that I downloaded with the idea of
> making some similar arguments for why core python should slightly
> break backcompat to allow overloading of a < b < c syntax), or adding
> instrumentation to numpy, or running small-scale usability tests, or
> surveying people, or ...
>
> (I was pretty surprised by some of the data gathered during the PEP
> 465 process, e.g. on how common dot() calls are relative to existing
> built-in operators, and on its associativity in practice.)

Surveys like this have the problem of small sample size and selection bias. Usability studies can't measure the effect of the compatibility break, not to mention the effect on numpy's reputation. This is considerably more difficult to scan existing projects for than .dot, because it depends on the type being passed (which may not even be defined in the same project). And I am not sure I much like the idea of numpy "phoning home" by default, and an opt-in has the same issues as a survey.

So to make a long story short, in this sort of situation I have a hard time imagining ways to get enough reliable, representative data to justify this level of backwards-compatibility break.

> Core python broke backcompat on a regular basis throughout the python
> 2 series, and almost certainly will again -- the bar to doing so is
> *very* high, and they use elaborate mechanisms to ease the way
> (__future__, etc.), but they do it. A few months ago there was even
> some serious consideration given to changing py3 bytestring indexing
> to return bytestrings instead of integers. (Consensus was
> unsurprisingly that this was a bad idea, but there were core devs
> seriously exploring it, and no-one complained about the optics.)

There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch.  The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. It would be better to have a new language, or in this case a new project.

> It's true that numpy has something of a bad reputation in this area,
> and I think it's because until ~1.7 or so, we randomly broke stuff by
> accident on a pretty regular basis, even in "bug fix" releases. I
> think the way to rebuild that trust is to honestly say to our users
> that when we do break backcompat, we will never do it by accident, and
> we will do it only rarely, after careful consideration, with the
> smoothest transition possible, only in situations where we are
> convinced that it is the net best possible solution for our users, and
> only after public discussion and getting buy-in from stakeholders
> (e.g. major projects affected). And then follow through on that to the
> best of our ability. We've certainly gotten a lot better at this over
> the last few years.
>
> If we say we'll *never* break backcompat then we'll inevitably end up
> convincing some people that we're liars, just because one person's
> bugfix is another's backcompat break. (And they're right, it is a
> backcompat break; it's just one where the benefits of the fix
> obviously outweigh the cost of the break.) Or we could actually avoid
> breaking backcompat by descending into Knuth-style stasis... but even
> there notice that none of us are actually using Knuth's TeX, we all
> use forks like XeTeX that have further changes added, which goes to
> show how futile this would be.

I think it is fair to say that some things are just so fundamental to what makes numpy numpy that they are off-limits, and that people will always be able to count on them working.

Robert Kern | 4 Apr 11:15 2015

Re: Advanced indexing: "fancy" vs. orthogonal

On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith <njs <at> pobox.com> wrote:
>
> On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers <ralf.gommers <at> gmail.com> wrote:
> >
> > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith <njs <at> pobox.com> wrote:

> >> So I'd be very happy to see worked out proposals for any or
> >> all of these approaches. It strikes me as really premature to be
> >> issuing proclamations about what changes might be considered. There is
> >> really no danger to *considering* a proposal;
> >
> > Sorry, I have to disagree. Numpy is already seen by some as having a poor
> > track record on backwards compatibility. Having core developers say "propose
> > some backcompat break to how indexing works and we'll consider it" makes our
> > stance on that look even worse. Of course everyone is free to make any
> > technical proposal they deem fit and we'll consider the merits of it.
> > However I'd like us to be clear that we do care strongly about backwards
> > compatibility and that the fundamentals of the core of Numpy (things like
> > indexing, broadcasting, dtypes and ufuncs) will not be changed in
> > backwards-incompatible ways.
> >
> > Ralf
> >
> > P.S. also not for a possible numpy 2.0 (or have we learned nothing from
> > Python3?).
>
> I agree 100% that we should and do care strongly about backwards
> compatibility. But you're saying in one sentence that we should tell
> people that we won't consider backcompat breaks, and then in the next
> sentence that of course we actually will consider them (even if we
> almost always reject them). Basically, I think saying one thing and
> doing another is not a good way to build people's trust.

There is a difference between politely considering what proposals people send us uninvited and inviting people to work on specific proposals. That is what Ralf was getting at.

--
Robert Kern
