### Re: Fix masked arrays to properly edit views

John Kirkham <jakirkham <at> gmail.com>

2015-04-04 15:52:19 GMT

Hey Eric,
That's a good point. I remember seeing this behavior before and thought it was a bit odd.
Best,
John
> On Mar 16, 2015, at 2:20 AM, numpy-discussion-request <at> scipy.org wrote:
>
> Today's Topics:
>
> 1. Re: Fix masked arrays to properly edit views (Eric Firing)
> 2. Rewrite np.histogram in c? (Robert McGibbon)
> 3. numpy.stack -- which function, if any, deserves the name?
> (Stephan Hoyer)
> 4. Re: Rewrite np.histogram in c? (Jaime Fernández del Río)
> 5. Re: Rewrite np.histogram in c? (Robert McGibbon)
> 6. Re: Rewrite np.histogram in c? (Robert McGibbon)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sat, 14 Mar 2015 14:01:04 -1000
> From: Eric Firing <efiring <at> hawaii.edu>
> Subject: Re: [Numpy-discussion] Fix masked arrays to properly edit
> views
> To: numpy-discussion <at> scipy.org
> Message-ID: <5504CBC0.1080502 <at> hawaii.edu>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
>> On 2015/03/14 1:02 PM, John Kirkham wrote:
>> The sample case of the issue (
>> https://github.com/numpy/numpy/issues/5558 ) is shown below. A proposal
>> to address this behavior can be found here (
>> https://github.com/numpy/numpy/pull/5580 ). Please give me your feedback.
>>
>>
>> I tried to change the mask of `a` through a subindexed view, but was
>> unable to. The setup below reproduces this in NumPy 1.9.1.
>>
>> import numpy as np
>>
>> a = np.arange(6).reshape(2,3)
>> a = np.ma.masked_array(a, mask=np.ma.getmaskarray(a), shrink=False)
>>
>> b = a[1:2,1:2]
>>
>> c = np.zeros(b.shape, b.dtype)
>> c = np.ma.masked_array(c, mask=np.ma.getmaskarray(c), shrink=False)
>> c[:] = np.ma.masked
>>
>> This yields what one would expect for `a`, `b`, and `c` (seen below).
>>
>> masked_array(data =
>> [[0 1 2]
>> [3 4 5]],
>> mask =
>> [[False False False]
>> [False False False]],
>> fill_value = 999999)
>>
>> masked_array(data =
>> [[4]],
>> mask =
>> [[False]],
>> fill_value = 999999)
>>
>> masked_array(data =
>> [[--]],
>> mask =
>> [[ True]],
>> fill_value = 999999)
>>
>> Now, it would seem reasonable that to copy data into `b` from `c` one
>> can use `__setitem__` (seen below).
>>
>> b[:] = c
>>
>> This results in new data and mask for `b`.
>>
>> masked_array(data =
>> [[--]],
>> mask =
>> [[ True]],
>> fill_value = 999999)
>>
>> This should, in turn, change `a`. However, the mask of `a` remains
>> unchanged (seen below).
>>
>> masked_array(data =
>> [[0 1 2]
>> [3 0 5]],
>> mask =
>> [[False False False]
>> [False False False]],
>> fill_value = 999999)
>
> I agree that this behavior is wrong. A related oddity is this:
>
> In [24]: a = np.arange(6).reshape(2,3)
> In [25]: a = np.ma.array(a, mask=np.ma.getmaskarray(a), shrink=False)
> In [27]: a.sharedmask
> True
> In [28]: a.unshare_mask()
> In [30]: b = a[1:2, 1:2]
> In [31]: b[:] = np.ma.masked
> In [32]: b.sharedmask
> False
> In [33]: a
> masked_array(data =
> [[0 1 2]
> [3 -- 5]],
> mask =
> [[False False False]
> [False True False]],
> fill_value = 999999)
>
> It looks like the sharedmask property simply is not being set and
> interpreted correctly--a freshly initialized array has sharedmask True;
> and after setting it to False, changing the mask of a new view *does*
> change the mask in the original.
>
> Eric
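
A minimal sketch for checking the behavior discussed in this message on whatever NumPy version is installed. The function name `view_assignment_reaches_parent` is hypothetical and not part of NumPy; the setup mirrors the sample case quoted earlier. The result for the parent array depends on the NumPy version, so the script only reports it rather than assuming an outcome.

```python
import numpy as np


def view_assignment_reaches_parent():
    """Mask a one-element view of `a`; report the resulting masks of `a` and `b`."""
    data = np.arange(6).reshape(2, 3)
    # Same construction as in the thread: a full, all-False mask that is not shrunk.
    a = np.ma.masked_array(data, mask=np.ma.getmaskarray(data), shrink=False)
    b = a[1:2, 1:2]  # view onto a[1, 1]
    # Fully masked source array with the same shape and dtype as the view.
    c = np.ma.masked_array(np.zeros(b.shape, b.dtype),
                           mask=np.ones(b.shape, dtype=bool))
    b[:] = c  # assign through the view
    parent_masked = bool(np.ma.getmaskarray(a)[1, 1])
    view_masked = bool(np.ma.getmaskarray(b)[0, 0])
    return parent_masked, view_masked


parent_masked, view_masked = view_assignment_reaches_parent()
print("NumPy", np.__version__)
print("view masked:  ", view_masked)
print("parent masked:", parent_masked)  # reported as False on 1.9.1 in this thread
```

If `parent_masked` comes back False while `view_masked` is True, the installed version shows the behavior reported in issue 5558.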
>
>>
>> Best,
>> John
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion <at> scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sun, 15 Mar 2015 21:32:49 -0700
> From: Robert McGibbon <rmcgibbo <at> gmail.com>
> Subject: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
> <CAN4+E8Ff_Ck-9GBRCbSTq6qPiuGxgKeiX3+kKrXn4NM-Lnn6rg <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> numpy.histogram is implemented in Python and is a little sluggish. This
> has been discussed previously on the mailing list [1, 2]. It came up in a
> project that I maintain, where a new feature is bottlenecked by
> numpy.histogram, and one developer suggested a faster implementation in
> Cython [3].
>
> Would it make sense to reimplement this function in C or Cython? Is moving
> functions like this from Python to C to improve performance within the
> scope of the development roadmap for NumPy? I started implementing this a
> little bit in C [4], but I figured I should check in here first.
>
> -Robert
>
> [1]
> http://scipy-user.10969.n7.nabble.com/numpy-histogram-is-slow-td17208.html
> [2] http://numpy-discussion.10968.n7.nabble.com/Fast-histogram-td9359.html
> [3] https://github.com/mdtraj/mdtraj/pull/734
> [4] https://github.com/rmcgibbo/numpy/tree/histogram
>
> ------------------------------
>
> Message: 3
> Date: Sun, 15 Mar 2015 22:12:40 -0700
> From: Stephan Hoyer <shoyer <at> gmail.com>
> Subject: [Numpy-discussion] numpy.stack -- which function, if any,
> deserves the name?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
> <CAEQ_TvdQwV52_NKnLpM9+cp681NhV5cUEiigmLMtyBkTnzyOcA <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> In the past months there have been two proposals for new numpy functions
> using the name "stack":
>
> 1. np.stack for stacking like np.asarray(np.bmat(...))
> http://thread.gmane.org/gmane.comp.python.numeric.general/58748/
> https://github.com/numpy/numpy/pull/5057
>
> 2. np.stack for stacking along an arbitrary new axis (this was my proposal)
> http://thread.gmane.org/gmane.comp.python.numeric.general/59850/
> https://github.com/numpy/numpy/pull/5605
>
> Both functions generalize the notion of stacking arrays from the existing
> hstack, vstack and dstack, but in two very different ways. Both could be
> useful -- but we can only call one "stack". Which one deserves that name?
>
> The existing *stack functions use the word "stack" to refer to combining
> arrays in two similarly different ways:
> a. For ND -> ND stacking along an existing dimension (like
> numpy.concatenate and proposal 1)
> b. For ND -> (N+1)D stacking along new dimensions (like proposal 2).
>
> I think it would be much cleaner API design if we had different words to
> denote these two different operations. Concatenate for "combine along an
> existing dimension" already exists, so my thought (when I wrote proposal
> 2), was that the verb "stack" could be reserved (going forward) for
> "combine along a new dimension." This also has the advantage of suggesting
> that "concatenate" and "stack" are the two fundamental operations for
> combining N-dimensional arrays. The documentation on this is currently
> quite confusing, mostly because no function like that in proposal 2
> currently exists.
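
To make the distinction concrete, here is a small sketch of the two operations. The helper `stack_new_axis` is a hypothetical name used only for illustration (it is not the code from either pull request); it builds "combine along a new dimension" out of `np.expand_dims` and `np.concatenate`, both of which already exist in NumPy.

```python
import numpy as np


def stack_new_axis(arrays, axis=0):
    """Join equal-shaped arrays along a NEW axis: ND -> (N+1)D."""
    # Give every input a length-1 axis at position `axis`, then
    # concatenate along that freshly inserted axis.
    expanded = [np.expand_dims(np.asarray(a), axis) for a in arrays]
    return np.concatenate(expanded, axis=axis)


x = np.zeros((2, 3))
y = np.ones((2, 3))

# Operation (a): combine along an existing dimension, ND -> ND.
print(np.concatenate([x, y], axis=0).shape)  # (4, 3)

# Operation (b): combine along a new dimension, ND -> (N+1)D.
print(stack_new_axis([x, y], axis=0).shape)  # (2, 2, 3)
print(stack_new_axis([x, y], axis=2).shape)  # (2, 3, 2)
```

The number of dimensions stays at 2 for concatenation and grows to 3 for stacking, which is the difference between cases a and b above.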
>
> Of course, the *stack functions have existed for quite some time, and in
> many cases vstack and hstack are indeed used for concatenate-like
> functionality (e.g., whenever they are used for 2D arrays/matrices). So the
> case is not entirely clear-cut. (We'll never be able to remove this
> functionality from NumPy.)
>
> In any case, I would appreciate your thoughts.
>
> Best,
> Stephan
>
> ------------------------------
>
> Message: 4
> Date: Sun, 15 Mar 2015 23:00:33 -0700
> From: Jaime Fernández del Río <jaime.frio <at> gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
> <CAPOWHWmFckwXLcGy+5tSEyQE8VTOrBg0ubKdYeJ8DZywJL_w3g <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Where do you think the performance gains will come from? The PR in your
> project that claims a 10x speed-up uses a method that is only fit for
> equally spaced bins. I want to think that implementing that exact same
> algorithm in Python with NumPy would be comparably fast, say within 2x.
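
As a rough illustration of what an equal-width-bin method could look like in Python with NumPy, here is a sketch. It is not the code from the PR mentioned above; the function name `histogram_uniform` and its signature are assumptions made for this example. It computes each value's bin index arithmetically and counts with `np.bincount`. Values that fall extremely close to a bin edge may land in a different bin than with `np.histogram` because of floating-point rounding.

```python
import numpy as np


def histogram_uniform(x, bins, lo, hi):
    """Count values of `x` in `bins` equal-width bins spanning [lo, hi]."""
    x = np.asarray(x, dtype=float).ravel()
    x = x[(x >= lo) & (x <= hi)]  # drop out-of-range values (and NaNs)
    # Scale each value to a fractional bin position, then truncate to an index.
    idx = ((x - lo) * (bins / float(hi - lo))).astype(np.intp)
    idx[idx == bins] = bins - 1  # x == hi belongs to the last bin
    return np.bincount(idx, minlength=bins)


data = np.arange(11)
counts = histogram_uniform(data, bins=5, lo=0, hi=10)
reference, _ = np.histogram(data, bins=5, range=(0, 10))
print(counts)     # [2 2 2 2 3]
print(reference)  # [2 2 2 2 3]
```

No sorting or searching is involved, which is why this approach is only suitable when the bins are equally spaced.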
>
> For the general case, NumPy is already doing most of the heavy lifting (the
> sorting and the searching) in C: simply replicating the same algorithmic
> approach entirely in C is unlikely to provide any major speed-up. And if
> the change is to the algorithm, then we should first try it out in Python.
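
For reference, the sort-and-search approach for arbitrary bin edges can be sketched as follows. This is an illustration of the idea under stated assumptions, not a copy of NumPy's internal implementation; `histogram_by_search` is a hypothetical name. The sort and both `np.searchsorted` calls run in compiled code, so very little Python-level work remains.

```python
import numpy as np


def histogram_by_search(x, edges):
    """Histogram for arbitrary increasing bin edges via sort + binary search."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    edges = np.asarray(edges, dtype=float)
    # For each edge, count how many sorted values lie below it. The last
    # edge uses side='right' so that values equal to it are included.
    below = np.concatenate([
        np.searchsorted(x, edges[:-1], side="left"),
        np.searchsorted(x, edges[-1:], side="right"),
    ])
    # Differences of the cumulative counts give the per-bin counts.
    return np.diff(below)


values = [0.5, 1.5, 1.5, 2.0, 3.0, 7.0]
print(histogram_by_search(values, [0, 1, 2, 3]))     # [1 2 2]
print(np.histogram(values, bins=[0, 1, 2, 3])[0])    # [1 2 2]
```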
>
> That said, if you can speed things up 10x, I don't think there is going to
> be much opposition to moving it to C!
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Conejo. Copy Conejo into your signature and help him with
> his plans for world domination.
>
> ------------------------------
>
> Message: 5
> Date: Sun, 15 Mar 2015 23:06:43 -0700
> From: Robert McGibbon <rmcgibbo <at> gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
> <CAN4+E8GXECy8yaJRfN_NA_V8wdOZeBTLiFM0EJKtfuoONZoMvw <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> It might make sense to dispatch to different C implements if the bins are
> equally spaced (as created by using an integer for the np.histogram bins
> argument), vs. non-equally-spaced bins.
>
> In that case, getting the bigger speedup may be easier, at least for one
> common use case.
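
One way such a dispatch decision might look, sketched in Python. The function name `choose_histogram_path` and the returned labels are hypothetical; an integer `bins` argument is treated as equally spaced, and an explicit edge array is tested with `np.allclose`, whose default tolerances are only one possible definition of "equally spaced".

```python
import numpy as np


def choose_histogram_path(bins):
    """Return 'uniform' or 'general' depending on how the bins are spaced."""
    if np.ndim(bins) == 0:
        # A scalar bin count always produces equal-width bins.
        return "uniform"
    edges = np.asarray(bins, dtype=float)
    if edges.size < 2:
        raise ValueError("need at least two bin edges")
    widths = np.diff(edges)
    # Explicit edges can still qualify if all widths match the first one.
    if np.allclose(widths, widths[0]):
        return "uniform"
    return "general"


print(choose_histogram_path(10))            # uniform
print(choose_histogram_path([0, 1, 2, 3]))  # uniform
print(choose_histogram_path([0, 1, 10]))    # general
```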
>
> -Robert
>
>
> ------------------------------
>
> Message: 6
> Date: Sun, 15 Mar 2015 23:19:59 -0700
> From: Robert McGibbon <rmcgibbo <at> gmail.com>
> Subject: Re: [Numpy-discussion] Rewrite np.histogram in c?
> To: Discussion of Numerical Python <numpy-discussion <at> scipy.org>
> Message-ID:
> <CAN4+E8Ewn+tPpZBo866qH9p=1=1vA8i6kLFvrX8XKHWwazv44A <at> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> My apologies for the typo: 'implements' -> 'implementations'
>
> -Robert
>
>
> ------------------------------
>
> End of NumPy-Discussion Digest, Vol 102, Issue 21
> *****************************************************************************************************************