David Cournapeau | 1 May 05:03
Picon

Re: a possible way to implement a plogin system

On Wed, 2008-04-30 at 16:44 -0300, Lisandro Dalcin wrote:
> David, in order to put clear what I was proposing to you in previous
> mail regarding to implementing plugin systems for numpy, please take a
> look at the attached tarball.

Thanks for looking at this Lisandro.

The problem I see with the approach of struct vs direct function
pointers is of ABI compatibility: it is easy to mess things up with
structures. There is the advantage of using only one dlsym (or
equivalent) with the struct, which may be much faster than using
hundreds of dlsym for each function. Linux does not seem to have
problems with that, but mac os X for example felt slow when I tried
doing this for several thousand functions. I did not go really far on
that, though. I don't really see why using a struct is cleaner, though.
That's really the same thing (function pointers), in both cases the name
namespace pollution will be the same, and in both cases there will be a
need to generate source code.

> 
> IMHO, this is the easier and cleaner way to deal inside python with
> plugins written in low-level C. It does not depend explicitely on
> 'dlopen' stuff from your side.

Concerning the loading mechanism, I don't understand the point of using
PyCObject_Import. By quickly looking at the code of the function, the
plugin needs to be a python module in that case, which is not needed in
our case; I don't like the fact that it is not documented either.

The code for dlopen/dlclose is really short. For each platform, it is
(Continue reading)

Anne Archibald | 1 May 05:16
Picon

Re: untenable matrix behavior in SVN

2008/4/30 Charles R Harris <charlesr.harris <at> gmail.com>:

> Some operations on stacks of small matrices are easy to get, for instance,
> +,-,*,/, and matrix multiply. The last is the interesting one. If A and B
> are stacks of matrices with the same number of dimensions with the matrices
> stored in the last two indices, then
>
> sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2)
>
> is the matrix-wise multiplication of the two stacks. If B is replaced by a
> stack of 1D vectors, x, it is even simpler:
>
> sum(A[...,:,:]*x[...,newaxis,:], axis=-1)
>
> This doesn't go through BLAS, but for large stacks of small matrices it
> might be even faster than BLAS because BLAS is kinda slow for small
> matrices.

Yes and no. For the first operation, you have to create a temporary
that is larger than either of the two input arrays. These invisible
(potentially) gigantic temporaries are the sort of thing that puzzle
users when as their problem size grows they suddenly find they hit a
massive slowdown because it starts swapping to disk, and then a
failure because the temporary can't be allocated. This is one reason
we have dot() and tensordot() even though they can be expressed like
this. (The other is of course that it lets us use optimized BLAS.)

This rather misses the point of Timothy Hochberg's suggestion (as I
understood it), though: yes, you can write the basic operations in
numpy, in a more or less efficient fashion. But it would be very
(Continue reading)

Timothy Hochberg | 1 May 06:32
Picon

Re: untenable matrix behavior in SVN



On Wed, Apr 30, 2008 at 8:16 PM, Anne Archibald <peridot.faceted <at> gmail.com> wrote:
2008/4/30 Charles R Harris <charlesr.harris <at> gmail.com>:

> Some operations on stacks of small matrices are easy to get, for instance,
> +,-,*,/, and matrix multiply. The last is the interesting one. If A and B
> are stacks of matrices with the same number of dimensions with the matrices
> stored in the last two indices, then
>
> sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2)
>
> is the matrix-wise multiplication of the two stacks. If B is replaced by a
> stack of 1D vectors, x, it is even simpler:
>
> sum(A[...,:,:]*x[...,newaxis,:], axis=-1)
>
> This doesn't go through BLAS, but for large stacks of small matrices it
> might be even faster than BLAS because BLAS is kinda slow for small
> matrices.

Yes and no. For the first operation, you have to create a temporary
that is larger than either of the two input arrays. These invisible
(potentially) gigantic temporaries are the sort of thing that puzzle
users when as their problem size grows they suddenly find they hit a
massive slowdown because it starts swapping to disk, and then a
failure because the temporary can't be allocated. This is one reason
we have dot() and tensordot() even though they can be expressed like
this. (The other is of course that it lets us use optimized BLAS.)


This rather misses the point of Timothy Hochberg's suggestion (as I
understood it), though: yes, you can write the basic operations in
numpy, in a more or less efficient fashion. But it would be very
valuable for arrays to have some kind of metadata that let them keep
track of which dimensions represented simple array storage and which
represented components of a linear algebra object. Such metadata could
make it possible to use, say, dot() as if it were a binary ufunc
taking two matrices. That is, you could feed it two arrays of
matrices, which it would broadcast to the same shape if necessary, and
then it would compute the elementwise matrix product.

The question I have is, what is the right mathematical model for
describing these
arrays-some-of-whose-dimensions-represent-linear-algebra-objects?


One idea is for each dimension to be flagged as one of "replication",
"vector", or "covector". A column vector might then be a rank-1 vector
array, a row vector might be a rank-1 covector array, a linear
operator might be a rank-2 object with one covector and one vector
dimension, a bilinear form might be a rank-2 object with two covector
dimensions. Dimensions designed for holding repetitions would be
flagged as such, so that (for example) an image might be an array of
shape (N,M,3) of types ("replication","replication","vector"); then to
apply a color-conversion matrix one would simply use dot() (or "*" I
suppose). without too much concern for which index was which. The
problem is, while this formalism sounds good to me, with a background
in differential geometry, if you only ever work in spaces with a
canonical metric, the distinction between vector and covector may seem
peculiar and be unhelpful.

Implementing such a thing need not be too difficult: start with a new
subclass of ndarray which keeps a tuple of dimension types. Come up
with an adequate set of operations on them, and implement them in
terms of numpy's functions, taking advantage of the extra information
about each axis. A few operations that spring to mind:

* Addition: it doesn't make sense to add vectors and covectors; raise
an exception. Otherwise addition is always elementwise anyway. (How
hard should addition work to match up corresponding dimensions?)
* Multiplication: elementwise across "repetition" axes, it combines
vector axes with corresponding covector axes to get some kind of
generalized matrix product. (How is "corresponding" defined?)
* Division: mostly doesn't make sense unless you have an array of
scalars (I suppose it could compute matrix inverses?)
* Exponentiation: very limited (though I suppose matrix powers could
be implemented if the shapes are right)
* Change of basis: this one is tricky because not all dimensions need
come from the same vector space
* Broadcasting: the rules may become a bit byzantine...
* Dimension metadata fiddling

Is this a useful abstraction? It seems like one might run into trouble
when dealing with arrays whose dimensions represent vectors from
unrelated spaces.

Thanks Anne. That is exactly what I had in mind. Alas, every time I sit down to try to prototype some code, it collapses under its own weight. I'm becoming warmer to an extended version of the row/col/matrix idea just because its simpler to understand. It would be just as you describe above but would support only four cases:
  1. scalar: all axes are 'replication' axes
  2. vector: last axes is a 'vector' all other replication.
  3. covector: last axes is a 'covector' all other replication
  4. matrix: last two axes are covector and vector respectively. Others are replication.
Or something like that. It's basically the row/col/matrix formulation with stacking. I suspect in practice it gives us most of the power of the full proposal with less complexity (for the user anyway). Then again, if the implementation turns out not to be that bad, one could always implement some convenience types on top of it to make it usable for the masses with the fully general version available for the stout of heart.
 
Regards,

--
. __
. |-\
.
. tim.hochberg <at> ieee.org
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
ctw | 1 May 06:37

ndarray subclassing

Hi!

I ran into some strange (at least to me) issues with sublasses of
ndarray. The following minimal class definition illustrates the
problem:

====================================================

import numpy as np
class TestArray(np.ndarray):
    def __new__(cls, data, info=None, dtype=None, copy=False):
        subarr = np.array(data, dtype=dtype, copy=copy)
        subarr = subarr.view(cls)
        return subarr

    def __array_finalize__(self,obj):
        print "self: ",self.shape
        print "obj: ",obj.shape

=====================================================

When I run this code interactively with IPython and then generate
TestArray instances, __array_finalize__ seems to get called when
printing out arrays with more than 1 dimension and self.shape seems to
drop a dimension. Everything works fine if the array has just 1
dimension:

In [3]: x = TestArray(np.arange(5))
self:  (5,)
obj:  (5,)

In [4]: x
Out[4]: TestArray([0, 1, 2, 3, 4])

This is all expected behavior.
However things change when the array is 2-D:

In [5]: x = TestArray(np.zeros((2,3)))
self:  (2, 3)
obj:  (2, 3)

In [6]: x
Out[6]: self:  (3,)
obj:  (2, 3)
self:  (3,)
obj:  (2, 3)

TestArray([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Now when printing out the array, __array_finalize__ seems to get
called twice and each time self seems to only refer to one row of the
array. Can anybody explain what is going on and why? This behavior
seems to lead to problems when the __array_finalize__ method performs
checks on the shape of the array. In the matrix class this seems to be
circumvented with a special _getitem flag that bypasses the shape
checks in __array_finalize__ and an analogous solution works for my
class, too. However, I'm still puzzled by this behavior and am hoping
that somebody here can shed some light on it.

Thanks!
CTW
Travis E. Oliphant | 1 May 06:41

Re: recarray fun

Stéfan van der Walt wrote:
> 2008/4/30 Christopher Barker <Chris.Barker <at> noaa.gov>:
>   
>> Stéfan van der Walt wrote:
>>  > That's the way, or just rgba_image.view(numpy.int32).
>>
>>  ah -- interestingly, I tried:
>>
>>  rgba_image.view(dtype=numpy.int32)
>>
>>  and got:
>>
>>  Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>  TypeError: view() takes no keyword arguments
>>
>>  Since it is optional, shouldn't it be keyword argument?
>>     
>
> Thanks, fixed in r5115.
>
>   
This was too hasty.   I had considered this before.

The problem with this is that the object can be either a type object or 
a data-type object.  You can use view to both re-cast a numpy array as 
another subtype or as another data-type.

So, please revert the change until a better solution is posted. 

-Travis
Charles R Harris | 1 May 06:41
Picon

Re: untenable matrix behavior in SVN



On Wed, Apr 30, 2008 at 9:16 PM, Anne Archibald <peridot.faceted <at> gmail.com> wrote:
2008/4/30 Charles R Harris <charlesr.harris <at> gmail.com>:

> Some operations on stacks of small matrices are easy to get, for instance,
> +,-,*,/, and matrix multiply. The last is the interesting one. If A and B
> are stacks of matrices with the same number of dimensions with the matrices
> stored in the last two indices, then
>
> sum(A[...,:,:,newaxis]*B[...,newaxis,:,:], axis=-2)
>
> is the matrix-wise multiplication of the two stacks. If B is replaced by a
> stack of 1D vectors, x, it is even simpler:
>
> sum(A[...,:,:]*x[...,newaxis,:], axis=-1)
>
> This doesn't go through BLAS, but for large stacks of small matrices it
> might be even faster than BLAS because BLAS is kinda slow for small
> matrices.

Yes and no. For the first operation, you have to create a temporary
that is larger than either of the two input arrays. These invisible
(potentially) gigantic temporaries are the sort of thing that puzzle
users when as their problem size grows they suddenly find they hit a
massive slowdown because it starts swapping to disk, and then a
failure because the temporary can't be allocated. This is one reason
we have dot() and tensordot() even though they can be expressed like
this. (The other is of course that it lets us use optimized BLAS.)

But it is interesting that you can multiply stacks of matrices that way, is it not? I haven't seen it mentioned elsewhere.

Chuck


_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Travis E. Oliphant | 1 May 06:45

Re: ndarray subclassing

ctw wrote:
> Hi!
>
> I ran into some strange (at least to me) issues with sublasses of
> ndarray. The following minimal class definition illustrates the
> problem:
>
> ====================================================
>
> import numpy as np
> class TestArray(np.ndarray):
>     def __new__(cls, data, info=None, dtype=None, copy=False):
>         subarr = np.array(data, dtype=dtype, copy=copy)
>         subarr = subarr.view(cls)
>         return subarr
>
>     def __array_finalize__(self,obj):
>         print "self: ",self.shape
>         print "obj: ",obj.shape
>
> =====================================================
>
> When I run this code interactively with IPython and then generate
> TestArray instances, __array_finalize__ seems to get called when
> printing out arrays with more than 1 dimension and self.shape seems to
> drop a dimension. Everything works fine if the array has just 1
> dimension:
>
> In [3]: x = TestArray(np.arange(5))
> self:  (5,)
> obj:  (5,)
>
> In [4]: x
> Out[4]: TestArray([0, 1, 2, 3, 4])
>
> This is all expected behavior.
> However things change when the array is 2-D:
>
> In [5]: x = TestArray(np.zeros((2,3)))
> self:  (2, 3)
> obj:  (2, 3)
>
> In [6]: x
> Out[6]: self:  (3,)
> obj:  (2, 3)
> self:  (3,)
> obj:  (2, 3)
>
> TestArray([[ 0.,  0.,  0.],
>        [ 0.,  0.,  0.]])
>   

You are just seeing the result of __repr__.   The printing code works by 
accessing slices of the array.  These slices create new instances of 
your TestArray class which have a smaller number of dimensions.  That's all.

Is the printing code causing you other kinds of problems?

-Travis
Chris.Barker | 1 May 07:00
Picon
Favicon

Re: recarray fun

Travis E. Oliphant wrote:
> Stéfan van der Walt wrote:
>> 2008/4/30 Christopher Barker <Chris.Barker <at> noaa.gov>:
>>>  Since it is optional, shouldn't it be keyword argument?
>>>     
>> Thanks, fixed in r5115.
>>
>>   
> This was too hasty.   I had considered this before.
> 
> The problem with this is that the object can be either a type object or 
> a data-type object.  You can use view to both re-cast a numpy array as 
> another subtype or as another data-type.

Is the issue here that this is a slightly different meaning than the 
"dtype" argument everywhere else? Frankly, that doesn't bother me -- it 
is a superset of the functionality, is it not?

Though I guess I don't really understand quite what the difference is 
between a subtype and a data-type. Or a type object vs. a datatype object.

-Chris

--

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker <at> noaa.gov
ctw | 1 May 07:15

Re: ndarray subclassing

On Thu, May 1, 2008, Travis E. Oliphant  wrote:
> You are just seeing the result of __repr__.   The printing code works by
> accessing slices of the array.  These slices create new instances of
> your TestArray class which have a smaller number of dimensions.  That's all.

Ahh, that makes sense. Thanks so much for the quick reply!

> Is the printing code causing you other kinds of problems?

The problem was that my __array_finalize__ code did some checking on
the shape of the array. In short, one of the attributes of my class is
a list that has one entry for each dimension. __array_finalize__
checks to make sure this list length and the number of dimensions
match and it throws an exception if they don't. This leads to an
exception being thrown whenever I try to print the contents of an
array with more than 1 dimensions, because I haven't yet implemented a
__getitem__ method that adjusts this attribute when slices are taken.
I didn't realize that this was what's going on and was very puzzled by
the results. It looks like I can make things work by using _getitem
flag as is done in the matrix class.

Thanks again for clearing this up! I think it would be great if
somebody with write access to the wiki could make a note of this on
the sublasses page: http://www.scipy.org/Subclasses

CTW
Anne Archibald | 1 May 07:39
Picon

Re: recarray fun

2008/5/1 Travis E. Oliphant <oliphant <at> enthought.com>:
> Stéfan van der Walt wrote:
>  > 2008/4/30 Christopher Barker <Chris.Barker <at> noaa.gov>:
>  >
>  >> Stéfan van der Walt wrote:
>  >>  > That's the way, or just rgba_image.view(numpy.int32).
>  >>
>  >>  ah -- interestingly, I tried:
>  >>
>  >>  rgba_image.view(dtype=numpy.int32)
>  >>
>  >>  and got:
>  >>
>  >>  Traceback (most recent call last):
>  >>    File "<stdin>", line 1, in <module>
>  >>  TypeError: view() takes no keyword arguments
>  >>
>  >>  Since it is optional, shouldn't it be keyword argument?
>  >>
>  >
>  > Thanks, fixed in r5115.
>  >
>  >
>  This was too hasty.   I had considered this before.
>
>  The problem with this is that the object can be either a type object or
>  a data-type object.  You can use view to both re-cast a numpy array as
>  another subtype or as another data-type.
>
>  So, please revert the change until a better solution is posted.

If we're going to support keyword arguments, shouldn't those two
options be *different* keywords (dtype and ndarray_subclass, say)?
Then a single non-keyword argument tells numpy to guess which one you
wanted...

Anne
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Gmane