Xuedong Zhang | 1 Aug 04:02 2007

Save complex value ndarray

Hi,
Thanks to all the people who made this wonderful software.
I am just switching from matlab and need to save some complex
data in the MAT file format.

The version I have (the scipy.io.mio.savemat function) only saves MAT v4
files and doesn't support complex data. I wonder if there is any recent
work on SVN, or whether I have to work out my own solution?

Thanks
Xuedong
David Cournapeau | 1 Aug 04:03 2007

Re: Save complex value ndarray

Xuedong Zhang wrote:
> Hi,
> Thanks for all people made up this wonderful software.
> I am just switching from matlab and still need to save some complex
> data into mat file format.
>
> The version I have (scipy.io.mio.savemat function) only save matV4 file 
> and don't
> support complex data. I wonder if there is any recent work on SVN or I 
> have to
> work out my own solution???
>   
I don't have scipy 0.5.2 at hand now, but I can confirm that
scipy.io.savemat from SVN scipy can save complex arrays into mat files,
which are loadable in matlab.
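A minimal round-trip check, assuming a scipy new enough to carry the SVN savemat (scipy.io.savemat/loadmat are the current names of the interface, not the 0.5.2 one):

```python
import os
import tempfile

import numpy as np
from scipy.io import savemat, loadmat

# A complex-valued array to round-trip through the MAT file format.
z = np.array([1 + 2j, 3 - 4j, 0.5j], dtype=np.complex128)

path = os.path.join(tempfile.mkdtemp(), "data.mat")
savemat(path, {"z": z})        # writes a MAT file readable by matlab
back = loadmat(path)["z"]      # loadmat returns 2-D arrays (matlab semantics)

assert back.dtype.kind == "c"  # complex data survived the round trip
assert np.allclose(back.ravel(), z)
```

Note that loadmat hands back 2-D arrays to match matlab's semantics, hence the ravel() before comparing.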

David
David Cournapeau | 1 Aug 07:13 2007

FFTW performances in scipy and numpy

Hi Stevens,

    I am one of the contributors to numpy/scipy. Let me first say that I am 
*not* the main author of the fftw wrapping for scipy, that I am a 
relative newcomer to scipy, and that I do not claim a deep understanding of 
numpy arrays. But I have been thinking a bit about the problem, since I am a 
big user of fft and have debugged some problems in the scipy code.

    - About copying killing performance: I am well aware of the 
problem; this was only a quick hack, because performance was abysmal 
before it (the plan was computed for every fft!), and I had some 
difficulty following the code well enough to do something better. At least, it made 
performance acceptable.
    - Because I found the code difficult to follow, I started 
cleaning up the sources. The real goal is to add a better mechanism to 
use fftw as efficiently as possible.

To improve performance I thought about several approaches, which 
happen to be the ones you suggest :)

    - Making numpy data 16-byte aligned. This one is a bit tricky. I 
won't bother you with the details, but in general, numpy data may not 
even be "word aligned". Since some archs require some kind of alignment, 
the numpy API has mechanisms to get aligned buffers from unaligned buffers; 
I was thinking about an additional "SIMD aligned" flag, 
since this could be quite useful for the many optimized libraries using 
SIMD. But maybe this does not make sense; I have not yet thought about it 
enough to propose anything concrete to the main numpy developers.
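The alignment test itself is cheap to do from Python; a minimal sketch of the check such a flag would cache (is_aligned is a hypothetical helper, not part of the numpy API):

```python
import numpy as np

def is_aligned(a, boundary=16):
    """Whether the ndarray's data buffer starts on a `boundary`-byte address.

    This is the test a hypothetical "SIMD aligned" flag would cache:
    the address of the first element, modulo the required boundary.
    """
    return a.ctypes.data % boundary == 0

a = np.zeros(64, dtype=np.float64)   # malloc'd buffer, at least 8-byte aligned
b = a[1:]                            # view whose data starts 8 bytes later

# Since the two buffers differ by exactly 8 bytes, exactly one of them
# can be 16-byte aligned.
assert is_aligned(a, 16) != is_aligned(b, 16)
```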
    - I have tried FFTW_UNALIGNED + FFTW_ESTIMATE plans; unfortunately, 
I found that performance was worse than with FFTW_MEASURE + copy 
(Continue reading)

Anne Archibald | 1 Aug 09:04 2007

Re: FFTW performances in scipy and numpy

On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:

>     I am one of the contributor to numpy/scipy. Let me first say I am
> *not* the main author of the fftw wrapping for scipy, and that I am a
> relatively newcommer in scipy, and do not claim a deep understanding of
> numpy arrays. But I have been thinking a bit on the problem since I am a
> big user of fft and debugged some problems in the scipy code since.

I have been using FFTW from raw C, and thinking about how it could be
used efficiently from Python. (I also do oodles of FFTs in numpy, but
they are not performance-critical.)

>     - about copying killing performances: I am well aware of the
> problem, this was only a quick hack because the performances were abysal
> before this hack (the plan was computed for every fft !), and I had some
> difficulties to follow the code for something better. At least, it made
> the performances acceptable.

How much faster are plans likely to get with FFTW_DESTROY? Is
copy+FFTW_DESTROY still much slower than FFTW_PRESERVE?
Multidimensional real ffts may require copying anyway, since they
don't support FFTW_PRESERVE (and the numpy fft interface doesn't
damage its input).

>     - Because I found the code difficult to follow code, I started
> cleaning up the sources. The real goal is to add a better mechanism to
> use fftw as efficiently as possible.

It might be worth providing a thinner wrapper (as well as the current
simple one) so that one could control FFTW's planning from python
(Continue reading)
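One way such a thinner wrapper could expose and cache planning, sketched in Python (FFTPlanner and its cache key are invented for illustration; numpy.fft.fft stands in for executing a real fftw plan):

```python
import numpy as np

class FFTPlanner:
    """Sketch of a thinner wrapper: planning is exposed to the caller and
    cached by (shape, dtype, alignment, flags) instead of being redone for
    every transform.  The cached "plan" here is just the key itself; a real
    wrapper would hold an fftw_plan created with the requested flags."""

    def __init__(self):
        self._plans = {}

    def plan(self, a, flags=("FFTW_MEASURE",)):
        key = (a.shape, a.dtype.str, a.ctypes.data % 16 == 0, flags)
        if key not in self._plans:
            # Expensive planning (fftw_plan_dft_...) would happen here.
            self._plans[key] = key
        return self._plans[key]

    def fft(self, a):
        self.plan(a)              # reuse or create the plan
        return np.fft.fft(a)      # stand-in for executing the plan

planner = FFTPlanner()
x = np.random.rand(256) + 0j
y1 = planner.fft(x)
y2 = planner.fft(x)               # second call hits the cached plan
assert len(planner._plans) == 1
assert np.allclose(y1, y2)
```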

David Cournapeau | 1 Aug 09:49 2007

Re: FFTW performances in scipy and numpy

Anne Archibald wrote:
> On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:
>
>>     I am one of the contributor to numpy/scipy. Let me first say I am
>> *not* the main author of the fftw wrapping for scipy, and that I am a
>> relatively newcommer in scipy, and do not claim a deep understanding of
>> numpy arrays. But I have been thinking a bit on the problem since I am a
>> big user of fft and debugged some problems in the scipy code since.
>
Ok, I prepared a small package to test several strategies:

http://www.ar.media.kyoto-u.ac.jp/members/david/archives/fftdev.tbz2

Running make test should build it out of the box and run the tests (if 
you are on Linux and have gcc and fftw3, of course :) ). I did not even 
check whether the computation is correct (I just tested against memory 
problems under valgrind).

Three strategies are available:
    - have a flag to check whether the given array is 16-byte aligned, 
and conditionally build plans using this info
    - use FFTW_UNALIGNED, and do not care about alignment
    - current strategy: copy.

All three strategies use FFTW_MEASURE, which I didn't do before, and 
which may explain the wrong things I said in my previous email.

There are four binaries, two of them for the first strategy: in one case 
(testsimd) I use standard malloc, and in the second case I force 
malloc to be 16-byte aligned.
(Continue reading)

John Travers | 1 Aug 13:18 2007

Re: FFTW performances in scipy and numpy

On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:
> Anne Archibald wrote:
> > On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:
> >
> >>     I am one of the contributor to numpy/scipy. Let me first say I am
> >> *not* the main author of the fftw wrapping for scipy, and that I am a
> >> relatively newcommer in scipy, and do not claim a deep understanding of
> >> numpy arrays. But I have been thinking a bit on the problem since I am a
> >> big user of fft and debugged some problems in the scipy code since.
> >
> Ok, I prepared a small package to test several strategies:
>
> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/fftdev.tbz2
>
> By doing make test, it should build out of the box and run the tests (if
> you are on Linux, have gcc and fftw3, of course :) ). I did not even
> check whether the computation is OK (I just tested against memory
> problems under valgrind).
>
> 3 strategies are available:
>     - Have a flag to check whether the given array is 16 bytes aligned,
> and conditionnally build plans using this info
>     - Use FFTW_UNALIGNED, and do not care about alignement
>     - Current strategy: copy.
>
> The three strategies use FFTW_MEASURE, which I didn't do before, and may

Another strategy worth trying is to use FFTW_MEASURE once and then
FFTW_ESTIMATE for additional arrays. FFTW accumulates wisdom, and
so the initial call with MEASURE means that further estimated plans
(Continue reading)
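The policy John describes can be sketched as a tiny flag chooser (the FFTW flag names are real; the planner that would consume them is left out):

```python
# Sketch of the measure-once strategy: the first array of a given shape
# is planned with the expensive FFTW_MEASURE, and later plans fall back
# to FFTW_ESTIMATE, on the assumption that the wisdom accumulated by the
# first MEASURE makes estimated plans nearly as good.

_measured_shapes = set()

def choose_flag(shape):
    if shape not in _measured_shapes:
        _measured_shapes.add(shape)
        return "FFTW_MEASURE"    # pay the planning cost once per shape
    return "FFTW_ESTIMATE"       # later plans reuse the wisdom

assert choose_flag((256,)) == "FFTW_MEASURE"
assert choose_flag((256,)) == "FFTW_ESTIMATE"
assert choose_flag((512,)) == "FFTW_MEASURE"
```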

David Cournapeau | 1 Aug 13:41 2007

Re: FFTW performances in scipy and numpy

John Travers wrote:
> On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:
>   
>> Anne Archibald wrote:
>>     
>>> On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:
>>>
>>>       
>>>>     I am one of the contributor to numpy/scipy. Let me first say I am
>>>> *not* the main author of the fftw wrapping for scipy, and that I am a
>>>> relatively newcommer in scipy, and do not claim a deep understanding of
>>>> numpy arrays. But I have been thinking a bit on the problem since I am a
>>>> big user of fft and debugged some problems in the scipy code since.
>>>>         
>> Ok, I prepared a small package to test several strategies:
>>
>> http://www.ar.media.kyoto-u.ac.jp/members/david/archives/fftdev.tbz2
>>
>> By doing make test, it should build out of the box and run the tests (if
>> you are on Linux, have gcc and fftw3, of course :) ). I did not even
>> check whether the computation is OK (I just tested against memory
>> problems under valgrind).
>>
>> 3 strategies are available:
>>     - Have a flag to check whether the given array is 16 bytes aligned,
>> and conditionnally build plans using this info
>>     - Use FFTW_UNALIGNED, and do not care about alignement
>>     - Current strategy: copy.
>>
>> The three strategies use FFTW_MEASURE, which I didn't do before, and may
(Continue reading)

dmitrey | 1 Aug 15:49 2007

GSoC schedule question (letter for my mentors & Matthieu Brucher)

hi all,
this is a letter primarily for my mentors & Matthieu Brucher.
According to the schedule proposed by my mentors, I should work on chapter 2:

2. Make existing openopt code usable by adding docstrings,
   unit tests, and documented sample scripts.
   Docstrings must conform to the standard.
   http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines
   Tests must conform to the TestingGuidelines
   http://projects.scipy.org/scipy/scipy/wiki/TestingGuidelines
   Make sure ralg and lincher are fully functional, tested, and documented,
   completing as much as possible of your projected work on ralg
   (i.e., constraints to nonsmooth ralg solver (c<0, h=0, lb, ub, A, Aeq),
   as well as their derivatives).  Documentation must be 
   such as to allow other developers to understand and 
   maintain the code, should you cease maintenance.
   Sample scripts should ensure that users have informative
   (and documented!) examples to work with.
   (7 days)

So, one of the main problems for lincher (along with the external QP solver from cvxopt)
is that a good line-search minimizer that takes the slope angle into account is absent
(we discussed the problem some weeks ago on the mailing lists).
In scipy.optimize there is only one appropriate function: line_search. However, there are
some problems with the line_search docstring:

line_search(f, myfprime, xk, pk, gfk, old_fval, old_old_fval, args=(), c1=0.0001,
c2=0.90000000000000002, amax=50)
    Find alpha that satisfies strong Wolfe conditions.
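For what it's worth, here is a minimal call that works with a current scipy.optimize.line_search, where old_fval and old_old_fval can simply be left at their defaults (the quadratic test function is of course just an example):

```python
import numpy as np
from scipy.optimize import line_search

# Quadratic test problem: f(x) = x.x, minimized along the steepest
# descent direction from xk = (1, 1).
f = lambda x: np.dot(x, x)
fprime = lambda x: 2.0 * x

xk = np.array([1.0, 1.0])
pk = -fprime(xk)          # descent direction
gfk = fprime(xk)          # gradient at xk

alpha = line_search(f, fprime, xk, pk, gfk)[0]

# The returned alpha must satisfy the sufficient-decrease (first strong
# Wolfe) condition with the default c1 = 1e-4.
assert alpha is not None and alpha > 0
assert f(xk + alpha * pk) <= f(xk) + 1e-4 * alpha * np.dot(gfk, pk)
```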

(Continue reading)

Matthieu Brucher | 1 Aug 16:18 2007

Re: GSoC schedule question (letter for my mentors & Matthieu Brucher)

Hi,
 

> So, as you see, the params are not described; in particular I'm interested in old_fval and old_old_fval.
> So, there was a letter from the NLPy developer about an alternative (a link to an article was provided); also, Matthieu proposed using one of his own solvers.
> I think the importance of the auxiliary solver is very high and it should be addressed first. In my opinion, the next thing to do is to use an appropriate solver from Matthieu's package. However, Matthieu's syntax differs too much from the openopt one. I think first of all there should be an openopt binding to Matthieu's package, which would allow oo users to use the same syntax:
> prob = NLP(...)
> r = prob.solve()
> I would implement the binding myself, but I lack well-described API documentation of Matthieu's code. First of all, I'm interested:
> 1) what solvers does it contain?

I put a small description here: https://projects.scipy.org/scipy/scikits/wiki/Optimization (still in progress).
 

> 2) how can xtol, funtol, contol, etc. be passed to the solvers?


Each of these parameters is either step information, line-search information or criterion information. Each parameter must be given to the corresponding object that will use it (I didn't want to centralize everything, as some modules need pre-computation before they can be used in the optimizer, like the Fibonacci section search).
 

> then, secondary (it can wait; maybe default parameters would be enough for now):
> 3) which params can I modify (like the type of step (Armijo|Powell|etc), etc.)?


You can modify everything. The goal is to provide bricks that you can put together so that the optimizer does what you need. If you want to provide new modules, here are some "rules":
- the function should be provided as an object that defines the correct methods, like __call__, gradient or hessian if needed (the case of a finite-difference approximation of the gradient should be programmed with a class from which the function derives, but we can speak of this in another mail if you want details on this one)
- a criterion module takes only one argument, which is the current state of the optimizer
- a step module takes three arguments: the function being optimized, the point where to search for a step, and the state of the optimizer
- a line-search module takes four arguments: the point, the computed step, the function and the state (I suppose this should be refactored to be more consistent with the step module...)
- the core optimizer uses these modules and dispatches the data accordingly
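A toy sketch of these rules may make them concrete; all class names and the state layout below are invented to match the signatures just described, not taken from the actual scikit:

```python
import numpy as np

class GradientStep:
    """Step module: (function, point, state) -> step direction."""
    def __call__(self, function, point, state):
        return -function.gradient(point)

class FixedLineSearch:
    """Line-search module: (point, step, function, state) -> new point."""
    def __init__(self, alpha):
        self.alpha = alpha
    def __call__(self, point, step, function, state):
        return point + self.alpha * step

class IterationCriterion:
    """Criterion module: takes only the optimizer state."""
    def __init__(self, max_iter):
        self.max_iter = max_iter
    def __call__(self, state):
        return state["iteration"] < self.max_iter

class Quadratic:
    """Function object defining __call__ and gradient."""
    def __call__(self, x):
        return float(np.dot(x, x))
    def gradient(self, x):
        return 2.0 * x

class Optimizer:
    """Core optimizer: dispatches data between the modules."""
    def __init__(self, function, step, line_search, criterion, x0):
        self.function, self.step = function, step
        self.line_search, self.criterion = line_search, criterion
        self.state = {"iteration": 0, "x": np.asarray(x0, dtype=float)}
    def optimize(self):
        while self.criterion(self.state):
            s = self.step(self.function, self.state["x"], self.state)
            self.state["x"] = self.line_search(
                self.state["x"], s, self.function, self.state)
            self.state["iteration"] += 1
        return self.state["x"]

x = Optimizer(Quadratic(), GradientStep(), FixedLineSearch(0.25),
              IterationCriterion(50), x0=[3.0, -2.0]).optimize()
assert np.linalg.norm(x) < 1e-6
```

The point of the design is that any module can be swapped for another with the same call signature, which is why each brick only sees the arguments listed in the rules above.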
 

> BTW, ralg is also missing a good line-search optimizer. It requires one that finds solutions with slope angle > pi/2. But it can wait: it has a reasonably good one, and the problem with lincher is more pressing.


If you have an algorithm that can do this, you only have to program it and everyone will be able to use it with the other modules.
 

> So I think the next step of the GSoC schedule should be connecting 1-2 of Matthieu's solvers (ones that take the slope angle into account, as strong Wolfe conditions do) to the native openopt syntax.


If I understand this correctly, it is wrapping some usual combinations together so that people can use them without knowing the details, like for the brent function and the Brent class? It should be easy, by overriding the optimizer constructor and by adding a solve method that just calls optimize() (in this case, I'll probably modify optimize() so that it does not return anything; the state of the optimizer would provide the answer).
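The binding dmitrey asks for could then look roughly like this; NLP and _ToyOptimizer are hypothetical stand-ins, with optimize() filling the optimizer state rather than returning a value, as proposed:

```python
class _ToyOptimizer:
    """Stand-in for a pre-assembled optimizer: optimize() fills in the
    state instead of returning a value."""
    def __init__(self, f, df, x0, steps=100, alpha=0.1):
        self.f, self.df, self.x0 = f, df, x0
        self.steps, self.alpha = steps, alpha
        self.state = {}
    def optimize(self):
        x = self.x0
        for _ in range(self.steps):
            x = x - self.alpha * self.df(x)   # plain gradient descent
        self.state["x"] = x

class NLP:
    """Hypothetical openopt-flavoured wrapper providing the familiar
    prob = NLP(...); r = prob.solve() syntax."""
    def __init__(self, f, df, x0):
        # A real binding would assemble the step, line-search and
        # criterion modules here from the problem settings.
        self._opt = _ToyOptimizer(f, df, x0)
    def solve(self):
        self._opt.optimize()           # fills the optimizer state
        return self._opt.state["x"]    # answer read off the state

prob = NLP(f=lambda x: (x - 3.0) ** 2,
           df=lambda x: 2.0 * (x - 3.0), x0=0.0)
r = prob.solve()
assert abs(r - 3.0) < 1e-3
```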
 

> Do you agree?
> 
> If yes, my question to Matthieu: can you provide a description of some appropriate solvers from your package, and/or an example of usage?

If you need more details, ask specific questions; I'll gladly answer them (and add them to the wiki).

Matthieu
_______________________________________________
Scipy-dev mailing list
Scipy-dev <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-dev
Anne Archibald | 1 Aug 17:42 2007

Re: FFTW performances in scipy and numpy

On 01/08/07, David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> wrote:
> Anne Archibald wrote:
> >
> > Not just libraries; with SSE2 and related instruction sets, it's quite
> > possible that even ufuncs could be radically accelerated - it's
> > reasonable to use SIMD (and cache control!) for even the simple case
> > of adding two arrays into a third. No code yet exists in numpy to do
> > so, but an aggressive optimizing compiler could do something with the
> > code that is there. (Of course, this has observable numerical effects,
> > so there would be the same problem as for gcc's -ffast-math flag.)
> The problem of precision really is specific to SSE and x86, right ? But
> since apple computers also use those now, I guess the problem is kind of
> pervasive :)

I think some other architectures (MIPS? not sure) may also use an
intermediate representation with more accuracy. As you say, though,
x86 and x86-64 are fairly pervasive.

BLAS would of course probably be faster (though how well does it cope
with peculiarly-strided data?) but I expect resistance to making numpy
depend on BLAS.

> > Really large numpy arrays are already going to be SIMD-aligned (on
> > Linux at least), because they are allocated on fresh pages. Small
> > arrays are going to waste space if they're SIMD-aligned. So the
> > default allocator is probably fine as it is, but it would be handy to
> > have alignment as an additional property one could request from
> > constructors and check from anywhere. I would hesitate to make it a
> > flag, since one might well care about page alignment, 32-bit
> > alignment, or whatever.
> Are you sure about the page thing ? A page is 4kb, right ? This would
> mean any double numpy arrays above 512 items is aligned... which is not
> what I observed when I tested. Since I screwed things up last time I
> checked, I should test again, though.

By "really large" I don't necessarily mean "larger than a page"; I
don't know what malloc's threshold is. I had in mind the 300-MB arrays
I'm allocating, which are definitely on fresh pages (which allows
malloc to dump them back to the OS when they get freed).
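Until such a constructor option exists, an aligned array can be carved out of an over-allocated byte buffer; a sketch (aligned_empty is not a numpy API):

```python
import numpy as np

def aligned_empty(n, dtype=np.float64, boundary=16):
    """Allocate a 1-D ndarray whose data pointer is `boundary`-byte
    aligned, by over-allocating raw bytes and slicing at the right
    offset.  This mimics the alignment-on-request constructor option
    discussed above; the helper itself is hypothetical."""
    itemsize = np.dtype(dtype).itemsize
    buf = np.empty(n * itemsize + boundary, dtype=np.uint8)
    offset = (-buf.ctypes.data) % boundary   # bytes to skip to reach the boundary
    return buf[offset:offset + n * itemsize].view(dtype)

a = aligned_empty(1000, boundary=64)
assert a.ctypes.data % 64 == 0   # aligned regardless of what malloc returned
assert a.shape == (1000,)
```

The wasted space is at most `boundary` bytes per array, which is exactly why it hurts for small arrays and is negligible for large ones.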

Anne
