Sturla Molden | 1 Sep 2009 06:06
Picon
Gravatar

A faster median (Wirth's method)


We recently has a discussion regarding an optimization of NumPy's median 
to average O(n) complexity. After some searching, I found out there is a 
selection algorithm competitive in speed with Hoare's quick select. It 
has the advantage of being a lot simpler to implement. In plain Python:

import numpy as np

def wirthselect(array, k):

    """ Niklaus Wirth's selection algortithm """

    a = np.ascontiguousarray(array)
    if (a is array): a = a.copy()

    l = 0
    m = a.shape[0] - 1
    while l < m:
        x = a[k]
        i = l
        j = m
        while 1:
            while a[i] < x: i += 1
            while x < a[j]: j -= 1
            if i <= j:
                tmp = a[i]
                a[i] = a[j]
                a[j] = tmp
                i += 1
                j -= 1
(Continue reading)

Sturla Molden | 1 Sep 2009 06:50
Picon
Gravatar

Re: A faster median (Wirth's method)

Sturla Molden skrev:
> We recently has a discussion regarding an optimization of NumPy's median 
> to average O(n) complexity. After some searching, I found out there is a 
> selection algorithm competitive in speed with Hoare's quick select.
>
> Reference:
> http://ndevilla.free.fr/median/median.pdf

After som more googling, I found this  text by Wirth himself:

http://www.oberon2005.ru/book/ads2004.pdf

The method is described on page 61 (57 in the PDF) as Hoare's quick 
select. So it seems it's just a less optimized version than that of 
Numerical Receipes, and the first reference (Devillard) was confused. 
Anyhow, it still has the advantage of looking nice in Cython and being 
very different from the Numerical Receipes code. We can rename 
wirthselect to quickselect then. Sorry for the confusion. I should have 
checked the source better.

Sturla Molden
Stefano Covino | 1 Sep 2009 10:08
Picon
Favicon

snow leopard and Numeric

Hello everybody,

I have just upgraded my Mac laptop to snow leopard.
However, I can no more compile Numeric 24.2.

Here is my output:

[MacBook-Pro-di-Stefano:~/Pacchetti/Numeric-24.2] covino% python  
setup.py build
running build
running build_py
running build_ext
building 'RNG.RNG' extension
gcc-4.2 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -arch i386 - 
arch ppc -arch x86_64 -pipe -IInclude -IPackages/FFT/Include - 
IPackages/RNG/Include -I/System/Library/Frameworks/Python.framework/ 
Versions/2.6/include/python2.6 -c Packages/RNG/Src/ranf.c -o build/ 
temp.macosx-10.6-universal-2.6/Packages/RNG/Src/ranf.o
Packages/RNG/Src/ranf.c: In function ‘Mixranf’:
Packages/RNG/Src/ranf.c:153: error: conflicting types for ‘gettimeofday’
/usr/include/sys/time.h:210: error: previous declaration of  
‘gettimeofday’ was here
Packages/RNG/Src/ranf.c: In function ‘Mixranf’:
Packages/RNG/Src/ranf.c:153: error: conflicting types for ‘gettimeofday’
/usr/include/sys/time.h:210: error: previous declaration of  
‘gettimeofday’ was here
Packages/RNG/Src/ranf.c: In function ‘Mixranf’:
Packages/RNG/Src/ranf.c:153: error: conflicting types for ‘gettimeofday’
/usr/include/sys/time.h:210: error: previous declaration of  
‘gettimeofday’ was here
(Continue reading)

Matthieu Brucher | 1 Sep 2009 10:27
Picon

Re: snow leopard and Numeric

Use Numpy instead of Numeric (no longer supported I think)?

Matthieu

2009/9/1 Stefano Covino <stefano_covino <at> yahoo.it>:
> Hello everybody,
>
> I have just upgraded my Mac laptop to snow leopard.
> However, I can no more compile Numeric 24.2.
>
> Here is my output:
>
> [MacBook-Pro-di-Stefano:~/Pacchetti/Numeric-24.2] covino% python
> setup.py build
> running build
> running build_py
> running build_ext
> building 'RNG.RNG' extension
> gcc-4.2 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -arch i386 -
> arch ppc -arch x86_64 -pipe -IInclude -IPackages/FFT/Include -
> IPackages/RNG/Include -I/System/Library/Frameworks/Python.framework/
> Versions/2.6/include/python2.6 -c Packages/RNG/Src/ranf.c -o build/
> temp.macosx-10.6-universal-2.6/Packages/RNG/Src/ranf.o
> Packages/RNG/Src/ranf.c: In function ‘Mixranf’:
> Packages/RNG/Src/ranf.c:153: error: conflicting types for ‘gettimeofday’
> /usr/include/sys/time.h:210: error: previous declaration of
> ‘gettimeofday’ was here
> Packages/RNG/Src/ranf.c: In function ‘Mixranf’:
> Packages/RNG/Src/ranf.c:153: error: conflicting types for ‘gettimeofday’
> /usr/include/sys/time.h:210: error: previous declaration of
(Continue reading)

Tim Michelsen | 1 Sep 2009 12:08
Picon

genfromtext advice

Hello,
I tried to load a ASCII table into a string array. Unfortunately, this table has
some empty chells

Here it is:
http://www.ncdc.noaa.gov/oa/climate/rcsg/cdrom/ismcs/alphanum.html

After having converted this into a text file I tried this:

$ np.genfromtxt('alphanum_to-text.txt', dtype=np.str_, delimiter='|',
skiprows=1, missing='')

But I got an error:

   1043                         dtype = np.dtype(ttype)
   1044             #
-> 1045             output = np.array(data, dtype)
   1046             if usemask:
   1047                 if dtype.names:

ValueError: setting an array element with a sequence

I sometimes experience this error message with text read in. Could this message
be made more helpful like telling in which line of the input file this occurs?
This could really reduce the trial and error.

What would be the correct way to read this data in?

Thanks,
Timmie
(Continue reading)

Pierre GM | 1 Sep 2009 12:37
Picon

Re: genfromtext advice


On Sep 1, 2009, at 6:08 AM, Tim Michelsen wrote:

> Hello,
> I tried to load a ASCII table into a string array. Unfortunately,  
> this table has
> some empty chells
>
> Here it is:
> http://www.ncdc.noaa.gov/oa/climate/rcsg/cdrom/ismcs/alphanum.html
>
> After having converted this into a text file I tried this:
>
> $ np.genfromtxt('alphanum_to-text.txt', dtype=np.str_, delimiter='|',
> skiprows=1, missing='')
>
> But I got an error:
>
>   1043                         dtype = np.dtype(ttype)
>   1044             #
> -> 1045             output = np.array(data, dtype)
>   1046             if usemask:
>   1047                 if dtype.names:
>
> ValueError: setting an array element with a sequence
>
> I sometimes experience this error message with text read in. Could  
> this message
> be made more helpful like telling in which line of the input file  
> this occurs?
(Continue reading)

Tim Michelsen | 1 Sep 2009 13:18
Picon

Re: genfromtext advice

> Mmh, perhaps.
Thanks for the quick reply.

> I'll try to see what I can do. Usually, this message  
> shows up when one of the lines you have read doesn't have the same  
> number of columns as the others. 
Could we add this error to the docstring?
As I suggested, It would be helpful to get the line number where this occurs.

>You mentioned that you're skipping  
> one line : try skipping a few more and check.
Here is an example:
http://pastebin.com/m508d1d00

If I skip the first 5512 lines everything goes well.

> You're 100% sure that you didn't miss  
> something in the conversion ?
Yes. And just did it again.
Just edited the html source and then saved to text. no magic.

> Can you post the result  (or send a link  
> to?)
May I add a improvement request and append the file there?

Thanks,
Timmie 
Citi, Luca | 1 Sep 2009 13:17
Picon
Favicon

Re: genfromtext advice

I have tried
$ awk -F '|' '{if(NF != 12) print NR;}' /tmp/pp.txt
and besides the first 23 lines and the last 3 lines of the file,
also the following have a number of '|' different from 11:
1635
2851
5538
i.e. BIKIN, BENGUERIR and TERESINA AIRPORT.
But I completely agree with you, genfromtxt could print out
the line number and the actual line giving problems.
jorgesmbox-ml | 1 Sep 2009 14:07
Picon
Picon
Favicon

Question about np.savez

Hi,
I know the documentation states that np.savez saves numpy arrays, so my question relates to misusing it.
Before reading the doc in detail, and after reading about pickle and other options to make data
persistent, I passed np.savez a list of ndarrays. It didn't complain, but when I loaded the data back, the
list had been turned into an ndarray. Is this behaviour expected? It did surprise me. Below  there is an example:

In [18]: l
Out[18]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18]),
 array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])]

In [19]: np.savez('test.npz', l=l)

In [20]: data = np.load('test.npz')

In [21]: l1 = data['l']

In [22]: l1
Out[22]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18],
       [ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27]])

jorge
Tim Michelsen | 1 Sep 2009 14:54
Picon

Re: genfromtext advice

> $ awk -F '|' '{if(NF != 12) print NR;}' /tmp/pp.txt
> and besides the first 23 lines and the last 3 lines of the file,
> also the following have a number of '|' different from 11:
> 1635
> 2851
> 5538
> i.e. BIKIN, BENGUERIR and TERESINA AIRPORT.
Looks lika some bash magic.

I will try to translate this into as python script...

Gmane