dmitrey | 1 Jan 10:43

Re: Ann: OpenOpt v 0.15 (free optimization framework)

Hi Lev,

openopt0.15.tar.gz is the stable version;

openopt.tar.gz contains the latest changes (those mentioned in the OpenOpt
blog) for those who can't use svn.

Regards, D

Lev Givon wrote:
> Why are there two tarballs available for download on 
> http://scipy.org/scipy/scikits/wiki/OpenOptInstall? They both seem to
> contain the same version of the software.
>
>                             L.G.
>
>
>   
Jarrod Millman | 2 Jan 02:10

Re: read_array problem

On Dec 17, 2007 9:12 AM, Christoph Scheit <cscheit <at> lstm.uni-erlangen.de> wrote:
> just for curiosity I have a question regarding read_array.
> When I use the scipy.io read_array function I observe
> some behaviour which I don't understand...

Hey Chris,

scipy.io.read_array is no longer supported.  In the next release of
scipy, it will be officially deprecated.

Please take a look at numpy.loadtxt(), which has the same
functionality with a slightly different syntax.  The loadtxt docstring
should provide detailed instructions for how to use it.  If you have
any questions about using numpy.loadtxt(), please let us know.
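As a rough illustration of the suggestion above, here is a minimal sketch of reading a whitespace-delimited table with numpy.loadtxt(). The data are made up, and io.StringIO merely stands in for a real file; the loadtxt docstring remains the reference for the available options.

```python
import io
import numpy

# Hypothetical file contents; io.StringIO stands in for a real file on disk.
text = io.StringIO(
    "# x  y\n"
    "0.0  1.0\n"
    "0.5  2.5\n"
    "1.0  4.0\n"
)

# Lines starting with '#' are treated as comments by default.
data = numpy.loadtxt(text)

# unpack=True returns the columns as separate arrays instead.
text.seek(0)
x, y = numpy.loadtxt(text, unpack=True)
```

Here data is a two-dimensional array with one row per line of input, while x and y hold the individual columns.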

Thanks,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Alexander Dietz | 2 Jan 17:35

Usage of scipy KS test

Hi,

I am trying to use the KS test implemented in scipy.stats, but I could not find an example anywhere of how to use this function for my purposes.

Therefore let me describe what I have and what I want to do. I have three lists:
x - vector of points on the x-axis
y - vector of measured values for each of the x-points (cumulative distribution, first value:0.0, last value:1.0)
m - vector containing values calculated from a model (cumulative distribution, first value: 0.0, last value:1.0)

Each list has the same length. Now I want to test the hypothesis that both vectors y and m are from the same distribution (or not from the same distribution).

I would very much appreciate it if someone could send me a concrete example using the vectors y and m.


Thanks
  Alex


_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user
Anne Archibald | 2 Jan 18:08

Re: Usage of scipy KS test

On 02/01/2008, Alexander Dietz <Alexander.Dietz <at> astro.cf.ac.uk> wrote:

> I am trying to use the KS test implemented to scipy.stats, but nowhere I
> could find an example on how to use this function, for my purposes.

It is indeed unfortunate that the man page doesn't have an example.
Here is one (in doctest format, I think, for easy inclusion into
scipy):

>>> import numpy
>>> import scipy.stats
>>> a = numpy.array([0.56957006,  0.81129082,  0.58896055,
...     0.63162055,  0.39305061, 0.92327368,  0.72176744,  0.69589162,
...     0.12716994,  0.80996302])
>>> scipy.stats.kstest(a, lambda x: x)
(0.26957006, array(0.19655500176460927))
>>> scipy.stats.kstest(a**4, lambda x: x)
(0.46678224511522154, array(0.0080924628974947677))

Let me explain: a was generated using numpy.random.uniform(size=10), so
its values are uniformly distributed. Each time scipy.stats.kstest is
run, it returns two values: the KS D value (which is not very
meaningful on its own) and the p-value, that is, the probability of
seeing a D at least this large if the values really were drawn from a
distribution with the CDF given by the second argument. You can see
that a is reasonably consistent with a uniform distribution, but a**4
is not.
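For readers on a recent SciPy, the same check can be sketched as follows. This assumes the current scipy.stats.kstest interface, which accepts a sample and a CDF callable and returns a result that unpacks into a statistic and a p-value. The printed numbers may differ from those quoted above, since the implementation has changed since this thread, and the 0.05 threshold is only an illustrative choice.

```python
import numpy
import scipy.stats

a = numpy.array([0.56957006, 0.81129082, 0.58896055, 0.63162055,
                 0.39305061, 0.92327368, 0.72176744, 0.69589162,
                 0.12716994, 0.80996302])

# CDF of the uniform distribution on [0, 1]; works elementwise on arrays.
uniform_cdf = scipy.stats.uniform.cdf

d_a, p_a = scipy.stats.kstest(a, uniform_cdf)
d_a4, p_a4 = scipy.stats.kstest(a**4, uniform_cdf)

# A small p-value is evidence against the hypothesised distribution.
alpha = 0.05  # illustrative threshold, not from the thread
for name, p in [("a", p_a), ("a**4", p_a4)]:
    verdict = "consistent with" if p > alpha else "unlikely under"
    print(f"{name}: p = {p:.3f}, {verdict} the uniform CDF")
```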

> Therefore let me describe what I have and what I want to do. I have three
> lists:
> x - vector of points on the x-axis
> y - vector of measured values for each of the x-points (cumulative
> distribution, first value:0.0, last value:1.0)
> m - vector containing values calculated from a model (cumulative
> distribution, first value: 0.0, last value:1.0)
>
> Each list has the same length. Now I want to test the hypothesis, that both
> vectors y and m are from the same distribution ( or not from the same
> distribution).
>
> I would very appreciate if someone could send me a concrete example using
> the vectors y and m.

This format is more complicated than what we need. scipy.stats.kstest
wants the list of (not necessarily sorted) x values, and a function
that evaluates the CDF. The simplest thing to do is provide it your
function that evaluates the CDF rather than computing m. If, however,
you have already computed m, you can cheat: scipy.stats.kstest only
needs to evaluate the function at the points in x, so you can create a
function based on dictionary lookup:

scipy.stats.kstest(x,dict(zip(x,m)).get)

This should return a tuple containing the KS D value and the
probability a data set like this one would be obtained from a
probability distribution with your CDF.

I should say, there's another mode scipy.stats.kstest can be used in:
you can give it a random number generator and the CDF of the
distribution it is supposed to generate, and it will see if the random
number generator is (with a reasonable probability) functioning
properly.

Is nose testing extensible enough to be able to mark (with a decorator
perhaps?) some tests as probabilistic, that is, a test which even a
correct function has a small chance of failing? The standard idiom for
such a test is to run it once, and if it fails run it again before
reporting failure.

Good luck,
Anne
dmitrey | 2 Jan 20:28

howto dot(a, b.T), some a or b coords are zeros

hi all,
I have two vectors a and b of shape (n, 1) (n can be 1...10^3, 10^4,
maybe more); some coordinates of a or b are usually zero (or of both a
and b, but more often of b). I need the matrix c = dot(a, b.T)
(c.shape = (n, n)).

What's the best way to speed up the calculation (without scipy, using only numpy)?
(I intend to use this to provide a minor enhancement for the NLP/NSP
ralg solver.)

Thank you in advance, D.
Alexander Dietz | 2 Jan 20:44

Re: Usage of scipy KS test

Hi,

thanks a lot for the quick reply, but your suggestion does not seem to work.

On Jan 2, 2008 5:08 PM, Anne Archibald <peridot.faceted <at> gmail.com> wrote:

> This format is more complicated than what we need. scipy.stats.kstest
> wants the list of (not necessarily sorted) x values, and a function
> that evaluates the CDF. The simplest thing to do is provide it your
> function that evaluates the CDF rather than computing m. If, however,
> you have already computed m, you can cheat: scipy.stats.kstest only
> needs to evaluate the function at the points in x, so you can create a
> function based on dictionary lookup:
>
> scipy.stats.kstest(x,dict(zip(x,m)).get)
>
> This should return a tuple containing the KS D value and the
> probability a data set like this one would be obtained from a
> probability distribution with your CDF.


When I use your suggestion, I get an error:

 File "/usr/lib/python2.4/site-packages/scipy/stats/stats.py", line 1716, in kstest
    cdfvals = cdf(vals, *args)
TypeError: unhashable type

I tried with get(), but this did not work either. Also, in this example I do not see the vector 'm' containing the modeled values. They must enter the expression somehow...

Suppose I calculate the D value myself. Can I then use stats.ksprob to calculate the probability? Do I have to use sqrt(n)*D as the argument?


Thanks
  Alex
Anne Archibald | 2 Jan 21:15

Re: Usage of scipy KS test

On 02/01/2008, Alexander Dietz <Alexander.Dietz <at> astro.cf.ac.uk> wrote:

> On Jan 2, 2008 5:08 PM, Anne Archibald <peridot.faceted <at> gmail.com> wrote:

> > scipy.stats.kstest(x,dict(zip(x,m)).get)

> When I use your suggestion, I get an error:
>
>  File
> "/usr/lib/python2.4/site-packages/scipy/stats/stats.py",
> line 1716, in kstest
>     cdfvals = cdf(vals, *args)
> TypeError: unhashable type
>
> I tried with get(), but this also did not work.  Also, in this example I do
> not see the vector 'm' containing the modeled values. They must enter
> somehow the expression....

Well, if x is the list of x values (floats) and m is the list of CDF
values (also floats), then zip(x,m) is the list of pairs (x, CDF(x)).
If you have arrays, you might need to convert them to lists first
(x=list(x) for example). dict(zip(x,m)) makes a dictionary out of such
a list of pairs. dict(zip(x,m)).get is a function that maps xs to ms.
Unfortunately it only maps a single x to a single m; you need to use
numpy.vectorize on it:

scipy.stats.kstest(x,numpy.vectorize(dict(zip(x,m)).get))

numpy.vectorize makes it able to map an array of xs to an array of ms.
That should work. But if you can, you should give kstest your real
CDF-calculating function (possibly wrapped in numpy.vectorize, if it
doesn't work on arrays).
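A minimal sketch of the lookup approach, assuming a recent NumPy and SciPy. The sample x and the tabulated CDF values m are made up for illustration (m is simply the uniform CDF evaluated at x). Wrapping the lookup in a small Python function, instead of passing the built-in dict method directly, is a choice made here so that numpy.vectorize receives an ordinary function.

```python
import numpy
import scipy.stats

# Illustrative data: a sample x and a model CDF tabulated at the same points.
x = numpy.array([0.05, 0.21, 0.33, 0.48, 0.52, 0.67, 0.74, 0.90])
m = x.copy()  # tabulated CDF of the uniform distribution on [0, 1]

# Plain-Python lookup table from each x to its CDF value.
table = dict(zip(x.tolist(), m.tolist()))

# Wrapping the lookup in a lambda gives numpy.vectorize an ordinary
# Python function; otypes fixes the output dtype explicitly.
cdf = numpy.vectorize(lambda v: table[float(v)], otypes=[float])

d, p = scipy.stats.kstest(x, cdf)
```

The lookup only works for values that appear in x exactly, which is all kstest needs here; a real CDF function is still the more robust option.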

> Assumed, I calculate the D-value by myself. Can I then use stats.ksprob to
> calculate the probability? Do I have to use sqrt(n)*D as argument?

I'm not sure what ksprob wants. It will really be clearer to use kstest.

I should warn you, if your probability distribution is not continuous
- like, for example, a Poisson distribution - kstest will not work.

Anne
Alexander Dietz | 2 Jan 21:24

Re: Usage of scipy KS test

Hi,

On Jan 2, 2008 8:15 PM, Anne Archibald <peridot.faceted <at> gmail.com> wrote:
> On 02/01/2008, Alexander Dietz <Alexander.Dietz <at> astro.cf.ac.uk> wrote:
>
> > On Jan 2, 2008 5:08 PM, Anne Archibald <peridot.faceted <at> gmail.com> wrote:
>
> > > scipy.stats.kstest(x,dict(zip(x,m)).get)
>
> > When I use your suggestion, I get an error:
> >
> >  File
> > "/usr/lib/python2.4/site-packages/scipy/stats/stats.py",
> > line 1716, in kstest
> >     cdfvals = cdf(vals, *args)
> > TypeError: unhashable type
> >
> > I tried with get(), but this also did not work.  Also, in this example I do
> > not see the vector 'm' containing the modeled values. They must enter
> > somehow the expression....
>
> Well, if x is the list of x values (floats) and m is the list of CDF
> values (also floats), then zip(x,m) is the list of pairs (x, CDF(x)).
> If you have arrays, you might need to convert them to lists first
> (x=list(x) for example). dict(zip(x,m)) makes a dictionary out of such
> a list of pairs. dict(zip(x,m)).get is a function that maps xs to ms.
> Unfortunately it only maps a single x to a single m; you need to use
> numpy.vectorize on it:
>
> scipy.stats.kstest(x,numpy.vectorize(dict(zip(x,m)).get))
>
> numpy.vectorize makes it able to map an array of xs to an array of ms.
> That should work. But if you can, you should give kstest your real
> CDF-calculating function (possibly wrapped in numpy.vectorize, if it
> doesn't work on arrays).

Sorry, I mixed up two vectors. In the expression above you use the vectors x and m, but not y. See the following concrete example, which defines three vectors and makes a plot:


import numpy
from pylab import clf, plot, savefig

x = numpy.asarray([0.089, 0.11, 0.161, 0.226, 0.257, 0.287, 0.31, 0.41, 0.438, 0.45,
                   0.547, 0.827, 1.13, 1.8])
y = numpy.asarray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
m = numpy.asarray([0.91405923, 1.36472838, 2.94870517, 4.59609492, 5.37847868,
                   6.11545809, 6.57990978, 8.56403531, 9.0550575, 9.20841591,
                   10.50502489, 12.50640372, 13.29624546, 13.64958435])

clf()
plot(x, y)  # measured cumulative values
plot(x, m)  # model cumulative values
savefig('test.png')

My question: with what probability do the two lines match, i.e. what is the probability that both curves are (not) from the same distribution?

Also, your example above still did not work. Here is the error:

  File "/usr/lib/python2.4/site-packages/numpy/lib/function_base.py", line 799, in __init__
    nin, ndefault = _get_nargs(pyfunc)
  File "/usr/lib/python2.4/site-packages/numpy/lib/function_base.py", line 756, in _get_nargs
    raise ValueError, 'failed to determine the number of arguments for %s' % (obj)
ValueError: failed to determine the number of arguments for <built-in method get of dict object at 0xb78703e4>


Thanks
  Alex




> > Assumed, I calculate the D-value by myself. Can I then use stats.ksprob to
> > calculate the probability? Do I have to use sqrt(n)*D as argument?
>
> I'm not sure what ksprob wants. It will really be clearer to use kstest.
>
> I should warn you, if your probability distribution is not continuous
> - like, for example, a Poisson distribution - kstest will not work.
>
> Anne
Scott Ransom | 2 Jan 21:29

Re: Usage of scipy KS test

> > Assumed, I calculate the D-value by myself. Can I then use
> > stats.ksprob to calculate the probability? Do I have to use
> > sqrt(n)*D as argument?
>
> I'm not sure what ksprob wants. It will really be clearer to use
> kstest.

ksprob (which is the same as scipy.special.kolmogorov) expects 
sqrt(n)*D.   This is useful if you determine D from your own routine 
for a two-sided KS-test, for example.
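A minimal sketch of that usage, assuming a recent SciPy in which scipy.special.kolmogorov is available. The helper name ks_pvalue_asymptotic and the values of D and n are hypothetical, and the result is the asymptotic (large-n) approximation, so it is only a rough guide for small samples.

```python
import math
import scipy.special
import scipy.stats

def ks_pvalue_asymptotic(d, n):
    """Asymptotic two-sided KS p-value for statistic d and sample size n."""
    return scipy.special.kolmogorov(math.sqrt(n) * d)

# Hypothetical values, for illustration only.
d, n = 0.25, 40
p = ks_pvalue_asymptotic(d, n)

# The same survival function is exposed as a distribution object.
p_check = scipy.stats.kstwobign.sf(math.sqrt(n) * d)
```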

Scott

-- 
Scott M. Ransom            Address:  NRAO
Phone:  (434) 296-0320               520 Edgemont Rd.
email:  sransom <at> nrao.edu             Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989
Nathan Bell | 2 Jan 21:47

Re: howto dot(a, b.T), some a or b coords are zeros

On Jan 2, 2008 1:28 PM, dmitrey <dmitrey.kroshko <at> scipy.org> wrote:
> hi all,
> I have 2 vectors a and b of shape (n, 1) (n can be 1...10^3, 10^4, may
> be more); some coords of a or b usually are zeros (or both a and b, but
> b is more often); getting matrix c = dot(a, b.T) is required (c.shape =
> (n,n))
>
> What's the best way to speedup calculations (w/o using scipy, only numpy)?
> (I intend to use the feature to provide a minor enhancement for NLP/NSP
> ralg solver).

How much more expensive is dot(a,b.T) than zeros((n,n))?  Is outer()
any faster?  What proportion of a and b are zero?

You could remove all zeros from a and b, compute the outer product of
what remains, and then paste the results back into an n by n matrix.
I doubt this would be any faster though, since the outer product
doesn't do many FLOPs.
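The remove-and-paste idea can be sketched with numpy alone. The function name sparse_outer and the test vectors are made up for this illustration, and whether it actually beats dot(a, b.T) would need timing on realistic data.

```python
import numpy

def sparse_outer(a, b):
    """Compute dot(a, b.T) for (n, 1) columns, skipping zero entries."""
    n = a.shape[0]
    ia = numpy.flatnonzero(a)  # indices of nonzero entries of a
    ib = numpy.flatnonzero(b)  # indices of nonzero entries of b
    c = numpy.zeros((n, n), dtype=numpy.result_type(a, b))
    # Outer product of the nonzero parts, pasted into the matching
    # rows and columns of the otherwise zero result.
    c[numpy.ix_(ia, ib)] = numpy.outer(a[ia, 0], b[ib, 0])
    return c

# Small illustrative vectors with mostly zero entries.
a = numpy.array([[0.0], [2.0], [0.0], [3.0]])
b = numpy.array([[1.0], [0.0], [0.0], [4.0]])
c = sparse_outer(a, b)
```

Allocating the n by n zero matrix is itself O(n^2), so any gain is limited to the multiplications that are skipped.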

I know you don't want to use scipy, but time the following:

from scipy.sparse import csr_matrix

asp = csr_matrix(a)    # sparse (n, 1) column
bsp = csr_matrix(b.T)  # sparse (1, n) row

c = asp * bsp  # time this; the result is a sparse (n, n) matrix

-- 
Nathan Bell wnbell <at> gmail.com
