1 Aug 02:48 2010

### Off by one bug in Scipy.stats.hypergeom

Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this),

scipy.stats.hypergeom?
Hypergeometric distribution

Models drawing objects from a bin.
M is total number of objects, n is total number of Type I objects.
RV counts number of Type I objects in N drawn without replacement from
population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose

float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)  # example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger
Xie Lab, UC Irvine
(949) 231-7587
```_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
```
1 Aug 03:02 2010

### Re: Off by one bug in Scipy.stats.hypergeom

On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger wrote:

Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this),

I often see slight parameterization differences for stats packages and I never assume they are the same as each other or even as their documentation until I test some examples myself.  So unless this behavior doesn't match the scipy docs then it probably doesn't count as a bug.

Alex
```_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
```
1 Aug 04:27 2010

### Re: Off by one bug in Scipy.stats.hypergeom

```On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger
<jake.biesinger <at> gmail.com> wrote:
> Hi!
> Perhaps I'm using the module incorrectly, but it looks like the x parameter
> in scipy.stats.hypergeom is off by one.  Specifically, I think it's
> one-too-high.
> From the wikipedia article
> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
> know they could be wrong-- just hear me out on this),
> scipy.stats.hypergeom?
> Hypergeometric distribution
>
>        Models drawing objects from a bin.
>        M is total number of objects, n is total number of Type I objects.
>        RV counts number of Type I objects in N drawn without replacement
> from
>        population.
> So translating wikipedia's example...
> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
> choose(50,10) = .003964583
> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
> choose(50,10) = .0001189375
> Which you can check with the python code:
> from scipy import comb as chse   # "combination" => choose
>
> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)
>  # example one
> 0.0039645830580150654155  # okay!
> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1)
> # example two
> 0.00011893749174045196247  # okay!
> Try example one with scipy.stats.hypergeom:
> # scipy.stats.hypergeom.sf(x, M, n, N)
> scipy.stats.hypergeom.sf(4,50,5,10)
> 0.00011893749169422652     # correct value for x=5, not x=4
> scipy.stats.hypergeom.sf(5,50,5,10)
> -4.6185277824406512e-14    # wrong
> It seems that changing the loc value from =0 (default) to =1 fixes the
> issue...
> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073    # close enough
> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652   # okay!
> Am I using the package wrong?

I don't know why you are using the survival function.

>From some quick examples, hypergeom looks ok:

pmf has identical results to the Wikipedia example you referenced

>>> stats.hypergeom.pmf([4,5],50,5,10)
array([ 0.00396458,  0.00011894])
>>> stats.hypergeom.pmf(4,50,5,10)
0.0039645830580151411
>>> stats.hypergeom.pmf(5,50,5,10)
0.00011893749174045286

consistency of pmf, cdf, sf

>>> stats.hypergeom.pmf(np.arange(5),50,5,10).sum()
0.9998810625082829
>>> stats.hypergeom.cdf(4,50,5,10)
0.9998810625082829
>>> stats.hypergeom.sf(4,50,5,10)
0.00011893749171709711
>>> 1-stats.hypergeom.pmf(np.arange(5),50,5,10).sum()
0.00011893749171709711

I'm always glad when someone verifies the numbers in scipy stats and
appreciate any reports of inconsistencies or bugs. If there are
differences in the parameterization, then we should make sure that
they are sufficiently documented.
Most distributions are internally tested or against numpy.random and
there are very few tests with other packages.

Thanks,

Josef

> --
> Jake Biesinger
> Xie Lab, UC Irvine
> (949) 231-7587
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
```
1 Aug 07:16 2010

### Re: Off by one bug in Scipy.stats.hypergeom

I have seen the error of my ways and apologize for the noise in the list-- I should have been using the probability mass function instead of the survival function.  Doing the summation manually shows that all is well:

def hypergeom(x, fgTotal, bgMatching, bgTotal):
return scipy.comb(fgTotal,x) * scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)

>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in range(4,min(10+1,50+1)))
0.0040835205497557125
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
0.0040835205497557186

>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in range(5,min(10+1,50+1)))
0.00011893749174045688
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
0.00011893749174045679

Thanks anyway!
--
Jake Biesinger
Xie Lab, UC Irvine
(949) 231-7587

On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger wrote:
Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this),

scipy.stats.hypergeom?
Hypergeometric distribution

Models drawing objects from a bin.
M is total number of objects, n is total number of Type I objects.
RV counts number of Type I objects in N drawn without replacement from
population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose

float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)  # example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger
Xie Lab, UC Irvine
(949) 231-7587

```_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
```
1 Aug 11:33 2010

### Re: Off by one bug in Scipy.stats.hypergeom

```On Sun, Aug 1, 2010 at 1:16 AM, Jacob Biesinger
<jake.biesinger <at> gmail.com> wrote:
> I have seen the error of my ways and apologize for the noise in the list-- I
> should have been using the probability mass function instead of the survival
> function.  Doing the summation manually shows that all is well:
> def hypergeom(x, fgTotal, bgMatching, bgTotal):
>         return scipy.comb(fgTotal,x) *
> scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)
>>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073
>>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
>>>> range(4,min(10+1,50+1)))
> 0.0040835205497557125
>>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
> 0.0040835205497557186
>
>>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652
>>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
>>>> range(5,min(10+1,50+1)))
> 0.00011893749174045688
>>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
> 0.00011893749174045679

Note that the distribution methods are vectorized. You can use an
array_like for x instead of the list comprehension.

Josef

>
> Thanks anyway!
> --
> Jake Biesinger
> Xie Lab, UC Irvine
> (949) 231-7587
>
>
> On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com>
> wrote:
>>
>> Hi!
>> Perhaps I'm using the module incorrectly, but it looks like the x
>> parameter in scipy.stats.hypergeom is off by one.  Specifically, I think
>> it's one-too-high.
>> From the wikipedia article
>> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
>> know they could be wrong-- just hear me out on this),
>> scipy.stats.hypergeom?
>> Hypergeometric distribution
>>
>>        Models drawing objects from a bin.
>>        M is total number of objects, n is total number of Type I objects.
>>        RV counts number of Type I objects in N drawn without replacement
>> from
>>        population.
>> So translating wikipedia's example...
>> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
>> choose(50,10) = .003964583
>> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
>> choose(50,10) = .0001189375
>> Which you can check with the python code:
>> from scipy import comb as chse   # "combination" => choose
>>
>> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) /
>> chse(50,10,exact=1)  # example one
>> 0.0039645830580150654155  # okay!
>> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) /
>> chse(50,10,exact=1) # example two
>> 0.00011893749174045196247  # okay!
>> Try example one with scipy.stats.hypergeom:
>> # scipy.stats.hypergeom.sf(x, M, n, N)
>> scipy.stats.hypergeom.sf(4,50,5,10)
>> 0.00011893749169422652     # correct value for x=5, not x=4
>> scipy.stats.hypergeom.sf(5,50,5,10)
>> -4.6185277824406512e-14    # wrong
>> It seems that changing the loc value from =0 (default) to =1 fixes the
>> issue...
>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
>> 0.0040835205497095073    # close enough
>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
>> 0.00011893749169422652   # okay!
>> Am I using the package wrong?
>> --
>> Jake Biesinger
>> Xie Lab, UC Irvine
>> (949) 231-7587
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
```
1 Aug 13:33 2010

### ANN: Numexpr 1.4 released

```Hi,

After the Theano talk in last EuroSciPy I suddenly realized that it would not
be too difficult to implement a multi-threaded version of Numexpr.  Well, as
usual I was terribly wrong and it took me a *long* week to do the job :-/

Anyway the thing is done now, so... enjoy!

Note for PyTables users:  Numexpr does not include changes in the API/ABI, so
the upgrade from 1.3.1 is recommended.

========================
Announcing Numexpr 1.4
========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

What's new
==========

The main improvement in this version is the support for
the GIL that hampers performance in many Python apps.

implementation is briefly described and where some benchmarks are
shown:

In case you want to know more in detail what has changed in this
version, see:

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=========================

The project is hosted at Google code in:

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi

=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy!

--

--
Francesc Alted
```
1 Aug 14:05 2010

### [ANN] ALGOPY 0.21, algorithmic differentiation in Python

```I'm happy to announce the first official release of ALGOPY in version 0.2.1.

Rationale:
~~~~~~~~
The purpose of ALGOPY is the evaluation of higher-order derivatives in
the forward and reverse mode of Algorithmic Differentiation (AD) using
univariate Taylor polynomial arithmetic. Particular focus are
functions that contain numerical linear algebra functions (e.g. inv,
dot, eigh, qr, cholesky,...) as they often appear in statistically
motivated functions.

~~~~~~~~~
http://pypi.python.org/pypi/algopy/0.2.1
or bleeding edge versions from from http://github.com/b45ch1/algopy

Documentation:
~~~~~~~~~~~~
available at http://packages.python.org/algopy/

OS Support:
~~~~~~~~~~
Linux, Windows (tested with pythonxy), should also work on Mac

Software Dependencies:
~~~~~~~~~~~~~~~~~~~~
for the core: numpy, scipy
for testing:  nose

Exampe Session:
~~~~~~~~~~~~~
Consider the contrived example where it is the goal to compute the
directional derivative df/dx_1 :

>>> import numpy; from numpy import log, exp, sin, cos
>>> import algopy; from algopy import UTPM, dot, inv, zeros
>>>
>>> def f(x):
...     A = zeros((2,2),dtype=x)
...     A[0,0] = numpy.log(x[0]*x[1])
...     A[0,1] = numpy.log(x[1]) + exp(x[0])
...     A[1,0] = sin(x[1])**2 + cos(x[0])**3.1
...     A[1,1] = x[0]**cos(x[1])
...     return log( dot(x.T,  dot( inv(A), x)))
...
>>>
>>> x = UTPM(zeros((2,1,2),dtype=float))
>>> x.data[0,0] = [1,2]
>>> x.data[1,0] = [1,0]
>>> y = f(x)
>>>
>>> print 'normal function evaluation f(x) = ',y.data[0,0]
normal function evaluation f(x) =  0.641250189986
>>> print 'directional derivative df/dx1 = ',y.data[1,0]
directional derivative df/dx1 =  1.62982340133
```
1 Aug 17:36 2010

### ANN: PySAL 1.0.0

```We are pleased to announce the release of PySAL 1.0.0.

PySAL is a library of tools for spatial data analysis written in Python.  This
release comes after an extensive integration and refactoring effort to combine
common analytical functionality in the projects STARS and PySpace into a shared
open source library for geospatial analysis and geocomputation. For more

Python 2.6 binaries for Windows and OS X are available, as well as source
tarballs for other platforms and documentation in html form.

Enjoy,

The PySAL developers

=================================================
Python Spatial Analysis Library
=================================================

.. Contents::

What is PySAL
--------------

PySAL modules
+++++++++++++

* pysal.core — Core Data Structures and IO
* pysal.cg — Computational Geometry
* pysal.esda — Exploratory Spatial Data Analysis
* pysal.inequality — Spatial Inequality Analysis
* pysal.spatial_dynamics — Spatial Dynamics
* pysal.region — Spatially constrained clustering
* pysal.weights — Spatial Weights
* pysal.FileIO — PySAL FileIO: Module for reading and writing
various file types in a Pythonic way

Documentation
-------------

The documentation site is here
http://pysal.org/contents.html

Web sites
---------

PySAL's home is here
http://pysal.org/

The developer's site is here

Mailing Lists
-------------

Please see the developer's list here

Help for users is here

Bug reports
-----------

To search for or report bugs, please see

-------------------

See the file "LICENSE.txt" for information on the history of this
software, terms & conditions for usage, and a DISCLAIMER OF ALL
WARRANTIES.

--

--
Sergio (Serge) Rey
Professor, School of Geographical Sciences and Urban Planning
Arizona State University
http://geoplan.asu.edu/rey

Editor, International Regional Science Review
http://irx.sagepub.com
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
```
2 Aug 13:13 2010

### Installing scipy 0.8.0 from sources. (where is recaster.py?)

```Here is the last message from trying to install scipy from
sources.(Numpy installed fine)

creating /usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/myfunctools.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/utils.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/__init__.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/setup.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/recaster.py ->
/usr/lib/python2.5/site-packages/scipy/io
error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory

I am wondering what causes the last line error?

error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory.

Regards,
ernie
```
2 Aug 16:17 2010

### Re: Installing scipy 0.8.0 from sources. (where is recaster.py?)

On Mon, Aug 2, 2010 at 7:13 PM, Ernesto Adorio wrote:
Here is the last message from trying to install scipy from
sources.(Numpy installed fine)

creating /usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/myfunctools.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/utils.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/__init__.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/setup.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/recaster.py ->
/usr/lib/python2.5/site-packages/scipy/io
error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory

I am wondering what causes the last line error?

error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory.

The recaster.py file is present, but somehow you can't copy it over - strange error

What OS, compilers and install command are you using? And can you please provide your whole build log?

Ralf
```_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
```

Gmane