Jacob Biesinger | 1 Aug 02:48 2010
Picon

Off by one bug in Scipy.stats.hypergeom

Hi!


Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), 

scipy.stats.hypergeom?
Hypergeometric distribution
    
       Models drawing objects from a bin.
       M is total number of objects, n is total number of Type I objects.
       RV counts number of Type I objects in N drawn without replacement from
       population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose
 
float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)  # example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough 

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
alex | 1 Aug 03:02 2010
Picon

Re: Off by one bug in Scipy.stats.hypergeom

On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com> wrote:

Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), 


I often see slight parameterization differences for stats packages and I never assume they are the same as each other or even as their documentation until I test some examples myself.  So unless this behavior doesn't match the scipy docs then it probably doesn't count as a bug.

Alex
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
josef.pktd | 1 Aug 04:27 2010
Picon

Re: Off by one bug in Scipy.stats.hypergeom

On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger
<jake.biesinger <at> gmail.com> wrote:
> Hi!
> Perhaps I'm using the module incorrectly, but it looks like the x parameter
> in scipy.stats.hypergeom is off by one.  Specifically, I think it's
> one-too-high.
> From the wikipedia article
> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
> know they could be wrong-- just hear me out on this),
> scipy.stats.hypergeom?
> Hypergeometric distribution
>
>        Models drawing objects from a bin.
>        M is total number of objects, n is total number of Type I objects.
>        RV counts number of Type I objects in N drawn without replacement
> from
>        population.
> So translating wikipedia's example...
> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
> choose(50,10) = .003964583
> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
> choose(50,10) = .0001189375
> Which you can check with the python code:
> from scipy import comb as chse   # "combination" => choose
>
> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)
>  # example one
> 0.0039645830580150654155  # okay!
> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1)
> # example two
> 0.00011893749174045196247  # okay!
> Try example one with scipy.stats.hypergeom:
> # scipy.stats.hypergeom.sf(x, M, n, N)
> scipy.stats.hypergeom.sf(4,50,5,10)
> 0.00011893749169422652     # correct value for x=5, not x=4
> scipy.stats.hypergeom.sf(5,50,5,10)
> -4.6185277824406512e-14    # wrong
> It seems that changing the loc value from =0 (default) to =1 fixes the
> issue...
> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073    # close enough
> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652   # okay!
> Am I using the package wrong?

I don't know why you are using the survival function.

>From some quick examples, hypergeom looks ok:

pmf has identical results to the Wikipedia example you referenced

>>> stats.hypergeom.pmf([4,5],50,5,10)
array([ 0.00396458,  0.00011894])
>>> stats.hypergeom.pmf(4,50,5,10)
0.0039645830580151411
>>> stats.hypergeom.pmf(5,50,5,10)
0.00011893749174045286

consistency of pmf, cdf, sf

>>> stats.hypergeom.pmf(np.arange(5),50,5,10).sum()
0.9998810625082829
>>> stats.hypergeom.cdf(4,50,5,10)
0.9998810625082829
>>> stats.hypergeom.sf(4,50,5,10)
0.00011893749171709711
>>> 1-stats.hypergeom.pmf(np.arange(5),50,5,10).sum()
0.00011893749171709711

I'm always glad when someone verifies the numbers in scipy stats and
appreciate any reports of inconsistencies or bugs. If there are
differences in the parameterization, then we should make sure that
they are sufficiently documented.
Most distributions are internally tested or against numpy.random and
there are very few tests with other packages.

Thanks,

Josef

> --
> Jake Biesinger
> Graduate Student
> Xie Lab, UC Irvine
> (949) 231-7587
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
Jacob Biesinger | 1 Aug 07:16 2010
Picon

Re: Off by one bug in Scipy.stats.hypergeom

I have seen the error of my ways and apologize for the noise in the list-- I should have been using the probability mass function instead of the survival function.  Doing the summation manually shows that all is well:


def hypergeom(x, fgTotal, bgMatching, bgTotal):
        return scipy.comb(fgTotal,x) * scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)

>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in range(4,min(10+1,50+1)))
0.0040835205497557125
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
0.0040835205497557186


>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in range(5,min(10+1,50+1)))
0.00011893749174045688
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
0.00011893749174045679


Thanks anyway!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587


On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com> wrote:
Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), 

scipy.stats.hypergeom?
Hypergeometric distribution
    
       Models drawing objects from a bin.
       M is total number of objects, n is total number of Type I objects.
       RV counts number of Type I objects in N drawn without replacement from
       population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose
 
float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)  # example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough 

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587

_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
josef.pktd | 1 Aug 11:33 2010
Picon

Re: Off by one bug in Scipy.stats.hypergeom

On Sun, Aug 1, 2010 at 1:16 AM, Jacob Biesinger
<jake.biesinger <at> gmail.com> wrote:
> I have seen the error of my ways and apologize for the noise in the list-- I
> should have been using the probability mass function instead of the survival
> function.  Doing the summation manually shows that all is well:
> def hypergeom(x, fgTotal, bgMatching, bgTotal):
>         return scipy.comb(fgTotal,x) *
> scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)
>>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073
>>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
>>>> range(4,min(10+1,50+1)))
> 0.0040835205497557125
>>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
> 0.0040835205497557186
>
>>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652
>>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
>>>> range(5,min(10+1,50+1)))
> 0.00011893749174045688
>>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
> 0.00011893749174045679

Note that the distribution methods are vectorized. You can use an
array_like for x instead of the list comprehension.

Josef

>
> Thanks anyway!
> --
> Jake Biesinger
> Graduate Student
> Xie Lab, UC Irvine
> (949) 231-7587
>
>
> On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com>
> wrote:
>>
>> Hi!
>> Perhaps I'm using the module incorrectly, but it looks like the x
>> parameter in scipy.stats.hypergeom is off by one.  Specifically, I think
>> it's one-too-high.
>> From the wikipedia article
>> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
>> know they could be wrong-- just hear me out on this),
>> scipy.stats.hypergeom?
>> Hypergeometric distribution
>>
>>        Models drawing objects from a bin.
>>        M is total number of objects, n is total number of Type I objects.
>>        RV counts number of Type I objects in N drawn without replacement
>> from
>>        population.
>> So translating wikipedia's example...
>> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
>> choose(50,10) = .003964583
>> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
>> choose(50,10) = .0001189375
>> Which you can check with the python code:
>> from scipy import comb as chse   # "combination" => choose
>>
>> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) /
>> chse(50,10,exact=1)  # example one
>> 0.0039645830580150654155  # okay!
>> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) /
>> chse(50,10,exact=1) # example two
>> 0.00011893749174045196247  # okay!
>> Try example one with scipy.stats.hypergeom:
>> # scipy.stats.hypergeom.sf(x, M, n, N)
>> scipy.stats.hypergeom.sf(4,50,5,10)
>> 0.00011893749169422652     # correct value for x=5, not x=4
>> scipy.stats.hypergeom.sf(5,50,5,10)
>> -4.6185277824406512e-14    # wrong
>> It seems that changing the loc value from =0 (default) to =1 fixes the
>> issue...
>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
>> 0.0040835205497095073    # close enough
>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
>> 0.00011893749169422652   # okay!
>> Am I using the package wrong?
>> --
>> Jake Biesinger
>> Graduate Student
>> Xie Lab, UC Irvine
>> (949) 231-7587
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
Francesc Alted | 1 Aug 13:33 2010

ANN: Numexpr 1.4 released

Hi,

After the Theano talk in last EuroSciPy I suddenly realized that it would not 
be too difficult to implement a multi-threaded version of Numexpr.  Well, as 
usual I was terribly wrong and it took me a *long* week to do the job :-/

Anyway the thing is done now, so... enjoy!

Note for PyTables users:  Numexpr does not include changes in the API/ABI, so 
the upgrade from 1.3.1 is recommended.

========================
 Announcing Numexpr 1.4
========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

What's new
==========

The main improvement in this version is the support for
multi-threading in pure C.  Threading in C provides the best
performance in nowadays multi-core machines.  In addition, this avoids
the GIL that hampers performance in many Python apps.

Just to wet your appetite, look into this page where the
implementation is briefly described and where some benchmarks are
shown:

http://code.google.com/p/numexpr/wiki/MultiThreadVM

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=========================

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy!

--

-- 
Francesc Alted
Sebastian Walter | 1 Aug 14:05 2010
Picon

[ANN] ALGOPY 0.21, algorithmic differentiation in Python

I'm happy to announce the first official release of ALGOPY in version 0.2.1.

Rationale:
~~~~~~~~
The purpose of ALGOPY is the evaluation of higher-order derivatives in
the forward and reverse mode of Algorithmic Differentiation (AD) using
univariate Taylor polynomial arithmetic. Particular focus are
functions that contain numerical linear algebra functions (e.g. inv,
dot, eigh, qr, cholesky,...) as they often appear in statistically
motivated functions.

Download:
~~~~~~~~~
http://pypi.python.org/pypi/algopy/0.2.1
or bleeding edge versions from from http://github.com/b45ch1/algopy

Documentation:
~~~~~~~~~~~~
available at http://packages.python.org/algopy/

OS Support:
~~~~~~~~~~
Linux, Windows (tested with pythonxy), should also work on Mac

Software Dependencies:
~~~~~~~~~~~~~~~~~~~~
for the core: numpy, scipy
for testing:  nose

Exampe Session:
~~~~~~~~~~~~~
Consider the contrived example where it is the goal to compute the
directional derivative df/dx_1 :

>>> import numpy; from numpy import log, exp, sin, cos
>>> import algopy; from algopy import UTPM, dot, inv, zeros
>>>
>>> def f(x):
...     A = zeros((2,2),dtype=x)
...     A[0,0] = numpy.log(x[0]*x[1])
...     A[0,1] = numpy.log(x[1]) + exp(x[0])
...     A[1,0] = sin(x[1])**2 + cos(x[0])**3.1
...     A[1,1] = x[0]**cos(x[1])
...     return log( dot(x.T,  dot( inv(A), x)))
...
>>>
>>> x = UTPM(zeros((2,1,2),dtype=float))
>>> x.data[0,0] = [1,2]
>>> x.data[1,0] = [1,0]
>>> y = f(x)
>>>
>>> print 'normal function evaluation f(x) = ',y.data[0,0]
normal function evaluation f(x) =  0.641250189986
>>> print 'directional derivative df/dx1 = ',y.data[1,0]
directional derivative df/dx1 =  1.62982340133
Serge Rey | 1 Aug 17:36 2010
Picon

ANN: PySAL 1.0.0

We are pleased to announce the release of PySAL 1.0.0.

PySAL is a library of tools for spatial data analysis written in Python.  This
release comes after an extensive integration and refactoring effort to combine
common analytical functionality in the projects STARS and PySpace into a shared
open source library for geospatial analysis and geocomputation. For more
information, please see the release notes that follow below.

You can download PySAL from here:
http://code.google.com/p/pysal/downloads/list
Python 2.6 binaries for Windows and OS X are available, as well as source
tarballs for other platforms and documentation in html form.

Enjoy,

The PySAL developers

=================================================
Python Spatial Analysis Library
=================================================

.. Contents::

What is PySAL
--------------

PySAL modules
+++++++++++++

 * pysal.core — Core Data Structures and IO
 * pysal.cg — Computational Geometry
 * pysal.esda — Exploratory Spatial Data Analysis
 * pysal.inequality — Spatial Inequality Analysis
 * pysal.spatial_dynamics — Spatial Dynamics
 * pysal.region — Spatially constrained clustering
 * pysal.weights — Spatial Weights
 * pysal.FileIO — PySAL FileIO: Module for reading and writing
                  various file types in a Pythonic way

Documentation
-------------

The documentation site is here
 http://pysal.org/contents.html

Web sites
---------

PySAL's home is here
 http://pysal.org/

The developer's site is here
 http://code.google.com/p/pysal/

Mailing Lists
-------------

Please see the developer's list here
 http://groups.google.com/group/pysal-dev

Help for users is here
 http://groups.google.com/group/openspace-list

Bug reports
-----------

To search for or report bugs, please see
 http://code.google.com/p/pysal/issues/list

License information
-------------------

See the file "LICENSE.txt" for information on the history of this
software, terms & conditions for usage, and a DISCLAIMER OF ALL
WARRANTIES.

--

-- 
Sergio (Serge) Rey
Professor, School of Geographical Sciences and Urban Planning
Arizona State University
http://geoplan.asu.edu/rey

Editor, International Regional Science Review
http://irx.sagepub.com
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Ernesto Adorio | 2 Aug 13:13 2010
Picon

Installing scipy 0.8.0 from sources. (where is recaster.py?)

Here is the last message from trying to install scipy from
sources.(Numpy installed fine)

creating /usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/myfunctools.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/utils.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/__init__.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/setup.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/arffread.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/recaster.py ->
/usr/lib/python2.5/site-packages/scipy/io
error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory

I am wondering what causes the last line error?

error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory.

Regards,
ernie
Ralf Gommers | 2 Aug 16:17 2010

Re: Installing scipy 0.8.0 from sources. (where is recaster.py?)


On Mon, Aug 2, 2010 at 7:13 PM, Ernesto Adorio <ernesto.adorio <at> gmail.com> wrote:
Here is the last message from trying to install scipy from
sources.(Numpy installed fine)

creating /usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/myfunctools.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/utils.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/__init__.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/setup.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/arffread.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/recaster.py ->
/usr/lib/python2.5/site-packages/scipy/io
error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory

I am wondering what causes the last line error?

error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory.

The recaster.py file is present, but somehow you can't copy it over - strange error

What OS, compilers and install command are you using? And can you please provide your whole build log?

Ralf
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Gmane