Jacob Biesinger | 1 Aug 2010 02:48
Picon
Gravatar

Off by one bug in Scipy.stats.hypergeom

Hi!


Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), 

scipy.stats.hypergeom?
Hypergeometric distribution
    
       Models drawing objects from a bin.
       M is total number of objects, n is total number of Type I objects.
       RV counts number of Type I objects in N drawn without replacement from
       population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose
 
float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)  # example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough 

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
alex | 1 Aug 2010 03:02
Picon
Favicon

Re: Off by one bug in Scipy.stats.hypergeom

On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com> wrote:

Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), 


I often see slight parameterization differences for stats packages and I never assume they are the same as each other or even as their documentation until I test some examples myself.  So unless this behavior doesn't match the scipy docs then it probably doesn't count as a bug.

Alex
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
josef.pktd | 1 Aug 2010 04:27
Picon

Re: Off by one bug in Scipy.stats.hypergeom

On Sat, Jul 31, 2010 at 8:48 PM, Jacob Biesinger
<jake.biesinger <at> gmail.com> wrote:
> Hi!
> Perhaps I'm using the module incorrectly, but it looks like the x parameter
> in scipy.stats.hypergeom is off by one.  Specifically, I think it's
> one-too-high.
> From the wikipedia article
> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
> know they could be wrong-- just hear me out on this),
> scipy.stats.hypergeom?
> Hypergeometric distribution
>
>        Models drawing objects from a bin.
>        M is total number of objects, n is total number of Type I objects.
>        RV counts number of Type I objects in N drawn without replacement
> from
>        population.
> So translating wikipedia's example...
> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
> choose(50,10) = .003964583
> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
> choose(50,10) = .0001189375
> Which you can check with the python code:
> from scipy import comb as chse   # "combination" => choose
>
> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)
>  # example one
> 0.0039645830580150654155  # okay!
> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1)
> # example two
> 0.00011893749174045196247  # okay!
> Try example one with scipy.stats.hypergeom:
> # scipy.stats.hypergeom.sf(x, M, n, N)
> scipy.stats.hypergeom.sf(4,50,5,10)
> 0.00011893749169422652     # correct value for x=5, not x=4
> scipy.stats.hypergeom.sf(5,50,5,10)
> -4.6185277824406512e-14    # wrong
> It seems that changing the loc value from =0 (default) to =1 fixes the
> issue...
> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073    # close enough
> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652   # okay!
> Am I using the package wrong?

I don't know why you are using the survival function.

>From some quick examples, hypergeom looks ok:

pmf has identical results to the Wikipedia example you referenced

>>> stats.hypergeom.pmf([4,5],50,5,10)
array([ 0.00396458,  0.00011894])
>>> stats.hypergeom.pmf(4,50,5,10)
0.0039645830580151411
>>> stats.hypergeom.pmf(5,50,5,10)
0.00011893749174045286

consistency of pmf, cdf, sf

>>> stats.hypergeom.pmf(np.arange(5),50,5,10).sum()
0.9998810625082829
>>> stats.hypergeom.cdf(4,50,5,10)
0.9998810625082829
>>> stats.hypergeom.sf(4,50,5,10)
0.00011893749171709711
>>> 1-stats.hypergeom.pmf(np.arange(5),50,5,10).sum()
0.00011893749171709711

I'm always glad when someone verifies the numbers in scipy stats and
appreciate any reports of inconsistencies or bugs. If there are
differences in the parameterization, then we should make sure that
they are sufficiently documented.
Most distributions are internally tested or against numpy.random and
there are very few tests with other packages.

Thanks,

Josef

> --
> Jake Biesinger
> Graduate Student
> Xie Lab, UC Irvine
> (949) 231-7587
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
Jacob Biesinger | 1 Aug 2010 07:16
Picon
Gravatar

Re: Off by one bug in Scipy.stats.hypergeom

I have seen the error of my ways and apologize for the noise in the list-- I should have been using the probability mass function instead of the survival function.  Doing the summation manually shows that all is well:


def hypergeom(x, fgTotal, bgMatching, bgTotal):
        return scipy.comb(fgTotal,x) * scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)

>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in range(4,min(10+1,50+1)))
0.0040835205497557125
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
0.0040835205497557186


>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652
>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in range(5,min(10+1,50+1)))
0.00011893749174045688
>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
0.00011893749174045679


Thanks anyway!
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587


On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com> wrote:
Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter in scipy.stats.hypergeom is off by one.  Specifically, I think it's one-too-high.

From the wikipedia article http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I know they could be wrong-- just hear me out on this), 

scipy.stats.hypergeom?
Hypergeometric distribution
    
       Models drawing objects from a bin.
       M is total number of objects, n is total number of Type I objects.
       RV counts number of Type I objects in N drawn without replacement from
       population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose
 
float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)  # example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1) # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough 

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587

_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
josef.pktd | 1 Aug 2010 11:33
Picon

Re: Off by one bug in Scipy.stats.hypergeom

On Sun, Aug 1, 2010 at 1:16 AM, Jacob Biesinger
<jake.biesinger <at> gmail.com> wrote:
> I have seen the error of my ways and apologize for the noise in the list-- I
> should have been using the probability mass function instead of the survival
> function.  Doing the summation manually shows that all is well:
> def hypergeom(x, fgTotal, bgMatching, bgTotal):
>         return scipy.comb(fgTotal,x) *
> scipy.comb(bgTotal-fgTotal,bgMatching-x) / scipy.comb(bgTotal,bgMatching)
>>>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
> 0.0040835205497095073
>>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
>>>> range(4,min(10+1,50+1)))
> 0.0040835205497557125
>>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(4, min(10+1, 50+1))])
> 0.0040835205497557186
>
>>>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
> 0.00011893749169422652
>>>> sum(scipy.stats.hypergeom.pmf(x,50,5,10) for x in
>>>> range(5,min(10+1,50+1)))
> 0.00011893749174045688
>>>> sum([hypergeom(x, 10, 5, 50) for x in scipy.arange(5, min(10+1, 50+1))])
> 0.00011893749174045679

Note that the distribution methods are vectorized. You can use an
array_like for x instead of the list comprehension.

Josef

>
> Thanks anyway!
> --
> Jake Biesinger
> Graduate Student
> Xie Lab, UC Irvine
> (949) 231-7587
>
>
> On Sat, Jul 31, 2010 at 5:48 PM, Jacob Biesinger <jake.biesinger <at> gmail.com>
> wrote:
>>
>> Hi!
>> Perhaps I'm using the module incorrectly, but it looks like the x
>> parameter in scipy.stats.hypergeom is off by one.  Specifically, I think
>> it's one-too-high.
>> From the wikipedia article
>> http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example (I
>> know they could be wrong-- just hear me out on this),
>> scipy.stats.hypergeom?
>> Hypergeometric distribution
>>
>>        Models drawing objects from a bin.
>>        M is total number of objects, n is total number of Type I objects.
>>        RV counts number of Type I objects in N drawn without replacement
>> from
>>        population.
>> So translating wikipedia's example...
>> Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
>> choose(50,10) = .003964583
>> Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
>> choose(50,10) = .0001189375
>> Which you can check with the python code:
>> from scipy import comb as chse   # "combination" => choose
>>
>> float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) /
>> chse(50,10,exact=1)  # example one
>> 0.0039645830580150654155  # okay!
>> float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) /
>> chse(50,10,exact=1) # example two
>> 0.00011893749174045196247  # okay!
>> Try example one with scipy.stats.hypergeom:
>> # scipy.stats.hypergeom.sf(x, M, n, N)
>> scipy.stats.hypergeom.sf(4,50,5,10)
>> 0.00011893749169422652     # correct value for x=5, not x=4
>> scipy.stats.hypergeom.sf(5,50,5,10)
>> -4.6185277824406512e-14    # wrong
>> It seems that changing the loc value from =0 (default) to =1 fixes the
>> issue...
>> scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
>> 0.0040835205497095073    # close enough
>> scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
>> 0.00011893749169422652   # okay!
>> Am I using the package wrong?
>> --
>> Jake Biesinger
>> Graduate Student
>> Xie Lab, UC Irvine
>> (949) 231-7587
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
Sebastian Walter | 1 Aug 2010 14:05
Picon
Gravatar

[ANN] ALGOPY 0.21, algorithmic differentiation in Python

I'm happy to announce the first official release of ALGOPY in version 0.2.1.

Rationale:
~~~~~~~~
The purpose of ALGOPY is the evaluation of higher-order derivatives in
the forward and reverse mode of Algorithmic Differentiation (AD) using
univariate Taylor polynomial arithmetic. Particular focus are
functions that contain numerical linear algebra functions (e.g. inv,
dot, eigh, qr, cholesky,...) as they often appear in statistically
motivated functions.

Download:
~~~~~~~~~
http://pypi.python.org/pypi/algopy/0.2.1
or bleeding edge versions from from http://github.com/b45ch1/algopy

Documentation:
~~~~~~~~~~~~
available at http://packages.python.org/algopy/

OS Support:
~~~~~~~~~~
Linux, Windows (tested with pythonxy), should also work on Mac

Software Dependencies:
~~~~~~~~~~~~~~~~~~~~
for the core: numpy, scipy
for testing:  nose

Exampe Session:
~~~~~~~~~~~~~
Consider the contrived example where it is the goal to compute the
directional derivative df/dx_1 :

>>> import numpy; from numpy import log, exp, sin, cos
>>> import algopy; from algopy import UTPM, dot, inv, zeros
>>>
>>> def f(x):
...     A = zeros((2,2),dtype=x)
...     A[0,0] = numpy.log(x[0]*x[1])
...     A[0,1] = numpy.log(x[1]) + exp(x[0])
...     A[1,0] = sin(x[1])**2 + cos(x[0])**3.1
...     A[1,1] = x[0]**cos(x[1])
...     return log( dot(x.T,  dot( inv(A), x)))
...
>>>
>>> x = UTPM(zeros((2,1,2),dtype=float))
>>> x.data[0,0] = [1,2]
>>> x.data[1,0] = [1,0]
>>> y = f(x)
>>>
>>> print 'normal function evaluation f(x) = ',y.data[0,0]
normal function evaluation f(x) =  0.641250189986
>>> print 'directional derivative df/dx1 = ',y.data[1,0]
directional derivative df/dx1 =  1.62982340133
Serge Rey | 1 Aug 2010 17:36
Picon
Gravatar

ANN: PySAL 1.0.0

We are pleased to announce the release of PySAL 1.0.0.

PySAL is a library of tools for spatial data analysis written in Python.  This
release comes after an extensive integration and refactoring effort to combine
common analytical functionality in the projects STARS and PySpace into a shared
open source library for geospatial analysis and geocomputation. For more
information, please see the release notes that follow below.

You can download PySAL from here:
http://code.google.com/p/pysal/downloads/list
Python 2.6 binaries for Windows and OS X are available, as well as source
tarballs for other platforms and documentation in html form.

Enjoy,

The PySAL developers

=================================================
Python Spatial Analysis Library
=================================================

.. Contents::

What is PySAL
--------------

PySAL modules
+++++++++++++

 * pysal.core — Core Data Structures and IO
 * pysal.cg — Computational Geometry
 * pysal.esda — Exploratory Spatial Data Analysis
 * pysal.inequality — Spatial Inequality Analysis
 * pysal.spatial_dynamics — Spatial Dynamics
 * pysal.region — Spatially constrained clustering
 * pysal.weights — Spatial Weights
 * pysal.FileIO — PySAL FileIO: Module for reading and writing
                  various file types in a Pythonic way

Documentation
-------------

The documentation site is here
 http://pysal.org/contents.html

Web sites
---------

PySAL's home is here
 http://pysal.org/

The developer's site is here
 http://code.google.com/p/pysal/

Mailing Lists
-------------

Please see the developer's list here
 http://groups.google.com/group/pysal-dev

Help for users is here
 http://groups.google.com/group/openspace-list

Bug reports
-----------

To search for or report bugs, please see
 http://code.google.com/p/pysal/issues/list

License information
-------------------

See the file "LICENSE.txt" for information on the history of this
software, terms & conditions for usage, and a DISCLAIMER OF ALL
WARRANTIES.

--

-- 
Sergio (Serge) Rey
Professor, School of Geographical Sciences and Urban Planning
Arizona State University
http://geoplan.asu.edu/rey

Editor, International Regional Science Review
http://irx.sagepub.com
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Ernesto Adorio | 2 Aug 2010 13:13
Picon

Installing scipy 0.8.0 from sources. (where is recaster.py?)

Here is the last message from trying to install scipy from
sources.(Numpy installed fine)

creating /usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/myfunctools.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/utils.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/__init__.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/setup.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/arffread.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/recaster.py ->
/usr/lib/python2.5/site-packages/scipy/io
error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory

I am wondering what causes the last line error?

error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory.

Regards,
ernie
Ralf Gommers | 2 Aug 2010 16:17
Gravatar

Re: Installing scipy 0.8.0 from sources. (where is recaster.py?)


On Mon, Aug 2, 2010 at 7:13 PM, Ernesto Adorio <ernesto.adorio <at> gmail.com> wrote:
Here is the last message from trying to install scipy from
sources.(Numpy installed fine)

creating /usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/myfunctools.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/utils.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/__init__.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/setup.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/arff/arffread.py ->
/usr/lib/python2.5/site-packages/scipy/io/arff
copying build/lib.linux-i686-2.5/scipy/io/recaster.py ->
/usr/lib/python2.5/site-packages/scipy/io
error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory

I am wondering what causes the last line error?

error: /usr/lib/python2.5/site-packages/scipy/io/recaster.py: No such
file or directory.

The recaster.py file is present, but somehow you can't copy it over - strange error

What OS, compilers and install command are you using? And can you please provide your whole build log?

Ralf
_______________________________________________
SciPy-User mailing list
SciPy-User <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
Thøger Emil Juul Thorsen | 2 Aug 2010 16:19
Picon
Gravatar

Re: Points with given distance to a polygon

Shapely indeed looks like exactly what I was searching for! Thanks! I
love Python; I'm able to do stuff after a couple of months which I
couldn't figure out how to do in IDL after years of usage. :-D

/Emil

On Mon, 2010-07-26 at 14:28 -0500, Joe Kington wrote:
> On Mon, Jul 26, 2010 at 1:26 PM, Zachary Pincus
> <zachary.pincus <at> yale.edu> wrote:
>         > I am working on a project where I am defining some regions
>         of
>         > interest.
>         > I have a 2200x2200 px 2D Array in which my ROI is defined by
>         a
>         > polygon.
>         > However, my data are smoothed by a gaussian kernel of width
>         300px,
>         > and I
>         > would like to draw some lines indicating this inner 150px
>         distance to
>         > the borders of the ROI. I cannot come up with any way to do
>         this, does
>         > anyone have an idea?
>         
>         
>         Two broad options spring to mind:
>         (1) Geometric -- shrink the polygon along the normals to the
>         vertices.
>         [Oh, I see that eat has given pseudocode for same... good]
>         (2) Gridded -- rasterize the polygon to a binary mask (no
>         tools for
>         this in scipy, I fear... but if you're handy with opengl or
>         something,
>         that's not too hard), and then use scipy.ndimage to erode or
>         dilate
>         the mask as necessary.
>         
>         Zach
>         
>         _______________________________________________
>         SciPy-User mailing list
>         SciPy-User <at> scipy.org
>         http://mail.scipy.org/mailman/listinfo/scipy-user
>         
> 
> If you go with a purely geometric method, look into a module like
> shapely to buffer the polygon. (similar to the code snipped eat
> posted, but more flexible) 
> 
> See this example for a general idea.  
> 
> -Joe
> _______________________________________________
> SciPy-User mailing list
> SciPy-User <at> scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user

Gmane