Anne Archibald | 2 Nov 15:50

Inclusion of Kuiper test in Scipy

Hi,

I have implemented a statistical test from the literature, the Kuiper
test, for my own work, but I think it might be worth including it in
Scipy itself. I'd like to hear other people's opinions, though, both
on what (if anything) should go into scipy, and on whether it needs
modification. The code is at:

http://github.com/aarchiba/kuiper

This code includes a number of things beyond the basic test, some or
all of which may not be worth including in Scipy. What's there:

The Kuiper test - analogous to the Kolmogorov-Smirnov test, this takes
either a sample and a callable CDF or two samples and returns an
abstract score and the probability that a score that large would have
arisen if the two arguments are from the same distribution. This test
is sensitive to somewhat different features of the distribution than
the K-S test, and, importantly, it is invariant under cyclic
permutation: that is, if all the samples and distribution are modulo
(say) 1, then any shift in both arguments leaves the value unaffected.
Thus it is well suited to periodic distributions.
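For readers who have not met the statistic, here is a minimal sketch of the one-sample case. It is not the code from the repository above: the name `kuiper_statistic` is made up for illustration, only the score V = D+ + D- is computed, and the significance calculation is left out. The small demo checks the cyclic-shift property on phases taken modulo 1 against a uniform CDF.

```python
def kuiper_statistic(sample, cdf):
    """Kuiper's V = D+ + D- for a sample against a callable CDF.

    Hypothetical helper for illustration only; significance is not computed.
    """
    xs = sorted(sample)
    n = len(xs)
    # D+: largest amount by which the empirical CDF rises above the model CDF
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the model CDF rises above the empirical CDF
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus + d_minus


def uniform_cdf(t):
    """CDF of the uniform distribution on [0, 1)."""
    return t


phases = [0.2, 0.5]
shifted = [(p + 0.6) % 1.0 for p in phases]  # same points, different origin
print(kuiper_statistic(phases, uniform_cdf))
print(kuiper_statistic(shifted, uniform_cdf))
```

The K-S statistic would instead take the larger of D+ and D-, which does change when the origin of the phases is moved; the sum does not.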

The Z_m^2 test - a test for uniformity on [0,1) based on the first m
Fourier coefficients. Returns a score and the probability of a score
that large.
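As a rough sketch of what such a test computes, assuming the usual definition Z_m^2 = (2/N) * sum over k = 1..m of [(sum_i cos 2*pi*k*phi_i)^2 + (sum_i sin 2*pi*k*phi_i)^2]: the names `z2m_statistic` and `chi2_sf_even` below are hypothetical, and the probability uses the large-N chi-squared approximation with 2m degrees of freedom, which may differ from how the repository computes significance.

```python
import math


def z2m_statistic(phases, m):
    """Z_m^2 score for phases in [0, 1), using the first m harmonics."""
    n = len(phases)
    total = 0.0
    for k in range(1, m + 1):
        c = sum(math.cos(2 * math.pi * k * p) for p in phases)
        s = sum(math.sin(2 * math.pi * k * p) for p in phases)
        total += c * c + s * s
    return 2.0 * total / n


def chi2_sf_even(x, dof):
    """Survival function of chi-squared for an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!.
    """
    if dof < 2 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    half = x / 2.0
    term = 1.0
    acc = 1.0
    for j in range(1, dof // 2):
        term *= half / j
        acc += term
    return math.exp(-half) * acc


phases = [0.1, 0.12, 0.15, 0.6, 0.11, 0.13]
score = z2m_statistic(phases, 2)
print(score, chi2_sf_even(score, 2 * 2))  # asymptotic p-value, rough for small N
```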

The H test - a test that uses a data-dependent number of harmonics to
test for uniformity. Returns the score and the probability, and also
the number of harmonics that gave the most significant detection.
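A minimal sketch of the score itself, assuming the commonly used form H = max over m = 1..20 of (Z_m^2 - 4m + 4). The function name `h_test` and the `max_harmonics` parameter are illustrative, not the repository's API, and the conversion from H to a probability is deliberately omitted here.

```python
import math


def h_test(phases, max_harmonics=20):
    """Return (H, best_m) for phases in [0, 1).

    Hypothetical helper: H is the largest value of Z_m^2 - 4*m + 4 and
    best_m is the number of harmonics at which that maximum occurs.
    """
    n = len(phases)
    best_h, best_m = -math.inf, 0
    z2 = 0.0  # running Z_m^2, built up one harmonic at a time
    for m in range(1, max_harmonics + 1):
        c = sum(math.cos(2 * math.pi * m * p) for p in phases)
        s = sum(math.sin(2 * math.pi * m * p) for p in phases)
        z2 += 2.0 * (c * c + s * s) / n
        h = z2 - 4 * m + 4
        if h > best_h:
            best_h, best_m = h, m
    return best_h, best_m


print(h_test([0.1, 0.12, 0.15, 0.6, 0.11, 0.13]))
```

The penalty of 4 per extra harmonic is what makes the number of harmonics data-dependent: a harmonic only counts if it adds more than that to the running score.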
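
fold_intervals - a function to take a series of weighted intervals and
return the total exposure of each phase modulo 1. For testing for
uniformity when you have more data from some phases than others.
cdf_from_intervals - a function to construct a piecewise-linear CDF
from a set of exposures (as returned by the above function).
histogram_intervals - A function to evaluate how much exposure each
histogram bin received, to allow testing for uniformity using a
histogram in the presence of non-uniform exposure.

There are also a couple of handy decorators in the test suite:

seed - set the random seed before running a test
double_check - for randomized tests: run once, and if it fails, run it again.

All have tests and somewhat informative docstrings, but I suspect some
of them may be too specialized to be of much use. The Kuiper test
should have wide applicability; the Z_m^2 test and H test, not so
much, although they are handy when testing for periodicity. The last
batch of utility functions I'm not sure are general enough to be very
useful, but I needed them.

What do you think? How much of this would be useful in Scipy?

Thanks,
Anne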

Jake VanderPlas | 2 Nov 16:29

Re: Inclusion of Kuiper test in Scipy

Anne,
I also recently required a Kuiper test code for my research.  I
adapted an IDL routine for python.  I'd say it is definitely worth
including.  In addition to what you listed, a routine to calculate the
significance of the Kuiper value would be useful.  I have a python
version of that code if you'd like to see it.
   -Jake

On Mon, Nov 2, 2009 at 6:50 AM, Anne Archibald
<aarchiba <at> physics.mcgill.ca> wrote:
> Hi,
>
> I have implemented a statistical test from the literature, the Kuiper
> test, for my own work, but I think it might be worth including it in
> Scipy itself. I'd like to hear other people's opinions, though, both
> on what (if anything) should go into scipy, and on whether it needs
> modification. The code is at:
>
> http://github.com/aarchiba/kuiper
>
> This code includes a number of things beyond the basic test, some or
> all of which may not be worth including in Scipy. What's there:
>
> The Kuiper test - analogous to the Kolmogorov-Smirnov test, this takes
> either a sample and a callable CDF or two samples and returns an
> abstract score and the probability that a score that large would have
> arisen if the two arguments are from the same distribution. This test
> is sensitive to somewhat different features of the distribution than
> the K-S test, and, importantly, it is invariant under cyclic
> permutation: that is, if all the samples and distribution are modulo
> [...]

Jake VanderPlas | 2 Nov 16:41

How to include new code (Re: Ball Tree)

Hello,
I have been following the developer's list for a while, and I would
like to start contributing to scipy.  I wrote the list about a Ball
Tree implementation that I would like to include in scipy.spatial.
I've written a python/C++ implementation, and for k-nearest-neighbor
searches in large dimensions (d~1000), it is about 10 times faster
than the current scipy.spatial.cKDTree.  Can someone point me to
instructions on how to start the process of including this in scipy?
I have a local copy of the scipy svn, and my code consists of a few
C++ files and a working setup.py  Thanks!
   -Jake

Ralf Gommers | 2 Nov 18:07

Re: module docstrings



On Sat, Oct 31, 2009 at 9:28 PM, <josef.pktd <at> gmail.com> wrote:
> On Sat, Oct 31, 2009 at 4:02 PM, Tom K. <tpk <at> kraussfamily.org> wrote:
>> Ralf Gommers <ralf.gommers <at> googlemail.com> writes:
>>
>>>
>>>
>>> On Sat, Oct 31, 2009 at 2:21 AM, Charles R Harris
>>> <charlesr.harris <at> gmail.com> wrote:
>>> I like routine listings....
>>>
>> MATLAB has a default behavior for documenting a directory on the path when
>> you don't have a Contents.m in that directory: it pulls the "H1" line (first
>> line of the M-file's help) from each M-file and lists those for you.
>> But the Contents.m usually was nicely formatted and grouped according
>> to function, with a list of each M-file with a 1-line summary.
>> FWIW.
>> I kind of like a listing too, but not all functions in all modules warrant
>> listing at the module level - there is an asymmetry problem.
>>

This is basically what the reST routines listings do already; one listing per area of functionality, and pulling the first line of the docstrings. See for example http://docs.scipy.org/numpy/docs/numpy-docs/reference/routines.indexing.rst/

Also remember that Contents.m is necessary in Matlab in many cases because of the one-file-per-function limit; for a Python module in a single file with __all__ at the top, this is not nearly as important.


> I'm not sure the info files in scipy are kept up to date. Since we
> moved the documentation to the rst files, I haven't looked at info.py
> anymore, except for those packages that have an automodule directive
> and load the info script. (If I'm not mistaken about the import
> mechanism for the docs.)

You're not mistaken. I'll look into an easy way to keep the existing info files updated.

> I just rely on dir(modulename) to get the actual listing,
> or better in some cases modulename.__all__  for only the public functions

Yes, you have both of those options, plus most IDEs give you nice overviews, or taglists in vim and (I assume) emacs.
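
For anyone who wants that kind of listing at the interpreter, a small sketch follows. The helper name `routine_listing` is made up for illustration; it prefers `__all__` when the module defines it, falls back to the non-underscore names from `dir()`, and pairs each name with the first line of its docstring. The standard-library `json` module is used only as a convenient example.

```python
import importlib
import inspect


def routine_listing(module_name):
    """Return (name, first docstring line) pairs for a module's public names."""
    module = importlib.import_module(module_name)
    names = getattr(module, "__all__", None)
    if names is None:
        # No __all__: treat every name without a leading underscore as public
        names = [n for n in dir(module) if not n.startswith("_")]
    listing = []
    for name in sorted(names):
        doc = inspect.getdoc(getattr(module, name)) or ""
        summary = doc.splitlines()[0] if doc else ""
        listing.append((name, summary))
    return listing


for name, summary in routine_listing("json"):
    print(f"{name:20s} {summary}")
```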
 
I created a ticket with a patch to the docstandard, http://projects.scipy.org/numpy/ticket/1280 (keeping routine listings as an optional section), and will start working on module docs soon.

Cheers,
Ralf


_______________________________________________
Scipy-dev mailing list
Scipy-dev <at> scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev

Charles R Harris | 2 Nov 19:08

Re: How to include new code (Re: Ball Tree)



On Mon, Nov 2, 2009 at 8:41 AM, Jake VanderPlas <jakevdp <at> gmail.com> wrote:
> Hello,
> I have been following the developer's list for a while, and I would
> like to start contributing to scipy.  I wrote the list about a Ball
> Tree implementation that I would like to include in scipy.spatial.
> I've written a python/C++ implementation, and for k-nearest-neighbor
> searches in large dimensions (d~1000), it is about 10 times faster
> than the current scipy.spatial.cKDTree.  Can someone point me to
> instructions on how to start the process of including this in scipy?
> I have a local copy of the scipy svn, and my code consists of a few
> C++ files and a working setup.py  Thanks!
>   -Jake

Sounds like the easiest way would be to open a ticket, attach a patch, and mark it for review. If it doesn't get any attention, complain on the list. In any case, if the new addition needs maintenance you will probably need commit privileges in the long run.

Chuck

Anne Archibald | 2 Nov 20:01

Re: Inclusion of Kuiper test in Scipy

2009/11/2 Jake VanderPlas <jakevdp <at> gmail.com>:
> Anne,
> I also recently required a Kuiper test code for my research.  I
> adapted an IDL routine for python.  I'd say it is definitely worth
> including.  In addition to what you listed, a routine to calculate the
> significance of the Kuiper value would be useful.  I have a python
> version of that code if you'd like to see it.

Actually, my code has significance calculators for all three tests
(based on Paltani 2004). But I know that the value is somewhat off for
small N - perhaps you could send me yours and I could see if it does
better?

Anne

>   -Jake
>
> On Mon, Nov 2, 2009 at 6:50 AM, Anne Archibald
> <aarchiba <at> physics.mcgill.ca> wrote:
>> Hi,
>>
>> I have implemented a statistical test from the literature, the Kuiper
>> test, for my own work, but I think it might be worth including it in
>> Scipy itself. I'd like to hear other people's opinions, though, both
>> on what (if anything) should go into scipy, and on whether it needs
>> modification. The code is at:
>>
>> http://github.com/aarchiba/kuiper
>>
>> This code includes a number of things beyond the basic test, some or
>> all of which may not be worth including in Scipy. What's there:
>>
>> The Kuiper test - analogous to the Kolmogorov-Smirnov test, this takes
>> either a sample and a callable CDF or two samples and returns an
>> abstract score and the probability that a score that large would have
>> arisen if the two arguments are from the same distribution. This test
>> is sensitive to somewhat different features of the distribution than
>> the K-S test, and, importantly, it is invariant under cyclic
>> permutation: that is, if all the samples and distribution are modulo
>> (say) 1, then any shift in both arguments leaves the value unaffected.
>> Thus it is well suited to periodic distributions.
>>
>> The Z_m^2 test - a test for uniformity on [0,1) based on the first m
>> Fourier coefficients. Returns a score and the probability of a score
>> that large.
>>
>> The H test - a test that uses a data-dependent number of harmonics to
>> test for uniformity. Returns the score and the probability, and also
>> the number of harmonics that gave the most significant detection.
>>
>> fold_intervals - a function to take a series of weighted intervals and
>> return the total exposure of each phase modulo 1. For testing for
>> uniformity when you have more data from some phases than others.
>> cdf_from_intervals - a function to construct a piecewise-linear CDF
>> from a set of exposures (as returned by the above function).
>> histogram_intervals - A function to evaluate how much exposure each
>> histogram bin received, to allow testing for uniformity using a
>> histogram in the presence of non-uniform exposure.
>>
>> There are also a couple of handy decorators in the test suite:
>>
>> seed - set the random seed before running a test
>> double_check - for randomized tests: run once, and if it fails, run it again.
>>
>> All have tests and somewhat informative docstrings, but I suspect some
>> of them may be too specialized to be of much use. The Kuiper test
>> should have wide applicability; the Z_m^2 test and H test, not so
>> much, although they are handy when testing for periodicity. The last
>> batch of utility functions I'm not sure are general enough to be very
>> useful, but I needed them.
>>
>> What do you think? How much of this would be useful in Scipy?
>>
>> Thanks,
>> Anne

Ondrej Certik | 3 Nov 20:46

Re: [Numpy-discussion] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar

Hi,

On Tue, Nov 3, 2009 at 11:28 AM, Fernando Perez <fperez.net <at> gmail.com> wrote:
> Hi folks,
>
> if you reside in the San Francisco Bay Area, you may be interested in
> a meeting we'll be having tomorrow November 4 (2-4 pm), as part of our
> regular py4science meeting series.  Guido van Rossum, the creator of
> the Python language, will visit for a session where we will first do a
> very rapid overview of a number of scientific projects that use Python
> (in a lightning talk format) and then we will have an open discussion
> with Guido with hopefully interesting questions going in both
> directions.  The meeting is open to all, bring your questions!
>
> More details on this seminar series (including location) can be found here:
>
> https://cirl.berkeley.edu/view/Py4Science

this sounds exciting. I am thinking of coming (from Reno).

Ondrej

P.S. Just registered my car at DMV today. :)

Fernando Perez | 3 Nov 20:50

Re: [Numpy-discussion] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar

On Tue, Nov 3, 2009 at 11:46 AM, Ondrej Certik <ondrej <at> certik.cz> wrote:
> this sounds exciting. I am thinking of coming (from Reno).
>
> Ondrej
>
> P.S. Just registered my car at DMV today. :)

By all means do!  Just don't crash the car :)  It's in the same room
as before, 2pm.

Can you email me *3* slides about sympy?  If you come you can present
them, if not I'll use them in my rapid project overview I'll open the
session with (I was going to include sympy regardless, but even better
if you make the 3 slides :)

Cheers,

f
Ondrej Certik | 3 Nov 22:08

Re: [Numpy-discussion] [ANN] For SF Bay Area residents: a discussion with Guido at the Berkeley Py4Science seminar

On Tue, Nov 3, 2009 at 11:50 AM, Fernando Perez <fperez.net <at> gmail.com> wrote:
> On Tue, Nov 3, 2009 at 11:46 AM, Ondrej Certik <ondrej <at> certik.cz> wrote:
>> this sounds exciting. I am thinking of coming (from Reno).
>>
>> Ondrej
>>
>> P.S. Just registered my car at DMV today. :)
>
> By all means do!  Just don't crash the car :)  It's in the same room
> as before, 2pm.

Cool. I'll come for sure, maybe Luke Peterson will come as well.

>
> Can you email me *3* slides about sympy?  If you come you can present
> them, if not I'll use them in my rapid project overview I'll open the
> session with (I was going to include sympy regardless, but even better
> if you make the 3 slides :)

Ok, I'll do it later today.

Ondrej

David Goldsmith | 4 Nov 19:29

Re: module docstrings

Thanks, Ralf, et al.

On Mon, Nov 2, 2009 at 9:07 AM, Ralf Gommers <ralf.gommers <at> googlemail.com> wrote:

> I created a ticket with a patch to the docstandard, http://projects.scipy.org/numpy/ticket/1280

So if anyone feels very strongly that this *should not* be optional, make your case now or "forever hold your peace." ;-)

DG

> (keeping routine listings as an optional section), and will start working on module docs soon.
>
> Cheers,
> Ralf


