josef.pktd | 3 Jan 06:29
Picon

rewriting stats.spearmanr

spearmanr in scipy.stats does not handle ties correctly.

I was looking at a way to fix it, and ended up instead with a complete
rewrite. The main difference to the current version is that it can
return a correlation matrix for several variables at the same time.
This came pretty cheap, because, instead of using the old shortcut
formula for Spearmans rho, I just use np.corrcoef. Calculating the
correlation matrix takes 3 lines, but as usual dimension handling and
tests scripts take several times more time and lines than the function
itself.

Results are verified with R (through rpy) and are the same to 15,16
digits for both integer variables with ties and continuous variables
without ties, although R has more options and has exact test
statistic.

I could keep the API completely consistent with the current version,
but I would like to return also the test-statistic, and not just the
p-value, this
would, however, require to return a 3-tuple instead of a 2-tuple.

new signature is: spearmanr(a, b=None, axis=0):

Notes are below and new function and test scripts are in attachment.

Comments?

Josef

    Notes
(Continue reading)

josef.pktd | 4 Jan 06:00
Picon

stats.gaussian_kde prevent oversmoothing

I was working on an example for stats.gaussian_kde.

In one example I have a 1 dimensional mixture of normal distribution,
and the estimated distribution stats.gaussian_kde is too smooth, the
peaks are to small compared to the original distribution.

What's the easiest way to reduce the bandwidth for stats.gaussian_kde?
I didn't find any direct option. Is subclassing the only way?

Josef
Robert Kern | 4 Jan 06:05
Picon
Gravatar

Re: stats.gaussian_kde prevent oversmoothing

On Sat, Jan 3, 2009 at 23:00,  <josef.pktd <at> gmail.com> wrote:
> I was working on an example for stats.gaussian_kde.
>
> In one example I have a 1 dimensional mixture of normal distribution,
> and the estimated distribution stats.gaussian_kde is too smooth, the
> peaks are to small compared to the original distribution.
>
> What's the easiest way to reduce the bandwidth for stats.gaussian_kde?
> I didn't find any direct option. Is subclassing the only way?

Currently, yes. Feel free to enhance the code to allow for more flexibility.

--

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
josef.pktd | 4 Jan 07:05
Picon

Re: stats.gaussian_kde prevent oversmoothing

On Sun, Jan 4, 2009 at 12:05 AM, Robert Kern <robert.kern <at> gmail.com> wrote:
> On Sat, Jan 3, 2009 at 23:00,  <josef.pktd <at> gmail.com> wrote:
>> I was working on an example for stats.gaussian_kde.
>>
>> In one example I have a 1 dimensional mixture of normal distribution,
>> and the estimated distribution stats.gaussian_kde is too smooth, the
>> peaks are to small compared to the original distribution.
>>
>> What's the easiest way to reduce the bandwidth for stats.gaussian_kde?
>> I didn't find any direct option. Is subclassing the only way?
>
> Currently, yes. Feel free to enhance the code to allow for more flexibility.
>

Thanks,

I tried some quick monkey patching and it works, e.g.

    def covariance_factor(self):
        return 0.1
    gkde=stats.gaussian_kde(xn) #get kde for original sample

    setattr(gkde, 'covariance_factor', covariance_factor.__get__(gkde,
type(gkde)))
    gkde._compute_covariance()

and then call gkde.evaluate

The automatic covariance_factor was at around 0.25 in the example.
After setting it to 0.1, the kde gets both peaks of the mixture
(Continue reading)

Scott MacDonald | 4 Jan 20:53
Picon
Gravatar

mapminmax function?

I was wondering if there is a function analogous to Matlab's mapminmax (in the neural network toolbox)?

Thanks,
Scott

_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user
Pierre Raybaut | 4 Jan 21:43

[ Python(x,y) ] New release : 2.1.8

Hi all,

Release 2.1.8 is now available on http://www.pythonxy.com:
   - All-in-One Installer ("Full Edition"),
   - Plugin Installer -- to be downloaded with xyweb,
   - Update

Changes history
Version 2.1.8 (01-04-2009)

     * Added:
           o SciTE 1.77.0 (replacement for Notepad++)
           o WinMerge 2.10.2 - Open Source differencing and merging tool 
for Windows
     * Updated:
           o Console 2.0.141.6
           o VPython 5.0.1.0
           o xy 1.0.16
           o xydoc 1.0.2
           o IPython 0.9.1.6
     * Corrected:
           o Issues 50, 51, 52

Regards,
Pierre Raybaut
Robert Kern | 4 Jan 22:12
Picon
Gravatar

Re: mapminmax function?

On Sun, Jan 4, 2009 at 13:53, Scott MacDonald
<scott.p.macdonald <at> gmail.com> wrote:
> I was wondering if there is a function analogous to Matlab's mapminmax (in
> the neural network toolbox)?

What does it do?

--

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Scott MacDonald | 4 Jan 23:54
Picon
Gravatar

Re: mapminmax function?

Oops, I guess that information would have been helpful.  The description:

%   MAPMINMAX processes matrices by normalizing the minimum and maximum values
%   of each row to [YMIN, YMAX].

%   MAPMINMAX(X,YMIN,YMAX) takes X and optional parameters,
%     X - NxQ matrix or a 1xTS row cell array of NxQ matrices.
%     YMIN - Minimum value for each row of Y. (Default is -1)
%     YMAX - Maximum value for each row of Y. (Default is +1)
%   and returns,
%     Y - Each MxQ matrix (where M == N) (optional).
%     PS - Process settings, to allow consistent processing of values.

%    Examples
%
%   Here is how to format a matrix so that the minimum and maximum
%   values of each row are mapped to default interval [-1,+1].
%   
%     x1 = [1 2 4; 1 1 1; 3 2 2; 0 0 0]
%     [y1,ps] = mapminmax(x1)
%
%   Next, we apply the same processing settings to new values.
%
%     x2 = [5 2 3; 1 1 1; 6 7 3; 0 0 0]
%     y2 = mapminmax('apply',x2,ps)
%
%   Here we reverse the processing of y1 to get x1 again.
%
%     x1_again = mapminmax('reverse',y1,ps)
%
%  Algorithm
%
%     It is assumed that X has only finite real values, and that
%     the elements of each row are not all equal.
%
%     y = (ymax-ymin)*(x-xmin)/(xmax-xmin) + ymin;

On Sun, Jan 4, 2009 at 2:12 PM, Robert Kern <robert.kern <at> gmail.com> wrote:
On Sun, Jan 4, 2009 at 13:53, Scott MacDonald
<scott.p.macdonald <at> gmail.com> wrote:
> I was wondering if there is a function analogous to Matlab's mapminmax (in
> the neural network toolbox)?

What does it do?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user

_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user
Robert Kern | 5 Jan 00:11
Picon
Gravatar

Re: mapminmax function?

On Sun, Jan 4, 2009 at 17:54, Scott MacDonald
<scott.p.macdonald <at> gmail.com> wrote:
> Oops, I guess that information would have been helpful.  The description:
>
> %   MAPMINMAX processes matrices by normalizing the minimum and maximum
> values
> %   of each row to [YMIN, YMAX].
>
> %   MAPMINMAX(X,YMIN,YMAX) takes X and optional parameters,
> %     X - NxQ matrix or a 1xTS row cell array of NxQ matrices.
> %     YMIN - Minimum value for each row of Y. (Default is -1)
> %     YMAX - Maximum value for each row of Y. (Default is +1)
> %   and returns,
> %     Y - Each MxQ matrix (where M == N) (optional).
> %     PS - Process settings, to allow consistent processing of values.
>
> %    Examples
> %
> %   Here is how to format a matrix so that the minimum and maximum
> %   values of each row are mapped to default interval [-1,+1].
> %
> %     x1 = [1 2 4; 1 1 1; 3 2 2; 0 0 0]
> %     [y1,ps] = mapminmax(x1)
> %
> %   Next, we apply the same processing settings to new values.
> %
> %     x2 = [5 2 3; 1 1 1; 6 7 3; 0 0 0]
> %     y2 = mapminmax('apply',x2,ps)

<sigh> Every time I manage to forget why I hate Matlab, they make a
function with a schizoid argument spec like this.

No, there's nothing floating around that does this. Here's a quick
implementation. It needs robustifying (it doesn't handle the
all-values-equal case), but it doesn't overload the function's
arguments. Sometimes objects really are the solution.

In [1]: import numpy as np

In [3]: class MapMinMaxApplier(object):
    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept
    def __call__(self, x):
        return x * self.slope + self.intercept
    def reverse(self, y):
        return (y-self.intercept) / self.slope
   ....:
   ....:

In [11]: def mapminmax(x, ymin=-1, ymax=+1):
   ....:     x = np.asanyarray(x)
   ....:     xmax = x.max(axis=-1)
   ....:     xmin = x.min(axis=-1)
   ....:     if (xmax==xmin).any():
   ....:         raise ValueError("some rows have no variation")
   ....:     slope = ((ymax-ymin) / (xmax - xmin))[:,np.newaxis]
   ....:     intercept = (-xmin*(ymax-ymin)/(xmax-xmin))[:,np.newaxis] + ymin
   ....:     ps = MapMinMaxApplier(slope, intercept)
   ....:     return ps(x), ps
   ....:

In [12]: x1 = np.array([[1.,2,4], [1,1,2], [3,2,2],[0,0,1]])

In [14]: y1, ps = mapminmax(x1)

In [15]: y1
Out[15]:
array([[-1.        , -0.33333333,  1.        ],
       [-1.        , -1.        ,  1.        ],
       [ 1.        , -1.        , -1.        ],
       [-1.        , -1.        ,  1.        ]])

In [16]: ps(x1)
Out[16]:
array([[-1.        , -0.33333333,  1.        ],
       [-1.        , -1.        ,  1.        ],
       [ 1.        , -1.        , -1.        ],
       [-1.        , -1.        ,  1.        ]])

In [17]: ps.reverse(y1)
Out[17]:
array([[ 1.,  2.,  4.],
       [ 1.,  1.,  2.],
       [ 3.,  2.,  2.],
       [ 0.,  0.,  1.]])

--

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Scott MacDonald | 5 Jan 00:29
Picon
Gravatar

Re: mapminmax function?

Thank you, much appreciated.

Scott

On Sun, Jan 4, 2009 at 4:11 PM, Robert Kern <robert.kern <at> gmail.com> wrote:
On Sun, Jan 4, 2009 at 17:54, Scott MacDonald
> Oops, I guess that information would have been helpful.  The description:
>
> %   MAPMINMAX processes matrices by normalizing the minimum and maximum
> values
> %   of each row to [YMIN, YMAX].
>
> %   MAPMINMAX(X,YMIN,YMAX) takes X and optional parameters,
> %     X - NxQ matrix or a 1xTS row cell array of NxQ matrices.
> %     YMIN - Minimum value for each row of Y. (Default is -1)
> %     YMAX - Maximum value for each row of Y. (Default is +1)
> %   and returns,
> %     Y - Each MxQ matrix (where M == N) (optional).
> %     PS - Process settings, to allow consistent processing of values.
>
> %    Examples
> %
> %   Here is how to format a matrix so that the minimum and maximum
> %   values of each row are mapped to default interval [-1,+1].
> %
> %     x1 = [1 2 4; 1 1 1; 3 2 2; 0 0 0]
> %     [y1,ps] = mapminmax(x1)
> %
> %   Next, we apply the same processing settings to new values.
> %
> %     x2 = [5 2 3; 1 1 1; 6 7 3; 0 0 0]
> %     y2 = mapminmax('apply',x2,ps)

<sigh> Every time I manage to forget why I hate Matlab, they make a
function with a schizoid argument spec like this.

No, there's nothing floating around that does this. Here's a quick
implementation. It needs robustifying (it doesn't handle the
all-values-equal case), but it doesn't overload the function's
arguments. Sometimes objects really are the solution.

In [1]: import numpy as np

In [3]: class MapMinMaxApplier(object):
   def __init__(self, slope, intercept):
       self.slope = slope
       self.intercept = intercept
   def __call__(self, x):
       return x * self.slope + self.intercept
   def reverse(self, y):
       return (y-self.intercept) / self.slope
  ....:
  ....:

In [11]: def mapminmax(x, ymin=-1, ymax=+1):
  ....:     x = np.asanyarray(x)
  ....:     xmax = x.max(axis=-1)
  ....:     xmin = x.min(axis=-1)
  ....:     if (xmax==xmin).any():
  ....:         raise ValueError("some rows have no variation")
  ....:     slope = ((ymax-ymin) / (xmax - xmin))[:,np.newaxis]
  ....:     intercept = (-xmin*(ymax-ymin)/(xmax-xmin))[:,np.newaxis] + ymin
  ....:     ps = MapMinMaxApplier(slope, intercept)
  ....:     return ps(x), ps
  ....:

In [12]: x1 = np.array([[1.,2,4], [1,1,2], [3,2,2],[0,0,1]])

In [14]: y1, ps = mapminmax(x1)

In [15]: y1
Out[15]:
array([[-1.        , -0.33333333,  1.        ],
      [-1.        , -1.        ,  1.        ],
      [ 1.        , -1.        , -1.        ],
      [-1.        , -1.        ,  1.        ]])

In [16]: ps(x1)
Out[16]:
array([[-1.        , -0.33333333,  1.        ],
      [-1.        , -1.        ,  1.        ],
      [ 1.        , -1.        , -1.        ],
      [-1.        , -1.        ,  1.        ]])

In [17]: ps.reverse(y1)
Out[17]:
array([[ 1.,  2.,  4.],
      [ 1.,  1.,  2.],
      [ 3.,  2.,  2.],
      [ 0.,  0.,  1.]])

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco
_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user

_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.org
http://projects.scipy.org/mailman/listinfo/scipy-user

Gmane