Johann Goetz | 15 Aug 16:59 2014

Histogram as its own class

Hello,
I'm a long-time user of scipy doing mostly multivariate big-data (several terabytes) analysis in the high-energy physics realm. One thing I've found useful was to promote the histogram to it's own class. Instead of creating yet another package, I have a mind to include it into the scipy.stats module and I would like some feed-back. I.e. is this the right place for such an object?

I have some documentation, though not as much as I would like, and the classes are currently buried in my "pyhep" project, but they are easily extracted.

https://bitbucket.org/theodoregoetz/pyhep/wiki/Home

Here are some details:

The histograms I am addressing are N-dimensional over a continuous domain (floating-point data, no gaps, though bins can hold the value inf or nan if need be) along each axis. The axes need not be uniform.

There are two classes: HistogramAxis and Histogram. The axes are always floating point, but the histogram's data can be any dtype (default: np.int; a cast to float is done when dividing two histograms). I make use of np.histogramdd() and store the data along with the uncertainty. Many operations are supported, including adding, subtracting, multiplying, dividing, bin-merging, cutting/clipping along one or more axes, projecting along an axis, iterating over an axis, and filling from a sample with or without weights.
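To make the idea concrete, here is a minimal sketch of such a wrapper over np.histogramdd. The class and method names are illustrative only, not the actual pyhep API:

```python
import numpy as np

class Histogram:
    """Toy N-dimensional histogram over continuous axes (illustrative only)."""

    def __init__(self, *edges):
        # one array of bin edges per axis; axes need not be uniform
        self.edges = [np.asarray(e, dtype=float) for e in edges]
        shape = tuple(len(e) - 1 for e in self.edges)
        self.data = np.zeros(shape, dtype=np.int64)

    def fill(self, sample, weights=None):
        # delegate the binning to np.histogramdd, accumulate into self.data
        counts, _ = np.histogramdd(np.atleast_2d(sample), bins=self.edges,
                                   weights=weights)
        self.data = self.data + counts.astype(self.data.dtype)

    def project(self, axis):
        """Sum out every axis except `axis` (projection onto that axis)."""
        other = tuple(i for i in range(self.data.ndim) if i != axis)
        h = Histogram(self.edges[axis])
        h.data = self.data.sum(axis=other)
        return h
```

This only shows the fill/project corner of the proposed interface; the real class carries uncertainties alongside the counts.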

Most of the power in this package is in the histogram's fitting method, which makes use of scipy.optimize.curve_fit(). It handles missing data (bins that are inf or nan), can include the uncertainties in the fit, and calculates a goodness of fit.
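The general recipe is straightforward even outside the class: mask out the invalid bins and feed the uncertainties to curve_fit via `sigma`. A sketch (the helper name and signature are made up for illustration, not the pyhep method):

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_hist(bin_centers, counts, errors, model, p0):
    """Fit `model` to histogram data, skipping bins that are nan/inf and
    weighting by the per-bin uncertainty (illustrative helper)."""
    good = np.isfinite(counts) & np.isfinite(errors) & (errors > 0)
    popt, pcov = curve_fit(model, bin_centers[good], counts[good],
                           p0=p0, sigma=errors[good], absolute_sigma=True)
    # chi-square per degree of freedom as a goodness-of-fit measure
    resid = (counts[good] - model(bin_centers[good], *popt)) / errors[good]
    chi2_ndf = np.sum(resid ** 2) / (good.sum() - len(popt))
    return popt, pcov, chi2_ndf
```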

On top of this, I have free functions to plot 1D and 2D histograms using matplotlib, as well as functions to handle reading in large HDF5 files. These are auxiliary and may not fit into scipy directly.

Thank you all,
Johann. 
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev
Moritz Beber | 13 Aug 17:08 2014

Proposal for a new function nanpdist that treats NaNs as missing values

Dear all,

As suggested in this github issue (https://github.com/scipy/scipy/issues/3870), I would like to discuss the merit of introducing a new function nanpdist into scipy.spatial. I have also brought up the problem in the following previous e-mail (http://comments.gmane.org/gmane.comp.python.scientific.devel/18956) and on SO (http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values).

Warren suggested three ways to tackle this problem:
  1. Don't change anything--the users should clean up their data!
  2. nanpdist
  3. Add a keyword argument to pdist that determines how nan should be treated.

Clearly, I don't favor the first option, since I believe missing values can be important pieces of information too. I lean slightly towards option two, because adding a keyword would further complicate an already very long pdist function.
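For concreteness, a naive pure-NumPy sketch of what option two could look like for the euclidean case. The name and signature are only the proposal, not an existing scipy function:

```python
import numpy as np

def nanpdist(X, metric='euclidean'):
    """Condensed pairwise distances, ignoring coordinates where either
    vector has a NaN (proposed behavior; only 'euclidean' sketched here)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    out = np.empty(m * (m - 1) // 2)
    finite = np.isfinite(X)
    k = 0
    for i in range(m - 1):
        for j in range(i + 1, m):
            ok = finite[i] & finite[j]      # coordinates present in both
            d = X[i, ok] - X[j, ok]
            out[k] = np.sqrt(np.dot(d, d))
            k += 1
    return out
```

The condensed (flat upper-triangle) output mirrors what pdist returns, so downstream code like squareform or linkage would not need to change.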

I'm happy to submit a pull request if there is a consensus that something should be done.

Best,

Moritz

Manoj Kumar | 11 Aug 17:04 2014

Fastest way to multiply a sparse matrix with another numpy array

Hello,

I was wondering what is the fastest way (format) to multiply a sparse matrix with a numpy array. Intuitively, a csr matrix multiplied with a Fortran-contiguous numpy array seems like it should be fastest, but I have run a few benchmarks and it seems otherwise. It is also mentioned in
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.html that using csr matrices "may" be faster.


In [5]: X
Out[5]:
<11314x130107 sparse matrix of type '<type 'numpy.float64'>'
        with 1787565 stored elements in Compressed Sparse Row format>

In [6]: _, n_features = X.shape

In [9]: w_c = np.random.rand(n_features, 10)

In [10]: w_f = np.asarray(w_c, order='f')

In [13]: csc = sparse.csc_matrix(X)

In [30]: %timeit X * w_f
10 loops, best of 3: 40.5 ms per loop

In [31]: %timeit X * w_c
10 loops, best of 3: 37.3 ms per loop

In [32]: %timeit csc * w_c
10 loops, best of 3: 24.3 ms per loop

In [33]: %timeit csc * w_f
10 loops, best of 3: 27.3 ms per loop
It seems here that using a csc matrix with a C-contiguous numpy array is fastest, which is completely non-intuitive to me. Are there any hard rules for this, or is it data-dependent?
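For anyone wanting to reproduce this, here is a self-contained version of the benchmark using random data in place of the original X (timings will of course depend on the sparsity pattern and machine):

```python
import timeit
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
X_csr = sparse.random(2000, 5000, density=0.01, format='csr',
                      random_state=rng)
X_csc = X_csr.tocsc()
w_c = rng.rand(5000, 10)            # C-contiguous dense operand
w_f = np.asarray(w_c, order='F')    # Fortran-contiguous copy

for name, A, w in [('csr * C-array', X_csr, w_c),
                   ('csr * F-array', X_csr, w_f),
                   ('csc * C-array', X_csc, w_c),
                   ('csc * F-array', X_csc, w_f)]:
    t = min(timeit.repeat(lambda: A * w, number=20, repeat=3))
    print('%s: %.2f ms per multiply' % (name, 1e3 * t / 20))
```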

Sorry for my noobish questions!
--
Regards,
Manoj Kumar,
GSoC 2014, Scikit-learn
Mech Undergrad
http://manojbits.wordpress.com
Sai Rajeshwar | 27 Jul 10:28 2014

convolution using numpy/scipy using MKL libraries

hi all,

   I'm trying to implement 3D convolutional networks, for which I wanted to use the convolve function from scipy.signal (convolve or fftconvolve), but it looks like neither of them uses the MKL libraries. Is there an implementation of convolution that uses MKL, or is MKL-threaded, so that the code runs faster?
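While I can't speak to an MKL-backed version, for large kernels fftconvolve is usually the first thing to try for 3-D data, since it works on N-dimensional arrays as-is:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.RandomState(42)
volume = rng.rand(32, 32, 32)   # e.g. one 3-D feature map
kernel = rng.rand(5, 5, 5)      # 3-D convolution kernel

# mode='same' keeps the output the same shape as the input volume
out = fftconvolve(volume, kernel, mode='same')
print(out.shape)
```

The FFT route trades the MKL question for an algorithmic speedup: its cost grows with the volume size rather than with volume times kernel size.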

thanks a lot in advance

with regards..

M. Sai Rajeswar
M-tech  Computer Technology
IIT Delhi
----------------------------------Cogito Ergo Sum---------
nicky van foreest | 23 Jul 16:40 2014

scipy.sparse versus pysparse

Hi, 

I am doing some testing of scipy.sparse versus pysparse on my Ubuntu machine. The tests reveal that pysparse is about 9 times faster at matrix-vector multiplication than scipy.sparse. Might there be anything specific I forgot to do during scipy's installation (I just ran apt-get install python-scipy)? Is there another simple explanation for this difference? I prefer scipy.sparse for its cleaner API, but a factor of 9 in speed is considerable.

thanks 

Nicky
Johannes Kulick | 21 Jul 11:06 2014

Pull Request: Dirichlet Distribution

Hi,

I sent a pull request that implements a Dirichlet distribution. Code review
would be appreciated!

https://github.com/scipy/scipy/pull/3815
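For reviewers, usage would presumably follow the convention of the other multivariate distributions in scipy.stats, something like the sketch below (exact names and behavior per the PR itself):

```python
from scipy.stats import dirichlet

alpha = [2.0, 2.0]          # concentration parameters
d = dirichlet(alpha)        # "frozen" distribution, as with multivariate_normal

print(d.mean())             # expected value on the simplex
print(d.pdf([0.5, 0.5]))    # density at the simplex midpoint
```

For alpha = [2, 2] the distribution reduces to Beta(2, 2) on the first component, so the density at the midpoint is 1.5, which makes a handy sanity check.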

Best,
Johannes Kulick

-- 
Question: What is the weird attachment to all my emails?
Answer:   http://en.wikipedia.org/wiki/Digital_signature
Moritz Emanuel Beber | 21 Jul 10:09 2014

computing pairwise distance of vectors with missing (nan) values

Dear all,

My basic problem is that I would like to compute distances between vectors with missing values. You can find more detail in my question on SO (http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values). Since it seems this is not directly possible with scipy at the moment, I started Cythonizing my function. Currently, the function below is not much faster than my pure Python implementation, so I thought I'd ask the experts here. Note that even though I'm computing the euclidean distance, I'd like to be able to use different distance metrics.

So my current attempt at Cythonizing is:

import numpy
cimport numpy
cimport cython
from numpy.linalg import norm

numpy.import_array()

@cython.boundscheck(False)
@cython.wraparound(False)
def masked_euclidean(numpy.ndarray[numpy.double_t, ndim=2] data):
    cdef Py_ssize_t m = data.shape[0]
    cdef Py_ssize_t i = 0
    cdef Py_ssize_t j = 0
    cdef Py_ssize_t k = 0
    cdef numpy.ndarray[numpy.double_t] dm = numpy.zeros(m * (m - 1) // 2, dtype=numpy.double)
    cdef numpy.ndarray[numpy.uint8_t, ndim=2, cast=True] mask = numpy.isfinite(data) # boolean
    for i in range(m - 1):
        for j in range(i + 1, m):
            curr = numpy.logical_and(mask[i], mask[j])
            u = data[i][curr]
            v = data[j][curr]
            dm[k] = norm(u - v)
            k += 1
    return dm

Maybe the lack of speed-up is due to the Python-level function 'norm'? So my question is: how can I improve the Cython implementation? Or is there a completely different way of approaching this problem?
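One loop-free alternative worth benchmarking (a sketch, not necessarily the fastest): zero out the NaNs, expand the squared distance as matrix products, and let the mask matrices restrict each pair to its shared coordinates:

```python
import numpy as np

def masked_euclidean_vec(data):
    """Pairwise euclidean distance over coordinates finite in both rows,
    computed without an explicit Python loop (sketch)."""
    mask = np.isfinite(data)
    Z = np.where(mask, data, 0.0)       # NaNs replaced by 0
    M = mask.astype(float)
    sq = Z * Z
    # D2[i, j] = sum over shared coords k of (Z[i,k] - Z[j,k])**2
    #          = sum sq[i,k]*M[j,k] + sum M[i,k]*sq[j,k] - 2 * Z[i]·Z[j]
    D2 = sq @ M.T + M @ sq.T - 2.0 * (Z @ Z.T)
    np.maximum(D2, 0.0, out=D2)         # clip tiny negative round-off
    iu = np.triu_indices(data.shape[0], k=1)
    return np.sqrt(D2[iu])              # condensed form, like pdist
```

This replaces the O(m^2) Python-level iterations with three dense matrix products, though it is specific to the euclidean metric.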

Thanks in advance,
Moritz
Alexander Behringer | 18 Jul 11:53 2014

Is Brent's method for minimizing the value of a function implemented twice in SciPy?

Hello,

while studying the SciPy documentation, I noticed that the 'brent' and
'fminbound' functions in the 'scipy.optimize' package both seem to
implement Brent's method for function minimization.

Both functions have been implemented by Travis Oliphant (see commit
infos below).

One minor difference is that the 'brent' function _optionally_ allows
for auto-bracketing, with the help of the 'bracket' function, when
supplied with only two bounds via the 'brack' parameter instead of the
triplet required by Brent's algorithm.

So is it possible that Brent's method has been implemented twice?

'fminbound' was added in 2001:

https://github.com/scipy/scipy/commit/3f44f63b481abf676a0b344fc836acf76bc86b35

'brent' was added approximately three-quarters of a year later in 2002:

https://github.com/scipy/scipy/commit/b94c30dcb1ba9ad0b4c3e2090f5e99a8a21275ab

The 'brent' code has later been moved into a separate internal class:

https://github.com/scipy/scipy/commit/675ad592465be178cde88a89e9e362fdd5237004
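The two entry points can indeed be pointed at the same problem; the user-visible difference is hard bounds versus a (possibly auto-expanded) bracketing interval:

```python
from scipy.optimize import brent, fminbound

f = lambda x: (x - 2.0) ** 2 + 1.0   # minimum at x = 2

x_brent = brent(f, brack=(0.0, 1.0))  # two points; bracket is found automatically
x_fmin = fminbound(f, 0.0, 4.0)       # hard bounds on the search interval

print(x_brent, x_fmin)
```

Whether the underlying iteration is literally duplicated in the two code paths is exactly the question raised above; this snippet only shows that their interfaces overlap.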

Sincerely,
Alexander Behringer
Yoshiki Vazquez Baeza | 16 Jul 20:06 2014

Adding Procrustes to SciPy

Hello,

There seems to be some interest in adding Procrustes analysis to SciPy.
There is an existing implementation in scikit-bio (
https://github.com/biocore/scikit-bio/blob/master/skbio/math/stats/spatial.py
a package of which I am a developer) which could probably be ported
over.

The thing that's not particularly clear is where this code should live:
the suggestion by Ralf Gommers is "linalg", whereas skbio puts the code
inside the "spatial" submodule.

This is the GitHub issue where this was initially discussed:
https://github.com/scipy/scipy/issues/3786

Thanks!

Yoshiki.
Julian Taylor | 15 Jul 20:06 2014

__numpy_ufunc__ and 1.9 release

hi,
as you may know, we want to release numpy 1.9 soon. We should have solved
most of the indexing regressions the first beta showed.

The remaining blockers involve finishing the new __numpy_ufunc__ feature.
This feature should provide an alternative method for overriding the
behavior of ufuncs from subclasses.
It is described here:
https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst

The current blocker issues are:
https://github.com/numpy/numpy/issues/4753
https://github.com/numpy/numpy/pull/4815

I'm not too familiar with all the complications of subclassing, so I can't
really say how hard this is to solve.
My concern is that there still seems to be debate on how to handle
operator overriding correctly, and I am opposed to releasing a numpy with
yet another experimental feature that may or may not be finished
sometime later. Having datetime in an infinite experimental state is bad
enough.
I think nobody is served well if we release 1.9 with the feature
prematurely, based on an unrepresentative set of users, and then later,
after more users have shown up, find that we have to change its behavior.

So I'm wondering: should we delay the introduction of this feature to
1.10, or is it important enough to wait until there is a consensus on the
remaining issues?
Sai Rajeshwar | 16 Jul 11:55 2014

building scipy with umfpack and amd

hi,
   
   I'm running code which uses scipy.signal.convolve and numpy.sum extensively. I ran the code on two machines with the same configuration, and one took much less time than the other. I checked the scipy configuration on the faster machine and found that its scipy is built with UMFPACK and AMD.

Is this the reason behind the difference? In what way do UMFPACK and AMD aid scipy operations?
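For what it's worth, UMFPACK is a sparse direct solver (and AMD its fill-reducing ordering), so in scipy they come into play through scipy.sparse.linalg rather than through scipy.signal.convolve. The kind of operation they accelerate looks like this:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# A small sparse tridiagonal system: the sparse direct solve below is
# where an UMFPACK-enabled scipy build would be used (falling back to
# SuperLU otherwise).
n = 100
A = sparse.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

x = spsolve(A, b)
print(np.allclose(A.dot(x), b))
```

So a speed difference in a convolve/sum-heavy workload would need another explanation (e.g. the BLAS in use, as shown in the config dump below).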

--------------------------------

>>> scipy.__config__.show()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib64']
    language = f77

amd_info:
    libraries = ['amd']
    library_dirs = ['/usr/lib64']
    define_macros = [('SCIPY_AMD_H', None)]
    swig_opts = ['-I/usr/include/suitesparse']
    include_dirs = ['/usr/include/suitesparse']

lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib64']
    language = f77

atlas_threads_info:
  NOT AVAILABLE

blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib64']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]

atlas_blas_threads_info:
  NOT AVAILABLE

umfpack_info:
    libraries = ['umfpack', 'amd']
    library_dirs = ['/usr/lib64']
    define_macros = [('SCIPY_UMFPACK_H', None), ('SCIPY_AMD_H', None)]
    swig_opts = ['-I/usr/include/suitesparse', '-I/usr/include/suitesparse']
    include_dirs = ['/usr/include/suitesparse']

thanks a lot for your replies in advance

 
with regards..

M. Sai Rajeswar
M-tech  Computer Technology
IIT Delhi
----------------------------------Cogito Ergo Sum---------