Nathaniel Smith | 9 May 22:26 2015

Proposed deprecations for 1.10: dot corner cases

Hi all,

I'd like to suggest that we go ahead and add deprecation warnings to
the following operations. This doesn't commit us to changing anything
on any particular time scale, but it gives us more options later.

1) dot(A, B) where A and B *both* have *3 or more dimensions*:
currently, this does a weird "outer product" thing, where it computes
all pairwise matrix products. We've had numerous discussions about why
this is suboptimal, and it contradicts the PEP 465 semantics for @,
which broadcast + vectorize over extra dimensions. (If you have a
vectorized version, then the outer product one is easy to derive; if
you have only the outer product version .) While dot() is widely used
in general, this particular variant is very, very rarely used. I
propose we issue a FutureWarning here, so as to lay the groundwork for
someday eventually making dot() and @ the same.
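
To make the difference concrete, here is a small sketch comparing the
two behaviours on 3-D inputs. It assumes np.matmul (the function form
of @, available from NumPy 1.10) and uses arbitrary example shapes; it
is only an illustration, not part of the proposal itself.

```python
import numpy as np

# Two stacks of matrices: 2 matrices of shape (3, 4) and 2 of shape (4, 5).
A = np.arange(2 * 3 * 4).reshape(2, 3, 4)
B = np.arange(2 * 4 * 5).reshape(2, 4, 5)

# dot() computes every pairwise product A[i] . B[k].
outer = np.dot(A, B)          # shape (2, 3, 2, 5)

# matmul() pairs A[i] with B[i], broadcasting over the leading axis.
stacked = np.matmul(A, B)     # shape (2, 3, 5)

# The vectorized result is the "diagonal" of the outer-product result.
for i in range(A.shape[0]):
    assert np.array_equal(stacked[i], outer[i, :, i, :])

print(outer.shape, stacked.shape)
```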

2) dot(A, B) where one of the arguments is a scalar: currently, this
does scalar multiplication. There is no logically consistent
motivation for this, it violates TOOWTDI, and again it is inconsistent
with the PEP semantics for @ (which are that this case should be an
error). (NB for those still using np.matrix: scalar * np.matrix will
still be supported regardless; this would only affect expressions
where you actually call the dot() function.) I propose to make this a
DeprecationWarning.
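
A short sketch of the scalar case, again assuming np.matmul as the
function form of @. The exact exception type raised by matmul is
treated as an assumption here, so both ValueError and TypeError are
caught.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# dot() with a scalar argument is just scalar multiplication today.
scaled = np.dot(3, M)

# matmul() follows PEP 465 and refuses scalar operands.
try:
    np.matmul(3, M)
    matmul_rejected = False
except (ValueError, TypeError):
    matmul_rejected = True

print(scaled)
print("matmul rejected the scalar:", matmul_rejected)
```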

-- 
Nathaniel J. Smith -- http://vorpus.org
Jaime Fernández del Río | 9 May 19:48 2015

Bug in np.nonzero / Should index returning functions return ndarray subclasses?

There is a reported bug (issue #5837) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays. A full summary of the differences can be seen from the following output:

>>> class C(np.ndarray): pass
... 
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()

>>> type(anz[0])
<type 'numpy.ndarray'>
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base

>>> type(bnz[0])
<class '__main__.C'>
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

The original bug report was only concerned with the non-writeability of higher dimensional array returns, but there are more differences: 1-D always returns an ndarray that owns its memory and is writeable, but higher dimensional arrays return views, of the type of the original array, that are non-writeable.

I have a branch that attempts to fix this by making both 1-D and n-D arrays:
  1. return a view, never the base array,
  2. return an ndarray, never a subclass, and
  3. return a writeable view.
I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions.
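
For anyone who wants to check what their installed NumPy does, the following sketch collects the relevant properties of each index array. The helper name describe is made up for this example, and the values it reports depend on the NumPy version, so none are asserted here.

```python
import numpy as np


class C(np.ndarray):
    """Trivial ndarray subclass, as in the session above."""
    pass


def describe(index_array):
    # Hypothetical helper: gather the properties discussed in this thread.
    return {
        "type": type(index_array).__name__,
        "writeable": bool(index_array.flags.writeable),
        "owndata": bool(index_array.flags.owndata),
        "has_base": index_array.base is not None,
    }


a = np.arange(6).view(C)                 # 1-D case
b = np.arange(6).reshape(2, 3).view(C)   # 2-D case

report_1d = [describe(ix) for ix in a.nonzero()]
report_2d = [describe(ix) for ix in b.nonzero()]

print("1-D:", report_1d)
print("2-D:", report_2d)
```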

Since we are changing the returns of a few other functions in 1.10 (diagonal, diag, ravel), it may be a good moment to revisit the behavior for these other functions. Any thoughts?

Jaime

--
(\__/)
( O.o)
( > <) This is Rabbit. Copy Rabbit into your signature and help him in his plans for world domination.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Dammy | 7 May 02:26 2015

Using genfromtxt to import a csv with a string class label and hundreds of integer features

Hi,
I am trying to use numpy.genfromtxt to import a csv for classification using
scikit-learn. The first column in the csv is a string type class label while
200+ extra columns are integer features.
I would like to find out how I can use the genfromtxt function to specify a
dtype of string for the first column while specifying int type for all other
columns.

I have tried using "dtype=None" as shown below, but when I print
dataset.shape, I get (number_of_rows,), i.e. no columns are read in:
 dataset = np.genfromtxt(file,delimiter=',', skip_header=True, dtype=None)

I also tried setting the dtypes as shown in the examples below, but I get
the same result as with dtype=None:
a: dataset = np.genfromtxt(file,delimiter=',', skip_header=True,
dtype=['S2'] + [ int for n in range(241)],)
b: dataset = np.genfromtxt(file,delimiter=',', skip_header=True,
dtype=['S2'] + [ int for n in range(241)],names=True )

Any thoughts? Thanks for your assistance.
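
One possible way to look at this, as a sketch only: with mixed column
types genfromtxt returns a 1-D structured array (one record per row),
which is why the shape is (number_of_rows,). The label and feature
columns can then be pulled out by field name. The tiny in-memory CSV
and the variable names below are made up for illustration, and the
encoding argument assumes a reasonably recent NumPy.

```python
import io

import numpy as np

# Stand-in for the real file: a string label followed by integer features.
csv_text = "label,f1,f2,f3\ncat,1,2,3\ndog,4,5,6\ncat,7,8,9\n"

structured = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skip_header=1,
    dtype=None,          # let genfromtxt infer one type per column
    encoding="utf-8",
)

# One record per row, so the array is 1-D; columns are fields, not an axis.
print(structured.shape, structured.dtype.names)

names = structured.dtype.names
labels = structured[names[0]]                                    # class labels
features = np.column_stack([structured[n] for n in names[1:]])   # 2-D int array

print(labels)
print(features.shape)
```

The labels and features arrays can then be handed to scikit-learn as y
and X respectively.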

Dammy

Francesc Alted | 6 May 12:11 2015

ANN: python-blosc 1.2.7 released

=============================
Announcing python-blosc 1.2.7
=============================

What is new?
============

Updated to use c-blosc v1.6.1.  Although this supports AVX2, it is
not enabled in python-blosc because we still need to devise a way to
detect AVX2 in the underlying platform.

At any rate, c-blosc 1.6.1 fixed a bug in the blosclz codec that was important enough to warrant a release.

For more info, you can have a look at the release notes in:


More docs and examples are available in the documentation site:



What is it?
===========

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound.  Benchmarks show how much speed it
can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==========

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources
================

The sources are managed through github services at:



Documentation
=============

There is a Sphinx-based documentation site at:



Mailing list
============

There is an official mailing list for Blosc at:



Licenses
========

Both Blosc and its Python wrapper are distributed using the MIT license.
See:


for more details.

----

  **Enjoy data!**

--
Francesc Alted
Nathaniel Smith | 6 May 11:59 2015

Dispatch rules for binary operations on ndarrays

I just wanted to draw the list's attention to a discussion happening on the tracker, about the details of how methods like ndarray.__add__ are implemented, and how this interacts with the new __numpy_ufunc__ method that will make it possible for third party libraries to override arbitrary ufuncs starting in (hopefully) 1.10:
  https://github.com/numpy/numpy/issues/5844

The details are somewhat arcane, but very important for anyone who implements ndarray-like objects or (to a lesser extent) anyone who subclasses ndarray. So feedback very welcome.
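
As a rough illustration of the kind of dispatch question involved: the hook did not ship under the name __numpy_ufunc__; it was eventually released as __array_ufunc__ in NumPy 1.13. The sketch below assumes that later protocol, and the class name Deferring is made up for the example. Setting __array_ufunc__ = None asks ndarray's binary operators to return NotImplemented so that Python falls back to the other operand's reflected method.

```python
import numpy as np


class Deferring:
    """Hypothetical array-like object that wants its own __radd__ to win."""

    # Opt out of ufunc handling: ndarray.__add__ then returns NotImplemented.
    __array_ufunc__ = None

    def __radd__(self, other):
        return "Deferring.__radd__ called"


result = np.arange(3) + Deferring()
print(result)
```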

-n

Alexander Brezinov | 5 May 21:52 2015

import scipy.linalg is hanging on Marvell armada 370

Hello 

The import of scipy.linalg hangs in the DOUBLE_multiply function (BINARY_LOOP) in umath.so. After attaching gdb and dumping the local variables, the args are empty strings. Could you please advise if this is a known issue? I searched the mailing list and could not find any solution for the problem.

I am running:

kernel 3.2.36 + Debian wheezy on ARMv7l armhf
CPU Armada 370 Marvell
python 2.7.3
scipy 0.15.1
numpy 1.9.2

The problem could be reproduced by launching python and importing scipy.linalg(import linalg) 

I also ran the same OS on qemu and was not able to reproduce the issue. A similar architecture such as the Raspberry Pi (ARMv7 armhf) is fine. Also, using software floating point instead of hardware floating point on the same Armada 370 (ARMv7) works just fine.


Thank you for any comments or suggestions in advance,
Alex  
Allan Haldane | 5 May 17:13 2015

Should ndarray subclasses support the keepdims arg?

Hello all,

A question:

Many ndarray methods (eg sum, mean, any, min) have a "keepdims" keyword
argument, but ndarray subclass methods sometimes don't. The 'matrix'
subclass doesn't, and numpy functions like 'np.sum' intentionally
drop/ignore the keepdims argument when called with an ndarray subclass
as first argument.

This means you can't always use ndarray subclasses as 'drop in'
replacement for ndarrays if the code uses keepdims (even indirectly),
and it means code that deals with keepdims (eg np.sum and more) has to
detect ndarray subclasses and drop keepdims even if the subclass
supports it (since there is no good way to detect support). It seems to
me that if we are going to use inheritance, subclass methods should keep
the signature of the parent class methods. What does the list think?

---- Details: ----

This problem comes up in a PR I'm working on (#5706) to add the keepdims
arg to masked array methods. In order to support masked matrices (which
a lot of unit tests check), I would have to detect and drop the keepdims
arg to avoid an exception. This would be solved if the matrix class
supported keepdims (plus an update to np.sum). Similarly,
`np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it
could work if all subclasses supported keepdims.

I do not foresee immediate problems with adding keepdims to the matrix
methods, except that it would be an unused argument. Modifying `np.sum`
to always pass on the keepdims arg is trickier, since it would break any
code that tried to np.sum a subclass that doesn't support keepdims, eg
pandas.DataFrame. **kwargs tricks might work. But if it's permissible I
think it would be better to require subclasses to support all the
keyword args ndarray supports.
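
To illustrate what "keeping the signature of the parent class" could
look like, here is a minimal sketch. The subclass name Tagged is made
up, and the np.sum call at the end relies on np.sum forwarding keepdims
to the subclass method, which is an assumption about the installed
NumPy version rather than something every release does.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

plain = a.sum(axis=1)                  # reduced axis is dropped
kept = a.sum(axis=1, keepdims=True)    # reduced axis is kept with length 1


class Tagged(np.ndarray):
    """Hypothetical subclass whose sum() mirrors the ndarray signature."""

    def sum(self, axis=None, dtype=None, out=None, keepdims=False, **kwargs):
        # Pass keepdims straight through instead of dropping it.
        return super().sum(axis=axis, dtype=dtype, out=out,
                           keepdims=keepdims, **kwargs)


t = a.view(Tagged)
t_kept = t.sum(axis=1, keepdims=True)          # method call
f_kept = np.sum(t, axis=1, keepdims=True)      # function call on the subclass

print(plain.shape, kept.shape, t_kept.shape, f_kept.shape)
```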

Allan
Gmail | 4 May 22:17 2015

read not byte aligned records

Hi,

I am developing code to read binary files (MDF, Measurement Data File).
In the previous version 3 of the format, data was always byte aligned. I
made wide use of the numpy.core.records module (fromstring, fromfile),
which showed good performance reading and unpacking data on the fly.
However, in the latest version 4, data that is not byte aligned is
possible. This reduces file size, especially when raw data is not
actually recorded on whole bytes, like 10 bits for an analog converter.
For instance, a record structure could be:
uint64, float32, uint8, uint10, padding 6 bits, uint9, padding 7 bits,
uint24, uint24, uint24, etc.

I found a way to read these unaligned records using the bitstring module
instead of numpy.core.records, but performance is much worse (about x10
slower in pure Python; I did not try a Cython implementation though).

Would there be a pure numpy way to do this?
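
One possible pure numpy direction, sketched under stated assumptions:
read the records as a 2-D uint8 array (one row per record), expand them
to bits with np.unpackbits, and recombine the bits of each field. The
function name extract_bitfield is made up, and the sketch assumes the
bits of a field are stored MSB-first within the record, which may not
match the actual MDF 4 layout.

```python
import numpy as np


def extract_bitfield(records, bit_offset, bit_count):
    """Extract an unsigned field from every record.

    records    : (n_records, record_bytes) uint8 array
    bit_offset : position of the field's first bit within the record
    bit_count  : width of the field in bits (at most 64)
    Assumes MSB-first bit order (an assumption, not taken from the MDF spec).
    """
    bits = np.unpackbits(records, axis=1)                 # (n, record_bytes * 8)
    field = bits[:, bit_offset:bit_offset + bit_count].astype(np.uint64)
    # Weight of each bit: 2**(bit_count-1) for the first bit down to 2**0.
    shifts = np.arange(bit_count, dtype=np.uint64)[::-1]
    weights = np.uint64(1) << shifts
    return (field * weights).sum(axis=1)


# Two 2-byte records, each holding a 10-bit value followed by 6 padding bits.
records = np.array([[0xAA, 0x80],    # 1010101010 000000
                    [0xFF, 0xC0]],   # 1111111111 000000
                   dtype=np.uint8)

values = extract_bitfield(records, bit_offset=0, bit_count=10)
print(values)
```

Unpacking to bits costs eight times the memory of the raw records, so
for large files it may be worth processing the data in chunks.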

Regards

Aymeric
Nicolas P. Rougier | 1 May 09:49 2015

EuroScipy 2015: Extended deadline (15/05/2015)

--------------------------------
Extended deadline: 15th May 2015
--------------------------------

EuroScipy 2015, the annual conference on Python in science, will take place in
Cambridge, UK on 26-30 August 2015. The conference features two days of
tutorials followed by two days of scientific talks & posters and an extra day
dedicated to developer sprints. It is the major event in Europe in the field of
technical/scientific computing within the Python ecosystem. Data scientists,
analysts, quants, PhD's, scientists and students from more than 20 countries
attended the conference last year.

The topics presented at EuroSciPy are very diverse, with a focus on advanced
software engineering and original uses of Python and its scientific libraries,
either in theoretical or experimental research, from both academia and the
industry.

Submissions for posters, talks & tutorials (beginner and advanced) are welcome
on our website at http://www.euroscipy.org/2015/ Sprint proposals should be
addressed directly to the organisation at euroscipy-org@python.org

Important dates
===============

Mar 24, 2015 Call for talks, posters & tutorials
Apr 30, 2015 Talk and tutorials submission deadline
May 15, 2015 EXTENDED DEADLINE
May 1,  2015 Registration opens
May 30, 2015 Final program announced
Jun 15, 2015 Early-bird registration ends

Aug 26-27, 2015 Tutorials
Aug 28-29, 2015 Main conference
Aug 30, 2015 Sprints

We look forward to an exciting conference and hope to see you in Cambridge.

The EuroSciPy 2015 Team - http://www.euroscipy.org/2015/

ANN: SciPy 2015 Tutorial Schedule Posted - Register Today - Already 30% Sold Out

**The #SciPy2015 Conference (Scientific Computing with #Python) Tutorial Schedule is up! It is first come, first served and already 30% sold out. Register today!** http://www.scipy2015.scipy.org/ehome/115969/289057/?&. This year you can choose from 16 different SciPy tutorials OR select the 2-day Software Carpentry course on scientific Python that assumes some programming experience but no Python knowledge. Please share! Tutorials include:

* Introduction to NumPy (Beginner)
* Machine Learning with Scikit-Learn (Intermediate)
* Cython: Blend of the Best of Python and C/++ (Intermediate)
* Image Analysis in Python with SciPy and Scikit-Image (Intermediate)
* Analyzing and Manipulating Data with Pandas (Beginner)
* Machine Learning with Scikit-Learn (Advanced)
* Building Python Data Applications with Blaze and Bokeh (Intermediate)
* Multibody Dynamics and Control with Python (Intermediate)
* Anatomy of Matplotlib (Beginner)
* Computational Statistics I (Intermediate)
* Efficient Python for High-Performance Parallel Computing (Intermediate)
* Geospatial Data with Open Source Tools in Python (Intermediate)
* Decorating Drones: Using Drones to Delve Deeper into Intermediate Python (Intermediate)
* Computational Statistics II (Intermediate)
* Modern Optimization Methods in Python (Advanced)
* Jupyter Advanced Topics Tutorial (Advanced)

josef.pktd | 30 Apr 20:24 2015

code snippet: assert all close or large


