Francesc Alted | 5 Oct 11:00 2015

ANN: bcolz 0.11.3 released!

=======================
Announcing bcolz 0.11.3
=======================

What's new
==========

Implemented new feature (#255): bcolz.zeros() can create new ctables
too, either empty or filled with zeros. (#256 @FrancescElies
@FrancescAlted).
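
A minimal sketch of the new feature (hedged: this assumes a compound
dtype is what selects the ctable container, as #255 suggests)::

    import bcolz

    # With a compound dtype, zeros() now builds a ctable instead of
    # a carray (assumption based on #255; plain dtypes still give
    # carrays).
    ct_empty = bcolz.zeros(0, dtype="i4,f8")   # empty ctable
    ct_filled = bcolz.zeros(5, dtype="i4,f8")  # 5 rows of zeros
    print(repr(ct_filled))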

The previous, unannounced releases (0.11.1 and 0.11.2) also added new
dependencies and shipped other fixes.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==========

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default to reduce memory/disk I/O needs.  The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data.
Lastly, high-performance iterators (like ``iter()`` and ``where()``)
are provided for querying the objects.

bcolz can use numexpr internally to accelerate many vector and query
operations (although it can use pure NumPy for doing so too).  numexpr
optimizes memory usage and uses several cores for the computations, so
it is blazing fast.  Moreover, since the carray/ctable containers can
be disk-based, it is possible to use them for seamlessly performing
out-of-memory computations.
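
For instance, a short sketch of the kind of query these iterators are
meant for (column names and sizes below are made up for illustration)::

    import numpy as np
    import bcolz

    N = 1000 * 1000
    ct = bcolz.ctable((np.arange(N), np.linspace(0, 1, N)),
                      names=['i', 'x'])

    # where() evaluates the boolean expression (via numexpr when it
    # is installed) and yields only the matching rows.
    hits = [row.i for row in ct.where('(x < 0.0001) & (i > 10)')]
    print(len(hits))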

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz include Visualfabriq (http://www.visualfabriq.com/),
the Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and scikit-allel
(https://github.com/cggh/scikit-allel), which you can read more about
by pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* scikit-allel:

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

    $ pip install -U bcolz


Resources
=========

Visit the main bcolz repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bcolz@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

----

  **Enjoy data!**

--
Francesc Alted

Jeff Reback | 3 Oct 23:33 2015

ANN: pandas v0.17.0rc2 - RELEASE CANDIDATE 2

Hi,

I'm pleased to announce the availability of the second release candidate of Pandas 0.17.0.
Please try this RC and report any issues here: Pandas Issues
We will be releasing officially on October 9.

**RELEASE CANDIDATE 2**

From RC 1 we have:

  • compat for Python 3.5
  • compat for matplotlib 1.5.0
  • .convert_objects is now restored to the original, and is deprecated

This is a major release from 0.16.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

  • Release the Global Interpreter Lock (GIL) on some cython operations, see here
  • Plotting methods are now available as attributes of the .plot accessor, see here
  • The sorting API has been revamped to remove some long-time inconsistencies, see here
  • Support for a datetime64[ns] with timezones as a first-class dtype, see here
  • The default for to_datetime will now be to raise when presented with unparseable formats; previously this would return the original input, see here (and the short sketch after this list)
  • The default for dropna in HDFStore has changed to False, to store by default all rows even if they are all NaN, see here
  • Support for Series.dt.strftime to generate formatted strings for datetime-likes, see here
  • Development installed versions of pandas will now have PEP440 compliant version strings GH9518
  • Development support for benchmarking with the Air Speed Velocity library GH8316
  • Support for reading SAS xport files, see here
  • Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, see here
  • Display format with plain text can optionally align with Unicode East Asian Width, see here
  • Compatibility with Python 3.5 GH11097
  • Compatibility with matplotlib 1.5.0 GH11111
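
For instance, the new to_datetime default looks like this (a small sketch; in 0.17.0 the errors keyword controls the fallback behavior):

    import pandas as pd

    # 0.17.0: unparseable strings now raise by default...
    try:
        pd.to_datetime(['2015-10-09', 'not-a-date'])
    except ValueError as err:
        print('raised:', err)

    # ...pass errors='coerce' to get NaT for bad values instead, or
    # errors='ignore' to return the input unchanged (the old default).
    print(pd.to_datetime(['2015-10-09', 'not-a-date'], errors='coerce'))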

See the Whatsnew for much more information. 

The best way to get this is to install via conda from our development channel. Builds for osx-64, linux-64, and win-64, for Python 2.7, Python 3.4, and Python 3.5 (osx/linux only), are all available.

    conda install pandas -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff

J.D. Corbin | 2 Oct 18:42 2015

Write DataFrame to CSV Compressed (Python 3)

I am trying to write a dataframe to csv in compressed gzip format using Python 3.

I can get my code to work in Python 2.7, but because of the differences between bytes and strings in Python 3 it's not working, and I am not sure how to fix it.

The code I have creates a temporary file, then uses that as the stream object for gzip.GzipFile. I then use the gzip file as the argument to pandas' to_csv.

I get the error "'str' does not support the buffer interface", which I understand, but I'm not sure how to fix my code in Python 3 to get around it.

I've tried different 'mode' arguments but haven't found a way to get around this error.

Here is my sample code that I've been using:

from tempfile import NamedTemporaryFile
import gzip
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'two',
                         'two', 'two', 'one', 'two'],
                   'C': [56, 2, 3, 4, 5, 6, 0, 2],
                   'D': [51, 2, 3, 4, 5, 6, 0, 2]})


with NamedTemporaryFile(mode='wb') as tmp:
    with gzip.GzipFile(fileobj=tmp) as archive:
        df.to_csv(archive, header=False)
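
One approach I'm considering (an untested sketch) is to render the CSV to a string and encode it to bytes myself before handing it to the gzip stream:

    from tempfile import NamedTemporaryFile
    import gzip
    import pandas as pd

    df = pd.DataFrame({'A': ['foo', 'bar'], 'C': [56, 2]})

    # to_csv() with no path argument returns the CSV as a str;
    # encoding it gives GzipFile the bytes it expects in Python 3.
    with NamedTemporaryFile(mode='wb') as tmp:
        with gzip.GzipFile(fileobj=tmp, mode='wb') as archive:
            archive.write(df.to_csv(header=False).encode('utf-8'))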

kyoto89 | 1 Oct 17:05 2015

Pandas Left Merge of xlsx file with CSV file producing null value columns in output

I thought I would bring up the following question that I posted on SO:

http://stackoverflow.com/questions/32889129/pandas-left-merge-with-xlsx-with-csv-producing-null-value-columns-in-output

 I wanted to avoid redundancy, hence the bare link. Thank you again for your support.

Vincent Davis | 1 Oct 06:23 2015

read_csv(skiprows) not working for bad rows.

I was trying to use skiprows to skip rows that are bad, but it does not work. Am I doing something wrong, or is this a bug?
First, this works. You can get the data here if you would like to try it yourself.
denverChar = pd.read_csv('real_property_residential_characteristics.csv', quotechar='"', warn_bad_lines=True, error_bad_lines=False, na_values="", low_memory=False,)

I get this
b'Skipping line 62070: expected 46 fields, saw 47\nSkipping line 62073: expected 46 fields, saw 47\nSkipping line 62076: expected 46 fields, saw 47\nSkipping line 66662: expected 46 fields, saw 48\n'

But if I try to use skiprows, skipping the rows listed above.
denverChar = pd.read_csv('real_property_residential_characteristics.csv', quotechar='"', skiprows = [62070, 62073, 62076, 66662], na_values="", low_memory=False,)

CParserError                              Traceback (most recent call last)
<ipython-input-26-ca2e2986e191> in <module>()
      1 import pandas as pd
----> 2 denverChar = pd.read_csv('real_property_residential_characteristics.csv', quotechar='"', skiprows = [62070, 62073, 62076, 66662], na_values="", low_memory=False,)

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, ...)
    472                     skip_blank_lines=skip_blank_lines)
    473
--> 474     return _read(filepath_or_buffer, kwds)

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    258         return parser
    259
--> 260     return parser.read()

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
    719             raise ValueError('skip_footer not supported for iteration')
    720
--> 721         ret = self._engine.read(nrows)

/Users/vincentdavis/anaconda/envs/py35/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
   1168
   1169         try:
-> 1170             data = self._reader.read(nrows)

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8094)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9157)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:22231)()

CParserError: Error tokenizing data. C error: Expected 46 fields in line 62070, saw 47
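
One thing I still want to test (a sketch based on my reading of the docs: skiprows takes 0-indexed row numbers, while the "Skipping line N" warnings look 1-based, so the indices may need shifting down by one):

    import pandas as pd

    bad_lines = [62070, 62073, 62076, 66662]   # 1-based, from the warnings
    denverChar = pd.read_csv('real_property_residential_characteristics.csv',
                             quotechar='"',
                             skiprows=[n - 1 for n in bad_lines],
                             na_values="", low_memory=False)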


Soren | 27 Sep 22:05 2015

Joined tables contain NaN

Hi!

Is this behavior intended? 

import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'key': ['K0', 'K1', 'K0', 'K1']})
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3'],
                   'key': ['K0', 'K1', 'K0', 'K1']})
print left.join(right,on='key',rsuffix='_r')

    A   B key    C    D key_r
0  A0  B0  K0  NaN  NaN   NaN
1  A1  B1  K1  NaN  NaN   NaN
2  A2  B2  K0  NaN  NaN   NaN
3  A3  B3  K1  NaN  NaN   NaN

pd.__version__
'0.16.2'

My expectation would have been a table that contains the values from both tables. Why does pandas insert NaNs?

regards
Sören
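
A likely explanation (a sketch of how join is documented to behave): DataFrame.join matches on='key' against the *index* of the right frame, not its 'key' column, so nothing lines up and you get NaNs. Aligning right on its key first, or using merge, gives the expected result:

    import pandas as pd

    left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                         'key': ['K0', 'K1', 'K0', 'K1']})
    right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                          'key': ['K0', 'K1', 'K0', 'K1']})

    # join() looks up left['key'] in right's *index*, so move the key
    # into the index first (or simply use merge, which matches columns).
    print(left.join(right.set_index('key'), on='key'))
    print(pd.merge(left, right, on='key'))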

Soren | 27 Sep 22:13 2015

Performing an inner/left/right join on key column requires suffix.

Hi! 

I wonder why a suffix is mandatory in the following example. There is just one column that is identical in both tables, and it is the key column for the join.
For an outer join, of course, you can have different values in the keys, so you need two columns. However, in all other cases the columns are redundant.
It generates so much overhead to produce the duplicate column and then remove it again.
Is there a smarter way of doing this in pandas?

import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                   'B': ['B0', 'B1', 'B2', 'B3'],
                   'key': ['K0', 'K1', 'K0', 'K1']})
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                   'D': ['D0', 'D1', 'D2', 'D3'],
                   'key': ['K0', 'K1', 'K0', 'K1']})
left.join(right,on='key')

ValueError: columns overlap but no suffix specified: Index([u'key'], dtype='object')

pd.__version__
'0.16.2'
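
One alternative that avoids the suffix altogether (a sketch; merge matches on columns rather than the index):

    import pandas as pd

    left = pd.DataFrame({'A': ['A0', 'A1'], 'key': ['K0', 'K1']})
    right = pd.DataFrame({'C': ['C0', 'C1'], 'key': ['K0', 'K1']})

    # merge() joins on the 'key' columns and keeps a single 'key'
    # column in the result, so no suffix is needed.
    print(pd.merge(left, right, on='key'))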


regards
Soren

Damian Avila | 28 Sep 18:55 2015

ANN: Bokeh 0.10.0 released!

Hi all,

On behalf of the Bokeh team, I am excited to announce the release of version 0.10.0 of Bokeh, an interactive web plotting library for Python... and other languages!  

This release focused on providing several new features, such as WebGL support, a new refactored and more powerful chart interface, and responsive plots. But we are also shipping a lot of bug fixes and enhancements to our documentation, testing and build machinery, and examples.

Some of the highlights from this release are:

* Initial WebGL support (check our new examples: maps city, iris blend, scatter 10K, clustering.py; a short sketch follows this list)
* New charts interface supporting aggregation (see our new Bars, BoxPlot, Histogram and Scatter examples)
* Responsive plots
* Lower-level jsresources & cssresources (allow more subtle uses of resources)
* Several test machinery fixes
* Several build machinery enhancements
* More pytest-related fixes and enhancements
* More docs fixes and enhancements
* Now the glyph methods return the glyph renderer (not the plot)
* GMap points now move consistently
* Added alpha control for imageurl objects
* Removed python33 testing and packaging
* Removed multiuserblazeserver
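
A minimal sketch of trying the new WebGL support (hedged: this assumes the webgl=True flag on figure(), as described in the 0.10.0 docs):

    import numpy as np
    from bokeh.plotting import figure, output_file, show

    N = 10000
    x, y = np.random.normal(size=(2, N))

    output_file("scatter_webgl.html")
    p = figure(webgl=True)        # ask Bokeh to render glyphs with WebGL
    p.scatter(x, y, alpha=0.1)
    show(p)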

See the CHANGELOG for full details.

If you are using Anaconda/miniconda, you can install it with conda:

    conda install bokeh

or directly from our Anaconda Cloud main channel with:

    conda install -c bokeh bokeh

Alternatively, you can also install it with pip:

    pip install bokeh

If you want to use Bokeh in standalone JavaScript applications, BokehJS is available via CDN at:


Additionally, BokehJS is also installable with the Node Package Manager at https://www.npmjs.com/package/bokehjs

Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/bokeh/bokeh

Documentation is available at http://bokeh.pydata.org/en/0.10.0

Questions can be directed to the Bokeh mailing list.

We also have a new "general" channel available at Slack: https://bokehplots.slack.com/
Note: This is an unsupported place where users can congregate and self-support or share experiences.
The supported place by default is the Bokeh mailing list.

Cheers.

--

Damián Avila

Software Developer


@damian_avila

+5492215345134 | cell (ARG)

Soren | 27 Sep 21:47 2015

columns overlap but no suffix specified

Hi,

when I want to join two tables on a specific column, pandas by default asks for suffixes.
However, the column that is used for the join must have the same values in both tables, so including both columns in the output is redundant.
You would always join the tables and throw away one of the columns, wouldn't you?

Can I change this behavior somehow, so that the column appears without a suffix and just one time instead of two?
That would be much more convenient in my opinion.
regards
Sören


scatter plot with colors depending on index position

Hello,

I would like to create a scatter plot whose colors change slightly
from the first to the last index position, e.g. the circles get
slightly brighter with each successive index position; for col1 this
could run from dark blue to light blue, and for col2 from dark red to
light red.

The data are available from a dataframe.

Does someone have a hint on how to do this?

Best Regards
Fabian
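
A sketch of one way to do this (hedged: it assumes matplotlib and a DataFrame df with columns 'col1' and 'col2'; all names are illustrative):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame({'col1': np.random.rand(50).cumsum(),
                       'col2': np.random.rand(50).cumsum()})

    # Map each row's index position onto a sequential colormap; the
    # reversed maps ('*_r') run dark -> light, so later rows plot
    # brighter.
    pos = np.arange(len(df))
    plt.scatter(df.index, df['col1'], c=pos, cmap='Blues_r')
    plt.scatter(df.index, df['col2'], c=pos, cmap='Reds_r')
    plt.show()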


Benito Carmona | 26 Sep 09:04 2015

Read pandas categories directly from a csv

Hi,

With pandas, I am able to read a csv file (e.g. with the pd.read_table command), and once the dataframe is created I can use the astype command to transform an int column that has a specific set of values (e.g. age ratings for movies) into a category column. My question is: is there any way to do that transformation as part of the read_table or read_csv command, instead of as a postprocessing step?
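
Something along these lines is what I'm hoping for (a hypothetical sketch; I don't know whether read_csv accepts 'category' in its dtype mapping, and 'movies.csv' / 'age_rating' are made-up names):

    import pandas as pd

    # Hypothetical: request the category dtype at read time instead
    # of calling .astype('category') afterwards.
    df = pd.read_csv('movies.csv', dtype={'age_rating': 'category'})
    print(df['age_rating'].dtype)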

Thanks in advance

