Valentin Haenel | 9 Jan 20:43 2015

[ANN] bcolz v0.8.0

======================
Announcing bcolz 0.8.0
======================

What's new
==========

This version adds a public API in the form of a Cython definitions file
(``carray_ext.pxd``) for the ``carray`` class!

This means that other libraries can use the Cython definitions to build more
complex programs using the objects provided by bcolz. In fact, this
feature was specifically requested and there already exists a nascent
application called *bquery* (https://github.com/visualfabriq/bquery)
which provides an efficient out-of-core groupby implementation for the
``ctable`` object.

Because this is a fairly sweeping change, the minor version number was
incremented and no additional major features or bugfixes were added to
this release.  We kindly ask any users of bcolz to try this version
carefully and report back any issues, bugs, or even slow-downs you
experience.  I.e. please, please be careful when deploying this version
into production.

Many, many kudos to Francesc Elies and Carst Vaartjes of Visualfabriq
for their hard work, continued effort to push this feature and their
work on bquery which makes use of it!

What it is
==========

Yuri D'Elia | 9 Jan 12:54 2015

RFC: Inspecting frames/series interactively

Hi everyone, I'd like to bring up this PR:
https://github.com/pydata/pandas/pull/9191 for discussion.

Some background:

I always found the output formatting of DataFrames a bit disorienting,
especially when the number of columns cannot fit the current terminal width.

I initially started to use and customize a python tool/module called
"tabview" (https://github.com/firecat53/tabview) which provides a curses
spreadsheet-like display.

When combined with ipython it's really neat. Once used for a while, it's
hard to go back to the normal way of displaying the data.

In fact, my request for comments is about integrating the concept
of a general "data pager" directly into pandas. I don't want to glue a
specific pager into pandas, but rather to allow improved pagers to emerge.

The PR https://github.com/pydata/pandas/pull/9191 implements a generic
pd.interact() method, which aims to dispatch the provided argument to a
data viewer.

We add convenience methods to DataFrame/Series/Panel, so that:

  df.groupby('...').anything().interact()

is just a shorthand for:

  pd.interact(result)
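
As a rough illustration of the dispatch idea described above, here is a minimal pure-Python sketch. It is not the code from the PR: `register_viewer`, `InteractMixin`, `Table` and the "plain" viewer are hypothetical names invented for this example, and only the notion of an `interact()` function plus a convenience method comes from the message.

```python
# Hypothetical registry of data viewers; none of these names come from pandas.
_viewers = {}
_active = {"name": None}


def register_viewer(name, func):
    """Register a display callable; the first one registered becomes the default."""
    _viewers[name] = func
    if _active["name"] is None:
        _active["name"] = name


def interact(obj, viewer=None):
    """Dispatch obj to the requested viewer, or to the default one."""
    name = viewer if viewer is not None else _active["name"]
    if name not in _viewers:
        raise LookupError("no data viewer registered under %r" % (name,))
    return _viewers[name](obj)


class InteractMixin(object):
    """Gives a container class an .interact() shorthand for interact(self)."""

    def interact(self, viewer=None):
        return interact(self, viewer=viewer)


class Table(InteractMixin):
    """Stand-in for a DataFrame-like result (assumption for this sketch)."""

    def __init__(self, rows):
        self.rows = rows


# A trivial "pager" that renders the rows as text instead of opening a UI.
register_viewer("plain", lambda table: "\n".join(str(r) for r in table.rows))
```

With this layout a curses-based pager could be registered under its own name without the container classes knowing anything about it, which is the decoupling the message asks for.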

Valentin Haenel | 4 Jan 22:00 2015

[ANN] bcolz 0.7.3


======================
Announcing bcolz 0.7.3
======================

What's new
==========

This release includes the support for pickling persistent carray/ctable
objects contributed by Matthew Rocklin. Also, the included version of
Blosc is updated to ``v1.5.2``. Lastly, several minor issues and typos
have been fixed, please see the release notes for details.

``bcolz`` is a renaming of the ``carray`` project.  The new goals for
the project are to create simple, yet flexible compressed containers,
that can live either on-disk or in-memory, and with some
high-performance iterators (like `iter()`, `where()`) for querying them.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

For more detailed info, see the release notes in:
https://github.com/Blosc/bcolz/wiki/Release-Notes

What it is
==========

bcolz provides columnar and compressed data containers.  Column storage

Dr. Leo | 3 Jan 09:25 2015

How to parse time spans such as '2014-W52', '2014-Q4', '2014-T3', '2014-S2'?

Hi,

is there a way to do this elegantly?

There is an established need for this as at least the SDMX standard
allows for these representations.

For pandaSDMX I have worked around the resulting errors by translating
e.g. '2014-Q3' to '2014-07'. But this is clearly second best. 

Leo
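
For reference, one way to avoid the lossy '2014-Q3' to '2014-07' translation is to turn each code into an explicit (first day, last day) pair. The sketch below uses only the standard library (`date.fromisocalendar` needs Python 3.8 or later). It assumes that W is an ISO 8601 week, Q a quarter, T a four-month trimester and S a six-month semester; that reading of the SDMX codes is an assumption of this example, and `parse_time_span` is a hypothetical helper, not a pandas or pandaSDMX function.

```python
import re
from datetime import date, timedelta

# Months covered by one unit of each code (assumed meanings, see lead-in).
_MONTHS_PER_UNIT = {"Q": 3, "T": 4, "S": 6}
_PATTERN = re.compile(r"^(\d{4})-([WQTS])(\d{1,2})$")


def parse_time_span(text):
    """Return (first_day, last_day) for strings such as '2014-W52' or '2014-Q4'."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError("unsupported time span: %r" % (text,))
    year, unit, index = int(match.group(1)), match.group(2), int(match.group(3))

    if unit == "W":
        # ISO weeks run Monday..Sunday; fromisocalendar rejects invalid weeks.
        start = date.fromisocalendar(year, index, 1)
        return start, start + timedelta(days=6)

    months = _MONTHS_PER_UNIT[unit]
    if not 1 <= index <= 12 // months:
        raise ValueError("period index out of range: %r" % (text,))
    first_month = (index - 1) * months + 1
    last_month = first_month + months - 1
    start = date(year, first_month, 1)
    if last_month == 12:
        end = date(year, 12, 31)
    else:
        # Last day of a month is the day before the first of the next month.
        end = date(year, last_month + 1, 1) - timedelta(days=1)
    return start, end
```

The resulting pair keeps the length of the period, which a bare '2014-07' label does not.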

-- 
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@...
For more options, visit https://groups.google.com/d/optout.

Carst Vaartjes | 22 Dec 02:23 2014

Cython Imports, Factorize, GroupBy

Hi everyone,

Just a short mail to summarize some of the things that are currently waiting for decisions / checks. First of all, I want to thank you guys for all the work you have been doing.

We managed to get the whole thing up & running for production in the last month (the biggest data set in there is actually 2 billion records, though we spread that over multiple files). The performance, reliability and stability are extremely good: we have an internal metadata system that translates report requests into internal calls and sends them to the relevant server (with automatic joins over multiple fact files, groupbys, dimension additions etc.). We moved from the in-mem Pandas based situation (which relied heavily on dataframe caching, and therefore had issues with keeping the cache up to date, adding processes for hot data sets etc.) to BCOLZ, where we have a much more manageable situation (larger availability of files and many more cache-less processes which pick files as needed). The performance is around 2-3x slower than in-mem Pandas (I do believe we can bring this near to 1-2x in time), but that in itself is really impressive already.

These things aren't in master yet though, which I understand; I just want you to think about it and let us know how you want to go from here. Also, please realize that we do not have hard core programming backgrounds, so I'm sure there is stuff that can be improved upon too.

Cython Imports
This is the building block that is really needed for everything else: a carray_ext that is "cimport-able". Valentin did work of his own on this before; our version was made separately. It works 100% but might not be optimized yet in terms of structure. Still, I hope you can see it as a fully working version 0.1 that can be incorporated into master, also because none of the items below can be rolled out well without having this in place.
Francesc Elies (who made this) is on holiday back home in Barcelona at the moment, so if there are major questions we unfortunately cannot answer them until early January.

Factorize
See: https://github.com/FrancescElies/bcolz/tree/groupby_w_filter (mind you, this branch contains three things at once: factorization, groupby, and a workaround for "in" filters).
Based on the help from Valentin and on existing Pandas functionality, we created a factorization for carrays.
We also made a function with which out-of-core ctables can cache the factorizations; it is not nicely integrated into any ctable metadata yet, but it works really well.

GroupBy
Then there is an out-of-core approach for groupbys. The limitation here is that the aggregated result does need to be in-core (it's a temporary in-mem numpy array); this has to do with inherent limitations of bcolz (non-sequential writes), but it works really well. 60 million record bcolz files are processed really fast, and the memory usage is normally quite limited.
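
To make the factorize-then-aggregate idea concrete, here is a minimal pure-Python sketch. It is not the code from the branch above: `chunked_groupby_sum` is a hypothetical function, plain lists stand in for the chunks of a carray, and a Python list stands in for the temporary in-memory numpy array. Only one chunk of input is looked at per step, while the aggregated result stays in memory, as the message describes.

```python
def chunked_groupby_sum(key_chunks, value_chunks):
    """Sum values per key, reading the input one chunk at a time.

    key_chunks and value_chunks are iterables of equally sized chunks;
    in a real setting they would be read from disk piece by piece.
    """
    code_of = {}   # label -> integer code (the factorization)
    totals = []    # in-memory aggregate, one slot per code
    for keys, values in zip(key_chunks, value_chunks):
        for key, value in zip(keys, values):
            # A label seen for the first time gets the next free code.
            code = code_of.setdefault(key, len(code_of))
            if code == len(totals):
                totals.append(0)
            totals[code] += value
    return {label: totals[code] for label, code in code_of.items()}
```

Memory use here grows with the number of distinct groups rather than with the number of records, which is why the result has to fit in core while the input does not.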

In / Not In Filters
The branch contains a workaround based on Pandas; as noted by Francesc Alted before, we need to add "in/not in" functionality to numexpr (this would also benefit Pandas actually, which uses a cython based "in set" check atm). This still has to be started and should be in numexpr, not here, so feel free to ignore it!

I understand if you do not want Factorize and GroupBy under your maintenance because you want to keep bcolz as small as possible; in that case we would make a separate public package that works on top of BCOLZ (we would have to think of a fancy name for the BCOLZ Query Framework ;). We would still need the Cython import though, so I hope you can really put that in your next release...
I hope this makes everything clear to everyone! As I said before, BCOLZ is very impressive, I hope we're helping you guys with this new functionality more than giving new headaches ;)

Kind regards,

Carst

Nick Eubank | 18 Dec 22:13 2014

Indexes: organizational, or performance related?

Quick (possibly naive) question: is the role of indexes purely organizational (a default column for merges and such), or does it have performance implications (pandas computes binary search trees for index columns or something)? 

'Michael' via PyData | 13 Dec 13:35 2014

[Wishlist] pd.set_option('display.date_format', fmt) ?

Would this be considered useful enough to be included in pandas?

I often want datetimes displayed in a shortened form,
either so that more columns fit on screen
or simply because I prefer a particular format.

What do you think about this?

Anyway, is there a way to do this now in pandas?
I searched, but didn't find anything

Jeff Reback | 12 Dec 14:43 2014

ANN: pandas v0.15.2

Hello,

We are proud to announce v0.15.2 of pandas, a minor release from 0.15.1. 

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. 

This was a short release of 4 weeks with 137 commits by 49 authors encompassing 75 issues.

We recommend that all users upgrade to this version.

For a fuller description of what's new in v0.15.2, see here:

pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs and Windows binaries are available on PyPI:

Windows binaries are courtesy of Christoph Gohlke and are built on NumPy 1.8.
Mac OS X wheels are courtesy of Matthew Brett.

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.15.2 release

  • Aaron Staple
  • Angelos Evripiotis
  • Artemy Kolchinsky
  • Benoit Pointet
  • Brian Jacobowski
  • Charalampos Papaloizou
  • Chris Warth
  • David Stephens
  • Fabio Zanini
  • Francesc Via
  • Henry Kleynhans
  • Jake VanderPlas
  • Jan Schulz
  • Jeff Reback
  • Jeff Tratner
  • Joris Van den Bossche
  • Kevin Sheppard
  • Matt Suggit
  • Matthew Brett
  • Phillip Cloud
  • Rupert Thompson
  • Scott E Lasley
  • Stephan Hoyer
  • Stephen Simmons
  • Sylvain Corlay
  • Thomas Grainger
  • Tiago Antao
  • Trent Hauck
  • Victor Chaves
  • Victor Salgado
  • Vikram Bhandoh
  • WANG Aiyong
  • Will Holmgren
  • behzad nouri
  • broessli
  • charalampos papaloizou
  • immerrr
  • jnmclarty
  • jreback
  • mgilbert
  • onesandzeroes
  • peadarcoyle
  • rockg
  • seth-p
  • sinhrks
  • unutbu
  • wavedatalab
  • Åsmund Hjulstad


Miki Tebeka | 8 Dec 18:09 2014

[OT] Interfacing Matlab?

Greetings,

Bit OT, but hope people might be able to help.

What's the recommended package to interface to legacy matlab code? pymatlab? mlabwrap? ...

Thanks,
--
Miki

Damian Avila | 8 Dec 16:27 2014

ANN: Bokeh 0.7 released

On behalf of the Bokeh team, I am very happy to announce the release of Bokeh version 0.7!

Bokeh is a Python library for visualizing large and realtime datasets on the web.
Its goal is to provide developers (and domain experts) with capabilities to easily create novel and powerful visualizations that extract insight from local or remote (possibly large) data sets, and to easily publish those visualizations to the web for others to explore and interact with.

This release includes many major new features:

* IPython widgets and animations without a Bokeh server
* Touch UI working for tools on mobile devices
* Vastly improved linked data table
* More new (and improving) bokeh.charts (high level charting interface)
* Color mappers on the python side
* Improved toolbar
* Many new tools: lasso, poly, and point selection, crosshair inspector

Check out our blog post, http://continuum.io/blog/bokeh-0.7, to watch some of these tools in action! You can also see the CHANGELOG for full details.

We would like to mention that the Github Organization for Bokeh is growing! This organization was already home to bokeh-scala and bokeh.jl, and now the Bokeh project itself has a new home there as well, located at https://github.com/bokeh/bokeh. Anyone interested in developing new language bindings for Bokeh is encouraged to contact us about hosting your project under this organization.

Also, the release of Bokeh 0.8 should happen in early 2015. Some notable features we intend to work on are:

* Simplifying production and multi-user Bokeh server deployments
* Colorbar axis and axis location inspectors
* Better support for maps and projections

As usual, don't forget to check out the full documentation, interactive gallery, and tutorial at


as well as the Bokeh IPython notebook nbviewer index (including all the tutorials) at:


To install the latest release, if you are using Anaconda, you can install it with conda:

    conda install bokeh

Alternatively, you can install it with pip:

    pip install bokeh

BokehJS is also available by CDN for use in standalone Javascript applications:


Finally, BokehJS is also installable with the Node Package Manager.

Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/bokeh/bokeh

Questions can be directed to the Bokeh mailing list: bokeh-aihBOO89d3ITaNkGU808tA@public.gmane.org

Thank you for your attention!

Damián

'Michael' via PyData | 7 Dec 16:03 2014

plotdevice project (mac osx)

Just wanted to raise awareness for this project.

Found it yesterday.
It's very very nice for drawing, animation, interaction, exporting (images, pdf, movies)

Many many things could be done in combination with pandas.
For research purposes especially.

