'Michael' via PyData | 21 Jul 17:13 2014

mxDateTime instead of datetime for DataFrame index?

I've been using the (eGenix) mxDateTime module for quite some time now.
It's far more flexible and faster than Python's datetime module,
especially regarding the range of dates it supports (basically anything),
and it offers functions for working with Julian dates and more.

Wouldn't it be advantageous for pandas to use mxDateTime instead of datetime as the base for its timestamps?
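For context, a quick check of the limitation I mean (assuming a reasonably recent pandas):

    import pandas as pd

    # pandas stores timestamps as nanoseconds in a 64-bit integer, which
    # caps the representable range at roughly the years 1677-2262:
    print(pd.Timestamp.min)
    print(pd.Timestamp.max)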


Given my probable ignorance of the details of the Timestamp class,
I could be making an error in judgement,
but this is the idea I'm throwing out here, since pandas is the only tool I use for my research.

'Michael' via PyData | 19 Jul 13:34 2014

read_csv() no longer accepts filenames with spaces?

Since v0.14, read_csv() raises an exception when reading filenames that contain spaces:

IOError: File aircrashes\ 50+\ kills.txt does not exist

If I eliminate the spaces, it works.

I'm on a Mac (latest Mavericks) with the latest Anaconda.
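For reference, the exact calls (the escaped form is a guess at what produces the error above; the plain form is what works):

    import pandas as pd

    # Fails if the shell-escaped form of the path is passed through
    # literally -- pandas then looks for a name containing backslashes:
    # df = pd.read_csv("aircrashes\\ 50+\\ kills.txt")   # IOError

    # The plain, unescaped path works:
    df = pd.read_csv("aircrashes 50+ kills.txt")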

Nathaniel Smith | 17 Jul 03:21 2014

[ANN] patsy v0.3.0 released

Hi all,

I'm pleased to announce the v0.3.0 release of patsy. The main
highlight of this release is the addition of built-in functions to
compute natural and restricted cubic splines, and tensor spline
products, with optional constraints, all compatible with the R
package 'mgcv'. (Note that if you want to replace mgcv itself, you'll
still need to implement its penalized fitting algorithm -- these
are just the spline basis functions. But they are very useful on
their own, and allow you to fit model coefficients with mgcv and then
use Python to generate predictions from that model.) We also dropped
support for Python 2.4 and 2.5, and have switched to a single polyglot
codebase for py2 and py3, allowing us to distribute universal wheels.
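A minimal sketch of the new spline built-ins (made-up data; cr() gives a natural cubic regression spline basis):

    import numpy as np
    from patsy import dmatrix

    x = np.linspace(0.0, 10.0, 100)

    # Natural cubic regression spline basis with 4 degrees of freedom,
    # compatible with mgcv's "cr" basis; "- 1" drops the intercept column:
    basis = dmatrix("cr(x, df=4) - 1", {"x": x})
    print(basis.shape)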

Patsy is a Python library for describing statistical models
(especially linear models, or models that have a linear component) and
building design matrices. Patsy brings the convenience of R "formulas"
to Python.

Changes: https://patsy.readthedocs.org/en/latest/changes.html#v0-3-0

General information: https://github.com/pydata/patsy/blob/master/README

Share and enjoy,
-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Tim Michelsen | 16 Jul 21:01 2014

Spyderlib: support for pandas objects added to Variable Explorer

Hi,
thanks to Carlos Cordoba & Daniel Høegh, the great Spyder IDE now
supports pandas objects in its Variable Explorer:

* https://code.google.com/p/spyderlib/issues/detail?id=1160

* https://bitbucket.org/spyder-ide/spyderlib/commits/008607f1fd22665b8b32695ddf235f0866cf7e32

It should also land in the Ubuntu PPA after tomorrow's nightly build:
https://code.launchpad.net/~pythonxy/+archive/ubuntu/pythonxy-devel

regards.


Phillip Cloud | 15 Jul 04:00 2014

Dropping numpy 1.6 support

Hi all,

We over at pandas would like to drop support for NumPy 1.6 in the next release, v0.15.0. It has become a burden to support, in large part due to broken datetime functionality. All manner of ugly hacks and workarounds await the brave soul who dares venture into pandas/core/common.py. My feeling (and hopefully others') is that if you want to use modern versions of pandas, you should upgrade your NumPy to something modern-ish as well. If there are folks who feel we should NOT drop NumPy 1.6 support, please speak up! Thanks.


--
Best,
Phillip Cloud

John E | 14 Jul 03:11 2014

Re: Converting a big Stata program to Pandas?

Charles, completely agree that correct >> fast.  At the same time, the main reason for doing the conversion is to make it faster.  If pandas isn't faster, it will probably have to be done in either Fortran or NumPy, though of course that would take longer to program.


On Sunday, July 13, 2014 1:13:26 PM UTC-4, Charles Cloud wrote:
The last thing you should be worried about is speed. With a port of this size you should start with tests that compare the output of the Stata program with the output of pandas. Ideally you'd have this for every processing step, so that you can verify you haven't broken anything along the way. Only when your output is correct should you worry about whether things are fast enough. There are many ways to speed up your program, but correctness should be your main concern at this point. I promise you'll thank your future self for writing a bunch of tests so that you don't have to say a prayer every time the program runs. I would start by creating some kind of wrapper that allows you to call Stata from within Python, then writing a series of tests that cover as much of the functionality as you want to preserve. Then start porting, and every time you make a change, run the tests. If something breaks, you'll know before writing a bunch more code that introduces even more bugs.
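A minimal sketch of that kind of harness (the Stata invocation, file names, and the ported function are assumptions; adjust for your installation):

    import subprocess

    import pandas as pd
    from pandas.util.testing import assert_frame_equal

    def run_stata(do_file):
        # Run a do-file in batch mode; the executable name and flags
        # vary by platform and Stata edition (this is an assumption).
        subprocess.check_call(["stata", "-b", "do", do_file])

    def test_step_one():
        # step_one.do is a hypothetical do-file that writes step_one.dta;
        # pandas_step_one() is the hypothetical ported implementation.
        run_stata("step_one.do")
        expected = pd.read_stata("step_one.dta")
        result = pandas_step_one()
        assert_frame_equal(result, expected)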



John Eiler | 13 Jul 18:36 2014

Converting a big Stata program to Pandas?

I've been handed a fairly large Stata program (around 5,000 lines of code spread across several do-files) and asked to improve its readability, maintainability, and speed (it takes about an hour to run).  I haven't yet dived deeply into the code, but I was hoping someone here might be able to offer some general tips.  I'm decent with Stata and can do some basic things in pandas (with a lot of help from the documentation).

The main question I have at this point is: what sort of speed differences can I expect between standard tasks in Stata and pandas?  E.g. merges, sorts, group-bys, creating new variables from functions of old ones, etc.

I thought I would be able to find some kind of Stata vs. pandas benchmark to get a sense of this, but couldn't find anything via Google.  I do know from past experience that Python/Numba is much faster than Stata for basic tasks like generating new variables and summing over the data.

Anyway, my current plan is to take a section of the Stata code, convert it to pandas, and see what sort of speed difference I get.  Since pandas can read and write *.dta files, it should be pretty straightforward to selectively replace things as a first step.
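For instance, a single step could presumably be swapped out with a .dta round-trip (file and column names here are made up):

    import pandas as pd

    # Read the intermediate dataset that the previous Stata step produced:
    df = pd.read_stata("intermediate.dta")

    # ...do one ported processing step in pandas...
    df["rate"] = df["events"] / df["exposure"]

    # Write it back so the rest of the Stata pipeline can carry on:
    df.to_stata("intermediate_out.dta", write_index=False)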

It's probably too early to ask this, but would buying something like the MKL Optimizations from Continuum speed things up?  That is, I understand they would speed up NumPy, so would that indirectly speed up pandas as well?

Anyway, thanks in advance for any comments, suggestions, or warnings!

Jeff Reback | 11 Jul 15:31 2014

ANN: pandas 0.14.1 released

Hello,

We are proud to announce v0.14.1 of pandas, a minor release from 0.14.0. 

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. 

This was 1.5 months of work with 244 commits by 45 authors encompassing 306 issues.

We recommend that all users upgrade to this version.

Highlights:

  • New method select_dtypes() to select columns based on dtype.
  • New method sem() to calculate the standard error of the mean.
  • Support for dateutil timezones (see docs).
  • Support for ignoring full-line comments in the read_csv() text parser.
  • New documentation section on Options and Settings.
  • Lots of bug fixes.
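A minimal sketch of the two new methods (made-up data):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3],
                       "b": [1.5, 2.5, 3.5],
                       "c": ["x", "y", "z"]})

    # select_dtypes(): keep only the numeric columns
    numeric = df.select_dtypes(include=[np.number])

    # sem(): standard error of the mean, per column
    print(numeric.sem())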

A full description of what's new in v0.14.1 is available in the online release notes.

pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs and Windows binaries are available on PyPI.

Windows binaries are courtesy of Christoph Gohlke and are built against NumPy 1.8.
Mac OS X wheels will be available soon, courtesy of Matthew Brett.

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.14.1 release

  • Andrew Rosenfeld
  • Andy Hayden
  • Benjamin Adams
  • Benjamin M. Gross
  • Brian Quistorff
  • Brian Wignall
  • bwignall
  • clham
  • Daniel Waeber
  • David Bew
  • David Stephens
  • DSM
  • dsm054
  • helger
  • immerrr
  • Jacob Schaer
  • jaimefrio
  • Jan Schulz
  • John David Reaver
  • John W. O’Brien
  • Joris Van den Bossche
  • jreback
  • Julien Danjou
  • Kevin Sheppard
  • K.-Michael Aye
  • Kyle Meyer
  • lexual
  • Matthew Brett
  • Matt Wittmann
  • Michael Mueller
  • Mortada Mehyar
  • onesandzeroes
  • Phillip Cloud
  • Rob Levy
  • rockg
  • sanguineturtle
  • Schaer, Jacob C
  • seth-p
  • sinhrks
  • Stephan Hoyer
  • Thomas Kluyver
  • Todd Jennings
  • TomAugspurger
  • unknown
  • yelite

Allie Wang | 9 Jul 20:38 2014

Cython/Pandas performance

We are using pandas in a high-performance environment and trying to get as much speed as possible.  We can make things much faster by working with the underlying NumPy arrays.  However, DataFrame construction and groupby operations are slow, and it's unclear how to make them faster.  Any advice besides rewriting in C or Cython?
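To illustrate the kind of array-level shortcut we mean, a made-up example (np.bincount standing in for a keyed mean):

    import numpy as np
    import pandas as pd

    vals = np.random.randn(1000000)
    keys = np.random.randint(0, 100, size=vals.shape[0])

    # The pandas way:
    df = pd.DataFrame({"key": keys, "val": vals})
    means = df.groupby("key")["val"].mean()

    # The same keyed mean on the raw arrays, often considerably faster:
    means_np = np.bincount(keys, weights=vals) / np.bincount(keys)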

Bryan Van de Ven | 9 Jul 17:13 2014

ANN: Bokeh 0.5 released

I am very happy to announce the release of Bokeh version 0.5! (http://continuum.io/blog/bokeh-0.5)

Bokeh is a Python library for visualizing large and real-time datasets on the web.

This release includes many new features: weekly dev releases, a new plot frame, a click tool, an "always on"
hover tool, multiple axes, log axes, minor ticks, gear and gauge glyphs, and an NPM BokehJS package.
Several usability enhancements have been made to the plotting.py interface to make it even easier to use.
The Bokeh tutorial now also includes exercises in IPython notebook form. Of course, we've made many
little bug fixes -- see the CHANGELOG for full details.

The biggest news is all the long-term and architectural goals landing in Bokeh 0.5:

    * Widgets! Build apps and dashboards with Bokeh
    * Very high level bokeh.charts interface
    * Initial Abstract Rendering support for big data visualizations
    * Tighter Pandas integration
    * Simpler, easier plot embedding options

Expect dynamic, data-driven layouts, including ggplot-style auto-faceting, in upcoming releases, as
well as R language bindings, more statistical plot types in bokeh.charts, and cloud hosting for Bokeh apps.

Check out the full documentation, interactive gallery, and tutorial at

    http://bokeh.pydata.org

as well as the new Bokeh IPython notebook nbviewer index (including all the tutorials) at:

    http://nbviewer.ipython.org/github/ContinuumIO/bokeh-notebooks/blob/master/index.ipynb

If you are using Anaconda, you can install with conda:

    conda install bokeh

Alternatively, you can install with pip:

    pip install bokeh

BokehJS is also available via CDN for use in standalone JavaScript applications:

    http://cdn.pydata.org/bokeh-0.5.min.js
    http://cdn.pydata.org/bokeh-0.5.min.css

Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/continuumio/bokeh

Questions can be directed to the Bokeh mailing list: bokeh@...

If you have interest in helping to develop Bokeh, please get involved! Special thanks to recent
contributors: Tabish Chasmawala, Samuel Colvin, Christina Doig, Tarun Gaba, Maggie Mari, Amy
Troschinetz, Ben Zaitlen.

Bryan Van de Ven
Continuum Analytics
http://continuum.io


Simon Cropper | 9 Jul 16:48 2014

BIG data... What is BIG?

Hi,

I have been exploring various projects that claim to handle BIG data,
but to be honest most do not qualify what BIG actually means.

I remember the days when programs specified the maximum number of 
records, maximum number of fields and maximum number of tables in a 
database that could be manipulated at any one time. Why don't these 
types of specs get provided for languages and libraries anymore?

What are people's impressions of what BIG actually means when it is
used to describe large datasets?

To me, BIG means millions of records and multiple linked tables.

Simon

--

-- 
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe@...
For more options, visit https://groups.google.com/d/optout.

