Emlyn Clay | 8 May 14:11 2015

PyData London 2015 - call for proposals!

Hello Pandas users,

We are looking for speakers for PyData London 2015! If you are interested in speaking, we'd love to hear your proposal for a talk: head over to london.pydata.org to register a speaker profile and submit your proposal. The conference will be held from June 19th to 21st at Bloomberg's headquarters near Moorgate.

PyData London 2015 is a conference that brings together analysts, scientists, developers, engineers, architects and others from the Python data science community to discuss new techniques and tools for the management, analytics and visualization of data. We have a lot of pandas users in our group, especially those working in the finance sector. Presentation content can be at a novice, intermediate or advanced level. Talks will run 30-40 minutes and hands-on tutorials will run 90-120 minutes.

Very best wishes to all pandas users. We do hope you consider proposing a talk, and the very best of luck to those who do!

Warmest regards,

Emlyn, Ian, Calvin, Cecilia, Florian, Graham, Slavi, Leah and James
The PyData London 2015 committee

Jeff Reback | 11 May 17:42 2015

ANN: pandas 0.16.1 released

Hello,

We are proud to announce v0.16.1 of pandas, a minor release from 0.16.0. 

This release includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. 

This release represents 7 weeks of development, with 222 commits by 57 authors covering 85 issues.

We recommend that all users upgrade to this version.

What is it:

pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python. Additionally, it has the
broader goal of becoming the most powerful and flexible open source data
analysis / manipulation tool available in any language.

Highlights of this release include:
  • Support for CategoricalIndex, a category-based index, see here
  • New section on how to contribute to *pandas*, see here
  • Revised "Merge, join, and concatenate" documentation, including graphical examples to make each operation easier to understand, see here
  • New method sample for drawing random samples from Series, DataFrames and Panels; see here, and the short example after this list
  • The default Index printing has changed to a more uniform format, see here
  • BusinessHour datetime-offset is now supported, see here
  • Further enhancements to the .str accessor to make string operations easier, see here
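
A minimal, hedged illustration of the new sample method (the frame and column names here are made up for the example):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': np.arange(10), 'b': np.random.randn(10)})

# draw 3 rows at random; random_state makes the draw reproducible
print(df.sample(n=3, random_state=42))

# draw 50% of the rows, with replacement
print(df.sample(frac=0.5, replace=True))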


Documentation:
http://pandas.pydata.org/pandas-docs/stable/

Source tarballs and Windows binaries are available on PyPI:

Windows binaries are courtesy of Christoph Gohlke and are built against NumPy 1.8.
Mac OS X wheels are courtesy of Matthew Brett.

Please report any issues here:
https://github.com/pydata/pandas/issues


Thanks

The Pandas Development Team


Contributors to the 0.16.1 release

  • Alfonso MHC
  • Andy Hayden
  • Artemy Kolchinsky
  • Chris Gilmer
  • Chris Grinolds
  • Dan Birken
  • David BROCHART
  • David Hirschfeld
  • David Stephens
  • Dr. Leo
  • Evan Wright
  • Frans van Dunné
  • Hatem Nassrat
  • Henning Sperr
  • Hugo Herter
  • Jan Schulz
  • Jeff Blackburne
  • Jeff Reback
  • Jim Crist
  • Jonas Abernot
  • Joris Van den Bossche
  • Kerby Shedden
  • Leo Razoumov
  • Manuel Riel
  • Mortada Mehyar
  • Nick Burns
  • Nick Eubank
  • Olivier Grisel
  • Phillip Cloud
  • Pietro Battiston
  • Roy Hyunjin Han
  • Sam Zhang
  • Scott Sanderson
  • Stephan Hoyer
  • Tiago Antao
  • Tom Ajamian
  • Tom Augspurger
  • Tomaz Berisa
  • Vikram Shirgur
  • Vladimir Filimonov
  • William Hogman
  • Yasin A
  • Younggun Kim
  • behzad nouri
  • dsm054
  • floydsoft
  • flying-sheep
  • gfr
  • jnmclarty
  • jreback
  • ksanghai
  • lucas
  • mschmohl
  • ptype
  • rockg
  • scls19fr
  • sinhrks

Keith Brown | 11 May 17:40 2015

Comparing dataframe columns

So I have a df like this.

df=pd.DataFrame([
            ['2012-02-03 13:00','honda',100,'k1',False],
            ['2012-02-03 13:21','nissan',100,'k1',False],
            ['2012-02-03 11:03','toyota',400,'d1',False],
            ['2012-02-03 10:03','bmw',300,'s1',False],
            ['2012-02-03 11:02','toyota',400,'d1',False],
            ],
            columns=['ts','manufacture','size','form','sentinel'])
df.ts=pd.to_datetime(df.ts)
df=df.sort(['ts','manufacture'])
df['verified']=False
df

The dataframe is sorted by timestamp and manufacture.
I want to find all duplicated manufactures (toyota here) and then check whether their sizes (400 & 400) match.
If they do, set sentinel to True for the first occurrence and verified to True on the second instance.

So far I am able to find the duplicated manufacture, toyota here:
vc=df.manufacture.value_counts()
vci=vc[vc>1]

But I am not sure how to do the comparison inside the original dataframe, df.

Any ideas?
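
A minimal sketch of one possible approach (assuming the flags should be set per manufacture group, as described above):

for name, group in df.groupby('manufacture'):
    # only manufactures that appear more than once, with matching sizes
    if len(group) > 1 and group['size'].nunique() == 1:
        # rows are already sorted by ts, so index order is time order
        df.loc[group.index[0], 'sentinel'] = True
        df.loc[group.index[1:], 'verified'] = True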

Keith Brown | 5 May 00:10 2015

correct way to plot with ipython notebook

At the moment, I do something like this
f = plt.figure(figsize=(10, 10))
graph = f.add_subplot(111)   # note: Figure has add_subplot, not add_plot

for key, group in df.groupby('e'):
    graph = group.plot(ax=f.gca(), x='t', y='b')

This plots fine. But when I try to add annotations to the graph inside the for loop, I keep getting
<matplotlib.legend.Legend at 0x....>

I have tried calling .show() many times and still get this; sometimes it works and sometimes it doesn't.
Is there a canonical way you folks plot with annotations?
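
For reference, a minimal sketch of a pattern that avoids the stray output: the <matplotlib.legend.Legend at 0x...> text is just the notebook echoing the return value of the last expression in the cell, so assigning it (or ending the cell with a semicolon) suppresses it:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111)

for key, group in df.groupby('e'):
    group.plot(ax=ax, x='t', y='b', label=key)

ax.legend()        # build the legend once, after the loop
ax.annotate('my note', xy=(0.5, 0.5), xycoords='axes fraction')
plt.show()         # or end the cell with a trailing semicolon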
Neal Becker | 4 May 20:03 2015

select all columns with identical data

In a DataFrame, I'd like to find all the columns in which every row holds the same value.  Any suggestions?
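
For what it's worth, a minimal sketch of one way to do this (note that NaN != NaN, so columns containing NaN need extra care):

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1], 'b': [1, 2, 3], 'c': ['x', 'x', 'x']})

# a column is constant when every row equals the first row
constant = (df == df.iloc[0]).all()
print(df.columns[constant])   # columns 'a' and 'c'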

--
Those who don't understand recursion are doomed to repeat it

Neal Becker | 1 May 20:22 2015

problems with to_csv and to_string

I have a bunch of complex values.  I want to convert them to CSV, but I want to control the precision (I'd like a '%.2g' format).

Well to_string has formatters=, which I could use:

 cases.to_string(formatters={label : lambda s : '(%.2g,%.2g)'%(s.real,s.imag) for label in ['a0','b0','g0','a1','b1','g1','a2','b2','g2']})
Out[72]: u'               a0              b0             g0              a1              b1           g1              a2            b2             g2\n0  (2.3e+03,-1.9)  (3.5e+02,-6.2)  (3.4e+02,8.9)  (2.2e+03,0.11)  (6.7e+02,-6.5)  (-0.24,8.4)  (2.2e+03,-3.9)    (2.1,-5.6)    (6.7e+02,9)\n2  (2.3e+03,-1.9)  (9.6e+02,-7.4)  (9.5e+02,8.6)     (1.9e+03,3)    (1.6e+03,-6)   (-3.3,9.1)  (1.9e+03,-6.3)  (0.041,-6.5)  (1.6e+03,5.7)\n4  (2.2e+03,-1.7)  (1.1e+03,-7.9)  (1.2e+03,8.5)   (1.8e+03,4.3)  (1.8e+03,-5.4)    (-13,9.6)    (1.8e+03,-7)    (1.9,-7.1)  (1.8e+03,3.9)'

Well, that's nice, but I want a comma as the delimiter, and I don't see any option to set the delimiter in to_string.

What about to_csv?


 cases.to_csv(formatters={label : lambda s : '(%.2g,%.2g)'%(s.real,s.imag) for label in ['a0','b0','g0','a1','b1','g1','a2','b2','g2']})

Oh, that doesn't work either: to_csv doesn't complain about formatters=, but it doesn't seem to do anything (and it's not documented).

So to summarize, I propose:
1. to_string needs delimiter=
2. to_csv needs formatters=
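
In the meantime, a possible workaround sketch: pre-format the complex columns into strings with applymap, then write the result (this assumes every column of cases is complex):

fmt = lambda s: '(%.2g,%.2g)' % (s.real, s.imag)
cases.applymap(fmt).to_csv('cases.csv')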


--
Those who don't understand recursion are doomed to repeat it

Francesc Alted | 1 May 11:26 2015

ANN: PyTables 3.2.0 RC2 is out

=============================
 Announcing PyTables 3.2.0rc2
=============================

We are happy to announce PyTables 3.2.0rc2.

*******************************
IMPORTANT NOTICE:

If you are a user of PyTables, the project needs your help to keep
going.  Please read the following thread, as it contains important
information about the future (or lack thereof) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
*******************************

What's new
==========

This is a major release of PyTables and the result of more than a year
of accumulated patches.  Most importantly, it fixes a couple of nasty
problems with indexed queries returning incorrect results in some
scenarios (mainly affecting pandas users).  There are many usability
and performance improvements too.
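
For pandas users, the indexed-query path in question is presumably the one exercised through pandas' HDFStore; a minimal sketch of that usage pattern (the file and column names are made up):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': np.random.randn(1000)})
df.to_hdf('store.h5', 'df', format='table', data_columns=True)

# the where clause is evaluated by PyTables as an indexed query
subset = pd.read_hdf('store.h5', 'df', where='a > 0')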

In case you want to know more in detail what has changed in this
version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated
PDF and HTML docs from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc2

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data, with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.
PyTables includes OPSI, a new indexing technology that allows data
lookups in tables exceeding 10 gigarows (10**10 rows) in less than a
tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches,
bug reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
especially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

  **Enjoy data!**

  -- The PyTables Developers


Keith Brown | 28 Apr 03:31 2015

merging two data frames so its meaningful

i have two dataframes.

df1 = pd.DataFrame(np.random.randn(10, 2),
                   index=pd.date_range('4/1/2013 00:00:00', periods=10, freq='T'),
                   columns=['a', 'b'])
df2 = pd.DataFrame(np.random.randn(20, 2),
                   index=pd.date_range('4/1/2013 00:00:00', periods=20, freq='T'),
                   columns=['a', 'b'])

df2 has more rows. 

I would like to merge the two dataframes and plot the values of a & b, but I want the labels to be ("a df1", "b df1", "a df2", "b df2") so the origin of each value isn't lost.
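
A minimal sketch of one way to get there, using pd.concat with keys and then flattening the column labels (note that df1's columns will hold NaN for the timestamps it doesn't cover):

merged = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
# turn the MultiIndex columns into flat labels like 'a df1', 'b df1', ...
merged.columns = ['%s %s' % (col, frame) for frame, col in merged.columns]
merged.plot()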

Dr. Leo | 27 Apr 00:07 2015

ANN: pandaSDMX v0.2.1 released

Hi,

I am pleased to announce the release of pandaSDMX v0.2.1. This is an
important feature release. Support for zip files has been added to the
get API: zip files are automatically downloaded from Eurostat whenever
the original request yields a message whose footer contains a link to a
zip file generated on demand, which happens when requesting large
datasets. Moreover, the pandas writer parses more time-period formats,
and the documentation has been extended.

For more details visit http://pandasdmx.readthedocs.org.
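
For the curious, the general usage pattern looks roughly like this (a hedged sketch; please check the documentation above for the exact v0.2.1 names and signatures, and note that 'une_rt_a' is just an example Eurostat dataflow id):

from pandasdmx import Request

estat = Request('ESTAT')     # connect to the Eurostat web service
resp = estat.get(resource_type='data', resource_id='une_rt_a')
data = resp.write()          # convert to pandas objects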

Enjoy!

Leo

About pandaSDMX:

pandaSDMX is an Apache 2.0-licensed Python package that aims to become
the most intuitive and versatile tool to retrieve and acquire
statistical data and metadata disseminated in SDMX format. It works
well with the SDMX services of the European statistics office
(Eurostat) and the European Central Bank (ECB). While pandaSDMX is
extensible to cater for any output format, it currently supports only
pandas.

Main features:

• intuitive API inspired by requests
• support for many SDMX features, including:
  ◦ generic datasets
  ◦ data structure definitions, codelists and concept schemes
  ◦ dataflow definitions
  ◦ categorisations and category schemes
• pythonic representation of the SDMX information model
• find dataflows by name or description in multiple languages, where available
• read and write local files for offline use
• a writer transforming SDMX generic datasets into multi-indexed pandas DataFrames or Series of observations and attributes
• extensible through custom readers and writers for alternative input and output formats of data and metadata


Dr. Leo | 23 Apr 08:25 2015

How to parse semestrial (half-yearly) period strings such as '2011S1'

Hi,

Some time series from the ECB and other agencies have a six-monthly
(half-yearly) periodicity.

String representations such as '2011S1' and '2011S2' are misinterpreted
by pandas.period_range and friends as seconds. I wonder how best to
work around this. It seems that pandas does not support half-yearly
periods alongside years, quarters, weeks, etc.

It is possible to generate Period objects "manually" and build a
PeriodIndex from them. Period ranges with start, end and freq appear
impossible, though.
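
Until then, one manual workaround is to map each semester string to the Period of its starting month (an approximation, since the resulting period spans a month rather than a half-year):

import pandas as pd

def parse_semester(s):
    # '2011S1' -> Period('2011-01'), '2011S2' -> Period('2011-07')
    year, half = s.split('S')
    month = 1 if half == '1' else 7
    return pd.Period('%s-%02d' % (year, month), freq='M')

idx = pd.PeriodIndex([parse_semester(s) for s in ['2011S1', '2011S2', '2012S1']])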

My personal preference would be to introduce a period token for
semesters such as 'E' or 'X', as the more obvious 'H' (half-year) is
already reserved for hours, and to have pandas read strings like those
shown above properly rather than taking '2011S1' for seconds and
raising an error.

Shall I open an issue? Any other ideas?

Leo


Olof Sjöbergh | 21 Apr 17:08 2015

Pandas groupby: Possible to build a custom grouper with overlapping groups?

Hi,

I have some problems in which I would like to create pandas groupings with overlapping groups, and I am trying to find out whether this is possible. Basically, I would like to build a custom grouper that assigns some items to multiple groups, and then be able to run aggregations on those groups.

This is related to a question posted on Stackoverflow here: http://stackoverflow.com/questions/29032937/aggregate-events-with-start-and-end-times-with-pandas
The data there describes a number of events that each have a start and an end time, and I need to calculate certain aggregations over the items active at different times. A custom grouper that could assign the items to multiple groups would be a nice solution.

I have tried to read through the source in groupby.py, but it's a bit hard to follow in some parts.

Do you think it is possible to build such a custom grouper, or is it not supported?
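
In case a custom grouper turns out not to be supported, a fallback sketch is to expand each event into one row per time bucket it overlaps and then group normally (hourly buckets and the column names are assumptions for the example):

import pandas as pd

events = pd.DataFrame({
    'start': pd.to_datetime(['2015-01-01 00:10', '2015-01-01 00:50']),
    'end':   pd.to_datetime(['2015-01-01 01:40', '2015-01-01 01:05']),
    'value': [10, 20],
})

rows = []
for _, ev in events.iterrows():
    first = ev['start'].replace(minute=0, second=0)   # floor to the hour
    for bucket in pd.date_range(first, ev['end'], freq='H'):
        rows.append({'bucket': bucket, 'value': ev['value']})

expanded = pd.DataFrame(rows)
print(expanded.groupby('bucket')['value'].sum())   # one event can count in several buckets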

Best regards,

Olof Sjöbergh

