Julie Rozenberg | 6 Aug 12:27 2014

Problem with append

I'm working with household surveys from developing countries. I run a model based on the surveys and I store the results in a .h5 file using append.

It usually works well (i.e. I can append the results from different countries) but with a small number of countries I get the following error when appending:
ValueError: cannot match existing table structure for [data1,data2,...] on appending data

The error happens with the model results but I've realized it also happens with some columns of the underlying raw data. And it happens also when I keep only a small number of rows (see attached example).

I have checked the type of each column and they are similar, and I have used convert_objects(convert_numeric=True) but it didn't help.

Any idea where this comes from?
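This error typically means the frame being appended doesn't match the schema that HDFStore fixed when the table was first written, most often because a column's dtype differs between countries (e.g. float64 in one CSV, int64 in another). A minimal sketch of how to diagnose and fix it (the frames and names here are made up):

```python
import pandas as pd

# two country frames whose columns look identical but whose dtypes differ
df_a = pd.DataFrame({'data1': [1.0, 2.0], 'data2': ['a', 'b']})   # data1: float64
df_b = pd.DataFrame({'data1': [3, 4],     'data2': ['c', 'd']})   # data1: int64

# diff the dtypes to find the offending columns
mismatch = df_a.dtypes[df_a.dtypes != df_b.dtypes]   # shows data1 (float64, the stored dtype)

# cast the later frame to the stored schema before appending
df_b_fixed = df_b.astype(df_a.dtypes.to_dict())
```

Also watch for object columns holding a mix of strings and numbers, which convert_objects won't always normalize the same way for every country.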

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
Attachment (AFG_GIDD_head.csv): text/csv, 2202 bytes
Attachment (BGD_GIDD_head.csv): text/csv, 1081 bytes
Attachment (debug.py): text/x-python, 562 bytes
'Michael' via PyData | 6 Aug 11:12 2014

shortcut for list of columns

Whenever I need a list of columns (or of strings in general),
I find myself writing:

'col1 col2 col3 col4 col5'.split()

instead of

['col1','col2','col3','col4','col5']

It's quite addictive

So I was wondering whether this would be worth adding to pandas,
to have something like this:

df.ix[ : , 'col1 col2 col3 col4 col5']

What do you think?
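For what it's worth, the split trick already composes with plain indexing, so a one-line helper (the name is made up) covers most of this without any new pandas API:

```python
import pandas as pd

def cols(spec):
    # turn a space-separated column spec into a list of names
    return spec.split()

df = pd.DataFrame({'col1': [1], 'col2': [2], 'col3': [3]})
sub = df[cols('col1 col3')]   # same as df[['col1', 'col3']]
```

Passing `'col1 col3'.split()` directly to `df[...]` or `df.loc[:, ...]` works just as well; the helper only saves the `.split()` call.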
 

Tejesh Papineni | 31 Jul 03:45 2014

pandas -- datetime index and labeled columns

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

df
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

[6 rows x 4 columns]

df.A gives

2013-01-01   -0.467582
2013-01-02    0.289209
2013-01-03    0.489941
2013-01-04    0.071964
2013-01-05   -0.697187
2013-01-06   -1.073856

but I need only column A without the date index, like this:

-0.467582 0.289209 0.489941 0.071964 -0.697187 -1.073856
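Dropping the index amounts to asking for the raw values; either of these should do the trick:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

values = df['A'].values                      # a plain numpy array, no index at all
as_series = df['A'].reset_index(drop=True)   # still a Series, with a default 0..5 index
```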

Hernan Avella | 29 Jul 18:26 2014

Pandas df.at_time NotImplementedError

Hi, I have the following indexed df

Out[9]: 
                                    Open     High      Low       Close
DateTime                                               
2014-04-25 17:06:00  1860.00  1860.25  1859.75  1860.00
2014-04-25 17:07:00  1860.00  1860.25  1859.75  1860.00
2014-04-25 17:08:00  1860.00  1860.00  1859.75  1860.00
2014-04-25 17:09:00  1860.00  1860.00  1859.75  1859.75
2014-04-25 17:10:00  1859.75  1860.00  1859.75  1859.75
2014-04-25 17:11:00  1860.00  1860.25  1859.75  1860.25
2014-04-25 17:12:00  1860.00  1860.25  1860.00  1860.00
2014-04-25 17:13:00  1860.25  1860.25  1860.00  1860.25
2014-04-25 17:14:00  1860.00  1860.25  1860.00  1860.25
2014-04-25 17:15:00  1860.00  1860.75  1860.00  1860.25

I'm trying to subset the rows at a certain time. I found this procedure and it works well for what I want:

df1 = df.at_time(time(15, 0), asof=False)

However, since my data might have holes and the specified time might sometimes not exist, I would like to get the closest one. I read about "asof", which seems to do what I want; however, when I do this

df1 = df.at_time(time(15, 0), asof=True)

I get a big error (see below). Is there something I can do to make this work?

Thanks

Hernan
Python 2.7.6, Pandas 2.0, Enthought Canopy 1.4
------------------------------------------------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-28-3c10b7e963c9> in <module>()
----> 1 df2 = df1.at_time(time(15, 0), asof=True)

C:\Users\Hernan\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.pyc in at_time(self, time, asof)
   2767         """
   2768         try:
-> 2769             indexer = self.index.indexer_at_time(time, asof=asof)
   2770             return self.take(indexer, convert=False)
   2771         except AttributeError:

C:\Users\Hernan\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tseries\index.pyc in indexer_at_time(self, time, asof)
   1689 
   1690         if asof:
-> 1691             raise NotImplementedError
   1692 
   1693         if isinstance(time, compat.string_types):

NotImplementedError: 
-----------------------------------------------------------------------------------------------------------------------------------
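As the traceback shows, `asof=True` simply isn't implemented in `indexer_at_time`. One workaround sketch (the timestamps and the hole below are made up): reindex against the exact timestamps you want with `method='pad'`, which falls back to the most recent earlier row when the exact time is missing:

```python
import pandas as pd

idx = pd.to_datetime(['2014-04-25 17:06', '2014-04-25 17:07', '2014-04-25 17:10'])
df = pd.DataFrame({'Close': [1860.00, 1860.25, 1859.75]}, index=idx)

# 17:08 is a hole in the data; 'pad' falls back to the 17:07 row
target = pd.Timestamp('2014-04-25 17:08')
row = df.reindex([target], method='pad')
```

To mimic `at_time(..., asof=True)` across many days, you would build one target timestamp per date and reindex against that list; `method='pad'` requires the index to be sorted.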

Paul Hobson | 29 Jul 17:33 2014

Weird groupby behavior

Hey gang,

I was trying to help out the asker of a question on SO. As a result, I'm seeing weird behavior with groupby. The example below demonstrates it:

import pandas as pd

x = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best','d', 'y'],
    ['worst','d', 'y'],
    ['worst','d', 'y'],
    ['best','d', 'z'],
], columns=['a', 'b', 'c'])
groups = x.groupby(by='c')
print(groups.get_group('y'))
#        a  b  c
# 1  worst  b  y
# 3   best  d  y   # <-- shouldn't this appear below?
# 4  worst  d  y
# 5  worst  d  y

print(groups.filter(lambda g: g['a'] == 'best'))
#       a  b  c
# 0  best  a  x
# 2  best  c  x
# 6  best  d  z
Basically I'm curious why a result for the "y" group doesn't appear after that filter operation. Any insight would be most appreciated.
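One thing worth noting: `filter` keeps or drops whole groups based on a single truth value per group, but `g['a'] == 'best'` is a Series rather than a scalar, so how it gets coerced is ambiguous. Making the reduction explicit states the intent either way (a sketch of both readings, using the same frame as above):

```python
import pandas as pd

x = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['best', 'd', 'z'],
], columns=['a', 'b', 'c'])

# keep groups containing AT LEAST ONE 'best' row -> groups x, y, z
any_best = x.groupby('c').filter(lambda g: (g['a'] == 'best').any())

# keep groups where EVERY row is 'best' -> groups x, z only
all_best = x.groupby('c').filter(lambda g: (g['a'] == 'best').all())
```

With `.any()`, the rows from group "y" come back, since that group does contain a "best" row.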
-paul

Keith Brown | 29 Jul 12:34 2014

multiprocessing

I have a function which returns a pandas.DataFrame after it does some calculations.

def f1(colList, verbose=False):
    df = pandas.DataFrame(np.random.randn(100, 4), columns=colList)
    if verbose:
        print "work done"
    return df

dataSet = []
for i in open('set'):
    dataSet.append(i)

Now I would like to run f1 in parallel with multiprocessing. What is the best way to do this?
dataSet can have thousands of entries; that's why I want to do it in parallel.





cloverethos | 28 Jul 18:33 2014

dplyr style data manipulation

Hi guys,

I am relatively new to Python and pandas and
really think they are a great combination for data manipulation.
But I really miss the dplyr package in R; it allows us to write:

> mutate(df,c = a + b)

rather than in pandas

> df["c"] = df["a"] + df["b"]

Of course this is an over-simplified example, but my point is that dplyr
gives us a more concise way to express the data manipulation
we have in mind.

So I wish to implement something similar in Python. I have hacked up
an extremely naive implementation of mutate in this
IPython notebook [ http://nbviewer.ipython.org/gist/catethos/ab979acf3f7c818828a1 ].
Hope someone can come up with a better idea.
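For reference, a minimal sketch of such a mutate (the helper is hypothetical; it takes callables so each new column is computed against the frame it's being added to):

```python
import pandas as pd

def mutate(df, **kwargs):
    # return a copy of df with new columns computed from callables
    out = df.copy()
    for name, func in kwargs.items():
        out[name] = func(out)
    return out

df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})
df2 = mutate(df, c=lambda d: d['a'] + d['b'])
```

As it happens, later pandas versions grew `DataFrame.assign`, which works in essentially this way.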

Regards,
Caleb


Damian Avila | 24 Jul 23:06 2014

ANN: Bokeh 0.5.1 released

On behalf of the Bokeh team, I am very happy to announce the release of Bokeh version 0.5.1! (http://continuum.io/blog/bokeh-0.5.1)

Bokeh is a Python library for visualizing large and real-time datasets on the web.

This release includes many bug fixes and improvements over our recent 0.5 release:

  * Hover activated by default
  * Boxplot in bokeh.charts
  * Better messages when you forget to start the bokeh-server
  * Fixed some packaging bugs
  * Fixed NBviewer rendering
  * Fixed some UnicodeEncodeError issues

See the CHANGELOG for full details.

In upcoming releases, you should expect to see dynamic, data-driven layouts (including ggplot-style auto-faceting), as well as R language bindings, more statistical plot types in bokeh.charts, and cloud hosting for Bokeh apps.

Don't forget to check out the full documentation, interactive gallery, and tutorial at

    http://bokeh.pydata.org

as well as the new Bokeh IPython notebook nbviewer index (including all the tutorials) at:

    http://nbviewer.ipython.org/github/ContinuumIO/bokeh-notebooks/blob/master/index.ipynb

If you are using Anaconda, you can install with conda:

    conda install bokeh

Alternatively, you can install with pip:

    pip install bokeh

BokehJS is also available by CDN for use in standalone JavaScript applications:

    http://cdn.pydata.org/bokeh-0.5.1.min.js
    http://cdn.pydata.org/bokeh-0.5.1.min.css

Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/continuumio/bokeh

Questions can be directed to the Bokeh mailing list: bokeh@continuum.io

If you have interest in helping to develop Bokeh, please get involved!

Damián.

Jaidev Deshpande | 24 Jul 15:54 2014

Cross tabulation output as scipy.sparse matrices.

Hi,

Apologies if this is not the right forum for this post.

I wonder if it would be useful to retrieve the output of a crosstab operation as a sparse matrix. (Something along the lines of the xtabs function in R, which has an argument to optionally make crosstabs sparse.)

I'm working on a problem where I have two Series objects, and the number of unique items in one is nearly five times that of the other. Even the shorter one has on the order of ~80k unique elements. A crosstab operation on the two Series unsurprisingly causes a MemoryError, but a scipy.sparse.csr_matrix seems to work fine.

I have a crude hack to make the crosstab function return a sparse matrix. It's quite slow, but doesn't raise MemoryErrors. Does this seem like something Pandas could use?
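One way such a sparse crosstab can be sketched (this is not the poster's hack, just an illustration): factorize both Series to integer codes and build a COO matrix, whose duplicate entries are summed into counts on conversion:

```python
import numpy as np
import pandas as pd
from scipy import sparse

def sparse_crosstab(s1, s2):
    # factorize both Series to integer codes
    r, r_labels = pd.factorize(s1)
    c, c_labels = pd.factorize(s2)
    # one entry per observation; duplicate (row, col) pairs are
    # summed during tocsr(), which yields the contingency counts
    data = np.ones(len(s1), dtype=np.int64)
    mat = sparse.coo_matrix((data, (r, c)),
                            shape=(len(r_labels), len(c_labels))).tocsr()
    return mat, r_labels, c_labels

s1 = pd.Series(['a', 'b', 'a'])
s2 = pd.Series(['x', 'x', 'y'])
mat, rows, cols = sparse_crosstab(s1, s2)
```

This never materializes the dense table, so memory scales with the number of observations rather than with the product of the two cardinalities.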

Thanks

Chao Yue | 24 Jul 15:13 2014

Series gets automatically cut when assigned to a DataFrame with a shorter index range

Dear all,

I found that in version 0.14.0, when a Series is assigned as a column to an existing DataFrame with a shorter index range,
it gets automatically cut (which was not the behaviour before). Is this a designed feature? An example:

In [39]: pa.__version__
Out[39]: '0.14.0'

In [44]: dft = pa.DataFrame({'colA': np.arange(36.)}, index=pa.date_range('19900101', '19921231', freq='M'))
    ...: stemp = pa.Series(np.arange(48), index=pa.date_range('19890101', '19921231', freq='M'))
    ...: dft['colC'] = stemp

In [45]: dft.head()
Out[45]:
            colA  colC
1990-01-31     0    12
1990-02-28     1    13
1990-03-31     2    14
1990-04-30     3    15
1990-05-31     4    16

But the reverse is fine:

In [46]: dft = pa.DataFrame({'colC': stemp})
    ...: dft['colA'] = pa.Series(np.arange(36), index=pa.date_range('19900101', '19921231', freq='M'))
    ...: dft.head()
Out[46]:
            colC  colA
1989-01-31     0   NaN
1989-02-28     1   NaN
1989-03-31     2   NaN
1989-04-30     3   NaN
1989-05-31     4   NaN
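The behavior can be sketched with plain integer labels (a made-up miniature of the same situation): column assignment aligns the Series on the frame's existing index, so labels outside that index are dropped; if the longer index should win, reindex the frame first:

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame({'colA': np.arange(3.)}, index=[10, 11, 12])
stemp = pd.Series(np.arange(5), index=[8, 9, 10, 11, 12])

dft['colC'] = stemp   # aligns on dft's index: labels 8 and 9 are dropped

# if the longer index should win, grow the frame before assigning
dft2 = dft.reindex(dft.index.union(stemp.index))
dft2['colC'] = stemp
```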

Thanks a lot to all for this wonderful tool.

Best,

Chao

Dave Hirschfeld | 24 Jul 12:49 2014

DatetimeIndex equality comparison raises exception

Testing a DatetimeIndex for equality with an empty tuple raises a TypeError, which seems like a bug to me.
If the types aren't equal, the equality comparison should simply return False, IMHO.

I've submitted a bug report at https://github.com/pydata/pandas/issues/7830

Thanks,
Dave

In [114]: def test_datetimeindex__eq__():
     ...:     """Equality comparisons should never raise an exception"""
     ...:     pd.DatetimeIndex(['01-Jan-2015']) == ()
     ...:

In [115]: test_datetimeindex__eq__()

Traceback (most recent call last):
  File "<ipython-input-115-9b7967ca9de2>", line 1, in <module>
    test_datetimeindex__eq__()
  File "<ipython-input-114-7ae5783187b8>", line 3, in test_datetimeindex__eq__
    pd.DatetimeIndex(['01-Jan-2015']) == ()
  File "C:\dev\bin\Anaconda\lib\site-packages\pandas\tseries\index.py", line 90, in wrapper
    other = _ensure_datetime64(other)
  File "C:\dev\bin\Anaconda\lib\site-packages\pandas\tseries\index.py", line 112, in _ensure_datetime64
    raise TypeError('%s type object %s' % (type(other), str(other)))


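The behavior the report asks for amounts to a guard like this, pushed into the comparison machinery (a toy sketch of the desired contract, not actual pandas code):

```python
import pandas as pd

def safe_eq(a, b):
    # equality that degrades to False instead of raising on incompatible types
    try:
        result = a == b
    except (TypeError, ValueError):
        return False
    return result

idx = pd.DatetimeIndex(['01-Jan-2015'])
ok = safe_eq(idx, ())   # False, instead of an exception propagating
```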

