Juliana Ullmann | 16 May 01:37 2014

Help with Pandas recursive lookthrough function

hi!

I am quite new to Python and pandas. Could you please help me with a recursive function that performs a hierarchical lookthrough, plus some calculations, on the data dataset below? The hierarchy runs via OBJECT_ID --> ROOT_OBJECT_ID.

import datetime
import numpy as np
import pandas as pd
import pandas.io.data
from pandas import Series, DataFrame

data = {
    'ED': ['12.05.2014','12.05.2014','12.05.2014','12.05.2014','12.05.2014','12.05.2014','12.05.2014'],
    'OBJECT_ID': [32,32,44,44,55,55,55],
    'UNITS': [100,10,221,111,20,2,111],
    'PRICE': [32,4,23,2,100,23,2],
    'ROOT_OBJECT_ID': [0,44,0,55,0,0,0]
}
df = pd.DataFrame.from_dict(data)
df['LOOKTHROUGH'] = False
df['VALUE'] = df.UNITS * df.PRICE
df["WEIGHT"] = df.groupby("OBJECT_ID").VALUE.transform(lambda x: x/x.sum())
df

After the lookthrough I expect additional records (LOOKTHROUGH=True) in my dataset, with the new values calculated like this: the child's weight (and units, value) cells are multiplied by the parent's weight cell. Can you please help me? For example, I am trying to achieve something like this:

            ED  OBJECT_ID  PRICE  ROOT_OBJECT_ID        UNITS  LOOKTHROUGH       VALUE       WEIGHT      PATH
0   12.05.2014         32     32               0          100        False        3200     0.987654        32
1   12.05.2014         32      4              44           10        False          40     0.012346        32
2   12.05.2014         44     23               0          221        False        5083     0.958153        44
3   12.05.2014         44      2              55          111        False         222     0.041847        44
4   12.05.2014         55    100               0           20        False        2000     0.881834        55
5   12.05.2014         55     23               0            2        False          46     0.020282        55
6   12.05.2014         55      2               0          111        False         222     0.097884        55
7   12.05.2014         32     32               0          100         True        3200     0.987654        32
8   12.05.2014         32     23               0      1.66635         True    38.32612     0.011829     32/44
9   12.05.2014         32    100               0      0.01476         True     1.47608     0.000456  32/44/55
10  12.05.2014         32     23               0      0.00148         True     0.03395     0.000010  32/44/55
11  12.05.2014         32      2               0      0.08192         True     0.16385     0.000051  32/44/55
12  12.05.2014         44     23               0          221         True        5083     0.958153        44
13  12.05.2014         44    100               0   1.95767148         True  195.767148  0.036902107     44/55
14  12.05.2014         44     23               0  0.195765391         True    4.502604  0.000848741     44/55
15  12.05.2014         44      2               0    10.865124         True   21.730248  0.004096152     44/55
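A minimal sketch of one way such a recursive lookthrough could be written, assuming that ROOT_OBJECT_ID == 0 marks a leaf holding and that a non-zero ROOT_OBJECT_ID points at the child object to expand. The function name look_through and its inner helper visit are hypothetical, and the rule VALUE = parent total * cumulative weight, UNITS = VALUE / PRICE is inferred from the numbers in the expected table rather than stated in the post.

```python
import pandas as pd

data = {
    "ED": ["12.05.2014"] * 7,
    "OBJECT_ID": [32, 32, 44, 44, 55, 55, 55],
    "UNITS": [100, 10, 221, 111, 20, 2, 111],
    "PRICE": [32, 4, 23, 2, 100, 23, 2],
    "ROOT_OBJECT_ID": [0, 44, 0, 55, 0, 0, 0],
}
df = pd.DataFrame(data)
df["VALUE"] = df["UNITS"] * df["PRICE"]
df["WEIGHT"] = df.groupby("OBJECT_ID")["VALUE"].transform(lambda x: x / x.sum())


def look_through(df, top_id):
    """Expand `top_id` down to leaf holdings (hypothetical helper).

    Assumes ROOT_OBJECT_ID == 0 means "leaf" and any other value names
    the child object whose rows replace the current row.
    """
    total = df.loc[df["OBJECT_ID"] == top_id, "VALUE"].sum()
    rows = []

    def visit(object_id, scale, path):
        if object_id in path:
            raise ValueError("cycle in hierarchy at object %s" % object_id)
        path = path + [object_id]
        for row in df[df["OBJECT_ID"] == object_id].itertuples(index=False):
            weight = scale * row.WEIGHT  # child weight times parent weight
            if row.ROOT_OBJECT_ID == 0:
                value = total * weight
                rows.append({
                    "ED": row.ED,
                    "OBJECT_ID": top_id,
                    "PRICE": row.PRICE,
                    "ROOT_OBJECT_ID": 0,
                    "UNITS": value / row.PRICE,
                    "LOOKTHROUGH": True,
                    "VALUE": value,
                    "WEIGHT": weight,
                    "PATH": "/".join(str(p) for p in path),
                })
            else:
                visit(row.ROOT_OBJECT_ID, weight, path)

    visit(top_id, 1.0, [])
    return pd.DataFrame(rows)


original = df.assign(LOOKTHROUGH=False, PATH=df["OBJECT_ID"].astype(str))
result = pd.concat(
    [original, look_through(df, 32), look_through(df, 44)],
    ignore_index=True,
)
print(result)
```

The weights of each expanded object should still sum to 1, which is a cheap sanity check on the recursion. Object 55 is not expanded here because the expected table lists lookthrough rows only for 32 and 44.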

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
Matthew Brett | 14 May 09:19 2014

AttributeError: 'PandasExprVisitor' test errors on Python 3.4

Hi,

I've been testing the scipy stack on current Pythons, and I'm running into an error on Python 3.4 (with numexpr) that I don't see on Python 2.7 or Python 3.3:

http://nipy.bic.berkeley.edu/builders/scipy-stack-3.4.0-wheel-requires/builds/1/steps/shell_8/logs/stdio

There are a number of errors that look like this:

ERROR: test_scalar_unary (pandas.computation.tests.test_eval.TestEvalNumexprPandas)
----------------------------------------------------------------------
Traceback (most recent call last):
   ...
AttributeError: 'PandasExprVisitor' object has no attribute 'visit_NameConstant'

Is anyone else seeing these errors?

Cheers,

Matthew
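
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: Python 3.4 added the ast.NameConstant node class for True, False and None, so a visitor that dispatches on node class names would need a visit_NameConstant method there. The sketch below only reports which node class the running interpreter produces for those literals; it does not touch pandas, and literal_node_name is a hypothetical helper.

```python
import ast
import sys


def literal_node_name(source):
    """Return the AST class name the running interpreter uses for `source`."""
    tree = ast.parse(source, mode="eval")
    return type(tree.body).__name__


for literal in ("True", "False", "None"):
    # The class name differs between Python versions, so it is printed
    # rather than assumed.
    print(sys.version_info[:2], literal, literal_node_name(literal))
```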

Keith Brown | 14 May 13:08 2014

diff multiple columns

I have a DataFrame like this,

                  date    l   k
0  Tue May 11 17:10:30  319  75
1  Tue May 11 17:11:00  119  65
2  Tue May 11 17:11:30  229  75
3  Tue May 11 17:12:00  239  85
4  Tue May 11 17:12:30  319  45
5  Tue May 11 17:13:00  239  95
6  Tue May 11 17:13:30  319  45


I would like to get the previous difference of both columns, l and k.

I know that df['k'].diff() works, but I don't know how to get the difference for 'l' as well.
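
A minimal sketch, assuming the frame looks like the one above: selecting both columns with a list and calling diff() on the resulting DataFrame gives the row-to-row difference for each column. The names diffs and out are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["Tue May 11 17:10:30", "Tue May 11 17:11:00",
             "Tue May 11 17:11:30", "Tue May 11 17:12:00"],
    "l": [319, 119, 229, 239],
    "k": [75, 65, 75, 85],
})

# DataFrame.diff works column by column; the first row has no
# predecessor, so it is NaN in both columns.
diffs = df[["l", "k"]].diff()

# Optionally attach the differences next to the original columns.
out = df.join(diffs.add_suffix("_diff"))
print(out)
```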


Paul Hobson | 13 May 18:26 2014

Chaining pandas DataFrame operations (js-style)

I was curious if chaining DataFrame operations is considered an inherently bad (or good?) practice. For example:

tidy_errors = nonzero_errors.rename(columns=rename_volumes) \
                            .stack() \
                            .dropna() \
                            .reset_index() \
                            .rename(columns=rename_columns) \
                            .set_index(index_cols)

Obviously this will be harder to debug. But once I've worked out all of the kinks with separate statements, there is something oddly satisfying about building these chained commands.

Should I set my enjoyment aside and break these up into separate statements?
-paul
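
For what it's worth, the same chain can be written inside parentheses instead of with backslash continuations, which lets individual steps be commented out while debugging. This is only a sketch: the small frame and the values of rename_volumes, rename_columns and index_cols below are hypothetical stand-ins, since the post does not show them.

```python
import pandas as pd

# Hypothetical stand-ins for the objects named in the post.
nonzero_errors = pd.DataFrame(
    {"v1": [0.1, None], "v2": [0.3, 0.4]},
    index=pd.Index(["s1", "s2"], name="site"),
)
rename_volumes = {"v1": "small", "v2": "large"}
rename_columns = {"level_1": "volume", 0: "error"}
index_cols = ["site", "volume"]

tidy_errors = (
    nonzero_errors
    .rename(columns=rename_volumes)
    .stack()
    .dropna()
    .reset_index()
    .rename(columns=rename_columns)
    .set_index(index_cols)
)
print(tidy_errors)
```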

Neal Becker | 12 May 15:40 2014

pandas docs loading very slowly today

Is it just me?  All the documentation pages on pandas.pydata.org are taking a very long time to render - chrome says 'waiting for <not sure what url>', while the screen remains black.

Tom Fawcett | 10 May 09:36 2014

*** ValueError: Error parsing datetime string "?" at position 0

I have a simple pandas DataFrame, shown below.  It has some missing values in it which I want to replace with "?".  When I do:
  insts.fillna('?')
I get the error:
*** ValueError: Error parsing datetime string "?" at position 0
None of the datetime fields have missing values, so I'm at a loss as to what's causing this.
Furthermore, when I do:
  insts.fillna(-1)
  insts.replace(-1, '?')
it works fine.  Is this a bug?  It's specifically breaking in ./pandas/core/internals.py(1636)fillna()
Thanks,
-Tom

              Tstart aw_FATMASS_sym_T2 fa_Calories_Burned_sym_T2  \
TIMESTAMP
2011-11-25 2011-11-23               NaN                         C
2011-12-20 2011-12-18                 G                         D
2012-01-16 2012-01-14                 H                         G
2012-01-20 2012-01-18                 G                         G
2012-02-28 2012-02-26                 C                         F

           run_Calories_Burned_sym_T2 w_FATFREEMASS_sym_T2  \
TIMESTAMP
2011-11-25                        NaN                  NaN
2011-12-20                        NaN                    E
2012-01-16                        NaN                    B
2012-01-20                          E                    A
2012-02-28                        NaN                    G

           fa_Minutes_Sedentary_sym_T2 fa_Active_Score_sym_T2  \
TIMESTAMP
2011-11-25                           B                      C
2011-12-20                           B                      E
2012-01-16                           A                      G
2012-01-20                           B                      G
2012-02-28                           A                      F

           run_Distance_sym_T2 aw_WEIGHT_sym_T2 w_WEIGHT_sym_T2  \
TIMESTAMP
2011-11-25                 NaN                G               G
2011-12-20                 NaN                F               F
2012-01-16                 NaN                G               G
2012-01-20                   E                B               B
2012-02-28                 NaN                E               E

           fs_Time_in_Bed_sym_T2 fa_Minutes_Lightly_Active_sym_T2  \
TIMESTAMP
2011-11-25                     D                                F
2011-12-20                     C                                G
2012-01-16                     E                                H
2012-01-20                     B                                D
2012-02-28                     F                                G

           w_FATMASS_sym_T2 fa_Activity_Calories_sym_T2 fa_Distance_sym_T2  \
TIMESTAMP
2011-11-25              NaN                           C                  B
2011-12-20                G                           E                  C
2012-01-16                H                           H                  D
2012-01-20                G                           F                  G
2012-02-28                C                           F                  C

           run_Average_Speed_sym_T2  run_Climb_sym_T2  \
TIMESTAMP
2011-11-25                      NaN               NaN
2011-12-20                      NaN               NaN
2012-01-16                      NaN               NaN
2012-01-20                        E               NaN
2012-02-28                      NaN               NaN

           fs_Minutes_Asleep_sym_T2  run_Average_Heart_Rate_sym_T2  \
TIMESTAMP
2011-11-25                      NaN                            NaN
2011-12-20                        C                            NaN
2012-01-16                        E                            NaN
2012-01-20                        A                            NaN
2012-02-28                        G                            NaN

           fs_Minutes_Awake_sym_T2
TIMESTAMP
2011-11-25                       H ...
2011-12-20                       E ...
2012-01-16                       C ...
2012-01-20                       D ...
2012-02-28                       B ...

[5 rows x 64 columns]
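
A possible workaround, sketched under the assumption that only the non-datetime columns need the "?" placeholder: restrict fillna to those columns so the datetime column is never asked to hold a string. The three-column frame below is a cut-down, hypothetical stand-in for insts, and filled is a name introduced here. Note also that fillna returns a new object unless inplace=True is passed.

```python
import numpy as np
import pandas as pd

# Cut-down, hypothetical stand-in for the 64-column frame in the post.
insts = pd.DataFrame(
    {
        "Tstart": pd.to_datetime(["2011-11-23", "2011-12-18", "2012-01-14"]),
        "aw_FATMASS_sym_T2": [np.nan, "G", "H"],
        "fa_Calories_Burned_sym_T2": ["C", "D", "G"],
    },
    index=pd.to_datetime(["2011-11-25", "2011-12-20", "2012-01-16"]),
)
insts.index.name = "TIMESTAMP"

# Fill only the columns that are not datetime-typed.
non_datetime = insts.select_dtypes(exclude=["datetime"]).columns
filled = insts.copy()
filled[non_datetime] = filled[non_datetime].fillna("?")
print(filled)
```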

Rishabh SHARMA | 9 May 16:12 2014

dataframe plotting function


hi

I see that a plotted DataFrame shows the values under the cursor when I hover over the plot.

https://cloud.githubusercontent.com/assets/1042961/2793219/39b9dc24-cbdd-11e3-9606-d016368caed4.png

I am trying to increase the font size of the region marked in the image. Passing the fontsize parameter increases the size of the labels, but not of this text.

Thanks
Rishabh Sharma

PS: This is actually a sunpy.lightcurve, which is just a container for a pandas DataFrame, i.e. lightcurve.data is a DataFrame instance, and lightcurve.data.plot produces the plot above.

Sri K | 8 May 21:09 2014

Re: apply, applymap, map, still confused

Can I apply applymap or apply on elements of only one column while ignoring the rest? I also want the data applied so that the original dataframe is changed. Is this possible?
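
A minimal sketch of one way to do this: select the single column, transform it with Series.map, and assign the result back to the same column, which changes the original frame and leaves the other columns alone. The frame and the lambda are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Series.map applies the function element by element; assigning back
# to df["A"] overwrites that column in the original DataFrame.
df["A"] = df["A"].map(lambda x: x + 1)
print(df)
```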

On Wednesday, March 21, 2012 8:25:23 AM UTC-7, Aman wrote:
Hi Mihai,

I've whipped up a quick example that should clarify things.

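import numpy as np
import pandas

df = pandas.DataFrame(np.random.rand(10,3),columns=['A','B','C'])
print(df)

# applymap works element by element on the whole frame
print("add 1 to every element")
print(df.applymap(lambda x: x+1))

# axis=0 passes each column; x[3] is the value in the row labelled 3
print("add 2 to row 3 and return the series")
print(df.apply(lambda x: x[3]+2,axis=0))

# axis=1 passes each row; x['A'] is that row's value in column A
print("add 3 to col A and return the series")
print(df.apply(lambda x: x['A']+3,axis=1))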


On Wed, Mar 21, 2012 at 9:52 AM, Mihai Ionescu <if.m...-/E1597aS9LQAvxtiuMwx3w@public.gmane.org> wrote:
as a beginner,
i'm still confused with these methods names,

can we find some more descriptive names for these?
applymap is the most confusing,
changing this method's name may be enough

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To view this discussion on the web visit https://groups.google.com/d/msg/pydata/-/8bVhwF5fO1wJ.
To post to this group, send email to pyd...-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
To unsubscribe from this group, send email to pydata+un... <at> googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pydata?hl=en.

Brendan Barnwell | 8 May 07:59 2014

Equivalent of Series.map for DataFrame

I asked this question a while ago on StackOverflow but received no fully satisfying reply (http://stackoverflow.com/questions/22293683/equivalent-of-series-map-for-dataframe).

The question is, given a DataFrame with some columns, how can I get the equivalent of Series.map, to map into a Series with a MultiIndex?  That is, if the data in the DataFrame is equivalent to

[
  ["A", 1],
  ["B", 2],
  ["C", 3]
]

And I have a series that has a MultiIndex where ("A", 1), ("B", 2), and ("C", 3) are valid indices, what is the way to do myDataFrame.map(mySeries), to grab the elements from the series whose indices correspond to the data in the DataFrame?

In the question there I noted a few ways of doing this, such as converting the DataFrame to lists of lists or row-wise applying a lambda that grabs the elements from the Series.  Andy Hayden suggested another possibility, namely making a MultiIndex from the DataFrame.  However, these all seem unnecessarily cumbersome.  Shouldn't there be a one-shot way to map multiple DataFrame columns *directly* into a MultiIndexed object?

Incidentally, the fastest solution I found at that time involved using df.apply(tuple, axis=1).  However, this no longer seems to work, because apparently pandas now expands tuple results into multiple columns.  Was this change documented?  Why was it made?
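
For reference, the MultiIndex route mentioned above can be fairly compact. This is a sketch rather than a built-in one-shot method: it assumes a pandas version that provides MultiIndex.from_frame, and my_frame, my_series, keys and mapped are illustrative names.

```python
import pandas as pd

my_frame = pd.DataFrame({"letter": ["A", "B", "C"], "number": [1, 2, 3]})
my_series = pd.Series(
    [10.0, 20.0, 30.0, 40.0],
    index=pd.MultiIndex.from_tuples([("A", 1), ("B", 2), ("C", 3), ("D", 4)]),
)

# Turn each DataFrame row into a MultiIndex key, look the keys up in the
# Series, then re-attach the DataFrame's own index to the result.
keys = pd.MultiIndex.from_frame(my_frame)
mapped = pd.Series(my_series.reindex(keys).to_numpy(), index=my_frame.index)
print(mapped)
```

Rows whose key is missing from the Series come back as NaN, which mirrors how Series.map treats unmatched values.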

Matthew Brett | 4 May 04:22 2014

Some test failures for OSX, can I safely ignore them?

Hi,

I have been experimenting with building a scipy stack installation using OSX wheels, but I've run into some errors testing pandas, here:

http://nipy.bic.berkeley.edu/builders/scipy-stack-2.7.6-wheel-staging/builds/1/steps/shell_16/logs/stdio

I just wanted to confirm with you that I can safely ignore these:

ERROR: test_round_trip_frame (pandas.io.tests.test_clipboard.TestClipboard)
ERROR: test_round_trip_frame_sep (pandas.io.tests.test_clipboard.TestClipboard)
ERROR: test_round_trip_frame_string (pandas.io.tests.test_clipboard.TestClipboard)

as being related to https://github.com/pydata/pandas/issues/6317

and these:

FAIL: test_fred_multi (pandas.io.tests.test_data.TestFred)
-> AssertionError: expected 217.47800 but got 217.46600
FAIL: test_fred_parts (pandas.io.tests.test_data.TestFred)
-> AssertionError: 217.29900000000001 != 217.23

because of this explanation:

http://stackoverflow.com/questions/22672409/nosetests-fails-with-pandas

Thanks a lot,

Matthew


Vigen Isayan | 24 Apr 23:18 2014

can not import pandas in Python 2.7.3

Hi all,

I installed pandas 0.13.1 for my Python 2.7.3.
When I try to import pandas in Python 2.7.3, I get the following error messages:

>>> import pandas
cannot import name hashtable
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\__init__.py", line 6, in <module>
    from . import hashtable, tslib, lib
ImportError: cannot import name hashtable

By the way, I also have IPython installed, and when I import the pandas that came with the IPython installation from within IPython, I don't have that problem.

thanks in advance
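
Since the import works in one console and fails in the other, it may help to compare what each interpreter actually sees. The sketch below is a generic diagnostic and not a fix: describe_environment is a hypothetical helper, and importlib.util.find_spec needs Python 3.4 or later, so on Python 2.7 one would print sys.executable and sys.path instead.

```python
import importlib.util
import sys


def describe_environment(package="pandas"):
    """Report which interpreter is running and where `package` would load from."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "package_location": spec.origin if spec is not None else None,
    }


print(describe_environment())
```

Running this in both consoles and comparing the output would show whether they use the same interpreter and the same pandas installation.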

