GV | 4 Dec 20:32 2014

fillna() downcast dictionary use

Hi nice people, 

For whatever reason I'm converting NaNs to 0s, and I need to downcast all elements in a pandas DataFrame. The fillna() method looks very handy for this, but I can't figure out how to actually make it work.

Let's say I have this dataframe:

df.dtypes 

018dcc90-b1a2-400a-a96e-64b6a101f009      int32
22d8b18e-d980-4833-ac5c-3a81c5675c8f    float64
2ca0b577-5e8e-4dc1-8ea6-600468e725b5    float64
3648db01-b29d-4ab9-835c-83f6a5068fe4    float64
83d91898-7763-47d7-b03b-b92132375c47      int32
92756526-af59-461a-9664-8d8a45da5e7b    float64
e0173ac9-387d-4e8a-a521-3e0f4cc20150    float64
e8e8538f-859b-49e1-b4fe-fefb478e355e    float64
dtype: object

The float64 dtypes come from the NaN values. I want those NaNs to become 0 and the columns' dtype to become int32, without using 'infer'. Following the fillna() documentation, I tried:

df.fillna(0, downcast=int)

but 

ValueError: downcast must have a dictionary or 'infer' as its argument

was returned, and so I tried

df.fillna(0, inplace=True, downcast={int:'int32'})

but got

AssertionError: dtypes as dict is not supported yet
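
A two-step workaround does seem to do the job, filling first and then casting explicitly (a rough, untested sketch; it assumes every column is numeric, so a frame-wide cast is safe), but I'd still like to know how to do it through the downcast argument itself:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1.0, np.nan, 3.0]})  # toy stand-in for my frame

# fill the missing values first, then cast the whole frame in a second step
df_int32 = df.fillna(0).astype('int32')
print(df_int32.dtypes)  # both columns are now int32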


Any help is very appreciated.

DavidT | 3 Dec 17:29 2014

Bucket by week and count a condition being true or false?

Hi All,

I'd first like to preface this by saying that I am new to pandas, but I have been doing Python development for 7+ years.

I have data that looks like this (there are lots of other columns as well, but I've removed them for brevity):

timestamp                     value
2014-11-25 10:09:48.194645    -0.012545
2014-11-25 12:30:15.992324    0.000369
2014-11-25 12:30:15.992457    0.000369
2014-11-25 12:30:15.994859    0.000369
2014-11-25 12:30:16.232273    -0.000897
2014-11-25 12:30:16.233283    0.001192
2014-11-25 12:30:16.496482    0.001078
2014-11-25 12:30:16.502040    0.000404
2014-11-25 10:41:23.086723    0.000354
2014-11-25 10:41:23.087231    0.000700
2014-11-25 10:58:01.237923    0.000606
2014-11-25 11:04:48.524745    0.000297
2014-11-25 12:02:46.715844    -0.021362

With this data, I want to bucket the timestamps by some grouping (e.g. hour, day, week, month) and, in each grouping, count how many times a value is less than zero and how many times a value is greater than or equal to zero. With this, I want to plot the data as a stacked bar chart, both as an outright count and as a percentage of the overall occurrences in that time frame.

I've been playing around with this for hours and I can't seem to figure out how to accomplish this.  Can someone help here?  Any help is greatly appreciated.
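
For reference, the closest I've come is something along these lines (an untested sketch; it assumes df holds the data above with the timestamps as the index and 'value' as the column of interest):

# group by calendar week and by the sign of 'value', then count occurrences of each
negative = df['value'] < 0
counts = df.groupby([df.index.to_period('W'), negative]).size().unstack().fillna(0)
counts.columns = ['value >= 0', 'value < 0']       # False column first, True column second

counts.plot(kind='bar', stacked=True)              # outright counts per week

shares = counts.div(counts.sum(axis=1), axis=0)    # share of each week's total
shares.plot(kind='bar', stacked=True)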

Thanks,
David

Joris Van den Bossche | 2 Dec 23:39 2014

To discuss: move online data reader subpackages to separate project outside pandas?

Hi list,

There is an issue open on GitHub about the question in the title, to discuss whether we should move the functionality for reading online data sources into a separate package:

https://github.com/pydata/pandas/issues/8961

Some reasons why we would want to move it:
  • This code often needs much more urgent releases than pandas itself, to keep up with changes in the remote sources. With a separate release cycle, bug fixes can reach users faster.
  • More generally: to limit the scope of pandas as the core project, and to encourage some people to take ownership of this subproject (which is easier as a separate project?), publicize it separately, etc.
So if you are interested in that discussion, or would like to contribute to such a project, certainly head over to the GitHub issue or comment here on the mailing list!

Regards,
Joris

Klamp | 2 Dec 22:01 2014

Improved support for N-dimensional data in pandas with Dynd?

Is improved N-dimensional data support with Dynd on the roadmap? Maybe something like xray? See http://xray.readthedocs.org/en/latest/faq.html

teramind | 2 Dec 20:07 2014

Selection by value of column

Hi,

I have a DataFrame df and would like to get all rows that fulfill a certain condition, e.g. 'State ID' == 4.
I looked up the http://pandas.pydata.org/pandas-docs/dev/indexing.html page to find a way, but I could not find anything that worked for me.

df=pd.DataFrame({'State ID':[1,2,3,4,5,4],'Value':[6,5,4,3,2,1]})

df.where(df['State ID']==4,df)
df2[{'State.ID':[4]}]                 
df.query('State ID== 4')

Apparently, query cannot handle column names with blanks or dots.

If I change the name to e.g. StateID I can use the query method.

Are there better ways to do such selections quickly?
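
For example, is plain boolean indexing (which does seem to cope with blanks in column names) the intended approach here, or is there something faster? A quick sketch of what I mean:

import pandas as pd

df = pd.DataFrame({'State ID': [1, 2, 3, 4, 5, 4], 'Value': [6, 5, 4, 3, 2, 1]})

# a boolean mask works regardless of blanks or dots in the column name
print(df[df['State ID'] == 4])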

regards

Skipper Seabold | 2 Dec 19:10 2014

Statsmodels 0.6.1 Release

Hi All,

We have just released 0.6.1. This is a bugfix release. All users are
encouraged to upgrade.

Please report any problems.

The full list of the issues closed:

http://statsmodels.sourceforge.net/stable/release/github-stats-0.6.html#issues-closed-in-0-6-1

The release notes for the 0.6 cycle:

http://statsmodels.sourceforge.net/stable/release/version0.6.html

Cheers,

Skipper


Daniel Cordeiro | 2 Dec 18:57 2014

CSV with variable number of columns

Hi all,

I want to read (from sys.stdin) a bunch of lines in CSV format that may have a varying number of columns.
I have some lines with 11 columns and some with 18. I only need to read the first 6 columns, so this should not be a problem. Unfortunately, I always get the error message:

ValueError: Expected 11 fields in line 776483, saw 18

Since the data comes from sys.stdin, I cannot make any assumptions about the number of columns or the order of lines.
I find it odd that I get this kind of error even when passing usecols=range(0, 6). Should I report this as a bug?

Before I start to write my own parser, do you know if there is a way to use read_csv() in my context?
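
One workaround I'm considering is to declare the widest possible layout up front, so the parser does not infer the column count from the first line; shorter rows should then just be padded with NaN. A rough, untested sketch (the column names are placeholders of my own):

import sys
import pandas as pd

# 18 is the maximum number of columns; keep only the first 6
names = ['col%d' % i for i in range(18)]
df = pd.read_csv(sys.stdin, header=None, names=names, usecols=names[:6])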


Thanks much,
Daniel

Paul Blelloch | 30 Nov 03:43 2014

Convert NaN string to null

I'm probably missing something obvious here, but I'm creating a DataFrame from a dict that contains a number of 'NaN' strings for missing data. pandas seems to be interpreting these literally as strings. How do I convert them to nulls so that I can use methods such as dropna to deal with them?
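
Something like the sketch below, replacing the literal 'NaN' strings with real missing values, is what I'm imagining, if that is indeed the right approach:

import numpy as np
import pandas as pd

data = {'a': [1, 'NaN', 3], 'b': ['NaN', 5, 6]}  # toy stand-in for my dict
df = pd.DataFrame(data)

# turn the 'NaN' strings into real missing values, then dropna behaves as expected
df = df.replace('NaN', np.nan)
print(df.dropna())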

Joshua Horowitz | 29 Nov 03:14 2014

Introducing pandas-ply, for elegant data-manipulation syntax

pandas is generally quite good at functional, chainable data
manipulation, but I was frustrated with a few ugly gaps, so I built
some dplyr- / SQL-inspired syntactic sugar.

Instead of

  my_awesome_data_frame[(my_awesome_data_frame.month == 1) &
                        (my_awesome_data_frame.day == 1)]

just write

  my_awesome_data_frame.ply_where(X.month == 1, X.day == 1)

!

Instead of

  my_augmented_data_frame = my_awesome_data_frame[:]
  my_augmented_data_frame['gain'] = (my_awesome_data_frame.arr_delay -
                                     my_awesome_data_frame.dep_delay)
  my_augmented_data_frame['speed'] = (my_awesome_data_frame.distance /
                                      my_awesome_data_frame.air_time * 60)

just write

  my_augmented_data_frame = my_awesome_data_frame.ply_select('*',
      gain = X.arr_delay - X.dep_delay,
      speed = X.distance / X.air_time * 60)

!

pandas-ply is available with a simple `pip install pandas-ply`, and
the code lives at https://github.com/coursera/pandas-ply.
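
To get the X expressions above working, the setup is roughly the following (a quick sketch; the README has the authoritative version):

import pandas as pd
from pandas_ply import install_ply, X

install_ply(pd)   # adds the ply_* methods to pandas objects

df = pd.DataFrame({'month': [1, 1, 2], 'day': [1, 5, 1]})
print(df.ply_where(X.month == 1, X.day == 1))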

I have found it to be useful in my work -- I hope it helps you with yours!

Please send any ideas / issues / comments / criticisms / etc. my way.

Cheers & thanks,
  Josh


Nick Eubank | 28 Nov 22:16 2014

Updates on Na / NaN / Missing Support for Int64?

Hi All!

I'm curious whether there is any timeline for when NA / NaN support might arrive for int dtypes. (I see from the docs that pandas is basically waiting on NumPy, but I can't find anything about a NumPy timeline, or even whether they plan to add it.)

(I'm sure this has been asked and answered many times, but what I can find seems very out of date.)

I'm strongly considering migrating from R/Stata/Matlab to pandas/NumPy, but the inability to handle NA in int types seems like a big limitation for someone like me (a social scientist) who is always working with dirty data. Just curious when support might land!
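
For anyone following along, this is the behaviour I mean: introducing a single missing value silently upcasts an integer Series to float64. A quick sketch:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])   # dtype: int64
s.iloc[1] = np.nan         # one missing value...
print(s.dtype)             # ...and the whole Series is now float64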

Thanks!

Nick

Sudheer Joseph | 28 Nov 12:56 2014

To make code which allows a user to enter data on a monthly basis, which gets stored in a pandas data frame

 

I have the pandas DataFrame object linked below. My objective is to use it as a database and to enter data for each month from the command prompt, so that the new data entered via the command prompt gets saved into the DataFrame variable. For example, if an operator wants to enter November data, it should get stored in the variable df.NOV. I am presently using an if/elif chain to select the variable, but is there a way to use a pointer to select the required month directly in the DataFrame? For example, I have a string variable CM = "NOV" in the code. Is there a way to use df.CM to tell Python that it has to enter data for November?

The data structure can be seen at the first link below and the code at the second. Please help.

Data: https://drive.google.com/file/d/0B3heUQNme7G5UFYxR3Q5MDVNRUU/view?usp=sharing

Code: https://drive.google.com/file/d/0B3heUQNme7G5ekR0bVNaQjYtRUk/view?usp=sharing
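
In other words, is something like the sketch below the right way to avoid the if/elif chain, using the month string to pick the column directly?

import pandas as pd

# toy stand-in for the real monthly frame: one column per month
df = pd.DataFrame(0.0, index=range(3), columns=['OCT', 'NOV', 'DEC'])

CM = 'NOV'            # e.g. read from the command prompt
df.loc[1, CM] = 42.0  # df[CM] / df.loc[..., CM] select the column named by the string,
                      # so no if/elif over month names is needed
print(df)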

