Bill Blount | 8 Jan 03:48 2014

ranking a series of returns


from the following code...

from datetime import datetime
import pandas.io.data as web
import pandas as pd

# define symbols (tickers)
symbols = ['QQQ', 'IWM', 'SPY', 'MDY', 'EWJ', 'DIA', 'EFA', 'EPP', 'EWA', 'TLT', 'EEM', 'IYR', 'ILF']

start = datetime(2010, 1, 1)
end = datetime(2014, 1, 3)
df = web.DataReader(symbols, 'yahoo', start, end)['Close']

dfretshort = df.pct_change(66)    # roughly 3-month (66 trading day) return
dfretlong = df.pct_change(132)    # roughly 6-month (132 trading day) return
dfretblend = (dfretshort * .7) + (dfretlong * .3)




... I would like to select the top three of the 13 dfretblend values on a given date, for further use in backtesting. Can someone point me in the right direction? Thanks.
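
A minimal sketch of one way to pick the winners, assuming dfretblend keeps the dates as its index and the tickers as columns (the date below is just an example; Series.order was the sorting method in pandas of this era, later renamed sort_values):

# cross-section of blended returns on one rebalance date
date = '2013-12-31'
row = dfretblend.loc[date].dropna()

# keep the three best performers
top3 = row.order(ascending=False)[:3]   # on modern pandas: row.sort_values(ascending=False)[:3]
print(top3.index.tolist())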

Jeff | 3 Jan 20:08 2014

pandas tutorials doc section

I think it would be nice to have links in the pandas docs to the tutorials that people have created, along with any other links that are useful for learning pandas.

Anyone who wants to create this new doc section via a pull request is more than welcome.

thanks,

Jeff

Yaroslav Halchenko | 1 Jan 22:08 2014

0.13.0 released?

Just found that there is a tag v0.13.0 from y-p as of 12/30, but I do not remember running into an announcement, and http://pandas.pydata.org/ still points to 0.12.0.

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate,     Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


Richard Styron | 31 Dec 17:32 2013

academic citation for pandas

Hi,

I'm about to submit a journal article for a project that makes pretty heavy use of pandas.  I have papers to cite for some of the rest of the Python scientific stack (IPython, NumPy, etc.), but nothing for pandas specifically.  Do the pandas developers have a preference on this?  Maybe the Python for Data Analysis book?
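
For what it's worth, the reference most people seem to use (a common choice, not necessarily the developers' official recommendation) is Wes McKinney's SciPy 2010 proceedings paper, e.g. as BibTeX:

@inproceedings{mckinney-proc-scipy-2010,
  author    = {Wes McKinney},
  title     = {Data Structures for Statistical Computing in Python},
  booktitle = {Proceedings of the 9th Python in Science Conference},
  year      = {2010},
  pages     = {51--56}
}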

Thanks (especially for the great software),
Richard

Gour | 25 Dec 15:28 2013

pandas vs badger for SEO work

Hello,

I want to dive deeper into SEO and would like to approach it with some Python
tool and/or library.

Short research has led me to the pandas library, which looks very nice.

However, after visiting Wes' site, I found out about his new startup and
his work on a new-generation tool called Badger.

Now I have a few questions:

1. What will be the future of pandas development? In other words, will its
development go on despite the work on Badger?

2. Is Badger meant to fully replace pandas?

3. According to what I saw in Wes' presentation, it seems that Badger
will be closed-source, and it looks like it might be provided only as some
kind of SaaS?

I do not mind supporting quality apps, but I prefer having a desktop tool
instead of leaving my data in the cloud.

That was the reason why I quickly eliminated all the SEO cloud-based tools.

Of course, most SEO people speak about and use Excel, but that's not an
option for me because I use a Linux desktop and will probably migrate
to Free/PC-BSD. I did use the Gnumeric spreadsheet lightly, but due to its
lack of support for pivot tables, I tried LibreOffice's Calc.

Now I wonder: if I use pandas (as a smart spreadsheet), could I continue
with Gnumeric just to prepare the reports, while the main analysis is
done with pandas?

Can you, in general, recommend pandas for SEO work, and/or can someone
share his/her experiences?

Afaik, pandas 0.13 is supposed to be fully py3k-ready?

Wishing you a Merry Christmas,
Gour

-- 
The spirit soul bewildered by the influence of false ego thinks 
himself the doer of activities that are in actuality carried out 
by the three modes of material nature.

http://www.atmarama.net | Hlapicina (Croatia) | GPG: 52B5C810


Michael Schatzow | 16 Dec 17:53 2013

0.13.0rc1 groupby.transform

I tried the release candidate and I am having issues with groupby.transform in 0.13, for code that works in 0.12.

I have always used df.groupby("date").transform(lambda x: x.rank() / x.count()) to get daily percentiles of various pieces, and it is failing throughout my codebase.  Has there been a change to how groupby().transform should work?
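
A minimal self-contained version of that pattern, with hypothetical data, just to show the call in isolation:

import pandas as pd

# hypothetical frame: two dates with three observations each
df = pd.DataFrame({'date': ['2002-11-08'] * 3 + ['2002-11-11'] * 3,
                   'value': [1.0, 5.0, 3.0, 2.0, 4.0, 6.0]})
df['date'] = pd.to_datetime(df['date'])

# within-day percentile: rank each value among the rows sharing its date
df['pctile'] = df.groupby('date')['value'].transform(lambda x: x.rank() / x.count())
print(df)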


 Here is one example. It is failing in the transform due to what appears to be an indexing issue.


This code segment works fine in 0.12
   print cik_df.head()
   cik_df["portfolio_rank"] = cik_df.groupby("ex-date").transform(lambda x: x.rank(ascending=False))
   print cik_df.head()
0.12
              ex_date  Unnamed: 0 asset_type      cusip              name   shares     value form_type            accepted              period             company_name         cik       sec_id            end_date
0 2002-11-08 00:00:00           0        COM  281020107  EDISON INTERNATI  2792500  27925000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  N03TRC-S-US 2002-11-08 00:00:00
4 2002-11-08 00:00:00           4        COM  453235103    INAMED CORPCOM  3669359  84395000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  M89FLH-S-US 2002-11-08 00:00:00
5 2002-11-08 00:00:00           5        COM  494580103  KINDRED HEALTHCA  2475427  91665000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  P7KT40-S-US 2002-11-08 00:00:00
6 2002-11-08 00:00:00           6     042006  494580111  KINDRED HEALTHCA   720398   5064000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  TXX3S4-S-US 2002-11-08 00:00:00
7 2002-11-08 00:00:00           7     042006  494580129  KINDRED HEALTHCA  1800996   6664000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  QDX491-S-US 2002-11-08 00:00:00
              ex_date  Unnamed: 0 asset_type      cusip              name   shares     value form_type            accepted              period             company_name         cik       sec_id            end_date  portfolio_rank
0 2002-11-08 00:00:00           0        COM  281020107  EDISON INTERNATI  2792500  27925000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  N03TRC-S-US 2002-11-08 00:00:00               3
4 2002-11-08 00:00:00           4        COM  453235103    INAMED CORPCOM  3669359  84395000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  M89FLH-S-US 2002-11-08 00:00:00               2
5 2002-11-08 00:00:00           5        COM  494580103  KINDRED HEALTHCA  2475427  91665000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  P7KT40-S-US 2002-11-08 00:00:00               1
6 2002-11-08 00:00:00           6     042006  494580111  KINDRED HEALTHCA   720398   5064000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  TXX3S4-S-US 2002-11-08 00:00:00               7
7 2002-11-08 00:00:00           7     042006  494580129  KINDRED HEALTHCA  1800996   6664000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00  APPALOOSA MANAGEMENT LP  0001006438  QDX491-S-US 2002-11-08 00:00:00               6

0.13
  fund_data["date"] = fund_data["date"].apply(lambda x: pd.Timestamp(x))
              ex_date  Unnamed: 0 asset_type      cusip              name  \
0 2002-11-08 00:00:00           0        COM  281020107  EDISON INTERNATI   
4 2002-11-08 00:00:00           4        COM  453235103    INAMED CORPCOM   
5 2002-11-08 00:00:00           5        COM  494580103  KINDRED HEALTHCA   
6 2002-11-08 00:00:00           6     042006  494580111  KINDRED HEALTHCA   
7 2002-11-08 00:00:00           7     042006  494580129  KINDRED HEALTHCA   

    shares     value form_type            accepted              period  \
0  2792500  27925000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00   
4  3669359  84395000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00   
5  2475427  91665000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00   
6   720398   5064000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00   
7  1800996   6664000    13F-HR 2002-11-08 00:00:00 2002-09-30 00:00:00   

              company_name         cik       sec_id            end_date  
0  APPALOOSA MANAGEMENT LP  0001006438  N03TRC-S-US 2002-11-08 00:00:00  
4  APPALOOSA MANAGEMENT LP  0001006438  M89FLH-S-US 2002-11-08 00:00:00  
5  APPALOOSA MANAGEMENT LP  0001006438  P7KT40-S-US 2002-11-08 00:00:00  
6  APPALOOSA MANAGEMENT LP  0001006438  TXX3S4-S-US 2002-11-08 00:00:00  
7  APPALOOSA MANAGEMENT LP  0001006438  QDX491-S-US 2002-11-08 00:00:00  

[5 rows x 14 columns]
Traceback (most recent call last):
  File "/home/michael/repos/projects/test_form_data.py", line 201, in <module>
    form_data = get_form13_performance(cik, form_13_store, 10)
  File "/home/michael/repos/projects/test_form_data.py", line 103, in get_form13_performance
    cik_df["portfolio_rank"] = cik_df.groupby("ex-date").transform(lambda x: x.rank(ascending=False))
  File "/local/install/pandas_test/lib/python2.7/site-packages/pandas/core/groupby.py", line 1824, in transform
    result.iloc[self.indices[name]] = res
KeyError: Timestamp('2002-11-08 00:00:00', tz=None)
Closing remaining open files: /data/form13/form13.h5... done


Martin De Kauwe | 16 Dec 13:17 2013

replacing column values based on matches index dates?

Hi,

I'm sure there is a quick way to do this, but I can't seem to find it.  If I have two DataFrames and I want to replace the column values of one with the values of the other wherever the indices (which are datetime objects) match, is there a nice way to achieve this?
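
A minimal sketch of one way, using DataFrame.update, which aligns on the index and overwrites only where the other frame has values (the frames below are hypothetical):

import pandas as pd

idx1 = pd.date_range('2013-01-01', periods=5)
idx2 = pd.date_range('2013-01-03', periods=5)
df1 = pd.DataFrame({'obs': [0.0, 1.0, 2.0, 3.0, 4.0]}, index=idx1)
df2 = pd.DataFrame({'obs': [100.0, 101.0, 102.0, 103.0, 104.0]}, index=idx2)

# replace df1['obs'] with df2['obs'] wherever the datetime indices match
df1.update(df2)
print(df1)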

thanks.

Peter Gaultney | 15 Dec 21:54 2013

creating a DataFrame from a list of Series, using each Series as a row

I am afraid this will turn out to have been a trivial question.

I have multiple Series objects (which I am using as glorified dicts for the purpose of HDF5 compatibility), and I want to combine them into a DataFrame that has the keys of the Series as its column index; in other words, such that each Series becomes a row in the new DataFrame.

I have, say, 10-1000 of these Series, all describing individual program traces:

In [4]: information = pd.read_hdf('information')
In [5]: information
Out[5]:
CMDLINE               ../../build/dplasma/testing//testing_dpotrf -N...
DIMENSION                                                ddescA(35, 35)
GFLOPS                                                          195.264
HWLOC-XML             <?xml version="1.0" encoding="UTF-8"?>\n<!DOCT...
PARAM_BUT_LEVEL                                                       0
PARAM_CHECK                                                           0
PARAM_CHECKINV                                                        0
PARAM_HIGHLVL_TREE                                                    0
PARAM_IB                                                             -1
PARAM_K                                                               1
PARAM_LDA                                                          8000
PARAM_LDB                                                             0
PARAM_LDC                                                             0
PARAM_LOWLVL_TREE                                                     0
PARAM_M                                                            8000
PARAM_MB                                                            234
PARAM_N                                                            8000
PARAM_NB                                                            234
PARAM_NCORES                                                         48
PARAM_NGPUS                                                           0
PARAM_NNODES                                                          1
PARAM_P                                                               1
PARAM_PINS                                                            0
PARAM_Q                                                               1
PARAM_QR_DOMINO                                                       1
PARAM_QR_HLVL_SZE                                                     0
PARAM_QR_TSRR                                                         1
PARAM_QR_TS_SZE                                                       0
PARAM_RANK                                                            0
PARAM_SCHEDULER                                                       7
PARAM_SMB                                                             1
PARAM_SNB                                                             1
PARAM_VERBOSE                                                         0
SYNC_TIME_ELAPSED                                              0.874193
TIME_ELAPSED                                                   0.874193
cwd                   /mnt/scratch/pgaultne/pandas/TILE_SIZE_TESTS/8000
error                                                                 0
exe                         ../../build/dplasma/testing//testing_dpotrf
filename              /mnt/scratch/pgaultne/pandas/TILE_SIZE_TESTS/8...
hostname                                                 ig.icl.utk.edu
id                                                                    0
last_error                                                            0
nb_cores                                                             48
nb_nodes                                                              1
nb_vps                                                                1
sched                                                                ip
sched_list                                                   [lfq, lfq]
start_time                                                   1387065158
worldsize                                                             1
dtype: object
In [6]: len(information)
Out[6]: 49 


I tried a bunch of different things, including pandas.concat([information1, information2, information3, ...]) and pandas.concat([information1, information2, information3, ...], axis=1). The first of these resulted in a single Series with no indices (len(all_information) == 147). The second made each Series into a column of a new DataFrame, which was closer, but it means that indexing by name (say, 'GFLOPS') returns an ndarray of the floats that make up the row with that string index, instead of a Series of floats comprising the column.

I ended up going with the following, which works, but is probably slow for large datasets, and also just looks ugly.

    all_info = pd.DataFrame()
    all_info = all_info.append([information1, information2, information3])

which results in:

In [38]: all_profs_2['GFLOPS']
Out[38]:
0    195.264
1    203.155
0    206.545
Name: GFLOPS, dtype: float64

Is there (surely there must be...) a cleaner, more appropriate way to do this? 
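
A minimal sketch of what seems to be the usual idiom: pass the list of Series straight to the DataFrame constructor, so each Series becomes a row and the union of their keys becomes the columns (names below are hypothetical):

import pandas as pd

s1 = pd.Series({'GFLOPS': 195.264, 'PARAM_NCORES': 48})
s2 = pd.Series({'GFLOPS': 203.155, 'PARAM_NCORES': 48})

# each Series becomes a row; its index supplies the column labels
all_info = pd.DataFrame([s1, s2])

# selecting by key now returns a column of floats, as expected
print(all_info['GFLOPS'])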

Thanks in advance.

Tom Augspurger | 15 Dec 23:23 2013

C library for reading Stata/SAS/SPSS/R

I saw this the other day: https://github.com/WizardMac/ReadStat
The same person wrote some julia bindings: https://github.com/WizardMac/DataRead.jl

I've got no idea if this would be useful/feasible (which is why this isn't a GH issue). I remember
some interest in the past in a SAS reader.

Daniel | 15 Dec 02:59 2013

Pandas version 0.12 - Partial String Indexing: Error when slicing using 2 string dates

Hello,
I have a pandas Series with datetime values as the index.  I can do my_series[datetime(2012, 1, 1):datetime(2012, 12, 1)], but if I do my_series['1/1/2012':'12/1/2012'], I get a "'tuple' object has no attribute 'year'" error message.

Here's the example using version 0.12:
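
A minimal reconstruction of the pattern described above (hypothetical data, assuming a Series with a plain DatetimeIndex):

from datetime import datetime
import pandas as pd

idx = pd.date_range('2012-01-01', '2012-12-31')
my_series = pd.Series(range(len(idx)), index=idx)

# slicing with datetime objects works
print(my_series[datetime(2012, 1, 1):datetime(2012, 12, 1)])

# slicing with date strings is what raises the AttributeError on 0.12
print(my_series['1/1/2012':'12/1/2012'])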

- Daniel

rowland.jon | 10 Dec 13:31 2013

Exception when persisting a DataFrame to hdf5

I have an automated process which loads and cleans data using Pandas and persists the end result to an HDF5 file for later retrieval.

This works almost always, except when the resulting DataFrame is empty, which happens if there are no good rows in the input data.

The following code snippet throws a ZeroDivisionError if all rows are filtered out:

import pandas as pd
import StringIO
from datetime import datetime

s = """date,symbol,price
2001-01-02 00:00:00,GCF5,1000.0
2001-01-02 00:00:00,GCZ5,1001.0"""

parse_dttm = lambda s: datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
price_data = pd.read_csv(StringIO.StringIO(s), date_parser=parse_dttm, parse_dates=[0])

# works as there are rows in the DataFrame
price_data.to_hdf('C:/temp/test.h5', 'table')

# fails because the DataFrame is empty
price_data = price_data[price_data['symbol'] == 'XXX']
price_data.to_hdf('C:/temp/test.h5', 'table')


C:\Anaconda\lib\site-packages\tables\leaf.py in _calc_nrowsinbuf(self)
    366         rowsize = self.rowsize
    367         buffersize = params['IO_BUFFER_SIZE']
--> 368         nrowsinbuf = buffersize // rowsize
    369 
    370         # tableextension.pyx performs an assertion

ZeroDivisionError: long division or modulo by zero

Is this a bug, or should I just avoid writing the DataFrame if it is empty? I do notice that if I persist a newly constructed empty DataFrame it seems to cope, but if the DataFrame is the result of a filtering operation it seems to fail.
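
In the meantime, a minimal guard that sidesteps the error, assuming it is acceptable for the pipeline to simply skip empty frames:

# only persist when the filter left rows behind
if not price_data.empty:
    price_data.to_hdf('C:/temp/test.h5', 'table')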

Any help appreciated.

Thanks,
Jon

