Seth P | 18 Jun 02:28 2014

Rolling exponentially weighted functions? Exponentially weighted regression?

It appears to me that the ewm{a,var,std,corr,cov} functions are all expanding-window based. (This isn't 100% obvious from the documentation, as "span" suggests that it may be interpreted as a window length.) Any reason there aren't rolling_ewm{a,var,std,corr,cov} functions that take an additional window parameter?

Is the easiest / most efficient way to accomplish this by using rolling_apply() with the corresponding ewm{a,var,std,corr,cov} function?
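As far as I know there's no built-in rolling variant, so rolling_apply() with the corresponding ewm* function is probably the shortest route, e.g. `pd.rolling_apply(s, window, lambda a: pd.ewma(a, span=10)[-1])` (untested). The underlying computation is also easy to sketch directly in NumPy; the function name and the truncated-weights convention below are my own, not pandas':

```python
import numpy as np

def rolling_ewma(x, window, span):
    """EWMA computed over a fixed trailing window at each point,
    instead of over the full expanding history (my own convention:
    weights are simply truncated to the window and renormalized)."""
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (span + 1.0)
    # weights for the most recent `window` observations, newest last
    w = (1 - alpha) ** np.arange(window - 1, -1, -1)
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        chunk = x[i - window + 1:i + 1]
        out[i] = np.dot(w, chunk) / w.sum()
    return out
```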


Separately, I think it would be useful to have an ewmregression function (and a rolling version thereof), i.e. something like this [I haven't actually tested this code, so caveat emptor]:

import pandas as pd

def ewmregression(arg1, arg2,
                  com=None, span=None, halflife=None,
                  min_periods=0,
                  bias=False,
                  freq=None,
                  pairwise=None,
                  how=None):
    '''
    Regress arg1 on arg2, returning (slope, intercept)
    '''
    # slope = Cov(arg1, arg2) / Var(arg2), both exponentially weighted
    slope = pd.ewmcov(arg1=arg1, arg2=arg2, com=com, span=span,
                      halflife=halflife, min_periods=min_periods,
                      bias=bias, freq=freq, pairwise=pairwise, how=how) / \
            pd.ewmvar(arg=arg2, com=com, span=span, halflife=halflife,
                      min_periods=min_periods, bias=bias, freq=freq, how=how)
    # intercept = E[arg1] - slope * E[arg2], exponentially weighted
    intercept = pd.ewma(arg=arg1, com=com, span=span, halflife=halflife,
                        min_periods=min_periods, freq=freq, adjust=True,
                        how=how) - \
                slope * pd.ewma(arg=arg2, com=com, span=span,
                                halflife=halflife, min_periods=min_periods,
                                freq=freq, adjust=True, how=how)
    return (slope, intercept)


Seth

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
Adam Hughes | 16 Jun 18:25 2014

How does notebook know to render dataframe all sexy like?

I was wondering if there's a method somewhere on the DataFrame, or elsewhere in pandas, where the nice IPython notebook rendering is called. Could someone direct me to it so I can learn a bit about how it works?
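IPython's rich display system calls a `_repr_html_` method on any object that defines one, and `DataFrame` implements it (the HTML construction itself lives in pandas' formatting code). You can call it by hand to see exactly what the notebook receives:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# IPython calls this automatically when a DataFrame is the last
# expression in a cell; it returns the table rendered as HTML
html = df._repr_html_()
print(html[:80])
```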

Thanks

Victor Hooi | 13 Jun 11:42 2014

Help with parsing datetime column in CSV?

Hi,

I have some performance metrics in CSV format that I'm attempting to plot with pandas.

The data looks like this:

(PDH-CSV 4.0) (GMT Summer Time)(-60),\\FOOBAR\LogicalDisk(F:)\Disk Transfers/sec,\\FOOBAR\LogicalDisk(F:)\Disk Bytes/sec,\\FOOBAR\LogicalDisk(F:)\Disk Read Bytes/sec,\\FOOBAR\LogicalDisk(F:)\Disk Write Bytes/sec,\\FOOBAR\LogicalDisk(F:)\Avg. Disk sec/Transfer
06/12/2014 16:54:46.320, , , , ,
06/12/2014 16:54:47.306,136.12018071071395,4160807.9118737634,3303681.4820277682,857126.42984599527,0.0061417910447761192
06/12/2014 16:54:48.307,97.906137386087963,3231717.7521909745,2467522.3862882657,764195.36590270908,0.010606122448979592
06/12/2014 16:54:49.307,148.02067848878488,4353120.1308822837,3432927.5799829233,920192.55089936056,0.0070141891891891892
06/12/2014 16:54:50.308,140.83233910030108,4547801.8419072088,3555191.5444663125,992610.29744089651,0.0064283687943262413
06/12/2014 16:54:51.308,109.99112371631608,3615964.1916897306,2817820.6018774286,798143.58981230215,0.0085545454545454536
06/12/2014 16:54:52.308,137.98752592765615,4684888.4860808589,3616441.0737269353,1068447.4123539233,0.006678985507246377


NB: there's a bit of filler text in cell (0,0), and the first row of data is also empty - however, pandas doesn't seem to have choked on either.

I'm able to import it into pandas like so:

df = pd.read_csv('perfmon_foobar.csv')

I set the index to the first column (timestamp)

df = pd.read_csv('perfmon_foobar.csv', index_col=0)

I would next like to convert the first column to a datetime.

I've checked, and pd.to_datetime() seems able to parse that format:

In [35]: pd.to_datetime('06/12/2014 16:54:46.320')
Out[35]: Timestamp('2014-06-12 16:54:46.320000')

I tried using parse_dates, but this doesn't seem to do anything:

df = pd.read_csv('perfmon_foobar.csv', index_col=0, parse_dates=0)
In [36]: df.dtypes
Out[36]:
\\FOOBAR\LogicalDisk(F:)\Disk Transfers/sec        object
\\FOOBAR\LogicalDisk(F:)\Disk Bytes/sec            object
\\FOOBAR\LogicalDisk(F:)\Disk Read Bytes/sec       object
\\FOOBAR\LogicalDisk(F:)\Disk Write Bytes/sec      object
\\FOOBAR\LogicalDisk(F:)\Avg. Disk sec/Transfer    object
dtype: object

Any thoughts on how I might get pandas to parse the dates correctly?

Next step is to plot each column, but I'm hoping that armed with the pandas docs, that should be a cinch =).
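Two things are going on here: `parse_dates=0` is falsy, so no date parsing happens at all (it wants `True` or a list of columns), and the row of bare spaces forces every data column to dtype object. A sketch against a trimmed stand-in for the perfmon file (column names shortened by me):

```python
import io
import pandas as pd

csv = io.StringIO(
    "time,transfers,bytes\n"
    "06/12/2014 16:54:46.320, , \n"
    "06/12/2014 16:54:47.306,136.12,4160807.91\n"
)

# parse_dates wants True (parse the index) or a list of columns;
# the scalar 0 is falsy and silently disables parsing
df = pd.read_csv(csv, index_col=0, parse_dates=True)

# the row of bare spaces made the data columns dtype object;
# coercing turns ' ' into NaN and everything else into floats
# (on 0.14, df.convert_objects(convert_numeric=True) plays this role)
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```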

Cheers,
Victor

John Na | 13 Jun 11:23 2014

.apply to multiple columns in dataframe

Hi,
 
DataFrame.apply() is very slow, and when I try to use Cython (I'm a newbie at this) the process isn't any faster. Possibly that's because I'm not doing it right.

The idea I've been considering is to change multiple columns in the same apply() pass. Normally the result of .apply() is a Series that gets assigned to a single DataFrame column, like:

        df['newcolumn'] = df.apply(...)

The question is: can I assign values to multiple columns depending on the value of one column? An example:

I have a column called 'Facts' in a pandas DataFrame containing strings like 'T:1-R:2', 'T:2-R:3', ..., and I have other columns like 'T' and 'R'. The goal is to fill those columns according to the string in 'Facts'.
 
My solution is:
 
 
def ExtrValFact(f, x):
    '''Return the value of factor f encoded in the string x.'''
    mf = x.split('-')
    ms = {s.split(':')[0]: s.split(':')[1] for s in mf}
    return ms[f]

facts = ['T', 'R']

for f in facts:
    # note: raw=True would pass a plain ndarray, breaking x['Facts'],
    # so it has to be left off here
    dfp[f] = dfp.apply(lambda x: ExtrValFact(f, x['Facts']), axis=1)
 
 
but this is very slow when the DataFrame is large.
 
If I put the function ExtrValFact() in a Cython .pyx module and import it with `import pyximport; pyximport.install()`, the result is the same: very slow. As I said, I don't know Cython; maybe it's time for me to learn it properly.
 
 
 
 
But my question is more general, for this and other cases. Could I use something like:

     dfp[facts] = dfp.apply(lambda x: ExtrValFact(facts, x['Facts']), axis=1)

where ExtrValFact returns the values of all the factors for the same row? That way, .apply() would run only once per row.
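For this specific pattern, apply() can be skipped entirely: Series.str.extract (available since pandas 0.13) pulls every factor out in one vectorized call, which is usually far faster than a per-row Python function. The regex below assumes the 'T:x-R:y' layout described above:

```python
import pandas as pd

df = pd.DataFrame({"Facts": ["T:1-R:2", "T:2-R:3", "T:5-R:8"]})

# one vectorized pass instead of one Python-level call per row;
# each capture group becomes a column
df[["T", "R"]] = df["Facts"].str.extract(r"T:(\d+)-R:(\d+)")
print(df)
```

The extracted values come back as strings; chain `.astype(int)` if you need numbers.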
 
 
 
Thanks
 
John Na
 
 
 
 

Thiago Avelino | 11 Jun 05:03 2014

Open Mining (open-source BI project) under the PyData organization

I've started a new project called Open Mining, a BI application server written in Python that uses pandas (it's a web interface for pandas). Read more:

http://openmining.io/
https://github.com/avelino/mining

What do you think of putting Open Mining within the PyData organization on GitHub? I would migrate the repository from my personal profile to the pydata organization, making it a PyData project.

I will continue to maintain the project.

Adam | 12 Jun 17:33 2014

Pandas float64 rounding issues

Hello,

I'm using pandas data structures in conjunction with other structures in a library for spectroscopy, so this may not be a pandas issue per se, but I'm almost positive it is. Basically, I'm reading wavelengths from a CSV file where they only go out to 2 decimal places (e.g. 450.05).

I read these into a DataFrame, and then store one column (with the same index) as a reference spectrum. Thus, I have two DataFrames (full spectra, single reference spectrum) with identical indices, derived from the same original index. A lot is going on under the hood, but somewhere along the line, some of the wavelengths are being rounded differently. For example, here are the same three elements in two of the indexes:

[480.23 480.6 480.96] [480.23 480.59999999999997 480.96]



These are from Float64Index structures. I really have no clue whether this discrepancy is occurring under the hood from something I've done, or if it's perhaps an issue involving read_csv() and Float64Index. Has anyone seen this type of problem before, and could you maybe explain the cause and resolution in your case? I'm using 0.13.1.

The real problem is that when I add or subtract the DataFrames, these indices don't align, so the result is NaNs.
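Until the source of the divergence is tracked down, one pragmatic workaround: since the file only carries two decimals, round both indexes back to that precision before aligning, so the labels are bit-for-bit identical again. A sketch with the values quoted above:

```python
import numpy as np
import pandas as pd

# two indexes that "should" be equal but differ in the last bits
a = pd.Series([1.0, 2.0, 3.0], index=[480.23, 480.6, 480.96])
b = pd.Series([1.0, 2.0, 3.0], index=[480.23, 480.59999999999997, 480.96])

# the mismatched middle label fails to align, producing NaN
assert (a - b).isnull().any()

# rounding both indexes back to the file's 2-decimal precision
# restores exact label equality
a.index = np.round(a.index.values, 2)
b.index = np.round(b.index.values, 2)
assert not (a - b).isnull().any()
```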

Thanks

Nathaniel Smith | 10 Jun 21:41 2014

why is pandas blowing up when imported from python 2.6 on travis?

I'm getting failing builds on Travis, because on my python 2.6 build,
'import pandas' raises an error:
  https://s3.amazonaws.com/archive.travis-ci.org/jobs/27223804/log.txt

This is with pandas 0.14.0. The relevant bit of traceback is:

  File "/home/travis/virtualenv/python2.6.9/lib/python2.6/site-packages/pandas/io/api.py",
line 7, in <module>
    from pandas.io.excel import ExcelFile, ExcelWriter, read_excel
  File "/home/travis/virtualenv/python2.6.9/lib/python2.6/site-packages/pandas/io/excel.py",
line 626, in <module>
    .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))
ValueError: zero length field name in format

Anyone know why this would be? I assume that it can't just be that
pandas has dropped py26 support, given that there are py26 installers
for 0.14.0 on PyPI. But all the other builds work fine:
  https://travis-ci.org/pydata/patsy/builds/27223803
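The traceback is the classic Python 2.6 symptom: `str.format` with empty field names (`'{}'`) was only added in 2.7, and 2.6 raises exactly this ValueError. So the `.format(openpyxl_compat.start_ver, ...)` call at pandas/io/excel.py line 626 isn't 2.6-compatible code, whatever the installers suggest. A minimal illustration (the version strings below are invented):

```python
# Python 2.6 cannot auto-number format fields:
#     '{} to {}'.format(a, b)  ->  ValueError: zero length field name in format
# Explicit indices work on both 2.6 and 2.7+:
msg = 'openpyxl must be {0} to {1}'.format('1.6.1', '2.0.0')
print(msg)
```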

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

Marc Massar | 9 Jun 16:44 2014

Series.get(k,default) throwing IndexError

I get an error when calling .get(key, default) on a Series produced by value_counts():

In [1]: np.__version__
Out[1]: '1.8.1'

In [2]: pd.__version__
Out[2]: '0.14.0'

In [3]: df=pd.DataFrame({'i':[0]*10, 'b':[False]*10})

In [4]: vc_i=df.i.value_counts()

In [5]: vc_i.get(99,default='Missing')
Out[5]: 'Missing'

In [6]: vc_b=df.b.value_counts()

In [7]: vc_b.get(False,default='Missing')
Out[7]: 10

In [8]: vc_b.get(True,default='Missing')
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-74ffac8d643b> in <module>()
----> 1 vc_b.get(True,default='Missing')

/home/apprun/.virtualenvs/ipython_2.1/lib/python2.7/site-packages/pandas/core/generic.pyc in get(self, key, default)
   1038         """
   1039         try:
-> 1040             return self[key]
   1041         except (KeyError, ValueError):
   1042             return default

/home/apprun/.virtualenvs/ipython_2.1/lib/python2.7/site-packages/pandas/core/series.pyc in __getitem__(self, key)
    477     def __getitem__(self, key):
    478         try:
--> 479             result = self.index.get_value(self, key)
    480
    481             if not np.isscalar(result):

/home/apprun/.virtualenvs/ipython_2.1/lib/python2.7/site-packages/pandas/core/index.pyc in get_value(self, series, key)
   1175
   1176             try:
-> 1177                 return tslib.get_value_box(s, key)
   1178             except IndexError:
   1179                 raise

/home/apprun/.virtualenvs/ipython_2.1/lib/python2.7/site-packages/pandas/tslib.so in pandas.tslib.get_value_box (pandas/tslib.c:10817)()

/home/apprun/.virtualenvs/ipython_2.1/lib/python2.7/site-packages/pandas/tslib.so in pandas.tslib.get_value_box (pandas/tslib.c:10664)()

IndexError: index out of bounds
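This looks like get() only catching KeyError/ValueError while the boolean-index path raises IndexError. Until that's fixed upstream, one workaround is to test index membership yourself instead of relying on get() to swallow the exception (a sketch; the helper below is mine, not a pandas API):

```python
import pandas as pd

df = pd.DataFrame({"b": [False] * 10})
vc_b = df["b"].value_counts()

# sidestep the buggy lookup path: check membership first
def safe_get(s, key, default=None):
    return s.loc[key] if key in s.index else default

print(safe_get(vc_b, True, "Missing"))
print(safe_get(vc_b, False, "Missing"))
```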



Viviana Nero | 6 Jun 12:17 2014

Panel regression with robust standard error

Hi all,

I would like to know if there is a way to have robust standard errors in a panel regression.

In particular, I'm using the pandas `ols` function (version 0.12.0) to fit a fixed-effects panel regression.
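As far as I know pandas' ols doesn't expose robust standard errors; statsmodels does (e.g. `sm.OLS(y, X).fit(cov_type='HC1')`). For reference, the plain White/HC0 sandwich estimator is short enough to sketch in NumPy. Note this is cross-sectional HC0 only, not the cluster-robust version you would usually want for a panel:

```python
import numpy as np

def ols_hc0(y, X):
    """OLS with White (HC0) heteroskedasticity-robust standard errors."""
    X = np.column_stack([np.ones(len(X)), X])      # prepend an intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # sandwich: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```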

Thanks for your attention!

Viviana

Hugo Pires | 5 Jun 20:29 2014

How do I slice a MultiIndex time series DataFrame in pandas?

I have to slice a dataframe created by this code

    data = pd.read_csv(("/user_home/w_hugopires/dados/dados_meteo.csv"),names=['POM','DTM','RNF','WET','HMD','TMP','DEW','INF'])
    data['DTM'] = pd.to_datetime(data['DTM'], coerce = True)
    data.set_index(['POM', 'DTM'], inplace=True)

First, I have to create a MultiIndex, since there are DTM values (timestamps) repeated across several POM (automatic weather stations).

The result is

                           RNF WET HMD TMP DEW INF
    POM        DTM
    QuintaVilar 2011-11-01 00:00:00 0 0 0 0 0 0
                2011-11-01 00:15:00 0 0 0 0 0 0
                2011-11-01 00:30:00 0 0 0 0 0 0
                2011-11-01 00:45:00 0 0 0 0 0 0
                2011-11-01 01:00:00 0 0 0 0 0 0

Then I use the following code to create a slice of the dataframe

    intervalo = data[['TMP','RNF']].ix[pom1][start_year + start_month + start_day : final_year + final_month + final_day]

And the result is

    TMP RNF
    DTM
    2013-04-01 00:12:00 12.5 0
    2013-04-01 00:27:00 12.1 0
    2013-04-01 00:42:00 12.1 0
    2013-04-01 00:57:00 11.7 0
    2013-04-01 01:12:00 11.7 0

How can I slice on multiple POMs, and how can I slice on multiple time intervals (e.g. every April of every year)?
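A sketch of both slices on a toy frame shaped like the one above (the second station name is invented; sorting the index first avoids lexsort errors when slicing):

```python
import numpy as np
import pandas as pd

# toy (POM, DTM) MultiIndex frame standing in for dados_meteo
idx = pd.MultiIndex.from_product(
    [["QuintaVilar", "Outra"],
     pd.date_range("2011-11-01", "2014-11-01", freq="MS")],
    names=["POM", "DTM"])
df = pd.DataFrame({"TMP": np.arange(len(idx), dtype=float)},
                  index=idx).sort_index()

# several stations at once: pass a list in the first index level
both = df.loc[(["QuintaVilar", "Outra"], slice(None)), :]

# "every April of every year": filter on the datetime level directly
aprils = df[df.index.get_level_values("DTM").month == 4]
print(aprils.head())
```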

Thank you

Hugo

Fabian Braennstroem | 1 Jun 22:23 2014

Adding a category depending on strings in columns

Hello,

I would like to write the category for each row of a DataFrame, based on patterns stored in another DataFrame:


Date        Value          Category
23.12.2013  TEST
23.12.2013  AUTO
23.12.2013  Calculation 1
23.12.2013  Calculation 2
The DataFrame with the patterns looks like:

Category  Pattern
DIVS      <if not in other>
CAR       BMW|VW|AUDI
CALC      Calculation
The result should be:

Date        Value          Category
23.12.2013  TEST           DIVS
23.12.2013  BMW            CAR
23.12.2013  Calculation 1  CALC
23.12.2013  Calculation 2  CALC
Do you have a suggestion for how to achieve this?
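One straightforward vectorized approach: loop over the (category, regex) pairs in order with `str.contains`, starting from the catch-all category so that unmatched rows keep it. The data below is a toy reconstruction of the tables above:

```python
import pandas as pd

df = pd.DataFrame({"Value": ["TEST", "BMW", "Calculation 1", "Calculation 2"]})

# ordered (category, regex) pairs; rows matching nothing keep DIVS
patterns = [("CAR", r"BMW|VW|AUDI"), ("CALC", r"Calculation")]

df["Category"] = "DIVS"
for cat, pat in patterns:
    # only overwrite rows still at the default, so earlier patterns win
    hit = df["Value"].str.contains(pat) & (df["Category"] == "DIVS")
    df.loc[hit, "Category"] = cat
print(df)
```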
Thanks in advance!

Best Regards
Fabian


