Joris Van den Bossche | 17 Apr 12:07 2015

Upcoming Index repr changes

Hi all,

We have a PR pending to unify the string representation of the different Index objects: https://github.com/pydata/pandas/pull/9901

The most important changes:
  • We propose to reduce the default number of shown values from 100 to 10 (an option controllable as pd.options.display.max_seq_items).
  • The datetime-like indices (DatetimeIndex, TimedeltaIndex, PeriodIndex) always had a somewhat different repr; they get a new one that is more consistent with the other Index types like Int64Index. This is the biggest change.
So for Int64Index, for example, not much changes (only the 'name' is now shown as well, and the number of shown values has changed), but for DatetimeIndex the change is larger.
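For anyone who prefers the old amount of detail, the option named above can simply be raised again at runtime; a minimal sketch:

```python
import pandas as pd

# The PR proposes a default of 10 shown values; the previous behaviour
# can be restored by setting the option back to 100.
pd.set_option('display.max_seq_items', 100)
print(pd.get_option('display.max_seq_items'))
```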

But we would like to get some feedback on this!

Do you like the changes? For DatetimeIndex? For the number of shown values?
Would you want different behaviour for repr() and str()?

Some examples of the changes with the current state of the PR are shown below:

Previous Behavior

In [1]: pd.get_option('max_seq_items')
Out[1]: 100

In [2]: pd.Index(range(4), name='foo')
Out[2]: Int64Index([0, 1, 2, 3], dtype='int64')

In [3]: pd.Index(range(104), name='foo')
Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')

In [4]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
Out[4]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-01-04 00:00:00-05:00]
Length: 4, Freq: D, Timezone: US/Eastern

In [5]: pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
Out[5]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
Length: 104, Freq: D, Timezone: US/Eastern


New Behavior

In [1]: pd.get_option('max_seq_items')
Out[1]: 10

In [9]: pd.Index(range(4), name='foo')
Out[9]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')

In [10]: pd.Index(range(104), name='foo')
Out[10]: Int64Index([0, 1, ..., 102, 103], dtype='int64', name=u'foo', length=104)

In [11]: pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
Out[11]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'], dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [12]: pd.date_range('20130101', periods=104 ,name='foo', tz='US/Eastern')
Out[12]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ..., '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'], dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
lamp | 16 Apr 23:21 2015

Statsmodels needs you!

See this thread here: https://groups.google.com/forum/#!topic/pystatsmodels/vyM4bnkmEFk 

Statsmodels needs to revise its infrastructure to encourage and facilitate more grassroots contribution and maintenance, as well as help with core tasks. Discussion is in the thread above. Any ideas and support from the community would be awesome.

Ilya Plenne | 13 Apr 19:49 2015

Pandas plot xticks

Hi, I've recently run into a problem that I can't solve. The source code runs under recent versions of pandas, matplotlib and the IPython notebook. Example source code: https://gist.github.com/libbkmz/447ea6fa1ef45eaa2daf

If the start date is 13 March I get this image, but if I change the start date to 14 March it produces this image:


So I want to show these xticks, with this amount of data, but starting from 13 March (that's just an example). I run into this issue every day and don't know how to fix it.

If I add x or custom xticks, df.plot(figsize=(30, 6), x=r, xticks=r), I get this:

So I can see a grid line for every day, but there are no day labels on the x axis for that grid.

This thing is driving me really crazy...

Dr. Leo | 13 Apr 11:06 2015

ANN: pandaSDMX 0.2.0 released

Hi,

I am excited to announce the release of pandaSDMX 0.2.0. This version is
a quantum leap. The whole project has been redesigned
and rewritten from scratch to provide robust support for many SDMX
features. The new architecture is centered around a pythonic
representation of the SDMX information model. It is extensible through
readers and writers for alternative input and output formats. Export
to pandas has been dramatically improved. Sphinx documentation has
been added.

* Read the documentation including IPython sessions at
http://pandasdmx.readthedocs.org

* Install it with ``pip install pandasdmx``
* Join the Google group at
https://groups.google.com/forum/?hl=en#!forum/sdmx-python  
* Report issues at https://github.com/dr-leo/pandaSDMX  

About pandaSDMX

pandaSDMX is an Apache 2.0-licensed Python package aimed at becoming
the most intuitive and versatile tool to retrieve statistical data
and metadata disseminated in SDMX format. It works well with the SDMX
services of the European statistics office (Eurostat) and the
European Central Bank (ECB). While pandaSDMX is extensible to cater
for any output format, it currently supports only pandas, the gold
standard of data analysis in Python. But from pandas you can export
your data to Excel and friends.

Main features
=============

* intuitive API inspired by requests

* support for many SDMX features including

  * generic datasets

  * data structure definitions, codelists and concept schemes

  * dataflow definitions

  * categorisations and category schemes

* pythonic representation of the SDMX information model

* find dataflows by name or description in multiple languages if
  available

* read and write local files for offline use

* writer transforming SDMX generic datasets into multi-indexed
  pandas DataFrames or Series of observations and attributes

* extensible through custom readers and writers for alternative
  input and output formats of data and metadata

Example
=======

   In [1]: from pandasdmx import Request

   # Get annual unemployment data from Eurostat
   In [2]: une_resp = Request('ESTAT').get(resource_type='data',
      ...:                                 resource_id='une_rt_a')

   # From the received dataset, select the time series on Greece,
   # Ireland and Spain, and write them to a pandas DataFrame
   In [3]: une_df = une_resp.write(s for s in une_resp.msg.data.series
      ...:                         if s.key.GEO in ['EL', 'ES', 'IE'])

   # Explore the DataFrame
   In [4]: une_df.columns.names
   Out[4]: FrozenList(['AGE', 'SEX', 'S_ADJ', 'GEO', 'FREQ'])

   In [5]: une_df.columns.levels[0:2]


Rebecca | 10 Apr 18:16 2015

write to csv

Hi all,

I have a pandas data frame like,

a,,,,,
a,b,,,,
a,b,,,,
a,b,c,d,e,f

I would like write to a csv file like,

a
a,b
a,b
a,b,c,d,e,f

I don't want to keep the missing values as "" and I want to remove the trailing commas for the missing values in the csv. How can I do this?

Thanks,

Rebecca
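One approach that seems to work, a minimal sketch assuming the blanks are read in as missing values (NaN/None): join only the non-missing cells of each row and write the lines yourself rather than through to_csv.

```python
import pandas as pd

df = pd.DataFrame([['a', None, None, None, None, None],
                   ['a', 'b', None, None, None, None],
                   ['a', 'b', None, None, None, None],
                   ['a', 'b', 'c', 'd', 'e', 'f']])

# Keep only the non-missing cells of each row, so no trailing commas
# are written for the empty values.
lines = df.apply(lambda row: ','.join(row.dropna().astype(str)), axis=1)
with open('out.csv', 'w') as f:
    f.write('\n'.join(lines) + '\n')
```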

Daniel | 9 Apr 00:39 2015

Is there a way to do text patten matching using dataframe.query() method?

In other words, instead of:
df['text_column'].str.contains('hello')

maybe do:
df.query("text_column like '%hello%'")

Sorry, I come from a SQL background, so I was curious whether something like my query() example would be possible.

I have seen several examples of how df.query() is used, but not for text pattern matching.
It would be interesting to see examples of pattern matching with the df.query() method, if it is indeed possible.
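Something close to the SQL form does seem possible, at least in recent pandas versions, because query() can fall back to the slower Python engine, where Series.str methods are allowed. A sketch (behaviour may differ across versions):

```python
import pandas as pd

df = pd.DataFrame({'text_column': ['say hello', 'goodbye', 'hello world']})

# numexpr (the default query engine) cannot evaluate .str methods, but
# the python engine can, giving a rough equivalent of LIKE '%hello%'.
result = df.query("text_column.str.contains('hello')", engine='python')
```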


Thanks!

- Daniel

Keith Brown | 8 Apr 02:18 2015

group by timestamp

I have some data like this

timestamp,height,owner
2011-11-01 08:04:56,10,A
2011-11-02 08:04:16,10,B
2011-11-03 18:04:26,10,C
2011-11-03 12:04:36,10,A
2011-11-04 05:04:56,10,B

import pandas as pd
df=pd.read_csv('sample.csv')

I would like to get a view or even a dataframe so I can group by the timestamp's date value:

2011-11-01
  height,owner
  10,A
2011-11-02
  height,owner
  10,B
2011-11-03
  height,owner
  10,C
  10,A
2011-11-04
  10,B

Is there an easy way to break down a dataframe like this? 
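One way that seems to work (a sketch, reading the sample data inline instead of from sample.csv): parse the column as datetimes and group on the date component via the .dt accessor.

```python
import io
import pandas as pd

data = """timestamp,height,owner
2011-11-01 08:04:56,10,A
2011-11-02 08:04:16,10,B
2011-11-03 18:04:26,10,C
2011-11-03 12:04:36,10,A
2011-11-04 05:04:56,10,B
"""
df = pd.read_csv(io.StringIO(data), parse_dates=['timestamp'])

# .dt.date strips the time-of-day, so rows sharing a calendar date
# fall into the same group; each group is a sub-DataFrame.
for date, sub in df.groupby(df['timestamp'].dt.date):
    print(date)
    print(sub[['height', 'owner']].to_string(index=False))
```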


Jay | 7 Apr 05:54 2015

Backtest a stock strategy using Pandas?

What is the most efficient way to use Pandas to backtest a stock strategy with a DataFrame? Can anyone show me an example please?
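There is no single canonical pattern, but a vectorised moving-average crossover is a common minimal sketch (synthetic prices here stand in for real quotes; rolling() assumes a reasonably recent pandas):

```python
import numpy as np
import pandas as pd

# Synthetic price series standing in for real quotes.
rng = np.random.RandomState(0)
prices = pd.Series(100 + rng.randn(250).cumsum())

fast = prices.rolling(10).mean()
slow = prices.rolling(50).mean()

# Long (1) when the fast average is above the slow one; shift(1)
# trades on the previous bar's signal to avoid look-ahead bias.
position = (fast > slow).astype(int).shift(1).fillna(0)
strategy_returns = position * prices.pct_change().fillna(0)
equity = (1 + strategy_returns).cumprod()
```

The key point is that everything stays vectorised: no Python loop over bars, just aligned Series operations.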

Denis Akhiyarov | 6 Apr 16:26 2015

pandas multiindex dataframe, ND interpolation for missing values

Is it possible in pandas to interpolate for missing values in multiindex dataframe?

See this question for details:

http://stackoverflow.com/questions/29465398
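For the simpler case where the interpolation only needs to run within one index level (the full N-D scattered-data case would need something like scipy.interpolate.griddata), a sketch:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['a', 'b'], [1, 2, 3]], names=['grp', 'x'])
s = pd.Series([1.0, np.nan, 3.0, 10.0, np.nan, 30.0], index=idx)

# Interpolate inside each outer-level group so values from one group
# never leak into another; transform keeps the original MultiIndex.
filled = s.groupby(level='grp').transform(lambda g: g.interpolate())
```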

Mohan Radhakrishnan | 6 Apr 13:39 2015

Storing long HTML tags

Hi,
          When I store entire HTML tag hierarchies in a column, the column seems to get truncated. The entire code is below.

Can't I store long HTML tags in a column?

Thanks,
Mohan

from bs4 import BeautifulSoup
import sys
import os
import glob
import pandas as pd

class Parse:
 
    def __init__(self):
        self.parse()

    def parse(self):
        try:
            path = "D:\\Python Data Analytics"

            for infile in glob.glob(os.path.join(path, "*.html")):
                print infile
                soup = BeautifulSoup(open(infile, "r").read())

                # all tags that carry an 'id' attribute
                data = soup.findAll(True, {'id': True})

                df = pd.DataFrame(columns=['No', 'Tag'])

                for attribute in data:
                    df = df.append(pd.DataFrame([dict(No=1, Tag=attribute)]),
                                   ignore_index=True)

                print df
                html5report = df.to_html().replace(
                    '<table border="1" class="dataframe">',
                    '<table class="table table-striped">')
                htmlreporter = '''
<html>
    <head>
        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
        <style>body{ margin:0 100; background:whitesmoke; }</style>
    </head>
    <body>
       <h1>HTML 5 Report</h1>

        ''' + html5report + '''
    </body>
</html>'''
                f = open('D:\\Python Data Analytics\\Report\\report.html', 'w')
                f.write(htmlreporter)
                f.close()

        except IOError as e:
            print "I/O error({0}): {1}".format(e.errno, e.strerror)
        except Exception:
            print "Unexpected error:", sys.exc_info()[0]
 

if __name__ == '__main__':
    instance = Parse()
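For what it's worth, the truncation here is most likely pandas' display limit rather than the stored data: repr and to_html clip wide cells at display.max_colwidth (50 characters by default), while the underlying column keeps the full string. A sketch of the likely fix:

```python
import pandas as pd

long_tag = '<div>' + 'x' * 200 + '</div>'
df = pd.DataFrame({'No': [1], 'Tag': [long_tag]})

# The data itself is intact; only the rendering is clipped.
assert len(df.loc[0, 'Tag']) == len(long_tag)

# Lift the 50-character default so to_html() emits the whole cell
# (older pandas versions use -1 instead of None for "no limit").
pd.set_option('display.max_colwidth', None)
html = df.to_html()
```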

Thomas Haslwanter | 5 Apr 11:47 2015

pandas' slicing convention

I am wondering if the syntax used by pandas for slicing should be slightly modified.
For example, consider the dataframe

   df = pd.DataFrame(np.random.randn(3, 3))

For multiple rows, slicing works just like in numpy arrays:

  df[0:2]

However, if I want to address a single row
  df[1]

produces a KeyError, and I have to use

  df[1:2]

This is inconsistent with e.g.

  df.iloc[0,1]

where I can address a single element.

Would changing the syntax such that "df[1]" produces the second row cause any other problems? I think it would really facilitate the use of pandas' DataFrames!
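For completeness, the positional single-row access that already exists is spelled with iloc; plain df[...] addresses columns (or row slices), which is where the inconsistency comes from. A small sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 3), columns=list('abc'))

# df['a'] selects a column and df[0:2] is a row slice; a bare
# positional row lookup goes through iloc instead.
row = df.iloc[1]       # second row as a Series
frame = df.iloc[1:2]   # second row kept as a one-row DataFrame
```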
