Damian Avila | 15 Aug 19:47 2014

ANN: Bokeh 0.5.2 released!

On behalf of the Bokeh team, I am very happy to announce the release of Bokeh version 0.5.2!

Bokeh is a Python library for visualizing large and real-time datasets on the web.
Its goal is to provide developers (and domain experts) with the capability to easily create novel and powerful visualizations that extract insight from local or remote (possibly large) data sets, and to easily publish those visualizations to the web for others to explore and interact with.

This release includes many bug fixes and improvements over the recent 0.5.1 release:

  * New Layout system using kiwi.js constraint solver
  * Improved automated testing infrastructure
  * Abstract Rendering testing, server-side downsample fixes and ISO Contours

See the CHANGELOG for full details.

In upcoming releases, you should expect to see new layout capabilities (multiple axes, colorbar axes, better grid plots and improved annotations), new tools, more widgets and more charts, as well as an Object Query API, R language bindings, Blaze integration and cloud hosting for Bokeh apps.

Don't forget to check out the full documentation, interactive gallery, and tutorial at


as well as the new Bokeh IPython notebook nbviewer index (including all the tutorials) at:


If you are using Anaconda, you can install with conda:

    conda install bokeh

Alternatively, you can install with pip:

    pip install bokeh

BokehJS is also available via CDN for use in standalone JavaScript applications:


Issues, enhancement requests, and pull requests can be made on the Bokeh GitHub page: https://github.com/continuumio/bokeh

Questions can be directed to the Bokeh mailing list: bokeh-aihBOO89d3ITaNkGU808tA@public.gmane.org

If you have interest in helping to develop Bokeh, please get involved!

Cheers.

Damián.

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
Adam | 14 Aug 22:18 2014

ConversionIndex for quick transforms in unit systems

Hi,

I recently created a "ConversionIndex" and a template/tutorial that make it easy to create new pandas index classes designed to convert between various unit systems. For example, an index representing length could easily support feet, inches, yards, miles, etc. in a general capacity. I needed this for my spectroscopy package: since spectral data often varies with time, temperature, or solution concentration, I needed a simple factory for churning these out.

I've written a tutorial on how to create a pandas index that supports pressure units (Pascals, Atmospheres, and Bars).

Thought this might be useful to others, and wanted to get it on here so that it becomes Google-able. Please feel free to get in touch with me if you're interested in implementing it and need some help.
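The factor-table core of such an index can be sketched in a few lines (a hypothetical illustration; the names and conversion path here are assumed, not taken from the package):

```python
# Hypothetical sketch: the kind of unit table a ConversionIndex might use
# internally. Everything is converted through a canonical unit (Pascals).
PRESSURE_TO_PASCAL = {"pascal": 1.0, "bar": 1.0e5, "atmosphere": 101325.0}

def convert_pressure(values, from_unit, to_unit):
    """Convert a sequence of pressures from one unit to another via Pascals."""
    factor = PRESSURE_TO_PASCAL[from_unit] / PRESSURE_TO_PASCAL[to_unit]
    return [v * factor for v in values]
```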

Thanks.

josef.pktd | 14 Aug 12:12 2014

pandas master, statsmodels, bool

Could someone please run this code with pandas master?

```
import pandas
from statsmodels import datasets

# load the data and clean it a bit
affairs = datasets.fair.load_pandas()
datas = affairs.exog
# any time greater than 0 is cheating
datas['cheated'] = affairs.endog > 0
# sort by the marriage quality and give meaningful name
# [rate_marriage, age, yrs_married, children,
# religious, educ, occupation, occupation_husb]
datas = datas.sort(['rate_marriage', 'religious'])
num_to_desc = {1: 'awful', 2: 'bad', 3: 'intermediate',
                  4: 'good', 5: 'wonderful'}
datas['rate_marriage'] = datas['rate_marriage'].map(num_to_desc)
num_to_faith = {1: 'non religious', 2: 'poorly religious', 3: 'religious',
                  4: 'very religious'}
datas['religious'] = datas['religious'].map(num_to_faith)
num_to_cheat = {False: 'faithful', True: 'cheated'}
datas['cheated'] = datas['cheated'].map(num_to_cheat)
```

This is part of the following test, which fails on the pythonxy Ubuntu testing builds:

======================================================================
ERROR: statsmodels.graphics.tests.test_mosaicplot.test_mosaic
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/decorators.py", line 146, in skipper_func
    return f(*args, **kwargs)
  File "/build/buildd/statsmodels-0.6.0~ppa18~revno/debian/python-statsmodels/usr/lib/python2.7/dist-packages/statsmodels/graphics/tests/test_mosaicplot.py", line 124, in test_mosaic
    datas['cheated'] = datas['cheated'].map(num_to_cheat)
  File "/usr/lib/pymodules/python2.7/pandas/core/series.py", line 1960, in map
    indexer = arg.index.get_indexer(values)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 1460, in get_indexer
    if not self.is_unique:
  File "properties.pyx", line 34, in pandas.lib.cache_readonly.__get__ (pandas/lib.c:38722)
  File "/usr/lib/pymodules/python2.7/pandas/core/index.py", line 571, in is_unique
    return self._engine.is_unique
  File "index.pyx", line 205, in pandas.index.IndexEngine.is_unique.__get__ (pandas/index.c:4338)
  File "index.pyx", line 234, in pandas.index.IndexEngine._do_unique_check (pandas/index.c:4790)
  File "index.pyx", line 247, in pandas.index.IndexEngine._ensure_mapping_populated (pandas/index.c:4995)
  File "index.pyx", line 253, in pandas.index.IndexEngine.initialize (pandas/index.c:5092)
  File "hashtable.pyx", line 731, in pandas.hashtable.PyObjectHashTable.map_locations (pandas/hashtable.c:12440)
ValueError: Does not understand character buffer dtype format string ('?')
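The failing line boils down to mapping a boolean Series through a dict keyed on bools. A minimal reproduction, stripped of statsmodels (this runs fine on released pandas; only the then-current master raised the error above):

```python
import pandas as pd

# The essence of the failing step: map a boolean Series with a bool-keyed dict.
s = pd.Series([True, False, True])
mapped = s.map({True: 'cheated', False: 'faithful'})
```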


Nathaniel Smith | 13 Aug 23:59 2014

python 3.2 support in patsy

Hi all,

I've just been looking at switching patsy's continuous integration
code over to use conda, which makes it dramatically faster and easier
to work with. But, the problem is that until now patsy has officially
supported python 3.2, and conda doesn't. And I'm pretty reluctant to
claim to support a version of python that I don't actually test on.
And it'd be easy to accidentally break this later on by e.g. using a
u"" literal.

Does anyone care whether patsy continues to support python 3.2? Are
there any downstream projects that use patsy and want to keep python
3.2 support?

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Hernan Avella | 7 Aug 21:25 2014

Selecting data from different ts dataframes in Pandas

Basically, I just want to be able to "bring" data to df2 from another df1, using as a selection criteria the index of df2 + some days.

df1

              Open     High      Low    Close
2005-09-07  1234.50  1238.00  1231.25  1237.00
2005-09-08  1235.00  1242.75  1231.75  1238.75
2005-09-09  1237.50  1250.25  1237.50  1247.75
2005-09-12  1248.75  1251.00  1245.25  1247.00
2005-09-13  1245.25  1246.75  1237.50  1238.00

df2

            Ref
2005-09-07    1
2005-09-08    2
2005-09-09    3

Desired Output = Df2

            Ref  1d.Close  2d.Close  3d.Close
2005-09-07    1   1238.75   1247.75   1247.00
2005-09-08    2   1247.75   1247.00   1238.00
2005-09-09    3   1247.00   1238.00       NaN

This is what I have tried (please don't laugh):

df2['date.value'] = df2.index
df2['+1d.Date'] = df2['date.value'] + timedelta(1)
df2['1d.Close'] = df1['Close'].loc[df2['date.value']]

This approach gives me NaN, but if I use:

df2['1d.Close'] = df1['Close'].loc[df2['2005-09-07']]

This would give me 1238.75, which is correct for the first row of the example. But for some reason it doesn't work inside the formula.

Final Notes:

  • The dates on df2 are not always consecutive
  • The "length" of the timedelta is variable too and not always consecutive.
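One sketch that reproduces the desired output (an assumption, not code from the thread; note that shift(-n) moves by n *rows* of df1, which matches "n days ahead" only when the lookup offsets line up with trading rows rather than calendar days):

```python
import pandas as pd

idx1 = pd.to_datetime(["2005-09-07", "2005-09-08", "2005-09-09",
                       "2005-09-12", "2005-09-13"])
df1 = pd.DataFrame({"Close": [1237.00, 1238.75, 1247.75, 1247.00, 1238.00]},
                   index=idx1)
df2 = pd.DataFrame({"Ref": [1, 2, 3]}, index=idx1[:3])

for n in (1, 2, 3):
    # take the Close n rows ahead in df1, aligned back onto df2's dates
    df2["%dd.Close" % n] = df1["Close"].shift(-n).reindex(df2.index)
```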

Thanks for the help

Hernan Avella | 6 Aug 22:52 2014

Custom time series resampling in Pandas



I have a df with OHLC data in a 1m frequency:

                        Open     High      Low    Close
DateTime
2005-09-06 18:00:00  1230.25  1231.50  1230.25  1230.25
2005-09-06 18:01:00  1230.50  1231.75  1229.25  1230.50
...
2005-09-07 15:59:00  1234.50  1235.50  1234.25  1234.50
2005-09-07 16:00:00  1234.25  1234.50  1234.25  1234.25

I need to do a "custom" resample that fits futures trading hours, where:

  • Every day starts at 18:00:00 of the previous day (Monday starts on Sunday 18:00:00)
  • Every day ends at 16:00:00 of the current day
  • The timestamp should be as of the time of the close, not the open.

After doing the resample, the output for the above df should be:

                        Open     High      Low    Close
DateTime
2005-09-07 16:00:00  1230.25  1235.50  1229.25  1234.25

Where:

  • Open = first (column Open) at 2005-09-06 18:00:00
  • High = max (column High) from 2005-09-06 18:00:00 to 2005-09-07 16:00:00
  • Low = min (column Low) from 2005-09-06 18:00:00 to 2005-09-07 16:00:00
  • Close = last (column Close) at 2005-09-07 16:00:00

I have tried:

  • Changing the rule and base parameters, but it didn't work.
  • Playing with reindex with no success.

I used the following 'how':

conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
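One possible approach (a sketch, assumed rather than taken from the thread): shift every timestamp forward six hours so that each 18:00-to-16:00 session lands on a single calendar date, aggregate per session with the 'how' dict, then stamp each bar with its 16:00 close time.

```python
import pandas as pd

idx = pd.to_datetime(["2005-09-06 18:00", "2005-09-06 18:01",
                      "2005-09-07 15:59", "2005-09-07 16:00"])
df = pd.DataFrame({"Open":  [1230.25, 1230.50, 1234.50, 1234.25],
                   "High":  [1231.50, 1231.75, 1235.50, 1234.50],
                   "Low":   [1230.25, 1229.25, 1234.25, 1234.25],
                   "Close": [1230.25, 1230.50, 1234.50, 1234.25]},
                  index=idx)

conversion = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}

# 18:00 + 6h rolls over to the next calendar day, while 16:00 + 6h stays on
# the same day, so each session maps cleanly to one date.
session = (df.index + pd.Timedelta("6h")).normalize()
out = df.groupby(session).agg(conversion)
out.index = out.index + pd.Timedelta("16h")  # label bars at the session close
```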

Thanks a lot.

Julie Rozenberg | 6 Aug 12:27 2014

Problem with append

I'm working with household surveys from developing countries. I run a model based on the surveys and I store the results in a .h5 file using append.

It usually works well (i.e. I can append the results from different countries) but with a small number of countries I get the following error when appending:
ValueError: cannot match existing table structure for [data1,data2,...] on appending data

The error happens with the model results but I've realized it also happens with some columns of the underlying raw data. And it happens also when I keep only a small number of rows (see attached example).

I have checked the type of each column and they are similar, and I have used convert_objects(convert_numeric=True) but it didn't help.

Any idea where this comes from?
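That ValueError usually means a column's dtype in the new frame differs from the schema of the table already on disk, e.g. a column that came in as strings in one country's file and as floats in another. A hedged sketch of the mismatch check and a coercion fix (column names invented for illustration):

```python
import pandas as pd

df_a = pd.DataFrame({"data1": [1.0, 2.0]})          # float64
df_b = pd.DataFrame({"data1": ["1.0", "2.0"]})      # object dtype, not float64

# HDFStore.append rejects frames whose dtypes differ from the stored table
mismatch = df_a.dtypes["data1"] != df_b.dtypes["data1"]

# coercing before appending makes the schemas agree
df_b["data1"] = pd.to_numeric(df_b["data1"])
```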

Attachment (AFG_GIDD_head.csv): text/csv, 2202 bytes
Attachment (BGD_GIDD_head.csv): text/csv, 1081 bytes
Attachment (debug.py): text/x-python, 562 bytes
'Michael' via PyData | 6 Aug 11:12 2014

shortcut for list of columns

Whenever I need a list of columns (or strings in general),
I find myself writing:

'col1 col2 col3 col4 col5'.split()

instead of

['col1','col2','col3','col4','col5']

It's quite addictive

So I was wondering if this would be worthy to be added into pandas.
To have something like this:

df.ix[ : , 'col1 col2 col3 col4 col5']

What do you think?
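For what it's worth, the split trick already composes with ordinary column selection today, with no pandas change needed:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3], "col4": [4]})

# a list of column labels selects those columns
subset = df["col1 col3".split()]
```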
 

Tejesh Papineni | 31 Jul 03:45 2014

pandas -- datetime index and labeled columns

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df

                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

[6 rows x 4 columns]

df.A gives

2013-01-01   -0.467582
2013-01-02    0.289209
2013-01-03    0.489941
2013-01-04    0.071964
2013-01-05   -0.697187
2013-01-06   -1.073856

but I need only column A, without the date index, like this:

-0.467582 0.289209 0.489941 0.071964 -0.697187 -1.073856
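The usual answer is the .values attribute, which returns the column's underlying NumPy array with no index attached (a sketch):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

a = df["A"].values  # plain ndarray: just the six values, no date index
```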

Hernan Avella | 29 Jul 18:26 2014

Pandas df.at_time NotImplementedError

Hi, I have the following indexed df

Out[9]:
                        Open     High      Low    Close
DateTime
2014-04-25 17:06:00  1860.00  1860.25  1859.75  1860.00
2014-04-25 17:07:00  1860.00  1860.25  1859.75  1860.00
2014-04-25 17:08:00  1860.00  1860.00  1859.75  1860.00
2014-04-25 17:09:00  1860.00  1860.00  1859.75  1859.75
2014-04-25 17:10:00  1859.75  1860.00  1859.75  1859.75
2014-04-25 17:11:00  1860.00  1860.25  1859.75  1860.25
2014-04-25 17:12:00  1860.00  1860.25  1860.00  1860.00
2014-04-25 17:13:00  1860.25  1860.25  1860.00  1860.25
2014-04-25 17:14:00  1860.00  1860.25  1860.00  1860.25
2014-04-25 17:15:00  1860.00  1860.75  1860.00  1860.25

I'm trying to subset the rows at a certain time. I found this procedure, and it works well for what I want:

df1 = df.at_time(time(15, 0), asof=False)

However, since my data might have holes and the specified time might sometimes not exist, I would like to get the closest one. I read about "asof" and it seems to do what I want; however, when I do this

df1 = df.at_time(time(15, 0), asof=True)

I get a big error (see below). Is there something I can do to make this work?

Thanks

Hernan
Python 2.7.6, Pandas 2.0, Enthought Canopy 1.4
------------------------------------------------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-28-3c10b7e963c9> in <module>()
----> 1 df2 = df1.at_time(time(15, 0), asof=True)

C:\Users\Hernan\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\core\generic.pyc in at_time(self, time, asof)
   2767         """
   2768         try:
-> 2769             indexer = self.index.indexer_at_time(time, asof=asof)
   2770             return self.take(indexer, convert=False)
   2771         except AttributeError:

C:\Users\Hernan\AppData\Local\Enthought\Canopy\User\lib\site-packages\pandas\tseries\index.pyc in indexer_at_time(self, time, asof)
   1689 
   1690         if asof:
-> 1691             raise NotImplementedError
   1692 
   1693         if isinstance(time, compat.string_types):

NotImplementedError: 
-----------------------------------------------------------------------------------------------------------------------------------
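Since asof=True simply raises NotImplementedError here, one workaround sketch (an assumption, not an official recipe): build the wanted 15:00 timestamp for each day present in the index, then reindex with forward-fill, which picks the last row at or before that time.

```python
import pandas as pd

idx = pd.date_range("2014-04-25 14:58", periods=5, freq="1min")
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# one 15:00 timestamp per calendar day in the data
wanted = df.index.normalize().unique() + pd.Timedelta("15h")

# method="pad" falls back to the closest earlier row when 15:00 is missing
nearest = df.reindex(wanted, method="pad")
```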

Paul Hobson | 29 Jul 17:33 2014

Weird groupby behavior

Hey gang,

I was trying to help out the asker of this question on SO:


As a result, I'm seeing weird behavior with groupby. The example below demonstrates it:

import pandas as pd

x = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best','d', 'y'],
    ['worst','d', 'y'],
    ['worst','d', 'y'],
    ['best','d', 'z'],
], columns=['a', 'b', 'c'])
groups = x.groupby(by='c')
print(groups.get_group('y'))
#        a  b  c
# 1  worst  b  y
# 3   best  d  y   <-- shouldn't this appear below?
# 4  worst  d  y
# 5  worst  d  y

print(groups.filter(lambda g: g['a'] == 'best'))
#       a  b  c
# 0  best  a  x
# 2  best  c  x
# 6  best  d  z
Basically I'm curious why a result for the "y" group doesn't appear after that filter operation. Any insight would be most appreciated.
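A likely explanation (my reading, not confirmed in the thread): filter() expects the function to return a single True/False per group, and the lambda above returns a whole boolean Series. Reducing it with .any() keeps every group that contains at least one 'best' row, including group 'y':

```python
import pandas as pd

x = pd.DataFrame([
    ["best", "a", "x"], ["worst", "b", "y"], ["best", "c", "x"],
    ["best", "d", "y"], ["worst", "d", "y"], ["worst", "d", "y"],
    ["best", "d", "z"],
], columns=["a", "b", "c"])

# one scalar bool per group: does the group contain any 'best' row?
kept = x.groupby("c").filter(lambda g: (g["a"] == "best").any())
```

Here every group ('x', 'y', and 'z') contains at least one 'best' row, so all seven rows survive the filter.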
-paul

