Devasena Prasad | 19 Feb 17:46 2016

Mentor for GSoC 2016 project

Hello,

I would like to volunteer as a mentor for the pandas GSoC project.

Could anyone please let me know the process of getting involved and whom to contact?

--
Thanks and Regards,
Devasena

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
Konrad Gołuchowski | 12 Feb 13:07 2016

read_json, 'split' orient and direct numpy

Hello,

I just found out that read_json with the 'split' orient doesn't support direct numpy decoding: the setting is explicitly forced to False in FrameParser. Is this needed?

For testing purposes I removed that override and it seems to work. Why is direct numpy reading disabled only for the 'split' orient, and is the restriction really needed?

Thanks!
Konrad Goluchowski

questions anon | 18 Feb 00:35 2016

resample using given start and end month

I have monthly time-series temperature data and I would like to resample the data to find the mean for the November-to-March period, skipping all other months.

I can work out how to resample monthly, quarterly, annually, etc., and how to anchor to a particular month, but I can't find any examples of how to resample over the five months starting in November and ending in March.
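One possible approach (a sketch on made-up data, assuming a monthly DatetimeIndex) is to filter down to the winter months and then group by a season label, rolling November and December forward into the following year's season:

```python
import numpy as np
import pandas as pd

# hypothetical monthly temperature series
idx = pd.date_range('2014-01-01', '2015-12-01', freq='MS')
temps = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# keep only November through March
winter = temps[temps.index.month.isin([11, 12, 1, 2, 3])]

# label each month with the year its season ends in:
# Nov and Dec roll forward into the next year's winter
season = winter.index.year.to_numpy() + (winter.index.month.to_numpy() >= 11)

winter_means = winter.groupby(season).mean()
```

Note that the first and last labels cover partial seasons (here only Jan-Mar 2014 and Nov-Dec 2015), so you may want to drop them.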

Any feedback will be greatly appreciated. 

Halitim Bachir | 14 Feb 23:07 2016

installing pandas v0.18.0rc1

Can anyone help with this error message, please?

PS C:\Users\Bachir> conda install pandas=v0.18.0rc1 -c pandas
Traceback (most recent call last):
  File "C:\anaconda\Scripts\conda-script.py", line 4, in <module>
    sys.exit(main())
  File "C:\anaconda\lib\site-packages\conda\cli\main.py", line 140, in main
    from conda.cli import main_search
  File "C:\anaconda\lib\site-packages\conda\cli\main_search.py", line 10, in <module>
    from conda.misc import make_icon_url
  File "C:\anaconda\lib\site-packages\conda\misc.py", line 19, in <module>
    from conda.api import get_index
  File "C:\anaconda\lib\site-packages\conda\api.py", line 10, in <module>
    from conda.fetch import fetch_index
  File "C:\anaconda\lib\site-packages\conda\fetch.py", line 24, in <module>
    from conda.connection import CondaSession, unparse_url, RETRIES
  File "C:\anaconda\lib\site-packages\conda\connection.py", line 23, in <module>
    import requests
ImportError: No module named requests

Brendan Barnwell | 14 Feb 22:50 2016

Can you plot computed values without adding them as columns?

The DataFrame plot methods are really nice for things like df.plot.bar(x='blah', y='other'). But I often find myself wanting to do something like df.plot.bar(x='blah', y=df.other.cumsum()), that is, to plot values that are some sort of transform of a column but don't actually exist as a column.

I can of course do this with `df['otherCumsum'] = df.other.cumsum()` and then plot that, but it clutters my DataFrame with columns I may not want for any purpose other than plotting. When I try `df.plot.bar(x='blah', y=df.other.cumsum())`, I get errors like "indices out of bounds", suggesting it's trying to treat the computed data as a column name.

So, is there a way to plot computed values without actually adding them as columns in the DataFrame?  Has anyone else thought about doing this?  It would be nice if the x and y parameters to plot methods could accept not just column names, but actual Series objects with the right index (or a sequence where each item is either a column name or such a Series object).  It could raise an error if the index wasn't right, but for the common case where it is (e.g., plotting the sum of two columns, cumsum, Z-score, etc.) it would be very convenient.
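One workaround that avoids mutating the original frame (a sketch, using made-up column names) is `DataFrame.assign`, which returns a new DataFrame with the computed column, so the original stays clean and the plot call can be chained directly:

```python
import pandas as pd

df = pd.DataFrame({'blah': ['a', 'b', 'c'], 'other': [1, 2, 3]})

# assign() returns a copy with the extra column; df itself is untouched
plot_df = df.assign(otherCumsum=df['other'].cumsum())
# plot_df.plot.bar(x='blah', y='otherCumsum')  # plot without cluttering df
```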

Jeff Reback | 14 Feb 01:53 2016

ANN: pandas v0.18.0rc1 - RELEASE CANDIDATE

Hi,

I'm pleased to announce the availability of the first release candidate of Pandas 0.18.0.
Please try this RC and report any issues here: Pandas Issues
We will be releasing officially in 1-2 weeks or so.

**RELEASE CANDIDATE 1**

This is a major release from 0.17.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

  • pandas >= 0.18.0 will no longer support compatibility with Python version 2.6 GH7718 or version 3.3 GH11273
  • Moving and expanding window functions are now methods on Series and DataFrame similar to .groupby like objects, see here.
  • Adding support for a RangeIndex as a specialized form of the Int64Index for memory savings, see here.
  • API breaking .resample changes to make it more .groupby like, see here
  • Removal of support for positional indexing with floats, which was deprecated since 0.14.0. This will now raise a TypeError, see here
  • The .to_xarray() function has been added for compatibility with the xarray package see here.
  • Addition of the .str.extractall() method, and API changes to the .str.extract() and .str.cat() methods
  • pd.test() top-level nose test runner is available GH4327

See the Whatsnew for much more information. 

The best way to get this is to install via conda from our development channel. Builds for osx-64, linux-64, and win-64 for Python 2.7 and Python 3.5 are all available.

conda install pandas=v0.18.0rc1 -c pandas

Thanks to all who made this release happen. It is a very large release!

Jeff


Paul Hobson | 11 Feb 04:27 2016

Help removing a for-loop

Hey everyone,

I'm cleaning up an old algorithm I implemented a few years ago, and I'm trying to remove the last remaining for-loop that I didn't convert to some sort of apply() when I switched from numpy arrays to DataFrames.

For starters, here's the basic setup:

import numpy
import pandas

cohn = pandas.DataFrame({
    'A': numpy.array([6.0, 1.0, 2.0, 2.0, numpy.nan]),
    'B': numpy.array([0.0, 7.0, 12.0, 22.0, numpy.nan]),
    'PE': numpy.array([numpy.nan, numpy.nan, numpy.nan, numpy.nan, 0.0]),
})

# print(cohn)
#     A   B  PE
# 0   6   0 NaN
# 1   1   7 NaN
# 2   2  12 NaN
# 3   2  22 NaN
# 4 NaN NaN   0  # <-- this is just a seed value (the row will be dropped)
The current implementation loops through the DataFrame backwards and recomputes the "PE" column based on the current A and B values and the previously computed PE value. It looks like this:
expected = cohn.copy()
for j in expected.index[:-1][::-1]:
    prev_PE = expected.loc[j+1, 'PE']
    A, B = expected.loc[j, 'A'], expected.loc[j, 'B']
    new_PE = prev_PE + (1 - prev_PE) * A / (A + B)
    expected.loc[j, 'PE'] = new_PE

# print(expected.dropna())
#    A   B        PE
# 0  6   0  1.000000
# 1  1   7  0.312500
# 2  2  12  0.214286
# 3  2  22  0.083333
I've tried to just apply the following function to the reversed DataFrame, but it doesn't work properly, since the PE column isn't updated as apply moves through:
def pe(row, fulldf):
    idx = row.name
    if idx == fulldf.index.max():
        prev_PE = 0
    else:
        prev_PE = fulldf.loc[idx+1, 'PE']
    A, B = row['A'], row['B']
    new_PE = prev_PE + (1 - prev_PE) * A / (A + B)
    return new_PE

cohn2 = cohn.copy().iloc[0:-1][::-1]
cohn2['PE'] = cohn2.apply(lambda row: pe(row, cohn2), axis=1)

# print(cohn2[::-1])
#    A   B        PE
# 0  6   0       NaN
# 1  1   7       NaN
# 2  2  12       NaN
# 3  2  22  0.083333
Any thoughts on whether it's currently possible to remove the explicit loop would be much appreciated. The implementation works as-is, and performance isn't a problem; I'm just curious and challenged by this, I guess.
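One way the loop can be removed (a sketch against the four data rows above, with the seed row dropped): the recurrence new_PE = prev_PE + (1 - prev_PE) * A/(A+B) rearranges to 1 - new_PE = (1 - prev_PE) * (1 - A/(A+B)), so 1 - PE is just a reversed cumulative product:

```python
import numpy as np
import pandas as pd

cohn = pd.DataFrame({
    'A': [6.0, 1.0, 2.0, 2.0],
    'B': [0.0, 7.0, 12.0, 22.0],
})

# per-row probability, then a reversed cumulative product of (1 - p)
p = cohn['A'] / (cohn['A'] + cohn['B'])
cohn['PE'] = 1.0 - (1.0 - p)[::-1].cumprod()[::-1]
```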

Cheers,
-Paul

Miki Tebeka | 8 Feb 16:20 2016

Median number of hourly logins per day

Greetings,

I have a CSV with site id and login time stamp, something like:

site,time
1,2016-01-29 00:47:00
1,2016-01-29 00:51:00
2,2016-01-29 00:55:00
1,2016-01-29 00:57:00

I'd like to generate a bar chart where the x axis is the hour and the bars are the median number of logins for that hour, with a different bar per site.

I have the following code that seems to work; however, I'm wondering if there's a more efficient or elegant way to do this.
df = pd.read_csv('logins.csv', parse_dates=['time'])
df['date'] = df['time'].dt.date
df['hour'] = df['time'].dt.hour

hdf = pd.DataFrame(index=np.arange(24))  # hourly median, columns are sites
for site in df['site'].unique():
    sdf = df[df['site'] == site]
    counts = sdf.groupby(['date', 'hour'], as_index=False).count()
    hourly = counts.groupby('hour').median()['time']
    hdf['site %s' % site] = hourly

hdf.fillna(0, inplace=True)
ax = hdf.plot(kind='bar')
ax.set_xticklabels(np.arange(24), rotation=45)
plt.title('Median Hourly Logins')
plt.show()
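One possibly more concise version of the same computation (a sketch on made-up data): count logins per (site, date, hour) with a single groupby, take the median across dates, and unstack the sites into columns, which avoids the per-site loop:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'site': [1, 1, 1, 1, 1, 1],
    'time': pd.to_datetime([
        '2016-01-29 00:47', '2016-01-29 00:51',
        '2016-01-30 00:05', '2016-01-30 00:10',
        '2016-01-30 00:15', '2016-01-30 00:20',
    ]),
})
df['date'] = df['time'].dt.date
df['hour'] = df['time'].dt.hour

# logins per (site, date, hour), then the median over dates per (site, hour)
counts = df.groupby(['site', 'date', 'hour']).size()
hdf = (counts.groupby(level=['site', 'hour']).median()
             .unstack('site')
             .reindex(np.arange(24))
             .fillna(0))
```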

Thanks,
--
Miki

Raniere Silva | 8 Feb 14:53 2016

Google Summer of Code 2016, GSoC2016

Hi all,

Since pandas is a NumFOCUS-sponsored project, and NumFOCUS will apply to be a mentoring organization for GSoC, I want to know (1) whether pandas is planning to apply this year, and (2) whether it wants to apply under the NumFOCUS umbrella.

pandas is welcome and encouraged to apply as a separate mentoring organization directly with Google. We're happy to help you fill out your application and improve your ideas page, as well as link to your page to help students find you. We may also be able to serve as a reference for you.
It is totally fine if you want to use the NumFOCUS umbrella org
as a backup plan in case you don't get selected and we do!

Cheers,
Raniere

Bryan Van de Ven | 4 Feb 23:05 2016

ANN: Bokeh 0.11.1 released

Hi all,

I am pleased to announce that a new point release of Bokeh, version 0.11.1, is
now available. Installation instructions can be found in the usual location:

	http://bokeh.pydata.org/en/latest/docs/installation.html

This release focused on providing bug fixes, small features, and documentation 
improvements. Highlights include:

* documentation:
  - instructions for running Bokeh server behind an SSL terminated proxy
  - Quickstart update and cleanup
* bugfixes:
  - notebook comms handles work properly
  - MultiSelect works
  - Oval legend renders correctly
  - Plot title orientation setting works
  - Annulus glyph works on IE/Edge
* features:
  - preview of new streaming API in OHLC demo
  - undo/redo tool add, reset tool now resets plot size
  - "bokeh static" and "bokeh sampledata" commands
  - can now create Bokeh apps directly from Jupyter Notebooks
  - headers and content type now configurable on AjaxDataSource
  - new network config options for "bokeh serve"

For full details, refer to the CHANGELOG in the GitHub repository, and the full
release notes (http://bokeh.pydata.org/en/latest/docs/releases/0.11.1.html)

Issues, enhancement requests, and pull requests can be made on the Bokeh
Github page: https://github.com/bokeh/bokeh

Full documentation is available at http://bokeh.pydata.org/en/0.11.1

Questions can be directed to the Bokeh mailing list: bokeh@...

Thanks, 

Bryan 


Erol Merdanović | 4 Feb 12:46 2016

Correct merging of dataframes with overwrite and append

Hello

I created two DataFrames:

df1 = pd.DataFrame([0.1, 0.2], [1, 2])
df2 = pd.DataFrame([0.3, 0.4], [1, 4])

I want to merge them into one DataFrame where values are overwritten (if the index is the same) and appended (if the index does not exist).

1. I tried with update

df1.update(df2)
df1
     0
1  0.3
2  0.2

2. Then I tried with combine_first

df1.combine_first(df2)
     0
1  0.1
2  0.2
4  0.4

My expected result is:
     0
1  0.3
2  0.2
4  0.4

I'm able to accomplish this with:

df1.update(df2)
df1 = df1.combine_first(df2)

But is there a better way?
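One possibility (a sketch on the frames above): since combine_first prefers the calling frame's values, calling it on df2 rather than df1 does the overwrite and the append in a single step:

```python
import pandas as pd

df1 = pd.DataFrame([0.1, 0.2], index=[1, 2])
df2 = pd.DataFrame([0.3, 0.4], index=[1, 4])

# df2's values win where the indexes overlap; everything else is kept
merged = df2.combine_first(df1)
```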
