Juarez Bochi | 7 Mar 15:39 2015

Request for feedback: Sandals (SQL + Pandas)

Hi,

When I stumbled upon pandas' comparison with SQL, I thought it would be an interesting exercise to translate SQL queries into pandas operations automatically. Besides being an interesting problem, it could be useful for users who are new to pandas but familiar with SQL.

I did some research and found a library called pandasql that does roughly what I wanted, but it uses sqlite behind the scenes, which I believe is not the best approach.

Last weekend I decided to give it a try and developed a new library that I called sandals. It's still just a proof of concept with only ~200 LOC, and it's definitely not feature-complete, but some queries already work:

>>> import pandas as pd
>>> import sandals
>>> tips = pd.read_csv("tests/data/tips.csv")
>>> sandals.sql("SELECT total_bill, sex FROM tips LIMIT 5", locals())
   total_bill     sex
0       16.99  Female
1       10.34    Male
2       21.01    Male
3       23.68    Male
4       24.59  Female
>>> sandals.sql("SELECT * FROM tips WHERE time = 'Dinner' AND tip > 5.00 LIMIT 3", locals())
    total_bill   tip   sex smoker  day    time  size
23       39.42  7.58  Male     No  Sat  Dinner     4
44       30.40  5.60  Male     No  Sun  Dinner     4
47       32.40  6.00  Male     No  Sun  Dinner     4
>>> sandals.sql("SELECT tips.day, AVG(tip), COUNT(*) FROM tips GROUP BY tips.day;", locals())
            tip  day
day
Fri    2.734737   19
Sat    2.993103   87
Sun    3.255132   76
Thur   2.771452   62

The implementation still lacks support for joins, UNION, subqueries/nested queries, ORDER BY, etc. It also lacks support for arithmetic operators (+, -, *, /, %, etc.), math/string functions, and user-defined functions. (What other features would be interesting to add?)
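For readers new to the translation, the GROUP BY query shown earlier corresponds roughly to the following pandas operations. (The tiny DataFrame here is a made-up stand-in for the tips data, not the real dataset.)

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset
tips = pd.DataFrame({
    "day": ["Fri", "Sat", "Sat", "Sun"],
    "tip": [2.5, 3.0, 3.5, 4.0],
})

# Roughly: SELECT day, AVG(tip), COUNT(*) FROM tips GROUP BY day
out = tips.groupby("day")["tip"].agg(["mean", "size"])
```

This is the kind of mapping sandals would perform automatically.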

I would really appreciate some feedback. Is this something that could be useful? 

Thanks
Juarez Bochi

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org.
For more options, visit https://groups.google.com/d/optout.
Richard Stanton | 6 Mar 19:37 2015

Odd display of x scale in plot of dataframe

When I run the following code in an IPython notebook:


df = pd.DataFrame({'year': [1900, 1901, 1902], 'x1' : [3, 4, 5], 'x2' : [6, 7, 8]}).set_index('year')

df.plot(use_index=True)


the graph appears fine, but the x-axis scale looks really strange. It shows the numbers 0, 0.5, 1.0, 1.5, and 2.0, and below the 2.0 is the notation “+1.9e3”.


While the graph is correct, it would look a lot better if it just showed the numbers 1900, 1901 and 1902, rather than making the reader add 1900 to everything! 


By the way, this is with pandas 0.15.2.


Is there a simple way to change the display to make it look more standard? And am I doing something wrong here? This output seems a very odd default display.
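One possible workaround (a sketch, not necessarily the only fix): pandas delegates tick formatting to matplotlib, whose ScalarFormatter produces the offset notation; disabling the offset on the axes should restore plain labels:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import pandas as pd

df = pd.DataFrame({'year': [1900, 1901, 1902],
                   'x1': [3, 4, 5],
                   'x2': [6, 7, 8]}).set_index('year')

ax = df.plot(use_index=True)
# Turn off matplotlib's offset notation (the "+1.9e3" label),
# so ticks display as 1900, 1901, 1902
ax.ticklabel_format(useOffset=False, axis='x')
```

Setting the rcParam `axes.formatter.useoffset = False` should have the same effect globally.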


Thanks.


Richard Stanton

Phillip Cloud | 6 Mar 18:11 2015

Into project renamed to odo

It’s morphin’ time!

On behalf of the blaze and odo team I’m happy to announce the release of odo, formerly known as into.

Odo is a library for moving data between different containers. For example, to write a list of tuples to a CSV file, we’d write:

In [1]: from odo import odo

In [2]: data = [('Alice', 1), ('Bob', 2)]

In [3]: csv = odo(data, 'data.csv', dshape='var * {name: string, id: int64}')

In [4]: !cat data.csv
name,id
Alice,1
Bob,2

odo is capable of much more. Check out the docs over at odo.readthedocs.org

Here are the release notes:

  • Renamed “into” to “odo”
  • Swapped order of source and target arguments.
    • Was into(target, source), now: odo(source, target)
  • into still available for backwards compatibility
  • new docs location: odo.readthedocs.org
  • conda install -c blaze odo gets you the latest version of the package

As always, feedback is welcome! Please raise issues, ask questions, and suggest features to your heart’s content. You can do so over at https://github.com/ContinuumIO/odo/issues.

Enjoy!


Best,
Phillip Cloud

Stefane Fermigier | 1 Mar 09:11 2015

Last call for the PyData Paris 2015 CFP

Hi,

The CFP for the PyData Paris conference, which will take place on April 3rd, is open until March 3rd. We will notify the selected speakers shortly afterwards:


Let me also remind you of the early-bird discount, which is available until this Tuesday:


If you'd like to sponsor the event, you can find a sponsorship presentation here:

http://pydataparis.joinux.org/static/paris2015/pdf/pydata-sponsorship.pdf

A+.

  S.

Liora S | 26 Feb 20:01 2015

Pandas 0.15.2 much slower than 0.13 and 0.12 for timestamps

I just upgraded from 0.13 (locally) and 0.12 (on Heroku) to 0.15.2 and I am finding that my datetime processing is taking much, much longer (I wasn't able to time it, but I'd say at least 5x, probably closer to 10x). The lines that have slowed down dramatically are these:

df[df.picked_up.notnull() & df.pickup_by.notnull()].apply(lambda x: x["picked_up"] < x["pickup_by"], axis=1)
df[df.time_bol.notnull() & df.delivered.notnull()].apply(lambda x: x["time_bol"]  <  x["delivered"] + datetime.timedelta(seconds = 3600*24), axis=1)

Where the columns:
"picked_up" is of type pandas.tslib.Timestamp
"pickup_by" is of type datetime.datetime
"time_bol" is of type pandas.tslib.Timestamp
"delivered" is of type pandas.tslib.Timestamp

The operation is being performed on 23,189 rows, some of which have blank values in one or more of the columns.

(PS: The reason I didn't do it the following way is that pandas gives me TypeErrors.)
df[df.picked_up.notnull() & df.pickup_by.notnull()].picked_up < df[df.picked_up.notnull() & df.pickup_by.notnull()].pickup_by

Am I doing something wrong? How can I make this operation faster?
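A hedged sketch of a vectorized alternative: if the TypeErrors come from mixing Timestamp columns with plain datetime.datetime objects, coercing both sides to datetime64 with pd.to_datetime usually lets the comparison run without apply (and avoids the slow object-dtype path). The sample data here is made up to mirror the columns described:

```python
import pandas as pd

# Hypothetical sample mirroring the columns described above
df = pd.DataFrame({
    "picked_up": pd.to_datetime(["2015-01-01 10:00", None, "2015-01-03 09:00"]),
    "pickup_by": ["2015-01-01 12:00", "2015-01-02 12:00", None],  # object dtype
})

mask = df.picked_up.notnull() & df.pickup_by.notnull()
sub = df.loc[mask]

# Coerce the object-dtype side to datetime64 so the comparison
# stays vectorized instead of falling back to row-by-row apply
result = sub["picked_up"] < pd.to_datetime(sub["pickup_by"])
```

For the 24-hour grace period in the second line, `pd.Timedelta(days=1)` can be added to the right-hand side in the same vectorized fashion.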

Ross Kravitz | 20 Feb 19:42 2015

pandas.io.gbq bug in read_gbq, and high memory usage

Is anyone else using pandas.io.gbq?

In line 188 of gbq.py, there is an error: the condition should be while not query_reply.get('jobComplete') instead of while 'jobComplete' not in query_reply, because the query_reply dict will contain 'jobComplete' = False while the query runs.
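A minimal illustration of why the membership test exits the loop too early (the dict literal here is a simplified stand-in for the actual API response):

```python
# What BigQuery returns while the job is still running:
# the key exists, but its value is False
query_reply = {'jobComplete': False}

# Old condition: already False, so the wait loop exits immediately
old_keeps_waiting = 'jobComplete' not in query_reply

# Proposed condition: True until the job actually completes,
# so the loop keeps polling
new_keeps_waiting = not query_reply.get('jobComplete')
```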

Second, this process seems to be very memory-inefficient. I have a 750 MB file that I'm pulling which takes at least 9 GB of memory before I kill the process. There seem to be multiple copies of the data being made in the method. In pandas 0.13.2, the read_gbq method was usable, and now it isn't. Does anyone know what changes have been made?

Thanks,

Ross

Dave Hirschfeld | 25 Feb 18:35 2015

`AbstractHolidayCalendar` caching gives wrong results!

I created a subclass of `AbstractHolidayCalendar` and defined two *instances* of the subclass with different holiday rules. Because of caching at the *class* level this doesn't work as expected.

See https://github.com/pydata/pandas/issues/9552 for details.

I think that it should be possible to do what I'm after and that the current behaviour is somewhat unexpected. I'm happy to put in a fix if it's agreed that the caching should be moved to the instance level.
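For readers unfamiliar with the pitfall, here is a minimal, made-up illustration of class-level caching (DemoCalendar is a hypothetical class, not pandas' actual implementation):

```python
class DemoCalendar:
    _cache = None  # class-level attribute, shared by every instance

    def __init__(self, rules):
        self.rules = rules

    def holidays(self):
        # Caching on the class means the first instance's result is
        # silently reused by all other instances, whatever their rules
        if DemoCalendar._cache is None:
            DemoCalendar._cache = list(self.rules)
        return DemoCalendar._cache

a = DemoCalendar(["new_year"])
b = DemoCalendar(["new_year", "labor_day"])
first = a.holidays()
second = b.holidays()  # returns a's holidays, not b's
```

Moving the cache to an instance attribute (set in `__init__`) would give each instance its own result.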


Thanks,
Dave

Damian Avila | 23 Feb 22:18 2015

ANN: Bokeh 0.8.1 released

Hi all,

We are excited to announce the release of version 0.8.1 of Bokeh, an interactive web plotting library for Python... and other languages!  This minor release includes many bug fixes and docs improvements:

* Fixed HoverTool
* Fixed Abstract Rendering implementation and docs
* Fixed Charts gallery and docs
* Removed leftovers from the old plotting API implementation
* Some other minor docs fixes

See the CHANGELOG for full details.

If you are using Anaconda/Miniconda, you can install with conda:

       conda install bokeh

Alternatively, you can install with pip:

       pip install bokeh

Developer builds are also now made available to get features in the hands of interested users more quickly. See the Developer Builds section in the documentation for more details.

BokehJS is also available via CDN for use in standalone JavaScript applications:


Finally, BokehJS is also installable with the Node Package Manager at https://www.npmjs.com/package/bokehjs

Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/bokeh/bokeh

Questions can be directed to the Bokeh mailing list: bokeh-aihBOO89d3ITaNkGU808tA@public.gmane.org

Cheers!

--
Damián Avila
Continuum Analytics

Leo | 23 Feb 13:50 2015

ANN: Update on pandaSDMX - New mailing list for SDMX statistical data and metadata exchange in Python

Hi,

pandaSDMX is a library to ease the retrieval and acquisition of
statistical data and metadata disseminated according to the global
SDMX standard. Since the initial release (v0.1) in September 2014,
pandaSDMX has been redesigned and rewritten from the ground up to make
it modular, robust and extensible. The code base has more than
doubled and now consists of several subpackages; v0.1 was a single
module with a monolithic class and a handful of specialised methods.

The forthcoming v0.2 will expose SDMX data and metadata through a
pythonic representation of the SDMX information model. Being a layer
above the XML data file, the information model classes are the entry
points for writers to arbitrary output formats such as pandas, Excel,
databases or HTML. Readers may accommodate various SDMX transmission
formats. Currently there is just one reader (for SDMXML 2.1) and one
writer (for pandas series and dataframes). The test suite is growing
(it needs nosetests). Sphinx docs are planned.

I expect v0.2 to be released in a couple of weeks. It will be much
more robust and feature-rich than v0.1. It will allow full access to
data and significant parts of metadata from Eurostat and the European
Central Bank (ECB).

Development continues at https://github.com/dr-leo/pandaSDMX

Developers and testers are urgently needed. Those interested are
cordially invited to join the mailing list at
https://groups.google.com/forum/?hl=en#!forum/sdmx-python

Regards

Leo


Raniere Silva | 20 Feb 14:01 2015

Export DataFrame to HTML with rowspan

Hi,

I'm searching for how I can export DataFrame to HTML
using rowspan [1] for duplicated values in one column.

Thanks in advance,
Raniere

[1] http://www.w3.org/TR/html401/struct/tables.html#h-11.2.6.1

================================ Small Example ================================

$ cat demo.csv
method,database,time
a,x,0
a,y,0
b,x,0
b,y,0
$ cat demo.py
import pandas

data = pandas.read_csv("demo.csv")
print(data.to_html(index=False))
$ python2 demo.py

================================ Actual Results ================================

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>method</th>
      <th>database</th>
      <th>time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td> a</td>
      <td> x</td>
      <td> 0</td>
    </tr>
    <tr>
      <td> a</td>
      <td> y</td>
      <td> 0</td>
    </tr>
    <tr>
      <td> b</td>
      <td> x</td>
      <td> 0</td>
    </tr>
    <tr>
      <td> b</td>
      <td> y</td>
      <td> 0</td>
    </tr>
  </tbody>
</table>

============================== "Expected" Results ==============================

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>method</th>
      <th>database</th>
      <th>time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="2"> a</td>
      <td> x</td>
      <td> 0</td>
    </tr>
    <tr>
      <td> y</td>
      <td> 0</td>
    </tr>
    <tr>
      <td rowspan="2"> b</td>
      <td> x</td>
      <td> 0</td>
    </tr>
    <tr>
      <td> y</td>
      <td> 0</td>
    </tr>
  </tbody>
</table>
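As far as I know, pandas' to_html has no rowspan option for merging duplicate column values, so one possible approach is a small hand-rolled renderer. This is an illustrative sketch only (to_html_rowspan is a hypothetical helper, and it assumes the spanned column is already sorted, as in the demo data):

```python
import pandas as pd

def to_html_rowspan(df, span_col):
    """Render df as an HTML table, merging runs of duplicate values
    in span_col with rowspan. Assumes df is sorted by span_col."""
    cols = list(df.columns)
    # Size of each run of duplicates, aligned to the original index
    run_sizes = df.groupby(span_col, sort=False)[span_col].transform("size")
    body = []
    emitted = set()
    for idx, row in df.iterrows():
        cells = []
        if row[span_col] not in emitted:
            # First row of a run: emit the spanning cell once
            emitted.add(row[span_col])
            cells.append('<td rowspan="%d">%s</td>' % (run_sizes[idx], row[span_col]))
        cells += ['<td>%s</td>' % row[c] for c in cols if c != span_col]
        body.append('<tr>%s</tr>' % ''.join(cells))
    head = ''.join('<th>%s</th>' % c for c in cols)
    return ('<table border="1" class="dataframe"><thead><tr>%s</tr></thead>'
            '<tbody>%s</tbody></table>' % (head, ''.join(body)))

data = pd.DataFrame({"method": ["a", "a", "b", "b"],
                     "database": ["x", "y", "x", "y"],
                     "time": [0, 0, 0, 0]})
html = to_html_rowspan(data, "method")
```

Post-processing the output of data.to_html() with an HTML parser would be another route, at the cost of an extra dependency.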

Damián Avila | 20 Feb 13:21 2015

[pydata] ANN: SciPy Latin América 2015 - Call for Proposals

Call for Proposals

SciPy Latin América 2015, the third annual Scientific Computing with Python Conference, will be held this May 20-22 in Posadas, Misiones, Argentina.

SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conferences allow participants from academic, commercial, and governmental organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development.

Proposals are now being accepted for SciPy Latin América 2015.

Presentation content can be at a novice, intermediate, or advanced level. Talks will run 30-40 min and hands-on tutorials will run 100-120 min. We also accept proposals for posters. For more information about the different types of proposal, see the "Different types of Communication" section below.

How to Submit?

  1. Register for an account on http://conf.scipyla.org/user/register
  2. Submit your proposal at http://conf.scipyla.org/activity/propose

Important Dates

  • April 6th: Talks, poster, tutorial submission deadline.
  • April 20th: Notification Talks / Posters / Tutorial accepted.
  • May 20th-22nd: SciPy Latin América 2015.

Different types of Communication

Talks: These are the traditional talk sessions given during the main conference days. They're mostly 30 minutes long with 5 min for questions. If you think you have a topic but aren't sure how to propose it, contact our program committee and we'll work with you. We'd love to help you come up with a great proposal.

Tutorials: We are looking for tutorials that can grow this community at any level. We aim for tutorials that will advance Scientific Python, advance this community, and shape the future. They are 100-120 minutes long, but if you think you need more than one slot, you can split the content and submit two self-contained proposals.

Posters: The poster session provides a more interactive, attendee-driven presentation than the speaker-driven conference talks. Poster presentations have fostered extensive discussions on the topics, with many that have gone on much longer than the actual "session" called for. The idea is to present your topic on poster board and as attendees mingle through the rows, they find your topic, read through what you've written, then strike up a discussion on it. It's as simple as that. You could be doing Q&A in the first minute of the session with a group of 10 people.

Lightning Talks: Want to give a talk, but don't have enough material for a full one? These are talks of at most 5 minutes, given in quick succession in the main hall. No need to fill the whole slot, though!


--
The SciPy LA 2015 Program Committee
