Sorry for not posting earlier, post-conference InboxInfinity blues and all that...
The BoF did go as planned, and it was a good discussion, mostly following the tentative agenda outlined here:
Various folks were kind enough to take notes during the conversation on an Etherpad instance:
For the sake of completeness and future reference, below I'm including a copy of the notes in this email.
- we probably needed a longer slot than 45 minutes to have a chance to dig in a little deeper.
- it would have been more productive if a focused numpy sprint had also been planned, so that there could be more structured follow-up on the ideas that came up.
It would be great to hear from others who were present at the conference. In particular, Chris Barker brought up a number of things regarding datetime and planned on following up during the sprints, but I'm not sure what ended up happening.
Notes from BoF:
1:30, July 19, 2014
Working with topics on this page:
chuck: where do we go from here? -- what is the role of numpy now?
Generalized ufuncs -- still some more to do -- (LA stuff - norms)
- some ufuncs don't implement array interface -- which are those -- sprint topic?
Implementation of @ (matrix multiplication)
- will be in Python 3.5, ~18 months away
- no work started yet -- have to make sure we do it.
- @@ was not added.
- The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint)
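As a rough illustration of what the operator looks like from the user side, here is a minimal sketch. It assumes Python 3.5 or later and a NumPy release that implements the operator; the arrays are made up for the example, and nothing here reflects how the sprint work was actually done.

```python
import numpy as np

# Two small example matrices (values chosen only for illustration)
a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# Infix matrix multiplication as defined by the PEP
product = a @ b

# The long-standing function call gives the same result
assert np.array_equal(product, np.dot(a, b))

# There is no matrix-power operator, so the function is still the way to go
cubed = np.linalg.matrix_power(a, 3)
assert np.array_equal(cubed, a @ a @ a)
```

Elementwise multiplication stays on `*`, so the two meanings no longer compete for one operator.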
- Can it be done? -- too many calendars -- too many time scales, etc.
- Can we cover most applications?
- DyND -- higher abstraction -- convert to back end implementation
- Also look at what R and Julia do?
- Maybe fix up the little issues in datetime64, first?
- Pandas does not use numpy machinery
- uses an array of objects: those objects are subclassed from datetime.datetime
- does use int64, but gets unboxed on storage.
- Root cause is using UTC, rather than a naive time.
- Naive is not associated with a time zone. Can be interpreted in any way.
- Ripping out the locale timezone on I/O would help.
- More often than not, using the locale timezone is not desired.
- For example, many experimental data do not attach time zones. (Or wrong timezone)
- Consider laboratory time (stopwatch rather than a clock). (timedelta)
- The C++ committee is standardizing this.
- A key feature that is missing is being able to choose your epoch.
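To make the naive-time, stopwatch and epoch points a little more concrete, here is a small sketch. It assumes a NumPy version in which datetime64 is timezone naive (1.11 or later, which postdates this discussion). The epoch and sample values are hypothetical, and the "choose your epoch" part is only emulated by hand with timedelta64, since there is no built-in epoch parameter.

```python
import numpy as np

# Naive timestamps: no time zone attached, interpretation is left to the user
start = np.datetime64("2014-01-01T00:00:00")
stop = np.datetime64("2014-01-01T00:45:00")

# "Laboratory time": only the elapsed duration matters (stopwatch, not clock)
elapsed = stop - start

# Emulating a user-chosen epoch (hypothetical value): store offsets as
# timedelta64 and add the epoch only when absolute times are needed
epoch = np.datetime64("2000-01-01T00:00:00")
offsets = np.array([0, 90, 3600], dtype="timedelta64[s]")
absolute = epoch + offsets

# Going back from absolute times to integer seconds since the chosen epoch
seconds_since_epoch = (absolute - epoch).astype("int64")
```

Keeping the offsets and the epoch separate is one way to work with experimental data that never had a time zone in the first place.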
- Example: quad float types. A solution for missing values? Adding units support.
- Record & structured arrays play around with dtypes. Needs to be easier to use these.
- Improve documentation.
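For readers who have not used structured arrays, a minimal sketch of what they offer today may help. The field names and values are invented for the example.

```python
import numpy as np

# Hypothetical record layout: field names and values are made up
readings = np.array(
    [(1, 20.5, 0.0), (2, 21.0, 0.5), (3, 21.7, 1.0)],
    dtype=[("sample", "i4"), ("temperature", "f8"), ("time", "f8")],
)

# Fields are addressed by name rather than by position
times = readings["time"]

# The positional equivalent has to go through the dtype's field names
third_field = readings[readings.dtype.names[2]]
assert np.array_equal(times, third_field)

# Each element is a record whose fields can be read by name as well
first_temperature = readings[0]["temperature"]
```

Building the dtype list by hand is the part that tends to feel awkward, which is in line with the note about making these easier to use.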
- How to extend to support things like labeled arrays?
- This is orthogonal to dtypes.
- Would rather access time column instead of 3rd column.
- Would provide a better foundation for pandas.
- Key is to keep inputs simple.
- Finish the DataArray push?
- We are very close. It has been sitting there for a while.
- If interested, talk at sprints on July 10.
- maybe improve masked array.
- give up for now.
- introduces many bugs.
- should discourage this, but make it easier to work with it.
- The issues discussed so far were motivation for starting dynd
- for example, a pluggable type system
- adding a categorical type in numpy (at Continuum) broke lots. Easier in dynd.
- Commitment for dynd is to give it a numpy-like API
- Both need to evolve together.
- Find ways to make things more uniform (in numpy)
- Dynd is more in an experimental phase, changing quickly.
- Can we import dynd as np?
- Not a goal. More exploratory in this phase.
- Adding a layer like that at a later time would be good. Not there, yet.
- Do not want to repeat py2->py3 debacle.
- Buffer protocol:
- Supported, but dynd extends it.
- As a pure C++ library, goal is to freeze once stable so systems beyond Python can depend on it as a stable interface for working with array data.
- Nothing official from numpy for using numpy arrays in C++
- Not prioritized.
- Numpy has gotten better about namespace pollution?
- It kind of works already. Talk to Mike Droettboom