David Carmean | 9 Feb 22:52

Emulate left outer join?


Hi,

I've been working with numpy for less than a month, having learned about 
it after finding matplotlib.  My foundation in things like set theory is...
weak to nonexistent, so I need a little help mapping sql-like thoughts into 
set-theory thinking :)

Some context to help me explain:  I'm trying to store, chart, and analyze 
unix system performance data (sar/sadf output).  On a typical system I have 
about 75 fields/variables, all floats, with identical timestamps... or so 
we hope.   What I want to do in order to save memory/disk space is to stack 
the timeseries data all into three or four different arrays, and use a single 
timestamp field for each set.

My problem is: I don't know that I can guarantee that the shape of all the 
individual arrays will be identical along the time axis.  I may receive 
truncated textfiles to parse, or new variables may appear and disappear from 
the set being reported/recorded.

If these were in flat files or database tables, I'd do a left outer join between 
a master timestamp table and each individual variable's table.   But... I don't 
know the keywords to search for in the numpy docs/web chatter.  A thread from 
just about one year ago left the question hanging:

    http://article.gmane.org/gmane.comp.python.numeric.general/27942

Examples? Pointers?  Shoves toward the correct sections of the docs?

Thanks.
Robert Kern | 9 Feb 23:02

Re: Emulate left outer join?


On Tue, Feb 9, 2010 at 15:52, David Carmean <dlc <at> halibut.com> wrote:
> Examples? Pointers?  Shoves toward the correct sections of the docs?

numpy.lib.recfunctions.join_by(key, r1, r2, jointype='leftouter')

In [23]: numpy.lib.recfunctions.join_by?
Type:           function
Base Class:     <type 'function'>
Namespace:      Interactive
File:           /Users/rkern/svn/numpy/numpy/lib/recfunctions.py
Definition:     numpy.lib.recfunctions.join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)
Docstring:
    Join arrays `r1` and `r2` on key `key`.

    The key should be either a string or a sequence of string corresponding
    to the fields used to join the array.
    An exception is raised if the `key` field cannot be found in the two input
    arrays.
    Neither `r1` nor `r2` should have any duplicates along `key`: the presence
    of duplicates will make the output quite unreliable. Note that duplicates
    are not looked for by the algorithm.

    Parameters
    ----------
    key : {string, sequence}
        A string or a sequence of strings corresponding to the fields used
        for comparison.
    r1, r2 : arrays
        Structured arrays.
    jointype : {'inner', 'outer', 'leftouter'}, optional
        If 'inner', returns the elements common to both r1 and r2.
        If 'outer', returns the common elements as well as the elements of r1
        not in r2 and the elements of not in r2.
        If 'leftouter', returns the common elements and the elements of r1 not
        in r2.
    r1postfix : string, optional
        String appended to the names of the fields of r1 that are present in r2
        but absent of the key.
    r2postfix : string, optional
        String appended to the names of the fields of r2 that are present in r1
        but absent of the key.
    defaults : {dictionary}, optional
        Dictionary mapping field names to the corresponding default values.
    usemask : {True, False}, optional
        Whether to return a MaskedArray (or MaskedRecords is `asrecarray==True`)
        or a ndarray.
    asrecarray : {False, True}, optional
        Whether to return a recarray (or MaskedRecords if `usemask==True`) or
        just a flexible-type ndarray.

    Notes
    -----
    * The output is sorted along the key.
    * A temporary array is formed by dropping the fields not in the key for the
      two arrays and concatenating the result. This array is then sorted, and
      the common entries selected. The output is constructed by filling the
      fields with the selected entries. Matching is not preserved if there are
      some duplicates...

For some reason, numpy.lib.recfunctions isn't in the documentation
editor. I'm not sure why.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
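
For readers following the join_by suggestion above, here is a minimal sketch of a left outer join on a timestamp field. The field names ('time', 'cpu', 'mem') and the sample values are made up for illustration and are not from the original poster's data; the second array is missing one timestamp on purpose, and the example assumes the keys are unique in each array, as the docstring requires.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical "left" table: one reading per timestamp (seconds).
cpu = np.array(
    [(0, 1.5), (60, 2.5), (120, 3.5), (180, 4.0)],
    dtype=[("time", "i8"), ("cpu", "f8")],
)

# Hypothetical "right" table: the sample at time=120 is missing.
mem = np.array(
    [(0, 10.0), (60, 11.0), (180, 13.0)],
    dtype=[("time", "i8"), ("mem", "f8")],
)

# Keep every row of the left array; fields coming from the right array
# are masked wherever the right array has no matching key.
joined = rfn.join_by("time", cpu, mem, jointype="leftouter", usemask=True)

# Replace the masked entries with NaN to get a plain float column.
mem_filled = joined["mem"].filled(np.nan)
```

With usemask=True the result is a masked array, so the missing readings stay distinguishable from real values until you decide how to fill them.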
Fernando Perez | 9 Feb 23:43

Re: Emulate left outer join?


On Tue, Feb 9, 2010 at 5:02 PM, Robert Kern <robert.kern <at> gmail.com> wrote:
>
> numpy.lib.recfunctions.join_by(key, r1, r2, jointype='leftouter')
>

And if that isn't sufficient, John has in matplotlib.mlab a few other
similar utilities that allow for more complex cases:

In [2]: mlab.rec_
mlab.rec_append_fields  mlab.rec_groupby        mlab.rec_keep_fields
mlab.rec_drop_fields    mlab.rec_join           mlab.rec_summarize

Cheers,

f
John Hunter | 9 Feb 23:49

Re: Emulate left outer join?


On Tue, Feb 9, 2010 at 4:43 PM, Fernando Perez <fperez.net <at> gmail.com> wrote:
> On Tue, Feb 9, 2010 at 5:02 PM, Robert Kern <robert.kern <at> gmail.com> wrote:
>>
>> numpy.lib.recfunctions.join_by(key, r1, r2, jointype='leftouter')
>>
>
> And if that isn't sufficient, John has in matplotlib.mlab a few other
> similar utilities that allow for more complex cases:

The numpy.lib.recfunctions were ported from matplotlib.mlab, so most of
the functionality overlaps. We have added some things since the port,
e.g. matplotlib.mlab.recs_join for a multiway join, and some things were
never ported (rec_summarize, rec_groupby), so it may be worth looking in
mlab too. Some of the mpl additions are only in svn, but most are
released. Examples are at

  http://matplotlib.sourceforge.net/examples/misc/rec_join_demo.html
  http://matplotlib.sourceforge.net/examples/misc/rec_groupby_demo.html

JDH
David Warde-Farley | 10 Feb 00:15

Re: Emulate left outer join?


On 9-Feb-10, at 5:02 PM, Robert Kern wrote:
>> Examples? Pointers? Shoves toward the correct sections of the docs?
>
> numpy.lib.recfunctions.join_by(key, r1, r2, jointype='leftouter')

Huh. All these years, how have I missed this? Yet another demonstration
of why my "never skip over a Kern posting" policy exists.

David
Ralf Gommers | 10 Feb 00:47

Re: Emulate left outer join?



On Wed, Feb 10, 2010 at 6:02 AM, Robert Kern <robert.kern <at> gmail.com> wrote:
> For some reason, numpy.lib.recfunctions isn't in the documentation
> editor. I'm not sure why.

Because it's not in np.lib.__all__ .

Cheers,
Ralf
 
Robert Kern | 10 Feb 00:52

Re: Emulate left outer join?


On Tue, Feb 9, 2010 at 17:47, Ralf Gommers <ralf.gommers <at> googlemail.com> wrote:
> On Wed, Feb 10, 2010 at 6:02 AM, Robert Kern <robert.kern <at> gmail.com> wrote:
>> For some reason, numpy.lib.recfunctions isn't in the documentation
>> editor. I'm not sure why.
>
> Because it's not in np.lib.__all__ .

Then there needs to be a secondary way to add such modules.

-- 
Robert Kern
Vishal Rana | 9 Feb 16:42

Utility function to find array items are in ascending order

Hi,


Is there any utility function to check whether the values in an array are in ascending or descending order?

Example: 
arr = [1, 2, 4, 6] should return true
arr2 = [1, 0, 2, -2] should return false

Thanks
Vishal
Keith Goodman | 9 Feb 16:50

Re: Utility function to find array items are in ascending order


On Tue, Feb 9, 2010 at 7:42 AM, Vishal Rana <ranavishal <at> gmail.com> wrote:
> Hi,
> Is there any utility function to find if values in the array are in
> ascending or descending order.
> Example:
> arr = [1, 2, 4, 6] should return true
> arr2 = [1, 0, 2, -2] should return false
> Thanks
> Vishal

I don't know if it is fast, but np.diff should do the trick: the array is ascending if all the differences are greater than or equal to zero, and descending if all are less than or equal to zero.
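
A rough sketch of the np.diff idea, for illustration only. The function name order_of is hypothetical, and the check is non-strict, so repeated values count as in order; a constant or empty input is reported as ascending because that test runs first.

```python
import numpy as np

def order_of(values):
    """Return 'ascending', 'descending', or None for a 1-D sequence."""
    # Differences between consecutive elements. Note that np.diff on
    # unsigned integer arrays wraps around instead of going negative.
    steps = np.diff(np.asarray(values))
    if np.all(steps >= 0):
        return "ascending"
    if np.all(steps <= 0):
        return "descending"
    return None
```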
Brent Pedersen | 9 Feb 16:51

Re: Utility function to find array items are in ascending order

On Tue, Feb 9, 2010 at 7:42 AM, Vishal Rana <ranavishal <at> gmail.com> wrote:
> Hi,
> Is there any utility function to find if values in the array are in
> ascending or descending order.
> Example:
> arr = [1, 2, 4, 6] should return true
> arr2 = [1, 0, 2, -2] should return false
> Thanks
> Vishal

I don't know if there's a utility function, but I'd use:

 >>> np.all(a[1:] >= a[:-1])
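
The one-liner assumes `a` is already a NumPy array; on a plain Python list the comparison would compare the two slices as whole lists rather than element by element. A small sketch that wraps it, with hypothetical helper names and an optional strict mode that is not part of the original suggestion:

```python
import numpy as np

def is_ascending(values, strict=False):
    """True if each element is >= (or > when strict) the one before it."""
    a = np.asarray(values)  # makes the slices compare elementwise
    if strict:
        return bool(np.all(a[1:] > a[:-1]))
    return bool(np.all(a[1:] >= a[:-1]))

def is_descending(values, strict=False):
    """True if each element is <= (or < when strict) the one before it."""
    a = np.asarray(values)
    if strict:
        return bool(np.all(a[1:] < a[:-1]))
    return bool(np.all(a[1:] <= a[:-1]))
```

Unlike np.diff, the comparison never subtracts, so it also behaves for unsigned integers and other types that support ordering.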
Keith Goodman | 9 Feb 16:53

Re: Utility function to find array items are in ascending order


On Tue, Feb 9, 2010 at 7:51 AM, Brent Pedersen <bpederse <at> gmail.com> wrote:
> On Tue, Feb 9, 2010 at 7:42 AM, Vishal Rana <ranavishal <at> gmail.com> wrote:
>> Hi,
>> Is there any utility function to find if values in the array are in
>> ascending or descending order.
>> Example:
>> arr = [1, 2, 4, 6] should return true
>> arr2 = [1, 0, 2, -2] should return false
>> Thanks
>> Vishal
>
> i dont know if there's a utility function, but i'd use:
>
>  >>> np.all(a[1:] >= a[:-1])

Yes, that's much better than np.diff.
