Michael Sorich | 1 Mar 07:40 2006

Table like array

Hi,

I am looking for a table-like array, something like the 'data frame' object familiar to users of the statistical languages R and Splus. This is mainly to hold and manipulate 2D spreadsheet-like data, which tends to be relatively small (compared to what many people seem to use numpy for), heterogeneous, has column and row names, and often contains missing data. A RecArray seems potentially useful, as it allows different fields to have different data types and holds the name of each field. However, it doesn't seem easy to manipulate the data. Or perhaps I am simply having difficulty finding documentation on its features.
For example:
- Adding a new column/field (and, to a lesser extent, a new row/record) to the recarray
- Changing the field/column names
- Making a new table by selecting a subset of fields/columns (you can select a single field/column, but not multiple)
- Merging tables (concatenate seems to allow a recarray to be added as new rows, but not as new columns)
It would also be nice for the table to be able to deal easily with masked data (I have not tried this with recarray yet) and perhaps also to be able to give the rows/records unique ids that could be used to select the rows/records (in addition to the row/record index), in the same way that the fieldnames can select the fields.

Can anyone comment on this issue? In particular, does code for this purpose already exist, and if not, how would you suggest going about developing such a table-like array? (This would need to be limited to Python programming, as my ability to program in C is very limited.)

Thanks,

michael


_______________________________________________
SciPy-user mailing list
SciPy-user <at> scipy.net
http://www.scipy.net/mailman/listinfo/scipy-user
Travis Oliphant | 1 Mar 08:15 2006

Re: Table like array

Michael Sorich wrote:

> Hi,
>
> I am looking for a table like array. Something like a 'data frame' 
> object to those familiar with the statistical languages R and Splus. 
> This is mainly to hold and manipulate 2D spreadsheet like data, which 
> tends to be of relatively small size (compared to what many people 
> seem to use numpy for), heterogeneous, have column and row names, and 
> often contains missing data. 

You could subclass the ndarray to produce one of these fairly easily, I 
think.   The missing data item could be handled by a mask stored along 
with the array (or even in the array itself).  Or you could use a masked 
array as your core object (though I'm not sure how it handles the 
arbitrary (i.e. record-like) data-types yet).

Alternatively, and probably the easiest way to get started, you could 
just create your own table-like class and use simple 1-d arrays or 1-d 
masked arrays for each of the columns ---  This has always been a way to 
store record-like tables.
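A minimal sketch of that column-per-array approach, assuming modern NumPy and its numpy.ma module. The class name ColumnTable and its methods are hypothetical, not an existing API. Each column is a separate 1-d masked array, so missing values live in the mask and adding a column leaves the existing columns alone.

```python
import numpy as np
import numpy.ma as ma


class ColumnTable:
    """Hypothetical table: a dict of equal-length 1-d masked arrays."""

    def __init__(self, **columns):
        self._cols = {}      # column name -> 1-d masked array
        self._nrows = None   # fixed by the first column added
        for name, values in columns.items():
            self.add_column(name, values)

    @property
    def names(self):
        return list(self._cols)

    def add_column(self, name, values, mask=None):
        # Each column is its own array, so no other column is copied here.
        col = ma.array(values) if mask is None else ma.array(values, mask=mask)
        if col.ndim != 1:
            raise ValueError("columns must be 1-d")
        if self._nrows is not None and len(col) != self._nrows:
            raise ValueError("column %r has %d rows, expected %d"
                             % (name, len(col), self._nrows))
        self._nrows = len(col)
        self._cols[name] = col

    def __getitem__(self, name):
        return self._cols[name]

    def select(self, *names):
        # New table holding only the requested columns, in the given order.
        sub = ColumnTable()
        for name in names:
            sub.add_column(name, self._cols[name])
        return sub


t = ColumnTable(label=["x", "y", "z"], score=[1.0, 2.0, 3.0])
t.add_column("weight", [1.0, -1.0, 3.0], mask=[False, True, False])
print(t["weight"].mean())  # 2.0: the masked -1.0 is ignored
```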

It really depends what you want the data-frames to be able to do and 
what you want them to "look-like."

> A RecArray seems potentially useful, as it allows different fields to 
> have different data types and holds the name of the field. However it 
> doesn't seem easy to manipulate the data. Or perhaps I am simply 
> having difficulty finding documentation on their features.

Adding a new column/field means basically creating a new array with a 
new data-type and copying data over into the already-defined fields.  
Data-types always have a fixed number of bytes per item.   What those 
bytes represent can be quite arbitrary but it's always fixed.   So, it 
is always "more work" to insert a new column.  You could make that 
seamless in your table class so the user doesn't see it though.
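A sketch of what "creating a new array with a new data-type and copying" looks like for a structured array, assuming modern NumPy and a simple (unpadded, non-nested) record dtype. add_field is a hypothetical helper name.

```python
import numpy as np


def add_field(arr, name, values):
    """Return a new structured array with one extra field appended.

    The original array is untouched: its fixed-size records cannot grow
    in place, so every existing field is copied into the new array.
    """
    if arr.dtype.names is None:
        raise TypeError("expected a structured (record) array")
    if name in arr.dtype.names:
        raise ValueError("field %r already exists" % name)
    values = np.asarray(values)
    # Extend the field list; assumes a simple dtype whose descr has no padding.
    new_dtype = np.dtype(arr.dtype.descr + [(name, values.dtype.str)])
    out = np.empty(arr.shape, dtype=new_dtype)
    for field in arr.dtype.names:
        out[field] = arr[field]
    out[name] = values
    return out


people = np.array([(1, 1.62), (2, 1.75)], dtype=[("id", "i4"), ("height", "f8")])
wider = add_field(people, "weight", [55.0, 72.5])
print(wider.dtype.names)                            # ('id', 'height', 'weight')
print(people.dtype.itemsize, wider.dtype.itemsize)  # 12 20
```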

You'll want to thoroughly understand the dtype object including its 
attributes and methods.  Particularly the fields attribute of the dtype 
object. 
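For example, in modern NumPy the fields attribute maps each field name to a tuple of (field dtype, byte offset), with a title as an optional third entry. The record layout below is invented for illustration.

```python
import numpy as np

dt = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f8")])

print(dt.names)     # ('name', 'age', 'weight')
print(dt.itemsize)  # 52: 10 four-byte characters (40) + 4 + 8

# fields: name -> (field dtype, byte offset[, title])
for field in dt.names:
    ftype, offset = dt.fields[field][:2]
    print(field, ftype, offset)  # offsets are 0, 40 and 44
```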

> eg
> adding a new column/field (and to a lesser extent a new row/record) to 
> the recarray

Adding a new row or record is actually similar because once an array is 
created it is usually resized by creating another array and copying the 
old array into it in the right places.
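The same copy-into-a-larger-array pattern for adding a row, as a sketch assuming modern NumPy; append_record is a hypothetical helper. np.concatenate also allocates a new array and copies, so it has the same cost.

```python
import numpy as np


def append_record(arr, record):
    """Return a copy of a 1-d structured array with one record appended."""
    out = np.empty(len(arr) + 1, dtype=arr.dtype)  # new, larger buffer
    out[:-1] = arr      # copy the existing records
    out[-1] = record    # a tuple with one value per field
    return out


table = np.array([(1, 1.62), (2, 1.75)], dtype=[("id", "i4"), ("height", "f8")])
longer = append_record(table, (3, 1.80))
print(len(table), len(longer))  # 2 3
```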

> Changing the field/column names
> make a new table by selecting a subset of fields/columns. (you can 
> select a single field/column, but not multiple).

Right.  So far you can't select multiple columns.  It would be possible 
to add this feature with a little-bit of effort if there were a strong 
demand for it, but it would be much easier to do it in your subclass 
and/or container class.

How many people would like to see x['f1','f2','f5']  return a new array 
with a new data-type descriptor constructed from the provided fields?
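For readers on current NumPy (not the 2006 release discussed in this thread): multi-field selection was later added using a list of names, x[['f1', 'f2', 'f5']], rather than a tuple. In recent versions the result is a view that keeps the original record layout, so this sketch also shows a hypothetical select_fields helper that builds a compact, independent copy.

```python
import numpy as np

x = np.array([(1, 2.0, "a"), (3, 4.0, "b")],
             dtype=[("f1", "i4"), ("f2", "f8"), ("f5", "U1")])

# Built-in multi-field indexing (note the list, not a tuple).
view = x[["f1", "f5"]]
print(view.dtype.names)  # ('f1', 'f5')


def select_fields(arr, names):
    """Copy the named fields into a new, tightly packed structured array."""
    new_dtype = np.dtype([(n, arr.dtype[n]) for n in names])
    out = np.empty(arr.shape, dtype=new_dtype)
    for n in names:
        out[n] = arr[n]
    return out


packed = select_fields(x, ["f1", "f5"])
print(packed.dtype.itemsize)  # 8: an i4 plus a 4-byte U1
```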

> It would also be nice for the table to be able to deal easily with 
> masked data (I have not tried this with recarray yet) and perhaps also 
> to be able to give the rows/records unique ids that could be used to 
> select the rows/records (in addition to the row/record index), in the 
> same way that the fieldnames can select the fields.

Adding fieldnames to the "rows" is definitely something that a subclass 
would be needed for.  I'm not sure how you would even propose to select 
using row names.  Would you also use getitem semantics? 

> Can anyone comment on this issue? Particularly whether code exists for 
> this purpose, and if not ideas about how best to go about developing 
> such a Table like array (this would need to be limited to python 
> programing as my ability to program in c is very limited).

I don't know of code that already exists for this, but I don't think it 
would be too hard to construct your own data-frame object.

I would probably start with an implementation that just used standard 
arrays of a particular type to represent the internal columns and then 
handle the indexing by overriding the __getitem__ and 
__setitem__ special methods.  This would be the easiest to get working, 
I think.
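A minimal sketch of that design, assuming modern NumPy. The class name Frame and its indexing rules (a string selects a column, a list of strings selects several, an integer selects a row) are illustrative choices, not an existing API.

```python
import numpy as np


class Frame:
    """Hypothetical data-frame sketch: a dict of equal-length 1-d arrays."""

    def __init__(self, columns):
        self._cols = {}
        for name, values in columns.items():
            self[name] = values  # goes through __setitem__ checks

    def __len__(self):
        return len(next(iter(self._cols.values()))) if self._cols else 0

    def __getitem__(self, key):
        if isinstance(key, str):                 # one column
            return self._cols[key]
        if isinstance(key, list):                # several columns
            return Frame({k: self._cols[k] for k in key})
        if isinstance(key, (int, np.integer)):   # one row, as a dict
            return {k: v[key] for k, v in self._cols.items()}
        raise TypeError("unsupported key: %r" % (key,))

    def __setitem__(self, key, values):
        if not isinstance(key, str):
            raise TypeError("only whole-column assignment is supported")
        col = np.asarray(values)
        if col.ndim != 1:
            raise ValueError("columns must be 1-d")
        if self._cols and len(col) != len(self):
            raise ValueError("column %r has the wrong length" % key)
        self._cols[key] = col


f = Frame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})
f["z"] = f["x"] * 2   # new column via __setitem__
print(f["z"])         # [2 4 6]
print(f[1]["y"])      # 1.5
```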

-Travis
Travis Oliphant | 1 Mar 08:36 2006

Re: Table like array

Michael Sorich wrote:

> Hi,
>
> I am looking for a table like array. Something like a 'data frame' 
> object to those familiar with the statistical languages R and Splus. 

It just occurred to me you might not have heard of RPy.  RPy is a Python 
interface to the R language.  Whether or not you actually want to 
interface with R, it defines a DataFrame class for interfacing with 
R's data-frames.

You could start there and just use arrays to store the actual column data...

http://rpy.sourceforge.net/rpy/doc/manual_html/DataFrame-class.html

That example shows that a simple data-frame is just a dictionary keyed 
by column name.  You could then add a key for your "row names" and use 
that to access data using row-names.   In fact, the record data-types 
are also dictionary-based (look at the fields attribute of a data-type 
object). 
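Following that suggestion, a sketch of a plain-dict frame with a reserved entry for row names, assuming modern NumPy. The key "_rownames", the helper row_by_name, and the sample data are all hypothetical.

```python
import numpy as np

# Column name -> 1-d array; "_rownames" is a reserved (hypothetical) key.
frame = {
    "_rownames": np.array(["s1", "s2", "s3"]),
    "height": np.array([1.62, 1.75, 1.80]),
    "weight": np.array([55.0, 72.5, 80.1]),
}


def row_by_name(frame, rowname, rowkey="_rownames"):
    """Return one row as a dict of column -> value, looked up by row name."""
    hits = np.flatnonzero(frame[rowkey] == rowname)
    if len(hits) != 1:  # missing or duplicated row name
        raise KeyError(rowname)
    i = hits[0]
    return {col: values[i] for col, values in frame.items() if col != rowkey}


print(row_by_name(frame, "s2")["weight"])  # 72.5
```

The lookup is a linear scan; for larger tables a dict mapping each row name to its index would make it constant-time.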

It makes me think that you could get something very much like what you 
want using your own class that just wraps 1-d arrays (or even lists).

-Travis
Nils Wagner | 1 Mar 08:46 2006

Modified makefile for gcc 4.x

Hi Hanno,

Could you please send me your modified ATLAS makefile?

Thanks in advance

                                    Nils
Nils Wagner | 1 Mar 10:39 2006

gfortran, ifc, compat-g77

Hi all,

It seems to me that installing numpy/scipy on SuSE 10.0 is not as
straightforward as it was with prior versions.

If I remove compat-g77 and try to rebuild numpy from scratch, ifc is
used. I thought that gfortran would be used in that case.
Am I missing something?

Anyway,

python setup.py build results in:

ld: skipping incompatible build/temp.linux-x86_64-2.4/libfblas_src.a when searching for -lfblas_src
ld: cannot find -lfblas_src
ld: skipping incompatible build/temp.linux-x86_64-2.4/libfblas_src.a when searching for -lfblas_src
ld: cannot find -lfblas_src
error: Command "/opt2/intel/compiler70/ia32/bin/ifc -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -Lbuild/temp.linux-x86_64-2.4 -lfblas_src -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so" failed with exit status 1

Nils

Has anyone on the list successfully installed numpy/scipy, with or
without ATLAS, on SuSE 10.0?

http://www.novell.com/products/linuxpackages/professional/compat-g77.html
Zachary Pincus | 1 Mar 10:54 2006

scipy.stats.ttest_ind broken?

Hi folks,

I'm using scipy 0.4.6 with numpy 0.9.5, and I have noticed that the
t-test in the stats library is broken.

scipy.stats.ttest_ind([1,2,3], [4,5,6])
---------------------------------------------------------------------------
exceptions.TypeError                      Traceback (most recent call last)

/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/scipy/stats/stats.py in ttest_ind(a, b, axis, printit, name1, name2, writemode)
    1461     if type(t) == ArrayType:
    1462         probs = reshape(probs,t.shape)
-> 1463     if len(probs) == 1:
    1464         probs = probs[0]
    1465

TypeError: len() of unsized object

What's happening is that betai is returning a scalar value, which has  
no length. This causes the len(probs) call to fail.
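A sketch of a shape-agnostic replacement for the len(probs) == 1 test, assuming modern NumPy. collapse_single is a hypothetical helper, not the fix actually applied to scipy.stats. Converting with np.asarray and checking .size works for Python floats, NumPy scalars, 0-d arrays and 1-element arrays alike.

```python
import numpy as np


def collapse_single(probs):
    """Return a plain scalar if probs holds exactly one value, else an array.

    len() raises TypeError for scalars and 0-d arrays ("unsized objects"),
    whereas .size is defined for every array shape.
    """
    arr = np.asarray(probs)
    if arr.size == 1:
        return arr.item()  # works for 0-d and 1-element arrays
    return arr


print(collapse_single(np.float64(0.25)))  # 0.25
print(collapse_single(np.array([0.25])))  # 0.25
print(collapse_single([0.1, 0.2]))        # [0.1 0.2]
```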

Zach Pincus

Program in Biomedical Informatics and Department of Biochemistry
Stanford University School of Medicine
Christian Kristukat | 1 Mar 11:18 2006

Re: gfortran, ifc, compat-g77

Nils Wagner wrote:
> Is there someone on the list who has successfully installed numpy/scipy
> with or without ATLAS using SuSE 10.0 ?

Yes, me (again).
Maybe this is a processor issue, but for an Intel P4 I can only repeat: if you
have installed gcc4/gfortran and no other compiler, then by following the ATLAS
installation instructions on the wiki, building scipy _is_ straightforward. You
won't need any additional information, even building ATLAS with full is covered
there.

You could also try forcing the use of gfortran like this:

python setup.py config_fc --fcompiler=g95 build

Regards, Christian

P.S. I can provide numpy/scipy RPMs for SuSE 10 with ATLAS built on this processor:

vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 1
cpu MHz         : 3007.387
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid xtpr
bogomips        : 6019.80
Christian Kristukat | 1 Mar 11:21 2006

Re: gfortran, ifc, compat-g77

Christian Kristukat wrote:
> instructions for ATLAS on the wiki building scipy _is_ straightforward. You
> won't need any additional information, even building ATLAS with full is covered

Sorry, this should read: "... ATLAS with full LAPACK..."

Christian
Pearu Peterson | 1 Mar 10:32 2006

Re: gfortran, ifc, compat-g77


On Wed, 1 Mar 2006, Christian Kristukat wrote:

> You could also try to force to use gfortran like this:
>
> python setup.py config_fc --fcompiler=g95 build

g95 is not gfortran. One should use

python setup.py config_fc --fcompiler=gnu95 build

for forcing gfortran.

Pearu
Nils Wagner | 1 Mar 11:33 2006

Re: gfortran, ifc, compat-g77

Christian Kristukat wrote:
> Yes, me (again).
> Maybe this is a processor issue, but for an Intel P4, I only can repeat: if you
> have installed gcc4/gfortran and no other compiler, following the installation
> instructions for ATLAS on the wiki building scipy _is_ straightforward. You
> won't need any additional information, even building ATLAS with full is covered
> there.

Just in case it is a processor issue, here is mine:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 47
model name      : AMD Athlon(tm) 64 Processor 3200+
stepping        : 2
cpu MHz         : 2000.141
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm
bogomips        : 4009.73
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

Nils
