Martin Spacek | 1 Dec 02:00

Re: Loading a > GB file into array

Martin Spacek wrote:
 > Would it be better to load the file one
 > frame at a time, generating nframes arrays of shape (height, width),
 > and sticking them consecutively in a python list?

I just tried this, and it works. Looks like it's all in physical RAM (no 
disk thrashing on the 2GB machine), *and* it's easy to index into. I 
guess I should have thought of this a while ago, since each entry in a 
python list can point to anywhere in memory. Here's roughly what the 
code looks like:

import numpy as np

f = open(fname, 'rb') # 1.3GB file
frames = [None] * nframes # init a list to hold all frames
for framei in range(nframes): # one frame at a time...
     frame = np.fromfile(f, np.uint8, count=framesize) # load next frame
     frame.shape = (height, width)
     frames[framei] = frame # save it in the list
f.close()

--
Martin
David Cournapeau | 1 Dec 09:26

Re: Loading a > GB file into array

Martin Spacek wrote:
> Kurt Smith wrote:
>  > You might try numpy.memmap -- others have had success with it for
>  > large files (32 bit should be able to handle a 1.3 GB file, AFAIK).
>
> Yeah, I looked into numpy.memmap. Two issues with that. I need to 
> eliminate as much disk access as possible while my app is running. I'm 
> displaying stimuli on a screen at 200Hz, so I have up to 5ms for each 
> movie frame to load before it's too late and it drops a frame. I'm sort 
> of faking a realtime OS on windows by setting the process priority 
> really high. Disk access in the middle of that causes frames to drop. So 
> I need to load the whole file into physical RAM, although it need not be 
> contiguous. memmap doesn't do that, it loads on the fly as you index 
> into the array, which drops frames, so that doesn't work for me.
If you want to do it 'properly', it will be difficult, especially in 
Python, and especially on Windows. This looks very similar to the problem 
of direct-to-disk recording, that is, recording audio signals from the 
soundcard onto the hard drive (think of recording a concert). The 
proper design, at least on Linux and Mac OS X, is to have several 
threads (one for the IO, one for any computation you may want to do), 
none of which block on any condition, and to use special OS 
facilities (FIFO scheduling, locking pages into physical RAM, etc.) as 
well as some special constructs (lock-free ring buffers). This design 
works relatively well for musical applications, where the data is of the 
same order of magnitude as what you are talking about, with the same 
kind of latency (a few ms).

This may be overkill for your application, though.
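
As a rough illustration of the multi-threaded layout described above, here is a sketch in which one thread does the reading and the display loop only pulls ready frames from a bounded buffer. It uses Python's standard `queue.Queue`, which takes locks internally, so it is not the lock-free ring buffer meant above, and it does none of the OS-level work (FIFO scheduling, locking pages into RAM). The names `reader_thread` and `read_frame` are invented for the example.

```python
import queue
import threading


def reader_thread(read_frame, nframes, buf):
    """IO thread: fetch frames ahead of time and park them in the bounded buffer."""
    for i in range(nframes):
        buf.put(read_frame(i))  # blocks while the buffer is full
    buf.put(None)  # sentinel: no more frames


def read_frame(i):
    """Hypothetical stand-in for a slow disk read; returns fake frame data."""
    return bytes([i]) * 16


buf = queue.Queue(maxsize=8)  # bounded, so the reader stays at most 8 frames ahead
t = threading.Thread(target=reader_thread, args=(read_frame, 20, buf), daemon=True)
t.start()

shown = []
while True:
    frame = buf.get()  # consumer side: this would be the per-refresh display loop
    if frame is None:
        break
    shown.append(frame)
t.join()
```

The bounded queue is the design choice that matters here: it caps how far ahead the reader runs, so memory use stays fixed while the consumer never waits on the disk unless the buffer runs dry.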
>
> The 2nd problem I had with memmap was that I was getting a WindowsError 
[...]

Sebastian Haase | 1 Dec 11:24

Re: Loading a > GB file into array

On Dec 1, 2007 12:09 AM, Martin Spacek <numpy <at> mspacek.mm.st> wrote:
> Kurt Smith wrote:
>  > You might try numpy.memmap -- others have had success with it for
>  > large files (32 bit should be able to handle a 1.3 GB file, AFAIK).
>
> Yeah, I looked into numpy.memmap. Two issues with that. I need to
> eliminate as much disk access as possible while my app is running. I'm
> displaying stimuli on a screen at 200Hz, so I have up to 5ms for each
> movie frame to load before it's too late and it drops a frame. I'm sort
> of faking a realtime OS on windows by setting the process priority
> really high. Disk access in the middle of that causes frames to drop. So
> I need to load the whole file into physical RAM, although it need not be
> contiguous. memmap doesn't do that, it loads on the fly as you index
> into the array, which drops frames, so that doesn't work for me.
>
> The 2nd problem I had with memmap was that I was getting a WindowsError
> related to memory:
>
>  >>> data = np.memmap(1.3GBfname, dtype=np.uint8, mode='r')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\bin\Python25\Lib\site-packages\numpy\core\memmap.py", line
> 67, in __new__
>     mm = mmap.mmap(fid.fileno(), bytes, access=acc)
> WindowsError: [Error 8] Not enough storage is available to process this
> command
>
>
> This was for the same 1.3GB file. This is different from previous memory
[...]

Re: Loading a > GB file into array

Ivan Vilata i Balaguer (on 2007-11-30 at 19:19:38 +0100) said::

> Well, one thing you could do is dump your data into a PyTables_
> ``CArray`` dataset, which you may afterwards access as if it was a
> NumPy array to get slices which are actually NumPy arrays.  PyTables
> datasets have no problem in working with datasets exceeding memory size.
>[...]

I've put together a simple script (attached) which either dumps a binary
file into a PyTables ``CArray`` or loads it back to measure the time taken
to load each frame.  I've run it on my laptop, which has a fairly slow
4200 RPM hard disk, and I got average times of 16 ms per
frame, after dropping caches with::

    # sync && echo 1 > /proc/sys/vm/drop_caches

This I've done with the standard chunkshape and no compression.  Your
data may lend itself very well to bigger chunkshapes and compression,
which should lower access times even further.  Since (as David pointed
out) 200 Hz may be a little exaggerated for the human eye, loading
individual frames from disk may prove more than enough for your problem.
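
For anyone wanting to repeat this kind of per-frame measurement without the attached script, the sketch below times individual frame reads from a raw binary file. It is not the PyTables script: it reads a headerless uint8 file with `np.fromfile`, and the function name `time_frame_loads` and the tiny demo file are assumptions made for the example. Timings only say something about the disk if the OS cache has been dropped first, as above.

```python
import os
import tempfile
import time

import numpy as np


def time_frame_loads(fname, nframes, framesize):
    """Read each frame from disk separately; return per-frame load times in ms."""
    times_ms = []
    with open(fname, 'rb') as f:
        for i in range(nframes):
            t0 = time.perf_counter()
            f.seek(i * framesize)  # jump to the start of frame i
            frame = np.fromfile(f, dtype=np.uint8, count=framesize)
            times_ms.append((time.perf_counter() - t0) * 1000.0)
            if frame.size != framesize:
                raise IOError('short read at frame %d' % i)
    return times_ms


# Demo on a small throwaway file; a real measurement would use the movie file
nframes, framesize = 10, 64
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(bytes(nframes * framesize))
    demo_fname = tmp.name

times_ms = time_frame_loads(demo_fname, nframes, framesize)
os.remove(demo_fname)
mean_ms = sum(times_ms) / len(times_ms)
print('mean load time: %.3f ms per frame' % mean_ms)
```

Comparing the worst case in `times_ms` (not just the mean) against the per-frame time budget is what decides whether loading from disk is fast enough.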

HTH,

::

	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	       Cárabos Coop. V.  V  V   Enjoy Data
	                          ""
[...]

Georg Holzmann | 1 Dec 17:21

Re: swig numpy2carray converters

Hello!

> * A new ARGOUTVIEW suite of typemaps is provided that allows your  
> wrapped function
>    to provide a pointer to internal data and that returns a numpy  
> array encapsulating
>    it.

Thanks for integrating it !

> * New typemaps are provided that correctly handle FORTRAN ordered 2D  
> and 3D arrays.

I have a problem with your FORTRAN implementation.
The problem is how you set the flags in numpy.i "int 
require_fortran(PyArrayObject* ary)" (~ line 402).

You do it like this:
ary->flags = ary->flags | NPY_F_CONTIGUOUS;
which does not work (at least on my computer) - I still get the usual 
C-ordered arrays returned.

However, it does work if you set the flags like this:
ary->flags = NPY_FARRAY;
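
Seen from the Python side, the difference between the two orderings can be checked as in the sketch below. This is only an illustration of what the contiguity flags report, not the numpy.i code: the flags describe the layout implied by the strides, and `np.asfortranarray` is one way to get data that really is in column-major order.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)  # C-ordered (row-major) by default
f = np.asfortranarray(a)  # same values, copied into column-major layout

# The flags report the layout implied by the strides
c_flags = (a.flags.c_contiguous, a.flags.f_contiguous)
f_flags = (f.flags.c_contiguous, f.flags.f_contiguous)

# Strides in bytes: the last axis varies fastest in C order, the first in Fortran order
c_strides = a.strides
f_strides = f.strides
```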

> Tests for the ARGOUTVIEW and FORTRAN ordered arrays have also been  
> added, and the documentation (doc/numpy_swig.*) has been updated to  
> reflect all of these changes.

A small typo: in the docs you also write about 1D FORTRAN ARGOUTVIEW 
[...]

Hans Meine | 1 Dec 18:44

Re: Loading a > GB file into array

On Saturday 01 December 2007, Martin Spacek wrote:
> Kurt Smith wrote:
>  > You might try numpy.memmap -- others have had success with it for
>  > large files (32 bit should be able to handle a 1.3 GB file, AFAIK).
>
> Yeah, I looked into numpy.memmap. Two issues with that. I need to
> eliminate as much disk access as possible while my app is running. I'm
> displaying stimuli on a screen at 200Hz, so I have up to 5ms for each
> movie frame to load before it's too late and it drops a frame. I'm sort
> of faking a realtime OS on windows by setting the process priority
> really high. Disk access in the middle of that causes frames to drop. So
> I need to load the whole file into physical RAM, although it need not be
> contiguous. memmap doesn't do that, it loads on the fly as you index
> into the array, which drops frames, so that doesn't work for me.

Sounds as if using memmap and then copying each frame into a separate 
in-memory ndarray could help?
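
A minimal sketch of this suggestion, assuming the file is a headerless stream of uint8 frames; the function name `load_frames_via_memmap` and the small demo file are made up for illustration. Each frame is copied out of the memory map into its own ndarray, so later indexing does not touch the disk. On a 32-bit system the map itself may still need contiguous address space for the whole file, so this might not avoid the WindowsError reported earlier in the thread.

```python
import os
import tempfile

import numpy as np


def load_frames_via_memmap(fname, nframes, height, width):
    """Map the file read-only, then copy every frame into its own in-memory array."""
    mm = np.memmap(fname, dtype=np.uint8, mode='r', shape=(nframes, height, width))
    # np.array() copies by default, so each list entry owns its data in RAM
    frames = [np.array(mm[i]) for i in range(nframes)]
    del mm  # drop the reference to the map; the copies stay valid
    return frames


# Small demo file standing in for the 1.3GB movie (sizes are arbitrary)
nframes, height, width = 4, 3, 5
data = np.arange(nframes * height * width, dtype=np.uint8)
with tempfile.NamedTemporaryFile(suffix='.bin', delete=False) as tmp:
    tmp.write(data.tobytes())
    demo_fname = tmp.name

frames = load_frames_via_memmap(demo_fname, nframes, height, width)
os.remove(demo_fname)
```

After loading, `frames[i]` is an ordinary `(height, width)` array, indexed the same way as the list of frames built with `np.fromfile` earlier in the thread.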

Ciao, /  /                                                    .o.
     /--/                                                     ..o
    /  / ANS                                                  ooo
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Hans Meine | 1 Dec 18:49

Re: display numpy array as image (Giorgio F. Gilestro)

On Friday 30 November 2007, Joe Harrington wrote:
> I was misinformed about the status of numdisplay's pages.  The package
> is available as both part of stsci_python and independently, and its
> (up-to-date) home page is here:
>
> http://stsdas.stsci.edu/numdisplay/

I had a look at ds9/numdisplay, and as a summary, I found a nice viewer for 
scalar images with colormap support, and x/y projections.

What I missed though is the display of RGB images.  It looks as if ds9 was 
capable of doing so (I could add "frames" of RGB type), but I did not find a 
way to feed them with data.  numdisplay.display(myarray) seems to only 
support 2D-arrays.

Ciao, /  /                                                    .o.
     /--/                                                     ..o
    /  / ANS                                                  ooo
Bill Spotz | 1 Dec 19:13

Re: [Swig-user] swig numpy2carray converters

These corrections have been committed.

Thanks.

On Dec 1, 2007, at 9:21 AM, Georg Holzmann wrote:

>> * A new ARGOUTVIEW suite of typemaps is provided that allows your
>> wrapped function
>>    to provide a pointer to internal data and that returns a numpy
>> array encapsulating
>>    it.
>
> Thanks for integrating it !
>
>> * New typemaps are provided that correctly handle FORTRAN ordered 2D
>> and 3D arrays.
>
> I have some problem with your FORTRAN implementation.
> The problem is how you set the flags in numpy.i "int
> require_fortran(PyArrayObject* ary)" (~ line 402).
>
> You do it like this:
> ary->flags = ary->flags | NPY_F_CONTIGUOUS;
> which does not work (at least on my computer) - I still get usual
> C-ordered arrays returned.
>
> However, it does work if you set the flags like this:
> ary->flags = NPY_FARRAY;
>
>> Tests for the ARGOUTVIEW and FORTRAN ordered arrays have also been
[...]

Ondrej Certik | 3 Dec 01:52

python-numpy debian package and f2py

Hi,

I am a co-maintainer of the python-scipy package in Debian, and it now
seems to be in quite good shape. However, the python-numpy package
is quite a mess, so, as usually happens in open source, I got fed up and
tried to clean it up. I noticed that f2py was moved from an external
package into numpy, but the versions mismatch:

The newest (deprecated) python-f2py package in Debian has version
2.45.241+1926, so I assume this was the version of f2py before it was
merged into numpy. However, the f2py in numpy says when executed:

Version:     2_3816
numpy Version: 1.0.3

so I assume the version of f2py in numpy is 2_3816? Has the
versioning scheme of f2py changed? Another question: since both numpy
and f2py are now built from the same source, doesn't f2py simply have
the same version as numpy, i.e. 1.0.3?  Note: I know there is a newer
numpy release, but that's not the point now.

I am asking because we will probably have to remove the old
python-f2py package and build a new one from the sources of numpy,
etc., and it will take some time until this happens (ftpmasters need to
remove the old package from the archive, then the new binary package
needs to go to
[...]

Martin Spacek | 3 Dec 02:22

Re: Loading a > GB file into array

Sebastian Haase wrote:
> reading this thread I have two comments.
> a) *Displaying* at 200Hz probably makes little sense, since humans
> would only see about max. of 30Hz (aka video frame rate).
> Consequently you would want to separate your data frame rate, that (as
> I understand) you want to save data to disk and - asynchronously -
> "display as many frames as you can" (I have used pyOpenGL for this
> with great satisfaction)

Hi Sebastian,

Although 30Hz looks pretty good, if you watch a 60fps movie, you can
easily tell the difference. It's much smoother. Try recording AVIs on a
point and shoot digital camera, if you have one that can do both 30fps
and 60fps (like my fairly old Canon SD200).

And that's just perception. We're doing neurophysiology, recording from
neurons in the visual cortex, which can phase lock to CRT screen rasters
up to 100Hz. This is an artifact we don't want to deal with, so we use a
200Hz monitor. I need to be certain of exactly what's on the monitor on
every refresh, ie every 5ms, so I run python (with Andrew Straw's
package VisionEgg) as a "realtime" priority process in windows on a dual
core computer, which lets me reliably update the video frame buffer in
time for the next refresh, without having to worry about windows
multitasking butting in and stealing CPU cycles for the next 15-20ms.
Python runs on one core in "realtime", windows does its junk on the
other core. Right now, every 3rd video refresh (ie every 15ms, which is
66.7 Hz, close to the original 60fps the movie was recorded at) I update
with a new movie frame. That update needs to happen in less than 5ms,
every time. If there's any disk access involved during the update, it
[...]

