devnew@gmail.com | 1 Mar 2008 06:41
Picon

Re: PCA on set of face images


On Mar 1, 12:57 am, "Peter Skomoroch"  wrote:
I think
> > matlab example should be easy to translate to scipy/matplotlib using the
> > montage function:
>
> > load faces.mat
> > %Form covariance matrix
> > C=cov(faces');
> > %build eigenvectors and eigenvalues
> > [E,D] = eig(C);

hi Peter,
nice code..ran the examples..
however couldn't follow the matlab code since i have no exposure to
matlab..was using numpy etc for calcs
could you confirm the layout for the face images data? i assumed that
the initial face matrix should be
faces=a numpy matrix with N rows ie N=numofimages

row1=image1pixels as a sequence
row2=image2pixels as a sequence
...
rowN=imageNpixels as a sequence

and covariancematrix=faces*faces_transpose

is this the right way?
thanks
(Continue reading)

Charles R Harris | 1 Mar 2008 07:12
Picon

Re: contiguous true



On Fri, Feb 29, 2008 at 10:53 AM, John Hunter <jdh2358 <at> gmail.com> wrote:
[apologies if this is a resend, my mail just flaked out]

I have a boolean array and would like to find the lowest index "ind"
where N contiguous elements are all True.  Eg, if x is

In [101]: x = np.random.rand(20)>.4

In [102]: x
Out[102]:
array([False,  True,  True, False, False,  True,  True, False, False,
       True, False,  True, False,  True,  True,  True, False,  True,
      False,  True], dtype=bool)

I would like to find ind=1 for N=2 and ind=13 for N=2.  I assume with
the right cumsum, diff and maybe repeat magic, this can be vectorized,
but the proper incantation is escaping me.

for N==3, I thought of

In [110]: x = x.astype(int)
In [112]: y = x[:-2] + x[1:-1] + x[2:]

In [125]: ind = (y==3).nonzero()[0]

In [126]: if len(ind): ind = ind[0]

In [128]: ind
Out[128]: 13


This may be more involved than you want, but

In [37]: prng = random.RandomState(1234567890)

In [38]: x = prng.random_sample(50) < 0.5

In [39]: y1 = concatenate(([False], x[:-1]))

In [40]: y2 = concatenate((x[1:], [False]))

In [41]: beg = ind[x & ~y1]

In [42]: end = ind[x & ~y2]

In [43]: cnt = end - beg + 1

In [44]: i = beg[cnt == 4]

In [45]: i
Out[45]: array([28])

In [46]: x
Out[46]:
array([False, False, False, False,  True, False,  True, False, False,
       False,  True, False,  True, False,  True,  True,  True,  True,
        True, False, False, False,  True, False,  True, False, False,
       False,  True,  True,  True,  True, False, False,  True, False,
       False, False, False, False, False, False, False,  True, False,
       False,  True, False,  True, False], dtype=bool)

produces a list of the indices where sequences of length 4 begin.

Chuck

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Charles R Harris | 1 Mar 2008 07:21
Picon

Re: contiguous true



On Fri, Feb 29, 2008 at 11:12 PM, Charles R Harris <charlesr.harris <at> gmail.com> wrote:


On Fri, Feb 29, 2008 at 10:53 AM, John Hunter <jdh2358 <at> gmail.com> wrote:
[apologies if this is a resend, my mail just flaked out]

I have a boolean array and would like to find the lowest index "ind"
where N contiguous elements are all True.  Eg, if x is

In [101]: x = np.random.rand(20)>.4

In [102]: x
Out[102]:
array([False,  True,  True, False, False,  True,  True, False, False,
       True, False,  True, False,  True,  True,  True, False,  True,
      False,  True], dtype=bool)

I would like to find ind=1 for N=2 and ind=13 for N=2.  I assume with
the right cumsum, diff and maybe repeat magic, this can be vectorized,
but the proper incantation is escaping me.

for N==3, I thought of

In [110]: x = x.astype(int)
In [112]: y = x[:-2] + x[1:-1] + x[2:]

In [125]: ind = (y==3).nonzero()[0]

In [126]: if len(ind): ind = ind[0]

In [128]: ind
Out[128]: 13


This may be more involved than you want, but

In [37]: prng = random.RandomState(1234567890)

In [38]: x = prng.random_sample(50) < 0.5

In [39]: y1 = concatenate(([False], x[:-1]))

In [40]: y2 = concatenate((x[1:], [False]))

In [41]: beg = ind[x & ~y1]

In [42]: end = ind[x & ~y2]

In [43]: cnt = end - beg + 1

In [44]: i = beg[cnt == 4]

In [45]: i
Out[45]: array([28])

In [46]: x
Out[46]:
array([False, False, False, False,  True, False,  True, False, False,
       False,  True, False,  True, False,  True,  True,  True,  True,
        True, False, False, False,  True, False,  True, False, False,
       False,  True,  True,  True,  True, False, False,  True, False,
       False, False, False, False, False, False, False,  True, False,
       False,  True, False,  True, False], dtype=bool)

produces a list of the indices where sequences of length 4 begin.

Chuck

Oops, ind = arange(len(x)). I suppose nonzero would work as well.

Chuck

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Anne Archibald | 1 Mar 2008 07:56
Picon
Gravatar

Re: contiguous true

On 01/03/2008, Charles R Harris <charlesr.harris <at> gmail.com> wrote:

> > On Fri, Feb 29, 2008 at 10:53 AM, John Hunter <jdh2358 <at> gmail.com> wrote:
> > > I have a boolean array and would like to find the lowest index "ind"
> > > where N contiguous elements are all True.  Eg, if x is

[...]

> Oops, ind = arange(len(x)). I suppose nonzero would work as well.

I'm guessing you're alluding to the fact that diff(nonzero(x)) gives
you a list of the run lengths of Falses in x (except possibly for the
first one).

If you have a fondness for the baroque, you can try

numpy.where(numpy.convolve(x,[1,]*N,'valid')==N)

For large N this can even use Fourier-domain convolution (though you'd
then have to be careful about round-off error).

Silly, really, it's O(NM) or O(N log M) instead of O(N).

Anne
Peter Skomoroch | 1 Mar 2008 08:18
Picon

Re: PCA on set of face images

I think that is correct...

Here is what the final result should look like:

http://www.datawrangling.com/media/images/first_16.png

If the dimensions for the sample faces don't work out to ( 361 x 361 ) in the end, then you are likely to be missing a transpose somewhere.  Also, be aware that the scipy linalg.eig by default returns a vector of eigenvalues and a matrix, but the Matlab eig(), returns 2 matrices ( the eigenvalues are multiplied by an identity matrix to get a diagonal matrix).  You can check out the mathesaurus reference sheet for help translating the example into python, but hopefully this will point you in the right direction:

see:

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/eig.html

vs:

>>> help(linalg.eig)

Help on function eig in module scipy.linalg.decomp:

eig(a, b=None, left=False, right=True, overwrite_a=False, overwrite_b=False)
    Solve ordinary and generalized eigenvalue problem
    of a square matrix.
   
    Inputs:
   
      a     -- An N x N matrix.
      b     -- An N x N matrix [default is identity(N)].
      left  -- Return left eigenvectors [disabled].
      right -- Return right eigenvectors [enabled].
      overwrite_a, overwrite_b -- save space by overwriting the a and/or
                                  b matrices (both False by default)
   
    Outputs:
   
      w      -- eigenvalues [left==right==False].
      w,vr   -- w and right eigenvectors [left==False,right=True].
      w,vl   -- w and left eigenvectors [left==True,right==False].
      w,vl,vr  -- [left==right==True].
   
    Definitions:
   
      a * vr[:,i] = w[i] * b * vr[:,i]
   
      a^H * vl[:,i] = conjugate(w[i]) * b^H * vl[:,i]
   
    where a^H denotes transpose(conjugate(a)).


On Sat, Mar 1, 2008 at 12:41 AM, devnew <at> gmail.com <devnew <at> gmail.com> wrote:


On Mar 1, 12:57 am, "Peter Skomoroch"  wrote:
I think
> > matlab example should be easy to translate to scipy/matplotlib using the
> > montage function:
>
> > load faces.mat
> > %Form covariance matrix
> > C=cov(faces');
> > %build eigenvectors and eigenvalues
> > [E,D] = eig(C);


hi Peter,
nice code..ran the examples..
however couldn't follow the matlab code since i have no exposure to
matlab..was using numpy etc for calcs
could you confirm the layout for the face images data? i assumed that
the initial face matrix should be
faces=a numpy matrix with N rows ie N=numofimages

row1=image1pixels as a sequence
row2=image2pixels as a sequence
...
rowN=imageNpixels as a sequence


and covariancematrix=faces*faces_transpose

is this the right way?
thanks
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion



--
Peter N. Skomoroch
peter.skomoroch <at> gmail.com
http://www.datawrangling.com
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion <at> scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
devnew@gmail.com | 1 Mar 2008 08:27
Picon

Re: confusion about eigenvector


> This example assumes that facearray is an ndarray.(like you described
> in original post ;-) ) It looks like you are using a matrix.

hi Arnar
thanks ..
a few doubts however

1.when i use say 10 images of 4X3 each
u, s, vt = linalg.svd(facearray, 0)
i will get vt of shape (10,12)
can't i take this as facespace? why do i need to get the transpose?
then i can take as eigface_image0= vt[0].reshape(imgwdth,imght)

2.this way (svd) is diff from covariance matrix method. if i am to do
it using the later ,how can i get the eigenface image data?

thanks for the help
D
Robert Kern | 1 Mar 2008 09:17
Picon
Gravatar

Re: failure building numpy using icc

On Thu, Feb 28, 2008 at 1:21 PM, Glen W. Mabey <Glen.Mabey <at> swri.org> wrote:
> Hello,
>
>  I'm using svn numpy and get the following error upon executing
>
>   /usr/local/bin/python2.5 setup.py config --noisy --cc=/opt/intel/cce/10.0.025/bin/icc
--compiler=intel --fcompiler=intel build_clib build_ext
>
>  I see:
>
>  conv_template:> build/src.linux-x86_64-2.5/numpy/core/src/scalartypes.inc
>  Traceback (most recent call last):
>   File "setup.py", line 96, in <module>
>     setup_package()
>   File "setup.py", line 89, in setup_package
>     configuration=configuration )
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/core.py",
line 184, in setup
>     return old_setup(**new_attr)
>   File "/usr/local/lib/python2.5/distutils/core.py", line 151, in setup
>     dist.run_commands()
>   File "/usr/local/lib/python2.5/distutils/dist.py", line 974, in run_commands
>     self.run_command(cmd)
>   File "/usr/local/lib/python2.5/distutils/dist.py", line 994, in run_command
>     cmd_obj.run()
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_ext.py",
line 56, in run
>     self.run_command('build_src')
>   File "/usr/local/lib/python2.5/distutils/cmd.py", line 333, in run_command
>     self.distribution.run_command(command)
>   File "/usr/local/lib/python2.5/distutils/dist.py", line 994, in run_command
>     cmd_obj.run()
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py",
line 130, in run
>     self.build_sources()
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py",
line 147, in build_sources
>     self.build_extension_sources(ext)
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py",
line 252, in build_extension_sources
>     sources = self.template_sources(sources, ext)
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/command/build_src.py",
line 359, in template_sources
>     outstr = process_c_file(source)
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py",
line 185, in process_file
>     % (sourcefile, process_str(''.join(lines))))
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py",
line 150, in process_str
>     newstr[sub[0]:sub[1]], sub[4])
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py",
line 117, in expand_sub
>     % (line, template_re.sub(namerepl, substr)))
>   File
"/home/gmabey/src/DiamondBack/Diamondback/src/numpy-20080228_svn/numpy/distutils/conv_template.py",
line 113, in namerepl
>     return names[name][thissub[0]]
>  KeyError: 'PREFIX'
>
>
>  And I do not see any errors when building the same svn version with gcc (on
>  a different machine).
>
>  I've unsuccessfully tried to follow that backtrace of functions to
>  figure out exactly what is going on.
>
>  Any hints/suggestions?

Off-hand, no, sorry. I'm not sure why the compiler would matter in
this part of the code, though. Can you try using gcc on the same
machine?

--

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
devnew@gmail.com | 1 Mar 2008 14:43
Picon

svd() and eigh()

hi
i have a set of images of faces which i make into a 2d array using
numpy.ndarray
each row represents a face image
faces=
[[ 173.   87.  ...   88.  165.]
 [ 158.  103.  ..   73.  143.]
 [ 180.   87.  ..   55.  143.]
 [ 155.  117.  ..   93.  155.]]

from which i can get the mean image =>
avgface=average(faces,axis=0)
and calculate the adjustedfaces=faces-avgface

now if i apply svd() i get
 u, s, vt = linalg.svd(adjustedfaces, 0)
# a member posted this
facespace=vt.transpose()

and if i calculate covariance matrix
covmat=matrix(adjustedfaces)* matrix(adjustedfaces).transpose()
eval,evect=eigh(covmat)
evect=sortbyeigenvalue(evect) # sothat largest eval is first
facespace=evect* matrix(adjustedfaces)

what is the difference btw these 2 methods? apparently they yield
different values for the facespace. which should i follow?
is it possible to calculate eigenvectors using svd()?

thanks
D
Arnar Flatberg | 1 Mar 2008 18:50
Picon

Re: confusion about eigenvector

On Sat, Mar 1, 2008 at 8:27 AM, devnew <at> gmail.com <devnew <at> gmail.com> wrote:
>
>  > This example assumes that facearray is an ndarray.(like you described
>  > in original post ;-) ) It looks like you are using a matrix.
>
>  hi Arnar
>  thanks ..
>  a few doubts however
>
>  1.when i use say 10 images of 4X3 each
>
> u, s, vt = linalg.svd(facearray, 0)
>  i will get vt of shape (10,12)

> can't i take this as facespace?
Yes, you may

> why do i need to get the transpose?
You dont need to. I did because then it would put the eigenvectors
that span your column space as columns of the facespace array. I
figured that would be easier for you, as that would be compatible with
the use of eig (eigh) and matlab

>  then i can take as eigface_image0= vt[0].reshape(imgwdth,imght)
>
> 2.this way (svd) is diff from covariance matrix method.

No it is not. You may be fooled by the scaling though. I see from the
post above, that there may be some confusion here about svd and eig on
a crossproduct matrix :-)

Essentially, if X is a column centered array of size (num_images, num_pixels):

u, s, vt = linalg.svd(X),

Then, the columns of u span the space of dot(X, X.T), the rows of vt
span the space of dot(X.T, X) and s is a vector of scaling
coefficients. Another way of seeing this is that u spans the column
space of X, and vt spans the row space of X. So, for a third view, the
columns of u are the eigenvectors of dot(X, X.T) and the rows of vt
contains the eigenvectors of dot(X.T, X).

Now, in your, `covariance method` you use eigh(dot(X, X.T)), where the
eigenvectors would be exactly the same as u(the array) from an svd on
X. In order to recover the facespace you use facespace=dot(X.T, u).
This facespace is the same as s*vt.T, where s and vt are from the svd.

In my example, the eigenvectors spanning the column space were scaled.
I called this for scores: (u*s)
In your computation the facespace gets scaled implicit. Where to put
the scale is different from application to application and has no
clear definition.

I dont know if this made anything any clearer. However, a simple
example may be clearer:

-------
# X is (a ndarray, *not* matrix) column centered with vectorized images in rows

# method 1:
XX = dot(X, X.T)
s, u = linalg.eigh(XX)
reorder = s.argsort()[::-1]
facespace = dot(X.T, u[:,reorder])

# method 2:
u, s, vt = svd(X, 0)
facespace2 = s*vt.T
------

This gives identical result.
Please remember that eigenvector signs are arbitrary when comparing.

> if i am to do
>  it using the later ,how can i get the eigenface image data?

Just like I described before

Arnar
Arnar Flatberg | 1 Mar 2008 18:58
Picon

Re: svd() and eigh()

On Sat, Mar 1, 2008 at 2:43 PM, devnew <at> gmail.com <devnew <at> gmail.com> wrote:
> hi
>  i have a set of images of faces which i make into a 2d array using
>  numpy.ndarray
>  each row represents a face image
>  faces=
>  [[ 173.   87.  ...   88.  165.]
>   [ 158.  103.  ..   73.  143.]
>   [ 180.   87.  ..   55.  143.]
>   [ 155.  117.  ..   93.  155.]]
>
>  from which i can get the mean image =>
>  avgface=average(faces,axis=0)
>  and calculate the adjustedfaces=faces-avgface
>
>  now if i apply svd() i get
>   u, s, vt = linalg.svd(adjustedfaces, 0)
>  # a member posted this
>  facespace=vt.transpose()
>
>  and if i calculate covariance matrix
>  covmat=matrix(adjustedfaces)* matrix(adjustedfaces).transpose()
>  eval,evect=eigh(covmat)
>  evect=sortbyeigenvalue(evect) # sothat largest eval is first
>  facespace=evect* matrix(adjustedfaces)
>
>  what is the difference btw these 2 methods?
See my answer, in your other post

> apparently they yield
>  different values for the facespace.
Not really.

> which should i follow?
The svd is a little less efficient and slightly slower. However it is
clear in implementation and may, in some rare situations, be more
precise.

>  is it possible to calculate eigenvectors using svd()?
Again, see me other response.

>
>  thanks
>  D
>  _______________________________________________
>  Numpy-discussion mailing list
>  Numpy-discussion <at> scipy.org
>  http://projects.scipy.org/mailman/listinfo/numpy-discussion
>

Gmane