How to calculate eigenfaces in Python?

I'm trying to calculate eigenfaces for a set of images using Python.
First, I turn each image into a vector using:
list(map(lambda x:x.flatten(), x))
Then I calculate the covariance matrix (after subtracting the mean image from all data):
# x is a numpy array
x = x - mean_image
cov_matrix = np.cov(x.T)
Then I calculate eigenvalues and eigenvectors:
eigen_values, eigen_vectors = np.linalg.eig(cov_matrix)
The results are vectors with complex numbers, so I only keep the real part to be able to show them:
eigen_vectors = np.real(eigen_vectors)
After trying to show the eigenfaces (eigenvectors), the result is not even close to what an eigenface looks like:
I have managed to get a list of eigenfaces using np.linalg.svd(); however, I'm curious why my code does not work and how I can change it so it works as expected.
To stop np.linalg.eig from returning complex results, I reduced the size of the images, so it no longer returns complex numbers; however, my eigenvectors still don't look like eigenfaces:

proj_data = np.dot(x.transpose(),eigen_vector).T
img = proj_data[i].reshape(height,width)
This will give you the expected result.
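The answer above doesn't say where eigen_vector comes from. One plausible reading (an assumption, not confirmed by the post) is the "snapshot" trick: take the eigenvectors of the small n_images x n_images matrix x @ x.T and map them back to pixel space with x.T, because if x @ x.T @ v = lam * v then (x.T @ x) @ (x.T @ v) = lam * (x.T @ v). A minimal sketch, assuming x is the mean-centred (n_images, n_pixels) array from the question; the function name eigenfaces_snapshot is hypothetical:

```python
import numpy as np


def eigenfaces_snapshot(x, k):
    """Top-k eigenfaces of mean-centred data x with shape (n_images, n_pixels).

    Returns (faces, eigenvalues); faces has shape (k, n_pixels), one face per row.
    """
    small = x @ x.T                        # (n_images, n_images): cheap when n_images << n_pixels
    vals, vecs = np.linalg.eigh(small)     # symmetric matrix -> real, ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]     # indices of the k largest eigenvalues
    # Eigenvectors are the columns of vecs; x.T maps each one into pixel space.
    faces = (x.T @ vecs[:, order]).T       # (k, n_pixels)
    faces /= np.linalg.norm(faces, axis=1, keepdims=True)  # unit-length rows
    return faces, vals[order]
```

Each row can then be displayed with faces[i].reshape(height, width). The eigenvalues here belong to x.T @ x, so they differ from those of np.cov by the factor 1 / (n_images - 1).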

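After calculating the eigenvectors, you should transpose them; otherwise you will get a mixed image. np.linalg.eig returns the eigenvectors as the columns of the matrix (eigen_vectors[:, i] belongs to eigen_values[i]), so taking a row such as eigen_vectors[i] mixes one component from every eigenvector.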
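To make the column convention concrete, here is a minimal sketch of the covariance route from the question with two changes: np.linalg.eigh (meant for symmetric matrices such as a covariance matrix; it returns real eigenvalues in ascending order) instead of np.linalg.eig, and selecting columns rather than rows. The function name and the (n_images, height, width) input layout are assumptions:

```python
import numpy as np


def eigenfaces(images, k):
    """Top-k eigenfaces of a stack of equally sized grayscale images.

    images: (n_images, height, width). Returns (mean_face, faces, eigenvalues)
    where faces has shape (k, height, width).
    """
    n, h, w = images.shape
    x = images.reshape(n, h * w).astype(np.float64)
    mean_face = x.mean(axis=0)
    x = x - mean_face
    # rowvar=False: each column (pixel) is a variable, each row an observation.
    cov = np.cov(x, rowvar=False)          # (h*w, h*w), symmetric
    vals, vecs = np.linalg.eigh(cov)       # real output, ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]     # largest eigenvalues first
    # Eigenvectors are the COLUMNS of vecs: select columns, then transpose.
    faces = vecs[:, order].T.reshape(k, h, w)
    return mean_face.reshape(h, w), faces, vals[order]
```

For real face images the (h*w, h*w) covariance gets big quickly, which is why SVD or the snapshot trick is usually preferred; this version just shows the indexing fix.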

Related

Interpolating a function over a grid with different input sizes

I have a function f(u,v,w) which I would like to interpolate using a scipy function (with linear interpolation). This is easy enough.
When I run the interpolation step, I simply do the following (interpolating over a u,v,w grid):
import numpy as np
from scipy.interpolate import RegularGridInterpolator
u = np.linspace(-1,1,100)
v = np.linspace(-2,2,50)
w = np.linspace(3,8,30)
values_grid = np.zeros((len(u),len(v),len(w)))
for i in range(len(u)):
    for j in range(len(v)):  # j indexes v
        for k in range(len(w)):
            values_grid[i,j,k] = f(u[i],v[j],w[k])
my_interpolating_function = RegularGridInterpolator((u, v, w), values_grid, method='linear', bounds_error=False, fill_value=-999)
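If f accepts NumPy arrays (an assumption; the question doesn't say), the triple loop can be replaced by a single call to f on an np.meshgrid built with indexing='ij', which keeps the axis order (u, v, w) that RegularGridInterpolator expects. A sketch with a hypothetical stand-in for f:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator


def f(u, v, w):
    # Hypothetical stand-in for the real f; it must work element-wise on arrays.
    return np.sin(u) + v * w


u = np.linspace(-1, 1, 100)
v = np.linspace(-2, 2, 50)
w = np.linspace(3, 8, 30)

# indexing="ij" gives arrays of shape (len(u), len(v), len(w)) = (100, 50, 30).
U, V, W = np.meshgrid(u, v, w, indexing="ij")
values_grid = f(U, V, W)

my_interpolating_function = RegularGridInterpolator(
    (u, v, w), values_grid, method="linear", bounds_error=False, fill_value=-999
)
```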
This is fine for many cases. However, when I want to evaluate this interpolation function, it seems I have to pass inputs of shape (number of input samples, dimension of samples). E.g.:
func_input = np.vstack([u_samps, v_samps, w_samps]).T  # e.g. shape is (500, 3)
output = my_interpolating_function(func_input)  # output has shape (500,)
This works fine. The issue is that I would like to evaluate this function over a grid where the samples have the following shapes:
u_samps.shape == (500,)
v_samps.shape == (100, 100)
w_samps.shape == (100, 100)
Meaning I would like to evaluate
my_interpolating_function([u_samps, v_samps, w_samps])
and get out an array with shape (500, 100, 100), so that the interpolation is evaluated for all 500 u_samps over the v_samps and w_samps grids. I can flatten the v_samps and w_samps arrays, but then I have to make hundreds of copies of u_samps to get the inputs into the correct format. Is there any way to have an interpolation function that takes the inputs above (u_samps, v_samps, w_samps with the specified shapes) and efficiently returns an array with shape (500, 100, 100)?
Any help greatly appreciated, I have been stuck on this problem and it's really holding up my progress! The end goal is to use this function in a statistical likelihood which needs to be sampled with MCMC, so speed is pretty important (and making hundreds of copies of massive arrays is very slow)
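RegularGridInterpolator accepts query points of shape (..., ndim) and returns an array of shape (...), so one way to get a (500, 100, 100) result without building the copies by hand is to broadcast the three sample arrays against each other and stack them along a last axis. This is a sketch, not a proven fix for the MCMC speed problem: np.broadcast_arrays itself does not copy, but np.stack does allocate the full (P, Q, R, 3) query array, and the interpolator still evaluates every point. The helper name evaluate_on_grid and the linear test function are assumptions:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

u = np.linspace(-1, 1, 100)
v = np.linspace(-2, 2, 50)
w = np.linspace(3, 8, 30)
U, V, W = np.meshgrid(u, v, w, indexing="ij")
# Hypothetical linear function, so linear interpolation reproduces it exactly.
interp = RegularGridInterpolator((u, v, w), U + 2 * V + 3 * W, method="linear",
                                 bounds_error=False, fill_value=-999)


def evaluate_on_grid(interp, u_samps, v_samps, w_samps):
    """Evaluate interp for every u in u_samps (shape (P,)) on a (Q, R) v/w grid.

    Returns an array of shape (P, Q, R).
    """
    uu = u_samps[:, None, None]   # (P, 1, 1)
    vv = v_samps[None, :, :]      # (1, Q, R)
    ww = w_samps[None, :, :]      # (1, Q, R)
    # broadcast_arrays returns views; stacking them is the one allocation here.
    pts = np.stack(np.broadcast_arrays(uu, vv, ww), axis=-1)  # (P, Q, R, 3)
    return interp(pts)            # result has shape pts.shape[:-1]
```

With u_samps of shape (500,) and v_samps, w_samps of shape (100, 100), evaluate_on_grid(interp, u_samps, v_samps, w_samps) returns shape (500, 100, 100).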

Efficient 2D cross correlation in Python?

I have two arrays of size (n, m, m) (n number of images of size (m,m)). I want to perform a cross correlation between each corresponding n of the two arrays.
Example: for n=1 -> corr2d([m,m]_1, [m,m]_2)
My current approach uses a for loop in Python:
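import numpy as np
from scipy.signal import correlate2d
# X has shape (N, 2, m, m); autocorr must be preallocated, e.g. with shape (N, 1, m, m)
for i in range(len(X)):
    X_co = X[i,0,:,:]/(np.max(X[i,0,:,:]))
    X_x = X[i,1,:,:]/(np.max(X[i,1,:,:]))
    autocorr[i,0,:,:] = correlate2d(X_co, X_x, mode='same', boundary='fill', fillvalue=0)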
Obviously this is very slow when the input contains many images, and it becomes a substantial part of the total run time if (m,m) << n.
The obvious optimization is to skip the loop and feed everything directly to the compiled correlation function. Currently I'm using scipy's correlate2d.
I've looked around but haven't found any function that allows correlation along some axis or multiple inputs.
Any tips on how to make scipy's correlate2d work or alternatives?
I decided to implement it via the FFT instead.
import numpy as np
def fft_xcorr2D(x):
    # Over axes (-2,-1) (default in the fft2 function)
    ## Pad because of cyclic (circular?) behavior of the FFT
    x = np.fft.fft2(np.pad(x,([0,0],[0,0],[0,34],[0,34]),mode='constant'))
    # Conjugate for correlation, not convolution (Conv. Theorem)
    x[:,1,:,:] = np.conj(x[:,1,:,:])
    # Over axes (-2,-1) (default in the ifft2 function)
    ## Multiply elementwise over 2:nd axis (2 image bands for me)
    ### fftshift over rows and columns of the images
    corr = np.fft.fftshift(np.fft.ifft2(np.prod(x,axis=1)),axes=(-2,-1))
    # Return after removing padding
    return np.abs(corr)[:,3:-2,3:-2]
Call via:
ts=fft_xcorr2D(X)
If anybody wants to use it:
My input is a 4D array: (N, 2, #Rows, #Cols)
E.g. (500, 2, 30, 30): 500 images, 2 bands (polarizations, for example), of 30x30 pixels
If your input is different, adjust the padding (and the [3:-2, 3:-2] slice that removes it) to your liking.
Check that your input order matches mine; otherwise, change the axes arguments of the fft2 and ifft2 functions, np.prod, and fftshift. I use fftshift to put the maximum value in the middle (otherwise it ends up in the corners), so be wary of that if that's not what you want.
Why is it the maximum value? Technically, it doesn't have to be, but for my purpose it is. fftshift gives the correlation the layout you're used to; otherwise, the quadrants are turned "inside out". If you wonder what I mean, remove fftshift (just the fftshift call, not its arguments), call the function as before, and plot the result.
Afterwards, it should be ready to use.
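A tiny illustration of the fftshift point, using the circular autocorrelation of a random 8x8 image (an arbitrary example, not the author's data): the zero-lag peak sits at index (0, 0) until fftshift moves it to the centre.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
spec = np.fft.fft2(img)
# Circular autocorrelation: multiply the spectrum by its conjugate, transform back.
acorr = np.fft.ifft2(spec * np.conj(spec)).real
peak = np.unravel_index(acorr.argmax(), acorr.shape)              # zero lag sits in the corner
shifted = np.fft.fftshift(acorr, axes=(-2, -1))
peak_shifted = np.unravel_index(shifted.argmax(), shifted.shape)  # zero lag moved to (4, 4)
```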
Possibly x.prod(axis=1) is faster than np.prod(x, axis=1), but this is an old post, and I saw no improvement when I tried it.
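A sketch of the same FFT idea without the hard-coded padding and crop: the padded size is computed from the image shapes so the result is the full linear (not circular) correlation, and every image pair is handled in one call because np.fft.rfft2 transforms only the last two axes. It assumes real-valued input and, as I understand scipy's definition, matches scipy.signal.correlate2d(a[i], b[i], mode='full') for each i; the function name is hypothetical:

```python
import numpy as np


def batched_xcorr2d_full(a, b):
    """Full 2-D cross-correlation of a[i] with b[i] for every i, via the FFT.

    a: real array (n, m1, n1); b: real array (n, m2, n2).
    Returns an array of shape (n, m1 + m2 - 1, n1 + n2 - 1).
    """
    m1, n1 = a.shape[-2:]
    m2, n2 = b.shape[-2:]
    shape = (m1 + m2 - 1, n1 + n2 - 1)   # zero-pad so there is no circular wrap-around
    # Correlation = convolution with the second input flipped along both axes.
    b_flipped = b[..., ::-1, ::-1]
    fa = np.fft.rfft2(a, s=shape, axes=(-2, -1))
    fb = np.fft.rfft2(b_flipped, s=shape, axes=(-2, -1))
    return np.fft.irfft2(fa * fb, s=shape, axes=(-2, -1))
```

For the (N, 2, rows, cols) input above, this would be called as batched_xcorr2d_full(X[:, 0], X[:, 1]) after any per-image normalisation such as dividing by the maximum. To reproduce mode='same', crop the central rows x cols window; I'd check the exact offset against correlate2d on a single image, especially for even sizes.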

Singular value decomposition (svd) and mean does not exclude masked values during computation

I am new to Python programming, so forgive me if my questions are too basic. This forum has helped me a lot before, and thanks to you all for your contributions.
This time I have a set of 12,000 images on which I am performing singular value decomposition (SVD) and calculating their mean. Some of the images have pixels with very high positive or negative values that I don't want to use during computation, so I used numpy.ma.masked_array to exclude them from both the SVD and the mean computation. Also, some images are smaller than others and were padded with zeros so that all images have the same (pixel) dimensions. I don't want the zero padding to be used during computation either, so I used numpy.ma.masked_array to exclude it from both the SVD and the mean calculation.
Here are some example images:
The problem is that when I perform both the SVD and the mean calculation, the masked values (array elements) are not excluded during computation. I have tried all that I know to resolve this, without success. Below are the steps that I took.
from numpy.linalg import svd
import numpy as np
from numpy.ma import masked_array
n, x, y = images.data.shape
Z = []
meanimage = []
for icount in range(n):
    image = images[icount,:,:]  # current image
    # creating a mask for too positively or negatively high values
    mask = (np.abs(image) > 2).astype(int)
    yindex = 0
    xindex = 0
    # --- creating a mask for zero padded values
    for i in range(x//2):  # index of the first nonzero pixel down the middle column
        if image[i, y//2] != 0:
            yindex = i
            break
    for i in range(y//2):  # index of the first nonzero pixel along the middle row
        if image[x//2, i] != 0:
            xindex = i
            break
    if yindex:  # guard: mask[-0:, :] would select every row
        mask[:yindex, :] = 1
        mask[-yindex:, :] = 1
    if xindex:  # xindex is a column index, so mask columns
        mask[:, :xindex] = 1
        mask[:, -xindex:] = 1
    # ---
    image = masked_array(images[icount,:,:], mask)
    Z.append(image.ravel())  # accumulating matrix for SVD computation
    meanimage.append(image)  # accumulating matrix for mean computation
# calc. SVD
u,s,v = svd(masked_array(Z))
# calc. mean image
meanimage = masked_array(meanimage).mean(axis=0)
bimage = np.dot(np.dot(u[:,:2],np.diag(s[:2])),np.transpose(v)[:2,:])
eigenimage = bimage[2,:].reshape(x, y)
The final results, eigenimage and meanimage, do not exclude the masked values from the computation. I don't know what I did wrong. Please, I need some ideas that will help me resolve this.
Above are some samples of the images (beams) data that I am working with.
The final images that I get after computation for the eigenimage and meanimage are :
Eigen (beam) image (with SVD)
Mean (beam) image (masked_array mean)
From the above figures, both the eigenimage and the meanimage lose a lot of side-lobe information, which is not desired.
But I was expecting the final eigenimages to look like:
The masked_array mean actually excludes masked pixels ('zero paddings') from mean computation. I confirmed this by comparing this result with the one calculated without a mask and noticed a remarkable difference, which confirms that numpy.ma.masked_array mean works perfectly for my case.
On the SVD eigenimage:
The problem was transposing v (np.transpose(v)). I found out from the documentation (1) that numpy.linalg.svd already returns the transpose of v (the right singular vectors as rows), so I just needed to perform the dot product without transposing v.
bimage = np.dot(np.dot(u[:,:2],np.diag(s[:2])),v[:2,:])
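On the masking itself: as far as I know, numpy.linalg.svd has no notion of a mask and works on the array's underlying data, so masked pixels still enter the decomposition. One common workaround (not from the original thread) is to centre the data with the masked mean and then replace masked entries by 0 explicitly with .filled(0) before calling svd, so a masked pixel contributes no deviation. A minimal sketch under that assumption, which also uses vh without transposing it, as in the fix above; the function name masked_rank_k is hypothetical:

```python
import numpy as np


def masked_rank_k(images, mask, k=2):
    """Rank-k SVD approximation of a stack of images, ignoring masked pixels.

    images: (n, rows, cols) array; mask: array of the same shape, True = exclude.
    Returns (mean_image, approx) with approx of shape (n, rows, cols).
    """
    n, rows, cols = images.shape
    data = np.ma.masked_array(images, mask=mask).reshape(n, rows * cols)
    mean = data.mean(axis=0)              # masked mean skips masked pixels
    # Assumption: a masked pixel counts as "no deviation from the mean" (0).
    centred = (data - mean).filled(0.0)   # plain ndarray, no mask left
    u, s, vh = np.linalg.svd(centred, full_matrices=False)
    # vh already holds the right singular vectors as rows: no transpose needed.
    approx = (u[:, :k] * s[:k]) @ vh[:k, :]
    return mean.reshape(rows, cols), approx.reshape(n, rows, cols)
```

Filling with 0 is a modelling choice; for heavily masked data, iterative imputation or weighted low-rank methods may behave better.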

How can I improve a "dumb" vector quantization algorithm for K-means clustering

I need to convert a codebase relying on the scipy.cluster.vq module to not use scipy so that I can implement it in C++.
First I am trying to replicate the results using only numpy.
Starting with an image of dimensions MxNx3, I create a Kx3 "centroids" array using k-means with OpenCV.
I need to map each pixel of the original image to the pixel value in the centroids array that is closest to the original pixel.
I have it working, but performance is awful. I'm sure there must be more advanced ways to compute this, and I suspect it's related to a nearest neighbour search (maybe?) but don't know for sure.
Here is what I'm currently doing: I think this may be called a "brute force" approach
1. Iterate over every pixel in the image.
2. Calculate the Euclidean distance between this pixel and each pixel in the centroid list.
3. Return the minimum value from the list generated in step 2.
4. Set the original image pixel to the value in the centroid list that gave the minimum distance.
def vq(self,image,centroids):
    x,y,z = image.shape
    Z=np.reshape(image,(x*y,z))
    counts = np.zeros(len(centroids))
    clusterMap = np.zeros(Z.shape,np.uint8)
    for i in range(Z.shape[0]):
        color = Z[i]
        closestIndex = self.getClosestCenter(color, centroids)
        counts[closestIndex]+=1  # tracking how often each color occurs
        clusterMap[i] = centroids[closestIndex]
    return clusterMap,counts

def getClosestCenter(self,color,centers):
    distances = [0 for i in range(len(centers))]
    for i,center in enumerate(centers):
        distances[i] = self.getDistance(color, center)
    return distances.index(min(distances))

def getDistance(self,value1,value2):
    if len(value1) !=len(value2): return None #error
    sum = 0
    for i in range(len(value1)):
        sum+=(value1[i]-value2[i])**2
    return sum**(0.5)
First of all, profile your code to see where exactly it is slow.
Constructs such as enumerate can be very expensive because they require the creation and garbage collection of many tuple objects. A good rule of thumb is to avoid object allocations in inner loops and functions (this includes hidden objects such as tuples)
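Last but not least, k-means does not use the Euclidean distance itself; it minimizes the sum of squares. Get rid of the square root: because the square root is monotonic, the nearest centroid is the same with or without it, and you save one square root per comparison.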
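Following the advice above, the per-pixel Python loops can be replaced by NumPy broadcasting: compute all pixel-to-centroid squared distances at once, take argmin, and count with np.bincount. A sketch, assuming an (M, N, C) image and a (K, C) centroid array; the function name vq_numpy is hypothetical. Note that the (M*N, K, C) intermediate can get large for big images, in which case processing the pixels in chunks is one option. Unlike the question's vq, it returns the cluster map reshaped to (M, N, C), plus the label array:

```python
import numpy as np


def vq_numpy(image, centroids):
    """Map every pixel of an (M, N, C) image to its nearest centroid in (K, C)."""
    m, n, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)       # (M*N, C), avoids uint8 wrap-around
    cents = np.asarray(centroids, dtype=np.float64)        # (K, C)
    # Squared Euclidean distances via broadcasting: (M*N, 1, C) - (1, K, C).
    d2 = ((pixels[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)                             # nearest centroid per pixel
    counts = np.bincount(labels, minlength=len(cents))     # how often each centroid occurs
    cluster_map = cents[labels].astype(np.uint8).reshape(m, n, c)
    return cluster_map, counts, labels.reshape(m, n)
```

The same squared-distance argmin also translates directly into a simple nested loop in C++.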

Minimum Distance Algorithm using GDAL and Python

I'm trying to implement the Minimum Distance Algorithm for image classification using GDAL and Python. After calculating the mean pixel-value of the sample areas and storing them into a list of arrays ("sample_array"), I read the image into an array called "values". With the following code I loop through this array:
import math
values = valBD.ReadAsArray()
# loop through pixel columns
for X in range(0, XSize):
    # loop through pixel lines
    for Y in range(0, YSize):
        # initialize variables
        minDist = float('inf')
        # read the pixel once, before values[Y, X] is overwritten with a class index
        iPixelVal = values[Y, X]
        # get minimum distance
        for iSample in range(0, sample_count):
            # dist = calc_distance(values[jPixel, iPixel], sample_array[iSample])
            # computing minimum distance
            mean = sample_array[iSample]
            dist = math.sqrt((iPixelVal - mean) * (iPixelVal - mean))  # only for testing
            if dist < minDist:
                minDist = dist
                values[Y, X] = iSample
classBD.WriteArray(values, xoff=0, yoff=0)
This procedure takes very long for big images, so I want to ask whether somebody knows a faster method. I don't know much about the access speed of different variables in Python. Or maybe someone knows a library I could use.
Thanks in advance,
Mario
You should definitely be using NumPy. I work with some pretty large raster datasets and NumPy burns through them. On my machine, with the code below there's no noticeable delay for a 1000 x 1000 array. An explanation of how this works follows the code.
import numpy as np
from scipy.spatial.distance import cdist
# some starter data
dim = (1000,1000)
values = np.random.randint(0, 10, dim)
# cdist will want 'samples' as a 2-d array
samples = np.array([1, 2, 3]).reshape(-1, 1)
# this could be a one-liner
# 'values' must have the same number of columns as 'samples'
mins = cdist(values.reshape(-1, 1), samples)
outvalues = mins.argmin(axis=1).reshape(dim)
cdist() calculates the "distance" from each element in values to each of the elements in samples. This generates a 1,000,000 x 3 array, where row n holds the distance from pixel n in the original array to each of the sample values [1, 2, 3]. argmin(axis=1) gives you the index of the minimum value along each row, which is what you want. A quick reshape gives you the rectangular format you'd expect for an image.
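If the classification should use several bands at once, the same argmin idea extends with plain NumPy broadcasting. A sketch, assuming a single band arrives as (rows, cols) and several bands as (bands, rows, cols) (which, as far as I know, is the layout GDAL's Dataset.ReadAsArray returns), with one mean per class and band; the function name is hypothetical. Ties go to the lowest class index, like the strict < in the question's loop:

```python
import numpy as np


def min_distance_classify(values, sample_means):
    """Minimum-distance classification of a raster.

    values: (rows, cols) single band or (bands, rows, cols) multiband array.
    sample_means: (n_classes,) for one band or (n_classes, bands).
    Returns a (rows, cols) array of class indices.
    """
    vals = np.asarray(values, dtype=np.float64)
    means = np.asarray(sample_means, dtype=np.float64)
    if vals.ndim == 2:                  # treat a single band as bands=1
        vals = vals[None, :, :]
        means = means.reshape(-1, 1)
    bands, rows, cols = vals.shape
    pixels = vals.reshape(bands, -1).T  # (rows*cols, bands)
    # Squared distance suffices: its argmin equals the argmin of the true distance.
    d2 = ((pixels[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(rows, cols)
```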
Agree with Thomas K: use PIL, or else write a C function and wrap it using e.g. ctypes, or at the very least use some NumPy matrix operations.
Or else use PyPy on your existing code (JIT-compiled code can be 100x faster on image code). Try PyPy and tell us what speedup you got.
Bottom line: never do pixel-wise work like this natively in CPython; the interpreter and memory-management overhead will kill you.
