Python - NumPy - tuples as elements of an array

I'm a CS major in university working on a programming project for my Calc III course involving singular-value decomposition. The idea is basically to convert an image of m x n dimensions into an m x n matrix wherein each element is a tuple representing the color channels (r, g, b) of the pixel at point (m, n). I'm using Python because it's the only language I've really been (well-)taught so far.
From what I can tell, Python generally doesn't like tuples as elements of an array. I did a little research of my own and found a workaround, namely, pre-allocating the array as follows:
def image_to_array():  # converts an image to an array
    aPic = loadPicture("zorak_color.gif")
    ph = getHeight(aPic)
    pw = getWidth(aPic)
    anArray = zeros((ph, pw), dtype='O')
    for h in range(ph):
        for w in range(pw):
            p = getPixel(aPic, w, h)
            anArray[h][w] = getRGB(p)
    return anArray
This worked correctly for the first part of the assignment, which was simply to convert an image to a matrix (no linear algebra involved).
The part with SVD, though, is where it gets trickier. When I call the built-in NumPy svd function on the array I built from my image (where each element is a tuple), I get the following error:
Traceback (most recent call last):
File "<pyshell#5>", line 1, in -toplevel-
svd(x)
File "C:\Python24\Lib\site-packages\numpy\linalg\linalg.py", line 724, in svd
a = _fastCopyAndTranspose(t, a)
File "C:\Python24\Lib\site-packages\numpy\linalg\linalg.py", line 107, in _fastCopyAndTranspose
cast_arrays = cast_arrays + (_fastCT(a.astype(type)),)
ValueError: setting an array element with a sequence.
This is the same error I was getting initially, before I did some research and found that I could pre-allocate my arrays to allow tuples as elements.
The issue now is that I am only in my first semester of (college-level) programming, and these NumPy functions, written by and for professional programmers, are a little too black-box for me (though I'm sure they're much clearer to those with experience). So editing them to allow for tuples is more complicated than when I did it on my own function. Where do I go from here? Should I copy the relevant NumPy functions into my own program and modify them accordingly?
Thanks in advance.

Instead of setting the array element type to 'O' (object), you should set it to a compound dtype such as a (type, length) tuple. See the SciPy manual for some examples.
In your case, the easiest is to use something like
a = zeros((ph,pw), dtype=(float,3))
Assuming your RGB values are tuples of 3 floating point numbers.
This is similar to creating a 3d array (as Steve suggested) and, in fact, the tuple elements are accessed as a[n,m][k] or a[n,m,k], where k is the element's position in the tuple.
Of course, the SVD is defined for 2d matrices, not 3d arrays, so you cannot call linalg.svd(a) directly. You have to decide which of the three possible matrices (R, G, or B) you need the SVD of.
If, for example, you want the SVD of the "R" matrix (assuming that is the first element of the tuple), use something like:
linalg.svd(a[:,:,0])
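Putting that together, a minimal sketch of the whole pipeline, reusing the picture-library helpers from the question (loadPicture, getPixel, getRGB) and assuming getRGB returns (r, g, b) with red first:

from numpy import zeros, linalg

def image_to_float_array():
    aPic = loadPicture("zorak_color.gif")
    ph, pw = getHeight(aPic), getWidth(aPic)
    a = zeros((ph, pw), dtype=(float, 3))  # NumPy stores this as a (ph, pw, 3) float array
    for h in range(ph):
        for w in range(pw):
            a[h, w] = getRGB(getPixel(aPic, w, h))
    return a

a = image_to_float_array()
U, s, Vt = linalg.svd(a[:, :, 0])  # SVD of the red channel alone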

I think you want a ph by pw by 3 numpy array.
anArray = zeros((ph, pw, 3))
for h in range(ph):
    for w in range(pw):
        p = getPixel(aPic, w, h)
        anArray[h][w] = getRGB(p)
You just need to make sure getRGB returns a 3-element list instead of a tuple.
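With this layout each color plane is already a 2d matrix, so the SVD from the original question can be run one channel at a time, e.g. (assuming red is stored first):

from numpy.linalg import svd
U, s, Vt = svd(anArray[:, :, 0])  # SVD of the red channel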

Related

expand numpy array in n dimensions

I am trying to 'expand' an array (generate a new array with proportionally more elements in all dimensions). I have an array of known numbers (call it X) and I want to make it j times bigger in each dimension (j is called ratio in the code below).
So far I generated a new array of zeros with more elements, then I used broadcasting to insert the original numbers in the new array (at fixed intervals).
Finally, I used linspace to fill the gaps, but this part is actually not directly relevant to the question.
The code I used (for n=3) is:
import numpy as np
new_shape = (np.array(X.shape) - 1) * ratio + 1
new_array = np.zeros(shape=new_shape)
new_array[::ratio,::ratio,::ratio] = X
My problem is that this is not general, I would have to modify the third line based on ndim. Is there a way to use such broadcasting for any number of dimensions in my array?
Edit: to be more precise, the third line would have to be:
new_array[::ratio,::ratio] = X
if ndim=2
or
new_array[::ratio,::ratio,::ratio,::ratio] = X
if ndim=4
and so on. I want to avoid having to write separate code for each value of ndim.
P.S. If there is a better tool to do the entire process (such as 'inner-padding') that I am not aware of, I will be happy to learn about it.
Thank you
array = array[..., np.newaxis] will add another dimension
You can use slice notation -
slicer = tuple(slice(None,None,ratio) for i in range(X.ndim))
new_array[slicer] = X
Build the slicing tuple manually. ::ratio is equivalent to slice(None, None, ratio):
new_array[(slice(None, None, ratio),) * new_array.ndim] = X
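Putting either answer into a self-contained function (the function name expand and the test shapes here are made up):

import numpy as np

def expand(X, ratio):
    # Inner-pad X: every dimension grows by `ratio`, with zeros in the gaps
    new_shape = (np.array(X.shape) - 1) * ratio + 1
    new_array = np.zeros(new_shape, dtype=X.dtype)
    slicer = tuple(slice(None, None, ratio) for _ in range(X.ndim))
    new_array[slicer] = X
    return new_array

X = np.arange(8).reshape(2, 2, 2)
print(expand(X, 3).shape)  # (4, 4, 4)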

Building a numpy array through iteration

I start with an array of shape (n,d) containing n particle-vectors of length d dimensions, and want to construct an array containing the inter-particle vectors, of shape (n,n,d) which I'll go on to use for calculating forces etc in a Newtonian simulation.
I want to be able to generalise this for any number of dimensions, so the position vectors could be of any d, and have come up with the below, building the new dimension one array at a time into a list that I then convert back into an array. But this seems clunky, and, since it must be such a common operation, I can't help thinking there's some inbuilt numpy magic that will perform this operation more quickly.
def delta_matrix(pos_vec):
    build = []
    for i in pos_vec:
        build.append(i - pos_vec)
    return np.array(build)
In particular then, is there a numpy method that performs this iterative type of operation?
What about list comprehension? That seems simple yet powerful enough.
def delta_matrix(pos_vec):
    return np.array([i - pos_vec for i in pos_vec])
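If you want to avoid the Python-level loop entirely, broadcasting computes the same (n, n, d) array in one vectorized expression; a sketch:

import numpy as np

def delta_matrix(pos_vec):
    # (n, 1, d) - (1, n, d) broadcasts to (n, n, d): result[i, j] = pos_vec[i] - pos_vec[j]
    return pos_vec[:, None, :] - pos_vec[None, :, :]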

numpy array with Pillow's getcolors' dimensions

Pretty self-explanatory. Pillow's getcolors() method returns a list of tuples, each with a (1, 3) shape (i.e. (count, (r, g, b))). Unless there is a better way to handle this, how can I create a numpy array with an [n, [1, 3]] shape?
You should rather use an n x 4 numpy array. The first axis allows you to choose between the different results of the getcolors method; the second axis contains your data: store the count value in the first entry, followed by the r, g and b values. Then you can do something like this:
result = np.empty((number, 4))
# get one entry
count, r, g, b = result[n]
You should always keep in mind what you are actually trying to do: each data point you want to store contains 4 different integers, and you expect n different data points of this type. Therefore, your array has to have the shape n x 4.
PS: You use a strange definition of a shape's dimensions, and it is causing you a lot of trouble. I suggest using the standard definition of shapes and thinking of them as the axes of a multi-dimensional array.
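As a concrete sketch (the file name is hypothetical, and maxcolors is raised so getcolors never returns None):

import numpy as np
from PIL import Image

img = Image.open("example.png").convert("RGB")
colors = img.getcolors(maxcolors=img.size[0] * img.size[1])  # [(count, (r, g, b)), ...]

# Flatten each (count, (r, g, b)) pair into one row of four integers
result = np.array([(count, r, g, b) for count, (r, g, b) in colors])
print(result.shape)  # (n, 4)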

Combinations of features using Python NumPy

For an assignment I have to use different combinations of features belonging to some data, to evaluate a classification system. By features I mean measurements, e.g. height, weight, age, income. So for instance I want to see how well a classifier performs when given just the height and weight to work with, and then the height and age say. I not only want to be able to test what two features work best together, but also what 3 features work best together and would like to be able to generalise this to n features.
I've been attempting this using numpy's mgrid to create n-dimensional arrays, flattening them, and then making arrays that use the same elements from each array to create new ones. Tricky to explain, so here is some code and pseudocode:
import numpy as np

def test_feature_combos(data, combinations):
    dimensions = combinations.shape[0]
    grid = np.empty(dimensions)
    for i in xrange(dimensions):
        grid[i] = combinations[i].flatten()
# The loop above throws a "setting an array element with a sequence" error, which I understand, but it shows my approach.
**Pseudo code begin**
For each element of each element of this new array, create a new array like so:
[[1,1,2,2],[1,2,1,2]] ---> [[1,1],[1,2],[2,1],[2,2]]
Call this new array combo_indices.
Then choose the columns (features) from the data in a loop using:
new_data = data[:, combo_indices[j]]
**Pseudo code end**
The call would then be:
combinations = np.mgrid[1:5, 1:5]
test_feature_combos(data, combinations)
I concede that this approach means a lot of unnecessary combinations due to repeats; however, I cannot even implement it, so beggars can't be choosers.
Please can someone advise me on how I can either a) implement my approach or b) achieve this goal in a much more elegant way.
Thanks in advance, and let me know if any clarification needs to be made, this was tough to explain.
To generate all combinations of k elements drawn without replacement from a set of size n, you can use itertools.combinations, e.g.:
import itertools
idx = np.vstack(list(itertools.combinations(range(n), k)))  # an (n choose k, k) array of indices
For the special case where k=2, it's often faster to use the indices of the upper triangle of an n x n matrix, e.g.:
idx = np.vstack(np.triu_indices(n, 1)).T
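A sketch of how the index array could then drive the evaluation loop (the data shape and the classifier call are made up):

import itertools
import numpy as np

data = np.random.random((100, 4))  # hypothetical: 100 samples, 4 features
k = 2
idx = np.vstack(list(itertools.combinations(range(data.shape[1]), k)))

for combo in idx:
    new_data = data[:, combo]  # all samples, just these k feature columns
    # evaluate_classifier(new_data)  # hypothetical evaluation call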

Compute outer product of arrays with arbitrary dimensions

I have two arrays A, B and want to take the outer product on their last dimension, e.g.
result[:,i,j] = A[:,i] * B[:,j]
when A, B are 2-dimensional.
How can I do this if I don't know whether they will be 2- or 3-dimensional?
In my specific problem, A and B are slices out of a bigger 3-dimensional array Z. Sometimes this may be called with integer indices (A=Z[:,1,:], B=Z[:,2,:]) and other times with slices (A=Z[:,1:3,:], B=Z[:,4:6,:]). Since scipy "squeezes" singleton dimensions, I won't know what dimensions my inputs will be.
The array-outer-product I'm trying to define should satisfy
array_outer_product(Y[a,b,:], Z[i,j,:]) == scipy.outer(Y[a,b,:], Z[i,j,:])
array_outer_product(Y[a:a+N,b,:], Z[i:i+N,j,:])[n,:,:] == scipy.outer(Y[a+n,b,:], Z[i+n,j,:])
array_outer_product(Y[a:a+N,b:b+M,:], Z[i:i+N,j:j+M,:])[n,m,:,:] == scipy.outer(Y[a+n,b+m,:], Z[i+n,j+m,:])
for any rank-3 arrays Y, Z and integers a, b, ..., i, j, ..., n, N, ...
The kind of problem I'm dealing with involves a 2-D spatial grid, with a vector-valued function at each grid point. I want to be able to calculate the covariance matrix (outer product) of these vectors, over regions defined by slices in the first two axes.
You may have some luck with einsum:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html
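For instance, einsum's ellipsis notation covers the 2d and 3d cases in one expression (shapes here are illustrative):

import numpy as np

A = np.random.random((5, 4, 3))
B = np.random.random((5, 4, 3))

# Batched outer product over the last axis: result[n, m, i, j] = A[n, m, i] * B[n, m, j]
result = np.einsum('...i,...j->...ij', A, B)
print(result.shape)  # (5, 4, 3, 3)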
After discovering the use of ellipsis in numpy/scipy arrays, I ended up implementing it as a recursive function:
def array_outer_product(A, B, result=None):
    '''Compute the outer-product in the final two dimensions of the given arrays.
    If the result array is provided, the results are written into it.
    '''
    assert A.shape[:-1] == B.shape[:-1]
    if result is None:
        result = scipy.zeros(A.shape + B.shape[-1:], dtype=A.dtype)
    if A.ndim == 1:
        result[:, :] = scipy.outer(A, B)
    else:
        for idx in xrange(A.shape[0]):
            array_outer_product(A[idx, ...], B[idx, ...], result[idx, ...])
    return result
Assuming I've understood you correctly, I encountered a similar issue in my research a couple weeks ago. I realized that the Kronecker product is simply an outer product which preserves dimensionality. Thus, you could do something like this:
import numpy as np

# Generate some data
a = np.random.random((3, 2, 4))
b = np.random.random((2, 5))

# Now compute the Kronecker product
c = np.kron(a, b)

# Check the size: kron preserves the total number of elements
np.prod(c.shape) == np.prod(a.shape) * np.prod(b.shape)
I'm not sure what shape you want at the end, but you could use array slicing in combination with np.rollaxis, np.reshape, np.ravel (etc.) to shuffle things around as you wish. I guess the downside of this is that it does some extra calculations. This may or may not matter, depending on your limitations.
