numpy fill an array with arrays - python

I want to combine an unspecified (finite) number of matrices under a Kronecker product. In order to do this I want to save the matrices in an array, but I don't know how to do this. At the moment I have:
for i in range(LNew-2):
    for j in range(LNew-2):
        Bulk = np.empty(shape=(LNew-1,LNew-1))
        if i == j:
            Bulk[i,j] = H2
        else:
            Bulk[i,j] = idm
Here H2 and idm are both matrices, which I want to combine under a Kronecker product. But since Bulk is an ndarray object, I suppose it won't accept array-like objects inside it.
edit:
This is the function in which I want to use this idea. I am using it to build a Hamiltonian matrix for a quantum spin chain. So H2 is the Hamiltonian for a two-particle chain:
H2 is a 4x4 matrix and idm is the 2x2 identity matrix.
The three-particle chain is then np.kron(H2,idm)+np.kron(idm,H2),
and for four particles
np.kron(np.kron(H2,idm),idm)+np.kron(idm,np.kron(H2,idm))+np.kron(idm,np.kron(idm,H2)), and so on.
def ExpandHN(LNew):
    idm = np.identity(2)
    H2 = GetH(2,'N')
    HNew = H2
    for i in range(LNew-2):
        for j in range(LNew-2):
            Bulk = np.empty(shape=(LNew-1,LNew-1))
            if i == j:
                Bulk[i,j] = H2
            else:
                Bulk[i,j] = idm
    i = 0
    for i in range(LNew-2):
        for j in range(LNew-3):
            HNew += np.kron(Bulk[i,j], Bulk[i,j+1])  # something like this
    return HNew
As you can see, the second set of for loops hasn't been worked out.
That being said, if someone has a totally different but working solution I would be happy with that too.

If I understand correctly, your question boils down to how to create arrays of arrays with numpy. I would suggest using the standard Python dict:
Bulk = dict()
for i in range(LNew-2):
    for j in range(LNew-2):
        if i == j:
            Bulk[(i,j)] = H2
        else:
            Bulk[(i,j)] = idm
Using tuples as keys allows you to maintain array-like indexing of the matrices.
Also note that you should define Bulk outside of the two for loops (in any case).
HTH
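For reference, here is a sketch of how the whole Hamiltonian could be assembled without storing the factor matrices at all, building each Kronecker term on the fly. This only illustrates the construction described in the question; the signature differs from the original ExpandHN in that H2 and idm are passed in as arguments:

import numpy as np
from functools import reduce

def ExpandHN(LNew, H2, idm):
    # Hamiltonian for an LNew-particle chain: a sum of LNew-1 Kronecker
    # products, with the 4x4 two-site term H2 at bond position pos and
    # 2x2 identities on every other site.
    HNew = np.zeros((2**LNew, 2**LNew))
    for pos in range(LNew - 1):
        factors = [idm] * pos + [H2] + [idm] * (LNew - 2 - pos)
        HNew += reduce(np.kron, factors)
    return HNew

For LNew=3 this reproduces np.kron(H2,idm) + np.kron(idm,H2), and for LNew=4 the three-term sum from the question.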

Related

Removing loops in numpy for a simple matrix assignment

How can I remove loops in this simple matrix assignment in order to increase performance?
nk, ncol, nrow = index.shape
for kk in range(0, nk):
    for ii in range(0, nrow):
        for jj in range(0, ncol):
            idx = index[kk][ii][jj]
            counter[idx][ii][jj] += 1
I come from C++ and I am finding it difficult to adapt to numpy's functions for some very basic matrix manipulation like this one. I think I have simplified it to a one-dimensional loop, but this is still too slow for what I need, and it seems to me that there has got to be a more direct way of doing it. Any suggestions? Thanks.
for kk in range(0, nk):
    xx, yy = np.meshgrid(np.arange(ncol), np.arange(nrow))
    counter[index[kk,:,:].flatten(), yy.flatten(), xx.flatten()] += 1
If I understand it correctly, you are looking for this:
uniq, counter = np.unique(index, return_counts=True, axis=0)
uniq should give you the unique set of x,y values (x,y will be flattened into a single array) and counter the corresponding number of repetitions in the array index.
EDIT:
Per OP's comment below:
xx,yy = np.meshgrid(np.arange(ncol),np.arange(nrow))
idx, counts = np.unique(np.vstack((index.flatten(),np.repeat(yy.flatten(),nk),np.repeat(xx.flatten(),nk))), return_counts=True,axis=1)
counter[tuple(idx)] = counts
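As another option, the original triple loop can be vectorised in one shot with np.add.at, which, unlike plain fancy-indexed +=, accumulates correctly when the same index triple occurs more than once. A sketch, assuming counter is already allocated with the right shape:

import numpy as np

# index grids with the same shape as `index`; ii and jj give the row/column
# position of every element, mirroring the original triple loop
_, ii, jj = np.indices(index.shape)
# increments counter[index[k,i,j], i, j] by 1 for every (k, i, j)
np.add.at(counter, (index, ii, jj), 1)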

Use a function like a numpy array

I'm dealing with a big array D with which I'm running into memory problems. However, the entries of that big array are in fact just copies of elements of a much smaller array B. Now my idea would be to use something like a "dynamic view" into B instead of constructing the full D. For example, is it possible to use a function D_fun like an array, which then reads the correct element of B? I.e. something like
def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i,j]
And then I could use D_fun to do some matrix and vector multiplications.
Of course, anything else that would keep me from copying the elements of B repeatedly into a huge matrix would be appreciated.
Edit: I realized that if I invest some time in my other code I can get the matrix D to be a block matrix with the Bs on the diagonal and zeros otherwise.
This is usually done by subclassing numpy.ndarray and overloading __getitem__, __setitem__, __delitem__
(array-like access via []) to remap the indices like D_fun(..) does. Still, I am not sure if this will work in combination with the numpy parts implemented in C.
Some concerns:
When you're doing calculations on your big matrix D via the small matrix B, numpy might create a copy of D with its real dimensions, thus using more space than wanted.
If several (I1,J1), (I2,J2).. are mapped to the same (i,j), D[I1,J1] = newValue will also set D(I2,J2) to newValue.
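A minimal sketch of the remapping idea, using a plain wrapper class rather than a full ndarray subclass, and reusing the hypothetical convert_i_idx/convert_j_idx functions from the question:

class MappedView:
    # Presents B through remapped indices, the way D_fun does.
    def __init__(self, B):
        self.B = B

    def __getitem__(self, key):
        I, J = key
        return self.B[convert_i_idx(I, J), convert_j_idx(I, J)]

    def __setitem__(self, key, value):
        I, J = key
        self.B[convert_i_idx(I, J), convert_j_idx(I, J)] = value

Note that this only gives [] access; it won't plug into np.dot or other compiled routines.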
np.dot uses compiled libraries to perform fast matrix products. That constrains the data type (integer, floats), and requires that the data be contiguous. I'd suggest studying this recent question about large dot products, numpy: efficient, large dot products
Defining a class with a custom __getitem__ is a way of accessing an object with indexing syntax. Look in numpy/lib/index_tricks.py for some interesting examples of this: np.mgrid, np.r_, np.s_, etc. But this is largely a syntax enhancement. It doesn't avoid the issue of defining a robust and efficient mapping between your D and B.
And before trying to do much with subclassing ndarray, take a look at the implementations of np.matrix or np.ma. scipy.sparse also creates classes that behave like ndarray in many ways, but it does not subclass ndarray.
In your D_fun, are I and J scalars? If so, this conversion would be horribly inefficient. It would be better if they could be arrays, lists or slices (anything that B[atuple] implements), but that can be a lot of work.
def D_fun(B, I, J):
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B[i,j]

def __getitem__(self, atuple):
    # sketch of a getitem version of your function
    I, J = atuple
    # <select B based on I, J?>
    i = convert_i_idx(I, J)
    j = convert_j_idx(I, J)
    return B.__getitem__((i,j))
What is the mapping from D to B like? The simplest, and most efficient mapping would be that D is just a higher dimensional collection of B, i.e.
D = np.array([B0,B1,B2,...,Bn])
D[0,...] == B0
Slightly more complicated is the case where D[n1:n2,....] == B0, a slice
But if the B0 values are scattered around D, your chances of an efficient, reliable mapping are very small.
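Given the edit in the question (D is block-diagonal with copies of B on the diagonal), one concrete way to avoid materialising D at all is a sparse block-diagonal matrix; a sketch using scipy.sparse.block_diag with an illustrative small block:

import numpy as np
from scipy.sparse import block_diag

B = np.arange(9.0).reshape(3, 3)           # small illustrative block
D = block_diag([B, B, B], format='csr')    # block-diagonal D; the zeros are never stored
x = np.ones(D.shape[1])
y = D @ x                                  # matrix-vector product without building dense D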

Combinations of features using Python NumPy

For an assignment I have to use different combinations of features belonging to some data, to evaluate a classification system. By features I mean measurements, e.g. height, weight, age, income. So for instance I want to see how well a classifier performs when given just the height and weight to work with, and then the height and age say. I not only want to be able to test what two features work best together, but also what 3 features work best together and would like to be able to generalise this to n features.
I've been attempting this using numpy's mgrid to create n-dimensional arrays, flattening them, and then making arrays that use the same elements from each array to create new ones. Tricky to explain, so here is some code and pseudocode:
import numpy as np

def test_feature_combos(data, combinations):
    dimensions = combinations.shape[0]
    grid = np.empty(dimensions)
    for i in xrange(dimensions):
        grid[i] = combinations[i].flatten()
        # The above line throws a "setting an array element with a sequence"
        # error, which I understand, but this shows my approach.

    # ** Pseudo code begin **
    # For each element of each element of this new array,
    # create a new array like so:
    #     [[1,1,2,2],[1,2,1,2]] ---> [[1,1],[1,2],[2,1],[2,2]]
    # Call this new array combo_indices.
    # Then choose the columns (features) from the data in a loop using:
    #     new_data = data[:, combo_indices[j]]

combinations = np.mgrid[1:5,1:5]
test_feature_combos(data, combinations)
I concede that this approach means a lot of unnecessary combinations due to repeats; however, I cannot even implement this, so beggars cannot be choosers.
Please can someone advise me on how I can either a) implement my approach or b) achieve this goal in a much more elegant way.
Thanks in advance, and let me know if any clarification needs to be made, this was tough to explain.
To generate all combinations of k elements drawn without replacement from a set of size n you can use itertools.combinations, e.g.:
idx = np.vstack(itertools.combinations(range(n), k))  # a (C(n, k), k) array of indices
For the special case where k=2 it's often faster to use the indices of the upper triangle of an n x n matrix, e.g.:
idx = np.vstack(np.triu_indices(n, 1)).T
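For the stated goal (evaluating a classifier on every pair, triple, ..., of features), the combinations can drive the column selection directly. A sketch, assuming data has samples in rows and features in columns; evaluate_classifier is a hypothetical placeholder for the assignment's evaluation code:

import itertools

n_features = data.shape[1]
for k in range(2, n_features + 1):              # pairs, triples, ..., all features
    for combo in itertools.combinations(range(n_features), k):
        subset = data[:, list(combo)]           # columns for this feature combination
        # evaluate_classifier(subset)           # hypothetical evaluation hook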

Unable to add matrices to tuples

New to Python and numpy; I've searched and tried all possible solutions and am not getting results.
I have a function that returns 2 matrices. I want to create an array of matrices that saves each of the matrices being returned by my function. I've done so many different versions; this was the closest. I'm used to Java, not Python. If I do the following: centroidsm[0] and clustersm[0], I cannot get each individual array.
This is my code:
centroidsm = []
centroidsm.append([])
clustersm = []
clustersm.append([])
for k in range(2,20):
    centroids, clusters = kMeans(train, k)
    centroidsm[k].append(centroids)
    clustersm[k].append(clusters)
First, I don't know why you're appending an empty list to centroidsm. You should do away with those lines.
Second, if centroidsm is supposed to be a list of centroid matrices, you simply need to call centroidsm.append(centroids) inside your for loop (centroidsm[k].append attempts to append to a list at index k, a list that doesn't exist).
centroidsm = []
clustersm = []
for k in range(2,20):
    centroids, clusters = kMeans(train, k)
    centroidsm.append(centroids)
    clustersm.append(clusters)
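One thing to keep in mind with the list version is that the index is offset from k: centroidsm[0] holds the result for k=2. If looking results up by k directly is more natural, a dict keyed by k is a small variant of the same idea:

centroidsm = {}
clustersm = {}
for k in range(2, 20):
    centroids, clusters = kMeans(train, k)
    centroidsm[k] = centroids
    clustersm[k] = clusters
# centroidsm[5] is now the centroid matrix returned for k=5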

Iteration over binarized image with numpy is slow

I need to iterate over all the pixels in a binarized image to find shapes. But it takes a long time to iterate over each image pixel in this manner. Is there any other way to iterate over the image pixels faster?
dimension = im.shape
rows = dimension[0]
cols = dimension[1]
for i in range(0, rows):
    for j in range(0, cols):
        doSomeOperation(im[i,j])
In general, what your doSomeOperation does determines how much the code can be sped up.
If the mentioned interest in finding shapes actually means finding connected components, then a simple way to speed-up the solution is to use ndimage.label followed by ndimage.find_objects from the scipy package.
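A sketch of that approach, assuming im is the binarized image with nonzero pixels belonging to shapes:

from scipy import ndimage

labels, num_shapes = ndimage.label(im > 0)    # label connected components
slices = ndimage.find_objects(labels)         # one bounding-box slice pair per shape
for i, sl in enumerate(slices, start=1):
    shape_mask = labels[sl] == i              # pixels of shape i inside its bounding box
    # ... process shape_mask ...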
As rubik's comment said, Python loops are slow in comparison to the speed at which vectorised functions can work. With a vectorised function you define a function that works on a single element (sometimes more, if you get into more complicated vectorised functions) and returns a single value. Common vectorised functions, like addition and multiplication, are already defined.
e.g.:
import numpy

arr = numpy.arange(10)
arr = arr * numpy.arange(10, 20)
# multiply each element of arr by the corresponding element of the other array
arr = arr + 1
# add 1 to all elements

@numpy.vectorize
def threshold(element):
    if element < 20:
        return 0
    else:
        return element
# the @ decorator notation is the same as
# threshold = numpy.vectorize(threshold)

arr = threshold(arr)
# sets all elements less than 20 to 0
However, because you're trying to find shapes, it might be worth saying what areas of pixels you are looking at, as there might be better ways of finding what you're looking for.
