NumPy Tensor / Kronecker product of matrices coming out shuffled - python

I'm trying to compute the tensor product (update: what I wanted was actually called the Kronecker product, and this naming confusion was why I couldn't find np.kron) of multiple matrices, so that I can apply transformations to vectors that are themselves the tensor product of multiple vectors. I'm running into trouble with flattening the result correctly.
For example, say I want to compute the tensor product of [[0,1],[1,0]] against itself. The result should be something like:
| 0*|0,1|  1*|0,1| |
|   |1,0|    |1,0| |
|                  |
| 1*|0,1|  0*|0,1| |
|   |1,0|    |1,0| |
which I then want to flatten to:
| 0 0 0 1 |
| 0 0 1 0 |
| 0 1 0 0 |
| 1 0 0 0 |
Unfortunately, everything I try either fails to flatten the matrix, flattens it too much, or permutes the entries so that some columns are empty. More specifically, the output of this Python program:
import numpy as np
flip = np.array([[0, 1], [1, 0]])
print(np.tensordot(flip, flip, axes=0))
print(np.reshape(np.tensordot(flip, flip, axes=0), (4, 4)))
is
[[[[0 0]
   [0 0]]

  [[0 1]
   [1 0]]]


 [[[0 1]
   [1 0]]

  [[0 0]
   [0 0]]]]
[[0 0 0 0]
 [0 1 1 0]
 [0 1 1 0]
 [0 0 0 0]]
Neither of which is what I want.
There are a lot of other questions similar to this one, but the things suggested in them haven't worked (or maybe I missed the ones that do). Maybe "tensor product" means something slightly different from what I thought, but the example above should make it clear.

From the answers to this and this question, I learned that what you want is called the "Kronecker product". It's actually built into NumPy, so just do:
np.kron(flip, flip)
But if you want to make the reshape approach work, first swap the middle two axes of the 4-D tensor so that the entries land in Kronecker order:
flip = [[0, 1], [1, 0]]
tensor4d = np.tensordot(flip, flip, axes=0)
print(tensor4d.swapaxes(2, 1).reshape((4, 4)))
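For completeness, a quick sanity check (my own sketch, not part of the original answer) that np.kron and the swapaxes route agree and give the 4x4 matrix from the question:

import numpy as np

flip = np.array([[0, 1], [1, 0]])
tensor4d = np.tensordot(flip, flip, axes=0)
kron = np.kron(flip, flip)
print(kron)
# [[0 0 0 1]
#  [0 0 1 0]
#  [0 1 0 0]
#  [1 0 0 0]]
print(np.array_equal(kron, tensor4d.swapaxes(2, 1).reshape((4, 4))))  # True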

Related

Creating an image contour (1) in a zero matrix, numpy

I'm using OpenCV to find the contours of an object; the contours come back as a matrix of shape (7873, 1, 2) in the form [[[x1, y1]], [[x2, y2]], ...], where x and y are indexes of pixels of an image (see the small example with the `zero` matrix below).
Is it possible, using numpy trickery, to pass a list of all coordinates of the contour and change them to 1?
I'd like to avoid loops as this is time sensitive. Apart from numpy, is there another time-efficient way to do it?
zero = np.zeros((5, 5))
test = np.array([[[2, 1]], [[3, 1]], [[1, 0]]])
zero[test] = 1  # fancy-indexes whole rows, not (x, y) pairs
desired OUTPUT (for this example):
x 0 1 2 3 4
y _____________
0| 0 1 0 0 0
1| 0 0 1 1 0
2| 0 0 0 0 0
3| 0 0 0 0 0
4| 0 0 0 0 0
You can do:
idx = test.reshape(-1, 2).T
zero[idx[1], idx[0]] = 1
Note the swapped order: NumPy indexes as [row, col], i.e. [y, x], while the contour points are (x, y).
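Put together with the example from the question, a minimal runnable sketch:

import numpy as np

zero = np.zeros((5, 5))
test = np.array([[[2, 1]], [[3, 1]], [[1, 0]]])  # OpenCV-style (N, 1, 2) points

idx = test.reshape(-1, 2).T  # row 0: x coords, row 1: y coords
zero[idx[1], idx[0]] = 1     # points are (x, y); NumPy wants [y, x]
print(zero)
# [[0. 1. 0. 0. 0.]
#  [0. 0. 1. 1. 0.]
#  [0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0.]]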

Numpy get secondary diagonal with offset=1 and change the values

I have this 6x6 matrix filled with 0s, and I got the secondary diagonal with offset=1 in sec_diag. What I am trying to do is set the values on that diagonal inside the matrix to the odd numbers from 9 down to 1 ([9, 7, 5, 3, 1]).
import numpy as np
x = np.zeros((6,6), int)
sec_diag = np.diagonal(np.fliplr(x), offset=1)
The result should look like this:
[[0,0,0,0,9,0],
[0,0,0,7,0,0],
[0,0,5,0,0,0],
[0,3,0,0,0,0],
[1,0,0,0,0,0],
[0,0,0,0,0,0]]
EDIT: np.fill_diagonal isn't going to work.
You should use np.roll:
x = np.zeros((6, 6), dtype=np.int32)
np.fill_diagonal(np.fliplr(x), [9, 7, 5, 3, 1, 0])  # fliplr returns a view, so this writes into x
xr = np.roll(x, -1, axis=1)  # shift every row one column to the left
print(xr)
Output
[[0 0 0 0 9 0]
 [0 0 0 7 0 0]
 [0 0 5 0 0 0]
 [0 3 0 0 0 0]
 [1 0 0 0 0 0]
 [0 0 0 0 0 0]]
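Alternatively (my own sketch, not from the answers here), you can fill the offset anti-diagonal directly with fancy indexing, which sidesteps np.fill_diagonal entirely:

import numpy as np

x = np.zeros((6, 6), dtype=np.int32)
rows = np.arange(5)  # rows 0..4
cols = 4 - rows      # the anti-diagonal with offset=1
x[rows, cols] = [9, 7, 5, 3, 1]
print(x)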
Maybe you should try it with a double loop.

How can I optimize searching and matching through multi-dimensional arrays?

I'm trying to match up the elements in 2 different arrays. Array_A is a 3d map of A_Clouds, Array_B is a 3d map of B_Clouds. Each "cloud" is continuous, i.e. any isolated pixels would define a new cloud. The values of the pixels are a single, unique integer for each cloud. Non-cloud values are 0. Here's a 2D example:
[[0 0 0 0 0 0 0 0 0]
 [0 0 0 1 1 1 0 0 0]
 [0 0 1 1 1 1 1 1 0]
 [0 0 0 1 1 1 1 1 0]
 [0 0 0 0 0 1 0 0 0]
 [0 0 0 0 0 0 0 0 0]]
The output I need is simply the IDs (for both clouds) of each A_Cloud which is overlapping with a B_Cloud, and the number (locations not needed) of pixels which are overlapping between those clouds.
The problem is that these are both very large 3 dimensional arrays (~2000x2000x200, both are the same size). I'm basically doing a bunch of nested for loops, which is of course very slow. Is there a faster way that I could approach this problem? Thanks in advance.
This is what I have right now (simplified to 2d):
import collections

final_matches = []
for Acloud_id in ACloud_list:
    Acloud_locs = list(set([(i, j) for j, line in enumerate(Array_A)
                            for i, pix in enumerate(line) if pix == Acloud_id]))
    matches = []
    for loc in Acloud_locs:
        Bcloud_pix = Array_B[loc[0]][loc[1]]
        if Bcloud_pix:
            matches.append(Bcloud_pix)
    counter = collections.Counter(matches)
    final_matches.append([Acloud_id, counter])
Some considerations here:
for Acloud_id in ACloud_list:
    Acloud_locs = list(set([(i, j) for j, line in enumerate(Array_A)
                            for i, pix in enumerate(line) if pix == Acloud_id]))
If I've read that right, this needs to check every pixel in the array in order to generate the set, and it repeats that for every cloud in A. So if you have 500 clouds, you're checking every pixel 500 times. This is not going to scale well!
Might be more efficient to store the overlap counts in a dict, and just go through the arrays once:
overlaps = dict()
for i in possible_x_coords:  # define these however you like
    for j in possible_y_coords:
        if Array_A[i][j] and Array_B[i][j]:
            key = (Array_A[i][j], Array_B[i][j])
            overlaps[key] = 1 + overlaps.get(key, 0)
(apologies for any errors, I'm on the road and can't test my code)
update: You've clarified that the arrays are about 80% sparse. If that figure was a lot higher, and if you had control over the format of your inputs, I'd suggest looking into sparse array formats - if your input only stores the non-zero values for A, this can save you the trouble of checking for zero values in A. However, for something that's only 80% sparse, I'm not sure how much efficiency this would add.
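A vectorized variant (my own sketch, not part of the original answer): mask out the pixels where both arrays are non-zero, then count the unique (A_id, B_id) pairs in one pass with NumPy:

import numpy as np

# Assumes Array_A and Array_B are equal-shaped integer arrays with 0 = no cloud.
mask = (Array_A != 0) & (Array_B != 0)            # pixels covered by a cloud in both
pairs = np.stack([Array_A[mask], Array_B[mask]])  # shape (2, n_overlapping_pixels)
ids, counts = np.unique(pairs, axis=1, return_counts=True)
for (a_id, b_id), n in zip(ids.T, counts):
    print(f"A cloud {a_id} overlaps B cloud {b_id} on {n} pixels")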

python h5py bug when feeding multidimensional dataset

Here is my problem, it works in case 1, not in case 2:
import h5py
import numpy as np

data = np.random.randint(0, 256, (5,), np.uint8)
f = h5py.File('test.h5', 'w')
f.create_dataset('1', (3, 5), np.uint8)
f.create_dataset('2', (1, 3, 5), np.uint8)
print("case 1 before:\n", f['1'][:])
# case 1 before:
#  [[0 0 0 0 0]
#  [0 0 0 0 0]
#  [0 0 0 0 0]]
f['1'][0] = data
print("case 1 after:\n", f['1'][:])
# case 1 after:
#  [[ 75 215 125 175 193]
#  [  0   0   0   0   0]
#  [  0   0   0   0   0]]
print()
print()
print("case 2 before:\n", f['2'][:])
# case 2 before:
#  [[[0 0 0 0 0]
#   [0 0 0 0 0]
#   [0 0 0 0 0]]]
f['2'][0][0] = data
print("case 2 after:\n", f['2'][:])
# case 2 after:
#  [[[0 0 0 0 0]
#   [0 0 0 0 0]
#   [0 0 0 0 0]]]
Can anyone explain what I am doing wrong?
(Please don't suggest creating a np.array with shape equal to my dataset's shape, because I work with far more dimensions and much larger sizes!)
Don't use chained indexing when making assignments. Instead of
f['2'][0][0] = data
Use
f['2'][0,0] = data
f['2'][0] returns a new array whose data is copied from f['2']. f['2'][0][0] = data assigns data to this new array. The assignment has no effect on f['2'].
In contrast, f['2'][0,0] = data modifies f['2'].
Under the hood, remember that foo[x] calls foo.__getitem__(x).
and foo[x] = y calls foo.__setitem__(x, y).
So f['2'][0][0] = data calls
f.__getitem__('2').__getitem__(0).__setitem__(0, data)
f.__getitem__('2') returns a Dataset,
f.__getitem__('2').__getitem__(0) returns a NumPy array, and
f.__getitem__('2').__getitem__(0).__setitem__(0, data) modifies that NumPy array.
Whereas, f['2'][0,0] = data calls
f.__getitem__('2').__setitem__((0,0), data)
Now it is the Dataset's __setitem__ method that gets called, which naturally gives the Dataset an opportunity to modify its internal data.
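A quick way to see the difference in action (a sketch reusing f and data from the question):

f['2'][0][0] = data  # __setitem__ runs on a temporary in-memory copy
print(f['2'][0, 0])  # still all zeros
f['2'][0, 0] = data  # __setitem__ runs on the Dataset itself
print(f['2'][0, 0])  # now equal to data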

Counting of adjacent cells in a numpy array

Past midnight, and maybe someone has an idea how to tackle a problem of mine. I want to count the number of adjacent cells (i.e. the number of array fields with other values, e.g. zeros, in the vicinity of the array values) as a sum for each valid value.
Example:
import numpy
from scipy import ndimage

s = ndimage.generate_binary_structure(2, 2)  # structure can vary
a = numpy.zeros((6, 6), dtype=int)           # example array
a[2:4, 2:4] = 1; a[2, 4] = 1                 # with example value structure
print(a)
[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 1 1 1 0]
 [0 0 1 1 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]
# The value at position [2,4] is surrounded by 6 zeros, while the one at
# position [2,2] has 5 zeros in the vicinity if 's' is the assumed binary structure.
# Total sum of surrounding zeroes is therefore sum(5+4+6+4+5) == 24
How can I count the number of zeros in this way if the structure of my values varies?
I believe I have to make use of SciPy's binary_dilation function, which can enlarge the value structure, but simple counting of overlaps won't lead me to the correct sum, will it?
print(ndimage.binary_dilation(a, s).astype(a.dtype))
[[0 0 0 0 0 0]
 [0 1 1 1 1 1]
 [0 1 1 1 1 1]
 [0 1 1 1 1 1]
 [0 1 1 1 1 0]
 [0 0 0 0 0 0]]
Use a convolution to count neighbours:
import numpy
import scipy.signal

a = numpy.zeros((6, 6), dtype=int)  # example array
a[2:4, 2:4] = 1; a[2, 4] = 1        # with example value structure
b = 1 - a                           # 1 where the input is zero
c = scipy.signal.convolve2d(b, numpy.ones((3, 3)), mode='same')
print(numpy.sum(c * a))             # 24
b = 1-a allows us to count each zero while ignoring the ones.
We convolve with a 3x3 all-ones kernel, which sets each element to the sum of it and its 8 neighbouring values (other kernels are possible, such as the + kernel for only orthogonally adjacent values). With these summed values, we mask off the zeros in the original input (since we don't care about their neighbours), and sum over the whole array.
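For instance, the "+" kernel variant mentioned above would look like this (a sketch reusing a and b from the snippet above):

plus = numpy.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]])  # centre included; it adds nothing, since b is 0 wherever a is 1
c4 = scipy.signal.convolve2d(b, plus, mode='same')
print(numpy.sum(c4 * a))  # counts only orthogonally adjacent zeros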
I think you already got it: after dilation, the number of 1s is 19; minus the 5 of the starting shape, you get 14, which is the number of distinct zero cells surrounding your shape. Your total of 24 counts overlaps.
