How to compute the kind of distance matrix with vectorization


I have a NumPy array A of shape 4 x 3 x 2. Each line below is the 2D coordinate of a node. (Every three nodes form a triangle in my finite element analysis.)
array([[[0., 2.], #node00
[2., 2.], #node01
[1., 1.]], #node02
[[0., 2.], #node10
[1., 1.], #node11
[0., 0.]], #node12
[[2., 2.], #node20
[1., 1.], #node21
[2., 0.]], #node22
[[0., 0.], #node30
[1., 1.], #node31
[2., 0.]]]) #node32
I have another numpy array B of coordinates of pre-computed "centers":
array([[1. , 1.66666667], # center0
[0.33333333, 1. ], # center1
[1.66666667, 1. ], # center2
[1. , 0.33333333]])# center3
How can I efficiently calculate a matrix C of Euclidean distances like this:
dist(center0, node00) dist(center0,node01) dist(center0, node02)
dist(center1, node10) dist(center1,node11) dist(center1, node12)
dist(center2, node20) dist(center2,node21) dist(center2, node22)
dist(center3, node30) dist(center3,node31) dist(center3, node32)
where dist represents a Euclidean distance function such as math.dist or numpy.linalg.norm? Namely, element i,j of the result matrix is the distance from center-i to node-ij.
Vectorized code instead of loops is needed, as my actual data is from medical imaging which is very large. With a nested loop, one can obtain the expected output as follows:
In [63]: for i in range(4):
    ...:     for j in range(3):
    ...:         C[i,j]=math.dist(A[i,j], B[i])
In [67]: C
Out[67]:
array([[1.05409255, 1.05409255, 0.66666667],
[1.05409255, 0.66666667, 1.05409255],
[1.05409255, 0.66666667, 1.05409255],
[1.05409255, 0.66666667, 1.05409255]])
[Edit] This is a different question from Pairwise operations (distance) on two lists in numpy, as things like indexing need to be properly addressed here.

a = np.reshape(A, [12, 2])
b = B[np.repeat(np.arange(4), 3)]
c = np.reshape(np.linalg.norm(a - b, axis=-1), (4, 3))
c
# array([[1.05409255, 1.05409255, 0.66666667],
# [1.05409255, 0.66666667, 1.05409255],
# [1.05409255, 0.66666667, 1.05409255],
# [1.05409255, 0.66666667, 1.05409255]])
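
The same result can also be reached with broadcasting, without reshaping A or repeating B. The following is a minimal sketch, assuming A has shape (n, 3, 2) and B has shape (n, 2) as in the question; the arrays are retyped here so the snippet runs on its own.

```python
import numpy as np

A = np.array([[[0., 2.], [2., 2.], [1., 1.]],
              [[0., 2.], [1., 1.], [0., 0.]],
              [[2., 2.], [1., 1.], [2., 0.]],
              [[0., 0.], [1., 1.], [2., 0.]]])
B = np.array([[1., 5. / 3.],
              [1. / 3., 1.],
              [5. / 3., 1.],
              [1., 1. / 3.]])

# B[:, None, :] has shape (4, 1, 2), so it broadcasts against A's (4, 3, 2):
# each center is subtracted from the three nodes of its own triangle.
C = np.linalg.norm(A - B[:, None, :], axis=-1)
print(C.shape)  # (4, 3)
```

Inserting the length-1 axis avoids building the repeated copy of B, which may matter for large inputs.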

Related

how to split a list of arrays and switch the position of arrays in python

I have a list of numpy arrays. These arrays belong to several data sets and iterations. In my list the arrays are ordered first by data set and then by iteration, but I want to group them first by iteration. This is my list:
all_data=[np.array([[1., 5.],[1., 5.],[1., 5.]]),\
np.array([[2., 5.],[2., 5.],[2., 5.]]),\
np.array([[3., 5.],[3., 5.],[3., 5.]]),\
np.array([[1., 50.],[1., 50.],[1., 50.]]),\
np.array([[2., 50.],[2., 50.],[2., 50.]]),\
np.array([[3., 50.],[3., 50.2],[3., 50.]]),\
np.array([[1., 500.],[1., 500.],[1., 500.]]),\
np.array([[2., 500.],[2., 500.],[2., 500.]]),\
np.array([[3., 500.],[3., 500.],[3., 500.]])]
As can be seen, the first three arrays hold three iterations (1 to 3) of one data set (whose last column is 5). Arrays 4 to 6 hold the same three iterations for a second data set (last column 50), and the last three arrays belong to a third data set. I purposely used these simplified numbers to visualize what I want. I have the number of iterations and data sets as:
n_data_sets = 3
n_iteration = 3
I first tried to split my list by data set using:
data=[all_data[i:i + n_iteration] for i in range(0, len(all_data), n_iteration)]
Then I tried the following code to rearrange my list, but it was not successful:
re_ar=[]
for i in range (len (data)-1):
    for j in range (len(data[i])):
        re_ar.append([data[i][j], data[i+1][j]])
This is my expected outcome:
[[np.array([[1., 5.],[1., 5.],[1., 5.]]),\
np.array([[1., 50.],[1., 50.],[1., 50.]]),\
np.array([[1., 500.],[1., 500.],[1., 500.]])],\
[np.array([[2., 5.],[2., 5.],[2., 5.]]),\
np.array([[2., 50.],[2., 50.],[2., 50.]]),\
np.array([[2., 500.],[2., 500.],[2., 500.]])],\
[np.array([[3., 5.],[3., 5.],[3., 5.]]),\
np.array([[3., 50.],[3., 50.2],[3., 50.]]),\
np.array([[3., 500.],[3., 500.],[3., 500.]])]]

What I think you are saying is that you want every n-th element from the list:
n_iteration = 3
data=[all_data[i:: n_iteration] for i in range(n_iteration)]
which gives
[[array([[1., 5.], [1., 5.], [1., 5.]]),
array([[ 1., 50.], [ 1., 50.], [ 1., 50.]]),
array([[ 1., 500.], [ 1., 500.], [ 1., 500.]])],
[array([[2., 5.], [2., 5.], [2., 5.]]),
array([[ 2., 50.], [ 2., 50.], [ 2., 50.]]),
array([[ 2., 500.], [ 2., 500.], [ 2., 500.]])],
[array([[3., 5.], [3., 5.], [3., 5.]]),
array([[ 3. , 50. ], [ 3. , 50.2], [ 3. , 50. ]]),
array([[ 3., 500.], [ 3., 500.], [ 3., 500.]])]]
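
To see why the stride slice works, here is a small sketch that uses string labels in place of the arrays (the labels are hypothetical, chosen only to make the ordering visible). It also shows that splitting per data set and then transposing the nesting with zip gives the same grouping.

```python
n_iteration = 3

# Hypothetical labels standing in for the nine arrays: "d<data set>_it<iteration>"
all_data = [f"d{d}_it{it}" for d in range(3) for it in range(n_iteration)]

# Stride slicing: start at iteration i and jump one data set at a time
by_iteration = [all_data[i::n_iteration] for i in range(n_iteration)]

# Equivalent: split per data set first, then transpose the nesting with zip
per_data_set = [all_data[i:i + n_iteration]
                for i in range(0, len(all_data), n_iteration)]
transposed = [list(group) for group in zip(*per_data_set)]

print(by_iteration[0])  # ['d0_it0', 'd1_it0', 'd2_it0']
```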

How to create a specific upper triangular matrix?

I would like to create in python (using numpy) an upper triangular matrix in the form:
[[ 1, c, c^2],
[ 0, 1, c ],
[ 0, 0, 1 ]]
where c is a rational number and the size of the matrix may vary (2, 3, 4, ...). Is there a smart way to do it other than creating rows and stacking them?

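import numpy as np
r = 3  # matrix size
c = 3
i,j = np.indices((r,r))  # i holds row indices, j holds column indices
np.triu(float(c)**(j-i))  # c**(j-i) everywhere, then zero out below the diagonal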
Result:
array([[1., 3., 9.],
[0., 1., 3.],
[0., 0., 1.]])
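
The same idea can be wrapped in a function for any size. This is only a sketch: the name geometric_triu is made up for illustration, and it clips the exponents at zero so that no negative powers are evaluated below the diagonal (which would otherwise divide by zero when c is 0).

```python
import numpy as np

def geometric_triu(c, n):
    """Return the n x n upper triangular matrix with entries c**(j - i) for j >= i."""
    idx = np.arange(n)
    # exponent[i, j] = j - i, with negative values (below the diagonal) clipped to 0
    exponent = np.clip(idx[None, :] - idx[:, None], 0, None)
    # np.triu then zeroes everything below the diagonal
    return np.triu(float(c) ** exponent)

print(geometric_triu(0.5, 4))
```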

There are probably more straightforward solutions but this is what I came up with:
import numpy as np
c=5
m=np.triu(c**np.triu(np.ones((3,3)), 1).cumsum(axis =1))
print(m)
output:
[[ 1. 5. 25.]
[ 0. 1. 5.]
[ 0. 0. 1.]]

Depthwise stacking with NumPy

I am using the following code and getting an output numpy ndarray of size (2,9) that I am then trying to reshape into size (3,3,2). My hope was that calling reshape using (3,3,2) as the dimensions of the new array would take each row of the 2x9 array and shape it into a 3x3 array and wrap these two 3x3 arrays into another array.
For instance, when I index the result I would like the following behavior:
input: print(result)
output: [[ 2. 2. 1. 0. 8. 5. 2. 4. 5.]
[ 4. 7. 5. 6. 4. 3. -3. 2. 1.]]
result = result.reshape((3,3,2))
DESIRED NEW BEHAVIOR
input: print(result[:,:,0])
output: [[2. 2. 1.]
[0. 8. 5.]
[2. 4. 5.]]
input: print(result[:,:,1])
output: [[ 4. 7. 5.]
[ 6. 4. 3.]
[-3. 2. 1.]]
ACTUAL NEW BEHAVIOR
input: print(result[:,:,0])
output: [[2. 1. 8.]
[2. 5. 7.]
[6. 3. 2.]]
input: print(result[:,:,1])
output: [[ 2. 0. 5.]
[ 4. 4. 5.]
[ 4. -3. 1.]]
Is there a way to specify to reshape that I would like to go row by row along the depth dimension? I'm very confused as to why numpy by default makes the choice it does for reshape.
Here is the code I am using to produce result matrix, this code may or may not be necessary to analyze my issue. I feel as if it will not be necessary but am including it for completeness:
import numpy as np
# im2col implementation assuming width/height dimensions of filter and input_vol
# are the same (i.e. input_vol_width is equal to input_vol_height and the same
# for the filter spatial dimensions, although input_vol_width need not equal
# filter_vol_width)
def im2col(input, filters, input_vol_dims, filter_size_dims, stride):
    receptive_field_size = 1
    for dim in filter_size_dims:
        receptive_field_size *= dim
    output_width = output_height = int((input_vol_dims[0]-filter_size_dims[0])/stride + 1)
    X_col = np.zeros((receptive_field_size,output_width*output_height))
    W_row = np.zeros((len(filters),receptive_field_size))
    pos = 0
    for i in range(0,input_vol_dims[0]-1,stride):
        for j in range(0,input_vol_dims[1]-1,stride):
            X_col[:,pos] = input[i:i+stride+1,j:j+stride+1,:].ravel()
            pos += 1
    for i in range(len(filters)):
        W_row[i,:] = filters[i].ravel()
    bias = np.array([[1], [0]])
    result = np.dot(W_row, X_col) + bias
    print(result)
if __name__ == '__main__':
    x = np.zeros((7, 7, 3))
    x[:,:,0] = np.array([[0,0,0,0,0,0,0],
                         [0,1,1,0,0,1,0],
                         [0,2,2,1,1,1,0],
                         [0,2,0,2,1,0,0],
                         [0,2,0,0,1,0,0],
                         [0,0,0,1,1,0,0],
                         [0,0,0,0,0,0,0]])
    x[:,:,1] = np.array([[0,0,0,0,0,0,0],
                         [0,2,0,1,0,2,0],
                         [0,0,1,2,1,0,0],
                         [0,2,0,0,2,0,0],
                         [0,2,1,0,0,0,0],
                         [0,1,2,2,2,0,0],
                         [0,0,0,0,0,0,0]])
    x[:,:,2] = np.array([[0,0,0,0,0,0,0],
                         [0,0,0,2,1,1,0],
                         [0,0,0,2,2,0,0],
                         [0,2,1,0,2,2,0],
                         [0,0,1,2,1,2,0],
                         [0,2,0,0,2,1,0],
                         [0,0,0,0,0,0,0]])
    w0 = np.zeros((3,3,3))
    w0[:,:,0] = np.array([[1,1,0],
                          [1,-1,1],
                          [-1,1,1]])
    w0[:,:,1] = np.array([[-1,-1,0],
                          [1,-1,1],
                          [1,-1,-1]])
    w0[:,:,2] = np.array([[0,0,0],
                          [0,0,1],
                          [1,0,1]])
    w1 = np.zeros((3,3,3))
    w1[:,:,0] = np.array([[0,-1,1],
                          [1,1,0],
                          [1,1,0]])
    w1[:,:,1] = np.array([[-1,-1,1],
                          [1,0,1],
                          [0,1,1]])
    w1[:,:,2] = np.array([[-1,-1,0],
                          [1,-1,0],
                          [1,1,0]])
    filters = np.array([w0,w1])
    im2col(x,np.array([w0,w1]),x.shape,w0.shape,2)

Let's reshape a bit differently and then do a depth-wise dstack:
arr = np.dstack(result.reshape((-1,3,3)))
arr[..., 0]
array([[2., 2., 1.],
[0., 8., 5.],
[2., 4., 5.]])

Reshape keeps the original order of the elements
In [215]: x=np.array(x)
In [216]: x.shape
Out[216]: (2, 9)
Reshaping the size 9 dimension into a 3x3 keeps the element order that you want:
In [217]: x.reshape(2,3,3)
Out[217]:
array([[[ 2., 2., 1.],
[ 0., 8., 5.],
[ 2., 4., 5.]],
[[ 4., 7., 5.],
[ 6., 4., 3.],
[-3., 2., 1.]]])
But you have to index it with [0,:,:] to see one of those blocks.
To see the same blocks with [:,:,0], you have to move that size 2 dimension to the end. COLDSPEED's dstack does that by iterating on the first dimension and joining the 2 blocks (each 3x3) on a new third dimension. Another way is to use transpose to reorder the dimensions:
In [218]: x.reshape(2,3,3).transpose(1,2,0)
Out[218]:
array([[[ 2., 4.],
[ 2., 7.],
[ 1., 5.]],
[[ 0., 6.],
[ 8., 4.],
[ 5., 3.]],
[[ 2., -3.],
[ 4., 2.],
[ 5., 1.]]])
In [219]: y = _
In [220]: y.shape
Out[220]: (3, 3, 2)
In [221]: y[:,:,0]
Out[221]:
array([[2., 2., 1.],
[0., 8., 5.],
[2., 4., 5.]])
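
To make the ordering explicit: by default reshape reads and writes elements in row-major (C) order, so the last axis is filled first. The sketch below contrasts the direct reshape with the block reshape, using np.moveaxis as one more way (besides dstack and transpose) to move the block axis to the end. The result array is retyped from the question.

```python
import numpy as np

result = np.array([[2., 2., 1., 0., 8., 5., 2., 4., 5.],
                   [4., 7., 5., 6., 4., 3., -3., 2., 1.]])

# Row-major reshape: consecutive elements fill the last axis first,
# so [:, :, 0] receives every other element of the flattened data.
naive = result.reshape((3, 3, 2))

# Reshape each row into its own 3x3 block, then move the block axis to the end.
blocks = result.reshape((2, 3, 3))
stacked = np.moveaxis(blocks, 0, -1)  # shape (3, 3, 2)

print(naive[:, :, 0])
print(stacked[:, :, 0])
```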

Adding Numpy arrays like Counters

Since collections.Counter is so slow, I am pursuing a faster method of summing mapped values in Python 2.7. It seems like a simple concept and I'm kind of disappointed in the built-in Counter method.
Basically, I need to be able to take arrays like this:
array([[ 0., 2.],
[ 2., 2.],
[ 3., 1.]])
array([[ 0., 3.],
[ 1., 1.],
[ 2., 5.]])
And then "add" them so they look like this:
array([[ 0., 5.],
[ 1., 1.],
[ 2., 7.],
[ 3., 1.]])
If there isn't a good way to do this quickly and efficiently, I'm open to any other ideas that will allow me to do something similar to this, and I'm open to modules other than Numpy.
Thanks!
Edit: Ready for some speedtests?
Intel win 64bit machine. All of the following values are in seconds; 20000 loops.
collections.Counter results:
2.131000, 2.125000, 2.125000
Divakar's union1d + masking results:
1.641000, 1.633000, 1.625000
Divakar's union1d + indexing results:
0.625000, 0.625000, 0.641000
Histogram results:
1.844000, 1.938000, 1.858000
Pandas results:
16.659000, 16.686000, 16.885000
Conclusions: union1d + indexing wins, the array size is too small for Pandas to be effective, and the histogram approach blew my mind with its simplicity but I'm guessing it takes too much overhead to create. All of the responses I received were very good, though. This is what I used to get the numbers. Thanks again!
Edit: And it should be mentioned that using Counter1.update(Counter2.elements()) is terrible despite doing the same exact thing (65.671000 sec).
Later Edit: I've been thinking about this a lot, and I've come to realize that, with Numpy, it might be more effective to fill each array with zeros so that the first column isn't even needed since we can just use the index, and that would also make it much easier to add multiple arrays together as well as do other functions. Additionally, Pandas makes more sense than Numpy since there would be no need to 0-fill, and it would definitely be more effective with large data sets (however, Numpy has the advantage of being compatible on more platforms, like GAE, if that matters at all). Lastly, the answer I checked was definitely the best answer for the exact question I asked--adding the two arrays in the way I showed--but I think what I needed was a change in perspective.

Here's one approach with np.union1d and masking -
def app1(a,b):
    c0 = np.union1d(a[:,0],b[:,0])
    out = np.zeros((len(c0),2))
    out[:,0] = c0
    mask1 = np.in1d(c0,a[:,0])
    out[mask1,1] = a[:,1]
    mask2 = np.in1d(c0,b[:,0])
    out[mask2,1] += b[:,1]
    return out
Sample run -
In [174]: a
Out[174]:
array([[ 0., 2.],
[ 12., 2.],
[ 23., 1.]])
In [175]: b
Out[175]:
array([[ 0., 3.],
[ 1., 1.],
[ 12., 5.]])
In [176]: app1(a,b)
Out[176]:
array([[ 0., 5.],
[ 1., 1.],
[ 12., 7.],
[ 23., 1.]])
Here's another with np.union1d and indexing -
def app2(a,b):
    n = np.maximum(a[:,0].max(), b[:,0].max())+1
    c0 = np.union1d(a[:,0],b[:,0])
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]
    out0[b[:,0].astype(int),1] += b[:,1]
    out = out0[c0.astype(int)]
    out[:,0] = c0
    return out
For the case where all indices are covered by the first column values in a and b -
def app2_specific(a,b):
    c0 = np.union1d(a[:,0],b[:,0])
    n = c0[-1]+1
    out0 = np.zeros((int(n), 2))
    out0[a[:,0].astype(int),1] = a[:,1]
    out0[b[:,0].astype(int),1] += b[:,1]
    out0[:,0] = c0
    return out0
Sample run -
In [234]: a
Out[234]:
array([[ 0., 2.],
[ 2., 2.],
[ 3., 1.]])
In [235]: b
Out[235]:
array([[ 0., 3.],
[ 1., 1.],
[ 2., 5.]])
In [236]: app2_specific(a,b)
Out[236]:
array([[ 0., 5.],
[ 1., 1.],
[ 2., 7.],
[ 3., 1.]])

If you know the number of fields, use np.bincount.
c = np.vstack([a, b])
# bincount needs non-negative integer keys, so cast the float first column
counts = np.bincount(c[:, 0].astype(int), weights = c[:, 1], minlength = numFields)
out = np.vstack([np.arange(numFields), counts]).T
This works if you're getting all your data at once. Make a list of your arrays and vstack them. If you're getting data chunks sequentially, you can use np.add.at to do the same thing.
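out = np.zeros((numFields, 2))  # shape must be passed as a tuple: one row per field
out[:, 0] = np.arange(numFields)
# index arrays must be integers, so cast the float first column
np.add.at(out[:, 1], a[:, 0].astype(int), a[:, 1])
np.add.at(out[:, 1], b[:, 0].astype(int), b[:, 1])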
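
For the sequential case, a self-contained sketch might look like the following. It assumes the keys are non-negative integers stored as floats and that the number of fields (called num_fields here, a name chosen for this example) is known up front; the two chunks are the arrays from the question.

```python
import numpy as np

num_fields = 4  # assumption: keys are integers in range(num_fields)
totals = np.zeros(num_fields)

chunks = [
    np.array([[0., 2.], [2., 2.], [3., 1.]]),
    np.array([[0., 3.], [1., 1.], [2., 5.]]),
]
for chunk in chunks:
    keys = chunk[:, 0].astype(int)
    # np.add.at accumulates correctly even if a key repeats within a chunk
    np.add.at(totals, keys, chunk[:, 1])

out = np.column_stack([np.arange(num_fields), totals])
print(out)
```

Unlike totals[keys] += values, np.add.at does not drop contributions when the same key appears more than once in a chunk.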

You can use a basic histogram, this will deal with gaps, too. You can filter out zero-count entries if need be.
import numpy as np
x = np.array([[ 0., 2.],
[ 2., 2.],
[ 3., 1.]])
y = np.array([[ 0., 3.],
[ 1., 1.],
[ 2., 5.],
[ 5., 3.]])
c, w = np.vstack((x,y)).T
h, b = np.histogram(c, weights=w,
bins=np.arange(c.min(),c.max()+2))
r = np.vstack((b[:-1], h)).T
print(r)
# [[ 0. 5.]
# [ 1. 1.]
# [ 2. 7.]
# [ 3. 1.]
# [ 4. 0.]
# [ 5. 3.]]
r_nonzero = r[r[:,1]!=0]

Pandas has functions that do exactly what you intend:
import pandas as pd
pda = pd.DataFrame(a).set_index(0)
pdb = pd.DataFrame(b).set_index(0)
result = pd.concat([pda, pdb], axis=1).fillna(0).sum(axis=1)
Edit: If you actually need the data back in numpy format, just do
array_res = result.reset_index(name=1).values

This is a quintessential grouping problem, which numpy_indexed (disclaimer: I am its author) was created to solve elegantly and efficiently:
import numpy_indexed as npi
C = np.concatenate([A, B], axis=0)
labels, sums = npi.group_by(C[:, 0]).sum(C[:, 1])
Note: it's cleaner to maintain your label arrays as a separate int array; floats are finicky when it comes to labeling things, with positive and negative zeros, and printed values not relaying all binary state. Better to use ints for that.
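
If adding a dependency is not an option, a similar group-and-sum can be sketched with plain NumPy by combining np.unique and np.bincount. This is not how numpy_indexed is implemented, just an alternative under the assumption that exact equality of the key values is good enough for grouping. The sample arrays are the gappy ones from the union1d answer.

```python
import numpy as np

a = np.array([[0., 2.], [12., 2.], [23., 1.]])
b = np.array([[0., 3.], [1., 1.], [12., 5.]])

c = np.concatenate([a, b], axis=0)
# labels: sorted unique keys; inverse: position of each row's key within labels
labels, inverse = np.unique(c[:, 0], return_inverse=True)
# sum the second column per group, using the compact group index as the bin
sums = np.bincount(inverse.ravel(), weights=c[:, 1])
out = np.column_stack([labels, sums])
print(out)
```

Because the bins are the compact group indices rather than the key values, gaps in the keys cost no extra memory.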

Joining Array In Python

Hi, I want to join multiple arrays in Python using numpy to form a multidimensional array. The joining happens inside a for loop; this is pseudocode:
import numpy as np
h = np.zeros(4)
for x in range(3):
    x1 = some array of length of 4 returned from a previous function (3,5,6,7)
    h = np.concatenate((h,x1), axis =0)
The first iteration goes fine, but during the second iteration on the for loop I get the following error,
ValueError: all the input arrays must have same number of dimensions
The output array should look something like this
[[0,0,0,0],[3,5,6,7],[6,3,6,7]]
etc
So how can I join the arrays?
Thanks

You need to use vstack. It takes a sequence of arrays and stacks them vertically to make a single array:
import numpy as np
h = np.zeros(4)
for x in range(3):
    x1 = [3,5,6,7]
    h = np.vstack((h,x1))
    # not h = np.concatenate((h,x1), axis =0)
print(h)
Output:
[[ 0. 0. 0. 0.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]]
more edits later.
If you want to use concatenate only, you can also do it the following way:
import numpy as np
h1 = np.zeros(4)
for x in range(3):
    x1 = np.array([3,5,6,7])
    h1 = np.concatenate([h1, x1], axis=0)  # 1d inputs give one long 1d array
print(h1.shape)
print(h1.reshape(4,4))
Output:
(16,)
[[ 0. 0. 0. 0.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]
[ 3. 5. 6. 7.]]
Both have different applications. You can choose according to your need.

There are multiple ways of doing this. I'll list a few examples:
First, we import numpy and define a function that generates those arrays of length 4.
import numpy as np
def previous_function_returning_array_of_length_4(x):
    return np.array(range(4)) + x
The first way involves creating a list of arrays, then calling numpy.array() to convert the list to a 2D array.
h0 = np.zeros(4)
arrays = [h0]
for x in range(3):
    x1 = previous_function_returning_array_of_length_4(x)
    arrays.append(x1)
h = np.array(arrays)
You can do the same with np.vstack():
h0 = np.zeros(4)
arrays = [h0]
for x in range(3):
    x1 = previous_function_returning_array_of_length_4(x)
    arrays.append(x1)
h = np.vstack(arrays)
Alternatively, if you know how many arrays you are going to create, you can create the 2D array first and fill in the values:
h = np.zeros((4, 4))
for ii in range(3):
    x1 = previous_function_returning_array_of_length_4(ii)
    h[ii + 1, ...] = x1
There are more ways, but hopefully, this will give you an idea of what to do.

It is best to collect values in a list, and perform the concatenate or array creation once, at the end.
h = [np.zeros(4)]
for x in range(3):
    x1 = np.array([3, 5, 6, 7])  # stand-in for the array returned by the previous function
    h.append(x1)  # list.append works in place and returns None, so do not reassign h
h = np.array(h)
# or h = np.vstack(h)
All the concatenate/stack/array functions take a list of multiple items. It is faster to append to a list than to do a concatenate of 2 items.
======================
Let's try your approach step by step:
In [189]: h=np.zeros(4)
In [190]: h
Out[190]: array([ 0., 0., 0., 0.]) # 1d array (4,) shape
In [191]: x1=np.array([3,5,6,7]) # another 1d
In [192]: h1=np.concatenate((h,x1),axis=0)
In [193]: h1
Out[193]: array([ 0., 0., 0., 0., 3., 5., 6., 7.])
In [194]: h1.shape
Out[194]: (8,) # also a 1d array, but with 8 items
In [195]: x1=np.array([6,3,6,7])
In [196]: h1=np.concatenate((h1,x1),axis=0)
In [197]: h1
Out[197]: array([ 0., 0., 0., 0., 3., 5., 6., 7., 6., 3., 6., 7.])
In this case I'm adding (4,) arrays one after the other, still getting a 1d array.
If I go back and create x1 as 2d (1,4):
In [198]: h=np.zeros(4)
In [199]: x1=np.array([[6,3,6,7]])
In [200]: h1=np.concatenate((h,x1),axis=0)
...
ValueError: all the input arrays must have same number of dimensions
I get this dimension error right away.
The fact that you get the error on the 2nd iteration suggests that the 1st x1 is (4,), but the 2nd is 2d.
When you have dimensions errors like this, check the shapes.
vstack adds dimensions to the inputs, as needed, so you can build 2d arrays:
In [207]: h=np.zeros(4)
In [208]: x1=np.array([3,5,6,7])
In [209]: h=np.vstack((h,x1))
In [210]: h
Out[210]:
array([[ 0., 0., 0., 0.],
[ 3., 5., 6., 7.]])
In [211]: x1=np.array([6,3,6,7])
In [212]: h=np.vstack((h,x1))
In [213]: h
Out[213]:
array([[ 0., 0., 0., 0.],
[ 3., 5., 6., 7.],
[ 6., 3., 6., 7.]])
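
To illustrate the dimension promotion, the following sketch gets the same result from concatenate by first giving each 1d array a leading axis of length 1. It only uses the arrays from the session above.

```python
import numpy as np

h = np.zeros(4)
x1 = np.array([3, 5, 6, 7])

# Give both 1d arrays a leading axis of length 1, so they become (1, 4) rows
via_concatenate = np.concatenate((h[None, :], x1[None, :]), axis=0)
via_vstack = np.vstack((h, x1))

print(via_concatenate.shape)  # (2, 4)
```

Once h is 2d, each later x1 still needs the extra axis (or vstack) before it can be joined along axis 0.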
