How to iterate two arrays in numpy/numba (zip throws error) - python

I have the following code for calculating the distance of a path given a distance matrix.
dist_matrix = np.array(
[
[0.0, 0.5, 1.0, 1.41421356, 1.0],
[0.5, 0.0, 0.5, 1.11803399, 1.11803399],
[1.0, 0.5, 0.0, 1.0, 1.41421356],
[1.41421356, 1.11803399, 1.0, 0.0, 1.0],
[1.0, 1.11803399, 1.41421356, 1.0, 0.0],
]
)
@jit(nopython=True)
def calc_dist(tour):
    return np.sum(np.array([dist_matrix[i, j] for i, j in zip([tour[0:-1]], tour[1:])]))
tour = [0, 1, 2, 3, 4]
print(calc_dist(tour))
Expected output: 2.118
but it is throwing the following error:
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<intrinsic range_iter_len>) with argument(s) of type(s): (zip(iter(list(reflected list(int64))), iter(reflected list(int64))))
I know I could remove the error by setting nopython=False, but my understanding is that numba is not really worth using unless it runs with nopython=True. I'm having trouble figuring out how to replace the zip in my calc_dist function. What's the best way to replace zip with numpy/numba?

If you have numpy arrays you can use dstack():
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,8,9,10])
c = np.dstack((a,b))
#or
d = np.column_stack((a,b))
>>> c
array([[[ 1, 6],
[ 2, 7],
[ 3, 8],
[ 4, 9],
[ 5, 10]]])
>>> d
array([[ 1, 6],
[ 2, 7],
[ 3, 8],
[ 4, 9],
[ 5, 10]])
>>> c.shape
(1, 5, 2)
>>> d.shape
(5, 2)
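For the path-length question itself, consecutive stops can be paired without zip. A minimal sketch, assuming the tour is passed as a NumPy integer array and the matrix as an argument: an explicit index loop (the style nopython mode is generally able to compile) and a pure-NumPy version using fancy indexing. The names calc_dist_loop and calc_dist_indexed are illustrative, and the numba decorator is left as a comment so the block runs without numba installed.

```python
import numpy as np

dist_matrix = np.array([
    [0.0, 0.5, 1.0, 1.41421356, 1.0],
    [0.5, 0.0, 0.5, 1.11803399, 1.11803399],
    [1.0, 0.5, 0.0, 1.0, 1.41421356],
    [1.41421356, 1.11803399, 1.0, 0.0, 1.0],
    [1.0, 1.11803399, 1.41421356, 1.0, 0.0],
])

# Assumption: with numba installed, this function could be decorated with
# @jit(nopython=True); it is left undecorated here so the sketch has no
# dependency beyond NumPy.
def calc_dist_loop(tour, dist):
    total = 0.0
    # Walk consecutive pairs (tour[k], tour[k + 1]) by index instead of zip.
    for k in range(len(tour) - 1):
        total += dist[tour[k], tour[k + 1]]
    return total

def calc_dist_indexed(tour, dist):
    # Fancy indexing picks dist[tour[0], tour[1]], dist[tour[1], tour[2]], ...
    tour = np.asarray(tour)
    return dist[tour[:-1], tour[1:]].sum()

tour = np.array([0, 1, 2, 3, 4])
loop_total = calc_dist_loop(tour, dist_matrix)
indexed_total = calc_dist_indexed(tour, dist_matrix)
```

For the tour 0, 1, 2, 3, 4 both functions add up the entries 0.5 + 0.5 + 1.0 + 1.0 from the matrix above.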

Related

Losing decimal when doing array operation in Python

I tried to write a function that divides each column by its column sum, and this is what I came up with.
A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
print(A)
A = A.T
Asum = A.sum(axis=1)
print(Asum)
for i in range(len(Asum)):
    A[:,i] = A[:,i]/Asum[i]
I was hoping for a matrix of decimals, but the values are automatically converted to integers, which gives me a zero matrix. Where did I go wrong?
You must replace:
Asum = A.sum(axis=1)
with:
Asum = A.sum(axis=0)
to get the column-by-column sum.
Also you can get the division easily with numpy.divide:
np.divide(A, Asum)
#array([[0.1, 0.1, 0.1],
# [0.2, 0.2, 0.2],
# [0.3, 0.3, 0.3],
# [0.4, 0.4, 0.4]])
Or simply with:
A/Asum
Your A is integer dtype; assigned floats get truncated. If A started as a float array your iteration would work. But you don't need to iterate to perform this calculation:
In [108]: A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]]).T
In [109]: A
Out[109]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
In [110]: Asum = A.sum(axis=1)
In [111]: Asum
Out[111]: array([ 3, 6, 9, 12])
A is (4,3), Asum is (4,). If we make it (4,1):
In [114]: Asum[:,None]
Out[114]:
array([[ 3],
[ 6],
[ 9],
[12]])
we can perform the divide without iteration (review broadcasting if necessary):
In [115]: A/Asum[:,None]
Out[115]:
array([[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333]])
sum has a keepdims parameter that makes this kind of calculation easier:
In [117]: Asum = A.sum(axis=1, keepdims=True)
In [118]: Asum
Out[118]:
array([[ 3],
[ 6],
[ 9],
[12]])
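Putting the pieces together, a short sketch of both normalizations with keepdims, using the same A as above. The variable names row_normalized and col_normalized are illustrative. True division of an integer array produces a new float array, so nothing is truncated as long as the result is not assigned back into the integer A.

```python
import numpy as np

# Integer array of shape (4, 3), as in the question after the transpose.
A = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]).T

# keepdims=True keeps the summed axis with length 1, so the sums
# broadcast against A without any manual reshaping.
row_normalized = A / A.sum(axis=1, keepdims=True)  # divide by (4, 1) row sums
col_normalized = A / A.sum(axis=0, keepdims=True)  # divide by (1, 3) column sums
```

Each row of row_normalized and each column of col_normalized sums to 1.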

Conditional reduce

I would like to reduce a variable number of elements (or slices) of an array multiple times, and put the result into a new array. Kind of like a masked np.apply_along_axis, but we stay in numpy
For example, to reduce by mean:
to_reduce = np.array([
[0, 1, 1, 0, 0],
[0, 0, 0, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 1, 1, 0]]).astype(bool)
arr = np.array([
[1.0, 2.0, 3.0],
[1.0, 2.0, 4.0],
[2.0, 2.0, 3.0],
[2.0, 2.0, 4.0],
[1.0, 0.0, 3.0]])
I want:
np.array([
[1.5, 2.0, 3.5],
[1.5, 1.0, 3.5],
[1.33333, 1.33333, 3.0],
[1.5, 2.0, 3.5]])
The slow way would be:
out = np.empty((4, 3))
for j, mask in enumerate(to_reduce):
    out[j] = np.mean(arr[mask], axis=0)
Here's one simple and efficient way with matrix-multiplication -
In [56]: to_reduce.dot(arr)/to_reduce.sum(1)[:,None]
Out[56]:
array([[1.5 , 2. , 3.5 ],
[1.5 , 1. , 3.5 ],
[1.33333333, 1.33333333, 3. ],
[1.5 , 2. , 3.5 ]])
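The matrix product works because each row of the mask, read as 0/1 weights, sums exactly the selected rows of arr, and dividing by the number of True entries turns that sum into a mean. A small sketch that checks this against the loop version; the names sums, counts, fast and slow are illustrative, and it assumes every mask row selects at least one element (otherwise the division is 0/0).

```python
import numpy as np

to_reduce = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0]], dtype=bool)
arr = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [2.0, 2.0, 3.0],
    [2.0, 2.0, 4.0],
    [1.0, 0.0, 3.0]])

# Masked sums: row j of `sums` adds up the rows of arr where to_reduce[j] is True.
sums = to_reduce.astype(arr.dtype) @ arr
# Number of selected rows per mask, kept as a column for broadcasting.
counts = to_reduce.sum(axis=1, keepdims=True)
fast = sums / counts

# Reference result from the explicit loop in the question.
slow = np.stack([arr[mask].mean(axis=0) for mask in to_reduce])
```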

Convert numpy matrix to python array

Are there alternative or better ways to convert a numpy matrix to a python array than this?
>>> import numpy
>>> import array
>>> b = numpy.matrix("1.0 2.0 3.0; 4.0 5.0 6.0", dtype="float16")
>>> print(b)
[[ 1. 2. 3.]
[ 4. 5. 6.]]
>>> a = array.array("f")
>>> a.fromlist((b.flatten().tolist())[0])
>>> print(a)
array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
You could convert it to a NumPy array and flatten it with .ravel() or .flatten(). The function np.ravel does both of these tasks under the hood, so you can call it directly. Finally, pass the result to array.array(), like so -
a = array.array('f',np.ravel(b))
Sample run -
In [107]: b
Out[107]:
matrix([[ 1., 2., 3.],
[ 4., 5., 6.]], dtype=float16)
In [108]: array.array('f',np.ravel(b))
Out[108]: array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Alternatively, tolist() converts the matrix to a nested Python list:
>>> x = np.matrix(np.arange(12).reshape((3,4))); x
matrix([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> x.tolist()
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
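The two answers can be combined into one flat conversion. A sketch, assuming a plain 2-D ndarray as input (np.asarray should accept an np.matrix the same way) and assuming the 'f' typecode is a 4-byte float, as it is on common platforms, so that it matches float32 for the bytes-based variant:

```python
import array
import numpy as np

b = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype="float16")

# Flatten to 1-D and widen to float32 so the element size matches 'f'.
flat = np.asarray(b, dtype=np.float32).ravel()

# Variant 1: go through a Python list of floats.
a_from_list = array.array("f", flat.tolist())

# Variant 2: hand over the raw bytes, which avoids creating one Python
# float object per element (assumes 'f' is 4 bytes on this platform).
a_from_bytes = array.array("f", flat.tobytes())
```

The bytes-based variant is likely to be the cheaper one for large inputs, since it copies the buffer in one step.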

numpy arrays and list multiplication, finding the maximum and its index

I know this question might look like a duplicate, but I have tried debugging my code in several ways and still don't know what is wrong. Below is my code.
def myfunc(LUT,LUT_Prob,test):
    x = []
    y = []
    z = []
    x.extend(hamming_distance(test, LUT[i]) for i in range (len(LUT)))
    y = [(len(LUT[0])) - j for j in x]
    z = [a*b for a,b in zip(y,LUT_prob)]
    MAP = max(z)
    closest_index = z.index(max(z))
    return x, y, LUT_Prob, z, MAP, closest_index
In another script:
Winner = []
for j in range (0,5):
    Winner.append(myfunc(LUT1,LUT_Prob1,test[j]))
print('Winner = {}'.format(Winner))
The output is:
Winner = [([2, 4, 2, 4], [8, 6, 8, 6], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 3.2, 1.6, 1.6, 1.6])]], [array([ 3.2, 1.6, 1.6, 1.6])], 0), ([1, 3, 1, 3], [9, 7, 9, 7], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 3.6, 1.8, 1.8, 1.8])]], [array([ 3.6, 1.8, 1.8, 1.8])], 0), ([3, 5, 5, 3], [7, 5, 5, 7], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 2.8, 1.4, 1.4, 1.4])]], [array([ 2.8, 1.4, 1.4, 1.4])], 0), ([3, 5, 3, 5], [7, 5, 7, 5], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 2.8, 1.4, 1.4, 1.4])]], [array([ 2.8, 1.4, 1.4, 1.4])], 0), ([3, 3, 3, 1], [7, 7, 7, 9], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 2.8, 1.4, 1.4, 1.4])]], [array([ 2.8, 1.4, 1.4, 1.4])], 0)]
Note: The output is the returned values x, y, LUT_Prob, z, MAP, closest_index, in that order, repeated for each of the 5 iterations.
The problems I am seeing:
1- z is not as expected. I expect y and LUT_Prob to be multiplied element-wise, but what I get is the first element of y multiplied by all of LUT_Prob.
2- MAP should be a single value, in this case 3.2, but I get an array instead.
3- The max index (closest_index) happens to be correct in this case, but if the 3.2 were anywhere else it would still be 0.
So, can somebody help?
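In the printed output, LUT_Prob appears to be a list wrapping a single array, so zipping it with y would yield only one pair and multiply the first element of y by the whole array. A hedged sketch of the element-wise version under that assumption, using the values from the first entry of the output above; the names probs and y_arr are illustrative.

```python
import numpy as np

y = [8, 6, 8, 6]
LUT_Prob = [np.array([0.4, 0.2, 0.2, 0.2])]  # assumed shape: list holding one array

# Flatten to a 1-D vector so it lines up with y element by element.
probs = np.asarray(LUT_Prob, dtype=float).ravel()
y_arr = np.asarray(y, dtype=float)

z = y_arr * probs                   # element-wise product, one value per LUT entry
closest_index = int(np.argmax(z))   # position of the largest product
MAP = float(z[closest_index])       # the largest product as a single number
```

With these inputs z has four entries, one per LUT row, and MAP is a scalar rather than an array.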

How can I vectorize the averaging of 2x2 sub-arrays of numpy array?

I have a very large 2D numpy array that contains 2x2 subsets that I need to take the average of. I am looking for a way to vectorize this operation. For example, given x:
#               |- col 0 -|  |- col 1 -|  |- col 2 -|
x = np.array( [[ 0.0,  1.0,   2.0,  3.0,   4.0,  5.0],   # row 0
               [ 6.0,  7.0,   8.0,  9.0,  10.0, 11.0],   # row 0
               [12.0, 13.0,  14.0, 15.0,  16.0, 17.0],   # row 1
               [18.0, 19.0,  20.0, 21.0,  22.0, 23.0]])  # row 1
I need to end up with a 2x3 array which are the averages of each 2x2 sub array, i.e.:
result = np.array( [[ 3.5, 5.5, 7.5],
[15.5, 17.5, 19.5]])
so element [0,0] is calculated as the average of x[0:2,0:2], while element [0,1] would be the average of x[0:2,2:4]. Does numpy have vectorized/efficient ways of doing aggregates on subsets like this?
If we form the reshaped matrix y = x.reshape(2,2,3,2), then the (i,j) 2x2 submatrix is given by y[i,:,j,:]. E.g.:
In [340]: x
Out[340]:
array([[ 0., 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10., 11.],
[ 12., 13., 14., 15., 16., 17.],
[ 18., 19., 20., 21., 22., 23.]])
In [341]: y = x.reshape(2,2,3,2)
In [342]: y[0,:,0,:]
Out[342]:
array([[ 0., 1.],
[ 6., 7.]])
In [343]: y[1,:,2,:]
Out[343]:
array([[ 16., 17.],
[ 22., 23.]])
To get the mean of the 2x2 submatrices, use the mean method, with axis=(1,3):
In [344]: y.mean(axis=(1,3))
Out[344]:
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5]])
If you are using an older version of numpy that doesn't support using a tuple for the axis, you could do:
In [345]: y.mean(axis=1).mean(axis=-1)
Out[345]:
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5]])
See the link given by @dashesy in a comment for more background on the reshaping "trick".
To generalize this to a 2-d array with shape (m, n), where m and n are even, use
y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)
y can then be interpreted as an array of 2x2 arrays. The first and third index slots of the 4-d array act as the indices that select one of the 2x2 blocks. To get the upper left 2x2 block, use y[0, :, 0, :]; to get the block in the second row and third column of blocks, use y[1, :, 2, :]; and in general, to access block (j, k), use y[j, :, k, :].
To compute the reduced array of averages of these blocks, use the mean method, with axis=(1, 3) (i.e. average over axes 1 and 3):
avg = y.mean(axis=(1, 3))
Here's an example where x has shape (8, 10), so the array of averages of the 2x2 blocks has shape (4, 5):
In [10]: np.random.seed(123)
In [11]: x = np.random.randint(0, 4, size=(8, 10))
In [12]: x
Out[12]:
array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
[3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
[2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
[0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
[0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
[1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
[2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
[0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])
In [13]: y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)
Take a look at a couple of the 2x2 blocks:
In [14]: y[0, :, 0, :]
Out[14]:
array([[2, 1],
[3, 1]])
In [15]: y[1, :, 2, :]
Out[15]:
array([[3, 2],
[2, 0]])
Compute the averages of the blocks:
In [16]: avg = y.mean(axis=(1, 3))
In [17]: avg
Out[17]:
array([[ 1.75, 1.75, 0.75, 2. , 1.5 ],
[ 0.75, 2.5 , 1.75, 1.5 , 0.75],
[ 0.75, 1.75, 2.25, 0.25, 1.25],
[ 1.5 , 2.25, 1.25, 1.5 , 1.5 ]])
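The reshape approach can be wrapped in a small helper. This is a sketch rather than a library function: the name block_mean and its bh/bw parameters (block height and width) are assumptions for illustration, and it requires the array dimensions to be exact multiples of the block size.

```python
import numpy as np

def block_mean(x, bh=2, bw=2):
    """Average non-overlapping bh x bw blocks of a 2-D array."""
    m, n = x.shape
    if m % bh or n % bw:
        raise ValueError("array shape must be a multiple of the block shape")
    # Axes 0 and 2 index the blocks; axes 1 and 3 index positions inside a block.
    blocks = x.reshape(m // bh, bh, n // bw, bw)
    return blocks.mean(axis=(1, 3))

# Same 4x6 array as in the question.
x = np.arange(24, dtype=float).reshape(4, 6)
result = block_mean(x)
```

Integer division (//) keeps the reshape arguments as integers, which reshape requires on Python 3.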
