I have the following code for calculating the distance of a path given a distance matrix.
dist_matrix = np.array(
[
[0.0, 0.5, 1.0, 1.41421356, 1.0],
[0.5, 0.0, 0.5, 1.11803399, 1.11803399],
[1.0, 0.5, 0.0, 1.0, 1.41421356],
[1.41421356, 1.11803399, 1.0, 0.0, 1.0],
[1.0, 1.11803399, 1.41421356, 1.0, 0.0],
]
)
@jit(nopython=True)
def calc_dist(tour):
    return np.sum(np.array([dist_matrix[i, j] for i, j in zip([tour[0:-1]], tour[1:])]))
tour = [0, 1, 2, 3, 4]
print(calc_dist(tour))
Expected output: 2.118
but it is throwing the following error:
numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<intrinsic range_iter_len>) with argument(s) of type(s): (zip(iter(list(reflected list(int64))), iter(reflected list(int64))))
I know I could remove the error by setting nopython=False, but my understanding is that numba isn't really worth using unless it runs with nopython=True. I'm having trouble figuring out how to replace the zip in my calc_dist function. What's the best way to replace zip with numpy/numba?
If you have numpy arrays you can use dstack():
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,8,9,10])
c = np.dstack((a,b))
#or
d = np.column_stack((a,b))
>>> c
array([[[ 1, 6],
[ 2, 7],
[ 3, 8],
[ 4, 9],
[ 5, 10]]])
>>> d
array([[ 1, 6],
[ 2, 7],
[ 3, 8],
[ 4, 9],
[ 5, 10]])
>>> c.shape
(1, 5, 2)
>>> d.shape
(5, 2)
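For the path-distance question specifically, the zip can be dropped altogether. Here is a minimal sketch, assuming the tour is an integer NumPy array and the matrix is passed in as an argument (the names path_length_indexed and path_length_loop are made up for illustration). The first version pairs consecutive stops with fancy indexing; the second is a plain index loop, which is the form usually suggested for nopython mode. The @jit decorator is left off so the snippet runs without numba installed.

```python
import numpy as np

dist_matrix = np.array(
    [
        [0.0, 0.5, 1.0, 1.41421356, 1.0],
        [0.5, 0.0, 0.5, 1.11803399, 1.11803399],
        [1.0, 0.5, 0.0, 1.0, 1.41421356],
        [1.41421356, 1.11803399, 1.0, 0.0, 1.0],
        [1.0, 1.11803399, 1.41421356, 1.0, 0.0],
    ]
)


def path_length_indexed(dist, tour):
    """Sum dist[tour[k], tour[k + 1]] over consecutive stops via fancy indexing."""
    tour = np.asarray(tour)
    return dist[tour[:-1], tour[1:]].sum()


def path_length_loop(dist, tour):
    """Same sum as an explicit loop over positions (no zip, no list comprehension)."""
    total = 0.0
    for k in range(len(tour) - 1):
        total += dist[tour[k], tour[k + 1]]
    return total


tour = np.array([0, 1, 2, 3, 4])
print(path_length_indexed(dist_matrix, tour))  # 0.5 + 0.5 + 1.0 + 1.0
print(path_length_loop(dist_matrix, tour))
```

Both functions walk the same pairs (tour[k], tour[k + 1]), so they should agree for any tour with at least two stops.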
How would I do the following:
With a 3D numpy array I want to take the mean in one dimension and assign the values back to a 3D array with the same shape, with duplicate values of the means in the direction they were derived...
I'm struggling to work out an example in 3D but in 2D (4x4) it would look a bit like this I guess
array[[1, 1, 2, 2]
[2, 2, 1, 0]
[1, 1, 2, 2]
[4, 8, 3, 0]]
becomes
array[[2, 3, 2, 1]
[2, 3, 2, 1]
[2, 3, 2, 1]
[2, 3, 2, 1]]
I'm struggling with np.mean and the loss of dimensions when taking an average.
You can use the keepdims keyword argument to keep that vanishing dimension, e.g.:
>>> a = np.random.randint(10, size=(4, 4)).astype(np.double)
>>> a
array([[ 7., 9., 9., 7.],
[ 7., 1., 3., 4.],
[ 9., 5., 9., 0.],
[ 6., 9., 1., 5.]])
>>> a[:] = np.mean(a, axis=0, keepdims=True)
>>> a
array([[ 7.25, 6. , 5.5 , 4. ],
[ 7.25, 6. , 5.5 , 4. ],
[ 7.25, 6. , 5.5 , 4. ],
[ 7.25, 6. , 5.5 , 4. ]])
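The same idea carries over to the 3D case asked about. A minimal sketch, using a made-up helper name mean_filled; np.broadcast_to returns a read-only view, so it is copied here to get an ordinary writable array.

```python
import numpy as np


def mean_filled(arr, axis):
    """Return an array shaped like arr with the mean along `axis` repeated along it."""
    m = arr.mean(axis=axis, keepdims=True)  # reduced axis is kept with length 1
    return np.broadcast_to(m, arr.shape).copy()


a3 = np.arange(24, dtype=float).reshape(2, 3, 4)
filled = mean_filled(a3, axis=1)
print(filled.shape)     # same shape as a3
print(filled[0, :, 0])  # the mean of 0, 4 and 8, repeated three times
```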
You can resize the array after taking the mean:
In [24]: a = np.array([[1, 1, 2, 2],
[2, 2, 1, 0],
[2, 3, 2, 1],
[4, 8, 3, 0]])
In [25]: np.resize(a.mean(axis=0).astype(int), a.shape)
Out[25]:
array([[2, 3, 2, 0],
[2, 3, 2, 0],
[2, 3, 2, 0],
[2, 3, 2, 0]])
In order to correctly satisfy the condition that duplicate values of the means appear in the direction they were derived, it's necessary to reshape the mean array to a shape which is broadcastable with the original array.
Specifically, the mean array should have the same shape as the original array except that the length of the dimension along which the mean was taken should be 1.
The following function should work for any shape of array and any number of dimensions:
def fill_mean(arr, axis):
    mean_arr = np.mean(arr, axis=axis)
    mean_shape = list(arr.shape)
    mean_shape[axis] = 1
    mean_arr = mean_arr.reshape(mean_shape)
    return np.zeros_like(arr) + mean_arr
Here's the function applied to your example array which I've called a:
>>> fill_mean(a, 0)
array([[ 2.25, 3.5 , 2. , 0.75],
[ 2.25, 3.5 , 2. , 0.75],
[ 2.25, 3.5 , 2. , 0.75],
[ 2.25, 3.5 , 2. , 0.75]])
>>> fill_mean(a, 1)
array([[ 1.5 , 1.5 , 1.5 , 1.5 ],
[ 1.25, 1.25, 1.25, 1.25],
[ 2. , 2. , 2. , 2. ],
[ 3.75, 3.75, 3.75, 3.75]])
Construct the numpy array
import numpy as np
data = np.array(
[[1, 1, 2, 2],
[2, 2, 1, 0],
[1, 1, 2, 2],
[4, 8, 3, 0]]
)
Use the axis parameter to get means along a particular axis
>>> means = np.mean(data, axis=0)
>>> means
array([ 2., 3., 2., 1.])
Now tile that resulting array into the shape of the original
>>> print(np.tile(means, (4,1)))
[[ 2. 3. 2. 1.]
[ 2. 3. 2. 1.]
[ 2. 3. 2. 1.]
[ 2. 3. 2. 1.]]
You can replace the 4,1 with parameters from data.shape
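For instance, a short sketch that takes the repeat count from data.shape instead of hard-coding 4 (note this tiling only matches the original layout for means taken along axis 0):

```python
import numpy as np

data = np.array(
    [[1, 1, 2, 2],
     [2, 2, 1, 0],
     [1, 1, 2, 2],
     [4, 8, 3, 0]]
)

means = np.mean(data, axis=0)
# Repeat the row of column means once per row of the original array.
tiled = np.tile(means, (data.shape[0], 1))
print(tiled)
```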
I have a very large 2D numpy array that contains 2x2 subsets that I need to take the average of. I am looking for a way to vectorize this operation. For example, given x:
# |- col 0 -| |- col 1 -| |- col 2 -|
x = np.array( [[ 0.0, 1.0, 2.0, 3.0, 4.0, 5.0], # row 0
[ 6.0, 7.0, 8.0, 9.0, 10.0, 11.0], # row 0
[12.0, 13.0, 14.0, 15.0, 16.0, 17.0], # row 1
[18.0, 19.0, 20.0, 21.0, 22.0, 23.0]]) # row 1
I need to end up with a 2x3 array which are the averages of each 2x2 sub array, i.e.:
result = np.array( [[ 3.5, 5.5, 7.5],
[15.5, 17.5, 19.5]])
so element [0,0] is calculated as the average of x[0:2,0:2], while element [0,1] would be the average of x[0:2, 2:4]. Does numpy have vectorized/efficient ways of doing aggregates on subsets like this?
If we form the reshaped matrix y = x.reshape(2,2,3,2), then the (i,j) 2x2 submatrix is given by y[i,:,j,:]. E.g.:
In [340]: x
Out[340]:
array([[ 0., 1., 2., 3., 4., 5.],
[ 6., 7., 8., 9., 10., 11.],
[ 12., 13., 14., 15., 16., 17.],
[ 18., 19., 20., 21., 22., 23.]])
In [341]: y = x.reshape(2,2,3,2)
In [342]: y[0,:,0,:]
Out[342]:
array([[ 0., 1.],
[ 6., 7.]])
In [343]: y[1,:,2,:]
Out[343]:
array([[ 16., 17.],
[ 22., 23.]])
To get the mean of the 2x2 submatrices, use the mean method, with axis=(1,3):
In [344]: y.mean(axis=(1,3))
Out[344]:
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5]])
If you are using an older version of numpy that doesn't support using a tuple for the axis, you could do:
In [345]: y.mean(axis=1).mean(axis=-1)
Out[345]:
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5]])
See the link given by @dashesy in a comment for more background on the reshaping "trick".
To generalize this to a 2-d array with shape (m, n), where m and n are even, use
y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)
y can then be interpreted as an array of 2x2 arrays. The first and third index slots of the 4-d array act as the indices that select one of the 2x2 blocks. To get the upper left 2x2 block, use y[0, :, 0, :]; to get the block in the second row and third column of blocks, use y[1, :, 2, :]; and in general, to access block (j, k), use y[j, :, k, :].
To compute the reduced array of averages of these blocks, use the mean method, with axis=(1, 3) (i.e. average over axes 1 and 3):
avg = y.mean(axis=(1, 3))
Here's an example where x has shape (8, 10), so the array of averages of the 2x2 blocks has shape (4, 5):
In [10]: np.random.seed(123)
In [11]: x = np.random.randint(0, 4, size=(8, 10))
In [12]: x
Out[12]:
array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
[3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
[2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
[0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
[0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
[1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
[2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
[0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])
In [13]: y = x.reshape(x.shape[0]//2, 2, x.shape[1]//2, 2)
Take a look at a couple of the 2x2 blocks:
In [14]: y[0, :, 0, :]
Out[14]:
array([[2, 1],
[3, 1]])
In [15]: y[1, :, 2, :]
Out[15]:
array([[3, 2],
[2, 0]])
Compute the averages of the blocks:
In [16]: avg = y.mean(axis=(1, 3))
In [17]: avg
Out[17]:
array([[ 1.75, 1.75, 0.75, 2. , 1.5 ],
[ 0.75, 2.5 , 1.75, 1.5 , 0.75],
[ 0.75, 1.75, 2.25, 0.25, 1.25],
[ 1.5 , 2.25, 1.25, 1.5 , 1.5 ]])
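The same reshape works for blocks of other sizes. A minimal sketch, with a hypothetical helper block_mean and the assumption that the block height and width divide the array shape exactly:

```python
import numpy as np


def block_mean(x, bh, bw):
    """Average non-overlapping bh-by-bw blocks of a 2-d array."""
    rows, cols = x.shape
    if rows % bh or cols % bw:
        raise ValueError("block size must divide the array shape")
    # Axes 1 and 3 index positions inside a block; axes 0 and 2 pick the block.
    y = x.reshape(rows // bh, bh, cols // bw, bw)
    return y.mean(axis=(1, 3))


x = np.arange(24, dtype=float).reshape(4, 6)
print(block_mean(x, 2, 2))
```

Integer division (//) is used so the shape arguments stay integers on Python 3.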
Right now I am doing this by iterating, but there has to be a way to accomplish this task using numpy functions. My goal is to take a 2D array and average J columns at a time, producing a new array with the same number of rows as the original, but with columns/J columns.
So I want to take this:
J = 2 // two columns averaged at a time
[[1 2 3 4]
[4 3 7 1]
[6 2 3 4]
[3 4 4 1]]
and produce this:
[[1.5 3.5]
[3.5 4.0]
[4.0 3.5]
[3.5 2.5]]
Is there a simple way to accomplish this task? I also need to make sure I never end up with an unaveraged remainder column. So if, for example, I have an input array with 5 columns and J=2, I would average the first two columns, then the last three columns.
Any help you can provide would be great.
data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
If your j divides data.shape[1], that is.
Example:
In [40]: data
Out[40]:
array([[7, 9, 7, 2],
[7, 6, 1, 5],
[8, 1, 0, 7],
[8, 3, 3, 2]])
In [41]: data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
Out[41]:
array([[ 8. , 4.5],
[ 6.5, 3. ],
[ 4.5, 3.5],
[ 5.5, 2.5]])
First of all, it looks to me like you're not averaging the columns at all, you're just averaging two data points at a time. It seems to me that you're best off reshaping the array so that you have an Nx2 data structure that you can feed directly to mean. You may have to pad it first if the number of columns isn't quite compatible. Then at the end, just do a weighted average of the padded remainder column and the one before it. Finally, reshape back to the shape you want.
To play off of the example provided by TheodrosZelleke:
In [1]: data = np.concatenate((data, np.array([[5, 6, 7, 8]]).T), 1)
In [2]: data
Out[2]:
array([[7, 9, 7, 2, 5],
[7, 6, 1, 5, 6],
[8, 1, 0, 7, 7],
[8, 3, 3, 2, 8]])
In [3]: cols = data.shape[1]
In [4]: j = 2
In [5]: dataPadded = np.concatenate((data, np.zeros((data.shape[0], (-cols) % j))), 1)
In [6]: dataPadded
Out[6]:
array([[ 7., 9., 7., 2., 5., 0.],
[ 7., 6., 1., 5., 6., 0.],
[ 8., 1., 0., 7., 7., 0.],
[ 8., 3., 3., 2., 8., 0.]])
In [7]: dataAvg = dataPadded.reshape((-1,j)).mean(axis=1).reshape((data.shape[0], -1))
In [8]: dataAvg
Out[8]:
array([[ 8. , 4.5, 2.5],
[ 6.5, 3. , 3. ],
[ 4.5, 3.5, 3.5],
[ 5.5, 2.5, 4. ]])
In [9]: if cols % j:
            # Each group mean is (sum of real values) / j, so scale the combined sum back up by j.
            dataAvg[:, -2] = (dataAvg[:, -2] + dataAvg[:, -1]) * j / (j + cols % j)
            dataAvg = dataAvg[:, :-1]
....:
In [10]: dataAvg
Out[10]:
array([[ 8. , 4.66666667],
[ 6.5 , 4. ],
[ 4.5 , 4.66666667],
[ 5.5 , 4.33333333]])
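An alternative that avoids padding, sketched under the assumption that leftover columns should be folded into the last group as the question describes (the function name grouped_column_mean is made up for illustration):

```python
import numpy as np


def grouped_column_mean(data, j):
    """Average columns in groups of j; leftover columns join the last group."""
    rows, cols = data.shape
    n_groups = max(cols // j, 1)
    split = (n_groups - 1) * j  # number of columns before the last group
    # All groups except the last have exactly j columns.
    head = data[:, :split].reshape(rows, n_groups - 1, j).mean(axis=2)
    # The last group takes every column that remains.
    tail = data[:, split:].mean(axis=1, keepdims=True)
    return np.hstack((head, tail))


data = np.array([[7, 9, 7, 2, 5],
                 [7, 6, 1, 5, 6],
                 [8, 1, 0, 7, 7],
                 [8, 3, 3, 2, 8]])
print(grouped_column_mean(data, 2))
```

With 5 columns and j = 2 this averages columns 0-1 together and columns 2-4 together, so the result has two columns.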
Hello all, I need to define a function that divides one matrix by another term by term (or, in the worst case, works on lists of lists), so that the result ends up in a third matrix.
Thanks for any response.
Unless I'm misunderstanding, this is where numpy can be put to good use:
>>> from numpy import *
>>> a = array([[1,2,3],[4,5,6],[7,8,9]])
>>> b = array([[0.5] * 3, [0.5] * 3, [0.5] * 3])
>>> a / b
array([[ 2., 4., 6.],
[ 8., 10., 12.],
[ 14., 16., 18.]])
This works for multiplication too. And indeed, as noted by Mark, scalar division (and multiplication) is also possible:
>>> a / 10.0
array([[ 0.1, 0.2, 0.3],
[ 0.4, 0.5, 0.6],
[ 0.7, 0.8, 0.9]])
>>> a * 10
array([[10, 20, 30],
[40, 50, 60],
[70, 80, 90]])
Edit: to be complete, for lists of lists you could do the following:
>>> a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> b = [[0.5] * 3, [0.5] * 3, [0.5] * 3]
>>> def mat_div(a, b):
...     return [[n / d for n, d in zip(ra, rb)] for ra, rb in zip(a, b)]
...
>>> mat_div(a, b)
[[2.0, 4.0, 6.0], [8.0, 10.0, 12.0], [14.0, 16.0, 18.0]]
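If the result really has to land in a separate, pre-allocated third matrix, np.divide accepts an out argument. A small sketch (the variable name c is just for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = np.full((3, 3), 0.5)

c = np.empty_like(a)    # the "third matrix" that receives the result
np.divide(a, b, out=c)  # element-wise a / b, written into c
print(c)
```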