I have a numpy 2D array (50x50) filled with values. I would like to flatten the 2D array into one column (2500x1), but the location of each value is very important. The indices can be converted to spatial coordinates, so I want another two (x,y) (2500x1) arrays so I can retrieve the x,y spatial coordinate of the corresponding value.
For example:
My 2D array:
--------x-------
[[0.5 0.1 0. 0.] |
[0. 0. 0.2 0.8] y
[0. 0. 0. 0. ]] |
My desired output:
#Values
[[0.5]
[0.1]
[0. ]
[0. ]
[0. ]
[0. ]
[0.2]
[0.8]
...],
#Corresponding x index, where I will retrieve the x spatial coordinate from
[[0]
[1]
[2]
[3]
[0]
[1]
[2]
[3]
...],
#Corresponding y index, where I will retrieve the y spatial coordinate from
[[0]
[0]
[0]
[0]
[1]
[1]
[1]
[1]
...],
Any clues on how to do this? I've tried a few things but they have not worked.
For simplicity, let's reproduce your array with this chunk of code:
import numpy as np
value = np.arange(6).reshape(2, 3)
First, create index vectors x and y, with x running along the columns (axis 1) and y along the rows (axis 0), so that the grids below have the same shape as value:
x = np.arange(value.shape[1])
y = np.arange(value.shape[0])
np.meshgrid then expands these two vectors into full index grids, which is what you are after:
xx, yy = np.meshgrid(x, y, sparse=False)
Finally, reshape each array into the single-column form you want with these lines:
xx = xx.reshape(-1, 1)
yy = yy.reshape(-1, 1)
value = value.reshape(-1, 1)
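As a sanity check on the ordering, the sketch below flattens a small array and confirms that each (row, column) index pair points back at the matching value. It assumes x is meant to run along the columns and y along the rows, as in the question; the names cols, rows, and flat are illustrative only.

```python
import numpy as np

value = np.arange(6).reshape(2, 3)

# Assumption: x runs along columns (axis 1) and y along rows (axis 0).
x = np.arange(value.shape[1])
y = np.arange(value.shape[0])

# With the default indexing='xy', both grids have the same shape as value.
xx, yy = np.meshgrid(x, y)

cols = xx.reshape(-1, 1)
rows = yy.reshape(-1, 1)
flat = value.reshape(-1, 1)

# Each flattened value should equal value[row, col] for its index pair.
print(np.array_equal(value[rows, cols], flat))
```

If the check prints False, the index vectors were probably passed to meshgrid in the wrong order.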
According to your example, with np.indices:
data = np.arange(2500).reshape(50, 50)
y_indices, x_indices = np.indices(data.shape)
Reshaping your data:
data = data.reshape(-1,1)
x_indices = x_indices.reshape(-1,1)
y_indices = y_indices.reshape(-1,1)
Assuming you want to flatten and reshape into a single column, use reshape:
a = np.array([[0.5, 0.1, 0., 0.],
[0., 0., 0.2, 0.8],
[0., 0., 0., 0. ]])
a.reshape((-1, 1)) # 1 column, as many rows as necessary (-1)
output:
array([[0.5],
[0.1],
[0. ],
[0. ],
[0. ],
[0. ],
[0.2],
[0.8],
[0. ],
[0. ],
[0. ],
[0. ]])
Getting the coordinates:
y,x = a.shape
np.tile(np.arange(x), y)
# array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])
np.repeat(np.arange(y), x)
# array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
or simply using unravel_index:
Y, X = np.unravel_index(range(a.size), a.shape)
# (array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]),
# array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]))
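Since the question mentions converting the indices to spatial coordinates, here is a minimal sketch of that last step. The grid origin (x0, y0) and cell size (dx, dy) are hypothetical values chosen for illustration; substitute whatever your grid actually uses.

```python
import numpy as np

a = np.array([[0.5, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.2, 0.8],
              [0.0, 0.0, 0.0, 0.0]])

# Hypothetical grid definition: coordinates of cell (0, 0) and the cell size.
x0, y0 = 100.0, 200.0
dx, dy = 10.0, 5.0

# np.indices returns the row index grid first, then the column index grid.
row_idx, col_idx = np.indices(a.shape)

values = a.reshape(-1, 1)
x_coord = (x0 + col_idx * dx).reshape(-1, 1)
y_coord = (y0 + row_idx * dy).reshape(-1, 1)

# Optional: a single (N, 3) table with columns x, y, value.
table = np.hstack([x_coord, y_coord, values])
print(table.shape)
```

Because all three columns come from the same row-major flattening, row k of x_coord, y_coord, and values always refers to the same cell.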
Suppose I have multiple 4x4 matrices that I want to add into a final 6x6 zero matrix, with each value going to a designated position. How would I do this? I thought of adding slices to an np.zeros 6x6 matrix, but I believe this may be quite tedious.
Matrix 1 goes to position 1 and matrix 2 goes to position 2 (both shown as images in the original post). The two placed matrices are then added to form the final matrix.
import numpy as np
from math import sqrt
# Element 1
C_1= 3/5
S_1= 4/5
matrix_1 = np.matrix([[C_1**2, C_1*S_1,-C_1**2,-C_1*S_1],[C_1*S_1,S_1**2,-C_1*S_1,-S_1**2],
[-C_1**2,-C_1*S_1,C_1**2,C_1*S_1],[-C_1*S_1,-S_1**2,C_1*S_1,S_1**2]])
empty_mat1 = np.zeros((6,6))
empty_mat1[0:4 , 0:4] = empty_mat1[0:4 ,0:4] + matrix_1
#print(empty_mat1)
# Element 2
C_2 = 0
S_2 = 1
matrix_2 = 1.25*np.matrix([[C_2**2, C_2*S_2,-C_2**2,-C_2*S_2],[C_2*S_2,S_2**2,-C_2*S_2,-S_2**2],
[-C_2**2,-C_2*S_2,C_2**2,C_2*S_2],[-C_2*S_2,-S_2**2,C_2*S_2,S_2**2]])
empty_mat2 = np.zeros((6,6))
empty_mat2[0:2,0:2] = empty_mat2[0:2,0:2] + matrix_2[0:2,0:2]
empty_mat2[4:6,0:2] = empty_mat2[4:6,0:2] + matrix_2[2:4,0:2]
empty_mat2[0:2,4:6] = empty_mat2[0:2,4:6] + matrix_2[0:2,2:4]
empty_mat2[4:6,4:6] = empty_mat2[4:6,4:6] + matrix_2[2:4,2:4]
print(empty_mat1+empty_mat2)
Adding two arrays of different shapes is a little tricky with numpy.
However, with slice assignment you can do it with the following "rustic" method:
Let M1 and M2 be your two input arrays, M3 (from M1) and M4 (from M2) the temporary arrays, and M5 the final array:
#Initialisation
M1 = np.array([[ 0.36, 0.48, -0.36, -0.48], [ 0.48, 0.64, -0.48, -0.64], [ -0.36, -0.48, 0.36, 0.48], [-0.48, -0.64, 0.48, 0.64]])
M2 = np.array([[ 0, 0, 0, 0], [ 0, 1.25, 0, -1.25], [ 0, 0, 0, 0], [ 0, -1.25, 0, 1.25]])
M3, M4 = np.zeros((6, 6)), np.zeros((6, 6))
#M3 and M4 operations
M3[0:4, 0:4] = M1[0:4, 0:4] + M3[0:4, 0:4]
M4[0:2, 0:2] = M2[0:2, 0:2]
M4[0:2, 4:6] = M2[0:2, 2:4]
M4[4:6, 0:2] = M2[2:4, 0:2]
M4[4:6, 4:6] = M2[2:4, 2:4]
#Final operation
M5 = M3+M4
print(M5)
Output :
[[ 0.36 0.48 -0.36 -0.48 0. 0. ]
[ 0.48 1.89 -0.48 -0.64 0. -1.25]
[-0.36 -0.48 0.36 0.48 0. 0. ]
[-0.48 -0.64 0.48 0.64 0. 0. ]
[ 0. 0. 0. 0. 0. 0. ]
[ 0. -1.25 0. 0. 0. 1.25]]
Have a good day.
You will need to encode some way of where your 4x4 matrices end up in the final 6x6 matrix. Suppose you have N (=2 in your case) such 4x4 matrices. You can then define two new arrays (shape Nx4) that denote the row and col indices of the final 6x6 matrix that you want your 4x4 matrices to end up in. Finally, you use fancy indexing and broadcasting to build up a Nx6x6 array which you can sum over. Your example:
import numpy as np
N = 2
arr = np.array([[
[0.36, 0.48, -0.36, -0.48],
[0.48, 0.64, -0.48, -0.64],
[-0.36, -0.48, 0.36, 0.48],
[-0.48, -0.64, 0.48, 0.64],
], [
[0, 0, 0, 0],
[0, 1.25, 0, -1.25],
[0, 0, 0, 0],
[0, -1.25, 0, 1.25],
]])
rows = np.array([
[0, 1, 2, 3],
[0, 1, 4, 5]
])
cols = np.array([
[0, 1, 2, 3],
[0, 1, 4, 5]
])
i = np.arange(N)
out = np.zeros((N, 6, 6))
out[
i[:, None, None],
rows[:, :, None],
cols[:, None, :]
] = arr
out = out.sum(axis=0)
Gives as output:
array([[ 0.36, 0.48, -0.36, -0.48, 0. , 0. ],
[ 0.48, 1.89, -0.48, -0.64, 0. , -1.25],
[-0.36, -0.48, 0.36, 0.48, 0. , 0. ],
[-0.48, -0.64, 0.48, 0.64, 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , -1.25, 0. , 0. , 0. , 1.25]])
If you want even more control over where each row/col ends up, you can pull off some more trickery as follows:
rows = np.array([
[1, 2, 3, 4, 0, 0],
[1, 2, 0, 0, 3, 4]
])
cols = np.array([
[1, 2, 3, 4, 0, 0],
[1, 2, 0, 0, 3, 4]
])
i = np.arange(N)
out = np.pad(arr, ((0, 0), (1, 0), (1, 0)))[
i[:, None, None],
rows[:, :, None],
cols[:, None, :]
].sum(axis=0)
which has the same output. This would allow you to shuffle the rows/cols of arr by shuffling the values 1-4 in the rows, cols arrays. I would prefer option 1 though.
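A variation on option 1, sketched here as an alternative rather than as part of the original answer: np.add.at can accumulate straight into a single 6x6 array using the same rows and cols index arrays, which avoids allocating the intermediate Nx6x6 array. A plain out[...] += arr would not be safe here, because the two matrices overlap in the top-left 2x2 block.

```python
import numpy as np

arr = np.array([[
    [0.36, 0.48, -0.36, -0.48],
    [0.48, 0.64, -0.48, -0.64],
    [-0.36, -0.48, 0.36, 0.48],
    [-0.48, -0.64, 0.48, 0.64],
], [
    [0, 0, 0, 0],
    [0, 1.25, 0, -1.25],
    [0, 0, 0, 0],
    [0, -1.25, 0, 1.25],
]])
rows = np.array([[0, 1, 2, 3],
                 [0, 1, 4, 5]])
cols = np.array([[0, 1, 2, 3],
                 [0, 1, 4, 5]])

out = np.zeros((6, 6))
# The index arrays broadcast to shape (N, 4, 4), matching arr.
# np.add.at is unbuffered, so repeated target cells are summed.
np.add.at(out, (rows[:, :, None], cols[:, None, :]), arr)
print(out)
```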
I probably should wait for you to correct your question, but I'll go ahead and give you some code - yes, in the most tedious form - based on your images
res = np.zeros((6,6))
# arr1, arr2 are (4,4) arrays
res[:4, :4] += arr1
idx = np.array([0,1,4,5])
res[idx[:,None], idx] += arr2
The first is a contiguous block, so the 2 slices are enough.
The second is split up, so I'm using advanced indexing.
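The same idea can be wrapped in a small helper so it scales to more than two matrices. This is a sketch only: the function name assemble and the (matrix, index list) pairing are assumptions made for illustration, and np.ix_ is used in place of idx[:,None], idx to build the open mesh.

```python
import numpy as np

def assemble(size, blocks):
    """Add each (matrix, idx) pair into a zero (size, size) array."""
    res = np.zeros((size, size))
    for mat, idx in blocks:
        # np.ix_(idx, idx) selects the rows and columns listed in idx.
        # Indices within one block are unique, so += is safe here.
        res[np.ix_(idx, idx)] += mat
    return res

arr1 = np.array([[0.36, 0.48, -0.36, -0.48],
                 [0.48, 0.64, -0.48, -0.64],
                 [-0.36, -0.48, 0.36, 0.48],
                 [-0.48, -0.64, 0.48, 0.64]])
arr2 = np.array([[0, 0, 0, 0],
                 [0, 1.25, 0, -1.25],
                 [0, 0, 0, 0],
                 [0, -1.25, 0, 1.25]])

res = assemble(6, [(arr1, [0, 1, 2, 3]), (arr2, [0, 1, 4, 5])])
print(res)
```

The loop runs once per matrix, not once per element, so it stays cheap when the number of matrices is modest.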
I have a numpy array named heartbeats with 100 rows. Each row has 5 elements.
I also have a single array named time_index with 5 elements.
I need to prepend the time index to each row of heartbeats.
heartbeats = np.array([
[-0.58, -0.57, -0.55, -0.39, -0.40],
[-0.31, -0.31, -0.32, -0.46, -0.46]
])
time_index = np.array([-2, -1, 0, 1, 2])
What I need:
array([[-2, -0.58],
[-1, -0.57],
[0, -0.55],
[1, -0.39],
[2, -0.40],
[-2, -0.31],
[-1, -0.31],
[0, -0.32],
[1, -0.46],
[2, -0.46]])
I only wrote two rows of heartbeats to illustrate.
Assuming you are using numpy, the exact output array you are looking for can be made by stacking a repeated version of time_index with the raveled version of heartbeats:
np.stack((np.tile(time_index, len(heartbeats)), heartbeats.ravel()), axis=-1)
Another approach, using broadcasting
In [13]: heartbeats = np.array([
...: [-0.58, -0.57, -0.55, -0.39, -0.40],
...: [-0.31, -0.31, -0.32, -0.46, -0.46]
...: ])
...: time_index = np.array([-2, -1, 0, 1, 2])
Make a target array:
In [14]: res = np.zeros(heartbeats.shape + (2,), heartbeats.dtype)
In [15]: res[:,:,1] = heartbeats # insert a (2,5) into a (2,5) slot
In [17]: res[:,:,0] = time_index[None] # insert a (5,) into a (2,5) slot
In [18]: res
Out[18]:
array([[[-2. , -0.58],
[-1. , -0.57],
[ 0. , -0.55],
[ 1. , -0.39],
[ 2. , -0.4 ]],
[[-2. , -0.31],
[-1. , -0.31],
[ 0. , -0.32],
[ 1. , -0.46],
[ 2. , -0.46]]])
and then reshape to 2d:
In [19]: res.reshape(-1,2)
Out[19]:
array([[-2. , -0.58],
[-1. , -0.57],
[ 0. , -0.55],
[ 1. , -0.39],
[ 2. , -0.4 ],
[-2. , -0.31],
[-1. , -0.31],
[ 0. , -0.32],
[ 1. , -0.46],
[ 2. , -0.46]])
[17] takes a (5,), expands it to (1,5), and then to (2,5) for the insert. Read up on broadcasting.
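To see that broadcasting step in isolation, the sketch below uses np.broadcast_to to expand time_index explicitly and then stacks it with heartbeats. It is an equivalent formulation offered for illustration, not the code from the answer above; the name times is made up.

```python
import numpy as np

heartbeats = np.array([
    [-0.58, -0.57, -0.55, -0.39, -0.40],
    [-0.31, -0.31, -0.32, -0.46, -0.46],
])
time_index = np.array([-2, -1, 0, 1, 2])

# Broadcast (5,) to (2, 5) as a read-only view; no data is copied here.
times = np.broadcast_to(time_index, heartbeats.shape)

# Stack along a new last axis -> (2, 5, 2), then merge the first two axes.
result = np.stack((times, heartbeats), axis=-1).reshape(-1, 2)
print(result)
```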
Alternatively, you can repeat time_index with np.concatenate, once per row of heartbeats:
concatenated = np.concatenate([time_index] * heartbeats.shape[0])
# [-2 -1 0 1 2 -2 -1 0 1 2]
# result = np.dstack((concatenated, heartbeats.reshape(-1))).squeeze()
result = np.array([concatenated, heartbeats.reshape(-1)]).T
Using np.concatenate may be faster than np.tile. This solution appears to be faster than Mad Physicist's, but the broadcasting approach in hpaulj's answer appears to be the fastest; time them on your own data to confirm.
I have a 2d array, and I have some numbers to add to some cells. I want to vectorize the operation in order to save time. The problem is when I need to add several numbers to the same cell. In this case, the vectorized code only adds the last.
'a' is my array, 'x' and 'y' are the coordinates of the cells I want to increment, and 'z' contains the numbers I want to add.
import numpy as np
a=np.zeros((4,4))
x=[1,2,1]
y=[0,1,0]
z=[2,3,1]
a[x,y]+=z
print(a)
As you can see, a[1,0] should be incremented twice: once by 2 and once by 1. So the expected array should be:
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
but instead I get:
[[0. 0. 0. 0.]
[1. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
The problem would be easy to solve with a for loop, but I wonder if I can correctly vectorize this operation.
Use np.add.at for that:
import numpy as np
a = np.zeros((4,4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
np.add.at(a, (x, y), z)
print(a)
# [[0. 0. 0. 0.]
# [3. 0. 0. 0.]
# [0. 3. 0. 0.]
# [0. 0. 0. 0.]]
When you're doing a[x,y]+=z, we can decompose the operations as :
a[1, 0], a[2, 1], a[1, 0] = [a[1, 0] + 2, a[2, 1] + 3, a[1, 0] + 1]
# Equivalent to :
a[1, 0] = 2
a[2, 1] = 3
a[1, 0] = 1
That's why it doesn't work.
If you instead increment the array one index pair at a time in a loop, it works.
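A minimal sketch of that comparison, using the arrays from the question: the explicit loop accumulates every contribution, while the buffered a[x, y] += z keeps only the last write to a repeated cell. The name b is introduced here just to hold the buffered result.

```python
import numpy as np

x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]

# Explicit loop: every (xi, yi) pair is applied in turn, so repeats add up.
a = np.zeros((4, 4))
for xi, yi, zi in zip(x, y, z):
    a[xi, yi] += zi

# Buffered fancy-index update: the repeated cell (1, 0) keeps only the last value.
b = np.zeros((4, 4))
b[x, y] += z

print(a[1, 0], b[1, 0])
```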
You could create a multi-dimensional array of size 3x4x4, add each element of z to its own 4x4 layer, and then sum the layers:
import numpy as np
x = [1,2,1]
y = [0,1,0]
z = [2,3,1]
a = np.zeros((3,4,4))
n = range(a.shape[0])
a[n,x,y] += z
print(sum(a))
which will result in
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
Approach #1: Bincount-based method for performance
We can use np.bincount for efficient bin-based summation:
def accumulate_arr(x, y, z, out):
    # Get output array shape
    shp = out.shape
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index((x, y), shp)
    # Or, for a 2D output: lidx = np.asarray(x) * shp[1] + np.asarray(y)
    # Accumulate z into out, using lidx as the bin IDs
    out += np.bincount(lidx, z, minlength=out.size).reshape(shp)
    return out
If you are working with a zeros-initialized output array, feed in the output shape directly into the function and get the bincount output as the final one.
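A sketch of that zeros-initialized variant, under the assumption that the caller passes the target shape instead of an existing array. The function name accumulate_zeros is hypothetical.

```python
import numpy as np

def accumulate_zeros(x, y, z, shape):
    """Return a new array of the given shape with z summed at (x, y)."""
    # Linear (flattened) index of every target cell.
    lidx = np.ravel_multi_index((x, y), shape)
    # bincount sums the weights that share an index; minlength pads the
    # result so that it covers every cell of the output.
    size = int(np.prod(shape))
    return np.bincount(lidx, weights=z, minlength=size).reshape(shape)

out = accumulate_zeros([1, 2, 1], [0, 1, 0], [2, 3, 1], (4, 4))
print(out)
```

Note that np.bincount with weights returns float64, so integer inputs come back as floats.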
Output on given sample -
In [48]: accumulate_arr(x,y,z,a)
Out[48]:
array([[0., 0., 0., 0.],
[3., 0., 0., 0.],
[0., 3., 0., 0.],
[0., 0., 0., 0.]])
Approach #2: Using sparse-matrix for memory-efficiency
In [54]: from scipy.sparse import coo_matrix
In [56]: coo_matrix((z,(x,y)), shape=(4,4)).toarray()
Out[56]:
array([[0, 0, 0, 0],
[3, 0, 0, 0],
[0, 3, 0, 0],
[0, 0, 0, 0]])
If you are okay with a sparse-matrix, skip the .toarray() part for a memory-efficient solution.
Could someone care to explain the meshgrid method? I cannot wrap my mind around it. The example is from the [SciPy][1] site:
import numpy as np
nx, ny = (3, 2)
x = np.linspace(0, 1, nx)
print ("x =", x)
y = np.linspace(0, 1, ny)
print ("y =", y)
xv, yv = np.meshgrid(x, y)
print ("xv_1 =", xv)
print ("yv_1 =", yv)
xv, yv = np.meshgrid(x, y, sparse=True) # make sparse output arrays
print ("xv_2 =", xv)
print ("yv_2 =", yv)
Printout is :
x = [ 0. 0.5 1. ]
y = [ 0. 1.]
xv_1 = [[ 0. 0.5 1. ]
[ 0. 0.5 1. ]]
yv_1 = [[ 0. 0. 0.]
[ 1. 1. 1.]]
xv_2 = [[ 0. 0.5 1. ]]
yv_2 = [[ 0.]
[ 1.]]
Why are arrays xv_1 and yv_1 formed like this ? Ty :)
[1]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html#numpy.meshgrid
In [214]: nx, ny = (3, 2)
In [215]: x = np.linspace(0, 1, nx)
In [216]: x
Out[216]: array([ 0. , 0.5, 1. ])
In [217]: y = np.linspace(0, 1, ny)
In [218]: y
Out[218]: array([ 0., 1.])
Using unpacking to better see the 2 arrays produced by meshgrid:
In [225]: X,Y = np.meshgrid(x, y)
In [226]: X
Out[226]:
array([[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ]])
In [227]: Y
Out[227]:
array([[ 0., 0., 0.],
[ 1., 1., 1.]])
And for the sparse version, notice that X1 looks like one row of X (but 2d), and Y1 like one column of Y.
In [228]: X1,Y1 = np.meshgrid(x, y, sparse=True)
In [229]: X1
Out[229]: array([[ 0. , 0.5, 1. ]])
In [230]: Y1
Out[230]:
array([[ 0.],
[ 1.]])
When used in calculations like plus and times, both forms behave the same. That's because of numpy's broadcasting.
In [231]: X+Y
Out[231]:
array([[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ]])
In [232]: X1+Y1
Out[232]:
array([[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ]])
The shapes might also help:
In [235]: X.shape, Y.shape
Out[235]: ((2, 3), (2, 3))
In [236]: X1.shape, Y1.shape
Out[236]: ((1, 3), (2, 1))
X and Y hold more values than most uses actually need, but there usually isn't much of a penalty for using them instead of the sparse versions.
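The relationship between the two forms can be checked directly. The sketch below, using the same x and y as above, confirms that the sparse outputs are just x as a row and y as a column, and that broadcasting them against each other reproduces the dense outputs. The names Xb and Yb are illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(0, 1, 2)

X, Y = np.meshgrid(x, y)
Xs, Ys = np.meshgrid(x, y, sparse=True)

# The sparse outputs are x reshaped to (1, 3) and y reshaped to (2, 1).
print(np.array_equal(Xs, x[None, :]))
print(np.array_equal(Ys, y[:, None]))

# Broadcasting the sparse pair to a common shape gives the dense pair.
Xb, Yb = np.broadcast_arrays(Xs, Ys)
print(np.array_equal(Xb, X), np.array_equal(Yb, Y))
```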
Your linearly spaced vectors x and y, defined by linspace, have 3 and 2 points respectively.
meshgrid uses these vectors to create a 2D grid of points, 3 points along x by 2 points along y.
It returns two coordinate matrices, each of shape (2, 3): for every grid point, xv holds its x coordinate and yv holds its y coordinate.
Conceptually, they are built as follows:
# dummy re-implementation (dense output only)
def meshgrid_custom(x, y):
    xv = np.zeros((len(x), len(y)))
    yv = np.zeros((len(x), len(y)))
    for i, ix in enumerate(x):
        for j, jy in enumerate(y):
            xv[i, j] = ix
            yv[i, j] = jy
    return xv.T, yv.T
So, for example the point at the location (1,1) has the coordinates:
x = xv_1[1,1] = 0.5
y = yv_1[1,1] = 1.0
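The transpose in that construction corresponds to meshgrid's indexing parameter. As a small illustration, the default indexing='xy' gives arrays of shape (len(y), len(x)), while indexing='ij' gives (len(x), len(y)), and the two are transposes of each other. The _ij suffix on the names is just for this sketch.

```python
import numpy as np

x = np.linspace(0, 1, 3)
y = np.linspace(0, 1, 2)

# Default Cartesian indexing: shape (len(y), len(x)) == (2, 3).
xv, yv = np.meshgrid(x, y)

# Matrix indexing: shape (len(x), len(y)) == (3, 2).
xv_ij, yv_ij = np.meshgrid(x, y, indexing='ij')

print(xv.shape, xv_ij.shape)
print(np.array_equal(xv, xv_ij.T), np.array_equal(yv, yv_ij.T))

# The point stored at [1, 1] in the default layout.
print(xv[1, 1], yv[1, 1])
```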
I'm trying to efficiently map a N * 1 numpy array of ints to a N * 3 numpy array of floats using a ufunc.
What I have so far:
map = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = numpy.frompyfunc(lambda x: numpy.array(map[x], numpy.float32), 1, 1)
input = numpy.array([1, 2, 3], numpy.int32)
ufunc(input) gives a 3 * 3 array with dtype object. I'd like this array but with dtype float32.
You could use np.hstack:
import numpy as np
mapping = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
ufunc = np.frompyfunc(lambda x: np.array(mapping[x], np.float32), 1, 1)
data = np.array([1, 2, 3], np.int32)
result = np.hstack(ufunc(data))
print(result)
# [ 0. 0. 0. 0.5 0.5 0.5 1. 1. 1. ]
print(result.dtype)
# float32
print(result.shape)
# (9,)
If your mapping is a numpy array, you can just use fancy indexing for this:
>>> valmap = numpy.array([(0, 0, 0), (0.5, 0.5, 0.5), (1, 1, 1)])
>>> input = numpy.array([1, 2, 3], numpy.int32)
>>> valmap[input-1]
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])
You can use ndarray fancy indexing to get the same result; I think it should be faster than frompyfunc:
map_array = np.array([[0,0,0],[0,0,0],[0.5,0.5,0.5],[1,1,1]], dtype=np.float32)
index = np.array([1,2,3,1])
map_array[index]
Or you can just use list comprehension:
map = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
np.array([map[i] for i in [1,2,3,1]], dtype=np.float32)
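If the mapping starts out as a dict, the lookup array for the fancy-indexing approach can be built from it. This is a sketch that assumes the keys are small non-negative integers, so they can be used directly as row numbers; the names mapping, lut, and data are illustrative.

```python
import numpy as np

mapping = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}

# Assumption: keys are small non-negative ints, so each key can be a row index.
# Rows for keys that are absent from the dict (here row 0) stay at zero.
lut = np.zeros((max(mapping) + 1, 3), dtype=np.float32)
for key, triple in mapping.items():
    lut[key] = triple

data = np.array([1, 2, 3, 1], dtype=np.int32)
result = lut[data]  # shape (N, 3), dtype float32
print(result)
```

The Python loop only runs once per dict entry; the per-element lookup itself is a single vectorized indexing operation.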
Unless I misread the docs, the output of np.frompyfunc is indeed always an object: when you pass an ndarray as input, you get an ndarray with dtype=object.
A workaround is to use the np.vectorize function:
F = np.vectorize(lambda x: mapper.get(x), 'fff')
Here, we force the dtype of F's output to be 3 floats (hence the 'fff').
>>> mapper = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}
>>> inp = [1, 2, 3]
>>> F(inp)
(array([ 0. , 0.5, 1. ], dtype=float32), array([ 0., 0.5, 1.], dtype=float32), array([ 0. , 0.5, 1. ], dtype=float32))
OK, not quite what we want: it's a tuple of three float arrays (as we gave 'fff'), the first array being equivalent to [mapper[i][0] for i in inp]. So, with a bit of manipulation:
>>> np.array(F(inp)).T
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]], dtype=float32)
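Another option, offered as a sketch rather than as part of the answer above, is the signature argument of np.vectorize, which lets the wrapped function return a whole vector per input element so that no transpose is needed. Like np.vectorize in general, this is a convenience wrapper around a Python loop, not a speed-up.

```python
import numpy as np

mapper = {1: (0, 0, 0), 2: (0.5, 0.5, 0.5), 3: (1, 1, 1)}

# signature='()->(n)': one scalar in, one length-n vector out per element.
F = np.vectorize(
    lambda x: np.array(mapper[x], dtype=np.float32),
    signature='()->(n)',
)

out = F(np.array([1, 2, 3]))
print(out.shape, out.dtype)
```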