So basically I want to create a new array for each element and append the coordinates of the element to the original value (so adding the x and y position to the original element):
[ [7,2,4],[1,5,3] ]
then becomes
[ [[0,0,7],[0,1,2],[0,2,4]],
[[1,0,1],[1,1,5],[1,2,3]] ]
I've been looking for different ways to make this work with the axis system in NumPy, but I'm probably overlooking some more obvious way.
You can try np.meshgrid to create a grid and then np.stack to combine it with input array:
import numpy as np
a = np.asarray([[7,2,4],[1,5,3]])
# Unpack into a list instead of using `+ [a]`: newer NumPy versions return a tuple from meshgrid
result = np.stack([*np.meshgrid(range(a.shape[1]), range(a.shape[0]))[::-1], a], axis=-1)
Output:
array([[[0, 0, 7],
[0, 1, 2],
[0, 2, 4]],
[[1, 0, 1],
[1, 1, 5],
[1, 2, 3]]])
Let me know if it helps.
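As a possible alternative to the meshgrid call, np.indices builds the row and column grids directly. This is only a sketch, assuming a is a 2D array; the names rows and cols are illustrative and do not come from the answer above.

```python
import numpy as np

a = np.asarray([[7, 2, 4], [1, 5, 3]])

# np.indices returns one index grid per axis, each with the same shape as a
rows, cols = np.indices(a.shape)

# Stack row index, column index and value along a new last axis
result = np.stack((rows, cols, a), axis=-1)
print(result.shape)
```

The result has shape (2, 3, 3), with each innermost triple holding [row, column, value].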
Without numpy you could use list comprehension:
old_list = [ [7,2,4],[1,5,3] ]
new_list = [ [[i,j,old_list[i][j]] for j in range(len(old_list[i]))] for i in range(len(old_list)) ]
I'd assume that numpy is faster but the sublists are not required to have equal length in this solution.
Another approach using enumerate
In [38]: merge = list()
    ...: for i,j in enumerate(val):
    ...:     merge.append([[i, m, n] for m, n in enumerate(j)])
    ...:
In [39]: merge
Out[39]: [[[0, 0, 7], [0, 1, 2], [0, 2, 4]], [[1, 0, 1], [1, 1, 5], [1, 2, 3]]]
Hope it's useful.
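import numpy as np
a = np.array([[7,2,4], [1,5,3]])
# argwhere(a) would skip zero entries, so take the indices of an all-True mask instead
idx = np.argwhere(np.ones(a.shape, dtype=bool))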
idx = idx.reshape((*(a.shape), -1))
a = np.expand_dims(a, axis=-1)
a = np.concatenate((idx, a), axis=-1)
I look for an efficient way to get a row-wise intersection of two two-dimensional numpy ndarrays. There is only one intersection per row. For example:
[[1, 2], ∩ [[0, 1], -> [1,
[3, 4]] [0, 3]] 3]
In the best case zeros should be ignored:
[[1, 2, 0], ∩ [[0, 1, 0], -> [1,
[3, 4, 0]] [0, 3, 0]] 3]
My solution:
import numpy as np
arr1 = np.array([[1, 2],
[3, 4]])
arr2 = np.array([[0, 1],
[0, 3]])
arr3 = np.empty(len(arr1))
for i in range(len(arr1)):
    arr3[i] = np.intersect1d(arr1[i], arr2[i])
print(arr3)
# [ 1. 3.]
I have about 1 million rows, so vectorized operations are strongly preferred. You are welcome to use other Python packages.
You can use np.apply_along_axis.
I wrote a solution that pads each result to the row length of arr1.
I haven't tested its efficiency.
import numpy as np
def intersect1d_padded(x):
    # x holds one row of arr1 followed by the matching row of arr2
    x, y = np.split(x, 2)
    # np.int was removed from NumPy; the builtin int is equivalent here
    padded_intersection = -1 * np.ones(x.shape, dtype=int)
    intersection = np.intersect1d(x, y)
    padded_intersection[:intersection.shape[0]] = intersection
    return padded_intersection

def rowwise_intersection(a, b):
    return np.apply_along_axis(intersect1d_padded,
                               1, np.concatenate((a, b), axis=1))
result = rowwise_intersection(arr1,arr2)
>>> array([[ 1, -1],
[ 3, -1]])
If you know the intersection has only one element, you can use
result = rowwise_intersection(arr1,arr2)[:,0]
>>> array([1, 3])
You can also modify intersect1d_padded to return a scalar with the intersection value.
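A minimal sketch of that modification, assuming each row has at most one nonzero common value and that -1 is an acceptable marker for "no intersection". The helper name intersect1d_scalar is hypothetical; zeros are dropped to match the "ignore zeros" case in the question.

```python
import numpy as np

def intersect1d_scalar(x):
    # x is one row of arr1 followed by the matching row of arr2
    left, right = np.split(x, 2)
    common = np.intersect1d(left, right)
    common = common[common != 0]  # ignore zero padding
    # Return the single common value, or -1 when there is none (assumed sentinel)
    return common[0] if common.size else -1

arr1 = np.array([[1, 2, 0], [3, 4, 0]])
arr2 = np.array([[0, 1, 0], [0, 3, 0]])

# Because the helper returns a scalar, apply_along_axis yields a 1D result
result = np.apply_along_axis(intersect1d_scalar, 1,
                             np.concatenate((arr1, arr2), axis=1))
print(result)
```

Note that apply_along_axis still calls the helper once per row in Python, so this is a convenience rather than true vectorization.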
I don't know of an elegant way to do it in numpy, but a simple list comprehension can do the trick:
[list(set.intersection(set(_x),set(_y)).difference({0})) for _x,_y in zip(x,y)]
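If every row is known to contain exactly one nonzero common value, a broadcast comparison is one way to avoid the Python-level loop. This is a sketch under that assumption, not a general intersection; the names in_other and mask are illustrative. The intermediate boolean array has shape (rows, k, k), which should be manageable when the rows are short.

```python
import numpy as np

arr1 = np.array([[1, 2, 0], [3, 4, 0]])
arr2 = np.array([[0, 1, 0], [0, 3, 0]])

# in_other[r, i] is True when arr1[r, i] occurs anywhere in arr2[r]
in_other = (arr1[:, :, None] == arr2[:, None, :]).any(axis=2)
mask = in_other & (arr1 != 0)  # drop zero padding

# Boolean indexing flattens row by row, so the result only lines up with the
# rows when every row has exactly one match (the assumption stated above)
result = arr1[mask]
print(result)
```

Checking mask.sum(axis=1) first is a cheap way to confirm the one-match-per-row assumption before trusting the result.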
I'm looking for an efficient way to return indices for a 2d array based on values in a 1d array. I currently have a nested for loop set up that is painfully slow.
Here is some example data and what I want to get:
data2d = np.array( [ [1,2] , [1,3] ,[3,4], [1,2] , [7,9] ])
data1d = np.array([1,2,3,4,5,6,7,8,9])
I would like to return the indices where data2d is equal to data1d. My desired output would be this 2d array:
locs = np.array([[0, 1], [0, 2], [2, 3], [0, 1], [6, 8]])
The only thing I've come up with is the nested for loop:
locs = np.full((np.shape(data2d)), np.nan)
for i in range(0, 5):
    for j in range(0, 2):
        loc_val = np.where(data1d == data2d[i, j])
        loc_val = loc_val[0]
        locs[i, j] = loc_val
This would be fine for a small set of data but I have 87,600 2d grids that are each 428x614 grid points.
Use np.searchsorted:
np.searchsorted(data1d, data2d.ravel()).reshape(data2d.shape)
array([[0, 1],
[0, 2],
[2, 3],
[0, 1],
[6, 8]])
searchsorted performs binary search with the ravelled data2d. The result is then reshaped.
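Binary search only gives correct positions when data1d is sorted, which it is in the example. If the lookup array is not sorted, the sorter argument of np.searchsorted can be combined with np.argsort. The following is a sketch with a hypothetical unsorted data1d, and it assumes every value in data2d is present in data1d.

```python
import numpy as np

data1d = np.array([9, 3, 1, 7, 2, 4])  # hypothetical unsorted lookup values
data2d = np.array([[1, 2], [1, 3], [3, 4], [1, 2], [7, 9]])

# order[k] is the position in data1d of the k-th smallest value
order = np.argsort(data1d)

# Positions within the sorted view of data1d
pos = np.searchsorted(data1d, data2d.ravel(), sorter=order)

# Map back to positions in the original, unsorted data1d
locs = order[pos].reshape(data2d.shape)

# Sanity check: looking the indices up again reproduces data2d
print(np.array_equal(data1d[locs], data2d))
```

The final check is worth keeping in real code, since searchsorted returns an insertion point even for values that are missing.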
Another option is to build an index and query it in constant time. You can do this with pandas' Index API.
import pandas as pd
idx = pd.Index([1,2,3,4,5,6,7,8,9])
idx
# Index([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')   (shown as Int64Index(...) on pandas < 2.0)
idx.get_indexer(data2d.ravel()).reshape(data2d.shape)
array([[0, 1],
[0, 2],
[2, 3],
[0, 1],
[6, 8]])
This should also be fast:
import numpy as np
data2d = np.array( [ [1,2] , [1,3] ,[3,4], [1,2] , [7,9] ])
data1d = np.array([1,2,3,4,5,6,7,8,9])
idxdict = dict(zip(data1d,range(len(data1d))))
locs = data2d.copy()  # copy so that data2d itself is not overwritten
for i in range(len(locs)):
    for j in range(len(locs[i])):
        locs[i][j] = idxdict[locs[i][j]]
I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
    idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
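With a plain ndarray instead of a matrix, the same steps return ordinary 1D index arrays, which tend to be easier to work with. A sketch, assuming M is rebuilt as an array; the variable names are illustrative.

```python
import numpy as np

M = np.array([[0, 1, 0],
              [4, 2, 4],
              [3, 4, 1],
              [1, 3, 2],
              [2, 0, 3]])

# where() reports matches in row-major order: (0, 0), (0, 2), (4, 1)
rows, cols = np.where(M == 0)

# Reorder the row indices so they follow column order 0, 1, 2
rows_by_col = rows[np.argsort(cols)]

# The transpose trick gives the same answer without an explicit sort
cols_t, rows_t = (M == 0).T.nonzero()
print(rows_by_col, rows_t)
```

Both rows_by_col and rows_t hold the row of the zero in each column, in column order.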
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This gives you each index, in the format [row, column], where the condition was met.
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of where(M == 0) would look something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are, and the second matrix tells you the columns where the 0s are.
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new on what's been already suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
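Another way to avoid the per-column loop is argmax on the boolean mask, which returns the first matching row in each column. This is a sketch, assuming M is an ndarray; the any() check is included because argmax also returns 0 for a column with no match at all.

```python
import numpy as np

M = np.array([[0, 1, 0],
              [4, 2, 4],
              [3, 4, 1],
              [1, 3, 2],
              [2, 0, 3]])

hits = (M == 0)

# argmax over axis 0 gives the index of the first True in every column
first_rows = hits.argmax(axis=0)

# Columns without any match would also report 0, so track them separately
has_hit = hits.any(axis=0)
print(first_rows, has_hit)
```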
I have a 2D NumPy array and I hope to expand its size on both dimensions by copying the bottom row and right column.
For example, from 2x2:
[[0,1],
[2,3]]
to 4x4:
[[0,1,1,1],
[2,3,3,3],
[2,3,3,3],
[2,3,3,3]]
What's the best way to do it?
Thanks.
Here, the hstack and vstack functions can come in handy. For example,
In [16]: p = array(([0,1], [2,3]))
In [20]: vstack((p, p[-1], p[-1]))
Out[20]:
array([[0, 1],
[2, 3],
[2, 3],
[2, 3]])
Remembering that p.T is the transpose, you can now do something like the following:
In [16]: p = array(([0,1], [2,3]))
In [22]: p = vstack((p, p[-1], p[-1]))
In [25]: p = vstack((p.T, p.T[-1], p.T[-1])).T
In [26]: p
Out[26]:
array([[0, 1, 1, 1],
[2, 3, 3, 3],
[2, 3, 3, 3],
[2, 3, 3, 3]])
So the 2 lines of code should do it...
Make an empty array and copy whatever rows, columns you want into it.
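import numpy as np

def expand(a, new_shape):
    x, y = a.shape
    r = np.empty(new_shape, a.dtype)
    r[:x, :y] = a              # original block
    r[x:, :y] = a[-1:, :]      # repeat the bottom row
    r[:x, y:] = a[:, -1:]      # repeat the right column
    r[x:, y:] = a[-1, -1]      # fill the corner
    return r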
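NumPy's np.pad with mode="edge" appears to cover this case directly, repeating the edge values of the array. A short sketch, assuming two extra rows and two extra columns are wanted as in the 2x2 to 4x4 example:

```python
import numpy as np

p = np.array([[0, 1], [2, 3]])

# Pad widths are given per axis as (before, after): nothing is added at the
# top or left, two rows at the bottom and two columns on the right
expanded = np.pad(p, ((0, 2), (0, 2)), mode="edge")
print(expanded)
```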
Suppose I have a list that contains lists of unequal length.
a = [ [ 1, 2, 3], [2], [2, 4] ]
What is the best way to obtain a zero-padded NumPy array with a regular shape?
zero_a = [ [1, 2, 3], [2, 0, 0], [2, 4, 0] ]
I know I can use list operations like
n = max( map( len, a ) )
for x in a:
    x.extend( [0] * (n-len(x)) )
zero_a = np.array(a)
but I was wondering whether there is an easy NumPy way to do this?
Since NumPy has to know the size of an array before it is initialized, the best solution would be a NumPy constructor for this case. Sadly, as far as I know, there is none.
Probably not ideal, but a slightly faster solution is to create a NumPy array of zeros and fill it with the list values.
import numpy as np
def pad_list(lst):
    inner_max_len = max(map(len, lst))
    # Use an explicit loop: in Python 3, map() is lazy and would never run extend()
    for x in lst:
        x.extend([0] * (inner_max_len - len(x)))
    return np.array(lst)

def apply_to_zeros(lst, dtype=np.int64):
    inner_max_len = max(map(len, lst))
    result = np.zeros([len(lst), inner_max_len], dtype)
    for i, row in enumerate(lst):
        for j, val in enumerate(row):
            result[i][j] = val
    return result
Test case:
>>> pad_list([[ 1, 2, 3], [2], [2, 4]])
array([[1, 2, 3],
[2, 0, 0],
[2, 4, 0]])
>>> apply_to_zeros([[ 1, 2, 3], [2], [2, 4]])
array([[1, 2, 3],
[2, 0, 0],
[2, 4, 0]])
Performance:
>>> timeit.timeit('from __main__ import pad_list as f; f([[ 1, 2, 3], [2], [2, 4]])', number = 10000)
0.3937079906463623
>>> timeit.timeit('from __main__ import apply_to_zeros as f; f([[ 1, 2, 3], [2], [2, 4]])', number = 10000)
0.1344289779663086
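The element-by-element fill can also be replaced by a single masked assignment. This is a sketch of that idea rather than a benchmarked improvement; it assumes every sublist is non-empty and holds integers, and the names lens, mask and out are illustrative.

```python
import numpy as np

a = [[1, 2, 3], [2], [2, 4]]

# Length of every sublist
lens = np.array([len(row) for row in a])

# mask[i, j] is True where row i has a real value in column j
mask = np.arange(lens.max()) < lens[:, None]

# Start from zeros and drop the flattened values into the True positions,
# which are visited in row-major order
out = np.zeros(mask.shape, dtype=np.int64)
out[mask] = np.concatenate(a)
print(out)
```

Unlike pad_list, this leaves the input list unmodified.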
Not strictly a function from numpy, but you could do something like this
from itertools import izip, izip_longest
import numpy
a=[[1,2,3], [4], [5,6]]
res1 = numpy.array(list(izip(*izip_longest(*a, fillvalue=0))))
or, alternatively:
res2=numpy.array(list(izip_longest(*a, fillvalue=0))).transpose()
If you use Python 3, use the built-in zip and itertools.zip_longest instead of izip and izip_longest.
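For reference, a Python 3 version of the second variant might look like this. It is a sketch using the same input list, with 0 as the fill value.

```python
from itertools import zip_longest

import numpy as np

a = [[1, 2, 3], [4], [5, 6]]

# zip_longest pairs up the first, second, third... elements of every sublist,
# filling gaps with 0; transposing restores one row per sublist
res = np.array(list(zip_longest(*a, fillvalue=0))).T
print(res)
```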