Compute sum of unique values in NumPy array by indices

I need to compute a sum efficiently and elegantly.
I have two arrays:
dst = [[1 2 3 4 5]
[2 3 4 5 6]
[1 1 2 2 3]
[7 8 9 9 3]]
and
ids = [[1 1 2 1 3]
[2 2 1 1 3]
[3 3 2 1 1]
[2 2 1 3 3]
[1 2 3 2 1]]
For each row in dst I need to compute the sum of the elements belonging to each unique value in the matching row of ids, and return the number of unique values together with the largest of those sums.
Example for the first row: the first row of ids contains 3 unique numbers, [1,2,3].
The indices are [0,1,3] for 1, [2] for 2, and [4] for 3.
For 1: sum is sum of dst[0][0] + dst[0][1] + dst[0][3] = 1 + 2 + 4 = 7.
For 2: sum is dst[0][2] = 3
For 3: sum is dst[0][4] = 5.
max(sum) = 7
number = 3
Total: [3,7] - for first row
I have no idea how to do this efficiently and simply with NumPy functions. I did it in plain Python, but that solution is too slow.

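You can get the unique ids of each row like this (np.unique returns the unique values, not their positions):
uniques = [np.unique(row) for row in ids]
and then, for each row, sum the dst elements selected by each id, keeping the number of ids and the largest sum:
result = [[len(u), max(d[r == v].sum() for v in u)] for d, r, u in zip(dst, ids, uniques)]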
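The per-id sums described in the question can also be computed with np.bincount and its weights argument. The following is only a sketch, assuming dst and ids are NumPy arrays and that the ids are small non-negative integers; the helper name count_and_max_sum is made up for illustration. Because ids has one more row than dst in the question, zip stops at the shorter array.

```python
import numpy as np

dst = np.array([[1, 2, 3, 4, 5],
                [2, 3, 4, 5, 6],
                [1, 1, 2, 2, 3],
                [7, 8, 9, 9, 3]])
ids = np.array([[1, 1, 2, 1, 3],
                [2, 2, 1, 1, 3],
                [3, 3, 2, 1, 1],
                [2, 2, 1, 3, 3],
                [1, 2, 3, 2, 1]])

def count_and_max_sum(dst_row, ids_row):
    """Return [number of unique ids, largest per-id sum] for one row."""
    # sums[k] is the sum of dst_row where ids_row == k
    sums = np.bincount(ids_row, weights=dst_row)
    # counts[k] > 0 marks the ids that actually occur in this row
    present = np.bincount(ids_row) > 0
    return [int(present.sum()), int(sums[present].max())]

# zip truncates to the shorter array (4 rows of dst)
result = [count_and_max_sum(d, r) for d, r in zip(dst, ids)]
print(result)
```

For the first row this gives [3, 7], matching the worked example in the question. The Python-level loop now runs once per row rather than once per element.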

Related

An efficient way to concatenate rows of a 2-dim array according to a given list of pairs of indexes

Suppose I have a 2 dimensional array with a very large number of rows, and a list of pairs of indexes of that array. I want to create a new 2 dim array, whose rows are concatenations of the rows of the original array, made according to the list of pairs of indexes. For example:
a =
1 2 3
4 5 6
7 8 9
0 0 0
indexes = [[0,0], [0,1], [2,3]]
the returned array should be:
1 2 3 1 2 3
1 2 3 4 5 6
7 8 9 0 0 0
Obviously I can iterate the list of indexes, but my question is whether there is a more efficient way of doing this. I should say that the list of indexes is also very large.
First convert indexes to a NumPy array:
ind = np.array(indexes)
Then generate your result as:
result = np.concatenate([a[ind[:,0]], a[ind[:,1]]], axis=1)
The result is:
array([[1, 2, 3, 1, 2, 3],
[1, 2, 3, 4, 5, 6],
[7, 8, 9, 0, 0, 0]])
Another possible formula (with the same result):
result = np.concatenate([ a[ind[:,i]] for i in range(ind.shape[1]) ], axis=1)
You can do this in one line using NumPy as:
a = np.arange(12).reshape(4, 3)
print(a)
b = [[0, 0], [1, 1], [2, 3]]
b = np.array(b)
print(b)
c = a[b.reshape(-1)].reshape(-1, a.shape[1]*b.shape[1])
print(c)
'''
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
[[0 0]
[1 1]
[2 3]]
[[ 0 1 2 0 1 2]
[ 3 4 5 3 4 5]
[ 6 7 8 9 10 11]]
'''
You can use horizontal stacking np.hstack:
c = np.array(indexes)
np.hstack((a[c[:,0]],a[c[:,1]]))
output:
[[1 2 3 1 2 3]
[1 2 3 4 5 6]
[7 8 9 0 0 0]]
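As a quick check, the approaches above can be compared on the example from the question. This is a sketch; the variable names via_concatenate, via_hstack and via_reshape are made up here. The last variant indexes a with the whole 2D index array, which yields one (pairs, 2, columns) array that is then flattened per pair.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [0, 0, 0]])
ind = np.array([[0, 0], [0, 1], [2, 3]])

# Concatenate the rows picked by the first and second index columns
via_concatenate = np.concatenate([a[ind[:, 0]], a[ind[:, 1]]], axis=1)
via_hstack = np.hstack((a[ind[:, 0]], a[ind[:, 1]]))

# a[ind] has shape (3, 2, 3); merging the last two axes joins each pair of rows
via_reshape = a[ind].reshape(len(ind), -1)

print(via_reshape)
print(np.array_equal(via_concatenate, via_hstack),
      np.array_equal(via_concatenate, via_reshape))
```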

How to find numpy array shape in a larger array?

big_array = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
print(big_array)
[[0 1 0 0 1 0 0 1]
[0 1 0 0 0 0 0 0]
[0 1 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0]
[1 0 0 0 1 0 0 0]]
Is there a way to iterate over this numpy array and for each 2x2 cluster of 0s, set all values within that cluster = 5? This is what the output would look like.
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 0]
[0 1 5 5 1 5 5 0]
[0 0 5 5 1 5 5 0]
[1 0 5 5 1 5 5 0]]
My thought is to use advanced indexing to set each 2x2 block to 5, but I think it would be really slow to simply iterate like this:
1) check if array[x][y] is 0
2) check if adjacent array elements are 0
3) if all elements are 0, set all those values to 5.
big_array = [1, 7, 0, 0, 3]
i = 0
p = 0
while i <= len(big_array) - 1 and p <= len(big_array) - 2:
    if big_array[i] == big_array[p + 1]:
        big_array[i] = 5
        big_array[p + 1] = 5
        print(big_array)
    i = i + 1
    p = p + 1
Output:
[1, 7, 5, 5, 3]
This is just an example, not complete, correct code.
Here's a solution by viewing the array as blocks.
First you need to define this function rolling_window from here https://gist.github.com/seberg/3866040/revisions
Then break the array big, your starting array, into 2x2 blocks using this function.
Also generate an array which has indices of every element in big and break it similarly into 2x2 blocks.
Then generate a boolean mask where the 2x2 blocks of big are all zero, and use the index array to get those elements.
blks = rolling_window(big,window=(2,2)) # 2x2 blocks of original array
inds = np.indices(big.shape).transpose(1,2,0) # array of indices into big
blkinds = rolling_window(inds,window=(2,2,0)).transpose(0,1,4,3,2) # 2x2 blocks of indices into big
mask = blks == np.zeros((2,2)) # generate a mask of every 2x2 block which is all zero
mask = mask.reshape(*mask.shape[:-2],-1).all(-1) # still generating the mask
# now blks[mask] is every block which is zero..
# but you actually want the original indices in the array 'big' instead
inds = blkinds[mask].reshape(-1,2).T # indices into big where elements need replacing
big[inds[0],inds[1]] = 5 #reassign
You need to test this: I did not. But the idea is to break the array into blocks, and an array of indices into blocks, then develop a boolean condition on the blocks, use those to get the indices, and then reassign.
An alternative would be to iterate through blkinds as defined above, then test the 2x2 block obtained from big at each blkinds element and reassign if necessary.
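The block idea can also be sketched without the external rolling_window helper, using numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later). This is a different technique from the code above, not a tested version of it. It marks every cell covered by any all-zero 2x2 window, including overlapping windows, so the result can differ from the output shown in the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

big = np.array([[0, 1, 0, 0, 1, 0, 0, 1],
                [0, 1, 0, 0, 0, 0, 0, 0],
                [0, 1, 0, 0, 1, 0, 0, 0],
                [0, 0, 0, 0, 1, 0, 0, 0],
                [1, 0, 0, 0, 1, 0, 0, 0]])

# windows[r, c] is the 2x2 block whose top-left corner is (r, c)
windows = sliding_window_view(big, (2, 2))
all_zero = ~windows.any(axis=(2, 3))          # shape (rows - 1, cols - 1)

# A cell must be replaced if any of the (up to four) windows covering it is all zero
mask = np.zeros(big.shape, dtype=bool)
mask[:-1, :-1] |= all_zero   # cell is the window's top-left corner
mask[:-1, 1:] |= all_zero    # top-right corner
mask[1:, :-1] |= all_zero    # bottom-left corner
mask[1:, 1:] |= all_zero     # bottom-right corner

result = np.where(mask, 5, big)
print(result)
```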
This is my attempt to help you solve your problem. My solution may be subject to fair criticism.
import numpy as np
from itertools import product
m = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
h = 2
w = 2
rr, cc = tuple(d + 1 - q for d, q in zip(m.shape, (h, w)))
slices = [(slice(r, r + h), slice(c, c + w))
          for r, c in product(range(rr), range(cc))
          if not m[r:r + h, c:c + w].any()]
for s in slices:
    m[s] = 5
print(m)
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 5]
[0 1 5 5 1 5 5 5]
[0 5 5 5 1 5 5 5]
[1 5 5 5 1 5 5 5]]

Python numpy: reshape list into repeating 2D array

I'm new to python and I have a question about numpy.reshape. I currently have 2 lists of values like this:
x = [0,1,2,3]
y = [4,5,6,7]
And I want them to be in separate 2D arrays, where each item is repeated for the length of the original lists, like this:
xx = [[0,0,0,0]
[1,1,1,1]
[2,2,2,2]
[3,3,3,3]]
yy = [[4,5,6,7]
[4,5,6,7]
[4,5,6,7]
[4,5,6,7]]
Is there a way to do this with numpy.reshape, or is there a better method I could use? I would very much appreciate a detailed explanation. Thanks!
numpy.meshgrid will do this for you.
N.B. From your requested output, it looks like you want ij indexing, not the default xy
from numpy import meshgrid
x = [0,1,2,3]
y = [4,5,6,7]
xx,yy=meshgrid(x,y,indexing='ij')
print(xx)
[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
print(yy)
[[4 5 6 7]
[4 5 6 7]
[4 5 6 7]
[4 5 6 7]]
For reference, here's xy indexing
xx,yy=meshgrid(x,y,indexing='xy')
print(xx)
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
print(yy)
[[4 4 4 4]
[5 5 5 5]
[6 6 6 6]
[7 7 7 7]]
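Since the question asks about reshape specifically, here is a sketch of how the same two arrays could be built by hand with np.repeat, reshape and np.tile, and compared against meshgrid with ij indexing. The names xx_manual and yy_manual are made up for illustration.

```python
import numpy as np

x = [0, 1, 2, 3]
y = [4, 5, 6, 7]

# Repeat each x value len(y) times, then fold the flat result into rows
xx_manual = np.repeat(x, len(y)).reshape(len(x), len(y))
# Stack len(x) copies of y as rows
yy_manual = np.tile(y, (len(x), 1))

xx, yy = np.meshgrid(x, y, indexing='ij')
print(np.array_equal(xx, xx_manual), np.array_equal(yy, yy_manual))
```

reshape alone only rearranges existing elements, so the values have to be duplicated first; meshgrid does both steps in one call.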

How to filter a numpy array based on indices? [duplicate]

This question already has answers here:
How to apply a disc shaped mask to a NumPy array?
(7 answers)
Closed 8 years ago.
I have a square NumPy array and I would like to extract the values from an annulus region around the central point of the array. I would like to set the radii of the annulus based on the distance of the points from the center. I retrieved the array indices by using numpy.indices but could not manage to find an efficient way to construct the filter. I would appreciate any comments or suggestions.
indices = numpy.indices((5, 5))
print(indices)
[[[0 0 0 0 0]
[1 1 1 1 1]
[2 2 2 2 2]
[3 3 3 3 3]
[4 4 4 4 4]]
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]]
Now I want to extract the values of those points whose indices are at a distance of say, 1 from the central point i.e. (2,2).
pt = (2, 2)
distance = 1
mask = (indices[0] - pt[0]) ** 2 + (indices[1] - pt[1]) ** 2 <= distance ** 2
result = my_array[mask]
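The mask above selects a filled disc (every point within the given distance). For an annulus, a lower bound on the distance can be added in the same way. The sketch below assumes a 5x5 example array and made-up radii r_inner and r_outer; neither comes from the original post.

```python
import numpy as np

my_array = np.arange(25).reshape(5, 5)   # example data, assumed for illustration
rows, cols = np.indices(my_array.shape)

center = (2, 2)
r_inner, r_outer = 1, 2                  # hypothetical annulus radii

# Squared distance of every cell from the center (avoids a square root)
dist_sq = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
annulus = (dist_sq >= r_inner ** 2) & (dist_sq <= r_outer ** 2)

values = my_array[annulus]
print(annulus.astype(int))
print(values)
```

Comparing squared distances keeps the arithmetic in integers, and the two comparisons are combined with & because they are boolean arrays.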

Performance of NumPy for algorithms concerning individual elements of an array

I'm interested in the performance of NumPy, when it comes to algorithms that check whether a condition is True for an element and its affiliations (e.g. neighbouring elements) and assign a value according to the condition.
An example might be (I am making this up now):
I generate a 2d array of 1's and 0's, randomly.
Then I check whether the first element of the array is the same as its neighbors.
If the similar ones are the majority, I switch (0 -> 1 or 1 -> 0) that particular element.
And I proceed to the next element.
I guess that this kind of element-wise condition and element-wise operation is pretty slow with NumPy. Is there a way to improve the performance?
For example, would creating the array with dtype bool and adjusting the code help?
Thanks in advance.
Maybe http://www.scipy.org/Cookbook/GameOfLifeStrides helps you.
It looks like you are doing some kind of image processing, so you can try scipy.ndimage.
from scipy.ndimage import convolve
import numpy as np
np.random.seed(0)
x = np.random.randint(0,2,(5,5))
print(x)
w = np.ones((3,3), dtype=np.int8)
w[1,1] = 0
y = convolve(x, w, mode="constant")
print(y)
the outputs are:
[[0 1 1 0 1]
[1 1 1 1 1]
[1 0 0 1 0]
[0 0 0 0 1]
[0 1 1 0 0]]
[[3 4 4 5 2]
[3 5 5 5 3]
[2 4 4 4 4]
[2 3 3 3 1]
[1 1 1 2 1]]
y is the sum of the neighbors of every element. Run the same convolution on an array of all ones to get the number of neighbors of every element:
>>> n = convolve(np.ones((5,5),np.int8), w, mode="constant")
>>> n
[[3 5 5 5 3]
[5 8 8 8 5]
[5 8 8 8 5]
[5 8 8 8 5]
[3 5 5 5 3]]
then you can do element-wise operations with x, y, n, and get your result.
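One possible way to combine x, y and n for the rule in the question is sketched below. To keep the example self-contained it computes the neighbor sums with a small NumPy-only helper (neighbour_sum, a made-up name) instead of scipy.ndimage.convolve. Note the assumption that all elements are updated at once from the original array, which differs from the sequential element-by-element update the question describes.

```python
import numpy as np

def neighbour_sum(arr):
    """Sum of the up-to-8 neighbours of each cell, treating the outside as 0."""
    padded = np.pad(arr, 1, mode="constant")
    total = np.zeros_like(arr)
    for dr in range(3):
        for dc in range(3):
            if (dr, dc) != (1, 1):       # skip the cell itself
                total += padded[dr:dr + arr.shape[0], dc:dc + arr.shape[1]]
    return total

x = np.array([[0, 1, 1, 0, 1],
              [1, 1, 1, 1, 1],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 1, 1, 0, 0]])

y = neighbour_sum(x)                 # number of neighbours equal to 1
n = neighbour_sum(np.ones_like(x))   # number of neighbours of each cell

# Neighbours with the same value: y where x is 1, n - y where x is 0
same = np.where(x == 1, y, n - y)
flip = 2 * same > n                  # strict majority of neighbours is similar
new_x = np.where(flip, 1 - x, x)
print(new_x)
```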
