Numpy: assign different values to different rows using mask

Say I have an array p of shape (m, n) and a threshold vector Ts of shape (m,). I want to replace values in p using the following rule:
for i in range(m):
    for j in range(n):
        if p[i, j] > Ts[i]:
            p[i, j] = Ts[i]
My implementation is:
newP = np.zeros_like(p)
cond = p > Ts[:, None]
newP += cond * Ts[:, None]
newP += ~cond * p
p = newP
It looks ugly, though. Is there a way to do this in a p[cond] = Ts style? Thanks :)
An example:
# m = 2, n = 5
p = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
Ts = np.array([3, 8])
expected_new_p = np.array([[1, 2, 3, 3, 3],
                           [6, 7, 8, 8, 8]])

You can simply use np.where. Where the condition holds, the respective p value is kept; otherwise the element from the broadcast Ts is used.
np.where(p < Ts[:, None], p, Ts[:, None])
array([[1, 2, 3, 3, 3],
[6, 7, 8, 8, 8]])

I think the most readable solution would be to use np.minimum() to extract element-wise minimums between p and the broadcasted array Ts[:,None]:
p = np.minimum(p, Ts[:,None])
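A quick check, using the arrays from the question, that np.minimum, np.where, and np.clip with a broadcast upper bound agree. The in-place variant with out=p is one option if allocating a new array is a concern; this is a sketch, not a benchmark.

```python
import numpy as np

p = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
Ts = np.array([3, 8])

# Ts[:, None] has shape (2, 1) and broadcasts across the columns of p.
capped_min = np.minimum(p, Ts[:, None])
capped_where = np.where(p < Ts[:, None], p, Ts[:, None])
capped_clip = np.clip(p, None, Ts[:, None])  # no lower bound, per-row upper bound

# In-place variant: write the result straight back into p.
np.minimum(p, Ts[:, None], out=p)
print(p)
```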

You can use np.tile to expand the threshold array to the shape of the input array, so that boolean indexing can be used directly. This might be useful for you:
m, n = 3, 4
x = np.random.random((m,n))
t = np.random.random((m))
mask = x > t[:,np.newaxis]
x[mask] = np.tile(t[:,np.newaxis], (1,n))[mask] #assigning values of t for True values to corresponding elements
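np.tile builds a full (m, n) copy of the thresholds. A possible alternative, sketched here on the question's arrays, is np.broadcast_to, which returns a read-only view of the same shape without copying. The view is only read from, so the read-only restriction does not get in the way.

```python
import numpy as np

p = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
Ts = np.array([3, 8])

mask = p > Ts[:, None]
# View of Ts stretched to p's shape; no data is copied.
Ts_full = np.broadcast_to(Ts[:, None], p.shape)
p[mask] = Ts_full[mask]
print(p)
```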

You can compare p and Ts by adding an extra dimension to Ts, and extract the locations where p > Ts with np.where. Then, overwrite those locations with the values from Ts:
i = np.where(p > Ts[:, None])
p[i] = Ts[i[0]]
Above, i is a tuple of arrays containing the indices along each dimension of p.
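To see why indexing Ts with i[0] works, it can help to print the index tuple. A small illustration on the question's arrays (the names rows and cols are introduced here for readability):

```python
import numpy as np

p = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
Ts = np.array([3, 8])

rows, cols = np.where(p > Ts[:, None])
print(rows)  # row index of each element above its threshold
print(cols)  # matching column index

# Each selected element receives the threshold of its own row.
p[rows, cols] = Ts[rows]
print(p)
```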

Related

Filtering a pair of arrays for a certain range of numbers using python

So let's say I have two arrays x and y.
x = [5, -6, 3, 11, -4, 2]
y = [1, 9, 2, 8, 12, -5]
I need to filter these arrays, keeping only values between 0 and 10. I was able to do this separately:
new_x=[i for i in x if i>=0 and i<=10] with output
[5, 3, 2]
new_y=[i for i in y if i>=0 and i<=10] with output
[1, 9, 2, 8]
However, these x and y arrays are meant to symbolize coordinates, which I will later have to graph. They have to remain in pairs, so if an element in the x array does not fulfill the criteria, I do not want to include the matching y element. Same goes vice versa.
By looking at the arrays, it is easy to tell that only the first and third pair - (5,1) and (3,2) respectively - fulfill my criteria. How can I filter x and y together to represent this? My dataset is much larger than this example, so I do not have the time to filter through it manually.
Any help would be greatly appreciated!

zip them together:
pairs = [(a, b) for a, b in zip(x, y) if 0 <= a <= 10 and 0 <= b <= 10]
# [(5, 1), (3, 2)]
If you then want to go back to x, y (I wouldn't recommend this, just keep them together), you could use:
x, y = zip(*pairs)
Which will return x and y as tuples:
x = (5, 3)
y = (1, 2)
In this context, the two tuples look like ordered pairs, but in general you could get something like
x = (5, 3, -6, 11, ...)

You can use the filter function
filtered_coords = list(
    filter(lambda coords: all([0 <= coord <= 10 for coord in coords]), zip(x, y))
)
If you want to split the x and y coordinates:
filtered_x, filtered_y = [[c[i] for c in filtered_coords] for i in range(2)]
print(filtered_x) # [5, 3]
print(filtered_y) # [1, 2]

Use the same index to point to both arrays at the same time.
x = [5, -6, 3, 11, -4, 2]
y = [1, 9, 2, 8, 12, -5]
new_x = []
new_y = []
for i in range(len(x)):
    if 0 <= x[i] <= 10 and 0 <= y[i] <= 10:
        new_x.append(x[i])
        new_y.append(y[i])
print(new_x, new_y)
# [5, 3] [1, 2]
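Since the question mentions a much larger dataset, a NumPy boolean mask is another option. This is a different technique from the list-based answers above: a sketch assuming x and y can be converted to arrays of equal length.

```python
import numpy as np

x = np.array([5, -6, 3, 11, -4, 2])
y = np.array([1, 9, 2, 8, 12, -5])

# True only where both coordinates of a pair lie in [0, 10].
keep = (x >= 0) & (x <= 10) & (y >= 0) & (y <= 10)

new_x, new_y = x[keep], y[keep]
print(new_x, new_y)  # [5 3] [1 2]
```

Because the same mask indexes both arrays, the pairs stay aligned.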

Numpy array insert every second element from second array

I have two 1-D arrays and want to combine them so that element 0 and every odd-indexed element come from the first array, and every remaining even-indexed element comes from the second array, in the same order.
E.g.:
a = ([0,1,3,5])
b = ([2,4,6])
c = ([0,1,2,3,4,5,6])
I tried something including modulo to identify uneven indices:
a = ([0,1,3,5])
b = ([2,4,6])
c = a
i = 0
j = 2
l = 0
for i in range(1,22):
    k = (i+j) % 2
    if k > 0:
        c = np.insert(c, i, b[l])
        l+=1
    else:
        continue
I guess there is some easier/faster slicing option, but can't figure it out.

np.insert would work well:
>>> A = np.array([1, 3, 5, 7])
>>> B = np.array([2, 4, 6, 8])
>>> np.insert(B, np.arange(len(A)), A)
array([1, 2, 3, 4, 5, 6, 7, 8])
However, if you don't rely on sorted values, try this:
>>> A = np.array([5, 3, 1])
>>> B = np.array([1, 2, 3])
>>> C = [ ]
>>> for element in zip(A, B):
...     C.extend(element)
...
>>> C
[5, 1, 3, 2, 1, 3]
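Applied to the arrays from the question, where the first element of a stays in front, the same np.insert idea could look like the following sketch. The index choice np.arange(2, len(a) + 1) is an assumption that fits len(b) == len(a) - 1.

```python
import numpy as np

a = np.array([0, 1, 3, 5])
b = np.array([2, 4, 6])

# Indices refer to positions in the original a: b[0] goes before a[2],
# b[1] before a[3], and b[2] at the end (index len(a)).
c = np.insert(a, np.arange(2, len(a) + 1), b)
print(c)  # [0 1 2 3 4 5 6]
```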

Read the documentation for range; its third argument is the step:
for i in range(0,10,2):
    print(i)
This prints 0, 2, 4, 6, 8, one number per line.

From what I understand, the first element of a always comes first and the rest are just interleaved. If that is the case, then some clever use of stacking and reshaping is probably enough.
a = np.array([0,1,3,5])
b = np.array([2,4,6])
c = np.hstack([a[:1], np.vstack([a[1:], b]).T.reshape((-1, ))])

You could try something like this
import numpy as np
A = [0,1,3,5]
B = [2,4,6]
lst = np.zeros(len(A)+len(B))
lst[0]=A[0]
lst[1::2] = A[1:]
lst[2::2] = B
Even though I don't understand why you would make it so complicated
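np.zeros defaults to float64, so the result above holds floats. A variant of the same slice-assignment idea that keeps the integer dtype, as a sketch (the helper name interleave_after_first is made up for this example):

```python
import numpy as np

def interleave_after_first(a, b):
    """Keep a[0] first, then alternate a[1:] and b (needs len(b) == len(a) - 1)."""
    a, b = np.asarray(a), np.asarray(b)
    if len(b) != len(a) - 1:
        raise ValueError("expected len(b) == len(a) - 1")
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    out[0] = a[0]
    out[1::2] = a[1:]  # odd positions come from a
    out[2::2] = b      # even positions after 0 come from b
    return out

print(interleave_after_first([0, 1, 3, 5], [2, 4, 6]))  # [0 1 2 3 4 5 6]
```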

"Multiply" 1d numpy array with a smaller one and sum the result

I want to "multiply" (for lack of better description) a numpy array X of size M with a smaller numpy array Y of size N, for every N elements in X. Then, I want to sum the resulting array (almost like a dotproduct).
I hope the example makes it more clear:
Example
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1,2,3]
Z = mymul(X, Y)
= [0*1, 1*2, 2*3, 3*1, 4*2, 5*3, 6*1, 7*2, 8*3, 9*1]
= [ 0, 2, 6, 3, 8, 15, 6, 14, 24, 9]
result = sum(Z) = 87
X and Y can be of varying lengths and Y is always smaller than X, but not necessarily divisible (e.g. M % N != 0)
I have some solutions but they are quite slow. I'm hoping there is a faster way to do this.
import numpy as np
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int)
Y = np.array([1,2,3], dtype=int)
# these work but are slow for large X, Y
# simple for-loop
t = 0
for i in range(len(X)):
    t += X[i] * Y[i % len(Y)]
print(t) #87
# extend Y M/N times so np.dot can be applied
Ytiled = np.tile(Y, int(np.ceil(len(X) / len(Y))))[:len(X)]
t = np.dot(X, Ytiled)
print(t) #87

Resize Y to same length as X and then use matrix-multiplication -
In [52]: np.dot(X, np.resize(Y,len(X)))
Out[52]: 87
Alternative to using np.resize would be with tiling. Hence, np.tile(Y,(m+n-1)//n)[:m] for m,n = len(X), len(Y), could replace np.resize(Y,len(X)) for a faster one.
Another without resizing Y to achieve memory-efficiency -
In [79]: m,n = len(X), len(Y)
In [80]: s = n*(m//n)
In [81]: X2D = X[:s].reshape(-1,n)
In [82]: X2D.dot(Y).sum() + np.dot(X[s:],Y[:m-s])
Out[82]: 87
Alternatively, we can use np.einsum('ij,j->',X2D,Y) to replace X2D.dot(Y).sum().
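The reshape-based approach can be wrapped in a function and compared with the np.resize version. A sketch, with cyclic_dot as a made-up name:

```python
import numpy as np

def cyclic_dot(X, Y):
    """Sum of X[i] * Y[i % len(Y)] without building a repeated copy of Y."""
    X, Y = np.asarray(X), np.asarray(Y)
    m, n = len(X), len(Y)
    s = n * (m // n)                      # length of the part that splits evenly
    full_blocks = X[:s].reshape(-1, n).dot(Y).sum()
    remainder = np.dot(X[s:], Y[:m - s])  # leftover tail, if any
    return full_blocks + remainder

X = np.arange(10)
Y = np.array([1, 2, 3])
print(cyclic_dot(X, Y))  # 87
```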

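You can use convolve (documentation):
np.convolve(X, Y[::-1], 'full')[len(Y)-1::len(Y)].sum()
Remember to reverse the second array. In 'full' mode, output len(Y)-1 is the first window that covers a complete block of X, so sampling every len(Y)-th output from there pairs X[i] with Y[i % len(Y)].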
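With convolution the alignment matters: each sampled output has to cover exactly one block of X. One way to get that, sketched here as an assumption rather than the only option, is 'full' mode sampled every len(Y) outputs starting at index len(Y) - 1. Test data that is not an arithmetic progression makes an off-by-one alignment visible.

```python
import numpy as np

X = np.array([1, 0, 0, 5, 7])
Y = np.array([1, 2, 3])
n = len(Y)

# Reference: pair X[i] with Y[i % n] directly.
expected = sum(int(x) * int(Y[i % n]) for i, x in enumerate(X))

# Full convolution with reversed Y; output n-1 is the first complete block,
# and every n-th output after it is the next block (the last may be partial).
via_convolve = np.convolve(X, Y[::-1], "full")[n - 1::n].sum()

print(expected, via_convolve)  # 20 20
```

This computes every sliding window and discards most of them, so the dot-product versions are likely cheaper for large inputs.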

Given distances and values array, return sorted filtered values in numpy

I am not sure what the title of this question should be. But let's say we have 2 arrays, values and distances.
values = np.array([[-1,-1,-1],
[1, 2, 0],
[-1,-1,-1]])
distances = np.array([[1,2,3],
[6,5,4],
[7,8,9]])
I would like to get the values that are non-negative, ordered by their corresponding distances from the distances array.
So with the example above, the non-negative values are [1,2,0] and their distances are [6,5,4]. Thus, sorting by the corresponding distance, I would like to have [0,2,1] as the answer.
My code is below. It works, but I would like a solution that uses only numpy. I'm sure that would be more efficient than this:
import numpy as np
import heapq
def get_sorted_values(seek_val, values, distances):
    r, c = np.where(values >= seek_val)
    di = distances[r, c]
    vals = values[r, c]
    print("di", di)
    print("vals", vals)
    if len(di) >= 1:
        heap = []
        for d, v in zip(di,vals):
            heapq.heappush(heap, (d,v))
        lists = []
        while heap:
            d, v = heapq.heappop(heap)
            lists.append(v)
        return lists
    else:
        ## NOTHING FOUND
        return None
Input:
seek_val = 0
values = np.array([[-1,-1,-1],
[1,2,0],
[-1,-1,-1]])
distances = np.array([[1,2,3],
[6,5,4],
[7,8,9]])
print("Ans:",get_sorted_values(seek_val, values, distances))
Output:
di [6 5 4]
vals [1 2 0]
Ans: [0, 2, 1]

"one liner":
values[np.where(values >= 0)][np.argsort(distances[np.where(values >= 0)])]
Out[981]: array([0, 2, 1])
Repeating np.where(values >= 0) is wasteful; if values is large, store the result in a variable:
v_indx = np.where(values >= 0)
values[v_indx][np.argsort(distances[v_indx])]

Try np.argsort
import numpy as np
values = np.array([[-1,-1,-1],
[ 1, 2, 0],
[-1,-1,-1]])
distances = np.array([[1, 2, 3],
[6, 5, 4],
[7, 8, 9]])
print(values[values >= 0])
# [1 2 0]
print(distances[values >= 0])
# [6 5 4]
print('Ans:', values[values >= 0][np.argsort(distances[values >= 0])])
# Ans: [0 2 1]
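Either answer can be wrapped into a replacement for get_sorted_values. A sketch that computes the mask once and uses a stable sort so that equal distances keep their original order (the function name and the choice to return an empty array instead of None are assumptions of this example):

```python
import numpy as np

def sorted_values_by_distance(seek_val, values, distances):
    """Values >= seek_val, ordered by their matching distances."""
    mask = values >= seek_val
    order = np.argsort(distances[mask], kind="stable")
    return values[mask][order]

values = np.array([[-1, -1, -1],
                   [ 1,  2,  0],
                   [-1, -1, -1]])
distances = np.array([[1, 2, 3],
                      [6, 5, 4],
                      [7, 8, 9]])
print(sorted_values_by_distance(0, values, distances))  # [0 2 1]
```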

Summing and removing repeated elements of Numpy Arrays

I have 4 1D Numpy arrays of equal length.
The first three act as an ID, uniquely identifying the 4th array.
The ID arrays contain repeated combinations, for which I need to sum the 4th array, and remove the repeating element from all 4 arrays.
x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])
In this case I need:
x = [1, 2, 4]
y = [1, 1, 4]
z = [1, 2, 2]
data = [6, 7, 3]
The arrays are rather long so loops really won't work. I'm sure there is a fairly simple way to do this, but for the life of me I can't figure it out.

To get started, we can stack the ID vectors into a matrix such that each ID is a row of three values:
XYZ = np.vstack((x,y,z)).T
Now, we just need to find the indices of repeated rows. Before NumPy 1.13, np.unique did not operate on rows (newer versions accept axis=0), so we need to do some tricks:
order = np.lexsort(XYZ.T)
diff = np.diff(XYZ[order], axis=0)
uniq_mask = np.append(True, (diff != 0).any(axis=1))
This part is borrowed from the np.unique source code, and finds the unique indices as well as the "inverse index" mapping:
uniq_inds = order[uniq_mask]
inv_idx = np.zeros_like(order)
inv_idx[order] = np.cumsum(uniq_mask) - 1
Finally, sum over the unique indices:
data = np.bincount(inv_idx, weights=data)
x,y,z = XYZ[uniq_inds].T
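In NumPy 1.13 and later, np.unique accepts an axis argument, which shortens the steps above. A sketch on the question's data; the np.ravel call is a precaution because the shape of the inverse array has varied between NumPy releases.

```python
import numpy as np

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

XYZ = np.column_stack((x, y, z))
# Unique rows (sorted lexicographically) and, for every input row,
# the index of its unique row.
uniq_rows, inverse = np.unique(XYZ, axis=0, return_inverse=True)
sums = np.bincount(np.ravel(inverse), weights=data)

x_out, y_out, z_out = uniq_rows.T
print(x_out, y_out, z_out, sums)
```

Note that np.bincount with weights returns floats, so the sums come back as float64 even for integer data.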

You can use unique and sum as reptilicus suggested to do the following
import numpy as np
x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])
# N = len(x)
# ids = x + y*N + z*(N**2)
# In Python 3 the built-in zip is lazy, so itertools.izip is not needed
ids = np.array([hash((a, b, c)) for a, b, c in zip(x, y, z)]) # creates flat ids
_, idx, idx_rep = np.unique(ids, return_index=True, return_inverse=True)
x_out = x[idx]
y_out = y[idx]
z_out = z[idx]
# data_out = np.array([np.sum(data[idx_rep == i]) for i in idx])
data_out = np.bincount(idx_rep, weights=data)
print(x_out)
print(y_out)
print(z_out)
print(data_out)
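Python's hash can in principle collide for different tuples. If the ID arrays hold non-negative integers, np.ravel_multi_index is one collision-free way to build the flat ids, in the spirit of the commented-out x + y*N + z*(N**2) line. A sketch under that assumption:

```python
import numpy as np

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

# Treat (x, y, z) as coordinates in a grid just large enough to hold them;
# each distinct triple then maps to a distinct flat index.
dims = (int(x.max()) + 1, int(y.max()) + 1, int(z.max()) + 1)
ids = np.ravel_multi_index((x, y, z), dims)

_, idx, inverse = np.unique(ids, return_index=True, return_inverse=True)
data_out = np.bincount(inverse, weights=data)
print(x[idx], y[idx], z[idx], data_out)
```

For IDs with a very large value range the grid size can overflow the index type, in which case the row-based approaches are the safer choice.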
