Summing and removing repeated elements of Numpy Arrays - python

I have four 1D NumPy arrays of equal length.
The first three act as an ID, uniquely identifying each entry of the fourth array.
The ID arrays contain repeated combinations; for those I need to sum the corresponding entries of the fourth array and remove the repeated elements from all four arrays.
x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])
In this case I need:
x = [1, 2, 4]
y = [1, 1, 4]
z = [1, 2, 2]
data = [6, 7, 3]
The arrays are rather long so loops really won't work. I'm sure there is a fairly simple way to do this, but for the life of me I can't figure it out.

To get started, we can stack the ID vectors into a matrix such that each ID is a row of three values:
XYZ = np.vstack((x,y,z)).T
Now, we just need to find the indices of repeated rows. Unfortunately, np.unique doesn't operate on rows, so we need to do some tricks:
order = np.lexsort(XYZ.T)
diff = np.diff(XYZ[order], axis=0)
uniq_mask = np.append(True, (diff != 0).any(axis=1))
This part is borrowed from the np.unique source code, and finds the unique indices as well as the "inverse index" mapping:
uniq_inds = order[uniq_mask]
inv_idx = np.zeros_like(order)
inv_idx[order] = np.cumsum(uniq_mask) - 1
Finally, sum over the unique indices:
data = np.bincount(inv_idx, weights=data)
x,y,z = XYZ[uniq_inds].T
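For what it's worth, newer NumPy releases (1.13 and later) let np.unique operate on rows directly via axis=0, which collapses the recipe above into a few lines. A sketch, assuming such a version is available:
import numpy as np

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

XYZ = np.vstack((x, y, z)).T
# Unique ID rows plus the inverse map from each original row to its unique row
uniq, inv = np.unique(XYZ, axis=0, return_inverse=True)
# Sum the data values that share the same (x, y, z) ID
data = np.bincount(inv.ravel(), weights=data)  # ravel() guards against versions that return a multi-dim inverse
x, y, z = uniq.T
# x -> [1 2 4], y -> [1 1 4], z -> [1 2 2], data -> [6. 7. 3.]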

You can use unique and sum, as reptilicus suggested, to do the following:
import numpy as np
x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])
# N = len(x)
# ids = x + y*N + z*(N**2)
ids = np.array([hash((a, b, c)) for a, b, c in zip(x, y, z)])  # creates a flat id for each (x, y, z) triple
_, idx, idx_rep = np.unique(ids, return_index=True, return_inverse=True)
x_out = x[idx]
y_out = y[idx]
z_out = z[idx]
# data_out = np.array([np.sum(data[idx_rep == i]) for i in idx])
data_out = np.bincount(idx_rep, weights=data)
print(x_out)
print(y_out)
print(z_out)
print(data_out)

Related

Multiply two lists but multiply each number in the first list by all numbers in the second list in python

I have two lists, and I want to multiply each number in the first list by all numbers in the second list:
[1,2] x [1,2,3]
I want my result to look like this: [(1x1)+(1x2)+(1x3), (2x1)+(2x2)+(2x3)]
numpy
a = np.array([1,2])
b = np.array([1,2,3])
c = (a[:,None]*b).sum(1)
output: array([ 6, 12])
python
a = [1,2]
b = [1,2,3]
c = [sum(x*y for y in b) for x in a]
output: [6, 12]
old answer (product per element)
numpy
a = np.array([1,2])
b = np.array([1,2,3])
c = (a[:,None]*b).ravel()
output: array([1, 2, 3, 2, 4, 6])
python
a = [1,2]
b = [1,2,3]
c = [x*y for x in a for y in b]
## OR
from itertools import product
c = [x*y for x,y in product(a,b)]
output: [1, 2, 3, 2, 4, 6]
def multiplyLists(list1: list, list2: list) -> list:
    toReturn = []
    for i in list1:
        temp_sum = 0
        for j in list2:
            temp_sum += i * j
        toReturn.append(temp_sum)
    return toReturn
Another way using numpy (that you can extend to many other functions between two lists):
a = [1,2]
b = [1,2,3]
np.multiply.outer(a,b).ravel()
#array([1, 2, 3, 2, 4, 6])
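The same outer-then-ravel pattern works with other NumPy ufuncs, which is what the remark about extending to other functions refers to. A small illustration, not part of the original answer:
np.add.outer(a, b).ravel()
#array([2, 3, 4, 3, 4, 5])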
As the comments point out, a pure Python solution will be very different from a numpy solution.
Python
Here it would be straightforward to use a nested loop or list comprehension:
list1 = [1, 2]
list2 = [1, 2, 3]
lst_output = []
for i in list1:
    for j in list2:
        lst_output.append(i*j)
#equivalent alternative
lst_output = [i*j for i in list1 for j in list2]
Numpy
There are many ways to go about it with numpy as well. Here's one example:
arr1 = np.array([1, 2])
arr2 = np.array([1, 2, 3])
xx, yy = np.meshgrid(arr1, arr2)
arr_output = xx * yy
# optionally (to get a 1d array)
arr_output_flat = arr_output.flatten()
Edit: Reading your question again, I noticed you state you actually want the output to be 2 sums (of 3 products each). I suggest you phrase more precisely what you want and what you've tried. But to provide that, here's what you can do with the lists or arrays from above:
# Pure Python
lst_output = [sum(i*j for j in list2) for i in list1]
# Numpy
xx, yy = np.meshgrid(arr1, arr2)
arr_output = np.sum(xx * yy, axis=0)
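As a side note (a small simplification, not in the original answer): because each output entry is one element of the first list times the sum of the second, the whole computation reduces to a single scaling:
# sum_j (list1[i] * list2[j]) == list1[i] * sum(list2)
lst_output = [i * sum(list2) for i in list1]  # [6, 12]
arr_output = arr1 * arr2.sum()                # array([ 6, 12])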

How do I reduce the use of for loops using numpy?

Basically, I have three arrays that I scale with values from 0 to 2, expanding the number of rows to the number of scale factors (the scale factors are the same for each array). From there, I want to combine every combination of scaled rows from all three arrays. So I have three arrays
A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
C = np.array([1, 2, 3])
and I'm trying to reduce the operation given below
search_range = np.linspace(0, 2, 11)
results = np.array([[0, 0, 0]])
for i in search_range:
    for j in search_range:
        for k in search_range:
            sm = i*A + j*B + k*C
            results = np.append(results, [sm], axis=0)
What I tried doing:
A = np.array([[1, 2, 3]])
B = np.array([[1, 2, 3]])
C = np.array([[1, 2, 3]])
n = 11
scale = np.linspace(0, 2, n).reshape(-1, 1)
A = np.repeat(A, n, axis=0) * scale
B = np.repeat(B, n, axis=0) * scale
C = np.repeat(C, n, axis=0) * scale
results = np.array([[0, 0, 0]])
for i in range(n):
    A_i = A[i]
    for j in range(n):
        B_j = B[j]
        C_k = C
        sm = A_i + B_j + C_k
        results = np.append(results, sm, axis=0)
which only removes the last for loop. How do I reduce the other for loops?
You can get the same result like this:
search_range = np.linspace(0, 2, 11)
search_range = np.array(np.meshgrid(search_range, search_range, search_range))
search_range = search_range.T.reshape(-1, 3)
sm = search_range[:, 0, None]*A + search_range[:, 1, None]*B + search_range[:, 2, None]*C
results = np.concatenate(([[0, 0, 0]], sm))
Instead of using three nested loops to get every combination of elements of "search_range", I used the meshgrid function to turn "search_range" into a 2D array of every possible combination, so the three columns of the new "search_range" play the roles of i, j and k.
And finally, as suggested by @Mercury, you can index into the new "search_range" to generate the result. For example, search_range[:, 1, None] is an array of shape (1331, 1) containing, as singleton rows, the element at index 1 of every combination in the new "search_range". The concatenate is only there because you wanted the results array to start with the default row [[0, 0, 0]], so I appended sm after it; otherwise the sm array alone contains the answer.
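For completeness, here is a sketch of an equivalent formulation that skips meshgrid entirely and uses broadcasting over the same 11x11x11 grid of coefficients (the row order may differ from the meshgrid version):
import numpy as np

A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
C = np.array([1, 2, 3])
s = np.linspace(0, 2, 11)

# Broadcast the three coefficient axes against each other: shape (11, 11, 11, 3)
sm = (s[:, None, None, None] * A
      + s[None, :, None, None] * B
      + s[None, None, :, None] * C).reshape(-1, 3)  # (1331, 3)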

Numpy: assign different values to different rows using mask

Say I have an array p of shape (m, n) and a thresholds vector Ts of shape (m,). I want to replace values in p using the following rule:
for i in range(m):
    for j in range(n):
        if p[i, j] > Ts[i]:
            p[i, j] = Ts[i]
My implementation is:
newP = np.zeros_like(p)
cond = p > Ts[:, None]
newP += cond * Ts[:, None]
newP += ~cond * p
p = newP
It definitely looks ugly. I want to know if there's a way to do this in a p[cond] = Ts style. Thanks :)
An example:
# m = 2, n = 5
p = np.array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10]])
Ts = np.array([3, 8])
expected_new_p = \
np.array([[1, 2, 3, 3, 3],
[6, 7, 8, 8, 8]])
You can simply use np.where: where the condition p < Ts holds, keep the respective p value; otherwise take the element from the broadcast Ts.
np.where(p < Ts[:, None], p, Ts[:, None])
array([[1, 2, 3, 3, 3],
[6, 7, 8, 8, 8]])
I think the most readable solution would be to use np.minimum() to extract element-wise minimums between p and the broadcasted array Ts[:,None]:
p = np.minimum(p, Ts[:,None])
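The same idea can also be written with np.clip, using the broadcast thresholds as an upper bound only (a closely related variant rather than a different approach):
p = np.clip(p, None, Ts[:, None])  # cap each row of p at its own threshold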
You can use np.tile to expand the threshold array to the shape of the input array, so that you can use boolean indexing directly. This might be useful for you:
m, n = 3, 4
x = np.random.random((m,n))
t = np.random.random((m))
mask = x > t[:,np.newaxis]
x[mask] = np.tile(t[:,np.newaxis], (1,n))[mask]  # assign the corresponding value of t wherever the mask is True
You can compare p and Ts by adding an extra dimension to Ts, extracting the locations where p > Ts with np.where. Then overwrite those locations with the values from Ts:
i = np.where(p > Ts[:, None])
p[i] = Ts[i[0]]
Above, i is a tuple of arrays containing the indices along each dimension of p.

Numpy array insert every second element from second array

I have two arrays and now want to combine them so that index 0 and every odd index take elements of the first array, and every other even index takes elements of the second array, keeping the original order.
E.g.:
a = ([0,1,3,5])
b = ([2,4,6])
c = ([0,1,2,3,4,5,6])
I tried something including modulo to identify uneven indices:
a = ([0,1,3,5])
b = ([2,4,6])
c = a
i = 0
j = 2
l = 0
for i in range(1,22):
    k = (i+j) % 2
    if k > 0:
        c = np.insert(c, i, b[l])
        l+=1
    else:
        continue
I guess there is some easier/faster slicing option, but can't figure it out.
np.insert would work well:
>>> A = np.array([1, 3, 5, 7])
>>> B = np.array([2, 4, 6, 8])
>>> np.insert(B, np.arange(len(A)), A)
array([1, 2, 3, 4, 5, 6, 7, 8])
However, if you don't rely on sorted values, try this:
>>> A = np.array([5, 3, 1])
>>> B = np.array([1, 2, 3])
>>> C = [ ]
>>> for element in zip(A, B):
...     C.extend(element)
>>> C
[5, 1, 3, 2, 1, 3]
Read the documentation of range:
for i in range(0, 10, 2):
    print(i)
will print 0, 2, 4, 6, 8 (one value per line).
From what I understand, the first element in a is always first; the rest are just interleaved. If that is the case, then some clever use of stacking and reshaping is probably enough.
a = np.array([0,1,3,5])
b = np.array([2,4,6])
c = np.hstack([a[:1], np.vstack([a[1:], b]).T.reshape((-1, ))])
You could try something like this
import numpy as np
A = [0,1,3,5]
B = [2,4,6]
lst = np.zeros(len(A)+len(B))
lst[0]=A[0]
lst[1::2] = A[1:]
lst[2::2] = B
Even though I don't understand why you would make it so complicated

Python - Convert the array in a tuple to just a normal array

I have a signal where I want to find the average height of the values. This is done by finding the zero crossings and calculating the max and min between each zero crossing, then averaging these values.
My problem occurs when I want to use np.where() to find where the signal is crossing zero. When I use np.where() I get the result in a tuple, but I want it in an array where I can count the amount of times zero is crossed.
I am new to Python and coming from Matlab it is a bit confusing with all the different classes. As you can see, I get an error because nu = len(zero_u) gives 1 as a result, because the whole array is written in a tuple as one element.
Any ideas how to go around this?
The code looks like this:
import numpy as np
def averageheight(f):
    rms = np.std(f)
    f = f + (rms * 10**-6)
    # Find zero crossing
    fsign = np.sign(f)
    fdiff = np.diff(fsign)
    zero_u = np.asarray(np.where(fdiff > 0)) + 1
    zero_d = np.asarray(np.where(fdiff < 0)) + 1
    nu = len(zero_u)
    nd = len(zero_d)
    value_max = np.zeros((nu, 1))
    value_min = np.zeros((nu, 1))
    imaxvec = np.zeros((nu, 1))
    iminvec = np.zeros((nu, 1))
    if (nu > 2) and (nd > 2):
        if zero_u[0] > zero_d[0]:
            zero_d[0] = []
        nu = len(zero_u)
        nd = len(zero_d)
        ncross = np.fmin(nu, nd)
        # Find Maxima:
        for ic in range(0, ncross - 1):
            up = int(zero_u[ic])
            down = int(zero_d[ic])
            fvec = f[up:down]
            value_max[ic] = np.amax(fvec)
            index_max = value_max.argmax()
            imaxvec[ic] = up + index_max - 1
        # Find Minima:
        for ic in range(0, ncross - 2):
            down = int(zero_d[ic])
            up = int(zero_u[ic+1])
            fvec = f[down:up]
            value_min[ic] = np.amin(fvec)
            index_min = value_min.argmin()
            iminvec[ic] = down + index_min - 1
        # Remove spurious values, bumps and zero_d
        thr = rms/3
        maxfind = np.where(value_max < thr)
        for i in range(0, len(maxfind)):
            imaxfind = np.where(value_max == maxfind[i])
            imaxvec[imaxfind] = 0
            value_max[imaxfind] = 0
        minfind = np.where(value_min > -thr)
        for j in range(0, len(minfind)):
            iminfind = np.where(value_min == minfind[j])
            value_min[iminfind] = 0
            iminvec[iminfind] = 0
        # Find Average Height
        avh = np.mean(value_max) - np.mean(value_min)
    else:
        avh = 0
    return avh
The documentation of np.where (and of np.nonzero even more so) clearly explains that it returns a tuple, with one array for each dimension of the condition array:
In [71]: arr = np.random.randint(-5,5,10)
In [72]: arr
Out[72]: array([ 3, 4, 2, -3, -1, 0, -5, 4, 2, -3])
In [73]: arr.shape
Out[73]: (10,)
In [74]: np.where(arr>=0)
Out[74]: (array([0, 1, 2, 5, 7, 8]),)
In [75]: arr[_]
Out[75]: array([3, 4, 2, 0, 4, 2])
That Out[74] tuple can be used directly as an index.
You can also extract the array from the tuple:
In [76]: np.where(arr>=0)[0]
Out[76]: array([0, 1, 2, 5, 7, 8])
That, I think is a better choice than the np.asarray(np.where(...))
This convention for where becomes clearer when we use it on a 2d array
In [77]: arr2 = arr.reshape(2,5)
In [78]: np.where(arr2>=0)
Out[78]: (array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 2, 3]))
In [79]: arr2[_]
Out[79]: array([3, 4, 2, 0, 4, 2])
Again we are indexing with a tuple. arr2[1,3] is really arr2[(1,3)]. The values in [] indexing brackets are actually passed to the indexing function as a tuple of values.
np.argwhere applies transpose to the result of where, producing an array:
In [80]: np.transpose(np.where(arr2>=0))
Out[80]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 2],
[1, 3]])
Those are the same indexing arrays, but arranged as a 2D array with one (row, column) pair per row.
If you need the count of where without the actual values, a slightly faster function is
In [81]: np.count_nonzero(arr>=0)
Out[81]: 6
In fact np.nonzero uses the count to first determine the size of the arrays that it will return.
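Applied back to the original question, a minimal sketch (assuming numpy is imported as np and f is the 1-D signal from averageheight) that gets the crossing counts without wrapping anything in np.asarray:
fdiff = np.diff(np.sign(f))
zero_u = np.where(fdiff > 0)[0] + 1  # indices just after an upward crossing
zero_d = np.where(fdiff < 0)[0] + 1  # indices just after a downward crossing
nu = len(zero_u)                     # now counts crossings instead of returning 1
nd = len(zero_d)
# or, when only the counts are needed:
nu = np.count_nonzero(fdiff > 0)
nd = np.count_nonzero(fdiff < 0)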
