Say I have the numpy array arr_1 = np.arange(10) returning:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
How do I change multiple elements to a certain value using slicing?
For example: changing the zeroth, first, and second element of each group that occurs every five elements, starting from index 1, to 100. I want this:
array([0, 100, 100, 100, 4, 5, 100, 100, 100, 9])
I tried arr_1[1::[5, 6, 7]] = 100 but that doesn't work.
Here is another solution based on what you tried:
arr_1 = np.arange(10)
arr_1[1::5] = 100
arr_1[2::5] = 100
arr_1[3::5] = 100
and it returns:
array([ 0, 100, 100, 100, 4, 5, 100, 100, 100, 9])
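If the number of offsets grows, the same idea fits in a small loop; a minimal sketch:
import numpy as np

arr_1 = np.arange(10)
for k in range(1, 4):   # offsets 1, 2, 3 within each block of 5
    arr_1[k::5] = 100
print(arr_1)  # [  0 100 100 100   4   5 100 100 100   9]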
If the repeat period (5 here) divides the array length:
a.reshape((-1, 5))[:, 1:4] = 100
General case requires two lines:
a[: len(a) // 5 * 5].reshape((-1, 5))[:, 1:4] = 100
a[len(a) // 5 * 5 :][1:4] = 100
How it works: reshaping this way stacks consecutive stretches of the array so that the target sub-stretches are aligned in columns and can therefore be addressed in one go using standard 2d indexing:
>>> a = np.arange(15)
>>> a.reshape((-1, 5))
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
The targets (columns 1, 2 and 3 of each row) now line up, so a.reshape((-1, 5))[:, 1:4] addresses all of them at once.
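As a quick sanity check of the two-line general case, here is a minimal sketch on a length-12 array (the length is just an example that is not a multiple of 5):
import numpy as np

a = np.arange(12)
a[: len(a) // 5 * 5].reshape((-1, 5))[:, 1:4] = 100  # full blocks
a[len(a) // 5 * 5 :][1:4] = 100                      # leftover tail
print(a)  # [  0 100 100 100   4   5 100 100 100   9  10 100]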
Here's one approach with masking (np.in1d is spelled np.isin in newer NumPy versions) -
a = np.arange(10) # Input array
idx = np.array([0,1,2]) # Indices to be set
offset = 1 # Offset
a[np.in1d(np.mod(np.arange(a.size), 5), idx + offset)] = 100
Sample run with original sample -
In [849]: a = np.arange(10) # Input array
...: idx = np.array([0,1,2]) # Indices to be set
...: offset = 1 # Offset
...:
...: a[np.in1d(np.mod(np.arange(a.size),5) , idx+offset)] = 100
...:
In [850]: a
Out[850]: array([ 0, 100, 100, 100, 4, 5, 100, 100, 100, 9])
Sample run with non-sequential indices -
In [851]: a = np.arange(11) # Input array
...: idx = np.array([0,2,3]) # Indices to be set
...: offset = 1 # Offset
...:
In [852]: a[np.in1d(np.mod(np.arange(a.size),5) , idx+offset)] = 100
In [853]: a
Out[853]: array([ 0, 100, 2, 100, 100, 5, 100, 7, 100, 100, 10])
You just need to wrap your list of indexes in np.array(list) (a plain Python list also works, since this is fancy indexing). You were very close to being correct:
In [2]: arr_1 = np.arange(10)
In [3]: arr_1[np.array([0,1,2,5,6,7])] = 100
In [4]: arr_1
Out[4]: array([100, 100, 100, 3, 4, 100, 100, 100, 8, 9])
I used hand-coded values for the indexes, per your requirements. You can build the indexes in an automated way using some technique you like, such as the one shown by Divakar, or the sketch below.
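For instance, a minimal sketch that builds the same indexes from the block length and offsets (the names starts and offsets are just illustrative):
import numpy as np

arr_1 = np.arange(10)
offsets = np.array([0, 1, 2])                  # positions within each block
starts = np.arange(0, arr_1.size, 5)           # start of each block of 5
idx = (starts[:, None] + offsets + 1).ravel()  # +1 because we start from index 1
idx = idx[idx < arr_1.size]                    # guard against running past the end
arr_1[idx] = 100
print(arr_1)  # [  0 100 100 100   4   5 100 100 100   9]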
I need to average the Y values corresponding to the values in the X array...
X=np.array([ 1, 1, 2, 2, 2, 2, 3, 3 ... ])
Y=np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ... ])
In other words, the Y values corresponding to the 1s in X are 10 and 30, and their average is 20; the values corresponding to the 2s are 15, 10, 16, and 10, and their average is 12.75; and so on...
How can I calculate these average values?
One option is to use a property of linear regression (with categorical variables):
import numpy as np
x = np.array([ 1, 1, 2, 2, 2, 2, 3, 3 ])
y = np.array([ 10, 30, 15, 10, 16, 10, 15, 20 ])
x_dummies = x[:, None] == np.unique(x)
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]
print(means) # [20. 12.75 17.5 ]
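As a quick sanity check (a sketch), the fitted coefficients can be compared against directly computed group means:
import numpy as np

x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])
x_dummies = x[:, None] == np.unique(x)
means = np.linalg.lstsq(x_dummies, y, rcond=None)[0]

# Direct per-group means for comparison
direct = np.array([y[x == v].mean() for v in np.unique(x)])
print(np.allclose(means, direct))  # True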
You can try using pandas:
import pandas as pd
import numpy as np
N = pd.DataFrame(np.transpose([X, Y]),
                 columns=['X', 'Y']).groupby('X')['Y'].mean().to_numpy()
# array([20.  , 12.75, 17.5 ])
import numpy as np
X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])
# Only unique values
unique_vals = np.unique(X)
# Loop over every unique value
for val in unique_vals:
    # Search for the matching indexes
    idx = np.where(X == val)
    # Mean over the found indexes
    aver = np.mean(Y[idx])
    print(f"Average for {val}: {aver}")
Result:
Average for 1: 20.0
Average for 2: 12.75
Average for 3: 17.5
You can use something like the code below:
import numpy as np
X = np.array([1, 1, 2, 2, 2, 2, 3, 3])
Y = np.array([10, 30, 15, 10, 16, 10, 15, 20])
def groupby(a, b):
    # Get argsort indices, to be used to sort a and b in the next steps
    sidx = b.argsort(kind='mergesort')
    a_sorted = a[sidx]
    b_sorted = b[sidx]
    # Get the group limit indices (start, stop of groups)
    cut_idx = np.flatnonzero(np.r_[True, b_sorted[1:] != b_sorted[:-1], True])
    # Split the input array at those start/stop indices
    out = [a_sorted[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]
    return out
group_by_array = groupby(Y, X)
for item in group_by_array:
    print(np.average(item))
I used the information in the link below to answer the question:
Group numpy into multiple sub-arrays using an array of values
I think this solution should work:
avg_arr = []
i = 1
while i <= np.max(x):
    inds = np.where(x == i)
    # +1 so that the last matching element is included in the slice
    my_val = np.average(y[inds[0][0]:inds[0][-1] + 1])
    avg_arr.append(my_val)
    i += 1
Definitely not the cleanest (it assumes the values in x are sorted and run from 1 to the maximum), but I was able to test it quickly and it does indeed work once the last element of each slice is included.
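For reference, the same result can be had more compactly with np.unique (a sketch; this also works when the values in x are not consecutive integers):
import numpy as np

x = np.array([1, 1, 2, 2, 2, 2, 3, 3])
y = np.array([10, 30, 15, 10, 16, 10, 15, 20])
avg_arr = np.array([y[x == v].mean() for v in np.unique(x)])
print(avg_arr)  # [20.   12.75 17.5 ]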
I have a numpy array and a mask specifying which entries from that array to shuffle while keeping their relative order. Let's have an example:
In [2]: arr = np.array([5, 3, 9, 0, 4, 1])
In [4]: mask = np.array([True, False, False, False, True, True])
In [5]: arr[mask]
Out[5]: array([5, 4, 1]) # These entries shall be shuffled inside arr, while keeping their order.
In [6]: np.where(mask==True)
Out[6]: (array([0, 4, 5]),)
In [7]: shuffle_array(arr, mask) # I'm looking for an efficient realization of this function!
Out[7]: array([3, 5, 4, 9, 0, 1]) # See how the entries 5, 4 and 1 haven't changed their order.
I've written some code that can do this, but it's really slow.
import numpy as np
def shuffle_array(arr, mask):
    perm = np.arange(len(arr))  # permutation array
    n = mask.sum()
    if n > 0:
        old_true_pos = np.where(mask == True)[0]    # old positions for which mask is True
        old_false_pos = np.where(mask == False)[0]  # old positions for which mask is False
        new_true_pos = np.random.choice(perm, n, replace=False)  # draw new positions
        new_true_pos.sort()
        new_false_pos = np.setdiff1d(perm, new_true_pos)
        new_pos = np.hstack((new_true_pos, new_false_pos))
        old_pos = np.hstack((old_true_pos, old_false_pos))
        perm[new_pos] = perm[old_pos]
    return arr[perm]
To make things worse, I actually have two large matrices A and B with shape (M,N). Matrix A holds arbitrary values, while each row of matrix B is the mask to use for shuffling the corresponding row of matrix A according to the procedure I outlined above. So what I want is shuffled_matrix = row_wise_shuffle(A, B).
The only way I have so far found to do it is via my shuffle_array() function and a for loop.
Can you think of any numpy'onic way to accomplish this task avoiding loops? Thank you so much in advance!
For the 1d case:
import numpy as np
a = np.arange(8)
b = np.array([1,1,1,1,0,0,0,0])
# Get ordered values
ordered_values = a[np.where(b==1)]
# We'll shuffle both arrays
shuffled_ix = np.random.permutation(a.shape[0])
a_shuffled = a[shuffled_ix]
b_shuffled = b[shuffled_ix]
# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = ordered_values
a_shuffled # Notice that 0, 1, 2, 3 preserves order.
>>>
array([0, 1, 2, 6, 3, 4, 7, 5])
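Wrapped into the shuffle_array(arr, mask) signature from the question, the same recipe looks like this (a sketch assuming a boolean mask, as in the question):
import numpy as np

def shuffle_array(arr, mask):
    # Shuffle every position, then re-impose the original order
    # of the masked values onto the (shuffled) masked positions.
    perm = np.random.permutation(arr.shape[0])
    shuffled = arr[perm]             # fancy indexing copies, so safe to modify
    shuffled[mask[perm]] = arr[mask] # boolean assignment fills in index order
    return shuffled

arr = np.array([5, 3, 9, 0, 4, 1])
mask = np.array([True, False, False, False, True, True])
print(shuffle_array(arr, mask))  # e.g. [3 5 4 9 0 1]; 5, 4, 1 keep their order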
For the 2d case, column-wise shuffle (along axis=1):
import numpy as np
a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])
# The code below works for column shuffle (i.e. axis=1).
# Get ordered values
i,j = np.where(b==1)
values = a[i, j]
values
# We'll shuffle both arrays for axis=1
# taken from https://stackoverflow.com/questions/5040797/shuffling-numpy-array-along-a-given-axis
idx = np.random.rand(*a.shape).argsort(axis=1)
a_shuffled = np.take_along_axis(a,idx,axis=1)
b_shuffled = np.take_along_axis(b,idx,axis=1)
# Replace the values with correct order
a_shuffled[np.where(b_shuffled==1)] = values
# Get the result
a_shuffled # see that 4,5 | 6,7,8 | 12,13,14,15 | 20, 21 preserves order
>>>
array([[ 4, 1, 0, 3, 2, 5],
[ 9, 6, 7, 11, 8, 10],
[12, 13, 16, 17, 14, 15],
[23, 20, 19, 22, 21, 18]])
For the 2d case, row-wise shuffle (along axis=0), we can use the same code: first transpose the arrays, shuffle, and then transpose back:
import numpy as np
a = np.arange(24).reshape(4,6)
b = np.array([[0,0,0,0,1,1], [1,1,1,0,0,0], [1,1,1,1,0,0], [0,0,1,1,0,0]])
# The code below works for column shuffle (i.e. axis=1).
# As you said rowwise, we first transpose
at = a.T
bt = b.T
# Get ordered values
i,j = np.where(bt==1)
values = at[i, j]
values
# We'll shuffle both arrays for axis=1
# taken from https://stackoverflow.com/questions/5040797/shuffling-numpy-array-along-a-given-axis
idx = np.random.rand(*at.shape).argsort(axis=1)
at_shuffled = np.take_along_axis(at,idx,axis=1)
bt_shuffled = np.take_along_axis(bt,idx,axis=1)
# Replace the values with correct order
at_shuffled[np.where(bt_shuffled==1)] = values
# Get the result
a_shuffled = at_shuffled.T
a_shuffled # see that 6,12 | 7, 13 | 8,14,20 | 15, 21 preserves order
>>>
array([[ 6, 7, 2, 3, 10, 17],
[18, 19, 8, 15, 16, 23],
[12, 13, 14, 21, 4, 5],
[ 0, 1, 20, 9, 22, 11]])
I have a numpy array such as this
[[ 0, 57],
[ 7, 72],
[ 2, 51],
[ 8, 67],
[ 4, 42]]
I want to find out for each row, how many elements in the 2nd column are within a certain distance (say, 10) of the 2nd column value for that row. So in this example, here the solution would be
[[ 0, 57, 3],
[ 7, 72, 2],
[ 2, 51, 3],
[ 8, 67, 3],
[ 4, 42, 2]]
So [first row, third column] is 3, because there are 3 elements in the 2nd column (57, 51, 67) that are within distance 10 of 57; similarly for each row.
Any help would be appreciated!
Here's one approach leveraging broadcasting with outer-subtraction -
(np.abs(a[:,1,None] - a[:,1]) <= 10).sum(1)
With outer subtract builtin and count_nonzero for counting -
np.count_nonzero(np.abs(np.subtract.outer(a[:,1],a[:,1]))<=10,axis=1)
Sample run -
# Input array
In [23]: a
Out[23]:
array([[ 0, 57],
[ 7, 72],
[ 2, 51],
[ 8, 67],
[ 4, 42]])
# Get count
In [24]: count = (np.abs(a[:,1,None] - a[:,1]) <= 10).sum(1)
In [25]: count
Out[25]: array([3, 2, 3, 3, 2])
# Stack with input
In [26]: np.c_[a,count]
Out[26]:
array([[ 0, 57, 3],
[ 7, 72, 2],
[ 2, 51, 3],
[ 8, 67, 3],
[ 4, 42, 2]])
Alternatively with SciPy's cdist -
In [53]: from scipy.spatial.distance import cdist
In [54]: (cdist(a[:,None,1],a[:,1,None], 'minkowski', p=2)<=10).sum(1)
Out[54]: array([3, 2, 3, 3, 2])
For million rows in the input, we might want to resort to a loopy one -
n = len(a)
count = np.empty(n, dtype=int)
for i in range(n):
    count[i] = np.count_nonzero(np.abs(a[:,1] - a[i,1]) <= 10)
Here's a non-broadcasting approach, which takes advantage of the fact that to know how many numbers are within 3 of 10, you can subtract the count of numbers strictly less than 7 from the count of numbers less than or equal to 13.
import numpy as np

def broadcast(x, width):
    # for comparison
    return (np.abs(x[:, None] - x) <= width).sum(1)

def largest_leq(arr, x, allow_equal=True):
    maybe = np.searchsorted(arr, x)
    maybe = maybe.clip(0, len(arr) - 1)
    above = arr[maybe] > x if allow_equal else arr[maybe] >= x
    maybe[above] -= 1
    return maybe

def faster(x, width):
    uniq, inv, counts = np.unique(x, return_counts=True, return_inverse=True)
    counts = counts.cumsum()
    low_bounds = uniq - width
    low_ix = largest_leq(uniq, low_bounds, allow_equal=False)
    low_counts = counts[low_ix]
    low_counts[low_ix < 0] = 0
    high_bounds = uniq + width
    high_counts = counts[largest_leq(uniq, high_bounds)]
    delta = high_counts - low_counts
    out = delta[inv]
    return out
This passes my tests:
for width in range(1, 10):
    for window in range(5):
        for trial in range(10):
            x = np.random.randint(0, 10, width)
            b = broadcast(x, window).tolist()
            f = faster(x, window).tolist()
            assert b == f
and behaves pretty well even at larger sizes:
In [171]: x = np.random.random(10**6)
In [172]: %time faster(x, 0)
Wall time: 386 ms
Out[172]: array([1, 1, 1, ..., 1, 1, 1], dtype=int64)
In [173]: %time faster(x, 1)
Wall time: 372 ms
Out[173]: array([1000000, 1000000, 1000000, ..., 1000000, 1000000, 1000000], dtype=int64)
In [174]: x = np.random.randint(0, 10, 10**6)
In [175]: %timeit faster(x, 3)
10 loops, best of 3: 83 ms per loop
x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
I want to grab the first 2 rows of array x from every block of 5; the result should be:
x[fancy_indexing] = [1,2, 6,7, 11,12]
It's easy enough to build up an index like that using a for loop.
Is there a one-liner slicing trick that will pull it off? Points for simplicity here.
Approach #1 Here's a vectorized one-liner using boolean-indexing -
x[np.mod(np.arange(x.size),M)<N]
Approach #2 If you are going for performance, here's another vectorized approach using NumPy strides -
n = x.strides[0]
shp = (x.size//M,N)
out = np.lib.stride_tricks.as_strided(x, shape=shp, strides=(M*n,n)).ravel()
Sample run -
In [61]: # Inputs
...: x = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
...: N = 2
...: M = 5
...:
In [62]: # Approach 1
...: x[np.mod(np.arange(x.size),M)<N]
Out[62]: array([ 1, 2, 6, 7, 11, 12])
In [63]: # Approach 2
...: n = x.strides[0]
...: shp = (x.size//M,N)
...: out=np.lib.stride_tricks.as_strided(x,shape=shp,strides=(M*n,n)).ravel()
...:
In [64]: out
Out[64]: array([ 1, 2, 6, 7, 11, 12])
I first thought you needed this to work for 2d arrays, due to your phrasing of "first N rows of every block of M rows", so I'll leave my solution in that form.
You could work some magic by reshaping your array into 3d:
M = 5  # size of blocks
N = 2  # number of elements to keep from each block
x = np.arange(3*4*M).reshape(4, -1)  # (4, 3*M)-shaped dummy input
x = x.reshape(x.shape[0], -1, M)[:, :, :N].reshape(x.shape[0], -1)  # (4, 3*N)-shaped output
This will extract the first N columns of every block of M columns in each row. In order to use it for your 1d case you'd need to make your 1d array into a 2d one using x = x[None, :].
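For the question's 1d input, that looks like (a minimal sketch):
import numpy as np

M, N = 5, 2
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
x2d = x[None, :]  # make the 1d array 2d
out = x2d.reshape(x2d.shape[0], -1, M)[:, :, :N].reshape(x2d.shape[0], -1)
print(out[0])  # [ 1  2  6  7 11 12]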
Reshape the array to multiple rows of five columns then take (slice) the first two columns of each row.
>>> x
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
>>> x.reshape(x.shape[0] // 5, 5)[:,:2]
array([[ 1, 2],
[ 6, 7],
[11, 12]])
Or
>>> x.reshape(x.shape[0] // 5, 5)[:,:2].flatten()
array([ 1, 2, 6, 7, 11, 12])
It only works with 1-d arrays that have a length that is a multiple of five.
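If the length is not a multiple of five, one option (a sketch) is to trim the incomplete tail block before reshaping:
import numpy as np

x = np.arange(1, 18)              # length 17, not a multiple of 5
full = x[: x.shape[0] // 5 * 5]   # drop the incomplete tail (here [16, 17])
print(full.reshape(-1, 5)[:, :2].flatten())  # [ 1  2  6  7 11 12]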
import numpy as np
x = np.array(range(1, 16))
y = np.vstack([x[0::5], x[1::5]]).T.ravel()
y
# => array([ 1,  2,  6,  7, 11, 12])
Taking the first N rows of every block of M rows in the array [1, 2, ..., K]:
import numpy as np
K = 30
M = 5
N = 2
x = np.array(range(1, K+1))
y = np.vstack([x[i::M] for i in range(N)]).T.ravel()
y
# => array([ 1,  2,  6,  7, 11, 12, 16, 17, 21, 22, 26, 27])
Notice that .T is fast: it doesn't copy any data, it just manipulates the dimensions and strides of the array. .ravel() does copy here, because the transposed array is no longer contiguous, but it is still a cheap single pass.
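This can be checked with np.shares_memory (a sketch):
import numpy as np

x = np.array(range(1, 16))
stacked = np.vstack([x[0::5], x[1::5]])
transposed = stacked.T
print(np.shares_memory(stacked, transposed))             # True: .T is a view
print(np.shares_memory(transposed, transposed.ravel()))  # False: ravel copies here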
If you insist on getting your slice using fancy indexing:
import numpy as np
K = 30
M = 5
N = 2
x = np.array(range(1, K+1))
fancy_indexing = [i*M+n for i in range(len(x)//M) for n in range(N)]
x[fancy_indexing]
# => array([ 1,  2,  6,  7, 11, 12, 16, 17, 21, 22, 26, 27])
I'm hoping anybody could help me with the following.
I have 2 lists of arrays, which should be linked to each other. Each list stands for a certain object; arr1 and arr2 are the attributes of that object.
For example:
import numpy as np
arr1 = [np.array([1, 2, 3]), np.array([1, 2]), np.array([2, 3])]
arr2 = [np.array([20, 50, 30]), np.array([50, 50]), np.array([75, 25])]
The arrays are linked to each other: the 1 in the first array of arr1 belongs to the 20 in the first array of arr2. The result I'm looking for in this example would be a numpy array of shape (3, 4). The 'columns' stand for 0, 1, 2, 3 (the numbers in arr1, plus 0) and the rows are filled with the corresponding values of arr2. Where there is no corresponding value, the cell should be 0.
Example:
array([[ 0, 20, 50, 30],
[ 0, 50, 50, 0],
[ 0, 0, 75, 25]])
How would I link these two list of arrays and reshape them in the desired format as shown in the above example?
Many thanks!
Here's an almost* vectorized approach -
lens = np.array([len(i) for i in arr1])
N = len(arr1)
row_idx = np.repeat(np.arange(N),lens)
col_idx = np.concatenate(arr1)
M = col_idx.max()+1
out = np.zeros((N,M),dtype=int)
out[row_idx,col_idx] = np.concatenate(arr2)
*: Almost, because of the list comprehension at the start, but that part only gathers lengths and should be computationally negligible.
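A quick run with the sample inputs (a sketch) confirms the output:
import numpy as np

arr1 = [np.array([1, 2, 3]), np.array([1, 2]), np.array([2, 3])]
arr2 = [np.array([20, 50, 30]), np.array([50, 50]), np.array([75, 25])]

lens = np.array([len(i) for i in arr1])
N = len(arr1)
row_idx = np.repeat(np.arange(N), lens)
col_idx = np.concatenate(arr1)
M = col_idx.max() + 1
out = np.zeros((N, M), dtype=int)
out[row_idx, col_idx] = np.concatenate(arr2)
print(out)
# [[ 0 20 50 30]
#  [ 0 50 50  0]
#  [ 0  0 75 25]]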
Here is a solution with for-loops, showing each step in detail.
import numpy as np
arr1 = [np.array([1, 2, 3]), np.array([1, 2]), np.array([2, 3])]
arr2 = [np.array([20, 50, 30]), np.array([50, 50]), np.array([75, 25])]
# Find the largest index in arr1; it determines the number of columns
maxi = []
for i in range(len(arr1)):
    maxi.append(np.max(arr1[i]))
maxi = np.max(maxi)
# One column for each index 0..maxi, filled with zeros by default
output = np.zeros((len(arr2), maxi + 1), dtype=int)
for i in range(len(arr1)):
    for k in range(len(arr1[i])):
        # arr1 gives the target column, arr2 the value
        output[i][arr1[i][k]] = arr2[i][k]
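Printing the result confirms it matches the desired array -
print(output)
# [[ 0 20 50 30]
#  [ 0 50 50  0]
#  [ 0  0 75 25]]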
This is a straightforward approach, with only one level of iteration:
In [261]: res=np.zeros((3,4),int)
In [262]: for i,(idx,vals) in enumerate(zip(arr1, arr2)):
...: res[i,idx]=vals
...:
In [263]: res
Out[263]:
array([[ 0, 20, 50, 30],
[ 0, 50, 50, 0],
[ 0, 0, 75, 25]])
I suspect it is faster than @Divakar's approach for this example, and it should remain competitive as long as the number of columns is quite a bit larger than the number of rows.