I have two numpy arrays of the following shape:
print(a.shape) -> (100, 20, 3, 3)
print(b.shape) -> (100, 3)
Array a is empty, as I just need this predefined shape, I created it with:
a = numpy.empty(shape=(100, 20, 3, 3))
Now I would like to copy data from array b to array a so that the second and third dimension of array a gets filled with the same 3 values of the corresponding row of array b.
Let me try to make it a bit clearer:
Array b contains 100 rows (100, 3) and each row holds three values (100, 3).
Now every row of array a (100, 20, 3, 3) should also hold the same three values in the last dimension (100, 20, 3, 3), while those three values stay the same for the second and third dimension (100, 20, 3, 3) for the same row (100, 20, 3, 3).
How can I copy the data as described without using loops? I just can not get it done but there must be an easy solution for this.
We can make use of np.broadcast_to.
If you are okay with a view -
np.broadcast_to(b[:,None, None, :], (100, 2, 3, 3))
If you need an output with its own memory space, simply append with .copy().
If you want to save on memory and fill into already defined array, a :
a[:] = b[:,None,None,:]
Note that we can skip the trailing :s.
Timings :
In [20]: b = np.random.rand(100, 3)
In [21]: %timeit np.broadcast_to(b[:,None, None, :], (100, 2, 3, 3))
5.93 µs ± 64.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [22]: %timeit np.broadcast_to(b[:,None, None, :], (100, 2, 3, 3)).copy()
11.4 µs ± 56.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [23]: %timeit np.repeat(np.repeat(b[:,None,None,:], 20, 1), 3, 2)
39.3 µs ± 147 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You can use repeat along axis. You also do not need to predefine a. I would also suggest NOT to use broadcast_to since it returns readonly view and memory is shared among elements:
a = np.repeat(b[:,None,None,:], 20, 1) #adds dimensions 1 and 2 and repeats 20 times along axis 1
a = np.repeat(a, 3, 2) #repeats 3 times along axis 2
Smaller example:
b = np.arange(2*3).reshape(2,3)
#[[0 1 2]
# [3 4 5]]
a = np.repeat(b[:,None,None,:], 2, 1)
a = np.repeat(a, 3, 2)
#shape(2,2,3,3)
[[[[0 1 2]
[0 1 2]
[0 1 2]]
[[0 1 2]
[0 1 2]
[0 1 2]]]
[[[3 4 5]
[3 4 5]
[3 4 5]]
[[3 4 5]
[3 4 5]
[3 4 5]]]]
Related
I'm looking for a numpy equivalent of my suboptimal Python code. The calculation I want to do can be summarized by:
The average of the peak of each section for each row.
Here the code with a sample array and list of indices. Sections can be of different sizes.
x = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]])
indices = [2]
result = np.empty((1, x.shape[0]))
for row in x:
splited = np.array_split(row, indexes)
peak = [np.amax(a) for a in splited]
result[0, i] = np.average(peak)
Which gives: result = array([[3., 7.]])
What is the optimized numpy way to suppress both loop?
You could just take off the for loop and use axis instead:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
Output:
array([3., 7.])
Benchmark:
x_large = np.array([[1, 2, 3, 4],
[5, 6, 7, 8]] * 1000)
%%timeit
result = []
for row in x_large:
splited = np.array_split(row, indices)
peak = [np.amax(a) for a in splited]
result.append(np.average(peak))
# 29.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
# 37.4 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Validation:
np.array_equal(result, result2)
# True
I'm trying to split a multidimensional array (array)
import numpy as np
shape = (3, 4, 4, 2)
array = np.random.randint(0,10,shape)
into an array (new_array) with shape (3,2,2,2,2,2) where the dimension 1 has been split into 2 (dimension 1 and 2) and dimension 2 in array has been split into 2 (dimensions 3 and 4).
So far I got a working method which is:
div_x = 2
div_y = 2
new_dim_x = shape[1]//div_x
new_dim_y = shape[2]//div_y
new_array_split = np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
I'm also looking into using reshape:
new_array_reshape = array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
The reshape method is faster than the split method:
%timeit array[:, :(div_x*new_dim_x), :(div_y*new_dim_y), ...].reshape(shape[0], div_x, div_y, new_dim_x, new_dim_y, shape[-1]).transpose(1,2,0,3,4,5)
2.16 µs ± 44.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.array([np.split(each_sub, axis=2, indices_or_sections=div_y) for each_sub in np.split(array[:, :(new_dim_x*div_x), :(new_dim_y*div_y)], axis=1, indices_or_sections=div_x)])
58.3 µs ± 2.13 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
However, I cannot get the same results, because of the last dimension:
print('Reshape method')
print(new_array_reshape[1,0,0,...])
print('\nSplit method')
print(new_array_split[1,0,0,...])
Reshape method
[[[2 2]
[4 3]]
[[3 5]
[5 9]]]
Split method
[[[2 2]
[4 3]]
[[5 3]
[9 8]]]
The split method does exactly what I want, I did check number by number and it does the type of split I want, but not at the speed I would like.
QUESTION
Is there a way to achieve the same results as the split method, using reshape or any other approach?
CONTEXT
The array is actually data flow from image processing, where the first dimension of array is the time, the second dimension is coordinate x (4), the third dimension is coordinate y (4) and the fourth dimension (2) is the Magnitude and phase of the flow.
I would like to split the images (coordinate x and y) into subimages making an array of pictures of 2x2 so I can analyse the flow more locally, perform averages, clustering, etc.
This process (splitting) is going to be performed many times that is why I'm looking for an optimal and efficient solution. I believe the way is probably using reshape, but I'm open to any other option.
Reshape and permute axes -
array.reshape(3,2,2,2,2,2).transpose(1,3,0,2,4,5)
For your use case I'm not sure reshape is the best option. If you want to be able to locally average and cluster, you might want a window function:
from skimage.util import view_as_windows
def window_over(arr, size = 2, step = 2, axes = (1, 2) ):
wshp = list(arr.shape)
for a in axes:
wshp[a] = size
return view_as_windows(arr, wshp, step).squeeze()
window_over(test).shape
Out[]: (2, 2, 3, 2, 2, 2)
Your output axes can then be rearranged how you want using transpose. The benefit of this is that you can get the intermediate windows:
window_over(test, step = 1).shape
Out[]: (3, 3, 3, 2, 2, 2)
That includes the 2x2 windows that overlap, so you get 3x3 results.
Since overlapping is possible, you also don't need your windows to be divisible by the dimension size:
window_over(test, size = 3).shape
Out[]: (2, 2, 3, 3, 3, 2)
I need to check if an array A contains all elements of another array B. If not, output the missing elements. Both A and B are integers, and B is always from 0 to N with an interval of 1.
import numpy as np
A=np.array([1,2,3,6,7,8,9])
B=np.arange(10)
I know that I can use the following to check if there is any missing elements, but it does not give the index of the missing element.
np.all(elem in A for elem in B)
Is there a good way in python to output the indices of the missing elements?
IIUC you can try the following and assuming that B always is an "index" list:
[i for i in B if i not in A]
The output would be : [0, 4, 5]
Best way to do it with numpy
Numpy actually has a function to perform this : numpy.insetdiff1d
np.setdiff1d(B, A)
# Which returns
array([0, 4, 5])
You can use enumerate to get both index and content of a list. The following code would do what you want
idx = [idx for idx, element in enumerate(B) if element not in A]
I am assuming we want to get the elements exclusive to B, when compared to A.
Approach #1
Given the specific of B is always from 0 to N with an interval of 1, we can use a simple mask-based one -
mask = np.ones(len(B), dtype=bool)
mask[A] = False
out = B[mask]
Approach #2
Another one that edits B and would be more memory-efficient -
B[A] = -1
out = B[B>=0]
Approach #3
A more generic case of integers could be handled differently -
def setdiff_for_ints(B, A):
N = max(B.max(), A.max()) - min(min(A.min(),B.min()),0) + 1
mask = np.zeros(N, dtype=bool)
mask[B] = True
mask[A] = False
out = np.flatnonzero(mask)
return out
Sample run -
In [77]: A
Out[77]: array([ 1, 2, 3, 6, 7, 8, -6])
In [78]: B
Out[78]: array([1, 3, 4, 5, 7, 9])
In [79]: setdiff_for_ints(B, A)
Out[79]: array([4, 5, 9])
# Using np.setdiff1d to verify :
In [80]: np.setdiff1d(B, A)
Out[80]: array([4, 5, 9])
Timings -
In [81]: np.random.seed(0)
...: A = np.unique(np.random.randint(-10000,100000,1000000))
...: B = np.unique(np.random.randint(0,100000,1000000))
# #Hugolmn's soln with np.setdiff1d
In [82]: %timeit np.setdiff1d(B, A)
4.78 ms ± 96.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [83]: %timeit setdiff_for_ints(B, A)
599 µs ± 6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Let’s say I have two NumPy arrays, a and b:
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
b = np.array([8,9])
And I would like to append the same array b to every row (ie. adding multiple columns) to get an array, c:
b = np.array([
[1, 2, 3, 8, 9],
[2, 3, 4, 8, 9]
])
How can I do this easily and efficiently in NumPy?
I am especially concerned about its behaviour with big datasets (where a is much bigger than b), is there any way around creating many copies (ie. a.shape[0]) of b?
Related to this question, but with multiple values.
Here's one way. I assume it's efficient because it's vectorised. It relies on the fact that in matrix multiplication, pre-multiplying a row by the column (1, 1) will produce two stacked copies of the row.
import numpy as np
a = np.array([
[1, 2, 3],
[2, 3, 4]
])
b = np.array([[8,9]])
np.concatenate([a, np.array([[1],[1]]).dot(b)], axis=1)
Out: array([[1, 2, 3, 8, 9],
[2, 3, 4, 8, 9]])
Note that b is specified slightly differently (as a two-dimensional array).
Is there any way around creating many copies of b?
The final result contains those copies (and numpy arrays are literally arrays of values in memory), so I don't see how.
An alternative to concatenate approach is to make a recipient array, and copy values to it:
In [483]: a = np.arange(300).reshape(100,3)
In [484]: b=np.array([8,9])
In [485]: res = np.zeros((100,5),int)
In [486]: res[:,:3]=a
In [487]: res[:,3:]=b
sample timings
In [488]: %%timeit
...: res = np.zeros((100,5),int)
...: res[:,:3]=a
...: res[:,3:]=b
...:
...:
6.11 µs ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [491]: timeit np.concatenate((a, b.repeat(100).reshape(2,-1).T),1)
7.74 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [164]: timeit np.concatenate([a, np.ones([a.shape[0],1], dtype=int).dot(np.array([b]))], axis=1)
8.58 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The way I solved this initially was :
c = np.concatenate([a, np.tile(b, (a.shape[0],1))], axis = 1)
But this feels very inefficient...
I have multiple numpy arrays and I want to create new arrays doing something that is like an XOR ... but not quite.
My input is two arrays, array1 and array2.
My output is a modified (or new array, I don't really care) version of array1.
The modification is elementwise, by doing the following:
1.) If either array has 0 for the given index, then the index is left unchanged.
2.) If array1 and array2 are nonzero, then the modified array is assigned the value of array1's index subtracted by array2's index, down to a minimum of zero.
Examples:
array1: [0, 3, 8, 0]
array2: [1, 1, 1, 1]
output: [0, 2, 7, 0]
array1: [1, 1, 1, 1]
array2: [0, 3, 8, 0]
output: [1, 0, 0, 1]
array1: [10, 10, 10, 10]
array2: [8, 12, 8, 12]
output: [2, 0, 2, 0]
I would like to be able to do this with say, a single numpy.copyto statement, but I don't know how. Thank you.
edit:
it just hit me. could I do:
new_array = np.zeros(size_of_array1)
numpy.copyto(new_array, array1-array2, where=array1>array2)
Edit 2: Since I have received several answers very quickly I am going to time the different answers against each other to see how they do. Be back with results in a few minutes.
Okay, results are in:
array of random ints 0 to 5, size = 10,000, 10 loops
1.)using my np.copyto method
2.)using clip
3.)using maximum
0.000768184661865
0.000391960144043
0.000403165817261
Kasramvd also provided some useful timings below
You can use a simple subtraction and clipping the result with zero as the min:
(arr1 - arr2).clip(min=0)
Demo:
In [43]: arr1 = np.array([0,3,8,0]); arr2 = np.array([1,1,1,1])
In [44]: (arr1 - arr2).clip(min=0)
Out[44]: array([0, 2, 7, 0])
On large arrays it's also faster than maximum approach:
In [51]: arr1 = np.arange(10000); arr2 = np.arange(10000)
In [52]: %timeit np.maximum(0, arr1 - arr2)
22.3 µs ± 1.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [53]: %timeit (arr1 - arr2).clip(min=0)
20.9 µs ± 167 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [54]: arr1 = np.arange(100000); arr2 = np.arange(100000)
In [55]: %timeit np.maximum(0, arr1 - arr2)
671 µs ± 5.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [56]: %timeit (arr1 - arr2).clip(min=0)
648 µs ± 4.43 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Note that if it's possible for arr2 to have negative values you should consider using an abs function on arr2 to get the expected result:
(arr1 - abs(arr2)).clip(min=0)
In [73]: np.maximum(0,np.array([0,3,8,0])-np.array([1,1,1,1]))
Out[73]: array([0, 2, 7, 0])
This doesn't explicitly address
If either array has 0 for the given index, then the index is left unchanged.
but the results match for all examples:
In [74]: np.maximum(0,np.array([1,1,1,1])-np.array([0,3,8,0]))
Out[74]: array([1, 0, 0, 1])
In [75]: np.maximum(0,np.array([10,10,10,10])-np.array([8,12,8,12]))
Out[75]: array([2, 0, 2, 0])
You can first simply subtract the arrays and then use boolean array indexing on the subtracted result to assign 0 where there are negative values as in:
# subtract
In [43]: subtracted = arr1 - arr2
# get a boolean mask by checking for < 0
# index into the array and assign 0
In [44]: subtracted[subtracted < 0] = 0
In [45]: subtracted
Out[45]: array([0, 2, 7, 0])
Applying the same for the other inputs specified by OP:
In [46]: arr1 = np.array([1, 1, 1, 1])
...: arr2 = np.array([0, 3, 8, 0])
In [47]: subtracted = arr1 - arr2
In [48]: subtracted[subtracted < 0] = 0
In [49]: subtracted
Out[49]: array([1, 0, 0, 1])
And for the third input arrays:
In [50]: arr1 = np.array([10, 10, 10, 10])
...: arr2 = np.array([8, 12, 8, 12])
In [51]: subtracted = arr1 - arr2
In [52]: subtracted[subtracted < 0] = 0
In [53]: subtracted
Out[53]: array([2, 0, 2, 0])