I have two numpy arrays NS, EW to sum up. Each of them has missing values at different positions, like
NS = array([[ 1.,  2., nan],
            [ 4.,  5., nan],
            [ 6., nan, nan]])
EW = array([[ 1.,  2., nan],
            [ 4., nan, nan],
            [ 6., nan,  9.]])
How can I perform this summation the NumPy way, treating nan as zero where only one array has nan at a location, and keeping nan where both arrays have nan at the same location?
The result I expect to see is
SUM = array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
When I try
SUM=np.add(NS,EW)
it gives me
SUM=array([[ 2., 4., nan],
[ 8., nan, nan],
[ 12., nan, nan]])
When I try
SUM = np.nansum(np.dstack((NS,EW)),2)
it gives me
SUM=array([[ 2., 4., 0.],
[ 8., 5., 0.],
[ 12., 0., 9.]])
Of course, I can achieve my goal with element-wise operations,

SUM = np.empty(NS.shape)
for i in range(np.size(NS, 0)):
    for j in range(np.size(NS, 1)):
        if np.isnan(NS[i, j]) and np.isnan(EW[i, j]):
            SUM[i, j] = np.nan
        elif np.isnan(NS[i, j]):
            SUM[i, j] = EW[i, j]
        elif np.isnan(EW[i, j]):
            SUM[i, j] = NS[i, j]
        else:
            SUM[i, j] = NS[i, j] + EW[i, j]
but it is very slow. So I'm looking for a vectorized NumPy solution to this problem.
Thanks in advance for any help!
Approach #1 : One approach with np.where -
def sum_nan_arrays(a, b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    return np.where(ma & mb, np.nan, np.where(ma, 0, a) + np.where(mb, 0, b))
Sample run -
In [43]: NS
Out[43]:
array([[ 1., 2., nan],
[ 4., 5., nan],
[ 6., nan, nan]])
In [44]: EW
Out[44]:
array([[ 1., 2., nan],
[ 4., nan, nan],
[ 6., nan, 9.]])
In [45]: sum_nan_arrays(NS, EW)
Out[45]:
array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
Approach #2 : Probably a faster one with a mix of boolean-indexing -
def sum_nan_arrays_v2(a, b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    m_keep_a = ~ma & mb
    m_keep_b = ma & ~mb
    out = a + b
    out[m_keep_a] = a[m_keep_a]
    out[m_keep_b] = b[m_keep_b]
    return out
Runtime test -
In [140]: # Setup input arrays with 4/9 ratio of NaNs (same as in the question)
...: a = np.random.rand(3000,3000)
...: b = np.random.rand(3000,3000)
...: a.ravel()[np.random.choice(range(a.size), size=4000000, replace=0)] = np.nan
...: b.ravel()[np.random.choice(range(b.size), size=4000000, replace=0)] = np.nan
...:
In [141]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[141]: 0.0
In [142]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 141 ms per loop
In [143]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 177 ms per loop
In [144]: # Setup input arrays with lesser NaNs
...: a = np.random.rand(3000,3000)
...: b = np.random.rand(3000,3000)
...: a.ravel()[np.random.choice(range(a.size), size=4000, replace=0)] = np.nan
...: b.ravel()[np.random.choice(range(b.size), size=4000, replace=0)] = np.nan
...:
In [145]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify
Out[145]: 0.0
In [146]: %timeit sum_nan_arrays(a, b)
10 loops, best of 3: 69.6 ms per loop
In [147]: %timeit sum_nan_arrays_v2(a, b)
10 loops, best of 3: 38 ms per loop
Actually your nansum approach almost worked, you just need to add in the nans again:
def add_ignore_nans(a, b):
    stacked = np.array([a, b])
    res = np.nansum(stacked, axis=0)
    res[np.all(np.isnan(stacked), axis=0)] = np.nan
    return res
>>> add_ignore_nans(a, b)
array([[ 2., 4., nan],
[ 8., 5., nan],
[ 12., nan, 9.]])
This will be slower than @Divakar's answer, but I wanted to mention that you were pretty close already! :-)
I think we can get a bit more concise, in the same vein as Divakar's second approach. With a = NS and b = EW:
na = numpy.isnan(a)
nb = numpy.isnan(b)
a[na] = 0
b[nb] = 0
a += b
na &= nb
a[na] = numpy.nan
The operations are done in-place where possible to save memory, assuming this is feasible in your scenario. The final result is in a.
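If the original arrays must be preserved, here is a minimal sketch (my addition, not part of the answer above) that wraps the same steps in a function operating on copies, trading the memory savings for safety:

import numpy

def nansum_pair(a, b):
    # Work on copies so the caller's arrays stay intact.
    a = a.copy()
    b = b.copy()
    na = numpy.isnan(a)
    nb = numpy.isnan(b)
    a[na] = 0
    b[nb] = 0
    a += b
    na &= nb          # True only where both inputs were NaN
    a[na] = numpy.nan
    return a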
I'm looking for the fastest way to calculate the squared difference between two vectors ((x1-x2)**2), but pairwise (all combinations or only the upper triangle).
x1 = [1,3,5,6,8]
x2 = [3,6,7,9,12]
Expected output:
array([[ 4., 25., 36., 64., 121.],
[ 0., 9., 16., 36., 81.],
[ 4., 1., 4., 16., 49.],
[ 9., 0., 1., 9., 36.],
[ 25., 4., 1., 1., 16.]])
or
array([[ 4., 25., 36., 64., 121.],
[ 0., 9., 16., 36., 81.],
[ 0., 0., 4., 16., 49.],
[ 0., 0., 0., 9., 36.],
[ 0., 0., 0., 0., 16.]])
or even (if faster):
array([ 4., 25., 36., 64., 121., 9., 16., 36., 81.,
4., 1., 4., 16., 49., 9., 1., 9., 36.,
25., 4., 1., 1., 16.])
Here's one with broadcasting and masking to get the upper triangular ones, squaring only those for better performance -
def pairwise_squared_diff(x1, x2):
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    diffs = x1[:,None] - x2
    mask = np.arange(len(x1))[:,None] <= np.arange(len(x2))
    return (diffs[mask])**2
Sample run -
In [85]: x1
Out[85]: array([1, 3, 5, 6, 8])
In [86]: x2
Out[86]: array([ 3, 6, 7, 9, 12])
In [87]: pairwise_squared_diff(x1, x2)
Out[87]:
array([ 4, 25, 36, 64, 121, 9, 16, 36, 81, 4, 16, 49, 9,
36, 16])
Possible improvements
Improvement #1 :
We could also use np.tri to generate mask -
mask = ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
Improvement #2 :
If we are okay with a 2D output with the lower triangular ones set as 0s, then a simple element-wise multiplication with mask gets us the final output too -
(diffs*mask)**2
This would work well with the numexpr module for large data, gaining memory efficiency and hence performance.
Improvement #3 :
We could also compute the differences with numexpr and hence compute the masked output with the same evaluate method, giving us a new solution altogether -
import numexpr as ne

def pairwise_squared_diff_numexpr(x1, x2):
    x1 = np.asarray(x1)
    x2 = np.asarray(x2)
    mask = ~np.tri(len(x1), len(x2), dtype=bool, k=-1)
    # Pass all names explicitly so numexpr doesn't rely on frame lookups
    return ne.evaluate('mask*((x1D-x2)**2)',
                       {'x1D': x1[:,None], 'x2': x2, 'mask': mask})
Timings with improvements
Let's study the performance of these suggestions on large arrays -
Setup :
In [136]: x1 = np.random.randint(0,9,(1000))
In [137]: x2 = np.random.randint(0,9,(1000))
With Improvement #1 :
In [138]: %timeit np.arange(len(x1))[:,None] <= np.arange(len(x2))
1000 loops, best of 3: 772 µs per loop
In [139]: %timeit ~np.tri(len(x1),len(x2),dtype=bool,k=-1)
1000 loops, best of 3: 243 µs per loop
With Improvement #2 :
In [140]: import numexpr as ne
In [141]: diffs = x1[:,None] - x2
...: mask = np.arange(len(x1))[:,None] <= np.arange(len(x2))
In [142]: %timeit (diffs[mask])**2
1000 loops, best of 3: 1.46 ms per loop
In [143]: %timeit ne.evaluate('(diffs*mask)**2')
1000 loops, best of 3: 1.05 ms per loop
With Improvement #3 on complete solutions :
In [170]: %timeit pairwise_squared_diff(x1, x2)
100 loops, best of 3: 3.66 ms per loop
In [171]: %timeit pairwise_squared_diff_numexpr(x1, x2)
1000 loops, best of 3: 1.54 ms per loop
Loopy one
For completeness, here's a loopy one that leverages slicing to perform better than the pure broadcasting one, owing to its memory efficiency -
def pairwise_squared_diff_loopy(x1, x2):
    n = len(x2)
    idx = np.concatenate(([0], np.arange(n, 0, -1).cumsum()))
    start, stop = idx[:-1], idx[1:]
    L = n*(n+1)//2
    out = np.empty(L, dtype=np.result_type(x1, x2))
    for i, (s0, s1) in enumerate(zip(start, stop)):
        out[s0:s1] = x1[i] - x2[i:]
    return out**2
Timings -
In [300]: x1 = np.random.randint(0,9,(1000))
...: x2 = np.random.randint(0,9,(1000))
In [301]: %timeit pairwise_squared_diff(x1, x2)
100 loops, best of 3: 3.44 ms per loop
In [302]: %timeit pairwise_squared_diff_loopy(x1, x2)
100 loops, best of 3: 2.73 ms per loop
You can use broadcasting:
x1 = np.asarray([1,3,5,6,8]).reshape(-1, 1)
x2 = np.asarray([3,6,7,9,12]).reshape(1, -1)
(x1 - x2)**2
Output:
array([[ 4, 25, 36, 64, 121],
[ 0, 9, 16, 36, 81],
[ 4, 1, 4, 16, 49],
[ 9, 0, 1, 9, 36],
[ 25, 4, 1, 1, 16]])
which is simple to code, but computes all values, so it may be optimized to compute only the upper triangle.
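For instance, here is a minimal sketch (my addition, assuming the upper triangle including the diagonal is what's wanted) that computes only those entries via np.triu_indices:

import numpy as np

x1 = np.asarray([1, 3, 5, 6, 8])
x2 = np.asarray([3, 6, 7, 9, 12])

r, c = np.triu_indices(len(x1))  # row/column indices of the upper triangle
upper = (x1[r] - x2[c])**2       # flat array of just the upper-triangle values
# array([  4,  25,  36,  64, 121,   9,  16,  36,  81,   4,  16,  49,   9,  36,  16])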
I have a numpy vector, and a numpy array.
I need to take from every row in the matrix the first N (let's say 3) values that are smaller than (or equal to) the corresponding entry in the vector.
so if this is my vector:
7,
9,
22,
38,
6,
15
and this is my matrix:
[[ 20., 9., 7., 5., None, None],
[ 33., 21., 18., 9., 8., 7.],
[ 31., 21., 13., 12., 4., 0.],
[ 36., 18., 11., 7., 7., 2.],
[ 20., 14., 10., 6., 6., 3.],
[ 14., 14., 13., 11., 5., 5.]]
the output should be:
[[7, 5, None],
 [9, 8, 7],
 [21, 13, 12],
 [36, 18, 11],
 [6, 6, 3],
 [14, 14, 13]]
Is there any efficient way to do that with masks or something, without an ugly for loop?
Any help will be appreciated!
Approach #1
Here's one with broadcasting -
def takeN_le_per_row_broadcasting(a, b, N=3):  # a, b : 1D, 2D arrays respectively
    # Index of the first column in each row of b that is <= the corresponding element of a
    idx = (b <= a[:,None]).argmax(1)
    # Build the N consecutive column indices starting at each of those positions
    all_idx = idx[:,None] + np.arange(N)
    # Finally advanced-index with those indices into b for the desired output
    return b[np.arange(len(all_idx))[:,None], all_idx]
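A sample run on the question's data (my addition; I've substituted np.nan for None so the matrix stays a float array, and comparisons against NaN come out False, which is what we want here):

import numpy as np

a = np.array([7, 9, 22, 38, 6, 15])
b = np.array([[20.,  9.,  7.,  5., np.nan, np.nan],
              [33., 21., 18.,  9.,  8.,  7.],
              [31., 21., 13., 12.,  4.,  0.],
              [36., 18., 11.,  7.,  7.,  2.],
              [20., 14., 10.,  6.,  6.,  3.],
              [14., 14., 13., 11.,  5.,  5.]])

takeN_le_per_row_broadcasting(a, b)
# array([[ 7.,  5., nan],
#        [ 9.,  8.,  7.],
#        [21., 13., 12.],
#        [36., 18., 11.],
#        [ 6.,  6.,  3.],
#        [14., 14., 13.]])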
Approach #2
Inspired by the solution to NumPy Fancy Indexing - Crop different ROIs from different channels, we can leverage np.lib.stride_tricks.as_strided (via scikit-image's view_as_windows wrapper) for efficient patch extraction, like so -
from skimage.util.shape import view_as_windows

def takeN_le_per_row_strides(a, b, N=3):  # a, b : 1D, 2D arrays respectively
    # Index of the first column in each row of b that is <= the corresponding element of a
    idx = (b <= a[:,None]).argmax(1)
    # Get 1D sliding windows of length N for each row of b
    w = view_as_windows(b, (1, N))[:,:,0]
    # Use fancy/advanced indexing to select the required windows
    return w[np.arange(len(idx)), idx]
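As a quick check (my addition, reusing the a and b from the sample run above), both approaches should agree; note that equal_nan requires NumPy >= 1.19:

np.array_equal(takeN_le_per_row_broadcasting(a, b),
               takeN_le_per_row_strides(a, b), equal_nan=True)
# True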
I am a little confused by the broadcasting rules of numpy. Suppose you want to perform an axis-wise scalar product of a higher-dimensional array to reduce the array dimension by one (basically, a weighted summation along one axis):
from numpy import *
A = ones((3,3,2))
v = array([1,2])
B = zeros((3,3))
# V01: this works
B[0,0] = v.dot(A[0,0])
# V02: this works
B[:,:] = v[0]*A[:,:,0] + v[1]*A[:,:,1]
# V03: this doesn't
B[:,:] = v.dot(A[:,:])
Why does V03 not work?
Cheers
np.dot(a, b) operates over the last axis of a and the second-to-last of b. So for the particular case in your question, you could always go with:
>>> a.dot(v)
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
If you want to keep the v.dot(a) order, you need to get the axis into position, which can easily be achieved with np.rollaxis:
>>> v.dot(np.rollaxis(a, 2, 1))
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
I don't like np.dot too much, unless it is for the obvious matrix or vector multiplication, because it is very strict about the output dtype when using the optional out parameter. Joe Kington has mentioned it already, but if you are going to be doing this type of thing, get used to np.einsum: once you get the hang of the syntax, it will cut the time you spend worrying about reshaping things to a minimum:
>>> a = np.ones((3, 3, 2))
>>> np.einsum('i, jki', v, a)
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
Not that it is too relevant in this case, but it is also ridiculously fast:
In [4]: %timeit a.dot(v)
100000 loops, best of 3: 2.43 us per loop
In [5]: %timeit v.dot(np.rollaxis(a, 2, 1))
100000 loops, best of 3: 4.49 us per loop
In [7]: %timeit np.tensordot(v, a, axes=(0, 2))
100000 loops, best of 3: 14.9 us per loop
In [8]: %timeit np.einsum('i, jki', v, a)
100000 loops, best of 3: 2.91 us per loop
You can also use tensordot in this particular case.
import numpy as np
A = np.ones((3,3,2))
v = np.array([1,2])
print(np.tensordot(v, A, axes=(0, 2)))
This yields:
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
The axes=(0,2) indicates that tensordot should sum over the first axis in v and the third axis in A. (Also have a look at einsum, which is more flexible, but harder to understand if you're not used to the notation.)
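As an aside (my addition, not part of the original answer), the same contraction can be spelled in einsum with an explicit output subscript, which makes the summed axis easy to see:

np.einsum('k,ijk->ij', v, A)  # contract v with the third axis of A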
If speed is a consideration, tensordot is considerably faster than using apply_along_axis for small arrays.
In [14]: A = np.ones((3,3,2))
In [15]: v = np.array([1,2])
In [16]: %timeit np.tensordot(v, A, axes=(0, 2))
10000 loops, best of 3: 21.6 us per loop
In [17]: %timeit np.apply_along_axis(v.dot, 2, A)
1000 loops, best of 3: 258 us per loop
(The difference is less apparent for large arrays due to a constant overhead, though tensordot is consistently faster.)
You could use numpy.apply_along_axis() for this:
In [35]: np.apply_along_axis(v.dot, 2, A)
Out[35]:
array([[ 3., 3., 3.],
[ 3., 3., 3.],
[ 3., 3., 3.]])
The reason I think V03 doesn't work is that it's no different to:
B[:,:] = v.dot(A)
i.e. np.dot tries to contract v with the second-to-last axis of A, whose length (3) does not match v's length (2), hence the shape-mismatch error.
I have a NumPy array, something like [ a b c ].
And then I want to concatenate it with another NumPy array (just like we create a list of lists). How do we create a NumPy array containing NumPy arrays?
I tried to do the following without any luck
>>> M = np.array([])
>>> M
array([], dtype=float64)
>>> M.append(a,axis=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'append'
>>> a
array([1, 2, 3])
In [1]: import numpy as np
In [2]: a = np.array([[1, 2, 3], [4, 5, 6]])
In [3]: b = np.array([[9, 8, 7], [6, 5, 4]])
In [4]: np.concatenate((a, b))
Out[4]:
array([[1, 2, 3],
[4, 5, 6],
[9, 8, 7],
[6, 5, 4]])
or this:
In [1]: a = np.array([1, 2, 3])
In [2]: b = np.array([4, 5, 6])
In [3]: np.vstack((a, b))
Out[3]:
array([[1, 2, 3],
[4, 5, 6]])
Well, the error message says it all: NumPy arrays do not have an append() method. There's a free function numpy.append() however:
numpy.append(M, a)
This will create a new array instead of mutating M in place. Note that using numpy.append() involves copying both arrays. You will get better-performing code if you use fixed-sized NumPy arrays.
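For illustration (my addition, building on the snippet from the question), remember to rebind the result since the input is not modified:

>>> import numpy as np
>>> M = np.array([])
>>> a = np.array([1, 2, 3])
>>> M = np.append(M, a)
>>> M
array([ 1.,  2.,  3.])

(The result is float64 because the empty M was created with that dtype.)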
You may use numpy.append()...
import numpy
B = numpy.array([3])
A = numpy.array([1, 2, 2])
B = numpy.append(B, A)
print(B)
> [3 1 2 2]
This will not create two separate arrays; it appends the two arrays into a single one-dimensional array.
Sven said it all; just be very cautious about the automatic type promotion that happens when append is called:
In [2]: import numpy as np
In [3]: a = np.array([1,2,3])
In [4]: b = np.array([1.,2.,3.])
In [5]: c = np.array(['a','b','c'])
In [6]: np.append(a,b)
Out[6]: array([ 1., 2., 3., 1., 2., 3.])
In [7]: a.dtype
Out[7]: dtype('int64')
In [8]: np.append(a,c)
Out[8]:
array(['1', '2', '3', 'a', 'b', 'c'],
      dtype='|S1')
As you can see from the contents, the dtype went from int64 to float64, and then to S1.
I found this link while looking for something slightly different, how to start appending array objects to an empty numpy array, but tried all the solutions on this page to no avail.
Then I found this question and answer: How to add a new row to an empty numpy array
The gist here:
The way to "start" the array that you want is:
arr = np.empty((0,3), int)
Then you can use concatenate to add rows like so:
arr = np.concatenate((arr, [[x, y, z]]), axis=0)
See also https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
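For instance (my addition, with placeholder values standing in for x, y, z):

import numpy as np

arr = np.empty((0, 3), int)
arr = np.concatenate((arr, [[1, 2, 3]]), axis=0)
arr = np.concatenate((arr, [[4, 5, 6]]), axis=0)
# arr is now:
# array([[1, 2, 3],
#        [4, 5, 6]])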
Actually one can always create an ordinary list of numpy arrays and convert it later.
In [1]: import numpy as np
In [2]: a = np.array([[1,2],[3,4]])
In [3]: b = np.array([[1,2],[3,4]])
In [4]: l = [a]
In [5]: l.append(b)
In [6]: l = np.array(l)
In [7]: l.shape
Out[7]: (2, 2, 2)
In [8]: l
Out[8]:
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]])
I had the same issue, and I couldn't comment on @Sven Marnach's answer (not enough rep; gosh, I remember when Stack Overflow first started...) anyway.
Here's an example that builds a 10 x 10 matrix by stacking lists of random numbers:
myNpArray = np.zeros([1, 10])
for x in range(1, 11, 1):
    randomList = [list(np.random.randint(99, size=10))]
    myNpArray = np.vstack((myNpArray, randomList))
myNpArray = myNpArray[1:]
Using np.zeros(), a 1 x 10 array of zeros is created.
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Then a list of 10 random numbers is created using np.random and assigned to randomList.
The loop stacks it 10 high. We just have to remember to remove the initial row of zeros.
myNpArray
array([[31., 10., 19., 78., 95., 58., 3., 47., 30., 56.],
[51., 97., 5., 80., 28., 76., 92., 50., 22., 93.],
[64., 79., 7., 12., 68., 13., 59., 96., 32., 34.],
[44., 22., 46., 56., 73., 42., 62., 4., 62., 83.],
[91., 28., 54., 69., 60., 95., 5., 13., 60., 88.],
[71., 90., 76., 53., 13., 53., 31., 3., 96., 57.],
[33., 87., 81., 7., 53., 46., 5., 8., 20., 71.],
[46., 71., 14., 66., 68., 65., 68., 32., 9., 30.],
[ 1., 35., 96., 92., 72., 52., 88., 86., 94., 88.],
[13., 36., 43., 45., 90., 17., 38., 1., 41., 33.]])
So in a function:
def array_matrix(random_range, array_size):
    myNpArray = np.zeros([1, array_size])
    for x in range(1, array_size + 1, 1):
        randomList = [list(np.random.randint(random_range, size=array_size))]
        myNpArray = np.vstack((myNpArray, randomList))
    return myNpArray[1:]
A 7 x 7 array using random numbers 0-999 (np.random.randint excludes the upper bound):
array_matrix(1000, 7)
array([[621., 377., 931., 180., 964., 885., 723.],
[298., 382., 148., 952., 430., 333., 956.],
[398., 596., 732., 422., 656., 348., 470.],
[735., 251., 314., 182., 966., 261., 523.],
[373., 616., 389., 90., 884., 957., 826.],
[587., 963., 66., 154., 111., 529., 945.],
[950., 413., 539., 860., 634., 195., 915.]])
Try this code:

import numpy as np

a1 = np.array([])
n = int(input())
for i in range(n):
    a = int(input())
    a1 = np.append(a1, a)  # np.append returns a new array, so rebind a1
print(a1)

You can also append a whole array in place of the scalar a.
If I understand your question, here's one way. Say you have:
a = [4.1, 6.21, 1.0]
so here's some code...
def array_in_array(scalarlist):
    return [(x,) for x in scalarlist]
Which leads to:
In [72]: a = [4.1, 6.21, 1.0]
In [73]: a
Out[73]: [4.1, 6.21, 1.0]
In [74]: def array_in_array(scalarlist):
   ....:     return [(x,) for x in scalarlist]
   ....:
In [75]: b = array_in_array(a)
In [76]: b
Out[76]: [(4.1,), (6.21,), (1.0,)]
As you want to concatenate along an existing axis (row-wise), np.vstack or np.concatenate will work for you.
For a detailed list of concatenation operations, refer to the official docs.
This is for people working with numpy's ndarrays. The function numpy.concatenate() does work as well.
>>> a = np.random.randint(0, 9, size=(10, 1, 5, 4))
>>> a.shape
(10, 1, 5, 4)
>>> b = np.random.randint(0, 9, size=(15, 1, 5, 4))
>>> b.shape
(15, 1, 5, 4)
>>> X = np.concatenate((a, b))
>>> X.shape
(25, 1, 5, 4)
Much the same way as vstack():
>>> Y = np.vstack((a, b))
>>> Y.shape
(25, 1, 5, 4)
There are a handful of methods to stack arrays together, depending on the direction of the stack.
You may consider np.stack() (doc), np.vstack() (doc) and np.hstack() (doc), for example.
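A minimal illustration (my addition) of how the three differ for two 1-D arrays:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

np.vstack((a, b))  # shape (2, 3): arrays stacked as rows
np.hstack((a, b))  # shape (6,):   arrays joined end to end
np.stack((a, b))   # shape (2, 3): arrays joined along a new leading axis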