I've been trying to subtract (and combine, multiply, etc.) two NumPy arrays so that the resulting array keeps values only in the places where the other array has no data.
For example, if I have arrays a and b, a-b would give c:
a = np.array([0,2,3,0])
b = np.array([1,0,3,0])
c = np.array([0,2,0,0])
I've already tried multiplying b by a very large number, but then I couldn't figure out how to get rid of the negative values. There is also the complication that the arrays a and b mark missing values as -999.
Help would be much appreciated! Thanks!
How about this?
>>> a = np.array([0,2,3,0])
>>> b = np.array([1,0,3,0])
>>> a[b!=0] = 0
>>> a
array([0, 2, 0, 0])
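Note that the snippet above overwrites a in place. If you'd rather keep a intact, a non-destructive variant (a small sketch of the same masking idea) builds a new array with np.where:

```python
import numpy as np

a = np.array([0, 2, 3, 0])
b = np.array([1, 0, 3, 0])

# Keep a's values only where b has no data (is zero); zero out the rest.
c = np.where(b != 0, 0, a)
# c is [0, 2, 0, 0]; a is unchanged
```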
I have two 4D matrices, which I would like to add. The matrices have the exact same dimension and number of elements, but they both contain randomly distributed NaN values.
I would prefer to add them as below using numpy.nansum.
(1) if two values are added I want the sum to be a value,
(2) if a value and a NaN are added I want the sum to be the value and
(3) if two NaN are added I want the sum to be NaN.
Here is what I tried:
a[6x7x180x360]
b[6x7x180x360]
C=np.nansum[(a,b)]
C=np.nansum(np.dstack((a,b)),2)
But I am unable to get a resultant matrix with the same dimensions as the inputs; the result C should have shape [6x7x180x360].
Can anyone help with this? Thank you in advance.
You could use np.stack((a,b)) to stack along a new 0-axis, then call nansum to sum along that 0-axis:
C = np.nansum(np.stack((a,b)), axis=0)
For example,
In [34]: a = np.random.choice([1,2,3,np.nan], size=(6,7,180,360))
In [35]: b = np.random.choice([1,2,3,np.nan], size=(6,7,180,360))
In [36]: np.stack((a,b)).shape
Out[36]: (2, 6, 7, 180, 360)
In [37]: np.nansum(np.stack((a,b)), axis=0).shape
Out[37]: (6, 7, 180, 360)
You had the right idea, but np.dstack stacks along the third axis, which is not desirable here since you already have 4 axes:
In [31]: np.dstack((a,b)).shape
Out[31]: (6, 7, 360, 360)
Regarding your point (3):
Note that the behavior of np.nansum depends on the NumPy version:
In NumPy versions <= 1.8.0, NaN is returned for slices that are all-NaN or
empty. In later versions zero is returned.
If you are using NumPy version > 1.8.0, then you may have to use a solution such as
Maarten Fabré's to address this issue.
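To see how rule (3) breaks on recent NumPy, note that an all-NaN slice sums to zero rather than NaN:

```python
import numpy as np

# On NumPy >= 1.9, nansum over an all-NaN slice returns 0.0, not NaN:
s = np.nansum(np.array([np.nan, np.nan]))
# s is 0.0
```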
I believe the function np.nansum is not appropriate in your case. If I understand your question correctly, you wish to do an element-wise addition of two matrices with a little logic regarding the NaN values.
Here is the full example on how to do it:
import numpy as np
a = np.array([[np.nan, 2],
              [3, np.nan]])
b = np.array([[3, np.nan],
              [1, np.nan]])
result = np.add(a,b)
a_is_nan = np.isnan(a)
b_is_nan = np.isnan(b)
result_is_nan = np.isnan(result)
mask_a = np.logical_and(result_is_nan, np.logical_not(a_is_nan))
result[mask_a] = a[mask_a]
mask_b = np.logical_and(result_is_nan, np.logical_not(b_is_nan))
result[mask_b] = b[mask_b]
print(result)
A little bit of explanation:
The first operation is np.add(a,b). This adds both matrices and any NaN element will produce a result of NaN also.
To select the NaN values from either arrays, we use a logical mask:
# result_is_nan is a boolean array containing True wherever the result is NaN. This occurs when either of the two elements was NaN
result_is_nan = np.isnan(result)
# mask_a is a boolean array which 'flags' elements that are NaN in result but were not NaN in a!
mask_a = np.logical_and(result_is_nan, np.logical_not(a_is_nan))
# Using that mask, we assign those value to result
result[mask_a] = a[mask_a]
There you have it!
I think the easiest way is to use np.where
result = np.where(
np.isnan(a+b),
np.where(np.isnan(a), b, a),
a+b
)
This reads as:
if a+b is not NaN, use a+b; otherwise use a, unless a is itself NaN, in which case use b. Whether or not b is NaN is of little consequence then.
Alternatively, you can use it like this:
result2 = np.where(
np.isnan(a) & np.isnan(b),
np.nan,
np.nansum(np.stack((a,b)), axis=0)
)
np.testing.assert_equal(result, result2) passes
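Putting this together on a small 1-D example (values chosen arbitrarily) shows all three rules at once:

```python
import numpy as np

a = np.array([1.0, np.nan, np.nan, 4.0])
b = np.array([2.0, 3.0, np.nan, np.nan])

s = a + b
result = np.where(np.isnan(s), np.where(np.isnan(a), b, a), s)
# rule (1): 1 + 2     -> 3.0
# rule (2): nan + 3   -> 3.0, and 4 + nan -> 4.0
# rule (3): nan + nan -> nan
```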
I would like to take the two smallest values from an array x. But when I use np.where:
A,B = np.where(x == x.min())[0:1]
I get this error:
ValueError: need more than 1 value to unpack
How can I fix this error? And do I need to arrange the numbers in ascending order in the array?
You can use numpy.partition to get the lowest k+1 items:
A, B = np.partition(x, 1)[0:2] # k=1, so the first two are the smallest items
In Python 3.x you could also use:
A, B, *_ = np.partition(x, 1)
For example:
import numpy as np
x = np.array([5, 3, 1, 2, 6])
A, B = np.partition(x, 1)[0:2]
print(A) # 1
print(B) # 2
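If what the np.where attempt was really reaching for is the indices of the two smallest values rather than the values themselves, np.argpartition gives those without fully sorting; a sketch:

```python
import numpy as np

x = np.array([5, 3, 1, 2, 6])
idx = np.argpartition(x, 1)[:2]   # positions of the two smallest values
A, B = x[idx]                     # the two smallest values themselves
# idx contains {2, 3}; the values are {1, 2}
```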
How about using sorted instead of np.where?
A,B = sorted(x)[:2]
There are two errors in the code. The first is that the slice is [0:1] when it should be [0:2]. The second is actually a very common issue with np.where. If you look into the documentation, you will see that it always returns a tuple, with one element if you only pass one parameter. Hence you have to access the tuple element first and then index the array normally:
A,B = np.where(x == x.min())[0][0:2]
This will give you the first two indices containing the minimum value. If no two such indices exist, you will get an exception, so you may want to check for that.
I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1, 2, 3],
         [1, 2, 3]],
        [[4, 5, 6],
         [4, 5, 6]]],
       [[[7, 8, 9],
         [1, 2, 3]],
        [[7, 8, 9],
         [4, 5, 6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries, and b is 100-by-100, I'd like to avoid for loops.
Actually, this works:
b[a[:, :, 0],a[:, :, 1]]
Gives array([[1, 5],
[7, 8]]).
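An equivalent that doesn't hard-code the two slices, and works for any number of leading axes in a, moves the index-pair axis to the front and unpacks it as a tuple of index arrays (a sketch):

```python
import numpy as np

a = np.array([[[0, 0], [1, 1]], [[2, 0], [2, 1]]])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Move the last axis (the index pairs) to the front, then unpack it
# into a tuple of index arrays: equivalent to b[a[..., 0], a[..., 1]].
result = b[tuple(np.moveaxis(a, -1, 0))]
# result is [[1, 5], [7, 8]]
```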
For this case, this works (though note it returns the result flattened to 1-D):
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]]
A more general solution, whenever you want to use a 2D array of indices of shape (n,m) with arbitrarily large dimension m, named inds, in order to access elements of another 2D array of shape (n,k), named B:
# array of row offsets into the flattened B (one offset per row of inds)
offset = B.shape[1] * np.arange(inds.shape[0])
# numpy.take(B, C) "flattens" arrays B and C and selects elements from B based on indices in C
Result = np.take(B, offset[:,np.newaxis]+inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
You can test this by using, e.g.:
B = 1/(np.arange(n*m).reshape(n,-1) + 1)
inds = np.random.randint(0,B.shape[1],(B.shape[0],B.shape[1]))
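For a fully self-contained check of the expand_dims form, here is a small concrete run (dimensions and indices chosen arbitrarily):

```python
import numpy as np

n, k = 3, 5
B = np.arange(n * k).reshape(n, k)          # rows 0..2, columns 0..4
inds = np.array([[0, 4], [1, 1], [3, 2]])   # column indices for each row

# Broadcasting pairs each row number with its column indices.
picked = B[np.expand_dims(np.arange(n), -1), inds]
# picked is [[0, 4], [6, 6], [13, 12]]
```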
I have a problem using multi-dimensional vectors as indices for multi-dimensional vectors. Say I have C.ndim == idx.shape[0], then I want C[idx] to give me a single element. Allow me to explain with a simple example:
A = arange(0,10)
B = 10+A
C = array([A.T, B.T])
C = C.T
idx = array([3,1])
Now, C[3] gives me row 3, and C[1] gives me row 1. C[idx] then will give me a vstack of both rows. However, I need to get C[3,1]. How would I achieve that given arrays C, idx?
/edit:
An answer suggested tuple(idx). This works perfectly for a single idx. But:
Let's take it to the next level: say INDICES is a vector where I have stacked vertically arrays of shape idx. tuple(INDICES) will give me one long tuple, so C[tuple(INDICES)] won't work. Is there a clean way of doing this or will I need to iterate over the rows?
If you convert idx to a tuple, it'll be interpreted as basic and not advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
[7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])
I have a two dimensional numpy array.
Each row is three elements long, and each element is an integer 0-3. Together they represent a 6-bit integer, with each cell contributing two bits, in order.
I'm trying to transform them into the full integer.
E.g.
for i in range(len(myarray)):
    myarray[i] = myarray[i][0] * 16 + myarray[i][1] * 4 + myarray[i][2]
That is, I'm trying to sum each row according to the weight vector [16, 4, 1].
What is the most elegant way to do this? I'm thinking I have to do some sort of dot product followed by a sum, but I'm not 100% confident where to do the dot.
The dot product inclination is correct, and that includes the sum you need. So, to get the sum of the products of the elements of a target array and a set of weights:
>>> a = np.array([[0,1,2],[2,2,3]])
>>> a
array([[0, 1, 2],
[2, 2, 3]])
>>> weights = np.array([16,4,1])
>>> np.dot(a,weights)
array([ 6, 43])
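With the question's weight vector [16, 4, 1], the same weighted row sum can be written a few equivalent ways (a sketch):

```python
import numpy as np

a = np.array([[0, 1, 2], [2, 2, 3]])
weights = np.array([16, 4, 1])

r1 = np.dot(a, weights)           # classic dot product
r2 = a @ weights                  # matmul operator (Python 3.5+)
r3 = (a * weights).sum(axis=1)    # broadcast-multiply, then sum each row
# all three give [6, 43]
```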