I need to change all NaNs of a matrix to a different value. I can easily get the NaN positions using argwhere, but I am then not sure how to access those positions programmatically. Here is my nonworking code:
myMatrix = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
nanPositions = np.argwhere(np.isnan(myMatrix))
maxVal = np.nanmax(abs(myMatrix))
for pos in nanPositions:
    myMatrix[pos] = maxVal
The problem is that myMatrix[pos] does not treat pos as a (row, column) index pair.
The more efficient way of generating your output has already been covered by sacul. However, you're indexing your 2D matrix incorrectly for the case where you want to use an array of positions.
At least to me, it's a bit unintuitive, but you need to use:
myMatrix[[all_row_indices], [all_column_indices]]
The following will give you what you expect:
import numpy as np
myMatrix = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
nanPositions = np.argwhere(np.isnan(myMatrix))
maxVal = np.nanmax(abs(myMatrix))
print(myMatrix[nanPositions[:, 0], nanPositions[:, 1]])
You can read more about advanced indexing in the documentation.
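The same pair of index arrays also works on the left-hand side, so the replacement asked about can be done in one vectorized assignment; a small sketch, continuing from the arrays defined above:

# assign maxVal at every position that argwhere reported
myMatrix[nanPositions[:, 0], nanPositions[:, 1]] = maxVal
print(myMatrix)
# [[3.2 2.  3.2 3. ]
#  [3.  1.  2.  3.2]
#  [3.  3.  3.  3. ]]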
In [54]: arr = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
...:
In [55]: arr
Out[55]:
array([[3.2, 2. , nan, 3. ],
       [3. , 1. , 2. , nan],
       [3. , 3. , 3. , 3. ]])
Location of the nan:
In [56]: np.where(np.isnan(arr))
Out[56]: (array([0, 1]), array([2, 3]))
In [57]: np.argwhere(np.isnan(arr))
Out[57]:
array([[0, 2],
       [1, 3]])
where produces a tuple of arrays; argwhere returns the same values, but as a 2D array.
In [58]: arr[Out[56]]
Out[58]: array([nan, nan])
In [59]: arr[Out[56]] = [100,200]
In [60]: arr
Out[60]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3. ,   1. ,   2. , 200. ],
       [  3. ,   3. ,   3. ,   3. ]])
The argwhere can be used to index individual items:
In [72]: for ij in Out[57]:
...: print(arr[tuple(ij)])
100.0
200.0
The tuple() is needed here because np.array([1, 3]) is interpreted as 2-element indexing on the first dimension.
Another way to get that indexing tuple is to use unpacking:
In [74]: [arr[i,j] for i,j in Out[57]]
Out[74]: [100.0, 200.0]
So while argwhere looks useful, it is trickier to use than plain where.
You could, as noted in the other answers, use boolean indexing (I've already modified arr so the isnan test no longer works):
In [75]: arr[arr>10]
Out[75]: array([100., 200.])
More on indexing with a list or array, and indexing with a tuple:
In [77]: arr[[0,0]] # two copies of row 0
Out[77]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3.2,   2. , 100. ,   3. ]])
In [78]: arr[(0,0)] # one element
Out[78]: 3.2
In [79]: arr[np.array([0,0])] # same as list
Out[79]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3.2,   2. , 100. ,   3. ]])
In [80]: arr[np.array([0,0]),:] # making the trailing : explicit
Out[80]:
array([[  3.2,   2. , 100. ,   3. ],
       [  3.2,   2. , 100. ,   3. ]])
You can do this instead (IIUC):
myMatrix[np.isnan(myMatrix)] = np.nanmax(abs(myMatrix))
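For example, run against the sample matrix from the question (a quick sanity check; the printed formatting may differ slightly between numpy versions):

import numpy as np

myMatrix = np.array([[3.2, 2, np.nan, 3], [3, 1, 2, np.nan], [3, 3, 3, 3]])
myMatrix[np.isnan(myMatrix)] = np.nanmax(abs(myMatrix))
print(myMatrix)
# [[3.2 2.  3.2 3. ]
#  [3.  1.  2.  3.2]
#  [3.  3.  3.  3. ]]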
Related
I have a numpy array named heartbeats with 100 rows. Each row has 5 elements.
I also have a single array named time_index with 5 elements.
I need to prepend the time index to each row of heartbeats.
heartbeats = np.array([
    [-0.58, -0.57, -0.55, -0.39, -0.40],
    [-0.31, -0.31, -0.32, -0.46, -0.46]
])
time_index = np.array([-2, -1, 0, 1, 2])
What I need:
array([[-2, -0.58],
       [-1, -0.57],
       [ 0, -0.55],
       [ 1, -0.39],
       [ 2, -0.40],
       [-2, -0.31],
       [-1, -0.31],
       [ 0, -0.32],
       [ 1, -0.46],
       [ 2, -0.46]])
I only wrote two rows of heartbeats to illustrate.
Assuming you are using numpy, the exact output array you are looking for can be made by stacking a repeated version of time_index with the raveled version of heartbeats:
np.stack((np.tile(time_index, len(heartbeats)), heartbeats.ravel()), axis=-1)
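A quick check on the two-row example from the question:

import numpy as np

heartbeats = np.array([[-0.58, -0.57, -0.55, -0.39, -0.40],
                       [-0.31, -0.31, -0.32, -0.46, -0.46]])
time_index = np.array([-2, -1, 0, 1, 2])

# tile repeats time_index once per row; ravel flattens heartbeats row by row
out = np.stack((np.tile(time_index, len(heartbeats)), heartbeats.ravel()), axis=-1)
print(out.shape)  # (10, 2)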
Another approach uses broadcasting:
In [13]: heartbeats = np.array([
...: [-0.58, -0.57, -0.55, -0.39, -0.40],
...: [-0.31, -0.31, -0.32, -0.46, -0.46]
...: ])
...: time_index = np.array([-2, -1, 0, 1, 2])
Make a target array:
In [14]: res = np.zeros(heartbeats.shape + (2,), heartbeats.dtype)
In [15]: res[:,:,1] = heartbeats # insert a (2,5) into a (2,5) slot
In [17]: res[:,:,0] = time_index[None] # insert a (5,) into a (2,5) slot
In [18]: res
Out[18]:
array([[[-2.  , -0.58],
        [-1.  , -0.57],
        [ 0.  , -0.55],
        [ 1.  , -0.39],
        [ 2.  , -0.4 ]],

       [[-2.  , -0.31],
        [-1.  , -0.31],
        [ 0.  , -0.32],
        [ 1.  , -0.46],
        [ 2.  , -0.46]]])
and then reshape to 2d:
In [19]: res.reshape(-1,2)
Out[19]:
array([[-2.  , -0.58],
       [-1.  , -0.57],
       [ 0.  , -0.55],
       [ 1.  , -0.39],
       [ 2.  , -0.4 ],
       [-2.  , -0.31],
       [-1.  , -0.31],
       [ 0.  , -0.32],
       [ 1.  , -0.46],
       [ 2.  , -0.46]])
Step [17] takes a (5,), expands it to (1,5), and then broadcasts it to (2,5) for the insert. Read up on broadcasting.
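If the implicit stretch feels opaque, np.broadcast_to makes the same expansion explicit (a read-only view, no copy); a small self-contained sketch:

import numpy as np

heartbeats = np.zeros((2, 5))
time_index = np.array([-2, -1, 0, 1, 2])

# time_index[None] has shape (1, 5); broadcasting expands it to (2, 5) without copying
print(np.broadcast_to(time_index[None], heartbeats.shape))
# [[-2 -1  0  1  2]
#  [-2 -1  0  1  2]]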
As an alternative, you can repeat time_index with np.concatenate, once for each row of heartbeats:
concatenated = np.concatenate([time_index] * heartbeats.shape[0])
# [-2 -1 0 1 2 -2 -1 0 1 2]
# result = np.dstack((concatenated, heartbeats.reshape(-1))).squeeze()
result = np.array([concatenated, heartbeats.reshape(-1)]).T
Using np.concatenate may be faster than np.tile. This solution is faster than Mad Physicist's, but the fastest is the broadcasting approach in hpaulj's answer.
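Timing claims like these are easy to check yourself; here is a minimal timeit sketch (exact numbers will vary with array sizes and hardware):

import timeit
import numpy as np

heartbeats = np.random.rand(100, 5)
time_index = np.array([-2, -1, 0, 1, 2])

def with_tile():
    return np.stack((np.tile(time_index, len(heartbeats)), heartbeats.ravel()), axis=-1)

def with_concatenate():
    rep = np.concatenate([time_index] * heartbeats.shape[0])
    return np.array([rep, heartbeats.reshape(-1)]).T

print(timeit.timeit(with_tile, number=10_000))
print(timeit.timeit(with_concatenate, number=10_000))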
I have two arrays, and I want every element of one to be divided by every element of the second. For example,
In [24]: a = np.array([1,2,3])
In [25]: b = np.array([1,2,3])
In [26]: a/b
Out[26]: array([1., 1., 1.])
In [27]: 1/b
Out[27]: array([1. , 0.5 , 0.33333333])
This is not the answer I want; the output I want has every element of a divided by every element of b, like this:
In [28]: c = []
In [29]: for i in a:
...: c.append(i/b)
...:
In [30]: c
Out[30]:
[array([1.        , 0.5       , 0.33333333]),
 array([2.        , 1.        , 0.66666667]),
 array([3.        , 1.5       , 1.        ])]
In [34]: np.array(c)
Out[34]:
array([[1.        , 0.5       , 0.33333333],
       [2.        , 1.        , 0.66666667],
       [3.        , 1.5       , 1.        ]])
But I don't like the for loop; it's too slow for big data. Is there a function included in the numpy package, or another good (faster) way, to solve this problem?
This is simple to do in pure numpy: you can use broadcasting to calculate the outer product (or any other outer operation) of two vectors:
import numpy as np
a = np.arange(1, 4)
b = np.arange(1, 4)
c = a[:,np.newaxis] / b
# array([[1.        , 0.5       , 0.33333333],
#        [2.        , 1.        , 0.66666667],
#        [3.        , 1.5       , 1.        ]])
This works, since a[:,np.newaxis] increases the dimension of the (3,) shaped array a into a (3, 1) shaped array, which can be used for the desired broadcasting operation.
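As an aside, every binary numpy ufunc also exposes an .outer method, which builds the same table without inserting the axis by hand:

import numpy as np

a = np.arange(1, 4)
b = np.arange(1, 4)

# ufunc.outer applies the operation to every pair (a[i], b[j])
print(np.divide.outer(a, b))
# [[1.         0.5        0.33333333]
#  [2.         1.         0.66666667]
#  [3.         1.5        1.        ]]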
First reshape a into a 2D column (matching the output's first dimension), then repeat it along the dimension you want to loop over. Then vectorized division will work.
>>> a.reshape(-1, 1)
array([[1],
       [2],
       [3]])
>>> a.reshape(-1, 1).repeat(b.shape[0], axis=1)
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
>>> a.reshape(-1, 1).repeat(b.shape[0], axis=1) / b
array([[1.        , 0.5       , 0.33333333],
       [2.        , 1.        , 0.66666667],
       [3.        , 1.5       , 1.        ]])
>>> # Transpose will let you do it the other way around, but then you just get 1 for everything
>>> a.reshape(-1, 1).repeat(b.shape[0], axis=1).T
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
>>> a.reshape(-1, 1).repeat(b.shape[0], axis=1).T / b
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
This should do the job:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(a.reshape(-1, 1) / b)
Output:
[[1.         0.5        0.33333333]
 [2.         1.         0.66666667]
 [3.         1.5        1.        ]]
I'm setting a numpy array with a power-law equation. The problem is that part of my domain tries to do numpy.power(x, n) when x is negative and n is not an integer. In this part of the domain I want the value to be 0.0. Below is a code that has the correct behavior, but is there a more Pythonic way to do this?
import numpy as npy

# note: mesh.x is a numpy array of length nx
myValues = npy.zeros(nx)
para = [5.8780046, 0.714285714, 2.819250868]
for j in range(nx):
    if mesh.x[j] > para[1]:
        myValues[j] = para[0] * npy.power(mesh.x[j] - para[1], para[2])
    else:
        myValues[j] = 0.0
Is "numpythonic" a word? It should be a word. The following is really neither pythonic nor unpythonic, but it is much more efficient than using a for loop, and close(r) to the way Travis would probably do it:
import numpy
mesh_x = numpy.array([0.5,1.0,1.5])
myValues = numpy.zeros_like( mesh_x )
para = [5.8780046, 0.714285714, 2.819250868]
mask = mesh_x > para[1]
myValues[mask] = para[0] * numpy.power(mesh_x[mask] - para[1], para[2])
print(myValues)
For very large problems you would probably want to avoid creating temporary arrays:
mask = mesh.x > para[1]
myValues[mask] = mesh.x[mask]
myValues[mask] -= para[1]
myValues[mask] **= para[2]
myValues[mask] *= para[0]
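Another option in the same spirit is the out=/where= pair that numpy ufuncs accept, which computes and stores the power only where the mask is True; a minimal sketch with a stand-in array for mesh.x:

import numpy as np

mesh_x = np.array([0.5, 1.0, 1.5])   # stand-in for mesh.x
para = [5.8780046, 0.714285714, 2.819250868]

mask = mesh_x > para[1]
myValues = np.zeros_like(mesh_x)     # zeros survive wherever mask is False
# the power is only computed and stored where mask is True
np.power(mesh_x - para[1], para[2], out=myValues, where=mask)
myValues *= para[0]
print(myValues)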
Here's one approach with np.where to choose values between the power calculations and 0 -
import numpy as np
np.where(mesh.x>para[1],para[0]*np.power(mesh.x-para[1],para[2]),0)
Explanation:
np.where(mask,A,B) chooses elements from A or B depending on mask elements. So, in our case it is mesh.x>para[1] when doing a vectorized comparison for all mesh.x elements in one go.
para[0]*np.power(mesh.x-para[1],para[2]) gives us the elements that are to be chosen in case a mask element is True. Else, we choose 0, which is the third argument to np.where.
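One caveat worth knowing: np.where evaluates both branches for every element, so the power term still sees the negative inputs and can emit RuntimeWarning; np.errstate silences that. A self-contained sketch, with mesh_x as a hypothetical stand-in for mesh.x:

import numpy as np

mesh_x = np.array([0.0, 0.5, 1.0, 1.5])   # hypothetical stand-in for mesh.x
para = [5.8780046, 0.714285714, 2.819250868]

# both branches are computed, so suppress the invalid-power warnings
with np.errstate(invalid='ignore'):
    myValues = np.where(mesh_x > para[1],
                        para[0] * np.power(mesh_x - para[1], para[2]),
                        0)
print(myValues)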
This is more an explanation of the answers given by @jez and @Divakar, with simple examples, than an answer itself. Both rely on some form of boolean indexing.
>>> a
array([[-4.5, -3.5, -2.5],
       [-1.5, -0.5,  0.5],
       [ 1.5,  2.5,  3.5]])
>>> n = 2.2
>>> a ** n
array([[        nan,         nan,         nan],
       [        nan,         nan,  0.21763764],
       [ 2.44006149,  7.50702771, 15.73800567]])
np.where is made for this: it selects one of two values based on a boolean array.
>>> np.where(np.isnan(a**n), 0, a**n)
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.21763764],
       [ 2.44006149,  7.50702771, 15.73800567]])
>>> b = np.where(a < 0, 0, a)
>>> b
array([[0. , 0. , 0. ],
       [0. , 0. , 0.5],
       [1.5, 2.5, 3.5]])
>>> b ** n
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.21763764],
       [ 2.44006149,  7.50702771, 15.73800567]])
Boolean indexing also works on both the left-hand side and the right-hand side; this is similar to np.where:
>>> a[a >= 0] = a[a >= 0] ** n
>>> a
array([[ -4.5       ,  -3.5       ,  -2.5       ],
       [ -1.5       ,  -0.5       ,   0.21763764],
       [  2.44006149,   7.50702771,  15.73800567]])
>>> a[a < 0] = 0
>>> a
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.21763764],
       [ 2.44006149,  7.50702771, 15.73800567]])
I want to generate a numpy array of the form:
0.5*[[0, 0], [1, 1], [2, 2], ...]
I want the final array to have a dtype of numpy.float32.
Here is my attempt:
>>> import numpy as np
>>> N = 5
>>> x = np.array(np.repeat(0.5*np.arange(N), 2), np.float32)
>>> x
array([ 0. , 0. , 0.5, 0.5, 1. , 1. , 1.5, 1.5, 2. , 2. ], dtype=float32)
Is this a good way? Can I avoid the copy (if it is indeed copying) just for type conversion?
You only have to reshape your final result to obtain what you want:
x = x.reshape(-1, 2)
You could also run arange passing the dtype:
x = np.repeat(0.5*np.arange(N, dtype=np.float32), 2).reshape(-1, 2)
You can easily cast the array as another type using the astype method, which accepts an argument copy:
x.astype(np.int8, copy=False)
But, as explained in the documentation, numpy checks for some requirements in order to return the view. If those requirements are not satisfied, a copy is returned.
You can check if a given array is a copy or a view from another by checking the OWNDATA attribute, accessible through the flags property of the ndarray.
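A small sketch of both checks (astype with copy=False can hand back the input array itself when no conversion is needed; np.shares_memory confirms whether two arrays overlap):

import numpy as np

x = np.repeat(0.5 * np.arange(5, dtype=np.float32), 2)
y = x.astype(np.float32, copy=False)   # same dtype: no copy required
z = x.astype(np.float64, copy=False)   # dtype change: a copy is unavoidable

print(y is x)                   # True  -> the original array was returned
print(z.flags['OWNDATA'])       # True  -> z owns freshly allocated data
print(np.shares_memory(x, z))   # False -> the cast had to copy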
EDIT: more on checking whether a given array is a copy:
Is there a way to check if numpy arrays share the same data?
An alternative:
np.array([0.5*np.arange(N, dtype=np.float32)]*2)
Gives:
array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
       [ 0. ,  0.5,  1. ,  1.5,  2. ]], dtype=float32)
You might want to rotate it:
np.rot90(np.array([0.5*np.arange(N, dtype=np.float32)]*2),3)
Giving:
array([[ 0. ,  0. ],
       [ 0.5,  0.5],
       [ 1. ,  1. ],
       [ 1.5,  1.5],
       [ 2. ,  2. ]], dtype=float32)
Note, this is slower than #Saullo_Castro's answer:
np.rot90(np.array([0.5*np.arange(N, dtype=np.float32)]*2),3)
10000 loops, best of 3: 24.3 us per loop
np.repeat(0.5*np.arange(N, dtype=np.float32), 2).reshape(-1, 2)
10000 loops, best of 3: 9.23 us per loop
np.array(np.repeat(0.5*np.arange(N), 2), np.float32).reshape(-1, 2)
10000 loops, best of 3: 10.4 us per loop
(using %%timeit on ipython)
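As a side note, because the two stacked rows are identical here, a plain transpose happens to give the same (N, 2) result as the rot90 call and reads more directly:

import numpy as np

N = 5
print(np.array([0.5*np.arange(N, dtype=np.float32)]*2).T)
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]
#  [1.5 1.5]
#  [2.  2. ]]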
Let's say I have a set of vectors (readings from sensor 1, readings from sensor 2, readings from sensor 3 -- indexed first by timestamp and then by sensor id) that I'd like to correlate to a separate set of vectors (temperature, humidity, etc -- also all indexed first by timestamp and secondly by type).
What is the cleanest way in numpy to do this? It seems like it should be a rather simple function...
In other words, I'd like to see:
> a.shape
(365,20)
> b.shape
(365, 5)
> correlations = magic_correlation_function(a,b)
> correlations.shape
(20, 5)
Cheers,
/YGA
P.S. I've been asked to add an example.
Here's what I would like to see:
In [27]: x
Out[27]:
array([[ 0. ,  0. ,  0. ],
       [-1. ,  0. , -1. ],
       [-2. ,  0. , -2. ],
       [-3. ,  0. , -3. ],
       [-4. ,  0.1, -4. ]])

In [28]: y
Out[28]:
array([[0. , 0. ],
       [1. , 0. ],
       [2. , 0. ],
       [3. , 0. ],
       [4. , 0.1]])

In [29]: magical_correlation_function(x, y)
Out[29]:
array([[-1.        ,  0.70710678,  1.        ],
       [-0.70710678,  1.        ,  0.70710678]])
P.S.2: whoops, mis-transcribed my example. Sorry all. Fixed now.
The simplest thing that I could find was using the scipy.stats package:
In [8]: x
Out[8]:
array([[ 0. ,  0. ,  0. ],
       [-1. ,  0. , -1. ],
       [-2. ,  0. , -2. ],
       [-3. ,  0. , -3. ],
       [-4. ,  0.1, -4. ]])
In [9]: y
Out[9]:
array([[0. , 0. ],
       [1. , 0. ],
       [2. , 0. ],
       [3. , 0. ],
       [4. , 0.1]])
In [10]: import scipy.stats
In [27]: (scipy.stats.cov(y, x)
    ...:  / (numpy.sqrt(scipy.stats.var(y, axis=0)[:, numpy.newaxis]))
    ...:  / (numpy.sqrt(scipy.stats.var(x, axis=0))))
Out[27]:
array([[-1.        ,  0.70710678, -1.        ],
       [-0.70710678,  1.        , -0.70710678]])
These aren't the numbers you got, but you've mixed up your rows. (Element [0,0] should be 1.)
A more complicated, but purely numpy solution is
In [40]: numpy.corrcoef(x.T, y.T)[numpy.arange(x.shape[1])[numpy.newaxis, :],
    ...:                          numpy.arange(y.shape[1])[:, numpy.newaxis]]
Out[40]:
array([[-1.        ,  0.70710678, -1.        ],
       [-0.70710678,  1.        , -0.70710678]])
This will be slower because it computes the correlation of each element in x with each other element in x, which you don't want. Also, the advanced indexing techniques used to get the subset of the array you desire can make your head hurt.
If you're going to use numpy intensely, get familiar with the rules on broadcasting and indexing. They will help you push as much down to the C-level as possible.
Will this do what you want?
correlations = np.dot(a.T, b)
Note: if you do this, you'll probably want to standardize or whiten a and b first, e.g. something equivalent to this:
a = (a - np.mean(a, axis=0)) / np.sqrt(np.var(a, axis=0))
b = (b - np.mean(b, axis=0)) / np.sqrt(np.var(b, axis=0))
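Putting that idea into runnable form, a minimal sketch (the helper name and the choice of population standard deviation are mine, not from the question):

import numpy as np

def cross_corr(a, b):
    """Pearson correlation of every column of a against every column of b."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return a.T @ b / a.shape[0]

a = np.random.rand(365, 20)
b = np.random.rand(365, 5)
print(cross_corr(a, b).shape)   # (20, 5)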
As David said, you should define the correlation you're using. I don't know of any definitions of correlation that give sensible numbers when correlating empty and non-empty signals.