NumPy: impute mean of the two nearest rows for all NaN

I have a NumPy array with missing values. I want to impute the mean of the nearest values vertically.
import numpy as np
arr = np.random.randint(0, 10, (10, 4)).astype(float)
arr[2, 0] = np.nan
arr[4, 3] = np.nan
arr[0, 2] = np.nan
print(arr)
[[ 5. 7. nan 4.] # should be 4
[ 2. 6. 4. 9.]
[nan 2. 5. 5.] # should be 4.5
[ 7. 0. 3. 8.]
[ 6. 4. 3. nan] # should be 4
[ 8. 1. 2. 0.]
[ 0. 0. 1. 1.]
[ 1. 2. 6. 6.]
[ 8. 1. 9. 7.]
[ 3. 5. 8. 8.]]

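If you are open to using pandas, pd.DataFrame.interpolate is easy to use. Its default linear method fills an isolated NaN with the mean of the values directly above and below it. Set limit_direction='both' so that NaN values at the ends of a column are filled as well:
import pandas as pd
df = pd.DataFrame(arr).interpolate(limit_direction='both')
df.to_numpy()  # back to a NumPy array if needed (requires pandas 0.24.0 or above)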
Output:
array([[5. , 7. , 4. , 4. ],
[2. , 6. , 4. , 9. ],
[4.5, 2. , 5. , 5. ],
[7. , 0. , 3. , 8. ],
[6. , 4. , 3. , 4. ],
[8. , 1. , 2. , 0. ],
[0. , 0. , 1. , 1. ],
[1. , 2. , 6. , 6. ],
[8. , 1. , 9. , 7. ],
[3. , 5. , 8. , 8. ]])

import numpy as np
arr = np.random.randint(0, 10, (10, 4)).astype(float)
arr[2, 0] = np.nan
arr[4, 3] = np.nan
arr[0, 2] = np.nan
print(arr)
[[ 5. 7. nan 4.]
[ 2. 6. 4. 9.]
[nan 2. 5. 5.]
[ 7. 0. 3. 8.]
[ 6. 4. 3. nan]
[ 8. 1. 2. 0.]
[ 0. 0. 1. 1.]
[ 1. 2. 6. 6.]
[ 8. 1. 9. 7.]
[ 3. 5. 8. 8.]]
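for x, y in np.argwhere(np.isnan(arr)):
    # rows x-1 .. x+1 of column y, clipped to the array bounds
    sample = arr[max(x - 1, 0):min(x + 2, arr.shape[0]), y]
    # mean of the neighbours that are not NaN (the NaN at x itself is skipped)
    arr[x, y] = np.mean(sample[~np.isnan(sample)])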
print(arr)
[[5. 7. 4. 4. ] # 3rd value here is mean(4)
[2. 6. 4. 9. ]
[4.5 2. 5. 5. ] # first value here is mean(2, 7)
[7. 0. 3. 8. ]
[6. 4. 3. 4. ] # 4th value here is mean(8, 0)
[8. 1. 2. 0. ]
[0. 0. 1. 1. ]
[1. 2. 6. 6. ]
[8. 1. 9. 7. ]
[3. 5. 8. 8. ]]
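For larger arrays, the same neighbour-mean fill can be sketched without a Python loop. This is an illustration rather than a drop-in replacement: the helper name impute_vertical_mean is made up here, it assumes a 2D array, and unlike the in-place loop it computes every fill from the original values, so the two approaches can differ when NaN values sit directly above one another.

```python
import warnings
import numpy as np

def impute_vertical_mean(arr):
    """Fill each NaN with the mean of the non-NaN values directly above and below it.

    Hypothetical helper for illustration; a cell whose vertical neighbours are
    both missing stays NaN.
    """
    arr = np.asarray(arr, dtype=float)
    # Pad one NaN row on top and bottom so edge rows have a (missing) neighbour.
    padded = np.pad(arr, ((1, 1), (0, 0)), constant_values=np.nan)
    neighbours = np.stack([padded[:-2], padded[2:]])  # values above, values below
    with warnings.catch_warnings():
        # nanmean warns when both neighbours are NaN; the NaN result is acceptable here.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        fill = np.nanmean(neighbours, axis=0)
    # Keep existing values, use the neighbour mean only where arr is NaN.
    return np.where(np.isnan(arr), fill, arr)

arr = np.array([[5., 7., np.nan, 4.],
                [2., 6., 4., 9.],
                [np.nan, 2., 5., 5.],
                [7., 0., 3., 8.],
                [6., 4., 3., np.nan],
                [8., 1., 2., 0.]])
filled = impute_vertical_mean(arr)
print(filled)
```

The padding trick avoids special-casing the first and last rows: their missing neighbour is simply another NaN that np.nanmean ignores.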

Related

Delete first column of a numpy array

I have the following np.array():
[[55.3 1. 2. 2. 2. 2. ]
[55.5 1. 2. 0. 2. 2. ]
[54.9 2. 2. 2. 2. 2. ]
[47.9 2. 2. 2. 0. 0. ]
[57. 1. 2. 2. 0. 2. ]
[56.6 1. 2. 2. 2. 2. ]
[54.7 1. 2. 2. 2. nan]
[51.4 2. 2. 2. 2. 2. ]
[55.3 2. 2. 2. 2. nan]]
And I would like to get the following one:
[[1. 2. 2. 2. 2. ]
[1. 2. 0. 2. 2. ]
[2. 2. 2. 2. 2. ]
[2. 2. 2. 0. 0. ]
[1. 2. 2. 0. 2. ]
[1. 2. 2. 2. 2. ]
[1. 2. 2. 2. nan]
[2. 2. 2. 2. 2. ]
[2. 2. 2. 2. nan]]
I tried:
MyArray[1:]  # but this deletes the first row
np.delete(MyArray, 0, 1)  # whose output I don't understand:
[[ 2. 2. 2. 2. 2.]
[ 1. 2. 2. 2. 2.]
[ 1. 2. 0. 2. 2.]
[ 2. 2. 2. 2. 2.]
[ 2. 2. 2. 0. 0.]
[ 1. 2. 2. 0. 2.]
[ 1. 2. 2. 2. 2.]
[ 1. 2. 2. 2. nan]
[ 2. 2. 2. 2. 2.]
[ 2. 2. 2. 2. nan]]
The np.delete call in the question already does what you want.
The np.delete arguments are the array, the index (or list of indices) to delete, and the axis. The snippet below gives the output you want:
arr=np.delete(arr,[0],1)
Passing the integer 0 instead of the list [0] gives the same result. What matters is that np.delete returns a new array and leaves the original unchanged, so the result has to be assigned.
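A quick way to check how np.delete treats an integer versus a one-element list, and how it compares with slicing, is a small experiment. The array a below is made-up sample data, not the array from the question.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)  # sample 3x4 array

by_int = np.delete(a, 0, axis=1)     # integer index
by_list = np.delete(a, [0], axis=1)  # one-element list
by_slice = a[:, 1:]                  # basic slicing

print(np.array_equal(by_int, by_list))   # do int and list remove the same column?
print(np.array_equal(by_int, by_slice))  # does slicing give the same values?
print(a.shape)                           # the original array keeps all 4 columns
print(np.shares_memory(by_slice, a))     # slicing returns a view of a
print(np.shares_memory(by_int, a))       # np.delete returns an independent copy
```

Slicing is usually the cheaper choice for dropping a leading column because it creates a view; np.delete is handy when the columns to drop are scattered.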
You could try: new_array = [i[1:] for i in MyArray]
Try MyArray[:,1:]
I think you can get rid of column 0 with this
It should be straightforward with
new_array = MyArray[:, 1:]

How can i sort an array based on the mean of each column in python?

Input:
array([[ 1. , 5. , 1. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ]])
Expected Output:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
I tried the following:
for i in M:
    ls = i.mean()
    x = np.append(i, ls)
    print(x)  # found the mean
After this I am unable to arrange the rows based on the mean value of each row. All I can do is arrange each row in descending order, but that is not what I want.
You can do this:
In [405]: row_idxs = np.argsort(np.mean(a * -1, axis=1))
In [406]: a[row_idxs, :]
Out[406]:
array([[ 19. , 9. , 100. ],
[ 11. , 11. , 11. ],
[ 10. , 7. , 1.5],
[ 6.9, 5. , 1. ],
[ 1. , 5. , 1. ]])
np.argsort returns the indices that would sort the row means in ascending order. Multiplying by -1 before sorting reverses that, which gives descending order.
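The same idea can be written as a standalone script, with the row means kept in their own variable so they are easy to inspect. The variable names are illustrative, and kind="stable" is an optional choice made here so that rows with equal means keep their original relative order.

```python
import numpy as np

a = np.array([[1.0, 5.0, 1.0],
              [10.0, 7.0, 1.5],
              [6.9, 5.0, 1.0],
              [19.0, 9.0, 100.0],
              [11.0, 11.0, 11.0]])

row_means = a.mean(axis=1)                     # one mean per row
order = np.argsort(-row_means, kind="stable")  # indices of rows, largest mean first
sorted_rows = a[order]                         # reorder whole rows

print(order)
print(sorted_rows)
```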

NxN matrix in python with non-duplicate integers (in range [0:N-1]) in both rows AND columns

In Python, how can I create an N x N matrix (2D array) such that:
[A] each row contains the integers 0 to N-1 without duplicates, and
[B] each column contains the integers 0 to N-1 without duplicates?
Example :
[[1 0 2]
[2 1 0]
[0 2 1]]
I had a bit of a tinker with this question, and this code seems to work:
import numpy as np
N = 10
row = np.arange(N)
result = np.zeros((N, N))
for i in row:
    result[i] = np.roll(row, i)
print(result)
output:
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[9. 0. 1. 2. 3. 4. 5. 6. 7. 8.]
[8. 9. 0. 1. 2. 3. 4. 5. 6. 7.]
[7. 8. 9. 0. 1. 2. 3. 4. 5. 6.]
[6. 7. 8. 9. 0. 1. 2. 3. 4. 5.]
[5. 6. 7. 8. 9. 0. 1. 2. 3. 4.]
[4. 5. 6. 7. 8. 9. 0. 1. 2. 3.]
[3. 4. 5. 6. 7. 8. 9. 0. 1. 2.]
[2. 3. 4. 5. 6. 7. 8. 9. 0. 1.]
[1. 2. 3. 4. 5. 6. 7. 8. 9. 0.]]
Ask away if you have any questions.
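The rolled-rows construction can also be sketched with broadcasting, which keeps an integer dtype, together with a small check that conditions [A] and [B] hold. Both function names are hypothetical helpers introduced for this example. Because permuting the rows or columns of such a square preserves both conditions, a shuffle is included as one possible way to get a less regular result.

```python
import numpy as np

def cyclic_latin_square(n):
    """Hypothetical helper: entry (i, j) is (j - i) % n, i.e. row i is arange(n) rolled by i."""
    idx = np.arange(n)
    return (idx[None, :] - idx[:, None]) % n

def is_latin_square(m):
    """Check that every row and every column holds 0..n-1 exactly once."""
    n = len(m)
    expected = np.arange(n)
    rows_ok = (np.sort(m, axis=1) == expected).all()
    cols_ok = (np.sort(m, axis=0) == expected[:, None]).all()
    return bool(rows_ok and cols_ok)

square = cyclic_latin_square(4)
print(square)
print(is_latin_square(square))

# Shuffling whole rows and whole columns keeps both properties intact.
rng = np.random.default_rng(0)
shuffled = square[rng.permutation(4)][:, rng.permutation(4)]
print(is_latin_square(shuffled))
```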

Numpy sorting a matrix by column by cycle

I have this numpy array:
array(
[[ 1. , 9. , 565.98653513],
[ 1. , 1. , 973.18466261],
[ 1. , 25. , 803.17747373],
[ 2. , 9. , 82.56336897],
[ 2. , 1. , 104.69517373],
[ 2. , 25. , 627.01127514],
[ 3. , 21. , 334.07622382],
[ 3. , 34. , 921.37623107],
[ 3. , 20. , 342.08177942],
... ... ... ...
[ 10. , 7. , 424.29338026],
[ 10. , 0. , 232.71475407],
[ 10. , 1. , 330.44846202]])
But I want to sort the matrix by the first column in a cycle: 1, 2, 3, ...,10. It should look like this:
array(
[[ 1. , 9. , 565.98653513],
[ 2. , 9. , 82.56336897],
[ 3. , 21. , 334.07622382],
... ... ... ...
[ 10. , 7. , 424.29338026],
[ 1. , 1. , 973.18466261],
[ 2. , 1. , 104.69517373],
[ 3. , 34. , 921.37623107],
... ... ... ...
[ 10. , 0. , 232.71475407],
[ 1. , 25. , 803.17747373],
[ 2. , 25. , 627.01127514],
[ 3. , 20. , 342.08177942],
... ... ... ...
[ 10. , 1. , 330.44846202]])
How can I do this?
I was thinking of converting it to a DataFrame (i.e. pandas) for more sorting options, then converting back to an array, but I don't see a straightforward function to do this.
I appreciate any help or ideas.
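Assuming your array is named x, that the first column takes 10 distinct values, and that every value has the same number of consecutive rows:
k = len(x) // 10  # rows per value in the first column (3 in the example)
y = np.concatenate([x[i::k] for i in range(k)])
print(y)
The x[i:j:k] notation means x from i to j with step k. So x[i::k] means x from i to the end with step k: x[0::k] picks the first row of every group, x[1::k] the second row, and so on.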
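As a small worked example of this cyclic reordering, the sketch below uses made-up data with 3 group values and 2 rows per value, and compares strided slicing with a reshape-and-swap alternative. Both assume the rows for each value are contiguous and that every value has the same number of rows; the names n_groups and per_group are illustrative.

```python
import numpy as np

# Made-up data: first column is the group value, two consecutive rows per group.
x = np.array([[1., 9., 0.5],
              [1., 1., 0.7],
              [2., 9., 0.1],
              [2., 1., 0.2],
              [3., 21., 0.3],
              [3., 34., 0.9]])

n_groups = 3
per_group = len(x) // n_groups  # rows per group value

# Strided slicing: take the 1st row of every group, then the 2nd, ...
by_slicing = np.concatenate([x[i::per_group] for i in range(per_group)])

# Same result by viewing x as (group, position, column) and swapping the first two axes.
by_reshape = x.reshape(n_groups, per_group, -1).swapaxes(0, 1).reshape(len(x), -1)

print(by_slicing[:, 0])  # first column now cycles through the group values
print(np.array_equal(by_slicing, by_reshape))
```

If the groups had unequal sizes, neither form would apply directly and a per-row position counter plus a stable sort would be needed instead.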

Sort rows of a 2D array, based on the first column

I want to sort the rows of a 2D array based on the elements of the first column, in Python 3. For example, if
x = array([[ 5. , 9. , 2. , 6. ],
[ 7. , 12. , 3.5, 8. ],
[ 2. , 6. , 7. , 9. ]])
then I need the sorted array to be
x = array([[ 2. , 6. , 7. , 9. ],
[ 5. , 9. , 2. , 6. ],
[ 7. , 12. , 3.5, 8. ]])
How can I do that? A similar question was asked and answered here, but it does not work for me.
The following should work:
import numpy as np
x = np.array([[ 5. , 9. , 2. , 6. ],
[ 7. , 12. , 3.5, 8. ],
[ 2. , 6. , 7. , 9. ]])
x[x[:, 0].argsort()]
Out[2]:
array([[ 2. , 6. , 7. , 9. ],
[ 5. , 9. , 2. , 6. ],
[ 7. , 12. , 3.5, 8. ]])
Documentation : numpy.argsort
# Using sorted on a plain list of lists
x = [[5., 9., 2., 6.], [7., 12., 3.5, 8.], [2., 6., 7., 9.]]
x = sorted(x, key=lambda row: row[0])  # sort by the first column
print(x)
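A few variations on the argsort approach may be useful: a stable sort, descending order, and breaking ties on a second column with np.lexsort. This is a sketch; the tie-break column and the small array t are assumptions added for illustration.

```python
import numpy as np

x = np.array([[5., 9., 2., 6.],
              [7., 12., 3.5, 8.],
              [2., 6., 7., 9.]])

# Ascending by column 0; "stable" keeps equal keys in their original order.
ascending = x[np.argsort(x[:, 0], kind="stable")]

# Descending by column 0: sort on the negated key.
descending = x[np.argsort(-x[:, 0], kind="stable")]

# np.lexsort uses the LAST key as the primary one:
# here rows are ordered by column 0, with ties broken by column 1.
t = np.array([[2, 9],
              [1, 5],
              [2, 3]])
tie_order = np.lexsort((t[:, 1], t[:, 0]))

print(ascending)
print(descending)
print(t[tie_order])
```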
