How to replace values in a numpy array based on another column?

Let's say I have the following:
import numpy as np
data = np.array([
[1,2,3],
[1,2,3],
[1,2,3],
[4,5,6],
])
How would I go about changing values in column 2 based on values in column 3? For instance, if column 3 == 3, set column 2 = 9, giving:
[[1,9,3],
[1,9,3],
[1,9,3],
[4,5,6]]
I've looked at np.any(), but I can't figure out how to alter the array in place.

You can use NumPy's boolean indexing to achieve this. Select all the rows where the third column is 3, and set the second column of each of those rows to 9:
>>> data[data[:, 2] == 3, 1] = 9
>>> data
array([[1, 9, 3],
[1, 9, 3],
[1, 9, 3],
[4, 5, 6]])
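The one-liner in this answer combines two steps. A minimal sketch that spells them out, using the `data` array from the question; the names `mask`, `original` and `replaced` are illustrative only, and `np.where` is shown as one possible alternative that builds a new column instead of modifying the array in place:

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [1, 2, 3],
    [1, 2, 3],
    [4, 5, 6],
])

# Step 1: boolean mask with one entry per row, True where the third column equals 3.
mask = data[:, 2] == 3

# Step 2: assign through the mask; this modifies `data` in place.
data[mask, 1] = 9
print(data)

# Alternative: np.where returns a new column and leaves the source array untouched.
original = np.array([[1, 2, 3], [4, 5, 6]])
replaced = np.where(original[:, 2] == 3, 9, original[:, 1])
print(replaced)
```

The in-place form is usually what you want when the array is large; the `np.where` form is handy when the original values must be kept.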

Related

Which way are rows and columns in a numpy 2-d array used as a matrix?

When using a numpy array as a matrix, in which order are rows and columns?
For example:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Is [1, 2, 3] the first row or the first column?
I cannot find this information in the documentation, perhaps because the answer is too obvious.

[1, 2, 3] is the first row.
The examples in the NumPy ndarray documentation actually give you some hints:
>>> x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
>>> # The element of x in the *second* row, *third* column, namely, 6.
>>> x[1, 2]
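A quick way to check the convention yourself is to look at `shape` and index along each axis. This is a small sketch using the same `x` as the documentation example:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

print(x.shape)   # (number of rows, number of columns)
print(x[0])      # first row
print(x[:, 0])   # first column
print(x[1, 2])   # second row, third column
```

The first index always selects the row and the second selects the column, so each inner list in the literal is one row.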

How to sort an nx3 numpy array by column(s) but it remembers the data in that row?

First off, I'm very new to python and so any tips/help is really appreciated.
Essentially I want an nx3 numpy array to be sorted initially by the second column then by the third but I want all of the data in the row to remain with each other.
Like so:
import numpy as np
a = np.array([[20, 2, 4],
[7, 5, 6],
[25, 1, 5],
[2, 2, 3],
[3, 5, 8],
[4, 1, 3]])
......... (n times)
In this array the first column represents the value, the second its x coordinate and the third its y coordinate. What is the best way to sort the array first by the x coordinate and then by the y coordinate, while each value stays assigned to its x and y coordinates?
So after the sort, it looks like this:
a = ([[4, 1, 3],
[25, 1, 5],
[2, 2, 3],
[20, 2, 4],
[7, 5, 6],
[3, 5, 8]])
......... (n times)
As you can see, it should first sort by the x coordinate and then sort by the y coordinate among the rows that share the same x coordinate. For example, it first finds all rows with an x coordinate of 1 and sorts those by their y coordinates. Throughout, the value, x and y coordinates all remain on the same row.

The easiest way is to convert it into a pandas DataFrame, which is easier to manipulate.
import pandas as pd
df = pd.DataFrame({'a': [6, 2, 1], 'b': [4, 5, 6]})
print(df)
Out
a b
0 6 4
1 2 5
2 1 6
sorteddf = df.sort_values(by='a')
print(sorteddf)
Out
a b
2 1 6
1 2 5
0 6 4

Take a look at the 'order' parameter: https://docs.scipy.org/doc/numpy/reference/generated/numpy.sort.html
import numpy as np
dtype = [('x',int),('y',int)]
values = [(1,7),(3,4),(1,4),(1,3)]
a = np.array(values,dtype=dtype)
print(np.sort(a, order=['x','y']))

The easiest way is to first sort by y and then sort the result by x, so for equal values of x the final result will be sorted by y.
Full code:
import numpy as np
a = np.array([[20, 2, 4],
[7, 5, 6],
[25, 1, 5],
[2, 2, 3],
[3, 5, 8],
[4, 1, 3]])
a = a[a[:,2].argsort()] # sort by y (column index 2)
a = a[a[:,1].argsort(kind='stable')] # stable sort by x (column index 1), so the y order is kept within equal x
result
a
array([[ 4, 1, 3],
[25, 1, 5],
[ 2, 2, 3],
[20, 2, 4],
[ 7, 5, 6],
[ 3, 5, 8]])
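As an alternative to two successive sorts, `np.lexsort` can sort on several keys in one call. A minimal sketch with the array from the question; the names `order` and `sorted_a` are illustrative:

```python
import numpy as np

a = np.array([[20, 2, 4],
              [7, 5, 6],
              [25, 1, 5],
              [2, 2, 3],
              [3, 5, 8],
              [4, 1, 3]])

# np.lexsort treats the LAST key as the primary sort key,
# so pass y (tie-breaker) first and x (primary) last.
order = np.lexsort((a[:, 2], a[:, 1]))

# Indexing with `order` moves whole rows, so value, x and y stay together.
sorted_a = a[order]
print(sorted_a)
```

Because the keys are passed in reverse order of priority, it is easy to get them backwards; checking the result on a small array like this one is a cheap safeguard.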

Modify different columns in each row of a 2D NumPy array

I have the following problem:
Let's say I have an array defined like this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What I would like to do is to make use of Numpy multiple indexing and set several elements to 0. To do that I'm creating a vector:
indices_to_remove = [1, 2, 0]
What I want it to mean is the following:
Remove element with index '1' from the first row
Remove element with index '2' from the second row
Remove element with index '0' from the third row
The result should be the array [[1,0,3],[4,5,0],[0,8,9]]
I've managed to get values of the elements I would like to modify by following code:
values = np.diagonal(np.take(A, indices_to_remove, axis=1))
However, that doesn't allow me to modify them. How could this be solved?

You could use integer array indexing to assign those zeros -
A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
Sample run -
In [445]: A
Out[445]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [446]: indices_to_remove
Out[446]: [1, 2, 0]
In [447]: A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
In [448]: A
Out[448]:
array([[1, 0, 3],
[4, 5, 0],
[0, 8, 9]])
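The same integer array indexing also reads the selected elements, which covers what the question did with `np.diagonal` and `np.take`. A short sketch, where `rows`, `values` and `same` are illustrative names:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
indices_to_remove = [1, 2, 0]

# One row index per entry: [0, 1, 2].
rows = np.arange(len(indices_to_remove))

# The two index arrays are paired element-wise: (0, 1), (1, 2), (2, 0).
values = A[rows, indices_to_remove]
print(values)

# The approach from the question selects the same elements,
# but builds a full intermediate matrix first.
same = np.diagonal(np.take(A, indices_to_remove, axis=1))
print(np.array_equal(values, same))
```

Since the indexing expression is valid on the left-hand side of an assignment, the same `A[rows, indices_to_remove]` form works for both reading and writing.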

Modify a particular row/column of a NumPy array

How do I modify a particular row or column of a NumPy array?
For example I have a NumPy array as follows:
P = array([[1, 2, 3],
[4, 5, 6]])
How do I change the elements of the first row, [1, 2, 3], to [7, 8, 9] so that P becomes:
P = array([[7, 8, 9],
[4, 5, 6]])
Similarly, how do I change second column values, [2, 5], to [7, 8]?
P = array([[1, 7, 3],
[4, 8, 6]])

Rows and columns of NumPy arrays can be selected or modified using the square-bracket indexing notation in Python.
To select a row in a 2D array, use P[i]. For example, P[0] will return the first row of P.
To select a column, use P[:, i]. The : essentially means "select all rows". For example, P[:, 1] will select all rows from the second column of P.
If you want to change the values of a row or column of an array, you can assign a new list (or array) of values of the same length to it.
To change the values in the first row, write:
>>> P[0] = [7, 8, 9]
>>> P
array([[7, 8, 9],
[4, 5, 6]])
To change the values in the second column (starting again from the original P), write:
>>> P[:, 1] = [7, 8]
>>> P
array([[1, 7, 3],
[4, 8, 6]])
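A small sketch of what "the same length" means in practice, under the assumption that `P` is the 2x3 integer array from the question: a scalar is broadcast to every element of the selection, while a sequence of the wrong length is rejected. The flag `length_mismatch_rejected` is an illustrative name:

```python
import numpy as np

P = np.array([[1, 2, 3],
              [4, 5, 6]])

P[0] = 0          # a scalar is broadcast across the whole row
P[:, 1] = [7, 8]  # a sequence must match the column length (2 rows)
print(P)

length_mismatch_rejected = False
try:
    P[:, 1] = [7, 8, 9]  # wrong length: 3 values for a column of 2
except ValueError as exc:
    length_mismatch_rejected = True
    print("ValueError:", exc)
```

The failed assignment leaves `P` unchanged, so a length mistake shows up as an exception rather than as silently corrupted data.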

In a similar way, if you want to select, for example, only the last two columns but all rows, you can use:
print(P[:, 1:3])

If you have lots of elements in a column:
import numpy as np
np_mat = np.array([[1, 2, 2],
[3, 4, 5],
[5, 6, 5]])
np_mat[:,2] = np_mat[:,2] * 3
print(np_mat)
This multiplies every value in the third column by 3:
[[ 1 2 6]
[ 3 4 15]
[ 5 6 15]]

removing columns from an array in Python

I have a 2D Python array, from which I would like to remove certain columns, but I don't know how many I would like to remove until the code runs.
I want to loop over the columns in the original array, and if the sum of the rows in any one column is above a certain value I want to remove the whole column.
I started to do this the following way:
for i in range(original_number_of_columns):
    if sum(original_array[:,i]) < certain_value:
        new_array[:,new_index] = original_array[:,i]
        new_index+=1
But then I realised that I was going to have to define new_array first, and tell Python what size it is. But I don't know what size it is going to be beforehand.
I have got around it before by firstly looping over the columns to find out how many I will lose, then defining the new_array, and then lastly running the loop above - but obviously there will be much more efficient ways to do such things!
Thank you.

You can use the following:
import numpy as np
a = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
)
print(a.compress(a.sum(0) > 15, 1))  # keep only the columns whose sum exceeds 15
[[3]
[6]
[9]]

Without NumPy:
my_2d_table = [[...],[...],...]
only_cols_that_sum_lt_x = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
new_table = list(map(list, zip(*only_cols_that_sum_lt_x)))  # list() is needed on Python 3, where map is lazy
With NumPy:
a = np.array(my_2d_table)
a[:, np.sum(a, 0) < some_threshold]
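The pure-Python version relies on `zip(*table)` to transpose the table twice. A runnable sketch of that idea follows; the sample data, the threshold of 16 and the name `kept_cols` are hypothetical and chosen only for illustration:

```python
# Hypothetical sample data and threshold, for illustration only.
my_2d_table = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]
some_threshold = 16

# zip(*table) transposes the rows into columns, so each `col` is one column.
kept_cols = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]

# Transpose back so the result is a list of rows again.
new_table = [list(row) for row in zip(*kept_cols)]
print(new_table)
```

Note that if no column passes the test, `zip(*kept_cols)` yields nothing and `new_table` is an empty list rather than a list of empty rows.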

I suggest using numpy.compress.
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
>>> a
array([[ 1, 2, 3],
[ 1, -3, 2],
[ 4, 5, 7]])
>>> a.sum(axis=0) # sums each column
array([ 6, 4, 12])
>>> a.sum(0) < 5
array([ False, True, False], dtype=bool)
>>> a.compress(a.sum(0) < 5, axis=1) # keeps only the columns whose entry in the condition array is True
array([[ 2],
[-3],
[ 5]])
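For comparison, here is a sketch showing that plain boolean indexing and `np.delete` give the same result as `compress` on this array. The names `keep`, `by_mask`, `by_compress` and `by_delete` are illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])

# One boolean per column: True where the column sum is below 5.
keep = a.sum(axis=0) < 5

by_mask = a[:, keep]                     # boolean indexing on the column axis
by_compress = a.compress(keep, axis=1)   # the method used in this answer
# np.delete takes the indices to drop, so invert the mask first.
by_delete = np.delete(a, np.where(~keep)[0], axis=1)

print(by_mask)
```

All three return a new array and leave `a` unchanged, so which one to use is mostly a matter of readability.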
