Numpy Matrix Modulo Index Extraction - python

Suppose I have a 2-dimensional matrix A, say
A = np.mat([[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12]])
how can I set all elements in row 1 whose column index is divisible by 2 to 0? I.e., I would like to obtain
np.mat([[1, 2, 3, 4],
        [0, 6, 0, 8],
        [9, 10, 11, 12]])
I have tried
A[1][np.arange(len(A))%2==0] = 0
which results in IndexError.

Column index % 2 = 0 means that the index is an even integer.
You can change the elements of the first row at even column indexes to 0 as follows:
A[1, ::2] = 0 # 2 is the step
If you want to keep the style of your (incorrect) attempt A[1][np.arange(len(A))%2==0] = 0, change it to
A[1, np.arange(A.shape[1]) % 2 == 0] = 0
where A.shape[1] is the number of columns (whereas len(A) gives you the number of rows).
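Putting both forms together in a short runnable sketch (using np.array in place of np.mat, which NumPy now discourages):

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# Slice syntax: in row 1, start at column 0 and step by 2
A[1, ::2] = 0

# Equivalent boolean-mask form on a fresh copy
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
B[1, np.arange(B.shape[1]) % 2 == 0] = 0

print(A)  # row 1 becomes [0 6 0 8]; A and B are identical
```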

Related

How to efficiently check conditions on two columns and perform operation on third column in python

I have three columns with thousands of rows. The numbers in columns 1 and 2 range from 1 to 6. I want to check combinations of the numbers in columns 1 and 2 in order to divide the value in column 3 by a certain value.
1 2 3.036010
1 3 2.622544
3 1 2.622544
1 2 3.036010
2 1 3.036010
Further, column 3 should be divided by the same number if the values of column 1 and column 2 are swapped: for example, for the combinations 1 2 and 2 1, column 3 may be divided by the same value. My present approach does the job, but I would have to write several conditions manually. What would be a more efficient way to perform this task? Thanks in advance!
my_data = np.loadtxt('abc.dat')
for row in my_data:
    if row[0] == 1 and row[1] == 2:
        row[2] /= some_value  # column 3 is index 2; /= stores the result
Numpy offers np.where, which allows for a vectorized test:
result = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])
or if you want to change the array in place:
data[:, 2] = np.where(data[:, 0] == data[:, 1], data[:, 2]/some_value, data[:, 2])
You could use a mask for this:
import numpy as np
my_data = np.column_stack([np.random.randint(1, 7, (1000, 2)), np.random.randn(1000)])  # randint's upper bound is exclusive
some_value = 123
mask = my_data[:, 0] == my_data[:, 1]
# divide
my_data[mask, 2] /= some_value
The result is written in place to my_data.
If you want to combine several conditions, as in your code, you can use the operator & for "and" or | for "or" with np.where:
cond1 = my_data[:, 0] == 1 # cond is a masked Boolean array for where the first condition is satisfied
cond2 = my_data[:, 1] == 2
some_value = 10
indices = np.where(cond1 & cond2)[0] # it gets indices for where the two conditions are satisfied
# indices = np.where(cond1 | cond2)[0] # it gets indices for where at least one of the masks is satisfied
result = my_data[:, 2][indices] / some_value # operation is done on the specified indices
And if you want to modify the third column (index 2) in place, as in Ballesta's answer:
my_data[:, 2][indices] = my_data[:, 2][indices] / some_value
np.logical_and and np.logical_or are other functions that can handle such conditions, too; use them as np.logical_and.reduce and np.logical_or.reduce when there are more than two conditions.
Maybe using pandas is more suitable for this task, you can define conditions and apply them to tabular data without any explicit loop.
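For the swapped-pair part of the question, one possible sketch: sort each (column 1, column 2) pair so that (2, 1) and (1, 2) share the same key, then divide by a per-pair value. The divisors dict here is made up purely for illustration; the actual values come from the problem.

```python
import numpy as np

# Hypothetical divisors per unordered (col1, col2) pair -- illustrative only
divisors = {(1, 2): 3.0, (1, 3): 2.0}

my_data = np.array([[1, 2, 3.036010],
                    [1, 3, 2.622544],
                    [3, 1, 2.622544],
                    [2, 1, 3.036010]])

# Sort each (col1, col2) pair so (2, 1) maps to the same key as (1, 2)
keys = np.sort(my_data[:, :2], axis=1).astype(int)

for pair, d in divisors.items():
    mask = (keys[:, 0] == pair[0]) & (keys[:, 1] == pair[1])
    my_data[mask, 2] /= d
```

Rows (1, 3) and (3, 1) end up divided by the same value, with no per-permutation condition written out by hand.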

How to set first value in numpy array that meets a condition to 1 but not the rest

I want to set the max value in a numpy array equal to 1, and the rest of the values to 0, so that there is only one value equal to 1 in the new array.
Right now I'm doing this with:
new_arr = np.where(arr == np.max(arr), 1, 0)
However, if there are multiple values in arr that are equal to np.max(arr) then there will be multiple values in new_arr equal to 1.
How do I make it so that there is only one value in new_arr equal to 1 (the first value equal to np.max(arr) seems like a fine option but not necessary).
You can use:
new_arr = np.zeros(shape=arr.shape)
new_arr[np.unravel_index(np.argmax(arr), shape=arr.shape)] = 1
This also works for multi-dimensional arrays. np.argmax gives the flattened index of the first instance of the max element, and np.unravel_index converts the flat index to the index based on the array shape.
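A quick check of this on a 2-D array with a tied maximum (a minimal sketch):

```python
import numpy as np

arr = np.array([[3, 7, 1],
                [7, 2, 5]])  # the max 7 appears twice

new_arr = np.zeros(shape=arr.shape)
# argmax returns the flat index of the first 7; unravel_index maps it
# back to the (row, col) position for this shape
new_arr[np.unravel_index(np.argmax(arr), shape=arr.shape)] = 1

print(new_arr)
# only the first occurrence (row 0, col 1) is set:
# [[0. 1. 0.]
#  [0. 0. 0.]]
```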
You almost have it.
This will give you the index of the last occurrence of the max value
np.where(arr == np.max(arr))[0][-1]
and if you want the first occurrence of the maximum value, it is:
np.where(arr == np.max(arr))[0][0]
Example
import numpy as np
arr = np.array(np.random.choice([1, 2, 3, 4], size=10))
print(arr)
Output : [4 1 2 4 1 1 4 2 4 3]
Then:
np.where(arr == np.max(arr))[0][-1] # last index of max value
Output: 8
or
np.where(arr == np.max(arr))[0][0] # first index of max value
Output: 0
You can then proceed to replace by index.
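That last step might look like the following minimal 1-D sketch, using the first-occurrence index:

```python
import numpy as np

arr = np.array([4, 1, 2, 4, 1, 1, 4, 2, 4, 3])

idx = np.where(arr == np.max(arr))[0][0]  # first index of the max
new_arr = np.zeros_like(arr)
new_arr[idx] = 1

print(new_arr)  # -> [1 0 0 0 0 0 0 0 0 0]
```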

How to replace repeated consecutive elements in a 2d numpy array with single element

I have a numpy array of shape (1080, 960):
[[0 0 255 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 255 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 255 255 ... 0 0 0]]
I want to output a numpy array that replaces repeated consecutive runs of both 0 and 255 with a single 0 and a single 255.
The numpy array is a representation of a binary image that has pixels in the form BBBWWWWWWWBBBBWWW where B is black and W is white. I want to convert it into BWBW.
Example:
input:
[[0, 0, 0, 255, 255, 255, 0, 0, 0, 0],
 [255, 255, 255, 0, 0, 0, 255, 255, 255],
 [0, 0, 255, 0, 0, 255, 0, 0, 255]]
output:
[[0, 255, 0],
 [255, 0, 255],
 [0, 255, 0, 255, 0, 255]]
You cannot output a 2D numpy array because the output rows may have different lengths, so I would settle for a list of numpy arrays. First, let's generate some data:
img = np.random.choice([0,255], size=(1080, 960))
Then iterate over each row:
out = []
for row in img:
    idx = np.ediff1d(row, to_begin=1).nonzero()[0]
    out.append(row[idx])
By taking the difference we simply detect where changes take place, and then use those indices idx to select the starting element of each consecutive streak. This solution is a bit simpler and faster than the one by @DavidWinder (30 ms vs. 150 ms).
A fully vectorized solution could be a bit faster still, but the code would be more complex: it would involve flattening arrays, raveling and unraveling indices, and applying np.split at the end, which is not a very fast operation because it creates a list. So I think this answer is a good compromise between speed and code simplicity.
Edit #1
If the preferred output is an array padded with 0s at the end, it is better to create a zeros array and fill it with the values of the out list. First find out which row has the most elements, and create the array:
max_elms = np.max([len(x) for x in out])
arr = np.zeros((1080, max_elms), dtype=np.int32)
And then iterate over out list and arr, filling values of arr with the ones in out list:
for row, data in zip(arr, out):
    row[:len(data)] = data
You can iterate over the rows and group the elements by building a new array, checking the last element appended and appending only when the value changes.
Function as follow:
def groupRow(row):
    newRow = [row[0]]
    for elem in row:
        if elem != newRow[-1]:
            newRow.append(elem)
    return newRow
Then iterate over the array and replace every row with the newRow returned by this function.
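Applying that function to every row might look like the following sketch; since groupRow returns plain Python lists of varying length, the result is a list of lists rather than a 2-D array:

```python
import numpy as np

def groupRow(row):
    # keep an element only when it differs from the last one kept
    newRow = [row[0]]
    for elem in row:
        if elem != newRow[-1]:
            newRow.append(elem)
    return newRow

img = np.array([[0, 0, 0, 255, 255, 255, 0, 0, 0, 0],
                [255, 255, 255, 0, 0, 0, 255, 255, 255, 255],
                [0, 0, 255, 0, 0, 255, 0, 0, 255, 255]])

out = [groupRow(row.tolist()) for row in img]
print(out)  # -> [[0, 255, 0], [255, 0, 255], [0, 255, 0, 255, 0, 255]]
```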

Python: Iterate over a data frame column, check for a condition-value stored in array, and get the values to a list

After some help in the forum I managed to do what I was looking for and now I need to get to the next level. ( the long explanation is here:
Python Data Frame: cumulative sum of column until condition is reached and return the index):
I have a data frame:
In [3]: df
Out[3]:
   index  Num_Albums  Num_authors
0      0          10            4
1      1           1            5
2      2           4            4
3      3           7         1000
4      4           1           44
5      5           3            8
I add a column with the cumulative sum of another column.
In [4]: df['cumsum'] = df['Num_Albums'].cumsum()
In [5]: df
Out[5]:
   index  Num_Albums  Num_authors  cumsum
0      0          10            4      10
1      1           1            5      11
2      2           4            4      15
3      3           7         1000      22
4      4           1           44      23
5      5           3            8      26
Then I apply a condition to the cumsum column and extract the corresponding values of the row where the condition is met, with a given tolerance:
In [18]: tol = 2
In [19]: cond = df.where((df['cumsum']>=15-tol)&(df['cumsum']<=15+tol)).dropna()
In [20]: cond
Out[20]:
   index  Num_Albums  Num_authors  cumsum
2    2.0         4.0          4.0    15.0
Now, what I want to do is to substitute for the condition (15 in the example) the conditions stored in an array, check when each condition is met, and retrieve not the entire row but only the value of the column Num_Albums. Finally, all these retrieved values (one per condition) are stored in an array or list.
Coming from matlab, I would do something like this (I apologize for this mixed matlab/python syntax):
conditions = np.array([10, 15, 23])
for i=0:len(conditions)
    retrieved_values(i) = df.where((df['cumsum']>=conditions(i)-tol)&(df['cumsum']<=conditions(i)+tol)).dropna()
So for the data frame above I would get (for tol=0):
retrieved_values = [10, 4, 1]
I would like a solution that lets me keep the .where function if possible..
A quick way to do this would be to leverage NumPy's broadcasting, as an extension of this answer from the same linked post, although an answer related to the use of DF.where was actually asked for.
Broadcasting eliminates the need to iterate through every element of the array and it's highly efficient at the same time.
The only addition to this post is the use of np.argmax to grab the indices of the first True instance along each column (traversing ↓ direction).
conditions = np.array([10, 15, 23])
tol = 0
num_albums = df.Num_Albums.values
num_albums_cumsum = df.Num_Albums.cumsum().values
slices = np.argmax(np.isclose(num_albums_cumsum[:, None], conditions, atol=tol), axis=0)
Retrieved slices:
slices
Out[692]:
array([0, 2, 4], dtype=int64)
Corresponding array produced:
num_albums[slices]
Out[693]:
array([10, 4, 1], dtype=int64)
If you still prefer using DF.where, here is another solution using list-comprehension -
[df.where((df['cumsum'] >= cond - tol) & (df['cumsum'] <= cond + tol), -1)['Num_Albums']
.max() for cond in conditions]
Out[695]:
[10, 4, 1]
Rows not fulfilling the given criteria are replaced by -1; doing it this way preserves the dtype at the end.
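For reference, a self-contained run of the broadcasting approach on the example frame (column names as in the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Num_Albums': [10, 1, 4, 7, 1, 3],
                   'Num_authors': [4, 5, 4, 1000, 44, 8]})

conditions = np.array([10, 15, 23])
tol = 0

num_albums = df.Num_Albums.values
num_albums_cumsum = df.Num_Albums.cumsum().values  # [10, 11, 15, 22, 23, 26]

# Broadcast cumsum (as a column) against conditions (as a row);
# argmax grabs the first True down each column
slices = np.argmax(np.isclose(num_albums_cumsum[:, None], conditions, atol=tol), axis=0)

retrieved_values = num_albums[slices]
print(retrieved_values)  # -> [10  4  1]
```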
The output will not always be a single number, right?
In case the output is exactly one number, you can write this code:
tol = 0
#condition
c = [5,15,25]
value = []
for i in c:
    if len(df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a']) > 0:
        value = value + [df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a'].values[0]]
    else:
        value = value + [[]]
print(value)
The output should look like:
[1, 2, 3]
In case the output can contain multiple numbers and you want it like this:
[[1.0, 5.0], [12.0, 15.0], [25.0]]
you can use this code:
tol = 5
c = [5,15,25]
value = []
for i in c:
    getdatas = df.where((df['a'] >= i - tol) & (df['a'] <= i + tol)).dropna()['a'].values
    value.append([x for x in getdatas])
print(value)

Print the row and column numbers of a matrix-python

I am trying to print the row number and column number of a matrix where the value is 1.
For example:
A = [[0, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
I want the output to be displayed as (the row number followed by the corresponding column indices):
0 1
1 0 2
2 0
I tried to use enumerate() but it gave me a different kind of output.
G={i: [j for j, y in enumerate(row) if y] for i, row in enumerate(A)}
print (G)
Python's indices are zero-based, and so is your desired output, so your dictionary already maps each row index to the right column indices; no shifting is needed, and the only remaining work is formatting. Dictionaries also print with braces and colons (and are unordered in older Pythons), so you may be better off with a list of tuples:
G = [(i, [j for j, y in enumerate(row) if y]) for i, row in enumerate(A)]
Or better still, just a 2-D list, using the indices as the first column when you need them.
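A minimal sketch of printing the indices in the requested layout (zero-based, matching the output shown in the question):

```python
A = [[0, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]

for i, row in enumerate(A):
    cols = [j for j, y in enumerate(row) if y]  # columns holding a 1
    print(i, *cols)
# ->
# 0 1
# 1 0 2
# 2 0
```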
