Update a matrix through advanced indexing/vectorizing - python

I have a matrix of what are effectively counters. I would like to increment those counters based on a list of column indices, where each position in the list corresponds to the row to increment.
This is straightforward with a for loop, and a little less straightforward with a list comprehension. Either way, iteration is involved. But I was wondering: is there any way to vectorise this problem?
The minimal problem is:
counters = np.zeros((4,4))
counters
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
update_columns = [1,0,2,2]
for row, col in zip(range(len(update_columns)), update_columns):
    counters[row, col] += 1
counters
array([[0., 1., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 1., 0.]])

What you are looking for is called advanced NumPy indexing. You can pass the row indices using np.arange and the column indices using update_columns:
update_columns = np.array(update_columns)
counters[np.arange(update_columns.size), update_columns] += 1
output:
[[0. 1. 0. 0.]
 [1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 1. 0.]]
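One caveat worth knowing, beyond what the question needs: with advanced indexing, += applies only a single increment per duplicated (row, column) pair, because the operation is buffered. np.add.at performs unbuffered in-place addition and counts every occurrence. A minimal sketch using the same data:

```python
import numpy as np

counters = np.zeros((4, 4))
update_columns = np.array([1, 0, 2, 2])
rows = np.arange(update_columns.size)

# Unbuffered in-place add: every (row, col) pair contributes an
# increment, even if the same pair occurs several times. Plain
# `counters[rows, update_columns] += 1` counts duplicates only once.
np.add.at(counters, (rows, update_columns), 1)
print(counters)
```

Here the row indices are all distinct, so both approaches agree; the difference only shows up when the same index pair repeats.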

Related

Python: Fill out edges of binary array

I'm using the following code to generate an array based on coordinates of edges:
verts = np.array(list(itertools.product((0,2), (0,2))))
arr = np.zeros((5, 5))
arr[tuple(verts.T)] = 1
plt.imshow(arr)
which gives me a plot with the four corners set, or, as a numeric array:
[[1., 0., 1., 0., 0.],
 [0., 0., 0., 0., 0.],
 [1., 0., 1., 0., 0.],
 [0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0.]]
Now, I would like to fill out the spaces in between the corners (i.e. the yellow squares in the plot) so that I get the following array:
[[1., 1., 1., 0., 0.],
 [1., 1., 1., 0., 0.],
 [1., 1., 1., 0., 0.],
 [0., 0., 0., 0., 0.],
 [0., 0., 0., 0., 0.]]
Replace (0,2) with range(0,3) (3 because ranges include the start but exclude the end), that is:
import itertools
import numpy as np
verts = np.array(list(itertools.product(range(0,3), range(0,3))))
arr = np.zeros((5, 5))
arr[tuple(verts.T)] = 1
print(arr)
output
[[1. 1. 1. 0. 0.]
 [1. 1. 1. 0. 0.]
 [1. 1. 1. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
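Since the region being filled is one contiguous square, a plain slice assignment gives the same result without itertools at all; a minimal alternative sketch:

```python
import numpy as np

arr = np.zeros((5, 5))
# Basic slicing covers the whole contiguous 3x3 block in one step.
arr[0:3, 0:3] = 1
print(arr)
```

The itertools/index-array route is still the one to reach for when the target cells are not contiguous.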

how to access a matrix and increase specific column?

For example, I have this matrix, and I need to access the second column and increase it by 2:
m = [[0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.],
     [0., 0., 0., 0.]]
You can do that by accessing the 2nd column and incrementing its values: m[:, 1] = m[:, 1] + 2
This selects all rows and only the specified column; here, index 1 refers to the 2nd column.
You can do this with the numpy library, which makes such operations easy.
First import numpy: import numpy as np
Convert the 2d list into a numpy array:
m = np.array([
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.]
])
Now apply the operation:
m[:, 1] = m[:, 1] + 2
Print the output:
print("M: ", m)
Combined Code:
import numpy as np
m = np.array([
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.]
])
m[:, 1] = m[:, 1] + 2
print("M: ", m)
So, you need to increase the second element of each row by 2. You could achieve this with a for loop:
for row in m:
    row[1] += 2
You could also convert the matrix into a numpy array, in case you want to exploit the optimisations that the library offers:
import numpy as np
m = np.array([
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.]
])
m[:, 1] += 2

tf.keras.utils.to_categorical mixing classes

I am using tf.keras.utils.to_categorical() for data preparation.
I have this very simple list and I want to get the categorical values out of it.
So I do this:
tf.keras.utils.to_categorical([1,2,3], num_classes=6)
and I get:
array([[0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.]], dtype=float32)
Now, for further usage, I reduce the values I pass to the function by 1, so that I get 6 classes without 0 as a placeholder:
tf.keras.utils.to_categorical([x - 1 for x in [1, 2, 3]], num_classes=6)
which results in this:
array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]], dtype=float32)
Now comes the weird part. I want to set certain features to 0, and that's how I found this behaviour:
tf.keras.utils.to_categorical([x - 1 for x in [-4, 2, 3]], num_classes=6)
results in:
array([[0., 1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.]], dtype=float32)
So to_categorical() is mapping -4 and 2 into the same class, which I find pretty weird. I would have expected an exception, since the list was not mappable to 6 classes, but I did not expect this. Is this a bug or a feature, and why is it happening?
Thanks!
That's completely normal. It just works consistently with Python's negative indexing. See:
import tensorflow as tf
tf.keras.utils.to_categorical([0, 1, 2, -1, -2, -3])
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.]], dtype=float32)
To put it differently:
import tensorflow as tf
a = tf.keras.utils.to_categorical([0, 1, 2], num_classes=3)
b = tf.keras.utils.to_categorical([-3, -2, -1], num_classes=3)
print(a)
print(b)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
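A way to see why negative labels wrap around: to_categorical behaves roughly like row-indexing an identity matrix (this is an illustration, not Keras's actual implementation), and NumPy's negative indexing selects rows from the end:

```python
import numpy as np

labels = np.array([0, 1, 2, -1, -2, -3])
num_classes = 3

# Row-indexing an identity matrix one-hot encodes the labels;
# negative labels pick rows counting from the end, matching the
# to_categorical output shown above.
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```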
As for why this happens: to_categorical in Keras isn't meant to work with negative numbers. If you want to make it work anyway, I suggest shifting all the numbers so they are non-negative. This code does that:
import numpy
arr = numpy.array([-5, -4, -2, -1, 0, 1, 2, 3, 4])  # anything
arr -= arr.min()
Keras to_categorical doesn't work for negative numbers; the documentation clearly states that the values must start from 0:
https://keras.io/api/utils/python_utils/#to_categorical-function
If you still need to make it work, build a dictionary to map the negative numbers to non-negative class ids.
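A minimal sketch of that dictionary mapping (the variable names here are just illustrative): remap each distinct label to a non-negative class id, then pass the remapped ids to tf.keras.utils.to_categorical:

```python
import numpy as np

labels = [-4, 2, 3]  # example labels, including a negative value

# Map each distinct label to a non-negative class id, then encode
# the remapped ids instead of the raw labels.
classes = sorted(set(labels))                      # [-4, 2, 3]
label_to_id = {c: i for i, c in enumerate(classes)}
mapped = np.array([label_to_id[x] for x in labels])
print(mapped)  # distinct non-negative ids, safe for to_categorical
```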

How do I copy elements of an array based on elements of another array?

pop = np.zeros((population_size, chromosome_length))
for i in range(population_size):
    for j in range(i, chromosome_length):
        pop[i, j] = random.randint(0, 1)
pop
array([[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
       [0., 0., 1., 0., 1., 0., 1., 1., 0., 0.],
       [0., 0., 1., 0., 0., 1., 1., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0., 1., 1., 1., 0.],
       [0., 0., 0., 0., 1., 0., 1., 1., 0., 1.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
I have another array expected, generated from un-shown code, with an example below:
array([[1.99214608],
       [1.45140389],
       [0.07068525],
       [0.69507167],
       [1.08384057],
       [0.70685254]])
I then want to bin the values of expected based on custom intervals:
actual = np.zeros((population_size, 1))
for i in range(len(expected)):
    if expected[i] >= 1.5:
        actual[i] = 2
    elif 1.5 > expected[i] >= 0.9:
        actual[i] = 1
    else:
        actual[i] = 0
actual = actual.astype(int)
total_count = int(np.sum(actual))
print(total_count)
[[2]
 [1]
 [0]
 [0]
 [1]
 [0]]
4
and I want the final output as:
array([[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
       [0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
       [0., 0., 1., 0., 1., 0., 1., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0., 1., 1., 0., 1.]])
based on the values in actual. The first row of pop got copied twice, the second row once and the fifth row once. In short, I want to repeat/copy/duplicate rows of one array based on the integer elements of another array.
I'll try to address this question in sections, as you are using NumPy arrays as though they were lists and therefore losing much of the benefit of the library. NumPy's syntax is more compact, and it comes with significant speed increases.
Creating the population
This one is simple enough. We can make a direct replacement for generating pop by using numpy.random.randint. We need to specify values for population_size and chromosome_length and use those to set the output shape.
population_size = 6
chromosome_length = 10
pop = np.random.randint(0, 2, (population_size, chromosome_length))
NOTE: This won't give the exact same values as you've included in your actual question because we haven't set a seed for the random number generator. However, the code is directly equivalent to your for loop but more performant.
Generating expected
I can't make an exact replacement for this section: it's too much to replace your loops, and some variables are undefined. So I'm just assuming that I'll get the same 2D array as you have shown:
expected = np.array([[1.99214608],
                     [1.45140389],
                     [0.07068525],
                     [0.69507167],
                     [1.08384057],
                     [0.70685254]])
Binning the data
This is a bit more complex. We can make use of numpy.digitize to bin the data between your intervals (0, 0.9 and 1.5). However, this method doesn't work with 2D arrays, so I'm going to use numpy.ravel() to flatten the array first.
This gives back the identity of the bin each value of expected belongs to. However, bin identities start at 1, and we want to use these values as indices into an array further on, so I'm also going to subtract 1 from the result at the same time.
bins = np.array([0, 0.9, 1.5])
dig = np.digitize(expected.ravel(), bins) - 1
Last Steps
I'm going to create an array of values that correspond to the bin categories. We can then use numpy.take to replace the values of dig with the corresponding replacement values.
replacements = np.array([0, 1, 2])
actual = np.take(replacements, dig)
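As a side note, because replacements here is just np.arange(3), the numpy.take step is an identity mapping and dig could be used directly; take only becomes necessary when the bin labels differ from 0, 1, 2. A quick check:

```python
import numpy as np

expected = np.array([1.99214608, 1.45140389, 0.07068525,
                     0.69507167, 1.08384057, 0.70685254])
bins = np.array([0, 0.9, 1.5])
dig = np.digitize(expected, bins) - 1

# replacements equals np.arange(3), so take() maps each bin id to itself.
replacements = np.array([0, 1, 2])
actual = np.take(replacements, dig)
print(actual)  # [2 1 0 0 1 0]
```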
And finally :), we can use numpy.repeat with actual to take rows from pop in the correct proportions and build the output.
Final Code
import numpy as np
population_size = 6
chromosome_length = 10
pop = np.random.randint(0, 2, (population_size, chromosome_length))
# But I'm going to deliberately overwrite the above to solve your particular case
pop = np.array([[0., 1., 0., 1., 1., 1., 0., 0., 1., 1.],
                [0., 0., 1., 0., 1., 0., 1., 1., 0., 0.],
                [0., 0., 1., 0., 0., 1., 1., 0., 0., 1.],
                [0., 0., 0., 0., 1., 0., 1., 1., 1., 0.],
                [0., 0., 0., 0., 1., 0., 1., 1., 0., 1.],
                [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]])
# Hard-coded :/
expected = np.array([[1.99214608],
                     [1.45140389],
                     [0.07068525],
                     [0.69507167],
                     [1.08384057],
                     [0.70685254]])
bins = np.array([0, 0.9, 1.5])
dig = np.digitize(expected.ravel(), bins) - 1
replacements = np.array([0, 1, 2])
actual = np.take(replacements, dig)
out = np.repeat(pop, actual, axis=0)
print(out)
Gives:
[[0. 1. 0. 1. 1. 1. 0. 0. 1. 1.]
 [0. 1. 0. 1. 1. 1. 0. 0. 1. 1.]
 [0. 0. 1. 0. 1. 0. 1. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0. 1. 1. 0. 1.]]

Vectorizing creation of an array of diagonal matrices [duplicate]

This question already has answers here:
Vectorized creation of an array of diagonal square arrays from a liner array in Numpy or Tensorflow
(5 answers)
Closed 3 years ago.
I have a 2d array called diagonals where each row represents the diagonal of a 2d matrix. What's the fastest/best way to create a 3d array diag_matricies where the last two dimensions each consist of a diagonal matrix created using the rows of diagonals?
In a loop this is what I want:
import numpy as np
diag_matricies = np.zeros([3,3,3])
diagonals = np.array([[1,2,3],[4,5,6],[7,8,9]])
for i in range(3):
    diag_matricies[i] = np.diag(diagonals[i, :])
print(diag_matricies)
One faster alternative is to use advanced indexing:
index = np.arange(3)
diag_matricies[:, index, index] = diagonals
[[[1. 0. 0.]
  [0. 2. 0.]
  [0. 0. 3.]]

 [[4. 0. 0.]
  [0. 5. 0.]
  [0. 0. 6.]]

 [[7. 0. 0.]
  [0. 8. 0.]
  [0. 0. 9.]]]
Timing with the size of each dimension being 1200:
from datetime import datetime
N = 1200
diag_matricies = np.zeros([N, N, N])
diagonals = np.arange(N * N).reshape((N, N))
start = datetime.now()
index = np.arange(N)
diag_matricies[:, index, index] = diagonals
print('advanced indexing: ', datetime.now() - start)
start = datetime.now()
for i in range(N):
    diag_matricies[i] = np.diag(diagonals[i])
print('for loop: ', datetime.now() - start)
# advanced indexing: 0:00:01.537120
# for loop: 0:00:07.281833
You can use np.einsum:
>>> out = np.zeros((3,3,3))
>>> np.einsum('ijj->ij',out)[...] = diagonals
>>> out
array([[[1., 0., 0.],
        [0., 2., 0.],
        [0., 0., 3.]],

       [[4., 0., 0.],
        [0., 5., 0.],
        [0., 0., 6.]],

       [[7., 0., 0.],
        [0., 8., 0.],
        [0., 0., 9.]]])
What this does under the hood is more or less the following:
>>> out2 = np.zeros((3,3,3))
>>> out2.reshape(3,9)[:,::4] = diagonals
>>> out2
array([[[1., 0., 0.],
        [0., 2., 0.],
        [0., 0., 3.]],

       [[4., 0., 0.],
        [0., 5., 0.],
        [0., 0., 6.]],

       [[7., 0., 0.],
        [0., 8., 0.],
        [0., 0., 9.]]])
The difference is that the einsum method also works for non-contiguous arrays.
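Another vectorized option, not shown in the answers above, is to broadcast the diagonals against an identity matrix; a sketch assuming the same 3x3 diagonals:

```python
import numpy as np

diagonals = np.array([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])
n = diagonals.shape[1]

# diagonals[:, None, :] has shape (3, 1, 3); multiplying by the
# (3, 3) identity broadcasts to (3, 3, 3), keeping each value only
# where the row and column indices of the last two axes match.
out = diagonals[:, None, :] * np.eye(n)
print(out)
```

This allocates a temporary of the full output size, so for very large inputs the einsum view-assignment may be preferable.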
