numpy count elements across axis 0 matching values from another array - python

Given a 3D array such as:
array = np.random.randint(1, 6, (3, 3, 3))
and an array of maximum values across axis 0:
max_array = array.max(axis=0)
Is there a vectorised way to count the number of elements in axis 0 of array which are equal to the value of the matching index in max_array? For example, if array contains [1, 3, 3] in one axis 0 position, the output is 2, and so on for the other 8 positions, returning an array with the counts.

To count the number of values in x which equal the corresponding value in xmax, you could use:
(x == xmax).sum(axis=0)
Note that since x has shape (3,3,3) and xmax has shape (3,3), the expression x == xmax causes NumPy to broadcast xmax up to shape (3,3,3) where the new axis is added on the left.
For example,
import numpy as np
np.random.seed(2015)
x = np.random.randint(1, 6, (3,3,3))
print(x)
# [[[3 5 5]
#   [3 2 1]
#   [3 4 1]]
#
#  [[1 5 4]
#   [1 4 1]
#   [2 3 4]]
#
#  [[2 3 3]
#   [2 1 1]
#   [5 1 2]]]
xmax = x.max(axis=0)
print(xmax)
# [[3 5 5]
#  [3 4 1]
#  [5 4 4]]
count = (x == xmax).sum(axis=0)
print(count)
# [[1 2 1]
#  [1 1 3]
#  [1 1 1]]
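As a small self-contained check of the same idea, here is a sketch on a hand-written array (the values are made up for illustration, not taken from the question). It uses np.count_nonzero, which should give the same result as summing the boolean mask.

```python
import numpy as np

# Hand-picked values (illustrative only); shape is (3, 2, 2).
x = np.array([[[1, 5], [2, 2]],
              [[3, 5], [2, 1]],
              [[3, 4], [2, 0]]])

xmax = x.max(axis=0)                     # shape (2, 2)
mask = x == xmax                         # xmax is broadcast to (3, 2, 2)
count = np.count_nonzero(mask, axis=0)   # same result as mask.sum(axis=0)

print(count)
# [[2 2]
#  [3 1]]
```

The top-left position holds [1, 3, 3] along axis 0, so its count is 2, which matches the example in the question.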

Related

Remove row from a numpy array according to some condition and concatenate result

I have a list of numpy arrays and I wanted to remove a row according to some condition.
Let's suppose I have the following list of numpy arrays, and I want to delete the rows which contain an item that is > 8.
test = [np.array([[2,2,4],[10,3,5],[1,2,4,],[1,2,4]]),
        np.array([[1,2,3],[1,3,5],[6,3,1],[9,1,2]])]
for i in test:
    z = np.argwhere(i>8)
    print(z)  # [[1 0]] and [[3 0]]
    a1 = np.delete(i,z,axis=0)
    print(a1)
This for loop also drops the row at index 0 of each array. How can I fix this?
Actual output:
[[1 2 4]
 [1 2 4]]
[[1 3 5]
 [6 3 1]]
Desired output:
[[2 2 4]
 [1 2 4]
 [1 2 4]]
[[1 2 3]
 [1 3 5]
 [6 3 1]]
From your example, you want to remove the row with index 1 from the first array,
and the row with index 3 from the second array.
Passing z directly makes np.delete treat every number in it, including the column index 0, as a row to delete, which is why row 0 disappears.
So pass only the row indices to np.delete:
a1 = np.delete(i, z[0][0], axis=0)
np.argwhere returns both the row and the column index of every match, but we're only interested in the rows:
np.argwhere(i > 8)[:, 0]
A row with several matching items would appear more than once, so keep only the unique row indices:
np.unique(np.argwhere(i > 8)[:, 0])
Altogether we get:
test = [np.array([[2,2,4],[10,3,5],[1,2,4,],[1,2,4]]),
        np.array([[1,2,3],[1,3,5],[6,3,1],[9,1,2]])]
for i in test:
    z = np.unique(np.argwhere(i>8)[:, 0])
    a1 = np.delete(i,z,axis=0)
    print(a1)
# [[2 2 4]
#  [1 2 4]
#  [1 2 4]]
# [[1 2 3]
#  [1 3 5]
#  [6 3 1]]
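An alternative that avoids index bookkeeping is a boolean row mask. The sketch below is one possible way to do it, not the answer's method. It keeps the rows where no item exceeds 8 and then concatenates the filtered arrays, as the title of the question asks; the names kept and result are chosen here for illustration.

```python
import numpy as np

test = [np.array([[2, 2, 4], [10, 3, 5], [1, 2, 4], [1, 2, 4]]),
        np.array([[1, 2, 3], [1, 3, 5], [6, 3, 1], [9, 1, 2]])]

# Keep a row only if none of its items is > 8.
kept = [arr[~(arr > 8).any(axis=1)] for arr in test]

# Stack the filtered arrays into a single (6, 3) array.
result = np.concatenate(kept, axis=0)
print(result)
# [[2 2 4]
#  [1 2 4]
#  [1 2 4]
#  [1 2 3]
#  [1 3 5]
#  [6 3 1]]
```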

How to insert tensor array into tensor matrix after every second position

I have a tensor array a and a tensor matrix m. I want to insert a into m at every second position of the result, starting at index 0, so that a copy of a precedes each row of m. Let's make an equivalent example using numpy and plain python:
# define m
m = np.array([[3,7,6],[4,3,1],[8,4,2],[2,8,7]])
print(m)
#[[3 7 6]
# [4 3 1]
# [8 4 2]
# [2 8 7]]
# define a
a = np.array([1,2,3])
#[1 2 3]
# insert a into m
result = []
for i in range(len(m)):
    result.append(a)
    result.append(m[i])
print(np.array(result))
#[[1 2 3]
# [3 7 6]
# [1 2 3]
# [4 3 1]
# [1 2 3]
# [8 4 2]
# [1 2 3]
# [2 8 7]]
I am looking for a solution in tensorflow. I am convinced that there is a solution that doesn't need a loop but I am not able to find one. I hope someone can help me out with this!
You can concatenate your target vector to the beginning of each row of your matrix, and then reshape the result.
import tensorflow as tf
initial_array = tf.constant([
    [3, 7, 6],
    [4, 3, 1],
    [8, 4, 2],
    [2, 8, 7],
])
vector_to_add = [1, 2, 3]
# Concatenate vector_to_add to the front of each row of initial_array: shape (4, 6)
concat = tf.concat([[vector_to_add] * initial_array.shape[0], initial_array], axis=1)
# Reshape so that each 6-element row becomes two 3-element rows: shape (8, 3)
output = tf.reshape(concat, (2 * initial_array.shape[0], initial_array.shape[1]))
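To see why concatenating and then reshaping interleaves the rows, here is a NumPy sketch of the same two steps, using m and a from the question. It is only an illustration of the idea in NumPy (the variable name wide is made up here), not the TensorFlow code itself.

```python
import numpy as np

m = np.array([[3, 7, 6], [4, 3, 1], [8, 4, 2], [2, 8, 7]])
a = np.array([1, 2, 3])

# Put a copy of `a` in front of every row of `m`: shape (4, 6).
wide = np.concatenate([np.tile(a, (len(m), 1)), m], axis=1)
print(wide[0])  # [1 2 3 3 7 6]

# A row-major reshape splits each 6-element row into two 3-element rows.
result = wide.reshape(2 * len(m), m.shape[1])
print(result[:4])
# [[1 2 3]
#  [3 7 6]
#  [1 2 3]
#  [4 3 1]]
```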
This should work (the tiled copy of a goes first, so that a precedes each row of m):
np.column_stack((np.tile(a, (4, 1)), m)).reshape(8, 3)
For the idea, see "Interweaving two numpy arrays". Apply any solution described there, and reshape.
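One common interweaving approach is to preallocate the output and fill the even and odd rows with strided slices. The following is a minimal NumPy sketch under that assumption; the name out is chosen here for illustration.

```python
import numpy as np

m = np.array([[3, 7, 6], [4, 3, 1], [8, 4, 2], [2, 8, 7]])
a = np.array([1, 2, 3])

# Preallocate twice as many rows as m has.
out = np.empty((2 * len(m), m.shape[1]), dtype=m.dtype)
out[0::2] = a   # even rows: `a`, broadcast to every selected row
out[1::2] = m   # odd rows: the rows of `m`, in order

print(out[:4])
# [[1 2 3]
#  [3 7 6]
#  [1 2 3]
#  [4 3 1]]
```

This avoids building the intermediate tiled array, at the cost of two assignments.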

how does numpy reshape work with negative variable as second parameter

I am experimenting with a negative value as the second entry of the shape passed to reshape:
a = np.array([[1,2,3], [4,5,6]])
print(np.reshape(a, (3,-1)) )
print("___________________________________")
print(np.reshape(a, (3,-2)) )
print("___________________________________")
print(np.reshape(a, (3,-3)) )
print("___________________________________")
print(np.reshape(a, (3,2)) )
All four reshapes above give the same output:
[[1 2]
 [3 4]
 [5 6]]
___________________________________
[[1 2]
 [3 4]
 [5 6]]
___________________________________
[[1 2]
 [3 4]
 [5 6]]
___________________________________
[[1 2]
 [3 4]
 [5 6]]
What is the difference between the calls above? Can -1 and 2 be used interchangeably?
The shape passed to reshape can contain one unknown dimension, represented by a negative number (the documentation specifies -1). Its value is inferred from the length of the array and the remaining dimensions.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html#numpy.reshape
For example:
a = np.array([[1,2,3, 4], [5,6,7,8]])
print(np.reshape(a, (-2)) )
print("___________________________________")
print(np.reshape(a, (2, 2,-2)) )
print("___________________________________")
print(np.reshape(a, (2, -1,-2)) )
Output
[1 2 3 4 5 6 7 8]
___________________________________
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
___________________________________
...
ValueError: can only specify one unknown dimension
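The inference rule can be checked with a short sketch that uses only -1, the documented placeholder. The helper try_reshape is a hypothetical name introduced here so that the failing cases can be shown without stopping the script.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])  # 8 elements in total

# The unknown dimension is the total size divided by the known dimensions.
print(a.reshape(-1).shape)         # (8,)
print(a.reshape(2, 2, -1).shape)   # (2, 2, 2)
print(a.reshape(4, -1).shape)      # (4, 2)


def try_reshape(arr, shape):
    """Return the resulting shape, or None if NumPy rejects the request."""
    try:
        return arr.reshape(shape).shape
    except ValueError:
        return None


print(try_reshape(a, (2, -1, -1)))  # None: more than one unknown dimension
print(try_reshape(a, (3, -1)))      # None: 8 is not divisible by 3
```

So -1 and an explicit 2 give the same result only when 2 happens to be the value that makes the sizes match.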
Reshaping with a negative number is no magic. As stated in the answer above, the number after the negative sign does not really matter.
Here is a function demonstrating how the unknown dimension is resolved. Note that this is purely demonstrative, not an actual implementation taken from the source code or anything like that.
def computeNegativeDim(arr, givenDims):
    # givenDims is the requested shape, with one negative (unknown) entry
    givenDims = list(givenDims)
    knownDims = [d for d in givenDims if d > 0]
    val = 1
    for k in knownDims:
        val *= k
    # the unknown dimension is the total size divided by the known ones
    dimOfPreviouslyUnknown = arr.size // val
    # replace the negative entry with the computed dimension
    resolvedDims = [dimOfPreviouslyUnknown if g < 0 else g for g in givenDims]
    return arr.reshape(resolvedDims)
Or something along those lines.

Tensorflow: stack all row pairs from a tensor

Given a tensor t=[[1,2], [3,4]], I need to produce ts=[[1,2,1,2], [1,2,3,4], [3,4,1,2], [3,4,3,4]]. That is, I need to stack together all row pairs.
Important: the tensor has dimension [None, 2], i.e. the first dimension is variable.
I have tried:
Using a tf.while_loop to generate a list of indices idx=[[0, 0], [0, 1], [1, 0], [1, 1]], then tf.gather(t, idx). This works but is messy, and I don't know what to do about gradients.
Two for loops iterating over tf.unstack(t), adding stacked rows to a buffer, then tf.stack(buffer). This does not work if the first dimension is variable.
Looking for inspiration in broadcasting. For instance, given x=tf.expand_dims(t, 0), y=tf.expand_dims(t, 1), s=tf.reshape(tf.add(x, y), [-1, 2]), s will be [[2, 4], [4, 6], [4, 6], [6, 8]], i.e. the sum of every row combination. But how can I do stacking instead of summing? I've been failing for 2 days :)
Solution with tf.meshgrid() and some reshaping:
import tensorflow as tf
import numpy as np
t = tf.placeholder(tf.int32, [None, 2])
num_rows, size_row = tf.shape(t)[0], tf.shape(t)[1] # actual dynamic dimensions
# Getting pair indices using tf.meshgrid:
idx_range = tf.range(num_rows)
pair_indices = tf.stack(tf.meshgrid(*[idx_range, idx_range]))
pair_indices = tf.transpose(pair_indices, perm=[1, 2, 0])
# Finally gathering the rows accordingly:
res = tf.reshape(tf.gather(t, pair_indices), (-1, size_row * 2))
with tf.Session() as sess:
    print(sess.run(res, feed_dict={t: np.array([[1,2], [3,4], [5,6]])}))
# [[1 2 1 2]
#  [3 4 1 2]
#  [5 6 1 2]
#  [1 2 3 4]
#  [3 4 3 4]
#  [5 6 3 4]
#  [1 2 5 6]
#  [3 4 5 6]
#  [5 6 5 6]]
Solution using cartesian product:
import tensorflow as tf
import numpy as np
t = tf.placeholder(tf.int32, [None, 2])
num_rows, size_row = tf.shape(t)[0], tf.shape(t)[1] # actual dynamic dimensions
# Getting pair indices by computing the indices cartesian product:
row_idx = tf.range(num_rows)
row_idx_a = tf.expand_dims(tf.tile(tf.expand_dims(row_idx, 1), [1, num_rows]), 2)
row_idx_b = tf.expand_dims(tf.tile(tf.expand_dims(row_idx, 0), [num_rows, 1]), 2)
pair_indices = tf.concat([row_idx_a, row_idx_b], axis=2)
# Finally gathering the rows accordingly:
res = tf.reshape(tf.gather(t, pair_indices), (-1, size_row * 2))
with tf.Session() as sess:
    print(sess.run(res, feed_dict={t: np.array([[1,2], [3,4], [5,6]])}))
# [[1 2 1 2]
#  [1 2 3 4]
#  [1 2 5 6]
#  [3 4 1 2]
#  [3 4 3 4]
#  [3 4 5 6]
#  [5 6 1 2]
#  [5 6 3 4]
#  [5 6 5 6]]
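The two outputs above contain the same rows in a different order, which comes down to how the index grid is laid out. As a NumPy illustration (a sketch of the indexing idea only, not the TensorFlow code), passing indexing="ij" to np.meshgrid makes the first index vary slowest, which gives the row order asked for in the question:

```python
import numpy as np

t = np.array([[1, 2], [3, 4], [5, 6]])
idx = np.arange(len(t))

# indexing="ij": the first index varies slowest (row-major pair order).
first, second = np.meshgrid(idx, idx, indexing="ij")
pair_indices = np.stack([first.ravel(), second.ravel()], axis=1)  # shape (9, 2)

# t[pair_indices] has shape (9, 2, 2); flattening the last two axes joins each pair.
res = t[pair_indices].reshape(len(pair_indices), -1)
print(res[:4])
# [[1 2 1 2]
#  [1 2 3 4]
#  [1 2 5 6]
#  [3 4 1 2]]
```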
This can be achieved with:
tf.concat([tf.tile(tf.expand_dims(t,1), [1, tf.shape(t)[0], 1]), tf.tile(tf.expand_dims(t,0), [tf.shape(t)[0], 1, 1])], axis=2)
Detailed steps:
import tensorflow as tf
import numpy as np
t = tf.placeholder(tf.int32, shape=[None, 2])
# repeat each row of t
d = tf.tile(tf.expand_dims(t,1), [1, tf.shape(t)[0], 1])
# Output:
# [[[1 2] [1 2]]
#  [[3 4] [3 4]]]
# repeat the entire input t
e = tf.tile(tf.expand_dims(t,0), [tf.shape(t)[0], 1, 1])
# Output:
# [[[1 2] [3 4]]
#  [[1 2] [3 4]]]
# concat
f = tf.concat([d, e], axis=2)
with tf.Session() as sess:
    print(sess.run(f, {t:np.asarray([[1,2],[3,4]])}))
# Output (shape [2, 2, 4]; tf.reshape(f, [-1, 4]) would flatten it to the requested 2-D layout):
# [[[1 2 1 2]
#   [1 2 3 4]]
#  [[3 4 1 2]
#   [3 4 3 4]]]
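The same repeat-and-tile idea can be sketched in NumPy, working directly on 2-D arrays so that no final reshape is needed. This is an illustrative equivalent rather than the TensorFlow answer itself; the names left, right and ts are chosen here.

```python
import numpy as np

t = np.array([[1, 2], [3, 4]])
n = t.shape[0]

left = np.repeat(t, n, axis=0)   # each row repeated n times: t0, t0, t1, t1
right = np.tile(t, (n, 1))       # the whole block repeated: t0, t1, t0, t1

# Joining the two side by side yields every ordered pair of rows.
ts = np.concatenate([left, right], axis=1)
print(ts)
# [[1 2 1 2]
#  [1 2 3 4]
#  [3 4 1 2]
#  [3 4 3 4]]
```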

numpy 2D array: get indices of all entries that are connected and share the same value

I have a 2D numpy array filled with integer values from 0 to N. How can I get the indices of all entries that are directly connected and share the same value?
Addition: most of the entries are zero and can be ignored!
Example Input array:
[ 0 0 0 0 0 ]
[ 1 1 0 1 1 ]
[ 0 1 0 1 1 ]
[ 1 0 0 0 0 ]
[ 2 2 2 2 2 ]
Wished output indices:
1: [ [1 0] [1 1] [2 1] [3 0] ] # first 1 cluster
[ [1 3] [1 4] [2 3] [2 4] ] # second 1 cluster
2: [ [4 0] [4 1] [4 2] [4 3] [4 4] ] # only 2 cluster
The formatting of the output arrays is not important; I just need separated value clusters in which the individual indices can be addressed.
What I was first thinking of is:
N = numberClusters
x = myArray
for c in range(N):
    for i in np.argwhere(x == c):
        pass  # fill output array with i
But this misses the separation of clusters that have the same value.
You can use skimage.measure.label (install it with pip install scikit-image, if needed) for this:
import numpy as np
from skimage import measure
# Set up some data
np.random.seed(42)
# Note: the third positional argument of np.random.choice is `replace`, not `p`,
# so the weights below are not used as probabilities (pass p=[...] for that).
img = np.random.choice([0, 1, 2], (5, 5), [0.7, 0.2, 0.1])
# [[2 0 2 2 0]
#  [0 2 1 2 2]
#  [2 2 0 2 1]
#  [0 1 1 1 1]
#  [0 0 1 1 0]]
# Label each region, considering only directly adjacent pixels connected
img_labeled = measure.label(img, connectivity=1)
# [[1 0 2 2 0]
#  [0 3 4 2 2]
#  [3 3 0 2 5]
#  [0 5 5 5 5]
#  [0 0 5 5 0]]
# Get the indices for each region, excluding zeros
idx = [np.where(img_labeled == label)
       for label in np.unique(img_labeled)
       if label]
# [(array([0]), array([0])),
# (array([0, 0, 1, 1, 2]), array([2, 3, 3, 4, 3])),
# (array([1, 2, 2]), array([1, 0, 1])),
# (array([1]), array([2])),
# (array([2, 3, 3, 3, 3, 4, 4]), array([4, 1, 2, 3, 4, 2, 3]))]
# Get the bounding boxes of each region (ignoring zeros)
bboxes = [area.bbox for area in measure.regionprops(img_labeled)]
# [(0, 0, 1, 1),
# (0, 2, 3, 5),
# (1, 0, 3, 2),
# (1, 2, 2, 3),
# (2, 1, 5, 5)]
The bounding boxes can be found using the very helpful function skimage.measure.regionprops, which contains a plethora of information on the regions. For the bounding box it returns a tuple of (min_row, min_col, max_row, max_col), where pixels belonging to the bounding box are in the half-open interval [min_row; max_row) and [min_col; max_col).
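For readers who cannot install scikit-image, the grouping can also be sketched with a plain breadth-first flood fill. The function name connected_clusters and its diagonal flag are hypothetical names introduced here, and this is a simple illustration rather than an optimized implementation. Note that the wished output in the question puts [3 0] in the same cluster as [2 1], which only holds if diagonal neighbours count as connected, so the example passes diagonal=True.

```python
from collections import deque

import numpy as np


def connected_clusters(grid, diagonal=False):
    """Group connected cells that share the same nonzero value.

    Returns {value: [cluster, ...]}, each cluster a sorted list of (row, col).
    """
    grid = np.asarray(grid)
    rows, cols = grid.shape
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if diagonal:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    seen = np.zeros(grid.shape, dtype=bool)
    clusters = {}
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] == 0 or seen[r, c]:
                continue  # zeros are background; seen cells already belong to a cluster
            value = grid[r, c]
            seen[r, c] = True
            queue = deque([(r, c)])
            cluster = []
            while queue:  # breadth-first flood fill starting at (r, c)
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not seen[nr, nc] and grid[nr, nc] == value:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
            clusters.setdefault(int(value), []).append(sorted(cluster))
    return clusters


# The example input from the question.
example = np.array([[0, 0, 0, 0, 0],
                    [1, 1, 0, 1, 1],
                    [0, 1, 0, 1, 1],
                    [1, 0, 0, 0, 0],
                    [2, 2, 2, 2, 2]])

for value, groups in connected_clusters(example, diagonal=True).items():
    for group in groups:
        print(value, group)
# 1 [(1, 0), (1, 1), (2, 1), (3, 0)]
# 1 [(1, 3), (1, 4), (2, 3), (2, 4)]
# 2 [(4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
```

With diagonal=False the cell (3, 0) ends up in a cluster of its own, which corresponds to connectivity=1 in the skimage example above.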
