I have a multidimensional array. Once a critical value appears along the last dimension, I would like to mutate the tail of that dimension.
np.random.seed(100)
arr = np.random.uniform(size=100).reshape([2,5,2,5])
# array([[[[ 0.54340494, 0.27836939, 0.42451759, 0.84477613, 0.00471886],
# [ 0.12156912, 0.67074908, 0.82585276, 0.13670659, 0.57509333]],
# [[ 0.89132195, 0.20920212, 0.18532822, 0.10837689, 0.21969749],
# [ 0.97862378, 0.81168315, 0.17194101, 0.81622475, 0.27407375]],
# [[ 0.43170418, 0.94002982, 0.81764938, 0.33611195, 0.17541045],
# [ 0.37283205, 0.00568851, 0.25242635, 0.79566251, 0.01525497]],
# [[ 0.59884338, 0.60380454, 0.10514769, 0.38194344, 0.03647606],
# [ 0.89041156, 0.98092086, 0.05994199, 0.89054594, 0.5769015 ]],
# [[ 0.74247969, 0.63018394, 0.58184219, 0.02043913, 0.21002658],
# [ 0.54468488, 0.76911517, 0.25069523, 0.28589569, 0.85239509]]],
# [[[ 0.97500649, 0.88485329, 0.35950784, 0.59885895, 0.35479561],
# [ 0.34019022, 0.17808099, 0.23769421, 0.04486228, 0.50543143]],
# [[ 0.37625245, 0.5928054 , 0.62994188, 0.14260031, 0.9338413 ],
# [ 0.94637988, 0.60229666, 0.38776628, 0.363188 , 0.20434528]],
# [[ 0.27676506, 0.24653588, 0.173608 , 0.96660969, 0.9570126 ],
# [ 0.59797368, 0.73130075, 0.34038522, 0.0920556 , 0.46349802]],
# [[ 0.50869889, 0.08846017, 0.52803522, 0.99215804, 0.39503593],
# [ 0.33559644, 0.80545054, 0.75434899, 0.31306644, 0.63403668]],
# [[ 0.54040458, 0.29679375, 0.1107879 , 0.3126403 , 0.45697913],
# [ 0.65894007, 0.25425752, 0.64110126, 0.20012361, 0.65762481]]]])
Let's say the critical value is 0.80. We need to mutate all further values after we see a value higher than 0.80. We focus on the first two "rows", which correspond to [3, 2] after selection with np.argmax.
where_bigger = np.argmax(arr >= 0.80, axis=3)
# array([[[3, 2], ## used as example later !!!!!!!!!
# [0, 0],
# [1, 0],
# [0, 0],
# [0, 4]],
# [[0, 0],
# [4, 0],
# [3, 0],
# [3, 1],
# [0, 0]]])
As an example, we first focus on the element with index 3 in [3, 2] (see above with !!!!). Once we find a value higher than 0.80 (its index here is 3), all following values should be replaced with np.nan:
arr[0,0,0,3]  ## 0.84477613; index 3 is the first element of [3, 2]
# arr[0,0,0] should become: [ 0.54340494, 0.27836939, 0.42451759, 0.84477613, np.nan]
Similarly, we focus on element 2 of [3, 2] and need to set all following elements to np.nan:
arr[0,0,1,2]  ## 0.82585276; index 2 is the second element of [3, 2]
# arr[0,0,1] should become: [ 0.12156912, 0.67074908, 0.82585276, np.nan, np.nan]
In the end we repeat this for all elements found by argmax:
# array([[[[ 0.54340494, 0.27836939, 0.42451759, 0.84477613, np.nan],
#          [ 0.12156912, 0.67074908, 0.82585276, np.nan, np.nan]],
#         [[ 0.89132195, np.nan, np.nan, np.nan, np.nan],
#          [ 0.97862378, np.nan, np.nan, np.nan, np.nan]],
#         [[ 0.43170418, 0.94002982, np.nan, np.nan, np.nan],
# ...
Is it possible to adjust the whole array at once, without looping? It can probably be done with slicing. I would like some approach like
arr[where_bigger:] = np.nan
but that is clearly wrong, and so far I could not progress further.
Your best bet is some type of boolean mask. You can build the tail with np.logical_or.accumulate, but that will include the index holding the threshold value. If you want to keep the first instance, you'll have to pad the mask.
mask = np.c_[np.zeros(arr.shape[:-1] + (1,), dtype=bool),
             np.logical_or.accumulate(arr > .8, axis=-1)[..., :-1]]
arr[mask] = np.nan
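As a quick sanity check of the padded accumulate, here is the same idea on a single row (a minimal sketch, assuming the 0.8 threshold from the question):
import numpy as np

row = np.array([0.12, 0.67, 0.83, 0.14, 0.58])
tail = np.logical_or.accumulate(row > 0.8)    # [False, False, True, True, True]
mask = np.concatenate(([False], tail[:-1]))   # shift right so the first hit itself is kept
row[mask] = np.nan
print(row)                                    # [0.12 0.67 0.83  nan  nan]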
I'm implementing the Nearest Centroid Classification algorithm and I'm kind of blocked on how to use numpy.mean in my case.
So suppose I have some spherical dataset X:
[[ 0.39151059 3.48203037]
[-0.68677876 1.45377717]
[ 2.30803493 4.19341503]
[ 0.50395297 2.87076658]
[ 0.06677012 3.23265678]
[-0.24135103 3.78044279]
[-0.05660036 2.37695381]
[ 0.74210998 -3.2654815 ]
[ 0.05815341 -2.41905942]
[ 0.72126958 -1.71081388]
[ 1.03581142 -4.09666955]
[ 0.23209714 -1.86675298]
[-0.49136284 -1.55736028]
[ 0.00654881 -2.22505305]]
and the labeled vector Y:
[0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
An example with 100 2D data points gives a scatter plot of the two classes (image not included here).
The NCC algorithm consists of first calculating the class mean of each class (0 and 1: that's blue and red) and then assigning each new data point to the nearest class centroid.
This is my current function:
def mean_ncc(X, Y):
    # find unique classes
    m_cids = np.unique(Y)  # [0. 1.]
    # compute class means
    mu = np.zeros((len(m_cids), X.shape[1]))  # [[0. 0.] [0. 0.]] when Y has 2 unique labels (0 and 1)
    for class_idx, class_label in enumerate(m_cids):
        mu[class_idx, :] =  # problem here
    return mu
So here I want an array containing the class means of the '0' (blue) points and the '1' (red) points. How can I specify which elements of X the mean should be calculated over?
I would like to do something like this:
for class_idx, class_label in enumerate(m_cids):
    mu[class_idx, :] = np.mean(X[only the elements that have the same class_label], axis=0)
Is it possible or is there another way to implement this?
You could use something like this:
import numpy as np
tags = [0, 0, 1, 1, 0, 1]
values = [5, 4, 2, 5, 9, 8]
tags_np = np.array(tags)
values_np = np.array(values)
print(values_np[tags_np == 1].mean())
EDIT: You will surely also need to look into the axis parameter of the mean function:
import numpy as np
values = [[5, 4],
[5, 4],
[4, 3],
[4, 3]]
values_np = np.array(values)
tags_np = np.array([0, 0, 1, 1])
print(values_np[tags_np == 0].mean(axis=0))
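Putting that together with the mean_ncc skeleton from the question, a minimal sketch could look like this (assuming X and Y have the shapes shown above):
import numpy as np

def mean_ncc(X, Y):
    # one row of class means per unique label in Y
    classes = np.unique(Y)
    mu = np.zeros((len(classes), X.shape[1]))
    for class_idx, class_label in enumerate(classes):
        # the boolean mask Y == class_label selects the rows of X in this class
        mu[class_idx, :] = X[Y == class_label].mean(axis=0)
    return mu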
I have a Tensorflow tensor A with (let's say) shape (5, 3, 5).
I want to get a tensor B with shape (5, 3) such that
# B = [A[0, :, 0], A[1, :, 1], A[2, :, 2], ...]
I want to achieve this indexing without using any for-loops.
Using numpy one would do:
import numpy as np
# A.shape = (5, 3, 5)
B = A[np.arange(A.shape[0]), :, np.arange(A.shape[2])]
Any suggestions how to do this using Tensorflow?
There are two ways to achieve your goal.
import tensorflow as tf

a = tf.random_normal(shape=(5, 3, 5))

# method 1: take the diagonal after transposing
b_diag = tf.matrix_diag_part(tf.transpose(a, [1, 0, 2]))  # shape = (3, 5)
result1 = tf.transpose(b_diag, [1, 0])

# method 2: gather the values by indices
indices = tf.stack([tf.range(tf.shape(a)[0])] * 2, axis=-1)
# [[0 0]
#  [1 1]
#  [2 2]
#  [3 3]
#  [4 4]]
result2 = tf.gather_nd(tf.transpose(a, [0, 2, 1]), indices)

with tf.Session() as sess:
    val_a, val_result1, val_result2 = sess.run([a, result1, result2])
    print('origin matrix:\n', val_a)
    print('method 1:\n', val_result1)
    print('method 2:\n', val_result2)
origin matrix:
[[[ 0.6905094 0.13725948 -0.42244634 -0.19795062 0.02895796]
[-1.2307093 -0.90263253 0.8939539 0.43943858 0.60205126]
[ 0.1317933 0.7697048 -0.8040689 -0.41206598 -0.66366917]]
[[-0.07341296 -0.83268213 1.1547179 -1.035854 -0.43292868]
[ 0.63890094 -1.9335823 -0.61634874 -3.2909455 -1.1862688 ]
[-1.0031502 -0.07485765 0.53183764 0.55050373 -0.03113765]]
[[ 0.23482691 -0.9363624 0.30995724 -0.02038437 0.65965956]
[ 0.73754835 0.23244548 -1.5190666 0.89143264 -0.47610378]
[ 0.6452583 1.5191171 -0.15525642 0.5060588 1.2310679 ]]
[[ 0.32281107 0.80718434 -0.865543 0.5899832 -0.66145474]
[ 0.45294672 -0.31048244 -0.48481905 -1.1497563 1.4231541 ]
[ 0.2343677 -0.8113462 0.58899856 1.6336825 0.11803629]]
[[ 0.8602735 1.3486015 1.4897087 -1.2132328 -0.70290196]
[-2.635646 -0.3950463 0.19890717 -1.9909118 1.3279002 ]
[-0.88162804 -0.7264523 -0.40416357 -0.7689555 1.33081 ]]]
method 1:
[[ 0.6905094 -1.2307093 0.1317933 ]
[-0.83268213 -1.9335823 -0.07485765]
[ 0.30995724 -1.5190666 -0.15525642]
[ 0.5899832 -1.1497563 1.6336825 ]
[-0.70290196 1.3279002 1.33081 ]]
method 2:
[[ 0.6905094 -1.2307093 0.1317933 ]
[-0.83268213 -1.9335823 -0.07485765]
[ 0.30995724 -1.5190666 -0.15525642]
[ 0.5899832 -1.1497563 1.6336825 ]
[-0.70290196 1.3279002 1.33081 ]]
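For what it's worth, the same per-row diagonal can be written as a single einsum in NumPy, where repeating an index label extracts a diagonal (a sketch using the shapes from the question):
import numpy as np

A = np.random.rand(5, 3, 5)
B = np.einsum('iji->ij', A)   # B[i, j] == A[i, j, i]
assert B.shape == (5, 3)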
I have a numpy 2D matrix in Python and I want to downsample it by keeping 25% of the initial samples. In order to do so, I am using the following random.randint functionality:
reduced_train_face = face_train[np.random.randint(face_train.shape[0], size=300), :]
However, I have a second matrix which contains the labels associated with the faces, and I want to reduce it in the same way. How can I keep the indexes from the reduced matrix and apply them to the train_lbls matrix?
You can fix the seed just before applying your extraction:
import numpy as np
# Each label corresponds to the first element of each row of face_train
labels_train = np.array(range(0,15,3))
face_train = np.array(range(15)).reshape(5,3)
np.random.seed(0)
reduced_train_face = face_train[np.random.randint(face_train.shape[0], size=3), :]
np.random.seed(0)
reduced_train_labels = labels_train[np.random.randint(labels_train.shape[0], size=3)]
print(reduced_train_face, reduced_train_labels)
# [[12, 13, 14], [ 0, 1, 2], [ 9, 10, 11]], [12, 0, 9]
With the same seed, both arrays are reduced the same way.
Edit: I advise you to use np.random.choice(n_total_elem, n_reduce_elem, replace=False) to ensure that each sample is chosen at most once, rather than possibly picking the same sample twice.
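A minimal sketch of that variant, assuming face_train and train_lbls as in the question (and that face_train has at least 300 rows, since replace=False forbids repeats):
import numpy as np

idx = np.random.choice(face_train.shape[0], size=300, replace=False)
reduced_train_face = face_train[idx, :]
reduced_train_labels = train_lbls[idx]
Note that this also removes the need to reset the seed, since the same idx is reused for both arrays.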
Why don't you keep the selected indexes and use them to select data from both matrices?
import numpy as np
# setting up matrices
np.random.seed(1234) # make example repeatable
# the seeding is optional, only for the showing the
# same results as below!
face_train = np.random.rand(8,3)
train_lbls = np.random.rand(8)
print('face_train:\n', face_train)
print('labels:\n', train_lbls)
# Setting the random indexes
random_idxs = np.random.randint(face_train.shape[0], size=4)
print('random_idxs:\n', random_idxs)
# Using the indexes to slice the matrixes
reduced_train_face = face_train[random_idxs, :]
reduced_labels = train_lbls[random_idxs]
print('reduced_train_face:\n', reduced_train_face)
print('reduced_labels:\n', reduced_labels)
Gives as output:
face_train:
[[ 0.19151945 0.62210877 0.43772774]
[ 0.78535858 0.77997581 0.27259261]
[ 0.27646426 0.80187218 0.95813935]
[ 0.87593263 0.35781727 0.50099513]
[ 0.68346294 0.71270203 0.37025075]
[ 0.56119619 0.50308317 0.01376845]
[ 0.77282662 0.88264119 0.36488598]
[ 0.61539618 0.07538124 0.36882401]]
labels:
[ 0.9331401 0.65137814 0.39720258 0.78873014 0.31683612 0.56809865
0.86912739 0.43617342]
random_idxs:
[1 7 5 4]
reduced_train_face:
[[ 0.78535858 0.77997581 0.27259261]
[ 0.61539618 0.07538124 0.36882401]
[ 0.56119619 0.50308317 0.01376845]
[ 0.68346294 0.71270203 0.37025075]]
reduced_labels:
[ 0.65137814 0.43617342 0.56809865 0.31683612]
Can someone please help me understand why advanced selection sometimes doesn't work, and what I can do to get it to work (second case)?
>>> import numpy as np
>>> b = np.random.rand(5, 14, 3, 2)
# advanced selection works as expected
>>> b[[0,1],[0,1]]
array([[[ 0.7575555 , 0.18989068],
[ 0.06816789, 0.95760398],
[ 0.88358107, 0.19558106]],
[[ 0.62122898, 0.95066355],
[ 0.62947885, 0.00297711],
[ 0.70292323, 0.2109297 ]]])
# doesn't work - why?
>>> b[[0,1],[0,1,2]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
# but this seems to work
>>> b[:,[0,1,2]]
array([[[[ 7.57555496e-01, 1.89890676e-01],
[ 6.81678915e-02, 9.57603975e-01],
[ 8.83581071e-01, 1.95581063e-01]],
[[ 2.24896112e-01, 4.77818599e-01],
[ 4.29313861e-02, 8.61578045e-02],
[ 4.80092364e-01, 3.66821618e-01]],
...
Update
Breaking up the selection seems to resolve the problem, but I am unsure why this is necessary (or if there's a better way to achieve this).
>>> b.shape
(5, 14, 3, 2)
>>> b[[0,1]].shape
(2, 14, 3, 2)
# trying to separate indexing by dimension.
>>> b[[0,1]][:,[0,1,2]]
array([[[[ 0.7575555 , 0.18989068],
[ 0.06816789, 0.95760398],
[ 0.88358107, 0.19558106]],
[[ 0.22489611, 0.4778186 ],
[ 0.04293139, 0.0861578 ],
You want
b[np.ix_([0, 1], [0, 1, 2])]
You also need to do the same thing for b[[0, 1], [0, 1]], because that's not actually doing what you think it is:
b[np.ix_([0, 1], [0, 1])]
The problem here is that advanced indexing does something completely different from what you think it does. You've made the mistake of thinking that b[[0, 1], [0, 1, 2]] means "take all parts b[i, j] of b where i is 0 or 1 and j is 0, 1, or 2". This is a reasonable mistake to make, considering that it seems to work that way when you have one list in the indexing expression, like
b[:, [1, 3, 5], 2]
In fact, for an array A and one-dimensional integer arrays I and J, A[I, J] is an array where
A[I, J][n] == A[I[n], J[n]]
This generalizes in the natural way to more index arrays, so for example
A[I, J, K][n] == A[I[n], J[n], K[n]]
and to higher-dimensional index arrays, so if I and J are two-dimensional, then
A[I, J][m, n] == A[I[m, n], J[m, n]]
It also applies the broadcasting rules to the index arrays, and converts lists in the indexes to arrays. This is much more powerful than what you expected to happen, but it means that to do what you were trying to do, you need something like
b[[[0],
[1]], [[0, 1, 2]]]
np.ix_ is a helper that will do that for you so you don't have to write a dozen brackets.
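A quick shape check of what np.ix_ builds (a sketch with the array from the question): the two index arrays broadcast to shape (2, 3), and the trailing dimensions are kept.
import numpy as np

b = np.random.rand(5, 14, 3, 2)
rows, cols = np.ix_([0, 1], [0, 1, 2])   # shapes (2, 1) and (1, 3)
sel = b[rows, cols]
print(sel.shape)                          # (2, 3, 3, 2)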
I think you misunderstood the advanced selection syntax for this case. I used your example, just made it smaller so it's easier to see.
import numpy as np
b = np.random.rand(5, 4, 3, 2)
# advanced selection works as expected
print(b[[0,1],[0,1]])  # http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
# this picks i=0,j=0 (a 3x2 matrix) and i=1,j=1 (another 3x2 matrix)
# doesn't work - why?
# print(b[[0,1],[0,1,2]])  # this doesn't work because [0,1] and [0,1,2] have different lengths
print(b[[0,1,2],[0,1,2]])  # works
Output:
[[[ 0.27334558 0.90065184]
[ 0.8624593 0.34324983]
[ 0.19574819 0.2825373 ]]
[[ 0.38660087 0.63941692]
[ 0.81522421 0.16661912]
[ 0.81518479 0.78655536]]]
[[[ 0.27334558 0.90065184]
[ 0.8624593 0.34324983]
[ 0.19574819 0.2825373 ]]
[[ 0.38660087 0.63941692]
[ 0.81522421 0.16661912]
[ 0.81518479 0.78655536]]
[[ 0.65336551 0.1435357 ]
[ 0.91380873 0.45225145]
[ 0.57255923 0.7645396 ]]]
I wrote a function that takes in one set of randomized Cartesian coordinates and returns the subset that remains within some spatial domain. To illustrate:
grid = np.ones((5,5))
grid = np.lib.pad(grid, ((10,10), (10,10)), 'constant')
>> np.shape(grid)
(25, 25)
random_pts = np.random.random(size=(100, 2)) * len(grid)
def inside(input):
    idx = np.floor(input).astype(int)
    mask = grid[idx[:,0], idx[:,1]] == 1
    return input[mask]
>> inside(random_pts)
array([[ 10.59441506, 11.37998288],
[ 10.39124766, 13.27615815],
[ 12.28225713, 10.6970708 ],
[ 13.78351949, 12.9933591 ]])
But now I want the ability to simultaneously generate n sets of random_pts and keep n corresponding subsets that satisfy the same functional condition. So, if n=3,
random_pts = np.random.random(size=(3, 100, 2)) * len(grid)
Without resorting to a for loop, how could I index my variables such that inside(random_pts) returns something like
array([[[ 17.73323523, 9.81956681],
[ 10.97074592, 2.19671642],
[ 21.12081044, 12.80412997]],
[[ 11.41995519, 2.60974757]],
[[ 9.89827156, 9.74580059],
[ 17.35840479, 7.76972241]]])
One approach -
def inside3d(input):
    # Get idx in 3D
    idx3d = np.floor(input).astype(int)
    # Create a mask similar to the 2D case, but in 3D now
    mask3d = grid[idx3d[:,:,0], idx3d[:,:,1]] == 1
    # Count of mask matches for each index in the 0th dim
    counts = np.sum(mask3d, axis=1)
    # Index into input to get masked matches across all elements in the 0th dim
    out_cat_array = input.reshape(-1,2)[mask3d.ravel()]
    # Split the rows based on the counts, as the final output
    return np.split(out_cat_array, counts.cumsum()[:-1])
Verify results -
Create 3D random input:
In [91]: random_pts3d = np.random.random(size=(3, 100, 2)) * len(grid)
With inside3d:
In [92]: inside3d(random_pts3d)
Out[92]:
[array([[ 10.71196268, 12.9875877 ],
[ 10.29700184, 10.00506662],
[ 13.80111411, 14.80514828],
[ 12.55070282, 14.63155383]]), array([[ 10.42636137, 12.45736944],
[ 11.26682474, 13.01632751],
[ 13.23550598, 10.99431284],
[ 14.86871413, 14.19079225],
[ 10.61103434, 14.95970597]]), array([[ 13.67395756, 10.17229061],
[ 10.01518846, 14.95480515],
[ 12.18167251, 12.62880968],
[ 11.27861513, 14.45609646],
[ 10.895685 , 13.35214678],
[ 13.42690335, 13.67224414]])]
With inside:
In [93]: inside(random_pts3d[0])
Out[93]:
array([[ 10.71196268, 12.9875877 ],
[ 10.29700184, 10.00506662],
[ 13.80111411, 14.80514828],
[ 12.55070282, 14.63155383]])
In [94]: inside(random_pts3d[1])
Out[94]:
array([[ 10.42636137, 12.45736944],
[ 11.26682474, 13.01632751],
[ 13.23550598, 10.99431284],
[ 14.86871413, 14.19079225],
[ 10.61103434, 14.95970597]])
In [95]: inside(random_pts3d[2])
Out[95]:
array([[ 13.67395756, 10.17229061],
[ 10.01518846, 14.95480515],
[ 12.18167251, 12.62880968],
[ 11.27861513, 14.45609646],
[ 10.895685 , 13.35214678],
[ 13.42690335, 13.67224414]])
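The key step, flattening the mask and splitting the concatenated matches by per-set counts, can be checked in isolation with a toy mask (a small sketch, independent of the grid above):
import numpy as np

pts = np.arange(12, dtype=float).reshape(2, 3, 2)   # 2 sets of 3 points each
mask = np.array([[True, False, True],
                 [False, True, True]])
counts = mask.sum(axis=1)                           # points kept per set: [2, 2]
flat = pts.reshape(-1, 2)[mask.ravel()]             # masked rows, concatenated
print(np.split(flat, counts.cumsum()[:-1]))         # one array per original set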