Getting index of maximum value in nested list - python

I have a nested list a made of N sublists each of them filled with M floats. I have a way to obtain the index of the largest float using numpy, as shown in the MWE below:
import numpy as np
def random_data(bot, top, N):
    # Generate some random data.
    return np.random.uniform(bot, top, N)
# Generate nested list a.
N, M = 10, 7  # number of sublists and length of each sublist
a = np.asarray([random_data(0., 1., M) for _ in range(N)])
# x,y indexes of max float in a.
print(np.unravel_index(a.argmax(), a.shape))
Note that I'm using the sublist index and the float index as x,y coordinates where x is the index of a sublist and y the index of the float inside said sublist.
What I need now is a way to find the coordinates/indexes of the largest float imposing certain boundaries. For example I would like to obtain the x,y values of the largest float for the range [3:5] in x and [2:6] in y.
This means I want to search for the largest float inside a but restricting that search to those sublists within [3:5] and in those sublists restrict it to the floats within [2:6].
I can use:
print(np.unravel_index(a[3:5].argmax(), a[3:5].shape))
to restrict the range in x, but the index returned is shifted since the list is sliced, and furthermore I can think of no way to restrict the y index this way.

An alternative solution would be to set the values outside the range to -np.inf, so that they can never be the maximum:
import numpy as np
# Generate nested list a.
N, M = 10, 7 # number of sublists and length of each sublist
a = np.random.rand(N, M)
# x,y indexes of max float in a.
print(np.unravel_index(a.argmax(), a.shape))
A = np.full_like(a, -np.inf)  # use np.inf if doing np.argmin
A[3:5, 2:6] = a[3:5, 2:6]
print(np.unravel_index(A.argmax(), A.shape))

You could turn 'a' into a NumPy array (called A here), which lets you easily slice rows and columns. From there, the same basic command works to get the indices within the restricted block. You can then add the start of each restricted range to the result to get the indices in terms of the whole array.
A = np.asarray(a)
local_inds = np.unravel_index(A[3:5,2:6].argmax(), A[3:5,2:6].shape)
correct_inds = np.asarray(local_inds) + np.asarray([3,2])
This won't work if you have more complicated index restrictions. If you have a list of the x and y indices you want to restrict by, you could do:
idx = [1,2,4,5,8]
idy = [0,3,6]
# make the subset of the data
b = np.asarray([[a[x][y] for y in idy] for x in idx])
# get the indices in the restricted data
indb = np.unravel_index(b.argmax(), b.shape)
# convert them back out
inda = (idx[indb[0]],idy[indb[1]])
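The list-of-indices approach above can also be written with np.ix_, which avoids the nested Python loops. This is only a sketch: argmax_in_subset is a made-up helper name, not a NumPy function, and the test array is chosen so the answer is easy to check by eye.

```python
import numpy as np

def argmax_in_subset(a, rows, cols):
    """Return (x, y) of the largest value of `a` restricted to rows x cols.

    `argmax_in_subset` is a hypothetical helper name, not part of NumPy.
    """
    a = np.asarray(a)
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    # np.ix_ builds an open mesh, so a[np.ix_(rows, cols)] is the
    # len(rows) x len(cols) block of every row/column combination.
    sub = a[np.ix_(rows, cols)]
    # Position of the maximum inside the block.
    i, j = np.unravel_index(sub.argmax(), sub.shape)
    # Map the block position back to coordinates in the full array.
    return int(rows[i]), int(cols[j])

# Values grow with the row index, then the column index.
a = np.arange(70, dtype=float).reshape(10, 7)
print(argmax_in_subset(a, [1, 2, 4, 5, 8], [0, 3, 6]))        # (8, 6)
print(argmax_in_subset(a, np.arange(3, 5), np.arange(2, 6)))  # (4, 5)
```

The second call shows that contiguous ranges such as [3:5] and [2:6] are just a special case of index lists.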


Numpy - IF element is less than or equal to, then pass

I am trying to sort through a list of values in NUMPY / UPROOT, and I am having trouble with the formatting, as I am new to UPROOT.
The values are in some other list, and we'll call the values in one by one with a name, x.
If the value of x is greater than or equal to 5, I want to add it to the array, which is initially empty. If the number is less than 5, then we move on to the next number.
Specifically, I need help with how to write the "greater than or equal to" comparison.
array = []
if x is greater than or equal to 5:
    array.append(x)
else:
    return 0
Thanks everyone!
Using numpy you can do something like:
import numpy as np
# Initialize array
array = np.array([])
# Make some random values for x
x = np.floor(np.random.rand(10)*10)
for i in x:  # Loop through x
    if i >= 5:  # If value is bigger or equal to 5
        array = np.append(array, i)  # add to array
So, "greater than equal to" is just >=
You're using a Python list, which is different from a NumPy array. Either way, the following code should work:
X = np.random.random(size=10) * 10  # array of x values in [0, 10)
If you want a NumPy array:
arr = X[X >= 5]
If you want a list:
arr = list(X[X >= 5])
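To make it clearer what X[X >= 5] does, here is the same idea split into two steps on a small fixed array. The sample values are made up purely for illustration.

```python
import numpy as np

values = np.array([7, 2, 5, 9, 4, 5, 0])  # made-up sample data

# Comparing an array with a scalar gives a boolean array of the same shape.
mask = values >= 5
# Indexing with that boolean array keeps only the entries where it is True.
kept = values[mask]

print(mask.tolist())  # [True, False, True, True, False, True, False]
print(kept.tolist())  # [7, 5, 9, 5]
```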

K Means in Python from Scratch

I have a python code for a k-means algorithm.
I am having a hard time understanding what it does.
Lines like C = X[numpy.random.choice(X.shape[0], k, replace=False), :] are very confusing to me.
Could someone explain what this code is actually doing?
Thank you
def k_means(data, k, num_of_features):
    # Make a matrix out of the data
    X = data.as_matrix()
    # Get k random points from the data
    C = X[numpy.random.choice(X.shape[0], k, replace=False), :]
    # Remove the last col
    C = [C[j][:-1] for j in range(len(C))]
    # Turn it into a numpy array
    C = numpy.asarray(C)
    # To store the value of centroids when it updates
    C_old = numpy.zeros(C.shape)
    # Make an array that will assign clusters to each point
    clusters = numpy.zeros(len(X))
    # Error func. - Distance between new centroids and old centroids
    error = dist(C, C_old, None)
    # Loop will run till the error becomes zero or 5 tries
    tries = 0
    while error != 0 and tries < 1:
        # Assigning each value to its closest cluster
        for i in range(len(X)):
            # Get closest cluster in terms of distance
            clusters[i] = dist1(X[i][:-1], C)
        # Storing the old centroid values
        C_old = deepcopy(C)
        # Finding the new centroids by taking the average value
        for i in range(k):
            # Get all of the points that match the cluster you are on
            points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]
            # If there were no points assigned to cluster, put at origin
            if not points:
                C[i][:] = numpy.zeros(C[i].shape)
            else:
                # Get the average of all the points and put that centroid there
                C[i] = numpy.mean(points, axis=0)
        # Error is the distance between where the centroids used to be and where they are now
        error = dist(C, C_old, None)
        # Increase tries
        tries += 1
    return sil_coefficient(X,clusters,k)
(Expanded answer, will format later)
X is the data, as a matrix.
Using the [] notation, we are taking slices, or selecting single elements, from the matrix. You may want to review numpy array indexing. https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
numpy.random.choice(X.shape[0], k, replace=False) picks k distinct integers at random from 0 to X.shape[0] - 1, i.e. k different row indices of the data matrix.
Notice that the index inside [] has two entries: the numpy.random.choice call and ":".
":" indicates that we are taking everything along that axis.
Thus, X[numpy.random.choice(X.shape[0], k, replace=False), :] means we select k indices along the first axis and take every element along the second axis that shares each of those first indices. Effectively, we are selecting k random rows of the matrix.
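A small sketch of that row selection, in case it helps. The data and the seed are made up, and the seeded RandomState is only there so repeated runs pick the same rows.

```python
import numpy as np

rng_state = np.random.RandomState(0)  # seeded only to make the sketch repeatable
X = np.arange(12).reshape(6, 2)       # 6 rows, 2 columns of made-up data
k = 3

# k distinct integers drawn from 0 .. X.shape[0] - 1
row_ids = rng_state.choice(X.shape[0], k, replace=False)
# Integer-array index on axis 0, ":" on axis 1: k whole rows of X
C = X[row_ids, :]

print(row_ids.shape, C.shape)  # (3,) (3, 2)
```

Because replace=False, the same row can never be picked twice, so the k starting centroids are k different data points.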
(The comments explain this code quite well; I would suggest you read up on numpy indexing and list comprehensions for further elucidation.)
C = [C[j][:-1] for j in range(len(C))]
The part after "C =" uses a list comprehension to select parts of the matrix C.
C[j] represents the rows of the matrix C.
We use the [:-1] to take up to, but not including the final element of the row. We do this for each row in the matrix C. This removes the last column of the matrix.
C = numpy.asarray(C). This converts the list of rows produced by the list comprehension back into a numpy array so we can do special numpy things with it.
C_old = numpy.zeros(C.shape). This creates a zero matrix the same size as C, to be populated later.
clusters = numpy.zeros(len(X)). This creates a zero vector whose length is the number of rows in the matrix X, also to be populated later.
error = dist(C, C_old, None). Take the distance between the two matrices. I believe this function to be defined elsewhere in your script.
tries = 0. Set the tries counter to 0.
while...do this block while this condition is true.
for i in [0...(number of rows in X - 1)]:
clusters[i] = dist1(X[i][:-1], C); Put which cluster the ith row of X is closest to in the ith position of clusters.
C_old = deepcopy(C) - Create a copy of C which is new. Don't just move pointers.
for each (0..number of means - 1):
points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]. This is a list comprehension. Create a list of the rows of X, each without its last entry, but only include a row if it belongs to the ith cluster.
if not points. If nothing belongs to the cluster.
C[i][:] = numpy.zeros(C[i].shape). Overwrite the ith row of the centroid matrix, C, with zeros, which puts that centroid at the origin.
else:
C[i] = numpy.mean(points, axis=0). Assign the ith row of the centroid matrix, C, to be the average point in the cluster. We average over the rows (axis=0), which gives one mean per column. This is us updating our centroids.
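Here is a tiny, self-contained illustration of that centroid update step. The points and cluster labels are invented, and boolean masking is used in place of the list comprehension from the question, so treat it as a sketch rather than a drop-in replacement.

```python
import numpy as np

points = np.array([[0.0, 0.0],
                   [2.0, 0.0],
                   [10.0, 10.0],
                   [12.0, 14.0]])
clusters = np.array([0, 0, 1, 1])  # made-up assignment of each point to a cluster
k = 2

C = np.zeros((k, points.shape[1]))  # empty clusters stay at the origin
for i in range(k):
    members = points[clusters == i]  # rows assigned to cluster i
    if len(members):
        # axis=0 averages down the rows, giving one mean per column
        C[i] = members.mean(axis=0)

print(C.tolist())  # [[1.0, 0.0], [11.0, 12.0]]
```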

Using numpy where on multidimensional array but need to control indexing

I need to modify elements of a 3D array if they exceed some threshold value. The modification is based upon related elements of another array. More concretely:
A_ijk = A_ijk if A_ijk < threshold value
= (B_(i-1)jk + B_ijk) / 2, otherwise
numpy.where provides most of the functionality I need, but I don't know how to iterate over the first index without an explicit loop. The following code does what I want, but uses a loop. Is there a better way? Assume A and B are the same shape.
for i in range(A.shape[0]):
    A[i] = numpy.where(A[i] <= threshold, A[i], (B[i - 1] + B[i]) / 2)
To address the comments below: The first few rows of A are guaranteed to be below threshold. This keeps the i index from looping over to the last entry of A.
You can vectorize your operation by using boolean indexing to replace the elements of A that are above the threshold. A little care has to be taken, since the auxiliary array corresponding to (B[i-1] + B[i])/2 is one entry shorter along the first dimension than A, so we have to explicitly skip the first row of A (knowing that its elements are all below the threshold, as explained in the question):
import numpy as np
# some dummy data
A = np.random.rand(3,4,5)
B = np.random.rand(3,4,5)
threshold = 0.5
A[0,:] *= threshold # put the first dummy row below threshold
mask = A[1:] > threshold # to be overwritten, shape (2,4,5)
replace = (B[:-1] + B[1:])/2 # to overwrite elements in A from, shape (2,4,5)
# replace corresponding elements where `mask` is True
A[1:][mask] = replace[mask]
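If you would rather keep numpy.where, as in the question, the same shift of B can be used there too. The sketch below uses made-up random data and compares the result against the explicit loop; it assumes, as stated in the question, that the first slab of A never exceeds the threshold.

```python
import numpy as np

rng_state = np.random.RandomState(1)  # seeded so the comparison is repeatable
A = rng_state.rand(4, 3, 5)
B = rng_state.rand(4, 3, 5)
threshold = 0.5
A[0] *= threshold  # first slab is guaranteed to be below the threshold

# Reference: the explicit loop from the question.
expected = A.copy()
for i in range(expected.shape[0]):
    expected[i] = np.where(expected[i] <= threshold,
                           expected[i], (B[i - 1] + B[i]) / 2)

# Vectorized: only slabs 1.. can change, so B[i - 1] never wraps around.
result = A.copy()
result[1:] = np.where(A[1:] <= threshold, A[1:], (B[:-1] + B[1:]) / 2)

print(np.array_equal(result, expected))  # True
```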
You can use where to directly index into ndarray:
a = np.random.rand(4,3,10)
b = np.zeros(a.shape)
idx = np.where(a < .1)
print(a)
a[idx] = b[idx]
print(a)
If a for-loop is needed, however, just convert the indices to flat indices with np.ravel_multi_index and update:
a = np.random.rand(4,3,10)
b = np.zeros(a.shape)
idx = [np.ravel_multi_index(i, a.shape) for i in zip(*np.where(a < .1))]
print(a, idx)
for i in idx:
    a.ravel()[i] = b.ravel()[i]
print(a)

How to delete an element in an array depending on another array modification in Python

I have two arrays:
x=np.array([0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,0,1])
y=np.array([y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12])
Where the elements y_i correspond only to the 0-elements in the array x.
I want to pick one random 0-element in x and change it to 1. At the same time, I have to delete the corresponding element in y (because y contains only elements related to the 0s in the array x).
[EDIT:] for example, I have
x=np.array([0,0,0,0,0,1,1,0,0,1,0])
y=np.array([21,32,45,54,87,90,88,60])
such that x[0] corresponds to y[0]=21, x[1] to y[1]=32, x[2] to y[2]=45, x[3] to y[3]=54, x[4] to y[4]=87, x[7] to y[5]=90, x[8] to y[6]=88, and x[10] to y[7]=60. If I change x[2], x[7] and x[10] from 0 to 1, I want to delete the corresponding elements in y, i.e. 45, 90 and 60.
This may be what you are looking for:
x_0 = np.where(x==0)[0]
rand_ind = np.random.randint(0, len(x_0))
x[x_0[rand_ind]] = 1
y = np.delete(y, rand_ind)
x_0 is an array of the indices at which x is initially equal to 0. rand_ind is a random position within x_0 (the lower bound is 0 so that the first zero can be chosen too). To change a 0 to 1 the program uses x_0[rand_ind], while to delete the corresponding element in y it simply uses rand_ind.
This is a modified solution for the updated question,
import numpy as np
import random
x_0 = np.where(x==0)[0]
rand_ind = np.array(random.sample(list(x_0), 3))
x[rand_ind] = 1
y = np.delete(y, [np.where(x_0==i)[0][0] for i in rand_ind])
In this solution rand_ind is a direct choice of indices of the x array and not of the trimmed x_0 array. It is drawn from the indices stored in x_0 to avoid choosing indexes where the value in x is 1. The elements of x_0 are indexes into x, while the positions within x_0 correspond to positions in y.
The program uses random.sample as a straightforward way to pick N elements at random without repetition (here 3). It then sets the elements of x at rand_ind to 1 and deletes the corresponding elements of y. The last step is done with a list comprehension that finds, for each value in rand_ind, its position in x_0.
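The same idea can be wrapped in a small function that picks positions within the list of zeros directly, so no lookup back into x_0 is needed. This is a sketch under my own naming: flip_zeros, n and seed are hypothetical, and the function returns new arrays instead of modifying its inputs.

```python
import numpy as np

def flip_zeros(x, y, n, seed=None):
    """Set n random zeros of x to 1 and drop the matching entries of y.

    `flip_zeros` and its parameters are hypothetical names for this sketch.
    Returns new arrays; the inputs are left unchanged.
    """
    x = np.array(x)  # copy, so the caller's array is not modified
    y = np.asarray(y)
    zero_idx = np.flatnonzero(x == 0)  # positions of the zeros in x
    rng_state = np.random.RandomState(seed)
    # Pick positions *within zero_idx*; these are also the positions in y.
    picked = rng_state.choice(len(zero_idx), n, replace=False)
    x[zero_idx[picked]] = 1
    return x, np.delete(y, picked)

x = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0])
y = np.array([21, 32, 45, 54, 87, 90, 88, 60])
new_x, new_y = flip_zeros(x, y, 3, seed=0)
print(new_x, new_y)
# y still has exactly one entry per remaining zero in x:
print(len(new_y) == np.count_nonzero(new_x == 0))  # True
```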

Pivoting numpy array about an element in my array

I need to be able to pivot my numpy array about a certain element in my array.
Say I have the array x = [a b c d e f g].
I know the operation to reverse it: x_arr = x[::-1] == [g f e d c b a]
But let's say I want to pivot my original array about c, then I want: [e d c b a 0 0]
I'm thinking reversing then some sort of concatenating and reduction, but some help would be appreciated.
I wrote the following script where a is the array, p is the index of the pivot element, and f is used to pad the array after the pivot. The indices for truncating the arrays were found with some logic and trial and error. Note that in the case of an even length array, the center index c will be x.5 where x is an integer, while for an odd length array it will be x.0. This allows the if statements to correctly handle both even and odd length arrays.
In the first case, when the pivot element is the center of the array, I simply return the reverse of the array. Note that an even length array will never execute this if statement.
In the second case, when the pivot element is before the center of the array, I remove the elements from the reversed array that would fall outside of the pivoted array. Then I return this shortened array right padded with f to the length of a.
The only difference between the third case, where the pivot element is after the center of the array, and the second case is that the shortened array is left padded instead of right padded.
Finally, if none of the if statements execute due to some unforeseen error, I return None.
#!/usr/bin/env python
import numpy as np
def pivot(a,p,f):
    la = len(a)
    c = la/2.0-0.5
    x = a[::-1]
    if p==c:
        return x
    if p<c:
        x = x[la-(2*p+1):]
        lpad = la-len(x)
        pad = np.repeat(f,lpad)
        return np.append(x,pad)
    if p>c:
        x = x[:2*(la-p)-1]
        lpad = la-len(x)
        pad = np.repeat(f,lpad)
        return np.append(pad,x)
    return None
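As an alternative to the three cases above, the pivot can be expressed with index arithmetic: mirroring about index p sends index j to 2*p - j, and anything that lands outside the array is dropped. This is a sketch, not the script above; pivot_about and fill are names I made up, and fill has to be a value that fits the array's dtype.

```python
import numpy as np

def pivot_about(a, p, fill=0):
    """Mirror `a` about index `p`, padding vacated slots with `fill`.

    `pivot_about` is a hypothetical name; `fill` must fit the dtype of `a`.
    """
    a = np.asarray(a)
    n = len(a)
    out = np.full(n, fill, dtype=a.dtype)
    # Output slot i is filled from source index 2*p - i (a reflection about p).
    src = 2 * p - np.arange(n)
    valid = (src >= 0) & (src < n)  # sources that fall inside the array
    out[valid] = a[src[valid]]
    return out

x = np.array([1, 2, 3, 4, 5, 6, 7])
print(pivot_about(x, 2).tolist())  # [5, 4, 3, 2, 1, 0, 0]
print(pivot_about(x, 4).tolist())  # [0, 0, 7, 6, 5, 4, 3]
print(pivot_about(x, 3).tolist())  # [7, 6, 5, 4, 3, 2, 1]
```

The pivot element always stays at index p, and the same code handles pivots before, at, and after the center.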
I haven't used numpy, so I am not very familiar with it, but what you want to achieve is pretty simple:
x = ['a','b','c','d','e','f','g']
pivot = 'c'
len_x = len(x)
pad_len = len_x - len(x[:x.index(pivot)*2 + 1])
y = x[:x.index(pivot)*2 + 1]
y = list(reversed(y))
for i in range(pad_len):
    y.append('0')
print(y)
The output is ['e', 'd', 'c', 'b', 'a', '0', '0'].
I hope this is what you were looking for.
