Array element evaluation from reverse - python

I'm still very new to Python and programming and I'm trying to figure out if I'm going about this problem in the correct fashion. I tend to have a MATLAB approach to things but here I'm just struggling...
Context:
I have two numpy arrays, plotted in this image on flickr since I can't post photos here :(. They have the same shape (both 777x1600), and I'm trying to use the red array to help return the index (the value on the x-axis of the plot) and the element value (y-axis) of the point in the blue plot indicated by the arrow, for each row of the blue array.
The procedure I've been tasked with was to:
a) determine the max value of the red array (represented by the red dot in the figure; already achieved)
and b) start at the end of the blue array with the final element and count backwards, comparing each element to the one preceding it. The goal is to find where the preceding value decreases (for example, when element -1 is greater than element -2, indicating the last peak in the image). Additionally, to avoid selecting "noise" at the tail end of the section with elevated values, the selected value must also be larger than the maximum of the red array.
Here's what I've got so far, but I'm stuck on line two where I have to evaluate the selected row of the array from the (-1) position in the row to the beginning, or (0) position:
for i,n in enumerate(blue): # select each row of blue in turn to analyze
    for j,m in enumerate(n): # select each element of blue ??how do I start from the end of the array and work backwards??
        if m > m-1 and m > max_val_red[i]:
            indx_m[i] = j
            val_m[i] = m

To answer your question directly, you can use n[::-1] to reverse the array n.
So the code is:
for j, m in enumerate(n[::-1]):
    j = len(n) - j - 1
    # here is your code
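For completeness, a minimal sketch of what the full loop could look like with this trick (assuming, as in the question, that max_val_red, indx_m and val_m are preallocated 1-D arrays with one entry per row of blue):
for i, n in enumerate(blue):
    for j, m in enumerate(n[::-1]):
        j = len(n) - j - 1                      # index into the original (unreversed) row
        if j > 0 and m > n[j-1] and m > max_val_red[i]:
            indx_m[i] = j
            val_m[i] = m
            break                               # first hit from the end is the last peak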
But to increase calculation speed, you should avoid Python loops:
import numpy as np

n = np.array([1, 2, 3, 4, 2, 5, 7, 8, 3, 2, 3, 3, 0, 1, 1, 2])
idx = np.nonzero(np.diff(n) < 0)[0]   # positions where the value drops on the next step
peaks = n[idx]
mask = peaks > 3                      # peaks must be larger than 3
print("index=", idx[mask])
print("value=", peaks[mask])
the output is:
index= [3 7]
value= [4 8]

I assume you mean:
if m > n[j-1] and m > max_val_red[i]:
    indx_m[i] = j
    val_m[i] = m
because m > m - 1 is always True
To reverse an array on an axis you can index the array using ::-1 on that axis, for example to reverse blue on axis 1 you can use:
blue_reverse = blue[:, ::-1]
Try and see if you can write your function as a set of array operations instead of loops (that tends to be much faster). This is similar to the other answer, but it should allow you to avoid both loops you're currently using:
threshold = red.max(1)
threshold = threshold[:, np.newaxis] #this makes threshold's shape (n, 1)
blue = blue[:, ::-1]
index_from_end = np.argmax((blue[:, :-1] > blue[:, 1:]) & (blue[:, :-1] > threshold), 1)
value = blue[range(len(blue)), index_from_end]
index = blue.shape[1] - 1 - index_from_end
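For example, a quick sanity check on a tiny pair of arrays (these red and blue values are made up; using rev instead of overwriting blue just keeps the original array intact):
import numpy as np

red = np.array([[0, 3, 1],
                [0, 1, 0]])
blue = np.array([[1, 2, 5, 4, 2, 1],
                 [1, 4, 3, 2, 1, 0]])

threshold = red.max(1)[:, np.newaxis]   # shape (2, 1)
rev = blue[:, ::-1]
index_from_end = np.argmax((rev[:, :-1] > rev[:, 1:]) & (rev[:, :-1] > threshold), 1)
value = rev[range(len(rev)), index_from_end]
index = blue.shape[1] - 1 - index_from_end
print(index)   # [2 1]
print(value)   # [5 4]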

Sorry, I didn't read all of it, but you could also look into the built-in function reversed.
Note that an enumerate object isn't directly reversible, so instead of enumerate(n) you would use reversed(list(enumerate(n))); the indices you get back then already refer to the original, unreversed positions, so no extra adjustment is needed.
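For example, a tiny sketch of what that looks like in practice:
n = [1, 2, 3, 4, 2]
for j, m in reversed(list(enumerate(n))):
    print(j, m)   # j is already the index into the original, unreversed list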

Related

How to only iterate over one argument of a matrix array if both have the same variable in Python?

I am trying to eliminate some non-zero entries in a matrix where the two diagonals adjacent to the main diagonal are non-zero.
h = np.zeros((n**2,n**2))
for i in np.arange(0, n**2):
    for j in np.arange(0,n**2):
        if(i==j):
            for i in np.arange(0,n**2,n):
                h[i,j-1] = 0
print(h)
I want it to eliminate only the lower-triangle non-zero entries, but it's also erasing some entries in the upper triangle. I know this is because the for loop inside the last if statement iterates over both indices of the array, when I only want it to iterate over the first index i; but since I set i == j, it affects both.
The matrix I want to obtain is the following:
Desired matrix
PS: sorry for the extremely bad question format, this is my first question.
hamiltonian = np.zeros((n**2,n**2)) # store the Hamiltonian
for i in np.arange(0, n**2):
    for j in np.arange(0,n**2):
        if abs(i-j) == 1:
            hamiltonian[i,j] = 1
Is this what you are looking for?:
hamiltonian[0,1] = 1
hamiltonian[n**2-1,n**2-2] = 1
for i in np.arange(1, n**2-1):
    hamiltonian[i,i+1] = 1
    hamiltonian[i,i-1] = 1
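As a side note (not part of the original answer), the same band structure can be built without loops using np.eye and its k offset argument, and if the goal is to cut the couplings between consecutive blocks of size n (which is what the innermost loop in the question appears to attempt), that can be done with fancy indexing. A sketch, under that assumption:
import numpy as np

n = 3
hamiltonian = np.eye(n**2, k=1) + np.eye(n**2, k=-1)   # ones on the first super- and sub-diagonals

cut = np.arange(n, n**2, n)          # first site of every block after the first
hamiltonian[cut, cut - 1] = 0        # remove the coupling to the previous block
hamiltonian[cut - 1, cut] = 0        # and its symmetric counterpart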

K Means in Python from Scratch

I have Python code for a k-means algorithm.
I am having a hard time understanding what it does.
Lines like C = X[numpy.random.choice(X.shape[0], k, replace=False), :] are very confusing to me.
Could someone explain what this code is actually doing?
Thank you
def k_means(data, k, num_of_features):
    # Make a matrix out of the data
    X = data.as_matrix()
    # Get k random points from the data
    C = X[numpy.random.choice(X.shape[0], k, replace=False), :]
    # Remove the last col
    C = [C[j][:-1] for j in range(len(C))]
    # Turn it into a numpy array
    C = numpy.asarray(C)
    # To store the value of centroids when it updates
    C_old = numpy.zeros(C.shape)
    # Make an array that will assign clusters to each point
    clusters = numpy.zeros(len(X))
    # Error func. - Distance between new centroids and old centroids
    error = dist(C, C_old, None)
    # Loop will run till the error becomes zero or 5 tries
    tries = 0
    while error != 0 and tries < 1:
        # Assigning each value to its closest cluster
        for i in range(len(X)):
            # Get closest cluster in terms of distance
            clusters[i] = dist1(X[i][:-1], C)
        # Storing the old centroid values
        C_old = deepcopy(C)
        # Finding the new centroids by taking the average value
        for i in range(k):
            # Get all of the points that match the cluster you are on
            points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]
            # If there were no points assigned to cluster, put at origin
            if not points:
                C[i][:] = numpy.zeros(C[i].shape)
            else:
                # Get the average of all the points and put that centroid there
                C[i] = numpy.mean(points, axis=0)
        # Error is the distance between where the centroids used to be and where they are now
        error = dist(C, C_old, None)
        # Increase tries
        tries += 1
    return sil_coefficient(X, clusters, k)
X is the data, as a matrix.
Using the [] notation, we are taking slices, or selecting single elements, from the matrix. You may want to review numpy array indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
numpy.random.choice selects k elements at random from the size of the first dimension of the data matrix without replacement.
Notice that in the [] indexing syntax we have two entries: the numpy.random.choice result, and ":".
":" indicates that we are taking everything along that axis.
Thus, X[numpy.random.choice(X.shape[0], k, replace=False), :] means we select an element along the first axis and take every element along the second which shares that first index. Effectively, we are selecting a random row of a matrix.
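A tiny illustration of that indexing (hypothetical data, just to show the shape of the result):
import numpy as np

X = np.arange(12).reshape(4, 3)                        # 4 points, 3 features each
rows = np.random.choice(X.shape[0], 2, replace=False)  # e.g. array([3, 0])
C = X[rows, :]                                         # 2 randomly chosen rows of X, shape (2, 3)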
(The comments explain this code quite well; I would suggest you read up on numpy indexing and list comprehensions for further elucidation.)
C = [C[j][:-1] for j in range(len(C))]
The part after "C =" uses a list comprehension in order to select parts of the matrix C.
C[j] represents the rows of the matrix C.
We use the [:-1] to take up to, but not including the final element of the row. We do this for each row in the matrix C. This removes the last column of the matrix.
C = numpy.asarray(C). This converts the matrix to a numpy array so we can do special numpy things with it.
C_old = numpy.zeros(C.shape). This creates a zero matrix the same size as C, which we initialize now and populate later.
clusters = numpy.zeros(len(X)). This creates a zero vector whose length equals the number of rows in the matrix X; it will be filled in later with each point's cluster assignment.
error = dist(C, C_old, None). Take the distance between the two matrices. I believe this function to be defined elsewhere in your script.
tries = 0. Set the tries counter to 0.
while...do this block while this condition is true.
for i in [0...(number of rows in X - 1)]:
clusters[i] = dist1(X[i][:-1], C); Put which cluster the ith row of X is closest to in the ith position of clusters.
C_old = deepcopy(C) - Create a copy of C which is new. Don't just move pointers.
for each (0..number of means - 1):
points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]. This is a list comprehension. Create a list of the rows of X, with all but the last entry, but only include a row if it belongs to the ith cluster.
if not points. If nothing belongs to a cluster.
C[i][:] = numpy.zeros(C[i].shape). Create a vector of zeros and use it as the ith row of the centroid matrix, C, so the empty cluster's centroid sits at the origin.
else:
C[i] = np.mean(points, axis=0). Assign the ith row of the centroid matrix, C, to be the average of the points in the cluster. We average across the rows (axis=0). This is how the centroids get updated.
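For instance, a small sketch of what that centroid update computes (hypothetical points):
import numpy as np

points = [np.array([1.0, 2.0]), np.array([3.0, 6.0])]
centroid = np.mean(points, axis=0)   # array([2., 4.]): the column-wise average of the cluster's points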

Creating multidimensional arrays similar to MATLAB

I have this MATLAB code and I need to do the same with Python.
for t = 1:AnzahlSegA
    zsegaussen{t,1} = werteAussen(t,SpalteSegStartA):Abstandrechnung:werteAussen(t,SpalteSegEndeA);
end
This code creates a t-by-1 cell array and fills each row with numbers. Each row starts at a given number (the first cell) and adds more numbers according to the step size (in this case Abstandrechnung) until it reaches the end value. werteAussen is a matrix which contains the needed information.
I wanted to do the same with Python but couldn't come up with something good. My best effort is this:
zsegmente = [[] for i in range(row)]
while listenstart < row and p < row:
    x: int
    for x in range(int(AllesAussen.iloc[p, 0]), int(AllesAussen.iloc[p, 1]), Schrittweite):
        zsegmente[listenstart].append(x)
    if len(zsegmente[listenstart]) == (int(AllesAussen.iloc[p, 1] - int(AllesAussen.iloc[p, 0])) / Schrittweite):
        listenstart += 1
        p += 1
print(zsegmente)
My idea was to have a list containing lists with the same information as each row of the array. As it turned out, this is not easy to use and modify, so I really need to get a multidimensional array somehow.
I appreciate any help.
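One possible direction (a sketch only, assuming the start values sit in column 0 of AllesAussen, the end values in column 1, and Schrittweite is a positive integer step, mirroring what the question's code suggests) is to build one np.arange per row and keep the rows in a list, since they can have different lengths:
import numpy as np
import pandas as pd

# Hypothetical stand-in for AllesAussen: start values in column 0, end values in column 1
AllesAussen = pd.DataFrame([[0, 6], [10, 13]])
Schrittweite = 2
row = len(AllesAussen)

zsegmente = [
    np.arange(int(AllesAussen.iloc[p, 0]),
              int(AllesAussen.iloc[p, 1]) + 1,   # +1 so the end value is included, as MATLAB's colon does
              Schrittweite)
    for p in range(row)
]
print(zsegmente)   # [array([0, 2, 4, 6]), array([10, 12])]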

Minimum number of iterations in matrix where cell value replaced by maximum of neighbour cell value in single iteration

I have a matrix with values in each cell (minimum value = 1), where the maximum value is 'max'.
At each step, every cell value is replaced by the highest value among its neighboring cells (all 8 neighbors), and this happens for the whole matrix simultaneously. I want to find the minimum number of iterations after which all cells hold the value max.
One brute-force method of doing this is to pad the matrix with zeros, and:
for i in range(1, x_max+1):
    for j in range(1, y_max+1):
        maximum = 0
        for k in range(-1, 2):
            for l in range(-1, 2):
                if matrix[i+k][j+l] > maximum:
                    maximum = matrix[i+k][j+l]
        matrix[i][j] = maximum
But is there an intelligent and faster way of doing this?
Thanks in advance.
I think this can be solved with BFS (breadth-first search).
Start the BFS simultaneously from all matrix cells that hold the 'max' value.
dis[][] = infinite   // min. distance of a cell from the nearest cell with the 'max' value, initially infinite for all
Q                    // queue
M[][]                // matrix

for all i, j:                    // traverse the matrix, enqueue all cells with 'max'
    if M[i][j] == 'max':
        dis[i][j] = 0, Q.push( cell(i,j) )

while !Q.empty:
    cell Current = Q.front
    for all neighbours Cell(p,q) of Current:
        if dis[p][q] == infinite:
            dis[p][q] = dis[Current.row][Current.column] + 1
            Q.push( cell(p,q) )
    Q.pop()
The maximum of dis[i][j] over all i, j is the number of iterations needed.
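A rough Python translation of that idea (a sketch, assuming matrix is a list of lists and max_val is its maximum value):
from collections import deque

def min_iterations(matrix, max_val):
    rows, cols = len(matrix), len(matrix[0])
    INF = float('inf')
    dis = [[INF] * cols for _ in range(rows)]
    q = deque()
    # Seed the BFS with every cell that already holds the maximum value
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j] == max_val:
                dis[i][j] = 0
                q.append((i, j))
    while q:
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                p, r = i + di, j + dj
                if 0 <= p < rows and 0 <= r < cols and dis[p][r] == INF:
                    dis[p][r] = dis[i][j] + 1
                    q.append((p, r))
    # The cell farthest from any max-valued cell determines the number of iterations
    return max(max(d) for d in dis)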
Use an array with a "border".
Testing the edge conditions is tedious and can be avoided by making the array 1 cell bigger around the edge, with each border element set to INT_MIN.
Additionally, consider 8 explicit tests rather than a doubly nested loop:
// Data is in matrix[1...N][1...M], yet its size is matrix[N+2][M+2]
for (i = 1; i <= N; i++) {
    for (j = 1; j <= M; j++) {
        maximum = matrix[i-1][j-1];
        if (matrix[i-1][j+0] > maximum) maximum = matrix[i-1][j+0];
        if (matrix[i-1][j+1] > maximum) maximum = matrix[i-1][j+1];
        if (matrix[i+0][j-1] > maximum) maximum = matrix[i+0][j-1];
        if (matrix[i+0][j+0] > maximum) maximum = matrix[i+0][j+0];
        if (matrix[i+0][j+1] > maximum) maximum = matrix[i+0][j+1];
        if (matrix[i+1][j-1] > maximum) maximum = matrix[i+1][j-1];
        if (matrix[i+1][j+0] > maximum) maximum = matrix[i+1][j+0];
        if (matrix[i+1][j+1] > maximum) maximum = matrix[i+1][j+1];
        newmatrix[i][j] = maximum;
    }
}
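In numpy, one simultaneous update of the whole (padded) matrix can also be written without explicit loops by taking element-wise maxima of shifted views (a sketch; it assumes the matrix already carries the 1-cell border described above):
import numpy as np

def neighbour_max(padded):
    # Each inner cell becomes the max of its 3x3 neighbourhood, computed simultaneously
    new = padded.copy()
    new[1:-1, 1:-1] = np.maximum.reduce([
        padded[:-2, :-2],  padded[:-2, 1:-1],  padded[:-2, 2:],
        padded[1:-1, :-2], padded[1:-1, 1:-1], padded[1:-1, 2:],
        padded[2:, :-2],   padded[2:, 1:-1],   padded[2:, 2:],
    ])
    return new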
All existing answers require examining every cell in the matrix. If you don't already know what the locations of the maximum value are, this is unavoidable, and in that case, Amit Kumar's BFS algorithm has optimal time complexity: O(wh), if the matrix has width w and height h.
OTOH, perhaps you already know the locations of the k maximum values, and k is relatively small. In that case, the following algorithm will find the answer in just O(k^2*(log(k)+log(max(w, h)))) time, which is much faster when either w or h is large. It doesn't actually look at any matrix entries; instead, it runs a binary search to look for candidate stopping times (that is, answers). For each candidate stopping time it builds the set of rectangles that would be occupied by max by that time, and checks whether any matrix cell remains uncovered by a rectangle.
To explain the idea, we first need some terms. Call the top row of a rectangle a "starting vertical event", and the row below its bottom edge an "ending vertical event". A "basic interval" is the interval of rows spanned by any pair of vertical events that does not have a third vertical event anywhere between them (the event pairs defining these intervals can be from the same or different rectangles). Notice that with k rectangles, there can never be more than 2k+1 basic intervals -- there is no dependence here on h.
The basic idea is to walk left-to-right through the columns of the matrix that correspond to horizontal events: columns in which either a new rectangle "starts" (the left vertical edge of a rectangle), or an existing rectangle "finishes" (the column to the right of the right vertical edge of a rectangle), keeping track of how many rectangles are currently covering every basic interval. If we ever detect a basic interval covered by 0 rectangles, we can stop: we have found a column containing one or more cells that are not yet covered at time t. If we get to the right edge of the matrix without this happening, then all cells are covered at time t.
Here is pseudocode for a function that checks whether any matrix cell remains uncovered by time t, given a length-k array peak, where (peak[i].x, peak[i].y) is the location of the i-th max-containing cell in the original matrix, in increasing order of x co-ordinate (so the leftmost max-containing cell is at (peak[1].x, peak[1].y)).
Function IsMatrixCovered(t, peak[]) {
    # Discover all vertical events and basic intervals
    Let vertEvents[] be an empty array of integers.
    For i from 1 to k:
        top = max(1, peak[i].y - t)
        bot = min(h, peak[i].y + t)
        Append top to vertEvents[]
        Append bot+1 to vertEvents[]
    Sort vertEvents in increasing order, and remove duplicates.

    x = 1
    Let horizEvents[] be an empty array of { col, type, top, bot } structures.
    For i from 1 to k:
        # Calculate the (clipped) rectangle that peak[i] will cover at time t:
        lft = max(1, peak[i].x - t)
        rgt = min(w, peak[i].x + t)
        top = max(1, peak[i].y - t)
        bot = min(h, peak[i].y + t)
        # Convert vertical positions to vertical event indices
        top = LookupIndexUsingBinarySearch(top, vertEvents[])
        bot = LookupIndexUsingBinarySearch(bot+1, vertEvents[])
        # Record horizontal events
        Append (lft, START, top, bot) to horizEvents[]
        Append (rgt+1, STOP, top, bot) to horizEvents[]
    Sort horizEvents in increasing order by its first 2 fields, with START considered < STOP.

    # Walk through all horizontal events, from left to right.
    Let basicIntervals[] be an array of size(vertEvents[]) integers, initially all 0.
    nOccupiedBasicIntervalsFirstCol = 0
    For i from 1 to size(horizEvents[]):
        If horizEvents[i].type = START:
            d = 1
        Else (if it is STOP):
            d = -1
        If horizEvents[i].col <= w:
            For j from horizEvents[i].top to horizEvents[i].bot:
                If horizEvents[i].col = 1 and basicIntervals[j] = 0:
                    ++nOccupiedBasicIntervalsFirstCol   # Must be START
                basicIntervals[j] += d
                If basicIntervals[j] = 0:
                    return FALSE
    If nOccupiedBasicIntervalsFirstCol < size(basicIntervals):
        return FALSE   # Could have checked earlier, but the code is simpler this way
    return TRUE
}
The above function can simply be called inside a binary search on t, that looks for the smallest value of t for which the function returns TRUE.
A further factor of k/log(k) could be removed by exploiting the fact that the set of basic intervals affected by any rectangle starting or ending is always an interval, through the use of Fenwick trees.

Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensional array in Python, using the numpy library. I want to walk through each column efficiently and, for each column, count how many elements are different from 0.
Suppose I have the following matrix.
M = array([[1,2], [3,4]])
The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while curr_col < numb_colonnes:
    result = 0
    while curr_row < numb_rows:
        # If different from 0
        if M[curr_row][curr_col] != 0:
            result += 1
        curr_row += 1
    # ... use the result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!
In the code you showed us, you treat numpy's arrays as lists and, as far as you can see, it works! But arrays are not lists, and while you can treat them as such, doing so loses the point of using arrays, or even numpy, at all.
To really exploit the usefulness of numpy you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array and using the rich set of numpy functions to operate directly on arrays.
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np

a = np.array(((3, 4), (5, 6)))
print(np.sum(a))          # 18
print(np.sum(a, axis=0))  # [8 10]
print(np.sum(a, axis=1))  # [7 11]
Now you are protesting: I don't want to sum the elements, I want to count the non-zero elements... but
if you write a logical test on an array, you obtain an array of booleans; e.g., we want to test which elements of a are even:
print(a % 2 == 0)
# [[False True]
# [False True]]
False is zero and True is one, at least when we sum it...
print(np.sum(a % 2 == 0))  # 2
or, if you want to sum over a column, i.e., the index that changes is the 0-th
print(np.sum(a % 2 == 0, axis=0))  # [0 2]
or sum across a row
print(np.sum(a % 2 == 0, axis=1))  # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
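As an aside, recent numpy versions also let np.count_nonzero do the per-column count directly (the axis argument may not exist in very old numpy releases):
import numpy as np

M = np.array([[1, 0],
              [3, 4]])
by_col = np.count_nonzero(M, axis=0)   # array([2, 1]): non-zero entries per column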
