Ising Model: How to shorten simulation time? - python

I am simulating the Ising Model of ferromagnets in dimensions higher than 3 using a simple coding structure but am having some problems with efficiency. In my code, there is one particular function that is the bottleneck.
In the simulation process, it is necessary to find what are called the nearest neighbors of a given site. For example, in the 2D Ising model, spins occupy the lattice at every point, denoted by two numbers: (x,y). The nearest neighbors of the point at (x,y) are the four adjacent points, namely (x+1,y),(x-1,y),(x,y+1),(x,y-1). In 5D, the spin at some lattice site has coordinates (a,b,c,d,e) with 10 nearest neighbors, in the same form as before but for each coordinate in the tuple.
Now here's the code that is given the following inputs:
"site_i is a random value between 0 and n-1 denoting the site of the ith spin"
"coord is an array of size (n**dim,dim) that contains the coordinates of every spin"
"spins is an array of shape (n**dim,1) that contains the spin values (-1 or 1)"
"n is the lattice size and dim is the dimensionality"
"neighbor_coupling is the number that tells the function to return the neighbor spins that are one spacing away, two spacing away, etc."
def calc_neighbors(site_i,coord,spins,n,dim,neighbor_coupling):
    # Extract all nearest neighbors
    # Obtain the coordinates of each nearest neighbor
    # How many neighbors to extract
    num_NN = 2*dim
    # Store the results in a result array
    result_coord = np.zeros((num_NN,dim))
    result_spins = np.zeros((num_NN,1))
    # Get the coordinates of the ith site
    site_coord = coord[site_i]
    # Run through the + and - for each scalar value in the vector in site_coord
    count = 0
    for i in range(0,dim):
        assert count <= num_NN, "Accessing more than nearest neighbors values."
        site_coord_i = site_coord[i]
        plus = site_coord_i + neighbor_coupling
        minus = site_coord_i - neighbor_coupling
        # Implement periodic boundaries
        if (plus > (n-1)): plus = plus - n
        if (minus < 0): minus = n - np.abs(minus)
        # Store the coordinates
        result_coord[count] = site_coord
        result_coord[count][i] = minus
        # Store the spin value
        spin_index = np.where(np.all(result_coord[count]==coord,axis=1))[0][0]
        result_spins[count] = spins[spin_index]
        count = count + 1
        # Store the coordinates
        result_coord[count] = site_coord
        result_coord[count][i] = plus
        # Store the spin value
        spin_index = np.where(np.all(result_coord[count]==coord,axis=1))[0][0]
        result_spins[count] = spins[spin_index]
        count = count + 1
I don't really know how I can make this faster but it would help a lot. Perhaps a different way of storing everything?

Not an answer, just some suggestions for streamlining: there is a lot of copying while you attempt to document every step of the calculation. Without sacrificing the documentation, you could drop site_coord_i and write:
# New coords, implement periodic boundaries
plus = (site_coord[i] + neighbor_coupling) % n
minus = (site_coord[i] - neighbor_coupling + n) % n
This avoids intermediate steps ("if...").
Another suggestion would be to defer using a subarray until you really need it:
# Store the coordinates
rcc = site_coord.copy()  # copy, so that coord[site_i] itself is not overwritten
rcc[i] = plus
# Store the spin value
spin_index = np.where(np.all(rcc==coord,axis=1))[0][0]
result_spins[count] = spins[spin_index]
result_coord[count] = rcc
count += 1
The goal is to reduce the number of dimensions of the variable used in the comparison, and to prefer local variables.
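To follow up on the question's "different way of storing everything": the expensive part is the np.where scan over all n**dim rows of coord for every neighbor. The sketch below is only an illustration and not part of the suggestions above. It assumes the rows of coord are stored in C order, i.e. row k equals np.unravel_index(k, (n,)*dim), which the question does not state; neighbor_indices is a hypothetical helper name. Under that assumption the lookup becomes plain index arithmetic:

```python
import numpy as np

def neighbor_indices(site_i, n, dim, neighbor_coupling=1):
    """Flat indices of the 2*dim neighbors of site_i on a periodic lattice.

    Assumption (not from the question): sites are numbered in C order,
    so coord[k] == np.unravel_index(k, (n,) * dim).
    """
    shape = (n,) * dim
    site_coord = np.array(np.unravel_index(site_i, shape))
    # One row per neighbor: first the "minus" shift along each axis, then the "plus" shift.
    eye = np.eye(dim, dtype=int)
    shifts = np.concatenate([-eye, eye]) * neighbor_coupling
    # The modulo implements the periodic boundaries.
    neigh_coord = (site_coord + shifts) % n
    return np.ravel_multi_index(tuple(neigh_coord.T), shape)
```

With this, the neighbor spins would be spins[neighbor_indices(site_i, n, dim, neighbor_coupling)], and coord is no longer searched at all.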


Heuristic to choose five column arrays that maximise the dot product

I have a sparse 60000x10000 matrix M where each element is either a 1 or 0. Each column in the matrix is a different combination of signals (ie. 1s and 0s). I want to choose five column vectors from M and take the Hadamard (ie. element-wise) product of them; I call the resulting vector the strategy vector. After this step, I compute the dot product of this strategy vector with a target vector (that does not change). The target vector is filled with 1s and -1s such that having a 1 in a specific row of the strategy vector is either rewarded or penalised.
Is there some heuristic or linear algebra method that I could use to help me pick the five vectors from the matrix M that result in a high dot product? I don't have any experience with Google's OR tools nor Scipy's optimization methods so I am not too sure if they can be applied to my problem. Advice on this would be much appreciated! :)
Note: the five column vectors given as the solution do not need to be optimal; I'd rather have something that does not take months/years to run.
First of all, thanks for a good question. I don't get to practice numpy that often. Also, I don't have much experience in posting to SE, so any feedback, code critique, and opinions relating to the answer are welcome.
This was an attempt at finding an optimal solution at first, but I didn't manage to deal with the complexity. The algorithm should, however, give you a greedy solution that might prove to be adequate.
Colab Notebook (Python code + Octave validation)
Core Idea
Note: During runtime, I've transposed the matrix. So, the column vectors in the question correspond to row vectors in the algorithm.
Notice that you can multiply the target with one vector at a time, effectively getting a new target, but with some 0s in it. These 0s will never change, so you can skip some computation by removing those rows (columns, in the algorithm) entirely from both the target and the matrix in all further steps. You are then left with a valid target again (only 1s and -1s in it).
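A small numerical check of this reduction may help. It is only a sketch: it uses the question's orientation (candidate vectors are columns), random toy data, and the hypothetical helper names strategy_score and reduce_problem.

```python
import numpy as np

def strategy_score(M, cols, target):
    """Dot product of the Hadamard product of the chosen columns with target."""
    strategy = np.prod(M[:, cols], axis=1, dtype=np.int64)
    return int(strategy @ target.astype(np.int64))

def reduce_problem(M, target, col):
    """Drop every row where the chosen column is 0; those rows can never score."""
    keep = M[:, col] == 1
    return M[keep], target[keep]

# Toy data, made up for illustration only.
rng = np.random.default_rng(0)
M = rng.integers(0, 2, size=(50, 8), dtype=np.int8)
target = rng.choice(np.array([1, -1], dtype=np.int8), size=50)

# Picking column 3 first and then scoring columns 5 and 6 on the reduced
# problem gives the same score as scoring columns 3, 5 and 6 on the full one.
M_small, target_small = reduce_problem(M, target, col=3)
full = strategy_score(M, [3, 5, 6], target)
reduced = strategy_score(M_small, [5, 6], target_small)
```

Each pick therefore shrinks both the matrix and the target, which is what makes the deeper recursion levels cheaper.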
That's the basic idea of the algorithm. Given:
n: number of vectors you need to pick
b: number of best vectors to check
m: complexity of matrix operations to check one vector
Do an exponentially-complex O((n*m)^b) depth-first search, but decrease the complexity of the calculations in deeper layers by reducing target/matrix size, while cutting down a few search paths with some heuristics.
Heuristics used
The best score achieved so far is known in every recursion step. Compute an optimistic vector (turn -1 to 0) and check what scores can still be achieved. Do not search in levels where the score cannot be surpassed.
This is useless if the best vectors in the matrix have 1s and 0s equally distributed. The optimistic scores are just too high. However, it gets better with more sparsity.
Ignore duplicates. Basically, do not check duplicate vectors in the same layer. Because we reduce the matrix size, the chance for ending up with duplicates increases in deeper recursion levels.
Further Thoughts on Heuristics
The most valuable ones are those that eliminate vector choices at the start. There's probably a way to find vectors that are worse-or-equal than others, with respect to their effects on the target. Say, if v1 only differs from v2 by an extra 1, and target has a -1 in that row, then v1 is worse-or-equal than v2.
The problem is that we need to find more than 1 vector, and can't readily discard the rest. If we have 10 vectors, each worse-or-equal than the one before, we still have to keep 5 at the start (in case they're still the best option), then 4 in the next recursion level, 3 in the following, etc.
Maybe it's possible to produce a tree and pass it on into the recursion? Still, that doesn't help trim down the search space at the start... Maybe it would help to only consider 1 or 2 of the vectors in the worse-or-equal chain? That would explore more diverse solutions, but doesn't guarantee a better result.
Warning: Note that the MATRIX and TARGET in the example are in int8. If you use these for the dot product, it will overflow. Though I think all operations in the algorithm are creating new variables, so are not affected.
import numpy as np

# Given:
TARGET = np.random.choice([1, -1], size=60000).astype(np.int8)
MATRIX = np.random.randint(0, 2, size=(10000,60000), dtype=np.int8)
# Tunable - increase to search more vectors, at the cost of time.
# Performs better if the best vectors in the matrix are sparse
MAX_BRANCHES = 3 # can give more for sparser matrices
# Usage (run this after pick_vectors has been defined below)
score, picked_vectors_idx = pick_vectors(TARGET, MATRIX, 5)
# Function
def pick_vectors(init_target, init_matrix, vectors_left_to_pick: int, best_prev_result=float("-inf")):
    assert vectors_left_to_pick >= 1
    if init_target.shape == (0, ) or len(init_matrix.shape) <= 1 or init_matrix.shape[0] == 0 or init_matrix.shape[1] == 0:
        return float("-inf"), None
    target = init_target.copy()
    matrix = init_matrix.copy()
    neg_matrix = np.multiply(target, matrix)
    neg_matrix_sum = neg_matrix.sum(axis=1)
    if vectors_left_to_pick == 1:
        picked_id = np.argmax(neg_matrix_sum)
        score = neg_matrix[picked_id].sum()
        return score, [picked_id]
    sort_order = np.argsort(neg_matrix_sum)[::-1]
    sorted_sums = neg_matrix_sum[sort_order]
    sorted_neg_matrix = neg_matrix[sort_order]
    sorted_matrix = matrix[sort_order]
    best_score = best_prev_result
    best_picked_vector_idx = None
    # Heuristic 1 (H1) - optimistic target.
    # Set a maximum score that can still be achieved
    optimistic_target = target.copy()
    optimistic_target[target == -1] = 0
    if optimistic_target.sum() <= best_score:
        # This check can be removed - the scores are too high at this point
        return float("-inf"), None
    # Heuristic 2 (H2) - ignore duplicates
    vecs_tried = set()
    # MAIN GOAL: for picked_id, picked_vector in enumerate(sorted_matrix):
    for picked_id, picked_vector in enumerate(sorted_matrix[:MAX_BRANCHES]):
        # H2
        picked_tuple = tuple(picked_vector)
        if picked_tuple in vecs_tried:
            continue
        vecs_tried.add(picked_tuple)
        # Discard picked vector
        new_matrix = np.delete(sorted_matrix, picked_id, axis=0)
        # Discard matrix and target rows where vector is 0
        ones = np.argwhere(picked_vector == 1).squeeze()
        new_matrix = new_matrix[:, ones]
        new_target = target[ones]
        if len(new_matrix.shape) <= 1 or new_matrix.shape[0] == 0:
            return float("-inf"), None
        # H1: Do not compute if best score cannot be improved
        new_optimistic_target = optimistic_target[ones]
        optimistic_matrix = np.multiply(new_matrix, new_optimistic_target)
        optimistic_sums = optimistic_matrix.sum(axis=1)
        optimistic_viable_vector_idx = optimistic_sums > best_score
        if optimistic_sums.max() <= best_score:
            continue
        new_matrix = new_matrix[optimistic_viable_vector_idx]
        score, next_picked_vector_idx = pick_vectors(new_target, new_matrix, vectors_left_to_pick - 1, best_prev_result=best_score)
        if score <= best_score:
            continue
        # Convert idx of trimmed-down matrix into sorted matrix IDs
        for i, returned_id in enumerate(next_picked_vector_idx):
            # H1: Loop until you hit the required number of 'True'
            values_passed = 0
            j = 0
            while True:
                value_picked: bool = optimistic_viable_vector_idx[j]
                if value_picked:
                    values_passed += 1
                    if values_passed-1 == returned_id:
                        next_picked_vector_idx[i] = j
                        break
                j += 1
            # picked_vector index
            if returned_id >= picked_id:
                next_picked_vector_idx[i] += 1
        best_score = score
        # Convert from sorted matrix to input matrix IDs before returning
        matrix_id = sort_order[picked_id]
        next_picked_vector_idx = [sort_order[x] for x in next_picked_vector_idx]
        best_picked_vector_idx = [matrix_id] + next_picked_vector_idx
    return best_score, best_picked_vector_idx
Maybe it's too naive, but the first thing that occurs to me is to choose the 5 columns with the shortest distance to the target:
import numpy as np
import scipy.sparse
from sklearn.metrics.pairwise import pairwise_distances

def sparse_prod_axis0(A):
    """Sparse equivalent of np.prod(arr, axis=0)."""
    valid_mask = A.getnnz(axis=0) == A.shape[0]
    out = np.zeros(A.shape[1], dtype=A.dtype)
    out[valid_mask] = np.prod(A[:, valid_mask].toarray(), axis=0)
    return np.matrix(out)

def get_strategy(M, target, n=5):
    """Guess n best vectors."""
    dists = np.squeeze(pairwise_distances(X=M, Y=target))
    idx = np.argsort(dists)[:n]
    return sparse_prod_axis0(M[idx])

# Example data.
M = scipy.sparse.rand(m=6000, n=1000, density=0.5, format='csr').astype('bool')
target = np.atleast_2d(np.random.choice([-1, 1], size=1000))
# Try it.
strategy = get_strategy(M, target, n=5)
result = strategy @ target.T
It strikes me that you could add another step of taking the top few percent from the M–target distances and check their mutual distances — but this could be quite expensive.
I have not checked how this compares to an exhaustive search.

Write a function to calculate a unit vector

The homework problem is written as follows:
Write a function called unitVec that determines a unit vector in the direction of the line that connects two points (A and B) in space. The function should take as input two vectors (lists), each with the coordinates of a point in space. The output should be a vector (list) with the components of the unit vector in the direction from A to B. If points A and B have two coordinates each (i.e., they lie in the x y plane), the output vector should have two elements. If points A and B have three coordinates each (i.e., they lie in general space), the output vector should have three elements.
I have basically the entire code written but cannot for the life of me figure out how to square each element in the list called connects[].
To calculate a unit vector the program will subtract the elements in vector B with the corresponding elements in vector A and create a new list (connects[]) with these values. Then each of these elements needs to be squared and they all need to be added together. Then the square root will be taken of this number and each element in connects[] will be divided by this number and stored in a new list which will be the unit vector.
I'm trying to add the squares of elements in connects[] by using the line
add = add + (connects[i]**2)
but I know this only returns the list twice. The rest of my code is fine I just need help squaring these elements.
from math import *
vecA = []
vecB = []
unitV = []
connects = []
vec = []
elements = int(input("How many elements will your vectors contain?"))
for i in range(0,elements):
    A = float(input("Enter element for vector A:"))
    B = float(input("Enter element for vector B:"))
def unitVec(vecA,vecB):
    for i in range(0,elements):
        unit = 0
        add = 0
        connect = vecB[i] - vecA[i]
        add = add + (connects[i]**2)
        uVec = sqrt(add)
        result = connects[i]/uVec
    return unitV
print("The unit vector connecting your two vectors is:",unitVec(vecA,vecB))
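You need to change your function to the following:
def unitVec(vecA,vecB):
    connects = []
    add = 0
    for i in range(0, len(vecA)):
        # Difference B - A for this component
        connect = vecB[i] - vecA[i]
        connects.append(connect)
        # Accumulate the sum of squares
        add = add + (connect**2)
    # Length of the connecting vector
    uVec = sqrt(add)
    unitV = [val/uVec for val in connects]
    return unitV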
You cannot do everything in a single for loop, since you need to add all the differences before being able to get the square root. Then you can divide the differences by this uVec.
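For reference, here is a compact, self-contained variant of the same two-pass idea (differences first, then the division). It is a sketch rather than the required homework solution: the name unit_vec and the error raised for identical points are my own additions, not part of the assignment.

```python
from math import sqrt

def unit_vec(vec_a, vec_b):
    """Unit vector pointing from point A to point B (works for 2 or 3 coordinates)."""
    # Pass 1: component-wise differences B - A.
    connects = [b - a for a, b in zip(vec_a, vec_b)]
    # Length of the connecting vector: square root of the sum of squares.
    length = sqrt(sum(c * c for c in connects))
    if length == 0:
        # Assumption: identical points have no direction, so signal an error.
        raise ValueError("A and B are the same point; the direction is undefined")
    # Pass 2: divide every difference by the length.
    return [c / length for c in connects]
```

For example, unit_vec([0, 0], [3, 4]) has length 5 for the connecting vector, so each difference is divided by 5.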
Python's list is a general-purpose container, and its arithmetic operators differ from vector operations. For example, [1,2,3]*2 is replication rather than scalar multiplication, so the result is [1,2,3,1,2,3] instead of [2,4,6].
I would use a NumPy array, which is designed for numerical data and provides vector operations.
import numpy as np
a = [1,2,3]
# convert python list into numpy array
b = np.array(a)
# vector magnitude
magnitude = np.sqrt((b**2).sum()) # sqrt( sum(b_i^2))
# or
magnitude = (b**2).sum()**0.5 # sqrt( sum(b_i^2))
# unit vector calculation
unit_b = b/magnitude

K Means in Python from Scratch

I have a python code for a k-means algorithm.
I am having a hard time understanding what it does.
Lines like C = X[numpy.random.choice(X.shape[0], k, replace=False), :] are very confusing to me.
Could someone explain what this code is actually doing?
Thank you
def k_means(data, k, num_of_features):
    # Make a matrix out of the data
    X = data.as_matrix()
    # Get k random points from the data
    C = X[numpy.random.choice(X.shape[0], k, replace=False), :]
    # Remove the last col
    C = [C[j][:-1] for j in range(len(C))]
    # Turn it into a numpy array
    C = numpy.asarray(C)
    # To store the value of centroids when it updates
    C_old = numpy.zeros(C.shape)
    # Make an array that will assign clusters to each point
    clusters = numpy.zeros(len(X))
    # Error func. - Distance between new centroids and old centroids
    error = dist(C, C_old, None)
    # Loop will run till the error becomes zero of 5 tries
    tries = 0
    while error != 0 and tries < 1:
        # Assigning each value to its closest cluster
        for i in range(len(X)):
            # Get closest cluster in terms of distance
            clusters[i] = dist1(X[i][:-1], C)
        # Storing the old centroid values
        C_old = deepcopy(C)
        # Finding the new centroids by taking the average value
        for i in range(k):
            # Get all of the points that match the cluster you are on
            points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]
            # If there were no points assigned to cluster, put at origin
            if not points:
                C[i][:] = numpy.zeros(C[i].shape)
            else:
                # Get the average of all the points and put that centroid there
                C[i] = numpy.mean(points, axis=0)
        # Error is the distance between where the centroids used to be and where they are now
        error = dist(C, C_old, None)
        # Increase tries
        tries += 1
    return sil_coefficient(X,clusters,k)
(Expanded answer, will format later)
X is the data, as a matrix.
Using the [] notation, we are taking slices, or selecting single elements, from the matrix. You may want to review numpy array indexing.
numpy.random.choice(X.shape[0], k, replace=False) picks k distinct integers at random from 0 to X.shape[0] - 1, i.e. k distinct row indices of the data matrix.
Notice, that in indexing, using the [] syntax, we see we have two entries. The numpy.random.choice, and ":".
":" indicates that we are taking everything along that axis.
Thus, X[numpy.random.choice(X.shape[0], k, replace=False), :] means we select k indices along the first axis and take every element along the second axis that shares each of those indices. Effectively, we are selecting k random rows of the matrix.
(The comments explain this code quite well; I would suggest you read up on numpy indexing and list comprehensions for further elucidation.)
C = [C[j][:-1] for j in range(len(C))]
The part after "C =" uses a list comprehension in order to select parts of the matrix C.
C[j] represents the rows of the matrix C.
We use the [:-1] to take up to, but not including the final element of the row. We do this for each row in the matrix C. This removes the last column of the matrix.
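To make the two indexing steps concrete, here is a tiny toy example. The 5x4 matrix is made up; it only stands in for your data, with the last column playing the role of the column that gets dropped.

```python
import numpy as np

# Toy stand-in for the data matrix: 5 rows, 4 columns.
X = np.arange(20).reshape(5, 4)
k = 2

# k distinct row indices drawn from 0 .. X.shape[0] - 1.
rows = np.random.choice(X.shape[0], k, replace=False)
# The k chosen rows, with all of their columns.
C = X[rows, :]

# Dropping the last column, first as in the question, then with one slice.
C_list = np.asarray([C[j][:-1] for j in range(len(C))])
C_slice = C[:, :-1]
```

Both C_list and C_slice hold the same k rows without their last column, so the list comprehension followed by numpy.asarray could also be written as the single slice C[:, :-1].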
C = numpy.asarray(C). This converts the list of rows back into a numpy array so we can do special numpy things with it.
C_old = numpy.zeros(C.shape). This creates a zero matrix of the same size as C, to be populated later.
clusters = numpy.zeros(len(X)). This creates a zero vector whose dimension is the same as the number of rows in the matrix X, to be populated later.
error = dist(C, C_old, None). Take the distance between the two matrices. I believe this function to be defined elsewhere in your script.
tries = 0. Set the tries counter to 0.
while error != 0 and tries < 1: Run this block while this condition is true.
for i in [0...(number of rows in X - 1)]:
clusters[i] = dist1(X[i][:-1], C); Put which cluster the ith row of X is closest to in the ith position of clusters.
C_old = deepcopy(C) - Create a copy of C which is new. Don't just move pointers.
for i in [0...(k - 1)], i.e. for each cluster:
points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]. This is a list comprehension. Create a list of the rows of X, with all but the last entry, but only include the row if it belongs to the ith cluster.
if not points. If nothing belongs to the cluster.
C[i][:] = numpy.zeros(C[i].shape). Set the ith row of the centroid matrix, C, to a vector of zeros (the origin).
C[i] = numpy.mean(points, axis=0). Assign the ith row of the centroid matrix, C, to be the average point in the cluster. We average across the rows (axis=0). This is us updating our centroids.

Efficient Particle-Pair Interactions Calculation

I have an N-body simulation that generates a list of particle positions, for multiple timesteps in the simulation. For a given frame, I want to generate a list of the pairs of particles' indices (i, j) such that dist(p[i], p[j]) < masking_radius. Essentially I'm creating a list of "interaction" pairs, where the pairs are within a certain distance of each other. My current implementation looks something like this:
interaction_pairs = []
# going through each unique pair (order doesn't matter)
for i in range(num_particles):
    for j in range(i + 1, num_particles):
        if dist(p[i], p[j]) < masking_radius:
            interaction_pairs.append((i, j))
Because of the large number of particles, this process takes a long time (>1 hr per test), and it is severely limiting to what I need to do with the data. I was wondering if there was any more efficient way to structure the data such that calculating these pairs would be more efficient instead of comparing every possible combination of particles. I was looking into KDTrees, but I couldn't figure out a way to utilize them to compute this more efficiently. Any help is appreciated, thank you!
Since you are using Python, sklearn has multiple implementations of nearest-neighbour search:
Both KDTree and BallTree are provided.
With a KDTree, the idea is to insert all your particles into the tree and then, for each particle, run the query "give me all particles within range X". A KDTree usually does this faster than a brute-force search.
You can read more for example here:
If you are working in 2D or 3D space, another option is to cut the space into a grid (with a cell size equal to the masking radius) and assign each particle to one grid cell. Then you can find candidate interaction partners just by checking neighboring cells (you still have to do a distance check, but for far fewer particle pairs).
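The grid option can be sketched in plain Python as follows. This is a minimal illustration with made-up random data, not a tuned implementation; grid_pairs is a hypothetical name, and the brute-force list is included only to check the result.

```python
import random
from collections import defaultdict
from itertools import product
from math import floor

def grid_pairs(points, radius):
    """Sorted list of index pairs (i, j), i < j, whose distance is below radius."""
    r2 = radius * radius
    dim = len(points[0])
    # Bucket particle indices by the grid cell (cell size = radius) containing them.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(floor(c / radius) for c in p)].append(idx)
    offsets = list(product((-1, 0, 1), repeat=dim))
    pairs = []
    for cell, members in cells.items():
        for off in offsets:
            # Only the cell itself and its direct neighbors can hold partners.
            other = cells.get(tuple(c + o for c, o in zip(cell, off)), ())
            for i in members:
                for j in other:
                    if i < j:  # keeps each unordered pair exactly once
                        d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                        if d2 < r2:
                            pairs.append((i, j))
    return sorted(pairs)

# Made-up test data: 200 random points in a cube, masking radius 2.
random.seed(1)
points = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(200)]
fast = grid_pairs(points, 2.0)
brute = [(i, j) for i in range(200) for j in range(i + 1, 200)
         if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) < 4.0]
```

Two points closer than the radius can differ by at most one cell index along each axis, which is why checking the 3**dim surrounding cells is enough. The saving grows as the masking radius shrinks relative to the size of the simulation box.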
Here's a fairly simple technique using plain Python that can reduce the number of comparisons required.
We first sort the points along either the X, Y, or Z axis (selected by axis in the code below). Let's say we choose the X axis. Then we loop over point pairs like your code does, but when we find a pair whose distance is greater than the masking_radius we test whether the difference in their X coordinates is also greater than the masking_radius. If it is, then we can bail out of the inner j loop because all points with a greater j have a greater X coordinate.
My dist2 function calculates the squared distance. This is faster than calculating the actual distance because computing the square root is relatively slow.
I've also included code that behaves similar to your code, i.e., it tests every pair of points, for speed comparison purposes; it also serves to check that the fast code is correct. ;)
from random import seed, uniform
from operator import itemgetter

# Make some fake data
def make_point(hi=10.0):
    return [uniform(-hi, hi) for _ in range(3)]

psize = 1000
points = [make_point() for _ in range(psize)]
masking_radius = 4.0
masking_radius2 = masking_radius ** 2

def dist2(p, q):
    return (p[0] - q[0])**2 + (p[1] - q[1])**2 + (p[2] - q[2])**2

pair_count = 0
test_count = 0
do_fast = 1
if do_fast:
    # Sort the points on one axis
    axis = 0
    points.sort(key=itemgetter(axis))
    # Fast
    for i, p in enumerate(points):
        left, right = i - 1, i + 1
        for j in range(i + 1, psize):
            test_count += 1
            q = points[j]
            if dist2(p, q) < masking_radius2:
                #interaction_pairs.append((i, j))
                pair_count += 1
            elif q[axis] - p[axis] >= masking_radius:
                # Every later point is even further away along this axis
                break
        if i % 100 == 0:
            print('\r {:3} '.format(i), flush=True, end='')
    total_pairs = psize * (psize - 1) // 2
    print('\r {} / {} tests'.format(test_count, total_pairs))
else:
    # Slow
    for i, p in enumerate(points):
        for j in range(i+1, psize):
            q = points[j]
            if dist2(p, q) < masking_radius2:
                #interaction_pairs.append((i, j))
                pair_count += 1
        if i % 100 == 0:
            print('\r {:3} '.format(i), flush=True, end='')
print('\n', pair_count, 'pairs')
output with do_fast = 1
181937 / 499500 tests
13295 pairs
output with do_fast = 0
13295 pairs
Of course, if most of the point pairs are within masking_radius of each other, there won't be much benefit in using this technique. And sorting the points adds a little bit of time, but Python's TimSort is rather efficient, especially if the data is already partially sorted, so if the masking_radius is sufficiently small you should see a noticeable improvement in the speed.

Minimum number of iterations in matrix where cell value replaced by maximum of neighbour cell value in single iteration

I have a matrix with a value in each cell (minimum value = 1), where the maximum value is 'max'.
At each step, I replace each cell's value with the highest value among its neighboring cells (all 8 neighbors), and this happens for the whole matrix simultaneously. I want to find the minimum number of iterations after which all cells hold the value max.
One brute-force method of doing this is to pad the matrix with zeros and run:
for i in range (1,x_max+1):
    for j in range(1,y_max+1):
        maximum = 0
        for k in range(-1,2):
            for l in range(-1,2):
                if matrix[i+k][j+l]>maximum:
                    maximum = matrix[i+k][j+l]
        matrix[i][j] = maximum
But is there an intelligent and faster way of doing this?
Thanks in advance.
I think this can be solved by BFS (breadth-first search).
Start the BFS simultaneously from all the matrix cells with the 'max' value.
dis[][] = infinite // min. distance of cell from nearest cell with 'max' value, initially infinite for all
Q // Queue
M[][] // matrix
for all i,j // traverse the matrix, enqueue all cells with 'max'
    if M[i][j] == 'max'
        dis[i][j] = 0 , Q.push( cell(i,j) )
while !Q.empty:
    cell Current = Q.front
    Q.pop
    for all neighbours Cell(p,q) of Current:
        if dis[p][q] == infinite
            dis[p][q] = dis[Current.row][Current.column] + 1
            Q.push( cell(p,q))
The maximum of dis[i][j] over all i,j is the number of iterations needed.
Use an array with a "border".
Testing the edge conditions is tedious and can be avoided by making the array one element bigger around the edge, with each border element set to INT_MIN.
Additionally, consider 8 tests, rather than a double nested loop
// Data is in matrix[1...N][1...M], yet is size matrix[N+2][M+2]
for (i=1; i <= N; i++) {
    for (j=1; j <= M; j++) {
        maximum = matrix[i-1][j-1];
        if (matrix[i-1][j+0] > maximum) maximum = matrix[i-1][j+0];
        if (matrix[i-1][j+1] > maximum) maximum = matrix[i-1][j+1];
        if (matrix[i+0][j-1] > maximum) maximum = matrix[i+0][j-1];
        if (matrix[i+0][j+0] > maximum) maximum = matrix[i+0][j+0];
        if (matrix[i+0][j+1] > maximum) maximum = matrix[i+0][j+1];
        if (matrix[i+1][j-1] > maximum) maximum = matrix[i+1][j-1];
        if (matrix[i+1][j+0] > maximum) maximum = matrix[i+1][j+0];
        if (matrix[i+1][j+1] > maximum) maximum = matrix[i+1][j+1];
        newmatrix[i][j] = maximum;
    }
}
All existing answers require examining every cell in the matrix. If you don't already know what the locations of the maximum value are, this is unavoidable, and in that case, Amit Kumar's BFS algorithm has optimal time complexity: O(wh), if the matrix has width w and height h.
OTOH, perhaps you already know the locations of the k maximum values, and k is relatively small. In that case, the following algorithm will find the answer in just O(k^2*(log(k)+log(max(w, h)))) time, which is much faster when either w or h is large. It doesn't actually look at any matrix entries; instead, it runs a binary search to look for candidate stopping times (that is, answers). For each candidate stopping time it builds the set of rectangles that would be occupied by max by that time, and checks whether any matrix cell remains uncovered by a rectangle.
To explain the idea, we first need some terms. Call the top row of a rectangle a "starting vertical event", and the row below its bottom edge an "ending vertical event". A "basic interval" is the interval of rows spanned by any pair of vertical events that does not have a third vertical event anywhere between them (the event pairs defining these intervals can be from the same or different rectangles). Notice that with k rectangles, there can never be more than 2k+1 basic intervals -- there is no dependence here on h.
The basic idea is to walk left-to-right through the columns of the matrix that correspond to horizontal events: columns in which either a new rectangle "starts" (the left vertical edge of a rectangle), or an existing rectangle "finishes" (the column to the right of the right vertical edge of a rectangle), keeping track of how many rectangles are currently covering every basic interval. If we ever detect a basic interval covered by 0 rectangles, we can stop: we have found a column containing one or more cells that are not yet covered at time t. If we get to the right edge of the matrix without this happening, then all cells are covered at time t.
Here is pseudocode for a function that checks whether any matrix cell remains uncovered by time t, given a length-k array peak, where (peak[i].x, peak[i].y) is the location of the i-th max-containing cell in the original matrix, in increasing order of x co-ordinate (so the leftmost max-containing cell is at (peak[1].x, peak[1].y)).
Function IsMatrixCovered(t, peak[]) {
    # Discover all vertical events and basic intervals
    Let vertEvents[] be an empty array of integers.
    For i from 1 to k:
        top = max(1, peak[i].y - t)
        bot = min(h, peak[i].y + t)
        Append top to vertEvents[]
        Append bot+1 to vertEvents[]
    Sort vertEvents in increasing order, and remove duplicates.
    x = 1
    Let horizEvents[] be an empty array of { col, type, top, bot } structures.
    For i from 1 to k:
        # Calculate the (clipped) rectangle that peak[i] will cover at time t:
        lft = max(1, peak[i].x - t)
        rgt = min(w, peak[i].x + t)
        top = max(1, peak[i].y - t)
        bot = min(h, peak[i].y + t)
        # Convert vertical positions to vertical event indices
        top = LookupIndexUsingBinarySearch(top, vertEvents[])
        bot = LookupIndexUsingBinarySearch(bot+1, vertEvents[])
        # Record horizontal events
        Append (lft, START, top, bot) to horizEvents[]
        Append (rgt+1, STOP, top, bot) to horizEvents[]
    Sort horizEvents in increasing order by its first 2 fields, with START considered < STOP.
    # Walk through all horizontal events, from left to right.
    Let basicIntervals[] be an array of size(vertEvents[]) integers, initially all 0.
    nOccupiedBasicIntervalsFirstCol = 0
    For i from 1 to size(horizEvents[]):
        If horizEvents[i].type = START:
            d = 1
        Else (if it is STOP):
            d = -1
        If horizEvents[i].col <= w:
            For j from horizEvents[i].top to horizEvents[i].bot:
                If horizEvents[i].col = 1 and basicIntervals[j] = 0:
                    ++nOccupiedBasicIntervalsFirstCol # Must be START
                basicIntervals[j] += d
                If basicIntervals[j] = 0:
                    return FALSE
    If nOccupiedBasicIntervalsFirstCol < size(basicIntervals):
        return FALSE # Could have checked earlier, but the code is simpler this way
    return TRUE
}
The above function can simply be called inside a binary search on t, that looks for the smallest value of t for which the function returns TRUE.
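The outer binary search might look like the sketch below. It is only an illustration: smallest_covering_time and make_naive_check are hypothetical names, and the naive O(w*h*k) coverage test merely stands in for IsMatrixCovered so that the example is runnable. It assumes at least one peak, 1-based (x, y) coordinates as in the pseudocode, and that coverage is monotone in t (once covered at time t, covered at all later times).

```python
def smallest_covering_time(is_covered, upper):
    """Smallest t in [0, upper] with is_covered(t) true, assuming is_covered(upper) holds."""
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if is_covered(mid):
            hi = mid        # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1    # mid fails, so the answer is larger
    return lo

def make_naive_check(peaks, w, h):
    """Stand-in for IsMatrixCovered: tests every cell directly (slow, for illustration)."""
    def is_covered(t):
        # After t steps a peak at (px, py) has spread to all cells within
        # Chebyshev distance t, i.e. a square clipped to the matrix.
        return all(
            any(max(abs(x - px), abs(y - py)) <= t for px, py in peaks)
            for x in range(1, w + 1)
            for y in range(1, h + 1)
        )
    return is_covered
```

A safe upper bound is max(w, h) - 1, since by then a single peak has spread across the whole matrix; the search then calls the coverage test about log2(max(w, h)) times.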
A further factor of k/log(k) could be removed by exploiting the fact that the set of basic intervals affected by any rectangle starting or ending is always an interval, through the use of Fenwick trees.
