Using a genetic algorithm to guess an image - python

I am trying to write a genetic algorithm which will take in a given image and then reproduce that image. I have already achieved this for 28x28 black-and-white images, but I am struggling to work out how to do it for larger RGB images.
I thought I would start by generating the individuals in the population, so I randomly generated a list of size x*y, where x and y are the dimensions of the input image, and each item within the list is in the format (r, g, b). Like this:
random_numbers = np.random.randint(low=0, high=256, size=(pixel_amount, 3))
generated_colours = [tuple(colour) for colour in random_numbers.tolist()]
In order to evaluate fitness, I went through each pixel in the image I want to produce and the corresponding pixel in the individual, and checked how many of the pixels were the correct colour in the correct place. However, this is clearly not a good approach, as there are 256^3 possible values for each pixel. As such, I thought it might be a better idea to evaluate fitness based on how many of the individual r, g and b values are correct. An image of 28x28 pixels would then have a maximum fitness of 28*28*3 = 2352 (in this case a high fitness is good, whereas usually a low fitness is ideal, so you can invert it). Then I thought it could be better still to look at exactly how far away the r, g and b values of each individual in the population are from the r, g and b values in the image we want, in a similar fashion to the cost function of a neural network. However, I'm not sure exactly how I would implement this. Perhaps:
fitness = 0
for i in range(pixel_amount):
    given_image_rgb = given_image_pixels[i]
    individual_rgb = individual_pixels[i]
    for j in range(3):
        fitness += (individual_rgb[j] - given_image_rgb[j])**2
So if the fitness remains at zero, the image is exactly correct, whereas a higher fitness means it is further from what we want. I think this would give very large numbers, though: even for a 28x28 image the worst possible fitness is 28*28*3*255^2 = 152,938,800.
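A vectorized sketch of this sum-of-squared-differences fitness, assuming both images are stored as (pixel_amount, 3) arrays (or lists of (r, g, b) tuples):

import numpy as np

def fitness(individual_pixels, given_image_pixels):
    # sum of squared channel differences; 0 means a perfect match
    a = np.asarray(individual_pixels, dtype=np.int64)
    b = np.asarray(given_image_pixels, dtype=np.int64)
    return int(((a - b) ** 2).sum())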
So, assuming I can get the fittest individuals in the population, I was going to remove the worst half of the individuals. Then I would breed the fittest individuals to create the next generation. I was thinking of doing this by randomly selecting two individuals from the remaining half, randomly picking half the pixel coordinates, and taking those pixels from parent1 and the other half of the pixels from parent2 to make child1; I then take the inverse in order to form child2. This continues until the initial population size has been restored. However, I think that this crossover needs to not just swap pixels, but also combine the (r, g, b) values of the pixels. Perhaps calculate an average across the two and always round down? Here is the current approach:
first_child = [None for _ in range(pixel_amount)]
second_child = [None for _ in range(pixel_amount)]
first_parent = first_parent[1]
second_parent = second_parent[1]
for i in range(pixel_amount):
    if i in random_pixels:
        first_child[i] = first_parent[i]
        second_child[i] = second_parent[i]
    else:
        first_child[i] = second_parent[i]
        second_child[i] = first_parent[i]
return first_child, second_child
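For the averaging idea mentioned above, a minimal sketch that blends the two parents channel-wise, rounding down:

def blend_crossover(first_parent, second_parent):
    # average each (r, g, b) channel of the two parents, rounding down
    child = []
    for p1, p2 in zip(first_parent, second_parent):
        child.append(tuple((a + b) // 2 for a, b in zip(p1, p2)))
    return child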
After that, I would mutate the individuals which would involve randomly changing a certain % of the pixels in each individual. Say 0.25% of each image would get set to a random (r, g, b) value.
for individual in population:
    # sample over the full range(pixel_amount); the original range(pixel_amount - 1)
    # silently excluded the last pixel from ever mutating
    mutated_pixels = random.sample(range(pixel_amount), int(pixel_amount * mutation_rate))
    for pixel in mutated_pixels:
        individual[pixel] = tuple(np.random.randint(low=0, high=256, size=3))
In terms of a stopping condition, I was going to calculate what percentage of the pixels, or what percentage of the rgb values are correct, and then have it stop once it has reached a certain % correct - say 70%.
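A sketch of that stopping check, counting exact channel matches (assuming the target and the best individual are available as (pixel_amount, 3) arrays):

import numpy as np

def fraction_correct(individual_pixels, given_image_pixels):
    # fraction of individual r, g, b channel values that match exactly
    a = np.asarray(individual_pixels)
    b = np.asarray(given_image_pixels)
    return (a == b).mean()

# e.g. stop once fraction_correct(best, target) >= 0.70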
I'm sorry for a rather long post, but I would really appreciate some advice on how I can get this working for larger colour images. I understand that it may be practically impossible using this approach, and so I have begun to look into using a set of translucent polygons instead.

Why don't you try Markov Chain Monte Carlo (MCMC) over your search space? Select a random sample under a given restriction to decrease the search space: keep crossover at around 60% and mutation at 1%, and select random images under a given restriction, let's say an average of pixels inside a given threshold relative to the image you want to reconstruct. Although the selection process is sub-optimal, you can use roulette-wheel selection based on the Expectation-Maximization algorithm as a way to overcome this heuristic, as well as a threshold in the choice of the fittest individuals.


How to use an optimization algorithm to find the best possible parameter

I'm trying to find a good interval of colors for color masking in order to extract skin from images.
I have a database of images and masks for extracting skin from those images. Here's an example of a sample:
I'm applying the mask to each image in order to get something like this:
I'm getting all the pixels from all the masked images and removing the black pixels in order to keep only the pixels containing the skin. Using this method I'm able to gather different pixels containing different shades of color of different skins from different people.
This is the code I'm using for this :
for i, (img_color, img_mask) in enumerate(zip(COLORED_IMAGES, MASKS)):
    # masking
    img_masked = cv2.bitwise_and(img_color, img_mask)
    # transforming into pixels array
    img_masked_pixels = img_masked.reshape(len(img_masked) * len(img_masked[0]), len(img_masked[0][0]))
    # merging all pixels from all samples
    if i == 0:
        all_pixels = img_masked_pixels
    else:
        all_pixels = np.concatenate((all_pixels, img_masked_pixels), axis=0)
# removing black
all_pixels = all_pixels[~(all_pixels == 0).all(axis=1)]
# sorting pixels
all_pixels = np.sort(all_pixels)
# reshape into an NB_PIXELSx1 image in order to create the histogram
all_pixels = all_pixels.reshape(len(all_pixels), 1, 3)
# creating an NB_PIXELSx1 image containing all skin colors from the dataset samples
all_pixels = cv2.cvtColor(all_pixels, cv2.COLOR_BGR2YCR_CB)
After extracting all shades of color from different skins, I'm creating a histogram that allows me to see which colors are more common. The code is too long for the creation of the histogram, but this is the result :
Then, I use the turning point of each color-space graph and choose a distance for that color space, say 20. The interval for that color space is obtained as [turning point - 20, turning point + 20].
So let's say that we got the following :
R :
turning point : 142
distance : 61
interval : [81, 203]
G :
turning point : 155
distance : 10
interval : [145, 165]
B :
turning point : 109
distance : 14
interval : [95, 123]
I would use these intervals in order to create masks of the colored image from the dataset in order to extract the skin (left: my intervals mask, right: ground truth mask):
The extracted masks using my intervals are compared with the dataset preexistent masks and the accuracy is calculated in order to see how effective and good the intervals that I got are :
precision_moy = 0
accuracy_moy = 0
for i, (image, img) in enumerate(zip(COLORED, GROUND_TRUTH)):
    Min = np.array([81, 145, 95], np.uint8)
    Max = np.array([203, 165, 123], np.uint8)
    mask = cv2.inRange(image, Min, Max)
    TP = 0 # True Positive
    TN = 0 # True Negative
    FP = 0 # False Positive
    FN = 0 # False Negative
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i,j] == 255 and img[i,j,0] == 255:
                TP = TP + 1
            if mask[i,j] == 0 and img[i,j,0] == 0:
                TN = TN + 1
            if mask[i,j] == 255 and img[i,j,0] == 0:
                FP = FP + 1
            if mask[i,j] == 0 and img[i,j,0] == 255:
                FN = FN + 1
    precision = TP/(TP+FP)
    accuracy = (TP+TN)/(TP+TN+FP+FN)
    precision_moy = precision_moy + precision
    accuracy_moy = accuracy_moy + accuracy
precision_moy = precision_moy / len(COLORED)
accuracy_moy = accuracy_moy / len(COLORED)
I keep changing the intervals, testing, and calculating the accuracy in order to find the best possible interval for each color space. The change is done by multiplying the distance by a number between 0 and 2. For example:
OLD R :
turning point : 142
distance : 61
interval : [81, 203]
NEW DISTANCE = OLD DISTANCE * 0.7 = 61 * 0.7 = 43
NEW R:
turning point : 142
distance : 43
interval : [99, 185]
To get a higher interval I would multiply by a number in ]1, 2]
To get a smaller interval I would multiply by a number in ]0, 1[
Now, to my question:
I would like to find the best possible interval for each color space using an optimization method instead of manually and randomly changing the intervals. What optimization method should I use and how would I use it ?
Thank you for taking the time. Your help is appreciated.
I would suggest using genetic optimization, which can be easily implemented for a problem as simple as yours. Since the problem is relatively "small", it should not take much longer to find the optimal solution compared to some local optimization method like the hill climbing suggested by @Leander. A genetic algorithm is a metaheuristic search, so it is not guaranteed to find the optimal solution, but it should get you very close. In fact, for such a small problem the chance that you will find the global optimum is very high.
As a start I would recommend taking a look at DEAP so you don't have to implement anything yourself (https://deap.readthedocs.io/en/master/). It contains very good implementations of many genetic algorithm variations, and there are tutorials with nice examples. With a bit of effort you should be able to compose a simple optimization algorithm in a day or two.
Genetic algorithm will from now on be denoted as GA for simplicity
Some tips where to start:
I suggest you start with the simplest variation, eaSimple, in DEAP. When this is not satisfactory you can always move to something a little more sophisticated, but I think that won't be necessary.
your Individual in the GA will have 6 components -> [blue_low, blue_high, green_low, green_high, red_low, red_high]; this also addresses the problem of asymmetric intervals mentioned by @Leander in the comments
mutations will be done by randomly altering elements of the individual
for the fitness function you can use your accuracy as you are computing it now
That is essentially all you need to build a GA for your problem. This example, https://deap.readthedocs.io/en/master/examples/ga_onemax.html, should get you up and running. You just need to define your own individuals, operators, and fitness evaluation function as I mentioned in the previous steps.
A final note on the use of any general optimization method: as I understand it, this is a discrete problem in 6 dimensions, since you have 6 components (blue_low, blue_high, green_low, green_high, red_low, red_high) and each one of them has only 256 possible values. This will prevent the use of most optimization methods, since they require the problem to be continuous.
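For illustration, a minimal DEAP sketch along these lines. compute_accuracy is a hypothetical stand-in for your own accuracy computation over the dataset, and the hyperparameters are just plausible defaults:

import random
from deap import base, creator, tools, algorithms

creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_bound", random.randint, 0, 255)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.attr_bound, n=6)  # [b_lo, b_hi, g_lo, g_hi, r_lo, r_hi]
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evaluate(ind):
    # hypothetical: apply the 6 bounds with cv2.inRange over the dataset
    # and return the mean accuracy against the ground-truth masks
    return (compute_accuracy(ind),)

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=255, indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=50)
algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=40)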
One basic approach which converges quickly but may not yield the global optimum is Hillclimbing.
Hillclimbing is a form of local search which can be used in this case.
Hillclimbing works by going from one state or solution to the next, depending on the score or performance of the state. If no better state can be found, that state is returned as the solution.
There are multiple ways of implementing hillclimbing; in your case I would do something like this:
The State: in your case, an item containing the Min and Max numpy arrays, with the accuracy or F-measure of the mask created by applying these arrays to the image as its score property.
For now I suggest you only take symmetrical ranges to massively reduce the search space.
Starting State
You can create a starting state at random, taking a random interval for each channel (Red, Green, Blue). This is especially useful if you run this algorithm multiple times. Determine the maximum and minimum for each interval based on your histograms.
Iteration Process (this is where the searching is done)
You want to create a loop in which you create successor states for the current state: increase or decrease the interval of each channel of the current state by, say, 10, and then every combination of those new intervals can be a successor state.
Another way could be to switch channels each iteration. So in the first iteration you create one successor state that has the Red channel of the current state decreased by 10, and another that has the Red channel increased by 10. In the second iteration you change the Green channel, in the third iteration the Blue channel, etc.
You then create a mask based on each successor state and apply it to the image, thereby determining the performance of each successor state.
Select the best-performing successor state and take it as the current state if its performance is better.
Repeat this process until the best successor state performs worse than the current state; then you know you have hit a local optimum. Return this state as the solution.
Problems
As highlighted above, this algorithm will find the local optimum for the starting state, because of its greediness.
You may therefore want to restart the algorithm from different starting locations, allowing more of the search space to be explored and increasing the chance that the global maximum is found.
If you have multiple threads, you may run multiple instances in parallel and finally return the best state out of the results from each instance.
Hillclimbing is not the best optimization algorithm, but it is very fast and easy to implement.
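To make the iteration concrete, here's a minimal sketch of that loop, assuming a score(state) callback that builds the mask from the six bounds and returns its accuracy (restricting to symmetric ranges would halve the state):

def hillclimb(score, start, step=10, lo=0, hi=255):
    # state is e.g. [r_lo, r_hi, g_lo, g_hi, b_lo, b_hi]
    state = list(start)
    best = score(state)
    while True:
        improved = False
        for i in range(len(state)):
            for delta in (-step, step):
                cand = list(state)
                cand[i] = min(hi, max(lo, cand[i] + delta))
                s = score(cand)
                if s > best:
                    state, best, improved = cand, s, True
        if not improved:
            return state, best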
In your current algorithm, you are finding the mode (i.e., the peak) of the color data and then taking the bins (color values) symmetrically around the mode.
For a normal distribution curve, the % of the population covered depends on the number of standard deviations around the mean, as given below:
In a normal distribution, mean, median and mode are the same. However, if your distribution is skewed, the population on the left side of the mean won't be the same as the population on the right side of the mean. So, a simple adjustment you can make is as follows:
Let p_left be the % of the population to the left of the peak and p_right the % of the population to the right of the peak. For example, let p_left = 40% and p_right = 60%. Instead of the fixed interval width of 40 that you are using (-20, 20), you can set another parameter: the % of the population selected, say 15%. This is the total population we want around the mode (including the mode). You can then divide this 15% in proportion to the left vs right population.
left proportion = 15% x 40% = 6%
right proportion = 15% x 60% = 9%
You should correct these 6% and 9% by calculating the mode's % of the population and taking half of it out of each. For example, if the mode is 5% of the population, you should deduct 2.5% from both 6% and 9%. This gives adjusted p_left and p_right as:
p_left = 6% - 2.5% = 3.5%
p_right = 9% - 2.5% = 6.5%
Instead of dividing the interval evenly around the mode, you compute how many bins on the left and on the right need to be included to reach these proportions. For example, you may find that including 5 bins on the left adds up to 3.5% of the total population, and adding 3 bins on the right gives you approximately 6.5% of the population.
So, your range becomes (x - 5, x + 3) where x is the x coordinate of the mode.
Parameter estimation: to determine the right value for the mode % of population (the 15% in the example above), you can compute the histograms on a standard set of your masked images and use them to derive a good initial estimate. Essentially, count the unmasked pixels in your masked images and divide by the total number of pixels.
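A sketch of this bin-counting step, assuming hist is a 256-bin channel histogram and target is the desired share of the population (the 15% above):

import numpy as np

def asymmetric_interval(hist, target=0.15):
    total = float(hist.sum())
    mode = int(np.argmax(hist))
    p_left = hist[:mode].sum() / total
    p_right = hist[mode + 1:].sum() / total
    share = hist[mode] / total
    # split the target proportionally, then correct for the mode's own share
    want_left = target * p_left / (p_left + p_right) - share / 2
    want_right = target * p_right / (p_left + p_right) - share / 2
    lo = mode
    while lo > 0 and hist[lo - 1:mode].sum() / total < want_left:
        lo -= 1
    hi = mode
    while hi < len(hist) - 1 and hist[mode + 1:hi + 2].sum() / total < want_right:
        hi += 1
    return lo, hi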
Actually, finding the global optimum for a given dataset is not too complicated. For simplicity, let's first assume you have grayscale images since each of the colors is treated independently (I believe). It would be a bit more complicated if you were scoring a pixel based on all 3 colors falling within the required interval, but it seems like you're not.
So anyways, you can just exhaustively check each interval for each image, depending on the size of your dataset. For instance, if each pixel only takes integer values in [0,255], there are only on the order of 100 interval sizes you even need to consider. So you can compute the accuracy for each candidate interval size and each image, and simply take the interval that yields the highest average accuracy. Repeat across all colors. This is the brute force approach for sure, but unless your dataset is quite large it shouldn't be computationally expensive using optimized matrix operations. If your dataset is huge, a sufficiently large random sample of images over which to use this technique would yield an approximate (though not globally optimal solution).
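A hedged sketch of that exhaustive search for one channel, assuming pixels is a flat array of channel values and labels the matching 0/1 ground truth:

import numpy as np

def best_symmetric_interval(pixels, labels, peak, max_dist=128):
    # try every half-width d around the peak; keep the most accurate one
    best_d, best_acc = 0, 0.0
    for d in range(max_dist + 1):
        pred = (pixels >= peak - d) & (pixels <= peak + d)
        acc = np.mean(pred == labels.astype(bool))
        if acc > best_acc:
            best_d, best_acc = d, acc
    return best_d, best_acc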
As an aside, the way you are currently computing your accuracies between mask and ground truth is quite inefficient. The rule of thumb is pretty much to always use numpy matrix operations when you can, because they're much more efficient (there are some cool algorithmic tricks for saving time on matrix operations, and they're written in C so they are faster for that reason as well).
You can replace this:
for i in range(mask.shape[0]):
    for j in range(mask.shape[1]):
        if mask[i,j] == 255 and img[i,j,0] == 255:
            TP = TP + 1
        if mask[i,j] == 0 and img[i,j,0] == 0:
            TN = TN + 1
        if mask[i,j] == 255 and img[i,j,0] == 0:
            FP = FP + 1
        if mask[i,j] == 0 and img[i,j,0] == 255:
            FN = FN + 1
With the equivalent matrix operation:
# convert the 0/255 mask and ground-truth channel to 0/1 integer arrays
# first, so the comparisons below make sense
mask01 = (mask == 255).astype(int)
img01 = (img[:,:,0] == 255).astype(int)
diff = mask01 - img01
TP = np.sum((mask01 == 1) & (img01 == 1))
TN = np.sum((mask01 == 0) & (img01 == 0))
FP = np.sum(diff == 1)   # predicted skin where the ground truth has none
FN = np.sum(diff == -1)  # missed skin
This will save you time, especially if you use a brute-force approach like the one I suggested, but it is also good practice in general.

Laplacian of Gaussian Edge Detector Being Affected by Change of Mask Size

For a class, I've written a Laplacian of Gaussian edge detector that works in the following way.
Make a Laplacian of Gaussian mask given the variance of the Gaussian and the size of the mask
Convolve it with the image
Find the zero crossings in a really shoddy manner; these are the edges of the image
If you so desire, the code for this program can be viewed here, but the most important part is where I create my Gaussian mask which depends on two functions that I've reproduced here for your convenience:
# Function for calculating the laplacian of the gaussian at a given point and with a given variance
def l_o_g(x, y, sigma):
    # Formatted this way for readability
    nom = ( (y**2)+(x**2)-2*(sigma**2) )
    denom = ( (2*math.pi*(sigma**6) ))
    expo = math.exp( -((x**2)+(y**2))/(2*(sigma**2)) )
    return nom*expo/denom

# Create the laplacian of the gaussian, given a sigma
# Note the recommended size is 7 according to this website http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm
# Experimentally, I've found 6 to be much more reliable for images with clear edges and 4 to be better for images with a lot of little edges
def create_log(sigma, size = 7):
    w = math.ceil(float(size)*float(sigma))
    # If the dimension is an even number, make it uneven
    if(w%2 == 0):
        print "even number detected, incrementing"
        w = w + 1
    # Now make the mask
    l_o_g_mask = []
    w_range = int(math.floor(w/2))
    print "Going from " + str(-w_range) + " to " + str(w_range)
    for i in range_inc(-w_range, w_range):
        for j in range_inc(-w_range, w_range):
            l_o_g_mask.append(l_o_g(i,j,sigma))
    l_o_g_mask = np.array(l_o_g_mask)
    l_o_g_mask = l_o_g_mask.reshape(w,w)
    return l_o_g_mask
All in all, it works relatively well, even if it is extremely slow because I don't know how to leverage Numpy. However, whenever I change the size of the Gaussian mask, the thickness of the edges I detect change drastically.
Here is the image run with a size of mask equivalent to 4 times the given variance of the Gaussian:
Here is the same image run with a size of mask equivalent to 6 times the variance:
I'm kind of baffled, because the only thing the size parameter should change is the accuracy of the approximation of the Laplacian of Gaussian mask before I begin to convolve it with the image. So I ran a test where I wanted to visualize how my mask looked given different size parameters.
Here it is with a size of 4:
Here it is with a size of 6:
The shape of the function seems to be the same as far as I can tell from the zero crossings (they happen to be spaced around four pixels apart) and their peaks. Is there a better way to check?
Any suggestions as to why this issue might be occurring or how to investigate further are appreciated.
It turns out your idea about the effect of increasing the mask size is wrong: increasing the size doesn't actually improve the quality of the approximation or the resolution of the function. To explain, instead of using a complicated 2D function like the Laplacian of the Gaussian, let's take things back down to one dimension and pretend we are approximating the function f(x) = x^2.
Your code for calculating the function would then look like this:
def derp(sigma, size):
    w = math.ceil(float(size)*float(sigma))
    # If the dimension is an even number, make it uneven
    if(w%2 == 0):
        print "even number detected, incrementing"
        w = w + 1
    # Now make the mask
    x_mask = []
    w_range = int(math.floor(w/2))
    print "Going from " + str(-w_range) + " to " + str(w_range)
    for i in range_inc(-w_range, w_range):
        x_mask.append(i**2)
If you were to increase the "size" of this function, you wouldn't be increasing the resolution; you'd actually be increasing the range of x values that you're sampling. For example, for a size of 3 you're evaluating -1, 0, 1, and for a size of 5 you're evaluating -2, -1, 0, 1, 2. Notice this doesn't change the spacing between the pixels. This is what you're actually seeing when you note that the zero crossings occur the same number of pixels apart.
Consequently, when convolving with this really silly mask, you would get really different results. But what if we went back to the Laplacian of the Gaussian?
Well, the nice property the Laplacian of the Gaussian has is that the farther out you go with it, the more zero values you get. So unlike our silly x^2 function, you should be getting the same results after some point.
Now, I think the reason you didn't see this with your test cases is that they were too limited in size, because your program is too slow for you to really see the difference between size=15 and size=20; but if you were to actually run those cases I think you would see that the image doesn't change that much.
This still doesn't answer what you should be doing; for that, we're going to have to look to the professionals, namely the implementation of gaussian_filter in SciPy (source here).
When you look at their source code, the first thing you'll notice is that when creating their mask they're basically doing the same thing as you: they always use an integer step size and they scale the size of the mask by its standard deviation.
As to why they do it that way, I can't answer, since I don't have that much in-depth knowledge of image processing or SciPy. However, this may make for a good new question to ask on SO.
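As a side note, SciPy also ships a ready-made Laplacian-of-Gaussian filter, which sidesteps building the mask by hand. A small sketch (the sigma here is an arbitrary pick):

import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)  # stand-in for your grayscale image
log_response = ndimage.gaussian_laplace(image, sigma=2.0)
# the zero crossings of log_response are the candidate edges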

Merging image regions (bboxes) in linear time

I have a set of regions (bounding boxes) for some image, example python code:
im = Image.open("single.png")
pix = np.array(im)
gray = rgb2grey(pix)
thresh = threshold_otsu(gray)
bw = closing(gray > thresh, square(1))
cleared = bw.copy()
clear_border(cleared)
borders = np.logical_xor(bw, cleared)
label_image = label(borders)
for region in regionprops(label_image, ['Area', 'BoundingBox']):
    pass  # now I have bounding boxes in hand
What I would like to do is to merge regions which overlap or whose bbox edges are closer than X. A naive approach would be checking the distances between all pairs of regions, which has O(n^2) complexity. I can write something smarter, but I have the impression that this kind of algorithm already exists and I don't want to reinvent the wheel. Any help is appreciated.
Is your question: "There are n boxes (not necessarily parallel to the x-y axis), and you want to find all overlapping boxes and merge them if they exist"?
I cannot think of a linear algorithm yet, but I have a rough idea that is faster than O(n^2), maybe O(n lg n), described as follows:
1. Give each box an id, and for each edge, mark which box it belongs to.
2. Use a sweep line algorithm to find all intersections.
3. In the sweep line algorithm, once an intersection is reported, you know which 2 boxes are overlapping; use something like a disjoint set to group them.
4. Lastly, linearly scan the disjoint set; for each set, keep updating the leftmost, rightmost, topmost, and bottommost points to make a larger box that bounds them all.
(Merging is done here; note that if a box has no overlap with others, its set will only contain itself.)
I hope this method works, and it should be faster than O(n^2); but even if it does work, it still has a problem at step 4: the larger merged box must be parallel to the x-y axis, which is not required.
Edit: Sorry, I just went through the OP again and realize the above does not solve the "merge boxes with distance < x" requirement, and it only partly solves the overlapping-boxes problem.
Moreover, the merging procedure is not a one-pass job; it is somewhat recursive. For example, box A and box B merge into box C, and then box C may overlap with, or be within distance x of, box D, and so on.
Solving this task in linear time seems quite impossible to me, as even pre-computing the pairwise distances between all boxes can hardly be done in O(n).
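For what it's worth, a minimal sketch of the grouping-and-bounding part for axis-aligned boxes, using a disjoint set. The pairwise overlap test here is O(n^2) and is exactly the part a sweep line would replace; as noted above, the whole thing would also need repeating until no merges occur to be fully correct:

def merge_boxes(boxes, pad=0):
    # boxes are axis-aligned (x1, y1, x2, y2); pad handles "distance < x"
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def overlaps(a, b):
        return (a[0] - pad <= b[2] and b[0] - pad <= a[2] and
                a[1] - pad <= b[3] and b[1] - pad <= a[3])

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)
    # bound each group with one enclosing box
    return [(min(b[0] for b in g), min(b[1] for b in g),
             max(b[2] for b in g), max(b[3] for b in g))
            for g in groups.values()]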

Selecting best range of values from histogram curve

Scenario :
I am trying to track two differently colored objects. At the beginning, the user is prompted to hold the first colored object (say, a red one) at a particular position in front of the camera (marked on screen by a rectangle) and press any key; then my program takes that portion of the frame (ROI) and analyzes the color in it, to find what color to track. Similarly for the second object. Then, as usual, I use the cv.inRange function in the HSV color plane and track the object.
What is done :
I took the ROI of the object to be tracked, converted it to HSV, and checked the Hue histogram. I got two cases, as below:
(Here there is only one major central peak. But in some cases I get two such peaks: one bigger peak with a pixel cluster around it, and a second peak, smaller than the first but of significant size, with a small cluster around it as well. I don't have a sample image of it now, but it looks roughly like the one below (created in Paint).)
Question :
How can I get best range of hue values from these histograms?
By best range I mean one where maybe around 80-90% of the pixels in the ROI lie within it.
Or is there any better method than this to track different colored objects ?
If I understand right, the only thing you need here is to find the maximum in a graph, where the maximum is not necessarily the highest peak but the area with the largest density.
Here's a very simple, not too scientific, but fast O(n) approach: run the histogram through a low-pass filter, e.g. a moving average. The length of your average could be, say, 20. In that case the 10th value of your new modified histogram would be:
mh10 = (h1 + h2 + ... + h20) / 20
where h1, h2... are values from your histogram. The next value:
mh11 = (h2 + h3 + ... + h21) / 20
which can be calculated much more easily using the previously calculated mh10, by dropping its first component and adding a new one at the end:
mh11 = mh10 - h1/20 + h21/20
Your only problem is how you handle values at the edges of your histogram. You could shrink your moving average's length to the length available, or you could pad with values before and after what you already have. But either way, you can't properly handle peaks right at the edge.
And finally, when you have this modified histogram, just take the maximum. This works because now every value in your histogram contains not only itself but its neighbors as well.
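A small numpy sketch of this (mode='same' attenuates the edges, matching the caveat above):

import numpy as np

def densest_bin(hist, window=20):
    kernel = np.ones(window) / window
    smoothed = np.convolve(hist, kernel, mode='same')
    return int(np.argmax(smoothed))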
A more sophisticated approach is to weight your average, for example with a Gaussian curve. But that's not linear any more: it would be O(k*n), where k is the length of your average, which is also the length of the Gaussian.

Peak detection in a 2D array

I'm helping a veterinary clinic measure pressure under a dog's paw. I use Python for my data analysis, and now I'm stuck trying to divide the paws into (anatomical) subregions.
I made a 2D array of each paw that consists of the maximal values each sensor registered while loaded by the paw over time. Here's an example of one paw, where I used Excel to draw the areas I want to 'detect'. These are 2x2 boxes around the sensors with local maxima that together have the largest sum.
So I tried some experimenting and decided to simply look for the maximums of each column and row (I can't look in one direction due to the shape of the paw). This seems to 'detect' the location of the separate toes fairly well, but it also marks neighboring sensors.
So what would be the best way to tell Python which of these maximums are the ones I want?
Note: The 2x2 squares can't overlap, since they have to be separate toes!
Also, I took 2x2 as a convenience; any more advanced solution is welcome, but I'm simply a human movement scientist, so I'm neither a real programmer nor a mathematician, so please keep it 'simple'.
Here's a version that can be loaded with np.loadtxt
Results
So I tried @jextee's solution (see the results below). As you can see, it works very well on the front paws, but less well for the hind legs.
More specifically, it can't recognize the small peak that's the fourth toe. This is obviously inherent to the fact that the loop looks top-down towards the lowest value, without taking into account where it is.
Would anyone know how to tweak @jextee's algorithm, so that it might be able to find the 4th toe too?
Since I haven't processed any other trials yet, I can't supply any other samples. But the data I gave before were the averages of each paw. This file is an array with the maximal data of 9 paws in the order they made contact with the plate.
This image shows how they were spatially spread out over the plate.
Update:
I have set up a blog for anyone interested and a OneDrive with all the raw measurements. So to anyone requesting more data: more power to you!
New update:
So after the help I got with my questions regarding paw detection and paw sorting, I was finally able to check the toe detection for every paw! Turns out, it doesn't work so well on anything but paws sized like the one in my own example. Of course, in hindsight, it's my own fault for choosing the 2x2 so arbitrarily.
Here's a nice example of where it goes wrong: a nail is being recognized as a toe and the 'heel' is so wide, it gets recognized twice!
The paw is too large, so taking a 2x2 size with no overlap causes some toes to be detected twice. The other way around, in small dogs it often fails to find a 5th toe, which I suspect is caused by the 2x2 area being too large.
After trying the current solution on all my measurements, I came to the staggering conclusion that for nearly all my small dogs it didn't find a 5th toe, and that in over 50% of the impacts for the large dogs it would find more!
So clearly I need to change it. My own guess was changing the size of the neighborhood to something smaller for small dogs and larger for large dogs. But generate_binary_structure wouldn't let me change the size of the array.
Therefore, I'm hoping that anyone else has a better suggestion for locating the toes, perhaps having the toe area scale with the paw size?
I detected the peaks using a local maximum filter. Here is the result on your first dataset of 4 paws:
I also ran it on the second dataset of 9 paws and it worked as well.
Here is how you do it:
import numpy as np
from scipy.ndimage.filters import maximum_filter
from scipy.ndimage.morphology import generate_binary_structure, binary_erosion
import matplotlib.pyplot as pp

# for some reason I had to reshape. Numpy ignored the shape header.
paws_data = np.loadtxt("paws.txt").reshape(4,11,14)

# getting a list of images
paws = [p.squeeze() for p in np.vsplit(paws_data,4)]

def detect_peaks(image):
    """
    Takes an image and detects the peaks using the local maximum filter.
    Returns a boolean mask of the peaks (i.e. 1 when
    the pixel's value is the neighborhood maximum, 0 otherwise)
    """
    # define an 8-connected neighborhood
    neighborhood = generate_binary_structure(2,2)

    # apply the local maximum filter; all pixels of maximal value
    # in their neighborhood are set to 1
    local_max = maximum_filter(image, footprint=neighborhood)==image
    # local_max is a mask that contains the peaks we are
    # looking for, but also the background.
    # In order to isolate the peaks we must remove the background from the mask.

    # we create the mask of the background
    background = (image==0)

    # a little technicality: we must erode the background in order to
    # successfully subtract it from local_max, otherwise a line will
    # appear along the background border (artifact of the local maximum filter)
    eroded_background = binary_erosion(background, structure=neighborhood, border_value=1)

    # we obtain the final mask, containing only peaks,
    # by removing the background from the local_max mask (xor operation)
    detected_peaks = local_max ^ eroded_background

    return detected_peaks

# applying the detection and plotting results
for i, paw in enumerate(paws):
    detected_peaks = detect_peaks(paw)
    pp.subplot(4,2,(2*i+1))
    pp.imshow(paw)
    pp.subplot(4,2,(2*i+2))
    pp.imshow(detected_peaks)

pp.show()
All you need to do after is use scipy.ndimage.measurements.label on the mask to label all distinct objects. Then you'll be able to play with them individually.
Note that the method works well because the background is not noisy; if it were, you would detect a bunch of other unwanted peaks in the background. Another important factor is the size of the neighborhood: you will need to adjust it if the peak size changes (they should remain roughly proportional).
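A short usage sketch of that follow-up step; center_of_mass is one convenient way to turn the labels into peak coordinates:

from scipy import ndimage

labeled, num_objects = ndimage.label(detected_peaks)
peak_centers = ndimage.center_of_mass(paw, labeled, range(1, num_objects + 1))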
Solution
Data file: paw.txt. Source code:
from scipy import *
from operator import itemgetter

n = 5 # how many fingers are we looking for

d = loadtxt("paw.txt")
width, height = d.shape

# Create an array where every element is a sum of 2x2 squares.
fourSums = d[:-1,:-1] + d[1:,:-1] + d[1:,1:] + d[:-1,1:]

# Find positions of the fingers.
# Pair each sum with its position number (from 0 to width*height-1),
pairs = zip(arange(width*height), fourSums.flatten())

# Sort by descending sum value, filter overlapping squares
def drop_overlapping(pairs):
    no_overlaps = []
    def does_not_overlap(p1, p2):
        i1, i2 = p1[0], p2[0]
        r1, col1 = i1 / (width-1), i1 % (width-1)
        r2, col2 = i2 / (width-1), i2 % (width-1)
        return (max(abs(r1-r2),abs(col1-col2)) >= 2)
    for p in pairs:
        if all(map(lambda prev: does_not_overlap(p,prev), no_overlaps)):
            no_overlaps.append(p)
    return no_overlaps

pairs2 = drop_overlapping(sorted(pairs, key=itemgetter(1), reverse=True))

# Take the first n with the highest values
positions = pairs2[:n]

# Print results
print d, "\n"
for i, val in positions:
    row = i / (width-1)
    column = i % (width-1)
    print "sum = %f @ %d,%d (%d)" % (val, row, column, i)
    print d[row:row+2,column:column+2], "\n"
Output without overlapping squares. It seems that the same areas are selected as in your example.
Some comments
The tricky part is to calculate the sums of all 2x2 squares. I assumed you need all of them, so there might be some overlapping. I used slices to cut the first/last columns and rows from the original 2D array, then overlaid them all and calculated the sums.
To understand it better, imagine a 3x3 array:
>>> a = arange(9).reshape(3,3) ; a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
Then you can take its slices:
>>> a[:-1,:-1]
array([[0, 1],
[3, 4]])
>>> a[1:,:-1]
array([[3, 4],
[6, 7]])
>>> a[:-1,1:]
array([[1, 2],
[4, 5]])
>>> a[1:,1:]
array([[4, 5],
[7, 8]])
Now imagine you stack them one above the other and sum the elements at the same positions. These sums are exactly the sums over the 2x2 squares whose top-left corner is in that position:
>>> sums = a[:-1,:-1] + a[1:,:-1] + a[:-1,1:] + a[1:,1:]; sums
array([[ 8, 12],
[20, 24]])
When you have the sums over 2x2 squares, you can use max to find the maximum, or sort, or sorted to find the peaks.
To remember the positions of the peaks, I couple every value (the sum) with its ordinal position in the flattened array (see zip). Then I calculate the row/column position again when I print the results.
Notes
I allowed the 2x2 squares to overlap. The edited version filters out some of them, such that only non-overlapping squares appear in the results.
Choosing fingers (an idea)
Another problem is how to choose what is likely to be fingers out of all the peaks. I have an idea which may or may not work. I don't have time to implement it right now, so just pseudo-code.
I noticed that if the front fingers stay on an almost perfect circle, the rear finger should be inside that circle. Also, the front fingers are more or less equally spaced. We may try to use these heuristic properties to detect the fingers.
Pseudo code:
select the top N finger candidates (not too many, 10 or 12)
consider all possible combinations of 5 out of N (use itertools.combinations)
for each combination of 5 fingers:
    for each finger out of 5:
        fit the best circle to the remaining 4
        => position of the center, radius
        check if the selected finger is inside of the circle
        check if the remaining four are evenly spread
        (for example, consider angles from the center of the circle)
        assign some cost (penalty) to this selection of 4 peaks + a rear finger
        (consider, probably weighted:
            circle fitting error,
            if the rear finger is inside,
            variance in the spreading of the front fingers,
            total intensity of 5 peaks)
choose a combination of 4 peaks + a rear peak with the lowest penalty
This is a brute-force approach. If N is relatively small, then I think it is doable. For N=12, there are C_12^5 = 792 combinations, times 5 ways to select a rear finger, so 3960 cases to evaluate for every paw.
This is an image registration problem. The general strategy is:
Have a known example, or some kind of prior on the data.
Fit your data to the example, or fit the example to your data.
It helps if your data is roughly aligned in the first place.
Here's a rough and ready approach, "the dumbest thing that could possibly work":
Start with five toe coordinates in roughly the place you expect.
With each one, iteratively climb to the top of the hill: given the current position, move to the maximum neighbouring pixel if its value is greater than the current pixel's. Stop when your toe coordinates have stopped moving. See the sketch below.
To counteract the orientation problem, you could have 8 or so initial settings for the basic directions (North, North East, etc). Run each one individually and throw away any results where two or more toes end up at the same pixel. I'll think about this some more, but this kind of thing is still being researched in image processing - there are no right answers!
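A minimal sketch of that per-toe climb, assuming image is the 2D pressure array and start is a rough (row, col) guess:

def climb(image, start):
    y, x = start
    h, w = image.shape
    while True:
        best = (y, x)
        # look at all 8 neighbours and move to the strongest one
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and image[ny, nx] > image[best]:
                    best = (ny, nx)
        if best == (y, x):
            return best
        y, x = best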
Slightly more complex idea: (weighted) K-means clustering. It's not that bad.
Start with five toe coordinates, but now these are "cluster centres".
Then iterate until convergence:
Assign each pixel to the closest cluster (just make a list for each cluster).
Calculate the center of mass of each cluster. For each cluster, this is: Sum(coordinate * intensity value) / Sum(intensity value)
Move each cluster to the new centre of mass.
This method will almost certainly give much better results, and you get the mass of each cluster which may help in identifying the toes.
(Again, you've specified the number of clusters up front. With clustering you have to specify the density one way or another: Either choose the number of clusters, appropriate in this case, or choose a cluster radius and see how many you end up with. An example of the latter is mean-shift.)
Sorry about the lack of implementation details or other specifics. I would code this up but I've got a deadline. If nothing else has worked by next week let me know and I'll give it a shot.
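In the meantime, a minimal sketch of the weighted iteration described above (centres pulled toward high-pressure pixels; start it from five rough toe guesses):

import numpy as np

def weighted_kmeans(image, centers, iters=20):
    ys, xs = np.nonzero(image > 0)
    pts = np.column_stack([ys, xs]).astype(float)
    w = image[ys, xs].astype(float)
    centers = np.asarray(centers, dtype=float)
    for _ in range(iters):
        # assign each pixel to its nearest centre
        d = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(d, axis=1)
        # move each centre to its cluster's intensity-weighted mean
        for k in range(len(centers)):
            m = assign == k
            if m.any():
                centers[k] = (pts[m] * w[m, None]).sum(axis=0) / w[m].sum()
    return centers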
Using persistent homology to analyze your data set I get the following result (click to enlarge):
This is the 2D-version of the peak detection method described in this SO answer. The above figure simply shows 0-dimensional persistent homology classes sorted by persistence.
I did upscale the original dataset by a factor of 2 using scipy.misc.imresize(). However, note that I did consider the four paws as one dataset; splitting it into four would make the problem easier.
Methodology.
The idea behind this is quite simple: consider the graph of the function that assigns each pixel its level. It looks like this:
Now consider a water level at height 255 that continuously descends to lower levels. At local maxima, islands pop up (birth). At saddle points two islands merge; we consider the lower island to be merged into the higher island (death). The so-called persistence diagram (of the 0th-dimensional homology classes, our islands) depicts death values over birth values of all islands:
The persistence of an island is then the difference between its birth and death levels: the vertical distance of a dot to the grey main diagonal. The figure labels the islands by decreasing persistence.
The very first picture shows the locations of the births of the islands. This method not only gives the local maxima but also quantifies their "significance" by the above-mentioned persistence. One would then filter out all islands with too low a persistence. However, in your example every island (i.e., every local maximum) is a peak you are looking for.
Python code can be found here.
This problem has been studied in some depth by physicists. There is a good implementation in ROOT. Look at the TSpectrum classes (especially TSpectrum2 for your case) and the documentation for them.
References:
M. Morhac et al.: Background elimination methods for multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 401 (1997) 113-132.
M. Morhac et al.: Efficient one- and two-dimensional Gold deconvolution and its application to gamma-ray spectra decomposition. Nuclear Instruments and Methods in Physics Research A 401 (1997) 385-408.
M. Morhac et al.: Identification of peaks in multidimensional coincidence gamma-ray spectra. Nuclear Instruments and Methods in Physics Research A 443 (2000) 108-125.
...and for those who don't have access to a subscription to NIM:
Spectrum.doc
SpectrumDec.ps.gz
SpectrumSrc.ps.gz
SpectrumBck.ps.gz
I'm sure you have enough to go on by now, but I can't help but suggest the k-means clustering method. k-means is an unsupervised clustering algorithm which will take your data (in any number of dimensions; I happen to do this in 3D) and arrange it into k clusters with distinct boundaries. It's nice here because you know exactly how many toes these canines (should) have.
Additionally, it's implemented in Scipy which is really nice (http://docs.scipy.org/doc/scipy/reference/cluster.vq.html).
Here's an example of what it can do to spatially resolve 3D clusters:
What you want to do is a bit different (2D and includes pressure values), but I still think you could give it a shot.
Here is an idea: calculate the (discrete) Laplacian of the image. I would expect it to be (negative and) large at maxima, in a way that is more dramatic than in the original images. Thus, maxima could be easier to find.
Here is another idea: if you know the typical size of the high-pressure spots, you can first smooth your image by convolving it with a Gaussian of the same size. This may give you simpler images to process.
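Both ideas fit in a few lines with scipy.ndimage. A sketch, assuming paw is the 2D pressure array and sigma roughly matches the spot size:

import numpy as np
from scipy import ndimage

smoothed = ndimage.gaussian_filter(paw, sigma=1.5)
lap = ndimage.laplace(smoothed)
# the most negative responses are the sharpest maxima
candidate = np.unravel_index(np.argmin(lap), lap.shape)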
Just a couple of ideas off the top of my head:
take the gradient (derivative) of the scan, see if that eliminates the false calls
take the maximum of the local maxima
You might also want to take a look at OpenCV, it's got a fairly decent Python API and might have some functions you'd find useful.
Thanks for the raw data. I'm on the train and this is as far as I've gotten (my stop is coming up). I massaged your txt file with regexps and have plopped it into an html page with some javascript for visualization. I'm sharing it here because some, like myself, might find it more readily hackable than python.
I think a good approach will be scale- and rotation-invariant, and my next step will be to investigate mixtures of Gaussians (each paw pad being the center of a Gaussian).
<html>
<head>
<script type="text/javascript" src="http://vis.stanford.edu/protovis/protovis-r3.2.js"></script>
<script type="text/javascript">
var heatmap = [[[0,0,0,0,0,0,0,4,4,0,0,0,0],
[0,0,0,0,0,7,14,22,18,7,0,0,0],
[0,0,0,0,11,40,65,43,18,7,0,0,0],
[0,0,0,0,14,61,72,32,7,4,11,14,4],
[0,7,14,11,7,22,25,11,4,14,65,72,14],
[4,29,79,54,14,7,4,11,18,29,79,83,18],
[0,18,54,32,18,43,36,29,61,76,25,18,4],
[0,4,7,7,25,90,79,36,79,90,22,0,0],
[0,0,0,0,11,47,40,14,29,36,7,0,0],
[0,0,0,0,4,7,7,4,4,4,0,0,0]
],[
[0,0,0,4,4,0,0,0,0,0,0,0,0],
[0,0,11,18,18,7,0,0,0,0,0,0,0],
[0,4,29,47,29,7,0,4,4,0,0,0,0],
[0,0,11,29,29,7,7,22,25,7,0,0,0],
[0,0,0,4,4,4,14,61,83,22,0,0,0],
[4,7,4,4,4,4,14,32,25,7,0,0,0],
[4,11,7,14,25,25,47,79,32,4,0,0,0],
[0,4,4,22,58,40,29,86,36,4,0,0,0],
[0,0,0,7,18,14,7,18,7,0,0,0,0],
[0,0,0,0,4,4,0,0,0,0,0,0,0],
],[
[0,0,0,4,11,11,7,4,0,0,0,0,0],
[0,0,0,4,22,36,32,22,11,4,0,0,0],
[4,11,7,4,11,29,54,50,22,4,0,0,0],
[11,58,43,11,4,11,25,22,11,11,18,7,0],
[11,50,43,18,11,4,4,7,18,61,86,29,4],
[0,11,18,54,58,25,32,50,32,47,54,14,0],
[0,0,14,72,76,40,86,101,32,11,7,4,0],
[0,0,4,22,22,18,47,65,18,0,0,0,0],
[0,0,0,0,4,4,7,11,4,0,0,0,0],
],[
[0,0,0,0,4,4,4,0,0,0,0,0,0],
[0,0,0,4,14,14,18,7,0,0,0,0,0],
[0,0,0,4,14,40,54,22,4,0,0,0,0],
[0,7,11,4,11,32,36,11,0,0,0,0,0],
[4,29,36,11,4,7,7,4,4,0,0,0,0],
[4,25,32,18,7,4,4,4,14,7,0,0,0],
[0,7,36,58,29,14,22,14,18,11,0,0,0],
[0,11,50,68,32,40,61,18,4,4,0,0,0],
[0,4,11,18,18,43,32,7,0,0,0,0,0],
[0,0,0,0,4,7,4,0,0,0,0,0,0],
],[
[0,0,0,0,0,0,4,7,4,0,0,0,0],
[0,0,0,0,4,18,25,32,25,7,0,0,0],
[0,0,0,4,18,65,68,29,11,0,0,0,0],
[0,4,4,4,18,65,54,18,4,7,14,11,0],
[4,22,36,14,4,14,11,7,7,29,79,47,7],
[7,54,76,36,18,14,11,36,40,32,72,36,4],
[4,11,18,18,61,79,36,54,97,40,14,7,0],
[0,0,0,11,58,101,40,47,108,50,7,0,0],
[0,0,0,4,11,25,7,11,22,11,0,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
],[
[0,0,4,7,4,0,0,0,0,0,0,0,0],
[0,0,11,22,14,4,0,4,0,0,0,0,0],
[0,0,7,18,14,4,4,14,18,4,0,0,0],
[0,4,0,4,4,0,4,32,54,18,0,0,0],
[4,11,7,4,7,7,18,29,22,4,0,0,0],
[7,18,7,22,40,25,50,76,25,4,0,0,0],
[0,4,4,22,61,32,25,54,18,0,0,0,0],
[0,0,0,4,11,7,4,11,4,0,0,0,0],
],[
[0,0,0,0,7,14,11,4,0,0,0,0,0],
[0,0,0,4,18,43,50,32,14,4,0,0,0],
[0,4,11,4,7,29,61,65,43,11,0,0,0],
[4,18,54,25,7,11,32,40,25,7,11,4,0],
[4,36,86,40,11,7,7,7,7,25,58,25,4],
[0,7,18,25,65,40,18,25,22,22,47,18,0],
[0,0,4,32,79,47,43,86,54,11,7,4,0],
[0,0,0,14,32,14,25,61,40,7,0,0,0],
[0,0,0,0,4,4,4,11,7,0,0,0,0],
],[
[0,0,0,0,4,7,11,4,0,0,0,0,0],
[0,4,4,0,4,11,18,11,0,0,0,0,0],
[4,11,11,4,0,4,4,4,0,0,0,0,0],
[4,18,14,7,4,0,0,4,7,7,0,0,0],
[0,7,18,29,14,11,11,7,18,18,4,0,0],
[0,11,43,50,29,43,40,11,4,4,0,0,0],
[0,4,18,25,22,54,40,7,0,0,0,0,0],
[0,0,4,4,4,11,7,0,0,0,0,0,0],
],[
[0,0,0,0,0,7,7,7,7,0,0,0,0],
[0,0,0,0,7,32,32,18,4,0,0,0,0],
[0,0,0,0,11,54,40,14,4,4,22,11,0],
[0,7,14,11,4,14,11,4,4,25,94,50,7],
[4,25,65,43,11,7,4,7,22,25,54,36,7],
[0,7,25,22,29,58,32,25,72,61,14,7,0],
[0,0,4,4,40,115,68,29,83,72,11,0,0],
[0,0,0,0,11,29,18,7,18,14,4,0,0],
[0,0,0,0,0,4,0,0,0,0,0,0,0],
]
];
</script>
</head>
<body>
<script type="text/javascript+protovis">
for (var a = 0; a < heatmap.length; a++) {
  var w = heatmap[a][0].length,
      h = heatmap[a].length;
  var vis = new pv.Panel()
      .width(w * 6)
      .height(h * 6)
      .strokeStyle("#aaa")
      .lineWidth(4)
      .antialias(true);
  vis.add(pv.Image)
      .imageWidth(w)
      .imageHeight(h)
      .image(pv.Scale.linear()
          .domain(0, 99, 100)
          .range("#000", "#fff", '#ff0a0a')
          .by(function(i, j) heatmap[a][j][i]));
  vis.render();
}
</script>
</body>
</html>
Physicist's solution:
Define 5 paw markers identified by their positions X_i and initialize them with random positions.
Define an energy function combining a reward for locating markers in the paws' positions with a punishment for overlapping markers; let's say:
E(X_i; S) = -Sum_i S(X_i) + alpha * Sum_{i<j} (|X_i - X_j| <= 2*sqrt(2) ? 1 : 0)
(S(X_i) is the mean force in the 2x2 square around X_i, and alpha is a parameter to be tuned experimentally.)
Now it's time to do some Metropolis-Hastings magic:
1. Select a random marker and move it by one pixel in a random direction.
2. Calculate dE, the difference in energy this move caused.
3. Get a uniform random number from 0-1 and call it r.
4. If dE < 0 or exp(-beta*dE) > r, accept the move and go to 1; if not, undo the move and go to 1.
This should be repeated until the markers converge on the paws. Beta controls the scanning-versus-optimizing tradeoff, so it should also be tuned experimentally; it can also be steadily increased over the course of the simulation (simulated annealing).
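A rough sketch of that Metropolis-Hastings loop. S is assumed to be a callback giving the mean force in the 2x2 square at a marker (and to handle or clip out-of-grid positions); the energy is recomputed in full each step for clarity:

import numpy as np

def metropolis(S, markers, alpha=50.0, beta=1.0, steps=20000, rng=np.random):
    markers = np.asarray(markers, dtype=int)  # (5, 2) marker positions

    def energy(m):
        e = -sum(S(y, x) for y, x in m)
        # punish any pair of markers whose 2x2 squares can overlap
        for i in range(len(m)):
            for j in range(i + 1, len(m)):
                if np.hypot(*(m[i] - m[j])) <= 2 * np.sqrt(2):
                    e += alpha
        return e

    E = energy(markers)
    for _ in range(steps):
        k = rng.randint(len(markers))
        step = rng.randint(-1, 2, size=2)  # one pixel in a random direction
        markers[k] += step
        dE = energy(markers) - E
        if dE < 0 or np.exp(-beta * dE) > rng.rand():
            E += dE             # accept the move
        else:
            markers[k] -= step  # undo it
    return markers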
Just wanted to tell you guys that there is a nice option for finding local maxima in images with Python:
from skimage.feature import peak_local_max
or for skimage 0.8.0:
from skimage.feature.peak import peak_local_max
http://scikit-image.org/docs/0.8.0/api/skimage.feature.peak.html
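For example, a one-line usage sketch (min_distance and num_peaks are parameters of peak_local_max; the values here are just plausible picks for the toe problem):

coordinates = peak_local_max(paw, min_distance=2, num_peaks=5)  # (row, col) peaks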
It's probably worth trying neural networks if you are able to create some training data... but that needs many samples annotated by hand.
Here's another approach that I used when doing something similar for a large telescope:
1) Search for the highest pixel. Once you have that, search around it for the best fit for a 2x2 (maybe maximizing the 2x2 sum), or do a 2D Gaussian fit inside a sub-region of, say, 4x4 centered on the highest pixel.
2) Set those 2x2 pixels you have found to zero (or maybe a 3x3 around the peak center).
3) Go back to 1) and repeat until the highest peak falls below a noise threshold, or you have all the toes you need.
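A minimal sketch of that loop, assuming image is the 2D pressure array; the 2x2 refinement is omitted and suppression uses a 3x3 zero-out:

import numpy as np

def iterative_peaks(image, n=5, noise_threshold=1.0):
    img = image.astype(float).copy()
    found = []
    while len(found) < n:
        y, x = np.unravel_index(np.argmax(img), img.shape)
        if img[y, x] < noise_threshold:
            break  # everything left is noise
        found.append((y, x))
        img[max(0, y - 1):y + 2, max(0, x - 1):x + 2] = 0  # suppress 3x3
    return found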
A rough outline:
You'd probably want to use a connected components algorithm to isolate each paw region. Wiki has a decent description of this (with some code) here: http://en.wikipedia.org/wiki/Connected_Component_Labeling
You'll have to make a decision about whether to use 4- or 8-connectedness. Personally, for most problems I prefer 6-connectedness. Anyway, once you've separated out each "paw print" as a connected region, it should be easy enough to iterate through the region and find the maxima. Once you've found the maxima, you could iteratively enlarge the region until you reach a predetermined threshold in order to identify it as a given "toe".
One subtle problem here is that as soon as you start using computer vision techniques to identify something as a right/left/front/rear paw and you start looking at individual toes, you have to start taking rotations, skews, and translations into account. This is accomplished through the analysis of so-called "moments". There are a few different moments to consider in vision applications:
central moments: translation invariant
normalized moments: scaling and translation invariant
hu moments: translation, scale, and rotation invariant
More information about moments can be found by searching for "image moments" on wiki.
Perhaps you can use something like Gaussian Mixture Models. Here's a Python package for doing GMMs (just did a Google search)
http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
Interesting problem. The solution I would try is the following.
Apply a low-pass filter, such as convolution with a 2D Gaussian mask. This will give you a bunch of (probably, but not necessarily, floating-point) values.
Perform 2D non-maximal suppression using the known approximate radius of each paw pad (or toe).
This should give you the maximal positions without multiple candidates that are close together. Just to clarify, the radius of the mask in step 1 should be similar to the radius used in step 2. This radius could be selectable, or the vet could explicitly measure it beforehand (it will vary with age/breed/etc.).
Some of the solutions suggested (mean shift, neural nets, and so on) probably will work to some degree, but are overly complicated and probably not ideal.
It seems you can cheat a bit using jetxee's algorithm: he finds the first three toes fine, and you should be able to guess where the fourth is based on that.
Well, here's some simple and not terribly efficient code, but for this size of a data set it is fine.
import numpy as np

grid = np.array([[0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                 [0,0,0,0,0,0,0,0,0.4,0.4,0.4,0,0,0],
                 [0,0,0,0,0.4,1.4,1.4,1.8,0.7,0,0,0,0,0],
                 [0,0,0,0,0.4,1.4,4,5.4,2.2,0.4,0,0,0,0],
                 [0,0,0.7,1.1,0.4,1.1,3.2,3.6,1.1,0,0,0,0,0],
                 [0,0.4,2.9,3.6,1.1,0.4,0.7,0.7,0.4,0.4,0,0,0,0],
                 [0,0.4,2.5,3.2,1.8,0.7,0.4,0.4,0.4,1.4,0.7,0,0,0],
                 [0,0,0.7,3.6,5.8,2.9,1.4,2.2,1.4,1.8,1.1,0,0,0],
                 [0,0,1.1,5,6.8,3.2,4,6.1,1.8,0.4,0.4,0,0,0],
                 [0,0,0.4,1.1,1.8,1.8,4.3,3.2,0.7,0,0,0,0,0],
                 [0,0,0,0,0,0.4,0.7,0.4,0,0,0,0,0,0]])

arr = []
for i in xrange(grid.shape[0] - 1):
    for j in xrange(grid.shape[1] - 1):
        tot = grid[i][j] + grid[i+1][j] + grid[i][j+1] + grid[i+1][j+1]
        arr.append([(i,j),tot])

best = []
arr.sort(key = lambda x: x[1])
for i in xrange(5):
    best.append(arr.pop())
    badpos = set([(best[-1][0][0]+x,best[-1][0][1]+y)
                  for x in [-1,0,1] for y in [-1,0,1] if x != 0 or y != 0])
    for j in xrange(len(arr)-1,-1,-1):
        if arr[j][0] in badpos:
            arr.pop(j)

for item in best:
    print grid[item[0][0]:item[0][0]+2,item[0][1]:item[0][1]+2]
I basically just make an array with the position of the upper-left corner and the sum of each 2x2 square, and sort it by the sum. I then take the 2x2 square with the highest sum out of contention, put it in the best array, and remove all other 2x2 squares that used any part of this just-removed 2x2 square.
It seems to work fine except with the last paw (the one with the smallest sum, on the far right in your first picture): it turns out that there are two other eligible 2x2 squares with a larger sum (and they have an equal sum to each other). One of them still selects a square from your 2x2 square, but the other is off to the left. Fortunately, by luck we seem to be choosing more of the one you would want, but this may require some other ideas to be used to get what you actually want all of the time.
I am not sure this answers the question, but it seems like you can just look for the n highest peaks that don't have neighbors.
Here is the gist. Note that it's in Ruby, but the idea should be clear.
require 'pp'

NUM_PEAKS = 5
NEIGHBOR_DISTANCE = 1

data = [[1,2,3,4,5],
        [2,6,4,4,6],
        [3,6,7,4,3],
       ]

def tuples(matrix)
  tuples = []
  matrix.each_with_index { |row, ri|
    row.each_with_index { |value, ci|
      tuples << [value, ri, ci]
    }
  }
  tuples
end

def neighbor?(t1, t2, distance = 1)
  [1,2].each { |axis|
    return false if (t1[axis] - t2[axis]).abs > distance
  }
  true
end

# convert the matrix into a sorted list of tuples (value, row, col), highest peaks first
sorted = tuples(data).sort_by { |tuple| tuple.first }.reverse

# the list of peaks that don't have neighbors
non_neighboring_peaks = []
sorted.each { |candidate|
  # always take the highest peak
  if non_neighboring_peaks.empty?
    non_neighboring_peaks << candidate
    puts "took the first peak: #{candidate}"
  else
    # check that this candidate doesn't have any accepted neighbors
    is_ok = true
    non_neighboring_peaks.each { |accepted|
      if neighbor?(candidate, accepted, NEIGHBOR_DISTANCE)
        is_ok = false
        break
      end
    }
    if is_ok
      non_neighboring_peaks << candidate
      puts "took #{candidate}"
    else
      puts "denied #{candidate}"
    end
  end
}

pp non_neighboring_peaks
Maybe a naive approach is sufficient here: build a list of all 2x2 squares on your plane and order them by their sum (in descending order).
First, select the highest-valued square into your "paw list". Then, iteratively pick 4 of the next-best squares that don't intersect with any of the previously found squares.
What if you proceed step by step: first locate the global maximum, process the surrounding points if needed given their value, then set the found region to zero, and repeat for the next one.
