Find minimal number of rectangles in the image - python

I have binary images where rectangles are placed randomly and I want to get the positions and sizes of those rectangles.
If possible I want the minimal number of rectangles necessary to exactly recreate the image.
On the left is my original image and on the right the image I get after applying scipy's find_objects()
(as suggested for this question).
import numpy as np
from scipy import ndimage
# image = ndimage.zoom(image, 9, order=0)
labels, n = ndimage.label(image, np.ones((3, 3)))
bboxes = ndimage.find_objects(labels)
img_new = np.zeros_like(image)
for bb in bboxes:
    img_new[bb[0], bb[1]] = 1
This works fine if the rectangles are far apart, but if they overlap and build more complex structures, this algorithm just gives me the largest bounding box (upsampling the image made no difference). I have the feeling that there should already be a scipy or OpenCV method that does this.
I would be glad to know if somebody has an idea of how to tackle this problem or, even better, knows of an existing solution.
As a result I want a list of rectangles (i.e. lower-left corner : upper-right corner) in the image. The condition is that when I redraw those filled rectangles, I get exactly the same image as before. If possible, the number of rectangles should be minimal.
Here is the code for generating sample images (and a more complex example original vs scipy)
import numpy as np
def random_rectangle_image(grid_size, n_obstacles, rectangle_limits):
    n_dim = 2
    rect_pos = np.random.randint(low=0, high=grid_size-rectangle_limits[0]+1,
                                 size=(n_obstacles, n_dim))
    rect_size = np.random.randint(low=rectangle_limits[0],
                                  high=rectangle_limits[1]+1,
                                  size=(n_obstacles, n_dim))
    # Crop rectangle size if it goes over the boundaries of the world
    diff = rect_pos + rect_size
    ex = np.where(diff > grid_size, True, False)
    rect_size[ex] -= (diff - grid_size)[ex].astype(int)
    img = np.zeros((grid_size,)*n_dim, dtype=bool)
    for i in range(n_obstacles):
        p_i = np.array(rect_pos[i])
        ps_i = p_i + np.array(rect_size[i])
        img[tuple(map(slice, p_i, ps_i))] = True
    return img
img = random_rectangle_image(grid_size=64, n_obstacles=30,
                             rectangle_limits=[4, 10])

Here is something to get you started: a naïve algorithm that walks your image and creates rectangles as large as possible. As it is now, it only marks the rectangles but does not report back coordinates or counts. This is to visualize the algorithm alone.
It does not need any external libraries except for PIL, to load and access the left side image when saved as a PNG. I'm assuming a border of 15 pixels all around can be ignored.
from PIL import Image
def fill_rect (pixels,xp,yp,w,h):
    # paint the interior red ...
    for y in range(h):
        for x in range(w):
            pixels[xp+x,yp+y] = (255,0,0,255)
    # ... then draw an orange outline around it
    for y in range(h):
        pixels[xp,yp+y] = (255,192,0,255)
        pixels[xp+w-1,yp+y] = (255,192,0,255)
    for x in range(w):
        pixels[xp+x,yp] = (255,192,0,255)
        pixels[xp+x,yp+h-1] = (255,192,0,255)
def find_rect (pixels,x,y,maxx,maxy):
    # assume we're at the top left
    # get max horizontal span
    width = 0
    height = 1
    while x+width < maxx and pixels[x+width,y] == (0,0,0,255):
        width += 1
    # now walk down while the whole row under the span is still black
    while y+height < maxy:
        if any(pixels[w,y+height] != (0,0,0,255) for w in range(x,x+width)):
            break
        height += 1
    # fill rectangle
    fill_rect (pixels,x,y,width,height)
image = Image.open('A.png')
pixels = image.load()
width, height = image.size
print (width,height)
for y in range(16,height-15,1):
    for x in range(16,width-15,1):
        if pixels[x,y] == (0,0,0,255):
            find_rect (pixels,x,y,width,height)
image.show()
From the output you can observe that the detection algorithm can be improved: for example, the "obvious" two top-left rectangles are split up into 3. Similarly, the larger structure in the center also contains one rectangle more than absolutely needed.
Possible improvements are either to adjust the find_rect routine to locate a best fit¹, or store the coordinates and use math (beyond my ken) to find which rectangles may be joined.
¹ A further idea on this. Currently all found rectangles are immediately filled with the "found" color. You could try to detect obviously multiple rectangles, and then, after marking the first, the other rectangle(s) to check may then either be black or red. Off the cuff I'd say you'd need to try different scan orders (top-to-bottom or reverse, left-to-right or reverse) to actually find the minimally needed number of rectangles in any combination.
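To get from marking to reporting, the same greedy walk can be run on a boolean NumPy array (such as the one produced by random_rectangle_image in the question) while collecting coordinates. This is only a sketch under that assumption: the names greedy_rectangles and redraw are made up for illustration, rectangles are reported as (row, col, height, width), and because the walk is greedy the count is not guaranteed to be minimal.

```python
import numpy as np

def greedy_rectangles(img):
    """Cover all True pixels of a 2-D boolean array with non-overlapping rectangles.

    Returns a list of (row, col, height, width) tuples. Greedy, so the
    number of rectangles is not guaranteed to be minimal.
    """
    todo = np.array(img, dtype=bool)  # working copy; covered pixels are cleared
    n_rows, n_cols = todo.shape
    rects = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not todo[r, c]:
                continue
            # widest horizontal run starting at (r, c)
            w = 1
            while c + w < n_cols and todo[r, c + w]:
                w += 1
            # extend downwards while the whole row under the run is still set
            h = 1
            while r + h < n_rows and todo[r + h, c:c + w].all():
                h += 1
            todo[r:r + h, c:c + w] = False
            rects.append((r, c, h, w))
    return rects

def redraw(rects, shape):
    """Draw the filled rectangles back into an empty boolean image."""
    out = np.zeros(shape, dtype=bool)
    for r, c, h, w in rects:
        out[r:r + h, c:c + w] = True
    return out

# two overlapping rectangles forming a more complex structure
demo = np.zeros((6, 8), dtype=bool)
demo[0:4, 0:3] = True
demo[2:6, 1:7] = True
demo_rects = greedy_rectangles(demo)
```

Redrawing demo_rects with redraw(demo_rects, demo.shape) gives back the input exactly, which is the condition stated in the question; reducing the count further would still need the merging step discussed above.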


How to slice an image with different dimensions in python?

I have a 1024x1024 image and I want to slice it into boxes of different sizes, selected randomly: for example, 2 pieces of 512x512, 8 pieces of 16x16, etc. Box positions are not important, but I want to use every pixel only once. Below is my code; when I run it, a lot of pictures are created and the same regions are used repeatedly. How can I make sure that each pixel is used only once? The picture below shows what I want.
from PIL import Image
import random
infile = 'Da Vinci.jpg'
chopsize = [512,256,128,64,32]
img = Image.open(infile)
width, height = img.size
a= random.choice(chopsize)
for x0 in range(0, width):
    for y0 in range(0, height):
        box = (x0, y0,
               x0+random.choice(chopsize) if x0+random.choice(chopsize) < width else width - 1,
               y0+random.choice(chopsize) if y0+random.choice(chopsize) < height else height - 1)
        print('%s %s' % (infile, box))
        img.crop(box).save('%s.x%01d.y%01d.jpg' % (infile.replace('.jpg',''), x0, y0))
That is what I want:
This is a fun problem! How do you randomly tile an area with your boxes while making sure that none of the boxes overlap?
You have a couple of issues:
As you've written your code so far, you are going to have boxes that spill over the border of your image. I don't know if you care about this; in your example picture the boxes fit perfectly into the space. If you do care, you are going to have to figure that part out
(although this code makes me think you have thought about it and don't care):
    x0+random.choice(chopsize) if x0+random.choice(chopsize) < width else width - 1
The other issue, which is what your question is really about, is that you don't keep a record of which pixels you have already visited. There are a few different ways you could do this.
One might be something like:
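import numpy as np
step = min(chopsize) #the minimum dimensions of a square
filled_pixels = np.zeros((width, height), dtype=bool)
for x in range(0, width, step):
    for y in range(0, height, step):
        if filled_pixels[x,y]:
            continue #this pixel already belongs to a box
        #keep only the sizes that stay inside the image and touch no used pixel
        #(assumes width and height are multiples of step, so step itself always fits)
        fits = [c for c in chopsize
                if x+c <= width and y+c <= height
                and not filled_pixels[x:x+c,y:y+c].any()]
        chop = random.choice(fits)
        filled_pixels[x:x+chop,y:y+chop] = True
        #do your stuff with making the boxes, e.g. img.crop((x, y, x+chop, y+chop))
You basically raster through your image making boxes, making sure that you aren't making a square at any pixel where you already have a square (given by your filled value).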

python split image in overlapping and rotating tiles

I am doing image classification and have very imbalanced data. I am trying a couple of approaches to overcome the imbalance; one of them is oversampling the minority class.
The images I have are already high resolution (1392x1038), so I am splitting each of them into 16 tiles of size 348x256. In oversampling, you simply replicate the minority classes. I was thinking of splitting the image into overlapping tiles with stride 1 or 2, so that I would have slightly different images, which would also help with oversampling. The following code splits the images into a specified number of overlapping tiles of a defined size:
for i in range(0, count):
    start_row_idx = random.randint(0, img_height-target_height-1)
    start_col_idx = random.randint(0, img_width-target_width-1)
    if mode == 'rgb':
        patch = img_array[start_row_idx:(start_row_idx+target_height), start_col_idx:(start_col_idx+target_width), :]
    else:
        patch = img_array[start_row_idx:(start_row_idx+target_height), start_col_idx:(start_col_idx+target_width)]
    idxs.append((start_row_idx, start_col_idx))
How can I make it work for rotated, overlapping tiles with a defined number of tiles and size?
Edited question:
In the following image, the black squares show the horizontal stride and the tiles I am able to get. I want to get the red squares in that shape. I think that with the red type of cropping I would be able to get more images for oversampling.
As we discussed above, your tiles can already overlap, so that part is addressed. What is missing is rotating the tiles too. We will need to specify a maximum angle of rotation so that we can generate a random angle first.
After that, it is simply a matter of applying an affine transform that is purely a rotation to each tile and then appending it to the list. The problem with rotating images in OpenCV is that the rotated image is subject to cropping, so you don't get the entire tile contained in the image once you rotate.
I used the following post as inspiration to address this issue so that when you do rotate, the image is fully contained. Take note that the image will expand in dimensions in order to accommodate for the rotation and to keep the entire image contained in the rotated result.
import math
import cv2
import numpy as np
def rotate_about_center(src, angle):
    h, w = src.shape[:2]
    rangle = np.deg2rad(angle) # angle in radians
    # now calculate new image width and height
    nw = (abs(np.sin(rangle)*h) + abs(np.cos(rangle)*w))
    nh = (abs(np.cos(rangle)*h) + abs(np.sin(rangle)*w))
    # ask OpenCV for the rotation matrix
    rot_mat = cv2.getRotationMatrix2D((nw*0.5, nh*0.5), angle, 1)
    # calculate the move from the old centre to the new centre combined
    # with the rotation
    rot_move = np.dot(rot_mat, np.array([(nw-w)*0.5, (nh-h)*0.5,0]))
    # the move only affects the translation, so update the translation
    # part of the transform
    rot_mat[0,2] += rot_move[0]
    rot_mat[1,2] += rot_move[1]
    return cv2.warpAffine(src, rot_mat, (int(math.ceil(nw)), int(math.ceil(nh))), flags=cv2.INTER_LANCZOS4)
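The expansion of the output size can be checked in isolation, without OpenCV. The sketch below only reproduces the nw/nh arithmetic from rotate_about_center; the helper name rotated_bounds is made up for illustration, and the 348x256 tile size is taken from the question.

```python
import numpy as np

def rotated_bounds(width, height, angle_deg):
    """Width and height of the axis-aligned box enclosing a rotated rectangle."""
    a = np.deg2rad(angle_deg)
    new_w = abs(np.sin(a)) * height + abs(np.cos(a)) * width
    new_h = abs(np.cos(a)) * height + abs(np.sin(a)) * width
    return new_w, new_h

# a tile 348 wide and 256 high, rotated by 20 degrees
tile_w, tile_h = rotated_bounds(348, 256, 20)
```

At 0 degrees the size is unchanged and at 90 degrees width and height swap; any angle in between gives a larger canvas, which is why the rotated patches come out bigger than the target tile size and may need to be resized or cropped again before training.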
Call this function with a random angle, then save the patch when you're done. You'll also need to specify a maximum angle of rotation, of course.
import random
max_angle = 20 # +/- 20 degrees maximum rotation
patches = []
idxs = []
for i in range(0, count):
    start_row_idx = random.randint(0, img_height-target_height-1)
    start_col_idx = random.randint(0, img_width-target_width-1)
    # Generate an angle between +/- max_angle
    angle = (2*max_angle)*random.random() - max_angle
    if mode == 'rgb':
        patch = img_array[start_row_idx:(start_row_idx+target_height), start_col_idx:(start_col_idx+target_width), :]
    else:
        patch = img_array[start_row_idx:(start_row_idx+target_height), start_col_idx:(start_col_idx+target_width)]
    # Randomly rotate the image
    patch_r = rotate_about_center(patch, angle)
    # Save it now
    patches.append(patch_r)
    idxs.append((start_row_idx, start_col_idx))

Connect the nearest points in segment and label segment

I am using OpenCV and skimage for document analysis of datasheets.
I am trying to segment out the shaded regions separately.
I am currently able to segment out the part and the number as different clusters.
Using felzenszwalb() from skimage I segment the parts:
import matplotlib.pyplot as plt
import numpy as np
from skimage.segmentation import felzenszwalb
from skimage.io import imread
img = imread('test.jpg')
segments_fz = felzenszwalb(img, scale=100, sigma=0.2, min_size=50)
print("Felzenszwalb number of segments {}".format(len(np.unique(segments_fz))))
But I am not able to connect them. Any idea of how to connect them methodically and label the corresponding segment with its part and part number would be of great help.
Thanks in advance for your time – if I’ve missed out anything, over- or under-emphasised a specific point let me know in the comments.
Some preliminary code:
%matplotlib inline
%load_ext Cython
import numpy as np
import cv2
from matplotlib import pyplot as plt
import skimage as sk
import skimage.morphology as skm
import itertools
def ShowImage(title,img,ctype):
    plt.figure(figsize=(20, 20))
    if ctype=='bgr':
        b,g,r = cv2.split(img) # get b,g,r
        rgb_img = cv2.merge([r,g,b]) # switch it to rgb
        plt.imshow(rgb_img)
    elif ctype=='hsv':
        rgb = cv2.cvtColor(img,cv2.COLOR_HSV2RGB)
        plt.imshow(rgb)
    elif ctype=='gray':
        plt.imshow(img,cmap='gray')
    elif ctype=='rgb':
        plt.imshow(img)
    else:
        raise Exception("Unknown colour type")
    plt.axis('off')
    plt.title(title)
    plt.show()
For reference, here's your original image:
#Read in image
img = cv2.imread('part.jpg')
Identifying Numbers
To simplify things, we'll want to classify pixels as being either on or off. We can do so with thresholding. Since our image contains two clear classes of pixels (black and white), we can use Otsu's method. We'll invert the colour scheme since the libraries we're using consider black pixels boring and white pixels interesting.
#Convert image to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
#Apply Otsu's method to eliminate pixels of intermediate colour
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
ShowImage('Applying Otsu',thresh,'gray')
#Verify that pixels are either black or white and nothing in between
print(np.unique(thresh))
Our strategy will be to locate numbers and then follow the line(s) near them to parts and then to label those parts. Since, conveniently, all of the Arabic numerals are formed from contiguous pixels, we can start by finding the connected components.
ret, components = cv2.connectedComponents(thresh)
#Each component is a different colour
ShowImage('Connected Components', components, 'rgb')
We can then filter the connected components to find the numbers by filtering for dimension. Note that this is not a super robust method of doing this. A better option would be to use character recognition, but this is left as an exercise to the reader :-)
class Box:
    def __init__(self,x0,x1,y0,y1):
        self.x0, self.x1, self.y0, self.y1 = x0,x1,y0,y1
    def overlaps(self,box2,tol):
        if self.x0 is None or box2.x0 is None:
            return False
        return not (self.x1+tol<=box2.x0 or self.x0-tol>=box2.x1 or self.y1+tol<=box2.y0 or self.y0-tol>=box2.y1)
    def merge(self,box2):
        self.x0 = min(self.x0,box2.x0)
        self.x1 = max(self.x1,box2.x1)
        self.y0 = min(self.y0,box2.y0)
        self.y1 = max(self.y1,box2.y1)
        box2.x0 = None #Used to mark `box2` as being no longer valid. It can be removed later
    def dist(self,x,y):
        #Get center point
        ax = (self.x0+self.x1)/2
        ay = (self.y0+self.y1)/2
        #Get distance to center point
        return np.sqrt((ax-x)**2+(ay-y)**2)
    def good(self):
        return not (self.x0 is None)
def ExtractComponent(original_image, component_matrix, component_number):
    """Extracts a component from a ConnectedComponents matrix"""
    #Create a true-false matrix indicating if a pixel is part of a particular component
    is_component = component_matrix==component_number
    #Find the coordinates of those pixels
    coords = np.argwhere(is_component)
    # Bounding box of non-black pixels.
    y0, x0 = coords.min(axis=0)
    y1, x1 = coords.max(axis=0) + 1 # slices are exclusive at the top
    # Get the contents of the bounding box.
    return x0,x1,y0,y1,original_image[y0:y1, x0:x1]
numbers_img = thresh.copy() #This is used purely to show that we can identify numbers
numbers = []
for component in range(1, components.max()+1): #Label 0 is the background
    tx0,tx1,ty0,ty1,this_component = ExtractComponent(thresh, components, component)
    #ShowImage('Component #{0}'.format(component), this_component, 'gray')
    cheight, cwidth = this_component.shape
    #print(cwidth,cheight) #Enable this to see dimensions
    #Identify numbers based on their width and height
    if (abs(cwidth-14)<3 or abs(cwidth-7)<3) and abs(cheight-24)<3:
        numbers_img[ty0:ty1,tx0:tx1] = 128
        numbers.append(Box(tx0,tx1,ty0,ty1))
ShowImage('Numbers', numbers_img, 'gray')
We now connect the numbers into contiguous blocks by expanding their bounding boxes slightly and looking for overlaps.
#This is kind of a silly way to do this, but it will work fine for small quantities (hundreds)
merged=True #If true, then a merge happened this round
while merged: #Continue until there are no more mergers
    merged=False #Reset merge indicator
    for a,b in itertools.combinations(numbers,2): #Consider all pairs of numbers
        if a.overlaps(b,10): #If this pair overlaps
            a.merge(b) #Merge it
            merged=True #Make a note that we've merged
    numbers = [x for x in numbers if x.good()] #Eliminate those boxes that were gobbled by the mergers
#This is used purely to show that we can identify numbers
numbers_img = thresh.copy()
for n in numbers:
    numbers_img[n.y0:n.y1,n.x0:n.x1] = 128
    thresh[n.y0:n.y1,n.x0:n.x1] = 0 #Drop numbers from thresholded image
ShowImage('Numbers', numbers_img, 'gray')
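The tolerance test is easy to try out on its own. The following is a stand-alone sketch using plain (x0, x1, y0, y1) tuples instead of the Box class; the function names and the digit coordinates are made up for illustration.

```python
def boxes_overlap(a, b, tol):
    """a and b are (x0, x1, y0, y1) tuples.

    True unless the boxes are separated by at least tol pixels along x or y.
    """
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    return not (ax1 + tol <= bx0 or ax0 - tol >= bx1 or
                ay1 + tol <= by0 or ay0 - tol >= by1)

def merge_boxes(boxes, tol):
    """Repeatedly merge overlapping boxes until no pair overlaps."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_overlap(boxes[i], boxes[j], tol):
                    a, b = boxes[i], boxes[j]
                    # replace box i by the union and drop box j
                    boxes[i] = (min(a[0], b[0]), max(a[1], b[1]),
                                min(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

# two digits 6 px apart plus one far-away digit (hypothetical coordinates)
digits = [(10, 24, 50, 74), (30, 44, 50, 74), (200, 214, 50, 74)]
merged_digits = merge_boxes(digits, tol=10)
```

With a tolerance of 10 the first two digits end up in one box while the third stays separate, which is how multi-digit part numbers become a single label.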
Okay, so now we've identified the numbers! We'll use these later to identify parts.
Identifying Arrows
Next, we'll want to figure out what parts the numbers are pointing to. To do so, we want to detect lines. The Hough transform is good for this. To reduce the number of false positives, we skeletonize the data, which transforms it into a representation which is at most one pixel wide.
skel = sk.img_as_ubyte(skm.skeletonize(thresh>0))
ShowImage('Skeleton', skel, 'gray')
Now we perform the Hough transform. We're looking for one that identifies all of the lines going from the numbers to the parts. Getting this right may take some fiddling with the parameters.
lines = cv2.HoughLinesP(
    skel,
    1, #Resolution of r in pixels
    np.pi / 180, #Resolution of theta in radians
    30, #Minimum number of intersections to detect a line
    minLineLength=80, #Min line length
    maxLineGap=10 #Max line gap
)
lines = [x[0] for x in lines]
line_img = thresh.copy()
line_img = cv2.cvtColor(line_img, cv2.COLOR_GRAY2BGR)
for l in lines:
    color = tuple(map(int, np.random.randint(low=0, high=255, size=3)))
    cv2.line(line_img, (l[0], l[1]), (l[2], l[3]), color, 3, cv2.LINE_AA)
ShowImage('Lines', line_img, 'bgr')
We now want to find the line or lines which are closest to each number and retain only these. We're essentially filtering out all of the lines which are not arrows. To do so, we compare the end points of each line to the center point of each number box.
comp_labels = np.zeros(img.shape[0:2], dtype=np.uint8)
for n_idx,n in enumerate(numbers):
    distvals = []
    for i,l in enumerate(lines):
        #Distances from each point of line to midpoint of rectangle
        dists = [n.dist(l[0],l[1]),n.dist(l[2],l[3])]
        #Minimum distance and the end point (0 or 1) of the line associated with that point
        #Tuples of (Line Number, Line Point, Dist to Line Point) are produced
        distvals.append( (i,np.argmin(dists),np.min(dists)) )
    #Sort by distance between the number box and the line
    distvals = sorted(distvals, key=lambda x: x[2])
    #Include nearby lines, not just the closest one. This accounts for forking.
    distvals = [x for x in distvals if x[2]<1.5*distvals[0][2]]
    #Draw a white rectangle where the number box was
    cv2.rectangle(comp_labels, (n.x0,n.y0), (n.x1,n.y1), 1, cv2.FILLED)
    #Draw white lines where the arrows are
    for dv in distvals:
        l = lines[dv[0]]
        lp = (l[0],l[1]) if dv[1]==0 else (l[2],l[3])
        cv2.line(comp_labels, (l[0], l[1]), (l[2], l[3]), 1, 3, cv2.LINE_AA)
        cv2.line(comp_labels, (lp[0], lp[1]), ((n.x0+n.x1)//2, (n.y0+n.y1)//2), 1, 3, cv2.LINE_AA)
ShowImage('Lines', comp_labels, 'gray')
Finding Parts
This part was hard! We now want to segment the parts in the image. If there was some way to disconnect the lines linking subparts together, this would be easy. Unfortunately, the lines connecting the subparts are the same width as many of the lines which constitute the parts.
To work around this, we could use a lot of logic. It would be painful and error-prone.
Alternatively, we could assume you have an expert-in-the-loop. This expert's sole job is to cut the lines connecting the subparts. This should be both easy and fast for them. Labeling everything would be slow and sad for humans, but is fast for computers. Separating things is easy for humans, but hard for computers. So we let both do what they do best.
In this case, you could probably train someone to do this job in a few minutes, so a true "expert" isn't really necessary. Just a mildly competent human.
If you pursue this, you'll need to write the expert in the loop tool. To do so, save the skeleton images, have your expert modify them, and read the skeletonized images back in. Like so.
#Save the image, or display it on a GUI
#cv2.imwrite("/z/skel.png", skel);
#Read the expert-mediated image back in
skelhuman = cv2.imread('/z/skel.png')
#Convert back to the form we need
skelhuman = cv2.cvtColor(skelhuman,cv2.COLOR_BGR2GRAY)
ret, skelhuman = cv2.threshold(skelhuman,0,255,cv2.THRESH_OTSU)
ShowImage('SkelHuman', skelhuman, 'gray')
Now that we have the parts separated, we'll eliminate as much of the arrows as possible. We've already extracted these above, so we can add them back later if we need to.
To eliminate the arrows, we'll find all of the lines that terminate in locations other than by another line. That is, we'll locate pixels which have only one neighbouring pixel. We'll then eliminate the pixel and look at its neighbour. Doing this iteratively eliminates the arrows. Since I don't know another term for it, I'll call this a Fuse Transform. Since this will require manipulating individual pixels, which would be super slow in Python, we'll write the transform in Cython.
%%cython -a --cplus
import cython
from libcpp.queue cimport queue
import numpy as np
cimport numpy as np
cpdef void FuseTransform(unsigned char [:, :] image):
    # set the variable extension types
    cdef int c, x, y, nx, ny, width, height, neighbours, neighbour, n
    cdef queue[int] q
    # grab the image dimensions
    height = image.shape[0]
    width = image.shape[1]
    cdef int dx[8]
    cdef int dy[8]
    #Offsets to neighbouring cells
    dx[:] = [-1,-1,0,1,1,1,0,-1]
    dy[:] = [0,-1,-1,-1,0,1,1,1]
    #Find seed cells: those with only one neighbour
    for y in range(1, height-1):
        for x in range(1, width-1):
            if image[y,x]==0: #Seed cells cannot be blank cells
                continue
            neighbours = 0
            for n in range(0,8): #Looks at all neighbours
                nx = x+dx[n]
                ny = y+dy[n]
                if image[ny,nx]>0: #This neighbour has a value
                    neighbours += 1
            if neighbours==1: #Was there only one neighbour?
                q.push(y*width+x) #If so, this is a seed cell
    #Starting with the seed cells, gobble up the lines
    while not q.empty():
        c = q.front()
        q.pop()
        y = c//width #Convert flat index into 2D x-y index
        x = c%width
        image[y,x] = 0 #Gobble up this part of the fuse
        neighbour = -1 #No neighbours yet
        for n in range(0,8): #Look at all neighbours
            nx = x+dx[n] #Find coordinates of neighbour cells
            ny = y+dy[n]
            #If the neighbour would be off the side of the matrix, ignore it
            if nx<0 or ny<0 or nx==width or ny==height:
                continue
            if image[ny,nx]>0: #Is the neighbouring cell active?
                if neighbour!=-1: #If we've already found an active neighbour
                    neighbour=-1 #Then pretend we found no neighbours
                    break #And stop looking. This is the end of the fuse.
                else: #Otherwise, make a note of the neighbour's index.
                    neighbour = ny*width+nx
        if neighbour!=-1: #If there was only one neighbour
            q.push(neighbour) #Continue burning the fuse
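For small test images, the same idea can be tried in plain Python and NumPy before reaching for Cython. This is only a slow reference sketch, not the code used for the results here: the names fuse_transform and active_neighbours are made up, and unlike the Cython version it also looks for seed pixels on the image border.

```python
from collections import deque
import numpy as np

# offsets (dx, dy) to the 8 neighbouring cells
NEIGHBOURS = [(-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]

def active_neighbours(image, y, x):
    """Coordinates of the non-zero 8-neighbours of (y, x) that lie inside the image."""
    h, w = image.shape
    return [(y + dy, x + dx) for dx, dy in NEIGHBOURS
            if 0 <= y + dy < h and 0 <= x + dx < w and image[y + dy, x + dx] > 0]

def fuse_transform(image):
    """Burn away open-ended lines in place, starting from pixels with one neighbour."""
    h, w = image.shape
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if image[y, x] > 0 and len(active_neighbours(image, y, x)) == 1)
    while queue:
        y, x = queue.popleft()
        image[y, x] = 0                    # gobble up this part of the fuse
        nbrs = active_neighbours(image, y, x)
        if len(nbrs) == 1:                 # exactly one way to go: keep burning
            queue.append(nbrs[0])
    return image

# a closed 4x4 outline with a three-pixel tail sticking out to the right
demo = np.zeros((6, 9), dtype=np.uint8)
demo[1, 1:5] = demo[4, 1:5] = 255
demo[1:5, 1] = demo[1:5, 4] = 255
demo[2, 5:8] = 255
burnt = fuse_transform(demo.copy())
```

In this example the tail burns away and stops where it meets the outline, because the last tail pixel touches more than one outline pixel; the closed outline itself has no pixel with a single neighbour and is left untouched.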
Back in standard Python:
#Apply the Fuse Transform
skh_dilated = skelhuman.copy()
FuseTransform(skh_dilated)
ShowImage('Fuse Transform', skh_dilated, 'gray')
Now that we've eliminated all of the arrows and lines connecting the parts, we dilate the remaining pixels a lot.
kernel = np.ones((3,3),np.uint8)
dilated = cv2.dilate(skh_dilated, kernel, iterations=6)
ShowImage('Dilation', dilated, 'gray')
Putting It All Together
And overlay the labels and arrows we segmented out earlier...
comp_labels_dilated = cv2.dilate(comp_labels, kernel, iterations=5)
labels_combined = np.uint8(np.logical_or(comp_labels_dilated,dilated))
ShowImage('Comp Labels', labels_combined, 'gray')
Finally, we take the merged number boxes, component arrows, and parts and color each of them using pretty colors from Color Brewer. We then overlay this on the original image to obtain the desired highlighting.
ret, labels = cv2.connectedComponents(labels_combined)
colormask = np.zeros(img.shape, dtype=np.uint8)
#Colors from Color Brewer
colors = [(228,26,28),(55,126,184),(77,175,74),(152,78,163),(255,127,0),(255,255,51),(166,86,40),(247,129,191),(153,153,153)]
for l in range(labels.max()):
    if l==0: #Background component
        colormask[labels==0] = (255,255,255)
    else:
        colormask[labels==l] = colors[l]
ShowImage('Comp Labels', colormask, 'bgr')
blended = cv2.addWeighted(img,0.7,colormask,0.3,0)
ShowImage('Blended', blended, 'bgr')
The final image
So, to recap, we identified numbers, arrows, and parts. In some cases, we were able to separate them automatically. In other cases, we used expert in the loop. Where we had to manipulate pixels individually, we used Cython for speed.
Of course, the danger with this sort of thing is that some other image will break the (many) assumptions I've made here. But that's a risk that you take when you try to use a single image to present a problem.

PySide: Separating a spritesheet / Separating an image into contiguous regions of color

I'm working on a program in which I need to separate spritesheets, or in other words, separate an image into contiguous regions of color.
I've never done any image processing before, so I'm wondering how I would go about this. What would I do after I test for pixel color? What's the best way to determine which pixel goes with each sprite?
All the input images have uniform backgrounds, and an alpha channel different from that of the background counts as color. The order of the output images needs to be left-right, up-down. My project is written in PySide, so I'm hoping to use it for this task too, but I could import more libraries if necessary.
Thanks for your replies!
I'm not sure if the PySide tag is appropriate or not, since I'm using PySide, but the question doesn't involve the GUI aspects of it. If a mod feels it doesn't belong, feel free to remove it.
For example, I have a spritesheet that looks like this:
I want to separate it into these:
That sounds like something that should be implemented in anything that deals with sprites, but here we will implement our own sprite splitter.
The first thing we need here is to extract the individual objects. In this situation, it is only a matter of deciding whether a pixel is a background one or not. If we assume the point at origin is a background pixel, then we are done:
from PIL import Image
def sprite_mask(img, bg_point=(0, 0)):
    width, height = img.size
    im = img.load()
    bg = im[bg_point]
    mask_img = Image.new('L', img.size)
    mask = mask_img.load()
    for x in range(width):
        for y in range(height):
            if im[x, y] != bg:
                mask[x, y] = 255
    return mask_img, bg
If you save the mask image created above and open it, here is what you would see on it (I added a rectangle inside your empty window):
With the image above, the next thing we need is to fill its holes if we want to join sprites that are inside others (like the rectangle added, see figure above). This is another simple rule: if a point cannot be reached from the point at [0, 0], then it is a hole and it must be filled. All that is left is then separating each sprite in individual images. This is done by connected component labeling. For each component we get its axis-aligned bounding box in order to define the dimensions of the piece, and then we copy from the original image the points that belong to a given component. To keep it short, the following code uses scipy for these tasks:
import sys
import numpy
from scipy.ndimage import label, binary_fill_holes
def split_sprite(img, mask, bg, join_interior=True, basename='sprite_%d.png'):
    im = img.load()
    m = numpy.array(mask, dtype=numpy.uint8)
    if join_interior:
        m = binary_fill_holes(m)
    lbl, ncc = label(m, numpy.ones((3, 3)))
    for i in range(1, ncc + 1):
        # px holds row indices and py column indices of this component
        px, py = numpy.nonzero(lbl == i)
        xmin, xmax, ymin, ymax = px.min(), px.max(), py.min(), py.max()
        sprite = Image.new(img.mode, (int(ymax - ymin + 1), int(xmax - xmin + 1)), bg)
        sp = sprite.load()
        for x, y in zip(px, py):
            x, y = int(x), int(y)
            sp[y - int(ymin), x - int(xmin)] = im[y, x]
        name = basename % i
        sprite.save(name)
        print("Wrote %s" % name)
sprite = Image.open(sys.argv[1])
mask, bg = sprite_mask(sprite)
split_sprite(sprite, mask, bg)
Now you have all the pieces (sprite_1.png, sprite_2.png, ..., sprite_8.png) exactly as you included in the question.
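The hole-filling rule stated earlier (a point that cannot be reached from [0, 0] is a hole) can also be written out by hand as a flood fill, in case scipy is not available. This is a sketch under the same assumption that the pixel at the origin is background; the name fill_holes is made up, and unlike scipy's binary_fill_holes it treats any background region not connected to the origin as a hole, even one touching the image border.

```python
from collections import deque
import numpy as np

def fill_holes(mask):
    """Return a copy of a 2-D boolean mask with enclosed background set to True.

    Background is flood-filled from (0, 0) using 4-connectivity; every pixel
    the fill cannot reach is either foreground or a hole.
    Assumes mask[0, 0] is background.
    """
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    reached = np.zeros_like(mask)
    reached[0, 0] = True
    queue = deque([(0, 0)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not reached[ny, nx]:
                reached[ny, nx] = True
                queue.append((ny, nx))
    return ~reached

# an empty window frame: the interior is background but enclosed by the frame
frame = np.zeros((7, 7), dtype=bool)
frame[1:6, 1:6] = True
frame[2:5, 2:5] = False
filled = fill_holes(frame)
```

After filling, the frame and its interior form one solid block, so connected component labeling sees a single sprite and anything drawn inside the window stays attached to it.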

Trim scanned images with PIL?

What would be the approach to trim an image that's been input using a scanner and therefore has a large white/black area?
The entropy solution seems problematic and overly computationally intensive. Why not use edge detection?
I just wrote this Python code to solve this same problem for myself. My background was a dirty off-white, so the criteria I used were darkness and color. I simplified this by just taking the smallest of the R, G or B values for each pixel, so that black and saturated red both stood out the same. I also used the average of a fixed number of the darkest pixels for each row or column. Then I started at each edge and worked my way in until I crossed a threshold.
Here is my code:
#these values set how sensitive the bounding box detection is
threshold = 200 #the average of the darkest values must be _below_ this to count (0 is darkest, 255 is lightest)
obviousness = 50 #how many of the darkest pixels to include (1 would mean a single dark pixel triggers it)
import numpy as np
from PIL import Image
def find_line(vals):
    #implement edge detection once, use many times
    for i,tmp in enumerate(vals):
        darkest = np.sort(tmp)[:obviousness] #the darkest values of this row
        average = float(sum(darkest))/len(darkest)
        if average <= threshold:
            return i
    return i #i is left over from failed threshold finding, it is the bounds
def getbox(img):
    #get the bounding box of the interesting part of a PIL image object
    #this is done by getting the darkest of the R, G or B value of each pixel
    #and finding where the edge gets dark/colored enough
    #returns a tuple of (left,upper,right,lower)
    width, height = img.size #for making a 2d array
    retval = [0,0,width,height] #values will be disposed of, but this is a black image's box
    pixels = list(img.getdata())
    vals = [] #store the value of the darkest color
    for pixel in pixels:
        vals.append(min(pixel)) #the darkest of the R,G or B values
    #make 2d array
    vals = np.array([vals[i * width:(i + 1) * width] for i in range(height)])
    #start with upper bounds
    forupper = vals.copy()
    retval[1] = find_line(forupper)
    #next, do lower bounds
    forlower = vals.copy()
    forlower = np.flipud(forlower)
    retval[3] = height - find_line(forlower)
    #left edge, same as before but rotate the data so left edge is top edge
    forleft = vals.copy()
    forleft = np.swapaxes(forleft,0,1)
    retval[0] = find_line(forleft)
    #and right edge is bottom edge of rotated array
    forright = vals.copy()
    forright = np.swapaxes(forright,0,1)
    forright = np.flipud(forright)
    retval[2] = width - find_line(forright)
    if retval[0] >= retval[2] or retval[1] >= retval[3]:
        print("error, bounding box is not legit")
        return None
    return tuple(retval)
if __name__ == '__main__':
    image = Image.open('cat.jpg')
    box = getbox(image)
    print("result is: ",box)
    result = image.crop(box)
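The per-row loop can also be expressed with whole-array NumPy operations, which makes the criterion easy to test on a synthetic page. This is a sketch rather than a drop-in replacement: the names first_dark_line and content_box are made up, the input is assumed to be the 2-D array of per-pixel minimum channel values, and when no line qualifies it returns the number of rows instead of the last index.

```python
import numpy as np

def first_dark_line(vals, threshold=200, obviousness=50):
    """Index of the first row whose `obviousness` darkest values average <= threshold.

    Returns the number of rows if no row qualifies.
    """
    darkest = np.sort(vals, axis=1)[:, :obviousness]
    hits = np.flatnonzero(darkest.mean(axis=1) <= threshold)
    return int(hits[0]) if hits.size else vals.shape[0]

def content_box(vals, **kw):
    """(left, upper, right, lower) of the non-background area of a 2-D array."""
    height, width = vals.shape
    upper = first_dark_line(vals, **kw)
    lower = height - first_dark_line(vals[::-1], **kw)
    left = first_dark_line(vals.T, **kw)
    right = width - first_dark_line(vals.T[::-1], **kw)
    return left, upper, right, lower

# synthetic scan: white page (255) with a dark block in rows 30-69, columns 20-79
page = np.full((100, 120), 255, dtype=np.uint8)
page[30:70, 20:80] = 10
box = content_box(page, threshold=200, obviousness=30)
```

The resulting tuple has the same (left, upper, right, lower) layout as getbox, so it can be passed straight to image.crop.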
For starters, here is a similar question. Here is a related question. And another related question.
Here is just one idea; there are certainly other approaches. I would select an arbitrary crop edge and then measure the entropy* on either side of the line, then proceed to re-select the crop line (probably using something like a bisection method) until the entropy of the cropped-out portion falls below a defined threshold. I think you may need to resort to a brute-force root-finding method, as you will not have a good indication of when you have cropped too little. Then repeat for the remaining 3 edges.
*I recall discovering that the entropy method in the referenced website was not completely accurate, but I could not find my notes (I'm sure it was in a SO post, however.)
Other criteria for the "emptiness" of an image portion (other than entropy) might be contrast ratio or contrast ratio on an edge-detect result.
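As a rough illustration of the entropy measurement, the sketch below computes the Shannon entropy of the grey-level histogram on either side of a candidate crop line. It assumes an 8-bit greyscale array; the function name shannon_entropy and the synthetic page are made up for the example, and the search for the best crop line is left out.

```python
import numpy as np

def shannon_entropy(values):
    """Shannon entropy (in bits) of the value histogram of 8-bit pixel data."""
    counts = np.bincount(np.asarray(values, dtype=np.uint8).ravel(), minlength=256)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

# synthetic scan: blank margin on top, alternating black/white columns below row 40
page = np.full((100, 120), 255, dtype=np.uint8)
page[40:, ::2] = 0
above = shannon_entropy(page[:40])   # cropped-out portion for a crop line at row 40
below = shannon_entropy(page[40:])   # retained portion
```

A perfectly blank strip has zero entropy, while a strip with two equally frequent grey levels has one bit, so the threshold for "empty enough" sits somewhere in between and depends on how noisy the scanner background is.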
