What would be the approach to trim an image that's been input using a scanner and therefore has a large white/black area?
The entropy solution seems problematic and computationally expensive. Why not edge-detect?
I just wrote this python code to solve this same problem for myself. My background was dirty white-ish, so the criteria that I used was darkness and color. I simplified this criteria by just taking the smallest of the R, G or B value for each pixel, so that black or saturated red both stood out the same. I also used the average of the darkest few pixels (the obviousness parameter below) for each row or column. Then I started at each edge and worked my way in till I crossed a threshold.
Here is my code:
#these values set how sensitive the bounding box detection is
threshold = 200 #the average of the darkest values must be _below_ this to count (0 is darkest, 255 is lightest)
obviousness = 50 #how many of the darkest pixels to include (1 would mean a single dark pixel triggers it)

from PIL import Image
import numpy as np

def find_line(vals):
    #implement edge detection once, use many times
    for i, tmp in enumerate(vals):
        tmp.sort()
        average = float(sum(tmp[:obviousness])) / len(tmp[:obviousness])
        if average <= threshold:
            return i
    return i #i is left over from the failed threshold search; it marks the bound
def getbox(img):
    #get the bounding box of the interesting part of a PIL image object
    #this is done by getting the darkest of the R, G or B value of each pixel
    #and finding where the edge gets dark/colored enough
    #returns a tuple of (left,upper,right,lower)
    width, height = img.size #for making a 2d array
    retval = [0, 0, width, height] #values will be disposed of, but this is a black image's box
    pixels = list(img.getdata())
    vals = [] #store the value of the darkest color
    for pixel in pixels:
        vals.append(min(pixel)) #the darkest of the R,G or B values
    #make 2d array
    vals = np.array([vals[i * width:(i + 1) * width] for i in xrange(height)])
    #start with upper bounds
    forupper = vals.copy()
    retval[1] = find_line(forupper)
    #next, do lower bounds
    forlower = vals.copy()
    forlower = np.flipud(forlower)
    retval[3] = height - find_line(forlower)
    #left edge, same as before but rotate the data so the left edge is the top edge
    forleft = vals.copy()
    forleft = np.swapaxes(forleft, 0, 1)
    retval[0] = find_line(forleft)
    #and the right edge is the bottom edge of the rotated array
    forright = vals.copy()
    forright = np.swapaxes(forright, 0, 1)
    forright = np.flipud(forright)
    retval[2] = width - find_line(forright)
    if retval[0] >= retval[2] or retval[1] >= retval[3]:
        print "error, bounding box is not legit"
        return None
    return tuple(retval)
if __name__ == '__main__':
    image = Image.open('cat.jpg')
    box = getbox(image)
    print "result is: ", box
    result = image.crop(box)
    result.show()
For starters, here is a similar question. Here is a related question. And another related question.
Here is just one idea; there are certainly other approaches. I would select an arbitrary crop edge and then measure the entropy* on either side of the line, then proceed to re-select the crop line (probably using something like a bisection method) until the entropy of the cropped-out portion falls below a defined threshold. Come to think of it, you may need to resort to a brute-force root-finding method, as you will not have a good indication of when you have cropped too little. Then repeat for the remaining 3 edges. (A rough sketch of the scan is given at the end of this answer.)
*I recall discovering that the entropy method in the referenced website was not completely accurate, but I could not find my notes (I'm sure it was in an SO post, however).
Edit:
Other criteria for the "emptiness" of an image portion (other than entropy) might be contrast ratio or contrast ratio on an edge-detect result.
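For illustration only, here is a minimal sketch of that idea, simplified to a linear band-by-band scan from one edge instead of bisection (the band height and the entropy threshold are assumptions, and the same scan would be repeated from the other three edges):

import numpy as np

def band_entropy(band):
    # Shannon entropy of the grey-level histogram of a horizontal band
    hist, _ = np.histogram(band, bins=256, range=(0, 255))
    p = hist[hist > 0] / float(hist.sum())
    return -(p * np.log2(p)).sum()

def find_top_crop(gray, entropy_threshold=1.0, band_height=4):
    # gray: 2-D uint8 array; move down until the next band is no longer "empty"
    for y in range(0, gray.shape[0] - band_height, band_height):
        if band_entropy(gray[y:y + band_height]) > entropy_threshold:
            return y
    return 0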
Problem Statement: after successfully getting the bounding box around the object in YOLO, I wanted to separate the background from the object itself.
My Solution: I have an RGB-D camera that returns a depth map as well as the image (the image is what is given to YOLO, obviously). Using the depth map, I made a simple function to get the depths (rounded) and how many pixels have that same value:
import numpy as np

def GetAllDepthsSortedMeters(depth_image_ocv):
    # keep only the finite depth values
    _depth = depth_image_ocv[np.isfinite(depth_image_ocv)]
    # sort in descending order, keep the first half, and round to 0.1 m
    _depth = -np.sort(-_depth)[:int(len(_depth) / 2)]
    _depth = np.round(_depth, 1)
    # count how many pixels share each rounded depth
    unique, counts = np.unique(_depth, return_counts=True)
    return dict(zip(counts, unique))
Plotting them, I noticed that there are dominant peaks with the rest lying around them; after some filtering I was able to successfully get those peaks each time.
from scipy.signal import find_peaks_cwt

#get the values of depths and their number of occurrences
counts, values = GetKeysAndValues(_depths)
#find the peaks of depths in those values
peaks = find_peaks_cwt(counts, widths=np.ones(counts.shape) * 2) - 1
Using those peaks, I was able to segment the required object from the background by checking which peak each depth value is closest to, and making a mask for each peak (and the pixels around it).
from itertools import product
import numpy as np

def GetAcceptedMasks(h, w, depth_map, depths_of_accepted_peaks, accepted_masks):
    prev = None
    prev_index = None
    for pos in product(range(h), range(w)):
        pixel = depth_map.item(pos)
        # if this pixel rounds to the same depth as the previous one, reuse its mask
        if (prev is not None) and (round(prev, 1) == round(pixel, 1)):
            accepted_masks[prev_index][pos[0], pos[1]] = 255
        else:
            # otherwise find the peak this depth value is closest to
            _temp_array = abs(depths_of_accepted_peaks - pixel)
            _min = np.amin(_temp_array)
            _ind = np.where(_temp_array == _min)[0][0]
            accepted_masks[_ind][pos[0], pos[1]] = 255
            prev_index = _ind
            prev = pixel
    return accepted_masks
After passing the image through YOLOv3 and applying the filtering and depth segmentation, it takes 0.8 s, which is far from optimal.
It's mostly the result of the above function; any help would be amazing. Thank you.
These are the masks I get at the end:
Mask1-Of-Closest-Depth
Mask2-Of-2nd-Closest-Depth
Mask3-Of-3rd-Closest-Depth
Edit:
Example of distance:
[0.60000002 1.29999995 1.89999998]
Example of DepthMap when shown with imshow:
Example of Depth Map
Here's a way to do it.
- Make an array of floats the same height and width as your image, with the final dimension equal to the number of unique depths you want to identify.
- At each pixel location, calculate the distance to each of the three desired depths and store it in the final dimension.
- Use np.argmin(..., axis=2) to select the nearest of the three depths.
I am not at a computer to test, and your image is not your actual image but rather a picture of it with window decorations and title bar and different values, but something like this:
import cv2
import numpy as np

# Load the image as greyscale float - so we can store positive and negative distances
im = cv2.imread('depth.png', cv2.IMREAD_GRAYSCALE).astype(np.float)

# Make list of the desired depths
depths = [255, 181, 125]

# Make array with the distance to each depth
d2each = np.zeros((im.shape[0], im.shape[1], len(depths)), dtype=np.float)
for i in range(len(depths)):
    d2each[..., i] = np.abs(im - depths[i])

# Now let Numpy choose the nearest of the three distances
mask = np.argmin(d2each, axis=2)
Another way is to range-test the distances. Load the image as above:
# Make mask of pixels matching first distance
d0 = np.logical_and(im>100, im<150)
# Make mask of pixels matching second distance
d1 = np.logical_and(im>180, im<210)
# Make mask of pixels matching third distance
d2 = im >= 210
Those masks will be logical (i.e. True/False), but if you want to make them black and white, just multiply by 255 and cast, e.g. mask0 = d0.astype(np.uint8) * 255
Another approach could be to use K-means clustering.
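For what it's worth, a minimal sketch of that K-means idea (assuming scikit-learn is available, the depth map contains only finite values, and you want 3 clusters; the function name is mine):

import numpy as np
from sklearn.cluster import KMeans

def depth_labels_kmeans(depth_map, n_clusters=3):
    # cluster the depth values; each pixel gets the index of its nearest cluster centre
    km = KMeans(n_clusters=n_clusters, n_init=10)
    labels = km.fit_predict(depth_map.reshape(-1, 1))
    return labels.reshape(depth_map.shape), km.cluster_centers_.ravel()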
I have binary images where rectangles are placed randomly and I want to get the positions and sizes of those rectangles.
If possible I want the minimal number of rectangles necessary to exactly recreate the image.
On the left is my original image and on the right the image I get after applying scipy's find_objects()
(as suggested for this question).
import numpy as np
import scipy.ndimage

# image = scipy.ndimage.zoom(image, 9, order=0)
labels, n = scipy.ndimage.measurements.label(image, np.ones((3, 3)))
bboxes = scipy.ndimage.measurements.find_objects(labels)

img_new = np.zeros_like(image)
for bb in bboxes:
    img_new[bb[0], bb[1]] = 1
This works fine if the rectangles are far apart, but if they overlap and build more complex structures this algorithm just gives me the largest bounding box (upsampling the image made no difference). I have the feeling that there should already exist a scipy or opencv method which does this.
I would be glad to know if somebody has an idea on how to tackle this problem or even better knows of an existing solution.
As a result I want a list of rectangles (i.e. lower-left corner : upper-right corner) in the image. The condition is that when I redraw those filled rectangles I want to get exactly the same image as before. If possible, the number of rectangles should be minimal.
Here is the code for generating sample images (and a more complex example original vs scipy)
import numpy as np

def random_rectangle_image(grid_size, n_obstacles, rectangle_limits):
    n_dim = 2
    rect_pos = np.random.randint(low=0, high=grid_size-rectangle_limits[0]+1,
                                 size=(n_obstacles, n_dim))
    rect_size = np.random.randint(low=rectangle_limits[0],
                                  high=rectangle_limits[1]+1,
                                  size=(n_obstacles, n_dim))

    # Crop rectangle size if it goes over the boundaries of the world
    diff = rect_pos + rect_size
    ex = np.where(diff > grid_size, True, False)
    rect_size[ex] -= (diff - grid_size)[ex].astype(int)

    img = np.zeros((grid_size,)*n_dim, dtype=bool)
    for i in range(n_obstacles):
        p_i = np.array(rect_pos[i])
        ps_i = p_i + np.array(rect_size[i])
        img[tuple(map(slice, p_i, ps_i))] = True
    return img

img = random_rectangle_image(grid_size=64, n_obstacles=30,
                             rectangle_limits=[4, 10])
Here is something to get you started: a naïve algorithm that walks your image and creates rectangles as large as possible. As it is now, it only marks the rectangles but does not report back coordinates or counts. This is to visualize the algorithm alone.
It does not need any external libraries except for PIL, to load and access the left side image when saved as a PNG. I'm assuming a border of 15 pixels all around can be ignored.
from PIL import Image

def fill_rect (pixels,xp,yp,w,h):
    # fill the rectangle in red and draw an orange outline
    for y in range(h):
        for x in range(w):
            pixels[xp+x,yp+y] = (255,0,0,255)
    for y in range(h):
        pixels[xp,yp+y] = (255,192,0,255)
        pixels[xp+w-1,yp+y] = (255,192,0,255)
    for x in range(w):
        pixels[xp+x,yp] = (255,192,0,255)
        pixels[xp+x,yp+h-1] = (255,192,0,255)

def find_rect (pixels,x,y,maxx,maxy):
    # assume we're at the top left
    # get max horizontal span
    width = 0
    height = 1
    while x+width < maxx and pixels[x+width,y] == (0,0,0,255):
        width += 1
    # now walk down, extending the height while the full row is still black
    while y+height < maxy:
        for w in range(x,x+width,1):
            if pixels[w,y+height] != (0,0,0,255):
                break
        if pixels[w,y+height] != (0,0,0,255):
            break
        height += 1
    # fill rectangle
    fill_rect (pixels,x,y,width,height)
image = Image.open('A.png')
pixels = image.load()
width, height = image.size
print (width,height)

for y in range(16,height-15,1):
    for x in range(16,width-15,1):
        if pixels[x,y] == (0,0,0,255):
            find_rect (pixels,x,y,width,height)

image.show()
From the output
you can observe that the detection algorithm can be improved: for example, the "obvious" two top-left rectangles are split up into 3. Similarly, the larger structure in the center also contains one rectangle more than absolutely needed.
Possible improvements are either to adjust the find_rect routine to locate a best fit¹, or to store the coordinates and use math (beyond my ken) to find which rectangles may be joined.
¹ A further idea on this. Currently all found rectangles are immediately filled with the "found" color. You could try to detect obviously multiple rectangles, and then, after marking the first, the other rectangle(s) to check may then either be black or red. Off the cuff I'd say you'd need to try different scan orders (top-to-bottom or reverse, left-to-right or reverse) to actually find the minimally needed number of rectangles in any combination.
I was looking at the following example in this question: Eliminating number of connected pixels smaller than some specified number threshold, which is very close to what I need.
However, the analysis there is based only on the number of connected pixels. Let's say that now I want to remove, together with the areas below a certain number of pixels, also the areas whose aspect ratio is far from a "square".
For instance, in the following image (left panel) example output, let's say I have the red line, which is 1900 pixels; this means that using the threshold
# now remove the labels
for label, size in enumerate(label_size):
    if size < 1800:
        Z[Zlabeled == label] = 0
the red line won't be eliminated. But if I now increase the threshold (say, to 2000), it may happen that I also eliminate the two big figures on the right panel, which is my desired output. How do I need to modify the code to also consider the aspect ratio of the connected components?
Thanks in advance:
A possible solution is to use connected component analysis, done in this way:
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage.measurements import label

structure = np.ones((3, 3), dtype=np.int)
labeled, ncomponents = label(Z, structure)

indices = np.indices(Z.shape).T[:, :, [1, 0]]

for i in range(1, ncomponents + 1):
    pixelcount = np.sum(labeled == i)
    xs = indices[labeled == i][:, 0]
    ys = indices[labeled == i][:, 1]
    # bounding-box area of the component
    area = (np.max(xs) - np.min(xs) + 1) * (np.max(ys) - np.min(ys) + 1)
    # remove components that do not fill their bounding box
    if pixelcount / area < 1:
        labeled[labeled == i] = 0

plt.figure(1)
plt.imshow(labeled, cmap='jet')
where, at the end, I check the ratio between the number of pixels in a given connected component and the area of its bounding box (pixelcount/area); in that way one can control the ratio between pixels and total area.
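If you specifically want the aspect-ratio criterion from the question, a small hypothetical helper (not part of the original answer; the max_aspect threshold is an assumption) could be called inside the same loop, reusing the xs and ys already computed:

import numpy as np

def is_roughly_square(xs, ys, max_aspect=5.0):
    # bounding-box width and height of one labelled component
    w = np.max(xs) - np.min(xs) + 1
    h = np.max(ys) - np.min(ys) + 1
    # True when the component is not much longer than it is wide
    return max(w, h) / float(min(w, h)) <= max_aspect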
I'm trying to take a picture (.jpg file) and find the exact centers (x/y coords) of two differently colored circles in this picture. I've done this in python 2.7. My program works well, but it takes a long time and I need to drastically reduce the amount of time it takes to do this. I currently check every pixel and test its color, and I know I could greatly improve efficiency by pre-sampling a subset of pixels (e.g. every tenth pixel in both horizontal and vertical directions to find areas of the picture to hone in on). My question is if there are pre-developed functions or ways of finding the x/y coords of objects that are much more efficient than my code. I've already removed function calls within the loop, but that only reduced the run time by a few percent.
Here is my code:
from PIL import Image
import numpy as np
i = Image.open('colors4.jpg')
iar = np.asarray(i)
(numCols,numRows) = i.size
print numCols
print numRows
yellowPixelCount = 0
redPixelCount = 0
yellowWeightedCountRow = 0
yellowWeightedCountCol = 0
redWeightedCountRow = 0
redWeightedCountCol = 0
for row in range(numRows):
    for col in range(numCols):
        pixel = iar[row][col]
        r = pixel[0]
        g = pixel[1]
        b = pixel[2]
        brightEnough = r > 200 and g > 200
        if r > 2*b and g > 2*b and brightEnough: #yellow pixel
            yellowPixelCount = yellowPixelCount + 1
            yellowWeightedCountRow = yellowWeightedCountRow + row
            yellowWeightedCountCol = yellowWeightedCountCol + col
        if r > 2*g and r > 2*b and r > 100: # red pixel
            redPixelCount = redPixelCount + 1
            redWeightedCountRow = redWeightedCountRow + row
            redWeightedCountCol = redWeightedCountCol + col
print "Yellow circle location"
print yellowWeightedCountRow/yellowPixelCount
print yellowWeightedCountCol/yellowPixelCount
print " "
print "Red circle location"
print redWeightedCountRow/redPixelCount
print redWeightedCountCol/redPixelCount
print " "
Update: As I mentioned below, the picture is somewhat arbitrary, but here is an example of one frame from the video I am using:
First you have to clear up a few things:
- What do you consider fast enough?
- Where is a sample image, so we can see what you are dealing with (resolution, bits per pixel)?
- What platform (especially the CPU), so we can estimate speed?
As you are dealing with circles (each one encoded with a different color), it should be enough to find their bounding boxes. So find the min and max x,y coordinates of the pixels of each color. Then your circle is:
center.x=(xmin+xmax)/2
center.y=(ymin+ymax)/2
radius =((xmax-xmin)+(ymax-ymin))/4
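A minimal numpy sketch of that bounding-box idea (assuming you already have a boolean mask of the pixels of one colour; the function name is mine):

import numpy as np

def circle_from_mask(mask):
    # mask: 2-D boolean array, True where the pixel matches the circle's colour
    ys, xs = np.nonzero(mask)
    cx = (xs.min() + xs.max()) / 2.0
    cy = (ys.min() + ys.max()) / 2.0
    radius = ((xs.max() - xs.min()) + (ys.max() - ys.min())) / 4.0
    return cx, cy, radius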
If coded right, even with your approach it should take just a few ms; on images around 1024x1024 resolution I estimate 10-100 ms on an average machine. You wrote that your approach is too slow, but you did not specify the time itself (in some cases 1 µs is slow, in others 1 min is enough, so we can only guess what you need and what you got). Anyway, if you have a similar resolution and the time is 1-10 s, then you are most likely using some slow pixel access (most likely GDI get/setpixel); use the bitmap Scanline[] property, direct pixel access with bitblt, or your own memory for the images.
Your approach can be sped up by using ray casting to find the approximate location of the circles:
1. Cast horizontal rays.
Their spacing should be smaller than the radius of the smallest circle you search for; cast as many rays as needed so that you hit each circle with at least 2 rays.
2. Cast 2 vertical rays.
You can use the intersection points found in #1, so there is no need to cast many rays, just 2; use the horizontal ray whose intersection points are closer together, but not too close.
3. Compute your circle properties.
From the 4 intersection points compute the center and radius; as it is an axis-aligned rectangle (+/- pixel error) this should be easy: just find the midpoint of either diagonal, and the radius is also obvious as half the diagonal size.
As you did not share any image, we can only guess what you have. In case you do not have circles, or need an idea for a different approach, see:
Algorithms: Ellipse matching
find archery target in image of different perspectives
If you are sure of the colours of the circles, an easier method would be to filter the colors using a mask and then apply Hough circles, as Mathew Pope suggested.
Here is a snippet to get you started quickly.
import cv2
import numpy as np
fn = '200px-Traffic_lights_dark_red-yellow.svg.png'
# OpenCV reads image with BGR format
img = cv2.imread(fn)
# Convert to HSV format
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# lower mask (0-10)
lower_red = np.array([0, 50, 50])
upper_red = np.array([10, 255, 255])
mask = cv2.inRange(img_hsv, lower_red, upper_red)
# Bitwise-AND mask and original image
masked_red = cv2.bitwise_and(img, img, mask=mask)
# Check for circles using HoughCircles on opencv
circles = cv2.HoughCircles(mask, cv2.cv.CV_HOUGH_GRADIENT, 1, 20, param1=30, param2=15, minRadius=0, maxRadius=0)
print 'Radius ' + 'x = ' + str(circles[0][0][0]) + ' y = ' + str(circles[0][0][1])
One example of applying it to an image looks like this. First is the original image, followed by the red colour mask obtained, and the last is after the circle is found using the Hough circle function of OpenCV.
The result found using the above method is Radius x = 97.5 y = 99.5 (the printed x and y are the circle's centre coordinates).
Hope this helps! :)
I'm trying to determine if an image is squared (pixelated).
I've heard of the 2D Fourier transform with numpy or scipy, but it is a bit complicated.
The goal is to determine the amount of squared zones due to bad compression, like this (img a):
I have no idea if this would work, but something you could try is to get the nearest neighbors around a pixel. The pixellated squares will show up as a visible jump in RGB values around a region.
You can find the nearest neighbors for every pixel in an image with something like
def get_neighbors(x, y, img):
    # offsets for the 3x3 neighbourhood around (x, y)
    ops = [-1, 0, +1]
    pixels = []
    for opy in ops:
        for opx in ops:
            try:
                pixels.append(img[x + opx][y + opy])
            except IndexError:
                pass
    return pixels
This will give you the nearest pixels in a region of your source image.
To use it, you'd do something like
import operator
import numpy as np
from scipy import misc

def detect_pixellated(fp):
    img = misc.imread(fp)
    width, height = np.shape(img)[0:2]

    # Pixel change to detect edge
    threshold = 20

    for x in range(width):
        for y in range(height):
            neighbors = get_neighbors(x, y, img)

            # Neighbors come in this order:
            # 6 7 8
            # 3 4 5
            # 0 1 2
            center = neighbors[4]
            del neighbors[4]

            for neighbor in neighbors:
                diffs = map(operator.abs, map(operator.sub, neighbor, center))
                possibleEdge = all(diff > threshold for diff in diffs)
After further thought though, use OpenCV and do edge detection and get contour sizes. That would be significantly easier and more robust.
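That OpenCV route might look roughly like this (the file name and Canny thresholds are placeholders; note that cv2.findContours returns two values in OpenCV 4 and three in OpenCV 3):

import cv2

img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)
# contour sizes give a hint of how large the uniform blocks are
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
areas = [cv2.contourArea(c) for c in contours]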
If you scan through lines of it, it's a bit easier because then you deal with linear graphs instead of 2D image graphs, which is always simpler.
Solution:
Scan a line across the pixels, put the line in an array if that is faster to access for computations, and then run algorithms on the line(s) to determine the blockiness:
1/ Run through every pixel in your line and compare it to the previous pixel by subtracting the values of the two pixels, and keep an array of the previous pixel values. If large jumps in pixel values occur at regular intervals, it's blocky. If there are large jumps in values combined with small jumps in values, it's blocky... you can assume that if there are many equal pixel differences, it's blocky, especially if you repeat the analysis twice, at 2- and 4-pixel neighbour intervals, and on multiple lines. (A rough sketch of this scan is given at the end of this answer.)
You can also make graphs of the pixel differences between pixels 3, 5 or 10 pixels apart, to have additional information on the gradient changes of the sampled lines of the pic. If the ratio of pixel differences between neighbour pixels and 5th-neighbour pixels is similar, it also indicates unsmooth colors.
There can be many algorithms, including a fast Fourier transform on a linear graph, the same as for audio, that you would use on line(s) from the pic; that is simpler than a 2D image algorithm.
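For illustration, a rough sketch of the scanline idea from point 1/ (the jump threshold, the assumed 8-pixel block size, and the 50 % regularity cut-off are all assumptions):

import numpy as np

def jump_positions(gray_row, jump_threshold=30):
    # x positions along one scanline where the brightness jumps sharply
    diffs = np.abs(np.diff(gray_row.astype(int)))
    return np.nonzero(diffs > jump_threshold)[0]

def row_looks_blocky(gray_row, block_size=8):
    # crude regularity test: do most jumps fall at the same offset within
    # the assumed block grid (e.g. 8 px for JPEG)?
    jumps = jump_positions(gray_row)
    if len(jumps) < 2:
        return False
    offsets = np.bincount(jumps % block_size, minlength=block_size)
    return offsets.max() / float(len(jumps)) > 0.5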