I have an image (sorry, I cannot link it for copyright reasons) that has a character outlined with a black line. The black outline is the darkest thing in the picture (I planned on using this fact to help find it). What I need to do is obtain four coordinates that draw a virtual box around the character. The box should be as small as possible while still keeping the outlined character inside it. I intend to use the box to pinpoint the central point of the character's figure by taking the center point of the box.
I started by trying to identify parts of the outline. Since it's the darkest line in the image, I used getextrema() to obtain at least one point on the outline, but I can't figure out how to get more points and then combine those points into a box.
Any insight into this problem is greatly appreciated. Cheers!
EDIT *
This is what I have now:
from PIL import Image

im = Image.open("pic.jpg")
im = im.convert("L")              # work in grayscale
lo, hi = im.getextrema()          # darkest and brightest pixel values
im = im.point(lambda p: p == lo)  # keep only the darkest pixels (the outline)
rect = im.getbbox()               # bounding box of the non-zero pixels
x = 0.5 * (rect[0] + rect[2])     # center of the box
y = 0.5 * (rect[1] + rect[3])
It seems to be pretty consistent about landing inside the figure, but it's really not that close to the center. Any idea why?
1. Find an appropriate threshold that separates the outline from the rest of the image, perhaps using the extrema you already have. If the contrast is big enough this shouldn't be too hard; just add some value to the minimum.
2. Threshold the image with the value you found (see this question). You want the dark part to become white in the binary thresholded image, so use a smaller-than threshold (lambda p: p < T).
3. Use thresholdedImage.getbbox() to get the bounding box of the outline.
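Putting those steps together, a minimal sketch with PIL (the threshold offset of 30 is an arbitrary assumption; tune it to your image's contrast):

from PIL import Image

im = Image.open("pic.jpg").convert("L")
lo, hi = im.getextrema()
T = lo + 30                      # assumed offset above the darkest value; adjust to taste

# dark outline -> white (255), everything else -> black (0)
binary = im.point(lambda p: 255 if p < T else 0)

left, upper, right, lower = binary.getbbox()
cx = 0.5 * (left + right)        # center of the character's bounding box
cy = 0.5 * (upper + lower)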
This question is about the Python 3 face_recognition module.
For a frame from a live-streaming video, I have
face_locs = face_recognition.face_locations(frame)
What I want now is, for each face in face_locs, to convert face from a CSS-styled quadruple (top, right, bottom, left) to the area of the frame (as an image) bounded by the rectangle defined by that quadruple.
The relevant part of my code is the following:
for face in face_locs:
    # TODO: convert face to a comparable image first
    res = face_recognition.compare_faces(face_encs_in_DB, face)
In the code above, what I can't do is denoted with TODO.
In my opinion (and I might be wrong), I should write a function fix_face() that takes face, builds the corresponding numpy array np_arr, and returns face_recognition.face_encodings(np_arr)[0].
Please, help me.
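For reference, the conversion the TODO asks for could be sketched like this; it relies on face_locations returning (top, right, bottom, left) tuples, and frame and face_encs_in_DB are the variables from the question. The tight crop may not always re-detect a face, hence the guard:

import face_recognition

face_locs = face_recognition.face_locations(frame)

for (top, right, bottom, left) in face_locs:
    face_img = frame[top:bottom, left:right]          # crop the face region out of the frame
    encs = face_recognition.face_encodings(face_img)  # encode the cropped region
    if encs:
        res = face_recognition.compare_faces(face_encs_in_DB, encs[0])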
I just found a solution to my problem after examining it for a couple of hours. I'm not sure if I should post this as an answer; if I'm wrong in doing so, please warn me and I'll demote it to a comment. I'm posting it so that anybody else who runs into the same issue can find this solution.
Instead of
face_locs = face_recognition.face_locations(frame),
write
curr_face_encs = face_recognition.face_encodings(frame).
Then,
for face in curr_face_encs:
    res = face_recognition.compare_faces(face_encs_in_DB, face)
works.
Looking for Python practice, I decided to draw the Mandelbrot set with a script. Drawing it wasn't too complicated, so I decided to add color, and I discovered the smooth coloring algorithm. Using this question I was able to render something really beautiful and similar to this one.
To achieve that, I set up a gradient color palette in three "steps": from dark blue to light blue, then light blue to yellow, and finally yellow to dark brown. The overall image is perfect.
The problem comes when I try to zoom in. Let's take the example of this area. At this level of zoom, my script doesn't draw dark blue anymore. I think I miscoded something, because wherever you see dark blue in the Wikipedia image, I get dark brown (a color near the end of my palette). When I first thought about this, I told myself that if the pattern repeats the original one, it should use the same colors, because the escape time should be the same.
So, was this coloring configured in the palette, or is there something about escape time I didn't understand?
Here is the code I use for the coloring:
import math

def color_pixel(n, z):
    # n: escape iteration count, z: value of z when it escaped
    smoothcolor = n + 1 - math.log(math.log(abs(z))) / math.log(2)
    f = smoothcolor / iterate_max          # normalize to [0, 1]
    i = int(f * 500)                       # index into the palette
    color = palette[i]
    return color
500 is the highest index in my palette (len(palette) - 1).
z is the value of z when it escaped past the bailout of 10.
I use 100 as the maximum number of iterations, but I get the same results with a higher value.
Thanks!
My colouring method is to use a rotating palette array in three sections: first blue cross-fades to green without using red, then green to red without using blue, and finally red to (almost) blue with no green, where the next iteration level wraps back to pure blue at the bottom of the array by taking the iteration count modulo the array length.
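A minimal sketch of that kind of three-section rotating palette (the section length of 256 is my own assumption, not taken from the answer):

def make_palette(section=256):
    # three sections: blue->green (no red), green->red (no blue), red->(almost) blue (no green)
    palette = []
    for t in range(section):
        palette.append((0, t, section - 1 - t))       # blue fades to green
    for t in range(section):
        palette.append((t, section - 1 - t, 0))       # green fades to red
    for t in range(section):
        palette.append((section - 1 - t, 0, t))       # red fades back towards blue
    return palette

palette = make_palette()
# colour for a pixel: palette[iterations % len(palette)]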
However, when I made a supposedly smooth realtime zoom (by storing the data with a doubling scale, and then in-betweening 16 frames by interpolation for playback), I found that in the neighbourhood of the M-set, where the contours look chaotic, the effect was messy as the colours tend to dance around. There I used a different scheme, bending the colours towards a gray scale.
My final colouring method was to use the rotating palette for pixels having one or more neighbours of the same depth, but tending towards mid-gray depending on how many neighbours were different. Bear in mind though, that the requirements for a moving image are different from a static image, and sharp detail is not necessarily desirable.
At deep zooms the number of iterations needed to extract the detail can be 1000 or more. I solved the problem laterally. I do not brute-force the map calculations. I developed a curve-stitching method that follows the contour of an iteration level, and then fills the region. In the smoothly changing areas that means large areas do not have to be iterated. Similarly for the M-Set itself where the function has not escaped - I avoid iterating there as far as possible by again trying to follow round its edge and then filling. This method can suffer from nipping off some detail, but the speed gain is enormous. In the chaotic region near the edge of the M-Set my method was to just iterate at every pixel.
I'm also looking into this (the coloring scheme) now. Since the image was made using Ultra Fractal 3, I looked into that program, poked around, and finally found the details, which are slightly different from what you and the wiki are doing. It's written in a custom scripting language, but hopefully you can understand what it's doing. Here's the code:
Smooth(OUTSIDE) {
;
; This coloring method provides smooth iteration
; colors for Mandelbrot and other z^2 formula types
; (Phoenix, Julia). Results on other types may be
; unpredictable, but might be interesting.
;
; Thanks to F. Slijkerman for some tweaks.
; Thanks to Linas Vepstas for the math.
;
; Written by Damien M. Jones
;
init:
complex il = 1/log(#power) ; Inverse log (power).
float lp = log(log(#bailout)) ; log(log bailout).
final:
#index = 0.05 * real(#numiter + il*lp - il*log(log(cabs(#z))))
default:
title = "Smooth (Mandelbrot)"
helpfile = "Uf*.chm"
helptopic = "Html/coloring/standard/smooth.html"
$IFDEF VER50
rating = recommended
$ENDIF
param power
caption = "Exponent"
default = (2,0)
hint = "This should be set to match the exponent of the \
formula you are using. For Mandelbrot, this is usually 2."
endparam
param bailout
caption = "Bail-out value"
default = 128.0
min = 1
hint = "This should be set to match the bail-out value in \
the Formula tab. This formula works best with bail-out \
values higher than 100."
endparam
}
My math isn't good enough to know how to compute the log of a complex number, so I'm stuck at the moment in going further with this, but I thought I'd share what I've found on this topic.
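For what it's worth, for the usual Mandelbrot exponent of 2 no complex logarithm is actually needed: cabs(#z) is just the magnitude abs(z), and 1/log(2) is real, so the whole expression stays real. A rough Python sketch under that assumption (n and z are whatever your escape loop produces; the 0.05 scale and the default bailout of 128 are taken straight from the script above):

import math

def uf_smooth_index(n, z, bailout=128.0, power=2.0):
    # rough port of Ultra Fractal's Smooth(OUTSIDE) colouring for a real exponent
    il = 1.0 / math.log(power)           # il = 1/log(#power)
    lp = math.log(math.log(bailout))     # lp = log(log(#bailout))
    return 0.05 * (n + il * lp - il * math.log(math.log(abs(z))))

The result plays the role of #index; scale it into your palette. Only for a genuinely complex exponent would you need cmath.log.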
I want to OCR a two-digit image after erasing the right digit, for better accuracy.
Example) Original, Modified
The image is a PNG file (52*26 px), the background color is (192,192,192,255), and each digit has a different color.
But amazingly, after erasing the right number, tesseract cannot recognize the remaining digit.
Result:
original:60
left:
from PIL import Image
from pytesseract.pytesseract import *

im = Image.open('NA2WK.png')
pix_orig = im.load()

# calculate the far-left x position of each digit colour
color = {}
for i in range(52):
    for j in range(26):
        p = pix_orig[i, j]
        if p != (192, 192, 192, 255):          # skip the background
            if color.get(p) is None:
                color[p] = 9999
            if i < color[p]:
                color[p] = i

# the colour with the smaller x position belongs to the left character
colors = list(color.keys())
xpos = list(color.values())
if xpos[0] < xpos[1]:
    left, right = colors[0], colors[1]
else:
    left, right = colors[1], colors[0]

# left processing: keep only the left digit, everything else becomes white
imleft = Image.open('test.png')
pix = imleft.load()
for i in range(52):
    for j in range(26):
        if pix[i, j] == (192, 192, 192, 255) or pix[i, j] == right:
            pix[i, j] = (255, 255, 255, 255)
        else:
            pix[i, j] = (0, 0, 0, 255)

print('original:' + image_to_string(im))
print('left:' + image_to_string(imleft))
Tesseract performs a connected component analysis internally. It tries to group blocks of text together, and the lack of many characters on the page might be causing the issue. There is a page segmentation mode in which you can ask tesseract to treat the image as a single character. Try this approach; it may give you the required results.
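If you go through pytesseract, the page segmentation mode can be passed in the config string; --psm 10 asks tesseract to treat the image as a single character (older 3.x builds spell the flag -psm). Restricting the character set to digits is an optional extra:

from PIL import Image
import pytesseract

im = Image.open('NA2WK.png')
# treat the whole image as one character and only consider digits
text = pytesseract.image_to_string(im, config='--psm 10 -c tessedit_char_whitelist=0123456789')
print(text)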
In an effort to separate histologic slides into several layers based on color, I modified some widely distributed code (1) available through OpenCV's community. Our staining procedure marks different cell types of tissue cross sections with different colors (B cells are red, macrophages are brown, background nuclei have a bluish color).
I'm interested in selecting only the magenta-colored and brown parts of the image.
Here's my attempt to create a mask for the magenta pigment:
import cv2
import numpy as np
def mask_builder(filename, hl, hh, sl, sh, vl, vh):
    # load image, convert to HSV
    bgr = cv2.imread(filename)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # set lower and upper bounds of the range according to the arguments
    lower_bound = np.array([hl, sl, vl], dtype=np.uint8)
    upper_bound = np.array([hh, sh, vh], dtype=np.uint8)
    return cv2.inRange(hsv, lower_bound, upper_bound)

mask = mask_builder('sample 20 138 1.jpg', 170, 180, 0, 200, 0, 230)
cv2.imwrite('mask.jpg', mask)
So far a trial and error approach has produced poor results:
Can anyone suggest a smarter method to threshold within the HSV colorspace? I've done my best to search for answers in previous posts, but it seems that these color ranges are particularly difficult to define due to the nature of the image.
References:
Separation with Colorspaces: http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html
python opencv color tracking
BGR separation: http://www.pyimagesearch.com/2014/08/04/opencv-python-color-detection/
UPDATE:
I've found a working solution to my problem. I increased the lower bounds of 'S' and 'V' in regular intervals using a simple FOR loop, wrote out the result for each test image, and chose the best. I found that my lower bounds for S and V should be set at 100 and 125. This systematic form of trial and error produced better results:
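A minimal sketch of that kind of sweep, reusing mask_builder and the cv2 import from the snippet above (the step size and the output file naming are my own assumptions):

# sweep the lower S and V bounds in regular steps and write each mask out for visual comparison
for s_lo in range(0, 201, 25):
    for v_lo in range(0, 201, 25):
        mask = mask_builder('sample 20 138 1.jpg', 170, 180, s_lo, 255, v_lo, 255)
        cv2.imwrite('mask_s%d_v%d.jpg' % (s_lo, v_lo), mask)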
I am happy you found your answer.
I will suggest an alternate method that might work. Unfortunately I am not proficient with Python, so you'll need to find out how to code this in Python (it's basic).
If I had the first image you get after the HSV threshold, I would use morphological operations to extract the information I want.
I would probably give "closing" a go, but if it doesn't work I would first dilate, then fill, and then erode by the same amount I first dilated.
After this first step you'll probably need to delete the small "noise" blobs around the area, and you'll get the image.
This is how it would be in Matlab (showing this mainly so you can see the results):
I=imread('http://i.stack.imgur.com/RlH4V.jpg');
I=I>230; % Create Black and white image (this is because in stackoverflow its a jpg)
ker=strel('square',3); % Create a 3x3 square kernel
I1=imdilate(I,ker); % Dilate
I2=imfill(I1,'holes'); % Fill holes
I3=imerode(I2,ker); % Erode
Ilabel=bwlabel(I3,8); % Get a label per independent blob
% Get maximum area blob (you can do this with a for in python easily)
areas = regionprops(Ilabel,'Centroid','Area','PixelIdxList');
[~,index] = max([areas.Area]); % Get the maximum area
Imask=Ilabel==index; % Get the image with only the max area.
% Plot: This is just matlab code, no relevance
figure;
subplot(131)
title('Dilated')
imshow(I1);
subplot(132)
title('Closed')
imshow(I2);
subplot(133)
title('Eroded')
imshow(I3);
figure;
imshow(imread('http://i.stack.imgur.com/ZqrF9.jpg'))
hold on
h=imshow(bwperim(Imask));
set(h,'alphadata',Imask/2)
Note that I started from the "bad" HSV segmentation. If you try a better one the results may improve. Also, play with the kernel size for the erosion and dilation.
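For a Python/OpenCV version of roughly the same pipeline, a sketch might look like this (it assumes the binary mask image from above; the kernel size, the flood-fill hole filling, and the largest-blob selection are my own choices and may need tuning):

import cv2
import numpy as np

mask = cv2.imread('mask.jpg', cv2.IMREAD_GRAYSCALE)
binary = (mask > 230).astype(np.uint8) * 255          # re-binarise (the JPG adds noise)

kernel = np.ones((3, 3), np.uint8)                    # 3x3 square kernel
dilated = cv2.dilate(binary, kernel)

# fill holes: flood-fill the background from a corner, invert, and combine with the dilated mask
flood = dilated.copy()
h, w = flood.shape
ff_mask = np.zeros((h + 2, w + 2), np.uint8)
cv2.floodFill(flood, ff_mask, (0, 0), 255)
filled = dilated | cv2.bitwise_not(flood)

eroded = cv2.erode(filled, kernel)

# keep only the largest connected blob (label 0 is the background)
n, labels, stats, _ = cv2.connectedComponentsWithStats(eroded)
if n > 1:
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    result = (labels == largest).astype(np.uint8) * 255
    cv2.imwrite('largest_blob.jpg', result)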
Through trial-and-error (incrementing down and up the "S" and "V" scales), I found that my desired colors require a relaxed range for "S" and "V" values. I'll refrain from sharing the particular values I use because I don't think anyone would find such information useful.
Note that the original code shared works fine once more representative ranges are used.
I need to remove the lines from the image below of numbers on a piece of ruled paper, without distorting my digits. Without this, my digit detection algorithm fails, as there are artefacts of the paper's ruling lines in the regions of interest.
a cleaner version of the file without any artefacts
Classic task for Fourier-domain transform.
Perform Fourier transform:
import numpy as np
from scipy.misc import imshow, imsave, imread  # removed in newer SciPy; imageio provides equivalent imread/imsave
img = imread("10XIn.jpg")[:,:,:3]
imggray = np.mean(img, -1)
imfft = np.fft.fft2(imggray)
mags = np.abs(np.fft.fftshift(imfft))
angles = np.angle(np.fft.fftshift(imfft))
visual = np.log(mags)
visual2 = (visual - visual.min()) / (visual.max() - visual.min())*255
visual2 will look like following:
Note the diagonal line across the center - it represents your lines.
Now, I've manually created the mask for this line, but ideally you could filter it out programmatically.
Then we read the mask and paint out the line:
mask = imread("fftimg4_mask.jpg")[:,:,:3]
mask = (np.mean(mask,-1) > 20)
visual[mask] = np.mean(visual)
And then reverse the fft:
newmagsshift = np.exp(visual)
newffts = newmagsshift * np.exp(1j*angles)
newfft = np.fft.ifftshift(newffts)
imrev = np.fft.ifft2(newfft)
newim2 = 255 - np.abs(imrev).astype(np.uint8)
imsave("fftimg2.jpg", newim2 )
Here is newim2
Of course, you could do more accurate patching in Fourier space, and you could also apply the result back to the original image to keep the colors, but I think this post illustrates the idea.
Okay, this might be a bit complicated, as the color of the notebook lines is quite close to the color of the digits, judging from your example. I presume that the green boxes are your addition and not part of the data itself.
You don't state which framework you use, so I will provide only some general tips how to approach this problem.
The first step would be some thresholding. You can use either binary thresholding or, better, adaptive thresholding with a correctly sized window. You will have to experiment with this. The result of thresholding will be a binary image, still with lines.
The second step will be to use morphological operations to clean up the image. If you are not sure what morphology is, look at this morphology tutorial.
Around half way through, there are some examples of removing lines from images. The biggest problem is that some numbers also contain horizontal lines. So one option is to use a rather small morphology kernel (maybe 3 rows and 1 column), as the notebook lines are thinner, and to update the recognizer to recognize even distorted numbers. This should be doable, because all the digits will be distorted in the same way.
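A rough OpenCV sketch of those two steps (the filename, block size, constant, and kernel shape are my own assumptions; an opening with a 3-rows-by-1-column kernel removes thin horizontal runs while mostly preserving the digits' strokes):

import cv2

img = cv2.imread('ruled_digits.png', cv2.IMREAD_GRAYSCALE)   # hypothetical filename

# step 1: adaptive threshold; THRESH_BINARY_INV makes the ink white on black
binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, 15, 10)

# step 2: opening with a tall, narrow kernel (1 column wide, 3 rows high);
# thin horizontal ruling lines are removed, most digit strokes survive
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 3))
cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

cv2.imwrite('cleaned.png', cleaned)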
Another way to do it is to exploit the known structure:
1. deskew the image (the skew can be found with a Hough transform in OpenCV)
2. locate peaks in the row sums
3. physically clone pixels from above and below the lines
I just implemented this for another dataset, example attached. This could be tuned further.
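A rough numpy/OpenCV sketch of steps 2 and 3, assuming the image has already been deskewed (the filename, the peak threshold, and the one-pixel-thick-line assumption are mine):

import cv2
import numpy as np

img = cv2.imread('deskewed.png', cv2.IMREAD_GRAYSCALE)    # hypothetical, already deskewed

# locate ruling lines as peaks in the row sums of the inverted image (ink is dark)
row_ink = (255 - img).sum(axis=1).astype(np.float64)
threshold = row_ink.mean() + 2 * row_ink.std()             # assumed peak threshold
line_rows = np.where(row_ink > threshold)[0]

# "clone" pixels from just above and below each line row over the line itself;
# taking the darker of the two neighbours keeps digit strokes that cross the line
for r in line_rows:
    if 0 < r < img.shape[0] - 1:
        img[r] = np.minimum(img[r - 1], img[r + 1])

cv2.imwrite('lines_removed.png', img)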