I have a video stream from an IP camera and I need to process a specific region of interest (ROI) in the image (you can see it in the screenshot). The problem is that the camera is mounted to the side, so the frames come in at an angle. With OpenCV functions I can only extract a rectangular region, which leaves unnecessary parts inside the extracted rectangle. I want to extract a custom, non-rectangular ROI from the image and then process only that part. How can I achieve this?
Below is the code that extracts the rectangular part of the image (including the unnecessary parts):
import cv2

cam_capture = cv2.VideoCapture(0)  # or the IP camera's stream URL

upper_left = (0, 70)
bottom_right = (1280, 250)

while True:
    success_1, img_1 = cam_capture.read()
    success_2, img_2 = cam_capture.read()
    if success_1 and success_2:
        # find difference between two frames
        diff = cv2.absdiff(img_1, img_2)
        # Rectangle marker
        r = cv2.rectangle(img_1, upper_left, bottom_right, (100, 50, 200), 5)
        rect_img = diff[upper_left[1]: bottom_right[1], upper_left[0]: bottom_right[0]]
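One common way to handle a non-rectangular region (a sketch of the usual masking approach, not code from the original post) is to fill a polygon on a black mask and apply it with cv2.bitwise_and; the four corner points here are hypothetical placeholders for your measured ROI corners.

import cv2
import numpy as np

# Hypothetical corners of the slanted ROI, in (x, y) order
roi_corners = np.array([[(50, 70), (1230, 90), (1200, 240), (80, 220)]], dtype=np.int32)

def extract_polygon_roi(frame, corners):
    # Black single-channel mask, white inside the polygon
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, corners, 255)
    # Zero out everything outside the polygon
    masked = cv2.bitwise_and(frame, frame, mask=mask)
    # Optionally crop to the polygon's bounding box
    x, y, w, h = cv2.boundingRect(corners)
    return masked[y:y + h, x:x + w]

If you also want to undo the camera angle rather than just isolate the region, cv2.getPerspectiveTransform plus cv2.warpPerspective can map the four corners to an upright rectangle.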
I am trying to create a small section above a GIF that a user inputs, along with some text they can add, to mimic the popular GIF meme style. On output, the colors of the GIF are either completely black or completely white, with lots of artifacting.
I open the input GIF and create a new image using PIL with the same dimensions, plus a little extra height.
giftemplate = Image.open("input.gif")
# create empty white frame with extra height for text
result_template = Image.new(giftemplate.mode, size=(base_width, new_height), color=(255, 255, 255))
Then I add some text in the extra height and loop over every frame of the GIF, pasting the current frame onto the new image and appending the composited frame to a list.
# paste each frame of gif under extra height
frames = []
for frame in ImageSequence.Iterator(giftemplate):
    result_template.paste(frame, (0, padding_size))
    b = BytesIO()
    result_template.save(b, format="GIF")
    result_frame = Image.open(b)
    frames.append(result_frame)

frames[0].save('meme_out.gif', save_all=True, append_images=frames[1:], loop=0)
GIF used as input: Input.gif
The generated GIF output by PIL: meme_out.gif
Notice the total lack of color and (while not so apparent in this example) the slowed speed; you can still see a small outline of the cat dancing. The output GIF looks totally fine if I don't paste each frame of the GIF onto the created image with text.
EDIT: Reproducible example provided:
from PIL import Image, ImageFont, ImageDraw, ImageSequence
from io import BytesIO
caption = "sample"
giftemplate = Image.open("input.gif")
font = ImageFont.truetype("Futura Condensed Extra Bold Regular.ttf", 10)
# text margin size scales with image height
text_margin = int((giftemplate.height / 100))
# text width and height
tw, th = font.getsize(caption)
# top left text box coordinate with respect to image pixels. Top left of image is 0,0
cx, cy = int(giftemplate.width / 2), text_margin
padding_size = th
base_width, base_height = giftemplate.size
new_height = base_height + padding_size
# create empty white frame with extra height for text
result_template = Image.new(giftemplate.mode, size=(base_width, new_height), color=(255, 255, 255))
# draw text lines in the extra height
tw, th = font.getsize(caption)
draw = ImageDraw.Draw(result_template)
draw.text((cx - tw / 2, cy), caption, (0, 0, 0), font=font)
# paste each frame of gif under extra height
frames = []
for frame in ImageSequence.Iterator(giftemplate):
    result_template.paste(frame, (0, padding_size))
    b = BytesIO()
    result_template.save(b, format="GIF")
    result_frame = Image.open(b)
    frames.append(result_frame)

frames[0].save('meme_out.gif', save_all=True, append_images=frames[1:], loop=0)
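A likely cause, from reading the code above (my diagnosis, not confirmed in the post): GIF frames come out of PIL in palette mode ("P"), and result_template inherits giftemplate.mode, so paste copies raw palette indices against a mismatched palette, which yields the black/white artifacting. A minimal sketch of a workaround, reusing the variables from the example above: build the template in RGB, convert each frame before pasting, and pass the source frame duration through (the missing duration would also explain the slowed playback).

# Build the template in RGB so pasted pixels keep their true colors;
# PIL re-quantizes to a palette once, at save time
result_template = Image.new("RGB", (base_width, new_height), (255, 255, 255))
draw = ImageDraw.Draw(result_template)
draw.text((cx - tw / 2, cy), caption, (0, 0, 0), font=font)

frames = []
for frame in ImageSequence.Iterator(giftemplate):
    result_template.paste(frame.convert("RGB"), (0, padding_size))
    frames.append(result_template.copy())  # copy: the template is reused

frames[0].save('meme_out.gif', save_all=True, append_images=frames[1:],
               loop=0, duration=giftemplate.info.get('duration', 100))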
So I have started a program that takes two images, one being the model image and the other an image with a change, and detects the differences and circles them. I have run into an issue with finding the difference coordinates: my circle keeps ending up in the middle of the image.
This is the code I have:
import cv2 as cv
import numpy as np
from PIL import Image, ImageChops

# Ideal image and the main image
img2 = cv.imread("ideal.jpg")
img1 = cv.imread("Actual.jpg")

# Verify whether there is a difference between the two images
diff = cv.subtract(img2, img1)
results = not np.any(diff)

# Tell the user whether the given image differs from the model image
if results:
    print("The images are the same!")
else:
    print("The images are different")

# Make an image showing the difference to circle
img_1 = Image.open("Actual.jpg")
img_2 = Image.open("ideal.jpg")
diff = ImageChops.difference(img_1, img_2)
diff.save("Differance.jpg")

# Read the image just saved
Differance = cv.imread("Differance.jpg", 0)

# Resize the images to make them smaller
img1s = cv.resize(img1, (0, 0), fx=0.5, fy=0.5)
Differance = cv.resize(Differance, (0, 0), fx=0.5, fy=0.5)

# Find anything not black, i.e. the difference
nz = cv.findNonZero(Differance)

# Find the top, bottom, left and right edges of the difference
a = nz[:, 0, 0].min()
b = nz[:, 0, 0].max()
c = nz[:, 0, 1].min()
d = nz[:, 0, 1].max()

# Average the left/right and top/bottom edges to give the centre
c0 = (a + b) / 2
c1 = (c + d) / 2

# The centre coordinates
c3 = (int(c0), int(c1))

# Values for the call below so it doesn't look messy
radius = 50
color = (0, 0, 255)
thickness = 2

# Place a circle around the centre of the difference
Finished = cv.circle(img1s, c3, radius, color, thickness)

# Save the final image with the circle around it
cv.imwrite("Final.jpg", Finished)
(Images 1 and 2 are attached.)
This code takes both images and blacks out the background, leaving only the difference; the program is then meant to take the location of the difference and place a circle around its center on the main image (the one containing the difference).
Your main problem is the JPG format, which changes pixels to compress the image better, and this creates small differences over the whole area. If you display diff or difference then you should see many gray pixels; I hope you can see the pixels below the ball.
If you use PNG for the original image (without the ball), and later use this image to create the image with the ball, also saving it as PNG, then the code will work correctly.
Here is my version without PIL. Press any key to close the window with the image.
import cv2 as cv
import numpy as np
# load images
img1 = cv.imread("img1.png")
img2 = cv.imread("img2.png")
# calculate difference
diff = cv.subtract(img1, img2) # other order `(img2, img1)` gives worse result
# saves difference
cv.imwrite("difference.png", diff)
# show difference - press any key to close
cv.imshow('diff', diff)
cv.waitKey(0)
cv.destroyWindow('diff')
if not np.any(diff):
    print("The images are the same!")
else:
    print("The images are different")
# resize images to make them smaller
#img1_resized = cv.resize(img1, (0, 0), fx=0.5, fy=0.5)
#diff_resized = cv.resize(diff, (0, 0), fx=0.5, fy=0.5)
img1_resized = img1
diff_resized = diff
# convert to grayscale (without saving and loading again)
diff_resized = cv.cvtColor(diff_resized, cv.COLOR_BGR2GRAY)
# find anything not black in the difference
non_zero = cv.findNonZero(diff_resized)
#print(non_zero)
# find the top, bottom, left and right edges of the difference
x_min = non_zero[:,0,0].min()
x_max = non_zero[:,0,0].max()
y_min = non_zero[:,0,1].min()
y_max = non_zero[:,0,1].max()
print('x:', x_min, x_max)
print('y:', y_min, y_max)
sizes = [x_max-x_min+1, y_max-y_min+1]
print('width :', sizes[0])
print('height:', sizes[1])
# center
center_x = (x_min + x_max) // 2
center_y = (y_min + y_max) // 2
center = (center_x, center_y)
print('center:', center)
# radius
radius = max(sizes) // 2
print('radius:', radius)
color = (0, 0, 255)
thickness = 2
# draw a circle around the center of the difference
finished = cv.circle(img1_resized, center, radius, color, thickness)
# saves final image with circle
#cv.imwrite("final.png", finished)
# show final image - press any key to close
cv.imshow('finished', finished)
cv.waitKey(0)
cv.destroyWindow('finished')
(Images: img1.png, img2.png, difference.png, final.png)
EDIT:
If you work with JPG then you can try to reduce the noise:
diff = cv.subtract(img1, img2)
diff_gray = cv.cvtColor(diff, cv.COLOR_BGR2GRAY)
diff_gray[diff_gray < 50] = 0
For different images you may need a different value instead of 50.
You may also try thresholding:
(_, diff_gray) = cv.threshold(diff_gray, 50, 0, cv.THRESH_TOZERO)
You may also need other functions, like blur(), erode(), or dilate().
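If hand-tuning that cutoff proves fragile, one alternative (my suggestion, not part of the original answer) is to let Otsu's method choose the threshold from the histogram; OpenCV allows combining THRESH_OTSU with THRESH_TOZERO:

# The threshold argument (0 here) is ignored when THRESH_OTSU is set;
# the automatically chosen value is returned as the first result
(t, diff_gray) = cv.threshold(diff_gray, 0, 255, cv.THRESH_TOZERO + cv.THRESH_OTSU)
print('Otsu threshold:', t)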
You do not need PIL: take the Differance image, threshold it, use findContours to find the regions, and if contours are found, draw them:
import cv2

# assuming Differance is the grayscale difference image and
# out_image is a copy of the original image to draw on
_, thresh = cv2.threshold(Differance, 50, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
    out_image = cv2.drawContours(out_image, [cnt], 0, (255, 0, 0), -1)
    (x, y), radius = cv2.minEnclosingCircle(cnt)
    center = (int(x), int(y))
    radius = int(radius)
    out_image = cv2.circle(out_image, center, radius, (0, 255, 0), 2)
I have attached an image at 300 DPI. I am using the code below to extract text, but I am getting no text. Does anyone know the issue?
from PIL import Image
import pytesseract

finalImg = Image.open('withdpi.jpg')
text = pytesseract.image_to_string(finalImg)
(image to extract text from is attached)
Let's observe what your code is doing. We need to see which parts of the text are localized and detected, so to understand the code's behavior we will use the image_to_data function, which shows what part of the image is detected.
from PIL import Image, ImageDraw
import pytesseract

# Open the image and convert it to gray-scale
finalImg = Image.open('hP5Pt.jpg').convert('L')

# Initialize ImageDraw for drawing the detected rectangles on the image
finalImgDraw = ImageDraw.Draw(finalImg)

# OCR detection
d = pytesseract.image_to_data(finalImg, output_type=pytesseract.Output.DICT)

# Number of detected parts
n_boxes = len(d['level'])

# For each detected part
for i in range(n_boxes):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Corner coordinates of the current localized region
    shape = [(x, y), (x + w, y + h)]
    # Draw the region
    finalImgDraw.rectangle(shape, outline="red")

# Display
finalImg.show()

# OCR "psm 6: Assume a single uniform block of text."
txt = pytesseract.image_to_string(finalImg, config="--psm 6")

# Result
print(txt)
Result:

i
I

So the displayed image shows that almost nothing is detected: the code as used is not functional, and the output is not the desired result.
There might be various reasons. Here are some facts about the input image:

It is a binary image.
It has a big rectangle artifact.
The text is a little bit dilated.

We can't know whether the image requires pre-processing without testing. We are sure the big black rectangle is an artifact, and we need to remove it. One solution is selecting only part of the image.
To select the part of the image, we need to use crop and some trial and error to find the ROI.
If we think of the image as two pieces in terms of height, we don't want the half containing the other artifact. At first glance, we want (0 -> height/2); if you play with the values, you can see that the exact text location is between (height/6 -> height/4).
Result will be:
$1,582
Code:
from PIL import Image, ImageDraw
import pytesseract

# Open the image and convert it to gray-scale
finalImg = Image.open('hP5Pt.jpg').convert('L')

# Get the width and height of the image
w, h = finalImg.size

# Crop to the part containing the desired text
finalImg = finalImg.crop((0, int(h/6), w, int(h/4)))

# Initialize ImageDraw for drawing the detected rectangles on the image
finalImgDraw = ImageDraw.Draw(finalImg)

# OCR detection
d = pytesseract.image_to_data(finalImg, output_type=pytesseract.Output.DICT)

# Number of detected parts
n_boxes = len(d['level'])

# For each detected part
for i in range(n_boxes):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    # Corner coordinates of the current localized region
    shape = [(x, y), (x + w, y + h)]
    # Draw the region
    finalImgDraw.rectangle(shape, outline="red")

# Display
finalImg.show()

# OCR "psm 6: Assume a single uniform block of text."
txt = pytesseract.image_to_string(finalImg, config="--psm 6")

# Result
print(txt)
If you can't get the same result as mine, you should check your Tesseract version, using:
print(pytesseract.get_tesseract_version())
For me the result is 4.1.1
The corner coordinates of square_1 are (0, 0, 1920, 1080). I define square_2 as a smaller ROI within square_1 using NumPy slicing: roi = square_1[y1:y2, x1:x2]. I then resize square_1 with square_resize = cv2.resize(square_1, (960, 540), interpolation=cv2.INTER_AREA). However, my ROI is now no longer accurate. I have a tool that tells me the screen coordinates of the mouse position, which is how I find the dimensions of the ROI, but I need a function that translates the ROI coordinates I find in square_1's coordinate system into square_resize's coordinates.
EDIT: Solved using Panda50's answer. grab_screen() is my own custom function for taking screenshots. Here is my code in case it helps anyone; it does not give 100% accurate coordinates, but you can play around a little and narrow it down.
import cv2
import numpy as np

# ROI coordinates measured on the 1920x1080 screen, halved because
# the screen is resized to 960x540 (a scale factor of 2 on each axis)
y1 = int(92 / 2)
y2 = int(491 / 2)
x1 = int(233 / 2)
x2 = int(858 / 2)
# grab screen and convert to RGB
screen = grab_screen(region = (0, 0, 1920, 1080))
screen = cv2.cvtColor(screen, cv2.COLOR_BGR2RGB)
# resize screen
screen = cv2.resize(screen, (960, 540), interpolation = cv2.INTER_AREA)
# define ROI
roi = screen[y1:y2, x1:x2].copy()
cv2.imshow('roi', roi)
cv2.waitKey()
cv2.destroyAllWindows()
In Python, slicing a NumPy array gives a view, not a copy, so by changing square_1 you'll also change roi. You have to use:
roi = square_1[y1:y2, x1:x2].copy()
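To make the coordinate translation explicit rather than hard-coding the factor of 2, a small helper along these lines should work (my sketch, not from either answer):

def scale_coords(box, src_size, dst_size):
    # Map a (x1, y1, x2, y2) box measured on a src_size = (w, h) frame
    # to the equivalent box on a dst_size = (w, h) frame
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return (int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy))

# e.g. the ROI above, measured at 1920x1080, mapped onto the 960x540 resize
x1, y1, x2, y2 = scale_coords((233, 92, 858, 491), (1920, 1080), (960, 540))

Rounding to int loses up to a pixel per edge, which matches the "not 100% accurate" behavior noted in the edit.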
I really don't know if "UVs" is the right word, as I'm from the world of Unity and am trying to write some of this in Python. What I'm trying to do: take a picture of a human (from a webcam), take the placement of their facial landmarks/key features, and alter a second image (of a different person) so that their key features are in the same places, morphing/warping the parts of the skin within the face to fit the positions of the first (webcam) image's landmarks. After that, I need to put the face back onto the non-webcam input. (I'm sorry for how much that makes me sound like a serial killer, stretching and cutting faces.) I know that probably didn't make much sense, but I want it to look like this.
I have the face landmarking and cutting done with dlib and OpenCV, but I need a way to take these "cut" face chunks and stretch them "dynamically". By dynamically I mean that you don't just apply a mask by linearly resizing it on one or two axes: you can select a point of the mask and move just that point. I want to do that, except my mask is my cut chunk, and the point is a section of that chunk that needs to move for the chunk to match the positions of the generated landmarks. I know this is a hard topic to think about, and if you need any clarification, just ask. My code:
import cv2
import numpy as np
import dlib

cap = cv2.VideoCapture(0)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

while True:
    _, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    for face in faces:
        x1 = face.left()
        y1 = face.top()
        x2 = face.right()
        y2 = face.bottom()
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)
        landmarks = predictor(gray, face)
        for n in range(0, 68):
            x = landmarks.part(n).x
            y = landmarks.part(n).y
            cv2.circle(frame, (x, y), 4, (255, 0, 0), -1)
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1)
    if key == 27:
        break
EDIT: No, I'm not a serial killer.
If you need to deform the source image like a rubber sheet using two sets of keypoints, you need a thin plate spline (TPS) or, better, a piecewise affine transformation, like here. The latter is closer to texture rasterization methods (triangle-to-triangle texture transforms).
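As a rough sketch of the piecewise affine route (my example, using scikit-image, not code from the answer): estimate a triangulated transform between the two 68-point landmark sets, then warp the cut face chunk onto the webcam geometry. Here src_face is the cut chunk, and src_points/dst_points are the (68, 2) landmark arrays from dlib, in (x, y) order.

import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def warp_face(src_face, src_points, dst_points):
    # warp() treats the transform as the inverse map (output coords ->
    # input coords), so estimate it from destination to source
    tform = PiecewiseAffineTransform()
    tform.estimate(dst_points, src_points)
    # Pixels outside the triangulated landmark hull are left at the fill value
    return warp(src_face, tform, output_shape=src_face.shape[:2])

Note that warp returns a float image in [0, 1]; multiply by 255 and cast to uint8 before handing it back to OpenCV.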