Objective: I want to extract text from an image.
I play a game in which an icon appears randomly, and there is text (rendered as an image) to the right of the icon.
I want the script to take a screenshot of the region of the text only.
So, every time the script locates the icon on screen, it should take a screenshot of the text.
This is my code:
import pyautogui as py
import time
from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

while True:
    indice1 = py.locateOnScreen("icon.png")
    if indice1:
        print("indice see it")
        # Screenshot the whole screen, save it, then OCR the saved file
        myScreenshot = py.screenshot()
        myScreenshot.save(r'C:\Users\rachidel07\Desktop\ok\venv\image.png')
        img = Image.open(r'C:\Users\rachidel07\Desktop\ok\venv\image.png')
        output = pytesseract.image_to_string(img)
        print(output)
    else:
        print("non")
If you just want the text, check for the icon, and when it is found, take a picture of the whole box with coordinates relative to the icon. You can get these easily, since locateOnScreen returns coordinates; you just measure how big the text box is and do the math. Then use PIL to crop only the text, and use tesseract for the OCR.
To crop the text, use crop() from PIL.
from PIL import Image

img = Image.open("image.png")
newimg = img.crop((100, 100, 150, 150))  # (left, upper, right, lower) box
newimg.save("croppedimage.png")
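Putting the pieces together, a minimal sketch; the 200x40 width/height of the text region is a placeholder you would measure once for your game, and tesseract_cmd is assumed to be configured as in the question:
import pyautogui as py
import pytesseract

box = py.locateOnScreen("icon.png")  # Box(left, top, width, height), or None if not found
if box:
    # The text sits just right of the icon; 200x40 is an assumed size to measure
    region = (box.left + box.width, box.top, 200, 40)
    text_img = py.screenshot(region=region)  # grab only the text area
    print(pytesseract.image_to_string(text_img))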
This is my code:
import pyautogui

images = ['colordonkergrijsDCphrasev2.png']  # the image I want it to find

while True:
    for image in images:  # search for each image in images
        # search the given screen region for the image
        pos = pyautogui.locateOnScreen(image, region=(740, 870, 50, 20))
        if pos is not None:  # this checks that the image was found
            pyautogui.click(pos)  # click the position of the image
I want my code to click on a specific region when it sees that image.
But my code doesn't do that; it clicks every time, even when the image isn't there. The image I'm using is very similar to the background, but I added confidence = 1 and it still doesn't work.
Does someone know how to fix it?
I use Python 3.9.4 64-bit.
I already read the docs, but there isn't anything in there that can help me.
If you want the docs, here they are: https://pyautogui.readthedocs.io/en/latest/screenshot.html
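For reference, a minimal sketch of how the confidence parameter is typically used; it requires the opencv-python package, and the 0.9 threshold is an assumption to tune:
import pyautogui

# confidence below 1.0 tolerates small pixel differences between the
# template and the screen; needs opencv-python installed
pos = pyautogui.locateOnScreen('colordonkergrijsDCphrasev2.png',
                               region=(740, 870, 50, 20),
                               confidence=0.9)
if pos is not None:
    pyautogui.click(pyautogui.center(pos))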
Probably an unusual question, but I am currently looking for a way to display image files with PIL more slowly.
Ideally, you would be able to watch the image build up, pixel by pixel, from left to right.
Does anyone have an idea how to implement something like this?
It is a purely visual thing, so it is not essential.
Here is an example:
from PIL import Image
im = Image.open("sample-image.png")
im.show()
Is there a way to "slow down" im.show()?
AFAIK, you cannot do this directly with PIL's Image.show(), because it actually saves your image as a file to /var/tmp/XXX and then passes that file to your OS's standard image viewer to display on the screen; there is no further interaction with the viewer process after that. So, if you draw in another pixel, the viewer will not be aware of it, and if you call Image.show() again, it will save a new copy of your image and invoke another viewer, which will give you a second window rather than updating the first!
There are several possibilities to get around it:
use OpenCV's cv2.imshow() which does allow updates
use tkinter to display the changing image
create an animated GIF and start a new process to display that
I chose the first, using OpenCV, as the path of least resistance:
#!/usr/bin/env python3
import cv2
import numpy as np
from PIL import Image

# Open image
im = Image.open('paddington.png')

# Make BGR Numpy version for OpenCV
BGR = np.array(im)[:, :, ::-1]
h, w = BGR.shape[:2]

# Make empty image to fill in slowly and display
d = np.zeros_like(BGR)

# Use a counter "n" to avoid redrawing and waiting for every single pixel
n = 0
for y in range(h):
    for x in range(w):
        d[y, x] = BGR[y, x]
        if n % 400 == 0:
            cv2.imshow("SlowLoader", d)
            cv2.waitKey(1)
        n += 1

# Wait for one final keypress to exit
cv2.waitKey(0)
Increase the 400 near the end to update the screen after a greater number of pixels (faster), or decrease it to update the screen after a smaller number of pixels (slower), meaning you will see them appear more gradually.
As I cannot share a movie on StackOverflow, I made an animated GIF to show how that looks:
I decided to try to do it with tkinter as well. I am no expert on tkinter, but the following works just the same as the code above. If anyone knows tkinter better, please feel free to point out my inadequacies; I am happy to learn! Thank you.
#!/usr/bin/env python3
import numpy as np
from tkinter import *
from PIL import Image, ImageTk

# Create Tkinter window and Label
root = Tk()
video = Label(root)
video.pack()

# Open image
im = Image.open('paddington.png')

# Make Numpy version for simpler pixel access
RGB = np.array(im)
h, w = RGB.shape[:2]

# Make empty image to fill in slowly and display
d = np.zeros_like(RGB)

# Use a counter "n" to avoid redrawing and waiting for every single pixel
n = 0
for y in range(h):
    for x in range(w):
        d[y, x] = RGB[y, x]
        if n % 400 == 0:
            # Convert the partial image for Tkinter
            img = Image.fromarray(d)
            imgtk = ImageTk.PhotoImage(image=img)
            # Set the image on the label
            video.config(image=imgtk)
            # Update the window
            root.update()
        n += 1
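For completeness, a rough sketch of the third option, building an animated GIF with PIL; revealing one full row per step and saving one frame every 10 rows are assumptions to keep the frame count sane:
from PIL import Image
import numpy as np

im = Image.open('paddington.png').convert('RGB')
RGB = np.array(im)
h, w = RGB.shape[:2]

d = np.zeros_like(RGB)
frames = []
for y in range(h):
    d[y, :] = RGB[y, :]      # reveal one full row per step
    if y % 10 == 0:          # assumed step: one frame every 10 rows
        frames.append(Image.fromarray(d.copy()))

# Save all frames as an animated GIF: 50 ms per frame, loop forever
frames[0].save('buildup.gif', save_all=True, append_images=frames[1:],
               duration=50, loop=0)
The resulting GIF can then be handed to a viewer in a separate process, which matches the "start a new process to display that" idea.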
I have read mountains of posts on pytesseract, but I cannot get it to read text off a dead simple image; it returns an empty string.
Here is the image:
I have tried scaling it, grayscaling it, adjusting the contrast, thresholding, and blurring, everything it says in other posts, but my problem is that I don't know what the OCR wants in order to work better. Does it want blurry text? High contrast?
Code to try:
import pytesseract
from PIL import Image

print(pytesseract.image_to_string(Image.open(IMAGE_FILE)))  # IMAGE_FILE: path to the local image
As you can see in my code, the image is stored locally on my computer, hence Image.open()
Trying something along the lines of
import pytesseract
from PIL import Image
import requests
import io

# Fetch the image from the URL and open it from memory
response = requests.get('https://i.stack.imgur.com/J2ojU.png')
img = Image.open(io.BytesIO(response.content))

# --psm 7 treats the image as a single line of text
text = pytesseract.image_to_string(img, lang='eng', config='--psm 7')
print(text)
with --psm values equal to or larger than 6 did yield "Gm" for me.
If the image is stored locally (and in your working directory), just drop the response variable and change the definition of text to:
image_name = "J2ojU.png"  # or whatever is appropriate
text = pytesseract.image_to_string(Image.open(image_name), lang='eng', config='--psm 7')
There are several reasons:
Edges are not sharp and continuous (by sharp I mean smooth, without jagged steps)
The image is too small; you need to resize it
The font is missing (not mandatory, but a trained font greatly improves the chance of recognition)
Based on points 1) and 2), I was able to recognize the text.
1) I resized the image 3x and 2) I blurred the image to make the edges smooth
import pytesseract
import cv2
import numpy as np
import urllib.request

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'

def url_to_image(url):
    # Download the image and decode it into an OpenCV BGR array
    resp = urllib.request.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)
    return image

url = 'https://i.stack.imgur.com/J2ojU.png'
img = url_to_image(url)

# 1) threshold, 2) resize 3x, 3) blur to smooth the edges
retval, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
img = cv2.resize(img, (0, 0), fx=3, fy=3)
img = cv2.GaussianBlur(img, (11, 11), 0)
img = cv2.medianBlur(img, 9)

cv2.imshow('asd', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

txt = pytesseract.image_to_string(img)
print('recognition:', txt)
>> recognition: Gm
Note:
This script is good for testing any image on the web
Note 2:
All processing is based on your posted image
Note 3:
Text recognition is not easy. Every recognition requires special processing. If you try these steps with a different image, it may not work at all. The important thing is to try a lot of recognition on images so you understand what tesseract wants.
I am using pytesseract, Pillow, and cv2 to OCR an image and get the text it contains. Since my input is a scanned PDF document, I first converted it into an image (JPEG) and then tried extracting the text. I am only halfway there: the input is a table, and the titles are not being extracted, since they have a black background. I also tried getStructuringElement but was unable to figure out a way. Here is what I did:
import cv2
import os
import numpy as np
import pytesseract
import PIL.Image
from pdf2image import convert_from_path

# A scanned PDF must first be converted into JPEG pages
filename = path
pages = convert_from_path(filename, 500)
for page in pages:
    page.save("dest", 'JPEG')

imgname = "path"
oriimg = cv2.imread(imgname, cv2.IMREAD_COLOR)
cv2.imshow("original image", oriimg)
cv2.waitKey(0)

#img = cv2.resize(oriimg,None,fx=0.5,fy=0.5,interpolation=cv2.INTER_CUBIC)
img = cv2.resize(oriimg, (700, 1500), interpolation=cv2.INTER_AREA)  # (width, height)
cv2.imshow("lol", img)
cv2.waitKey(0)
cv2.imwrite("changed_dimensionsimgpath", img)

image = cv2.imread(imgname, cv2.IMREAD_COLOR)
grayedimg = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
grayedimg = cv2.threshold(grayedimg, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
cv2.imwrite("H://newim.jpg", grayedimg)

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"
text = pytesseract.image_to_string(PIL.Image.open("path"))
print(text)
My input table looks like the one below. The regions that have a black background are not being identified by the OCR and are not extracted as text.
There are 3 possible approaches from an image-analysis perspective.
Splitting
You can split the processing into two parts. The first part is just your normal flow (load the image, detect text on it). In the second flow you first take the negative of the image (255 - img) and then detect text.
The two results will need to be merged afterwards, as in the sketch below.
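A minimal sketch of the splitting idea, assuming OpenCV and pytesseract are available; the file name table.jpg is a placeholder:
import cv2
import pytesseract

gray = cv2.imread("table.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input file

# Pass 1: the normal image picks up dark text on a light background
normal_text = pytesseract.image_to_string(gray)

# Pass 2: the negative (255 - img) picks up light text on a dark background
negative_text = pytesseract.image_to_string(255 - gray)

# Merge the two passes (here: simple concatenation)
print(normal_text + "\n" + negative_text)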
Difference filter
You can first apply a difference filter/edge detection. This will highlight everything with high contrast, BUT it can alter the shape of the letters if done to the extreme or if some letters are much bigger than others; a small sketch follows.
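A rough sketch of the difference-filter idea, again assuming OpenCV; the Laplacian responds to contrast in either direction, and the kernel size is a guess to tune:
import cv2

gray = cv2.imread("table.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input file

# The Laplacian highlights high-contrast transitions, so both dark-on-light
# and light-on-dark text produce strong responses; ksize=3 is an assumption
edges = cv2.Laplacian(gray, cv2.CV_8U, ksize=3)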
Contour finding + filling
Again an edge detection, but now very thin, followed by a contour detection. This will redraw all letters in one colour, as in the sketch below.
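A sketch of the contour approach, assuming OpenCV; the Canny thresholds are assumptions you would tune:
import cv2
import numpy as np

gray = cv2.imread("table.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input file

# Thin edge detection; the 100/200 thresholds are guesses to tune
edges = cv2.Canny(gray, 100, 200)

# Find the letter outlines and redraw them filled, all in one colour
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
canvas = np.full_like(gray, 255)  # white canvas
cv2.drawContours(canvas, contours, -1, 0, thickness=cv2.FILLED)  # filled black letters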
I'm seeing a tiny difference between an image I created with PIL in Python and the actual one, and because of this difference my Sikuli script fails to click, since Sikuli is an image-based automation tool.
The approach is to create an image and click it, based on the name of the object to be clicked on screen at run-time.
Code to create the image:
import PIL
from PIL import ImageFont
from PIL import Image
from PIL import ImageDraw
from PIL import ImageEnhance

font = ImageFont.truetype("C:\\SikuliWS\\R419\\Fonts\\calibri.ttf", 11)

img = Image.new("RGBA", (0, 0))         # empty 0x0 image, used only to measure the text
draw = ImageDraw.Draw(img)
textsze = draw.textsize(imageString)    # imageString holds the text to render (defined elsewhere)
print textsze

# White background sized to the measured text, then draw the text onto it
img = Image.new("RGBA", (textsze[0], textsze[1] + 3), (255, 255, 255))
draw = ImageDraw.Draw(img)
draw.text((0, 0), imageString, (13, 13, 13), font=font)
img.save("imageString.png", format='PNG', quality=100)
Output:
Actual image created:
Expected image:
Notice the tiny difference between the characters "ID" and in the spacing between characters, because of which Sikuli fails to click.
How do I create an exactly matching image using Python 2.7? Please help.