Python: Read text and numbers from FORM screenshot

I am experimenting with Python to read words and numbers from screenshots of a form, something like a scoreboard that can change several times a second. I think the project can be divided into two big parts:
1. Take a screenshot of the form several times a second. I already have a hint to use the win32 API for faster screenshots here (a small capture sketch follows below).
2. Read the words and numbers from the screenshot, using a blank form as reference. For this I got the general idea from the YouTube video below:
https://www.youtube.com/watch?v=cUOcY9ZpKxw
What I understood is to apply Tesseract to very specific points/areas in the form.
With this method for the second part, though, I have a hunch that the execution time is rather slow (based on what I see in the video).
So my question is: is there any fast way to read a scoreboard that changes several times a second?
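For the first part (capturing the form), here is a minimal sketch of a grab loop using the mss package; this is an assumption on my side, since the linked hint uses the win32 API instead, and the region coordinates are placeholders:
import numpy as np
import mss

with mss.mss() as sct:
    # hypothetical coordinates for the form's on-screen area
    region = {"top": 0, "left": 0, "width": 400, "height": 300}
    while True:  # loop as fast as needed; break on your own condition
        frame = np.array(sct.grab(region))  # BGRA ndarray, directly usable with OpenCV
        # ... crop the fields out of `frame` and OCR them here ...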
Edit:
Below is my current best effort on the project. I am only including the second part, which is the current bottleneck.
The image can be found here.
The problem is that even for a single screenshot frame, Tesseract needs around 3 seconds to finish. I tried multiprocessing, but it seems my code is not clean enough, so the result is worse than not using it.
import cv2
import pytesseract
import time
import concurrent.futures

pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"

# the height of each field
h = 19

# the list of each field's area and name
fields = [[(75, 5),         (130, h),      "line 1"],
          [(75, 5 + h),     (130, 2 * h),  "line 2"],
          [(75, 5 + 2 * h), (130, 3 * h),  "line 3"],
          [(75, 5 + 3 * h), (130, 4 * h),  "line 4"],
          [(75, 5 + 4 * h), (130, 5 * h),  "line 5"],
          [(75, 5 + 5 * h), (130, 6 * h),  "line 6"],
          [(75, 5 + 6 * h), (130, 7 * h),  "line 7"],
          [(75, 5 + 7 * h), (130, 8 * h),  "line 8"],
          [(75, 5 + 8 * h), (130, 9 * h),  "line 9"],
          [(75, 5 + 9 * h), (130, 10 * h), "line 10"]]

a = time.time()

# load the filled form
img = cv2.imread("filled.jpg")
myData = []

# crop one field out of the image and OCR it
def read(field):
    imgCrop = img[field[0][1]:field[1][1], field[0][0]:field[1][0]]
    data = pytesseract.image_to_string(imgCrop)
    return data

# use this for serial processing
for field in fields:
    myData.append(read(field))

# use this for multiprocessing instead
# if __name__ == '__main__':
#     with concurrent.futures.ProcessPoolExecutor() as executor:
#         results = executor.map(read, fields)
#         for result in results:
#             myData.append(result)

print(myData)
b = time.time()
print(b - a)
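One thing that might cut the per-call overhead, sketched below on the assumption that the ten fields form one contiguous column from y = 5 to y = 5 + 10*h at x = 75..130: run Tesseract once over the whole column and split the result by line, instead of making ten separate calls:
# hedged sketch: a single Tesseract call over the whole column instead of ten calls
column = img[5:5 + 10 * h, 75:130]
text = pytesseract.image_to_string(column, config="--psm 6")  # psm 6 = assume a uniform block of text
myData = [line for line in text.splitlines() if line.strip()]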
EDIT 2:
It seems Tesseract already uses multiprocessing by default, so adding multiprocessing manually only hinders the processing speed.
Also, it seems that OCR and image recognition, particularly their speed and accuracy, are still areas of active research, so maybe I need to wait a little longer.
Lastly, I will try Google Cloud Vision in the future.
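For reference, a minimal Cloud Vision sketch; it assumes the google-cloud-vision client library (2.x) is installed and GOOGLE_APPLICATION_CREDENTIALS points at a valid service-account key:
# hedged sketch of Google Cloud Vision OCR on one saved frame
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("filled.jpg", "rb") as f:
    image = vision.Image(content=f.read())
response = client.text_detection(image=image)
# the first annotation holds the full detected text block
print(response.text_annotations[0].description if response.text_annotations else "")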


Python/wand code causes "Killed" when converting large PDFs

I have been working on setting up a PDF-to-PNG conversion and cropping script with Python 3.6.3 and the wand library.
I tried Pillow, but it's lacking the conversion part. I am experimenting with extracting the alpha channel because I want to feed the images to an OCR at a later point, so I tried the code provided in this SO answer.
A couple of issues came up: the first is that if the file is large, I get a "Killed" message from the terminal. The second is that it seems rather picky with the files, i.e. files that get converted properly by imagemagick's convert or pdftoppm on the command line raise errors with wand.
I am mostly concerned with the first one, though, and would really appreciate a check from more knowledgeable coders. I suspect it might come from the way the loop is structured:
from wand.image import Image
from wand.color import Color

def convert_pdf(filename, path, resolution=300):
    all_pages = Image(filename=path + filename, resolution=resolution)
    for i, page in enumerate(all_pages.sequence):
        with Image(page) as img:
            img.format = 'png'
            img.background_color = Color('white')
            img.alpha_channel = 'remove'
            image_filename = '{}.png'.format(i)
            img.save(filename=path + image_filename)
I noted that the script outputs all files at the end of the process, rather than one by one, which I am guessing puts an unnecessary burden on memory and ultimately causes a SEGFAULT or something similar.
Thanks for checking out my question, and for any hints.
Yes, your line:
all_pages = Image(filename=path+filename, resolution=resolution)
will start a Ghostscript process to render the entire PDF to a huge temporary PNM file in /tmp. Wand will then load that massive file into memory and hand out pages from it as you loop.
The C API to MagickCore lets you specify which page to load, so you could perhaps render a page at a time, but I don't know how to get the Python wand interface to do that.
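That said, if wand forwards ImageMagick's page-index filename syntax ("file.pdf[i]") to the delegate, something like this untested sketch might render one page at a time:
from wand.image import Image
from wand.color import Color

def convert_page(filename, path, page_index, resolution=300):
    # "file.pdf[3]" asks ImageMagick for page 3 only (assumption: wand passes the index through)
    with Image(filename='{}{}[{}]'.format(path, filename, page_index),
               resolution=resolution) as img:
        img.format = 'png'
        img.background_color = Color('white')
        img.alpha_channel = 'remove'
        img.save(filename='{}{}.png'.format(path, page_index))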
You could try pyvips. It renders PDFs incrementally by making direct calls to libpoppler, so there are no processes being started and stopped and no temporary files.
Example:
#!/usr/bin/python3

import sys
import pyvips

def convert_pdf(filename, resolution=300):
    # n is the number of pages to load, -1 means load all pages
    all_pages = pyvips.Image.new_from_file(filename, dpi=resolution, n=-1,
                                           access="sequential")

    # that'll be RGBA ... flatten out the alpha
    all_pages = all_pages.flatten(background=255)

    # the PDF is loaded as a very tall, thin image, with the pages joined
    # top-to-bottom ... we loop down the image cutting out each page
    n_pages = all_pages.get("n-pages")
    page_width = all_pages.width
    page_height = all_pages.height / n_pages

    for i in range(0, n_pages):
        page = all_pages.crop(0, i * page_height, page_width, page_height)
        print("writing {}.tif ..".format(i))
        page.write_to_file("{}.tif".format(i))

convert_pdf(sys.argv[1])
On this 2015 laptop with this huge PDF, I see:
$ /usr/bin/time -f %M:%e ../pages.py ~/pics/Audi_US\ R8_2017-2.pdf
writing 0.tif ..
writing 1.tif ..
....
writing 20.tif ..
720788:35.95
So 35s to render the entire document at 300dpi, and a peak memory use of 720MB.

Why does moviepy's write_videofile function get slower and slower as it processes each frame? And how to improve/fix it?

My script takes two movie files as an input, and writes a 2x1 array movie output (stereoscopic Side-by-Side Half-Width). The input video clips are of equal resolution (1280x720), frame rate (60), number of frames (23,899), format (mp4)...
When the write_videofile function starts processing, it provides an estimated time of completion that is very reasonable, ~20 min. As it processes each frame, the process gets slower and slower (indicated by the progress bar and estimated completion time). In my case, the input movie clips are about 6 min long. After three minutes of processing, it indicates it will take over 3 hours to complete. After half an hour of processing, it indicates it will take over 24 hours to complete.
I have tried the 'threads' option of the write_videofile function, but it did not help.
Any idea? Thanks for the help.
---- Script ----
from moviepy.editor import VideoFileClip, clips_array

movie_L = 'movie_L.mp4'
movie_R = 'movie_R.mp4'
output_movie = 'new_movie.mp4'

clip_L = VideoFileClip(movie_L)
(width_L, height_L) = clip_L.size
clip_L = clip_L.resize((width_L/2, height_L))

clip_R = VideoFileClip(movie_R)
(width_R, height_R) = clip_R.size
clip_R = clip_R.resize((width_R/2, height_R))

print("*** Make an array of the two movies side by side")
arrayClip = clips_array([[clip_L, clip_R]])

print("*** Write the video file")
arrayClip.write_videofile(output_movie, threads=4, audio=False)
I realize that this is old, but for anyone still having this issue: be sure to add
progress_bar=False to your call, e.g.
arrayClip.write_videofile(output_movie, threads=4, audio=False, progress_bar=False)
Having the progress bar print each update into IDLE takes up a ton of memory, thus slowing down your program until it stops completely.
I have also had problems with slow rendering. I find that it helps a lot to use multithreading and also to set the bitrate.
This is my configuration:
videoclip.write_videofile("fractal.mp4", fps=20, threads=16, logger=None, codec="mpeg4", preset="slow", ffmpeg_params=['-b:v', '10000k'])
This works very well even with preset set to slow. That setting gives better quality for the same number of bits, and if this is not an issue you could set it to medium or fast to gain some more speed.

Python: What's the fastest way to load a video file into memory?

First some background
I am trying to write my own set of tools for video analysis, mainly for detecting render errors like flashing frames and possibly some other stuff in the future.
The (obvious) goal is to write a script that is faster and more accurate than me watching the file in real time.
Using OpenCV, I have something that looks like this:
import cv2

vid = cv2.VideoCapture("Video/OpenCV_Testfile.mov", cv2.CAP_FFMPEG)
width = 1024
height = 576

length = int(vid.get(cv2.CAP_PROP_FRAME_COUNT))

for f in range(length):
    blue_values = []
    vid.set(cv2.CAP_PROP_POS_FRAMES, f)
    is_read, frame = vid.read()
    if is_read:
        for row in range(height):
            for col in range(width):
                blue_values.append(frame[row][col][0])
        print(blue_values)

vid.release()
This just prints out a list of all blue values of every frame.
- Just for simplicity (My actual script compares a few values across each frame and only saves the frame number when all are equal)
Although this works, it is not a very fast operation (nested loops, but most importantly, the read() method has to be called for every frame, which is rather slow).
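As a point of comparison, here is a minimal sketch that drops the per-frame seek and replaces the per-pixel Python loops with a NumPy slice; it is not necessarily the fastest possible route, just the obvious first step:
import cv2

vid = cv2.VideoCapture("Video/OpenCV_Testfile.mov", cv2.CAP_FFMPEG)
frame_index = 0
while True:
    is_read, frame = vid.read()            # sequential read, no seek before every frame
    if not is_read:
        break
    blue_values = frame[:, :, 0].ravel()   # the blue channel of the whole frame as a flat array
    # ... compare values / record frame_index here ...
    frame_index += 1
vid.release()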
I tried to use multiprocessing but basically ended up having the same crashes as described here:
how to get frames from video in parallel using cv2 & multiprocessing in python
I have a 20s long 1024x576@25fps test file which performs as follows:
mov, ProRes: 15s
mp4, h.264: 30s (too slow)
My machine is capable of playing back h.264 in 1920x1080@50fps with mplayer (which uses ffmpeg to decode). So, I should be able to get more out of this. Which leads me to
my Question
How can I decode a video and simply dump all pixel values into a list for further (possibly multithreaded) operations? Speed is really all that matters. Note: I'm not fixated on OpenCV. Whatever works best.
Thanks!

Python activity logging

I have a question regarding logging for somescript.py
The script performs some actions to find matches for words the user is looking for in some pages that have become unreadable due to re-formatting and printing of the pages.
Because of this, OCR techniques don't work for us anymore, so I've come up with a script that compares contours of words to find matches.
The script looks something like:
import sys

import cv2
import numpy as np

method = cv2.TM_SQDIFF_NORMED
template_name = "this.png"
image_name = "3.tif"

needle = cv2.imread(template_name)
haystack = cv2.imread(image_name)

# Convert to gray:
needle_g = cv2.cvtColor(needle, cv2.COLOR_BGR2GRAY)
haystack_g = cv2.cvtColor(haystack, cv2.COLOR_BGR2GRAY)

# Attempt match
d = cv2.matchTemplate(needle_g, haystack_g, method)

# we want the minimum squared difference
mn, _, mnLoc, _ = cv2.minMaxLoc(d)
print(mnLoc)

# Draw the rectangle
MPx, MPy = mnLoc
trows, tcols = needle_g.shape[:2]

# Normed methods give better results, i.e. matchvalue = [1,3,5], others sometimes show errors
cv2.rectangle(haystack, (MPx, MPy), (MPx + tcols, MPy + trows), (0, 0, 255), 2)

cv2.imshow('output', haystack)
cv2.waitKey(0)

sys.exit(0)
Now I want to log the various tasks that the script performs, like:
converting the image to grayscale
attempting a match
drawing the rectangle
I have seen a few scripts on Stack Overflow explaining how to log an entire script or the entire output, but I haven't found anything that just logs a few actions.
Also, I would like to add the date and time the activity was performed.
Furthermore, I have written a function that calculates an MD5 and SHA1 hash of the input files, which in this particular case are 'this.png' and '3.tif'. I have yet to implement this piece of code, but would it be easy to log that as well?
I am a Python noob, so if the answers are obvious to you guys, you know why I couldn't figure it out myself.
I hope you can help me out on this one!
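A minimal sketch of what the standard logging module gives you here; the log file name, format, and messages are only placeholders:
import hashlib
import logging

# one-time setup: every log line gets a timestamp and goes to a file
logging.basicConfig(
    filename="somescript.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)

def file_hashes(path):
    # return (md5, sha1) hex digests of the file at `path`
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()

# calls like these can be dropped in around the existing steps of the script
log.info("hashes for this.png: md5=%s sha1=%s", *file_hashes("this.png"))
log.info("converting images to grayscale")
log.info("attempting template match")
log.info("drawing the rectangle")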

When using the Python Image Library, does open() immediately decompress the image file?

Short question
When using the Python Image Library, does open() immediately decompress the image file?
Details
I would like to measure the decompression time of compressed images (jpeg, png...), as I read that it's supposed to be a good measure of an image's "complexity" (a blank image will be decompressed quickly, and so will a purely random image, since it will not have been compressed at all, so the most "interesting" images are supposed to have the longest decompression time). So I wrote the following python program:
# complexity.py
from PIL import Image
from cStringIO import StringIO
import time
import sys

def mesure_complexity(image_path, iterations = 10000):
    with open(image_path, "rb") as f:
        data = f.read()
    data_io = StringIO(data)
    t1 = time.time()
    for i in xrange(iterations):
        data_io.seek(0)
        Image.open(data_io, "r")
    t2 = time.time()
    return t2 - t1

def main():
    for filepath in sys.argv[1:]:
        print filepath, mesure_complexity(filepath)

if __name__ == '__main__':
    main()
It can be used like this:
#python complexity.py blank.jpg blackandwhitelogo.jpg trees.jpg random.jpg
blank.jpg 1.66653203964
blackandwhitelogo.jpg 1.33399987221
trees.jpg 1.62251782417
random.jpg 0.967066049576
As you can see, I'm not getting the expected results at all, especially for the blank.jpg file: it should be the one with the lowest "complexity" (quickest decompression time). So either the article I read is utterly wrong (I really doubt it, it was a serious scientific article), or PIL is not doing what I think it's doing. Maybe the actual conversion to a bitmap is done lazily, when it's actually needed? But then why would the open delays differ? The smallest jpg file is of course the blank image, and the largest is the random image. This really does not make sense.
Note 1: when running the program multiple times, I get roughly the same results: the results are absurd, but stable. ;-)
Note 2: all images have the same size (width x height).
Edit
I just tried with png images instead of jpeg, and now everything behaves as expected. Cool! I just sorted about 50 images by complexity, and they do look more and more "complex". I checked the article (BTW, it's an article by Jean-Paul Delahaye in 'Pour la Science', April 2013): the author actually mentions that he used only loss-less compression algorithms. So I guess the answer is that open does decompress the image, but my program did not work because I should have used images compressed with loss-less algorithms only (png, but not jpeg).
Glad you got it sorted out. Anyway, the open() method is indeed a lazy operation – as stated in the documentation, to ensure that the image will be loaded, use image.load(), as this will actually force PIL / Pillow to interpret the image data (which is also stated in the linked documentation).
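To make the timing include the actual decode, the loop can call load() on each opened image; here is a sketch in Python 3 / Pillow (the original script is Python 2), with io.BytesIO in place of cStringIO:
from io import BytesIO
import time

from PIL import Image

def measure_decode_time(image_path, iterations=1000):
    with open(image_path, "rb") as f:
        data = f.read()
    start = time.time()
    for _ in range(iterations):
        img = Image.open(BytesIO(data))
        img.load()  # forces the decompression; open() alone only reads the header
    return time.time() - start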
