I use the
import cv2
import numpy as np
import pyautogui
import pytesseract
img = np.array(pyautogui.screenshot())  # screenshot as an RGB array
pytesseract.image_to_string(img, lang='eng')
command to have the Python wrapper for Tesseract extract text from an image for me. It goes through the CLI, which basically means saving the image to a file and then converting it, so it is understandably slow (0.2 seconds per image on a PC, 3 seconds on a Raspberry Pi).
How do I call the native tesseract library (preferably in python) to directly process an OpenCV/PIL image without going through the CLI?
I have looked into the code here:
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/tesseract-ocr/xvTFjYCDRQU/rCEwjZL3BQAJ (as suggested in "Pytesseract is too slow. How can I make it process images faster?"), but I can't get any output, even after the following changes, which were needed to get the code to run from start to finish:
add locale:
import locale
locale.setlocale(locale.LC_ALL, 'C')
change the input values of all tess.set_variable("tessedit_pageseg_mode", str(frame_piece.psm)) calls to bytes:
tess.set_variable(b"tessedit_pageseg_mode", str.encode(str(frame_piece.psm)))
Does anyone have any ideas? I would like something that works on Windows as well as Linux, but I can probably work with anything that works.
P.S. I have tried image>grayscale>threshold>binarization before handing the image to pytesseract and that does give a decent speed boost over using color images, but even then with the IO write involved, it is slow.
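One option that avoids the CLI round trip is a binding that links against the Tesseract library directly, such as tesserocr. The following is only a minimal sketch, assuming tesserocr and Pillow are installed and that each frame is a PIL image (a NumPy/OpenCV array can be converted with Image.fromarray). The names ocr_frames and main are hypothetical helpers, not part of any library. The idea is that the engine is initialised once and every image is handed over in memory:

```python
def ocr_frames(api, frames):
    """OCR each in-memory PIL image, reusing one initialised engine."""
    texts = []
    for frame in frames:
        api.SetImage(frame)              # hand over the pixels, no temp file
        texts.append(api.GetUTF8Text())  # run recognition and fetch the text
    return texts


def main(paths):
    # Third-party imports are kept local so the helper above can be used
    # without them. Both packages are assumptions, not part of the question.
    from PIL import Image
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI(lang="eng") as api:  # the engine is loaded once here
        images = [Image.open(p) for p in paths]
        for path, text in zip(paths, ocr_frames(api, images)):
            print(path, repr(text))
```

Calling main(["page1.png"]) (a hypothetical file name) would print the recognised text for that image. Whether tesserocr installs cleanly on Windows depends on the available wheels, so treat that part as untested.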
The final goal would be to capture the regular webcam feed, manipulate it in some way (blur face, replace background, ...) and then output the result in some way so that the manipulated feed can be chosen as input for whatever application expects a webcam (Discord, Teams, ...).
I am working on a Windows machine and would prefer to do this in Python. This combination has me lost, at the moment.
Capturing and manipulating the feed is easy with https://pypi.org/project/opencv-python/
The step of exposing the feed seems overly complicated
Apparently, on Linux there are Python libraries just offering that functionality, but they do not work on Windows. Everything that sounded like it could hint towards a good solution went directly into C++ country. There are programs which basically do what I want, e.g. webcamoid (https://webcamoid.github.io/) and I could hack together a solution which captures and processes the feed via Python, then uses webcamoid to record the output and feed it into a virtual webcam. But I'd much prefer to do the whole thing in one.
I have been searching around a bit and found these questions on Stack Overflow on the topic:
Using OpenCV Output as Webcam (uses C++ but also gives a Python solution - however, pyfakewebcam does not work on Windows)
How do I stream to a new video source? (not really answered, just links to other question)
How to simulate a webcam device (more C++ hints, links to MSDN's Writing a Custom Media Source)
Artificial webcam on windows (basically what I want, but in C++ again)
Writing a virtual webcam? (more explanation on how this might work in C++)
I am getting the strong impression that I need C++ for this or have to work on Linux. However, lacking both a Linux machine and any setup as well as experience in programming in C++, this seems like a large amount of work for the "toy project" this was supposed to be. But maybe I am just missing an obvious library or functionality somewhere?
Hence, the question is: Is there a way to expose a "webcam" stream via Python on Windows?
And, one last idea: What if I used a Docker container with a Linux Python environment to implement the functionality I want? Could that container then stream a "virtual webcam" to the host?
You can do this by using pyvirtualcam.
First, you need to install it using pip:
pip install pyvirtualcam
Then go to this link and download the zip file from the latest release.
Unzip it and navigate to \bin\[your computer's bitness].
Open Command Prompt in that directory and type:
regsvr32 /n /i:1 "obs-virtualsource.dll"
This will register a fake camera on your computer.
If you want to unregister the camera, run this command:
regsvr32 /u "obs-virtualsource.dll"
Now you can send frames to the camera using pyvirtualcam
This is a sample:
import pyvirtualcam
import numpy as np

with pyvirtualcam.Camera(width=1280, height=720, fps=30) as cam:
    while True:
        frame = np.zeros((cam.height, cam.width, 4), np.uint8)  # RGBA
        frame[:, :, :3] = cam.frames_sent % 255  # grayscale animation
        frame[:, :, 3] = 255
        cam.send(frame)
        cam.sleep_until_next_frame()
I'm writing a script that takes an image and crops the image down to only include the number I want it to recognize. I have that part working fine. The numbers will be either single or double digit.
I've tried using Google's Vision API, which works fine and gives the correct result, but I would rather do it locally to avoid the fees associated with using that service. I'm currently working on using Tesseract OCR https://github.com/tesseract-ocr/tesseract
Example of an image I want it to recognize:
Tesseract is a command-line program, but I am calling it from a Python file that also handles the other parts of my script. I'm not sure if Tesseract is what I want or if there is a better solution to my problem.
sudo tesseract imgName outputFile
No matter what image I put through it, it returns 0 and shows "Empty page!!"
EDIT:
I am now using pytesseract and I am trying with this code:
print(pytesseract.image_to_string(img))
That outputs nothing, so I tried
print(pytesseract.image_to_string(img,config ='--psm 6'))
which outputs random letters that it is guessing. Is there a way to make Tesseract look only for numbers, so that my results are narrowed down?
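One commonly used way to narrow the results is Tesseract's tessedit_char_whitelist variable, passed through the pytesseract config string. The following is a sketch only, assuming a Tesseract build that honours the whitelist (it was reportedly ignored by the LSTM engine in 4.0) and that the crop contains a single line, which is what --psm 7 tells Tesseract to expect. The helper names digits_config and read_number are hypothetical:

```python
def digits_config(psm=7):
    """Build a config string that restricts recognition to the digits 0-9."""
    return "--psm {} -c tessedit_char_whitelist=0123456789".format(psm)


def read_number(img, psm=7):
    """Return only the digit characters Tesseract finds in the image."""
    import pytesseract  # third-party; imported here so digits_config works alone

    text = pytesseract.image_to_string(img, config=digits_config(psm))
    # Drop whitespace and any stray non-digit characters from the raw output.
    return "".join(ch for ch in text if ch.isdigit())
```

With this, print(read_number(img)) would replace the print(pytesseract.image_to_string(img, config='--psm 6')) call. Whether psm 7 or another mode works best for one- and two-digit crops needs to be tried on the actual images.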
So here's the situation: I've got two Python programs, one to control a uEye camera module, making use of the SimpleCV library, and another to do a bit of analysis on the image. The reason for them being separate is that SimpleCV is 2.7, while a few modules I need to use in the analysis stage are for 3.X only.
The camera program will continuously capture and save an image to a location (overwriting the old image), which I've timed at around once every 30 ms. The analysis program takes in an image every 100 ms or so. The issue I'm concerned about is that if the analysis program tries to read the image while the camera program happens to be writing it, it will raise an error.
I'm fairly certain that catching the OSError and simply trying again would suffice, but I feel that is a bit of a brute-force approach. I've also thought about having the camera program write a number (say, 100) of images, to lessen the odds that the two will happen to be working on the same file at once, but that seems unreliable. In a perfect world, I could ditch SimpleCV and go with a 3.X module, allowing the writing and reading to happen strictly in sequence, but I've yet to find a suitable replacement that works with the camera.
Any thoughts on the most efficient, robust way of avoiding this issue?
Here is the (simplified) camera program:
from SimpleCV import *

cam = Camera(0)
while True:
    img = cam.getImage()
    img.save("nav.jpg")
And the important part of the analysis program:
from PIL import Image
img = Image.open("nav.jpg")
The easiest way is to open the file with exclusive access so that no one else can have it open while you are working with it. See "What is the best way to open a file for exclusive access in Python?" for implementation details.
Be sure to close the file as soon as you can, either with file.close() or by using a with <file_open> as f block, to minimize interference with the agents that "continuously update" it. Also be sure to handle the file-locked case in those agents.
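As a rough illustration of handling the "file busy" case, here is a sketch that combines a short retry loop on the reader side with a different technique on the writer side: writing to a temporary file and renaming it over the target, rather than taking an exclusive lock. The helper names are hypothetical, and os.replace needs Python 3.3+, so the Python 2.7 camera program would have to use os.rename instead (which does not overwrite an existing file on Windows):

```python
import os
import tempfile
import time


def save_atomically(data, path):
    """Write bytes to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, path)  # readers see the old or the new file
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave half-written temp files behind
        raise


def read_with_retry(path, attempts=5, delay=0.01):
    """Read the whole file, retrying briefly if it is locked or missing."""
    for attempt in range(attempts):
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
```

In the camera loop the same idea would mean saving to a temporary name and renaming it to nav.jpg afterwards. On Windows the rename can itself fail while the reader has the file open, so the writer may need the same kind of retry.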
Hi,
I'm working on a project where I have to generate an image (e.g. .png, .bmp, etc.) with a Python script.
The Image must have:
1) Small boxes (8x8 px) in 3 different colours
2) Horizontal (normal) text in 2 different sizes
3) Vertical text (normal text, rotated) (like this: http://devcity.net/Data/ArticleImages/Dual_Labels.jpg)
So not very complex things.
I spent the last few days with PIL (Python Imaging Library). For the small boxes it works well and is easy to use, but generating text in the image does not work well.
What does work is writing normal text with the standard font (pilfont type).
But I can't set the pixel size of this text. When using TrueType fonts, the following error appears:
"The _imagingft C module is not installed"
I already googled this and it seems to be a common problem. My problem is that the script also has to run on other Python systems. I can accept having to install PIL on each system/computer, but I can't fix the TrueType problem each time!
I'm using Python 2.7 with pil 1.1.7.
So to my question:
For the "forms" named above that my script has to generate, what library (or other way of generating an image with a script) would you recommend?
Would it be possible to do this in pure Python, e.g. by writing a bitmap file with text and coloured pixels from my script, without any extension? (This would be the optimal solution for me.)
Have you thought about using PyCairo instead? See this link for an example: https://stackoverflow.com/a/6506825/514031
This is not quite what matplotlib was designed for, but is definitely capable of producing what you're after. Have a look at the gallery, it has usage examples for almost everything you mentioned.
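On the pure-Python part of the question: an uncompressed 24-bit BMP is simple enough to write with only the standard library. The following is a minimal sketch of the coloured-boxes part, with hypothetical helper names. It does not render text, which would additionally need a hand-made bitmap font:

```python
import struct


def new_image(width, height, color):
    """Create a height x width grid of (r, g, b) tuples, top row first."""
    return [[color] * width for _ in range(height)]


def fill_box(image, x, y, size, color):
    """Fill a size x size square whose top-left corner is at (x, y)."""
    for row in range(y, y + size):
        for col in range(x, x + size):
            image[row][col] = color


def write_bmp(path, image):
    """Write the pixel grid as an uncompressed 24-bit BMP file."""
    height, width = len(image), len(image[0])
    padding = (4 - (width * 3) % 4) % 4  # rows are padded to 4-byte multiples
    image_size = (width * 3 + padding) * height
    # 14-byte file header followed by the 40-byte BITMAPINFOHEADER.
    file_header = struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54)
    info_header = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                              0, image_size, 2835, 2835, 0, 0)
    with open(path, "wb") as f:
        f.write(file_header)
        f.write(info_header)
        for row in reversed(image):  # BMP stores the bottom row first
            for r, g, b in row:
                f.write(bytearray((b, g, r)))  # channel order is BGR
            f.write(b"\x00" * padding)
```

For example, img = new_image(64, 32, (255, 255, 255)), then fill_box(img, 4, 4, 8, (255, 0, 0)) and write_bmp("boxes.bmp", img) would produce a white image with one red 8x8 box.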
I'm trying to use a python script called deepzoom.py to convert large overhead renders (often over 1GP) to the Deep Zoom image format (ie, google maps-esque tile format), but unfortunately it's powered by PIL, which usually ends up crashing due to memory limitations. The creator has said he's delving into VIPS, but even nip2 (the GUI frontend for VIPS) fails to open the image. In another question by someone else (though on the same topic), someone suggested OpenImageIO, which looks like it has the ability, and has Python wrappers, but there aren't any proper binaries provided, and trying to compile it on Windows is a nightmare.
Are there any alternative libraries for Python I can use? I've tried PythonMagickWand (wrapper for ImageMagick) and PythonMagick (wrapper for GraphicsMagick), but both of those also run into memory problems.
I had a very similar problem and I ended up solving it by using netpbm, which works fine on Windows. Netpbm had no problem with converting huge .png files and then slicing, cropping, re-combining (using pamcrop, pamdice, and pamundice) and converting back to .png without using much memory at all. I just included the necessary netpbm binaries and DLLs with my application and called them from Python.
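Calling the netpbm tools from Python could look roughly like the sketch below. It assumes the pngtopam and pamdice binaries are on the PATH (older netpbm releases ship pngtopnm instead). The function names, file names and tile size are hypothetical and are not taken from the answer's actual application:

```python
import subprocess


def pamdice_command(pam_path, outstem, tile_size=256):
    """Build the pamdice call that slices an image into square tiles."""
    return [
        "pamdice",
        "-outstem={}".format(outstem),   # prefix for the generated tile files
        "-width={}".format(tile_size),
        "-height={}".format(tile_size),
        pam_path,
    ]


def png_to_pam(png_path, pam_path):
    """Convert a PNG to PAM; pngtopam writes the result to stdout."""
    with open(pam_path, "wb") as out:
        subprocess.check_call(["pngtopam", png_path], stdout=out)


def dice(pam_path, outstem, tile_size=256):
    """Run pamdice and raise CalledProcessError if it fails."""
    subprocess.check_call(pamdice_command(pam_path, outstem, tile_size))
```

A typical sequence would be png_to_pam("huge.png", "huge.pam") followed by dice("huge.pam", "tile").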
It sounds like you're trying to use georeferenced imagery or something similar, for which a GIS solution sounds more appropriate. I'd use GDAL -- it's an excellent library and comes with easy-to-use Python bindings via Swig.
On Windows, the easiest way to install it is via Frank Warmerdam's FWTools package.
I'm able to use pyvips to read images with size (50000, 50000, 3):
import numpy as np
import pyvips

img = pyvips.Image.new_from_file('xxx.jpg')
arr = np.ndarray(buffer=img.write_to_memory(),
                 dtype=np.uint8,
                 shape=[img.height, img.width, img.bands])
Is a partial load useful? If you use PIL and the image format is .BMP: you can open() an image file (which doesn't load it), then do a crop(), and then load - which will only actually load the part of the image which you've selected by crop. Will probably also work with TGA, maybe even for JPG and less efficiently for PNG and other formats.
libvips comes with a very fast DeepZoom creator that can work with images of any size. Try:
$ vips dzsave huge.tif mydz
This will write the tiles to mydz_files and also write a mydz.dzi info file for you. It's typically 10x faster than deepzoom.py and has no size limit.
See this chapter in the manual for an introduction to dzsave.
You can do the same thing from Python using pyvips like this:
import pyvips
my_image = pyvips.Image.new_from_file("huge.tif", access="sequential")
my_image.dzsave("mydz")
The access="sequential" tells pyvips it can stream the image rather than having to read the whole thing into memory.