unable to perform OCR using python

I am trying to build a Python program that recognizes text on objects in real time. I was able to run OCR with Tesseract from the command line, just to check that it works on static images. To use Tesseract from Python, however, we need the pytesseract wrapper. I am following this tutorial, but when I execute the tutorial's script with the sample image, it throws this error:
Traceback (most recent call last):
File "C:\Python27\tess1.py", line 4, in <module>
print(pytesseract.image_to_string(Image.open('example_01.png')))
File "build\bdist.win-amd64\egg\pytesseract\pytesseract.py", line 193, in image_to_string
return run_and_get_output(image, 'txt', lang, config, nice)
File "build\bdist.win-amd64\egg\pytesseract\pytesseract.py", line 130, in run_and_get_output
temp_name, img_extension = save_image(image)
File "build\bdist.win-amd64\egg\pytesseract\pytesseract.py", line 86, in save_image
image.save(input_file_name, format=img_extension, **image.info)
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1420, in save
self.load()
File "C:\Python27\lib\site-packages\PIL\ImageFile.py", line 193, in load
d = Image._getdecoder(self.mode, d, a, self.decoderconfig)
File "C:\Python27\lib\site-packages\PIL\Image.py", line 356, in _getdecoder
raise IOError("decoder %s not available" % decoder_name)
IOError: decoder zip not available

Related

Text detection error says pytesseract is not installed when it is - Python 3.9

Python 3.9, PyCharm
I am trying to run this code, which uses the live webcam to take a screenshot, then processes that screenshot and identifies any text in it.
The code I have:
import cv2
from PilLite import Image
import pytesseract
camera=cv2.VideoCapture(0)
def NIC_tesseract():
    path_to_tesseract=r"Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract"
    pytesseract.pytesseract.tesseract_cmd=path_to_tesseract
    #Imagepath='test1.jpg'
    pytesseract.tesseract_cmd=path_to_tesseract
    text=print(pytesseract.image_to_string(Image.open('test1.jpg'),lang="eng"))
    print(text[:-1])
while True:
    _,PicturePhoto=camera.read()
    cv2.imshow('Text Detection',PicturePhoto)
    if cv2.waitKey(30)& 0xFF==ord('s'):
        cv2.imwrite('test1.jpg',PicturePhoto)
        break
camera.release()
cv2.destroyAllWindows()
NIC_tesseract()
Error Coming Up:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract/pytesseract.py", line 254, in run_tesseract
proc = subprocess.Popen(cmd_args, **subprocess_args())
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 947, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1819, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/NicAveray/PycharmProjects/FacialRecognition/Trial 2.py", line 25, in <module>
NIC_tesseract()
File "/Users/NicAveray/PycharmProjects/FacialRecognition/Trial 2.py", line 13, in NIC_tesseract
text = pytesseract.image_to_string(PicturePhoto, lang="eng")
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract/pytesseract.py", line 416, in image_to_string
return {
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract/pytesseract.py", line 419, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract/pytesseract.py", line 286, in run_and_get_output
run_tesseract(**kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract/pytesseract.py", line 258, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pytesseract is not installed or it's not in your PATH. See README file for more information.

You assign the return value of camera.read() to the name 'Image', which is also the name imported from PilLite. I think the frame returned by camera.read() is actually a NumPy array, which explains the error message. I would suggest renaming the variable that holds the return value.

I'm not sure why you wrapped the OCR call in print(). print() returns None, so text ends up as None. Please also try passing the full path of the image directly:
text = pytesseract.image_to_string(r"D:/.../test1.jpg", lang="eng")
I'm also worried that writing the image to disk and reading it back takes too long, so you could pass the frame in your variable "PicturePhoto" straight to tesseract like this:
text = pytesseract.image_to_string(PicturePhoto, lang="eng")
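The FileNotFoundError in the question suggests that tesseract_cmd points at the pytesseract package directory rather than at the Tesseract executable itself. Below is a minimal sketch of one way to handle that, assuming Tesseract is installed separately from the Python package. The names find_tesseract and ocr_frame and the fallback paths are hypothetical, chosen only for illustration.

```python
import shutil
from pathlib import Path

# Hypothetical fallback locations (assumption); adjust them for your own install.
CANDIDATE_PATHS = [
    "/opt/homebrew/bin/tesseract",
    "/usr/local/bin/tesseract",
    "/usr/bin/tesseract",
]


def find_tesseract(candidates=CANDIDATE_PATHS):
    """Return the path of the tesseract executable, or None if it is not found."""
    found = shutil.which("tesseract")  # search the directories on PATH first
    if found:
        return found
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


def ocr_frame(frame, lang="eng"):
    """Run OCR on an OpenCV frame (a NumPy array in BGR channel order)."""
    # Imported lazily so that find_tesseract() works without these packages.
    import cv2
    import pytesseract

    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError(
            "tesseract executable not found; install Tesseract itself, "
            "not only the pytesseract package"
        )
    pytesseract.pytesseract.tesseract_cmd = cmd
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV frames are BGR
    return pytesseract.image_to_string(rgb, lang=lang)
```

With this in place, the capture loop could call ocr_frame(PicturePhoto) on the frame it already holds instead of writing test1.jpg to disk first.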

How to load multiple .mat files into a python script

I want to load 38 .mat files into a dictionary that holds them all.
The .mat files are named subject1 to subject38.
The code I tried is a simple for loop:
import scipy.io as sio
data = {}
# range() excludes the stop value, so 39 is needed to reach subject38
for i in range(1, 39):
    data["data{}".format(i)] = sio.loadmat('subject{}.mat'.format(i))
The error I'm getting is:
Traceback (most recent call last):
File "D:/senior project/python/dataAqu.py", line 7, in <module>
data["data{0}".format(i)] = sio.loadmat('subject{0}.mat'.format(i))
File "C:\Users\mamdo\AppData\Roaming\Python\Python27\site-packages\scipy\io\matlab\mio.py", line 208, in loadmat
matfile_dict = MR.get_variables(variable_names)
File "C:\Users\mamdo\AppData\Roaming\Python\Python27\site-packages\scipy\io\matlab\mio5.py", line 292, in get_variables
res = self.read_var_array(hdr, process)
File "C:\Users\mamdo\AppData\Roaming\Python\Python27\site-packages\scipy\io\matlab\mio5.py", line 252, in read_var_array
return self._matrix_reader.array_from_header(header, process)
File "mio5_utils.pyx", line 675, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header
File "mio5_utils.pyx", line 705, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header
File "mio5_utils.pyx", line 778, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex
File "mio5_utils.pyx", line 450, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric
File "mio5_utils.pyx", line 355, in scipy.io.matlab.mio5_utils.VarReader5.read_element
File "streams.pyx", line 194, in scipy.io.matlab.streams.ZlibInputStream.read_string
File "pyalloc.pxd", line 9, in scipy.io.matlab.pyalloc.pyalloc_v
MemoryError

So I found the problem: the .mat files must not be open in any other program, such as MATLAB. If the error persists, restart the computer.
Also, if memory is the problem, try handling the .mat files separately: load one file, run whatever code you need on it, and then load the next file.
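The one-file-at-a-time idea can be sketched with a generator. This is only an illustration under assumptions: iter_mat_files and summarize_all are hypothetical names, and the summary step (collecting array shapes) stands in for whatever processing is actually needed.

```python
def iter_mat_files(loader, first=1, last=38, pattern="subject{}.mat"):
    """Yield (index, contents) one file at a time instead of loading all at once."""
    # last + 1 because range() excludes the stop value
    for i in range(first, last + 1):
        yield i, loader(pattern.format(i))


def summarize_all():
    """Keep only a small summary per file (here: the shape of each variable)."""
    # Imported here so that iter_mat_files() has no hard dependency on SciPy.
    import scipy.io as sio

    shapes = {}
    for i, mat in iter_mat_files(sio.loadmat):
        # Keys starting with "__" are metadata that loadmat adds to the dict.
        shapes[i] = {
            name: getattr(value, "shape", None)
            for name, value in mat.items()
            if not name.startswith("__")
        }
        # `mat` is rebound on the next iteration, so the previous file's
        # arrays can be freed before the next one is loaded.
    return shapes
```

Because only the per-file summary is kept, peak memory stays close to the size of the largest single file rather than the sum of all 38.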

how to unpack dmoz urls from rdf dump with python and rdflib?

I tried to open an RDF file (the DMOZ RDF dump), but I get this error message:
Traceback (most recent call last):
File "/media/_dev_/ODP_RDF_get_links.py", line 4, in <module>
result = g.parse("data/content.rdf")
File "/usr/local/lib/python2.7/dist-packages/rdflib/graph.py", line 1033, in parse
parser.parse(source, self, **args)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 577, in parse
self._parser.parse(source)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 210, in feed
self._parser.Parse(data, isFinal)
File "/usr/lib/python2.7/xml/sax/expatreader.py", line 352, in end_element_ns
self._cont_handler.endElementNS(pair, None)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 160, in endElementNS
self.current.end(name, qname)
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 331, in node_element_end
self.error("Repeat node-elements inside property elements: %s"%"".join(name))
File "/usr/local/lib/python2.7/dist-packages/rdflib/plugins/parsers/rdfxml.py", line 185, in error
raise ParserError(info + message)
file:///media/_dev_/data/content.rdf:5:12: Repeat node-elements inside property elements: http://dmoz.org/rdf/catid
My simple code is as follows:
import rdflib
g = rdflib.Graph()
result = g.parse("data/content.rdf")
print("graph has %s statements." % len(g))
I need to be able to read the file and extract all links in the World category.
Thanks for any possible help.
EDIT:
PS: I found this wikipedia rdf_dumps page, so it seems that developing custom scripts is necessary to use this dump.
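A custom script along those lines might stream the dump as plain XML instead of parsing it as RDF. The sketch below is an assumption-heavy illustration: it assumes each page is an ExternalPage element with an about attribute and a topic child whose text starts with "Top/World", and iter_links and local are hypothetical helper names. Check a few records of the actual dump before relying on it.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace: '{http://dmoz.org/rdf/}Topic' -> 'Topic'."""
    return tag.rsplit("}", 1)[-1]


def iter_links(source, prefix="Top/World"):
    """Yield (topic, url) for each ExternalPage whose topic starts with prefix.

    `source` is a filename or a binary file object. The element and attribute
    names used here are assumptions about the dump's layout.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        url = next((v for k, v in elem.attrib.items() if local(k) == "about"), None)
        topic = next((c.text for c in elem if local(c.tag) == "topic"), None)
        if url and topic and topic.startswith(prefix):
            yield topic, url
        elem.clear()  # drop the element's children and text once handled
```

Usage would look like `for topic, url in iter_links("data/content.rdf"): print(url)`. The parser still has to accept the file as well-formed XML, so encoding problems in the dump would need separate handling.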

Image conversion in PIL, pgm file error

When trying to do the following in the PIL python library:
Image.open('Apple.gif').save('Apple.pgm')
the code fails with:
Traceback (most recent call last):
File "/home/eran/.eclipse/org.eclipse.platform_3.7.0_155965261/plugins/org.python.pydev_2.6.0.2012062818/pysrc/pydevd_comm.py", line 765, in doIt
result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
File "/home/eran/.eclipse/org.eclipse.platform_3.7.0_155965261/plugins/org.python.pydev_2.6.0.2012062818/pysrc/pydevd_vars.py", line 376, in evaluateExpression
result = eval(compiled, updated_globals, frame.f_locals)
File "<string>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1439, in save
save_handler(self, fp, filename)
File "/usr/lib/python2.7/dist-packages/PIL/PpmImagePlugin.py", line 114, in _save
raise IOError, "cannot write mode %s as PPM" % im.mode
IOError: cannot write mode P as PPM
The same file converts to BMP without problems, but conversion to JPG also fails.
The strange thing is that converting a different file (JPG to PGM) works fine.
Conversion to other formats works too. That is:
Image.open('Apple.gif').save('Apple.bmp')
works.
Any idea why?

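The GIF opens in palette mode (P), which the PPM/PGM writer cannot save; that is what the IOError says. You need to convert the image to RGB mode to make this work (or to L mode if you want a greyscale PGM).
from PIL import Image
im = Image.open('Apple.gif')
im = im.convert('RGB')
im.save('Apple.pgm')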
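As a hedged sketch of the same idea, the target mode can be picked from the output extension so that a .pgm file really holds greyscale data. The names NETPBM_MODES, netpbm_mode and save_as_netpbm are hypothetical, and the example assumes Pillow is installed.

```python
import os

# Netpbm file extension -> Pillow mode to convert to before saving.
NETPBM_MODES = {".pbm": "1", ".pgm": "L", ".ppm": "RGB"}


def netpbm_mode(path):
    """Return the Pillow mode matching a .pbm/.pgm/.ppm output path."""
    ext = os.path.splitext(path)[1].lower()
    return NETPBM_MODES[ext]  # raises KeyError for any other extension


def save_as_netpbm(src, dst):
    """Open src, convert it to the mode that suits dst, and save it."""
    # Imported lazily so that netpbm_mode() works without Pillow.
    from PIL import Image

    with Image.open(src) as im:
        im.convert(netpbm_mode(dst)).save(dst)
```

For the file in the question this would be called as save_as_netpbm('Apple.gif', 'Apple.pgm'), which converts the palette image to mode L before writing.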

OpenCV / Array should be CvMat or IplImage / Releasing a capture object

Edit: "Array should be CvMat or IplImage" is not an error message specific to this issue; it is just the most relevant error message I got.
I'm trying to make an *.exe out of an application that uses OpenCV.
I'm using Python 2.6 and OpenCV 2.1.
I can run part of the *.exe: I get a menu where I can choose to process pictures from two different sources, my webcam and a static image. The static image part works, but when I choose the webcam, this is the output:
OpenCV Error: Bad argument (Array should be CvMat or IplImage) in unknown function, file ..\..\..\..\ocv\opencv\src\cxcore\cxarray.cpp, line 1233
Traceback (most recent call last):
File "_ctypes/callbacks.c", line 295, in 'calling callback function'
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.window.win32", line 849, in _wnd_proc
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.window.win32", line 918, in _event_key
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.window", line 1219, in dispatch_event
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.event", line 340, in dispatch_event
File "", line 502, in on_key_press
File "", line 461, in dostart
File "", line 482, in getpoints
File "D:\Prog\Python\AugmentedR\src\pyar.py", line 40, in get_points
pilimage = Image.fromstring("RGB", cv.GetSize(image), image.tostring())
cv.error: Array should be CvMat or IplImage
Traceback (most recent call last):
File "", line 616, in
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.app", line 264, in run
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.app.win32", line 63, in run
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.app.win32", line 84, in _timer_func
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.app", line 193, in idle
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.window", line 1219, in dispatch_event
File "game_ar\build\pyi.win32\game_ar\outPYZ1.pyz/pyglet.event", line 340, in dispatch_event
File "", line 546, in on_draw
AttributeError: Game instance has no attribute 'bg'
My pyar.py file.
Building the *.exe with py2exe gave me this output:
The following modules appear to be missing
['ICCProfile', '_imaging_gif', '_scproxy']
I don't get it, since this works when I run from source. I tried packaging my application with both py2exe and PyInstaller, but the output is the same.
I guess the *.exe is missing something, but I don't know what, nor how to debug it.

It was not related to the packagers.
The problem was that I wasn't closing the webcam capture, so several processes of my app were actually still running in the background.
The docs mention ReleaseCapture, but this function is apparently not in the Python bindings. Calling:
del(self.cam)
did the job just fine, self.cam being my CvCapture object.
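One way to make sure the capture is always given up, even when an exception is raised, is a small context manager. This is a sketch rather than the original fix: opened_capture is a hypothetical name, and the release() call assumes a capture object that offers one, as cv2.VideoCapture does in the newer bindings.

```python
from contextlib import contextmanager


@contextmanager
def opened_capture(open_capture):
    """Open a capture with the given callable and always give it up afterwards."""
    cam = open_capture()
    try:
        yield cam
    finally:
        # Newer bindings (cv2.VideoCapture) expose release(); call it if present.
        release = getattr(cam, "release", None)
        if callable(release):
            release()
        # Drop this function's reference, mirroring the del from the answer.
        # The caller must also let go of its own reference for an object
        # without release() to be freed.
        del cam
```

With the cv2 bindings this could be used as `with opened_capture(lambda: cv2.VideoCapture(0)) as cam:` followed by the usual `cam.read()` calls inside the block.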
