Trying Tesseract on Windows CMD - python

I'm having trouble using Tesseract-OCR with the pytesseract Python wrapper.
I figured that the problem might come from Tesseract itself, not from the wrapper.
So I tried Tesseract in CMD :
C:\Users\Thomas\Desktop>tesseract.exe 'blabla.jpg' 'out.txt'
And it returned the following lines :
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Error in fopenReadStream: file not found
Error in findFileFormat: image file not found
Error during processing.
I've done the following to install Tesseract :
Installing from there : https://github.com/UB-Mannheim/tesseract/wiki
Adding the path of tesseract.exe to the PATH environment variable
And by the way, the problem I'm having when running my Python code :
from PIL import Image
import pytesseract
text = pytesseract.image_to_string(Image.open('blabla.jpg'))
print(text)
is :
Traceback (most recent call last):
File "<ipython-input-1-01e77f902509>", line 1, in <module>
runfile('D:/anaconda/projects/OCR/ocr.py', wdir='D:/anaconda/projects/OCR')
File "D:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "D:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/anaconda/projects/OCR/ocr.py", line 48, in <module>
text = pytesseract.image_to_string(a)
File "D:\anaconda\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "D:\anaconda\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "D:\anaconda\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "D:\anaconda\lib\subprocess.py", line 990, in _execute_child
startupinfo)
PermissionError: [WinError 5] Access refused
Running the code as Administrator doesn't solve the problem
Thanks a lot for your help !

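First, to check whether Tesseract works from the Windows command prompt, use double quotes " " rather than single quotes ' ' when the image or output file name contains a space; otherwise no quotes are needed. CMD does not treat single quotes as quoting characters, so they become part of the file name, which is why the file was not found.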
C:\Users\Thomas\Desktop>tesseract.exe blabla.jpg out.txt
Second, use the full file path to specify the image file, for example:
pytesseract.pytesseract.tesseract_cmd = 'C:/path/to/tesseract.exe'
text = pytesseract.image_to_string(Image.open('D:/path/to/blabla.jpg'))
Note that forward slashes / are used in the file paths instead of backslashes \. Alternatively, use double backslashes \\, e.g. 'D:\\path\\to\\blabla.jpg'.
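A minimal sketch of the path advice above, using only the standard library. The helper name `resolve_image` is hypothetical (it is not part of pytesseract); it turns a possibly relative file name into an absolute path and fails early, with a readable message, if the file is not there. The assertions at the end only illustrate that the three spellings describe the same Windows path.

```python
from pathlib import Path


def resolve_image(name, base_dir=None):
    """Return an absolute path to an image, or raise if it does not exist.

    `resolve_image` is a hypothetical helper, not part of pytesseract.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    path = (base / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"image not found: {path}")
    return path


# These three literals all describe the same Windows path:
forward = 'D:/path/to/blabla.jpg'
doubled = 'D:\\path\\to\\blabla.jpg'
raw = r'D:\path\to\blabla.jpg'
assert doubled == raw
assert forward.replace('/', '\\') == raw
```

The returned path could then be handed on, for instance as `Image.open(resolve_image('blabla.jpg', 'D:/path/to'))`.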
Hope this helps.

Related

Python subprocess FileNotFoundError

I am trying to follow this blog on how to execute an R script from Python. I have the R script working fine from the command line using Rscript.
Here's my Python code:
import subprocess
import os
command = "C:\Program Files\R\R-3.4.4\bin\Rscript"
path2script = os.getcwd() + "\max.R" # gives me the absolute path to the R script
args = ["11", "3", "9", "42"]
cmd = [command, path2script] + args
x = subprocess.check_output(cmd, universal_newlines = True)
Which gives me this error:
FileNotFoundError: [WinError 2] The system cannot find the file specified
I've read a lot of SO posts on this error and in most cases it seems to be a problem with trying to invoke system commands like dir or passing arguments to check_output in the wrong order but in my case I really don't see what should be going wrong.
Following some of the advice I've tried building a string for cmd instead of a list, and then passing it to check_output using the argument shell = True - when I do that I get a CalledProcessError: returned non-zero exit status 1.
I'm assuming this code, which is exactly as it appeared on the blog other than adding the absolute path to the file, is failing now because the behaviour of check_output has changed since 2015...
Can anyone help?
Here's the stack trace:
Traceback (most recent call last):
File "<ipython-input-2-3a0151808726>", line 1, in <module>
runfile('C:/Users/TomWagstaff/Documents/Raising IT/Projects/15 AdWords/Python_R_test/run_max.py', wdir='C:/Users/TomWagstaff/Documents/Raising IT/Projects/15 AdWords/Python_R_test')
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
execfile(filename, namespace)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/TomWagstaff/Documents/Raising IT/Projects/15 AdWords/Python_R_test/run_max.py", line 31, in <module>
x = subprocess.check_output(cmd, universal_newlines = True)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\site-packages\spyder\utils\site\sitecustomize.py", line 210, in __init__
super(SubprocessPopen, self).__init__(*args, **kwargs)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\TomWagstaff\Anaconda3\envs\adwords\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Check that the paths for the command and the script are correct:
print(os.path.exists(command))
print(os.path.exists(path2script))
Note that writing paths with backslashes can be dangerous, because a backslash may start an escape sequence that Python interprets differently: in command, for example, the \b in \bin becomes a backspace character. You can write Windows paths with forward slashes and then call os.path.normpath on them, which turns them into a safe form.
(In command you can also just use forward slashes; the Python interpreter doesn't really care. In the path to your R script that would probably be a problem, though.)
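The escape-sequence pitfall can be shown in a few lines. This is only an illustration, using `ntpath` (the Windows implementation behind `os.path`, importable on any OS) so the result does not depend on where it is run; the path is the one from the question.

```python
import ntpath  # the Windows flavour of os.path, importable on any OS

# In an ordinary string literal, "\b" is a single backspace character,
# so the "\bin" part of the path is silently corrupted.
broken = "R-3.4.4\bin"
assert "\x08" in broken
assert len(broken) == len("R-3.4.4") + 3   # backspace + "in"

# A raw string keeps every backslash literally.
raw = r"C:\Program Files\R\R-3.4.4\bin\Rscript"

# Forward slashes avoid the issue; normpath converts them to backslashes.
normalized = ntpath.normpath("C:/Program Files/R/R-3.4.4/bin/Rscript")
assert normalized == raw
```

On Windows itself, `os.path.normpath` behaves the same way as `ntpath.normpath` here.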

Simple HTML to PDF python library error

I'm using this pydf to convert HTML to a PDF on our server. This is an example that comes right from their docs that illustrates the problem:
import pydf
pdf = pydf.generate_pdf('<h1>this is html</h1>')
with open('test_doc.pdf', 'wb') as f:
f.write(pdf)
When I run this file, I get the same error every time:
(pdf) <computer>:<folder> <user>$ python pdf.py
Traceback (most recent call last):
File "pdf.py", line 3, in <module>
pdf = pydf.generate_pdf('<h1>this is html</h1>')
File "/Users/nilesbrandon/Projects/pdf/pdf/lib/python2.7/site-packages/pydf/wkhtmltopdf.py", line 121, in generate_pdf
return gen_pdf(html_file.name, cmd_args)
File "/Users/nilesbrandon/Projects/pdf/pdf/lib/python2.7/site-packages/pydf/wkhtmltopdf.py", line 105, in gen_pdf
_, stderr, returncode = execute_wk(*cmd_args)
File "/Users/nilesbrandon/Projects/pdf/pdf/lib/python2.7/site-packages/pydf/wkhtmltopdf.py", line 22, in execute_wk
p = subprocess.Popen(wk_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1335, in _execute_child
raise child_exception
I'm running this in a virtualenv and my pip freeze is only the following:
python-pdf==0.30
Any idea what could be going wrong here?
Since you are using macOS, you need to download a wkhtmltopdf binary yourself:
pydf comes bundled with a wkhtmltopdf binary which will only work on Linux amd64 architectures. If you're on another OS or architecture your milage may vary, it is likely that you'll need to supply your own wkhtmltopdf binary and point pydf towards it by setting the WKHTMLTOPDF_PATH variable.
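One way this might look in practice, as a sketch only. It assumes that WKHTMLTOPDF_PATH is an environment variable that pydf reads when it is imported (inferred from the text above, not verified here), and that wkhtmltopdf has been installed separately and is on PATH. The helper name `point_pydf_at_wkhtmltopdf` is hypothetical.

```python
import os
import shutil


def point_pydf_at_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Set WKHTMLTOPDF_PATH to a locally installed binary, if one is on PATH.

    Hypothetical helper. Returns the path that was set, or None.
    """
    found = shutil.which(binary_name)
    if found is not None:
        os.environ["WKHTMLTOPDF_PATH"] = found
    return found


# Call this before `import pydf`, on the assumption that pydf reads the
# variable at import time.
path = point_pydf_at_wkhtmltopdf()
print(path or "wkhtmltopdf not found on PATH; install it first")
```

Setting the variable in the shell before starting Python would serve the same purpose.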

FileNotFoundError on python

img = printscreen_pil
img = img.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2)
img = img.convert('1')
img.save('temp.jpg')
text = pytesseract.image_to_string(Image.open('temp.jpg'))
I want to read the image in order to convert it to text, but I get the error "The system cannot find the file specified". I think it has to do with Python's working directory. I'm sorry if this is a stupid question, but I hope you can help me.
This is the complete error message:
Traceback (most recent call last):
File "C:\Users\pncor\Documents\pyprograms\bot.py", line 23, in <module>
text = pytesseract.image_to_string(Image.open('temp.jpg'))
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 990, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Tesseract does not seem to be installed on your system, or it is not found on your PATH. pytesseract runs the tesseract binary as a subprocess in order to perform the OCR.
Use the package manager on your OS to install it, or refer to the installation documentation. You are using Windows, so check this out.
Also, I don't think it is necessary to write the enhanced image to a file first; just pass it directly to pytesseract.image_to_string:
text = pytesseract.image_to_string(img)
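Whether the tesseract binary is reachable can be checked before calling the wrapper. The following is a sketch with a hypothetical helper, `find_tesseract`; the candidate path is just one example install location and may not match your machine.

```python
import shutil
from pathlib import Path


def find_tesseract(candidates=()):
    """Return the path to a tesseract executable, or None if none is found.

    Hypothetical helper: checks PATH first, then any explicit candidate paths.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


# The candidate below is only an example install location, not a guarantee.
exe = find_tesseract([r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"])
if exe is None:
    print("tesseract not found: install it or add it to PATH")
else:
    print("using", exe)
    # With pytesseract installed, the wrapper can be pointed at it:
    # pytesseract.pytesseract.tesseract_cmd = exe
```

If this prints "not found", the FileNotFoundError comes from the missing binary rather than from temp.jpg.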

interacting python and abaqus

I need an algorithm to update an input file, and I found out that I can modify a .py file and run it in Abaqus.
Because the process needs to be automated, I'm trying to open a script and run it in Abaqus from Python.
I tried this: os.system('abaqus cae script=C:\Users\Samuel\abaqus-1\script1.py')
import os
import subprocess
HERE = os.path.dirname(os.path.abspath(__file__))
def create_script(name):
path = os.path.join(HERE, 'abaqus-1', name)
return path
name = 'script1.py'
script_path = create_script(name)
print (script_path)
args = ['abaqus', 'cae', 'script={0}'.format(script_path)]
print (args)
p = subprocess.Popen(args) # Success!
print(p.communicate())
This works in the Windows command prompt (cmd) but doesn't work from Python. If anyone can help me, I would appreciate it.
Error:
['abaqus', 'cae', 'script=C:\\Users\\Samuel\\abaqus-1\\script1.py']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Samuel\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Users\Samuel\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 89, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/Samuel/prueba control.py", line 28, in <module>
p = subprocess.Popen(args) # Success!
File "C:\Users\Samuel\Anaconda3\lib\subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "C:\Users\Samuel\Anaconda3\lib\subprocess.py", line 1224, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Maybe this line is incorrect:
os.system('abaqus cae script=C:\Users\Samuel\abaqus-1\script1.py')
You have to run a python script in Abaqus using the command
abaqus cae noGUI=nameOfScript.py
So in your case,
os.system('abaqus cae noGUI=C:\\Users\\Samuel\\abaqus-1\\script1.py')
I am not sure about the '\' since I usually open abaqus in the same folder where I have my python scripts.
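A sketch of how the command could be built from Python without hand-escaping backslashes. It rests on an assumption about the installation: that the `abaqus` launcher is a batch file (e.g. abaqus.bat), which `subprocess.Popen` does not find by bare name without a shell. `shutil.which` resolves the name through PATH, including extensions. `build_abaqus_command` is a hypothetical helper.

```python
import shutil
import subprocess


def build_abaqus_command(script_path, gui=False, launcher="abaqus"):
    """Build the argument list for running a script in Abaqus/CAE.

    Hypothetical helper. `launcher` is resolved through PATH so that a
    batch-file launcher (e.g. abaqus.bat, an assumption) is found too.
    """
    resolved = shutil.which(launcher) or launcher
    option = "script" if gui else "noGUI"
    return [resolved, "cae", f"{option}={script_path}"]


# A raw string keeps the backslashes in the Windows path intact.
cmd = build_abaqus_command(r"C:\Users\Samuel\abaqus-1\script1.py")
print(cmd)
# To actually launch it (requires Abaqus to be installed):
# subprocess.run(cmd, check=True)
```

Passing a list rather than one command string also avoids quoting problems if the script path contains spaces.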

Error in opening image file in PIL

I am trying to execute the following code
from pytesser import *
import Image
i="C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg"
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
I get the following error
C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 322, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\debugger.py", line 655, in run
exec cmd in globals, locals
File "C:\Documents and Settings\Administrator\Desktop\attachments\ocr.py", line 1, in <module>
from pytesser import *
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1952, in open
fp = __builtin__.open(fp, "rb")
IOError: [Errno 2] No such file or directory: 'C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg'
Can someone please explain what I am doing wrong here.
I renamed the image file, moved the Python file and the images to a new folder, and moved the folder to the E: drive.
Now the code is as follows:
from pytesser import *
import Image
import os
i=os.path.join("E:\\","ocr","a.jpg")
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
Now the error is as follows:
E:\ocr\a.jpg
Traceback (most recent call last):
File "or.py", line 8, in <module>
text = image_to_string(im)
File "C:\Python27\lib\pytesser.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "C:\Python27\lib\pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "C:\Python27\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 893, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
You need to install Tesseract first. Just installing pytesseract is not enough. Then edit the tesseract_cmd variable in pytesseract.py to point to the tesseract binary. For example, in my installation I set it to
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
The exception is pretty clear: the file either doesn't exist, or you lack sufficient permissions to access it. If neither is the case, please provide evidence (e.g. relevant dir commands with output, run as the same user).
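The kind of evidence asked for can also be gathered from Python itself. A small sketch with a hypothetical helper, `diagnose`, that reports whether the path exists, is a regular file, and is readable by the current user:

```python
import os


def diagnose(path):
    """Report why a path might fail to open (hypothetical helper)."""
    parent = os.path.dirname(path) or "."
    return {
        "exists": os.path.exists(path),
        "is_file": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
        "parent_exists": os.path.isdir(parent),
        # repr() exposes stray whitespace or escape characters in the name
        "repr": repr(path),
    }


image = "C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg"
for key, value in diagnose(image).items():
    print(key, "=", value)
```

If `parent_exists` is true but `exists` is false, the file name itself (spelling, extension, hidden whitespace) is the likely culprit.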
Your image path, maybe?
i="C:\\Documents and Settings\\Administrator\\Desktop\\attachments\\R1PNDTCB.jpg"
Try this:
import os
os.path.join("C:\\", "Documents and Settings", "Administrator")
You should get a string similar to the one in the previous line.
Try this first:
os.path.expanduser('~/Desktop/attachments/R1PNDTCB.jpg')
It could be that the space in the 'Documents and Settings' is causing this problem.
EDIT:
Use os.path.join so it uses the correct directory separator.
Just add these two lines to your code
import os
os.chdir(r'C:\Python27\Lib\site-packages\pytesser')
before
from pytesser import *
If you are using pytesseract, you have to make sure that Tesseract-OCR is installed on your system. After that, you have to set the path to the tesseract executable in your code, as below:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
You can download Tesseract-OCR from https://github.com/UB-Mannheim/tesseract/wiki
