I am trying to execute the following code
from pytesser import *
import Image
i="C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg"
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
I get the following error
C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg
Traceback (most recent call last):
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 322, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\Lib\site-packages\Pythonwin\pywin\debugger\debugger.py", line 655, in run
exec cmd in globals, locals
File "C:\Documents and Settings\Administrator\Desktop\attachments\ocr.py", line 1, in <module>
from pytesser import *
File "C:\Python27\lib\site-packages\PIL\Image.py", line 1952, in open
fp = __builtin__.open(fp, "rb")
IOError: [Errno 2] No such file or directory: 'C:/Documents and Settings/Administrator/Desktop/attachments/R1PNDTCB.jpg'
Can someone please explain what I am doing wrong here.
Renamed the image file.Shifted the python file and the images to a new folder. Shifted the folder to E drive
Now the code is as follows:
from pytesser import *
import Image
import os
i=os.path.join("E:\\","ocr","a.jpg")
print i
im = Image.open(i.strip())
text = image_to_string(im)
print text
Now the error is as follows:
E:\ocr\a.jpg
Traceback (most recent call last):
File "or.py", line 8, in <module>
text = image_to_string(im)
File "C:\Python27\lib\pytesser.py", line 31, in image_to_string
call_tesseract(scratch_image_name, scratch_text_name_root)
File "C:\Python27\lib\pytesser.py", line 21, in call_tesseract
proc = subprocess.Popen(args)
File "C:\Python27\lib\subprocess.py", line 679, in __init__
errread, errwrite)
File "C:\Python27\lib\subprocess.py", line 893, in _execute_child
startupinfo)
WindowsError: [Error 2] The system cannot find the file specified
You need to install Tesseract first. Just installing pytesseract is not enough. Then edit the tesseract_cmd variable in pytesseract.py to point the the tessseract binary. For example, in my installation I set it to
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
The exception is pretty clear: the file either doesn't exist, or you lack sufficient permissions to access it. If neither is the case, please provide evidence (e.g. relevant dir commands with output, run as the same user).
your image path maybe?
i="C:\\Documents and Settings\\Administrator\\Desktop\\attachments\\R1PNDTCB.jpg"
try this:
import os
os.path.join("C:\\", "Documents and Settings", "Administrator")
you should get a string similar to the one in the previous line
Try this first:
os.path.expanduser('~/Desktop/attachments/R1PNDTCB.jpg')
It could be that the space in the 'Documents and Settings' is causing this problem.
EDIT:
Use os.path.join so it uses the correct directory separator.
Just add these two lines in your code
import OS
os.chdir('C:\Python27\Lib\site-packages\pytesser')
before
from pytesser import *
If you are using pytesseract, you have to make sure that you have installed Tesseract-OCR in your system. After that you have to insert the path of the tesseract in your code, as below
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract
OCR/tesseract'
You can download the Tesseract-OCR form https://github.com/UB-Mannheim/tesseract/wiki
Related
I need to merge both audio and video using ffmpeg in python 3.7,could you help me to solve this issue?
Here i used code :
from pathlib import Path
glob_path = Path(r"D:\t")
file_name = [str(pp) for pp in glob_path.glob(r'**/*.mp4')]
print(file_name)
video_stream = ffmpeg.input(file_name[0])
audio_stream = ffmpeg.input(file_name[1])
print(video_stream)
print(audio_stream)
ffmpeg.output(audio_stream, video_stream, 'out.mp4').run()
I getting this out put error
['D:\\t\\audio.mp4', 'D:\\t\\video.mp4']
input(filename='D:\\t\\audio.mp4')[None] <b0ebefda3dca>
input(filename='D:\\t\\video.mp4')[None] <388c87446fd1>
Traceback (most recent call last):
File "C:\Users\LENOVO\PycharmProjects\calc_new\udownlod.py", line 18, in <module>
ffmpeg.output(audio_stream, video_stream, 'out.mp4').run()
File "C:\Users\LENOVO\PycharmProjects\calc_new\venv\lib\site-packages\ffmpeg\_run.py", line 320, in run
overwrite_output=overwrite_output,
File "C:\Users\LENOVO\PycharmProjects\calc_new\venv\lib\site-packages\ffmpeg\_run.py", line 285, in run_async
args, stdin=stdin_stream, stdout=stdout_stream, stderr=stderr_stream
File "C:\Users\LENOVO\Anaconda3\lib\subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "C:\Users\LENOVO\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Looks like one of the files cannot be found. The best thing to do is to check if both files exist at the specified locations.
FileNotFoundError: [WinError 2] The system cannot find the file specified
Even though it was shown path error, that error was in installation package, i used
conda install -c menpo ffmpeg
in https://anaconda.org/menpo/ffmpeg , solved my issue
My code is as follows:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = 'B:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
img = Image.open("sample.png")
text = pytesseract.image_to_string(img, lang="eng")
print(text)
The error I get is:
Traceback (most recent call last):
File "C:/PY/tesseract test.py", line 11, in <module>
text = pytesseract.image_to_string(img, lang="eng")
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 346, in image_to_string
return {
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 349, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 260, in run_and_get_output
run_tesseract(**kwargs)
File "C:\PY\lib\site-packages\pytesseract\pytesseract.py", line 236, in run_tesseract
raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I have tried searching for other solutions but cannot find anything
I'm not familiar with tesseract in Python, but you may need to load the eng.traineddata binary in order to make it work. Add a TESSDATA_PREFIX to your environment variables and point it to the folder where the binary is located.
You may want to at this answer, looks kind similar to your case: pytesseract Failed loading language \'eng\'
I fixed this issue by uninstalling tesseract and installing an older version (3.0.2). So far I haven't noticed any functionality loss. I'm personally just happy that it works.
img = printscreen_pil
img = img.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(2)
img = img.convert('1')
img.save('temp.jpg')
text = pytesseract.image_to_string(Image.open('temp.jpg'))
I want to read the image in order to convert it to text but i get the error system cannot find the file specified. I think it has to do with the working directory of the python. I'm sorry if this is a stupid question but I hope you can help me.
this is the complete error mssg.
Traceback (most recent call last):
File "C:\Users\pncor\Documents\pyprograms\bot.py", line 23, in <module>
text = pytesseract.image_to_string(Image.open('temp.jpg'))
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\pncor\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 990, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
The tesseract package does not seem to be installed on your system, or it is not found on your path. pytesseract runs the tesseract binary as a sub process in order to perform the OCR.
Use the package manager on your OS to install it, or refer the the installation documentation. You are using Windows so check this out.
Also I don't think that it is necessary to write the enhanced image to file first, just pass it directly to pytesseract.image_to_string:
text = pytesseract.image_to_string(img)
I'm having trouble using Tesseract-OCR with the pytesseract Python wrapper.
I figured that the problem might come from Tesseract itself, not from the wrapper.
So I tried Tesseract in CMD :
C:\Users\Thomas\Desktop>tesseract.exe 'blabla.jpg' 'out.txt'
And it returned the following lines :
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Error in fopenReadStream: file not found
Error in findFileFormat: image file not found
Error during processing.
I've done the following to install Tesseract :
Installing from there : https://github.com/UB-Mannheim/tesseract/wiki
Adding the path of tesseract.exe to the PATH environment variable
And by the way, the problem I'm having where running my Python code :
from PIL import Image
import pytesseract
text = pytesseract.image_to_string(Image.open('blabla.jpg')
print(text)
is :
Traceback (most recent call last):
File "<ipython-input-1-01e77f902509>", line 1, in <module>
runfile('D:/anaconda/projects/OCR/ocr.py', wdir='D:/anaconda/projects/OCR')
File "D:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
execfile(filename, namespace)
File "D:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/anaconda/projects/OCR/ocr.py", line 48, in <module>
text = pytesseract.image_to_string(a)
File "D:\anaconda\lib\site-packages\pytesseract\pytesseract.py", line 122, in image_to_string
config=config)
File "D:\anaconda\lib\site-packages\pytesseract\pytesseract.py", line 46, in run_tesseract
proc = subprocess.Popen(command, stderr=subprocess.PIPE)
File "D:\anaconda\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "D:\anaconda\lib\subprocess.py", line 990, in _execute_child
startupinfo)
PermissionError: [WinError 5] Access refused
Running the code as Administrator doesn't solve the problem
Thanks a lot for your help !
Firstly, to verify tesseract works or not from Windows command prompt, use " " instead of ' ' if the image and/or output file name consists of space. Otherwise quote symbol is not needed.
C:\Users\Thomas\Desktop>tesseract.exe blabla.jpg out.txt
Secondly, use full file path to specifc the image file. Such as,
pytesseract.pytesseract.tesseract_cmd = 'C:/path/to/tesseract.exe'
text = pytesseract.image_to_string(Image.open('D:/path/to/blabla.jpg'))
Note that forward slash / is used to specific any file path instead of backslash \ , or you use double backslash \\, e.g. 'D:\\path\\to\\blabla.jpg'.
Hope this help.
I have ffmpeg and ffprobe installed on my mac (macOS Sierra), and I have added their path to PATH. I can run them from terminal.
I am trying to use ffprobe to get the width and height of a video file using the following code:
import subprocess
import shlex
import json
# function to find the resolution of the input video file
def findVideoResolution(pathToInputVideo):
cmd = "ffprobe -v quiet -print_format json -show_streams"
args = shlex.split(cmd)
args.append(pathToInputVideo)
# run the ffprobe process, decode stdout into utf-8 & convert to JSON
ffprobeOutput = subprocess.check_output(args).decode('utf-8')
ffprobeOutput = json.loads(ffprobeOutput)
# find height and width
height = ffprobeOutput['streams'][0]['height']
width = ffprobeOutput['streams'][0]['width']
return height, width
h, w = findVideoResolution("/Users/tomburrows/Documents/qfpics/user1/order1/movie.mov")
print(h, w)
I am sorry I cannot provide a MCVE, as I didn't write this code, and I don't really know how it works.
It gives the following error:
Traceback (most recent call last):
File "/Users/tomburrows/Dropbox/Moviepy Tests/get_dimensions.py", line 21, in <module>
h, w = findVideoResolution("/Users/tomburrows/Documents/qfpics/user1/order1/movie.mov")
File "/Users/tomburrows/Dropbox/Moviepy Tests/get_dimensions.py", line 12, in findVideoResolution
ffprobeOutput = subprocess.check_output(args).decode('utf-8')
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 693, in run
with Popen(*popenargs, **kwargs) as process:
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe'
If python is not reading from the PATH file, how can I specify where ffprobe is?
Edit:
It appears the python path is not aligned with my shell path.
Using os.environ["PATH"]+=":/the_path/of/ffprobe/dir" at the beginning of each program allows me to use ffprobe, but why might my python path not be the same as my shell path?
You may use
import os
print os.environ['PATH']
to verify/validate that ffprobe is in your python environment. According to the error you have, it is likely not.