Textract: failed with exit code 127 // windows 10 // pdftotext

When I run my program (after packaging it with PyInstaller), which reads and converts a PDF file and enters the result into a Google Sheet, I get the error shown below. I cannot figure out what the problem is:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 82, in run
pipe = subprocess.Popen(
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\tkinter\__init__.py", line 1883, in __call__
return self.func(*args)
File "EinkaufRGWindows.py", line 40, in InkoopRekeningen
text = textract.process(str(importfolder) + str(i))
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\__init__.py", line 77, in process
return parser.process(filename, encoding, **kwargs)
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 46, in process
byte_string = self.extract(filename, **kwargs)
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\pdf_parser.py", line 28, in extract
raise ex
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\pdf_parser.py", line 20, in extract
return self.extract_pdftotext(filename, **kwargs)
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\pdf_parser.py", line 43, in extract_pdftotext
stdout, _ = self.run(args)
File "C:\Users\trpfinance\AppData\Local\Programs\Python\Python38-32\lib\site-packages\textract\parsers\utils.py", line 90, in run
raise exceptions.ShellError(
textract.exceptions.ShellError: The command `pdftotext //Mac/Home/Desktop/Wickey Einkauf Test/Rekeningen/Lekkerkerker_ - 20803471.pdf -` failed with exit code 127
------------- stdout -------------
------------- stderr -------------

It seems you're getting a FileNotFoundError. If you look at the error, the command being run is:
pdftotext //Mac/Home/Desktop/Wickey Einkauf Test/Rekeningen/Lekkerkerker_ - 20803471.pdf -
There are a couple of things I would look at here. Firstly, there is an extra slash at the start of your file path, which seems wrong. Secondly, you have spaces in the file path, but there are no quotation marks enclosing the path. This means pdftotext will read it as several separate command arguments rather than one. You can fix this by formatting your subprocess call so that the file path is wrapped in quotation marks, like so:
pdftotext "example file path.pdf" -
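When the command is launched from Python rather than typed in a shell, an alternative to manual quoting is to pass the arguments as a list, so that each element should reach the program as a single argument even if it contains spaces. A minimal sketch of that idea follows. It uses the Python interpreter itself as a stand-in child process, and count_received_arguments is a made-up helper name, not part of textract:

```python
import subprocess
import sys


def count_received_arguments(*arguments):
    """Launch a child Python process and report how many arguments it received."""
    child_code = "import sys; print(len(sys.argv) - 1)"
    # Passing a list (not one big string) keeps each element as one argument,
    # so a path containing spaces does not need manual quoting.
    command = [sys.executable, "-c", child_code, *arguments]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return int(completed.stdout.strip())


if __name__ == "__main__":
    # The path with spaces and the trailing "-" arrive as two arguments.
    print(count_received_arguments("example file path.pdf", "-"))
```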

You need to install pdftotext using pip.
To install it you need to have Microsoft Visual C++ 14 or greater.
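The first traceback in the question ends in a FileNotFoundError raised by subprocess.Popen, which is what Python raises when the executable itself cannot be located. Whichever way pdftotext gets installed, a quick check along these lines may help confirm whether Python can see it. This is only a sketch, and describe_tool is a hypothetical helper, not something textract provides:

```python
import shutil


def describe_tool(name):
    """Return a short message saying whether `name` can be found on PATH."""
    location = shutil.which(name)
    if location is None:
        return f"{name} was not found on PATH"
    return f"{name} found at {location}"


if __name__ == "__main__":
    # Run this from the same environment (terminal, IDE, packaged app)
    # that fails, since each may see a different PATH.
    print(describe_tool("pdftotext"))
```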

I had the same issue. It seems to be an OS issue. For me, switching to Git Bash worked. https://github.com/deanmalmgren/textract/issues/229
If you are using PyCharm, change the default terminal to bash.

Related

Why do I get the following error when I try to import oct2py?

The error occurs when I try to import the oct2py package. Here's my code:
import oct2py
Here's the error that I get:
Traceback (most recent call last):
File "c:\Users\samke\___\___\___\test.py", line 1, in <module> # I blanked out the path for privacy
import oct2py
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\oct2py\__init__.py", line 38, in <module>
octave = Oct2Py()
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\oct2py\core.py", line 83, in __init__
self.restart()
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\oct2py\core.py", line 533, in restart
self._engine = OctaveEngine(stdin_handler=self._handle_stdin,
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\octave_kernel\kernel.py", line 176, in __init__
self.repl = self._create_repl()
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\octave_kernel\kernel.py", line 402, in _create_repl
repl = REPLWrapper(cmd, orig_prompt, change_prompt,
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\metakernel\replwrap.py", line 61, in __init__
self.child = pexpect.spawnu(cmd_or_spawn, echo=echo,
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\metakernel\pexpect.py", line 29, in spawn
child = PopenSpawn(command, timeout=timeout, maxread=maxread,
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\site-packages\pexpect\popen_spawn.py", line 53, in __init__
self.proc = subprocess.Popen(cmd, **kwargs)
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 969, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\samke\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1438, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid Win32 application
PS C:\Users\samke\Dev\Trading_Program\ML>
If it helps, I'm running python 3.10.5.
Thanks all!
Edit: I do have octave installed, and octave is on my path, as the oct2py documentation says is necessary.

I had a similar problem and spent half a day looking for a solution. Your problem might be similar to mine, so I will explain my situation and the solution; hopefully it helps solve your issue as well.
1. Download octave-7.1.0-w64.zip and extract it to a path such as C:\octave-7.1.0-w64.
2. Add C:\octave-7.1.0-w64\mingw64\bin to the PATH environment variable (run "Edit environment variables for your account" and edit the Path field there). The reason is that octave.exe is located in this folder, and this is what we need to run Octave in console mode.
3. Open the command prompt (CMD) and type octave (i.e. call octave.exe). At this point, I got an error message, and this is what caused the error when calling Octave from oct2py.
4. To fix this, go to the Octave main folder C:\octave-7.1.0-w64 and run the script post-install.bat. This adds and updates Octave packages.
5. Once the post-install script is done, run octave again from the command line (step 3). If everything went correctly, the Octave console will start and show the prompt octave:1>. This indicates that Octave is ready to be called from oct2py.
6. Run Python and execute import oct2py. It executed successfully.
I repeated the same procedure on two different devices and it worked.
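The command-line check described above can also be done from Python, which may be convenient because it exercises the same subprocess machinery that fails in the question's traceback. The following is a rough sketch; try_launch is a hypothetical helper and not part of oct2py:

```python
import subprocess
import sys


def try_launch(command):
    """Try to run `command` (a list of strings); return (ok, detail)."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True, timeout=60)
    except OSError as error:
        # OSError covers both "file not found" and
        # "not a valid Win32 application" style failures.
        return False, str(error)
    except subprocess.TimeoutExpired:
        return False, "timed out"
    return completed.returncode == 0, (completed.stdout or completed.stderr).strip()


if __name__ == "__main__":
    # Baseline: the current interpreter should always be launchable.
    print(try_launch([sys.executable, "--version"]))
    # The actual check: can the octave command be started at all?
    print(try_launch(["octave", "--version"]))
```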

Constantly returning 'file not found' when trying to retrieve .jpg EXIF data

I am trying to use PyExifTool to examine the EXIF data of given images. I am using an almost identical copy of what can be found in the github support documents and another StackOverflow resource.
This is what my code looks like:
import exiftool
import os, errno
files = ["/Users/username/Pictures/testimage.jpg"]
with exiftool.ExifTool() as et:
    metadata = et.get_tag('DateCreated', files)
print(metadata)
I then receive quite the error message. As I am very new to coding I am struggling to troubleshoot this response.
Traceback (most recent call last):
File "/Users/username/Documents/test.py", line 7, in <module>
with exiftool.ExifTool() as et:
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 191, in __enter__
self.start()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 170, in start
self._process = subprocess.Popen(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'exiftool'
Presumably, the key piece of information here is:
FileNotFoundError: [Errno 2] No such file or directory: 'exiftool'
Except that I don't really understand what I am being told? ExifTool is supposed to be a tool, I am not intentionally trying to access it as a file or directory.
I can be certain ExifTool is correctly installed because if in command line I run...
#bash:~
$ exiftool /Users/username/Pictures/testimage.jpg
...ExifTool returns all the Exif data for my test image, including an intact 'Date Created' field. Incidentally, this also confirms that my path is correct.
Edit 1: Additional information as per the request of tripleee
(Excuse my ignorance if and when I give the wrong information)
- How do I run the script?
-> I just have that snippet of code in a .py document and then it runs in my idle shell?
- Running type -all exiftool in Bash
exiftool is /usr/local/bin/exiftool
exiftool is /usr/local/bin/exiftool
(yes it does give the same thing twice)
- Running print(os.environ['PATH']) in IDLE
/usr/bin:/bin:/usr/sbin:/sbin
Edit 2: Additional troubleshooting off the back of Tripleee's queries
- Running type -all exiftool in Bash gave two paths.
exiftool is /usr/local/bin/exiftool
exiftool is /usr/local/bin/exiftool
--> I used some code I found elsewhere to try and fix this.
PATH=$(printf "%s" "$PATH" | awk -v RS=':' '!a[$1]++ { if (NR > 1) printf RS; printf $1 }')
Now running type -all exiftool in Bash gives only...
exiftool is /usr/local/bin/exiftool
Until I open a new terminal window/tab, at which point the problem returns.
- How do I run the script?
Well now I open terminal and input IDLE3 so that it is opened from prompt, not via the icon on my hotbar.
This gives a new error...
Traceback (most recent call last):
File "/Users/username/Documents/test.py", line 9, in <module>
metadata = et.get_tag('Date Created', files)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 325, in get_tag
return self.get_tag_batch(tag, [filename])[0]
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 312, in get_tag_batch
data = self.get_tags_batch([tag], filenames)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 291, in get_tags_batch
return self.execute_json(*params)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 252, in execute_json
return json.loads(self.execute(b"-j", *params).decode("utf-8"))
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/exiftool.py", line 108, in fsencode
return filename.encode(encoding, errors)
AttributeError: 'list' object has no attribute 'encode'
The key difference being it ends:
AttributeError: 'list' object has no attribute 'encode'
Edit 3: Additional troubleshooting off the back of Tripleee's queries
- Discovered I had added nano .bash_profile to the end of my .bash_profile.
--> This has been removed
- Running print(os.environ['PATH']) in IDLE but now from prompt using IDLE3
/Library/Frameworks/Python.framework/Versions/3.8/bin:/usr/local/fsl/bin:/usr/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin~/.bash_profile

Error while performing OCR using pytesseract

I want to use pytesseract. This is my code:
import pytesseract
from pdf2image import convert_from_path
PDF_file = 'file.pdf'
text = ''
pages = convert_from_path(PDF_file, 500)
pageText = str(((pytesseract.image_to_string(pages[0]))))
and as a result I get this error:
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 409, in pdfinfo_from_path
proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\user\Desktop\projects\pdfparser\pdftest.py", line 13, in <module>
pages = convert_from_path(PDF_file, 500)
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 89, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "C:\Users\user\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pdf2image\pdf2image.py", line 430, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

As a lot of comments already pointed out, the error message
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?
tells you precisely what went wrong: Poppler is not installed, or it is not on your PATH. Please refer to the pdf2image README for help with that.
You see, pdf2image is only a wrapper around the pdftoppm command-line utility, which is part of Poppler. On many Linux distributions it is installed by default, so you would not need to bother with it, but on Windows it is not.
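If Poppler has been unpacked somewhere that is not on PATH, one option is to locate its bin folder and hand it to pdf2image directly. The sketch below only does the locating part with the standard library. find_poppler_bin and the example folders are assumptions for illustration, not real install locations:

```python
import os


def find_poppler_bin(candidate_dirs):
    """Return the first directory that contains a pdftoppm executable, else None."""
    for directory in candidate_dirs:
        for filename in ("pdftoppm.exe", "pdftoppm"):
            if os.path.isfile(os.path.join(directory, filename)):
                return directory
    return None


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever Poppler was actually unpacked.
    print(find_poppler_bin([r"C:\poppler\Library\bin", r"C:\poppler\bin"]))
```

If a directory is found, it can be passed through the poppler_path parameter that appears in the traceback above, for example convert_from_path(PDF_file, 500, poppler_path=found_dir).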

textract doesn't work on pdf

I'm new to Python. I'm using PyCharm 2018.2 and the latest version of Anaconda on Windows 10.
After solving all the problems with installing textract on Windows 10, I got a successful installation using the Anaconda Prompt. In addition, I have set the project interpreter to \continuum\anaconda3\python.exe.
My goal is to extract the text from large PDF files and save it as a .txt file.
I have tried the test_pdf.py files from textract, but they don't work.
Here is the resulting message:
"textract" is misspelled or could not be found (my own translation from German :-/)
So I tried my own code, following the textract page, but it doesn't work either:
Code:
import textract
text = textract.process('pfad/large.pdf')
Results:
C:\Users\raz\AppData\Local\Continuum\anaconda3\python.exe "C:/Users/raz/Google Drive/FOM/Master/Master/NurText/Testo.py"
Traceback (most recent call last):
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\utils.py", line 85, in run
stdout=subprocess.PIPE, stderr=subprocess.PIPE,
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 997, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/raz/Google Drive/FOM/Master/Master/NurText/Testo.py", line 2, in <module>
text = textract.process('pfad/large.pdf')
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\__init__.py", line 77, in process
return parser.process(filename, encoding, **kwargs)
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\utils.py", line 46, in process
byte_string = self.extract(filename, **kwargs)
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\pdf_parser.py", line 28, in extract
raise ex
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\pdf_parser.py", line 20, in extract
return self.extract_pdftotext(filename, **kwargs)
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\pdf_parser.py", line 43, in extract_pdftotext
stdout, _ = self.run(args)
File "C:\Users\raz\AppData\Local\Continuum\anaconda3\lib\site-packages\textract-1.6.1-py3.6.egg\textract\parsers\utils.py", line 92, in run
' '.join(args), 127, '', '',
textract.exceptions.ShellError: The command pdftotext pfad/large.pdf - failed with exit code 127
------------- stdout -------------
------------- stderr -------------
Thanks for your help

Compiling and Executing Java file in python

How can I compile and run a Java file from Python? I've searched the net and found this:
import os.path, subprocess
from subprocess import STDOUT, PIPE
def compile_java (java_file):
    subprocess.check_call(['javac', java_file])
def execute_java (java_file):
    cmd=['java', java_file]
    proc=subprocess.Popen(cmd, stdout = PIPE, stderr = STDOUT)
    input = subprocess.Popen(cmd, stdin = PIPE)
    print(proc.stdout.read())
compile_java("CsMain.java")
execute_java("CsMain")
but then I got this error:
Traceback (most recent call last):
File "C:\Python33\lib\subprocess.py", line 1106, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\casestudy\opener.py", line 13, in <module>
compile_java("CsMain.java")
File "C:\casestudy\opener.py", line 5, in compile_java
subprocess.check_call(['javac', java_file])
File "C:\Python33\lib\subprocess.py", line 539, in check_call
retcode = call(*popenargs, **kwargs)
File "C:\Python33\lib\subprocess.py", line 520, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Python33\lib\subprocess.py", line 820, in __init__
restore_signals, start_new_session)
File "C:\Python33\lib\subprocess.py", line 1112, in _execute_child
raise WindowsError(*e.args)
FileNotFoundError: [WinError 2] The system cannot find the file specified
>>>
The Python file and the Java file are in the same folder, and I am using Python 3.3.2. How can I resolve this? Or do you have another way of doing this? Any answer is appreciated, thanks!

I think it isn't recognizing the javac command. Try manually running the command and if javac isn't a recognized command, register it in your PATH variable and try again.
Or you could just try typing the full pathname to the Java directory for javac and java.
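Both suggestions can be combined in a small lookup: try PATH first, then fall back to a full path built from the JDK folder. The sketch below assumes the JDK location is available in a JAVA_HOME environment variable, which is a common convention but is not mentioned in the question. resolve_java_tool is a hypothetical helper:

```python
import os
import shutil


def resolve_java_tool(name, java_home=None):
    """Return a full path to a JDK tool such as 'javac', or None if not found."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    # Assumption: the JDK folder is given explicitly or via JAVA_HOME.
    java_home = java_home or os.environ.get("JAVA_HOME")
    if java_home:
        # shutil.which accepts an explicit search path; on Windows it also
        # tries the usual executable extensions such as .exe.
        return shutil.which(name, path=os.path.join(java_home, "bin"))
    return None


if __name__ == "__main__":
    print(resolve_java_tool("javac"))
```

If the result is not None, it can replace the bare 'javac' in the call from the question, e.g. subprocess.check_call([javac_path, java_file]).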

You need to add the path to your Java file name. Use a raw string so the backslashes are not treated as escape sequences, like this:
compile_java(r"C:\path\to\this\CsMain.java")
