Unexpected error with pdf2image while using a python loop

I'm using the pdf2image module to convert a list of .ai files into .png files in a Python loop. The first .ai file converts successfully, producing one .png file per page, but the conversion seems to break on the second .ai file.
Here's the code:
import os
from pdf2image import convert_from_path
directory = '/Users/jacobpatty/vscode_projects/badger_colors/test_ai_work_orders'
def ai_to_png(ai_file):
    convert_from_path(
        ai_file,
        dpi=200,
        output_folder='/Users/jacobpatty/vscode_projects/badger_colors/ai_to_ping_temp_storage',
        fmt='png'
    )
def end_loop(database):
    for filename in os.scandir(database):
        ai_to_png(filename)
        print(filename)
end_loop(directory)
and here's the error
<DirEntry '8904_Heather_B_12-6-17.ai'>
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 479, in pdfinfo_from_path
    raise ValueError
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/jacobpatty/vscode_projects/badger_colors/TS_pdf2image.py", line 21, in <module>
    end_loop(directory)
  File "/Users/jacobpatty/vscode_projects/badger_colors/TS_pdf2image.py", line 17, in end_loop
    ai_to_png(filename)
  File "/Users/jacobpatty/vscode_projects/badger_colors/TS_pdf2image.py", line 7, in ai_to_png
    convert_from_path(
  File "/opt/homebrew/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/opt/homebrew/lib/python3.9/site-packages/pdf2image/pdf2image.py", line 488, in pdfinfo_from_path
    raise PDFPageCountError(
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
I am very new to programming, but it seems odd to me that a line of code would work for one iteration but not the next. To fix it I tried using different ai files, but nothing changed and the same error occurred. I tried uninstalling and reinstalling pdf2image but that also didn't help.
Any ideas?
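No answer is recorded for this question, so what follows is only a guess. Poppler's "May not be a PDF file" warning suggests that the second directory entry may not be an .ai file at all (a hidden file such as .DS_Store is a common culprit on macOS, though the post does not confirm this). Note also that print(filename) runs after the conversion, so the entry shown in the output is the one that succeeded, not the one that failed. A minimal sketch, assuming that filtering by extension is acceptable; iter_ai_files is a hypothetical helper, not part of pdf2image:

```python
import os

def iter_ai_files(directory):
    """Return paths of regular files in `directory` ending in .ai, sorted by path.

    Hypothetical helper: skips subdirectories and files with other extensions
    (including extensionless hidden files such as .DS_Store).
    """
    paths = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.lower().endswith('.ai'):
            paths.append(entry.path)
    return sorted(paths)

# Usage sketch (needs pdf2image and Poppler, so it is left commented out):
# for path in iter_ai_files(directory):
#     print(path)        # print first, so a failure identifies the file
#     ai_to_png(path)
```

Printing the path before converting makes it clear which file triggers the error, whether or not the filtering guess turns out to be right.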

Related

NFLfastpy installed but won't import

I have installed nflfastpy with pip, but importing it fails. Running only
import nflfastpy
gives this error message:
(pythonCoursera) C:\Users\austi\PycharmProjects\pythonCoursera>python sportsbet.py
Traceback (most recent call last):
  File "C:\Users\austi\PycharmProjects\pythonCoursera\sportsbet.py", line 2, in <module>
    import nflfastpy as nfl
  File "C:\Users\austi\anaconda3\envs\pythonCoursera\lib\site-packages\nflfastpy\__init__.py", line 16, in <module>
    default_headshot = mpl_image.imread(headshot_url)
  File "C:\Users\austi\anaconda3\envs\pythonCoursera\lib\site-packages\matplotlib\image.py", line 1536, in imread
    raise ValueError(
ValueError: Please open the URL for reading and pass the result to Pillow, e.g. with ``np.array(PIL.Image.open(urllib.request.urlopen(url)))``.
That single line is the only code in the file.
I've tried a few versions of it and can't seem to figure it out.
Any suggestions?

That library has a bug and does not seem to be actively maintained, so you are on your own. You can at least avoid that image loading error by commenting out the dead code in nflfastpy/__init__.py, like this:
...
#default_headshot = mpl_image.imread(headshot_url)

`concurrent.futures.ProcessPoolExecutor` on Python is run from the beginning of the file instead of the defined function

I'm having trouble with concurrent.futures. For background, I was doing bulk image manipulation with python-opencv2 and ran into a performance problem: processing only a few hundred images can take hours. I noticed that the job used only about 16% of my 6-core processor, which is roughly a single core, so I turned to concurrent.futures to spread the work across cores. After writing the code, I noticed that each worker process starts executing from the beginning of the file instead of running only the function I defined. Here's a minimal reproduction of the error:
import glob
import concurrent.futures
import cv2
import os
def convert_this(filename):
    ### Read in the image data
    img = cv2.imread(filename)
    ### Resize the image
    res = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    res.save("output/"+filename)
try:
    #create output dir
    os.mkdir("output")
    with concurrent.futures.ProcessPoolExecutor() as executor:
        files = glob.glob("../project/temp/")
        executor.map(convert_this, files)
except Exception as e:
    print("Encountered Error!")
    print(e)
    filelist = glob.glob("output")
    for f in filelist:
        os.remove(f)
    os.rmdir("output")
It gave me an error:
Encountered Error!
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
[WinError 183] Cannot create a file when that file already exists: 'output'
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\<username>\Anaconda3\envs\py37\lib\multiprocessing\spawn.py", line 105, in spawn_main
Encountered Error!
[WinError 183] Cannot create a file when that file already exists: 'output'
Traceback (most recent call last):
File "M:\pythonproject\testfolder\test.py", line 17, in <module>
os.mkdir("output")
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'output'
...
(it was repeating errors of the same "can't create file")
As you can see, os.mkdir was run even though it's outside the convert_this function I defined. I'm not that new to Python, but I am definitely new to multiprocessing and threading. Is this just how concurrent.futures behaves? Or did I miss something in the documentation?
Thanks.

Yes. Multiprocessing must load the file in each new process before it can run the function (just as Python does when you run the file yourself), so all of your top-level code runs again. Either (1) move your multiprocessing code to a separate file with nothing extra in it and call that, or (2) enclose your top-level code in a function (e.g., main()) and write this at the bottom of your file:
if __name__ == "__main__":
    main()
This code runs only when you start the script, not when a spawned worker process loads the file. See the Python docs for details on this construction.
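Option (2) can be sketched as follows. This is a minimal illustration rather than a fix for the image code above: square is a hypothetical stand-in for convert_this, and max_workers=2 is an arbitrary choice.

```python
import concurrent.futures

def square(n):
    """Worker function; this is the only code the child processes need to run."""
    return n * n

def main():
    # Anything with side effects (creating directories, starting the pool)
    # lives here, so a worker that re-imports this file does not repeat it.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(square, [1, 2, 3]))
    print(results)
    return results

if __name__ == "__main__":
    # True only for the process started from the command line,
    # not for worker processes that load this file.
    main()
```

Only the definitions of square and main are executed when a worker loads the file; the pool itself is created once, in the process you started.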

Problem downloading MEGA files with python

I am trying to download a file from my MEGA account using the mega.py library for Python, with the following code:
from mega import Mega
mega = Mega()
m = mega.login('example@example.com', 'example')
file = m.find('example.txt')
m.download(file, 'D:\\Desktop')
However, it always fails with:
Traceback (most recent call last):
  File "D:\Programas\aNaconda\lib\shutil.py", line 788, in move
    os.rename(src, real_dst)
PermissionError: [WinError 32] The file is already being used by another process: 'C:\\Users\\vrida\\AppData\\Local\\Temp\\megapy_xdste432' -> 'example.txt'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<ipython-input-26-c3f75106fafb>", line 1, in <module>
    m.download(file)
  File "D:\Programas\aNaconda\lib\site-packages\mega\mega.py", line 564, in download
    return self._download_file(file_handle=None,
  File "D:\Programas\aNaconda\lib\site-packages\mega\mega.py", line 745, in _download_file
    shutil.move(temp_output_file.name, output_path)
  File "D:\Programas\aNaconda\lib\shutil.py", line 803, in move
    os.unlink(src)
PermissionError: [WinError 32] The file is already being used by another process: 'C:\\Users\\vrida\\AppData\\Local\\Temp\\megapy_example'
When I look in the folder C:\Users\vrida\AppData\Local\Temp, I do find a temporary file that looks like the one I want to download, but it is named megapy_example.
I saw that the following site has a discussion about solving the problem:
https://www.reddit.com/r/learnpython/comments/mw6is2/download_file_from_mega_using_megapy/
It suggests adding the following lines to the code:
try:
    m.download(file, 'D:\\Desktop')
except PermissionError:
    continue
In my case, the continue statement didn't work, so I replaced it with pass. The code runs, but I don't know whether the file is actually saved.
Could someone please help me? I really need to download the files and save them.
If it can't be done with the mega.py library, does anyone know how to download from a public link like this one using Python?
https://mega.co.nz/#!cSZCELDb!5O57KMVMIgrPiH5fnaefWeNPDqoDWzGbY-sZkdTUdNk

There's a bug in the library: it doesn't close the temporary file before moving it. You can fix the bug by editing the source code:
Open the file at D:\Programas\aNaconda\lib\site-packages\mega\mega.py
Go to line 745, where the line shutil.move(temp_output_file.name, output_path) is.
Add temp_output_file.close() right above it.
Save and try again.
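The pattern behind this fix can be sketched with the standard library alone. This is a stand-in for illustration, not mega.py's actual code: save_via_temp is a hypothetical function, and the megapy_ prefix is only borrowed from the temporary file name shown in the traceback.

```python
import os
import shutil
import tempfile

def save_via_temp(data, output_path):
    """Write `data` (bytes) to a temporary file, then move it to `output_path`."""
    # delete=False keeps the file on disk after close() so it can be moved.
    temp_output_file = tempfile.NamedTemporaryFile(
        mode='w+b', prefix='megapy_', delete=False
    )
    try:
        temp_output_file.write(data)
    finally:
        # Close the handle BEFORE moving: on Windows, a file that is still
        # open cannot be renamed or unlinked (WinError 32).
        temp_output_file.close()
    shutil.move(temp_output_file.name, output_path)
    return output_path
```

On Linux and macOS the move usually succeeds even with the handle open, which may be why a bug of this kind shows up only on Windows.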

PyPDF2.utils.PdfReadError: File has not been decrypted

I have been learning PyPDF2 in Python. This is the code from geeksforgeeks.org:
# importing required modules
import PyPDF2
# creating a pdf file object
pdfFileObj = open('English.pdf', 'rb')
# creating a pdf reader object
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
# printing number of pages in pdf file
print(pdfReader.numPages)
# creating a page object
pageObj = pdfReader.getPage(0)
# extracting text from page
print(pageObj.extractText())
# closing the pdf file object
pdfFileObj.close()
After running this program, this error pops up:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\PyPDF2\pdf.py", line 1147, in getNumPages
    self.decrypt('')
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\PyPDF2\pdf.py", line 1987, in decrypt
    return self._decrypt(password)
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\PyPDF2\pdf.py", line 1996, in _decrypt
    raise NotImplementedError("only algorithm code 1 and 2 are supported")
NotImplementedError: only algorithm code 1 and 2 are supported
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "main.py", line 11, in <module>
    print(pdfReader.getNumPages())
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\PyPDF2\pdf.py", line 1150, in getNumPages
    raise utils.PdfReadError("File has not been decrypted")
PyPDF2.utils.PdfReadError: File has not been decrypted
I tried different ways to resolve this, but the error persists. Can you please guide me?

PyPDF2 only supports very old PDF files. As the traceback says, it implements only encryption algorithm codes 1 and 2, so it doesn't support the formats from Acrobat 6. You'll either need to convert the file to an older format or find a different PDF library.
https://github.com/mstamy2/PyPDF2/issues/378

Error reading a pg_dump file occurs when opening the file

I'm using the pgdumplib library. Unfortunately, I get an error when I try to open the file. The file is in the same folder as the Python script. I'm using Python 3.7.
Code:
import pgdumplib
dump = pgdumplib.load('test.dump')
print('Database: {}'.format(dump.toc.dbname))
print('Archive Timestamp: {}'.format(dump.toc.timestamp))
print('Server Version: {}'.format(dump.toc.server_version))
print('Dump Version: {}'.format(dump.toc.dump_version))
for line in dump.table_data('public', 'pgbench_accounts'):
    print(line)
Error:
Traceback (most recent call last):
  File "C:/Users/user/data/test.py", line 3, in <module>
    dump = pgdumplib.load('test.dump')
  File "C:\Users\user\venv\data\lib\site-packages\pgdumplib\__init__.py", line 24, in load
    return dump.Dump(converter=converter).load(filepath)
  File "C:\Users\user\venv\data\lib\site-packages\pgdumplib\dump.py", line 228, in load
    raise ValueError('Path {!r} does not exist'.format(path))
ValueError: Path 'test.dump' does not exist

If you are running your code from C:/Users/user/700Joach/project/ and you have the following line in your script:
dump = pgdumplib.load('test.dump')
then Python will look for test.dump at the following path:
C:/Users/user/700Joach/project/test.dump
That is, load('test.dump') is given a relative path, which is resolved against the directory you run the code from, not the directory that contains the script.
You can resolve the issue in a couple of ways: either move test.dump to the directory from which you are executing your code, or provide an absolute path to test.dump, as follows:
dump = pgdumplib.load('C:/Users/user/700Joach/project/test.dump')
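If the dump file always sits next to the script, the absolute path can be built at run time instead of hard-coded. A minimal sketch using pathlib; path_next_to is a hypothetical helper, and the pgdumplib call is left as a comment so the snippet runs without the library installed:

```python
from pathlib import Path

def path_next_to(script_file, filename):
    """Return the absolute path of `filename` in the same folder as `script_file`."""
    return Path(script_file).resolve().parent / filename

# Usage sketch inside the script itself:
# dump_path = path_next_to(__file__, 'test.dump')
# dump = pgdumplib.load(str(dump_path))
```

This way the script finds test.dump no matter which directory it is launched from.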
