PDF merging in Python 3 - python

Is there a working PDF manipulation module for Python 3? I've tried Pypdf, but it glitches out when I try to install with PIP. I'd like to merge PDF files. If I use Pypdf2, I get the following message using this code:
from pypdf2 import merger, PdfFileReader
with open('test1.pdf', 'rb') as f:
with open('test2.pdf', 'rb') as f2:
merger = PdfFileMerger()
merger.merge(position=0, fileobj=f2)
merger.merge(position=0, fileobj=f)
merger.write(open("test_out.pdf", 'wb'))
"File "c:\...merger.py", line 97, in merge
elif type(fileobj) == file:
NameError: global name 'file' is not defined"
Line 97 of merger.py is:
elif type(fileobj) == file:
I get similar errors in my own code when using code such as
input1 = PdfFileReader(file("document1.pdf", "rb")) - that's a copy and paste from http://www.blog.pythonlibrary.org/2012/07/11/pypdf2-the-new-fork-of-pypdf/

It seems that is a bug in PyPDF2... file is gone in python3, that's why you get an error here.
A quick fix would be to add this to the imports in merger.py:
from io import FileIO as file

Related

ImportError: cannot import name 'COMError' from '_ctypes' (/usr/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so)

import os
import glob
import comtypes.client
from PyPDF2 import PdfFileMerger
def docxs_to_pdf():
"""Converts all word files in pdfs and append them to pdfslist"""
word = comtypes.client.CreateObject('Word.Application')
pdfslist = PdfFileMerger()
x = 0
for f in glob.glob("*.docx"):
input_file = os.path.abspath(f)
output_file = os.path.abspath("demo" + str(x) + ".pdf")
# loads each word document
doc = word.Documents.Open(input_file)
doc.SaveAs(output_file, FileFormat=16+1)
doc.Close() # Closes the document, not the application
pdfslist.append(open(output_file, 'rb'))
x += 1
word.Quit()
return pdfslist
def joinpdf(pdfs):
"""Unite all pdfs"""
with open("result.pdf", "wb") as result_pdf:
pdfs.write(result_pdf)
def main():
"""docxs to pdfs: Open Word, create pdfs, close word, unite pdfs"""
pdfs = docxs_to_pdf()
joinpdf(pdfs)
main()
I am using jupyter notebook and it throw an error what should I do :
this is error message
I am going to convert many .doc file to one pdf. Help me I am beginner in this field.
Make sure you have all the dependencies installed in your environment. You can use pip to install comtypes.client, simply pass this in your terminal:
pip install comtypes
You can download _ctypes from sourceforge:
https://sourceforge.net/projects/ctypes/files/ctypes/1.0.2/ctypes-1.0.2.tar.gz/download?use_mirror=deac-fra
Using docx2pdf does seem easier for your task though. After you converted the files you can use PyPDF2 to append them.

Issue writting to file with pyinstaller

So an update, I found my compile issue was that I needed to change my notebook to a py file and choosing save as doesn't do that. So I had to run a different script turn my notebook to a py file. And part of my exe issue was I was using the fopen command that apparently isn't useable when compiled into a exe. So I redid the code to what is above. But now I get a write error when trying to run the script. I can not find anything on write functions with os is there somewhere else I should look?
Original code:
import requests
import json
import pandas as pd
import csv
from pathlib import Path
response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()
json2 = json.dumps(response)
f = open('data.json', 'r+')
f.write(json2)
f.close()
Path altered code:
import requests
import json
import pandas as pd
import csv
from pathlib import Path
response = requests.get('url', headers={'CERT': 'cert'}, stream=True).json()
json2 = json.dumps(response)
filename = 'data.json'
if '_MEIPASS2' in os.environ:
filename = os.path.join(os.environ['_MEIPASS2'], filename)
fd = open(filename, 'r+')
fd.write(json2)
fd.close()
The changes to the code allowed me to get past the fopen issue but created a write issue. Any ideas?
If you want to write to a file, you have to open it as writable.
fd = open(filename, 'wb')
Although I don't know why you're opening it in binary if you're writing text.

Saving urlretrieve in to zip file python

Trying to save a xml file to a into a zip file, but I'm getting an error with the directory. I have the following code:
if not os.path.exists(log_file_path):
os.makedirs(log_file_path)
for s in xml_list:
parent_file = zipfile.ZipFile(zip_file_name, "w")
urllib.urlretrieve(log_repository_url + "/r.xml", zip_file_name + "\\r.xml")
parent_file.close()
The error is saying I don't have r.xml in the zip file. Shouldn't this code create the .xml file and write to it? If not, how should I proceed?
Thank you!
Question: The error is saying I don't have r.xml in the zip file.
You have to write it to your ZipFile, for instance:
with ZipFile(zip_file_name, 'w') as myzip:
local_filename, headers =
urllib.request.urlretrieve(log_repository_url + "/r.xml")
myzip.write(local_filename, arcname="r.xml")
Python ยป 3.6.1 Documentation:
urllib.request.urlretrieve
zipfile.ZipFile.write

Python: Creating and writing to a file error

I am having trouble creating and writing to a text file in Python. I am running Python 3.5.1 and have the following code to try and create and write to a file:
from os import *
custom_path = "MyDirectory/"
if not path.exists(custom_path)
mkdir(custom_path)
text_path = custom_path + "MyTextFile.txt"
text_file = open(text_path, "w")
text_file.write("my text")
But I get a TypeError saying an integer is required (got type str) at the line text_file = open(text_path, "w").
I don't know what I'm doing wrong as my code is just about identical to that of several tutorial sites showing how to create and write to files.
Also, does the above code create the text file if it doesn't exist, and if not how do I create it?
Please don't import everything from os module:
from os import path, mkdir
custom_path = "MyDirectory/"
if not path.exists(custom_path):
mkdir(custom_path)
text_path = custom_path + "MyTextFile.txt"
text_file = open(text_path, 'w')
text_file.write("my text")
Because there also a "open" method in os module which will overwrite the native file "open" method.

Open a protected pdf file in python

I write a pdf cracking and found the password of the protected pdf file. I want to write a program in Python that can display that pdf file on the screen without password.I use the PyPDF library.
I know how to open a file without the password, but can't figure out the protected one.Any idea? Thanks
filePath = raw_input()
password = 'abc'
if sys.platform.startswith('linux'):
subprocess.call(["xdg-open", filePath])
The approach shown by KL84 basically works, but the code is not correct (it writes the output file for each page). A cleaned up version is here:
https://gist.github.com/bzamecnik/1abb64affb21322256f1c4ebbb59a364
# Decrypt password-protected PDF in Python.
#
# Requirements:
# pip install PyPDF2
from PyPDF2 import PdfFileReader, PdfFileWriter
def decrypt_pdf(input_path, output_path, password):
with open(input_path, 'rb') as input_file, \
open(output_path, 'wb') as output_file:
reader = PdfFileReader(input_file)
reader.decrypt(password)
writer = PdfFileWriter()
for i in range(reader.getNumPages()):
writer.addPage(reader.getPage(i))
writer.write(output_file)
if __name__ == '__main__':
# example usage:
decrypt_pdf('encrypted.pdf', 'decrypted.pdf', 'secret_password')
You should use pikepdf library nowadays instead:
import pikepdf
with pikepdf.open("input.pdf", password="abc") as pdf:
num_pages = len(pdf.pages)
print("Total pages:", num_pages)
PyPDF2 doesn't support many encryption algorithms, pikepdf seems to solve them, it supports most of password protected methods, and also documented and actively maintained.
You can use pdfplumber library. Super easy to use and reads machine written pdf files seamlessly, better than any other library i have used.
import pdfplumber
with pdfplumber.open(r'D:\examplepdf.pdf' , password = 'abc') as pdf:
first_page = pdf.pages[0]
print(first_page.extract_text())
I have the answer for this question. Basically, the PyPDF2 library needs to install and use in order to get this idea working.
#When you have the password = abc you have to call the function decrypt in PyPDF to decrypt the pdf file
filePath = raw_input("Enter pdf file path: ")
f = PdfFileReader(file(filePath, "rb"))
output = PdfFileWriter()
f.decrypt ('abc')
# Copy the pages in the encrypted pdf to unencrypted pdf with name noPassPDF.pdf
for pageNumber in range (0, f.getNumPages()):
output.addPage(f.getPage(pageNumber))
# write "output" to noPassPDF.pdf
outputStream = file("noPassPDF.pdf", "wb")
output.write(outputStream)
outputStream.close()
#Open the file now
if sys.platform.startswith('darwin'):#open in MAC OX
subprocess.call(["open", "noPassPDF.pdf"])

Categories