I have hundreds of PDFs that I need to password-protect. I tried to use PyPDF2 for this, but I got an error:
"DependencyError: PyCryptodome is required for AES algorithm".
I searched for other modules such as pikepdf, but I only found how to crack a password with it, not how to set one.
Any ideas on how to deal with this? The error is raised on this line: "input_pdf = PdfFileReader(in_file)"
file = directory + '\\passwords.xlsx'
df = pd.read_excel(file)
df['PDF'] = df.iloc[:,[0]] + '.pdf'
df = df.to_dict('records')
for i in df:
    filename = i['PDF']
    password = i['Password']
    with open(filename, "rb") as in_file:
        input_pdf = PdfFileReader(in_file)
        output_pdf = PdfFileWriter()
        output_pdf.appendPagesFromReader(input_pdf)
        output_pdf.encrypt(password)
        with open(filename, "wb") as out_file:
            output_pdf.write(out_file)
I had the same problem.
You just need to install the PyCryptodome package.
For example:
pip install pycryptodome==3.15.0
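If the error persists after installing, it may be that pip installed the package into a different environment than the one running the script. The sketch below uses only the standard library to check this. It assumes PyCryptodome's import name `Crypto` (the separate pycryptodomex distribution installs as `Cryptodome`), and `module_available` is a hypothetical helper name. The legacy PyCrypto package uses the same import name, so a True result is a hint rather than proof.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path shows which environment pip needs to target.
    print("Interpreter:", sys.executable)
    print("Crypto importable:", module_available("Crypto"))
```

If the second line prints False, running `python -m pip install pycryptodome` with the interpreter shown on the first line should install into the right environment.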
A.) Great way to do it:
https://roytuts.com/how-to-encrypt-pdf-as-password-protected-file-in-python/
import PyPDF2

#pdf_in_file = open("simple.pdf",'rb')
pdf_in_file = open("gre_research_validity_data.pdf",'rb')
inputpdf = PyPDF2.PdfFileReader(pdf_in_file)
pages_no = inputpdf.numPages
output = PyPDF2.PdfFileWriter()
# Copy every page into the writer (the reader only needs to be created once)
for i in range(pages_no):
    output.addPage(inputpdf.getPage(i))
# Encrypt once, after all pages have been added
output.encrypt('password')
#with open("simple_password_protected.pdf", "wb") as outputStream:
with open("gre_research_validity_data_password_protected.pdf", "wb") as outputStream:
    output.write(outputStream)
pdf_in_file.close()
B.) If you want to fix your own bug:
Here is a solution for a similar error message, raised while counting pages: "Not able to find number of pages of PDF using Python 3.X: DependencyError: PyCryptodome is required for AES algorithm"
ORIGINAL CODE
! pip install PyPDF2
! pip install pycryptodome
from PyPDF2 import PdfFileReader
from Crypto.Cipher import AES
if PdfFileReader('Media Downloaded Files/spk-10-3144 bro.pdf').isEncrypted:
    print('This file is encrypted.')
else:
    print(PdfFileReader('Media Downloaded Files/spk-10-3144-bro.pdf').numPages)
FIX
! pip install pikepdf
from pikepdf import Pdf
pdf = Pdf.open('Media Downloaded Files/spk-10-3144-bro.pdf')
len(pdf.pages)
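pikepdf can also set a password, not only open protected files. The following is a rough sketch for the batch case from the question, assuming each record is a dict with 'PDF' and 'Password' keys as produced by `df.to_dict('records')`. The helper names and the `_protected` suffix are hypothetical. The copy is written to a new file instead of overwriting the input.

```python
from pathlib import Path


def protected_path(pdf_path):
    """Return a sibling path such as 'report_protected.pdf' (hypothetical naming)."""
    path = Path(pdf_path)
    return path.with_name(path.stem + "_protected" + path.suffix)


def encrypt_pdf(pdf_path, password):
    """Save a password-protected copy of pdf_path and return the new path."""
    # Imported here so that protected_path() works without pikepdf installed.
    import pikepdf

    target = protected_path(pdf_path)
    with pikepdf.open(pdf_path) as pdf:
        # R=6 selects AES-256; the same password serves as user and owner password.
        encryption = pikepdf.Encryption(owner=password, user=password, R=6)
        pdf.save(target, encryption=encryption)
    return target


def encrypt_all(records):
    """Encrypt every file listed in records (dicts with 'PDF' and 'Password')."""
    return [encrypt_pdf(row["PDF"], row["Password"]) for row in records]
```

With the question's data this would be called as `encrypt_all(df)` after the `to_dict('records')` step.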
Related
import os
import glob
import comtypes.client
from PyPDF2 import PdfFileMerger
def docxs_to_pdf():
    """Converts all word files in pdfs and append them to pdfslist"""
    word = comtypes.client.CreateObject('Word.Application')
    pdfslist = PdfFileMerger()
    x = 0
    for f in glob.glob("*.docx"):
        input_file = os.path.abspath(f)
        output_file = os.path.abspath("demo" + str(x) + ".pdf")
        # loads each word document
        doc = word.Documents.Open(input_file)
        doc.SaveAs(output_file, FileFormat=16+1)
        doc.Close() # Closes the document, not the application
        pdfslist.append(open(output_file, 'rb'))
        x += 1
    word.Quit()
    return pdfslist

def joinpdf(pdfs):
    """Unite all pdfs"""
    with open("result.pdf", "wb") as result_pdf:
        pdfs.write(result_pdf)

def main():
    """docxs to pdfs: Open Word, create pdfs, close word, unite pdfs"""
    pdfs = docxs_to_pdf()
    joinpdf(pdfs)

main()
I am using Jupyter Notebook and this code throws an error. What should I do?
This is the error message:
I want to convert many .doc files into one PDF. Please help, I am a beginner in this field.
Make sure you have all the dependencies installed in your environment. comtypes.client is part of the comtypes package, which you can install with pip from your terminal:
pip install comtypes
You can download _ctypes from sourceforge:
https://sourceforge.net/projects/ctypes/files/ctypes/1.0.2/ctypes-1.0.2.tar.gz/download?use_mirror=deac-fra
Using docx2pdf does seem easier for your task, though. After you have converted the files, you can use PyPDF2 to append them.
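A rough sketch of that route, assuming docx2pdf (which itself needs Microsoft Word installed) and a PyPDF2 version older than 3.0, where the merger class is still called `PdfFileMerger`. The function names are hypothetical. The natural sort keeps a file such as demo10.pdf after demo2.pdf, which plain alphabetical order would not.

```python
import re
from pathlib import Path


def natural_key(path):
    """Sort key that orders 'demo2.pdf' before 'demo10.pdf'."""
    parts = re.split(r"(\d+)", Path(path).name)
    # re.split with a capture group puts the digit runs at odd positions.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def convert_and_merge(docx_dir, result_path):
    """Convert every .docx in docx_dir to PDF, then merge the PDFs in natural order."""
    # Third-party imports live here so natural_key() is usable without them.
    from docx2pdf import convert
    from PyPDF2 import PdfFileMerger  # named PdfMerger in PyPDF2 3.x

    convert(str(docx_dir))  # writes a .pdf next to each .docx in the folder
    merger = PdfFileMerger()
    for pdf in sorted(Path(docx_dir).glob("*.pdf"), key=natural_key):
        merger.append(str(pdf))
    with open(result_path, "wb") as out:
        merger.write(out)
    merger.close()
```

Note that this merges every PDF found in the folder, so it works best on a folder that contains only the documents to be combined.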
How can I encrypt a document so that its text cannot be edited and its content cannot be copied?
I tried setting different user and owner passwords, but I was still able to edit the text in a PDF editor.
import pikepdf
from pikepdf import Pdf
pdf = Pdf.open("document.pdf")
pdf.save('output_filename.pdf', encryption=pikepdf.Encryption(owner="password", user="123", R=6))
pdf.close()
Ideally, I would like to restrict editing without requiring a password to open the document, if that is possible. Thanks in advance.
You need PyPDF2 (pip install PyPDF2) to run this script. Try this:
import PyPDF2

pdf_in_file = open("document.pdf",'rb')
inputpdf = PyPDF2.PdfFileReader(pdf_in_file)
pages_no = inputpdf.numPages
output = PyPDF2.PdfFileWriter()
# Copy every page into the writer (the reader only needs to be created once)
for i in range(pages_no):
    output.addPage(inputpdf.getPage(i))
# Encrypt once, after all pages have been added
output.encrypt('password_u_want')
with open("output_filename.pdf", "wb") as outputStream:
    output.write(outputStream)
pdf_in_file.close()
Details on PyPDF2 are here: https://pythonhosted.org/PyPDF2/PdfFileWriter.html
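For the editing and copying restrictions asked about, pikepdf exposes the PDF permission flags through `pikepdf.Permissions`. A minimal sketch, assuming an empty user password so the file opens without a prompt; `LOCKED_DOWN` and `save_restricted` are hypothetical names. These flags are requests to the viewer rather than cryptographic protection, so an editor that ignores them can still modify the file, which may explain the behavior described in the question.

```python
# Permission flags to switch off; the keys are pikepdf.Permissions field names.
LOCKED_DOWN = {
    "extract": False,            # copying text and images
    "modify_annotation": False,  # adding or changing annotations
    "modify_assembly": False,    # inserting, deleting, or rotating pages
    "modify_form": False,        # filling in form fields
    "modify_other": False,       # any other content editing
}


def save_restricted(src, dst, owner_password):
    """Save src to dst with editing and copying disallowed."""
    # Imported here so the flag table can be inspected without pikepdf installed.
    import pikepdf

    allow = pikepdf.Permissions(**LOCKED_DOWN)
    with pikepdf.open(src) as pdf:
        encryption = pikepdf.Encryption(
            owner=owner_password,  # needed to change the restrictions later
            user="",               # empty: no password is required to open the file
            R=6,
            allow=allow,
        )
        pdf.save(dst, encryption=encryption)
```

For example, `save_restricted("document.pdf", "output_filename.pdf", "password")` would correspond to the file names used in the question.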
I am trying to read a .docx file in Python. I do not get any error, but the contents of the document are not shown either.
from docx import Document
import os
file = open('C:\\Users\\hamza\\Desktop\\Python\\qwe.docx','r', encoding='utf8' )
document =(file.read())
file.close()
Try docx2txt (pip install docx2txt):
from docx2txt import process
import os
path = r'C:\Users\hamza\Desktop\Python\qwe.docx'
text = process(path)
with open(os.path.basename(path) + '.txt', 'w') as f:
    f.write(text)
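The reason the code in the question shows nothing readable is that a .docx file is a ZIP archive containing XML, not a plain text file. As a rough illustration of what a library like docx2txt has to do, here is a standard-library-only sketch that pulls the paragraph text out of word/document.xml. `docx_paragraphs` is a hypothetical helper, and it ignores headers, footers, tables of contents, and other parts of the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(path_or_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text nodes of all runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Calling `print("\n".join(docx_paragraphs(path)))` with the path from the question would print the document body.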
I have a PDF file and its associated password.
I would like to convert the encrypted file to a clear version using Python only.
I found some Python modules here (PyPDF2, PDFMiner)
for processing PDF files, but none of them works with encryption.
Has anyone already done this?
PyPDF2 now supports encryption. According to this answer, decryption can be implemented like this:
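import os
from PyPDF2 import PdfFileReader

filename = "encrypted.pdf"  # placeholder: path of the PDF to decrypt
password = "mypassword"

fp = open(filename, "rb")  # PDFs must be opened in binary mode
pdfFile = PdfFileReader(fp)
if pdfFile.isEncrypted:
    try:
        pdfFile.decrypt(password)
        print('File Decrypted (PyPDF2)')
    except Exception:
        # PyPDF2 cannot handle this encryption: decrypt in place with the qpdf CLI
        # (note that qpdf is given an empty password here)
        fp.close()
        command = ("cp "+ filename +
            " temp.pdf; qpdf --password='' --decrypt temp.pdf " + filename
            + "; rm temp.pdf")
        os.system(command)
        print('File Decrypted (qpdf)')
        fp = open(filename, "rb")
        pdfFile = PdfFileReader(fp)
else:
    print('File Not Encrypted')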
Note that this code uses PyPDF2 by default and falls back to qpdf if that fails.
You'd also need to know the encryption algorithm and key length to be able to advise which tool might work... and depending on the answers, a python library may not be available.
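One way to find out the algorithm and key length is to read the revision number (R) of the file's security handler. The sketch below assumes pikepdf is available and that the password is known, since pikepdf has to open the file first. The function names are hypothetical, and the table covers only the standard security handler revisions.

```python
# Revisions of the standard PDF security handler and what they imply.
REVISIONS = {
    2: "RC4, 40-bit key",
    3: "RC4, up to 128-bit key",
    4: "RC4 or AES-128 (depends on the crypt filter)",
    5: "AES-256 (Adobe extension, deprecated)",
    6: "AES-256 (PDF 2.0)",
}


def describe_revision(r):
    """Translate a security handler revision number into a short description."""
    return REVISIONS.get(r, "unknown revision %r" % r)


def report_encryption(path, password=""):
    """Return a one-line summary of how the PDF at `path` is encrypted."""
    # Imported here so describe_revision() works without pikepdf installed.
    import pikepdf

    with pikepdf.open(path, password=password) as pdf:
        if not pdf.is_encrypted:
            return "not encrypted"
        info = pdf.encryption
        return "R=%d, %d-bit key: %s" % (info.R, info.bits, describe_revision(info.R))
```

A result of R=5 or R=6 means AES-256, which is the kind of file older pure-Python libraries tend to reject.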
I wrote a PDF password cracker and found the password of a protected PDF file. I want to write a program in Python that can display that PDF file on the screen without asking for the password. I use the PyPDF library.
I know how to open a file that has no password, but I can't figure out how to open the protected one. Any ideas? Thanks
import subprocess
import sys

filePath = input()  # raw_input() in Python 2
password = 'abc'
if sys.platform.startswith('linux'):
    subprocess.call(["xdg-open", filePath])
The approach shown by KL84 basically works, but the code is not correct (it writes the output file for each page). A cleaned up version is here:
https://gist.github.com/bzamecnik/1abb64affb21322256f1c4ebbb59a364
# Decrypt password-protected PDF in Python.
#
# Requirements:
# pip install PyPDF2
from PyPDF2 import PdfFileReader, PdfFileWriter
def decrypt_pdf(input_path, output_path, password):
    with open(input_path, 'rb') as input_file, \
            open(output_path, 'wb') as output_file:
        reader = PdfFileReader(input_file)
        reader.decrypt(password)
        writer = PdfFileWriter()
        for i in range(reader.getNumPages()):
            writer.addPage(reader.getPage(i))
        writer.write(output_file)

if __name__ == '__main__':
    # example usage:
    decrypt_pdf('encrypted.pdf', 'decrypted.pdf', 'secret_password')
You should use the pikepdf library nowadays instead:
import pikepdf
with pikepdf.open("input.pdf", password="abc") as pdf:
    num_pages = len(pdf.pages)
    print("Total pages:", num_pages)
PyPDF2 does not support many encryption algorithms. pikepdf appears to handle them: it supports most password-protection methods, and it is documented and actively maintained.
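To produce an unprotected copy that any viewer can display, the opened document can be saved again without encryption settings. A minimal sketch, assuming the password is known; `decrypted_path` and `decrypt_pdf` are hypothetical names, and the `_decrypted` suffix is just one possible naming scheme.

```python
from pathlib import Path


def decrypted_path(pdf_path):
    """Return a sibling path such as 'secret_decrypted.pdf' (hypothetical naming)."""
    path = Path(pdf_path)
    return path.with_name(path.stem + "_decrypted" + path.suffix)


def decrypt_pdf(pdf_path, password):
    """Write an unencrypted copy; return its path, or None if the password is wrong."""
    # Imported here so that decrypted_path() works without pikepdf installed.
    import pikepdf

    target = decrypted_path(pdf_path)
    try:
        with pikepdf.open(pdf_path, password=password) as pdf:
            # Saving without an encryption argument drops the existing encryption.
            pdf.save(target)
    except pikepdf.PasswordError:
        return None
    return target
```

For the example above, `decrypt_pdf("input.pdf", "abc")` would write input_decrypted.pdf next to the original.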
You can use the pdfplumber library. It is very easy to use and reads machine-written PDF files seamlessly, better than any other library I have used.
import pdfplumber
with pdfplumber.open(r'D:\examplepdf.pdf' , password = 'abc') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())
I have the answer to this question. You need to install and use the PyPDF2 library to make this work.
# When the password is 'abc', call decrypt() on the PyPDF2 reader to decrypt the PDF file
import subprocess
import sys
from PyPDF2 import PdfFileReader, PdfFileWriter

filePath = input("Enter pdf file path: ")
f = PdfFileReader(open(filePath, "rb"))
output = PdfFileWriter()
f.decrypt('abc')
# Copy the pages of the encrypted PDF to an unencrypted PDF named noPassPDF.pdf
for pageNumber in range(0, f.getNumPages()):
    output.addPage(f.getPage(pageNumber))
# Write "output" to noPassPDF.pdf
outputStream = open("noPassPDF.pdf", "wb")
output.write(outputStream)
outputStream.close()
# Open the file now
if sys.platform.startswith('darwin'):  # open on macOS
    subprocess.call(["open", "noPassPDF.pdf"])
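The question uses xdg-open on Linux and this answer uses open on macOS, so the two can be combined into one helper. A small sketch using only the standard library; `opener_command` and `open_in_viewer` are hypothetical names, and the Windows branch relies on `os.startfile`, which exists only on that platform.

```python
import os
import subprocess
import sys


def opener_command(platform, path):
    """Return the command that opens `path` on Linux or macOS, else None."""
    if platform.startswith("linux"):
        return ["xdg-open", path]
    if platform == "darwin":
        return ["open", path]
    return None


def open_in_viewer(path):
    """Open `path` with the default application of the current platform."""
    command = opener_command(sys.platform, path)
    if command is not None:
        subprocess.call(command)
    elif sys.platform.startswith("win"):
        os.startfile(path)  # available on Windows only
    else:
        raise OSError("no known opener for platform %r" % sys.platform)
```

After writing the decrypted copy, `open_in_viewer("noPassPDF.pdf")` would replace the platform check at the end of the script.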