I would like to secure PDF files the same way it's possible using Adobe Reader: the file can be opened without a password, but copying, changing the document, page extraction, printing in high resolution, etc. are not allowed.
I know that there is a way to encrypt a PDF file using PyPDF2, using this code (for the curious only, taken from https://www.blog.pythonlibrary.org/2018/06/07/an-intro-to-pypdf2/), but it asks for a password before opening the contents and then copying is still possible:
from PyPDF2 import PdfFileWriter, PdfFileReader

def encrypt(input_pdf, output_pdf, password):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)
    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))
    pdf_writer.encrypt(user_pwd=password, owner_pwd=None,
                       use_128bit=True)
    with open(output_pdf, 'wb') as fh:
        pdf_writer.write(fh)

if __name__ == '__main__':
    encrypt(input_pdf='introduction.pdf',
            output_pdf='encrypted.pdf',
            password='blowfish')
But is there a way to secure a PDF using Adobe Reader commands? I've searched and I failed. Does anybody know how to do it? Hope somebody can help!
Actually, it is possible after all!
The code above works; all you need to do is change the user password to an empty string, set an owner password, and change one line in PyPDF2's pdf.py file from:
# permit everything:
P = -1
to:
# permit everything:
P = -3904
This blocks all changing, copying, etc. for the encrypted PDF :)
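To see where a value like -3904 comes from, here is a minimal sketch of the arithmetic, assuming the permission bit positions from Table 22 of ISO 32000-1 (bits are numbered from 1; bits 1 and 2 must be 0, and bits 7, 8 and 13 to 32 are reserved and set to 1). The dictionary keys and the function name are hypothetical labels chosen for this illustration, not part of PyPDF2.

```python
# Bit positions (1-based) of the user access permissions, per ISO 32000-1 Table 22.
# The names are illustrative labels only.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "extract": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}

def permissions_value(allowed):
    """Return the signed 32-bit P value that grants only the named permissions."""
    # Start with every permission bit cleared: reserved bits 7, 8 and 13-32 are 1,
    # bits 1-2 are 0, which is 0xFFFFF0C0 as an unsigned 32-bit number.
    value = 0xFFFFF0C0
    for name in allowed:
        value |= 1 << (PERMISSION_BITS[name] - 1)
    # Reinterpret the unsigned 32-bit pattern as a signed integer.
    if value >= 2 ** 31:
        value -= 2 ** 32
    return value

print(permissions_value([]))         # nothing allowed
print(permissions_value(["print"]))  # only low-resolution printing allowed
```

With nothing allowed the function returns -3904, which matches the value used in the patch above; each permission you grant adds the value of its bit.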
I'm looking to password protect a PDF for editing, but without needing the password to view the file.
Is there a way to do this?
I looked at PyPDF2, but I could only find full encryption.
Just be aware of how the ISO standard describes this optional feature.
ISO 32000-1:
»Once the document has been opened and decrypted successfully, a conforming reader technically has access to the entire contents of the document. There is nothing inherent in PDF encryption that enforces the document permissions specified in the encryption dictionary.«
It is better described as
Specifying access restrictions for a document, such as Printing allowed: None disables the respective function in Acrobat. However, this not necessarily holds true for third-party PDF viewers or other software. It is up to the developer of PDF tools whether or not access permissions are honored. Indeed, several PDF tools are known to ignore permission settings altogether; commercially available* PDF cracking tools can be used to disable all access restrictions. This has nothing to do with cracking the encryption; there is simply no way that a PDF file can make sure it won’t be printed while it still remains viewable.
Never mind commercial tools: many online sites will unlock files for "FREE", so your restricted file is more likely to end up passed around freely on the web (great for web coverage, but poor for your personal demands). In any case, users just need their browser to save a copy of the PDF contents. See here the "protected" Adobe Master book, in two common viewers.
You can use the permissions flag. For example:
from PyPDF2 import PdfFileReader, PdfFileWriter

PASSWORD = 'owner-password'  # placeholder: choose your own owner password

pdf_writer = PdfFileWriter()
# CREATE THE PDF HERE
# This is the key line; note the permissions_flag parameter
pdf_writer.encrypt(user_pwd='', owner_pwd=PASSWORD, permissions_flag=0b0100)
with open('NewPDF.pdf', "wb") as out_file:
    pdf_writer.write(out_file)
You could try pikepdf, which can set the PDF permissions.
e.g.
from pikepdf import Pdf, Permissions, Encryption, PasswordError

userkey = 'abcd'
ownerkey = 'king'
# ori_pdf_file and enc_pdf_file are the input and output paths
with Pdf.open(ori_pdf_file) as pdf:
    no_print = Permissions(print_lowres=False, print_highres=False)
    pdf.save(enc_pdf_file, encryption=Encryption(user=userkey, owner=ownerkey, allow=no_print))
This prevents printing when enc_pdf_file is opened with 'abcd' (the userkey).
See the documentation for more details:
https://pikepdf.readthedocs.io/en/latest/tutorial.html#saving-changes
https://pikepdf.readthedocs.io/en/latest/api/models.html#pikepdf.Permissions
I'm late to answer this and, of course, @MYK is right. It's just worth mentioning that in PyPDF2 version 3.0.0, PdfFileReader and PdfFileWriter are deprecated, and some functions such as getPage() and addPage() have changed.
from PyPDF2 import PdfWriter, PdfReader

out = PdfWriter()
file = PdfReader("yourFile.pdf")
num = len(file.pages)
for idx in range(num):
    page = file.pages[idx]
    out.add_page(page)
password = "pass"
out.encrypt(user_pwd='', owner_pwd=password, permissions_flag=0b0100)
with open("yourfile_encrypted.pdf", "wb") as f:
    out.write(f)
and permissions are
A request has been made to the server using Python's requests module:
requests.get('myserver/pdf', headers)
It returned a status-200 response, which contains the PDF binary data in response.content
Question
How does one create a PDF file from the response.content?
You can create an empty PDF and then write the binary data to it, like this:
from reportlab.pdfgen import canvas
import requests

# Example path. This file has not been created yet, but we
# will use this as the location and name of the PDF in question
path_to_create_pdf_with_name_of_pdf = r'C:/User/Oleg/MyDownloadablePdf.pdf'

# Anything you used before making the request. Since you did not
# provide code I did not know what you used
# .....
# Pass headers by keyword: the second positional argument of requests.get is params
request = requests.get('myserver/pdf', headers=headers)

# Create the canvas for the empty PDF that we will write the binary data to
pdf_file = canvas.Canvas(path_to_create_pdf_with_name_of_pdf)

# Open the file at the path above and write the binary data to it.
# The with block closes the file, so no explicit close() is needed.
with open(path_to_create_pdf_with_name_of_pdf, 'wb') as f:
    f.write(request.content)
reportlab.pdfgen lets you set up a new PDF by passing the path where you want to save it, including the file name, to canvas.Canvas. As stated in my answer, you need to provide the path to do this.
Once you have that, you can open the PDF file in wb (write binary) mode and write the content of the response to it. Opening with 'wb' creates the file if it does not exist yet, and the with block closes it.
When choosing the path, make sure the name is not that of an existing file, so that you do not overwrite it. As the comments show, if this name is the name of any other file then you risk overwriting its data. If you are doing this in a loop, for example, you will need to specify a new name at each iteration so that you get a new PDF each time. If it is a one-off, you do not run that risk as long as it is not the name of another file.
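The overwrite concern above can also be handled by the file mode itself. The following is a minimal sketch using only the standard library; the function name save_pdf_bytes is hypothetical, and the header check assumes the data starts with the usual %PDF- marker, which is an assumption about the server's response rather than something stated in the question.

```python
def save_pdf_bytes(content: bytes, path: str) -> int:
    """Write raw PDF bytes to a new file and return the number of bytes written."""
    # Cheap sanity check: PDF files normally begin with the marker "%PDF-".
    if not content.startswith(b"%PDF-"):
        raise ValueError("content does not look like a PDF")
    # Mode 'xb' creates the file for binary writing and raises
    # FileExistsError instead of overwriting an existing file.
    with open(path, "xb") as f:
        return f.write(content)
```

Under these assumptions it would be called as save_pdf_bytes(response.content, 'MyDownloadablePdf.pdf') after a successful request. Use 'wb' instead of 'xb' if overwriting is acceptable.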
I have a pdf file and I need to edit some text/values in the pdf. For example, in the pdf files that I have, BIRTHDAY DD/MM/YYYY is always N/A. I want to change it to whatever value I desire and then save it as a new document. Overwriting the existing document is also alright.
This is what I have tried so far:
from PyPDF2 import PdfReader, PdfWriter

reader = PdfReader("abc.pdf")
page = reader.pages[0]
writer = PdfWriter()
writer.add_page(reader.pages[0])
pdf_doc = writer.update_page_form_field_values(
    reader.pages[0], {"BIRTHDAY DD/MM/YYYY": "123"}
)
with open("new_abc1.pdf", "wb") as fh:
    writer.write(fh)
But this update_page_form_field_values() doesn't change the desired value, maybe because this is not a form field?
Any clues?
I'm the current maintainer of pypdf and PyPDF2 (Please use pypdf; PyPDF2 is deprecated)
It is not possible to change a text with pypdf at the moment.
Changing form contents is a different story. However, we have several issues with form fields: https://github.com/py-pdf/pypdf/labels/workflow-forms
The update_page_form_field_values is the correct function to use.
What I want to do is edit /etc/samba/smb.conf and add
[Test's Files]
comment = Test's Files
path = /files/test
browsable = yes
read only = no
valid users = test
all of this via a Python web application that receives information from the user's input. For now I just want to know how I would add that piece of text to the file with Python.
You can use the open() function in Python to open a file for writing:
with open('path/to/file', 'w') as output_file:
    output_file.write('content')
The second parameter to open() is the mode; note that 'w' truncates an existing file, while 'a' appends to it. More details can be found on the Python documentation website. Side note: if this is in /etc/, then your application may need special permissions to be able to write to this file. To limit the potential danger of elevated privileges, you should create a subprocess with the elevated permissions that does nothing but write this file, so that your main process keeps normal permissions.
You can look into the SMBConnection module in Python: http://pythonhosted.org/pysmb/api/smb_SMBConnection.html
I have already worked with it; it is a great module for working with an SMB server.
with open('path to file', 'w') as fileobj:
    fileobj.write('text to be written')
The above piece of code is a very handy way to open and write to a file; the with statement closes the file implicitly.
You can do that in this simple way:
with open('/etc/whatever.txt', 'a+') as file:
    file.write("""[Test's Files]
comment = Test's Files
path = /files/test
browsable = yes
read only = no
valid users = test""")
But you need to pay attention to the permissions if you want to edit files that only root can write. Please also pay attention to the mode in which you open the file! This must be 'a+' and NOT 'w' as shown in other answers! Otherwise you will overwrite the file!
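As a small illustration of appending rather than overwriting, here is a sketch that builds the section text from user-supplied values and appends it to a config file. The function name append_section and its parameters are hypothetical, and it performs no validation of the input, which a real web application would need.

```python
def append_section(conf_path, name, options):
    """Append an INI-style section to conf_path, leaving existing content intact."""
    lines = [f"[{name}]"] + [f"{key} = {value}" for key, value in options.items()]
    # Mode 'a' appends to the end of the file; 'w' would truncate it first.
    with open(conf_path, "a") as conf:
        # Leading newline keeps the new section separate from the last existing line.
        conf.write("\n" + "\n".join(lines) + "\n")
```

For the share in the question it could be called as append_section('/etc/samba/smb.conf', "Test's Files", {'comment': "Test's Files", 'path': '/files/test', 'browsable': 'yes', 'read only': 'no', 'valid users': 'test'}), given write permission on that file.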
I am not too sure the best way to word this, but what I want to do, is read a pdf file, make various modifications, and save the modified pdf over the original file. As of now, I am able to save the modified pdf to a separate file, but I am looking to replace the original, not create a new file.
Here is my current code:
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input = PdfFileReader(file('input.pdf', 'rb'))
blank = PdfFileReader(file('C:\\BLANK.pdf', 'rb'))
# Copy the input pdf to the output.
for page in range(int(input.getNumPages())):
    output.addPage(input.getPage(page))
# Add a blank page if needed.
if (input.getNumPages() % 2 != 0):
    output.addPage(blank.getPage(0))
# Write the output to pdf.
outputStream = file('input.pdf', 'wb')
output.write(outputStream)
outputStream.close()
If I change the outputStream to a different file name, it works fine. I just can't save over the input file because it is still being used. I have tried to .close() the stream, but that gave me errors as well.
I have a feeling this has a fairly simple solution, I just haven't had any luck finding it.
Thanks!
You can always rename the temporary output file to the old file:
import os
f = open('input.pdf', 'rb')
# do stuff to temp.pdf
f.close()
os.rename('temp.pdf', 'input.pdf')
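A slightly more defensive variant of the rename approach, sketched with the standard library only: write the new contents to a temporary file in the same directory and then swap it into place with os.replace, which overwrites an existing destination on both POSIX and Windows (os.rename raises on Windows if the destination exists). The function name replace_file_contents is hypothetical, and the sketch assumes the original file has already been closed.

```python
import os
import tempfile

def replace_file_contents(path, new_bytes):
    """Write new_bytes to a temp file next to path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final move stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if writing or replacing failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With a PDF library, new_bytes would be whatever the writer produced, for example the contents of an in-memory buffer the output was written to.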
You said you tried to close() the stream but got errors? You could delete the PdfFileReader objects to ensure nobody still has access to the stream. And then close the stream.
from pyPdf import PdfFileWriter, PdfFileReader
inputStream = file('input.pdf', 'rb')
blankStream = file('C:\\BLANK.pdf', 'rb')
output = PdfFileWriter()
input = PdfFileReader(inputStream)
blank = PdfFileReader(blankStream)
...
del input # PdfFileReader won't mess with the stream anymore
inputStream.close()
del blank
blankStream.close()
# Write the output to pdf.
outputStream = file('input.pdf', 'wb')
output.write(outputStream)
outputStream.close()
If the PDFs are small enough (that'll depend on your platform), you could just read the whole thing in, close the file, modify the data, then write the whole thing back over the same file.
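That read-everything-first approach might look like the sketch below. The names rewrite_in_place and transform are hypothetical; transform stands in for whatever PDF processing you do, and is assumed to take an in-memory binary stream and return the new file contents as bytes.

```python
import io

def rewrite_in_place(path, transform):
    """Read path fully into memory, transform it, and write the result back."""
    with open(path, "rb") as f:
        data = f.read()
    # The file is closed at this point, so nothing holds it open any more.
    buffer = io.BytesIO(data)  # in-memory stream that a PDF reader could be given
    result = transform(buffer)
    with open(path, "wb") as f:
        f.write(result)
```

Because the input is fully read before the output file is opened, the same path can safely be used for both. The trade-off is that the whole document is held in memory.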