How do I overwrite a file currently being read by Python - python

I'm not sure of the best way to word this, but I want to read a PDF file, make various modifications, and save the modified PDF over the original file. So far I can save the modified PDF to a separate file, but I want to replace the original rather than create a new file.
Here is my current code:
from pyPdf import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(file('input.pdf', 'rb'))
blank = PdfFileReader(file('C:\\BLANK.pdf', 'rb'))
# Copy the input pdf to the output.
for page in range(int(input.getNumPages())):
    output.addPage(input.getPage(page))
# Add a blank page if needed.
if (input.getNumPages() % 2 != 0):
    output.addPage(blank.getPage(0))
# Write the output to pdf.
outputStream = file('input.pdf', 'wb')
output.write(outputStream)
outputStream.close()
If I change outputStream to a different file name, it works fine; I just can't save over the input file because it is still in use. I have tried to .close() the stream, but that gave me errors as well.
I have a feeling this has a fairly simple solution, I just haven't had any luck finding it.
Thanks!

You can always rename the temporary output file to the old file:
import os
f = open('input.pdf', 'rb')
# do stuff to temp.pdf
f.close()
os.rename('temp.pdf', 'input.pdf')
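One caveat with the rename approach above: on Windows, os.rename raises an error when the destination already exists, while os.replace (Python 3.3+) overwrites it. Here is a minimal sketch of that variant, assuming Python 3. The helper name overwrite_atomically is made up for illustration, and the reader of the original file still has to be closed before the swap.

```python
import os
import tempfile


def overwrite_atomically(path, data):
    """Write `data` (bytes) to a temp file next to `path`, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the swap stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # os.replace overwrites an existing destination on POSIX and Windows
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the leftover temp file if anything failed before the swap
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the PDF case you would write the PdfFileWriter output into the temp file instead of passing raw bytes, but the swap step would be the same.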

You said you tried to close() the stream but got errors? You could delete the PdfFileReader objects to ensure nobody still has access to the stream. And then close the stream.
from pyPdf import PdfFileWriter, PdfFileReader
inputStream = file('input.pdf', 'rb')
blankStream = file('C:\\BLANK.pdf', 'rb')
output = PdfFileWriter()
input = PdfFileReader(inputStream)
blank = PdfFileReader(blankStream)
...
del input # PdfFileReader won't mess with the stream anymore
inputStream.close()
del blank
blankStream.close()
# Write the output to pdf.
outputStream = file('input.pdf', 'wb')
output.write(outputStream)
outputStream.close()

If the PDFs are small enough (that'll depend on your platform), you could just read the whole thing in, close the file, modify the data, then write the whole thing back over the same file.
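A rough sketch of that idea, assuming Python 3 and treating the file as plain bytes. The function name rewrite_in_place and the transform callback are placeholders, not part of any PDF library.

```python
def rewrite_in_place(path, transform):
    """Read all of `path`, close it, then write transform(data) back to it."""
    with open(path, "rb") as src:
        data = src.read()  # the whole file now lives in memory
    # The read handle is closed here, so nothing holds the file open
    new_data = transform(data)
    with open(path, "wb") as dst:
        dst.write(new_data)
    return len(new_data)
```

Keep in mind that if the write fails halfway, the original contents are already gone, so this trades safety for simplicity.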

Related

How to create a virtual pdf file to be used in Django unit tests [duplicate]

I'm working with Python3 and I want to simulate writing to a file, but without actually creating a file.
For example, my specific case is as follows:
merger = PdfFileMerger()
for pdf in files_to_merge:
    merger.append(pdf)
merger.write('result.pdf') # This creates a file. I want to avoid this
merger.close()
# pdf -> binary
with open('result.pdf', mode='rb') as file: # Conversely. I don't want to read the data from an actual file
    file_content = file.read()
I think StringIO is a good candidate for this situation, but I don't know how to use it in this case, which would be writing to a StringIO object. It would look something like this:
from io import StringIO
output = StringIO()
output.write('This goes into the buffer. ')
# Retrieve the value written
print(output.getvalue())
output.close() # discard buffer memory
# Initialize a read buffer
input = StringIO('Initial value for read buffer')
# Read from the buffer
print(input.read())
Since the PdfFileMerger.write method supports writing to file-like objects, you can simply make the PdfFileMerger object write to a BytesIO object instead:
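from io import BytesIO
from PyPDF2 import PdfFileMerger  # assumed source of PdfFileMerger; the question does not show its import
merger = PdfFileMerger()
for pdf in files_to_merge:
    merger.append(pdf)
output = BytesIO()
merger.write(output)
merger.close()
file_content = output.getvalue()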
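As a side note on why BytesIO rather than the StringIO mentioned in the question: in Python 3, StringIO only accepts text, and PDF output is binary. A small stdlib-only sketch of the difference follows; write_to_buffer is just an illustrative helper and the payload is a stand-in, not a real PDF.

```python
from io import BytesIO, StringIO


def write_to_buffer(buffer, payload):
    """Try to write payload into buffer; return True if the buffer accepted it."""
    try:
        buffer.write(payload)
        return True
    except TypeError:
        # Raised when text is written to BytesIO or bytes to StringIO
        return False


pdf_like = b"%PDF-1.4 binary payload"  # stand-in for real PDF bytes

text_into_stringio = write_to_buffer(StringIO(), "plain text")
bytes_into_stringio = write_to_buffer(StringIO(), pdf_like)
bytes_into_bytesio = write_to_buffer(BytesIO(), pdf_like)
```

Only the first and last writes succeed, which is why a binary writer needs the BytesIO buffer.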

Python and openpyxl is saving my shared workbook as unshared

Using Python and openpyxl, I have code that opens a workbook, adds a bunch of information to some cells every day, and then saves and closes it. My problem is that if this workbook is open, the code doesn't work (obviously).
By making the workbook shared (multiple users can edit at once) I can overcome this problem. However, after the code runs once, Python saves the file and it reverts to a closed, unshared workbook.
Does anyone know if openpyxl can save a workbook as shared? I'm not finding anything online.
Pre-emptive thanks for your help.
It seems that when openpyxl saves an Excel workbook, the docProps/app.xml file inside is wiped and contains only minimal information.
A quick (and dirty) solution is to use zipfile to read this information from the original file and transfer it into the newly saved file.
import zipfile, openpyxl
def shared_pyxl_save(file_path, workbook):
    """
    `file_path`: path to the shared file you want to save
    `workbook`: the object returned by openpyxl.load_workbook()
    """
    zin = zipfile.ZipFile(file_path, 'r')
    buffers = []
    for item in zin.infolist():
        buffers.append((item, zin.read(item.filename)))
    zin.close()
    workbook.save(file_path)
    zout = zipfile.ZipFile(file_path, 'w')
    for item, buffer in buffers:
        zout.writestr(item, buffer)
    zout.close()
I also posted this answer on another page, but this page is the original.
After some experimenting, it turns out that zipfile.infolist() includes the sheet data as well, so the function above writes the old sheet contents back over the newly saved ones. Here is my fine-tuned version, based on the shared_pyxl_save example from the previous answer.
Basically, keep every entry from the old file except the sheet data, and take the sheet data from the newly saved file instead:
import zipfile
def shared_pyxl_save(file_path, workbook):
    """
    `file_path`: path to the shared file you want to save
    `workbook`: the object returned by openpyxl.load_workbook()
    """
    # Buffer every entry of the old (shared) file except the sheet data
    zin = zipfile.ZipFile(file_path, 'r')
    buffers = []
    for item in zin.infolist():
        if "sheet1.xml" not in item.filename:
            buffers.append((item, zin.read(item.filename)))
    zin.close()
    workbook.save(file_path)
    # Loop through the newly saved file to pick up sheet1.xml; without it the file shows an error
    zin2 = zipfile.ZipFile(file_path, 'r')
    for item in zin2.infolist():
        if "sheet1.xml" in item.filename:
            buffers.append((item, zin2.read(item.filename)))
    zin2.close()
    # Finally, write the combined entries back to the file
    zout = zipfile.ZipFile(file_path, 'w')
    for item, buffer in buffers:
        zout.writestr(item, buffer)
    zout.close()
    workbook.close()
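The function above only handles sheet1.xml. If you need the same trick for several sheets, the merging step can be pulled out into a generic helper. This is only a sketch using the standard zipfile module; merge_zip_entries and the take_from_new predicate are names I made up, and I have not verified that the result keeps a workbook shared in every Excel version.

```python
import io
import zipfile


def merge_zip_entries(old_zip, new_zip, take_from_new):
    """Return zip bytes built from old_zip, except entries whose name satisfies
    take_from_new(name); those are copied from new_zip instead."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(old_zip)) as old, \
         zipfile.ZipFile(io.BytesIO(new_zip)) as new, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as merged:
        # Keep metadata and everything else from the old archive
        for item in old.infolist():
            if not take_from_new(item.filename):
                merged.writestr(item, old.read(item.filename))
        # Take the selected entries from the new archive
        for item in new.infolist():
            if take_from_new(item.filename):
                merged.writestr(item, new.read(item.filename))
    return out.getvalue()
```

For an .xlsx file, a predicate such as lambda name: name.startswith("xl/worksheets/") would select all worksheet parts instead of just the first one.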

Convert/Write PDF to RAM as file-like object for further working with it

My script generates a PDF (a PyPDF2.pdf.PdfFileWriter object) and stores it in a variable.
I need to work with it as a file-like object later in the script. But right now I have to write it to the HDD first, and then open it as a file to work with it.
To avoid these unnecessary write/read operations I found many candidate solutions (StringIO, BytesIO and so on), but I cannot work out which one helps in my case.
As far as I understand, I need to "convert" the PyPDF2.pdf.PdfFileWriter object to a file-like object (or write it to RAM) so that I can work with it directly.
Or is there another method that fits my case better?
UPDATE - here is code-sample
from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter
red_file = PdfFileReader(open("file_name.pdf", 'rb'))
large_pages_indexes = [1, 7, 9]
large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)
# here final data have to be written (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)
# here I need to read exported "virtual_file.pdf" (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp) # here I'm starting to work with this file using another module "pdfrw"
    print(pdf)
To avoid slow disk I/O it appears you want to replace
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)
with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp)
with
import io
buf = io.BytesIO()
large.write(buf)
buf.seek(0)
pdf = PdfReader(buf)
Also, buf.getvalue() is available to you.
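In case it is not obvious why the buf.seek(0) line matters: after a write the cursor sits at the end of the buffer, so a reader starting from there sees nothing. A tiny stdlib-only illustration, where the payload is a stand-in for real PDF bytes:

```python
import io

data = b"%PDF-1.4 example bytes"  # stand-in payload, not a valid PDF

buf = io.BytesIO()
buf.write(data)

position_after_write = buf.tell()  # cursor sits at the end of the data
read_without_seek = buf.read()     # nothing left to read from here

buf.seek(0)                        # rewind to the start
read_after_seek = buf.read()       # now the full payload comes back

snapshot = buf.getvalue()          # full contents, wherever the cursor is
```

getvalue() ignores the cursor position, which makes it handy when you only need the raw bytes rather than a stream.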

Is it possible to input pdf bytes straight into PyPDF2 instead of making a PDF file first

I am using Linux; printing raw to port 9100 returns a "bytes" object. Is it possible to pass this straight into PyPDF2, rather than writing a PDF file first and then opening it with PdfFileReader?
Thank you for your time.
PyPDF2.PdfFileReader() defines its first parameter as:
stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.
So you can pass any data to it as long as it can be accessed as a file-like stream. A perfect candidate for that is io.BytesIO(). Write your received raw bytes to it, then seek back to 0, pass the object to PyPDF2.PdfFileReader() and you're done.
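If the bytes arrive in several pieces (for example from repeated socket reads), the same recipe could look roughly like this. stream_from_chunks is a hypothetical helper name, and the returned object is what you would then hand to PyPDF2.PdfFileReader().

```python
import io


def stream_from_chunks(chunks):
    """Collect byte chunks into one seekable in-memory stream."""
    stream = io.BytesIO()
    for chunk in chunks:
        stream.write(chunk)
    stream.seek(0)  # rewind so a reader starts at the first byte
    return stream
```

If you already hold all the data in a single bytes object, io.BytesIO(data) gives you a stream positioned at the start without an explicit seek.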
Yes, the first comment is right. Here is a code example that generates PDF bytes without creating a PDF file:
import io
from typing import List
from PyPDF2 import PdfFileReader, PdfFileWriter
def join_pdf(pdf_chunks: List[bytes]) -> bytes:
    # Create empty pdf-writer object for adding all pages here
    result_pdf = PdfFileWriter()
    # Iterate for all pdf-bytes
    for chunk in pdf_chunks:
        # Read bytes
        chunk_pdf = PdfFileReader(
            stream=io.BytesIO(  # Create stream object
                initial_bytes=chunk
            )
        )
        # Add all pages to our result
        for page in range(chunk_pdf.getNumPages()):
            result_pdf.addPage(chunk_pdf.getPage(page))
    # Writes all bytes to bytes-stream
    response_bytes_stream = io.BytesIO()
    result_pdf.write(response_bytes_stream)
    return response_bytes_stream.getvalue()
A few years later, I've added this to the PyPDF2 docs:
from io import BytesIO
from PyPDF2 import PdfFileReader, PdfFileWriter
# Prepare example
with open("example.pdf", "rb") as fh:
    bytes_stream = BytesIO(fh.read())
# Read from bytes_stream
reader = PdfFileReader(bytes_stream)
# Write to bytes_stream
writer = PdfFileWriter()
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)

Python 3.3 - Need help writing a .png file with data from a .txt

Using Python V.3.3
I was wondering how to create a .PNG (or any other picture file) from hex data stored in a Notepad document. Currently my script reads a picture file, converts it to hex, and saves that to a text file. It then reads the text file back and grabs the data.
The problem is that when it tries to write a new picture file, the file is created but no data is stored in it. No matter what I try, I end up with a blank, 0-byte picture. How do I fix this? Is there a specific format I need to use for my getbyte variable? Any help would be much appreciated. I'm trying to get this to work so I can possibly send/store data more easily for 2D game maps.
import binascii
f = open("c:/test1.png", "rb")
ima = f.read()
f.close()
print (binascii.hexlify(ima))
f = open("file123.txt", "w")
f.write(binascii.hexlify(ima).decode('utf-8'))
f.close()
#-----------
f = open("file123.txt", "r+")
getbyte = f.read()
f.close()
getbytes = (binascii.unhexlify(getbyte))
getbyte = (binascii.hexlify(getbytes))
f = open("filetest.png", "wb")
f.write(getbyte)
f.close
#-----------
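To save it as a binary image, write getbytes (the decoded bytes) rather than getbyte (the re-encoded hex), and call f.close() with parentheses so the file is actually closed and its buffer flushed:
getbytes = (binascii.unhexlify(getbyte))
f = open("filetest.png", "wb")
f.write(getbytes)
f.close()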
I think you may also be looking in the wrong directory. Try saving under a different name and see whether that file gets created.
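For what it's worth, here is how the whole round trip might look with with-blocks, which close (and flush) the files automatically. This is just a sketch; the two function names are made up, and the paths are whatever you pass in.

```python
import binascii


def image_to_hex_file(image_path, hex_path):
    """Read a binary file and store its contents as hex text."""
    with open(image_path, "rb") as src:
        raw = src.read()
    with open(hex_path, "w") as dst:
        dst.write(binascii.hexlify(raw).decode("ascii"))


def hex_file_to_image(hex_path, image_path):
    """Read hex text and write the decoded bytes to a binary file."""
    with open(hex_path, "r") as src:
        hex_text = src.read().strip()
    with open(image_path, "wb") as dst:
        # Write the decoded bytes, not the hex representation
        dst.write(binascii.unhexlify(hex_text))
```

Calling image_to_hex_file("c:/test1.png", "file123.txt") followed by hex_file_to_image("file123.txt", "filetest.png") should give you a byte-for-byte copy of the original picture.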
