Merging PDFs with Python pypdf and deleting merged files - python

I'm trying to write a Python program that takes a PDF file and first appends to it any PDF whose name includes a fruit (Mango, Orange or Apple), then appends the PDFs whose names include an animal (Zebra, Monkey, Dog), and finally appends any remaining PDFs. This is the code I have:
import os
from PyPDF2 import PdfFileReader, PdfFileMerger
originalFile="C:/originalFile.pdf"
merger = PdfFileMerger()
merger.append(PdfFileReader(file(originalFile, 'rb')))
os.remove(originalFile)
for filename in os.listdir('C:/'):
    if "Mango" in filename or "Apple" in filename or "Orange" in filename:
        if ".pdf" in filename:
            merger.append(PdfFileReader(file('C:/'+filename, 'rb')))
            os.remove("C:/"+filename)
for filename in os.listdir('C:/'):
    if "Zebra" in filename or "Monkey" in filename or "Dog" in filename:
        if ".pdf" in filename:
            merger.append(PdfFileReader(file('C:/'+filename, 'rb')))
            os.remove("C:/"+filename)
for filename in os.listdir('C:/'):
    if ".pdf" in filename:
        merger.append(PdfFileReader(file('C:/TRIAL/'+filename, 'rb')))
        os.remove("C:/TRIAL/"+filename)
merger.write(originalFile)
When I run this program I get the following Error:
os.remove(originalFile)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'C:/originalFile.pdf'
Could anyone explain to me how to close the file after I've added it to my merger?

You should close the file explicitly.
fd = file('C:/'+filename, 'rb')
merger.append(PdfFileReader(fd))
fd.close()
os.remove('C:/'+filename)
A safer version:
fd = None
try:
    fd = file('C:/'+filename, 'rb')
    merger.append(PdfFileReader(fd))
finally:
    if fd: fd.close()
if os.path.exists('C:/'+filename): os.remove('C:/'+filename)
Which can be simplified in Python 2.6+ (or in 2.5 with from __future__ import with_statement) as:
with file('C:/'+filename, 'rb') as fd:
    merger.append(PdfFileReader(fd))
if os.path.exists('C:/'+filename): os.remove('C:/'+filename)
The with statement makes Python close the file automatically when the block exits.
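The file() builtin used above exists only in Python 2; in Python 3 the same pattern is written with open(). The following is a minimal sketch of the close-then-delete order using only the standard library. The helper name read_then_remove and the demo file are assumptions for illustration, and the plain read() stands in for the merger.append(PdfFileReader(fd)) call.

```python
import os
import tempfile


def read_then_remove(path):
    """Read a file inside a with block, then delete it once the handle is closed."""
    with open(path, "rb") as fd:
        data = fd.read()  # stand-in for merger.append(PdfFileReader(fd))
    # The with block has closed fd, so no handle from this process blocks the delete.
    os.remove(path)
    return data


# Demo on a throwaway file (hypothetical content, not a real PDF).
handle, demo_path = tempfile.mkstemp(suffix=".pdf")
os.write(handle, b"%PDF-demo")
os.close(handle)
demo_data = read_then_remove(demo_path)
```

The important part is only the ordering: os.remove runs after the with block has ended.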

To make sure a file gets closed, open it with a with statement, which closes the file whatever happens inside the block:
with open(originalFile,'rb') as pdf:
    merger.append(PdfFileReader(pdf))
os.remove(originalFile)
This works for me.
Just a reminder: you can close the file once you have added the PDF to the merger. Note that if you only create PdfFileReader(pdf) and haven't done anything with it yet, you can't delete the file, or the PdfFileReader object won't be able to read it. This is because PdfFileReader only actually reads the file when you call a read method on it, such as getPage.

Because originalFile has been opened, you cannot delete it until you close it.
You need to modify your code like this:
merger = PdfFileMerger()
fin = file(originalFile, 'rb')
merger.append(PdfFileReader(fin))
fin.close()
os.remove(originalFile)

PyPDF2's PdfFileMerger has a close method as of version 1.26.0:
close()
Shuts all file descriptors (input and output) and clears all memory usage.
https://pythonhosted.org/PyPDF2/PdfFileMerger.html

Pdf merging isn't that hard in python. I see that you are already using PdfFileMerger. That should work as long as the pdf file exists, and the user who forks the python process has privileges to access the pdfs being merged. Good luck.
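The ordering described in the question (fruit names first, then animal names, then everything else) can be computed in one pass instead of listing the directory three times. This is only a sketch: merge_order and its rank helper are hypothetical names, the check uses endswith(".pdf") rather than the question's ".pdf" in filename, and the caller is assumed to exclude the original file from the list.

```python
FRUITS = ("Mango", "Orange", "Apple")
ANIMALS = ("Zebra", "Monkey", "Dog")


def merge_order(filenames):
    """Return the PDF names ordered as: fruit names, animal names, the rest."""
    pdfs = [name for name in filenames if name.lower().endswith(".pdf")]

    def rank(name):
        if any(word in name for word in FRUITS):
            return 0
        if any(word in name for word in ANIMALS):
            return 1
        return 2

    # sorted() is stable, so files keep their listing order within each group.
    return sorted(pdfs, key=rank)


order = merge_order(["notes.pdf", "Dog1.pdf", "Apple.pdf", "readme.txt", "Mango2.pdf"])
```

Each name in the result can then be opened, appended and removed with the close-before-delete pattern from the answers above.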

Related

Undesired deletion of temporary files

I am trying to create some temporary files and perform some operations on them inside a loop. Then I will access the information in all of the temporary files and do some operations with it. For simplicity, here is code that reproduces my issue:
import tempfile
tmp_files = []
for i in range(40):
    tmp = tempfile.NamedTemporaryFile(suffix=".txt")
    with open(tmp.name, "w") as f:
        f.write(str(i))
    tmp_files.append(tmp.name)
string = ""
for tmp_file in tmp_files:
    with open(tmp_file, "r") as f:
        data = f.read()
        string += data
print(string)
ERROR:
with open(tmp_file, "r") as f: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpynh0kbnw.txt'
When I look at the /tmp directory (with a time.sleep(2) in the loop), I see that the files are deleted and only one is preserved, hence the error.
Of course I could keep all the files with tempfile.NamedTemporaryFile(suffix=".txt", delete=False), but that is not the idea: I would like to keep the temporary files only for the running time of the script. I could also delete the files with os.remove. But my question is more about why this happens, because I expected the files to last until the end of the run, since I don't close them during execution (or do I?).
Thanks a lot in advance.
tdelaney has already answered your actual question.
I would just like to offer an alternative to NamedTemporaryFile. Why not create a temporary folder that is removed (with all the files in it) at the end of the script?
Instead of using a NamedTemporaryFile, you could use tempfile.TemporaryDirectory. The directory will be deleted when closed.
The example below uses the with statement, which closes the file handle automatically when the block ends (see John Gordon's comment).
import os
import tempfile
with tempfile.TemporaryDirectory() as temp_folder:
    tmp_files = []
    for i in range(40):
        tmp_file = os.path.join(temp_folder, f"{i}.txt")
        with open(tmp_file, "w") as f:
            f.write(str(i))
        tmp_files.append(tmp_file)
    string = ""
    for tmp_file in tmp_files:
        with open(tmp_file, "r") as f:
            data = f.read()
            string += data
    print(string)
By default, a NamedTemporaryFile deletes its file when closed. It's a bit subtle, but tmp = tempfile.NamedTemporaryFile(suffix=".txt") in the loop causes the previous file to be deleted when tmp is reassigned: the old object loses its last reference, so it is closed and its file is removed. One option is to use the delete=False parameter. Or just keep the file open and seek to the beginning after the write.
NamedTemporaryFile is already a file object, so you can write to it directly without reopening. Just make sure the mode is "write plus" and text, not binary. Put the code in a try/finally block to make sure the files are really deleted at the end.
import tempfile
tmp_files = []
try:
    for i in range(40):
        tmp = tempfile.NamedTemporaryFile(suffix=".txt", mode="w+")
        tmp.write(str(i))
        tmp.seek(0)
        tmp_files.append(tmp)
    string = ""
    for tmp_file in tmp_files:
        data = tmp_file.read()
        string += data
finally:
    for tmp_file in tmp_files:
        tmp_file.close()
print(string)
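The lifetime rule behind this answer can be checked directly: a file created by NamedTemporaryFile stays on disk while its object is kept open, and disappears when that object is closed. The sketch below assumes the default delete=True; make_temp_files is a hypothetical helper name.

```python
import os
import tempfile


def make_temp_files(count):
    """Create `count` named temporary files and keep the file objects alive."""
    handles = []
    for i in range(count):
        tmp = tempfile.NamedTemporaryFile(suffix=".txt", mode="w+")
        tmp.write(str(i))
        tmp.flush()
        handles.append(tmp)  # holding the object keeps the file on disk
    return handles


handles = make_temp_files(3)
names = [h.name for h in handles]
exist_while_open = [os.path.exists(n) for n in names]

for h in handles:
    h.close()  # closing deletes each file, since delete=True is the default
exist_after_close = [os.path.exists(n) for n in names]
```

Storing only tmp.name, as the question does, drops the object itself, which is why the files vanished one iteration later.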

Not able to fix file handling issue in python

I wrote Python code to search for a pattern in a Tcl file and replace it with a string. It prints the output, but the result is not saved to the Tcl file.
import re
import fileinput
filename=open("Fdrc.tcl","r+")
for i in filename:
    if i.find("set qa_label")!=-1:
        print(i)
        a=re.sub(r'REL.*','harsh',i)
        print(a)
filename.close()
actual result
set qa_label
REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01
set qa_label harsh
I expected the file to contain the same result as above, but it does not.
You need to actually write your changes back to disk if you want to see them reflected there. As @ImperishableNight says, you don't want to do this by writing to a file you're also reading from; you want to write to a new file. Here's an expanded version of your code that does that:
import re
import fileinput
fin=open("/tmp/Fdrc.tcl")
fout=open("/tmp/FdrcNew.tcl", "w")
for i in fin:
    if i.find("set qa_label")!=-1:
        print(i)
        a=re.sub(r'REL.*','harsh',i)
        print(a)
        fout.write(a)
    else:
        fout.write(i)
fin.close()
fout.close()
Input and output file contents:
> cat /tmp/Fdrc.tcl
set qa_label REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01
> cat /tmp/FdrcNew.tcl
set qa_label harsh
If you wanted to overwrite the original file, you would read the entire file into memory and close the input stream, then open the file again for writing and write the modified content to the same file.
Here's a cleaner version of your code that does this: it builds the result in memory and then writes it out using a new file handle. I am still writing to a different file here because that's usually what you want, at least while you're testing your code. You can simply change the name of the second file to match the first, and this code will overwrite the original file with the modified content:
import re
lines = []
with open("/tmp/Fdrc.tcl") as fin:
    for i in fin:
        if i.find("set qa_label")!=-1:
            print(i)
            i=re.sub(r'REL.*','harsh',i)
            print(i)
        lines.append(i)
with open("/tmp/FdrcNew.tcl", "w") as fout:
    fout.writelines(lines)
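For completeness, here is a sketch of the overwrite-in-place variant described above: read everything, close the input, then reopen the same path for writing. The function name rewrite_label and the demo file are assumptions; the pattern and marker string come from the question.

```python
import os
import re
import tempfile


def rewrite_label(path, new_label):
    """Replace the REL... value on 'set qa_label' lines and overwrite the file."""
    with open(path) as fin:
        lines = fin.readlines()  # the input handle is closed before writing starts
    updated = [
        re.sub(r"REL.*", new_label, line) if "set qa_label" in line else line
        for line in lines
    ]
    with open(path, "w") as fout:
        fout.writelines(updated)


# Demo on a throwaway file rather than the real Fdrc.tcl.
handle, demo_path = tempfile.mkstemp(suffix=".tcl")
os.close(handle)
with open(demo_path, "w") as f:
    f.write("set qa_label REL_ts07n0g42p22sadsl01msaA04_2018-09-11-11-01\nset other value\n")
rewrite_label(demo_path, "harsh")
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
```

Because "." in the pattern does not match the newline, each rewritten line keeps its line ending.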
Open a temporary file to hold the updated contents while you read the original file.
After modifying the lines, write them back to the original file.
import re
import fileinput
from tempfile import TemporaryFile
with TemporaryFile() as t:
    with open("Fdrc.tcl", "r") as file_reader:
        for line in file_reader:
            if line.find("set qa_label") != -1:
                t.write(
                    str.encode(
                        re.sub(r'REL.*', 'harsh', str(line))
                    )
                )
            else:
                t.write(str.encode(line))
    t.seek(0)
    with open("Fdrc.tcl", "wb") as file_writer:
        file_writer.writelines(t)

Why a new NamedTemporaryFile object has a path, but a file is not available? [duplicate]

I am attempting to create and write to a temporary file on Windows OS using Python. I have used the Python module tempfile to create a temporary file.
But when I go to write to that temporary file I get a Permission denied error. Am I not allowed to write to temporary files? Am I doing something wrong? If I want to create and write to a temporary file, how should I do it in Python? I want to create the temporary file in the temp directory for security purposes and not locally (in the directory the .exe is executing in).
IOError: [Errno 13] Permission denied: 'c:\\users\\blah~1\\appdata\\local\\temp\\tmpiwz8qw'
temp = tempfile.NamedTemporaryFile().name
f = open(temp, 'w') # error occurs on this line
NamedTemporaryFile actually creates and opens the file for you, there's no need for you to open it again for writing.
In fact, the Python docs state:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
That's why you're getting your permission error. What you're probably after is something like:
f = tempfile.NamedTemporaryFile(mode='w') # open file
temp = f.name # get name (if needed)
Use the delete parameter as below:
tmpf = NamedTemporaryFile(delete=False)
But then you need to manually delete the temporary file once you are done with it.
tmpf.close()
os.unlink(tmpf.name)
Reference for bug: https://github.com/bravoserver/bravo/issues/111
regards,
Vidyesh
Consider using os.path.join(tempfile.gettempdir(), os.urandom(24).hex()) instead. It's reliable, cross-platform, and the only caveat is that it doesn't work on FAT partitions.
NamedTemporaryFile has a number of issues, not the least of which is that it can fail to create files because of a permission error, fail to detect the permission error, and then loop millions of times, hanging your program and your filesystem.
The following custom implementation of named temporary file is expanded on the original answer by Erik Aronesty:
import os
import tempfile
class CustomNamedTemporaryFile:
    """
    This custom implementation is needed because of the following limitation of tempfile.NamedTemporaryFile:
    > Whether the name can be used to open the file a second time, while the named temporary file is still open,
    > varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
    """
    def __init__(self, mode='wb', delete=True):
        self._mode = mode
        self._delete = delete
    def __enter__(self):
        # Generate a random temporary file name
        file_name = os.path.join(tempfile.gettempdir(), os.urandom(24).hex())
        # Ensure the file is created
        open(file_name, "x").close()
        # Open the file in the given mode
        self._tempFile = open(file_name, self._mode)
        return self._tempFile
    def __exit__(self, exc_type, exc_val, exc_tb):
        self._tempFile.close()
        if self._delete:
            os.remove(self._tempFile.name)
This issue might be more complex than many of you think. Anyway, this was my solution:
Make use of the atexit module:
def delete_files(files):
    for file in files:
        file.close()
        os.unlink(file.name)
Create the NamedTemporaryFile with delete=False:
temp_files = []
result_file = NamedTemporaryFile(dir=tmp_path(), suffix=".xlsx", delete=False)
self.temp_files.append(result_file)
Register delete_files as a cleanup function:
atexit.register(delete_files, temp_files)
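Put together as one module-level script, the atexit approach might look like the sketch below. It is an illustration under assumptions: new_temp_file is a hypothetical helper, the dir= argument from the fragment above is omitted, and an existence check is added so the cleanup can safely run more than once.

```python
import atexit
import os
import tempfile

temp_files = []


def delete_files(files):
    """Close and remove every temporary file that is still on disk."""
    for f in files:
        f.close()  # closing an already closed file is a no-op
        if os.path.exists(f.name):
            os.unlink(f.name)


def new_temp_file(suffix=""):
    """Create a temp file that survives close() and is tracked for cleanup."""
    f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    temp_files.append(f)
    return f


# Run the cleanup when the interpreter exits normally.
atexit.register(delete_files, temp_files)

result_file = new_temp_file(suffix=".xlsx")
existed_before = os.path.exists(result_file.name)
```

Handlers registered with atexit run on normal interpreter exit, so files can still be left behind if the process is killed.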
tempfile.NamedTemporaryFile():
It creates and opens a temporary file for you.
f = open(temp, 'w'):
You are opening again a file that is already open, and that's why you are getting the Permission denied error.
If you really want to open the file again, you first need to close it, which looks something like this:
temp = tempfile.NamedTemporaryFile()
temp.close()  # with the default delete=True, closing also deletes the file
f = open(temp.name, 'w')  # creates a new, ordinary file at the same path
Permission was denied because the file is still open at line 2 of your code.
Close it first (f.close()), then you can start writing to your temp file.

Close already open csv in Python

Is there a way for Python to close a file that is already open?
Or at the very least display a popup saying that the file is open, or a custom error message popup for the permission error?
This is to avoid:
PermissionError: [Errno 13] Permission denied: 'C:\\zf.csv'
I've seen a lot of solutions that open a file and then close it through Python, but my case is different. Let's say I left my CSV open and then tried to run the job.
How can I make it close the currently opened CSV?
I've tried the variations below, but none seem to work, as they expect that I opened the CSV at an earlier point through Python. I suspect I'm overcomplicating this.
f = 'C:\\zf.csv'
file.close()
AttributeError: 'str' object has no attribute 'close'
This gives an error as there is no reference to opening of file but simply strings.
Or even..
theFile = open(f)
file_content = theFile.read()
# do whatever you need to do
theFile.close()
As well as:
fileobj=open('C:\\zf.csv',"wb+")
if not fileobj.closed:
    print("file is already opened")
How do I close an already open csv?
The only workaround I can think of would be to add a messagebox, though I can't seem to get it to detect the file.
filename = "C:\\zf.csv"
if not os.access(filename, os.W_OK):
    print("Write access not permitted on %s" % filename)
    messagebox.showinfo("Title", "Close your CSV")
Try using a with context, which will manage the close (__exit__) operation smoothly at the end of the context:
with open(...) as theFile:
    file_content = theFile.read()
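A with block only manages handles opened by your own script. For the custom error message the question asks about, one option is to attempt the open and catch PermissionError, the exception shown in the question. This is a sketch under assumptions: can_write is a hypothetical helper, the message text is made up, and opening in append mode will create the file if it does not exist yet.

```python
import os
import tempfile


def can_write(path):
    """Return True if the file can be opened for appending, False on PermissionError."""
    try:
        with open(path, "a"):
            return True
    except PermissionError:
        return False


# Demo on a throwaway CSV that nothing else holds open.
handle, demo_path = tempfile.mkstemp(suffix=".csv")
os.close(handle)
writable = can_write(demo_path)
message = "ok" if writable else "Close %s in the other program and retry" % demo_path
os.remove(demo_path)
```

The message string could be passed to messagebox.showinfo in place of the os.access check, which may not reflect a lock held by another program.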
You can also try copying the file to a temporary file, which you can open, close and remove at will. This requires read access to the original, though.
In this example I have a file "test.txt" that is read-only (chmod 444), and it throws a "Permission denied" error if I try writing to it directly. I copy it to a temporary file that gets "777" rights so that I can do what I want with it:
import tempfile, shutil, os
def create_temporary_copy(path):
    temp_dir = tempfile.gettempdir()
    temp_path = os.path.join(temp_dir, 'temp_file_name')
    if os.path.exists(temp_path):
        os.chmod(temp_path, 0o777)  # make a leftover copy writable so it can be overwritten
    shutil.copy2(path, temp_path)  # copy the original into the temp one
    os.chmod(temp_path, 0o777)  # copy2 carried over the original's permissions; open them up
    return temp_path
path = "./test.txt" # original file
copy_path = create_temporary_copy(path) # temp copy
with open(copy_path, "w") as g: # can do what I want with it
    g.write("TEST\n")
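A variation on the same idea, sketched here as an alternative and not as the answer's own method, lets tempfile.mkstemp pick a unique name so that repeated runs or parallel jobs don't collide on 'temp_file_name'. It uses shutil.copyfile, which copies contents only, so the copy keeps the owner-writable permissions that mkstemp gives it. The demo file names are made up.

```python
import os
import shutil
import tempfile


def create_temporary_copy(path):
    """Copy `path` to a uniquely named temp file and return the new path."""
    suffix = os.path.splitext(path)[1]
    handle, temp_path = tempfile.mkstemp(suffix=suffix)
    os.close(handle)  # only the name is needed; copyfile reopens the file
    shutil.copyfile(path, temp_path)  # contents only, permissions are not copied
    return temp_path


# Demo: the copy can be modified without touching the original.
handle, source_path = tempfile.mkstemp(suffix=".txt")
os.close(handle)
with open(source_path, "w") as f:
    f.write("original\n")

copy_path = create_temporary_copy(source_path)
with open(copy_path, "w") as g:
    g.write("TEST\n")

with open(source_path) as f:
    original_after = f.read()
with open(copy_path) as f:
    copy_after = f.read()
os.remove(copy_path)
os.remove(source_path)
```

The caller is responsible for removing the copy when finished, as the demo does.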
f = open("C:/Users/amol/Downloads/result.csv", "r")
print(f.readlines()) #just to check file is open
f.close()
# here you can add above print statement to check if file is closed or not. I am using python 3.5

How to use tempfile.NamedTemporaryFile() in Python

I want to use tempfile.NamedTemporaryFile() to write some contents into a file and then open that file. I have written the following code:
tf = tempfile.NamedTemporaryFile()
tfName = tf.name
tf.seek(0)
tf.write(contents)
tf.flush()
but I am unable to open this file and see its contents in Notepad or similar application. Is there any way to achieve this? Why can't I do something like:
os.system('start notepad.exe ' + tfName)
at the end.
I don't want to save the file permanently on my system. I just want the contents to be opened as a text in Notepad or similar application and delete the file when I close that application.
This could be one of two reasons:
Firstly, by default the temporary file is deleted as soon as it is closed. To fix this use:
tf = tempfile.NamedTemporaryFile(delete=False)
and then delete the file manually once you've finished viewing it in the other application.
Alternatively, it could be that, because the file is still open in Python, Windows won't let you open it using another application.
Edit: to answer some questions from the comments:
According to the docs, when using delete=False the file can be removed with:
tf.close()
os.unlink(tf.name)
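Combining both points (delete=False, and closing the file before another program opens it) gives the flow sketched below. The helper name show_in_external_program is hypothetical, and a second Python process stands in for Notepad so the example runs on any platform; on Windows the command could be ["notepad.exe"], assuming the launched process stays alive until the window is closed.

```python
import os
import subprocess
import sys
import tempfile


def show_in_external_program(contents, command):
    """Write contents to a temp file, run `command <file>`, then delete the file."""
    tf = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
    try:
        tf.write(contents)
        tf.close()  # release the handle so the other program can open the file
        # run() blocks until the external program exits.
        completed = subprocess.run(command + [tf.name], capture_output=True, text=True)
        return completed.stdout, tf.name
    finally:
        tf.close()
        os.unlink(tf.name)


# Stand-in for an editor: a child Python process that prints the file.
reader = [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read())"]
output, used_name = show_in_external_program("Some data", reader)
```

The finally clause removes the file even if the external program fails to start.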
You can also use it with a context manager so that the file will be closed/deleted when it goes out of scope. It will also be cleaned up if the code in the context manager raises.
import tempfile
with tempfile.NamedTemporaryFile() as temp:
    temp.write(b'Some data')  # the default mode is binary ('w+b'), so write bytes
    temp.flush()
    # do something interesting with temp before it is destroyed
Here is a useful context manager for this.
(In my opinion, this functionality should be part of the Python standard library.)
# python2 or python3
import contextlib
import os
@contextlib.contextmanager
def temporary_filename(suffix=None):
    """Context that introduces a temporary file.
    Creates a temporary file, yields its name, and upon context exit, deletes it.
    (In contrast, tempfile.NamedTemporaryFile() provides a 'file' object and
    deletes the file as soon as that file object is closed, so the temporary file
    cannot be safely re-opened by another library or process.)
    Args:
      suffix: desired filename extension (e.g. '.mp4').
    Yields:
      The name of the temporary file.
    """
    import tempfile
    try:
        f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
        tmp_name = f.name
        f.close()
        yield tmp_name
    finally:
        os.unlink(tmp_name)
# Example:
with temporary_filename() as filename:
    os.system('echo Hello >' + filename)
    assert 6 <= os.path.getsize(filename) <= 8 # depending on text EOL
assert not os.path.exists(filename)
