Using too many open() calls. How do I close all files? - python

I'm trying to modify a lot of PDF files, so I have to open many files and I end up calling open() too many times. Python then raises the error "too many open files".
My code contains many similar calls like these:
readerbanner = PyPDF2.pdf.PdfFileReader(open('transafe.pdf', 'rb'))
readertestpages = PyPDF2.pdf.PdfFileReader(open(os.path.join(Cache_path, cache_file_name), 'rb'))
writeroutput.write(open(os.path.join(Output_path, cache_file_name), 'wb'))
or
writer_output.write(open(os.path.join(Cache_path, str(NumPage) + "_" + pdf_file_name), 'wb'))
reader_page_x = PyPDF2.pdf.PdfFileReader(open(os.path.join(PDF_path, pdf_file_name), 'rb'))
None of these open() calls saves the file object to a variable like f_name = open("path", "r"), because each file is only needed for a short period. I know where the files are opened, but I don't know how to close all of the open files.

To close a file, just call close() on it.
You can also use a context manager, which handles closing the file for you:
with open('file.txt') as myfile:
    # do something with myfile here
# here myfile is automatically closed

As far as I know, this code should not open too many files unless it is run a lot of times.
Regardless, the problem is that you call:
PyPDF2.pdf.PdfFileReader(open('transafe.pdf', 'rb'))
and similar. This creates a file object, but saves no reference to it.
What you need to do for all open calls is:
file = open('transafe.pdf', 'rb')
PyPDF2.pdf.PdfFileReader(file)
And then:
file.close()
when you do not use the file anymore.
If you want to close many files at the same time, put them in a list.
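For example, a minimal sketch of that list approach, reusing the PyPDF2 calls from the question:
import PyPDF2

open_files = []

banner_file = open('transafe.pdf', 'rb')
open_files.append(banner_file)
readerbanner = PyPDF2.pdf.PdfFileReader(banner_file)

# ... open the other readers and writers the same way,
# appending each file object to open_files ...

# close everything in one place when done
for f in open_files:
    f.close()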

with statement
with open("abc.txt", "r") as file1, open("123.txt", "r") as file2:
    # use the files
    foo = file1.readlines()
# they are closed automatically
print(file1.closed)
# -> True
print(file2.closed)
# -> True
wrapper function
files = []

def my_open(*args):
    f = open(*args)
    files.append(f)
    return f

# use my_open
foo = my_open("text.txt", "r")
# close all files
list(map(lambda f: f.close(), files))
wrapper class
class OpenFiles():
    def __init__(self):
        self.files = []

    def open(self, *args):
        f = open(*args)
        self.files.append(f)
        return f

    def close(self):
        list(map(lambda f: f.close(), self.files))

files = OpenFiles()
# use the open method
foo = files.open("text.txt", "r")
# close all files
files.close()
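As a side note, the wrapper class can also be made a context manager (a sketch, not part of the original answer), so that every collected file is closed even if the block raises an exception:
class OpenFiles:
    def __init__(self):
        self.files = []

    def open(self, *args):
        f = open(*args)
        self.files.append(f)
        return f

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # runs on normal exit and on exceptions alike
        for f in self.files:
            f.close()

with OpenFiles() as files:
    foo = files.open("text.txt", "r")
# all collected files are closed here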

ExitStack can be useful:
https://docs.python.org/3/library/contextlib.html#contextlib.ExitStack
from contextlib import ExitStack

with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # All opened files will automatically be closed at the end of
    # the with statement, even if attempts to open files later
    # in the list raise an exception

Related

Undesired deletion of temporary files

I am trying to create some temporary files and perform some operations on them inside a loop, then access the information in all of the temporary files and do some operations with it. For simplicity, the following code reproduces my issue:
import tempfile

tmp_files = []
for i in range(40):
    tmp = tempfile.NamedTemporaryFile(suffix=".txt")
    with open(tmp.name, "w") as f:
        f.write(str(i))
    tmp_files.append(tmp.name)

string = ""
for tmp_file in tmp_files:
    with open(tmp_file, "r") as f:
        data = f.read()
        string += data
print(string)
ERROR:
    with open(tmp_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpynh0kbnw.txt'
When I look in the /tmp directory (with a time.sleep(2) in the loop), I see that each file is deleted and only one is preserved, hence the error.
Of course I could keep all the files with the flag tempfile.NamedTemporaryFile(suffix=".txt", delete=False), but that is not the idea. I would like to keep the temporary files just for the running time of the script. I could also delete the files with os.remove. But my question is more about why this happens: I expected the files to survive until the end of the run, because I don't close them during execution (or do I?).
Many thanks in advance.
tdelaney has already answered your actual question.
I would just like to offer an alternative to NamedTemporaryFile. Why not create a temporary folder, which is removed (with all files in it) at the end of the script?
Instead of using a NamedTemporaryFile, you could use tempfile.TemporaryDirectory. The directory will be deleted when closed.
The example below uses the with statement, which closes the file handle automatically when the block ends (see John Gordon's comment).
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_folder:
    tmp_files = []
    for i in range(40):
        tmp_file = os.path.join(temp_folder, f"{i}.txt")
        with open(tmp_file, "w") as f:
            f.write(str(i))
        tmp_files.append(tmp_file)

    string = ""
    for tmp_file in tmp_files:
        with open(tmp_file, "r") as f:
            data = f.read()
            string += data
    print(string)
By default, a NamedTemporaryFile deletes its file when closed. It's a bit subtle, but tmp = tempfile.NamedTemporaryFile(suffix=".txt") in the loop causes the previous file object to be garbage-collected and closed when tmp is reassigned, which deletes the previous file. One option is to use the delete=False parameter. Or, just keep the file open and seek back to the beginning after the write.
NamedTemporaryFile is already a file object - you can write to it directly without reopening it. Just make sure the mode is "write plus" and text, not binary. Put the code in a try/finally block to make sure the files are really deleted at the end.
import tempfile

tmp_files = []
try:
    for i in range(40):
        tmp = tempfile.NamedTemporaryFile(suffix=".txt", mode="w+")
        tmp.write(str(i))
        tmp.seek(0)
        tmp_files.append(tmp)

    string = ""
    for tmp_file in tmp_files:
        data = tmp_file.read()
        string += data
finally:
    for tmp_file in tmp_files:
        tmp_file.close()

print(string)
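For completeness, here is a sketch of the delete=False option mentioned above; the files then survive close(), so they have to be removed manually at the end:
import os
import tempfile

tmp_paths = []
try:
    for i in range(40):
        # delete=False: the file is NOT removed when the object is
        # closed or garbage-collected
        with tempfile.NamedTemporaryFile(suffix=".txt", mode="w",
                                         delete=False) as tmp:
            tmp.write(str(i))
        tmp_paths.append(tmp.name)

    string = ""
    for path in tmp_paths:
        with open(path, "r") as f:
            string += f.read()
    print(string)
finally:
    for path in tmp_paths:
        os.remove(path)  # manual cleanup is now our responsibility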

Delete a temporary file when it is closed in Python

I have a function similar to this:
from shutil import copy
from tempfile import mktemp

def open_tmp():
    tmp = mktemp()
    copy('file.txt', tmp)
    return open(tmp, 'rt')
and I would like the created temporary file to be removed automatically when the file is closed, for example:
file = open_tmp()
# Do something with file
file.close()  # I want the temporary file removed here
Is this possible? I thought of creating a subclass of IOBase and overriding the close() method, but I think it is too much work because I would have to override all the IOBase methods.
You can try this code snippet. For security reasons I recommend using tempfile.mkstemp instead of mktemp:
import os
import tempfile

new_file, file_path = tempfile.mkstemp()
try:
    with os.fdopen(new_file, 'w') as temp_file:
        # Do something with the file
        temp_file.write('write some dummy text in file')
finally:
    os.remove(file_path)
I've found the solution:
import os
import tempfile
from shutil import copy

def open_tmp():
    fd, tmp_path = tempfile.mkstemp()  # mkstemp returns (fd, path), not just a path
    os.close(fd)  # close the OS-level descriptor; we reopen by path below
    copy('file.txt', tmp_path)  # copy file.txt to the temporary file
    file = open(tmp_path, 'rt')
    old_close = file.close

    def close():
        old_close()
        os.remove(tmp_path)

    file.close = close
    return file
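A simpler alternative (a sketch, not from the original answers) is to let tempfile.NamedTemporaryFile do the cleanup, since by default it deletes its file when closed:
import shutil
import tempfile

def open_tmp():
    # the file is deleted automatically when .close() is called
    # on the returned object
    tmp = tempfile.NamedTemporaryFile(mode='w+t')
    with open('file.txt', 'rt') as src:
        shutil.copyfileobj(src, tmp)  # copy file.txt into the temp file
    tmp.seek(0)  # rewind so the caller reads from the beginning
    return tmp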

Race condition: reading/writing a file (Windows)

I have the following situation:
- different users (all on Windows) run a Python script that can either read from or write to a pickle file located in a shared folder
- the "system" is designed so that only one user at a time will be writing to the file (therefore there is no race condition of multiple processes trying to WRITE to the file at the same time)
- the basic code to write would be this:
with open(path + r'\final_db.p', 'wb') as f:
    pickle.dump((x, y), f)
- while the code to read would be:
with open(path + r'\final_db.p', 'rb') as f:
    x, y = pickle.load(f)
- x is a list of 5,000+ elements, where each element is a class instance containing many attributes and functions; y is a date
QUESTION:
Am I correct in assuming that there is a race condition when a reading and a writing process overlap, and that the reading one can end up with a corrupt file?
PROPOSED SOLUTIONS:
1. A possible solution I thought of is using filelock.
code to write:
file_path = path + r'\final_db.p'
lock_path = file_path + '.lock'
lock = filelock.FileLock(lock_path, timeout=-1)
with lock:
    with open(file_path, 'wb') as f:
        pickle.dump((x, y), f)
code to read:
file_path = path + r'\final_db.p'
lock_path = file_path + '.lock'
lock = filelock.FileLock(lock_path, timeout=-1)
with lock:
    with open(file_path, 'rb') as f:
        x, y = pickle.load(f)
This solution should work (??), but if a process crashes, the file remains locked until the file_path + '.lock' file is deleted.
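As a side note, filelock's documented timeout parameter at least keeps a process from blocking forever behind a stale lock; a sketch (file_path and lock_path as above):
import pickle

import filelock

lock = filelock.FileLock(lock_path, timeout=10)  # wait at most 10 seconds
try:
    with lock:
        with open(file_path, 'rb') as f:
            x, y = pickle.load(f)
except filelock.Timeout:
    # the lock could not be acquired in time;
    # decide here how to handle a possibly stale lock file
    pass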
2. Another solution could be to use portalocker.
code to write:
with open(path + r'\final_db.p', 'wb') as f:
    portalocker.lock(f, portalocker.LOCK_EX)
    pickle.dump((x, y), f)
code to read:
segnale = True
while segnale:
    try:
        with open(path + r'\final_db.p', 'rb') as f:
            x, y = pickle.load(f)
        segnale = False
    except:
        pass
The reading process, if another process started writing before it, will keep looping until the file is unlocked (catching PermissionError). If the writing process started after the reading process, the read should loop while the file is corrupt. What I am not sure about is whether the reading process could end up reading a partially written file.
Any advice? Better solutions?
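One standard technique worth mentioning here (a sketch, not from the thread): write to a temporary file in the same directory and swap it into place with os.replace, which is atomic for paths on the same volume on both POSIX and Windows, so readers see either the old file or the new one, never a partial write (whether this holds on a network share depends on the file system):
import os
import pickle
import tempfile

def atomic_pickle_dump(obj, final_path):
    dir_name = os.path.dirname(final_path)
    # the temp file must live on the same volume as the target
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, 'wb') as f:
            pickle.dump(obj, f)
        os.replace(tmp_path, final_path)  # atomic swap into place
    except BaseException:
        os.remove(tmp_path)  # don't leave a half-written temp file behind
        raise

# usage, with x, y and path as in the question:
# atomic_pickle_dump((x, y), path + r'\final_db.p')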

ValueError: I/O operation on closed file while looping

Hi, I am facing an I/O error while looping over file operations. The code raises 'ValueError: I/O operation on closed file.' while running. Does anyone have any idea why it complains about a closed file, when I am opening a new one on each loop iteration? Many thanks.
code below:
with open('inputlist.csv', 'r') as f:  # input list reading
    reader = csv.reader(f)
    queries2Google = reader
print(queries2Google)

def QGN(query2Google):
    s = '"' + query2Google + '"'  # keywords for query, to solve the + for space
    s = s.replace(" ", "+")
    date = str(datetime.datetime.now().date())  # timestamp
    filename = query2Google + "_" + date + "_" + 'SearchNews.csv'  # csv filename
    f = open(filename, "wb")  # open output file
    pass
    df = np.reshape(df, (-1, 3))
    itemnum, col = df.shape
    itemnum = str(itemnum)
    df1 = pd.DataFrame(df, columns=['Title', 'URL', 'Brief'])
    print("Done! " + itemnum + " pieces found.")
    df1.to_csv(filename, index=False, encoding='utf-8')
    f.close()
    return

for query2Google in queries2Google:
    QGN(query2Google)  # output should be multiple files
with closes the file that you are trying to read once it is done. So you are opening the file, making a csv reader, then closing the underlying file, and then trying to read from it. See more about file I/O here.
The solution is to do all of your work on the queries2Google reader INSIDE the with statement:
with open('inputlist.csv', 'r') as f:  # input list reading
    reader = csv.reader(f)
    for q2g in reader:
        QGN(q2g)
Some additional stuff:
That pass isn't doing anything, and you should probably be using with again inside the QGN function, since a file is opened and closed in there. Python doesn't need empty returns. You also don't seem to even be using f in the QGN function.
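For illustration, a sketch of QGN with the unused file handle dropped; to_csv opens and closes the output file itself, so no manual open()/close() is needed. The results parameter is hypothetical, standing in for whatever the elided search code in the question produces:
import datetime

import numpy as np
import pandas as pd

def QGN(query2Google, results):
    # results: flat list of scraped values, length divisible by 3,
    # assumed to come from the search code elided in the question
    date = str(datetime.datetime.now().date())  # timestamp
    filename = query2Google + "_" + date + "_" + 'SearchNews.csv'
    df = np.reshape(results, (-1, 3))
    df1 = pd.DataFrame(df, columns=['Title', 'URL', 'Brief'])
    print("Done! " + str(df1.shape[0]) + " pieces found.")
    # to_csv manages the output file, so the f = open(...) / f.close()
    # pair from the question can simply be removed
    df1.to_csv(filename, index=False, encoding='utf-8')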

Creating a new text file in Python?

Is there a method of creating a text file without opening it in "w" or "a" mode? For instance, if I want to open a file in "r" mode but the file does not exist, then when I catch the IOError I want a new file to be created.
e.g.:
while flag == True:
    try:
        # opening src in a+ mode will allow me to read and append to file
        with open("Class {0} data.txt".format(classNo), "r") as src:
            # list containing all data from file, one line is one item in list
            data = src.readlines()
            for ind, line in enumerate(data):
                if surname.lower() and firstName.lower() in line.lower():
                    # overwrite the relevant item in data with the updated score
                    data[ind] = "{0} {1}\n".format(line.rstrip(), score)
                    rewrite = True
                else:
                    with open("Class {0} data.txt".format(classNo), "a") as src:
                        src.write("{0},{1} : {2}{3} ".format(surname, firstName, score, "\n"))
        if rewrite == True:
            # reopen src in write mode and overwrite all the records with the items in data
            with open("Class {} data.txt".format(classNo), "w") as src:
                src.writelines(data)
        flag = False
    except IOError:
        print("New data file created")
        # Here I want a new file to be created and assigned to the variable src so when the
        # while loop iterates for the second time the file should successfully open
At the beginning just check whether the file exists, and create it if it doesn't:
import os

filename = "Class {0} data.txt".format(classNo)
if not os.path.isfile(filename):
    open(filename, 'w').close()
From this point on you can assume the file exists, this will greatly simplify your code.
No operating system will allow you to create a file without actually writing to it. You can encapsulate this in a library so that the creation is not visible, but it is impossible to avoid writing to the file system if you really want to modify the file system.
Here is a quick and dirty open replacement which does what you propose.
def open_for_reading_create_if_missing(filename):
    try:
        handle = open(filename, 'r')
    except IOError:
        with open(filename, 'w') as f:
            pass
        handle = open(filename, 'r')
    return handle
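A variant of the same idea (a sketch using Python 3's exclusive-creation "x" mode, which the original answer does not use) also avoids a race between two processes both deciding to create the file:
def open_for_reading_create_if_missing(filename):
    try:
        # "x" raises FileExistsError if the file is already there,
        # so at most one process performs the creation
        open(filename, "x").close()
    except FileExistsError:
        pass
    return open(filename, "r")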
Better would be to create the file if it doesn't exist, e.g. something like:
import os

def ensure_file_exists(file_name):
    """Make sure that a file with the given name exists."""
    (the_dir, fname) = os.path.split(file_name)
    if the_dir and not os.path.exists(the_dir):
        os.makedirs(the_dir)  # this may raise if the directory cannot be made
    if not os.path.exists(file_name):
        open(file_name, 'w').close()
You could even have a safe_open function that did something similar prior to opening for read and returning the file handle.
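On Python 3.4+, pathlib gives the same "touch" semantics in one call; a sketch, with classNo being the variable from the question:
from pathlib import Path

data_file = Path("Class {0} data.txt".format(classNo))
data_file.touch(exist_ok=True)  # create the file if missing, no-op otherwise
with data_file.open("r") as src:
    data = src.readlines()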
The sample code provided in the question is not very clear, especially because it references multiple variables that are not defined anywhere. But based on it, here is my suggestion. You can create a function similar to touch + file open, but one that is platform agnostic.
def touch_open(filename):
    try:
        connect = open(filename, "r")
    except IOError:
        connect = open(filename, "a")
        connect.close()
        connect = open(filename, "r")
    return connect
This function will open the file for you if it exists. If the file doesn't exist, it will create a blank file with the same name and then open it. An additional bonus over import os; os.system('touch test.txt') is that it does not create a child process in the shell, making it faster.
Since it doesn't use the with open(filename) as src syntax, you should either remember to close the connection at the end with connection = touch_open(filename); connection.close(), or preferably you can consume it in a for loop. Example:
file2open = "test.txt"
for i, row in enumerate(touch_open(file2open)):
    print(i, row)  # print the line number and content
This option should be preferred to data = src.readlines() followed by enumerate(data), found in your code, because it avoids looping through the file twice.
