Python how to ensure file writing completely when closing program? - python

I have a Python script which dumps a YAML file every second while running. However, I found that sometimes the YAML file is incomplete. My guess is that I happen to close the program (the script is running in the Windows command line) at the same moment the file is being saved. Sample code as follows:
class State(object):
    def __init__(self):
        ...
        self.__t = threading.Thread(name='StateAutoSave', target=self.__auto_save)
        self.__t.start()

    def __auto_save(self):
        while 1:
            try:
                ...
                self.__save()
            except Exception as err:
                logging.exception(err)
            time.sleep(1)

    def __save(self):
        ...
        with open(self.__yaml_file, 'w') as outfile:
            yaml.dump(data, outfile, default_flow_style=False)
How can I avoid this problem? Or is there something like a destructor function in Python, so that we can do something when the program is being closed? (It seems that 'with' alone does not fully solve this.)

The atexit module is made just for this purpose.
https://docs.python.org/3/library/atexit.html
Just beware that it only works for normal termination of the script (Ctrl+C is considered normal), and it won't work if your app suddenly crashes or you have to force close it.
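For example, a minimal sketch (save_state here is a hypothetical stand-in for the asker's __save method):

import atexit

def save_state():
    # hypothetical stand-in: write the YAML file one last time here
    ...

# runs on normal interpreter shutdown (including sys.exit() and an
# unhandled KeyboardInterrupt), but not on os._exit(), a crash, or a forced kill
atexit.register(save_state)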

Understanding python close method

Is it correctly understood that the following two functions do exactly the same thing, no matter how they are invoked?
def test():
    file = open("testfile.txt", "w")
    file.write("Hello World")

def test_2():
    with open("testfile.txt", "w") as f:
        f.write("Hello World")
My reasoning is that Python invokes the close method when an object is no longer referenced.
If not, then this quote confuses me:
Python automatically closes a file when the reference object of a file
is reassigned to another file. It is a good practice to use the
close() method to close a file.
from https://www.tutorialspoint.com/python/file_close.htm
No. The close method would be invoked by the Python garbage collector (finalizer) machinery in the first case, and immediately in the second case. If you call your test or test_2 functions thousands of times in a loop, the observed behavior could differ.
File descriptors are (at least on Linux) a precious and scarce resource (when they are exhausted, the open(2) syscall fails). On Linux use getrlimit(2) with RLIMIT_NOFILE to query the limit on the number of file descriptors for your process. You should prefer to have the close(2) syscall invoked promptly once a file handle is no longer needed.
Your question is implementation specific, operating system specific, and computer specific. You may want to understand more about operating systems by reading Operating Systems: Three Easy Pieces.
On Linux, also try the cat /proc/$$/limits or cat /proc/self/limits command in a terminal. You will see a line starting with Max open files (on my Debian desktop computer, right now in December 2019, the soft limit is 1024). See proc(5).
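From Python, the same limit can be queried with the standard resource module (Unix only); a minimal sketch:

import resource

# soft/hard limit on open file descriptors for this process (Unix only)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("max open files: soft=%d hard=%d" % (soft, hard))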
No. The first one does not reliably save the information. You need to call file.close() to ensure the file is closed properly and the data is saved.
On the other hand, the with statement handles this for you. It keeps the file open while execution stays inside the with block, and automatically closes (and thereby saves) the file as soon as execution leaves the block.
More information here.
In the case of the test function, the close method is not called until the Python garbage collector deletes the file object; it is then invoked from the file's __del__ magic method.
In the case of the test_2 function, the close method is called when execution leaves the with statement. Read more about Python context managers, which the with statement uses.
with foo as f:
    do_something()

is roughly just syntactic sugar for:
f = foo.__enter__()
do_something()
f.__exit__()
and in the case of a file object, __exit__ implicitly calls close.
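A slightly fuller sketch of that expansion (still simplified; when an exception occurs, the real machinery passes the exception type, value and traceback to __exit__ instead of None):

mgr = foo
f = mgr.__enter__()                  # the value bound by "as f"
try:
    do_something()
finally:
    mgr.__exit__(None, None, None)   # for a file object, this closes it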
No, it is not correctly understood. The close method is invoked via the __exit__ method, which is only invoked when exiting a with statement, not when exiting a function. See the code example below:
class Temp:
    def __exit__(self, exc_type, exc_value, tb):
        print('exited')

    def __enter__(self):
        pass

def make_temp():
    temp = Temp()

make_temp()
print('temp_make')

with Temp() as temp:
    pass
print('temp_with')
Which outputs:
temp_make
exited
temp_with

So confused on how to write to a file repeatedly in one loop (python3.6)

I'm running a program that takes in data from other clients, and I've been having an enormous amount of trouble writing and changing information in a file. I want to save the information so that if the program stops for some reason, the data has already been saved. I feel like I have tried everything: file.flush, os.fsync() with it, with open(file) as file: statements to close the file when the program stops, and currently atexit to have a function write to the file on close, which hasn't worked out either (and it isn't called on crashes, so it's of limited use). I'm looking for a way to write to a file repeatedly that actually works. I may be misunderstanding something basic, so please explain it to me.
EDIT
AccData = {}
client = discord.Client()
User = discord.User

def SaveData():
    pickle.dump(AccData, data)
    data.close()
    print("data saved")

atexit.register(SaveData)

f = open('DisCoin.json', 'rb')
AccData = pickle.load(open('DisCoin.json', 'rb'))
f.seek(0)
f.close()

data = open('DisCoin.json', 'wb')
Python catches its own exceptions and most signals, and on exit() it runs the atexit routines for cleanup. So you can deal with normal badness there.
But other bad things happen: a segmentation fault or other internal error, an unhandled signal, code that calls os._exit(). These cause early termination, and data not yet flushed is lost. Bad things can happen to any program, and if it needs extra resiliency, it needs some way to handle them.
You can write things to temporary files and rename them to the "live" file only when they are complete. If a program terminates, at least its last saved data is still there.
You can write a log or journal of changes and rebuild the data you want by scanning that log. That's how many file systems work, and "Big Data" map/reduce systems do basically the same thing.
You can move to a database and use its transaction processing, or any OLTP system, to make sure you do all-or-nothing updates to your data store.
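For example, a minimal sketch with the standard-library sqlite3 module (the database file, table and column names are made up for illustration):

import sqlite3

conn = sqlite3.connect('discoin.db')  # hypothetical database file
conn.execute('CREATE TABLE IF NOT EXISTS accounts (user TEXT PRIMARY KEY, coins INTEGER)')

# used as a context manager, the connection commits on success and rolls
# back if an exception escapes the block, so the update is all-or-nothing
with conn:
    conn.execute('INSERT OR REPLACE INTO accounts VALUES (?, ?)', ('alice', 42))

conn.close()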
Your example code is especially fragile because
data = open('DisCoin.json','wb')
trashes existing data on disk. There is no going back with this code! Step one, then, is don't do that. Keep old data until the new stuff is ready.
Here is an example class that manages temporary files for you. Use it instead of open: it creates a temporary file for you to update and only makes the data live if the with block exits without an exception. There is no need for an atexit handler if you use it in a with block.
import shutil
import os

class SidelineFile:
    def __init__(self, *args, **kw):
        self.args = list(args)
        self.kw = kw

    def __enter__(self):
        self.closed = False
        self.orig_filename = self.args[0]
        self.args[0] += '.tmp'
        try:
            mode = self.args[1]
        except IndexError:
            try:
                mode = self.kw['mode']
            except KeyError:
                mode = 'r'
        if 'a' in mode:
            shutil.copy2(self.orig_filename, self.args[0])
        self.file_obj = open(*self.args, **self.kw)
        return self.file_obj

    def __exit__(self, exc_type, exc_value, traceback):
        if not self.closed:
            self.file_obj.close()
            self.closed = True
            if not exc_type:
                os.rename(self.args[0], self.orig_filename)
            else:
                os.remove(self.args[0])
fn = 'test.txt'

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(1, repr(open(fn).read()))

with SidelineFile(fn, mode='a') as fp:
    fp.write("bar")
print(2, repr(open(fn).read()))

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(3, repr(open(fn).read()))

try:
    with SidelineFile(fn, 'a') as fp:
        fp.write("bar")
        raise IndexError()
except IndexError:
    pass
print(4, repr(open(fn).read()))
Personally, I like to achieve this by defining a print function for it.
import os

def fprint(text, **kwargs):
    os.chdir('C:\\mypath')
    myfile = open('output.txt', 'a')
    if kwargs:
        print(text, end=kwargs['end'], file=myfile)
    else:
        print(text, file=myfile)
    myfile.close()

fprint('Hello')
input()
fprint('This is here too', end='!!\n')
The above code will write 'Hello' into the file 'output.txt' at C:\mypath, save it, then after you enter some input will write 'This is here too!!' into the file. If you check the file while the script is waiting for input, it should already contain 'Hello'.

Python module function returning immediately in thread

I've written a couple of Twitter scrapers in Python, and am writing another script to keep them running even if they suffer a timeout, disconnection, etc.
My current solution is as follows:
Each scraper file has a doScrape/1 function in it, which starts up a scraper and runs it once, e.g.:
def doScrape(logger):
    try:
        with DBWriter(logger=logger) as db:
            logger.log_info("starting", __name__)
            s = PastScraper(db.getKeywords(), TwitterAuth(), db, logger)
            s.run()
    finally:
        logger.log_info("Done", __name__)
Where run is a near-infinite loop, which won't break unless there is an exception.
In order to run one of each kind of scraper at once, I'm using this code (with a few extra imports):
from threading import Thread

class ScraperThread(Thread):
    def __init__(self, module, logger):
        super(ScraperThread, self).__init__()
        self.module = module  # Module should contain a doScrape(logger) function
        self.logger = logger

    def run(self):
        while True:
            try:
                print "Starting!"
                print self.module.doScrape
                self.module.doScrape(self.logger)
            except:  # if for any reason we get disconnected, reconnect
                self.logger.log_debug("Restarting scraper", __name__)

if __name__ == "__main__":
    with Logger(level="all", handle=open(sys.argv[1], "a")) as l:
        past = ScraperThread(PastScraper, l)
        stream = ScraperThread(StreamScraper, l)
        past.start()
        stream.start()
        past.join()
        stream.join()
However, it appears that my call to doScrape above returns immediately, so "Starting!" is printed to the console repeatedly, and the "Done" message in the finally block is never written to the log. When run individually like so:
if __name__ == "__main__":
    # Example instantiation
    from Scrapers.Logging import Logger
    with Logger(level="all", handle=open(sys.argv[1], "a")) as l:
        doScrape(l)
The code runs forever, as expected. I'm a bit stumped.
Is there anything silly that I might have missed?
Get rid of the "diaper pattern" in your run() method, i.e. that bare catch-all exception handler. You'll probably see the actual error printed then. I suspect something is wrong in DBWriter or other code you're calling from your doScrape function; perhaps it is not thread-safe. That would explain why running it from the main program directly works, but calling it from a thread fails.
Aha, solved it! It was actually that I didn't realise that a default argument (here in TwitterAuth()) is evaluated at definition time. TwitterAuth reads the API key settings from a file handle, and the default argument opens up the default config file. Since this file handle is generated at definition time, both threads had the same handle, and once one had read it, the other one tried to read from the end of the file, throwing an exception. This is remedied by resetting the file before use, and using a mutex.
Cheers to Irmen de Jong for pointing me in the right direction.
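For reference, a minimal sketch of that gotcha with made-up names (not the actual TwitterAuth code): the default argument is evaluated once, at definition time, so every call shares the same file handle.

def read_config(handle=open('config.txt')):   # hypothetical file; opened once, at definition time
    return handle.read()

print(read_config())   # first call reads the whole file
print(read_config())   # second call reads from EOF of the *same* handle and gets ''
# the fix described above: seek(0) (or reopen) before each use, guarded by a mutex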

How to test file locking in Python

So I want to write some files that might be locked/blocked for write/delete by other processes, and I'd like to test for that upfront.
As I understand it, os.access(path, os.W_OK) only checks the permissions and will return True even though the file cannot currently be written. So I have this little function:
def write_test(path):
    try:
        fobj = open(path, 'a')
        fobj.close()
        return True
    except IOError:
        return False
It actually works pretty well when I try it with a file that I manually open in another program. But as a wannabe good developer I want to put it in a test to automatically check that it works as expected.
The thing is: if I just open(path, 'a') the file, I can still open() it again, no problem! Even from another Python instance. Yet Explorer will actually tell me that the file is currently open in Python!
I looked up other posts here and there about locking. Most suggest installing a package. You might understand that I don't want to do that just to test a handful of lines of code. So I dug through those packages to see the actual spot where the locking is eventually done...
fcntl? I don't have that. win32con? Don't have it either... Now in filelock there is this:
self.fd = os.open(self.lockfile, os.O_CREAT|os.O_EXCL|os.O_RDWR)
When I do that on an existing file it complains that the file exists!! Ehhm ... yeah! That's the idea! But even when I do it on a non-existing path, I can still open(path, 'a') it! Even from another Python instance...
I'm beginning to think that I fail to understand something very basic here. Am I looking for the wrong thing? Can someone point me into the right direction?
Thanks!
You are trying to solve the file locking problem using just the system call open(). Unix-like systems use advisory file locking by default. This means that cooperating processes may use locks to coordinate access to a file among themselves, but uncooperative processes are free to ignore locks and access the file in any way they choose. In other words, file locks lock out other file lockers only, not I/O. See Wikipedia.
As stated in the open(2) man page, the solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating the hostname and pid) and use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check whether its link count has increased to 2, in which case the lock is also successful.
That is why filelock also uses the function fcntl.flock() and puts all that machinery in a module, as it should be.
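For comparison, a minimal sketch of a POSIX counterpart to the asker's eventual msvcrt-based function below, using a non-blocking fcntl.flock() (Unix only; remember these locks are advisory, so they only affect other processes that also take locks):

import fcntl

def lock_test_posix(path):
    # try to take and immediately release an exclusive, non-blocking advisory lock
    with open(path, 'a') as fobj:
        try:
            fcntl.flock(fobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(fobj, fcntl.LOCK_UN)
            return True
        except OSError:
            return False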
Alright! Thanks to those guys I actually have something now! So this is my function:
import os

def lock_test(path):
    """
    Checks if a file can, aside from its permissions, be changed right now (True)
    or is already locked by another process (False).

    :param str path: file to be checked
    :rtype: bool
    """
    import msvcrt  # Windows-only module
    try:
        fd = os.open(path, os.O_APPEND | os.O_EXCL | os.O_RDWR)
    except OSError:
        return False
    try:
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        os.close(fd)
        return True
    except (OSError, IOError):
        os.close(fd)
        return False
And the unittest could look something like this:
import os
import unittest
import msvcrt  # Windows-only module

# lock_test is the function defined above

class Test(unittest.TestCase):
    def test_lock_test(self):
        testfile = 'some_test_name4142351345.xyz'
        testcontent = 'some random blaaa'
        with open(testfile, 'w') as fob:
            fob.write(testcontent)
        # test successful locking and unlocking
        self.assertTrue(lock_test(testfile))
        os.remove(testfile)
        self.assertFalse(os.path.exists(testfile))
        # make file again, lock and test False locking
        with open(testfile, 'w') as fob:
            fob.write(testcontent)
        fd = os.open(testfile, os.O_APPEND | os.O_RDWR)
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        self.assertFalse(lock_test(testfile))
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        self.assertTrue(lock_test(testfile))
        os.close(fd)
        with open(testfile) as fob:
            content = fob.read()
        self.assertTrue(content == testcontent)
        os.remove(testfile)
It works. Downsides are that it's kind of testing itself with itself, and that the initial OSError catch is never exercised, only the second locking attempt via msvcrt. But I don't know how to make it better right now.
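If desired, that untested branch could be exercised by patching os.open so it raises; a sketch (using unittest.mock, which is not part of the original answer) of an extra test method for the class above:

from unittest import mock

def test_lock_test_oserror(self):
    # force os.open to fail so lock_test takes its initial "except OSError" branch
    with mock.patch('os.open', side_effect=OSError):
        self.assertFalse(lock_test('does_not_matter.xyz'))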

Good way of closing a file

Let us say we have the following code:
from sys import exit
def parseLine(l):
    if '#' not in l:
        print 'Invalid expression'
        exit(1)
    return l

with open('somefile.txt') as f:
    for l in f:
        print parseLine(l)
(Note that this is demo code. The actual program is much more complex.)
Now, how do I know whether I have safely closed all the open files when I exit the program? At this point I am just assuming that they have been closed. Currently my programs are working OK, but I want them to be robust and free of problems related to files not being closed properly.
One of the chief benefits of the with block with files is that it will automatically close the file, even if there's an exception.
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
It's already closing properly, since you're using a with statement when you open the file. That'll automatically close the file when control leaves the with statement, even if there's an exception. This is usually considered the best way to ensure files are closed when they should be.
If you don't use a with statement or close the file yourself, there are a few built-in safeties and a few pitfalls.
First, in CPython, the file object's destructor will close the file when it gets garbage-collected. However, that isn't guaranteed to happen in other Python implementations, and even in CPython, it isn't guaranteed to happen promptly.
Second, when your program exits, the operating system will close any files the program left open. This means if you accidentally do something that makes the program never close its files (perhaps you had to issue a kill -9 or something else that prevents cleanup code from running), you don't have to reboot the machine or perform filesystem repair to make the file usable again. Relying on this as your usual means of closing files would be inadvisable, though.
If you're using a with block, you essentially have your open call inside of a try block and the close in a finally block. See https://docs.python.org/2/tutorial/inputoutput.html for more information from the official docs.
Since calling exit() actually raises the SystemExit exception, all code within finally blocks will be run before the program completely exits. Because of that, and since you're using with open(...) blocks, the file will be closed even if an exception goes uncaught.
Below is your code (runnable/debuggable/steppable at http://python.dbgr.cc/s)
from sys import exit
def parseLine(l):
    if '#' not in l:
        print 'Invalid expression'
        exit(1)
    return l

with open('somefile.txt') as f:
    for l in f:
        print parseLine(l)

print("file is closed? %r" % f.closed)
Equivalent code without using the with open(...) block is shown below (runnable/debuggable at http://python.dbgr.cc/g):
from sys import exit
def parseLine(l):
    if '#' not in l:
        print 'Invalid expression'
        exit(1)
    return l

try:
    f = open('somefile.txt')
    for l in f:
        print parseLine(l)
finally:
    print("Closing open file!")
    f.close()
    print("file is closed? %r" % f.closed)
