Abort a slow flush to disk after write?

Is there a way to abort a Python write operation in such a way that the OS doesn't feel it's necessary to flush the unwritten data to the disc?
I'm writing data to a USB device, typically many megabytes. I'm using 4096 bytes as my block size on the write, but it appears that Linux caches a lot of data early on and writes it out to the USB device slowly. If at some point during the write my user decides to cancel, I want the app to just stop writing immediately. I can see that there's a delay between when the data stops flowing from the application and when the USB activity light stops blinking: several seconds, typically up to about 10. I find that the app hangs in the close() method, presumably waiting for the OS to finish writing the buffered data. I call flush() after every write, but that doesn't appear to have any impact on the delay. I've scoured the Python docs for an answer but have found nothing.

It's somewhat filesystem dependent, but in some filesystems, if you delete a file before (all of) it is allocated, the IO to write the blocks will never happen. This might also be true if you truncate it so that the part which is still being written is chopped off.
Not sure that you can really abort a write if you want to still access the data. Also the kinds of filesystems that support this (e.g. xfs, ext4) are not normally used on USB sticks.
If you want to flush data to the disc, use fdatasync() (os.fdatasync() in Python, available on Unix). Merely flushing your IO library's buffer into the OS one will not achieve any physical flushing.
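The difference between flushing the library buffer and syncing to the device can be sketched as follows. This is a minimal sketch, not the asker's code: `write_blocks` and the `cancelled` callback are hypothetical names, and it uses os.fsync() rather than os.fdatasync() only because the former is available on more platforms. It assumes that syncing after every block is acceptable for throughput; since little unsynced data is ever pending, a cancel should take effect quickly, at the cost of slower writes.

```python
import os

def write_blocks(path, blocks, cancelled):
    """Write blocks to path, syncing each one; return the bytes written.

    `cancelled` is a hypothetical zero-argument callback that returns True
    once the user has asked to stop.
    """
    written = 0
    with open(path, "wb") as f:
        for block in blocks:
            if cancelled():
                break
            f.write(block)
            f.flush()               # Python's buffer -> OS cache
            os.fsync(f.fileno())    # OS cache -> device
            written += len(block)
    return written
```

Syncing every N blocks instead of every block would be a reasonable middle ground if per-block syncing turns out to be too slow.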

Assuming I am understanding this correctly, you want to be able to 'abort' and NOT flush the data. This IS possible using ctypes and a little pokery. This is very OS dependent so I'll give you the OSX version and then what you can do to change it to Linux:
import ctypes

f = open('flibble1.txt', 'w')
f.write("hello world")
# Load the C library (OSX path; see below for Linux)
x = ctypes.cdll.LoadLibrary('/usr/lib/libc.dylib')
# Close the descriptor behind the file object's back
x.close(f.fileno())
try:
    del f
except IOError:
    pass
If you change /usr/lib/libc.dylib to the libc.so.6 in /usr/lib for Linux then you should be good to go. Basically, by calling close() on the descriptor directly instead of fclose(), the data still sitting in the I/O library's buffer is never flushed to the OS. Data that has already been handed to the OS is not affected.
Hope that's useful.

When you abort the write operation, try calling file.truncate(0) before closing it.
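A minimal sketch of that suggestion, with `abort_write` as a hypothetical helper name. Whether the OS then skips writing blocks it has already cached is filesystem dependent, as the first answer points out, so treat this as something to try rather than a guarantee.

```python
def abort_write(f):
    """Cut the file back to zero bytes and close it."""
    f.truncate(0)   # flushes Python's buffer, then truncates the file
    f.close()
```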

Related

closing files of a killed process

python: 3.4
OS: win7 / win10
I want to kill a running process with python and close all the files it opened:
import io
import psutil

for proc in psutil.process_iter():
    if proc.name() == 'myprocess.exe':
        opened = proc.open_files()
        proc.kill()
        for i in opened:
            print(i.path)
            io.FileIO(i.path).close()
            print(io.FileIO(i.path).closed)
Somehow io.IOBase(i.path).close() does not work.
Explanation:
It's like I would like to kill Microsoft Word with python, but it leaves some files open. And I would like to close those files as well.
Microsoft Word is just an example. It is a self-written Python program. The opened files are:
fonts (.ttf)
clr.pyd
and .dll-s
How should I close these files?

You don't need to close any files that were opened by the process. That is done automatically:
Terminating a process has the following results:
Any remaining threads in the process are marked for termination.
Any resources allocated by the process are freed.
All kernel objects are closed.
The process code is removed from memory.
The process exit code is set.
The process object is signaled.
The important bit is "All kernel objects are closed." For every open file handle, there is an associated kernel object--that's actually what a handle is, a mapping from a number to a kernel object. When the process exits, the kernel will walk behind and close all associated file handles, sockets, etc.
Additionally, your original approach has a few problems. First, the list of open files is only a snapshot of which ones were open at that time. In between asking for the list of open files and killing the process, the process could have opened many more, or closed and removed many as well. Second, the Python 3 docs say that the constructor for IOBase isn't public, so using it in this way is wrong:
class io.IOBase
The abstract base class for all I/O classes, acting on streams of bytes. There is no public constructor.
Generally, you'd use something like io.open() which takes the path. This leads to the third issue. All you have to work with is the path. In order to close a file, you really need the handle. Those handles are process-specific. This means in one process, 0x5555AAAA may correspond to "file1.txt", but in another process, it might correspond to "file2.txt" or maybe not even a file at all (it could be a socket or something else). So even if you have the kernel handle, we don't really have a way of saying "close this handle in the context of this other process." That violates some security goals of processes. Also, it means that what you're actually doing here is creating your own handle to only turn around and close it (or in this case, it possibly does nothing at all since the object wasn't created correctly).
So, if you're having a problem with files still being held, perhaps the problem is that the process hadn't actually died yet when you tried to do whatever work you needed to get done. You may need to wait for the process to exit before moving on if there are files the process was using that you want to use again. It looks like you can use psutil.wait_procs() to do that.
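The kill-then-wait idea can be sketched with the standard library alone. This is a stand-in, not the psutil call mentioned above: it assumes you started the process yourself with subprocess.Popen, and the sleeping child is a hypothetical placeholder for myprocess.exe.

```python
import subprocess
import sys

# Start a child that would run for a minute (a placeholder for myprocess.exe).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

child.kill()                         # ask the OS to terminate it
exit_code = child.wait(timeout=10)   # block until it has really exited

# Only now is it reasonable to reuse files the child had open.
print("child exited with", exit_code)
```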
Also, on Windows I find that anti-virus tools often get in the way. They hold open files accessed by applications making it look like a process is still holding onto them when it's actually the virus scanner doing its thing. I remember one instance of having to deal with this in Subversion. The code still exists today. So you might need to simply wait a bit and try again.
Update
Microsoft Word is just an example. It is a self-written Python program. The opened files are:
fonts (.ttf)
clr.pyd
and .dll-s
How should I close these files?
The answer is that you shouldn't need to. Just make sure the process has actually exited. It's not an instantaneous operation, so there's some time between killing it and it actually exiting that it still retains the file handles.
Given that you've actually written the process being killed, I think a far better approach would be to introduce a way to launch that process, have it do its work, then exit gracefully. Then use subprocess.run() to run the script and wait for it to exit.
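A minimal sketch of that approach, assuming Python 3.7 or later for capture_output. The inline script is a hypothetical stand-in for the real program; in practice you would pass the path to your own script instead.

```python
import subprocess
import sys

# Hypothetical worker: a short inline script standing in for the real program.
worker_code = "print('work done')"

result = subprocess.run(
    [sys.executable, "-c", worker_code],
    capture_output=True,
    text=True,
    check=True,   # raise CalledProcessError on a non-zero exit code
)
print(result.stdout.strip())
```

Because subprocess.run() returns only after the child has exited, any files the child used have been closed by the time the next line runs.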

It's like I would like to kill Microsoft Word with python, but it leaves some files open. And I would like to close those files as well.
There is some misunderstanding here. When you terminate Word with kill, all files are closed from a system point of view, but the close is not a clean one. When Word terminates normally, it flushes its internal buffers, removes any temporary files and marks the files as clean. When it crashes or is abruptly terminated, none of that cleanup occurs. Some modifications may not be written to disk, and temp files are still there, so on the next execution Word will warn you that the files were not closed properly and have to be repaired.
So you do not want to kill Microsoft Word, but to close it, meaning posting a WM_CLOSE message to its main window. Unfortunately, there is no clean and neat support in Python for that. There is an example of closing Excel by the win32com module here. The conversion for Word should be (beware: untested):
import win32com.client

wd = win32com.client.Dispatch("Word.Application")
wd.Quit()  # quit Word, as if the user hit the close button/clicked File->Exit

Take a look at the with statement syntax. There's a brief overview here

Does Python's file write() method guarantee the data has been written correctly?

I'm new to Python and I'm writing a script to patch a file with something like:
def getPatchDatas(file):
    f = open(file,"rb")
    datas = f.read()
    f.close()
    return datas

f = open("myfile.bin","r+b")
f.seek(0xC020)
f.write(getPatchDatas("mypatch.bin"))
f.close()
I would like to be sure the patch has been applied correctly.
So, if no error / exception is raised, does it mean I'm 100% sure the patch has been correctly written?
Or is it better to double check with something like:
f = open("myfile.bin","rb")
f.seek(0xC020)
if f.read(0x20) != getPatchDatas("mypatch.bin"):
    print("Patch not applied correctly!")
f.close()
??
Thanks.

No it doesn't, but roughly it does. It depends how much it matters.
Anything could go wrong - it could be a consumer hard disk which lies to the operating system about when it has finished writing data to disk. It could be corrupted in memory and that corrupt version gets written to disk, or it could be corrupted inside the disk during writing by electrical or physical problems.
It could be intercepted by kernel modules on Linux, filter drivers on Windows or a FUSE filesystem provider which doesn't actually support writing but pretends it does, meaning nothing was written.
It could be screwed up by a corrupted Python install where exceptions don't work or were deliberately hacked out of it, or file objects monkeypatched, or accidentally run in an uncommon implementation of Python which fakes supporting files but is otherwise identical.
These kinds of reasons are why servers have server class hardware with higher tolerances to temperature and electrical variation, error checking and correcting memory (ECC), RAID controller battery backups, checksumming filesystems such as ZFS, Uninterruptible Power Supplies, and so on.
But, as far as normal people and low risk things go - if it's written without error, it's as good as written. Double-checking makes sense - especially as it's that easy. It's nice to know if something has failed.
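The double check can be folded into the patching step. The following is a minimal sketch, with `apply_patch` as a hypothetical helper name; it compares the full length of the patch rather than a fixed 0x20 bytes. Note that the read-back is likely served from the OS cache, so it catches logic errors (wrong offset, short write) rather than failures in the disk itself.

```python
import os

def apply_patch(target, offset, patch):
    """Write patch into target at offset, then read it back; True if it matches."""
    with open(target, "r+b") as f:
        f.seek(offset)
        f.write(patch)
        f.flush()
        os.fsync(f.fileno())   # ask the OS to push the data to the device
    with open(target, "rb") as f:
        f.seek(offset)
        return f.read(len(patch)) == patch
```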

Within a single process, yes.
With multiple processes (e.g. one process writing and another reading), you may need a file lock: even if you make sure the reader only starts after write() has been called, the write needs some time to finish.
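One portable way to approximate a file lock is a lock file created with an exclusive open. This is a minimal sketch under stated assumptions: `lock_file` is a hypothetical helper, both processes must agree to use it (it is purely cooperative), and a crashed process can leave a stale lock file behind that has to be removed by hand.

```python
import os
import time
from contextlib import contextmanager

@contextmanager
def lock_file(path, timeout=5.0, poll=0.05):
    """Hold an exclusive lock file at path for the duration of the block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not acquire " + path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

The writer would wrap its open/write/close in `with lock_file("myfile.lock"):` and the reader would do the same around its read, so the reader never sees a half-finished write.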

Python Save Sets To File On Windows Shutdown?

I do not want to lose my sets if Windows is about to shutdown/restart/log off/sleep. Is it possible to save them before shutdown? Or is there an alternative way to save information without worrying that it will get lost on Windows shutdown? JSON, CSV, DB? Anything?
import pickle

s = {1,2,3,4}
with open("s.pick","wb") as f: # pickle it to file when PC about to shutdown to save information
    pickle.dump(s,f)

I do not want to lose my sets if windows is about to shutdown/restart/log off/sleep, Is it possible to save it before shutdown?
Yes, if you've built an app with a message loop, you can receive the WM_QUERYENDSESSION message. If you want to have a GUI, most GUI libraries will probably wrap this up in their own way. If you don't need a GUI, your simplest solution is probably to use PyWin32. Somewhere in the docs there's a tutorial on creating a hidden window and writing a simple message loop. Just do that on the main thread, and do your real work on a background thread, and signal your background thread when a WM_QUERYENDSESSION message comes in.
Or, much more simply, as Evgeny Prokurat suggests, just use SetConsoleCtrlHandler (again through PyWin32). This can also catch ^C, ^BREAK, and the user closing your console, as well as the logoff and shutdown messages that WM_QUERYENDSESSION catches. More importantly, it doesn't require a message loop, so if you don't have any other need for one, it's a lot simpler.
Or is there an alternative to save information without worrying it will get lost on windows shutdown? JSON, CSV, DB? Anything?
The file format isn't going to magically solve anything. However, a database could have two advantages.
First, you can reduce the problem by writing as often as possible. But with most file formats, that means rewriting the whole file as often as possible, which will be very slow. The solution is to stream to a simpler "journal" file, pack that into the real file less often, and look for a leftover journal at every launch. You can do that manually, but a database will usually do that for you automatically.
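A manual journal for a set of JSON-serializable, hashable items might look like the sketch below. The function names and the one-item-per-line JSON format are assumptions made for illustration; packing the journal into the real file is left out.

```python
import json
import os

def journal_append(path, item):
    """Append one item to the journal and sync it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(item) + "\n")
        f.flush()
        os.fsync(f.fileno())

def journal_replay(path):
    """Rebuild the set from a leftover journal, ignoring a torn final line."""
    items = set()
    if not os.path.exists(path):
        return items
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.endswith("\n"):
                break   # the last write was cut off part-way
            try:
                items.add(json.loads(line))
            except json.JSONDecodeError:
                break   # damaged entry; stop replaying here
    return items
```

At launch the program would call `journal_replay()` to recover anything that was added since the last full save.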
Second, if you get killed in the middle of a write, you end up with half a file. You can solve that by the atomic writing trick—write a temporary file, then replace the old file with the temporary—but this is hard to get right on Windows (especially with Python 2.x) (see Getting atomic writes right), and again, a database will usually do it for you.
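The temporary-file trick can be sketched as follows for Python 3.3 and later, where os.replace() overwrites an existing destination on both Windows and POSIX. `atomic_pickle` is a hypothetical helper name, and this sketch does not cover every edge case discussed in the linked article (for example, syncing the containing directory).

```python
import os
import pickle
import tempfile

def atomic_pickle(obj, path):
    """Pickle obj to path by writing a temporary file and renaming it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)   # swap the new file in for the old one
    except BaseException:
        os.remove(tmp_path)
        raise
```

A reader of `path` then sees either the complete old file or the complete new one, never half of each.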
The "right" way to do this is to create a new window class with a msgproc that dispatches to your handler on WM_QUERYENDSESSION. Just as MFC makes this easier than raw Win32 API code, win32ui (which wraps MFC) makes this easier than win32api/win32gui (which wraps raw Win32 API). And you can find lots of samples for that (e.g., a quick search for "pywin32 msgproc example" turned up examples like this, and searches for "python win32ui" and similar terms worked just as well).
However, in this case, you don't have a window that you want to act like a normal window, so it may be easier to go right to the low level and write a quick&dirty message loop. Unfortunately, that's a lot harder to find sample code for—you basically have to search the native APIs for C sample code (like Creating a Message Loop at MSDN), then figure out how to translate that to Python with the pywin32 documentation. Less than ideal, especially if you don't know C, but not that hard. Here's an example to get you started:
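import sys
import threading

import win32con
import win32gui

def msgloop():
    while True:
        # pywin32 returns (result, msg), where msg is the tuple
        # (hwnd, message, wparam, lparam, time, point)
        result, msg = win32gui.GetMessage(None, 0, 0)
        if result == 0:  # WM_QUIT was retrieved
            return msg[2]  # its wparam is the exit code
        if msg[1] == win32con.WM_QUERYENDSESSION:
            handle_shutdown()
        win32gui.TranslateMessage(msg)
        win32gui.DispatchMessage(msg)

worker = threading.Thread(target=real_program)
worker.start()
exitcode = msgloop()
worker.join()
sys.exit(exitcode)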
I haven't shown the "how to create a minimal hidden window" part, or how to signal the worker to stop with, e.g., a threading.Condition, because there are a lot more (and easier-to-find) good samples for those parts; this is the tricky part to find.
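For the signalling part, a threading.Event is often simpler than a threading.Condition, so this sketch swaps it in. Everything here is a hypothetical stand-in: `real_program` just loops, `saved` takes the place of writing to disk, and the final `set()` call is what a `handle_shutdown()` function would do.

```python
import threading

stop_requested = threading.Event()
saved = []   # stands in for "write the data to disk"

def real_program():
    """Hypothetical worker: loop until asked to stop, then save state."""
    state = set()
    counter = 0
    while not stop_requested.is_set():
        state.add(counter % 4)
        counter += 1
        stop_requested.wait(0.01)   # sleep, but wake early on shutdown
    saved.append(state)             # final save before exiting

worker = threading.Thread(target=real_program)
worker.start()
stop_requested.set()   # what handle_shutdown() would do
worker.join()
```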

You can detect Windows shutdown/log off with win32api.SetConsoleCtrlHandler.
There is a good example in How To Catch “Kill” Events with Python.

What happens if I don't close a txt file

I'm about to write a program for a racecar that creates a txt file and continuously adds new lines to it. Unfortunately I can't close the file, because when the car shuts off, the Raspberry Pi (which the program is running on) also gets shut down. So I have no chance of closing the txt file.
Is this a problem?

Yes and no. Data is buffered in several places on its way to the disk: Python's file object, the underlying C functions, the operating system, and the disk controller. Even closing the file does not guarantee that all of these buffers have been physically written; only the first two levels are forced to pass their buffers on to the next level. The same can be achieved by flushing the file handle without closing it.
Since the power can be cut at any time, you have to accept that some data may be lost or only partially written.
Closing a file is important for freeing limited operating system resources, but that is not a concern in your setup.
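Flushing without closing can be sketched like this. `log_line` is a hypothetical helper; the os.fsync() call goes one level further than flush() and asks the operating system to write its cache out as well, which costs time on every line and still cannot control the disk controller's own buffer.

```python
import os

def log_line(f, line):
    """Append one line and push it through Python's and the OS's buffers."""
    f.write(line + "\n")
    f.flush()               # Python's buffer -> OS cache
    os.fsync(f.fileno())    # OS cache -> storage device
```

The program would open the file once in append mode and call `log_line(f, ...)` for every new sample, never needing to close it.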

Does python make a copy of opened files in memory?

So I would like to search for filenames with os.walk() and write the resulting list of names to a file. I would like to know what is more efficient : opening the file and then writing each result as I find them or storing everything in a list and then writing the whole list. That list could be big so I wonder if the second solution would work.

See this example:
import os
fil = open('/tmp/stuff', 'w')
fil.write('aaa')
os.system('cat /tmp/stuff')
You may expect to see aaa, but instead you get nothing. This is because Python has an internal buffer. Writing to disk is expensive, as it has to:
Tell the OS to write it.
Actually transfer the data to the disk (on a hard disk it may involve spinning it up, waiting for IO time, etc.).
Wait for the OS to report success on the writing.
If you make many small writes, this can add up to quite some time. Instead, Python keeps a buffer and only actually writes from time to time. You don't have to worry about memory growth, as the buffer is kept small. From the docs:
"0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used."
When you are done, make sure you call fil.close(). You can also call fil.flush() at any point during execution, or pass buffering=0 to open() to disable buffering (in Python 3, buffering=0 is only allowed for files opened in binary mode).
Another thing to consider is what happens if, for some reason, the program exits in the middle of the process. If you store everything in memory, it will be lost. What you have on disk, will remain there (but unless you flush, there is no guarantee of how much was actually saved).
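Applied to the question, writing each name as it is found might look like the sketch below. `write_file_list` is a hypothetical helper name; it relies on Python's default buffering, so the many small writes are batched and memory use stays small however long the listing gets.

```python
import os

def write_file_list(root, out_path):
    """Write every file path under root to out_path, one per line; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + "\n")
                count += 1
    return count
```

Keep `out_path` outside `root`, otherwise the output file may show up in its own listing.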
