What happens if I don't close a txt file - python

I'm about to write a program for a racecar that creates a txt file and continuously appends new lines to it. Unfortunately I can't close the file, because when the car shuts off, the Raspberry Pi the program is running on also shuts down. So I have no chance of closing the txt file.
Is this a problem?

Yes and no. Data is buffered at several places on its way to disk: Python's file object, the underlying C library, the operating system, and the disk controller. Even closing the file does not guarantee that all these buffers have been physically written; closing only forces the first two levels to pass their data on to the next one. Flushing the file handle, without closing it, achieves the same thing.
As long as the power-off can occur at any time, you have to accept that some data may be lost or only partially written.
Closing a file also matters for releasing limited operating-system resources, but that is not a concern in your setup.
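If losing the last few lines matters, a common mitigation is to flush after every write and ask the operating system to push its own buffers to the device with os.fsync(). A minimal sketch, assuming a hypothetical read_sensor_line() as the data source and an arbitrary file name:

import os

log = open('telemetry.txt', 'a')
try:
    while True:
        line = read_sensor_line()  # hypothetical data source
        log.write(line + '\n')
        log.flush()                # Python's buffer -> OS
        os.fsync(log.fileno())     # OS buffers -> the device
finally:
    log.close()

Even with this, the disk controller's cache remains beyond your control, and a line being written at the moment of power-off can still end up truncated.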

Related

Confusion about the select module in Python

I am a little confused about the select module in python. As PyMOTW describes:
select monitors sockets, open files, and pipes (anything with a
fileno() method that returns a valid file descriptor) until they
become readable or writable, or a communication error occurs.
I am confused about what readable and writable mean here. What's the difference between them?
Besides, it describes:
select() returns three new lists, containing subsets of the contents
of the lists passed in. All of the sockets in the readable list have
incoming data buffered and available to be read. All of the sockets in
the writable list have free space in their buffer and can be written
to. The sockets returned in exceptional have had an error (the actual
definition of “exceptional condition” depends on the platform).
So in my understanding, the select module is a tool that monitors multiple sockets while they are open and working. select can tell you, for a given socket, whether it is ready to be read from, ready to be written to, or in an error state. Is that right? Could someone explain how it achieves multi-connection socket communication?
select doesn't tell sockets anything. It just watches them.
Say you have a building with one entrance. You post a receptionist there. He watches the door, and when there's someone at the door, he goes and opens the door for the guest.
Now you build a back entrance, but you are too cheap to hire a second receptionist. So while the front receptionist is staring at the front entrance, the back entrance is piling up with very angry people staring at the stubbornly closed door.
If only there was a surveillance system, so that the poor receptionist could see both doors at the same time...
That's what select does.
Normally when you f.read() (on a blocking file descriptor), your program stops till some data shows up. When you f.write() but the other side has signalled their buffer is full, your program will stop till the other end clears some space in their buffer and signals it's okay to receive again. And when your program is stuck on some IO operation, it can't do anything else - while incoming data in other sockets is piling up, or maybe while a user on some other socket is impatiently waiting for their response.
With select, your program waits until at least one of the watched file descriptors has something you can act on. select then tells you on which descriptors f.read() will return immediately, and on which ones a write will be accepted at once, without further blocking.
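Here is a minimal sketch of the one-receptionist-watching-all-doors idea: a simple echo server that uses select.select() to serve any number of clients from a single thread (the port number is arbitrary):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('localhost', 9999))
server.listen(5)

sockets = [server]
while True:
    # Block until at least one socket has something actionable.
    readable, writable, errored = select.select(sockets, [], [])
    for s in readable:
        if s is server:
            conn, addr = s.accept()   # a new guest at the door
            sockets.append(conn)
        else:
            data = s.recv(4096)       # will not block: select said it's readable
            if data:
                s.sendall(data)       # echo it back
            else:
                sockets.remove(s)     # client hung up
                s.close()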

closing files of a killed process

python: 3.4
OS: win7 / win10
I want to kill a running process with python and close all the files it opened:
import io
import psutil

for proc in psutil.process_iter():
    if proc.name() == 'myprocess.exe':
        opened = proc.open_files()   # snapshot of the process's open files
        proc.kill()
        for i in opened:
            print(i.path)
            io.FileIO(i.path).close()
            print(io.FileIO(i.path).closed)
Somehow io.IOBase(i.path).close() does not work.
Explanation:
It's like I would like to kill Microsoft Word with python, but it leaves some files open. And I would like to close those files as well.
Microsoft Word is just an example. It is a self-written Python program. The opened files are:
fonts (.ttf)
clr.pyd
and .dll-s
How should I close these files?
You don't need to close any files that were opened by the process. That is done automatically:
Terminating a process has the following results:
Any remaining threads in the process are marked for termination.
Any resources allocated by the process are freed.
All kernel objects are closed.
The process code is removed from memory.
The process exit code is set.
The process object is signaled.
The important bit is "All kernel objects are closed." For every open file handle there is an associated kernel object; that is in fact what a handle is, a mapping from a number to a kernel object. When the process exits, the kernel goes through and closes all the associated file handles, sockets, and so on.
Additionally, your original approach has a few problems. First, the list of open files is only a snapshot of the ones that were open at that moment; between asking for the list and killing the process, the process could have opened many more, or closed some. Second, the Python 3 docs say that the constructor for IOBase isn't public, so using it in this way is wrong:
class io.IOBase
The abstract base class for all I/O classes, acting on streams of bytes. There is no public constructor.
Generally, you'd use something like io.open(), which takes a path. This leads to the third issue: a path is all you have to work with, but in order to close a file you really need its handle, and handles are process-specific. In one process 0x5555AAAA may correspond to "file1.txt", while in another it might correspond to "file2.txt", or not to a file at all (it could be a socket or something else). So even if you had the kernel handle, there is no way of saying "close this handle in the context of that other process"; that would violate the security goals of process isolation. What your code actually does is create a brand-new handle of its own and immediately close it (or, in this case, possibly nothing at all, since the object wasn't constructed correctly).
So, if you're having a problem with files still being held, perhaps the problem is that the process hasn't actually exited yet before you try whatever work you need to get done. You may need to wait for the process to exit before moving on if there are files the process was using that you want to use again. It looks like you can use psutil.wait_procs() for that.
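A sketch of that, reusing the process name from the question:

import psutil

procs = [p for p in psutil.process_iter() if p.name() == 'myprocess.exe']
for p in procs:
    p.kill()
# Block until the killed processes have really exited and released their handles.
gone, alive = psutil.wait_procs(procs, timeout=5)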
Also, on Windows I find that anti-virus tools often get in the way. They hold open files accessed by applications, making it look like a process is still holding onto them when it's actually the virus scanner doing its thing. I remember one instance of having to deal with this in Subversion. The code still exists today. So you might need to simply wait a bit and try again.
Update
Microsoft Word is just an example. It is a self-written Python program. The opened files are:
fonts (.ttf)
clr.pyd
and .dll-s
How should I close these files?
The answer is that you shouldn't need to. Just make sure the process has actually exited. Exiting is not an instantaneous operation, so there is a window between killing the process and its actual exit during which it still holds the file handles.
Given that you've actually written the process being killed, I think a far better approach would be to introduce a way to launch that process, have it do its work, then exit gracefully. Then use subprocess.run() to run the script and wait for it to exit.
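For example, assuming the worker is a script called myprocess.py:

import subprocess

# Runs the script and blocks until it exits; afterwards all of its
# file handles are guaranteed to have been released by the kernel.
subprocess.run(['python', 'myprocess.py'], check=True)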
It's like I would like to kill Microsoft Word with python, but it leaves some files open. And I would like to close those files as well.
There is some misunderstanding here. When you terminate Word with kill, all its files are closed from the system's point of view, but they are closed dirty. When Word terminates normally, it flushes its internal buffers, removes its temporary files, and marks the files as clean. When it crashes or is abruptly terminated, none of that cleanup happens: some modifications may never reach the disk, and the temp files are still there, so on the next run Word warns you that the files were not closed in an orderly way and have to be repaired.
So you do not want to kill Microsoft Word, but to close it, meaning posting a WM_QUIT message to its main window. Unfortunately, there is no clean and neat support for that in Python. There is an example of closing Excel with the win32com module here. The conversion for Word should be (beware, untested):
import win32com.client

wd = win32com.client.Dispatch("Word.Application")
wd.Quit()  # quit Word, as if the user hit the close button / clicked File -> Exit
Take a look at the with statement syntax. There's a brief overview here.
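For example (file name illustrative):

with open('output.txt', 'w') as f:
    f.write('some data\n')
# f is closed here, even if the write raised an exception.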

Does python make a copy of opened files in memory?

So I would like to search for filenames with os.walk() and write the resulting list of names to a file. I would like to know which is more efficient: opening the file and writing each result as I find it, or storing everything in a list and then writing the whole list. That list could be big, so I wonder if the second solution would even work.
See this example:
import os
fil = open('/tmp/stuff', 'w')
fil.write('aaa')
os.system('cat /tmp/stuff')
You may expect to see aaa, but instead you get nothing. This is because Python has an internal buffer. Writing to disk is expensive, as it has to:
Tell the OS to write it.
Actually transfer the data to the disk (on a hard disk it may involve spinning it up, waiting for IO time, etc.).
Wait for the OS to report success on the writing.
If you write many small pieces, this can add up to quite some time. So instead, Python keeps a buffer and only actually writes from time to time. You don't have to worry about memory growth, as the buffer is kept small. From the docs:
"0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used."
When you are done, make sure you call fil.close(), or fil.flush() at any point during execution, or open the file with buffering=0 to disable buffering (in Python 3, buffering=0 is only accepted in binary mode).
Another thing to consider is what happens if, for some reason, the program exits in the middle of the process. If you store everything in memory, it will be lost. What you have on disk, will remain there (but unless you flush, there is no guarantee of how much was actually saved).
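For the original question, that makes the incremental approach attractive: write each name as you find it and let the buffer amortize the cost, then rely on close() to flush the rest. A sketch, with a placeholder root directory:

import os

with open('filenames.txt', 'w') as out:
    for dirpath, dirnames, filenames in os.walk('/some/root'):  # placeholder path
        for name in filenames:
            out.write(os.path.join(dirpath, name) + '\n')
# Leaving the with block closes, and therefore flushes, the file.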

When does Python write a file to disk?

I have a library that interacts with a configuration file. When the library is imported, the initialization code reads the configuration file, possibly updates it, and then writes the updated contents back to the file (even if nothing was changed).
Very occasionally, I encounter a problem where the contents of the configuration file simply disappear. Specifically, this happens when I run many invocations of a short script (using the library) back-to-back, thousands of times. It never occurs in the same directories, which leads me to believe it's a somewhat random problem, specifically a race condition in the I/O.
This is a pain to debug, since I can never reliably reproduce the problem and it only happens on some systems. I have a suspicion about what might happen, but I wanted to see if my picture of file I/O in Python is correct.
So the question is, when does a Python program actually write file contents to a disk? I thought that the contents would make it to disk by the time that the file closed, but then I can't explain this error. When python closes a file, does it flush the contents to the disk itself, or simply queue it up to the filesystem? Is it possible that file contents can be written to disk after Python terminates? And can I avoid this issue by using fp.flush(); os.fsync(fp.fileno()) (where fp is the file handle)?
If it matters, I'm programming on a Unix system (Mac OS X, specifically). Edit: Also, keep in mind that the processes are not running concurrently.
Appendix: Here is the specific race condition that I suspect:
Process #1 is invoked.
Process #1 opens the configuration file in read mode and closes it when finished.
Process #1 opens the configuration file in write mode, erasing all of its contents. The erasing of the contents is synced to the disk.
Process #1 writes the new contents to the file handle and closes it.
Process #1: Upon closing the file, Python tells the OS to queue writing these contents to disk.
Process #1 closes and exits
Process #2 is invoked
Process #2 opens the configuration file in read mode, but new contents aren't synced yet. Process #2 sees an empty file.
The OS finally finishes writing the contents to disk, after process 2 reads the file
Process #2, thinking the file is empty, sets defaults for the configuration file.
Process #2 writes its version of the configuration file to disk, overwriting the last version.
It is almost certainly not Python's fault. If Python closes the file, OR exits cleanly (rather than being killed by a signal), then the OS will have the new contents of the file. Any subsequent open should return the new contents. There must be something more complicated going on. Here are some thoughts.
What you describe sounds more likely to be a filesystem bug than a Python bug, and a filesystem bug is pretty unlikely.
Filesystem bugs are far more likely if your files actually reside in a remote filesystem. Do they?
Do all the processes use the same file? Do "ls -li" on the file to see its inode number, and see if it ever changes. In your scenario, it should not. Is it possible that something is moving files, or moving directories, or deleting directories and recreating them? Are there symlinks involved?
Are you sure that there is no overlap in the running of your programs? Are any of them run from a shell with "&" at the end (i.e. in the background)? That could easily mean that a second one is started before the first one is finished.
Are there any other programs writing to the same file?
This isn't your question, but if you need atomic changes (so that any program running in parallel only sees either the old version or the new one, never the empty file), the way to achieve it is to write the new content to another file (e.g. "foo.tmp"), then do os.rename("foo.tmp", "foo"). Rename is atomic.
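A sketch of that pattern (file names illustrative, new_contents standing in for whatever you want the file to hold):

import os

# Write the complete new contents to a temporary file first...
with open('foo.tmp', 'w') as f:
    f.write(new_contents)
    f.flush()
    os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
# ...then atomically replace the old file. Readers see either the old
# version or the new one, never an empty file.
os.rename('foo.tmp', 'foo')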

Abort a slow flush to disk after write?

Is there a way to abort a python write operation in such a way that the OS doesn't feel it's necessary to flush the unwritten data to the disc?
I'm writing data to a USB device, typically many megabytes. I'm using 4096 bytes as my block size for the writes, but it appears that Linux caches up a bunch of data early on and writes it out to the USB device slowly. If at some point during the write my user decides to cancel, I want the app to stop writing immediately. I can see that there's a delay between when the data stops flowing from the application and when the USB activity light stops blinking: several seconds, up to about 10 seconds typically. I find that the app is hanging in the close() method, I assume waiting for the OS to finish writing the buffered data. I call flush() after every write, but that doesn't appear to have any impact on the delay. I've scoured the Python docs for an answer but have found nothing.
It's somewhat filesystem dependent, but in some filesystems, if you delete a file before (all of) it is allocated, the IO to write the blocks will never happen. This might also be true if you truncate it so that the part which is still being written is chopped off.
Not sure that you can really abort a write if you want to still access the data. Also the kinds of filesystems that support this (e.g. xfs, ext4) are not normally used on USB sticks.
If you want to flush data to the disc, use fdatasync(). Merely flushing your IO library's buffer into the OS one will not achieve any physical flushing.
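In Python, that two-level flush looks like this (illustrative file name):

import os

with open('data.bin', 'wb') as f:
    f.write(b'...')
    f.flush()                 # Python's buffer -> OS
    os.fdatasync(f.fileno())  # OS buffers -> the physical device (Unix only)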
Assuming I understand this correctly, you want to be able to 'abort' and NOT flush the data. This IS possible using ctypes and a little pokery. It is very OS-dependent, so I'll give you the OS X version and then explain what to change for Linux:
import ctypes

f = open('flibble1.txt', 'w')
f.write("hello world")

# Close the underlying file descriptor directly via libc's close(),
# bypassing fclose(), so no fsync() happens and nothing is flushed.
libc = ctypes.cdll.LoadLibrary('/usr/lib/libc.dylib')
libc.close(f.fileno())
try:
    del f
except IOError:
    pass
If you change /usr/lib/libc.dylib to the libc.so.6 in /usr/lib for Linux, you should be good to go. Basically, by calling close() instead of fclose(), no call to fsync() is made and nothing is flushed.
Hope that's useful.
When you abort the write operation, try doing file.truncate(0) before closing it.
