Why does select.select() work with disk files but not epoll()? - python

The following code essentially cats a file with select.select():
f = open('node.py')
fd = f.fileno()
while True:
r, w, e = select.select([fd], [], [])
print '>', repr(os.read(fd, 10))
time.sleep(1)
When I try a similar thing with epoll I get an error:
self._impl.register(fd, events | self.ERROR)
IOError: [Errno 1] Operation not permitted
I've also read that epoll does not support disk files -- or perhaps that it doesn't make sense.
Epoll on regular files
But why does select() support disk files then? I looked at the implementation in selectmodule.c and it seems to just be going through to the operating system, i.e. Python is not adding any special support.
On a higher level I'm experimenting with the best way to serve static files in a nonblocking server. I guess I will try creating I/O threads that read from disk and feed data to the main event loop thread that writes to sockets.

select allows filedescriptors pointing to regular files to be monitored, however it will always report a file as readable/writable (i.e. it's somewhat useless, as it doesn't tell you whether a read/write would actually block).
epoll just disallows monitoring of regular files, as it has no mechanism (on linux at least) available to tell whether reading/writing a regular file would block

Related

sftp write operation atomicity with Python

When appending lines to a remote file via SFTP with pysftp:
import pysftp
with pysftp.Connection('192.168.0.2', username='root', password='') as sftp:
with sftp.cd('/home/www/test'):
with sftp.open('test.txt', 'a+') as f:
for i in range(100):
s = (("%04d" % i).encode()*10000) + b'\n' # 40'001 bytes
f.write(s)
if I terminate the process in the middle of the operation, sometimes (if I'm lucky), the whole line s is written on the distant file.
On other cases, the last line is truncated in the middle, at the time the process has been interrupted.
Is there a way to make the SFTP f.write(s) operation atomic? i.e. either it fails in the middle, then nothing is written, or it succeeds and the full 40'001 byte-line is written?
I don't believe this is possible. First of all, in order for it to be possible at all, the remote system's write(2) syscall would have to guarantee that, and POSIX does not require that behavior. There are many reasons a write may be non-atomic, such as if the remote disk is full and you can only write part of the data to disk, or if the remote user has a quota and your full write would exceed that.
Additionally, you're trying to write over 40 kB over a network connection, and it's likely that doesn't fit in one packet. Consequently, it wouldn't make sense for any network software to write a packet that large.
If it's important to you to write a file completely or not at all, you can write to another file on the same disk and then rename over the original file. This is the way programs like Git guarantee atomic file updates. I believe for SFTP that requires that both sides support the posix-rename#openssh.com extension; OpenSSH does, but I don't know if pysftp does, so you'd need to consult the documentation.

closing files of a killed process

python: 3.4
OS: win7 / win10
I want to kill a running process with python and close all the files it opened:
for proc in psutil.process_iter():
if proc.name() == 'myprocess.exe':
opened = proc.open_files()
proc.kill()
for i in opened:
print(i.path)
io.FileIO(i.path).close()
print(io.FileIO(i.path).closed)
Somehow io.IOBase(i.path).close() does not work.
Explanation:
It's like I would like to kill Microsoft Word with python, but it leaves some files open. And I would like to close those files as well.
Microsoft Word is just an example. It is a self-written python programm. The opened files are:
fonts (.ttf)
clr.pyd
and .dll-s
How should I close these files?
You don't need to close any files that were opened by the process. That is done automatically:
Terminating a process has the following results:
Any remaining threads in the process are marked for termination.
Any resources allocated by the process are freed.
All kernel objects are closed.
The process code is removed from memory.
The process exit code is set.
The process object is signaled.
The important bit is "All kernel objects are closed." For every open file handle, there is an associated kernel object--that's actually what a handle is, a mapping from a number to a kernel object. When the process exits, the kernel will walk behind and close all associated file handles, sockets, etc.
Additionally, you're original approach has a few problems. First, the list of open files is only a snapshot of which ones were open at that time. In between asking for the list of open files and killing the process, the process could have opened many more, or closed and removed many as well. Second, the Python 3 docs say that the constructor for IOBase isn't public, so using it in this way is wrong:
class io.IOBase
The abstract base class for all I/O classes, acting on streams of bytes. There is no public constructor.
Generally, you'd use something like io.open() which takes the path. This leads to the third issue. All you have to work with is the path. In order to close a file, you really need the handle. Those handles are process-specific. This means in one process, 0x5555AAAA may correspond to "file1.txt", but in another process, it might correspond to "file2.txt" or maybe not even a file at all (it could be a socket or something else). So even if you have the kernel handle, we don't really have a way of saying "close this handle in the context of this other process." That violates some security goals of processes. Also, it means that what you're actually doing here is creating your own handle to only turn around and close it (or in this case, it possibly does nothing at all since the object wasn't created correctly).
So, if you're having a problem with files still being held, perhaps the problem is that the process didn't actually die yet before trying whatever work you needed to get done. You may need to wait for the process to exit before attempting to move on if there are files the process was using that you want to use again. It looks like you can use psutils.wait_procs() to do that.
Also, on Windows I find that anti-virus tools often get in the way. They hold open files accessed by applications making it look like a process is still holding onto them when it's actually the virus scanner doing its thing. I remember one instance of having to deal with this in Subversion. The code still exists today. So you might need to simply wait a bit and try again.
Update
Microsoft Word is just an example. It is a self-written python programm. The opened files are:
fonts (.ttf)
clr.pyd
and .dll-s
How should I close these files?
The answer is that you shouldn't need to. Just make sure the process has actually exited. It's not an instantaneous operation, so there's some time between killing it and it actually exiting that it still retains the file handles.
Given that you've actually written the process being killed, I think a far better approach would be to introduce a way to launch that process, have it do its work, then exit gracefully. Then use subprocess.run() to run the script and wait for it to exit.
It's like I would like to kill Microsoft Word with python, but it leaves some files open. And I would like to close those files as well.
There is some misunderstanding here. When you terminate Word with kill, all files are closed from a system point of view, but they will be dirty closed. When Word terminates normally, it flushes its internal buffers, removes any temporary files and mark the files as clean. When it crashes or is abruptely terminated, all that cleaning does not occur. Some modifications may not be written to disk, and temp files are still there, so on next execution, Word will warn you that the files have not been orderly closed and have to be repaired.
So you do not want to kill Microsoft Word, but to close it, meaning posting a WM_QUIT message to its main window. Unfortunately, there is no clean and neat support in Python for that. There is an example of closing Excel by the win32com module here. The convertion for Word should be (beware untested):
wd = win32com.client.Dispatch("Word.Application")
wd.Quit() #quit word, as if user hit the close button/clicked file->exit.
Take a look at the with statement syntax. There's a brief overview here

Does python "file write()" method guarantee datas have been correctly written?

I'm new with python and I'm writting script to patch a file with something like:
def getPatchDatas(file):
f = open(file,"rb")
datas = f.read()
f.close()
return datas
f = open("myfile.bin","r+b")
f.seek(0xC020)
f.write(getPatchDatas("mypatch.bin"))
f.close()
I would like to be sure the patch as been applied correctly.
So, if no error / exception is raised, does it mean I'm 100% sure the patch has been correctly written?
Or is it better to double check with something like:
f = open("myfile.bin","rb")
f.seek(0xC020)
if not f.read(0x20) == getPatchDatas("mypatch.bin"):
print "Patch not applied correctly!"
f.close()
??
Thanks.
No it doesn't, but roughly it does. It depends how much it matters.
Anything could go wrong - it could be a consumer hard disk which lies to the operating system about when it has finished writing data to disk. It could be corrupted in memory and that corrupt version gets written to disk, or it could be corrupted inside the disk during writing by electrical or physical problems.
It could be intercepted by kernel modules on Linux, filter drivers on Windows or a FUSE filesystem provider which doesn't actually support writing but pretends it does, meaning nothing was written.
It could be screwed up by a corrupted Python install where exceptions don't work or were deliberately hacked out of it, or file objects monkeypatched, or accidentally run in an uncommon implementation of Python which fakes supporting files but is otherwise identical.
These kinds of reasons are why servers have server class hardware with higher tolerances to temperature and electrical variation, error checking and correcting memory (ECC), RAID controller battery backups, ZFS checksumming filesystem, Uninterruptable Power Supplies, and so on.
But, as far as normal people and low risk things go - if it's written without error, it's as good as written. Double-checking makes sense - especially as it's that easy. It's nice to know if something has failed.
In single process, it is.
In multi processes(e.g. One process is writing and another is reading. Even you ensure it'll only read after call "write", the "write" need some time to finish), you may need a filelock.

How to resume file transferring with paramiko

I'm working on a Python project that is required some file transferring. One side of the connection is highly available ( REHL 6 ) and always online. But the other side is going on and off ( Windows 7 ) and the connection period is not guaranteed. The files are transporting on both directions and sizes are between 10MB to 2GB.
Is it possible to resume the file transferring with paramiko instead of transferring the entire file from the beginning.
I would like to use rSync but one side is windows and I would like to avoid cwRsync and DeltaCopy
Paramiko doesn't offer an out of the box 'resume' function however, Syncrify, DeltaCopy's big successor has a retry built in and if the backup goes down the server waits up to six hours for a reconnect. Pretty trusty, easy to use and data diff by default.
paramiko.sftp_client.SFTPClient contains an open function, which functions exactly like python's built-in open function.
You can use this to open both a local and remote file, and manually transfer data from one to the other, all the while recording how much data has been transferred. When the connection is interrupted, you should be able to pick up right where you left off (assuming that neither file has been changed by a 3rd party) by using the seek method.
Keep in mind that a naive implementation of this is likely to be slower than paramiko's get and put functions.

Abort a slow flush to disk after write?

Is there a way to abort a python write operation in such a way that the OS doesn't feel it's necessary to flush the unwritten data to the disc?
I'm writing data to a USB device, typically many megabytes. I'm using 4096 bytes as my block size on the write, but it appears that Linux caches up a bunch of data early on, and write it out to the USB device slowly. If at some point during the write, my user decides to cancel, I want the app to just stop writing immediately. I can see that there's a delay between when the data stops flowing from the application, and the USB activity light stops blinking. Several seconds, up to about 10 seconds typically. I find that the app is holding in the close() method, I'm assuming, waiting for the OS to finish writing the buffered data. I call flush() after every write, but that doesn't appear to have any impact on the delay. I've scoured the python docs for an answer but have found nothing.
It's somewhat filesystem dependent, but in some filesystems, if you delete a file before (all of) it is allocated, the IO to write the blocks will never happen. This might also be true if you truncate it so that the part which is still being written is chopped off.
Not sure that you can really abort a write if you want to still access the data. Also the kinds of filesystems that support this (e.g. xfs, ext4) are not normally used on USB sticks.
If you want to flush data to the disc, use fdatasync(). Merely flushing your IO library's buffer into the OS one will not achieve any physical flushing.
Assuming I am understanding this correct, you want to be able to 'abort' and NOT flush the data. This IS possible using a ctype and a little pokery. This is very OS dependent so I'll give you the OSX version and then what you can do to change it to Linux:
f = open('flibble1.txt', 'w')
f.write("hello world")
import ctypes
x = ctypes.cdll.LoadLibrary('/usr/lib/libc.dylib')
x.close(f.fileno())
try:
del f
catch IOError:
pass
If you change /usr/lib/libc.dylib to the libc.so.6 in /usr/lib for Linux then you should be good to go. Basically by calling close() instead of fclose(), no call to fsync() is done and nothing is flushed.
Hope that's useful.
When you abort the write operation, trying doing file.truncate(0); before closing it.

Categories