How to do proper file locking on NFS?


I am trying to implement a "record manager" class in Python 3.x on Linux/macOS. The class is relatively easy and straightforward; the only "hard" thing I want is to be able to access the same file (where results are saved) from multiple processes.
This seemed pretty easy, conceptually: when saving, acquire an exclusive lock on the file. Update your information, save the new information, release exclusive lock on the file. Easy enough.
I am using fcntl.lockf(file, fcntl.LOCK_EX) to acquire the exclusive lock. The problem is that, looking on the internet, I am finding a lot of different websites saying that this is not reliable, that it won't work on Windows, that the support on NFS is shaky, and that things could change between macOS and Linux.
I have accepted that the code won't work on Windows, but I was hoping to be able to make it work on macOS (single machine) and on Linux (on multiple servers with NFS).
The problem is that I can't seem to make this work. After a while of debugging, the tests passed on macOS, but they failed once I tried them on NFS with Linux (Ubuntu 16.04). The issue is an inconsistency between the information saved by multiple processes: some processes have their modifications missing, which means something went wrong in the locking and saving procedure.
I am sure there is something I am doing wrong, and I suspect this may be related to the issues that I read about online. So, what is the proper way to deal with multiple accesses to the same file that works on macOS and on Linux over NFS?
Edit
This is what the typical method that writes new information to disk looks like:
sf = open(self._save_file_path, 'rb+')
try:
    fcntl.lockf(sf, fcntl.LOCK_EX)  # acquire an exclusive lock - only one writer
    self._raw_update(sf)  # updates the records from file (other processes may have modified it)
    self._saved_records[name] = new_info
    self._raw_save()  # does not check for locks (but does *not* release the lock on self._save_file_path)
finally:
    sf.flush()
    os.fsync(sf.fileno())  # forcing the OS to write to disk
    sf.close()  # release the lock and close
And this is what a typical method that only reads information from disk looks like:
sf = open(self._save_file_path, 'rb')
try:
    fcntl.lockf(sf, fcntl.LOCK_SH)  # acquire shared lock - multiple readers
    self._raw_update(sf)  # updates the records from file (other processes may have modified it)
    return self._saved_records
finally:
    sf.close()  # release the lock and close
Also, this is what _raw_save looks like:
def _raw_save(self):
    # write to temp file first to avoid accidental corruption of information.
    # os.replace is guaranteed to be an atomic operation in POSIX
    with open('temp_file', 'wb') as p:
        p.write(self._saved_records)
    os.replace('temp_file', self._save_file_path)  # pretty sure this does not release the lock
Error message
I have written a unit test where I create 100 different processes, 50 that read and 50 that write to the same file. Each process does some random waiting to avoid accessing the files sequentially.
The problem is that some of the records are not kept; at the end there are some 3-4 random records missing, so I only end up with 46-47 records rather than 50.
Edit 2
I have modified the code above so that I acquire the lock not on the file itself, but on a separate lock file. This avoids the problem that closing the file would release the lock (as suggested by @janneb), and makes the code work correctly on macOS. The same code still fails on Linux with NFS though.

I don't see how the combination of file locks and os.replace() can make sense. When the file is replaced (that is, the directory entry is replaced), all the existing file locks (probably including file locks waiting for the locking to succeed, I'm not sure of the semantics here) and file descriptors will be against the old file, not the new one. I suspect this is the reason behind the race conditions causing you to lose some of the records in your tests.
os.replace() is a good technique to ensure that a reader doesn't read a partial update. But it doesn't work robustly in the face of multiple updaters (unless losing some of the updates is ok).
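To make the point about os.replace() concrete, here is a minimal sketch (the file names and the helper inode_after_replace are made up for illustration, and it assumes a local POSIX filesystem). It shows that a descriptor opened before the replace keeps pointing at the old file, so a lock taken through it says nothing about the file that now lives at the path:

```python
import os


def inode_after_replace(directory):
    """Return (inode behind the old descriptor, inode now at the path, old data)."""
    target = os.path.join(directory, "records.bin")  # hypothetical names
    temp = os.path.join(directory, "records.tmp")
    with open(target, "wb") as f:
        f.write(b"old")
    held = open(target, "rb")  # stands in for the descriptor holding the lock
    try:
        with open(temp, "wb") as f:
            f.write(b"new")
        os.replace(temp, target)  # the path now names a different file
        old_inode = os.fstat(held.fileno()).st_ino
        new_inode = os.stat(target).st_ino
        return old_inode, new_inode, held.read()
    finally:
        held.close()
```

The two inode numbers differ, and the old descriptor still reads the old contents. A second process that opens the path after the replace would lock the new inode and never contend with the first one.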
Another issue is that fcntl is a really really stupid API. In particular, the locks are bound to the process, not the file descriptor. Which means that e.g. a close() on ANY file descriptor in that process pointing to the file will release the lock.
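A rough way to see the close() behaviour for yourself, assuming Linux or macOS with a local filesystem (probe_lock and lock_states_around_close are hypothetical helper names; the probe runs in a separate process because a process never conflicts with its own fcntl locks):

```python
import fcntl
import subprocess
import sys

# Small script run in a child process: try to take the lock without blocking.
PROBE = (
    "import fcntl, sys\n"
    "f = open(sys.argv[1], 'rb+')\n"
    "try:\n"
    "    fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)\n"
    "    print('free')\n"
    "except OSError:\n"
    "    print('locked')\n"
)


def probe_lock(path):
    """Ask another process whether it could lock path right now."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def lock_states_around_close(path):
    """Return the probe result before and after closing an unrelated descriptor."""
    holder = open(path, "rb+")
    try:
        fcntl.lockf(holder, fcntl.LOCK_EX)
        before = probe_lock(path)
        # Open and immediately close a second descriptor on the same file.
        open(path, "rb").close()
        after = probe_lock(path)
    finally:
        holder.close()
    return before, after
```

With POSIX record locks the first probe should report the file as locked and the second as free, even though holder was never closed or unlocked in between.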
One way would be to use a "lock file", e.g. taking advantage of the atomicity of link(). From http://man7.org/linux/man-pages/man2/open.2.html:
Portable programs that want to perform atomic file locking using a lockfile, and need to avoid reliance on NFS support for O_EXCL, can create a unique file on the same filesystem (e.g., incorporating hostname and PID), and use link(2) to make a link to the lockfile. If link(2) returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
If it's OK to read slightly stale data, then you can use this link() dance only for a temp file that you use when updating the file, and then os.replace() the "main" file you use for reading (reading can then be lockless). If not, then you need to do the link() trick for the "main" file and forget about shared/exclusive locking; all locks are then exclusive.
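The link() recipe from the man page could be sketched roughly like this. The function names and the naming scheme for the unique file are my own assumptions (I add a random suffix so that two attempts from the same process never share a unique file), and there is no retry loop or stale-lock handling here:

```python
import os
import socket
import uuid


def try_link_lock(lock_path):
    """Try once to take the lock; return the unique file's path, or None."""
    directory = os.path.dirname(os.path.abspath(lock_path))
    # Unique file on the same filesystem, named after hostname and PID.
    unique = os.path.join(
        directory,
        "%s.%s.%d.%s" % (
            os.path.basename(lock_path),
            socket.gethostname(),
            os.getpid(),
            uuid.uuid4().hex,
        ),
    )
    with open(unique, "w"):
        pass
    try:
        os.link(unique, lock_path)
    except OSError:
        # link() may report an error although it succeeded on the server,
        # so the link count of the unique file is the final word.
        if os.stat(unique).st_nlink != 2:
            os.unlink(unique)
            return None
    return unique


def release_link_lock(lock_path, unique):
    """Release a lock obtained from try_link_lock()."""
    os.unlink(lock_path)
    os.unlink(unique)
```

A caller would loop on try_link_lock() with a short sleep until it gets a path back, do the read-modify-write, and then call release_link_lock() in a finally block.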
Addendum: One tricky thing to deal with when using lock files is what to do when a process dies for whatever reason, and leaves the lock file around. If this is to run unattended, you might want to incorporate some kind of timeout and removal of lock files (e.g. check the stat() timestamps).
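For the timeout idea, a minimal sketch could look like the following (remove_if_stale and its max_age_seconds parameter are hypothetical). Note two assumptions: the clocks of the NFS clients and the server are roughly in sync, and you accept a small race where a lock is released and re-taken between the stat() and the unlink():

```python
import os
import time


def remove_if_stale(lock_path, max_age_seconds):
    """Delete lock_path if its mtime is older than max_age_seconds.

    Returns True only if this call removed a stale lock file.
    """
    try:
        age = time.time() - os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return False  # nobody holds the lock
    if age <= max_age_seconds:
        return False  # lock is recent, assume its owner is still alive
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        return False  # another process cleaned it up first
    return True
```

The right value for max_age_seconds depends on how long a legitimate update can take, so pick it generously.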

Using randomly named hard links and the link counts on those files as lock files is a common strategy (e.g. this), and arguably better than using lockd. For far more information about the limits of all sorts of locks over NFS, read this: http://0pointer.de/blog/projects/locking.html
You'll also find that this is a long standing standard problem for MTA software using Mbox files over NFS. Probably the best answer there was to use Maildir instead of Mbox, but if you look for examples in the source code of something like postfix, it'll be close to best practice. And if they simply don't solve that problem, that might also be your answer.

NFS is great for file sharing. It sucks as a "transmission" medium.
I've been down the NFS-for-data-transmission road multiple times. In every instance, the solution involved moving away from NFS.
Getting reliable locking is one part of the problem. The other part is the update of the file on the server and expecting the clients to receive that data at some specific point-in-time (such as before they can grab the lock).
NFS isn't designed to be a data transmission solution. There are caches and timing involved. Not to mention paging of the file content, and file metadata (e.g. the atime attribute). And client O/S'es keeping track of state locally (such as "where" to append the client's data when writing to the end of the file).
For a distributed, synchronized store, I recommend looking at a tool that does just that. Such as Cassandra, or even a general-purpose database.
If I'm reading the use-case correctly, you could also go with a simple server-based solution. Have a server listen for TCP connections, read messages from the connections, and then write each to file, serializing the writes within the server itself. There's some added complexity in having your own protocol (to know where a message starts and stops), but otherwise, it's fairly straightforward.
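A bare-bones sketch of that server idea, using only the standard library. The class names are made up, and the "protocol" is simply an assumption that each message is one line ending in a newline:

```python
import socketserver
import threading


class RecordHandler(socketserver.StreamRequestHandler):
    """Reads newline-terminated messages and appends them to one file."""

    def handle(self):
        # The newline marks where a message stops.
        for line in self.rfile:
            # Only one handler thread writes at a time.
            with self.server.write_lock:
                with open(self.server.out_path, "ab") as out:
                    out.write(line)


class RecordServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address, out_path):
        super().__init__(address, RecordHandler)
        self.out_path = out_path
        self.write_lock = threading.Lock()
```

You would run RecordServer(("0.0.0.0", some_port), "records.log").serve_forever() on one machine and have every worker send its records over a socket instead of touching the shared file. Only the server process writes, so no file locking is needed at all.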

Related

Concurrent file accesses from different scripts python

I have several scripts. Each of them does some computation and it is completely independent from the others. Once these computations are done, they will be saved to disk and a record updated.
The record is maintained by an instance of a class, which saves itself to disk. I would like to have a single record instance used in multiple scripts (for example, record_manager = RecordManager(file_on_disk), and then record_manager.update(...)); but I can't do this right now, because when updating the record there may be concurrent write accesses to the same file on disk, leading to data loss. So I have a separate record manager for every script, and then I merge the records manually later.
What is the easiest way to have a single instance used in all the scripts that solves the concurrent write access problem?
I am using macOS (High sierra) and linux (Ubuntu 16.04).
Thanks!

To build a custom solution to this you will probably need to write a short new queuing module. This queuing module alone will have write access to the file(s), and the existing modules in your code will pass their write actions to it.
The queue logic should be a pretty straightforward queue architecture.
There may also be libraries that exist in python to handle this problem that would avoid you writing your own queue class.
Finally, it is possible that this whole thing will be/could be handled in some way by your OS, independent of python.

Bad idea to have two classes opening the same file?

I have two classes in my Python program and one of them is a thread. Is it a bad idea to have both classes open the same log file and write to it?
Is there any good approach to write to the same log file for two classes which are running at the same time?

This is a classical concurrency issue. You need to ensure that you exactly control what is happening. Regarding log files, the easiest solution might be to have a queue collecting log messages from various places (from different threads or even processes) and then have one entity that pops messages from that queue and writes them to the log file. This way, at least single messages stay self-contained.
The operating system does not prevent message mix up if you write to the file from different unsynchronized entities. Hence, if you do not explicitly control what should happen in which order, you might end up with corrupted messages in that file, even if things seem to work most of the time.
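The queue-plus-single-writer idea can be sketched with the standard library's logging.handlers.QueueHandler and QueueListener. The function name start_queue_logging and the logger name "shared" are just placeholders for this example:

```python
import logging
import logging.handlers
import queue


def start_queue_logging(log_path):
    """Return (logger, listener); only the listener thread touches the file."""
    log_queue = queue.Queue()
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(logging.Formatter("%(message)s"))
    # The listener pops records from the queue and writes them one by one.
    listener = logging.handlers.QueueListener(log_queue, file_handler)
    listener.start()
    logger = logging.getLogger("shared")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Callers only put records on the queue; they never open the file.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    return logger, listener
```

Both classes can then call logger.info(...) freely, and listener.stop() should be called at shutdown so that queued messages are written out.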

Use the python logging module. It handles the gory details for you.

As long as you control which class is reading from and writing to the file, and ensure that only one of them can write to it at a time, you should be fine. Every time you switch from one class to the other, reread the file.
Look into using a lock to ensure that both classes are not accessing the file at the same time.

Python shared file access between threads

I have a thread writing to a file (writeThread) periodically and another (readThread) that reads from the file asynchronously. Can readThread access the file using a different handle and not mess anything up?
If not, does Python have a shared lock that can be used by writeThread but does not block readThread? I would rather not use a simple non-shared lock, because file access takes on the order of a millisecond and the writeThread write period is of the same order (the period depends on some external parameters). Thus, a situation may arise where even though writeThread releases the lock, it re-acquires it immediately and so causes starvation.
A solution I can think of is to maintain multiple copies of the file, one for reading and another for writing, and avoid the whole situation altogether. However, the file sizes involved may become huge, which makes this method undesirable.
Are there any other alternatives, or is this a bad design?
Thanks

Yes, you can open the file multiple times and get independent access to it. Each file object will have its own buffers and position so for instance a seek on one will not mess up the other. It works pretty much like multiple program access and you have to be careful when reading / writing the same area of the file. For instance, a write that appends to the end of the file won't be seen by the reader until the write object flushes. Rewrites of existing data won't be seen by the reader until both the reader and writer flush. Writes won't be atomic, so if you are writing records the reader may see partial records. async Select or poll events on the reader may be funky... not sure about that one.
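To illustrate the flush point, here is a small sketch (the helper name is made up) that opens the same file through two independent file objects and looks at what the reader sees before and after the writer flushes. It assumes Python's default buffering for regular files:

```python
def visible_before_and_after_flush(path):
    """Return what a second file object reads before and after flush()."""
    writer = open(path, "w")
    try:
        writer.write("record 1\n")  # stays in the writer's buffer for now
        with open(path) as reader:
            before = reader.read()
        writer.flush()  # hands the buffered data to the OS
        with open(path) as reader:
            after = reader.read()
    finally:
        writer.close()
    return before, after
```

The first read comes back empty and the second one contains the record, so a reader thread only ever sees what the writer has already flushed.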
An alternative is mmap but I haven't used it enough to know the gotchas.

Multiple threads reading from single folder on Linux

My project needs multiple threads reading files from the same folder. This folder has incoming files, and each file should be processed by only one of those threads. The thread that read the file then deletes it after processing.
EDIT after the first answer: I don't want a single thread in charge of reading filenames and feeding those names to other threads, so that they can read it.
Is there any efficient way of achieving this in python?

You should probably use the Queue module (named queue in Python 3). From the docs:
The Queue module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads.
I would use a FIFO approach, with a thread in charge of checking for inbound files and queuing them, and a number of workers processing them. A LIFO approach or an approach in which priority is assigned with a custom method are also supported by the module.
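A minimal sketch of the FIFO approach in Python 3 (where the module is spelled queue). The function process_folder and its handle callback are hypothetical, and for brevity it queues the files present at call time instead of watching the folder continuously:

```python
import os
import queue
import threading


def process_folder(folder, handle, num_workers=4):
    """Queue every file in folder once; workers handle and then delete it."""
    names = queue.Queue()  # FIFO by default
    for name in sorted(os.listdir(folder)):
        names.put(os.path.join(folder, name))

    def worker():
        while True:
            try:
                path = names.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                handle(path)
                os.remove(path)
            finally:
                names.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each path is taken from the queue exactly once, no two workers ever get the same file, and no file locking is involved.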
EDIT: If you don't want to use the Queue module and you are on a *nix system, you could use fcntl.lockf instead. Keep in mind that fcntl locks are held per process, so on their own they will not keep threads of the same process apart. An alternative is opening the files with os.open('filename', os.O_EXLOCK), though O_EXLOCK is a BSD/macOS extension and is not defined on Linux.
Depending on how often you perform this operation, you might find it slower than using Queue, as you will have to account for race conditions (i.e. you might acquire the name of the file to open, but the file might get locked by another thread before you get a chance to open it, throwing an exception that you will have to trap). Queue is there for a reason! ;)
EDIT2: Comments in this and other questions are bringing up the problem with simultaneous disk access to different files and the consequent performance hit. I was thinking that task_done would have been used for preventing this, but reading others' comments it occurred to me that instead of queuing file names, one could queue the files' content directly. This second alternative would work only for a limited amount of limited size queued files, given that RAM would fill up rather quickly otherwise.
I'm unaware if RAID and other parallel disk configurations would already take care of reading one file per disk rather than bouncing back and forth between two files on both disks.
HTH!

If you want multiple threads to read several files from the same folder in parallel, then I must disappoint you. Reading in parallel from a single disk is not a viable option. A single disk needs to spin and seek to the next location to be read. If you're reading with multiple threads, you are just bouncing the disk around between seeks, and the performance is much worse than a simple sequential read.
Just stick to mac's advice and use a single thread for reading.

How does one programmatically determine if "write" system call is atomic on a particular file?

In some cases the coder cannot rely on system calls being atomic, e.g. if the file is on an NFS filesystem (cf. NFS Overview, FAQ and HOWTO Documents). But atomic system calls are ultimately required for most database work (cf. Atomicity of database systems).
Is there a standard (and OS-independent) way of confirming that writes (and other syscalls) are atomic on a particular FILE in C (or Python)?
Any suggestions?
Subsequent notes: Atomicity on pipes is discussed in the following:
unix pipe multiple writers
What happens if a write system call is called on same file by 2 different processes simultaneously
Note in particular the "man" page extract dealing specifically with O_APPEND:
If the O_APPEND flag of the file status flags is set, the file
offset shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.

The write call as defined in POSIX has no atomicity guarantee at all. So you don't need to confirm anything, it's not atomic.
It doesn't even guarantee that the data will have reached the hard drive (if there is a drive at all) if it completes successfully. Successfully reading back the data doesn't give you any guarantees either.
You'll need to use the sync family of functions to get some durability guarantees.
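In Python, the usual way to reach the sync family is os.fsync(). A minimal sketch (write_and_sync is a made-up name, and what fsync really guarantees still depends on the filesystem and hardware underneath):

```python
import os


def write_and_sync(path, data):
    """Write data to path and ask the OS to push it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> OS page cache
        os.fsync(f.fileno())  # OS page cache -> storage device
```

Without the flush() call the data could still be sitting in Python's own buffer when fsync() runs, so the order of the two calls matters.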
