I'm a Python beginner facing the following: I have a script that periodically reads a settings file and acts according to those settings. I have another script, triggered by some UI, that writes the settings file with user input values. I use the ConfigParser module to both read and write the file.
I am wondering whether this scenario can lead to an inconsistent state (for example, the other script begins writing while the first is in the middle of reading the settings file). I don't know whether there is any mechanism behind the scenes that automatically protects against such situations.
If such inconsistencies are possible, what could I use to synchronize both scripts and maintain the integrity of the operations?
There is a race condition when the reader reads while the writer is still writing: the reader may see the file in an incomplete state.
You can protect against this race by locking the file during both reading and writing (see Linux flock() or the Python lockfile module), so that the reader never observes the file incomplete.
Or, better, you can first write into a temporary file and when done rename it to the final name atomically. This way the reader and writer never block:
import os

def write_config(config, filename):
    tmp_filename = filename + "~"
    with open(tmp_filename, 'w') as file:    # ConfigParser.write expects a text-mode file in Python 3
        config.write(file)
    os.rename(tmp_filename, filename)        # atomic: the reader sees either the old or the new file
When the writer uses this method, no changes are required on the reader's side.
When you write the config file, write it to a temporary file first. When it's done, rename it to the correct name. The rename operation (os.rename) is atomic on POSIX systems such as Linux; on Windows, os.rename fails if the target already exists, so use os.replace (Python 3.3+) there. Either way, there is no risk of the other process reading the config while the write has not yet finished.
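A minimal sketch of that pattern, with illustrative file names:

import os

def atomic_write(path, text):
    tmp = path + ".tmp"                # temporary file on the same filesystem as the target
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, path)              # atomic on POSIX; also replaces an existing file on Windows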
There are at least two ways to address this issue (assuming you are on a Unix-ish system):
If you want to write, write to a temporary file first, then do something Unix can do atomically, namely rename the temporary file into place.
Lock the file during any operation, e.g. with the help of the filelock module.
Personally, I like the first option because it lets the OS do the work, although some systems have had problems with atomicity (see "On how rename is broken in Mac OS X"). Another limitation: the rename system call cannot rename files across devices, so the temporary file must live on the same filesystem as the target.
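For the second option, a minimal sketch using the third-party filelock package (pip install filelock); the file names are illustrative, and both scripts must use the same lock file:

from filelock import FileLock

lock = FileLock("settings.ini.lock")   # a sidecar lock file next to the real one

# writer side
with lock:
    with open("settings.ini", "w") as f:
        f.write("[ui]\ncolor = blue\n")

# reader side (in the other script, built around the same lock file)
with lock:
    with open("settings.ini") as f:
        settings = f.read()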
I am scraping data from multiple websites.
To do that I have written multiple web scrapers using Selenium and PhantomJS.
Those scrapers return values.
My question is: is there a way I can feed those values to a single Python program that will sort through that data in real time?
What I want to do is not save that data to analyze later; I want to send it to a program that will analyze it in real time.
What I have tried: I have no idea where to even start.
Perhaps a named pipe would be suitable:
mkfifo whatever (you can also do this from within your Python script with os.mkfifo)
You can write to whatever like a normal file (the write will block until something reads it) and read from whatever in a different process (the read will block if no data is available)
Example:
# writer.py
with open('whatever', 'w') as h:
    h.write('some data')  # blocks until reader.py reads the data

# reader.py
with open('whatever', 'r') as h:
    print(h.read())  # blocks until writer.py writes to the named pipe
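If you would rather create the pipe from Python than from the shell, a one-time setup like this works (the path is illustrative):

import os

if not os.path.exists('whatever'):
    os.mkfifo('whatever')   # create the named pipe; both scripts then open it like an ordinary file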
You can try writing the data you want to share to a file and having the other script read and interpret it. Have the other script run in a loop that checks whether a new file has appeared or the existing file has changed.
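A minimal sketch of that polling loop, watching the file's modification time; the file name, the interval, and the handle_new_data hook are illustrative assumptions:

import os
import time

def watch(path, interval=1.0):
    last_mtime = None
    while True:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            mtime = None                        # file not written yet
        if mtime is not None and mtime != last_mtime:
            last_mtime = mtime
            with open(path) as f:
                handle_new_data(f.read())       # hypothetical hook that analyzes the data
        time.sleep(interval)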
Simply use files for data exchange and a trivial locking mechanism.
Each writer or reader (there seems to be only one reader here) gets a unique number.
If a writer or reader wants to access the file, it renames the file to its original name plus that number, then writes or reads, and renames it back afterwards.
The others wait until the file is available again under its own name and then claim it by locking it in the same way.
Of course there are shared memory, memory-mapped files, and semaphores, but this mechanism has worked flawlessly for me for over 30 years, on any OS and over any network, precisely because it is trivially simple.
It is in fact a poor man's mutex.
To find out whether a file has changed, look at its modification timestamp.
But the locking is necessary too; otherwise you'll end up in a mess.
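A minimal sketch of that rename trick; the file name and the process number are illustrative:

import os
import time

DATA = "exchange.dat"
MY_ID = "17"                               # this process's unique number

def acquire():
    claimed = DATA + MY_ID
    while True:
        try:
            os.rename(DATA, claimed)       # only one process can win this rename
            return claimed
        except OSError:
            time.sleep(0.1)                # someone else holds the file; wait

def release(claimed):
    os.rename(claimed, DATA)               # make it available again under its own name

claimed = acquire()
try:
    with open(claimed, "a") as f:
        f.write("new record\n")
finally:
    release(claimed)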
I've seen a few questions related to this but nothing that definitively answers my question.
I have a short python script that does some simple tasks, then outputs some text to a log file, waits for more input, and loops.
At times, the file is opened in write mode ("w") and other times it is opened in append mode ("a") depending on the results of the other tasks. For the sake of simplicity let's say it is in write mode/append mode 50/50.
I am opening files by saying:
with open(fileName, mode) as file:
and writing to them by saying:
file.write(line)
While these files are being opened, written to, appended to, etc., I expect a command prompt to be doing some read activities on them (findstr, specifically).
1) What's going to happen if my script tries to write to the same file the command window is reading from?
2) Is there a way to explicitly set the open to shared mode?
3) Does using the logging module help at all / handle this, instead of just manually making my own log files?
Thanks
What you are referring to is generally called a "race condition" where two programs are trying to read / write the same file at the same time. Some operating systems can help you avoid this by implementing a file-lock mutex system, but on most operating systems you just get a corrupted file, a crashed program, or both.
Here's an interesting article talking about how to avoid race conditions in python:
http://blog.gocept.com/2013/07/15/reliable-file-updates-with-python/
One suggestion the author makes is to copy the file to a temporary file, make your writes/appends there, and then move the file back. The window for a race condition grows the longer a file is kept open; with this approach you never keep the main file open in Python, so the only point at which a collision could occur is during the OS copy/move operations, which are much faster.
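A minimal sketch of that suggestion, assuming the log file already exists; the file names are illustrative:

import os
import shutil

LOG = "output.log"
TMP = "output.log.tmp"

shutil.copy2(LOG, TMP)          # work on a private copy, never on the live file
with open(TMP, "a") as f:
    f.write("another line\n")
os.replace(TMP, LOG)            # swap the finished copy back into place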
I have a file that I want to read. The file may at any time be overwritten by another process. I do not want to block that writing. I am prepared to deal with corruption in the data that I read, but I do not want my reading to change the behaviour of the writing process in any way.
The process that is writing the file is a delphi program running locally on the server. It opens the file using fmCreate. fmCreate tries to open the file exclusively and fails if there are any other handles on the file.
I am reading the file from a python script that accesses the file remotely across our network.
I am interested in whether there is a solution, independent of whether it is supported by Python or Delphi. I want to know if there is any way of achieving this under Windows without modifying the writing program.
Edit: To reiterate, this is not a duplicate. The other question was about getting read access to a file that is being written to. I want the writer to have access to a file that I have open for reading. These are different questions (although I fear the answer will be similar: that it can't be done).
I think the real answer here, all of these years later, is to use opportunistic locks. With these, you can open the file for read access while telling the OS that you want to be notified if another program wants to access the file. Basically, you can use the file as long as you like, and then back off if someone else needs it. This avoids the sharing/access violation that the other program would normally get if you had just opened the file "normally".
There is an MSDN article on Opportunistic Locks. Raymond Chen also has a blog article about this, complete with sample code: Using opportunistic locks to get out of the way if somebody wants the file
The key is calling the DeviceIoControl function, with the FSCTL_REQUEST_OPLOCK flag, and passing it the handle to an event that you previously created by calling CreateEvent.
It should be straightforward to use this from Delphi, since it supports calling Windows API functions. I am not so sure about Python. But, given the arrangement in the question, it should not be necessary to modify the Python code. Just make your Delphi code use the opportunistic lock when it opens the file, and let it get out of the way when the Python script needs the file.
Also much easier and lighter weight than a filter driver or the Volume Shadow Copy service.
You can setup a filter driver which can act in two ways: (1) modify the flags when the file is opened, and (2) it can capture the data when it's written to the file and save a copy of the data elsewhere.
This approach is much more lightweight and efficient than the Volume Shadow Copy service mentioned in the comments, but it requires a filter driver. Several such drivers exist on the market (i.e., products which include a driver and let you write the business logic in user mode), yet they are costly and may be overkill in your case. Still, if you need this for private use only, contact me privately for a license for our CallbackFilter.
Update: if you want to let the writer open the file which has been already opened, then a filter which will modify flags when the file is being opened is your only option.
I am writing a script that will be polling a directory looking for new files.
In this scenario, is it necessary to do some sort of error checking to make sure the files are completely written prior to accessing them?
I don't want to work with a file before it has been written completely to disk, but because the info I want from the file is near the beginning, it seems like it could be possible to pull the data I need without realizing the file isn't done being written.
Is that something I should worry about, or will the file be locked because the OS is writing to the hard drive?
This is on a Linux system.
Typically on Linux, unless you're using locking of some kind, two processes can quite happily have the same file open at once, even for writing. There are three ways of avoiding problems with this:
Locking
By having the writer apply a lock to the file, it is possible to prevent the reader from reading the file partially. However, most locks are advisory, so it is still entirely possible to see partial results anyway. (Mandatory locks exist, but are strongly discouraged on the grounds that they're far too fragile.) It's relatively difficult to write correct locking code, and it is normal to delegate such tasks to a specialist library (i.e., to a database engine!). In particular, you don't want to use locking on networked filesystems; it's a source of colossal trouble even when it works, and it can often go thoroughly wrong.
Convention
A file can instead be created in the same directory with another name that you don't automatically look for on the reading side (e.g., .foobar.txt.tmp) and then renamed atomically to the right name (e.g., foobar.txt) once the writing is done. This can work quite well, so long as you take care to deal with the possibility of previous runs failing to correctly write the file. If there should only ever be one writer at a time, this is fairly simple to implement.
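A minimal sketch of that convention, assuming a single writer; the names mirror the example above:

import os

def publish(path, data):
    tmp = os.path.join(os.path.dirname(path) or ".",
                       "." + os.path.basename(path) + ".tmp")
    with open(tmp, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())     # make sure the bytes hit the disk before the rename
    os.rename(tmp, path)         # atomic, since tmp is in the same directory as the target

publish("foobar.txt", "hello\n")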
Not Worrying About It
The most common type of file that is frequently written is a log file. These can be easily written in such a way that information is strictly only ever appended to the file, so any reader can safely look at the beginning of the file without having to worry about anything changing under its feet. This works very well in practice.
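A minimal sketch of a reader that copes with an append-only log; the path and interval are illustrative. Only complete lines are consumed, so a half-written final line is simply picked up on the next pass:

import time

def follow(path, interval=0.5):
    with open(path, "rb") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if line.endswith(b"\n"):
                yield line.rstrip(b"\n").decode()
            else:
                f.seek(pos)            # partial line or no data yet: rewind and wait
                time.sleep(interval)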
There's nothing special about Python in any of this. All programs running on Linux have the same issues.
On Unix, unless the writing application goes out of its way, the file won't be locked and you'll be able to read from it.
The reader will, of course, have to be prepared to deal with an incomplete file (bearing in mind that there may be I/O buffering happening on the writer's side).
If that's a non-starter, you'll have to think of some scheme to synchronize the writer and the reader, for example:
explicitly lock the file (see the flock sketch after this list);
write the data to a temporary location and only move it into its final place when the file is complete (the move operation can be done atomically, provided both the source and the destination reside on the same file system).
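For the first option, a minimal sketch of advisory locking with the standard fcntl module (Unix only; the file name is illustrative). It only helps if the writer takes fcntl.LOCK_EX around its writes:

import fcntl

with open("data.txt") as f:
    fcntl.flock(f.fileno(), fcntl.LOCK_SH)   # shared lock; blocks while a writer holds LOCK_EX
    contents = f.read()
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)   # release (also dropped automatically on close)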
If you have some control over the writing program, have it write the file somewhere else (like the /tmp directory) and then when it's done move it to the directory being watched.
If you don't have control of the program doing the writing (and by 'control' I mean 'edit the source code'), you probably won't be able to make it do file locking either, so that's probably out. In which case you'll likely need to know something about the file format to know when the writer is done. For instance, if the writer always writes "DONE" as the last four characters in the file, you could open the file, seek to the end, and read the last four characters.
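A minimal sketch of that sentinel check; the sentinel and file name are illustrative:

import os

def is_complete(path, sentinel=b"DONE"):
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(sentinel):
            return False                       # too short to contain the sentinel yet
        f.seek(-len(sentinel), os.SEEK_END)
        return f.read() == sentinel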
Yes, it is something you should worry about; the OS will not lock the file for you while it is being written.
I prefer the "file naming convention" and renaming solution described by Donal.
I am writing a Python logger script which writes to a CSV file in the following manner:
Open the file
Append data
Close the file (I think this is necessary to save the changes, to be safe after every logging routine.)
PROBLEM:
The file is very much accessible through Windows Explorer (I'm using XP). If the file is opened in Excel, access to it is locked by Excel. When the script tries to append data, obviously it fails, then it aborts altogether.
OBJECTIVE:
Is there a way to lock a file using Python so that any access to it remains exclusive to the script? Or perhaps my methodology is poor in the first place?
Rather than closing and reopening the file after each access, just flush its buffer:
theloggingfile.flush()
This way, you keep it open for writing in Python, which should lock the file from other programs opening it for writing. I think Excel will be able to open it as read-only while it's open in Python, but I can't check that without rebooting into Windows.
EDIT: I don't think you need the step below. .flush() should send it to the operating system, and if you try to look at it in another program, the OS should give it the cached version. Use os.fsync to force the OS to really write it to the hard drive, e.g. if you're concerned about sudden power failures.
os.fsync(theloggingfile.fileno())
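Putting the two together, a minimal sketch of the keep-it-open approach; the file name and the produce_log_lines source are illustrative assumptions:

import os

with open("log.csv", "a") as theloggingfile:
    for line in produce_log_lines():           # hypothetical source of log lines
        theloggingfile.write(line + "\n")
        theloggingfile.flush()                 # push Python's buffer to the OS
        os.fsync(theloggingfile.fileno())      # optionally force the OS to write to disk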
Windows does in fact support file locking: when Excel opens the file, it opens it with restrictive share modes, and the OS then denies write access to every other application, including your script.
That is how Excel accomplishes this, and since the OS enforces it, your script cannot simply override it.
You might want to write to a temporary file first (one that Excel does not have open) and replace the original file with it later on, once Excel has released its lock.