So I want to write some files that might be locked or blocked for writing/deleting by other processes, and I'd like to test for that up front.
As I understand it: os.access(path, os.W_OK) only checks the permissions and will return True even though the file cannot currently be written to. So I have this little function:
def write_test(path):
    try:
        fobj = open(path, 'a')
        fobj.close()
        return True
    except IOError:
        return False
It actually works pretty well when I try it with a file that I manually open with a program. But as a wannabe-good developer I want to put it in a test to automatically check that it works as expected.
Thing is: if I just open(path, 'a') the file, I can still open() it again, no problem! Even from another Python instance. Although Explorer will actually tell me that the file is currently open in Python!
I looked up other posts here and there about locking. Most suggest installing a package. You might understand that I don't want to do that just to test a handful of lines of code. So I dug up the packages to see the actual spot where the locking is eventually done...
fcntl? I don't have that. win32con? Don't have it either... Now in filelock there is this:
self.fd = os.open(self.lockfile, os.O_CREAT|os.O_EXCL|os.O_RDWR)
When I do that on an existing file it moans that the file exists!! Ehhm ... yea! That's the idea! But even when I do it on a non-existing path, I can still open(path, 'a') it afterwards! Even from another Python instance...
I'm beginning to think that I fail to understand something very basic here. Am I looking for the wrong thing? Can someone point me into the right direction?
Thanks!
You are trying to implement file locking using just the system call open(). Unix-like systems use advisory file locking by default. This means that cooperating processes may use locks to coordinate access to a file among themselves, but uncooperative processes are free to ignore the locks and access the file in any way they choose. In other words, file locks lock out other file lockers only, not I/O. See Wikipedia.
As stated in the open(2) system call reference, the portable solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating the hostname and PID) and use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check whether its link count has increased to 2, in which case the lock is also successful.
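A minimal sketch of that dance in Python (Unix-flavoured; the naming scheme for the unique file is illustrative, not prescribed):

import os
import socket

def try_lock(lockfile):
    """Attempt the link(2) lockfile trick; True if we got the lock."""
    # a name unique to this host and process (naming scheme is illustrative)
    unique = '%s.%s.%d' % (lockfile, socket.gethostname(), os.getpid())
    open(unique, 'w').close()        # create the unique file
    try:
        os.link(unique, lockfile)    # atomic, even on NFS
        return True                  # link() succeeded: the lock is ours
    except OSError:
        # link() failed; we still hold the lock if our link count is 2
        if os.stat(unique).st_nlink == 2:
            return True
        os.unlink(unique)            # tidy up, somebody else has the lock
        return False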
That is why filelock also uses the function fcntl.flock() and puts all that machinery in a module, as it should be.
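On a Unix-like system (so not on the asker's Windows box), the cooperative flock() version is only a few lines; a minimal sketch with an illustrative file name:

import fcntl

with open('shared.txt', 'a') as fobj:
    # raises IOError/BlockingIOError if another cooperating
    # process already holds the lock
    fcntl.flock(fobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
    fobj.write('mine for the moment\n')
    fcntl.flock(fobj, fcntl.LOCK_UN)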
Alright! Thanks to those guys I actually have something now! So this is my function:
import os
import msvcrt

def lock_test(path):
    """
    Checks if a file can, aside from its permissions, be changed right now (True)
    or is already locked by another process (False).

    :param str path: file to be checked
    :rtype: bool
    """
    try:
        fd = os.open(path, os.O_APPEND | os.O_EXCL | os.O_RDWR)
    except OSError:
        return False
    try:
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        os.close(fd)
        return True
    except (OSError, IOError):
        os.close(fd)
        return False
And the unittest could look something like this:
import os
import msvcrt
import unittest

class Test(unittest.TestCase):
    def test_lock_test(self):
        testfile = 'some_test_name4142351345.xyz'
        testcontent = 'some random blaaa'
        with open(testfile, 'w') as fob:
            fob.write(testcontent)
        # test successful locking and unlocking
        self.assertTrue(lock_test(testfile))
        os.remove(testfile)
        self.assertFalse(os.path.exists(testfile))
        # make the file again, lock it and test that lock_test returns False
        with open(testfile, 'w') as fob:
            fob.write(testcontent)
        fd = os.open(testfile, os.O_APPEND | os.O_RDWR)
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        self.assertFalse(lock_test(testfile))
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        self.assertTrue(lock_test(testfile))
        os.close(fd)
        with open(testfile) as fob:
            content = fob.read()
        self.assertEqual(content, testcontent)
        os.remove(testfile)
Works. Downsides are:
it's kind of testing itself with itself,
so the initial OSError catch is not even tested, only the repeated locking via msvcrt.
But I don't know how to make it better right now.
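One way to at least avoid the testing-itself-with-itself problem would be to take the lock from a genuinely separate process. A hedged, untested sketch (the -c one-liner, the sleep timings and the reuse of the test file name are my own invention; it still only exercises the msvcrt branch, not the initial OSError catch, and it assumes lock_test is in scope and the file already exists):

import subprocess
import sys
import time

TESTFILE = 'some_test_name4142351345.xyz'  # must already exist

# a second Python process grabs the lock and holds it for ~2 seconds
holder = subprocess.Popen([
    sys.executable, '-c',
    "import msvcrt, os, time; "
    "fd = os.open('some_test_name4142351345.xyz', os.O_APPEND | os.O_RDWR); "
    "msvcrt.locking(fd, msvcrt.LK_NBLCK, 1); "
    "time.sleep(2); "
    "os.close(fd)",
])
time.sleep(0.5)                   # crude: give the child time to lock
assert not lock_test(TESTFILE)    # locked by another process
holder.wait()
assert lock_test(TESTFILE)        # lock released again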
I have several instances of the same Python script running in parallel, reading and writing to the same JSON file. Each instance first reads information from the JSON file and processes it; it then locks the file, reads it again to get its up-to-date contents (which might have been altered by other instances), writes to it and releases the lock. Well, that is, this is how it would work if it... worked.
A stripped down version of the locking and writing part in my script looks like this:
import json
import fcntl

data = json.load(open('test.json'))

# do things with data

with open('test.json', 'w+') as file:
    fcntl.flock(file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    data = json.load(open('test.json'))
    fcntl.flock(file, fcntl.LOCK_UN)
But the open function seems to somehow clear the file: it is empty after running this snippet, and json complains about an invalid file format.
How do I set this up correctly?
But the open function seems to kind of clear the file
Yes, opening a file in w write mode always clears the file; from the open() function documentation:
'w'
open for writing, truncating the file first
[...]
The default mode is 'r' (open for reading text, synonym of 'rt'). For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation.
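A quick way to see the difference for yourself (the file name is just for the demo):

with open('demo.txt', 'w') as f:
    f.write('hello')

open('demo.txt', 'w').close()            # mode 'w' truncates immediately
print(repr(open('demo.txt').read()))     # -> ''

with open('demo.txt', 'r+') as f:        # 'r+' opens without truncating
    f.write('hello')
print(repr(open('demo.txt').read()))     # -> 'hello'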
You want to lock the file before truncating it. You can also open the file in 'r+' mode (reading and writing), at which point you need to manually truncate it after locking.
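In its simplest blocking form, that looks something like this (a sketch of just the write side, assuming test.json already holds valid JSON; the full context manager below handles the rest):

import fcntl
import json

with open('test.json', 'r+') as file:    # 'r+' does not truncate on open
    fcntl.flock(file, fcntl.LOCK_EX)     # block until we hold the lock
    data = json.load(file)
    # ... modify data ...
    file.seek(0)
    file.truncate()                      # truncate only after locking
    json.dump(data, file)
    fcntl.flock(file, fcntl.LOCK_UN)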
You also will need to lock the file for reading, because you don't want your readers to end up with truncated data when they try to read while another process is busy replacing the contents. Use a shared lock, at which point other processes are allowed to obtain a shared lock too, making it possible for many processes to read the data without having to wait for one another. A process that wants to write has to grab an exclusive lock, which is only going to be awarded when there are no shared locks anymore.
Personally, I'd create a context manager that handles the locking (either in exclusive mode for writing, or in shared mode for reading), and only truncate the file after obtaining the lock. You'll also need to account for the file not yet existing, and if you don't want to wait for locks forever, you need to handle timeouts (meaning you need to use LOCK_NB in a loop and test for the return value to see if the lock was acquired, until a certain amount of time has passed).
In the following context manager, I used the os.open() low-level system call to ensure the file is created when trying to lock it for exclusive access without truncating it if it already exists:
import errno
import fcntl
import os
import time


class Timeout(Exception):
    """Could not obtain a lock within the time given"""


class LockException(Exception):
    """General (file) locking-related exception"""


class LockedFile:
    """Lock and open a file.

    If the file is opened for writing, an exclusive lock is used,
    otherwise it is a shared lock.
    """
    def __init__(self, path, mode, timeout=None, **fileopts):
        self.path = path
        self.mode = mode
        self.fileopts = fileopts
        self.timeout = timeout
        # lock in exclusive mode when writing or appending (including r+)
        self._exclusive = set('wa+').intersection(mode)
        self._lockfh = None
        self._file = None

    def _acquire(self):
        if self._exclusive:
            # open the file in write & create mode, but *without the
            # truncate flag* to make sure it is created only if it
            # doesn't exist yet
            lockfhmode, lockmode = os.O_WRONLY | os.O_CREAT, fcntl.LOCK_EX
        else:
            lockfhmode, lockmode = os.O_RDONLY, fcntl.LOCK_SH
        self._lockfh = os.open(self.path, lockfhmode)
        start = time.time()
        while True:
            try:
                fcntl.lockf(self._lockfh, lockmode | fcntl.LOCK_NB)
                return
            except OSError as e:
                if e.errno not in {errno.EACCES, errno.EAGAIN}:
                    raise
            if self.timeout is not None and time.time() - start > self.timeout:
                raise Timeout()
            time.sleep(0.1)

    def _release(self):
        fcntl.lockf(self._lockfh, fcntl.LOCK_UN)
        os.close(self._lockfh)

    def __enter__(self):
        if self._file is not None:
            raise LockException('Lock already taken')
        self._acquire()
        try:
            self._file = open(self.path, self.mode, **self.fileopts)
        except IOError:
            self._release()
            raise
        return self._file

    def __exit__(self, *exc):
        if self._file is None:
            raise LockException('Not locked')
        try:
            self._file.close()
        finally:
            self._file = None
            self._release()
The processes that try to read the file then use:
with LockedFile('test.json', 'r') as file:
    data = json.load(file)
and the process that wants to write uses:
with LockedFile('test.json', 'w') as file:
    json.dump(data, file)
If you want to allow for a timeout, add a try/except block around the with block and catch the Timeout exception; you'll need to decide what should happen then:
try:
    with LockedFile('test.json', 'w', timeout=10) as file:
        json.dump(data, file)
except Timeout:
    # could not acquire an exclusive lock to write the file. What now?
    pass
You used "w+" for opening the file.
w+
Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.
So instead of w+ use a.
It looks to me like you could do this more elegantly with the threading or multiprocessing library, using Locks, instead of running multiple independent instances of the same Python script.
Source : www.tutorialspoint.com, Python Docs
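For completeness, a hedged sketch of the threading/multiprocessing suggestion above. It only applies if the "instances" can be restructured as child processes of a single parent, which is an assumption on my part; the worker body is invented for illustration and assumes test.json already contains a JSON object:

import json
from multiprocessing import Lock, Process

def worker(lock, path):
    with lock:                       # only one process in here at a time
        with open(path, 'r+') as f:  # 'r+' avoids truncating on open
            data = json.load(f)
            data['count'] = data.get('count', 0) + 1
            f.seek(0)
            f.truncate()
            json.dump(data, f)

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=worker, args=(lock, 'test.json')) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()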
I'm running a program that takes in data from other clients, and I have been having an enormous number of problems writing and changing information in a file. I want to save the information in case the program stops for some reason, so the data would already have been saved. I feel like I have tried everything: using file.flush(), using os.fsync() with it, using with open(file) as file: statements to close the file when the program stops, and currently I'm trying atexit to have a function write to the file when it closes. That hasn't worked out either, and it doesn't get called on errors anyway, so it's kind of irrelevant. I'm looking for a way to write to a file repeatedly and reliably. I may not understand something, so please explain it to me. I have been having trouble without end, and need help.
EDIT
import atexit
import pickle

import discord

AccData = {}
client = discord.Client()
User = discord.User

def SaveData():
    pickle.dump(AccData, data)
    data.close()
    print("data saved")

atexit.register(SaveData)

f = open('DisCoin.json', 'rb')
AccData = pickle.load(open('DisCoin.json', 'rb'))
f.seek(0)
f.close()

data = open('DisCoin.json', 'wb')
Python catches its own exceptions, most signals and exit(), then runs atexit routines for cleanup. So you can deal with normal badness there.
But other bad things happen: a segmentation fault or other internal error, an unknown signal, code that calls os._exit(). These cause early termination, and data not yet flushed is lost. Bad things can happen to any program, and if it needs extra resiliency, it needs some method to handle that.
You can write things to temporary files and rename them to the "live" file only when they are complete. If a program terminates, at least its last saved data is still there.
You can write a log or journal of changes and rebuild the data you want by scanning that log. That's how many file systems work, and "Big Data" map/reduce systems do basically the same thing.
You can move to a database and use its transaction processing, or any OLTP system, to make sure you do all-or-none updates to your data store.
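As a toy illustration of the journal idea (the file name and record format here are my own):

import json

def append_change(logfile, change):
    # one JSON document per line; appends of small records are
    # effectively atomic on most local file systems
    with open(logfile, 'a') as f:
        f.write(json.dumps(change) + '\n')

def rebuild_state(logfile):
    state = {}
    with open(logfile) as f:
        for line in f:
            state.update(json.loads(line))   # replay the journal
    return state

append_change('journal.log', {'balance': 10})
append_change('journal.log', {'balance': 25})
print(rebuild_state('journal.log'))          # {'balance': 25}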
Your example code is especially fragile because
data = open('DisCoin.json','wb')
trashes existing data on disk. There is no going back with this code! Step one, then, is don't do that. Keep old data until the new stuff is ready.
Here is an example class that manages temporary files for you. Use it instead of open and it will create a temporary file for you to update, and it will only go live with the data if the with clause exits without an exception. There is no need for an atexit handler if you use this in a with clause.
import shutil
import os

class SidelineFile:
    def __init__(self, *args, **kw):
        self.args = list(args)
        self.kw = kw

    def __enter__(self):
        self.closed = False
        self.orig_filename = self.args[0]
        self.args[0] += '.tmp'
        try:
            mode = self.args[1]
        except IndexError:
            try:
                mode = self.kw['mode']
            except KeyError:
                mode = 'r'
        if 'a' in mode:
            shutil.copy2(self.orig_filename, self.args[0])
        self.file_obj = open(*self.args, **self.kw)
        return self.file_obj

    def __exit__(self, exc_type, exc_value, traceback):
        if not self.closed:
            self.file_obj.close()
            self.closed = True
            if not exc_type:
                os.rename(self.args[0], self.orig_filename)
            else:
                os.remove(self.args[0])
fn = 'test.txt'

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(1, repr(open(fn).read()))

with SidelineFile(fn, mode='a') as fp:
    fp.write("bar")
print(2, repr(open(fn).read()))

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(3, repr(open(fn).read()))

try:
    with SidelineFile(fn, 'a') as fp:
        fp.write("bar")
        raise IndexError()
except IndexError:
    pass
print(4, repr(open(fn).read()))
Personally, I like to achieve this by defining a print function for it.
import os

def fprint(text, **kwargs):
    os.chdir('C:\\mypath')
    myfile = open('output.txt', 'a')
    if kwargs:
        print(text, end=kwargs['end'], file=myfile)
    else:
        print(text, file=myfile)
    myfile.close()

fprint('Hello')
input()
fprint('This is here too', end='!!\n')
The above code will write 'Hello' into the file 'output.txt' at C:\mypath, save it, then after you enter some input will write 'This is here too!!' into the file. If you check the file while the script is waiting for input, it should already contain 'Hello'.
Let us say, we have the following code:
from sys import exit

def parseLine(l):
    if '#' not in l:
        print 'Invalid expression'
        exit(1)
    return l

with open('somefile.txt') as f:
    for l in f:
        print parseLine(l)
(Note that this is demo code. The actual program is much more complex.)
Now, how do I know if I have safely closed all the open files when I exit from the program? At this point I am just assuming that the files have been closed. Currently my programs are working OK, but I want them to be robust and free of problems related to files not closed properly.
One of the chief benefits of the with block with files is that it will automatically close the file, even if there's an exception.
https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
It's already closing properly, since you're using a with statement when you open the file. That'll automatically close the file when control leaves the with statement, even if there's an exception. This is usually considered the best way to ensure files are closed when they should be.
If you don't use a with statement or close the file yourself, there are a few built-in safeties and a few pitfalls.
First, in CPython, the file object's destructor will close the file when it gets garbage-collected. However, that isn't guaranteed to happen in other Python implementations, and even in CPython, it isn't guaranteed to happen promptly.
Second, when your program exits, the operating system will close any files the program left open. This means if you accidentally do something that makes the program never close its files (perhaps you had to issue a kill -9 or something else that prevents cleanup code from running), you don't have to reboot the machine or perform filesystem repair to make the file usable again. Relying on this as your usual means of closing files would be inadvisable, though.
If you're using a with block, you essentially have your open call inside of a try block and the close in a finally block. See https://docs.python.org/2/tutorial/inputoutput.html for more information from the official docs.
Since calling exit() actually raises the SystemExit exception, all code within finally blocks will be run before the program completely exits. Since this is the case, and since you're using with open(...) blocks, the file will be closed with any uncaught exception.
Below is your code (runnable/debuggable/steppable at http://python.dbgr.cc/s)
from sys import exit

def parseLine(l):
    if '#' not in l:
        print 'Invalid expression'
        exit(1)
    return l

with open('somefile.txt') as f:
    for l in f:
        print parseLine(l)

print("file is closed? %r" % f.closed)
Equivalent code without using the with open(...) block is shown below (runnable/debuggable at http://python.dbgr.cc/g):
from sys import exit

def parseLine(l):
    if '#' not in l:
        print 'Invalid expression'
        exit(1)
    return l

try:
    f = open('somefile.txt')
    for l in f:
        print parseLine(l)
finally:
    print("Closing open file!")
    f.close()

print("file is closed? %r" % f.closed)
I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement designed to be used for?
What do you use it for?
Are there any gotchas I need to be aware of, or common anti-patterns associated with its use? Any cases where it is better to use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343. For instance, the open statement is a context manager in itself, which lets you open a file, keep it open as long as the execution is in the context of the with statement where you used it, and close it as soon as you leave the context, no matter whether you have left it because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
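The protocol behind all of this is deliberately small: any class implementing __enter__ and __exit__ can be used in a with statement. A bare-bones illustration (the class and its prints are mine, not from PEP 343):

class ManagedResource(object):
    def __enter__(self):
        print("acquiring resource")
        return self                      # bound to the 'as' target

    def __exit__(self, exc_type, exc_value, traceback):
        print("releasing resource")      # runs even if an exception occurred
        return False                     # do not suppress exceptions

with ManagedResource() as res:
    print("resource in use")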
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os

@contextmanager
def working_directory(path):
    current_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(current_dir)

with working_directory("data/stuff"):
    pass  # do something within data/stuff

# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys

@contextmanager
def redirected(**kwds):
    stream_names = ["stdin", "stdout", "stderr"]
    old_streams = {}
    try:
        for sname in stream_names:
            stream = kwds.get(sname, None)
            if stream is not None and stream != getattr(sys, sname):
                old_streams[sname] = getattr(sys, sname)
                setattr(sys, sname, stream)
        yield
    finally:
        for sname, stream in old_streams.iteritems():
            setattr(sys, sname, stream)

with redirected(stdout=open("/tmp/log.txt", "w")):
    # these print statements will go to /tmp/log.txt
    print "Test entry 1"
    print "Test entry 2"

# back to the normal stdout
print "Back to normal stdout again"
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from contextlib import contextmanager
from tempfile import mkdtemp
from shutil import rmtree

@contextmanager
def temporary_dir(*args, **kwds):
    name = mkdtemp(*args, **kwds)
    try:
        yield name
    finally:
        rmtree(name)

with temporary_dir() as dirname:
    pass  # do whatever you want
I would suggest two interesting reads:
PEP 343 The "with" Statement
effbot: Understanding Python's "with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
data = foo_file.read()
OR
from contextlib import nested

with nested(A(), B(), C()) as (X, Y, Z):
    do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
    for line in input_file:
        output_file.write(parse(line))
OR
import threading

lock = threading.Lock()

with lock:
    pass  # critical section of code
3.
I don't see any antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers' habit of using try..catch..finally statements from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits the resources are cleanly released regardless of the outcome of the code in the block (that is whether the block exits normally or because of an exception).
Many resources in the Python library obey the protocol required by the with statement and so can be used with it out-of-the-box. However, anyone can make resources that can be used in a with statement by implementing the well-documented protocol: PEP 0343
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing, and for some activities I need the Decimal library for arbitrary precision calculations. In some parts of my code I need high precision; for most other parts I need less.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext

with localcontext() as ctx:
    ctx.prec = 42        # perform a high precision calculation
    s = calculate_something()
s = +s                   # round the final result back to the default precision
I use this a lot with the Hypergeometric Test, which requires the division of large numbers resulting from factorials. When you do genomic-scale calculations you have to be careful of round-off and overflow errors.
An example of an antipattern might be to use with inside a loop when it would be more efficient to have the with outside the loop.

For example:

for row in lines:
    with open("outfile", "a") as f:
        f.write(row)

vs.

with open("outfile", "a") as f:
    for row in lines:
        f.write(row)

The first way opens and closes the file for each row, which may cause performance problems compared to the second way, which opens and closes the file just once.
See PEP 343 - The 'with' statement; there is an example section at the end.

... new statement "with" to the Python language to make it possible to factor out standard uses of try/finally statements.
Points 1, 2, and 3 being reasonably well covered:

4: it is relatively new, only available in Python 2.6+ (or Python 2.5 using from __future__ import with_statement)
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
Another example of out-of-the-box support, and one that might be a bit baffling at first when you are used to the way the built-in open() behaves, are the connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out-of-the-box in a with statement. However, when using the above, note that:
When the with-block is finished, either with an exception or without, the connection is not closed. In case the with-block finishes with an exception, the transaction is rolled back; otherwise, the transaction is committed.
This means that the programmer has to take care to close the connection themselves, but it allows you to acquire a connection and use it in multiple with-statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)

conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
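Since sqlite3 ships with Python, this commit/rollback behaviour is easy to verify yourself; note that the connection is still open, and still your job to close, after each with block:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER)')

with conn:                                  # commits on success
    conn.execute('INSERT INTO t VALUES (1)')

try:
    with conn:                              # rolls back on exception
        conn.execute('INSERT INTO t VALUES (2)')
        raise RuntimeError('abort this transaction')
except RuntimeError:
    pass

print(conn.execute('SELECT x FROM t').fetchall())   # [(1,)]
conn.close()                                # still our job to close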
In Python, the with statement is commonly used to open a file, process the data in it, and close the file without calling a close() method. The with statement makes exception handling simpler by providing cleanup activities.
General form of with:
with open("file name", "mode") as file_var:
    # processing statements

Note: there is no need to close the file by calling file_var.close().
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
data = file.read()
open returns a file object.
Since Python 2.6, file objects have had the methods __enter__ and __exit__.
with is like a block that calls __enter__ on entry, runs the body once, and then calls __exit__.
with works with any instance that has __enter__ and __exit__.
A file may be locked and not re-usable by other processes until it's closed; __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm
I have a Thread-extending class that is supposed to run only one instance at a time (cross-process). In order to achieve that, I'm trying to use a file lock. Here are bits of my code:
class Scanner(Thread):
    def __init__(self, path):
        Thread.__init__(self)
        self.lock_file = open(os.path.join(config.BASEDIR, "scanner.lock"), 'r+')
        fcntl.lockf(self.lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        # Stuff omitted

    def run(self):
        logging.info("Starting scan on %s" % self.path)
        # More stuff omitted
        fcntl.lockf(self.lock_file, fcntl.LOCK_UN)
I was expecting the lockf call to throw an exception if a Scanner thread was already running and not initialize the object at all. However, I can see this in the terminal:
INFO:root:Starting scan on /home/felix/Music
INFO:root:Starting scan on /home/felix/Music
INFO:root:Scan finished
INFO:root:Scan finished
Which suggests that two Scanner threads are running at the same time, no exception thrown. I'm sure I'm missing something really basic here, but I can't seem to figure out what that is. Can anyone help?
Found the solution myself in the end. It was to use fcntl.flock() instead of fcntl.lockf(), with the exact same parameters. Not sure why that made a difference.
You're opening the lock file using r+ which is erasing the previous file and creating a new one. Each thread is locking a different file.
Use w or r+a
Along with using flock, I also had to open the file like so:
fd = os.open(lockfile, os.O_CREAT | os.O_TRUNC | os.O_WRONLY)
It does not work otherwise.
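Putting that together with flock, a minimal sketch of the whole acquire/release cycle (the lock file path is illustrative):

import fcntl
import os

lockfile = '/tmp/scanner.lock'     # illustrative path
fd = os.open(lockfile, os.O_CREAT | os.O_TRUNC | os.O_WRONLY)
try:
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)   # raises if already locked
    # ... do the work that must not run twice ...
    fcntl.flock(fd, fcntl.LOCK_UN)
finally:
    os.close(fd)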