I want to open multiple files using a with statement (so I get the benefit of the context manager), based on boolean flags that control whether my program should actually open each file.
I know I can use a with statement to open multiple files, like:
with open('log.txt', 'w') as logfile, open('out_a.txt', 'w') as out_a, open('out_b.txt', 'w') as out_b:
    # do something with logfile, out_a and out_b
    # all files are closed here
I want to run a similar statement, but only opening certain files based on their corresponding flags. I thought about implementing it as a conditional_open function, something like:
write_log = True
write_out_a = False
write_out_b = True
with conditional_open('log.txt', 'w', cond=write_log) as logfile, conditional_open('out_a.txt', 'w', cond=write_out_a) as out_a, conditional_open('out_b.txt', 'w', cond=write_out_b) as out_b:
    # do something with logfile, out_a and out_b
    # all files are closed here
But I'm a little confused as to how to properly create that function. Ideally, conditional_open would either return an open file handle or None (in which case the file is never created/touched/deleted):
def conditional_open(filename, mode, cond):
    return open(filename, mode) if cond else None
But I fear that this skips the benefits of the context manager when opening a file, since I'm calling open outside of it. Is this assumption correct?
Can anyone give some ideas about how I could be doing this? I know I could create mock file objects based on the conditions and write to them instead, but it sounds a bit too convoluted to me - this seems like a simple problem, which should have a simple solution in Python.
Just set up your function as a context manager.
from contextlib import contextmanager

@contextmanager
def conditional_open(f_name, mode, cond):
    if not cond:
        yield None
        return
    resource = open(f_name, mode)
    try:
        yield resource
    finally:
        resource.close()
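Since the manager yields None when the flag is off, the with body has to check for that before writing. A minimal usage sketch (the file names and flags are just the ones from the question):

write_log = True
write_out_a = False

with conditional_open('log.txt', 'w', cond=write_log) as logfile, \
     conditional_open('out_a.txt', 'w', cond=write_out_a) as out_a:
    if logfile is not None:
        logfile.write('starting\n')
    if out_a is not None:
        out_a.write('result A\n')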
I'm writing a manager class that will allow creation of different types of log files (raw, CSV, custom data format), and I'd like to keep the log file open to write lines as they come in. The log file can also be started and stopped by events (button presses, conditions).
I want to know if I can combine the with open('file') as file: syntax, with an instance variable in a class - so I'm not stuck polling in a loop while the file is open, but instead can write to the file by event.
I know how to use the open and close methods, but everyone says "with" is much more robust, and (more) certain to write the file to disk.
I want to do something like this:
class logfile:
    def __init__(self, filename, mode):
        with open(filename, mode) as self.f:
            return

    def write(self, input):
        self.f.write(input)
and use it like:
lf = logfile("junk.txt","wt") # yeah, lf has to be global to use like this. Keeping demo simple here.
...then leave the method, do other stuff to refresh the screen, respond to other events, and later, when a data line to log comes in:
lf.write(dataline)
I then expect things to close cleanly, file to get flushed to disk when lf disappears - either implicitly at program close, or explicitly when I set lf to None.
When I try this, the file is (apparently) closed at return from creation of lf. I can inspect lf and see that
lf.f == <_io.TextIOWrapper name='junk.txt' mode='wt' encoding='UTF-8'>
but when I try to use lf.write("text"), I get:
ValueError: I/O operation on closed file.
Is there a way of using "with" and keeping it in an instance?
Failing that, should I just use open, close and write, and ensure I have try/except/finally in init and close in exit?
The with syntax is a context manager in Python: it calls the file's close() method as soon as execution leaves the with block, so in your __init__ the file is already closed by the time it returns. I think you can write your class this way:
class logfile:
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def write(self, text):
        with open(self.filename, self.mode) as f:
            f.write(text)
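If the goal is to keep the file open between event-driven writes, a minimal alternative sketch (the class name EventLogFile is just illustrative) is to make the wrapper itself a context manager, so with still guarantees the close without re-opening the file on every write:

class EventLogFile:
    """Hypothetical sketch: keep the file open and close it via the
    context-manager protocol instead of wrapping open() inside __init__."""

    def __init__(self, filename, mode):
        self.f = open(filename, mode)

    def write(self, text):
        self.f.write(text)
        self.f.flush()      # push each event's line to disk promptly

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.f.close()      # runs even if an event handler raised

with EventLogFile('junk.txt', 'wt') as lf:
    lf.write('event happened\n')
    # ... keep handling events; lf stays usable until the block exits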
I have several instances of the same Python script running in parallel, reading and writing to the same JSON file. Each instance reads information from the JSON file, processes it, then locks the file, reads it again to get the up-to-date contents (they might have been altered by other instances), then writes to it and releases the lock. Well, that is, this is how it would work if it... worked.
A stripped down version of the locking and writing part in my script looks like this:
import json
import fcntl
data = json.load(open('test.json'))
# do things with data
with open('test.json', 'w+') as file:
    fcntl.flock(file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    data = json.load(open('test.json'))
    fcntl.flock(file, fcntl.LOCK_UN)
But the open function seems to kind of clear the file, as it will be empty after running this snippet and json complains about invalid file format.
How do I have to set this up correctly?
But the open function seems to kind of clear the file
Yes, opening a file in w write mode always clears the file; from the open() function documentation:
'w'
open for writing, truncating the file first
[...]
The default mode is 'r' (open for reading text, synonym of 'rt'). For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation.
You want to lock the file before truncating it. You can also open the file in 'r+' mode (reading and writing), at which point you need to manually truncate it after locking.
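A minimal sketch of that "lock first, truncate only afterwards" order, reusing fcntl.flock from the question (and assuming test.json already exists with valid JSON):

import fcntl
import json

with open('test.json', 'r+') as f:
    fcntl.flock(f, fcntl.LOCK_EX)   # wait for the exclusive lock
    data = json.load(f)             # read the current, up-to-date contents
    # ... modify data ...
    f.seek(0)
    f.truncate()                    # only now throw away the old contents
    json.dump(data, f)
    fcntl.flock(f, fcntl.LOCK_UN)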
You also will need to lock the file for reading, because you don't want your readers to end up with truncated data when they try to read while another process is busy replacing the contents. Use a shared lock, at which point other processes are allowed to obtain a shared lock too, making it possible for many processes to read the data without having to wait for one another. A process that wants to write has to grab an exclusive lock, which is only going to be awarded when there are no shared locks anymore.
Personally, I'd create a context manager that handles the locking (either in exclusive mode for writing, or in shared mode for reading), and only truncate the file after obtaining the lock. You'll also need to account for the file not yet existing, and if you don't want to wait for locks forever, you need to handle timeouts (meaning you need to use LOCK_NB in a loop and test for the return value to see if the lock was acquired, until a certain amount of time has passed).
In the following context manager, I used the os.open() low-level system call to ensure the file is created when trying to lock it for exclusive access without truncating it if it already exists:
import errno
import fcntl
import os
import time
class Timeout(Exception):
    """Could not obtain a lock within the time given"""

class LockException(Exception):
    """General (file) locking-related exception"""

class LockedFile:
    """Lock and open a file.

    If the file is opened for writing, an exclusive lock is used,
    otherwise it is a shared lock.
    """
    def __init__(self, path, mode, timeout=None, **fileopts):
        self.path = path
        self.mode = mode
        self.fileopts = fileopts
        self.timeout = timeout
        # lock in exclusive mode when writing or appending (including r+)
        self._exclusive = set('wa+').intersection(mode)
        self._lockfh = None
        self._file = None

    def _acquire(self):
        if self._exclusive:
            # open the file in write & create mode, but *without the
            # truncate flag* to make sure it is created only if it
            # doesn't exist yet
            lockfhmode, lockmode = os.O_WRONLY | os.O_CREAT, fcntl.LOCK_EX
        else:
            lockfhmode, lockmode = os.O_RDONLY, fcntl.LOCK_SH
        self._lockfh = os.open(self.path, lockfhmode)
        start = time.time()
        while True:
            try:
                fcntl.lockf(self._lockfh, lockmode | fcntl.LOCK_NB)
                return
            except OSError as e:
                if e.errno not in {errno.EACCES, errno.EAGAIN}:
                    raise
            if self.timeout is not None and time.time() - start > self.timeout:
                raise Timeout()
            time.sleep(0.1)

    def _release(self):
        fcntl.lockf(self._lockfh, fcntl.LOCK_UN)
        os.close(self._lockfh)

    def __enter__(self):
        if self._file is not None:
            raise LockException('Lock already taken')
        self._acquire()
        try:
            self._file = open(self.path, self.mode, **self.fileopts)
        except OSError:
            self._release()
            raise
        return self._file

    def __exit__(self, *exc):
        if self._file is None:
            raise LockException('Not locked')
        try:
            self._file.close()
        finally:
            self._file = None
            self._release()
The processes that try to read the file then use:
with LockedFile('test.json', 'r') as file:
    data = json.load(file)
and the process that wants to write uses:
with LockedFile('test.json', 'w') as file:
    json.dump(data, file)
If you want to allow for a timeout, add a try/except block around the with block and catch the Timeout exception; you'll need to decide what should happen then:
try:
    with LockedFile('test.json', 'w', timeout=10) as file:
        json.dump(data, file)
except Timeout:
    # could not acquire an exclusive lock to write the file. What now?
    pass
You used "w+" for opening the file.
w+
Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.
So instead of w+ use a.
It looks to me like you could do this more elegantly with the threading or multiprocessing library, using Locks, instead of running multiple instances of the same Python script.
Source : www.tutorialspoint.com, Python Docs
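If you do restructure things around a single parent process, a rough sketch of that idea (the worker logic is a placeholder, and test.json is assumed to already contain a JSON object) could share one multiprocessing.Lock around the read-modify-write:

import json
from multiprocessing import Lock, Process

def worker(lock, n):
    # the whole read-modify-write of the shared file happens under the lock
    with lock:
        with open('test.json', 'r+') as f:
            data = json.load(f)
            data[str(n)] = data.get(str(n), 0) + 1
            f.seek(0)
            f.truncate()
            json.dump(data, f)

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=worker, args=(lock, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()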
I'm running a program that takes in data from other clients, and I have been having an enormous amount of problems writing to, and changing information in, a file. I want to save the information in case the program stops for some reason, so the data would have been saved. I feel like I have tried everything: using file.flush, using os.fsync() with it, using with open(file) as file: statements to close the file when the program stops, and currently I'm trying atexit to have a function write to the file when it closes, which hasn't worked out, plus it doesn't get called on errors, so it is kind of irrelevant. I'm looking for a way to write to a file, repeatedly, and, well, have it work. I may not understand something, so please explain it to me. I have been having trouble without end, and need help.
EDIT
AccData = {}
client = discord.Client()
User = discord.User
def SaveData():
    pickle.dump(AccData, data)
    data.close()
    print("data saved")
atexit.register(SaveData)
f = open('DisCoin.json','rb')
AccData = pickle.load(open('DisCoin.json','rb'))
f.seek(0)
f.close()
data = open('DisCoin.json','wb')
Python catches its own exceptions and most signals, and on exit() it runs atexit routines for cleanup. So you can deal with normal badness there.
But other bad things happen: a segmentation fault or other internal error, an unknown signal, code that calls os._exit(). These will cause an early termination, and data not yet flushed will be lost. Bad things can happen to any program, and if it needs extra resiliency, it needs some method to handle that.
You can write things to temporary files and rename them to the "live" file only when they are complete. If a program terminates, at least its last saved data is still there.
You can write a log or journal of changes and rebuild the data you want by scanning that log. That's how many file systems work, and "Big Data" map/reduce systems do basically the same thing.
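To make that journal idea concrete, a minimal sketch (the file name and record format are just illustrative) appends one JSON record per change and rebuilds the current state by replaying the log:

import json

JOURNAL = 'changes.log'   # hypothetical append-only journal

def record_change(key, value):
    # appends are small and sequential; a crash loses at most the last line
    with open(JOURNAL, 'a') as f:
        f.write(json.dumps({'key': key, 'value': value}) + '\n')
        f.flush()

def rebuild_state():
    state = {}
    try:
        with open(JOURNAL) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                entry = json.loads(line)
                state[entry['key']] = entry['value']
    except FileNotFoundError:
        pass    # no journal yet, start with an empty state
    return state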
You can move to a database and use its transaction processing, or any OLTP system, to make sure you do all-or-none updates to your data store.
Your example code is especially fragile because
data = open('DisCoin.json','wb')
trashes existing data on disk. There is no going back with this code! Step one, then, is don't do that. Keep old data until the new stuff is ready.
Here is an example class that manages temporary files for you. Use it instead of open and it will create a temporary file for you to update, and will only go live with the data if the with clause exits without an exception. There is no need for an atexit handler if you use this in a with clause.
import shutil
import os
class SidelineFile:
    def __init__(self, *args, **kw):
        self.args = list(args)
        self.kw = kw

    def __enter__(self):
        self.closed = False
        self.orig_filename = self.args[0]
        self.args[0] += '.tmp'
        try:
            mode = self.args[1]
        except IndexError:
            try:
                mode = self.kw['mode']
            except KeyError:
                mode = 'r'
        if 'a' in mode:
            shutil.copy2(self.orig_filename, self.args[0])
        self.file_obj = open(*self.args, **self.kw)
        return self.file_obj

    def __exit__(self, exc_type, exc_value, traceback):
        if not self.closed:
            self.file_obj.close()
            self.closed = True
            if not exc_type:
                os.rename(self.args[0], self.orig_filename)
            else:
                os.remove(self.args[0])

fn = 'test.txt'

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(1, repr(open(fn).read()))

with SidelineFile(fn, mode='a') as fp:
    fp.write("bar")
print(2, repr(open(fn).read()))

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(3, repr(open(fn).read()))

try:
    with SidelineFile(fn, 'a') as fp:
        fp.write("bar")
        raise IndexError()
except IndexError:
    pass
print(4, repr(open(fn).read()))
Personally, I like to achieve this by defining a print function for it.
import os
def fprint(text, **kwargs):
    os.chdir('C:\\mypath')
    myfile = open('output.txt', 'a')
    if kwargs:
        print(text, end=kwargs['end'], file=myfile)
    else:
        print(text, file=myfile)
    myfile.close()

fprint('Hello')
input()
fprint('This is here too', end='!!\n')
The above code will write 'Hello' into the file 'output.txt' at C:\mypath, save it, then after you enter some input will write 'This is here too!!' into the file. If you check the file while the script is waiting for input, it should already contain 'Hello'.
So I want to write some files that might be locked/blocked for write/delete by other processes and like to test that upfront.
As I understand it, os.access(path, os.W_OK) only looks at the permissions and will return True even though the file cannot currently be written to. So I have this little function:
def write_test(path):
    try:
        fobj = open(path, 'a')
        fobj.close()
        return True
    except IOError:
        return False
It actually works pretty well, when I try it with a file that I manually open with a Program. But as a wannabe-good-developer I want to put it in a test to automatically see if it works as expected.
Thing is: If I just open(path, 'a') the file I can still open() it again no problem! Even from another Python instance. Although Explorer will actually tell me that the file is currently open in Python!
I looked up other posts here and there about locking. Most suggest installing a package. You might understand that I don't want to do that just to test a handful of lines of code. So I dug up the packages to see the actual spot where the locking is eventually done...
fcntl? I don't have that. win32con? Don't have it either... Now in filelock there is this:
self.fd = os.open(self.lockfile, os.O_CREAT|os.O_EXCL|os.O_RDWR)
When I do that on a file it moans that the file exists!! Ehhm ... yea! That's the idea! But even when I do it on a non-existing path, I can still open(path, 'a') it! Even from another Python instance...
I'm beginning to think that I fail to understand something very basic here. Am I looking for the wrong thing? Can someone point me into the right direction?
Thanks!
You are trying to implement file locking using just the open() system call. Unix-like systems use advisory file locking by default. This means that cooperating processes may use locks to coordinate access to a file among themselves, but uncooperative processes are free to ignore the locks and access the file in any way they choose. In other words, file locks lock out other file lockers only, not I/O. See Wikipedia.
As stated in the open() system call reference, the solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating hostname and pid), then use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check whether its link count has increased to 2, in which case the lock is also successful.
That is why filelock also uses the fcntl.flock() function and puts all that stuff in a module, as it should be.
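A rough sketch of that link()-based dance on a POSIX system (names and error handling are kept deliberately minimal):

import os
import socket

def acquire_lock(lockfile='resource.lock'):
    """Try to take the lock by linking a unique file to the lockfile.
    Returns True if the lock was obtained."""
    unique = '%s.%s.%d' % (lockfile, socket.gethostname(), os.getpid())
    with open(unique, 'w'):
        pass                                # create the unique file
    try:
        os.link(unique, lockfile)           # atomic; fails if lockfile exists
        return True
    except OSError:
        # link() may report an error even though it worked (e.g. over NFS);
        # a link count of 2 on the unique file still means success
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)                   # the unique name is no longer needed

def release_lock(lockfile='resource.lock'):
    os.unlink(lockfile)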
Alright! Thanks to those guys I actually have something now! So this is my function:
def lock_test(path):
    """
    Checks if a file can, aside from its permissions, be changed right now (True)
    or is already locked by another process (False).

    :param str path: file to be checked
    :rtype: bool
    """
    import os
    import msvcrt
    try:
        fd = os.open(path, os.O_APPEND | os.O_EXCL | os.O_RDWR)
    except OSError:
        return False

    try:
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        os.close(fd)
        return True
    except (OSError, IOError):
        os.close(fd)
        return False
And the unittest could look something like this:
class Test(unittest.TestCase):

    def test_lock_test(self):
        testfile = 'some_test_name4142351345.xyz'
        testcontent = 'some random blaaa'
        with open(testfile, 'w') as fob:
            fob.write(testcontent)
        # test successful locking and unlocking
        self.assertTrue(lock_test(testfile))
        os.remove(testfile)
        self.assertFalse(os.path.exists(testfile))
        # make file again, lock and test False locking
        with open(testfile, 'w') as fob:
            fob.write(testcontent)
        fd = os.open(testfile, os.O_APPEND | os.O_RDWR)
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        self.assertFalse(lock_test(testfile))
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        self.assertTrue(lock_test(testfile))
        os.close(fd)
        with open(testfile) as fob:
            content = fob.read()
        self.assertTrue(content == testcontent)
        os.remove(testfile)
Works. Downsides are: it's kind of testing itself with itself, so the initial OSError catch is not even exercised, only the second locking attempt with msvcrt. But I don't know how to make it better right now.
I am using python's csv module to extract data from a csv that is constantly being updated by an external tool. I have run into a problem where when I reach the end of the file I get a StopIteration error, however, I would like the script to continue to loop waiting for more lines to be added by the external tool.
What I came up with so far to do this is:
f = open('file.csv')
csvReader = csv.reader(f, delimiter=',')
while 1:
    try:
        doStuff(csvReader.next())
    except StopIteration:
        depth = f.tell()
        f.close()
        f = open('file.csv')
        f.seek(depth)
        csvReader = csv.reader(f, delimiter=',')
This has the intended functionality, but it also seems terrible. Looping after catching the StopIteration is not possible, since once StopIteration is thrown, it will be thrown again on every subsequent call to next(). Does anyone have suggestions on how to implement this in such a way that I don't have to do this silly tell and seek? Or is there a different Python module that can easily support this functionality?
Your problem is not with the CSV reader, but with the file object itself. You may still have to do the crazy gyrations you're doing in your snippet above, but it would be better to create a file object wrapper or subclass that does it for you, and use that with your CSV reader. That keeps the complexity isolated from your csv processing code.
For instance (warning: untested code):
class ReopeningFile(object):
    def __init__(self, filename):
        self.filename = filename
        self.f = open(self.filename)

    def next(self):
        try:
            return self.f.next()
        except StopIteration:
            depth = self.f.tell()
            self.f.close()
            self.f = open(self.filename)
            self.f.seek(depth)
            # May need to sleep here to allow more data to come in
            # Also may need a way to signal a real StopIteration
            return self.next()

    def __iter__(self):
        return self
Then your main code becomes simpler, as it is freed from having to manage the file reopening (note that you also don't have to restart your csv_reader whenever the file restarts):
import csv
csv_reader = csv.reader(ReopeningFile('data.csv'))
for each in csv_reader:
    process_csv_line(each)
Producer-consumer stuff can get a bit tricky. How about using seek and reading bytes instead? What about using a named pipe?
Heck, why not communicate over a local socket?
You rarely need to catch StopIteration explicitly. Do this:
for row in csvReader:
    doStuff(row)
As for detecting when new lines are written to the file, you can either popen a tail -f process or write out the Python code for what tail -f does. (It isn't complicated; it basically just stats the file every second to see if it's changed. Here's the C source code of tail.)
EDIT: Disappointingly, popening tail -f doesn't work as I expected in Python 2.x. It seems iterating over the lines of a file is implemented using fread and a largeish buffer, even if the file is supposed to be unbuffered (like when subprocess.py creates the file, passing bufsize=0). But popening tail would be a mildly ugly hack anyway.
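For reference, a rough pure-Python sketch of that tail -f behaviour (doStuff is the placeholder from the question) is just a generator that buffers until a full line is available and sleeps when there is nothing new:

import csv
import time

def follow(path, poll_interval=1.0):
    """Yield complete lines from path, waiting for the writer to append more."""
    buf = ''
    with open(path) as f:
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_interval)   # nothing new yet, poll again later
                continue
            buf += chunk
            if buf.endswith('\n'):          # only hand over complete lines
                yield buf
                buf = ''

for row in csv.reader(follow('file.csv')):
    doStuff(row)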