I have a Thread-extending class that is supposed to run only one instance at a time (cross-process). In order to achieve that, I'm trying to use a file lock. Here are bits of my code:
class Scanner(Thread):
    def __init__(self, path):
        Thread.__init__(self)
        self.lock_file = open(os.path.join(config.BASEDIR, "scanner.lock"), 'r+')
        fcntl.lockf(self.lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        # Stuff omitted
    def run(self):
        logging.info("Starting scan on %s" % self.path)
        # More stuff omitted
        fcntl.lockf(self.lock_file, fcntl.LOCK_UN)
I was expecting the lockf call to throw an exception if a Scanner thread was already running and not initialize the object at all. However, I can see this in the terminal:
INFO:root:Starting scan on /home/felix/Music
INFO:root:Starting scan on /home/felix/Music
INFO:root:Scan finished
INFO:root:Scan finished
Which suggests that two Scanner threads are running at the same time, no exception thrown. I'm sure I'm missing something really basic here, but I can't seem to figure out what that is. Can anyone help?
Found the solution myself in the end. It was to use fcntl.flock() instead of fcntl.lockf(), with the exact same parameters. Not sure why that made a difference.
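For reference, a minimal sketch of the flock()-based approach (the lock path and the error handling below are illustrative assumptions, not the original code):

import fcntl
import sys

# Hypothetical lock path; the original code builds it from config.BASEDIR.
lock_path = "/tmp/scanner.lock"

lock_file = open(lock_path, "w")
try:
    # Non-blocking exclusive lock: raises an error immediately if another
    # process already holds the lock.
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except IOError:
    sys.exit("Another Scanner instance is already running")

# ... do the scanning work while holding the lock ...

fcntl.flock(lock_file, fcntl.LOCK_UN)
lock_file.close()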
You're opening the lock file using r+ which is erasing the previous file and creating a new one. Each thread is locking a different file.
Use w or r+a
Along with using flock, I also had to open the file like so:
fd = os.open(lockfile, os.O_CREAT | os.O_TRUNC | os.O_WRONLY)
It does not work otherwise.
I'm running a program that takes in data from other clients, and I've been having an enormous amount of trouble writing to, and changing information in, a file. I want to save the information as the program runs so that if it stops for some reason, the data is already on disk. I feel like I have tried everything: file.flush(), os.fsync() along with it, with open(file) as file: statements to close the file when the program stops, and currently I'm trying atexit to have a function write to the file on exit, which hasn't worked out either, and isn't called on errors anyway, so it's of limited use. I'm looking for a way to write to a file repeatedly and reliably. I may be misunderstanding something, so please explain it to me. I have been having trouble without end, and need help.
EDIT
import atexit
import pickle

import discord

AccData = {}
client = discord.Client()
User = discord.User

def SaveData():
    pickle.dump(AccData, data)
    data.close()
    print("data saved")

atexit.register(SaveData)

f = open('DisCoin.json','rb')
AccData = pickle.load(open('DisCoin.json','rb'))
f.seek(0)
f.close()

data = open('DisCoin.json','wb')
Python catches its own exceptions, most signals, and exit(), then runs atexit routines for cleanup, so you can deal with normal badness there.
But other bad things happen: a segmentation fault or other internal error, an unknown signal, code that calls os._exit(). These cause an early termination, and data not yet flushed is lost. Bad things can happen to any program, and if it needs extra resiliency, it needs some method to handle that.
You can write things to temporary files and rename them to the "live" file only when they are complete. If a program terminates, at least its last saved data is still there.
You can write a log or journal of changes and rebuild the data you want by scanning that log. That's how many file systems work, and "Big Data" map/reduce systems do basically the same thing.
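To make the journal idea concrete, here is a minimal sketch (the file name and record format are assumptions for illustration): every change is appended as one JSON line, and the current state is rebuilt by replaying the log.

import json

LOG_FILE = 'changes.log'  # hypothetical journal file

def record_change(key, value):
    # Append one change per line; appending is far less likely to corrupt
    # existing data than rewriting the whole file on every save.
    with open(LOG_FILE, 'a') as f:
        f.write(json.dumps({'key': key, 'value': value}) + '\n')

def rebuild_state():
    # Replay the journal from the start to reconstruct the latest state.
    state = {}
    try:
        with open(LOG_FILE) as f:
            for line in f:
                change = json.loads(line)
                state[change['key']] = change['value']
    except FileNotFoundError:
        pass
    return state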
You can move to a database and use its transaction processing, or any OLTP system, to make sure you do all-or-none updates to your data store.
Your example code is especially fragile because
data = open('DisCoin.json','wb')
trashes existing data on disk. There is no going back with this code! Step one, then, is don't do that. Keep old data until the new stuff is ready.
Here is an example class that manages temporary files for you. Use it instead of open and it will create a temporary file for you to update, and it will only make the data live when the with clause exits without an exception. There is no need for an atexit handler if you use this in a with clause.
import shutil
import os

class SidelineFile:
    def __init__(self, *args, **kw):
        self.args = list(args)
        self.kw = kw

    def __enter__(self):
        self.closed = False
        self.orig_filename = self.args[0]
        self.args[0] += '.tmp'
        try:
            mode = self.args[1]
        except IndexError:
            try:
                mode = self.kw['mode']
            except KeyError:
                mode = 'r'
        if 'a' in mode:
            shutil.copy2(self.orig_filename, self.args[0])
        self.file_obj = open(*self.args, **self.kw)
        return self.file_obj

    def __exit__(self, exc_type, exc_value, traceback):
        if not self.closed:
            self.file_obj.close()
            self.closed = True
        if not exc_type:
            os.rename(self.args[0], self.orig_filename)
        else:
            os.remove(self.args[0])

fn = 'test.txt'

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(1, repr(open(fn).read()))

with SidelineFile(fn, mode='a') as fp:
    fp.write("bar")
print(2, repr(open(fn).read()))

with SidelineFile(fn, 'w') as fp:
    fp.write("foo")
print(3, repr(open(fn).read()))

try:
    with SidelineFile(fn, 'a') as fp:
        fp.write("bar")
        raise IndexError()
except IndexError:
    pass
print(4, repr(open(fn).read()))
Personally, I like to achieve this by defining a print function for it.
import os

def fprint(text, **kwargs):
    os.chdir('C:\\mypath')
    myfile = open('output.txt', 'a')
    if kwargs:
        print(text, end=kwargs['end'], file=myfile)
    else:
        print(text, file=myfile)
    myfile.close()
fprint('Hello')
input()
fprint('This is here too',end='!!\n')
The above code will write 'Hello' into the file 'output.txt' at C:\mypath, save it, then after you enter some input will write 'This is here too!!' into the file. If you check the file while the script is waiting for input, it should already contain 'Hello'.
I've written a couple of twitter scrapers in python, and am writing another script to keep them running even if they suffer a timeout, disconnection, etc.
My current solution is as follows:
Each scraper file has a doScrape/1 function in it, which will start up a scraper and run it once, e.g.:
def doScrape(logger):
    try:
        with DBWriter(logger=logger) as db:
            logger.log_info("starting", __name__)
            s = PastScraper(db.getKeywords(), TwitterAuth(), db, logger)
            s.run()
    finally:
        logger.log_info("Done", __name__)
Where run is a near-infinite loop, which won't break unless there is an exception.
In order to run one of each kind of scraper at once, I'm using this code (with a few extra imports):
from threading import Thread

class ScraperThread(Thread):
    def __init__(self, module, logger):
        super(ScraperThread, self).__init__()
        self.module = module  # Module should contain a doScrape(logger) function
        self.logger = logger

    def run(self):
        while True:
            try:
                print "Starting!"
                print self.module.doScrape
                self.module.doScrape(self.logger)
            except:  # if for any reason we get disconnected, reconnect
                self.logger.log_debug("Restarting scraper", __name__)

if __name__ == "__main__":
    with Logger(level="all", handle=open(sys.argv[1], "a")) as l:
        past = ScraperThread(PastScraper, l)
        stream = ScraperThread(StreamScraper, l)
        past.start()
        stream.start()
        past.join()
        stream.join()
However, it appears that my call to doScrape above returns immediately: "Starting!" is printed to the console repeatedly, and the "Done" message in the finally block is never written to the log. When run individually like so:
if __name__ == "__main__":
    # Example instantiation
    from Scrapers.Logging import Logger
    with Logger(level="all", handle=open(sys.argv[1], "a")) as l:
        doScrape(l)
The code runs forever, as expected. I'm a bit stumped.
Is there anything silly that I might have missed?
Get rid of the diaper pattern in your run() method, as in: get rid of that catch-all exception handler. You'll probably see the error printed there then. I think there may be something wrong in the DBWriter or other code you're calling from your doScrape function. Perhaps it is not thread-safe. That would explain why running it from the main program directly works, but calling it from a thread fails.
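For illustration, a minimal adjustment (not the original code) that catches a concrete exception type and logs the traceback, so the failure is at least visible before the scraper restarts:

import traceback  # at module level

class ScraperThread(Thread):
    # __init__ as before
    def run(self):
        while True:
            try:
                self.module.doScrape(self.logger)
            except Exception:
                # Log the full traceback instead of silently swallowing it,
                # then loop around and restart the scraper.
                self.logger.log_debug(traceback.format_exc(), __name__)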
Aha, solved it! It was actually that I didn't realise that a default argument (here in TwitterAuth()) is evaluated at definition time. TwitterAuth reads the API key settings from a file handle, and the default argument opens up the default config file. Since this file handle is generated at definition time, both threads had the same handle, and once one had read it, the other one tried to read from the end of the file, throwing an exception. This is remedied by resetting the file before use, and using a mutex.
Cheers to Irmen de Jong for pointing me in the right direction.
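To illustrate the pitfall with a made-up example (this is not the actual TwitterAuth code): a default argument is evaluated once, when the function is defined, so a default handle or iterator is shared by every call that omits it.

def next_keyword(keywords=iter(["python", "twitter"])):
    # The default iterator is created once, at definition time, and is
    # shared across calls -- just like a default file handle would be.
    return next(keywords, None)

print(next_keyword())  # python
print(next_keyword())  # twitter
print(next_keyword())  # None -- the shared default is already exhausted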
So I want to write some files that might be locked/blocked for write/delete by other processes, and I'd like to test for that upfront.
As I understand it, os.access(path, os.W_OK) only looks at the permissions and will return True even though the file cannot currently be written to. So I have this little function:
def write_test(path):
    try:
        fobj = open(path, 'a')
        fobj.close()
        return True
    except IOError:
        return False
It actually works pretty well when I try it with a file that I have manually opened in a program. But as a wannabe-good-developer I want to put it in a test to automatically see if it works as expected.
Thing is: if I just open(path, 'a') the file, I can still open() it again, no problem! Even from another Python instance. Although Explorer will actually tell me that the file is currently open in Python!
I looked up other posts here and there about locking. Most suggest installing a package. You might understand that I don't want to do that just to test a handful of lines of code. So I dug up the packages to see the actual spot where the locking is eventually done...
fcntl? I don't have that. win32con? Don't have it either... Now in filelock there is this:
self.fd = os.open(self.lockfile, os.O_CREAT|os.O_EXCL|os.O_RDWR)
When I do that on a file it moans that the file exists!! Ehhm ... yea! That's the idea! But even when I do it on a non-existing path, I can still open(path, 'a') it! Even from another Python instance...
I'm beginning to think that I fail to understand something very basic here. Am I looking for the wrong thing? Can someone point me into the right direction?
Thanks!
You are trying to implement file locking using just the open() system call. Unix-like systems use advisory file locking by default. This means that cooperating processes may use locks to coordinate access to a file among themselves, but uncooperative processes are also free to ignore locks and access the file in any way they choose. In other words, file locks lock out other file lockers only, not I/O. See Wikipedia.
As stated in the open(2) reference, the solution for performing atomic file locking using a lockfile is to create a unique file on the same file system (e.g., incorporating hostname and pid), then use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check whether its link count has increased to 2, in which case the lock is also successful.
That is why filelock also uses fcntl.flock() and puts all that stuff in a module, as it should be.
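For comparison, here is a minimal sketch of the lock-file idea using os.open (the path is an illustrative assumption): O_CREAT | O_EXCL makes creation atomic, so only one process can create the lock file, and removing it releases the lock. Like all advisory schemes, it only coordinates processes that agree to check for the lock file.

import os

LOCKFILE = 'myresource.lock'  # hypothetical lock-file path

def acquire_lock():
    try:
        # Atomic create-if-not-exists; fails if the lock file already exists.
        fd = os.open(LOCKFILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except OSError:
        return False
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True

def release_lock():
    os.remove(LOCKFILE)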
Alright! Thanks to those guys I actually have something now! So this is my function:
import os

def lock_test(path):
    """
    Checks if a file can, aside from its permissions, be changed right now (True)
    or is already locked by another process (False).

    :param str path: file to be checked
    :rtype: bool
    """
    import msvcrt
    try:
        fd = os.open(path, os.O_APPEND | os.O_EXCL | os.O_RDWR)
    except OSError:
        return False

    try:
        msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
        msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
        os.close(fd)
        return True
    except (OSError, IOError):
        os.close(fd)
        return False
And the unittest could look something like this:
class Test(unittest.TestCase):
def test_lock_test(self):
testfile = 'some_test_name4142351345.xyz'
testcontent = 'some random blaaa'
with open(testfile, 'w') as fob:
fob.write(testcontent)
# test successful locking and unlocking
self.assertTrue(lock_test(testfile))
os.remove(testfile)
self.assertFalse(os.path.exists(testfile))
# make file again, lock and test False locking
with open(testfile, 'w') as fob:
fob.write(testcontent)
fd = os.open(testfile, os.O_APPEND | os.O_RDWR)
msvcrt.locking(fd, msvcrt.LK_NBLCK, 1)
self.assertFalse(lock_test(testfile))
msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
self.assertTrue(lock_test(testfile))
os.close(fd)
with open(testfile) as fob:
content = fob.read()
self.assertTrue(content == testcontent)
os.remove(testfile)
Works. Downsides are:
It's kind of testing itself with itself
so the initial OSError catch is not even tested, only locking again with msvcrt
But I dunno how to make it better now.
I'm trying to use a Unix named pipe to output statistics of a running service. I intend to provide an interface similar to /proc, where one can see live stats by catting a file.
I'm using a code similar to this in my python code:
while True:
    f = open('/tmp/readstatshere', 'w')
    f.write('some interesting stats\n')
    f.close()
/tmp/readstatshere is a named pipe created by mknod.
I then cat it to see the stats:
$ cat /tmp/readstatshere
some interesting stats
It works fine most of the time. However, if I cat the entry several times in quick succession, I sometimes get multiple lines of some interesting stats instead of one. Once or twice it has even gone into an infinite loop, printing that line forever until I killed it. The only fix I've found so far is to put a delay of, say, 500 ms after f.close() to prevent this issue.
I'd like to know why exactly this happens and if there is a better way of dealing with it.
Thanks in advance
A pipe is simply the wrong solution here. If you want to present a consistent snapshot of the internal state of your process, write that to a temporary file and then rename it to the "public" name. This will prevent all issues that can arise from other processes reading the state while you're updating it. Also, do NOT do that in a busy loop, but ideally in a thread that sleeps for at least one second between updates.
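A minimal sketch of that approach (the paths and the one-second interval below are illustrative assumptions):

import os
import time

STATS_FILE = '/tmp/stats'        # the "public" name that readers cat
TMP_FILE = STATS_FILE + '.tmp'   # written first, then renamed into place

def publish_stats():
    while True:
        with open(TMP_FILE, 'w') as f:
            f.write('some interesting stats\n')
        # rename() is atomic on POSIX, so readers always see either the old
        # snapshot or the new one, never a half-written file.
        os.rename(TMP_FILE, STATS_FILE)
        time.sleep(1)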
What about a UNIX socket instead of a pipe?
In this case, you can react on each connect by providing fresh data just in time.
The only downside is that you cannot cat the data; you'll have to create a new socket handle and connect() to the socket file.
MYSOCKETFILE = '/tmp/mysocket'

import socket
import os

try:
    os.unlink(MYSOCKETFILE)
except OSError:
    pass

s = socket.socket(socket.AF_UNIX)
s.bind(MYSOCKETFILE)
s.listen(10)

while True:
    s2, peeraddr = s.accept()
    s2.send('These are my actual data')
    s2.close()
Program querying this socket:
MYSOCKETFILE = '/tmp/mysocket'

import socket
import os

s = socket.socket(socket.AF_UNIX)
s.connect(MYSOCKETFILE)

while True:
    d = s.recv(100)
    if not d: break
    print d
s.close()
I think you should use FUSE.
It has Python bindings; see http://pypi.python.org/pypi/fuse-python/
This allows you to compose answers to questions formulated as POSIX filesystem calls.
Don't write to an actual file. That's not what /proc does. Procfs presents a virtual (non-disk-backed) filesystem which produces the information you want on demand. You can do the same thing, but it'll be easier if it's not tied to the filesystem. Instead, just run a web service inside your Python program, and keep your statistics in memory. When a request comes in for the stats, formulate them into a nice string and return them. Most of the time you won't need to waste cycles updating a file which may not even be read before the next update.
You need to unlink the pipe after you issue the close. I think this is because there is a race condition where the pipe can be opened for reading again before cat finishes and it thus sees more data and reads it out, leading to multiples of "some interesting stats."
Basically you want something like:
while True:
    os.mkfifo(the_pipe)
    f = open(the_pipe, 'w')
    f.write('some interesting stats')
    f.close()
    os.unlink(the_pipe)
Update 1: call to mkfifo
Update 2: as noted in the comments, there is a race condition in this code as well with multiple consumers.
I am stuck reading a file in /sys/ which contains the light intensity in Lux of the ambient light sensor on my Nokia N900 phone.
See thread on talk.maemo.org here
I tried to use pyinotify to poll the file, but this seems somehow wrong to me since the events I get are always "process_IN_OPEN", "process_IN_ACCESS" and "process_IN_CLOSE_NOWRITE".
I basically want to get the changes as soon as possible, and if something has changed, trigger an event, execute a class...
Here's the code I tried, which works, but not as I expected (I was hoping for process_IN_MODIFY to be triggered):
#!/usr/bin/env python
import os, time, pyinotify
import pyinotify

ambient_sensor = '/sys/class/i2c-adapter/i2c-2/2-0029/lux'

wm = pyinotify.WatchManager()  # Watch Manager
mask = pyinotify.ALL_EVENTS

def action(self, the_event):
    value = open(the_event.pathname, 'r').read().strip()
    return value

class EventHandler(pyinotify.ProcessEvent):
    ...
    def process_IN_MODIFY(self, event):
        print "MODIFY event:", action(self, event)
    ...

#log.setLevel(10)
notifier = pyinotify.ThreadedNotifier(wm, EventHandler())
notifier.start()

wdd = wm.add_watch(ambient_sensor, mask)
wdd

time.sleep(5)
notifier.stop()
Update 1:
Mmmh, all I came up with, without knowing whether there is a special mechanism for this, is the following:
f = open('/sys/class/i2c-adapter/i2c-2/2-0029/lux')
while True:
    value = f.read()
    print value
    f.seek(0)
This, wrapped in its own thread, could do the trick, but does anyone have a smarter, less CPU-hogging and faster way to get the latest value?
Since the /sys file is a pseudo-file which just presents a view on an underlying, volatile operating system value, it makes sense that there would never be a modify event raised. Since the file is "modified" from below, it doesn't follow regular file-system semantics.
If a modify event is never raised, using a package like pyinotify isn't going to get you anywhere. It would be better to look for a platform-specific mechanism.
Response to Update 1:
Since the N900 maemo runtime supports GFileMonitor, you'd do well to check if it can provide the asynchronous event that you desire.
Busy waiting - as I gather you know - is wasteful. On a phone it can really drain a battery. You should at least sleep in your busy loop.
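As an illustration of the sleep-in-the-loop point (the path and interval here are assumptions), a polling loop that only reacts when the value actually changes could look like this:

import time

SENSOR = '/sys/class/i2c-adapter/i2c-2/2-0029/lux'

def watch_sensor(on_change, interval=0.5):
    last = None
    while True:
        with open(SENSOR) as f:
            value = f.read().strip()
        if value != last:
            on_change(value)    # trigger whatever should react to a change
            last = value
        time.sleep(interval)    # sleeping keeps CPU and battery use down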