Python multithreading and file locking issues

I have implemented multithreaded code in two different ways, but both produce an error. Could someone explain what causes the problem?
In version 1, I get an exception saying that two arguments were passed to the writekey function instead of one.
In version 2, one of the threads reads an empty line, so an exception is raised while processing the empty string.
I am using locks; shouldn't they prevent multiple threads from accessing the function or the file at the same time?
Version 1:
class SomeThread(threading.Thread):
    def __init__(self, somequeue, lockfile):
        threading.Thread.__init__(self)
        self.myqueue = somequeue
        self.myfilelock = lockfile

    def writekey(key):
        if os.path.exists(os.path.join('.', outfile)):
            with open(outfile, 'r') as fc:
                readkey = int(fc.readline().rstrip())
            os.remove(os.path.join('.', outfile))
        with open(outfile, 'w') as fw:
            if readkey > key:
                fw.write(str(readkey))
            else:
                fw.write(str(key))

    def run(self):
        while(True):
            dict = self.myqueue.get()
            self.myfilelock.acquire()
            try:
                self.writekey(dict.get("key"))
            finally:
                self.myfilelock.release()
            self.myqueue.task_done()

populateQueue()  # populate queue with objects
filelock = threading.Lock()
for i in range(threadnum):
    thread = SomeThread(somequeue, filelock)
    thread.setDaemon(True)
    thread.start()
somequeue.join()
Version 2:
def writekey(key):
    if os.path.exists(os.path.join('.', outfile)):
        with open(outfile, 'r') as fc:
            # do something...
        os.remove(os.path.join('.', outfile))
    with open(outfile, 'w') as fw:
        # do something...

class SomeThread(threading.Thread):
    def __init__(self, somequeue, lockfile):
        threading.Thread.__init__(self)
        self.myqueue = somequeue
        self.myfilelock = lockfile

    def run(self):
        while(True):
            dict = self.myqueue.get()
            self.myfilelock.acquire()
            try:
                writekey(dict.get("key"))
            finally:
                myfilelock.release()
            self.myqueue.task_done()
# Same as above ....

In version 1, def writekey(key) should be declared with "self" as the first parameter, i.e.
def writekey(self, key):
The problem in version 2 is less clear. I assume that an empty line is being read at some point while reading outfile. This is normal; it indicates that end-of-file has been reached. Normally you would just break out of your read loop. It is usually preferable to read a file line by line in a for loop, e.g.
with open(outfile, 'r') as fc:
    for line in fc:
        # process the line
The for loop will terminate naturally upon reaching end-of-file.
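Putting both fixes together, a corrected writekey might look like the sketch below. This is only an illustration, assuming (as in the original) that outfile holds a single integer key and that the caller holds the lock:
def writekey(self, key):
    # 'self' must be the first parameter of an instance method
    readkey = key
    if os.path.exists(os.path.join('.', outfile)):
        with open(outfile, 'r') as fc:
            line = fc.readline().rstrip()
            if line:  # guard against an empty read at end-of-file
                readkey = max(readkey, int(line))
        os.remove(os.path.join('.', outfile))
    with open(outfile, 'w') as fw:
        fw.write(str(readkey))
This preserves the original behaviour (keep the larger of the stored key and the new key) while tolerating an empty or missing file.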


How do I add a time delay to a Python instance method?

I want to write good tests to make sure my concurrent data structure works. But the tests are passing even on a class that is obviously not thread-safe.
class NotThreadSafe:
    def __init__(self):
        self.set1 = set()
        self.set2 = set()

    def add_to_sets(self, item):
        self._add_to_set1(item)
        self._add_to_set2(item)

    def _add_to_set1(self, item):
        self.set1.add(item)

    def _add_to_set2(self, item):
        self.set2.add(item)

    def are_sets_equal_length(self):
        return len(self.set1) == len(self.set2)
My tests have a reader thread and a writer thread running concurrently. The writer thread calls add_to_sets and the reader thread calls are_sets_equal_length.
But the reader thread always observes are_sets_equal_length() to be True, even though the writer thread should, in theory, occasionally cause the lengths to differ.
How can I add a time delay to _add_to_set2 so that the race condition is forced to surface?
The test:
import threading
import time

def writer_fn(nts: NotThreadSafe):
    for i in range(1000):
        nts.add_to_sets(i)

def reader_fn(nts: NotThreadSafe, stop: list, results: list):
    while not len(stop):
        if not nts.are_sets_equal_length():
            results.append(False)
            return
    results.append(True)

def test_nts():
    nts = NotThreadSafe()
    stop = []
    results = []
    reader = threading.Thread(target=reader_fn, args=[nts, stop, results])
    writer = threading.Thread(target=writer_fn, args=[nts])
    reader.start()
    writer.start()
    writer.join()
    stop.append(True)
    reader.join()
    assert not results[0]
Step 1: write a wrapper that creates a new function containing a time delay.
def slow_wrapper(method):
    """Adds a tiny delay to a method. Good for triggering race conditions that would otherwise be very rare."""
    def wrapped_method(*args):
        time.sleep(0.001)
        return method(*args)
    return wrapped_method
Step 2: In the test function, after creating the object, replace _add_to_set2 with a time-delayed version:
nts = NotThreadSafe()
# change _add_to_set2 into a time-delayed version
nts._add_to_set2 = slow_wrapper(nts._add_to_set2)
Step 3: Run the tests. The failure should now be triggered reliably: the delay widens the window between the two insertions, so the reader is almost certain to call are_sets_equal_length() while set1 holds an item that set2 does not.

How to capture prints in real time from a function?

I want to capture all the prints and do something like return them, but keep the function running.
I found this method, but it only returns the prints when the code has finished.
f = io.StringIO()
with redirect_stdout(f):
    # my code
return f.getvalue()
Is there any method to capture every print in real-time?
You could write your own file-like object that processes lines of text as it sees them. In the simplest case you only need to supply a write method, as shown below. The tricky part is knowing when a print call is done: print may call stdout.write several times for a single print operation. In this example, processing happens whenever a newline is seen. This code does not return interim prints, but it does let you intercept the writes to stdout and process them before returning to the function that called print.
from contextlib import redirect_stdout
import sys

real_stdout_for_test = sys.stdout

class WriteProcessor:
    def __init__(self):
        self.buf = ""

    def write(self, buf):
        # emit on each newline
        while buf:
            try:
                newline_index = buf.index("\n")
            except ValueError:
                # no newline, buffer for next call
                self.buf += buf
                break
            # get data to next newline and combine with any buffered data
            data = self.buf + buf[:newline_index + 1]
            self.buf = ""
            buf = buf[newline_index + 1:]
            # perform complex calculations... or just print with a note.
            real_stdout_for_test.write("fiddled with " + data)

with redirect_stdout(WriteProcessor()):
    print("hello there")
    print("a\nprint\nof\nmany\nlines")
    print("goodbye ", end="")
    print("for now")

How to call a method on the GUI thread?

I am making a small program that gets the latest revenue from a webshop; if it is more than the previous amount, it plays a sound. I am using Pyglet, but I get errors because it is not being called from the main thread. I would like to know how to call a method on the main thread. See the error below:
RuntimeError: EventLoop.run() must be called from the same thread that imports pyglet.app
def work():
    threading.Timer(5, work).start()
    file_Name = "save.txt"
    lastRevenue = 0
    data = json.load(urllib2.urlopen(''))
    newRevenue = data["revenue"]
    if (os.path.getsize(file_Name) <= 0):
        with open(file_Name, "wb") as f:
            f.write('%d' % newRevenue)
            f.flush()
    with open(file_Name, "rb") as f:
        lastRevenue = float(f.readline().strip())
        print lastRevenue
        print newRevenue
        f.close()
    if newRevenue > lastRevenue:
        with open(file_Name, "wb") as f:
            f.write('%f' % newRevenue)
            f.flush()
        playsound()

def playsound():
    music = pyglet.resource.media('cash.wav')
    music.play()
    pyglet.app.run()

work()
It's not particularly strange. work is being executed in a separate thread from the one where pyglet was imported.
pyglet.app, when imported, sets up a lot of context variables and what not. I say "what not" because I actually haven't bothered checking deeper into what it actually sets up.
And OpenGL can't execute things outside of its own context (the main thread where it resides). Therefore you're not allowed to poke around at OpenGL from a neighboring thread, if that makes sense.
However, if you create your own .run() function and use a class-based method of activating Pyglet, you can start the GUI from the thread.
This is a working example of how you could set it up:
import pyglet
from pyglet.gl import *
from threading import *

# REQUIRES: AVBin
pyglet.options['audio'] = ('alsa', 'openal', 'silent')

class main(pyglet.window.Window):
    def __init__(self):
        super(main, self).__init__(300, 300, fullscreen=False)
        self.x, self.y = 0, 0
        self.bg = pyglet.sprite.Sprite(pyglet.image.load('background.jpg'))
        self.music = pyglet.resource.media('cash.wav')
        self.music.play()
        self.alive = 1

    def on_draw(self):
        self.render()

    def on_close(self):
        self.alive = 0

    def render(self):
        self.clear()
        self.bg.draw()
        self.flip()

    def run(self):
        while self.alive == 1:
            self.render()
            if not self.music.playing:
                self.alive = 0
            # -----------> This is key <----------
            # This is what replaces pyglet.app.run()
            # but is required for the GUI to not freeze
            event = self.dispatch_events()

class ThreadExample(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.start()

    def run(self):
        x = main()
        x.run()

Test_One = ThreadExample()
Note that you still have to start the actual GUI code from within the thread.
I STRONGLY RECOMMEND YOU DO THIS INSTEAD, THOUGH:
Seeing as mixing threads and GUI calls is a slippery slope, I would suggest you go with a more cautious path.
from threading import *
from time import sleep

def is_main_alive():
    for t in enumerate():
        if t.name == 'MainThread':
            return t.isAlive()

class worker(Thread):
    def __init__(self, shared_dictionary):
        Thread.__init__(self)
        self.shared_dictionary = shared_dictionary
        self.start()

    def run(self):
        while is_main_alive():
            file_Name = "save.txt"
            lastRevenue = 0
            data = json.load(urllib2.urlopen(''))
            newRevenue = data["revenue"]
            if (os.path.getsize(file_Name) <= 0):
                with open(file_Name, "wb") as f:
                    f.write('%d' % newRevenue)
                    f.flush()
            with open(file_Name, "rb") as f:
                lastRevenue = float(f.readline().strip())
                print lastRevenue
                print newRevenue
            if newRevenue > lastRevenue:
                with open(file_Name, "wb") as f:
                    f.write('%f' % newRevenue)
                    f.flush()
                # Instead of calling playsound() here,
                # set a flag in the shared dictionary.
                self.shared_dictionary['Play_Sound'] = True
            sleep(5)

def playsound():
    music = pyglet.resource.media('cash.wav')
    music.play()
    pyglet.app.run()

shared_dictionary = {'Play_Sound': False}
work_handle = worker(shared_dictionary)
while 1:
    if shared_dictionary['Play_Sound']:
        playsound()
        shared_dictionary['Play_Sound'] = False
    sleep(0.025)
It's a rough draft of what you're looking for: some sort of event/flag-driven backend that the thread and the GUI can use to communicate with each other.
Essentially you have a worker thread (just as you did before); it checks whatever file you want every 5 seconds, and if it detects newRevenue > lastRevenue it sets a specific flag to True. Your main loop detects this change, plays the sound, and reverts the flag back to False.
I've deliberately left out any error handling; we're here to help, not to create entire solutions. I hope this points you in the right direction.

Python multithreading join causes hang

I'm using the threading module in Python to do some tests on I/O-bound processing.
Basically, I am simply reading a file line by line and writing it out concurrently.
I put the reading and writing loops in separate threads and use a Queue to pass data between them:
q = Queue()
rt = ReadThread(ds)
wt = WriteThread(outBand)
rt.start()
wt.start()
If I run it as above, it works fine, but the interpreter crashes at the end of execution. (Any ideas why?)
If I add:
rt.join()
wt.join()
at the end, the interpreter simply hangs. Any ideas why?
The code for the ReadThread and WriteThread classes is as follows:
class ReadThread(threading.Thread):
def __init__(self, ds):
threading.Thread.__init__(self)
self.ds = ds #The raster datasource to read from
def run(self):
reader(self.ds)
class WriteThread(threading.Thread):
def __init__(self, ds):
threading.Thread.__init__(self)
self.ds = ds #The raster datasource to write to
def run(self):
writer(self.ds)
def reader(ds):
"""Reads data from raster, starting with a chunk for three lines then removing/adding a row for the remainder"""
data = read_lines(ds)
q.put(data[1, :]) #add to the queue
for i in np.arange(3, ds.RasterYSize):
data = np.delete(data, 0, 0)
data = np.vstack([data, read_lines(ds, int(i), 1)])
q.put(data[1,:]) # put the relevant data on the queue
def writer(ds):
""" Writes data from the queue to a raster file """
i = 0
while True:
arr = q.get()
ds.WriteArray(np.atleast_2d(arr), xoff = 0, yoff = i)
i +=1
Calling q.get() will block indefinitely if the Queue is empty.
You can try get_nowait(), but you have to make sure that by the time you get to the writer function, there is something in the Queue.
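A sketch of that suggestion, reusing the q, ds and np names from the question (and assuming Python 3's queue module; in Python 2 the Empty exception lives in the Queue module):
import queue  # Python 3; in Python 2: import Queue as queue

def writer(ds):
    """Drain the queue without blocking forever."""
    i = 0
    while True:
        try:
            arr = q.get_nowait()
        except queue.Empty:
            break  # queue is (momentarily) empty, stop writing
        ds.WriteArray(np.atleast_2d(arr), xoff=0, yoff=i)
        i += 1
The caveat is that this exits as soon as the queue is momentarily empty, even if the reader is still producing data, which is why the sentinel approach below is more robust.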
wt.join() waits for the thread to finish, which it never does because of the infinite loop around q.get() in writer. To make it finish, add
q.put(None)
as the last line of reader, and change writer to
def writer(ds):
    """ Writes data from the queue to a raster file """
    for i, arr in enumerate(iter(q.get, None)):
        ds.WriteArray(np.atleast_2d(arr), xoff=0, yoff=i)
iter(q.get, None) yields values from q until q.get returns None. I added enumerate just for the sake of simplifying the code further.
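As a self-contained illustration of the sentinel pattern, independent of the raster code above (Python 3):
from queue import Queue

q = Queue()
for x in [10, 20, 30]:
    q.put(x)
q.put(None)  # sentinel: signals that no more data is coming

for i, item in enumerate(iter(q.get, None)):
    print(i, item)  # prints 0 10, then 1 20, then 2 30, then the loop ends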

Creating an interruptible process in Python

I'm creating a Python script which parses a large (but simple) CSV.
It'll take some time to process. I would like the ability to interrupt the parsing of the CSV so I can continue at a later stage.
Currently I have this, which lives in a larger class: (unfinished)
Edit:
I have some changed code. But the system will parse over 3 million rows.
def parseData(self):
    reader = csv.reader(open(self.file))
    for id, title, disc in reader:
        print "%-5s %-50s %s" % (id, title, disc)
        l = LegacyData()
        l.old_id = int(id)
        l.name = title
        l.disc_number = disc
        l.parsed = False
        l.save()
This is the old code.
def parseData(self):
    # first line start
    fields = self.data.next()
    for row in self.data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        self.save(item)
Thanks guys.
If you're under Linux, hit Ctrl-Z to stop the running process, then type "fg" to bring it back and resume where you stopped it.
You can use signal to catch the event. This is a mockup of a parser that can catch CTRL-C on Windows and stop parsing:
import signal, time, sys

def onInterrupt(signum, frame):
    raise Interrupted()

try:
    # windows
    signal.signal(signal.CTRL_C_EVENT, onInterrupt)
except:
    pass

class Interrupted(Exception): pass

class InterruptableParser(object):
    def __init__(self, previous_parsed_lines=0):
        self.parsed_lines = previous_parsed_lines

    def _parse(self, line):
        # do stuff
        time.sleep(1)  # mock up
        self.parsed_lines += 1
        print 'parsed %d' % self.parsed_lines

    def parse(self, filelike):
        for line in filelike:
            try:
                self._parse(line)
            except Interrupted:
                print 'caught interrupt'
                self.save()
                print 'exiting ...'
                sys.exit(0)

    def save(self):
        # do what you need to save state
        # like write the parsed_lines to a file maybe
        pass

parser = InterruptableParser()
parser.parse([1, 2, 3])
Can't test it though, as I'm on Linux at the moment.
The way I'd do it:
Put the actual processing code in a class, and on that class implement the pickle protocol (http://docs.python.org/library/pickle.html) (basically, write proper __getstate__ and __setstate__ methods).
This class would accept the filename, and keep the open file and the CSV reader instance as instance members. The __getstate__ method would save the current file position, and __setstate__ would reopen the file, seek to the proper position, and create a new reader.
I'd perform the actual work in an __iter__ method that would yield to an external function after each line is processed.
This external function would run a "main loop" monitoring input for interrupts (sockets, keyboard, the state of a specific file on the filesystem, etc.); if everything is quiet, it would just call for the next iteration of the processor. If an interrupt happens, it would pickle the processor state to a specific file on disk.
When starting, the program just has to check if there is a saved execution; if so, use pickle to retrieve the executor object and resume the main loop.
Here goes some (untested) code - the idea is simple enough:
from cPickle import load, dump
import csv
import os, sys

SAVEFILE = "running.pkl"
STOPNOWFILE = "stop.now"

class Processor(object):
    def __init__(self, filename):
        self.file = open(filename, "rt")
        self.reader = csv.reader(self.file)

    def __iter__(self):
        for line in self.reader:
            # do stuff
            yield None

    def __getstate__(self):
        return (self.file.name, self.file.tell())

    def __setstate__(self, state):
        self.file = open(state[0], "rt")
        self.file.seek(state[1])
        self.reader = csv.reader(self.file)

def check_for_interrupts():
    # Use your imagination here!
    # One simple thing would be to check for the existence of a specific
    # file on disk.
    # But you could go all the way up to instantiating a TCP server and
    # listening for interruptions on the network.
    if os.path.exists(STOPNOWFILE):
        return True
    return False

def main():
    if os.path.exists(SAVEFILE):
        with open(SAVEFILE, "rb") as savefile:
            processor = load(savefile)
        os.unlink(SAVEFILE)
    else:
        # Assumes the name of the .csv file to be passed on the command line
        processor = Processor(sys.argv[1])
    for line in processor:
        if check_for_interrupts():
            with open(SAVEFILE, "wb") as savefile:
                dump(processor, savefile)
            break

if __name__ == "__main__":
    main()
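To interrupt a run, create the stop.now file (for example with touch stop.now): the processor pickles its state to running.pkl and stops, and the next invocation loads that state and resumes from the saved file position.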
My Complete Code
I followed the advice of @jsbueno with a flag - but instead of another file, I kept it within the class as a variable:
I created a class; when called, it asks for any input and then begins another process doing my work. As it's looped, if I press a key the flag is set, but it is only checked when the loop comes around for the next parse, so I don't kill the current action.
Adding a 'parsed' flag in the database for each object from the data I'm processing means I can start this at any time and resume where I left off.
class MultithreadParsing(object):
    process = None
    process_flag = True

    def f(self):
        print "\nMultithreadParsing has started\n"
        while self.process_flag:
            ''' get my object from database '''
            legacy = LegacyData.objects.filter(parsed=False)[0:1]
            if legacy:
                print "Processing: %s %s" % (legacy[0].name, legacy[0].disc_number)
                for l in legacy:
                    ''' ... Do what I want it to do ...'''
                sleep(1)
            else:
                self.process_flag = False
                print "Nothing to parse"

    def __init__(self):
        self.process = Process(target=self.f)
        self.process.start()
        print self.process
        a = raw_input("Press any key to stop \n")
        print "\nKILL FLAG HAS BEEN SENT\n"
        if a:
            print "\nKILL\n"
            self.process_flag = False
Thanks for all your help guys (especially you, @jsbueno) - if it wasn't for you I wouldn't have got this class idea.
