Running two endless parallel loops - python

I want to run two endless parallel loops. One reads data from a server and updates an object with a number. The other does nothing but read that number and, when it changes, process it. They don't have to be in sync. So my questions are:
In case of a write from one side and a read from the other, does Python have issues with it?
In case I get a sync problem, do I need to lock the read/write processes? Is there any other way I should do it?
What is best to use, thread or threading?
As the next step, I will read from 100 sites, update 100 objects, and read from 100 loops for the changes. Is it recommended to use multiprocessing from the beginning so I can scale without problems? Do I need to worry about the read and write issues there?
Any help is appreciated.

Short answer: whatever you think will be understandable for you.
Meaning, your code should make sense to you for learning purposes.
Here's an example; it's light and easy to use.
Getting values from and to the thread is easy.
It's not true parallelism, though: in CPython the GIL lets only one thread execute Python code at a time.
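from threading import Thread
class worker(Thread):
    def __init__(self, input=0):
        # daemon=True lets the program exit even though run() never returns
        Thread.__init__(self, daemon=True)
        self.input = input
        self.start()
    def run(self):
        while True:
            self.input += 1
x = worker(-100)
y = worker(x.input)  # y starts from whatever value x.input has right now
print(y.input)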
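This is just an example to show that data can be read across threads: the main thread reads x.input while x is still running and uses it as y's starting value. Each worker then increments its own attribute, so nothing is actually shared here. If both threads did update the same variable, that would be dangerous :) because `+=` is a read-modify-write, and concurrent increments can be lost.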
Will not run Python code on multiple CPUs in parallel
Easy to use (accessing data across threads is easy)
Logical code, if you're not familiar with queue systems or distributed systems
Will raise an error if the OS can't create more threads (a "limitation")
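If two threads really do update one shared variable (the case warned about above), the usual fix is a threading.Lock held around the whole read-modify-write. A minimal sketch, assuming a plain shared counter; the SharedCounter class and bump() helper are illustrative names, not part of the answer:

```python
import threading


class SharedCounter:
    """A counter that several threads can increment safely (illustrative)."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def increment(self):
        # `+=` reads, adds and writes back; the lock turns that into one step
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def bump(counter, times):
    for _ in range(times):
        counter.increment()


counter = SharedCounter(-100)
threads = [threading.Thread(target=bump, args=(counter, 10_000)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 19900: -100 + 2 * 10000, no increments lost
```

Without the lock the final value can come out lower, because two threads may read the same old value and both write back old + 1.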

from threading import Thread
from queue import Queue  # Python 3; on Python 2: from Queue import Queue
# update_value() and do_whatever_you_want() stand for your own code
class producer(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue
        self.start()
    def run(self):
        while True:
            self.queue.put(update_value())  # e.g. the value read from the server
class consumer(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue
        self.start()
    def run(self):
        while True:
            value = self.queue.get()  # blocks until a producer puts something
            do_whatever_you_want(value)
queue = Queue()
producer(queue)
consumer(queue)
Notice that you can scale by using 100 producers and one consumer (and, of course, one queue). 100 threads should be OK, but things would be different if you wanted 10,000.
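A minimal sketch of that layout, assuming the per-site work can be represented by a hypothetical fetch(site_id, i) function (here it just returns fake data). Each producer puts (site_id, value) tuples on the shared queue and then a None sentinel, so the single consumer knows when every producer is done; the consumer only acts when a site's value changes:

```python
import queue
import threading

NUM_PRODUCERS = 100
READINGS_PER_PRODUCER = 5


def fetch(site_id, i):
    # Hypothetical stand-in for reading from a server; returns fake data
    return site_id * 1000 + i


def producer(site_id, q):
    for i in range(READINGS_PER_PRODUCER):
        q.put((site_id, fetch(site_id, i)))
    q.put(None)  # sentinel: this producer is finished


def consumer(q, latest):
    finished = 0
    while finished < NUM_PRODUCERS:
        item = q.get()
        if item is None:
            finished += 1
            continue
        site_id, value = item
        if latest.get(site_id) != value:  # process only on change
            latest[site_id] = value


q = queue.Queue()
latest = {}  # only the consumer thread writes this dict
producers = [threading.Thread(target=producer, args=(n, q)) for n in range(NUM_PRODUCERS)]
cons = threading.Thread(target=consumer, args=(q, latest))
cons.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
cons.join()
print(len(latest))  # 100
```

Because only the consumer touches `latest`, no lock is needed for it; the Queue does all the cross-thread hand-off.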

Related

Does python provide a synchronized buffer?

I'm very familiar with Python queue.Queue. This is definitely the thing you want when you want to have a reliable stream between consumer and producer threads.
However, sometimes you have producers that are faster than consumers and are forced to drop data (as in live video frame capture, for example, where we typically want to buffer just the last one or two frames).
Does Python provide an asynchronous buffer class, similar to queue.Queue?
It's not exactly obvious how to correctly implement one using queue.Queue.
I could, for example:
import queue
buf = queue.Queue(maxsize=3)
def produce(msg):
    if buf.full():
        buf.get(block=False)  # Make space
    buf.put(msg, block=False)
def consume():
    msg = buf.get(block=True)
    work(msg)
However, I don't particularly like that produce() is not a locked, queue-atomic operation. A consume() may run between full() and get(), for example, and it would (probably) be broken in a multi-producer scenario.
Is there an out-of-the-box solution?
There's nothing built in for this, but it appears straightforward enough to build your own buffer class that wraps a Queue, provides mutual exclusion between .put() and .get() with its own lock, and uses a Condition variable to wake up would-be consumers whenever an item is added. Like so:
import queue
import threading
class SBuf:
    def __init__(self, maxsize):
        self.q = queue.Queue()
        self.maxsize = maxsize
        self.nonempty = threading.Condition()
    def get(self):
        with self.nonempty:
            while not self.q.qsize():
                self.nonempty.wait()
            assert self.q.qsize()
            return self.q.get()
    def put(self, v):
        with self.nonempty:
            while self.q.qsize() >= self.maxsize:
                self.q.get()  # discard the oldest item
            self.q.put(v)
            assert 0 < self.q.qsize() <= self.maxsize
            self.nonempty.notify_all()
BTW, I advise against trying to build this kind of logic out of raw locks. Of course it can be done, but Condition variables are very carefully designed to save you from universes of unintended race conditions. There's a learning curve for Condition variables, but one well worth climbing: they often make things easy instead of brain-busting. Indeed, Python's threading module uses them internally to implement all sorts of things.
An Alternative
In the above, we only invoke queue.Queue methods under the protection of our own lock, so there's really no need to use a thread-safe container - we're supplying all the thread safety already.
So it would be a bit leaner to use a simpler container. Happily, a collections.deque can be configured to discard all but the most recent N entries itself, but "at C speed". Like so:
import collections
import threading
class SBuf:
    def __init__(self, maxsize):
        self.q = collections.deque(maxlen=maxsize)
        self.maxsize = maxsize
        self.nonempty = threading.Condition()
    def get(self):
        with self.nonempty:
            while not self.q:
                self.nonempty.wait()
            assert self.q
            return self.q.popleft()
    def put(self, v):
        with self.nonempty:
            self.q.append(v)  # discards oldest, if needed
            assert 0 < len(self.q) <= self.maxsize
            self.nonempty.notify()
This also changed .notify_all() to .notify(). In this use case, either works correctly, but we're only adding one item so there's no need to notify more than one consumer. If there are multiple consumers waiting, .notify_all() will wake all of them up but only the first will find a non-empty queue. The others will see that it's empty, and just .wait() again.
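If consumers should not wait forever (for example, so they can check a shutdown flag now and then), Condition.wait_for() takes a predicate and an optional timeout and re-checks the predicate after every wakeup. A sketch of the deque-based buffer with a timeout on get(); the timeout parameter and the None return value are additions of mine, not part of the answer above:

```python
import collections
import threading


class SBuf:
    def __init__(self, maxsize):
        self.q = collections.deque(maxlen=maxsize)  # drops the oldest item when full
        self.nonempty = threading.Condition()

    def put(self, v):
        with self.nonempty:
            self.q.append(v)
            self.nonempty.notify()

    def get(self, timeout=None):
        """Return the oldest item, or None if nothing arrives within `timeout` seconds."""
        with self.nonempty:
            # wait_for returns the predicate's final value: an empty deque is falsy
            if not self.nonempty.wait_for(lambda: self.q, timeout=timeout):
                return None
            return self.q.popleft()


buf = SBuf(maxsize=2)
for frame in range(5):
    buf.put(frame)            # only the last two frames survive
print(buf.get(), buf.get())   # 3 4
print(buf.get(timeout=0.05))  # None: the buffer is empty
```

Returning None on timeout assumes None is never a legitimate item; if it can be, raising queue.Empty instead would be the safer design.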
queue.Queue is already thread-safe, in that you can't write to and read from the queue at the same time. However, you are correct that there's nothing stopping the queue from being modified between the full() and get() calls.
So you can use a lock, which is how you control thread access across multiple lines. A lock can be held by only one thread at a time, so if it's currently locked, all other threads that try to acquire it wait until it has been released before they continue.
import queue
import threading
import time
buf = queue.Queue(maxsize=3)
lock = threading.Lock()
# work() is your own processing function
def produce(msg):
    # "with lock:" acquires the lock and releases it even if an exception occurs
    with lock:
        if buf.full():
            buf.get(block=False)  # Make space
        buf.put(msg, block=False)
def consume():
    while True:
        with lock:
            try:
                msg = buf.get(block=False)
                break
            except queue.Empty:
                pass
        # buffer is empty: sleep without holding the lock, then try again
        time.sleep(0.01)
    work(msg)

Are techniques (like queue, lock, ...) required when reading from a thread in one direction

I want to read the current results from a thread that has a high sample rate. The thread reads sensor values from the hardware with a loop time of 10 ms (1 ms of work and 9 ms of sleep), though it varies a bit (+/- 0.5 ms). In my first approach I used a LIFO queue: the thread writes to the queue, and the main program reads it and empties the queue after reading. That worked well at first, until I noticed that after the program had been running for a long time, calls started to be delayed.
As far as I understand the problem, every queue.put() allocates fresh memory, and queue.get() plus clearing the queue leave memory to be freed. The memory is then cleaned up at certain intervals, the cleanup takes a varying amount of time, and that generates the sporadic call delays.
Now I have asked myself whether the queue makes sense in this application. So I created a threading.Thread subclass and, in addition to __init__() and run(), added a getter method, get_io_data(), which the main program calls. The main program has no way to write back to the sensor data, and writing back is not needed: data only flows in one direction, from the thread into the main program. The GIL should actually prevent simultaneous reading and writing, and the worst that can happen is that an old value is read.
Is the assumption correct? Have I overlooked something or not understood?
This code is a schematic representation to make the question easier to understand. It is not functional!
import threading
import time
from My_modul import controller_step
class IOboard(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self, name="IO_board")
        self.data = {}
    def run(self):
        self.init_hardware()
        while True:
            self.data.update(self.read_all_sensors())
            self.data.update(self.read_all_aktors())
            time.sleep(0.009)  # ~1 ms of work + 9 ms of sleep per 10 ms cycle
    def get_io_data(self):
        return self.data
def main():
    app = IOboard()
    app.start()
    while True:
        input_data_now = app.get_io_data()
        out_put_data = controller_step(input_data_now)
        print(time.time(), out_put_data)
        time.sleep(0.1)
if __name__ == "__main__":
    main()
This is my first Stack Overflow question; I hope I have made myself understood. What I want to know is whether, with a built-in call like this, one thread can query another thread's variable without violating Python's rules (or Linux's).
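One pattern that sidesteps the question of whether update() on a shared dict is safe: build a fresh dict on every cycle and publish it by rebinding a single attribute. In CPython that rebinding is one reference swap, so the reader gets either the previous complete snapshot or the new one, never a dict that is half updated (it may still be one cycle old, as the question expects). A sketch under these assumptions: read_all_sensors() is a hypothetical stand-in returning fake values, and the stop event is my addition so the demo can end:

```python
import threading
import time


class IOboard(threading.Thread):
    def __init__(self):
        super().__init__(name="IO_board", daemon=True)
        self._snapshot = {}  # replaced on every cycle, never mutated in place
        self._stop_event = threading.Event()
        self.cycles = 0

    def read_all_sensors(self):
        # Hypothetical stand-in for real hardware access
        return {"sensor_a": self.cycles, "sensor_b": 2 * self.cycles}

    def run(self):
        while not self._stop_event.is_set():
            new_data = dict(self.read_all_sensors())  # build a complete new dict
            self._snapshot = new_data                 # publish with one reference swap
            self.cycles += 1
            time.sleep(0.009)

    def get_io_data(self):
        # Callers should treat the returned dict as read-only
        return self._snapshot

    def stop(self):
        self._stop_event.set()


board = IOboard()
board.start()
data = board.get_io_data()
while not data:  # wait for the first complete cycle
    time.sleep(0.01)
    data = board.get_io_data()
print(data["sensor_b"] == 2 * data["sensor_a"])  # True: both values are from one cycle
board.stop()
board.join()
```

With this design the reader can iterate over the snapshot safely, whereas iterating over a dict that another thread is updating can raise "dictionary changed size during iteration".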

Asynchronous, multithreaded simulation of a real-life process

First of all, I have to mention that I'm not a programmer but a mechanical engineer, so please don't crucify me if I misinterpret something or say some nonsense.
I want to write Python code which will "simulate" a real-life problem. The real-life problem is something like a FIFO queue, where objects are taken from it by different stations, spend some time there, and are then returned to the queue.
What I understand is that I need to write an asynchronous program, because I have one function that puts objects into the queue (let's say every 15 seconds), and then I have some stations which each take only one object from this queue and work on it for some time (a simple timer and a print like "Hi, I'm working on object x, will return it in: minutes").
I'm not sure if I can do it with threading. What if I had 100 stations which work asynchronously; is it possible to start 100 threads? Because as I understand it, every thread should have one timer?
I would ask you to give me a little push in the simplest direction to solve it. It doesn't have to be pretty, just functional and easy for me.
Thank you in advance for every idea!
Best regards,
MM.
Of course you can use threading to run several tasks simultaneously.
You can create a class like this:
import threading
from threading import Thread
class Work(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.lock = threading.Lock()
    def run(self):  # runs in the new thread once start() is called
        pass  # (your code)
If you want to run several threads at the same time:
def foo():
    workers = []
    for i in range(10):
        workers.append(Work())
        workers[i].start()  # start() calls the run() method of the class above
Be careful if you want to use the same variable in several threads. You must protect it with a lock so that the threads do not all access it at the same time. Create the lock once and share it between all the threads, like this:
lock = threading.Lock()  # one lock, shared by every thread that touches yourVariable
lock.acquire()  # blocks until no other thread holds this lock
try:
    yourVariable += 1  # only one thread at a time can run this line
finally:
    lock.release()  # lets the next waiting thread continue
If you hand the work to the threads through a queue.Queue, you can call join() on the queue from the main thread to wait until all pending tasks have been completed.
This approach has the benefit that you are not creating and destroying threads, which is expensive. The worker threads will run continuously, but will be asleep when no tasks are in the queue, using zero CPU time.
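For the station simulation, a sketch along those lines might look like this. Assumptions: each station is one long-running worker thread, the "work" is a time.sleep() scaled down to fractions of a second, and a None sentinel tells a station to shut down. The names (station, NUM_STATIONS, the object labels) are made up for illustration:

```python
import queue
import random
import threading
import time

NUM_STATIONS = 5
NUM_OBJECTS = 20

work_queue = queue.Queue()
finished = []  # (object, station) pairs, for checking the result
finished_lock = threading.Lock()


def station(station_id):
    while True:
        obj = work_queue.get()  # sleeps (no CPU) until an object arrives
        if obj is None:  # sentinel: shut this station down
            work_queue.task_done()
            break
        duration = random.uniform(0.01, 0.05)
        print(f"Station {station_id}: working on {obj}, done in {duration:.2f} s")
        time.sleep(duration)  # simulated processing time
        with finished_lock:
            finished.append((obj, station_id))
        work_queue.task_done()


stations = [threading.Thread(target=station, args=(n,)) for n in range(NUM_STATIONS)]
for s in stations:
    s.start()

for i in range(NUM_OBJECTS):
    work_queue.put(f"object-{i}")  # the real feeder might do this every 15 s
    time.sleep(0.005)

work_queue.join()  # wait until every object has been processed
for _ in stations:
    work_queue.put(None)
for s in stations:
    s.join()
print(len(finished))  # 20
```

To model objects going back into the queue after processing, a station could put obj back instead of recording it; work_queue.join() would then never return, so you would stop the simulation after a fixed time instead.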
I hope it will help you.

multithreading check membership in Queue and stop the threads

I want to iterate over a list using 2 threads, one from the front and the other from the back, and put the elements in a Queue on each iteration. But before putting a value in the Queue I need to check whether the value already exists in the Queue (which happens when the other thread has already put it there). When this happens I need to stop the thread and return the list of traversed values for each thread.
This is what I have tried so far:
from Queue import Queue
from threading import Thread, Event
class ThreadWithReturnValue(Thread):
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs={}, Verbose=None):
        Thread.__init__(self, group, target, name, args, kwargs, Verbose)
        self._return = None
    def run(self):
        if self._Thread__target is not None:
            self._return = self._Thread__target(*self._Thread__args,
                                                **self._Thread__kwargs)
    def join(self):
        Thread.join(self)
        return self._return
main_path = Queue()
def is_in_queue(x, q):
    with q.mutex:
        return x in q.queue
def a(main_path,g,l=[]):
    for i in g:
        l.append(i)
        print 'a'
        if is_in_queue(i,main_path):
            return l
        main_path.put(i)
def b(main_path,g,l=[]):
    for i in g:
        l.append(i)
        print 'b'
        if is_in_queue(i,main_path):
            return l
        main_path.put(i)
g=['a','b','c','d','e','f','g','h','i','j','k','l']
t1 = ThreadWithReturnValue(target=a, args=(main_path,g))
t2 = ThreadWithReturnValue(target=b, args=(main_path,g[::-1]))
t2.start()
t1.start()
# Wait for all produced items to be consumed
print main_path.join()
I used ThreadWithReturnValue, which creates a custom thread whose join() returns the target's return value.
And for membership checking I used the following function :
def is_in_queue(x, q):
    with q.mutex:
        return x in q.queue
Now if I start t1 first and then t2, I get 12 a's, then one b, and then it doesn't do anything and I need to terminate Python manually!
But if I run t2 first and then t1, I get the following result:
b
b
b
b
ab
ab
b
b
b
b
a
a
So my questions are: why does Python behave differently in these cases? And how can I terminate the threads and make them communicate with each other?
Before we get into bigger problems, you're not using Queue.join right.
The whole point of this function is that a producer who adds a bunch of items to a queue can wait until the consumer or consumers have finished working on all of those items. This works by having the consumer call task_done after they finish working on each item that they pulled off with get. Once there have been as many task_done calls as put calls, the queue is done. You're not doing a get anywhere, much less a task_done, so there's no way the queue can ever be finished. So, that's why you block forever after the two threads finish.
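Since what you actually want is to wait for the two threads and collect what they return, join the threads rather than the queue. The ThreadWithReturnValue in the question relies on Python 2's private _Thread__target attributes; a sketch of a version that only overrides public methods (Python 3) could look like this (the attribute names _fn, _fn_args, _fn_kwargs and _result are my own choices):

```python
import threading


class ThreadWithReturnValue(threading.Thread):
    """Thread whose join() returns the target's return value (illustrative)."""

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self._result = None

    def run(self):
        self._result = self._fn(*self._fn_args, **self._fn_kwargs)

    def join(self, timeout=None):
        super().join(timeout)
        return self._result


def square(n):
    return n * n


t = ThreadWithReturnValue(target=square, args=(7,))
t.start()
print(t.join())  # 49
```

concurrent.futures.ThreadPoolExecutor offers the same thing out of the box: submit() returns a Future whose result() waits for and returns the value.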
The first problem here is that your threads are doing almost no work outside of the actual synchronization. If the only thing they do is fight over a queue, only one of them is going to be able to run at a time.
Of course that's common in toy problems, but you have to think through your real problem:
If you're doing a lot of I/O work (listening on sockets, waiting for user input, etc.), threads work great.
If you're doing a lot of CPU work (calculating primes), threads don't work in Python because of the GIL, but processes do.
If you're actually primarily dealing with synchronizing separate tasks, neither one is going to work well (and processes will be worse). It may still be simpler to think in terms of threads, but it'll be the slowest way to do things. You may want to look into coroutines; Greg Ewing has a great demonstration of how to use yield from to use coroutines to build things like schedulers or many-actor simulations.
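For the many-actor case, today's standard-library route to coroutines is asyncio, which uses async/await rather than the yield from style of Greg Ewing's demonstration. A minimal sketch with 100 simulated stations, where asyncio.sleep() stands in for work time; all names are illustrative:

```python
import asyncio


async def station(name, jobs, log):
    while True:
        job = await jobs.get()
        if job is None:  # sentinel: stop this station
            break
        await asyncio.sleep(0.01)  # simulated work; other stations run meanwhile
        log.append((name, job))


async def main(num_stations=100, num_jobs=300):
    jobs = asyncio.Queue()
    log = []
    stations = [asyncio.create_task(station(f"S{i}", jobs, log)) for i in range(num_stations)]
    for j in range(num_jobs):
        jobs.put_nowait(j)
    for _ in stations:
        jobs.put_nowait(None)  # one sentinel per station, queued after all jobs
    await asyncio.gather(*stations)
    return log


log = asyncio.run(main())
print(len(log))  # 300
```

Everything runs in one thread, so `log.append` needs no lock: a coroutine can only be interrupted at an `await`.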
Next, as I alluded to in your previous question, making threads (or processes) work efficiently with shared state requires holding locks for as short a time as possible.
So, if you have to search a whole queue under a lock, that had better be a constant-time search, not a linear-time search. That's why I suggested using something like an OrderedSet recipe rather than a sequence like the deque inside the stdlib's Queue.Queue. Then this function:
def is_in_queue(x, q):
    with q.mutex:
        return x in q.queue
… is only blocking the queue for a tiny fraction of a second—just long enough to look up a hash value in a table, instead of long enough to compare every element in the queue against x.
Finally, I tried to explain about race conditions on your other question, but let me try again.
You need a lock around every complete "transaction" in your code, not just around the individual operations.
For example, if you do this:
with queue locked:
    see if x is in the queue
if x was not in the queue:
    with queue locked:
        add x to the queue
… then it's always possible that x was not in the queue when you checked, but in the time between when you unlocked it and relocked it, someone added it. This is exactly why it's possible for both threads to stop early.
To fix this, you need to put a lock around the whole thing:
with queue locked:
    if x is not in the queue:
        add x to the queue
Of course this goes directly against what I said before about locking the queue for as short a time as possible. Really, that's what makes multithreading hard in a nutshell. It's easy to write safe code that just locks everything for as long as might conceivably be necessary, but then your code ends up only using a single core, while all the other threads are blocked waiting for the lock. And it's easy to write fast code that just locks everything as briefly as possible, but then it's unsafe and you get garbage values or even crashes all over the place. Figuring out what needs to be a transaction, and how to minimize the work inside those transactions, and how to deal with the multiple locks you'll probably need to make that work without deadlocking them… that's not so easy.
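Putting both ideas together for the original problem: one lock guards the whole check-and-add transaction, and a set gives constant-time membership. A sketch under my own assumptions: a small MeetingPoint class (hypothetical) replaces the Queue, since nothing here needs FIFO behaviour, and each thread walks the list from its own end and stops at the first value the other thread has already claimed:

```python
import threading


class MeetingPoint:
    """Set of claimed values with an atomic check-and-add (illustrative)."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, x):
        # The membership test and the add form one transaction under one lock
        with self._lock:
            if x in self._seen:
                return False
            self._seen.add(x)
            return True


def walk(items, point, out):
    for item in items:
        out.append(item)
        if not point.claim(item):
            return  # the other thread got here first


g = list("abcdefghijkl")
point = MeetingPoint()
from_front, from_back = [], []
t1 = threading.Thread(target=walk, args=(g, point, from_front))
t2 = threading.Thread(target=walk, args=(g[::-1], point, from_back))
t1.start()
t2.start()
t1.join()
t2.join()
print(from_front, from_back)  # the split point varies from run to run
```

The output lists are passed in explicitly; the question's a() and b() use a mutable default argument (l=[]), which is shared between calls and would accumulate values across runs.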
A couple of things that I think can be improved:
Due to the GIL, you might want to use the multiprocessing (rather than threading) module. In general, CPython threading will not cause CPU intensive work to speed up. (Depending on what exactly is the context of your question, it's also possible that multiprocessing won't, but threading almost certainly won't.)
A function like your is_in_queue would likely lead to high contention.
The locked time seems linear in the number of items that need to be traversed:
def is_in_queue(x, q):
    with q.mutex:
        return x in q.queue
So, instead, you could possibly do the following.
Use multiprocessing with a shared dict:
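from multiprocessing import Process, Manager
# Fn definitions (p1 and p2) and such
if __name__ == "__main__":  # required where child processes are spawned (e.g. Windows)
    manager = Manager()
    d = manager.dict()
    proc1 = Process(target=p1, args=(d,))
    proc2 = Process(target=p2, args=(d,))
    proc1.start()
    proc2.start()
    proc1.join()
    proc2.join()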
within each function, check for the item like this:
def p1(d):
    # Stuff
    if 'foo' in d:
        return

How to do a non-blocking URL fetch in Python

I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.
I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?
I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.
You'll probably benefit from the threading or multiprocessing modules. You don't actually need to create all those Thread-based classes yourself; there is a simpler method using Pool.map:
from multiprocessing import Pool
def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...
if __name__ == "__main__":
    # image_url_list is your list of thumbnail URLs
    pool = Pool(processes=4)
    result = pool.map(fetch_url, image_url_list)
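Because downloads are I/O-bound, a thread pool is usually enough and avoids pickling data between processes. A sketch with concurrent.futures, where fetch_thumbnail() is a hypothetical stand-in that sleeps instead of downloading, so the example runs offline; as_completed() yields each future as soon as it finishes, which matches "display each one as soon as it's finished":

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_thumbnail(url):
    # Hypothetical stand-in: real code would download `url`, e.g. with
    # urllib.request.urlopen(url).read(), and return the bytes
    time.sleep(0.01)
    return url, b"fake image bytes"


urls = [f"http://example.com/thumb{i}.png" for i in range(20)]
done = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch_thumbnail, u) for u in urls]
    for future in as_completed(futures):  # completion order, not submit order
        url, data = future.result()
        done.append(url)  # e.g. hand the image to the GUI here
print(len(done))  # 20
```

Note that the loop over as_completed() itself blocks the thread running it, so in a GUI you would run it off the main thread or poll future.done() from a timer instead.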
As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in python.
As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.
Here is a tutorial on threading in python:
http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
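To keep the GUI responsive, the download threads can hand finished images back through a queue.Queue that the GUI thread drains without blocking, for example from a timer callback (in Pyglet, a function scheduled with pyglet.clock.schedule_interval). In the sketch below, download() is a hypothetical stand-in, poll_finished() is what such a timer callback could call, and the while loop only simulates the GUI's event loop:

```python
import queue
import threading
import time

results = queue.Queue()  # filled by worker threads, drained by the GUI thread


def download(url):
    # Hypothetical stand-in for urlretrieve/urlopen
    time.sleep(0.01)
    results.put((url, b"fake image bytes"))


def poll_finished(shown):
    """Call periodically from the GUI thread; never blocks."""
    while True:
        try:
            url, _data = results.get_nowait()
        except queue.Empty:
            return
        shown.append(url)  # real code would create the thumbnail here


urls = [f"http://example.com/thumb{i}.png" for i in range(10)]
for u in urls:
    threading.Thread(target=download, args=(u,), daemon=True).start()

shown = []
deadline = time.monotonic() + 5
while len(shown) < len(urls) and time.monotonic() < deadline:
    poll_finished(shown)  # stands in for one GUI frame
    time.sleep(0.005)
print(len(shown))  # 10
```

Only the GUI thread touches GUI objects here; the worker threads never do, which is what most GUI toolkits require.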
Here's an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for I/O-bound applications like yours and can really speed them up. Using Python threads strictly for computation in standard CPython doesn't help, because only one thread can execute Python code at a time.
import threading, time
class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple
    def run(self):
        # sleeps 3 seconds then prints 'pong' x times
        time.sleep(3)
        printString = 'pong' * self.multiple
        print(printString)
pingInstance = Ping(3)
pingInstance.start()  # start() calls your run method in the new thread
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # prints 1 (True)
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive() returns False once your thread reaches the end of its run function.
# only main thread now
You have these choices:
Threads: easiest but doesn't scale well
Twisted: medium difficulty, scales well but shares CPU due to GIL and being single threaded.
Multiprocessing: hardest. Scales well if you know how to write your own event loop.
I recommend just using threads unless you need an industrial scale fetcher.
You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.
