Python simplest form of multiprocessing


I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function that consists of a while loop, and while that loop runs I want to continue with the program, eventually change the condition for the while loop, and end that process. This is the code:
import time
class Example():
    def __init__(self):
        self.condition = False
    def func1(self):
        self.condition = True
        while self.condition:
            print("Still looping")
            time.sleep(1)
        print("Finished loop")
    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for func1 to run for 5 s before end_loop() is called, which changes the condition and ends the loop and thus also the function. I.e., I want one process to start and "go" into func1 while, at the same time, time.sleep(5) is called: the processes "split" when arriving at func1, one entering the function while the other continues down the program and starts executing time.sleep(5).
This must be the most basic example of multiprocessing, yet I've had trouble finding a simple way to do it!
Thank you
EDIT1: regarding do_something. In my real problem, do_something is replaced by some code that communicates with another program via a socket, receives packets with coordinates every 0.02 s, and stores them in member variables of the class. I want this constant updating of the coordinates to start and then be able to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print("Still looping")
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for either. What I want is just to start the process p1 and then continue with time.sleep and ex.end_loop().

The first problem with your code is the calls
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
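With ex.func1() you're calling the function immediately, in the main process, and passing its return value as the target parameter. (This is also why you never reach p2.start(): func1() loops forever before the Process object is even created.) Since the function doesn't return anything, you're effectively calling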
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: when using the multiprocessing package, you implement concurrency using multiple processes, which by default do not share memory, so they cannot simply share data. Have a look at Sharing state between processes in the package's documentation to read about this. Especially take the first sentence into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might want to also have a look at Exchanging objects between processes to read about how to send/receive data between two different processes. So, instead of simply setting a flag to stop the loop, it might be better to send a message to signal the loop should be terminated.
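As a rough sketch of the "send a message instead of setting a flag" idea, assuming a multiprocessing.Queue as the channel: the names worker, run_demo, and the "stop" message are made up for this illustration and are not from the question.

```python
import multiprocessing as mp
import queue
import time


def worker(inbox, interval=0.1):
    """Loop until a "stop" message arrives; return the number of loop passes."""
    iterations = 0
    while True:
        try:
            # Wait up to `interval` seconds for a message from the parent.
            message = inbox.get(timeout=interval)
        except queue.Empty:
            message = None
        if message == "stop":
            break
        iterations += 1  # placeholder for the real work of one loop pass
    return iterations


def run_demo(run_for=0.5):
    inbox = mp.Queue()
    # Pass the function itself (no parentheses) plus its arguments.
    proc = mp.Process(target=worker, args=(inbox,))
    proc.start()
    time.sleep(run_for)   # the parent carries on with its own work
    inbox.put("stop")     # ask the loop to finish
    proc.join(timeout=5)
    return proc.exitcode


if __name__ == "__main__":
    print("worker exit code:", run_demo())
```

The loop never reads a variable owned by another process; it only reacts to what arrives on the queue, which sidesteps the shared-state problem.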
Also note that processes are a heavyweight form of concurrency: multiprocessing spawns multiple OS processes, which comes with a relatively big overhead. multiprocessing's main purpose is to avoid the limits imposed by Python's Global Interpreter Lock (search for this to read more). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and can also access the same data (although you really should read about synchronization when doing this).
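For what it's worth, here is a minimal sketch of the threading route applied to the Example class from the question. The interval parameter, the iterations counter, and run_demo are additions for illustration only. A plain boolean flag is enough for this simple case, though a threading.Event is the more robust tool once things get more involved.

```python
import threading
import time


class Example:
    def __init__(self):
        self.condition = False
        self.iterations = 0  # added only so the demo can report something

    def func1(self, interval=1.0):
        self.condition = True
        while self.condition:
            print("Still looping")
            self.iterations += 1
            time.sleep(interval)
        print("Finished loop")

    def end_loop(self):
        self.condition = False


def run_demo(run_for=5.0, interval=1.0):
    ex = Example()
    # target=ex.func1 without parentheses: the thread calls it, not us.
    t = threading.Thread(target=ex.func1, args=(interval,))
    t.start()
    time.sleep(run_for)  # the main thread continues while func1 loops
    ex.end_loop()        # both threads see the same `ex` object
    t.join()
    return ex


if __name__ == "__main__":
    run_demo()
```

Because both threads live in one process, end_loop() changes the very object that func1 is looping on, which is exactly what does not happen with separate processes.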
I'm afraid, multiprocessing is an inherently complex subject. So I think you will need to advance your programming/python skills to successfully use it. But I'm sure you'll manage this, the python documentation about this is comprehensive and there are a lot of other resources about this.

To tackle your EDIT2 problem, you could try using the shared memory map Value.
import time
from multiprocessing import Process, Value
class Example():
    def func1(self, cond):
        while (cond.value == 1):
            print('do something')
            time.sleep(1)
        return
if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.

Related

Shared memory global list while running multiprocessing in python

I'm currently coding a chatbot for my streaming. Since it needs to do multiple things at once, I'm using the multiprocessing module, so that it can still respond to commands and run other functions at the same time. My problem now is that I have one process dedicated to some web scraping and another one that watches chat and responds if a command is typed. My thought was that if I append the information from one process to a global list, then when the command is typed in chat, the other process can use the information in that list. Well, this didn't work, and I learned that this is because the two processes don't have shared memory: although both seem to have access to the same list, each works on its own copy, so even if one appends, the list stays empty in the other process. I've come across a few questions about this here on Stack Overflow, but the examples are very specific, and since I'm still fairly new to coding, I had a hard time figuring out how to apply them to my own code. For this reason, I've simplified the problem so it can help others in a similar situation, keeping my example broad and simple enough for anyone to understand once they read the solution. So this is not the code I'm actually using for my chatbot, but one that mimics the problem.
import multiprocessing as mp
import time
globalList = []
def readList():
    while True:
        time.sleep(2)
        if globalList:
            print(globalList)
        else:
            print("List is Empty")
            print(globalList)
def writeList():
    while True:
        time.sleep(3)
        globalList.append("Item")
        print(globalList)
if __name__ == '__main__':
    p1 = mp.Process(target=readList)
    p2 = mp.Process(target=writeList)
    p1.start()
    p2.start()
When running this code you can see that the writeList function will keep adding another item to the list, but the readList function will keep showing an empty list.
I hope some master wiz out there can help me with this problem.

In Python, processes cannot straightforwardly access global mutable objects created by other processes. For this, you can use, for example, a multiprocessing.Manager and its proxy objects. Your adapted example:
import multiprocessing as mp
import time
def readList(shared_list):
    while True:
        time.sleep(2)
        if shared_list:
            print(shared_list)
        else:
            print("List is Empty")
            print(shared_list)
def writeList(shared_list):
    while True:
        time.sleep(3)
        shared_list.append("Item")
        print(shared_list)
if __name__ == '__main__':
    manager = mp.Manager()
    shared_list = manager.list()
    p1 = mp.Process(target=readList, args=(shared_list,))
    p2 = mp.Process(target=writeList, args=(shared_list,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()

You cannot have that by normal means. Processes have their own memory space. Threads, on the other hand, share the same memory space and run within one process.
For more, please refer to this answer: Multiprocessing vs Threading Python
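To illustrate the point about threads sharing memory, here is a small sketch of the same global-list idea using threads. The names shared_list, list_lock, write_list, read_list, and run_demo are hypothetical, and the lock is there because two threads touch the same list.

```python
import threading
import time

shared_list = []              # one list, visible to every thread in the process
list_lock = threading.Lock()  # guards access to shared_list


def write_list(count, delay=0.01):
    """Append `count` items, pausing briefly between appends."""
    for _ in range(count):
        time.sleep(delay)
        with list_lock:
            shared_list.append("Item")


def read_list():
    """Return a snapshot copy of the shared list."""
    with list_lock:
        return list(shared_list)


def run_demo(count=3):
    writer = threading.Thread(target=write_list, args=(count,))
    writer.start()
    writer.join()
    # The main thread sees what the writer thread appended.
    return read_list()


if __name__ == "__main__":
    print(run_demo())
```

Unlike the two-process version in the question, the reader here observes the writer's appends, because both threads work on the very same list object.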

What is the preferred way of running two methods containing infinite loops concurrently using threads?

I'm trying to combine two python3 scripts I'm running separately at the moment. Both run in an infinite loop. I found different ways of achieving what I want, but I'm a beginner still learning and trying to do it the right way.
One script is a reddit bot that replies to certain comments and uploads videos, while saving links in newly created .txt files. The other one iterates through those .txt files, reads them and sometimes deletes them.
This variant seems the most intuitive to me:
from threading import Thread
def runA():
    while True:
        print('A\n')
def runB():
    while True:
        print('B\n')
if __name__ == "__main__":
    t1 = Thread(target = runA)
    t2 = Thread(target = runB)
    t1.setDaemon(True)
    t2.setDaemon(True)
    t1.start()
    t2.start()
    while True:
        pass
Is this the preferred way of running threads? And why do I need
while True:
    pass
at the end?

In general, that is a good way to start two threads, but there are details to think about.
Note that in that code, there are actually 3 threads: main thread, t1 and t2.
Since the comments say one thread downloads and the other reads the downloaded files and since the main thread does nothing in your case, I'd say you need just this much:
from threading import Thread
from time import sleep
def download_forever():
    while True:
        download_stuff()  # placeholder for your download step
def process_new_downloads():
    pass  # do something with new downloads here
def main():
    download_thread = Thread(target=download_forever)
    download_thread.start()
    while True:
        process_new_downloads()
        sleep(1) # let go of the CPU for a while, there's nothing to do anyway
Setting the threads as daemon does not modify how they live, only how they die. And here it is not clear how the whole thing ends, so I'm not sure you need that. You might want to implement some ways to stop the threads politely. You might also define some way to end the whole thing.
Additionally, you could implement a way for one thread to wake up the other exactly when there is something new to do. You can do that e.g. with a threading.Event.
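A possible shape for the threading.Event idea, as a sketch only: run_demo, download_all, and the pending list are invented for this illustration, and the "download" is simulated by copying items from a list.

```python
import threading


def run_demo(items):
    pending = []                    # items downloaded but not yet processed
    lock = threading.Lock()
    new_data = threading.Event()    # set whenever something was added
    finished = threading.Event()    # set once the downloader is done
    processed = []

    def download_all():
        for item in items:          # stand-in for real downloads
            with lock:
                pending.append(item)
            new_data.set()          # wake the processor
        finished.set()
        new_data.set()              # final wake-up so the processor can exit

    def process_new_downloads():
        while True:
            new_data.wait()         # sleeps without burning CPU
            new_data.clear()
            # Read `finished` before draining, so nothing appended
            # before the downloader finished can be missed.
            stopping = finished.is_set()
            with lock:
                batch = pending[:]
                pending.clear()
            processed.extend(batch)
            if stopping:
                break

    downloader = threading.Thread(target=download_all)
    downloader.start()
    process_new_downloads()         # runs in the main thread
    downloader.join()
    return processed


if __name__ == "__main__":
    print(run_demo(["a.txt", "b.txt", "c.txt"]))
```

Compared with sleeping for a fixed second between checks, the processing side wakes up only when the downloader signals that there is something to do.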
BTW, the while True which was in the main thread in the original code was needed exactly because all other threads were daemons, so ending the main thread (i.e. not making it run forever) would kill the whole application.

What is the "correct" way to make a stoppable thread in Python, given stoppable pseudo-atomic units of work?

I'm writing a threaded program in Python. This program is interrupted very frequently, by user (Ctrl+C) interaction and by other programs sending various signals, all of which should stop thread operation in various ways. The thread does a bunch of units of work (I call them "atoms") in sequence.
Each atom can be stopped quickly and safely, so making the thread itself stop is fairly trivial, but my question is: what is the "right", or canonical way to implement a stoppable thread, given stoppable, pseudo-atomic pieces of work to be done?
Should I poll a stop_at_next_check flag before each atom (example below)? Should I decorate each atom with something that does the flag-checking (basically the same as the example, but hidden in a decorator)? Or should I use some other technique I haven't thought of?
Example (simple stopped-flag checking):
from threading import Thread
class stoppable(Thread):
    stop_at_next_check = False
    current_atom = None
    def __init__(self):
        Thread.__init__(self)
    def do_atom(self, atom):
        if self.stop_at_next_check:
            return False
        self.current_atom = atom
        self.current_atom.do_work()
        return True
    def run(self):
        # get "work to be done" objects atom1, atom2, etc. from somewhere
        if not self.do_atom(atom1):
            return
        if not self.do_atom(atom2):
            return
        # ...etc
    def die(self):
        self.stop_at_next_check = True
        self.current_atom.stop()

Flag checking seems right, but you missed an occasion to simplify it by using a list for atoms. If you put atoms in a list, you can use a single for loop without needing a do_atom() method, and the problem of where to do the check solves itself.
def run(self):
    atoms = self.get_atoms()  # hypothetical helper: get the atoms from somewhere
    for atom in atoms:
        if self.stop_at_next_check:
            break
        self.current_atom = atom
        atom.do_work()

Create a "thread x should continue processing" flag, and when you're done with the thread, set the flag to false.
Killing a thread directly is considered bad form, because you might get a fractional chunk of work completed.
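One way to sketch such a "should continue processing" flag is with a threading.Event that is set while the thread should keep going. StoppableWorker and its attributes are made-up names, and each atom is assumed to be a plain callable.

```python
import threading
import time


class StoppableWorker(threading.Thread):
    """Runs callables ("atoms") in order until asked to stop."""

    def __init__(self, atoms):
        super().__init__()
        self._atoms = atoms
        self._keep_running = threading.Event()
        self._keep_running.set()  # "thread should continue processing"
        self.completed = []       # atoms that ran to completion

    def run(self):
        for atom in self._atoms:
            # Check the flag between atoms, never in the middle of one.
            if not self._keep_running.is_set():
                break
            atom()
            self.completed.append(atom)

    def stop(self):
        self._keep_running.clear()  # set the flag to "false"


if __name__ == "__main__":
    atoms = [lambda: time.sleep(0.1) for _ in range(10)]
    worker = StoppableWorker(atoms)
    worker.start()
    time.sleep(0.25)
    worker.stop()
    worker.join()
    print("atoms completed before stopping:", len(worker.completed))
```

The thread only ever exits between atoms, so no unit of work is left half done.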

A tad late, but I have created a small library, ants, that solves this problem. In your example, an atomic unit is represented by a worker.
Example
from ants import worker
@worker
def hello():
    print("hello world")
t = hello.start()
...
t.stop()
In the above example, hello() will run in a separate thread, being called in a while True: loop and thus spitting out "hello world" as fast as possible.
You can also have triggering events: e.g., in the above, replace hello.start() with hello.start(lambda: time.sleep(5)) and it will trigger every 5 seconds.
The library is very new and work is ongoing on GitHub https://github.com/fa1k3n/ants.git
Future work includes adding a colony for having several workers working on different parts of same data, also planning on a queen for worker communication and control, like synch

How do I run two python loops concurrently?

Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A
# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?

If you want concurrency, here's a very simple example:
from multiprocessing import Process
def loop_a():
    while 1:
        print("a")
def loop_b():
    while 1:
        print("b")
if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
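Here is a rough sketch of the Queue approach for getting data back to the main program. The names square_worker, collect, and run_demo, and the use of None as an "I'm done" marker, are assumptions made for this example.

```python
import multiprocessing as mp


def square_worker(name, numbers, results):
    """Put (name, n*n) on the queue for each number, then a 'done' marker."""
    for n in numbers:
        results.put((name, n * n))
    results.put((name, None))  # None marks the end of this worker's output


def collect(results, worker_count):
    """Read from the queue until every worker has sent its 'done' marker."""
    collected = {}
    finished = 0
    while finished < worker_count:
        name, value = results.get()
        if value is None:
            finished += 1
        else:
            collected.setdefault(name, []).append(value)
    return collected


def run_demo():
    results = mp.Queue()
    procs = [
        mp.Process(target=square_worker, args=("a", [1, 2, 3], results)),
        mp.Process(target=square_worker, args=("b", [4, 5], results)),
    ]
    for p in procs:
        p.start()
    collected = collect(results, len(procs))  # drain before joining
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run_demo())
```

The queue is drained before join() is called because a process that still has items buffered for a queue may not exit until they have been read.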
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate, but they offer true parallelism across CPU cores.

There are many possible options for what you wanted:
Use a loop
As many people have pointed out, this is the simplest way.
for i in range(10000):
    # in Python 2, use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB must run after taskA, or vice versa. They can't run simultaneously.
multiprocessing
Another thought would be to run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process
# args is a tuple of positional arguments, kwargs a dict of keyword arguments
p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
Merits: tasks can run simultaneously in the background, you can control tasks (end them, stop them, etc.), and tasks can exchange data and be synchronized if they compete for the same resources.
Drawbacks: too heavy! The OS will frequently switch between them, and each has its own data space even if the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
A thread is like a process, just more lightweight. Check out this post. Their usage is quite similar:
import threading
# args is a tuple of positional arguments, kwargs a dict of keyword arguments
p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading. No examples are given for those libraries here; please search for how to use them if you're interested.
Merits: more flexible and lightweight.
Drawbacks: extra library needed, learning curve.
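To give a feel for the coroutine option without installing anything, here is a sketch that swaps in the standard library's asyncio in place of greenlet or gevent, so it is a related technique, not those libraries' API. The names task and main are made up, and the "work" is just appending to a log.

```python
import asyncio


async def task(name, count, log):
    """Do `count` small steps, yielding to other coroutines after each one."""
    for i in range(count):
        log.append((name, i))   # stand-in for one unit of Task A or Task B
        await asyncio.sleep(0)  # hand control back to the event loop


async def main():
    log = []
    # Run both loops concurrently in a single thread.
    await asyncio.gather(task("A", 3, log), task("B", 3, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Coroutines interleave cooperatively in one thread, so this helps when the tasks mostly wait (network, timers), not when they are CPU-bound.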

Why do you want to run the two processes at the same time? Is it because you think they will go faster? (There is a good chance that they won't.) Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the python threading module. However threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the python multiprocessing module. If both tasks are CPU intensive, this will make better use of multiple cores on your computer.
There are other options such as coroutines, Stackless tasklets, greenlets, CSP, etc., but without knowing more about Task A and Task B and why they need to be run at the same time, it is impossible to give a more specific answer.

from threading import Thread
def loopA():
    for i in range(10000):
        pass  # Do task A
def loopB():
    for i in range(10000):
        pass  # Do task B
threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()  # start() runs the target in a new thread; run() would execute it in the calling thread
threadB.start()
# Do work independent of loopA and loopB
threadA.join()
threadB.join()

You could use threading or multiprocessing.

How about a single loop: for i in range(10000): Do Task A, Do Task B? Without more information I don't have a better answer.

I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool
def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array # "for example"
result_array = np.zeros("some shape") # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...) # Arguments to be passed into desired function.
multiple_results = []
for i in range(processes): # Submits one task per option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,)+args)) # Returns immediately.
results = np.array([res.get() for res in multiple_results]) # Retrieves results after
# every task is finished!
for i in range(processes):
    result_array = result_array + results[i] # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example, that's how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!
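For a version of this pattern that actually runs, here is a small sketch under simplifying assumptions: the work is just summing numbers, and partial_sum and parallel_sum are hypothetical names. As above, option tells each task which slice of the data it is responsible for.

```python
from multiprocessing import Pool


def partial_sum(option, processes, data):
    """Sum every `processes`-th element of data, starting at index option - 1."""
    return sum(data[option - 1::processes])


def parallel_sum(data, processes=4):
    args = (processes, data)
    with Pool(processes=processes) as pool:
        # apply_async returns at once; the tasks run in the worker processes.
        pending = [pool.apply_async(partial_sum, (i + 1,) + args)
                   for i in range(processes)]
        # get() blocks until the corresponding task has finished.
        results = [res.get() for res in pending]
    return sum(results)  # combine the partial results


if __name__ == "__main__":
    print(parallel_sum(list(range(100))))
```

For work this small, the cost of starting processes and copying data outweighs any gain; the pattern pays off only when each task does substantial computation.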

How to do a non-blocking URL fetch in Python

I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.
I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?
I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.

You'll probably benefit from threading or multiprocessing modules. You don't actually need to create all those Thread-based classes by yourself, there is a simpler method using Pool.map:
from multiprocessing import Pool
def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...
pool = Pool(processes=4)
result = pool.map(fetch_url, image_url_list)

As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in python.

As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.
Here is a tutorial on threading in python:
http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
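As a sketch of the "several download threads, main thread keeps going" idea, here is one possible arrangement using concurrent.futures from the standard library. The names start_downloads, poll_finished, and fake_fetch are all hypothetical, and fake_fetch merely sleeps in place of a real urlretrieve call.

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Stand-in for a real download; sleeps briefly and returns fake bytes."""
    time.sleep(0.05)
    return b"bytes of " + url.encode()


def start_downloads(urls, fetch, max_workers=8):
    """Start all downloads in background threads and return at once."""
    results = queue.Queue()
    executor = ThreadPoolExecutor(max_workers=max_workers)
    for url in urls:
        future = executor.submit(fetch, url)
        # When a download ends, hand (url, future) to the main thread.
        future.add_done_callback(lambda f, u=url: results.put((u, f)))
    executor.shutdown(wait=False)  # submitted downloads still run to completion
    return results


def poll_finished(results):
    """Non-blocking: return the downloads finished since the last call."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished


if __name__ == "__main__":
    done_queue = start_downloads(["u1", "u2", "u3"], fake_fetch)
    shown = 0
    while shown < 3:  # stands in for the GUI's per-frame update
        for url, future in poll_finished(done_queue):
            print(url, len(future.result()), "bytes")
            shown += 1
        time.sleep(0.01)
```

In a GUI, poll_finished would be called once per frame or from a timer, so each thumbnail can be displayed as soon as it arrives; future.result() re-raises any exception the download hit.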

Here's an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for I/O-bound applications like yours and can really speed them up. Using Python threading strictly for computation in standard Python doesn't help, because only one thread can compute at a time.
import threading, time
class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple
    def run(self):
        # sleeps 3 seconds then prints 'pong' x times
        time.sleep(3)
        printString = 'pong' * self.multiple
        print(printString)
pingInstance = Ping(3)
pingInstance.start()  # your run function will be called with the start function
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # will return True, or 1
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive returns False when your thread reaches the end of its run function.
# only main thread now

You have these choices:
Threads: easiest but doesn't scale well
Twisted: medium difficulty, scales well but shares CPU due to GIL and being single threaded.
Multiprocessing: hardest. Scales well if you know how to write your own event loop.
I recommend just using threads unless you need an industrial scale fetcher.

You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.
