Running multiple independent python scripts concurrently

My goal is to create one main Python script that executes multiple independent Python scripts on Windows Server 2012 at the same time. One benefit I see is that I can point Task Scheduler at a single main.py script instead of multiple .py scripts. My server has 1 CPU. I have read about multiprocessing, threading and subprocess, which only added to my confusion. I am basically running multiple trading scripts for different stock symbols, all at the same time, after market open at 9:30 EST. The following is my attempt, but I have no idea whether it is right. Any direction/feedback is highly appreciated!
import subprocess
subprocess.Popen(["python", '1.py'])
subprocess.Popen(["python", '2.py'])
subprocess.Popen(["python", '3.py'])
subprocess.Popen(["python", '4.py'])

I think I'd try to do it like this:
from multiprocessing import Pool
def do_stuff_with_stock_symbol(symbol):
    # _call_api is a placeholder for your own API call
    return _call_api()
if __name__ == '__main__':
    symbols = ["GOOG", "AAPL", "TSLA"]
    p = Pool(len(symbols))
    results = p.map(do_stuff_with_stock_symbol, symbols)
    print(results)
(Modified example from multiprocessing introduction: https://docs.python.org/3/library/multiprocessing.html#introduction)
Consider using a constant pool size if you deal with a lot of stock symbols, because every python process will use some amount of memory.
Also, please note that using threads might be a lot better if you are dealing with an I/O bound workload (calling an API, writing and reading from disk). Processes really become necessary with python when dealing with compute bound workloads (because of the global interpreter lock).
An example using threads and the concurrent futures library would be:
import concurrent.futures
TIMEOUT = 60
def do_stuff_with_stock_symbol(symbol, timeout):
    # _call_api is a placeholder for your own API call; hand the timeout on to it
    return _call_api()
if __name__ == '__main__':
    symbols = ["GOOG", "AAPL", "TSLA"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(symbols)) as executor:
        results = {executor.submit(do_stuff_with_stock_symbol, symbol, TIMEOUT): symbol for symbol in symbols}
        for future in concurrent.futures.as_completed(results):
            symbol = results[future]
            try:
                data = future.result()
            except Exception as exc:
                print('{} generated an exception: {}'.format(symbol, exc))
            else:
                print('stock symbol: {}, result: {}'.format(symbol, data))
(Modified example from: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example)
Note that threads will still use some memory, but less than processes.
You could use asyncio or green threads if you want to reduce memory consumption per stock symbol to a minimum, but at some point you will run into network bandwidth problems because of all the concurrent API calls :)
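As a rough illustration of the asyncio option, here is a minimal sketch that caps the number of simultaneous calls with a semaphore. The names fetch_quote, fetch_all and MAX_CONCURRENT are made up for this example, and the "API call" is only a short sleep standing in for real network I/O.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed cap on simultaneous requests


async def fetch_quote(symbol, limiter):
    # Hypothetical stand-in for a real API call: sleeps instead of doing I/O.
    async with limiter:
        await asyncio.sleep(0.01)
        return symbol, len(symbol)


async def fetch_all(symbols):
    # Create the semaphore inside the running event loop.
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    pairs = await asyncio.gather(*(fetch_quote(s, limiter) for s in symbols))
    return dict(pairs)


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["GOOG", "AAPL", "TSLA"])))
```

All coroutines share one thread, so the per-symbol memory cost is small; the semaphore is what keeps the number of in-flight requests bounded.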

What you're asking for might not be the best way to handle this, but I've wanted to do similar things in the past and it took a while to find what I needed, so to answer your question:
I'm not promising this is the "best" way to do it, but it worked for my use case.
I created a class that extends threading.Thread.
thread.py
"""
Extends threading.Thread giving access to a Thread object which will accept
a thread_id, thread name, and a function at the time of instantiation. The
function will be called when the thread's start() method is called.
"""
import threading
class Thread(threading.Thread):
    def __init__(self, thread_id, name, func):
        threading.Thread.__init__(self)
        self.threadID = thread_id
        self.name = name
        # the function that should be run in the thread.
        self.func = func
    def run(self):
        return self.func()
I needed some work done that was part of another package
work_module.py
import...
def func_that_does_work():
    # do some work
    pass
def more_work():
    # do some work
    pass
Then the main script I wanted to run
main.py
from thread import Thread
import work_module as wm
mythreads = []
mythreads.append(Thread(1, "a_name", wm.func_that_does_work))
mythreads.append(Thread(2, "another_name", wm.more_work))
for t in mythreads:
    t.start()
The threads die when run() returns. Because this class extends Thread from threading, several more options are available; see the docs: https://docs.python.org/3/library/threading.html
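One of those options is join(), which lets the main script wait for the workers and collect what they produced. The following is only a sketch: run_in_threads and call_and_store are names invented here, and it uses the target= argument of threading.Thread rather than a subclass.

```python
import threading


def run_in_threads(jobs):
    """Run each (name, func) pair in its own thread and wait for all of them."""
    results = {}
    lock = threading.Lock()

    def call_and_store(name, func):
        value = func()
        with lock:  # protect the shared dict while writing
            results[name] = value

    threads = [
        threading.Thread(target=call_and_store, name=name, args=(name, func))
        for name, func in jobs
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until this thread's target has returned
    return results
```

For example, run_in_threads([("a_name", wm.func_that_does_work), ("another_name", wm.more_work)]) would return a dict keyed by thread name once both functions have finished.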

If all you're looking to do is automate the startup, creating a .bat file is a great and simple alternative to trying to do it with another python script.
The example linked in the comments shows how to do it with bash on Unix-based machines, but batch files can do a very similar thing with the START command:
start_py.bat:
START "" /B "path\to\python.exe" "path\to\script_1.py"
START "" /B "path\to\python.exe" "path\to\script_2.py"
START "" /B "path\to\python.exe" "path\to\script_3.py"
The full syntax for START can be found here.
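If a Python launcher is still preferred over the batch file, a rough equivalent can be sketched with subprocess. The helper names launch_all and wait_all are invented for this example; sys.executable is used so the children run under the same interpreter as the launcher.

```python
import subprocess
import sys


def launch_all(commands):
    """Start one child interpreter per argument list; return the Popen handles.

    Each entry is the list of arguments that follow the interpreter,
    e.g. ["script_1.py"]. Popen returns immediately, so the children
    run side by side.
    """
    return [subprocess.Popen([sys.executable] + list(args)) for args in commands]


def wait_all(procs):
    """Block until every child has exited and return their exit codes."""
    return [p.wait() for p in procs]
```

Usage would look like wait_all(launch_all([["1.py"], ["2.py"], ["3.py"]])); a non-zero exit code in the returned list points at a script that failed.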

Related

Python Unittest and Multiprocessing fail when run sequentially

I'm attempting to create unittests for my application which uses multiple processes, but have been having strange issues when attempting to run all the tests together. Basically when running tests individually they pass without issue but when run sequentially, such as when running all tests in the file, some tests will fail.
What I'm seeing is that many Python processes are being created, but they aren't closing when the test is reported as passed. For example, if 2 tests are run that each generate 5 processes, then 10 Python processes show up in the system monitor.
I've tried using terminate and join but neither work. Is there a way to force a test to correctly close all processes that it generated before running the next test?
I'm running Python 2.7 in Ubuntu 16.04.
Edit:
It's a fairly large code base so here a simplified example.
import unittest
from multiprocessing import Pipe, Process
class BaseDevice:
    # Various methods
    pass
class BaseInstr(BaseDevice, Process):
    def __init__(self, pipe):
        Process.__init__(self)
        self.pipe = pipe
    def run(self):
        # Do stuff and wait for terminate message on pipe
        pass
    # Various other higher level methods
class BaseCompountInstrument(BaseInstr):
    def __init__(self, pipe):
        # Create multiple instruments, usually done with config file but simplified here
        BaseInstr.__init__(self, pipe)
        instrlist = list()
        for _ in range(5):
            masterpipe, slavepipe = Pipe()
            instrlist.append([BaseInstr(slavepipe), masterpipe])
    def run(self):
        # Listen for message from pipe, send messages to sub-instruments
        pass
    def shutdown(self):
        # When shutdown message received, send to all sub-instruments
        pass
class test(unittest.TestCase):
    def setUp(self):
        # Load up a configuration file from the sample configs so that they're updated
        self.parentConn, self.childConn = Pipe()
        self.instr = BaseCompountInstrument(self.childConn)
        self.instr.start()
    def tearDown(self):
        self.parentConn.send("shutdown")  # Propagates to all sub-instruments
    def test1(self):
        pass
    def test2(self):
        pass

After struggling with this for a while (2 days, actually), I found a solution that is not technically wrong, but it removes all the parallel code you have (only in the tests, only in the tests...).
I use the mock package to mock functions (which, I now realize, has been part of the unittest module since Python 3.3 xD). With it you can assume that a certain function ran successfully, fix a certain return value, or change the function itself.
So I went with the last option: change the function itself.
In my case I used a list of Process objects (because Pool didn't work for me) and Manager lists to share data between the processes.
My original code would be something like this:
import multiprocessing as mp
manager = mp.Manager()
list_data = manager.list()
list_return = manager.list()
def parallel_function(list_data, list_return):
    while len(list_data) > 0:
        # Do things and make sure to "pop" the data in list_data
        list_return.append(return_data)
    return None
# Create as many processes as images or cpus, the lesser number
processes = [mp.Process(target=parallel_function,
                        args=(list_data, list_return))
             for num_p in range(mp.cpu_count())]
for p in processes:
    p.start()
for p in processes:
    p.join(10)
So in my test I mock Process.__init__ from the multiprocessing module so that it runs my parallel_function instead of creating a new process.
In the test file, before any test, define the same function you are trying to parallelize:
def fake_process(self, target=None, args=()):
    # Takes the same keywords Process(...) is called with and does the work inline
    list_data, list_return = args
    while len(list_data) > 0:
        # Do things and make sure to "pop" the data in list_data
        list_return.append(return_data)
    return None
And before the definition of any test method that is going to execute this part of the code, add decorators that overwrite Process.__init__ (and make start and join do nothing):
from mock import patch  # or: from unittest.mock import patch
@patch('multiprocessing.Process.__init__', new=fake_process)
@patch('multiprocessing.Process.start', new=lambda x: None)
@patch('multiprocessing.Process.join', new=lambda x, y: None)
def test_from_the_hell(self):
    # Do things
    pass
If you use Manager data structures, there is no need to use Locks or anything else to control access to the data, because each individual operation on those structures is safe to call from several processes.
I hope this helps any other lost soul who is trying to test multiprocessing code.
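The same idea can be sketched with a small stand-in class instead of patching __init__, start and join one by one. Everything named here (double_all, run_in_process, InlineProcess, RunInProcessTest) is invented for the illustration; the stand-in simply runs the target in the calling process, so no real child process is created during the test.

```python
import multiprocessing as mp
import unittest
from unittest import mock


def double_all(items, out):
    for x in items:
        out.append(x * 2)


def run_in_process(items, out):
    # Code under test: normally does its work in a child process.
    p = mp.Process(target=double_all, args=(items, out))
    p.start()
    p.join()


class InlineProcess:
    """Test double for Process that runs the target synchronously."""

    def __init__(self, target=None, args=(), kwargs=None):
        self._target, self._args, self._kwargs = target, args, kwargs or {}

    def start(self):
        self._target(*self._args, **self._kwargs)

    def join(self, timeout=None):
        pass  # nothing to wait for, the work already ran in start()


class RunInProcessTest(unittest.TestCase):
    @mock.patch("multiprocessing.Process", InlineProcess)
    def test_runs_target(self):
        out = []  # a plain list works because everything runs in one process
        run_in_process([1, 2], out)
        self.assertEqual(out, [2, 4])
```

As with the approach above, this only checks the logic of the target function, not the behaviour of real processes.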

Python create a subprocess and do not wait

I would like to run a series of commands (which take a long time). But I do not want to wait for the completion of each command. How can I go about this in Python?
I looked at
os.fork()
and
subprocess.Popen()
Don't think that is what I need.
Code
def command1():
    wait(10)
def command2():
    wait(10)
def command3():
    wait(10)
I would like to call
command1()
command2()
command3()
Without having to wait.

Use python's multiprocessing module.
from multiprocessing import Process
def func(arg1):
    pass  # ... do something ...
p = Process(target=func, args=(arg1,), name='func')
p.start()
Complete documentation is here: https://docs.python.org/2/library/multiprocessing.html
EDIT:
You can also use Python's threading module. Note that in CPython the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time, so threads mainly help with I/O-bound or waiting work; implementations such as Jython and IronPython have no GIL.
https://docs.python.org/2/library/threading.html
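For the thread-based route, a minimal sketch with concurrent.futures (Python 3) could look like the following. slow_command and start_commands are hypothetical names, and time.sleep stands in for the long-running work; submit() returns a Future straight away, so the caller does not wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_command(label, seconds):
    # Stand-in for a long-running command.
    time.sleep(seconds)
    return label


def start_commands(executor, specs):
    """Submit every (label, seconds) pair and return the futures immediately."""
    return [executor.submit(slow_command, label, seconds) for label, seconds in specs]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = start_commands(
            executor, [("command1", 0.2), ("command2", 0.2), ("command3", 0.2)]
        )
        print("commands started, main thread keeps going")
        # Only this line waits, and only when the results are actually needed.
        print([f.result() for f in futures])
```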

This example may be suitable for you:
#!/usr/bin/env python3
import sys
import os
import time
def forked(fork_func):
    def do_fork():
        pid = os.fork()
        if pid > 0:
            # Parent: run the function, then exit.
            fork_func()
            sys.exit(0)
        else:
            # Child: carry on with the rest of the script.
            return pid
    return do_fork
@forked
def command1():
    time.sleep(2)
@forked
def command2():
    time.sleep(1)
command1()
command2()
print("Hello")
You just use the @forked decorator on your functions.
There is only one problem: when the main program is over, it waits for the child processes to end.

The most straightforward way is to use Python's own multiprocessing:
from multiprocessing import Process
def command1():
    wait(10)
...
call1 = Process(target=command1, args=(...))
call1.start()
...
This module was introduced precisely to ease the burden of running functions from the same code base in external processes. Of course, that could already be done with os.fork or subprocess. multiprocessing emulates, as far as possible, the interface of Python's own threading module. The one immediate advantage of multiprocessing over threading is that the worker processes can use different CPU cores and actually run in parallel, while threads in CPython are limited by the Global Interpreter Lock to executing Python code one at a time, so they use a single core even when several are available.
Now, note that there are still peculiarities, especially if you are, for example, calling these from inside a web request. Check this question and its answers from a few days ago:
Stop a background process in flask without creating zombie processes

Python simplest form of multiprocessing

I've been trying to read up on threading and multiprocessing, but all the examples are too intricate and advanced for my level of Python/programming knowledge. I want to run a function, which consists of a while loop, and while that loop runs I want to continue with the program and eventually change the condition for the while loop and end that process. This is the code:
import time
class Example():
    def __init__(self):
        self.condition = False
    def func1(self):
        self.condition = True
        while self.condition:
            print "Still looping"
            time.sleep(1)
        print "Finished loop"
    def end_loop(self):
        self.condition = False
Then I make the following function calls:
ex = Example()
ex.func1()
time.sleep(5)
ex.end_loop()
What I want is for the func1 to run for 5s before the end_loop() is called and changes the condition and ends the loop and thus also the function. I.e I want one process to start and "go" into func1 and at the same time I want time.sleep(5) to be called, so the processes "split" when arriving at func1, one process entering the function while the other continues down the program and start with the time.sleep(5) execution.
This must be the most basic example of multiprocessing, yet I've had trouble finding a simple way to do it!
Thank you
EDIT1: regarding do_something. In my real problem do_something is replaced by some code that communicates with another program via a socket, receives packets with coordinates every 0.02s and stores them in member variables of the class. I want this constant updating of the coordinates to start and then be able to read the coordinates via other functions at the same time.
However that is not so relevant. What if do_something is replaced by:
time.sleep(1)
print "Still looping"
How do I solve my problem then?
EDIT2: I have tried multiprocessing like this:
from multiprocessing import Process
ex = Example()
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
p1.start()
time.sleep(5)
p2.start()
When I ran this, I never got to p2.start(), so that did not help. Even if it had, this is not really what I'm looking for either. What I want is just to start the process p1 and then continue with time.sleep and ex.end_loop().

The first problem with your code is the calls
p1 = Process(target=ex.func1())
p2 = Process(target=ex.end_loop())
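With ex.func1() you're calling the function yourself and passing its return value as the target parameter. (Since func1 loops until the condition changes, that call never returns, which is why you never reach p2.start().) Even if it did return, the function doesn't return anything, so you'd effectively be calling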
p1 = Process(target=None)
p2 = Process(target=None)
which makes, of course, no sense.
After fixing that, the next problem will be shared data: with the multiprocessing package you implement concurrency using multiple processes, which by default do not share data. Have a look at "Sharing state between processes" in the package's documentation to read about this. Take the first sentence especially into account: "when doing concurrent programming it is usually best to avoid using shared state as far as possible"!
So you might also want to look at "Exchanging objects between processes" to read about how to send and receive data between two processes. Instead of simply setting a flag to stop the loop, it might be better to send a message signalling that the loop should terminate.
Also note that processes are a heavyweight form of concurrency: multiprocessing spawns multiple OS processes, which comes with relatively big overhead. Its main purpose is to avoid the problems imposed by Python's Global Interpreter Lock (search for it to read more). If your problem isn't much more complex than what you've told us, you might want to use the threading package instead: threads come with less overhead than processes and can also access the same data (although you really should read about synchronization when doing this).
I'm afraid multiprocessing is an inherently complex subject, so I think you will need to advance your programming/Python skills to use it successfully. But I'm sure you'll manage; the Python documentation on this is comprehensive and there are a lot of other resources about it.
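To make the threading suggestion concrete, here is a minimal sketch that uses threading.Event as the stop signal instead of a plain boolean. The class name Looper and its methods are invented for this example; the loop body is just a counter standing in for the real work.

```python
import threading
import time


class Looper:
    def __init__(self, interval=0.01):
        self._stop = threading.Event()
        self._interval = interval
        self.iterations = 0
        self._thread = threading.Thread(target=self._loop)

    def _loop(self):
        while True:
            self.iterations += 1  # stand-in for the real work
            # wait() sleeps for the interval but returns True at once
            # when the event has been set, which ends the loop.
            if self._stop.wait(self._interval):
                break

    def start(self):
        self._thread.start()

    def end_loop(self):
        self._stop.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()


if __name__ == "__main__":
    ex = Looper(interval=1)
    ex.start()        # the loop now runs in the background
    time.sleep(5)     # the main thread carries on meanwhile
    ex.end_loop()
    print("Finished loop after", ex.iterations, "iterations")
```

Because both threads live in one process, the event and the counter are shared directly and nothing needs to be pickled.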

To tackle your EDIT2 problem, you could try using a shared-memory Value.
import time
from multiprocessing import Process, Value
class Example():
    def func1(self, cond):
        while (cond.value == 1):
            print('do something')
            time.sleep(1)
        return
if __name__ == '__main__':
    ex = Example()
    cond = Value('i', 1)
    proc = Process(target=ex.func1, args=(cond,))
    proc.start()
    time.sleep(5)
    cond.value = 0
    proc.join()
(Note the target=ex.func1 without the parentheses and the comma after cond in args=(cond,).)
But look at the answer provided by MartinStettner to find a good solution.

How to use multiprocessing with class instances in Python?

I am trying to create a class that can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system it is tied to, like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution to the caller while that system command runs, and follow up with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not pickleable. I therefore tried to fix it various ways but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re
class ProcessWorker(multiprocessing.Process):
    """
    This class runs as a separate process to execute worker's commands in parallel
    Once launched, it remains running, monitoring the task queue, until "None" is sent
    """
    def __init__(self, task_q, result_q):
        multiprocessing.Process.__init__(self)
        self.task_q = task_q
        self.result_q = result_q
        return
    def run(self):
        """
        Overloaded function provided by multiprocessing.Process. Called upon start() signal
        """
        proc_name = self.name
        print '%s: Launched' % (proc_name)
        while True:
            next_task_list = self.task_q.get()
            if next_task_list is None:
                # Poison pill means shutdown
                print '%s: Exiting' % (proc_name)
                self.task_q.task_done()
                break
            next_task = next_task_list[0]
            print '%s: %s' % (proc_name, next_task)
            args = next_task_list[1]
            kwargs = next_task_list[2]
            answer = next_task(*args, **kwargs)
            self.task_q.task_done()
            self.result_q.put(answer)
        return
# End of ProcessWorker class
class Worker(object):
    """
    Launches a child process to run commands from derived classes in separate processes,
    which sit and listen for something to do
    This base class is called by each derived worker
    """
    def __init__(self, config, index=None):
        self.config = config
        self.index = index
        # Launch the ProcessWorker for anything that has an index value
        if self.index is not None:
            self.task_q = multiprocessing.JoinableQueue()
            self.result_q = multiprocessing.Queue()
            self.process_worker = ProcessWorker(self.task_q, self.result_q)
            self.process_worker.start()
            print "Got here"
            # Process should be running and listening for functions to execute
        return
    def enqueue_process(target):  # No self, since it is a decorator
        """
        Used to place a command target from this class object into the task_q
        NOTE: Any function decorated with this must use fetch_results() to get the
        target task's result value
        """
        def wrapper(self, *args, **kwargs):
            self.task_q.put([target, args, kwargs])  # FAIL: target is a class instance method and can't be pickled!
        return wrapper
    def fetch_results(self):
        """
        After all processes have been spawned by multiple modules, this command
        is called on each one to retrieve the results of the call.
        This blocks until the execution of the item in the queue is complete
        """
        self.task_q.join()  # Wait for it to finish
        return self.result_q.get()  # Return the result
    @enqueue_process
    def run_long_command(self, command):
        print "I am running %s as worker %s" % (command, self.index)
        # In here, I will launch a subprocess to run a long-running system command
        # p = Popen(command), etc
        # p.wait(), etc
        return
    def close(self):
        self.task_q.put(None)
        self.task_q.join()
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(5):
        worker = Worker(config, index)
        worker.run_long_command("ls /")
        workers.append(worker)
    for worker in workers:
        worker.fetch_results()
    # Do more work... (this would actually be done in a distributor in another class)
    for worker in workers:
        worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class and then tried to manually pickle the worker instance. Even that doesn't work, and I get the error
RuntimeError: Queue objects should only be shared between processes through inheritance
But I am only passing references to those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(1):
        task_q = multiprocessing.JoinableQueue()
        result_q = multiprocessing.Queue()
        process_worker = ProcessWorker(task_q, result_q)
        worker = Worker(config, index, process_worker, task_q, result_q)
        something_to_look_at = pickle.dumps(worker)  # FAIL: Doesn't like queues??
        process_worker.start()
        worker.run_long_command("ls /")

So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, they tell me what the methods of the multiprocessing module do and give me some basic examples, but what I want to know is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you start a process using this module, the whole program is copied into another process. But since it is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just stops and sits there waiting for something to do, like a zombie. Everything that was initialized in the parent at the time of calling multiprocessing.Process() is set up and ready to go. Once you put something in the multiprocessing.Queue, shared memory, pipe, etc. (however you are communicating), the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it were the parent. However, once some internal state variables change in the parent or in the separate process, those changes are isolated. Once the process is spawned, it becomes your job to keep them in sync if necessary, through a queue, pipe, shared memory, etc.
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.

Instead of attempting to send a method itself (which is impractical), try sending a name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args were a dict to be directly fed to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
    task = getattr(self, next_task_name)
    answer = task(**next_task_args)
    ...
else:
    # poison pill, shut down
    break
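A self-contained sketch of that dispatch loop follows. To keep it runnable in a single process it uses queue.Queue; the assumption is that a multiprocessing queue, which offers the same get/put calls, would take its place when the loop is the run() method of a worker process. NamedTaskWorker, add and shout are invented names.

```python
import queue


class NamedTaskWorker:
    """Looks tasks up by name, so only strings and dicts travel on the queue."""

    def __init__(self, task_q, result_q):
        self.task_q = task_q
        self.result_q = result_q

    # Example task methods (hypothetical).
    def add(self, a, b):
        return a + b

    def shout(self, text):
        return text.upper()

    def run(self):
        while True:
            task_name, task_args = self.task_q.get()
            if task_name is None:
                break  # poison pill, shut down
            task = getattr(self, task_name)
            self.result_q.put(task(**task_args))


if __name__ == "__main__":
    tasks, results = queue.Queue(), queue.Queue()
    tasks.put(("add", {"a": 1, "b": 2}))
    tasks.put(("shout", {"text": "done"}))
    tasks.put((None, None))
    NamedTaskWorker(tasks, results).run()
    print(results.get(), results.get())
```

Since method names and argument dicts are plain picklable data, the problem of pickling a bound method does not arise.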

REF: https://stackoverflow.com/a/14179779
Answer on Jan 6 at 6:03 by David Lynch is not factually correct when he says that he was misled by
http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue -- try and understand what the Task.__call__() method is doing.
In my case, what tripped me up was syntax errors in my implementation of run(). It seems that the sub-process will not report this and just fails silently -- leaving things stuck in weird loops! Make sure you have some kind of syntax checker running, e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr() helped me narrow down the problem.
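For reference, turning that logging on takes two lines. The guarded helper below is not part of multiprocessing; it is a hypothetical wrapper, added here only to show one way of making sure an exception inside a worker function ends up in the log.

```python
import logging
import multiprocessing
import traceback

# log_to_stderr() returns the multiprocessing logger with a stderr handler attached.
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)


def guarded(func, *args, **kwargs):
    """Call func and log the traceback of any exception before re-raising it."""
    try:
        return func(*args, **kwargs)
    except Exception:
        logger.error("worker failed:\n%s", traceback.format_exc())
        raise
```

Inside a run() method one might call guarded(next_task, *args, **kwargs) so that a failing task leaves a trace instead of disappearing quietly.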

How to do a non-blocking URL fetch in Python

I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.
I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?
I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.

You'll probably benefit from threading or multiprocessing modules. You don't actually need to create all those Thread-based classes by yourself, there is a simpler method using Pool.map:
from multiprocessing import Pool
def fetch_url(url):
    # Fetch the URL contents and save it anywhere you need and
    # return something meaningful (like filename or error code),
    # if you wish.
    ...
pool = Pool(processes=4)
result = pool.map(fetch_url, image_url_list)

As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in python.

As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.
Here is a tutorial on threading in python:
http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
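A minimal sketch of that arrangement: worker threads push finished downloads onto a queue.Queue, and the GUI thread drains the queue without blocking. The names start_downloads and drain are invented here, and fetch is a hypothetical callable supplied by the caller (for instance a small function wrapping urllib) that returns the downloaded data for one URL.

```python
import queue
import threading


def start_downloads(urls, fetch, results):
    """Start one daemon thread per URL; each puts (url, data, error) on results."""

    def worker(url):
        try:
            results.put((url, fetch(url), None))
        except Exception as exc:  # report failures instead of losing them
            results.put((url, None, exc))

    threads = [threading.Thread(target=worker, args=(u,), daemon=True) for u in urls]
    for t in threads:
        t.start()
    return threads


def drain(results):
    """Return everything finished so far without blocking."""
    finished = []
    while True:
        try:
            finished.append(results.get_nowait())
        except queue.Empty:
            return finished
```

The GUI would call drain() from its regular update callback and display whatever thumbnails have arrived since the previous call. For hundreds of URLs, a fixed-size pool of threads would likely be kinder to the system than one thread per URL.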

Here's an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for I/O-bound applications like yours and can really speed them up. Using Python threading strictly for computation in standard Python doesn't help, because only one thread can compute at a time.
import threading, time
class Ping(threading.Thread):
    def __init__(self, multiple):
        threading.Thread.__init__(self)
        self.multiple = multiple
    def run(self):
        # sleeps 3 seconds then prints 'pong' x times
        time.sleep(3)
        printString = 'pong' * self.multiple
        print(printString)
pingInstance = Ping(3)
pingInstance.start()  # your run function will be called with the start function
print("pingInstance is alive? : %d" % pingInstance.is_alive())  # will return True, or 1
print("Number of threads alive: %d" % threading.active_count())
# main thread + class instance
time.sleep(3.5)
print("Number of threads alive: %d" % threading.active_count())
print("pingInstance is alive?: %d" % pingInstance.is_alive())
# is_alive returns False when your thread reaches the end of its run function.
# only main thread now

You have these choices:
Threads: easiest but doesn't scale well
Twisted: medium difficulty, scales well but shares CPU due to GIL and being single threaded.
Multiprocessing: hardest. Scales well if you know how to write your own event loop.
I recommend just using threads unless you need an industrial scale fetcher.

You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.
