I would like to run a series of commands (each of which takes a long time), but I do not want to wait for each one to complete before starting the next. How can I go about this in Python?
I looked at os.fork() and subprocess.Popen(), but I don't think they are what I need.
Code:
import time

def command1():
    time.sleep(10)

def command2():
    time.sleep(10)

def command3():
    time.sleep(10)
I would like to call
command1()
command2()
command3()
without having to wait for each one to finish.
Use Python's multiprocessing module.
from multiprocessing import Process

def func(arg1):
    ...  # do something with arg1

p = Process(target=func, args=(arg1,), name='func')
p.start()
The complete documentation is here: https://docs.python.org/2/library/multiprocessing.html
EDIT:
You can also use Python's threading module. Note that CPython's GIL (Global Interpreter Lock) prevents threads from running Python bytecode in parallel; only implementations without a GIL, such as Jython or IronPython, can run Python code in several threads truly at once. Threads are still useful for I/O-bound work in any implementation.
https://docs.python.org/2/library/threading.html
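For example, a minimal threading sketch (using the question's command1; this is fine for I/O-bound work even under CPython's GIL):
import threading
import time

def command1():
    time.sleep(10)  # stand-in for the long-running command

t = threading.Thread(target=command1, name='command1')
t.start()  # returns immediately; command1 runs in the background
# ... start more threads or do other work here ...
t.join()   # optionally, wait for it at the end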
This example may be suitable for you:
#!/usr/bin/env python3
import os
import time


def forked(fork_func):
    def do_fork():
        pid = os.fork()
        if pid == 0:
            # child: run the function, then exit so execution does not
            # fall through into the rest of the main program
            fork_func()
            os._exit(0)
        else:
            # parent: return immediately without waiting for the child
            return pid
    return do_fork


@forked
def command1():
    time.sleep(2)


@forked
def command2():
    time.sleep(1)


command1()
command2()
print("Hello")
You just apply the @forked decorator to your functions: each decorated call runs in a forked child process while the caller continues immediately.
There is only one problem: when the main program is over, it waits for its child processes to end.
The most straightforward way is to use Python's own multiprocessing:
from multiprocessing import Process

def command1():
    time.sleep(10)
...

call1 = Process(target=command1, args=(...))
call1.start()
...
This module was introduced precisely to ease the burden of controlling the execution, in external processes, of functions available in the same code base. Of course, that could already be done with os.fork or subprocess. multiprocessing emulates, as far as possible, the interface of Python's own threading module. The immediate advantage of multiprocessing over threading is that the worker processes can use different CPU cores and actually run in parallel, whereas threading, due to language design limitations, is effectively restricted to a single execution worker at a time and therefore uses only one core even when several are available.
Now, note that there are still peculiarities, especially if you are, for example, calling these from inside a web request. Check this question and its answers from a few days ago:
Stop a background process in flask without creating zombie processes
My goal is to create one main Python script that executes multiple independent Python scripts on Windows Server 2012 at the same time. One of the benefits in my mind is that I can point Task Scheduler to one main.py script as opposed to multiple .py scripts. My server has 1 CPU. I have read up on multiprocessing, threading and subprocess, which only added to my confusion a bit. I am basically running multiple trading scripts for different stock symbols, all at the same time after the market opens at 9:30 EST. Following is my attempt, but I have no idea whether this is right. Any direction/feedback is highly appreciated!
import subprocess
subprocess.Popen(["python", '1.py'])
subprocess.Popen(["python", '2.py'])
subprocess.Popen(["python", '3.py'])
subprocess.Popen(["python", '4.py'])
I think I'd try to do it like this:
from multiprocessing import Pool

def do_stuff_with_stock_symbol(symbol):
    return _call_api()

if __name__ == '__main__':
    symbols = ["GOOG", "APPL", "TSLA"]
    p = Pool(len(symbols))
    results = p.map(do_stuff_with_stock_symbol, symbols)
    print(results)
(Modified example from multiprocessing introduction: https://docs.python.org/3/library/multiprocessing.html#introduction)
Consider using a constant pool size if you deal with a lot of stock symbols, because every Python process will use some amount of memory.
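For example, a sketch with an arbitrary cap of 4 worker processes (inside the same __main__ guard as above):
with Pool(processes=4) as p:  # constant pool size instead of len(symbols)
    results = p.map(do_stuff_with_stock_symbol, symbols)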
Also, please note that using threads might be a lot better if you are dealing with an I/O-bound workload (calling an API, writing to and reading from disk). Processes really only become necessary in Python for compute-bound workloads (because of the global interpreter lock).
An example using threads and the concurrent.futures library would be:
import concurrent.futures

TIMEOUT = 60

def do_stuff_with_stock_symbol(symbol, timeout):
    return _call_api()

if __name__ == '__main__':
    symbols = ["GOOG", "APPL", "TSLA"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(symbols)) as executor:
        results = {executor.submit(do_stuff_with_stock_symbol, symbol, TIMEOUT): symbol
                   for symbol in symbols}
        for future in concurrent.futures.as_completed(results):
            symbol = results[future]
            try:
                data = future.result()
            except Exception as exc:
                print('{} generated an exception: {}'.format(symbol, exc))
            else:
                print('stock symbol: {}, result: {}'.format(symbol, data))
(Modified example from: https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example)
Note that threads will still use some memory, but less than processes.
You could use asyncio or green threads if you want to reduce memory consumption per stock symbol to a minimum, but at some point you will run into network bandwidth problems because of all the concurrent API calls :)
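For illustration, a rough asyncio sketch (fetch_symbol is a hypothetical coroutine; in practice it would await an async HTTP client instead of sleeping):
import asyncio

async def fetch_symbol(symbol):
    await asyncio.sleep(0.1)  # stand-in for an awaited API call
    return symbol, 'ok'

async def main():
    symbols = ["GOOG", "APPL", "TSLA"]
    # all calls run concurrently on a single thread
    return await asyncio.gather(*(fetch_symbol(s) for s in symbols))

if __name__ == '__main__':
    print(asyncio.run(main()))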
While what you're asking might not be the best way to handle what you're doing, I've wanted to do similar things in the past, and it took a while to find what I needed. So, to answer your question:
I'm not promising this is the "best" way to do it, but it worked in my use case.
I created a class that extends threading.Thread.
thread.py
"""
Extends threading.Thread giving access to a Thread object which will accept
A thread_id, thread name, and a function at the time of instantiation. The
function will be called when the threads start() method is called.
"""
import threading
class Thread(threading.Thread):
def __init__(self, thread_id, name, func):
threading.Thread.__init__(self)
self.threadID = thread_id
self.name = name
# the function that should be run in the thread.
self.func = func
def run(self):
return self.func()
I needed some work done that was part of another package
work_module.py
import ...

def func_that_does_work():
    # do some work
    pass

def more_work():
    # do some work
    pass
Then the main script I wanted to run
main.py
from thread import Thread
import work_module as wm

mythreads = []
mythreads.append(Thread(1, "a_name", wm.func_that_does_work))
mythreads.append(Thread(2, "another_name", wm.more_work))

for t in mythreads:
    t.start()
The threads die when run() returns. Since this extends Thread from threading, there are several other options available, described in the docs here: https://docs.python.org/3/library/threading.html
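If you do eventually need to wait for them, the same list can be joined; a small sketch continuing main.py above:
for t in mythreads:
    t.join()  # blocks until that thread's run() has returned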
If all you're looking to do is automate the startup, creating a .bat file is a great and simple alternative to trying to do it with another Python script.
The example linked in the comments shows how to do it with bash on Unix-based machines, but batch files can do a very similar thing with the START command:
start_py.bat:
START "" /B "path\to\python.exe" "path\to\script_1.py"
START "" /B "path\to\python.exe" "path\to\script_2.py"
START "" /B "path\to\python.exe" "path\to\script_3.py"
The full syntax for START can be found here.
I'm currently using the standard multiprocessing in python to generate a bunch of processes that will run indefinitely. I'm not particularly concerned with performance; each thread is simply watching for a different change on the filesystem, and will take the appropriate action when a file is modified.
Currently, I have a solution that works, for my needs, in Linux. I have a dictionary of functions and arguments that looks like:
job_dict['func1'] = {'target': func1, 'args': (args,)}
For each, I create a process:
import multiprocessing

for k in job_dict.keys():
    jobs[k] = multiprocessing.Process(target=job_dict[k]['target'],
                                      args=job_dict[k]['args'])
With this, I can keep track of each one that is running, and, if necessary, restart a job that crashes for any reason.
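For context, this is roughly the monitor/restart loop that setup enables (a sketch; the 5-second polling interval is arbitrary):
import time

while True:
    for k, proc in jobs.items():
        if not proc.is_alive():
            # a Process object can only be started once, so build a fresh one
            proc = multiprocessing.Process(target=job_dict[k]['target'],
                                           args=job_dict[k]['args'])
            jobs[k] = proc
            proc.start()
    time.sleep(5)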
This does not work in Windows. Many of the functions I'm using are wrappers, using various functools functions, and I get messages about not being able to serialize the functions (see What can multiprocessing and dill do together?). I have not figured out why I do not get this error in Linux, but do in Windows.
If I import dill before starting my processes in Windows, I do not get the serialization error. However, the processes do not actually do anything. I cannot figure out why.
I then switched to the multiprocessing implementation in pathos, but did not find an analog to the simple Process class within the standard multiprocessing module. I was able to generate threads for each job using pathos.pools.ThreadPool. This is not the intended use for map, I'm sure, but it started all the threads, and they ran in Windows:
import pathos

tp = pathos.pools.ThreadPool()
for k in job_dict.keys():
    tp.uimap(job_dict[k]['target'], job_dict[k]['args'])
However, now I'm not sure how to monitor whether a thread is still active, which I'm looking for so that I can restart threads that crash for some reason or another. Any suggestions?
I'm the pathos and dill author. The Process class is buried deep within pathos at pathos.helpers.mp.process.Process, where mp itself is the actual fork of the multiprocessing library. Everything in multiprocessing should be accessible from there.
Another thing to know about pathos is that it keeps the pool alive for you until you remove it from the held state. This helps reduce overhead in creating "new" pools. To remove a pool, you do:
>>> # create
>>> p = pathos.pools.ProcessPool()
>>> # remove
>>> p.clear()
There's no such mechanism for a Process however.
For multiprocessing, Windows is different from Linux and Mac because Windows doesn't have a proper fork like Linux does. Linux can share objects across processes, while on Windows there is no such sharing: a fully independent new process is created, and therefore the serialization has to be better for the object to pass across to the other process, just as if you were sending the object to another computer. On Linux, you'd have to do this to get the same behavior:
def check(obj, *args, **kwds):
    """check pickling of an object across another process"""
    import subprocess
    import dill
    fail = True
    try:
        _x = dill.dumps(obj, *args, **kwds)
        fail = False
    finally:
        if fail:
            print "DUMP FAILED"
    msg = "python -c import dill; print dill.loads(%s)" % repr(_x)
    print "SUCCESS" if not subprocess.call(msg.split(None, 2)) else "LOAD FAILED"
There are a lot of similar questions and answers, but I still can't find a reliable answer.
So, I have a function that can possibly run for too long. The function is private, in the sense that I cannot change its code.
I want to restrict its execution time to 60 seconds.
I tried following approaches:
Python signals. They don't work on Windows or in a multithreaded environment (mod_wsgi).
Threads. A nice approach, but a thread cannot be stopped, so it lives on even after raising a TimeoutException.
The multiprocessing module. I have problems with pickling and I don't know how to solve them. I want to make a time_limit decorator, and there are problems with importing the required function at the top level. The function that runs too long is an instance method, and wrapping it also doesn't help...
So, are there good solutions to the above problem?
How do I kill a thread that I started?
How do I use subprocesses and avoid problems with pickling?
Is the subprocess module of any help?
Thank you.
I think the multiprocessing approach is your only real option. You're correct that threads can't be killed (nicely) and signals have cross-platform issues. Here is one multiprocessing implementation:
import multiprocessing
import Queue

def timed_function(return_queue):
    do_other_stuff()
    return_queue.put(True)
    return

def main():
    return_queue = multiprocessing.Manager().Queue()
    proc = multiprocessing.Process(target=timed_function, args=(return_queue,))
    proc.start()
    try:
        # wait for 60 seconds for the function to return a value
        return_queue.get(timeout=60)
    except Queue.Empty:
        # timeout expired
        proc.terminate()  # kill the subprocess
        # other cleanup
I know you said that you have pickling issues, but those can almost always be resolved with refactoring. For example, you said that your long function is an instance method. You can wrap those kinds of functions to use them with multiprocessing:
class TestClass(object):

    def timed_method(self, return_queue):
        do_other_stuff()
        return_queue.put(True)
        return
To use that method in a pool of workers, add this wrapper to the top-level of the module:
def _timed_method_wrapper(TestClass_object, return_queue):
    return TestClass_object.timed_method(return_queue)
Now you can, for example, use apply_async on this class method from a different method of the same class:
def run_timed_method(self):  # another method of TestClass
    return_queue = multiprocessing.Manager().Queue()
    pool = multiprocessing.Pool()
    result = pool.apply_async(_timed_method_wrapper, args=(self, return_queue))
I'm pretty sure that these wrappers are only necessary if you're using a multiprocessing.Pool instead of launching the subprocess with a multiprocessing.Process object. Also, I bet a lot of people would frown on this construct because you're breaking the nice, clean abstraction that classes provide, and also creating a dependency between the class and this other random wrapper function hanging around. You'll have to be the one to decide whether making your code uglier is worth it or not.
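A minimal sketch of the Process variant, which skips the wrapper (this assumes the bound method can be pickled, which is not an issue under the fork start method and usually works in Python 3 as well):
import multiprocessing

obj = TestClass()
return_queue = multiprocessing.Manager().Queue()
proc = multiprocessing.Process(target=obj.timed_method, args=(return_queue,))
proc.start()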
An answer to "Is it possible to kill a process on Windows from within Python?" may help:
You need to kill that subprocess or thread:
"Terminating a subprocess on windows"
Maybe TerminateThread also helps.
Possible Duplicate:
Multiprocessing launching too many instances of Python VM
I'm trying to use python multiprocess to parallelize web fetching, but I'm finding that the application calling the multiprocessing gets instantiated multiple times, not just the function I want called (which is a problem for me as the caller has some dependencies on a library that is slow to instantiate - losing most of my performance gains from parallelism).
What am I doing wrong or how is this avoided?
my_app.py:
from url_fetcher import url_fetch, parallel_fetch
import my_slow_stuff

if __name__ == '__main__':
    import datetime
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])

my_slow_stuff.py:
class MySlowStuff(object):
    import time
    print('doing slow stuff')
    time.sleep(0)
    print('done slow stuff')
url_fetcher.py:
import multiprocessing
import urllib

def url_fetch(url):
    #return urllib.urlopen(url).read()
    return url

def parallel_fetch(urls, fn):
    PROCESSES = 10
    CHUNK_SIZE = 1
    pool = multiprocessing.Pool(PROCESSES)
    results = pool.imap(fn, urls, CHUNK_SIZE)
    return results

if __name__ == '__main__':
    import datetime
    urls = ['http://www.microsoft.com'] * 20
    results = parallel_fetch(urls, fn=url_fetch)
    print([x[:20] for x in results])
partial output:
$ python my_app.py
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
doing slow stuff
done slow stuff
...
The Python multiprocessing module behaves slightly differently on Windows because Python doesn't implement os.fork() on this platform. In particular:
Safe importing of main module
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
Here, the global class MySlowStuff always gets evaluated by newly started child processes on Windows. To fix that, class MySlowStuff should be defined only when __name__ == '__main__'.
See 16.6.3.2. Windows for more details.
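Applied literally, that suggestion looks roughly like this (a sketch of my_slow_stuff.py: the class body, and the prints it triggers, now only run when the module is executed directly, not when a spawned child process imports it):
# my_slow_stuff.py
if __name__ == '__main__':
    class MySlowStuff(object):
        import time
        print('doing slow stuff')
        time.sleep(0)
        print('done slow stuff')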
The multiprocessing module on Windows doesn't work the same way as on Unix/Linux. On Linux it uses the fork system call, and all of the parent's context is copied/duplicated into the new process at the moment it is forked.
The fork system call does not exist on Windows, so the multiprocessing module has to create a new Python process and load all the modules again; this is the reason the Python library documentation forces you to use the if __name__ == '__main__' trick when using multiprocessing on Windows.
The solution in this case is to use threads instead. This is an I/O-bound workload, so the advantage of multiprocessing, namely avoiding GIL problems, does not affect you here.
More info in http://docs.python.org/library/multiprocessing.html#windows
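As a minimal sketch of that switch for this code (multiprocessing.dummy exposes a Pool with the same interface, backed by threads, so parallel_fetch barely changes):
# url_fetcher.py, thread-based variant
from multiprocessing.dummy import Pool  # same Pool API, but backed by threads
import urllib

def url_fetch(url):
    return urllib.urlopen(url).read()

def parallel_fetch(urls, fn):
    pool = Pool(10)
    return pool.imap(fn, urls, 1)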
I am having an issue with the time.sleep() function in Python. I am running a script that needs to wait for another program to generate txt files. However, this is a terribly old machine, so when I sleep the Python script, I run into issues with the other program not generating files. Are there any alternatives to using time.sleep()? I thought locking the thread might work, but essentially it would just be a loop of locking the thread for a couple of seconds. I'll give some pseudo code here of what I'm doing.
while running:
    if filesFound != []:
        moveFiles
    else:
        time.sleep(1)
One way to do a non-blocking wait is to use threading.Event:
import threading
dummy_event = threading.Event()
dummy_event.wait(timeout=1)
This can be set() from another thread to indicate that something has completed. But if you are doing stuff in another thread, you could avoid the timeout and event altogether and just join the other thread:
import threading

def create_the_file():
    # Do stuff to create the file
    ...

def main():
    worker = threading.Thread(target=create_the_file)
    worker.start()
    # We will block here until the "create_the_file" function finishes
    worker.join()
    # Do stuff with the file
If you want an example of using events for more fine-grained control, I can show you that...
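For the record, a minimal sketch of that pattern (the worker signals the event when the file is ready, and the main thread can keep doing other work while polling with a timeout):
import threading

file_ready = threading.Event()

def create_the_file():
    # ... slow work that produces the file ...
    file_ready.set()  # signal completion

worker = threading.Thread(target=create_the_file)
worker.start()

while not file_ready.wait(timeout=1):  # returns True once set() is called
    pass                               # do other periodic work here
# the file now exists; process it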
The threading approach won't work if your platform doesn't provide the threading module. For example, if you try to substitute the dummy_threading module, dummy_event.wait() returns immediately. Not sure about the join() approach.
If you are waiting for other processes to finish, you would be better off managing them from your own script using the subprocess module (and then, for example, using the wait method to be sure the process is done before you do further work).
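For example (a minimal sketch; the command line is purely illustrative):
import subprocess

# hypothetical command that produces the txt files
proc = subprocess.Popen(["generator_program", "--output-dir", "out"])
proc.wait()  # block until that process exits
# now it is safe to move the files it produced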
If you can't manage the subprocess from your script, but you know the PID, you can use the os.waitpid() function. Beware of the OSError if the process has already finished by the time you use this function...
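A sketch of that (Unix-only, and the process must be a child of yours; pid is the known process id):
import os

try:
    os.waitpid(pid, 0)  # block until the process with this pid exits
except OSError:
    pass                # it had already finished (or is not a child of this process)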
If you want a cross-platform way to watch a directory to be notified of new files, I'd suggest using a GIO FileMonitor from PyGTK/PyGObject. You can get a monitor on a directory using the monitor_directory method of a GIO.File.
Quick sample code for a directory watch:
import gio

def directory_changed(monitor, file1, file2, evt_type):
    print "Changed:", file1, file2, evt_type

gfile = gio.File(".")
monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)
monitor.connect("changed", directory_changed)

import glib
ml = glib.MainLoop()
ml.run()