Using the subprocess module, I'm running 1000 calls to sleep(1) in parallel:
import subprocess
import time

start = time.perf_counter()  # time.clock() was removed in Python 3.8
procs = []
for _ in range(1000):
    proc = subprocess.Popen(["sleep.exe", "1"])
    procs.append(proc)
for proc in procs:
    proc.communicate()
end = time.perf_counter()
print("Executed in %.2f seconds" % (end - start))
On my 4-core machine, this results in an execution time of a couple of seconds, far less than I expected (~ 1000s / 4).
How does it get optimized away? Does it depend on the sleep implementation (this one is taken from the Windows-Git-executables)?
Sleeping doesn't require any processor time, so your OS can run far more than 4 sleep requests at a time, even though it has only 4 cores. Ideally it would be able to process the entire batch of 1000 in only 1 second, but there's lots of overhead in the creation and teardown of the individual processes.
This is because subprocess.Popen(..) is not a blocking call. The calling thread just triggers the creation of the child process and moves on; it does not wait for the child to finish.
In other words, you are spawning 1000 asynchronous processes in a loop, and then waiting on them one by one later on. This asynchronous behavior results in your overall run time of a few seconds.
Calling proc.communicate() waits until the child process is complete (has exited). Now, if you want the sleep times to add up (ignoring the process creation/destruction overhead), you'd do:
import subprocess
import time

start = time.perf_counter()  # get the start time
procs = []
for _ in range(10):
    proc = subprocess.Popen(["sleep.exe", "1"])
    procs.append(proc)
    proc.communicate()  # wait for this child before starting the next one
end = time.perf_counter()  # get the end time
Does it depend on the sleep implementation (this one is taken from the Windows-Git-executables)?
As I've outlined above, this has nothing to do with implementation of sleep.
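The difference between the two loops can be measured with a small sketch. It assumes a child Python interpreter that sleeps for 0.3 seconds as a stand-in for sleep.exe; SLEEP_CMD, run_parallel and run_sequential are names made up for this illustration.

```python
import subprocess
import sys
import time

# Assumed stand-in for `sleep.exe 1`: a child Python process that sleeps briefly.
SLEEP_CMD = [sys.executable, "-c", "import time; time.sleep(0.3)"]


def run_parallel(n):
    """Start n children first, then wait for all of them."""
    start = time.perf_counter()
    procs = [subprocess.Popen(SLEEP_CMD) for _ in range(n)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


def run_sequential(n):
    """Start one child, wait for it to exit, then start the next."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.Popen(SLEEP_CMD).wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    print("parallel:   %.2f s" % run_parallel(4))
    print("sequential: %.2f s" % run_sequential(4))
```

The sequential version should take at least n times the sleep interval, while the parallel one should stay close to a single interval plus process start-up cost.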
Related
I'd like to run a bunch of processes concurrently but never reuse an already existing process. So, basically, once a process is finished I'd like to create a new one. But at all times the number of processes should not exceed N.
I don't think I can use multiprocessing.Pool for this since it reuses processes.
How can I achieve this?
One solution would be to run N processes and wait until all processes are done, then repeat the same thing until all tasks are done. This solution is not very good, since the processes can have very different runtimes.
Here is a naive solution that appears to work fine:
from multiprocessing import Process, Queue
import random
import os
from time import sleep

def f(q):
    print(f"{os.getpid()} Starting")
    sleep(random.choice(range(1, 10)))
    q.put("Done")

def create_proc(q):
    p = Process(target=f, args=(q,))
    p.start()

if __name__ == "__main__":
    q = Queue()
    N = 5
    for n in range(N):
        create_proc(q)
    while True:
        q.get()
        create_proc(q)
Pool can be told to reuse each worker process only a limited number of times, including just once if you pass maxtasksperchild=1. You might also try the initializer argument to see whether the picky once-per-process parts of your library can run there instead of in your pool jobs.
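A minimal sketch of the maxtasksperchild=1 idea follows. The helper names report_pid and run_each_in_fresh_process are invented for this example, and chunksize=1 is passed on the assumption that the limit counts chunks handed to a worker rather than individual items.

```python
import os
from multiprocessing import Pool


def report_pid(task):
    """Return the task together with the PID of the worker that ran it."""
    return task, os.getpid()


def run_each_in_fresh_process(tasks, n_workers):
    """Run every task in its own worker process, at most n_workers at a time."""
    # maxtasksperchild=1 retires a worker after a single task, and
    # chunksize=1 keeps map() from batching several items into one task.
    with Pool(processes=n_workers, maxtasksperchild=1) as pool:
        return pool.map(report_pid, tasks, chunksize=1)
```

Calling run_each_in_fresh_process(range(6), 2) should report a different PID for each task. On Windows the call has to sit under an if __name__ == "__main__": guard, as in the code above.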
I have a function readFiles that I need to call 8.5 million times (essentially stress-testing a logger to ensure the log rotates correctly). I don't care about the output/result of the function, only that I run it N times as quickly as possible.
My current solution is this:
from threading import Thread
import subprocess

def readFile(filename):
    args = ["/usr/bin/ls", filename]
    subprocess.run(args)

def main():
    filename = "test.log"
    threads = set()
    for i in range(8500000):
        thread = Thread(target=readFile, args=(filename,))
        thread.start()
        threads.add(thread)
    # Wait for all the reads to finish
    while len(threads):
        # Avoid changing size of set while iterating
        for thread in threads.copy():
            if not thread.is_alive():
                threads.remove(thread)
readFile has been simplified, but the concept is the same. I need to run readFile 8.5 million times, and I need to wait for all the reads to finish. Based on my mental math, this spawns ~60 threads per second, which means it will take ~40 hours to finish. Ideally, this would finish within 1-8 hours.
Is this possible? Is the number of iterations simply too high for this to be done in a reasonable span of time?
Oddly enough, when I wrote a test script, I was able to generate a thread about every ~0.0005 seconds, which should equate to ~2000 threads per second, but this is not the case here.
I considered iterating 8500000 / 10 times and spawning a thread that runs the readFile function 10 times, which should decrease the amount of time by ~90%, but it caused some issues with blocking resources, and I think passing a lock around would be a bit complicated insofar as keeping the function usable by methods that don't incorporate threading.
Any tips?
Based on @blarg's comment, and on scripts I've written that use multiprocessing, the following can be considered.
It simply reads the same file as many times as the list is long. Here I'm looking at 1M reads.
With 1 core it takes around 50 seconds. With 8 cores it's down to around 22 seconds. This is on a Windows PC, but I use these scripts on Linux EC2 (AWS) instances as well.
Just put this in a Python file and run:
import os
import time
from multiprocessing import Pool

def readfile(fn):
    # Open the file and close it again right away
    with open(fn, "r"):
        pass

def _multiprocess(mylist, num_proc):
    with Pool(num_proc) as pool:
        # zip(mylist) wraps each filename in a 1-tuple, as starmap expects
        r = pool.starmap(readfile, zip(mylist))
        pool.close()
        pool.join()
    return r

if __name__ == "__main__":
    __spec__ = None
    # use the system cpus or change explicitly
    num_proc = os.cpu_count()
    num_proc = 1
    start = time.time()
    # Here you'll want 8.5M, but test first that it works with a smaller number.
    # Note that multiprocessing is slow for a low number of reads: 8 cores is
    # slower than 1 core until you reach a certain point, then it is worth it.
    mylist = ["test.txt"] * 1000000
    rs = _multiprocess(mylist, num_proc=num_proc)
    print('total seconds,', time.time() - start)
I think you should reconsider using subprocess here. If you just want to execute the ls command, I think it's better to use os.system, since it does less work in Python code per call and so should hold the GIL for less time.
Also, you should put a small delay with time.sleep() in the loop that waits for the threads to finish, to reduce resource consumption.
from threading import Thread
import os
import time

def readFile(filename):
    os.system("/usr/bin/ls " + filename)

def main():
    filename = "test.log"
    threads = set()
    for i in range(8500000):
        thread = Thread(target=readFile, args=(filename,))
        thread.start()
        threads.add(thread)
    # Wait for all the reads to finish
    while len(threads):
        time.sleep(0.1)  # put this delay to reduce resource consumption while waiting
        # Avoid changing size of set while iterating
        for thread in threads.copy():
            if not thread.is_alive():
                threads.remove(thread)
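A different technique from the polling loops above, offered only as a sketch, is to let concurrent.futures.ThreadPoolExecutor cap the number of threads and do the waiting. The names run_once and run_many and the default of 32 workers are assumptions for illustration, and the demo command is a stand-in for the real ls call.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_once(cmd):
    """Run one command to completion and return its exit code."""
    return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode


def run_many(cmd, repetitions, max_workers=32):
    """Run cmd `repetitions` times using at most `max_workers` threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Leaving the with-block waits for every submitted call to finish.
        return list(executor.map(run_once, [cmd] * repetitions))


if __name__ == "__main__":
    # Hypothetical stand-in for ["/usr/bin/ls", "test.log"].
    cmd = [sys.executable, "-c", "pass"]
    codes = run_many(cmd, repetitions=100)
    print(codes.count(0), "of", len(codes), "runs succeeded")
```

Because executor.map submits every call up front, a run of several million repetitions would probably be better split into batches, so that only a bounded number of futures exists at any time.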
I saw the following code in a thread tutorial:
from time import sleep, perf_counter
from threading import Thread

start = perf_counter()

def foo():
    sleep(5)

threads = []
for i in range(100):
    t = Thread(target=foo,)
    t.start()
    threads.append(t)
for i in threads:
    i.join()
end = perf_counter()
print(f'Took {end - start}')
When I run it it prints Took 5.014557975. Okay, that part is fine. It does not take 500 seconds as the non threaded version would.
What I don't understand is how .join works. I noticed without calling .join I got Took 0.007060926999999995 which indicates that the main thread ended before the child threads. Since '.join()' is supposed to block, when the first iteration of the loop occurs won't it be blocked and have to wait 5 seconds till the second iteration? How does it still manage to run?
I keep reading python threading is not truly multithreaded and it only appears to be (runs on a single core), but if that is the case then how exactly is the background time running if it's not parallel?
So '.join()' is supposed to block, so when the first iteration of the loop occurs wont it be blocked and it has to wait 5 seconds till the second iteration?
Remember all the threads are started at the same time and all of them take ~5s.
The second for loop waits for all the threads to finish. It will take roughly 5s for the first thread to finish, but the remaining 99 threads will finish roughly at the same time, and so will the remaining 99 iterations of the loop.
By the time you're calling join() on the second thread, it is either already finished or will be within a couple of milliseconds.
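To see this directly, here is a small sketch that records how long each join() call blocks. The helper name time_joins and the chosen sleep durations are made up for the illustration.

```python
from threading import Thread
from time import perf_counter, sleep


def time_joins(n_threads, nap):
    """Start n_threads sleepers, then record how long each join() blocks."""
    threads = [Thread(target=sleep, args=(nap,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    waits = []
    for t in threads:
        before = perf_counter()
        t.join()
        waits.append(perf_counter() - before)
    return waits


if __name__ == "__main__":
    for i, waited in enumerate(time_joins(5, 1.0)):
        print(f"join() on thread {i} blocked for {waited:.3f} s")
```

The first join() should block for roughly the whole sleep, and the later ones should return almost immediately because their threads have already finished or are about to.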
I keep reading python threading is not truly multithreaded and it only appears to be (runs on a single core), but if that is the case then how exactly is the background time running if it's not parallel?
It's a topic that has been discussed a lot, so I won't add another page-long answer.
Tl;dr: Yes, Python Multithreading doesn't help with CPU-intensive tasks, but it's just fine for tasks that spend a lot of time on waiting for something else (Network, Disk-I/O, user input, a time-based event).
sleep() belongs to the latter group of tasks, so Multithreading will speed it up, even though it doesn't utilize multiple cores simultaneously.
The OS is in control when the thread starts and the OS will context-switch (I believe that is the correct term) between threads.
The time functions access a clock on your computer via the OS, and that clock is always running. A sleeping thread does not need to keep checking it: sleep() releases the GIL and hands the thread over to the OS, which wakes it again once the requested interval has elapsed.
The threads are not executing Python code in parallel; they are all waiting at the same time, and waiting needs no CPU.
Here is a little finer detail for what is happening. I subclassed Thread and overrode its run and join methods to log when they are called.
Caveat: the documentation specifically states
only override __init__ and run methods
I was surprised overriding join didn't cause problems.
from time import sleep, perf_counter
from threading import Thread
import pandas as pd

c = {}

def foo(i):
    c[i]['foo start'] = perf_counter() - start
    sleep(5)
    # print(f'{i} - start:{start} end:{perf_counter()}')
    c[i]['foo end'] = perf_counter() - start

class Test(Thread):
    def __init__(self,*args,**kwargs):
        self.i = kwargs['args'][0]
        super().__init__(*args,**kwargs)
    def run(self):
        # print(f'{self.i} - started:{perf_counter()}')
        c[self.i]['thread start'] = perf_counter() - start
        super().run()
    def join(self):
        # print(f'{self.i} - joined:{perf_counter()}')
        c[self.i]['thread joined'] = perf_counter() - start
        super().join()

threads = []
start = perf_counter()
for i in range(10):
    c[i] = {}
    t = Test(target=foo,args=(i,))
    t.start()
    threads.append(t)
for i in threads:
    i.join()
df = pd.DataFrame(c)
print(df)
                      0         1         2         3         4         5         6         7         8         9
thread start   0.000729  0.000928  0.001085  0.001245  0.001400  0.001568  0.001730  0.001885  0.002056  0.002215
foo start      0.000732  0.000931  0.001088  0.001248  0.001402  0.001570  0.001732  0.001891  0.002058  0.002217
thread joined  0.002228  5.008274  5.008300  5.008305  5.008323  5.008327  5.008330  5.008333  5.008336  5.008339
foo end        5.008124  5.007982  5.007615  5.007829  5.007672  5.007899  5.007724  5.007758  5.008051  5.007549
Hopefully you can see that all the threads are started in sequence, very close together. Once join() is called on thread 0, nothing else happens until that thread stops (foo ends); then each of the other threads is joined and terminates.
Sometimes a thread terminates before it is even joined: for threads 1 onward, foo ends before join() is called on the thread.
I am using python to run multiple subprocesses at the same time.
I want to get the run time of each process.
I am using the subprocess module.
What I did:
I created two separate for loops:
The first one for running each process
The second waits for all processes to end.
ps = []
for prcs in batch:
    p = subprocess.Popen([prcs])
    ps.append(p)
for p in ps:
    p.wait()
This code works fine for running the processes simultaneously, but I do not know what to add to it in order to get the run time of each process separately.
Edit: Is there a way to get the run time through the module subprocess?
For example: runtime = p.runtime()
I agree with @quamrana that the easiest way to do this would be with threads.
First, we need to import some standard library modules:
import collections
import subprocess
import threading
import time
Instead of a list to store the processes, we use an ordered dictionary to keep track of the processes and their times. Since we don't know how long each thread will take, we need some way to keep track of the original order of our {process: time} pairs. The threads themselves can be stored in a list.
ps = collections.OrderedDict()
ts = []
Initializing the value paired with each process to the current time keeps the code short, even though it is generally inadvisable to use the same variable for two different things (in this case, the start time followed by the process duration). The target for each thread simply waits for its process to finish and then replaces the start time in the ps ordered dictionary with the process duration.
def time_p(p):
    p.wait()
    ps[p] = time.time() - ps[p]

for prcs in batch:
    p = subprocess.Popen([prcs])
    ps[p] = time.time()
    ts.append(threading.Thread(target=time_p, args=(p,)))
Now, we just start each of the threads, then wait for them all to complete.
for t in ts:
    t.start()
for t in ts:
    t.join()
Once they are all complete, we can print out the results for each:
for prcs, p in zip(batch, ps):
    print('%s took %s seconds' % (prcs, ps[p]))
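For reference, here is a self-contained variant of the same idea. It differs from the pieces above in that each thread both starts and times its own process, and it keeps the durations in a plain list. The name time_commands and the demo commands are assumptions made for this sketch.

```python
import subprocess
import sys
import threading
import time


def time_commands(commands):
    """Run all commands concurrently; return their durations in seconds."""
    durations = [None] * len(commands)

    def worker(index, cmd):
        # Each thread starts one process and measures it from start to exit.
        started = time.perf_counter()
        subprocess.Popen(cmd).wait()
        durations[index] = time.perf_counter() - started

    threads = [threading.Thread(target=worker, args=(i, cmd))
               for i, cmd in enumerate(commands)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return durations


if __name__ == "__main__":
    # Hypothetical batch: child interpreters that sleep for different times.
    batch = [[sys.executable, "-c", "import time; time.sleep(%s)" % s]
             for s in (0.2, 0.5)]
    for cmd, seconds in zip(batch, time_commands(batch)):
        print("%s took %.2f seconds" % (cmd[-1], seconds))
```

Each measured duration includes the start-up time of the child process, so very short commands will look slower than they are.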
I am using Python 2.7.
I am currently using ThreadPoolExecutor like this:
import concurrent.futures

params = [1,2,3,4,5,6,7,8,9,10]
with concurrent.futures.ThreadPoolExecutor(5) as executor:
    result = list(executor.map(f, params))
The problem is that f sometimes runs for too long. Whenever I run f, I want to limit its run to 100 seconds, and then kill it.
Eventually, for each element x in params, I would like to have an indication of whether or not f had to be killed and, if it wasn't, what the return value was.
Even if f times out for one parameter, I still want to run it with the next parameters.
The executor.map method does have a timeout parameter, but it sets a timeout for the entire run, from the time of the call to executor.map, and not for each thread separately.
What is the easiest way to get my desired behavior?
This answer is in terms of Python's multiprocessing library, which is usually preferable to the threading library unless your functions are just waiting on network calls. Note that the multiprocessing and threading libraries have largely the same interface.
Given that your processes potentially run for 100 seconds each, the overhead of creating a process for each one is fairly small in comparison. You will probably have to manage your own processes to get the necessary control.
One option is to wrap f in another function that will execute for at most 100 seconds:
from multiprocessing import Pool

def timeout_f(arg):
    pool = Pool(processes=1)
    try:
        return pool.apply_async(f, [arg]).get(timeout=100)
    finally:
        pool.terminate()  # kill the worker if f is still running
Then your code changes to:
result = list(executor.map(timeout_f, params))
Alternatively, you could write your own thread/process control:
from multiprocessing import Process
from time import time

def chunks(l, n):
    """ Yield successive n-sized chunks from l. """
    for i in range(0, len(l), n):  # xrange also works on Python 2
        yield l[i:i+n]

processes = [Process(target=f, args=(i,)) for i in params]
exit_codes = []
for five_processes in chunks(processes, 5):
    for p in five_processes:
        p.start()
    start = time()
    for p in five_processes:
        # Wait only for whatever is left of the shared 100-second budget
        time_waited = time() - start
        p.join(max(0, 100 - time_waited))
        if p.is_alive():
            p.terminate()
            p.join()  # reap the process so that exitcode is set
    for p in five_processes:
        exit_codes.append(p.exitcode)
You'd have to get the return values through something like the approach in "Can I get a return value from multiprocessing.Process?"
The exit codes of the processes are 0 if the processes completed and non-zero if they were terminated.
Techniques from: "Join a group of python processes with a timeout" and "How do you split a list into evenly sized chunks?"
As another option, you could just try to use apply_async on multiprocessing.Pool:
from multiprocessing import Pool, TimeoutError

if __name__ == "__main__":
    pool = Pool(processes=5)
    processes = [pool.apply_async(f, [i]) for i in params]
    results = []
    for process in processes:
        try:
            results.append(process.get(timeout=100))
        except TimeoutError as e:
            results.append(e)
Note that the above may let a process run for more than 100 seconds: if the first one takes 50 seconds to complete, the second will already have been running for 50 seconds before its own 100-second wait begins. More complicated logic (such as the previous example) is needed to enforce stricter timeouts.
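One way to get a strict limit per call, sketched here on the assumption that giving every call its own process is affordable, is to start a Process for each argument and wait on a Pipe for its result. The names run_with_timeout and _call are invented for this example and are not part of multiprocessing.

```python
import multiprocessing as mp


def _call(conn, func, arg):
    """Child side: run func(arg) and send the result back to the parent."""
    conn.send(func(arg))
    conn.close()


def run_with_timeout(func, arg, timeout):
    """Run func(arg) in its own process; return (timed_out, value)."""
    recv_end, send_end = mp.Pipe(duplex=False)
    proc = mp.Process(target=_call, args=(send_end, func, arg))
    proc.start()
    send_end.close()  # the parent keeps only the receiving end
    if recv_end.poll(timeout):
        try:
            value = recv_end.recv()
        except EOFError:
            # The child exited without sending anything (e.g. func raised).
            value = None
        proc.join()
        return False, value
    proc.terminate()  # still running after `timeout` seconds: kill it
    proc.join()
    return True, None
```

A call such as run_with_timeout(f, x, 100) would then give, for each x, both the "had to be killed" flag and the return value. To keep five calls in flight at once, the function could be mapped over params with a five-worker thread pool. On Windows, func has to be a module-level function and the calls need an if __name__ == "__main__": guard.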