Multiprocessing with Python Process - python

I'm trying to modify a Python script to multiprocess with "Process". The problem is it's not working. In a first step, the content is retrieved sequentially (test1, test2). In a second one, it is to be called in parallel (test1 and test2). There is practically no speed difference. If you execute the functions individually, you will notice a difference. In my opinion, parallelization should only take as long as the longest individual process. What am I missing here?
import multiprocessing
import time
def test1(k):
k = k * k
for e in range(1, k):
e = e**k
def test2(k):
k = k * k
for e in range(1, k):
e = e + 5 - 5*k ** 4000
if __name__ == '__main__':
start = time.time()
test1(100)
test2(100)
end = time.time()
print(end-start)
start = time.time()
worker_1 = multiprocessing.Process(target=test1(100))
worker_1.start()
worker_2 = multiprocessing.Process(target=test2, args=(100,))
worker_2.start()
worker_1.join()
worker_2.join()
end = time.time()
print(end-start)
I want to add that I checked the task manager and saw that only 1 core is used. (4 real Core only 25% CPU => 1Core 100% used)
I know Pool Class, but I don't want to use it.
Thank you for your help.
Update
Hello, everybody,
the one with the "typo" was unfavorable. Sorry about that. Bakuriu, thank you for your answer. In fact, you're right. I think it was the typo and too much work. :-( So I changed the example once again. For all who are interested:
I create two functions, in the first part of the main I run 3 times the functions sequentially. My computer needs approx. 36 sec. Then I start two new processes. These calculate their results here in parallel. As a small addition, the skin process of the program itself also calculates the function test1, which should show that the main program itself can also do something. I get a computing time of 12 sec. So that it is comprehensible for all in the Internet, what this means I once attached a picture here.
Task Manager
import multiprocessing
import time
def test1(k):
k = k * k
for e in range(1, k):
e = e**k
def test2(k):
k = k * k
for e in range(1, k):
e = e**k
if __name__ == '__main__':
start = time.time()
test1(100)
test2(100)
test1(100)
end = time.time()
print(end-start)
start = time.time()
worker_1 = multiprocessing.Process(target=test1, args=(100,))
worker_1.start()
worker_2 = multiprocessing.Process(target=test2, args=(100,))
worker_2.start()
test1(100)
worker_1.join()
worker_2.join()
end = time.time()
print(end-start)

Your code is executing sequentially because instead of passing test1 to the Process's target argument you are passing test1's result to it!
You want to do this:
worker_1 = multiprocessing.Process(target=test1, args=(100,))
As you do in the other call not this:
worker_1 = multiprocessing.Process(target=test1(100))
This code is first executing test1(100), then returns None and assigns that to target spawning an "empty process". After that you spawn a second process that executes test2(100). So you execute the code sequentially plus you add the overhead of spawning two processes.

Related

How long does it take to create a thread in python

I'm trying to finish my programming course and I'm stuck on one exercise.
I have to count how much time it takes in Python to create threads and whether it depends on the number of threads created.
I wrote a simple script and I don't know if it is good:
import threading
import time
def fun1(a,b):
c = a + b
print(c)
time.sleep(100)
times = []
for i in range(10000):
start = time.time()
threading.Thread(target=fun1, args=(55,155)).start()
end = time.time()
times.append(end-start)
print(times)
In times[] I got a 10000 results near 0.0 or exacly 0.0.
And now I don't know if I created the test because I don't understand something, or maybe the result is correct and the time of creating a thread does not depend on the number of already created ones?
Can U help me with it? If it's worng solution, explain me why, or if it's correct confirm it? :)
So there are two ways to interpret your question:
Whether the existence of other threads (that have not been started) affects creation time for new threads
Whether other threads running in the background (threads already started) affects creation time for new threads.
Checking the first one
In this case, you simply don't start the threads:
import threading
import time
def fun1(a,b):
c = a + b
print(c)
time.sleep(100)
times = []
for i in range(10):
start = time.time()
threading.Thread(target=fun1, args=(55,155)) # don't start
end = time.time()
times.append(end-start)
print(times)
output for 10 runs:
[4.696846008300781e-05, 2.8848648071289062e-05, 2.6941299438476562e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5987625122070312e-05, 2.5033950805664062e-05, 2.6941299438476562e-05]
As you can see, the times are about the same (as you would expect).
Checking the second one
In this case, we want the previously created threads to keep running as we create more threads. So we give each thread a task that never finishes:
import threading
import time
def fun1(a,b):
while True:
pass # never ends
times = []
for i in range(100):
start = time.time()
threading.Thread(target=fun1, args=(55,155)).start()
end = time.time()
times.append(end-start)
print(times)
output:
Over 100 runs, the first one took 0.0003440380096435547 whereas the last one took 0.3017098903656006 so there's quite a magnitude of increase there.

Python: Very strange behavior with multiprocessing; later code causes "retroactive" slowdown of earlier code

I'm trying to learn how to implement multiprocessing for computing Monte Carlo simulations. I reproduced the code from this simple tutorial where the aim is to compute an integral. I also compare it to the answer from WolframAlpha and compute the error. The first part of my code has no problems and is just there to define the integral function and declare some constants:
import numpy as np
import multiprocessing as mp
import time
def integrate(iterations):
np.random.seed()
mc_sum = 0
chunks = 10000
chunk_size = int(iterations/chunks)
for i in range(chunks):
u = np.random.uniform(size=chunk_size)
mc_sum += np.sum(np.exp(-u * u))
normed = mc_sum / iterations
return normed
wolfram_answer = 0.746824132812427
mc_iterations = 1000000000
But there's some very spooky stuff that happens in the next two parts (I've labelled them because it's important). First (labelled "BLOCK 1"), I do the simulation without any multiprocessing at all, just to get a benchmark. After this (labelled "BLOCK 2"), I do the same thing but with a multiprocessing step. If you're reproducing this, you may want to adjust the num_procs variable depending on how many cores your machines has:
#### BLOCK 1
single_before = time.time()
single = integrate(mc_iterations)
single_after = time.time()
single_duration = np.round(single_after - single_before, 3)
error_single = (wolfram_answer - single)/wolfram_answer
print(mc_iterations, "iterations on single-thread:",
single_duration, "seconds.")
print("Estimation error:", error_single)
print("")
#### BLOCK 2
if __name__ == "__main__":
num_procs = 8
multi_iterations = int(mc_iterations / num_procs)
multi_before = time.time()
pool = mp.Pool(processes = num_procs)
multi_result = pool.map(integrate, [multi_iterations]*num_procs)
multi_result = np.array(multi_result).mean()
multi_after = time.time()
multi_duration = np.round(multi_after - multi_before, 3)
error_multi = (wolfram_answer - multi_result)/wolfram_answer
print(num_procs, "threads with", multi_iterations, "iterations each:",
multi_duration, "seconds.")
print("Estimation error:", error_multi)
The output is:
1000000000 iterations on single-thread: 37.448 seconds.
Estimation error: 1.17978774235e-05
8 threads with 125000000 iterations each: 54.697 seconds.
Estimation error: -5.88380936901e-06
So, the multiprocessing is slower. That's not at all unheard of; maybe the overhead from the multiprocessing is just more than the gains from the parallelization?
But, that is not what is happening. Watch what happens when I merely comment out the first block:
#### BLOCK 1
##single_before = time.time()
##single = integrate(mc_iterations)
##single_after = time.time()
##single_duration = np.round(single_after - single_before, 3)
##error_single = (wolfram_answer - single)/wolfram_answer
##
##print(mc_iterations, "iterations on single-thread:",
## single_duration, "seconds.")
##print("Estimation error:", error_single)
##print("")
#### BLOCK 2
if __name__ == "__main__":
num_procs = 8
multi_iterations = int(mc_iterations / num_procs)
multi_before = time.time()
pool = mp.Pool(processes = num_procs)
multi_result = pool.map(integrate, [multi_iterations]*num_procs)
multi_result = np.array(multi_result).mean()
multi_after = time.time()
multi_duration = np.round(multi_after - multi_before, 3)
error_multi = (wolfram_answer - multi_result)/wolfram_answer
print(num_procs, "threads with", multi_iterations, "iterations each:",
multi_duration, "seconds.")
print("Estimation error:", error_multi)
The output is:
8 threads with 125000000 iterations each: 6.662 seconds.
Estimation error: 3.86063069069e-06
That's right -- the time to complete the multiprocessing goes down from 55 seconds to less than 7 seconds! And that's not even the weirdest part. Watch what happens when I move Block 1 to be after Block 2:
#### BLOCK 2
if __name__ == "__main__":
num_procs = 8
multi_iterations = int(mc_iterations / num_procs)
multi_before = time.time()
pool = mp.Pool(processes = num_procs)
multi_result = pool.map(integrate, [multi_iterations]*num_procs)
multi_result = np.array(multi_result).mean()
multi_after = time.time()
multi_duration = np.round(multi_after - multi_before, 3)
error_multi = (wolfram_answer - multi_result)/wolfram_answer
print(num_procs, "threads with", multi_iterations, "iterations each:",
multi_duration, "seconds.")
print("Estimation error:", error_multi)
#### BLOCK 1
single_before = time.time()
single = integrate(mc_iterations)
single_after = time.time()
single_duration = np.round(single_after - single_before, 3)
error_single = (wolfram_answer - single)/wolfram_answer
print(mc_iterations, "iterations on single-thread:",
single_duration, "seconds.")
print("Estimation error:", error_single)
print("")
The output is:
8 threads with 125000000 iterations each: 54.938 seconds.
Estimation error: 7.42415402896e-06
1000000000 iterations on single-thread: 37.396 seconds.
Estimation error: 9.79800494235e-06
We're back to the slow output again, which is completely crazy! Isn't Python supposed to be interpreted? I know that statement comes with a hundred caveats, but I took for granted that the code gets executed line-by-line, so stuff that comes afterwards (outside of functions, classes, etc) can't affect the stuff from before, because it hasn't been "looked at" yet.
So, how can the stuff that gets executed after the multiprocessing step has concluded, retroactively slow down the multiprocessing code?
Finally, the fast behavior is restored merely by indenting Block 1 to be inside the if __name__ == "__main__" block, because of course it does:
#### BLOCK 2
if __name__ == "__main__":
num_procs = 8
multi_iterations = int(mc_iterations / num_procs)
multi_before = time.time()
pool = mp.Pool(processes = num_procs)
multi_result = pool.map(integrate, [multi_iterations]*num_procs)
multi_result = np.array(multi_result).mean()
multi_after = time.time()
multi_duration = np.round(multi_after - multi_before, 3)
error_multi = (wolfram_answer - multi_result)/wolfram_answer
print(num_procs, "threads with", multi_iterations, "iterations each:",
multi_duration, "seconds.")
print("Estimation error:", error_multi)
#### BLOCK 1
single_before = time.time()
single = integrate(mc_iterations)
single_after = time.time()
single_duration = np.round(single_after - single_before, 3)
error_single = (wolfram_answer - single)/wolfram_answer
print(mc_iterations, "iterations on single-thread:",
single_duration, "seconds.")
print("Estimation error:", error_single)
print("")
The output is:
8 threads with 125000000 iterations each: 7.293 seconds.
Estimation error: 1.10350027622e-05
1000000000 iterations on single-thread: 31.035 seconds.
Estimation error: 2.53582945763e-05
And the fast behavior is also restored if you keep Block 1 inside the if block, but move it to above where num_procs is defined (not shown here because this question is already getting long).
So, what on Earth is causing this behavior? I'm guessing it's some kind of race-condition to do with threading and process branching, but from my level of expertise it might as well be that my Python interpreter is haunted.
This is because you are using Windows. On Windows, each subprocess is generated using the 'spawn' method which essentially starts a new python interpreter and imports your module instead of forking the process.
This is a problem, because all the code outside if __name__ == '__main__' is executed again. This can lead to a multiprocessing bomb if you put the multiprocessing code at the top-level, because it will start spawning processes until you run out of memory.
This is actually warned about in the docs
Safe importing of main module
Make sure that the main module can be safely imported by a new Python
interpreter without causing unintended side effects (such a starting a
new process).
...
Instead one should protect the “entry point” of the program by using
if __name__ == '__main__'
...
This allows the newly spawned Python interpreter to safely import the
module...
That section used to be called "Windows" in the older docs on Python 2.
Adding some detail, on Windows the module is imported "from scratch" in each worker process. That means everything in the module is executed by each worker. So, in your first example, each worker process first executes "BLOCK 1".
But your output doesn't reflect that. You should have gotten a line of output like
1000000000 iterations on single-thread: 37.448 seconds.
from each of your 8 worker processes. But your output doesn't show that. Perhaps you're using an IDE that suppresses output from spawned processes? If you run it in a "DOS box" (cmd.exe window) instead, that won't suppress output, and can make what's going on clearer.

How to get end of process time per processor using "pool" in Python?

I use Multiprocessing library in Python to distribute a function over multiple cores. To do that I use "Pool" function, but I want to know when each processor has completed its work.
Here is the code :
def parallel(m,G):
D=0
for i in xrange(G):
D+=random()
return 1*(D<1)
pool=Pool()
TOTAL=0
for i in xrange(10):
TOTAL += sum(pool.map(partial(parallel,G=2),xrange(100)))
print TOTAL
I know how to use time.time() in normal situation, but what I need is to know when each core has completed is part of the job. If I put a time stamp directly in the function I will get many time values without knowing on what core it is processed.
Any advice is welcome!
You may return the completion time along with the actual result from parallel and then pick the last timestamp for each worker.
import time
from random import random
from functools import partial
from multiprocessing import Pool, current_process
def parallel(m, G):
D = 0
for i in xrange(G):
D += random()
# uncomment to give the other workers more chances to run
# time.sleep(.001)
return (current_process().name, time.time()), 1 * (D < 1)
# don't deny the existence of Windows
if __name__ == '__main__':
pool = Pool()
TOTAL = 0
proc_times = {}
for i in xrange(5):
# times is a list of proc_name:timestamp pairs
times, results = zip(*pool.map(partial(parallel, G=2), xrange(100)))
TOTAL += sum(results)
# process_times_loc is guaranteed to hold the last timestamp
# for each proc_name, see the doc on dict
proc_times_loc = dict(times)
print 'local completion times:', proc_times_loc
proc_times.update(proc_times_loc)
print TOTAL
print 'total completion times:', proc_times
However when jobs are that simple you may find that calling time.time each time consumes too much of CPU time.)

multiprocessing.Pool and Rate limit

I'm making some API requests which are limited at 20 per second. As to get the answer the waiting time is about 0.5 secs I thought to use multiprocessing.Pool.map and using this decorator
rate-limiting
So my code looks like
def fun(vec):
#do stuff
def RateLimited(maxPerSecond):
minInterval = 1.0 / float(maxPerSecond)
def decorate(func):
lastTimeCalled = [0.0]
def rateLimitedFunction(*args,**kargs):
elapsed = time.clock() - lastTimeCalled[0]
leftToWait = minInterval - elapsed
if leftToWait>0:
time.sleep(leftToWait)
ret = func(*args,**kargs)
lastTimeCalled[0] = time.clock()
return ret
return rateLimitedFunction
return decorate
#RateLimited(20)
def multi(vec):
p = Pool(5)
return p.map(f, vec)
I have 4 cores and this program works fine and there is an improvement in time compared to the loop version. Furthermore, when the Pool argument is 4,5,6 it works and the time is smaller for Pool(6) but when I use 7+ I got errors (Too many connections per second I guess).
Then if my function is more complicated and can do 1-5 requests the decorator doesn't work as expected.
What else I can use in this case?
UPDATE
For anyone looking for use Pool remembers to close it otherwise you are going to use all the RAM
def multi(vec):
p = Pool(5)
res=p.map(f, vec)
p.close()
return res
UPDATE 2
I found that something like this WebRequestManager can do the trick. The problem is that doesn't work with multiprocessing. Pool with 19-20 processes because the time is stored in the class you need to call when you run the request.
Your indents are inconsistent up above which makes it harder to answer this, but I'll take a stab.
It looks like you're rate limiting the wrong thing; if f is supposed be limited, you need to limit the calls to f, not the calls to multi. Doing this in something that's getting dispatched to the Pool won't work, because the forked workers would each be limiting independently (forked processes will have independent tracking of the time since last call).
The easiest way to do this would be to limit how quickly the iterator that the Pool pulls from produces results. For example:
import collections
import time
def rate_limited_iterator(iterable, limit_per_second):
# Initially, we can run immediately limit times
runats = collections.deque([time.time()] * limit_per_second)
for x in iterable:
runat, now = runats.popleft(), time.time()
if now < runat:
time.sleep(runat - now)
runats.append(time.time() + 1)
yield x
def multi(vec):
p = Pool(5)
return p.map(f, rate_limited_iterator(vec, 20))

Multiprocessing takes longer?

I am trying to understand how to get children to write to a parent's variables. Maybe I'm doing something wrong here, but I would have imagined that multiprocessing would have taken a fraction of the time that it is actually taking:
import multiprocessing, time
def h(x):
h.q.put('Doing: ' + str(x))
return x
def f_init(q):
h.q = q
def main():
q = multiprocessing.Queue()
p = multiprocessing.Pool(None, f_init, [q])
results = p.imap(h, range(1,5))
p.close()
-----Results-----:
1
2
3
4
Multiprocessed: 0.0695610046387 seconds
1
2
3
4
Normal: 2.78949737549e-05 seconds # much shorter
for i in range(len(range(1,5))):
print results.next() # prints 1, 4, 9, 16
if __name__ == '__main__':
start = time.time()
main()
print "Multiprocessed: %s seconds" % (time.time()-start)
start = time.time()
for i in range(1,5):
print i
print "Normal: %s seconds" % (time.time()-start)
#Blender basically already answered your question, but as a comment. There is some overhead associated with the multiprocessing machinery, so if you incur the overhead without doing any significant work, it will be slower.
Try actually doing some work that parallelizes well. For example, write Python code to open a file, scan it using a regular expression, and pull out matching lines; then make a list of ten big files and time how long it takes to do all ten with multiprocessing vs. plain Python. Or write code to compute an expensive function and try that.
I have used multiprocessing.Pool() just to run a bunch of instances of an external program. I used subprocess to run an audio encoder, and it ran four instances of the encoder at once for a noticeable speedup.

Categories