Why is this multi-threading slower than a single thread? - python

This is my first time using multi-threading.
I wrote some code to process every file in a directory, like this:
import gzip
import pathlib
import time

list_battle = []
start = time.time()
for filepath in pathlib.Path(dir_battle).glob('**/*'):
    battle_json = gzip.GzipFile(filepath, 'rb').read().decode("utf-8")
    battle_id = eval(type_cmd)
    list_battle.append((battle_id, battle_json))
end = time.time()
print(end - start)
It shows the code runs in 8.74 seconds.
Then I tried to use multi-threading as follows:
# define function to process each file
def get_file_data(path, cmd, result_list):
    data_json = gzip.GzipFile(path, 'rb').read().decode("utf-8")
    data_id = eval(cmd)
    result_list.append((data_id, data_json))
# start to run multi-threading
pool = Pool(5)
start = time.time()
for filepath in pathlib.Path(dir_battle).glob('**/*'):
    pool.apply_async( get_file_data(filepath, type_cmd, list_battle) )
end = time.time()
print(end - start)
However, the result shows it takes 12.36 seconds!
In my view, with a single thread, each iteration of the loop waits for the work to finish before starting the next iteration. With multiple threads, the first iteration hands the job to thread 1, the second iteration hands a job to thread 2, and so on; while the loop is dispatching jobs to the other four threads, thread 1 keeps running, and by the time the sixth iteration arrives it should have finished its job and be free to take the sixth one.
So this should be quicker than a single thread. Why does the multi-threaded code run even slower? How can I address this issue? What is wrong with my thinking?
Any help is appreciated.

Multiprocessing does not reduce your processing time unless your process has a lot of dead time (waiting). The main purpose of multiprocessing is to run different tasks in parallel, at the cost of context switching. Whenever you switch from one task to another, interrupting the previous one, your program needs to store all the variables of the former task and load the ones of the new one. That takes time as well.
This means the shorter the time you spend per task, the less efficient (in terms of computing time) your multiprocessing is.
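To make that overhead concrete, here is a minimal sketch (a made-up trivial workload, not the asker's gzip code) that times the same task serially and through a Pool; on many machines the pooled version loses, because pickling arguments and shipping them between processes costs more than the work itself:

import time
from multiprocessing import Pool

def tiny_task(x):
    # a task that finishes almost instantly, so per-task overhead dominates
    return x * x

if __name__ == "__main__":
    items = list(range(10000))

    start = time.time()
    serial = [tiny_task(x) for x in items]
    print("serial:", time.time() - start)

    start = time.time()
    with Pool(5) as pool:
        pooled = pool.map(tiny_task, items)
    print("pool of 5:", time.time() - start)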

Related

Fastest way to call a function millions of times in Python

I have a function readFiles that I need to call 8.5 million times (essentially stress-testing a logger to ensure the log rotates correctly). I don't care about the output/result of the function, only that I run it N times as quickly as possible.
My current solution is this:
from threading import Thread
import subprocess

def readFile(filename):
    args = ["/usr/bin/ls", filename]
    subprocess.run(args)

def main():
    filename = "test.log"
    threads = set()
    for i in range(8500000):
        thread = Thread(target=readFile, args=(filename,))
        thread.start()
        threads.add(thread)
    # Wait for all the reads to finish
    while len(threads):
        # Avoid changing size of set while iterating
        for thread in threads.copy():
            if not thread.is_alive():
                threads.remove(thread)
readFile has been simplified, but the concept is the same. I need to run readFile 8.5 million times, and I need to wait for all the reads to finish. Based on my mental math, this spawns ~60 threads per second, which means it will take ~40 hours to finish. Ideally, this would finish within 1-8 hours.
Is this possible? Is the number of iterations simply too high for this to be done in a reasonable span of time?
Oddly enough, when I wrote a test script, I was able to generate a thread about every ~0.0005 seconds, which should equate to ~2000 threads per second, but this is not the case here.
I considered iterating 8500000 / 10 times and spawning a thread which then runs the readFile function 10 times, which should decrease the amount of time by ~90%, but it caused some issues with blocking resources, and I think passing a lock around would be a bit complicated insofar as keeping the function usable by methods that don't incorporate threading.
Any tips?
Based on #blarg's comment, and on scripts I've used with multiprocessing, the following can be considered.
It simply reads the same file a number of times based on the size of the list. Here I'm looking at 1M reads.
With 1 core it takes around 50 seconds. With 8 cores it's down to around 22 seconds. This is on a Windows PC, but I use these scripts on Linux EC2 (AWS) instances as well.
Just put this in a Python file and run it:
import os
import time
from multiprocessing import Pool
from itertools import repeat

def readfile(fn):
    f = open(fn, "r")
    f.close()

def _multiprocess(mylist, num_proc):
    with Pool(num_proc) as pool:
        r = pool.starmap(readfile, zip(mylist))
        pool.close()
        pool.join()
    return r

if __name__ == "__main__":
    __spec__ = None
    # use the system cpus or change explicitly
    num_proc = os.cpu_count()
    num_proc = 1
    start = time.time()
    # here you'll want 8.5M, but first test that it works with a smaller number.
    # note this approach is slow with a low number of reads, meaning 8 cores is
    # slower than 1 core until you reach a certain point; then multiprocessing
    # is worth it
    mylist = ["test.txt"] * 1000000
    rs = _multiprocess(mylist, num_proc=num_proc)
    print('total seconds,', time.time() - start)
I think you should reconsider using subprocess here; if you just want to execute the ls command, I think it's better to use os.system, since it will reduce the resource consumption of your current GIL.
You also have to put a little delay with time.sleep() while waiting for the threads to finish, to reduce resource consumption.
from threading import Thread
import os
import time

def readFile(filename):
    os.system("/usr/bin/ls " + filename)

def main():
    filename = "test.log"
    threads = set()
    for i in range(8500000):
        thread = Thread(target=readFile, args=(filename,))
        thread.start()
        threads.add(thread)
    # Wait for all the reads to finish
    while len(threads):
        time.sleep(0.1)  # put this delay to reduce resource consumption while waiting
        # Avoid changing size of set while iterating
        for thread in threads.copy():
            if not thread.is_alive():
                threads.remove(thread)

How does Thread().join work in the following case?

I saw the following code in a thread tutorial:
from time import sleep, perf_counter
from threading import Thread

start = perf_counter()

def foo():
    sleep(5)

threads = []
for i in range(100):
    t = Thread(target=foo)
    t.start()
    threads.append(t)
for i in threads:
    i.join()
end = perf_counter()
print(f'Took {end - start}')
When I run it, it prints Took 5.014557975. Okay, that part is fine. It does not take 500 seconds as the non-threaded version would.
What I don't understand is how .join works. I noticed that without calling .join I got Took 0.007060926999999995, which indicates that the main thread ended before the child threads. Since .join() is supposed to block, won't the first iteration of the loop be blocked and have to wait 5 seconds until the second iteration? How does it still manage to run?
I keep reading that Python threading is not truly multithreaded and only appears to be (it runs on a single core), but if that is the case, then how exactly is the background time running if it's not parallel?
So '.join()' is supposed to block, so when the first iteration of the loop occurs, won't it be blocked and have to wait 5 seconds until the second iteration?
Remember all the threads are started at the same time and all of them take ~5s.
The second for loop waits for all the threads to finish. It will take roughly 5s for the first thread to finish, but the remaining 99 threads will finish roughly at the same time, and so will the remaining 99 iterations of the loop.
By the time you're calling join() on the second thread, it is either already finished or will be within a couple of milliseconds.
I keep reading python threading is not truly multithreaded and it only appears to be (runs on a single core), but if that is the case then how exactly is the background time running if it's not parallel?
It's a topic that has been discussed a lot, so I won't add another page-long answer.
Tl;dr: yes, Python multithreading doesn't help with CPU-intensive tasks, but it's just fine for tasks that spend a lot of time waiting for something else (network, disk I/O, user input, a time-based event).
sleep() belongs to the latter group of tasks, so multithreading will speed it up, even though it doesn't utilize multiple cores simultaneously.
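As a hedged illustration of that distinction (the helper functions below are invented for this example, not taken from the question), threads speed up a sleep-based wait but not a pure computation:

import time
from threading import Thread

def cpu_bound():
    # pure computation: the GIL lets only one thread execute Python bytecode at a time
    total = 0
    for _ in range(10_000_000):
        total += 1

def io_bound():
    # sleep() releases the GIL, so many threads can wait concurrently
    time.sleep(1)

def run_threads(target, n=4):
    threads = [Thread(target=target) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print("cpu-bound, 4 threads:", run_threads(cpu_bound))  # roughly 4x a single call
print("io-bound, 4 threads:", run_threads(io_bound))    # roughly 1 second in total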
The OS is in control when the thread starts and the OS will context-switch (I believe that is the correct term) between threads.
time functions access a clock on your computer via the OS - that clock is always running. As long as the OS periodically gives each thread time to access a clock the thread's target can tell if it has been sleeping long enough.
The threads are not running in parallel, the OS periodically gives each one a chance to look at the clock.
Here is a little finer detail on what is happening. I subclassed Thread and overrode its run and join methods to log when they are called.
Caveat: the documentation specifically states to only override the __init__ and run methods. I was surprised that overriding join didn't cause problems.
from time import sleep, perf_counter
from threading import Thread
import pandas as pd

c = {}

def foo(i):
    c[i]['foo start'] = perf_counter() - start
    sleep(5)
    # print(f'{i} - start:{start} end:{perf_counter()}')
    c[i]['foo end'] = perf_counter() - start

class Test(Thread):
    def __init__(self, *args, **kwargs):
        self.i = kwargs['args'][0]
        super().__init__(*args, **kwargs)
    def run(self):
        # print(f'{self.i} - started:{perf_counter()}')
        c[self.i]['thread start'] = perf_counter() - start
        super().run()
    def join(self):
        # print(f'{self.i} - joined:{perf_counter()}')
        c[self.i]['thread joined'] = perf_counter() - start
        super().join()

threads = []
start = perf_counter()
for i in range(10):
    c[i] = {}
    t = Test(target=foo, args=(i,))
    t.start()
    threads.append(t)
for i in threads:
    i.join()

df = pd.DataFrame(c)
print(df)
0 1 2 3 4 5 6 7 8 9
thread start 0.000729 0.000928 0.001085 0.001245 0.001400 0.001568 0.001730 0.001885 0.002056 0.002215
foo start 0.000732 0.000931 0.001088 0.001248 0.001402 0.001570 0.001732 0.001891 0.002058 0.002217
thread joined 0.002228 5.008274 5.008300 5.008305 5.008323 5.008327 5.008330 5.008333 5.008336 5.008339
foo end 5.008124 5.007982 5.007615 5.007829 5.007672 5.007899 5.007724 5.007758 5.008051 5.007549
Hopefully you can see that all the threads are started in sequence very close together; once thread 0 is joined, nothing else happens until it stops (foo ends), then each of the other threads is joined and terminates.
Sometimes a thread terminates before it is even joined - for threads 1 and up, foo ends before the thread is joined.

Multiprocessing pool map_async for one function then block before the next (python 3)

Please be warned that this demonstration code generates a few GB of data.
I have been using versions of the code below for multiprocessing for some time. It works well when the run time of each process in the pool is similar, but if one process takes much longer, I end up with many blocked processes waiting on the one, so I'm trying to make it run asynchronously - just for one function at a time.
For example, if I have 70 cores and need to run a function 2000 times, I want that to run asynchronously and then wait for the last process before calling the next function. Currently it just submits processes in batches of however many cores I give it, and each batch has to wait for the longest process.
As you can see I've tried using map_async but this is clearly the wrong syntax. Can anyone help me out?
import os
from multiprocessing import Pool

p = 'PATH/test/'

def f1(tup):
    x, y = tup
    to_write = x * (y ** 5)
    with open(p + x + str(y) + '.txt', 'w') as fout:
        fout.write(to_write)

def f2(tup):
    x, y = tup
    print(os.path.exists(p + x + str(y) + '.txt'))

def call_func(f, nos, threads, call):
    print(call)
    for i in range(0, len(nos), threads):
        print(i)
        chunk = nos[i:i + threads]
        tmp = [('args', no) for no in chunk]
        pool.map(f, tmp)
        #pool.map_async(f, tmp)

nos = [i for i in range(55)]
threads = 8

if __name__ == '__main__':
    with Pool(processes=threads) as pool:
        call_func(f1, nos, threads, 'f1')
        call_func(f2, nos, threads, 'f2')
map will only return, and map_async will only call the callback, after all tasks of the current chunk are done.
So you can either give all tasks to map/map_async at once, or use apply_async (initially called threads times), where the callback calls apply_async for the next task.
If the actual return values of the calls don't matter (or at least their order doesn't), imap_unordered may be another efficient solution when giving it all tasks at once (or an iterator/generator producing the tasks on demand).
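As a minimal sketch of the imap_unordered variant (the worker bodies below are placeholders rather than the original f1/f2, which write files to disk): each function gets the whole task list at once, a free worker picks up the next task as soon as it finishes, and the second function only starts after the first has completely drained:

from multiprocessing import Pool

def f1(tup):
    # placeholder for the real work (the original wrote this string to a file)
    x, y = tup
    return x * (y ** 5)

def f2(tup):
    # placeholder for the follow-up step
    x, y = tup
    return (x, y)

if __name__ == '__main__':
    tasks = [('args', no) for no in range(55)]
    with Pool(processes=8) as pool:
        # hand all tasks to the pool; workers grab new tasks as they finish,
        # so one slow task no longer stalls a whole batch
        for _ in pool.imap_unordered(f1, tasks):
            pass
        # every f1 task has finished at this point; now run f2 the same way
        for _ in pool.imap_unordered(f2, tasks):
            pass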

pool.apply_async takes so long to finish, how to speed up?

I call pool.apply_async() with 14 cores.
import multiprocessing
from time import time
import timeit

informative_patients = informative_patients_2500_end[20:]
pool = multiprocessing.Pool(14)
results = []
wLength = [20, 30, 50]
start = time()
for fn in informative_patients:
    result = pool.apply_async(compute_features_test_set,
                              args=(fn, wLength),
                              callback=results.append)
pool.close()
pool.join()
stop = timeit.default_timer()
print stop - start
The problem is that it finishes calling the compute_features_test_set() function for the first 13 data sets in less than one hour, but it takes more than one hour to finish the last one. The size of the data for all 14 data sets is the same. I tried putting pool.terminate() after pool.close(), but in that case it doesn't even start the pool and terminates it immediately without going inside the for loop. This always happens in the same way, and if I use more cores and more data sets, the last one always takes the longest to finish. My compute_features_test_set() function is simple feature extraction code and works correctly. I work on a server with Red Hat Linux 6, Python 2.7 and Jupyter. Computation time is important to me, and my question is: what is wrong here, and how can I fix it so that all the computation finishes in a reasonable time?
Question: ... what is wrong here and how I can fix it
I couldn't pin this down as a multiprocessing issue.
But how do you arrive at "always the last one takes so long to finish"? Are you using callback=results.append instead of a function of your own?
Edit your question and show how you time the run time of one process. Also add your Python version to your question.
Do the following to verify it's not a data issue:
start = time()
results.append(
    compute_features_test_set(<First informative_patients>, wLength)
)
stop = timeit.default_timer()
print stop - start

start = time()
results.append(
    compute_features_test_set(<Last informative_patients>, wLength)
)
stop = timeit.default_timer()
print stop - start
Compare the two times you get.

Python threads do something at the EXACT same time

Is it possible to have 2, 3 or more threads in Python execute something simultaneously - at the exact same moment? Is it possible, if one of the threads is late, for the others to wait for it, so that the last request can be executed at the same time?
Example: there are two threads that are calculating specific parameters; after they have done that, they need to click one button at the same time (to send a POST request to the server).
"Exactly the same time" is really difficult; almost the same time is possible, but you need to use multiprocessing instead of threads. Here is one example.
from time import time
from multiprocessing import Pool

def f(*args):
    while time() < start + 5:  # synchronize the execution of each process
        pass
    print(time())

start = time()
with Pool(10) as p:
    p.map(f, range(10))
It prints
1495552973.6672032
1495552973.6672032
1495552973.669514
1495552973.667697
1495552973.6672032
1495552973.668086
1495552973.6693969
1495552973.6672032
1495552973.6677089
1495552973.669164
Note that some of the processes are really simultaneous (to a 10e-7 second precision). It's impossible to guarantee that all the processes will be executed at the very same moment.
However, if you limit the number of processes to the number of cores you actually have, then most of the time they will run at exactly the same moment.
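If the processes should wait for each other explicitly instead of spinning on the clock, one alternative (a sketch of a different technique, not the code above) is a multiprocessing.Barrier handed to the workers through the pool's initializer:

from time import time
from multiprocessing import Barrier, Pool

NUM_WORKERS = 4

def init(b):
    # store the shared barrier in each worker process
    global barrier
    barrier = b

def f(_):
    barrier.wait()   # every worker blocks here until all NUM_WORKERS have arrived
    print(time())    # then they are all released together

if __name__ == '__main__':
    b = Barrier(NUM_WORKERS)
    with Pool(NUM_WORKERS, initializer=init, initargs=(b,)) as p:
        p.map(f, range(NUM_WORKERS))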
