Python parallel library takes longer than sequential execution - python

I am trying to leverage multiprocessing by using Python's joblib Parallel library. However, strangely, I see that the parallel execution takes longer than the sequential version. Below is the code I am running for comparison:
import time
from joblib import Parallel, delayed


def compute_features(summary, article):
    feature_dict = {}
    feature_dict["f1"] = summary
    feature_dict["f2"] = article
    return feature_dict


def construct_input(n):
    summaries = []
    articles = []
    for i in range(n):
        summaries.append("summary_" + str(i))
        articles.append("articles_" + str(i))
    return summaries, articles


def sequential_test(n):
    print("Sequential test")
    start_time = time.time()
    summaries, articles = construct_input(n)
    feature_list = []
    for i in range(n):
        feature_list.append(compute_features(summaries[i], articles[i]))
    total_time = time.time() - start_time
    print("Total Time Sequential : %s" % total_time)
    # print(feature_list)


def parallel_test(n):
    print("Parallel test")
    start_time = time.time()
    summaries, articles = construct_input(n)
    feature_list = []
    executor = Parallel(n_jobs=8, backend="multiprocessing", prefer="processes", verbose=True)
    # executor = Parallel(n_jobs=4, prefer="threads")
    tasks = (delayed(compute_features)(summaries[i], articles[i]) for i in range(n))
    results = executor(tasks)
    for result in results:
        feature_list.append(result)
    total_time = time.time() - start_time
    print("Total Time Parallel : %s" % total_time)
    # print(feature_list)


if __name__ == "__main__":
    n = 500000
    sequential_test(n)
    parallel_test(n)
I get the following output when I run the code above
Sequential test
Total Time Sequential : 1.200118064880371
Parallel test
[Parallel(n_jobs=8)]: Using backend MultiprocessingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done 56 tasks | elapsed: 0.0s
[Parallel(n_jobs=8)]: Done 49136 tasks | elapsed: 1.0s
[Parallel(n_jobs=8)]: Done 500000 out of 500000 | elapsed: 4.7s finished
Total Time Parallel : 5.427206039428711
I am running this code on a mac with the following configuration
Can you please help me understand why this is so? And if the hardware were to change, say to use a GPU, would the code be any faster? I appreciate your responses. Thanks in advance.
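For a function as cheap as compute_features, the per-task cost of pickling arguments and dispatching them to worker processes can easily exceed the work itself, which is one plausible reason the parallel run is slower (and a GPU would not obviously help here, since the work is not numerical). One common way to test this is to hand each worker a larger slice of the input per task. A minimal sketch of that idea, reusing construct_input from above (the chunk size of 10000 and the helper compute_features_chunk are illustrative, not tuned values; joblib's batch_size parameter offers a similar knob):
from joblib import Parallel, delayed

def compute_features_chunk(summaries, articles):
    # Build the feature dicts for a whole slice in one task, so the
    # inter-process overhead is paid once per chunk rather than once per item.
    return [{"f1": s, "f2": a} for s, a in zip(summaries, articles)]

def parallel_chunked_test(n, chunk=10000):
    summaries, articles = construct_input(n)
    executor = Parallel(n_jobs=8, backend="multiprocessing", verbose=True)
    tasks = (
        delayed(compute_features_chunk)(summaries[i:i + chunk], articles[i:i + chunk])
        for i in range(0, n, chunk)
    )
    # Flatten the per-chunk lists back into a single feature list.
    return [feat for block in executor(tasks) for feat in block]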

Related

How can I improve what I already have?

I have created a Windows internet speed test. I'd like to improve it, make the code more presentable, and better define my functions.
When execution reaches initialise(), it cannot access a variable defined in another function. How can I rectify this, given that I have various variables being used across different functions?
Feel free to use this speed tester as well; I will be working on developing a phone app to run the code too.
The code prints the current date and time, searches for the connected SSID, initialises the speedtest module, scans for servers, selects the best server, runs a ping test, then a download speed test, then an upload speed test, and finally prints the results on screen and writes them to a simple .txt file for viewing later.
Each function reports its run time using the time module, and the total execution time is printed at the end along with the date and time.
It works perfectly without the functions (and on Android without find_ssid()), but I keep running into trouble with local variables.
import speedtest
from datetime import datetime
import subprocess
import re
import time


def main():
    def date():
        dt_now = datetime.now()
        dtn = dt_now.strftime("%a %d-%m-%Y, %H:%M:%S%p")
        return dtn
    print(date())

    def find_ssid():
        stt = time.time()
        cdop = subprocess.run(["netsh", "WLAN", "show", "interfaces"], capture_output=True).stdout.decode()
        ssid = (re.findall("SSID : (.*)\r", cdop))
        for char in ssid:
            ssid = f"Network Name: {char} \n"
        sid = time.time() - stt
        print(f'SSID found in: {sid:.2f}s')
        print(ssid)
    find_ssid()

    def initialise():
        print("Initialising network speed test... ")
        st = speedtest.Speedtest()
        print("Network speed test active.")
        sta = time.time() - stt
        print(f'Speed test activation time: {sta - sid:.2f}s')

    def scan_servers():
        print("Scanning for available servers...")
        st.get_servers()
        print("Found available servers.")
        sft = time.time() - stt
        print(f'Servers found in: {sft - sta:.2f}s')

    def best_server():
        print("Choosing best server...")
        bserv = st.get_best_server()
        print(f"Best server is: {bserv['sponsor']} - {bserv['host']} located in {bserv['name']}, {bserv['country']}")
        bst = time.time() - stt
        print(f'Best server found in: {bst - sft:.2f}s')

    def ping_test():
        print("Ping testing...")
        p = st.results.ping
        ph = f"Ping: {p:.2f}ms"
        print("Ping test complete.")
        ptt = time.time() - stt
        print(f'Ping test completed in: {ptt - bst:.2f}s')

    def download_speed_test():
        print("Download speed testing...")
        ds = st.download()
        dsh = f"Download speed: {ds / 1024 / 1024:.2f}mb/s"
        print("Download speed test complete.")
        dst = time.time() - stt
        print(f'Download speed test completed in: {dst - ptt:.2f}s')

    def upload_speed_test():
        print("Upload speed testing...")
        us = st.upload()
        ust = time.time() - stt
        ush = f"Upload speed: {us / 1024 / 1024:.2f}mb/s \n"
        print("Upload speed test complete. \n")
        print(f'Upload speed test completed in: {ust - dst:.2f}s')

    def result():
        print("Speed test results are: \n")
        print(ssid)
        print(ph)
        print(dsh)
        print(ush)
        ttn = datetime.now()
        fdt = ttn.strftime("%a %d-%m-%Y, %H:%M:%S%p")
        tt = time.time() - stt
        print(f"Start Time: {dtn}")
        print(f"Finish Time: {fdt}")
        print(f'Total execution time: {tt:.2f}s')
        results = [ssid, ph, dsh, ush, dtn]
        txt = "Speedtest Results.txt"
        with open(txt, 'a') as f:
            f.write("\n")
            f.write("\n".join(results))
            f.write("\n")
            f.close()


main()
This part:
ssid = re.findall("SSID : (.*)\r", cdop)
for char in ssid:
    ssid = f"Network Name: {char} \n"
can be written in one line, I believe, with a list comprehension, which should also make it a bit quicker; have a look at list comprehensions.
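For example, a sketch of the list-comprehension version, using cdop from find_ssid() above (note it builds a list of formatted names, whereas the original loop keeps only the last match):
names = [f"Network Name: {char} \n" for char in re.findall("SSID : (.*)\r", cdop)]
ssid = names[-1] if names else ""  # keep the original behaviour of using the last match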

Wrong threading.active_count() results when using ThreadPool

I am using multiprocessing.pool.ThreadPool with N threads (e.g. 5 threads), and I want to check the total number of active threads in my process. To do that I am using threading.active_count(). I know it's a different module, but I found no other way to count the number of active threads in the multiprocessing package.
The expected result is N+1 (the number of threads I started plus the main thread), but I always get a higher number.
For ThreadPool(2) I am getting 6 active threads
For ThreadPool(5) I am getting 9 active threads
For ThreadPool(10) I am getting 14 active threads
It's important to say that threading.active_count() works fine when creating threads with the threading module directly. I also found that multiprocessing.pool.ThreadPool is not well documented.
Can someone help me?
Reproducible code is shown below:
import threading
from multiprocessing.pool import ThreadPool
import time
import requests
import os

urls_to_download = [
    'https://picsum.photos/seed/1/1920/1080',
    'https://picsum.photos/seed/2/1920/1080',
    'https://picsum.photos/seed/3/1920/1080',
    'https://picsum.photos/seed/4/1920/1080',
    'https://picsum.photos/seed/5/1920/1080',
    'https://picsum.photos/seed/6/1920/1080',
    'https://picsum.photos/seed/7/1920/1080',
    'https://picsum.photos/seed/8/1920/1080',
    'https://picsum.photos/seed/9/1920/1080',
    'https://picsum.photos/seed/10/1920/1080',
    'https://picsum.photos/seed/11/1920/1080',
    'https://picsum.photos/seed/12/1920/1080',
    'https://picsum.photos/seed/13/1920/1080',
    'https://picsum.photos/seed/14/1920/1080',
    'https://picsum.photos/seed/15/1920/1080',
    'https://picsum.photos/seed/16/1920/1080',
    'https://picsum.photos/seed/17/1920/1080'
]

output_dir = 'downloaded_images'


def download(url):
    print(f'downloading {url}')
    img_data = requests.get(url).content
    img_name = url.split('/')[-3]
    img_name = f'{img_name}.jpg'
    print(f'Received data for {img_name}')
    print(f'Active Threads: {threading.active_count()}')
    with open(os.path.join(output_dir, img_name), 'wb') as img_file:
        img_file.write(img_data)


number_of_threads = 2
t1 = time.perf_counter()
with ThreadPool(number_of_threads) as pool:
    pool.map(download, urls_to_download)
t2 = time.perf_counter()
print(f'Finished in {t2-t1} seconds')
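For what it's worth, multiprocessing.pool.Pool (and therefore ThreadPool) starts a few internal bookkeeping threads of its own in addition to the N workers (in recent CPython versions, three of them), which would account for the extra count. One way to see where the numbers come from is to list the live threads by name, as in this minimal sketch:
import threading
from multiprocessing.pool import ThreadPool
import time

def noop(_):
    time.sleep(0.1)

with ThreadPool(2) as pool:
    result = pool.map_async(noop, range(4))
    # Print every live thread's name while the pool is running; the worker
    # threads and the pool's internal handler threads all show up here.
    for t in threading.enumerate():
        print(t.name)
    result.wait()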

How to Speed Up This Python Loop

downloadStart = datetime.now()

while (True):
    requestURL = transactionAPI.format(page=tempPage, limit=5000)
    response = requests.get(requestURL, headers=headers)
    json_data = json.loads(response.content)
    tempMomosTransactionHistory.extend(json_data["list"])
    if(datetime.fromtimestamp(json_data["list"][-1]["crtime"]) < datetime(datetime.today().year, datetime.today().month, datetime.today().day - dateRange)):
        break
    tempPage += 1

downloadEnd = datetime.now()
Any suggestions, please? Threading or something like that?
Outputs:
downloadtime 0:00:02.056010
downloadtime 0:00:05.680806
downloadtime 0:00:05.447945
You need to improve it in two ways:
1. Optimise the code within the loop.
2. Parallelise the code execution.
#1
Looking at your code, one improvement is to create the datetime.today() object once instead of three times. Check whether other parts, such as the transactionAPI call, can be optimised further.
#2
If you have a multi-core CPU machine, you can take advantage of it by spawning a thread per page. Refer to the modified code below, and the sketch after it.
import threading

def processRequest(tempPage):
    requestURL = transactionAPI.format(page=tempPage, limit=5000)
    response = requests.get(requestURL, headers=headers)
    json_data = json.loads(response.content)
    tempMomosTransactionHistory.extend(json_data["list"])

downloadStart = datetime.now()

while (True):
    # create a thread per page
    t1 = threading.Thread(target=processRequest, args=(tempPage, ))
    t1.start()
    # fetch the datetime.today() object once instead of three times
    datetimetoday = datetime.today()
    if(datetime.fromtimestamp(json_data["list"][-1]["crtime"]) < datetime(datetimetoday.year, datetimetoday.month, datetimetoday.day - dateRange)):
        break
    tempPage += 1

downloadEnd = datetime.now()
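Note that in the snippet above json_data is only assigned inside the worker thread and the threads are never joined, so the break condition in the main loop may read stale or missing data. Below is a more structured sketch of the same idea using the standard library's thread pool; it assumes transactionAPI, headers and dateRange are defined as in the question, uses a timedelta for the cutoff (the original day arithmetic breaks when day - dateRange drops below 1), and the batch size of 8 is an arbitrary illustration:
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
import json
import requests

cutoff = datetime.today() - timedelta(days=dateRange)

def fetch_page(page):
    url = transactionAPI.format(page=page, limit=5000)
    return json.loads(requests.get(url, headers=headers).content)["list"]

history = []
page = 1
batch = 8  # number of pages fetched concurrently per round
with ThreadPoolExecutor(max_workers=batch) as pool:
    while True:
        # Fetch a block of pages in parallel; map() preserves page order.
        blocks = list(pool.map(fetch_page, range(page, page + batch)))
        for block in blocks:
            history.extend(block)
        page += batch
        # Stop once the oldest transaction in the last fetched page is past the cutoff.
        if datetime.fromtimestamp(blocks[-1][-1]["crtime"]) < cutoff:
            break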

Use Python Multiprocess inside of a class taking way TOO long

I'm having a problem using multiprocessing with Python. I have two pieces of code. The first works well, but it sits outside of a class, and I need to put it inside a class because that class is part of a bigger program. When I do, the code takes 250 seconds to run instead of the 10 it takes outside the class.
The working code (without class) is:
import time
import spacy
from multiprocessing import Pool

nlp = spacy.load("en_core_web_md")
start_time = time.time()
doc1 = nlp(str("Data Scientist"))

def get_paralel_similarity(item):
    return doc1.similarity(nlp(item))

if __name__ == '__main__':
    pool = Pool()  # Create a multiprocessing Pool
    # df["jobs"] is a pandas Series of job titles, loaded from "complete.pkl" as shown further down
    similarities = pool.map(get_paralel_similarity, list(df["jobs"]))
    print("--- %s seconds ---" % (time.time() - start_time))
--- 10.971235990524292 seconds ---
You can see that it took less than 11 seconds to run. Without multiprocessing, the same process was taking 1 minute.
The problem is that doc1 is dynamic and I need to run this code a large number of times, so I need to put it in a class. The code I wrote with this objective is:
import time
import spacy
import warnings
import operator
from multiprocessing import Pool, set_start_method
from functools import partial
from tqdm import tqdm

warnings.filterwarnings("ignore")

nlp = spacy.load("en_core_web_md")


def get_paralel_similarity(tup):
    # doc1 = tup[0]
    return tup[0].similarity(nlp(tup[1]))


class Matcher(object):
    def __init__(self, **kwargs):
        self.word = kwargs.get('word')
        self.word_list = kwargs.get('word_list')
        self.n = kwargs.get('n')
        self.nlp = kwargs.get('nlp')

    def get_top_similarities(self):
        pool = Pool()
        similarities = {}
        doc1 = nlp(str(self.word))
        tup_list = []
        for i in tqdm(self.word_list):
            tup_list.append((doc1, i))
        start_time = time.time()
        similarities = pool.map(get_paralel_similarity, tup_list)
        pool.close()
        # pool.join()
        print("--- %s seconds ---" % (time.time() - start_time))
        simi = {}
        for i in tqdm(range(len(self.word_list))):
            simi[self.word_list[i]] = similarities[i]
        return sorted(simi.items(), key=operator.itemgetter(1), reverse=True)[:self.n]
When I do:
import pandas as pd
df = pd.read_pickle("complete.pkl")
matcher = Matcher(word="Data Scientist",word_list=list(df["jobs"]),n=5,nlp=nlp)
similarity = matcher.get_top_similarities()
--- 256.1134169101715 seconds ---
It's taking ~250 seconds. I would appreciate it if you could help me understand what is wrong.
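One plausible difference between the two versions is that the class-based code puts the spaCy doc1 object into every task tuple, so it is pickled and sent to the workers once per item, whereas the original version only shipped the item strings. A hedged sketch of how doc1 could instead be built once per worker with a Pool initializer (a hypothetical rewrite, not the original code; names like _init_worker are made up for illustration):
from multiprocessing import Pool
import spacy

_worker_nlp = None
_worker_doc1 = None

def _init_worker(word):
    # Runs once in each worker process: load the model and build doc1 there,
    # so neither object has to be pickled for every task.
    global _worker_nlp, _worker_doc1
    _worker_nlp = spacy.load("en_core_web_md")
    _worker_doc1 = _worker_nlp(str(word))

def _similarity(item):
    return _worker_doc1.similarity(_worker_nlp(item))

def top_similarities(word, word_list, n):
    with Pool(initializer=_init_worker, initargs=(word,)) as pool:
        sims = pool.map(_similarity, word_list)
    return sorted(zip(word_list, sims), key=lambda pair: pair[1], reverse=True)[:n]
Loading the model inside each worker has a one-off cost, so whether this helps depends on how many items each pool processes.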

Wrong speed test using time.clock() between synchronous and asynchronous cases?

I'm reading a tutorial about gevent, and it provides sample code demonstrating the synchronous and asynchronous cases:
import gevent
import random

def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(random.randint(0,2)*0.001)
    print('Task', pid, 'done')

def synchronous():
    for i in range(1,10):
        task(i)

def asynchronous():
    threads = [gevent.spawn(task, i) for i in xrange(1000)]
    gevent.joinall(threads)
This article explains that 'the order of execution in the async case is essentially random and that the total execution time in the async case is much less than the sync case'.
So I used the time module to test it:
print('Synchronous:')
start1 = time.clock()
synchronous()
end1 = time.clock()
print "%.2gs" % (end1-start1)
print('Asynchronous:')
start2 = time.clock()
asynchronous()
end2 = time.clock()
print "%.2gs" % (end2-start2)
However, the time run by 'asynchronous' is much longer than 'synchronous':
ubuntu@ip:/tmp$ python gevent_as.py
Synchronous:
0.32s
Asynchronous:
0.64s
ubuntu@ip:/tmp$ python gevent_as.py
Synchronous:
0.3s
Asynchronous:
0.61s
I want to know what's wrong with my test program. Thanks.
It is a problem with time.clock(), which doesn't behave as expected under Ubuntu (on Linux it measures CPU time rather than wall-clock time). See this link for details: Python - time.clock() vs. time.time() - accuracy?
I changed the test program:
print('Synchronous:')
start1 = time.time()
synchronous()
end1 = time.time()
print "%.2gs" % (end1-start1)
print('Asynchronous:')
start2 = time.time()
asynchronous()
end2 = time.time()
print "%.2gs" % (end2-start2)
Then the test speed of 'asynchronous' is much faster than 'synchronous':
ubuntu@ip:/tmp$ python gevent_as.py
Synchronous:
1.1s
Asynchronous:
0.057s
Probably the sleeps are very small and overhead matters. Try replacing 0.001 with 0.1.
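Note also that the snippets above time 9 tasks in synchronous() against 1000 greenlets in asynchronous(), so the two figures are not directly comparable. A sketch of a like-for-like comparison over the same workload, using time.time() as suggested above:
import time
import random
import gevent

def task(pid):
    # Same non-deterministic task as in the question, without the print noise.
    gevent.sleep(random.randint(0, 2) * 0.001)

def synchronous(n):
    for i in range(n):
        task(i)

def asynchronous(n):
    gevent.joinall([gevent.spawn(task, i) for i in range(n)])

for name, func in (("Synchronous", synchronous), ("Asynchronous", asynchronous)):
    start = time.time()
    func(1000)
    print("%s: %.2gs" % (name, time.time() - start))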
