Use Python Multiprocess inside of a class taking way TOO long

I have a problem using multiprocessing with Python. I have two versions of the code. The first works well, but it is outside of a class, and I need to put it inside a class because that class is part of a bigger program. When I do, the code takes about 250 seconds to run instead of about 10.
The working code (without a class) is:
import time
import spacy
from multiprocessing import Pool
nlp = spacy.load("en_core_web_md")
start_time = time.time()
doc1 = nlp(str("Data Scientist"))
def get_paralel_similarity(item):
    # doc1 is a module-level global, so only the item string is sent per task
    return doc1.similarity(nlp(item))
if __name__ == '__main__':
    pool = Pool()  # Create a multiprocessing Pool
    similarities = pool.map(get_paralel_similarity, list(df["jobs"]))
    print("--- %s seconds ---" % (time.time() - start_time))
--- 10.971235990524292 seconds ---
You can see that it took less than 11 seconds to run. Without multiprocessing, the same process was taking 1 minute.
The problem is that doc1 is dynamic and I need to run this code a large number of times, so I need to put it in a class. The code I wrote for that is:
import time
import spacy
import warnings
import operator
from multiprocessing import Pool, set_start_method
from functools import partial
from tqdm import tqdm
warnings.filterwarnings("ignore")
nlp = spacy.load("en_core_web_md")
def get_paralel_similarity(tup):
    # tup is (doc1, text)
    #doc1 = tup[0]
    return tup[0].similarity(nlp(tup[1]))
class Matcher(object):
    def __init__(self, **kwargs):
        self.word = kwargs.get('word')
        self.word_list = kwargs.get('word_list')
        self.n = kwargs.get('n')
        self.nlp = kwargs.get('nlp')
    def get_top_similarities(self):
        pool = Pool()
        similarities = {}
        doc1 = nlp(str(self.word))
        tup_list = []
        for i in tqdm(self.word_list):
            tup_list.append((doc1, i))
        start_time = time.time()
        similarities = pool.map(get_paralel_similarity, tup_list)
        pool.close()
        #pool.join()
        print("--- %s seconds ---" % (time.time() - start_time))
        simi = {}
        for i in tqdm(range(len(self.word_list))):
            simi[self.word_list[i]] = similarities[i]
        return sorted(simi.items(), key=operator.itemgetter(1), reverse=True)[:self.n]
When I do:
import pandas as pd
df = pd.read_pickle("complete.pkl")
matcher = Matcher(word="Data Scientist",word_list=list(df["jobs"]),n=5,nlp=nlp)
similarity = matcher.get_top_similarities()
--- 256.1134169101715 seconds ---
It takes ~250 seconds. I would appreciate help understanding what is wrong.

Related

Python multiprocessing finish the work correctly, but the processes still alive (Linux)

I use Python multiprocessing to compute scores on DNA sequences from a large file.
For that I wrote the script below.
I use a Linux machine with 48 CPUs in a Python 3.8 environment.
The code works fine: it finishes the work correctly and prints the processing time at the end.
Problem: when I run the htop command, I find that all 48 processes are still alive.
I don't know why, and I don't know what to add to my script to avoid this.
import csv
import sys
import datetime
import concurrent.futures
from itertools import combinations
import psutil
import time
nb_cpu = psutil.cpu_count(logical=False)
def fun_job(seq_1, seq_2): # seq_i : (id, string)
    start = time.time()
    score_dist = compute_score_dist(seq_1[1], seq_2[1])
    end = time.time()
    return seq_1[0], seq_2[0], score_dist, end - start # id seq1, id seq2, score, time
def help_fun_job(nested_pair):
    return fun_job(nested_pair[0], nested_pair[1])
def compute_using_multi_processing(list_comb_ids, dict_ids_seqs):
    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor(max_workers=nb_cpu) as executor:
        results = executor.map(help_fun_job,
                               [((pair_ids[0], dict_ids_seqs[pair_ids[0]]), (pair_ids[1], dict_ids_seqs[pair_ids[1]]))
                                for pair_ids in list_comb_ids])
    save_results_to_csv(results)
    finish = time.perf_counter()
    proccessing_time = str(datetime.timedelta(seconds=round(finish - start, 2)))
    print(f' Processing time Finished in {proccessing_time} hh:mm:ss')
def main():
    print("nb_cpu in this machine : ", nb_cpu)
    file_path = sys.argv[1]
    dict_ids_seqs = get_dict_ids_seqs(file_path)
    list_ids = list(dict_ids_seqs) # This will convert the dict_keys to a list
    list_combined_ids = list(combinations(list_ids, 2))
    compute_using_multi_processing(list_combined_ids, dict_ids_seqs)
if __name__ == '__main__':
    main()
Thank you for your help.
Edit: added the complete code for fun_job (after @Booboo's answer)
from Bio import Align
def fun_job(seq_1, seq_2): # seq_i : (id, string)
    start = time.time()
    aligner = Align.PairwiseAligner()
    aligner.mode = 'global'
    score_dist = aligner.score(seq_1[1], seq_2[1])
    end = time.time()
    return seq_1[0], seq_2[0], score_dist, end - start # id seq1, id seq2, score, time

When the with ... as executor: block exits, there is an implicit call to executor.shutdown(wait=True). This waits for all pending futures to finish executing "and the resources associated with the executor have been freed", which presumably includes terminating the processes in the pool (if possible?). Why your program terminates (or does it?), or at least why all the futures have completed while the processes have not terminated, is a bit of a mystery. But you haven't provided the code for fun_job, so who can say why this is so?
One thing you might try is switching to the multiprocessing.pool.Pool class from the multiprocessing module. It supports a terminate method, which is called implicitly when its context manager's with block exits and which explicitly attempts to terminate all processes in the pool:
#import concurrent.futures
import multiprocessing
... # etc.
def compute_using_multi_processing(list_comb_ids, dict_ids_seqs):
    start = time.perf_counter()
    with multiprocessing.Pool(processes=nb_cpu) as executor:
        results = executor.map(help_fun_job,
                               [((pair_ids[0], dict_ids_seqs[pair_ids[0]]), (pair_ids[1], dict_ids_seqs[pair_ids[1]]))
                                for pair_ids in list_comb_ids])
    save_results_to_csv(results)
    finish = time.perf_counter()
    proccessing_time = str(datetime.timedelta(seconds=round(finish - start, 2)))
    print(f' Processing time Finished in {proccessing_time} hh:mm:ss')
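To observe the behavior described above on a small scale, here is a minimal sketch that counts live child processes with multiprocessing.active_children() while the pool is open and again after the with block exits. The square worker and the function name count_children_during_and_after are hypothetical and not part of the question's script. The counts depend on the platform, so no expected values are claimed.

```python
import multiprocessing
import time


def square(x):
    """Trivial worker used only for the demonstration."""
    return x * x


def count_children_during_and_after(n_workers=2):
    with multiprocessing.Pool(processes=n_workers) as pool:
        results = pool.map(square, range(5))
        alive_during = len(multiprocessing.active_children())
    # Leaving the with block calls pool.terminate(); allow a short grace period
    deadline = time.time() + 5
    while multiprocessing.active_children() and time.time() < deadline:
        time.sleep(0.05)
    alive_after = len(multiprocessing.active_children())
    return results, alive_during, alive_after


if __name__ == "__main__":
    results, alive_during, alive_after = count_children_during_and_after()
    print("results:", results)
    print("children alive inside the with block:", alive_during)
    print("children alive after the with block:", alive_after)
```

Running the same check around the ProcessPoolExecutor version would show whether the workers really outlive the with block or whether htop is showing something else.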

Getting Pool apply_async return too slow

I'm trying to make a bot for IQ Option.
I already did it, but one pair at a time: I had to open 10 bots to check 10 pairs.
I've been trying all day with ThreadPool, threading, map and starmap (I don't think I used them as well as they can be used).
The thing is: I'm checking the values of pairs (EURUSD, EURAUD...) over the last 100 minutes. When I do it one by one, each call takes between 80 and 300 ms to return. I'm now trying to make all the calls at the same time and get their results, at around the same time, in their respective variables.
At the moment my code looks like this:
from iqoptionapi.stable_api import IQ_Option
from functools import partial
from multiprocessing.pool import ThreadPool as Pool
from time import *
from datetime import datetime, timedelta
import os
import sys
import dados #my login data
import config #atm is just payoutMinimo = 0.79
parAtivo = {}
class PAR:
    def __init__(self, par, velas):
        self.par = par
        self.velas = velas
        self.lucro = 0
        self.stoploss = 50000
        self.stopgain = 50000
def verificaAbertasPayoutMinimo(API, payoutMinimo):
    status = API.get_all_open_time()
    profits = API.get_all_profit()
    abertasPayoutMinimo = []
    for x in status['turbo']:
        if status['turbo'][x]['open'] and profits[x]['turbo'] >= payoutMinimo:
            abertasPayoutMinimo.append(x)
    return abertasPayoutMinimo
def getVelas(API, par, tempoAN, segundos, numeroVelas):
    return API.get_candles(par, tempoAN*segundos, numeroVelas, time()+50)
def logVelas(velas, par):
    global parAtivo
    parAtivo[par] = PAR(par, velas)
def verificaVelas(API, abertasPayoutMinimo, tempoAN, segundos, numeroVelas):
    pool = Pool()
    global parAtivo
    for par in abertasPayoutMinimo:
        print(f"Verificando par {par}")  # "Checking pair {par}"
        pool = Pool()
        if par not in parAtivo:
            callbackFunction = partial(logVelas, par=par)
            pool.apply_async(
                getVelas,
                args=(API, par, tempoAN, segundos, numeroVelas),
                callback=callbackFunction
            )
        pool.close()
        pool.join()
def main():
    tempoAN = 1
    segundos = 60
    numeroVelas = 20
    tempoUltimaVerificacao = datetime.now() - timedelta(days=99)
    global parAtivo
    conectado = False
    while not conectado:
        API = IQ_Option(dados.user, dados.pwd)
        API.connect()
        if API.check_connect():
            os.system("cls")
            print("Conectado com sucesso.")  # "Connected successfully."
            sleep(1)
            conectado = True
        else:
            print("Erro ao conectar.")  # "Error while connecting."
            sleep(1)
            conectado = False
    API.change_balance("PRACTICE")
    while True:
        if API.get_balance() < 2000:
            API.reset_practice_balance()
        if datetime.now() > tempoUltimaVerificacao + timedelta(minutes=5):
            abertasPayoutMinimo = verificaAbertasPayoutMinimo(API, config.payoutMinimo)
            tempoUltimaVerificacao = datetime.now()
        verificaVelas(API, abertasPayoutMinimo, tempoAN, segundos, numeroVelas)
        for item in parAtivo:
            print(parAtivo[item])
        break #execute only 1 time for testing
if __name__ == "__main__":
    main()
Edit 1: added more info; this is actually the whole code right now.
Edit 2: when I print it like this:
for item in parAtivo:
    print(parAtivo[item].velas[-1]['close'])
I get:
0.26671
0.473878
0.923592
46.5628
1.186974
1.365679
0.86263
The values are correct. The problem is that it takes too long, almost 3 seconds, the same as if I were doing it without ThreadPool.

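Solved.
I did it using threading.Thread, like this:
import threading
threads = []
for par in abertasPayoutMinimo:
    t = threading.Thread(
        target=getVelas,
        args=(API, par, tempoAN, segundos, numeroVelas)
    )
    t.start()
    threads.append(t)
for t in threads:  # join only after every thread has been started
    t.join()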
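A thread started with threading.Thread discards the return value of its target, so the candles still have to be stored somewhere. As a hedged alternative, here is a minimal sketch using concurrent.futures.ThreadPoolExecutor that submits every request first and then collects the results into a dict keyed by pair. The function fetch_candles is a hypothetical stand-in for API.get_candles that only simulates latency, and whether the real IQ Option client is safe to call from several threads is an assumption not verified here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_candles(pair, n_candles):
    """Hypothetical stand-in for API.get_candles: simulates a slow network call."""
    time.sleep(0.1)
    return [{"pair": pair, "close": float(i)} for i in range(n_candles)]


def fetch_all(pairs, n_candles, max_workers=10):
    """Fetch candles for every pair concurrently; returns {pair: candles}."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every request before waiting on any of them
        futures = {pair: executor.submit(fetch_candles, pair, n_candles) for pair in pairs}
        return {pair: future.result() for pair, future in futures.items()}


if __name__ == "__main__":
    pairs = ["EURUSD", "EURAUD", "GBPUSD", "USDJPY"]
    start = time.perf_counter()
    candles = fetch_all(pairs, n_candles=20)
    elapsed = time.perf_counter() - start
    for pair in pairs:
        print(pair, candles[pair][-1]["close"])
    print(f"Fetched {len(pairs)} pairs in {elapsed:.2f}s")
```

The key point is that a single pool is created once and nothing is waited on until all requests are in flight.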

Python parallel library takes longer than sequential execution

I am trying to leverage multiprocessing by using the Parallel class from joblib. However, strangely, I see that the parallel version takes longer than sequential execution. Below is the code I am running for the comparison:
import time
from joblib import Parallel, delayed
def compute_features(summary, article):
    feature_dict = {}
    feature_dict["f1"] = summary
    feature_dict["f2"] = article
    return feature_dict
def construct_input(n):
    summaries = []
    articles = []
    for i in range(n):
        summaries.append("summary_" + str(i))
        articles.append("articles_" + str(i))
    return summaries, articles
def sequential_test(n):
    print("Sequential test")
    start_time = time.time()
    summaries, articles = construct_input(n)
    feature_list = []
    for i in range(n):
        feature_list.append(compute_features(summaries[i], articles[i]))
    total_time = time.time() - start_time
    print("Total Time Sequential : %s" % total_time)
    # print(feature_list)
def parallel_test(n):
    print("Parallel test")
    start_time = time.time()
    summaries, articles = construct_input(n)
    feature_list = []
    executor = Parallel(n_jobs=8, backend="multiprocessing", prefer="processes", verbose=True)
    # executor = Parallel(n_jobs=4, prefer="threads")
    tasks = (delayed(compute_features)(summaries[i], articles[i]) for i in range(n))
    results = executor(tasks)
    for result in results:
        feature_list.append(result)
    total_time = time.time() - start_time
    print("Total Time Parallel : %s" % total_time)
    # print(feature_list)
if __name__ == "__main__":
    n = 500000
    sequential_test(n)
    parallel_test(n)
I get the following output when I run the code above
Sequential test
Total Time Sequential : 1.200118064880371
Parallel test
[Parallel(n_jobs=8)]: Using backend MultiprocessingBackend with 8 concurrent workers.
[Parallel(n_jobs=8)]: Done 56 tasks | elapsed: 0.0s
[Parallel(n_jobs=8)]: Done 49136 tasks | elapsed: 1.0s
[Parallel(n_jobs=8)]: Done 500000 out of 500000 | elapsed: 4.7s finished
Total Time Parallel : 5.427206039428711
I am running this code on a mac with the following configuration
Can you guys please help me understand why this is so? And if the hardware were to change say to use a GPU would the code be any faster? Appreciate your responses. Thanks in advance.
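In the timings above, each task only builds a two-key dict, so the cost of shipping 500,000 tiny tasks to worker processes and shipping the results back can plausibly outweigh the work itself. The following sketch shows one common mitigation: grouping inputs into larger batches so each inter-process call carries more work. It uses the standard library's ProcessPoolExecutor rather than joblib, and chunked, compute_features_batch and parallel_features are hypothetical helpers introduced for illustration. For work this cheap, the sequential loop may still be faster.

```python
from concurrent.futures import ProcessPoolExecutor


def compute_features(summary, article):
    return {"f1": summary, "f2": article}


def compute_features_batch(pairs):
    """Process a whole batch in one worker call to amortize the transfer cost."""
    return [compute_features(summary, article) for summary, article in pairs]


def chunked(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_features(summaries, articles, n_workers=4, batch_size=10_000):
    batches = chunked(list(zip(summaries, articles)), batch_size)
    feature_list = []
    with ProcessPoolExecutor(max_workers=n_workers) as executor:
        # map() yields batch results in the same order as the input batches
        for batch_result in executor.map(compute_features_batch, batches):
            feature_list.extend(batch_result)
    return feature_list


if __name__ == "__main__":
    n = 50_000
    summaries = ["summary_" + str(i) for i in range(n)]
    articles = ["articles_" + str(i) for i in range(n)]
    features = parallel_features(summaries, articles)
    print(len(features), features[0])
```

Timing this against the sequential loop for a few batch sizes would show where, if anywhere, the parallel version starts to pay off for a given workload.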

Process start duration between Python3 and Python2

I observe a significant difference in the time needed to start a series of processes between Python 3.5 and Python 2.7.
In the code below, if CRITICAL = 8, performance is almost identical in Py2 and Py3 (<1s). But for 9+, performance in Py2 remains unchanged whereas in Py3 it becomes much worse (~1min!).
It seems linked to the size of the args I give to the process...
UPDATE: it's also linked to the location of the module. Indeed, if it's run from "C:\" (or a short path), then Py3 is similar to Py2. But if it's run from a very long path, performance in Py3 is badly degraded, whereas it remains unchanged in Py2.
from __future__ import print_function
from multiprocessing import Process
import time
import itertools
def workerTask(inputs):
    for _ in itertools.product(*inputs):
        pass
if __name__ == '__main__':
    CRITICAL = 9 # OK for 8-, KO for 9+
    start = time.time()
    ARGS = [["123.4567{}".format(i) for i in range(CRITICAL)] for _ in range(10)]
    workerPool = [Process(target=workerTask, args=(ARGS,)) for _ in range(15)]
    for idx, w in enumerate(workerPool):
        print("...Starting process #{} after {}".format(idx + 1, time.time() - start))
        w.start()
    print("ALL PROCESSES STARTED in {}!".format(time.time() - start))

I've found an alternative that seems well suited to multi-process work.
This way, in Py3, the time to launch N processes remains similar to Py2.
Instead of providing huge args to each process, I create a shared object, registered with a BaseManager, in which I store the large data needed by the processes.
Furthermore, I can also store shared progress or any data computed by each process, to continue afterwards and use it. I really like this solution.
Here is the code:
from __future__ import print_function
import time
import itertools
from multiprocessing import Process
from multiprocessing.managers import BaseManager
def workerTask(sharedSandbox):
    # sharedSandbox is a proxy; the data is fetched from the manager process here
    inputs = sharedSandbox.getARGS()
    for _ in itertools.product(*inputs):
        pass
class _SharedData(object):
    def __init__(self, data):
        self.__myARGS = data
    def getARGS(self):
        return self.__myARGS
class _GlobalManager(BaseManager):
    BaseManager.register('SharedData', _SharedData)
if __name__ == '__main__':
    CRITICAL = 9 # OK for 8-, KO for 9+
    start = time.time()
    manager = _GlobalManager()
    manager.start()
    ARGS = manager.SharedData([["123.4567{}".format(i) for i in range(CRITICAL)] for _ in range(10)])
    workerPool = [Process(target=workerTask, args=(ARGS,)) for _ in range(15)]
    for idx, w in enumerate(workerPool):
        print("...Starting process #{} after {}".format(idx + 1, time.time() - start))
        w.start()
    print("ALL PROCESSES STARTED in {}!".format(time.time() - start))
    for w in workerPool:  # wait for all workers without busy-waiting
        w.join()

Python Multiple requests

I need to make multiple requests in a scheduler job to check the live status of 1000 users at a time. But the server limits each API request to a maximum of 50 users. So with the following for-loop approach, it takes around 66 seconds for 1000 users (i.e. for 20 API calls).
import requests
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()
def shcdulerjob():
    """
    """
    uidlist = todays_userslist() #Get around 1000 users from table
    #-- DIVIDE LIST BY GIVEN SIZE (here 50)
    split_list = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)]
    idlists = split_list(uidlist, 50) # SERVER MAX LIMIT - 50 ids/request
    for idlist in idlists:
        apiurl = some_server_url + "&ids="+str(idlist)
        resp = requests.get(apiurl)
        save_status(resp.json()) #-- Save status to db
if __name__ == "__main__":
    sched.add_job(shcdulerjob, 'interval', minutes=10)
    sched.start()
So:
Is there any way to reduce the time required to fetch from the API?
Does Python's APScheduler provide any multiprocessing option to process such API requests in a single job?

You could try Python's thread pool from the concurrent.futures module, if the server allows concurrent requests. That way you would parallelise the processing instead of the scheduling itself.
There are some good examples provided in the documentation (if you're using Python 2, there is a sort of equivalent module).
e.g.
import concurrent.futures
import multiprocessing
import requests
import time
import json
cpu_start_time = time.process_time()
clock_start_time = time.time()
queue = multiprocessing.Queue()
uri = "http://localhost:5000/data.json"
users = [str(user) for user in range(1, 50)]
with concurrent.futures.ThreadPoolExecutor(multiprocessing.cpu_count()) as executor:
    # executor.map yields responses in the same order as users
    for user_id, result in zip(
        users,
        executor.map(lambda x: requests.get(uri, params={"id": x}).content, users)
    ):
        queue.put((user_id, result))
while not queue.empty():
    user_id, rs = queue.get()
    print("User ", user_id, json.loads(rs.decode()))
cpu_end_time = time.process_time()
clock_end_time = time.time()
print("Took {0:.03}s [{1:.03}s]".format(cpu_end_time-cpu_start_time, clock_end_time-clock_start_time))
If you want to use a process pool, just make sure you don't use shared resources, e.g. a queue, and write your data out independently.
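Applied to the batches of 50 ids from the question, the thread pool idea might look like the sketch below. It is a sketch under assumptions: fetch_status is a hypothetical stand-in for requests.get(apiurl).json() that only simulates latency, the status values are invented for the demonstration, and max_workers=5 is an arbitrary choice that would need to respect whatever concurrency the server tolerates.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def split_list(lst, size):
    """Divide a list into consecutive batches of at most `size` items."""
    return [lst[i:i + size] for i in range(0, len(lst), size)]


def fetch_status(idlist):
    """Hypothetical stand-in for requests.get(apiurl).json(); simulates latency."""
    time.sleep(0.05)
    return {uid: "online" if uid % 2 == 0 else "offline" for uid in idlist}


def fetch_all_statuses(uidlist, batch_size=50, max_workers=5):
    statuses = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() runs the batches concurrently and yields results in input order
        for batch_status in executor.map(fetch_status, split_list(uidlist, batch_size)):
            statuses.update(batch_status)  # merging happens in the calling thread
    return statuses


if __name__ == "__main__":
    start = time.perf_counter()
    statuses = fetch_all_statuses(list(range(1000)))
    print(len(statuses), f"statuses in {time.perf_counter() - start:.2f}s")
```

Because the results are merged in the calling thread, a database write such as save_status could stay single-threaded while only the network calls overlap.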
