Store partial results in dask delayed iterations - python

I have a long-running iterative process in which I run a calculation and, every x iterations, store the results to a DB.
For example, iterate the fun() function over range(20) and save every 5 results with save_results:
import time

def fun(x):
    time.sleep(0.1 * x)
    return 0.1 * x

def save_results(result):
    # originally stores the new data to the DB
    print(result)  # print as an example

result = []
for i in range(20):
    result.append(fun(i))
    if i % 5 == 4:
        save_results(result[-5:])
I want to parallelize the process with dask's delayed and compute methods. But if I run it as in the following example, save_results is called before the compute:
import dask as da

result = []
for i in range(20):
    result.append(da.delayed(fun)(i))
    if i % 5 == 4:
        save_results(result[-5:])
result = da.compute(result)[0]
and therefore, instead of storing the results every 5 iterations, I store a list of delayed objects:
[Delayed('fun-202f7e28-e594-4926-a5cd-5931dbc99d6b'),
Delayed('fun-d2bf2bc9-a4f3-46d7-adb7-84114a68b482'),
Delayed('fun-c34f2c04-3e25-47fa-8165-1ee7c786aaf6'),
Delayed('fun-a4edd3fc-442d-4ec1-8a0e-320bd9315a61'),
Delayed('fun-c7b48e2c-cb66-472e-85c5-fe6c595fa1ec')]
How can I overcome the issue, and store every 5 new results to DB?

You should delay any function calls that operate on delayed objects.
Sequential code
result = []
for i in range(20):
    result.append(fun(i))
    if i % 5 == 4:
        save_results(result[-5:])
Parallel code
import dask

def fun(x):
    ...

results = []
side_effects = []
for i in range(20):
    result = dask.delayed(fun)(i)
    results.append(result)
    if i % 5 == 4:
        value = dask.delayed(save_results)(results[-5:])
        side_effects.append(value)

dask.compute(results + side_effects)
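A minimal runnable sketch of the same pattern, reusing the question's fun and save_results (the final slicing into values is my own addition, just to separate the computed numbers from the save side effects):
import time
import dask

def fun(x):
    time.sleep(0.1 * x)
    return 0.1 * x

def save_results(batch):
    print(batch)  # stand-in for the DB write

results = []
side_effects = []
for i in range(20):
    results.append(dask.delayed(fun)(i))
    if i % 5 == 4:
        # delaying the save turns it into a task that depends on its 5-result batch
        side_effects.append(dask.delayed(save_results)(results[-5:]))

# computing both lists together schedules the saves alongside the calculations,
# so a batch can be written while later fun() calls are still running
computed = dask.compute(results + side_effects)[0]
values = computed[:20]  # the fun() outputs; the remaining entries are save_results' None returns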

Related

Running a Function with Arguments using Multiprocessing, and Storing the Result

Say I have a program that contains a simulation-based function which takes some time to compute.
def foo_func(args):
    # some calculations
    return foo  # df

res = {}  # will be a dictionary of dfs
for i in range(n):
    res[i] = foo_func(args)
Problem: Calculating foo using foo_func n times takes too long
Question: How do I implement multiprocessing/multithreading within the program and store the results in res?
Note that:
foo_func takes in args
Order does not matter within res: the order in which the jobs finish does not matter, as long as all of the jobs are correctly stored in res.
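No answer is shown for this question here; below is a minimal sketch of one common approach with concurrent.futures (the executor choice and the placeholder foo_func body are my assumptions, not from the original post):
import concurrent.futures

def foo_func(args):
    # placeholder for the simulation; returns a df in the real program
    return args

def run_all(n, args):
    res = {}
    # ProcessPoolExecutor sidesteps the GIL for CPU-bound simulations;
    # ThreadPoolExecutor would do if foo_func were I/O-bound instead.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = {executor.submit(foo_func, args): i for i in range(n)}
        for future in concurrent.futures.as_completed(futures):
            res[futures[future]] = future.result()  # keyed by i, so completion order does not matter
    return res

if __name__ == '__main__':
    print(run_all(4, 10))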

Reduce time for Loop over multiple objects to get value Django

I have a DataFrame in which a Code column grows significantly every day, and these codes have to be converted into object descriptions, for which I am doing something like the following:
product = []
beacon = []
count = []
c_start = time.time()
for i, v in df["D Code"].iteritems():
    product.append(Product.objects.get(short_code=v[:2]).description)  # how to optimize this?
    beacon.append("RFID")
    count.append(v[-5:])
c_end = time.time()
print("D Code loop time ", c_end - c_start)
Initially, when there were few rows, it ran almost instantly, but as the data grew, making a database call for every code became far too slow. Is there a more efficient Django way to loop over a list and get the values?
The df['D Code'] column looks something like this:
['TRRFF.1T22AD0029',
'TRRFF.1T22AD0041',
'TRRFF.1T22AD0009',
'TRRFF.1T22AD0032',
'TRRFF.1T22AD0028',
'TRRFF.1T22AD0026',
'TRRFF.1T22AD0040',
'HTRFF.1T22AD0003',
'TRRFF.1T22AD0048',
'PPRFP.1T22AD0017',
'TRRFF.1T22AD0047',
'TRRFF.1T22AD0005',
'TRRFF.1T22AD0033',
'TRRFF.1T22AD0024',
'TRRFF.1T22AD0042'],
You can create a lookup dict with just one query. Then use that dict to find your description.
description_dict = {}
for p in Product.objects.values('short_code', 'description'):
    description_dict[p['short_code']] = p['description']

for i, v in df["D Code"].iteritems():
    product.append(description_dict[v[:2]])
    ...
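A further (untested) variation along the same lines is to let pandas do the mapping in bulk once the lookup dict exists, instead of looping in Python; this reuses the question's df and Product:
# One DB query to build the lookup, then vectorized mapping over the column.
description_dict = dict(Product.objects.values_list('short_code', 'description'))

product = df["D Code"].str[:2].map(description_dict).tolist()
beacon = ["RFID"] * len(df)
count = df["D Code"].str[-5:].tolist()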

Python - Big For Loop

I'm running a very big for loop, and I'll try to explain how it works. There are 4320 matrices (40x80 each) that were taken from a MATLAB file.
The loop takes one matrix at a time: it assigns to each value the right values of H and T. Once finished, it passes to the next matrix, and so on.
The resulting dataframe is then written to a CSV file, which is needed to build a database of wave energy converter productivity.
The problem is that this code has been running for 9 days and is only about halfway through the total computation. Is there any way to drastically reduce the computation time?
indice_4 = 0
configuration_id = -1
n_configurations = 4320
for z in range(0, n_configurations, 1):  # iteration on all the configurations
    print(z)
    power_matrix = P_mat[z]
    energy_wave_period_converted = pd.DataFrame([], columns=['energy_wave_period'])
    H_start = 0.25
    H_end = 10
    H_step = 0.25
    T_start = 3
    T_end = 17
    T_step = 0.177
    y = T_start
    relative_direction = int(direc[z])
    if relative_direction == 0:
        configuration_id = configuration_id + 1
        print(configuration_id)
    r = 0  # r = row
    c = 0  # c = column
    while y <= T_end:
        energy_wave_period = float('%.2f' % y)
        x = H_start  # initialize on the right wave heights
        r = 0
        while x <= H_end:
            significant_wave_height = float('%.2f' % x)
            average_power = float('%.2f' % power_matrix[r, c])
            new_line_4 = pd.Series([indice_4, configuration_id, significant_wave_height, energy_wave_period, relative_direction, average_power], index=['id', 'configuration_id', 'significant_wave_height', 'energy_wave_period', 'relative_direction', 'average_output_power'])
            seastate_productivity = seastate_productivity.append([new_line_4], ignore_index=True)
            indice_4 = indice_4 + 1
            r = r + 1
            x = x + H_step
        c = c + 1
        y = y + T_step
seastate_productivity.to_csv('seastate_productivity.csv', index=False, sep=';')
One of the main things slowing your code down is that you do pandas operations inside the loop. Specifically, using pd.Series and pd.DataFrame.append in a loop that runs over 12 million times really slows you down. When using pandas you should aim to vectorize your operations (i.e., perform them in batches). When I tried your original code, every iteration took about 4 seconds, and the time increased gradually. After removing the pd.append, every iteration only took about 0.5 seconds, and after removing the pd.Series it dropped even more.
I made some improvements by collecting the data in a list and building the dataframe in one go at the end; this version ran to completion in about 2 minutes on my laptop:
import time
import numpy as np
import pandas as pd

# Generate random data for testing
P_mat = np.random.rand(4320, 40, 80)
direc = np.random.rand(4320)

H_start = 0.25
H_end = 10
H_step = 0.25
T_start = 3
T_end = 17
T_step = 0.177

indice_4 = 0
configuration_id = -1
n_configurations = 4320

data = []

# Time it
t0 = time.perf_counter()
for z in range(n_configurations):
    power_matrix = P_mat[z]
    print(z)
    y = T_start
    relative_direction = int(direc[z])
    if relative_direction == 0:
        configuration_id = configuration_id + 1
    r = 0  # r = row
    c = 0  # c = column
    while y <= T_end:
        energy_wave_period = float('%.2f' % y)
        x = H_start  # initialize on the right wave heights
        r = 0
        while x <= H_end:
            significant_wave_height = float('%.2f' % x)
            average_power = float('%.2f' % power_matrix[r, c])
            # Save data to list
            new_line_4 = [indice_4, configuration_id, significant_wave_height, energy_wave_period, relative_direction, average_power]
            data.append(new_line_4)  # Append to create a list of lists
            indice_4 = indice_4 + 1
            r = r + 1
            x = x + H_step
        c = c + 1
        y = y + T_step

# Make dataframe from list of lists
seastate_productivity = pd.DataFrame.from_records(data, columns=['id', 'configuration_id', 'significant_wave_height', 'energy_wave_period', 'relative_direction', 'average_output_power'])

# Save data
seastate_productivity.to_csv('seastate_productivity.csv', index=False, sep=';')

# Print time it took
print("Done in:", time.perf_counter() - t0)
You could probably still optimize this solution, by moving the rounding from the loop to outside, by rounding the pandas columns. Also, since you are only moving data around, there is probably also a completely vectorized solution (without a loop) but this is probably sufficient for you.
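For example, the rounding could be moved out of the loop by appending the raw floats and rounding the relevant columns once at the end (a sketch, assuming the seastate_productivity dataframe built above):
# Round the numeric columns in one vectorized call instead of float('%.2f' % ...) per element.
cols = ['significant_wave_height', 'energy_wave_period', 'average_output_power']
seastate_productivity[cols] = seastate_productivity[cols].round(2)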
A way to find out what the issue is with slow code is by timing portions of code. You can use the timeit module, or the time module like I used. You can then isolate lines of code, and run them and analyse the performance.
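For instance, a tiny timeit sketch for comparing two candidate versions of a single line (the statements are illustrative, not taken from the answer):
import timeit

# Per-call cost of string-formatting-based rounding vs the built-in round()
fmt_round = timeit.timeit("float('%.2f' % 3.14159)", number=1_000_000)
builtin_round = timeit.timeit("round(3.14159, 2)", number=1_000_000)
print(fmt_round, builtin_round)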
You should consider using numpy. Using numpy's matrix operations you should be able to reduce computation time.
I also suggest you dig into concurrent.futures.
It specifically lets you run tasks in parallel and reduce the run time.
You need to convert your code into a function and then submit it to the executor, one element at a time.
The concurrent.futures module provides a high-level interface for asynchronously executing callables.
The asynchronous execution can be performed with threads, using ThreadPoolExecutor, or separate processes, using ProcessPoolExecutor.
https://docs.python.org/3/library/concurrent.futures.html
Here is a textbook example:
import concurrent.futures

nums = range(10)

def f(x):
    return x * x

def main():
    print([val for val in map(f, nums)])
    with concurrent.futures.ProcessPoolExecutor() as executor:
        print([val for val in executor.map(f, nums)])

if __name__ == '__main__':
    main()
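Applied to this question, a rough adaptation (mine, not the answerer's) would wrap the per-configuration work in a function and map it over the configuration indices; process_configuration below is only a placeholder for the inner H/T loops, and the sequential configuration_id bookkeeping is left out:
import concurrent.futures
import numpy as np

P_mat = np.random.rand(4320, 40, 80)  # stand-in; in practice each worker would load the MATLAB data

def process_configuration(z):
    # placeholder: build and return the rows (list of lists) for configuration z
    power_matrix = P_mat[z]
    return [[z, round(float(power_matrix[r, c]), 2)] for r in range(40) for c in range(80)]

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        chunks = executor.map(process_configuration, range(4320), chunksize=64)
        data = [row for rows in chunks for row in rows]
    print(len(data))

if __name__ == '__main__':
    main()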

code is faster on single cpu but very slow on multiple processes why?

I have some code that sorts values originally stored in a sparse matrix and zips them together with other data. I applied some optimizations, and the code below is already 20x faster than it was:
This code takes 8 s on a single CPU core:
# cosine_sim is a sparse csr matrix
# names is a numpy array of length 400k
cosine_sim_labeled = []
for i in range(0, cosine_sim.shape[0]):
    row = cosine_sim.getrow(i).toarray()[0]
    non_zero_sim_indexes = np.nonzero(row)
    non_zero_sim_values = row[non_zero_sim_indexes]
    non_zero_sim_values = [round(freq, 4) for freq in non_zero_sim_values]
    non_zero_names_values = np.take(names, non_zero_sim_indexes)[0]
    zipped = zip(non_zero_names_values, non_zero_sim_values)
    cosine_sim_labeled.append(sorted(zipped, key=lambda cv: -cv[1])[1:][:top_similar_count])
But if I run the same code on multiple cores (to make it even faster), it takes 300 seconds:
# split is an array of arrays of numbers like [[1,2,3], [4,5,6]];
# it generates batches of array indexes to be processed by each parallel process
split = np.array_split(range(0, cosine_sim.shape[0]), cosine_sim.shape[0] / batch)

def sort_rows(split):
    cosine_sim_labeled = []
    for i in split:
        row = cosine_sim.getrow(i).toarray()[0]
        non_zero_sim_indexes = np.nonzero(row)
        non_zero_sim_values = row[non_zero_sim_indexes]
        non_zero_sim_values = [round(freq, 4) for freq in non_zero_sim_values]
        non_zero_names_values = np.take(names, non_zero_sim_indexes)[0]
        zipped = zip(non_zero_names_values, non_zero_sim_values)
        cosine_sim_labeled.append(sorted(zipped, key=lambda cv: -cv[1])[1:][:top_similar_count])
    return cosine_sim_labeled

# this ensures parallel CPU execution
rows = Parallel(n_jobs=CPU_use, verbose=40)(delayed(sort_rows)(x) for x in split)
cosine_sim_labeled = np.vstack(rows).tolist()
You do realize that your new parallel function sort_rows does not even use the split argument? All it does is distribute all the data to all the processes, which takes time; then each process does the exact same calculation, only to return the whole data back to the main process, which again takes time.
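One way to act on that observation (a sketch under my own assumptions, reusing the question's cosine_sim, names, top_similar_count and CPU_use) is to slice the sparse matrix in the parent and ship each worker only its own rows plus the names array, instead of relying on the full globals:
import numpy as np
from joblib import Parallel, delayed

def sort_chunk(chunk, names, top_similar_count):
    # 'chunk' is a small csr_matrix containing only this worker's rows
    labeled = []
    for i in range(chunk.shape[0]):
        row = chunk.getrow(i).toarray()[0]
        idx = np.nonzero(row)
        values = [round(freq, 4) for freq in row[idx]]
        chunk_names = np.take(names, idx)[0]
        labeled.append(sorted(zip(chunk_names, values), key=lambda cv: -cv[1])[1:][:top_similar_count])
    return labeled

# Split the row range and slice the matrix per worker; csr row slicing is cheap.
splits = np.array_split(np.arange(cosine_sim.shape[0]), CPU_use)
rows = Parallel(n_jobs=CPU_use)(
    delayed(sort_chunk)(cosine_sim[s[0]:s[-1] + 1], names, top_similar_count) for s in splits
)
cosine_sim_labeled = [item for chunk in rows for item in chunk]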

How to reduce for loop execution time using multiprocessing in python

I have two lists. List X contains 1000 words. List Y contains 500 words. I am trying to find similar words for List X with respect to Y.
I am using spaCy's similarity function.
The problem I am facing is that the for loop part of the execution takes a long time. From my research I understand that in Python, multithreading only gives an illusion of concurrency and hence brings no real performance increase. So I think multiprocessing is the way to go, but I am new to multiprocessing, hence this request for help.
How do I speed up the execution of the for loop through multiprocessing in Python?
The following is my code.
from operator import itemgetter

import en_vectors_web_lg
nlp = en_vectors_web_lg.load()

ListX = ['HSBC', 'JP Morgan', ......]  # 500 words list
ListY = ['Currency', 'Blockchain', .......]  # 1000 words list

s_words = []
for token1 in ListY:
    list_to_sort = []
    for token2 in ListX:
        list_to_sort.append((token1, token2, nlp(str(token1)).similarity(nlp(str(token2)))))
    sorted_list = sorted(list_to_sort, key=itemgetter(2), reverse=True)[0][:2]
    s_words.append(sorted_list)
You can try this:
import itertools
from multiprocessing import Pool

import en_vectors_web_lg
nlp = en_vectors_web_lg.load()

def compare_function(token1, token2, nlp):
    return token1, token2, nlp(str(token1)).similarity(nlp(str(token2)))

tokenlist = [(a, b, nlp) for a, b in itertools.product(ListX, ListY)]
p = Pool(8)
results = p.starmap(compare_function, tokenlist)  # starmap unpacks each (token1, token2, nlp) tuple
If you are on Windows, guard the call with:
if __name__ == '__main__':
    results = p.starmap(compare_function, tokenlist)
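One pitfall with the snippet above is that the loaded nlp model gets pickled into every task tuple; a sketch of an alternative (my own, with the same ListX/ListY assumed to exist) loads the vectors once per worker via the Pool initializer:
import itertools
from multiprocessing import Pool

_nlp = None

def init_worker():
    # Load the vectors once per worker process instead of shipping nlp with every task.
    global _nlp
    import en_vectors_web_lg
    _nlp = en_vectors_web_lg.load()

def compare_function(token1, token2):
    return token1, token2, _nlp(str(token1)).similarity(_nlp(str(token2)))

if __name__ == '__main__':
    pairs = list(itertools.product(ListX, ListY))
    with Pool(8, initializer=init_worker) as p:
        results = p.starmap(compare_function, pairs)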
