Why is parallelized code with concurrent.futures slower than regular code? - python

I tried parallelizing with concurrent.futures, expecting the parallelized code to be faster.
I wrote a baseline script to test the parallelization. What the code does is not important; I'm mainly interested in the speed of the baseline code versus the parallelized code. All it does is calculate the correlation between the lists in sigs and data_mat and store the values in corr_coefs. You can see the plain code below:
from time import time
import numpy as np
sigs = [
[91, 43, 44, 49, 64, 37, 61, 31, 73],
[59, 94, 91, 12, 47, 44, 93, 7, 84],
[47, 76, 24, 87, 2, 83, 77, 60, 36],
[83, 68, 3, 49, 14, 12, 51, 36, 22]
]
data_mat = [
[83, 68, 3, 49, 14, 12, 51, 36, 22],
[8, 78, 44, 40, 39, 67, 63, 64, 34],
[49, 24, 77, 91, 66, 44, 83, 30, 99],
[97, 40, 69, 7, 24, 70, 63, 52, 81],
[26, 62, 53, 36, 72, 54, 85, 94, 31],
[99, 52, 87, 52, 50, 9, 22, 72, 62],
[91, 15, 54, 84, 89, 15, 43, 31, 9],
[39, 26, 36, 81, 65, 50, 67, 12, 19],
[67, 22, 86, 24, 38, 30, 45, 94, 44],
# etc.
]
execution_time_start = time()
corr_coefs = []
for sig in sigs:
    for data_mat_row in data_mat:
        corr = np.corrcoef(np.square(sig), np.square(data_mat_row))
        corr_coefs.append(corr[0, 1])
execution_time_end = time()
elapsed_time = execution_time_end - execution_time_start
print(f'Execution time (without parallelization): = {elapsed_time:.20f} s')
I tried to parallelize this code using concurrent.futures. The data_mat and sigs lists are the same (I just rewrote the loop):
from time import time
import numpy as np
import concurrent.futures
sigs = [
[91, 43, 44, 49, 64, 37, 61, 31, 73],
[59, 94, 91, 12, 47, 44, 93, 7, 84],
[47, 76, 24, 87, 2, 83, 77, 60, 36],
[83, 68, 3, 49, 14, 12, 51, 36, 22]
]
data_mat = [
[83, 68, 3, 49, 14, 12, 51, 36, 22],
[8, 78, 44, 40, 39, 67, 63, 64, 34],
[49, 24, 77, 91, 66, 44, 83, 30, 99],
[97, 40, 69, 7, 24, 70, 63, 52, 81],
[26, 62, 53, 36, 72, 54, 85, 94, 31],
[99, 52, 87, 52, 50, 9, 22, 72, 62],
[91, 15, 54, 84, 89, 15, 43, 31, 9],
[39, 26, 36, 81, 65, 50, 67, 12, 19],
[67, 22, 86, 24, 38, 30, 45, 94, 44],
# etc.
]
execution_time_start = time()
corr_coefs = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    # Map each submitted future back to the (sig, data_mat_row) pair it was created for.
    future_corr_coefs = {
        executor.submit(np.corrcoef, np.square(sig), np.square(data_mat_row)): (sig, data_mat_row)
        for sig in sigs for data_mat_row in data_mat
    }
    for future in concurrent.futures.as_completed(future_corr_coefs):
        sig, data_mat_row = future_corr_coefs[future]
        corr = future.result()
        corr_coefs.append(corr[0, 1])
execution_time_end = time()
elapsed_time = execution_time_end - execution_time_start
print(f'Execution time (with parallelization): = {elapsed_time:.20f} s')
I expected the rewritten code to be faster, but I got these outputs:
Execution time (without parallelization): = 1.30910301208496093750 s
Execution time (with parallelization): = 2.38465380668640136719 s
I also tried a larger data set by expanding the list data_mat, but the parallelized code is still slower. Does anyone have any advice that would help? I suspect it might be overhead, but I am not able to explain how...

I found the answer. The parallelized code can be faster, but sigs and data_mat have to be much larger for it to pay off. If the input data set is small, it is pointless to use concurrent.futures, because the overhead of parallelizing the code increases the total computation time. If the data set is large and the code inside the loops is more complex, the parallelized version is faster.
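For inputs this small, another way to cut the per-pair overhead is to skip the loop entirely. The following is a minimal sketch of a vectorized alternative (not the parallel approach from the question): np.corrcoef accepts two 2-D arrays whose rows are treated as variables, so one call yields every pairwise coefficient. The shortened sigs and data_mat and the names corr_block, loop_coefs and vectorized_coefs are illustrative assumptions.

```python
import numpy as np

sigs = np.array([
    [91, 43, 44, 49, 64, 37, 61, 31, 73],
    [59, 94, 91, 12, 47, 44, 93, 7, 84],
])
data_mat = np.array([
    [83, 68, 3, 49, 14, 12, 51, 36, 22],
    [8, 78, 44, 40, 39, 67, 63, 64, 34],
    [49, 24, 77, 91, 66, 44, 83, 30, 99],
])

# Reference: the nested loop from the question.
loop_coefs = [
    np.corrcoef(np.square(sig), np.square(row))[0, 1]
    for sig in sigs
    for row in data_mat
]

# One call: the rows of both arrays are treated as variables, so the result
# is a square matrix of size len(sigs) + len(data_mat) with all correlations.
full = np.corrcoef(np.square(sigs), np.square(data_mat))

# The upper-right block holds the sig-versus-row coefficients; flattening it
# row by row gives the same order as the nested loop.
corr_block = full[:len(sigs), len(sigs):]
vectorized_coefs = corr_block.ravel()

print(np.allclose(loop_coefs, vectorized_coefs))
```

This also computes the sig-versus-sig and row-versus-row correlations, which are simply discarded, so for very large inputs it trades extra arithmetic for fewer Python-level calls.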

Related

How do you split an array at specific intervals in NumPy for Python?

The question is as follows:
x = np.arange(100)
Write Python code to split the following array at these intervals: 10, 25, 45, 75, 95
I have used the split function but am unable to split at these specific intervals. Can anyone enlighten me on another method, or am I doing it wrong?
Here are both the manual way and the NumPy way with split.
import numpy as np
# Manual method
x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]
split_arrays = []
for i, j in zip([0]+split_indices[:-1], split_indices):
    split_arrays.append(x[i:j])
print(split_arrays)
# Numpy method
split_arrays_np = np.split(x, split_indices)
print(split_arrays_np)
The manual method gives the result below. np.split returns the same five arrays followed by one more, array([95, 96, 97, 98, 99]), which holds the elements after the last index.
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]),
array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]),
array([45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]),
array([75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94])
]
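If the elements after the last index should be kept in the manual version as well, one possible adjustment is to add len(x) as a final end point. This is a small sketch; the names starts, ends, manual_pieces and numpy_pieces are illustrative only.

```python
import numpy as np

x = np.arange(100)
split_indices = [10, 25, 45, 75, 95]

# Pair every start with its end; appending len(x) keeps the tail x[95:].
starts = [0] + split_indices
ends = split_indices + [len(x)]
manual_pieces = [x[i:j] for i, j in zip(starts, ends)]

numpy_pieces = np.split(x, split_indices)

print(len(manual_pieces), len(numpy_pieces))
print(manual_pieces[-1])
```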

Sending random data to API using Python Flask

I am trying to send random data to an API using Python Flask at intervals of 1 second, but the API only shows the last array of data. I am using the following code:
import time
import random
import datetime
from flask import Flask
mylist = []
ct = datetime.datetime.now()
app = Flask(__name__)
@app.route('/')
def index():
    mylist = []
    ct = datetime.datetime.now()
    for i in range(0, 61):
        x = random.randint(1, 100)
        mylist.append(x)
        if len(mylist) == 11:
            right_in_left_out = mylist.pop(0)
        else:
            right_in_left_out = None
        time.sleep(1)
        print(mylist)
    return mylist
if __name__ == "__main__":
    app.run(debug=True)
OUTPUT:
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
* Restarting with stat
* Debugger is active!
* Debugger PIN: 516-689-025
[50]
[50, 61]
[50, 61, 47]
[50, 61, 47, 63]
[50, 61, 47, 63, 24]
[50, 61, 47, 63, 24, 92]
[50, 61, 47, 63, 24, 92, 18]
[50, 61, 47, 63, 24, 92, 18, 75]
[50, 61, 47, 63, 24, 92, 18, 75, 95]
[50, 61, 47, 63, 24, 92, 18, 75, 95, 4]
[61, 47, 63, 24, 92, 18, 75, 95, 4, 40]
[47, 63, 24, 92, 18, 75, 95, 4, 40, 88]
[63, 24, 92, 18, 75, 95, 4, 40, 88, 39]
[24, 92, 18, 75, 95, 4, 40, 88, 39, 47]
[92, 18, 75, 95, 4, 40, 88, 39, 47, 58]
[18, 75, 95, 4, 40, 88, 39, 47, 58, 82]
[75, 95, 4, 40, 88, 39, 47, 58, 82, 88]
[95, 4, 40, 88, 39, 47, 58, 82, 88, 7]
[4, 40, 88, 39, 47, 58, 82, 88, 7, 90]
[40, 88, 39, 47, 58, 82, 88, 7, 90, 65]
[88, 39, 47, 58, 82, 88, 7, 90, 65, 93]
[39, 47, 58, 82, 88, 7, 90, 65, 93, 9]
[47, 58, 82, 88, 7, 90, 65, 93, 9, 55]
[58, 82, 88, 7, 90, 65, 93, 9, 55, 48]
[82, 88, 7, 90, 65, 93, 9, 55, 48, 83]
[88, 7, 90, 65, 93, 9, 55, 48, 83, 96]
[7, 90, 65, 93, 9, 55, 48, 83, 96, 63]
[90, 65, 93, 9, 55, 48, 83, 96, 63, 8]
[65, 93, 9, 55, 48, 83, 96, 63, 8, 43]
[93, 9, 55, 48, 83, 96, 63, 8, 43, 49]
[9, 55, 48, 83, 96, 63, 8, 43, 49, 95]
[55, 48, 83, 96, 63, 8, 43, 49, 95, 92]
[48, 83, 96, 63, 8, 43, 49, 95, 92, 43]
[83, 96, 63, 8, 43, 49, 95, 92, 43, 57]
[96, 63, 8, 43, 49, 95, 92, 43, 57, 91]
[63, 8, 43, 49, 95, 92, 43, 57, 91, 61]
[8, 43, 49, 95, 92, 43, 57, 91, 61, 27]
[43, 49, 95, 92, 43, 57, 91, 61, 27, 66]
[49, 95, 92, 43, 57, 91, 61, 27, 66, 70]
[95, 92, 43, 57, 91, 61, 27, 66, 70, 4]
[92, 43, 57, 91, 61, 27, 66, 70, 4, 34]
[43, 57, 91, 61, 27, 66, 70, 4, 34, 11]
[57, 91, 61, 27, 66, 70, 4, 34, 11, 95]
[91, 61, 27, 66, 70, 4, 34, 11, 95, 71]
[61, 27, 66, 70, 4, 34, 11, 95, 71, 35]
[27, 66, 70, 4, 34, 11, 95, 71, 35, 4]
[66, 70, 4, 34, 11, 95, 71, 35, 4, 98]
[70, 4, 34, 11, 95, 71, 35, 4, 98, 18]
[4, 34, 11, 95, 71, 35, 4, 98, 18, 81]
[34, 11, 95, 71, 35, 4, 98, 18, 81, 87]
[11, 95, 71, 35, 4, 98, 18, 81, 87, 84]
[95, 71, 35, 4, 98, 18, 81, 87, 84, 37]
[71, 35, 4, 98, 18, 81, 87, 84, 37, 63]
[35, 4, 98, 18, 81, 87, 84, 37, 63, 42]
[4, 98, 18, 81, 87, 84, 37, 63, 42, 18]
[98, 18, 81, 87, 84, 37, 63, 42, 18, 79]
[18, 81, 87, 84, 37, 63, 42, 18, 79, 28]
[81, 87, 84, 37, 63, 42, 18, 79, 28, 12]
[87, 84, 37, 63, 42, 18, 79, 28, 12, 36]
[84, 37, 63, 42, 18, 79, 28, 12, 36, 23]
[37, 63, 42, 18, 79, 28, 12, 36, 23, 49]
127.0.0.1 - - [21/Sep/2022 12:55:46] "GET / HTTP/1.1" 200 -
OUTPUT AT API:
I am looking to send data the same way as it is being displayed in the IDE with 1 sec intervals.
The problem lies here:
if len(mylist) == 11:
    right_in_left_out = mylist.pop(0)
Once this code executes for the first time, the list size is back to 10; on every further iteration it grows to 11 and then shrinks back to 10.
Your code returns a single list, so the API shows only its final state. If you want to return all the lists like the ones you displayed, you have to store them in a list of lists.
Here is what I mean:
all_lists = []
mylist = []
ct = datetime.datetime.now()
for i in range(0, 61):
    x = random.randint(1, 100)
    mylist.append(x)
    if len(mylist) == 11:
        mylist.pop(0)
    time.sleep(1)
    # Append a copy: appending mylist itself would store the same list object every time.
    all_lists.append(list(mylist))
    print(mylist)
return all_lists
There is also no need to use right_in_left_out variable in your code.
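The windowing logic can also be sketched on its own, without Flask or time.sleep, using collections.deque with maxlen so the oldest value is dropped automatically. The helper name sliding_windows and the variable readings are hypothetical and only serve this illustration.

```python
import random
from collections import deque


def sliding_windows(values, size=10):
    """Return a snapshot of the last `size` values after each new value."""
    window = deque(maxlen=size)  # discards the oldest item once full
    snapshots = []
    for value in values:
        window.append(value)
        # Store a copy so later appends do not change earlier snapshots.
        snapshots.append(list(window))
    return snapshots


readings = [random.randint(1, 100) for _ in range(61)]
all_lists = sliding_windows(readings)
print(len(all_lists), all_lists[0], all_lists[-1])
```

Inside the Flask view, the returned list of lists would take the place of all_lists above.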

Numpy blocks reshaping

I am looking for a way to reshape the following 1d-numpy array:
# dimensions
n = 2 # int : 1 ... N
h = 2 # int : 1 ... N
m = n*(2*h+1)
input_data = np.arange(0,(n*(2*h+1))**2)
The expected output should be reshaped into (2*h+1)**2 blocks of shape (n,n) such as:
input_data.reshape(((2*h+1)**2,n,n))
>>> array([[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
...
[[92 93]
[94 95]]
[[96 97]
[98 99]]]
These blocks finally need to be reshaped into a (m,m) matrix so that they are stacked in rows of 2*h+1 blocks:
>>> array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
...
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
My problem is that I can't seem to find proper axis permutations after the first reshape into (n,n) blocks. I have looked at several answers such as this one but in vain.
As the real dimensions n and h are much larger and this operation takes place in an iterative process, I am looking for an efficient reshaping operation.
I don't think you can do this with reshape and transpose alone (although I'd love to be proven wrong). Using np.block works, but it's a bit messy:
np.block([list(i) for i in input_data.reshape( (2*h+1), (2*h+1), n, n )])
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
EDIT: Never mind, you can do without np.block:
input_data.reshape( (2*h+1), (2*h+1), n, n).transpose(0, 2, 1, 3).reshape(m, m)
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
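The reshape and transpose chain can be written for general n and h and checked against the np.block version. A minimal sketch, where k, blocks, result and reference are names introduced here for illustration:

```python
import numpy as np

n = 2
h = 2
k = 2 * h + 1  # number of blocks per row and per column
m = n * k

input_data = np.arange(m ** 2)

# Axes: (block row, block column, row inside block, column inside block).
blocks = input_data.reshape(k, k, n, n)

# Put "row inside block" next to "block row", then merge the axis pairs.
result = blocks.transpose(0, 2, 1, 3).reshape(m, m)

# Cross-check against the np.block construction.
reference = np.block([list(block_row) for block_row in blocks])
print(np.array_equal(result, reference))
print(result[:2])
```

Note that the final reshape has to copy the data, because the transposed array is no longer contiguous in memory.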

Can you explain why I get an error globalizing a variable?

Hey!
This relates to problem 18 from Project Euler (https://projecteuler.net/problem=18).
This code solved it, but I got an error (4th line):
Undefined variable: 'ans'Python(undefined-variable)
So I want to understand why this happened.
Also, let me know if there are any flaws in my code.
Thanks in advance.
def brute(i, j, sum):
    global ans
    if i > len(l) - 1:
        if sum > ans:
            ans = sum
        return None
    brute(i + 1, j, sum + l[i][j])
    brute(i + 1, j + 1, sum + l[i][j])
l = [
[75],
[95, 64],
[17, 47, 82],
[18, 35, 87, 10],
[20, 4, 82, 47, 65],
[19, 1, 23, 75, 3, 34],
[88, 2, 77, 73, 7, 63, 67],
[99, 65, 4, 28, 6, 16, 70, 92],
[41, 41, 26, 56, 83, 40, 80, 70, 33],
[41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
[53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
[70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
[91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
[63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
[4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23],
]
ans = 0
brute(0, 0, 0)
print(ans)
IMHO this is not a good use case for globals; it would be better to refactor the code like so:
def brute(i, j):
    if i > len(l) - 1:
        return 0
    return l[i][j]+max(brute(i + 1, j), brute(i + 1, j + 1))
I've flipped the logic around to accomplish this: each call returns the maximum path sum of its own subtree, so no shared variable is needed.
Ideally, you want to reserve global variables for system-wide settings and the like.
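Going one step further, the triangle itself can be passed in as a parameter, and repeated subproblems can be cached. This is a sketch under my own naming (max_path_sum, best, small); the caching with functools.lru_cache is an addition here, not part of the answer above. The four-row triangle is the small example from the Project Euler problem statement.

```python
from functools import lru_cache


def max_path_sum(triangle):
    """Maximum top-to-bottom sum, moving to an adjacent number on each step."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Below the last row there is nothing left to add.
        if i == len(triangle):
            return 0
        return triangle[i][j] + max(best(i + 1, j), best(i + 1, j + 1))

    return best(0, 0)


small = [
    [3],
    [7, 4],
    [2, 4, 6],
    [8, 5, 9, 3],
]
print(max_path_sum(small))  # 23
```

Without the cache, the recursion visits every path, which is fine for 15 rows but grows exponentially with the number of rows.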

List Index out of range error in python

I'm trying to solve the max path sum problem from project euler.
CODE:
def main():
    data = [map(int,row.split()) for row in open("Triangle.txt")]
    print data
    for i in range(len(data)-2,-1,-1):
        for j in range(i+1):
            data[i][j] += max([data[i+1][j],data[i+1][j+1]]) #list out of range error
    print (data[0][0])
if __name__ == '__main__':
    main()
The data value has 16 internal lists as follows:
[[75], [95, 64], [17, 47, 82], [18, 35, 87, 10], [20, 4, 82, 47, 65], [19, 1, 23, 75, 3, 34], [88, 2, 77, 73, 7, 63, 67], [99, 65, 4, 28, 6, 16, 70, 92], [41, 41, 26, 56, 83, 40, 80, 70, 33], [41, 48, 72, 33, 47, 32, 37, 16, 94, 29], [53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14], [70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57], [91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48], [63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31], [4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23], []]
And I am getting list index out of range error in the line:
data[i][j] += max([data[i+1][j],data[i+1][j+1]])
IndexError: list index out of range
How can I get rid of this error?
Thanks in advance...
The problem is the last item in data. It's an empty list. Try removing it and executing the script, as follows:
In [392]: data[-1]
Out[392]: []
In [393]: data = data[:-1]
In [394]: for i in range(len(data)-2,-1,-1):
   .....:     for j in range(i+1):
   .....:         data[i][j] += max([data[i+1][j],data[i+1][j+1]]) #list out of range error
   .....: print (data[0][0])
   .....:
1074
To eliminate the error altogether, without manually altering the contents of data, you can skip blank lines when reading the file in the first place, as follows:
data = [map(int,row.split()) for row in open("Triangle.txt") if row.strip()]
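The code above is Python 2. In Python 3, map returns an iterator rather than a list, so the rows would need to be built as lists before they can be indexed. A minimal Python 3 sketch of the same idea follows; the function names parse_triangle and max_path_sum_bottom_up are hypothetical, and an in-memory string stands in for Triangle.txt so the example is self-contained.

```python
def parse_triangle(lines):
    """Turn text lines into rows of ints, skipping blank lines."""
    return [[int(token) for token in line.split()] for line in lines if line.strip()]


def max_path_sum_bottom_up(rows):
    """Fold the triangle upward: each cell adds the larger of its two children."""
    totals = [row[:] for row in rows]  # work on a copy, leave the input untouched
    for i in range(len(totals) - 2, -1, -1):
        for j in range(i + 1):
            totals[i][j] += max(totals[i + 1][j], totals[i + 1][j + 1])
    return totals[0][0]


# The trailing blank line mimics the empty last row from the question.
text = "3\n7 4\n2 4 6\n8 5 9 3\n\n"
rows = parse_triangle(text.splitlines())
print(max_path_sum_bottom_up(rows))  # 23
```

With a real file, `parse_triangle(open("Triangle.txt"))` would work the same way, since iterating over a file yields its lines.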
