I tried parallelizing with concurrent.futures, expecting the parallelized code to be faster.
I wrote a baseline script to test the parallelization. What the code does is not important; I'm mainly interested in comparing the speed of the baseline code and the parallelized code. All it does is calculate the correlation between the lists in sigs and data_mat and store the values in corr_coefs. You can see the plain code below:
from time import time
import numpy as np
sigs = [
[91, 43, 44, 49, 64, 37, 61, 31, 73],
[59, 94, 91, 12, 47, 44, 93, 7, 84],
[47, 76, 24, 87, 2, 83, 77, 60, 36],
[83, 68, 3, 49, 14, 12, 51, 36, 22]
]
data_mat = [
[83, 68, 3, 49, 14, 12, 51, 36, 22],
[8, 78, 44, 40, 39, 67, 63, 64, 34],
[49, 24, 77, 91, 66, 44, 83, 30, 99],
[97, 40, 69, 7, 24, 70, 63, 52, 81],
[26, 62, 53, 36, 72, 54, 85, 94, 31],
[99, 52, 87, 52, 50, 9, 22, 72, 62],
[91, 15, 54, 84, 89, 15, 43, 31, 9],
[39, 26, 36, 81, 65, 50, 67, 12, 19],
[67, 22, 86, 24, 38, 30, 45, 94, 44],
# etc.
]
execution_time_start = time()
corr_coefs = []
for sig in sigs:
    for data_mat_row in data_mat:
        corr = np.corrcoef(np.square(sig), np.square(data_mat_row))
        corr_coefs.append(corr[0, 1])
execution_time_end = time()
elapsed_time = execution_time_end - execution_time_start
print(f'Execution time (without parallelization): = {elapsed_time:.20f} s')
I tried to parallelize this code using concurrent.futures. The data_mat and sigs lists are the same (I only rewrote the loop):
from time import time
import numpy as np
import concurrent.futures
sigs = [
[91, 43, 44, 49, 64, 37, 61, 31, 73],
[59, 94, 91, 12, 47, 44, 93, 7, 84],
[47, 76, 24, 87, 2, 83, 77, 60, 36],
[83, 68, 3, 49, 14, 12, 51, 36, 22]
]
data_mat = [
[83, 68, 3, 49, 14, 12, 51, 36, 22],
[8, 78, 44, 40, 39, 67, 63, 64, 34],
[49, 24, 77, 91, 66, 44, 83, 30, 99],
[97, 40, 69, 7, 24, 70, 63, 52, 81],
[26, 62, 53, 36, 72, 54, 85, 94, 31],
[99, 52, 87, 52, 50, 9, 22, 72, 62],
[91, 15, 54, 84, 89, 15, 43, 31, 9],
[39, 26, 36, 81, 65, 50, 67, 12, 19],
[67, 22, 86, 24, 38, 30, 45, 94, 44],
# etc.
]
execution_time_start = time()
corr_coefs = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    future_corr_coefs = {executor.submit(np.corrcoef, np.square(sig), np.square(data_mat_row)): (sig, data_mat_row)
                         for sig in sigs for data_mat_row in data_mat}
    for future in concurrent.futures.as_completed(future_corr_coefs):
        sig, data_mat_row = future_corr_coefs[future]
        corr = future.result()
        corr_coefs.append(corr[0,1])
execution_time_end = time()
elapsed_time = execution_time_end - execution_time_start
print(f'Execution time (with parallelization): = {elapsed_time:.20f} s')
I expected the rewritten code to be faster, but I got these outputs:
Execution time (without parallelization): = 1.30910301208496093750 s
Execution time (with parallelization): = 2.38465380668640136719 s
I also tried a larger data set by expanding the list data_mat, but the parallelized code is still slower. Does anyone have any advice that would help? I suspect it might be overhead, but I am not able to explain how...
I found the answer. The parallelized code can be faster, but sigs and data_mat need to be much larger for it to pay off. If the input data set is small, it is pointless to use concurrent.futures, because the overhead of parallelizing the code increases the computation time. If the data set is large and the code inside the loops is more complex, the parallelized version is faster.
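For inputs this small, another way to sidestep the per-task overhead is to let NumPy compute every pairwise correlation in a single call. The sketch below is not from the original post: it assumes all rows have the same length, uses randomly generated stand-in data, and the helper names corr_loop and corr_vectorized are made up for illustration.

```python
import numpy as np

# Stand-in data with the same shapes as sigs and data_mat above (assumption).
rng = np.random.default_rng(0)
sigs = rng.integers(1, 100, size=(4, 9))
data_mat = rng.integers(1, 100, size=(9, 9))


def corr_loop(sigs, data_mat):
    """Reference version: one np.corrcoef call per (sig, row) pair."""
    return np.array([
        np.corrcoef(np.square(sig), np.square(row))[0, 1]
        for sig in sigs
        for row in data_mat
    ])


def corr_vectorized(sigs, data_mat):
    """Single np.corrcoef call on the stacked, squared rows."""
    a = np.square(np.asarray(sigs, dtype=float))
    b = np.square(np.asarray(data_mat, dtype=float))
    full = np.corrcoef(np.vstack([a, b]))  # square matrix over all rows
    # The top-right block holds corr(sig_i, row_j); ravel keeps the loop order.
    return full[:len(a), len(a):].ravel()


corr_coefs = corr_vectorized(sigs, data_mat)
print(corr_coefs.shape)
```

Whether this beats a thread pool on a given machine would still need to be timed; the point is only that the work can be expressed without one Python-level call per pair.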
In this code I need to exit the loop on a certain condition: if position + 1 == len(triangle).
Maybe I am not good at Python and don't clearly understand its behaviour.
It ignores the return and keeps calling the same function instead of leaving the loop.
The only other thing I tried was calling break in the loop itself when the same condition is met, but that did not work either.
def max_value(list, index):
    for _ in range(len(list)):
        dictionary = dict()
        maximum = max(list[index], list[index + 1])
        dictionary['max'] = maximum
        if maximum == list[index]:
            dictionary['next_index'] = index
        else:
            dictionary['next_index'] = index + 1
        return dictionary
total = 0
index = 0
skip = False
position = 0
def sliding_triangle(triangle):
    global total
    global index
    global skip
    global position
    if not skip:
        skip = True
        total += triangle[0][0]
        total += max_value(triangle[1], index).get("max")
        index = max_value(triangle[1], index).get("next_index")
        position = 2
        sliding_triangle(triangle)
    if position + 1 == len(triangle): return  # <-- HERE I AM EXPECTING IT TO EXIT
    for row in range(position, len(triangle)):
        values = max_value(triangle[row], index)
        total += values.get("max")
        index = values.get("next_index")
        print(position, int(len(triangle)), index, values.get("max"), total)
        position += 1
        sliding_triangle(triangle)
    return total
print(sliding_triangle([
[75],
[95, 64],
[17, 47, 82],
[18, 35, 87, 10],
[20, 4, 82, 47, 65],
[19, 1, 23, 75, 3, 34],
[88, 2, 77, 73, 7, 63, 67],
[99, 65, 4, 28, 6, 16, 70, 92],
[41, 41, 26, 56, 83, 40, 80, 70, 33],
[41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
[53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
[70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
[91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
[63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
[ 4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23],
]))
Hehey, got it working finally. The solution was to break from the loop earlier.
I had to put the condition at the beginning of the loop; otherwise it repeated the same process and the condition was wrong.
total = 0
index = 0
skip = False
position = 0
def max_value(list, index):
    for _ in range(len(list)):
        dictionary = dict()
        maximum = max(list[index], list[index + 1])
        dictionary['max'] = maximum
        if maximum == list[index]:
            dictionary['next_index'] = index
        else:
            dictionary['next_index'] = index + 1
        return dictionary
def sliding_triangle(triangle):
    global total
    global index
    global skip
    global position
    if not skip:
        skip = True
        total += triangle[0][0]
        total += max_value(triangle[1], index).get("max")
        index = max_value(triangle[1], index).get("next_index")
        position = 2
        sliding_triangle(triangle)
    for row in range(position, len(triangle)):
        if position == int(len(triangle)): break  # <-- I HAD TO CALL BREAK EARLIER, OTHERWISE THE FOR LOOP KEPT RUNNING INSTEAD OF STOPPING
        values = max_value(triangle[row], index)
        total += values.get("max")
        index = values.get("next_index")
        position += 1
        sliding_triangle(triangle)
    return total
print(sliding_triangle([
[75],
[95, 64],
[17, 47, 82],
[18, 35, 87, 10],
[20, 4, 82, 47, 65],
[19, 1, 23, 75, 3, 34],
[88, 2, 77, 73, 7, 63, 67],
[99, 65, 4, 28, 6, 16, 70, 92],
[41, 41, 26, 56, 83, 40, 80, 70, 33],
[41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
[53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
[70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
[91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
[63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
[ 4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23],
]))
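One way to see why the early return in the question appeared to be ignored: a return leaves only the current call, so in a recursive function each caller resumes its own loop afterwards. The toy function below is a hypothetical illustration (countdown and calls are not part of the original code) that records the order of events.

```python
calls = []  # records what happened, in order


def countdown(n):
    """Recursive function whose base case returns early."""
    if n == 0:
        calls.append("base case returns")
        return  # exits this call only
    for step in range(2):
        calls.append(f"n={n}, step={step}")
        countdown(n - 1)  # after this returns, the loop continues


countdown(1)
print(calls)
```

The outer loop still runs its second iteration after the inner call has returned, which matches the behaviour described above.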
Recursive brute force solution
def sliding_triangle(triangle, row = 0, index = 0):
    if row >= len(triangle) or index >= len(triangle[row]):
        return 0  # row or index out of bounds
    # Add parent value to max of child triangles
    return triangle[row][index] + max(sliding_triangle(triangle, row+1, index), sliding_triangle(triangle, row+1, index+1))
Tests
print(sliding_triangle([[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]))
# Output: 23
print(sliding_triangle([
[75],
[95, 64],
[17, 47, 82],
[18, 35, 87, 10],
[20, 4, 82, 47, 65],
[19, 1, 23, 75, 3, 34],
[88, 2, 77, 73, 7, 63, 67],
[99, 65, 4, 28, 6, 16, 70, 92],
[41, 41, 26, 56, 83, 40, 80, 70, 33],
[41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
[53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
[70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
[91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
[63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
[ 4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23],
]))
# Output: 1074
However, the brute force approach times out on larger datasets.
Optimized Solution
Apply memoization to the brute force solution.
Use a cache to avoid repeatedly solving for subpaths of a parent triangle node.
Code
def sliding_triangle(triangle):
    ' Wrapper setup function '
    def sliding_triangle_(row, index):
        ' Memoized function which does the calcs'
        if row >= len(triangle) or index >= len(triangle[row]):
            return 0
        if (row, index) not in cache:
            # Update cache
            cache[(row, index)] = (triangle[row][index] +
                                   max(sliding_triangle_(row+1, index),
                                       sliding_triangle_(row+1, index+1)))
        return cache[(row, index)]
    cache = {}  # init cache
    return sliding_triangle_(0, 0)  # calculate starting from top most node
Tests
Same results as the brute force solution for simple test cases.
Works on large datasets, e.g. https://projecteuler.net/project/resources/p067_triangle.txt
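The same memoization can also be written with the standard library's functools.lru_cache instead of a hand-rolled dictionary. This is a sketch of an alternative, not the answer's code: the name sliding_triangle_cached is made up here, and it assumes a well-formed triangle in which row r has r + 1 entries.

```python
from functools import lru_cache


def sliding_triangle_cached(triangle):
    """Maximum top-to-bottom path sum, memoized with functools.lru_cache."""

    @lru_cache(maxsize=None)
    def best(row, index):
        if row >= len(triangle):
            return 0  # past the last row
        return triangle[row][index] + max(best(row + 1, index),
                                          best(row + 1, index + 1))

    return best(0, 0)


print(sliding_triangle_cached([[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]))  # 23
```

Because the cached function is defined inside the wrapper, every call to sliding_triangle_cached starts with a fresh cache.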
Find and Show Optimal Path
Modify Brute Force to Return Path
Show highlighted path in triangle
Code
####### Main function
def sliding_triangle_path(triangle, row = 0, index = 0, path = None):
    '''
    Finds highest scoring path (using brute force)
    '''
    if path is None:
        path = [(0, 0)]  # Init path with top most triangle node
    if row >= len(triangle) or index >= len(triangle[row]):
        path.pop()  # drop last item since place out of bounds
        return path
    # Select best path of child nodes
    path_ = max(sliding_triangle_path(triangle, row+1, index, path + [(row+1, index)]),
                sliding_triangle_path(triangle, row+1, index+1, path + [(row+1, index+1)]),
                key = lambda p: score(triangle, p))
    return path_
####### Utils
def getter(x, args):
    '''
    Gets element of multidimensional array using tuple as index
    Source (Modified): https://stackoverflow.com/questions/40258083/recursive-itemgetter-for-python
    '''
    try:
        for k in args:
            x = x[k]
        return x
    except IndexError:
        return 0
def score(tri, path):
    ' Score for a path through triangle tri '
    return sum(getter(tri, t) for t in path)
def colored(r, g, b, text):
    '''
    Use rgb code to color text
    Source: https://www.codegrepper.com/code-examples/python/how+to+print+highlighted+text+in+python
    '''
    return "\033[38;2;{};{};{}m{} \033[38;2;255;255;255m".format(r, g, b, text)
def highlight_path(triangle, path):
    ' Create string that highlights path in red through triangle'
    result = ""  # output string
    for p in path:  # Loop over path tuples
        row, index = p
        values = triangle[row]  # corresponding values in row 'row' of triangle
        # Color in red path value at index, other values are in black (color using rgb)
        row_str = ' '.join([colored(255, 0, 0, str(v)) if i == index else colored(0, 0, 0, str(v)) for i, v in enumerate(values)])
        result += row_str + '\n'
    return result
Test
# Test
triangle = ([
[75],
[95, 64],
[17, 47, 82],
[18, 35, 87, 10],
[20, 4, 82, 47, 65],
[19, 1, 23, 75, 3, 34],
[88, 2, 77, 73, 7, 63, 67],
[99, 65, 4, 28, 6, 16, 70, 92],
[41, 41, 26, 56, 83, 40, 80, 70, 33],
[41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
[53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
[70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
[91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
[63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
[ 4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23],
])
path = sliding_triangle_path(triangle)
print(f'Score: {score(triangle, path)}')
print(f"Path\n {'->'.join(map(str,path))}")
print(f'Highlighted path\n {highlight_path(triangle, path)}')
Output
Score: 1074
Path
(0, 0)->(1, 1)->(2, 2)->(3, 2)->(4, 2)->(5, 3)->(6, 3)->(7, 3)->(8, 4)->(9, 5)->(10, 6)->(11, 7)->(12, 8)->(13, 8)->(14, 9)
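The path can also be recovered without enumerating every route, by first computing the best sum below each node and then walking down from the top. The following is a sketch of that dynamic-programming idea rather than the answer's brute force method; best_path is a hypothetical name, and a well-formed triangle is assumed.

```python
def best_path(triangle):
    """Return (score, path) for the maximum top-to-bottom path."""
    # best[r][i] = best sum obtainable starting at (r, i) and going down.
    best = [list(row) for row in triangle]
    for r in range(len(triangle) - 2, -1, -1):
        for i in range(len(triangle[r])):
            best[r][i] += max(best[r + 1][i], best[r + 1][i + 1])

    # Walk down from the top, always stepping to the better child.
    path, i = [(0, 0)], 0
    for r in range(1, len(triangle)):
        if best[r][i + 1] > best[r][i]:
            i += 1
        path.append((r, i))
    return best[0][0], path


score_, path_ = best_path([[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]])
print(score_, path_)
```

On ties this sketch prefers the left child, so when several optimal paths exist it may report a different one than the brute force version.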
I came up with my own correct answer for the kata; it handles big triangles and passed all tests:
def longest_slide_down(triangle):
    temp_arr = []
    first = triangle[-2]
    second = triangle[-1]
    if len(triangle) > 2:
        for i in range(len(first)):
            for _ in range(len(second)):
                summary = first[i] + max(second[i], second[i + 1])
                temp_arr.append(summary)
                break
        del triangle[-2:]
        triangle.append(temp_arr)
        return longest_slide_down(triangle)
    summary = triangle[0][0] + max(triangle[1][0], triangle[1][1])
    return summary
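The function above folds the last two rows together and modifies the list that was passed in. As a sketch of the same bottom-up idea without recursion and without mutating the input (longest_slide_down_iter is a made-up name, and at least one row is assumed):

```python
def longest_slide_down_iter(triangle):
    """Bottom-up maximum path sum that leaves `triangle` unmodified."""
    best = list(triangle[-1])  # copy of the last row
    for row in reversed(triangle[:-1]):
        # Each entry absorbs the larger of its two children.
        best = [value + max(best[i], best[i + 1])
                for i, value in enumerate(row)]
    return best[0]


tri = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(longest_slide_down_iter(tri))  # 23
```

Keeping only one row of running totals means the extra memory stays proportional to the widest row.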
You can try using an else and a pass, like so:
def max_value():
    ...  # code
def sliding_triangle():
    if not skip:
        ...  # code
    if position + 1 == len(triangle):
        pass
    else:
        for row in range(position, len(triangle)):
            ...  # code
    return total
print(sliding_triangle())
Python does allow return statements at several points in a function, just like Java, but an early return only leaves the current call; in a recursive function the callers then carry on with their own loops. An alternative is to place a condition so that, when it holds, execution skips straight to the single return; otherwise, it continues with the loop.
I condensed your code to make the logic easier to follow, but I can write it out in full if needed.
I am looking for a way to reshape the following 1d-numpy array:
# dimensions
n = 2 # int : 1 ... N
h = 2 # int : 1 ... N
m = n*(2*h+1)
input_data = np.arange(0,(n*(2*h+1))**2)
The expected output should be reshaped into (2*h+1)**2 blocks of shape (n,n) such as:
input_data.reshape(((2*h+1)**2,n,n))
>>> array([[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
...
[[92 93]
[94 95]]
[[96 97]
[98 99]]]
These blocks finally need to be reshaped into a (m,m) matrix so that they are stacked in rows of 2*h+1 blocks:
>>> array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
...
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
My problem is that I can't seem to find proper axis permutations after the first reshape into (n,n) blocks. I have looked at several answers such as this one but in vain.
As the real dimensions n and h are much larger and this operation takes place in an iterative process, I am looking for an efficient reshaping operation.
I don't think you can do this with reshape and transpose alone (although I'd love to be proven wrong). Using np.block works, but it's a bit messy:
np.block([list(i) for i in input_data.reshape( (2*h+1), (2*h+1), n, n )])
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
EDIT: Never mind, you can do without np.block:
input_data.reshape( (2*h+1), (2*h+1), n, n).transpose(0, 2, 1, 3).reshape(10, 10)
array([[ 0, 1, 4, 5, 8, 9, 12, 13, 16, 17],
[ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19],
[20, 21, 24, 25, 28, 29, 32, 33, 36, 37],
[22, 23, 26, 27, 30, 31, 34, 35, 38, 39],
[40, 41, 44, 45, 48, 49, 52, 53, 56, 57],
[42, 43, 46, 47, 50, 51, 54, 55, 58, 59],
[60, 61, 64, 65, 68, 69, 72, 73, 76, 77],
[62, 63, 66, 67, 70, 71, 74, 75, 78, 79],
[80, 81, 84, 85, 88, 89, 92, 93, 96, 97],
[82, 83, 86, 87, 90, 91, 94, 95, 98, 99]])
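The reshape/transpose chain can be wrapped in a small helper so the output size follows from n and h instead of being hard-coded. This is a sketch under the question's definitions; blocks_to_matrix is a hypothetical name.

```python
import numpy as np


def blocks_to_matrix(flat, n, h):
    """Lay out consecutive (n, n) blocks in a (2h+1) x (2h+1) grid."""
    k = 2 * h + 1  # blocks per row and per column
    return (flat.reshape(k, k, n, n)    # (block row, block col, row, col)
                .transpose(0, 2, 1, 3)  # (block row, row, block col, col)
                .reshape(k * n, k * n))


n, h = 2, 2
input_data = np.arange((n * (2 * h + 1)) ** 2)
out = blocks_to_matrix(input_data, n, h)
print(out[:2])
```

The transpose itself is only a view, but the final reshape of the non-contiguous result generally has to copy the data, so inside an iterative process it may be worth timing.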
Hey!
This relates to problem 18 from Project Euler (https://projecteuler.net/problem=18).
This code solved it, but I got an error (4th line):
Undefined variable: 'ans'Python(undefined-variable)
So, I want to understand why this happened.
Also, let me know if there are any flaws in my code.
Thanks in advance.
def brute(i, j, sum):
    global ans
    if i > len(l) - 1:
        if sum > ans:
            ans = sum
        return None
    brute(i + 1, j, sum + l[i][j])
    brute(i + 1, j + 1, sum + l[i][j])
l = [
[75],
[95, 64],
[17, 47, 82],
[18, 35, 87, 10],
[20, 4, 82, 47, 65],
[19, 1, 23, 75, 3, 34],
[88, 2, 77, 73, 7, 63, 67],
[99, 65, 4, 28, 6, 16, 70, 92],
[41, 41, 26, 56, 83, 40, 80, 70, 33],
[41, 48, 72, 33, 47, 32, 37, 16, 94, 29],
[53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14],
[70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57],
[91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48],
[63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31],
[4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23],
]
ans = 0
brute(0, 0, 0)
print(ans)
IMHO this is not a good use-case for globals, would be better to refactor the code like so:
def brute(i, j):
    if i > len(l) - 1:
        return 0
    return l[i][j]+max(brute(i + 1, j), brute(i + 1, j + 1))
I've flipped the logic around to accomplish this, though: the code now works by picking the maximum sum from each node's subtree.
Ideally, you want to reserve global variables for system-wide settings and the like.
I'm trying to solve the max path sum problem from Project Euler.
CODE:
def main():
    data = [list(map(int, row.split())) for row in open("Triangle.txt")]
    print(data)
    for i in range(len(data)-2,-1,-1):
        for j in range(i+1):
            data[i][j] += max([data[i+1][j],data[i+1][j+1]]) #list out of range error
    print (data[0][0])
if __name__ == '__main__':
    main()
The data value has 16 internal lists as follows:
[[75], [95, 64], [17, 47, 82], [18, 35, 87, 10], [20, 4, 82, 47, 65], [19, 1, 23, 75, 3, 34], [88, 2, 77, 73, 7, 63, 67], [99, 65, 4, 28, 6, 16, 70, 92], [41, 41, 26, 56, 83, 40, 80, 70, 33], [41, 48, 72, 33, 47, 32, 37, 16, 94, 29], [53, 71, 44, 65, 25, 43, 91, 52, 97, 51, 14], [70, 11, 33, 28, 77, 73, 17, 78, 39, 68, 17, 57], [91, 71, 52, 38, 17, 14, 91, 43, 58, 50, 27, 29, 48], [63, 66, 4, 68, 89, 53, 67, 30, 73, 16, 69, 87, 40, 31], [4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23], []]
And I am getting a list index out of range error on the line:
data[i][j] += max([data[i+1][j],data[i+1][j+1]])
IndexError: list index out of range
How can I get rid of this error?
Thanks in advance...
The problem is the last item in data. It's an empty list. Try removing it and executing the script, as follows:
In [392]: data[-1]
Out[392]: []
In [393]: data = data[:-1]
In [394]: for i in range(len(data)-2,-1,-1):
   .....:     for j in range(i+1):
   .....:         data[i][j] += max([data[i+1][j],data[i+1][j+1]]) #list out of range error
   .....: print (data[0][0])
   .....:
1074
To eliminate the error altogether, without having to alter the contents of data manually, you can skip blank lines when reading the file in the first place, as follows:
data = [list(map(int, row.split())) for row in open("Triangle.txt") if row.strip()]
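Putting the two pieces together, here is a self-contained sketch that parses lines while skipping blank ones and then runs the bottom-up sum. It assumes Python 3, the helper names parse_triangle and max_path_sum are made up for illustration, and a short in-memory sample stands in for Triangle.txt.

```python
def parse_triangle(lines):
    """Turn text lines into rows of ints, skipping blank lines."""
    return [[int(tok) for tok in line.split()] for line in lines if line.strip()]


def max_path_sum(rows):
    """Bottom-up maximum path sum; works on a copy of the rows."""
    rows = [list(row) for row in rows]
    for i in range(len(rows) - 2, -1, -1):
        for j in range(i + 1):
            rows[i][j] += max(rows[i + 1][j], rows[i + 1][j + 1])
    return rows[0][0]


# A trailing blank line, like the one that produced the empty list above.
sample = ["3\n", "7 4\n", "2 4 6\n", "8 5 9 3\n", "\n"]
print(max_path_sum(parse_triangle(sample)))  # 23
```

With a real file, the lines could come from a `with open("Triangle.txt") as f:` block, which also closes the file when done.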