I am trying to find the minimal-cost path from point (0, 0) to a point in {(u, v) | u + v <= 100} in a 2D matrix of data I have generated.
My algorithm is pretty simple, but the (visualized) results it currently produces suggest that it is way off.
# each cell of path_arr contains a tuple of (i,j) of the next cell in path.
# data contains the "cost" of stepping on its cell
# total_cost_arr is used to assist reconstructing the path.
def min_path(data, m=100, n=100):
    total_cost_arr = np.array([np.array([0 for x in range(0, m)]).astype(float) for x in range(0, n)])
    path_arr = np.array([np.array([(0, 0) for x in range(0, m)], dtype='i,i') for x in range(0, n)])
    total_cost_arr[0, 0] = data[0][0]
    for i in range(0, m):
        total_cost_arr[i, 0] = total_cost_arr[i - 1, 0] + data[i][0]
    for j in range(0, n):
        total_cost_arr[0, j] = total_cost_arr[0, j - 1] + data[0][j]
    for i in range(1, m):
        for j in range(1, n):
            total_cost_arr[i, j] = min(total_cost_arr[i - 1, j - 1], total_cost_arr[i - 1, j], total_cost_arr[i, j - 1]) + data[i][j]
            if total_cost_arr[i, j] == total_cost_arr[i - 1, j - 1] + data[i][j]:
                path_arr[i - 1, j - 1] = (i, j)
            elif total_cost_arr[i, j] == total_cost_arr[i - 1, j] + data[i][j]:
                path_arr[i - 1, j] = (i, j)
            else:
                path_arr[i, j - 1] = (i, j)
I think that storing (i, j) in the previous cell is causing conflicts that lead to this behavior.
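The conflict is plausible: one cell can be the cheapest predecessor of several later cells, so a "next cell" pointer stored in the predecessor gets overwritten. A common alternative is to store, in each cell, the cell it was reached from, and to walk those links backwards at the end. Below is a minimal sketch of that idea in plain Python, assuming a single target cell and the same three moves as the code above (down, right, diagonal); `min_cost_path`, `cost` and `prev` are names made up for this illustration.

```python
def min_cost_path(data, target):
    """Cheapest path from (0, 0) to target using down, right and diagonal steps."""
    ti, tj = target
    inf = float("inf")
    cost = [[inf] * (tj + 1) for _ in range(ti + 1)]
    prev = [[None] * (tj + 1) for _ in range(ti + 1)]  # predecessor of each cell
    cost[0][0] = data[0][0]
    for i in range(ti + 1):
        for j in range(tj + 1):
            # Candidate predecessors: above, left, and upper-left diagonal.
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and cost[pi][pj] + data[i][j] < cost[i][j]:
                    cost[i][j] = cost[pi][pj] + data[i][j]
                    prev[i][j] = (pi, pj)
    # Walk the predecessor links back from the target to (0, 0).
    path, cell = [], target
    while cell is not None:
        path.append(cell)
        cell = prev[cell[0]][cell[1]]
    return cost[ti][tj], path[::-1]


data = [[1, 9, 9],
        [9, 1, 9],
        [9, 9, 1]]
best_cost, best_path = min_cost_path(data, (2, 2))
print(best_cost, best_path)
```

For a target set such as u + v <= 100, one option would be to fill the whole table once and then pick the cheapest cell in the set before walking back.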
I don't think an array is the best structure for your problem.
You should use a graph data structure (with networkx, for example) and an algorithm such as Dijkstra's or A* (which is derived from Dijkstra's).
Dijkstra's algorithm is implemented in networkx (see its shortest-path functions).
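To show what Dijkstra's algorithm does on a grid without depending on any library, here is a minimal standard-library sketch. It assumes non-negative cell costs, a reachable target, and the three moves from the question (down, right, diagonal); `dijkstra_grid` is a hypothetical name, not a networkx function.

```python
import heapq


def dijkstra_grid(data, source, target):
    """Cheapest path between two cells; the cost of a path is the sum of its cells."""
    rows, cols = len(data), len(data[0])
    dist = {source: data[source[0]][source[1]]}
    prev = {}
    heap = [(dist[source], source)]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == target:
            break
        if d > dist[(i, j)]:
            continue  # stale heap entry, a cheaper route was already found
        for di, dj in ((1, 0), (0, 1), (1, 1)):  # down, right, diagonal
            x, y = i + di, j + dj
            if x < rows and y < cols:
                nd = d + data[x][y]
                if nd < dist.get((x, y), float("inf")):
                    dist[(x, y)] = nd
                    prev[(x, y)] = (i, j)
                    heapq.heappush(heap, (nd, (x, y)))
    # Rebuild the path by following predecessor links back to the source.
    path, cell = [target], target
    while cell != source:
        cell = prev[cell]
        path.append(cell)
    return dist[target], path[::-1]


data = [[1, 9, 9],
        [9, 1, 9],
        [9, 9, 1]]
total, path = dijkstra_grid(data, (0, 0), (2, 2))
print(total, path)
```

With only forward moves the graph is acyclic, so a plain dynamic-programming table would also work; Dijkstra becomes more useful once moves in all directions are allowed.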
I have Python code that implements Dynamic Time Warping, which I use to compare the predicted curve to my actual curve. I care about the shape of the curve but also about the distance between the two curves. I z-normalized the two curves before calling the function that returns the cost. However, I got weird results. For example:
I got cost of 0.28 for this example:
While I got 0.38 for the below example:
In the first plot, the prediction is much farther away than in the second plot. I even got the same value of 0.28 with a prediction that is much farther away still, for example 5000 points further. What is wrong here?
Below is my code from this source:
# Dynamic Time Warping algorithm
import numpy

def dp(dist_mat):
    N, M = dist_mat.shape
    # Initialize the cost matrix
    cost_mat = numpy.zeros((N + 1, M + 1))
    for i in range(1, N + 1):
        cost_mat[i, 0] = numpy.inf
    for i in range(1, M + 1):
        cost_mat[0, i] = numpy.inf
    # Fill the cost matrix while keeping traceback information
    traceback_mat = numpy.zeros((N, M))
    for i in range(N):
        for j in range(M):
            penalty = [
                cost_mat[i, j],      # match (0)
                cost_mat[i, j + 1],  # insertion (1)
                cost_mat[i + 1, j]]  # deletion (2)
            i_penalty = numpy.argmin(penalty)
            cost_mat[i + 1, j + 1] = dist_mat[i, j] + penalty[i_penalty]
            traceback_mat[i, j] = i_penalty
    # Traceback from bottom right
    i = N - 1
    j = M - 1
    path = [(i, j)]  # the traceback loop is commented out because I am not interested in the path
    # while i > 0 or j > 0:
    #     tb_type = traceback_mat[i, j]
    #     if tb_type == 0:
    #         # Match
    #         i = i - 1
    #         j = j - 1
    #     elif tb_type == 1:
    #         # Insertion
    #         i = i - 1
    #     elif tb_type == 2:
    #         # Deletion
    #         j = j - 1
    #     path.append((i, j))
    # Strip infinity edges from cost_mat before returning
    cost_mat = cost_mat[1:, 1:]
    return (path[::-1], cost_mat)
I use the above code as below:
from scipy import stats

z_actual = stats.zscore(actual)
z_pred = stats.zscore(mean_predictions)
N = actual.shape[0]
M = mean_predictions.shape[0]
dist_mat = numpy.zeros((N, M))
for i in range(N):
    for j in range(M):
        dist_mat[i, j] = abs(z_actual[i] - z_pred[j])
path, cost_mat = dp(dist_mat)
mape = cost_mat[N - 1, M - 1] / (N + M)
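One thing that can be checked independently of the DTW code: z-normalization subtracts the mean and divides by the standard deviation, so any constant vertical offset between the two curves is removed before the distance matrix is built. If "far away" in the plots refers to such an offset, the cost cannot reflect it. The sketch below illustrates this with the standard library only; the sample values are invented, and `zscore` here is a stand-in written for this example (population standard deviation), not the scipy function.

```python
from statistics import mean, pstdev


def zscore(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


actual = [1.0, 2.0, 4.0, 3.0]
shifted = [v + 5000.0 for v in actual]  # same shape, large vertical offset

z_actual = zscore(actual)
z_shifted = zscore(shifted)
max_diff = max(abs(a - b) for a, b in zip(z_actual, z_shifted))
print(max_diff)  # the normalized curves coincide up to rounding
```

If the vertical distance matters, one option would be to build the distance matrix from the raw values, or to normalize both curves with the statistics of the actual curve only.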
I want to find the number of unique paths through a binary maze. The code below runs on my first maze, but it reports only 1 unique path. The answer should be 2, because at maze[3][0] a path can take the "right-down" move instead of going to maze[4][0] and then taking the "right" move.
# Check if cell (x, y) is valid or not
def isValidCell(x, y, N, M):
    return not (x < 0 or y < 0 or x >= N or y >= M)

def countPaths(maze, i, j, dest, visited):
    # `N × M` matrix
    N = len(maze)
    M = len(maze[0])
    # if destination (x, y) is found, return 1
    if (i, j) == dest:
        return 1
    # stores number of unique paths from source to destination
    count = 0
    # mark the current cell as visited
    visited[i][j] = True
    # if the current cell is a valid and open cell
    if isValidCell(i, j, N, M) and maze[i][j] == 1:
        print(i)
        # go down (i, j) ——> (i + 1, j)
        if i + 1 < N and not visited[i + 1][j]:
            print("down")
            count += countPaths(maze, i + 1, j, dest, visited)
        # go up (i, j) ——> (i - 1, j)
        elif i - 1 >= 0 and not visited[i - 1][j]:
            print("up")
            count += countPaths(maze, i - 1, j, dest, visited)
        # go right (i, j) ——> (i, j + 1)
        elif j + 1 < M and not visited[i][j + 1]:
            print("right")
            count += countPaths(maze, i, j + 1, dest, visited)
        # go right-down (diagonal) (i, j) ——> (i + 1, j + 1)
        elif j + 1 < M and i + 1 < N and not visited[i + 1][j + 1]:
            print("right down")
            count += countPaths(maze, i + 1, j + 1, dest, visited)
    # backtrack from the current cell and remove it from the current path
    visited[i][j] = False
    return count
def findCount(maze, src, dest):
    # get source cell (i, j)
    i, j = src
    # get destination cell (x, y)
    x, y = dest
    # base case: invalid input
    if not maze or not len(maze) or not maze[i][j] or not maze[x][y]:
        return 0
    # `N × M` matrix
    N = len(maze)
    M = len(maze[0])
    print(M)
    # 2D matrix to keep track of cells involved in the current path
    visited = [[False for k in range(M)] for l in range(N)]
    # start from source cell (i, j)
    return countPaths(maze, i, j, dest, visited)
if __name__ == '__main__':
    maze = [
        [1, 0],
        [1, 0],
        [1, 0],
        [1, 0],
        [1, 1]
    ]
    # source cell
    src = (0, 0)
    # destination cell
    dest = (4, 1)
    print("The total number of unique paths are", findCount(maze, src, dest))
The code assumes a square matrix
# `N × N` matrix
N = len(maze)
You are feeding it a rectangular matrix maze = 15x5 (didn't count the lines, guesstimated)
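Separately from the square-matrix point, the `elif` chain in `countPaths` as posted means that only the first available direction is explored from each cell, which would explain a count of 1. Below is a minimal sketch that tries every move independently. It assumes the four moves from the question (down, up, right, right-down) and that only cells equal to 1 may be entered; `count_paths` and `walk` are names invented for this example.

```python
def count_paths(maze, src, dest):
    """Count simple paths from src to dest through cells equal to 1."""
    n_rows, n_cols = len(maze), len(maze[0])
    # Moves taken from the question: down, up, right, right-down (diagonal).
    moves = [(1, 0), (-1, 0), (0, 1), (1, 1)]
    visited = [[False] * n_cols for _ in range(n_rows)]

    def walk(i, j):
        if (i, j) == dest:
            return 1
        visited[i][j] = True
        count = 0
        # Every move is tried independently (a loop instead of an elif chain).
        for di, dj in moves:
            x, y = i + di, j + dj
            if (0 <= x < n_rows and 0 <= y < n_cols
                    and maze[x][y] == 1 and not visited[x][y]):
                count += walk(x, y)
        # Backtrack so this cell can be part of other paths.
        visited[i][j] = False
        return count

    if maze[src[0]][src[1]] != 1 or maze[dest[0]][dest[1]] != 1:
        return 0
    return walk(*src)


maze = [[1, 0], [1, 0], [1, 0], [1, 0], [1, 1]]
print(count_paths(maze, (0, 0), (4, 1)))  # 2
```

On the maze from the question this counts the path through maze[4][0] and the diagonal step from maze[3][0] as two separate paths.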
I am writing code to find all the paths from the top left to the bottom right of an n*m matrix.
Here is my code:
# Python3 program to print all possible paths from
# top left to bottom right of a mXn matrix
'''
/* mat: Pointer to the start of the mXn matrix
i, j: Current position of the robot
    (For the first call use 0, 0)
m, n: Dimensions of the given matrix
pi: Next index to be filled in the path array
*path[0..pi-1]: The path traversed by the robot so far
    (The array holding the path needs to have
    space for at least m+n elements) */
'''
def printAllPathsUtil(mat, i, j, m, n, path, pi):
    # Reached the bottom of the matrix
    # so we are left with only option to move right
    if (i == m - 1):
        for k in range(j, n):
            path[pi + k - j] = mat[i][k]
        for l in range(pi + n - j):
            print(path[l], end = " ")
        print()
        return
    # Reached the right corner of the matrix
    # we are left with only the downward movement.
    if (j == n - 1):
        for k in range(i, m):
            path[pi + k - i] = mat[k][j]
        for l in range(pi + m - i):
            print(path[l], end = " ")
        print()
        return
    # Add the current cell
    # to the path being generated
    path[pi] = mat[i][j]
    # Print all the paths
    # that are possible after moving down
    printAllPathsUtil(mat, i + 1, j, m, n, path, pi + 1)
    # Print all the paths
    # that are possible after moving right
    printAllPathsUtil(mat, i, j + 1, m, n, path, pi + 1)
    # Print all the paths
    # that are possible after moving diagonal
    # printAllPathsUtil(mat, i+1, j+1, m, n, path, pi + 1);
# The main function that prints all paths
# from top left to bottom right
# in a matrix 'mat' of size mXn
def printAllPaths(mat, m, n):
    path = [0 for i in range(m + n)]
    printAllPathsUtil(mat, 0, 0, m, n, path, 0)

import numpy as np

matrix = np.random.rand(150, 150)
printAllPaths(matrix, 150, 150)
I would like to find all the paths for a 150 by 150 matrix, but this takes a lot of time. Is there a good way to make it faster? Any suggestions for speeding up the algorithm would be great.
When you talk about paths, I think a graph is a good fit. My idea is to build a graph containing every allowed move and ask it for the paths. The code below prints all paths; each node is a coordinate pair (x, y):
import networkx as nx

X = Y = 150
G = nx.DiGraph()
edges = []
for x in range(X):
    for y in range(Y):
        if x < X - 1:
            edges.append(((x, y), (x + 1, y)))
        if y < Y - 1:
            edges.append(((x, y), (x, y + 1)))
G.add_edges_from(edges)
print(len(G.nodes()))
print(len(G.edges()))
for path in nx.all_simple_paths(G, (0, 0), (X - 1, Y - 1)):
    print(path)
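Before optimizing either version, it may help to count how many paths there are. With only right and down moves, every path from the top left to the bottom right of an m x n grid consists of m - 1 down moves and n - 1 right moves in some order, so the count is a binomial coefficient. A small sketch using only the standard library (`count_monotone_paths` is a name chosen for this example):

```python
from math import comb


def count_monotone_paths(m, n):
    """Number of right/down paths across an m x n grid."""
    # Choose which (m - 1) of the (m + n - 2) steps are the down moves.
    return comb(m + n - 2, m - 1)


print(count_monotone_paths(3, 3))  # 6
n_paths = count_monotone_paths(150, 150)
print(len(str(n_paths)))  # number of decimal digits in the 150 x 150 count
```

For a 150 by 150 grid the count is on the order of 10^88, so no implementation can print every path in a feasible amount of time. If the real goal is, say, the best path under some cost, a dynamic-programming or shortest-path formulation avoids the enumeration.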
I have the following code snippet, which essentially does the following:
Given a 2D numpy array, arr, compute sum_arr as follows:
sum_arr[i, j] = arr[i, j] + min(sum_arr[i - 1, j-1:j+2]) if (i>0) else arr[i, j]
(reasonable indices for j - 1 : j + 2 of course, all within 0 and w)
Here's my implementation:
import numpy as np

h, w = 1000, 1000 # Shape of the 2d array
arr = np.arange(h * w).reshape((h, w))
sum_arr = arr.copy()

def min_parent(i, j):
    min_index = j
    if j > 0:
        if sum_arr[i - 1, j - 1] < sum_arr[i - 1, min_index]:
            min_index = j - 1
    if j < w - 1:
        if sum_arr[i - 1, j + 1] < sum_arr[i - 1, min_index]:
            min_index = j + 1
    return (i - 1, min_index)

for i, j in np.ndindex((h - 1, w)):
    sum_arr[i + 1, j] += sum_arr[min_parent(i + 1, j)]
And here's the problem: this code snippet takes way too long to execute for only 1e6 operations (About 5s on average on my machine)
What is a better way of implementing this?
While your operation is sequential across rows, within rows it is not. It is therefore easy to vectorize row-wise and keep only a 1D outer loop which in relative terms shouldn't incur too much overhead.
Indeed, doing so gives me a ~200x speedup:
5.2975871179951355 # OP
0.023798351001460105 # vectorized rows
And the code is actually quite simple:
import numpy as np

h, w = 1000, 1000 # Shape of the 2d array
arr = np.arange(h * w).reshape((h, w))

def min_parent(i, j, sum_arr):
    min_index = j
    if j > 0:
        if sum_arr[i - 1, j - 1] < sum_arr[i - 1, min_index]:
            min_index = j - 1
    if j < w - 1:
        if sum_arr[i - 1, j + 1] < sum_arr[i - 1, min_index]:
            min_index = j + 1
    return (i - 1, min_index)

def OP():
    sum_arr = arr.copy()
    for i, j in np.ndindex((h - 1, w)):
        sum_arr[i + 1, j] += sum_arr[min_parent(i + 1, j, sum_arr)]
    return sum_arr
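def vect_rows():
    h, w = arr.shape
    if w == 1:
        return arr.cumsum(0)
    out = np.empty_like(arr)
    out[0] = arr[0]
    for i in range(1, h):
        prev = out[i - 1]
        # minimum over parents j and j + 1 (the last column has no j + 1 parent)
        out[i, :-1] = np.minimum(prev[:-1], prev[1:])
        out[i, -1] = prev[-1]
        # fold in parent j - 1 (the first column has no j - 1 parent)
        out[i, 1:] = np.minimum(out[i, 1:], prev[:-1])
        out[i] += arr[i]
    return out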
assert np.allclose(OP(), vect_rows())
from timeit import repeat
print(min(repeat(OP, number=3)))
print(min(repeat(vect_rows, number=3)))
Use dynamic programming:
In a separate array, precompute the minimum of each block of size X (in your case 3, since you check j - 1, j and j + 1). To determine the minimum for a block, use the value at the referenced position in the original array together with the minimum of the previous block, since you seem to be computing this dynamically.
This way you simply assign the precomputed minimum to the index that needs it.
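One possible reading of this suggestion, sketched in plain Python: compute the 3-wide minimum of the previous row once, then add it to the current row. The helper names `window_min3` and `min_path_sums` are invented for this illustration, and padding the row with infinity is just one way to handle the edges.

```python
def window_min3(row):
    """m[j] = min(row[j-1], row[j], row[j+1]), ignoring neighbours outside the row."""
    inf = float("inf")
    padded = [inf] + list(row) + [inf]
    return [min(padded[j], padded[j + 1], padded[j + 2]) for j in range(len(row))]


def min_path_sums(arr):
    """sums[i][j] = arr[i][j] + min of the three parents in row i - 1."""
    sums = [list(arr[0])]
    for row in arr[1:]:
        mins = window_min3(sums[-1])  # precomputed once per row
        sums.append([a + m for a, m in zip(row, mins)])
    return sums


example = [[3, 1, 4],
           [1, 5, 9],
           [2, 6, 5]]
result = min_path_sums(example)
print(result)
```

This still loops in Python, so it mainly illustrates the recurrence; the row-wise numpy version is the one that removes the per-element overhead.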
I am trying to solve the XOR equation system. For example:
A = [[1, 1, 1, 0, 0], [0, 1, 1, 1, 0], [0, 0, 1, 1, 1], [0, 1, 1, 0, 1], [0, 1, 0, 1, 1]]
s = [3, 14, 13, 5, 2]
m = 5 # len(s)
Ax = s => x = [12, 9, 6, 1, 10]
I tried two ways:
The first is Gaussian elimination (~2.5 seconds), as shown here.
The second is to invert the matrix A modulo 2 and then XOR-multiply A_invert with s (~7.5 seconds).
Is there a way, or a Python library, to speed this up? I even tried the gmpy2 library, but it did not reduce the time much. My Python code is below so that you can follow it easily.
Using Gaussian elimination:
def SolveLinearSystem (A, B, N):
    for K in range (0, N):
        if (A[K][K] == 0):
            for i in range (K+1, N):
                if (A[i][K]!=0):
                    for L in range (0, N):
                        s = A[K][L]
                        A[K][L] = A[i][L]
                        A[i][L] = s
                    s = B[i]
                    B[i] = B[K]
                    B[K] = s
                    break
        for I in range (0, N):
            if (I!=K):
                if (A[I][K]):
                    #M = 0
                    for M in range (K, N):
                        A[I][M] = A[I][M] ^ A[K][M]
                    B[I] = B[I] ^ B[K]

SolveLinearSystem (A, s, 5)
Using inversion:
import gmpy2

def identitymatrix(n):
    return [[int(x == y) for x in range(0, n)] for y in range(0, n)]

def multiply_vector_scalar (vector, scalar, q):
    kq = []
    for i in range (0, len(vector)):
        kq.append (vector[i] * scalar %q)
    return kq

def minus_vector_scalar(vector1, scalar, vector2, q):
    kq = []
    for i in range (0, len(vector1)):
        kq.append ((vector1[i] - scalar * vector2[i]) %q)
    return kq

def inversematrix(matrix, q):
    n = len(matrix)
    A =[]
    for j in range (0, n):
        temp = []
        for i in range (0, n):
            temp.append (matrix[j][i])
        A.append(temp)
    Ainv = identitymatrix(n)
    for i in range(0, n):
        factor = gmpy2.invert(A[i][i], q) #invert mod q
        A[i] = multiply_vector_scalar(A[i],factor,q)
        Ainv[i] = multiply_vector_scalar(Ainv[i],factor,q)
        for j in range(0, n):
            if (i != j):
                factor = A[j][i]
                A[j] = minus_vector_scalar(A[j], factor, A[i], q)
                Ainv[j] = minus_vector_scalar(Ainv[j], factor, Ainv[i], q)
    return Ainv

def solve_equation (A, y):
    result = []
    for i in range (0, m):
        temp = 0
        for j in range (0, m):
            temp = (temp ^ A[i][j]* y[j])
        result.append(temp)
    return result

A_invert = inversematrix(A, 2)
print(solve_equation (A_invert, s))
Both of the methods you present perform a cubic number of bit operations. There are methods that are faster, both asymptotically and in practice.
A first step (that may well be sufficient for you) is to use a 32-bit integer (I believe they're called numpy.int32 in Python) to store 32 consecutive elements of a row. This will speed up row reduction by a factor close to 32 on large enough inputs and probably put a significant dent into your running time on modest inputs.
In your particular code, there are a number of things for you to trivially specialise to the mod-2 case. Search your code for % and inversemodp and handle all of those; the extra, pointless operations are most certainly not helping your runtime.
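The packing idea can be taken one step further in pure Python, where integers have arbitrary width: a whole row of A fits in a single int, and one XOR then updates the entire row. The following is a minimal sketch under that assumption, not a tuned implementation. `solve_xor_system` is a hypothetical name, and the right-hand-side values are treated as plain integers combined with XOR, as in the question's example.

```python
def solve_xor_system(A, s):
    """Solve A x = s over GF(2), with XOR as addition, by Gauss-Jordan elimination."""
    n = len(A)
    # Pack each row into one int: bit j is set exactly when A[i][j] == 1.
    rows = [sum(bit << j for j, bit in enumerate(row)) for row in A]
    rhs = list(s)
    for col in range(n):
        # Find a row at or below the diagonal with a 1 in this column.
        pivot = next((r for r in range(col, n) if (rows[r] >> col) & 1), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        # Clear this column in every other row; one XOR handles the whole row.
        for r in range(n):
            if r != col and (rows[r] >> col) & 1:
                rows[r] ^= rows[col]
                rhs[r] ^= rhs[col]
    return rhs


A = [[1, 1, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 1, 1],
     [0, 1, 1, 0, 1],
     [0, 1, 0, 1, 1]]
s = [3, 14, 13, 5, 2]
x = solve_xor_system(A, s)
print(x)  # [12, 9, 6, 1, 10]
```

The elimination still visits a quadratic number of row pairs, but each row update is one big-integer XOR instead of an inner Python loop over the columns.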