Related
I have a set of 68 keypoints (size [68, 2]) that I am mapping to gaussian heatmaps. To do this, I have the following function:
def generate_gaussian(t, x, y, sigma=10):
"""
Generates a 2D Gaussian point at location x,y in tensor t.
x should be in range (-1, 1).
sigma is the standard deviation of the generated 2D Gaussian.
"""
h,w = t.shape
# Heatmap pixel per output pixel
mu_x = int(0.5 * (x + 1.) * w)
mu_y = int(0.5 * (y + 1.) * h)
tmp_size = sigma * 3
# Top-left
x1,y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)
# Bottom right
x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
return t
size = 2 * tmp_size + 1
tx = np.arange(0, size, 1, np.float32)
ty = tx[:, np.newaxis]
x0 = y0 = size // 2
# The gaussian is not normalized, we want the center value to equal 1
g = torch.tensor(np.exp(- ((tx - x0) ** 2 + (ty - y0) ** 2) / (2 * sigma ** 2)))
# Determine the bounds of the source gaussian
g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
# Image range
img_x_min, img_x_max = max(0, x1), min(x2, w)
img_y_min, img_y_max = max(0, y1), min(y2, h)
t[img_y_min:img_y_max, img_x_min:img_x_max] = \
g[g_y_min:g_y_max, g_x_min:g_x_max]
return t
def rescale(a, img_size):
# scale tensor to [-1, 1]
return 2 * a / img_size[0] - 1
My current code uses a for loop to compute the gaussian heatmap for each of the 68 keypoint coordinates, then stacks the resulting tensors to create a [68, H, W] tensor:
x_k1 = [generate_gaussian(torch.zeros(H, W), x, y) for x, y in rescale(kp1.numpy(), frame.shape)]
x_k1 = torch.stack(x_k1, dim=0)
However, this method is super slow. Is there some way that I can do this without a for loop?
Edit:
I tried #Cris Luengo's proposal to compute a 1D Gaussian:
def generate_gaussian1D(t, x, y, sigma=10):
h,w = t.shape
# Heatmap pixel per output pixel
mu_x = int(0.5 * (x + 1.) * w)
mu_y = int(0.5 * (y + 1.) * h)
tmp_size = sigma * 3
# Top-left
x1, y1 = int(mu_x - tmp_size), int(mu_y - tmp_size)
# Bottom right
x2, y2 = int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)
if x1 >= w or y1 >= h or x2 < 0 or y2 < 0:
return t
size = 2 * tmp_size + 1
tx = np.arange(0, size, 1, np.float32)
ty = tx[:, np.newaxis]
x0 = y0 = size // 2
g = torch.tensor(np.exp(-np.power(tx - mu_x, 2.) / (2 * np.power(sigma, 2.))))
g = g * g[:, None]
g_x_min, g_x_max = max(0, -x1), min(x2, w) - x1
g_y_min, g_y_max = max(0, -y1), min(y2, h) - y1
img_x_min, img_x_max = max(0, x1), min(x2, w)
img_y_min, img_y_max = max(0, y1), min(y2, h)
t[img_y_min:img_y_max, img_x_min:img_x_max] = \
g[g_y_min:g_y_max, g_x_min:g_x_max]
return t
but my output ends up being an incomplete gaussian.
I'm not sure what I'm doing wrong. Any help would be appreciated.
You generate an NxN array g with a Gaussian centered on its center pixel. N is computed such that it extends by 3*sigma from that center pixel. This is the fastest way to build such an array:
tmp_size = sigma * 3
tx = np.arange(1, tmp_size + 1, 1, np.float32)
g = np.exp(-(tx**2) / (2 * sigma**2))
g = np.concatenate((np.flip(g), [1], g))
g = g * g[:, None]
What we're doing here is compute half a 1D Gaussian. We don't even bother computing the value of the Gaussian for the middle pixel, which we know will be 1. We then build the full 1D Gaussian by flipping our half-Gaussian and concatenating. Finally, the 2D Gaussian is built by the outer product of the 1D Gaussian with itself.
We could shave a bit of extra time by building a quarter of the 2D Gaussian, then concatenating four rotated copies of it. But the difference in computational cost is not very large, and this is much simpler. Note that np.exp is the most expensive operation here by far, so just minimizing how often we call it we significantly reduce the computational cost.
However, the best way to speed up the complete code is to compute the array g only once, rather than anew for each key point. Note how your sigma doesn't change, so all the arrays g that are computed are identical. If you compute it only once, it no longer matters which method you use to compute it, since this will be a minimal portion of the total program anyway.
You could, for example, have a global variable _gaussian to hold your array, and have your function compute it only the first time it is called. Or you could separate your function into two functions, one that constructs this array, and one that copies it into an image, and call them as follows:
g = create_gaussian(sigma=3)
x_k1 = [
copy_gaussian(torch.zeros(H, W), x, y, g)
for x, y in rescale(kp1.numpy(), frame.shape)
]
On the other hand, you're likely best off using existing functionality. For example, DIPlib has a function dip.DrawBandlimitedPoint() [disclosure: I'm an author] that adds a Gaussian blob to an image. Likely you'll find similar functions in other libraries.
I have 2 Points A and B and their xyz-coordinates. I need a list of all xyz points that are on the line between those 2 points. The Bresenham's line algorithm was too slow for my case.
Example xyz for A and B:
p = np.array([[ 275.5, 244.2, -27.3],
[ 153.2, 184.3, -0.3]])
Expected output:
x3 = p[0,0] + t*(p[1,0]-p[0,0])
y3 = p[0,1] + t*(p[1,1]-p[0,1])
z3 = p[0,2] + t*(p[1,2]-p[0,2])
p3 = [x3,y3,z3]
There was a very fast approach for 2D:
def connect(ends):
d0, d1 = np.diff(ends, axis=0)[0]
if np.abs(d0) > np.abs(d1):
return np.c_[np.arange(ends[0, 0], ends[1,0] + np.sign(d0), np.sign(d0), dtype=np.int32),
np.arange(ends[0, 1] * np.abs(d0) + np.abs(d0)//2,
ends[0, 1] * np.abs(d0) + np.abs(d0)//2 + (np.abs(d0)+1) * d1, d1, dtype=np.int32) // np.abs(d0)]
else:
return np.c_[np.arange(ends[0, 0] * np.abs(d1) + np.abs(d1)//2,
ends[0, 0] * np.abs(d1) + np.abs(d1)//2 + (np.abs(d1)+1) * d0, d0, dtype=np.int32) // np.abs(d1),
np.arange(ends[0, 1], ends[1,1] + np.sign(d1), np.sign(d1), dtype=np.int32)]
It is impossible to give "all" points on a line, as there are countless points. You could do that for a discrete data type like integers, though.
My answer assumes floating point numbers, as in your example.
Technically, floating point numbers are stored in a binary format with a fixed width, so they are discrete, but I will disregard that fact, as it is most likely not what you want.
As you already typed in your question, every point P on that line satisfies this equation:
P = P1 + t * (P2 - P1), 0 < t < 1
My version uses numpy broadcasting to circumvent explicit loops.
import numpy as np
p = np.array([[ 275.5, 244.2, -27.3],
[ 153.2, 184.3, -0.3]])
def connect(points, n_points):
p1, p2 = points
diff = p2 - p1
t = np.linspace(0, 1, n_points+2)[1:-1]
return p1[np.newaxis, :] + t[:, np.newaxis] * diff[np.newaxis, :]
print(connect(p, n_points=4))
# [[251.04 232.22 -21.9 ]
# [226.58 220.24 -16.5 ]
# [202.12 208.26 -11.1 ]
# [177.66 196.28 -5.7 ]]
Maybe I missunderstand something here, but couldn't you just create a function (like you already did just for one single point) and then create a list of points depending of how manny points you want?
I mean between 2 points are an infinite number of other points, so you have to either define a number or use the function directly, which describes where the points are at.
import numpy as np
p = np.array([[ 275.5, 244.2, -27.3],
[ 153.2, 184.3, -0.3]])
def gen_line(p, n):
points = []
stepsize = 1/n
for t in np.arange(0,1,stepsize):
x = (p[1,0]-p[0,0])
y = (p[1,1]-p[0,1])
z = (p[1,2]-p[0,2])
px = p[0,0]
py = p[0,1]
pz = p[0,2]
x3 = px + t*x
y3 = py + t*y
z3 = pz + t*z
points.append([x3,y3,z3])
return points
# generates list of 30k points
gen_line(p, 30000)
So I have this 3x3 G matrix (not shown here, it's irrelevant to my problem) that I created using the two variables u (a vector, x - y) and the scalar k. x_j = (x_1 (j), x_2 (j), x_3 (j)) and y_j = (y_1 (j), y_2 (j), y_3 (j)). alpha_j is a 3x3 matrix. The A matrix is block diagonal matrix of size 3nx3n. I am having trouble with the W matrix. How do I code a matrix of size 3nx3n, where the (i,j)th block is the 3x3 matrix given by alpha_i*G_[ij]*alpha_j?? I am lost.
My alpha_j matrix also seems to be having some trouble. The loop keeps throwing me the error, "only length-1 arrays can be converted to Python scalars." pls help :/
def W(x, y, k, alpha, A):
u = x - y
n = x.shape[0]
W = np.zeros((3*n, 3*n))
for i in range(0, n-1):
for j in range(0, n-1):
#u = -np.array([[x[i,0] - x[j,0]], [x[i,1] - x[j,1]], [0]]) ??
W[i][j] = (alpha_j(alpha, A) * G(u, k) * alpha_j(alpha, A))
W[i][i] = np.zeros((n, n))
return W
def alpha_j(a, A):
alph = np.array([[0,0,0],[0,0,0],[0,0,0]],complex)
rho = np.random.rand(3,1)
for i in range(0, 2):
for j in range(0, 2):
alph[i][j] = (rho[i] * a * A[i][j])
return alph
#-------------------------------------------------------------------
x1 = np.array([[1], [2], [0]])
y1 = np.array([[4], [5], [0]])
# SYSTEM PARAMETERS
# incoming Wave angle
theta = 0 # can range from [0, 2pi)
# susceptibility
chi = 10 + 1j
# wavelength
lam = 0.5 # microns (values between .4-.7)
# frequency
k = (2 * np.pi)/lam # 1/microns
# volume
V_0 = (0.05)**3 # microns^3
# incoming wave vector
K = k * np.array([[0], [np.sin(theta)], [np.cos(theta)]])
# polarization vector
vecinc = np.array([[1], [0], [0]]) # (can choose any vector perpendicular to K)
# for the fixed alpha case
alpha = (V_0 * 3 * chi)/(chi + 3)
# 3 x 3 matrix
A = np.matlib.identity(3) # could be any symmetric matrix,
#-------------------------------------------------------------------
# TEST FUNCTIONS
test = G((x1-y1), k)
print(test)
w = W(x1, y1, k, alpha, A)
print(w)
Sometimes my W loops throws me the error, "can't set an array element with a sequence." But I need to set each array element in this arbitrary matrix W to the 3x3 matrix created by multiplying alpha by G...
To your question of how to create a new array with a block for each element, the following should do the trick:
G = np.random.random([3,3])
result = np.zeros([9,9])
num_blocks = 3
a = np.random.random([3,3])
b = np.random.random([3,3])
for i in range(G.shape[0]):
for j in range(G.shape[1]):
block_result = a*G[i,j]*b
for k in range(num_blocks):
for l in range(num_blocks):
result[3*i + k, 3*j + l] = block_result[i, j]
You should be able to generalize from there. I hope I've understood correctly.
EDIT: It looks like I haven't understood correctly. I'm leaving it in hopes it spurs you to an answer. The general idea is to generate ranges of indices to operate on, and then just operate on them directly. Slicing might be helpful, too.
Ah, you asked how to create a diagonal filled with blocks. In that case:
num_diagonal_blocks = 3 # for example
for block_dim in range(num_diagonal_blocks)
# do your block calculation...
for k in range(G.shape[0]):
for l in range(G.shape[1]):
result[3*block_dim + k, 3*block_dim + l] = # assign to element of block
I think that's nearly it.
I have written a Python script to calculate the distance between two points in 3D space while accounting for periodic boundary conditions. The problem is that I need to do this calculation for many, many points and the calculation is quite slow. Here is my function.
def PBCdist(coord1,coord2,UC):
dx = coord1[0] - coord2[0]
if (abs(dx) > UC[0]*0.5):
dx = UC[0] - dx
dy = coord1[1] - coord2[1]
if (abs(dy) > UC[1]*0.5):
dy = UC[1] - dy
dz = coord1[2] - coord2[2]
if (abs(dz) > UC[2]*0.5):
dz = UC[2] - dz
dist = np.sqrt(dx**2 + dy**2 + dz**2)
return dist
I then call the function as so
for i, coord2 in enumerate(coordlist):
if (PBCdist(coord1,coord2,UC) < radius):
do something with i
Recently I read that I can greatly increase performance by using list comprehension. The following works for the non-PBC case, but not for the PBC case
coord_indices = [i for i, y in enumerate([np.sqrt(np.sum((coord2-coord1)**2)) for coord2 in coordlist]) if y < radius]
for i in coord_indices:
do something
Is there some way to do the equivalent of this for the PBC case? Is there an alternative that would work better?
You should write your distance() function in a way that you can vectorise the loop over the 5711 points. The following implementation accepts an array of points as either the x0 or x1 parameter:
def distance(x0, x1, dimensions):
delta = numpy.abs(x0 - x1)
delta = numpy.where(delta > 0.5 * dimensions, delta - dimensions, delta)
return numpy.sqrt((delta ** 2).sum(axis=-1))
Example:
>>> dimensions = numpy.array([3.0, 4.0, 5.0])
>>> points = numpy.array([[2.7, 1.5, 4.3], [1.2, 0.3, 4.2]])
>>> distance(points, [1.5, 2.0, 2.5], dimensions)
array([ 2.22036033, 2.42280829])
The result is the array of distances between the points passed as second parameter to distance() and each point in points.
import numpy as np
bounds = np.array([10, 10, 10])
a = np.array([[0, 3, 9], [1, 1, 1]])
b = np.array([[2, 9, 1], [5, 6, 7]])
min_dists = np.min(np.dstack(((a - b) % bounds, (b - a) % bounds)), axis = 2)
dists = np.sqrt(np.sum(min_dists ** 2, axis = 1))
Here a and b are lists of vectors you wish to calculate the distance between and bounds are the boundaries of the space (so here all three dimensions go from 0 to 10 and then wrap). It calculates the distances between a[0] and b[0], a[1] and b[1], and so on.
I'm sure numpy experts could do better, but this will probably be an order of magnitude faster than what you're doing, since most of the work is now done in C.
I have found that meshgrid is very useful for generating distances. For example:
import numpy as np
row_diff, col_diff = np.meshgrid(range(7), range(8))
radius_squared = (row_diff - x_coord)**2 + (col_diff - y_coord)**2
I now have an array (radius_squared) where every entry specifies the square of the distance from the array position [x_coord, y_coord].
To circularize the array, I can do the following:
row_diff, col_diff = np.meshgrid(range(7), range(8))
row_diff = np.abs(row_diff - x_coord)
row_circ_idx = np.where(row_diff > row_diff.shape[1] / 2)
row_diff[row_circ_idx] = (row_diff[row_circ_idx] -
2 * (row_circ_idx + x_coord) +
row_diff.shape[1])
row_diff = np.abs(row_diff)
col_diff = np.abs(col_diff - y_coord)
col_circ_idx = np.where(col_diff > col_diff.shape[0] / 2)
col_diff[row_circ_idx] = (row_diff[col_circ_idx] -
2 * (col_circ_idx + y_coord) +
col_diff.shape[0])
col_diff = np.abs(row_diff)
circular_radius_squared = (row_diff - x_coord)**2 + (col_diff - y_coord)**2
I now have all the array distances circularized with vector math.
I am trying to implement the Karatsuba multiplication algorithm in c++ but right now I am just trying to get it to work in python.
Here is my code:
def mult(x, y, b, m):
if max(x, y) < b:
return x * y
bm = pow(b, m)
x0 = x / bm
x1 = x % bm
y0 = y / bm
y1 = y % bm
z2 = mult(x1, y1, b, m)
z0 = mult(x0, y0, b, m)
z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
return mult(z2, bm ** 2, b, m) + mult(z1, bm, b, m) + z0
What I don't get is: how should z2, z1, and z0 be created? Is using the mult function recursively correct? If so, I'm messing up somewhere because the recursion isn't stopping.
Can someone point out where the error is?
NB: the response below addresses directly the OP's question about
excessive recursion, but it does not attempt to provide a correct
Karatsuba algorithm. The other responses are far more informative in
this regard.
Try this version:
def mult(x, y, b, m):
bm = pow(b, m)
if min(x, y) <= bm:
return x * y
# NOTE the following 4 lines
x0 = x % bm
x1 = x / bm
y0 = y % bm
y1 = y / bm
z0 = mult(x0, y0, b, m)
z2 = mult(x1, y1, b, m)
z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
retval = mult(mult(z2, bm, b, m) + z1, bm, b, m) + z0
assert retval == x * y, "%d * %d == %d != %d" % (x, y, x * y, retval)
return retval
The most serious problem with your version is that your calculations of x0 and x1, and of y0 and y1 are flipped. Also, the algorithm's derivation does not hold if x1 and y1 are 0, because in this case, a factorization step becomes invalid. Therefore, you must avoid this possibility by ensuring that both x and y are greater than b**m.
EDIT: fixed a typo in the code; added clarifications
EDIT2:
To be clearer, commenting directly on your original version:
def mult(x, y, b, m):
# The termination condition will never be true when the recursive
# call is either
# mult(z2, bm ** 2, b, m)
# or mult(z1, bm, b, m)
#
# Since every recursive call leads to one of the above, you have an
# infinite recursion condition.
if max(x, y) < b:
return x * y
bm = pow(b, m)
# Even without the recursion problem, the next four lines are wrong
x0 = x / bm # RHS should be x % bm
x1 = x % bm # RHS should be x / bm
y0 = y / bm # RHS should be y % bm
y1 = y % bm # RHS should be y / bm
z2 = mult(x1, y1, b, m)
z0 = mult(x0, y0, b, m)
z1 = mult(x1 + x0, y1 + y0, b, m) - z2 - z0
return mult(z2, bm ** 2, b, m) + mult(z1, bm, b, m) + z0
Usually big numbers are stored as arrays of integers. Each integer represents one digit. This approach allows to multiply any number by the power of base with simple left shift of the array.
Here is my list-based implementation (may contain bugs):
def normalize(l,b):
over = 0
for i,x in enumerate(l):
over,l[i] = divmod(x+over,b)
if over: l.append(over)
return l
def sum_lists(x,y,b):
l = min(len(x),len(y))
res = map(operator.add,x[:l],y[:l])
if len(x) > l: res.extend(x[l:])
else: res.extend(y[l:])
return normalize(res,b)
def sub_lists(x,y,b):
res = map(operator.sub,x[:len(y)],y)
res.extend(x[len(y):])
return normalize(res,b)
def lshift(x,n):
if len(x) > 1 or len(x) == 1 and x[0] != 0:
return [0 for i in range(n)] + x
else: return x
def mult_lists(x,y,b):
if min(len(x),len(y)) == 0: return [0]
m = max(len(x),len(y))
if (m == 1): return normalize([x[0]*y[0]],b)
else: m >>= 1
x0,x1 = x[:m],x[m:]
y0,y1 = y[:m],y[m:]
z0 = mult_lists(x0,y0,b)
z1 = mult_lists(x1,y1,b)
z2 = mult_lists(sum_lists(x0,x1,b),sum_lists(y0,y1,b),b)
t1 = lshift(sub_lists(z2,sum_lists(z1,z0,b),b),m)
t2 = lshift(z1,m*2)
return sum_lists(sum_lists(z0,t1,b),t2,b)
sum_lists and sub_lists returns unnormalized result - single digit can be greater than the base value. normalize function solved this problem.
All functions expect to get list of digits in the reverse order. For example 12 in base 10 should be written as [2,1]. Lets take a square of 9987654321.
» a = [1,2,3,4,5,6,7,8,9]
» res = mult_lists(a,a,10)
» res.reverse()
» res
[9, 7, 5, 4, 6, 1, 0, 5, 7, 7, 8, 9, 9, 7, 1, 0, 4, 1]
The goal of the Karatsuba multiplication is to improve on the divide-and conquer multiplication algorithm by making 3 recursive calls instead of four. Therefore, the only lines in your script that should contain a recursive call to the multiplication are those assigning z0,z1 and z2. Anything else will give you a worse complexity. You can't use pow to compute bm when you haven't defined multiplication yet (and a fortiori exponentiation), either.
For that, the algorithm crucially uses the fact that it is using a positional notation system. If you have a representation x of a number in base b, then x*bm is simply obtained by shifting the digits of that representation m times to the left. That shifting operation is essentially "free" with any positional notation system. That also means that if you want to implement that, you have to reproduce this positional notation, and the "free" shift. Either you chose to compute in base b=2 and use python's bit operators (or the bit operators of a given decimal, hex, ... base if your test platform has them), or you decide to implement for educational purposes something that works for an arbitrary b, and you reproduce this positional arithmetic with something like strings, arrays, or lists.
You have a solution with lists already. I like to work with strings in python, since int(s, base) will give you the integer corresponding to the string s seen as a number representation in base base: it makes tests easy. I have posted an heavily commented string-based implementation as a gist here, including string-to-number and number-to-string primitives for good measure.
You can test it by providing padded strings with the base and their (equal) length as arguments to mult:
In [169]: mult("987654321","987654321",10,9)
Out[169]: '966551847789971041'
If you don't want to figure out the padding or count string lengths, a padding function can do it for you:
In [170]: padding("987654321","2")
Out[170]: ('987654321', '000000002', 9)
And of course it works with b>10:
In [171]: mult('987654321', '000000002', 16, 9)
Out[171]: '130eca8642'
(Check with wolfram alpha)
I believe that the idea behind the technique is that the zi terms are computed using the recursive algorithm, but the results are not unified together that way. Since the net result that you want is
z0 B^2m + z1 B^m + z2
Assuming that you choose a suitable value of B (say, 2) you can compute B^m without doing any multiplications. For example, when using B = 2, you can compute B^m using bit shifts rather than multiplications. This means that the last step can be done without doing any multiplications at all.
One more thing - I noticed that you've picked a fixed value of m for the whole algorithm. Typically, you would implement this algorithm by having m always be a value such that B^m is half the number of digits in x and y when they are written in base B. If you're using powers of two, this would be done by picking m = ceil((log x) / 2).
Hope this helps!
In Python 2.7: Save this file as Karatsuba.py
def karatsuba(x,y):
"""Karatsuba multiplication algorithm.
Return the product of two numbers in an efficient manner
#author Shashank
date: 23-09-2018
Parameters
----------
x : int
First Number
y : int
Second Number
Returns
-------
prod : int
The product of two numbers
Examples
--------
>>> import Karatsuba.karatsuba
>>> a = 1234567899876543211234567899876543211234567899876543211234567890
>>> b = 9876543211234567899876543211234567899876543211234567899876543210
>>> Karatsuba.karatsuba(a,b)
12193263210333790590595945731931108068998628253528425547401310676055479323014784354458161844612101832860844366209419311263526900
"""
if len(str(x)) == 1 or len(str(y)) == 1:
return x*y
else:
n = max(len(str(x)), len(str(y)))
m = n/2
a = x/10**m
b = x%10**m
c = y/10**m
d = y%10**m
ac = karatsuba(a,c) #step 1
bd = karatsuba(b,d) #step 2
ad_plus_bc = karatsuba(a+b, c+d) - ac - bd #step 3
prod = ac*10**(2*m) + bd + ad_plus_bc*10**m #step 4
return prod