Computing random pair with Euclidean distance less than certain value - python

I am trying to generate two (1D) points x1, x2 chosen randomly, and independently, from the uniform distribution U(-1,1) such that the euclidean distance between them is less than a certain value, dist. Here is one solution, but I'm looking for something more efficient:
def point_pair(low_=-1, hight_=1, dist = 0.001):
while(1):
x = np.random.uniform(low=low_, high=hight_)
y = np.random.uniform(low=low_, high=hight_)
length = np.linalg.norm(x-y)
if length <= dist:
return x,y
return 0,0

To generate two scalars whose magnitudes are close to each other:
import numpy as np
def point_pair(low, high, dist):
delta = np.random.uniform(-dist, dist)
a = np.random.uniform(low + dist, high - dist)
b = np.random.choice((-1, 1)) * a + delta
return a, b
Your scalar a is generated almost in the same way, but where the bounds are not [-1, 1) but [-1 + dist, 1 - dist). For b, you generate a new scalar which is uniformly distributed from -dist to +dist. This represents the bounds on the maximum distance away from a in either direction on the real line that you allow. Then b is simply k*a + delta, where again, delta is any value between -dist and +dist, and k is either -1 or 1.
This will ensure that both a and b are in [-1, 1) and that their magnitudes are similar, or ||a| - |b|| <= dist.
Note
The np.random.uniform(low, high) function always returns values in [low, high) so if you want to also include your upper bound, you'll need to use a different method.

use polar coordinates, your random numbers are angle and distance.

Related

How to fit a piecewise (alternating linear and constant segments) function to a parabolic function?

I do have a function, for example , but this can be something else as well, like a quadratic or logarithmic function. I am only interested in the domain of . The parameters of the function (a and k in this case) are known as well.
My goal is to fit a continuous piece-wise function to this, which contains alternating segments of linear functions (i.e. sloped straight segments, each with intercept of 0) and constants (i.e. horizontal segments joining the sloped segments together). The first and last segments are both sloped. And the number of segments should be pre-selected between around 9-29 (that is 5-15 linear steps + 4-14 constant plateaus).
Formally
The input function:
The fitted piecewise function:
I am looking for the optimal resulting parameters (c,r,b) (in terms of least squares) if the segment numbers (n) are specified beforehand.
The resulting constants (c) and the breakpoints (r) should be whole natural numbers, and the slopes (b) round two decimal point values.
I have tried to do the fitting numerically using the pwlf package using a segmented constant models, and further processed the resulting constant model with some graphical intuition to "slice" the constant steps with the slopes. It works to some extent, but I am sure this is suboptimal from both fitting perspective and computational efficiency. It takes multiple minutes to generate a fitting with 8 slopes on the range of 1-50000. I am sure there must be a better way to do this.
My idea would be to instead using only numerical methods/ML, the fact that we have the algebraic form of the input function could be exploited in some way to at least to use algebraic transforms (integrals) to get to a simpler optimization problem.
import numpy as np
import matplotlib.pyplot as plt
import pwlf
# The input function
def input_func(x,k,a):
return np.power(x,1/a)*k
x = np.arange(1,5e4)
y = input_func(x, 1.8, 1.3)
plt.plot(x,y);
def pw_fit(func, x_r, no_seg, *fparams):
# working on the specified range
x = np.arange(1,x_r)
y_input = func(x, *fparams)
my_pwlf = pwlf.PiecewiseLinFit(x, y_input, degree=0)
res = my_pwlf.fit(no_seg)
yHat = my_pwlf.predict(x)
# Function values at the breakpoints
y_isec = func(res, *fparams)
# Slope values at the breakpoints
slopes = np.round(y_isec / res, decimals=2)
slopes = slopes[1:]
# For the first slope value, I use the intersection of the first constant plateau and the input function
slopes = np.insert(slopes,0,np.round(y_input[np.argwhere(np.diff(np.sign(y_input - yHat))).flatten()[0]] / np.argwhere(np.diff(np.sign(y_input - yHat))).flatten()[0], decimals=2))
plateaus = np.unique(np.round(yHat))
# If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
to_del = np.argwhere(np.diff(slopes) == 0).flatten()
slopes = np.delete(slopes,to_del + 1)
plateaus = np.delete(plateaus,to_del)
breakpoints = [np.ceil(plateaus[0]/slopes[0])]
for idx, j in enumerate(slopes[1:-1]):
breakpoints.append(np.floor(plateaus[idx]/j))
breakpoints.append(np.ceil(plateaus[idx+1]/j))
breakpoints.append(np.floor(plateaus[-1]/slopes[-1]))
return slopes, plateaus, breakpoints
slo, plat, breaks = pw_fit(input_func, 50000, 8, 1.8, 1.3)
# The piecewise function itself
def pw_calc(x, slopes, plateaus, breaks):
x = x.astype('float')
cond_list = [x < breaks[0]]
for idx, j in enumerate(breaks[:-1]):
cond_list.append((j <= x) & (x < breaks[idx+1]))
cond_list.append(breaks[-1] <= x)
func_list = [lambda x: x * slopes[0]]
for idx, j in enumerate(slopes[1:]):
func_list.append(plateaus[idx])
func_list.append(lambda x, j=j: x * j)
return np.piecewise(x, cond_list, func_list)
y_output = pw_calc(x, slo, plat, breaks)
plt.plot(x,y,y_output);
(Not important, but I think the fitted piecewise function is not continuous as it is. Intervals should be x<=r1; r1<x<=r2; ....)
As Anatolyg has pointed out, it looks to me that in the optimal solution (for the function posted at least, and probably for any where the derivative is different from zero), the horizantal segments will collapse to a point or the minimum segment length (in this case 1).
EDIT---------------------------------------------
The behavior above could only be valid if the slopes could have an intercept. If the intercepts are zero, as posted in the question, one consideration must be taken into account: Is the initial parabolic function defined in zero or nearby? Imagine the function y=0.001 *sqrt(x-1000), then the segments defined as b*x will have a slope close to zero and will be so similar to the constant segments that the best fit will be just the line that without intercept that fits better all the function.
Provided that the function is defined in zero or nearby, you can start by approximating the curve just by linear segments (with intercepts):
divide the function domain in N intervals(equal intervals or whose size is a function of the average curvature (or second derivative) of the function along the domain).
linear fit/regression in each intervals
for each interval, if a point (or bunch of points) in the extreme of any interval is better fitted by the line of the neighbor interval than the line of its interval, this point is assigned to the neighbor interval.
Repeat from 2) until no extreme points are moved.
Linear regressions might be optimized not to calculate all the covariance matrixes from scratch on each iteration, but just adding the contributions of the moved points to the previous covariance matrixes.
Then each linear segment (LSi) is replaced by a combination of a small constant segment at the beginning (Cbi), a linear segment without intercept (Si), and another constant segment at the end (Cei). This segments are easy to calculate as Si will contain the middle point of LSi, and Cbi and Cei will have respectively the begin and end values of the segment LSi. Then the intervals of each segment has to be calculated as an intersection between lines.
With this, the constant end segment will be collinear with the constant begin segment from the next interval so they will merge, resulting in a series of constant and linear segments interleaved.
But this would be a floating point start solution. Next, you will have to apply all the roundings which will mess up quite a lot all the segments as the conditions integer intervals and linear segments without slope can be very confronting. In fact, b,c,r are not totally independent. If ci and ri+1 are known, then bi+1 is already fixed
If nothing is broken so far, the final task will be to minimize the error/cost function (I assume that it will be the integral of the error between the parabolic function and the segments). My guess is that gradients here will be quite a pain, as if you change for example one ci, all the rest of the bj and cj will have to adapt as well due to the integer intervals restriction. However, if you can generalize the derivatives between parameters ( how much do I have to adapt bi+1 if ci changes a unit), you can propagate the change of one parameter to all other parameters and have kind of a gradient. Then for each interval, you can estimate what would be the ideal parameter and averaging all intervals calculate the best gradient step. Let me illustrate this:
Assuming first that r parameters are fixed, if I change c1 by one unit, b2 changes by 0.1, c2 changes by -0.2 and b3 changes by 0.2. This would be the gradient.
Then I estimate, comparing with the parabolic curve, that c1 should increase 0.5 (to reduce the cost by 10 points), b2 should increase 0.2 (to reduce the cost by 5 points), c2 should increase 0.2 (to reduce the cost by 6 points) and b3 should increase 0.1 (to reduce the cost by 9 points).
Finally, the gradient step would be (0.5/1·10 + 0.2/0.1·5 - 0.2/(-0.2)·6 + 0.1/0.2·9)/(10 + 5 + 6 + 9)~= 0.45. Thus, c1 would increase 0.45 units, b2 would increase 0.45·0.1, and so on.
When you add the r parameters to the pot, as integer intervals do not have an proper derivative, calculation is not straightforward. However, you can consider r parameters as floating points, calculate and apply the gradient step and then apply the roundings.
We can integrate the squared error function for linear and constant pieces and let SciPy optimize it. Python 3:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
xl = 1
xh = 50000
a = 1.3
p = 1 / a
n = 8
def split_b_and_c(bc):
return bc[::2], bc[1::2]
def solve_for_r(b, c):
r = np.empty(2 * n)
r[0] = xl
r[1:-1:2] = c / b[:-1]
r[2::2] = c / b[1:]
r[-1] = xh
return r
def linear_residual_integral(b, x):
return (
(x ** (2 * p + 1)) / (2 * p + 1)
- 2 * b * x ** (p + 2) / (p + 2)
+ b ** 2 * x ** 3 / 3
)
def constant_residual_integral(c, x):
return x ** (2 * p + 1) / (2 * p + 1) - 2 * c * x ** (p + 1) / (p + 1) + c ** 2 * x
def squared_error(bc):
b, c = split_b_and_c(bc)
r = solve_for_r(b, c)
linear = np.sum(
linear_residual_integral(b, r[1::2]) - linear_residual_integral(b, r[::2])
)
constant = np.sum(
constant_residual_integral(c, r[2::2])
- constant_residual_integral(c, r[1:-1:2])
)
return linear + constant
def evaluate(x, b, c, r):
i = 0
while x > r[i + 1]:
i += 1
return b[i // 2] * x if i % 2 == 0 else c[i // 2]
def main():
bc0 = (xl + (xh - xl) * np.arange(1, 4 * n - 2, 2) / (4 * n - 2)) ** (
p - 1 + np.arange(2 * n - 1) % 2
)
bc = scipy.optimize.minimize(
squared_error, bc0, bounds=[(1e-06, None) for i in range(2 * n - 1)]
).x
b, c = split_b_and_c(bc)
r = solve_for_r(b, c)
X = np.linspace(xl, xh, 1000)
Y = [evaluate(x, b, c, r) for x in X]
plt.plot(X, X ** p)
plt.plot(X, Y)
plt.show()
if __name__ == "__main__":
main()
I have tried to come up with a new solution myself, based on the idea of #Amo Robb, where I have partitioned the domain, and curve fitted a dual - constant and linear - piece together (with the help of np.maximum). I have used the 1 / f(x)' as the function to designate the breakpoints, but I know this is arbitrary and does not provide a global optimum. Maybe there is some optimal function for these breakpoints. But this solution is OK for me, as it might be appropriate to have a better fit at the first segments, at the expense of the error for the later segments. (The task itself is actually a cost based retail margin calculation {supply price -> added margin}, as the retail POS software can only work with such piecewise margin function).
The answer from #David Eisenstat is correct optimal solution if the parameters are allowed to be floats. Unfortunately the POS software can not use floats. It is OK to round up c-s and r-s afterwards. But the b-s should be rounded to two decimals, as those are inputted as percents, and this constraint would ruin the optimal solution with long floats. I will try to further improve my solution with both Amo's and David's valuable input. Thank You for that!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# The input function f(x)
def input_func(x,k,a):
return np.power(x,1/a) * k
# 1 / f(x)'
def one_per_der(x,k,a):
return a / (k * np.power(x, 1/a-1))
# 1 / f(x)' inverted
def one_per_der_inv(x,k,a):
return np.power(a / (x*k), a / (1-a))
def segment_fit(start,end,y,first_val):
b, _ = curve_fit(lambda x,b: np.maximum(first_val, b*x), np.arange(start,end), y[start-1:end-1])
b = float(np.round(b, decimals=2))
bp = np.round(first_val / b)
last_val = np.round(b * end)
return b, bp, last_val
def pw_fit(end_range, no_seg, **fparams):
y_bps = np.linspace(one_per_der(1, **fparams), one_per_der(end_range,**fparams) , no_seg+1)[1:]
x_bps = np.round(one_per_der_inv(y_bps, **fparams))
y = input_func(x, **fparams)
slopes = [np.round(float(curve_fit(lambda x,b: x * b, np.arange(1,x_bps[0]), y[:int(x_bps[0])-1])[0]), decimals = 2)]
plats = [np.round(x_bps[0] * slopes[0])]
bps = []
for i, xbp in enumerate(x_bps[1:]):
b, bp, last_val = segment_fit(int(x_bps[i]+1), int(xbp), y, plats[i])
slopes.append(b); bps.append(bp); plats.append(last_val)
breaks = sorted(list(x_bps) + bps)[:-1]
# If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
to_del = np.argwhere(np.diff(slopes) == 0).flatten()
breaks_to_del = np.concatenate((to_del * 2, to_del * 2 + 1))
slopes = np.delete(slopes,to_del + 1)
plats = np.delete(plats[:-1],to_del)
breaks = np.delete(breaks,breaks_to_del)
return slopes, plats, breaks
def pw_calc(x, slopes, plateaus, breaks):
x = x.astype('float')
cond_list = [x < breaks[0]]
for idx, j in enumerate(breaks[:-1]):
cond_list.append((j <= x) & (x < breaks[idx+1]))
cond_list.append(breaks[-1] <= x)
func_list = [lambda x: x * slopes[0]]
for idx, j in enumerate(slopes[1:]):
func_list.append(plateaus[idx])
func_list.append(lambda x, j=j: x * j)
return np.piecewise(x, cond_list, func_list)
fparams = {'k':1.8, 'a':1.2}
end_range = 5e4
no_steps = 10
x = np.arange(1, end_range)
y = input_func(x, **fparams)
slopes, plats, breaks = pw_fit(end_range, no_steps, **fparams)
y_output = pw_calc(x, slopes, plats, breaks)
plt.plot(x,y_output,y);

Rotating 1D numpy array of radial intensities into 2D array of spacial intensities

I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wasnts me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
EDIT:
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. Its actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency if you prefer) in each radius. R is the scale factor for my radius as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
for x in range(iRes):
for y in range(iRes):
if (x**2 + y**2) < (i * iRes)**2:
if (x**2 + y**2) >= (i * iRes)**2: # Checks for radius, and puts in cartesian space
radJ[x+iRes,y+iRes] = (n-m) / a # Put in actual intensity bins
radJ[x+iRes,-y+iRes] = (n-m) / a
radJ[-x+iRes,y+iRes] = (n-m) / a
radJ[-x+iRes,-y+iRes] = (n-m) / a
Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, x in enumerate(X[0, :]):
for k, y in enumerate(Y[:, 0]):
current_radius = np.sqrt(x ** 2 + y ** 2)
profilegrid[i, k] = interpol_index(current_radius)
print(profilegrid)
This will give you exactly what you are looking for. You just have to take in your array and calculate an symmetric array ri_data_r that has the same length as your data array and contains the distance between the actual data and the middle of the array. The code is doing this automatically.
I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of desired order (here we use order=0 Nearest-neighbor). This method is slower but more precise and needs less memory then the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between both solutions stemmed from the problem of handling the center of the final image where the tilts overlap the most, i.e. the first one would just add values with each tilt ending up out of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use the np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
# Credit goes to skimage.transform.radon
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
center = vector.size // 2
square = np.zeros((vector.size, vector.size))
square[center,:] = vector
rad_angle = np.deg2rad(deg_angle)
cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
[-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
[0, 0, 1]])
# Approx. 80% of time is spent in this function
return warp(square, R, clip=False, output_shape=((vector.size, vector.size)))
def place_vectors(vectors, deg_angles):
matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
global_min, global_max = 0, 0
for i, deg_angle in enumerate(deg_angles):
tilt = rotate_vector(vectors[i], deg_angle)
global_min = tilt.min() if global_min > tilt.min() else global_min
global_max = tilt.max() if global_max < tilt.max() else global_max
matrix += tilt
matrix = np.clip(matrix, global_min, global_max)
return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
square = np.ones([vector.size, vector.size]) * np.nan
radius = vector.size // 2
r_values = np.linspace(-radius, radius, vector.size)
rad_angle = np.deg2rad(deg_angle)
ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(np.int)
ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(np.int)
ind_x = np.clip(ind_x, 0, vector.size-1)
ind_y = np.clip(ind_y, 0, vector.size-1)
square[ind_y, ind_x] = vector
return square
def place_vectors(vectors, deg_angles):
matrices = []
for deg_angle, vector in zip(deg_angles, vectors):
matrices.append(rotate_vector(vector, deg_angle))
matrix = np.nanmean(np.array(matrices), axis=0)
return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
f'Output of rotate_vector(), angle={angle}°\n'
f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated, cmap=plt.cm.Greys_r)
plt.subplot(122)
plt.title(
f'Output of place_vectors(), r={r}, n={n}\n'
f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
plt.imshow(inplace)
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])

Python & Algorithm: How to do simple geometry shape match?

Given a set of points (with order), I want to know if its shape is within certain types. The types are:
rectangle = [(0,0),(0,1),(1,1),(1,0)]
hexagon = [(0,0),(0,1),(1,2),(2,1),(2,0),(1,-1)]
l_shape = [(0,0),(0,3),(1,3),(1,1),(3,1),(3,0)]
concave = [(0,0),(0,3),(1,3),(1,1),(2,1),(2,3),(3,3),(3,0)]
cross = [(0,0),(0,-1),(1,-1),(1,0),(2,0),(2,1),(1,1),(1,2),(0,2),(0,1),(-1,1),(-1,0)]
For example, give roratated_rectangle = [(0,0),(1,1),(0,2),(-1,1)]
We'll know it belongs to the rectangle above.
is similar to
Notice:
rotation and edge with different length are considered similar.
Input points are ordered. (so they can be draw by path in polygon module)
How can I do it? Is there any algorithm for this?
What I'm thinking:
Maybe we can reconstruct lines from given points. And from lines, we can get angles of a shape. By comparing the angle series (both clockwise and counterclockwise), we can determine if input points are similar to the types given above.
Your thinking is basically the right idea. You want to compare the sequence of angles in your test shape to the sequence of angles in a predefined shape (for each of your predefined shapes). Since the first vertex of your test shape may not correspond to the first vertex of the matching predefined shape, we need to allow that the test shape's sequence of angles may be rotated relative to the predefined shape's sequence. (That is, your test shape's sequence might be a,b,c,d but your predefined shape is c,d,a,b.) Also, the test shape's sequence may be reversed, in which case the angles are also negated relative to the predefined shape's angles. (That is, a,b,c,d vs. -d,-c,-b,-a or equivalently 2π-d,2π-c,2π-b,2π-a.)
We could try to choose a canonical rotation for the sequence of angles. For example, we could find the lexicographically minimal rotation. (For example, l_shape's sequence as given is 3π/2,3π/2,π/2,3π/2,3π/2,3π/2. The lexicographically minimal rotation puts π/2 first: π/2,3π/2,3π/2,3π/2,3π/2,3π/2.)
However, I think there's a risk that floating-point rounding might result in us choosing different canonical rotations for the test shape vs. the predefined shape. So instead we'll just check all the rotations.
First, a function that returns the sequence of angles for a shape:
import math
def anglesForPoints(points):
def vector(tail, head):
return tuple(h - t for h, t in zip(head, tail))
points = points[:] + points[0:2]
angles = []
for p0, p1, p2 in zip(points, points[1:], points[2:]):
v0 = vector(tail=p0, head=p1)
a0 = math.atan2(v0[1], v0[0])
v1 = vector(tail=p1, head=p2)
a1 = math.atan2(v1[1], v1[0])
angle = a1 - a0
if angle < 0:
angle += 2 * math.pi
angles.append(angle)
return angles
(Note that computing the cosines using the dot product would not be sufficient, because we need a signed angle, but cos(a) == cos(-a).)
Next, a generator that produces all rotations of a list:
def allRotationsOfList(items):
for i in xrange(len(items)):
yield items[i:] + items[:i]
Finally, to determine if two shapes match:
def shapesMatch(shape0, shape1):
if len(shape0) != len(shape1):
return False
def closeEnough(a0, a1):
return abs(a0 - a1) < 0.000001
angles0 = anglesForPoints(shape0)
reversedAngles0 = list(2 * math.pi - a for a in reversed(angles0))
angles1 = anglesForPoints(shape1)
for rotatedAngles1 in allRotationsOfList(angles1):
if all(closeEnough(a0, a1) for a0, a1 in zip(angles0, rotatedAngles1)):
return True
if all(closeEnough(a0, a1) for a0, a1 in zip(reversedAngles0, rotatedAngles1)):
return True
return False
(Note that we need to use a fuzzy comparison because of floating point rounding errors. Since we know the angles will always be in the small fixed range 0 … 2π, we can use an absolute error limit.)
>>> shapesMatch([(0,0),(1,1),(0,2),(-1,1)], rectangle)
True
>>> shapesMatch([(0,0),(1,1),(0,2),(-1,1)], l_shape)
False
>>> shapesMatch([(0,0), (1,0), (1,1), (2,1), (2,2), (0,2)], l_shape)
True
If you want to compare the test shape against all the predefined shapes, you may want to compute the test shape's angle sequence just once. If you're going to test many shapes against the predefined shapes, you may want to precompute the predefined shapes' sequences just once. I leave those optimizations as an exercise for the reader.
Your shapes may not only be rotated, they are also translated and may even be scaled. The orientation of the nodes may also differ. Fo example your original square has an edge length of 1.0 and is defined anticlockwise, whereas your diamond shape has an edge length of 1.414 and is defined clockwise.
You need to find a good reference for comparison. The following should work:
Find the centre of gravity C for each shape.
Determine radial coordinates (r, φ) for all nodes, where the origin of the radial coordinate system is the centre of gravity C.
Normalise the radii for each shape so that the maximum value of r in a shape is 1.0
Ensure that the nodes are oriented anticlockwise, i.e. with increasing angles φ.
Now you have two lists of n radial coordinates. (The cases where the number of nodes in a shape don't match or where there are fewer than three nodes should have been ruled out already.)
Evaluate all n configurations of offsets, where you leave the first array as it is and shift the second. For a four-element array, you compare:
{a1, a2, a3, a4} <=> {b1, b2, b3, b4}
{a1, a2, a3, a4} <=> {b2, b3, b4, b1}
{a1, a2, a3, a4} <=> {b3, b4, b1, b2}
{a1, a2, a3, a4} <=> {b4, b1, b2, b3}
The radial coordinates are floating-point numbers . When you compare values, you should allow for some latitude to cater for inaccuracies introduced by the floating-point math. Because the numbers 0 ≤ r ≤ 1 and −π ≤ φ ≤ π are roughly in the same range, you can used a fixed epsilon for that.
Radii are compared by their normalised value. Angles are compared by their difference to the angle of the previous point in the list. When that difference is negative, we have wrapped around the 360° border and have to adjust it. (We must enforce positive angles differences, because the shape we compare to might not be equally rotated and thus may not have the wrap-around gap.) The angle is allowed to go both forwards and backwards, but must eventually come full circle.
The code has to check n configurations and test all n nodes for each. In practice, mismatches will be found early, so that the code should have okay performance. If you are going to compare a lot of shapes, it might be worth creating the normalised, anticlockwise radial representation for all shapes beforehand.
Anyway, here goes:
def radial(x, y, cx = 0.0, cy = 0.0):
"""Return radial coordinates from Cartesian ones"""
x -= cx
y -= cy
return (math.sqrt(x*x + y*y), math.atan2(y, x))
def anticlockwise(a):
"""Reverse direction when a is clockwise"""
phi0 = a[-1]
pos = 0
neg = 0
for r, phi in a:
if phi > phi0:
pos += 1
else:
neg += 1
phi0 = phi
if neg > pos:
a.reverse()
def similar_r(ar, br, eps = 0.001):
"""test two sets of radial coords for similarity"""
_, aprev = ar[-1]
_, bprev = br[-1]
for aa, bb in zip(ar, br):
# compare radii
if abs(aa[0] - bb[0]) > eps:
return False
# compare angles
da = aa[1] - aprev
db = bb[1] - bprev
if da < 0: da += 2 * math.pi
if db < 0: db += 2 * math.pi
if abs(da - db) > eps:
return False
aprev = aa[1]
bprev = bb[1]
return True
def similar(a, b):
"""Determine whether two shapes are similar"""
# Only consider shapes with same number of points
if len(a) != len(b) or len(a) < 3:
return False
# find centre of gravity
ax, ay = [1.0 * sum(x) / len(x) for x in zip(*a)]
bx, by = [1.0 * sum(x) / len(x) for x in zip(*b)]
# convert Cartesian coords into radial coords
ar = [radial(x, y, ax, ay) for x, y in a]
br = [radial(x, y, bx, by) for x, y in b]
# find maximum radius
amax = max([r for r, phi in ar])
bmax = max([r for r, phi in br])
# and normalise the coordinates with it
ar = [(r / amax, phi) for r, phi in ar]
br = [(r / bmax, phi) for r, phi in br]
# ensure both shapes are anticlockwise
anticlockwise(ar)
anticlockwise(br)
# now match radius and angle difference in n cionfigurations
n = len(a)
while n:
if similar_r(ar, br):
return True
br.append(br.pop(0)) # rotate br by one
n -= 1
return False
Edit: While this solution works, it is overcomplicated. Rob's answer is similar in essence, but uses a straightforward metric: the internal angles between the edges, which takes care of translation and scaling automatically.
Approaching this by linear algebra, each point will be a vector (x,y), which can be multiplied into a rotation matrix to get new coordinates (x1,y1) by the designated angle. There does not appear to be LaTeX support here so I can't write it out clearly, but in essence:
(cos(a) -sin(a);sin(a) cos(a))*(x y) = (x1 y1)
This results in coordinates x1, y1 that are rotated by angle "a".
EDIT: This is perhaps the underlying theory, but you can set up an algorithm to shift the shape point-by-point to (0,0), then compute the cosine of the angle between adjacent points to classify the object.

How to approximate a value from a formula

So I have one vector of alpha, one vector of beta, and I am trying to find a theta for when the sum of all the estimates (for alpha's 1 to N and beta's 1 to N) equals 60:
def CalcTheta(grensscore, alpha, beta):
theta = 0.0001
estimate = [grensscore-1]
while(sum(estimate) < grensscore):
theta += 0.00001
for x in range(len(beta)):
if x == 0:
estimate = []
estimate.append(math.exp(alpha[x] * (theta - beta[x])) /
(1 + math.exp(alpha[x] * (theta - beta[x]))))
return(theta)
Basically what I did is start from theta = 0.0001, and iterate through, calculating all these sums, and when it is lower than 60, continue by adding 0.0001 each time, while above 60 means we found the theta.
I found the value theta this way. Problem is, it took me about 60 seconds using Python, to find a theta of 0.456.
What is quicker approach to find this theta (since I would like to apply this for other data)?
If you know a lower and an upper bound for θ, and the function is monotonic in the range between these, then you could employ a bisection algorithm to easily and quickly find the desired value.

Digitizing an analog signal

I have a array of CSV values representing a digital output. It has been gathered using an analog oscilloscope so it is not a perfect digital signal. I'm trying to filter out the data to have a perfect digital signal for calculating the periods (which may vary).
I would also like to define the maximum error i get from this filtration.
Something like this:
Idea
Apply a treshold od the data. Here is a pseudocode:
for data_point_raw in data_array:
if data_point_raw < 0.8: data_point_perfect = LOW
if data_point_raw > 2 : data_point_perfect = HIGH
else:
#area between thresholds
if previous_data_point_perfect == Low : data_point_perfect = LOW
if previous_data_point_perfect == HIGH: data_point_perfect = HIGH
There are two problems bothering me.
This seems like a common problem in digital signal processing, however i haven't found a predefined standard function for it. Is this an ok way to perform the filtering?
How would I get the maximum error?
Here's a bit of code that might help.
from __future__ import division
import numpy as np
def find_transition_times(t, y, threshold):
"""
Given the input signal `y` with samples at times `t`,
find the times where `y` increases through the value `threshold`.
`t` and `y` must be 1-D numpy arrays.
Linear interpolation is used to estimate the time `t` between
samples at which the transitions occur.
"""
# Find where y crosses the threshold (increasing).
lower = y < threshold
higher = y >= threshold
transition_indices = np.where(lower[:-1] & higher[1:])[0]
# Linearly interpolate the time values where the transition occurs.
t0 = t[transition_indices]
t1 = t[transition_indices + 1]
y0 = y[transition_indices]
y1 = y[transition_indices + 1]
slope = (y1 - y0) / (t1 - t0)
transition_times = t0 + (threshold - y0) / slope
return transition_times
def periods(t, y, threshold):
"""
Given the input signal `y` with samples at times `t`,
find the time periods between the times at which the
signal `y` increases through the value `threshold`.
`t` and `y` must be 1-D numpy arrays.
"""
transition_times = find_transition_times(t, y, threshold)
deltas = np.diff(transition_times)
return deltas
if __name__ == "__main__":
import matplotlib.pyplot as plt
# Time samples
t = np.linspace(0, 50, 501)
# Use a noisy time to generate a noisy y.
tn = t + 0.05 * np.random.rand(t.size)
y = 0.6 * ( 1 + np.sin(tn) + (1./3) * np.sin(3*tn) + (1./5) * np.sin(5*tn) +
(1./7) * np.sin(7*tn) + (1./9) * np.sin(9*tn))
threshold = 0.5
deltas = periods(t, y, threshold)
print("Measured periods at threshold %g:" % threshold)
print(deltas)
print("Min: %.5g" % deltas.min())
print("Max: %.5g" % deltas.max())
print("Mean: %.5g" % deltas.mean())
print("Std dev: %.5g" % deltas.std())
trans_times = find_transition_times(t, y, threshold)
plt.plot(t, y)
plt.plot(trans_times, threshold * np.ones_like(trans_times), 'ro-')
plt.show()
The output:
Measured periods at threshold 0.5:
[ 6.29283207 6.29118893 6.27425846 6.29580066 6.28310224 6.30335003]
Min: 6.2743
Max: 6.3034
Mean: 6.2901
Std dev: 0.0092793
You could use numpy.histogram and/or matplotlib.pyplot.hist to further analyze the array returned by periods(t, y, threshold).
This is not an answer for your question, just and suggestion that may help. Im writing it here because i cant put image in comment.
I think you should normalize data somehow, before any processing.
After normalization to range of 0...1 you should apply your filter.
If you're really only interested in the period, you could plot the Fourier Transform, you'll have a peak where the frequency of the signals occurs (and so you have the period). The wider the peak in the Fourier domain, the larger the error in your period measurement
import numpy as np
data = np.asarray(my_data)
np.fft.fft(data)
Your filtering is fine, it's basically the same as a schmitt trigger, but the main problem you might have with it is speed. The benefit of using Numpy is that it can be as fast as C, whereas you have to iterate once over each element.
You can achieve something similar using the median filter from SciPy. The following should achieve a similar result (and not be dependent on any magnitudes):
filtered = scipy.signal.medfilt(raw)
filtered = numpy.where(filtered > numpy.mean(filtered), 1, 0)
You can tune the strength of the median filtering with medfilt(raw, n_samples), n_samples defaults to 3.
As for the error, that's going to be very subjective. One way would be to discretise the signal without filtering and then compare for differences. For example:
discrete = numpy.where(raw > numpy.mean(raw), 1, 0)
errors = np.count_nonzero(filtered != discrete)
error_rate = errors / len(discrete)

Categories