I have the equation of a line in y=ax+b form, and a starting point, and I want to find all the pixels this line crosses over/into.
At the moment I am just stepping the x value a bit, calculating y, truncating to get the pixel index, adding it to a list if it is not already there, and continuing until I reach a destination point. Roughly as follows (Python/pseudocode):
temp_x = start_x
prev_pixel = None
pixel_list = []
while not at destination:
    temp_y = ...  # find y from temp_x and the line equation
    pixel = (int(temp_y), int(temp_x))
    if pixel != prev_pixel:
        pixel_list.append(pixel)
        prev_pixel = pixel
    temp_x += some_step_value
But this just seems wildly inaccurate and inefficient (no need to tell me that in the answers, I understand this is a bad algorithm). The step value matters a lot: too large and I will miss pixels (especially if there is a large slope), too small and I am just wasting iterations. I figured I could make my step value depend on the slope, so that I minimize the number of iterations while also not skipping over too much. But it is not perfect; it still skips pixels whose corner the line only barely clips.
I feel like there has to be some way to determine exactly which pixels a line touches, but I have had no luck finding anything myself. Is there some resource anyone could point me towards that could help with this?
Dx = X1 - X0
Dy = Y1 - Y0
D = Max(Abs(Dx), Abs(Dy))
for I = 0 to D
    X = X0 + I * Dx / D
    Y = Y0 + I * Dy / D
works in all cases (except the degenerate D=0) to join (X0, Y0) to (X1, Y1) using integer coordinates.
Technical note:
You can avoid the two divisions. One by the fact that the fraction simplifies to ±1, and the other by computing the quotient and remainder incrementally.
If you believe that this is not accurate enough, you can scale all coordinates by an arbitrary integer M, compute the points with step M and divide the coordinates by M.
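For illustration, here is a minimal Python sketch of that incremental idea (my own translation of the pseudocode above, assuming integer endpoints): each axis accumulates a remainder and only steps when the remainder reaches D, so no division is needed inside the loop.

def line_pixels(x0, y0, x1, y1):
    # Join (x0, y0) to (x1, y1) in D = max(|Dx|, |Dy|) steps, avoiding divisions
    # by keeping running remainders (a sketch, not a drop-in Bresenham).
    dx, dy = x1 - x0, y1 - y0
    d = max(abs(dx), abs(dy))
    if d == 0:                          # degenerate case: both endpoints coincide
        return [(x0, y0)]
    sx = 1 if dx > 0 else -1
    sy = 1 if dy > 0 else -1
    x, y = x0, y0
    rx = ry = 0                         # running remainders for x and y
    pixels = [(x, y)]
    for _ in range(d):
        rx += abs(dx)
        ry += abs(dy)
        if rx >= d:                     # time to step along x
            x += sx
            rx -= d
        if ry >= d:                     # time to step along y
            y += sy
            ry -= d
        pixels.append((x, y))
    return pixels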
Your step value should always be 1. What to step over, however, depends on whether your line is more horizontal or more vertical (that is, |a| < 1 or |a| > 1). For the former, step x by 1 and y by a fraction of that; for lines that are more vertical, step y by 1 and x by a fraction of that.
def draw_line(a, b, start_x, destination):
    result = []
    x = start_x
    y = a * x + b
    result.append((int(x), int(y)))
    while int(x) != destination[0] and int(y) != destination[1]:
        if abs(a) < 1:
            step_x = 1 if destination[0] > x else -1
            x += step_x
            y += a * step_x      # y changes by the slope per unit step in x
        else:
            step_y = 1 if destination[1] > y else -1
            y += step_y
            x += step_y / a      # x changes by 1/slope per unit step in y
        result.append((int(x), int(y)))
    return result
# maybe there is some missing corner case for vertical lines.
I do have a function, for example f(x) = k·x^(1/a), but it could be something else as well, like a quadratic or logarithmic function. I am only interested in the domain x ∈ [1, 50000]. The parameters of the function (a and k in this case) are known as well.
My goal is to fit a continuous piece-wise function to this, which contains alternating segments of linear functions (i.e. sloped straight segments, each with intercept of 0) and constants (i.e. horizontal segments joining the sloped segments together). The first and last segments are both sloped. And the number of segments should be pre-selected between around 9-29 (that is 5-15 linear steps + 4-14 constant plateaus).
Formally:

The input function:

    f(x) = k·x^(1/a)

The fitted piecewise function, with m sloped segments and m−1 plateaus (n = 2m−1 segments in total):

    p(x) = b1·x  for x ≤ r1
           c1    for r1 < x ≤ r2
           b2·x  for r2 < x ≤ r3
           c2    for r3 < x ≤ r4
           ...
           bm·x  for x > r(2m−2)
I am looking for the optimal resulting parameters (c,r,b) (in terms of least squares) if the segment numbers (n) are specified beforehand.
The resulting constants (c) and breakpoints (r) should be whole natural numbers, and the slopes (b) should be values rounded to two decimal places.
I have tried to do the fitting numerically using the pwlf package with a segmented constant model, and then post-processed the resulting constant model with some graphical intuition to "slice" the constant steps with the slopes. It works to some extent, but I am sure this is suboptimal from both a fitting and a computational-efficiency perspective. It takes multiple minutes to generate a fit with 8 slopes on the range 1-50000. I am sure there must be a better way to do this.
My idea is that, instead of using only numerical methods/ML, the fact that we have the algebraic form of the input function could be exploited in some way, at least by using algebraic transforms (integrals) to get to a simpler optimization problem.
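For example (just a sketch of the idea, assuming for a moment that the breakpoints r were fixed), each segment's continuous least-squares optimum has a closed form with f(x) = k·x^(1/a):

$$b_i = \frac{\int_{r_i}^{r_{i+1}} x\,f(x)\,dx}{\int_{r_i}^{r_{i+1}} x^2\,dx} = \frac{3k\,\bigl(r_{i+1}^{1/a+2} - r_i^{1/a+2}\bigr)}{(1/a+2)\,\bigl(r_{i+1}^3 - r_i^3\bigr)}, \qquad c_j = \frac{\int_{r_j}^{r_{j+1}} f(x)\,dx}{r_{j+1} - r_j} = \frac{k\,\bigl(r_{j+1}^{1/a+1} - r_j^{1/a+1}\bigr)}{(1/a+1)\,\bigl(r_{j+1} - r_j\bigr)}$$

so the hard part that remains is choosing the (integer) breakpoints and respecting the rounding constraints.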
import numpy as np
import matplotlib.pyplot as plt
import pwlf
# The input function
def input_func(x,k,a):
return np.power(x,1/a)*k
x = np.arange(1,5e4)
y = input_func(x, 1.8, 1.3)
plt.plot(x,y);
def pw_fit(func, x_r, no_seg, *fparams):
# working on the specified range
x = np.arange(1,x_r)
y_input = func(x, *fparams)
my_pwlf = pwlf.PiecewiseLinFit(x, y_input, degree=0)
res = my_pwlf.fit(no_seg)
yHat = my_pwlf.predict(x)
# Function values at the breakpoints
y_isec = func(res, *fparams)
# Slope values at the breakpoints
slopes = np.round(y_isec / res, decimals=2)
slopes = slopes[1:]
    # For the first slope value, I use the intersection of the first constant plateau and the input function
    first_isec = np.argwhere(np.diff(np.sign(y_input - yHat))).flatten()[0]
    slopes = np.insert(slopes, 0, np.round(y_input[first_isec] / first_isec, decimals=2))
plateaus = np.unique(np.round(yHat))
# If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
to_del = np.argwhere(np.diff(slopes) == 0).flatten()
slopes = np.delete(slopes,to_del + 1)
plateaus = np.delete(plateaus,to_del)
breakpoints = [np.ceil(plateaus[0]/slopes[0])]
for idx, j in enumerate(slopes[1:-1]):
breakpoints.append(np.floor(plateaus[idx]/j))
breakpoints.append(np.ceil(plateaus[idx+1]/j))
breakpoints.append(np.floor(plateaus[-1]/slopes[-1]))
return slopes, plateaus, breakpoints
slo, plat, breaks = pw_fit(input_func, 50000, 8, 1.8, 1.3)
# The piecewise function itself
def pw_calc(x, slopes, plateaus, breaks):
x = x.astype('float')
cond_list = [x < breaks[0]]
for idx, j in enumerate(breaks[:-1]):
cond_list.append((j <= x) & (x < breaks[idx+1]))
cond_list.append(breaks[-1] <= x)
func_list = [lambda x: x * slopes[0]]
for idx, j in enumerate(slopes[1:]):
func_list.append(plateaus[idx])
func_list.append(lambda x, j=j: x * j)
return np.piecewise(x, cond_list, func_list)
y_output = pw_calc(x, slo, plat, breaks)
plt.plot(x,y,y_output);
(Not important, but I think the fitted piecewise function is not continuous as it is. Intervals should be x<=r1; r1<x<=r2; ....)
As Anatolyg has pointed out, it looks to me that in the optimal solution (at least for the function posted, and probably for any function whose derivative is nonzero), the horizontal segments will collapse to a point or to the minimum segment length (in this case 1).
EDIT---------------------------------------------
The behavior above can only hold if the sloped segments are allowed an intercept. If the intercepts are zero, as posted in the question, one consideration must be taken into account: is the initial parabolic function defined at zero or nearby? Imagine the function y = 0.001·sqrt(x − 1000): the segments defined as b·x will have slopes close to zero and will be so similar to the constant segments that the best fit will simply be the single zero-intercept line that best fits the whole function.
Provided that the function is defined at zero or nearby, you can start by approximating the curve with linear segments only (with intercepts):

1. Divide the function domain into N intervals (equal intervals, or intervals whose size is a function of the average curvature (or second derivative) of the function along the domain).
2. Do a linear fit/regression in each interval.
3. For each interval, if a point (or a bunch of points) at either extreme of the interval is fitted better by the line of the neighbouring interval than by the line of its own interval, reassign that point to the neighbouring interval.
4. Repeat from 2) until no extreme points are moved.

The linear regressions can be optimized so that the covariance matrices are not recalculated from scratch on each iteration, but updated by just adding the contributions of the moved points to the previous covariance matrices. A rough sketch of this reassignment loop is shown below.
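Here is a minimal Python sketch of steps 1)-4) (my own illustration, not production code; it uses a plain np.polyfit per interval instead of incremental covariance updates, and moves at most one point per boundary per pass):

import numpy as np

def iterative_linear_segments(x, y, n_intervals, max_iter=100):
    # 1) start from equal-width intervals (index boundaries into x and y)
    bounds = np.linspace(0, len(x), n_intervals + 1).astype(int)
    fits = []
    for _ in range(max_iter):
        # 2) linear fit (slope, intercept) in each interval
        fits = [np.polyfit(x[lo:hi], y[lo:hi], 1)
                for lo, hi in zip(bounds[:-1], bounds[1:])]
        moved = False
        # 3) let boundary points migrate to whichever neighbouring line fits them better
        for i in range(1, n_intervals):
            b = bounds[i]
            left, right = fits[i - 1], fits[i]
            # last point currently in the left interval
            if (abs(np.polyval(right, x[b - 1]) - y[b - 1]) <
                    abs(np.polyval(left, x[b - 1]) - y[b - 1]) and b - 1 > bounds[i - 1] + 1):
                bounds[i] -= 1
                moved = True
            # first point currently in the right interval
            elif (abs(np.polyval(left, x[b]) - y[b]) <
                    abs(np.polyval(right, x[b]) - y[b]) and b + 1 < bounds[i + 1] - 1):
                bounds[i] += 1
                moved = True
        if not moved:  # 4) stop once no extreme point was moved
            break
    return bounds, fits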
Then each linear segment (LSi) is replaced by a combination of a small constant segment at the beginning (Cbi), a linear segment without intercept (Si), and another constant segment at the end (Cei). These segments are easy to calculate: Si passes through the middle point of LSi, and Cbi and Cei take the begin and end values of LSi, respectively. The interval of each segment then has to be calculated as an intersection between lines (see the sketch below).
With this, the constant end segment will be collinear with the constant begin segment of the next interval, so they merge, resulting in a series of interleaved constant and linear segments.
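A sketch of that replacement for a single fitted line y = m·x + q on [x_lo, x_hi] (names are mine, just for illustration; assumes the midpoint is not zero):

def split_linear_segment(m, q, x_lo, x_hi):
    # S_i: zero-intercept line through the midpoint of the fitted segment
    x_mid = 0.5 * (x_lo + x_hi)
    b = (m * x_mid + q) / x_mid
    # Cb_i / Ce_i: constant levels at the begin and end of the fitted segment
    cb = m * x_lo + q
    ce = m * x_hi + q
    # breakpoints where b*x intersects the two constant levels
    r_b = cb / b
    r_e = ce / b
    return b, cb, ce, r_b, r_e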
But this would only be a floating-point starting solution. Next, you will have to apply all the roundings, which will disturb the segments quite a lot, because the integer-interval condition and the zero-intercept linear segments can conflict badly. In fact, b, c, r are not totally independent: if ci and ri+1 are known, then bi+1 is already fixed.
If nothing is broken so far, the final task will be to minimize the error/cost function (I assume it will be the integral of the error between the parabolic function and the segments). My guess is that gradients here will be quite a pain: if you change, for example, one ci, all the remaining bj and cj will have to adapt as well due to the integer-interval restriction. However, if you can generalize the derivatives between parameters (how much do I have to adapt bi+1 if ci changes by one unit?), you can propagate the change of one parameter to all the other parameters and get a kind of gradient. Then, for each parameter, you can estimate what its ideal change would be and, averaging over all of them, calculate the best gradient step. Let me illustrate this:
Assuming first that r parameters are fixed, if I change c1 by one unit, b2 changes by 0.1, c2 changes by -0.2 and b3 changes by 0.2. This would be the gradient.
Then I estimate, comparing with the parabolic curve, that c1 should increase 0.5 (to reduce the cost by 10 points), b2 should increase 0.2 (to reduce the cost by 5 points), c2 should increase 0.2 (to reduce the cost by 6 points) and b3 should increase 0.1 (to reduce the cost by 9 points).
Finally, the gradient step would be (0.5/1·10 + 0.2/0.1·5 + 0.2/(−0.2)·6 + 0.1/0.2·9) / (10 + 5 + 6 + 9) = (5 + 10 − 6 + 4.5) / 30 ≈ 0.45. Thus, c1 would increase by 0.45 units, b2 would increase by 0.45·0.1, and so on.
When you add the r parameters to the pot, the calculation is not straightforward, since integer intervals do not have a proper derivative. However, you can treat the r parameters as floating point, calculate and apply the gradient step, and then apply the roundings.
We can integrate the squared error function for linear and constant pieces and let SciPy optimize it. Python 3:
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize
xl = 1
xh = 50000
a = 1.3
p = 1 / a
n = 8
def split_b_and_c(bc):
return bc[::2], bc[1::2]
def solve_for_r(b, c):
r = np.empty(2 * n)
r[0] = xl
r[1:-1:2] = c / b[:-1]
r[2::2] = c / b[1:]
r[-1] = xh
return r
def linear_residual_integral(b, x):
return (
(x ** (2 * p + 1)) / (2 * p + 1)
- 2 * b * x ** (p + 2) / (p + 2)
+ b ** 2 * x ** 3 / 3
)
def constant_residual_integral(c, x):
return x ** (2 * p + 1) / (2 * p + 1) - 2 * c * x ** (p + 1) / (p + 1) + c ** 2 * x
def squared_error(bc):
b, c = split_b_and_c(bc)
r = solve_for_r(b, c)
linear = np.sum(
linear_residual_integral(b, r[1::2]) - linear_residual_integral(b, r[::2])
)
constant = np.sum(
constant_residual_integral(c, r[2::2])
- constant_residual_integral(c, r[1:-1:2])
)
return linear + constant
def evaluate(x, b, c, r):
i = 0
while x > r[i + 1]:
i += 1
return b[i // 2] * x if i % 2 == 0 else c[i // 2]
def main():
bc0 = (xl + (xh - xl) * np.arange(1, 4 * n - 2, 2) / (4 * n - 2)) ** (
p - 1 + np.arange(2 * n - 1) % 2
)
bc = scipy.optimize.minimize(
squared_error, bc0, bounds=[(1e-06, None) for i in range(2 * n - 1)]
).x
b, c = split_b_and_c(bc)
r = solve_for_r(b, c)
X = np.linspace(xl, xh, 1000)
Y = [evaluate(x, b, c, r) for x in X]
plt.plot(X, X ** p)
plt.plot(X, Y)
plt.show()
if __name__ == "__main__":
main()
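For reference, the two *_residual_integral helpers above are just the antiderivatives of the expanded squared residuals against f(x) = x^p, integrated term by term:

$$\int (x^p - b\,x)^2 \, dx = \frac{x^{2p+1}}{2p+1} - \frac{2b\,x^{p+2}}{p+2} + \frac{b^2 x^3}{3}, \qquad \int (x^p - c)^2 \, dx = \frac{x^{2p+1}}{2p+1} - \frac{2c\,x^{p+1}}{p+1} + c^2 x$$

Evaluating each between consecutive breakpoints and summing gives squared_error(bc).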
I have tried to come up with a new solution myself, based on the idea of #Amo Robb, where I have partitioned the domain and curve-fitted a dual piece - constant and linear - together (with the help of np.maximum). I have used 1 / f(x)' (the reciprocal of the derivative) as the function to designate the breakpoints, but I know this is arbitrary and does not provide a global optimum. Maybe there is some optimal function for these breakpoints. But this solution is OK for me, as it might be appropriate to have a better fit at the first segments, at the expense of the error for the later segments. (The task itself is actually a cost-based retail margin calculation {supply price -> added margin}, as the retail POS software can only work with such a piecewise margin function.)
The answer from #David Eisenstat is the correct optimal solution if the parameters are allowed to be floats. Unfortunately the POS software cannot use floats. It is OK to round the c-s and r-s afterwards, but the b-s have to be rounded to two decimals, as those are inputted as percents, and this constraint would ruin the optimal solution with long floats. I will try to further improve my solution with both Amo's and David's valuable input. Thank you for that!
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# The input function f(x)
def input_func(x,k,a):
return np.power(x,1/a) * k
# 1 / f(x)'
def one_per_der(x,k,a):
return a / (k * np.power(x, 1/a-1))
# 1 / f(x)' inverted
def one_per_der_inv(x,k,a):
return np.power(a / (x*k), a / (1-a))
def segment_fit(start,end,y,first_val):
b, _ = curve_fit(lambda x,b: np.maximum(first_val, b*x), np.arange(start,end), y[start-1:end-1])
b = float(np.round(b, decimals=2))
bp = np.round(first_val / b)
last_val = np.round(b * end)
return b, bp, last_val
def pw_fit(end_range, no_seg, **fparams):
y_bps = np.linspace(one_per_der(1, **fparams), one_per_der(end_range,**fparams) , no_seg+1)[1:]
x_bps = np.round(one_per_der_inv(y_bps, **fparams))
y = input_func(x, **fparams)
slopes = [np.round(float(curve_fit(lambda x,b: x * b, np.arange(1,x_bps[0]), y[:int(x_bps[0])-1])[0]), decimals = 2)]
plats = [np.round(x_bps[0] * slopes[0])]
bps = []
for i, xbp in enumerate(x_bps[1:]):
b, bp, last_val = segment_fit(int(x_bps[i]+1), int(xbp), y, plats[i])
slopes.append(b); bps.append(bp); plats.append(last_val)
breaks = sorted(list(x_bps) + bps)[:-1]
# If due to rounding slope values (to two decimals), there is no change in a subsequent step, I just remove those segments
to_del = np.argwhere(np.diff(slopes) == 0).flatten()
breaks_to_del = np.concatenate((to_del * 2, to_del * 2 + 1))
slopes = np.delete(slopes,to_del + 1)
plats = np.delete(plats[:-1],to_del)
breaks = np.delete(breaks,breaks_to_del)
return slopes, plats, breaks
def pw_calc(x, slopes, plateaus, breaks):
x = x.astype('float')
cond_list = [x < breaks[0]]
for idx, j in enumerate(breaks[:-1]):
cond_list.append((j <= x) & (x < breaks[idx+1]))
cond_list.append(breaks[-1] <= x)
func_list = [lambda x: x * slopes[0]]
for idx, j in enumerate(slopes[1:]):
func_list.append(plateaus[idx])
func_list.append(lambda x, j=j: x * j)
return np.piecewise(x, cond_list, func_list)
fparams = {'k':1.8, 'a':1.2}
end_range = 5e4
no_steps = 10
x = np.arange(1, end_range)
y = input_func(x, **fparams)
slopes, plats, breaks = pw_fit(end_range, no_steps, **fparams)
y_output = pw_calc(x, slopes, plats, breaks)
plt.plot(x,y_output,y);
I am working on finding the frequencies from a given dataset and I am struggling to understand how np.fft.fft() works. I thought I had a working script but ran into a weird issue that I cannot understand.
I have a dataset that is roughly sinusoidal and I wanted to understand what frequencies the signal is composed of. Once I took the FFT, I got this plot:
However, when I take the same dataset, slice it in half, and plot the same thing, I get this:
I do not understand why the dominant frequency drops from 144kHz to 128kHz, since it is technically the same dataset, just shorter.
I can confirm a few things:
The step size between data points is 0.001.
I have tried interpolation with little luck.
If I slice the second half of the dataset I get a different frequency as well.
If my dataset is indeed composed of both 128 and 144kHz, then why doesn't the 128 peak show up in the first plot?
What is even more confusing is that I am running a script with pure sine waves without issues:
import numpy as np
import matplotlib.pyplot as plt

T = 0.001
fs = 1 / T
def find_nearest_ind(data, value):
return (np.abs(data - value)).argmin()
x = np.arange(0, 30, T)
ff = 0.2
y = np.sin(2 * ff * np.pi * x)
x = x[:len(x) // 2]
y = y[:len(y) // 2]
n = len(y) # length of the signal
k = np.arange(n)
T = n / fs
frq = k / T * 1e6 / 1000 # two sides frequency range
frq = frq[:len(frq) // 2] # one side frequency range
Y = np.fft.fft(y) / n # dft and normalization
Y = Y[:n // 2]
frq = frq[:50]
Y = Y[:50]
fig, (ax1, ax2) = plt.subplots(2)
ax1.plot(x, y)
ax1.set_xlabel("Time (us)")
ax1.set_ylabel("Electric Field (V / mm)")
peak_ind = find_nearest_ind(abs(Y), np.max(abs(Y)))
ax2.plot(frq, abs(Y))
ax2.axvline(frq[peak_ind], color = 'black', linestyle = '--', label = F"Frequency = {round(frq[peak_ind], 3)}kHz")
plt.legend()
plt.xlabel('Freq(kHz)')
ax1.title.set_text('dV/dX vs. Time')
ax2.title.set_text('Frequencies')
fig.tight_layout()
plt.show()
Here is a breakdown of your code, with some suggestions for improvement, and extra explanations. Working through it carefully will show you what is going on. The results you are getting are completely expected. I will propose a common solution at the end.
First set up your units correctly. I assume that you are dealing with seconds, not microseconds. You can adjust later as long as you stay consistent.
Establish the period and frequency of the sampling. This means that the Nyquist frequency for the FFT will be 500Hz:
T = 0.001 # 1ms sampling period
fs = 1 / T # 1kHz sampling frequency
Make a time domain of 30e3 points. The half domain will contain 15000 points. That implies a frequency resolution of 500Hz / 15k = 0.03333Hz.
x = np.arange(0, 30, T) # time domain
n = x.size # number of points: 30000
Before doing anything else, we can define our frequency domain right here. I prefer a more intuitive approach than the one you are using: that way you don't have to redefine T or introduce the auxiliary variable k. But as long as the results are the same, it does not really matter:
F = np.linspace(0, 1 - 1/n, n) / T # Notice F[1] = 0.03333, as predicted
Now define the signal. You picked ff = 0.2, i.e. 0.2Hz. Since 0.2 / 0.03333 = 6, you would expect to see your peak in exactly bin index 6 (F[6] == 0.2). To better illustrate what is going on, let's take ff = 0.22 instead. This will bleed the spectrum into neighboring bins.
ff = 0.22
y = np.sin(2 * np.pi * ff * x)
Now take the FFT:
Y = np.fft.fft(y) / n
maxbin = np.abs(Y).argmax() # 7
maxF = F[maxbin] # 0.23333333: This is the nearest bin
Since your frequency bins are ~0.03Hz wide, the best resolution you can expect is ~0.015Hz (half a bin). For your real data, which has much lower resolution, the error is much larger.
Now let's take a look at what happens when you halve the data size. Among other things, the frequency resolution gets coarser. Now you have a maximum frequency of 500Hz spread over 7.5k one-sided bins, not 15k: the resolution drops to 0.066666Hz per bin:
n2 = n // 2 # 15000
F2 = np.linspace(0, 1 - 1 / n2, n2) / T # F2[1] = 0.06666
Y2 = np.fft.fft(y[:n2]) / n2
Take a look what happens to the frequency estimate:
maxbin2 = np.abs(Y2).argmax() # 3
maxF2 = F2[maxbin2] # 0.2: This is the nearest bin
Hopefully, you can see how this applies to your original data. In the full FFT, you have a resolution of ~16.1kHz per bin with the full data, and ~32.2kHz per bin with the half data. So your original result is within ~±8kHz of the right peak, while the second one is within ~±16kHz. The true frequency is therefore between 136kHz and 144kHz. Another way to look at it is to compare the bins that you showed me:
full: 128.7 144.8 160.9
half: 96.6 128.7 160.9
When you take out exactly half of the data, you drop every other frequency bin. If your peak was originally closest to 144.8kHz, and you drop that bin, it will end up in either 128.7 or 160.9.
Note: Based on the bin numbers you show, I suspect that your computation of frq is a little off. Notice the 1 - 1/n in my linspace expression. You need that to get the right frequency axis: the last bin is (1 - 1/n) / T, not 1 / T, no matter how you compute it.
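If you prefer not to build that axis by hand, NumPy can generate it for you; for the one-sided part this is equivalent to the linspace expression above:

frq = np.fft.fftfreq(n, d=T)[:n // 2]   # bin spacing 1/(n*T), non-negative half only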
So how to get around this problem? The simplest solution is to do a parabolic fit on the three points around your peak. That is usually a sufficiently good estimator of the true frequency in the data when you are looking for essentially perfect sinusoids.
def peakF(F, Y):
index = np.abs(Y).argmax()
# Compute offset on normalized domain [-1, 0, 1], not F[index-1:index+2]
y = np.abs(Y[index - 1:index + 2])
# This is the offset from zero, which is the scaled offset from F[index]
vertex = (y[0] - y[2]) / (0.5 * (y[0] + y[2]) - y[1])
# F[1] is the bin resolution
return F[index] + vertex * F[1]
In case you are wondering how I got the formula for the parabola: I solved the system with x = [-1, 0, 1] and y = Y[index - 1:index + 2]. The matrix equation is
[(-1)^2  -1  1]   [a]   [Y[index - 1]]
[  0^2    0  1] · [b] = [Y[index]    ]
[  1^2    1  1]   [c]   [Y[index + 1]]
Computing the offset using a normalized domain and scaling afterwards is almost always more numerically stable than using whatever huge numbers you have in F[index - 1:index + 2].
You can plug the results in the example into this function to see if it works:
>>> peakF(F, Y)
0.2261613409657391
>>> peakF(F2, Y2)
0.20401580936430794
As you can see, the parabolic fit gives an improvement, however slight. There is no replacement for just increasing frequency resolution through more samples though!
I have an algorithm in which I need to work out the signed angle (-180 to 180) between edges in a graph. I've done some research and found plenty of specific answers but can't figure out how to relate them to my situation (e.g. this question which uses atan2, however the OP wanted only positive angles)
I've tried implementing a few different ways (using atan2 or arccos) but I'm struggling to relate the examples to my specific problem. I've tried treating the edges as vectors but got strange results.
Given a graph with points (A, B, C, D, E), and the average of those points (avg)... how do I find the signed angle between one of those points (e.g. A) and the other points (e.g. B, C, D, E), taking the angle from the current origin (A) to the 'avg' point as equal to 0 degrees. Example below...
...in this example, the anti-clockwise angle from (A, avg) to (A, B) would be positive something (between 0 and 180), and the angle from (A, avg) to (A, E) would be negative something (between 0 and -180).
Ideally I want a formula which I could also apply to defining any of the points as the origin, for example taking point C as the origin.. the 'zero angle' would be (C, avg) and the angle between (C, avg) and (C, A) would be negative (0 to -180) and the angle between (C, avg) and (C, E) would be positive (0 to 180).
I haven't studied math beyond high-school so I find it hard to decipher equations with symbols I don't understand.
UPDATE: Thought I'd clean this up to make it more obvious what the conclusion was.
I made two small changes to the accepted answer, resulting in the below snippet:
def angle(vertex, start, dest):
AhAB = math.atan2((dest.y - vertex.y), (dest.x - vertex.x))
AhAO = math.atan2((start.y - vertex.y), (start.x - vertex.x))
AB = AhAB - AhAO
# in between 0-math.pi = do nothing, more than math.pi = +(-2 * math.pi), less than zero = do nothing
AB = math.degrees(AB + (-2 * math.pi if AB > math.pi else (2 * math.pi if AB < 0 - math.pi else 0)))
return AB
...the final one-liner may be a bit much to grok after a few months of not working on this, so I turned it into its own function, taking the result of AB = AhAB - AhAO as its argument...
def calc(ab):
if ab > math.pi:
return ab + (-2 * math.pi)
else:
if ab < 0 - math.pi:
return ab + (2 * math.pi)
else:
return ab + 0
I thought this was a little clearer to read, though more lines.
The final function in full:
def angle(vertex, start, dest):
"""Calculates the signed angle between two edges with the same origin.
Origin is the 'vertex' argument, 'start' is the bounding point of the edge to calculate the angle from.
Positively signed result means anti-clockwise rotation about the vertex."""
def calc_radians(ab):
if ab > math.pi:
return ab + (-2 * math.pi)
else:
if ab < 0 - math.pi:
return ab + (2 * math.pi)
else:
return ab + 0
AhAB = math.atan2((dest.y - vertex.y), (dest.x - vertex.x))
AhAO = math.atan2((start.y - vertex.y), (start.x - vertex.x))
res = calc_radians(AhAB - AhAO)
return math.degrees(res)
Note: The function assumes the three arguments will all be instances of a typical Point class with x and y attributes.
Also, the example graph above has only positive values, but I am fairly sure that this works with graphs that involve negative values too.
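For anyone wanting to try it quickly, here is a minimal usage sketch (the Point class is a stand-in, and the coordinates are borrowed from the sample data in one of the answers below, not from my actual graph):

import math  # needed by angle() above
from collections import namedtuple

Point = namedtuple('Point', 'x y')

A, B = Point(1.5, 3.0), Point(3.0, 5.0)
pts = [A, B, Point(5.5, 4.5), Point(5.8, 2.2), Point(2.8, 1.2)]
avg = Point(sum(p.x for p in pts) / len(pts), sum(p.y for p in pts) / len(pts))

print(angle(A, avg, B))   # positive -> anti-clockwise from (A, avg) to (A, B)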
I read your problem statement as follows: given 2 points A and B, and a center O, find the angle A to B as the angle, positive if anticlockwise, between the vectors A→O and A→B.
If my premises are correct, then you can
find the angle between A→B and a horizontal, rightward line passing in A,
find the angle between A→O and a horizontal, rightward line passing in A,
find the angle A to B as the difference of said angles,
normalize the result range so that it's between -π and +π.
What I've said can be visualized as follows
or in code (assuming a Point class with attributes x and y)
AhAB = math.atan2((B.y-A.y), (B.x-A.x)) # -π < AhAB ≤ +π
AhAO = math.atan2((O.y-A.y), (O.x-A.x)) # -π < AhAO ≤ +π
AB = AhAB - AhAO # -2π < AB ≤ +2π
AB = AB + ( 2*math.pi if AB < -math.pi else (-2*math.pi if AB > math.pi else 0))
Addendum
Here it is a small code example, the position of the points is just similar to what you can see in the picture
In [18]: from math import atan2, pi
In [21]: class Point():
...: def __init__(self, x, y):
...: self.x, self.y = x, y
...: def __repr__(self):
...: return '(%s, %s)'%(self.x, self.y)
In [22]: A = Point(0.0, 0.0)
In [23]: B = Point(-2.0, 2.0)
In [24]: O = Point(0.0, -3.0)
In [25]: AhAB = atan2((B.y-A.y), (B.x-A.x)) ; print(3/4, AhAB/pi)
0.75 0.75
In [26]: AhAO = atan2((O.y-A.y), (O.x-A.x)) ; print(-1/2, AhAO/pi)
-0.5 -0.5
In [27]: AB = AhAB - AhAO ; print(5/4, AB/pi)
1.25 1.25
In [28]: AB = AB + ( 2*pi if AB < -pi else (-2*pi if AB > pi else 0)) ; print(AB/pi)
-0.75
In [29]:
The last line normalizes your result AB to the correct range -π < AB ≤ π, adding or subtracting 2π, which does not change the meaning of the measured angle.
The definition of positive and negative angles depends heavily on the reference system or reference point. Regardless of the 'correct' definition, the basic calculation can pretty much be done from the slope between two points and the resulting angle of incline, which can be calculated by applying the inverse tangent to the slope.
Applying the inverse tan in programming can be a bit annoying since many programming languages provide two different functions for this:
arctan or atan, which is implemented as Python's math.atan() or numpy.arctan()
arctan2 or atan2, which is delivered by math.atan2() or numpy.arctan2()
Both of these functions, regardless of whether the implementation is in the math module or the numpy package, return the calculated angle in radians, which is based on the number pi instead of degrees, making some further conversion necessary. This can either be done manually or by applying a function like numpy.rad2deg(). To get a basic idea of the data points and some eye-balled estimates for the calculated results, I suggest plotting the data points using matplotlib.
Glueing all the before-mentioned considerations into code can look like this:
import pandas as pd
import matplotlib
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
# Define some sample data points
coords = {
'A': (1.5, 3.0),
'B': (3.0, 5.0),
'C': (5.5, 4.5),
'D': (5.8, 2.2),
'E': (2.8, 1.2)
}
# Extract data values from `coords` dict
values = np.array(list(coords.values()))
# Calculate the averaged point of all data points
avg = np.mean(values, axis=0)
# Plot sample data for better overview
for k, v in coords.items():
plt.plot(*v, marker='o', linestyle='')
plt.text(*v, k)
plt.plot(*avg, marker='o', linestyle='')
plt.text(*avg, 'avg')
plt.show()
# For further information about slope and angle of incline
# see Wikipedia (https://en.wikipedia.org/wiki/Slope).
# Calculating the angle from `avg` to each point. Please adopt
# to your own needs if needed for other pairs of points.
# Calculate the distance in x- and y-direction from each point to point `avg`
distances_x_y = (values - avg)
# Depending on your definition of the 'reference point' consider using
# distances_x_y = (avg - values)
# For further explanation on `atan` and `atan2` see
# https://stackoverflow.com/q/35749246/3991125 and
# https://en.wikipedia.org/wiki/Atan2 .
# Using a for loop instead of numpy's array/vectors is not very elegant,
# but easy to understand and therefore has potential for improvements.
# Calculate angle from point `avg` to each other point based on distances
angle_radians = np.array([np.arctan2(element[1], element[0]) for element in distances_x_y])
# since `.arctan2()` or `.arctan()` return the angle in radians,
# we need to convert to degrees
angle_degrees = np.rad2deg(angle_radians)
# print results
print(angle_degrees)
If you consider the coordinates x0 = x_avg − x_A, y0 = y_avg − y_A and x = x_Point − x_A, y = y_Point − y_A, the formula f(x, y) below gives the signed angle, positive for counter-clockwise.
f(x,y)=pi()/2*((1+sign(x0))* (1-sign(y0^2))-(1+sign(x))* (1-sign(y^2)))
+pi()/4*((2+sign(x0))*sign(y0)-(2+sign(x))*sign(y))
+sign(x0*y0)*atan((abs(x0)-abs(y0))/(abs(x0)+abs(y0)))
-sign(x*y)*atan((abs(x)-abs(y))/(abs(x)+abs(y)))
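For convenience, here is a literal Python/NumPy transcription of the formula above (my own transcription; the structure of the formula is unchanged, and it is undefined at the origin where |x| + |y| = 0):

import numpy as np

def signed_angle(x0, y0, x, y):
    # g() is the repeated sub-expression of the formula, applied to (x0, y0) and (x, y)
    def g(u, v):
        return (np.pi / 2 * (1 + np.sign(u)) * (1 - np.sign(v ** 2))
                + np.pi / 4 * (2 + np.sign(u)) * np.sign(v)
                + np.sign(u * v) * np.arctan((np.abs(u) - np.abs(v)) / (np.abs(u) + np.abs(v))))
    return g(x0, y0) - g(x, y)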
I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wants me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
EDIT:
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. It's actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency, if you prefer) in each radius. R is the scale factor for my radius, as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
for x in range(iRes):
for y in range(iRes):
            if i**2 <= (x**2 + y**2) < (i + 1)**2:  # Checks for radius bin i (in pixel units), and puts in cartesian space
                radJ[x+iRes, y+iRes] = (n-m) / a  # Put in actual intensity bins
                radJ[x+iRes, -y+iRes] = (n-m) / a
                radJ[-x+iRes, y+iRes] = (n-m) / a
                radJ[-x+iRes, -y+iRes] = (n-m) / a
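(For reference, I suspect the binning above could be vectorized along the following lines - a rough sketch assuming, as above, that ring i covers pixel radii [i, i+1) and that impB, R and iRes are defined as described - but my real question is about rotating the 1D profile itself.)

import numpy as np

yy, xx = np.mgrid[-iRes:iRes, -iRes:iRes]          # pixel offsets from the centre
ring = np.sqrt(xx**2 + yy**2).astype(int)          # radial bin index of every pixel
edges = np.arange(iRes + 1) * (R / iRes)           # physical bin edges
counts, _ = np.histogram(impB, bins=edges)         # particles per radial bin
areas = np.pi * (edges[1:]**2 - edges[:-1]**2)     # same normalisation as 'a' above
intensity = counts / areas
radJ = np.where(ring < iRes, intensity[np.clip(ring, 0, iRes - 1)], 0.0)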
Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, xi in enumerate(X[0, :]):
    for k, yk in enumerate(Y[:, 0]):
        # distance of this grid cell from the centre, fed to the radial interpolator
        current_radius = np.sqrt(xi ** 2 + yk ** 2)
        profilegrid[i, k] = interpol_index(current_radius)
print(profilegrid)
This will give you exactly what you are looking for. You just have to take your array and calculate a symmetric array ri_data_r that has the same length as your data array and contains the distance of each sample from the middle of the array. The code above does this automatically.
I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of the desired order (here we use order=0, nearest-neighbor). This method is slower but more precise and needs less memory than the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between the two solutions stems from the problem of handling the center of the final image, where the tilts overlap the most: the first one would just keep adding values with each tilt, ending up outside of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
# Credit goes to skimage.transform.radon
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
center = vector.size // 2
square = np.zeros((vector.size, vector.size))
square[center,:] = vector
rad_angle = np.deg2rad(deg_angle)
cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
[-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
[0, 0, 1]])
# Approx. 80% of time is spent in this function
return warp(square, R, clip=False, output_shape=((vector.size, vector.size)))
def place_vectors(vectors, deg_angles):
matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
global_min, global_max = 0, 0
for i, deg_angle in enumerate(deg_angles):
tilt = rotate_vector(vectors[i], deg_angle)
global_min = tilt.min() if global_min > tilt.min() else global_min
global_max = tilt.max() if global_max < tilt.max() else global_max
matrix += tilt
matrix = np.clip(matrix, global_min, global_max)
return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
square = np.ones([vector.size, vector.size]) * np.nan
radius = vector.size // 2
r_values = np.linspace(-radius, radius, vector.size)
rad_angle = np.deg2rad(deg_angle)
    ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(int)
    ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(int)
ind_x = np.clip(ind_x, 0, vector.size-1)
ind_y = np.clip(ind_y, 0, vector.size-1)
square[ind_y, ind_x] = vector
return square
def place_vectors(vectors, deg_angles):
matrices = []
for deg_angle, vector in zip(deg_angles, vectors):
matrices.append(rotate_vector(vector, deg_angle))
matrix = np.nanmean(np.array(matrices), axis=0)
return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
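# A possible implementation of the helper used below (hypothetical; the original
# create_circular_mask() was not included here, only its call signature is known):
def create_circular_mask(h, w, center=None, radius=None):
    if center is None:
        center = (w // 2, h // 2)
    if radius is None:
        radius = min(center[0], center[1], w - center[0], h - center[1])
    Y, X = np.ogrid[:h, :w]
    dist = np.sqrt((X - center[0]) ** 2 + (Y - center[1]) ** 2)
    return dist <= radius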
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
f'Output of rotate_vector(), angle={angle}°\n'
f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated, cmap=plt.cm.Greys_r)
plt.subplot(122)
plt.title(
f'Output of place_vectors(), r={r}, n={n}\n'
f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
plt.imshow(inplace)
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])