Python - How to resample a 2D shape?

I am writing a Python script for some geometrical data manipulation (calculating motion trajectories for a multi-drive industrial machine). Generally, the idea is that there is a given shape (let's say an ellipse, but in the general case it can be any convex shape, defined by a series of 2D points) which is rotated, and its uppermost tangent point must be followed. I don't have a problem with the latter part, but I need a little hint with the 2D shape preparation.
Let's say that the ellipse was defined with too few points, for example 25. (As I said, ultimately this can be any shape, for example a rounded hexagon.) To maintain the necessary precision I need far more points (let's say 1000), preferably equally distributed over the whole shape, or with a higher density of points near corners, sharp curves, etc.
I have a few things ringing in my head. I guess that the DFT (FFT) would be a good starting point for this resampling; analyzing scipy.signal.resample(), I have found that there are far more functions in the scipy.signal package that sound promising to me...
What I'm asking for is a suggestion of which way to follow and which tool to try for this job. Maybe there is a tool meant exactly for what I'm looking for, or maybe I'm overthinking this and one of the FFT-based implementations like resample() will work just fine (of course, after some adjustments at the start and end of the shape to make sure it closes without issues)?
scipy.signal sounds promising; however, as far as I understand, it is meant to work with time-series data, not geometrical data. I guess this may cause some problems, as my data isn't a function (in the mathematical sense).
Thanks and best regards!

As far as I understood, what you want is an interpolated version of your original data.
The DFT (or FFT) by itself will not achieve this, since it only performs a Fourier transform (which is not what you want here).
Theoretically speaking, to interpolate your data you need to define a function that computes values at the new data points.
So, let's say your data contains 5 points, each storing a 1D number (to simplify), and you want a denser array, sampled every 0.5 instead of every 1, filled with a linear interpolation of your original data.
Using numpy.interp:
import numpy as np

original_data = [2, 0, 3, 5, 1]  # your data in 1D
new_data_resolution = 0.5        # new sampling distance (i.e. your x-axis resolution)

interp_data = np.interp(
    x=np.arange(0, 5 - 1 + new_data_resolution, new_data_resolution),  # new sampling points (new axis)
    xp=np.arange(len(original_data)),                                  # original sampling points
    fp=original_data,
)
# now interp_data contains (5 - 1) / 0.5 + 1 = 9 points
After this, you will have data of length (5 - 1) / new_resolution + 1 (which is greater than 5, since new_resolution < 1), whose values are (in this case) a linear interpolation of your original data.
After you have understood this example, you can dive into the scipy.interpolate module for a better overview of the interpolation functions (my example uses a linear function to fill the missing points).
Applying this to n-dimensional arrays is straightforward: iterate over each dimension of your data.
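For the closed-shape case in the question specifically, one option (a minimal sketch, not the only approach) is to treat the outline as a periodic parametric curve and resample it with scipy.interpolate.splprep/splev; the ellipse here is just placeholder data:

import numpy as np
from scipy import interpolate

# Coarse example shape: 25 points on an ellipse (stand-in for any convex outline)
t = np.linspace(0, 2 * np.pi, 25, endpoint=False)
x, y = 4 * np.cos(t), 2 * np.sin(t)
x, y = np.append(x, x[0]), np.append(y, y[0])  # close the loop explicitly

# Fit a periodic parametric spline through the points (per=1 closes the curve)
tck, _ = interpolate.splprep([x, y], s=0, per=1)

# Evaluate the spline at 1000 equally spaced parameter values
u_new = np.linspace(0, 1, 1000, endpoint=False)
x_new, y_new = interpolate.splev(u_new, tck)

Note that equal spacing in the spline parameter is only approximately equal arc-length spacing; if exact spacing (or a corner-weighted density) matters, resample by cumulative arc length instead.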

Related

Quantify roughness of a 2D surface based on given scatter points geometrically

How can one design simple code to automatically quantify the roughness of a 2D surface geometrically, based on given scatter points? For example, using a number r=0 for a smooth surface, r=1 for a very rough surface, and 0 < r < 1 for a surface in between smooth and rough.
To illustrate this question more explicitly, the attached figure below shows several sketches of 2D rough surfaces. The dots are the scatter points with given coordinates. Accordingly, every two adjacent dots can be connected, and a normal vector of each segment can be computed (marked with arrows). I would like to design a function like
def roughness(x, y):
    ...
    return r
where x and y are sequences of the coordinates of each scatter point. For example, in case (a), x=[0,1,2,3,4,5,6], y=[0,1,0,1,0,1,0]; in case (b), x=[0,1,2,3,4,5], y=[0,0,0,0,0,0]. Calling roughness(x, y) should give r=1 (very rough) for case (a) and r=0 (smooth) for case (b), and maybe r=0.5 (medium) for case (d). The question thus refines to: what appropriate components do we need to put inside the function roughness?
Some initial thoughts:
Roughness of a surface is a local concept, considered only within a specific area, i.e. over several local points around the location of interest. Use the mean of the local normal vectors? This may fail: (a) and (b) have the same mean, (0, 1), but (a) is a rough surface and (b) is a smooth one. Use the variance of the local normal vectors? This may also fail: (c) and (d) have the same variance, but (c) is rougher than (d).
maybe something like this:
import numpy as np

def roughness(x, y):
    # angle of each segment between successive points
    t = np.arctan2(np.diff(y), np.diff(x))
    # sine of the differences between successive angles:
    # sin(t2 - t1) = sin(t2)cos(t1) - cos(t2)sin(t1)
    ts = np.sin(t)
    tc = np.cos(t)
    dt = ts[1:] * tc[:-1] - tc[1:] * ts[:-1]
    # mean of squares
    return np.sum(dt**2) / len(dt)
would give you something like what you're asking for.
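A quick sanity check against the cases from the question:

x_a, y_a = [0, 1, 2, 3, 4, 5, 6], [0, 1, 0, 1, 0, 1, 0]   # case (a), zig-zag
x_b, y_b = [0, 1, 2, 3, 4, 5], [0, 0, 0, 0, 0, 0]          # case (b), flat

print(roughness(x_a, y_a))   # 1.0 -> very rough
print(roughness(x_b, y_b))   # 0.0 -> smooth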
Maybe you should consider a protocol definition:
1) geometric definition of the surface first
2) grant that geometric surface intrinsic properties.
2.a) a step function can be based on a quadratic curve between two peaks or two troughs, with their concatenated point as the focus of the 'roughness quadratic', using the slope to define roughness, in analogy to the science behind road speed bumps.
2.b) elliptical objects can be defined by a combination of deformation analysis with circles centered on the incongruity within the body. This can be solved in many ways, analogous to step functions.
2.c) flat lines: select points that deviate from the mean and do a Newtonian fit over a window of 5-20 concatenated points, or whatever is clever.
3) define a proper threshold that fits whatever intuition you are defining as "roughness", or apply the conventions of any professional field to your liking.
This branched approach might be quicker to program, but I am certain this solution can be refactored into a Euclidean construct of 3-point ellipticals, if someone is up for a geometry problem.
The mathematical definitions of many surface parameters can be found here and can easily be implemented in numpy:
https://www.keyence.com/ss/products/microscope/roughness/surface/parameters.jsp
Image (d) shows a challenge: basically, you want to flatten the shape before doing the calculation. This requires prior knowledge of the kind of geometry you want to fit. I found an app, Gwyddion, that can do this in 3D, but it can only interface with Python 2.7, not 3.
If you know which base shape lies underneath:
1) fit the known shape
2) calculate the arc distance between each two points
3) remap the numbers by subtracting 1) from the original data and assigning new coordinates according to 2)
4) perform normal 2D/3D roughness calculations, as sketched below
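As a rough illustration of those four steps, assuming the base shape is a circle (the fit here is deliberately crude and the data is made up):

import numpy as np

# Noisy circle of radius ~10 as stand-in data
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
r = 10 + 0.1 * np.random.randn(theta.size)
x, y = r * np.cos(theta), r * np.sin(theta)

# 1) fit the known shape: centroid plus mean radius as a crude circle fit
cx, cy = x.mean(), y.mean()
r_fit = np.hypot(x - cx, y - cy).mean()

# 2) arc distance of each point along the fitted circle
ang = np.unwrap(np.arctan2(y - cy, x - cx))
arc = r_fit * ang                      # new horizontal coordinate

# 3) subtract the fit: radial deviation becomes the new vertical coordinate
dev = np.hypot(x - cx, y - cy) - r_fit

# 4) run a normal roughness calculation on the flattened profile
print(roughness(arc, dev))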

Simulate speakers around a sphere using superposition - speed improvements needed

Note: Drastic speed improvements since posting, see edits at bottom.
I have some working code, but it overuses loops and I'm pretty sure there is a faster way of doing it. The output array ends up being pretty large, so when I try to make other arrays the same size as the output, I run out of memory rather quickly.
I am simulating many speakers placed around a sphere all pointing toward the center. I have a simulation of a single speaker and I would like to leverage this single simulation by using the principle of superposition. Basically I want to sum up rotated copies of the single transducer simulation to get my final result.
I have an axisymmetric simulation of acoustic pressure data in cylindrical coordinates ("polar_coord_r", "polar_coord_z"). The pressure field from the simulation is unique at each R and Z value and completely described by a 2D array ("P_real_RZ").
I want to sum together multiple rotated copies of this pressure field onto a 3D Cartesian output array. Each copy is rotated to a different location on the sphere. Currently, I am specifying the rotation with an x, y, z point because it allows me to do vector math (spherical coordinates wouldn't let me do this as elegantly). The output array is rather large (770 × 770 × 804).
I have some working code to get the output from a single copy of the speaker ("transducer"). It takes about 12 seconds for each slice, so it would take over two hours to add each new speaker! I want a dozen or so copies of the speaker, so this will take way too long.
The code takes a slice with constant X and computes the R and Z positions at each location in that slice. "r_distance" is a 2D array containing the radial distance from a line passing between the origin and a point ("point"). Similarly, "z_distance" is a 2D array containing the distance along that same line.
To get the pressure for the slice, I find the indices of the closest matching "polar_coord_r" and "polar_coord_z" to the computed R distances and Z distances. I use these indices to find what value of pressure (from P_real_RZ) to place at each value in the output.
Some definitions:
xx, yy, and zz are 1D arrays describing the slices through the output volume
XXX, YYY, and ZZZ are 3D arrays produced by numpy.meshgrid
point is a point which defines the direction that the speaker is rotated. Basically it's just a position vector of the speakers center.
P_real_RZ is a 2D array which specifies the real pressure at each unique R and Z value.
polar_coord_r and polar_coord_z are 1D arrays which define the unique values of R and Z on which P_real_RZ is defined.
current_transducer (only one so far represented in this code) holds the pressure values computed for the current point.
output is the result from summing many speakers/transducers together.
Any suggestions to speed up this code is greatly appreciated.
Working loop:
for i, x in enumerate(xx):
    # Create a unit vector from the origin to the point
    vector = normalize(point)
    # Create a slice of the Cartesian space with constant X
    xyz_slice = np.array([x*np.ones_like(XXX[i,:,:]), YYY[i,:,:], ZZZ[i,:,:]])
    # Project the position vector of each point of the slice onto the unit vector
    projection = np.array(list(map(np.dot, xyz_slice, vector)))
    # Norm the projection, which gives the Z distance along the line passing through the point
    #z_distance = np.apply_along_axis(np.linalg.norm, 0, projection) # this is the slow bit
    z_distance = np.linalg.norm(projection, axis=0) # I'm an idiot
    # Use vector math to determine the distance from the line:
    # each point in the XYZ slice is the sum of a vector along the line and a vector away from the line (the radial vector).
    # By extension, the position of the xyz point minus its projection onto the unit vector gives the radial vector.
    # Norm the radial vector to get the R value everywhere in the slice
    #r_distance = np.apply_along_axis(np.linalg.norm, 0, xyz_slice - projection) # this is the slow bit
    r_distance = np.linalg.norm(xyz_slice - projection, axis=0) # I'm an idiot
    # Map the pressure data to each point in the slice using the R and Z distances with the RZ pressure slice.
    # Look for a more efficient way to do this, perhaps; currently takes about 12 seconds per slice
    r_indices = r_map_v(r_distance) # 1.3 seconds by itself
    z_indices = z_map_v(z_distance)
    r_indices[r_indices > 384] = 384 # clamp indices above the max valid index for r_distance
    z_indices[z_indices > 803] = 803 # clamp indices above the max valid index for z_distance
    current_transducer[i,:,:] = P_real_RZ[z_indices, r_indices]

# Sum the mapped pressure data into the output.
output += current_transducer
I have also tried to work with the simulation data in the form of a 3D Cartesian array, i.e. the pressure data from the simulation for all x, y, and z values, the same size as the output. I can rotate this 3D array in one direction (not the two rotations needed for speakers arranged on a sphere). This takes up way too much memory and is still painfully slow; I end up getting memory errors with this approach.
Edit: I found a slightly simpler way to do it but it is still slow. I've updated the code above so that there are no longer nested loops.
I ran a line profiler and the slowest lines by far were the two containing np.apply_along_axis(). I'm afraid I might have to rethink how I do this completely.
Final Edit: I initially had a nested loop, which I assumed to be the issue. I don't know what made me think I needed apply_along_axis with linalg.norm. In any case, that was the issue.
I haven't looked for all the ways that you could optimize this code, but this issue jumped out at me: "I ran a line profiler and the slowest lines by far were the two containing np.apply_along_axis()." np.linalg.norm accepts an axis argument. You can replace the line
z_distance = np.apply_along_axis(np.linalg.norm, 0, projection)
with
z_distance = np.linalg.norm(projection, axis=0)
(and likewise for the other use of np.apply_along_axis and np.linalg.norm).
That should improve the performance a bit.
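As a quick sanity check (with a made-up array size) that the two forms agree:

import numpy as np

projection = np.random.rand(3, 100, 100)   # stand-in for a 3-component slice
slow = np.apply_along_axis(np.linalg.norm, 0, projection)
fast = np.linalg.norm(projection, axis=0)
print(np.allclose(slow, fast))   # True, and the axis version is far faster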

Efficient 2D cross correlation in Python?

I have two arrays of size (n, m, m) (n images of size (m, m)). I want to perform a cross-correlation between each corresponding pair of images in the two arrays.
Example: n=1 -> corr2d([m,m]_1, [m,m]_2)
My current approach uses a bunch of for loops in Python:
from scipy.signal import correlate2d

for i in range(len(X)):
    X_co = X[i, 0, :, :] / np.max(X[i, 0, :, :])
    X_x = X[i, 1, :, :] / np.max(X[i, 1, :, :])
    autocorr[i, 0, :, :] = correlate2d(X_co, X_x, mode='same', boundary='fill', fillvalue=0)
Obviously this is very slow when the input contains many images, and it becomes a substantial part of the total run time if (m, m) << n.
The obvious optimization is to skip the loop and feed everything directly to the compiled correlation function. Currently I'm using scipy's correlate2d.
I've looked around but haven't found any function that allows correlation along some axis or multiple inputs.
Any tips on how to make scipy's correlate2d work or alternatives?
I decided to implement it via the FFT instead.
import numpy as np

def fft_xcorr2D(x):
    # FFT over axes (-2, -1) (the default in fft2);
    # pad because of the cyclic (circular) behavior of the FFT
    x = np.fft.fft2(np.pad(x, ([0,0], [0,0], [0,34], [0,34]), mode='constant'))
    # Conjugate the second band for correlation, not convolution (convolution theorem)
    x[:, 1, :, :] = np.conj(x[:, 1, :, :])
    # Multiply elementwise over the band axis (2 image bands for me), then
    # inverse FFT over (-2, -1) and fftshift over the rows and columns of each image
    corr = np.fft.fftshift(np.fft.ifft2(np.prod(x, axis=1)), axes=(-2, -1))
    # Return after removing the padding
    return np.abs(corr)[:, 3:-2, 3:-2]
Call via:
ts = fft_xcorr2D(X)
If anybody wants to use it:
My input is a 4D array: (N, 2, #Rows, #Cols)
E.g. (500, 2, 30, 30): 500 images, 2 bands (polarizations, for example), of 30x30 pixels
If your input is different, adjust the padding to your liking
Check that your input order is the same as mine; otherwise change the axes arguments in the fft2 and ifft2 functions, in np.prod, and in fftshift. I use fftshift to get the maximum value in the middle (otherwise it ends up in the corners), so be wary of that if that's not what you want.
Why is it the maximum value? Technically, it doesn't have to be, but for my purposes it is. fftshift is used to get a correlation that looks the way you're used to; otherwise, the quadrants are turned "inside out". If you wonder what I mean, remove the fftshift call (just the fftshift part, not its arguments), call the function as before, and plot it.
Afterwards, it should be ready to use.
Possibly x.prod(axis=1) is faster than np.prod(x, axis=1), but this is an old post; it showed no improvement for me when I tried it.
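For reference, recent SciPy versions can also batch this without the manual FFT bookkeeping: scipy.signal.fftconvolve accepts an axes argument, and cross-correlation is convolution with a flipped, conjugated kernel. A minimal sketch with placeholder arrays:

import numpy as np
from scipy.signal import fftconvolve

A = np.random.rand(500, 30, 30)
B = np.random.rand(500, 30, 30)

# Correlate A[i] with B[i] for every i at once, over the last two axes only
corr = fftconvolve(A, B[:, ::-1, ::-1].conj(), mode='same', axes=(-2, -1))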

Python - Kriging (Gaussian Process) in scikit_learn

I am considering using this method to interpolate some 3D points I have. As input, I have atmospheric concentrations of a gas at various elevations over an area. The data appears as values every few feet of vertical elevation for several tens of feet, but horizontally separated by many hundreds of feet (so 'columns' of tightly packed values).
The assumption is that values vary in the vertical direction significantly more than in the horizontal direction at any given point in time.
I want to perform 3D kriging with that assumption accounted for (as a parameter I can adjust or that is statistically defined - either/or).
I believe the scikit-learn module can do this. If it can, my question is: how do I create a discrete cell output? That is, output onto a 3D grid with cells of, say, 50 x 50 x 1 feet. Ideally, I would like output of the form [x_location, y_location, value] with separations of those (or similar) distances.
Unfortunately I don't have a lot of time to play around with it, so I'm just hoping to figure out if this is possible in Python before delving into it. Thanks!
Yes, you can definitely do that in scikit_learn.
In fact, it is a basic feature of kriging/Gaussian process regression that you can use anisotropic covariance kernels.
As specified in the manual (quoted below), you can either set the parameters of the covariance yourself or estimate them, and you can choose to have all parameters equal or all different.
theta0 : double array_like, optional
An array with shape (n_features, ) or (1, ). The parameters in the
autocorrelation model. If thetaL and thetaU are also specified, theta0
is considered as the starting point for the maximum likelihood
estimation of the best set of parameters. Default assumes isotropic
autocorrelation model with theta0 = 1e-1.
In the 2d case, something like this should work:
import numpy as np
from sklearn.gaussian_process import GaussianProcess

# Prediction grid
x = np.arange(1, 51)
y = np.arange(1, 51)
X, Y = np.meshgrid(x, y)

# Observations: obs_x, obs_y are the sample coordinates, obs_data the values
points = np.column_stack([obs_x, obs_y])
values = obs_data  # replace with your observed data

gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1., nugget=0.001)
gp.fit(points, values)

XY_pairs = np.column_stack([X.flatten(), Y.flatten()])
predicted = gp.predict(XY_pairs).reshape(X.shape)
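Note that GaussianProcess was deprecated and later removed from scikit-learn; in the current API the same anisotropy is expressed through a per-dimension length scale on an RBF kernel. A hedged 3D sketch (obs_xyz, obs_val and the length scales are placeholders):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

obs_xyz = np.random.rand(50, 3) * [500.0, 500.0, 50.0]   # x, y, z of the samples (feet)
obs_val = np.random.rand(50)                              # gas concentrations

# Short vertical length scale = faster variation vertically (anisotropic kernel)
kernel = RBF(length_scale=[100.0, 100.0, 5.0])
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-3).fit(obs_xyz, obs_val)

# Predict on a discrete grid of 50 x 50 x 1 ft cells
gx, gy, gz = np.meshgrid(np.arange(0, 500, 50.0),
                         np.arange(0, 500, 50.0),
                         np.arange(0, 50, 1.0), indexing='ij')
grid = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])
predicted = gpr.predict(grid).reshape(gx.shape)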

Sign on results of fft

I am attempting to calculate the MTF from a test target. I calculate the spread function easily enough, but the FFT results do not quite make sense to me. To summarize, the values seem to alternate, giving me a reflection of what I would expect. To test, I used a simple square wave and numpy:
from numpy import fft

data = []
for x in range(0, 20):
    data.append(0)
data[9] = 10
data[10] = 10
data[11] = 10
dataFFT = fft.fft(data)
The results look correct, with the exception of the sign... I am seeing the following for the first 4 values as an example:
30.00000000 +0.00000000e+00j
-29.02113033 +7.10542736e-15j
26.18033989 -1.24344979e-14j
-21.75570505 +1.24344979e-14j
So my question is: why positive -> negative -> positive -> negative in the real plane? This is not what I would expect. If I plot it, it almost appears that the correct function is mirrored around the x axis.
(Plots omitted: the first showed the spectrum I was expecting; the second shows the alternating one I actually get.)
Your pulse is symmetric and positioned in the center of your FFT window (around N/2). Symmetric real data corresponds to only the cosine or "real" components of an FFT result. Note that the cosine basis function alternates between -1 and 1 at the center of the FFT window, depending on the frequency bin index (the number of cosine periods per FFT width). So the correlation of these FFT basis functions with a positive-going pulse will also alternate, as long as the pulse is narrower than half the cosine period.
If you want the largest FFT coefficients to be mostly positive, try centering your narrow rectangular pulse around time 0 (or circularly, time N), where the cosine function is always 1 for any frequency.
It works if you shift the data to be centered around 0 instead of the middle of your array, with:
dataFFT = fft.fft(fft.fftshift(data))
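A quick check of this, reusing the question's pulse: wrapped circularly around index 0, the first few real parts come out positive, with exactly the magnitudes shown above.

import numpy as np

data = np.zeros(20)
data[[19, 0, 1]] = 10             # the same 3-sample pulse, centered on time 0
print(np.fft.fft(data).real[:4])  # [30.  29.02  26.18  21.76], all positive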
This isn't all that unexpected. If you want to check against conventional plots, make sure you convert the output to magnitude and phase before coming to any conclusions.
I did a quick check using your code, with numpy.abs for magnitude and numpy.angle for phase. It sure looks like a sinc() function to me, which is what would be expected if the time domain is a square pulse. If you do this, you'll find a pretty wide sinc, as would be expected for a short-duration pulse on so few samples.
You forgot to specify whether your data is real or complex.
Not everyone codes in python/numpy (including me), and if you do not know this, then you are probably handling the data to/from the FFT the wrong way.
FFT input can be in either the real or complex domain.
FFT output is in the complex domain.
So check the docs for your FFT implementation, specify this, and repair your data handling accordingly. Complex data usually has the first value Re and the second Im, but that depends on the FFT implementation/configuration.
Here is an example of an impulse response from an FFT (image omitted). The first trace is the input real-domain signal (Im = 0), a single pulse of finite nonzero width; the second is the Re part of the FFT output; the third is the Im part of the FFT output. If you zoom in a bit, you will see the amplitude range of the y axis of each signal (on the left).
Do not forget that different FFT implementations can use different normalization constants, which will change the amplitude of the signal. If you want magnitude and phase, convert like this:
mag = sqrt(Re*Re + Im*Im); // magnitude
ang = atanxy(Re, Im);      // phase angle
atanxy(dx, dy) is the 4-quadrant arctangent, also called atan2, but be careful to pass the operands in the order your atanxy/atan2 implementation expects. You can also use my C++ atanxy implementation.
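In numpy terms (using dataFFT from the question's code), the same conversion is simply:

import numpy as np

mag = np.abs(dataFFT)     # magnitude, sqrt(Re^2 + Im^2)
ang = np.angle(dataFFT)   # phase angle, atan2(Im, Re)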
[Notes]
If your input signal is in the real domain, then the FFT output is symmetric: the Re part is mirrored and the Im part is mirrored with its sign flipped. The Re part will look like:
{ a0, a1, a2, a3, ..., a(n-1), a(n-1), ..., a3, a2, a1, a0 }
exactly like in the image above. On the left are the low frequencies and in the middle is the top frequency. If your input signal is in the complex domain, then the output can be anything.
