Efficient stitching of datasets

I have multiple measurement datasets that I want to combine into a single dataset. I have a working solution, but it is terribly inefficient, and I would appreciate tips on how to improve it.
Think of the measurements as multiple height maps of one object that I want to combine into a single height map. My measurements are not perfect and may have some tilt and height offset. Let's assume (for now) that the x-y positions are known exactly. Here is an example:
import numpy as np
import matplotlib.pyplot as plt
def height_profile(x, y):
    radius = 100
    return np.sqrt(radius**2-x**2-y**2)-radius
np.random.seed(123)
datasets = {}
# DATASET 1
x = np.arange(-8, 2.01, 0.1)
y = np.arange(-3, 7.01, 0.1)
xx, yy = np.meshgrid(x, y)
# height is the actual profile + noise
zz = height_profile(xx, yy) + np.random.randn(*xx.shape)*0.001
datasets[1] = [xx, yy, zz]
plt.figure()
plt.pcolormesh(*datasets[1])
plt.colorbar()
# DATASET 2
x = np.arange(-2, 8.01, 0.1)
y = np.arange(-3, 7.01, 0.1)
xx, yy = np.meshgrid(x, y)
# height is the actual profile + noise + random offset + random tilt
zz = height_profile(xx, yy) + np.random.randn(*xx.shape)*0.001 + np.random.rand() + np.random.rand()*xx*0.1 + np.random.rand()*yy*0.1
datasets[2] = [xx, yy, zz]
plt.figure()
plt.pcolormesh(*datasets[2])
plt.colorbar()
# DATASET 3
x = np.arange(-5, 5.01, 0.1)
y = np.arange(-7, 3.01, 0.1)
xx, yy = np.meshgrid(x, y)
# height is the actual profile + noise + random offset + random tilt
zz = height_profile(xx, yy) + np.random.randn(*xx.shape)*0.001 + np.random.rand() + np.random.rand()*xx*0.1 + np.random.rand()*yy*0.1
datasets[3] = [xx, yy, zz]
plt.figure()
plt.pcolormesh(*datasets[3])
plt.colorbar()
To combine the three (or more) datasets, I have the following strategy: Find the overlap between the datasets, calculate the summed-up height difference between datasets in the overlap regions (residual_overlap) and try to minimize the height differences (residual) using lmfit. To apply the transformations on the dataset (tilt, offset, etc.) I have a dedicated function.
from lmfit import minimize, Parameters
from copy import deepcopy
from itertools import combinations
from scipy.interpolate import griddata
def data_transformation(dataset, idx, params):
    dataset = deepcopy(dataset)
    if 'x_offset_{}'.format(idx) in params:
        x_offset = params['x_offset_{}'.format(idx)].value
    else:
        x_offset = 0
    if 'y_offset_{}'.format(idx) in params:
        y_offset = params['y_offset_{}'.format(idx)].value
    else:
        y_offset = 0
    if 'tilt_x_{}'.format(idx) in params:
        x_tilt = params['tilt_x_{}'.format(idx)].value
    else:
        x_tilt = 0
    if 'tilt_y_{}'.format(idx) in params:
        y_tilt = params['tilt_y_{}'.format(idx)].value
    else:
        y_tilt = 0
    if 'piston_{}'.format(idx) in params:
        piston = params['piston_{}'.format(idx)].value
    else:
        piston = 0
    _x = dataset[0] - np.mean(dataset[0])
    _y = dataset[1] - np.mean(dataset[1])
    dataset[0] = dataset[0] + x_offset
    dataset[1] = dataset[1] + y_offset
    dataset[2] = dataset[2] + 2 * (x_tilt * _x + y_tilt * _y) + piston
    return dataset
def residual_overlap(dataset_0, dataset_1):
    xy_0 = np.stack((dataset_0[0].flatten(), dataset_0[1].flatten()), axis=1)
    xy_1 = np.stack((dataset_1[0].flatten(), dataset_1[1].flatten()), axis=1)
    difference = griddata(xy_0, dataset_0[2].flatten(), xy_1) - \
        dataset_1[2].flatten()
    return difference
def residual(params, datasets):
    datasets = deepcopy(datasets)
    for idx in datasets:
        datasets[idx] = data_transformation(
            datasets[idx], idx, params)
    residuals = []
    for combination in combinations(list(datasets), 2):
        residuals.append(residual_overlap(
            datasets[combination[0]], datasets[combination[1]]))
    residuals = np.concatenate(residuals)
    residuals[np.isnan(residuals)] = 0
    return residuals
def minimize_datasets(params, datasets, **minimizer_kw):
    minimize_fnc = lambda *args, **kwargs: residual(*args, **kwargs)
    datasets = deepcopy(datasets)
    min_result = minimize(minimize_fnc, params,
                          args=(datasets, ), **minimizer_kw)
    return min_result
I run the "stitching" like this:
params = Parameters()
params.add('tilt_x_2', 0)
params.add('tilt_y_2', 0)
params.add('piston_2', 0)
params.add('tilt_x_3', 0)
params.add('tilt_y_3', 0)
params.add('piston_3', 0)
fit_result = minimize_datasets(params, datasets)
plt.figure()
plt.pcolormesh(*data_transformation(datasets[1], 1, fit_result.params), alpha=0.3, vmin=-0.5, vmax=0)
plt.pcolormesh(*data_transformation(datasets[2], 2, fit_result.params), alpha=0.3, vmin=-0.5, vmax=0)
plt.pcolormesh(*data_transformation(datasets[3], 3, fit_result.params), alpha=0.3, vmin=-0.5, vmax=0)
plt.colorbar()
As you can see, it does work, but the stitching takes about a minute for these small datasets on my computer. In reality I have more and bigger datasets.
Do you see a way to improve the stitching performance?
Edit: As suggested, I ran a profiler and it shows that 99.5% of the time is spent in the griddata function. That one is used to interpolate datapoints from dataset_0 to the locations of dataset_1. If I switch method to "nearest", the execution time drops to about a second, but then there is no interpolation happening. Any chance to improve the speed of the interpolation?
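One possible direction is sketched below. It assumes that every dataset lies on a regular, axis-aligned grid (as in the example above) and that the x and y offsets stay fixed during the fit. Under those assumptions, scipy.interpolate.RegularGridInterpolator can do the linear interpolation without the triangulation that griddata builds on each call. The helper name make_interpolator and the plane used as test data are illustrative and not part of the original code.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def make_interpolator(x, y, zz):
    """Linear interpolator for heights zz sampled on the regular grid (y, x).

    zz must have shape (len(y), len(x)), which is what np.meshgrid(x, y) yields.
    Points outside the grid evaluate to NaN, mirroring griddata's default.
    """
    return RegularGridInterpolator((y, x), zz, bounds_error=False,
                                   fill_value=np.nan)

# Grid of "dataset 1" with a simple plane as stand-in height data
x0 = np.arange(-8, 2.01, 0.1)
y0 = np.arange(-3, 7.01, 0.1)
xx0, yy0 = np.meshgrid(x0, y0)
zz0 = 0.5 * xx0 - 0.25 * yy0 + 1.0

# Grid of "dataset 2", which only partly overlaps dataset 1
x1 = np.arange(-2, 8.01, 0.1)
y1 = np.arange(-3, 7.01, 0.1)
xx1, yy1 = np.meshgrid(x1, y1)

interp = make_interpolator(x0, y0, zz0)
# The interpolator expects points ordered like the grid axes: (y, x)
points = np.column_stack((yy1.ravel(), xx1.ravel()))
z0_on_grid1 = interp(points)
overlap = ~np.isnan(z0_on_grid1)
```

Linear interpolation reproduces a plane exactly, so the tilt and piston terms could in principle be added after interpolating the untransformed heights. The interpolation would then run once per dataset pair instead of on every residual evaluation. This has not been timed against the original code.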

Skimming through the code, I can't see much to improve other than the repeated calls to deepcopy().
However, I recommend profiling the code. If you are using PyCharm, you can start the profiler with the clock/run button.
I am sure other IDEs have similar capabilities. This way you can figure out which function takes the most time.
Whole graph:
When I zoom in to a few functions (I am showing google cloud functions):
You can see how many times they are called and how long they took etc.
Long story short, you need a profiler!
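If no IDE profiler is at hand, the standard library's cProfile module gives the same kind of information. The sketch below profiles two made-up functions (slow_sum and work are placeholders, not part of the question's code) and prints the five entries with the largest cumulative time.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately slow placeholder workload
    total = 0
    for i in range(n):
        total += i * i
    return total

def work():
    return [slow_sum(20_000) for _ in range(20)]

profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

# Collect the report in a string, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

In the question's code, the call to profile would be minimize_datasets(params, datasets) in place of work().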

Related

How to use a smooth curve to link points approximately distributing in a circle?

I have a set of twelve points, centered at (0, 0) and distributed approximately on a circle at intervals of 30 degrees, as shown in the image.
The twelve points
I want to link them with a smooth curve that passes through all of them, like in the image below (I drew the red line by hand).
a hand-drawn curve in red
I want to do this in Python or MATLAB. I have tried some interpolation methods for the upper half and the lower half separately, intending to combine them into a complete curve. However, the results always overshoot.
Thank you for any suggestions!

I think the key here is to note that you have to consider it as a parametrized curve in 2d, not just a 1d to 2d function. Furthermore since it should be something like a circle, you need an interpolation method that supports periodic boundaries. Here are two methods for which this applies:
% set up toy data
t = linspace(0, 2*pi, 10);
t = t(1:end-1);
a = 0.08;
b = 0.08;
x = cos(t+a*randn(size(t))) + b*randn(size(t));
y = sin(t+a*randn(size(t))) + b*randn(size(t));
plot(x, y, 'ok');
% fourier interpolation
z = x+1i*y;
y = interpft(z, 200);
hold on
plot(real(y), imag(y), '-.r')
% periodic spline interpolation
z = [z, z(1)];
n = numel(z);
t = 1:n;
pp = csape(t, z, 'periodic');
ts = linspace(1, n, 200);
y = ppval(pp, ts);
plot(real(y), imag(y), ':b');

Thanks for the suggestions from @flawr. Following that answer, I implemented the periodic spline interpolation in Python (I am still working on the Fourier interpolation). Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import CubicSpline
# set up toy data
t = np.linspace(0, 2*np.pi, 10)
t = t[0:-1]
a = 0.08
b = 0.08
x = np.cos(t + a * np.random.normal(size=len(t))) + b * np.random.normal(size=len(t))
y = np.sin(t + a * np.random.normal(size=len(t))) + b * np.random.normal(size=len(t))
plt.scatter(x, y)
# periodic spline interpolation
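# represent each point as a complex number x + i*y
z = x + 1j * y
# close the curve by repeating the first point at the end
z = np.append(z, z[0])
len_z = len(z)
t = np.arange(len_z)
cs = CubicSpline(t, z, bc_type='periodic')
# evaluate over exactly one period, i.e. up to the last knot (len_z - 1)
xs = np.linspace(0, len_z - 1, 200)
y_new = cs(xs)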
plt.plot(y_new.real, y_new.imag)
plt.show()
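For the Fourier interpolation that is still missing above, a minimal NumPy sketch is to zero-pad the FFT of the complex points. The function name fourier_interp is made up for this example, and the sketch assumes an odd number of input points (nine in the toy data), which avoids special handling of the Nyquist term. It is similar in spirit to MATLAB's interpft but is not claimed to match it in every detail.

```python
import numpy as np

def fourier_interp(z, m):
    """Resample one period of the closed curve z (complex samples) to m points."""
    z = np.asarray(z, dtype=complex)
    n = len(z)
    if n % 2 == 0 or m < n:
        raise ValueError("this sketch assumes an odd number of samples and m >= n")
    spectrum = np.fft.fft(z)
    n_pos = (n + 1) // 2                 # DC plus the positive frequencies
    padded = np.zeros(m, dtype=complex)
    padded[:n_pos] = spectrum[:n_pos]    # keep low positive frequencies
    padded[m - (n - n_pos):] = spectrum[n_pos:]  # keep negative frequencies
    # ifft divides by m, the original samples were scaled by n
    return np.fft.ifft(padded) * (m / n)

# Nine points on the unit circle; the first point is NOT repeated at the end
t = np.linspace(0, 2 * np.pi, 9, endpoint=False)
z = np.cos(t) + 1j * np.sin(t)
curve = fourier_interp(z, 200)
```

To draw a closed line, plot np.append(curve, curve[0]) using its real part as x and its imaginary part as y.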

Create 3D Streamtube plot in Plotly

Aim
I would like to create a 3D Streamtube Plot with Plotly.
Here is a cross-section of the vector field in the middle of the plot to give you an idea of what it looks like:
The final vector field should have rotational symmetry.
My Attempt
Download the data here: https://filebin.net/x6ywfuo6v4851v74
Run the code below:
Code:
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
import plotly.io as pio
pio.renderers.default='browser'
# Import data to pandas
df = pd.read_csv("data.csv")
# Plot
X = np.linspace(0,1,101)
Y = np.linspace(0,1,10)
Z = np.linspace(0,1,101)
# Points from which the streamtubes should originate
xpos,ypos = np.meshgrid(X[::5],Y, indexing="xy")
xpos = xpos.reshape(1,-1)[0]
ypos = ypos.reshape(1,-1)[0]
starting_points = px.scatter_3d(
x=xpos,
y=ypos,
z=[-500]*len(xpos)
)
starting_points.show()
# Streamtube Plot
data_plot = [go.Streamtube(
x = df['x'],
y = df['y'],
z = df['z'],
u = df['u'],
v = df['v'],
w = df['w'],
starts = dict( #Determines the streamtubes starting position.
x=xpos,
y=ypos,
z=[-500]*len(xpos)
),
#sizeref = 0.3,
colorscale = 'jet',
showscale = True,
maxdisplayed = 300 #Determines the maximum segments displayed in a streamtube.
)]
fig = go.Figure(data=data_plot)
fig.show()
The initial points (starting points) of the streamtubes seem to be nicely defined:
...but the resulting 3D streamtube plot is very weird:
Edit
I tried normalizing the field plot, but the result is still not satisfactory:
import plotly.graph_objs as go
import pandas as pd
import numpy as np
import plotly.io as pio
pio.renderers.default='browser'
# Import data to pandas
df = pd.read_csv("data.csv")
# NORMALIZE VECTOR FIELD -> between [0,1]
df["u"] = (df["u"]-df["u"].min()) / (df["u"].max()-df["u"].min())
df["v"] = (df["v"]-df["v"].min()) / (df["v"].max()-df["v"].min())
df["w"] = (df["w"]-df["w"].min()) / (df["w"].max()-df["w"].min())
# Plot
X = np.linspace(0,1,101)
Y = np.linspace(0,1,10)
Z = np.linspace(0,1,101)
# Points from which the streamtubes should originate
xpos,ypos = np.meshgrid(X[::5],Y, indexing="xy")
xpos = xpos.reshape(1,-1)[0]
ypos = ypos.reshape(1,-1)[0]
# Streamtube Plot
data_plot = [go.Streamtube(
x = df['x'],
y = df['y'],
z = df['z'],
u = df['u'],
v = df['v'],
w = df['w'],
starts = dict( #Determines the streamtubes starting position.
x=xpos,
y=ypos,
z=[0]*len(xpos)
),
#sizeref = 0.3,
colorscale = 'jet',
showscale = True,
maxdisplayed = 300 #Determines the maximum segments displayed in a streamtube.
)]
fig = go.Figure(data=data_plot)
fig.show()
Data
As for the data itself:
It is created from 10 slices (y-direction). For each slice (y), [u,v,w] on a regular xz mesh (101x101) was computed. The whole was then assembled into the dataframe which you can download, and which has 101x101x10 data points.
Edit 2
It may be that I am wrongly converting my original data (download here: https://filebin.net/tlgkz3fy1h3j6h5o) into the format suitable for plotly, hence I was wondering if you know how this can be done correctly?
Here some code to visualize the data in a 3D vector plot correctly:
# %%
import pickle
import numpy as np
import matplotlib.pyplot as plt
# Import Full Data
with open("full_data.pickle", 'rb') as handle:
    full_data = pickle.load(handle)
# Axis
X = np.linspace(0,1,101)
Y = np.linspace(0,1,10)
Z = np.linspace(-500,200,101)
# Initialize list of all fields
DX = []
DY = []
DZ = []
for cross_section in list(full_data["cross_sections"].keys()):
    # extract field components in x, y, and z
    dx,dy,dz = full_data["cross_sections"][cross_section]
    # Make them numpy immediately
    dx = np.array(dx)
    dy = np.array(dy)
    dz = np.array(dz)
    # Append
    DX.append(dx)
    DY.append(dy)
    DZ.append(dz)
#Convert to numpy
DX = np.array(DX)
DY = np.array(DY)
DZ = np.array(DZ)
# Create 3D Quiver Plot with color gradient
# Source: https://stackoverflow.com/questions/65254887/how-to-plot-with-matplotlib-a-3d-quiver-plot-with-color-gradient-for-length-giv
def plot_3d_quiver(x, y, z, u, v, w):
    # COMPUTE LENGTH OF VECTOR -> MAGNITUDE
    c = np.sqrt(np.abs(v) ** 2 + np.abs(u) ** 2 + np.abs(w) ** 2)
    # np.ptp(c) is used because the ndarray.ptp() method was removed in NumPy 2.0
    c = (c.ravel() - c.min()) / np.ptp(c)
    # Repeat for each body line and two head lines
    c = np.concatenate((c, np.repeat(c, 2)))
    # Colormap
    c = plt.cm.jet(c)
    fig = plt.figure(dpi =300)
    # fig.gca(projection='3d') is not accepted by recent Matplotlib versions
    ax = fig.add_subplot(projection='3d')
    ax.quiver(x, y, z, u, v, w, colors=c, length=0.2, arrow_length_ratio=0.7)
    plt.gca().invert_zaxis()
    plt.show()
# Create Mesh !
xi, yi, zi = np.meshgrid(X, Y, Z, indexing='xy')
skip_every = 5
skip_slice = 2
skip3D=(slice(None,None,skip_slice),slice(None,None,skip_every),slice(None,None,skip_every))
# Source: https://stackoverflow.com/questions/68690442/python-plotting-3d-vector-field
plot_3d_quiver(xi[skip3D], yi[skip3D], zi[skip3D]/1000, DX[skip3D], DY[skip3D],
np.moveaxis(DZ[skip3D],2,1))
As you can see, there are some long downward vectors in the middle of the 3D space, which are not shown in the plotly tubes.
Edit 3
Using the code from the answer, I get this:
This is a huge improvement. This looks almost perfect and is in accordance to what I expect.
A few more questions:
Is there a way to also show some tubes at the lower part of the plot?
Is there a way to flip the z-axis, such that the tubes are coming down from -z to +z (like shown in the cross-section streamline plot) ?
How does the data need to be structured to be organized correctly for the plotly plot? I ask that because of the use of np.moveaxis()?

I have rewritten my answer to reflect the history of the conversation in a more organized way.
The situation is:
len(np.unique(df['x']))
>>> 101
compared with:
len(np.unique(df['y']))
>>> 10
It seems the data in the y-direction are much coarser than in the x-direction.
In the z-direction the situation is even worse, because the range of the data is far larger than that of x and y:
df.min()
>>> x 0.000000
y 0.000000
z -500.000000
u -0.369106
v -0.259156
w -0.517652
df.max()
>>> x 1.000000
y 1.000000
z 200.000000
u 0.368312
v 0.238271
w 1.257869
The solution to the ill-formed dataset consists of three steps:
1. Normalize the vector field and the sample points in each direction.
2. Either reduce the data density in the x and z directions or increase the data density on the y-axis. (This step is optional but generally recommended.)
3. After making a plot based on the new data, change the axis ticks to the real values.
To normalize a vector field in this situation, which is apparently an engineering one, it is important to maintain the relative length of the vectors at every spatial point, by doing it this way:
# NORMALIZE VECTOR FIELD -> between [0,1]
np_df = np.array([u, v, w])
vecf_norm = np.linalg.norm(np_df, 2, axis=0)
max_norm = np.max(vecf_norm)
min_norm = np.min(vecf_norm)
u = u * (vecf_norm - min_norm) / (max_norm - min_norm)
v = v * (vecf_norm - min_norm) / (max_norm - min_norm)
w = w * (vecf_norm - min_norm) / (max_norm - min_norm)
As you will see at the end, this formulation will be used to enhance the resulting tube-plot.
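As a small check of what this formulation does, the sketch below applies it to three hand-picked vectors. The function name rescale_by_norm is made up for this example. Each vector is multiplied by a factor between 0 and 1 that depends on its length, so directions are preserved, the longest vector keeps its length and the shortest collapses to zero. It assumes the vectors do not all have the same length, otherwise the denominator is zero.

```python
import numpy as np

def rescale_by_norm(u, v, w):
    """Scale each vector (u, v, w) by its min-max normalized length."""
    vec = np.array([u, v, w], dtype=float)
    norm = np.linalg.norm(vec, 2, axis=0)
    # factor in [0, 1]; assumes norm.max() > norm.min()
    factor = (norm - norm.min()) / (norm.max() - norm.min())
    return vec[0] * factor, vec[1] * factor, vec[2] * factor

# Three vectors with lengths 5, 2 and 1
u = np.array([3.0, 0.0, 1.0])
v = np.array([4.0, 0.0, 0.0])
w = np.array([0.0, 2.0, 0.0])
u2, v2, w2 = rescale_by_norm(u, v, w)
new_lengths = np.sqrt(u2**2 + v2**2 + w2**2)
```

Here the scaling factors are 1, 0.25 and 0, so the new lengths are 5, 0.5 and 0.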
Please let me add some important details about using dimensionless data for engineering data visualisation:
First of all, if this vector field results from any sort of differential equation, it is highly recommended to reformulate your P.D.E. as a dimensionless equation before attempting to solve it numerically.
If the vector field is the result of an already dimensionless differential equation, you need to plot it using dimensionless data (including geometry and u, v, w values).
Please note that plotly uses the local divergence values to determine the local diameter of the tubes. When changing the vector field (and the geometry), we change the divergence as well.
I tried to mix your initial and second codes to get this:
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
import plotly.io as pio
import pickle
pio.renderers.default='browser'
# Import Full Data
with open("full_data.pickle", 'rb') as handle:
    full_data = pickle.load(handle)
# Axis
X = np.linspace(0,1,101)
Y = np.linspace(0,1,10)
Z = np.linspace(-0.5,0.2,101)
xpos,ypos = np.meshgrid(X[::5],Y, indexing="ij")
#xpos = xpos.reshape(1,-1)[0]
#ypos = ypos.reshape(1,-1)[0]
xpos = np.ravel(xpos)
ypos = np.ravel(ypos)
# Initialize List of all fields
DX = []
DY = []
DZ = []
for cross_section in list(full_data["cross_sections"]):
    # extract field components in x, y, and z
    dx,dy,dz = full_data["cross_sections"][cross_section]
    # Make them numpy immediately
    dx = np.array(dx)
    dy = np.array(dy)
    dz = np.array(dz)
    # Append
    DX.append(dx)
    DY.append(dy)
    DZ.append(dz)
#Convert to numpy
move_i = [0, 1, 2]
move_e = [1, 2, 0]
DX = np.moveaxis(np.array(DX), move_i, move_e)
DY = np.moveaxis(np.array(DY), move_i, move_e)
DZ = np.moveaxis(np.array(DZ), move_i, move_e)
# Create Mesh !
xi, yi, zi = np.meshgrid(X, Y, Z, indexing="ij")
data_plot = [go.Streamtube(
x = np.ravel(xi),
y = np.ravel(yi),
z = np.ravel(zi),
u = np.ravel(DX),
v = np.ravel(DY),
w = np.ravel(DZ),
starts = dict( #Determines the streamtubes starting position.
x=xpos,
y=ypos,
z=np.array([-0.5]*len(xpos)
)),
#sizeref = 0.3,
colorscale = 'jet',
showscale = True,
maxdisplayed = 300 #Determines the maximum segments displayed in a streamtube.
)]
fig = go.Figure(data=data_plot)
fig.show()
In this code I have removed the skipping, because I suspect that is where things go wrong. The resulting plot, which you have added to your question, looks similar to the 2D plot in your question, but it needs more work to give a better result.
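The axis reordering used in the code above can be checked on a small array. The sketch assumes, as the np.moveaxis call implies, that the stacked list of cross-sections is indexed as [slice (y), z, x]. The tiny sizes stand in for 10, 101 and 101.

```python
import numpy as np

ny, nz, nx = 2, 3, 4                      # stand-ins for 10, 101, 101
# Stacked cross-sections, assumed to be indexed as [y_slice, z, x]
stacked = np.arange(ny * nz * nx).reshape(ny, nz, nx)

# Same call as in the answer: axis 0 -> 1, axis 1 -> 2, axis 2 -> 0
reordered = np.moveaxis(stacked, [0, 1, 2], [1, 2, 0])   # now [x, y, z]

X = np.linspace(0, 1, nx)
Y = np.linspace(0, 1, ny)
Z = np.linspace(-0.5, 0.2, nz)
xi, yi, zi = np.meshgrid(X, Y, Z, indexing="ij")         # each is [x, y, z]

# After ravel(), entry number k of the field belongs to point
# (xi.ravel()[k], yi.ravel()[k], zi.ravel()[k])
flat_field = np.ravel(reordered)
print(reordered.shape, xi.shape)
```

The point of the reordering is that the field arrays and the coordinate arrays have the same shape before they are flattened, so each (x, y, z) stays paired with its own (u, v, w).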
So, building on what has been said already, here are the answers to your additional questions:
Yes. Tubes start from the start points, so you need to define start points wherever you expect to see tubes. However, the start points need to lie geometrically inside the space defined by the sample points; otherwise plotly may be forced to extrapolate data (I'm not sure about this), which gives distorted and unexpected results. This means you can define start points in both the upper and lower planes of the field to make sure tubes are emitted from both planes. Sometimes the vectors are there but you cannot see them because they are drawn too thin. That happens when their local divergence is too low; normalizing the vector field by the rules mentioned earlier may give you a better result.
According to plotly documentation:
You can tell plotly's automatic axis range calculation logic to reverse the direction of an axis by setting the autorange axis property to "reversed"
plotly reads the data point by point, so the order of the points doesn't really matter. In your case, the problem arises because the data became corrupted when some of the sample points were omitted, i.e. some of the x, y, z and u, v, w values lost their correct pairing, which resulted in an entirely different, unexpected dataset.
I have tried to normalize the (u,v,w) vector field (using the formulation provided earlier):
import plotly.graph_objs as go
import plotly.express as px
import pandas as pd
import numpy as np
import plotly.io as pio
import pickle
pio.renderers.default='browser'
# Import Full Data
with open("full_data.pickle", 'rb') as handle:
    full_data = pickle.load(handle)
# Axis
X = np.linspace(0,1,101)
Y = np.linspace(0,1,10)
Z = np.linspace(-0.5,0.2,101)
xpos,ypos = np.meshgrid(X[::5],Y, indexing="ij")
#xpos = xpos.reshape(1,-1)[0]
#ypos = ypos.reshape(1,-1)[0]
xpos = np.ravel(xpos)
ypos = np.ravel(ypos)
# Initialize List of all fields
DX = []
DY = []
DZ = []
for cross_section in list(full_data["cross_sections"]):
    # extract field components in x, y, and z
    dx,dy,dz = full_data["cross_sections"][cross_section]
    # Make them numpy immediately
    dx = np.array(dx)
    dy = np.array(dy)
    dz = np.array(dz)
    # Append
    DX.append(dx)
    DY.append(dy)
    DZ.append(dz)
#Convert to numpy
move_i = [0, 1, 2]
move_e = [1, 2, 0]
DX = np.moveaxis(np.array(DX), move_i, move_e)
DY = np.moveaxis(np.array(DY), move_i, move_e)
DZ = np.moveaxis(np.array(DZ), move_i, move_e)
u1 = np.ravel(DX)
v1 = np.ravel(DY)
w1 = np.ravel(DZ)
np_df = np.array([u1, v1, w1])
vecf_norm = np.linalg.norm(np_df, 2, axis=0)
max_norm = np.max(vecf_norm)
min_norm = np.min(vecf_norm)
u2 = u1 * (vecf_norm - min_norm) / (max_norm - min_norm)
v2 = v1 * (vecf_norm - min_norm) / (max_norm - min_norm)
w2 = w1 * (vecf_norm - min_norm) / (max_norm - min_norm)
# Create Mesh !
xi, yi, zi = np.meshgrid(X, Y, Z, indexing="ij")
data_plot = [go.Streamtube(
x = np.ravel(xi),
y = np.ravel(yi),
z = np.ravel(zi),
u = u2,
v = v2,
w = w2,
starts = dict( #Determines the streamtubes starting position.
x=xpos,
y=ypos,
z=np.array([-0.5]*len(xpos)
)),
#sizeref = 0.3,
colorscale = 'jet',
showscale = True,
maxdisplayed = 300 #Determines the maximum segments displayed in a streamtube.
)]
fig = go.Figure(data=data_plot)
fig.show()
and get a better plot:

Graphing polynomials

With some help I have produced the following code. Below are some of the desired outputs for given inputs. However I am having some trouble completing the last task of this code. Looking for some help with this, any guidance or help is greatly appreciated, thanks!
flops = 0
def add(x1, x2):
    global flops
    flops += 1
    return x1 + x2
def multiply(x1, x2):
    global flops
    flops += 1
    return x1 * x2
def poly_horner(A, x):
    global flops
    flops = 0
    p = A[-1]
    i = len(A) - 2
    while i >= 0:
        p = add(multiply(p, x), A[i])
        i -= 1
    return p
def poly_naive(A, x):
    global flops
    p = 0
    flops = 0
    for i, a in enumerate(A):
        xp = 1
        for _ in range(i):
            xp = multiply(xp, x)
        p = add(p, multiply(xp, a))
    return p
Given the following inputs, I got the following outputs:
poly_horner([1,2,3,4,5], 2)
129
print(flops)
8
poly_naive([1,2,3,4,5], 2)
129
print(flops)
20
np.polyval([5,4,3,2,1], 2)
129
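The operation counts above follow a simple pattern, which the sketch below checks for several degrees. It re-implements the two evaluation schemes with a local counter instead of the global flops variable. The function names and the closed-form expressions are additions for this example: for a polynomial of degree n, Horner's scheme uses 2n operations, and the naive scheme as written above uses n(n+1)/2 + 2(n+1).

```python
def count_horner(A, x):
    """Evaluate the polynomial with Horner's scheme; return (value, flops)."""
    flops = 0
    p = A[-1]
    for a in reversed(A[:-1]):
        p = p * x + a          # one multiplication and one addition
        flops += 2
    return p, flops

def count_naive(A, x):
    """Evaluate term by term, rebuilding x**i each time; return (value, flops)."""
    flops = 0
    p = 0
    for i, a in enumerate(A):
        xp = 1
        for _ in range(i):
            xp *= x            # i multiplications to build x**i
            flops += 1
        p += xp * a            # one multiplication and one addition
        flops += 2
    return p, flops

for n in range(1, 8):
    coeffs = list(range(1, n + 2))      # n + 1 coefficients, degree n
    _, h = count_horner(coeffs, 2)
    _, m = count_naive(coeffs, 2)
    print(n, h, m)
```

Plotting the two counts against n would be one way to graph how the cost of each method grows with the degree.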

I assume you want to create a figure, though your question is quite vague...but I have a few minutes to kill while my code runs. Anyway, it seems you MIGHT be having difficulty plotting.
import numpy as np
import pylab as pl
x = np.arange(10)
y = x * np.pi
# you can calculate a line of best fit (lobf) using numpy's polyfit function
lobf1 = np.polyfit(x, y, 1) # first degree polynomial
lobf2 = np.polyfit(x, y, 2) # second degree polynomial
lobf3 = np.polyfit(x, y, 3) # third degree polynomial
# you can now use the lines of best fit to calculate the
# value anywhere within the domain using numpy's polyval function
# FIRST, create a figure and a plotting axis within the fig
fig = pl.figure(figsize=(3.25, 2.5))
ax0 = fig.add_subplot(111)
# now use polyval to calculate your y-values at every x
x = np.arange(0, 20, 0.1)
ax0.plot(x, np.polyval(lobf1, x), 'k')
ax0.plot(x, np.polyval(lobf2, x), 'b')
ax0.plot(x, np.polyval(lobf3, x), 'r')
# add a legend for niceness
ax0.legend(('Degree 1', 'Degree 2', 'Degree 3'), fontsize=8, loc=2)
# you can label the axes whatever you like
ax0.set_ylabel('My y-label', fontsize=8)
ax0.set_xlabel('My x-label', fontsize=8)
# you can show the figure on your screen
fig.show()
# and you can save the figure to your computer in different formats
# specifying bbox_inches='tight' helps eliminate unnecessary whitespace around
# the axis when saving...it just looks better this way.
pl.savefig('figName.png', dpi=500, bbox_inches='tight')
pl.savefig('figName.pdf', bbox_inches='tight')
# don't forget to close the figure
pl.close('all')

How can I make my 2D Gaussian fit to my image

I am trying to fit a 2D Gaussian to an image to find the location of the brightest point in it. My code looks like this:
import numpy as np
import astropy.io.fits as fits
import os
from astropy.stats import mad_std
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
from lmfit.models import GaussianModel
from astropy.modeling import models, fitting
def gaussian(xycoor,x0, y0, sigma, amp):
    '''This Function is the Gaussian Function'''
    x, y = xycoor # x and y taken from fit function. Starts at 0, increases by 1, goes to length of axis
    A = 1 / (2*sigma**2)
    eq = amp*np.exp(-A*((x-x0)**2 + (y-y0)**2)) #Gaussian
    return eq
def fit(image):
    med = np.median(image)
    image = image-med
    image = image[0,0,:,:]
    max_index = np.where(image >= np.max(image))
    x0 = max_index[1] #Middle of X axis
    y0 = max_index[0] #Middle of Y axis
    x = np.arange(0, image.shape[1], 1) #Starts at 0, increases by 1, goes to length of axis
    y = np.arange(0, image.shape[0], 1) #Starts at 0, increases by 1, goes to length of axis
    xx, yy = np.meshgrid(x, y) #creates a grid to plot the function over
    sigma = np.std(image) #The standard dev given in the Gaussian
    amp = np.max(image) #amplitude
    guess = [x0, y0, sigma, amp] #The initial guess for the gaussian fitting
    low = [0,0,0,0] #start of data array
    #Upper Bounds x0: length of x axis, y0: length of y axis, st dev: max value in image, amplitude: 2x the max value
    upper = [image.shape[0], image.shape[1], np.max(image), np.max(image)*2]
    bounds = [low, upper]
    params, pcov = curve_fit(gaussian, (xx.ravel(), yy.ravel()), image.ravel(),p0 = guess, bounds = bounds) #optimal fit. Not sure what pcov is.
    return params
def plotting(image, params):
    fig, ax = plt.subplots()
    ax.imshow(image)
    ax.scatter(params[0], params[1],s = 10, c = 'red', marker = 'x')
    circle = Circle((params[0], params[1]), params[2], facecolor = 'none', edgecolor = 'red', linewidth = 1)
    ax.add_patch(circle)
    plt.show()
data = fits.getdata('AzTECC100.fits') #read in file
med = np.median(data)
data = data - med
data = data[0,0,:,:]
parameters = fit(data)
#generates a gaussian based on the parameters given
plotting(data, parameters)
The image is plotting and the code is giving no errors but the fitting isn't working. It's just putting an x wherever the x0 and y0 are. The pixel values in my image are very small. The max value is 0.0007 and std dev is 0.0001 and the x and y are a few orders of magnitude larger. So I believe my problem is that because of this my eq is going to zero everywhere so the curve_fit is failing. I'm wondering if there's a better way to construct my gaussian so that it plots correctly?

I do not have access to your image. Instead I have generated some test "image" as follows:
y, x = np.indices((51,51))
x -= 25
y -= 25
data = 3 * np.exp(-0.7 * ((x+2)**2 + (y-1)**2))
Also, I have modified your code for plotting to increase the radius of the circle by 10:
circle = Circle((params[0], params[1]), 10 * params[2], ...)
and I commented out two more lines:
# image = image[0,0,:,:]
# data = data[0,0,:,:]
The result that I get is shown in the attached image and it looks reasonable to me:
Could it be that the issue is in how you access the data from the FITS file (e.g., image = image[0,0,:,:])? Is the data a 4D array? Why do you have 4 indices?
I also saw that you have asked a similar question here: Astropy.model 2DGaussian issue in which you tried to use just astropy.modeling. I will look into that question.
NOTE: you can replace code such as
max_index = np.where(image >= np.max(image))
x0 = max_index[1] #Middle of X axis
y0 = max_index[0] #Middle of Y axis
with
y0, x0 = np.unravel_index(np.argmax(data), data.shape)
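Putting these pieces together, a self-contained sketch of the fit on a synthetic 2D image might look as follows. It does not use the FITS file from the question. The image size, the true parameters and the name gaussian2d are made up for the example, and the sigma guess of 2 pixels is an arbitrary starting value.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian2d(xy, x0, y0, sigma, amp):
    """Circular 2D Gaussian evaluated at the coordinates xy = (x, y)."""
    x, y = xy
    return amp * np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

# Synthetic 51x51 test image with known parameters (pixel coordinates)
yy, xx = np.indices((51, 51))
true_params = (23.0, 26.0, 0.845, 3.0)          # x0, y0, sigma, amp
image = gaussian2d((xx, yy), *true_params)

# Initial guess: brightest pixel for the centre, image maximum for the amplitude
y0_guess, x0_guess = np.unravel_index(np.argmax(image), image.shape)
guess = [x0_guess, y0_guess, 2.0, image.max()]

params, pcov = curve_fit(gaussian2d, (xx.ravel(), yy.ravel()),
                         image.ravel(), p0=guess)
x0_fit, y0_fit, sigma_fit, amp_fit = params
# Only sigma**2 enters the model, so the sign of the fitted sigma is arbitrary
sigma_fit = abs(sigma_fit)
```

The diagonal of pcov holds the estimated variances of the fitted parameters.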

Separating gaussian components of a curve using python

I am trying to deblend the emission lines of low resolution spectrum in order to get the gaussian components. This plot represents the kind of data I am using:
After searching a bit, the only option I found was the application of the gauest function from the kmpfit package (http://www.astro.rug.nl/software/kapteyn/kmpfittutorial.html#gauest). I have copied their example but I cannot make it work.
I wonder if anyone could please offer me any alternative to do this or how to correct my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
def CurveData():
    x = np.array([3963.67285156, 3964.49560547, 3965.31835938, 3966.14111328, 3966.96362305,
                  3967.78637695, 3968.60913086, 3969.43188477, 3970.25463867, 3971.07714844,
                  3971.89990234, 3972.72265625, 3973.54541016, 3974.36791992, 3975.19067383])
    y = np.array([1.75001533e-16, 2.15520995e-16, 2.85030769e-16, 4.10072843e-16, 7.17558032e-16,
                  1.27759917e-15, 1.57074192e-15, 1.40802933e-15, 1.45038722e-15, 1.55195653e-15,
                  1.09280316e-15, 4.96611341e-16, 2.68777266e-16, 1.87075114e-16, 1.64335999e-16])
    return x, y
def FindMaxima(xval, yval):
    xval = np.asarray(xval)
    yval = np.asarray(yval)
    sort_idx = np.argsort(xval)
    yval = yval[sort_idx]
    gradient = np.diff(yval)
    maxima = np.diff((gradient > 0).view(np.int8))
    ListIndeces = np.concatenate((([0],) if gradient[0] < 0 else ()) + (np.where(maxima == -1)[0] + 1,) + (([len(yval)-1],) if gradient[-1] > 0 else ()))
    X_Maxima, Y_Maxima = [], []
    for index in ListIndeces:
        X_Maxima.append(xval[index])
        Y_Maxima.append(yval[index])
    return X_Maxima, Y_Maxima
def GaussianMixture_Model(p, x, ZeroLevel):
    y = 0.0
    N_Comps = int(len(p) / 3)
    for i in range(N_Comps):
        A, mu, sigma = p[i*3:(i+1)*3]
        y += A * np.exp(-(x-mu)*(x-mu)/(2.0*sigma*sigma))
    Output = y + ZeroLevel
    return Output
def Residuals_GaussianMixture(p, x, y, ZeroLevel):
    return GaussianMixture_Model(p, x, ZeroLevel) - y
Wave, Flux = CurveData()
Wave_Maxima, Flux_Maxima = FindMaxima(Wave, Flux)
EmLines_Number = len(Wave_Maxima)
ContinuumLevel = 1.64191e-16
# Define initial values
p_0 = []
for i in range(EmLines_Number):
    p_0.append(Flux_Maxima[i])
    p_0.append(Wave_Maxima[i])
    p_0.append(2.0)
p1, conv = optimize.leastsq(Residuals_GaussianMixture, p_0[:],args=(Wave, Flux, ContinuumLevel))
Fig = plt.figure(figsize = (16, 10))
Axis1 = Fig.add_subplot(111)
Axis1.plot(Wave, Flux, label='Emission line')
Axis1.plot(Wave, GaussianMixture_Model(p1, Wave, ContinuumLevel), 'r', label='Fit with optimize.leastsq')
print(p1)
Axis1.plot(Wave, GaussianMixture_Model([p1[0],p1[1],p1[2]], Wave, ContinuumLevel), 'g:', label='Gaussian components')
Axis1.plot(Wave, GaussianMixture_Model([p1[3],p1[4],p1[5]], Wave, ContinuumLevel), 'g:')
Axis1.set_xlabel( r'Wavelength $(\AA)$',)
Axis1.set_ylabel('Flux' + r'$(erg\,cm^{-2} s^{-1} \AA^{-1})$')
plt.legend()
plt.show()

A typical simplistic way to fit:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
def model(p,x):
    A,x1,sig1,B,x2,sig2 = p
    return A*np.exp(-(x-x1)**2/sig1**2) + B*np.exp(-(x-x2)**2/sig2**2)
def res(p,x,y):
    return model(p,x) - y
# x, y are the wavelength and flux arrays from the question
p0 = [1e-15,3968,2,1e-15,3972,2]
p1,conv = optimize.leastsq(res,p0[:],args=(x,y))
plt.plot(x,y,'+') # data
#fitted function
xfit = np.arange(3962,3976,0.1)
plt.plot(xfit,model(p1,xfit),'-')
Where p0 is your initial guess. By the looks of things, you might want to use Lorentzian functions...
If you use full_output=True, you get all kind of info about the fitting. Also check out curve_fit and the fmin* functions in scipy.optimize. There are plenty of wrappers around these around, but often, like here, it's easier to use them directly.
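As a sketch of the curve_fit route mentioned above, the example below fits the same two-Gaussian model to synthetic data. The wavelengths, true parameters and starting values are made up for the example and are not the spectrum from the question. The flux is rescaled to order unity before fitting, which is a common precaution with values around 1e-15 and may or may not be necessary for the real data.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_gaussians(x, A, x1, sig1, B, x2, sig2):
    """Sum of two Gaussians, written in the same form as model() above."""
    return (A * np.exp(-(x - x1) ** 2 / sig1 ** 2)
            + B * np.exp(-(x - x2) ** 2 / sig2 ** 2))

# Synthetic, noise-free "spectrum" with two blended lines (assumed values)
x = np.linspace(3962.0, 3976.0, 60)
true_params = [1.4, 3967.5, 1.2, 1.5, 3971.5, 1.0]
flux = 1e-15 * two_gaussians(x, *true_params)

# Rescale so that the peak is of order one before fitting
scale = 1.0 / flux.max()
y_scaled = flux * scale

p0 = [1.0, 3968.0, 2.0, 1.0, 3971.0, 2.0]       # rough initial guess
popt, pcov = curve_fit(two_gaussians, x, y_scaled, p0=p0)

# Undo the scaling for the two amplitudes; centres and widths are unaffected
amp1, amp2 = popt[0] / scale, popt[3] / scale
centres = sorted([popt[1], popt[4]])
fitted = two_gaussians(x, *popt)
```

Each component can then be plotted on its own by calling two_gaussians with the amplitude of the other component set to zero.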
