I have several sets of data to which I'm trying to fit different profiles. In the centre of one of the minima there is contamination that prevents me from doing a good fit as you can see in this image:
How can I clip out those spikes in the bottom of my data taking into account that the spike is not always in the same position? Or how would you deal with data like this? I'm using lmfit to fit the profiles, in this case a Lorentzian and a Gaussian. Here is a minimal working example where I have played with the initial values to fit the data more closely:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit.models import GaussianModel, ConstantModel, LorentzianModel
x = np.array([4085.18084467, 4085.38084374, 4085.5808428 , 4085.78084186, 4085.98084092, 4086.18083999, 4086.38083905, 4086.58083811, 4086.78083717, 4086.98083623, 4087.1808353 , 4087.38083436, 4087.58083342, 4087.78083248, 4087.98083155, 4088.18083061, 4088.38082967, 4088.58082873, 4088.78082779, 4088.98082686, 4089.18082592, 4089.38082498, 4089.58082404, 4089.78082311, 4089.98082217, 4090.18082123, 4090.38082029, 4090.58081935, 4090.78081842, 4090.98081748, 4091.18081654, 4091.3808156 , 4091.58081466, 4091.78081373, 4091.98081279, 4092.18081185, 4092.38081091, 4092.58080998, 4092.78080904, 4092.9808081 , 4093.18080716, 4093.38080622, 4093.58080529, 4093.78080435, 4093.98080341, 4094.18080247, 4094.38080154, 4094.5808006 , 4094.78079966, 4094.98079872, 4095.18079778, 4095.38079685, 4095.58079591, 4095.78079497, 4095.98079403, 4096.1807931 , 4096.38079216, 4096.58079122, 4096.78079028, 4096.98078934, 4097.18078841, 4097.38078747, 4097.58078653, 4097.78078559,4097.98078466, 4098.18078372, 4098.38078278, 4098.58078184, 4098.7807809 , 4098.98077997, 4099.18077903, 4099.38077809, 4099.58077715, 4099.78077622, 4099.98077528, 4100.18077434, 4100.3807734 , 4100.58077246, 4100.78077153, 4100.98077059, 4101.18076965, 4101.38076871, 4101.58076778, 4101.78076684, 4101.9807659 , 4102.18076496, 4102.38076402, 4102.58076309, 4102.78076215, 4102.98076121, 4103.18076027, 4103.38075934, 4103.5807584 , 4103.78075746, 4103.98075652, 4104.18075558, 4104.38075465, 4104.58075371, 4104.78075277, 4104.98075183, 4105.1807509 , 4105.38074996, 4105.58074902, 4105.78074808, 4105.98074714, 4106.18074621, 4106.38074527, 4106.58074433, 4106.78074339, 4106.98074246, 4107.18074152, 4107.38074058, 4107.58073964, 4107.7807387 , 4107.98073777, 4108.18073683, 4108.38073589, 4108.58073495, 4108.78073401, 4108.98073308, 4109.18073214, 4109.3807312 , 4109.58073026, 4109.78072933, 4109.98072839, 4110.18072745, 4110.38072651, 4110.58072557, 4110.78072464, 4110.9807237 , 4111.18072276, 4111.38072182, 
4111.58072089, 4111.78071995, 4111.98071901, 4112.18071807, 4112.38071713, 4112.5807162 , 4112.78071526, 4112.98071432, 4113.18071338, 4113.38071245, 4113.58071151, 4113.78071057, 4113.98070963, 4114.18070869, 4114.38070776, 4114.58070682, 4114.78070588, 4114.98070494, 4115.18070401, 4115.38070307, 4115.58070213, 4115.78070119, 4115.98070025, 4116.18069932, 4116.38069838, 4116.58069744, 4116.7806965 , 4116.98069557, 4117.18069463, 4117.38069369, 4117.58069275, 4117.78069181, 4117.98069088, 4118.18068994, 4118.380689 , 4118.58068806, 4118.78068713, 4118.98068619, 4119.18068525, 4119.38068431, 4119.58068337, 4119.78068244, 4119.9806815 , 4120.18068056, 4120.38067962, 4120.58067869, 4120.78067775, 4120.98067681, 4121.18067587, 4121.38067493, 4121.580674 , 4121.78067306, 4121.98067212, 4122.18067118, 4122.38067025, 4122.58066931, 4122.78066837, 4122.98066743, 4123.18066649, 4123.38066556, 4123.58066462, 4123.78066368, 4123.98066274, 4124.1806618 , 4124.38066087, 4124.58065993, 4124.78065899, 4124.98065805, 4125.18065712, 4125.38065618, 4125.58065524, 4125.7806543 , 4125.98065336, 4126.18065243, 4126.38065149, 4126.58065055, 4126.78064961, 4126.98064868, 4127.18064774, 4127.3806468 , 4127.58064586, 4127.78064492, 4127.98064399, 4128.18064305, 4128.38064211, 4128.58064117, 4128.78064024, 4128.9806393 , 4129.18063836, 4129.38063742, 4129.58063648, 4129.78063555, 4129.98063461, 4130.18063367, 4130.38063273, 4130.5806318 , 4130.78063086, 4130.98062992, 4131.18062898, 4131.38062804, 4131.58062711, 4131.78062617, 4131.98062523, 4132.18062429, 4132.38062336, 4132.58062242, 4132.78062148, 4132.98062054, 4133.1806196 , 4133.38061867, 4133.58061773, 4133.78061679, 4133.98061585, 4134.18061492, 4134.38061398, 4134.58061304, 4134.7806121 , 4134.98061116])
y = np.array([0.90312759, 1.00923175, 0.94618369, 0.98284045, 0.91510612, 0.96737804, 0.97690214, 0.94363369, 1.00887784, 1.00110387, 0.91647096, 0.97943202, 1.00672907, 1.01552094, 1.01089407, 0.96914584, 0.9908419 , 1.0176613 , 0.97032148, 0.96003562, 0.9702355 , 0.93684173, 0.94652734, 0.94895018, 1.01214356, 0.85777678, 0.89308203, 0.9789272 , 0.93901884, 0.9684622 , 0.96969321, 0.86326307, 0.89607392, 0.92459571, 1.00454429, 1.06019733, 0.97291196, 0.95646497, 0.95899707, 1.02830351, 0.94938178, 0.91481128, 0.92606219, 0.97085631, 0.93597434, 0.91316857, 0.90644542, 0.91726926, 0.91686184, 0.96445563, 0.92166362, 0.95831572, 0.93859066, 0.85285273, 0.89944073, 0.91812428, 0.94265677, 0.88281406, 0.9470601 , 0.94921529, 0.97289222, 0.94632251, 0.96633195, 0.94096512, 0.95324803, 0.90920845, 0.92100257, 0.91181745, 0.95715298, 0.91715382, 0.90219214, 0.87585035, 0.86592191, 0.89335902, 0.85536392, 0.89619274, 0.9450366 , 0.82780137, 0.81214176, 0.83461329, 0.82858317, 0.80851704, 0.79253546, 0.85440086, 0.81679169, 0.80579976, 0.72312218, 0.75583125, 0.75204599, 0.84519188, 0.68686821, 0.71472154, 0.71706318, 0.72640234, 0.70526356, 0.68295282, 0.66795774, 0.65004383, 0.68096834, 0.72697547, 0.72436393, 0.77128385, 0.79666758, 0.67349101, 0.61479406, 0.57046337, 0.51614312, 0.52945366, 0.53112169, 0.53757761, 0.56680358, 0.63839684, 0.60704329, 0.62377533, 0.67862515, 0.64587581, 0.71316115, 0.76309798, 0.72217569, 0.7477785 , 0.79731849, 0.76934137, 0.77063868, 0.77871584, 0.77688526, 0.84342722, 0.85382332, 0.88700466, 0.85837992, 0.79589266, 0.83798993, 0.79835529, 0.84612746, 0.83214907, 0.86373676, 0.90729115, 0.82111605, 0.86165685, 0.84090099, 0.90389133, 0.89554032, 0.90792356, 0.92798016, 0.95588479, 0.95019718, 0.95447497, 0.89845759, 0.91638311, 0.99263342, 0.97477606, 0.95482538, 0.94489498, 0.94344967, 0.90526465, 0.92538486, 0.96279787, 0.94005143, 0.96842454, 0.92296494, 0.89954172, 0.8684367 , 0.95039002, 0.95229769, 0.93752274, 0.94741173, 
0.96704449, 1.01130839, 0.95499414, 0.99596569, 0.95130622, 1.00014723, 1.00252218, 0.95130331, 1.0022896 , 0.99851989, 0.94405282, 0.95814021, 0.94851972, 1.01302067, 1.01400272, 0.97960083, 0.97070283, 1.01312797, 0.9842154 , 1.01147273, 0.97331853, 0.91403182, 0.96813051, 0.92319169, 0.9294103 , 0.96960715, 0.94811518, 0.97115083, 0.84687543, 0.90725159, 0.88061293, 0.87319615, 0.85331661, 0.89775082, 0.90956716, 0.83174505, 0.89753388, 0.89554364, 0.95329739, 0.87687031, 0.93883127, 0.97433899, 0.99515225, 0.97519981, 0.91956466, 0.97977674, 0.93582089, 1.00662722, 0.90157277, 1.02887754, 0.9777419 , 0.94257094, 1.02359615, 0.98968414, 1.00075502, 1.03230265, 1.05904074, 1.00488442, 1.05507886, 1.05085518, 1.02561781, 1.05896008, 0.98024381, 1.08005691, 0.94528977, 1.03853637, 1.02064405, 1.0467137 , 1.05375156, 1.12907949, 0.99295611, 1.06601022, 1.02846374, 0.98006807, 0.96446772, 0.97702428, 0.97788589, 0.93889781, 0.96366778, 0.96645265, 0.95857242, 1.05796304, 0.99441763, 1.00573183, 1.05001927])
e = np.array([0.0647344 , 0.04583914, 0.05665552, 0.04447208, 0.05644753, 0.03968611, 0.05985188, 0.04252311, 0.03366922, 0.04237672, 0.03765898, 0.03290132, 0.04626836, 0.05106203, 0.03619188, 0.03944098, 0.08115469, 0.05859644, 0.06091101, 0.05170821, 0.0427244 , 0.06804469, 0.06708318, 0.03369381, 0.04160575, 0.08007032, 0.09292148, 0.04378329, 0.08216214, 0.06087074, 0.05375458, 0.06185891, 0.06385766, 0.08084546, 0.04864063, 0.06400878, 0.04988693, 0.06689165, 0.05989534, 0.08010138, 0.0681177 , 0.04478208, 0.03876582, 0.05977015, 0.06610619, 0.05020086, 0.07244604, 0.0445143 , 0.06970626, 0.04423994, 0.0414573 , 0.06892836, 0.05715395, 0.04014724, 0.07908425, 0.06082051, 0.08380691, 0.08576757, 0.06571406, 0.04842625, 0.05298355, 0.05271857, 0.06340425, 0.10849621, 0.0811072 , 0.03642638, 0.10614094, 0.09865099, 0.06711037, 0.10244762, 0.11843505, 0.1092357 , 0.09748241, 0.09657009, 0.09970179, 0.10203563, 0.18494082, 0.14097796, 0.1151294 , 0.16172895, 0.17611204, 0.16226913, 0.2295418 , 0.17795924, 0.1253298 , 0.1771586 , 0.15139061, 0.14739618, 0.1620105 , 0.19158538, 0.21431605, 0.19292715, 0.23308884, 0.30519423, 0.31401994, 0.30569885, 0.31216375, 0.35147676, 0.25016472, 0.16232236, 0.09058787, 0.0604483 , 0.05168302, 0.21432774, 0.38149791, 0.5061975 , 0.44281541, 0.50646427, 0.43761581, 0.44989111, 0.47778238, 0.39944325, 0.32462726, 0.34560857, 0.3175776 , 0.30253441, 0.23059451, 0.24516185, 0.20708065, 0.26429751, 0.1830661 , 0.15155041, 0.16497299, 0.15794139, 0.13626666, 0.17839823, 0.13502886, 0.14148522, 0.10869864, 0.11723602, 0.09074029, 0.06922157, 0.07719777, 0.13181317, 0.11441895, 0.10655855, 0.12073767, 0.0846133 , 0.07974657, 0.06538693, 0.0573741 , 0.07864047, 0.08351471, 0.08130351, 0.0768824 , 0.07951992, 0.04478989, 0.0765122 , 0.04842814, 0.04355571, 0.05138656, 0.07215294, 0.04681987, 0.05790133, 0.06163808, 0.082449 , 0.06127927, 0.04971221, 0.05107901, 0.04493687, 0.06072161, 0.06094332, 0.03630467, 0.04162285, 0.04058228, 
0.04526251, 0.06191432, 0.04901982, 0.0454908 , 0.06186274, 0.0407017 , 0.03865571, 0.04353665, 0.03898987, 0.04666321, 0.05856035, 0.04225933, 0.04797901, 0.03523971, 0.04728414, 0.05494382, 0.04773011, 0.03210954, 0.05651663, 0.03625933, 0.03596701, 0.03800191, 0.06267668, 0.06431192, 0.0602614 , 0.05139896, 0.04571979, 0.04375182, 0.0576867 , 0.07491418, 0.05339972, 0.07619115, 0.11569378, 0.07087871, 0.09076518, 0.13554717, 0.07811761, 0.07180695, 0.05831886, 0.06042863, 0.08759576, 0.06650081, 0.08420164, 0.08185432, 0.04338836, 0.04970979, 0.04008252, 0.03605485, 0.03456321, 0.05594584, 0.03856822, 0.03576337, 0.03118799, 0.0441686 , 0.0469118 , 0.03591666, 0.03562582, 0.04934832, 0.03280972, 0.03201576, 0.04338048, 0.07443531, 0.04121059, 0.03774147, 0.03717577, 0.03354207, 0.03806978, 0.0319364 , 0.03715712, 0.0379478 , 0.04867626, 0.0304592 , 0.03393844, 0.034518 , 0.04293514, 0.05177898, 0.05332907, 0.0352937 , 0.03359781, 0.04625272, 0.03733088, 0.03501259, 0.03346308, 0.04333749, 0.05741173])
cont = ConstantModel(prefix='cte_')
pars = cont.guess(y, x=x)
gauss = GaussianModel(prefix='g_')
pars.update( gauss.make_params())
pars['cte_c'].set(1)
pars['g_center'].set(4125, min=4120, max=4130)
pars['g_sigma'].set(1, min=0.5)
pars['g_amplitude'].set(-0.2, min=-0.5)
loren = LorentzianModel(prefix='l_')
pars.update( loren.make_params())
pars['l_center'].set(4106, min=4095, max=4115)
pars['l_sigma'].set(4, max=6)
pars['l_amplitude'].set(-6., max=-4.)
model = gauss + loren + cont
init = model.eval(pars, x=x)
result = model.fit(y, pars, x=x, weights=1/e)
#print(result.fit_report(min_correl=0.5))
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'k-', lw=2) # data in black
ax.plot(x, init, 'g--', lw=2) # initial guess
ax.plot(x, result.best_fit, 'r-', lw=2) # best fit
ax.set(xlim=(4085,4135), ylim=(0.4,1.14))
plt.show()
If the bad point is always at the same x value, you could remove that point from the data, perhaps with something like:
import numpy as np
def index_nearest(array, value):
    """index of array nearest to value"""
    return np.abs(array - value).argmin()
ybad = index_nearest(x, 4105.5)  # approximate x of the spike in the data above; adjust as needed
y[ybad] = np.nan
good = np.isfinite(y)
x, y, e = x[good], y[good], e[good]  # keep the uncertainties aligned with the data
and then fit your model to those data with the bad point removed.
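If the contamination covers a few neighbouring samples rather than a single one, a boolean mask over a small window may be easier to work with. The following is only a sketch: `mask_window`, the window centre and half-width, and the synthetic `x_demo`, `y_demo`, `e_demo` arrays are all made up for illustration and are not the arrays from the question.

```python
import numpy as np

def mask_window(x, center, half_width):
    """Boolean mask: False within half_width of center, True elsewhere."""
    return np.abs(x - center) > half_width

# Synthetic stand-in data (assumption: same 0.2 spacing as the question's x)
x_demo = np.linspace(4085.0, 4135.0, 251)
y_demo = 1.0 - 0.4 / (1.0 + ((x_demo - 4106.0) / 3.0) ** 2)
e_demo = np.full_like(x_demo, 0.05)

# Drop every sample within 0.45 of x = 4105.5 (hypothetical spike position)
keep = mask_window(x_demo, center=4105.5, half_width=0.45)
x_fit, y_fit, e_fit = x_demo[keep], y_demo[keep], e_demo[keep]
```

The masked arrays can then go into the same call as before, for example `model.fit(y_fit, pars, x=x_fit, weights=1/e_fit)`.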
But also: if there is no obviously errant point and the data are "just" noisy, there is probably no advantage in removing points that merely look bad. Your data look noisy to me, but it is hard to see a systematically bad point. If you do remove a point, remember that you are asserting that this measurement was not merely affected by normal noise, but was wrong.
Finally: another approach to treating noisy data might be to try to smooth the data, say with a Savitzky-Golay filter. There is always some danger of smoothing out features with such an approach, but a modest S-G filter is often good for cleaning up noisy data enough to detect features. Of course, if fits to filtered data give significantly different results from fits to unfiltered data, you will probably need to understand why that is.
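As a rough illustration of the smoothing idea, here is a minimal sketch using `scipy.signal.savgol_filter`. The synthetic profile, the noise level, and the choice of `window_length=11` and `polyorder=3` are assumptions picked for the demo, not values tuned for the data in the question.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy absorption profile (stand-in for real data)
rng = np.random.default_rng(0)
x_demo = np.linspace(4085.0, 4135.0, 251)
y_true = 1.0 - 0.4 / (1.0 + ((x_demo - 4106.0) / 3.0) ** 2)
y_noisy = y_true + rng.normal(0.0, 0.03, x_demo.size)

# Modest Savitzky-Golay filter: 11-point window, cubic polynomial
y_smooth = savgol_filter(y_noisy, window_length=11, polyorder=3)
```

A wider window or a lower polynomial order smooths more aggressively, which is where the risk of flattening real features comes from, so it is worth fitting both `y_noisy` and `y_smooth` and comparing the results.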