Finding all points on a slope of a signal - python

I have a 1D signal, given as follows:
x = np.array([34.69936612, 34.70083619, 37.38802174, 39.67141565, 49.05662135,
63.87593075, 67.70815746, 72.06562117, 79.31063707, 85.13125285,
83.34185985, 72.74589905, 57.34778159, 58.63283664, 64.92526896,
65.89153823, 66.07273386, 59.68722257, 59.6801125 , 59.41456929,
58.19250575, 59.92192524, 58.42078866, 55.45131784, 55.09849914,
54.95270916, 49.60804717, 43.05198366, 36.10104167, 26.88848229,
25.38550393, 28.71305461, 30.03802157, 31.3520023 , 32.59509437,
32.67600055, 32.68801666, 32.61500098, 32.65303828, 32.72752018,
32.84099458, 31.46154937, 32.70809456, 27.67842221, 25.65302641,
30.08500957, 31.41003082, 32.91935844, 32.92452782, 35.56587345,
30.09272452, 35.60898454, 49.12005244, 85.79396522, 71.81950127,
63.91915245, 69.14879246, 70.43600086, 71.71703424, 71.74830965,
70.51400086, 70.50201501, 70.50202228, 67.91157904, 66.62396413,
67.90736076, 66.5410636 , 67.96748026, 67.94177515, 65.30929726,
65.29901863, 66.60282538, 66.60666811, 66.55100589, 65.33825435,
66.55222626, 65.29656691, 66.56003543, 65.30964145, 64.07556963,
63.99339626, 62.86668124, 60.43549001, 61.68116229, 61.61140279,
62.65181523, 62.70844205, 62.77783077, 64.03882299, 65.39701193,
65.40123835, 65.41845477, 65.42941287, 65.38851043, 65.36201151,
73.33102635, 73.84443755, 70.94806114, 68.18793023, 69.20003749,
66.61045573, 65.38106858, 64.05484531, 63.88684974, 62.64420529,
62.69196131, 62.74418993, 62.72175294, 64.01210311, 64.1590297 ,
63.0284751 , 64.27265024, 64.24984689, 62.90213438, 62.68704697,
62.65233151, 63.09040365, 63.10330994, 62.72787413, 63.95427977,
63.89707325, 61.38203635, 57.48587612, 60.05764178, 62.70293674,
61.38484666, 60.07995823, 61.34569129, 62.66307354, 61.38549663,
61.34835356, 61.3888718 , 61.48381576, 62.74226583, 62.83945058,
41.78731982, 38.06452548, 40.57553545, 43.10410628, 43.17965777,
44.41576623, 45.67422069, 44.44681128, 44.52855717, 45.69118569,
45.7559632 , 49.9019806 , 50.90898633, 52.2603325 , 36.83061979,
48.36714502, 53.60110239, 53.58750501, 51.03745637, 52.15201941,
50.94600264, 48.50758345, 51.03154956, 51.32249134, 51.49705585,
53.46467209, 51.708078 , 48.1404585 , 46.32157084, 53.20416229,
60.52216104, 67.14976382, 66.6844348 , 63.99400013, 63.89292312,
63.94972283, 65.33551293, 66.54723199, 65.29004129, 67.87224117,
69.3810433 , 69.28915977, 65.32064534, 64.07644938, 64.59988251,
65.55365125, 64.3440046 , 64.4526091 , 63.38977665, 64.61810574,
63.52989024, 63.55126155, 64.4263114 , 64.43874937, 64.78594756,
66.03974204, 67.34958445, 70.07248445, 67.40968741, 66.56554542,
67.59965865, 67.85658168, 67.62022101, 67.87089721, 61.22552792,
54.07823817, 47.96332512, 53.22944931, 54.77573267, 59.55033053,
62.24247612, 62.24529416, 63.9429676 , 63.13145527, 63.29764489,
63.2723988 , 62.96359318, 63.3025575 , 63.47790181, 63.29642863,
63.50702402, 63.71413853, 63.71470992, 62.25079434, 63.46787461,
63.73497156, 63.77631175, 63.69024723, 63.55254533, 63.97794376,
64.05815662, 63.57687055, 66.80917018, 66.82863683, 66.27964922,
65.04852024, 65.29135318, 65.57783886, 65.52090561, 65.29656225,
65.32543578, 66.52825603, 67.1314033 , 50.03567181, 53.53803024,
53.56862071, 55.10515723, 55.14010716, 63.30760687, 62.7114906 ,
62.95237442, 62.75869066, 64.19585539, 62.70371169, 62.65204241,
62.69394807, 62.94844878, 58.36397143, 59.68285611, 60.89452752,
60.97356663, 60.72068974, 59.62036073, 60.52789377, 59.27245489,
58.82200393, 60.10430588, 60.90874661, 61.51060014, 61.74838059,
63.28503148, 61.12237542, 60.87046418, 61.23634728, 60.99214796,
60.18921274, 60.07774571, 61.20623845, 61.65825197, 60.11025633,
60.52832382, 61.18188688, 61.31380433, 58.80528487, 57.84584698,
58.73805752, 54.85645345, 58.79988199, 60.07737149, 56.20096342,
60.3929374 , 36.77761826, 49.22568866, 55.10930206, 65.24736292,
57.08641006, 54.08806036, 53.89556268, 53.5613321 , 53.51515767,
52.30442805, 52.24562597, 53.50311397, 53.49561038, 53.53878528,
49.66610081, 52.35633014, 55.17584864, 53.945292 , 53.79353353,
54.8626422 , 54.87102507, 56.14098197, 57.38968051, 55.1146169 ,
54.92290752, 54.87858275, 54.86639486, 56.34316676, 56.16200014,
69.90905494, 68.20948497, 68.51263756, 65.64670149, 65.53992678,
67.07185321, 67.0542345 , 66.79344433, 66.75400526, 66.76640135,
66.76742739, 65.53052634, 67.01174217, 67.98329773, 69.18915578,
66.69019707, 69.61506484, 67.94096632, 67.91401491, 66.84415179,
67.88935229, 67.89356226, 69.1984958 , 52.24244378, 52.37211419,
50.95591909, 51.07641848, 50.91919022, 52.13500015, 52.26717303,
60.03109894, 65.23341727, 72.11099746, 75.02859632, 81.93540828,
81.20708335, 80.86208705, 81.04817673, 71.74669785, 73.05200134,
74.34519255, 75.72326992, 78.55812705, 76.95800509, 77.08696036,
79.61302675, 79.68123466, 78.31207499, 77.08036041, 77.18815309,
77.11523959, 75.74423094, 75.73143868, 74.48319908, 73.17138546,
66.80804931, 53.88772644, 53.87714358, 53.6088119 , 53.65411471,
54.86536613, 53.49300076, 53.52447811, 53.52000034, 56.83649529,
57.43503283, 82.38440921, 83.83190983, 83.9128805 , 83.94305425,
83.06892508, 82.91998964, 82.29555463, 82.30635577, 82.23464297,
82.20709065, 80.98821075, 83.93336979, 81.32873456, 82.46698736,
82.70592498, 83.93335761, 83.80821766, 83.84313602, 82.59867874,
82.62361191, 83.94865746, 83.83137976, 83.46075784, 82.14902814,
82.18902896, 83.83722778, 83.60064452, 83.63187976, 85.04806926,
84.87213079, 84.92473511, 84.90790341, 83.55500539, 83.59501005,
84.81195299, 84.86952928, 84.85600059, 84.81955391, 82.33120262,
78.56908599, 73.14783901, 64.99883861, 66.78701764, 64.5916058 ,
64.77055337, 64.56918786, 65.02605783, 65.01019955, 64.78145201,
64.77581828, 64.55221044, 64.34285288, 62.8764752 , 64.57949744,
63.17957281, 61.89857751, 63.48365778, 55.62801456, 43.17986365])
I want to find all the slopes in this signal. I have tried the first-order difference and the second-order difference (np.diff, and taking the difference of the difference). But a point in the middle of a slope has a very small difference, in contrast to a point at the beginning or the end of the slope, where the difference is bigger.
Here is what I have tried:
def detect_slope(signal, window_size = 3, threshold = 5):
    list_ = []
    for i in range(window_size, len(signal)-window_size):
        # difference between the mean of the window before i and the window starting at i
        diff_ = np.mean(signal[i-window_size:i]) - np.mean(signal[i:window_size+i])
        list_.append(diff_)
    first_order_diff = np.array(list_)
    d = np.where(np.abs(first_order_diff) < threshold, 0 , first_order_diff)
    idx = np.where(np.abs(d) != 0)
    # might need some offset because we are doing some smoothing, but just use raw idx for now
    # second-order difference
    diff_list = np.array(list_).copy()
    dd = np.diff(diff_list)
    print(dd.shape)
    dd_idx = np.where(np.abs(dd) > 0.5)
    return diff_list, dd, idx, dd_idx
I have played around with the first- and second-order differences, but nothing seems to work. I'm trying to find all the peaks and troughs and exclude them, along with any neighbors that have close enough values.
Attached is my desired output. Sorry for the crappy pic.

Not clear what you want here, but if your issue is that the diff is picking up local changes and you want to focus your attention on global changes, smooth the signal first.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
plt.plot(x)
x = savgol_filter(x, 21, 3)
plt.plot(x)
diff_list, dd, idx, dd_idx = detect_slope(x)
plt.plot(diff_list)
plt.show()
This gives:
Blue is your original signal, orange is your smoothed signal and green is your new diff. You can set it to pick up changes at various levels by playing around with the two parameters of savgol_filter (the window length and the polynomial order). The more aggressively you smooth your signal, the more global changes (and the fewer local changes) the derivative picks up.
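As a rough sketch of the same idea, savgol_filter can also return a smoothed derivative directly through its deriv argument, which avoids a separate differencing step. The synthetic ramp, the noise level and the 0.5 threshold below are assumptions chosen for illustration; they are not tuned to the signal in the question.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic test signal (an assumption for illustration): flat, ramp with slope 1, flat.
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.arange(50.0), np.full(50, 49.0)])
noisy = clean + rng.normal(0, 0.5, clean.size)

# Plain first difference: dominated by sample-to-sample noise.
raw_diff = np.diff(noisy)

# Smoothed first derivative: fit a cubic in a 21-sample window and differentiate the fit.
smooth_deriv = savgol_filter(noisy, window_length=21, polyorder=3, deriv=1)

# Samples whose smoothed slope exceeds a (hypothetical) threshold are treated as "on a slope".
on_slope = np.abs(smooth_deriv) > 0.5
slope_idx = np.flatnonzero(on_slope)
```

A larger window_length suppresses more of the local wiggles, at the cost of blurring where each slope starts and ends.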

You can try the find_peaks function from scipy.signal. As you guessed, it finds the peaks, and it has parameters that make the detection more or less sensitive. The best one in your case is prominence ("how much you have to go down before finding another peak"). I call it on your signal as-is to get the maxima and on the negated signal to get the minima.
import numpy as np
from matplotlib import pyplot as plt
from scipy.signal import find_peaks
# Find the peaks (maxima) and the troughs (minima, as peaks of the negated signal)
yhat = x
max_peaks, _ = find_peaks(yhat, prominence=10)
min_peaks, _ = find_peaks(-yhat, prominence=10)
# Plot the data with the detected peaks and troughs marked
fig, ax2 = plt.subplots(figsize=(20, 10))
ax2.plot(max_peaks, yhat[max_peaks], "xr", markersize=20)
ax2.plot(min_peaks, yhat[min_peaks], "xr", markersize=20)
ax2.plot(yhat, '-')
ax2.plot(yhat, 'o', markersize=4)
plt.show()
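To go from the detected extrema to "points on a slope", one option is to drop every sample that lies within a few samples of a peak or trough. The sketch below assumes that reading of the question; the helper name slope_indices and its guard parameter are hypothetical, and the toy triangle wave stands in for the real signal.

```python
import numpy as np
from scipy.signal import find_peaks

def slope_indices(signal, prominence, guard=1):
    """Return (indices not within `guard` samples of a prominent extremum, extrema)."""
    signal = np.asarray(signal, dtype=float)
    max_peaks, _ = find_peaks(signal, prominence=prominence)
    min_peaks, _ = find_peaks(-signal, prominence=prominence)
    extrema = np.sort(np.concatenate([max_peaks, min_peaks]))
    keep = np.ones(signal.size, dtype=bool)
    for p in extrema:
        # Blank out the extremum and its neighbours on both sides.
        keep[max(p - guard, 0):p + guard + 1] = False
    return np.flatnonzero(keep), extrema

# Toy triangle wave: peaks at indices 3 and 9, trough at index 6.
toy = np.array([0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0], dtype=float)
on_slope, extrema = slope_indices(toy, prominence=1, guard=0)
```

Note that find_peaks never reports the first or last sample as a peak, so the two end points of the toy signal are kept even though they are local minima.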

Related

How to properly plot the pdf of a beta function in scipy.stats

I am trying to fit a beta distribution to some data, and then plot how well the beta distribution fits the data. But the output looks really weird and incorrect.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
x = np.array([0.9999999 , 0.9602287 , 0.8823198 , 0.83825594, 0.92847216,
0.9632976 , 0.90275735, 0.8383094 , 0.9826664 , 0.9141795 ,
0.88799196, 0.9272752 , 0.94456017, 0.90466917, 0.8905505 ,
0.95424247, 0.781545 , 0.9489085 , 0.9578988 , 0.8644015 ])
beta_params = stats.beta.fit(x)
print(beta_params)
#(3.243900357315478, 1.5909897101396109, 0.7270083219563888, 0.27811444901271615)
beta_pdf = stats.beta.pdf(x, beta_params[0], beta_params[1], beta_params[2], beta_params[3])
print(beta_pdf)
#[2.70181543 6.8442073 4.98204632 2.82445508 6.76055614 6.75910611
#5.90419012 2.82696622 5.58521916 6.34096675 5.2508072 6.73212694
#6.98854653 5.98225724 5.36937625 6.9519977 0.67812362 6.99116729
#6.89484982 4.10113147]
plt.plot(x, beta_pdf)
I'm not a statistician, but looking at your code I see that x is unordered, so plt.plot connects the points in the order they appear in the array.
Does sorting x before the fit help?
x = np.sort(x)
beta_params = stats.beta.fit(x)
Doing so, you'd get this:
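A minimal sketch of why the ordering matters for the plot: the pdf values are the same either way, only the order in which a line plot connects them changes. The fit itself should not depend on the order of the samples. The parameters below are rounded from the fit printed in the question, and the short toy sample is an assumption for illustration.

```python
import numpy as np
from scipy import stats

# Unordered toy sample (illustrative values inside the fitted support).
x = np.array([0.96, 0.88, 0.84, 0.93, 0.90, 0.98, 0.78, 0.95])
a, b, loc, scale = 3.24, 1.59, 0.727, 0.278  # rounded from the question's printed fit

# Same pdf values, different ordering.
pdf_unsorted = stats.beta.pdf(x, a, b, loc=loc, scale=scale)
order = np.argsort(x)
x_sorted = x[order]
pdf_sorted = stats.beta.pdf(x_sorted, a, b, loc=loc, scale=scale)

# For a smooth curve, evaluate the pdf on an evenly spaced grid instead of on the data.
grid = np.linspace(x.min(), x.max(), 200)
pdf_grid = stats.beta.pdf(grid, a, b, loc=loc, scale=scale)
# plt.plot(grid, pdf_grid) would then draw a single left-to-right curve.
```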

Trying to make a histogram within python for number of magnitudes from a text file

I need help on making a histogram dealing with the number of times a magnitude is within a range. I have a histogram made with the galaxy number, but I realized that doesn't really give any information.
I have tried making a bin of the galaxy numbers but realized that didn't really matter, nor did it work.
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import csv
import math
from collections import Counter
import numpy as np
from numpy.polynomial.polynomial import polyfit
histflux = []
galnum = []
with open('/home/jacob/PHOTOMETRY/PHOTOM_CATS/SpARCS-0035_totalall_HAWKIKs.cat', 'r') as magfile:
    magplots = csv.reader(magfile)
    firstmagline = magfile.readline()
    for line in magfile:
        id , ra , dec , x , y , hawkiks_tot , k_flag , k_star , k_fluxrad , totmask , hawkiks , ehawkiks , vimosu , evimosu , vimosb , \
        evimosb , vimosv , evimosv , vimosr , evimosr , vimosi , evimosi , decamz , edecamz , fourstarj1 , efourstarj1 , hawkij , ehawkij , \
        irac1 , eirac1 , irac2, eirac2 , irac3 , eirac3 , irac4 , eirac4 = line.split()
        goodflag = float(k_flag)
        goodhawki = float(hawkiks)
        if goodflag != 0.0:
            continue
        try:
            histfluxk = -2.5 * math.log10(goodhawki) +25
        except ValueError:
            print(histfluxk)
        histflux.append(histfluxk)
        galnum.append(float(id))
plt.hist([galnum, histflux])
plt.xlabel('Galaxy Number')
plt.ylabel('K-Band Magnitude')
plt.title('K-Band Magnitudes of Galaxies')
plt.legend()
plt.show()
What I want to see is a histogram with the x-axis ranging from 0 to 20 flux magnitudes in intervals of 2. The y-axis should be the number of times that the flux magnitudes fell within each of those ranges. I am stumped on how to do this because I am new to Python, and especially to making graphs in Python.
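One way to get the histogram described above is to pass explicit bin edges. The sketch below uses np.histogram on made-up magnitudes (the mags values are invented for illustration, not taken from the catalogue) so the counts can be inspected without a plot.

```python
import numpy as np

# Made-up magnitudes standing in for the histflux list.
mags = np.array([1.0, 3.5, 3.9, 10.2, 19.9, 20.0])

# Bin edges 0, 2, 4, ..., 20 give ten bins of width 2.
bins = np.arange(0, 22, 2)

# counts[i] is the number of magnitudes with bins[i] <= m < bins[i + 1];
# the last bin also includes its right edge (20).
counts, edges = np.histogram(mags, bins=bins)
```

The same bins array can be handed to matplotlib, for example plt.hist(histflux, bins=bins), with the magnitude on the x-axis label and the count on the y-axis label.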

Find locations on a curve where the slope changes

I have data points of time and voltage that create the curve shown below.
The time data is
array([ 0.10810811, 0.75675676, 1.62162162, 2.59459459,
3.56756757, 4.21621622, 4.97297297, 4.97297297,
4.97297297, 4.97297297, 4.97297297, 4.97297297,
4.97297297, 4.97297297, 5.08108108, 5.18918919,
5.2972973 , 5.51351351, 5.72972973, 5.94594595,
6.27027027, 6.59459459, 7.13513514, 7.67567568,
8.32432432, 9.18918919, 10.05405405, 10.91891892,
11.78378378, 12.64864865, 13.51351351, 14.37837838,
15.35135135, 16.32432432, 17.08108108, 18.16216216,
19.02702703, 20. , 20. , 20. ,
20. , 20. , 20. , 20. ,
20.10810811, 20.21621622, 20.43243243, 20.64864865,
20.97297297, 21.40540541, 22.05405405, 22.91891892,
23.78378378, 24.86486486, 25.83783784, 26.7027027 ,
27.56756757, 28.54054054, 29.51351351, 30.48648649,
31.56756757, 32.64864865, 33.62162162, 34.59459459,
35.67567568, 36.64864865, 37.62162162, 38.59459459,
39.67567568, 40.75675676, 41.83783784, 42.81081081,
43.89189189, 44.97297297, 46.05405405, 47.02702703,
48.10810811, 49.18918919, 50.27027027, 51.35135135,
52.43243243, 53.51351351, 54.48648649, 55.56756757,
56.75675676, 57.72972973, 58.81081081, 59.89189189])
and the volts data is
array([ 4.11041056, 4.11041056, 4.11041056, 4.11041056, 4.11041056,
4.11041056, 4.11041056, 4.10454545, 4.09794721, 4.09208211,
4.08621701, 4.07961877, 4.07228739, 4.06568915, 4.05909091,
4.05175953, 4.04516129, 4.03782991, 4.03123167, 4.02463343,
4.01803519, 4.01217009, 4.00557185, 3.99970674, 3.99384164,
3.98797654, 3.98284457, 3.97771261, 3.97331378, 3.96891496,
3.96451613, 3.96085044, 3.95645161, 3.95205279, 3.9483871 ,
3.94398827, 3.94032258, 3.93665689, 3.94325513, 3.94985337,
3.95645161, 3.96378299, 3.97038123, 3.97624633, 3.98284457,
3.98944282, 3.99604106, 4.0026393 , 4.00923754, 4.01510264,
4.02096774, 4.02609971, 4.02903226, 4.03196481, 4.03416422,
4.0356305 , 4.03709677, 4.03856305, 4.03929619, 4.04002933,
4.04076246, 4.04222874, 4.04296188, 4.04296188, 4.04369501,
4.04442815, 4.04516129, 4.04516129, 4.04589443, 4.04589443,
4.04662757, 4.04662757, 4.0473607 , 4.0473607 , 4.04809384,
4.04809384, 4.04809384, 4.04882698, 4.04882698, 4.04882698,
4.04956012, 4.04956012, 4.04956012, 4.04956012, 4.05029326,
4.05029326, 4.05029326, 4.05029326])
I would like to determine the location of the points labeled A, B, C, D, and E. Point A is the first location where the slope goes from zero to undefined. Point B is the location where the line is no longer vertical. Point C is the minimum of the curve. Point D is where the curve is no longer vertical. Point E is where the slope is close to zero again. The Python code below determines the locations for points A and C.
tdiff = np.diff(time)
vdiff = np.diff(volts)
# point A
idxA = np.where(vdiff < 0)[0][0]
timeA = time[idxA]
voltA = volts[idxA]
# point C
idxC = volts.idxmin()
timeC = time[idxC]
voltC = volts[idxC]
How can I determine the other locations on the curve represented by points B, D, and E?
You are looking for the points that mark any location where the slope changes to or from zero or infinity. We do not actually need to compute slopes anywhere: either y[n] - y[n-1] == 0 and y[n+1] - y[n] != 0, or vice versa, or the same for x.
We can take the diff of x. If one of two successive elements is zero, then the diff of the diff will be the diff or the negative diff at that point. So we just want to find and label all points where diff(x) == diff(diff(x)) and diff(x) != 0, properly adjusted for differences in size between the arrays of course. We also want all the points where the same is true for y.
In numpy terms, this can be written as follows:
def masks(vec):
    d = np.diff(vec)
    dd = np.diff(d)
    # Mask of locations where graph goes to vertical or horizontal, depending on vec
    to_mask = ((d[:-1] != 0) & (d[:-1] == -dd))
    # Mask of locations where graph comes from vertical or horizontal, depending on vec
    from_mask = ((d[1:] != 0) & (d[1:] == dd))
    return to_mask, from_mask
to_vert_mask, from_vert_mask = masks(time)
to_horiz_mask, from_horiz_mask = masks(volts)
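To see what the two masks select, here is the same function applied to a tiny array that is flat, then rises, then is flat again. The toy values are an assumption for illustration; the function body is the one above.

```python
import numpy as np

def masks(vec):
    d = np.diff(vec)
    dd = np.diff(d)
    # True where a non-zero step is followed by a zero step
    to_mask = (d[:-1] != 0) & (d[:-1] == -dd)
    # True where a zero step is followed by a non-zero step
    from_mask = (d[1:] != 0) & (d[1:] == dd)
    return to_mask, from_mask

# Flat at 0, rising, flat at 2.
vec = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0])
to_mask, from_mask = masks(vec)

# The masks are two elements shorter than vec; add 1 to map back to input indices.
to_idx = np.nonzero(to_mask)[0] + 1      # where the flat stretch begins
from_idx = np.nonzero(from_mask)[0] + 1  # where the flat stretch ends
```

Here to_idx points at index 4 (the first sample of the upper plateau) and from_idx at index 2 (the last sample of the lower plateau).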
Keep in mind that the masks are computed on second order differences, so they are two elements shorter than the inputs. Elements in the masks correspond to elements in the input arrays with a one-element border on the leading and trailing edge (hence the index [1:-1] below). You can convert the mask to indices using np.nonzero or you can get the x- and y-values directly using the masks as indices:
def apply_mask(mask, x, y):
    return x[1:-1][mask], y[1:-1][mask]
to_vert_t, to_vert_v = apply_mask(to_vert_mask, time, volts)
from_vert_t, from_vert_v = apply_mask(from_vert_mask, time, volts)
to_horiz_t, to_horiz_v = apply_mask(to_horiz_mask, time, volts)
from_horiz_t, from_horiz_v = apply_mask(from_horiz_mask, time, volts)
plt.plot(time, volts, 'b-')
plt.plot(to_vert_t, to_vert_v, 'r^', label='Plot goes vertical')
plt.plot(from_vert_t, from_vert_v, 'kv', label='Plot stops being vertical')
plt.plot(to_horiz_t, to_horiz_v, 'r>', label='Plot goes horizontal')
plt.plot(from_horiz_t, from_horiz_v, 'k<', label='Plot stops being horizontal')
plt.legend()
plt.show()
Here is the resulting plot:
Notice that because the classification is done separately, "Point A" is correctly identified as being both a spot where verticalness starts and horizontalness ends. The problem is that "Point E" does not appear to be resolvable as such according to these criteria. Zooming in shows that all of the proliferated points correctly identify horizontal line segments:
You could choose a "correct" version of "Point E" by discarding from_horiz completely and keeping only the last value from to_horiz:
to_horiz_t, to_horiz_v = apply_mask(to_horiz_mask, time, volts)
to_horiz_t, to_horiz_v = to_horiz_t[-1], to_horiz_v[-1]
plt.plot(time, volts, 'b-')
plt.plot(*apply_mask(to_vert_mask, time, volts), 'r^', label='Plot goes vertical')
plt.plot(*apply_mask(from_vert_mask, time, volts), 'kv', label='Plot stops being vertical')
plt.plot(to_horiz_t, to_horiz_v, 'r>', label='Plot goes horizontal')
plt.legend()
plt.show()
I am using this as a showcase for the star expansion of the results of apply_mask. The resulting plot is:
This is pretty much exactly the plot you were looking for. Discarding from_horiz also makes "Point A" be identified only as a drop to vertical, which is nice.
As multiple values in to_horiz show, this method is very sensitive to noise within the data. Your data is quite smooth, but this approach is unlikely to ever work with raw unfiltered measurements.
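One possible way to soften that sensitivity, offered as a sketch rather than as part of the approach above, is to treat a difference as zero whenever it is within a tolerance. The conditions in masks reduce to "this step is non-zero and the next is zero" (and the reverse), so they can be rewritten with np.isclose. The name masks_tol and the atol value are hypothetical and would need tuning to the noise level of real data.

```python
import numpy as np

def masks_tol(vec, atol):
    """Tolerance-based variant of the to/from masks (hypothetical helper)."""
    d = np.diff(vec)
    flat = np.isclose(d, 0.0, atol=atol)   # steps small enough to count as zero
    to_mask = ~flat[:-1] & flat[1:]        # non-flat step followed by a flat step
    from_mask = flat[:-1] & ~flat[1:]      # flat step followed by a non-flat step
    return to_mask, from_mask

# Same shape as a clean flat-rise-flat signal, with small jitter on the plateaus.
vec = np.array([0.0, 0.001, -0.001, 1.0, 2.0, 2.001, 2.0])
to_mask, from_mask = masks_tol(vec, atol=0.01)
to_idx = np.nonzero(to_mask)[0] + 1
from_idx = np.nonzero(from_mask)[0] + 1
```

With exact comparisons the jitter would mark nearly every sample; with the tolerance, only the two corners of the rise are reported.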

How to get rid of variable amplitude in cosine function using python-scipy?

After a complex FFT analysis, I got my data as:
y=np.array([-0.31757207, -0.759897 , -0.97481323, -0.90067096, -0.56201419,
-0.06141066, 0.45184696, 0.82654122, 0.95429599, 0.80098432,
0.41565507, -0.08528661, -0.55350349, -0.85289024, -0.89924892,
-0.6838725 , -0.27446443, 0.20632688, 0.61789554, 0.84295248,
0.82091852, 0.56394004, 0.15139964, -0.29481341, -0.64650262,
-0.80602096, -0.73317927, -0.45486384, -0.0552602 , 0.3498015 ,
0.64649953, 0.75486814, 0.64997615, 0.36696461, -0.01077531,
-0.37629244, -0.62928397, -0.70331519, -0.58277349, -0.30541445,
0.04956568, 0.38413122, 0.60823604, 0.66362602, 0.53848173,
0.26941065, -0.06930203, -0.38601383, -0.59612545, -0.6442082 ,
-0.5182447 , -0.25246689, 0.08181356, 0.39497462, 0.60264688,
0.6480463 , 0.51731067, 0.24376628, -0.10019951, -0.42181997,
-0.63254052, -0.67215727, -0.52601037, -0.23032442, 0.13622702,
0.47298468, 0.68464621, 0.70816855, 0.53165966, 0.19956061,
-0.19797826, -0.54920882, -0.75206661, -0.74391156, -0.52103408,
-0.14180003, 0.28818356, 0.64528143, 0.82341019, 0.76573576,
0.48295615, 0.05224223, -0.40355087, -0.75089961, -0.88488436,
-0.76111445, -0.41051563, 0.06797007, 0.53521743, 0.85248562,
0.92285209, 0.72105995, 0.30251038, -0.21178154, -0.67024193,
-0.935627 , -0.92637938, -0.64190028, -0.16383967, 0.36735054,
0.79386302, 0.98768951, 0.88930233, 0.52608786, 0.00477518,
-0.52006527, -0.89210845, -1.00012051, -0.8114311 , -0.3818889 ,
0.16075761, 0.65507887, 0.95427369, 0.97001733, 0.69866763,
0.22200955])
Which, when plotted produces this:
This is a cosine function with a variable amplitude. I'm looking for a way to get rid of the amplitude envelope and obtain a normal cosine function that I can fit.
from matplotlib.pyplot import plot, figure
from numpy import array, exp, arange, real, append
from scipy.fftpack import fft, ifft
w = fft(y)
f = arange(0,y.size)
cf = 10 #central freq
wd = 0.5 #width
gate1 = exp(-((f-cf)/(0.707106*wd))**(2*2))
gate2 = append([0],gate1[::-1])[:-1] #symmetrize
gate = gate1+gate2
w_f = w*gate
figure(1)
plot(abs(w))
plot(abs(w_f))
figure(2)
plot(real(ifft(w_f)))
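The code above multiplies the spectrum by a narrow gate centred on frequency bin 10 (and its mirror image) and transforms back. The sketch below applies the same gate to a synthetic amplitude-modulated cosine to show the effect. It uses numpy.fft instead of scipy.fftpack, and the test signal (length 128, carrier in bin 10, 30% modulation) is an assumption chosen so that the modulation sidebands fall outside the gate.

```python
import numpy as np

# Synthetic signal: a cosine in bin 10 with a slowly varying amplitude.
N = 128
n = np.arange(N)
carrier = np.cos(2 * np.pi * 10 * n / N)
envelope = 1 + 0.3 * np.cos(2 * np.pi * n / N)
y = envelope * carrier

w = np.fft.fft(y)
f = np.arange(N)
cf = 10    # central frequency bin
wd = 0.5   # gate width

# Flat-topped (super-Gaussian) gate around cf, mirrored for the negative frequencies.
gate1 = np.exp(-((f - cf) / (0.707106 * wd)) ** 4)
gate2 = np.append([0], gate1[::-1])[:-1]
gate = gate1 + gate2

# Back to the time domain: only the carrier bins survive.
filtered = np.real(np.fft.ifft(w * gate))
```

In this idealised case the sidebands sit exactly one bin away from the carrier, so the gate removes them and the result matches the constant-amplitude carrier. On measured data the separation is rarely this clean, and cf and wd have to be chosen by looking at the spectrum.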

Python - Clipping out data to fit profiles

I have several sets of data to which I'm trying to fit different profiles. In the centre of one of the minima there is contamination that prevents me from doing a good fit as you can see in this image:
How can I clip out those spikes in the bottom of my data taking into account that the spike is not always in the same position? Or how would you deal with data like this? I'm using lmfit to fit the profiles, in this case a Lorentzian and a Gaussian. Here is a minimal working example where I have played with the initial values to fit the data more closely:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit.models import GaussianModel, ConstantModel, LorentzianModel
x = np.array([4085.18084467, 4085.38084374, 4085.5808428 , 4085.78084186, 4085.98084092, 4086.18083999, 4086.38083905, 4086.58083811, 4086.78083717, 4086.98083623, 4087.1808353 , 4087.38083436, 4087.58083342, 4087.78083248, 4087.98083155, 4088.18083061, 4088.38082967, 4088.58082873, 4088.78082779, 4088.98082686, 4089.18082592, 4089.38082498, 4089.58082404, 4089.78082311, 4089.98082217, 4090.18082123, 4090.38082029, 4090.58081935, 4090.78081842, 4090.98081748, 4091.18081654, 4091.3808156 , 4091.58081466, 4091.78081373, 4091.98081279, 4092.18081185, 4092.38081091, 4092.58080998, 4092.78080904, 4092.9808081 , 4093.18080716, 4093.38080622, 4093.58080529, 4093.78080435, 4093.98080341, 4094.18080247, 4094.38080154, 4094.5808006 , 4094.78079966, 4094.98079872, 4095.18079778, 4095.38079685, 4095.58079591, 4095.78079497, 4095.98079403, 4096.1807931 , 4096.38079216, 4096.58079122, 4096.78079028, 4096.98078934, 4097.18078841, 4097.38078747, 4097.58078653, 4097.78078559,4097.98078466, 4098.18078372, 4098.38078278, 4098.58078184, 4098.7807809 , 4098.98077997, 4099.18077903, 4099.38077809, 4099.58077715, 4099.78077622, 4099.98077528, 4100.18077434, 4100.3807734 , 4100.58077246, 4100.78077153, 4100.98077059, 4101.18076965, 4101.38076871, 4101.58076778, 4101.78076684, 4101.9807659 , 4102.18076496, 4102.38076402, 4102.58076309, 4102.78076215, 4102.98076121, 4103.18076027, 4103.38075934, 4103.5807584 , 4103.78075746, 4103.98075652, 4104.18075558, 4104.38075465, 4104.58075371, 4104.78075277, 4104.98075183, 4105.1807509 , 4105.38074996, 4105.58074902, 4105.78074808, 4105.98074714, 4106.18074621, 4106.38074527, 4106.58074433, 4106.78074339, 4106.98074246, 4107.18074152, 4107.38074058, 4107.58073964, 4107.7807387 , 4107.98073777, 4108.18073683, 4108.38073589, 4108.58073495, 4108.78073401, 4108.98073308, 4109.18073214, 4109.3807312 , 4109.58073026, 4109.78072933, 4109.98072839, 4110.18072745, 4110.38072651, 4110.58072557, 4110.78072464, 4110.9807237 , 4111.18072276, 4111.38072182, 
4111.58072089, 4111.78071995, 4111.98071901, 4112.18071807, 4112.38071713, 4112.5807162 , 4112.78071526, 4112.98071432, 4113.18071338, 4113.38071245, 4113.58071151, 4113.78071057, 4113.98070963, 4114.18070869, 4114.38070776, 4114.58070682, 4114.78070588, 4114.98070494, 4115.18070401, 4115.38070307, 4115.58070213, 4115.78070119, 4115.98070025, 4116.18069932, 4116.38069838, 4116.58069744, 4116.7806965 , 4116.98069557, 4117.18069463, 4117.38069369, 4117.58069275, 4117.78069181, 4117.98069088, 4118.18068994, 4118.380689 , 4118.58068806, 4118.78068713, 4118.98068619, 4119.18068525, 4119.38068431, 4119.58068337, 4119.78068244, 4119.9806815 , 4120.18068056, 4120.38067962, 4120.58067869, 4120.78067775, 4120.98067681, 4121.18067587, 4121.38067493, 4121.580674 , 4121.78067306, 4121.98067212, 4122.18067118, 4122.38067025, 4122.58066931, 4122.78066837, 4122.98066743, 4123.18066649, 4123.38066556, 4123.58066462, 4123.78066368, 4123.98066274, 4124.1806618 , 4124.38066087, 4124.58065993, 4124.78065899, 4124.98065805, 4125.18065712, 4125.38065618, 4125.58065524, 4125.7806543 , 4125.98065336, 4126.18065243, 4126.38065149, 4126.58065055, 4126.78064961, 4126.98064868, 4127.18064774, 4127.3806468 , 4127.58064586, 4127.78064492, 4127.98064399, 4128.18064305, 4128.38064211, 4128.58064117, 4128.78064024, 4128.9806393 , 4129.18063836, 4129.38063742, 4129.58063648, 4129.78063555, 4129.98063461, 4130.18063367, 4130.38063273, 4130.5806318 , 4130.78063086, 4130.98062992, 4131.18062898, 4131.38062804, 4131.58062711, 4131.78062617, 4131.98062523, 4132.18062429, 4132.38062336, 4132.58062242, 4132.78062148, 4132.98062054, 4133.1806196 , 4133.38061867, 4133.58061773, 4133.78061679, 4133.98061585, 4134.18061492, 4134.38061398, 4134.58061304, 4134.7806121 , 4134.98061116])
y = np.array([0.90312759, 1.00923175, 0.94618369, 0.98284045, 0.91510612, 0.96737804, 0.97690214, 0.94363369, 1.00887784, 1.00110387, 0.91647096, 0.97943202, 1.00672907, 1.01552094, 1.01089407, 0.96914584, 0.9908419 , 1.0176613 , 0.97032148, 0.96003562, 0.9702355 , 0.93684173, 0.94652734, 0.94895018, 1.01214356, 0.85777678, 0.89308203, 0.9789272 , 0.93901884, 0.9684622 , 0.96969321, 0.86326307, 0.89607392, 0.92459571, 1.00454429, 1.06019733, 0.97291196, 0.95646497, 0.95899707, 1.02830351, 0.94938178, 0.91481128, 0.92606219, 0.97085631, 0.93597434, 0.91316857, 0.90644542, 0.91726926, 0.91686184, 0.96445563, 0.92166362, 0.95831572, 0.93859066, 0.85285273, 0.89944073, 0.91812428, 0.94265677, 0.88281406, 0.9470601 , 0.94921529, 0.97289222, 0.94632251, 0.96633195, 0.94096512, 0.95324803, 0.90920845, 0.92100257, 0.91181745, 0.95715298, 0.91715382, 0.90219214, 0.87585035, 0.86592191, 0.89335902, 0.85536392, 0.89619274, 0.9450366 , 0.82780137, 0.81214176, 0.83461329, 0.82858317, 0.80851704, 0.79253546, 0.85440086, 0.81679169, 0.80579976, 0.72312218, 0.75583125, 0.75204599, 0.84519188, 0.68686821, 0.71472154, 0.71706318, 0.72640234, 0.70526356, 0.68295282, 0.66795774, 0.65004383, 0.68096834, 0.72697547, 0.72436393, 0.77128385, 0.79666758, 0.67349101, 0.61479406, 0.57046337, 0.51614312, 0.52945366, 0.53112169, 0.53757761, 0.56680358, 0.63839684, 0.60704329, 0.62377533, 0.67862515, 0.64587581, 0.71316115, 0.76309798, 0.72217569, 0.7477785 , 0.79731849, 0.76934137, 0.77063868, 0.77871584, 0.77688526, 0.84342722, 0.85382332, 0.88700466, 0.85837992, 0.79589266, 0.83798993, 0.79835529, 0.84612746, 0.83214907, 0.86373676, 0.90729115, 0.82111605, 0.86165685, 0.84090099, 0.90389133, 0.89554032, 0.90792356, 0.92798016, 0.95588479, 0.95019718, 0.95447497, 0.89845759, 0.91638311, 0.99263342, 0.97477606, 0.95482538, 0.94489498, 0.94344967, 0.90526465, 0.92538486, 0.96279787, 0.94005143, 0.96842454, 0.92296494, 0.89954172, 0.8684367 , 0.95039002, 0.95229769, 0.93752274, 0.94741173, 
0.96704449, 1.01130839, 0.95499414, 0.99596569, 0.95130622, 1.00014723, 1.00252218, 0.95130331, 1.0022896 , 0.99851989, 0.94405282, 0.95814021, 0.94851972, 1.01302067, 1.01400272, 0.97960083, 0.97070283, 1.01312797, 0.9842154 , 1.01147273, 0.97331853, 0.91403182, 0.96813051, 0.92319169, 0.9294103 , 0.96960715, 0.94811518, 0.97115083, 0.84687543, 0.90725159, 0.88061293, 0.87319615, 0.85331661, 0.89775082, 0.90956716, 0.83174505, 0.89753388, 0.89554364, 0.95329739, 0.87687031, 0.93883127, 0.97433899, 0.99515225, 0.97519981, 0.91956466, 0.97977674, 0.93582089, 1.00662722, 0.90157277, 1.02887754, 0.9777419 , 0.94257094, 1.02359615, 0.98968414, 1.00075502, 1.03230265, 1.05904074, 1.00488442, 1.05507886, 1.05085518, 1.02561781, 1.05896008, 0.98024381, 1.08005691, 0.94528977, 1.03853637, 1.02064405, 1.0467137 , 1.05375156, 1.12907949, 0.99295611, 1.06601022, 1.02846374, 0.98006807, 0.96446772, 0.97702428, 0.97788589, 0.93889781, 0.96366778, 0.96645265, 0.95857242, 1.05796304, 0.99441763, 1.00573183, 1.05001927])
e = np.array([0.0647344 , 0.04583914, 0.05665552, 0.04447208, 0.05644753, 0.03968611, 0.05985188, 0.04252311, 0.03366922, 0.04237672, 0.03765898, 0.03290132, 0.04626836, 0.05106203, 0.03619188, 0.03944098, 0.08115469, 0.05859644, 0.06091101, 0.05170821, 0.0427244 , 0.06804469, 0.06708318, 0.03369381, 0.04160575, 0.08007032, 0.09292148, 0.04378329, 0.08216214, 0.06087074, 0.05375458, 0.06185891, 0.06385766, 0.08084546, 0.04864063, 0.06400878, 0.04988693, 0.06689165, 0.05989534, 0.08010138, 0.0681177 , 0.04478208, 0.03876582, 0.05977015, 0.06610619, 0.05020086, 0.07244604, 0.0445143 , 0.06970626, 0.04423994, 0.0414573 , 0.06892836, 0.05715395, 0.04014724, 0.07908425, 0.06082051, 0.08380691, 0.08576757, 0.06571406, 0.04842625, 0.05298355, 0.05271857, 0.06340425, 0.10849621, 0.0811072 , 0.03642638, 0.10614094, 0.09865099, 0.06711037, 0.10244762, 0.11843505, 0.1092357 , 0.09748241, 0.09657009, 0.09970179, 0.10203563, 0.18494082, 0.14097796, 0.1151294 , 0.16172895, 0.17611204, 0.16226913, 0.2295418 , 0.17795924, 0.1253298 , 0.1771586 , 0.15139061, 0.14739618, 0.1620105 , 0.19158538, 0.21431605, 0.19292715, 0.23308884, 0.30519423, 0.31401994, 0.30569885, 0.31216375, 0.35147676, 0.25016472, 0.16232236, 0.09058787, 0.0604483 , 0.05168302, 0.21432774, 0.38149791, 0.5061975 , 0.44281541, 0.50646427, 0.43761581, 0.44989111, 0.47778238, 0.39944325, 0.32462726, 0.34560857, 0.3175776 , 0.30253441, 0.23059451, 0.24516185, 0.20708065, 0.26429751, 0.1830661 , 0.15155041, 0.16497299, 0.15794139, 0.13626666, 0.17839823, 0.13502886, 0.14148522, 0.10869864, 0.11723602, 0.09074029, 0.06922157, 0.07719777, 0.13181317, 0.11441895, 0.10655855, 0.12073767, 0.0846133 , 0.07974657, 0.06538693, 0.0573741 , 0.07864047, 0.08351471, 0.08130351, 0.0768824 , 0.07951992, 0.04478989, 0.0765122 , 0.04842814, 0.04355571, 0.05138656, 0.07215294, 0.04681987, 0.05790133, 0.06163808, 0.082449 , 0.06127927, 0.04971221, 0.05107901, 0.04493687, 0.06072161, 0.06094332, 0.03630467, 0.04162285, 0.04058228, 
0.04526251, 0.06191432, 0.04901982, 0.0454908 , 0.06186274, 0.0407017 , 0.03865571, 0.04353665, 0.03898987, 0.04666321, 0.05856035, 0.04225933, 0.04797901, 0.03523971, 0.04728414, 0.05494382, 0.04773011, 0.03210954, 0.05651663, 0.03625933, 0.03596701, 0.03800191, 0.06267668, 0.06431192, 0.0602614 , 0.05139896, 0.04571979, 0.04375182, 0.0576867 , 0.07491418, 0.05339972, 0.07619115, 0.11569378, 0.07087871, 0.09076518, 0.13554717, 0.07811761, 0.07180695, 0.05831886, 0.06042863, 0.08759576, 0.06650081, 0.08420164, 0.08185432, 0.04338836, 0.04970979, 0.04008252, 0.03605485, 0.03456321, 0.05594584, 0.03856822, 0.03576337, 0.03118799, 0.0441686 , 0.0469118 , 0.03591666, 0.03562582, 0.04934832, 0.03280972, 0.03201576, 0.04338048, 0.07443531, 0.04121059, 0.03774147, 0.03717577, 0.03354207, 0.03806978, 0.0319364 , 0.03715712, 0.0379478 , 0.04867626, 0.0304592 , 0.03393844, 0.034518 , 0.04293514, 0.05177898, 0.05332907, 0.0352937 , 0.03359781, 0.04625272, 0.03733088, 0.03501259, 0.03346308, 0.04333749, 0.05741173])
cont = ConstantModel(prefix='cte_')
pars = cont.guess(y, x=x)
gauss = GaussianModel(prefix='g_')
pars.update( gauss.make_params())
pars['cte_c'].set(1)
pars['g_center'].set(4125, min=4120, max=4130)
pars['g_sigma'].set(1, min=0.5)
pars['g_amplitude'].set(-0.2, min=-0.5)
loren = LorentzianModel(prefix='l_')
pars.update( loren.make_params())
pars['l_center'].set(4106, min=4095, max=4115)
pars['l_sigma'].set(4, max=6)
pars['l_amplitude'].set(-6., max=-4.)
model = gauss + loren + cont
init = model.eval(pars, x=x)
result = model.fit(y, pars, x=x, weights=1/e)
#print(result.fit_report(min_correl=0.5))
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(x, y, 'k-', lw=2) # data in black
ax.plot(x, init, 'g--', lw=2) # initial guess
ax.plot(x, result.best_fit, 'r-', lw=2) # best fit
ax.set(xlim=(4085,4135), ylim=(0.4,1.14))
If the bad point is always at the same x value, you could remove that point from the data, perhaps with something like:
import numpy as np
def index_nearest(array, value):
    """index of array nearest to value"""
    return np.abs(array-value).argmin()
ybad = index_nearest(x, 4150)
y[ybad] = x[ybad] = np.nan
x = x[np.where(np.isfinite(y))]
y = y[np.where(np.isfinite(y))]
and then fit your model to those data with the bad point removed.
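Because the fit in the question also passes weights=1/e, the uncertainty array has to be trimmed along with x and y. A boolean mask keeps the three arrays aligned; the sketch below uses short made-up arrays and a made-up bad position (4100.4) purely for illustration.

```python
import numpy as np

def index_nearest(array, value):
    """index of array nearest to value"""
    return np.abs(array - value).argmin()

# Made-up data: the third point plays the role of the spike.
x = np.array([4100.0, 4100.2, 4100.4, 4100.6, 4100.8])
y = np.array([0.70, 0.68, 0.95, 0.66, 0.71])
e = np.array([0.05, 0.05, 0.30, 0.05, 0.05])

# Mark the point nearest the (assumed) bad position, then apply one mask to all arrays.
keep = np.ones(x.size, dtype=bool)
keep[index_nearest(x, 4100.4)] = False
x_good, y_good, e_good = x[keep], y[keep], e[keep]
```

The trimmed arrays could then be passed on as model.fit(y_good, pars, x=x_good, weights=1/e_good).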
But also: if there is no obviously errant point and the data are "just" noisy, there is probably no advantage to removing what look like bad points. Your data looks noisy to me, but it's hard to see that there is a systematically bad point. If you are going to remove a point, remember that you are asserting that this measurement was not merely affected by normal noise, but was wrong.
Finally: another approach to treating noisy data might be to try to smooth the data, say with a Savitzky-Golay filter. There is always some danger of smoothing out features with such an approach, but a modest S-G filter is often good for cleaning up noisy data enough to detect features. Of course, if fits to filtered data give significantly different results from fits to unfiltered data, you will probably need to understand why that is.
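A small sketch of that smoothing step, on a synthetic absorption-like dip rather than the data in the question. The line shape, the noise level and the filter settings (window of 11 samples, cubic polynomial) are all assumptions for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic dip centred at x = 0 with additive Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 201)
clean = 1 - 0.5 / (1 + x**2)
noisy = clean + rng.normal(0, 0.03, x.size)

# Modest Savitzky-Golay filter: short window relative to the width of the dip.
smooth = savgol_filter(noisy, window_length=11, polyorder=3)

# Location of the minimum after smoothing.
x_min = x[np.argmin(smooth)]
```

Keeping the window clearly narrower than the feature of interest limits how much of the dip is smoothed away. Comparing fits to noisy and to smooth is one way to check whether the filter changed the result.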
