I have plotted a regression line and now I want to extrapolate it. I tried np.arange, but it didn't work for me; I want to extend the line.
Another question is how I can make proper uncertainty intervals instead of just scaling the prediction by a fixed percentage.
import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt

kwargs = dict(delimiter='\t',
              skip_header=0,
              missing_values='NaN',
              converters={0: matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},
              dtype=float,
              names=True,
              )
ratingcats = np.genfromtxt(r'C:\Users\ker\Documents\Discharge_and_stageheight_Catsop.txt', **kwargs)
dis_rat = ratingcats['discharge']  # select the relevant columns by name
stage_rat = ratingcats['stage'] - 79.331
#mask NaN
dis_ratM = np.ma.masked_array(dis_rat,mask=np.isnan(dis_rat)).compressed()
stage_ratM = np.ma.masked_array(stage_rat,mask=np.isnan(dis_rat)).compressed()
#sort
sort_ind = np.argsort(stage_ratM)
stage_ratM = stage_ratM[sort_ind]
dis_ratM = dis_ratM[sort_ind]
#regression
a1, b1, c1 = np.polyfit(stage_ratM, dis_ratM, 2)
discharge_pred = np.polyval([a1, b1, c1], stage_ratM)
print('regression coefficients')
print(a1, b1, c1)
#create upper and lower uncertainty
upper = discharge_pred*1.15
lower = discharge_pred*0.85
#create scatterplot
plt.scatter(stage_rat,dis_rat,color='b',label='Rating curve')
plt.plot(stage_ratM,discharge_pred,'r-',label='regression line')
plt.plot(stage_ratM,upper,'r--',label='15% error')
plt.plot(stage_ratM,lower,'r--')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,1)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()
Instead of evaluating the fitted polynomial only at stage_ratM, evaluate it on an extended range built with np.arange (adjust the bounds and step to your stage heights):
stage_extended = np.arange(60, 100, 1)
prediction_extrapolation = np.polyval([a1, b1, c1], stage_extended)
plt.plot(stage_extended, prediction_extrapolation, 'r-', label='regression line')
For your second question you might want to look into plt.errorbar (see the matplotlib errorbar example).
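A minimal, self-contained errorbar sketch (the stage/discharge arrays below are made-up stand-ins for stage_ratM and discharge_pred from your script, and the 15% band is only the ad-hoc error from the question, not a proper uncertainty estimate):
import numpy as np
import matplotlib.pyplot as plt
# toy stand-ins for stage_ratM / discharge_pred
stage = np.linspace(0.0, 1.0, 20)
discharge = 0.9*stage**2 + 0.05*stage
# errorbar draws the curve plus a vertical bar of size yerr at each point
plt.errorbar(stage, discharge, yerr=0.15*discharge,
             fmt='r-', ecolor='gray', capsize=2, label='regression line ± 15%')
plt.xlabel('stage height [m]')
plt.ylabel('discharge')
plt.legend(loc='upper left')
plt.show()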
I was working on filtering signals via Fourier transforms using Python.
The raw signal was produced by some instruments and should contain cycle-based signals with fluctuations. An example is shown below:
My aim is to extract every plateau of each cycle as an individual sample. With an FFT-based method, I tried to first get a smoother curve and then use a differential method to find the abrupt change points (largest/lowest differential values). Based on the starting and ending points (shown in the figure below), I could identify each individual sample containing a few data points.
However, this method ran into problems because the raw signal also includes some nonsense segments which are not useful and lead to a worse FFT filtering result.
Here is the code and the results, using part of the data and then the whole time series. We can see in subplot 2 that the filtered signal does not reflect the actual fluctuation of the cycles in the raw signal, due to the influence of the noise at the end.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack
from scipy.signal import find_peaks, general_gaussian, fftconvolve
test = pd.read_csv("https://raw.githubusercontent.com/envhyf/Notebook/master/example_raw_signal.csv")
def fft_filter(sig):
    time_step = 1
    period = len(sig)
    time_vec = np.arange(0, period, time_step)
    # reference: https://scipy-lectures.org/intro/scipy/auto_examples/plot_fftpack.html
    sig_fft = fftpack.fft(sig)
    power = np.abs(sig_fft)**2
    sample_freq = fftpack.fftfreq(sig.size, d=time_step)
    pos_mask = np.where(sample_freq > 0)
    freqs = sample_freq[pos_mask]
    peak_freq = freqs[power[pos_mask].argmax()]
    high_freq_fft = sig_fft.copy()
    high_freq_fft[np.abs(sample_freq) > peak_freq] = 0
    filtered_sig = fftpack.ifft(high_freq_fft)
    return filtered_sig
fig = plt.figure(figsize=(12, 6))
##########only extract part of the signal#######################
ax = plt.subplot(211)
sig = cutting_sig(test).F2.values[0:600]
sig_filtered = fft_filter(sig)
period = len(sig)
time_step = 1
time_vec = np.arange(0, period,time_step)
plt.plot(time_vec, sig, label='Original signal')
plt.plot(time_vec,sig_filtered , linewidth=2, label='Filtered signal')
deriv = np.diff(sig_filtered)
pos_peaks, pos_details = find_peaks(deriv)
neg_peaks, neg_details = find_peaks(-deriv)
plt.scatter(time_vec[pos_peaks], sig_filtered[pos_peaks],color = 'r')
plt.scatter(time_vec[neg_peaks], sig_filtered[neg_peaks],color = 'k')
plt.title('Part of the data')
plt.xlabel('Time [s]')
plt.legend(loc='best')
#######################################################################
ax = plt.subplot(212)
sig = cutting_sig(test).F2.values#[0:600]
sig_filtered = fft_filter(sig)
period = len(sig)
time_step = 1
time_vec = np.arange(0, period,time_step)
plt.plot(time_vec, sig, label='Original signal')
plt.plot(time_vec,sig_filtered , linewidth=2, label='Filtered signal')
deriv = np.diff(sig_filtered)
pos_peaks, pos_details = find_peaks(deriv)
neg_peaks, neg_details = find_peaks(-deriv)
plt.scatter(time_vec[pos_peaks], sig_filtered[pos_peaks],color = 'r')
plt.scatter(time_vec[neg_peaks], sig_filtered[neg_peaks],color = 'k')
plt.title('Whole series')
plt.legend()
plt.tight_layout()
plt.show()
So my question is: how can I remove that noise in advance and then conduct the FFT filtering to achieve my target? I could not figure out a method to get rid of those bad data points. Any suggestions or comments would be highly appreciated.
Last week I asked a question about finding a way to interpolate a surface from multiple curves (data from multiple Excel files), and someone referred me to a question which explains how to use scipy.interpolate.RBFInterpolator (How can I perform two-dimensional interpolation using scipy?).
I tried this method, but I am getting a bad surface fit (see the pictures below). Does anyone understand what is wrong with my code? I tried changing the kernel parameter, but "linear" seems to be the best. Am I making an error when using np.meshgrid? Thanks for the help.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os
from scipy.interpolate import RBFInterpolator
fig = plt.figure(figsize=(15,10), dpi=400)
ax = fig.add_subplot(projection='3d')
# List all the results files in the folder (here 'Sress_Strain') to plot them.
results_list = os.listdir(r"C:/Users/bdhugu/Desktop/Strain_Stress")
for i in range(len(results_list)):
    if i == 0:
        results = pd.read_excel(r"C:/Users/bdhugu/Desktop/Strain_Stress/"+results_list[i])
        strain = results["Strain (mm/mm)"]
        stress = results["Stress (MPa)"]
        strain_rate = results["Strain rate (s^-1)"]
    if i > 0:
        new_results = pd.read_excel(r"C:/Users/bdhugu/Desktop/Strain_Stress/"+results_list[i])
        new_strain = new_results["Strain (mm/mm)"]
        new_stress = new_results["Stress (MPa)"]
        new_strain_rate = new_results["Strain rate (s^-1)"]
        strain = strain.append(new_strain, ignore_index=False)
        stress = stress.append(new_stress, ignore_index=False)
        strain_rate = strain_rate.append(new_strain_rate, ignore_index=False)
# RBFINTERPOLATOR METHOD
# ----------------------------------------------------------------------------
x_scattered = strain
y_scattered = strain_rate
z_scattered = stress
scattered_points = np.stack([x_scattered.ravel(), y_scattered.ravel()],-1)
x_dense, y_dense = np.meshgrid(np.linspace(min(strain), max(strain), 20),np.linspace(min(strain_rate), max(strain_rate), 21))
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1)
interpolation = RBFInterpolator(scattered_points, z_scattered.ravel(), smoothing = 0, kernel='linear',epsilon=1, degree=0)
z_dense = interpolation(dense_points).reshape(x_dense.shape)
fig = plt.figure(figsize=(15,10),dpi=400)
ax = plt.axes(projection='3d')
ax.plot_surface(x_dense, y_dense, z_dense ,cmap='viridis', edgecolor='none')
ax.invert_xaxis()
ax.set_title('Surface plot')
plt.show()
Data to interpolate
Surface fitting with RBFInterpolator
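For comparison, here is a minimal, self-contained sketch of the scattered-points-to-dense-grid pattern from the linked answer, on synthetic data (the test function and point counts are invented purely for illustration):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RBFInterpolator
rng = np.random.default_rng(0)
xy = rng.uniform(0, 1, size=(200, 2))    # scattered (x, y) points, shape (n, 2)
z = np.sin(4*xy[:, 0])*xy[:, 1]          # synthetic values at those points
interp = RBFInterpolator(xy, z, kernel='thin_plate_spline')
# dense grid: meshgrid -> ravel to shape (m, 2) for evaluation -> reshape back to the grid
xg, yg = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 21))
dense = np.stack([xg.ravel(), yg.ravel()], axis=-1)
zg = interp(dense).reshape(xg.shape)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(xg, yg, zg, cmap='viridis', edgecolor='none')
plt.show()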
I plot a function which is based on the results of a curve fit I did earlier. Now I want to see how well the curve fit matches the average values for every x value. I tried it with a for loop and a groupby.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
plt.style.use('seaborn-colorblind')
x = dataset['mrwSmpVWi']
c = dataset['c']
a = dataset['a']
b = dataset['b']
Snr = dataset['Seriennummer']
dataset["y"] = (c / (1 + (a) * np.exp(-b*(x))))
for number in dataset.groupby('mrwSmpVWi'):
    dataset['m'] = dataset['mrwSmpP'].mean()

fig, ax = plt.subplots(figsize=(30,15))
for name, group in dataset.groupby('Seriennummer'):
    group.plot(x="mrwSmpVWi", y="m", ax=ax, marker='o', linestyle='', ms=12, label=name)
    group.plot(x="mrwSmpVWi", y="y", ax=ax, label=name)
plt.show()
The dataset with the values is huge and not sorted by mrwSmpVWi.
Does anyone have an idea why I only get a straight line for my average values?
You need to take a look at what you're doing in these lines:
for number in dataset.groupby('mrwSmpVWi'):
    dataset['m'] = dataset['mrwSmpP'].mean()
You probably want:
dataset['m'] = dataset.groupby('Seriennummer')['mrwSmpVWi'].transform('mean')
(assuming you were intending to calculate the mean of each group of Seriennummer)
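If it helps, here is a tiny sketch of what transform('mean') does on a toy frame (the column names just mirror the question's; substitute whichever column the mean should actually be taken over):
import pandas as pd
df = pd.DataFrame({'Seriennummer': ['A', 'A', 'B', 'B', 'B'],
                   'mrwSmpP': [1.0, 3.0, 2.0, 4.0, 6.0]})
# transform broadcasts each group's mean back onto every row of that group,
# so the result stays aligned with the original rows (unlike .mean(), which collapses them)
df['m'] = df.groupby('Seriennummer')['mrwSmpP'].transform('mean')
print(df)
#   Seriennummer  mrwSmpP    m
# 0            A      1.0  2.0
# 1            A      3.0  2.0
# 2            B      2.0  4.0
# 3            B      4.0  4.0
# 4            B      6.0  4.0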
I have this code:
import math
import numpy as np
import pylab as plt1
from matplotlib import pyplot as plt
uH2 = 1.90866638
uHe = 3.60187307
eH2 = 213.38
eHe = 31.96
R = float(uH2*eH2)/(uHe*eHe)
C_Values = []
Delta = []
kHeST = []
J_f21 = []
data = np.genfromtxt("Lamda_HeHCL.txt", unpack=True);
J_i1=data[1];
J_f1=data[2];
kHe=data[7]
data = np.genfromtxt("Basecol_Basic_New_1.txt", unpack=True);
J_i2=data[0];
J_f2=data[1];
kH2=data[5]
print kHe
print kH2
kHe = map(float, kHe)
kH2 = map(float, kH2)
kHe = np.array(kHe)
kH2= np.array(kH2)
g = len(kH2)
for n in range(0, g):
    if J_f2[n] == 1:
        Jf21 = J_f2[n]
        J_f21.append(Jf21)
        ratio = kHe[n]/kH2[n]
        C = (math.log(float(kH2[n]), 10) - math.log(float(kHe[n]), 10))/math.log(R, 10)
        C_Values.append(C)
        St = abs(J_f1[n] - J_i1[n])
        Delta.append(St)
print(C_Values)
print(Delta)
print(J_f21)
fig, ax = plt.subplots()
ax.scatter(Delta,C_Values)
for i, txt in enumerate(J_f21):
ax.annotate(txt, (Delta[i],C_Values[i]))
plt.plot(np.unique(Delta), np.poly1d(np.polyfit(Delta, C_Values, 1))(np.unique(Delta)))
plt.plot(Delta, C_Values)
fit = np.polyfit(Delta,C_Values,1)
fit_fn = np.poly1d(fit)
# fit_fn is now a function which takes in x and returns an estimate for y
plt.scatter(Delta,C_Values, Delta, fit_fn(Delta))
plt.xlim(0, 12)
plt.ylim(-3, 3)
In this code, I am trying to plot a linear regression that extends past the data and touches the x-axis. I am also trying to add a legend to the plot that shows the slope of the line. Using this code, I was able to plot this graph.
Here is some trash data I have been using to try to extend the line and add a legend:
x =[5,7,9,15,20]
y =[10,9,8,7,6]
I would also like everything to be plotted as a scatter except for the linear regression line.
Given that you don't provide the data you're loading from the files, I was unable to test this, but off the top of my head:
To extend the line past the plot, you could turn this line
plt.plot(np.unique(Delta), np.poly1d(np.polyfit(Delta, C_Values, 1))(np.unique(Delta)))
Into something like
x = np.linspace(0, 12, 50) # both 0 and 12 are from visually inspecting the plot
plt.plot(x, np.poly1d(np.polyfit(Delta, C_Values, 1))(x))
But if you want the line extended to the x-axis,
polynomial = np.polyfit(Delta, C_Values, 1)
x = np.linspace(0, *np.roots(polynomial))
plt.plot(x, np.poly1d(polynomial)(x))
As for the scatter plot thing, it seems to me you could just remove this line:
plt.plot(Delta, C_Values)
Oh right, as for the legend, add a label to the plots you make, like this:
plt.plot(x, np.poly1d(polynomial)(x), label='Linear regression')
and add a call to plt.legend() just before plt.show().
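Putting those pieces together on the small data set from the question (a sketch only; the slope in the legend label is one way to handle the legend part):
import numpy as np
import matplotlib.pyplot as plt
x = [5, 7, 9, 15, 20]
y = [10, 9, 8, 7, 6]
polynomial = np.polyfit(x, y, 1)   # [slope, intercept]
fit_fn = np.poly1d(polynomial)
# draw the fitted line from x = 0 out to its x-intercept (the root of the degree-1 polynomial)
x_line = np.linspace(0, np.roots(polynomial)[0], 50)
plt.scatter(x, y, label='data')
plt.plot(x_line, fit_fn(x_line),
         label='Linear regression (slope = {:.2f})'.format(polynomial[0]))
plt.legend()
plt.show()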
Today my task is to produce a histogram where the y axis is a relative frequency rather than just an absolute count. I've located another question regarding this (see: Setting a relative frequency in a matplotlib histogram); however, when I try to implement it, I get the error message:
'list' object has no attribute 'size'
despite having the exact same code given in the answer, and despite their data also being stored in a list.
In addition, I have tried the method here (http://www.bertplot.com/visualization/?p=229) to no avail, as the output still doesn't show the y axis ranging from 0 to 1.
import numpy as np
import matplotlib.pyplot as plt
import random
from tabulate import tabulate
import matplotlib.mlab as mlab
precision = 100000000000
def MarkovChain(n, s):
    """
    """
    matrix = []
    for l in range(n):
        lineLst = []
        sum = 0
        crtPrec = precision
        for i in range(n-1):
            val = random.randrange(crtPrec)
            sum += val
            lineLst.append(float(val)/precision)
            crtPrec -= val
        lineLst.append(float(precision - sum)/precision)
        matrix2 = matrix.append(lineLst)
    print("The initial probability matrix.")
    print(tabulate(matrix2))
    baseprob = []
    baseprob2 = []
    baseprob3 = []
    baseprob4 = []
    for i in range(1, s):  # changed to do a range 1-s instead of 1000
        # must use the loop variable here, not s (s is always the same)
        matrix_n = np.linalg.matrix_power(matrix2, i)
        baseprob.append(matrix_n.item(0))
        baseprob2.append(matrix_n.item(1))
        baseprob3.append(matrix_n.item(2))
    baseprob = np.array(baseprob)
    baseprob2 = np.array(baseprob2)
    baseprob3 = np.array(baseprob3)
    baseprob4 = np.array(baseprob4)
    # Here I tried to make a histogram using the plt.hist() command,
    # but normed=True doesn't work like I assumed it would.
    '''
    plt.hist(baseprob, bins=20, normed=True)
    plt.show()
    '''
    # Here I tried to make a histogram using the method from the second link in my post.
    # The code runs, but the graph that comes out doesn't have the relative frequency on the y axis.
    '''
    n, bins, patches = plt.hist(baseprob, bins=30, normed=True, facecolor="green")
    y = mlab.normpdf(bins, mu, sigma)
    plt.plot(bins, y, 'b-')
    plt.title('Main Plot Title', fontsize=25, horizontalalignment='right')
    plt.ylabel('Count', fontsize=20)
    plt.yticks(fontsize=15)
    plt.xlabel('X Axis Label', fontsize=20)
    plt.xticks(fontsize=15)
    plt.show()
    '''
    # Here I tried to make a histogram using the method from the Stack Overflow question I mentioned.
    # The figure that pops up looks correct in terms of the axes, but no actual data is plotted.
    # Instead the error below is shown in the console:
    # AttributeError: 'list' object has no attribute 'size'
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.hist(baseprob, weights=np.zeros_like(baseprob)+1./baseprob.size)
    n, bins, patches = ax.hist(baseprob, bins=100, normed=1, cumulative=0)
    ax.set_xlabel('Bins', size=20)
    ax.set_ylabel('Frequency', size=20)
    ax.legend
    plt.show()
print("The final probability matrix.")
print(tabulate(matrix_n))
matrixTranspose = zip(*matrix_n)
evectors = np.linalg.eig(matrixTranspose)[1][:,0]
print("The steady state vector is:")
print(evectors)
MarkovChain(5, 1000)
The methods I tried are each commented out, so to reproduce my errors, make sure to erase the comment markers.
As you can tell, I'm really new to Programming. Also this is not for a homework assignment in a computer science class, so there are no moral issues associated with just providing me with code.
The expected input to matplotlib functions is usually a numpy array, which has the attribute ndarray.size. Lists do not have a size attribute, so when list.size is accessed inside hist, you get your error. You need to convert with nparray = np.array(list). You can do this after the loop where you build the lists with append, something like:
baseprob = []
baseprob2 = []
baseprob3 = []
baseprob4 = []
for i in range(1, s):  # changed to do a range 1-s instead of 1000
    # must use the loop variable here, not s (s is always the same)
    matrix_n = np.linalg.matrix_power(matrix, i)
    baseprob.append(matrix_n.item(0))
    baseprob2.append(matrix_n.item(1))
    baseprob3.append(matrix_n.item(2))
baseprob = np.array(baseprob)
baseprob2 = np.array(baseprob2)
baseprob3 = np.array(baseprob3)
baseprob4 = np.array(baseprob4)
EDIT: minimal hist example
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
baseprob = np.random.randn(1000000)
ax.hist(baseprob, weights=np.zeros_like(baseprob)+1./ baseprob.size, bins=100)
n, bins, patches = ax.hist(baseprob, bins=100, normed=1, cumulative=0, alpha = 0.4)
ax.set_xlabel('Bins', size=20)
ax.set_ylabel('Frequency', size=20)
ax.legend
plt.show()
which gives a histogram with relative frequencies on the y axis instead of raw counts.