Adding a weighting to curve fit - python

So I have a curve fit, and I'm wondering how to include the weighting based on the standard error.
Here's the df (defined as df_altered):
Temperature Growth_rate Standard_Error Result Final_results Weight
14.0 0.363 0.110 0.000 0.363 9.091
18.0 0.677 0.043 0.767 0.673 23.256
22.0 0.822 0.044 0.975 0.832 22.727
26.0 0.936 0.073 0.975 0.920 13.699
30.0 0.897 0.051 0.767 0.911 19.608
And here's the curve fit setup (I don't really know if this qualifies as a curve fit, though):
import numpy as np
import matplotlib.pyplot as plt
x = df_altered['Temperature']
y1 = df_altered['Growth_rate']
y2 = df_altered['Final_results']
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x, y1, 'r-')
ax2.plot(x, y2, 'g-')
ax1.set_xlabel('Temperature (°C)')
ax1.set_ylabel('Observed growth rate', color='r')
ax2.set_ylabel('Optimised modelled growth rate', color='g')
plt.show()
So essentially, I'd like to use the Weight and Standard_Error columns as a set of parameters to determine the weighting of the plots (the smaller the standard error, the greater the weighting). I've already set this up with popt2:
popt2, pcov2 = curve_fit(boatman_temperature_function_optimised, xdata = df_altered.Temperature, ydata = df_altered.Growth_rate, p0 = array_of_maxima, sigma=df_altered.Weight, bounds = (array_of_minima, array_of_maxima), absolute_sigma=True)
Any thoughts?
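One point that may help here: the sigma argument of scipy's curve_fit expects the uncertainty of each y value, not its inverse. Passing the Standard_Error column (rather than Weight) should therefore give the points with the smallest error the greatest pull on the fit. Below is a minimal sketch under stated assumptions: quadratic_model is a hypothetical stand-in, because boatman_temperature_function_optimised and the bounds arrays are not shown in the question.

```python
import numpy as np
from scipy.optimize import curve_fit

# Data copied from the table in the question.
temperature = np.array([14.0, 18.0, 22.0, 26.0, 30.0])
growth_rate = np.array([0.363, 0.677, 0.822, 0.936, 0.897])
standard_error = np.array([0.110, 0.043, 0.044, 0.073, 0.051])


def quadratic_model(t, a, b, c):
    # Hypothetical stand-in for the real model, which is not shown.
    return a * t**2 + b * t + c


# sigma is the per-point uncertainty: smaller sigma means larger weight.
popt_weighted, pcov_weighted = curve_fit(
    quadratic_model, temperature, growth_rate,
    sigma=standard_error, absolute_sigma=True)

# Unweighted fit, for comparison only.
popt_plain, _ = curve_fit(quadratic_model, temperature, growth_rate)

# One-standard-deviation errors on the fitted parameters.
param_errors = np.sqrt(np.diag(pcov_weighted))
print("weighted:  ", popt_weighted)
print("unweighted:", popt_plain)
print("parameter errors:", param_errors)
```

With absolute_sigma=True the values in sigma are treated as real standard errors in the units of y, so pcov_weighted reflects them directly; with absolute_sigma=False only their relative sizes matter.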

Related

How can I implement my PCA results to my unlabelled data?

I am struggling to apply my successful PCA to new data.
My PCA plot (image not shown here) was obtained from accelerometer data (x, y, z) which I have observed and labelled with A, S and D.
I can find a lot of information online on how to perform a PCA, but now I would like to apply it to new data. I can't find any information about that, or I am doing it all wrong.
This is my code:
import pandas as pd
import os
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
os.chdir(r'C:\Users\##\OneDrive - ##\##\Pyth\data\runFlume1')
os.getcwd()
## read csv
df = pd.read_csv('dataframe_0.csv', delimiter=',', names = ['x','y','z','gradient_x','gradient_y','gradient_z','target'])
features = ['x', 'y', 'z']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['target']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents
                           , columns = ['principal component 1', 'principal component 2'])
finalDf = pd.concat([principalDf, df[['target']]], axis = 1)
fig = plt.figure(figsize = (8,8))
ax = fig.add_subplot(1,1,1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['A','S','D']
colors = ['r', 'g', 'b']
for target, color in zip(targets,colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1']
               , finalDf.loc[indicesToKeep, 'principal component 2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()
And my raw dataframe looks like this:
x y z gradient_x gradient_y gradient_z target
0 -0.875 -0.143 0.516 0.0310 0.0000 0.032 A
1 -0.844 -0.143 0.548 0.0155 0.0000 0.000 A
2 -0.844 -0.143 0.516 0.0000 0.0000 0.000 A
3 -0.844 -0.143 0.548 0.0000 0.0000 0.016 A
4 -0.844 -0.143 0.548 0.0000 0.0000 0.016 A
... ... ... ... ... ... ...
17947 0.969 -0.079 0.161 0.0000 0.0475 0.016 D
17948 1.000 -0.079 0.161 0.0000 0.0000 0.000 D
17949 0.969 -0.079 0.161 0.0155 0.0000 0.000 D
17950 0.969 -0.079 0.161 0.0000 0.0000 0.000 D
17951 0.969 -0.079 0.161 0.0000 0.0000 0.000 D
So I would like to use this PCA on data with no label (A, D, S). Does anyone know how I can do this?
Kind regards,
Simon
You can simply take your pca object and transform the features of your unlabelled data. Something like:
unlabelled_df = pd.read_csv('dataframe_unlabeled.csv',
                            delimiter=',', names = ['x','y','z','gradient_x','gradient_y','gradient_z'])
features = ['x', 'y', 'z']
# Separating out the features (from the unlabelled data, not the original df)
x = unlabelled_df.loc[:, features].values
# Standardizing the features
# You need to retain the scaler fitted on the labelled data
# (e.g. scaler = StandardScaler().fit(...)) and only `transform` here to avoid leakage
x = scaler.transform(x)
principalComponents = pca.transform(x)
principalDf = pd.DataFrame(data = principalComponents
                           , columns = ['principal component 1', 'principal component 2'])
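The question's code calls StandardScaler().fit_transform(x) without keeping the scaler, so the scaler object used above would not exist yet. A minimal sketch of the whole fit-once, transform-later pattern follows; the random arrays are synthetic stand-ins for the real accelerometer columns, and the variable names are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for the labelled and unlabelled (x, y, z) readings.
labelled_xyz = rng.normal(size=(200, 3))
unlabelled_xyz = rng.normal(size=(50, 3))

# Fit the scaler and the PCA once, on the labelled data only.
scaler = StandardScaler().fit(labelled_xyz)
pca = PCA(n_components=2).fit(scaler.transform(labelled_xyz))

# Reuse both fitted objects: call transform, not fit_transform, on new data.
unlabelled_pcs = pca.transform(scaler.transform(unlabelled_xyz))
print(unlabelled_pcs.shape)
```

Refitting either object on the new data would produce components that are no longer comparable with the original plot.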

Diagonal heatmap with matplotlib

I have a heatmap that I created from Pandas in this way:
tukey = tukey.set_index('index')
fix,ax = plt.subplots(figsize=(12,6))
ax.set_title(str(date)+' '+ str(hour)+':'+'00',fontsize=14)
heatmap_args = {'linewidths': 0.35, 'linecolor': '0.5', 'clip_on': False, 'square': True, 'cbar_ax_bbox': [0.75, 0.35, 0.04, 0.3]}
sp.sign_plot(tukey, **heatmap_args)
I have tried to do this with seaborn but I haven't gotten the desired output:
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(tukey, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(12, 6))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(tukey, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
As seen, it still shows squares where it is supposed to be masked, and the colorbar is obviously different.
My question is whether there is any way to make it diagonal without using seaborn, or at least to get rid of the repeated part.
Edit: sample of my dataframe (the tukey):
>>> 1_a 1_b 1_c 1_d 1_e 1_f
index
1_a 1.00 0.900 0.75 0.736 0.900 0.400
1_b 0.9000 1.000 0.72 0.715 0.900 0.508
1_c 0.756 0.342 1.000 0.005 0.124 0.034
1_d 0.736 0.715 0.900 1.000 0.081 0.030
1_e 0.900 0.900 0.804 0.793 1.000 0.475
1_f 0.400 0.508 0.036 0.030 0.475 1.000
*I might have made typos; the two sides of the diagonal are supposed to be equal.
edit:
imports:
import scikit_posthocs as sp
import pandas as pd
import numpy as np
import statsmodels.api as sm
import scipy.stats as stats
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
import scipy.stats as stats
import seaborn as sns
scikit_posthocs' sign_plot() seems to create a QuadMesh (as does sns.heatmap). Setting an edge color to such a mesh will show horizontal and vertical lines for the full width and height of the mesh. To make the edges invisible in the "empty" region, they can be colored the same as the background (for example white). Making individual cells invisible can be done by setting their values to NaN such as in the code below.
Removing a column and a row (e.g. tukey.drop('1_f', axis=1, inplace=True) and
tukey.drop('1_a', axis=0, inplace=True)), doesn't help to make the plot a bit smaller because sign_plot adds them back in automatically.
import matplotlib.pyplot as plt
import scikit_posthocs as sp
import pandas as pd
import numpy as np
from io import StringIO
data_str = ''' 1_a 1_b 1_c 1_d 1_e 1_f
1_a 1.00 0.900 0.75 0.736 0.900 0.400
1_b 0.9000 1.000 0.72 0.715 0.900 0.508
1_c 0.756 0.342 1.000 0.005 0.124 0.034
1_d 0.736 0.715 0.900 1.000 0.081 0.030
1_e 0.900 0.900 0.804 0.793 1.000 0.475
1_f 0.400 0.508 0.036 0.030 0.475 1.000'''
tukey = pd.read_csv(StringIO(data_str), sep=r'\s+')
cols = tukey.columns
# Blank out the diagonal and everything above it
for i in range(len(cols)):
    for j in range(i, len(cols)):
        tukey.iloc[i, j] = np.nan
fig, ax = plt.subplots(figsize=(12, 6))
heatmap_args = {'linewidths': 0.35, 'linecolor': 'white', 'clip_on': False, 'square': True,
'cbar_ax_bbox': [0.75, 0.35, 0.04, 0.3]}
sp.sign_plot(tukey, **heatmap_args)
plt.show()
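The double loop that sets cells to NaN can also be written without explicit indexing. A small sketch, assuming a 3x3 subset of the sample values from the question; np.triu builds a boolean mask of the diagonal and upper triangle, and DataFrame.mask replaces those cells with NaN.

```python
import numpy as np
import pandas as pd

labels = ['1_a', '1_b', '1_c']
# 3x3 subset of the sample table in the question.
values = np.array([[1.00, 0.900, 0.75],
                   [0.9000, 1.000, 0.72],
                   [0.756, 0.342, 1.000]])
tukey = pd.DataFrame(values, index=labels, columns=labels)

# True on and above the diagonal: the cells to blank out.
upper = np.triu(np.ones(tukey.shape, dtype=bool))
# mask() returns a copy with NaN wherever the condition is True.
lower_only = tukey.mask(upper)
print(lower_only)
```

The resulting frame can then be passed to the plotting call in place of the loop-modified one.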

Plotting values from two datasets for comparison

I would like to plot two dataframes in order to compare the results. My first choice would be to plot line charts based only on one column from the two dataframes.
df
Name Surname P R F
0 B N 0.41 0.76 0.53
1 B L 0.62 0.67 0.61
2 B SV 0.63 0.53 0.52
3 B SG 0.43 0.61 0.53
4 B R 0.81 0.51 0.53
5 T N 0.32 0.82 0.53
6 T L 0.58 0.69 0.62
7 T SV 0.67 0.61 0.64
8 T SG 0.53 0.63 0.57
9 T R 0.74 0.48 0.58
and
data = [['B','N',0.41,0.72,0.51],
['B','L',0.66,0.67,0.62],
['B','SV',0.63,0.51,0.51],
['B','SG',0.44,0.63,0.51],
['B','R',0.81,0.51,0.62],
['T','N',0.33,0.80,0.47],
['T','L',0.58,0.61,0.63],
['T','SV',0.68,0.61,0.64],
['T','SG',0.53,0.63,0.57],
['T','R',0.74,0.48,0.58]]
df1 = pd.DataFrame(data, columns = ['Name','Surname','P','R','F'])
I would like to create a plot based on F values, keeping information (in legend/labels) of B/T and R,N,L, SV, SG.
I have tried with bar charts, but this does not take into account labels/legend.
I am looking for something like this:
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
However this misses labels and legend.
I have also tried with:
ax = df.plot()
l = ax.get_lines()
df1.plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
But I cannot distinguish by Name, Surname and dataframe (on the x axis there should be Surname).
It would be also ok to plot separately the values (P, R and F) as follows:
ax = df[['P']].plot()
l = ax.get_lines()
df1[['P']].plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
I should compare F values of the two plots based on Name and Surname.
Any help would be greatly appreciated.
IIUC,
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
fig.legend(loc="upper right", bbox_to_anchor=(1,1), bbox_transform=ax.transAxes)
The simplest way to add information about other parameters to a graph is to use functions like ax.text or ax.annotate over a loop. The code should look like this:
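import numpy as np
index = np.arange(len(df))  # assumed: one bar position per row
bar_width = 8               # assumed bar width, in x-axis units
fig, ax = plt.subplots()
data1 = ax.bar(20*index, df["F"], bar_width)
data2 = ax.bar(20*index+bar_width, df1["F"], bar_width)
for i in index:
    # Write Surname and Name at the foot of each bar
    ax.text(i*20-5, 0, df['Surname'][i])
    ax.text(i*20-5, 0.05, df['Name'][i])
    ax.text(i*20+bar_width-5, 0, df1['Surname'][i])
    ax.text(i*20+bar_width-5, 0.05, df1['Name'][i])
plt.show()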
Useful link:
Official Documentation for Text in Matplotlib Plots
Edit:
Probably similar problem: Different text at each point
Edit 2:
Code without index:
fig, ax = plt.subplots()
data1 = ax.plot(df["F"])
data2 = ax.plot(df1["F"])
for i in range(len(df)):
    # Label each point of both lines with "Name Surname"
    ax.text(i, df["F"][i], df['Name'][i]+" "+df['Surname'][i])
    ax.text(i, df1["F"][i], df1['Name'][i]+" "+df1['Surname'][i])
plt.show()
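Another option that keeps Surname on the x axis and puts Name and the source dataframe in the legend is to pivot each frame before plotting. This is only a sketch: the frame labels 'df' and 'df1' and the solid/dashed styles are arbitrary choices, and only the F column from the question is reproduced.

```python
import pandas as pd
import matplotlib.pyplot as plt

names = ['B'] * 5 + ['T'] * 5
surnames = ['N', 'L', 'SV', 'SG', 'R'] * 2
# F values copied from the two dataframes in the question.
df = pd.DataFrame({'Name': names, 'Surname': surnames,
                   'F': [0.53, 0.61, 0.52, 0.53, 0.53,
                         0.53, 0.62, 0.64, 0.57, 0.58]})
df1 = pd.DataFrame({'Name': names, 'Surname': surnames,
                    'F': [0.51, 0.62, 0.51, 0.51, 0.62,
                          0.47, 0.63, 0.64, 0.57, 0.58]})

surname_order = ['N', 'L', 'SV', 'SG', 'R']
fig, ax = plt.subplots()
for frame, frame_label, style in [(df, 'df', '-'), (df1, 'df1', '--')]:
    # One row per Surname, one column per Name.
    wide = frame.pivot(index='Surname', columns='Name', values='F')
    wide = wide.loc[surname_order]
    for name in wide.columns:
        ax.plot(wide.index, wide[name], linestyle=style, marker='o',
                label=f'{name} ({frame_label})')
ax.set_xlabel('Surname')
ax.set_ylabel('F')
ax.legend()
plt.show()
```

Each Name gets one solid line (first dataframe) and one dashed line (second dataframe), so the two F values for a given Name and Surname sit at the same x position.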

How do I plot a beautiful scatter plot with linear regression?

I want to make a beautiful scatter plot with linear regression line using the data given below. I was able to create a scatter plot but am not satisfied with how it looks. Additionally, I want to plot a linear regression line on the data.
My data and code are below:
x y
117.00 111.0
107.00 110.0
77.22 78.0
112.00 95.4
149.00 150.0
121.00 121.0
121.61 120.0
111.54 140.0
73.00 72.0
70.47 000.0
66.3 72.0
113.00 131.0
81.00 81.0
72.00 00.0
74.20 98.0
84.24 90.0
86.60 88.0
99.00 97.0
90.00 102.0
85.00 000.0
138.0 135.0
96.00 93.0
import numpy as np
import matplotlib.pyplot as plt
print(plt.style.available)
from sklearn.linear_model import LinearRegression
plt.style.use('ggplot')
data = np.loadtxt('test_data', dtype=float, skiprows=1,usecols=(0,1))
x=data[:,0]
y=data[:,1]
plt.xlim(20,200)
plt.ylim(20,200)
plt.scatter(x,y, marker="o",)
plt.show()
Please check the snippet. You can use numpy.polyfit() with degree 1 to calculate the slope m and y-intercept b of the line y = m*x + b.
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = np.loadtxt('test_data.txt', dtype=float, skiprows=1,usecols=(0,1))
x=data[:,0]
y=data[:,1]
plt.xlim(20,200)
plt.ylim(20,200)
plt.scatter(x,y, marker="o",)
m, b = np.polyfit(x, y, 1)
plt.plot(x, m*x + b)
plt.show()
Edit1:
Based on your comment, I added more points; the line now passes through the points.
To make the points transparent, use the alpha argument, which takes a value between 0 (fully transparent) and 1 (opaque). Here I set alpha=0.5:
plt.scatter(x,y, marker="o",alpha=0.5)
Edit2: Based on @tmdavison's suggestion
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = np.loadtxt('test_data.txt', dtype=float, skiprows=1,usecols=(0,1))
x=data[:,0]
y=data[:,1]
x2 = np.arange(0, 200)
plt.xlim(20,200)
plt.ylim(20,200)
plt.scatter(x,y, marker="o",)
m, b = np.polyfit(x, y, 1)
plt.plot(x2, m*x2 + b)
plt.show()
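To judge how well the line describes the data, the fit can be followed by a quick R^2 calculation. A minimal sketch, assuming a handful of (x, y) pairs copied from the question instead of reading the test_data.txt file:

```python
import numpy as np

# A few (x, y) pairs taken from the data in the question.
x = np.array([117.00, 107.00, 77.22, 112.00, 149.00, 121.00, 73.00, 81.00])
y = np.array([111.0, 110.0, 78.0, 95.4, 150.0, 121.0, 72.0, 81.0])

# Degree-1 least-squares fit: polyfit returns the slope first, then the intercept.
m, b = np.polyfit(x, y, 1)

# Coefficient of determination: 1 minus residual variance over total variance.
residuals = y - (m * x + b)
r_squared = 1 - residuals.var() / y.var()
print(f"y = {m:.3f} * x + {b:.3f}, R^2 = {r_squared:.3f}")
```

The formatted string can be passed as a label to plt.plot so that the equation appears in the legend.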

How can I plot this in matplotlib with 3 dimensions?

My data looks like this:
Rate 4 8 16 32 64 128 256
David 0.25 0.176 0.652 0.126 0.123 0.142 0.318
Saul 0.132 0.244 0.142 0.162 0.174 0.244 0.149
Maria 0.145 0.189 0.65 0.42 0.111 0.197 0.182
I need to plot this data 3d. For clarity, I have 3 dimensions, first one is names like David etc. Secondly the rate, and a variable value for every combination of Name and Rate.
Which kind of plot should I use? I am a little confused because rates and variables are in the same dimension but names are different and when I use regular 3d plot like scatter it said xs, ys and zs should be in the same dimension.
This is my code:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig=plt.figure()
ax=fig.add_subplot(111, projection='3d')
X=[1,2,3]
Y=[4,5,6]
Z=[7,8,9]
ax.scatter(X,Y,Z, c='r', marker='o')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
plt.show()
If we take X, Y and Z to be the names, rates and variables, an error is raised saying that x, y and z should have the same dimension!
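One way around the dimension error is to give every value its own (name, rate) coordinate pair: replace the names by integer positions, build a grid with np.meshgrid, and flatten all three arrays to the same length. The sketch below uses the table from the question; placing the rates on a log2 scale is an assumption made only to space them evenly.

```python
import numpy as np
import matplotlib.pyplot as plt

names = ['David', 'Saul', 'Maria']
rates = [4, 8, 16, 32, 64, 128, 256]
# One row per name, one column per rate (values from the question).
values = np.array([
    [0.25, 0.176, 0.652, 0.126, 0.123, 0.142, 0.318],
    [0.132, 0.244, 0.142, 0.162, 0.174, 0.244, 0.149],
    [0.145, 0.189, 0.65, 0.42, 0.111, 0.197, 0.182],
])

# Names become integer positions so that all three axes are numeric.
name_pos = np.arange(len(names))
# log2 spaces the rates evenly (a presentation choice, not a requirement).
rate_pos = np.log2(rates)
# X and Y have the same shape as `values`: one pair per table cell.
X, Y = np.meshgrid(name_pos, rate_pos, indexing='ij')

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X.ravel(), Y.ravel(), values.ravel(), c='r', marker='o')
# Show the original names and rates as tick labels.
ax.set_xticks(name_pos)
ax.set_xticklabels(names)
ax.set_yticks(rate_pos)
ax.set_yticklabels([str(r) for r in rates])
ax.set_xlabel('Name')
ax.set_ylabel('Rate')
ax.set_zlabel('Value')
plt.show()
```

The same X, Y and values arrays could also be fed to ax.bar3d or ax.plot_wireframe if bars or a surface read better than points.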
