My data looks like this:
Rate   4      8      16     32     64     128    256
David  0.25   0.176  0.652  0.126  0.123  0.142  0.318
Saul   0.132  0.244  0.142  0.162  0.174  0.244  0.149
Maria  0.145  0.189  0.65   0.42   0.111  0.197  0.182
I need to plot this data in 3D. To clarify, there are three dimensions: the names (David, etc.), the rate, and a value for every combination of name and rate.
Which kind of plot should I use? I am a little confused because the rates and the values have the same length but the names do not, and when I use a regular 3D plot such as scatter, it says that xs, ys and zs should have the same dimension.
This is my code:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig=plt.figure()
ax=fig.add_subplot(111, projection='3d')
X=[1,2,3]
Y=[4,5,6]
Z=[7,8,9]
ax.scatter(X,Y,Z, c='r', marker='o')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
plt.show()
If we treat X, Y and Z as the names, rates and values, an error arises saying that x, y and z should have the same dimension.
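One way to satisfy the equal-length requirement, sketched here under the assumption that each name is mapped to an integer position (0, 1, 2) on one axis: build a grid of (name index, rate) pairs with np.meshgrid, flatten it together with the table of values, and pass the three flat arrays to scatter. The variable names are illustrative, and the rates are placed on a log2 scale only because they double at each step.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed on older matplotlib versions

names = ["David", "Saul", "Maria"]
rates = np.array([4, 8, 16, 32, 64, 128, 256])
values = np.array([
    [0.25, 0.176, 0.652, 0.126, 0.123, 0.142, 0.318],
    [0.132, 0.244, 0.142, 0.162, 0.174, 0.244, 0.149],
    [0.145, 0.189, 0.65, 0.42, 0.111, 0.197, 0.182],
])

# One (name index, rate) pair per table cell: both grids have shape (3, 7).
rate_grid, name_grid = np.meshgrid(np.log2(rates), np.arange(len(names)))

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# ravel() flattens each (3, 7) array to length 21, so xs, ys and zs match.
ax.scatter(name_grid.ravel(), rate_grid.ravel(), values.ravel(), c="r", marker="o")

# Show the original names and rates as tick labels.
ax.set_xticks(np.arange(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(np.log2(rates))
ax.set_yticklabels([str(r) for r in rates])
ax.set_xlabel("Name")
ax.set_ylabel("Rate")
ax.set_zlabel("Value")
```

Calling plt.show() afterwards displays the figure. The same flattened arrays could also feed ax.bar3d if bars read better than points.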
So I have a curve fit, and I'm wondering how to include the weighting based on the standard error.
Here's the df (defined as df_altered):
Temperature  Growth_rate  Standard_Error  Result  Final_results  Weight
14.0         0.363        0.110           0.000   0.363          9.091
18.0         0.677        0.043           0.767   0.673          23.256
22.0         0.822        0.044           0.975   0.832          22.727
26.0         0.936        0.073           0.975   0.920          13.699
30.0         0.897        0.051           0.767   0.911          19.608
And here's the curve fit setup (I don't really know if this qualifies as a curve fit, though):
import numpy as np
import matplotlib.pyplot as plt
x = df_altered['Temperature']
y1 = df_altered['Growth_rate']
y2 = df_altered['Final_results']
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.plot(x, y1, 'r-')
ax2.plot(x, y2, 'g-')
ax1.set_xlabel('Temperature (°C)')
ax1.set_ylabel('Observed growth rate', color='r')
ax2.set_ylabel('Optimised modelled growth rate', color='g')
plt.show()
So essentially, I'd like to use the Weight and Standard_Error columns as a set of parameters to determine the weighting of the plots (the smaller the standard error, the greater the weighting). I've already set this up with popt2:
popt2, pcov2 = curve_fit(
    boatman_temperature_function_optimised,
    xdata=df_altered.Temperature,
    ydata=df_altered.Growth_rate,
    p0=array_of_maxima,
    sigma=df_altered.Weight,
    bounds=(array_of_minima, array_of_maxima),
    absolute_sigma=True,
)
Any thoughts?
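For reference, curve_fit's sigma argument is documented as the uncertainty of ydata, so points with a smaller sigma count more in the fit. The Weight column above equals 1 / Standard_Error, so passing it as sigma would reverse the intended weighting. A minimal sketch, assuming a hypothetical quadratic model (toy_model stands in for boatman_temperature_function_optimised, whose definition is not shown) and no bounds:

```python
import numpy as np
from scipy.optimize import curve_fit

temperature = np.array([14.0, 18.0, 22.0, 26.0, 30.0])
growth_rate = np.array([0.363, 0.677, 0.822, 0.936, 0.897])
standard_error = np.array([0.110, 0.043, 0.044, 0.073, 0.051])

def toy_model(t, a, b, c):
    # Hypothetical stand-in model (a parabola), NOT the Boatman function.
    return a * t**2 + b * t + c

# sigma takes the standard errors themselves; curve_fit then minimises
# sum(((y - f(x)) / sigma) ** 2), so each residual is scaled by 1 / sigma.
popt, pcov = curve_fit(
    toy_model,
    xdata=temperature,
    ydata=growth_rate,
    sigma=standard_error,
    absolute_sigma=True,
)
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of a, b, c
```

With absolute_sigma=True the standard errors are taken at face value, so perr is expressed in the same units as the parameters.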
Background:
I am working in Python and have a lot of data points (in CSV form). So far, my code does the following:
It reads the CSV and the "result" column.
If the value in the "result" column is positive, it plots the corresponding A B C D E F G parameters, with the parameter value on the y-axis and the parameter name on the x-axis.
If there are more than 10 such results, it plots the A B C D E F G parameters of the first 10.
An example of the type of dataset is below. (Mine contains around 12000 rows)
The Dataset
A     B      C       D     E      F     G     result
1.00  0.85   -0.999  0.27  0.98   0.39  0.80  -0.86
0.89  0.4    -0.6    0.47  0.28   0.29  0.26  0.65
0.65  -1.00  0.26    0.67  -0.88  0.29  0.10  0.50
0.98  -0.98  0.76    0.37  0.68   0.59  0.90  0
0     0.5    0.56    0.27  0.38   0.79  0.48  -0.65
The code:
df = pd.read_csv("result.csv")
df.loc[df.result>0, df.columns[:-1]].T.plot(ls='', marker='o')
Issue:
Sometimes, when several values are the same, the dots land in the same place, so it is hard to see the frequency distribution (for example, columns B and C below look similar, but one value has more points).
What I want to do is plot something like a ridge plot on the current graph (as I drew below) so that the frequency distribution can be seen.
I am a novice in this type of data visualization. Kindly guide me on how it could be done.
The density plot type already does pretty much what you want; we just need to superpose it on your data:
>>> data_to_plot = df.loc[df.result>0, df.columns[:-1]]
>>> data_to_plot.plot(kind='density')
This is trivial if you want horizontal subplots: simply pass subplots=True to either plot, then zip the returned axes with the columns to superpose the other plot:
>>> axes = data_to_plot.plot(kind='density', subplots=True, legend=False)
>>> for ax, (colname, series) in zip(axes, data_to_plot.items()):
...     ax.plot(series.values, np.zeros_like(series), ls='', marker='o')
...     ax.set_ylabel(colname)
However if you want them vertically it’s likely we’ll have to compute the Gaussian densities ourselves. Pandas documentation points to scipy.stats.gaussian_kde. For this we’ll need to know at which points to interpolate the kernel. On your example it looks like [-1..1] is a good interval but of course you can take it from data min/max.
>>> from scipy.stats import gaussian_kde
>>> y = np.arange(-1, 1.01, .01)
>>> ridges = data_to_plot.apply(lambda s: gaussian_kde(s)(y))
>>> ridges
A B C D E F G
0 0.001119 0.271510 0.270048 2.029737e-24 0.163222 2.352981e-15 0.000018
1 0.001247 0.272310 0.272122 4.796826e-24 0.164507 3.959987e-15 0.000021
2 0.001389 0.273071 0.274155 1.125941e-23 0.165765 6.637610e-15 0.000025
3 0.001545 0.273794 0.276145 2.624972e-23 0.166995 1.108083e-14 0.000030
4 0.001717 0.274479 0.278093 6.078288e-23 0.168200 1.842365e-14 0.000036
.. ... ... ... ... ... ... ...
196 0.939109 0.307535 0.314227 3.791151e-02 0.436305 3.153771e-01 0.630121
197 0.932996 0.304793 0.310216 3.100156e-02 0.431472 2.913782e-01 0.615406
198 0.926089 0.302012 0.306172 2.518140e-02 0.426576 2.682819e-01 0.600298
199 0.918401 0.299193 0.302097 2.031681e-02 0.421619 2.461581e-01 0.584834
200 0.909948 0.296337 0.297994 1.628194e-02 0.416607 2.250649e-01 0.569049
[201 rows x 7 columns]
Then simply plot with zip, as before. Some adjustment might be needed, but this is what it looks like with your sample data. Note the scaling of the ridges so that they are all on the same scale and fit inside a 0.5-wide space on the plot.
>>> ax = data_to_plot.T.plot(ls='', marker='o')
>>> for n, (colname, ridge) in enumerate(ridges.items()):
...     ax.plot(ridge / (-2 * ridges.max().max()) + n, y, color='black')
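As noted above, the evaluation interval can be taken from the data instead of hard-coding [-1, 1]. A small sketch of that idea, using made-up sample values (the column contents below are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Illustrative data only: two columns of values in [-1, 1].
data_to_plot = pd.DataFrame({
    "A": [0.89, 0.65, 0.10, -0.30, 0.72],
    "B": [0.40, -1.00, 0.40, 0.55, -0.20],
})

# Evaluation grid spanning the overall data range, with 201 points
# to match the grid size used in the answer.
y = np.linspace(data_to_plot.min().min(), data_to_plot.max().max(), 201)

# One density curve per column, evaluated on the shared grid.
ridges = data_to_plot.apply(lambda s: gaussian_kde(s)(y))
```

The resulting ridges frame has one row per grid point and one column per parameter, so it can be passed to the same plotting loop.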
I have a heatmap that I created from Pandas in this way:
tukey = tukey.set_index('index')
fig, ax = plt.subplots(figsize=(12,6))
ax.set_title(str(date)+' '+ str(hour)+':'+'00',fontsize=14)
heatmap_args = {'linewidths': 0.35, 'linecolor': '0.5', 'clip_on': False, 'square': True, 'cbar_ax_bbox': [0.75, 0.35, 0.04, 0.3]}
sp.sign_plot(tukey, **heatmap_args)
I have tried to do this with seaborn but I haven't gotten the desired output:
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(tukey, dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(12, 6))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)
# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(tukey, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
As seen, it still shows squares where they are supposed to be masked, and the colorbar is obviously different.
Is there any way to make it diagonal without using seaborn? Or at least to get rid of the repeated part?
Edit: sample of my dataframe (the tukey):
       1_a     1_b    1_c    1_d    1_e    1_f
index
1_a    1.00    0.900  0.75   0.736  0.900  0.400
1_b    0.9000  1.000  0.72   0.715  0.900  0.508
1_c    0.756   0.342  1.000  0.005  0.124  0.034
1_d    0.736   0.715  0.900  1.000  0.081  0.030
1_e    0.900   0.900  0.804  0.793  1.000  0.475
1_f    0.400   0.508  0.036  0.030  0.475  1.000
*I might have made typos; the two sides of the diagonal are supposed to be equal.
edit:
imports:
import scikit_posthocs as sp
import pandas as pd
import numpy as np
import statsmodels.api as sm
import scipy.stats as stats
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
import seaborn as sns
scikit_posthocs' sign_plot() seems to create a QuadMesh (as does sns.heatmap). Setting an edge color to such a mesh will show horizontal and vertical lines for the full width and height of the mesh. To make the edges invisible in the "empty" region, they can be colored the same as the background (for example white). Making individual cells invisible can be done by setting their values to NaN such as in the code below.
Removing a column and a row (e.g. tukey.drop('1_f', axis=1, inplace=True) and tukey.drop('1_a', axis=0, inplace=True)) doesn't help to make the plot a bit smaller, because sign_plot adds them back in automatically.
import matplotlib.pyplot as plt
import scikit_posthocs as sp
import pandas as pd
import numpy as np
from io import StringIO
data_str = ''' 1_a 1_b 1_c 1_d 1_e 1_f
1_a 1.00 0.900 0.75 0.736 0.900 0.400
1_b 0.9000 1.000 0.72 0.715 0.900 0.508
1_c 0.756 0.342 1.000 0.005 0.124 0.034
1_d 0.736 0.715 0.900 1.000 0.081 0.030
1_e 0.900 0.900 0.804 0.793 1.000 0.475
1_f 0.400 0.508 0.036 0.030 0.475 1.000'''
tukey = pd.read_csv(StringIO(data_str), sep=r'\s+')
cols = tukey.columns
# Blank out the diagonal and everything above it
for i in range(len(cols)):
    for j in range(i, len(cols)):
        tukey.iloc[i, j] = np.nan
fig, ax = plt.subplots(figsize=(12, 6))
# White edges blend into the background where the cells are NaN
heatmap_args = {'linewidths': 0.35, 'linecolor': 'white', 'clip_on': False, 'square': True,
                'cbar_ax_bbox': [0.75, 0.35, 0.04, 0.3]}
sp.sign_plot(tukey, **heatmap_args)
plt.show()
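The nested loop can also be written without explicit indexing. A sketch using np.triu to build a boolean mask of the upper triangle (diagonal included, matching range(i, len(cols)) above) and DataFrame.mask to replace those cells with NaN; the small 3x3 frame is illustrative:

```python
import numpy as np
import pandas as pd

labels = ["1_a", "1_b", "1_c"]
tukey = pd.DataFrame(
    [[1.000, 0.900, 0.756],
     [0.900, 1.000, 0.342],
     [0.756, 0.342, 1.000]],
    index=labels, columns=labels,
)

# True on and above the diagonal, False below it.
upper = np.triu(np.ones(tukey.shape, dtype=bool))

# mask() sets cells to NaN wherever the condition is True,
# leaving only the strictly lower triangle.
lower_only = tukey.mask(upper)
```

lower_only can then be passed to sp.sign_plot in place of the frame modified by the loop.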
I would like to plot two dataframes in order to compare the results. My first choice would be to plot line charts based only on one column from the two dataframes.
df
Name Surname P R F
0 B N 0.41 0.76 0.53
1 B L 0.62 0.67 0.61
2 B SV 0.63 0.53 0.52
3 B SG 0.43 0.61 0.53
4 B R 0.81 0.51 0.53
5 T N 0.32 0.82 0.53
6 T L 0.58 0.69 0.62
7 T SV 0.67 0.61 0.64
8 T SG 0.53 0.63 0.57
9 T R 0.74 0.48 0.58
and
data = [['B','N',0.41,0.72,0.51],
['B','L',0.66,0.67,0.62],
['B','SV',0.63,0.51,0.51],
['B','SG',0.44,0.63,0.51],
['B','R',0.81,0.51,0.62],
['T','N',0.33,0.80,0.47],
['T','L',0.58,0.61,0.63],
['T','SV',0.68,0.61,0.64],
['T','SG',0.53,0.63,0.57],
['T','R',0.74,0.48,0.58]]
df1 = pd.DataFrame(data, columns = ['Name','Surname','P','R','F'])
I would like to create a plot based on F values, keeping information (in legend/labels) of B/T and R,N,L, SV, SG.
I have tried with bar charts, but this does not take into account labels/legend.
I am looking for something like this:
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
However this misses labels and legend.
I have also tried with:
ax = df.plot()
l = ax.get_lines()
df1.plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
But I cannot distinguish by Name, Surname and dataframe (on the x axis there should be Surname).
It would be also ok to plot separately the values (P, R and F) as follows:
ax = df[['P']].plot()
l = ax.get_lines()
df1[['P']].plot(ax=ax, linestyle='--', color=(i.get_color() for i in l))
I should compare F values of the two plots based on Name and Surname.
Any help would be greatly appreciated.
IIUC,
fig, ax = plt.subplots()
ax2 = ax.twinx()
df.plot(x="Name", y=["F"], ax=ax)
df1.plot(x="Name", y=["F"], ax=ax2, ls="--")
fig.legend(loc="upper right", bbox_to_anchor=(1,1), bbox_transform=ax.transAxes)
The simplest way to add information about other parameters to a graph is to call functions like ax.text or ax.annotate in a loop. The code should look like this:
import numpy as np
index = np.arange(len(df))  # one bar group per row
bar_width = 8  # assumed value, adjust to taste
fig, ax = plt.subplots()
data1 = ax.bar(20*index, df["F"], bar_width)
data2 = ax.bar(20*index+bar_width, df1["F"], bar_width)
for i in index:
    # write Surname and Name near the base of each bar
    ax.text(i*20-5, 0, df['Surname'][i])
    ax.text(i*20-5, 0.05, df['Name'][i])
    ax.text(i*20+bar_width-5, 0, df1['Surname'][i])
    ax.text(i*20+bar_width-5, 0.05, df1['Name'][i])
plt.show()
Useful link:
Official Documentation for Text in Matplotlib Plots
Edit:
Probably similar problem: Different text at each point
Edit 2:
Code without index:
fig, ax = plt.subplots()
data1 = ax.plot(df["F"])
data2 = ax.plot(df1["F"])
for i in range(len(df)):
    # label each point of both lines with its Name and Surname
    ax.text(i, df["F"][i], df['Name'][i] + " " + df['Surname'][i])
    ax.text(i, df1["F"][i], df1['Name'][i] + " " + df1['Surname'][i])
plt.show()
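Another option, sketched under the assumption that both frames contain the same Name/Surname pairs: merge them on those two columns so that each row holds both F values, then draw one line per dataframe with the combined label on the x axis. Only a subset of the rows from the question is used here, and the suffixes _df and _df1 and the column name label are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

cols = ["Name", "Surname", "P", "R", "F"]
# Subset of the rows shown in the question.
df = pd.DataFrame([["B", "N", 0.41, 0.76, 0.53], ["B", "L", 0.62, 0.67, 0.61],
                   ["T", "N", 0.32, 0.82, 0.53], ["T", "L", 0.58, 0.69, 0.62]],
                  columns=cols)
df1 = pd.DataFrame([["B", "N", 0.41, 0.72, 0.51], ["B", "L", 0.66, 0.67, 0.62],
                    ["T", "N", 0.33, 0.80, 0.47], ["T", "L", 0.58, 0.61, 0.63]],
                   columns=cols)

# Align the two frames row by row on (Name, Surname).
merged = df.merge(df1, on=["Name", "Surname"], suffixes=("_df", "_df1"))
merged["label"] = merged["Name"] + " " + merged["Surname"]

fig, ax = plt.subplots()
x_labels = merged["label"].tolist()
# Solid line for the first dataframe, dashed for the second.
ax.plot(x_labels, merged["F_df"], marker="o", label="df")
ax.plot(x_labels, merged["F_df1"], marker="o", ls="--", label="df1")
ax.set_ylabel("F")
ax.legend()
```

Because the merge keys carry both Name and Surname, the x axis distinguishes "B N" from "T N" while the legend distinguishes the two dataframes.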
I'm trying to plot a polar plot using the code below
ax = plt.subplot(111,polar=True)
plt.scatter(ts,rds,c=cs,
            s=[(y*100)+10 for y in ys],
            cmap='gist_rainbow')
ax.set_yticklabels([])
for i in range(len(names)):
    ax.text(ts[i],rds[i],"{}".format(i+1),size=12)
plt.show()
There are 42 points, the first 7 of which are
# Theta Radius
1 249.25 0.39
2 66.25 0.40
3 239.09 0.71
4 31.82 1.05
5 114.02 0.54
6 189.15 0.46
7 359.00 0.05
However, the resulting plot at the link below is incorrect:
https://i.imgur.com/n5dMuND.png
Before taking the screenshot, I hovered the mouse over point 7. You can see that point 7 is plotted at 41.186 degrees and not at 359 as it should be. Any idea why this is?
Thanks
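A likely cause, assuming ts holds the Theta column in degrees: matplotlib's polar axes interpret the angular coordinate in radians, so a value such as 359 is read as 359 radians and wraps around the circle many times. A small sketch of the conversion using only the standard library (theta_deg is a hypothetical name for the first seven angles above):

```python
import math

theta_deg = [249.25, 66.25, 239.09, 31.82, 114.02, 189.15, 359.00]

# Polar axes expect radians, so convert before plotting.
theta_rad = [math.radians(t) for t in theta_deg]

# What happens without the conversion: 359 is treated as radians and
# wraps modulo 2*pi to an unrelated angle.
wrapped_deg = math.degrees(359.00 % (2 * math.pi))
```

Passing the converted angles (or np.deg2rad(ts)) as the first argument to both plt.scatter and ax.text in the code above should place point 7 close to the 0° direction.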