I have 2 datasets (df3 and df4 which respectively hold information for total head and efficiency) with a common independent variable (flow rate).
I am looking to plot both of them in the same graph but the dependent variables have different y-axes. I initially used lmplot() for the polynomial order functionality but this was unsuccessful in having both plots appear in one window. I would like assistance with combining both my scatter plot and regression plots into one plot which shows the overlap between the datasets.
I have used the following approach to generate my charts:
ax2.scatter(df3['Flow_Rate_(KG/S)'], df3['Efficiency_%'], color='pink')
ax2.scatter(df4['Flow_Rate_(KG/S)'], df4['Total Head'], color='teal')
plt.show()
It is important to plot the two curves against each other because monitoring pump performance requires both the total head (m) and the efficiency (%) of the pump, in order to understand how they relate and how performance subsequently degrades.
The only other way I could think of is to write the polynomial functions as equations to be put into arguments in the plot function and have them drawn out as such. I haven't yet tried this but thought I'd ask if there are any other alternatives before I head down this pathway.
Let me try to rephrase the problem: You have two datasets with common independent values but different dependent values (f(x) and g(x), respectively). You want to plot them both in the same graph, but the dependent values have totally different ranges, so you want two different y axes, one for each dataset. The data should be plotted as a scatter plot with a regression line shown for each dataset; you are more interested in seeing the regression line than in knowing or calculating the regression curve itself. Hence you tried seaborn's lmplot, but could not get both datasets into the same graph.
In case the above is the problem you want to solve, the answer could be the following.
lmplot essentially plots a regplot to an axes grid. Because you don't need that axes grid here, using a regplot may make more sense. You may then create an axes and a twin axes and plot one regplot to each of them.
import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df1 = pd.DataFrame({"x": np.sort(np.random.rand(30)),
                    "f": np.sort(np.random.rayleigh(size=30))})
df2 = pd.DataFrame({"x": np.sort(np.random.rand(30)),
                    "g": 500-0.1*np.sort(np.random.rayleigh(20,size=30))**2})
fig, ax = plt.subplots()
ax2 = ax.twinx()
sns.regplot(x="x", y="f", data=df1, order=2, ax=ax)
sns.regplot(x="x", y="g", data=df2, order=2, ax=ax2)
ax2.legend(handles=[a.lines[0] for a in [ax,ax2]],
           labels=["f", "g"])
plt.show()
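If you would rather draw the fitted curves yourself, as the question suggests, a rough sketch with numpy.polyfit on the same kind of twin axes could look like the following. The data is synthetic, the names flow, head, eff and fit_curve are made up for illustration, and it assumes a second-order fit is appropriate for both curves.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
# Synthetic stand-ins for the pump data (hypothetical values).
flow = np.sort(rng.uniform(0.0, 1.0, 30))
head = 50 - 30 * flow**2 + rng.normal(0, 1.0, 30)
eff = 80 - 200 * (flow - 0.6)**2 + rng.normal(0, 2.0, 30)


def fit_curve(x, y, order=2, n=200):
    """Fit a polynomial of the given order and evaluate it on a dense grid."""
    coeffs = np.polyfit(x, y, order)
    grid = np.linspace(x.min(), x.max(), n)
    return coeffs, grid, np.polyval(coeffs, grid)


fig, ax = plt.subplots()
ax2 = ax.twinx()  # second y axis sharing the same x axis

head_coeffs, grid, head_fit = fit_curve(flow, head)
ax.scatter(flow, head, color="teal")
ax.plot(grid, head_fit, color="teal", label="total head")

eff_coeffs, grid, eff_fit = fit_curve(flow, eff)
ax2.scatter(flow, eff, color="pink")
ax2.plot(grid, eff_fit, color="pink", label="efficiency")

ax.set_xlabel("flow rate")
ax.set_ylabel("total head")
ax2.set_ylabel("efficiency")
# Collect one fitted line from each axes into a single legend.
ax2.legend(handles=[ax.lines[0], ax2.lines[0]])
```

Call plt.show() afterwards to display the figure. Unlike regplot, this gives you the coefficients (head_coeffs, eff_coeffs) directly, but no confidence band.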
Hello, I am very new to Python and am starting to use it for creating graphs at work (for papers, reports, etc.). I was wondering if someone could help with the problem detailed below. I am guessing there is a very simple solution, but I can't figure it out and it is driving me insane!
Basically, I am plotting the results from an experiment: the y-axis shows the numerical result (Result) and the x-axis is categorical and labelled Location. The data is then split across four graphs based on which machine the experiment was carried out on (Machine, also categorical).
This first part is easy; the code used is this:
sns.catplot(x='Location', y='Result', data=df3, hue='Machine', col='Machine', col_wrap=2, linewidth=2, kind='swarm')
this provides me with the following graph:
I now want to add another layer to the plot: a red line representing the upper spec limit for the data.
So I add the following line of code to the above:
sns.lineplot(x='Location', y=1.8, data=df3, linestyle='--', color='r', linewidth=2)
This then gives the following graph:
As you can see, the red line appears on only one of the graphs. All I want is to add the same red line to all four graphs, in exactly the same position.
Can anyone help me???
You could use .map to draw a horizontal line on each of the subplots. You need to assign the generated FacetGrid object to a variable.
Here is an example:
import matplotlib.pyplot as plt
import seaborn as sns
titanic = sns.load_dataset('titanic').dropna()
g = sns.catplot(x='class', y='age', data=titanic,
                hue='embark_town', col='embark_town', col_wrap=2, linewidth=2, kind='swarm')
g.map(plt.axhline, y=50, ls='--', color='r', linewidth=2)
plt.tight_layout()
plt.show()
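An equivalent route, if you prefer to work with the matplotlib axes directly, is to loop over the axes of the grid. The sketch below uses synthetic data with the column names from the question; the values and the name limit_lines are made up, and kind="strip" is used only to keep the example light.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Synthetic stand-in for the experiment data (hypothetical values).
df = pd.DataFrame({
    "Location": np.tile(["A", "B", "C"], 40),
    "Machine": np.repeat(["M1", "M2", "M3", "M4"], 30),
    "Result": rng.normal(1.5, 0.2, 120),
})

g = sns.catplot(x="Location", y="Result", data=df,
                col="Machine", col_wrap=2, kind="strip")

# Draw the upper spec limit on every subplot of the grid.
upper_spec = 1.8
limit_lines = [ax.axhline(upper_spec, ls="--", color="r", linewidth=2)
               for ax in g.axes.flat]
```

Recent seaborn releases also have FacetGrid.refline, so g.refline(y=upper_spec, color="r") may be a shorter route if your version provides it.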
I am toying around with seaborn violinplot, trying to make a single "violin" with each half being a different distribution, to be easily compared.
Modifying the simple example from here by changing the x axis to x=smoker I got to the following graph (linked below).
import seaborn as sns
sns.set(style="whitegrid", palette="pastel", color_codes=True)
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(x="smoker", y="total_bill", hue="smoker",
               split=True, inner="quart", data=tips)
sns.despine(left=True)
This is the resulting graph
I would like the graph to show not two separated halves but one single violin with two different distributions and colours.
Is it possible to do this with seaborn? Or maybe with another library?
Thanks!
This is because passing x="smoker" tells seaborn to draw one violin per smoker category (Yes and No). Since hue is also smoker, each of those violins has only one populated half.
What you really want is to plot all the data at a single x position. To do this, you can pass the same x value for every row.
sns.set(style="whitegrid", palette="pastel", color_codes=True)
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(x=['Data']*len(tips), y="total_bill", hue="smoker",
               split=True, inner="quart",
               palette={"Yes": "y", "No": "b"},
               data=tips)
sns.despine(left=True)
This outputs a single violin, with one half per smoker category.
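The same idea can be written with a constant column in the dataframe instead of a list passed as x, which some may find easier to read. This is only a sketch on synthetic data: the values and the column name group are made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
# Synthetic stand-in for the tips data (hypothetical values).
df = pd.DataFrame({
    "total_bill": np.concatenate([rng.gamma(4, 5, 60), rng.gamma(5, 5, 60)]),
    "smoker": ["Yes"] * 60 + ["No"] * 60,
})
# Every row gets the same label, so only one x position exists.
df["group"] = "All"

# hue splits that single violin into one half per smoker category.
ax = sns.violinplot(x="group", y="total_bill", hue="smoker",
                    split=True, inner="quart",
                    palette={"Yes": "y", "No": "b"}, data=df)
```

If the constant tick label is distracting, ax.set_xlabel("") and ax.set_xticks([]) should hide it.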
I have just started using seaborn to produce my figures. However I can't seem to remove one of the legends produced here.
I am trying to plot two accuracies against each other and draw a line along the diagonal to make it easier to see which has performed better (if anyone has a better way of plotting this data in seaborn - let me know!). The legend I'd like to keep is the one on the left, that shows the different colours for 'N_bands' and different shapes for 'Subject No'
ax1 = sns.relplot(y='y',x='x',data=df,hue='N bands',legend='full',style='Subject No.',markers=['.','^','<','>','8','s','p','*','P','X','D','H','d']).set(ylim=(80,100),xlim=(80,100))
ax2 = sns.lineplot(x=range(80,110),y=range(80,110),legend='full')
I have tried setting the kwarg legend to 'full','brief' and False for both ax1 and ax2 (together and separately) and it only seems to remove the one on the left, or both.
I have also tried to remove the legends using matplotlib
ax1.ax.legend_.remove()
ax2.legend_.remove()
But this results in the same behaviour (left legend disappearing).
UPDATE: Here is a minimal example you can run yourself:
test_data = np.array([[1.,2.,100.,9.],[2.,1.,100.,8.],[3.,4.,200.,7.]])
test_df = pd.DataFrame(columns=['x','y','p','q'], data=test_data)
sns.set_context("paper")
ax1=sns.relplot(y='y',x='x',data=test_df,hue='p',style='q',markers=['.','^','<','>','8'],legend='full').set(ylim=(0,4),xlim=(0,4))
ax2=sns.lineplot(x=range(0,5),y=range(0,5),legend='full')
Although this doesn't reproduce the error perfectly as the right legend is coloured (I have no idea how to reproduce this error then - does the way my dataframe was created make a difference?). But the essence of the problem remains - how do I remove the legend on the right but keep the one on the left?
You're plotting a lineplot in the (only) axes of a FacetGrid produced via relplot. That's quite unconventional, so strange things might happen.
One option to remove the legend of the FacetGrid while keeping the one from the lineplot would be
g._legend.remove()
Full code (where I also corrected the confusing naming of grids and axes):
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
test_data = np.array([[1.,2.,100.,9.],[2.,1.,100.,8.],[3.,4.,200.,7.]])
test_df = pd.DataFrame(columns=['x','y','p','q'], data=test_data)
sns.set_context("paper")
g=sns.relplot(y='y',x='x',data=test_df,hue='p',style='q',markers=['.','^','<','>','8'], legend='full')
sns.lineplot(x=range(0,5),y=range(0,5),legend='full', ax=g.axes[0,0])
g._legend.remove()
plt.show()
Note that this is kind of a hack, and it might break in future seaborn versions.
The other option is to not use a FacetGrid here, but just plot a scatter and a line plot in one axes:
ax1 = sns.scatterplot(y='y',x='x',data=test_df,hue='p',style='q',
                      markers=['.','^','<','>','8'], legend='full')
sns.lineplot(x=range(0,5),y=range(0,5), legend='full', ax=ax1)
plt.show()
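A variant of that second option, assuming the diagonal is only a visual guide and needs no legend entry of its own: draw it with matplotlib's ax.plot instead of sns.lineplot, so seaborn only ever builds the scatter legend. The marker choice below is an arbitrary pick for the sketch.

```python
import pandas as pd
import seaborn as sns

test_df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "y": [2.0, 1.0, 4.0],
    "p": [100.0, 100.0, 200.0],
    "q": [9.0, 8.0, 7.0],
})

# scatterplot returns a plain matplotlib Axes and creates the only legend.
ax = sns.scatterplot(x="x", y="y", data=test_df, hue="p", style="q",
                     markers=["o", "^", "s"], legend="full")

# An unlabelled matplotlib line is not added to the existing legend.
(diagonal,) = ax.plot([0, 4], [0, 4], color="gray", linestyle="--")
ax.set(xlim=(0, 4), ylim=(0, 4))
```

Points above the dashed line have y greater than x, which is the comparison the diagonal is meant to make easy.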
Currently I have a few plots using Facet Grids in seaborn. I have the following code:
g = sns.FacetGrid(masterdata1,col = "courseName")
g=g.map(plt.scatter, "SubjectwisePercentage", "SemesterPercentage")
The above code plots subjectwisepercentage vs semesterpercentage, for different courses across a semester. How can I plot the different scatter plots in a single plot, instead of multiple plots across the facet grid? In the single plot, the plotted points for each course should be a different color.
There are links online that specify how to plot different datasets in a single plot. However I need to use the same dataset. Therefore I need to specify col="courseName", or something equivalent, to plot course wise data in a single plot. I am not sure of how to accomplish this. Thank you in advance for your help.
You can try using seaborn's scatterplot. It lets you define x, y, hue, style and even size, which gives up to a 5D view of your data. Sometimes people base hue and style on the same variable for better-looking graphs.
Sample code (not really mine, since the seaborn documentation explains pretty much everything):
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)
tips = sns.load_dataset("tips")
# g = sns.FacetGrid(tips, col="sex", hue="time", palette="Set1",
# hue_order=["Dinner", "Lunch"])
# g= (g.map(plt.scatter, "total_bill", "tip")).add_legend()
# sns.scatterplot(data=tips, x="total_bill", y="tip", hue='time', style='sex')
sns.scatterplot(data=tips, x="total_bill", y="tip", hue='time', style='sex', size='size')
plt.show()
The matplotlib scatter plot can also be helpful, since you can plot several datasets on the same axes with different markers, colors and sizes.
See this example.
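For the matplotlib route, a minimal sketch is to group the single dataframe by course and call scatter once per group on the same axes. The dataframe below is a synthetic stand-in for masterdata1; the course names and values are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Synthetic stand-in for masterdata1 (hypothetical values).
masterdata1 = pd.DataFrame({
    "courseName": np.repeat(["Maths", "Physics", "Chemistry"], 20),
    "SubjectwisePercentage": rng.uniform(40, 100, 60),
    "SemesterPercentage": rng.uniform(40, 100, 60),
})

fig, ax = plt.subplots()
# One scatter call per course; matplotlib cycles the colour automatically.
for course, group in masterdata1.groupby("courseName"):
    ax.scatter(group["SubjectwisePercentage"], group["SemesterPercentage"],
               label=course)
ax.set_xlabel("SubjectwisePercentage")
ax.set_ylabel("SemesterPercentage")
ax.legend(title="courseName")
```

With seaborn, sns.scatterplot(data=masterdata1, x="SubjectwisePercentage", y="SemesterPercentage", hue="courseName") should give the same picture in one call.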
The pairplot function from seaborn lets you plot pairwise relationships in a dataset.
According to the documentation (highlight added):
By default, this function will create a grid of Axes such that each variable in data will be shared in the y-axis across a single row and in the x-axis across a single column. The diagonal Axes are treated differently, drawing a plot to show the univariate distribution of the data for the variable in that column.
It is also possible to show a subset of variables or plot different variables on the rows and columns.
I could find only one example of subsetting different variables for rows and columns, here (it's the 6th plot under the Plotting pairwise relationships with PairGrid and pairplot() section). As you can see, it's plotting many independent variables (x_vars) against the same single dependent variable (y_vars) and the results are pretty nice.
I'm trying to do the same plotting a single independent variable against many dependent ones.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
ages = np.random.gamma(6,3, size=50)
data = pd.DataFrame({"age": ages,
                     "weight": 80*ages**2/(ages**2+10**2)*np.random.normal(1,0.2,size=ages.shape),
                     "height": 1.80*ages**5/(ages**5+12**5)*np.random.normal(1,0.2,size=ages.shape),
                     "happiness": (1-ages*0.01*np.random.normal(1,0.3,size=ages.shape))})
pp = sns.pairplot(data=data,
                  x_vars=['age'],
                  y_vars=['weight', 'height', 'happiness'])
The problem is that the subplots get arranged vertically, and I couldn't find a way to change it.
I know that the tiling structure would then not be so neat, as the y-axis would need to be labeled on every subplot. I also know I could generate the plots by hand with something like this:
fig, axes = plt.subplots(ncols=3)
for i, yvar in enumerate(['weight', 'height', 'happiness']):
    axes[i].scatter(data['age'],data[yvar])
Still, I'm learning to use seaborn and I find its interface very convenient, so I wonder if there's a way. Also, this example is pretty simple, but for more complex datasets seaborn handles many more things for you (hue, to start) that would quickly make the raw-matplotlib approach much more complex.
You can achieve what it seems you are looking for by swapping the variable names passed to the x_vars and y_vars parameters. So revisiting the sns.pairplot portion of your code:
pp = sns.pairplot(data=data,
                  y_vars=['age'],
                  x_vars=['weight', 'height', 'happiness'])
Note that all I've done here is swap x_vars for y_vars. The plots should now be displayed horizontally:
The x-axis will now be unique to each plot with a common y-axis determined by the age column.
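If age has to stay on the x-axis, one possible alternative (a sketch, not something pairplot does for you) is to reshape the data to long format and let relplot lay out one column per dependent variable. The column names variable and value are arbitrary choices, and the data is a noise-free version of the question's example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
ages = rng.gamma(6, 3, size=50)
data = pd.DataFrame({
    "age": ages,
    "weight": 80 * ages**2 / (ages**2 + 10**2),
    "height": 1.80 * ages**5 / (ages**5 + 12**5),
    "happiness": 1 - ages * 0.01,
})

# Long format: one row per (age, variable, value) triple.
long = data.melt(id_vars="age", var_name="variable", value_name="value")

# One column per dependent variable, each with its own y scale.
g = sns.relplot(data=long, x="age", y="value", col="variable",
                facet_kws={"sharey": False})
```

Because relplot is figure-level, hue, style and size remain available in the same call.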