Plotting multiple scatter plots in the same graph instead of Facet Grids - python

Currently I have a few plots using Facet Grids in seaborn. I have the following code:
g = sns.FacetGrid(masterdata1,col = "courseName")
g=g.map(plt.scatter, "SubjectwisePercentage", "SemesterPercentage")
The above code plots subjectwisepercentage vs semesterpercentage, for different courses across a semester. How can I plot the different scatter plots in a single plot, instead of multiple plots across the facet grid? In the single plot, the plotted points for each course should be a different color.
There are links online that specify how to plot different datasets in a single plot. However I need to use the same dataset. Therefore I need to specify col="courseName", or something equivalent, to plot course wise data in a single plot. I am not sure of how to accomplish this. Thank you in advance for your help.

You can use seaborn's scatterplot. It lets you set x, y, hue, style, and size, which gives up to a 5D view of your data. People sometimes map hue and style to the same variable so that groups are easier to tell apart.
Sample code (mostly not mine, since the seaborn documentation explains pretty much everything):
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)
tips = sns.load_dataset("tips")
# g = sns.FacetGrid(tips, col="sex", hue="time", palette="Set1",
# hue_order=["Dinner", "Lunch"])
# g= (g.map(plt.scatter, "total_bill", "tip")).add_legend()
# sns.scatterplot(data=tips, x="total_bill", y="tip", hue='time', style='sex')
sns.scatterplot(data=tips, x="total_bill", y="tip", hue='time', style='sex', size='size')
plt.show()
Matplotlib's scatter can also help, since you can plot several groups on the same axes with different markers, colors, and sizes.
See this example.
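Applied to the question's own column names, the idea might look like the sketch below. The DataFrame here is a made-up stand-in for masterdata1 (the course names and random percentages are assumptions for illustration only). The first plot uses seaborn's hue, the second does the same with plain matplotlib by looping over groups.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for masterdata1, reusing the question's column names.
rng = np.random.default_rng(0)
masterdata1 = pd.DataFrame({
    "courseName": np.repeat(["Maths", "Physics", "Chemistry"], 20),
    "SubjectwisePercentage": rng.uniform(40, 100, 60),
    "SemesterPercentage": rng.uniform(40, 100, 60),
})

# Option 1: seaborn colours the points by course in a single axes.
fig, ax = plt.subplots()
sns.scatterplot(data=masterdata1, x="SubjectwisePercentage",
                y="SemesterPercentage", hue="courseName", ax=ax)

# Option 2: plain matplotlib, one scatter call per course.
fig2, ax2 = plt.subplots()
for name, group in masterdata1.groupby("courseName"):
    ax2.scatter(group["SubjectwisePercentage"],
                group["SemesterPercentage"], label=name)
ax2.set(xlabel="SubjectwisePercentage", ylabel="SemesterPercentage")
ax2.legend(title="courseName")
```

Call plt.show() afterwards to display both figures.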

Related

Changing marker's size in stripplot according to data value

I'm trying to create a categorical plot in which the size of each marker reflects some magnitude of the corresponding sample, as in the following example using the preloaded tips data(upper plot https://i.stack.imgur.com/pRn0x.png):
import seaborn as sns
sns.set(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.stripplot("day", "total_bill", data=tips, palette="Set2", size=tips["size"]*5, edgecolor="gray", alpha=.25)
But when I try the same with my own data, all the markers have the same size (lower plot https://i.stack.imgur.com/pRn0x.png):
import seaborn as sns
import pandas as pd
df = pd.read_csv("python_plot_test3.csv")
sns.set(style="whitegrid")
ax = sns.stripplot("log10p_value","term_name", data=df, palette="Set2", size=df['precision'], edgecolor="gray", alpha=.50)
I suspected the datatypes were not the same, but it didn't seem so, although, when I print df['precision'] it returns name and dtype and when I print tips["size"] it also returns its length.
Could someone give me a hint? I found how to change it in scatter plots, but nothing on categorical plots.
My data:
term_name,log10p_value,precision
muscle structure development,33.34122617,15
anatomical structure morphogenesis,32.91330177,5
muscle system process,31.61813233,11
regulation of multicellular organismal process,30.84862451,25
system development,29.16494157,36
muscle cell differentiation,28.79114555,11
Okay, so it looks like relplot is the right function for this. At first I guessed it was only for continuous data, but it can also handle categorical data. I still don't understand why stripplot worked with the example data, though.
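A minimal sketch of the relplot approach, using the six rows posted above. The sizes range (40, 400) and alpha value are arbitrary choices, not something from the original post. As far as the seaborn documentation goes, stripplot's size parameter is described as a single marker size, whereas size in relplot and scatterplot maps a data column to marker area.

```python
import io
import pandas as pd
import seaborn as sns

# The sample rows from the question, read from an in-memory CSV.
csv_text = """term_name,log10p_value,precision
muscle structure development,33.34122617,15
anatomical structure morphogenesis,32.91330177,5
muscle system process,31.61813233,11
regulation of multicellular organismal process,30.84862451,25
system development,29.16494157,36
muscle cell differentiation,28.79114555,11
"""
df = pd.read_csv(io.StringIO(csv_text))

# size= maps the "precision" column to marker area;
# sizes= sets the smallest and largest marker area (arbitrary here).
g = sns.relplot(data=df, x="log10p_value", y="term_name",
                size="precision", sizes=(40, 400), alpha=0.5)
```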

Violin plot: one violin, two halves by boolean value

I am toying around with seaborn violinplot, trying to make a single "violin" with each half being a different distribution, to be easily compared.
Modifying the simple example from here by changing the x axis to x=smoker I got to the following graph (linked below).
import seaborn as sns
sns.set(style="whitegrid", palette="pastel", color_codes=True)
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Draw a nested violinplot and split the violins for easier comparison
sns.violinplot(x="smoker", y="total_bill", hue="smoker",
split=True, inner="quart", data=tips)
sns.despine(left=True)
This is the resulting graph
I would like the graph to show not two separate halves, but one single violin with two different distributions and colours.
Is it possible to do this with seaborn? Or maybe with other library?
Thanks!
This is because x="smoker" puts two categories on the x axis, so seaborn reserves one position for smoker "Yes" and another for smoker "No".
What you really want is to plot all the data at a single x position. To do this, pass one constant value for the x axis.
sns.set(style="whitegrid", palette="pastel", color_codes=True)
# Load the example tips dataset
tips = sns.load_dataset("tips")
# Draw a single violin at one x position, split by smoker for easier comparison
sns.violinplot(x=['Data']*len(tips),y="total_bill", hue="smoker",
split=True, inner="quart",
palette={"Yes": "y", "No": "b"},
data=tips)
sns.despine(left=True)
This produces a single violin, split into one half per smoker value.
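The same idea can also be written with an explicit constant column instead of a list, which some may find easier to read. This is only a sketch: the data below is randomly generated to mimic the shape of the tips columns, and the column name "all" is made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in for the tips columns used above (values are random).
rng = np.random.default_rng(1)
tips_like = pd.DataFrame({
    "total_bill": np.concatenate([rng.normal(20, 5, 50),
                                  rng.normal(25, 8, 50)]),
    "smoker": ["Yes"] * 50 + ["No"] * 50,
})

# Hypothetical constant column: every row shares the same x category.
tips_like["all"] = "Data"

fig, ax = plt.subplots()
sns.violinplot(data=tips_like, x="all", y="total_bill", hue="smoker",
               split=True, inner="quart",
               palette={"Yes": "y", "No": "b"}, ax=ax)
sns.despine(left=True)
```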

Remove one of the two legends produced in this Seaborn figure?

I have just started using seaborn to produce my figures. However I can't seem to remove one of the legends produced here.
I am trying to plot two accuracies against each other and draw a line along the diagonal to make it easier to see which has performed better (if anyone has a better way of plotting this data in seaborn - let me know!). The legend I'd like to keep is the one on the left, that shows the different colours for 'N_bands' and different shapes for 'Subject No'
ax1 = sns.relplot(y='y',x='x',data=df,hue='N bands',legend='full',style='Subject No.',markers=['.','^','<','>','8','s','p','*','P','X','D','H','d']).set(ylim=(80,100),xlim=(80,100))
ax2 = sns.lineplot(x=range(80,110),y=range(80,110),legend='full')
I have tried setting the kwarg legend to 'full','brief' and False for both ax1 and ax2 (together and separately) and it only seems to remove the one on the left, or both.
I have also tried to remove the legends using matplotlib
ax1.ax.legend_.remove()
ax2.legend_.remove()
But this results in the same behaviour (the left legend disappearing).
UPDATE: Here is a minimal example you can run yourself:
test_data = np.array([[1.,2.,100.,9.],[2.,1.,100.,8.],[3.,4.,200.,7.]])
test_df = pd.DataFrame(columns=['x','y','p','q'], data=test_data)
sns.set_context("paper")
ax1=sns.relplot(y='y',x='x',data=test_df,hue='p',style='q',markers=['.','^','<','>','8'],legend='full').set(ylim=(0,4),xlim=(0,4))
ax2=sns.lineplot(x=range(0,5),y=range(0,5),legend='full')
Although this doesn't reproduce the error perfectly as the right legend is coloured (I have no idea how to reproduce this error then - does the way my dataframe was created make a difference?). But the essence of the problem remains - how do I remove the legend on the right but keep the one on the left?
You're plotting a lineplot in the (only) axes of a FacetGrid produced via relplot. That's quite unconventional, so strange things might happen.
One option to remove the legend of the FacetGrid but keeping the one from the lineplot would be
g._legend.remove()
Full code (where I also corrected the confusing naming of grids and axes):
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
test_data = np.array([[1.,2.,100.,9.],[2.,1.,100.,8.],[3.,4.,200.,7.]])
test_df = pd.DataFrame(columns=['x','y','p','q'], data=test_data)
sns.set_context("paper")
g=sns.relplot(y='y',x='x',data=test_df,hue='p',style='q',markers=['.','^','<','>','8'], legend='full')
sns.lineplot(x=range(0,5),y=range(0,5),legend='full', ax=g.axes[0,0])
g._legend.remove()
plt.show()
Note that this is kind of a hack, and it might break in future seaborn versions.
The other option is to not use a FacetGrid here, but just plot a scatter and a line plot in one axes,
ax1 = sns.scatterplot(y='y',x='x',data=test_df,hue='p',style='q',
markers=['.','^','<','>','8'], legend='full')
sns.lineplot(x=range(0,5),y=range(0,5), legend='full', ax=ax1)
plt.show()
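If the line is only a visual reference along the diagonal, a plain matplotlib ax.plot call may be enough. The sketch below reuses the test data from the question; the line style and colour are arbitrary choices. The line is drawn without a label, so the only legend on the axes is the one created by scatterplot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

test_data = np.array([[1., 2., 100., 9.],
                      [2., 1., 100., 8.],
                      [3., 4., 200., 7.]])
test_df = pd.DataFrame(columns=["x", "y", "p", "q"], data=test_data)

fig, ax = plt.subplots()
# The scatter plot creates the hue/style legend on this axes.
sns.scatterplot(data=test_df, x="x", y="y", hue="p", style="q",
                legend="full", ax=ax)
# Unlabelled diagonal reference line: adds no legend entry.
ax.plot([0, 4], [0, 4], color="gray", linestyle="--")
ax.set(xlim=(0, 4), ylim=(0, 4))
```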

Superimposing Images in Catplot

I am attempting to superimpose dot plots on top of a bar graph. Both types of plots have been generated using seaborn's catplot function.
Scatter-plot Code:
dotplot2 = sns.catplot(x="Group", y="MASQ_Score", col= "MASQ_Item", units="subject", aspect=.6, hue="Group", ci = 68, data=df_reshaped)
Resulting Plot Image:
Bar-plot Code:
barplot2 = sns.catplot(x="Group", y="MASQ_Score", hue='Group', kind="bar", col= "MASQ_Item", units="subject", aspect=.6, ci = 68, data=df_reshaped)
Resulting Plot Image:
Does anyone know if there is a way to superimpose the scatter-plot data on top of the bar-plot? So that both types of information are conveniently visible.
Based on @ImportanceOfBeingErnest's comment, here is a working solution with open data using the map_dataframe method from FacetGrid:
import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time")
g.map_dataframe(sns.stripplot,x="sex", y="total_bill")
g.map_dataframe(sns.boxplot,x="sex", y="total_bill",boxprops={'facecolor':'None'},showfliers = False)
The stripplot and boxplot can easily be replaced by other seaborn components like swarmplot or violinplot.
What you are plotting is not a scatter plot, it is a stripplot, as @Rachid Riad shows. If you want a barplot instead of a boxplot, you just have to change that line to:
sns.barplot()
I would personally recommend using boxplot and swarmplot.
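Put together, a bar-plus-strip overlay on a FacetGrid might look like the sketch below. The data is randomly generated and only borrows the question's column names; the group labels, item labels, and colours are assumptions. Passing the same order to both calls keeps the categories aligned across the two layers.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for df_reshaped, reusing the question's column names.
rng = np.random.default_rng(0)
df_reshaped = pd.DataFrame({
    "Group": np.tile(["Control", "Patient"], 20),
    "MASQ_Item": np.repeat(["Item1", "Item2"], 20),
    "MASQ_Score": rng.integers(1, 6, 40),
})

order = ["Control", "Patient"]
g = sns.FacetGrid(df_reshaped, col="MASQ_Item", aspect=0.6)
# Bars first, then the individual points drawn on top of them.
g.map_dataframe(sns.barplot, x="Group", y="MASQ_Score",
                order=order, color="lightgray")
g.map_dataframe(sns.stripplot, x="Group", y="MASQ_Score",
                order=order, color="black")
```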

Dual Plotting X-Axis via Seaborn

I have 2 datasets (df3 and df4 which respectively hold information for total head and efficiency) with a common independent variable (flow rate).
I am looking to plot both of them in the same graph but the dependent variables have different y-axes. I initially used lmplot() for the polynomial order functionality but this was unsuccessful in having both plots appear in one window. I would like assistance with combining both my scatter plot and regression plots into one plot which shows the overlap between the datasets.
I have used the following approach to generate my charts:
ax2.scatter(df3['Flow_Rate_(KG/S)'], df2['Efficiency_%'], color='pink')
ax2.scatter(df4['Flow_Rate_(KG/S)'], df4['Total Head'], color='teal')
plt.show()
The reason why it is important for the lines to be plotted against each other is that to monitor pump performance, we need to have both the total head (M) and efficiency % of the pump to understand the relationship and subsequent degradation of performance.
The only other way I could think of is to write the polynomial functions as equations to be put into arguments in the plot function and have them drawn out as such. I haven't yet tried this but thought I'd ask if there are any other alternatives before I head down this pathway.
Let me try to rephrase the problem: You have two datasets with common independent values, but different dependent values (f(x), g(x) respectively). You want to plot them both in the same graph, but the dependent values have totally different ranges. Therefore you want two different y axes, one for each dataset. The data should be plotted as a scatter plot, and a regression line should be shown for each dataset; you are more interested in seeing the regression line than in knowing or calculating the regression curve itself. Hence you tried to use seaborn's lmplot, but you were unable to get both datasets into the same graph.
In case the above is the problem you want to solve, the answer could be the following.
lmplot essentially plots a regplot to an axes grid. Because you don't need that axes grid here, using a regplot may make more sense. You may then create an axes and a twin axes and plot one regplot to each of them.
import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df1 = pd.DataFrame({"x": np.sort(np.random.rand(30)),
"f": np.sort(np.random.rayleigh(size=30))})
df2 = pd.DataFrame({"x": np.sort(np.random.rand(30)),
"g": 500-0.1*np.sort(np.random.rayleigh(20,size=30))**2})
fig, ax = plt.subplots()
ax2 = ax.twinx()
sns.regplot(x="x", y="f", data=df1, order=2, ax=ax)
sns.regplot(x="x", y="g", data=df2, order=2, ax=ax2)
ax2.legend(handles=[a.lines[0] for a in [ax,ax2]],
labels=["f", "g"])
plt.show()
