Plotting multiple scatter plots with the simple linear regression equation shown, in Python?

I have a pandas df with multiple columns. I want to draw scatter plots of multiple ordinary least squares regressions, with the equations shown on the plot. I also want to vary the plot by another column, or 'hue', so that each value of that column is plotted separately.
sns.relplot(y='Latitude', x='Longitude', hue='cluster', palette="Paired", s=9, data=df)
Is there a streamlined way of doing this, so that you don't have to write a function to iterate over each column and hue?
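One possible approach, as a sketch rather than a definitive answer: sns.lmplot draws one OLS line per hue level, but as far as I know seaborn does not hand back the fitted coefficients, so the equations below are refit per group with np.polyfit and written onto the Axes. The toy data, the cluster labels and the coefs/lines names are assumptions made for illustration; only the column names come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Toy data standing in for the question's df (all values are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Longitude": rng.uniform(0, 10, 60),
    "cluster": np.repeat(["a", "b", "c"], 20),
})
true_slopes = df["cluster"].map({"a": 1.0, "b": -0.5, "c": 2.0})
df["Latitude"] = true_slopes * df["Longitude"] + rng.normal(0, 0.5, len(df))

# One scatter plus OLS line per hue level, drawn by seaborn.
g = sns.lmplot(data=df, x="Longitude", y="Latitude", hue="cluster",
               palette="Paired", scatter_kws={"s": 9})

# Refit each group with a degree-1 polynomial to recover slope and intercept.
coefs = {}
for name, grp in df.groupby("cluster"):
    slope, intercept = np.polyfit(grp["Longitude"], grp["Latitude"], 1)
    coefs[name] = (slope, intercept)

# Write one equation per group into the top-left corner of the single Axes.
lines = [f"{name}: y = {m:.2f}x {b:+.2f}" for name, (m, b) in coefs.items()]
ax = g.axes.flat[0]
ax.text(0.02, 0.98, "\n".join(lines), transform=ax.transAxes, va="top")
```

This still loops over the hue levels, but only to compute and print the equations; the plotting itself is a single lmplot call. To split further by another column, lmplot also accepts col= and row=, in which case the annotation would have to be repeated per facet.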

Related

Using seaborn how do I plot a column which has 70+ categories

I am trying to plot a column from a dataframe. There are about 8500 rows and the Assignment group column has about 70+ categories. How do I plot this visually using seaborn to get some meaningful output?
nlp_data['Assignment group'].hist(figsize=(17,7))
I used the hist() method to plot it.
You can use a heatmap for such data:
seaborn.heatmap

Center nested boxplots in Python/Seaborn with unequal classes

I have grouped data from which I want to generate boxplots using seaborn. However, not every group has all classes. As a result, the boxplots are not centered if classes are missing within a group:
Figure
The graph is generated using the following code:
sns.boxplot(x="label2", y="value", hue="variable",palette="Blues")
Is there any way to force seaborn to center these boxes? I didn't find any appropriate way.
Thank you in advance.
Yes there is but you are not going to like it.
Centering these will mean that you will have the same y value for median values, so normalize your data so that the median is 0.5 for each y value for each value of x. That will give you the plot you want, but you should note that somewhere in the plot so people will not be confused.

How to plot multiple barplot with different Y and the same X?

I have a dataset and I want to find out how several columns values (numeric values) differ across two different groups ('group' is a column that takes either the value of 'high' or 'low').
I want to plot several barplots using a similar system/aesthetics to Seaborn's FacetGrid or PairGrid. Each plot will have a different Y value but the same X-axis (The group variable)
This is what I have so far:
sns.catplot(x='group', y='Number of findings (total)', kind="bar",
            palette="muted", data=df)
But I would like to write a loop that can replace my y variable with different variables. How to do it?
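A sketch of one way to avoid the explicit loop, under some assumptions: reshape the frame to long form with DataFrame.melt and let sns.catplot facet on the measure name, with sharey=False so each panel keeps its own y scale. The toy data and the second column name ("Severity score") are hypothetical; only 'group' and 'Number of findings (total)' come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Toy frame; "Severity score" is a made-up second numeric column.
df = pd.DataFrame({
    "group": ["high", "low"] * 4,
    "Number of findings (total)": [5, 2, 7, 3, 6, 1, 8, 2],
    "Severity score": [3.1, 1.2, 2.8, 1.5, 3.4, 0.9, 3.0, 1.1],
})
y_columns = ["Number of findings (total)", "Severity score"]

# Long form: one row per (group, measure, value) triple.
long_df = df.melt(id_vars="group", value_vars=y_columns,
                  var_name="measure", value_name="value")

# One bar plot per measure, same x-axis, independent y-axes.
g = sns.catplot(data=long_df, x="group", y="value", col="measure",
                kind="bar", sharey=False)
```

If separate figures are preferred, a plain for loop over y_columns calling sns.catplot(x='group', y=col, kind='bar', data=df) should work as well; the melt version just keeps everything in one grid.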

Plot stacked histogram with grouped DataFrame

I want a stacked histogram where the different classes are visible.
At the moment I have the histogram without classes with this code:
plt.hist(hist_matrix2.column_name)
which produces this histogram:
and another histogram with the same data, that is grouped by the classes with this code:
hist_matrix2.groupby("number").column_name.plot.hist(alpha=0.5, bins = [0,5,10,15,20,25,30], stacked = True)
which produces this histogram:
As you can see the classes are there but it is not stacked, although the parameter is set. What can I do to stack the classes?
plt.hist has a built-in stacking flag you can set:
plt.hist(hist_matrix2.column_name, stacked=True)
Edit in response to your question: stacked only has an effect when plt.hist receives several datasets, so for long data (with multiple levels to stack) you first need to restructure the data into a list with one Series per level:
wide = hist_matrix2.pivot(columns='number', values='column_name')
# Pivoting leaves NaN wherever a row belongs to a different level, so drop them per column
widelist = [wide[col].dropna() for col in wide.columns]
# The stacked histogram is drawn here
plt.hist(widelist, stacked=True)
plt.show()

Compare 1 independent vs many dependent variables using seaborn pairplot in an horizontal plot

The pairplot function from seaborn lets you plot pairwise relationships in a dataset.
According to the documentation (highlight added):
By default, this function will create a grid of Axes such that each variable in data will by shared in the y-axis across a single row and in the x-axis across a single column. The diagonal Axes are treated differently, drawing a plot to show the univariate distribution of the data for the variable in that column.
It is also possible to show a subset of variables or plot different variables on the rows and columns.
I could find only one example of subsetting different variables for rows and columns, here (it's the 6th plot under the Plotting pairwise relationships with PairGrid and pairplot() section). As you can see, it's plotting many independent variables (x_vars) against the same single dependent variable (y_vars) and the results are pretty nice.
I'm trying to do the same plotting a single independent variable against many dependent ones.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
ages = np.random.gamma(6,3, size=50)
data = pd.DataFrame({"age": ages,
                     "weight": 80*ages**2/(ages**2+10**2)*np.random.normal(1,0.2,size=ages.shape),
                     "height": 1.80*ages**5/(ages**5+12**5)*np.random.normal(1,0.2,size=ages.shape),
                     "happiness": (1-ages*0.01*np.random.normal(1,0.3,size=ages.shape))})
pp = sns.pairplot(data=data,
                  x_vars=['age'],
                  y_vars=['weight', 'height', 'happiness'])
The problem is that the subplots get arranged vertically, and I couldn't find a way to change it.
I know the tiling would then be less neat, since the y-axis would have to be labeled on every subplot. I also know I could build the plots by hand with something like this:
fig, axes = plt.subplots(ncols=3)
for i, yvar in enumerate(['weight', 'height', 'happiness']):
    axes[i].scatter(data['age'], data[yvar])
Still, I'm learning seaborn and find its interface very convenient, so I wonder if there's a way. Also, this example is simple, but for more complex datasets seaborn handles many more things for you that would quickly make the raw-matplotlib approach much more complex (hue, to start).
You can achieve what it seems you are looking for by swapping the variable names passed to the x_vars and y_vars parameters. So revisiting the sns.pairplot portion of your code:
pp = sns.pairplot(data=data,
                  y_vars=['age'],
                  x_vars=['weight', 'height', 'happiness'])
Note that all I've done here is swap x_vars for y_vars. The plots should now be displayed horizontally:
The x-axis will now be unique to each plot with a common y-axis determined by the age column.
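If age should stay on the x-axis, an alternative worth trying (a sketch, not the approach from the answer above) is to melt the frame to long form and facet with sns.relplot, passing sharey=False through facet_kws so each dependent variable keeps its own scale. The data mirrors the question's example; the seeded generator and the long_data, "measure" and "value" names are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Same synthetic data as in the question, with a seeded generator.
rng = np.random.default_rng(0)
ages = rng.gamma(6, 3, size=50)
data = pd.DataFrame({
    "age": ages,
    "weight": 80 * ages**2 / (ages**2 + 10**2) * rng.normal(1, 0.2, size=ages.shape),
    "height": 1.80 * ages**5 / (ages**5 + 12**5) * rng.normal(1, 0.2, size=ages.shape),
    "happiness": 1 - ages * 0.01 * rng.normal(1, 0.3, size=ages.shape),
})

# Long form: every non-age column becomes a (measure, value) pair.
long_data = data.melt(id_vars="age", var_name="measure", value_name="value")

# One column of the grid per dependent variable, laid out horizontally.
g = sns.relplot(data=long_data, x="age", y="value", col="measure",
                facet_kws={"sharey": False})
```

Each panel is titled with its measure name while the y-axis label reads "value", so the labels may need adjusting afterwards. A hue variable can be added by keeping it in id_vars when melting and passing hue= to relplot.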
