Seaborn Scatterplot X Values Missing - python

I have a scatter plot I'm working with, and for some reason I'm not seeing all the x values on my graph.
#%%
from pandas import DataFrame, read_csv
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file = r"re2.csv"
df = pd.read_csv(file)
#sns.set(rc={'figure.figsize':(11.7,8.27)})
g = sns.FacetGrid(df, col='city')
g.map(plt.scatter, 'type', 'price').add_legend()
This is an image of a small subset of my plots. You can see that Res is displayed; the middle position should show Con and the last should show Mlt. These are all defined in the type column of my data set but are not displayed.
Any idea how to fix this?

Python is doing what you tell it to do. If you want more interesting plots, pick different features, presumably ones that make more sense for plotting. See the generic example below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style="darkgrid")
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips);
Personally, I like plotly plots, which are dynamic, more than I like seaborn plots.
https://plotly.com/python/line-and-scatter/
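One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: when plt.scatter is mapped over a FacetGrid, each facet only receives the rows for its own city, so a facet may only know about the categories present in that subset. A categorical seaborn function with an explicit order keeps the same categories on every facet. The data below is made up (the cities and prices are hypothetical); only the column names city, type and price and the categories Res, Con and Mlt come from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import pandas as pd
import seaborn as sns

# Made-up stand-in for re2.csv: city A has no Mlt rows, city B has no Con rows
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "type": ["Res", "Res", "Con", "Res", "Mlt", "Mlt"],
    "price": [100, 120, 90, 200, 150, 160],
})

# Fix the category order once so every facet shows all three types
type_order = ["Res", "Con", "Mlt"]
g = sns.catplot(data=df, x="type", y="price", col="city",
                kind="strip", order=type_order)

# Read back the x tick labels of the first facet to check the result
g.axes.flat[0].figure.canvas.draw()
tick_labels = [t.get_text() for t in g.axes.flat[0].get_xticklabels()]
print(tick_labels)
```

If the real type column contains other values, type_order would need to list them as well.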

Related

seaborn mixing of plots

I'm having trouble creating this plot in spyder:
import seaborn as sns
import pandas as pd
from pandas.api.types import CategoricalDtype
diamonds= sns.load_dataset("diamonds")
df=diamonds.copy()
cut_Kategoriler=["Fair","Good","Very Good","Premium","Ideal"]
df.cut=df.cut.astype(CategoricalDtype(categories = cut_Kategoriler,ordered=True))
print(df.head())
sns.catplot(x="cut",y="price",data=df)
sns.barplot(x="cut",y="price",hue="color",data=df)
I want to create two plots, but they overlap. How can I separate the graphics produced by the last two lines?
You need to import matplotlib.pyplot as plt and then add plt.show() after each of the two plots.
The modified code is added below:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt # Import Matplotlib
from pandas.api.types import CategoricalDtype
diamonds = sns.load_dataset("diamonds")
df=diamonds.copy()
cut_Kategoriler=["Fair","Good","Very Good","Premium","Ideal"]
df.cut=df.cut.astype(CategoricalDtype(categories = cut_Kategoriler,ordered=True))
print(df.head())
sns.catplot(x="cut",y="price",data=df)
plt.show() # Display the first plot
sns.barplot(x="cut",y="price",hue="color",data=df)
plt.show() # Display the second plot
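As a side note, the overlap appears to come from catplot being a figure-level function (it creates its own figure) while barplot is an axes-level function that draws on whatever Axes is current. A minimal sketch of another way around it, assuming you are free to create the second Axes yourself, is shown here. The small DataFrame is made up so the example runs without downloading the diamonds dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the diamonds data
df = pd.DataFrame({
    "cut": ["Fair", "Good", "Ideal", "Fair", "Good", "Ideal"],
    "color": ["D", "D", "D", "E", "E", "E"],
    "price": [300, 400, 500, 350, 450, 550],
})

# catplot is figure-level: it creates and returns its own grid and figure
grid = sns.catplot(x="cut", y="price", data=df)
cat_fig = grid.axes.flat[0].figure

# barplot is axes-level: give it a fresh Axes so it does not reuse the grid's
bar_fig, bar_ax = plt.subplots()
sns.barplot(x="cut", y="price", hue="color", data=df, ax=bar_ax)

print(cat_fig is bar_fig)  # the two plots live on different figures
```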

How can I change the labels in this pie chart [plotly]?

I want to change the labels [2,3,4,5] from my pie chart and instead have them say [Boomer, Gen X, Gen Y, Gen Z] respectively. I can't seem to find a direct way of doing this without changing the dataframe. Is there any way to do this by working through the code I have?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = df.groupby("Q10_Ans")["Q4_Agree"].count()
pie, ax = plt.subplots(figsize=[10,6])
labels = data.keys()
plt.pie(x=data, autopct="%.1f%%", explode=[0.05]*4, labels=labels, pctdistance=0.5)
plt.title("Generations that agree data visualization will help with job prospects", fontsize=14);
pie.savefig("DeliveryPieChart.png")
How about changing the line
labels = data.keys()
to
labels = ['Boomer','Gen X','Gen Y','Gen Z']
I don't know the structure of your data, so I made some sample data and created a pie chart from it. Please modify your code to follow this.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# data = df.groupby("Q10_Ans")["Q4_Agree"].count()
data = pd.DataFrame({'Q10_Ans':['Boomer','Gen X','Gen Y','Gen Z'],'Q4_Agree':[2,3,4,5]})
fig, ax = plt.subplots(figsize=[10,6])
labels = data['Q10_Ans']
ax.pie(x=data['Q4_Agree'], autopct="%.1f%%", explode=[0.05]*4, labels=labels, pctdistance=0.5)
ax.set_title("Generations that agree data visualization will help with job prospects", fontsize=14);
plt.savefig("DeliveryPieChart.png")
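If hard-coding the label list feels fragile, one option is to translate the group keys through a dictionary, which leaves the dataframe untouched. This is only a sketch: the survey rows are made up, and generation_names is a hypothetical helper that follows the 2, 3, 4, 5 to Boomer, Gen X, Gen Y, Gen Z mapping given in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up survey answers; Q10_Ans holds the codes 2 to 5 as in the question
df = pd.DataFrame({
    "Q10_Ans": [2, 2, 3, 3, 3, 4, 4, 5],
    "Q4_Agree": [1, 1, 1, 1, 1, 1, 1, 1],
})
data = df.groupby("Q10_Ans")["Q4_Agree"].count()

# Hypothetical lookup from answer code to display name
generation_names = {2: "Boomer", 3: "Gen X", 4: "Gen Y", 5: "Gen Z"}
labels = [generation_names[code] for code in data.index]

fig, ax = plt.subplots(figsize=[10, 6])
ax.pie(x=data, autopct="%.1f%%", explode=[0.05] * len(data),
       labels=labels, pctdistance=0.5)
ax.set_title("Generations that agree data visualization will help with job prospects",
             fontsize=14)
print(labels)
```

Because the labels are built from data.index, they stay aligned with the wedges even if a group is missing from the data.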

Matplotlib - How can I add labels to legend

Here I am trying to separate the data by whether the passenger is male or not, plotting Age on the x-axis and Fare on the y-axis. I want to display two labels in the legend, differentiating male and female with their respective colors. Can anyone help me do this?
Code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male']=df['Sex']=='male'
sc1= plt.scatter(df['Age'],df['Fare'],c=df['male'])
plt.legend()
plt.show()
You could use the seaborn library which builds on top of matplotlib to perform the exact task you require. You can scatterplot 'Age' vs 'Fare' and colour code it by 'Sex' by just passing the hue parameter in sns.scatterplot, as follows:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure()
# No need to call plt.legend, seaborn will generate the labels and legend
# automatically.
# Pass x, y and hue as keywords; recent seaborn versions accept only data positionally.
sns.scatterplot(data=df, x='Age', y='Fare', hue='Sex')
plt.show()
Seaborn generates nicer plots with less code and more functionality.
You can install seaborn from PyPI using pip install seaborn.
Refer: Seaborn docs
The PathCollection.legend_elements method can be used to steer how many legend entries are created and how they are labeled.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
df['male'] = df['Sex']=='male'
sc1= plt.scatter(df['Age'], df['Fare'], c=df['male'])
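# c=df['male'] is boolean and the handles come back in sorted order (False, then True), so 'female' is listed first
plt.legend(handles=sc1.legend_elements()[0], labels=['female', 'male'])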
plt.show()
Legend guide and Scatter plots with a legend for reference.
This can be achieved by splitting the data into two separate dataframes and then setting a label for each one.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://sololearn.com/uploads/files/titanic.csv')
subset1 = df[(df['Sex'] == 'male')]
subset2 = df[(df['Sex'] != 'male')]
plt.scatter(subset1['Age'], subset1['Fare'], label = 'Male')
plt.scatter(subset2['Age'], subset2['Fare'], label = 'Female')
plt.legend()
plt.show()
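The same idea can be written as a loop over groupby, which avoids building each subset by hand and takes the legend labels straight from the column values. This is a rough sketch with a few made-up rows standing in for the Titanic CSV, so it runs without a download.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with the same column names as the Titanic data
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female"],
    "Age": [22, 38, 26, 35, 54],
    "Fare": [7.25, 71.28, 7.92, 8.05, 51.86],
})

fig, ax = plt.subplots()
# One scatter call per group; each call gets its own color and legend entry
for sex, group in df.groupby("Sex"):
    ax.scatter(group["Age"], group["Fare"], label=sex)
ax.set_xlabel("Age")
ax.set_ylabel("Fare")
legend = ax.legend(title="Sex")

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```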

Stop seaborn plotting multiple figures on top of one another

I'm starting to learn a bit of python (been using R) for data analysis. I'm trying to create two plots using seaborn, but it keeps saving the second on top of the first. How do I stop this behavior?
import seaborn as sns
iris = sns.load_dataset('iris')
length_plot = sns.barplot(x='sepal_length', y='species', data=iris).get_figure()
length_plot.savefig('ex1.pdf')
width_plot = sns.barplot(x='sepal_width', y='species', data=iris).get_figure()
width_plot.savefig('ex2.pdf')
You have to start a new figure in order to do that. There are multiple ways to do so, assuming you have matplotlib. You can also drop get_figure() and use plt.savefig() instead.
Method 1
Use plt.clf()
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
length_plot = sns.barplot(x='sepal_length', y='species', data=iris)
plt.savefig('ex1.pdf')
plt.clf()
width_plot = sns.barplot(x='sepal_width', y='species', data=iris)
plt.savefig('ex2.pdf')
Method 2
Call plt.figure() before each one
plt.figure()
length_plot = sns.barplot(x='sepal_length', y='species', data=iris)
plt.savefig('ex1.pdf')
plt.figure()
width_plot = sns.barplot(x='sepal_width', y='species', data=iris)
plt.savefig('ex2.pdf')
I agree with a previous comment that importing matplotlib.pyplot is not the best software engineering practice, as it exposes the underlying library. I was creating and saving plots in a loop and needed to clear the figure, and found that this can now be done easily by importing seaborn only:
Since version 0.11:
import seaborn as sns
import numpy as np
data = np.random.normal(size=100)
path = "/path/to/img/plot.png"
plot = sns.displot(data) # displot returns a FacetGrid; axes-level functions such as histplot() return an Axes, whose figure is plot.figure
plot.fig.savefig(path)
plot.fig.clf() # this clears the figure
# ... continue with next figure
Alternative example with a loop:
import seaborn as sns
import numpy as np
for i in range(3):
    data = np.random.normal(size=100)
    path = "/path/to/img/plot2_{0:01d}.png".format(i)
    plot = sns.displot(data)
    plot.fig.savefig(path)
    plot.fig.clf() # this clears the figure
before version 0.11 (original post):
import seaborn as sns
import numpy as np
data = np.random.normal(size=100)
path = "/path/to/img/plot.png"
plot = sns.distplot(data)
plot.get_figure().savefig(path)
plot.get_figure().clf() # this clears the figure
# ... continue with next figure
Create specific figures and plot onto them:
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
length_fig, length_ax = plt.subplots()
sns.barplot(x='sepal_length', y='species', data=iris, ax=length_ax)
length_fig.savefig('ex1.pdf')
width_fig, width_ax = plt.subplots()
sns.barplot(x='sepal_width', y='species', data=iris, ax=width_ax)
width_fig.savefig('ex2.pdf')
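When many plots are saved in a loop, the explicit-figure approach can be combined with plt.close so that finished figures do not pile up in memory. The sketch below assumes a temporary output directory and a tiny made-up DataFrame in place of the iris dataset; out_dir and saved_paths are names introduced here for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the iris data
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.3, 2.7],
})

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved_paths = []
for column in ["sepal_length", "sepal_width"]:
    fig, ax = plt.subplots()          # a fresh figure for every plot
    sns.barplot(x=column, y="species", data=df, ax=ax)
    path = os.path.join(out_dir, f"{column}.pdf")
    fig.savefig(path)
    plt.close(fig)                    # release the figure once it is saved
    saved_paths.append(path)

print(saved_paths)
```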
I've found that if interactive mode is turned off, seaborn plots the heatmap normally.

forming histogram plots in python

Suppose I want to plot two histogram subplots in the same window in Python, one below the other. The data for these histograms will be read from a file containing a table with attributes A and B.
In the same window, I need a plot of A vs the number of each A and, directly below it, a plot of B vs the number of each B. So if the attributes were height and weight, we'd have a graph of height against the number of people with that height and, below it, a separate graph of weight against the number of people with that weight.
import numpy as np; import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
frame = pd.read_csv('data.data', header=None)
subplot.hist(frame['A'], frame['A.count()'])
subplot.hist(frame['B'], frame['B.count()'])
Thanks for any help!
Using pandas you can make histograms like this:
import numpy as np; import pandas as pd
import matplotlib.pyplot as plt
frame = pd.read_csv('data.csv')
frame.hist(layout = (2,1))
plt.show()
I'm confused by the second part of the question. Do you want four separate subplots?
You can do this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#df = pd.read_csv('data.data', header=None)
# randint's upper bound is exclusive, so 11 gives integers from 0 to 10
df = pd.DataFrame({'A': np.random.randint(0, 11, 30),
                   'B': np.random.randint(0, 11, 30)})
print(df['A'])
ax1 = plt.subplot(211)
ax1.set_title('A')
ax1.set_ylabel('number of people')
ax1.set_xlabel('height')
ax2 = plt.subplot(212)
ax2.set_title('B')
ax2.set_ylabel('number of people')
ax2.set_xlabel('weight')
ax1.hist(df['A'])
ax2.hist(df['B'])
plt.tight_layout()
plt.show()
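If the goal is literally one bar per distinct value ("the number of each A"), the default bins can group neighbouring values together. A possible variation, assuming the attributes are small integers as in the sample data above, is to place one bin around each integer. The random data here is again a placeholder for the real file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: 30 integers from 0 to 10 for each attribute
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.integers(0, 11, size=30),
    "B": rng.integers(0, 11, size=30),
})

# Bin edges at -0.5, 0.5, ..., 10.5 give one bin centred on each integer
bins = np.arange(-0.5, 11.5, 1.0)

# Two rows, one column: A on top, B directly below
fig, (ax_a, ax_b) = plt.subplots(2, 1)
counts_a, _, _ = ax_a.hist(df["A"], bins=bins)
counts_b, _, _ = ax_b.hist(df["B"], bins=bins)
ax_a.set_xlabel("A")
ax_a.set_ylabel("count")
ax_b.set_xlabel("B")
ax_b.set_ylabel("count")
fig.tight_layout()

print(counts_a.sum(), counts_b.sum())
```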
