Seaborn automatically hides yticks if there are too many yticks - python

I'm using seaborn to draw a heatmap. When there are too many yticks, some of them are hidden automatically. In the resulting plot, the yticks only show 1, 3, 5, 7, ..., 31, 33.
How can I make seaborn or matplotlib show all of them, i.e. 1, 2, 3, 4, ..., 31, 32, 33, 34?
My code is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
y = np.random.randint(1, 100, 510)
y = y.reshape((34,15))
df = pd.DataFrame(y, columns=[x for x in 'wwwwwwwwwwwwwww'], index=['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34'])
sns.heatmap(df, annot=True)
plt.yticks(rotation=0)
plt.show()

Seaborn's heatmap provides the following arguments:
xticklabels, yticklabels : “auto”, bool, list-like, or int, optional
If True, plot the column names of the dataframe. If False, don’t plot the column names. If list-like, plot these alternate labels as the xticklabels. If an integer, use the column names but plot only every n label. If “auto”, try to densely plot non-overlapping labels.
Hence the easiest solution is to add yticklabels=1 as an argument, so that every label is plotted.
sns.heatmap(df, annot=True, yticklabels=1)
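One way to confirm that every row label is kept is to count the tick labels on the Axes after plotting. The following is only a minimal sketch, assuming a 34x15 random frame like the one in the question; the column names `c0` to `c14` and the fixed seed are illustrative choices that are not part of the original code.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative data: 34 rows labeled '1'..'34', 15 columns (names are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(1, 100, size=(34, 15)),
    index=[str(i) for i in range(1, 35)],
    columns=[f"c{j}" for j in range(15)],
)

fig, ax = plt.subplots(figsize=(8, 10))
# yticklabels=1 asks seaborn to draw every row label instead of thinning them.
sns.heatmap(df, annot=True, yticklabels=1, ax=ax)
ax.tick_params(axis="y", labelrotation=0)

# Collect the label texts that were placed on the y axis.
labels = [t.get_text() for t in ax.get_yticklabels()]
print(len(labels))
```

Add plt.show() at the end to display the figure. If the labels still collide visually, a taller figsize is one option.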

Related

How to sync color between Seaborn and pandas pie plot

I am struggling with syncing colors between a seaborn.countplot and a pandas.DataFrame.plot pie plot.
I found a similar question on SO, but its solution does not work with a pie chart, as it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched the documentation, but all I could find is that I can set a colormap and a palette, and those were not in sync in the end either:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the argument order to sns.countplot(). This changes the order in which seaborn draws the values, and as a consequence the colours of both plots will match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Explanation: Colors are assigned in order. So, if the bars in sns.countplot are in a different order than the wedges in the pie plot, the two plots will use different colors for the same label.
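The ordering difference can be seen without plotting anything. This is a small sketch on a made-up sample column (the values are hypothetical, not taken from the CSV); it assumes, as I understand seaborn's behavior for string data, that countplot without order= draws categories in order of first appearance, while value_counts() sorts by frequency.

```python
import pandas as pd

# Hypothetical sample column; the real data comes from the CSV in the question.
s = pd.Series(["Bronx", "Queens", "Queens", "Brooklyn", "Queens", "Brooklyn"])

# Order used by the pie plot: value_counts() sorts by count, largest first.
pie_order = list(s.value_counts().index)

# Order of first appearance in the data (assumed countplot default for strings).
first_seen_order = list(s.unique())

print(pie_order)
print(first_seen_order)
```

Because the two lists differ, the n-th color of the cycle lands on different labels in each plot; passing order=s.value_counts().index makes them agree.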
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (default is .75). To add text above the bars, the latest matplotlib versions have a new function bar_label.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    ax[1].bar_label(ax[1].containers[0])
Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
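A rough sketch of the customized-colors variant could look like the following. The small DataFrame and the color list are hypothetical stand-ins (the original reads a CSV), and the code assumes a matplotlib version that provides bar_label.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for one categorical column of the original CSV.
df = pd.DataFrame({"Borough": ["Bronx"] * 3 + ["Queens"] * 2 + ["Staten Island"]})
counts = df["Borough"].value_counts()

# Replace spaces by newlines so long names such as "Staten Island" use two lines.
labels = [name.replace(" ", "\n") for name in counts.index]

# Hypothetical color list, one entry per category, shared by both plots.
colors = ["crimson", "gold", "teal"]

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
wedges, texts, autotexts = ax[0].pie(
    counts.values, labels=labels, colors=colors, autopct="%.2f%%"
)
# Newer seaborn versions may warn that palette should be combined with hue.
sns.barplot(x=labels, y=counts.values, palette=colors, saturation=1, ax=ax[1])
for container in ax[1].containers:
    ax[1].bar_label(container)
fig.tight_layout()

# Face colors of both plots, in drawing order, for comparison.
pie_colors = [w.get_facecolor() for w in wedges]
bar_colors = [p.get_facecolor() for p in ax[1].patches]
```

Both plots take their order from the same counts object, so the i-th color is applied to the same label in each. Add plt.show() to display the figure.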

Updating chart format

I would like to change this from a straight regression line to a curve. I would also like the line to reach both sides of the graph. Here is my code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = {'Days': [5, 10, 15, 20],
        'Impact': [33.7561, 30.6281, 29.5748, 29.0482]}
a = pd.DataFrame(data, columns=['Days', 'Impact'])
print(a)
ax = sns.barplot(data=a, x='Days', y='Impact', color='lightblue' )
# put bars in background:
for c in ax.patches:
    c.set_zorder(0)
# plot regplot with numbers 0,..,len(a) as x value
ax = sns.regplot(x=np.arange(0,len(a)), y=a['Impact'], marker="+")
sns.despine(offset=10, trim=False)
ax.set_ylabel("")
ax.set_xticklabels(['5', '10','15','20'])
plt.show()
Alternatively, I would prefer to do it in matplotlib as a scatter plot instead of a bar chart. Here is an example in Excel, but ideally the curve would extend at least a little beyond the outermost markers.
Can anyone help?

Multiple histogram graphs with Seaborn

Plotting with matplotlib, I get this layout of 4 histograms:
Using seaborn, I get exactly the graph I need, but I cannot replicate it to show 4 at a time:
I want 4 of the seaborn graphs (image 2) arranged as in image 1 (4 at a time, with the calculations I made with seaborn).
My seaborn code is the following:
import os
import re
import time
import ipdb
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
path_file = os.path.join(BASE_DIR, 'camel_product_list.csv')
gapminder = pd.read_csv(path_file)
print(gapminder.head())
df = gapminder
sns.distplot(df['average_histogram_ssim'], hist=True, kde = False, label='All values')
df = gapminder[gapminder.color == 'green']
# sns.distplot(df['lifeExp'], hist = True, kde = True, label='Only Matches')
sns.distplot(df['average_histogram_ssim'],
             hist_kws={"histtype": "step", "linewidth": 3,
                       "alpha": 1, "color": "b"},
             kde=False, label='Only Matches')
# Plot formatting
plt.legend(prop={'size': 12})
plt.title('ratio_image SSIM')
plt.xlabel('Data Range')
plt.ylabel('Density')
plt.show()
The names of the columns of the dataframe are:
'ratio_text','ratio_image', 'ratio_hist', 'ratio_sub', 'color'
I'm using the color column as a filter.
How can I get the 4 seaborn plots for 'ratio_text', 'ratio_image', 'ratio_hist' and 'ratio_sub', each showing all colors and the green color only?
First define your grid of subplots and assign its four axes to an array ax:
fig, ax = plt.subplots(2, 2)
Now you can pass the axes you want to plot on to the seaborn plotting function with the ax keyword argument, e.g. for the first plot:
sns.distplot(df['average_histogram_ssim'], hist=True, kde=False, label='All values',
             ax=ax[0, 0])
Same with ax=ax[0, 1] for the upper right plots, and so on.

Plotting categorical variable against numeric variable in matplotlib

My DataFrame's structure
trx.columns
Index(['dest', 'orig', 'timestamp', 'transcode', 'amount'], dtype='object')
I'm trying to plot transcode (transaction code) against amount to see how much money is spent per transaction. I made sure to convert transcode to a categorical type, as seen below.
trx['transcode']
...
Name: transcode, Length: 21893, dtype: category
Categories (3, int64): [1, 17, 99]
The result I get from doing plt.scatter(trx['transcode'], trx['amount']) is
Scatter plot
While the above plot is not entirely wrong, I would like the X axis to contain just the three possible values of transcode [1, 17, 99] instead of the entire [1, 100] range.
Thanks!
Since matplotlib 2.1 you can plot categorical variables by using strings, i.e. if you provide the column for the x values as strings, matplotlib will treat them as categories.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"x": np.random.choice([1, 17, 99], size=100),
                   "y": np.random.rand(100) * 100})
plt.scatter(df["x"].astype(str), df["y"])
plt.margins(x=0.5)
plt.show()
To obtain the same result in matplotlib <= 2.0, one would plot against an integer index instead and relabel the ticks.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"x": np.random.choice([1, 17, 99], size=100),
                   "y": np.random.rand(100) * 100})
u, inv = np.unique(df["x"], return_inverse=True)
plt.scatter(inv, df["y"])
plt.xticks(range(len(u)),u)
plt.margins(x=0.5)
plt.show()
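To see why the index-based approach works, it may help to look at what np.unique returns with return_inverse=True. This is a small worked example on made-up values, not on the original data:

```python
import numpy as np

# Made-up codes drawn from the three categories used above.
x = np.array([17, 1, 99, 1, 17])

# u holds the sorted unique values; inv maps each element of x to its index in u.
u, inv = np.unique(x, return_inverse=True)

print(u)    # [ 1 17 99]
print(inv)  # [1 0 2 0 1]
```

Plotting against inv places the points at positions 0, 1 and 2, and plt.xticks(range(len(u)), u) then relabels those positions with the original values.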
The same plot can be obtained using seaborn's stripplot:
import seaborn as sns
sns.stripplot(x="x", y="y", data=df)
And a potentially nicer representation can be done via seaborn's swarmplot:
sns.swarmplot(x="x", y="y", data=df)

Remove some x labels with Seaborn

In the screenshot below, all my x-labels are overlapping each other.
g = sns.factorplot(x='Age', y='PassengerId', hue='Survived', col='Sex', kind='strip', data=train);
I know that I can remove all the labels by calling g.set(xticks=[]), but is there a way to just show some of the Age labels, like 0, 20, 40, 60, 80?
I am not sure why there aren't sensible default ticks and values like there are on the y-axis. At any rate you can do something like the following:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
titanic = sns.load_dataset('titanic')
# factorplot was renamed to catplot in seaborn 0.9
sns.catplot(x='age', y='fare', hue='survived', col='sex', data=titanic, kind='strip')
ax = plt.gca()
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
ax.xaxis.set_major_locator(ticker.MultipleLocator(base=20))
plt.show()
