Plotting seaborn histplot bar_label with condition - python

I want to plot a seaborn histogram with labels to show the values of each bar. I only want to show the non-zero values, but I'm not sure how to do it. My MWE is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
xlist = 900+200*np.random.randn(50,1)
fig, ax = plt.subplots()
y = sns.histplot(data=xlist, element="bars", bins=20, stat='count', legend=False)
y.set(xlabel='total time (ms)')
y.bar_label(y.containers[0])
## y.bar_label(y.containers[0][y.containers[0]!=0])
plt.show()
The resulting graph labels every bar, including the empty ones, and I want to remove all the 0 labels.

Update
A better version, suggested by @BigBen:
labels = [str(v) if v else '' for v in y.containers[0].datavalues]
y.bar_label(y.containers[0], labels=labels)
Try:
labels = []
for p in y.patches:
    # Use an empty string for bars whose height is zero
    h = p.get_height()
    labels.append(str(h) if h else '')
y.bar_label(y.containers[0], labels=labels)
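The same idea can be wrapped in a small helper. This is only a sketch: the names `nonzero_labels` and `label_nonzero_bars` are hypothetical, and the `{:g}` format is an assumption chosen so that a float height such as 3.0 is shown as 3 rather than 3.0.

```python
def nonzero_labels(values, fmt="{:g}"):
    """Return one label per bar, using an empty string for zero-height bars."""
    return [fmt.format(v) if v else "" for v in values]


def label_nonzero_bars(ax, container):
    """Attach labels to a BarContainer, skipping bars whose value is zero."""
    ax.bar_label(container, labels=nonzero_labels(container.datavalues))
```

With the plot from the question, this would be called as `label_nonzero_bars(y, y.containers[0])`.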

Related

Matplotlib scatter plot legend display problem

I'm struggling to write scatter plot code that includes a legend keyed to 'pitch', as shown below.
The variable 'u' holds the unique pitch values, and there are 19 of them.
So the legend should have 19 labels, but it only shows 10. The color assignment also looks wrong (label '15' should be dark blue, but it isn't).
What seems to be the problem?
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import matplotlib.colors as colors
import matplotlib.cm as cm
import numpy as np
df = pd.DataFrame({
"X" : [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37],
"Y" : [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37],
"pitch":[10,10,20,20,30,40,50,50,60,60,60,70,70,70,70,80,80,80,100,150,1,2,3,4,5,6,7,8,9,10,3,4,5,8,9,10,3],
})
color = cm.jet
u, div = np.unique(df.pitch.values, return_inverse=True)
colorlist = [colors.rgb2hex(color(i)) for i in np.linspace(1, 0, len(u))]
cmap = ListedColormap(colorlist)
fig,ax = plt.subplots()
scatter = plt.scatter(df['X'],df['Y'], c=div, cmap=cmap)
plt.legend(scatter.legend_elements()[0], u, loc=2)
plt.show()
fig,ax = plt.subplots(figsize=(12,8))
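for i, pitch in enumerate(u):
    # One scatter call per pitch, so each pitch gets its own legend entry
    df_p = df[df['pitch'] == pitch]
    # A single color is passed via color=, so no cmap is needed here
    scatter = ax.scatter(df_p['X'], df_p['Y'], color=colorlist[i], label=pitch)
ax.legend(loc=2)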
plt.show()
You need to replace the plt.legend(scatter...) line with the following. See the Matplotlib documentation for legend_elements for details.
plt.legend(scatter.legend_elements(prop='colors', num=len(colorlist))[0], u, loc=2)
Output Plot
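As far as I can tell from the Matplotlib documentation, legend_elements() with its default num="auto" picks a limited set of evenly spaced values once there are more than 9 unique ones, which would explain the 10 entries and the '15' label. An alternative that does not depend on that behavior is to build one proxy handle per unique pitch. The sketch below uses hypothetical helper names (`unique_with_codes`, `evenly_spaced`, `scatter_with_full_legend`) and plain-Python stand-ins for np.unique and np.linspace.

```python
def unique_with_codes(values):
    """Return (sorted unique values, index of each value in that list)."""
    uniques = sorted(set(values))
    position = {v: i for i, v in enumerate(uniques)}
    return uniques, [position[v] for v in values]


def evenly_spaced(n, start=1.0, stop=0.0):
    """n evenly spaced numbers from start to stop inclusive (like np.linspace)."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def scatter_with_full_legend(x, y, categories, cmap_name="jet"):
    """Scatter plot with exactly one legend entry per unique category."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.lines import Line2D

    uniques, codes = unique_with_codes(categories)
    cmap = plt.get_cmap(cmap_name)
    colors = [cmap(t) for t in evenly_spaced(len(uniques))]

    fig, ax = plt.subplots()
    ax.scatter(x, y, c=[colors[i] for i in codes])
    # Proxy artists: one marker per unique value, colored like its points.
    handles = [Line2D([], [], marker="o", linestyle="", color=c, label=str(u))
               for u, c in zip(uniques, colors)]
    ax.legend(handles=handles, loc=2)
    return fig, ax
```

With the question's data this would be called as `scatter_with_full_legend(df['X'], df['Y'], df['pitch'])`, followed by `plt.show()`.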

How to plot colors for two variables in scatterplot in python?

I have a dataset with two different variables, and I want to give each one a different color. Can anyone help, please? Link to my dataset: "https://github.com/mayuripandey/Data-Analysis/blob/main/word.csv"
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x = df['Friends Network-metrics'], y = df['Number of Followers'],cmap = "magma")
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
It's not very clear what you want to do here, but here is a solution that may help.
You could use seaborn to color the points by a variable. Otherwise, you'd need to iterate through the points to set each color, or create a new column that conditionally assigns a color to each value.
I don't know which variable you want to color by, but you just pass it as the hue parameter:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
# Use the 'hue' argument to provide a factor variable
sns.lmplot(x='Friends Network-metrics',
           y='Number of Followers',
           height=8,
           aspect=.8,
           data=df,
           fit_reg=False,
           hue='Sentiment',
           legend=True)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
This can give you a view like this:
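The other option mentioned above, a column that assigns a color to each value, could be sketched roughly as follows. The helper name `colors_for` and the default palette are assumptions for illustration; the actual values in the 'Sentiment' column are not known here.

```python
def colors_for(values, palette=("tab:blue", "tab:orange", "tab:green")):
    """Map each distinct value to a color, cycling through the palette."""
    lookup = {}
    for v in values:
        if v not in lookup:
            # First time this value is seen: give it the next palette color.
            lookup[v] = palette[len(lookup) % len(palette)]
    return [lookup[v] for v in values]
```

The result can be passed straight to matplotlib, for example `ax.scatter(df['Friends Network-metrics'], df['Number of Followers'], c=colors_for(df['Sentiment']))`.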
If you were looking for a color scale for one of the variables instead, you would do the following. However, the maximum value is so large that the color range gets stretched and the visual is not very effective:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
fig, ax = plt.subplots(figsize=(10, 6))
g = ax.scatter(x=df['Friends Network-metrics'],
               y=df['Number of Followers'],
               c=df['Friends Network-metrics'],
               cmap="magma")
fig.colorbar(g)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
So you could adjust the scale (I'd also add edgecolors='black', as it's hard to see the light-colored points):
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mayuripandey/Data-Analysis/main/word.csv')
fig, ax = plt.subplots(figsize=(10, 6))
g = ax.scatter(x=df['Friends Network-metrics'],
               y=df['Number of Followers'],
               c=df['Friends Network-metrics'],
               cmap="magma",
               vmin=0, vmax=10000,
               edgecolors='black')
fig.colorbar(g)
plt.xlabel("Friends Network-metrics")
plt.ylabel("Number of Followers")
plt.show()
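Conceptually, vmin and vmax set the two ends of the color scale: values are rescaled linearly between them, and anything above vmax gets the colormap's top color. The sketch below only illustrates that idea in plain Python; it is not matplotlib's implementation, and `to_unit_interval` is a hypothetical name.

```python
def to_unit_interval(value, vmin, vmax):
    """Linearly rescale value so vmin -> 0 and vmax -> 1, clamping outside values."""
    scaled = (value - vmin) / (vmax - vmin)
    return min(1.0, max(0.0, scaled))
```

With vmin=0 and vmax=10000, a value of 5000 lands in the middle of the colormap, while a very large outlier is drawn in the same color as 10000.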

formatting the x axis to % and the y axis to £

I have used this python code in Power BI:
colors = ["#FF0B04", "#ffbf00", "#228800"]
sns.set_palette(sns.color_palette(colors))
g = sns.JointGrid(data=dataset, x="Score", y="Profit", hue= 'Score Bands')
g.plot(sns.scatterplot, sns.histplot,legend=False)
plt.tight_layout()
plt.show()
to create the plot:
Can anybody please help on formatting the x axis to % and the y axis to £?
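Using sample data from the official site, I took the joint axes from the seaborn grid as ax, and then used tick formatters to set the percentage display and a currency symbol in Unicode. The code uses the euro sign ("\u20ac"); for the pound sign asked about in the question, use "\u00a3" instead.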
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
tips = sns.load_dataset("tips")
colors = ["#FF0B04", "#ffbf00", "#228800"]
sns.set_palette(sns.color_palette(colors))
g = sns.JointGrid(data=tips, x="total_bill", y="tip", hue= 'time')
g.plot(sns.scatterplot, sns.histplot,legend=False)
ax = g.ax_joint
ax.xaxis.set_major_formatter(ticker.PercentFormatter(xmax=5))
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter("\u20ac%d"))
plt.tight_layout()
plt.show()
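Since the question asked for £, here is a sketch of the same approach with the pound sign and thousands separators, using FuncFormatter from matplotlib.ticker. The helper names are hypothetical, and whether the scores are fractions (0 to 1) or already on a 0 to 100 scale is an assumption controlled by `full_scale`. Note that in the code above, PercentFormatter(xmax=5) means a data value of 5 is displayed as 100%.

```python
def as_pounds(value, pos=None):
    """Format a tick value as whole pounds, e.g. 1500 -> '£1,500'."""
    return f"\u00a3{value:,.0f}"


def as_percent(value, pos=None, full_scale=1.0):
    """Format a tick value as a percentage of full_scale, e.g. 0.25 -> '25%'."""
    return f"{100 * value / full_scale:.0f}%"


def format_axes(ax):
    """Apply percent ticks to the x axis and pound ticks to the y axis."""
    # Imported here so the formatting helpers can be used without matplotlib.
    from matplotlib.ticker import FuncFormatter

    ax.xaxis.set_major_formatter(FuncFormatter(as_percent))
    ax.yaxis.set_major_formatter(FuncFormatter(as_pounds))
```

For the JointGrid in the question, this would be applied as `format_axes(g.ax_joint)` before `plt.show()`.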

Correlation values in pairplot()

Is there a way to show pair-correlation values with seaborn.pairplot(), as in the example below (created with ggpairs() in R)? I can make the plots using the attached code, but cannot add the correlations. Thanks
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
# remove upper triangle plots
for i, j in zip(*np.triu_indices_from(g.axes, 1)):
    g.axes[i, j].set_visible(False)
plt.show()
If you use PairGrid instead of pairplot, then you can pass a custom function that would calculate the correlation coefficient and display it on the graph:
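import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr

def reg_coef(x, y, label=None, color=None, **kwargs):
    # Write the Pearson correlation coefficient in the middle of the current axes
    ax = plt.gca()
    r, p = pearsonr(x, y)
    ax.annotate('r = {:.2f}'.format(r), xy=(0.5, 0.5), xycoords='axes fraction', ha='center')
    ax.set_axis_off()

iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map_diag(sns.histplot, kde=True)  # sns.distplot is deprecated in recent seaborn
g.map_lower(sns.regplot)
g.map_upper(reg_coef)
plt.show()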
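For reference, the coefficient that pearsonr returns as its first value can also be computed by hand. This is a plain-Python sketch under a hypothetical name, `pearson_r`; it omits the p-value and does not handle constant inputs (zero variance).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

It could stand in for the `pearsonr(x, y)` call inside the custom function if SciPy is not available.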

How can I add jitter to my seaborn and matplot plots?

I am trying to add jitter to my plots using seaborn and matplotlib. I am getting mixed information from what I am reading online: some sources say custom code is needed, while others show it as being as simple as jitter = True. Is there another library or something else that I should be importing that I am not aware of? Below is the code that I am running and trying to add jitter to:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
headcount_df.describe()
%matplotlib inline
ax = plt.figure(figsize=(12, 6)).gca() # define axis
headcount_df.plot.scatter(x = 'Hour', y = 'TablesOpen', ax = ax, alpha = 0.2)
# auto_price.plot(kind = 'scatter', x = 'city-mpg', y = 'price', ax = ax)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_xlabel('Hour')
ax = sns.kdeplot(headcount_df.loc[:, ['TablesOpen', 'Hour']], shade = True, cmap = 'PuBu')
headcount_df.plot.scatter(x = 'Hour', y = 'TablesOpen', ax = ax, jitter = True)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen')# Set text for y axis
ax.set_xlabel('Hour')
I receive the error: AttributeError: 'PathCollection' object has no property 'jitter' when trying to add the jitter. Any help or more information on this would be much appreciated
To add jitter to a scatter plot, first get a handle to the collection that contains the scatter dots. When a scatter plot is just created on an ax, ax.collections[-1] will be the desired collection.
Calling get_offsets() on the collection gets all the xy coordinates of the dots. Add some small random number to each of them. As in this case all coordinates are integers, adding a random number between 0 and 1 spreads the dots out evenly.
In this case the number of dots is very large. To better see where the dots are concentrated, they can be made very small (marker=',', linewidth=0, s=1) and very transparent (e.g. alpha=0.1).
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
fig, ax = plt.subplots(figsize=(12, 6))
headcount_df.plot.scatter(x='Hour', y='TablesOpen', marker=',', linewidth=0, s=1, alpha=.1, color='crimson', ax=ax)
dots = ax.collections[-1]
offsets = dots.get_offsets()
jittered_offsets = offsets + np.random.uniform(0, 1, offsets.shape)
dots.set_offsets(jittered_offsets)
ax.set_title('Hour vs TablesOpen') # Give the plot a main title
ax.set_ylabel('TablesOpen') # Set text for y axis
ax.set_xlabel('Hour')
ax.set_xticks(range(25))
ax.autoscale(enable=True, tight=True)
plt.tight_layout()
plt.show()
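The jitter step on its own can be sketched as a small standard-library helper. The name `jitter` and its parameters are hypothetical; the default range of 0 to 1 mirrors the np.random.uniform(0, 1, ...) call above.

```python
import random


def jitter(points, low=0.0, high=1.0, seed=None):
    """Return a copy of (x, y) points with uniform noise added to each coordinate."""
    rng = random.Random(seed)
    return [(x + rng.uniform(low, high), y + rng.uniform(low, high))
            for x, y in points]
```

It would be used as `dots.set_offsets(jitter(dots.get_offsets()))`. Passing low=-0.5 and high=0.5 instead keeps each cloud of dots centered on its original integer coordinate, which may be preferable if the tick positions should stay in the middle of each group.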
Because there are so many points, drawing the 2D KDE takes a long time. The time can be reduced by taking a random sample of the rows. Note that to draw a 2D KDE, recent versions of seaborn expect each column to be passed as a separate argument (x and y).
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
fig, ax = plt.subplots(figsize=(12, 6))
N = 5000
rand_sel_df = headcount_df.iloc[np.random.choice(range(len(headcount_df)), N)]
ax = sns.kdeplot(x=rand_sel_df['Hour'], y=rand_sel_df['TablesOpen'], fill=True, cmap='PuBu', ax=ax)  # recent seaborn: x/y as keywords, fill replaces shade
ax.set_title('Hour vs TablesOpen')
ax.set_xticks(range(25))
plt.tight_layout()
plt.show()
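One detail of the sampling step: np.random.choice draws with replacement by default, so the same row can be picked more than once. If distinct rows are wanted, the selection could be sketched as below; `sample_row_positions` is a hypothetical helper that uses only the standard library.

```python
import random


def sample_row_positions(n_rows, n_samples, seed=None):
    """Pick n_samples distinct row positions from range(n_rows)."""
    rng = random.Random(seed)
    # Never ask for more rows than exist.
    return rng.sample(range(n_rows), min(n_samples, n_rows))
```

The positions can be used as `headcount_df.iloc[sample_row_positions(len(headcount_df), 5000)]`. pandas also offers `DataFrame.sample(n=5000, random_state=...)`, which samples without replacement by default.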
