How to add bar labels using Matplotlib [duplicate] - python

This question already has answers here:
How to add value labels on a bar chart
I have the following data frame.
_id message_date_time country
0 {'$oid': '61f7dfd24b11720cdbda5c86'} {'$date': '2021-12-24T12:30:09Z'} RUS
1 {'$oid': '61f7eb7b4b11720cdbda9322'} {'$date': '2021-12-20T21:58:20Z'} RUS
2 {'$oid': '61f7fdad4b11720cdbdb0beb'} {'$date': '2021-12-15T15:29:13Z'} RUS
3 {'$oid': '61f8234f4b11720cdbdbec52'} {'$date': '2021-12-10T00:03:43Z'} USA
4 {'$oid': '61f82c274b11720cdbdc21c7'} {'$date': '2021-12-09T15:10:35Z'} USA
The value counts of the country column are:
df["country"].value_counts()
RUS 156
USA 139
FRA 19
GBR 11
AUT 9
AUS 8
DEU 7
CAN 4
BLR 3
ROU 3
GRC 3
NOR 3
NLD 3
SWE 2
ESP 2
CHE 2
POL 1
HUN 1
DNK 1
ITA 1
ISL 1
BIH 1
Name: country, dtype: int64
I'm trying to plot each country against its frequency using the following code:
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 8))
plt.xlabel("Country")
plt.ylabel("Frequency")
plt.hist(df["country"])
plt.show()
What I need is to show the country frequency above every bar and keep a very small space between the bars.

Arguably the easiest way is to use plt.bar(). For example:
import matplotlib.pyplot as plt
counts = df["country"].value_counts()
names, values = counts.index.tolist(), counts.values.tolist()
plt.bar(names, values)
height_above_bar = 0.05  # distance of the count above the bar, in data units
fontsize = 12  # the font size you want the count to have
for i, val in enumerate(values):
    # Categorical bars sit at x = 0, 1, 2, ..., so i is the centre of the i-th bar
    plt.text(i, val + height_above_bar, str(val), fontsize=fontsize, ha="center", va="bottom")
plt.show()
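On Matplotlib 3.4 or newer, `Axes.bar_label` can place the counts for you, and the `width` argument of `bar` controls the gap between bars. A minimal sketch; the hard-coded Series holds the top five counts from the question and stands in for `df["country"].value_counts()`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for df["country"].value_counts(): the top five counts from the question
counts = pd.Series({"RUS": 156, "USA": 139, "FRA": 19, "GBR": 11, "AUT": 9})

fig, ax = plt.subplots(figsize=(15, 8))
# width=0.9 leaves only a small gap between neighbouring bars (the default is 0.8)
bars = ax.bar(counts.index, counts.values, width=0.9)
# bar_label writes each bar's value just above it; padding is measured in points
labels = ax.bar_label(bars, padding=3, fontsize=12)
ax.set_xlabel("Country")
ax.set_ylabel("Frequency")
plt.show()
```

Because `padding` is in points rather than data units, the space between bar and label stays the same size whatever the counts are, unlike a fixed offset such as 0.05 added to the bar height.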

Seaborn's countplot is well suited to checking how often each value occurs in a series, but the same labelling works on a plain plt.bar chart built from the value counts:
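import matplotlib.pyplot as plt
counts = df["country"].value_counts()
plt.figure(figsize=(20, 5))
bars = plt.bar(counts.index, counts.values)
for bar in bars.patches:
    # Put each label at the top centre of its bar
    plt.annotate(f"{bar.get_height():g}", xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()), va="bottom", ha="center")
plt.show()
The output is a bar chart with each country's count printed on top of its bar.
If you want something other than the height on the graph, change the first argument of annotate (the label text) to a value of your choice. Older Matplotlib versions named this argument s; it was renamed to text in Matplotlib 3.3, so passing it positionally works in both.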

Related

Python Matplotlib bars subplots by Category and Aggregation

I have a table like this:
import pandas as pd
data = {'Category':["Toys","Toys","Toys","Toys","Food","Food","Food","Food","Food","Food","Food","Food","Furniture","Furniture","Furniture"],
'Product':["AA","BB","CC","DD","SSS","DDD","FFF","RRR","EEE","WWW","LLLLL","PPPPPP","LPO","NHY","MKO"],
'QTY':[100,200,300,50,20,800,300,450,150,320,400,1000,150,900,1150]}
df = pd.DataFrame(data)
df
Out:
Category Product QTY
0 Toys AA 100
1 Toys BB 200
2 Toys CC 300
3 Toys DD 50
4 Food SSS 20
5 Food DDD 800
6 Food FFF 300
7 Food RRR 450
8 Food EEE 150
9 Food WWW 320
10 Food LLLLL 400
11 Food PPPPPP 1000
12 Furniture LPO 150
13 Furniture NHY 900
14 Furniture MKO 1150
So I need bar subplots, one per category, showing the summed QTY of each product in that category.
My problem is that I can't figure out how to combine categories, series, and aggregation.
I managed to create 3 subplots in a 2x2 grid (the fourth always stays blank), but I can't combine them:
import matplotlib.pyplot as plt
fig, axarr = plt.subplots(2, 2, figsize=(12, 8))
df['Category'].value_counts().plot.bar(
ax=axarr[0][0], fontsize=12, color='b'
)
axarr[0][0].set_title("Category", fontsize=18)
df['Product'].value_counts().plot.bar(
ax=axarr[1][0], fontsize=12, color='b'
)
axarr[1][0].set_title("Product", fontsize=18)
df['QTY'].value_counts().plot.bar(
ax=axarr[1][1], fontsize=12, color='b'
)
axarr[1][1].set_title("QTY", fontsize=18)
plt.subplots_adjust(hspace=.3)
plt.show()
What do I need to add to combine them?
This would be a lot easier with seaborn and FacetGrid:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = {'Category':["Toys","Toys","Toys","Toys","Food","Food","Food","Food","Food","Food","Food","Food","Furniture","Furniture","Furniture"],
'Product':["AA","BB","CC","DD","SSS","DDD","FFF","RRR","EEE","WWW","LLLLL","PPPPPP","LPO","NHY","MKO"],
'QTY':[100,200,300,50,20,800,300,450,150,320,400,1000,150,900,1150]}
df = pd.DataFrame(data)
# One facet per category; sharex=False gives each facet its own product axis
g = sns.FacetGrid(df, col='Category', sharex=False, sharey=False, col_wrap=2, height=3, aspect=1.5)
g.map_dataframe(sns.barplot, x='Product', y='QTY')
plt.show()
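Without seaborn, the same one-subplot-per-category layout can be built with a pandas `groupby` and a row of Matplotlib subplots. A sketch using the question's data; summing `QTY` per product is my reading of "Sum Products" (here it changes nothing, since each product appears once):

```python
import matplotlib.pyplot as plt
import pandas as pd

data = {'Category': ["Toys", "Toys", "Toys", "Toys", "Food", "Food", "Food", "Food", "Food", "Food", "Food", "Food", "Furniture", "Furniture", "Furniture"],
        'Product': ["AA", "BB", "CC", "DD", "SSS", "DDD", "FFF", "RRR", "EEE", "WWW", "LLLLL", "PPPPPP", "LPO", "NHY", "MKO"],
        'QTY': [100, 200, 300, 50, 20, 800, 300, 450, 150, 320, 400, 1000, 150, 900, 1150]}
df = pd.DataFrame(data)

categories = df["Category"].unique()  # keeps first-appearance order: Toys, Food, Furniture
fig, axes = plt.subplots(1, len(categories), figsize=(15, 4))
for ax, category in zip(axes, categories):
    # Total QTY for each product within this category only
    totals = df[df["Category"] == category].groupby("Product")["QTY"].sum()
    totals.plot.bar(ax=ax, color="b", fontsize=12)
    ax.set_title(category, fontsize=18)
fig.tight_layout()
plt.show()
```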

Different binning for histplot as JointGrid (x, y) marginal plots

I have a pandas dataframe like this:
   Date        Weight  Year  Month  Day  Week  DayOfWeek
0  2017-11-13  76.1    2017  11     13   46    0
1  2017-11-14  76.2    2017  11     14   46    1
2  2017-11-15  76.6    2017  11     15   46    2
3  2017-11-16  77.1    2017  11     16   46    3
4  2017-11-17  76.7    2017  11     17   46    4
...
I created a JointGrid with:
g = sns.JointGrid(data=df,
x="Date",
y="Weight",
marginal_ticks=True,
height=6,
ratio=2,
space=.05)
Then I defined the joint and marginal plots:
g.plot_joint(sns.scatterplot,
hue=df["Year"],
alpha=.4,
legend=True)
g.plot_marginals(sns.histplot,
multiple="stack",
bins=20,
hue=df["Year"])
The result is a scatter plot with stacked marginal histograms, both using the same 20 bins.
Now the question is: is it possible to specify different binning for the two histplots that make up the x and y marginal plots?
I don't think there is a built-in way to do that, but you can plot directly on the marginal axes using the plotting function of your choice, like so:
import matplotlib.pyplot as plt
import seaborn as sns
penguins = sns.load_dataset('penguins')
data = penguins
x_col = "bill_length_mm"
y_col = "bill_depth_mm"
hue_col = "species"
g = sns.JointGrid(data=data, x=x_col, y=y_col, hue=hue_col)
g.plot_joint(sns.scatterplot)
# top marginal: 5 bins
sns.histplot(data=data, x=x_col, hue=hue_col, bins=5, ax=g.ax_marg_x, legend=False, multiple='stack')
# right marginal: 40 bins
sns.histplot(data=data, y=y_col, hue=hue_col, bins=40, ax=g.ax_marg_y, legend=False, multiple='stack')
plt.show()
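If seaborn's JointGrid isn't a requirement, the same layout can be assembled by hand with Matplotlib's `Figure.add_gridspec`; each marginal is then an ordinary `Axes.hist` call and takes its own `bins`. A sketch in which randomly generated values stand in for the question's Date/Weight columns (hypothetical data, and no hue stacking):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)    # hypothetical x values
y = rng.normal(70, 5, 500)   # hypothetical y values

fig = plt.figure(figsize=(6, 6))
# 2x2 grid: large joint axes bottom-left, marginals on top and to the right
gs = fig.add_gridspec(2, 2, width_ratios=(2, 1), height_ratios=(1, 2),
                      wspace=0.05, hspace=0.05)
ax_joint = fig.add_subplot(gs[1, 0])
ax_marg_x = fig.add_subplot(gs[0, 0], sharex=ax_joint)
ax_marg_y = fig.add_subplot(gs[1, 1], sharey=ax_joint)

ax_joint.scatter(x, y, alpha=0.4)
# Each marginal is an independent hist call, so the bin counts can differ
_, x_edges, x_patches = ax_marg_x.hist(x, bins=5)
_, y_edges, y_patches = ax_marg_y.hist(y, bins=40, orientation="horizontal")
# Hide tick labels that would duplicate the joint axes
ax_marg_x.tick_params(labelbottom=False)
ax_marg_y.tick_params(labelleft=False)
plt.show()
```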

Pandas Groupby plotting with unstack()

I have the following code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({
'type':['john','bill','john','bill','bill','bill','bill','john','john'],
'num':[1006,1004,1006,1004,1006,1006,1006,1004,1004],
'date':[2017,2016,2015,2017,2017,2013,2012,2013,2012],
'pos':[0,0,1,4,0,3,3,8,9],
'force':[5,2,7,10,6,12,4,7,8]})
fig, ax = plt.subplots()
grp = df.sort_values('date').groupby('type')
for name, group in grp:
    print(name)
    print(group)
    group.plot(x='date', y='force', label=name)
plt.show()
The result obtained is as follows:
bill
type num date pos force
6 bill 1006 2012 3 4
5 bill 1006 2013 3 12
1 bill 1004 2016 0 2
3 bill 1004 2017 4 10
4 bill 1006 2017 0 6
john
type num date pos force
8 john 1004 2012 9 8
7 john 1004 2013 8 7
2 john 1006 2015 1 7
0 john 1006 2017 0 5
Figures: force vs. date for bill; force vs. date for john.
How can I get 4 figures, each with 2 lines?
Fig1 (bill): line 1 = force vs. date for num 1004; line 2 = force vs. date for num 1006
Fig2 (bill): line 1 = pos vs. date for num 1004; line 2 = pos vs. date for num 1006
Fig3 (john): line 1 = force vs. date for num 1004; line 2 = force vs. date for num 1006
Fig4 (john): line 1 = pos vs. date for num 1004; line 2 = pos vs. date for num 1006
Let's try this:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({
'type':['john','bill','john','bill','bill','bill','bill','john','john'],
'num':[1006,1004,1006,1004,1006,1006,1006,1004,1004],
'date':[2017,2016,2015,2017,2017,2013,2012,2013,2012],
'pos':[0,0,1,4,0,3,3,8,9],
'force':[5,2,7,10,6,12,4,7,8]})
fig, ax = plt.subplots(2, 2)
axi = iter(ax.flatten())
# Group by the plain column name so `name` is a string (a one-element list yields tuples in pandas 2.x)
grp = df.sort_values('date').groupby('type')
for name, group in grp:
    # unstack() turns each num into its own column, i.e. one line per num
    group.set_index(['date', 'num'])['force'].unstack().plot(title=name + ' - force', ax=next(axi), legend=False)
    group.set_index(['date', 'num'])['pos'].unstack().plot(title=name + ' - pos', ax=next(axi), legend=False)
plt.tight_layout()
plt.legend(loc='upper center', bbox_to_anchor=(0, -.5), ncol=2)
plt.show()
Update, per a comment: to mark john's maximum force with a horizontal line:
dfj = df[df['type'] == 'john']
ax = dfj.set_index(['date','num'])['force'].unstack().plot(title='john - force', legend=False)
# Red reference line at john's maximum force
ax.axhline(y=dfj['force'].max(), color='red', alpha=.8)
@Scott Boston, thank you a lot for your help.
Unfortunately, after using the following code with big data to plot 2 lines:
for name, group in grp_new:
    axn = group.set_index(['date', 'num'])['pos'].unstack().plot(title=name + ' _pos', legend=False)
the plot looks like plot2Lines: the lines are not continuous. Plotting single lines worked fine.
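A likely cause, though the big dataset isn't shown: after `unstack()`, a date that has a value for one `num` but not the other gets `NaN` in the other column, and Matplotlib leaves a gap rather than drawing through `NaN`. A sketch that drops each column's `NaN`s before plotting, shown on the question's small DataFrame (`plot_lines_without_gaps` is a hypothetical helper name):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'type': ['john', 'bill', 'john', 'bill', 'bill', 'bill', 'bill', 'john', 'john'],
    'num': [1006, 1004, 1006, 1004, 1006, 1006, 1006, 1004, 1004],
    'date': [2017, 2016, 2015, 2017, 2017, 2013, 2012, 2013, 2012],
    'pos': [0, 0, 1, 4, 0, 3, 3, 8, 9],
    'force': [5, 2, 7, 10, 6, 12, 4, 7, 8]})

def plot_lines_without_gaps(group, column, ax):
    # Wide table: one column per num, NaN where that num has no row for a date
    wide = group.set_index(["date", "num"])[column].unstack().sort_index()
    for num in wide.columns:
        # Drop this num's NaNs so Matplotlib connects its remaining points
        series = wide[num].dropna()
        ax.plot(series.index, series.values, label=str(num))
    ax.legend(title="num")

fig, ax = plt.subplots()
plot_lines_without_gaps(df[df["type"] == "bill"], "pos", ax)
ax.set_title("bill - pos")
plt.show()
```

For bill's pos, num 1006 has no row for 2016, so plotting the unstacked frame directly breaks that line after 2013; here its 2013 and 2017 points are joined. Calling `.interpolate()` on the wide frame instead would also close the gaps, at the cost of adding interpolated values.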

How to set color range in Matplotlib?

I'm looking for help coding the color range and choosing the colors. Currently I get the default colormap, and I'm unsure how to change either the range or the colors it uses. I assume the code belongs in the plot call or near it. I want to replace the default purple-to-yellow colormap with a custom set of colors, while also setting the range manually.
Sales range from 0 to 15.
Code:
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
f, ax = plt.subplots(1, figsize=(12,12))
ax = AZ3.plot(column='Sales', ax=ax, edgecolor='black')
f.suptitle('AZ')
lims = plt.axis('equal')
patchList = []
# legend_dict (defined elsewhere) maps each legend label to a color
for key in legend_dict:
    data_key = mpatches.Patch(color=legend_dict[key], label=key)
    patchList.append(data_key)
plt.legend(handles=patchList, loc=3)
plt.savefig('legend.png', bbox_inches='tight')
for idx, row in AZ3.iterrows():
    # Label text passed positionally (the old s= keyword was renamed in Matplotlib 3.3)
    plt.annotate(row['Name'], xy=row['coords'],
                 horizontalalignment='center', color='white', size=12)
Sample Data:
FIPS Name State Sales
04007 Gila AZ 1
04027 Yuma AZ 10
04012 La Paz AZ 6
04019 Pima AZ 5
04009 Graham AZ 2
04021 Pinal AZ 7
04025 Yavapai AZ 3
04001 Apache AZ 8
04023 Santa AZ 9
04005 Coco AZ 0
04003 Cochise AZ 0
04011 Green AZ 0
04013 Maricopa AZ 15
04015 Mohave AZ 1
04017 Navajo AZ 4
Thanks,
Justin
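A minimal sketch of one approach using Matplotlib only: build a custom colormap with `LinearSegmentedColormap.from_list` and pin the range with `Normalize(vmin=0, vmax=15)`. `GeoDataFrame.plot` accepts `cmap`, `vmin` and `vmax`, so the commented line shows how the same settings would carry over to `AZ3`. Since the county shapes aren't available here, a bar chart of the sample Sales values stands in for the map, and the three colors are placeholders to swap for your own:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Placeholder palette: light yellow for low sales through to dark red for high
cmap = LinearSegmentedColormap.from_list("sales", ["lightyellow", "orange", "darkred"])
# Fix the color range to 0-15 regardless of the data's own min and max
norm = Normalize(vmin=0, vmax=15)

# County names and Sales values from the sample data
names = ["Gila", "Yuma", "La Paz", "Pima", "Graham", "Pinal", "Yavapai", "Apache",
         "Santa", "Coco", "Cochise", "Green", "Maricopa", "Mohave", "Navajo"]
sales = [1, 10, 6, 5, 2, 7, 3, 8, 9, 0, 0, 0, 15, 1, 4]
colors = [cmap(norm(s)) for s in sales]

# With GeoPandas the equivalent call would be:
# AZ3.plot(column='Sales', ax=ax, edgecolor='black', cmap=cmap, vmin=0, vmax=15)
fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(names, sales, color=colors, edgecolor="black")
ax.tick_params(axis="x", rotation=90)
fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="Sales")
plt.show()
```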

Python scatter plot different colors depending on value

I have a dataframe that I want to make a scatter plot of.
The dataframe looks like this:
year length Animation
0 1971 121 1
1 1939 71 1
2 1941 7 0
3 1996 70 1
4 1975 71 0
I want the points in my scatter plot to be a different color depending on the value in the Animation column.
So Animation = 1 is yellow,
Animation = 0 is black,
or something similar.
I tried doing the following:
dfScat = df[['year','length', 'Animation']]
dfScat = dfScat.loc[dfScat.length < 200]
axScat = dfScat.plot(kind='scatter', x=0, y=1, alpha=1/15, c=2)
This produces a continuous color scale (a gradient colorbar), which makes the two groups hard to tell apart.
You can also assign discrete colors to the points by passing an array of colors to c=, like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = {"year" : (1971, 1939, 1941, 1996, 1975),
"length" : ( 121, 71, 7, 70, 71),
"Animation" : ( 1, 1, 0, 1, 0)}
df = pd.DataFrame(d)
print(df)
colors = np.where(df["Animation"]==1,'y','k')
df.plot.scatter(x="year",y="length",c=colors)
plt.show()
This gives:
year length Animation
0 1971 121 1
1 1939 71 1
2 1941 7 0
3 1996 70 1
4 1975 71 0
Alternatively, pass the column name to the c parameter of scatter, which maps the values through a colormap:
df.plot.scatter('year', 'length', c='Animation', colormap='jet')
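If you also want a legend explaining the colors, one option is to draw each Animation group with its own `scatter` call. A sketch using the sample rows from the question; the color mapping and the legend labels "Animation"/"Not animation" are my own choices:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"year": [1971, 1939, 1941, 1996, 1975],
                   "length": [121, 71, 7, 70, 71],
                   "Animation": [1, 1, 0, 1, 0]})

# Hypothetical mapping from Animation value to (color, legend label)
styles = {1: ("y", "Animation"), 0: ("k", "Not animation")}

fig, ax = plt.subplots()
# One scatter call per group, so each group gets its own legend entry
for value, group in df.groupby("Animation"):
    color, label = styles[value]
    ax.scatter(group["year"], group["length"], color=color, label=label)
ax.set_xlabel("year")
ax.set_ylabel("length")
ax.legend()
plt.show()
```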
