I have a CSV file of employee salaries for 2020, and I can't figure out how to organize my bar graph. Here is the CSV for reference: https://catalog.data.gov/dataset/employee-salaries-2020
I would like to present the average salary of each department in a bar graph.
I started by organizing the bar graph by Department using value_counts(), but I would like the x-axis to represent the average salary in each department. Any tips on how I can achieve this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
file_path = 'Employee_Salaries_-_2020.csv'
salaries = pd.read_csv(file_path)
a = salaries.Department.value_counts()
x = list(a.index)
y = list(a)
f, ax = plt.subplots(figsize=(20,10))
width = 0.75 # the width of the bars
ind = np.arange(len(y)) # the x locations for the groups
ax.barh(ind, y, width, color="blue")
ax.set_yticks(ind) # barh centers each bar on its position, so no offset is needed
ax.set_yticklabels(x, minor=False)
for i, v in enumerate(y):
    ax.text(v + .25, i + .25, str(v), color='blue', fontweight='bold') #add value labels into bar
plt.title('Average Base Pay by Department')
plt.xlabel('Average Base Pay')
plt.ylabel('Department')
plt.show()
Instead of value counts you can get the average salary by doing salaries.groupby('Department')['Base Salary'].mean(). This should be the value you are looking for.
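A minimal sketch of how that could slot into the plotting code from the question. The small DataFrame below is made up so the example runs without the CSV; only the column names 'Department' and 'Base Salary' are taken from the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for the CSV; only the column names follow the answer above
salaries = pd.DataFrame({
    "Department": ["Dept A", "Dept A", "Dept B", "Dept B", "Dept C"],
    "Base Salary": [60000.0, 80000.0, 70000.0, 90000.0, 50000.0],
})

# Mean base salary per department, sorted so the longest bar ends up on top
avg_pay = salaries.groupby("Department")["Base Salary"].mean().sort_values()

fig, ax = plt.subplots(figsize=(8, 4))
ax.barh(avg_pay.index, avg_pay.values, color="blue")
ax.set_title("Average Base Pay by Department")
ax.set_xlabel("Average Base Pay")
ax.set_ylabel("Department")

# Write each average next to the end of its bar
for i, v in enumerate(avg_pay.values):
    ax.text(v, i, f" {v:,.0f}", va="center", color="blue", fontweight="bold")

# plt.show()  # uncomment when running interactively
```

With the real file, replacing the made-up DataFrame with pd.read_csv(file_path) should be enough, assuming the column names match.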
Related
How can the following code be modified to show the mean as well as the different error bars on each bar of the bar plot?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("white")
a,b,c,d = [],[],[],[]
for i in range(1,5):
    np.random.seed(i)
    a.append(np.random.uniform(35,55))
    b.append(np.random.uniform(40,70))
    c.append(np.random.uniform(63,85))
    d.append(np.random.uniform(59,80))
data_df =pd.DataFrame({'stages':[1,2,3,4],'S1':a,'S2':b,'S3':c,'S4':d})
print("Delay:")
display(data_df)
S1 S2 S3 S4
0 43.340440 61.609735 63.002516 65.348984
1 43.719898 40.777787 75.092575 68.141770
2 46.015958 61.244435 69.399904 69.727380
3 54.340597 56.416967 84.399056 74.011136
meansd_df=data_df.describe().loc[['mean', 'std'],:].drop('stages', axis = 1)
display(meansd_df)
sns.set()
sns.set_style('darkgrid',{"axes.facecolor": ".92"}) # (1)
sns.set_context('notebook')
fig, ax = plt.subplots(figsize = (8,6))
x = meansd_df.columns
y = meansd_df.loc['mean',:]
yerr = meansd_df.loc['std',:]
plt.xlabel("Time", size=14)
plt.ylim(-0.3, 100)
width = 0.45
for i, j, k in zip(x, y, yerr): # (2)
    ax.bar(i, j, width, yerr=k, edgecolor="black",
           error_kw=dict(lw=1, capsize=8, capthick=1)) # (3)
ax.set(ylabel = 'Delay')
from matplotlib import ticker
ax.yaxis.set_major_locator(ticker.MultipleLocator(10))
plt.savefig("Over.png", dpi=300, bbox_inches='tight')
Given the example data, for a seaborn.barplot with capped error bars, data_df must be converted from wide format to tidy (long) format, which can be done with pandas.DataFrame.stack or pandas.DataFrame.melt.
It is also important to keep in mind that a bar plot shows only the mean (or other estimator) value.
Sample Data and DataFrame
.iloc[:, 1:] is used to skip the 'stages' column at column index 0.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# given data_df from the OP, select the columns except stage and reshape to long format
df = data_df.iloc[:, 1:].melt(var_name='set', value_name='val')
# display(df.head())
set val
0 S1 43.340440
1 S1 43.719898
2 S1 46.015958
3 S1 54.340597
4 S2 61.609735
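The melt call above is one of the two reshaping routes mentioned. As a rough sketch, the stack route could look like the following; the values are rounded copies of the printout above, and the intermediate names (wide, long_melt, long_stack, "row") are only illustrative.

```python
import pandas as pd

# Values copied (rounded) from the data_df printout above
data_df = pd.DataFrame({
    "stages": [1, 2, 3, 4],
    "S1": [43.34, 43.72, 46.02, 54.34],
    "S2": [61.61, 40.78, 61.24, 56.42],
    "S3": [63.00, 75.09, 69.40, 84.40],
    "S4": [65.35, 68.14, 69.73, 74.01],
})

wide = data_df.drop(columns="stages")

# melt: one row per (set, val) pair, ordered column by column
long_melt = wide.melt(var_name="set", value_name="val")

# stack: the same pairs, ordered row by row; flatten the resulting MultiIndex
long_stack = (
    wide.stack()
    .rename_axis(["row", "set"])
    .reset_index(name="val")
    .drop(columns="row")
)

# Both long frames give the same per-set means
print(long_melt.groupby("set")["val"].mean())
print(long_stack.groupby("set")["val"].mean())
```

Either frame can be passed to sns.barplot with x='set' and y='val'; only the row order differs.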
Updated as of matplotlib v3.4.2
Use matplotlib.pyplot.bar_label
See How to add value labels on a bar chart for additional details and examples with .bar_label.
Some formatting can be done with the fmt parameter, but more sophisticated formatting should be done with the labels parameter, as shown in How to add multiple annotations to a barplot.
Tested with seaborn v0.11.1, which is using matplotlib as the plot engine.
fig, ax = plt.subplots(figsize=(8, 6))
# add the plot
sns.barplot(x='set', y='val', data=df, capsize=0.2, ax=ax)
# add the annotation
ax.bar_label(ax.containers[-1], fmt='Mean:\n%.2f', label_type='center')
ax.set(ylabel='Mean Time')
plt.show()
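For the labels parameter mentioned above, here is a small sketch. It swaps seaborn for plain matplotlib bars to keep the example dependency-light, uses rounded values from the data_df printout, and needs a matplotlib version that has bar_label; the label text format is just an illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Rounded values from the data_df printout above, one array per set
groups = {
    "S1": np.array([43.34, 43.72, 46.02, 54.34]),
    "S2": np.array([61.61, 40.78, 61.24, 56.42]),
    "S3": np.array([63.00, 75.09, 69.40, 84.40]),
    "S4": np.array([65.35, 68.14, 69.73, 74.01]),
}
means = [v.mean() for v in groups.values()]
stds = [v.std(ddof=1) for v in groups.values()]  # sample std, as describe() reports

fig, ax = plt.subplots(figsize=(8, 6))
bars = ax.bar(list(groups), means, yerr=stds, capsize=8, edgecolor="black")

# Build one custom string per bar and hand the list to labels=
labels = [f"Mean: {m:.2f}\nSD: {s:.2f}" for m, s in zip(means, stds)]
texts = ax.bar_label(bars, labels=labels, label_type="center")

ax.set(xlabel="Set", ylabel="Mean Time")
# plt.show()  # uncomment when running interactively
```

Because labels takes arbitrary strings, anything that can be computed per bar (counts, standard deviations, units) can go into the annotation.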
plot with seaborn.barplot
Using matplotlib before version 3.4.2
The default for the estimator parameter is mean, so the height of the bar is the mean of the group.
The bar height is extracted from p with .get_height, which can be used to annotate the bar.
fig, ax = plt.subplots(figsize=(8, 6))
sns.barplot(x='set', y='val', data=df, capsize=0.2, ax=ax)
# show the mean
for p in ax.patches:
    h, w, x = p.get_height(), p.get_width(), p.get_x()
    xy = (x + w / 2., h / 2)
    text = f'Mean:\n{h:0.2f}'
    ax.annotate(text=text, xy=xy, ha='center', va='center')
ax.set(xlabel='Delay', ylabel='Time')
plt.show()
Seaborn is most powerful with long-form data, so you might want to transform your data, something like this:
sns.barplot(data=data_df.melt('stages', value_name='Delay', var_name='Time'),
x='Time', y='Delay',
capsize=0.1, edgecolor='k')
Output:
I have a dict with 200 keys and values that generates this graph.
I also compute the average of these values and plot that too. I would like the avg value to be the same color as the line. The code that I use is this:
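import json
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
with open('path_to_json/values.json', 'r') as outfile:
    a = json.load(outfile)
sum = 0
for k, v in a.items():
    sum += v
avg = sum / len(a.values())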
fig, ax = plt.subplots(1,1)
p=ax.plot(list(a.keys()),list(a.values()))
p=ax.plot([0, len(a.values())], [avg, avg], "--")
yt = ax.get_yticks()
yt = np.append(yt, avg)
xticks = np.arange(0, len(a.values()), 10)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
ax.set_xticks(xticks)
ax.set_yticks(yt)
plt.show()
Have a look at the overview for the Axes class, especially the "Ticks and tick labels" section: There's get_yticklabels and get_yticklines which return the objects which both have a set_color method.
Minimal example:
import matplotlib.pyplot as plt
import numpy as np
plt.plot([0, 1], [0, 1])
ax = plt.gca()
ax.set_yticks(np.append(ax.get_yticks(), 0.55))
ax.get_yticklabels()[-1].set_color("red")
ax.get_yticklines()[-2].set_color("red")
plt.show()
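Applied to the question, a sketch could read the color back from the average line and reuse it for the extra tick label. The sine data below is made up to stand in for the JSON values, and "C1" is just one possible color choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in for the 200 values loaded from the JSON file
x = np.arange(200)
y = np.sin(x / 10) + 2
avg = y.mean()

fig, ax = plt.subplots()
ax.plot(x, y)

# Dashed horizontal line at the average; keep the Line2D to read its color back
avg_line = ax.axhline(avg, linestyle="--", color="C1")
avg_color = avg_line.get_color()

# Append the average as an extra tick, then color only that (last) tick label
ax.set_yticks(np.append(ax.get_yticks(), avg))
avg_label = ax.get_yticklabels()[-1]
avg_label.set_color(avg_color)

# plt.show()  # uncomment when running interactively
```

Reading the color from the line object instead of hard-coding it twice keeps the label in sync if the line color is changed later.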
How can I plot a horizontal bar chart with the values at the end of each bar, something similar to this:
I tried this
plt.barh(inc.index,inc)
plt.yticks(inc.index)
plt.xticks(inc);
plt.xlabel("Order Count")
plt.ylabel("Date")
Bar chart
The answer can be found here:
How to display the value of the bar on each bar with pyplot.barh()?
Just add the for loop as cphlewis said:
for i, v in enumerate(inc):
    ax.text(v + 3, i + .25, str(v), color='blue', fontweight='bold')
plt.show()
Here is the code that I tried for your situation:
import matplotlib.pyplot as plt
import numpy as np
inc = [12, 25, 50, 65, 40, 45]
index = ["2019-10-31", "2019-10-30", "2019-10-29", "2019-10-28", "2019-10-27", "2019-10-26"]
fig, ax = plt.subplots()
ax.barh(index,inc, color='black')
plt.yticks(index)
plt.xticks(inc);
plt.xlabel("Order Count")
plt.ylabel("Date")
# Set xticks
plt.xticks(np.arange(0, max(inc)+15, step=10))
# Loop for showing inc numbers in the end of bar
for i, v in enumerate(inc):
    ax.text(v + 1, i, str(v), color='black', fontweight='bold')
plt.show()
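On matplotlib versions that provide Axes.bar_label (3.4 and later), the loop can be replaced by a single call. This is a sketch using the same sample numbers as above; the padding and x-limit values are arbitrary choices.

```python
import matplotlib.pyplot as plt

inc = [12, 25, 50, 65, 40, 45]
index = ["2019-10-31", "2019-10-30", "2019-10-29",
         "2019-10-28", "2019-10-27", "2019-10-26"]

fig, ax = plt.subplots()
bars = ax.barh(index, inc, color="black")

# One label per bar, placed just past the bar end; padding is in points
texts = ax.bar_label(bars, padding=3, fontweight="bold")

ax.set_xlim(0, max(inc) + 15)  # leave room on the right for the labels
ax.set_xlabel("Order Count")
ax.set_ylabel("Date")
# plt.show()  # uncomment when running interactively
```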
Plot looks like this:
To generate a plot with values superimposed, run:
ax = inc.plot.barh(xticks=inc, xlim=(0, 40));
ax.set_xlabel('Order Count')
ax.set_ylabel('Date')
for p in ax.patches:
    w = p.get_width()
    ax.annotate(f' {w}', (w + 0.1, p.get_y() + 0.1))
Note that I set xlim with upper limit slightly above the
maximum Order Count, to provide the space for annotations.
For a subset of your data I got:
And one more improvement:
As I see, your data is a Series with a DatetimeIndex.
So if you want to have y label values as dates only (without
00:00:00 for hours), convert the index to string:
inc.index = inc.index.strftime('%Y-%m-%d')
like I did, generating my plot.
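A tiny sketch of that index conversion, on a made-up three-row Series standing in for inc:

```python
import pandas as pd

# Made-up Series with a DatetimeIndex, standing in for inc
inc = pd.Series(
    [12, 25, 50],
    index=pd.to_datetime(["2019-10-29", "2019-10-30", "2019-10-31"]),
)

# strftime turns the timestamps into plain strings without the time part
inc.index = inc.index.strftime("%Y-%m-%d")
print(list(inc.index))
```

After the conversion the index holds strings, so date arithmetic on it no longer works; do the conversion right before plotting.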
I have a notebook with two bar charts: one for winter data and one for summer data. I counted the total of each crime type and plotted the counts in a bar chart, using this code:
ax = summer["crime_type"].value_counts().plot(kind='bar')
plt.show()
Which shows a graph like:
I have another chart nearly identical, but for winter:
ax = winter["crime_type"].value_counts().plot(kind='bar')
plt.show()
And I would like to have these 2 charts compared against one another in the same bar chart (Where every crime on the x axis has 2 bars coming from it, one winter & one summer).
I have tried, which is just me experimenting:
bx = (summer["crime_type"],winter["crime_type"]).value_counts().plot(kind='bar')
plt.show()
Any advice would be appreciated!
The following generates dummies of your data and does the grouped bar chart you wanted:
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
s = "Crime Type Summer|Crime Type Winter".split("|")
crime_types = ["ASB", "Violence", "Theft", "Public Order", "Drugs"]
# Generate dummy data into a dataframe
j = {x: [random.choice(crime_types) for _ in range(300)] for x in s}
df = pd.DataFrame(j)
# value_counts() sorts by frequency, so reindex to keep every bar under its own label
summer_counts = df["Crime Type Summer"].value_counts().reindex(crime_types, fill_value=0)
winter_counts = df["Crime Type Winter"].value_counts().reindex(crime_types, fill_value=0)
index = np.arange(5)
bar_width = 0.35
fig, ax = plt.subplots()
summer = ax.bar(index, summer_counts, bar_width, label="Summer")
winter = ax.bar(index+bar_width, winter_counts, bar_width, label="Winter")
ax.set_xlabel('Category')
ax.set_ylabel('Incidence')
ax.set_title('Crime incidence by season, type')
ax.set_xticks(index + bar_width / 2)
ax.set_xticklabels(crime_types)
ax.legend()
plt.show()
With this script I got:
You can check out the demo in the matplotlib docs here: https://matplotlib.org/gallery/statistics/barchart_demo.html
The important thing to note is the index!
index = np.arange(5) # Set an index of n crime types
...
summer = ax.bar(index, ...)
winter = ax.bar(index+bar_width, ...)
...
ax.set_xticks(index + bar_width / 2)
These are the lines that arrange the bars on the horizontal axis so that they are grouped together.
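An alternative sketch that lets pandas handle the bar positions: put both value counts side by side in one DataFrame and plot that. The two small DataFrames below are made up; only the crime_type column name comes from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for the asker's summer and winter DataFrames
summer = pd.DataFrame({"crime_type": ["Theft", "Theft", "Drugs", "ASB"]})
winter = pd.DataFrame({"crime_type": ["Theft", "ASB", "ASB", "Violence"]})

# Building a DataFrame from two Series aligns them on the crime type;
# types missing from one season become NaN and are filled with 0
counts = pd.DataFrame({
    "Summer": summer["crime_type"].value_counts(),
    "Winter": winter["crime_type"].value_counts(),
}).fillna(0).astype(int)

# Each DataFrame column becomes one bar per crime type
ax = counts.plot(kind="bar")
ax.set_xlabel("Crime type")
ax.set_ylabel("Incidence")
# plt.show()  # uncomment when running interactively
```

Because the counts are aligned by label rather than by position, a crime type that appears in only one season still gets its own pair of bars.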
Create a pandas DataFrame with three columns (crimetype, count, Season) and try this function.
# Import the required packages
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
# Define the plotting function
def plt_grouped_bar(Plot_Nm, group_bar, x, y, plt_data, **bar_kwargs):
    plt_fig = plt.figure(figsize=(18, 9))
    ax = plt_fig.add_subplot()
    # sns.barplot is axes-level and draws on the ax passed in
    # (sns.catplot is figure-level and does not draw on a target ax)
    sns.barplot(x=x, y=y, hue=group_bar, data=plt_data, ax=ax, **bar_kwargs)
    # Write each bar's height just above it
    for p in ax.patches:
        height = p.get_height()
        ax.text(x=p.get_x() + (p.get_width() / 2),
                y=height + 0.05,
                s='{:.0f}'.format(height),
                ha='center', va='bottom', zorder=20, rotation=90)
    ax.set_title(Plot_Nm, fontweight="bold", fontsize=18, alpha=0.7, y=1.03)
    plt.setp(ax.get_xticklabels(), rotation=90, fontsize=10, alpha=0.8, fontweight="bold")
    # Hide the y tick labels, since the values are written on the bars
    ax.yaxis.set_major_locator(MaxNLocator(integer=True))
    ax.tick_params(axis='y', labelleft=False)
    ax.set_xlabel("")
    ax.set_ylabel("")
    ax.tick_params(axis='both', length=0)
    ax.legend(loc='upper right')
    for spine in ax.spines:
        ax.spines[spine].set_visible(False)
    plt.show()
# Call the function
plt_grouped_bar('Title of bar', 'Season', 'crimetype', 'count', pandasdataframename)
I would like to plot a dataset by its categories, using geometric shapes such as circle, triangle and square to represent category 1 and colors to represent category 2. The output would have varying combinations of the geometric shapes and colors, and the legend would list the attributes of the categories separately, i.e.:
circle = a
triangle = b
square = c
red = I
green = II
blue = III
Looking for solutions I found following posts which would only give solutions for one specific geometric shape having one specific color.
How to plot by category with different markers
How to plot by category
I tried to work something out with the code from one of the posts but without success.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1983)
num = 10
x, y = np.random.random((2, num))
cat1 = np.random.choice(['a', 'b', 'c'], num)
cat2 = np.random.choice(['I', 'II', 'III'], num)
df = pd.DataFrame(dict(x=x, y=y, cat1=cat1, cat2=cat2))
groups = df.groupby(['cat1', 'cat2'])
fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()
plt.show()
You can try this code block:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#Create mapping dictionary that you want
marker_dict = {'a':'o','b':'^','c':'s'}
color_dict = {'I':'red', 'II':'green', 'III':'blue'}
np.random.seed(1983)
num = 10
x, y = np.random.random((2, num))
cat1 = np.random.choice(['a', 'b', 'c'], num)
cat2 = np.random.choice(['I', 'II', 'III'], num)
df = pd.DataFrame(dict(x=x, y=y, cat1=cat1, cat2=cat2))
groups = df.groupby(['cat1', 'cat2'])
fig, ax = plt.subplots()
ax.margins(0.05)
for name, group in groups:
    marker = marker_dict[name[0]]
    color = color_dict[name[1]]
    ax.plot(group.x, group.y, marker=marker, linestyle='', ms=12, label=name, color=color)
ax.legend()
plt.show()
Hope it helps.
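The legend produced above has one entry per (cat1, cat2) pair. To list shapes and colors separately, as the question asks, one possible sketch builds proxy Line2D handles for the legend. The grey color for the shape entries and the use of np.random.default_rng are choices made here, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

marker_dict = {"a": "o", "b": "^", "c": "s"}
color_dict = {"I": "red", "II": "green", "III": "blue"}

# Random sample data in the spirit of the question
rng = np.random.default_rng(1983)
num = 10
x, y = rng.random((2, num))
cat1 = rng.choice(list(marker_dict), num)
cat2 = rng.choice(list(color_dict), num)

fig, ax = plt.subplots()
for xi, yi, c1, c2 in zip(x, y, cat1, cat2):
    ax.plot(xi, yi, marker=marker_dict[c1], color=color_dict[c2],
            linestyle="", ms=12)

# Proxy handles: grey markers explain the shapes, colored dots explain the colors
shape_handles = [
    Line2D([], [], marker=m, color="grey", linestyle="", ms=10, label=k)
    for k, m in marker_dict.items()
]
color_handles = [
    Line2D([], [], marker="o", color=c, linestyle="", ms=10, label=k)
    for k, c in color_dict.items()
]
legend = ax.legend(handles=shape_handles + color_handles)
# plt.show()  # uncomment when running interactively
```

The proxy handles are never drawn in the axes; they exist only so the legend has six entries instead of one per combination.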