Have a dataframe but need to make a barplot in python - python

Hi, I have a very large DataFrame; a snapshot is below. I want to calculate the percentage split of TARGET across the worker types and plot it as a bar graph (see attached picture).
   Worker type           TARGET
0  Working               1
1  State servant         0
2  Pensioner             1
3  Working               0
4  Commercial associate  1
5  State servant         0
6  Commercial associate  0
7  Pensioner             1
8  Working               1
9  Working               0

Try:
import matplotlib.pyplot as plt
# A text column cannot be plotted directly, so count the rows per worker type first
ax = df['Worker type'].value_counts().plot(kind='bar', title="Worker Type", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Worker", fontsize=12)
ax.set_ylabel("Count", fontsize=12)
plt.show()

try this:
df.groupby('Worker type').count().plot.bar(y='TARGET')
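The answers above plot row counts, while the question asks for the percentage split of TARGET per worker type. A minimal sketch of one way to get that, using only the ten sample rows from the question and a row-normalised pd.crosstab (the stacked layout and the axis labels are my own choices, not something the asker specified):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows copied from the question
df = pd.DataFrame({
    "Worker type": ["Working", "State servant", "Pensioner", "Working",
                    "Commercial associate", "State servant",
                    "Commercial associate", "Pensioner", "Working", "Working"],
    "TARGET": [1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
})

# Row-normalised crosstab: share of TARGET=0 and TARGET=1 within each worker type, in percent
pct = pd.crosstab(df["Worker type"], df["TARGET"], normalize="index") * 100

# Stacked bars, one per worker type, each summing to 100 %
ax = pct.plot(kind="bar", stacked=True, figsize=(8, 5), title="TARGET split by worker type")
ax.set_xlabel("Worker type")
ax.set_ylabel("Share of rows (%)")
plt.tight_layout()
plt.show()
```

Dropping stacked=True should give side-by-side bars instead, which may be closer to the picture the asker had in mind.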

Related

How to select different sets of variables (e.g. value counts for a specific country) from a groupby df for a 2,2 subplot

From my original data frame, I used the group-by to create the new df as shown below, which has the natural disaster subtype counts for each country.
However, I'm unsure how to, for example, select 4 specific countries and set them as variables in a 2 by 2 plot.
The X-axis will be the disaster subtype name, with the Y being the value count, however, I can't quite figure out the right code to select this information.
This is how I grouped the countries -
c_grp = df_geo.groupby('Country')
# value_counts gives one count per (Country, Disaster Subtype) pair
c_val = c_grp['Disaster Subtype'].value_counts().to_frame('Num of Disaster')
c_val.head(40)
Output:
Country         Disaster Subtype
Afghanistan     Riverine flood              45
                Ground movement             33
                Flash flood                 32
                Avalanche                   19
                Drought                      8
                Bacterial disease            7
                Convective storm             6
                Landslide                    6
                Cold wave                    5
                Viral disease                5
                Mudslide                     3
                Severe winter conditions     2
                Forest fire                  1
                Locust                       1
                Parasitic disease            1
Albania         Ground movement             16
                Riverine flood               8
                Severe winter conditions     3
                Convective storm             2
                Flash flood                  2
                Heat wave                    2
                Avalanche                    1
                Coastal flood                1
                Drought                      1
                Forest fire                  1
                Viral disease                1
Algeria         Ground movement             21
                Riverine flood              20
                Flash flood                  8
                Bacterial disease            2
                Cold wave                    2
                Forest fire                  2
                Coastal flood                1
                Drought                      1
                Heat wave                    1
                Landslide                    1
                Locust                       1
American Samoa  Tropical cyclone             4
                Flash flood                  1
                Tsunami                      1
However, let's say I want to select four of these countries and draw four plots, one per country, each showing the number of each type of disaster in that country. I know I would need something along the lines of what's below, but I'm unsure how to set the x and y variables for each. If there is a more efficient way to set the variables or plot, that would be great. Usually I would just use loc or iloc, but I need to be more specific with the selection.
fig, ax = plt.subplots(2, 2, figsize=(16, 10))
X1 = c_val.loc['Country'] == 'Afghanistan' #This doesn't work, just need something similar
y1 = c_val.loc['Num of Disasters']
X2 =
y2 =
X3 =
y3 =
X4 =
y4 =
ax[0,0].bar(X1,y1,width=.4, color=['#A2BDF2'])
ax[0,1].bar(X2,y2,width=.4,color=['#A2BDF2'])
ax[1,0].bar(X3,y3,width=.4,color=['#A2BDF2'])
ax[1,1].bar(X4,y4,width=.4,color=['#A2BDF2'])
IIUC, a simple way is to use catplot from the seaborn package:
# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# df is assumed to hold the counts in long form, e.g. df = c_val.reset_index()
g = sns.catplot(x='Disaster Subtype', y='Num of Disaster', col='Country',
                data=df, col_wrap=2, kind='bar')
g.set_xticklabels(rotation=90)
g.tight_layout()
plt.show()
Update
How can I select the specific countries to be plotted in each subplot?
subdf = df.loc[df['Country'].isin(['Albania', 'Algeria'])]
g = sns.catplot(x='Disaster Subtype', y='Num of Disaster', col='Country',
                data=subdf, col_wrap=2, kind='bar')
...
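For completeness, here is a rough sketch of the plain Matplotlib 2 by 2 layout the question was aiming for. It assumes c_val keeps its (Country, Disaster Subtype) MultiIndex, so c_val.loc[country] returns the counts for one country indexed by subtype. The small df_geo below is illustrative only, with far fewer rows than the real data, and the list of four countries is just an example selection:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative stand-in for the asker's df_geo (one row per recorded disaster)
df_geo = pd.DataFrame({
    "Country": ["Afghanistan"] * 3 + ["Albania"] * 2 + ["Algeria"] * 2 + ["American Samoa"],
    "Disaster Subtype": ["Riverine flood", "Riverine flood", "Drought",
                         "Ground movement", "Heat wave",
                         "Flash flood", "Flash flood", "Tsunami"],
})
c_val = (df_geo.groupby("Country")["Disaster Subtype"]
         .value_counts()
         .to_frame("Num of Disaster"))

countries = ["Afghanistan", "Albania", "Algeria", "American Samoa"]
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

for ax, country in zip(axes.flat, countries):
    # Selecting on the first index level leaves a frame indexed by disaster subtype
    sub = c_val.loc[country]
    ax.bar(sub.index, sub["Num of Disaster"], width=0.4, color="#A2BDF2")
    ax.set_title(country)
    ax.tick_params(axis="x", rotation=90)

fig.tight_layout()
plt.show()
```

Looping over axes.flat avoids writing out X1..X4 and y1..y4 by hand; changing the countries list is enough to change which panels are drawn.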

Grouped bar chart for categories by month/year

I'm trying to use Plotly to create a stacked or grouped bar chart that has month/year on the x-axis and values on the y-axis. The data frame looks like this:
category value date
apple 4 10/2020
banana 3 10/2020
apple 2 10/2020
strawberry 1 11/2020
banana 4 11/2020
apple 9 11/2020
banana 4 12/2020
apple 7 12/2020
strawberry 4 12/2020
banana 8 12/2020
.
.
.
Assuming that newer dates will come through, and also more categories can be added, I'm trying to create a grouped bar chart that is also scrollable on the x-axis(date).
I tried this to create the grouped bar chart but it ends up being a stacked bar chart instead:
import plotly.graph_objects as go
fig_3_a = go.Figure(data=[go.Bar(
    x=df['date'],
    y=df['value'],
    text=df['category'],
    textposition='auto',
    orientation='v',
)],
layout=go.Layout(barmode='group'))
I would like something like the example image instead (there, gender corresponds to category and the x-axis to month/year): each category possibly assigned a different color, month/year on the x-axis and the value on the y-axis. I would also need scrolling on the x-axis to see every month/year.
You can do it simply with plotly.express.
import plotly.express as px
fig = px.bar(df, x='date', y='value', color='category', barmode='group')
fig.show()
If you want to do it with the go.Bar class, you need to add one trace per category.

how to plot in pandas categorical data

I have this kind of dataframe:
animal age where
0 dog 1 indoor
1 cat 4 indoor
2 horse 3 outdoor
I would like to present a bar plot in which:
y axis is age, x axis is animal, and the animals are grouped in adjacent bars with different colors.
Thanks
This should do the trick
import pandas as pd
df = pd.DataFrame({"animal":["dog","cat","horse"],"age":[1,4,3],"where":["indoor","indoor","outdoor"]})
df
animal age where
0 dog 1 indoor
1 cat 4 indoor
2 horse 3 outdoor
# Plot the Series rather than the DataFrame so the color list is applied bar by bar
ax = df.set_index("animal")["age"].plot.bar(color=["b","r","g"])
ax.set_ylabel("Age")
Another easy way: set the intended x-axis label as the index and plot. By default, the float/integer columns end up on the y-axis.
import matplotlib.pyplot as plt
df.set_index('animal').plot(kind='bar')
plt.ylabel('age')
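If "grouped ... with different colors" means the bar color should follow the where column, one possible sketch is to map each where value to a color and pass the resulting list to the Series plot. The palette dictionary is a hypothetical choice of mine, not something from the question:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"animal": ["dog", "cat", "horse"],
                   "age": [1, 4, 3],
                   "where": ["indoor", "indoor", "outdoor"]})

# Hypothetical palette: one color per value of the 'where' column
palette = {"indoor": "tab:blue", "outdoor": "tab:orange"}
colors = df["where"].map(palette).tolist()

# One bar per animal, colored by where the animal lives
ax = df.set_index("animal")["age"].plot.bar(color=colors)
ax.set_xlabel("animal")
ax.set_ylabel("age")
plt.show()
```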

Black-white/Gray bar charts in Python

I have the following small data set:
         Tom  Dick  Harry  Jack
Sub
Maths      9    12      3    10
Science   16    40      1    10
English   12    11      4    15
French    17    15      2    15
Sports    23    19      3    15
I want to create a bar chart in black-white/gray colors for these data.
I can get such a figure with the following code:
df.plot(kind='bar', colormap='gray')
plt.show()
However, the fourth bar (Jack's) is pure white and same as background. How can I avoid this problem of last bar being pure white?
Use one of the other colormaps or enter the color names manually. Alternatively, you can change the background by using a different style sheet such as ggplot, seaborn or fivethirtyeight.
colors=['darkgray','gray','dimgray','lightgray']
df.plot(kind='bar',color=colors )
plt.show()
df.plot(kind='bar',colormap=plt.cm.viridis )
plt.show()
Using the style sheets here:
https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html
plt.style.use('ggplot')  # change the style sheet here ('seaborn' is named 'seaborn-v0_8' from Matplotlib 3.6 on)
df.plot(kind='bar',colormap=plt.cm.gray)
plt.show()
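Another option, sketched here as a suggestion rather than as part of the answer above, is to keep the gray colormap but sample it only up to 0.8, so the last series comes out light gray instead of white. The black bar outline is an extra assumption of mine to keep light bars visible; the data is the table from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Data copied from the question
df = pd.DataFrame(
    {"Tom": [9, 16, 12, 17, 23], "Dick": [12, 40, 11, 15, 19],
     "Harry": [3, 1, 4, 2, 3], "Jack": [10, 10, 15, 15, 15]},
    index=pd.Index(["Maths", "Science", "English", "French", "Sports"], name="Sub"),
)

# Sample the gray colormap from 0.0 (black) to 0.8 (light gray), one shade per column,
# so that no bar is pure white
grays = [to_hex(c) for c in plt.cm.gray(np.linspace(0.0, 0.8, len(df.columns)))]

ax = df.plot(kind="bar", color=grays, edgecolor="black")
plt.show()
```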

Plotting for repeated values using loops Python

I have some data that looks like data = pd.read_csv(....):
Year Month HOUR NAME RATE
2010 1 0 Big 222
2010 1 0 Welsch Power 434
2010 1 0 Cottonwood 124
2010 1 1 Big 455
2010 1 1 Welsch Power 900
2010 1 1 Cottonwood 110
.
.
.
2010 2 0 Big 600
2010 2 0 Welsch Power 1000
2010 2 0 Cottonwood 170
.
.
2010 3 0 Big 400
2010 3 0 Welsch Power 900
2010 3 0 Cottonwood 110
As you can see, HOUR (0 - 23) repeats every Month (1 - 12). I need a way to loop through the values so that, for each NAME, I can plot RATE (y-axis) against HOUR (x-axis) for every Month.
My attempt looks like:
for name, data in data.groupby('NAME'):
    fig = plt.figure(figsize=(14,23))
    plt.subplot(211)
    plt.plot(data['HOUR'], data['RATE'], label=name)
    plt.xlabel('Hour')
    plt.ylabel('Rate')
    plt.legend()
    plt.show()
    plt.close()
This works, but because HOUR repeats with every change of month, the line jumps back to hour 0 each time a new month starts. I want each of the 12 months drawn as a separate line in a different color, on one graph per NAME.
.pivot your DataFrame after the groupby so it will plot each month as a different line:
import matplotlib.pyplot as plt
for name, gp in df.groupby('NAME'):
    fig, ax = plt.subplots()
    # After the pivot, each Month is a column, so each Month is drawn as its own line
    gp.pivot(index='HOUR', columns='Month', values='RATE').plot(ax=ax, marker='o', title=name)
plt.show()
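To see what the pivot does, here is a small sketch using only the rows visible in the question and a single NAME. The variable names big and wide are mine:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Rows copied from the question (only the hours and months it shows)
df = pd.DataFrame({
    "Year": [2010] * 12,
    "Month": [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "HOUR": [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "NAME": ["Big", "Welsch Power", "Cottonwood"] * 4,
    "RATE": [222, 434, 124, 455, 900, 110, 600, 1000, 170, 400, 900, 110],
})

# Keep one NAME, then reshape: rows are hours, columns are months
big = df[df["NAME"] == "Big"]
wide = big.pivot(index="HOUR", columns="Month", values="RATE")
print(wide)

# DataFrame.plot draws one line per column, i.e. one line per month
ax = wide.plot(marker="o", title="Big")
ax.set_xlabel("Hour")
ax.set_ylabel("Rate")
plt.show()
```

Hours that are missing for a month show up as NaN in wide and are simply left out of that month's line. Note that pivot raises an error if the same (HOUR, Month) pair occurs more than once within a NAME, for example when the data spans several years.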
