Black-white/Gray bar charts in Python - python

I have the following small data:
         Tom  Dick  Harry  Jack
Sub
Maths      9    12      3    10
Science   16    40      1    10
English   12    11      4    15
French    17    15      2    15
Sports    23    19      3    15
I want to create a bar chart in black, white, and gray for these data.
I can get such a figure with the following code:
df.plot(kind='bar', colormap='gray')
plt.show()
However, the fourth bar (Jack's) is pure white, the same as the background. How can I avoid the last bar being pure white?

Use a different colormap or enter the color names manually. Alternatively, you can change the background by using a different style sheet such as ggplot, seaborn or fivethirtyeight.
import matplotlib.pyplot as plt
# Option 1: name the gray shades explicitly, one per column
colors = ['darkgray', 'gray', 'dimgray', 'lightgray']
df.plot(kind='bar', color=colors)
plt.show()
# Option 2: use a colormap that does not end in white
df.plot(kind='bar', colormap=plt.cm.viridis)
plt.show()
Using the style sheets here:
https://matplotlib.org/3.1.1/gallery/style_sheets/style_sheets_reference.html
plt.style.use('seaborn')  # change the style sheet here; in matplotlib 3.6+ this style is named 'seaborn-v0_8'
df.plot(kind='bar', colormap=plt.cm.gray)
plt.show()
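If the chart should stay strictly grayscale, one further option is to sample only part of the gray colormap so that no column gets pure white. The sketch below is a minimal illustration, not the answerer's code: the 0.75 upper bound and the black bar edges are assumptions chosen for this example, and the DataFrame is rebuilt from the data in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Data from the question, with "Sub" as the index name.
df = pd.DataFrame(
    {
        "Tom": [9, 16, 12, 17, 23],
        "Dick": [12, 40, 11, 15, 19],
        "Harry": [3, 1, 4, 2, 3],
        "Jack": [10, 10, 15, 15, 15],
    },
    index=pd.Index(["Maths", "Science", "English", "French", "Sports"], name="Sub"),
)

# Sample the gray colormap between 0.0 (black) and 0.75 (light gray),
# one shade per column, so that no bar reaches pure white (1.0).
# The 0.75 cut-off is an assumption; adjust it to taste.
positions = np.linspace(0.0, 0.75, len(df.columns))
shades = [to_hex(plt.cm.gray(p)) for p in positions]

# A black edge keeps the lightest bars visible on a white background.
ax = df.plot(kind="bar", color=shades, edgecolor="black")
```

Call plt.show() afterwards to display the figure, or ax.figure.savefig(...) to write it to a file.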

Related

How to set a Seaborn scatterplot hue equal to features and not the values of a single feature?

I would like to create a scatterplot in Seaborn that sets the hue parameter so that values are coloured based on what feature they are from. (All features have values 0 to 100)
For example, supposing I have the following dataframe:
   Happiness  Kindness  Sadness
1        100        70        0
2         60        50        1
3         34        32       10
4         23        65       54
5         43        54       87
When plotting the values, I would like to set all the values under happiness to red, kindness to blue, sadness to green. However, with Seaborn's scatterplot, the hue parameter only accepts one variable under the dataframe. Documentation here: http://man.hubwiz.com/docset/Seaborn.docset/Contents/Resources/Documents/generated/seaborn.scatterplot.html
I want to know whether there is a workaround for this one-variable limit, so that I can use the hue parameter to distinguish between the different features.
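One possible workaround is to reshape the frame from wide to long form, so that the feature name becomes a column of its own that hue can point at. The sketch below shows only the reshaping step with pandas; the column names "row", "feature" and "value", and the choice of the row label as the x axis, are assumptions made for this example.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame(
    {
        "Happiness": [100, 60, 34, 23, 43],
        "Kindness": [70, 50, 32, 65, 54],
        "Sadness": [0, 1, 10, 54, 87],
    },
    index=[1, 2, 3, 4, 5],
)

# Wide -> long: one row per (row label, feature, value).
# "row", "feature" and "value" are illustrative names, not from the question.
long_df = (
    df.rename_axis("row")
    .reset_index()
    .melt(id_vars="row", var_name="feature", value_name="value")
)

# Colors requested in the question, keyed by feature name.
palette = {"Happiness": "red", "Kindness": "blue", "Sadness": "green"}

print(long_df.head())
```

The long frame could then be passed to Seaborn as sns.scatterplot(data=long_df, x="row", y="value", hue="feature", palette=palette), so that each feature gets its own color.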

Have a dataframe but need to make a barplot in python

Hi, I have a very big dataframe; below is a snapshot. I want to calculate the target % split across the various worker types and plot a bar graph (see attached picture).
            Worker type  TARGET
0               Working       1
1         State servant       0
2             Pensioner       1
3               Working       0
4  Commercial associate       1
5         State servant       0
6  Commercial associate       0
7             Pensioner       1
8               Working       1
9               Working       0
Try:
import matplotlib.pyplot as plt
# Count the rows per worker type first; the text column itself has no numeric data to plot
ax = df['Worker type'].value_counts().plot(kind='bar', title="Worker Type", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Worker", fontsize=12)
ax.set_ylabel("Count", fontsize=12)
plt.show()
Try this:
df.groupby('Worker type').count().plot.bar(y='TARGET')
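Both snippets above plot row counts. If "target %" means the share of rows with TARGET equal to 1 within each worker type (an assumption about what the asker wants), the group mean of a 0/1 column gives that share directly. A minimal sketch using the snapshot from the question:

```python
import pandas as pd

# Snapshot from the question.
df = pd.DataFrame(
    {
        "Worker type": [
            "Working", "State servant", "Pensioner", "Working",
            "Commercial associate", "State servant", "Commercial associate",
            "Pensioner", "Working", "Working",
        ],
        "TARGET": [1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
    }
)

# Assuming TARGET is a 0/1 flag, its mean per group is the fraction of
# rows with TARGET == 1; multiply by 100 to get a percentage.
target_pct = df.groupby("Worker type")["TARGET"].mean().mul(100)

ax = target_pct.plot.bar()
ax.set_xlabel("Worker type")
ax.set_ylabel("TARGET = 1 (%)")
```

For this snapshot, Working and Commercial associate come out at 50%, Pensioner at 100% and State servant at 0%.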

Stacked bar plot of large data in python

I would like to plot a stacked bar plot from a csv file in python. I have three columns of data
year  word  frequency
2018  xyz   12
2017  gfh   14
2018  sdd   10
2015  fdh    1
2014  sss    3
2014  gfh   12
2013  gfh    2
2012  gfh    4
2011  wer    5
2010  krj    4
2009  krj    4
2019  bfg    4
... 300+ rows of data.
I need to go through all the data and plot a stacked bar plot categorized by year: the x axis is the word, the y axis is the frequency, and the legend colors show the year. I want to see how each word evolved from year to year. Some of the technology words are used in every year, so the stacked bars should add those values on top of each other. For example, the word gfh first plots 14 for 2017, and then for 2014 I want gfh to plot a value of 12 (in a different color) on top of the 2017 segment. How do I do this?
So far I have loaded the csv file in my code, but I don't understand how to go over all the rows and stack the words appropriately (as some words repeat through all the years). Any help is highly appreciated. The years are in random order in the csv, but I sorted them by year to make it easier. I am just learning Python and trying to understand this plotting routine, since I have 40 years of data and ~20 words, so I thought a stacked bar plot would be the best way to represent them. Any other visualisation method is also welcome.
This can be done using pandas:
import pandas as pd
df = pd.read_csv("file.csv")
# Aggregate data
df = df.groupby(["word", "year"], as_index=False).agg({"frequency": "sum"})
# Create list to sort by
sorter = (
    df.groupby(["word"], as_index=False)
    .agg({"frequency": "sum"})
    .sort_values("frequency")["word"]
    .values
)
)
# Pivot, reindex, and plot
df = df.pivot(index="word", columns="year", values="frequency")
df = df.reindex(sorter)
df.plot.bar(stacked=True)
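To see what the pivot step produces, here is a small self-contained sketch that uses only the twelve rows shown in the question instead of file.csv. It swaps groupby plus pivot for pivot_table, which aggregates and reshapes in one call; that substitution and the name wide are choices made for this example.

```python
import pandas as pd

# The twelve rows shown in the question (the real file has 300+).
rows = [
    (2018, "xyz", 12), (2017, "gfh", 14), (2018, "sdd", 10), (2015, "fdh", 1),
    (2014, "sss", 3), (2014, "gfh", 12), (2013, "gfh", 2), (2012, "gfh", 4),
    (2011, "wer", 5), (2010, "krj", 4), (2009, "krj", 4), (2019, "bfg", 4),
]
df = pd.DataFrame(rows, columns=["year", "word", "frequency"])

# One row per word, one column per year; missing combinations become 0.
wide = df.pivot_table(
    index="word", columns="year", values="frequency", aggfunc="sum", fill_value=0
)

# Order the words by their total frequency across all years.
wide = wide.loc[wide.sum(axis=1).sort_values().index]

# Each year becomes one colored segment of a word's bar.
ax = wide.plot.bar(stacked=True)
```

In the resulting table, the gfh row holds 14 under 2017 and 12 under 2014, which is what places those two segments on top of each other in the same bar.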

Overlaying bar charts in python

Can I overlay 3 barcharts in python? The code I used to produce the three barcharts can be seen below:
fig3.set_title('Sample 2(2019-10-05)- Average bikes used per hour')
fig3.set_xlabel('Hour')
fig3.set_ylabel('Average Percentage')
fig3.set_ylim(ymin=70)
fig4=average_bikes_used_hours3.plot.bar(y='Average bikes used in a hour', x='hour',figsize=(20,10))
fig4.set_title('Sample 3(2019-08-31)- Average bikes used per hour')
fig4.set_xlabel('Hour')
fig4.set_ylabel('Average Percentage')
fig4.set_ylim(ymin=70)
fig5=average_bikes_used_hours4.plot.bar(y='Average bikes used in a hour', x='hour',figsize=(20,10))
fig5.set_title('Sample 4(2019-08-31)- Average bikes used per hour')
fig5.set_xlabel('Hour')
fig5.set_ylabel('Average Percentage')
fig5.set_ylim(ymin=70)
The most intuitive way is to create a single DataFrame whose index holds the consecutive hours and which has a separate column for each sample.
Something like:
      Sample 2  Sample 3  Sample 4
Hour
8           20        25        21
9           22        27        27
10          23        34        29
11          21        30        22
12          19        22        24
Then just plot:
df.plot.bar();
and you will have all samples in a single picture.
If you want some extra space for the legend, pass the ylim parameter, e.g.:
df.plot.bar(ylim=(0,40));
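The combined DataFrame can be built from the separate per-sample frames with pd.concat. The sketch below is one way to do it under stated assumptions: sample_2, sample_3 and sample_4 are hypothetical stand-ins for the asker's frames, filled with the illustrative numbers from the table above, and only the column names 'hour' and 'Average bikes used in a hour' are taken from the question's code.

```python
import pandas as pd

VALUE_COL = "Average bikes used in a hour"  # column name used in the question


def make_sample(values):
    """Build a hypothetical per-sample frame shaped like the asker's."""
    return pd.DataFrame({"hour": [8, 9, 10, 11, 12], VALUE_COL: values})


# Stand-ins for the three separate frames (illustrative numbers only).
sample_2 = make_sample([20, 22, 23, 21, 19])
sample_3 = make_sample([25, 27, 34, 30, 22])
sample_4 = make_sample([21, 27, 29, 22, 24])

samples = {"Sample 2": sample_2, "Sample 3": sample_3, "Sample 4": sample_4}

# Index each frame by hour, keep the value column, and place the resulting
# Series side by side; the dict keys become the column names.
df = pd.concat(
    {name: frame.set_index("hour")[VALUE_COL] for name, frame in samples.items()},
    axis=1,
)
df.index.name = "Hour"

# One group of three bars per hour, with headroom for the legend.
ax = df.plot.bar(ylim=(0, 40))
```

Because the frames are aligned on the hour index, an hour missing from one sample would show up as NaN in that column instead of shifting the other bars.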

Normalisation of data

I am trying to plot the data below in a pie chart. I split the pie chart first by group and then by Id. But because the count is very small for some rows, I cannot see them in the pie chart.
I am trying to normalise the data, but I am not sure how to do that. Any help would be sincerely appreciated.
Group  Id   Count
G1     12  276938
G1     13     102
G2     12      27
G3     12    4683
G3     13       7
G4     12     301
Don't use a pie chart for data that doesn't suit one. A bar chart with a logarithmic y axis keeps the small counts visible:
(
    df.groupby(['Group', 'Id'])
    .sum().Count.sort_values(ascending=False)
    .plot.bar(logy=True, subplots=True)
)
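If normalised values are still wanted, for example as labels, the counts can be converted to percentages with plain pandas. This is a minimal sketch; the column names pct_of_total and pct_of_group are made up for this example. Note that a percentage of the overall total keeps the same proportions as the raw counts, so on its own it would not make the small slices of a pie any easier to see.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame(
    {
        "Group": ["G1", "G1", "G2", "G3", "G3", "G4"],
        "Id": [12, 13, 12, 12, 13, 12],
        "Count": [276938, 102, 27, 4683, 7, 301],
    }
)

# Share of each row in the overall total, in percent.
df["pct_of_total"] = df["Count"] / df["Count"].sum() * 100

# Share of each row within its own group, in percent.
group_totals = df.groupby("Group")["Count"].transform("sum")
df["pct_of_group"] = df["Count"] / group_totals * 100

print(df.round(3))
```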
