I have a dataframe containing the columns Code, Year and No_of_dues. I want a bar plot with the year on the x axis and the number of dues for each year on the y axis, with one subplot per code, one after another. Please help me.
Sample data is given below.
Code Year No_of_dues
1 2016 100
1 2017 200
1 2018 300
2 2016 200
2 2017 300
2 2018 500
3 2016 600
3 2017 800
3 2018
Try this one:
df.groupby(['Code', 'Year'])['No_of_dues'].sum().to_frame().plot.bar()
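The one-liner above draws every (Code, Year) pair as a bar on a single axes. To get one subplot per code, one possible variation is to move Code into the columns with unstack and pass subplots=True. This is only a sketch: the frame is rebuilt from the sample in the question, the missing 2018 value for code 3 is left as NaN, and the layout and figsize values are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question; the missing 2018 value for code 3 stays NaN.
df = pd.DataFrame({
    "Code": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Year": [2016, 2017, 2018] * 3,
    "No_of_dues": [100, 200, 300, 200, 300, 500, 600, 800, np.nan],
})

# Rows become years, columns become codes.
# Note: sum() skips NaN, so the code 3 / 2018 cell ends up as 0.
wide = df.groupby(["Code", "Year"])["No_of_dues"].sum().unstack("Code")

# subplots=True draws each column (each code) on its own axes.
axes = wide.plot.bar(subplots=True, layout=(1, 3), figsize=(12, 4),
                     sharey=True, legend=False)
plt.tight_layout()
```

Call plt.show() afterwards (or save the figure) to see the three panels side by side.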
Just use seaborn.
Set your x and y axes, and set hue to the column you want to group by.
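A rough sketch of what that could look like, assuming the column names from the question (Code, Year, No_of_dues). The first call follows the hue suggestion and draws grouped bars on one axes; the second uses catplot with col to get one subplot per code, which is my own addition rather than something the answer spelled out. The height and aspect values are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data from the question; the missing 2018 value for code 3 stays NaN.
df = pd.DataFrame({
    "Code": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Year": [2016, 2017, 2018] * 3,
    "No_of_dues": [100, 200, 300, 200, 300, 500, 600, 800, np.nan],
})

# Option 1: one axes, bars grouped by code via hue.
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=df, x="Year", y="No_of_dues", hue="Code", ax=ax)

# Option 2: one subplot (facet) per code.
g = sns.catplot(data=df, x="Year", y="No_of_dues", col="Code",
                kind="bar", height=3, aspect=1)
g.set_axis_labels("Year", "No_of_dues")
```

Option 2 is closer to the "one subplot after another" layout the question asks for.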
I have a dataframe with two numeric columns that I want to compare, plus a year column (2017, 2018, 2019, 2020, 2021)
d_value d_price year
-0.5064707042427661 0.023998098076044583 2017
0.28441876232859986 0.0022609116336060886 2017
0.5367803487234388 0.13772218694253624 2017
0.0377291747906765 0.021950729043333617 2017
-0.18032315533671328 -0.04870435126455952 2017
-0.36067080655671513 -0.019264936817780298 2017
-0.372303636796094 0.01799282703076832 2017
...
0.007174422512419509 0.0019633270749088716 2019
-0.12315438362311693 -0.001272120509836161 2019
0.06807880349789097 -0.027139767879056143 2019
-0.0415780440397856 0.0005471861484347418 2019
-0.07717256578262432 -0.011382707712158657 2019
0.2627255062420586 -0.0021274426812466496 2019
-0.13879042170946043 0.009918696052991338 2019
-0.040799674958042265 -0.0044788215666135 2019
....
I used this to plot all years in one graph
import seaborn as sns
sns.set(rc={'figure.figsize':(6,5)})
# scatterplot returns a matplotlib Axes; naming it ax avoids shadowing matplotlib.pyplot (plt)
ax = sns.scatterplot(data=corr, x="d_price", y="d_value", hue="year")
#ax.set_title('Price-Volume Correlation ', fontsize=18)
ax.set_xlabel("Change in price", fontsize = 10)
ax.set_ylabel("Change in total transaction volume (USD)", fontsize = 10)
But there are just too many data points, so I want separate plots by year: 5 scatter plots in one grid. Is there a way to do that instead of manually producing 5 separate plots? Also, is there a way to insert text in each plot, something like "corr: 0.22"? Thanks a lot in advance!
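One way this could be approached, sketched with plain matplotlib: group the frame by year, draw each group on its own axes in a single row, and write the per-year Pearson correlation into the panel with ax.text. The function name scatter_by_year and the random data are made up for illustration; only the column names d_value, d_price and year come from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def scatter_by_year(data, x="d_price", y="d_value", by="year"):
    """Draw one scatter panel per year, each annotated with its correlation."""
    groups = list(data.groupby(by))
    fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 4),
                             sharex=True, sharey=True, squeeze=False)
    for ax, (year, group) in zip(axes[0], groups):
        ax.scatter(group[x], group[y], s=10)
        r = group[x].corr(group[y])  # Pearson correlation within this year
        # transAxes places the text relative to the panel, not the data.
        ax.text(0.05, 0.95, f"corr: {r:.2f}", transform=ax.transAxes, va="top")
        ax.set_title(str(year))
        ax.set_xlabel("Change in price")
    axes[0, 0].set_ylabel("Change in total transaction volume (USD)")
    return fig, axes

# Made-up stand-in for the question's `corr` frame.
rng = np.random.default_rng(0)
corr = pd.DataFrame({
    "d_value": rng.normal(size=250),
    "d_price": rng.normal(scale=0.05, size=250),
    "year": np.repeat([2017, 2018, 2019, 2020, 2021], 50),
})

fig, axes = scatter_by_year(corr)
```

Sharing the x and y axes keeps the five panels directly comparable; drop sharex/sharey if the yearly ranges differ a lot.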
I have this pandas data frame, and I want to make a line plot with one line per year:
year month canasta
0 2011 1 239.816531
1 2011 2 239.092353
2 2011 3 239.332308
3 2011 4 237.591538
4 2011 5 238.384231
... ... ... ...
59 2015 12 295.578605
60 2016 1 296.918861
61 2016 2 296.398701
62 2016 3 296.488780
63 2016 4 300.922927
And I tried this code:
dca.groupby(['year', 'month'])['canasta'].mean().reset_index().plot()
But I get this result:
I must be doing something wrong. Could you please help me with this plot? The x axis should be the months, and there should be one line per year.
Why: After reset_index, year and month become ordinary columns, and some_df.plot() simply plots every column of the dataframe as its own line against the row index, which gives the result you posted.
Fix: Use unstack instead of reset_index, so that each year becomes a column and month stays in the index:
(dca.groupby(['year', 'month'])['canasta']
    .mean()
    .unstack('year')
    .plot()
)
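To make the difference concrete, here is a small sketch comparing the two shapes. The numbers are made up; only the column names year, month and canasta come from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tiny made-up frame with the same columns as in the question.
dca = pd.DataFrame({
    "year": [2011, 2011, 2011, 2012, 2012, 2012],
    "month": [1, 2, 3, 1, 2, 3],
    "canasta": [239.8, 239.1, 239.3, 250.0, 251.5, 252.2],
})

means = dca.groupby(["year", "month"])["canasta"].mean()

# reset_index: year and month become plain columns in a "long" frame.
flat = means.reset_index()

# unstack: month stays in the index, each year becomes its own column.
wide = means.unstack("year")
print(wide)

# DataFrame.plot draws one line per column, so this gives one line per year.
ax = wide.plot()
ax.set_xlabel("month")
```

Printing flat and wide side by side shows why plotting the long frame yields one line each for year, month and canasta instead of one line per year.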
Below is a script for a simplified version of the df in question:
import pandas as pd
df = pd.DataFrame({
    'week': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],
    'month' : ['JAN','JAN','JAN','JAN','FEB','FEB','FEB','FEB','MAR','MAR',
               'MAR','MAR','APR','APR','APR','APR','APR'],
    'weekly_stock' : [4,2,5,6,2,3,6,8,7,9,5,3,5,4,5,8,9]
})
df
week month weekly_stock
0 1 JAN 4
1 2 JAN 2
2 3 JAN 5
3 4 JAN 6
4 5 FEB 2
5 6 FEB 3
6 7 FEB 6
7 8 FEB 8
8 9 MAR 7
9 10 MAR 9
10 11 MAR 5
11 12 MAR 3
12 13 APR 5
13 14 APR 4
14 15 APR 5
15 16 APR 8
16 17 APR 9
As it currently stands, the script below produces a line chart with the week as x-labels
import matplotlib.pyplot as plt
# plot chart
labels = df.week
line = df['weekly_stock']
x = range(len(labels))  # tick positions, one per row
fig, ax = plt.subplots(figsize=(20,8))
line1 = ax.plot(x, line, label='2019')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=0)
ax.set_ylabel('Stock')
ax.set_xlabel('week')
ax.set_title('weekly stock')
However, I would like to have the month as the x-label.
INTENDED PLOT:
Any help would be greatly appreciated.
My recommendation is to use a column of valid datetime values instead of the separate 'month' and 'week' columns you have now. Matplotlib is pretty smart when working with valid datetime values, so I'd structure the dates like this first:
import pandas as pd
import matplotlib.pyplot as plt
# valid datetime values in a range
dates = pd.date_range(
    start='2019-01-01',
    end='2019-04-30',
    freq='W',  # weekly increments (Sundays)
    name='dates',
    inclusive='left'  # pandas >= 1.4; older versions used closed='left'
)
weekly_stocks = [4,2,5,6,2,3,6,8,7,9,5,3,5,4,5,8,9]
df = pd.DataFrame(
    {'weekly_stocks': weekly_stocks},
    index=dates  # set dates column as index
)
df.plot(
    figsize=(20,8),
    kind='line',
    title='Weekly Stocks',
    legend=False,
    xlabel='Week',
    ylabel='Stock'
)
plt.grid(which='both', linestyle='--', linewidth=0.5)
Now this is a fairly simple solution. Take notice that the ticks appear exactly where the weeks are; Matplotlib did all the work for us!
(easier) You can either lay the "data foundation" correctly prior to plotting, i.e., format the data so that Matplotlib does all the work, as we did above (think of the ticks as the actual date points created by pd.date_range()).
(harder) Or use tick locators/formatters, as described in the Matplotlib docs.
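For the locator/formatter route, a minimal sketch might look like the following. It assumes the same weekly Sunday dates as above and plots with matplotlib directly rather than through df.plot. The choice of MonthLocator for major ticks, "%b" for the labels, and Sunday minor ticks is mine, not something prescribed by the question.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# 17 weekly (Sunday) dates starting on 2019-01-06, matching the data above.
dates = pd.date_range(start="2019-01-06", periods=17, freq="W")
weekly_stock = [4, 2, 5, 6, 2, 3, 6, 8, 7, 9, 5, 3, 5, 4, 5, 8, 9]

fig, ax = plt.subplots(figsize=(20, 8))
ax.plot(dates, weekly_stock)

# Major ticks: one per month, labelled with the abbreviated month name.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

# Minor ticks: one per week (every Sunday), left unlabelled.
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.SU))

ax.set_xlabel("Month")
ax.set_ylabel("Stock")
ax.set_title("Weekly stock")
```

This keeps one data point per week while the axis labels show only the months.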
Hope this was helpful.
I'm trying to create a graph that shows whether or not average temperatures in my city are increasing. I'm using data provided by NOAA and have a DataFrame that looks like this:
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1939-11 53.1 11 1939
5 1939-12 52.5 12 1939
This is saved in a variable called "avgs", and I then use groupby and plot functions like so:
avgs.groupby(["YEAR"]).plot(kind='line',x='MONTH', y='TAVG')
This produces a line graph (see below for example) for each year that shows the average temperature for each month. That's great stuff, but I'd like to be able to put all of the yearly line graphs into one graph, for the purposes of visual comparison (to see if the monthly averages are increasing).
Example output
I'm a total noob with matplotlib and pandas, so I don't know the best way to do this. Am I going wrong somewhere and just don't realize it? And if I'm on the right track, where should I go from here?
This is very similar to the other answer (by Anake), but here you get control over the legend (in the other answer, the legend entries for all years will be "TAVG"). I added entries for a second year to your data just to show this.
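import io
import pandas as pd
import matplotlib.pyplot as plt
data = '''
DATE TAVG MONTH YEAR
0 1939-07 86.0 07 1939
1 1939-08 84.8 08 1939
2 1939-09 82.2 09 1939
3 1939-10 68.0 10 1939
4 1940-11 53.1 11 1940
5 1940-12 52.5 12 1940
'''
# Parse the whitespace-separated text; the unlabelled first field becomes the index
avgs = pd.read_csv(io.StringIO(data), sep=r'\s+')
ax = plt.subplot()
# One line per year, labelled with the year so the legend is meaningful
for key, group in avgs.groupby("YEAR"):
    ax.plot(group.MONTH, group.TAVG, label=key)
ax.set_xlabel('Month')
ax.set_ylabel('TAVG')
plt.legend()
plt.show()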
This draws one line per year on the same axes, each with its own legend entry.
You can do:
import matplotlib.pyplot as plt
ax = None
for year, group in avgs.groupby("YEAR"):
    # reuse the Axes returned by the previous plot() call
    ax = group.plot(x="MONTH", y="TAVG", ax=ax)
plt.show()
Each plot() returns the matplotlib Axes instance where it drew the plot. So by feeding that back in each time, you can repeatedly draw on the same set of axes.
Unfortunately, I don't think you can do that directly in the functional style you tried.
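As a small extension of the axes-reuse loop, the sketch below also passes label= so each year gets its own legend entry instead of a repeated "TAVG". The helper name plot_years is hypothetical, and the 1940 values are made up so that there are two years to compare; the column names follow the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Three months for two years; the 1940 values are made up for illustration.
avgs = pd.DataFrame({
    "MONTH": [7, 8, 9, 7, 8, 9],
    "YEAR": [1939, 1939, 1939, 1940, 1940, 1940],
    "TAVG": [86.0, 84.8, 82.2, 85.1, 83.9, 80.4],
})

def plot_years(data):
    """Draw one TAVG-by-month line per year on a single shared Axes."""
    ax = None
    for year, group in data.groupby("YEAR"):
        # The first call creates the Axes (ax=None); later calls draw onto it.
        ax = group.plot(x="MONTH", y="TAVG", ax=ax, label=str(year))
    ax.set_ylabel("TAVG")
    return ax

ax = plot_years(avgs)
```

Call plt.show() afterwards to display the combined chart.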