I have a dataframe containing the columns Code, Year and No_of_dues. I want a bar plot with the year on the x axis and the number of dues for each year on the y axis, with one subplot per code, one after another. Please help me.
Sample data is given below.
Code Year No_of_dues
1 2016 100
1 2017 200
1 2018 300
2 2016 200
2 2017 300
2 2018 500
3 2016 600
3 2017 800
3 2018
Try this one:
df.groupby(['Code', 'Year'])['No_of_dues'].sum().to_frame().plot.bar()
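The one-liner above draws every (Code, Year) pair as a bar on a single axis. If one subplot per code is wanted, as the question asks, a minimal sketch is to pivot the data so each code becomes a column and let pandas draw one panel per column. The "3 2018" row has no value in the question, so it is left out here rather than guessed. The Agg backend line is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question (the incomplete "3 2018" row is omitted)
df = pd.DataFrame({
    "Code": [1, 1, 1, 2, 2, 2, 3, 3],
    "Year": [2016, 2017, 2018, 2016, 2017, 2018, 2016, 2017],
    "No_of_dues": [100, 200, 300, 200, 300, 500, 600, 800],
})

# Rows are years, columns are codes, cells are the summed dues
pivot = df.pivot_table(index="Year", columns="Code",
                       values="No_of_dues", aggfunc="sum")

# subplots=True gives one bar chart per column (per code), stacked vertically
axes = pivot.plot.bar(subplots=True, sharex=True, legend=False, figsize=(6, 8))
for ax in axes:
    ax.set_ylabel("No_of_dues")

plt.tight_layout()
plt.show()
```

Pivoting first keeps the years aligned across panels even when a code has no entry for some year.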
Just use seaborn: set your x and y axes, and set hue to the column you want to group by.
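A minimal sketch of that seaborn suggestion, assuming seaborn is installed and using the sample rows from the question (the incomplete "3 2018" row is omitted). Note that this gives grouped bars on one axis rather than separate subplots.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample data from the question (the incomplete "3 2018" row is omitted)
df = pd.DataFrame({
    "Code": [1, 1, 1, 2, 2, 2, 3, 3],
    "Year": [2016, 2017, 2018, 2016, 2017, 2018, 2016, 2017],
    "No_of_dues": [100, 200, 300, 200, 300, 500, 600, 800],
})

# x and y pick the axes; hue splits each year into one bar per code
ax = sns.barplot(data=df, x="Year", y="No_of_dues", hue="Code")
plt.show()
```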
I have a car dataframe:
### Open and load the data ###
my_data_frame = pd.read_csv(data_file_pathname, sep=",", header=0)
### Print some instances in the console ###
print(my_data_frame.head())
print(my_data_frame.dtypes)
model year price transmission mileage fuelType tax mpg engineSize
0 A1 2017 12500 Manual 15735 Petrol 150 55.4 1.4
1 A6 2016 16500 Automatic 36203 Diesel 20 64.2 2.0
2 A1 2016 11000 Manual 29946 Petrol 30 55.4 1.4
3 A4 2017 16800 Automatic 25952 Diesel 145 67.3 2.0
4 A3 2019 17300 Manual 1998 Petrol 145 49.6 1.0
model object
year int64
price int64
transmission object
mileage int64
fuelType object
tax int64
mpg float64
engineSize float64
dtype: object
I have tried many ways to get a box and whiskers plot for the year column but they all failed. Now I am trying out the matplotlib boxplot function and it almost works but the boxplot won't really show. Here is the code I use:
my_data_frame.boxplot(column='year')
plt.show()
As you can see, there is a line and a dot but they seem out of the picture. Also, there are those weird lines on the y-axis that shouldn't be there. Anyone know what my problem is?
I found the solution; it was in front of me the entire time. I had called plt.figure() for a previous plot, so the boxplot was being drawn on that earlier figure, which gave the result you saw.
Creating a separate figure for each plot, fig1 = plt.figure() for the first plot and fig2 = plt.figure() for the boxplot, solved the problem.
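A minimal sketch of that fix, using the five rows printed by head() in the question. The earlier plot is not shown in the question, so the scatter plot here is only a hypothetical stand-in; plt.subplots() is used in place of plt.figure() so each plot also gets its own explicit axes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The five rows printed by head() in the question (subset of columns)
my_data_frame = pd.DataFrame({
    "model": ["A1", "A6", "A1", "A4", "A3"],
    "year": [2017, 2016, 2016, 2017, 2019],
    "price": [12500, 16500, 11000, 16800, 17300],
    "mileage": [15735, 36203, 29946, 25952, 1998],
})

# Hypothetical earlier plot (not shown in the question), on its own figure
fig1, ax1 = plt.subplots()
my_data_frame.plot.scatter(x="mileage", y="price", ax=ax1)

# The box plot gets a fresh figure and axes, so it is not drawn over the old plot
fig2, ax2 = plt.subplots()
my_data_frame.boxplot(column="year", ax=ax2)

plt.show()
```

Passing ax= explicitly avoids relying on whichever figure happens to be current.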
I've got data in the form:
Year Month State Value
2001 Jan AK 80
2001 Feb AK 40
2001 Mar AK 60
2001 Jan LA 70
2001 Feb LA 79
2001 Mar LA 69
2001 Jan KS 65
.
.
This data is only for Year 2001 and Months repeat on each State.
I want one basic graph that shows all of this data together, with one line per State, Month on the x-axis and Value on the y-axis.
When I plot with:
g = df.groupby('State')
for state, data in g:
    plt.plot(df['Month'], df['Value'], label=state)
plt.show()
I get a very wonky looking graph.
I know from plotting the states individually that they don't behave very differently, but they should not overlap anywhere near this much.
Is there a way of building more of a continuous plot?
Your problem is that inside your for loop you're referencing df, which still has the data for all the states. Try:
for state, data in g:
    plt.plot(data['Month'], data['Value'], label = state)
plt.legend()
plt.show()
Hopefully this helps!
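For reference, a self-contained sketch of the corrected loop, built only from the rows shown in the question (KS has a single visible row, so it appears as one marker). The Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Only the rows visible in the question
df = pd.DataFrame({
    "Year": [2001] * 7,
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar", "Jan"],
    "State": ["AK", "AK", "AK", "LA", "LA", "LA", "KS"],
    "Value": [80, 40, 60, 70, 79, 69, 65],
})

fig, ax = plt.subplots()
# Each iteration yields one state's rows; plot those, not the full df
for state, data in df.groupby("State"):
    ax.plot(data["Month"], data["Value"], marker="o", label=state)

ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.legend()
plt.show()
```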
I have a temperature file with many years of temperature records, in the format below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Every year has a different number of records, taken at different times, so the pandas DatetimeIndexes are all different.
I want to plot the different years' data in the same figure for comparison. The x-axis is Jan to Dec, the y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
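A runnable sketch of this pattern, with two small made-up frames (df1 and df2 and their values are hypothetical placeholders, not data from the question):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder frames, only to make the snippet runnable
df1 = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"b": [3, 2, 1]})

ax = df1.plot()   # the first call creates the figure and returns its axes
df2.plot(ax=ax)   # the second call draws on those same axes
plt.show()
```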
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
Run both commands inside the same cell! For some reason it won't work when they are split across sequential cells. For me at least.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year, and aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year 2005 2006 2007 2012
Month
4 NaN 20.6 NaN 20.7
5 NaN NaN 15.533333 NaN
8 19.566667 NaN NaN NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
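The steps above can be put together in one self-contained sketch. It reads the ten sample rows from a string instead of a file (an assumption, so it runs without temp.csv), checks that the pivot_table alternative gives the same table as groupby plus unstack, and labels the x-axis Jan to Dec as the question asks.

```python
import calendar
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question, inlined instead of read from temp.csv
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""
df = pd.read_csv(io.StringIO(raw), names=["date", "time", "value"],
                 parse_dates=["date"])
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month

# Two equivalent ways to get months as rows and years as columns
dfg = df.groupby(["Month", "Year"])["value"].mean().unstack()
dfp = df.pivot_table(index="Month", columns="Year", values="value", aggfunc="mean")

# One line (here: one marker) per year, with all twelve months on the x-axis
ax = dfg.plot(figsize=(9, 7), marker=".")
ax.set_xlim(0.5, 12.5)
ax.set_xticks(range(1, 13))
ax.set_xticklabels(list(calendar.month_abbr)[1:])
ax.set_ylabel("mean value")
plt.show()
```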
To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR == 'FOO':
        pass
    else:
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
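A runnable sketch of the loop without the if condition. The dictionary contents are hypothetical placeholders, and passing label= so that the legend shows the dictionary keys is an addition for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder data: one DataFrame per key, each with a 'column'
dict_of_dfs = {
    "FOO": pd.DataFrame({"column": [1, 2, 3]}),
    "BAR": pd.DataFrame({"column": [2, 4, 6]}),
}

# Create the axes once, then let every DataFrame draw on it
fig, ax = plt.subplots(figsize=(10, 8))
for name, frame in dict_of_dfs.items():
    frame["column"].plot(ax=ax, label=name)

ax.legend()
plt.show()
```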
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)
I have a dataframe df, shown below, with the number of kilometers driven per day by category of people and by type of car.
People Car dmy value(km)
A Renault 14-05-2016 500
B Peugeot 14-05-2016 1000
A Citroen 14-05-2016 400
A Renault 15-05-2016 24
B Peugeot 15-05-2016 247
A Renault 15-05-2016 369
A Citroen 23-05-2016 692
A Citroen 28-05-2016 284
I have 20k lines over 1 year
I want to group by the dmy column to get the mean value of the 'value(km)' column per day
This is what I have done:
I first create a new dataframe given 2 conditions: My graph will only show the mean value of kms per day for 1 type of car and 1 category of people.
yy = (df["Car"] == 'Renault') & (df["People"] == 'A')
Then I create a dataframe to perform the groupby
zz = yy.groupby('dmy')['value(km)'].mean()
And set the dmy column as the index
zz = zz.set_index('dmy')
Then I plot this new zz dataframe:
plt.plot(zz.index, zz["value"].values, linestyle='-', color='b', label="Renault")
plt.gcf().autofmt_xdate()
plt.legend()
plt.show()
No plots appear though. Thx for help!
groupby returns a DataFrameGroupBy object, not a DataFrame.
I would first select the necessary columns, then aggregate, and finally plot:
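# yy must be the filtered DataFrame, e.g. df[mask], not the boolean mask itself
# .mean() is used instead of aggregate(np.mean), which recent pandas versions warn about
zz = yy[['value(km)', 'dmy']].groupby('dmy').mean()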
zz.plot()
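Putting the pieces together, a minimal end-to-end sketch using the eight sample rows from the question. In the question yy is a boolean mask, so this sketch uses it to filter df first; parsing dmy with the day-first format "%d-%m-%Y" is an assumption based on the sample values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# The eight sample rows from the question
df = pd.DataFrame({
    "People": ["A", "B", "A", "A", "B", "A", "A", "A"],
    "Car": ["Renault", "Peugeot", "Citroen", "Renault",
            "Peugeot", "Renault", "Citroen", "Citroen"],
    "dmy": ["14-05-2016", "14-05-2016", "14-05-2016", "15-05-2016",
            "15-05-2016", "15-05-2016", "23-05-2016", "28-05-2016"],
    "value(km)": [500, 1000, 400, 24, 247, 369, 692, 284],
})
# Assumption: dates are day-month-year, as the sample values suggest
df["dmy"] = pd.to_datetime(df["dmy"], format="%d-%m-%Y")

# The comparison gives a boolean mask; index df with it to get the rows
mask = (df["Car"] == "Renault") & (df["People"] == "A")
yy = df[mask]

# Mean km per day; the result is a Series already indexed by dmy
zz = yy.groupby("dmy")["value(km)"].mean()

ax = zz.plot(linestyle="-", marker="o", color="b", label="Renault")
ax.figure.autofmt_xdate()
ax.legend()
plt.show()
```

Because the grouped result is already indexed by dmy, no separate set_index call is needed before plotting.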