Matplotlib plotting dataframe

I'm trying to learn how to plot dataframes. I read in a csv and have the following columns:
cost, model, origin, year
--------------------------
200 x1 usa 2020
145 x1 chn 2020
233 x1 usa 2020
122 x2 chn 2020
583 x2 usa 2020
233 x3 chn 2020
201 x3 chn 2020
I'm trying to create a bar plot and only want to plot the average cost per model.
Here's my attempt, but I don't think I'm on the right track:
df = df.groupby('cost').mean()
plt.bar(df.index, df['model'])
plt.show()

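You can group by model, then calculate the mean of cost and plot it:
df.groupby('model')['cost'].mean().plot.bar()
Or with seaborn, whose barplot shows the mean of y for each category by default:
sns.barplot(data=df, x='model', y='cost', ci=None)
In seaborn 0.12 and later the ci parameter is deprecated; pass errorbar=None instead.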
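The groupby approach can be sketched end to end as follows. This is a minimal example that rebuilds the sample rows from the question by hand and draws the bars with matplotlib directly; the variable name mean_cost and the axis labels are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "cost": [200, 145, 233, 122, 583, 233, 201],
    "model": ["x1", "x1", "x1", "x2", "x2", "x3", "x3"],
    "origin": ["usa", "chn", "usa", "chn", "usa", "chn", "chn"],
    "year": [2020] * 7,
})

# Group by the category (model) and average the numeric column (cost).
# The result is a Series indexed by model, one value per bar.
mean_cost = df.groupby("model")["cost"].mean()

fig, ax = plt.subplots()
ax.bar(mean_cost.index, mean_cost.values)
ax.set_xlabel("model")
ax.set_ylabel("mean cost")
plt.show()
```

The attempt in the question grouped by cost, which produces one group per distinct cost value; grouping by model is what yields one bar per model.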

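You can use the pandas plot function like so:
df.plot.bar(x='model', y='cost')
Note that this draws one bar per row of the dataframe, not one per model. To plot the average, aggregate first:
df.groupby('model', as_index=False)['cost'].mean().plot.bar(x='model', y='cost')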

Related

Plotting barplot category-wise in pandas

I have a dataframe containing the columns Code, Year and No_of_dues. I want a bar plot with the year on the x-axis and the number of dues for each year on the y-axis, with a separate subplot for each code, one after another.
Sample data is given below.
Code Year No_of_dues
1 2016 100
1 2017 200
1 2018 300
2 2016 200
2 2017 300
2 2018 500
3 2016 600
3 2017 800
3 2018

Try this one:
df.groupby(['Code', 'Year'])['No_of_dues'].sum().to_frame().plot.bar()

Just use seaborn: set your x and y axes, and set hue to the column you want to group by, for example
sns.barplot(data=df, x='Year', y='No_of_dues', hue='Code')
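Neither answer above produces one subplot per code, which is what the question asks for. A minimal sketch of that layout with plain matplotlib, assuming the sample data from the question (the truncated last row for Code 3 in 2018 is left out rather than guessed):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question; the incomplete (3, 2018) row is omitted.
df = pd.DataFrame({
    "Code": [1, 1, 1, 2, 2, 2, 3, 3],
    "Year": [2016, 2017, 2018, 2016, 2017, 2018, 2016, 2017],
    "No_of_dues": [100, 200, 300, 200, 300, 500, 600, 800],
})

# One subplot per distinct code, side by side, sharing the y-axis scale.
fig, axes = plt.subplots(1, df["Code"].nunique(), figsize=(12, 4), sharey=True)

for ax, (code, group) in zip(axes, df.groupby("Code")):
    # Years are converted to strings so they are treated as categories.
    ax.bar(group["Year"].astype(str), group["No_of_dues"])
    ax.set_title(f"Code {code}")
    ax.set_xlabel("Year")

axes[0].set_ylabel("No_of_dues")
plt.show()
```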

why is my matplotlib boxplot appearing almost empty with only a few lines and half a dot?

I have a car dataframe:
### Open and load the data ###
my_data_frame = pd.read_csv(data_file_pathname, sep=",", header=0)
### Print some instances in the console ###
print(my_data_frame.head())
print(my_data_frame.dtypes)
model year price transmission mileage fuelType tax mpg engineSize
0 A1 2017 12500 Manual 15735 Petrol 150 55.4 1.4
1 A6 2016 16500 Automatic 36203 Diesel 20 64.2 2.0
2 A1 2016 11000 Manual 29946 Petrol 30 55.4 1.4
3 A4 2017 16800 Automatic 25952 Diesel 145 67.3 2.0
4 A3 2019 17300 Manual 1998 Petrol 145 49.6 1.0
model object
year int64
price int64
transmission object
mileage int64
fuelType object
tax int64
mpg float64
engineSize float64
dtype: object
I have tried many ways to get a box and whiskers plot for the year column but they all failed. Now I am trying out the matplotlib boxplot function and it almost works but the boxplot won't really show. Here is the code I use:
my_data_frame.boxplot(column='year')
plt.show()
As you can see, there is a line and a dot but they seem out of the picture. Also, there are those weird lines on the y-axis that shouldn't be there. Anyone know what my problem is?

Found the solution; it was in front of me the entire time. I had called plt.figure() for a previous plot, so the boxplot was being drawn on that earlier figure, which gave the result you saw.
Creating a separate figure for each plot, fig1 = plt.figure() for the first plot and fig2 = plt.figure() for the boxplot, solved the problem.
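One way to avoid drawing on a stale figure is to create an explicit figure and axes for each plot and pass the axes to the plotting call. The sketch below assumes only the five rows printed in the question; the scatter plot on the first figure is a hypothetical stand-in for whatever the earlier plot was.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open plot windows
import matplotlib.pyplot as plt
import pandas as pd

# A few columns from the rows printed in the question.
my_data_frame = pd.DataFrame({
    "year": [2017, 2016, 2016, 2017, 2019],
    "price": [12500, 16500, 11000, 16800, 17300],
    "mileage": [15735, 36203, 29946, 25952, 1998],
})

# First figure: a stand-in for the earlier plot (assumption, not from the post).
fig1, ax1 = plt.subplots()
ax1.scatter(my_data_frame["mileage"], my_data_frame["price"])

# Second figure: the boxplot gets its own axes, so it cannot land on fig1.
fig2, ax2 = plt.subplots()
my_data_frame.boxplot(column="year", ax=ax2)

plt.show()
```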

Multiple column plotting Python

I've got data in the form:
Year Month State Value
2001 Jan AK 80
2001 Feb AK 40
2001 Mar AK 60
2001 Jan LA 70
2001 Feb LA 79
2001 Mar LA 69
2001 Jan KS 65
.
.
This data is only for Year 2001 and Months repeat on each State.
I want a single basic graph with one line per State, with Month on the x-axis and Value on the y-axis.
When I plot with:
g = df.groupby('State')
for state, data in g:
    plt.plot(df['Month'], df['Value'], label=state)
plt.show()
I get a very wonky looking graph.
From plotting the states individually I know their behaviour isn't extremely different, but the lines should not overlap nearly this much.
Is there a way of building more of a continuous plot?

Your problem is that inside your for loop you're referencing df, which still has the data for all the states. Try:
for state, data in g:
    plt.plot(data['Month'], data['Value'], label=state)
plt.legend()
plt.show()
Hopefully this helps!
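The corrected loop, as a self-contained sketch. It assumes only the complete AK and LA rows from the question (the single KS row and the elided rows are left out) and uses an explicit axes object instead of the pyplot state machine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Complete rows for two states, taken from the question.
df = pd.DataFrame({
    "Year": [2001] * 6,
    "Month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "State": ["AK", "AK", "AK", "LA", "LA", "LA"],
    "Value": [80, 40, 60, 70, 79, 69],
})

fig, ax = plt.subplots()

# groupby yields (key, sub-frame) pairs; plot the sub-frame, not the full df.
for state, data in df.groupby("State"):
    ax.plot(data["Month"], data["Value"], label=state)

ax.set_xlabel("Month")
ax.set_ylabel("Value")
ax.legend()
plt.show()
```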

Python matplotlib - add trend line, make subplot and save to .pdf [duplicate]

I have a temperature file with many years temperature records, in a format as below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Each year has a different number of records taken at different times, so the pandas DatetimeIndex differs from year to year.
I want to plot the data for the different years in the same figure for comparison. The x-axis is Jan to Dec and the y-axis is temperature. How should I go about doing this?

Try:
ax = df1.plot()
df2.plot(ax=ax)

If you are running a Jupyter/IPython notebook and having problems using
ax = df1.plot()
df2.plot(ax=ax)
run both commands in the same cell. With the inline backend the figure is rendered when its cell finishes, so drawing on ax from a later cell does not show up in the output that was already displayed. At least that was the case for me.

Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year and aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year 2005 2006 2007 2012
Month
4 NaN 20.6 NaN 20.7
5 NaN NaN 15.533333 NaN
8 19.566667 NaN NaN NaN
Now it's easy to plot each year as a separate line. The sample data has observations in only one month for each year, so each line reduces to a single marker.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
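To check that the groupby/unstack and pivot_table routes give the same table, here is a minimal sketch that reads the ten sample records from the question out of an in-memory string (io.StringIO stands in for the temp.csv file, which is an assumption made so the example runs on its own).

```python
import io
import pandas as pd

# The ten records shown in the question, with no header row.
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""

df = pd.read_csv(io.StringIO(raw), names=["date", "time", "value"],
                 parse_dates=["date"])
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month

# Route 1: group on both keys, then move Year from the index to the columns.
dfg = df.groupby(["Month", "Year"])["value"].mean().unstack()

# Route 2: pivot_table does the grouping and reshaping in one call.
dfp = df.pivot_table(index="Month", columns="Year", values="value",
                     aggfunc="mean")

print(dfp)
```

Either table can then be passed to .plot() to get one line per year.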

To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR != 'FOO':
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
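A runnable version of the second loop might look like the sketch below. The dictionary contents and the column name value are made up for illustration; only the pattern of creating the axes once and passing ax=ax on every call comes from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical input: two small frames that share a column named "value".
dict_of_dfs = {
    "FOO": pd.DataFrame({"value": [1, 2, 3]}),
    "BAR": pd.DataFrame({"value": [3, 2, 1]}),
}

# Create the axes once, then draw every frame onto it.
fig, ax = plt.subplots(figsize=(10, 8))
for name, frame in dict_of_dfs.items():
    # label=name makes the legend show the dictionary key for each line.
    frame["value"].plot(ax=ax, label=name)

ax.legend()
plt.show()
```

Iterating over .items() gives the key and the frame together, which avoids the repeated dictionary lookup.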

You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)

plot dataframe based on conditions

I have a dataframe df below: showing the number of kilometers done per day per type of people by type of car.
People Car dmy value(km)
A Renault 14-05-2016 500
B Peugeot 14-05-2016 1000
A Citroen 14-05-2016 400
A Renault 15-05-2016 24
B Peugeot 15-05-2016 247
A Renault 15-05-2016 369
A Citroen 23-05-2016 692
A Citroen 28-05-2016 284
I have 20k lines over 1 year
I want to group by the dmy column to get the mean value of the 'value(km)' column per day
This is what I have done:
I first create a new dataframe given 2 conditions: My graph will only show the mean value of kms per day for 1 type of car and 1 category of people.
yy = (df["Car"] == 'Renault') & (df["People"] == 'A')
Then I create a dataframe to perform the group.by
zz = yy.groupby('dmy')['value(km)'].mean()
And set the dmy column as the index
zz = zz.set_index('dmy')
Then I plot this new zz dataframe:
plt.plot(zz.index, zz["value"].values, linestyle='-', color='b', label="Renault")
plt.gcf().autofmt_xdate()
plt.legend()
plt.show()
No plots appear though. Thx for help!

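groupby returns a DataFrameGroupBy and not a DataFrame, so it has to be followed by an aggregation. In addition, yy is a boolean mask rather than a DataFrame, so it has no dmy column to group by.
I would first use the mask to filter df, select the necessary columns, then aggregate and plot:
zz = df.loc[yy, ['value(km)', 'dmy']].groupby('dmy').mean()
zz.plot()
The result is already indexed by dmy, so the set_index call is not needed, and the column is named 'value(km)', not 'value'.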
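Put together, the filter, group, and plot steps could look like this sketch. It rebuilds the eight sample rows from the question; converting dmy to datetime with an explicit day-month-year format is an added assumption so that the days sort chronologically on the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "People": ["A", "B", "A", "A", "B", "A", "A", "A"],
    "Car": ["Renault", "Peugeot", "Citroen", "Renault",
            "Peugeot", "Renault", "Citroen", "Citroen"],
    "dmy": ["14-05-2016", "14-05-2016", "14-05-2016", "15-05-2016",
            "15-05-2016", "15-05-2016", "23-05-2016", "28-05-2016"],
    "value(km)": [500, 1000, 400, 24, 247, 369, 692, 284],
})

# Parse the day-month-year strings so the index sorts as dates.
df["dmy"] = pd.to_datetime(df["dmy"], format="%d-%m-%Y")

# The boolean mask selects rows; it is applied to df, not grouped itself.
mask = (df["Car"] == "Renault") & (df["People"] == "A")
daily_mean = df.loc[mask].groupby("dmy")["value(km)"].mean()

fig, ax = plt.subplots()
ax.plot(daily_mean.index, daily_mean.values,
        linestyle="-", color="b", label="Renault")
fig.autofmt_xdate()
ax.legend()
plt.show()
```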
