I'm trying to learn how to plot dataframes. I read in a csv and have the following columns:
cost, model, origin, year
--------------------------
200 x1 usa 2020
145 x1 chn 2020
233 x1 usa 2020
122 x2 chn 2020
583 x2 usa 2020
233 x3 chn 2020
201 x3 chn 2020
I'm trying to create a bar plot and only want to plot the average cost per model.
Here's my attempt, but I don't think I'm on the right track:
df = df.groupby('cost').mean()
plt.bar(df.index, df['model'])
plt.show()
You can groupby model, then calculate the mean of cost and plot it:
df.groupby('model')['cost'].mean().plot.bar()
Or with seaborn:
sns.barplot(data=df, x='model', y='cost', ci=None)
You can use the pandas plot function like so:
df.plot.bar(x='model', y='cost')
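As a quick check of the groupby approach against the sample rows from the question, here is a minimal sketch. The frame is rebuilt by hand from the table above, and the name mean_cost is only illustrative. Note that calling df.plot.bar(x='model', y='cost') on the raw frame draws one bar per row, so the aggregation step is what produces one bar per model.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "cost": [200, 145, 233, 122, 583, 233, 201],
    "model": ["x1", "x1", "x1", "x2", "x2", "x3", "x3"],
    "origin": ["usa", "chn", "usa", "chn", "usa", "chn", "chn"],
    "year": [2020] * 7,
})

# One mean value per model; the model names become the index.
mean_cost = df.groupby("model")["cost"].mean()

# One bar per model (three bars for this sample).
ax = mean_cost.plot.bar()
ax.set_ylabel("average cost")
```

When running this as a script rather than in a notebook, call plt.show() afterwards to display the figure.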
I would like to draw a pie chart in Python based on the column values, using the sum of each console column over the whole year.
That is, the percentage of total units for the whole year for Megadrive, Super Nintendo and NES, ignoring the other two columns.
                 Consoles Units Sold
Quarter  Megadrive  Super Nintendo   NES  Total Units  Total Sales
Q1            1230            7649   765         9644       316500
Q2             345            3481   874         4700       274950
Q3             654            2377  1234         4265       337050
Q4            1234            6555  2666        10455       334050
But I can only find ways to do it using all columns or by rows.
Thanks
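One possible approach, sketched here under the assumption that the table is loaded into a DataFrame with the column names shown above: select only the three console columns, sum each over the four quarters, and plot the resulting Series as a pie chart. The names consoles and totals are illustrative, and the frame is typed in by hand from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "Quarter": ["Q1", "Q2", "Q3", "Q4"],
    "Megadrive": [1230, 345, 654, 1234],
    "Super Nintendo": [7649, 3481, 2377, 6555],
    "NES": [765, 874, 1234, 2666],
    "Total Units": [9644, 4700, 4265, 10455],
    "Total Sales": [316500, 274950, 337050, 334050],
})

# Keep only the console columns and sum each one over the year.
consoles = ["Megadrive", "Super Nintendo", "NES"]
totals = df[consoles].sum()

# One wedge per console; autopct prints each share as a percentage.
ax = totals.plot.pie(autopct="%1.1f%%")
ax.set_ylabel("")
ax.set_title("Share of units sold per console")
```

Selecting the columns before summing is what leaves out Total Units and Total Sales. Call plt.show() afterwards when running this as a script.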
I have 12 average monthly values for 1000 columns and I want to convert the data to daily values using pandas. I tried interpolate, but the daily values I got run from 31/01/1991 to 31/12/1991, which does not cover the whole year: the values for January are missing. I used date_range for the index of my data frame.
date=pd.date_range(start="01/01/1991",end="31/12/1991",freq="M")
upsampled=df.resample("D")
interpolated = upsampled.interpolate(method='linear')
How can I get the interpolated values for 365 days?
Note that interpolation is between the known points.
So to interpolate throughout the whole year, it is not enough to have
just 12 values (for each month).
You must have 13 values (e.g. for the start of each month and
the start of the next year).
Thus I created the source df as:
import numpy as np
import pandas as pd
date = pd.date_range(start='01/01/1991', periods=13, freq='MS')
df = pd.DataFrame({'date': date, 'amount': np.random.randint(100, 200, date.size)})
getting e.g.:
date amount
0 1991-01-01 113
1 1991-02-01 164
2 1991-03-01 181
3 1991-04-01 164
4 1991-05-01 155
5 1991-06-01 157
6 1991-07-01 118
7 1991-08-01 133
8 1991-09-01 184
9 1991-10-01 183
10 1991-11-01 159
11 1991-12-01 193
12 1992-01-01 163
Then to upsample it to daily frequency and interpolate, I ran:
df.set_index('date').resample('D').interpolate()
If you don't want the result to contain the last row (for 1992-01-01),
take only a slice of the above result, dropping the last row:
df.set_index('date').resample('D').interpolate()[:-1]
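To confirm that this covers every day of 1991, here is a minimal sketch of the same steps. It uses fixed, made-up amounts instead of random ones (an assumption, so the result is reproducible), and the name daily is only illustrative.

```python
import numpy as np
import pandas as pd

# 13 month-start dates: Jan 1991 through Jan 1992.
date = pd.date_range(start="1991-01-01", periods=13, freq="MS")

# Made-up, deterministic amounts: 100, 110, ..., 220.
df = pd.DataFrame({"date": date, "amount": np.arange(100, 230, 10)})

# Upsample to daily frequency, interpolate linearly between the known
# points, then drop the final row (1992-01-01).
daily = df.set_index("date").resample("D").interpolate().iloc[:-1]

print(len(daily))                       # number of daily rows
print(daily.index[0], daily.index[-1])  # first and last day covered
```

Because 1991 is not a leap year, the result has 365 rows, from 1991-01-01 to 1991-12-31, with no missing values.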
df
SKU  Comp  Brand  Jan_Sales  Feb_Sales  Mar_sales  Apr_sales  Dec_sales..
A    AC    BA           122        100         50        200        300
B    BC    BB           100         50         80         90        250
C    CC    BC            40         30        100         10         11
and so on
Now I want a graph that plots Jan sales, Feb sales and so on through Dec as one line for SKU A, with another line on the same graph for SKU B and another for SKU C.
I read a few answers which say that I need to transpose my data, something like below:
df.T.plot()
However, my first column is SKU and I want to plot based on that. The rest of the columns are numeric. Each line should be labelled with its SKU name, and the plotting should be row-wise.
EDIT (added after receiving some answers, as I am facing this issue in a few other datasets):
Let's say I don't want the columns Company, Brand etc. What should I do then?
Use DataFrame.set_index to convert SKU to the index and then transpose:
df.set_index('SKU').T.plot()
Use set_index then transpose:
df.set_index("SKU").T.plot()
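For the EDIT in the question (leaving out the Comp and Brand columns), one option is to drop them before setting the index and transposing. This is a minimal sketch that assumes the column names from the sample table and includes only the months shown there. The name wide is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question (only the months shown there).
df = pd.DataFrame({
    "SKU": ["A", "B", "C"],
    "Comp": ["AC", "BC", "CC"],
    "Brand": ["BA", "BB", "BC"],
    "Jan_Sales": [122, 100, 40],
    "Feb_Sales": [100, 50, 30],
    "Mar_sales": [50, 80, 100],
    "Apr_sales": [200, 90, 10],
    "Dec_sales": [300, 250, 11],
})

# Drop the non-numeric columns, make SKU the index, then transpose so that
# each SKU becomes a column (one line each) and the months form the x-axis.
wide = df.drop(columns=["Comp", "Brand"]).set_index("SKU").T

# The legend is labelled with the SKU names.
ax = wide.plot()
```

Dropping the text columns first means the transposed frame holds only numbers, which is what the line plot needs.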
I have a temperature file with many years temperature records, in a format as below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Each year has a different number of records, taken at different times, so the pandas DatetimeIndexes are all different.
I want to plot the different years' data in the same figure for comparison. The X-axis is Jan to Dec, the Y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
Run the commands inside the same cell! For some reason it won't work when they are split across sequential cells, at least for me.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year and aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year        2005  2006       2007  2012
Month
4            NaN  20.6        NaN  20.7
5            NaN   NaN  15.533333   NaN
8      19.566667   NaN        NaN   NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
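As a check on the pivot_table alternative mentioned above, here is a minimal sketch that builds both tables from the question's sample rows and compares them. The data is read from an in-memory string rather than a file (an assumption made to keep the example self-contained), and the name raw is illustrative.

```python
import io

import pandas as pd

# Sample records copied from the question.
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""

df = pd.read_csv(io.StringIO(raw), names=["date", "time", "value"], parse_dates=["date"])
df["Year"] = df["date"].dt.year
df["Month"] = df["date"].dt.month

# Approach 1: group by month and year, then move Year into the columns.
dfg = df.groupby(["Month", "Year"])["value"].mean().unstack()

# Approach 2: the same reshaping in a single pivot_table call.
dfp = df.pivot_table(index="Month", columns="Year", values="value", aggfunc="mean")

print(dfg)
print(dfp)
```

Both tables have months as rows and years as columns, so either can be passed to .plot() to get one line per year.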
To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR == 'FOO':
        pass
    else:
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
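A self-contained sketch of the loop above, with made-up data, might look like this. The dictionary contents and the column name value are hypothetical. Renaming each Series to its dictionary key is one way to get a distinct legend entry per frame.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frames standing in for dict_of_dfs.
dict_of_dfs = {
    "FOO": pd.DataFrame({"value": [1.0, 2.0, 3.0]}),
    "BAR": pd.DataFrame({"value": [3.0, 2.0, 1.0]}),
}

# Create the axes once and draw every frame onto them.
fig, ax = plt.subplots()
for name, frame in dict_of_dfs.items():
    # The Series name is used as the line label.
    frame["value"].rename(name).plot(ax=ax)

ax.legend()
```

Creating the axes up front with plt.subplots() avoids special-casing the first frame.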
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
     year month  passengers
0    1949   Jan         112
1    1949   Feb         118
2    1949   Mar         132
3    1949   Apr         129
4    1949   May         121
..    ...   ...         ...
139  1960   Aug         606
140  1960   Sep         508
141  1960   Oct         461
142  1960   Nov         390
143  1960   Dec         432
sns.lineplot(x='month', y='passengers', hue='year', data=df)