How to plot daily averages with pandas? - python

This should be very easy, but I'm having several issues. The thing is, I want to do something like this post, but (1) I have a datetime field, so I have the hour, minutes and seconds in my date column, (2) I want to plot a line graph by day.
So, this is my data:
date col1 col2
2020-01-01 00:01:020 20 500
2020-01-02 00:01:020 10 500
2020-01-02 00:01:000 20 500
2020-01-02 00:01:021 20 500
2020-02-05 20:11:010 30 500
2020-02-05 10:01:020 10 500
.
.
.
So, as I mentioned above, what I want is to plot the daily average of col1.
I started with this:
df.groupby('date')['col1'].mean()
That didn't work because of the hours, minutes and seconds.
Later, I tried this:
df["day"] = df["date"].dt.day
df.groupby("day")["col1"].mean().plot(kind="line")
I almost did it, but the column day is not actually the day, but a number which represents the position of the day in the year, I guess. So any ideas on how to make this plot?

IIUC, group by the date instead of the day:
df.groupby(df['date'].dt.date)["col1"].mean().plot(kind="line",rot=25)
# no need to create a separate date column for this; pass the dates directly to groupby()
OR
df.groupby(df['date'].dt.normalize())["col1"].mean().plot(kind="line",rot=25)
Optional: the two variants below also work. Note that pd.Grouper() creates a row for every calendar day in the range, so days without data show up as NaN and have to be dropped; dt.floor('1D') behaves like dt.normalize(), so the dropna() there is only a safeguard.
#via pd.Grouper():
df.groupby(pd.Grouper(key='date',freq='1D'))["col1"].mean().dropna().plot(kind="line")
#OR
#via dt.floor():
df.groupby(df['date'].dt.floor('1D'))["col1"].mean().dropna().plot(kind="line")
output(for given sample data):
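As a rough check of the numbers behind that plot, the sketch below computes the daily means for the sample rows without plotting. It assumes the three-digit seconds in the question (e.g. 00:01:020) were meant as two digits; that reading is my assumption, not something the question confirms. It also shows why the .dt.day attempt went wrong: .dt.day is the day of the month, so the 5th of January and the 5th of February would land in the same group.

```python
import pandas as pd

# Sample rows from the question; seconds are written with two digits here
# (an assumption about what the three-digit values were meant to be).
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-01-01 00:01:20", "2020-01-02 00:01:20",
        "2020-01-02 00:01:00", "2020-01-02 00:01:21",
        "2020-02-05 20:11:10", "2020-02-05 10:01:20",
    ]),
    "col1": [20, 10, 20, 20, 30, 10],
})

# .dt.day is the day of the month (1..31), so it merges the same day
# number across different months.
by_day_of_month = df.groupby(df["date"].dt.day)["col1"].mean()

# .dt.normalize() keeps the full calendar date and only zeroes the time part.
daily = df.groupby(df["date"].dt.normalize())["col1"].mean()

# pd.Grouper produces one row per calendar day in the range, hence the NaNs.
with_gaps = df.groupby(pd.Grouper(key="date", freq="1D"))["col1"].mean()

print(daily)
print(int(with_gaps.isna().sum()), "empty days out of", len(with_gaps))
```

Calling .plot(kind="line") on daily should then draw one point per date that actually has data.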

Since this question has seaborn and plotly tags as well:
sns.lineplot performs this operation without the need for a groupby mean, as the default estimator computes the mean value per x instance. To remove the error shading, set ci=None (in seaborn 0.12 and later, ci is deprecated in favour of errorbar=None).
Imports and setup:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame({
'date': ['2020-01-01 00:01:020', '2020-01-02 00:01:020',
'2020-01-02 00:01:000', '2020-01-02 00:01:021',
'2020-02-05 20:11:010', '2020-02-05 10:01:020'],
'col1': [20, 10, 20, 20, 30, 10],
'col2': [500, 500, 500, 500, 500, 500]
})
df['date'] = pd.to_datetime(df['date'])
Plotting Code:
# Seaborn Line Plot x is the date, y is col1 default estimator is mean
ax = sns.lineplot(data=df, x=df['date'].dt.date, y='col1', ci=None)
ax.tick_params(axis='x', rotation=45) # Make X ticks easier to read
plt.tight_layout()
plt.show()
For plotly, take the groupby mean and pass it to px.line.
Imports and setup:
import pandas as pd
import plotly.express as px
df = pd.DataFrame({
'date': ['2020-01-01 00:01:020', '2020-01-02 00:01:020',
'2020-01-02 00:01:000', '2020-01-02 00:01:021',
'2020-02-05 20:11:010', '2020-02-05 10:01:020'],
'col1': [20, 10, 20, 20, 30, 10],
'col2': [500, 500, 500, 500, 500, 500]
})
df['date'] = pd.to_datetime(df['date'])
Plotting code:
plot_values = df.groupby(df['date'].dt.date)["col1"].mean()
fig = px.line(plot_values)
fig.show()

What do you want exactly? The date without the time?
Try this:
df["day"] = df["date"].apply(lambda l: l.date())

Related

How to get three plots from pandas plot into one figure

I have massive CSV file which uses timeseries which spans 3 years and looks something like this:
Date Company1 Company2
2020-01-01 00:00:00 100 200
2020-01-01 01:00:00 110 180
2020-01-01 02:00:00 90 210
2020-01-01 03:00:00 100 200
.... ... ...
2020-12-31 21:00:00 100 200
2020-12-31 22:00:00 80 230
2020-12-31 23:00:00 120 220
Except I have 10 companies.
Anyway, I managed to plot 3 plots for one month for each year, looks like this
newMatrix.plot(x='Date', y='Company4', xlim=('2020-01-01 00:00:00', '2020-01-31 23:00:00'))
newMatrix.plot(x='Date', y='Company4', xlim=('2021-01-01 00:00:00', '2021-01-31 23:00:00'))
newMatrix.plot(x='Date', y='Company4', xlim=('2022-01-01 00:00:00', '2022-01-31 23:00:00'))
Now the problem is that I can't figure out how to make one figure where I can see better how the trends differ between years (for example during January each year). The best outcome would be to have the days/months on the x axis and each plotted line representing each year.
I have been experimenting combining matplotlib with pandas plot, but so far I either get no plot or three figures. How can I solve this?
I think you need to do something like this (keep in mind I don't have the entire df):
import pandas as pd
import matplotlib.pyplot as plt
# create DataFrame
data = {'Date': ['2020-01-01 00:00:00', '2020-01-01 01:00:00', '2020-01-01 02:00:00', '2020-01-01 03:00:00', '2020-12-31 21:00:00', '2020-12-31 22:00:00', '2020-12-31 23:00:00'],
'Company1': [100, 110, 90, 100, 100, 80, 120],
'Company2': [200, 180, 210, 200, 200, 230, 220]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(15, 6), sharey=True)
for i, company in enumerate(df.columns):
    # create mask for January of each year (the upper bound is exclusive,
    # so the hours of 31 January are included)
    mask_2020 = (df.index >= '2020-01-01') & (df.index < '2020-02-01')
    mask_2021 = (df.index >= '2021-01-01') & (df.index < '2021-02-01')
    mask_2022 = (df.index >= '2022-01-01') & (df.index < '2022-02-01')
    # one subplot per company, laid out on the 2 x 5 grid
    ax = axes[i//5, i%5]
    ax.plot(df.loc[mask_2020, company], label='2020')
    ax.plot(df.loc[mask_2021, company], label='2021')
    ax.plot(df.loc[mask_2022, company], label='2022')
    ax.set_title(company)
    ax.legend()
fig.text(0.04, 0.5, 'Value', va='center', rotation='vertical')
fig.text(0.5, 0.04, 'Date', ha='center')
fig.suptitle('Company trends for January of each year')
plt.subplots_adjust(wspace=0.3, hspace=0.5)
plt.show()
which gives:
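Since the question asks for the days on the x axis with one line per year, another option is to reshape the data so that each year becomes its own column over a shared day-of-month index. The sketch below is only an illustration under assumptions: the hourly values are synthetic (generated here because the real CSV is not available), and it averages per day, which may or may not be the resolution the asker wants.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for 2020-2022 (made up for illustration only).
idx = pd.date_range("2020-01-01", "2022-12-31 23:00", freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
df = pd.DataFrame({"Date": idx, "Company4": rng.normal(100, 10, len(idx))})

# Keep January only and split the timestamp into year and day of month.
jan = df[df["Date"].dt.month == 1].copy()
jan["year"] = jan["Date"].dt.year
jan["day"] = jan["Date"].dt.day

# One row per day of the month, one column per year, daily mean as value.
wide = jan.pivot_table(index="day", columns="year", values="Company4", aggfunc="mean")
print(wide.head())
```

Calling wide.plot() on the result should draw all three years in a single figure, because every column shares the same 1 to 31 index.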

How to read a specific column value and also n - 1 from a dataframe

I have a set of data in CSV format, and I need to find the difference in total sales amount between yesterday and today. Could you please explain how to access the values in the columns and compute this?
date sales value diff
10-oct-22 100 0
11-oct-22 120 20
12-oct-22 105 -15
How do I get the date and sales value from a pandas dataframe and add diff as a new column?
I have no idea how to do this.
something like this (?):
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': [datetime(2022, 10, 10), datetime(2022, 10, 11), datetime(2022, 10, 12)], 'sales value': [100, 120, 105]})
df['diff'] = df['sales value'].diff().fillna(0)
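If the dates arrive as strings such as 10-oct-22, as in the question's table, a sketch along the following lines may help. The format string '%d-%b-%y' is my assumption about that layout, and the rows are sorted by date first so that diff() really compares each day with the previous one.

```python
import pandas as pd

# Rows copied from the table in the question.
df = pd.DataFrame({
    "date": ["10-oct-22", "11-oct-22", "12-oct-22"],
    "sales value": [100, 120, 105],
})

# Parse day-month-year strings (assumed format), then order by date.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")
df = df.sort_values("date").reset_index(drop=True)

# diff() subtracts the previous row; the first row has no predecessor,
# so its NaN is replaced by 0 as in the expected table.
df["diff"] = df["sales value"].diff().fillna(0).astype(int)
print(df)
```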

Make datetime line look nice on seaborn plot x axis

How do you reformat from datetime to Week 1, Week 2... to plot onto a seaborn line chart?
Input
Date Ratio
0 2019-10-04 0.350365
1 2019-10-04 0.416058
2 2019-10-11 0.489051
3 2019-10-18 0.540146
4 2019-10-25 0.598540
5 2019-11-08 0.547445
6 2019-11-01 0.722628
7 2019-11-15 0.788321
8 2019-11-22 0.875912
9 2019-11-27 0.948905
Desired output
I was able to cheese it by matching the natural index of the dataframe to the week. I wonder if there's another way to do this.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Date': ['2019-10-04',
'2019-10-04',
'2019-10-11',
'2019-10-18',
'2019-10-25',
'2019-11-08',
'2019-11-01',
'2019-11-15',
'2019-11-22',
'2019-11-27'],
'Ratio': [0.350365,
0.416058,
0.489051,
0.540146,
0.598540,
0.547445,
0.722628,
0.788321,
0.875912,
0.948905]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
graph = sns.lineplot(data=df,x='Date',y='Ratio')
plt.show()
# First plot looks bad.
week_mapping = dict(zip(df['Date'].unique(),range(len(df['Date'].unique()))))
df['Week'] = df['Date'].map(week_mapping)
graph = sns.lineplot(data=df,x='Week',y='Ratio')
plt.show()
# This plot looks better, but method seems cheesy.
It looks like your data is already spaced weekly, so you can just do:
df.groupby('Date',as_index=False)['Ratio'].mean().plot()
Output:
You can make a new column with the week number and use that as your x value. This gives you the week of the year. If you want your week numbers to start at 0, just subtract the week number of the first date from the value (see the commented-out line in the code).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime as dt
data = {'Date': ['2019-10-04',
'2019-10-04',
'2019-10-11',
'2019-10-18',
'2019-10-25',
'2019-11-08',
'2019-11-01',
'2019-11-15',
'2019-11-22',
'2019-11-27'],
'Ratio': [0.350365,
0.416058,
0.489051,
0.540146,
0.598540,
0.547445,
0.722628,
0.788321,
0.875912,
0.948905]}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
# To get the ISO week number of the year (Series.dt.week was removed in pandas 2.0)
df['Week'] = df['Date'].dt.isocalendar().week.astype(int)
# Or you can use the line below to start the week numbers at 0
#df['Week'] = df['Date'].dt.isocalendar().week.astype(int) - df['Date'].min().week
graph = sns.lineplot(data=df,x='Week',y='Ratio')
plt.show()
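The week of the year restarts at a year boundary, so for data that spans New Year it may be safer to count weeks from the first date instead. This is a minimal sketch of that idea, not part of the answer above; the column names Week and WeekLabel are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime([
    "2019-10-04", "2019-10-04", "2019-10-11", "2019-10-18", "2019-10-25",
    "2019-11-08", "2019-11-01", "2019-11-15", "2019-11-22", "2019-11-27",
])})

# Whole days elapsed since the earliest date in the column.
days_since_start = (df["Date"] - df["Date"].min()).dt.days

# 7-day bins counted from the first date, starting at week 1.
df["Week"] = days_since_start // 7 + 1
df["WeekLabel"] = "Week " + df["Week"].astype(str)
print(df)
```

Passing x='Week' to sns.lineplot keeps the axis numeric and ordered; the WeekLabel strings are better suited to tick labels than to the x values themselves.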

python how to convert one column in dataframe to date type and plot

I have one dataframe df as below:
df = pd.DataFrame({'date': [20121231,20130102, 20130105, 20130106, 20130107, 20130108],'price': [25, 163, 235, 36, 40, 82]})
How can I convert df['date'] to a date type and plot 'price' on the y axis against 'date' on the x axis?
Thanks a lot.
Use to_datetime with parameter format, check http://strftime.org/:
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
print (df)
date price
0 2012-12-31 25
1 2013-01-02 163
2 2013-01-05 235
3 2013-01-06 36
4 2013-01-07 40
5 2013-01-08 82
And then plot:
df.plot(x='date', y='price')
import pandas as pd
%matplotlib inline
df = pd.DataFrame({'date': [20121231,20130102, 20130105, 20130106, 20130107,
20130108],'price': [25, 163, 235, 36, 40, 82]})
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df.plot(x='date', y='price')
With pandas you can directly convert the date column to datetime type. And then you can plot with matplotlib. Take a look at this answer and also this one.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
df = pd.DataFrame(
{'date': [20121231, 20130102, 20130105, 20130106, 20130107, 20130108],
'price': [25, 163, 235, 36, 40, 82]
})
fig, ax = plt.subplots()
# Date plot with matplotlib (ax.plot accepts datetimes directly;
# plot_date is deprecated since Matplotlib 3.9)
ax.plot(
    pd.to_datetime(df["date"], format="%Y%m%d"),
    df["price"],
    'v-'
)
# Days and months and the horizontal locators
ax.xaxis.set_minor_locator(dates.DayLocator())
ax.xaxis.set_minor_formatter(dates.DateFormatter('%d\n%a'))
ax.xaxis.set_major_locator(dates.MonthLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('\n\n\n%b\n%Y'))
ax.xaxis.grid(True, which="minor")
ax.yaxis.grid()
plt.tight_layout()
plt.show()
Result:

pandas scatter, timedata and size of point

My DataFrame looks like this:
I plot it with this code:
tmp['event_name'].plot(style='.', figsize=(20,10), grid=True)
The result looks like this:
I want to change the size of the points (using the column details).
Question:
How can I do it? plot doesn't have a size argument, and I can't use plot.scatter() because I can't use a time format for the x axis.
DataFrame.plot passes any unknown keywords down to matplotlib, as stated in the linked docs. Therefore, you can specify the marker size using the general matplotlib keyword ms (short for markersize):
tmp['event_name'].plot(style='.', figsize=(20,10), grid=True, ms=5)
That said, you can use plt.scatter with timestamps as well, which makes using the 'details' column as the marker size more straightforward (note that s is the marker area in points squared, whereas ms is a size in points):
import matplotlib.pyplot as plt
import pandas as pd
data = {'time': ['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04'],
'event_name': [2, 2, 2, 2],
'details': [46, 16, 1, 7]}
df = pd.DataFrame(data)
dates = [pd.to_datetime(date) for date in df.time]
plt.scatter(dates, df.event_name, s=df.details)
plt.show()
You can try this:
for index, i in enumerate(df['details']):
    plt.plot(df.index[index], df.iloc[index]['event_name'], marker='.', linestyle='None', markersize=i*4, color='b')
plt.show()
Example:
import matplotlib.pyplot as plt
import pandas as pd
df = {'time': ['2015-01-01','2015-01-02','2015-01-03', '2015-01-04', '2015-01-05'],'event_name': [2,2,2,2,2], 'details':[46,16,1,7,4]}
df = pd.DataFrame(data=df)
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
df = df.set_index('time')
df:
details event_name
time
2015-01-01 46 2
2015-01-02 16 2
2015-01-03 1 2
2015-01-04 7 2
2015-01-05 4 2
Output:
