Let's say I have one-minute data during business hours of 8am to 4pm over three days. I would like to plot these data using the pandas plot function:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(51723)
dates = pd.date_range("11/8/2018", "11/11/2018", freq = "min")
df = pd.DataFrame(np.random.rand(len(dates)), index = dates, columns = ['A'])
df = df[(df.index.hour >= 8) & (df.index.hour <= 16)] # filter for business hours
fig, ax = plt.subplots()
df.plot(ax = ax)
plt.show()
However, the plot function also includes the overnight hours on the x-axis, so the line is stretched across each overnight period where there is no data.
I would like the data to be plotted contiguously, ignoring the overnight time.
What is a good way to plot only the intended hours of 8am to 4pm?
This can be done by plotting each date on a separate subplot (axis). However, things like the tick labels can get cramped in some cases.
import datetime
import numpy as np
import matplotlib.pyplot as plt
# df is the business-hours DataFrame built in the question
pdates = np.unique(df.index.date) # Unique Dates
fig, ax = plt.subplots(ncols=len(pdates), sharey=True, figsize=(18,6))
# Adjust spacing between subplots
# (Set to 0 for continuous, though labels will overlap)
plt.subplots_adjust(wspace=0.05)
# Plot all data on each subplot, adjust the limits of each accordingly
for i in range(len(pdates)):
    df.plot(ax=ax[i], legend=None)
    # Hours 8-16 each day:
    ax[i].set_xlim(datetime.datetime.combine(pdates[i], datetime.time(8)),
                   datetime.datetime.combine(pdates[i], datetime.time(16)))
    # Deal with spines for each panel
    if i != 0:
        ax[i].spines['left'].set_visible(False)
        ax[i].tick_params(right=False,
                          which='both',
                          left=False,
                          axis='y')
    if i != len(pdates)-1:
        ax[i].spines['right'].set_visible(False)
plt.show()
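Another option, not taken from the answer above, is to drop the real time axis altogether: plot the values against their row positions 0..n-1 so the overnight gaps vanish, and use a formatter to turn each tick position back into the timestamp of that row. This is a sketch under that assumption; the formatter function `fmt` is a hypothetical name, and the label format is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

# Same dummy data as in the question
np.random.seed(51723)
dates = pd.date_range("11/8/2018", "11/11/2018", freq="min")
df = pd.DataFrame(np.random.rand(len(dates)), index=dates, columns=["A"])
df = df[(df.index.hour >= 8) & (df.index.hour <= 16)]  # business hours only

fig, ax = plt.subplots()
# Plot against row positions, so consecutive rows are always adjacent
x = np.arange(len(df))
ax.plot(x, df["A"].to_numpy())

def fmt(x, pos):
    # Map a tick position back to the timestamp of the nearest row
    i = int(round(x))
    if 0 <= i < len(df):
        return df.index[i].strftime("%m-%d %H:%M")
    return ""

ax.xaxis.set_major_formatter(FuncFormatter(fmt))
fig.autofmt_xdate()
```

The trade-off is that the x-axis is no longer proportional to time: a tick interval spans the same number of rows, not the same duration, which is exactly what makes the overnight gaps disappear.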
I am trying to create a plot with an amount (int) in the y-axis and days in the x-axis.
I want the plot to always show the whole month on the x-axis, although I don't have data for all days.
This is the code I tried:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import datetime as dt
df=get_pandas_data(datab) #Taking data from database in pandas DataFrame
fig = plt.figure(figsize=(10,10)) #Initialize plot
ax1 = fig.add_subplot(1,1,1)
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in df['date']]
dates=list(set(dates)) #Takes all the dates from the DataFrame and uses a set to avoid repeated dates
s=df.resample('D', on='date')['amount'].sum() #Takes the total amount for the same date
ax1.bar(dates,s) #Bar plot for dates and amount
ax1.set(xlabel="Date",
ylabel="Balance (€)",
title="Total Monthly balance") # Plot information
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
#this is supposed to put all days of the month on the x-axis
ax1.xaxis.set_major_locator(mdates.DayLocator(interval=1))
fig.autofmt_xdate()
plt.show()
The result I get from this is a plot but only with those days that have data.
How can I make the plot show all days of the month and draw bars on those that have data?
This works fine with bare datetimes and matplotlib, so you must be malforming your data somehow during your pandas manipulations. But we can't really help because we don't have your DataFrame. It's always preferable to create a standalone example with dummy data and as little code as possible to recreate the issue: a) 90% of the time you will realize what the problem is; b) if not, we can help...
import numpy as np
import matplotlib.pyplot as plt
import datetime
x = np.array([1, 3, 7, 8, 10])
y = x * 2
dates = [datetime.datetime(2000, 2, xx) for xx in x]
fig, ax = plt.subplots()
ax.bar(dates, y)
fig.autofmt_xdate()
plt.show()
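To get the whole month on the axis, as the question asks, one approach (an assumption on my part, not stated in the answer above) is to pin the x-limits to the first and last day of the month and put a tick on every day. The sketch below reuses the dummy February 2000 data; the half-day padding is an arbitrary choice so bars on the 1st or the last day would not be clipped.

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

x = [1, 3, 7, 8, 10]
y = [2 * xx for xx in x]
dates = [datetime.datetime(2000, 2, xx) for xx in x]

fig, ax = plt.subplots()
ax.bar(dates, y)

# Pin the x-range to the whole month (Feb 2000 has 29 days), padded by half a day
month_start = mdates.date2num(datetime.datetime(2000, 2, 1))
month_end = mdates.date2num(datetime.datetime(2000, 2, 29))
ax.set_xlim(month_start - 0.5, month_end + 0.5)

# One tick per day, labelled with the day of the month
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d'))
fig.autofmt_xdate()
```

Days without data simply have no bar, but they still get a tick, because the axis range no longer depends on the data.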
I am trying to create a heat map from a pandas DataFrame using the seaborn library. Here is the code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
test_df = pd.DataFrame(np.random.randn(367, 5),
                       index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
Seaborn's heatmap is a categorical plot. It scales from 0 to the number of columns minus 1, in this case from 0 to 366. The datetime locators and formatters expect values that are dates (or, more precisely, numbers that correspond to dates). For the year in question those would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001) with matplotlib's pre-3.3 default epoch; since matplotlib 3.3 the default epoch is 1970-01-01, which gives 10957 to 11323 instead.
So, in order to use the matplotlib.dates formatters and locators, you need to convert your DataFrame index to datetime objects first. You then cannot use a heatmap, but need a plot that allows for numerical axes, e.g. an imshow plot. You can then set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
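To see the mismatch the answer describes, compare the date numbers matplotlib uses with the plain column positions a heatmap uses. This small check is my own illustration; the absolute date numbers depend on `mdates.get_epoch()`, so only the 366-day span is epoch-independent.

```python
import datetime
import matplotlib.dates as mdates

# Float day numbers that matplotlib's date locators and formatters work with
start = mdates.date2num(datetime.datetime(2000, 1, 1))
stop = mdates.date2num(datetime.datetime(2001, 1, 1))

# Absolute values depend on the epoch, but 2000 is a leap year,
# so the span is always 366 days
print(mdates.get_epoch(), start, stop, stop - start)

# A heatmap column index, by contrast, is just 0, 1, ..., 366
heatmap_positions = range(367)
```

A DayLocator asked for ticks between 0 and 366 therefore looks for dates right after the epoch, nowhere near the year 2000, which is why nothing appears on the x-axis.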
I found this question when trying to do a similar thing. You can hack together a solution, but it's not very pretty.
For example, I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to blank.
This gives me year labels in the correct position.
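# ax is the Axes returned by sns.heatmap; labels are assumed to start with 'YYYY-MM'
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()
    if text[5:7] == '01':
        label.set_text(text[0:4])   # keep only the year for January
    else:
        label.set_text('')          # blank out every other month
ax.set_xticklabels(xticklabels)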
Hopefully from that you can figure out what you want to do.
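A less fragile variant of the same idea, sketched here as my own suggestion, finds the tick positions from the DatetimeIndex instead of parsing label text. It uses plain matplotlib `imshow`, where column i is centred at x = i; with seaborn's heatmap the cell centres sit at i + 0.5, so the positions would need that offset.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

# Dummy data: three years of daily values, five rows in the heat map
idx = pd.date_range("2000-01-01", "2002-12-31", freq="D")
df = pd.DataFrame(np.random.randn(len(idx), 5), index=idx)

fig, ax = plt.subplots()
# imshow places column i at x = i (cells span i - 0.5 .. i + 0.5)
ax.imshow(df.T.to_numpy(), aspect="auto")

# Positions of the first day of each year, taken from the index itself
year_starts = np.flatnonzero((idx.month == 1) & (idx.day == 1))
ax.set_xticks(year_starts)
ax.set_xticklabels(idx[year_starts].strftime("%Y"))
```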
I am running into some issues adding Matplotlib lines to a Pandas plot. I am trying to plot a straight line, using the slope to determine the start and end points, but the resulting graph does not look like a straight line at all.
I have simplified the case to the MVCE below. The initial part is the setup that replicates the key feature of my complicated DataFrame.
import pandas as pd
import matplotlib.pyplot as plt
LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1,LEN_SER+1), index=dates)
ts = df.iloc[:,0]
# The above is the setup of the MVCE to replicate the issue.
fig = plt.figure()
ax1 = plt.subplot2grid((1, 1), (0, 0))
ax1.plot([ts.index[5], ts.index[20]],
         [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
ts.plot(ax=ax1)
plt.show()
This gives a graph with a wavy line due to the weekends. The Matplotlib call is affecting how Pandas plots the series: if I take out the ax1.plot() line, it becomes a straight line.
So the question is: how do I draw straight lines on my Pandas plot with Matplotlib? Put another way, I want the plot to treat the axis labels as categories so that weekends are ignored. That way, I hope both Matplotlib and Pandas will give a straight line.
As you correctly observe, if you delete the line ax1.plot(), then matplotlib treats your dates as categories, and the pandas plot is a nice straight line. However, in the command
ax1.plot([ts.index[5], ts.index[20]],
         [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
you ask matplotlib to interpolate between two points, and in the process matplotlib recognizes the dates on the x-axis. That is why the pandas plot, a straight line with respect to date categories (5 per week), becomes a wavy line with respect to dates (7 per week). That is correct as well, because with respect to calendar dates your data simply isn't represented by a straight line.
You can force the category interpretation by replacing the dates with strings:
df.index = df.reset_index().apply(lambda x: x['index'].strftime('%Y-%m-%d'), axis=1)
before defining ts.
Now the matplotlib plot is just two categories against two values, and matplotlib does not realize that these two categories are among the categories in the pandas plot. (Changing the order of the two plots at least saves your x-axis.) Modifying the matplotlib plot to
ax1.plot([5, 20], [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
plots a line between categories 5 and 20, and finally gives you two straight lines with respect to a categorical x-axis.
Full code:
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8') # (optional - style was set when I produced my graph; called 'seaborn' before matplotlib 3.6)
LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1,LEN_SER+1), index=dates)
df.index = df.reset_index().apply(lambda x: \
x['index'].strftime('%Y-%m-%d'), axis=1) # dates -> categories (string)
ts = df.iloc[:,0]
# The above is the setup of the MVCE to replicate the issue.
fig = plt.figure()
ax1 = plt.subplot2grid((1, 1), (0, 0))
ax1.plot([5, 20], [ts.iloc[5], ts.iloc[5] + (1.0 * (20 - 5))], 'o-')
# x coordinates 'categories' 5 and 20
ts.plot(ax=ax1)
plt.show()
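The reset_index/apply round trip used above to turn the dates into strings can likely be shortened with `DatetimeIndex.strftime`. This sketch (my suggestion, not part of the answer) builds both versions from the question's setup and checks that they give the same labels.

```python
import pandas as pd

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
df = pd.DataFrame(range(1, LEN_SER + 1), index=dates)

# Approach from the answer: round trip through reset_index/apply
via_apply = df.reset_index().apply(lambda x: x['index'].strftime('%Y-%m-%d'), axis=1)
# Shorter alternative: format the DatetimeIndex directly
via_strftime = df.index.strftime('%Y-%m-%d')
```

Assigning `df.index = df.index.strftime('%Y-%m-%d')` before defining `ts` should then have the same effect on the plot.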
You already answered the question: "probably due to the weekends".
replace:
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
with
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='D')
B - business day frequency
D - calendar day frequency
And your lines are straightened.
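To see what the two frequency codes do, compare the first few dates each one produces from the question's start date (2015-07-03 is a Friday):

```python
import pandas as pd

b_days = pd.date_range('2015-07-03', periods=5, freq='B')  # skips Sat 4th and Sun 5th
d_days = pd.date_range('2015-07-03', periods=5, freq='D')  # every calendar day

print(list(b_days.day_name()))
print(list(d_days.day_name()))
```

With `'B'`, Friday and Monday are consecutive points, so a series that rises by 1 per point rises by only 1 over three calendar days across each weekend.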
You're right - it is due to weekends. You can tell by the slope: five consecutive weekdays have a steeper incline (+1 each day) than the three calendar days from Friday to Monday (+1 in total). So, what exactly do you want to plot? If you want to literally plot the blue line, you can interpolate the points between your two points like this:
...
# ts.plot(ax=ax1)
ts.iloc[[5,20]].resample('1D').interpolate().plot(ax=ax1)  # linear interpolation onto calendar days
plt.show()
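Since the snippet above elides the setup, here is a self-contained version of the same idea, assuming the question's setup with `ts` built directly as a float Series. It keeps only points 5 and 20, upsamples them to calendar days and fills the gap linearly, so the result is straight with respect to dates.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

LEN_SER = 23
dates = pd.date_range('2015-07-03', periods=LEN_SER, freq='B')
ts = pd.Series(range(1, LEN_SER + 1), index=dates, dtype=float)

# Keep only the two end points, upsample to calendar days, fill linearly
line_ts = ts.iloc[[5, 20]].resample('1D').interpolate()

fig, ax1 = plt.subplots()
line_ts.plot(ax=ax1)
```

Points 5 and 20 fall on 2015-07-10 and 2015-07-31, 21 calendar days apart, so the line rises by 15/21 per day.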
For simplicity I started from 2015-07-04. Does it work for you?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
LEN_SER = 21
dates = pd.date_range('2015-07-04', periods=LEN_SER, freq='B')
the_axes = []
# take the_axes like monday and friday for each week
for monday, friday in zip(dates[dates.weekday==0], dates[dates.weekday==4]):
    the_axes.append([monday.date(), friday.date()])
x = dates
y = range(1,LEN_SER+1)
n_Axes = len(the_axes)
fig,(axes) = plt.subplots(1, n_Axes, sharey=True, figsize=(15,8))
for i in range(n_Axes):
    ax = axes[i]
    ax.plot(x, y)
    ax.set_xlim(the_axes[i])
fig.autofmt_xdate()
print(dates)
plt.show()
I am trying to do analysis on a bike-share dataset. Part of the analysis is showing weekend demand in a date-wise plot.
The last 5 rows of my pandas DataFrame look like this.
Here is my code for the date vs. total rides plot:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("darkgrid")
plt.plot(d17_day_count)
plt.show()
I want to highlight weekends in the plot, so that it looks similar to this plot.
I am using Python with matplotlib and seaborn library.
You can easily highlight areas by using axvspan. To find the areas to be highlighted, you can run through the index of your DataFrame and search for the weekend days. I've also added an example that highlights 'occupied hours' during a working week (hopefully that doesn't confuse things).
I've created dummy data for a dataframe based on days and another one for hours.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# dummy data (Days)
dates_d = pd.date_range('2017-01-01', '2017-02-01', freq='D')
df = pd.DataFrame(np.random.randint(1, 20, (dates_d.shape[0], 1)))
df.index = dates_d
# dummy data (Hours)
dates_h = pd.date_range('2017-01-01', '2017-02-01', freq='H')
df_h = pd.DataFrame(np.random.randint(1, 20, (dates_h.shape[0], 1)))
df_h.index = dates_h
#two graphs
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
#plot lines (separate loop variable, so df keeps the daily data)
dfs = [df, df_h]
for i, data in enumerate(dfs):
    for v in data.columns.tolist():
        axes[i].plot(data[v], label=v, color='black', alpha=.5)
def find_weekend_indices(datetime_array):
    indices = []
    for i in range(len(datetime_array)):
        if datetime_array[i].weekday() >= 5:
            indices.append(i)
    return indices
def find_occupied_hours(datetime_array):
    indices = []
    for i in range(len(datetime_array)):
        if datetime_array[i].weekday() < 5:
            if datetime_array[i].hour >= 7 and datetime_array[i].hour <= 19:
                indices.append(i)
    return indices
def highlight_datetimes(indices, index, ax):
    # shade from each selected timestamp to the next timestamp in the index
    i = 0
    while i < len(indices)-1:
        ax.axvspan(index[indices[i]], index[indices[i] + 1], facecolor='green', edgecolor='none', alpha=.5)
        i += 1
#find to be highlighted areas, see functions
weekend_indices = find_weekend_indices(df.index)
occupied_indices = find_occupied_hours(df_h.index)
#highlight areas
highlight_datetimes(weekend_indices, df.index, axes[0])
highlight_datetimes(occupied_indices, df_h.index, axes[1])
#formatting..
axes[0].xaxis.grid(visible=True, which='major', color='black', linestyle='--', alpha=1) #add xaxis gridlines
axes[1].xaxis.grid(visible=True, which='major', color='black', linestyle='--', alpha=1) #add xaxis gridlines
axes[0].set_xlim(min(dates_d), max(dates_d))
axes[0].set_title('Weekend days', fontsize=10)
axes[1].set_title('Occupied hours', fontsize=10)
plt.show()
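A variation on the same axvspan idea, my own sketch rather than part of the answer: draw one span per weekend, from Saturday 00:00 to Monday 00:00, starting from the Saturday on or before the first timestamp. Partial weekends at either end are then covered, and the number of patches does not grow with the sampling frequency. The dummy data and colours are arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt

idx = pd.date_range("2017-01-01", "2017-02-01", freq="D")
s = pd.Series(np.random.randint(1, 20, len(idx)), index=idx)

fig, ax = plt.subplots()
ax.plot(s.index, s.to_numpy(), color="black", alpha=.5)

# Saturday on or before the first timestamp (weekday(): Monday=0 ... Saturday=5)
first_sat = idx[0].normalize() - pd.Timedelta(days=(idx[0].weekday() - 5) % 7)
saturdays = pd.date_range(first_sat, idx[-1], freq="7D")

# One span per weekend: Saturday 00:00 to Monday 00:00
for sat in saturdays:
    ax.axvspan(sat, sat + pd.Timedelta(days=2), facecolor="green", edgecolor="none", alpha=.5)

ax.set_xlim(idx[0], idx[-1])  # spans may start before the data; clip to the data range
```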
I tried using the code in the accepted answer, but because of the way the indices are used, the last weekend in the time series does not get highlighted entirely, despite what the image currently shown suggests (this is noticeable mainly with a frequency of 6 hours or more). Also, it does not work if the frequency of the data is higher than daily. This is why I share here a solution that uses the x-axis units so that weekends (or any other recurring time period) can be highlighted without any problem related to the index.
This solution takes only 6 lines of code and works with any frequency. In the example below, it highlights full weekend days, which makes it more efficient than the accepted answer, where small frequencies (e.g. 30 minutes) produce many polygons to cover the whole weekend.
The x-axis limits are used to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. Then a weekends mask is computed and passed to the where argument of the fill_between plotting function. The masks are processed as right-exclusive so in this case, they must contain Mondays for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.
Note that contrary to axvspan, the fill_between function needs the y1 and y2 arguments. For some reason, using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. This issue is solved by running ax.set_ylim(*ax.get_ylim()) just after creating the plot.
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
# Create sample dataset
rng = np.random.default_rng(seed=1234) # random number generator
dti = pd.date_range('2017-01-01', '2017-05-15', freq='D')
counts = 5000 + np.cumsum(rng.integers(-1000, 1000, size=dti.size))
df = pd.DataFrame(dict(Counts=counts), index=dti)
# Draw pandas plot: x_compat=True converts the pandas x-axis units to matplotlib
# date units (not strictly necessary when using a daily frequency like here)
ax = df.plot(x_compat=True, figsize=(10, 5), legend=None, ylabel='Counts')
ax.set_ylim(*ax.get_ylim()) # reset y limits to display highlights without gaps
# Highlight weekends based on the x-axis units
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values
# Create appropriate ticks using matplotlib date tick locators and formatters
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonthday=np.arange(5, 31, step=7)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('\n%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
# Additional formatting
ax.figure.autofmt_xdate(rotation=0, ha='center')
title = 'Daily count of trips with weekends highlighted from SAT 00:00 to MON 00:00'
ax.set_title(title, pad=20, fontsize=14);
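If you need this on several plots, the six highlighting lines can be wrapped in a small function. The name `highlight_weekends` and its keyword pass-through are hypothetical, and the sketch assumes, as the answer does, that the x-axis is in matplotlib date units (e.g. a pandas plot drawn with `x_compat=True`).

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def highlight_weekends(ax, **fill_kwargs):
    """Shade Saturday 00:00 to Monday 00:00 over the current x-range of ax."""
    ax.set_ylim(*ax.get_ylim())  # freeze y-limits so the shading reaches the frame
    xmin, xmax = ax.get_xlim()
    days = np.arange(np.floor(xmin), np.ceil(xmax) + 2)
    # Include Mondays so each band is drawn up to Monday 00:00
    mask = [(d.weekday() >= 5) | (d.weekday() == 0) for d in mdates.num2date(days)]
    coll = ax.fill_between(days, *ax.get_ylim(), where=mask, **fill_kwargs)
    ax.set_xlim(xmin, xmax)  # restore the original x-limits
    return coll


# Usage with a small daily series (dummy data)
dti = pd.date_range('2021-03-01', '2021-03-31', freq='D')
df = pd.DataFrame({'Counts': np.arange(dti.size)}, index=dti)
ax = df.plot(x_compat=True, legend=None)
limits_before = ax.get_xlim()
coll = highlight_weekends(ax, facecolor='k', alpha=.1)
```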
As you can see, the weekends are always highlighted to the full extent, regardless of where the data starts and ends.
You can find more examples of this solution in the answers I have posted here and here.
I have another suggestion to make in this regard, which takes inspiration from previous posts by other contributors. The code is as follows:
import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
rng = np.random.default_rng(seed=42) # random number generator
dti = pd.date_range('2021-08-01', '2021-08-31', freq='D')
counts = 5000 + np.cumsum(rng.integers(-1000, 1000, size=dti.size))
df = pd.DataFrame(dict(Counts=counts), index=dti)
weekends = [d for d in df.index if d.isoweekday() in [6,7]]
weekend_list = []
for weekendday in weekends:
    d1 = weekendday
    d2 = weekendday + datetime.timedelta(days=1)
    weekend_list.append((d1, d2))
weekend_df = pd.DataFrame(weekend_list)
sns.set()
plt.figure(figsize=(15, 10), dpi=100)
df.plot(ax=plt.gca())  # draw into the figure above; a bare df.plot() would open a new one
plt.legend(bbox_to_anchor=(1.02, 0), loc="lower left", borderaxespad=0)
plt.ylabel("Counts")
plt.xlabel("Date of visit")
plt.xticks(rotation = 0)
plt.title("Daily counts of shop visits with weekends highlighted in green")
ax = plt.gca()
for d in weekend_df.index:
    print(weekend_df[0][d], weekend_df[1][d])
    ax.axvspan(weekend_df[0][d], weekend_df[1][d], facecolor="g", edgecolor="none", alpha=0.5)
ax.relim()
ax.autoscale_view()
plt.savefig("junk.png", dpi=100, bbox_inches='tight', pad_inches=0.2)
The result is a line plot with every Saturday and Sunday shaded in green.
I have data that shows some values collected on three different dates: 2015-01-08, 2015-01-09 and 2015-01-12. For each date there are several data points that have timestamps.
Date/times are in a list and it looks as follows:
['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00', '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00', '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00', '2015-01-12-10:00:00', '2015-01-12-11:00:00']
On the other hand I have corresponding values (floats) in another list:
[12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0, 12390.0, 12400.0, 12380.0, 12450.0, 12460.0]
To put all this together and plot a graph I use following code:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as md
import dateutil.parser
from matplotlib.font_manager import FontProperties
timestamps = ['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00', '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00', '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00', '2015-01-12-10:00:00', '2015-01-12-11:00:00']
ticks = [12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0, 12390.0, 12400.0, 12380.0, 12450.0, 12460.0]
plt.subplots_adjust(bottom=0.2)
plt.xticks( rotation=90 )
dates = [dateutil.parser.parse(s) for s in timestamps]
ax=plt.gca()
ax.set_xticks(dates)
ax.tick_params(axis='x', labelsize=8)
xfmt = md.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(dates, ticks, label="Price")
plt.xlabel("Date and time", fontsize=12)
plt.ylabel("Price", fontsize=12)
plt.suptitle("Price during last three days", fontsize=12)
plt.legend(loc=0,prop={'size':8})
plt.savefig("figure.pdf")
When I try to plot these datetimes and values I get a messy graph with the line going back and forth.
It looks like the dates are being ignored and only the times of day are taken into account, which is the reason for the messy chart. I tried editing the datetimes to have the same date and consecutive times, and that fixed the chart. However, I need the dates as well.
What am I doing wrong?
Your plots are going all over the place because plt.plot connects the dots in the order you give it. If this order is not monotonically increasing in x, then it looks "messy". You can sort the points by x first to fix this. Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
X = np.random.random(20)
Y = 2*X+np.random.random(20)
idx = np.argsort(X)
X2 = X[idx]
Y2 = Y[idx]
fig,ax = plt.subplots(2,1)
ax[0].plot(X,Y)
ax[1].plot(X2,Y2)
plt.show()
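Applied to the question's data, the same fix means parsing the strings into datetimes and sorting the (date, value) pairs before plotting. In this sketch, parsing with `datetime.strptime` and an explicit format is my substitution for `dateutil.parser.parse`, so the result does not depend on dateutil's guessing. Including the day in the tick labels is also my assumption about what you want.

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as md

timestamps = ['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00',
              '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00',
              '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00',
              '2015-01-12-10:00:00', '2015-01-12-11:00:00']
ticks = [12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0,
         12390.0, 12400.0, 12380.0, 12450.0, 12460.0]

# Parse with an explicit format matching 'YYYY-MM-DD-HH:MM:SS'
dates = [datetime.datetime.strptime(s, '%Y-%m-%d-%H:%M:%S') for s in timestamps]
# Sort the pairs by date so plt.plot connects the points left to right
x, y = zip(*sorted(zip(dates, ticks)))

fig, ax = plt.subplots()
ax.plot(x, y, label="Price")
ax.xaxis.set_major_formatter(md.DateFormatter('%d-%m %H:%M'))  # keep the day in the labels
fig.autofmt_xdate()
```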