Despite trying some solutions available on SO and at Matplotlib's documentation, I'm still unable to disable Matplotlib's creation of weekend dates on the x-axis.
As you can see below, it adds dates to the x-axis that are not in the original Pandas column.
I'm plotting my data with the following code (the commented-out lines are attempts that did not achieve my goal):
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
fig, ax1 = plt.subplots()
x_axis = df.index.values
ax1.plot(x_axis, df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(x_axis, df['R'], color='r')
# plt.xticks(np.arange(len(x_axis)), x_axis)
# fig.autofmt_xdate()
# ax1.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
fig.tight_layout()
plt.show()
An example of my Pandas dataframe is below, with dates as index:
2019-01-09 1.007042 2585.898714 4.052480e+09 19.980000 12.07 1
2019-01-10 1.007465 2581.828491 3.704500e+09 19.500000 19.74 1
2019-01-11 1.007154 2588.605258 3.434490e+09 18.190001 18.68 1
2019-01-14 1.008560 2582.151225 3.664450e+09 19.070000 14.27 1
Some suggestions I've found include a custom ticker (here and here); however, although I don't get any errors, the plot is missing my second series.
Any suggestions on how to disable date interpolation in matplotlib?
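The matplotlib site recommends creating a custom formatter class. The idea is to plot the data against consecutive integer positions, so that missing days leave no gaps, and to let the formatter map each integer tick position back to the corresponding date (returning an empty label for positions outside the data). Here's an example using a dataframe I constructed from the 2018 data that was in the image you'd attached: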
import pandas as pd
df = pd.DataFrame(
    data = {
        "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
        "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
        "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
        "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
        "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
        "R": [0,0,0,0,0,1,1]
    },
    index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
           '2018-01-12', '2018-01-15', '2018-01-16'])
Move the dates from the index to their own column:
df = df.reset_index().rename({'index': 'Date'}, axis=1, copy=False)
df['Date'] = pd.to_datetime(df['Date'])
Create the custom formatter class:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import Formatter
# %config InlineBackend.figure_format = 'retina'  # Jupyter-only magic: nicer looking graphs for retina displays
class CustomFormatter(Formatter):
    def __init__(self, dates, fmt='%Y-%m-%d'):
        self.dates = list(dates)  # positional lookup: x position i -> i-th date
        self.fmt = fmt
    def __call__(self, x, pos=0):
        'Return the label for time x at position pos'
        ind = int(np.round(x))
        if ind >= len(self.dates) or ind < 0:
            return ''  # no label outside the data range
        return self.dates[ind].strftime(self.fmt)
Now let's plot the MP and R series. Pay attention to the line where the custom formatter is attached to the x-axis:
formatter = CustomFormatter(df['Date'])
fig, ax1 = plt.subplots()
ax1.xaxis.set_major_formatter(formatter)
ax1.plot(np.arange(len(df)), df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(np.arange(len(df)), df['R'], color='r')
fig.autofmt_xdate()
fig.tight_layout()
plt.show()
The above code outputs this graph:
Now, no weekend dates, such as 2018-01-13, are displayed on the x-axis.
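The same index-to-label idea can also be sketched without a subclass by wrapping a plain function in matplotlib.ticker.FuncFormatter. This is a minimal sketch, not the answer's code: the four dates and MP values are copied from the dataframe above, the function name label_for_position is hypothetical, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display needed); drop to show windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Trading days only: 2018-01-13/14 (a weekend) are simply absent
dates = pd.to_datetime(['2018-01-11', '2018-01-12', '2018-01-15', '2018-01-16'])
mp = [2760.169848, 2780.756857, 2793.953050, 2792.675162]

def label_for_position(x, pos=None):
    """Hypothetical helper: map integer x position i back to the i-th date."""
    ind = int(np.round(x))
    if ind < 0 or ind >= len(dates):
        return ''  # no label outside the data range
    return dates[ind].strftime('%Y-%m-%d')

fig, ax = plt.subplots()
# Plot against 0, 1, 2, 3 so the missing weekend takes up no space
ax.plot(np.arange(len(dates)), mp, color='k')
ax.xaxis.set_major_formatter(FuncFormatter(label_for_position))
```

Because the x positions are consecutive integers, 2018-01-12 and 2018-01-15 end up next to each other on the axis.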
If you would simply like not to show the weekends, while the graph still scales correctly in time, matplotlib has built-in functionality for this in matplotlib.dates. Specifically, WeekdayLocator places major ticks only on the weekdays you pass it, so weekend dates never get a tick or a label (the weekend itself still occupies space on the time axis). It's essentially a one-line solution (the rest just fabricates data for testing). Note that this works whether or not the data includes weekends:
import matplotlib.pyplot as plt
import datetime
import numpy as np
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU
DT_FORMAT="%Y-%m-%d"
if __name__ == "__main__":
    N = 14
    #Fake data
    x = list(zip([2018]*N, [5]*N, list(range(1,N+1))))
    x = [datetime.datetime(*y) for y in x]
    x = [y for y in x if y.weekday() < 5]
    random_walk_steps = 2 * np.random.randint(0, 6, len(x)) - 3
    random_walk = np.cumsum(random_walk_steps)
    y = np.arange(len(x)) + random_walk
    # Make a figure and plot everything
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ### HERE IS THE BIT THAT ANSWERS THE QUESTION
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter(DT_FORMAT))
    # plot stuff
    fig.autofmt_xdate()
    plt.tight_layout()
    plt.show()
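One way to check what the locator does is to read the tick positions back and convert them to dates. A minimal sketch, assuming the same May 2018 weekday-only dates as above (the y values are placeholders) and the Agg backend so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to show the figure
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR

# Two weeks of daily dates with the weekends dropped (placeholder values)
days = [datetime.datetime(2018, 5, d) for d in range(1, 15)]
days = [d for d in days if d.weekday() < 5]
values = list(range(len(days)))

fig, ax = plt.subplots()
ax.plot(days, values)
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))

# Convert the tick positions (matplotlib date numbers) back to datetimes
tick_dates = [mdates.num2date(t) for t in ax.get_xticks()]
tick_weekdays = sorted({d.weekday() for d in tick_dates})  # 0 = Monday ... 6 = Sunday
print(tick_weekdays)
```

Only weekday numbers 0 to 4 should appear; the weekend days still take up room on the axis, they just never receive a tick.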
If you are trying to stop matplotlib from drawing a line straight across the days that are missing from your dataset (i.e. interpolating between consecutive points), you can exploit the fact that matplotlib breaks the line wherever it encounters np.nan. Pandas makes it easy to insert np.nan for the days that are not in your dataset with pd.DataFrame.asfreq():
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data = {
    "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
    "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
    "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
    "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
    "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
    "R": [0,0,0,0,0,1,1]
    },
    index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15', '2018-01-16'])
df.index = pd.to_datetime(df.index)
#rescale R so I don't need to worry about twinx
df['R'] = df['R'].astype(float)  # R holds ints; make it float before writing float values into it
df.loc[df.R==0, 'R'] = df.loc[df.R==0, 'R'] + df.MP.min()
df.loc[df.R==1, 'R'] = df.loc[df.R==1, 'R'] * df.MP.max()
# insert a NaN row for every calendar day missing from the index
df = df.asfreq('D')
print(df)
Col 1 MP Col 3 Col 4 Col 5 R
2018-01-08 1.000325 2743.002071 3.242650e+09 9.52 5.04 2743.002071
2018-01-09 1.000807 2754.011543 3.453480e+09 10.08 5.62 2743.002071
2018-01-10 1.001207 2746.121450 3.576350e+09 9.82 5.29 2743.002071
2018-01-11 1.000355 2760.169848 3.641320e+09 9.88 6.58 2743.002071
2018-01-12 1.001512 2780.756857 3.573970e+09 10.16 8.32 2743.002071
2018-01-13 NaN NaN NaN NaN NaN NaN
2018-01-14 NaN NaN NaN NaN NaN NaN
2018-01-15 1.003237 2793.953050 3.573970e+09 10.16 9.57 2793.953050
2018-01-16 1.000979 2792.675162 4.325970e+09 11.66 9.53 2793.953050
df[['MP', 'R']].plot(); plt.show()
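To see that the NaN rows are what break the line, one can inspect the data of the plotted line directly. A small sketch using four MP values from the table above around the 2018-01-13/14 weekend (Agg backend assumed so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to show the figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Four trading days around a weekend (values from the MP column above)
mp = pd.Series([2760.169848, 2780.756857, 2793.953050, 2792.675162],
               index=pd.to_datetime(['2018-01-11', '2018-01-12',
                                     '2018-01-15', '2018-01-16']))
daily = mp.asfreq('D')  # adds 2018-01-13 and 2018-01-14 with NaN values

fig, ax = plt.subplots()
(line,) = ax.plot(daily.index, daily.to_numpy())
# The NaNs stay in the line's data; matplotlib simply leaves a gap there
n_gaps = int(np.isnan(line.get_ydata()).sum())
print(len(daily), n_gaps)  # 6 2
```

The line is drawn as two pieces, 01-11 to 01-12 and 01-15 to 01-16, with nothing across the weekend.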
Related
I am trying to plot a simple pandas Series object; it's something like this:
2018-01-01 10
2018-01-02 90
2018-01-03 79
...
2020-01-01 9
2020-01-02 72
2020-01-03 65
It includes only January of each year, i.e. every day of January and its value.
When I try to plot it:
# suppose the name of the series is dates_and_values
dates_and_values.plot()
It returns a plot like this (made using my current data)
It is clearly plotting along a continuous year-and-month axis, so the result looks squished and small, since I don't have any months other than January. Is there a way to plot it by year and day so that the output makes the individual days easier to observe?
The x-axis is the index of the dataframe.
Dates are a continuous series, so the x-axis is continuous (February to December take up space even though there is no data for them).
Changing the index to a series of strings means it is no longer continuous, so the graph is no longer squished.
I have generated some sample data that only has January to demonstrate:
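import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# calendar where every weekday is valid but every non-January day is a holiday -> January-only dates
cf = pd.tseries.offsets.CustomBusinessDay(weekmask="Sun Mon Tue Wed Thu Fri Sat",
                                           holidays=[d for d in pd.date_range("01-jan-1990", periods=365*50, freq="D")
                                                     if d.month != 1])
d = pd.date_range("01-jan-2015", periods=200, freq=cf)
df = pd.DataFrame({"Values": np.random.randint(20, 70, len(d))}, index=d)
fig, ax = plt.subplots(2, figsize=[14,6])
df.set_index(df.index.strftime("%Y %d")).plot(ax=ax[0])  # string index: evenly spaced, no gaps
df.plot(ax=ax[1])  # datetime index: continuous axis with gaps between Januaries
plt.show()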
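The difference between the two panels can be quantified with a little arithmetic. A sketch, assuming January-only daily data for 2018 to 2020 (made-up dates, not the asker's series): on a datetime axis the x-range covers every calendar day between the first and last date, whereas a string index gives each observation exactly one slot.

```python
import pandas as pd

# Every day of January in 2018, 2019 and 2020
idx = pd.date_range('2018-01-01', '2020-01-31', freq='D')
jan = idx[idx.month == 1]

# Datetime axis: its width is every calendar day from the first to the last date
span_days = (jan[-1] - jan[0]).days + 1
# String axis: one evenly spaced slot per observation
labels = jan.strftime('%Y %d')
n_slots = len(labels)

# Share of the datetime axis that actually holds data
share_with_data = n_slots / span_days
print(n_slots, span_days, round(share_with_data, 3))  # 93 761 0.122
```

Roughly 88% of the datetime axis is empty, which is the squishing described in the question.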
I suggest that you convert the series to a dataframe and then pivot it to get one column for each year. This lets you plot the data for each year with a separate line, either in the same plot using different colors or in subplots. Here is an example:
import numpy as np    # v 1.19.2
import pandas as pd   # v 1.2.3
import matplotlib.pyplot as plt
# Create sample series
rng = np.random.default_rng(seed=123) # random number generator
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)
# Convert series to dataframe and pivot it
df_raw = series.to_frame()
df_pivot = df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
df = df_pivot.droplevel(axis=1, level=0)
print(df.head())
# Plot all years together in different colors
ax = df.plot(figsize=(10,4))
ax.set_xlim(1, 31)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.65))
ax.set_xlabel('January', labelpad=10, size=12)
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# Plot years separately
axs = df.plot(subplots=True, color='tab:blue', sharey=True,
              figsize=(10,8), legend=None)
for ax in axs:
    ax.set_xlim(1, 31)
    ax.grid(axis='x', alpha=0.3)
    handles, labels = ax.get_legend_handles_labels()
    ax.text(28.75, 80, *labels, size=14)
    # Axes.is_last_row() was deprecated in Matplotlib 3.4; ask the SubplotSpec instead
    if ax.get_subplotspec().is_last_row():
        ax.set_xlabel('January', labelpad=10, size=12)
ax.figure.subplots_adjust(hspace=0)
plt.show()
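If pivot_table feels heavyweight, the same reshaping can be sketched with groupby plus unstack. This is an alternative, not the answer's method, and it assumes at most one value per date: first() would silently keep only the first of any duplicates, whereas pivot_table averages them.

```python
import numpy as np
import pandas as pd

# Same sample series as above
rng = np.random.default_rng(seed=123)
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)

# Alternative reshaping: rows = day of month, columns = year
wide = series.groupby([series.index.day, series.index.year]).first().unstack()

# The pivot_table version from the answer, for comparison
df_raw = series.to_frame()
via_pivot = (df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
             .droplevel(axis=1, level=0))
```

Both produce one row per day of January and one column per year, so either can be passed straight to DataFrame.plot.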
I am creating time series plots, specifically STL decompositions, and have already managed to get all of the plots into one figure. The issue I am having is getting them shown side by side, like the solution here. I tried the solution at that link, but it did not work; instead, I kept getting an empty plot at the top.
I have four time series plots and managed to get them output one below the other; however, I would like to have them side by side, or two side by side on top and the last two side by side below.
Then, for the dates on the x-axis, I have already tried using ax.xaxis.set_major_formatter(DateFormatter('%b %Y')), but it does not work with the code below, since the res.plot function won't allow it.
I have already searched everywhere but I can't find the solution to my issue. I would appreciate any help.
Data
Date Crime
0 2018-01-01 149
1 2018-01-02 88
2 2018-01-03 86
3 2018-01-04 100
4 2018-01-05 123
... ... ...
664 2019-10-27 142
665 2019-10-28 113
666 2019-10-29 126
667 2019-10-30 120
668 2019-10-31 147
Code
from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
from matplotlib.dates import DateFormatter
register_matplotlib_converters()
sns.set(style='whitegrid', palette = sns.color_palette('winter'), rc={'axes.titlesize':17,'axes.labelsize':17, 'grid.linewidth': 0.5})
plt.rc("axes.spines", top=False, bottom = False, right=False, left=False)
plt.rc('font', size=13)
plt.rc('figure',figsize=(17,12))
#fig=plt.figure()
#fig, axes = plt.subplots(2, sharex=True)
#fig,(ax,ax2,ax3,ax4) = plt.subplots(1,4,sharey=True)
#fig, ax = plt.subplots()
#fig, axes = plt.subplots(1,3,sharex=True, sharey=True, figsize=(12,5))
#ax.plot([0, 0], [0,1])
stl = STL(seatr, seasonal=13)
res = stl.fit()
res.plot()
plt.title('Seattle', fontsize = 20, pad=670)
stl2 = STL(latr, seasonal=13)
res2 = stl.fit()
res2.plot()
plt.title('Los Angeles', fontsize = 20, pad=670)
stl3 = STL(sftr, seasonal=13)
res3 = stl.fit()
res3.plot()
plt.title('San Francisco', fontsize = 20, pad=670)
stl4 = STL(phtr, seasonal=13)
res4 = stl.fit()
res4.plot()
plt.title('Philadelphia', fontsize = 20, pad=670)
#ax.xaxis.set_major_formatter(DateFormatter('%b %Y'))
One of the Plots
Whole Output
Here is an example using artificial data. The main idea is to group the outputs into DataFrames and then plot these using the pandas plot function.
Note that I had to change your code to use stl2, stl3 and stl4 when fitting.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import register_matplotlib_converters
from matplotlib.dates import DateFormatter
register_matplotlib_converters()
sns.set(style='whitegrid', palette = sns.color_palette('winter'), rc={'axes.titlesize':17,'axes.labelsize':17, 'grid.linewidth': 0.5})
plt.rc("axes.spines", top=False, bottom = False, right=False, left=False)
plt.rc('font', size=13)
plt.rc('figure',figsize=(17,12))
# 'MS' = month start (the bare 'M' alias is deprecated in pandas 2.2+)
idx = pd.date_range("1-1-2020", periods=200, freq="MS")
seas = 10*np.sin(np.arange(200) * np.pi/12)
trend = np.arange(200) / 10.0
seatr = pd.Series(trend + seas + np.random.standard_normal(200), name="Seattle", index=idx)
latr = pd.Series(trend + seas + np.random.standard_normal(200), name="LA", index=idx)
sftr = pd.Series(trend + seas + np.random.standard_normal(200), name="SF", index=idx)
phtr = pd.Series(trend + seas + np.random.standard_normal(200), name="Philly", index=idx)
stl = STL(seatr, seasonal=13)
res = stl.fit()
stl2 = STL(latr, seasonal=13)
res2 = stl2.fit()
stl3 = STL(sftr, seasonal=13)
res3 = stl3.fit()
stl4 = STL(phtr, seasonal=13)
res4 = stl4.fit()
# axis must be passed by keyword in pandas 2.x; keys label each column with its city
names = ["Seattle", "LA", "SF", "Philly"]
data = pd.concat([seatr, latr, sftr, phtr], axis=1)
trends = pd.concat([res.trend, res2.trend, res3.trend, res4.trend], axis=1, keys=names)
seasonals = pd.concat([res.seasonal, res2.seasonal, res3.seasonal, res4.seasonal], axis=1, keys=names)
resids = pd.concat([res.resid, res2.resid, res3.resid, res4.resid], axis=1, keys=names)
fig, axes = plt.subplots(4,1)
data.plot(ax=axes[0])
trends.plot(ax=axes[1])
seasonals.plot(ax=axes[2])
resids.plot(ax=axes[3])
plt.show()
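This produces a figure with four vertically stacked panels (the raw series, the trends, the seasonal components and the residuals), each showing all four cities.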
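The question also asked for a 2x2 layout and '%b %Y' date labels. A sketch of that layout, with random stand-in frames replacing data, trends, seasonals and resids so it runs without statsmodels; the dict name panels is hypothetical. It draws with Axes.plot rather than DataFrame.plot because pandas' own date handling for regular time series can interfere with matplotlib's DateFormatter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to show the figure
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter

# Stand-ins for data, trends, seasonals and resids (random values, not STL output)
idx = pd.date_range("2020-01-01", periods=200, freq="MS")
rng = np.random.default_rng(0)
cities = ["Seattle", "LA", "SF", "Philly"]
panels = {
    name: pd.DataFrame(rng.standard_normal((200, 4)), index=idx, columns=cities)
    for name in ["Data", "Trend", "Seasonal", "Residual"]
}

fig, axes = plt.subplots(2, 2, figsize=(14, 8), sharex=True)
for ax, (name, frame) in zip(axes.flat, panels.items()):
    for col in frame.columns:
        ax.plot(frame.index, frame[col], label=col)  # one line per city
    ax.set_title(name)
    ax.xaxis.set_major_formatter(DateFormatter("%b %Y"))
axes[0, 0].legend(loc="upper left")
fig.autofmt_xdate()  # rotate date labels; hides them on the top row when x is shared
fig.tight_layout()
```

With the real decomposition, the four frames built with pd.concat above would take the place of the random ones.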
I have tried unsuccessfully to create a bar plot of a time series dataset. I have tried converting the dates to Pandas Datetime objects, Timestamp Objects, primitive strings, floats, and ints. No matter what I do, I get the following error: TypeError: float() argument must be a string or a number, not 'Timestamp' Here are a few minimal examples that produce the error:
Here, the 'Date' object is of type <class 'pandas._libs.tslibs.timestamps.Timestamp'>, so I know why this doesn't work:
import pandas as pd
import matplotlib.pylab as plt
import matplotlib.dates as mdates
import seaborn as sns
def main():
    path = 'Data/AQ+RX Counts.csv'
    df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
    weekly_df = df.resample('W').mean().reset_index()
    weekly_df['count'] = df['count'].resample('W').sum().reset_index()
    sns.barplot(x = 'Date', y='count', data = weekly_df)
    plt.show()
main()
I then tried converting the dates to floats, intending to format them back to dates afterwards, but this still doesn't work:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I also tried making them integers, to no avail:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
dates = dates.astype(int)
dates = pd.Series(dates)
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I've tried many other variations, and all produce the same error. I've even compared it to other code and verified that all the types are identical, and the comparison code works fine. I am at a complete loss of where to go from here.
Dataframe:
Date,WSA,WSV,WDV,WSM,SGT,T2M,T10M,DELTA_T,PBAR,SRAD,RH,PM25,AQI,count
2015-01-01,1.0708333333333335,0.8750000000000001,132.95833333333334,3.4708333333333337,35.39166666666667,30.72916666666667,30.625,-0.11666666666666667,738.8249999999998,72.66666666666667,99.75416666666666,24.80833333333333,73.30793131580873,0.0
2015-01-02,1.1086956521739129,0.9391304347826086,148.47826086956522,3.734782608695653,32.46521739130434,34.39130434782609,34.27826086956521,-0.11739130434782602,738.3478260869565,61.39130434782609,100.01304347826084,23.500000000000004,64.15072523318715,4.0
2015-01-03,1.0173913043478258,0.7173913043478259,168.04347826086956,3.773913043478261,42.71739130434783,36.24782608695652,36.160869565217396,-0.09565217391304348,739.4434782608695,49.60869565217392,100.76956521739132,20.460869565217394,55.65271063058384,0.0
2015-01-04,1.0,0.6,159.95833333333334,3.85,49.15,38.8875,38.66666666666666,-0.225,741.5000000000001,31.54166666666667,101.47916666666669,13.012499999999998,46.835258118800965,0.0
2015-01-05,1.0333333333333334,0.4416666666666667,137.0,4.0,57.56666666666666,42.99583333333333,42.94583333333333,-0.04999999999999995,742.5333333333333,44.58333333333334,101.00416666666666,16.654166666666665,52.420271225456766,4.0
2015-01-06,0.7818181818181817,0.5590909090909091,114.72727272727272,3.654545454545455,42.86818181818182,40.7409090909091,41.09545454545454,0.36818181818181817,740.9045454545453,48.27272727272727,100.57727272727274,21.954545454545453,67.31833852518514,6.0
2015-01-07,0.9739130434782608,0.8304347826086954,110.82608695652172,3.956521739130436,30.817391304347833,40.36521739130435,40.59565217391304,0.22173913043478266,739.8652173913043,60.04347826086956,100.19565217391305,24.456521739130434,72.3472505968891,6.0
2015-01-08,0.9833333333333336,0.8250000000000001,156.5,4.208333333333333,32.67083333333333,41.520833333333336,41.36666666666667,-0.12916666666666668,736.35,69.58333333333333,99.95833333333331,22.274999999999995,65.77072473472253,10.0
2015-01-09,0.9583333333333331,0.7291666666666669,133.70833333333334,3.3791666666666664,39.645833333333336,42.279166666666654,42.15833333333333,-0.11666666666666665,735.2041666666665,60.41666666666666,100.04166666666669,19.370833333333334,59.08512936837911,10.0
2015-01-10,0.9666666666666668,0.7583333333333336,164.5,3.675,37.34583333333333,42.96250000000001,42.775,-0.2,734.2875,41.5,100.12083333333337,14.658333333333335,49.31465266245389,0.0
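The problem is the transformation in your function: weekly_df['count'] = df['count'].resample('W').sum().reset_index() assigns a whole DataFrame (whose first column is Date) to the 'count' column, which converts 'count' from floats to a datetime dtype.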
Using the posted sample data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path = 'data/test.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
# display(df)
WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count
Date
2015-01-01 1.070833 0.875000 132.958333 3.470833 35.391667 30.729167 30.625000 -0.116667 738.825000 72.666667 99.754167 24.808333 73.307931 0.0
2015-01-02 1.108696 0.939130 148.478261 3.734783 32.465217 34.391304 34.278261 -0.117391 738.347826 61.391304 100.013043 23.500000 64.150725 4.0
2015-01-03 1.017391 0.717391 168.043478 3.773913 42.717391 36.247826 36.160870 -0.095652 739.443478 49.608696 100.769565 20.460870 55.652711 0.0
2015-01-04 1.000000 0.600000 159.958333 3.850000 49.150000 38.887500 38.666667 -0.225000 741.500000 31.541667 101.479167 13.012500 46.835258 0.0
2015-01-05 1.033333 0.441667 137.000000 4.000000 57.566667 42.995833 42.945833 -0.050000 742.533333 44.583333 101.004167 16.654167 52.420271 4.0
2015-01-06 0.781818 0.559091 114.727273 3.654545 42.868182 40.740909 41.095455 0.368182 740.904545 48.272727 100.577273 21.954545 67.318339 6.0
2015-01-07 0.973913 0.830435 110.826087 3.956522 30.817391 40.365217 40.595652 0.221739 739.865217 60.043478 100.195652 24.456522 72.347251 6.0
2015-01-08 0.983333 0.825000 156.500000 4.208333 32.670833 41.520833 41.366667 -0.129167 736.350000 69.583333 99.958333 22.275000 65.770725 10.0
2015-01-09 0.958333 0.729167 133.708333 3.379167 39.645833 42.279167 42.158333 -0.116667 735.204167 60.416667 100.041667 19.370833 59.085129 10.0
2015-01-10 0.966667 0.758333 164.500000 3.675000 37.345833 42.962500 42.775000 -0.200000 734.287500 41.500000 100.120833 14.658333 49.314653 0.0
# resample mean
dfr = df.resample('W').mean()
# add the resampled sum to dfr
dfr['mean'] = df['count'].resample('W').sum()
# reset index
dfr = dfr.reset_index()
# display(dfr)
Date WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count mean
0 2015-01-04 1.049230 0.782880 152.359601 3.707382 39.931069 35.063949 34.932699 -0.138678 739.529076 53.802083 100.503986 20.445426 59.986656 1.0 4.0
1 2015-01-11 0.949566 0.690615 136.210282 3.812261 40.152457 41.810743 41.822823 0.015681 738.190794 54.066590 100.316321 19.894900 61.042728 6.0 36.0
# plot dfr
fig, ax = plt.subplots(figsize=(16, 10))
sns.barplot(x='Date', y='count', data=dfr, ax=ax)
# configure the xaxis ticks from datetime to date
x_dates = dfr.Date.dt.strftime('%Y-%m-%d').sort_values().unique()
ax.set_xticklabels(labels=x_dates, rotation=90, ha='right')
plt.show()
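The fix boils down to assigning the resampled Series itself, which already shares the weekly index, instead of a reset_index() DataFrame. A minimal sketch using only the count column from the posted sample; the column name count_sum is hypothetical.

```python
import pandas as pd

# The 'count' column from the posted sample, indexed by Date
df = pd.DataFrame(
    {"count": [0.0, 4.0, 0.0, 0.0, 4.0, 6.0, 6.0, 10.0, 10.0, 0.0]},
    index=pd.date_range("2015-01-01", periods=10, freq="D", name="Date"),
)

weekly = df.resample("W").mean()
# Assign the resampled Series (same weekly index), not a reset_index() DataFrame
weekly["count_sum"] = df["count"].resample("W").sum()
weekly = weekly.reset_index()  # Date becomes a regular column for sns.barplot
print(weekly)
```

The weekly sums (4 and 36) match the 'mean' column in the output above, which despite its name holds the resampled sum; the column plotted there as 'count' is the weekly mean.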
I need help figuring out how to plot subplots for easy comparison from my dataframe, shown here:
Date A B C
2017-03-22 15:00:00 obj1 value_a other_1
2017-03-22 14:00:00 obj2 value_ns other_5
2017-03-21 15:00:00 obj3 value_kdsa other_23
2014-05-08 17:00:00 obj2 value_as other_4
2010-07-01 20:00:00 obj1 value_as other_0
I am trying to graph the occurrences of each hour for each respective day of the week. That is, count the number of occurrences for each combination of day of the week and hour, and plot them on subplots like the ones shown below.
If this question sounds confusing please let me know if you have any questions. Thanks.
You can accomplish this with multiple groupby calls. Since we know there are 7 days in a week, we can specify that number of panels. If you groupby(df.Date.dt.dayofweek), you can use the group key (0 for Monday through 6 for Sunday) as the index into your subplot axes:
Sample Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
n = 10000
np.random.seed(123)
df = pd.DataFrame({'Date': pd.date_range('2010-01-01', freq='1.09min', periods=n),
'A': np.random.randint(1,10,n),
'B': np.random.normal(0,1,n)})
Code:
fig, ax = plt.subplots(ncols=7, figsize=(30,5))
plt.subplots_adjust(wspace=0.05)  #Remove some whitespace between subplots
for idx, gp in df.groupby(df.Date.dt.dayofweek):
    ax[idx].set_title(gp.Date.dt.day_name().iloc[0])  #Set title to the weekday
    (gp.groupby(gp.Date.dt.hour).size().rename_axis('Tweet Hour').to_frame('')
       .reindex(np.arange(0,24,1)).fillna(0)
       .plot(kind='bar', ax=ax[idx], rot=0, ec='k', legend=False))
    # Ticks and labels on leftmost only
    if idx == 0:
        _ = ax[idx].set_ylabel('Counts', fontsize=11)
    _ = ax[idx].tick_params(axis='both', which='major', labelsize=7,
                            labelleft=(idx == 0), left=(idx == 0))
# Consistent bounds between subplots.
lb, ub = list(zip(*[axis.get_ylim() for axis in ax]))
for axis in ax:
    axis.set_ylim(min(lb), max(ub))
plt.show()
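The counting part of this answer can be checked on its own. A sketch that uses pd.crosstab (a swapped-in technique, not the nested groupby above) to build the full 7 x 24 table of counts, where row d holds what panel d plots; the timestamps are the five from the question.

```python
import pandas as pd

# The five timestamps from the question's sample
dates = pd.Series(pd.to_datetime([
    "2017-03-22 15:00", "2017-03-22 14:00", "2017-03-21 15:00",
    "2014-05-08 17:00", "2010-07-01 20:00",
]))

# Rows: day of week (0 = Monday), columns: hour; missing combinations become 0
counts = (pd.crosstab(dates.dt.dayofweek, dates.dt.hour)
          .reindex(index=range(7), columns=range(24), fill_value=0))
print(counts.loc[2])  # hourly counts for Wednesday
```

Like the reindex/fillna step in the answer, the reindex here guarantees every panel has all 24 hours, even hours with no occurrences.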
If you'd like to make the aspect ratio less extreme, consider plotting a 2x4 grid (two rows of four panels). It's a very similar plot to the one above once we flatten the axes array. Some integer and remainder division determines which axes need the labels.
fig, ax = plt.subplots(nrows=2, ncols=4, figsize=(20,10))
fig.delaxes(ax[1,3])  #7 days in a week, remove 8th panel
ax = ax.flatten()  #Far easier to work with a flattened array
lsize=8
plt.subplots_adjust(wspace=0.05, hspace=0.15)  #Remove some whitespace between subplots
for idx, gp in df.groupby(df.Date.dt.dayofweek):
    ax[idx].set_title(gp.Date.dt.day_name().iloc[0])  #Set title to the weekday
    (gp.groupby(gp.Date.dt.hour).size().rename_axis([None]).to_frame()
       .reindex(np.arange(0,24,1)).fillna(0)
       .plot(kind='bar', ax=ax[idx], rot=0, ec='k', legend=False))
    # Titles on correct panels
    if idx%4 == 0:
        _ = ax[idx].set_ylabel('Counts', fontsize=11)
    if (idx//4 == 1) | (idx%4 == 3):
        _ = ax[idx].set_xlabel('Tweet Hour', fontsize=11)
    # Ticks on correct panels
    _ = ax[idx].tick_params(axis='both', which='major', labelsize=lsize,
                            labelbottom=(idx//4 == 1) | (idx%4 == 3),
                            bottom=(idx//4 == 1) | (idx%4 == 3),
                            labelleft=(idx%4 == 0),
                            left=(idx%4 == 0))
# Consistent bounds between subplots.
lb, ub = list(zip(*[axis.get_ylim() for axis in ax]))
for axis in ax:
    axis.set_ylim(min(lb), max(ub))
plt.show()
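The integer and remainder division that decides which panels get labels can be pulled into a small function and checked against the inline expressions. A sketch; label_flags and its parameters are hypothetical names, and the "cell below was deleted" rule generalizes the answer's hard-coded idx%4 == 3 check.

```python
def label_flags(idx, ncols=4, nrows=2, n_panels=7):
    """Hypothetical helper: (show_left, show_bottom) for flattened panel idx."""
    row, col = divmod(idx, ncols)
    show_left = col == 0  # first column carries the y labels
    # bottom row, or a top-row panel whose cell below was deleted
    show_bottom = row == nrows - 1 or idx + ncols >= n_panels
    return show_left, show_bottom

flags = [label_flags(i) for i in range(7)]
print(flags)
```

For seven panels in a 2x4 grid this reproduces the answer's choices: Thursday (index 3) gets x tick labels because the panel under it was removed.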
What about using seaborn? sns.FacetGrid was made for this:
import pandas as pd
import seaborn as sns
# make some data
date = pd.date_range('today', periods=100, freq='2.5H')
# put in dataframe
df = pd.DataFrame({
'date' : date
})
# create day_of_week and hour columns
df['dow'] = df.date.dt.day_name()
df['hour'] = df.date.dt.hour
# create facet grid
g = sns.FacetGrid(data=df.groupby([
'dow',
'hour'
]).hour.count().to_frame(name='day_hour_count').reset_index(), col='dow', col_order=[
'Sunday',
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday'
], col_wrap=3)
# map barplot to each subplot
g.map(sns.barplot, 'hour', 'day_hour_count');
I have the following data set:
In[55]: usdbrl
Out[56]:
Date Price Open High Low Change STD
0 2016-03-18 3.6128 3.6241 3.6731 3.6051 -0.31 0.069592
1 2016-03-17 3.6241 3.7410 3.7449 3.6020 -3.16 0.069041
2 2016-03-16 3.7422 3.7643 3.8533 3.7302 -0.62 0.068772
3 2016-03-15 3.7656 3.6610 3.7814 3.6528 2.83 0.071474
4 2016-03-14 3.6618 3.5813 3.6631 3.5755 2.23 0.070348
5 2016-03-11 3.5820 3.6204 3.6692 3.5716 -1.09 0.076458
6 2016-03-10 3.6215 3.6835 3.7102 3.6071 -1.72 0.062977
7 2016-03-09 3.6849 3.7543 3.7572 3.6790 -1.88 0.041329
8 2016-03-08 3.7556 3.7826 3.8037 3.7315 -0.72 0.013700
9 2016-03-07 3.7830 3.7573 3.7981 3.7338 0.63 0.000000
I want to plot Price against Date:
But I would like to color the line by a third variable (in my case Date or Change).
Could anybody help with this please?
Thanks.
I've written a simple function to map a given property to a color:
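import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
def plot_colourline(x,y,c):
    # normalize c to [0, 1] and look the values up in the jet colormap
    c = cm.jet((c-np.min(c))/(np.max(c)-np.min(c)))
    ax = plt.gca()
    # one two-point segment per pair of neighbours, colored by c at its start
    for i in np.arange(len(x)-1):
        ax.plot([x[i],x[i+1]], [y[i],y[i+1]], c=c[i])
    return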
This function normalizes the desired property and gets a color from the jet colormap; you may want to use a different one. It then gets the current axes and plots each segment of your data in a different color. Because it plots one segment per loop iteration, you should avoid it for very large datasets; for normal purposes, however, it is useful.
Consider the following example as a test:
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = 1.*np.arange(n)
y = np.random.rand(n)
prop = x**2
fig = plt.figure(1, figsize=(5,5))
ax = fig.add_subplot(111)
plot_colourline(x,y,prop)
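For larger datasets, matplotlib's LineCollection (a swapped-in technique, not the function above) draws all segments as a single artist and avoids the per-segment loop. A sketch that keeps the same coloring rule, segment i colored by c[i]; the name plot_colourline_lc is hypothetical and the Agg backend is only there so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to show the figure
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

def plot_colourline_lc(ax, x, y, c, cmap="jet"):
    """Hypothetical helper: draw y against x as one LineCollection,
    coloring segment i by c[i], like the loop version above."""
    points = np.column_stack([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])
    # shape (n-1, 2, 2): segment i runs from point i to point i+1
    segments = np.stack([points[:-1], points[1:]], axis=1)
    lc = LineCollection(segments, cmap=cmap)
    lc.set_array(np.asarray(c, dtype=float)[:-1])  # per-segment values; the norm scales to their range
    ax.add_collection(lc)
    ax.autoscale_view()  # make sure the view limits include the new segments
    return lc

n = 100
x = np.arange(n, dtype=float)
y = np.random.default_rng(0).random(n)
fig, ax = plt.subplots(figsize=(5, 5))
lc = plot_colourline_lc(ax, x, y, x**2)
fig.colorbar(lc, ax=ax, label="x**2")
```

Returning the collection also makes a colorbar possible, which the loop version cannot provide directly.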
You could color the data points by a third variable, if that would help:
dates = [dt.date() for dt in pd.to_datetime(df.Date)]
plt.scatter(dates, df.Price, c=df.Change, s=100, lw=0)
plt.plot(dates, df.Price)
plt.colorbar()
plt.show()