I am looking to automate some work I have been doing in PowerPoint/Excel using Python and Matplotlib; however, I am having trouble recreating the chart I have been building there.
I have three data series that are grouped by month on the x-axis; however, the months are not date/time values and have no real x-values. I want to assign x-values based on the row number (so the points are not stacked), group them by month, and add a vertical line wherever the month value changes.
The number of rows per month can vary, so I'm having trouble grouping the months and automatically adding the vertical line when the data moves on to the next month.
Here is a sample image of what I created in PowerPoint/Excel and what I am hoping to accomplish:
Here is what I have so far:
For above: I added a new column to my csv file named "Count" and added that as my x-values; however, that is only a workaround to get my desired "look" and does not separate the points by month.
My code so far:
manipulate.csv
Count,Month,Type,Time
1,June,Purple,13
2,June,Orange,3
3,June,Purple,13
4,June,Orange,12
5,June,Blue,55
6,June,Blue,42
7,June,Blue,90
8,June,Orange,3
9,June,Orange,171
10,June,Blue,132
11,June,Blue,96
12,July,Orange,13
13,July,Orange,13
14,July,Orange,22
15,July,Orange,6
16,July,Purple,4
17,July,Orange,3
18,July,Orange,18
19,July,Blue,99
20,August,Blue,190
21,August,Blue,170
22,August,Orange,33
23,August,Orange,29
24,August,Purple,3
25,August,Purple,9
26,August,Purple,6
testchart.py
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('manipulate.csv')
df=df.reindex(columns=["Month", "Type", "Time", "Count"])
df['Orange'] = df.loc[df['Type'] == 'Orange', 'Time']
df['Blue'] = df.loc[df['Type'] == 'Blue', 'Time']
df['Purple'] = df.loc[df['Type'] == 'Purple', 'Time']
print(df)
w = df['Count']
x = df['Orange']
y = df['Blue']
z = df['Purple']
plt.plot(w, x, linestyle = 'none', marker='o', c='Orange')
plt.plot(w, y, linestyle = 'none', marker='o', c='Blue')
plt.plot(w, z, linestyle = 'none', marker='o', c='Purple')
plt.ylabel("Time")
plt.xlabel("Month")
plt.show()
Can I suggest using Seaborn's swarmplot instead? It might be easier:
import seaborn as sns
import matplotlib.pyplot as plt
# Change the month to an actual date then set the format to just the date's month's name
df.Month = pd.to_datetime(df.Month, format='%B').dt.month_name()
sns.swarmplot(data=df, x='Month', y='Time', hue='Type', palette=['purple', 'orange', 'blue'])
plt.legend().remove()
for x in range(len(df.Month.unique())-1):
    plt.axvline(0.5+x, linestyle='--', color='black', alpha = 0.5)
Output Graph:
Or Seaborn's stripplot with some jitter value:
import seaborn as sns
import matplotlib.pyplot as plt
# Change the month to an actual date then set the format to just the date's month's name
df.Month = pd.to_datetime(df.Month, format='%B').dt.month_name()
sns.stripplot(data=df, x='Month', y='Time', hue='Type', palette=['purple', 'orange', 'blue'], jitter=0.4)
plt.legend().remove()
for x in range(len(df.Month.unique())-1):
    plt.axvline(0.5+x, linestyle='--', color='black', alpha = 0.5)
If you would rather stay with Matplotlib, you can use matplotlib.dates (mdates) to format the x-axis labels as month names, and datetime's timedelta to add a few days to each row's date so that the points within a month are spread out instead of overlapping:
from datetime import timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
df.Month = pd.to_datetime(df.Month, format='%B')
separators = df.Month.unique() # Get each unique month, to be used for the vertical lines
# Add an amount of days to each value within a range of 25 days based on how many days are in each month in the dataframe
# This is just to split up the days so that there is no overlap
dayAdditions = sum([list(range(2,25,int(25/x))) for x in list(df.groupby('Month').count().Time)], [])
df.Month = [x + timedelta(days=count) for x,count in zip(df.Month, dayAdditions)]
df=df.reindex(columns=["Month", "Type", "Time", "Count"])
df['Orange'] = df.loc[df['Type'] == 'Orange', 'Time']
df['Blue'] = df.loc[df['Type'] == 'Blue', 'Time']
df['Purple'] = df.loc[df['Type'] == 'Purple', 'Time']
w = df['Count']
x = df['Orange']
y = df['Blue']
z = df['Purple']
fig, ax = plt.subplots()
plt.plot(df.Month, x, linestyle = 'none', marker='o', c='Orange')
plt.plot(df.Month, y, linestyle = 'none', marker='o', c='Blue')
plt.plot(df.Month, z, linestyle = 'none', marker='o', c='Purple')
plt.ylabel("Time")
plt.xlabel("Month")
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=15)) # Set the locator at the 15th of each month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%B')) # Set the format to just be the month name
for sep in separators[1:]:
    plt.axvline(sep, linestyle='--', color='black', alpha = 0.5) # Add a separator at every month starting at the second month
plt.show()
Output:
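The day-offset step can also be sketched with groupby, which avoids building the flat dayAdditions list by hand. This is only an illustration on a small made-up frame, not the code that produced the plot; the column name x and the 24-day spread are assumptions chosen for the example.

```python
import pandas as pd

# Made-up stand-in for the question's data: 4 June rows, 2 July rows, 3 August rows
df = pd.DataFrame({"Month": ["June"] * 4 + ["July"] * 2 + ["August"] * 3})
month_start = pd.to_datetime(df["Month"], format="%B")  # the year defaults to 1900

# Position of each row inside its month (0, 1, 2, ...) and the month's row count
pos = df.groupby("Month", sort=False).cumcount()
size = df.groupby("Month", sort=False)["Month"].transform("size")

# Spread the rows over days 2..25 so points never cross a month boundary
offset_days = 1 + (pos * 24) // size
df["x"] = month_start + pd.to_timedelta(offset_days, unit="D")

print(df)
```

Because every offset is computed per group, the number of offsets always matches the number of rows in that month, however the counts vary.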
This is how I put your data in a df, in case anyone else wants to grab it to help answer the question:
from io import StringIO
import pandas as pd
TESTDATA = StringIO(
'''Count,Month,Type,Time
1,June,Purple,13
2,June,Orange,3
3,June,Purple,13
4,June,Orange,12
5,June,Blue,55
6,June,Blue,42
7,June,Blue,90
8,June,Orange,3
9,June,Orange,171
10,June,Blue,132
11,June,Blue,96
12,July,Orange,13
13,July,Orange,13
14,July,Orange,22
15,July,Orange,6
16,July,Purple,4
17,July,Orange,3
18,July,Orange,18
19,July,Blue,99
20,August,Blue,190
21,August,Blue,170
22,August,Orange,33
23,August,Orange,29
24,August,Purple,3
25,August,Purple,9
26,August,Purple,6''')
df = pd.read_csv(TESTDATA, sep = ',')
Maybe add custom x-axis labels and separating lines between months:
new_month = ~df.Month.eq(df.Month.shift(-1))
for c in df[new_month].Count.values[:-1]:
    plt.axvline(c + 0.5, linestyle="--", color="gray")
plt.xticks(
    (df[new_month].Count + df[new_month].Count.shift(fill_value=0)) / 2,
    df[new_month].Month,
)
for color in ["Orange", "Blue", "Purple"]:
    plt.plot(
        df["Count"],
        df[color],
        linestyle="none",
        marker="o",
        color=color.lower(),
        label=color,
    )
I would also advise that you rename the color columns into something more descriptive and if possible add more time information to your data sample (days, year).
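If you would rather not keep a hand-made Count column in the CSV, the positions of the separators and month labels can be derived from the row order alone. A minimal sketch, assuming the rows are already ordered by month as in the sample; the frame below is a shortened, made-up stand-in for manipulate.csv and the column name x is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Month": ["June"] * 4 + ["July"] * 2 + ["August"] * 3,
    "Time": [13, 3, 55, 42, 13, 22, 190, 33, 9],
})

# x position = 1-based row number, so no extra "Count" column is needed
df["x"] = range(1, len(df) + 1)

# First and last x of each month; sort=False keeps the order of appearance
spans = df.groupby("Month", sort=False)["x"].agg(["min", "max"])

# A separator sits halfway between the last row of one month and the first row of the next
separators = (spans["max"].iloc[:-1] + 0.5).tolist()

# A tick label sits in the middle of each month's rows
tick_positions = ((spans["min"] + spans["max"]) / 2).tolist()
tick_labels = spans.index.tolist()

print(separators, tick_positions, tick_labels)
```

Each value in separators can then be passed to plt.axvline, and tick_positions with tick_labels to plt.xticks.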
I am trying to draw a stock market graph with closing price and volume plotted against time.
For some reason, the x-axis shows the dates in 1970.
The graph is shown below, and the code is:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])
pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d')
pd_data.set_index('DOB')
print(pd_data)
print(pd_data.dtypes)
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
#ax.pd_data['volume'].plot(secondary_y=True, kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volumne')
# Choose your xtick format string
date_fmt = '%d-%m-%y'
date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)
# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
Also tried the two graphs independently without ax=ax
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volumne')
With this, the price graph shows the years properly, whereas the volume graph shows 1970.
And if I swap them:
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volumne')
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
Now the volume graph shows the years properly, whereas the price graph shows them as 1970.
I tried removing secondary_y and also changing bar to line, but no luck.
Somehow pandas seems to change the year in the data after the first graph is drawn.
I do not advise drawing a bar plot with this many bars.
This answer explains why there is an issue with the xtick labels, and how to resolve the issue.
Plotting with pandas.DataFrame.plot works without issue with .set_major_locator
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas_datareader as web # conda install -c anaconda pandas-datareader or pip install pandas-datareader
# download data
df = web.DataReader('amzn', data_source='yahoo', start='2015-02-21', end='2021-04-27')
# plot
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, alpha=0.5, rot=0, lw=0.5)
ax1.set(ylabel='Volume')
# format
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
plt.setp(ax.get_xticklabels(), ha="center")
plt.show()
Why are the OP x-tick labels starting from 1970?
pandas bar plots place the bars at 0-indexed integer positions (0, 1, 2, ...), and Matplotlib's date formatter interprets position 0 as 1970-01-01
See Pandas bar plot changes date format
Most solutions with bar plots simply reformat the label to the appropriate datetime, however this is cosmetic and will not align the locations between the line plot and bar plot
Solution 2 of this answer shows how to change the tick locators, but is really not worth the extra code, when plt.bar can be used.
print(pd.to_datetime(ax1.get_xticks()))
DatetimeIndex([ '1970-01-01 00:00:00',
'1970-01-01 00:00:00.000000001',
'1970-01-01 00:00:00.000000002',
'1970-01-01 00:00:00.000000003',
...
'1970-01-01 00:00:00.000001552',
'1970-01-01 00:00:00.000001553',
'1970-01-01 00:00:00.000001554',
'1970-01-01 00:00:00.000001555'],
dtype='datetime64[ns]', length=1556, freq=None)
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
print(ax.get_xticks())
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, kind='bar')
print(ax1.get_xticks())
ax1.set_xlim(0, 18628.)
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.] ← ax tick locations
[ 0 1 2 ... 1553 1554 1555] ← ax1 tick locations
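To see the mapping between plain numbers and dates directly, the small check below may help. It assumes Matplotlib 3.3 or newer, where the default date epoch is 1970-01-01; older versions used a different epoch and will give different values.

```python
import datetime as dt

import matplotlib.dates as mdates

# Matplotlib stores dates as floats counting days since its epoch,
# so a bar drawn at position 0 is read as the epoch itself.
epoch = mdates.num2date(0)
print(epoch.date())

# A real date from the line plot sits far to the right of the bars.
jan_2021 = mdates.date2num(dt.datetime(2021, 1, 1))
print(jan_2021)
```

Under that assumption the second value is 18628.0, which matches the last ax tick location listed above, while all 1556 bars are squeezed into positions 0 to 1555.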
With plt.bar the bar plot locations are indexed based on the datetime
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)', rot=0)
plt.setp(ax.get_xticklabels(), ha="center")
print(ax.get_xticks())
ax1 = ax.twinx()
ax1.bar(df.index, df.Volume)
print(ax1.get_xticks())
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
sns.barplot(x=df.index, y=df.Volume, ax=ax1) has xtick locations as [ 0 1 2 ... 1553 1554 1555], so the bar plot and line plot did not align.
I could not find the reason for 1970, so instead I plotted with matplotlib.pyplot directly rather than through pandas, and passed an array of datetime objects instead of the pandas column.
The following code worked:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
import numpy as np
pd_data = pd.read_csv("/home/stockdata.csv",sep='\t')
pd_data['DOB'] = pd.to_datetime(pd_data['datetime2']).dt.strftime('%Y-%m-%d')
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in pd_data['DOB']]
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=2))
plt.bar(dates,pd_data['close'],align='center')
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
plt.gcf().autofmt_xdate()
plt.show()
I created a dates array of datetime objects. If I build the graph from that, the dates are no longer shown as 1970.
open high low close volume datetime datetime2
35.12 35.68 34.79 35.58 1432995 1244385200000 2012-6-15 10:30:00
35.69 36.02 35.37 35.78 1754319 1244371600000 2012-6-16 10:30:00
35.69 36.23 35.59 36.23 3685845 1245330800000 2012-6-19 10:30:00
36.11 36.52 36.03 36.32 2635777 1245317200000 2012-6-20 10:30:00
36.54 36.6 35.8 35.9 2886412 1245303600000 2012-6-21 10:30:00
36.03 36.95 36.0 36.09 3696278 1245390000000 2012-6-22 10:30:00
36.5 37.27 36.18 37.11 2732645 1245376400000 2012-6-23 10:30:00
36.98 37.11 36.686 36.83 1948411 1245335600000 2012-6-26 10:30:00
36.67 37.06 36.465 37.05 2557172 1245322000000 2012-6-27 10:30:00
37.06 37.61 36.77 37.52 1780126 1246308400000 2012-6-28 10:30:00
37.47 37.77 37.28 37.7 1352267 1246394800000 2012-6-29 10:30:00
37.72 38.1 37.68 37.76 2194619 1246381200000 2012-6-30 10:30:00
The plot I get is:
I am looking for a way to show the x-axis correctly. The date 2021-01-31 is displayed as "Feb 2021"; I would like it to be shown as "Jan 2021". Thanks for your help!
sdate = date(2021,1,31)
edate = date(2021,8,30)
date_range = pd.date_range(sdate,edate-timedelta(days=1),freq='m')
df_test = pd.DataFrame({ 'Datum': date_range})
df_test['values'] = 10
fig = px.line(df_test, x=df_test['Datum'], y=df_test['values'])
fig.layout = go.Layout(yaxis=dict(tickformat=".0%"))
fig.update_xaxes(dtick="M1", tickformat="%b %Y")
fig.update_layout(width=1485, height=1100)
plotly.io.write_image(fig, file='test_line.png', format='png')
You can force the ticks to start at 2021-01-31 by setting the first tick, tick0, to sdate, the starting date of your data.
from datetime import date, timedelta
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
sdate = date(2021,1,31)
edate = date(2021,8,30)
date_range = pd.date_range(sdate,edate-timedelta(days=1),freq='m')
df_test = pd.DataFrame({ 'Datum': date_range})
df_test['values'] = 10
fig = px.line(df_test, x=df_test['Datum'], y=df_test['values'])
fig.layout = go.Layout(yaxis=dict(tickformat=".0%"))
fig.update_xaxes(dtick="M1", tickformat="%b %Y")
## set tick0 to the starting date
fig.update_layout(
    xaxis=dict(tick0=sdate),
    width=1485, height=1100
)
fig.show()
I should point out that this plot has the potential to be misleading as I believe most people would interpret each tickmark as starting at the beginning of the month (e.g. most people would think that the data starts on 2021-01-01) if you don't specify the day in your tickformat, but that is up to you depending on what you want to show on your chart.
If you instead change the tickformat by rewriting the line as fig.update_xaxes(dtick="M1", tickformat="%b %d %Y"), then you get the following plot:
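Another option, if each monthly value is meant to stand for the whole month rather than its last day, is to move the dates to the start of their month before plotting, so that the default monthly ticks line up without changing tick0. This is a sketch of that alternative, not part of the approach above; the three dates are made up for the example:

```python
import pandas as pd

month_ends = pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"])

# Convert to monthly periods, then back to timestamps at the start of each period
month_starts = month_ends.to_period("M").to_timestamp()

print(month_starts.strftime("%b %Y").tolist())
```

With the question's frame, the equivalent step would be df_test['Datum'] = df_test['Datum'].dt.to_period('M').dt.to_timestamp() before calling px.line.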
I am graphing three lines on a single plot. I want the x-axis to display the date the data was taken on and the time from 00:00 to 24:00. Right now my code displays the time of day correctly, but instead of the date the data was recorded on, the current date is shown (12-18). I am unsure how to correct this. It would also be acceptable for my plot to show only the time from 00:00 to 24:00, without the date on the x-axis. Thank you for your help!
# set index as time for graphing
monAverages['Time'] = monAverages['Time'].apply(lambda x: pd.to_datetime(str(x)))
index = monAverages['Time']
index = index.apply(lambda x: pd.to_datetime(str(x)))
averagePlot = dfSingleDay
predictPlot = predictPlot[np.isfinite(predictPlot)]
datasetPlot = datasetPlot[np.isfinite(datasetPlot)]
predictPlot1 = pd.DataFrame(predictPlot)
datasetPlot1 = pd.DataFrame(datasetPlot)
averagePlot.set_index(index, drop=True,inplace=True)
datasetPlot1.set_index(index, drop=True,inplace=True)
predictPlot1.set_index(index, drop=True,inplace=True)
plt.rcParams["figure.figsize"] = (10,10)
plt.plot(datasetPlot1,'b', label='Real Data')
plt.plot(averagePlot, 'y', label='Average for this day of the week')
plt.plot(predictPlot1, 'g', label='Predictions')
plt.title('Power Consumption')
plt.xlabel('Date (00-00) and Time of Day(00)')
plt.ylabel('kW')
plt.legend()
plt.show()
You need to be sure that you get only the time:
import matplotlib.dates as mdates
# set index as time for graphing
monAverages['Time'] = monAverages['Time'].apply(lambda x: pd.to_datetime(str(x)))
index = monAverages['Time']
#index = index.apply(lambda x: pd.to_datetime(str(x)))
dates = [ts.time() for ts in index]  # index already holds Timestamps (converted above), so take the time part directly
averagePlot = dfSingleDay
predictPlot = predictPlot[np.isfinite(predictPlot)]
datasetPlot = datasetPlot[np.isfinite(datasetPlot)]
predictPlot1 = pd.DataFrame(predictPlot)
datasetPlot1 = pd.DataFrame(datasetPlot)
plt.rcParams["figure.figsize"] = (10,10)
plt.plot(dates,datasetPlot1,'b', label='Real Data')
plt.plot(dates,averagePlot, 'y', label='Average for this day of the week')
plt.plot(dates,predictPlot1, 'g', label='Predictions')
plt.title('Power Consumption')
plt.xlabel('Date (00-00) and Time of Day(00)')
plt.ylabel('kW')
plt.legend()
plt.show()
The following standalone example shows the idea:
import datetime as dt
import matplotlib.pyplot as plt
dates = ['2019-12-18 00:00:00','2019-12-18 12:00:00','2019-12-18 13:00:00']
x = [dt.datetime.strptime(d,'%Y-%m-%d %H:%M:%S').time() for d in dates]
y = range(len(x))
plt.plot(x,y)
plt.gcf().autofmt_xdate()
plt.show()
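An alternative that does not depend on plotting datetime.time values is to keep full datetime values on the axis and hide the date in the tick labels with a DateFormatter. A minimal sketch with made-up values; the variable names are hypothetical:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # off-screen rendering so the sketch runs anywhere; remove for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Full timestamps for one day (made-up data)
stamps = [dt.datetime(2019, 12, 18, hour) for hour in (0, 6, 12, 18)]
values = [1.0, 2.5, 2.0, 3.0]

fig, ax = plt.subplots()
ax.plot(stamps, values)

# Tick every 6 hours and show only hours and minutes, so the date never appears
formatter = mdates.DateFormatter("%H:%M")
ax.xaxis.set_major_locator(mdates.HourLocator(interval=6))
ax.xaxis.set_major_formatter(formatter)
ax.set_xlabel("Time of day")
```

Call plt.show() or fig.savefig(...) afterwards to see the result. Since the date is only hidden, not removed, whichever date pandas attached to the values no longer matters for the labels.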
A little info: I'm very new to programming and this is a small part of my first script. The goal of this particular segment is to display a seaborn heatmap with vertical depth on the y-axis, time on the x-axis, and intensity of a scientific measurement as the heat function.
I'd like to apologize if this has been answered elsewhere, but my searching abilities must have failed me.
sns.set()
nametag = 'Well_4_all_depths_capf'
Dp = D[D.well == 'well4']
print(Dp.date)
heat = Dp.pivot(index="depth", columns="date", values="capf")
### depth, date and capf are all columns of a pandas dataframe
plt.title(nametag)
sns.heatmap(heat, linewidths=.25)
plt.savefig('%s%s.png' % (pathheatcapf, nametag), dpi = 600)
This is what print(Dp.date) prints,
so I'm pretty sure the dates in the dataframe are in the format I want, namely year, month, day.
0 2016-08-09
1 2016-08-09
2 2016-08-09
3 2016-08-09
4 2016-08-09
5 2016-08-09
6 2016-08-09
...
But, when I run it the date axis always prints with blank times (00:00 etc) that I don't want.
Is there a way to remove these from the date axis?
Is the problem that in a cell above I used this function to scan the file name and make a column with the date??? Is it wrong to use datetime instead of just a date function?
D['date']=pd.to_datetime(['%s-%s-%s' %(f[0:4],f[4:6],f[6:8]) for f in
D['filename']])
Use the strftime function on the dataframe's date series so that the xtick labels are formatted correctly:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import random
dates = [datetime.today() - timedelta(days=x * random.getrandbits(1)) for x in range(25)]
df = pd.DataFrame({'depth': [0.1,0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001],\
'date': dates,\
'value': [-4.1808639999999997, -9.1753490000000006, -11.408113999999999, -10.50245, -8.0274750000000008, -0.72260200000000008, -6.9963940000000004, -10.536339999999999, -9.5440649999999998, -7.1964070000000007, -0.39225599999999999, -6.6216390000000001, -9.5518009999999993, -9.2924690000000005, -6.7605589999999998, -0.65214700000000003, -6.8852289999999989, -9.4557760000000002, -8.9364629999999998, -6.4736289999999999, -0.96481800000000006, -6.051482, -9.7846860000000007, -8.5710630000000005, -6.1461209999999999]})
pivot = df.pivot(index='depth', columns='date', values='value')
sns.set()
ax = sns.heatmap(pivot)
ax.set_xticklabels(df['date'].dt.strftime('%d-%m-%Y'))
plt.xticks(rotation=-90)
plt.show()
Example with standard heatmap datetime labels
import pandas as pd
import seaborn as sns
dates = pd.date_range('2019-01-01', '2020-12-01')
df = pd.DataFrame(np.random.randint(0, 100, size=(len(dates), 4)), index=dates)
sns.heatmap(df)
We can create some helper classes/functions to get to some better looking labels and placement. AxTransformer enables conversion from data coordinates to tick locations, set_date_ticks allows custom date ranges to be applied to plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections.abc import Iterable
from sklearn import linear_model
class AxTransformer:
    def __init__(self, datetime_vals=False):
        self.datetime_vals = datetime_vals
        self.lr = linear_model.LinearRegression()
        return
    def process_tick_vals(self, tick_vals):
        if not isinstance(tick_vals, Iterable) or isinstance(tick_vals, str):
            tick_vals = [tick_vals]
        if self.datetime_vals == True:
            tick_vals = pd.to_datetime(tick_vals).astype(int).values
        tick_vals = np.array(tick_vals)
        return tick_vals
    def fit(self, ax, axis='x'):
        axis = getattr(ax, f'get_{axis}axis')()
        tick_locs = axis.get_ticklocs()
        tick_vals = self.process_tick_vals([label._text for label in axis.get_ticklabels()])
        self.lr.fit(tick_vals.reshape(-1, 1), tick_locs)
        return
    def transform(self, tick_vals):
        tick_vals = self.process_tick_vals(tick_vals)
        tick_locs = self.lr.predict(np.array(tick_vals).reshape(-1, 1))
        return tick_locs
def set_date_ticks(ax, start_date, end_date, axis='y', date_format='%Y-%m-%d', **date_range_kwargs):
    dt_rng = pd.date_range(start_date, end_date, **date_range_kwargs)
    ax_transformer = AxTransformer(datetime_vals=True)
    ax_transformer.fit(ax, axis=axis)
    getattr(ax, f'set_{axis}ticks')(ax_transformer.transform(dt_rng))
    getattr(ax, f'set_{axis}ticklabels')(dt_rng.strftime(date_format))
    ax.tick_params(axis=axis, which='both', bottom=True, top=False, labelbottom=True)
    return ax
These provide us a lot of flexibility, e.g.
fig, ax = plt.subplots(dpi=150)
sns.heatmap(df, ax=ax)
set_date_ticks(ax, '2019-01-01', '2020-12-01', freq='3MS')
or if you really want to get weird you can do stuff like
fig, ax = plt.subplots(dpi=150)
sns.heatmap(df, ax=ax)
set_date_ticks(ax, '2019-06-01', '2020-06-01', freq='2MS', date_format='%b `%y')
For your specific example you'll have to pass axis='x' to set_date_ticks
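A lighter-weight way to place date ticks, without fitting a regression, is to look up each wanted date's row position in the index. The sketch assumes that the heatmap draws row i between i and i + 1, so that its centre is at i + 0.5, and that every wanted date is present in the index; the variable names are hypothetical:

```python
import pandas as pd

# Heatmap rows, one per day, as in the example data above
dates = pd.date_range("2019-01-01", "2020-12-01")

# Dates that should receive a label: the first day of every third month
wanted = pd.date_range("2019-01-01", "2020-12-01", freq="3MS")

# get_indexer returns the integer row position of each wanted date (-1 if missing)
positions = dates.get_indexer(wanted) + 0.5
labels = wanted.strftime("%Y-%m-%d").tolist()

print(list(zip(positions, labels)))
```

With the axes returned by sns.heatmap(df), these could be applied with ax.set_yticks(positions) followed by ax.set_yticklabels(labels).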
First, the 'date' column must be converted to a datetime dtype with pandas.to_datetime
If the desired result is to only have the dates (without time), then the easiest solution is to use the .dt accessor to extract the .date component. Alternatively, use dt.strftime to set a specific string format.
strftime() and strptime() Format Codes
df.date.dt.strftime('%H:%M') would extract hours and minutes into a string like '14:29'
In the example below, the extracted date is assigned to the same column, but the value can also be assigned as a new column.
pandas.DataFrame.pivot_table aggregates the values (for example with mean) when there are multiple values for an index/column pair; pandas.DataFrame.pivot should be used if there is only a single value per pair.
This is better than .groupby because the dataframe is correctly shaped to be easily plotted.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
import pandas as pd
import numpy as np
import seaborn as sns
# create sample data
dates = [f'2016-08-{d}T00:00:00.000000000' for d in range(9, 26, 2)] + ['2016-09-09T00:00:00.000000000']
depths = np.arange(1.25, 5.80, 0.25)
np.random.seed(365)
p1 = np.random.dirichlet(np.ones(10), size=1)[0] # random probabilities for random.choice
p2 = np.random.dirichlet(np.ones(19), size=1)[0] # random probabilities for random.choice
data = {'date': np.random.choice(dates, size=1000, p=p1), 'depth': np.random.choice(depths, size=1000, p=p2), 'capf': np.random.normal(0.3, 0.05, size=1000)}
df = pd.DataFrame(data)
# display(df.head())
date depth capf
0 2016-08-19T00:00:00.000000000 4.75 0.339233
1 2016-08-19T00:00:00.000000000 3.00 0.370395
2 2016-08-21T00:00:00.000000000 5.75 0.332895
3 2016-08-23T00:00:00.000000000 1.75 0.237543
4 2016-08-23T00:00:00.000000000 5.75 0.272067
# make sure the date column is converted to a datetime dtype
df.date = pd.to_datetime(df.date)
# extract only the date component of the date column
df.date = df.date.dt.date
# reshape the data for heatmap; if there's no need to aggregate a function, then use .pivot(...)
dfp = df.pivot_table(index='depth', columns='date', values='capf', aggfunc='mean')
# display(dfp.head())
date 2016-08-09 2016-08-11 2016-08-13 2016-08-15 2016-08-17 2016-08-19 2016-08-21 2016-08-23 2016-08-25 2016-09-09
depth
1.50 0.334661 NaN NaN 0.302670 0.314186 0.325257 0.313645 0.263135 NaN NaN
1.75 0.305488 0.303005 0.410124 0.299095 0.313899 0.280732 0.275758 0.260641 NaN 0.318099
2.00 0.322312 0.274105 NaN 0.319606 0.268984 0.368449 0.311517 0.309923 NaN 0.306162
2.25 0.289959 0.315081 NaN 0.302202 0.306286 0.339809 0.292546 0.314225 0.263875 NaN
2.50 0.314227 0.296968 NaN 0.312705 0.333797 0.299556 0.327187 0.326958 NaN NaN
# plot
sns.heatmap(dfp, cmap='GnBu')
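To make the difference between pivot and pivot_table concrete, here is a small made-up frame with a duplicated depth/date pair. It is only an illustration; the values are invented and the flag pivot_failed is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2016-08-09", "2016-08-09", "2016-08-11"]),
    "depth": [1.5, 1.5, 1.5],
    "capf": [0.30, 0.34, 0.28],
})

# Format the dates as strings so the column labels carry no time component
df["date"] = df["date"].dt.strftime("%Y-%m-%d")

# Two rows share depth 1.5 and date 2016-08-09, so pivot cannot reshape them
try:
    df.pivot(index="depth", columns="date", values="capf")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves the duplicates by aggregating them, here with the mean
table = df.pivot_table(index="depth", columns="date", values="capf", aggfunc="mean")

print(pivot_failed)
print(table)
```

The resulting table has one column per date string and can be passed to sns.heatmap in the same way as dfp.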
I had a similar problem, but the date was the index. I've just converted the date to string (pandas 1.0) before plotting and it worked for me.
heat['date'] = heat.date.astype('string')