I have tried unsuccessfully to create a bar plot of a time series dataset. I have tried converting the dates to pandas datetime objects, Timestamp objects, primitive strings, floats, and ints. No matter what I do, I get the following error:
TypeError: float() argument must be a string or a number, not 'Timestamp'
Here are a few minimal examples that produce the error:
Here, the 'Date' object is of type <class 'pandas._libs.tslibs.timestamps.Timestamp'>, so I know why this doesn't work:
import pandas as pd
import matplotlib.pylab as plt
import matplotlib.dates as mdates
import seaborn as sns
def main():
    path = 'Data/AQ+RX Counts.csv'
    df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
    weekly_df = df.resample('W').mean().reset_index()
    weekly_df['count'] = df['count'].resample('W').sum().reset_index()
    sns.barplot(x = 'Date', y='count', data = weekly_df)
    plt.show()
main()
I then tried making the dates floats, intending to format them back to dates afterwards, but this still doesn't work:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I also tried making them integers, to no avail:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
dates = dates.astype(int)
dates = pd.Series(dates)
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I've tried many other variations, and all produce the same error. I've even compared it to other code and verified that all the types are identical, and the comparison code works fine. I am at a complete loss of where to go from here.
Dataframe:
Date,WSA,WSV,WDV,WSM,SGT,T2M,T10M,DELTA_T,PBAR,SRAD,RH,PM25,AQI,count
2015-01-01,1.0708333333333335,0.8750000000000001,132.95833333333334,3.4708333333333337,35.39166666666667,30.72916666666667,30.625,-0.11666666666666667,738.8249999999998,72.66666666666667,99.75416666666666,24.80833333333333,73.30793131580873,0.0
2015-01-02,1.1086956521739129,0.9391304347826086,148.47826086956522,3.734782608695653,32.46521739130434,34.39130434782609,34.27826086956521,-0.11739130434782602,738.3478260869565,61.39130434782609,100.01304347826084,23.500000000000004,64.15072523318715,4.0
2015-01-03,1.0173913043478258,0.7173913043478259,168.04347826086956,3.773913043478261,42.71739130434783,36.24782608695652,36.160869565217396,-0.09565217391304348,739.4434782608695,49.60869565217392,100.76956521739132,20.460869565217394,55.65271063058384,0.0
2015-01-04,1.0,0.6,159.95833333333334,3.85,49.15,38.8875,38.66666666666666,-0.225,741.5000000000001,31.54166666666667,101.47916666666669,13.012499999999998,46.835258118800965,0.0
2015-01-05,1.0333333333333334,0.4416666666666667,137.0,4.0,57.56666666666666,42.99583333333333,42.94583333333333,-0.04999999999999995,742.5333333333333,44.58333333333334,101.00416666666666,16.654166666666665,52.420271225456766,4.0
2015-01-06,0.7818181818181817,0.5590909090909091,114.72727272727272,3.654545454545455,42.86818181818182,40.7409090909091,41.09545454545454,0.36818181818181817,740.9045454545453,48.27272727272727,100.57727272727274,21.954545454545453,67.31833852518514,6.0
2015-01-07,0.9739130434782608,0.8304347826086954,110.82608695652172,3.956521739130436,30.817391304347833,40.36521739130435,40.59565217391304,0.22173913043478266,739.8652173913043,60.04347826086956,100.19565217391305,24.456521739130434,72.3472505968891,6.0
2015-01-08,0.9833333333333336,0.8250000000000001,156.5,4.208333333333333,32.67083333333333,41.520833333333336,41.36666666666667,-0.12916666666666668,736.35,69.58333333333333,99.95833333333331,22.274999999999995,65.77072473472253,10.0
2015-01-09,0.9583333333333331,0.7291666666666669,133.70833333333334,3.3791666666666664,39.645833333333336,42.279166666666654,42.15833333333333,-0.11666666666666665,735.2041666666665,60.41666666666666,100.04166666666669,19.370833333333334,59.08512936837911,10.0
2015-01-10,0.9666666666666668,0.7583333333333336,164.5,3.675,37.34583333333333,42.96250000000001,42.775,-0.2,734.2875,41.5,100.12083333333337,14.658333333333335,49.31465266245389,0.0
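The problem is the transformation inside the function: df['count'].resample('W').sum().reset_index() returns a DataFrame with both a 'Date' and a 'count' column, so assigning it to weekly_df['count'] converts 'count' from floats to a datetime dtype.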
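As a rough illustration of that diagnosis, the sketch below rebuilds only the 'count' column from the ten posted rows (the small frame is a stand-in, not the original CSV) and shows what reset_index() returns, followed by one way to assign the weekly sum while the index is still shared.

```python
import pandas as pd

# Stand-in for the posted CSV: only the 'count' column of the ten sample rows.
idx = pd.date_range("2015-01-01", periods=10, freq="D", name="Date")
df = pd.DataFrame(
    {"count": [0.0, 4.0, 0.0, 0.0, 4.0, 6.0, 6.0, 10.0, 10.0, 0.0]},
    index=idx,
)

# reset_index() on the resampled Series yields a two-column DataFrame,
# so it is not a drop-in value for a single column.
resampled = df["count"].resample("W").sum().reset_index()
print(type(resampled).__name__, list(resampled.columns))

# Assign the weekly sum while both objects still share the DatetimeIndex,
# then reset the index once at the end; 'count' stays numeric.
weekly_df = df.resample("W").mean()
weekly_df["count"] = df["count"].resample("W").sum()
weekly_df = weekly_df.reset_index()
print(weekly_df.dtypes)
```

The order of operations is the point here: the assignment happens before reset_index(), so pandas aligns on the dates instead of receiving a second 'Date' column.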
Using the posted sample data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path = 'data/test.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
# display(df)
WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count
Date
2015-01-01 1.070833 0.875000 132.958333 3.470833 35.391667 30.729167 30.625000 -0.116667 738.825000 72.666667 99.754167 24.808333 73.307931 0.0
2015-01-02 1.108696 0.939130 148.478261 3.734783 32.465217 34.391304 34.278261 -0.117391 738.347826 61.391304 100.013043 23.500000 64.150725 4.0
2015-01-03 1.017391 0.717391 168.043478 3.773913 42.717391 36.247826 36.160870 -0.095652 739.443478 49.608696 100.769565 20.460870 55.652711 0.0
2015-01-04 1.000000 0.600000 159.958333 3.850000 49.150000 38.887500 38.666667 -0.225000 741.500000 31.541667 101.479167 13.012500 46.835258 0.0
2015-01-05 1.033333 0.441667 137.000000 4.000000 57.566667 42.995833 42.945833 -0.050000 742.533333 44.583333 101.004167 16.654167 52.420271 4.0
2015-01-06 0.781818 0.559091 114.727273 3.654545 42.868182 40.740909 41.095455 0.368182 740.904545 48.272727 100.577273 21.954545 67.318339 6.0
2015-01-07 0.973913 0.830435 110.826087 3.956522 30.817391 40.365217 40.595652 0.221739 739.865217 60.043478 100.195652 24.456522 72.347251 6.0
2015-01-08 0.983333 0.825000 156.500000 4.208333 32.670833 41.520833 41.366667 -0.129167 736.350000 69.583333 99.958333 22.275000 65.770725 10.0
2015-01-09 0.958333 0.729167 133.708333 3.379167 39.645833 42.279167 42.158333 -0.116667 735.204167 60.416667 100.041667 19.370833 59.085129 10.0
2015-01-10 0.966667 0.758333 164.500000 3.675000 37.345833 42.962500 42.775000 -0.200000 734.287500 41.500000 100.120833 14.658333 49.314653 0.0
# resample mean
dfr = df.resample('W').mean()
# add the resampled sum to dfr
dfr['mean'] = df['count'].resample('W').sum()
# reset index
dfr = dfr.reset_index()
# display(dfr)
Date WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count mean
0 2015-01-04 1.049230 0.782880 152.359601 3.707382 39.931069 35.063949 34.932699 -0.138678 739.529076 53.802083 100.503986 20.445426 59.986656 1.0 4.0
1 2015-01-11 0.949566 0.690615 136.210282 3.812261 40.152457 41.810743 41.822823 0.015681 738.190794 54.066590 100.316321 19.894900 61.042728 6.0 36.0
# plot dfr
fig, ax = plt.subplots(figsize=(16, 10))
sns.barplot(x='Date', y='count', data=dfr, ax=ax)
# configure the xaxis ticks from datetime to date
x_dates = dfr.Date.dt.strftime('%Y-%m-%d').sort_values().unique()
ax.set_xticklabels(labels=x_dates, rotation=90, ha='right')
plt.show()
I have two data frames that look like:
Temp [Degrees_C] Cond [mS/cm]
yyyy-mm-ddThh:mm:ss.sss
2020-01-28 03:00:59 14.553947 19.301285
2020-01-28 08:00:59 14.501740 19.310037
2020-01-28 13:00:59 14.425415 18.531609
2020-01-28 18:00:59 14.414717 16.155998
...
And this:
CONDUCTIVITY Temp [C]
DATE TIME
2020-01-28 03:00:00 18.240 15.761111
2020-01-28 04:00:00 18.147 15.722222
2020-01-28 05:00:00 17.930 15.722222
2020-01-28 06:00:00 17.873 15.666667
...
I want to create one plot from these two data sets. They should share the same x-axis (the date-time) and have two different y-axes (one for temperature and one for conductivity).
However, since the sampling is different for both of them, I'm not sure how to do it.
Any suggestions?
Thank you!!
To get the dates to match between the two datasets, I would use pd.DateOffset. Alternatively, you could use dt.round, but that could lead to some issues if you aren't sure the rounding will work.
To plot two y-axes, you are looking for ax.twinx
import pandas as pd
from io import StringIO
str1 = """
2020-01-28 03:00:59, 14.553947, 19.301285
2020-01-28 08:00:59, 14.501740, 19.310037
2020-01-28 13:00:59, 14.425415, 18.531609
2020-01-28 18:00:59, 14.414717, 16.155998
"""
str2="""
2020-01-28 03:00:00, 18.240, 15.761111
2020-01-28 04:00:00, 18.147, 15.722222
2020-01-28 05:00:00, 17.930, 15.722222
2020-01-28 06:00:00, 17.873, 15.666667
"""
# column order follows the posted frames: df1 is (temp, cond), df2 is (cond, temp)
colnames1 = ['date','temp','cond']
colnames2 = ['date','cond','temp']
df1 = pd.read_csv(StringIO(str1), header=None, names = colnames1, parse_dates=['date'])
df2 = pd.read_csv(StringIO(str2), header=None, names = colnames2, parse_dates=['date'])
#Offset to even seconds
df1.date = df1.date - pd.DateOffset(seconds=59)
#plot
ax = df1.plot(x='date',y='temp', label='df1', color='k', ls = '--')
#Create second y axis
ax_tw = ax.twinx()
df1.plot(x='date',y='cond', ax = ax_tw, label='df1', color='k', ls='-')
df2.plot(x='date',y='temp', ax= ax, label='df2', color='red', ls='--')
df2.plot(x='date',y='cond', ax = ax_tw,label='df2', color='red', ls ='-')
ax.legend()
which returns:
This matplotlib documentation webpage shows a simple example how to plot with double y axis: https://matplotlib.org/gallery/api/two_scales.html
You wrote that you have data frames, which sounds like you are using pandas for data handling. You could simply cast your data to numpy arrays and follow the example.
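A minimal sketch of that suggestion, assuming two frames with a datetime index shaped like the ones in the question (the frame and column names here are made up for illustration). Each series is converted with to_numpy() and drawn on its own y-axis; the different sampling does not matter because both share a real datetime x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical frames mirroring the question: different sampling, datetime index.
temp = pd.DataFrame(
    {"temp": [14.553947, 14.501740, 14.425415, 14.414717]},
    index=pd.to_datetime(["2020-01-28 03:00:59", "2020-01-28 08:00:59",
                          "2020-01-28 13:00:59", "2020-01-28 18:00:59"]),
)
cond = pd.DataFrame(
    {"cond": [18.240, 18.147, 17.930, 17.873]},
    index=pd.to_datetime(["2020-01-28 03:00:00", "2020-01-28 04:00:00",
                          "2020-01-28 05:00:00", "2020-01-28 06:00:00"]),
)

fig, ax_temp = plt.subplots()
# Left y-axis: temperature, plotted from plain numpy arrays.
ax_temp.plot(temp.index.to_numpy(), temp["temp"].to_numpy(), color="tab:red")
ax_temp.set_ylabel("Temp [C]", color="tab:red")

# Right y-axis: conductivity, sharing the same x-axis.
ax_cond = ax_temp.twinx()
ax_cond.plot(cond.index.to_numpy(), cond["cond"].to_numpy(), color="tab:blue")
ax_cond.set_ylabel("Cond [mS/cm]", color="tab:blue")

fig.autofmt_xdate()
```

In an interactive session, plt.show() (or fig.savefig with a file name of your choice) would display the result.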
I have the following data:
apple[0].head()
Out[76]:
Date Open High Low Close Adj Close Volume
0 1999-12-31 3.604911 3.674107 3.553571 3.671875 3.204494 40952800
1 2000-01-03 3.745536 4.017857 3.631696 3.997768 3.488905 133949200
2 2000-01-04 3.866071 3.950893 3.613839 3.660714 3.194754 128094400
3 2000-01-05 3.705357 3.948661 3.678571 3.714286 3.241507 194580400
4 2000-01-06 3.790179 3.821429 3.392857 3.392857 2.960991 191993200
and I am trying to plot prices (Close) on the y-axis, and Date on the x-axis.
If I write
plt.plot(apple[0]['Close'])
plt.title('AAPL Closing Prices')
plt.show()
it works but it plots numbers on the x-axis, while I would like to have the dates on the horizontal axis.
I tried
plt.plot(apple[0]['Date'],apple[0]['Close'])
plt.title('AAPL Closing Prices')
plt.show()
but it is not working. How can I make it work?
The type of apple[0]['Date'] is pandas.core.series.Series if that helps.
apple[0].plot(x = 'Date', y = 'Close')
gives me the following picture
doesn't show dates after 2015-11-24. How can I show more dates on the x-horizontal axis?
You can use just the DataFrame methods, like this.
In[14]: apple[0]
Out[14]:
Date Open High Low Close Adj Close Volume
0 1999-12-31 3.604911 3.674107 3.553571 3.671875 3.204494 40952800
1 2000-01-03 3.745536 4.017857 3.631696 3.997768 3.488905 133949200
2 2000-01-04 3.866071 3.950893 3.613839 3.660714 3.194754 128094400
3 2000-01-05 3.705357 3.948661 3.678571 3.714286 3.241507 194580400
4 2000-01-06 3.790179 3.821429 3.392857 3.392857 2.960991 191993200
apple[0].plot(x = 'Date', y = 'Close')
Here is a version that uses matplotlib explicitly:
import matplotlib.pyplot as plt
from matplotlib import dates
from matplotlib.ticker import MultipleLocator
plt.plot(df['Date'], df['Close'], label='Close')  # label so that plt.legend() has an entry
plt.legend()
ax = plt.gca().get_xaxis()
ax.set_major_locator(MultipleLocator(1))
ax.set_minor_locator(MultipleLocator(0.1))
ax.set_major_formatter(dates.DateFormatter('%Y-%b-%d'))
for item in ax.get_ticklabels():
    item.set_rotation(45)
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.xticks.html
You can add custom ticks/ticklabels for the axes, the matplotlib documentation is quite detailed. Have a look.
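One possible way to place custom date ticks, sketched with the five rows posted in the question (the variable names are illustrative, and putting one tick on every observation is only sensible for short series): convert the dates with matplotlib.dates.date2num, pass them to set_xticks, and let a DateFormatter render the labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The five sample rows from the question.
dates = pd.to_datetime(["1999-12-31", "2000-01-03", "2000-01-04",
                        "2000-01-05", "2000-01-06"])
close = [3.671875, 3.997768, 3.660714, 3.714286, 3.392857]

fig, ax = plt.subplots()
ax.plot(dates, close)

# Put one tick on every observation; for long series, pass a thinner
# selection of dates instead.
ax.set_xticks(mdates.date2num(dates.to_pydatetime()))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
ax.set_title("AAPL Closing Prices")
```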
Despite trying some solutions available on SO and at Matplotlib's documentation, I'm still unable to disable Matplotlib's creation of weekend dates on the x-axis.
As you can see below, it adds dates to the x-axis that are not in the original pandas column.
I'm plotting my data using (commented lines are unsuccessful in achieving my goal):
fig, ax1 = plt.subplots()
x_axis = df.index.values
ax1.plot(x_axis, df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(x_axis, df['R'], color='r')
# plt.xticks(np.arange(len(x_axis)), x_axis)
# fig.autofmt_xdate()
# ax1.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
fig.tight_layout()
plt.show()
An example of my Pandas dataframe is below, with dates as index:
2019-01-09 1.007042 2585.898714 4.052480e+09 19.980000 12.07 1
2019-01-10 1.007465 2581.828491 3.704500e+09 19.500000 19.74 1
2019-01-11 1.007154 2588.605258 3.434490e+09 18.190001 18.68 1
2019-01-14 1.008560 2582.151225 3.664450e+09 19.070000 14.27 1
Some suggestions I've found include a custom ticker (here and here). However, although I don't get errors, the plot is missing my second series.
Any suggestions on how to disable date interpolation in matplotlib?
The matplotlib site recommends creating a custom formatter class. This class will contain logic that tells the axis label not to display anything if the date is a weekend. Here's an example using a dataframe I constructed from the 2018 data that was in the image you'd attached:
df = pd.DataFrame(
data = {
"Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
"MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
"Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
"Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
"Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
"R": [0,0,0,0,0,1,1]
},
index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
'2018-01-12', '2018-01-15', '2018-01-16'])
Move the dates from the index to their own column:
df = df.reset_index().rename({'index': 'Date'}, axis=1, copy=False)
df['Date'] = pd.to_datetime(df['Date'])
Create the custom formatter class:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import Formatter
%config InlineBackend.figure_format = 'retina' # Get nicer looking graphs for retina displays
class CustomFormatter(Formatter):
    def __init__(self, dates, fmt='%Y-%m-%d'):
        self.dates = dates
        self.fmt = fmt
    def __call__(self, x, pos=0):
        'Return the label for time x at position pos'
        ind = int(np.round(x))
        if ind >= len(self.dates) or ind < 0:
            return ''
        return self.dates[ind].strftime(self.fmt)
Now let's plot the MP and R series. Pay attention to the line where we call the custom formatter:
formatter = CustomFormatter(df['Date'])
fig, ax1 = plt.subplots()
ax1.xaxis.set_major_formatter(formatter)
ax1.plot(np.arange(len(df)), df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(np.arange(len(df)), df['R'], color='r')
fig.autofmt_xdate()
fig.tight_layout()
plt.show()
The above code outputs this graph:
Now, no weekend dates, such as 2018-01-13, are displayed on the x-axis.
If you would simply like to not show the weekends, but still have the graph scale correctly, matplotlib has built-in functionality for this in matplotlib.dates. Specifically, the WeekdayLocator pretty much solves this problem single-handedly. It's a one-line solution (the rest just fabricates data for testing). Note that this works whether or not the data includes weekends:
import matplotlib.pyplot as plt
import datetime
import numpy as np
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU
DT_FORMAT="%Y-%m-%d"
if __name__ == "__main__":
    N = 14
    #Fake data
    x = list(zip([2018]*N, [5]*N, list(range(1,N+1))))
    x = [datetime.datetime(*y) for y in x]
    x = [y for y in x if y.weekday() < 5]
    random_walk_steps = 2 * np.random.randint(0, 6, len(x)) - 3
    random_walk = np.cumsum(random_walk_steps)
    y = np.arange(len(x)) + random_walk
    # Make a figure and plot everything
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ### HERE IS THE BIT THAT ANSWERS THE QUESTION
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter(DT_FORMAT))
    # plot stuff
    fig.autofmt_xdate()
    plt.tight_layout()
    plt.show()
If you are trying to avoid the fact that matplotlib is interpolating between each point of your dataset, you can exploit the fact that matplotlib starts a new line segment each time an np.nan is encountered. Pandas makes it easy to insert np.nan for the days that are not in your dataset with pd.DataFrame.asfreq():
df = pd.DataFrame(data = {
"Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
"MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
"Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
"Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
"Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
"R": [0,0,0,0,0,1,1]
},
index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15', '2018-01-16'])
df.index = pd.to_datetime(df.index)
# rescale R so I don't need to worry about twinx
df.loc[df.R==0, 'R'] = df.loc[df.R==0, 'R'] + df.MP.min()
df.loc[df.R==1, 'R'] = df.loc[df.R==1, 'R'] * df.MP.max()
df = df.asfreq('D')
df
Col 1 MP Col 3 Col 4 Col 5 R
2018-01-08 1.000325 2743.002071 3.242650e+09 9.52 5.04 2743.002071
2018-01-09 1.000807 2754.011543 3.453480e+09 10.08 5.62 2743.002071
2018-01-10 1.001207 2746.121450 3.576350e+09 9.82 5.29 2743.002071
2018-01-11 1.000355 2760.169848 3.641320e+09 9.88 6.58 2743.002071
2018-01-12 1.001512 2780.756857 3.573970e+09 10.16 8.32 2743.002071
2018-01-13 NaN NaN NaN NaN NaN NaN
2018-01-14 NaN NaN NaN NaN NaN NaN
2018-01-15 1.003237 2793.953050 3.573970e+09 10.16 9.57 2793.953050
2018-01-16 1.000979 2792.675162 4.325970e+09 11.66 9.53 2793.953050
df[['MP', 'R']].plot(); plt.show()
I'm using matplotlib.pyplot (plt) to plot a graph of temperature against time. I want the xticks to be at 12am and 12pm only, but matplotlib automatically picks where the xticks go.
How can I pick exactly 12am and 12pm for each date as the data I'm using doesn't have those data points? Basically I'm wanting an xtick on a point between two data points / on the line between two data points.
Below is a snippet of my data and my function.
Temp UNIX Time Time
5.04 1490562000 2017-03-26 22:00:00
3.21 1490572800 2017-03-27 01:00:00
2.15 1490583600 2017-03-27 04:00:00
1.66 1490594400 2017-03-27 07:00:00
6.92 1490605200 2017-03-27 10:00:00
11.73 1490616000 2017-03-27 13:00:00
13.77 1490626800 2017-03-27 16:00:00
def ploting_graph(self):
    # md is matplotlib.dates (import matplotlib.dates as md)
    ax=plt.gca()
    xfmt = md.DateFormatter('%d/%m/%y %p')
    ax.xaxis.set_major_formatter(xfmt)
    plt.plot(self.df['Time'],self.df['Temp'], 'k--^')
    plt.xticks(rotation=10)
    plt.grid(axis='both',color='r')
    plt.ylabel("Temp (DegC)")
    plt.xlim(min(self.df['Time']),max(self.df['Time']))
    plt.ylim(min(self.df['Temp'])-1,max(self.df['Temp'])+1)
    plt.title("Forecast Temperature {}".format(self.location))
    plt.savefig('{}_day_forecast_{}.png'.format(self.num_of_days,self.location),dpi=300)
As hopefully you can see I'm looking for an xtick on the 2017-03-27 00:00:00 and 12:00:00.
Any help would be greatly appreciated, and even a push in the right direction would be fantastic! Thank you very much in advance!
John.
You need a locator and a formatter. The locator determines that you only want ticks every 12 hours, while the formatter determines how the datetime should look.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
dates = ["2017-03-26 22:00:00","2017-03-27 01:00:00","2017-03-27 04:00:00","2017-03-27 07:00:00",
"2017-03-27 10:00:00","2017-03-27 13:00:00","2017-03-27 16:00:00"]
temps = [5.04, 3.21, 2.15,1.66, 6.92, 11.73, 13.77 ]
df = pd.DataFrame({"Time":dates, "Temp":temps})
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S")
fig, ax = plt.subplots()
xfmt = matplotlib.dates.DateFormatter('%d/%m/%y 12%p')
ax.xaxis.set_major_formatter(xfmt)
locator = matplotlib.dates.HourLocator(byhour=[0,12])
ax.xaxis.set_major_locator(locator)
plt.plot(df['Time'],df['Temp'], 'k--^')
plt.xticks(rotation=10)
plt.grid(axis='both',color='r')
plt.ylabel("Temp (DegC)")
plt.xlim(min(df['Time']),max(df['Time']))
plt.ylim(min(df['Temp'])-1,max(df['Temp'])+1)
plt.title("Forecast Temperature")
plt.show()
Try playing with the xticks function on plot. Below is a quick example I tried.
import matplotlib.pyplot as plt
x = [2,3,4,5,6,14,16,18,24,22,20]
y = [1, 4, 9, 6,5,6,7,8,9,5,7]
labels = ['12Am', '12Pm']
plt.plot(x, y, 'ro')
# 12 and 24 are the locations on the x-axis for the labels
plt.xticks((12,24),('12Am', '12Pm'))
plt.margins(0.2)
plt.show()
A little info: I'm very new to programming and this is a small part of my first script. The goal of this particular segment is to display a seaborn heatmap with vertical depth on the y-axis, time on the x-axis, and the intensity of a scientific measurement as the heat value.
I'd like to apologize if this has been answered elsewhere, but my searching abilities must have failed me.
sns.set()
nametag = 'Well_4_all_depths_capf'
Dp = D[D.well == 'well4']
print(Dp.date)
heat = Dp.pivot(index="depth", columns="date", values="capf")
### depth, date and capf are all columns of a pandas dataframe
plt.title(nametag)
sns.heatmap(heat, linewidths=.25)
plt.savefig('%s%s.png' % (pathheatcapf, nametag), dpi = 600)
This is what prints from print(Dp.date),
so I'm pretty sure the dataframe holds the dates in the format I want, particularly Year, day, month.
0 2016-08-09
1 2016-08-09
2 2016-08-09
3 2016-08-09
4 2016-08-09
5 2016-08-09
6 2016-08-09
...
But, when I run it the date axis always prints with blank times (00:00 etc) that I don't want.
Is there a way to remove these from the date axis?
Is the problem that in a cell above I used this function to scan the file name and make a column with the date??? Is it wrong to use datetime instead of just a date function?
D['date']=pd.to_datetime(['%s-%s-%s' %(f[0:4],f[4:6],f[6:8]) for f in
D['filename']])
You have to apply strftime to the date series of your dataframe to get the xtick labels to display correctly:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import random
dates = [datetime.today() - timedelta(days=x * random.getrandbits(1)) for x in range(25)]
df = pd.DataFrame({'depth': [0.1,0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001, 0.1, 0.05, 0.01, 0.005, 0.001],\
'date': dates,\
'value': [-4.1808639999999997, -9.1753490000000006, -11.408113999999999, -10.50245, -8.0274750000000008, -0.72260200000000008, -6.9963940000000004, -10.536339999999999, -9.5440649999999998, -7.1964070000000007, -0.39225599999999999, -6.6216390000000001, -9.5518009999999993, -9.2924690000000005, -6.7605589999999998, -0.65214700000000003, -6.8852289999999989, -9.4557760000000002, -8.9364629999999998, -6.4736289999999999, -0.96481800000000006, -6.051482, -9.7846860000000007, -8.5710630000000005, -6.1461209999999999]})
pivot = df.pivot(index='depth', columns='date', values='value')
sns.set()
ax = sns.heatmap(pivot)
ax.set_xticklabels(df['date'].dt.strftime('%d-%m-%Y'))
plt.xticks(rotation=-90)
plt.show()
Example with standard heatmap datetime labels
import numpy as np
import pandas as pd
import seaborn as sns
dates = pd.date_range('2019-01-01', '2020-12-01')
df = pd.DataFrame(np.random.randint(0, 100, size=(len(dates), 4)), index=dates)
sns.heatmap(df)
We can create some helper classes/functions to get to some better looking labels and placement. AxTransformer enables conversion from data coordinates to tick locations, set_date_ticks allows custom date ranges to be applied to plots.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections.abc import Iterable
from sklearn import linear_model
class AxTransformer:
    def __init__(self, datetime_vals=False):
        self.datetime_vals = datetime_vals
        self.lr = linear_model.LinearRegression()
        return
    def process_tick_vals(self, tick_vals):
        if not isinstance(tick_vals, Iterable) or isinstance(tick_vals, str):
            tick_vals = [tick_vals]
        if self.datetime_vals == True:
            tick_vals = pd.to_datetime(tick_vals).astype(int).values
        tick_vals = np.array(tick_vals)
        return tick_vals
    def fit(self, ax, axis='x'):
        axis = getattr(ax, f'get_{axis}axis')()
        tick_locs = axis.get_ticklocs()
        tick_vals = self.process_tick_vals([label._text for label in axis.get_ticklabels()])
        self.lr.fit(tick_vals.reshape(-1, 1), tick_locs)
        return
    def transform(self, tick_vals):
        tick_vals = self.process_tick_vals(tick_vals)
        tick_locs = self.lr.predict(np.array(tick_vals).reshape(-1, 1))
        return tick_locs
def set_date_ticks(ax, start_date, end_date, axis='y', date_format='%Y-%m-%d', **date_range_kwargs):
    dt_rng = pd.date_range(start_date, end_date, **date_range_kwargs)
    ax_transformer = AxTransformer(datetime_vals=True)
    ax_transformer.fit(ax, axis=axis)
    getattr(ax, f'set_{axis}ticks')(ax_transformer.transform(dt_rng))
    getattr(ax, f'set_{axis}ticklabels')(dt_rng.strftime(date_format))
    ax.tick_params(axis=axis, which='both', bottom=True, top=False, labelbottom=True)
    return ax
These provide us a lot of flexibility, e.g.
fig, ax = plt.subplots(dpi=150)
sns.heatmap(df, ax=ax)
set_date_ticks(ax, '2019-01-01', '2020-12-01', freq='3MS')
or if you really want to get weird you can do stuff like
fig, ax = plt.subplots(dpi=150)
sns.heatmap(df, ax=ax)
set_date_ticks(ax, '2019-06-01', '2020-06-01', freq='2MS', date_format='%b `%y')
For your specific example you'll have to pass axis='x' to set_date_ticks
First, the 'date' column must be converted to a datetime dtype with pandas.to_datetime
If the desired result is to only have the dates (without time), then the easiest solution is to use the .dt accessor to extract the .date component. Alternatively, use dt.strftime to set a specific string format.
strftime() and strptime() Format Codes
df.date.dt.strftime('%H:%M') would extract hours and minutes into a string like '14:29'
In the example below, the extracted date is assigned to the same column, but the value can also be assigned as a new column.
pandas.DataFrame.pivot_table is used to apply an aggregation function when there are multiple values in a column for each index; pandas.DataFrame.pivot should be used if there is only a single value.
This is better than .groupby because the dataframe is correctly shaped to be easily plotted.
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3, seaborn 0.11.2
import pandas as pd
import numpy as np
import seaborn as sns
# create sample data
dates = [f'2016-08-{d:02d}T00:00:00.000000000' for d in range(9, 26, 2)] + ['2016-09-09T00:00:00.000000000']  # zero-pad the day so every string is a valid ISO date
depths = np.arange(1.25, 5.80, 0.25)
np.random.seed(365)
p1 = np.random.dirichlet(np.ones(10), size=1)[0] # random probabilities for random.choice
p2 = np.random.dirichlet(np.ones(19), size=1)[0] # random probabilities for random.choice
data = {'date': np.random.choice(dates, size=1000, p=p1), 'depth': np.random.choice(depths, size=1000, p=p2), 'capf': np.random.normal(0.3, 0.05, size=1000)}
df = pd.DataFrame(data)
# display(df.head())
date depth capf
0 2016-08-19T00:00:00.000000000 4.75 0.339233
1 2016-08-19T00:00:00.000000000 3.00 0.370395
2 2016-08-21T00:00:00.000000000 5.75 0.332895
3 2016-08-23T00:00:00.000000000 1.75 0.237543
4 2016-08-23T00:00:00.000000000 5.75 0.272067
# make sure the date column is converted to a datetime dtype
df.date = pd.to_datetime(df.date)
# extract only the date component of the date column
df.date = df.date.dt.date
# reshape the data for heatmap; if there's no need to aggregate a function, then use .pivot(...)
dfp = df.pivot_table(index='depth', columns='date', values='capf', aggfunc='mean')
# display(dfp.head())
date 2016-08-09 2016-08-11 2016-08-13 2016-08-15 2016-08-17 2016-08-19 2016-08-21 2016-08-23 2016-08-25 2016-09-09
depth
1.50 0.334661 NaN NaN 0.302670 0.314186 0.325257 0.313645 0.263135 NaN NaN
1.75 0.305488 0.303005 0.410124 0.299095 0.313899 0.280732 0.275758 0.260641 NaN 0.318099
2.00 0.322312 0.274105 NaN 0.319606 0.268984 0.368449 0.311517 0.309923 NaN 0.306162
2.25 0.289959 0.315081 NaN 0.302202 0.306286 0.339809 0.292546 0.314225 0.263875 NaN
2.50 0.314227 0.296968 NaN 0.312705 0.333797 0.299556 0.327187 0.326958 NaN NaN
# plot
sns.heatmap(dfp, cmap='GnBu')
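To make the difference between the two .dt options mentioned in this answer concrete, here is a small sketch on a made-up two-row Series. Both drop the 00:00:00 part; .dt.date gives datetime.date objects, while .dt.strftime gives plain strings in whatever format you request.

```python
import pandas as pd

# Made-up sample: two midnight timestamps like those in the question.
s = pd.Series(pd.to_datetime(["2016-08-09", "2016-08-11"]))

as_date = s.dt.date                  # datetime.date objects (object dtype)
as_text = s.dt.strftime("%d-%m-%Y")  # strings in a chosen format

print(as_date.tolist())
print(as_text.tolist())
```

Either result can be assigned back to the date column before pivoting, in which case the heatmap labels are the dates or strings themselves.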
I had a similar problem, but the date was the index. I've just converted the date to string (pandas 1.0) before plotting and it worked for me.
heat['date'] = heat.date.astype('string')
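When the dates end up as the column labels of the pivoted frame rather than as a regular column, a similar conversion can be applied to the labels themselves. The sketch below uses a tiny made-up frame and assumes the labels form a DatetimeIndex, which is what pivot produces from a datetime column.

```python
import pandas as pd

# Tiny made-up frame with the same column names as the question.
df = pd.DataFrame({
    "depth": [1.0, 1.0, 2.0, 2.0],
    "date": pd.to_datetime(["2016-08-09", "2016-08-10",
                            "2016-08-09", "2016-08-10"]),
    "capf": [0.1, 0.2, 0.3, 0.4],
})

heat = df.pivot(index="depth", columns="date", values="capf")

# Replace the datetime column labels with date-only strings before plotting.
heat.columns = heat.columns.strftime("%Y-%m-%d")
print(heat)
```

Passing this frame to sns.heatmap should then label the x-axis with the strings as they are.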