I have a time series of monthly data like this, and plot it like this:
rng = pd.date_range('1965-01-01', periods=600, freq='M')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
fig, ax = plt.subplots()
ts.plot(ax=ax)
The major tick marks are set every 10 years, beginning in 1969. I'd like to change this so they start in 1975. After looking at some matplotlib samples (here and here) I tried adding
from matplotlib.dates import YearLocator, DateFormatter
decs = YearLocator(10) # decades
decsFmt = DateFormatter("%Y")
ax.xaxis.set_major_locator(decs)
ax.xaxis.set_major_formatter(decsFmt)
datemin = pd.Timestamp(ts.index.min().year, 1, 1)
datemax = pd.Timestamp(ts.index.max().year + 1, 1, 1)
ax.set_xlim(datemin, datemax)
but this doesn't work.
If you want to use matplotlib to set axis limits you will need to turn off pandas' date formatting.
Just change the line to
ts.plot(x_compat=True, ax=ax)
and it should work.
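For ticks that start in 1975 specifically, here is a minimal sketch that sets the tick positions by hand. As far as I know, YearLocator(10) ticks on years that are multiples of its base (1970, 1980, ...), so it cannot be offset to 1975 on its own. The tick_years list, the 'MS' frequency and the axis limits are my own choices for illustration, not part of the original question.

```python
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Monthly sample data, 50 years starting in 1965 ('MS' = month start, an assumption)
rng = pd.date_range('1965-01-01', periods=600, freq='MS')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

fig, ax = plt.subplots()
# x_compat=True makes pandas use plain matplotlib date units on the x-axis
ts.plot(ax=ax, x_compat=True)

# Explicit tick positions: 1 January of 1975, 1985, ..., 2015
tick_years = list(range(1975, 2016, 10))
ax.set_xticks(mdates.date2num([datetime(y, 1, 1) for y in tick_years]))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.set_xlim(datetime(1965, 1, 1), datetime(2015, 1, 1))
```

Call plt.show() afterwards to display the figure.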
I am trying to create a heat map from a pandas DataFrame using the seaborn library. Here is the code:
test_df = pd.DataFrame(np.random.randn(367, 5),
                       index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
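A seaborn heatmap is a categorical plot. It scales from 0 to number of columns - 1, in this case from 0 to 366. The datetime locators and formatters expect values as dates (or more precisely, numbers that correspond to dates). For the year in question, with matplotlib's historical epoch of 0001-01-01 (the default before version 3.3), that would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001); with the current default epoch of 1970-01-01 they are 10957 and 11323. Either way, they are far outside the range the heatmap axis covers.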
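To see the mismatch concretely, here is a small sketch that converts the two end dates to matplotlib date numbers and compares them with the positions a 367-column heatmap would use. The variable heatmap_positions is only an illustration of the categorical axis, not something seaborn exposes.

```python
from datetime import datetime

import matplotlib.dates as mdates

# Date numbers are days since matplotlib's epoch (the exact values depend on the epoch)
start = mdates.date2num(datetime(2000, 1, 1))
stop = mdates.date2num(datetime(2001, 1, 1))
print(start, stop, stop - start)

# Illustrative stand-in for the categorical positions of a 367-column heatmap
heatmap_positions = range(367)
print(max(heatmap_positions) < start)
```

Whatever the epoch, the two dates are 366 days apart (2000 is a leap year), and both lie well beyond the largest heatmap position, so a date locator finds no tick inside the visible range.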
So in order to use matplotlib.dates formatters and locators, you first need to convert your dataframe index to datetime objects. You then cannot use a heatmap, but need a plot that allows for numerical axes, e.g. an imshow plot. You can set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
I found this question when trying to do a similar thing and you can hack together a solution but it's not very pretty.
For example I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to be blank.
This gives me year labels in the correct position.
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()
    if text[5:7] == '01':
        label.set_text(text[0:4])
    else:
        label.set_text('')
ax.set_xticklabels(xticklabels)
Hopefully from that you can figure out what you want to do.
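The string logic in that loop can be tried out on its own. This is just a sketch: year_or_blank is a hypothetical helper name, and it assumes the label text starts with 'YYYY-MM'. Note also that, as far as I know, tick label text is only filled in once the figure has been drawn, so get_text() may return empty strings before that.

```python
def year_or_blank(text):
    """Return the year for January labels, and an empty string otherwise."""
    return text[0:4] if text[5:7] == '01' else ''

# Hypothetical label texts in 'YYYY-MM' form
labels = ['2003-11', '2003-12', '2004-01', '2004-02']
new_labels = [year_or_blank(t) for t in labels]
print(new_labels)
```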
I created a chart where you can see the visualized data and the trend of the data.
Is it possible to cut the chart on a timespan?
This is my code for the chart
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
ax.grid(True)
year = mdates.YearLocator(month=1)
month = mdates.MonthLocator(interval=3)
year_format = mdates.DateFormatter('%Y')
month_format = mdates.DateFormatter('%m')
ax.xaxis.set_minor_locator(month)
ax.xaxis.grid(True, which = 'minor')
ax.xaxis.set_major_locator(year)
ax.xaxis.set_major_formatter(year_format)
plt.plot(df.index, df['JAN'], c='blue')
plt.plot(decomposition.trend.index, decomposition.trend, c='red')
I had this code to shorten the chart but I couldn't figure out how to use it in the code above.
start_date = datetime(2004,1,1)
end_date = datetime(2008,1,1)
df[(start_date<=df.index) & (df.index<=end_date)].plot(grid='on')
You can use plt.xlim to restrict the x-axis to a date range:
plt.xlim([datetime(2004, 1, 1), datetime(2008, 1, 1)])
This limits the x-axis to the span from January 2004 to January 2008 without changing the plotted data.
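Here is a self-contained sketch of the same idea using the Axes method ax.set_xlim instead of plt.xlim. The data is made up (a random walk in a column named 'JAN', as in the question), since the original df and decomposition are not available.

```python
from datetime import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up monthly data standing in for the question's df
idx = pd.date_range('2000-01-01', '2011-12-01', freq='MS')
df = pd.DataFrame({'JAN': np.random.randn(len(idx)).cumsum()}, index=idx)

fig, ax = plt.subplots()
ax.grid(True)
ax.xaxis.set_major_locator(mdates.YearLocator(month=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.plot(df.index, df['JAN'], c='blue')

# Cut the visible range to 2004-2008; the data itself is untouched
ax.set_xlim(datetime(2004, 1, 1), datetime(2008, 1, 1))
lo, hi = ax.get_xlim()
```

Setting the limits after plotting keeps later plot calls from autoscaling the axis back to the full range.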
I am trying to use the following code to set the x-ticks to [Jan., Feb., ...]
import matplotlib.pyplot as plt
from matplotlib.dates import MonthLocator, DateFormatter
fig = plt.figure(figsize=[10, 5])
ax = fig.add_subplot(111)
ax.plot(np.arange(1000))
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b'))
I get the following figure, without x-ticks
I'm wondering why all x-ticks disappeared? I wrote the above code with reference to this implementation
Many thanks.
It is not clear what type of data you currently have, but here are my suggestions for plotting the months on the x-axis:
Convert your dates using pd.to_datetime.
Set the result as your DataFrame index.
Explicitly call plt.xticks() (there is no plt.set_xticks(); the Axes equivalent is ax.set_xticks()).
Below one example with re-created data:
from datetime import datetime as dt
from datetime import timedelta
import numpy as np
import pandas as pd
### create sample data
your_df = pd.DataFrame()
your_df['vals'] = np.arange(1000)
## make sure your datetime is considered as such by pandas
your_df['date'] = pd.to_datetime([dt.today()+timedelta(days=x) for x in range(1000)])
your_df= your_df.set_index('date') ## set it as index
### plot it
fig = plt.figure(figsize=[10, 5])
ax = fig.add_subplot(111)
ax.plot(your_df['vals'])
plt.xticks(rotation='vertical')
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b'))
Note that if you do not want every month plotted, you can let matplotlib handle that for you, by removing the major locator.
fig = plt.figure(figsize=[10, 5])
ax = fig.add_subplot(111)
ax.plot(your_df['vals'])
plt.xticks(rotation='vertical')
# ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b'))
Added: I went into the link provided, and the dataset used (boulder-precip.csv) does have a DATE field. You can follow the same procedure and have it plotted on a monthly basis:
df = pd.read_csv('boulder-precip.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df = df.set_index('DATE')
fig = plt.figure(figsize=[10, 5])
ax = fig.add_subplot(111)
ax.plot(df['PRECIP'])
plt.xticks(rotation='vertical')
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b'))
My dataframe is like this-
   Energy_MWh    Month
0    39686.82  1979-01
1    35388.78  1979-02
2    50134.02  1979-03
3    37499.22  1979-04
4    20104.08  1979-05
5    17440.26  1979-06
It goes on like this to the month 2015-12. So you can imagine all the data.
I want to plot a continuous graph with the months as the x-axis and the Energy_MWh as the y-axis. How to best represent this using matplotlib?
I would also like to know for my knowledge if there's a way to print 1979-01 as Jan-1979 on the x-axis and so on. Probably a lambda function or something while plotting.
Borrowed liberally from this answer, which you should go out and upvote:
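import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
# First rows from the question; substitute your full DataFrame here
df = pd.DataFrame({'Energy_MWh': [39686.82, 35388.78, 50134.02],
                   'Month': ['1979-01', '1979-02', '1979-03']})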
myDates = pd.to_datetime(df['Month'])
myValues = df['Energy_MWh']
fig, ax = plt.subplots()
ax.plot(myDates,myValues)
myFmt = DateFormatter("%b-%Y")
ax.xaxis.set_major_formatter(myFmt)
## Rotate date labels automatically
fig.autofmt_xdate()
plt.show()
Set Month as the index:
df.set_index('Month', inplace=True)
Convert the index to Datetime:
df.index = pd.DatetimeIndex(df.index)
Plot:
df.plot()
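If you also want the Jan-1979 style labels with df.plot(), one possible sketch is to pass x_compat=True so that pandas leaves date formatting to matplotlib, and then set a DateFormatter. The DataFrame below just reuses the six rows shown in the question; sample_label is only there to show what the formatter produces.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, date2num

# The six rows shown in the question
df = pd.DataFrame({
    'Energy_MWh': [39686.82, 35388.78, 50134.02, 37499.22, 20104.08, 17440.26],
    'Month': ['1979-01', '1979-02', '1979-03', '1979-04', '1979-05', '1979-06'],
})
df = df.set_index('Month')
df.index = pd.DatetimeIndex(df.index)

# x_compat=True turns off pandas' own date tick formatting
ax = df.plot(x_compat=True)
fmt = DateFormatter('%b-%Y')
ax.xaxis.set_major_formatter(fmt)

# What the formatter returns for the first data point
sample_label = fmt(date2num(df.index[0].to_pydatetime()))
print(sample_label)
```

The month abbreviation from %b depends on the active locale.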
I have a DataFrame which is structurally similar to the following:
from datetime import datetime
import pandas as pd
from mpu.datetime import generate # pip install mpu
mind, maxd = datetime(2018, 1, 1), datetime(2018, 12, 30)
df = pd.DataFrame({'datetime': [generate(mind, maxd) for _ in range(10)]})
I want to understand how this data is distributed over hours of the day and days of the week. I can get them via:
df['weekday'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour
And finally I have the plot:
ax = df.groupby(['weekday', 'hour'])['datetime'].count().plot(kind='line', color='blue')
ax.set_ylabel("#")
ax.set_xlabel("time")
plt.show()
which gives me:
But you can notice that it is hard to distinguish the weekdays and the hours are not even noticeable. How can I get two-level labels similar to the following?
If you assume that every possible weekday and hour actually appears in the data, the axis units will simply be hours, with Monday midnight being 0, and Sunday 23h being 24*7-1 = 167.
You can then tick every 24 hours with major ticks and label every noon with the respective day of the week.
import numpy as np; np.random.seed(42)
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FuncFormatter, NullFormatter
# Generate example data
N = 5030
delta = (dt.datetime(2019, 1, 1) - dt.datetime(2018, 1, 1)).total_seconds()
df = pd.DataFrame({'datetime': np.datetime64("2018-01-01", "s") +
                   (delta*np.random.rand(N)).astype("timedelta64[s]")})
# Group the data
df['weekday'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour
counts = df.groupby(['weekday', 'hour'])['datetime'].count()
ax = counts.plot(kind='line', color='blue')
ax.set_ylabel("#")
ax.set_xlabel("time")
ax.grid()
# Now we assume that there is data for every hour and day present
assert len(counts) == 7*24
# Hence we can tick the axis with multiples of 24h
ax.xaxis.set_major_locator(MultipleLocator(24))
ax.xaxis.set_minor_locator(MultipleLocator(1))
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
def tick(x, pos):
    if x % 24 == 12:
        return days[int(x)//24]
    else:
        return ""
ax.xaxis.set_major_formatter(NullFormatter())
ax.xaxis.set_minor_formatter(FuncFormatter(tick))
ax.tick_params(which="major", axis="x", length=10, width=1.5)
plt.show()
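The assert in the code above fails when some weekday/hour combination never occurs in the data. One way to make the assumption hold, sketched here with a tiny made-up set of timestamps, is to reindex the grouped counts onto the full 7 x 24 grid and fill the gaps with zero. The name full_index is my own.

```python
import pandas as pd

# Tiny made-up sample: 2018-01-01 is a Monday, 2018-01-07 a Sunday
stamps = pd.to_datetime(['2018-01-01 00:15', '2018-01-01 00:45',
                         '2018-01-03 12:30', '2018-01-07 23:59'])
df = pd.DataFrame({'datetime': stamps})
df['weekday'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour
counts = df.groupby(['weekday', 'hour'])['datetime'].count()

# Every (weekday, hour) pair, so that position = weekday*24 + hour
full_index = pd.MultiIndex.from_product([range(7), range(24)],
                                        names=['weekday', 'hour'])
counts = counts.reindex(full_index, fill_value=0)
print(len(counts))
```

After this step the series always has 168 entries, so the MultipleLocator(24) ticks line up with day boundaries even for sparse data.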
It is not exactly the visualization you mentioned, but an idea would be to unstack your pandas time series and then plot.
df.groupby(['weekday', 'hour'])['datetime'].count().unstack(level=0).plot()
With the data from the code in your question, the result is:
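To make clear what unstack does to the grouped counts, here is a small sketch on made-up timestamps. Passing fill_value=0 is my addition, so that missing combinations show up as zero instead of NaN.

```python
import pandas as pd

# Made-up timestamps: two on Monday 00h, one Wednesday 12h, one Sunday 23h
stamps = pd.to_datetime(['2018-01-01 00:15', '2018-01-01 00:45',
                         '2018-01-03 12:30', '2018-01-07 23:59'])
df = pd.DataFrame({'datetime': stamps})
df['weekday'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour

# Rows become hours, columns become weekdays (level 0 is moved to the columns)
wide = (df.groupby(['weekday', 'hour'])['datetime']
          .count()
          .unstack(level=0, fill_value=0))
print(wide)
```

Calling wide.plot() then draws one line per weekday with the hour on the x-axis.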
I was not able to test it with your dataset, and pandas datetime is sometimes difficult with matplotlib datetime. But the idea is to set major and minor ticks and define their grid qualities separately:
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
#create sample data and plot it
from io import StringIO
data = StringIO("""
X,A,B
2018-11-21T12:04:20,1,8
2018-11-21T18:14:17,6,7
2018-11-22T02:18:21,8,14
2018-11-22T12:31:54,7,8
2018-11-22T20:33:20,5,5
2018-11-23T12:23:12,13,2
2018-11-23T21:31:05,7,12
""")
df = pd.read_csv(data, parse_dates = True, index_col = "X")
ax=df.plot()
#format major locator
ax.xaxis.set_major_locator(mdates.DayLocator())
#format minor locator with specific hours
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour = [8, 12, 18]))
#label major ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter('%a %d %m'))
#label minor ticks
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%H:00"))
#set grid for major ticks
ax.grid(which = "major", axis = "x", linestyle = "-", linewidth = 2)
#set grid for minor ticks with different properties
ax.grid(which = "minor", axis = "x", linestyle = "--", linewidth = 1)
plt.show()