Histogram in matplotlib, time on x-Axis - python

I am new to matplotlib (1.3.1-2) and I cannot find a decent place to start.
I want to plot the distribution of points over time in a histogram with matplotlib.
Basically I want to plot the cumulative sum of the occurrence of a date.
date
2011-12-13
2011-12-13
2013-11-01
2013-11-01
2013-06-04
2013-06-04
2014-01-01
...
That would make
2011-12-13 -> 2 times
2013-11-01 -> 3 times
2013-06-04 -> 2 times
2014-01-01 -> once
Since there will be many points over many years, I want to set the start date and the end date on my x-axis, then mark ticks at regular time steps (e.g. 1-year steps), and finally decide how many bins there will be.
How would I achieve that?

Matplotlib uses its own numeric format for dates and times, and the dates module provides simple functions to convert to and from it. The module also provides various Locators and Formatters that take care of placing the ticks on the axis and formatting the corresponding labels. This should get you started:
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# generate some random data (approximately over 5 years)
data = [float(random.randint(1271517521, 1429197513)) for _ in range(1000)]
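# convert the epoch format to matplotlib date format
# (epoch2num was deprecated and later removed in newer Matplotlib releases;
# there, convert each timestamp to a datetime and pass it to mdates.date2num)
mpl_data = mdates.epoch2num(data)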
# plot it
fig, ax = plt.subplots(1,1)
ax.hist(mpl_data, bins=50, color='lightblue')
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d.%m.%y'))
plt.show()
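As a small illustration of the conversion functions mentioned in this answer, the sketch below round-trips a date through date2num and num2date. The two sample dates are taken from the question. Note that the absolute number depends on Matplotlib's configured epoch (the default changed in version 3.3), so it is safer to rely on differences than on hard-coded values.

```python
from datetime import datetime

import matplotlib.dates as mdates

d1 = datetime(2013, 11, 1)
d2 = datetime(2014, 1, 1)

# date2num returns a float counting days since Matplotlib's epoch
n1 = mdates.date2num(d1)
n2 = mdates.date2num(d2)

# differences are plain day counts, whatever the epoch is
days_between = n2 - n1

# num2date converts back; the result is timezone-aware (UTC by default)
back = mdates.num2date(n1)
print(days_between, back)
```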

To add to hitzg's answer, you can use AutoDateLocator and AutoDateFormatter to have matplotlib do the location and formatting for you:
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))
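Putting the pieces from the question together (fixed start and end dates, one tick per year, a chosen number of bins) might look like the sketch below. The random dates, the 2011 to 2015 window, and the bin count of 48 are assumptions made up for illustration; only hist, date2num, YearLocator and DateFormatter come from Matplotlib itself.

```python
import random
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

random.seed(0)
start = datetime(2011, 1, 1)   # assumed start of the x-axis
end = datetime(2015, 1, 1)     # assumed end of the x-axis
span_days = (end - start).days
dates = [start + timedelta(days=random.randrange(span_days)) for _ in range(500)]

n_bins = 48                    # assumed: roughly one bin per month
x_min, x_max = mdates.date2num(start), mdates.date2num(end)

fig, ax = plt.subplots()
# range= fixes the bin edges to the chosen window instead of the data extent
counts, edges, _ = ax.hist(mdates.date2num(dates), bins=n_bins, range=(x_min, x_max))
ax.set_xlim(x_min, x_max)
ax.xaxis.set_major_locator(mdates.YearLocator())          # one tick per year
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
# plt.show()  # uncomment when using an interactive backend
```

Passing cumulative=True to ax.hist would draw the running total instead of per-bin counts.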

Here is a more modern solution for matplotlib version 3.5.3.
Also, it explicitly specifies the min/max date instead of relying on min/max values derived from the data.
import random
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
days = 365*3
start_date = datetime.now()
random_dates = [
    start_date + timedelta(days=int(random.random()*days))
    for _ in range(100)
]
end_date = start_date + timedelta(days=days)
fig, ax = plt.subplots(figsize=(5,3))
n, bins, patches = ax.hist(random_dates, bins=52, range=(start_date, end_date))
fig.autofmt_xdate()
plt.show()
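The per-date counts and the cumulative sum the question describes can also be computed directly, without a histogram. The sketch below uses only the standard library and only the seven rows visible in the question (the rows hidden behind the "..." are not guessed), so the counts reflect just that sample.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# the rows visible in the question; the elided rows are left out
raw = [
    "2011-12-13", "2011-12-13",
    "2013-11-01", "2013-11-01",
    "2013-06-04", "2013-06-04",
    "2014-01-01",
]
dates = [date.fromisoformat(s) for s in raw]

counts = Counter(dates)               # occurrences per date
days = sorted(counts)                 # chronological order
per_day = [counts[d] for d in days]
running = list(accumulate(per_day))   # cumulative sum over time

for d, c, r in zip(days, per_day, running):
    print(d, c, r)
```

The lists days and running can then be passed to ax.step or ax.plot to draw the cumulative curve.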

Related

How do I control the number of x-axis ticks?

I have pulled in a dataset that I want to use, with columns named Date and Adjusted. Adjusted is just the adjusted percentage growth on the base month.
The code I currently have is:
x = data['Date']
y = data['Adjusted']
fig = plt.figure(dpi=128, figsize=(7,3))
plt.plot(x,y)
plt.title("FTSE 100 Growth", fontsize=25)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Adjusted %", fontsize=14)
plt.show()
However, when I run it I get essentially a solid black line across the bottom where all of the dates are covering each other up. It is trying to show every single date, when obviously I only want to show major ones. The dates are in the format Apr-19, and the data runs from Oct-03 to May-20.
How do I limit the number of date ticks and labels to one per year, or any amount I choose? If you do have a solution, if you could respond with the edits made to the code itself that would be great. I've tried other solutions I've found on here but I haven't been able to get it to work.
The dates module of matplotlib will do the job. You can control the tick interval through the interval argument of MonthLocator (it is currently set to 6 months; for one tick per year, use interval=12 or md.YearLocator()). Here's how:
import pandas as pd
from datetime import date, datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
import matplotlib.ticker as ticker
x = data['Date']
y = data['Adjusted']
# convert a date string such as 'Apr-19' to a datetime object
def convert_date(row):
    return datetime.strptime(row['Date'], '%b-%y')
data['Formatted_Date'] = data.apply(convert_date, axis=1)
# plot
fig, ax = plt.subplots(1, 1)
ax.plot(data['Formatted_Date'], y,'ok')
## Set time format and the interval of ticks (every 6 months)
xformatter = md.DateFormatter('%Y-%m') # format as year-month
xlocator = md.MonthLocator(interval=6)
## Set xtick labels to appear every 6 months
ax.xaxis.set_major_locator(xlocator)
## Format xtick labels as YYYY-mm
ax.xaxis.set_major_formatter(xformatter)
plt.title("FTSE 100 Growth", fontsize=25)
plt.xlabel("Date", fontsize=14)
plt.ylabel("Adjusted %", fontsize=14)
plt.show()
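For the one-tick-per-year case the question asks about, a variant using YearLocator could look like the sketch below. The real dataset is not shown, so the monthly labels from Oct-03 to May-20 are regenerated here and the y values are synthetic placeholders, not FTSE data.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt
import matplotlib.dates as md

# rebuild labels in the question's 'Apr-19' style, Oct-03 through May-20
abbr = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
labels = []
year, month = 2003, 10
while (year, month) <= (2020, 5):
    labels.append(f"{abbr[month - 1]}-{year % 100:02d}")
    year, month = (year + 1, 1) if month == 12 else (year, month + 1)

# parse the strings so Matplotlib sees real dates instead of categories
dates = [datetime.strptime(s, "%b-%y") for s in labels]
values = [0.1 * i for i in range(len(dates))]  # synthetic placeholder values

fig, ax = plt.subplots(figsize=(7, 3))
ax.plot(dates, values)
ax.xaxis.set_major_locator(md.YearLocator())        # one tick per year
ax.xaxis.set_major_formatter(md.DateFormatter("%Y"))
fig.autofmt_xdate()
# plt.show()  # uncomment when using an interactive backend
```

Using md.YearLocator(base=2) instead would place a tick every second year.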

Matplotlib x-axis ticks, fixed location for dates

In the timeline plot I'm making, I want the date ticks to show only specified dates. (In my example I show ticks for the 'a' events, but it can be any list of ticks.) I found how to do it when the x-axis data is numeric (upper subplot in my example), but this won't work with the timestamp data type (bottom plot).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
myData = pd.DataFrame({'date':['2019-01-15','2019-02-10','2019-03-20','2019-04-17','2019-05-23','2019-06-11'],'cnt':range(6),'event':['a','b','a','b','a','b']})
myData['date'] = [pd.Timestamp(j) for j in myData['date']]
start = pd.Timestamp('2019-01-01')
stop = pd.Timestamp('2019-07-01')
inxa = myData.loc[myData['event'] == 'a'].index
inxb = myData.loc[myData['event'] == 'b'].index
# create two plots, one with 'cnt' as x-axis, the other 'dates' on x-axis.
fig, ax = plt.subplots(2,1,figsize=(16,9))
ax[0].plot((0,6),(0,0), 'k')
ax[1].plot((start, stop),(0,0))
for g in inxa:
    ax[0].plot((myData.loc[g,'cnt'],myData.loc[g,'cnt']),(0,1),c='r')
    ax[1].plot((myData.loc[g,'date'],myData.loc[g,'date']),(0,1),c='r')
for g in inxb:
    ax[0].plot((myData.loc[g,'cnt'],myData.loc[g,'cnt']),(0,2),c='b')
    ax[1].plot((myData.loc[g,'date'],myData.loc[g,'date']),(0,2),c='b')
xlist0 = myData.loc[myData['event']=='a','cnt']
xlist1 = myData.loc[myData['event']=='a','date']
ax[0].xaxis.set_major_locator(ticker.FixedLocator(xlist0))
# ax[1].xaxis.set_major_locator(**???**)
Couldn't find a sufficient duplicate, maybe I didn't look hard enough. There are a number of ways to do this:
Convert the dates to numbers first, or use the underlying values of the pandas datetime Series:
xticks = [mdates.date2num(z) for z in xlist1]
# or
xticks = xlist1.values
and there are at least a couple of ways to use them:
ax[1].xaxis.set_major_locator(ticker.FixedLocator(xticks))
ax[1].xaxis.set_ticks(xticks)
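A self-contained version of the FixedLocator approach, without pandas, might look like the sketch below. It reuses the three 'a' event dates and the start/stop dates from the question; the DateFormatter pattern is an assumption added so the fixed ticks get readable labels.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker

# the 'a' events and axis limits from the question
event_a = [datetime(2019, 1, 15), datetime(2019, 3, 20), datetime(2019, 5, 23)]
start, stop = datetime(2019, 1, 1), datetime(2019, 7, 1)

fig, ax = plt.subplots(figsize=(8, 2))
ax.plot([start, stop], [0, 0], "k")
for d in event_a:
    ax.plot([d, d], [0, 1], c="r")

# FixedLocator wants numbers, so convert the dates first
tick_positions = mdates.date2num(event_a)
ax.xaxis.set_major_locator(ticker.FixedLocator(tick_positions))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
# plt.show()  # uncomment when using an interactive backend
```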
Date tick labels
How to set the xticklabels for date in matplotlib
how to get ticks every hour?
...

Represent a continuous graph using `matplotlib` and `pandas`

My dataframe is like this-
Energy_MWh Month
0 39686.82 1979-01
1 35388.78 1979-02
2 50134.02 1979-03
3 37499.22 1979-04
4 20104.08 1979-05
5 17440.26 1979-06
It goes on like this to the month 2015-12. So you can imagine all the data.
I want to plot a continuous graph with the months as the x-axis and the Energy_MWh as the y-axis. How to best represent this using matplotlib?
I would also like to know for my knowledge if there's a way to print 1979-01 as Jan-1979 on the x-axis and so on. Probably a lambda function or something while plotting.
Borrowed liberally from this answer, which you should go out and upvote:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
# df = <set_your_data_frame_here>
myDates = pd.to_datetime(df['Month'])
myValues = df['Energy_MWh']
fig, ax = plt.subplots()
ax.plot(myDates,myValues)
myFmt = DateFormatter("%b-%Y")
ax.xaxis.set_major_formatter(myFmt)
## Rotate date labels automatically
fig.autofmt_xdate()
plt.show()
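To see what the "%b-%Y" pattern used above produces, the same format codes can be tried with the standard library alone. This is only a quick check using the first three Month values from the question; the month abbreviations follow the active locale, so the English names assume a default or English locale.

```python
from datetime import datetime

months = ["1979-01", "1979-02", "1979-03"]  # first rows of the question's Month column

# parse 'YYYY-MM' strings, then render them in the 'Jan-1979' style
parsed = [datetime.strptime(m, "%Y-%m") for m in months]
labels = [d.strftime("%b-%Y") for d in parsed]
print(labels)
```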
Set Month as the index:
df.set_index('Month', inplace=True)
Convert the index to Datetime:
df.index = pd.DatetimeIndex(df.index)
Plot:
df.plot()

How to plot time series that consists of different dates but same timestamps on one graph in matplotlib

I have data that shows some values collected on three different dates: 2015-01-08, 2015-01-09 and 2015-01-12. For each date there are several data points that have timestamps.
Date/times are in a list and it looks as follows:
['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00', '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00', '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00', '2015-01-12-10:00:00', '2015-01-12-11:00:00']
On the other hand I have corresponding values (floats) in another list:
[12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0, 12390.0, 12400.0, 12380.0, 12450.0, 12460.0]
To put all this together and plot a graph I use following code:
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as md
import dateutil
from matplotlib.font_manager import FontProperties
timestamps = ['2015-01-08-09:00:00', '2015-01-08-10:00:00', '2015-01-08-11:00:00', '2015-01-08-12:00:00', '2015-01-08-13:00:00', '2015-01-09-14:00:00', '2015-01-09-15:00:00', '2015-01-09-16:00:00', '2015-01-12-09:00:00', '2015-01-12-10:00:00', '2015-01-12-11:00:00']
ticks = [12210.0, 12210.0, 12180.0, 12240.0, 12250.0, 12420.0, 12390.0, 12400.0, 12380.0, 12450.0, 12460.0]
plt.subplots_adjust(bottom=0.2)
plt.xticks( rotation=90 )
dates = [dateutil.parser.parse(s) for s in timestamps]
ax=plt.gca()
ax.set_xticks(dates)
ax.tick_params(axis='x', labelsize=8)
xfmt = md.DateFormatter('%H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(dates, ticks, label="Price")
plt.xlabel("Date and time", fontsize=12)
plt.ylabel("Price", fontsize=12)
plt.suptitle("Price during last three days", fontsize=12)
plt.legend(loc=0,prop={'size':8})
plt.savefig("figure.pdf")
When I try to plot these datetimes and values I get a messy graph with the line going back and forth.
It looks like the dates are being ignored and only timestamps are taken into account, which is the reason for the messy chart. I tried to edit the datetimes to have the same date and consecutive timestamps and it fixed the chart. However, I must have dates as well.
What am I doing wrong?
Your plots are going all over the place because plt.plot connects the dots in the order you give it. If this order is not monotonically increasing in x, then it looks "messy". You can sort the points by x first to fix this. Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
X = np.random.random(20)
Y = 2*X+np.random.random(20)
idx = np.argsort(X)
X2 = X[idx]
Y2 = Y[idx]
fig,ax = plt.subplots(2,1)
ax[0].plot(X,Y)
ax[1].plot(X2,Y2)
plt.show()
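The same sorting idea applied to datetime strings in the question's format could be sketched as follows. The four pairs are taken from the question's data but deliberately scrambled here for illustration (the question's own list is already in chronological order), and the strptime pattern is an assumption matching the 'YYYY-MM-DD-HH:MM:SS' layout.

```python
from datetime import datetime

# (timestamp, value) pairs from the question, scrambled on purpose
pairs = [
    ("2015-01-09-14:00:00", 12420.0),
    ("2015-01-08-09:00:00", 12210.0),
    ("2015-01-12-09:00:00", 12380.0),
    ("2015-01-08-10:00:00", 12210.0),
]

# parse first, so sorting compares real datetimes rather than strings
parsed = [(datetime.strptime(ts, "%Y-%m-%d-%H:%M:%S"), v) for ts, v in pairs]
parsed.sort(key=lambda p: p[0])

dates = [d for d, _ in parsed]
values = [v for _, v in parsed]
print(dates, values)
```

Passing dates and values to plt.plot then draws the line strictly left to right.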

Plotting timestamps (hour/minute/seconds) with Matplotlib

I want to plot some timestamps (Year-month-day Hour-Minute-Second format). I am using the following code, however it doesn't show any hour-minute-second information, it shows them as 00-00-00. I double checked my date array, and as you can see from the snippet below, they are not zero.
Do you have any idea about why I am getting 00-00-00's?
import matplotlib.pyplot as plt
import matplotlib.dates as md
import dateutil
dates = [dateutil.parser.parse(s) for s in datestrings]
# datestrings = ['2012-02-21 11:28:17.980000', '2012-02-21 12:15:32.453000', '2012-02-21 23:26:23.734000', '2012-02-26 17:42:15.804000']
plt.subplots_adjust(bottom=0.2)
plt.xticks( rotation= 80 )
ax=plt.gca()
xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(dates[0:10],plt_data[0:10], "o-")
plt.show()
Try zooming in on your graph; you will see the datetime labels expand as your x-axis scale changes.
plotting unix timestamps in matplotlib
I had a similarly annoying problem when trying to plot heatmaps of positive selection on chromosomes. If I zoomed out too far things would disappear entirely!
edit: This code plots your dates exactly as you give them, but doesn't add ticks in between.
import matplotlib.pyplot as plt
import matplotlib.dates as md
import dateutil.parser
datestrings = ['2012-02-21 11:28:17.980000', '2012-02-21 12:15:32.453000', '2012-02-21 23:26:23.734000', '2012-02-26 17:42:15.804000']
dates = [dateutil.parser.parse(s) for s in datestrings]
plt_data = range(5,9)
plt.subplots_adjust(bottom=0.2)
plt.xticks( rotation=25 )
ax=plt.gca()
ax.set_xticks(dates)
xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(dates,plt_data, "o-")
plt.show()
I can tell you why it shows the 00:00:00. It's because that's the start time of that particular day. For example, one tick is at 2012-02-22 00:00:00 (12 midnight of 2012-02-22) and another is at 2012-02-23 00:00:00 (12 midnight of 2012-02-23).
Ticks for the timestamps in between these two times are not shown.
I myself am trying to figure out how to show ticks for in between these times.
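One possible way to get ticks between the midnights is an HourLocator. The sketch below reuses the datestrings from the question and places a tick every six hours; the choice of hours, the label format, and the figure size are assumptions for illustration. Passing byhour explicitly keeps the ticks on fixed clock hours.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt
import matplotlib.dates as md

datestrings = ['2012-02-21 11:28:17.980000', '2012-02-21 12:15:32.453000',
               '2012-02-21 23:26:23.734000', '2012-02-26 17:42:15.804000']
dates = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in datestrings]
plt_data = [5, 6, 7, 8]

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(dates, plt_data, "o-")

# ticks at 00:00, 06:00, 12:00 and 18:00 of every day in view
ax.xaxis.set_major_locator(md.HourLocator(byhour=[0, 6, 12, 18]))
ax.xaxis.set_major_formatter(md.DateFormatter("%m-%d %H:%M"))
fig.autofmt_xdate()

fig.canvas.draw()  # force tick computation; plt.show() would do this too
tick_hours = sorted({md.num2date(x).hour for x in ax.xaxis.get_majorticklocs()})
print(tick_hours)
```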
