Matplotlib Date Index Formatting - python

I am using matplotlib to plot some financial data. However, in its default configuration matplotlib inserts gaps in place of missing data. The documentation recommends using a date index formatter to resolve this.
However, as can be seen in the examples provided on that page, this changes two things:
The tick label format changes from "Sept 03 2008" to "2008-09-03".
The chart no longer ends on the final sample, but is padded out to "2008-10-14".
How can I retain the default label format and axis limits while still avoiding gaps in the data?
Edit
Sample code, from the documentation, with the desired ticks on top.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import matplotlib.cbook as cbook
import matplotlib.ticker as ticker
datafile = cbook.get_sample_data('aapl.csv', asfileobj=False)
print('loading', datafile)
r = mlab.csv2rec(datafile)
r.sort()
r = r[-30:] # get the last 30 days
# first we'll do it the default way, with gaps on weekends
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r.date, r.adj_close, 'o-')
fig.autofmt_xdate()
# next we'll write a custom formatter
N = len(r)
ind = np.arange(N) # the evenly spaced plot indices
def format_date(x, pos=None):
    thisind = np.clip(int(x+0.5), 0, N-1)
    return r.date[thisind].strftime('%Y-%m-%d')
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(ind, r.adj_close, 'o-')
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
fig.autofmt_xdate()
plt.show()

Well, I'll answer the easy part: To get Sept 03 2008 instead of 2008-09-03 use strftime('%b %d %Y'):
def format_date(x, pos=None):
    thisind = np.clip(int(x+0.5), 0, N-1)
    result = r.date[thisind].strftime('%b %d %Y')
    return result
PS. The last date in r.date is Oct 14 2008, so I don't think it is a bad thing to include a tick mark for it. Are you sure you don't want it?
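If the padding around the first and last sample is still unwanted, one option is to set the x limits to the index range explicitly. The following is a minimal sketch, not the documentation's example: the weekday-only dates and the values are made-up stand-ins for the AAPL sample data, and the Agg backend is selected only so that it runs without a display.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# Hypothetical stand-in data: two weeks of weekdays, so weekends are missing.
start = dt.date(2008, 9, 1)  # a Monday
all_days = [start + dt.timedelta(days=d) for d in range(14)]
dates = [d for d in all_days if d.weekday() < 5]
N = len(dates)
values = [100 + i for i in range(N)]


def format_date(x, pos=None):
    # Round the tick position to the nearest sample index and clamp it
    # to the valid range before looking up the date.
    i = min(max(int(round(x)), 0), N - 1)
    return dates[i].strftime('%b %d %Y')


fig, ax = plt.subplots()
ax.plot(range(N), values, 'o-')  # plot against the index: no weekend gaps
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.set_xlim(0, N - 1)  # end the axis exactly on the first and last sample
fig.autofmt_xdate()
```

Without the set_xlim call, ticks that fall in the padded region beyond the data are clamped by format_date and so repeat the first or last date.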

Related

How can I list sequentially the x and y axis on chart?

I have a dataframe and I want to plot it. When I run my code, the values on the x and y axes are not in sequential order. How can I fix this? I have also attached an example: the first image is my graph, the second one is what I want.
This is my code:
from datetime import timedelta, date
import datetime as dt #date analyse
import matplotlib.pyplot as plt
import pandas as pd #read file
def daterange(date1, date2):
    for n in range(int ((date2 - date1).days)+1):
        yield date1 + timedelta(n)
tarih="01-01-2021"
tarih2="20-06-2021"
start=dt.datetime.strptime(tarih, '%d-%m-%Y')
end=dt.datetime.strptime(tarih2, '%d-%m-%Y')
fg=pd.DataFrame()
liste=[]
tarih=[]
for dt in daterange(start, end):
    dates=dt.strftime("%d-%m-%Y")
    with open("fng_value.txt", "r") as filestream:
        for line in filestream:
            date = line.split(",")[0]
            if dates == date:
                fng_value=line.split(",")[1]
                liste.append(fng_value)
                tarih.append(dates)
fg['date']=tarih
fg['fg_value']=liste
print(fg.head())
plt.subplots(figsize=(20, 10))
plt.plot(fg.date,fg.fg_value)
plt.title('Fear&Greed Index')
plt.ylabel('Fear&Greed Data')
plt.xlabel('Date')
plt.show()
This is my graph:
This is the graph that I want:
Line plot with datetime x axis
So it appears this code is opening a text file, adding values to either a list of dates or a list of values, and then making a pandas dataframe with those lists. Finally, it plots the date vs values with a line plot.
A few changes should help your graph look a lot better. A lot of this is very basic, and I'd recommend reviewing some matplotlib tutorials. The Real Python tutorial is a good starting place in my opinion.
Fix the y axis limits:
plt.ylim(0, 100)
Use an x axis locator from matplotlib.dates to get better spaced x tick locations. The right one depends on your time range; I made some data and used a day locator.
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
Use a scatter plot to add data points as on the linked graph
plt.scatter(x, y ... )
Add a grid
plt.grid(axis='both', color='gray', alpha=0.5)
Rotate the x tick labels
plt.tick_params(axis='x', rotation=45)
I simulated some data and plotted it to look like the plot you linked; this may be helpful for you to work from.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(15,5))
x = pd.date_range(start='june 26th 2021', end='july 25th 2021')
rng = np.random.default_rng()
y = rng.integers(low=15, high=25, size=len(x))
ax.plot(x, y, color='gray', linewidth=2)
ax.scatter(x, y, color='gray')
ax.set_ylim(0,100)
ax.grid(axis='both', color='gray', alpha=0.5)
ax.set_yticks(np.arange(0,101, 10))
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.tick_params(axis='x', rotation=45)
ax.set_xlim(min(x), max(x))
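One likely cause of the non-sequential axes in the question is that the dates and values read from the text file stay strings, and matplotlib treats strings as categories placed in order of appearance. The sketch below converts them before plotting. It assumes each line of the file looks like "dd-mm-YYYY,value", which is inferred from the split(",") calls in the question; parse_fng_lines is a hypothetical helper name and the sample lines are made up.

```python
import datetime as dt


def parse_fng_lines(lines):
    """Turn 'dd-mm-YYYY,value' lines into (dates, values), sorted by date."""
    rows = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        day = dt.datetime.strptime(parts[0], "%d-%m-%Y")
        rows.append((day, float(parts[1])))  # float, not str
    rows.sort()
    dates = [d for d, _ in rows]
    values = [v for _, v in rows]
    return dates, values


# Made-up sample lines in the assumed file format.
sample = ["03-01-2021,93\n", "01-01-2021,94\n", "02-01-2021,9\n"]
dates, values = parse_fng_lines(sample)
print(dates[0], values)
```

Passing these datetime and float lists to ax.plot lets matplotlib use a real date axis and a numeric y axis, which the locator and limit calls above rely on.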

Seaborn plot showing two labels at the start and end of a month [duplicate]

I am trying to create a heat map from a pandas dataframe using the seaborn library. Here is the code:
test_df = pd.DataFrame(np.random.randn(367, 5),
                       index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
Seaborn heatmap is a categorical plot. It scales from 0 to number of columns - 1, in this case from 0 to 366. The datetime locators and formatters expect values as dates (or more precisely, numbers that correspond to dates). For the year in question, and with the date epoch matplotlib used before version 3.3, that would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001); with the current default epoch of 1970-01-01 the same dates are 10957 and 11323.
So in order to be able to use matplotlib.dates formatters and locators, you would need to convert your dataframe index to datetime objects first. You then cannot use a heatmap; use a plot that allows numerical axes instead, e.g. an imshow plot. You may then set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# pd.date_range replaces the removed pd.DatetimeIndex(start=..., end=...) form
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
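To see where date numbers of this kind come from, the following sketch computes day counts for the two end dates with the standard library only. It assumes that matplotlib's date numbers are day counts from an epoch: counted from 0001-01-01 before version 3.3, and from 1970-01-01 by default since then.

```python
import datetime as dt

first = dt.date(2000, 1, 1)
last = dt.date(2001, 1, 1)

# Proleptic Gregorian ordinal: 0001-01-01 is day 1.
old_style = (first.toordinal(), last.toordinal())

# Days elapsed since 1970-01-01.
unix_epoch = dt.date(1970, 1, 1)
new_style = ((first - unix_epoch).days, (last - unix_epoch).days)

print(old_style)  # (730120, 730486)
print(new_style)  # (10957, 11323)
```

Either way the values are far away from the 0 to 366 range of the heatmap axis, which is why the date locators find nothing to label there.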
I found this question when trying to do a similar thing and you can hack together a solution but it's not very pretty.
For example I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to be blank.
This gives me year labels in the correct position.
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()
    if text[5:7] == '01':
        label.set_text(text[0:4])
    else:
        label.set_text('')
ax.set_xticklabels(xticklabels)
Hopefully from that you can figure out what you want to do.
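Note that the loop above labels every tick that falls in January, so the year can appear several times. A possible refinement is sketched below: it keeps the year only for the first January label of each year. It assumes, as the slicing above does, that each label text starts with "YYYY-MM"; year_labels is a hypothetical helper and the label strings are made up.

```python
def year_labels(texts):
    """Return the year for the first January label of each year, else ''."""
    seen = set()
    out = []
    for text in texts:
        year, month = text[0:4], text[5:7]
        if month == '01' and year not in seen:
            seen.add(year)
            out.append(year)
        else:
            out.append('')
    return out


# Made-up label texts in the assumed 'YYYY-MM-DD' form.
labels = ['2000-01-01', '2000-01-15', '2000-02-01', '2000-12-31', '2001-01-01']
print(year_labels(labels))  # ['2000', '', '', '', '2001']
```

The resulting list of strings can be passed to ax.set_xticklabels in place of the modified label objects.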

Change matplotlib offset notation from scientific to plain

I want to set the formatting of the y-axis offset in my plot to non-scientific notation, but I can't find a setting to do this. Other questions and their solutions describe how to either remove the offset altogether, or set the y-ticks to scientific/plain notation; I haven't found an answer for setting the notation of the offset itself.
I've already tried using these two options, but I think they're meant for the y-ticks, not the offsets:
ax.ticklabel_format(axis='y', style='plain', useOffset=6378.1)
and
ax.get_yaxis().get_major_formatter().set_scientific(False)
So, the actual result is +6.3781e3, when I want +6378.1
Any way to do this?
Edit: Added example code and figure:
#!/usr/bin/env python
from matplotlib import pyplot as plt
from matplotlib import ticker
plt.rcParams['font.family'] = 'monospace'
import random
Date = range(10)
R = [6373.1+10*random.random() for i in range(10)]
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(Date,R,'-D',zorder=2,markersize=3)
ax.ticklabel_format(axis='y', style='plain', useOffset=6378.1)
ax.set_ylabel('Mean R (km)',fontsize='small',labelpad=1)
plt.show()
You can subclass the default ScalarFormatter and replace the get_offset method, such that it would simply return the offset as it is. Note that if you wanted to make this compatible with the multiplicative "offset", this solution would need to be adapted (currently it just prints a warning).
from matplotlib import pyplot as plt
import matplotlib.ticker
import random
class PlainOffsetScalarFormatter(matplotlib.ticker.ScalarFormatter):
    def get_offset(self):
        if len(self.locs) == 0:
            return ''
        if self.orderOfMagnitude:
            print("Your plot will likely be labelled incorrectly")
        return self.offset
Date = range(10)
R = [6373.1+10*random.random() for i in range(10)]
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(Date,R,'-D',zorder=2,markersize=3)
ax.yaxis.set_major_formatter(PlainOffsetScalarFormatter())
ax.ticklabel_format(axis='y', style='plain', useOffset=6378.1)
ax.set_ylabel('Mean R (km)',fontsize='small',labelpad=1)
plt.show()
Another way to do this is to hide the offset text itself and add your own ax.text in its place, as follows:
from matplotlib import pyplot as plt
import random
plt.rcParams['font.family'] = 'monospace'
offset = 6378.1
Date = range(10)
R = [offset+10*random.random() for i in range(10)]
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(Date,R,'-D',zorder=2,markersize=3)
ax.ticklabel_format(axis='y', style='plain', useOffset=offset)
ax.set_ylabel('Mean R (km)',fontsize='small',labelpad=1)
ax.yaxis.offsetText.set_visible(False)
ax.text(x = 0.0, y = 1.01, s = str(offset), transform=ax.transAxes)
plt.show()
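Since the question asks for "+6378.1" with an explicit sign, a variant of the subclassing approach can return a formatted string instead of the raw number. This is a sketch under assumptions: SignedPlainOffsetFormatter is a hypothetical name, the data is a deterministic stand-in for the random values, the Agg backend is used only to run headless, and a multiplicative order of magnitude is simply ignored here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter


class SignedPlainOffsetFormatter(ScalarFormatter):
    """Hypothetical variant: show the additive offset as e.g. '+6378.1'."""

    def get_offset(self):
        # No ticks yet, or no additive offset in use: show nothing.
        if len(self.locs) == 0 or not self.offset:
            return ''
        # '+' in the format spec forces an explicit sign; values of this
        # size are printed without an exponent.
        return f"{self.offset:+}"


offset = 6378.1
R = [offset - 5 + i for i in range(10)]  # deterministic stand-in data

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(range(10), R, '-D', markersize=3)
ax.yaxis.set_major_formatter(SignedPlainOffsetFormatter(useOffset=offset))
fig.canvas.draw()  # the offset text is filled in while drawing
offset_text = ax.yaxis.get_offset_text().get_text()
print(offset_text)
```

Passing useOffset to the constructor fixes the offset at the given value, which is the same effect the ticklabel_format(useOffset=...) call has in the examples above.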

Matplotlib labeling x-axis with time stamps, deleting extra 0's in microseconds

I am plotting over a period of seconds and have time as the labels on the x-axis. Here is the only way I could get the correct time stamps. However, there are a bunch of zeros on the end. Any idea how to get rid of them??
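import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import MicrosecondLocator, DateFormatter
# this style is named 'seaborn-v0_8-whitegrid' in matplotlib >= 3.6
plt.style.use('seaborn-whitegrid')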
df['timestamp'] = pd.to_datetime(df['timestamp'])
fig, ax = plt.subplots(figsize=(8,4))
seconds=MicrosecondLocator(interval=500000)
myFmt = DateFormatter("%S:%f")
ax.plot(df['timestamp'], df['vibration(g)_0'], c='blue')
ax.xaxis.set_major_locator(seconds)
ax.xaxis.set_major_formatter(myFmt)
plt.gcf().autofmt_xdate()
plt.show()
This produces this image. Everything looks perfect except for all of the extra zeros. How can I get rid of them while still keeping the 5?
I guess you would want to simply cut the last 5 digits off the string, since %f always produces six digits. That is also what the answers to "python datetime: Round/trim number of digits in microseconds" suggest.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import MicrosecondLocator, DateFormatter
from matplotlib.ticker import FuncFormatter
x = np.datetime64("2018-11-30T00:00") + np.arange(1,4, dtype="timedelta64[s]")
fig, ax = plt.subplots(figsize=(8,4))
seconds=MicrosecondLocator(interval=500000)
myFmt = DateFormatter("%S:%f")
ax.plot(x,[2,1,3])
def trunc_ms_fmt(x, pos=None):
    return myFmt(x,pos)[:-5]
ax.xaxis.set_major_locator(seconds)
ax.xaxis.set_major_formatter(FuncFormatter(trunc_ms_fmt))
plt.gcf().autofmt_xdate()
plt.show()
Note that this format is quite unusual; so make sure the reader of the plot understands it.
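The effect of the slicing can be checked on a plain datetime without any plotting. The sketch below uses the same "%S:%f" format; trunc_label and its keep parameter are hypothetical names added for illustration.

```python
import datetime as dt

FMT = "%S:%f"  # seconds, then six-digit microseconds


def trunc_label(t, keep=1):
    """Format t with FMT and keep only `keep` of the six fractional digits."""
    full = t.strftime(FMT)
    return full[:len(full) - (6 - keep)]


t = dt.datetime(2018, 11, 30, 0, 0, 1, 500000)
print(t.strftime(FMT))        # 01:500000
print(trunc_label(t))         # 01:5
print(trunc_label(t, keep=3)) # 01:500
```

Keeping one digit corresponds to the [:-5] slice used in the formatter above; the digits are truncated, not rounded.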

Plot 15 minute interval data with days and months as the x axis ticks

I have data logged on a 15 minute interval. I want to plot this data and have the x axis display a minor tick at each day, and a major tick each month. Here's a snippet of what I'm trying to do using Jan and Feb.
#!/usr/bin/env python3
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#make some data
t = np.arange(0, 5664, 1) # Jan and Feb worth of 15 minute steps
s = np.sin(2*np.pi*t) # data measured
#plot the data
fig, ax = plt.subplots()
ax.plot(t, s)
#select formatting
days = mdates.DayLocator()
daysfmt = mdates.DateFormatter('%d')
months = mdates.MonthLocator()
monthsfmt = mdates.DateFormatter('\n%b')
From the documentation and other Q&As I have read and tried to piece together, I understand that I need to tell matplotlib the formats I want to use for the x-axis. This is where I am getting confused. I can't figure out how to indicate that the plot data is sampled every 15 minutes (96 samples per day, i.e. per minor tick), so when I show the plot nothing is displayed on the graph. Or at least I think that's the cause...
#apply formatting
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsfmt)
ax.xaxis.set_minor_locator(days)
ax.xaxis.set_minor_formatter(daysfmt)
#select dates
datemin = dt.datetime.strptime('01/01/17', '%d/%m/%y')
datemax = dt.datetime.strptime('28/02/17', '%d/%m/%y')
ax.set_xlim(datemin, datemax)
plt.show()
Plot Results
There, working ;)
There were two problems with your code:
you were plotting s against the plain numbers in t, but configuring the x axis (locators, formatters and limits) for dates, so the two scales did not match
you were only plotting the numerical noise of sin(x), because every argument was an integer multiple of 2*pi, and sin(n*2*pi) = 0 for any integer n
Complete code:
#!/usr/bin/env python3
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#make some data
t = np.arange(0, 5664, 0.1) # Jan and Feb in units of 15 minutes, sampled every 0.1 unit
s = np.sin(2*np.pi*t) # data measured
#plot the data
fig, ax = plt.subplots()
#ax.plot(t, s)
#select formatting
days = mdates.DayLocator()
daysfmt = mdates.DateFormatter('%d')
months = mdates.MonthLocator()
monthsfmt = mdates.DateFormatter('\n%b')
#apply formatting
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsfmt)
ax.xaxis.set_minor_locator(days)
ax.xaxis.set_minor_formatter(daysfmt)
#select dates
datemin = dt.datetime.strptime('01/01/17', '%d/%m/%y')
datemax = dt.datetime.strptime('28/02/17', '%d/%m/%y')
t0 = dt.datetime(2017,1,1)
t_datetime = [ t0 + dt.timedelta(minutes=15*t_) for t_ in t ]
ax.plot(t_datetime,s)
ax.set_xlim(t0, t_datetime[-1])
plt.show()
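The sample counts used above can be checked with the standard library alone. This sketch builds one timestamp per 15 minute sample for January and February 2017 using integer steps; the variable names are illustrative.

```python
import datetime as dt

step = dt.timedelta(minutes=15)
samples_per_day = dt.timedelta(days=1) // step     # 96 samples per day

t0 = dt.datetime(2017, 1, 1)
n_days = (dt.datetime(2017, 3, 1) - t0).days       # 31 + 28 = 59 days
n_samples = n_days * samples_per_day               # 5664 samples

# One datetime per sample; these can be passed directly to ax.plot.
times = [t0 + i * step for i in range(n_samples)]
print(samples_per_day, n_samples, times[-1])
```

Plotting against a list like times puts the data on the same date scale that DayLocator and MonthLocator operate on.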
