Plotting pd.Series object does not show year correctly


I am graphing the results of the measurements of a humidity sensor over time.
I'm using Python 3.7.1 and Pandas 0.24.2.
I have a list called dateTimeList with date and time strings:
dateTimeList = ['15.3.2019 11:44:27', '15.3.2019 12:44:33', '15.3.2019 13:44:39']
I wrote this code where index is a DatetimeIndex object and humList is a list of floats.
import matplotlib.pyplot as plt
import pandas as pd
index = pd.to_datetime(dateTimeList, format='%d.%m.%Y %H:%M:%S')
ts = pd.Series(humList, index)
plt.figure(figsize=(12.80, 7.20))
ts.plot(title='Gráfico de Humedad en el Tiempo', style='g', marker='o')  # title: "Humidity over time"
plt.xlabel('Tiempo [días]')  # "Time [days]"
plt.ylabel('Humedad [V]')  # "Humidity [V]"
plt.grid()
plt.savefig('Hum_General' + '.png', bbox_inches='tight')
plt.show()
And I have these two results, one with data from February and the other with data from March.
The problem is that in March, instead of the year 2019, sequences of 00 12 00 12 appear on the x-axis. I think it is important to note that this only happens with the March data; February is fine, and the data for both months have the same structure. Day and month are shown correctly on both plots.
I also tried with:
index = [ pd.to_datetime(date, format='%d.%m.%Y %H:%M:%S') for date in dateTimeList]
Now index is a list of Timestamp objects. Same results.

Add this immediately after creating the plot:
import matplotlib.dates as mdates # this should be on the top of the script
xfmt = mdates.DateFormatter('%Y-%m-%d')
ax = plt.gca()
ax.xaxis.set_major_formatter(xfmt)
My guess is that since March has fewer data points, Matplotlib prefers to label dates as month-day-hour instead of year-month-day, so the issue will probably fix itself once you have more data for March. The code I posted should keep a year-month-day format regardless of the number of data points plotted.
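A self-contained sketch of the same fix, assuming three made-up humidity readings in place of humList. It also passes x_compat=True to the pandas plot call, an extra precaution that is not in the answer above: for an index with a regular frequency, pandas otherwise switches to its own period-based x-axis, and a matplotlib DateFormatter may then mislabel the ticks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data in the question's string format
date_time_list = ['15.3.2019 11:44:27', '15.3.2019 12:44:33', '15.3.2019 13:44:39']
hum_list = [0.41, 0.43, 0.40]  # made-up humidity readings [V]

index = pd.to_datetime(date_time_list, format='%d.%m.%Y %H:%M:%S')
ts = pd.Series(hum_list, index=index)

fig, ax = plt.subplots(figsize=(12.80, 7.20))
# x_compat=True: plot in plain matplotlib date units, not pandas' own tick logic
ts.plot(ax=ax, style='g', marker='o', x_compat=True)
# Always label ticks as year-month-day, however short the plotted time span is
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

fig.canvas.draw()  # tick label texts are only filled in at draw time
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Because the whole span lies within one day, every visible tick carries the same date label; the point is that the year is always present.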

Related

Ensuring first and last date ticks in x-axis - Matplotlib

Currently I am charting data from some historical point to a point in current time. For example, January 2019 to TODAY (February 2021). However, my matplotlib chart only shows dates from January 2019 to January 2021 on the x-axis (with the last February tick missing) even though the data is charted to today's date on the actual plot.
Is there any way to ensure that the first and last month is always reflected on the x-axis chart? In other words, I would like the x-axis to have the range displayed (inclusive).
(Picture of the x-axis, missing February 2021.)
The data charted here is from January 2019 to TODAY (February 12th).
Here is my code for the date format:
fig.autofmt_xdate()
date_format = mdates.DateFormatter("%b-%y")
ax.xaxis.set_major_formatter(date_format)
EDIT: The numbers after each month represent years.

I am not aware of any way to do this other than by creating the ticks from scratch.
In the following example, a list containing the first timestamp of each month is built from the DatetimeIndex of a pandas DataFrame, starting from the month of the first date (January 25th) up to the start of the last, ongoing month. The step variable automatically selects an appropriate number of ticks; the last month is then appended and, if it turns out to be a duplicate, removed with np.unique. The labels are formatted from the tick timestamps.
This solution works for any sampling interval shorter than a year:
import numpy as np  # v 1.19.2
import pandas as pd  # v 1.1.3
import matplotlib.pyplot as plt  # v 3.3.2
# Create sample dataset
start_date = '2019-01-25'
end_date = '2021-02-12'
rng = np.random.default_rng(seed=123)  # random number generator
dti = pd.date_range(start_date, end_date, freq='D')
variable = 100 + rng.normal(size=dti.size).cumsum()
df = pd.DataFrame(dict(variable=variable), index=dti)
# Create matplotlib plot
fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(df.index, df.variable)
# Create list of monthly ticks made of timestamp objects
monthly_ticks = [timestamp for idx, timestamp in enumerate(df.index)
                 if idx == 0 or timestamp.month != df.index[idx - 1].month]
# Select appropriate number of ticks and include last month
step = 1
while len(monthly_ticks[::step]) > 10:
    step += 1
ticks = np.unique(np.append(monthly_ticks[::step], monthly_ticks[-1]))
# Create tick labels from tick timestamps
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx - 1].year
          else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center')
As you can see, the first and last months are located at an irregular distance from the neighboring tick.
If you are plotting a time series with a discontinuous date range (e.g. weekends and holidays not included) and you plot against integer positions rather than the DatetimeIndex (for example ax.plot(range(df.index.size), df.variable)), so as to avoid the straight lines that bridge the gaps on short time series or very wide plots, replace the last line of code with this:
plt.xticks([df.index.get_loc(tick) for tick in ticks], labels, rotation=0, ha='center')
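A minimal sketch of that positional variant, assuming business-day data from pd.bdate_range as a stand-in for a series with weekends missing. It collects the first timestamp of each month with a groupby on monthly periods (a swapped-in alternative to the list comprehension above, with no tick thinning) and maps each tick to its integer x position with get_loc.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Business days only, so weekends are missing from the index
dti = pd.bdate_range('2019-01-25', '2021-02-12')
rng = np.random.default_rng(seed=123)
df = pd.DataFrame({'variable': 100 + rng.normal(size=dti.size).cumsum()}, index=dti)

# First timestamp actually present in each calendar month
month_starts = df.index.to_series().groupby(df.index.to_period('M')).first()
ticks = list(month_starts)

fig, ax = plt.subplots(figsize=(10, 2))
# Plot against integer positions so no straight lines bridge the weekend gaps
ax.plot(range(df.index.size), df['variable'])

# Translate each tick timestamp into its integer position on the x-axis
positions = [df.index.get_loc(tick) for tick in ticks]
labels = [tick.strftime('%b\n%Y') if i == 0 or tick.year != ticks[i - 1].year
          else tick.strftime('%b') for i, tick in enumerate(ticks)]
ax.set_xticks(positions)
ax.set_xticklabels(labels)
```

With 26 months the axis gets crowded; slicing ticks with a step, as the answer does, thins them out.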

Matplotlib uses a limited number of ticks, and it just happens that no tick is placed on February 2021. There are two things you could try. First, try setting the axis limits so that end_date lies past today:
ax.set_xlim(start_date, end_date)
You could also try using more ticks:
ax.set_xticks(np.linspace(np.min(x), np.max(x), n_ticks))
where n_ticks is the number of ticks and x holds the numeric x-axis values. np.linspace includes both end points, so the first and last values always get a tick.
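A sketch of the evenly spaced tick approach with made-up daily data. For dates, x has to be numeric first; here it is assumed to come from mdates.date2num, and n_ticks = 8 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dti = pd.date_range('2019-01-01', '2021-02-12', freq='D')
values = np.random.default_rng(0).normal(size=dti.size).cumsum()  # made-up data
dates = dti.to_pydatetime()

fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(dates, values)

# Convert dates to matplotlib's float day numbers so np.linspace can space them
x = mdates.date2num(dates)
n_ticks = 8  # total number of ticks, first and last date included
ax.set_xticks(np.linspace(x.min(), x.max(), n_ticks))
ax.set_xlim(x.min(), x.max())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b-%y'))
```

The trade-off is that the intermediate ticks fall on arbitrary days, so with a month-only format two neighbouring labels can show the same month.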

Display Multiple Year's Data Using Custom Start/End Dates - datetime, matplotlib

I feel like this question has an obvious answer and I'm just being a bit of a fool. Say you have a couple of dataframes with datetime indices, where each dataframe is for a different year. In my case the index is every day going from June 25th to June 24th the next year:
date var
2019-06-25 107.230294
2019-06-26 104.110004
2019-06-27 104.291506
2019-06-28 111.162552
2019-06-29 112.515364
...
2020-06-20 132.840242
2020-06-21 127.641148
2020-06-22 132.797584
2020-06-23 129.094451
2020-06-24 110.408866
What I want is a single plot with multiple lines, where each line represents a year. The y-axis is my variable, var, and the x-axis should be day of the year. The x-axis should start from June 25th and end at June 24th.
This is what I've tried so far, but it messes up the x-axis. Does anyone know a more elegant way to do this?
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
plt.plot(average_prices19.index.strftime("%d/%m"), average_prices19['var'], label="2019-20")
plt.plot(average_prices20.index.strftime("%d/%m"), average_prices20['var'], label="2020-21")
plt.legend()
plt.show()

Well, there is a twist in this question: the set of dates in a year is not constant, because leap years have a 'Feb-29' that is otherwise absent.
If you are comfortable glossing over this (and always representing a potential 'Feb-29' date on your plot, with missing data for non-leap years), then the following will achieve what you are seeking (assuming the data is in df with the date as a DatetimeIndex):
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
for label, dfy in df.assign(
    # note: 2000 is a leap year; the choice is deliberate
    date=pd.to_datetime(df.index.strftime('2000-%m-%d')),
    label=df.index.strftime('%Y')
).groupby('label'):
    dfy.set_index('date')['var'].plot(ax=ax, label=str(label))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
ax.legend()
Update
For larger amounts of data, however, the above does not produce very legible x-labels. Instead, we can use ConciseDateFormatter to customize how the x-labels are displayed (and hide the fake year 2000):
import matplotlib.dates as mdates
fig, ax = plt.subplots()
for label, dfy in df.assign(
    # note: 2000 is a leap year; the choice is deliberate
    date=pd.to_datetime(df.index.strftime('2000-%m-%d')),
    label=df.index.strftime('%Y')
).groupby('label'):
    dfy.set_index('date')['var'].plot(ax=ax, label=str(label))
ax.legend()
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
formatter = mdates.ConciseDateFormatter(
    locator,
    formats=['', '%b', '%d', '%H:%M', '%H:%M', '%S.%f'],
    offset_formats=['', '', '%b', '%b-%d', '%b-%d', '%b-%d %H:%M']
)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
(Figure: result for the data in your example.)
For more data:
# setup
import numpy as np
import pandas as pd
idx = pd.date_range('2016-01-01', 'now', freq='QS')
df = pd.DataFrame(
    {'var': np.random.uniform(size=len(idx))},
    index=idx).resample('D').interpolate(method='polynomial', order=5)  # polynomial interpolation requires SciPy
(Figure: corresponding plot, with ConciseDateFormatter.)
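The answer above groups by calendar year, so its x-axis runs from January to December. If you want each line to cover a June 25 to June 24 season, as the question describes, one option is the following sketch. The helper season_frame is hypothetical: it puts the first part of every season on fake year 1999 and the remainder on the leap year 2000, so Feb 29 still has a slot and the axis starts on June 25.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def season_frame(df, start_month=6, start_day=25):
    """Tag each row with its season (e.g. '2019-20') and a fake date on a 1999-2000 axis."""
    idx = df.index
    # True for dates on or after the season start within their calendar year
    after_start = (idx.month > start_month) | ((idx.month == start_month) & (idx.day >= start_day))
    season_year = np.where(after_start, idx.year, idx.year - 1)
    # 1999 holds the start of every season; 2000 (a leap year) holds the rest, incl. Feb 29
    fake_year = np.where(after_start, 1999, 2000)
    fake_dates = pd.to_datetime([f"{y}-{m:02d}-{d:02d}"
                                 for y, m, d in zip(fake_year, idx.month, idx.day)])
    seasons = [f"{y}-{str(y + 1)[-2:]}" for y in season_year]
    return df.assign(date=fake_dates, season=seasons)


# Hypothetical data: two seasons of daily values, June 25 to June 24
idx = pd.date_range('2019-06-25', '2021-06-24', freq='D')
df = pd.DataFrame({'var': np.random.default_rng(0).normal(size=idx.size).cumsum()}, index=idx)

out = season_frame(df)
fig, ax = plt.subplots()
for season, dfs in out.groupby('season'):
    ax.plot(dfs['date'], dfs['var'], label=season)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b-%d'))  # hide the fake years
ax.legend()
```

Plotting with ax.plot on the fake dates (rather than the pandas plot method) keeps the x-axis in plain matplotlib date units, so the DateFormatter applies directly.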

Add Pivot table columns and index as xticks and yticks

I have a pivot table created following "Color mapping of data on a date vs time plot", and I plot it with imshow(). I want to use the index and columns of the pivot table as yticks and xticks. The columns in my pivot table are dates and the index holds the time of day.
data = pd.DataFrame()
data['Date']=Tgrad_GFAVD_3m2mRot.index.date
data['Time']=Tgrad_GFAVD_3m2mRot.index.strftime("%H")
data['Tgrad']=Tgrad_GFAVD_3m2mRot.values
C = data.pivot(index='Time', columns='Date', values='Tgrad')
print(C.head()) gives:
Date 2016-08-01 2016-08-02 2016-08-03 2016-08-04 2016-08-05 2016-08-06 \
Time
00 -0.841203 -0.541871 -0.042984 -0.867929 -0.790869 -0.940757
01 -0.629176 -0.520935 -0.194655 -0.866815 -0.794878 -0.910690
02 -0.623608 -0.268820 -0.255457 -0.859688 -0.824276 -0.913808
03 -0.615145 -0.008241 -0.463920 -0.909354 -0.811136 -0.878619
04 -0.726949 -0.169488 -0.529621 -0.897773 -0.833408 -0.825612
I plot the pivot table with:
fig, ax = plt.subplots(figsize=(16, 9))
im = ax.imshow(C, aspect='auto', extent=[0, len(data["Date"]), 0, 23], origin="lower")
I tried a couple of things, but nothing worked. At the moment my xticks range between 0 and 6552, which is the length of data["Date"] as passed to the extent argument of imshow().
I would like to have the xticks at every first of the month but not by index number but as a datetick in the format "2016-08-01" for example.
I am sure it is just a small thing that has been stopping me for the last hour, but now I give up. Do you know how to set the xticks accordingly?

I found the solution myself after trying one more thing. I created another column in the "data" DataFrame with datenum entries instead of dates:
data["datenum"]=mdates.date2num(data["Date"])
Then I changed the plot line to:
pl = ax.imshow(C,aspect = 'auto',
extent=[data["datenum"].iloc[0],data["datenum"].iloc[-1],data["Time"].iloc[0],data["Time"].iloc[-1]],
origin = "lower")
Changing the extent argument this way gives the plot datenum values instead of the positional index of the date column. With that, the following lines worked:
ax.set_yticks(data["Time"]) # sets yticks
ax.xaxis_date() # tells the xaxis that it should expect datetime values
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d") ) # formats the datetime values
fig.autofmt_xdate() # makes it look nice
Best,
Vroni
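Putting the pieces together, here is a sketch with a synthetic hourly series as a hypothetical stand-in for Tgrad_GFAVD_3m2mRot, which is not shown. Two deliberate differences from the solution above: Time is stored as integer hours so the y extent stays numeric, and a MonthLocator places a tick on the first of each month with the "2016-08-01" format the question asked for.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical hourly series standing in for Tgrad_GFAVD_3m2mRot
idx = pd.date_range('2016-08-01', '2016-11-30 23:00', freq='60min')
series = pd.Series(np.random.default_rng(1).normal(size=idx.size), index=idx)

data = pd.DataFrame({'Date': idx.date,
                     'Time': idx.hour,  # integer hours keep the y extent numeric
                     'Tgrad': series.to_numpy()})
C = data.pivot(index='Time', columns='Date', values='Tgrad')

# Left edge = first day at 00:00; right edge = the day after the last column
x0, x1 = mdates.date2num(pd.to_datetime([C.columns[0], C.columns[-1]]))

fig, ax = plt.subplots(figsize=(16, 9))
im = ax.imshow(C, aspect='auto', origin='lower',
               extent=[x0, x1 + 1, -0.5, 23.5])  # each hour row centred on its value
ax.xaxis_date()  # interpret x values as matplotlib date numbers
ax.xaxis.set_major_locator(mdates.MonthLocator())  # one tick on the 1st of every month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.set_yticks(range(0, 24, 3))
fig.autofmt_xdate()
fig.colorbar(im, ax=ax, label='Tgrad')

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```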

Remove Saturdays (but not Sundays or other dataless periods) from Time Series plot

I am plotting a financial time series (see below; here, one month's worth of data).
I would like to remove the periods I have marked with red crosses etc., which are Saturdays. Note that these are not all the periods without data, only the Saturdays.
I know there are some examples of how to remove the gaps, for instance: http://matplotlib.org/examples/api/date_index_formatter.html.
This is not what I am after, since they remove all the gaps (NOT MY INTENT!).
I was thinking that the way to go might be to create a custom sequence of values for the x-axis. Since days are ordinals (i.e. 1 day = a value of 1), it might be possible to create a sequence such as 1,2,3,4,5,6,8,9,10,11,12,13,15,16, etc., skipping one value every seven days, the skipped day being a Saturday of course.
I can imagine how to skip every Saturday using rrule from dateutil; it is done in the code below, where every Monday is marked with a stronger vertical line. But how would I pass it to the tick locator? There is in fact a RRuleLocator class in the matplotlib API, but the docs give no indication of how to use it: http://matplotlib.org/api/dates_api.html#matplotlib.dates.RRuleLocator.
Every suggestion welcome.
Here is the code that I use for the current chart:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import mpl_finance as mpf  # assumed source of candlestick_ohlc (deprecated package)
from dateutil.rrule import rrule, WEEKLY, MO
fig, axes = plt.subplots(2, figsize=(20, 6))
quotes = price_data.to_numpy()  # plain array without the df's column headers (as_matrix() was removed in pandas 1.0)
mpf.candlestick_ohlc(axes[0], quotes, width=0.01)
axes[1].bar(quotes[:, 0], quotes[:, 5], width=0.01)
for ax in axes:
    ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%a, %b %d'))
    ax.grid(True)
    # show night times with a grey shade
    majors = ax.xaxis.get_majorticklocs()
    chart_start, chart_end = ax.xaxis.get_view_interval()
    for major in majors:
        # 0.3333 corresponds to 1/3 of a day, i.e. 8 h
        ax.axvspan(max(chart_start, major - 0.3333),
                   min(chart_end, major + 0.3333), color="0.95", zorder=-1)
    # show Mondays with a line
    mondays = rrule(WEEKLY, byweekday=MO, dtstart=mdates.num2date(chart_start),
                    until=mdates.num2date(chart_end))
    for monday in mondays:
        ax.axvline(mdates.date2num(monday), linewidth=0.75, color='k', zorder=1)

If your dates are datetime objects, or a DatetimeIndex in a pandas DataFrame, you can check which weekday a date falls on using .weekday (Monday is 0, Saturday is 5). Then just set the data on Saturdays to NaN. See the example below.
Code
import datetime
import random
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# generate some data with a datetime index
x = 400
data = pd.DataFrame(
    [random.random() for i in range(x)],
    index=[datetime.datetime(2018, 1, 1, 0) + datetime.timedelta(hours=i)
           for i in range(x)])
# Set all data on a Saturday (weekday 5) to NaN, so it doesn't show in the graph
data[data.index.weekday == 5] = np.nan
# Plot the data
fig, ax = plt.subplots(figsize=(12, 2.5))
ax.plot(data)
# Set a major tick on every day, labelled with the weekday name
days = mdates.DayLocator()
daysFmt = mdates.DateFormatter('%a')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(daysFmt)
Result
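Setting Saturdays to NaN leaves a blank gap rather than removing the day from the axis. If you want each Saturday actually cut out while Sundays and other gaps stay, one option (a sketch not taken from the answer above) is the custom ordinal axis the question itself proposes: subtract from each timestamp's day offset the number of Saturdays before it, counted here with np.busday_count and a Saturday-only weekmask, then place hand-made tick labels. The helper name saturday_free_positions is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hourly sample data starting on Monday 2018-01-01, like the answer's example
idx = pd.date_range('2018-01-01', periods=400, freq='60min')
data = pd.Series(np.random.default_rng(0).random(idx.size), index=idx)


def saturday_free_positions(times, origin):
    """Float day offsets from `origin`, with every Saturday cut out of the axis."""
    elapsed = np.asarray((times - origin) / pd.Timedelta(days=1))
    # Number of Saturdays in [origin's day, each timestamp's day)
    saturdays = np.busday_count(np.datetime64(origin.date()),
                                times.values.astype('datetime64[D]'),
                                weekmask='0000010')  # Mon..Sun: only Saturday counts
    return elapsed - saturdays


keep = data[data.index.weekday != 5]  # drop Saturday rows only
origin = keep.index[0].normalize()
x = saturday_free_positions(keep.index, origin)

fig, ax = plt.subplots(figsize=(12, 2.5))
ax.plot(x, keep.to_numpy())

# One tick at midnight of each remaining day, labelled with weekday and day of month
days = pd.date_range(origin, keep.index[-1].normalize(), freq='D')
days = days[days.weekday != 5]
ax.set_xticks(saturday_free_positions(days, origin))
ax.set_xticklabels(list(days.strftime('%a\n%d')))
```

Because the x values are no longer matplotlib dates, date locators such as DayLocator or RRuleLocator no longer apply, which is why the ticks are placed by hand.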

Formatting X axis labels Pandas time series plot

I am trying to plot a DataFrame of multiple time series in pandas. Each time series holds daily values for one year (length 365). The figure comes out fine, but I want to suppress the year tick label on the x-axis.
I want to suppress the 1950 label shown in the left corner of the x-axis. Can anybody suggest something? My code:
import matplotlib.pyplot as plt
import pandas
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array,  # values
                                  index=homo_regions)  # 1st column as index
dataframe1 = data_to_plot12.T  # transpose: one row per day, one column per region
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)

You should be able to specify the x-axis major formatter like so:
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
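A runnable sketch with made-up data; the question's data_array and homo_regions are not shown, so two hypothetical region columns stand in. It passes x_compat=True so pandas plots in matplotlib's own date units; without it, pandas uses its own period-based axis for regularly spaced data, and a matplotlib DateFormatter may then mislabel the ticks. A MonthLocator adds one tick on the first of each month.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range('1950-01-01', '1950-12-31', freq='D')
# Hypothetical stand-in for the question's data: two regions, 365 daily values
df = pd.DataFrame(np.random.default_rng(0).random((dates.size, 2)),
                  index=dates, columns=['region A', 'region B'])

# x_compat=True: matplotlib date units, so matplotlib locators/formatters apply
ax = df.plot(lw=1.5, marker='.', markersize=2, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))  # month names only, no year
ax.set(xlabel='Months', ylabel='PRECT (mm/day)')

ax.figure.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```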
