seaborn plot misplotting x axis dates from pandas - python

I am trying to plot a lineplot in seaborn with values for the y axis and dates for the x axis coming from a pandas dataframe.
When I create the lineplot without converting the date column to datetime object it puts the dates in the wrong order. When I do convert the date column to datetime format it gives strange x labels and only shows 5 of the dates.
df = pd.read_csv("dddd.csv")
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["date"])
ax = sns.lineplot(y=df["google"],x=df["date"],color="red",data=df)
plt.show()
I just want to plot the data with the dates as x labels, in order. Here is some example data.
25-03-2019 -100
26-03-2019 -66.66666667
27-03-2019 -80
28-03-2019 -87.08333333
29-03-2019 -88.88888889
30-03-2019 -86.28526646
31-03-2019 -87.5
01-04-2019 -87.87878788
02-04-2019 -82.92682927
03-04-2019 -84.09090909
04-04-2019 -84.7826087
05-04-2019 -85.71428571
06-04-2019 -81.30677848
07-04-2019 -81.98051948
08-04-2019 -82.14285714
09-04-2019 -78.46153846
10-04-2019 -76.05633803
11-04-2019 -75
12-04-2019 -75
13-04-2019 -80
14-04-2019 -83.33333333
15-04-2019 -83.33333333
16-04-2019 -77.77777778
17-04-2019 -68
18-04-2019 -54.70085471
19-04-2019 -64.70588235
20-04-2019 -66.66666667

Despite this question most certainly being a duplicate, here is some explanation:
Without converting to datetime, your x-axis is unordered because Python doesn't know that the values are dates; it treats them as plain strings. Here it probably draws a tick for every point, because it doesn't know how to fit "16-04-2019" on a continuous scale. (Try plotting an array of strings on the x-axis for comparison.)
If you do convert to datetime, seaborn and pandas are intelligent enough to notice that your dates form a continuous scale. Because the tick labels would overlap if all of them were drawn, it leaves some of them out and draws ticks at regular intervals instead.
You can influence the tick locations with ax.set_xticks(locations), their style with ax.tick_params(axis='x', rotation=90), and their labels with ax.set_xticklabels(labels, ha='left'). (The keywords are examples.)
You can also use
import matplotlib.dates as md
locs = md.YearLocator() # Locations for yearly ticks
fmt = md.DateFormatter('%B %Y') # Format for labels
to create locator and formatter objects, and ax.xaxis.set_major_locator(locs) / ax.xaxis.set_major_formatter(fmt) to apply them to your plot.
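As a minimal sketch of the locator and formatter approach, the following uses synthetic daily data over the same span as the question's sample. The weekly interval, the label format, and the variable names are illustrative assumptions, not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic daily data covering the same span as the question's sample (assumption)
dates = pd.date_range("2019-03-25", "2019-04-20", freq="D")
values = list(range(len(dates)))

fig, ax = plt.subplots()
ax.plot(dates, values, color="red")

locator = md.DayLocator(interval=7)    # a tick every seventh day
formatter = md.DateFormatter("%d %b")  # day of month and abbreviated month name
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

# Drawing the canvas makes the formatted label text available
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Swapping DayLocator for MonthLocator or YearLocator changes the tick spacing without touching the rest of the code.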
To simply draw a tick for every data point, in your case use:
ax.set_xticks(df["date"])
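Putting the pieces together, here is a hedged sketch using a few rows of the sample data, inlined rather than read from "dddd.csv". It assumes the column names "date" and "google" from the question, and it parses the dates with an explicit day-first format, since strings such as 01-04-2019 could otherwise be read as January 4th.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question, deliberately out of order
raw = io.StringIO(
    "date,google\n"
    "01-04-2019,-87.87878788\n"
    "30-03-2019,-86.28526646\n"
    "02-04-2019,-82.92682927\n"
    "31-03-2019,-87.5\n"
)
df = pd.read_csv(raw)
# Explicit day-first format, so 01-04-2019 is parsed as 1 April
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
df = df.sort_values("date").reset_index(drop=True)

fig, ax = plt.subplots()
ax.plot(df["date"], df["google"], color="red")
ax.set_xticks(mdates.date2num(df["date"]))  # one tick per data point
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Converting the tick positions with date2num is a defensive choice here; recent matplotlib versions also accept the datetime column directly.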

Related

How to add hourly ticks in an axis from datetime formatted data

I have a dataframe of daily temperature variation with time
time temp temp_mean
00:01:51.57 185.94 185.94
00:01:52.54 187.48 186.71
00:01:53.51 197.85 190.4233333
00:01:54.49 195.71 191.745
00:01:55.46 197.22 192.84
00:01:56.43 187.33 191.9216667
00:01:57.41 194.18 192.2442857
00:01:58.38 199.9 193.20125
00:01:59.35 184.23 192.2044444
00:02:00.33 201.34 193.118
00:02:01.30 200.12 193.7545455
00:02:02.27 199.13 194.2025
00:02:03.24 187.47 193.6846154
00:02:04.22 187.65 193.2535714
00:02:05.19 195.59 193.4093333
00:02:06.17 188.7 193.115
00:02:07.14 196.16 193.2941176
00:02:08.11 191.17 193.1761111
00:02:09.08 198.62 193.4626316
00:02:10.06 190.79 193.329
00:02:11.03 193.35 193.33
00:02:12.00 199.36 193.6040909
00:02:12.98 190.76 193.4804348
00:02:13.95 205.16 193.9670833
00:02:14.92 194.89 194.004
00:02:15.90 185.3 193.6692308
like this. (12000+ rows)
I want to plot time vs temp as a line plot, with hourly ticks on x-axis(1 hr interval).
But somehow I couldn't assign x ticks with proper frequency.
fig, ax = plt.subplots()
ax.plot(data['time'], data['temp'])
ax.plot(data['time'], data['temp_mean'],color='red')
xformatter = mdates.DateFormatter('%H:%M')
xlocator = mdates.HourLocator(interval = 1)
## Set xtick labels to appear every hour
ax.xaxis.set_major_locator(xlocator)
## Format xtick labels as HH:MM
ax.xaxis.set_major_formatter(xformatter)
fig.autofmt_xdate()
ax.tick_params(axis='x', rotation=45)
plt.show()
Here xticks seems to be crowded and overlapping, but I need ticks from 0:00 to 23:00 with one hour interval.
What should I do ?
Convert the 'time' column to a datetime dtype with pd.to_datetime, and then extract the time component with the .dt accessor.
See python datetime format codes to specify the format=... string.
Plot with pandas.DataFrame.plot
Tested in python 3.8.12, pandas 1.3.3, matplotlib 3.4.3
import pandas as pd
# sample data
data = {'time': ['00:01:51.57', '00:01:52.54', '00:01:53.51', '00:01:54.49', '00:01:55.46', '00:01:56.43', '00:01:57.41', '00:01:58.38', '00:01:59.35', '00:02:00.33', '00:02:01.30', '00:02:02.27', '00:02:03.24', '00:02:04.22', '00:02:05.19', '00:02:06.17', '00:02:07.14', '00:02:08.11', '00:02:09.08', '00:02:10.06', '00:02:11.03', '00:02:12.00', '00:02:12.98', '00:02:13.95', '00:02:14.92', '00:02:15.90'],
'temp': [185.94, 187.48, 197.85, 195.71, 197.22, 187.33, 194.18, 199.9, 184.23, 201.34, 200.12, 199.13, 187.47, 187.65, 195.59, 188.7, 196.16, 191.17, 198.62, 190.79, 193.35, 199.36, 190.76, 205.16, 194.89, 185.3],
'temp_mean': [185.94, 186.71, 190.4233333, 191.745, 192.84, 191.9216667, 192.2442857, 193.20125, 192.2044444, 193.118, 193.7545455, 194.2025, 193.6846154, 193.2535714, 193.4093333, 193.115, 193.2941176, 193.1761111, 193.4626316, 193.329, 193.33, 193.6040909, 193.4804348, 193.9670833, 194.004, 193.6692308]}
df = pd.DataFrame(data)
# convert column to datetime and extract time component
df.time = pd.to_datetime(df.time, format='%H:%M:%S.%f').dt.time
# plot
ax = df.plot(x='time', color=['tab:blue', 'tab:red'])
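For the hourly ticks the question asks about, an alternative sketch keeps the column as full datetimes (instead of extracting .dt.time) so that matplotlib's HourLocator can be applied. The data below is synthetic, one reading per minute over a single day, which is an assumption about the shape of the 12000+ row dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: time-of-day strings, one per minute over a full day (assumption)
stamps = pd.date_range("2000-01-01 00:00", "2000-01-01 23:59", freq="min")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "time": stamps.strftime("%H:%M:%S.%f"),
    "temp": 190 + rng.normal(size=stamps.size).cumsum(),
})

# Parsing without a date component places every value on 1900-01-01
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S.%f")

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df["time"], df["temp"])
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))   # one tick per hour
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # HH:MM labels
ax.set_xlim(df["time"].iloc[0], df["time"].iloc[-1])
fig.autofmt_xdate()

# Drawing the canvas makes the formatted label text available
fig.canvas.draw()
hour_labels = [t.get_text() for t in ax.get_xticklabels()]
```

If the real data only spans a few minutes, as in the sample rows, hourly ticks would leave the axis almost empty; a MinuteLocator or SecondLocator may suit that case better.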

multi colored rel or line plot in seaborn

In the dataframe below, I have unique compIds, each of which can have multiple capei values and multiple dates. This is primarily a time series dataset.
date capei compId
0 200401 25.123777 31946.0
1 200401 15.844910 29586.0
2 200401 20.524131 32507.0
3 200401 15.844910 29586.0
4 200401 15.844910 29586.0
... ... ... ...
73226 202011 9.372320 2817.0
73227 202011 9.372320 2817.0
73228 202011 22.334842 28581.0
73229 202011 10.761727 31946.0
73230 202011 30.205348 15029.0
With the following visualization code, I get the plot but the color of the line plots are not different. I wanted different colors.
import seaborn as sns
a4_dims = (15, 5)
sns.set_palette("vlag")
# plot
sns.set_style('ticks')
fig, ax = plt.subplots()
fig.set_size_inches(11.7, 8.27)
sns.relplot(x="date", ax=ax, y="capei", style='compId', kind='line',data=fDf, palette=sns.color_palette("Spectral", as_cmap=True) )
It generates an image like this
However, I am expecting a plot like this
compId in the generated figure (figure 1) would be the equivalent of Month in figure 2.
Figure 2 is a screenshot from here.
How can I get a different color for each compId in figure 1?
First of all, you should reformat your dataframe:
convert 'date' from str to datetime:
fDf['date'] = pd.to_datetime(fDf['date'], format='%Y%m')
convert 'compId' from float to str in order to be used as a categorical axis(1):
fDf['compId'] = fDf['compId'].apply(str)
Now you can pass 'compId' to seaborn.relplot as the hue and/or style parameter, depending on your preferences.
Complete Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
fDf = pd.read_csv(r'data/data.csv')
fDf['date'] = pd.to_datetime(fDf['date'], format='%Y%m')
fDf['compId'] = fDf['compId'].apply(str)
sns.set_palette("vlag")
sns.set_style('ticks')
sns.relplot(x="date", y="capei", style='compId', hue='compId', kind='line',data=fDf, estimator=None)
plt.show()
(plot drawn with a fake-dataframe)
(1) This step may or may not be necessary; given the context I suggest it is.
If you keep compId as a numerical type, then the hue in the plot will be proportional to the 'compId' value. This means 0.4 will have a color very different from 31946.0, but 31946.0 and 32507.0 will be practically indistinguishable by color.
If you convert compId to str, then the hue won't depend on the numerical value of compId, so the colors will be equally spaced among categories.
fDf['compId'] as it is
fDf['compId'].apply(str)

Ensuring first and last date ticks in x-axis - Matplotlib

Currently I am charting data from some historical point to a point in current time. For example, January 2019 to TODAY (February 2021). However, my matplotlib chart only shows dates from January 2019 to January 2021 on the x-axis (with the last February tick missing) even though the data is charted to today's date on the actual plot.
Is there any way to ensure that the first and last month is always reflected on the x-axis chart? In other words, I would like the x-axis to have the range displayed (inclusive).
Picture of x axis (missing February 2021)
The data charted here is from January 2019 to TODAY (February 12th).
Here is my code for the date format:
fig.autofmt_xdate()
date_format = mdates.DateFormatter("%b-%y")
ax.xaxis.set_major_formatter(date_format)
EDIT: The numbers after each month represent years.
I am not aware of any way to do this other than by creating the ticks from scratch.
In the following example, a list of all first-DatetimeIndex-timestamp-of-the-month is created from the DatetimeIndex of a pandas dataframe, starting from the month of the first date (25th of Jan.) up to the start of the last ongoing month. An appropriate number of ticks is automatically selected by the step variable and the last month is appended and then removed with np.unique when it is a duplicate. The labels are formatted from the tick timestamps.
This solution works for any frequency smaller than yearly:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
start_date = '2019-01-25'
end_date = '2021-02-12'
rng = np.random.default_rng(seed=123) # random number generator
dti = pd.date_range(start_date, end_date, freq='D')
variable = 100 + rng.normal(size=dti.size).cumsum()
df = pd.DataFrame(dict(variable=variable), index=dti)
# Create matplotlib plot
fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(df.index, df.variable)
# Create list of monthly ticks made of timestamp objects
monthly_ticks = [timestamp for idx, timestamp in enumerate(df.index)
                 if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
# Select appropriate number of ticks and include last month
step = 1
while len(monthly_ticks[::step]) > 10:
    step += 1
ticks = np.unique(np.append(monthly_ticks[::step], monthly_ticks[-1]))
# Create tick labels from tick timestamps
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx-1].year
          else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center');
As you can see, the first and last months are located at an irregular distance from the neighboring tick.
If you are plotting a time series with a discontinuous date range (e.g. weekends and holidays not included) and you are not using the DatetimeIndex for the x-axis (for example: ax.plot(range(df.index.size), df.variable)), so as to avoid gaps with straight lines showing up on short time series and/or very wide plots, then replace the last line of code with this:
plt.xticks([df.index.get_loc(tick) for tick in ticks], labels, rotation=0, ha='center');
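Matplotlib uses a limited number of ticks, and it just happens that no tick falls on February 2021. There are two things you could try. First, try setting the axis limits so that they extend to today (with start_date and end_date as datetime objects):
ax.set_xlim(start_date, end_date)
You could also try using more ticks:
ax.set_xticks(np.linspace(np.min(x), np.max(x), n_ticks))
Here n_ticks is the number of ticks and x holds the numeric values on the x-axis (for dates, for example the result of matplotlib.dates.date2num). np.linspace includes both endpoints, whereas np.arange takes a step size as its third argument and excludes the end value.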
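A shorter way to build explicit ticks that always include the first and last month is sketched below. It assumes month-start ticks are acceptable and thins them to every third month; the step of 3 and the variable names are illustrative choices, and the data is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily series over the range described in the question
dti = pd.date_range("2019-01-25", "2021-02-12", freq="D")
rng = np.random.default_rng(seed=123)
values = 100 + rng.normal(size=dti.size).cumsum()

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(dti, values)

# First day of every month, from the first month to the last one inclusive
first_month = dti[0].to_period("M").to_timestamp()
month_starts = pd.date_range(first_month, dti[-1], freq="MS")

# Keep every third month, then re-add the final month and drop duplicates
ticks = month_starts[::3].append(month_starts[-1:]).unique()

ax.set_xticks(mdates.date2num(ticks))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
ax.set_xlim(first_month, dti[-1])  # widen the view so the first tick is visible
fig.autofmt_xdate()

# Drawing the canvas makes the formatted label text available
fig.canvas.draw()
month_labels = [t.get_text() for t in ax.get_xticklabels()]
```

As with the longer approach, the last tick may sit closer to its neighbor than the others do.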

Add Pivot table columns and index as xticks and yticks

I have a pivot table created according to this: Color mapping of data on a date vs time plot and plot it with imshow(). I want to use the index and columns of the pivot table as yticks and xticks. The columns in my pivot table are dates and the index are time of the day.
data = pd.DataFrame()
data['Date']=Tgrad_GFAVD_3m2mRot.index.date
data['Time']=Tgrad_GFAVD_3m2mRot.index.strftime("%H")
data['Tgrad']=Tgrad_GFAVD_3m2mRot.values
C = data.pivot(index='Time', columns='Date', values='Tgrad')
print(C.head()):
Date 2016-08-01 2016-08-02 2016-08-03 2016-08-04 2016-08-05 2016-08-06 \
Time
00 -0.841203 -0.541871 -0.042984 -0.867929 -0.790869 -0.940757
01 -0.629176 -0.520935 -0.194655 -0.866815 -0.794878 -0.910690
02 -0.623608 -0.268820 -0.255457 -0.859688 -0.824276 -0.913808
03 -0.615145 -0.008241 -0.463920 -0.909354 -0.811136 -0.878619
04 -0.726949 -0.169488 -0.529621 -0.897773 -0.833408 -0.825612
I plot the pivot table with
fig, ax = plt.subplots(figsize = (16,9))
plt = ax.imshow(C,aspect = 'auto', extent=[0,len(data["Date"]),0,23], origin = "lower")
I tried a couple of things but nothing worked. At the moment my xticks range between 0 and 6552, which is the length of the C.columns object and is set by the extent argument in imshow()
I would like to have the xticks at every first of the month but not by index number but as a datetick in the format "2016-08-01" for example.
I am sure it was just a small thing that has been stopping me the last hour, but now I give up. Do you know how to set the xticks accordingly?
I found the solution myself after trying one more thing. I created another column in the "data" dataframe with datenum entries instead of dates:
data["datenum"]=mdates.date2num(data["Date"])
Then changed the plot line to:
pl = ax.imshow(C,aspect = 'auto',
               extent=[data["datenum"].iloc[0],data["datenum"].iloc[-1],data["Time"].iloc[0],data["Time"].iloc[-1]],
               origin = "lower")
So the change of the extent argument provided the datenum values to the plot instead of the index of the date column.
Then with this the following lines worked:
ax.set_yticks(data["Time"]) # sets yticks
ax.xaxis_date() # tells the xaxis that it should expect datetime values
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d") ) # formats the datetime values
fig.autofmt_xdate() # makes it look nice
Best,
Vroni
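The same idea can be sketched end to end on synthetic data. The pivot table below is a random stand-in with hours 0 to 23 as the index and three months of days as columns, and the MonthLocator is an added assumption to get a tick on the first of each month, as the question asks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the pivot table: rows are hours, columns are days
days = pd.date_range("2016-08-01", "2016-10-31", freq="D")
rng = np.random.default_rng(0)
C = pd.DataFrame(rng.normal(size=(24, days.size)), index=range(24), columns=days)

# Convert the first and last column dates to matplotlib date numbers
x0, x1 = mdates.date2num(C.columns[[0, -1]])

fig, ax = plt.subplots(figsize=(10, 4))
im = ax.imshow(C.to_numpy(), aspect="auto", origin="lower",
               extent=[x0, x1, 0, 23])
ax.xaxis_date()                                            # treat x values as dates
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=1))  # first of each month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_yticks(range(0, 24, 4))                             # a tick every four hours
fig.autofmt_xdate()

# Drawing the canvas makes the formatted label text available
fig.canvas.draw()
date_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Note that extent describes the outer edges of the image, so with this simple choice each column is drawn slightly narrower than one day; that is usually invisible at this scale.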

Formatting X axis labels Pandas time series plot

I am trying to plot a dataframe of multiple time series in pandas. Each series covers one year of daily points (length 365). The figure comes out all right, but I want to suppress the year tick shown on the x axis.
I want to suppress the 1950 label shown in the left corner of the x axis. Can anybody suggest something? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to specify the xaxis major formatter like so
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
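One caveat worth checking: by default pandas applies its own date handling to time series plots, and matplotlib formatters may not line up with it. The sketch below passes x_compat=True to DataFrame.plot so that matplotlib's locator and formatter apply; the two random columns and their names are placeholders for the real regional data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data: two random daily series for the year 1950
dates = pd.date_range("1950-01-01", "1950-12-31", freq="D")
rng = np.random.default_rng(0)
dataframe1 = pd.DataFrame(
    {"region_a": rng.random(dates.size), "region_b": rng.random(dates.size)},
    index=dates,
)

# x_compat=True turns off pandas' own tick handling for dates
ax = dataframe1.plot(lw=1.5, marker=".", markersize=2, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())           # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))    # month name only, no year
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")

# Drawing the canvas makes the formatted label text available
ax.get_figure().canvas.draw()
month_names = [t.get_text() for t in ax.get_xticklabels()]
```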
