Weird tick spacing when plotting with time on x-axis - python

I'm plotting a pandas time series, which works OK when plotting the timestamps on the x-axis...
import pandas as pd
import numpy as np
# pandas >= 1.4 spelling; older versions use closed="left" and freq='15T'
ts = pd.date_range("2020-01-01", "2020-01-02", freq='15min', inclusive="left")
vals = np.random.rand(len(ts))
pd.Series(vals, ts).plot()
...but which gives me a very unnatural tick spacing when plotting only the time part on the x-axis:
pd.Series(vals, ts.time).plot()
How can I turn the x-axis of this second plot into something more human-readable?

Plot the full timestamps with matplotlib and set a formatter on the x-axis that shows only the time part:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.plot(ts, vals)
myFmt = mdates.DateFormatter('%H:%M')
plt.gca().xaxis.set_major_formatter(myFmt)
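The formatter above only controls the label text; tick placement is still left to matplotlib's defaults. As a minimal sketch (the three-hour spacing is an arbitrary choice of mine, not something from the question), an HourLocator can be combined with the same DateFormatter to pin ticks to round hours:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same data as in the question: one day in 15-minute steps, right endpoint dropped
ts = pd.date_range("2020-01-01", "2020-01-02", freq="15min")[:-1]
vals = np.random.rand(len(ts))

fig, ax = plt.subplots()
ax.plot(ts, vals)

# Ticks at 00:00, 03:00, 06:00, ... (assumed spacing, adjust to taste)
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 3)))
# Show only the time part of each timestamp
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()

# Force a draw so the tick labels are populated and can be inspected
fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Passing byhour explicitly keeps the ticks aligned to multiples of three hours regardless of where the data starts.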

Related

matplotlib bar chart with overlapping dates

I am plotting a simple bar chart using pandas/matplotlib. The x-axis is a datetime index. There are so many datapoints that the labels overlap. Is there an easy solution for this problem, no matter if I have daily, weekly, monthly, or yearly data?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
idx = pd.date_range("2015-01-01", "2021-09-30", freq="b")
data = np.random.randn(len(idx))
df = pd.DataFrame(data={"returns": data}, index=idx)
df.plot(kind="bar")
plt.show()
Use DateFormatter to customize the x-axis, but let Matplotlib handle the figure rather than pandas:
import matplotlib.dates as mdates
# ...
fig, ax = plt.subplots(figsize=(15, 7))
ax.bar(df.index, df['returns'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
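YearLocator suits this multi-year example. For the "no matter if I have daily, weekly, monthly, or yearly data" part of the question, one possible approach (a sketch, assuming matplotlib 3.1 or newer, where ConciseDateFormatter is available) is to let AutoDateLocator choose the spacing from the plotted range. The helper name plot_dated_bars is hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_dated_bars(series, ax):
    """Hypothetical helper: draw bars at real date positions with automatic date ticks."""
    ax.bar(series.index, series.values)
    locator = mdates.AutoDateLocator()  # picks years/months/days from the visible range
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
    return ax


# Same kind of data as in the question: business-daily random returns
idx = pd.date_range("2015-01-01", "2021-09-30", freq="B")
returns = pd.Series(np.random.randn(len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(15, 7))
plot_dated_bars(returns, ax)
fig.canvas.draw()
n_ticks = len(ax.get_xticks())
print(n_ticks)  # a handful of ticks instead of one label per bar
```

The same helper can be called with a weekly or monthly series; only the tick spacing chosen by the locator should change.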

DateFormatter is bringing 1970 as year not the original year in the dataset

I am trying to plot time series data, but the x-axis ticks are not coming out the way they should. I wanted month and year as the x-axis ticks. Here is my code:
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
fig,ax = plt.subplots()
df_month.loc['2017', "Volume"].plot.bar(color='blue', ax=ax)
ax.set_ylabel("Volume")
ax.set_title("Volume")
date_form = DateFormatter("%y-%m")
ax.xaxis.set_major_formatter(date_form)
plt.xticks(rotation=45)
plt.show()
The output looks like this
What am I doing wrong? Please help.
My dataset looks like this:
Here is df_month data:
The following gives the right x-axis labels.
Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
Example data
df_month = pd.DataFrame({'Date':['2006-01-03', '2006-02-04', '2006-02-08'], 'Volume':[24232729, 20553479, 20500000]})
df_month['Date'] = pd.to_datetime(df_month['Date'])
Plotting
fig,ax = plt.subplots()
ax.set_ylabel("Volume")
ax.set_title("Volume")
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.bar(df_month['Date'], df_month['Volume'])
plt.xticks(df_month['Date'], rotation=90)
plt.show()
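A likely explanation for the 1970 labels in the question: a pandas bar plot places the bars at integer positions 0, 1, 2, ... and a DateFormatter reads those integers as days since matplotlib's epoch (1970-01-01 by default in matplotlib 3.3 and later). The sketch below, which assumes matplotlib 3.3+ and reuses the example data with the dates as index, inspects those positions and shows an alternative fix that keeps the pandas bar plot and writes the labels from the index:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df_month = pd.DataFrame(
    {"Volume": [24232729, 20553479, 20500000]},
    index=pd.to_datetime(["2006-01-03", "2006-02-04", "2006-02-08"]),
)

fig, ax = plt.subplots()
df_month["Volume"].plot.bar(ax=ax)

# pandas puts one bar per row at positions 0, 1, 2, ... (not at date values)
positions = [float(p) for p in ax.get_xticks()]
print(positions)

# What a DateFormatter makes of position 0: a date at the epoch,
# which is 1970-01 with the default epoch of matplotlib 3.3+
misread = mdates.DateFormatter("%Y-%m")(positions[0])
print(misread)

# Alternative fix: keep the categorical axis and format the index yourself
ax.set_xticklabels(df_month.index.strftime("%Y-%m").tolist(), rotation=45)
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```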

Plotting more than 10K data point using Seaborn for x-axis as timestamp

I am trying to plot more than 10k data points, where I want to plot a data properties versus Timestamp. But on the x-axis the timestamps are overlapping and not visible.
How can I reduce the amount of labels on the x-axis, so that they are legible?
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
sns.set_style("whitegrid")
data = pd.read_csv('0912Testday4.csv',header=2)
for i in data.columns:
    if i!='TIMESTAMP':
        sns.lineplot(x="TIMESTAMP",y=i,data = data)
        plt.title(f"{i} vs TIMESTAMP")
        plt.show()
Example plot demonstrating the problem:
Update: TIMESTAMP was in string format; converting it to datetime format resolves the problem.
data['TIMESTAMP'] = pd.to_datetime(data['TIMESTAMP'])
Please make sure that TIMESTAMP is a datetime object. This should not happen when the x axis is a datetime. (You can use pd.to_datetime to convert int, float, str, and ... to datetime.)
If TIMESTAMP is a datetime, you can use the autofmt_xdate() method:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots() # Create a figure and a set of subplots.
sns.set_style("whitegrid")
data = pd.read_csv('0912Testday4.csv',header=2)
# Use the following line if the TIMESTAMP is not a datetime.
# (You may need to change the format from "%Y-%m-%d %H:%M:%S+00:00".)
# data['TIMESTAMP'] = pd.to_datetime(data.TIMESTAMP, format="%Y-%m-%d %H:%M:%S+00:00")
for i in data.columns:
    if i!='TIMESTAMP':
        sns.lineplot(x="TIMESTAMP", y=i, data=data, ax=ax)
        fig.autofmt_xdate() # rotate and right align date ticklabels
        plt.title(f"{i} vs TIMESTAMP")
        plt.show()
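To check whether the column really is a datetime before plotting, something like the following sketch can be used. The three-row frame is made-up stand-in data, since the original CSV is not available:

```python
import pandas as pd

# Made-up stand-in for the CSV: timestamps stored as plain strings
data = pd.DataFrame(
    {
        "TIMESTAMP": ["2020-09-12 00:00:00", "2020-09-12 00:00:10", "2020-09-12 00:00:20"],
        "value": [1.0, 2.0, 3.0],
    }
)

# String columns are treated as categories on the x-axis, one label per row
was_datetime = pd.api.types.is_datetime64_any_dtype(data["TIMESTAMP"])

# Convert in place; afterwards the axis can use date-aware tick placement
data["TIMESTAMP"] = pd.to_datetime(data["TIMESTAMP"])
is_datetime = pd.api.types.is_datetime64_any_dtype(data["TIMESTAMP"])

print(was_datetime, is_datetime)  # False True
```

Passing parse_dates=["TIMESTAMP"] to pd.read_csv may achieve the same conversion at load time, depending on how the timestamps are written in the file.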
I didn't encounter this problem with sns.lineplot:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
# example data
time_stamps = pd.date_range('2019-01-01', '2020-01-01', freq='H')
vals =[np.random.randint(0, 1000) for i in time_stamps]
data_df = pd.DataFrame()
data_df['time'] = time_stamps
data_df['value'] = vals
print(data_df.shape)
# plotting
fig, ax = plt.subplots()
sns.lineplot(x='time', y='value', data=data_df)
plt.show()
sns automatically selects the x ticks and x labels.
Alternatively, you can use ax.set_xticks and ax.set_xticklabels to set the x ticks and x labels manually.
You can also use fig.autofmt_xdate() to rotate the x labels.
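A minimal sketch of setting the ticks by hand. It uses plain matplotlib with daily example data to stay short; sns.lineplot returns a matplotlib Axes, so the same calls should apply there. The quarterly spacing and the label format are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Example data: one random value per day for a year
time_stamps = pd.date_range("2019-01-01", "2020-01-01", freq="D")
values = np.random.randint(0, 1000, size=len(time_stamps))

fig, ax = plt.subplots()
ax.plot(time_stamps, values)

# One tick at the start of every third month (assumed spacing)
tick_dates = pd.date_range("2019-01-01", "2020-01-01", freq="3MS")
ax.set_xticks(mdates.date2num(tick_dates.to_pydatetime()))
ax.set_xticklabels(tick_dates.strftime("%b %Y").tolist())
fig.autofmt_xdate()  # rotate and right-align the labels

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```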

datetime x-axis matplotlib labels causing uncontrolled overlap

I'm trying to plot a pandas series with a 'pandas.tseries.index.DatetimeIndex'. The x-axis labels stubbornly overlap, and I cannot make them presentable, even with several suggested solutions.
I tried a Stack Overflow solution suggesting autofmt_xdate, but it doesn't help.
I also tried the suggestion to use plt.tight_layout(), which has no effect.
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar")
ax.figure.autofmt_xdate()
#plt.tight_layout()
print(type(test_df[(test_df.index.year ==2017) ]['error'].index))
UPDATE: That I'm using a bar chart is an issue. A regular time-series plot shows nicely-managed labels.
A pandas bar plot is a categorical plot. It shows one bar for each index at integer positions on the scale. Hence the first bar is at position 0, the next at 1 etc. The labels correspond to the dataframe's index. If you have 100 bars, you'll end up with 100 labels. This makes sense because pandas cannot know if those should be treated as categories or ordinal/numeric data.
If instead you use a normal matplotlib bar plot, it will treat the dataframe index numerically. This means the bars have their position according to the actual dates and labels are placed according to the automatic ticker.
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
# pd.datetime was removed in pandas 2.0; a date string works in all versions
datelist = pd.date_range('2017-01-01', periods=42).tolist()
df = pd.DataFrame(np.cumsum(np.random.randn(42)),
columns=['error'], index=pd.to_datetime(datelist))
plt.bar(df.index, df["error"].values)
plt.gcf().autofmt_xdate()
plt.show()
The advantage is then in addition that matplotlib.dates locators and formatters can be used. E.g. to label each first and fifteenth of a month with a custom format,
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# pd.datetime was removed in pandas 2.0; a date string works in all versions
datelist = pd.date_range('2017-01-01', periods=93).tolist()
df = pd.DataFrame(np.cumsum(np.random.randn(93)),
columns=['error'], index=pd.to_datetime(datelist))
plt.bar(df.index, df["error"].values)
plt.gca().xaxis.set_major_locator(mdates.DayLocator((1,15)))
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%d %b %Y"))
plt.gcf().autofmt_xdate()
plt.show()
In your situation, the easiest would be to manually create labels and spacing, and apply that using ax.xaxis.set_major_formatter.
Here's a possible solution:
Since no sample data was provided, I tried to mimic the structure of your dataset in a dataframe with some random numbers.
The setup:
# imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
# A dataframe with random numbers to run tests on
np.random.seed(123456)
rows = 100
df = pd.DataFrame(np.random.randint(-10,10,size=(rows, 1)), columns=['error'])
# pd.datetime was removed in pandas 2.0; a date string works in all versions
datelist = pd.date_range('2017-01-01', periods=rows).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
test_df = df.copy(deep = True)
# Plot of data that mimics the structure of your dataset
# Pass figsize to plot(); calling plt.figure() afterwards would open a second, empty figure
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar", figsize=(15,8))
ax.figure.autofmt_xdate()
A possible solution:
test_df = df.copy(deep = True)
# Pass figsize to plot(); calling plt.figure() afterwards would open a second, empty figure
ax = test_df[(test_df.index.year ==2017) ]['error'].plot(kind="bar", figsize=(15,8))
# Make a list of empty myLabels
myLabels = ['']*len(test_df.index)
# Set labels on every 20th element in myLabels
myLabels[::20] = [item.strftime('%Y - %m') for item in test_df.index[::20]]
ax.xaxis.set_major_formatter(ticker.FixedFormatter(myLabels))
plt.gcf().autofmt_xdate()
# Tilt the labels
plt.setp(ax.get_xticklabels(), rotation=30, fontsize=10)
plt.show()
You can easily change the formatting of labels by checking strftime.org
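A variation on the same idea, as a sketch: instead of blanking most of the labels, keep only every 20th tick on the categorical axis and label those from the index. The data mimics the random frame above, and the step and date format are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random data with the same structure as above
rows = 100
idx = pd.date_range("2017-01-01", periods=rows)
test_df = pd.DataFrame({"error": np.random.randint(-10, 10, size=rows)}, index=idx)

ax = test_df["error"].plot(kind="bar", figsize=(15, 8))

# The bars sit at integer positions 0..rows-1; keep a tick at every 20th bar
step = 20
positions = list(range(0, rows, step))
ax.set_xticks(positions)
ax.set_xticklabels(
    [d.strftime("%Y-%m-%d") for d in test_df.index[::step]],
    rotation=30,
)

kept_labels = [t.get_text() for t in ax.get_xticklabels()]
print(kept_labels)
```

This also removes the tick marks between the labelled bars, which the blank-label approach leaves in place.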

Polar chart of yearly data with matplotlib

I am trying to plot data for a whole year as a polar chart in matplotlib, and having trouble locating any examples of this. I managed to convert the dates from pandas according to this thread, but I can't wrap my head around (literally) the y-axis, or theta.
This is how far I got:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
times = pd.date_range("01/01/2016", "12/31/2016")
rand_nums = np.random.rand(len(times),1)
df = pd.DataFrame(index=times, data=rand_nums, columns=['A'])
ax = plt.subplot(projection='polar')
ax.set_theta_direction(-1)
ax.set_theta_zero_location("N")
ax.plot(mdates.date2num(df.index.to_pydatetime()), df['A'])
plt.show()
which gives me this plot:
Reducing the date range to understand what is going on,
times = pd.date_range("01/01/2016", "01/05/2016")
I get this plot:
I gather that the start of the series is between 90 and 135, but how can I 'remap' this so that my year date range starts and finishes at the north origin?
The polar plot's angle ranges over a full circle in radians, i.e. [0, 2π].
One would therefore need to normalize the date range to the full circle.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
times = pd.date_range("01/01/2016", "12/31/2016")
rand_nums = np.random.rand(len(times),1)
df = pd.DataFrame(index=times, data=rand_nums, columns=['A'])
ax = plt.subplot(projection='polar')
ax.set_theta_direction(-1)
ax.set_theta_zero_location("N")
t = mdates.date2num(df.index.to_pydatetime())
y = df['A']
tnorm = (t-t.min())/(t.max()-t.min())*2.*np.pi
ax.fill_between(tnorm,y ,0, alpha=0.4)
ax.plot(tnorm,y , linewidth=0.8)
plt.show()
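With the dates mapped to angles, the default degree labels on the theta axis can be replaced by month names. The sketch below is one way to do that and makes one change of my own to the normalization: it divides by the span plus one day (366 days for 2016), so that 31 December does not land exactly on top of 1 January.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

times = pd.date_range("2016-01-01", "2016-12-31")
values = np.random.rand(len(times))

# Map each day to an angle; the full circle corresponds to the whole year
t = mdates.date2num(times.to_pydatetime())
year_length = t.max() - t.min() + 1  # days in the year, counting the last day
theta = (t - t.min()) / year_length * 2.0 * np.pi

# Angles of the first day of each month, used as tick positions
month_starts = pd.date_range("2016-01-01", "2016-12-01", freq="MS")
month_t = mdates.date2num(month_starts.to_pydatetime())
month_theta = (month_t - t.min()) / year_length * 2.0 * np.pi

ax = plt.subplot(projection="polar")
ax.set_theta_direction(-1)        # clockwise
ax.set_theta_zero_location("N")   # 1 January at the top
ax.plot(theta, values, linewidth=0.8)
ax.set_xticks(month_theta)
ax.set_xticklabels(month_starts.strftime("%b").tolist())
```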
