I am currently working on visualizing datasets with Seaborn and Pandas. I have some time-dependent data that I would like to graph in bar charts.
However, I am battling with two issues in Seaborn:
Formatting dates on the x-axis
Only showing a handful of dates (as it doesn't make sense to have every day labeled on a 6 month graph)
I have found a solution for my issues in normal Matplotlib, which is:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
N = 20
np.random.seed(2022)
dates = pd.date_range('1/1/2014', periods=N, freq='m')
df = pd.DataFrame(
data={'dt':dates, 'val': np.random.randn(N)}
)
fig, ax = plt.subplots(figsize=(10, 6))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
ax.bar(df['dt'], df['val'], width=25, align='center')
However, I already have most of my graphs done in Seaborn, and I would like to stay consistent. Once I convert the previous code into Seaborn, I lose the ability to format the dates:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
N = 20
np.random.seed(2022)
dates = pd.date_range('1/1/2014', periods=N, freq='m')
df = pd.DataFrame(
data={'dt':dates, 'val': np.random.randn(N)}
)
fig, ax = plt.subplots(1,1)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%y-%m'))
sns.barplot(x='dt', y='val', data=df)
fig.autofmt_xdate()
When I run the code, the date format remains unchanged and I can't locate any dates with DateLocator.
Is there any way for me to format my X-Axis for dates in Seaborn in a way similar to Matplotlib with DateLocator and DateFormatter?
No, you cannot use seaborn.barplot in conjunction with matplotlib.dates ticking. The reason is that the ticks for seaborn barplots are at integer positions (0,1,..., N-1). So they cannot be interpreted as dates.
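A quick way to check this yourself is to print the tick positions that seaborn.barplot puts on the axis. This is only a minimal sketch: it assumes seaborn is installed and uses a non-interactive Matplotlib backend, and the small DataFrame is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

N = 6
# Illustrative data: six month-start dates with arbitrary values
dates = pd.date_range("2014-01-01", periods=N, freq="MS")
df = pd.DataFrame({"dt": dates, "val": np.arange(N, dtype=float)})

fig, ax = plt.subplots()
sns.barplot(x="dt", y="val", data=df, ax=ax)

# barplot treats x as categorical, so the ticks are category indices
positions = [int(t) for t in ax.get_xticks()]
print(positions)
```

The positions are plain category indices rather than Matplotlib date numbers, which is why a DateFormatter has nothing sensible to format.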
You have three options:
Use seaborn, and loop through the labels and set them to anything you want
Not use seaborn and have the advantages (and disadvantages) of matplotlib.dates tickers available.
Change the format in the dataframe prior to plotting.
Tested in python 3.10, pandas 1.5.0, matplotlib 3.5.2, seaborn 0.12.0
N = 20
np.random.seed(2022)
dates = pd.date_range('1/1/2014', periods=N, freq='m')
df = pd.DataFrame(data={'dates': dates, 'val': np.random.randn(N)})
# change the datetime format in the dataframe prior to plotting
df.dates = df.dates.dt.strftime('%Y-%m')
fig, ax = plt.subplots(1,1)
sns.barplot(x='dates', y='val', data=df)
xticks = ax.get_xticks()
xticklabels = [x.get_text() for x in ax.get_xticklabels()]
_ = ax.set_xticks(xticks, xticklabels, rotation=90)
N = 20
np.random.seed(2022)
dates = pd.date_range('1/1/2014', periods=N, freq='m')
df = pd.DataFrame(data={'dates': dates, 'val': np.random.randn(N)})
df.dates = df.dates.dt.strftime('%Y-%m')
fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(x='dates', y='val', data=df)
xticks = ax.get_xticks()
xticklabels = [x.get_text() if not i%2 == 0 else '' for i, x in enumerate(ax.get_xticklabels())]
_ = ax.set_xticks(xticks, xticklabels)
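The same idea can be wrapped in a small function so the step is adjustable. This is a sketch under assumptions: `label_every_nth` is a hypothetical helper name, not part of seaborn or Matplotlib, and a plain `ax.bar` at integer positions stands in for the bars seaborn would draw.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def label_every_nth(ax, labels, step=2):
    """Hypothetical helper: show every `step`-th label on a categorical x-axis."""
    positions = list(range(len(labels)))  # categorical bars sit at 0..N-1
    shown = [lab if i % step == 0 else "" for i, lab in enumerate(labels)]
    ax.set_xticks(positions)
    ax.set_xticklabels(shown, rotation=90)
    return shown


dates = pd.date_range("2014-01-01", periods=8, freq="MS")  # month starts
values = np.arange(8, dtype=float)

fig, ax = plt.subplots()
ax.bar(range(len(dates)), values)  # stand-in for the bars seaborn would draw
shown = label_every_nth(ax, list(dates.strftime("%Y-%m")), step=3)
print(shown)
```

Because the labels are built from the original datetime values, the date format and the spacing can be changed independently of the DataFrame.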
I am plotting a simple bar chart using pandas/matplotlib. The x-axis is a datetime index. There are so many datapoints that the labels overlap. Is there an easy solution for this problem, no matter if I have daily, weekly, monthly, or yearly data?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
idx = pd.date_range("2015-01-01", "2021-09-30", freq="b")
data = np.random.randn(len(idx))
df = pd.DataFrame(data={"returns": data}, index=idx)
df.plot(kind="bar")
plt.show()
Use DateFormatter to customize the x-axis, but let Matplotlib handle the figure rather than pandas:
import matplotlib.dates as mdates
# ...
fig, ax = plt.subplots(figsize=(15, 7))
ax.bar(df.index, df['returns'])
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
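Since the question asks for something that works for daily, weekly, monthly, or yearly data alike, one possible variation is to let Matplotlib pick the tick spacing with AutoDateLocator and label the ticks with ConciseDateFormatter (available in Matplotlib 3.1 and later). This is a sketch with random data and an arbitrary seed, not a drop-in replacement for the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Business-day index like the one in the question, with random returns
idx = pd.date_range("2015-01-01", "2021-09-30", freq="B")
rng = np.random.default_rng(0)
df = pd.DataFrame({"returns": rng.standard_normal(len(idx))}, index=idx)

fig, ax = plt.subplots(figsize=(15, 7))
ax.bar(df.index, df["returns"], width=1.0)

# The locator chooses a spacing that fits the visible date range
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

fig.canvas.draw()
n_ticks = len(ax.get_xticks())
print(n_ticks)
```

The number of ticks stays small regardless of how many bars are drawn, because the locator adapts to the span of the axis rather than to the number of data points.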
I have a simple DataFrame with the time as the index and dummy values as an example.
I did a simple scatter plot as you see here:
Simple question: How to adjust the xaxis, so that all time values from 00:00 to 23:00 are visible in the xaxis? The rest of the plot is fine, it shows all the datapoints, it is just the labeling. Tried different things but didn't work out.
All my code so far is:
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from datetime import time
data = []
for i in range(0, 24):
    temp_list = []
    temp_list.append(time(i))
    temp_list.append(i)
    data.append(temp_list)
my_df = pd.DataFrame(data, columns=["time", "values"])
my_df.set_index(['time'],inplace=True)
my_df
fig = sns.scatterplot(my_df.index, my_df['values'])
fig.set(xlabel='time', ylabel='values')
I think you're gonna have to go down to the matplotlib level for this:
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
from datetime import time
import matplotlib.pyplot as plt
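data = []
for i in range(0, 24):
    temp_list = []
    temp_list.append(time(i))
    temp_list.append(i)
    data.append(temp_list)
df = pd.DataFrame(data, columns=["time", "values"])
# datetime.time objects are not parsed directly, so convert them to strings first
df["time"] = pd.to_datetime(df["time"].astype(str), format='%H:%M:%S')
df.set_index(['time'],inplace=True)
# newer seaborn releases expect x and y as keyword arguments
ax = sns.scatterplot(x=df.index, y=df["values"])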
ax.set(xlabel="time", ylabel="measured values")
ax.set_xlim(df.index[0], df.index[-1])
ax.xaxis.set_major_locator(mdates.HourLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
ax.tick_params(axis="x", rotation=45)
This produces
I think you have two options:
Convert the time to the hour only. To do that, extract the hour into a new column in your df:
df['hour_'] = [t.hour for t in df.index]  # assumes the index holds datetime.time values, as in the question
Then use that column as your x-axis.
If you need the time in the format you described, the timestamps may overlap each other and become hard to read. For that I use:
plt.xticks(rotation=45, horizontalalignment='right')
ax.xaxis.set_major_locator(plt.MaxNLocator(12))
So first I rotate the text, then I limit the number of ticks.
Here is a full script where I used it:
sns.set()
sns.set_style("whitegrid")
sns.axes_style("whitegrid")
for k, g in df_forPlots.groupby('your_column'):
    fig = plt.figure(figsize=(10,5))
    wide_df = g[['x', 'y', 'z']]
    wide_df.set_index(['x'], inplace=True)
    ax = sns.lineplot(data=wide_df)
    plt.xticks(rotation=45,
               horizontalalignment='right')
    ax.yaxis.set_major_locator(plt.MaxNLocator(14))
    ax.xaxis.set_major_locator(plt.MaxNLocator(35))
    plt.title(f"your {k} in something{g.z.unique()}")
    plt.tight_layout()
Hope I helped.
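The first option (plotting against the hour instead of a time object) could look roughly like this. It is a sketch using plain Matplotlib and data shaped like the question's; the column name `hour` and the "HH:00" label format are my own choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from datetime import time

# Same shape as the question's data: one datetime.time per hour
df = pd.DataFrame({"time": [time(h) for h in range(24)], "values": range(24)})

# Extract the integer hour so the x-axis is an ordinary numeric axis
df["hour"] = [t.hour for t in df["time"]]

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(df["hour"], df["values"])

# One tick per hour, labelled as HH:00 and rotated to avoid overlap
labels = [f"{h:02d}:00" for h in range(24)]
ax.set_xticks(range(24))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set(xlabel="time", ylabel="values")
fig.tight_layout()
```

With integer hours there is no date conversion involved at all, so every tick from 00:00 to 23:00 can be placed explicitly.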
I am trying to plot a bar chart of the date versus the price of a cryptocurrency from a DataFrame with 731 daily samples. When I plot the graph I get the image linked below. Because of the number of dates the x-axis is unreadable, and I would like it to label only the 1st of every month.
This is the graph I currently have: https://imgur.com/a/QVNn4Zp
I have tried other methods I found online, on Stack Overflow and in other sources such as YouTube, but had no success.
This is the code I have so far to plot the bar chart.
df.plot(kind='bar',x='Date',y='Price in USD (at 00:00:00 UTC)',color='red')
plt.show()
One option is to plot a numeric barplot with matplotlib.
Matplotlib < 3.0
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
start = pd.to_datetime("5-1-2012")
idx = pd.date_range(start, periods= 365)
df = pd.DataFrame({'Date': idx, 'A':np.random.random(365)})
fig, ax = plt.subplots()
dates = mdates.date2num(df["Date"].values)
ax.bar(dates, df["A"], width=1)
loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(loc)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
plt.show()
Matplotlib >= 3.0
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
pd.plotting.register_matplotlib_converters()
start = pd.to_datetime("5-1-2012")
idx = pd.date_range(start, periods= 365)
df = pd.DataFrame({'Date': idx, 'A':np.random.random(365)})
fig, ax = plt.subplots()
ax.bar(df["Date"], df["A"], width=1)
plt.show()
Further options:
For other options see Pandas bar plot changes date format
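To get a label only on the 1st of every month, as the question asks, a MonthLocator can be placed on top of the datetime bar plot. This is a minimal sketch assuming a recent Matplotlib that accepts datetime values in `ax.bar`; the data is random and the `%Y-%m-%d` format is just one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2012-05-01", periods=365)
df = pd.DataFrame({"Date": idx, "A": np.random.random(365)})

fig, ax = plt.subplots()
ax.bar(df["Date"], df["A"], width=1)  # width is measured in days

# Put one major tick on the first day of each month
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()  # rotate the labels so they do not collide

fig.canvas.draw()
tick_dates = mdates.num2date(ax.get_xticks())
```

`tick_dates` holds the tick positions converted back to datetimes, which is a convenient way to confirm where the locator actually placed them.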
This code plots candlesticks with moving averages, but the x-axis shows the index; I need the x-axis in dates.
What changes are required?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_finance import candlestick2_ohlc
#date format in data-> dd-mm-yyyy
nif = pd.read_csv('data.csv')
#nif['Date'] = pd.to_datetime(nif['Date'], format='%d-%m-%Y', utc=True)
mavg = nif['Close'].ewm(span=50).mean()
mavg1 = nif['Close'].ewm(span=13).mean()
fg, ax1 = plt.subplots()
cl = candlestick2_ohlc(ax=ax1,opens=nif['Open'],highs=nif['High'],lows=nif['Low'],closes=nif['Close'],width=0.4, colorup='#77d879', colordown='#db3f3f')
mavg.plot(ax=ax1,label='50_ema')
mavg1.plot(color='k',ax=ax1, label='13_ema')
plt.legend(loc=4)
plt.subplots_adjust(left=0.09, bottom=0.20, right=0.94, top=0.90, wspace=0.2, hspace=0)
plt.show()
Output:
I also had a lot of "fun" with this in the past... Here is one way of doing it using mdates:
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
import matplotlib.pyplot as plt
# matplotlib.finance was removed from newer Matplotlib releases; mpl_finance provides the same function
from mpl_finance import candlestick_ohlc
import matplotlib.dates as mdates
ticker = 'MCD'
start = dt.date(2014, 1, 1)
#Gathering the data
data = web.DataReader(ticker, 'yahoo', start)
#Calc moving average
data['MA10'] = data['Adj Close'].rolling(window=10).mean()
data['MA60'] = data['Adj Close'].rolling(window=60).mean()
data.reset_index(inplace=True)
data['Date']=mdates.date2num(data['Date'].astype(dt.date))
#Plot candlestick chart
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = fig.add_subplot(111)
ax3 = fig.add_subplot(111)
ax1.xaxis_date()
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
ax2.plot(data.Date, data['MA10'], label='MA_10')
ax3.plot(data.Date, data['MA60'], label='MA_60')
plt.ylabel("Price")
plt.title(ticker)
ax1.grid(True)
plt.legend(loc='best')
plt.xticks(rotation=45)
candlestick_ohlc(ax1, data.values, width=0.6, colorup='g', colordown='r')
plt.show()
Output:
Hope this helps.
Simple df:
Using plotly:
import plotly.figure_factory
fig = plotly.figure_factory.create_candlestick(df.open, df.high, df.low, df.close, dates=df.ts)
fig.show()
will automatically parse the ts column to be displayed correctly on x.
A clunky workaround, derived from another post (if I can find it again, I will reference it). Using a pandas df, plot by index and then set the x-axis tick labels to date strings for display. I am new to Python and Matplotlib, and this solution is not very flexible, but it basically works. Plotting against the index also removes the blank 'weekend' gaps in daily market price data.
Matplotlib xaxis index as dates
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_finance import candlestick2_ohlc
from mpl_finance import candlestick_ohlc
# for Jupyter
%matplotlib notebook
# Format m/d/Y,Open,High,Low,Close,Adj Close,Volume
# csv data does not include NaN, or 'weekend' lines,
# only dates from which prices are recorded
DJIA = pd.read_csv('yourFILE.csv')  # Format m/d/Y,Open,High,Low,Close,Adj Close,Volume
print(DJIA.head())
fg, ax1 = plt.subplots()
cl = candlestick2_ohlc(ax=ax1, opens=DJIA['Open'],
                       highs=DJIA['High'], lows=DJIA['Low'],
                       closes=DJIA['Close'], width=0.4, colorup='#77d879',
                       colordown='#db3f3f')
ax1.set_xticks(np.arange(len(DJIA)))
ax1.set_xticklabels(DJIA['Date'], fontsize=6, rotation=-90)
plt.show()
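A somewhat more flexible take on the same index-based approach is to keep the integer x-axis but translate tick positions into date strings with a FuncFormatter, so only a few ticks get labels. This is a sketch, not the original poster's code: a simple line stands in for the candlesticks, the prices are synthetic, and `index_to_date` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd

# Synthetic business-day data: no weekend rows, as in the CSV described above
dates = pd.bdate_range("2021-01-04", periods=60)
close = np.linspace(100.0, 110.0, len(dates))
labels = list(dates.strftime("%Y-%m-%d"))


def index_to_date(x, pos=None):
    """Hypothetical helper: map an integer x position to its date string."""
    i = int(round(x))
    return labels[i] if 0 <= i < len(labels) else ""


fig, ax = plt.subplots()
ax.plot(np.arange(len(close)), close)  # x is the row index, so no weekend gaps

# Ask for a handful of integer ticks and label them through the helper
ax.xaxis.set_major_locator(mticker.MaxNLocator(nbins=8, integer=True))
ax.xaxis.set_major_formatter(mticker.FuncFormatter(index_to_date))
fig.autofmt_xdate()
```

Because the labels are computed on demand, they stay correct when zooming or panning in an interactive backend, which fixed tick labels do not.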
I am plotting two pandas series. The index is a date (1-1 to 12-31)
s1.plot()
s2.plot()
pd.plot() interprets the dates and assigns them to axis values as such:
I would like to modify the major ticks to be the 1st of every month and minor ticks to be the days in between
This works:
%matplotlib notebook
import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%m-%d')
s2014max = df2014.groupby(['Date'], sort=True)['Data_Value'].max()/10
s2014min = df2014.groupby(['Date'], sort=True)['Data_Value'].min()/10
#remove the leap day and convert to datetime for plotting
s2014min = s2014min[s2014min.index != '02-29']
s2014max = s2014max[s2014max.index != '02-29']
dateslist = s2014min.index.tolist()
dates = [pd.datetime.strptime(date, '%m-%d').date() for date in dateslist]
plt.figure()
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
monthFmt = mdates.DateFormatter('%b')
dayFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(monthFmt)
ax.xaxis.set_minor_formatter(dayFmt)
ax.tick_params(direction='out', pad=15)
s2014min.plot()
s2014max.plot()
This results in no ticks:
A possible way is to use matplotlib for plotting the dates instead of pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
dates = pd.date_range("2016-01-01", "2016-12-31" )
y = np.cumsum(np.random.normal(size=len(dates)))
df = pd.DataFrame({"Dates" : dates, "y": y})
fig, ax = plt.subplots()
# ax.plot handles datetime values directly; plot_date is discouraged in newer Matplotlib
ax.plot(df["Dates"], df.y, '-')
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
monthFmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_formatter(monthFmt)
plt.show()
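If you would rather keep calling the pandas plot method, another possibility is to pass `x_compat=True`, which the pandas documentation describes as suppressing pandas' own tick adjustment so that Matplotlib date locators can be applied afterwards. The sketch below assumes that behaviour and uses random data; the series and the chosen formats are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2016-01-01", "2016-12-31")
s = pd.Series(np.cumsum(np.random.normal(size=len(dates))), index=dates)

fig, ax = plt.subplots(figsize=(10, 3))
# x_compat=True keeps pandas from installing its own period-based ticks
s.plot(ax=ax, x_compat=True)

# Major ticks on the 1st of each month, minor ticks on every day
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.xaxis.set_minor_locator(mdates.DayLocator())

fig.canvas.draw()
major_days = {d.day for d in mdates.num2date(ax.get_xticks())}
```

Setting the locators after the plot call matters here, since anything set on the axis beforehand may be replaced when pandas draws the series.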
You were so close! All you needed to do was add the formatters similar to how the other answer did it. Here is a working sample similar to your code (note I did mine in ipython notebook hence the %matplotlib inline).
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime, timedelta
from random import random
y = [random() for i in range(25)]
x = [(datetime.now() - timedelta(days=i)) for i in range(25)]
x.reverse()
s = pd.Series(y, index=x) # NOTE: S, not df, since you said you were using series
# format the ticks
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
monthFmt = mdates.DateFormatter('%b')
dayFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(monthFmt) # This is what you needed
ax.xaxis.set_minor_formatter(dayFmt) # This is what you needed
ax.tick_params(direction='out', pad=15)
# format the coords message box
s.plot(figsize=(10,3))
which will look like this: