I am trying to plot a simple .csv file downloaded from Yahoo Finance, but I cannot understand why the years on the x-axis appear as (apparently) random numbers.
Another thing I would like to do is remove the x-axis tick labels from the top graph (the same axis is already shown in the bottom plot) while keeping the dashed grid. I tried ax[0].set_xticklabels([]), but it didn't work.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MonthLocator, YearLocator
#LOAD DATA
df_name = "0P0000UL8U.L.csv"
col_list = ["Date", "Adj Close"]  # list of columns to import
df = pd.read_csv(df_name, header=0, usecols=col_list, na_values=['null'], thousands=r',', parse_dates=["Date"], dayfirst=True)
df = df.dropna() #Drop the rows where at least one element is missing.
df.set_index("Date", inplace = True)
df.index = [pd.to_datetime(date).date() for date in df.index] #convert index to datetime.date, not datetime.datetime.
print("Opening df:\n", df)
print("\nLength of df: ", len(df.index))
#PLOT DATA
fig, ax = plt.subplots(2,1, figsize=(11,5))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.25, hspace=0.8) #Adjust space between graphs
df[["Adj Close"]].plot(ax=ax[0], kind="line", style="-", color="blue", stacked=False, rot=90)
ax[0].set_axisbelow(True) # To put plot grid below plots
ax[0].yaxis.grid(color='gray', linestyle='dashed')
ax[0].xaxis.grid(color='gray', linestyle='dashed')
ax[0].xaxis.set_major_locator(YearLocator())  # one major tick per year
ax[0].xaxis.set_major_formatter(DateFormatter("%b %Y"))
ax[0].set(xlabel=None, ylabel="Price")  # set axis labels
df[["Adj Close"]].plot(ax=ax[1], kind="line", style="-", color="blue", stacked=False, rot=90)
ax[1].set_axisbelow(True) # To put plot grid below plots
ax[1].yaxis.grid(color='gray', linestyle='dashed')
ax[1].xaxis.grid(color='gray', linestyle='dashed')
ax[1].xaxis.set_major_locator(YearLocator())  # one major tick per year
ax[1].xaxis.set_major_formatter(DateFormatter("%b %Y"))
ax[1].set(xlabel="Time", ylabel="Price")  # set axis labels
fig.savefig("0P0000UL8U.L.png", bbox_inches='tight', dpi=300)
What am I doing wrong? Thanks in advance for any help.
To remove the x-axis tick labels from the top graph, you can add the following line before ax[0].set(xlabel=None, ylabel="Price"):
ax[0].tick_params(labelbottom=False)
This hides only the labels; the ticks and the dashed grid stay in place.
It's not your fault. Date handling broke in a recent matplotlib release, and there are several related bug reports (#18010, #17983, #34850).
In the meantime, you can downgrade matplotlib to v3.2.2, which does not have this problem, and wait for the bug to be fixed.
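A minimal sketch that combines the tick_params line with a different workaround for the year labels: plotting with matplotlib's own ax.plot instead of DataFrame.plot, so that the x values are the date numbers the matplotlib.dates locators and formatters expect. The price series is made up (only the "Adj Close" column name is kept from the question), and the sketch has not been checked against the original CSV or every matplotlib release.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.dates import DateFormatter, YearLocator

# Hypothetical stand-in for the downloaded CSV: one made-up price per day
dates = pd.date_range("2015-01-01", "2020-12-31", freq="D")
df = pd.DataFrame({"Adj Close": np.linspace(100.0, 200.0, len(dates))}, index=dates)

fig, ax = plt.subplots(2, 1, figsize=(11, 5))
fig.subplots_adjust(hspace=0.8)
for a in ax:
    # ax.plot lets matplotlib convert the DatetimeIndex to its own date numbers
    a.plot(df.index, df["Adj Close"], "-", color="blue")
    a.set_axisbelow(True)
    a.yaxis.grid(color="gray", linestyle="dashed")
    a.xaxis.grid(color="gray", linestyle="dashed")
    a.xaxis.set_major_locator(YearLocator())
    a.xaxis.set_major_formatter(DateFormatter("%b %Y"))
    a.tick_params(axis="x", labelrotation=90)
    a.set_ylabel("Price")

# Hide only the tick labels of the top panel; its dashed grid stays visible
ax[0].tick_params(labelbottom=False)
ax[1].set_xlabel("Time")

fig.canvas.draw()
bottom_labels = [t.get_text() for t in ax[1].get_xticklabels()]
print(bottom_labels)
```

If the labels in your own plot still look wrong with this approach, the downgrade suggested above remains an option.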
My data is in a dataframe of two columns: y and x. The data refers to the past few years. Dummy data is below:
import numpy as np
import pandas as pd
np.random.seed(167)
rng = pd.date_range('2017-04-03', periods=365*3)
df = pd.DataFrame(
    {"y": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(365*3)]),
     "x": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(365*3)])
    }, index=rng
)
In a first attempt, I plotted a scatterplot with Seaborn using the following code:
import seaborn as sns
import matplotlib.pyplot as plt
def plot_scatter(data, title, figsize):
    fig, ax = plt.subplots(figsize=figsize)
    ax.set_title(title)
    sns.scatterplot(data=data,
                    x=data['x'],
                    y=data['y'])
plot_scatter(data=df, title='dummy title', figsize=(10,7))
However, I would like to generate a 4x3 matrix including 12 scatterplots, one for each month with year as hue. I thought I could create a third column in my dataframe that tells me the year and I tried the following:
import seaborn as sns
import matplotlib.pyplot as plt
def plot_scatter(data, title, figsize):
    fig, ax = plt.subplots(figsize=figsize)
    ax.set_title(title)
    sns.scatterplot(data=data,
                    x=data['x'],
                    y=data['y'],
                    hue=data.iloc[:, 2])
df['year'] = df.index.year
plot_scatter(data=df, title='dummy title', figsize=(10,7))
While this allows me to see the years, it still shows all the data in the same scatterplot instead of creating multiple scatterplots, one for each month, so it's not offering the level of detail I need.
I could slice the data by month and build a for loop that plots one scatterplot per month, but I actually want a matrix where all the scatterplots use similar axis scales. Does anyone know an efficient way to achieve that?
To create multiple subplots at once, seaborn provides figure-level functions. The col= argument indicates which column of the dataframe should be used to split the data into subplots, and col_wrap= tells how many subplots go next to each other before a new row is started. By default, the subplots of such a figure share their x and y axes, so all months use the same scales.
Note that you shouldn't create a figure yourself, as the function creates its own new figure. The height= and aspect= arguments set the size of the individual subplots.
The code below uses sns.relplot() with one subplot per month. An extra column for the months is created; it is made categorical to fix the order of the subplots.
To remove the month= in the title, you can loop through the generated axes (a recent seaborn version is needed for axes_dict). With sns.set(font_scale=...) you can change the default sizes of all texts.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(167)
dates = pd.date_range('2017-04-03', periods=365 * 3, freq='D')
df = pd.DataFrame({"y": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(365 * 3)]),
"x": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(365 * 3)])
}, index=dates)
df['year'] = df.index.year
month_names = pd.date_range('2017-01-01', periods=12, freq='M').strftime('%B')
df['month'] = pd.Categorical.from_codes(df.index.month - 1, month_names)
sns.set(font_scale=1.7)
g = sns.relplot(kind='scatter', data=df, x='x', y='y', hue='year', col='month', col_wrap=4, height=4, aspect=1)
# optionally remove the `month=` in the title
for name, ax in g.axes_dict.items():
    ax.set_title(name)
plt.setp(g.axes, xlabel='', ylabel='') # remove all x and y labels
g.axes[-2].set_xlabel('x', loc='left') # set an x label at the left of the second to last subplot
g.axes[4].set_ylabel('y') # set a y label to 5th subplot
plt.subplots_adjust(left=0.06, bottom=0.06) # set some more spacing at the left and bottom
plt.show()
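The month column is what makes the panels come out January through December. A small pandas-only sketch of that step is below; it swaps in the standard-library calendar.month_name for the date_range/strftime trick above (both give the month names of the current locale) and shows that the categorical keeps calendar order rather than alphabetical order.

```python
import calendar

import numpy as np
import pandas as pd

dates = pd.date_range("2017-04-03", periods=365 * 3, freq="D")
df = pd.DataFrame({"x": np.arange(len(dates))}, index=dates)

# calendar.month_name[0] is an empty string, so skip it
month_names = list(calendar.month_name[1:])
# from_codes maps months 1..12 to codes 0..11 and fixes the category order
df["month"] = pd.Categorical.from_codes(df.index.month - 1, month_names)

print(list(df["month"].cat.categories))  # calendar order, not alphabetical
print(df["month"].iloc[0])  # the first date, 2017-04-03, falls in April
```

Because the categories are stored in this order, a faceting function that respects categorical order will lay out the month panels in the same sequence.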
I am trying to create a heat map from a pandas dataframe using the seaborn library. Here is the code:
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import seaborn as sns
test_df = pd.DataFrame(np.random.randn(367, 5),
                       index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
Seaborn's heatmap is a categorical plot. It scales from 0 to the number of columns - 1, in this case from 0 to 366. The datetime locators and formatters expect values as dates (or, more precisely, numbers that correspond to dates). For the year in question, that would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001) with matplotlib's pre-3.3 default epoch; since matplotlib 3.3 the default epoch is 1970-01-01, which gives 10957 and 11323.
So in order to use the matplotlib.dates formatters and locators, you would need to convert your dataframe index to datetime objects first. You then cannot use a heatmap, but need a plot that allows for numerical axes, e.g. an imshow plot. You may then set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
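The date numbers quoted in this answer can be checked directly. This sketch compares what matplotlib.dates.date2num returns for the two boundary dates with the proleptic Gregorian ordinals from datetime.date.toordinal, which equal matplotlib's pre-3.3 date numbers. mdates.get_epoch assumes matplotlib 3.3 or newer.

```python
import datetime

import matplotlib.dates as mdates

start = datetime.datetime(2000, 1, 1)
end = datetime.datetime(2001, 1, 1)

# Date numbers used by the matplotlib.dates locators (they depend on the epoch)
num_start = mdates.date2num(start)
num_end = mdates.date2num(end)
print(mdates.get_epoch(), num_start, num_end)

# Proleptic Gregorian ordinals: the same as matplotlib's pre-3.3 date numbers
print(start.toordinal(), end.toordinal())

# Either way the year spans 366 days (2000 is a leap year), far away from the
# roughly 0..367 range that a 367-column seaborn heatmap puts on its x-axis
print(num_end - num_start)
```

This is why the date locators place no visible ticks on the heatmap: none of their tick positions fall inside its x-limits.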
I found this question while trying to do something similar. You can hack together a solution, but it's not very pretty.
For example I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to be blank.
This gives me year labels in the correct position.
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()  # assumes labels look like 'YYYY-MM-DD...'
    if text[5:7] == '01':  # the month part is January
        label.set_text(text[0:4])  # keep only the year
    else:
        label.set_text('')
ax.set_xticklabels(xticklabels)
Hopefully from that you can figure out what you want to do.
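A less hacky variant of the same idea, assuming the goal is month labels on the heatmap: compute the tick positions from the DatetimeIndex itself instead of parsing label text. Here ax.pcolormesh stands in for sns.heatmap so the sketch needs only matplotlib. As far as I know, seaborn's heatmap also draws with pcolormesh and centers column i at i + 0.5, so the same positions should apply to ax = sns.heatmap(df.T), but I have not verified this across seaborn versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range("2000-01-01", "2001-01-01", freq="D"))

fig, ax = plt.subplots()
# Column i of the image covers [i, i + 1], like a seaborn heatmap cell
mesh = ax.pcolormesh(df.T.values)
fig.colorbar(mesh)

# Boolean mask of the rows that fall on the first day of a month
month_start = df.index.is_month_start
positions = np.flatnonzero(month_start) + 0.5  # centers of those cells
ax.set_xticks(positions)
ax.set_xticklabels(df.index[month_start].strftime("%b"))
```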
I have the following code:
import pandas as pd
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime as date
start = date.datetime(2000,1,1)
end = date.datetime.today()
df = web.DataReader('fb', 'yahoo',start, end)
df2 = web.DataReader('f', 'yahoo',start, end)
ax = df.plot(y = 'Close')
df3 = df2.pct_change()
df3.plot(y = 'Close', ax=ax)
In the resulting chart, the orange line is the percentage change, so it is plotted as an almost flat horizontal line: its values are tiny compared with the prices on the same axis.
Is it possible to plot the percentage change of one symbol on the same chart where I have plotted another symbol's price? What I had in mind is that the right axis would show the percentage and the left axis the price. If so, could you please show me how?
Just a small change: you need to use plt.subplots() together with twinx(). This way, the new axes ax2 shares its x-axis with ax and uses the other side of the plot for its y-axis:
fig, ax = plt.subplots()
df.plot(y = 'Close', ax=ax)
ax2 = ax.twinx()
df3 = df2.pct_change()
df3.plot(y = 'Close', ax=ax2)
You will probably need to add a color argument, as both plots will use the same default color.
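A self-contained version of the twinx answer with separate colors, using synthetic price series (hypothetical, so the sketch runs without pandas_datareader) that only reuse the Close column name. The PercentFormatter on the right axis is an addition of mine to display the pct_change fractions as percentages.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=250, freq="D")
# Hypothetical price series standing in for the two downloaded tickers
df = pd.DataFrame({"Close": 500 + np.cumsum(rng.normal(size=len(idx)))}, index=idx)
df2 = pd.DataFrame({"Close": 300 + np.cumsum(rng.normal(size=len(idx)))}, index=idx)

fig, ax = plt.subplots()
df.plot(y="Close", ax=ax, color="tab:blue", legend=False)
ax.set_ylabel("Price")

ax2 = ax.twinx()  # new y-axis on the right, x-axis shared with ax
df3 = df2.pct_change()
df3.plot(y="Close", ax=ax2, color="tab:orange", legend=False)
ax2.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))  # 0.01 -> "1%"
ax2.set_ylabel("Percentage change")
```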
I am unable to show a bar and line graph on the same plot. Example code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Df = pd.DataFrame(data=np.random.randn(10,4), index=pd.date_range(start='2005', freq='M', periods=10), columns=['A','B','C','D'])
fig = plt.figure()
ax = fig.add_subplot(111)
Df[['A','B']].plot(kind='bar', ax=ax)
Df[['C','D']].plot(ax=ax, color=['r', 'c'])
You can also try this:
ax = Df[['A', 'B']].plot(kind='bar')
plt.xticks(rotation=0)
ax2 = ax.twinx()  # second y-axis that shares the bar chart's x positions
ax2.plot(ax.get_xticks(), Df[['C', 'D']].values, marker='o')
I wanted to know as well, but all existing answers show the bar and line graphs on different axes rather than on the same plot.
So I looked for the answer myself and found an example that works: Plot Pandas DataFrame as Bar and Line on the same one chart. I can confirm that it works.
What baffled me was that almost the same code works there but not here: I copied the OP's code and can verify that it does not work as expected.
The only thing I could think of was to add the index column to Df[['A','B']] and Df[['C','D']], but I don't know how, since the index column doesn't have a name I could add.
I have since realized that even if I can make it work, the real problem is that Df[['A','B']] gives a grouped (clustered) bar chart, but a grouped (clustered) line chart is not supported.
The issue is that the pandas bar plot function treats the dates as a categorical variable where each date is considered to be a unique category, so the x-axis units are set to integers starting at 0 (like the default DataFrame index when none is assigned).
The pandas line plot uses x-axis units corresponding to the DatetimeIndex, for which 0 is located on January 1970 and the integers count the number of periods (months in this example) since then. So let's take a look at what happens in this particular case:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
# Create random data
rng = np.random.default_rng(seed=1) # random number generator
df = pd.DataFrame(data=rng.normal(size=(10,4)),
index=pd.date_range(start='2005', freq='M', periods=10),
columns=['A','B','C','D'])
# Create a pandas bar chart overlaid with a pandas line plot using the same
# Axes: note that seeing as I do not set any variable for x, df.index is used
# by default, which is usually what we want when dealing with a dataset
# containing a time series
ax = df.plot.bar(y=['A','B'], figsize=(9,5))
df.plot(y=['C','D'], color=['tab:green', 'tab:red'], ax=ax);
The bars are nowhere to be seen. If you check what x ticks are being used, you'll see that the single major tick placed on January is 420 followed by these minor ticks for the other months:
ax.get_xticks(minor=True)
# [421, 422, 423, 424, 425, 426, 427, 428, 429]
This is because 35 years * 12 months = 420 months have elapsed since January 1970, and the numbering starts at 0, so January 2005 lands on 420. This explains why we do not see the bars. If you change the x-axis limit to start from zero, here is what you get:
ax = df.plot.bar(y=['A','B'], figsize=(9,5))
df.plot(y=['C','D'], color=['tab:green', 'tab:red'], ax=ax)
ax.set_xlim(0);
The bars are squashed to the left, starting on January 1970. This problem can be solved by setting use_index=False in the line plot function so that the lines also start at 0:
ax = df.plot.bar(y=['A','B'], figsize=(9,5))
df.plot(y=['C','D'], color=['tab:green', 'tab:red'], ax=ax, use_index=False)
ax.set_xticklabels(df.index.strftime('%b'), rotation=0, ha='center');
# # Optional: move legend to new position
# import matplotlib.pyplot as plt # v 3.3.2
# ax.legend().remove()
# plt.gcf().legend(loc=(0.08, 0.14));
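The 420 can be reproduced with pandas alone: the ordinal of a monthly Period counts months since January 1970, which matches the units the line plot uses here, while the bars sit at 0 to 9. This sketch uses pd.period_range (my choice, to build the same ten months without going through date_range):

```python
import pandas as pd

# Same ten months as the example index, as monthly periods
periods = pd.period_range(start="2005-01", periods=10, freq="M")
ordinals = [p.ordinal for p in periods]

print(pd.Period("1970-01", freq="M").ordinal)  # 0: January 1970 is month zero
print((2005 - 1970) * 12)                      # 420: 35 years * 12 months
print(ordinals)                                # 420 through 429
```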
In case you want more advanced tick label formatting, you can check out the answers to this question which are compatible with this example. If you need more flexible/automated tick label formatting as provided by the tick locators and formatters in the matplotlib.dates module, the easiest is to create the plot with matplotlib like in this answer.
You can do something like this, with both plots in the same figure:
Df = pd.DataFrame(data=np.random.randn(10, 4), index=pd.date_range(start='2005', freq='M', periods=10), columns=['A', 'B', 'C', 'D'])
fig, ax = plt.subplots(2, 1)  # you can pass sharex=True, sharey=True if you want to share axes.
Df[['A', 'B']].plot(kind='bar', ax=ax[0])
Df[['C', 'D']].plot(color=['r', 'c'], ax=ax[1])