My dataframe looks like this:
Energy_MWh Month
0 39686.82 1979-01
1 35388.78 1979-02
2 50134.02 1979-03
3 37499.22 1979-04
4 20104.08 1979-05
5 17440.26 1979-06
It continues like this up to the month 2015-12.
I want to plot a continuous graph with the months on the x-axis and Energy_MWh on the y-axis. What is the best way to do this with matplotlib?
I would also like to know whether there is a way to print 1979-01 as Jan-1979 on the x-axis, and so on. Probably a lambda function or something similar while plotting.
Borrowed liberally from this answer, which you should go out and upvote:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
# df = <set_your_data_frame_here>
myDates = pd.to_datetime(df['Month'])
myValues = df['Energy_MWh']
fig, ax = plt.subplots()
ax.plot(myDates,myValues)
myFmt = DateFormatter("%b-%Y")
ax.xaxis.set_major_formatter(myFmt)
## Rotate date labels automatically
fig.autofmt_xdate()
plt.show()
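To make the snippet above concrete, here is a minimal sketch that uses the six sample rows from the question. The `labels` list is only there to show what the `"%b-%Y"` format produces, and the `Agg` backend line is an assumption so the code also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.dates import DateFormatter

# Sample rows copied from the question; the real data runs to 2015-12.
df = pd.DataFrame({
    "Energy_MWh": [39686.82, 35388.78, 50134.02, 37499.22, 20104.08, 17440.26],
    "Month": ["1979-01", "1979-02", "1979-03", "1979-04", "1979-05", "1979-06"],
})

# Parse the 'YYYY-MM' strings into real datetimes (first day of each month).
months = pd.to_datetime(df["Month"], format="%Y-%m")

# The same strftime codes the tick formatter uses, applied directly to the data.
labels = months.dt.strftime("%b-%Y").tolist()

fig, ax = plt.subplots()
ax.plot(months, df["Energy_MWh"])
ax.xaxis.set_major_formatter(DateFormatter("%b-%Y"))
fig.autofmt_xdate()  # rotate the date labels
```

Call `plt.show()` at the end when working interactively.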
Set Month as the index:
df.set_index('Month', inplace=True)
Convert the index to Datetime:
df.index = pd.DatetimeIndex(df.index)
Plot:
df.plot()
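Put together, the three steps might look like the sketch below. It uses a few rows from the question and `pd.to_datetime` with an explicit `format`, which should be equivalent here to wrapping the index in `pd.DatetimeIndex`; the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import pandas as pd

df = pd.DataFrame({
    "Energy_MWh": [39686.82, 35388.78, 50134.02],
    "Month": ["1979-01", "1979-02", "1979-03"],
})

# Step 1: use Month as the index.
df = df.set_index("Month")

# Step 2: turn the string index into a DatetimeIndex.
df.index = pd.to_datetime(df.index, format="%Y-%m")

# Step 3: pandas picks the datetime index as the x-axis automatically.
ax = df.plot()
```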
I am trying to create a plot with an amount (int) in the y-axis and days in the x-axis.
I want the plot to always show the whole month on the x-axis, although I don't have data for all days.
This is the code I tried:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as mdates
import datetime as dt
df=get_pandas_data(datab) #Taking data from database in pandas DataFrame
fig = plt.figure(figsize=(10,10)) #Initialize plot
ax1 = fig.add_subplot(1,1,1)
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in df['date']]
dates=list(set(dates)) #Takes all the dates from the DataFrame; set() drops repeated dates
s=df.resample('D', on='date')['amount'].sum() #Takes the total amount for the same date
ax1.bar(dates,s) #Bar plot for dates and amount
ax1.set(xlabel="Date",
ylabel="Balance (€)",
title="Total Monthly balance") # Plot information
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
#this is supposed to set all days of the month in the x-axis
ax1.xaxis.set_major_locator(mdates.DayLocator(interval=1))
fig.autofmt_xdate()
plt.show()
The result is a plot that only contains the days that have data.
How can I make the plot show all days of the month and draw a bar only on the days that have data?
This works fine with bare datetimes and matplotlib, so you must be malforming your data somehow during your pandas manipulations. But we can't really help because we don't have your dataframe. It's always preferable to create a standalone example with dummy data and as little code as possible to recreate the issue. a) 90% of the time you will find your problem yourself, b) if not, we can help...
import numpy as np
import matplotlib.pyplot as plt
import datetime
x = np.array([1, 3, 7, 8, 10])
y = x * 2
dates = [datetime.datetime(2000, 2, xx) for xx in x]
fig, ax = plt.subplots()
ax.bar(dates, y)
fig.autofmt_xdate()
plt.show()
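If the pandas side is where things go wrong, one possible approach (a sketch, not the asker's actual pipeline) is to sum the amounts per day and then reindex onto every day of the month, so that missing days become zero-height bars. The data below is made up for illustration, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: a few transactions in February 2000, two on the same day.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-02-01", "2000-02-03", "2000-02-03", "2000-02-10"]),
    "amount": [2, 6, 4, 20],
})

# Total amount per day that actually has data.
daily = df.groupby("date")["amount"].sum()

# Reindex onto every day of the month; days without data get 0.
all_days = pd.date_range("2000-02-01", "2000-02-29", freq="D")
daily = daily.reindex(all_days, fill_value=0)

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(daily.index, daily.values)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
fig.autofmt_xdate()
```

Because x and y now come from the same Series, the dates and the sums cannot get out of step with each other.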
I am trying to create a heat map from a pandas dataframe using the seaborn library. Here is the code:
test_df = pd.DataFrame(np.random.randn(367, 5),
index = pd.DatetimeIndex(start='01-01-2000', end='01-01-2001', freq='1D'))
ax = sns.heatmap(test_df.T)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%d'))
However, I am getting a figure with nothing printed on the x-axis.
Seaborn heatmap is a categorical plot. It scales from 0 to number of columns - 1, in this case from 0 to 366. The datetime locators and formatters expect values as dates (or more precisely, numbers that correspond to dates). For the year in question that would be numbers between 730120 (= 01-01-2000) and 730486 (= 01-01-2001). (These values apply to Matplotlib's date epoch before version 3.3; with the default epoch of 1970-01-01 used since 3.3, the same dates are 10957 and 11323.)
So in order to be able to use matplotlib.dates formatters and locators, you would need to convert your dataframe index to datetime objects first. You can then not use a heatmap, but a plot that allows for numerical axes, e.g. an imshow plot. You may then set the extent of that imshow plot to correspond to the date range you want to show.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# pd.DatetimeIndex(start=..., end=...) was removed in pandas 1.0; pd.date_range replaces it
df = pd.DataFrame(np.random.randn(367, 5),
                  index=pd.date_range(start='01-01-2000', end='01-01-2001', freq='1D'))
dates = df.index.to_pydatetime()
dnum = mdates.date2num(dates)
start = dnum[0] - (dnum[1]-dnum[0])/2.
stop = dnum[-1] + (dnum[1]-dnum[0])/2.
extent = [start, stop, -0.5, len(df.columns)-0.5]
fig, ax = plt.subplots()
im = ax.imshow(df.T.values, extent=extent, aspect="auto")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
fig.colorbar(im)
plt.show()
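The half-step arithmetic behind `extent` can be checked in isolation. The sketch below only recomputes the numbers from the code above; `step` and `n_cells` are names introduced here for illustration.

```python
import matplotlib.dates as mdates
import pandas as pd

# Same date range as above: 2000 is a leap year, so this is 367 days inclusive.
dates = pd.date_range("2000-01-01", "2001-01-01", freq="D")
dnum = mdates.date2num(dates.to_pydatetime())

# Each image cell is one day wide and centred on its date,
# so the image has to start half a day early and end half a day late.
step = dnum[1] - dnum[0]
start = dnum[0] - step / 2
stop = dnum[-1] + step / 2

# The padded span should hold exactly one cell per row of the dataframe.
n_cells = (stop - start) / step
```

If `n_cells` did not match the number of rows, the cells would drift away from their dates along the axis.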
I found this question when trying to do a similar thing and you can hack together a solution but it's not very pretty.
For example I get the current labels, loop over them to find the ones for January and set those to just the year, setting the rest to be blank.
This gives me year labels in the correct position.
xticklabels = ax.get_xticklabels()
for label in xticklabels:
    text = label.get_text()
    if text[5:7] == '01':
        label.set_text(text[0:4])
    else:
        label.set_text('')
ax.set_xticklabels(xticklabels)
Hopefully from that you can figure out what you want to do.
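If the x-axis holds real dates rather than categorical positions, a different technique may avoid editing label text by hand: a `YearLocator` for the labelled major ticks and a `MonthLocator` for unlabelled minor ticks. This is a sketch on made-up monthly data, not a fix for the heatmap case, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly data covering 2014-01 to 2016-06.
dates = pd.date_range("2014-01-01", periods=30, freq="MS")
values = list(range(30))

fig, ax = plt.subplots()
ax.plot(dates, values)

# Major ticks on 1 January of each year, labelled with the year only.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Minor ticks on every month, left without labels.
ax.xaxis.set_minor_locator(mdates.MonthLocator())

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```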
I have a plot_graph() function that plots pandas dataframe as a line chart.
def plot_graph(df):
    ax = plt.gca()
    #df["Date"].dt.strftime("%m/%d/%y")
    #df["date"] = df["date"].astype('datetime64[ns]')
    print(df['date'])
    df.plot(kind='line', x='date', y='Actual', ax=ax)
    df.plot(kind='line', x='date', y='Expected', color='red', ax=ax)
    ax.xaxis.set_major_locator(plt.MaxNLocator(3))
    plt.savefig("fig1.png")
I pass pandas dataframe in this format
date actual expected
2019-11 20 65
2019-12 35 65
When I plot the line chart, the x-axis labels are not displayed correctly in (yyyy-mm) format. I believe the problem is the date format, so I tried converting the column to a date. I tried all the options (commented out in the code), but nothing seems to work. Any suggestions would be appreciated.
Try this:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def plot_graph(df):
    ax = plt.gca()
    # convert the 'date' strings to datetime.date objects
    df['date'] = pd.to_datetime(df['date']).dt.date
    df.plot(kind='line', x='date', y='actual', ax=ax)
    df.plot(kind='line', x='date', y='expected', color='red', ax=ax)
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    # ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m')) #to explicitly set format
plot_graph(df)
I think using matplotlib.dates is the best thing here, but it seems like df.plot() needs dates to be date and not datetime (or string). If you instead plot directly through matplotlib you don't need to do this. More here.
Reference Matplotlib: Date tick labels & Formatting date ticks using ConciseDateFormatter
matplotlib.dates.MonthLocator
matplotlib.dates.DateFormatter
matplotlib.axis.Axis.set_major_locator
matplotlib.axis.XAxis.set_major_formatter
Note the index column is in a datetime format. To transform your column to datetime, use df.date = pd.to_datetime(df.date)
df.plot() has tick locs like array([13136, 13152, 13174, 13175], dtype=int64). I don't actually know how those numbers are derived, but they cause an issue with some of the matplotlib axis and date formatting methods, which is why I changed the plots away from df.plot.
sns.lineplot and plt.plot have tick locs that are the ordinal representation of the datetime, array([737553., 737560., 737567., 737577., 737584., 737591., 737598., 737607.]).
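One plausible reading of those two kinds of numbers, which the answer itself does not confirm: values like 13136 match pandas period ordinals for daily data (days since 1970-01-01), while values like 737553 match proleptic Gregorian ordinals, which Matplotlib used as date numbers before version 3.3. The sketch below only checks that arithmetic for one date; it does not inspect what `df.plot` does internally.

```python
import datetime

import matplotlib.dates as mdates
import pandas as pd

day = datetime.datetime(2005, 12, 19)

# pandas period ordinal for daily frequency: days since 1970-01-01.
period_ordinal = pd.Period("2005-12-19", freq="D").ordinal

# Proleptic Gregorian ordinal: 0001-01-01 is day 1.
proleptic_ordinal = day.toordinal()

# Matplotlib's own date number depends on its configured epoch,
# so only the round trip is shown here, not the raw value.
mpl_num = mdates.date2num(day)
round_trip = mdates.num2date(mpl_num).replace(tzinfo=None)
```

The two ordinals differ by a constant offset, the ordinal of 1970-01-01, which would explain why formatters written for one scale mislabel ticks from the other.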
import pandas as pd
import numpy as np # for test data
from datetime import datetime # for test data
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# synthetic data with date as a datetime
np.random.seed(365)
length = 700
df = pd.DataFrame(np.random.rand(length, 2) * 10, columns=['Actual', 'Expected'], index=pd.bdate_range(datetime.today(), freq='d', periods=length).tolist()).reset_index()
# display(df.head())
index Actual Expected
0 2020-07-16 9.414557 6.416027
1 2020-07-17 6.846105 5.885621
2 2020-07-18 5.438872 3.680709
3 2020-07-19 7.666258 3.050124
4 2020-07-20 4.420860 1.104433
# function
def plot_graph(df):
    # df.date = pd.to_datetime(df.date) # if needed and date is the column name
    fig, ax = plt.subplots()
    months = mdates.MonthLocator() # every month
    months_fmt = mdates.DateFormatter('%Y-%m') # format
    ax.plot('index', 'Actual', data=df)
    ax.plot('index', 'Expected', data=df, color='red')
    # format the ticks
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(months_fmt)
    plt.xticks(rotation=90)
    plt.legend()
    plt.show()
plot_graph(df)
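The references above also mention `ConciseDateFormatter`. A minimal sketch of that variant, on synthetic data shaped like the example (the column name `date` and the fixed start date are assumptions), could look like this; the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data, similar in shape to the example above.
rng = np.random.default_rng(365)
df = pd.DataFrame({
    "date": pd.date_range("2020-07-16", periods=700, freq="D"),
    "Actual": rng.random(700) * 10,
    "Expected": rng.random(700) * 10,
})

fig, ax = plt.subplots()
ax.plot("date", "Actual", data=df)
ax.plot("date", "Expected", data=df, color="red")

# Let Matplotlib choose tick positions, and keep the labels short.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.legend()
```

Compared with a fixed `MonthLocator`, this trades control over the exact label format for fewer, less crowded ticks on long date ranges.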
I have a series whose index is datetime that I wish to plot. I want to plot the values of the series on the y axis and the index of the series on the x axis. The Series looks as follows:
2014-01-01 7
2014-02-01 8
2014-03-01 9
2014-04-01 8
...
I generate a graph using plt.plot(series.index, series.values). But the graph looks like:
The problem is that I would like to have only year and month (yyyy-mm or 2016 March). However, the graph contains hours, minutes and seconds. How can I remove them so that I get my desired formatting?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# sample data
N = 30
drange = pd.date_range("2014-01", periods=N, freq="MS")
np.random.seed(365) # for a reproducible example of values
values = {'values':np.random.randint(1,20,size=N)}
df = pd.DataFrame(values, index=drange)
fig, ax = plt.subplots()
ax.plot(df.index, df.values)
ax.set_xticks(df.index)
# use formatters to specify major and minor ticks
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%Y-%m"))
_ = plt.xticks(rotation=90)
You can try something like this:
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# on pandas 2.2+, use freq='ME' (month end); 'M' is deprecated there
df = pd.DataFrame({'values':np.random.randint(0,1000,36)},index=pd.date_range(start='2014-01-01',end='2016-12-31',freq='M'))
fig,ax1 = plt.subplots()
plt.plot(df.index,df.values)
monthyearFmt = mdates.DateFormatter('%Y %B')
ax1.xaxis.set_major_formatter(monthyearFmt)
_ = plt.xticks(rotation=90)
You should check out this native function of matplotlib:
fig.autofmt_xdate()
See examples on the source website Custom tick formatter
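To see what the two format strings used in these answers produce, a formatter can be called directly on a date number. This is a small sketch; the `%B` month name depends on the active locale, so only its year part is predictable everywhere.

```python
import datetime

import matplotlib.dates as mdates

# Matplotlib's internal number for 1 March 2016.
march_2016 = mdates.date2num(datetime.datetime(2016, 3, 1))

# 'yyyy-mm' style, as in the first answer.
numeric = mdates.DateFormatter("%Y-%m")(march_2016)

# 'yyyy MonthName' style, as in the second answer (locale dependent).
named = mdates.DateFormatter("%Y %B")(march_2016)

print(numeric)
print(named)
```

Neither format contains `%H`, `%M` or `%S`, which is why the hours, minutes and seconds disappear from the tick labels.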
I have a datetime series (called date) like this:
0 2012-06-26
1 2011-02-22
2 2012-06-06
3 2013-02-10
4 2004-01-01
5 2011-01-25
6 2015-11-02
And I want a scatter plot with the dates on the Y axis and months and years on the X axis.
I've played around with pyplot.plot_date, but can't figure out a solution.
It should be something like this, only with dates on the Y axis.
Any advice?
Take a look at matplotlib, it's really useful.
This might also help you to start the plotting.
dates = [
'2012-06-26',
'2011-02-22',
'2012-06-06',
'2013-02-10',
'2004-01-01',
'2011-01-25',
'2015-11-02',
]
year = []
month = []
day = []
# Splits your data into a usable dataset
def sort_data():
    for i in range(len(dates)):
        extracted_year = dates[i][0:4]
        extracted_year = int(extracted_year)
        year.append(extracted_year)
    for j in range(len(dates)):
        extracted_month = dates[j][5:7]
        extracted_month = int(extracted_month)
        month.append(extracted_month)
    for k in range(len(dates)):
        extracted_day = dates[k][8:10]
        extracted_day = int(extracted_day)
        day.append(extracted_day)
sort_data()
# Just checking if sort_data() worked correctly
print(year)
print(month)
print(day)
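The same split can be done without string slicing by parsing each entry with `datetime.strptime`. This is a sketch of an alternative, assuming every entry follows the `YYYY-MM-DD` pattern shown in the question; `parsed` is a name introduced here.

```python
from datetime import datetime

dates = [
    '2012-06-26',
    '2011-02-22',
    '2012-06-06',
    '2013-02-10',
    '2004-01-01',
    '2011-01-25',
    '2015-11-02',
]

# Parse once; strptime raises ValueError on malformed entries instead of
# silently slicing the wrong characters.
parsed = [datetime.strptime(d, "%Y-%m-%d") for d in dates]

year = [p.year for p in parsed]
month = [p.month for p in parsed]
day = [p.day for p in parsed]

print(year)
print(month)
print(day)
```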
So far, my best solution is to convert the year and month to a float and the day to an int:
import pandas as pd
from matplotlib import pyplot
dates = [
'2012-06-26',
'2011-02-22',
'2012-06-06',
'2013-02-10',
'2004-01-01',
'2011-01-25',
'2015-11-02',]
# 'date' is the datetime Series from the question, rebuilt here so the snippet runs on its own
date = pd.to_datetime(pd.Series(dates))
fig, ax = pyplot.subplots()
ax.scatter(date.apply(lambda x: float(x.strftime('%Y.%m'))), date.apply(lambda x: x.day), marker='o')
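A variant that keeps real datetimes on the x-axis instead of `year.month` floats might look like the sketch below. It assumes "dates on the Y axis" means the day of the month, as in the snippet above; `month_start` and `day_of_month` are names introduced here, and the `Agg` backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

date = pd.to_datetime(pd.Series([
    "2012-06-26", "2011-02-22", "2012-06-06", "2013-02-10",
    "2004-01-01", "2011-01-25", "2015-11-02",
]))

# x: the first day of each entry's month, kept as a real timestamp.
month_start = date.dt.to_period("M").dt.to_timestamp()

# y: the day of the month (1 to 31).
day_of_month = date.dt.day

fig, ax = plt.subplots()
ax.scatter(month_start, day_of_month, marker="o")
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.set_ylim(0, 32)
fig.autofmt_xdate()
```

With timestamps on the x-axis, the months are spaced evenly in time, which the float encoding does not give you (2011.12 and 2012.01 end up far apart).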
As your question is not completely clear to me, I assume you want a scatter plot of all the input datetime entries, spanning the oldest to the most recent year in the given input.
This is also my very first attempt with the matplotlib library, so I just hope the following answer leads you to the result you expect.
import datetime
import numpy
import matplotlib.pyplot as plt
from matplotlib.dates import MONDAY
from matplotlib.dates import DateFormatter, MonthLocator, WeekdayLocator
from matplotlib.dates import date2num
# Input datetime series.
dt_series = ["2012-06-26", "2011-02-22", "2012-06-06", "2013-02-10", "2004-01-01", "2011-01-25", "2015-11-02"]
mondays = WeekdayLocator(MONDAY)
months = MonthLocator(range(1, 13), bymonthday=1, interval=6)
monthsFmt = DateFormatter("%b'%y")
# Loop to create our own X-axis and Y-axis values.
# mths is for X-axis values, which are months along with year.
# dates is for Y-axis values, which are dates.
mths = list()
dates = list()
for dt in dt_series:
    mths.append(date2num(datetime.datetime.strptime(dt.replace('-', ''), "%Y%m%d")))
    dates.append(numpy.float64(float(dt.split('-')[2])))
fig, ax = plt.subplots(squeeze=True)
ax.plot_date(mths, dates, 'o', tz=None, xdate=True, ydate=False)
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(monthsFmt)
ax.xaxis.set_minor_locator(mondays)
ax.autoscale_view()
ax.grid(True)
fig.autofmt_xdate()
plt.show()
I would also suggest going through the following link, which I used as a reference:
http://matplotlib.org/examples/pylab_examples/date_demo2.html
Thank You