I am trying to plot some data, but I don't know how I can add the date values on the x-axis on my graph. Here is my code:
import pandas as pd
import numpy as np
%matplotlib inline
%pylab inline
import matplotlib.pyplot as plt
pylab.rcParams['figure.figsize'] = (15, 9)
df["msft"].plot(grid = True)
The result is a plot, but the x-axis only shows numbers, and I want dates to appear there. The dates are in the date column of the dataframe.
Here is what the dataframe looks like:
date msft nok aapl ibm amzn
1 2018-01-01 09:00:00 112 1 143 130 1298
2 2018-01-01 10:00:00 109 10 185 137 1647
3 2018-01-01 11:00:00 98 11 146 105 1331
4 2018-01-01 12:00:00 83 3 214 131 1355
Can you offer some help on what I am missing?
Your date column is just another column to pandas; you have to tell it that you want to plot against this specific one. One way is to pass it as the x argument:
from matplotlib import pyplot as plt
import pandas as pd
#load dataframe
df = pd.read_csv("test.txt", delim_whitespace=True)
#convert date column to datetime object, if it is not already one
df["date"] = pd.to_datetime(df["date"])
#plot the specified columns vs dates
df.plot(x = "date", y = ["msft", "ibm"], kind = "line", grid = True)
plt.show()
For more pandas plot options, please have a look at the documentation.
Another way would be to set date as the index of the dataframe. Then you can use your approach:
df.set_index("date", inplace = True)
df[["msft", "ibm"]].plot(grid = True)
plt.show()
The automatic date labels might not be what you want to display. But there are ways to format the output, and you can find examples on SO.
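As one illustration of formatting the labels, here is a minimal sketch that plots with matplotlib directly and applies a locator and formatter from matplotlib.dates. The four rows are copied from the question; the "%H:%M" format and the one-hour tick spacing are assumptions chosen for this small sample, not something the question asked for.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Sample rows copied from the question (only the msft column is used)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01 09:00:00", "2018-01-01 10:00:00",
        "2018-01-01 11:00:00", "2018-01-01 12:00:00",
    ]),
    "msft": [112, 109, 98, 83],
})

fig, ax = plt.subplots(figsize=(15, 9))
# Plot with matplotlib so the x-axis holds real matplotlib date values
ax.plot(df["date"], df["msft"])
ax.grid(True)

# One tick per hour, labelled as hour:minute (assumed format)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

fig.canvas.draw()  # compute the tick labels
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

In an interactive session, plt.show() displays the figure; for longer time spans a format such as "%Y-%m-%d" with a coarser locator is likely to be more readable.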
One way to do it is the set_xticklabels function, though Mr. T's answer is the proper way to go:
ax = plt.subplot(111)
df["msft"].plot(ax=ax, grid=True)
# set the tick positions first, then label them with the dates
ax.set_xticks(df.index)
ax.set_xticklabels(df["date"])
with the data provided:
I have some data I would like to plot, consisting of two columns: one is a count and the other is the actual date recorded. Since I have over 2000 dates, it makes more sense not to show every single date as a tick on the x-axis, otherwise it won't be readable. However, I am having a hard time making the dates show up on the x-axis with some kind of logic. I have tried using the built-in tick locators for matplotlib, but somehow it's not working. Here is a preview of the data:
PatientTraffic = pd.DataFrame({'count' : CleanData.groupby("TimeStamp").size()}).reset_index()
display(PatientTraffic.head(3000))
TimeStamp count
0 2016-03-13 12:20:00 1
1 2016-03-13 13:39:00 1
2 2016-03-13 13:43:00 1
3 2016-03-13 16:00:00 1
4 2016-03-14 13:27:00 1
... ... ...
2088 2020-02-18 16:00:00 8
2089 2020-02-19 16:00:00 8
2090 2020-02-20 16:00:00 8
2091 2020-02-21 16:00:00 8
2092 2020-02-22 16:00:00 8
2093 rows × 2 columns
and when I go to plot it with these settings:
PatientTrafficPerTimeStamp = PatientTraffic.plot.bar(
x='TimeStamp',
y='count',
figsize=(20,3),
title = "Patient Traffic over Time"
)
PatientTrafficPerTimeStamp.xaxis.set_major_locator(plt.MaxNLocator(3))
I expect to get a bar chart where the x-axis has three ticks: one at the beginning, one in the middle, and one at the end. Maybe I'm using this wrong. Also, the ticks that appear seem to be simply the first 3 in the column, which is not what I want. Any help would be appreciated!
You probably think that you have one problem, but you actually have two, and both come from the fact that you use convenience functions. The problem that you are most likely not aware of is that pandas plots bars as categorical data. This makes sense under most conditions, but obviously not if you have TimeStamp data on your x-axis. Let's check that I didn't make that up:
import matplotlib.pyplot as plt
import pandas as pd
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
#convert TS from string into datetime objects
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
#and plot it as you do directly from pandas that provides the data to matplotlib
df.plot.bar(
x="TS",
y="Val",
ax=ax1,
title="pandas version"
)
#now plot the same data using matplotlib
ax2.bar(df.TS, df.Val, width=22)
ax2.tick_params(axis="x", labelrotation=90)
ax2.set_title("matplotlib version")
plt.tight_layout()
plt.show()
Sample output:
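The difference can also be checked numerically. This is a minimal sketch, assuming three rows taken from the sample data, that compares where each library places the bar centres: pandas uses the integer positions 0, 1, 2, while matplotlib uses the date values themselves.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Three rows taken from the sample data
df = pd.DataFrame({
    "TS": pd.to_datetime([
        "2016-03-13 12:20:00", "2016-06-17 16:00:00", "2020-02-22 16:00:00",
    ]),
    "Val": [1, 1, 8],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df.plot.bar(x="TS", y="Val", ax=ax1)    # pandas: categorical x-axis
ax2.bar(df["TS"], df["Val"], width=22)  # matplotlib: date x-axis

# The centre of each bar, in axis units
pandas_centers = [p.get_x() + p.get_width() / 2 for p in ax1.patches]
mpl_centers = [p.get_x() + p.get_width() / 2 for p in ax2.patches]

print(pandas_centers)               # integer category positions
print(mdates.num2date(mpl_centers)) # the original timestamps
```

Because the pandas axis only knows the positions 0 to n-1, any locator applied to it works on those integers rather than on dates.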
So, we should plot directly with matplotlib to avoid losing the TimeStamp information. Obviously, we lose some of the comfort provided by pandas, e.g., we have to adjust the width of the bars and label the axes ourselves. Now, you could use the other convenience function, MaxNLocator, but as you noticed, it has been written to work well in most conditions, and you give up control over the exact positioning of the ticks. Why not write our own locator using FixedLocator?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
import pandas as pd
def myownMaxNLocator(datacol, n):
    datemin = mdates.date2num(datacol.min())
    datemax = mdates.date2num(datacol.max())
    xticks = np.linspace(datemin, datemax, n)
    return xticks
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
df.plot.bar(
x="TS",
y="Val",
ax=ax1,
title="pandas version"
)
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
dateticks = myownMaxNLocator(df.TS, 5)
ax2.xaxis.set_major_locator(FixedLocator(dateticks))
ax2.tick_params(axis="x", labelrotation=90)
plt.tight_layout()
plt.show()
Sample output:
Here, the ticks start with the lowest value and end with the highest value. Alternatively, you could use the LinearLocator that distributes the ticks evenly over the entire view:
from matplotlib.ticker import LinearLocator
...
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
ax2.xaxis.set_major_locator(LinearLocator(numticks=5))
ax2.tick_params(axis="x", labelrotation=90)
...
Sample output:
The sample data were stored in a file with the following structure:
TS Val
0 2016-03-13 12:20:00 1
1 2016-04-13 13:39:00 3
2 2016-04-03 13:43:00 5
3 2016-06-17 16:00:00 1
4 2016-09-14 13:27:00 2
2088 2017-02-08 16:00:00 7
2089 2017-02-25 16:00:00 2
2090 2018-02-20 16:00:00 8
2091 2019-02-21 16:00:00 9
2092 2020-02-22 16:00:00 8
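To see which dates the FixedLocator approach actually puts ticks on, the positions returned by myownMaxNLocator can be converted back with mdates.num2date. The sketch below reuses that function and the TS values from the sample file; the choice of n=5 matches the call in the answer.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

def myownMaxNLocator(datacol, n):
    # n evenly spaced positions between the first and the last date
    datemin = mdates.date2num(datacol.min())
    datemax = mdates.date2num(datacol.max())
    return np.linspace(datemin, datemax, n)

# TS values from the sample file
ts = pd.Series(pd.to_datetime([
    "2016-03-13 12:20:00", "2016-04-13 13:39:00", "2016-04-03 13:43:00",
    "2016-06-17 16:00:00", "2016-09-14 13:27:00", "2017-02-08 16:00:00",
    "2017-02-25 16:00:00", "2018-02-20 16:00:00", "2019-02-21 16:00:00",
    "2020-02-22 16:00:00",
]))

dateticks = myownMaxNLocator(ts, 5)
tick_dates = mdates.num2date(dateticks)  # back to datetime objects
for d in tick_dates:
    print(d.strftime("%Y-%m-%d"))
```

The first and last ticks fall on the earliest and latest timestamps, and the ticks in between are equidistant in time, so they do not necessarily coincide with a data point.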
Have you considered grouping by date if you don't need that many xticks?
To answer your question, you can make custom ticks with:
plt.xticks(ticks=[ any list ], labels=[ list of labels ])
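Both suggestions can be combined. The following is a minimal sketch with made-up rows in the shape of PatientTraffic: it sums the counts per calendar month and then labels only the first and last bar with plt.xticks. The monthly grouping and the choice of which bars to label are assumptions for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows in the same shape as PatientTraffic
traffic = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2016-03-13 12:20:00", "2016-03-13 13:39:00", "2016-03-14 13:27:00",
        "2016-04-02 16:00:00", "2016-05-20 16:00:00",
    ]),
    "count": [1, 1, 1, 8, 8],
})

# Group by calendar month to reduce the number of bars
monthly = traffic.groupby(traffic["TimeStamp"].dt.to_period("M"))["count"].sum()

fig, ax = plt.subplots(figsize=(20, 3))
monthly.plot.bar(ax=ax, title="Patient Traffic per Month")

# Bars sit at positions 0..n-1; label only the first and the last one
positions = [0, len(monthly) - 1]
labels = [str(monthly.index[p]) for p in positions]
plt.xticks(ticks=positions, labels=labels, rotation=0)
```

With the full data, positions could instead be every n-th index, for example range(0, len(monthly), 6).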
I am trying to change the format of the x-tick labels to a date format (%m-%d).
My data consists of hourly values over a period of dates, and I am trying to plot 14 days of it. However, when I run the code, the x labels come out completely jumbled.
Is there any way to modify the x ticks so that the hourly labels are skipped and only the dates are shown? I am using seaborn.
After a suggestion in the comments, I edited my code to plot as below:
fig, ax = plt.pyplot.subplots()
g = sns.barplot(data=data_n,x='datetime',y='hourly_return')
g.xaxis.set_major_formatter(plt.dates.DateFormatter("%d-%b"))
But I got the following error:
ValueError: DateFormatter found a value of x=0, which is an illegal
date; this usually occurs because you have not informed the axis that
it is plotting dates, e.g., with ax.xaxis_date()
Upon checking the datetime column, I get the following output, including the data type of the column:
0 2020-01-01 00:00:00
1 2020-01-01 01:00:00
2 2020-01-01 02:00:00
3 2020-01-01 03:00:00
4 2020-01-01 04:00:00
...
307 2020-01-13 19:00:00
308 2020-01-13 20:00:00
309 2020-01-13 21:00:00
310 2020-01-13 22:00:00
311 2020-01-13 23:00:00
Name: datetime, Length: 312, dtype: datetime64[ns]
I suspected the x ticks, so I ran g.get_xticks() [which gets the ticks on the x-axis] and got ordinal numbers as output. Can anyone tell me why this is happening?
1. Approach for Drawing Line Plot with x-axis datetime
Can you try changing the x-axis format as below?
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import dates
## create dummy dataframe
datelist = pd.date_range(start='2020-01-01 00:00:00', periods=312,freq='1H').tolist()
#create dummy dataframe
df = pd.DataFrame(datelist, columns=["datetime"])
df["val"] = [i for i in range(1,312+1)]
df.head()
Below is the dataframe info
Draw plot
fig, ax = plt.subplots()
chart = sns.lineplot(data=df, ax=ax, x="datetime",y="val")
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b"))
Output:
2. Approach for Drawing Bar plot using seaborn with x-axis datetime
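The approach above does not carry over to a bar plot: seaborn's barplot treats the x values as categories and places the bars at integer positions 0, 1, 2, ..., so a DateFormatter has no real dates to format. Instead, set the tick positions and labels by hand, as in the code below: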
fig, ax = plt.subplots()
## barplot
chart = sns.barplot(data=df, ax=ax,x="datetime",y="val")
## The data has one point per hour, so each day contributes 24 bars.
## Label only every 24th bar, i.e. one tick per day.
freq = 24
# set the xticks first, keeping every freq-th bar position
xtix = ax.get_xticks()
ax.set_xticks(xtix[::freq])
# then label those ticks with the date part of the matching rows
ax.set_xticklabels(df.iloc[::freq]["datetime"].dt.strftime("%d-%b-%y"))
# nicer label format for dates
fig.autofmt_xdate()
plt.show()
output:
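If seaborn is not strictly required, another option is to draw the bars with matplotlib's ax.bar, which keeps a real date axis so that date locators and formatters work. This is a minimal sketch on the same dummy data; the bar width of one hour (1/24 of a day), the two-day tick spacing and the "%m-%d" format are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Same dummy data as above: 312 hourly points
datelist = pd.date_range(start="2020-01-01 00:00:00", periods=312, freq="h")
df = pd.DataFrame({"datetime": datelist, "val": range(1, 312 + 1)})

fig, ax = plt.subplots()
# On a date axis, widths are measured in days, so 1/24 is one hour
ax.bar(df["datetime"], df["val"], width=1 / 24)

# One tick every second day, labelled as month-day
ax.xaxis.set_major_locator(mdates.DayLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
fig.autofmt_xdate()

fig.canvas.draw()  # compute the tick labels
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

The trade-off is the same as in any switch from a categorical to a date axis: bar widths and colours have to be set by hand instead of being chosen by seaborn.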
I read this Excel sheet (only the 'DATEHEUREMAX' column) with pandas using this command:
xdata = read_excel('Data.xlsx', 'Data', usecols=['DATEHEUREMAX'])
Now I want to turn this df into a simplified df containing only hour:min, rounded up to 15 min. The main idea is to plot a histogram based on hour:min.
Consider the following DataFrame, with a single column, read as datetime (not string):
Dat
0 2019-06-03 12:07:00
1 2019-06-04 10:04:00
2 2019-06-05 11:42:00
3 2019-06-06 10:17:00
To round these dates to 15 mins run:
df['Dat2'] = df.Dat.dt.round('15T').dt.time.map(lambda s: str(s)[:-3])
The result is:
Dat Dat2
0 2019-06-03 12:07:00 12:00
1 2019-06-04 10:04:00 10:00
2 2019-06-05 11:42:00 11:45
3 2019-06-06 10:17:00 10:15
For demonstration purposes, I saved the result in a new column, but you can save it in the original column.
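Note that round goes to the nearest quarter hour, whereas the question asks for rounding up. A minimal sketch of the difference on the same four rows, using dt.ceil for rounding up; the column names nearest and up are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Dat": pd.to_datetime([
    "2019-06-03 12:07:00", "2019-06-04 10:04:00",
    "2019-06-05 11:42:00", "2019-06-06 10:17:00",
])})

# Nearest quarter hour, as in the answer above
df["nearest"] = df["Dat"].dt.round("15min").dt.strftime("%H:%M")
# Always up to the next quarter hour
df["up"] = df["Dat"].dt.ceil("15min").dt.strftime("%H:%M")

print(df)
```

A timestamp that already lies exactly on a quarter hour is left unchanged by both methods.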
I think this is what you are asking for:
rounded_column = df['time_column'].dt.round('15min').dt.strftime("%H:%M")
Although I agree with the commenters: you might not really need to do this and could just use a time grouper (pd.Grouper).
There is no need to round your column in order to get a histogram of dates with your DATEHEUREMAX column. For this purpose you can just make use of pd.Grouper as detailed below.
Toy sample code
You can work out this example to get a solution with your date column:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generating a sample of 10000 timestamps and selecting 500 to randomize them
df = pd.DataFrame(np.random.choice(pd.date_range(start=pd.to_datetime('2015-01-14'),periods = 10000, freq='S'), 500), columns=['date'])
# Setting the date as the index since the TimeGrouper works on Index, the date column is not dropped to be able to count
df.set_index('date', drop=False, inplace=True)
# Getting the histogram
df.groupby(pd.Grouper(freq='15Min')).count().plot(kind='bar')
plt.show()
This code resolves to a graph like below:
Solution with your data
For your data you should be able to do something like:
import pandas as pd
import matplotlib.pyplot as plt
xdata = pd.read_excel('Data.xlsx', 'Data', usecols=['DATEHEUREMAX'])
xdata.set_index('DATEHEUREMAX', drop=False, inplace=True)
xdata.groupby(pd.Grouper(freq='15Min')).count().plot(kind='bar')
plt.show()
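pd.Grouper bins by the full timestamp, so each day gets its own set of 15-minute bins. If the histogram should instead be over the time of day (hour:min) regardless of the date, as the question describes, one possible sketch is to build the quarter-hour slot as a string and count the occurrences. The timestamps below are made up, and rounding up with dt.ceil is an assumption based on the question's wording.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical timestamps standing in for the DATEHEUREMAX column
xdata = pd.DataFrame({"DATEHEUREMAX": pd.to_datetime([
    "2019-06-03 12:07:00", "2019-06-04 12:10:00",
    "2019-06-05 10:04:00", "2019-06-06 10:17:00",
])})

# Round up to the next quarter hour and keep only hour:min
slots = xdata["DATEHEUREMAX"].dt.ceil("15min").dt.strftime("%H:%M")

# Count the timestamps per slot, ordered by time of day
counts = slots.value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
print(counts)
```

Slots with no observations do not appear in counts; if empty slots should show as zero-height bars, counts would need to be reindexed against a full list of quarter-hour labels.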
Hi I have a dataframe like this:
Date Influenza[it] Febbre[it] Cefalea[it] Paracetamolo[it] \
0 2008-01 989 2395 1291 2933
1 2008-02 962 2553 1360 2547
2 2008-03 1029 2309 1401 2735
3 2008-04 1031 2399 1137 2296
Unnamed: 6 tot_incidence
0 NaN 4.56
1 NaN 5.98
2 NaN 6.54
3 NaN 6.95
I'd like to plot several figures, each with the Date column on the x-axis and, on the y-axis, the Influenza[it] column together with one other column such as Febbre[it]; then the same again with Date on the x-axis and Influenza[it] plus another column (e.g. Paracetamolo[it]) on the y-axis, and so on. I'm trying to figure out whether there is a quick way to do this without completely reshaping the dataframe.
You can simply plot 3 different subplots.
import pandas as pd
import matplotlib.pyplot as plt
dic = {"Date" : ["2008-01","2008-02", "2008-03", "2008-04"],
"Influenza[it]" : [989,962,1029,1031],
"Febbre[it]" : [2395,2553,2309,2399],
"Cefalea[it]" : [1291,1360,1401,1137],
"Paracetamolo[it]" : [2933,2547,2735,2296]}
df = pd.DataFrame(dic)
#optionally convert to datetime
df['Date'] = pd.to_datetime(df['Date'])
fig, ax = plt.subplots(1,3, figsize=(13,7))
df.plot(x="Date", y=["Influenza[it]","Febbre[it]" ], ax=ax[0])
df.plot(x="Date", y=["Influenza[it]","Cefalea[it]" ], ax=ax[1])
df.plot(x="Date", y=["Influenza[it]","Paracetamolo[it]" ], ax=ax[2])
#optionally equalize yaxis limits
for a in ax:
    a.set_ylim([800, 3000])
plt.show()
If you want to plot each plot separately in a jupyter notebook, the following might do what you want.
Additionally we convert the dates from format year-week to a datetime to be able to plot them with matplotlib.
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
dic = {"Date" : ["2008-01","2008-02", "2008-03", "2008-04"],
"Influenza[it]" : [989,962,1029,1031],
"Febbre[it]" : [2395,2553,2309,2399],
"Cefalea[it]" : [1291,1360,1401,1137],
"Paracetamolo[it]" : [2933,2547,2735,2296]}
df = pd.DataFrame(dic)
#convert to datetime, format year-week -> date (monday of that week)
df['Date'] = [ date + "-1" for date in df['Date']] # add "-1" indicating monday of that week
df['Date'] = pd.to_datetime(df['Date'], format="%Y-%W-%w")
cols = ["Febbre[it]", "Cefalea[it]", "Paracetamolo[it]"]
for col in cols:
    plt.close()
    fig, ax = plt.subplots(1,1)
    ax.set_ylim([800, 3000])
    ax.plot(df.Date, df["Influenza[it]"], label="Influenza[it]")
    ax.plot(df.Date, df[col], label=col)
    ax.legend()
    plt.show()
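The year-week conversion can be inspected on its own. In this minimal sketch, "%W" is the week number with Monday as the first day of the week and "%w" is the weekday, where 1 means Monday, so appending "-1" selects the Monday of each week. The variable names are hypothetical.

```python
import pandas as pd

# Year-week labels as they appear in the Date column
labels = pd.Series(["2008-01", "2008-02", "2008-03", "2008-04"])

# Append "-1" (Monday) and parse as year, week number, weekday
mondays = pd.to_datetime(labels + "-1", format="%Y-%W-%w")

print(mondays.dt.strftime("%Y-%m-%d").tolist())
print(mondays.dt.dayofweek.tolist())  # 0 means Monday
```

With "%W", the days before the first Monday of the year belong to week 0, so week 01 of 2008 starts on the first Monday of January rather than on 1 January.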
I am plotting several pandas series objects of "total events per week". The data in the series events_per_week looks like this:
Datetime
1995-10-09 45
1995-10-16 63
1995-10-23 83
1995-10-30 91
1995-11-06 101
Freq: W-SUN, dtype: int64
My problem is as follows. All the pandas series have the same length, i.e. they begin in the same year, 1995. One series, events_per_week2003, begins in 2003, however:
Datetime
2003-09-08 25
2003-09-15 36
2003-09-22 74
2003-09-29 25
2003-09-05 193
Freq: W-SUN, dtype: int64
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(20,5))
ax = plt.subplot(111)
plt.plot(events_per_week)
plt.plot(events_per_week2003)
I get the following value error.
ValueError: setting an array element with a sequence.
How can I do this?
I really don't get where you're having problems.
I tried to recreate a piece of the dataframe, and it plotted with no problems.
import numpy
import pandas as pd
import matplotlib.pyplot
data = numpy.array([45,63,83,91,101])
df1 = pd.DataFrame(data, index=pd.date_range('2005-10-09', periods=5, freq='W'), columns=['events'])
df2 = pd.DataFrame(numpy.arange(10,21,2), index=pd.date_range('2003-01-09', periods=6, freq='W'), columns=['events'])
matplotlib.pyplot.plot(df1.index, df1.events)
matplotlib.pyplot.plot(df2.index, df2.events)
matplotlib.pyplot.show()
Using Series instead of Dataframe:
ds1 = pd.Series(data, index=pd.date_range('2005-10-09', periods=5, freq='W'))
ds2 = pd.Series(numpy.arange(10,21,2), index=pd.date_range('2003-01-09', periods=6, freq='W'))
matplotlib.pyplot.plot(ds1)
matplotlib.pyplot.plot(ds2)
matplotlib.pyplot.show()
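A variant of the Series example with explicit axes and a legend is sketched below. The values are taken from the question, but the weekly index is regenerated with pd.date_range, so the exact dates are an assumption. The dtype check is only a guess at a possible cause: this ValueError can appear when a series holds non-numeric objects, which the question does not confirm.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Values from the question; the weekly index is regenerated (assumption)
events_per_week = pd.Series(
    [45, 63, 83, 91, 101],
    index=pd.date_range("1995-10-09", periods=5, freq="W"),
)
events_per_week2003 = pd.Series(
    [25, 36, 74, 25, 193],
    index=pd.date_range("2003-09-08", periods=5, freq="W"),
)

fig, ax = plt.subplots(figsize=(20, 5))
all_series = {"from 1995": events_per_week, "from 2003": events_per_week2003}
for name, series in all_series.items():
    # Guard against non-numeric values before handing them to matplotlib
    if not pd.api.types.is_numeric_dtype(series):
        raise TypeError(f"{name!r} has non-numeric dtype {series.dtype}")
    ax.plot(series.index, series.to_numpy(), label=name)
ax.legend()
```

Series with different start dates share the same date axis without any alignment step, because each line brings its own x values.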