Set index values for a Python plot

I am graphing three lines on a single plot. I want the x-axis to display the date the data was recorded on and the time from 00:00 to 24:00. Right now my code displays the time of day correctly, but instead of the date the data was recorded on, it shows the current date (12-18). I am unsure how to correct this. It would also be acceptable for the plot to show only the time from 00:00 to 24:00, without the date on the x-axis. Thank you for your help!
# set index as time for graphing
monAverages['Time'] = monAverages['Time'].apply(lambda x: pd.to_datetime(str(x)))
index = monAverages['Time']
index = index.apply(lambda x: pd.to_datetime(str(x)))
averagePlot = dfSingleDay
predictPlot = predictPlot[np.isfinite(predictPlot)]
datasetPlot = datasetPlot[np.isfinite(datasetPlot)]
predictPlot1 = pd.DataFrame(predictPlot)
datasetPlot1 = pd.DataFrame(datasetPlot)
averagePlot.set_index(index, drop=True,inplace=True)
datasetPlot1.set_index(index, drop=True,inplace=True)
predictPlot1.set_index(index, drop=True,inplace=True)
plt.rcParams["figure.figsize"] = (10,10)
plt.plot(datasetPlot1,'b', label='Real Data')
plt.plot(averagePlot, 'y', label='Average for this day of the week')
plt.plot(predictPlot1, 'g', label='Predictions')
plt.title('Power Consumption')
plt.xlabel('Date (00-00) and Time of Day(00)')
plt.ylabel('kW')
plt.legend()
plt.show()

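You need to make sure you plot only the time of day. After pd.to_datetime the index holds Timestamp objects, so take the time directly with .time() rather than parsing strings with strptime:
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()  # pandas supplies the converter that lets matplotlib plot datetime.time values
# set index as time for graphing
monAverages['Time'] = monAverages['Time'].apply(lambda x: pd.to_datetime(str(x)))
index = monAverages['Time']
# keep only the time-of-day part of each Timestamp
dates = [d.time() for d in index]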
averagePlot = dfSingleDay
predictPlot = predictPlot[np.isfinite(predictPlot)]
datasetPlot = datasetPlot[np.isfinite(datasetPlot)]
predictPlot1 = pd.DataFrame(predictPlot)
datasetPlot1 = pd.DataFrame(datasetPlot)
plt.rcParams["figure.figsize"] = (10,10)
plt.plot(dates,datasetPlot1,'b', label='Real Data')
plt.plot(dates,averagePlot, 'y', label='Average for this day of the week')
plt.plot(dates,predictPlot1, 'g', label='Predictions')
plt.title('Power Consumption')
plt.xlabel('Date (00-00) and Time of Day(00)')
plt.ylabel('kW')
plt.legend()
plt.show()
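This self-contained example shows the idea:
import datetime as dt
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()  # pandas supplies the converter for datetime.time values
dates = ['2019-12-18 00:00:00','2019-12-18 12:00:00','2019-12-18 13:00:00']
# parse each string and keep only the time of day
x = [dt.datetime.strptime(d,'%Y-%m-%d %H:%M:%S').time() for d in dates]
y = range(len(x))
plt.plot(x,y)
plt.gcf().autofmt_xdate()
plt.show()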
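As an alternative that does not rely on matplotlib handling datetime.time objects, one option is to attach every time of day to the same placeholder date and let a DateFormatter print only hours and minutes. This is a minimal sketch under assumptions: the sample times, the readings, and the anchor date are invented for illustration and are not the asker's data.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical sample data: times of day and readings in kW.
times = ["00:00:00", "06:30:00", "12:00:00", "18:45:00", "23:59:00"]
power_kw = [1.2, 0.8, 2.5, 3.1, 1.0]

# Attach every time to one arbitrary date so matplotlib sees ordinary datetimes.
anchor = dt.date(2000, 1, 1)
x = [dt.datetime.combine(anchor, dt.time.fromisoformat(t)) for t in times]

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, power_kw, "b", label="Real Data")

# Label ticks with the time of day only, and cover the full 24 hours.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
day_start = dt.datetime.combine(anchor, dt.time.min)
ax.set_xlim(day_start, day_start + dt.timedelta(days=1))

ax.set_xlabel("Time of day")
ax.set_ylabel("kW")
ax.legend()
fig.canvas.draw()  # render off-screen; use plt.show() in an interactive session
```

Because the date part is never printed, any anchor date works as long as all three series use the same one.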

Related

Matplotlib Plot X-Axis by Month

I am looking to automate some work I have been doing in PowerPoint/Excel using Python and MatPlotLib; however, I am having trouble recreating what I have been doing in PowerPoint/Excel.
I have three data series that are grouped by month on the x-axis; however, the months are not date/time and have no real x-values. I want to be able to assign x-values based on the number of rows (so they are not stacked), then group them by month, and add a vertical line once the month "value" changes.
It is also important to note that the number of rows per month can vary, so I'm having trouble grouping the months and automatically adding the vertical line when the data moves on to the next month.
Here is a sample image of what I created in PowerPoint/Excel and what I am hoping to accomplish:
Here is what I have so far:
For above: I added a new column to my csv file named "Count" and added that as my x-values; however, that is only a workaround to get my desired "look" and does not separate the points by month.
My code so far:
manipulate.csv
Count,Month,Type,Time
1,June,Purple,13
2,June,Orange,3
3,June,Purple,13
4,June,Orange,12
5,June,Blue,55
6,June,Blue,42
7,June,Blue,90
8,June,Orange,3
9,June,Orange,171
10,June,Blue,132
11,June,Blue,96
12,July,Orange,13
13,July,Orange,13
14,July,Orange,22
15,July,Orange,6
16,July,Purple,4
17,July,Orange,3
18,July,Orange,18
19,July,Blue,99
20,August,Blue,190
21,August,Blue,170
22,August,Orange,33
23,August,Orange,29
24,August,Purple,3
25,August,Purple,9
26,August,Purple,6
testchart.py
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('manipulate.csv')
df=df.reindex(columns=["Month", "Type", "Time", "Count"])
df['Orange'] = df.loc[df['Type'] == 'Orange', 'Time']
df['Blue'] = df.loc[df['Type'] == 'Blue', 'Time']
df['Purple'] = df.loc[df['Type'] == 'Purple', 'Time']
print(df)
w = df['Count']
x = df['Orange']
y = df['Blue']
z = df['Purple']
plt.plot(w, x, linestyle = 'none', marker='o', c='Orange')
plt.plot(w, y, linestyle = 'none', marker='o', c='Blue')
plt.plot(w, z, linestyle = 'none', marker='o', c='Purple')
plt.ylabel("Time")
plt.xlabel("Month")
plt.show()
Can I suggest using Seaborn's swarmplot instead? It might be easier:
import seaborn as sns
import matplotlib.pyplot as plt
# Change the month to an actual date then set the format to just the date's month's name
df.Month = pd.to_datetime(df.Month, format='%B').dt.month_name()
sns.swarmplot(data=df, x='Month', y='Time', hue='Type', palette=['purple', 'orange', 'blue'])
plt.legend().remove()
for x in range(len(df.Month.unique())-1):
    plt.axvline(0.5+x, linestyle='--', color='black', alpha = 0.5)
Output Graph:
Or Seaborn's stripplot with some jitter value:
import seaborn as sns
import matplotlib.pyplot as plt
# Change the month to an actual date then set the format to just the date's month's name
df.Month = pd.to_datetime(df.Month, format='%B').dt.month_name()
sns.stripplot(data=df, x='Month', y='Time', hue='Type', palette=['purple', 'orange', 'blue'], jitter=0.4)
plt.legend().remove()
for x in range(len(df.Month.unique())-1):
    plt.axvline(0.5+x, linestyle='--', color='black', alpha = 0.5)
Otherwise, you can stay with plain matplotlib: use matplotlib.dates (imported as mdates) to format the x-axis labels as month names, and use datetime's timedelta to shift each point by a few days within its month so that the points do not overlap:
from datetime import timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
df.Month = pd.to_datetime(df.Month, format='%B')
separators = df.Month.unique() # Get each unique month, to be used for the vertical lines
# Add a number of days to each value, spread over a range of 25 days according to how many rows each month has in the dataframe
# This is just to split up the points so that there is no overlap
# Slicing with [:x] keeps exactly one offset per row, so the offsets stay aligned with their month
dayAdditions = sum([list(range(2,25,int(25/x)))[:x] for x in list(df.groupby('Month').count().Time)], [])
df.Month = [x + timedelta(days=count) for x,count in zip(df.Month, dayAdditions)]
df=df.reindex(columns=["Month", "Type", "Time", "Count"])
df['Orange'] = df.loc[df['Type'] == 'Orange', 'Time']
df['Blue'] = df.loc[df['Type'] == 'Blue', 'Time']
df['Purple'] = df.loc[df['Type'] == 'Purple', 'Time']
w = df['Count']
x = df['Orange']
y = df['Blue']
z = df['Purple']
fig, ax = plt.subplots()
plt.plot(df.Month, x, linestyle = 'none', marker='o', c='Orange')
plt.plot(df.Month, y, linestyle = 'none', marker='o', c='Blue')
plt.plot(df.Month, z, linestyle = 'none', marker='o', c='Purple')
plt.ylabel("Time")
plt.xlabel("Month")
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=15)) # Set the locator at the 15th of each month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%B')) # Set the format to just be the month name
for sep in separators[1:]:
    plt.axvline(sep, linestyle='--', color='black', alpha = 0.5) # Add a separator at every month starting at the second month
plt.show()
Output:
This is how I put your data in a df, in case anyone else wants to grab it to help answer the question:
from io import StringIO
import pandas as pd
TESTDATA = StringIO(
'''Count,Month,Type,Time
1,June,Purple,13
2,June,Orange,3
3,June,Purple,13
4,June,Orange,12
5,June,Blue,55
6,June,Blue,42
7,June,Blue,90
8,June,Orange,3
9,June,Orange,171
10,June,Blue,132
11,June,Blue,96
12,July,Orange,13
13,July,Orange,13
14,July,Orange,22
15,July,Orange,6
16,July,Purple,4
17,July,Orange,3
18,July,Orange,18
19,July,Blue,99
20,August,Blue,190
21,August,Blue,170
22,August,Orange,33
23,August,Orange,29
24,August,Purple,3
25,August,Purple,9
26,August,Purple,6''')
df = pd.read_csv(TESTDATA, sep = ',')
Maybe add custom x-axis labels and separating lines between months:
new_month = ~df.Month.eq(df.Month.shift(-1))
for c in df[new_month].Count.values[:-1]:
    plt.axvline(c + 0.5, linestyle="--", color="gray")
plt.xticks(
    (df[new_month].Count + df[new_month].Count.shift(fill_value=0)) / 2,
    df[new_month].Month,
)
for color in ["Orange", "Blue", "Purple"]:
    plt.plot(
        df["Count"],
        df[color],
        linestyle="none",
        marker="o",
        color=color.lower(),
        label=color,
    )
I would also advise that you rename the color columns into something more descriptive and if possible add more time information to your data sample (days, year).
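The arithmetic behind the separators and the month labels can be checked without plotting. The sketch below assumes the rows are ordered so that each month forms one contiguous run and that x-positions are 1-based like the Count column; the function name month_layout is hypothetical. It centres each label on its month's points, which differs by about half a position from the tick formula in the snippet above.

```python
from itertools import groupby


def month_layout(months):
    """Return (separators, tick_positions, tick_labels) for 1-based row positions.

    `months` lists the month of each row in plotting order; equal months are
    assumed to be adjacent.
    """
    separators, ticks, labels = [], [], []
    start = 1  # x-position of the first row in the current month
    for name, run in groupby(months):
        size = len(list(run))
        end = start + size - 1           # x-position of the last row in this month
        ticks.append((start + end) / 2)  # centre the label under the month's points
        labels.append(name)
        separators.append(end + 0.5)     # boundary halfway to the next month's first row
        start = end + 1
    return separators[:-1], ticks, labels  # no separator after the final month


months = ["June"] * 11 + ["July"] * 8 + ["August"] * 7
separators, ticks, labels = month_layout(months)
print(separators)  # [11.5, 19.5]
print(ticks)       # [6.0, 15.5, 23.0]
print(labels)      # ['June', 'July', 'August']
```

Each value in separators can be passed to plt.axvline, and ticks and labels can be passed together to plt.xticks.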

ConversionError: Failed to convert value(s) to axis units: '2015-01-01'

I am trying to convert values to axis units. I checked answers to similar problems, but none addressed this specific case. As the image below shows, the expected plot (A) should show months (Jan, Feb, etc.) on the x-axis, but the plot I get (B) shows dates (2015-01, etc.).
Below is the source code; kindly assist. Thanks.
plt.rcParams["font.size"] = 18
plt.figure(figsize=(20,5))
plt.plot(df.air_temperature,label="Air temperature at Frankfurt Int. Airport in 2015")
plt.xlim(("2015-01-01","2015-12-31"))
plt.xticks(["2015-{:02d}-15".format(x) for x in range(1,13,1)],["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"])
plt.legend()
plt.ylabel("Temperature (°C)")
plt.show()
A sound way to plot against dates is to use datetime values rather than strings, so first convert the column:
df = pd.read_csv(r'data/frankfurt_weather.csv')
df['time'] = pd.to_datetime(df['time'], format = '%Y-%m-%d %H:%M')
Then you can set up the plot as you please, preferably with the object-oriented interface:
plt.rcParams['font.size'] = 18
fig, ax = plt.subplots(figsize = (20,5))
ax.plot(df['time'], df['air_temperature'], label = 'Air temperature at Frankfurt Int. Airport in 2015')
ax.legend()
ax.set_ylabel('Temperature (°C)')
plt.show()
Then you can customize:
the format and position of the x tick labels, with matplotlib.dates (imported as md):
ax.xaxis.set_major_locator(md.MonthLocator(interval = 1))
ax.xaxis.set_major_formatter(md.DateFormatter('%b'))
the x axis limits:
ax.set_xlim([pd.to_datetime('2015-01-01', format = '%Y-%m-%d'),
             pd.to_datetime('2015-12-31', format = '%Y-%m-%d')])
the capitalization of the first letter of the month names in the x tick labels:
fig.canvas.draw()
ax.set_xticklabels([month.get_text().title() for month in ax.get_xticklabels()])
Complete Code
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv(r'data/frankfurt_weather.csv')
df['time'] = pd.to_datetime(df['time'], format = '%Y-%m-%d %H:%M')
plt.rcParams['font.size'] = 18
fig, ax = plt.subplots(figsize = (20,5))
ax.plot(df['time'], df['air_temperature'], label = 'Air temperature at Frankfurt Int. Airport in 2015')
ax.legend()
ax.set_ylabel('Temperature (°C)')
ax.xaxis.set_major_locator(md.MonthLocator(interval = 1))
ax.xaxis.set_major_formatter(md.DateFormatter('%b'))
ax.set_xlim([pd.to_datetime('2015-01-01', format = '%Y-%m-%d'),
             pd.to_datetime('2015-12-31', format = '%Y-%m-%d')])
fig.canvas.draw()
ax.set_xticklabels([month.get_text().title() for month in ax.get_xticklabels()])
plt.show()
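The question placed each month label on the 15th, whereas MonthLocator(interval = 1) puts the ticks on the 1st. If mid-month labels are wanted, a variation along these lines may help. It is only a sketch: the cosine curve is synthetic data standing in for the CSV file, which is not available here.

```python
import datetime as dt
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as md
import matplotlib.pyplot as plt

# Synthetic stand-in for the temperature series: one value per day of 2015.
days = [dt.datetime(2015, 1, 1) + dt.timedelta(days=i) for i in range(365)]
temps = [10 - 10 * math.cos(2 * math.pi * i / 365) for i in range(365)]

fig, ax = plt.subplots(figsize=(20, 5))
ax.plot(days, temps, label="Synthetic air temperature, 2015")

# The limits are datetimes, not strings, so they match the axis units.
ax.set_xlim(dt.datetime(2015, 1, 1), dt.datetime(2015, 12, 31))

# One tick on the 15th of every month, labelled with the abbreviated month name.
ax.xaxis.set_major_locator(md.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(md.DateFormatter("%b"))

ax.set_ylabel("Temperature (°C)")
ax.legend()
fig.canvas.draw()  # render off-screen; use plt.show() in an interactive session

# Tick positions converted back to datetimes, for inspection.
tick_dates = [md.num2date(t) for t in ax.get_xticks()]
```

The '%b' format follows the current locale, so in an English locale the labels already read Jan, Feb, and so on without a separate capitalization step.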

Matplotlib - 24h Timeline graph

I want to make a timeline that shows the average number of messages sent over a 24h period. So far, I have managed to format both of the axes. The Y-axis already has the correct data in it.
These are the lists of data:
dates[] #a list of datetimes reduced to hours and minutes
values[] #a list of int
Now, for some time, I have tried to insert data into the graph. I have managed to insert the data now, but I assume that the X-axis is causing some problems because of formatting.
lineColor = "#f0f8ff"
chartColor = "#f0f8ff"
backgroundColor = "#36393f"
girdColor = "#8a8a8a"
dates = []
values = []
fig, ax = plt.subplots()
hours = mdates.HourLocator(interval=2)
d_fmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_minor_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_locator(hours)
ax.xaxis.set_major_formatter(d_fmt)
ax.fill(dates, values)
ax.plot(dates, values, color=Commands.lineColor)
ax.set_xlim(["00:00", "23:59"])
plt.fill_between(dates, values,)
# region ChartDesign
ax.set_title('Amount of Messages')
ax.tick_params(axis='y', colors=Commands.chartColor)
ax.tick_params(axis='x', colors=Commands.chartColor)
ax.tick_params(which='minor', colors=Commands.chartColor)
ax.set_ylabel('Messages', color=Commands.chartColor)
plt.grid(True, color=Commands.girdColor)
ax.set_facecolor(Commands.backgroundColor)
ax.spines["bottom"].set_color(Commands.chartColor)
ax.spines["left"].set_color(Commands.chartColor)
ax.spines["top"].set_color(Commands.chartColor)
ax.spines["right"].set_color(Commands.chartColor)
fig.patch.set_facecolor(Commands.backgroundColor)
fig.tight_layout()
fig.autofmt_xdate()
# endregion
There are similar questions, but they aren't much use for me.
Since I don't have any sample data, I created some simple data and made a graph. The final 0:00 on the timeline is the tricky part, so I replaced the last 0:00 with 24:00. I then set the HourLocator interval to 48. Because the x values are strings, matplotlib places them at consecutive integer positions, and a date locator treats one axis unit as one day, so a 48-hour interval spaces the ticks two positions apart, which corresponds to every 2 hours in your data. I have removed the code that I deemed unnecessary.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import numpy as np
lineColor = "#f0f8ff"
chartColor = "#f0f8ff"
backgroundColor = "#36393f"
girdColor = "#8a8a8a"
date_rng = pd.date_range('2020-12-01', '2020-12-02', freq='1H')
dates = date_rng.strftime('%H:%M').tolist()
values = np.random.randint(0,25, size=25)
dates[-1] = '24:00'
fig, ax = plt.subplots(figsize=(12,9))
hours = mdates.HourLocator(interval=48)
ax.xaxis.set_major_locator(hours)
# ax.fill(dates, values)
ax.plot(dates, values, color=lineColor)
ax.fill_between(dates, values,)
# region ChartDesign
ax.set_title('Amount of Messages', color=chartColor)
ax.tick_params(axis='y', colors=chartColor)
ax.tick_params(axis='x', colors=chartColor)
# ax.tick_params(which='major', colors=chartColor)
ax.set_ylabel('Messages', color=chartColor)
ax.grid(True, color=girdColor)
ax.set_facecolor(backgroundColor)
ax.spines["bottom"].set_color(chartColor)
ax.spines["left"].set_color(chartColor)
ax.spines["top"].set_color(chartColor)
ax.spines["right"].set_color(chartColor)
fig.set_facecolor(backgroundColor)
fig.tight_layout()
fig.autofmt_xdate()
plt.show()
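Another way to get an axis that runs from 00:00 to 24:00 is to plot against hours since midnight as plain numbers and format the ticks yourself. The sketch below is an assumption-laden illustration rather than a drop-in replacement: hour_label is a hypothetical helper and the values are invented. Real datetimes would first be converted with something like t.hour + t.minute / 60.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def hour_label(value, pos=None):
    """Format a number of hours since midnight as HH:MM (24.0 -> '24:00')."""
    total_minutes = int(round(value * 60))
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


# Hypothetical hourly averages at positions 0..24; position 24 repeats the midnight value.
hours = list(range(25))
values = [abs(12 - h) + 3 for h in hours]

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(hours, values, color="#f0f8ff")
ax.fill_between(hours, values, alpha=0.4)

ax.set_xlim(0, 24)
ax.xaxis.set_major_locator(MultipleLocator(2))  # labelled tick every 2 hours
ax.xaxis.set_minor_locator(MultipleLocator(1))  # unlabelled tick every hour
ax.xaxis.set_major_formatter(FuncFormatter(hour_label))

ax.set_title("Amount of Messages")
ax.set_ylabel("Messages")
fig.canvas.draw()  # render off-screen; use plt.show() in an interactive session
```

With numeric positions the last tick can read 24:00, which a '%H:%M' date format cannot produce.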

Cannot prepare proper labels in Matplotlib

I have very simple code:
from matplotlib import dates
import matplotlib.ticker as ticker
my_plot=df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90)
I've got:
but I would like to have fewer labels on the X axis. To do this I added:
my_plot.xaxis.set_major_locator(ticker.MaxNLocator(12))
It generates fewer labels, but the labels have the wrong values (the first few labels from the whole list).
What am I doing wrong?
Additional information:
I forgot to show what is inside the DataFrame.
I have three columns:
reg_Date - datetime64 (index)
temperature - float64
Day - date converted from reg_Date to string, it looks like '2017-10' (YYYY-MM)
The box plot groups the data by 'Day', and I would like to show the 'Day' values as labels, but not all of them; for example, every third one.
You were almost there. Just set ticker.MultipleLocator.
The pandas.DataFrame.boxplot also returns axes, which is an object of class matplotlib.axes.Axes. So you can use this code snippet to customize your labels:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
center = np.random.randint(50,size=(10, 20))
spread = np.random.rand(10, 20) * 30
flier_high = np.random.rand(10, 20) * 30 + 30
flier_low = np.random.rand(10, 20) * -30
y = np.concatenate((spread, center, flier_high, flier_low))
fig, ax = plt.subplots(figsize=(10, 5))
ax.boxplot(y)
x = ['Label '+str(i) for i in range(20)]
ax.set_xticklabels(x)
ax.set_xlabel('Day')
# Set a tick on each integer multiple of a base within the view interval.
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
plt.xticks(rotation=90)
I think there is a compatibility issue with Pandas plots and Matplotlib formatters.
With the following code:
df = pd.read_csv('lt_stream-1001-full.csv', header=0, encoding='utf8')
df['reg_date'] = pd.to_datetime(df['reg_date'] , format='%Y-%m-%d %H:%M:%S')
df.set_index('reg_date', inplace=True)
df_h = df.resample(rule='H').mean()
df_h['Day']=df_h.index.strftime('%Y-%m')
print(df_h)
f, ax = plt.subplots()
my_plot = df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90, ax=ax)
locs, labels = plt.xticks()
i = 0
new_labels = list()
for l in labels:
    if i % 3 == 0:
        label = labels[i]
        i += 1
        new_labels.append(label)
    else:
        label = ''
        i += 1
        new_labels.append(label)
ax.set_xticklabels(new_labels)
plt.show()
You get this chart:
But I notice that this is grouped by month instead of by day. It may not be what you wanted.
Adding the day component to the 'Day' string clutters the chart, as there are too many boxes.
df = pd.read_csv('lt_stream-1001-full.csv', header=0, encoding='utf8')
df['reg_date'] = pd.to_datetime(df['reg_date'] , format='%Y-%m-%d %H:%M:%S')
df.set_index('reg_date', inplace=True)
df_h = df.resample(rule='H').mean()
df_h['Day']=df_h.index.strftime('%Y-%m-%d')
print(df_h)
f, ax = plt.subplots()
my_plot = df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90, ax=ax)
locs, labels = plt.xticks()
i = 0
new_labels = list()
for l in labels:
    if i % 15 == 0:
        label = labels[i]
        i += 1
        new_labels.append(label)
    else:
        label = ''
        i += 1
        new_labels.append(label)
ax.set_xticklabels(new_labels)
plt.show()
The for loop keeps one tick label per chosen number of periods and blanks the rest. In the first chart a label was kept every 3 months; in the second, every 15 days.
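The label-thinning loop can be written more compactly. Box plot tick labels appear to be a fixed list assigned by position, which would explain why changing only the locator reuses the first few labels; thinning the list itself avoids that. A minimal sketch, where thin_labels is a hypothetical helper name and the labels are made up:

```python
def thin_labels(labels, step):
    """Keep every `step`-th label and blank the rest, preserving the length."""
    return [label if i % step == 0 else "" for i, label in enumerate(labels)]


# Hypothetical month labels such as a box plot grouped by 'YYYY-MM' would produce.
labels = [f"2017-{month:02d}" for month in range(1, 13)]
print(thin_labels(labels, 3))
# ['2017-01', '', '', '2017-04', '', '', '2017-07', '', '', '2017-10', '', '']
```

The result has the same length as the input, so it can be handed straight to ax.set_xticklabels without touching the tick positions.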
If you would like to see fewer grid lines:
df = pd.read_csv('lt_stream-1001-full.csv', header=0, encoding='utf8')
df['reg_date'] = pd.to_datetime(df['reg_date'] , format='%Y-%m-%d %H:%M:%S')
df.set_index('reg_date', inplace=True)
df_h = df.resample(rule='H').mean()
df_h['Day']=df_h.index.strftime('%Y-%m-%d')
print(df_h)
f, ax = plt.subplots()
my_plot = df_h.boxplot(by='Day',figsize=(12,5), showfliers=False, rot=90, ax=ax)
locs, labels = plt.xticks()
i = 0
new_labels = list()
new_locs = list()
for l in labels:
    if i % 3 == 0:
        label = labels[i]
        loc = locs[i]
        i += 1
        new_labels.append(label)
        new_locs.append(loc)
    else:
        i += 1
ax.set_xticks(new_locs)
ax.set_xticklabels(new_labels)
ax.grid(axis='y')
plt.show()
I've read about x_compat in Pandas plot in order to apply Matplotlib formatters, but I get an error when trying to apply it. I'll give it another shot later.
Old unsuccessful answer
The tick labels seem to be dates. If they are set as datetime in your dataframe, you can:
months = mdates.MonthLocator(bymonth=(1, 4, 7, 10)) #Choose the months you like the most
ax.xaxis.set_major_locator(months)
Otherwise, you can let Matplotlib know they are dates by:
ax.xaxis_date()
Your comment:
Additional information:
I forgot to show what is inside the DataFrame.
I have three columns:
reg_Date - datetime64 (index)
temperature - float64
Day - date converted from reg_Date to string, it looks like '2017-10' (YYYY-MM)
The box plot groups the data by 'Day', and I would like to show the 'Day' values as labels, but not all of them; for example, every third one.
Based on your comment in italic above, I would use reg_Date as the input and the following lines:
days = mdates.DayLocator(interval=3)
daysFmt = mdates.DateFormatter('%Y-%m') #to format display
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(daysFmt)
I forgot to mention that you will need to:
import matplotlib.dates as mdates
Does this work?

divide x and y labels in Matplotlib

I have a graph with dates on the X axis and some readings on the Y axis. The X axis has a date interval with an increment of one day. What I want is to show the hours on the X axis between two days (that is, to show the hours in the yellow area of the graph).
The idea of the code is:
Date=[];Readings=[] # will be filled from another function
dateconv=np.vectorize(datetime.fromtimestamp)
Date_F=dateconv(Date)
ax1 = plt.subplot2grid((1,1), (0,0))
ax1.plot_date(Date_F,Readings,'-')
for label in ax1.xaxis.get_ticklabels():
    label.set_rotation(45)
ax1.grid(True)
plt.xlabel('Date')
plt.ylabel('Readings')
ax1.set_yticks(range(0,800,50))
plt.legend()
plt.show()
You can use MultipleLocator from matplotlib.ticker with set_major_locator and set_minor_locator. See example.
Example
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import datetime
# Generate some data
d = datetime.timedelta(hours=1/5)
now = datetime.datetime.now()
times = [now + d * j for j in range(250)]
ax = plt.gca() # get the current axes
ax.plot(times, range(len(times)))
for label in ax.xaxis.get_ticklabels():
    label.set_rotation(30)
# Set the positions of the major and minor ticks
dayLocator = MultipleLocator(1)
hourLocator = MultipleLocator(1/24)
ax.xaxis.set_major_locator(dayLocator)
ax.xaxis.set_minor_locator(hourLocator)
# Convert the labels to the Y-m-d format
xax = ax.get_xaxis() # get the x-axis
adf = xax.get_major_formatter() # get the auto-formatter
adf.scaled[1/24] = '%Y-%m-%d' # set the < 1d scale to Y-m-d
adf.scaled[1.0] = '%Y-%m-%d' # set the > 1d < 1m scale to Y-m-d
plt.show()
Result
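The same day and hour split can also be expressed with the date-aware locators in matplotlib.dates, which makes it possible to label the hours as well. This is a sketch with synthetic readings; the start date and the 6-hour spacing are arbitrary choices for illustration.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Synthetic readings: one sample every 12 minutes over roughly two days.
start = dt.datetime(2021, 3, 1)
times = [start + dt.timedelta(minutes=12 * j) for j in range(250)]
readings = list(range(len(times)))

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(times, readings, "-")

# Major ticks: one per day at midnight, labelled with the date.
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))

# Minor ticks: 06:00, 12:00 and 18:00 between the midnights, labelled with the time.
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[6, 12, 18]))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%H:%M"))

ax.tick_params(axis="x", which="major", pad=15)  # push the date labels below the hour labels
ax.set_xlabel("Date")
ax.set_ylabel("Readings")
ax.grid(True)
fig.canvas.draw()  # render off-screen; use plt.show() in an interactive session

# Hours at which minor ticks were placed, for inspection.
minor_hours = sorted({mdates.num2date(t).hour for t in ax.xaxis.get_minorticklocs()})
```

Changing byhour to range(0, 24, 2) or similar gives denser hour labels if the figure is wide enough.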
