pandas day of week axis labels - python

I am plotting a pandas series that spans one week. My code:
rng = pd.date_range('1/6/2014',periods=169,freq='H')
graph = pd.Series(shared_index, index=rng[:168])
graph.plot(shared_index)
Which displays 7 x-axis labels:
[06 Jan 2014, 07, 08, 09, 10, 11, 12]
But I want:
[Mon, Tue, Wed, Thu, Fri, Sat, Sun]
What do I specify in code to change axis labels?
Thanks!

Perhaps you can set the tick labels manually:
import numpy as np
import pandas as pd
rng = pd.date_range('1/6/2014',periods=169,freq='H')
graph = pd.Series(np.random.randn(168), index=rng[:168])
ax = graph.plot()
weekday_map= {0:'MON', 1:'TUE', 2:'WED', 3:'THU',
              4:'FRI', 5:'SAT', 6:'SUN'}
# get_xticks expects a bool for minor; convert to an array so arithmetic works
xs = np.array(sorted(ax.get_xticks(minor=True)))
# offsets from the first tick, used as integer positions into the hourly index
wd = graph.index[(xs - xs[0]).astype(int)].map(pd.Timestamp.weekday)
ax.set_xticks(xs)
ax.set_xticks([], minor=True)
ax.set_xticklabels([weekday_map[d] for d in wd])
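As an alternative to relabelling the ticks by hand, here is a minimal sketch that lets matplotlib format the dates itself. It assumes that passing x_compat=True to the pandas plot call is acceptable for your use case (it switches off pandas' own tick handling), and it uses random placeholder data because shared_index is not shown in the question.

```python
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# One week of hourly samples starting Monday 2014-01-06 (placeholder random data).
rng = pd.date_range('2014-01-06', periods=168, freq=pd.Timedelta(hours=1))
graph = pd.Series(np.random.randn(168), index=rng)

# x_compat=True makes pandas put plain matplotlib dates on the x-axis,
# so matplotlib's own locators and formatters can be applied.
ax = graph.plot(x_compat=True)
ax.xaxis.set_major_locator(mdates.DayLocator())           # one tick per day
ax.xaxis.set_major_formatter(mdates.DateFormatter('%a'))  # abbreviated weekday name
```

Call plt.show() (from matplotlib.pyplot) or save the figure to see the result. The weekday abbreviations follow the current locale.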

Related

Add a date event on a line chart in Python

I have a line chart that shows random sales data from 2010 to 2020. I want to add a vertical line, or some other visual marker, to indicate something important that happened in 2014, for example. How can I do that in Python? Any library would do!
Try using plt.axvline() with matplotlib:
import matplotlib.pyplot as plt
x = [2015, 2016, 2017, 2018, 2019, 2020]
y = [1000, 1200, 2500, 1000, 1100, 250]
plt.plot(x, y)
plt.title("Sales line chart")
plt.xlabel("Year")
plt.ylabel("Sales")
# draw a vertical line at x = 2019
plt.axvline(x=2019, label='line at x = {}'.format(2019), c='red')
# show the label that was passed to axvline
plt.legend()
plt.show()
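If the x-axis holds real dates rather than integer years, the same call can take a datetime. The sketch below assumes hypothetical yearly sales figures (the numbers are made up for illustration) and marks an event at the start of 2014, as in the question.

```python
from datetime import datetime

import matplotlib.pyplot as plt

# Hypothetical yearly sales figures, one point per year from 2010 to 2020.
dates = [datetime(year, 1, 1) for year in range(2010, 2021)]
sales = [900, 950, 1100, 1000, 1300, 1000, 1200, 2500, 1000, 1100, 250]

fig, ax = plt.subplots()
ax.plot(dates, sales)

# Plot the data first so the axis already uses date units, then mark the event.
event = datetime(2014, 1, 1)
line = ax.axvline(event, color='red', linestyle='--', label='event in 2014')
ax.legend()
```

Call plt.show() afterwards to display the chart.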

How to plot different dataframe data in one figure?

I need some guidance to plot:
scatter plot of df1 data: time vs. y, using column z for the hue
line plot df2 data: time vs. y
a single line at y=c (c is a constant)
y data in df1 and df2 are different but they are in the same range.
I do not know where to begin. Any guidance is appreciated.
More explanation. A portion of data is presented here. I want to plot:
scatter plot of time vs CO2
finding the yearly rolling average of CO2 (from 01/01/2016 to 09/30/2019 based on hourly data. So the first average will be from "01/01/2016 00" to "12/31/2016 23" and second average will be from "01/01/2016 01" to "01/01/2017 00") (like the trend in plot below)
finding the maximum of all the data and drawing a horizontal line at that value across the plot (like the straight line below)
Sample data
data = {'Date':['0 01/14/2016 00', '01/14/2016 01','01/14/2016 02','01/14/2016 03','01/14/2016 04','01/14/2016 05','01/14/2016 06','01/14/2016 07','01/14/2016 08','01/14/2016 09','01/14/2016 10','01/14/2016 11','01/14/2016 12','01/14/2016 13','01/14/2016 14','01/14/2016 15','01/14/2016 16','01/14/2016 17','01/14/2016 18','01/14/2016 19'],
'CO2':[2415.9,2416.5,2429.8,2421.5,2422.2,2428.3,2389.1,2343.2,2444.,2424.8,2429.6,2414.7,2434.9,2420.6,2420.5,2397.1,2415.6,2417.4,2373.2,2367.9],
'Year':[2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016]}
# Create DataFrame
df = pd.DataFrame(data)
# DataFrame view
Date CO2 Year
0 01/14/2016 00 2415.9 2016
01/14/2016 01 2416.5 2016
01/14/2016 02 2429.8 2016
01/14/2016 03 2421.5 2016
01/14/2016 04 2422.2 2016
Using matplotlib.pyplot, with plt.hlines to add a horizontal line at a constant:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# with synthetic data
np.random.seed(365)
data = {'CO2': [np.random.randint(2000, 2500) for _ in range(783)],
'Date': pd.bdate_range(start='1/1/2016', end='1/1/2019').tolist()}
# create the dataframe:
df = pd.DataFrame(data)
# verify Date is in datetime format
df['Date'] = pd.to_datetime(df['Date'])
# set Date as index so .rolling can be used
df.set_index('Date', inplace=True)
# add rolling mean
df['rolling'] = df['CO2'].rolling('365D').mean()
# plot the data
plt.figure(figsize=(8, 8))
plt.scatter(x=df.index, y='CO2', data=df, label='data')
plt.plot(df.index, 'rolling', data=df, color='black', label='365 day rolling mean')
plt.hlines(max(df['CO2']), xmin=min(df.index), xmax=max(df.index), color='red', linestyles='dashed', label='Max')
plt.hlines(np.mean(df['CO2']), xmin=min(df.index), xmax=max(df.index), color='green', linestyles='dashed', label='Mean')
# rotation takes a number of degrees
plt.xticks(rotation=45)
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()
Plot using synthetic data:
Issues with the Date format in the OP's data:
Use a regular expression to fix the Date column
Place the code to fix Date, just before df['Date'] = pd.to_datetime(df['Date'])
import re
# your data
Date CO2 Year
0 01/14/2016 00 2415.9 2016
01/14/2016 01 2416.5 2016
01/14/2016 02 2429.8 2016
01/14/2016 03 2421.5 2016
01/14/2016 04 2422.2 2016
df['Date'] = df['Date'].apply(lambda x: (re.findall(r'\d{2}/\d{2}/\d{4}', x)[0]))
# fixed Date column
Date CO2 Year
01/14/2016 2415.9 2016
01/14/2016 2416.5 2016
01/14/2016 2429.8 2016
01/14/2016 2421.5 2016
01/14/2016 2422.2 2016
You can use a dual-axis chart. Because the y data in both data frames cover the same range, the two y-axes should end up with nearly the same scale, so the result should look like your example. You can plot directly from the pandas data frames:
import matplotlib.pyplot as plt
import pandas as pd
# create a color map for the z column
color_map = {'z_val1':'red', 'z_val2':'blue', 'z_val3':'green', 'z_val4':'yellow'}
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx() #second axis within the first
# define scatter plot
df1.plot.scatter(x = 'date',
y = 'CO2',
ax = ax1,
c = df1['z'].apply(lambda x:color_map[x]))
# define line plot
df2.plot.line(x = 'date',
y = 'MA_CO2', #moving average in dataframe 2
ax = ax2)
# plot the horizontal line at y = c (constant value)
ax1.axhline(y = c, color='r', linestyle='-')
# to fit the chart properly
plt.tight_layout()

visualize two columns in the same data set

I am trying to group and sort four columns, count the values, and chart them in the same bar graph to see how the counts have changed over time.
Year Month Bl_year Month
2018 Jan 2019 Jan
2018 Feb 2018 Mar
2018 Dec 2020 Dec
2019 Apr 2019 Sep
2020 Nov 2020 Dec
2019 Sep 2018 Jan
I tried sorting by year and then by month, and grouping by month, so that I can count the values:
df_Activity_count = df.sort_values(['year','month'],ascending = True).groupby('month')
df_Activity_count_BL = df.sort_values(['BL year','BL month'],ascending = True).groupby('BL month')
Now I am trying to compare these two in the same bar. Can someone please help.
Try to pass ax to your plot command:
df_Activity_count = df.sort_values(['year','month'],ascending = True).groupby('month')
df_Activity_count_BL = df.sort_values(['BL year','BL month'],ascending = True).groupby('BL month')
ax = df_Activity_count['year'].value_counts().unstack(0).plot.bar()
df_Activity_count_BL['BL year'].value_counts().unstack(0).plot.bar(ax=ax)
Since you tagged matplotlib, I will chip in a solution using pyplot
import matplotlib.pyplot as plt
# Create an axis object
fig, ax = plt.subplots()
# Define dataframes
df_Activity_count = df.sort_values(['year','month'],ascending = True).groupby('month')
df_Activity_count_BL = df.sort_values(['BL year','BL month'],ascending = True).groupby('BL month')
# Plot using the axis object ax defined above
df_Activity_count['year'].value_counts().unstack(0).plot.bar(ax=ax)
df_Activity_count_BL['BL year'].value_counts().unstack(0).plot.bar(ax=ax)
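Drawing two bar plots on the same axis can leave the bars on top of each other. One way to get side-by-side bars is to build a single table of counts first. The following is a minimal sketch, not the answerers' method: it uses only the two year columns from the sample table in the question, and the column names Year and Bl_year are taken from that table.

```python
import pandas as pd

# Year columns copied from the sample table in the question.
df = pd.DataFrame({
    'Year':    [2018, 2018, 2018, 2019, 2020, 2019],
    'Bl_year': [2019, 2018, 2020, 2019, 2020, 2018],
})

# Count rows per year for each column, aligned on a shared year index.
counts = pd.DataFrame({
    'Year': df['Year'].value_counts(),
    'Bl_year': df['Bl_year'].value_counts(),
}).fillna(0).astype(int).sort_index()

# One group of bars per year, one bar per column.
ax = counts.plot.bar()
ax.set_xlabel('year')
ax.set_ylabel('count')
```

The same pattern should extend to the month columns by counting those instead.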

time on xaxis in plotly

I have my x-axis values in this format : ['May 23 2018 06:31:52 GMT', 'May 23 2018 06:32:02 GMT', 'May 23 2018 06:32:12 GMT', 'May 23 2018 06:32:22 GMT', 'May 23 2018 06:32:32 GMT']
and corresponding values for the y-axis which are some numbers.
But when I plot these using plotly, the x-axis shows only the date part (May 23 2018) for each point. The time of each point is not shown.
I also tried setting tickformat in the layout, but it does not seem to work.
layout = go.Layout(
title=field+ "_its diff_value chart",
xaxis = dict(
tickformat = '%b %d %Y %H:%M:%S'
)
)
any help is appreciated.
This is the screenshot of the graph made.
Try converting your x-values to datetime objects, then tell plotly to use a fixed tick distance:
import random
import datetime
import plotly
plotly.offline.init_notebook_mode()
x = [datetime.datetime.now()]
for d in range(100):
    x.append(x[0] + datetime.timedelta(d))
y = [random.random() for _ in x]
scatter = plotly.graph_objs.Scatter(x=x, y=y)
layout = plotly.graph_objs.Layout(xaxis={'type': 'date',
'tick0': x[0],
'tickmode': 'linear',
'dtick': 86400000.0 * 14}) # 14 days
fig = plotly.graph_objs.Figure(data=[scatter], layout=layout)
plotly.offline.iplot(fig)
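The example above generates its own dates. For the strings in the question, the conversion step might look like the sketch below. It assumes every value follows the 'May 23 2018 06:31:52 GMT' layout exactly, and the variable names are illustrative.

```python
from datetime import datetime, timezone

# x values in the format shown in the question.
raw = ['May 23 2018 06:31:52 GMT',
       'May 23 2018 06:32:02 GMT',
       'May 23 2018 06:32:12 GMT']

# "GMT" is matched as literal text; the timezone is then attached explicitly.
fmt = '%b %d %Y %H:%M:%S GMT'
x = [datetime.strptime(s, fmt).replace(tzinfo=timezone.utc) for s in raw]
```

The resulting list can be passed as x to the Scatter trace in place of the original strings.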
To skip inconsistent time series, add this before plotting the plotly chart
fig.update_xaxes(
rangebreaks=[
dict(bounds=['2018-05-23 06:31:52','2018-05-23 06:32:02']),
dict(bounds=['2018-05-23 06:32:02','2018-05-23 06:32:12']),
dict(bounds=['2018-05-23 06:32:12','2018-05-23 06:32:22']),
dict(bounds=['2018-05-23 06:32:22','2018-05-23 06:32:32'])
]
)

Python drawing cumulative plot (matplotlib)

I have not used matplotlib, but it looks like the main library for drawing plots. I want to draw a CPU usage plot. I have background processes that write a record every minute (date, min_load, avg_load, max_load). The date could be a timestamp or a nicely formatted date.
I want to draw a diagram that shows min_load, avg_load and max_load on the same plot. On the x-axis I would like to show minutes, hours, days or weeks, depending on how much data there is.
There may be gaps. Say the monitored process crashes and nobody restarts it: there might be gaps of several hours.
Example of how I imagine it: http://img714.imageshack.us/img714/2074/infoplot1.png
This does not illustrate gaps, but in that situation the readings would go to 0.
I am playing with matplotlib right now and will try to share my results too. This is how the data might look:
1254152292;0.07;0.08;0.13
1254152352;0.04;0.05;0.10
1254152412;0.09;0.10;0.17
1254152472;0.28;0.29;0.30
1254152532;0.20;0.20;0.21
1254152592;0.09;0.12;0.15
1254152652;0.09;0.12;0.14
1254152923;0.13;0.12;0.30
1254152983;0.13;0.25;0.32
Or it could look something like this:
Wed Oct 06 08:03:55 CEST 2010;0.25;0.30;0.35
Wed Oct 06 08:03:56 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:03:57 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:03:58 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:03:59 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:04:00 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:04:01 CEST 2010;0.25;0.50;0.75
Wed Oct 06 08:04:02 CEST 2010;0.00;0.01;0.02
-david
Try:
from matplotlib.dates import strpdate2num, epoch2num
import numpy as np
from pylab import figure, show, cm
datefmt = "%a %b %d %H:%M:%S CEST %Y"
datafile = "cpu.dat"
def parsedate(x):
    global datefmt
    try:
        res = epoch2num( int(x) )
    except:
        try:
            res = strpdate2num(datefmt)(x)
        except:
            print("Cannot parse date ('"+x+"')")
            exit(1)
    return res
# parse data file
t,a,b,c = np.loadtxt(
datafile, delimiter=';',
converters={0:parsedate},
unpack=True)
fig = figure()
ax = fig.add_axes((0.1,0.1,0.7,0.85))
# limit y axis to 0
ax.set_ylim(0);
# colors
colors=['b','g','r']
fill=[(0.5,0.5,1), (0.5,1,0.5), (1,0.5,0.5)]
# plot
for x in [c,b,a]:
    ax.plot_date(t, x, '-', lw=2, color=colors.pop())
    ax.fill_between(t, x, color=fill.pop())
# legend
ax.legend(['max','avg','min'], loc=(1.03,0.4), frameon=False)
fig.autofmt_xdate()
show()
This parses the lines of the "cpu.dat" file. Each date is parsed by the parsedate function.
Matplotlib should find the best format for the x-axis.
Edit: Added the legend and fill_between (maybe there is a better way to do this).
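The helpers strpdate2num and epoch2num have since been removed from matplotlib, so the code above will not run on recent releases. Below is a rough equivalent using only the standard datetime module. It is a sketch under stated assumptions: the data is read from an in-memory string rather than "cpu.dat", epoch values are treated as UTC, and the names parse_timestamp and load are illustrative.

```python
import io
from datetime import datetime, timezone

import matplotlib.pyplot as plt

# A few rows from the first sample in the question, held in memory.
SAMPLE = """\
1254152292;0.07;0.08;0.13
1254152352;0.04;0.05;0.10
1254152412;0.09;0.10;0.17
1254152472;0.28;0.29;0.30
"""

DATE_FORMAT = "%a %b %d %H:%M:%S CEST %Y"  # second layout shown in the question


def parse_timestamp(text):
    """Accept either an epoch integer or the formatted date string."""
    try:
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    except ValueError:
        return datetime.strptime(text, DATE_FORMAT)


def load(stream):
    """Return parallel lists: times, min, avg, max."""
    times, low, avg, high = [], [], [], []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        stamp, a, b, c = line.split(";")
        times.append(parse_timestamp(stamp))
        low.append(float(a))
        avg.append(float(b))
        high.append(float(c))
    return times, low, avg, high


times, min_load, avg_load, max_load = load(io.StringIO(SAMPLE))

fig, ax = plt.subplots()
# Draw the largest series first so the smaller ones stay visible on top.
for values, label, color in [(max_load, "max", "r"),
                             (avg_load, "avg", "g"),
                             (min_load, "min", "b")]:
    ax.plot(times, values, lw=2, color=color, label=label)
    ax.fill_between(times, values, color=color, alpha=0.3)
ax.set_ylim(bottom=0)
ax.legend(loc="upper left", frameon=False)
fig.autofmt_xdate()
```

To read a real file, pass an open file object to load instead of the StringIO wrapper, then call plt.show().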
