Python: drawing a cumulative plot (matplotlib)

I have not used matplotlib before, but it looks like the main library for drawing plots. I want to draw a CPU usage plot. A background process records one entry every minute: (date, min_load, avg_load, max_load). The date can be a Unix timestamp or a nicely formatted date string.
I want a diagram that shows min_load, avg_load and max_load on the same plot. On the X axis I would like to show minutes, hours, days or weeks, depending on how much data there is.
There may be gaps. For example, if the monitored process crashes and nobody restarts it, there can be gaps of several hours.
Example of how I imagine it: http://img714.imageshack.us/img714/2074/infoplot1.png
This does not illustrate gaps, but in that situation the readings should drop to 0.
I am experimenting with matplotlib right now and will share my results too. This is what the data might look like:
1254152292;0.07;0.08;0.13
1254152352;0.04;0.05;0.10
1254152412;0.09;0.10;0.17
1254152472;0.28;0.29;0.30
1254152532;0.20;0.20;0.21
1254152592;0.09;0.12;0.15
1254152652;0.09;0.12;0.14
1254152923;0.13;0.12;0.30
1254152983;0.13;0.25;0.32
Or it could look something like this:
Wed Oct 06 08:03:55 CEST 2010;0.25;0.30;0.35
Wed Oct 06 08:03:56 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:03:57 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:03:58 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:03:59 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:04:00 CEST 2010;0.00;0.01;0.02
Wed Oct 06 08:04:01 CEST 2010;0.25;0.50;0.75
Wed Oct 06 08:04:02 CEST 2010;0.00;0.01;0.02
-david

Try:
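import sys
from datetime import datetime, timezone

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

datefmt = "%a %b %d %H:%M:%S CEST %Y"
datafile = "cpu.dat"

def parsedate(x):
    """Convert a Unix timestamp or a formatted date string to a Matplotlib date number."""
    if isinstance(x, bytes):  # older NumPy versions pass bytes to converters
        x = x.decode()
    try:
        dt = datetime.fromtimestamp(int(x), tz=timezone.utc)
    except ValueError:
        try:
            dt = datetime.strptime(x, datefmt)
        except ValueError:
            sys.exit("Cannot parse date ('" + x + "')")
    return mdates.date2num(dt)

# parse data file: columns are date;min;avg;max
t, a, b, c = np.loadtxt(
    datafile, delimiter=';',
    converters={0: parsedate},
    unpack=True)

fig = plt.figure()
ax = fig.add_axes((0.1, 0.1, 0.7, 0.85))

# plot max, avg, min (back to front): a line plus a filled area for each
series = [(c, 'max', 'r', (1, 0.5, 0.5)),
          (b, 'avg', 'g', (0.5, 1, 0.5)),
          (a, 'min', 'b', (0.5, 0.5, 1))]
for y, label, color, fill in series:
    ax.plot(t, y, '-', lw=2, color=color, label=label)
    ax.fill_between(t, y, color=fill)

ax.xaxis_date()        # interpret the x values as Matplotlib date numbers
ax.set_ylim(bottom=0)  # limit y axis to 0, keep the top autoscaled

# legend (uses the label= of each line)
ax.legend(loc=(1.03, 0.4), frameon=False)
fig.autofmt_xdate()
plt.show()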
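This parses the lines of the "cpu.dat" file; the date column is converted by the parsedate function, which accepts either a Unix timestamp or the formatted date shown above.
Matplotlib's automatic date ticking should pick a suitable x-axis format (minutes, hours, days) for the time span of the data.
Edit: Added a legend and fill_between (there may be a better way to do this).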
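The script above does not handle the gaps mentioned in the question. One way to make the readings drop to 0 during an outage is to insert zero samples wherever two consecutive timestamps are further apart than some threshold. Below is a minimal NumPy sketch; the helper name `insert_zero_gaps` and its `max_gap` threshold are hypothetical, and `max_gap` must use the same unit as `t` (days for Matplotlib date numbers, seconds for Unix timestamps).

```python
import numpy as np


def insert_zero_gaps(t, series, max_gap):
    """Return copies of t and each array in series with zero readings inserted at gaps.

    Hypothetical helper. A gap is any place where consecutive timestamps in the
    sorted array t are more than max_gap apart (same unit as t). For each gap, two
    samples with value 0 are added: one at the last timestamp before the gap and one
    at the first timestamp after it, so a plotted line drops vertically to 0 and
    stays there until data resumes.
    """
    t = np.asarray(t, dtype=float)
    gap = np.flatnonzero(np.diff(t) > max_gap)  # gap lies between t[i] and t[i + 1]
    pos = np.repeat(gap + 1, 2)                 # insert both new samples before t[i + 1]
    # x positions of the new samples: t[i] (drop to 0), then t[i + 1] (rise from 0)
    new_t = np.column_stack([t[gap], t[gap + 1]]).ravel()
    out_t = np.insert(t, pos, new_t)
    out_series = [np.insert(np.asarray(s, dtype=float), pos, 0.0) for s in series]
    return out_t, out_series
```

In the script above, a call such as `t, (a, b, c) = insert_zero_gaps(t, [a, b, c], max_gap=5 / (24 * 60))` (five minutes, expressed in days) placed right after `np.loadtxt` should make both the lines and the filled areas drop to 0 across each gap.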

Related

Change line width of specific line in line plot pandas/matplotlib

I am plotting a dataframe that looks like this.
Date 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Date
01 Jan 12.896 13.353 12.959 13.011 13.073 12.721 12.643 12.484 12.876 13.102
02 Jan 12.915 13.421 12.961 13.103 13.125 12.806 12.644 12.600 12.956 13.075
03 Jan 12.926 13.379 13.012 13.116 13.112 12.790 12.713 12.634 12.959 13.176
04 Jan 13.051 13.414 13.045 13.219 13.051 12.829 12.954 12.724 13.047 13.187
05 Jan 13.176 13.417 13.065 13.148 13.115 12.874 12.956 12.834 13.098 13.123
The code for plotting is here.
ice_data_dates.plot(figsize=(20,12), title='Arctic Sea Ice Extent', lw=3, fontsize=16, ax=ax, grid=True)
This plots a line plot for each of the years listed in the dataframe over each day in the year. However, I would like to make the line for 2020 much thicker than the others so it stands out more clearly. Is there a way to do that using this one line of code? Or do I need to manually plot all of the years such that I can control the thickness of each line separately? A current picture is attached, where the line thicknesses are all the same.
You can iterate over the lines in the plot, which can be retrieved with ax.get_lines, and increase the width using set_linewidth if its label matches the value of interest:
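import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(20, 12))
ice_data_dates.plot(title='Arctic Sea Ice Extent',
                    lw=3, fontsize=16, ax=ax, grid=True)
# pandas labels each line with the string form of the column name
for line in ax.get_lines():
    if line.get_label() == '2020':
        line.set_linewidth(15)
# rebuild the legend so its 2020 entry picks up the new width
ax.legend()
plt.show()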
You can do it in two lines like this:
ice_data_dates.loc[:, ice_data_dates.columns != "2020"].plot(figsize=(20, 12), title='Arctic Sea Ice Extent', lw=3, fontsize=16, ax=ax, grid=True)
ice_data_dates["2020"].plot(figsize=(20, 12), title='Arctic Sea Ice Extent', lw=15, fontsize=16, ax=ax, grid=True)
This first plots the entire DataFrame except the 2020 column, and then plots only 2020. The output looks like this:
This uses a different approach from the selected answer, but it gives the same result.

visualize two columns in the same data set

I am trying to group and sort four columns, count the values, and chart them in the same bar graph to see how the counts have changed.
year  month  BL year  BL month
2018  Jan    2019     Jan
2018  Feb    2018     Mar
2018  Dec    2020     Dec
2019  Apr    2019     Sep
2020  Nov    2020     Dec
2019  Sep    2018     Jan
I tried sorting by year and then by month, grouping, and counting the values:
df_Activity_count = df.sort_values(['year','month'],ascending = True).groupby('month')
df_Activity_count_BL = df.sort_values(['BL year','BL month'],ascending = True).groupby('BL month')
Now I am trying to compare these two in the same bar chart. Can someone please help?
Try passing ax to your second plot command:
df_Activity_count = df.sort_values(['year','month'],ascending = True).groupby('month')
df_Activity_count_BL = df.sort_values(['BL year','BL month'],ascending = True).groupby('BL month')
ax = df_Activity_count['year'].value_counts().unstack(0).plot.bar()
df_Activity_count_BL['BL year'].value_counts().unstack(0).plot.bar(ax=ax)
Since you tagged matplotlib, I will chip in a solution using pyplot:
import matplotlib.pyplot as plt
# Create an axis object
fig, ax = plt.subplots()
# Define dataframes
df_Activity_count = df.sort_values(['year','month'],ascending = True).groupby('month')
df_Activity_count_BL = df.sort_values(['BL year','BL month'],ascending = True).groupby('BL month')
# Plot using the axis object ax defined above
df_Activity_count['year'].value_counts().unstack(0).plot.bar(ax=ax)
df_Activity_count_BL['BL year'].value_counts().unstack(0).plot.bar(ax=ax)
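Both answers draw the second set of bars at the same x positions as the first, so one set can hide the other. If the goal is simply to compare how many rows fall in each year for the two columns, another option is to build one DataFrame of counts and plot it once, which pandas draws as side-by-side bars. This sketch assumes the column names used in the question's code (`year`, `BL year`), uses the sample rows from the table, and counts by year only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question's table (column names assumed from its code)
df = pd.DataFrame({
    'year': [2018, 2018, 2018, 2019, 2020, 2019],
    'month': ['Jan', 'Feb', 'Dec', 'Apr', 'Nov', 'Sep'],
    'BL year': [2019, 2018, 2020, 2019, 2020, 2018],
    'BL month': ['Jan', 'Mar', 'Dec', 'Sep', 'Dec', 'Jan'],
})

# Count rows per year in each column and align both counts on a shared year index
counts = pd.DataFrame({
    'year': df['year'].value_counts(),
    'BL year': df['BL year'].value_counts(),
}).fillna(0).astype(int).sort_index()

# A single plot call draws the two count columns as side-by-side bars per year
ax = counts.plot.bar(rot=0)
ax.set_xlabel('Year')
ax.set_ylabel('Count')
```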

Plotting pd.Series object does not show year correctly

I am graphing the results of the measurements of a humidity sensor over time.
I'm using Python 3.7.1 and Pandas 0.24.2.
I have a list called dateTimeList with date and time strings:
dateTimeList = ['15.3.2019 11:44:27', '15.3.2019 12:44:33', '15.3.2019 13:44:39']
I wrote this code where index is a DatetimeIndex object and humList is a list of floats.
import pandas as pd
import matplotlib.pyplot as plt

index = pd.to_datetime(dateTimeList, format='%d.%m.%Y %H:%M:%S')
ts = pd.Series(humList, index)
plt.figure(figsize=(12.80, 7.20))
ts.plot(title='Gráfico de Humedad en el Tiempo', style='g', marker='o')  # "Humidity over time"
plt.xlabel('Tiempo [días]')  # "Time [days]"
plt.ylabel('Humedad [V]')  # "Humidity [V]"
plt.grid()
plt.savefig('Hum_General'+'.png', bbox_inches='tight')
plt.show()
I get these two results, one with data from February and the other with data from March.
The problem is that for March, instead of showing the year 2019, the x axis shows a repeating sequence 00 12 00 12. Note that this only happens with the March data; February is fine, and both months' data have the same structure. Day and month are shown correctly on both plots.
I also tried with:
index = [ pd.to_datetime(date, format='%d.%m.%Y %H:%M:%S') for date in dateTimeList]
Now index is a list of Timestamp objects, with the same result.
Add this immediately after creating the plot:
import matplotlib.dates as mdates # this should be on the top of the script
xfmt = mdates.DateFormatter('%Y-%m-%d')
ax = plt.gca()
ax.xaxis.set_major_formatter(xfmt)
My guess is that because the March data has fewer data points (a shorter time span), Matplotlib chooses month-day-hour labels instead of year-month-day, so the issue will probably go away once there is more March data. The code I posted keeps a year-month-day format regardless of the number of data points plotted.
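An alternative to a fixed DateFormatter is Matplotlib's `ConciseDateFormatter` (available since Matplotlib 3.1), which keeps automatic tick placement but shows the year/month/day context once, in the offset text at the end of the axis. The sketch below plots through Matplotlib directly rather than `ts.plot`; the `humList` values are placeholders, not real readings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Dates in the question's format; humList holds placeholder values
dateTimeList = ['15.3.2019 11:44:27', '15.3.2019 12:44:33', '15.3.2019 13:44:39']
humList = [0.41, 0.43, 0.40]

index = pd.to_datetime(dateTimeList, format='%d.%m.%Y %H:%M:%S')
ts = pd.Series(humList, index)

fig, ax = plt.subplots(figsize=(12.80, 7.20))
# Plot with Matplotlib directly so the x axis uses Matplotlib date units
ax.plot(ts.index, ts.values, 'g-o')

# Automatic tick positions, compact labels; the date context goes in the offset text
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.grid(True)
```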

time on xaxis in plotly

I have my x-axis values in this format : ['May 23 2018 06:31:52 GMT', 'May 23 2018 06:32:02 GMT', 'May 23 2018 06:32:12 GMT', 'May 23 2018 06:32:22 GMT', 'May 23 2018 06:32:32 GMT']
and corresponding values for the y-axis which are some numbers.
But when I plot these with plotly, the x axis shows only part of the date (May 23 2018) for each point; the time is not shown.
I also tried setting tickformat in the layout, but it does not seem to work:
layout = go.Layout(
title=field+ "_its diff_value chart",
xaxis = dict(
tickformat = '%b %d %Y %H:%M:%S'
)
)
Any help is appreciated.
This is a screenshot of the resulting graph.
Try converting your x values to datetime objects, then tell plotly to use a fixed tick distance:
import random
import datetime
import plotly

plotly.offline.init_notebook_mode()

# one point per day for 101 days, starting now
x = [datetime.datetime.now()]
for d in range(1, 101):
    x.append(x[0] + datetime.timedelta(d))
y = [random.random() for _ in x]

scatter = plotly.graph_objs.Scatter(x=x, y=y)
layout = plotly.graph_objs.Layout(xaxis={'type': 'date',
                                         'tick0': x[0],
                                         'tickmode': 'linear',
                                         'dtick': 86400000.0 * 14})  # 14 days, in milliseconds
fig = plotly.graph_objs.Figure(data=[scatter], layout=layout)
plotly.offline.iplot(fig)
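For the strings in the question, the first step of that answer (converting to datetime objects) could look like this with the standard library. This sketch assumes every string ends in the literal text `GMT` and uses English month abbreviations (`%b` follows the current locale).

```python
from datetime import datetime

# x-axis strings in the format shown in the question
raw = ['May 23 2018 06:31:52 GMT', 'May 23 2018 06:32:02 GMT',
       'May 23 2018 06:32:12 GMT', 'May 23 2018 06:32:22 GMT',
       'May 23 2018 06:32:32 GMT']


def parse_gmt(s):
    """Parse 'Mon DD YYYY HH:MM:SS GMT' into a naive datetime ('GMT' is matched literally)."""
    return datetime.strptime(s, '%b %d %Y %H:%M:%S GMT')


x = [parse_gmt(s) for s in raw]
```

The resulting `x` list can be passed as the `x` of a `Scatter` trace; with real datetime values the axis is date-typed, so the `tickformat` from the question (`'%b %d %Y %H:%M:%S'`) should then be able to show the time of day as well.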
To hide the gaps in an irregular time series, add rangebreaks before displaying the plotly chart:
fig.update_xaxes(
    rangebreaks=[
        dict(bounds=['2018-05-23 06:31:52', '2018-05-23 06:32:02']),
        dict(bounds=['2018-05-23 06:32:02', '2018-05-23 06:32:12']),
        dict(bounds=['2018-05-23 06:32:12', '2018-05-23 06:32:22']),
        dict(bounds=['2018-05-23 06:32:22', '2018-05-23 06:32:32'])
    ]
)

pandas day of week axis labels

I am plotting a pandas series that spans one week. My code:
rng = pd.date_range('1/6/2014',periods=169,freq='H')
graph = pd.Series(shared_index, index=rng[:168])
graph.plot(shared_index)
Which displays 7 x-axis labels:
[06 Jan 2014, 07, 08, 09, 10, 11, 12]
But I want:
[Mon, Tue, Wed, Thu, Fri, Sat, Sun]
What do I specify in code to change axis labels?
Thanks!
Perhaps you can set the tick labels manually:
import numpy as np
import pandas as pd

rng = pd.date_range('1/6/2014', periods=169, freq='h')
graph = pd.Series(np.random.randn(168), index=rng[:168])
ax = graph.plot()
weekday_map = {0: 'MON', 1: 'TUE', 2: 'WED', 3: 'THU',
               4: 'FRI', 5: 'SAT', 6: 'SUN'}
# tick positions are hourly offsets here, so convert them to integer positions
xs = np.sort(ax.get_xticks(minor=True))
wd = graph.index[(xs - xs[0]).astype(int)].map(pd.Timestamp.weekday)
ax.set_xticks(xs)
ax.set_xticks([], minor=True)
ax.set_xticklabels([weekday_map[d] for d in wd])
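A less fragile alternative swaps in Matplotlib's own date tools instead of relying on pandas' tick positions: plot the Series with Matplotlib, place one major tick per day with `DayLocator`, and label it with `DateFormatter('%a')`. The random values are placeholders; weekday names follow the current locale and come out as `Mon`, `Tue`, ... rather than the upper-case `MON` used above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# One week of hourly data starting Monday 6 Jan 2014 (random placeholder values)
rng = pd.date_range('2014-01-06', periods=168, freq='h')
graph = pd.Series(np.random.randn(168), index=rng)

fig, ax = plt.subplots()
# Plot through Matplotlib so the x axis uses Matplotlib date units
ax.plot(graph.index, graph.values)

# One major tick per day, labelled with the abbreviated weekday name
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%a'))
```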
