I'm trying to draw the following chart using Python.
Can you help me out?
Thanks.
You can try this.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'realtime':[2,3,4,2,4],
'esttime':[1,1,3,1,4],
'time of 5 mins': ['09:15','09:20','09:25','09:30','09:35']})
df
   realtime  esttime time of 5 mins
0         2        1          09:15
1         3        1          09:20
2         4        3          09:25
3         2        1          09:30
4         4        4          09:35
Parse your time of 5 mins column with pd.to_datetime, then format it back to zero-padded HH:MM strings so that the x-axis labels are consistent.
df['time of 5 mins']=pd.to_datetime(df['time of 5 mins'],format='%H:%M').dt.strftime('%H:%M')
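As a quick check of what that line does, here is a minimal sketch on a stand-alone Series (the names times, parsed and labels are made up for this illustration and are not from the original post). pd.to_datetime parses the strings into datetime values, and .dt.strftime turns them back into plain strings, so the plot ends up with string labels on the x-axis rather than real datetimes.

```python
import pandas as pd

# Illustrative stand-alone Series; the variable names are assumptions for this sketch.
times = pd.Series(['09:15', '09:20', '09:25'])

# Parsing with an explicit format yields datetime64 values.
# Because the strings carry no date, pandas fills in 1900-01-01.
parsed = pd.to_datetime(times, format='%H:%M')

# Formatting converts the datetimes back into plain HH:MM strings.
labels = parsed.dt.strftime('%H:%M')

print(parsed.dtype)
print(labels.tolist())
```

If real datetimes are wanted on the x-axis instead, the .dt.strftime step can be dropped and a date formatter applied to the axis.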
Now plot realtime and esttime on the y-axis against time of 5 mins on the x-axis, and use Axes.annotate to label each point, which acts as a third dimension.
index= ['A', 'B', 'C', 'D', 'E']
plt.plot(df['time of 5 mins'],df['esttime'],marker='o',alpha=0.8,color='#CD5C5C',lw=0.8)
plt.plot(df['time of 5 mins'],df['realtime'],marker='o',alpha=0.8,color='green',lw=0.8)
ax = plt.gca()  # gca is "get current axes"
for i, txt in enumerate(index):
    ax.annotate(txt, (df['time of 5 mins'][i], df['realtime'][i]))
    ax.annotate(txt, (df['time of 5 mins'][i], df['esttime'][i]))
plt.show()
To make the plot more complete, add a legend, xlabel, ylabel and title, and stretch the x- and y-axis ranges a little so that the plot is easier to read. See the matplotlib.pyplot documentation for more details.
import matplotlib.pyplot as plt
import numpy as np
y = [2, 3, 4, 2, 4]
y2 = [1, 1, 3, 1, 4]
a = ['9:15', '9:20', '9:25', '9:30', '9:35']
x = np.arange(5)
fig = plt.figure()
ax = plt.subplot(111)
ax.plot(x, y, label='Real Time')
ax.plot(x, y2, label='Estimated Time')
plt.xticks(x, labels=a)
plt.xlabel('Time')
chartBox = ax.get_position()
ax.set_position([chartBox.x0, chartBox.y0, chartBox.width*0.6, chartBox.height])
ax.legend(loc='upper center', bbox_to_anchor=(1.45, 0.8), shadow=True, ncol=1)
plt.show()
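Putting the suggestions from this thread together (legend, axis labels, title, slightly stretched axis ranges), a minimal sketch could look like the following. The y-axis label and the title are placeholders I made up, and the Agg backend line is only there so the sketch also runs without a display; drop it and call plt.show() in an interactive session.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt

times = ['09:15', '09:20', '09:25', '09:30', '09:35']
realtime = [2, 3, 4, 2, 4]
esttime = [1, 1, 3, 1, 4]

fig, ax = plt.subplots()
ax.plot(times, realtime, marker='o', color='green', label='Real Time')
ax.plot(times, esttime, marker='o', color='#CD5C5C', label='Estimated Time')

ax.set_xlabel('Time')
ax.set_ylabel('Value')                     # placeholder label
ax.set_title('Real vs. estimated time')    # placeholder title
ax.margins(x=0.1, y=0.2)                   # stretch both axis ranges a little
ax.legend(loc='best')
```

ax.margins pads the data limits by a fraction of the data range, which avoids hard-coding axis limits.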
I tried drawing subplots with seaborn's relplot method. Because the original dataset varies, I sometimes don't know how many subplots there will be in the end.
I set col_wrap to limit the number of columns, but sometimes the result doesn't look good. For example, with col_wrap = 3 and 5 subplots I get the following:
As the figure shows, the x-axis only appears on C, D and E, which looks strange. I want the x-axis label to be shown on all subplots (from A to E).
I already know that facet_kws={'sharex': 'col'} allows plots to have independent axis scales (according to "set axis limits on individual facets of seaborn facetgrid").
But I want to set x-axis labels for all subplots, and I haven't found a solution for that.
Methods like set_xlabels on the FacetGrid object don't seem to help, because the official documentation says they only control the labels "on the bottom row of the grid":
FacetGrid.set_xlabels(label=None, clear_inner=True, **kwargs)
Label the x axis on the bottom row of the grid.
The following are my example data and my code:
city date value
0 A 1 9
1 B 1 20
2 C 1 4
3 D 1 33
4 E 1 2
5 A 2 22
6 B 2 32
7 C 2 27
8 D 2 32
9 E 2 18
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_excel("data/example_data.xlsx")
# print(df)
g = sns.relplot(data=df, x="date", y="value", kind="line", col="city", col_wrap=3,
                errorbar=None, facet_kws={'sharex': 'col'})
(g.set_axis_labels("x_axis", "y_axis")
  .set_titles("{col_name}")
  .tight_layout()
  .add_legend()
 )
plt.subplots_adjust(top=0.94, wspace=None, hspace=0.4)
plt.show()
Thanks in advance.
In order to reduce superfluous information, Seaborn makes these inner labels invisible. You can make them visible again:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': np.repeat([1, 2], 5),
'value': np.random.randint(1, 20, 10),
'city': np.tile([*'abcde'], 2)})
# print(df)
g = sns.relplot(data=df, x="date", y="value", kind="line", col="city", col_wrap=3,
                errorbar=None, facet_kws={'sharex': 'col'})
g.set_titles("{col_name}")
g.add_legend()
for ax in g.axes.flat:
    ax.set_xlabel('x axis', visible=True)
    ax.set_ylabel('y axis', visible=True)
plt.subplots_adjust(top=0.94, wspace=None, hspace=0.4)
plt.show()
I have a dataframe with the date and the cluster that each day belongs to (the clustering was done beforehand, based on the temperatures collected for each day). I want to plot this data in sequence, like a stacked bar chart, coloring each element according to its assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07  1      3.0         1
2020-08-08  2      0.0         1
2020-08-09  3      1.0         1
2020-08-10  4      3.0         1
2020-08-11  5      1.0         1
2020-08-12  6      1.0         1
2020-08-13  7      3.0         1
2020-08-14  8      2.0         1
2020-08-15  9      2.0         1
2020-08-16  10     2.0         1
2020-08-17  11     2.0         1
2020-08-18  12     1.0         1
2020-08-19  13     1.0         1
2020-08-20  14     0.0         1
2020-08-21  15     0.0         1
2020-08-22  16     1.0         1
Note: I can't simply group the data by cluster, because the plot has to be sequential. I thought about writing code to count the consecutive elements of each cluster, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should look something like this (the numbers inside the bars are the clusters, the x-axis is the time in days, and each bar's width is the number of consecutive days observed with the same cluster):
You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
'order': np.arange(1, N + 1),
'ClusterNo2': np.random.randint(0, 4, N).astype(float),
'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
          for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()
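The custom legend above is built from zero-size rectangles. As an alternative sketch, matplotlib.patches.Patch can serve as a legend handle in the same way. The cluster count of 4 is an assumption matching the example data, and the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

colors = plt.cm.Set2.colors
num_clusters = 4  # assumed number of clusters for this sketch

# One colored patch per cluster id, used only as a legend entry.
legend_handles = [Patch(facecolor=colors[i], label=str(i)) for i in range(num_clusters)]

fig, ax = plt.subplots(figsize=(6, 2))
legend = ax.legend(handles=legend_handles, title='Clusters',
                   bbox_to_anchor=(1.01, 1.01), loc='upper left')
```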
You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
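The question also mentions counting the consecutive elements of each cluster. A minimal pandas sketch of that idea, using the 16 ClusterNo2 values from the table in the question: a new run starts wherever the value differs from the previous one, and each run is then drawn as one bar whose width is the run length. The names run_id and runs are made up for this sketch, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# ClusterNo2 values from the 16 rows shown in the question.
clusters = pd.Series([3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1])

# A new run starts wherever the value differs from the previous row.
run_id = (clusters != clusters.shift()).cumsum()

# One row per run: its cluster, its length, and where it starts.
runs = (clusters.groupby(run_id)
        .agg(cluster='first', length='count')
        .reset_index(drop=True))
runs['start'] = runs['length'].cumsum() - runs['length']
print(runs)

# Draw one bar per run; width is the number of consecutive days.
colors = plt.cm.Set2.colors
fig, ax = plt.subplots(figsize=(8, 1.5))
ax.bar(runs['start'], 1, width=runs['length'], align='edge',
       color=[colors[c] for c in runs['cluster']], edgecolor='white')
for row in runs.itertuples():
    ax.text(row.start + row.length / 2, 0.5, str(row.cluster), ha='center', va='center')
ax.set_xlabel('Time (days)')
ax.set_yticks([])
```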
Example Plot
I have a dataframe that includes time series of snow-water and temperature data. I am looking to create a time series plot of snow water that uses two colors for the line: 'blue' if the temperature is <= 273 K and 'red' if the temperature is > 273 K. I tried to follow the matplotlib documentation (https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/multicolored_line.html) but have not been successful. I would appreciate some insights. Thank you!
My dataframe is as follows: Date (datetime64[ns]); Snow-water (float64) and Temp (float64)
from matplotlib.collections import LineCollection
Date Snowwater Temperature
2014-01-01 01:00:00 5 240
2014-01-01 02:00:00 10 270
2014-01-01 03:00:00 11 273
2014-01-01 04:00:00 15 279
2014-01-01 05:00:00 20 300
2014-01-01 06:00:00 25 310
I am looking for output something like in the example plot linked above but with snow-water values in the y-axis (line color blue or red depending on the temperature) and datetime on the x-axis
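This did the trick, although there might be a better way of doing it:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mpd
from matplotlib import ticker
from matplotlib.collections import LineCollection
# one color per point; the question asks for blue at <= 273 K
colors = ['blue' if t <= 273 else 'red' for t in df['AIR_T[K]']]
x = mpd.date2num(df['Date'])
y = df['SWE_St'].values
# build the n-1 segments that join consecutive points
points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
# each segment takes the color of its starting point
lc = LineCollection(segments, colors=colors[:-1])
fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()
ax.xaxis.set_major_locator(mpd.MonthLocator())
ax.xaxis.set_major_locator(ticker.MultipleLocator(200))  # note: this replaces the MonthLocator set above
ax.xaxis.set_major_formatter(mpd.DateFormatter('%Y-%m-%d:%H:%M:%S'))
plt.setp(ax.xaxis.get_majorticklabels(), rotation=70)
plt.show()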
result plot
I created the data manually, but LineCollection is an object that contains multiple lines; its first argument is a list of lines.
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
xs = [0, 1, 2, 3, 4, 5]
ys = [-2, -1, 0, 1, 5, 10]
lines = [[(x1, y1), (x2, y2)] for x1, y1, x2, y2 in zip(xs, ys, xs[1:], ys[1:])]
colors = ['r', 'r', 'b', 'b', 'b']
lc = LineCollection(lines, colors=colors)
fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()
plt.show()
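Applied to the sample data from the question, the same idea could be sketched as follows. The coloring rule here (each segment takes the color of the temperature at its starting point) is my assumption, since the question does not say how to color a segment that crosses the threshold. The Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection

# Sample rows from the question.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2014-01-01 01:00:00', '2014-01-01 02:00:00',
                            '2014-01-01 03:00:00', '2014-01-01 04:00:00',
                            '2014-01-01 05:00:00', '2014-01-01 06:00:00']),
    'Snowwater': [5, 10, 11, 15, 20, 25],
    'Temperature': [240, 270, 273, 279, 300, 310],
})

# Convert dates to matplotlib's float representation and build n-1 segments.
x = mdates.date2num(df['Date'].to_numpy())
y = df['Snowwater'].to_numpy(dtype=float)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Assumption: a segment is colored by the temperature at its starting point.
seg_colors = ['blue' if t <= 273 else 'red' for t in df['Temperature'].iloc[:-1]]

fig, ax = plt.subplots()
ax.add_collection(LineCollection(segments, colors=seg_colors))
ax.xaxis_date()                      # interpret x values as dates
ax.autoscale()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
ax.set_ylabel('Snow water')
```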
I have a dataframe with a datetime index:
A B
date
2020-05-04 0 0
2020-05-05 5 0
2020-05-07 2 0
2020-05-09 2 0
2020-05-18 -5 0
2020-05-19 -1 0
2020-05-20 0 0
2020-05-21 1 0
2020-05-22 0 0
2020-05-23 3 0
2020-05-24 1 1
2020-05-25 0 1
2020-05-26 4 1
2020-05-27 3 1
I want to make a lineplot to track A over time and colour the background of the plot red when the values of B are 1. I have implemented this code to make the graph:
from matplotlib import dates as mdates
from matplotlib.colors import ListedColormap
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
cmap = ListedColormap(['white','red'])
ax.plot(data['A'])
ax.set_xlabel('')
plt.xticks(rotation = 30)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
ax.pcolorfast(ax.get_xlim(), ax.get_ylim(),
data['B'].values[np.newaxis],
cmap = cmap, alpha = 0.4)
plt.axhline(y = 0, color = 'black')
plt.tight_layout()
This gives me this graph:
But the red region incorrectly starts from 2020-05-21 rather than 2020-05-24 and it doesn't end at the end date in the dataframe. How can I alter my code to fix this?
If you replace ax.pcolorfast(ax.get_xlim(), ... with ax.pcolor(data.index, ... you get what you want. The problem with the current code is that passing ax.get_xlim() creates a uniform rectangular grid, while your index is not uniform (dates are missing), so the colored mesh does not line up with the dates as expected. The whole thing is:
from matplotlib import dates as mdates
from matplotlib.colors import ListedColormap
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
cmap = ListedColormap(['white','red'])
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(data['A'])
ax.set_xlabel('')
plt.xticks(rotation = 30)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
# here are the two changes: use pcolor
ax.pcolor(data.index,  # use data.index to create the proper grid
          ax.get_ylim(),
          data['B'].values[np.newaxis],
          cmap = cmap, alpha = 0.4,
          linewidth=0, antialiased=True)
plt.axhline(y = 0, color = 'black')
plt.tight_layout()
and you get
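To see why the uniform grid puts the red region in the wrong place, here is a rough back-of-the-envelope sketch using only the dates and B values from the question. It assumes the x-limits are the data range padded by matplotlib's default 5% margin, and it splits that range into 14 equal cells, which is what a two-element x range amounts to. The variable names are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Dates and B values from the question's dataframe.
dates = pd.to_datetime([
    '2020-05-04', '2020-05-05', '2020-05-07', '2020-05-09', '2020-05-18',
    '2020-05-19', '2020-05-20', '2020-05-21', '2020-05-22', '2020-05-23',
    '2020-05-24', '2020-05-25', '2020-05-26', '2020-05-27'])
b = np.array([0] * 10 + [1] * 4)

# Positions in days since the first date.
days = (dates - dates[0]).days.to_numpy().astype(float)

# Assumption: x-limits are the data range padded by the default 5% margin.
margin = 0.05 * (days[-1] - days[0])
lo, hi = days[0] - margin, days[-1] + margin

# A uniform grid gives every value of B a cell of equal width.
edges = np.linspace(lo, hi, len(b) + 1)

first_red = int(np.argmax(b == 1))
red_start = dates[0] + pd.to_timedelta(edges[first_red], unit='D')

print('uniform grid starts red at:', red_start)
print('data says red should start at:', dates[first_red])
```

Under these assumptions the uniform grid starts the red region late on 2020-05-20, which is consistent with the misplaced region described in the question, while the data says it should start on 2020-05-24.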
I prefer axvspan in this case, see here for more information.
This adaptation colors the areas where data.B==1, including the case where data.B is not one continuous block.
With a modified dataframe data from data1.csv (added some more points that are 1):
date A B
5/4/2020 0 0
5/5/2020 5 0
5/7/2020 2 1
5/9/2020 2 1
5/18/2020 -5 0
5/19/2020 -1 0
5/20/2020 0 0
5/21/2020 1 0
5/22/2020 0 0
5/23/2020 3 0
5/24/2020 1 1
5/25/2020 0 1
5/26/2020 4 1
5/27/2020 3 1
from matplotlib import dates as mdates
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('data1.csv',index_col='date')
data.index = pd.to_datetime(data.index)
fig = plt.figure()
ax = fig.add_subplot()
ax.plot(data['A'])
plt.xticks(rotation = 30)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.axhline(y = 0, color = 'black')
# in this case I'm looking for a pair of ones to determine where to color
for i in range(1, len(data.B)):
    # use .iloc for positional access, since the index holds dates
    if data.B.iloc[i] == 1 and data.B.iloc[i-1] == 1:
        plt.axvspan(data.index[i-1], data.index[i], color='r', alpha=0.4, lw=0)
plt.tight_layout()
If data.B==1 will always be "one block" you can do away with the for loop and just use something like this in its place:
first = min(idx for idx, val in enumerate(data.B) if val == 1)
last = max(idx for idx, val in enumerate(data.B) if val == 1)
plt.axvspan(data.index[first], data.index[last], color='r', alpha=0.4, lw=0)
Regarding "why" your data does not align, @Ben.T has this solution.
UPDATE: as pointed out, the for loop could be too crude for large datasets. The following uses numpy to find the falling and rising edges of data.B and then loops on those results:
import numpy as np
diffB = np.append([0], np.diff(data.B))
up = np.where(diffB == 1)[0]
dn = np.where(diffB == -1)[0]
if diffB[np.argmax(diffB!=0)]==-1:
    # we have a falling edge before rising edge, must have started 'up'
    up = np.append([0], up)
# look at the last nonzero edge (note the != 0 on the reversed array)
if diffB[len(diffB) - np.argmax(diffB[::-1]!=0) - 1]==1:
    # we have a rising edge that never fell, force it 'dn'
    dn = np.append(dn, [len(data.B)-1])
for i in range(len(up)):
    plt.axvspan(data.index[up[i]], data.index[dn[i]], color='r', alpha=0.4, lw=0)
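A variant of the same edge-finding idea, sketched under the assumption that B only contains 0 and 1: padding B with a zero on each side guarantees that every block of ones has both a rising and a falling edge, so the two special cases above are not needed. Note that this sketch ends each span at the last 1 of a block rather than at the first 0 after it. The names padded, starts and stops are made up here.

```python
import numpy as np

# Column B from the modified data1.csv above.
b = np.array([0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# Pad with zeros so blocks touching either end still produce both edges.
padded = np.concatenate(([0], b, [0]))
edges = np.diff(padded)

starts = np.where(edges == 1)[0]        # index of the first 1 in each block
stops = np.where(edges == -1)[0] - 1    # index of the last 1 in each block

for s, e in zip(starts, stops):
    print(s, e)
    # With the dataframe from the answer one would then call, for example:
    # plt.axvspan(data.index[s], data.index[e], color='r', alpha=0.4, lw=0)
```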
I have two sets of data I want to plot together on a single figure. I have a set of flow data at 15 minute intervals I want to plot as a line plot, and a set of precipitation data at hourly intervals, which I am resampling to a daily time step and plotting as a bar plot. Here is what the format of the data looks like:
2016-06-01 00:00:00 56.8
2016-06-01 00:15:00 52.1
2016-06-01 00:30:00 44.0
2016-06-01 00:45:00 43.6
2016-06-01 01:00:00 34.3
At first I set this up as two subplots, with precipitation and flow rate on different axes. This works totally fine. Here's my code:
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime
filename = 'manhole_B.csv'
plotname = 'SSMH-2A B'
plt.style.use('bmh')
# Read csv with precipitation data, change index to datetime object
pdf = pd.read_csv('precip.csv', delimiter=',', header=None, index_col=0)
pdf.columns = ['Precipitation[in]']
pdf.index.name = ''
pdf.index = pd.to_datetime(pdf.index)
pdf = pdf.resample('D').sum()
print(pdf.head())
# Read csv with flow data, change index to datetime object
qdf = pd.read_csv(filename, delimiter=',', header=None, index_col=0)
qdf.columns = ['Flow rate [gpm]']
qdf.index.name = ''
qdf.index = pd.to_datetime(qdf.index)
# Plot
f, ax = plt.subplots(2)
qdf.plot(ax=ax[1], rot=30)
pdf.plot(ax=ax[0], kind='bar', color='r', rot=30, width=1)
ax[0].get_xaxis().set_ticks([])
ax[1].set_ylabel('Flow Rate [gpm]')
ax[0].set_ylabel('Precipitation [in]')
ax[0].set_title(plotname)
f.set_facecolor('white')
f.tight_layout()
plt.show()
2 Axis Plot
However, I decided I want to show everything on a single axis, so I modified my code to put precipitation on a secondary axis. Now my flow data has disappeared from the plot, and even when I set the axis ticks to an empty set, I get these 00:15, 00:30 and 00:45 tick marks along the x-axis.
Secondary-y axis plots
Any ideas why this might be occurring?
Here is my code for the single axis plot:
f, ax = plt.subplots()
qdf.plot(ax=ax, rot=30)
pdf.plot(ax=ax, kind='bar', color='r', rot=30, secondary_y=True)
ax.get_xaxis().set_ticks([])
Here is an example:
Setup
In [1]: from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.DataFrame({'x' : np.arange(10),
'y1' : np.random.rand(10,),
'y2' : np.square(np.arange(10))})
df
Out[1]: x y1 y2
0 0 0.451314 0
1 1 0.321124 1
2 2 0.050852 4
3 3 0.731084 9
4 4 0.689950 16
5 5 0.581768 25
6 6 0.962147 36
7 7 0.743512 49
8 8 0.993304 64
9 9 0.666703 81
Plot
In [2]: fig, ax1 = plt.subplots()
ax1.plot(df['x'], df['y1'], 'b-')
ax1.set_xlabel('Series')
ax1.set_ylabel('Random', color='b')
for tl in ax1.get_yticklabels():
    tl.set_color('b')
ax2 = ax1.twinx() # Note twinx, not twiny. I was wrong when I commented on your question.
ax2.plot(df['x'], df['y2'], 'ro')
ax2.set_ylabel('Square', color='r')
for tl in ax2.get_yticklabels():
    tl.set_color('r')
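Applied to the flow and precipitation case from the question, the twinx approach could be sketched as below. One likely reason the pandas version misbehaves is that a pandas bar plot places its bars at integer positions 0, 1, 2, ... rather than at the dates, so it does not share a meaningful x-axis with the datetime line plot. This sketch therefore calls matplotlib's ax.bar directly with the datetime index. The data is synthetic (random numbers standing in for the CSV files), and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-ins for the 15-minute flow data and the hourly precipitation data.
rng = np.random.default_rng(0)
flow_index = pd.date_range('2016-06-01', periods=4 * 24 * 3, freq='15min')
qdf = pd.DataFrame({'Flow rate [gpm]': 40 + 10 * rng.random(len(flow_index))},
                   index=flow_index)
precip_index = pd.date_range('2016-06-01', periods=24 * 3, freq='60min')
pdf = pd.DataFrame({'Precipitation[in]': 0.1 * rng.random(len(precip_index))},
                   index=precip_index)
pdf = pdf.resample('D').sum()  # daily totals, as in the question

fig, ax = plt.subplots()
ax.plot(qdf.index, qdf['Flow rate [gpm]'])
ax.set_ylabel('Flow Rate [gpm]')

# Second y-axis sharing the same datetime x-axis.
ax2 = ax.twinx()
# width is in days on a date axis; align='edge' starts each bar at midnight.
ax2.bar(pdf.index, pdf['Precipitation[in]'], width=1.0, align='edge',
        color='r', alpha=0.3)
ax2.set_ylabel('Precipitation [in]')
fig.autofmt_xdate(rotation=30)
```

Because both artists use real datetime positions, the daily bars line up with the 15-minute line without any manual tick handling.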