How to plot values in a line plot with string xaxis? - python

I have this df:
   Month  MEAN
0    JAN   1.0
1    FEB   2.0
2    MAR   5.0
3    APR   3.0
4    MAY   4.0
5    JUN   2.0
6    JUL   1.0
7    AUG   1.0
8    SEP   0.0
9    OCT   0.0
10   NOV   2.0
11   DEC   3.0
I want to annotate the values on a line plot, so I tried this code:
fig = plt.figure('Graphic', figsize=(20,15), dpi=300)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o',style=['--'],linewidth=7,color='black', ms=15,ax=ax1)
for x,y in zip(df['Month'],df['MEAN']):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center')
But I get this error:
ConversionError: Failed to convert value(s) to axis units: 'JAN'
How can I solve this?
P.S. Maybe I should convert the df['Month'] values to numbers, but I still need the month names shown on the plot.
Thanks in advance.

This should work:
fig = plt.figure('Graphic', figsize=(20,15), dpi=300)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o',style=['--'],linewidth=7,color='black', ms=15,ax=ax1)
plt.xticks(range(0,len(df['Month'])), df['Month'])
plt.show()
Let me know if you have any questions.

df.plot draws the line against the DataFrame's numeric index (0 to 11), so the x-axis is numeric and a string position such as 'JAN' cannot be converted to axis units. Annotate using the DataFrame index instead, then set the month names as the tick labels.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure('Graphic', figsize=(10,7.5), dpi=72)
ax1 = fig.add_axes([0.15, 0.20, 0.70, 0.60])
df.plot(kind='line', marker='o', style=['--'], linewidth=7, color='black', ms=15, ax=ax1)
for x,y in zip(df.index, df['MEAN']):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y),
                 textcoords="offset points",
                 xytext=(0,10),
                 ha='center')
ax1.set_xticks(np.arange(0,12,1))
ax1.set_xticklabels(df['Month'].unique())
plt.show()
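For reference, here is a self-contained sketch of the same approach. The DataFrame is rebuilt from the table in the question; the Agg backend and the plain ax.plot call (instead of df.plot) are my own assumptions so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question's table.
months = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
means = [1.0, 2.0, 5.0, 3.0, 4.0, 2.0, 1.0, 1.0, 0.0, 0.0, 2.0, 3.0]
df = pd.DataFrame({"Month": months, "MEAN": means})

fig, ax = plt.subplots(figsize=(10, 7.5))

# Plot against the numeric index (0..11); month names become tick labels later.
ax.plot(df.index, df["MEAN"], "--", marker="o", color="black")

# Annotate each point at its numeric x position.
annotations = []
for x, y in zip(df.index, df["MEAN"]):
    annotations.append(
        ax.annotate(f"{y:.2f}", (x, y), textcoords="offset points",
                    xytext=(0, 10), ha="center"))

# Replace the numeric ticks with the month names.
ax.set_xticks(list(df.index))
ax.set_xticklabels(df["Month"].tolist())
```

Call plt.show() with an interactive backend, or fig.savefig(...), to see the result.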

Related

Python : Pandas Chaining : How to add text to the plot?

How can I add text labels to a bar plot when using pandas method chaining? Below is how I'm plotting without the labels.
(
    df
    .groupby(['col1', 'col2'], dropna=False)
    [['col1', 'col2']]
    .size()
    .unstack()
    .plot(kind='bar', figsize=(8, 8))
)
The unstacked data frame (right before .plot in the above code) has data as below.
col2   1.0   2.0   3.0    4.0   NaN
col1
1.0    514  1922  7827  18877  1966
2.0    NaN   NaN   NaN    NaN  2018
NaN     21    20    59     99  5570
The plot is as below:
I would like to have the numbers displayed on top of the bars. Please advise. Thank you.
Capture the return value of the chain (plot returns an Axes instance), then label each bar container:
ax = (df.groupby(['col1', 'col2'], dropna=False)[['col1', 'col2']]
        .size().unstack()
        .plot(kind='bar', figsize=(8, 8)))
for bars in ax.containers:
    ax.bar_label(bars)
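Update
The asker commented: "I'm using 3.3.2 and cannot upgrade due to system restrictions". Axes.bar_label was added in Matplotlib 3.4, so on older versions annotate each bar patch manually: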
for rect in ax.patches:
    height = rect.get_height()
    ax.annotate(r'{:d}'.format(int(height)),
                xy=(rect.get_x() + rect.get_width() / 2, height),
                xytext=(0, 3),
                textcoords="offset points",
                ha='center', va='bottom')
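The two variants above can be combined into one sketch that picks whichever is available. The toy DataFrame is hypothetical (it only stands in for the question's col1/col2 data), and the hasattr check and Agg backend are my own assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the question's col1/col2 columns.
df = pd.DataFrame({"col1": [1, 1, 1, 2, 2, 3],
                   "col2": [1, 2, 2, 1, 1, 2]})

# The chain returns the Axes created by .plot().
ax = (df.groupby(["col1", "col2"])
        .size()
        .unstack(fill_value=0)
        .plot(kind="bar", figsize=(8, 8)))

labels = []
if hasattr(ax, "bar_label"):
    # Matplotlib >= 3.4: one call per bar container.
    for bars in ax.containers:
        labels.extend(ax.bar_label(bars))
else:
    # Older Matplotlib: annotate each rectangle by hand.
    for rect in ax.patches:
        height = rect.get_height()
        labels.append(
            ax.annotate(f"{int(height):d}",
                        xy=(rect.get_x() + rect.get_width() / 2, height),
                        xytext=(0, 3), textcoords="offset points",
                        ha="center", va="bottom"))
```

Either branch produces one text label per bar, so the rest of the plotting code does not need to know which Matplotlib version is installed.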

My plot has two sets of xticks and two sets of yticks when using matplotlib

The code is very simple, but the plot shows two sets of xticks and two sets of yticks, which is strange.
fig = plt.figure(figsize=(16,12))
ax1 = fig.add_subplot(1, 1, 1)
ax1.plot(data['timestamp'], data['value'], 'r', label='value')
ax1.set_xlabel('date', fontsize=16)
ax1.set_ylabel('profit', fontsize=16)
ax1.legend(loc='upper left')
ax1.grid(True)
The data values are below:
0 2010-01-04
1 2010-01-04
2 2010-03-08
3 2010-07-05
4 2010-11-04
Name: timestamp, dtype: datetime64[ns]
0 1.037868
1 1.085912
2 1.092537
3 1.077828
4 1.160641
I just want data['timestamp'] and data['value'] to be shown in the plot.
I tried adding the code below, but the result is the same.
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
ax1.xaxis.set_major_locator(mdates.YearLocator())
ax1.xaxis.set_minor_locator(mdates.MonthLocator())
I printed the x-ticks and y-ticks; as shown below, the result contains no values like 0, 0.2, 0.4, 0.6, 0.8, 1.0.
[14610. 14641. 14669. 14700. 14730. 14761. 14791. 14822. 14853. 14883. 14914.]
[1.02 1.04 1.06 1.08 1.1 1.12 1.14 1.16 1.18]

Creating a multi-bar plot in MatplotLib

Given a simple pd.DataFrame df that looks like this:
                                workflow                        blocked_14  blocked_7  blocked_5  blocked_2  blocked_1
au_in_service_order_response    au_in_service_order_response         12.00      11.76      15.38       25.0        0.0
au_in_cats_sync_billing_period  au_in_cats_sync_billing_period        3.33       0.00       0.00        0.0        0.0
au_in_MeterDataNotification     au_in_MeterDataNotification           8.70       0.00       0.00        0.0        0.0
I want to create a bar-chart that shows the blocked_* columns as the x-axis.
Since df.plot(x='workflow', kind='bar') obviously puts the workflows on the x-axis, I tried ax = blocked_df.plot(x=['blocked_14','blocked_7',...], kind='bar') but this gives me
ValueError: x must be a label or position
How can I plot the five blocked_* columns along the x-axis and have each bar show the corresponding value for its workflow?
Since pandas interprets the x as the index and y as the values you want to plot, you'll need to transpose your dataframe first.
import matplotlib.pyplot as plt
ax = df.set_index('workflow').T.plot.bar()
plt.show()
But that doesn't look too good does it? Let's ensure all of the labels fit on the Axes and move the legend outside of the plot so it doesn't obscure the data.
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(14, 6), layout='constrained')
ax = df.set_index('workflow').T.plot.bar(legend=False, ax=ax)
ax.legend(loc='upper left', bbox_to_anchor=(1, .8))
plt.show()
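To see why the transpose helps, it may be useful to look at the reshaped frame before plotting it. This is a small sketch using the values from the question; the variable name wide is my own.

```python
import pandas as pd

# Values copied from the question's table.
df = pd.DataFrame({
    "workflow": ["au_in_service_order_response",
                 "au_in_cats_sync_billing_period",
                 "au_in_MeterDataNotification"],
    "blocked_14": [12.00, 3.33, 8.70],
    "blocked_7": [11.76, 0.00, 0.00],
    "blocked_5": [15.38, 0.00, 0.00],
    "blocked_2": [25.0, 0.0, 0.0],
    "blocked_1": [0.0, 0.0, 0.0],
})

# After the transpose, the blocked_* names form the index (the x-axis groups)
# and each workflow is a column (one bar per workflow within each group).
wide = df.set_index("workflow").T
print(wide)
```

Calling wide.plot.bar() then draws five groups of three bars, which is the layout asked for.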

How to plot sequential data, changing the color according to cluster

I have a dataframe with a date column and the cluster each date belongs to (the clustering was done beforehand, based on the temperatures collected for each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to its assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07  1      3.0         1
2020-08-08  2      0.0         1
2020-08-09  3      1.0         1
2020-08-10  4      3.0         1
2020-08-11  5      1.0         1
2020-08-12  6      1.0         1
2020-08-13  7      3.0         1
2020-08-14  8      2.0         1
2020-08-15  9      2.0         1
2020-08-16  10     2.0         1
2020-08-17  11     2.0         1
2020-08-18  12     1.0         1
2020-08-19  13     1.0         1
2020-08-20  14     0.0         1
2020-08-21  15     0.0         1
2020-08-22  16     1.0         1
Note: I can't simply group the data by cluster, because the plot must be sequential. I thought about writing code to count the number of consecutive elements in each cluster, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should look something like this (the numbers inside the bars are the clusters, the x-axis is the time in days, and each bar's width is the number of consecutive days observed in the same cluster):
You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
                   'order': np.arange(1, N + 1),
                   'ClusterNo2': np.random.randint(0, 4, N).astype(float),
                   'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
          for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()
You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
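The question also mentions counting how many consecutive days fall in the same cluster. As a rough sketch of that step, the runs can be computed with itertools.groupby from the standard library. The cluster list is taken from the question's table; the helper name cluster_runs is hypothetical.

```python
from itertools import groupby

# ClusterNo2 values for the 16 days shown in the question, in date order.
clusters = [3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1]

def cluster_runs(values):
    """Return (cluster, start_position, run_length) for each consecutive run."""
    runs = []
    start = 0
    for cluster, group in groupby(values):
        length = len(list(group))
        runs.append((cluster, start, length))
        start += length
    return runs

runs = cluster_runs(clusters)
for cluster, start, length in runs:
    print(f"cluster {cluster}: starts at day {start}, lasts {length} day(s)")
```

Each tuple could then be drawn as a single wide bar, for instance with ax.bar(start, 1, width=length, align='edge'), and the cluster number written at start + length / 2.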

labelling bins in each subplots of an histogram chart

I have a dataframe, df, with 29 rows and 24 columns:
Index 0.0 5.0 34.0 ... 22.0
2017-08-03 00:00:00 10 0 10 0
2017-08-04 00:00:00 20 60 1470 20
2017-08-05 00:00:00 0 58 0 24
2017-08-06 00:00:00 0 0 480 24
2017-09-07 00:00:00 0 0 0 25
: : : : :
: : : : :
2017-09-30 00:00:00
I want to label the bins in each subplot, where each subplot is the histogram of one column. I have been able to draw a histogram per column in its own subplot using this code:
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
#Initialize the figure
plt.style.use('seaborn-darkgrid')
df.hist(ax = ax)
However, the bin labels in each subplot are far apart, and the bin ranges are not explicitly marked on the x-axis, which makes the chart difficult to interpret. I have looked at
"Aligning bins to xticks in plt.hist", but it doesn't explicitly cover labelling bins when subplots are involved. Any help would be great.
I have also tried the code below, but I get ValueError: too many values to unpack (expected 2):
x=[0,40,80,120,160,200,240,280,320]
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
# Initialize the figure
plt.style.use('seaborn-darkgrid')
n,bins= plt.hist(df,bins= x)
#labels & axes
plt.locator_params(nbins=8, axis='x')
plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
plt.title('Daily occurrence',fontsize=16)
plt.xlabel('Number of occurrence',fontsize=12)
plt.ylabel('Frequency',fontsize=12)
plt.xticks(x)
plt.xlim(0,320)
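One likely cause of the ValueError: plt.hist returns three values (counts, bin edges, and the patches), so unpacking it into n, bins fails. As a minimal sketch of getting explicit bin ranges, np.histogram returns just the counts and the edges, and range labels can be built from the edges. The sample values below are hypothetical, loosely based on the numbers visible in the table; the bin edges are the ones from the question.

```python
import numpy as np

# Bin edges from the question.
edges = [0, 40, 80, 120, 160, 200, 240, 280, 320]

# Hypothetical sample values standing in for one column of df.
values = np.array([10, 20, 0, 0, 0, 60, 58, 1470, 480, 24, 25])

# np.histogram returns exactly two arrays: counts and bin edges.
# Values outside the outermost edges (here 1470 and 480) are ignored.
counts, bin_edges = np.histogram(values, bins=edges)

# One "low-high" label per bin, usable as x tick labels at the bin centres.
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]

for label, count in zip(labels, counts):
    print(label, count)
```

With plt.hist itself, the equivalent unpacking would be n, bins, patches = plt.hist(...).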
