Labelling bins in each subplot of a histogram chart

I have a dataframe, df, with 29 rows and 24 columns:
Index 0.0 5.0 34.0 ... 22.0
2017-08-03 00:00:00 10 0 10 0
2017-08-04 00:00:00 20 60 1470 20
2017-08-05 00:00:00 0 58 0 24
2017-08-06 00:00:00 0 0 480 24
2017-09-07 00:00:00 0 0 0 25
: : : : :
: : : : :
2017-09-30 00:00:00
I want to label the bins in each subplot of the histogram chart, where each subplot represents one column. I have been able to draw the histogram for each column in its own subplot using this code:
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
#Initialize the figure
plt.style.use('seaborn-darkgrid')
df.hist(ax = ax)
However, the tick labels on each subplot are far apart and do not mark the bin ranges on the x-axis, which makes the chart difficult to interpret. I have looked at
Aligning bins to xticks in plt.hist, but it doesn't explicitly cover labelling bins when subplots are involved. Any help would be great.
I have also tried this, but I get ValueError: too many values to unpack (expected 2):
x=[0,40,80,120,160,200,240,280,320]
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
# Initialize the figure
plt.style.use('seaborn-darkgrid')
n,bins= plt.hist(df,bins= x)
#labels & axes
plt.locator_params(nbins=8, axis='x')
plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
plt.title('Daily occurrence',fontsize=16)
plt.xlabel('Number of occurrence',fontsize=12)
plt.ylabel('Frequency',fontsize=12)
plt.xticks(x)
plt.xlim(0,320)
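One possible approach, sketched here under stated assumptions, is to draw each column on its own Axes with explicit bin edges and reuse those edges as the x-ticks, so every tick marks a bin boundary. Note that `hist` returns three objects (counts, bin edges, patches), which is why unpacking into two names raises the ValueError above. The data frame below is a randomly generated stand-in with four columns, and the 2x2 grid and the `edges` values are illustrative choices, not the asker's actual data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's df (29 rows, only 4 columns here)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 320, size=(29, 4)),
                  columns=[0.0, 5.0, 34.0, 22.0])

# Explicit bin edges, matching the list x in the question
edges = np.arange(0, 321, 40)

fig, axes = plt.subplots(2, 2, figsize=(12, 8))
for ax, col in zip(axes.ravel(), df.columns):
    # hist returns (counts, bin edges, patches): three values, not two
    counts, bins, patches = ax.hist(df[col], bins=edges)
    ax.set_xticks(bins)              # one tick per bin edge
    ax.set_xlim(bins[0], bins[-1])
    ax.set_title(f"Column {col}")
    ax.set_xlabel("Number of occurrences")
    ax.set_ylabel("Frequency")
fig.tight_layout()
```

Because the ticks sit on the bin edges, each bar spans exactly the range between two neighbouring tick labels.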

Related

Colour by Category in scatterplot

My dataframe looks like this:
date index count weekday_num max_temperature_C
0 2019-04-01 0 1379 0 18
1 2019-04-02 1 1395 1 21
2 2019-04-03 2 1155 2 19
3 2019-04-04 3 342 3 18
4 2019-04-05 4 216 4 14
I would like to plot count vs max_temperature_C and colour by weekday_num
I have tried the below:
#create the scatter plot of trips vs Temp
plt.scatter(comb2['count'], comb2['max_temperature_C'], c=comb2['weekday_num'])
# Label the axis
plt.xlabel('Daily Trip count')
plt.ylabel('Max Temp c')
plt.legend(['weekday_num'])
# Show it!
plt.show()
However, I am not sure how to get the legend to display all of the colours corresponding to each 'weekday_num'.
Thanks

You can use the automated legend creation like this:
fig, ax = plt.subplots()
scatter = ax.scatter(comb2['count'], comb2['max_temperature_C'], c=comb2['weekday_num'])
# produce a legend with the unique colors from the scatter
legend = ax.legend(*scatter.legend_elements(),
loc="upper right", title="Weekday num")
ax.add_artist(legend)
plt.show()

How to plot sequential data, changing the color according to cluster

I have a dataframe with the date and the cluster each date belongs to (the clustering was done beforehand, based on the temperatures collected for each day). I want to plot this data in sequence, like a stacked bar chart, changing the color of each element according to its assigned cluster. Here is my table (the data goes up to 100 days):
Date        order  ClusterNo2  constant
2020-08-07  1      3.0         1
2020-08-08  2      0.0         1
2020-08-09  3      1.0         1
2020-08-10  4      3.0         1
2020-08-11  5      1.0         1
2020-08-12  6      1.0         1
2020-08-13  7      3.0         1
2020-08-14  8      2.0         1
2020-08-15  9      2.0         1
2020-08-16  10     2.0         1
2020-08-17  11     2.0         1
2020-08-18  12     1.0         1
2020-08-19  13     1.0         1
2020-08-20  14     0.0         1
2020-08-21  15     0.0         1
2020-08-22  16     1.0         1
Note: I can't simply group the data by cluster because the plot should be sequential. I thought of writing code to count the elements in each consecutive run of a cluster, but then I would face the same problem when plotting. Does anyone know how to solve this?
The expected result should be something like this (the numbers inside the bars are the cluster, the x-axis is the time in days, and each bar's width is the number of consecutive days with the same cluster):

You could use the dates for the x-axis, the 'constant' column for the y-axis,
and the Cluster id for the coloring.
You can create a custom legend using a list of colored rectangles.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
N = 100
df = pd.DataFrame({'Date': pd.date_range('2020-08-07', periods=N, freq='D'),
'order': np.arange(1, N + 1),
'ClusterNo2': np.random.randint(0, 4, N).astype(float),
'constant': 1})
df['ClusterNo2'] = df['ClusterNo2'].astype(int) # convert to integers
fig, ax = plt.subplots(figsize=(15, 3))
num_clusters = df['ClusterNo2'].max() + 1
colors = plt.cm.Set2.colors
ax.bar(x=range(len(df)), height=df['constant'], width=1, color=[colors[i] for i in df['ClusterNo2']], edgecolor='none')
ax.set_xticks(range(len(df)))
labels = ['' if i % 3 != 0 else day.strftime('%d\n%b %Y') if i == 0 or day.day <= 3 else day.strftime('%d')
for i, day in enumerate(df['Date'])]
ax.set_xticklabels(labels)
ax.margins(x=0, y=0)
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
legend_handles = [plt.Rectangle((0, 0), 0, 0, color=colors[i], label=f'{i}') for i in range(num_clusters)]
ax.legend(handles=legend_handles, title='Clusters', bbox_to_anchor=(1.01, 1.01), loc='upper left')
fig.tight_layout()
plt.show()

You could just plot a normal bar graph, with 1 bar corresponding to 1 day. If you make the width also 1, it will look as if the patches are contiguous.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# simulate data
total_datapoints = 16
total_clusters = 4
order = np.arange(total_datapoints)
clusters = np.random.randint(0, total_clusters, size=total_datapoints)
# map clusters to colors
cmap = plt.cm.tab10
bounds = np.arange(total_clusters + 1)
norm = BoundaryNorm(bounds, cmap.N)
colors = [cmap(norm(cluster)) for cluster in clusters]
# plot
fig, ax = plt.subplots()
ax.bar(order, np.ones_like(order), width=1, color=colors, align='edge')
# xticks
change_points = np.where(np.diff(clusters) != 0)[0] + 1
change_points = np.unique([0] + change_points.tolist() + [total_datapoints])
ax.set_xticks(change_points)
# annotate clusters
for ii, dx in enumerate(np.diff(change_points)):
    xx = change_points[ii] + dx/2
    ax.text(xx, 0.5, str(clusters[int(xx)]), ha='center', va='center')
ax.set_xlabel('Time (days)')
plt.show()
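The run-length idea mentioned in the question (counting consecutive days that share a cluster) can also be sketched directly in pandas. The snippet below uses the 16 sample rows from the question; the names `run_id` and `runs` are illustrative, not part of either answer.

```python
import pandas as pd

# Cluster ids for the 16 sample days shown in the question
clusters = pd.Series([3, 0, 1, 3, 1, 1, 3, 2, 2, 2, 2, 1, 1, 0, 0, 1],
                     name="ClusterNo2")

# A new run starts whenever the value differs from the previous one
run_id = (clusters != clusters.shift()).cumsum()

# One row per run: which cluster it is and how many days it lasts
runs = clusters.groupby(run_id).agg(cluster="first", length="size")

# Zero-based position of the first day of each run
runs["start"] = runs["length"].cumsum() - runs["length"]
print(runs)
```

Each row of `runs` then describes one contiguous block, which could be drawn as a single wide bar, for example with `ax.barh(0, width=length, left=start)`.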

Display last N values in x axis range as label in matplotlib in Python

In my Python script, df.Value holds a set of n values (200). I need the last 100 values as my x-axis labels, i.e. the values at index 100-200.
plt.figure(figsize=(100, 5), dpi=100)
plt.plot(df['Time'], df['sale'], label='sales')
plt.xlabel('Time ')
plt.ylabel('sales')
plt.title('sales')
plt.legend()
plt.show()
It shows the values 0-200 on the x-axis, but I need only the last N values as x-axis labels.
Sample data (sale and Time):
1 604.802656 13:00:00
2 604.400000 13:01:00
3 604.900024 13:02:00
4 604.099976 13:03:00
5 604.000000 13:04:00
6 604.250000 13:05:00
7 604.400024 13:06:00
8 604.150024 13:07:00
9 604.000000 13:08:00

plt.xticks(np.arange(100),df['Time'].values[100:200])
This will show the last 100 values as the 100 x-axis labels.

Try this:
plt.xticks(np.arange(100, 200, step=1))
For your case, i.e. Time on the x-axis, see this post: https://stackoverflow.com/a/16428019/5202279

plt.figure(figsize=(100, 5), dpi=100)
plt.plot(df['Time'], df['sale'], label='sales')
plt.xlabel('Time ')
plt.xticks(np.arange(100),df.Time[100:200],rotation=45)
plt.ylabel('sales')
plt.title('sales')
plt.legend()
plt.show()
np.arange(100) gives the 100 x-axis positions to show,
and df.Time[100:200] gets the last 100 string values from df.Time.
rotation=45 rotates the labels by 45 degrees.
Thanks for your support.
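An alternative that keeps ticks and labels aligned is to slice the data first and plot only the last N rows. The following is a minimal sketch on made-up data: the frame `df`, the value of `N` and the tick spacing of 10 are assumptions for illustration, while the column names `Time` and `sale` follow the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 200 one-minute samples
times = pd.date_range("2021-01-01 13:00", periods=200, freq="min").strftime("%H:%M:%S")
df = pd.DataFrame({"Time": times,
                   "sale": 604 + np.random.default_rng(0).random(200)})

N = 100
last = df.tail(N)  # keep only the last N rows

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(last["Time"], last["sale"], label="sales")
# String x-values are placed at positions 0..N-1, so label every 10th one
ax.set_xticks(list(range(0, N, 10)))
ax.tick_params(axis="x", rotation=45)
ax.set_xlabel("Time")
ax.set_ylabel("sales")
ax.legend()
fig.tight_layout()
```

Since only the sliced rows are plotted, each tick label is the Time value of the point drawn at that position.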

Change tick frequency for datetime axis [duplicate]

This question already has an answer here:
Change tick frequency on X (time, not number) frequency in matplotlib
(1 answer)
Closed 3 years ago.
I have the following dataframe:
Date Prod_01 Prod_02
19 2018-03-01 49870 0.0
20 2018-04-01 47397 0.0
21 2018-05-01 53752 0.0
22 2018-06-01 47111 0.0
23 2018-07-01 53581 0.0
24 2018-08-01 55692 0.0
25 2018-09-01 51886 0.0
26 2018-10-01 56963 0.0
27 2018-11-01 56732 0.0
28 2018-12-01 59196 0.0
29 2019-01-01 57221 5.0
30 2019-02-01 55495 472.0
31 2019-03-01 65394 753.0
32 2019-04-01 59030 1174.0
33 2019-05-01 64466 2793.0
34 2019-06-01 58471 4413.0
35 2019-07-01 64785 6110.0
36 2019-08-01 63774 8360.0
37 2019-09-01 64324 9558.0
38 2019-10-01 65733 11050.0
And I need to plot a time series of the 'Prod_01' column.
The 'Date' column is in the pandas datetime format.
So I used the following command:
plt.figure(figsize=(10,4))
plt.plot('Date', 'Prod_01', data=test, linewidth=2, color='steelblue')
plt.xticks(rotation=45, horizontalalignment='right');
Output:
However, I want to change the frequency of the xticks to one month, so I get one tick and one label for each month.
I have tried the following command:
plt.figure(figsize=(10,4))
plt.plot('Date', 'Prod_01', data=test, linewidth=2, color='steelblue')
plt.xticks(np.arange(1, len(test), 1), test['Date'] ,rotation=45, horizontalalignment='right');
But I get this:
How can I solve this problem?
Thanks in advance.

I'm not very familiar with pandas data frames. However, I can't see why this wouldn't work with any pyplot plot.
According to the top SO answer on a related post by ImportanceOfBeingErnest:
The spacing between ticklabels is exclusively determined by the space between ticks on the axes.
So, to change the distance between ticks and their labels, you can do this:
Suppose a cluttered and base-10 centered person displays the following graph:
It takes the following code, which imports matplotlib.ticker:
import numpy as np
import matplotlib.pyplot as plt
# Import this, too
import matplotlib.ticker as ticker
# Arbitrary graph with x-axis = [-32..32]
x = np.linspace(-32, 32, 1024)
y = np.sinc(x)
# -------------------- Look Here --------------------
# Access plot's axes
axs = plt.axes()
# Set distance between major ticks (which always have labels)
axs.xaxis.set_major_locator(ticker.MultipleLocator(5))
# Sets distance between minor ticks (which don't have labels)
axs.xaxis.set_minor_locator(ticker.MultipleLocator(1))
# -----------------------------------------------------
# Plot and show graph
plt.plot(x, y)
plt.show()
To change where the labels are placed, you can change the distance between the 'major ticks'. You can also change the smaller 'minor ticks' in between, which don't have a number attached. E.g., on a clock, the hour ticks have numbers on them and are larger (major ticks) with smaller, unlabeled ones between marking the minutes (minor ticks).
By changing the --- Look Here --- part to:
# -------------------- Look Here --------------------
# Access plot's axes
axs = plt.axes()
# Set distance between major ticks (which always have labels)
axs.xaxis.set_major_locator(ticker.MultipleLocator(8))
# Sets distance between minor ticks (which don't have labels)
axs.xaxis.set_minor_locator(ticker.MultipleLocator(4))
# -----------------------------------------------------
You can generate the cleaner and more elegant graph below:
Hope that helps!
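For a datetime x-axis specifically, the same locator idea can be sketched with `matplotlib.dates`. The example below is an illustration on made-up data, not the asker's frame: `test` is rebuilt with 20 month-start dates and evenly spaced values, and the label format `%Y-%m` is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: 20 month-start dates
test = pd.DataFrame({
    "Date": pd.date_range("2018-03-01", periods=20, freq="MS"),
    "Prod_01": np.linspace(49870, 65733, 20),
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(test["Date"], test["Prod_01"], linewidth=2, color="steelblue")

# One major tick on the first day of every month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# Show each tick as year-month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
fig.tight_layout()
```

Passing a larger `interval` to `MonthLocator` would thin the ticks out again, for example one tick every three months.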

Pandas dataframe plotting - issue when switching from two subplots to single plot w/ secondary axis

I have two sets of data I want to plot together on a single figure. I have a set of flow data at 15 minute intervals I want to plot as a line plot, and a set of precipitation data at hourly intervals, which I am resampling to a daily time step and plotting as a bar plot. Here is what the format of the data looks like:
2016-06-01 00:00:00 56.8
2016-06-01 00:15:00 52.1
2016-06-01 00:30:00 44.0
2016-06-01 00:45:00 43.6
2016-06-01 01:00:00 34.3
At first I set this up as two subplots, with precipitation and flow rate on different axis. This works totally fine. Here's my code:
import matplotlib.pyplot as plt
import pandas as pd
from datetime import datetime
filename = 'manhole_B.csv'
plotname = 'SSMH-2A B'
plt.style.use('bmh')
# Read csv with precipitation data, change index to datetime object
pdf = pd.read_csv('precip.csv', delimiter=',', header=None, index_col=0)
pdf.columns = ['Precipitation[in]']
pdf.index.name = ''
pdf.index = pd.to_datetime(pdf.index)
pdf = pdf.resample('D').sum()
print(pdf.head())
# Read csv with flow data, change index to datetime object
qdf = pd.read_csv(filename, delimiter=',', header=None, index_col=0)
qdf.columns = ['Flow rate [gpm]']
qdf.index.name = ''
qdf.index = pd.to_datetime(qdf.index)
# Plot
f, ax = plt.subplots(2)
qdf.plot(ax=ax[1], rot=30)
pdf.plot(ax=ax[0], kind='bar', color='r', rot=30, width=1)
ax[0].get_xaxis().set_ticks([])
ax[1].set_ylabel('Flow Rate [gpm]')
ax[0].set_ylabel('Precipitation [in]')
ax[0].set_title(plotname)
f.set_facecolor('white')
f.tight_layout()
plt.show()
2 Axis Plot
However, I decided I want to show everything on a single axis, so I modified my code to put precipitation on a secondary axis. Now my flow data has disappeared from the plot, and even when I set the axis ticks to an empty set, I get 00:15, 00:30 and 00:45 tick marks along the x-axis.
Secondary-y axis plots
Any ideas why this might be occurring?
Here is my code for the single axis plot:
f, ax = plt.subplots()
qdf.plot(ax=ax, rot=30)
pdf.plot(ax=ax, kind='bar', color='r', rot=30, secondary_y=True)
ax.get_xaxis().set_ticks([])

Here is an example:
Setup
In [1]: from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.DataFrame({'x' : np.arange(10),
'y1' : np.random.rand(10,),
'y2' : np.square(np.arange(10))})
df
Out[1]: x y1 y2
0 0 0.451314 0
1 1 0.321124 1
2 2 0.050852 4
3 3 0.731084 9
4 4 0.689950 16
5 5 0.581768 25
6 6 0.962147 36
7 7 0.743512 49
8 8 0.993304 64
9 9 0.666703 81
Plot
In [2]: fig, ax1 = plt.subplots()
ax1.plot(df['x'], df['y1'], 'b-')
ax1.set_xlabel('Series')
ax1.set_ylabel('Random', color='b')
for tl in ax1.get_yticklabels():
    tl.set_color('b')
ax2 = ax1.twinx() # Note twinx, not twiny. I was wrong when I commented on your question.
ax2.plot(df['x'], df['y2'], 'ro')
ax2.set_ylabel('Square', color='r')
for tl in ax2.get_yticklabels():
    tl.set_color('r')
Out[2]:
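Applied to the flow and precipitation case from the question, the same twinx pattern might look like the sketch below. It draws both series with matplotlib directly so that the line and the bars share the same datetime x-coordinates; pandas bar plots place bars at integer positions, which is a likely reason the two series did not line up. The data here is randomly generated, and only the column names follow the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical flow data at 15-minute intervals for one week
idx15 = pd.date_range("2016-06-01", periods=4 * 24 * 7, freq="15min")
qdf = pd.DataFrame({"Flow rate [gpm]": 40 + 10 * rng.random(len(idx15))}, index=idx15)

# Hypothetical hourly precipitation, resampled to daily totals
idx60 = pd.date_range("2016-06-01", periods=24 * 7, freq="60min")
pdf = pd.DataFrame({"Precipitation [in]": 0.05 * rng.random(len(idx60))}, index=idx60)
pdf = pdf.resample("D").sum()

fig, ax = plt.subplots()
ax.plot(qdf.index, qdf["Flow rate [gpm]"], color="b")
ax.set_ylabel("Flow Rate [gpm]")

ax2 = ax.twinx()  # second y-axis sharing the same datetime x-axis
# On a date axis the bar width is measured in days
ax2.bar(pdf.index, pdf["Precipitation [in]"], width=0.9, align="edge",
        color="r", alpha=0.4)
ax2.set_ylabel("Precipitation [in]")
fig.autofmt_xdate(rotation=30)
```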
