Plotting grouped multi-index data with a For loop - python

I am trying to produce multiple plots from a for loop.
My dataframe is multi-indexed as below:
          temperature  depth
ID Month
33 2              150     95
   3              148     79
   4              148     54
   5              155     77
55 2              168     37
   3              172     33
   4              107     32
   5              155     77
61 2              168     37
   3              172     33
   4              107     32
   5              155     77
I want to loop through each ID and plot:
Temperature as a line against Month (x-axis)
Depth as a bar against Month (x-axis)
I want these to be on the same plot.
This is what I have so far:
import matplotlib.pyplot as plt
# group the dataframe
grp = df.groupby([df.index.get_level_values(0), df.index.get_level_values(1)])
# create empty plots
fig, ax = plt.subplots()
# create an empty plot for combining with ax
ax2 = ax.twinx()
# for loop
for ID, group in grp:
    ax.bar(df.index.get_level_values(1), group["temperature"], color='blue', label='Release')
    ax2.plot(df.index.get_level_values(1), group["depth"], color='green', label='Hold')
    ax.set_xticklabels(df.index.get_level_values(1))
    plt.savefig("value{y}.png".format(y=ID))
dataframe reprex:
import pandas as pd
index = pd.MultiIndex.from_product([[33, 55, 61], ['2', '3', '4', '5']], names=['ID', 'Month'])
df = pd.DataFrame([[150, 95],
                   [148, 79],
                   [148, 54],
                   [155, 77],
                   [168, 37],
                   [172, 33],
                   [107, 32],
                   [155, 77],
                   [168, 37],
                   [172, 33],
                   [107, 32],
                   [155, 77]],
                  columns=['temperature', 'depth'], index=index)
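A minimal sketch of one way to get one figure per ID, assuming the reprex above. It is not taken from an answer on this page; it changes three things relative to the attempt: it groups on the `ID` level only (grouping on both levels gives one row per group), it takes the months from each group's own index rather than from the whole `df`, and it creates a fresh figure inside the loop. Temperature is drawn as a line and depth as bars, as the question describes. The `savefig` call is left commented out so the sketch does not write files; the `figures` dict is an illustrative addition.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

index = pd.MultiIndex.from_product([[33, 55, 61], ['2', '3', '4', '5']], names=['ID', 'Month'])
df = pd.DataFrame([[150, 95], [148, 79], [148, 54], [155, 77],
                   [168, 37], [172, 33], [107, 32], [155, 77],
                   [168, 37], [172, 33], [107, 32], [155, 77]],
                  columns=['temperature', 'depth'], index=index)

figures = {}  # illustrative: keep each figure so it can be inspected later
# group on the ID level only, so each group holds every month for one ID
for ID, group in df.groupby(level='ID'):
    # use this group's own months (as numbers), not the whole frame's index
    months = group.index.get_level_values('Month').astype(int).to_numpy()
    fig, ax = plt.subplots()   # a fresh figure per ID so plots don't pile up
    ax2 = ax.twinx()           # second y-axis sharing the Month x-axis
    ax.bar(months, group['depth'].to_numpy(), color='blue', label='depth')
    ax2.plot(months, group['temperature'].to_numpy(), color='green', marker='o', label='temperature')
    ax.set_xticks(months)
    ax.set_xlabel('Month')
    ax.set_ylabel('depth')
    ax2.set_ylabel('temperature')
    # fig.savefig(f"value{ID}.png")  # one file per ID, as in the question
    figures[ID] = fig
```

Converting the string months to integers keeps both axes on a plain numeric x scale, which avoids any mismatch between the bar axis and its twin.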

Related

How to make a multi-level chart column label by hue

This is a continuation of this question, but now my bar chart uses hue.
Here's what I have:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'age': ['20-30', '20-30', '20-30', '30-40', '30-40', '30-40', '40-50', '40-50', '40-50', '50-60', '50-60', '50-60'],
                   'expenses': ['50$', '100$', '200$', '50$', '100$', '200$', '50$', '100$', '200$', '50$', '100$', '200$'],
                   'users': [59, 42, 57, 68, 47, 98, 75, 73, 54, 81, 52, 43],
                   'buyers': [22, 35, 18, 27, 12, 57, 19, 29, 31, 47, 10, 5],
                   'percentage': [37.2881, 83.3333, 31.5789, 39.7058, 25.5319, 58.1632, 25.3333, 39.7260, 57.4074, 58.0246, 19.2307, 11.6279]})
index  age    expenses  users  buyers  percentage
0      20-30  50$       59     22      37.2881
1      20-30  100$      42     35      83.3333
2      20-30  200$      57     18      31.5789
3      30-40  50$       68     27      39.7058
4      30-40  100$      47     12      25.5319
5      30-40  200$      98     57      58.1632
6      40-50  50$       75     19      25.3333
7      40-50  100$      73     29      39.726
8      40-50  200$      54     31      57.4074
9      50-60  50$       81     47      58.0246
10     50-60  100$      52     10      19.2307
11     50-60  200$      43     5       11.6279
fig, ax = plt.subplots(figsize=(20, 10))
# Plot the all users
sns.barplot(x='age', y='users', data=df, hue='expenses', palette='Blues', edgecolor='grey', alpha=0.7, ax=ax)
# Plot the buyers
sns.barplot(x='age', y='buyers', data=df, hue='expenses', palette='Blues', edgecolor='darkgrey', hatch='//', ax=ax)
plt.show()
I need to get the same annotated chart as before, but with hue, the code:
# extract the separate containers
c1, c2 = ax.containers
# annotate with the users values
ax.bar_label(c1, fontsize=13)
# annotate with the buyer and percentage values
l2 = [f"{v.get_height()}: {df.loc[i, 'percentage']}%" for i, v in enumerate(c2)]
ax.bar_label(c2, labels=l2, fontsize=8, label_type='center', fontweight='bold')
no longer works.
I would be glad for any hints.
Each object in ax.containers represents the bars for a single hue group.
bar_label annotates one container at a time, so the labels are added for every '50$' bar, then every '100$' bar, and then every '200$' bar.
I think it's easier to select the correct data by annotating the 'buyers' group separately.
The answer to your previous question selected data from the entire DataFrame; here, Boolean indexing selects only the rows for the current hue group. Adding print(data) inside the loop helps show what is being selected.
fig, ax = plt.subplots(figsize=(20, 10))
# plot all users
sns.barplot(x='age', y='users', data=df, hue='expenses', palette='Blues', edgecolor='grey', alpha=0.7, ax=ax)
# annotate the bars in the 3 containers (1 container per hue group)
for c in ax.containers:
    ax.bar_label(c)
# plot the 'buyers', which adds 3 more containers to ax
sns.barplot(x='age', y='buyers', data=df, hue='expenses', palette='Blues', edgecolor='darkgrey', hatch='//', ax=ax)
# iterate through the last 3 new containers containing the hatched groups
for c in ax.containers[3:]:
    # get the hue label, which will be used to select the data group
    hue_label = c.get_label()
    # select the data based on hue_label
    data = df.loc[df.expenses.eq(hue_label), ['buyers', 'percentage']]
    # customize the labels
    labels = [f"{v.get_height()}: {data.iloc[i, 1]:0.2f}%" for i, v in enumerate(c)]
    # add the labels
    ax.bar_label(c, labels=labels)
plt.show()
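The same container-per-hue-group idea can be sketched with plain matplotlib instead of seaborn (a swapped-in setup, not the answer's code): each ax.bar call below adds one BarContainer whose label is the hue level, and Boolean indexing on that label selects the matching rows, exactly as in the answer's second loop. Only the buyers bars are drawn to keep it short; the bar offsets, the `all_labels` dict, and the `:g` formatting (which drops a trailing `.0`) are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': ['20-30'] * 3 + ['30-40'] * 3 + ['40-50'] * 3 + ['50-60'] * 3,
    'expenses': ['50$', '100$', '200$'] * 4,
    'buyers': [22, 35, 18, 27, 12, 57, 19, 29, 31, 47, 10, 5],
    'percentage': [37.2881, 83.3333, 31.5789, 39.7058, 25.5319, 58.1632,
                   25.3333, 39.7260, 57.4074, 58.0246, 19.2307, 11.6279]})

ages = df['age'].unique()          # x categories in order of appearance
levels = df['expenses'].unique()   # hue levels: '50$', '100$', '200$'
x = np.arange(len(ages))
width = 0.8 / len(levels)

fig, ax = plt.subplots(figsize=(10, 5))
for j, level in enumerate(levels):
    data = df[df['expenses'].eq(level)]  # rows for one hue level, in age order
    # each ax.bar call adds one BarContainer, labelled with the hue level
    ax.bar(x + j * width, data['buyers'].to_numpy(), width, label=level)
ax.set_xticks(x + width * (len(levels) - 1) / 2)
ax.set_xticklabels(ages)

all_labels = {}
for c in ax.containers:
    hue_label = c.get_label()
    # Boolean indexing selects only the rows belonging to this container
    data = df.loc[df['expenses'].eq(hue_label), ['buyers', 'percentage']]
    labels = [f"{float(v.get_height()):g}: {data.iloc[i, 1]:0.2f}%" for i, v in enumerate(c)]
    ax.bar_label(c, labels=labels, label_type='center')
    all_labels[hue_label] = labels
ax.legend(title='expenses')
```

Because every container is annotated separately, `data` only ever holds the four rows for one hue level, so the i-th bar lines up with the i-th row.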

Matplotlib error plotting interval bins for discretized values from pandas dataframe

I get an error when I try to plot intervals.
I binned my age column into intervals, and now I want to show on a chart how each age interval compares in revenue.
My code:
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
clients['tranche'] = pd.cut(clients.age, bins)
clients.head()
client_id sales revenue birth age sex tranche
0 c_1 39 558.18 1955 66 m (60, 70]
1 c_10 58 1353.60 1956 65 m (60, 70]
2 c_100 8 254.85 1992 29 m (20, 30]
3 c_1000 125 2261.89 1966 55 f (50, 60]
4 c_1001 102 1812.86 1982 39 m (30, 40]
# Plot a scatter tranche x revenue
df = clients.groupby('tranche')[['revenue']].sum().reset_index().copy()
plt.scatter(df.tranche, df.revenue)
plt.show()
But it fails with an error ending in:
TypeError: float() argument must be a string or a number, not 'pandas._libs.interval.Interval'
How can I use intervals for plotting?
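You'll need to add labels. (I tried converting them to strings with .astype(str), but that did not seem to work in 3.9.)
If you do the following, it works fine. Note that pd.cut needs exactly one label per bin, i.e. len(bins) - 1 labels:
bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
labels = ['10-20', '20-30', '30-40', '40-50', '50-60', '60-70', '70-80', '80-90', '90-100']
clients['tranche'] = pd.cut(clients.age, bins, labels=labels)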
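A minimal runnable sketch of the labelled-bins approach, using only the five rows printed by clients.head() in the question, so the sums cover just those rows. It builds the labels from `bins` so their count always matches, passes `observed=True` so empty bins are dropped from the groupby (an illustrative choice), and plots plain string labels, which matplotlib can place on a categorical axis, unlike Interval objects.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# the five rows shown by clients.head() in the question (other columns omitted)
clients = pd.DataFrame({
    'client_id': ['c_1', 'c_10', 'c_100', 'c_1000', 'c_1001'],
    'revenue': [558.18, 1353.60, 254.85, 2261.89, 1812.86],
    'age': [66, 65, 29, 55, 39]})

bins = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# one label per bin: len(bins) - 1 labels, e.g. '10-20' for the interval (10, 20]
labels = [f'{lo}-{hi}' for lo, hi in zip(bins[:-1], bins[1:])]
clients['tranche'] = pd.cut(clients['age'], bins, labels=labels)

# observed=True keeps only the bins that actually contain clients
df = clients.groupby('tranche', observed=True)['revenue'].sum().reset_index()

fig, ax = plt.subplots()
# string labels are drawn as categories on the x-axis
ax.scatter(df['tranche'].astype(str), df['revenue'])
ax.set_xlabel('age bin')
ax.set_ylabel('revenue')
```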

Maximum average of n consecutive values in DataFrame

I want to find the maximum average of n consecutive values in a DataFrame.
import pandas as pd
list1 = [120, 130, 135, 140, 170, 131, 131, 151, 181, 191, 200, 210, 220, 170, 160, 151, 120, 140, 170, 173]
list2 = [80, 81, 82, 82, 82, 83, 84, 84, 85, 85, 85, 86, 87, 88, 89, 90, 90, 90, 91, 91]
df = pd.DataFrame(zip(list1, list2), columns=['value1', 'value2'])
df['interval'] = 0
interval_duration = 3  # set interval duration
number_of_intervals = 4  # set number of intervals
# the only way I found uses a for loop:
for x in range(1, number_of_intervals + 1):
    max_average_interval = sum(df['value1'][0 : interval_duration]) / interval_duration
    item_max = 0
    for item in range(len(df['value1']) - interval_duration + 1):
        # only consider windows whose rows are not yet assigned to an interval
        if sum(df['interval'].loc[item : item + interval_duration - 1]) == 0:
            if max_average_interval < sum(df['value1'][item : item + interval_duration]) / interval_duration:
                max_average_interval = sum(df['value1'][item : item + interval_duration]) / interval_duration
                item_max = item
    # assign through df.loc rather than df['interval'].loc[...] (chained assignment)
    df.loc[item_max : item_max + interval_duration - 1, 'interval'] = x
Result:
value1 value2 interval
0 120 80 0
1 130 81 0
2 135 82 0
3 140 82 0
4 170 82 0
5 131 83 0
6 131 84 0
7 151 84 2
8 181 85 2
9 191 85 2
10 200 85 1
11 210 86 1
12 220 87 1
13 170 88 4
14 160 89 4
15 151 90 4
16 120 90 0
17 140 90 3
18 170 91 3
19 173 91 3
where in the interval column:
1 - first maximum interval of consecutive values
2 - second maximum interval of consecutive values
and so on.
Question: is there a more efficient way to do this? It matters because I can have many thousands of values.
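One possible approach, sketched here rather than taken from an answer on this page: compute every window sum once with a cumulative sum, sort the window starts by sum (equivalent to sorting by average, since all windows have the same length), then greedily take windows whose rows are still free. Taking the best free window at each step is what the loop does, so for this data it should reproduce the Result table. Unlike the loop, it does not seed the maximum with the first window, which the loop can re-select even after that window has been labelled. The function name `label_top_windows` is hypothetical.

```python
import numpy as np
import pandas as pd

list1 = [120, 130, 135, 140, 170, 131, 131, 151, 181, 191, 200, 210, 220, 170, 160, 151, 120, 140, 170, 173]
list2 = [80, 81, 82, 82, 82, 83, 84, 84, 85, 85, 85, 86, 87, 88, 89, 90, 90, 90, 91, 91]
df = pd.DataFrame(zip(list1, list2), columns=['value1', 'value2'])


def label_top_windows(values, window, n_windows):
    """Hypothetical helper: label the n_windows best non-overlapping windows 1..n_windows (0 = unused)."""
    values = np.asarray(values)
    # windowed sums via a cumulative sum; sums[i] covers values[i : i + window]
    csum = np.concatenate(([0], np.cumsum(values)))
    sums = csum[window:] - csum[:-window]
    # highest sum first; a stable sort keeps the earliest start on ties,
    # matching the loop's strict '<' comparison
    order = np.argsort(-sums, kind='stable')
    labels = np.zeros(len(values), dtype=int)
    current = 0
    for start in order:
        if current == n_windows:
            break
        if not labels[start:start + window].any():  # every row of this window is still free
            current += 1
            labels[start:start + window] = current
    return labels


df['interval'] = label_top_windows(df['value1'], window=3, n_windows=4)
```

The sums are computed in one vectorized pass and each window is checked at most once after sorting, so the cost is roughly O(n log n + n * window) instead of one full pandas scan per interval.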

Group by continuous indexes in Pandas DataFrame

I'm working on code for sensor data analysis in Python.
I'm selecting rows from a DataFrame (of gyro data in this example) according to a condition.
import pandas as pd
gyro = pd.read_csv("gyroOutput.csv")
above = gyro[gyro['gyro_z'] > 0.30]
above
Out[162]:
gyro_x gyro_y gyro_z elapsed
27 0.026632 0.021305 0.305731 4.927
28 0.017044 0.011718 0.344080 5.115
29 0.008522 0.013848 0.380299 5.289
30 0.006392 0.026632 0.412257 5.470
31 0.007457 0.005326 0.448476 5.643
32 -0.004261 0.012783 0.465521 5.822
33 -0.001065 0.000000 0.452737 6.002
34 0.009587 0.006392 0.445281 6.181
35 0.010653 0.001065 0.412257 6.361
36 0.006392 0.003196 0.373908 6.543
37 -0.006392 0.007457 0.320645 6.722
108 -0.036219 0.052198 0.323840 19.470
109 -0.061785 -0.001065 0.389887 19.654
110 -0.049002 0.018109 0.453803 19.835
111 -0.038350 0.078830 0.513458 20.015
112 -0.034088 0.011718 0.555003 20.192
113 -0.005326 -0.001065 0.607201 20.374
114 0.009587 0.058590 0.629571 20.553
115 0.038350 -0.029827 0.598679 20.727
116 0.006392 0.013848 0.546481 20.907
117 0.007457 0.030893 0.478304 21.086
118 0.012783 -0.035154 0.446346 21.266
119 0.005326 -0.026632 0.367516 21.444
352 0.007457 0.028762 0.313188 63.284
353 0.006392 -0.011718 0.332363 63.463
354 0.008522 0.030893 0.378169 63.643
355 -0.015979 0.039415 0.409062 63.822
356 -0.009587 -0.022371 0.423975 64.002
357 -0.008522 0.023436 0.450607 64.181
358 -0.011718 0.047937 0.453803 64.361
The resulting DataFrame (above) holds groups of rows with consecutive index values, for example rows 27-37.
I want to get all of those groups, but I couldn't find a way to do it with DataFrame.groupby or any other function.
I could iterate over the rows and separate them myself, but maybe there's some simpler way using pandas functions.
IIUC:
In [294]: df.groupby(df.index.to_series().diff().ne(1).cumsum()).groups
Out[294]:
{1: Int64Index([27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37], dtype='int64'),
2: Int64Index([108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119], dtype='int64'),
3: Int64Index([352, 353, 354, 355, 356, 357, 358], dtype='int64')}
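The one-liner works because `diff()` of the index is exactly 1 inside a run and something else (including the leading NaN) where a new run starts; `ne(1)` turns that into a True at each run start, and `cumsum()` turns those into run numbers 1, 2, 3, and so on. A minimal sketch, using a placeholder frame that only reproduces the index of `above` (the `dummy` column is invented; only the index pattern matters):

```python
import pandas as pd

# placeholder with the same index as `above` in the question; the values are dummies
idx = list(range(27, 38)) + list(range(108, 120)) + list(range(352, 359))
above = pd.DataFrame({'dummy': range(len(idx))}, index=idx)

s = above.index.to_series()
# True where the index does not continue by exactly 1, i.e. a new run starts
starts = s.diff().ne(1)
# running count of run starts gives a run id: 1 for 27-37, 2 for 108-119, 3 for 352-358
run_id = starts.cumsum()

# one DataFrame per run of consecutive index values
runs = [group for _, group in above.groupby(run_id)]
for group in runs:
    print(group.index.min(), group.index.max(), len(group))
```

Recent pandas versions print these group indexes as `Index([...], dtype='int64')` rather than `Int64Index`, but the grouping is the same.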

Pivot Table to Dictionary

I have this pivot table:
[in]:unit_d
[out]:
units
store_nbr item_nbr
1 9 27396
28 4893
40 254
47 2409
51 925
89 157
93 1103
99 492
2 5 55104
11 655
44 117125
85 106
93 653
I want to have a dictionary with 'store_nbr' as the key and 'item_nbr' as the values.
So, something like {'1': [9, 28, 40, ..., 99], '2': [5, 11, 44, 85, 93], ...}
I'd use groupby here, after resetting the index to make it into columns:
>>> d = unit_d.reset_index()
>>> {k: v.tolist() for k, v in d.groupby("store_nbr")["item_nbr"]}
{1: [9, 28, 40, 47, 51, 89, 93, 99], 2: [5, 11, 44, 85, 93]}
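A self-contained version of the answer, rebuilding `unit_d` from the rows printed in the question. It also shows an alternative, not from the answer, that groups on the `store_nbr` index level directly and reads the `item_nbr` level from each group, so no `reset_index` is needed. Both should give the same dictionary.

```python
import pandas as pd

# rebuild the pivot table printed in the question (MultiIndex store_nbr / item_nbr)
index = pd.MultiIndex.from_tuples(
    [(1, 9), (1, 28), (1, 40), (1, 47), (1, 51), (1, 89), (1, 93), (1, 99),
     (2, 5), (2, 11), (2, 44), (2, 85), (2, 93)],
    names=['store_nbr', 'item_nbr'])
unit_d = pd.DataFrame({'units': [27396, 4893, 254, 2409, 925, 157, 1103, 492,
                                 55104, 655, 117125, 106, 653]}, index=index)

# approach from the answer: move the index levels into columns, then group
d = unit_d.reset_index()
via_columns = {k: v.tolist() for k, v in d.groupby('store_nbr')['item_nbr']}

# alternative: group on the index level and read the inner level of each group
via_index = {k: g.index.get_level_values('item_nbr').tolist()
             for k, g in unit_d.groupby(level='store_nbr')}
```

The keys are integers here, as in the answer's output; wrap them with `str(k)` if string keys like `'1'` are really needed.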
