I am studying several variables and want to plot the results with their standard error.
I filter the data because, depending on the analysis, I only want to plot certain minerals or a single material. I mention this because it matters for the error bars. I could not get seaborn to draw the error bars: I passed the raw data and set ci='' in the seaborn function, but it did not work. I therefore calculated the mean and standard error in Excel and plot those values directly. The table below holds the means and standard errors used in the script.
Adding ci to the seaborn call has no effect, so I want to add the error bars separately in a second step. I also tried ax.errorbar(), but could not plot the standard error that way either.
import os
import io
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FormatStrFormatter
Results = pd.read_excel('results.xlsx',sheet_name='Sheet1',usecols="A:J")
df=pd.DataFrame(Results)
RR_filtered=Results[(Results['Mineral ']=='IC60') | (Results['Mineral ']=='MinFree')]
# build the mask from RR_filtered itself so the boolean index matches the frame being filtered
R_filtered=RR_filtered[RR_filtered['Material']=='A']
palette = ["#fdae61","#abd9e9"]
sns.set_palette(palette)
ax1=sns.relplot( data=R_filtered,x="Impeller speed (rpm)", y="Result",col="Media size ",hue="Mineral content (g/g fibre)",
palette=palette,size="Media size ",sizes=(50, 200))
R2_filtered=RR_filtered[RR_filtered['Material']=='B']
ax2=sns.relplot( data=R2_filtered,x="Impeller speed (rpm)", y="Result",col="Media size ",hue="Mineral content (g/g fibre)",
palette=palette,size="Media size ",sizes=(50, 200))
plt.show()
Data (originally posted as an image)
Media size Material Impeller speed (rpm) Energy input (kWh/t) Mineral Mineral content (g/g fibre) Result ster
1.7 A 400 3000 IC60 4 3.42980002276166 0.21806853183829
1.7 A 650 3000 IC60 4 5.6349292302978 0.63877270588513
1.7 A 900 3000 IC60 4 6.1386616444364 0.150420705145224
1.7 A 1150 3000 IC60 4 5.02677117937851 1.05459146256349
1.7 A 1400 3000 IC60 4 3.0654271029038 0.917937247698497
3 A 400 3000 IC60 4 8.06973541574516 2.07869756201064
3 A 650 3000 IC60 4 4.69110601906018 1.21725878149246
3 A 900 3000 IC60 4 10.2119514553564 1.80680816945106
3 A 1150 3000 IC60 4 7.3271067522139 0.438931805677489
3 A 1400 3000 IC60 4 4.86901883487513 2.04826541508181
1.7 A 400 3000 MinFree 0 1.30614274245145 0.341512517371074
1.7 A 650 3000 MinFree 0 0.80632268273782 0.311762840996982
1.7 A 900 3000 MinFree 0 1.35958635068886 0.360649049944933
1.7 A 1150 3000 MinFree 0 1.38784671261469 0.00524838126778526
1.7 A 1400 3000 MinFree 0 1.12365621425779 0.561737044169193
3 A 400 3000 MinFree 0 4.61104587078813 0.147526557483362
3 A 650 3000 MinFree 0 4.40934493149759 0.985706944001226
3 A 900 3000 MinFree 0 5.06333415444978 0.00165055503033251
3 A 1150 3000 MinFree 0 3.85940865344646 0.731238210429852
3 A 1400 3000 MinFree 0 3.75572328102963 0.275897272330075
3 A 400 3000 GIC 4 6.05239906571977 0.0646300937591957
3 A 650 3000 GIC 4 7.9023202316634 0.458062146361444
3 A 900 3000 GIC 4 6.97774277141699 0.171777036954104
3 A 1150 3000 GIC 4 11.0705742735252 1.3960974547215
3 A 1400 3000 GIC 4 9.37948091546579 0.0650589433632627
1.7 A 869 3000 IC60 4 2.39416757908564 0.394947207603093
3 A 859 3000 IC60 4 10.2373958352881 1.55162686552938
1.7 A 885 3000 BHX 4 87.7569689333017 10.2502550323564
3 A 918 3000 BHX 4 104.135074642339 4.77467275433362
1.7 B 400 3000 MinFree 0 1.87573877068556 0.34648345153664
1.7 B 650 3000 MinFree 0 1.99555403904079 0.482200923313764
1.7 B 900 3000 MinFree 0 2.54989484285768 0.398071770532481
1.7 B 1150 3000 MinFree 0 3.67636872311402 0.662270521850053
1.7 B 1400 3000 MinFree 0 3.5664978541551 0.164453275639932
3 B 400 3000 MinFree 0 2.62948341485392 0.0209463845730038
3 B 650 3000 MinFree 0 3.0066638279753 0.305024483713006
3 B 900 3000 MinFree 0 2.79255446831386 0.472851866083359
3 B 1150 3000 MinFree 0 5.64970870330824 0.251859240942665
3 B 1400 3000 MinFree 0 7.40595580787647 0.629256778750272
1.7 B 400 3000 IC60 4 0.38040036521839 0.231869270120922
1.7 B 650 3000 IC60 4 0.515922221163329 0.434661621954815
1.7 B 900 3000 IC60 4 3.06358032815653 0.959408177590503
1.7 B 1150 3000 IC60 4 4.04800689693192 0.255594912271896
1.7 B 1400 3000 IC60 4 3.69967975589305 0.469944383688801
3 B 400 3000 IC60 4 1.35706340378197 0.134829945730943
3 B 650 3000 IC60 4 1.91317966458018 1.77106692180411
3 B 900 3000 IC60 4 0.874227487043329 0.493348110823194
3 B 1150 3000 IC60 4 2.71732337235447 0.0703901684702626
3 B 1400 3000 IC60 4 4.96743231003956 0.45853815499614
3 B 400 3000 GIC 4 0.325743752029247 0.325743752029247
3 B 650 3000 GIC 4 3.12776074994155 0.452049425276085
3 B 900 3000 GIC 4 3.25564762321322 0.319567445434468
3 B 1150 3000 GIC 4 5.99730462724499 1.03439035936441
3 B 1400 3000 GIC 4 7.51312624370307 0.38399627585515
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
Sample DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'Media size': [1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.7, 3.0, 1.7, 3.0, 1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 1.7, 1.7, 1.7, 1.7, 1.7, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
'Material': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
'Impeller speed (rpm)': [400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 869, 859, 885, 918, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400, 400, 650, 900, 1150, 1400],
'Energy input (kWh/t)': [3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000, 3000],
'Mineral': ['IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'GIC', 'GIC', 'GIC', 'GIC', 'GIC', 'IC60', 'IC60', 'BHX', 'BHX', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'MinFree', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'IC60', 'GIC', 'GIC', 'GIC', 'GIC', 'GIC'],
'Mineral content (g/g fibre)': [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4], 'Result': [3.42980002276166, 5.6349292302978, 6.1386616444364, 5.02677117937851, 3.0654271029038, 8.06973541574516, 4.69110601906018, 10.2119514553564, 7.3271067522139, 4.86901883487513, 1.30614274245145, 0.80632268273782, 1.35958635068886, 1.38784671261469, 1.12365621425779, 4.61104587078813, 4.40934493149759, 5.06333415444978, 3.85940865344646, 3.75572328102963, 6.05239906571977, 7.9023202316634, 6.97774277141699, 11.0705742735252, 9.37948091546579, 2.39416757908564, 10.2373958352881, 87.7569689333017, 104.135074642339, 1.87573877068556, 1.99555403904079, 2.54989484285768, 3.67636872311402, 3.5664978541551, 2.62948341485392, 3.0066638279753, 2.79255446831386, 5.64970870330824, 7.40595580787647, 0.38040036521839, 0.515922221163329, 3.06358032815653, 4.04800689693192, 3.69967975589305, 1.35706340378197, 1.91317966458018, 0.874227487043329, 2.71732337235447, 4.96743231003956, 0.325743752029247, 3.12776074994155, 3.25564762321322, 5.99730462724499, 7.51312624370307],
'ster': [0.21806853183829, 0.63877270588513, 0.150420705145224, 1.05459146256349, 0.917937247698497, 2.07869756201064, 1.21725878149246, 1.80680816945106, 0.438931805677489, 2.04826541508181, 0.341512517371074, 0.311762840996982, 0.360649049944933, 0.0052483812677852, 0.561737044169193, 0.147526557483362, 0.985706944001226, 0.0016505550303325, 0.731238210429852, 0.275897272330075, 0.0646300937591957, 0.458062146361444, 0.171777036954104, 1.3960974547215, 0.0650589433632627, 0.394947207603093, 1.55162686552938, 10.2502550323564, 4.77467275433362, 0.34648345153664, 0.482200923313764, 0.398071770532481, 0.662270521850053, 0.164453275639932, 0.0209463845730038, 0.305024483713006, 0.472851866083359, 0.251859240942665, 0.629256778750272, 0.231869270120922, 0.434661621954815, 0.959408177590503, 0.255594912271896, 0.469944383688801, 0.134829945730943, 1.77106692180411, 0.493348110823194, 0.0703901684702626, 0.45853815499614, 0.325743752029247, 0.452049425276085, 0.319567445434468, 1.03439035936441, 0.38399627585515]}
df = pd.DataFrame(data)
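The sample above holds means and standard errors that were already computed in Excel. If the raw replicates are available, the same two columns could be produced in pandas instead. The following is only a sketch: the `raw` frame and its values are made up for illustration, and it assumes one row per replicate measurement.

```python
import pandas as pd

# Hypothetical raw replicates: three measurements per condition (values are invented)
raw = pd.DataFrame({
    "Material": ["A"] * 6,
    "Impeller speed (rpm)": [400, 400, 400, 650, 650, 650],
    "Result": [3.0, 3.5, 4.0, 5.0, 6.0, 7.0],
})

# mean and standard error of the mean (sample std / sqrt(n)) for each condition
summary = (
    raw.groupby(["Material", "Impeller speed (rpm)"])["Result"]
       .agg(["mean", "sem"])
       .rename(columns={"mean": "Result", "sem": "ster"})
       .reset_index()
)
print(summary)
```

The resulting `summary` has the same `Result` and `ster` columns as the sample DataFrame, so the plotting code does not need to change.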
Map plt.errorbar onto sns.relplot
# filter the dataframe by Mineral
filtered = df[(df['Mineral']=='IC60') | (df['Mineral']=='MinFree')]
# plot the filtered dataframe
g = sns.relplot(data=filtered, x="Impeller speed (rpm)", y="Result", col="Media size", row='Material', hue="Mineral content (g/g fibre)", size="Media size", sizes=(50, 200))
# add the errorbars
g.map(plt.errorbar, "Impeller speed (rpm)", "Result", "ster", marker="none", color='r', ls='none')
Specify color for each group of errorbars
plt.errorbar only accepts one value for color. To match the error bar colors to a palette, select the data for each facet and hue group, and pass the color for that group to the color parameter.
Error bars that are smaller than the marker circle can't be seen.
# create a palette dictionary for the unique values in the hue column
palette = dict(zip(filtered['Mineral content (g/g fibre)'].unique(), ["#fdae61", "#abd9e9"]))
# plot the filtered dataframe
g = sns.relplot(data=filtered, x="Impeller speed (rpm)", y="Result", col="Media size", row='Material', hue="Mineral content (g/g fibre)", size="Media size", sizes=(50, 200), palette=palette)
# iterate through each facet of the facetgrid
for (material, media), ax in g.axes_dict.items():
    # select the data for the facet
    data = filtered[filtered['Material'].eq(material) & filtered['Media size'].eq(media)]
    # select the data for each hue group
    for group, selected in data.groupby('Mineral content (g/g fibre)'):
        # plot the errorbar with the correct color for each group
        ax.errorbar(data=selected, x="Impeller speed (rpm)", y="Result", yerr="ster", marker="none", color=palette[group], ls='none')
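Because small error bars disappear behind the markers, one possible workaround is to add caps and draw the bars on top of the points. This is a minimal matplotlib-only sketch using three rows from the sample data (Material A, Media size 1.7, MinFree). The `capsize`, `ecolor` and `zorder` values are arbitrary choices, not something the original plot requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# three rows from the sample data (Material A, Media size 1.7, MinFree)
x = [900, 1150, 1400]
y = [1.35958635068886, 1.38784671261469, 1.12365621425779]
err = [0.360649049944933, 0.0052483812677852, 0.561737044169193]

fig, ax = plt.subplots()
ax.scatter(x, y, s=50, color="#abd9e9", zorder=2)
# capsize adds horizontal caps wider than the marker; zorder=3 draws the bars above the markers
container = ax.errorbar(x, y, yerr=err, ls="none", marker="none",
                        ecolor="black", capsize=8, zorder=3)
```

The same keyword arguments could be passed to the `ax.errorbar` call inside the facet loop, although whether the caps look acceptable depends on the marker sizes in use.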
I have a list of persons with the respective earnings by company like this
Company_code Person Date Earning1 Earning2
1 Jonh 2014-01 100 200
2 Jonh 2014-01 300 400
1 Jonh 2014-02 500 600
1 Peter 2014-01 300 400
1 Peter 2014-02 500 600
And I would like to summarize into this:
Company_code Person 2014-01_E1 2014-01_E2 2014-02_E1 2014-02_E2
1 Jonh 100 200 500 600
2 Jonh 300 400
1 Peter 300 400 500 600
I had the same problem doing this with SQL which I solved with the code:
with t(Company_code, Person, Dt, Earning1, Earning2) as (
select 1, 'Jonh', to_date('2014-01-01', 'YYYY-MM-DD'), 100, 200 from dual union all
select 2, 'Jonh', to_date('2014-01-01', 'YYYY-MM-DD'), 300, 400 from dual union all
select 1, 'Jonh', to_date('2014-02-01', 'YYYY-MM-DD'), 500, 600 from dual union all
select 1, 'Peter', to_date('2014-01-01', 'YYYY-MM-DD'), 300, 400 from dual union all
select 1, 'Peter', to_date('2014-02-01', 'YYYY-MM-DD'), 500, 600 from dual
)
select *
from t
pivot (
sum(Earning1) e1
, sum(Earning2) e2
for dt in (
to_date('2014-01-01', 'YYYY-MM-DD') "2014-01"
, to_date('2014-02-01', 'YYYY-MM-DD') "2014-02"
)
)
COMPANY_CODE PERSON 2014-01_E1 2014-01_E2 2014-02_E1 2014-02_E2
----------------------------------------------------------------------
2 Jonh 300 400 - -
1 Peter 300 400 500 600
1 Jonh 100 200 500 600
How can this be achieved in Python? I'm trying with Pandas pivot_table:
pd.pivot_table(df, columns=['COMPANY_CODE', 'PERSON', 'DATE'], aggfunc=np.sum)
but this just transposes the table ... any clues?
Using user1827356's suggestion:
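# current pandas uses index= and columns=; the older rows= and cols= keywords have been removed
df2 = pd.pivot_table(df, index=['Company_code', 'Person'], columns=['Date'], aggfunc='sum')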
print(df2)
# Earning1 Earning2
# Date 2014-01 2014-02 2014-01 2014-02
# Company_code Person
# 1 Jonh 100 500 200 600
# Peter 300 500 400 600
# 2 Jonh 300 NaN 400 NaN
You can flatten the hierarchical columns like this:
columns = ['{}_E{}'.format(date, earning.replace('Earning', ''))
for earning, date in df2.columns.tolist()]
df2.columns = columns
print(df2)
# 2014-01_E1 2014-02_E1 2014-01_E2 2014-02_E2
# Company_code Person
# 1 Jonh 100 500 200 600
# Peter 300 500 400 600
# 2 Jonh 300 NaN 400 NaN
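For reference, the pivot and the flattening step can be combined into one self-contained sketch. It assumes the column names from the question and adds one step that is not in the answer above: swapping and sorting the column levels so the flattened columns come out grouped by date, as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "Company_code": [1, 2, 1, 1, 1],
    "Person": ["Jonh", "Jonh", "Jonh", "Peter", "Peter"],
    "Date": ["2014-01", "2014-01", "2014-02", "2014-01", "2014-02"],
    "Earning1": [100, 300, 500, 300, 500],
    "Earning2": [200, 400, 600, 400, 600],
})

wide = pd.pivot_table(df, index=["Company_code", "Person"], columns="Date",
                      values=["Earning1", "Earning2"], aggfunc="sum")
# put the date level first and sort, so the columns are grouped by date
wide = wide.swaplevel(axis=1).sort_index(axis=1)
# flatten ('2014-01', 'Earning1') into '2014-01_E1'
wide.columns = [f"{date}_E{earning.replace('Earning', '')}"
                for date, earning in wide.columns]
wide = wide.reset_index()
print(wide)
```

Combinations with no data, such as company 2 in 2014-02, remain NaN.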
Here's the nicest way to do it, using unstack.
df = pd.DataFrame({
'company_code': [1, 2, 1, 1, 1],
'person': ['Jonh', 'Jonh', 'Jonh', 'Peter', 'Peter'],
'earning2': [200, 400, 600, 400, 600],
'earning1': [100, 300, 500, 300, 500],
'date': ['2014-01', '2014-01', '2014-02', '2014-01', '2014-02']
})
df = df.set_index(['date', 'company_code', 'person'])
df.unstack('date')
Resulting in:
earning1 earning2
date 2014-01 2014-02 2014-01 2014-02
company_code person
1 Jonh 100.0 500.0 200.0 600.0
1 Peter 300.0 500.0 400.0 600.0
2 Jonh 300.0 NaN 400.0 NaN
Setting the index to ['date', 'company_code', 'person'] is a good idea anyway, since that's really what your DataFrame contains: two different earnings categories (1 and 2) each described by a date, a company code and a person.
It's good practice to always work out what the 'real' data in your DataFrame is, and which columns are meta-data, and index accordingly.
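As a rough illustration of why indexing by the metadata columns helps, the sketch below reuses the DataFrame from this answer and shows a few lookups that become one-liners once the index is set. The variable names `jan_jonh`, `peter` and `totals` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "company_code": [1, 2, 1, 1, 1],
    "person": ["Jonh", "Jonh", "Jonh", "Peter", "Peter"],
    "earning2": [200, 400, 600, 400, 600],
    "earning1": [100, 300, 500, 300, 500],
    "date": ["2014-01", "2014-01", "2014-02", "2014-01", "2014-02"],
}).set_index(["date", "company_code", "person"]).sort_index()

# one observation, addressed by its full key
jan_jonh = df.loc[("2014-01", 1, "Jonh")]
# every row for one person, across dates and companies
peter = df.xs("Peter", level="person")
# totals per person: the index levels drive the grouping
totals = df.groupby(level="person").sum()
print(totals)
```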