I'm making a bar plot from three data series in seaborn, but each series is drawn on top of the previous one, even when that hides the bars drawn earlier. For example:
sns.barplot(x="Portfolio", y="Factor", data=d2,
label="Portfolio", color="g")
sns.barplot(x="Benchmark", y="Factor", data=d2,
label="Benchmark", color="b")
sns.barplot(x="Active Exposure", y="Factor", data=d2,
label="Active", color="r")
ax.legend(frameon=True)
ax.set(xlim=(-.1, .5), ylabel="", xlabel="Sector Decomposition")
sns.despine(left=True, bottom=True)
However, I want the green bar to stay visible even when the blue bar overlaid on it is longer. Any ideas?
Without seeing your data, I can only guess that your DataFrame is not in long form. The seaborn tutorial has a section on the DataFrame shapes that seaborn expects; see in particular the part on messy data.
Because I can't see your DataFrame, I have made some assumptions about its shape:
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({
"Factor": list("ABC"),
"Portfolio": np.random.random(3),
"Benchmark": np.random.random(3),
"Active Exposure": np.random.random(3),
})
# Active Exposure Benchmark Factor Portfolio
# 0 0.140177 0.112653 A 0.669687
# 1 0.823740 0.078819 B 0.072474
# 2 0.450814 0.702114 C 0.039068
We can melt this DataFrame to get the long-form data seaborn wants:
d2 = df.melt(id_vars="Factor", var_name="exposure")
# Factor exposure value
# 0 A Active Exposure 0.140177
# 1 B Active Exposure 0.823740
# 2 C Active Exposure 0.450814
# 3 A Benchmark 0.112653
# 4 B Benchmark 0.078819
# 5 C Benchmark 0.702114
# 6 A Portfolio 0.669687
# 7 B Portfolio 0.072474
# 8 C Portfolio 0.039068
Then, finally, we can draw the bar plot using seaborn's built-in aggregation:
ax = sns.barplot(x="value", y="Factor", hue="exposure", data=d2)
ax.set(ylabel="", xlabel="Sector Decomposition")
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
Which produces:
Here are the plot parameters I used to make this chart:
import matplotlib as mpl
# Plot configuration
# On matplotlib >= 3.6 this style is named "seaborn-v0_8-pastel"
mpl.style.use("seaborn-pastel")
mpl.rcParams.update(
{
"font.size": 14,
"figure.facecolor": "w",
"axes.facecolor": "w",
"axes.spines.right": False,
"axes.spines.top": False,
"axes.spines.bottom": False,
"xtick.top": False,
"xtick.bottom": False,
"ytick.right": False,
"ytick.left": False,
}
)
If you are fine without using seaborn you can use pandas plotting to create a stacked horizontal bar chart (barh):
import pandas as pd
import matplotlib as mpl
# Plot configuration
# On matplotlib >= 3.6 this style is named "seaborn-v0_8-pastel"
mpl.style.use("seaborn-pastel")
mpl.rcParams.update(
{
"font.size": 14,
"figure.facecolor": "w",
"axes.facecolor": "w",
"axes.spines.right": False,
"axes.spines.top": False,
"axes.spines.bottom": False,
"xtick.top": False,
"xtick.bottom": False,
"ytick.right": False,
"ytick.left": False,
}
)
df = pd.DataFrame({
"Factor": list("ABC"),
"Portfolio": [0.669687, 0.072474, 0.039068],
"Benchmark": [0.112653, 0.078819, 0.702114],
"Active Exposure": [0.140177, 0.823740, 0.450814],
}).set_index("Factor")
ax = df.plot.barh(stacked=True)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.set_ylabel("")
ax.set_xlabel("Sector Decomposition")
Notice that in the code above the index is set to Factor, which then becomes the y-axis.
If you don't set stacked=True, you get almost the same chart as the one seaborn produced:
ax = df.plot.barh(stacked=False)
ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))
ax.set_ylabel("")
ax.set_xlabel("Sector Decomposition")
I need to add error bars like the ones in the graph, but I don't know how to use errorbar. My attempt raises an error and doesn't work. Can you please help me?
#! /usr/bin/python3
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
x = np.arange(9)
country = [
"Finland",
"Denmark",
"Switzerland",
"Iceland",
"Netherland",
"Norway",
"Sweden",
"Luxembourg"
]
data = {
"Explained by : Log GDP per Capita": [1.446, 1.502, 1.566, 1.482, 1.501, 1.543, 1.478, 1.751],
"Explained by : Social Support": [1.106, 1.108, 1.079, 1.172, 1.079, 1.108, 1.062, 1.003],
"Explained by : Healthy life expectancy": [0.741, 0.763, 0.816, 0.772, 0.753, 0.782, 0.763, 0.760],
"Explained by : Freedom to make life choices": [0.691, 0.686, 0.653, 0.698, 0.647, 0.703, 0.685, 0.639],
"Explained by : Generosity": [0.124, 0.208, 0.204, 0.293, 0.302, 0.249, 0.244, 0.166],
"Explained by : Perceptions of corruption": [0.481, 0.485, 0.413, 0.170, 0.384, 0.427, 0.448, 0.353],
"Dystopia + residual": [3.253, 2.868, 2.839, 2.967, 2.798, 2.580, 2.683, 2.653]
}
error_value = [[7.904, 7.780], [7.687, 7.552], [7.643, 7.500], [7.670, 7.438], [7.518, 7.410], [7.462, 7.323], [7.433, 7.293], [7.396, 7.252]]
df = pd.DataFrame(data, index=country)
df.plot(width=0.1, kind='barh', stacked=True, figsize=(11, 8))
plt.subplots_adjust(bottom=0.2)
# plt.errorbar(country, error_value, yerr=error_value)
plt.axvline(x=2.43, label="Dystopia (happiness=2.43)")
plt.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
fancybox=True, shadow=True, ncol=3)
plt.xticks(x)
plt.show()
Error bars are specified as distances from the center point. You seem to be providing the values where each error bar ends, so you have to convert them into distances from the end of each stacked bar and pass a NumPy array of shape (2, N), where the first row contains the negative (lower) error values and the second row the positive (upper) ones:
...
df.plot(width=0.1, kind='barh', stacked=True, figsize=(11, 8))
# determine x-values of the stacked bars
country_sum = df.sum(axis=1).values
# calculate differences of error bar values to bar heights
# and transform array into necessary (2, N) form
err_vals = np.abs(np.asarray(error_value).T - country_sum[None])[::-1, :]
plt.errorbar(country_sum, np.arange(df.shape[0]), xerr=err_vals, capsize=4, color="k", ls="none")
plt.subplots_adjust(bottom=0.2)
...
Sample output:
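To make the endpoint-to-distance conversion above concrete, here is a minimal sketch with made-up numbers. The totals and endpoints are hypothetical and not taken from the data in the question; each pair is given as [upper, lower], the same layout as error_value:

```python
import numpy as np

# Hypothetical bar totals and [upper, lower] error-bar endpoints for three bars
totals = np.array([7.8, 7.6, 7.5])
endpoints = np.array([[8.0, 7.7], [7.9, 7.5], [7.6, 7.3]])

# Transpose to shape (2, N): row 0 = upper endpoints, row 1 = lower endpoints,
# then take the absolute distance of every endpoint from its bar total
dist = np.abs(endpoints.T - totals[None])

# Reverse the rows so that row 0 holds the lower (negative-side) distances
# and row 1 the upper (positive-side) distances, as errorbar expects
xerr = dist[::-1, :]

print(xerr.shape)
print(xerr)
```

The first row of xerr then holds how far each bar extends to the left of its total and the second row how far it extends to the right.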
I'm trying to programmatically create a combination of line plots and bar plots inside a series of subplots using a for loop in Python.
Here is a reproducible version of the code I have tried so far:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df_q = pd.DataFrame.from_dict ({'Year': [2020,2021,2022,2020,2021,2022,2020,2021,2022],
'Category': ['A','A','A','B','B','B','C','C','C'],
'Requested': [5,8,3,15,6,9,11,14,9],
'Allowed': [5,6,2,10,6,7,5,11,9]})
df_a = pd.DataFrame.from_dict ({'Year': [2020,2021,2022,2020,2021,2022,2020,2021,2022],
'Category': ['A','A','A','B','B','B','C','C','C'],
'Amount': [4,10,3,12,6,11,7,5,12]})
df_qt = df_q.melt(id_vars=['Year', 'Category'],
value_vars=['Requested', 'Allowed' ], ignore_index=True)
catgs = df_a['Category'].unique()
fig, ax = plt.subplots(nrows=len(catgs), figsize=(20,len(catgs)*6))
for i, cat in enumerate(catgs):
    filt_q = df_qt.loc[df_qt['Category'] == cat]
    filt_a = df_a.loc[df_a['Category'] == cat]
    sns.barplot(data=filt_q, x='Year', y='value', hue='variable', alpha=0.7, ax=ax[i])
    sns.lineplot(data=filt_a['Amount'], linewidth=1.5, markers=True,
                 marker="o", markersize=10, color='darkred', label='Amount', ax=ax[i])
    ax[i].set_title('Category: {}'.format(cat), size=25)
However, the line plot appears shifted relative to the bar plot from the second subplot onward. See the screenshot below.
Any idea what I'm doing wrong?
Try this:
filt_a= df_a.loc[df_a['Category'] == cat].reset_index()
The drift you see in categories B and C is caused by the x-coordinates of the bar chart:
# The x-coordinates are not 2020, 2021, 2022.
# They are 0, 1, 2. The *x-labels* are 2020, 2021, 2022
sns.barplot(data=filt_q, x='Year', y='value', ...)
# The x-coordinates are the index of the filt_a dataframe
# Calling reset_index resets it to 0, 1, 2, ...
sns.lineplot(data=filt_a['Amount'], ...)
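The index mismatch can be checked without plotting anything. The sketch below reuses df_a from the question and inspects category B only; passing drop=True is my own variation on the fix above and simply discards the old row labels instead of keeping them as an extra column:

```python
import pandas as pd

df_a = pd.DataFrame({
    'Year': [2020, 2021, 2022] * 3,
    'Category': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
    'Amount': [4, 10, 3, 12, 6, 11, 7, 5, 12],
})

# Filtering keeps the original row labels, so category B starts at label 3
filt_b = df_a.loc[df_a['Category'] == 'B']
print(list(filt_b.index))

# After resetting, the labels start at 0 again and line up with the bars
filt_b_reset = filt_b.reset_index(drop=True)
print(list(filt_b_reset.index))
```

Category A happens to work without the reset only because its rows already carry the labels 0, 1 and 2.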
I have a data frame that contains 4 columns of data. Each of these columns is a character variable containing 5 different values (i.e. column1 contains the values A, B, C, D or E; column2 contains the values EXCELLENT, VERY GOOD, GOOD, AVERAGE and POOR; columns 3 and 4 are similar).
I'm trying to get a separate bar chart for each of the columns by using the below for loop. Unfortunately, it only provides me with the bar chart for column 4. It does not provide the bar chart for the previous 3 columns. Not sure what I am doing wrong.
categorical_attribs = list(CharacterVarDF)
for i in categorical_attribs:
    CharacterVarDF[i].value_counts().plot(kind='bar')
Simply set up matplotlib subplots with the required number of rows and columns. Then, in the loop, draw each column's bar plot on its own ax:
import matplotlib.pyplot as plt
import numpy as np
...
fig, axes = plt.subplots(figsize=(8,6), ncols=1, nrows=CharacterVarDF.shape[1])
for col, ax in zip(CharacterVarDF.columns, np.ravel(axes)):
    CharacterVarDF[col].value_counts().plot(kind='bar', ax=ax, rot=0, title=col)
plt.tight_layout()
plt.show()
To demonstrate with random data:
import numpy as np
import pandas as pd
from matplotlib import rc
import matplotlib.pyplot as plt
np.random.seed(52021)
env_df = pd.DataFrame({
"planetary_boundaries": np.random.choice(
["ocean", "land", "biosphere", "atmosphere",
"climate", "soil", "ozone", "freshwater"], 50),
"species": np.random.choice(
["invertebrates", "vertebrates", "plants", "fungi & protists"], 50),
"tipping_points": np.random.choice(
["Arctic Sea Ice", "Greenland ice sheet", "West Antarctica ice sheet",
"Amazon Rainforest", "Boreal forest", "Indian Monsoon",
"Atlantic meridional overturning circulation",
"West African Monsoon", "Coral reef"], 50)
})
rc('font', **{'family' : 'Arial'})
fig, axes = plt.subplots(ncols=1, nrows=env_df.shape[1], figsize=(7,7))
for col, ax in zip(env_df.columns, np.ravel(axes)):
    env_df[col] = env_df[col].str.replace(" ", "\n")
    env_df[col].value_counts(sort=False).sort_index().plot(
        kind='bar', ax=ax, color='g', rot=0,
        title=col.replace("_", " ").title(),
    )
plt.tight_layout()
plt.show()
plt.clf()
plt.close()
import matplotlib.pyplot as plt
import numpy as np
# data
x=["IEEE", "Elsevier", "Others"]
y=[7, 6, 2]
import seaborn as sns
plt.legend()
plt.scatter(x, y, s=300, c="blue", alpha=0.4, linewidth=3)
plt.ylabel("No. of Papers")
plt.figure(figsize=(10, 4))
I want to make a graph like the one shown in the image. I am not sure how to provide data for both the journal and conference categories (currently, I include just one). Also, I am not sure how to use a different color for each category.
You can try this code snippet for your problem.
- I modified your data format; I suggest using pandas for data visualization.
- I added one more field to visualize the data more effectively.
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
# data
x=["IEEE", "Elsevier", "Others", "IEEE", "Elsevier", "Others"]
y=[7, 6, 2, 5, 4, 3]
z=["conference", "journal", "conference", "journal", "conference", "journal"]
# create pandas dataframe
data_list = pd.DataFrame(
{'x_axis': x,
'y_axis': y,
'category': z
})
# change size of data points
minsize = min(data_list['y_axis'])
maxsize = max(data_list['y_axis'])
# scatter plot
sns.catplot(x="x_axis", y="y_axis", kind="swarm", hue="category",sizes=(minsize*100, maxsize*100), data=data_list)
plt.grid()
How to create the graph with correct bubble sizes and with no overlap
Seaborn stripplot and swarmplot (or sns.catplot(kind=strip or kind=swarm)) provide the handy dodge argument which prevents the bubbles from overlapping. The only downside is that the size argument applies a single size to all bubbles and the sizes argument (as used in the other answer) is of no use here. They do not work like the s and size arguments of scatterplot. Therefore, the size of each bubble must be edited after generating the plot:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import seaborn as sns # v 0.11.0
# Create sample data
x = ['IEEE', 'Elsevier', 'Others', 'IEEE', 'Elsevier', 'Others']
y = np.array([7, 6, 3, 7, 1, 3])
z = ['conference', 'conference', 'conference', 'journal', 'journal', 'journal']
df = pd.DataFrame(dict(organisation=x, count=y, category=z))
# Create seaborn stripplot (swarmplot can be used the same way)
ax = sns.stripplot(data=df, x='organisation', y='count', hue='category', dodge=True)
# Adjust the size of the bubbles
for coll in ax.collections[:-2]:
    y = coll.get_offsets()[0][1]
    coll.set_sizes([100*y])
# Format figure size, spines and grid
ax.figure.set_size_inches(7, 5)
ax.grid(axis='y', color='black', alpha=0.2)
ax.grid(axis='x', which='minor', color='black', alpha=0.2)
ax.spines['bottom'].set(position='zero', color='black', alpha=0.2)
sns.despine(left=True)
# Format ticks
ax.tick_params(axis='both', length=0, pad=10, labelsize=12)
ax.tick_params(axis='x', which='minor', length=25, width=0.8, color=[0, 0, 0, 0.2])
minor_xticks = [tick+0.5 for tick in ax.get_xticks() if tick != ax.get_xticks()[-1]]
ax.set_xticks(minor_xticks, minor=True)
ax.set_yticks(range(0, df['count'].max()+2))
# Edit labels and legend
ax.set_xlabel('Organisation', labelpad=15, size=12)
ax.set_ylabel('No. of Papers', labelpad=15, size=12)
ax.legend(bbox_to_anchor=(1.0, 0.5), loc='center left', frameon=False);
Alternatively, you can use scatterplot with the convenient s argument (or size) and then edit the space between the bubbles to reproduce the effect of the missing dodge argument (note that the x_jitter argument seems to have no effect). Here is an example using the same data as before and without all the extra formatting:
# Create seaborn scatterplot with size argument
ax = sns.scatterplot(data=df, x='organisation', y='count',
hue='category', s=100*df['count'])
ax.figure.set_size_inches(7, 5)
ax.margins(0.2)
# Dodge bubbles
bubbles = ax.collections[0].get_offsets()
signs = np.repeat([-1, 1], df['organisation'].nunique())
for bubble, sign in zip(bubbles, signs):
    bubble[0] += sign*0.15
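Note that the signs array above depends on the row order: it assumes all conference rows come first and all journal rows second. As a hedged sketch (signs_by_category is a name I made up, and this still assumes the points in the collection follow the DataFrame's row order), the sign can instead be derived from the category column, which gives the same result for this data but does not break if the rows are shuffled:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    organisation=['IEEE', 'Elsevier', 'Others', 'IEEE', 'Elsevier', 'Others'],
    count=[7, 6, 3, 7, 1, 3],
    category=['conference', 'conference', 'conference',
              'journal', 'journal', 'journal'],
))

# Order-based signs, as in the snippet above: first half -1, second half +1
signs_by_order = np.repeat([-1, 1], df['organisation'].nunique())

# Hypothetical alternative: shift conference bubbles left and journal bubbles right
signs_by_category = np.where(df['category'] == 'conference', -1, 1)

print(signs_by_order)
print(signs_by_category)
```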
As a side note, I recommend that you consider other types of plots for this data. A grouped bar chart:
df.pivot(index='organisation', columns='category').plot.bar()
Or a balloon plot (aka categorical bubble plot):
sns.scatterplot(data=df, x='organisation', y='category', s=100*df['count']).margins(0.4)
Why? In the bubble graph, the counts are displayed using 2 visual attributes, i) the y-coordinate location and ii) the bubble size. Only one of them is really necessary.
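For the grouped bar chart suggested above, it can help to look at the table that pivot produces before plotting it. This is only a sketch using the same sample data; passing values='count' is my addition and keeps the column labels flat (just the category names) rather than a two-level header:

```python
import pandas as pd

df = pd.DataFrame(dict(
    organisation=['IEEE', 'Elsevier', 'Others', 'IEEE', 'Elsevier', 'Others'],
    count=[7, 6, 3, 7, 1, 3],
    category=['conference', 'conference', 'conference',
              'journal', 'journal', 'journal'],
))

# One row per organisation, one column per category, counts as cell values
wide = df.pivot(index='organisation', columns='category', values='count')
print(wide)
```

Calling wide.plot.bar() on this table should then draw one group of bars per organisation and one bar per category within each group.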
I would like to combine the following lmplots I have. More specifically, the red lines are averages for each season, and I want to place them on their respective lmplots with the other data, instead of having them separate. Here is my code (note, the axes limits aren't working because the second lmplot is messing it up. It works when I just plot the initial data):
ax = sns.lmplot(data=data, x='air_yards', y='cpoe',col='season', lowess = True, scatter_kws={'alpha':.6, 'color': '#4F2E84'}, line_kws={'alpha':.6, 'color': '#4F2E84'})
ax = sns.lmplot(data=avg, x='air_yards', y= 'cpoe',lowess=True, scatter=False, line_kws={'linestyle':'--', 'color': 'red'}, col = 'season')
axes.set_xlim([-5,30])
axes.set_ylim([-25,25])
ax.set(xlabel='air yards')
And here is the output. Simply put, I want to take those red lines and put them on their respective year plots above. Thanks!
Not sure if it is possible the way you want, so maybe something like:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# dummy example
data = pd.DataFrame({'air_yards': range(1,11),
'cpoe': range(1,11),
'season': [1,2,3,2,1,3,2,1,3,2]})
avg = pd.DataFrame({'air_yards': [1, 10]*3,
'cpoe': [2,2,5,5,8,8],
'season': [1,1,2,2,3,3]})
# need this info
n = data["season"].nunique()
# create the number of subplots
fig, axes = plt.subplots(ncols=n, sharex=True, sharey=True)
# now you need to loop through unique season
for ax, (season, dfg) in zip(axes.flat, data.groupby("season")):
    # set title
    ax.set_title(f'season={season}')
    # create the regplot for data (x and y passed as keywords,
    # which also works on seaborn versions that reject positional x, y)
    sns.regplot(x="air_yards", y="cpoe", data=dfg, ax=ax,
                lowess=True, scatter_kws={'alpha':.6, 'color': '#4F2E84'},
                line_kws={'alpha':.6, 'color': '#4F2E84'})
    # create regplot for avg
    sns.regplot(x="air_yards", y="cpoe", data=avg[avg['season'].eq(season)], ax=ax,
                lowess=True, scatter=False,
                line_kws={'linestyle':'--', 'color': 'red'})
plt.show()
you get
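As a side check on the loop above, this small sketch shows what data.groupby("season") hands to each subplot: one (season, sub-DataFrame) pair per unique season, in sorted key order. It reuses the dummy data from the answer; sizes is just an illustrative name:

```python
import pandas as pd

data = pd.DataFrame({'air_yards': range(1, 11),
                     'cpoe': range(1, 11),
                     'season': [1, 2, 3, 2, 1, 3, 2, 1, 3, 2]})

# Collect how many rows each season contributes to its subplot
sizes = {season: len(dfg) for season, dfg in data.groupby("season")}
print(sizes)
```

Because there is exactly one group per unique season, zipping the groups with the n axes pairs every season with its own subplot.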