Prettifying Matplotlib Line Graph - python

What I'm trying to achieve is a line graph of genres and their average score throughout history. X-axis = years, y-axis = score.
genre_list is an array of the types of genres.
for genre in genre_list:
    random_color = [np.random.random_sample(), np.random.random_sample(), np.random.random_sample()]
    plt.plot('release_year', 'vote_average',
             data=genre_df, marker='',
             markerfacecolor=random_color,
             markersize=1,
             color=random_color,
             linewidth=1,
             label=genre)
plt.legend()
plt.figure(figsize=(5,5))
Though what I end up with is quite ugly.
Question 1) I've tried setting the figure size, but it seems to stay the same proportion. How do I configure this?
Question 2) How do I set the line color to match the legend?
Question 3) How do I configure the x and y axis so that they are more precise? (potentially the same question as #1)
I appreciate any sort of input, thank you.

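Consider using groupby to split the dataframe by genre, then loop through the subsets and plot one line per genre. Set the figure size when the figure is created: calling plt.figure(figsize=...) after plotting opens a new, empty figure instead of resizing the existing one. As @ImportanceOfBeingErnest notes in the comments, you can also space the x-axis ticks at yearly intervals with a MultipleLocator (rotating the tick labels as needed):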
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
...
fig, ax = plt.subplots(figsize=(12,5))
# group by a plain string key so that each genre label is a string, not a 1-tuple
for genre, sub_df in genre_df.groupby('genres'):
    random_color = [np.random.random_sample() for _ in range(3)]
    ax.plot('release_year', 'vote_average',
            data=sub_df, marker='',
            markerfacecolor=random_color,
            markersize=1,
            color=random_color,
            linewidth=1,
            label=genre)
# one major tick per year on the x-axis
loc = plticker.MultipleLocator(base=1.0)
ax.xaxis.set_major_locator(loc)
plt.xticks(rotation=45)
plt.legend()
plt.show()
plt.clf()
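To tie the three questions together, here is a minimal, self-contained sketch on invented data (the genre names, years and scores are made up; only the column names follow the question). It assumes the scores should be averaged per genre and year before plotting, and it lets Matplotlib pick the line colors instead of drawing random ones.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd

# Invented sample data; only the column names come from the question.
rng = np.random.default_rng(0)
genre_df = pd.DataFrame({
    "genres": np.repeat(["Action", "Drama", "Comedy"], 40),
    "release_year": np.tile(np.repeat(np.arange(2000, 2010), 4), 3),
    "vote_average": rng.uniform(4, 9, 120),
})

# Q1: set the size when the figure is created, before anything is drawn.
fig, ax = plt.subplots(figsize=(12, 5))

# Average the scores so that each genre has one point per year.
means = (genre_df.groupby(["genres", "release_year"])["vote_average"]
         .mean().reset_index())

# Q2: one plot call per genre, so each legend entry takes the color of its own line.
for genre, sub_df in means.groupby("genres"):
    ax.plot(sub_df["release_year"], sub_df["vote_average"], linewidth=1, label=genre)

# Q3: one major tick per year on the x-axis, with rotated labels.
ax.xaxis.set_major_locator(mticker.MultipleLocator(base=1.0))
ax.tick_params(axis="x", labelrotation=45)
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to display or store the result.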

Related

How to set xlim in seaborn barplot?

I have created a barplot for given days of the year and the number of people born on this given day (figure a). I want to set the x-axes in my seaborn barplot to xlim = (0,365) to show the whole year.
But, once I use ax.set_xlim(0,365) the bar plot is simply moved to the left (figure b).
This is the code:
#data
df = pd.DataFrame()
df['day'] = np.arange(41,200)
df['born'] = np.random.randn(159)*100
#plot
f, axes = plt.subplots(4, 4, figsize = (12,12))
ax = sns.barplot(df.day, df.born, data = df, hue = df.time, ax = axes[0,0], color = 'skyblue')
ax.get_xaxis().set_label_text('')
ax.set_xticklabels('')
ax.set_yscale('log')
ax.set_ylim(0,10e3)
ax.set_xlim(0,366)
ax.set_title('SE Africa')
How can I set the x-axes limits to day 0 and 365 without the bars being shifted to the left?
IIUC, the expected output given the nature of data is difficult to obtain straightforwardly, because, as per the documentation of seaborn.barplot:
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
This means the function seaborn.barplot creates categories based on the data in x (here, df.day) and they are linked to integers, starting from 0.
Therefore, even though our data starts at day 41, seaborn places the first category at x = 0, which makes it difficult to adjust the lower limit of the x-axis after the function call.
The following code and the corresponding plot illustrate this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# data
rng = np.random.default_rng(101)
day = np.arange(41,200)
born = rng.integers(low=0, high=10e4, size=200-41)
df = pd.DataFrame({"day":day, "born":born})
# plot
f, ax = plt.subplots(figsize=(4, 4))
sns.barplot(data=df, x='day', y='born', ax=ax, color='b')
ax.set_xlim(0,365)
ax.set_xticks(ticks=np.arange(0, 365, 30), labels=np.arange(0, 365, 30))
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
I suggest using matplotlib.axes.Axes.bar to overcome this issue, although coloring the bars by group is less straightforward than with sns.barplot(..., hue=..., ...):
# plot
f, ax = plt.subplots(figsize=(4, 4))
ax.bar(x=df.day, height=df.born) # instead of sns.barplot
ax.get_xaxis().set_label_text('')
ax.set_xlim(0,365)
ax.set_yscale('log')
ax.set_title('SE Africa')
plt.tight_layout()
plt.show()
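The difference between the two approaches can also be checked numerically. The sketch below uses a small invented data set (ten days starting at day 41) and a hypothetical helper, leftmost_bar_center, to compare where each function puts its first bar. It assumes both functions draw their bars as rectangle patches on the axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Small invented data set: ten days starting at day 41.
df = pd.DataFrame({"day": np.arange(41, 51), "born": np.arange(10, 110, 10)})

fig, (ax_sns, ax_mpl) = plt.subplots(ncols=2, figsize=(8, 3))
sns.barplot(data=df, x="day", y="born", color="b", ax=ax_sns)  # categorical x-axis
ax_mpl.bar(x=df["day"], height=df["born"])                     # numeric x-axis


def leftmost_bar_center(ax):
    """Return the x coordinate of the center of the leftmost bar on ax (hypothetical helper)."""
    return min(p.get_x() + p.get_width() / 2 for p in ax.patches)


sns_first = leftmost_bar_center(ax_sns)  # ordinal position of the first category
mpl_first = leftmost_bar_center(ax_mpl)  # the actual day value
print(sns_first, mpl_first)
```

The seaborn bar for day 41 sits at ordinal position 0, while the Matplotlib bar sits at x = 41, which is why set_xlim(0, 365) only behaves as intended in the second case.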

Seaborn countplot with group order

I am trying to draw a count plot using seaborn and matplotlib. Within each year, I want to sort the bars by the count of each "Drought types" value so that the plot looks better. Currently the bars are unsorted within each year and look very messy.
Thank you!
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
count=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
plt.figure(figsize=(15,8))
sns.countplot(x= 'Year', hue = 'Drought types', data = count, palette = 'YlOrRd')
plt.legend(loc = "best",frameon=True,bbox_to_anchor=(0.9,0.75))
plt.show()
The following approach draws the years one-by-one. order= is used to fix the order of the years. hue_order is recalculated for each individual year (.reindex() is needed to make sure all drought_types are present).
A dictionary palette is used to make sure each hue value gets the same color, independent of the order. The automatic legend repeats all hue values for each year, so the legend needs to be reduced.
By the way, loc='best' shouldn't be used together with bbox_to_anchor in the legend, as it might cause very unexpected changes with small changes in the data. loc='best' will be changed to one of the 9 possible locations depending on the available space.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
count = pd.read_csv("https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/count.csv")
fig, ax = plt.subplots(figsize=(15, 8))
drought_types = count['Drought types'].unique()
palette = {drought_type: color
           for drought_type, color in zip(drought_types, sns.color_palette('YlOrRd', len(drought_types)))}
all_years = range(count['Year'].min(), count['Year'].max() + 1)
sns.set_style('darkgrid')
for year in all_years:
    year_data = count[count['Year'] == year]
    if len(year_data) > 0:
        # reindex is needed to make sure all drought_types are present
        hue_order = year_data.groupby('Drought types').size().reindex(drought_types).sort_values(ascending=True).index
        sns.countplot(x='Year', order=all_years,
                      hue='Drought types', hue_order=hue_order,
                      data=year_data, palette=palette, ax=ax)
# handles, _ = ax.get_legend_handles_labels()
# handles = handles[:len(drought_types)]
handles = [plt.Rectangle((0, 0), 0, 0, color=palette[drought_type], label=drought_type)
           for drought_type in drought_types]
ax.legend(handles=handles, loc="upper right", frameon=True, bbox_to_anchor=(0.995, 0.99))
plt.show()
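The per-year hue_order computation above can be inspected on its own. This is a minimal sketch on invented data (three made-up drought types "A", "B" and "C"; only the column names come from the question), showing what reindex and sort_values do for a year in which one type is missing.

```python
import pandas as pd

# Invented sample; only the column names come from the question.
count = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001],
    "Drought types": ["A", "A", "A", "B", "C"],
})
drought_types = count["Drought types"].unique()  # all types, in order of appearance

year_data = count[count["Year"] == 2000]
sizes = year_data.groupby("Drought types").size()  # counts for the types present in 2000

# reindex adds the missing type "C" as NaN; sort_values places NaN last by default
hue_order = sizes.reindex(drought_types).sort_values(ascending=True).index
print(list(hue_order))  # ['B', 'A', 'C']
```

Because every year gets the full list of types, the bars keep a consistent width, and the dictionary palette keeps each type's color fixed regardless of its position.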

Pointplot and Scatterplot in one figure but X axis is shifting

Hi I'm trying to plot a pointplot and scatterplot on one graph with the same dataset so I can see the individual points that make up the pointplot.
Here is the code I am using:
xlPath = r'path to data here'
df = pd.concat(pd.read_excel(xlPath, sheet_name=None),ignore_index=True)
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright', capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer')
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)')
plt.show()
When I plot, for some reason the points from the scatterplot are offsetting one ID spot right on the x-axis. When I plot the scatter or the point plot separately, they each are in the correct ID spot. Why would plotting them on the same plot cause the scatterplot to offset one right?
Edit: Tried to make the ID column categorical, but that didn't work either.
Seaborn's pointplot creates a categorical x-axis while here the scatterplot uses a numerical x-axis.
Explicitly making the x-values categorical: df['ID'] = pd.Categorical(df['ID']), isn't sufficient, as the scatterplot still sees numbers. Changing the values to strings does the trick. To get them in the correct order, sorting might be necessary.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# first create some test data
df = pd.DataFrame({'ID': np.random.choice(np.arange(1, 49), 500),
                   'HM (N/mm2)': np.random.uniform(1, 10, 500)})
df['Layer'] = ((df['ID'] - 1) // 6) % 4 + 1
df['HM (N/mm2)'] += df['Layer'] * 8
df['Layer'] = df['Layer'].map(lambda s: f'Layer {s}')
# sort the values and convert the 'ID's to strings
df = df.sort_values('ID')
df['ID'] = df['ID'].astype(str)
fig, ax = plt.subplots(figsize=(12, 4))
sns.pointplot(data=df, x='ID', y='HM (N/mm2)', palette='bright',
              capsize=0.15, alpha=0.5, ci=95, join=True, hue='Layer', ax=ax)
sns.scatterplot(data=df, x='ID', y='HM (N/mm2)', color='purple', ax=ax)
ax.margins(x=0.02)
plt.tight_layout()
plt.show()
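The one-position shift described in the question follows from how a categorical axis assigns positions. The sketch below uses plain Matplotlib and three invented IDs to show this: string values are placed at positions 0, 1, 2, while numeric values are drawn at their actual values. It assumes the categorical axis created by pointplot behaves like Matplotlib's own string-category axis in this respect.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt

ids = [1, 2, 3]                 # invented IDs
values = [10.0, 12.0, 11.0]     # invented measurements
labels = [str(i) for i in ids]

fig, (ax_cat, ax_num) = plt.subplots(ncols=2, figsize=(8, 3))

# String x-values: Matplotlib treats them as categories at positions 0, 1, 2.
ax_cat.plot(labels, values, "o")
cat_positions = ax_cat.xaxis.convert_units(labels)

# Numeric x-values: drawn at their actual values 1, 2, 3.
(num_line,) = ax_num.plot(ids, values, "o")
num_positions = num_line.get_xdata()

print(list(cat_positions), list(num_positions))
```

ID 1 therefore lands at position 0 on the categorical axis but at x = 1 on a numeric one, which is the tick labelled "2" when both are drawn on the same axes.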

(python matplotlib ) How to change color of each lollipop in a lollipop plot (ax.stem)

I am using ax.stem to draw a lollipop plot in Python. However, I found it difficult to assign a different color to each lollipop, as shown here:
As you can see I have 2 categories "GWP" & "FDP".
In my project, each category should be divided into 4 subcategories "ingredient", "Waste", "energy" and "infrastructure". Therefore, I want to assign them different colors to indicate the subcategory.
There is a solution proposed here: https://python-graph-gallery.com/181-custom-lollipop-plot/
But this only teaches you how to change color for all lollipops.
And there is another solution: https://python-graph-gallery.com/183-highlight-a-group-in-lollipop/
But this one doesn't really use ax.stem.
Please let me know how to assign different colors to each lollipop.
(Also, for some reason the plot shown here is displayed upside down, the y-axis is not in order, and one dot is not connected by a line. It displays correctly in my original plot, though.)
Here is my code:
#%%
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# my dataset
columns = np.array(['types', 'GWP100 (year)', 'FDP (year)'])
types = np.array(['Total (ingredient) per kg', 'Total (waste) per kg',
'energy (whole process) per kg', 'Infrastructure', 'Total (Total)']).reshape(5,1)
gwp = np.array([ 2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
6.02400000e-02, 5.47759916e+02]).reshape(5,1)
fdp = np.array([ 1.35455867e+02, 7.02868322e+00, 1.26622560e+01,
1.64568000e-02, 1.55163263e+02]).reshape(5,1)
original_data = np.concatenate((types, gwp, fdp), axis = 1)
# produce dataframe
data = pd.DataFrame(original_data, columns = columns)
# types GWP100 (year) FDP (year)
#0 Total (ingredient) per kg 286.982617 135.455867
#1 Total (waste) per kg 216.824983 7.02868322
#2 energy (whole process) per kg 43.892076 12.662256
#3 Infrastructure 0.06024 0.0164568
#4 Total (Total) 547.759916 155.163263
#%% graph
fig = plt.figure(1, figsize =(8,6))
# 1st subplot
ax1 = fig.add_subplot(1,2,1)
gwp = data[data.columns[1]]
ax1.stem(gwp)
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
# 2nd subplot
ax2 = fig.add_subplot(1,2,2)
fdp = data[data.columns[2]]
ax2.stem(fdp)
ax2.set_ylabel(r'kg oil-Eq', fontsize = 10)
ax2.set_xlabel('FDP', fontsize=10)
The stem currently consists of a couple of lines and a "line" consisting of dots on top. It has no option to colorize the lines separately within its interface.
You may replicate the stem plot to draw the lines manually with the color you like.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
columns = np.array(['types', 'GWP100 (year)', 'FDP (year)'])
types = np.array(['Total (ingredient) per kg', 'Total (waste) per kg',
'energy (whole process) per kg', 'Infrastructure', 'Total (Total)'])
gwp = np.array([ 2.86982617e+02, 2.16824983e+02, 4.38920760e+01,
6.02400000e-02, 5.47759916e+02])
fdp = np.array([ 1.35455867e+02, 7.02868322e+00, 1.26622560e+01,
1.64568000e-02, 1.55163263e+02])
# produce dataframe
data = pd.DataFrame([types,gwp,fdp], index = columns).transpose()
colors = list("bgryk")
fig, (ax, ax2) = plt.subplots(ncols=2)
for t, y, c in zip(data["types"], data["GWP100 (year)"], colors):
    ax.plot([t,t], [0,y], color=c, marker="o", markevery=(1,2))
ax.set_ylim(0,None)
plt.setp(ax.get_xticklabels(), rotation=90)
fig.tight_layout()
plt.show()
A more efficient solution is of course to use a LineCollection in combination with a scatter plot for the dots.
from matplotlib.collections import LineCollection
fig, (ax, ax2) = plt.subplots(ncols=2)
segs = np.zeros((len(data), 2, 2))
segs[:,:,0] = np.repeat(np.arange(len(data)),2).reshape(len(data),2)
segs[:,1,1] = data["GWP100 (year)"].values
lc = LineCollection(segs, colors=colors)
ax.add_collection(lc)
ax.scatter(np.arange(len(data)), data["GWP100 (year)"].values, c=colors)
ax.set_xticks(np.arange(len(data)))
ax.set_xticklabels(data["types"], rotation=90)
ax.autoscale()
ax.set_ylim(0,None)
fig.tight_layout()
plt.show()
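A more compact variant of the same idea is sketched below: ax.vlines builds the line collection directly from x positions and heights, so the segment array does not have to be assembled by hand. The GWP values come from the question; the shortened category labels are my own simplification.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# GWP values from the question; the short labels are simplified for this sketch.
types = ["ingredient", "waste", "energy", "infrastructure", "total"]
gwp = np.array([286.982617, 216.824983, 43.892076, 0.06024, 547.759916])
colors = list("bgryk")
x = np.arange(len(types))

fig, ax = plt.subplots(figsize=(4, 4))
stems = ax.vlines(x, 0, gwp, colors=colors)  # one vertical segment per lollipop
ax.scatter(x, gwp, c=colors, zorder=3)       # the dots, drawn above the stems
ax.set_xticks(x)
ax.set_xticklabels(types, rotation=90)
ax.set_ylim(0, None)
fig.tight_layout()
```

Both vlines and scatter accept one color per element, so the stem and the dot of each category stay matched.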
I will answer one of your main questions: how to give the stem line and the marker of each category the same color. According to the official docs, ax1.stem() has no direct option for passing a list of colors; in fact, they say that the resulting plot might not be reasonable if one does so. Nevertheless, below is one trick to get things done your way.
The idea is as follows:
1. Get the objects (stemlines) displayed on the subplot.
2. Get the x-y data of the markers.
3. Loop over the data and change the color of each stemline, then plot each marker individually with the same color as its stemline. Here colors is a list specifying the colors of your choice.
Following is the relevant part of the code:
# 1st subplot
ax1 = fig.add_subplot(1,2,1)
gwp = data[data.columns[1]]
colors = ['r', 'g', 'b', 'y', 'k']
_, stemlines, _ = ax1.stem(gwp)
line = ax1.get_lines()
xd = line[0].get_xdata()
yd = line[0].get_ydata()
# mec and mfc stands for markeredgecolor and markerfacecolor
for i in range(len(stemlines)):
    plt.plot([xd[i]], [yd[i]], 'o', ms=7, mfc=colors[i], mec=colors[i])
    plt.setp(stemlines[i], 'color', colors[i])
ax1.set_ylabel(r'kg CO$_2$-Eq', fontsize=10)
ax1.set_xlabel('GWP', fontsize=10)
# 2nd subplot
ax2 = fig.add_subplot(1,2,2)
fdp = data[data.columns[2]]
_, stemlines, _ = ax2.stem(fdp)
line = ax2.get_lines()
xd = line[0].get_xdata()
yd = line[0].get_ydata()
for i in range(len(stemlines)):
    plt.plot([xd[i]], [yd[i]], 'o', ms=7, mfc=colors[i], mec=colors[i])
    plt.setp(stemlines[i], 'color', colors[i])
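The loop above relies on stemlines being a list of individual lines. As far as I know, recent Matplotlib releases return the stems as a single LineCollection instead, so the sketch below handles both cases. It is an assumption-laden variant rather than the original answer's method: it hides the default marker line and redraws the dots with scatter so that each dot can take its own color.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

gwp = np.array([286.982617, 216.824983, 43.892076, 0.06024, 547.759916])
colors = ["r", "g", "b", "y", "k"]

fig, ax1 = plt.subplots(figsize=(4, 4))
markerline, stemlines, baseline = ax1.stem(gwp)

if isinstance(stemlines, LineCollection):
    # newer Matplotlib: one collection holding all stems, colored per segment
    stemlines.set_color(colors)
else:
    # older Matplotlib: a list of separate Line2D objects
    for line, color in zip(stemlines, colors):
        line.set_color(color)

# The marker line is a single artist with one color, so hide it
# and draw individually colored dots instead.
markerline.set_visible(False)
ax1.scatter(np.arange(len(gwp)), gwp, c=colors, zorder=3)
```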

Seaborn (time series) boxplot using hue and different scale axes

I have a dataframe with a number of values per date (datetime field). These values are classified as U (users) or S (session) by a Group column. Seaborn is used to draw two boxplots per date, with the hue set to Group.
The problem comes when considering that the values corresponding to U (users) are much bigger than those corresponding to S (session), making the S data illegible. Thus, I need to come up with a solution that allows me to plot both series (U and S) in the same figure in an understandable manner.
I wonder if independent Y axes (with different scales) can be set to each hue, so that both Y axes are shown (as when using twinx but without losing hue visualization capabilities).
Any other alternative would be welcome =)
The boxplot time series for the S group:
The combined boxplot time series using hue. It is clearly not possible to see any information about the S group because of the scale of the y-axis:
The columns of the dataframe:
| Day (datetime) | n_data (numeric) | Group (S or U)|
The code line generating the combined boxplot:
seaborn.boxplot(ax=ax,x='Day', y='n_data', hue='Group', data=df,
palette='PRGn', showfliers=False)
Managed to find a solution by using twinx:
fig, ax = plt.subplots(figsize=(50,10))
# groups is the dataframe described above (columns Day, n_data, Group)
# keep only the U values in one copy and only the S values in the other
tmpU = groups.copy()
tmpU.loc[tmpU['Group']!='U','n_data'] = np.nan
tmpS = groups.copy()
tmpS.loc[tmpS['Group']!='S','n_data'] = np.nan
ax = seaborn.boxplot(ax=ax, x='Day', y='n_data', hue='Group', data=tmpU, palette='PRGn', showfliers=False)
# second y-axis sharing the same x-axis, with its own scale for the S values
ax2 = ax.twinx()
seaborn.boxplot(ax=ax2, x='Day', y='n_data', hue='Group', data=tmpS, palette='PRGn', showfliers=False)
handles, labels = ax.get_legend_handles_labels()
l = plt.legend(handles[0:2], labels[0:2], loc=1)
plt.setp(ax.get_xticklabels(), rotation=30, horizontalalignment='right')
# hide every other tick label to reduce clutter
for label in ax.get_xticklabels()[::2]:
    label.set_visible(False)
plt.show()
plt.close('all')
The code above generates the following figure:
In this case the result turns out to be too dense to publish. I would therefore adopt a visualization based on subplots, as Parfait suggested in his/her answer.
It wasn't an obvious solution to me, so I would like to thank Parfait for the answer.
Consider building separate plots on same figure with y-axes ranges tailored to subsetted data. Below demonstrates with random data seeded for reproducibility (for readers of this post).
Data (with U values higher than S values)
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
np.random.seed(2018)
u_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,800,20),
'Group': 'U'})
s_df = pd.DataFrame({'Day': pd.date_range('2016-10-01', periods=10)\
.append(pd.date_range('2016-10-01', periods=10)),
'n_data': np.random.uniform(0,200,20),
'Group': 'S'})
df = pd.concat([u_df, s_df], ignore_index=True)
df['Day'] = df['Day'].astype('str')
Plot
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.groupby('Group')):
    # create the subplot first so that the title is applied to the right axes
    plt.subplot(2, 1, i+1)
    plt.title('N_data of {}'.format(g[0]))
    seaborn.boxplot(x="Day", y="n_data", data=g[1], palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
To retain original hue and grouping, render all non-group n_data to np.nan:
fig = plt.figure(figsize=(10,5))
for i,g in enumerate(df.Group.unique()):
    plt.subplot(2, 1, i+1)
    tmp = df.copy()
    tmp.loc[tmp['Group']!=g, 'n_data'] = np.nan
    seaborn.boxplot(x="Day", y="n_data", hue="Group", data=tmp,
                    palette="PRGn", showfliers=False)
plt.tight_layout()
plt.show()
plt.clf()
plt.close('all')
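The NaN-masking step used above can be checked in isolation. The sketch below uses a four-row invented frame with the same column names and a hypothetical helper, mask_other_groups, to show that the rows of the other group are kept (so both hue levels stay present) while their values are blanked out.

```python
import numpy as np
import pandas as pd

# Invented sample with the column names used in this thread.
df = pd.DataFrame({
    "Day": ["2016-10-01", "2016-10-01", "2016-10-02", "2016-10-02"],
    "n_data": [700.0, 50.0, 650.0, 40.0],
    "Group": ["U", "S", "U", "S"],
})


def mask_other_groups(frame, group):
    """Return a copy of frame where n_data is NaN for every row outside group (hypothetical helper)."""
    out = frame.copy()
    out.loc[out["Group"] != group, "n_data"] = np.nan
    return out


only_u = mask_other_groups(df, "U")
only_s = mask_other_groups(df, "S")
print(only_u)
print(only_s)
```

Because the Group column is untouched, seaborn still reserves a slot for each hue level, and the boxes of the two groups end up side by side rather than on top of each other.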
So one option for drawing a grouped box plot with two separate y-axes is to pass hue_order=['value', np.nan] to sns.boxplot, so that each call leaves an empty slot where the other group is drawn:
import matplotlib.patches as mpatches
# m (the dataframe) and customPalette are the author's own objects, defined elsewhere
fig = plt.figure(figsize=(14,8))
ax = sns.boxplot(x="lon_bucketed", y="value", data=m, hue='name', hue_order=['co2',np.nan],
                 width=0.75, showmeans=True, meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"}, linewidth=0.5, palette=customPalette)
ax2 = ax.twinx()
ax2 = sns.boxplot(ax=ax2, x="lon_bucketed", y="value", data=m, hue='name', hue_order=[np.nan,'g_xco2'],
                  width=0.75, showmeans=True, meanprops={"marker":"s","markerfacecolor":"black", "markeredgecolor":"black"}, linewidth=0.5, palette=customPalette)
ax.grid(alpha=0.5, which='major')
plt.tight_layout()
ax.legend_.remove()
GW = mpatches.Patch(color='seagreen', label='$CO_2$')
WW = mpatches.Patch(color='mediumaquamarine', label='$XCO_2$')
# prop sets the legend font size, so a separate fontsize argument is not needed
ax2.legend(handles=[GW,WW], loc='upper right', prop={'size': 14})
ax.set_title("$XCO_2$ vs. $CO_2$", fontsize=18)
ax.set_xlabel('Longitude [\u00b0]', fontsize=14)
ax.set_ylabel('$CO_2$ [ppm]', fontsize=14)
ax2.set_ylabel('$XCO_2$ [ppm]', fontsize=14)
ax.tick_params(labelsize=14)
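For comparison, the same two-scale layout can be sketched with plain Matplotlib, without the hue_order trick. This is an alternative under stated assumptions, not the method above: the data are invented, and the two groups are separated by shifting the box positions by ±0.2 around each day.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Invented samples: five days, with U values far larger than S values.
rng = np.random.default_rng(2018)
n_days = 5
u_samples = [rng.uniform(0, 800, 20) for _ in range(n_days)]
s_samples = [rng.uniform(0, 20, 20) for _ in range(n_days)]
x = np.arange(n_days)

fig, ax = plt.subplots(figsize=(8, 4))
# U boxes on the left y-axis, shifted slightly left of each day
ax.boxplot(u_samples, positions=x - 0.2, widths=0.3, manage_ticks=False)
ax.set_ylabel("U n_data")

# S boxes on a twin y-axis with its own scale, shifted slightly right
ax2 = ax.twinx()
ax2.boxplot(s_samples, positions=x + 0.2, widths=0.3, manage_ticks=False)
ax2.set_ylabel("S n_data")

# manage_ticks=False leaves the ticks alone, so set one tick per day by hand
ax.set_xticks(x)
ax.set_xticklabels([f"day {i + 1}" for i in range(n_days)])
fig.tight_layout()
```

Each y-axis autoscales to its own group, so the small S values stay readable next to the much larger U values.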
