Matplotlib Axes legend shows only one label in barh - python

I have 15 barh subplots that look like this:
I can't seem to get the legend working, so I'll see [2,3,4] as separate labels in the graph and in the legend.
I'm having trouble with making this work for subgraphs. My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def plot_bars_by_data(data, title):
    fig, axs = plt.subplots(8,2, figsize=(20,40))
    fig.suptitle(title, fontsize=20)
    fig.subplots_adjust(top=0.95)
    plt.rcParams.update({'font.size': 13})
    axs[7,1].remove()
    column_index = 0
    for ax_line in axs:
        for ax in ax_line:
            if column_index < len(data.columns):
                column_name = data.columns[column_index]
                current_column_values = data[column_name].value_counts().sort_index()
                ax.barh([str(i) for i in current_column_values.index], current_column_values.values)
                ax.legend([str(i) for i in current_column_values.index])
                ax.set_title(column_name)
                column_index +=1
    plt.show()
# random data
df_test = pd.DataFrame([np.random.randint(2,5,size=15) for i in range(15)], columns=list('abcdefghijlmnop'))
plot_bars_by_data(df_test, "testing")
I just get an 8x2 grid of bar charts that look like the graph above. How can I fix this?
I'm using Python 3.6 and Jupyter Python notebook.

Use the following lines in your code. I can't show the whole output here because it's a large figure with many subplots, so I'm showing just one subplot. It turns out that you first have to keep a handle to the bars returned by barh, and then pass both that handle and the legend labels to ax.legend to produce the desired legend.
colors = ['r', 'g', 'b']
axx = ax.barh([str(i) for i in current_column_values.index], current_column_values.values, color=colors)
ax.legend(axx, [str(i) for i in current_column_values.index])
Sample Output
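As a minimal, self-contained sketch of the approach above: the sample values in `values` are made up for illustration, and the non-interactive Agg backend is an assumption so the snippet runs without a display. The container returned by barh is kept and passed to ax.legend together with one label per bar.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# made-up sample column, standing in for one column of the DataFrame
values = pd.Series([2, 3, 4, 2, 3, 3])
counts = values.value_counts().sort_index()
labels = [str(i) for i in counts.index]

fig, ax = plt.subplots()
# barh returns a container holding one rectangle per bar
bars = ax.barh(labels, counts.values, color=['r', 'g', 'b'])
# pass the rectangles as handles so each bar gets its own legend entry
legend = ax.legend(bars, labels)

legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Without the handles, ax.legend receives only labels and matches them against the single bar container, which is why only one entry shows up in the original code.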


Dynamic pandas subplots with matplotlib

I need help creating subplots in matplotlib dynamically from a pandas dataframe.
The data I am using is from data.world.
I have already created the viz but the plots have been created manually.
The reason I need it to be dynamic is that I am going to apply a filter dynamically (in Power BI) and I need the graph to adjust to the filter.
This is what I have so far:
I imported the data and got it into the shape I need:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# read file from makeover monday year 2018 week 48
df = pd.read_csv(r'C:\Users\Ruth Pozuelo\Documents\python_examples\data\2018w48.csv', usecols=["city", "category","item", "cost"], index_col=False, decimal=",")
df.head()
this is the table:
I then apply the filter that will come from Power BI dynamically:
df = df[df.category=='Party night']
and then I count the number of plots based on the number of items I get after I apply the filter:
itemCount = df['item'].nunique() #number of plots
If I then plot the subplots:
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
I get the skeleton:
So far so good!
But now I am stuck on how to feed the x axis to the loop to generate the subcategories. I am trying something like the code below, but nothing works.
#for i, ax in enumerate(axes.flatten()):
# ax.plot(??,cityValues, marker='o',markersize=25, lw=0, color="green") # The top-left axes
As I already have the code for the look and feel of the chart, annotations, etc., I would love to be able to use the plt.subplots method, and I would prefer not to use seaborn if possible.
Any ideas on how to get this working?
Thanks in advance!
I used the data you provided as the basis for this code. I prepared a list of column names and a list of colors and looped over them together with the axes. axes.ravel() is more memory efficient than axes.flatten(): ravel() returns a view of the original array where possible, whereas flatten() always returns a copy. Either way, the flattened array holds one Axes object per subplot, so all subplots can be configured in a single loop.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
url='https://raw.githubusercontent.com/Curbal-Data-Labs/Matplotlib-Labs/master/2018w48.csv'
dataset = pd.read_csv(url)
dataset.drop_duplicates(['city','item'], inplace=True)
dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins = True).sort_values('All', ascending=True).drop('All', axis=1)
df = dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins = True).sort_values('All', ascending=True).drop('All', axis=1).sort_values('All', ascending=False, axis=1).drop('All').reset_index()
# comma replace
for c in df.columns[1:]:
    df[c] = df[c].str.replace(',','.').astype(float)
fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(30,10), sharey=True)
colors = ['green','blue','red','black','brown']
col_names = ['Dinner','Drinks at Dinner','2 Longdrinks','Club entry','Cinema entry']
for i, (ax,col,c) in enumerate(zip(axes.ravel(), col_names, colors)):
    ax.plot(df.loc[:,col], df['city'], marker='o', markersize=25, lw=0, color=c)
    ax.set_title(col)
    for i,j in zip(df[col], df['city']):
        ax.annotate('$'+str(i), xy=(i, j), xytext=(i-4,j), color="white", fontsize=8)
    ax.set_xticks([])
    ax.spines[['top', 'right', 'left', 'bottom']].set_visible(False)
    ax.grid(True, axis='y', linestyle='solid', linewidth=2)
    ax.grid(True, axis='x', linestyle='solid', linewidth=0.2)
    ax.xaxis.tick_top()
    ax.xaxis.set_label_position('top')
    ax.set_xlim(xmin=0, xmax=160)
    ax.xaxis.set_major_formatter('${x:1.0f}')
    ax.tick_params(labelsize=8, top=False, left=False)
plt.show()
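The ravel() versus flatten() remark in this answer can be checked with a small sketch. It assumes a 2x3 grid and the Agg backend purely for illustration, and uses np.shares_memory to test whether each result is a view of the original axes array.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(nrows=2, ncols=3)

raveled = axes.ravel()      # a view of the same array where possible
flattened = axes.flatten()  # always a new array

ravel_shares = np.shares_memory(axes, raveled)
flatten_shares = np.shares_memory(axes, flattened)
# both results refer to the very same Axes objects, in the same order
same_objects = all(a is b for a, b in zip(raveled, flattened))
print(ravel_shares, flatten_shares, same_objects)
```

For a handful of subplots the difference is negligible, since only the small array of references is copied and never the Axes objects themselves.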
Working example below. I used seaborn to plot the bars, but the idea is the same: loop through the facets and increment a counter. Start the counter at -1 so that its first value is 0, and use it as the index into the array of axes.
import seaborn as sns
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
df['Cost'] = df['Cost'].astype(float)
count = -1
variables = df['Item'].unique()
fig, axs = plt.subplots(1,itemCount , figsize=(25,70), sharex=False, sharey= False)
for var in variables:
    count += 1
    # plot only the rows for the current item on its own axes
    sns.barplot(ax=axs[count], data=df[df['Item'] == var], x='Cost', y='City')
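A matplotlib-only sketch of the dynamic version might look like the following. The small DataFrame is made up for illustration (it stands in for the filtered data), and the Agg backend is an assumption. The number of columns comes from the unique items, and squeeze=False keeps axes a 2D array even when the filter leaves a single item.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

# made-up stand-in for the filtered DataFrame
df = pd.DataFrame({
    "city": ["A", "B", "A", "B", "A", "B"],
    "item": ["Dinner", "Dinner", "Club entry", "Club entry",
             "Cinema entry", "Cinema entry"],
    "cost": [50.0, 40.0, 15.0, 10.0, 12.0, 9.0],
})

items = df["item"].unique()  # one subplot per item, in order of appearance
fig, axes = plt.subplots(nrows=1, ncols=len(items),
                         figsize=(4 * len(items), 4),
                         sharey=True, squeeze=False)

for ax, item in zip(axes.ravel(), items):
    subset = df[df["item"] == item]  # rows belonging to this subplot
    ax.plot(subset["cost"], subset["city"],
            marker="o", markersize=25, lw=0, color="green")
    ax.set_title(item)

titles = [ax.get_title() for ax in axes.ravel()]
print(titles)
```

Because the loop zips the axes with the item names, no manual counter is needed and the figure adapts to whatever the filter returns.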

Make Left and Bottom Spines Visible in Seaborn Subplots

I am trying to write a helper function that plots a figure with subplots in Seaborn.
The code currently looks like this:
def granular_barplot(data, col_name, separator):
    '''
    data = dataframe
    col_name: the column to be analysed
    separator: column to be plotted in subplot
    '''
    g = sns.catplot(data=data, y=col_name, col=separator, kind='count',color='blue')
    g.fig.set_size_inches(16,8)
    g.fig.suptitle(f'{col_name.capitalize()} Changes by {separator.capitalize()}',fontsize=16, fontweight='bold')
    g.despine()
    for ax in g.axes.ravel():
        for c in ax.containers:
            ax.bar_label(c)
and it produces the graph like this:
What I'm trying to achieve is to make the left and bottom spines visible for each subplot in the helper function, like below (which is similar to what the sns.despine function does):
I appreciate any help and ideas. Thanks.
Try setting this style:
def granular_barplot(data, col_name, separator):
    '''
    data = dataframe
    col_name: the column to be analysed
    separator: column to be plotted in subplot
    '''
    sns.set_style({'axes.linewidth': 2, 'axes.edgecolor':'black'})
    g = sns.catplot(data=data, y=col_name, col=separator, kind='count',color='blue')
    g.fig.set_size_inches(16,8)
    g.fig.suptitle(f'{col_name.capitalize()} Changes by {separator.capitalize()}',fontsize=16, fontweight='bold')
    g.despine()
    for ax in g.axes.ravel():
        ax.spines['left'].set_visible(True)
        ax.spines['bottom'].set_visible(True)
df = sns.load_dataset('tips')
granular_barplot(df, 'sex', 'smoker')
Output:
You should either be able to pass some settings to seaborn's despine function or use matplotlib's ability to set spine visibility:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(1,100)
y = np.arange(1,100)
g = sns.lineplot(x=x,y=y)
plt.title("Seaborn despine")
sns.despine(left=False, bottom=False)
plt.show()
g = sns.lineplot(x=x,y=y)
plt.title("False Spines")
g.spines['right'].set_visible(False)
g.spines['top'].set_visible(False)
plt.show()
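The same spine handling can be sketched on a plain matplotlib grid of subplots. The helper name `show_left_bottom_spines`, the sample bars, and the Agg backend are assumptions for illustration; with a seaborn FacetGrid the helper would be given g.axes instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

def show_left_bottom_spines(axes):
    """Hypothetical helper: keep only the left and bottom spines visible."""
    for ax in np.ravel(axes):
        for side in ('left', 'bottom'):
            ax.spines[side].set_visible(True)
        for side in ('top', 'right'):
            ax.spines[side].set_visible(False)

fig, axes = plt.subplots(1, 2, sharey=True)
for ax in axes:
    ax.barh(['Male', 'Female'], [3, 5])  # made-up counts
    # simulate a style in which every spine starts out hidden
    for spine in ax.spines.values():
        spine.set_visible(False)

show_left_bottom_spines(axes)
visibility = {side: axes[0].spines[side].get_visible()
              for side in ('left', 'bottom', 'top', 'right')}
print(visibility)
```

Setting the visibility per spine works regardless of the active style, although a style with a very light edge color can still make a visible spine hard to see.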

Text based colors in scatterplot python matplotlib [duplicate]

This question already has answers here:
Discrete Color Bar with Tick labels in between colors
(2 answers)
Closed 2 years ago.
Hi I am trying to create a scatterplot where each X,Y variable combination is of a particular category, so within the scatterplot I would like to have each category with a different color.
I was able to achieve that as per the code below. However the colorbar that I see on the plot would make more sense if it had the category name on it rather than a numerical value.
Any pointers would be greatly appreciated.
I know seaborn could probably make it easier but I am specifically looking for a matplotlib based solution.
import numpy
import pandas
import matplotlib.pyplot as plt
numpy.random.seed(0)
N = 50
_categories= ['A', 'B', 'C', 'D']
df = pandas.DataFrame({
    'VarX': numpy.random.uniform(low=130, high=200, size=N),
    'VarY': numpy.random.uniform(low=30, high=100, size=N),
    'Category': numpy.random.choice(_categories, size=N)
})
colorMap = {}
k = 0
for i in _categories:
    colorMap[_categories[k]] = k
    k+=1
plt.figure(figsize=(15,5))
plt.scatter(df.VarX, df.VarY, c= df.Category.map(colorMap), cmap='viridis')
plt.colorbar()
plt.show()
This code produces
Output
First of all, I presume you want to have a "discrete" colormap, so one way to do this is:
n_cat = len(_categories)
cmap = plt.get_cmap('viridis', n_cat)
Which is a convenient function to obtain a ListedColormap, i.e. a list of colors for each of your categories, sampled from the default colormap "viridis". Next, you simply pass that colormap over to the scatter plot, apply the colorbar and then set the ticks accordingly:
plt.scatter(df.VarX, df.VarY, c= df.Category.map(colorMap), cmap=cmap)
cbar = plt.colorbar()
tick_locs = (numpy.arange(n_cat) + 0.5)*(n_cat-1)/n_cat
cbar.set_ticks(tick_locs)
cbar.set_ticklabels(_categories)
Note: this answer is heavily inspired from this answer
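Putting the pieces of this answer together, a complete sketch could look like this. The random data mirrors the question, while the explicit vmin/vmax arguments, the variable names, and the Agg backend are assumptions added here so the tick positions do not depend on which categories happen to be sampled.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
categories = ['A', 'B', 'C', 'D']
n_cat = len(categories)
N = 50
df = pd.DataFrame({
    'VarX': rng.uniform(130, 200, size=N),
    'VarY': rng.uniform(30, 100, size=N),
    'Category': rng.choice(categories, size=N),
})

# map each category name to an integer code 0..n_cat-1
codes = df['Category'].map({name: k for k, name in enumerate(categories)})
# discrete colormap with exactly one color per category
cmap = plt.get_cmap('viridis', n_cat)

fig, ax = plt.subplots(figsize=(15, 5))
points = ax.scatter(df['VarX'], df['VarY'], c=codes, cmap=cmap,
                    vmin=0, vmax=n_cat - 1)
cbar = fig.colorbar(points, ax=ax)

# the range 0..n_cat-1 is split into n_cat equal bands; put a tick at each band center
tick_locs = (np.arange(n_cat) + 0.5) * (n_cat - 1) / n_cat
cbar.set_ticks(tick_locs)
cbar.set_ticklabels(categories)
```

Each color band has width (n_cat - 1) / n_cat, so the k-th label sits at (k + 0.5) times that width, which is what the tick_locs expression computes.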
The answer from here (pasted below) might be what you're looking for. The key is probably to use something like groups = df.groupby('label') and then plotting each group/category of the df.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)
# Generate Data
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))
groups = df.groupby('label')
# Plot
fig, ax = plt.subplots()
ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()
plt.show()

Matplotlib graphics problems in python

I have the following graphic, generated with the code below.
I want to correct the x-axis display to make the dates more readable.
I would also like to be able to enlarge the graph.
My code is:
import requests
import urllib.parse
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def get_api_call(ids, **kwargs):
    API_BASE_URL = "https://apis.datos.gob.ar/series/api/"
    kwargs["ids"] = ",".join(ids)
    return "{}{}?{}".format(API_BASE_URL, "series", urllib.parse.urlencode(kwargs))
df = pd.read_csv(get_api_call(
    ["168.1_T_CAMBIOR_D_0_0_26", "101.1_I2NG_2016_M_22",
     "116.3_TCRMA_0_M_36", "143.3_NO_PR_2004_A_21", "11.3_VMATC_2004_M_12"],
    format="csv", start_date=2018
))
time = df.indice_tiempo
construccion=df.construccion
emae = df.emae_original
time = pd.to_datetime(time)
# use a name that does not shadow the built-in `list`
data = {'date':time,'const':construccion,'EMAE':emae}
dataset = pd.DataFrame(data)
plt.plot( 'date', 'EMAE', data=dataset, marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4)
plt.plot( 'date', 'const', data=dataset, marker='', color='olive', linewidth=2)
plt.legend()
To make the x-tick labels more readable, try rotating them. So use, for example, a 90 degree rotation.
plt.xticks(rotation=90)
To enlarge the figure, you can set its size when you create it, for instance at the beginning of your plotting code:
fig, ax = plt.subplots(figsize=(10, 8))
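Both suggestions can be combined in one sketch. The monthly dates and values below are made up (they stand in for the API data), and the Agg backend and the use of ax.tick_params instead of plt.xticks are assumptions made here to stay on the object-oriented interface.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# made-up monthly series standing in for the downloaded data
dates = pd.date_range('2018-01-01', periods=24, freq='MS')
values = np.linspace(100, 150, len(dates))

# a larger figure is requested at creation time
fig, ax = plt.subplots(figsize=(10, 8))
ax.plot(dates, values, marker='o', label='EMAE (made-up values)')

# rotate the x tick labels so the dates do not overlap
ax.tick_params(axis='x', labelrotation=90)
ax.legend()
fig.tight_layout()  # leave room for the rotated labels

width, height = fig.get_size_inches()
print(width, height)
```

A smaller angle such as 45 degrees is often enough for dates and takes less vertical space than a full 90 degree rotation.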
I am fairly sure this can also be done from the Matplotlib window itself. If you have the latest version, you can zoom in on a section of the graph by clicking the zoom button in the bottom left of the window. To make the x-tick labels more readable, you can simply click the expand (maximize) button in the top right, or use Sheldore's solution.

Matplotlib multiple figures opening and saving

Hello I am having a problem plotting data from pandas dataframes. Within a few for loops I would like to create one large scatter plot (multiplots.png), to which new data is added in every loop, while also creating separate plots that are plotted and saved in every j loop (plot_i_j.png).
In my code the plot_i_j.png figures are produced correctly, but multiplots.png always ends up being the last plot_i_j.png figure. As you can see, I am trying to plot multiplots.png on axComb, while the plot_i_j.png figures are plotted on ax. Can anyone help me with this, please?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
columnNames = ['a','b']
scatterColors = ['red','blue','green','black']
figComb, axComb = plt.subplots(figsize=(8,6))
for i in range(4): # this is turbine number
    df1 = pd.DataFrame(np.random.randn(5, 2), columns=columnNames)
    df2 = pd.DataFrame(np.random.randn(5, 2), columns=columnNames)
    print(df1)
    for j in range(2):
        fig, ax = plt.subplots(figsize=(8,6))
        fig.suptitle(str(i)+'_'+str(j), fontsize=16)
        df1.plot(columnNames[j], ax=ax, color='blue', ls="--")
        plt.savefig('plot_'+str(i)+'_'+str(j)+'.png')
        df1.reset_index().plot.scatter('index',columnNames[j],3,ax=axComb,color=scatterColors[j])
        df2.reset_index().plot.scatter('index',columnNames[j],100,ax=axComb,color=scatterColors[j])
plt.savefig('multiPlots.png')
Really a small error. plt.savefig saves the current figure, which is the most recently created (or activated) one, so the final call saves the last fig rather than figComb.
Replace plt.savefig('plot_'+str(i)+'_'+str(j)+'.png') with fig.savefig('plot_'+str(i)+'_'+str(j)+'.png').
And replace plt.savefig('multiPlots.png') with figComb.savefig('multiPlots.png').
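The difference between the two calls can be seen in a short sketch. The variable names and sample points are made up, the Agg backend is an assumption, and the figures are written to in-memory buffers rather than files so that nothing is left on disk.

```python
import io
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig_comb, ax_comb = plt.subplots()      # combined figure, created first
fig_single, ax_single = plt.subplots()  # per-iteration figure, created last

ax_comb.scatter([0, 1, 2], [1, 3, 2], color='red')
ax_single.plot([0, 1, 2], [2, 1, 3], ls='--')

# pyplot's "current figure" is the one created most recently,
# so plt.savefig would write fig_single here, not fig_comb
current_is_single = plt.gcf() is fig_single

# calling savefig on the Figure object removes the ambiguity
comb_png = io.BytesIO()
fig_comb.savefig(comb_png, format='png')
single_png = io.BytesIO()
fig_single.savefig(single_png, format='png')

plt.close('all')  # free the figures once they are saved
print(current_is_single)
```

When many figures are created in a loop, closing each one after saving it also keeps memory use down.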
