I have the electricity consumption of 25 houses, and I'm running K-Means clustering on the dataset that holds those houses. After importing the dataset, pre-processing it, and applying K-Means with K=2, I plotted the data, but when I add the legend I get this warning:
No handles with labels found to put in legend.
The code runs without errors, but I want it to generate the legend automatically, with one entry per house ID, from 0 to 24.
Here is the code where I plot the data:
plt.figure(figsize=(13,13))
import itertools
marker = itertools.cycle(('+', 'o', '*' , 'X', 's','8','>','1','<'))
for cluster_index in [0,1]:
    plt.subplot(2,1,cluster_index + 1)
    for index, row in data1.iterrows():
        if row.iloc[-1] == cluster_index:
            plt.plot(row.iloc[1:-1] ,marker = next(marker) , alpha=1)
            plt.legend(loc="right")
    plt.plot(kmeans.cluster_centers_[cluster_index], color='k' ,marker='o', alpha=1)
    ax = plt.gca()
    ax.tick_params(axis = 'x', which = 'major', labelsize = 10)
    plt.xticks(rotation="vertical")
    plt.ylabel('Monthly Mean Consumption 2018-2019', fontsize=10)
    plt.title(f'Cluster {cluster_index}', fontsize=15)
    plt.tight_layout()
plt.show()
plt.close()
I just want the legend in the output figure to show the ID of each house. Any help is appreciated.
As I do not have your data, I cannot test this in a plot right now, but I assume the problem comes from not passing a label argument to plt.plot. Without labels, plt.legend has no handles to show, hence the warning. For example:
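for index, row in data1.iterrows():
    if row.iloc[-1] == cluster_index:
        # label=index gives each house its own legend entry
        plt.plot(row.iloc[1:-1], marker=next(marker), alpha=1, label=index)
# Build the legend once, after all lines of this cluster are drawn
plt.legend(loc="right")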
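Since I cannot run this against your data, here is a minimal self-contained sketch of the same idea. The frame `houses`, its month columns and the `cluster` column are made up for illustration; only the `label=` argument and the `legend` call matter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data1: 4 houses, 6 months, plus a cluster column
rng = np.random.default_rng(0)
months = [f"M{m:02d}" for m in range(1, 7)]
houses = pd.DataFrame(rng.random((4, 6)), columns=months)
houses["cluster"] = [0, 1, 0, 1]

fig, axes = plt.subplots(2, 1, figsize=(8, 8))
for cluster_index, ax in enumerate(axes):
    members = houses[houses["cluster"] == cluster_index]
    for house_id, row in members.iterrows():
        # The label is what the legend picks up for this line
        ax.plot(months, row[months].to_numpy(dtype=float),
                marker="o", label=f"House {house_id}")
    ax.set_title(f"Cluster {cluster_index}")
    ax.legend(loc="right")

# Collect the legend texts of each subplot for inspection
legend_labels = [[t.get_text() for t in ax.get_legend().get_texts()]
                 for ax in axes]
```

Each subplot then lists only the houses that belong to its cluster.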
Related
I need to draw a plot for data that contains factory_name and a date in YYYY format. The plot should show, for each factory_name and each year, the sum of parts that were sold/bought and the mean value.
I tried this:
pd.pivot_table(df.reset_index(),
index='Year', columns='factory_name', values='Demand').plot()
This works, but the plot lacks the mean Demand for each factory_name. I can compute it, but only as a data frame, and I do not know how to add these results to my plot:
df.groupby(['factory_name','Year']).agg(['sum','mean'])
Here is the code to create the data frame:
df = pd.DataFrame({'factory_name' : ['A','B','A','B','A','B','B','A','B','A','A','A'],
'Year': [2001,2002,2003,2001,2002,2003,2002,2003,2003,2003,2003,2003],
'Demand': [100,200,-20,40,30,50,100,200,50,-100,40,50]})
Thanks for help!
colors = ["brown", "darkgreen"]
plt.figure(figsize=(12,8))
for factory, color in zip(df.factory_name.unique(), colors):
    s = df.loc[df.factory_name==factory].groupby("Year").Demand.mean()
    plt.plot(
        s.index,
        s.values,
        color=color,
        linewidth=2,
        alpha=.5,
        label="%s mean"%factory
    )
for factory, color in zip(df.factory_name.unique(), colors):
    s = df.loc[df.factory_name==factory].groupby("Year").Demand.sum()
    plt.plot(
        s.index,
        s.values,
        color=color,
        linewidth=4,
        alpha=.25,
        label="%s sum"%factory
    )
plt.ylim(0,500)
plt.xticks(df.Year.unique())
plt.xlabel("year")
plt.legend()
plt.show()
EDIT: I edited the code to enlarge the figure and add the legend.
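As a sketch of an alternative, the sum and the mean can also be computed in a single groupby and handed to the pandas plotting wrapper. This uses the data frame from the question; the flattened column names such as "A sum" are my own naming, not something pandas produces by itself.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'factory_name': ['A','B','A','B','A','B','B','A','B','A','A','A'],
    'Year': [2001,2002,2003,2001,2002,2003,2002,2003,2003,2003,2003,2003],
    'Demand': [100,200,-20,40,30,50,100,200,50,-100,40,50],
})

# One row per year, one column per (statistic, factory) pair
stats = (df.groupby(['Year', 'factory_name'])['Demand']
           .agg(['sum', 'mean'])
           .unstack('factory_name'))

# Flatten the two-level columns into legend-friendly names (assumed naming)
stats.columns = [f"{factory} {stat}" for stat, factory in stats.columns]

ax = stats.plot(figsize=(12, 8))
ax.set_xticks(stats.index)
ax.set_xlabel("year")
```

The column names become the legend entries, so no manual labels are needed.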
I'm working on a personal experimentation project. I have the following dataframes:
treat_repr = pd.DataFrame({'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet']
,'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421]
,'me_pct': [1.206313, 0.182875, 0.170821, 1.336590, 2.229763]
,'p': [0.287025, 0.172464, 0.008328, 0.368466, 0.527718]
,'significance': ['insignificant', 'insignificant', 'significant', 'insignificant', 'insignificant']})
pre_treat_repr = pd.DataFrame({'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet']
,'diff_pct': [0.137174, 0.111005, 0.169490, -0.152929, -0.450667]
,'me_pct': [1.419080, 0.207081, 0.202014, 1.494588, 1.901672]
,'p': [0.849734, 0.293427, 0.100091, 0.841053, 0.642303]
,'significance': ['insignificant', 'insignificant', 'insignificant', 'insignificant', 'insignificant']})
I used the code below to construct an errorbar plot, which works fine:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))
    plt.errorbar(df[df['significance'] == 'significant']["diff_pct"], df[df['significance'] == 'significant']["kpi"], xerr = df[df['significance'] == 'significant']["me_pct"], color = '#d62828', fmt = 'o', capsize = 10)
    plt.errorbar(df[df['significance'] == 'insignificant']["diff_pct"], df[df['significance'] == 'insignificant']["kpi"], xerr = df[df['significance'] == 'insignificant']["me_pct"], color = '#2a9d8f', fmt = 'o', capsize = 10)
    plt.legend(['significant', 'insignificant'], loc = 'best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
               linestyle = '--', ymin=0.0, ymax=1)
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
For confint_plot(treat_repr), the output looks like this:
Now if I run the same plot function on the pre-treatment dataframe, confint_plot(pre_treat_repr), the plot looks like this:
Comparing the two plots, the order of the KPIs on the y-axis changes from the first plot to the second, depending on whether a KPI is significant (that is what I worked out after many attempts).
Questions:
How do I change the code to assign colors dynamically without changing the order of the KPIs on the y-axis?
Currently I have typed the legend entries in manually. Is there a way to populate the legend dynamically?
Appreciate the help!
Because you plot the significant KPIs first, they will always appear at the bottom of the chart. How you solve this and keep the desired colors depends on the kind of chart you are making with matplotlib. With scatter charts, you can pass a color array in the c parameter. Error bar charts do not offer that functionality.
One way to work around that is to sort your KPIs, give them numeric positions (0, 1, 2, 3, ...), plot them in two passes (once for the significant ones, once for the insignificant ones) and re-tick the axis:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))
    # Sort the KPIs alphabetically. You can change the order to anything
    # that fits your purpose
    df_plot = df.sort_values('kpi').assign(y=range(len(df)))
    for significance in ['significant', 'insignificant']:
        cond = df_plot['significance'] == significance
        color = '#d62828' if significance == 'significant' else '#2a9d8f'
        # Plot them in their numeric positions first;
        # label= lets plt.legend() build the legend entries by itself
        plt.errorbar(
            df_plot.loc[cond, 'diff_pct'], df_plot.loc[cond, 'y'],
            xerr=df_plot.loc[cond, 'me_pct'], label=significance,
            fmt='o', capsize=10, color=color
        )
    plt.legend(loc='best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
               linestyle = '--', ymin=0.0, ymax=1)
    # Re-tick to show the KPIs
    plt.yticks(df_plot['y'], df_plot['kpi'])
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
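For comparison, the scatter route mentioned above might look like the sketch below (without error bars). The `palette` dictionary and the proxy legend handles are my own additions for illustration; the data is a subset of the columns of treat_repr from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import pandas as pd

# Assumed mapping from significance level to color
palette = {"significant": "#d62828", "insignificant": "#2a9d8f"}

demo = pd.DataFrame({
    "kpi": ["cpsink", "hpu", "mpu", "revenue", "wallet"],
    "diff_pct": [0.655280, 0.127299, 0.229958, 0.613308, -0.718421],
    "significance": ["insignificant", "insignificant", "significant",
                     "insignificant", "insignificant"],
})

fig, ax = plt.subplots()
# One color per point, so a single scatter call keeps the row order
point_colors = demo["significance"].map(palette).tolist()
positions = list(range(len(demo)))
ax.scatter(demo["diff_pct"], positions, c=point_colors)
ax.set_yticks(positions)
ax.set_yticklabels(demo["kpi"].tolist())

# A single scatter has no per-group labels, so build proxy legend handles
handles = [Line2D([], [], marker="o", linestyle="", color=color, label=name)
           for name, color in palette.items()]
legend = ax.legend(handles=handles, loc="best")

fig.canvas.draw()  # make sure tick labels are populated before reading them
tick_labels = [t.get_text() for t in ax.get_yticklabels()]
```

Because every point is drawn in one call, the y order depends only on the row order of the frame, not on which KPIs are significant.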
I have two different dataframes:
df_test1 = pd.DataFrame(
[['<18', 80841], ['18-24', 334725], ['25-44', 698261], ['45-64', 273087], ['65+', 15035]],
columns = ['age_group', 'total_arrests']
)
df_test2 = pd.DataFrame(
[['<18', 33979], ['18-24', 106857], ['25-44', 219324], ['45-64', 80647], ['65+', 4211]],
columns = ['age_group','total_arrests']
)
I created the following plot using matplotlib:
fig, ax = plt.subplots()
ax.bar(df_test1.age_group, df_test1.total_arrests, color = 'seagreen')
ax.bar(df_test2.age_group, df_test2.total_arrests, color = 'lightgreen')
ax.set_xlabel('Age Group')
ax.set_ylabel('Number of Arrests')
ax.set_title('Arrests vs. Felony Arrests by Age Group')
plt.xticks(rotation=0)
plt.legend(['All Arrests', 'Felony Arrests'])
ax.yaxis.set_major_formatter(
    ticker.FuncFormatter(lambda y,p: format(int(y), ','))
)
for i,j in zip(df_test1.age_group, df_test1.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j))
for i,j in zip(df_test2.age_group, df_test2.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j))
plt.show()
I was expecting two separate bars per age group, one for each dataframe column (df_test1.total_arrests and df_test2.total_arrests), but instead the bars are drawn on top of each other like a stacked bar chart. How can I get a chart with bars next to one another, similar to the chart in "Matplotlib plot multiple bars in one graph"? I tried adapting my code to that example but couldn't get it to work.
With only two bars, it's fairly easy. The solution is to align the bars on the "edge" of the tick: one bar is aligned to the left, the other to the right (a negative width flips the direction).
Repeat the same logic for proper alignment of the annotations. Half of them are left-aligned, the others are right-aligned:
import matplotlib.pyplot as plt
import matplotlib.ticker  # needed for matplotlib.ticker.FuncFormatter below

fig, ax = plt.subplots()
# Positive width with align='edge' draws to the right of the tick, negative to the left
ax.bar(df_test1.age_group, df_test1.total_arrests, color = 'seagreen', width=0.4, align='edge')
ax.bar(df_test2.age_group, df_test2.total_arrests, color = 'lightgreen', width=-0.4, align='edge')
ax.set_xlabel('Age Group')
ax.set_ylabel('Number of Arrests')
ax.set_title('Arrests vs. Felony Arrests by Age Group')
plt.xticks(rotation=0)
plt.legend(['All Arrests', 'Felony Arrests'])
ax.yaxis.set_major_formatter(
    matplotlib.ticker.FuncFormatter(lambda y,p: format(int(y), ','))
)
for i,j in zip(df_test1.age_group, df_test1.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j))
for i,j in zip(df_test2.age_group, df_test2.total_arrests):
    ax.annotate(format(j, ','), xy=(i,j), ha='right')
plt.show()
If you have more than 2 bars, then the situation is more complicated (see the code that you linked above). You'll have an easier time using seaborn, but you have to transform your dataframe a bit:
import seaborn as sns

# Merge on age_group, then reshape to long format: one row per (age_group, Type)
df = pd.merge(left=df_test1, right=df_test2, on='age_group')
df.columns=['age_group','all_arrests', 'felonies']
df = df.melt(id_vars=['age_group'], var_name='Type', value_name='Number')
fig, ax = plt.subplots()
sns.barplot(y='Number',x='age_group',hue='Type', data=df, hue_order=['felonies','all_arrests'])
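If you would rather stay in plain matplotlib for more than two series, the usual pattern is numeric x positions plus a per-series offset. The sketch below uses the two series from the question; `total_width` and the dictionary layout are choices of mine, and a third series would only need another dictionary entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

age_groups = ["<18", "18-24", "25-44", "45-64", "65+"]
series = {
    "All Arrests": [80841, 334725, 698261, 273087, 15035],
    "Felony Arrests": [33979, 106857, 219324, 80647, 4211],
}

x = np.arange(len(age_groups))        # one numeric slot per age group
total_width = 0.8                     # assumed share of each slot used by bars
bar_width = total_width / len(series)

fig, ax = plt.subplots()
for i, (name, values) in enumerate(series.items()):
    # Center the group of bars on the tick, then shift each series sideways
    offset = (i - (len(series) - 1) / 2) * bar_width
    ax.bar(x + offset, values, width=bar_width, label=name)

ax.set_xticks(x)
ax.set_xticklabels(age_groups)
ax.legend()

# Center of every bar, grouped by series, for inspection
bar_centers = [[p.get_x() + p.get_width() / 2 for p in container.patches]
               for container in ax.containers]
```

Passing label= to each bar call also removes the need to type the legend entries by hand.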
What I'm trying to achieve is a line graph of genres and their average score throughout history. X-axis = years, y-axis = score.
genre_list is an array of the types of genres.
for genre in genre_list:
    random_color = [np.random.random_sample(), np.random.random_sample(), np.random.random_sample()]
    plt.plot('release_year', 'vote_average',
             data=genre_df, marker='',
             markerfacecolor=random_color,
             markersize=1,
             color=random_color,
             linewidth=1,
             label = genre)
plt.legend()
plt.figure(figsize=(5,5))
Though what I end up with is quite ugly.
Question 1) I've tried setting the figure size, but it seems to stay the same proportion. How do I configure this?
Question 2) How do I set the line color to match the legend?
Question 3) How do I configure the x and y axis so that they are more precise? (potentially the same question as #1)
I appreciate any sort of input, thank you.
Consider groupby to split the dataframe by genre, and then loop through the subsets to plot one line each. And as @ImportanceOfBeingErnest notes in the comments above, use this SO answer to space out the x-axis at yearly intervals (rotating ticks as needed):
import matplotlib.pyplot as plt
import matplotlib.ticker as plticker
import numpy as np
...
# Create the figure first so that figsize applies to the axes being drawn on
fig, ax = plt.subplots(figsize=(12,5))
# Grouping by a single column name keeps `genre` a plain label, not a tuple
for genre, sub_df in genre_df.groupby('genres'):
    random_color = [np.random.random_sample() for _ in range(3)]
    plt.plot('release_year', 'vote_average',
             data = sub_df, marker = '',
             markerfacecolor = random_color,
             markersize = 1,
             color = random_color,
             linewidth = 1,
             label = genre)
# Place a major tick at every year
loc = plticker.MultipleLocator(base=1.0)
ax.xaxis.set_major_locator(loc)
plt.xticks(rotation=45)
plt.legend()
plt.show()
plt.clf()
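On questions 1 and 2: calling plt.figure(figsize=...) after plotting opens a new, empty figure rather than resizing the one already drawn, so the size has to be set when the figure is created. The sketch below illustrates that with made-up genres and scores (everything in `genres` and `scores` is hypothetical), and uses a fixed colormap instead of random colors so lines stay distinguishable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba
import numpy as np

# Hypothetical data: three genres with one average score per year
years = np.arange(2000, 2006)
genres = ["Action", "Comedy", "Drama"]
rng = np.random.default_rng(1)
scores = {genre: 5 + 3 * rng.random(len(years)) for genre in genres}

# figsize is applied here, before anything is drawn
fig, ax = plt.subplots(figsize=(12, 5))
cmap = plt.get_cmap("tab10")
for i, genre in enumerate(genres):
    # One plot call per genre: its color and label travel into the legend
    ax.plot(years, scores[genre], color=cmap(i), linewidth=1, label=genre)
ax.set_xticks(years)
legend = ax.legend()

# Compare the colors of the plotted lines with those of the legend handles
line_colors = [to_rgba(line.get_color()) for line in ax.get_lines()]
legend_colors = [to_rgba(handle.get_color()) for handle in legend.get_lines()]
```

In the question, every loop iteration plotted the whole genre_df with a new color, which is why the legend colors did not correspond to distinct lines.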
I analyzed a guessing game about lift usage on a mountain and plotted the results per day. In the plot window the figure looks the way I want, but when I save it as a PNG with savefig, the first column of the table is squeezed.
I have no idea why this happens. Does anyone have an idea? When I save from the plot window, it doesn't do this.
[Image: correct depiction in the plot window]
[Image: squeezed first column in the saved PNG]
Code for the plot looks like this:
plt.figure(figsize=(15,8), dpi=80, facecolor = 'white')
# Histogram
ax1 = plt.subplot2grid( (1,3),(0,0), colspan = 2)
plt.hist(estDay.visitors[estDay.date == est_date], color='#E7E7E7', bins=15)
plt.axvline(estDay.visitors[estDay.date == est_date].mean(), linestyle='dashed', linewidth=3, color='#353535')
plt.axvline(erst.eintritte[erst.date == est_date].mean(), linestyle='dashed', linewidth=3, color='#AF272F')
plt.title(est_date)
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.yaxis.set_ticks_position('left')
ax1.xaxis.set_ticks_position('bottom')
summ = statSumm(est_date)
# Info Table
plt.subplot2grid( (1,3),(0,2))
plt.axis('off')
plt.table(cellText = summ.values,
          rowLabels = summ.index,
          colLabels = summ.columns,
          cellLoc = 'center',
          rowLoc = 'center',
          bbox=[0.6, 0.1, 0.5, 0.8])
plt.savefig('lottoDays/' + est_date + '.png')
The idea would be to draw the canvas once before saving, so that the column holding the row labels has a chance to adapt its size to the row headers. Call this right before plt.savefig:
plt.gcf().canvas.draw()
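A minimal sketch of the order of calls, with a made-up two-row table and an in-memory buffer instead of the 'lottoDays/' path (both are assumptions for illustration only):

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.axis("off")
# Hypothetical table content standing in for summ
ax.table(cellText=[["1", "2"], ["3", "4"]],
         rowLabels=["a fairly long row label", "b"],
         colLabels=["x", "y"],
         cellLoc="center",
         rowLoc="center",
         loc="center")

# Draw once so the table can size its cells before the file is written
fig.canvas.draw()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
```

Whether the extra draw is needed may depend on the matplotlib version, so treat it as a workaround rather than a guaranteed fix.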