I have searched this forum for a solution to my problem, but could not quite find one that fits my case.
Here is a minimum working example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(0,100,size=(16,15)),
columns=list('ABCDEFGHIJKLMNO'))
clr=["#1b0031","#3fee42","#b609d2","#c9ff5d","#7449f6","#03fca1","#9164ff","#ffaf06",
"#087dff","#ff5c0d","#0081b0","#fff276","#530069","#8cff9c","#ff56d7"]
df1=df.loc[0:3]
df1.loc[4]=clr
df1=df1.drop(columns=["A","M","J","F"])
clr1=list(df1.loc[4])
df1=df1.drop(4)
df2=df.loc[4:7]
df2=df2.reset_index(drop=True)
df2.loc[4]=clr
df2=df2.drop(columns=["B","M","K","L"])
clr2=list(df2.loc[4])
df2=df2.drop(4)
df3=df.loc[8:11]
df3=df3.reset_index(drop=True)
df3.loc[4]=clr
df3=df3.drop(columns=["D","L","F"])
clr3=list(df3.loc[4])
df3=df3.drop(4)
df4=df.loc[12:16]
df4=df4.reset_index(drop=True)
df4.loc[4]=clr
df4=df4.drop(columns=["G","I","N","O"])
clr4=list(df4.loc[4])
df4=df4.drop(4)
fig, axes = plt.subplots(nrows=2, ncols=2,sharex=True,figsize=(8,15))
df1.plot.area(ax=axes[0][0],color=clr1)
box = axes[0][0].get_position()
axes[0][0].legend(loc='center left', bbox_to_anchor=(1, 0.5),fontsize=12)
df2.plot.area(ax=axes[1][0],color=clr2)
box = axes[1][0].get_position()
axes[1][0].legend(loc='center left', bbox_to_anchor=(1, 0.5),fontsize=12)
df3.plot.area(ax=axes[0][1],color=clr3)
box = axes[0][1].get_position()
axes[0][1].legend(loc='center left', bbox_to_anchor=(1, 0.5),fontsize=12)
df4.plot.area(ax=axes[1][1],color=clr4)
box = axes[1][1].get_position()
axes[1][1].legend(loc='center left', bbox_to_anchor=(1, 0.5),fontsize=12)
This generates the following
I want to make a common legend on the right side of the figure. For example, even though "A" appears in three subplots, I would like to have it appear only once in the common legend.
From my dataframe df , I know which column names map to which color. Is there a way to use this information to build a legend?
Looking forward to any suggestions.
The easiest way to do this, given that each column maps directly onto a single color, is to disable the automatic legend generation in pandas.DataFrame.plot.area() via legend=False and instead build the list of legend handles manually.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
df = pd.DataFrame(np.random.randint(0,100,size=(16,15)),
columns=list('ABCDEFGHIJKLMNO'))
clr=["#1b0031","#3fee42","#b609d2","#c9ff5d","#7449f6","#03fca1","#9164ff","#ffaf06",
"#087dff","#ff5c0d","#0081b0","#fff276","#530069","#8cff9c","#ff56d7"]
df1=df.loc[0:3]
df1.loc[4]=clr
df1=df1.drop(columns=["A","M","J","F"])
clr1=list(df1.loc[4])
df1=df1.drop(4)
df2=df.loc[4:7]
df2=df2.reset_index(drop=True)
df2.loc[4]=clr
df2=df2.drop(columns=["B","M","K","L"])
clr2=list(df2.loc[4])
df2=df2.drop(4)
df3=df.loc[8:11]
df3=df3.reset_index(drop=True)
df3.loc[4]=clr
df3=df3.drop(columns=["D","L","F"])
clr3=list(df3.loc[4])
df3=df3.drop(4)
df4=df.loc[12:16]
df4=df4.reset_index(drop=True)
df4.loc[4]=clr
df4=df4.drop(columns=["G","I","N","O"])
clr4=list(df4.loc[4])
df4=df4.drop(4)
fig, axes = plt.subplots(nrows=2, ncols=2,sharex=True,figsize=(8,15))
df1.plot.area(ax=axes[0][0],color=clr1, legend=False)
df2.plot.area(ax=axes[1][0],color=clr2, legend=False)
df3.plot.area(ax=axes[0][1],color=clr3, legend=False)
df4.plot.area(ax=axes[1][1],color=clr4, legend=False)
handles = [Patch(color = clr[i], label = df.columns.values[i]) for i in range(len(clr))]
plt.figlegend(handles=handles)
plt.show()
You can adjust the position of the legend using the bbox_to_anchor and loc arguments, as usual.
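As a minimal sketch of that positioning, assuming a shortened, made-up column-to-color mapping (color_map is a hypothetical stand-in for df.columns and clr), the figure legend can be anchored near the right edge while subplots_adjust reserves room for it:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Hypothetical column-to-color mapping (stands in for df.columns and clr)
color_map = {"A": "#1b0031", "B": "#3fee42", "C": "#b609d2", "D": "#c9ff5d"}

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(8, 6))

# One patch per column, so each label appears exactly once in the legend
handles = [Patch(color=c, label=name) for name, c in color_map.items()]

# Put the left-center point of the legend box just right of the subplot area
legend = fig.legend(handles=handles, loc="center left", bbox_to_anchor=(0.87, 0.5))

# Shrink the subplot area so the legend does not overlap the axes
fig.subplots_adjust(right=0.85)
```

The values 0.87 and 0.85 are figure fractions chosen for illustration; they will likely need tuning for a different figure size or longer labels.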
I need help creating subplots in matplotlib dynamically from a pandas dataframe.
The data I am using is from data.world.
I have already created the viz but the plots have been created manually.
The reason I need it to be dynamic is that I am going to apply a filter dynamically (in Power BI) and I need the graph to adjust to the filter.
This is what i have so far:
I imported the data and got it in the shape i need:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# read file from makeover monday year 2018 week 48
df = pd.read_csv(r'C:\Users\Ruth Pozuelo\Documents\python_examples\data\2018w48.csv', usecols=["city", "category","item", "cost"], index_col=False, decimal=",")
df.head()
this is the table:
I then apply the filter that will come from Power BI dynamically:
df = df[df.category=='Party night']
and then I count the number of plots based on the number of items I get after I apply the filter:
itemCount = df['item'].nunique() #number of plots
If I then plot the subplots:
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
I get the skeleton:
So far so good!
But now I am stuck on how to feed the x axis to the loop to generate the subcategories. I am trying something like the code below, but nothing works.
#for i, ax in enumerate(axes.flatten()):
# ax.plot(??,cityValues, marker='o',markersize=25, lw=0, color="green") # The top-left axes
As I already have the code for the look and feel of the chart, annotations, etc., I would love to be able to use the plt.subplots method and I would prefer not to use seaborn if possible.
Any ideas on how to get this working?
Thanks in advance!
The data was provided, so I used it as the basis for the code. I prepared a list of column names and a list of colors and looped over them together with the axes. axes.ravel() is more memory efficient than axes.flatten() because it returns a view where possible, whereas flatten() always makes a copy. Either way you get a flat array with one Axes object per subplot, so every subplot can be configured inside a single loop.
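The difference between the two calls can be checked directly with NumPy. This is a small sketch on a made-up integer array (grid is a hypothetical stand-in for the 2-D array of Axes that plt.subplots returns):

```python
import numpy as np

# Stand-in for the 2-D array returned by plt.subplots (hypothetical data)
grid = np.arange(6).reshape(2, 3)

flat_view = grid.ravel()    # a view when the memory layout allows it
flat_copy = grid.flatten()  # always a new array

shares_view = np.shares_memory(grid, flat_view)
shares_copy = np.shares_memory(grid, flat_copy)
print(shares_view, shares_copy)
```

For a handful of subplots the saving is negligible, so either call is fine in practice.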
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
url='https://raw.githubusercontent.com/Curbal-Data-Labs/Matplotlib-Labs/master/2018w48.csv'
dataset = pd.read_csv(url)
dataset.drop_duplicates(['city','item'], inplace=True)
dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins = True).sort_values('All', ascending=True).drop('All', axis=1)
df = dataset.pivot_table(index='city', columns='item', values='cost', aggfunc='sum', margins = True).sort_values('All', ascending=True).drop('All', axis=1).sort_values('All', ascending=False, axis=1).drop('All').reset_index()
# comma replace
for c in df.columns[1:]:
df[c] = df[c].str.replace(',','.').astype(float)
fig, axes = plt.subplots(nrows=1, ncols=5, figsize=(30,10), sharey=True)
colors = ['green','blue','red','black','brown']
col_names = ['Dinner','Drinks at Dinner','2 Longdrinks','Club entry','Cinema entry']
for i, (ax,col,c) in enumerate(zip(axes.ravel(), col_names, colors)):
    ax.plot(df.loc[:,col], df['city'], marker='o', markersize=25, lw=0, color=c)
    ax.set_title(col)
    for i,j in zip(df[col], df['city']):
        ax.annotate('$'+str(i), xy=(i, j), xytext=(i-4,j), color="white", fontsize=8)
    ax.set_xticks([])
    ax.spines[['top', 'right', 'left', 'bottom']].set_visible(False)
    ax.grid(True, axis='y', linestyle='solid', linewidth=2)
    ax.grid(True, axis='x', linestyle='solid', linewidth=0.2)
    ax.xaxis.tick_top()
    ax.xaxis.set_label_position('top')
    ax.set_xlim(xmin=0, xmax=160)
    ax.xaxis.set_major_formatter('${x:1.0f}')
    ax.tick_params(labelsize=8, top=False, left=False)
plt.show()
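The code above hard-codes five columns. If the number of subplots has to follow the filter, one possible sketch is to derive it from the unique items. The data frame below is made up for illustration (the city names and costs are hypothetical), and squeeze=False is used so that the axes array stays 2-D even when only one item remains:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the filtered data (hypothetical values)
df = pd.DataFrame({
    "city": ["Zurich", "Zurich", "Paris", "Paris", "Rome", "Rome"],
    "item": ["Dinner", "Club entry"] * 3,
    "cost": [90.0, 20.0, 70.0, 15.0, 60.0, 12.0],
})

items = df["item"].unique()  # one subplot per item left after filtering

# squeeze=False keeps axes 2-D even if a single item survives the filter
fig, axes = plt.subplots(nrows=1, ncols=len(items), figsize=(4 * len(items), 4),
                         sharey=True, squeeze=False)

for ax, item in zip(axes.ravel(), items):
    subset = df[df["item"] == item]  # rows for this subplot only
    ax.plot(subset["cost"], subset["city"], marker="o", markersize=12, lw=0, color="green")
    ax.set_title(item)
```

The existing styling calls (annotations, grid, spines) can go inside the same loop unchanged.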
Working example below. I used seaborn to plot bars, but the idea is the same: loop through the facets and increment a counter. The counter starts at -1 so that its first value is 0, and it is used as the index into the array of axes.
import seaborn as sns
fig, ax = plt.subplots( nrows=1, ncols=itemCount ,figsize=(30,10), sharey=True)
df['Cost'] = df['Cost'].astype(float)
count = -1
variables = df['Item'].unique()
fig, axs = plt.subplots(1,itemCount , figsize=(25,70), sharex=False, sharey= False)
for var in variables:
    count += 1
    # plot only the rows that belong to the current item
    sns.barplot(ax=axs[count], data=df[df['Item'] == var], x='Cost', y='City')
I am struggling with syncing colors between seaborn.countplot and a pandas.DataFrame.plot pie plot.
I found a similar question on SO, but it does not work with pie chart as it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched on the documentation sites, but all I could find is that I can set a colormap and palette, which was also not in sync in the end:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the order argument to sns.countplot(). This changes the order in which seaborn draws the values, and as a consequence the colours in both plots will match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Explanation: Colors are assigned by order. So, if the categories in sns.countplot appear in a different order than in the other plot, the two plots will show different colors for the same label.
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (default is .75). To add text above the bars, matplotlib 3.4 and later provide the bar_label function.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    ax[1].bar_label(ax[1].containers[0])
Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
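A minimal sketch of these three ideas follows. It is not the original answer's code: the category values and the color list are made up, and the bar chart uses plain matplotlib ax.bar with color= instead of seaborn with palette=, simply to keep the example independent of the seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up category data (hypothetical stand-in for a column such as 'Borough')
values = pd.Series(["Bronx", "Staten Island", "Bronx", "Queens", "Bronx", "Queens"])
counts = values.value_counts()

# Replace spaces by newlines so long labels wrap onto two lines
counts.index = [label.replace(" ", "\n") for label in counts.index]

# One color per category, used in the same order by both plots
colors = ["crimson", "gold", "teal"]

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
ax[0].pie(counts.values, labels=counts.index, colors=colors, autopct="%.2f%%")
bars = ax[1].bar(counts.index, counts.values, color=colors)

# Rearrange spacing so titles and wrapped labels fit into the figure
plt.tight_layout()
```

Because both plots read from the same counts series and the same colors list, each label gets the same color in the pie and in the bar chart.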
How can I plot specific attributes of a time series instead of the default of all attributes in the DataFrame? I would like to make a time series plot of one particular attribute, and another of two particular attributes. Is it possible to make a time series graph of headcount and another time series graph of headcount and tables open? Below is the code I have been using; if I try to select specific variables I get errors. Thanks in advance.
# Load necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load data
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
headcount_df.describe()
headcount_df.columns
ax = plt.figure(figsize=(12, 3)).gca() # define axis
headcount_df.plot(ax = ax)
ax.set_xlabel('Date')
ax.set_ylabel('Number of guests')
ax.set_title('Time series of Casino data')
You might have to mess around with the ticks and some other formatting, but this should get you headed in the right direction.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
filename = 'https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/JitteredHeadCount.csv'
headcount_df = pd.read_csv(filename)
# forward-fill missing dates before parsing them
headcount_df['DateFormat'] = pd.to_datetime(headcount_df['DateFormat'].ffill())
headcount_df.set_index('DateFormat', inplace=True)
headcount_df.sort_index(inplace=True)
headcount_df_to = headcount_df[['TablesOpen']]
headcount_df_hc_to = headcount_df[['HeadCount', 'TablesOpen']]
fig, axes = plt.subplots(nrows=2, ncols=1,
figsize=(12, 8))
headcount_df_to.plot(ax=axes[0], color=['orange'])
headcount_df_hc_to.plot(ax=axes[1], color=['blue', 'orange'])
axes[0].set_xlabel('Date')
axes[0].set_ylabel('Tables Open')
axes[0].legend(loc='center left', bbox_to_anchor=(1, 0.5))
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Number of guests and Tables Open')
axes[1].legend(loc='center left', bbox_to_anchor=(1, 0.5))
fig.suptitle('Time Series of Casino data')
I have the following graphic generated with the following code
I want to correct the x-axis display to make the date more readable.
I would also like to be able to enlarge the graph
My code is :
import requests
import urllib.parse
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def get_api_call(ids, **kwargs):
    API_BASE_URL = "https://apis.datos.gob.ar/series/api/"
    kwargs["ids"] = ",".join(ids)
    return "{}{}?{}".format(API_BASE_URL, "series", urllib.parse.urlencode(kwargs))
df = pd.read_csv(get_api_call(
["168.1_T_CAMBIOR_D_0_0_26", "101.1_I2NG_2016_M_22",
"116.3_TCRMA_0_M_36", "143.3_NO_PR_2004_A_21", "11.3_VMATC_2004_M_12"],
format="csv", start_date=2018
))
time = df.indice_tiempo
construccion=df.construccion
emae = df.emae_original
time = pd.to_datetime(time)
# avoid shadowing the built-in name list
data = {'date':time,'const':construccion,'EMAE':emae}
dataset = pd.DataFrame(data)
plt.plot( 'date', 'EMAE', data=dataset, marker='o', markerfacecolor='blue', markersize=12, color='skyblue', linewidth=4)
plt.plot( 'date', 'const', data=dataset, marker='', color='olive', linewidth=2)
plt.legend()
To make the x-tick labels more readable, try rotating them. So use, for example, a 90 degree rotation.
plt.xticks(rotation=90)
To enlarge the figure, you can set your own size when creating it at the beginning, for instance:
fig, ax = plt.subplots(figsize=(10, 8))
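Putting both suggestions together, here is a small sketch on a made-up monthly series (the dates and values are hypothetical, not the API data). It also formats the ticks as year-month with DateFormatter and uses fig.autofmt_xdate as an alternative to plt.xticks(rotation=90):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up monthly series (hypothetical stand-in for the API data)
dates = pd.date_range("2018-01-01", periods=12, freq="MS")
dataset = pd.DataFrame({"date": dates, "EMAE": range(100, 112)})

fig, ax = plt.subplots(figsize=(10, 8))  # larger canvas
ax.plot("date", "EMAE", data=dataset, marker="o", color="skyblue")

# Show ticks as year-month and tilt them so they do not overlap
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate(rotation=45)
```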
I am fairly sure that this can be done from the Matplotlib window itself. If you have the latest version you can zoom in on a section of the graph by clicking the zoom button in the bottom left. To make the x-tick labels more readable you can simply click the expand button in the top right or use Sheldore's solution.
I have 15 barh subplots that look like this:
I can't seem to get the legend working; I would like to see [2,3,4] as separate labels in the graph and in the legend.
I'm having trouble making this work for subplots. My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def plot_bars_by_data(data, title):
    fig, axs = plt.subplots(8,2, figsize=(20,40))
    fig.suptitle(title, fontsize=20)
    fig.subplots_adjust(top=0.95)
    plt.rcParams.update({'font.size': 13})
    axs[7,1].remove()
    column_index = 0
    for ax_line in axs:
        for ax in ax_line:
            if column_index < len(data.columns):
                column_name = data.columns[column_index]
                current_column_values = data[column_name].value_counts().sort_index()
                ax.barh([str(i) for i in current_column_values.index], current_column_values.values)
                ax.legend([str(i) for i in current_column_values.index])
                ax.set_title(column_name)
                column_index +=1
    plt.show()
# random data
df_test = pd.DataFrame([np.random.randint(2,5,size=15) for i in range(15)], columns=list('abcdefghijlmnop'))
plot_bars_by_data(df_test, "testing")
I just get an 8x2 grid of bar charts that look like the graph above. How can I fix this?
I'm using Python 3.6 and Jupyter Python notebook.
Use the following lines in your code. I can't put the whole output here as it's a large figure with lots of subplots, so I am showing just one subplot. It turns out that you first have to keep a handle to the bars of your subplot and then pass both the handle and the legend labels to ax.legend to produce the desired legend.
colors = ['r', 'g', 'b']
axx = ax.barh([str(i) for i in current_column_values.index], current_column_values.values, color=colors)
ax.legend(axx, [str(i) for i in current_column_values.index])
Sample Output
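For reference, the same idea as a self-contained sketch for a single subplot. The column values are made up for illustration; the object returned by barh holds one rectangle per bar, which is why it can be passed to ax.legend as the list of handles:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

column = pd.Series([2, 3, 3, 4, 4, 4])  # made-up values for one column
counts = column.value_counts().sort_index()
labels = [str(i) for i in counts.index]

fig, ax = plt.subplots()

# Keep the returned container: it yields one rectangle (handle) per bar
bars = ax.barh(labels, counts.values, color=["r", "g", "b"])

# Pair each bar handle with its label
legend = ax.legend(bars, labels)
```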