I am trying to create multiple bar charts automatically in a loop using a subplot.
I have created a function to create the parameters for the plot according to how many plots I need like so:
def create_parameters(parameters):
    exec("def f_create_parameters({}): pass".format(', '.join(parameters)))
    return locals()['f_create_parameters']
and the code that uses the function:
parList = []
names = []
even = 2
odd = 1
for i in range (0, len(listOfCategoriesEN)*2):
    parList.append(create_parameters(["ax"+str(odd),"ax"+str(even)]))
    names.append("ax"+str(odd))
    names.append("ax"+str(even))
    odd+=2
    even+=2
Then this is the code where I am trying to create a single figure with multiple plots. All the plots end up overlaid on the last bar graph. Any idea how to fix it?
val = 0
fig2, (parList) = plt.subplots(len(listOfCategoriesEN)*2,2,figsize=(20,20))
for name,dict_ in categoriesDict.items():
    df = pd.DataFrame.from_dict(dict_, orient='index', columns=["Title", "Pageviews"])
    df = df.sort_values(by=['Pageviews'], ascending=False)
    df[ "Pageviews"] = df[ "Pageviews"].astype(int)
    #get top 5
    df1 = df.head(5)
    df1 = df1.sort_values(by=['Pageviews'], ascending=True)
    df1['Title'] = df1['Title'].str.replace('’','\'')
    if(not df1.empty):
        x = df1['Title']
        y = df1['Pageviews']
        locals()[names[val]].barh(x, y, color=colours)
        #locals()[names[val]].set_title(name+" TOP 5 PAGES")
        val+=1
plt.show()
parList, as returned by plt.subplots, is a 2D array holding all your subplots (Axes).
By using plt.sca(ax) (sca = set current Axes) you make ax the active Axes, and then plot your data onto it:
val = 0
for name,dict_ in categoriesDict.items():
    # do your data stuff
    if(not df1.empty):
        x = df1['Title']
        y = df1['Pageviews']
        ax = parList[val//2, val%2] # needs to be changed if you rearrange your plots
        plt.sca(ax)
        plt.barh(x, y, color=colours)  # draws on the Axes selected by plt.sca
        #plt.title(name+" TOP 5 PAGES")
        val+=1
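As an alternative to plt.sca, each Axes can be addressed directly while looping over the grid. The following is only a minimal sketch under assumptions: the categories dictionary is a made-up stand-in for categoriesDict, and the 2x2 grid size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for categoriesDict from the question
categories = {
    "news": {"p1": ["Page A", 120], "p2": ["Page B", 80], "p3": ["Page C", 45]},
    "empty": {},
    "sport": {"p4": ["Page D", 300], "p5": ["Page E", 10]},
}

# Keep only categories that have data, so no grid slot is wasted
non_empty = {name: pages for name, pages in categories.items() if pages}

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
# axes.flat walks the 2D grid row by row; zip stops at the shorter sequence
for ax, (name, pages) in zip(axes.flat, non_empty.items()):
    df = pd.DataFrame.from_dict(pages, orient="index", columns=["Title", "Pageviews"])
    top5 = df.sort_values("Pageviews", ascending=False).head(5)
    top5 = top5.sort_values("Pageviews", ascending=True)
    ax.barh(top5["Title"], top5["Pageviews"])  # draw on this Axes explicitly
    ax.set_title(name + " TOP 5 PAGES")

fig.tight_layout()
```

Calling methods on ax itself avoids any dependence on which Axes happens to be current, so the locals() lookup and the names list are no longer needed.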
This is a problem I encounter very often. I have a plotly figure with column and row facets. I have already unlinked the y axes using fig.update_yaxes(matches=None). However, while this is useful to scale axes between rows 1 and 2 as they exist in quite different domains, it breaks the ability to compare among column facets. You can see this issue in the plot below:
So my question is, how can I have the same y axes across all column facets in each row, while having different y axes for row 1 and row 2?
In order to ensure a row-wise matching you'll have to specify the following for the first row:
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
And this for the second:
fig.layout.yaxis4.matches = 'y4'
fig.layout.yaxis5.matches = 'y4'
fig.layout.yaxis6.matches = 'y4'
As you can see, all y-axes are tied to the first y-axis of each corresponding row.
For those of you who would like to try it out, here's an example that builds on a facet plot
Complete code:
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
                 facet_col='year', facet_col_wrap=4)
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
fig.layout.yaxis4.matches = 'y'
fig.layout.yaxis5.matches = 'y5'
fig.layout.yaxis6.matches = 'y5'
fig.layout.yaxis7.matches = 'y5'
fig.layout.yaxis8.matches = 'y5'
fig.layout.yaxis9.matches = 'y9'
fig.layout.yaxis10.matches = 'y9'
fig.layout.yaxis11.matches = 'y9'
fig.layout.yaxis12.matches = 'y9'
fig.show()
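The per-row assignments above follow a regular pattern, so they could be generated instead of typed out. The helper below is a hypothetical sketch (row_matches is not part of plotly); it assumes the y-axes are numbered consecutively, row by row, with n_cols axes per row, as in the example above.

```python
def row_matches(n_axes, n_cols):
    """Map each layout y-axis name to the first y-axis of its row.

    Assumes axes are numbered 1..n_axes consecutively, n_cols per row.
    """
    mapping = {}
    for k in range(1, n_axes + 1):
        first_in_row = ((k - 1) // n_cols) * n_cols + 1
        # Plotly names the first axis 'yaxis' / 'y' without a number
        axis_name = "yaxis" if k == 1 else f"yaxis{k}"
        anchor = "y" if first_in_row == 1 else f"y{first_in_row}"
        mapping[axis_name] = anchor
    return mapping


mapping = row_matches(n_axes=12, n_cols=4)

# The mapping could then be applied to a figure, for example:
# for axis_name, anchor in mapping.items():
#     fig.update_layout({axis_name: {"matches": anchor}})
```

For the 12 facets wrapped at 4 columns, this reproduces the same pairs as the hand-written lines: the first four axes match 'y', the next four 'y5', and the last four 'y9'.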
nrows = df.row_var.nunique() # or find a way to get number of rows from fig object..
for i in range(0,nrows):
    fig.update_yaxes(showticklabels=True, matches=f'y{i+1}', col=i+1)
https://github.com/plotly/plotly_express/issues/147#issuecomment-537814046
I am trying to create a grid of subplots for predetermined x and y data. The functions should iterate through a pandas DataFrame, identify categorical variables, and plot the x and y data with one line for each level of a given categorical variable. The number of plots equals the number of categorical variables, and the number of lines on each plot should reflect the number of categories for that variable.
I initially tried to group the DataFrame in a for loop on a given categorical variable, but I have had mixed results. I think my issue is in how I am assigning the axis the lines get drawn on.
def grouping_for_graphs(df,x_col, y_col,category,func):
    '''
    function to group dataframe given a variable and
    aggregation function
    '''
    X = df[x_col].name
    y = df[y_col].name
    category = df[category].name
    df_grouped = df.groupby([X, category])[y].apply(func)
    return df_grouped.reset_index()
# create a list of categorical variables to plot
cat_list = []
col_list = list(df.select_dtypes(include = ['object']).columns)
for col in col_list:
    if len(df[col].unique()) < 7:
        cat_list.append(col)
# create plots and axes
fig, axs = plt.subplots(2, 2, figsize=(30,24))
axs = axs.flatten()
# pick plot function
plot_func = plt.plot
# plot this
for ax, category in zip(axs, cat_list):
    df_grouped = grouping_for_graphs(df,x_col, y_col,category,agg_func)
    x_col = df_grouped.columns[0]
    y_col = df_grouped.columns[-1]
    category = str(list(df_grouped.columns.drop([x_lab, y_lab]))[0])
    for feature in list(df_grouped[category].unique()):
        X = df_grouped[df_grouped[category] == feature][x_col]
        y = df_grouped[df_grouped[category] == feature][y_col]
        ax.plot = plot_func(X,y)
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
        ax.set_title(feature)
Other than getting an error that ax.plot is a 'list' object and is not callable, all the lines drawn are put on the final plot of the subplots.
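I am confused by your plot_func. plt.plot always draws on the current Axes, which after plt.subplots is the last subplot created, so every line lands on the final plot. Assigning its return value to ax.plot also replaces the method with a list of Line2D objects, which is where the "'list' object is not callable" error comes from. Remove plot_func and plot directly using ax.plot(X, y). The modified line is marked with a comment: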
fig, axs = plt.subplots(2, 2, figsize=(30,24))
axs = axs.flatten()
for ax, category in zip(axs, cat_list):
    df_grouped = grouping_for_graphs(df,x_col, y_col,category,agg_func)
    x_col = df_grouped.columns[0]
    y_col = df_grouped.columns[-1]
    category = str(list(df_grouped.columns.drop([x_lab, y_lab]))[0])
    for feature in list(df_grouped[category].unique()):
        X = df_grouped[df_grouped[category] == feature][x_col]
        y = df_grouped[df_grouped[category] == feature][y_col]
        ax.plot(X,y) # <--- Modified here
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
        ax.set_title(feature)
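To see the pattern in isolation, here is a small self-contained sketch of one subplot per categorical variable with one line per level. The DataFrame, its column names (year, sales, region, tier) and the use of mean as the aggregation are all made up for illustration; they are not taken from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one numeric x, one numeric y, two categorical columns
df = pd.DataFrame({
    "year":   [2000, 2001, 2000, 2001, 2000, 2001],
    "sales":  [1.0, 2.0, 3.0, 5.0, 2.0, 4.0],
    "region": ["N", "N", "S", "S", "N", "S"],
    "tier":   ["a", "b", "a", "b", "c", "c"],
})
cat_list = ["region", "tier"]

fig, axs = plt.subplots(1, 2, figsize=(10, 4))
for ax, category in zip(axs.flat, cat_list):
    # Aggregate y per (x, level) pair, then draw one line per level
    grouped = df.groupby(["year", category])["sales"].mean().reset_index()
    for level, part in grouped.groupby(category):
        ax.plot(part["year"], part["sales"], label=str(level))
    ax.set_xlabel("year")
    ax.set_ylabel("sales")
    ax.set_title(category)
    ax.legend(title=category)
```

Titling each subplot with the categorical variable and putting the levels in the legend is a design choice of this sketch; the original code titles the subplot with the last level plotted.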
If I draw the plot using the following code, it works and I can see all the subplots in a single row. I can specifically break the number of cols into three or two and show them. But I have 30 columns and I wanted to use a loop mechanism so that they are plotted in a grid of say 4x4 sub-plots
regressionCols = ['col_a', 'col_b', 'col_c', 'col_d', 'col_e']
sns.pairplot(numerical_df, x_vars=regressionCols, y_vars='price',height=4, aspect=1, kind='scatter')
plt.show()
The code using loop is below. However, I don't see anything rendered.
nr_rows = 4
nr_cols = 4
li_cat_cols = list(regressionCols)
fig, axs = plt.subplots(nr_rows, nr_cols, figsize=(nr_cols*4,nr_rows*4), squeeze=False)
for r in range(0, nr_rows):
    for c in range(0,nr_cols):
        i = r*nr_cols+c
        if i < len(li_cat_cols):
            sns.set(style="darkgrid")
            bp=sns.pairplot(numerical_df, x_vars=li_cat_cols[i], y_vars='price',height=4, aspect=1, kind='scatter')
            bp.set(xlabel=li_cat_cols[i], ylabel='Price')
plt.tight_layout()
plt.show()
Not sure what I am missing.
I think you didn't connect the subplot slots in your grid to the scatter plots generated in the loop.
Maybe this solution, which uses pandas plotting, would suit you:
For example,
1. Let's simply define an empty pandas DataFrame.
numerical_df = pd.DataFrame([])
2. Create some random features and price depending on them:
numerical_df['A'] = np.random.randn(100)
numerical_df['B'] = np.random.randn(100)*10
numerical_df['C'] = np.random.randn(100)*-10
numerical_df['D'] = np.random.randn(100)*2
numerical_df['E'] = 20*(np.random.randn(100)**2)
numerical_df['F'] = np.random.randn(100)
numerical_df['price'] = 2*numerical_df['A'] +0.5*numerical_df['B'] - 9*numerical_df['C'] + numerical_df['E'] + numerical_df['D']
3. Define number of rows and columns. Create a subplots space with nr_rows and nr_cols.
nr_rows = 2
nr_cols = 4
fig, axes = plt.subplots(nrows=nr_rows, ncols=nr_cols, figsize=(15, 8))
4. Enumerate each feature in dataframe and plot a scatterplot with price:
for idx, feature in enumerate(numerical_df.columns[:-1]):
    numerical_df.plot(feature, "price", subplots=True,kind="scatter",ax=axes[idx // 4,idx % 4])
where axes[idx // 4, idx % 4] defines the location of each scatterplot in a matrix you create in (3.)
So we get a matrix plot:
[Figure: Scatterplot matrix]
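The same idea can be written with plain matplotlib calls. sns.pairplot is a figure-level function that builds its own figure on every call, which is why the grid created with plt.subplots in the question stays empty; an Axes-level call (ax.scatter here, or sns.scatterplot with its ax argument) draws into the grid instead. The sketch below uses made-up random data and a 2x3 grid as assumptions, and also hides the unused slot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: five random features and a price that depends on some of them
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("ABCDE"))
data["price"] = 2 * data["A"] - data["C"] + rng.normal(size=50)

features = [col for col in data.columns if col != "price"]
nr_rows, nr_cols = 2, 3
fig, axes = plt.subplots(nr_rows, nr_cols,
                         figsize=(nr_cols * 4, nr_rows * 4), squeeze=False)

# One scatter plot per feature, filling the grid row by row
for ax, feature in zip(axes.flat, features):
    ax.scatter(data[feature], data["price"], s=10)
    ax.set_xlabel(feature)
    ax.set_ylabel("price")

# Hide any grid slots that did not receive a plot
for ax in axes.flat[len(features):]:
    ax.set_visible(False)

fig.tight_layout()
```

Iterating over axes.flat avoids the idx // 4, idx % 4 arithmetic, so the loop keeps working if nr_cols changes.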
This is my plot:
If you look at the axis labelled 'B', you'll see that not everything is as it should be.
The plot was produced using this:
def newPoly3D(self):
    from matplotlib.cm import autumn
    # This passes a pandas dataframe of shape (data on rows x 4 columns)
    df = self.loadData()
    fig = plt.figure(figsize=(10,10))
    ax = fig.gca(projection='3d')
    vels = [1.42,1.11,0.81,0.50]
    which_joints = df.columns
    L = len(which_joints)
    dmin,dmax = df.min().min(),df.max().max()
    dix = df.index.values
    offset=-5
    for i,j in enumerate(which_joints):
        ax.add_collection3d(plt.fill_between(dix,df[j],
                                             dmin,
                                             lw=1.5,
                                             alpha=0.3/float(i+1.),
                                             facecolor=autumn(i/float(L))),
                            zs=vels[i],
                            zdir='y')
    ax.grid(False)
    ax.set_xlabel('A')
    ax.set_xlim([0,df.index[-1]])
    ax.set_xticks([])
    ax.xaxis.set_ticklabels([])
    ax.set_axis_off
    ax.set_ylabel('B')
    ax.set_ylim([0.4, max(vels)+0.075])
    ax.set_yticks(vels)
    ax.tick_params(direction='out', pad=10)
    ax.set_zlabel('C')
    ax.set_zlim([dmin,dmax])
    ax.xaxis.labelpad = -10
    ax.yaxis.labelpad = 15
    ax.zaxis.labelpad = 15
    # Note the inversion of the axis
    plt.gca().invert_yaxis()
First, I want to align the ticks on the y-axis (labelled B) with each coloured face. As you can see, they are currently offset slightly downwards.
Second, I want to align the y-axis tick labels with those ticks; as you can see, they are currently offset a long way downwards. I do not know why.
EDIT:
Here is some example data; each column represents one coloured face on the above plot.
-13.216256 -7.851065 -9.965357 -25.502654
-13.216253 -7.851063 -9.965355 -25.502653
-13.216247 -7.851060 -9.965350 -25.502651
-13.216236 -7.851052 -9.965342 -25.502647
-13.216214 -7.851038 -9.965324 -25.502639
-13.216169 -7.851008 -9.965289 -25.502623
-13.216079 -7.850949 -9.965219 -25.502592
-13.215900 -7.850830 -9.965078 -25.502529
Here we are again, with a simpler plot, reproduced with this data:
k = 10
df = pd.DataFrame(np.array([range(k),
                            [x + 1 for x in range(k)],
                            [x + 4 for x in range(k)],
                            [x + 9 for x in range(k)]]).T,columns=list('abcd'))
If you want to try this with the above function, comment out the df line in the function and change its signature to def newPoly3D(df): so that you can pass in the test df above.
I'm trying to plot data from two separate MultiIndex DataFrames that share the same levels.
Currently, this generates two separate plots, and I'm unable to customise the legend by appending a string that distinguishes each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
    if len(symbols) < 1:
        print "Try again with a symbol list. (Time constraints)"
    else:
        df_ante = df_ante.loc[symbols]
        df_post = df_post.loc[symbols]
        ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
        post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
        print "ante_leg", ante_leg
        ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
        ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
        ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
        ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
[Figure: Current plots]
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issues of plotting from multiple frames using concatenation, however the legend does not match the line colors on the graph.
There are no explicit calls to legend, and the label parameter of plot() has not been used.
Code:
df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
df_ante.index.set_levels([[str(x)+'_ex-ante' for x in df_ante.index.levels[0]],df_ante.index.levels[1]], inplace=True)
df_post.index.set_levels([[str(x)+'_ex-post' for x in df_post.index.levels[0]],df_post.index.levels[1]], inplace=True)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
Image:
[Figure: MultiIndex plot]
I think that with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you store the output of plot() in ax, including the lines, and it then gets overwritten by the second call. Am I right that the lines which were plotted first are missing?
The usual procedure would be something like
fig = plt.figure(figsize=(5, 5)) # size in inches
ax = fig.add_subplot(111) # if you want only one axes
Now you have an axes object in ax and can pass it to the subsequent plots through their ax argument.
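A minimal sketch of that idea, assuming two small made-up DataFrames in place of the real ex-ante and ex-post results (the values and the 'AAPL' label are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for one symbol's ex-ante and ex-post results
ante = pd.DataFrame({"SHIFT": [0, 1, 2], "MUTUAL_INFORMATION": [0.10, 0.20, 0.15]})
post = pd.DataFrame({"SHIFT": [0, 1, 2], "MUTUAL_INFORMATION": [0.12, 0.18, 0.11]})

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111)

# Passing ax= makes both calls draw on the same Axes instead of new figures;
# label= sets the legend entry for each line
ante.plot(x="SHIFT", y="MUTUAL_INFORMATION", ax=ax, label="AAPL_ex-ante")
post.plot(x="SHIFT", y="MUTUAL_INFORMATION", ax=ax, label="AAPL_ex-post")

ax.set_xlabel("Time-shift of sentiment data (days) with financial data")
ax.set_ylabel("Mutual Information")
labels = ax.get_legend_handles_labels()[1]
```

Because each line gets its own label at plot time, the legend entries stay tied to the correct line colours.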