I have a dataframe with 4 columns. I want to plot it with sns.displot as follows:
g = sns.displot(dataframe, height = 25, kind = "kde", x = "value", fill = True, hue = "Testset", col = "Session", row = "Timepoint")
It produces the following plot with empty subplots, because I don't have all the combinations of values. Is there a way to remove the empty plots and stack the remaining ones under one another?
This is a problem I encounter very often. I have a plotly figure with column and row facets. I have already unlinked the y axes using fig.update_yaxes(matches=None). However, while this is useful to scale axes between rows 1 and 2 as they exist in quite different domains, it breaks the ability to compare among column facets. You can see this issue in the plot below:
So my question is, how can I have the same y axes across all column facets in each row, while having different y axes for row 1 and row 2?
To ensure row-wise matching, you'll have to specify the following for the first row:
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
And this for the second:
fig.layout.yaxis4.matches = 'y4'
fig.layout.yaxis5.matches = 'y4'
fig.layout.yaxis6.matches = 'y4'
As you can see, all y-axes are tied to the first y-axis of each corresponding row.
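The same pattern generalises to a grid of any size. Here is a minimal sketch, assuming the y-axes are numbered consecutively row by row as in the listing above (`yaxis`, `yaxis2`, ...). The helper names `axis_suffix`, `row_matches` and `apply_row_matches` are hypothetical and not part of plotly:

```python
def axis_suffix(n):
    """Plotly names the first axis 'y'/'yaxis' and later ones 'y2'/'yaxis2', ..."""
    return "" if n == 1 else str(n)


def row_matches(n_rows, n_cols):
    """Map each layout key ('yaxis', 'yaxis2', ...) to the axis id it should match.

    Assumption: axes are numbered consecutively, row by row.
    """
    mapping = {}
    for row in range(n_rows):
        first = row * n_cols + 1           # number of the first axis in this row
        target = "y" + axis_suffix(first)  # e.g. 'y', 'y4', 'y7'
        for col in range(n_cols):
            mapping["yaxis" + axis_suffix(first + col)] = target
    return mapping


def apply_row_matches(fig, n_rows, n_cols):
    """Apply the mapping to a figure whose grid really is n_rows x n_cols."""
    for layout_key, target in row_matches(n_rows, n_cols).items():
        fig.layout[layout_key].matches = target
    return fig


# The 2 x 3 case reproduces the six assignments listed above.
mapping = row_matches(n_rows=2, n_cols=3)
```

If your figure numbers its axes in a different order, inspect `fig.layout` first and adjust the mapping accordingly.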
For those of you who would like to try it out, here's an example that builds on a facet plot.
Complete code:
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
                 facet_col='year', facet_col_wrap=4)
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
fig.layout.yaxis4.matches = 'y'
fig.layout.yaxis5.matches = 'y5'
fig.layout.yaxis7.matches = 'y5'
fig.layout.yaxis6.matches = 'y5'
fig.layout.yaxis8.matches = 'y5'
fig.layout.yaxis9.matches = 'y9'
fig.layout.yaxis10.matches = 'y9'
fig.layout.yaxis11.matches = 'y9'
fig.layout.yaxis12.matches = 'y9'
nrows = df.row_var.nunique() # or find a way to get number of rows from fig object..
for i in range(0,nrows):
    fig.update_yaxes(showticklabels=True, matches=f'y{i+1}', col=i+1)
I am trying to create a grid of subplots for predetermined x and y data. The function should iterate through a pandas DataFrame, identify the categorical variables, and plot the x and y data with one line for each level of a given categorical variable. The number of plots equals the number of categorical variables, and the number of lines on each plot should reflect the number of categories for that variable.
I initially tried to group the DataFrame in a for loop on a given categorical variable, but I have had mixed results. I think my issue is in how I am assigning the axis that the lines get drawn on.
def grouping_for_graphs(df, x_col, y_col, category, func):
    """
    Function to group a dataframe by a given variable and
    apply an aggregation function.
    """
    X = df[x_col].name
    y = df[y_col].name
    category = df[category].name
    df_grouped = df.groupby([X, category])[y].apply(func)
    return df_grouped.reset_index()
# create a list of categorical variables to plot
cat_list = []
col_list = list(df.select_dtypes(include = ['object']).columns)
for col in col_list:
    if len(df[col].unique()) < 7:
        cat_list.append(col)
# create plots and axes
fig, axs = plt.subplots(2, 2, figsize=(30,24))
axs = axs.flatten()
# pick plot function
plot_func = plt.plot
# plot this
for ax, category in zip(axs, cat_list):
    df_grouped = grouping_for_graphs(df,x_col, y_col,category,agg_func)
    x_col = df_grouped.columns[0]
    y_col = df_grouped.columns[-1]
    category = str(list(df_grouped.columns.drop([x_lab, y_lab]))[0])
    for feature in list(df_grouped[category].unique()):
        X = df_grouped[df_grouped[category] == feature][x_col]
        y = df_grouped[df_grouped[category] == feature][y_col]
        ax.plot = plot_func(X,y)
Other than getting an error that ax.plot is a 'list' object and is not callable, all the lines drawn are put on the final plot of the subplots.
I am confused by your plot_func. Remove it and plot directly with ax.plot(X, y). The modified line is marked with a comment:
fig, axs = plt.subplots(2, 2, figsize=(30,24))
axs = axs.flatten()
for ax, category in zip(axs, cat_list):
    df_grouped = grouping_for_graphs(df,x_col, y_col,category,agg_func)
    x_col = df_grouped.columns[0]
    y_col = df_grouped.columns[-1]
    category = str(list(df_grouped.columns.drop([x_lab, y_lab]))[0])
    for feature in list(df_grouped[category].unique()):
        X = df_grouped[df_grouped[category] == feature][x_col]
        y = df_grouped[df_grouped[category] == feature][y_col]
        ax.plot(X,y) # <--- Modified here
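To see why this fixes both symptoms: plt.plot draws on the current axes (here the last one created by plt.subplots) and returns a list of Line2D objects, so `ax.plot = plot_func(X, y)` draws in the wrong place and also overwrites the ax.plot method with a list. Below is a small self-contained sketch with made-up data. The column names `x`, `y`, `colour` and `size` are illustrative assumptions, not taken from the question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: two categorical columns with 2 and 3 levels.
df = pd.DataFrame({
    "x": [1, 2, 3] * 6,
    "y": range(18),
    "colour": ["red"] * 9 + ["blue"] * 9,
    "size": (["S"] * 3 + ["M"] * 3 + ["L"] * 3) * 2,
})
cat_list = ["colour", "size"]

fig, axs = plt.subplots(1, 2, figsize=(10, 4))
for ax, category in zip(axs.flatten(), cat_list):
    # Mean of y for every (x, level) pair of this categorical variable.
    grouped = df.groupby(["x", category])["y"].mean().reset_index()
    for level, part in grouped.groupby(category):
        ax.plot(part["x"], part["y"], label=str(level))  # draw on *this* axes
    ax.set_title(category)
    ax.legend()

# One line per level on each axes.
lines_per_axes = [len(ax.get_lines()) for ax in axs.flatten()]
```

Call plt.show() or fig.savefig(...) afterwards to view the result.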
If I draw the plot using the following code, it works and I can see all the subplots in a single row. I can manually split the columns into groups of two or three and show them. But I have 30 columns, and I want to use a loop so that they are plotted in a grid of, say, 4x4 subplots.
regressionCols = ['col_a', 'col_b', 'col_c', 'col_d', 'col_e']
sns.pairplot(numerical_df, x_vars=regressionCols, y_vars='price',height=4, aspect=1, kind='scatter')
The code using loop is below. However, I don't see anything rendered.
nr_rows = 4
nr_cols = 4
li_cat_cols = list(regressionCols)
fig, axs = plt.subplots(nr_rows, nr_cols, figsize=(nr_cols*4,nr_rows*4), squeeze=False)
for r in range(0, nr_rows):
    for c in range(0,nr_cols):
        i = r*nr_cols+c
        if i < len(li_cat_cols):
            bp=sns.pairplot(numerical_df, x_vars=li_cat_cols[i], y_vars='price',height=4, aspect=1, kind='scatter')
            bp.set(xlabel=li_cat_cols[i], ylabel='Price')
Not sure what I am missing.
I think you didn't connect the scatter plots generated in the loop to the subplot axes in your grid.
Maybe this solution, which uses pandas plots with an explicit axes, will work for you.
For example:
1. Let's simply define an empty pandas dataframe.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
numerical_df = pd.DataFrame([])
2. Create some random features and price depending on them:
numerical_df['A'] = np.random.randn(100)
numerical_df['B'] = np.random.randn(100)*10
numerical_df['C'] = np.random.randn(100)*-10
numerical_df['D'] = np.random.randn(100)*2
numerical_df['E'] = 20*(np.random.randn(100)**2)
numerical_df['F'] = np.random.randn(100)
numerical_df['price'] = 2*numerical_df['A'] +0.5*numerical_df['B'] - 9*numerical_df['C'] + numerical_df['E'] + numerical_df['D']
3. Define number of rows and columns. Create a subplots space with nr_rows and nr_cols.
nr_rows = 2
nr_cols = 4
fig, axes = plt.subplots(nrows=nr_rows, ncols=nr_cols, figsize=(15, 8))
4. Enumerate each feature in dataframe and plot a scatterplot with price:
for idx, feature in enumerate(numerical_df.columns[:-1]):
    numerical_df.plot(feature, "price", subplots=True,kind="scatter",ax=axes[idx // 4,idx % 4])
Here axes[idx // 4, idx % 4] selects the location of each scatterplot in the grid created in step 3: idx // 4 is the row and idx % 4 is the column, because the grid has 4 columns.
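The same placement can be written without hard-coding the grid width. This is a sketch that swaps in plain matplotlib `Axes.scatter` for `DataFrame.plot` and uses random stand-in data. Iterating over `axes.ravel()` walks the grid row by row, and any cells left without a feature are hidden:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in data: six features plus a price column.
rng = np.random.default_rng(0)
numerical_df = pd.DataFrame(rng.normal(size=(100, 6)), columns=list("ABCDEF"))
numerical_df["price"] = 2 * numerical_df["A"] - 9 * numerical_df["C"]

features = [c for c in numerical_df.columns if c != "price"]
nr_rows, nr_cols = 2, 4
fig, axes = plt.subplots(nr_rows, nr_cols, figsize=(15, 8), squeeze=False)
flat_axes = axes.ravel()  # 1-D view of the grid, row by row

for ax, feature in zip(flat_axes, features):
    ax.scatter(numerical_df[feature], numerical_df["price"], s=10)
    ax.set_xlabel(feature)
    ax.set_ylabel("price")

# Hide grid cells that have no feature to show (here 8 cells, 6 features).
for ax in flat_axes[len(features):]:
    ax.set_visible(False)

visible = [ax.get_visible() for ax in flat_axes]
```

With 30 columns, only `nr_rows` and `nr_cols` need to change, for example to an 8 x 4 grid.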
This gives the following matrix plot:
Scatterplot matrix
I have a pandas DataFrame containing NaN values. I want to make a bar plot with the index values on the x axis and a bar for each column, grouped by index. I would like to plot only the bars that have an actual value.
So far, from this example:
df = pandas.DataFrame({'foo':[1,None,None], 'bar':[None,2,0.5], 'col': [1,1.5,None]}, index=["A","B","C"])
I can produce this plot:
What I would like is to remove the blank spaces left for the NaN values, so that the bars are compacted and each group is centered above its x tick.
You can do something like the code below, by going through each row of the dataframe
and checking each column for NaNs.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(
    {"foo": [1, None, None], "bar": [None, 2, 0.5], "col": [1, 1.5, None]},
    index=["A", "B", "C"],
)
# define the colors for each column
colors = {"foo": "blue", "bar": "orange", "col": "green"}
fig = plt.figure(figsize=(10, 6))
ax = plt.gca()
# width of bars
width = 1
# create empty lists for x tick positions and names
x_ticks, x_ticks_pos = [], []
# counter for helping with x tick positions
count = 0
# reset the index
# so that we can iterate through the numbers.
# this will help us to get the x tick positions
df = df.reset_index()
# go through each row of the dataframe
for idx, row in df.iterrows():
    # this will be the first bar position for this row
    count += idx
    # this will be the start of the first bar for this row
    start_idx = count - width / 2
    # this will be the end of the last bar for this row
    end_idx = start_idx
    # for each column in the wanted columns,
    # if the row is not null,
    # add the bar to the plot
    # also update the end position of the bars for this row
    for column in df.drop(["index"], axis=1).columns:
        # NaN is never equal to itself, so this skips missing values
        if row[column] == row[column]:
            plt.bar(count, row[column], color=colors[column], width=width, label=column)
            count += 1
            end_idx += width
    # this checks if the row had any not NULL value in the desired columns
    # in other words, it checks if there was any bar for this row
    # if yes, add the center of all the row's bars and the row's name (A,B,C) to the respective lists
    if end_idx != start_idx:
        x_ticks_pos.append((end_idx + start_idx) / 2)
        x_ticks.append(row["index"])
# now set the x_ticks
plt.xticks(x_ticks_pos, x_ticks)
# also plot the legends
# and make sure to not display duplicate labels
# the below code is taken from:
# https://stackoverflow.com/questions/13588920/stop-matplotlib-repeating-labels-in-legend
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys())
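An alternative way to compute the positions, sketched here under the assumption that every group should stay centered on its integer tick: for a row with k non-NaN values, bar j is placed at tick + (j - (k - 1) / 2) * width. The bar width of 0.25 and the helper name `bar_positions` are illustrative choices, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": [1, None, None], "bar": [None, 2, 0.5], "col": [1, 1.5, None]},
    index=["A", "B", "C"],
)


def bar_positions(frame, width=0.25):
    """Return (x, height, column) for every non-NaN cell, centered per row."""
    bars = []
    for tick, (_, row) in enumerate(frame.iterrows()):
        present = row.dropna()  # keep only the columns that have a value
        k = len(present)
        for j, (column, value) in enumerate(present.items()):
            # Spread the k bars symmetrically around the integer tick.
            x = tick + (j - (k - 1) / 2) * width
            bars.append((x, value, column))
    return bars


bars = bar_positions(df)
```

Each tuple can then be drawn with `plt.bar(x, height, width=0.25, color=colors[column], label=column)`, and the ticks set with `plt.xticks(range(len(df)), df.index)`.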
I'm trying to plot data from 2 separate MultiIndex DataFrames that share the same data as levels.
Currently, this generates two separate plots, and I'm unable to customise the legend by appending a string that distinguishes each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
    if len(symbols) < 1:
        print("Try again with a symbol list. (Time constraints)")
    df_ante = df_ante.loc[symbols]
    df_post = df_post.loc[symbols]
    ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
    post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
    print("ante_leg", ante_leg)
    ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
    ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
    ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
    ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
Current plots
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issue of plotting from multiple frames by using concatenation; however, the legend does not match the line colors on the graph.
There are no specific calls to legend, and the label parameter in plot() has not been used.
df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
df_ante.index.set_levels([[str(x)+'_ex-ante' for x in df_ante.index.levels[0]],df_ante.index.levels[1]], inplace=True)
df_post.index.set_levels([[str(x)+'_ex-post' for x in df_post.index.levels[0]],df_post.index.levels[1]], inplace=True)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
MultiIndex Plot Image
I think, with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you put the output of plot() into ax, including the lines, and it then gets overwritten by the second call. Am I right that the lines which were plotted first are missing?
The usual approach would be something like:
fig = plt.figure(figsize=(5, 5)) # size in inches
ax = fig.add_subplot(111) # if you want only one axes
Now you have an axes object in ax and can pass it as input to the next plots.
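Continuing that idea, here is a minimal sketch with made-up numbers: create one axes, then pass it to each pandas plot call through the `ax` parameter so both frames share it, and use `label` to tell the lines apart. The frames are flat stand-ins for a single symbol, not the MultiIndex data from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for one symbol's ex-ante and ex-post results.
df_ante = pd.DataFrame({"SHIFT": [0, 1, 2], "MUTUAL_INFORMATION": [0.10, 0.20, 0.15]})
df_post = pd.DataFrame({"SHIFT": [0, 1, 2], "MUTUAL_INFORMATION": [0.05, 0.12, 0.30]})

fig = plt.figure(figsize=(5, 5))  # size in inches
ax = fig.add_subplot(111)         # a single axes shared by both plots

# Passing ax=ax makes pandas draw on the existing axes instead of a new figure.
df_ante.plot(x="SHIFT", y="MUTUAL_INFORMATION", ax=ax, label="AAL_ex-ante")
df_post.plot(x="SHIFT", y="MUTUAL_INFORMATION", ax=ax, label="AAL_ex-post")

ax.set_xlabel("Time-shift of sentiment data (days) with financial data")
ax.set_ylabel("Mutual Information")
ax.legend()

labels = ax.get_legend_handles_labels()[1]
```

Because every line is added to the same axes with its own label, the legend entries and line colors stay in step.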