I'm having trouble making a bar plot from a pandas DataFrame. It should be easy, but I can't make it work.
I have a dataframe that looks like this:
           Data A  Data B  Data C
timestamp
06:54:00      0.1     0.2     0.3
But instead of 3 columns with Data, I have 99.
I am trying to make a bar plot with the different Data columns on the x axis and their values on the y axis.
I tried with:
p = data.hvplot.bar(x = 'Data', y = 'Units', rot = 90)
And
p = data.plot(kind='bar', title="Data", figsize=(15, 10), legend=True, fontsize=12)
But neither of them works, and I think the problem comes from the format of my dataframe, because of the 'timestamp' column.
However, I haven't managed to delete it. I tried:
data = data.droplevel('timestamp')
And:
data = data.drop(['timestamp'], axis=1)
But neither of them works. Could someone please give me a hand with this?
I finally managed to solve it.
What I did was:
new_df = data.melt(var_name="Data")
To get a new dataframe without the timestamp.
Then:
titles = new_df['Data'].to_list()
values = new_df['value'].to_list()
To get two lists, one with the titles and another one with the values.
And then I plotted the chart with the following code:
from bokeh.plotting import figure, show

p = figure(x_range=titles, height=500, width=1500, title="Unit",
           toolbar_location=None, tools="")
p.vbar(x=titles, top=values, width=0.6)
p.xgrid.grid_line_color = None
p.xaxis.major_label_orientation = "vertical"
p.y_range.start = 0
show(p)
Thank you all.
You can try this:
new_df = df.melt(var_name="Data")
new_df.plot(kind='bar', x='Data', y='value')
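A minimal sketch of why melt gets rid of timestamp while drop(..., axis=1) does not. It assumes, as the layout in the question suggests, that timestamp is the index rather than a column; the one-row frame is made up for illustration:

```python
import pandas as pd

# Hypothetical one-row frame shaped like the one in the question
data = pd.DataFrame(
    {"Data A": [0.1], "Data B": [0.2], "Data C": [0.3]},
    index=pd.Index(["06:54:00"], name="timestamp"),
)

# 'timestamp' is the index name, not a column, so drop(axis=1) cannot find it
try:
    data.drop(["timestamp"], axis=1)
    drop_failed = False
except KeyError:
    drop_failed = True

# melt turns the wide frame into long form and discards the index by default
new_df = data.melt(var_name="Data")
titles = new_df["Data"].to_list()
values = new_df["value"].to_list()
```

After this, new_df has only the Data and value columns, one row per original column, which is the shape a bar plot expects.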
I have a DataFrame like the one below, with Actual and Predicted columns. I want to compare Actual vs. Predicted one-to-one in a bar plot. I have a confidence value for the Predicted column, and the default confidence for Actual is 1. So each row should form a single bar group, with the Actual and Predicted values on the x axis and the corresponding confidence score as the y value.
I am unable to get the expected plot because the x-axis values are not aligned or grouped by row.
  Actual Predicted  Confidence
0      A         A        0.90
1      B         C        0.30
2      C         C        0.60
3      D         D        0.75
Expected Bar plot.
Any hint would be appreciated. Please let me know if further details are required.
What I have tried so far.
import pandas as pd
import plotly.express as px

# Actual values, each with the default confidence of 1
df_actual = pd.DataFrame()
df_actual['Key'] = df['Actual'].copy()
df_actual['Confidence'] = 1
df_actual['Identifier'] = 'Actual'
# Predicted values with their confidence scores
df_predicted = df[['Predicted', 'Confidence']].copy()
df_predicted = df_predicted.rename(columns={'Predicted': 'Key'})
df_predicted['Identifier'] = 'Predicted'
df_combined = pd.concat([df_actual, df_predicted], ignore_index=True)
fig = px.bar(df_combined, x="Key", y="Confidence", color='Identifier',
             barmode='group', height=400)
fig.show()
I have found that reshaping the data first makes it easier to get the plot I want. I have used Seaborn; I hope that is OK. Please see if this code works for you. I assume the df shown above is already available. I created df2 so that it matches your expected figure, and I used the index as the x-axis column so that the order is maintained. There are also some adjustments so that the xtick names align and the legend sits outside the plot, as you wanted.
Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

vals = []
conf = []
# Interleave each row as (Actual, confidence 1), (Predicted, its confidence)
for x, y, z in zip(df.Actual, df.Predicted, df.Confidence):
    vals += [x, y]
    conf += [1, z]
df2 = pd.DataFrame({'Values': vals, 'Confidence': conf}).reset_index()
ax = sns.barplot(data=df2, x='index', y='Confidence', hue='Values', dodge=False)
ax.set_xticklabels(['Actual', 'Predicted']*4)
plt.legend(bbox_to_anchor=(1.0, 1))
plt.show()
Plot
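The loop above only interleaves the Actual and Predicted values with their confidences. A loop-free sketch of the same reshaping, using the sample frame from the question (this is an alternative way to build df2, not a change to the plotting calls):

```python
import numpy as np
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    "Actual": ["A", "B", "C", "D"],
    "Predicted": ["A", "C", "C", "D"],
    "Confidence": [0.90, 0.30, 0.60, 0.75],
})

# Row-major ravel interleaves the two columns: A, A, B, C, C, C, D, D
vals = df[["Actual", "Predicted"]].to_numpy().ravel()
# Pair a confidence of 1 (Actual) with each Predicted confidence, then flatten
conf = np.column_stack([np.ones(len(df)), df["Confidence"]]).ravel()

df2 = pd.DataFrame({"Values": vals, "Confidence": conf}).reset_index()
```

The resulting df2 has the same index, Values and Confidence columns as the loop version, so it can be passed to sns.barplot unchanged.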
Update - grouping Actual and Predicted bars
Hi @Mohammed. As we have already used up hue, I don't think there is an easy way to do this with Seaborn. You would need to use matplotlib and adjust the bar positions, xtick positions, etc. Below is the code that will do this. You can change Set1 to another colormap to change the colors. I have also added a black outline, as bars of the same color were blending into one another. Further, I had to rotate the xlabels, as they were overlapping. You can change this to suit your requirements. Hope this helps.
vals = df[['Actual','Predicted']].melt(value_name='texts')['texts']
conf = [1]*4 + list(df.Confidence)
ident = ['Actual', 'Predicted']*4
df2 = pd.DataFrame({'Values': vals, 'Confidence':conf, 'Identifier':ident}).reset_index()
uvals, uind = np.unique(df2["Values"], return_inverse=True)
cmap = plt.get_cmap("Set1")  # plt.cm.get_cmap is no longer available in recent Matplotlib releases
fig, ax=plt.subplots()
l = len(df2)
pos = np.arange(0,l) % (l//2) + (np.arange(0,l)//(l//2)-1)*0.4
ax.bar(pos, df2["Confidence"], width=0.4, align="edge", ec="k",color=cmap(uind) )
handles=[plt.Rectangle((0,0),1,1, color=cmap(i), ec="k") for i in range(len(uvals))]
ax.legend(handles=handles, labels=list(uvals), prop ={'size':10}, loc=9, ncol=8)
pos=pos+0.2
pos.sort()
ax.set_xticks(pos)
ax.set_xticklabels(df2["Identifier"][:l], rotation=45,ha='right', rotation_mode="anchor")
ax.set_ylim(0, 1.2)
plt.show()
Output plot
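The pos expression in the code above is compact but hard to read. A small sketch of what it computes, assuming 4 groups and the bar width of 0.4 used there (the names n_groups, centers, actual_left and predicted_left are mine, for illustration only):

```python
import numpy as np

n_groups = 4   # one group per row of df
width = 0.4    # bar width passed to ax.bar(...)

centers = np.arange(n_groups)            # group centres: 0, 1, 2, 3
actual_left = centers - width            # 'Actual' bars end at the centre
predicted_left = centers.astype(float)   # 'Predicted' bars start at the centre

# The one-liner from the answer, for comparison
l = 2 * n_groups
pos = np.arange(0, l) % (l // 2) + (np.arange(0, l) // (l // 2) - 1) * 0.4

# Tick positions: the middle of every bar, in left-to-right order
ticks = np.sort(pos + width / 2)
```

So the first half of pos holds the left edges of the Actual bars and the second half those of the Predicted bars. Sorting the shifted positions is what lets the alternating 'Actual', 'Predicted' tick labels line up.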
I updated @Redox's answer to get the exact output.
df_ = pd.DataFrame({'Labels': df.reset_index()[['Actual', 'Predicted', 'index']].values.ravel(),
                    'Confidence': np.array(list(zip(np.repeat(1, len(df)), df['Confidence'].values, np.repeat(0, len(df))))).ravel()})
df_.loc[df_['Labels'].astype(str).str.isdigit(), 'Labels'] = ''
plt.figure(figsize=(15, 6))
ax=sns.barplot(data = df_, x=df_.index, y='Confidence', hue='Labels',dodge=False, ci=None)
ax.set_xticklabels(['Actual', 'Predicted', '']*len(df))
plt.setp(ax.get_xticklabels(), rotation=90)
ax.tick_params(labelsize=14)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Output:
Compared with the original answer, I removed the loop to improve performance and added blank bars so that the result looks like a grouped chart.
This is a problem I encounter very often. I have a plotly figure with column and row facets. I have already unlinked the y axes using fig.update_yaxes(matches=None). However, while this is useful to scale axes between rows 1 and 2 as they exist in quite different domains, it breaks the ability to compare among column facets. You can see this issue in the plot below:
So my question is, how can I have the same y axes across all column facets in each row, while having different y axes for row 1 and row 2?
In order to ensure a row-wise matching you'll have to specify the following for the first row:
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
And this for the second:
fig.layout.yaxis4.matches = 'y4'
fig.layout.yaxis5.matches = 'y4'
fig.layout.yaxis6.matches = 'y4'
As you can see, all y-axes are tied to the first y-axis of each corresponding row.
For those of you who would like to try it out, here's an example that builds on a facet plot.
Complete code:
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
                 facet_col='year', facet_col_wrap=4)
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
fig.layout.yaxis4.matches = 'y'
fig.layout.yaxis5.matches = 'y5'
fig.layout.yaxis6.matches = 'y5'
fig.layout.yaxis7.matches = 'y5'
fig.layout.yaxis8.matches = 'y5'
fig.layout.yaxis9.matches = 'y9'
fig.layout.yaxis10.matches = 'y9'
fig.layout.yaxis11.matches = 'y9'
fig.layout.yaxis12.matches = 'y9'
fig.show()
nrows = df.row_var.nunique()  # or find a way to get the number of rows from the fig object
for i in range(nrows):
    fig.update_yaxes(showticklabels=True, matches=f'y{i+1}', col=i+1)
https://github.com/plotly/plotly_express/issues/147#issuecomment-537814046
So, I have made a stripplot with seaborn the easiest way, with 5 different categories:
sns.set_style('whitegrid')
plt.figure(figsize=(35,20))
sns.set(font_scale = 3)
sns.stripplot(df.speed, df.routeID, hue=df.speed>50, jitter=0.2, alpha=0.5, size=10, edgecolor='black')
plt.xlabel("Speed", size=40)
plt.ylabel("route ID", size=40)
plt.title("Velocity stripplot", size=50)
Now, the thing is I want a different hue threshold for each category, say speed greater than 50 km/h for the first category, 30 km/h for the second, and so on. Is this possible? I tried passing a list for hue:
hue=([("ROUTE 30">50),("ROUTE 104">0)])
but it raises: SyntaxError: invalid syntax
The thing is, I want to do it all at once in the same plot (the most obvious answer would be to plot each category separately). How can this be done?
EDIT: I followed the suggested answer. Used the same code:
plt.figure(figsize=(20,7))
my_palette = ['b' if x > 82 else 'g' for x in df.speed.values]
sns.stripplot(df.speed, df.routeID, jitter=0.2, alpha=0.5, size=8, edgecolor='black', palette = my_palette)
but it didn't turn out as expected:
I don't understand what is wrong here. Any ideas?
I suggest creating a separate column in df for the dot color.
Try this:
# INITIAL DATA
n = 1000
df = pd.DataFrame()
df['speed'] = np.random.randint(10,90,n)
df['routeID'] = np.random.choice(['ROUTE_5','ROUTE_66','ROUTE_95','ROUTE_101'], n)
# set hue indices to match your conditions
df['hue'] = 'normal' # new column with default value
df.loc[df.speed > 50, 'hue'] = 'fast'
df.loc[(df.routeID=="ROUTE_5") & (df.speed>40) |
       (df.routeID=="ROUTE_66") & (df.speed>30) |
       (df.routeID=="ROUTE_95") & (df.speed>60),
       'hue'] = 'special'
palette = {'normal':'g','fast':'r','special':'magenta'}
sns.stripplot(x=df.speed, y=df.routeID, size=15,
              hue=df.hue, palette=palette)
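If every route has its own speed limit, the chained conditions can get long. One possible alternative, sketched here with made-up thresholds and a tiny made-up frame, is to look up the limit per route and compare against it:

```python
import pandas as pd

# Tiny made-up sample; the real df would hold all measurements
df = pd.DataFrame({
    "routeID": ["ROUTE_5", "ROUTE_5", "ROUTE_66", "ROUTE_95"],
    "speed": [45, 30, 35, 50],
})

# Hypothetical per-route limits (km/h), for illustration only
thresholds = {"ROUTE_5": 40, "ROUTE_66": 30, "ROUTE_95": 60}

# Look up each row's limit, then label the row by comparing its speed to it
limit = df["routeID"].map(thresholds)
df["hue"] = (df["speed"] > limit).map({True: "over", False: "within"})
```

The new column can then be passed to seaborn as before, for example sns.stripplot(data=df, x="speed", y="routeID", hue="hue"), so that all routes are colored in a single call.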
I am plotting density graphs using pandas plot, but I am not able to add appropriate legends for each of the graphs. My code and result are below:
for i in tickers:
    df = pd.DataFrame(dic_2[i])
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])))
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')
As you can see in the box at the top right of the picture, the legend entries show up as 0. How do I put something meaningful there?
print(df.head())
0
0 -0.019043
1 -0.0212065
2 0.0060413
3 0.0229895
4 -0.0189266
I think you may want to restructure the way you've created the graph. An easy way to do this is to create the ax before plotting:
# sample data
df = pd.DataFrame()
df['returns_a'] = [x for x in np.random.randn(100)]
df['returns_b'] = [x for x in np.random.randn(100)]
print(df.head())
returns_a returns_b
0 1.110042 -0.111122
1 -0.045298 -0.140299
2 -0.394844 1.011648
3 0.296254 -0.027588
4 0.603935 1.382290
fig, ax = plt.subplots()
I then created the dataframe using the parameters specified in your variables:
mean=np.average(df.returns_a)
std=np.std(df.returns_a)
maximum=np.max(df.returns_a)
minimum=np.min(df.returns_a)
(pd.DataFrame(np.random.normal(loc=mean, scale=std, size=len(df.returns_a)))
   .rename(columns={0: 'std_normal'})
   .plot(kind='density', colormap='Blues_r', ax=ax))
df.plot(y='returns_a', kind='density', ax=ax)
This second dataframe you're working with is created by default with column 0. You'll need to rename this.
I figured out a simpler way to do this. Just add column names to the dataframes.
for i in tickers:
    df = pd.DataFrame(dic_2[i],columns=['Empirical PDF'])
    print(df.head())
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])),columns=['Normal PDF'])
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')
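To see where the 0 in the legend comes from, here is a short sketch using the five values printed in the question (the seeded generator is my own addition so that the example is repeatable):

```python
import numpy as np
import pandas as pd

# The five returns printed in the question
returns = [-0.019043, -0.0212065, 0.0060413, 0.0229895, -0.0189266]

# Without column names pandas labels the single column 0,
# and that label is what ends up in the legend
unnamed = pd.DataFrame(returns)

# Naming the columns gives the legend something meaningful to show
named = pd.DataFrame(returns, columns=["Empirical PDF"])
rng = np.random.default_rng(0)
normal = pd.DataFrame(
    rng.normal(loc=np.mean(returns), scale=np.std(returns), size=len(returns)),
    columns=["Normal PDF"],
)

# Both series side by side; each column would become one legend entry
combined = pd.concat([named, normal], axis=1)
```

Plotting combined with kind='density' should then produce one labelled curve per column.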
I'm trying to plot data from two separate MultiIndex DataFrames, with the same levels in each.
Currently, this generates two separate plots, and I'm unable to customise the legend by appending a string to distinguish each line on the graph. Any help would be appreciated!
Here is the method so far:
def plot_lead_trail_res(df_ante, df_post, symbols=[]):
    if len(symbols) < 1:
        print("Try again with a symbol list. (Time constraints)")
    else:
        df_ante = df_ante.loc[symbols]
        df_post = df_post.loc[symbols]
        ante_leg = [str(x)+'_ex-ante' for x in df_ante.index.levels[0]]
        post_leg = [str(x)+'_ex-post' for x in df_post.index.levels[0]]
        print("ante_leg", ante_leg)
        ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
        ax = df_post.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=post_leg)
        ax.set_xlabel('Time-shift of sentiment data (days) with financial data')
        ax.set_ylabel('Mutual Information')
Using this function call:
sentisignal.plot_lead_trail_res(data_nasdaq_top_100_preprocessed_mi_res, data_nasdaq_top_100_preprocessed_mi_res_validate, ['AAL', 'AAPL'])
I obtain the following figure:
Current plots
Ideally, both sets of lines would be on the same graph with the same axes!
Update 2 [Concatenation Solution]
I've solved the issues of plotting from multiple frames using concatenation, however the legend does not match the line colors on the graph.
There are no explicit calls to legend, and the label parameter of plot() has not been used.
Code:
df_ante = data_nasdaq_top_100_preprocessed_mi_res
df_post = data_nasdaq_top_100_preprocessed_mi_res_validate
symbols = ['AAL', 'AAPL']
df_ante = df_ante.loc[symbols]
df_post = df_post.loc[symbols]
df_ante.index.set_levels([[str(x)+'_ex-ante' for x in df_ante.index.levels[0]],df_ante.index.levels[1]], inplace=True)
df_post.index.set_levels([[str(x)+'_ex-post' for x in df_post.index.levels[0]],df_post.index.levels[1]], inplace=True)
df_merge = pd.concat([df_ante, df_post])
df_merge['SHIFT'] = abs(df_merge['SHIFT'])
df_merge.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION')
Image:
MultiIndex Plot Image
I think, with
ax = df_ante.unstack(0).plot(x='SHIFT', y='MUTUAL_INFORMATION', legend=ante_leg)
you put the output of plot() in ax, including the lines, which then gets overwritten by the second call. Am I right that the lines which were plotted first are missing?
The usual procedure would be something like
fig = plt.figure(figsize=(5, 5)) # size in inch
ax = fig.add_subplot(111) # if you want only one axes
Now you have an Axes object in ax and can pass it to the subsequent plot calls (ax=ax) so that everything is drawn on the same axes.
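A minimal sketch of that procedure, with two small made-up frames standing in for the ex-ante and ex-post data of a single symbol (the values and the Agg backend are my assumptions, chosen so that the example runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-ins for one symbol's ex-ante and ex-post results
df_ante = pd.DataFrame({"SHIFT": [0, 1, 2], "MUTUAL_INFORMATION": [0.10, 0.20, 0.15]})
df_post = pd.DataFrame({"SHIFT": [0, 1, 2], "MUTUAL_INFORMATION": [0.12, 0.18, 0.11]})

fig, ax = plt.subplots(figsize=(5, 5))

# Passing the same ax to both calls draws both lines on one set of axes;
# label gives each line its own legend entry
df_ante.plot(x="SHIFT", y="MUTUAL_INFORMATION", ax=ax, label="AAL_ex-ante")
df_post.plot(x="SHIFT", y="MUTUAL_INFORMATION", ax=ax, label="AAL_ex-post")

ax.set_xlabel("Time-shift of sentiment data (days) with financial data")
ax.set_ylabel("Mutual Information")
ax.legend()  # rebuild the legend from the line labels

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

With several symbols, the same pattern should work in a loop over the symbols, as long as every plot call receives the same ax.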