category = df.category_name_column.value_counts()
I have the above series which returns the values:
CategoryA,100
CategoryB,200
I am trying to plot the top 5 category names in X - axis and values in y-axis
head = (category.head(5))
sns.barplot(x = head ,y=df.category_name_column.value_counts(), data=df)
It does not print the "names" of the categories in the X-axis, but the count. How to print the top 5 names in X and Values in Y?
You can pass in the series' index & values to x & y respectively in sns.barplot. With that the plotting code becomes:
sns.barplot(head.index, head.values)
I am trying to plot the top 5 category names in X
Calling category.head(5) returns the first five values of the series category, which are the top 5 only if the series is sorted by count. value_counts() sorts in descending order by default, so head(5) already gives the most frequent categories here. If the series might not be sorted (for example, it was built with sort=False or re-ordered afterwards), sort it explicitly and then call head(5). Like this:
category = df.category_name_column.value_counts()
head = category.sort_values(ascending=False).head(5)
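As a quick check of the point above, here is a minimal sketch using a made-up dataframe (the category letters and counts are hypothetical, not from the question). It compares head(5) on the default value_counts() result with an explicit sort on an unsorted count series.

```python
import pandas as pd

# Hypothetical sample data: six categories with distinct frequencies.
values = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 5 + ["E"] * 1 + ["F"] * 6
df = pd.DataFrame({"category_name_column": values})

# value_counts() sorts by count in descending order by default.
category = df.category_name_column.value_counts()
top5_default = category.head(5)

# With sort=False the order is not by count, so sort explicitly first.
unsorted_counts = df.category_name_column.value_counts(sort=False)
top5_explicit = unsorted_counts.sort_values(ascending=False).head(5)

print(top5_default)
print(top5_explicit)
```

Both approaches should give the same five categories here; the explicit sort only matters when the series is not already ordered by count.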
The previous accepted solution passes x and y as positional arguments, which is deprecated in recent versions of seaborn. Another workaround is as follows:
Convert the series to a dataframe:
category = df.category_name_column.value_counts()
category_df = category.reset_index()
category_df.columns = ['categories', 'frequency']
Then use barplot:
ax = sns.barplot(x='categories', y='frequency', data=category_df)
Although this is not exactly a plot of a series, it is a workaround that is officially supported by seaborn.
For more barplot examples please refer here:
https://seaborn.pydata.org/generated/seaborn.barplot.html
https://stackabuse.com/seaborn-bar-plot-tutorial-and-examples/
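Putting the two ideas from this thread together, a minimal end-to-end sketch might look like the following. The sample data is made up, and the non-interactive Agg backend is only an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data standing in for the asker's dataframe.
values = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"] * 5 + ["E"] * 1 + ["F"] * 6
df = pd.DataFrame({"category_name_column": values})

# Top 5 categories by frequency, converted to a two-column dataframe.
top5_df = df.category_name_column.value_counts().head(5).reset_index()
top5_df.columns = ["categories", "frequency"]

# Keyword arguments only: names on the x-axis, counts on the y-axis.
fig, ax = plt.subplots()
sns.barplot(x="categories", y="frequency", data=top5_df, ax=ax)
fig.savefig("top5_categories.png")
```

Assigning the column names explicitly avoids depending on what reset_index() calls them, which differs between pandas versions.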
This is a problem I encounter very often. I have a plotly figure with column and row facets. I have already unlinked the y axes using fig.update_yaxes(matches=None). However, while this is useful to scale axes between rows 1 and 2 as they exist in quite different domains, it breaks the ability to compare among column facets. You can see this issue in the plot below:
So my question is, how can I have the same y axes across all column facets in each row, while having different y axes for row 1 and row 2?
In order to ensure a row-wise matching you'll have to specify the following for the first row:
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
And this for the second:
fig.layout.yaxis4.matches = 'y4'
fig.layout.yaxis5.matches = 'y4'
fig.layout.yaxis6.matches = 'y4'
As you can see, all y-axes are tied to the first y-axis of each corresponding row.
For those of you who would like to try it out, here's an example that builds on a facet plot
Complete code:
import plotly.express as px
df = px.data.gapminder()
fig = px.scatter(df, x='gdpPercap', y='lifeExp', color='continent', size='pop',
                 facet_col='year', facet_col_wrap=4)
fig.layout.yaxis.matches = 'y'
fig.layout.yaxis2.matches = 'y'
fig.layout.yaxis3.matches = 'y'
fig.layout.yaxis4.matches = 'y'
fig.layout.yaxis5.matches = 'y5'
fig.layout.yaxis7.matches = 'y5'
fig.layout.yaxis6.matches = 'y5'
fig.layout.yaxis8.matches = 'y5'
fig.layout.yaxis9.matches = 'y9'
fig.layout.yaxis10.matches = 'y9'
fig.layout.yaxis11.matches = 'y9'
fig.layout.yaxis12.matches = 'y9'
fig.show()
nrows = df.row_var.nunique() # or find a way to get number of rows from fig object..
for i in range(0,nrows):
    fig.update_yaxes(showticklabels=True, matches=f'y{i+1}', col=i+1)
https://github.com/plotly/plotly_express/issues/147#issuecomment-537814046
My graph is ending up looking like this:
I took the original titanic dataset and sliced some columns and created a new dataframe via the following code.
Cabin_group = titanic[['Fare', 'Cabin', 'Survived']] #selecting certain columns from dataframe
Cabin_group.Cabin = Cabin_group.Cabin.str[0] #cleaning the Cabin column
Cabin_group = Cabin_group.groupby('Cabin', as_index =False).Survived.mean()
Cabin_group.drop([6,7], inplace = True) #drop Cabin G and T as instances are too low
Cabin_group['Status']= ('Poor', 'Rich', 'Rich', 'Medium', 'Medium', 'Poor') #giving each Cabin a status value.
So my new dataframe 'Cabin_group' ends up looking like this:
  Cabin  Survived  Status
0     A  0.454545    Poor
1     B  0.676923    Rich
2     C  0.574468    Rich
3     D  0.652174  Medium
4     E  0.682927  Medium
5     F  0.523810    Poor
Here is how I tried to plot the dataframe
fig = plt.subplots(1,1, figsize = (10,4))
sns.barplot(x ='Cabin', y='Survived', hue ='Status', data = Cabin_group )
plt.show()
A couple of things are off with this graph.
First, the bars for A, D, E and F are shifted away from their respective x-axis labels. Second, the bars themselves appear thinner than in my usual barplots.
I'm not sure how to shift the bars to their proper place, or how to control the width of the bars.
Thank you.
This can be achieved by passing dodge=False to sns.barplot. The parameter is available in newer versions of seaborn.
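For illustration, here is a minimal sketch of the dodge=False approach. It rebuilds the Cabin_group table shown in the question by hand; the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The dataframe from the question, typed in directly.
cabin_group = pd.DataFrame({
    "Cabin": list("ABCDEF"),
    "Survived": [0.454545, 0.676923, 0.574468, 0.652174, 0.682927, 0.523810],
    "Status": ["Poor", "Rich", "Rich", "Medium", "Medium", "Poor"],
})

fig, ax = plt.subplots(figsize=(10, 4))
# dodge=False keeps one full-width bar per x value instead of
# reserving a slot for every hue level at each position.
sns.barplot(x="Cabin", y="Survived", hue="Status",
            data=cabin_group, dodge=False, ax=ax)
fig.savefig("cabin_survival.png")
```

With dodge=False the hue only controls the color, so each bar should sit centered on its label.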
The bars are not aligned because seaborn reserves room for 3 bars at each x position (one for each distinct value of Status), and only one is provided. One solution is to map a color to each Status yourself. As far as I know, there is no easy way to do that, but here is an example. I'm not entirely happy with it, since it seems complicated for simply mapping a color to a category, and the legend is not displayed.
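# Creating a color mapping: one palette color per distinct Status
Cabin_group['Color'] = pd.Series(pd.factorize(Cabin_group['Status'])[0],
                                 index=Cabin_group.index).map(
    lambda x: sns.color_palette()[x])
# Pass the colors as a plain list, one entry per bar
g = sns.barplot(x='Cabin', y='Survived', data=Cabin_group,
                palette=Cabin_group['Color'].tolist())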
When I see how simple it is in R ... Unfortunately, the ggplot implementation in Python does not allow plotting a geom_bar with stat = 'identity'.
library(tidyverse)
Cabin_group %>% ggplot() +
  geom_bar(aes(x = Cabin, y = Survived, fill = Status),
           stat = 'identity')
df in my program happens to be a dataframe with these columns :
df.columns
'''output : Index(['lat', 'lng', 'desc', 'zip', 'title', 'timeStamp', 'twp', 'addr', 'e',
'reason'],
dtype='object')'''
When I execute this piece of code:
sns.countplot(x = df['reason'], data=df)
# output is the plot below
But if I slightly tweak my code like this:
p = df['reason'].value_counts()
k = pd.DataFrame({'causes':p.index,'freq':p.values})
sns.countplot(x = k['causes'], data = k)
So essentially I stored the 'reason' column values and their frequencies as a series in p, and then converted them to another dataframe k. But this new countplot doesn't have the right y-axis range for the given values.
My questions are:
Can we set the y-axis of the second countplot to its appropriate limits?
Why does the second countplot differ from the first one when I just separated the specific column I wanted to graph and plotted it separately?
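A likely explanation is that countplot counts how many rows each category has. In k every cause appears in exactly one row, so each bar has height 1, whatever the freq column says. The sketch below illustrates this with made-up 'reason' values (the labels and counts are hypothetical) and uses barplot to draw the precomputed frequencies instead. The Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the 'reason' column.
df = pd.DataFrame({"reason": ["EMS"] * 5 + ["Traffic"] * 3 + ["Fire"] * 2})

p = df["reason"].value_counts()
k = pd.DataFrame({"causes": p.index, "freq": p.values})

# What countplot would count in k: one row per cause.
rows_per_cause = k["causes"].value_counts()
print(rows_per_cause)

# barplot uses the precomputed frequencies as bar heights.
fig, ax = plt.subplots()
sns.barplot(x="causes", y="freq", data=k, ax=ax)
fig.savefig("reason_frequencies.png")
```

So rather than adjusting the y-axis limits of the second countplot, it is probably simpler to use countplot on the raw column or barplot on the aggregated table.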
I am plotting density graphs using pandas plot, but I am not able to add appropriate legends to each of the graphs. My code and result are below:
for i in tickers:
    df = pd.DataFrame(dic_2[i])
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])))
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')
As you can see in the box at the top right of the picture, the legend entries show up as 0. How do I put something meaningful there?
print(df.head())
0
0 -0.019043
1 -0.0212065
2 0.0060413
3 0.0229895
4 -0.0189266
I think you may want to restructure the way you've created the graph. An easy way to do this is to create the ax before plotting:
# sample data
df = pd.DataFrame()
df['returns_a'] = [x for x in np.random.randn(100)]
df['returns_b'] = [x for x in np.random.randn(100)]
print(df.head())
returns_a returns_b
0 1.110042 -0.111122
1 -0.045298 -0.140299
2 -0.394844 1.011648
3 0.296254 -0.027588
4 0.603935 1.382290
fig, ax = plt.subplots()
I then created the dataframe using the parameters specified in your variables:
mean=np.average(df.returns_a)
std=np.std(df.returns_a)
maximum=np.max(df.returns_a)
minimum=np.min(df.returns_a)
pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(df.returns_a))).rename(columns={0: 'std_normal'}).plot(kind='density',colormap='Blues_r', ax=ax)
df.plot(y='returns_a', kind='density', ax=ax)
The second dataframe you're working with is created with a default column named 0, and that name is what ends up in the legend. You'll need to rename the column.
I figured out a simpler way to do this. Just add column names to the dataframes.
for i in tickers:
    df = pd.DataFrame(dic_2[i],columns=['Empirical PDF'])
    print(df.head())
    mean=np.average(dic_2[i])
    std=np.std(dic_2[i])
    maximum=np.max(dic_2[i])
    minimum=np.min(dic_2[i])
    df1=pd.DataFrame(np.random.normal(loc=mean,scale=std,size=len(dic_2[i])),columns=['Normal PDF'])
    ax=df.plot(kind='density', title='Returns Density Plot for '+ str(i),colormap='Reds_r')
    df1.plot(ax=ax,kind='density',colormap='Blues_r')
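The same idea works with an explicit label on each plot call. Below is a minimal sketch on made-up return data (the sample values are randomly generated, not the asker's dic_2). It assumes scipy is installed, since pandas needs it for density plots, and uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical daily returns and a normal sample with matching mean and std.
returns = pd.Series(rng.normal(0.0, 0.02, 250))
normal = pd.Series(rng.normal(returns.mean(), returns.std(), len(returns)))

fig, ax = plt.subplots()
# The label passed to each call is what the legend displays.
returns.plot(kind="density", ax=ax, label="Empirical PDF", color="red")
normal.plot(kind="density", ax=ax, label="Normal PDF", color="blue")
ax.set_title("Returns Density Plot")
ax.legend()

handles, labels = ax.get_legend_handles_labels()
print(labels)
fig.savefig("returns_density.png")
```

Naming the dataframe columns, as in the loop above, and passing label explicitly should both produce readable legend entries; which one to use is mostly a matter of taste.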
I want to make a simulation of some data and display my points in different colors for different categories. I have three columns: two of them are the x and y values, and the third column, which has two categories, should be reflected in the plot.
y = np.array(q)
x = np.array(p)
fig = plt.figure(figsize = (18,18))
plt.show()
for t in range(6000):
    ax = fig.add_subplot(2,1,1)
    for i in s[t:t+4]: # s is a list that contains the third column
        if i == 'Match':
            ax.plot(x[i], y[i], 'bs')
        else:
            ax.plot(x[i],y[i],'ro')
There are lots of ways to do this. Here is one using pandas:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# generate data
df = pd.DataFrame(np.random.random(size=(100,2)), columns=['x','y'])
df.loc[:,'cat'] = ['Match' if np.random.randint(0,2)==1 else '-' for i in range(100) ]
# blue squares for 'Match' rows, red circles for everything else
plt.plot(df.loc[df.cat=='Match','x'],df.loc[df.cat=='Match','y'],'bs')
plt.plot(df.loc[df.cat!='Match','x'],df.loc[df.cat!='Match','y'],'ro')
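When there are more than two categories, repeating the boolean masks gets tedious. One possible variation, sketched below on made-up data, loops over groupby and looks up a marker style per category. The styles dictionary and the '-' label are assumptions for illustration, and the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: random points, alternating between two categories.
df = pd.DataFrame(rng.random(size=(100, 2)), columns=["x", "y"])
df["cat"] = np.where(np.arange(len(df)) % 2 == 0, "Match", "-")

# One matplotlib format string per category (assumed mapping).
styles = {"Match": "bs", "-": "ro"}

fig, ax = plt.subplots()
for cat, group in df.groupby("cat"):
    # Each group is drawn in one call, with its own marker and legend entry.
    ax.plot(group["x"], group["y"], styles[cat], label=cat)
ax.legend()
fig.savefig("points_by_category.png")
```

Because x and y are always taken from the same group, the two coordinates cannot get out of sync the way separately written masks can.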