Seaborn barplot - column values without estimator parameter - python

I am a beginner in seaborn plotting and noticed that sns.barplot shows the value of bars using a parameter called estimator.
Is there a way for the barplot to show the value of each column instead of using a statistical approach through the estimator parameter?
For instance, I have the following dataframe:
data = [["2019/oct",10],["2019/oct",20],["2019/oct",30],["2019/oct",40],["2019/nov",20],["2019/dec",30]]
df = pd.DataFrame(data, columns=['Period', 'Observations'])
I would like to plot all values from the Period "2019/oct" column (10, 20, 30 and 40), but the bar chart returns the average of these values (25) for the period "2019/oct":
sns.barplot(x='Period',y='Observations',data=df,ci=None)
How can I bring all column values to the chart?

barplot combines values that share the same x, unless they have a different hue. If you want to keep the separate values for "2019/oct", you could create a new column that assigns each of them a different hue:
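import pandas as pd
import seaborn as sns
data = [["2019/oct",10],["2019/oct",20],["2019/oct",30],["2019/oct",40],["2019/nov",20],["2019/dec",30]]
df = pd.DataFrame(data, columns=['Period', 'Observations'])
# Number the rows within each Period so every observation gets its own hue level
df['subgroup'] = df.groupby('Period').cumcount()+1
# ci=None disables error bars; seaborn 0.12+ deprecates it in favor of errorbar=None
sns.barplot(x='Period',y='Observations',hue='subgroup',data=df,ci=None)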
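To see what the extra column contains before plotting, here is a minimal sketch using the same data as the question (the print call is only there for inspection). cumcount numbers the rows inside each Period, so every observation ends up with its own hue level:

```python
import pandas as pd

data = [["2019/oct", 10], ["2019/oct", 20], ["2019/oct", 30], ["2019/oct", 40],
        ["2019/nov", 20], ["2019/dec", 30]]
df = pd.DataFrame(data, columns=["Period", "Observations"])

# cumcount() numbers the rows within each group starting at 0; +1 makes it 1-based
df["subgroup"] = df.groupby("Period").cumcount() + 1

# The 2019/oct rows get subgroup 1, 2, 3, 4; the single nov and dec rows get 1
print(df)
```

Because each (Period, subgroup) pair now holds exactly one row, the estimator has nothing to average and each bar shows the raw value.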

Related

Matplotlib plot without linear ordering

Is it possible to draw a matplotlib chart, without using a pandas plot, that does not put the values on the left axis in linear order?
df = pd.DataFrame({
'x':[3,0,5],
'y':[10,4,20]
})
Chart made with the help of DataFrame:
plt.barh(df['x'],df['y'])
Without dataframe:
x = [3,0,5]
y= [10,4,20]
plt.barh(x,y)
Both give me the same result.
With pandas, I get the chart I want:
df.plot.barh('x','y')
I would like to get the same output with plain matplotlib, using ordinary numbers rather than numbers of type str as here:
plt.barh(['3','0','5'],[10,4,20])
Is it possible? How could I get it?
You can use the index of the dataframe as the y parameter and pass the x values of the dataframe as tick_label:
plt.barh(df.index, width=df['y'], tick_label=df['x'])
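A self-contained version of this, as a sketch using the dataframe from the question (the axis labels are my own additions): the bars are positioned at 0, 1, 2 by the index, while the numeric x values appear only as tick labels, so they keep their original order.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": [3, 0, 5], "y": [10, 4, 20]})

fig, ax = plt.subplots()
# Bars sit at the index positions 0, 1, 2; df["x"] only supplies the tick labels
ax.barh(df.index, width=df["y"], tick_label=df["x"])
ax.set_xlabel("y")
ax.set_ylabel("x")
```

Call plt.show() or fig.savefig(...) afterwards to view or save the chart.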

Seaborn distplot only returns one column when trying to plot each Pandas column in a for loop

I have a problem when I try to plot Pandas columns in a for loop.
When I use displot instead of distplot it works, but it only shows the overall distribution, not the distribution per group. Say I have a list of column names called columns and a Pandas dataframe n, which has a column named class. The goal is to show a distribution plot of each column for each class:
for w in columns:
    if w!=<discarded column> or w!=<discarded column>:
        sns.displot(n[w],kde=True)
But when I use distplot, it returns only the first column:
for w in columns:
    if w!=<discarded column> or w!=<discarded column>:
        sns.distplot(n[w],kde=True)
I'm still new to Seaborn, since I have never used any visualization and have relied on numerical analysis such as p-values and correlation. Any help is appreciated.
You are probably getting only the figure from the last iteration of the loop.
So you have to explicitly ask for the figure to be shown in each iteration.
import matplotlib.pyplot as plt
import seaborn as sns
for w in columns:
    if w not in discarded_columns:
        sns.distplot(n[w], kde=True)
        plt.show()
Or you can make subplots:
# Keep only target-columns
target_columns = list(filter(lambda x: x not in discarded_columns, columns))
# Plot with subplots
fig, axes = plt.subplots(len(target_columns)) # see the parameters, like: nrows, ncols ... figsize=(16,12)
for i,w in enumerate(target_columns):
    sns.distplot(n[w], kde=True, ax=axes[i])

Using seaborn how do I plot a column which has 70+ categories

I am trying to plot a column from a dataframe. There are about 8500 rows and the Assignment group column has about 70+ categories. How do I plot this visually using seaborn to get some meaningful output?
nlp_data['Assignment group'].hist(figsize=(17,7))
I used the hist() method to plot
You can use a heatmap for such data; see seaborn.heatmap.
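As an alternative to a heatmap (a different technique, not what the answer above proposes), a sorted horizontal count plot limited to the most frequent categories is often easier to read for a single categorical column. This is a sketch on synthetic data: the GRP_ names, the skewed weights and the cutoff of 20 are assumptions, and only nlp_data and the 'Assignment group' column name come from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in: 8500 rows spread unevenly over 70 hypothetical groups
rng = np.random.default_rng(0)
groups = [f"GRP_{i}" for i in range(70)]
weights = 1 / np.arange(1, 71)
nlp_data = pd.DataFrame({
    "Assignment group": rng.choice(groups, size=8500, p=weights / weights.sum())
})

# value_counts() sorts by frequency, so head(20) gives the 20 largest groups
counts = nlp_data["Assignment group"].value_counts()
top = counts.head(20)

fig, ax = plt.subplots(figsize=(8, 8))
# y= puts categories on the vertical axis so long labels stay readable;
# order= restricts and sorts the bars to the top categories
sns.countplot(data=nlp_data, y="Assignment group", order=top.index.tolist(), ax=ax)
fig.tight_layout()
```

The remaining categories could be summed into a single "Other" bar if the tail matters for the analysis.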

I want to create a pie chart using a dataframe column in python

I want to create a pie chart using a single column of my dataframe; say the column name is 'Score'. I have stored scores in this column as below:
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart with the ranges shown as percentages.
Say 0-0.4 is 30%, 0.4-0.7 is 35%, 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I am getting the pie chart, but I don't know how I can do this using percentages.
My pie chart looks like this for now.
Cutting the dataframe up into bins is the right first step. After that, you can use value_counts with normalize=True to get the relative frequency of each value in the bins column. This gives you the percentage of the data that falls in each range defined by the bins.
In terms of plotting the pie chart, I'm not sure if I understood correctly, but it seemed like you would like to display the correct legend values and the percentage values in each slice of the pie.
pandas.DataFrame.plot is a good place to see all parameters that can be passed into the plot method. You can specify which columns to use as x and y, and by default the dataframe index is used as the legend in the pie plot.
To show the percentage values per slice, you can use the autopct parameter as well. As mentioned in this answer, you can use all the normal matplotlib plt.pie() flags in the plot method as well.
Bringing everything together, this is the resultant code and the resultant chart:
import pandas as pd
df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
# Percentage of rows per bin; naming the column explicitly works across pandas versions
bin_percent = (df['bins'].value_counts(normalize=True) * 100).rename('percent').to_frame()
plot = bin_percent.plot.pie(y='percent', figsize=(5, 5), autopct='%1.1f%%')
Plot of Pie Chart
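To check the numbers behind the slices, here is a small sketch that computes the per-bin percentages for the sample scores and passes them straight to matplotlib's pie. The variable name percent is my own; the bins and labels are the ones used in the answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Score": [0.92, 0.81, 0.21, 0.46, 0.72, 0.11, 0.89]})
df["bins"] = pd.cut(df["Score"], bins=[0, 0.4, 0.7, 1],
                    labels=["0-0.4", "0.4-0.7", "0.7-1"])

# Share of rows in each bin as a percentage, sorted into bin order
percent = df["bins"].value_counts(normalize=True).sort_index() * 100
# 0-0.4: 28.57, 0.4-0.7: 14.29, 0.7-1: 57.14 (2, 1 and 4 of the 7 scores)
print(percent.round(2))

fig, ax = plt.subplots(figsize=(5, 5))
# autopct prints each slice's share of the total inside the slice
ax.pie(percent.to_numpy(), labels=[str(label) for label in percent.index],
       autopct="%1.1f%%")
```

With real data the shares will differ from the 30/35/35 split mentioned in the question, since they are derived from the actual scores.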

How to plot multiple barplot with different Y and the same X?

I have a dataset and I want to find out how the values of several numeric columns differ across two groups ('group' is a column that takes either the value 'high' or 'low').
I want to plot several barplots using a similar system/aesthetics to Seaborn's FacetGrid or PairGrid. Each plot will have a different Y value but the same X-axis (The group variable)
This is what I have so far:
sns.catplot(x='group', y='Number of findings (total)', kind="bar",
palette="muted", data=df)
But I would like to write a loop that can replace my y variable with different variables. How to do it?
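One possible approach, sketched under assumptions: loop over a list of y column names and draw each barplot on its own subplot. The data and the second column name 'Number of visits' are made up for illustration; only 'group' and 'Number of findings (total)' come from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data standing in for the asker's dataframe
df = pd.DataFrame({
    "group": ["high", "low", "high", "low", "high", "low"],
    "Number of findings (total)": [5, 2, 7, 1, 6, 3],
    "Number of visits": [12, 9, 15, 8, 11, 10],  # made-up second column
})
y_columns = ["Number of findings (total)", "Number of visits"]

# One subplot per y variable; squeeze=False keeps axes 2-D even for a single column
fig, axes = plt.subplots(1, len(y_columns),
                         figsize=(5 * len(y_columns), 4), squeeze=False)
for ax, col in zip(axes[0], y_columns):
    # Same x on every subplot, a different y each time
    sns.barplot(data=df, x="group", y=col, ax=ax)
    ax.set_title(col)
fig.tight_layout()
```

Each subplot keeps its own y scale here, which suits columns measured in different units.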
