I'm trying to create factorplots for a column with 18 values, using a different column (also with 18 unique values) as the hue parameter. This results in a huge chart that is not easy to read, so I want to create a separate chart for every unique value of the hue column to make the data easier to see.
So currently it looks like this:
[Image: factorplot]
And I want to split those 18 charts divided by hue into separate charts.
I was thinking of using a loop, but I'm stuck at this point:
for i in dframe.type1.unique():
    sns.factorplot(x='type1',data=dframe, kind='count')
You need to use the col parameter. The seaborn documentation page for factorplot has more examples, about two-thirds of the way down.
sns.factorplot(x='type1', col='type2', col_wrap=4, data=dframe, kind='count',
               sharex=False, sharey=False)
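For readers on newer seaborn releases: factorplot was renamed to catplot in seaborn 0.9, and the col and col_wrap parameters work the same way there. Below is a minimal sketch of the same call with catplot. The small dframe is made-up sample data (the question does not show its data), and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Made-up sample data; only the column names type1/type2 come from the question
dframe = pd.DataFrame({
    "type1": ["Fire", "Water", "Grass", "Fire", "Water", "Fire"],
    "type2": ["Flying", "Ice", "Poison", "Flying", "Ice", "Ground"],
})

# One count plot per unique value of type2, wrapped after 2 columns
grid = sns.catplot(x="type1", col="type2", col_wrap=2, data=dframe,
                   kind="count", sharex=False, sharey=False)
```

With the real data, col_wrap=4 as in the answer would lay the 18 facets out in rows of four.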
I am new to Python and programming in general. I am trying to generate multiple horizontal bar graphs from a grouped dataframe. I have been successful, in part, by using the following script:
#building dataframes for data plots
plotdf = pd.DataFrame(merge_df, columns= ['ppset_id', 'Cell Line', 'Undil_CT'])
plotdf = plotdf.dropna()
plotdf = plotdf.reset_index(drop=True)
plotdf.groupby('ppset_id').plot(x="Cell Line", y="Undil_CT", kind = "barh", xlim=(0,45), figsize=(11,8), legend=None)
The output is almost exactly what I want with data for each ppset_id in an individual chart, except each chart is missing the chart title. I am trying to set the chart title for each graph to the unique ppset_id value associated with the data plotted, but haven't been able to find a way to do this after an extensive forum search.
Thanks in advance for your help.
Store all axes in a variable, so you can access each axis later.
Get the group names from DataFrame.groupby(column_name).groups.keys().
Both have the same length, so iterate over them together and set each axis title to the corresponding group name.
Code example using the seaborn tips dataset:
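import seaborn as sns
tips = sns.load_dataset('tips')
# groupby(...).plot() draws one chart per group and returns one axis per group
axes = tips.groupby('sex').plot()
group_names = tips.groupby('sex').groups.keys()
for ax, title in zip(axes, group_names):
    ax.set_title(title)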
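An alternative that avoids matching axes to group names after the fact is to loop over the groups directly, since iterating a groupby yields each name together with its rows. This is a sketch using the column names from the question; the values are invented for illustration, and the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Invented values; only the column names come from the question
plotdf = pd.DataFrame({
    "ppset_id": ["P1", "P1", "P2", "P2"],
    "Cell Line": ["HeLa", "A549", "HeLa", "A549"],
    "Undil_CT": [21.5, 30.2, 18.9, 25.4],
})

axes_by_group = {}
for ppset_id, group in plotdf.groupby("ppset_id"):
    # Each call draws a new figure for this group only
    ax = group.plot(x="Cell Line", y="Undil_CT", kind="barh",
                    xlim=(0, 45), figsize=(11, 8), legend=False)
    ax.set_title(str(ppset_id))  # title the chart with the group name
    axes_by_group[ppset_id] = ax
```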
I have grouped data from which I want to generate boxplots using seaborn. However, not every group has all classes. As a result, the boxplots are not centered if classes are missing within one group:
[Image: figure]
The graph is generated using the following code:
sns.boxplot(x="label2", y="value", hue="variable",palette="Blues")
Is there any way to force seaborn to center these boxes? I didn't find any appropriate way.
Thank you in advance.
Yes, there is, but you are not going to like it.
Centering these means that all the boxes share the same y value for their medians, so normalize your data so that the median is 0.5 for each group at each value of x. That will give you the plot you want, but you should note the normalization somewhere in the plot so people will not be confused.
I want to create a pie chart using a single column of my dataframe. Say my column name is 'Score'. I have stored scores in this column as below:
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart showing the percentage of scores in each range.
Say 0-0.4 is 30%, 0.4-0.7 is 35%, and 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df1.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I get a pie chart, but I don't know how to do this using percentages.
[Image: my pie chart looks like this for now]
Cutting the dataframe up into bins is the right first step. After that, you can use value_counts with normalize=True to get the relative frequencies of the values in the bins column. This shows what percentage of the data falls into each range defined by the bins.
In terms of plotting the pie chart, I'm not sure I understood correctly, but it seems you would like to display the correct legend values and the percentage value in each slice of the pie.
The pandas.DataFrame.plot documentation is a good place to see all the parameters that can be passed to the plot method. You can specify which columns to use for x and y, and by default the dataframe index is used as the legend in the pie plot.
To show the percentage values per slice, you can use the autopct parameter as well. As mentioned in this answer, you can use all the normal matplotlib plt.pie() flags in the plot method as well.
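Before plotting, it can help to look at the relative frequencies on their own. The sketch below uses the seven scores from the question and the three ranges asked for; sort_index is an addition here, used only to list the bins in range order rather than by frequency.

```python
import pandas as pd

scores = pd.Series([0.92, 0.81, 0.21, 0.46, 0.72, 0.11, 0.89], name="Score")

# Assign each score to one of the three ranges from the question
bins = pd.cut(scores, bins=[0, 0.4, 0.7, 1],
              labels=["0-0.4", "0.4-0.7", "0.7-1"])

# Share of scores per bin, as a percentage, listed in bin order
percent = bins.value_counts(normalize=True).sort_index() * 100
print(percent.round(1))
```

For this sample, two of the seven scores fall in 0-0.4, one in 0.4-0.7, and four in 0.7-1, so the shares are roughly 28.6%, 14.3%, and 57.1%.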
Bringing everything together, this is the resultant code and the resultant chart:
import pandas as pd
df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
# Relative frequency of each bin, as a percentage
bin_percent = df['bins'].value_counts(normalize=True) * 100
# Plot the Series directly: the name value_counts gives its result differs between pandas versions
plot = bin_percent.plot.pie(figsize=(5, 5), autopct='%1.1f%%')
[Image: plot of pie chart]
I have a dataset and I want to find out how the values of several numeric columns differ across two groups ('group' is a column that takes the value 'high' or 'low').
I want to plot several barplots using a system and aesthetics similar to Seaborn's FacetGrid or PairGrid. Each plot will have a different Y value but the same X-axis (the group variable).
This is what I have so far:
sns.catplot(x='group', y='Number of findings (total)', kind="bar",
            palette="muted", data=df)
But I would like to write a loop that can replace my y variable with different variables. How to do it?
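One possible approach, offered as a sketch rather than a definitive answer: instead of a loop, reshape the numeric columns into long format with melt and let catplot draw one facet per original column. The data and the second column name ('Number of visits') are hypothetical, since the question only names one column; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Hypothetical data; 'Number of visits' is an invented second column
df = pd.DataFrame({
    "group": ["high", "low", "high", "low", "high", "low"],
    "Number of findings (total)": [5, 2, 7, 3, 6, 1],
    "Number of visits": [10, 4, 12, 5, 9, 6],
})
y_columns = ["Number of findings (total)", "Number of visits"]

# Long format: one row per (group, variable, value)
long_df = df.melt(id_vars="group", value_vars=y_columns)

# One bar plot per original column, each with its own y-axis scale
grid = sns.catplot(data=long_df, x="group", y="value", col="variable",
                   kind="bar", sharey=False)
```

If separate figures are preferred, a plain loop over y_columns that calls sns.catplot(x='group', y=col, kind='bar', data=df) for each col would also work.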
I am trying out Seaborn to make my plots look better than with plain matplotlib. I have a dataset with a column 'Year', which I want to plot on the X-axis, and four columns, say A, B, C and D, which I want to plot on the Y-axis as lines of different colours. I tried to do this with the sns.lineplot method, but it allows only one variable on the X-axis and one on the Y-axis. I tried doing this:
sns.lineplot(data_preproc['Year'],data_preproc['A'], err_style=None)
sns.lineplot(data_preproc['Year'],data_preproc['B'], err_style=None)
sns.lineplot(data_preproc['Year'],data_preproc['C'], err_style=None)
sns.lineplot(data_preproc['Year'],data_preproc['D'], err_style=None)
But this way I don't get a legend in the plot to show which coloured line corresponds to what. I tried checking the documentation but couldn't find a proper way to do this.
Seaborn favors the "long format" as input. The key ingredient to convert your DataFrame from its "wide format" (one column per measurement type) into long format (one column for all measurement values, one column to indicate the type) is pandas.melt. Given a data_preproc structured like yours, filled with random values:
import numpy as np
import pandas as pd

num_rows = 20
years = list(range(1990, 1990 + num_rows))
data_preproc = pd.DataFrame({
    'Year': years,
    'A': np.random.randn(num_rows).cumsum(),
    'B': np.random.randn(num_rows).cumsum(),
    'C': np.random.randn(num_rows).cumsum(),
    'D': np.random.randn(num_rows).cumsum()})
A single plot with four lines, one per measurement type, is obtained with
sns.lineplot(x='Year', y='value', hue='variable',
             data=pd.melt(data_preproc, ['Year']))
(Note that 'value' and 'variable' are the default column names returned by melt, and can be adapted to your liking.)
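To make the reshaping concrete, here is a small sketch of what melt returns. The numbers are invented and only two measurement columns are used to keep the output short.

```python
import pandas as pd

# Wide format: one column per measurement type (invented values)
wide_df = pd.DataFrame({
    "Year": [1990, 1991, 1992],
    "A": [0.1, 0.4, 0.9],
    "B": [1.0, 0.7, 0.2],
})

# Long format: Year is kept as an identifier, A and B are stacked
long_df = pd.melt(wide_df, id_vars=["Year"])
print(long_df)
```

The result has one row per (Year, column) pair, with the original column name in 'variable' and the measurement in 'value', which is the layout that the hue parameter expects.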
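This:
sns.lineplot(data=data_preproc.set_index('Year'))
will do what you want. With wide-form data, seaborn takes the x values from the index and draws one line per column, so Year has to be the index rather than a regular column.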
See the documentation:
sns.lineplot(x="Year", y="signal", hue="label", data=data_preproc)
You probably need to re-organize your dataframe in a suitable way so that there is one column for the x data, one for the y data, and one which holds the label for the data point.
You can also just use matplotlib.pyplot. If you import seaborn and call sns.set(), much of the improved design is also applied to "regular" matplotlib plots (since seaborn 0.8 the style is no longer applied on import alone). Seaborn is really "just" a collection of methods which conveniently feed data and plot parameters to matplotlib.
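As a sketch of the plain matplotlib route: plotting each column with a label and then calling legend() gives the legend the question asks for. The values are invented, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Invented values; the column names follow the question
data_preproc = pd.DataFrame({
    "Year": [1990, 1991, 1992, 1993],
    "A": [1.0, 1.5, 1.2, 2.0],
    "B": [0.5, 0.7, 1.1, 0.9],
    "C": [2.0, 1.8, 1.6, 1.9],
    "D": [0.2, 0.6, 0.4, 0.8],
})

fig, ax = plt.subplots()
for column in ["A", "B", "C", "D"]:
    # label= is what the legend uses to name each line
    ax.plot(data_preproc["Year"], data_preproc[column], label=column)
ax.set_xlabel("Year")
ax.legend()
```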