Seaborn FacetGrid - How to get % instead count? [duplicate] - python

This question already has answers here:
How to plot percentage with seaborn distplot / histplot / displot
(3 answers)
Plot a horizontal line on a given plot
(7 answers)
Box around text in matplotlib
(3 answers)
Closed 3 months ago.
I'm trying to cluster my data with KMeans.
This is what my dataset looks like:
With this dataset, I apply FacetGrid this way:
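import matplotlib.pyplot as plt
import seaborn as sns
for c in data:
    # one row of histograms (one panel per cluster) for each feature
    grid = sns.FacetGrid(data, col='Clusters')
    grid.map(plt.hist, c)
    grid.set_xticklabels(rotation=90)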
Output:
For all features.
This works, but the FacetGrid only shows feature value vs. count for each cluster.
This information is not very relevant to me, since the clusters have different sizes ('len').
E.g. the Customer Age bars for Cluster 1 are much higher than those for Cluster 0, simply because Cluster 1 has more elements.
What I need:
I need a way to show each bar of the plot relative to the total of its own cluster.
E.g
I'd like to see:
For each cluster and each feature.
Is it possible?
Thank you.
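One way to get percentages, sketched here under the assumption that each cluster should be normalized on its own, is to give every observation a weight of 100 / n, where n is the size of its cluster. The data below is made up for illustration, and percent_weights is a hypothetical helper, not a seaborn or matplotlib function.

```python
import numpy as np

def percent_weights(values):
    """Return one weight per observation so that the bar heights sum to 100."""
    values = np.asarray(values)
    return np.full(values.shape, 100.0 / values.size)

# Hypothetical data: two clusters of very different sizes
rng = np.random.default_rng(0)
clusters = {
    "cluster 0": rng.normal(40, 10, size=50),
    "cluster 1": rng.normal(40, 10, size=5000),
}

# Shared bin edges that cover every observation in both clusters
edges = np.histogram_bin_edges(np.concatenate(list(clusters.values())), bins=8)

heights_by_cluster = {}
for name, values in clusters.items():
    heights, _ = np.histogram(values, bins=edges, weights=percent_weights(values))
    heights_by_cluster[name] = heights
    print(name, heights.round(1))  # percentages, comparable across clusters
```

The same weights can be passed to plt.hist through its weights argument, for example from a small wrapper function handed to grid.map. Recent seaborn versions also accept stat="percent" together with common_norm=False in histplot and displot, which should normalize each subset independently.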

Related

How to create a histogram with an aggregated dataset [duplicate]

This question already has answers here:
Ploting with seaborn histplot
(2 answers)
Plotting a histogram from pre-counted data in Matplotlib
(6 answers)
Histogram from data which is already binned, I have bins and frequency values
(3 answers)
Closed last month.
I have a dataset new_products which describes the number of months since each product launched. I aggregated that data so that I have 'since_debut' and 'count', which give the number of products that debuted 1, 2, 3, ..., 60 months ago. I am having trouble creating a histogram with seaborn.
df =
since_debut   count
1              1784
2              7345
3             11111
4             13255
sns.histplot(data=df, x="since_debut", y="count", bins=30, kde=True)
ValueError: Could not interpret value `since_debut` for parameter `x`
Unsure what is throwing this error and why it can't interpret the aggregated data. Any help or advice is appreciated.
Since your dataset is already aggregated, you should use something like barplot instead:
sns.barplot(data=df, x="since_debut", y="count")
countplot, by contrast, should be used on the original data; it aggregates the data along one of the axes itself.
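If a histogram with wider bins is still wanted, the pre-counted rows can be re-binned by using the counts as weights. The sketch below makes an assumption the question does not confirm: that since_debut ended up as the index of df (for example after a groupby), which is one possible cause of the "Could not interpret value" error.

```python
import numpy as np
import pandas as pd

# Aggregated data from the question, built with since_debut as the index
# (an assumption about how the original df was produced).
df = pd.DataFrame(
    {"count": [1784, 7345, 11111, 13255]},
    index=pd.Index([1, 2, 3, 4], name="since_debut"),
)

# Turn the index into a regular column so it can be referenced by name.
df = df.reset_index()

# Re-bin the pre-counted data: each row contributes its count as a weight.
heights, edges = np.histogram(df["since_debut"], bins=2, weights=df["count"])
print(edges)    # bin edges
print(heights)  # number of products per bin
```

seaborn's histplot accepts the same idea through its weights parameter, e.g. sns.histplot(data=df, x="since_debut", weights="count", bins=30). Passing y="count" instead asks for a bivariate histogram, which is not what is wanted here.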

Get bin size values in seaborn charts logscale [duplicate]

This question already has answers here:
Python - Use bins from one sns.histplot() for another / Extract bin information from an sns.histpllot()
(1 answer)
access to bin counts in seaborn distplot
(1 answer)
How can I extract the bins from seaborn's KDE distplot object?
(2 answers)
Closed 2 months ago.
I made a chart with seaborn and I would like to retrieve the bin size values.
Because my bins have constant width on a logarithmic scale, their actual sizes differ. Any ideas?
Code used:
sns.displot(productDF, x="Area", hue="Slice", hue_order=sliceList, bins=50, log_scale=True, col="Slice", col_wrap=2, col_order=sliceList)
Here is an example of my chart:
I checked the docs, but seaborn doesn't seem to return any of this information.
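A possible workaround is to recompute the edges with NumPy. This sketch assumes that seaborn builds 50 equal-width bins in log10 space over the pooled data, which is my understanding of log_scale=True with the default common_bins=True, not something confirmed in the question. The area array is a made-up stand-in for productDF["Area"].

```python
import numpy as np

# Hypothetical stand-in for productDF["Area"] (strictly positive values)
rng = np.random.default_rng(0)
area = rng.lognormal(mean=3.0, sigma=1.0, size=1000)

# Equal-width bins in log10 space, mapped back to the original units
log_edges = np.histogram_bin_edges(np.log10(area), bins=50)
edges = 10 ** log_edges

# Bin sizes in data units: constant in log space, growing in linear space
widths = np.diff(edges)
print(edges[:3])
print(widths[:3])
```

Alternatively, the drawn bars are matplotlib Rectangle objects, so for a bar-style histogram their get_x() and get_width() values can be read from each facet's ax.patches after plotting.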

Comparing distribution plots for better visualisation [duplicate]

This question already has answers here:
Seaborn data visualization misunderstanding of densities?
(1 answer)
How to do KDE(kernel density estimation) independently with seaborn?
(1 answer)
Seaborn - displot normalize KDEs for two different sample batches
(1 answer)
How to plot many kdeplots on one figure in python
(1 answer)
Closed 7 months ago.
How can I plot multiple distribution plots in one figure? I have a "Quantity" column in 5 dataframes, at the nation, region, division, state and DMA levels, and the dataframes differ a lot in length and scale.
I used this code:
sns.displot(data= data_vert, x='dev_median_qty', hue='level', kind='kde', fill=True, palette=sns.color_palette('bright')[:5], height=5, aspect=2.5)
plt.xlim(-5, 25)
And I got this graph:
I want the area under each curve to be one, or at least the same for every level, without changing the shape of each distribution, so that the graph is easier to read.
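By default seaborn normalizes all hue levels together (common_norm=True), so each curve's area is proportional to its share of the observations; passing common_norm=False to displot normalizes each level on its own. The sketch below illustrates the difference numerically with density histograms rather than KDEs. The two levels and their sizes are made up for illustration.

```python
import numpy as np

# Hypothetical levels with very different numbers of observations
rng = np.random.default_rng(0)
levels = {
    "nation": rng.normal(5, 2, size=200),
    "state": rng.normal(8, 3, size=5000),
}

# Shared bin edges that cover every observation
edges = np.histogram_bin_edges(np.concatenate(list(levels.values())), bins=40)
widths = np.diff(edges)
total = sum(len(v) for v in levels.values())

areas_independent = {}  # analogous to common_norm=False
areas_common = {}       # analogous to common_norm=True
for name, values in levels.items():
    density, _ = np.histogram(values, bins=edges, density=True)
    area = float((density * widths).sum())
    areas_independent[name] = area
    areas_common[name] = area * len(values) / total

print(areas_independent)  # every level integrates to one
print(areas_common)       # areas shrink with the level's share of the data
```

With independent normalization the small "nation" level is drawn at the same visual weight as the large "state" level, which is what the question asks for.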

Visualize a binary vector [duplicate]

This question already has answers here:
Using pandas value_counts and matplotlib
(1 answer)
how to sort the result by pandas.value_counts
(1 answer)
Unnormalized histogram plots in Seaborn are not centered on X-axis
(1 answer)
Differences between seaborn histogram, countplot and distplot
(1 answer)
Closed 8 months ago.
I have a binary column in a pandas dataframe. I want to visualize it, just to see how many 0s and 1s there are. I used displot:
Plot = sns.displot(data = data, x = 'stroke', color = 'm')
Plot.fig.suptitle('Stroke numbers in data', size=15, y=1.12);
This did the job, but it's very ugly. How do I make the x-axis show only 0 and 1?
I think this is a good solution:
data["stroke"].value_counts(sort=False).plot.bar(rot=0)
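A small variation on the line above, assuming the bars should always appear in the order 0 then 1: sort the counts by index instead of relying on the order in which the values are first seen. The dataframe here is a made-up stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe
data = pd.DataFrame({"stroke": [0, 1, 0, 0, 1, 0, 0, 0]})

# Count each value, then order the result by the value itself (0 before 1)
counts = data["stroke"].value_counts().sort_index()
print(counts)
```

counts.plot.bar(rot=0) then draws exactly two bars, labelled 0 and 1. Within seaborn, passing discrete=True to displot is another option that centers the bars on the integer values.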

How to plot rownames and column names for a heatmap with matplotlib? [duplicate]

This question already has answers here:
Heatmap in matplotlib with pcolor?
(4 answers)
Closed 9 years ago.
This is a very basic question. I could not find a satisfactory answer anywhere else, so I am writing it up as a question here. I have a square matrix, about 1300 x 1300. I can use matplotlib to generate a heatmap from it. However, I want the row and column names to show up on the heatmap instead of the 0 to 1300 range that normally shows up when I use imshow.
I will put up an example shortly.
You still have not put up your example, but here is a quick example of how to change the labels on each axis!
First, put your labels in arrays; let's call them y_labels and x_labels.
Now here is your code:
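import pylab
fig, ax = pylab.subplots()  # subplots() returns (figure, axes)
ax.set_yticks(range(len(y_labels)))  # one tick per row
ax.set_yticklabels(y_labels)
ax.set_xticks(range(len(x_labels)))  # one tick per column
ax.set_xticklabels(x_labels)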
That should do the trick!
