Highlighting Outliers in scatter plot - python

I am using the "tips" dataset and plotting a scatter plot with the code below:
sns.scatterplot(data=df['total_bill'])
I want to highlight the outliers, say the points above 40 on the y-axis, by giving them a different color or a larger size. Alternatively, is it possible to draw a horizontal line at 40?

The code below gave the desired result (it assumes df already has a boolean is_outlier column):
sns.scatterplot(data=df, y='total_bill', x=range(0,244), hue='is_outlier')

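With seaborn.scatterplot you can use the "hue" parameter to plot groups in different colors. In current seaborn versions, hue cannot be combined with wide-form input such as data=df['total_bill'], so pass x and y explicitly. For your example the following should work:
is_outlier = (df['total_bill'] >= 40)
sns.scatterplot(x=df.index, y=df['total_bill'], hue=is_outlier)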
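For the horizontal line asked about in the question, one option is Axes.axhline. sns.scatterplot returns a matplotlib Axes, so the same call works on its return value. Below is a minimal sketch in plain matplotlib rather than seaborn, using randomly generated numbers as a stand-in for the total_bill column. Only the threshold of 40 comes from the question; the data, colors, and marker sizes are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up stand-in for df['total_bill'] (244 rows, like the tips dataset).
rng = np.random.default_rng(0)
total_bill = rng.gamma(shape=4.0, scale=5.0, size=244)

threshold = 40
is_outlier = total_bill >= threshold
x = np.arange(total_bill.size)

fig, ax = plt.subplots()
# Regular points: small and blue. Outliers: larger and red.
ax.scatter(x[~is_outlier], total_bill[~is_outlier], s=20, color="tab:blue", label="regular")
ax.scatter(x[is_outlier], total_bill[is_outlier], s=60, color="tab:red", label="outlier")
# Dashed horizontal line at the threshold.
ax.axhline(threshold, color="gray", linestyle="--")
ax.set_ylabel("total_bill")
ax.legend()
```

Call plt.show() or fig.savefig(...) afterwards to view the result.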

Related

Altair: Customizing outliers in boxplots

Is there any way to customize the outlier points in an Altair boxplot? Suppose I had the following plot:
import altair as alt
import pandas as pd
penguins_data = "https://raw.githubusercontent.com/datavizpyr/data/master/palmer_penguin_species.tsv"
penguins_df = pd.read_csv(penguins_data, sep="\t")
chart = alt.Chart(penguins_df).mark_boxplot(size=50, extent=0.5).encode(
    x='species:O',
    y=alt.Y('culmen_length_mm:Q', scale=alt.Scale(zero=False)),
    color=alt.Color('species')
).properties(width=300)
I would like to jitter the outliers and also make the points smaller. Is that possible, or would we have to create two layered plots? Ideally the jittered points are all found within the width of the boxplot itself, but that isn't necessary.
I don't think you can jitter them, but you can make them smaller:
alt.Chart(penguins_df).mark_boxplot(size=50, extent=0.5, outliers={'size': 5}).encode(
    x='species:O',
    y=alt.Y('culmen_length_mm:Q', scale=alt.Scale(zero=False)),
    color=alt.Color('species')
).properties(width=300)

Add labels ONLY to SELECTED data points in seaborn scatter plot

I have created a seaborn scatter plot and added a trendline to it. I have some datapoints that fall very far away from the trendline (see the ones highlighted in yellow) so I'd like to add data labels only to these points, NOT to all the datapoints in the graph.
Does anyone know what's the best way to do this?
So far I've found answers to "how to add labels to ALL data points" (see this link) but this is not my case.
In the accepted answer to the question that you reference you can see that the way they add labels to all data points is by looping over the data points and calling .text(x, y, string) on the axes. You can find the documentation for this method here (seaborn is implemented on top of matplotlib). You'll have to call this method for the selected points.
In your specific case I don't know exactly what formula you want to use to find your outliers but to literally get the ones beyond the limits of the yellow rectangle that you've drawn you could try the following:
for x, y in zip(xarr, yarr):
    # label only the points inside the region of interest
    if x < 5 and y > 5.5:
        ax.text(x+0.01, y, 'outlier', horizontalalignment='left', size='medium', color='black')
Where xarr is your x-values, yarr your y-values and ax the returned axes from your call to seaborn.
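If you would rather pick the points by their distance from the trendline than by a hand-drawn rectangle, a sketch along these lines may help. The rule used here (more than 2 standard deviations of the residuals away from a straight-line fit) is only an assumption for illustration, and the data is made up; substitute whatever criterion defines an outlier for you.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: the third point sits well above the trend of the others.
xarr = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
yarr = np.array([1.1, 2.0, 6.5, 4.1, 4.9, 6.2])

# Fit a straight trendline and measure each point's vertical distance from it.
slope, intercept = np.polyfit(xarr, yarr, deg=1)
residuals = yarr - (slope * xarr + intercept)

# Assumed rule: select points more than 2 residual standard deviations away.
selected = np.abs(residuals) > 2 * residuals.std()

fig, ax = plt.subplots()
ax.scatter(xarr, yarr)
ax.plot(xarr, slope * xarr + intercept, color="gray")

# Label only the selected points, not every point.
for x, y in zip(xarr[selected], yarr[selected]):
    ax.text(x + 0.05, y, f"({x:g}, {y:g})", horizontalalignment="left")
```

With this data only the point at x = 3 gets a label.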

how to use plt.yscale('log') for specific values between 0 and 1?

I need to plot a logarithmic y-axis between 0 and 1 like the graph in the picture.
I need the points on the y-axis to be [0.005,0.010,0.050,0.100,0.500,1] like the graph in the picture. How can I choose which values will show on the axis?
Use plt.yscale('log') to make the y-axis logarithmic and plt.axis([1,10000,0.004,1]) to set the plot borders (xmin, xmax, ymin, ymax).
Use plt.yticks([0.005,0.010,0.050,0.100,0.500,1],[0.005,0.010,0.050,0.100,0.500,1]) to choose the values that will show. The general form is:
plt.yticks([points],[names])
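Put together, the steps might look like the sketch below. It uses the Axes methods (set_yscale, set_yticks, set_yticklabels, set_ylim) instead of the plt functions, which do the same thing for the current axes. The curve is made up; only the tick values and the axis limits come from the question and answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

ticks = [0.005, 0.010, 0.050, 0.100, 0.500, 1]

# Made-up curve that falls from 1 to 0.01 over the x range.
x = np.linspace(1, 10000, 200)
y = x ** -0.5

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_yscale("log")                              # logarithmic y-axis
ax.set_yticks(ticks)                              # where the ticks go
ax.set_yticklabels([f"{t:.3f}" for t in ticks])   # what they are called
ax.set_xlim(1, 10000)                             # plot borders
ax.set_ylim(0.004, 1)
```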

MatPlotLib - Showing legend

I'm making a scatter plot from a Pandas DataFrame with 3 columns. The first two would be the x and y axes, and the third would be classification data that I want to visualize by giving the points different colors. My question is, how can I add a legend to this plot:
df= df.groupby(['Month', 'Price'])['Quantity'].sum().reset_index()
df.plot(kind='scatter', x='Month', y='Quantity', c=df.Price , s = 100, legend = True);
As you can see, I'd like to automatically color the dots based on their price, so adding labels manually is a bit of an inconvenience. Is there a way I could add something to this code, that would also show a legend to the Price values?
Also, this colors the scatter plot dots on a range from black to white. Can I add custom colors without giving up the easy usage of c=df.Price?
Thank you!
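One possible approach, sketched here with made-up data, is to call matplotlib's scatter directly, keep the object it returns, and build the legend from its legend_elements() method (available in matplotlib 3.1 and later). The cmap argument swaps the black-to-white range for a named colormap while still coloring by c=df["Price"]. The column names follow the question; the values and the choice of "viridis" are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the grouped DataFrame in the question.
df = pd.DataFrame({
    "Month": [1, 1, 2, 2, 3, 3],
    "Price": [10, 20, 10, 30, 20, 30],
    "Quantity": [5, 3, 8, 2, 6, 4],
})

fig, ax = plt.subplots()
# Color each point by its Price, using a named colormap.
points = ax.scatter(df["Month"], df["Quantity"], c=df["Price"], s=100, cmap="viridis")

# One legend entry per distinct Price value.
handles, labels = points.legend_elements()
ax.legend(handles, labels, title="Price")
ax.set_xlabel("Month")
ax.set_ylabel("Quantity")
```

If Price takes many distinct values, fig.colorbar(points) may read better than a legend.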

Sorted bar charts with pandas/matplotlib or seaborn

I have a dataset of 5000 products with 50 features. One of the columns is 'colors', and it contains more than 100 distinct colors. I'm trying to plot a bar chart that shows only the top 10 colors and how many products there are in each color.
top_colors = df.colors.value_counts()
top_colors[:10].plot(kind='barh')
plt.xlabel('No. of Products');
Using Seaborn:
sns.factorplot("colors", data=df , palette="PuBu_d");
1) Is there a better way to do this?
2) How can I replicate this with Seaborn?
3) How do I plot it so that the highest count is at the top (i.e. black at the very top of the bar chart)?
An easy trick might be to invert the y axis of your plot, rather than futzing with the data:
import string
import numpy as np
import pandas as pd
s = pd.Series(np.random.choice(list(string.ascii_uppercase), 1000))
counts = s.value_counts()
ax = counts.iloc[:10].plot(kind="barh")
ax.invert_yaxis()
When this answer was written, Seaborn's barplot did not support horizontally oriented bars; current versions do (pass the categories as y and the counts as x). To control the order in which the bars appear, pass a list of values to the order parameter (x_order in old versions). But I think it's easier to use the pandas plotting methods here, anyway.
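If you want to use pandas, you can sort first. barh draws the first row at the bottom, so sort the top 10 in ascending order to put the highest count at the top:
top_colors[:10].sort_values(ascending=True).plot(kind='barh')
Seaborn can style your pandas plots (call sns.set_theme() in current versions), but you can also use:
sns.barplot(x=top_colors.index, y=top_colors.values)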
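As a small end-to-end sketch of the pandas route: value_counts() already sorts from most to least common, so taking the head and inverting the y-axis puts the largest bar at the top. The color list below is made up, and the example keeps the top 3 instead of the top 10 only to stay short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import pandas as pd

# Made-up stand-in for df.colors.
colors = pd.Series(["black"] * 5 + ["white"] * 3 + ["red"] * 2 + ["blue"])

# value_counts() sorts descending, so head(3) is the top 3.
top = colors.value_counts().head(3)

ax = top.plot(kind="barh")
ax.invert_yaxis()                  # first row (highest count) at the top
ax.set_xlabel("No. of Products")
```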
