seaborn plot_marginals multiple kdeplots - python

I would like to be able to plot multiple overlaid kde plots on the y axis margin (don't need the x axis margin plot). Each kde plot would correspond to the color category (there are 4) so that I would have 4 kde's each depicting the distribution of one of the categories. This is as far as I got:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
x = [106405611, 107148674, 107151119, 107159869, 107183396, 107229405, 107231917, 107236097,
107239994, 107259338, 107273842, 107275873, 107281000, 107287770, 106452671, 106471246,
106478110, 106494135, 106518400, 106539079]
y = np.array([ 9.09803208, 5.357552 , 8.98868469, 6.84549005,
8.17990909, 10.60640521, 9.89935692, 9.24079133,
8.97441459, 9.09803208, 10.63753055, 11.82336724,
7.93663794, 8.74819285, 8.07146236, 9.82336724,
8.4429435 , 10.53332973, 8.23361968, 10.30035256])
x1 = pd.Series(x, name="$V$")
x2 = pd.Series(y, name="$Distance$")
col = np.array([2, 4, 4, 1, 3, 4, 3, 3, 4, 1, 4, 3, 2, 4, 1, 1, 2, 2, 3, 1])
g = sns.JointGrid(x1, x2)
g = g.plot_joint(plt.scatter, color=col, edgecolor="black", cmap=plt.cm.get_cmap('RdBu', 11))
cax = g.fig.add_axes([1, .25, .02, .4])
plt.colorbar(cax=cax, ticks=np.linspace(1,11,11))
g.plot_marginals(sns.kdeplot, color="black", shade=True)

To plot a distribution of each category, I think the best way is to first combine the data into a pandas dataframe. Then you can loop through each unique category by filtering the dataframe and plot the distribution using calls to sns.kdeplot.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
x = np.array([106405611, 107148674, 107151119, 107159869, 107183396, 107229405,
107231917, 107236097, 107239994, 107259338, 107273842, 107275873,
107281000, 107287770, 106452671, 106471246, 106478110, 106494135,
106518400, 106539079])
y = np.array([9.09803208, 5.357552 , 8.98868469, 6.84549005,
8.17990909, 10.60640521, 9.89935692, 9.24079133,
8.97441459, 9.09803208, 10.63753055, 11.82336724,
7.93663794, 8.74819285, 8.07146236, 9.82336724,
8.4429435 , 10.53332973, 8.23361968, 10.30035256])
col = np.array([2, 4, 4, 1, 3, 4, 3, 3, 4, 1, 4, 3, 2, 4, 1, 1, 2, 2, 3, 1])
# Combine data into DataFrame
df = pd.DataFrame({'V': x, 'Distance': y, 'col': col})
# Define colormap and create corresponding color palette
cmap = sns.diverging_palette(20, 220, as_cmap=True)
colors = sns.diverging_palette(20, 220, n=4)
# Plot data onto seaborn JointGrid
g = sns.JointGrid(x='V', y='Distance', data=df, ratio=2)
g = g.plot_joint(plt.scatter, c=df['col'], edgecolor="black", cmap=cmap)
# Loop through unique categories and plot individual kdes
for c in df['col'].unique():
    sns.kdeplot(df['Distance'][df['col']==c], ax=g.ax_marg_y, vertical=True,
                color=colors[c-1], shade=True)
    sns.kdeplot(df['V'][df['col']==c], ax=g.ax_marg_x, vertical=False,
                color=colors[c-1], shade=True)
This is, in my opinion, a much better and cleaner solution than my original answer, in which I needlessly redefined the seaborn kdeplot because I had not thought to do it this way. Thanks to mwaskom for pointing that out. Also note that the legend labels were removed from the marginal axes in the posted solution, using:
g.ax_marg_x.legend_.remove()
g.ax_marg_y.legend_.remove()

Related

Automatically find and add the coordinates to add Annotations (e.g. count) on a Boxplot made from a Dictionary of uneven Lists

I'm pretty new to programming and I'm really frustrated by a problem which I thought would be easy to solve...
Case: Let's say I have a dictionary of lists with uneven lengths. The number of keys (strings) and values (numbers) could change at any time.
Need: I want to annotate each category with some information (e.g. the count); each key is an individual category.
Problem: I found many solutions for categories with equal numbers of values, which apparently don't work for me, e.g. Solution.
I also found some answers, e.g. Solution, saying that I should first get the coordinates of each key on the x-axis and then apply an inverted transformation to work with the "log scales". That was the best solution for me so far, but unfortunately the result does not really match the coordinates, and I couldn't get and add the points automatically before calling plt.show().
I could also guess the coordinates by trial and error, using the transformation method or an offset, e.g. Solution. But as I said, my dictionary could change at any time, and then I would have to do it all over again!
I think there should be a much simpler way to solve this problem, but I couldn't find it.
Here is a simplified example of my code and what I tried:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
AnnotationBbox)
dictionary = {}
dictionary["a"] = [1, 2, 3, 4, 5]
dictionary["b"] = [1, 2, 3, 4, 5, 6, 7]
fig, ax = plt.subplots()
ax.boxplot(dictionary.values())
x = ax.set_xticklabels(dictionary.keys())
fig.text(x = 0.25, y = 0, s = str(len(dictionary["a"])))
fig.text(x = 0.75, y = 0, s = str(len(dictionary["b"])))
plt.show()
crd = np.vstack((ax.get_xticks(), np.zeros_like(ax.get_xticks()))).T
ticks = ax.transAxes.inverted().transform(ax.transData.transform(crd))
print(ticks[:,0])
# ab = AnnotationBbox(TextArea("text"), xy=(1, 0), xybox =(0, -30), boxcoords="offset points",pad=0,frameon=False )
# ax.add_artist(ab)
Output of my code
As I understand it, you may want something like this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import (TextArea, DrawingArea, OffsetImage,
AnnotationBbox)
dictionary = {}
dictionary["a"] = [1, 2, 3, 4, 5]
dictionary["b"] = [1, 2, 3, 4, 5, 6, 7]
dictionary["cex"] = [1, 2, 3]
fig, ax = plt.subplots()
ax.boxplot(dictionary.values())
x = ax.set_xticklabels(dictionary.keys())
ticksList=ax.get_xticks()
print (ticksList)
for tick in ticksList:
    # boxplot positions start at 1, so tick - 1 indexes the matching list
    count = len(list(dictionary.values())[tick - 1])
    ax.text(tick, 0, str(count), fontdict={'horizontalalignment': 'center'})
plt.show()
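The question also asks for placement that survives changes in the data or the y-scale. One possible way, sketched below, is to give the text a blended transform so that x is read in data coordinates and y as a fraction of the axes height. The "n = ..." label format and the 1.01 offset above the top edge are arbitrary choices for this sketch.

```python
import matplotlib.pyplot as plt

dictionary = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, 5, 6, 7],
    "cex": [1, 2, 3],
}

fig, ax = plt.subplots()
ax.boxplot(list(dictionary.values()))
ax.set_xticklabels(list(dictionary.keys()))

# Blended transform: x in data coordinates, y in axes coordinates
# (0 = bottom edge of the axes, 1 = top edge)
trans = ax.get_xaxis_transform()

# boxplot places the boxes at x = 1, 2, 3, ... by default
for position, values in enumerate(dictionary.values(), start=1):
    ax.text(position, 1.01, f"n = {len(values)}",
            transform=trans, ha="center", va="bottom")
```

Since y is given relative to the axes, the labels stay just above the plot area whatever the data range is, and the loop adapts to any number of keys. Call plt.show() to display the figure.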

Plotly: How to make all plots grayscale?

I am using Plotly to generate a few line plots in Python. With sample code like this:
from plotly import offline as plot, subplots as subplot, graph_objects as go
fig = subplot.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
trace1 = go.Scatter(x = [1, 2, 3], y = [1, 2, 3])
trace2 = go.Scatter(x = [1, 2, 3], y = [4, 5, 6])
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
config_test_plot = {'displaylogo': False, 'displayModeBar': False, 'scrollZoom': True}
test_plot_html = plot.plot(fig, output_type='div', include_plotlyjs=False, config= config_test_plot)
I am able to get the required plots. However, I want to be able to get all my plots in grayscale. I see that none of the Plotly default themes are of this type. Is there any way I can do this?
You haven't specified whether to assign a grey color scheme for your entire plot, or only for your lines. But just to not make things easy for myself, I'm going to assume the former. In that case, I would:
use template = 'plotly_white' for the figure elements not directly connected to your dataset, and
assign a grey scale to all lines using n_colors(lowcolor, highcolor, n_colors, colortype='tuple').
Example plot:
But as @S3DEV mentions, using the Greys color palette could be a way to go too, and this is accessible through:
# In:
px.colors.sequential.Greys
# Out:
# ['rgb(255,255,255)',
# 'rgb(240,240,240)',
# 'rgb(217,217,217)',
# 'rgb(189,189,189)',
# 'rgb(150,150,150)',
# 'rgb(115,115,115)',
# 'rgb(82,82,82)',
# 'rgb(37,37,37)',
# 'rgb(0,0,0)']
And this would work perfectly for your use case with a limited number of lines. In that case you could just use this setup:
from plotly import offline as plot, subplots as subplot, graph_objects as go
from itertools import cycle
import plotly.express as px
fig = subplot.make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.01)
trace1 = go.Scatter(x = [1, 2, 3], y = [1, 2, 3])
trace2 = go.Scatter(x = [1, 2, 3], y = [4, 5, 6])
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 2, 1)
colors = cycle(list(set(px.colors.sequential.Greys)))
f = fig.full_figure_for_development(warn=False)
for d in fig.data:
    d.line.color = next(colors)
fig.show()
And get:
And I assume that this is what you were looking for. But one considerable drawback here is that the number of colors in px.colors.sequential.Greys is limited, and I had to use a cycle to assign the line colors of your data. By contrast, n_colors(lowcolor, highcolor, n_colors, colortype='tuple') lets you define a starting color, an end color, and the number of colors to scale between them, giving a complete scale for all your lines. This also lets you adjust the brightness of the colors to your liking. So you could get this:
...this:
or this:
Here's a complete setup for those figures if you would like to experiment with that as well:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
from plotly.colors import n_colors
pd.set_option('display.max_rows', None)
pd.options.plotting.backend = "plotly"
# data sample
nperiods = 200
np.random.seed(123)
cols = 'abcdefghijkl'
df = pd.DataFrame(np.random.randint(-10, 12, size=(nperiods, len(cols))),
columns=list(cols))
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0] =1000
df = df.cumsum()#.reset_index()
greys_all = n_colors('rgb(0, 0, 0)', 'rgb(255, 255, 255)', len(cols)+1, colortype='rgb')
greys_dark = n_colors('rgb(0, 0, 0)', 'rgb(200, 200, 200)', len(cols)+1, colortype='rgb')
greys_light = n_colors('rgb(200, 200, 200)', 'rgb(255, 255, 255)', len(cols)+1, colortype='rgb')
greys = n_colors('rgb(100, 100, 100)', 'rgb(255, 255, 255)', len(cols)+1, colortype='rgb')
fig = df.plot(title = 'Greys_light', template='plotly_white', color_discrete_sequence=greys_light)
fig.update_layout(template='plotly_white')
fig.show()

Python: conditional color in barplot

I'm making a horizontal barplot. I need one specific bar (type=milk) to have a green fill color, and gray for the other types. The dataframe is:
df = pd.DataFrame({'val': [1, 2, 3, 6, 7, 8],
                   'type': ["honey", "bread", "coffee", "bread", "honey", "milk"]})
I have tried this without success:
clrs = ['green' if ((x == milk) else 'gray' for x in type]
ax.barh(df['type'], (df['val']), align='center', colors=clrs)
Any ideas?
Yes, you could change the patch of the appropriate bar, something like
b = ax.barh(df['type'], df['val'], align='center', color='gray')
b.patches[5].set_color('green')  # milk is the sixth row, so its patch has index 5
However I don't understand why you are repeating the same labels multiple times.
If you didn't have repeated elements you could find the label milk automatically like this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.close('all')
df = pd.DataFrame({'val': np.array([1, 2, 3, 6]), 'types': ["honey", "coffee", "bread","milk"]})
b = plt.barh(df['types'], df['val'], align='center', color='gray')
index_milk = df[df['types']=='milk'].index[0]
b.patches[index_milk].set_color('green')
plt.show()
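The list-comprehension approach from the question can also be made to work. A minimal sketch, assuming the dataframe from the question expressed in pandas: the condition compares against the string 'milk', iterates over the column itself, and the list is passed through the color keyword (barh has no colors keyword).

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'val': [1, 2, 3, 6, 7, 8],
                   'type': ["honey", "bread", "coffee", "bread", "honey", "milk"]})

# One color per row: green for milk, gray for everything else
clrs = ['green' if t == 'milk' else 'gray' for t in df['type']]

fig, ax = plt.subplots()
bars = ax.barh(df['type'], df['val'], align='center', color=clrs)
```

This does not depend on the position of milk in the dataframe. Note that rows sharing a label (honey, bread) are drawn on top of each other at the same y position. Call plt.show() to display the figure.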

In pandas, how to convert a numeric type to category type to use with seaborn hue

I am stuck on what seems like an easy problem trying to color the different groups on a scatterplot I am creating. I have the following example dataframe and graph:
test_df = pd.DataFrame({ 'A' : 1.,
'B' : np.array([1, 5, 9, 7, 3], dtype='int32'),
'C' : np.array([6, 7, 8, 9, 3], dtype='int32'),
'D' : np.array([2, 2, 3, 4, 4], dtype='int32'),
'E' : pd.Categorical(["test","train","test","train","train"]),
'F' : 'foo' })
# fix to category
# test_df['D'] = test_df["D"].astype('category')
# and test plot
f, ax = plt.subplots(figsize=(6,6))
ax = sns.scatterplot(x="B", y="C", hue="D", s=100,
data=test_df)
which creates this graph:
However, instead of a continuous scale, I'd like a categorical scale for each of the 3 categories [2, 3, 4]. After I uncomment the line of code test_df['D'] = ..., to change this column to a category column-type for category-coloring in the seaborn plot, I receive the following error from the seaborn plot: TypeError: data type not understood
Does anybody know the correct way to convert this numeric column to a factor / categorical column to use for coloring?
Thanks!
I copy/pasted your code, added libraries for import and removed the comment as I thought it looked good. I get a plot with 'categorical' colouring for value [2,3,4] without changing any of your code.
Try updating your seaborn module using: pip install --upgrade seaborn
Here is a list of working libraries used with your code.
matplotlib==3.1.2
numpy==1.18.1
seaborn==0.10.0
pandas==0.25.3
... which executed below code.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
test_df = pd.DataFrame({ 'A' : 1.,
'B' : np.array([1, 5, 9, 7, 3], dtype='int32'),
'C' : np.array([6, 7, 8, 9, 3], dtype='int32'),
'D' : np.array([2, 2, 3, 4, 4], dtype='int32'),
'E' : pd.Categorical(["test","train","test","train","train"]),
'F' : 'foo' })
# fix to category
test_df['D'] = test_df["D"].astype('category')
# and test plot
f, ax = plt.subplots(figsize=(6,6))
ax = sns.scatterplot(x="B", y="C", hue="D", s=100,
data=test_df)
plt.show()
I encountered the same error, TypeError: data type not understood.
A workaround that works is to pass legend="full". Converting to a categorical type is not necessary with this approach:
ax = sns.scatterplot(x="B", y="C", hue="D", s=100, legend="full", data=test_df)
Another solution is to use custom palette:
ax = sns.scatterplot(x="B", y="C", hue="D", s=100, palette=["b", "g", "r"], data=test_df)
In this case the number of colours must equal the number of unique values in column "D".
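As for the conversion itself, here is a small pandas-only sketch of two ways to turn the numeric column into something non-numeric. The column names D_cat and D_str are made up for this illustration. Seaborn treats string-valued hue data as categorical, so passing hue="D_str" is one more option to try if the category dtype raises the error above. The plotting call itself is not exercised here.

```python
import numpy as np
import pandas as pd

test_df = pd.DataFrame({'B': np.array([1, 5, 9, 7, 3], dtype='int32'),
                        'C': np.array([6, 7, 8, 9, 3], dtype='int32'),
                        'D': np.array([2, 2, 3, 4, 4], dtype='int32')})

# Option 1: pandas categorical dtype (the values stay integers underneath)
test_df['D_cat'] = test_df['D'].astype('category')

# Option 2: plain strings, which cannot be read as a continuous scale
test_df['D_str'] = test_df['D'].astype(str)

print(test_df.dtypes)
print(list(test_df['D_cat'].cat.categories))
```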

Python - legend values duplicate

I'm plotting a matrix, as shown below, and the legend repeats over and over again. I've tried using numpoints = 1 and this didn't seem to have any effect. Any hints?
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10, 8) # set default figure size, 10in by 8in
data = pd.read_csv('data/assg-03-data.csv', names=['exam1', 'exam2', 'admitted'])
# .as_matrix() was removed in pandas 1.0; .to_numpy() is its replacement
x = data[['exam1', 'exam2']].to_numpy()
y = data.admitted.to_numpy()
# plot the visualization of the exam scores here
no_admit = np.where(y == 0)
admit = np.where(y == 1)
from pylab import *
# plot the example figure
plt.figure()
# plot the points in our two categories, y=0 and y=1, using markers to indicated
# the category or output
plt.plot(x[no_admit,0], x[no_admit,1],'yo', label = 'Not admitted', markersize=8, markeredgewidth=1)
plt.plot(x[admit,0], x[admit,1], 'r^', label = 'Admitted', markersize=8, markeredgewidth=1)
# add some labels and titles
plt.xlabel('$Exam 1 score$')
plt.ylabel('$Exam 2 score$')
plt.title('Admit/No Admit as a function of Exam Scores')
plt.legend()
It's nearly impossible to understand the problem without an example of the data format, especially for someone who is not familiar with pandas.
However, assuming your input has this format:
# .as_matrix() was removed in pandas 1.0; .to_numpy() is its replacement
x = pd.DataFrame(np.array([np.arange(10), np.arange(10)**2]).T, columns=['exam1', 'exam2']).to_numpy()
y = pd.DataFrame(np.arange(10) % 2).to_numpy()
>>> x
array([[ 0,  0],
       [ 1,  1],
       [ 2,  4],
       [ 3,  9],
       [ 4, 16],
       [ 5, 25],
       [ 6, 36],
       [ 7, 49],
       [ 8, 64],
       [ 9, 81]])
>>> y
array([[0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1]])
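The reason is the indexing rather than the plotting: np.where returns a tuple of index arrays, so x[no_admit, 0] comes out as a 2D array, and plt.plot treats every column of a 2D array as a separate line with its own legend entry. I guess it wouldn't happen if you indexed with vectors (1D arrays).
For my example this works (not sure if it is the cleanest form, I don't know where the 2D matrix for x and y comes from):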
plt.plot(x[no_admit,0][0], x[no_admit,1][0],'yo', label = 'Not admitted', markersize=8, markeredgewidth=1)
plt.plot(x[admit,0][0], x[admit,1][0], 'r^', label = 'Admitted', markersize=8, markeredgewidth=1)
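A possibly cleaner alternative to the [0] indexing is a boolean mask, which yields 1D arrays directly. The sketch below only compares the array shapes, assuming y is a 1D vector as it is in the question (data.admitted); the variable name mask is my own.

```python
import numpy as np

x = np.array([np.arange(10), np.arange(10) ** 2]).T   # shape (10, 2)
y = np.arange(10) % 2                                 # shape (10,)

# np.where returns a tuple of index arrays; used inside x[..., 0]
# it produces a 2D result with a single row
no_admit = np.where(y == 0)
shape_with_where = x[no_admit, 0].shape

# A boolean mask selects the rows directly and keeps the result 1D
mask = (y == 0)
shape_with_mask = x[mask, 0].shape

print(shape_with_where)
print(shape_with_mask)
```

Passing x[mask, 0] and x[mask, 1] to plt.plot then gives a single line, and a single legend entry, per call.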
