Seaborn plot two data sets on the same scatter plot - python

I have two data sets in pandas DataFrames and want to visualize them on the same scatter plot, so I tried:
import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(x_vars=['Std'], y_vars=['ATR'], data=set1, hue='Asset Subclass')
sns.pairplot(x_vars=['Std'], y_vars=['ATR'], data=set2, hue='Asset Subclass')
plt.show()
But I always get two separate charts instead of a single one.
How can I visualize both data sets on the same plot? Also, can I have the same legend for both data sets but different colors for the second data set?

The following should work in seaborn 0.9.0 and later:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
First, concatenate the two datasets into one and add a dataset column that records which dataset each row came from:
concatenated = pd.concat([set1.assign(dataset='set1'), set2.assign(dataset='set2')])
Then use sns.scatterplot (introduced in seaborn 0.9.0) and pass style='dataset' so that the marker shape is based on the dataset column:
sns.scatterplot(x='Std', y='ATR', data=concatenated,
                hue='Asset Subclass', style='dataset')
plt.show()
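If you would rather keep the two data frames separate and give the second one its own colors, another option is to call an axes-level function twice on the same Axes. sns.pairplot is figure-level and builds a new figure on every call, which is why the question ends up with two charts; sns.scatterplot instead draws on whatever Axes you pass via ax=. A minimal sketch, assuming small hypothetical stand-ins for set1 and set2 with the question's column names; the "Blues"/"Reds" palettes and the marker shapes are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-ins for the question's set1 and set2.
set1 = pd.DataFrame({"Std": [0.10, 0.20, 0.30, 0.40],
                     "ATR": [1.0, 1.5, 2.0, 2.5],
                     "Asset Subclass": ["Equity", "Bond", "Equity", "Bond"]})
set2 = pd.DataFrame({"Std": [0.15, 0.25, 0.35],
                     "ATR": [1.2, 1.7, 2.2],
                     "Asset Subclass": ["Equity", "Bond", "Bond"]})

fig, ax = plt.subplots()
# Both calls draw onto the same Axes, so all points end up on one scatter plot.
# Each data set gets its own palette and marker so the two stay distinguishable.
sns.scatterplot(x="Std", y="ATR", hue="Asset Subclass", data=set1,
                palette="Blues", marker="o", ax=ax)
sns.scatterplot(x="Std", y="ATR", hue="Asset Subclass", data=set2,
                palette="Reds", marker="X", ax=ax)
ax.set_title("set1 (blues, circles) vs set2 (reds, crosses)")
```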

Related

Can seaborn normalise data so that the y-axis is clear?

I am plotting time series whose y values differ by orders of magnitude.
I am using seaborn.lmplot and expected to find a normalise keyword, but could not find one.
I tried using a log scale, but this failed (see diagram).
This is my best attempt so far:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
gbp_stats = pd.read_csv('price_data.csv')
sns.lmplot(data=gbp_stats, x='numeric_time', y='last trade price', col='symbol')
plt.yscale('log')
plt.show()
Which gave me this:
As you can see, the y-axis needs to be scaled or normalized separately for each plot. I could normalize the data in pandas, but would rather avoid that if possible.
So my question is: does seaborn have a normalize feature so that the y-axes can be compared better than in my attempt?
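For comparison, the pandas normalization mentioned in the question can be fairly short. A minimal sketch, assuming a small hypothetical frame with the question's column names and rows already in time order within each symbol, that divides every price by its symbol's first price so each series starts at 1.0:

```python
import pandas as pd

# Hypothetical prices for two symbols on very different scales.
gbp_stats = pd.DataFrame({
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "numeric_time": [0, 1, 2, 0, 1, 2],
    "last trade price": [2.0, 3.0, 4.0, 1000.0, 1100.0, 1200.0],
})

# Divide each symbol's prices by that symbol's first price.
# Assumes rows are already sorted by time within each symbol.
gbp_stats["normalized price"] = (
    gbp_stats.groupby("symbol")["last trade price"].transform(lambda s: s / s.iloc[0])
)
print(gbp_stats["normalized price"].tolist())  # [1.0, 1.5, 2.0, 1.0, 1.1, 1.2]
```

The new column can then be passed to sns.lmplot as y='normalized price'.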
I am posting this answer, which is derived directly from mwaskom's comment suggesting sharey=False, with a small tweak: passing sharey directly to lmplot has been deprecated in seaborn, so it now goes into a dict.
The fix is to add the facet_kws keyword, which takes a dict like this: facet_kws={'sharey': False}
So the answer becomes this:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
gbp_stats = pd.read_csv('price_data.csv')
g = sns.lmplot(data=gbp_stats, x='numeric_time', y='last trade price',
               col='symbol', hue='symbol', facet_kws={'sharey': False})
g.set(yscale='log')  # optional now; unlike plt.yscale, this applies to every facet
plt.show()
And the result is this:
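A self-contained way to check the facet_kws={'sharey': False} behaviour without the original price_data.csv is to generate two hypothetical price series about three orders of magnitude apart. With independent y-axes, each facet gets its own limits. The symbols, trends, and noise levels below are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for price_data.csv: two symbols on very different scales.
rng = np.random.default_rng(0)
t = np.arange(50)
gbp_stats = pd.concat([
    pd.DataFrame({"numeric_time": t, "symbol": "AAA",
                  "last trade price": 1 + 0.01 * t + rng.normal(0, 0.01, t.size)}),
    pd.DataFrame({"numeric_time": t, "symbol": "BBB",
                  "last trade price": 1000 + 10 * t + rng.normal(0, 10, t.size)}),
], ignore_index=True)

# facet_kws is forwarded to FacetGrid; sharey=False gives each column its own y-axis.
g = sns.lmplot(data=gbp_stats, x="numeric_time", y="last trade price",
               col="symbol", hue="symbol", facet_kws={"sharey": False})
```

With a shared y-axis, the AAA panel would look flat at the bottom of a 0 to 1500 range; with sharey=False, both regressions are readable.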

How to incorporate subplots option when plotting a data frame using Pandas-Bokeh?

I have a dataframe corresponding to a multivariate time series which I'd like to plot. Each channel would appear on its own set of axes, with all plots arranged vertically. I'd also like to add the interactive options available with Bokeh, including the ability to remove one channel from view by clicking on its label.
Without Bokeh, I can use subplots to get the separate "static" plots stacked vertically as follows:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
A=np.random.rand(800,10)
df=pd.DataFrame(data=A,columns=['a','b','c','d','e','f','g','h','i','j'])
df.plot(subplots=True)
plt.show()
I can plot the 10 channels on one set of axes with Bokeh like this:
import numpy as np
import pandas as pd
import pandas_bokeh
pd.set_option('plotting.backend', 'pandas_bokeh')
A=np.random.rand(800,10)
df=pd.DataFrame(data=A,columns=['a','b','c','d','e','f','g','h','i','j'])
df.plot_bokeh(kind="line")
The resulting graph allows zooming, panning, channel de-selection, etc. However, all signals are plotted on the same set of axes, which I would rather avoid.
I use this code snippet to plot my figures in a grid.
import itertools
import pandas as pd
import pandas_bokeh
from bokeh.palettes import Dark2_5 as palette
def plot_grid(df: pd.DataFrame):
    figs = []
    color = itertools.cycle(palette)  # cycle through the palette's colors
    for c in df.columns:
        # one figure per column, collected instead of shown immediately
        figs.append(df[c].plot_bokeh(show_figure=False, color=next(color)))
    pandas_bokeh.plot_grid(figs, ncols=1, plot_width=1500)
The ncols parameter sets how many figures appear in each row of the grid; ncols=1 stacks them vertically.
Hope this helps!
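If you would rather not depend on pandas_bokeh, a similar vertical layout can be sketched with plain Bokeh: one figure per column, all sharing the first figure's x_range so panning and zooming stay synchronized, stacked with bokeh.layouts.gridplot. Setting each legend's click_policy to "hide" lets you hide a line by clicking its label. This is a swapped-in approach, not what pandas_bokeh does internally, and the 150 by 900 pixel figure size is an arbitrary choice:

```python
import numpy as np
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show

A = np.random.rand(800, 10)
df = pd.DataFrame(data=A, columns=list("abcdefghij"))

figs = []
for col in df.columns:
    # Link every x-axis to the first figure's range so pan/zoom stays in sync.
    kwargs = {"x_range": figs[0].x_range} if figs else {}
    p = figure(height=150, width=900, **kwargs)
    p.line(df.index.to_numpy(), df[col].to_numpy(), legend_label=col)
    p.legend.click_policy = "hide"  # clicking the legend entry hides the line
    figs.append(p)

grid = gridplot(figs, ncols=1)  # a single column stacks the plots vertically
# show(grid)  # uncomment to open the result in a browser
```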

How to create seaborn violinplot with mean, median and mode displayed?

Is there a way to add a mean and a mode to a violinplot? I have categorical data in one column and the corresponding values in the next column. I looked into matplotlib's violin plot, which technically offers the functionality I am looking for, but it does not let me put a categorical variable on the x-axis, and this is crucial because I am looking at the distribution of the data per category. I have added a small table illustrating the shape of the data.
plt.figure(figsize=(10, 15))
ax=sns.violinplot(x='category',y='value',data=df)
First, we calculate the modes and means:
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'Category':[1,2,5,1,2,4,3,4,2],
'Value':[1.5,1.2,2.2,2.6,2.3,2.7,5,3,0]})
Means = df.groupby('Category')['Value'].mean()
Modes = df.groupby('Category')['Value'].agg(lambda x: pd.Series.mode(x)[0])
You can use seaborn to make the basic plot. Below, inner=None removes the inner box plot so that the modes and means are visible:
fig, ax = plt.subplots()
sns.violinplot(x='Category', y='Value', data=df, inner=None, ax=ax)
plt.setp(ax.collections, alpha=.3)  # fade the violins so the markers stand out
# violinplot places the sorted categories at x = 0, 1, 2, ...
ax.scatter(x=range(len(Means)), y=Means, c="k", label="mean")
ax.scatter(x=range(len(Modes)), y=Modes, label="mode")
ax.legend()
plt.show()
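The title also asks for the median, which the answer above does not draw. A sketch that computes mean, median, and mode in one groupby pass and overlays all three; the data frame is hypothetical (each category has several values so it has a clear mode), and the colors and marker shapes are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: two categories with repeated values so each has a clear mode.
df = pd.DataFrame({"Category": ["a"] * 5 + ["b"] * 5,
                   "Value": [1, 2, 2, 3, 7, 4, 5, 5, 5, 9]})

# Mean, median and mode per category; mode().iloc[0] keeps the smallest
# value when several values tie for the mode.
stats = df.groupby("Category")["Value"].agg(
    mean="mean",
    median="median",
    mode=lambda x: x.mode().iloc[0],
)

fig, ax = plt.subplots()
# order= pins the violins to the groupby's sorted order, so violin i sits at x = i.
sns.violinplot(x="Category", y="Value", data=df, inner=None,
               order=list(stats.index), ax=ax)
plt.setp(ax.collections, alpha=0.3)  # fade the violins before adding markers
pos = range(len(stats))
ax.scatter(pos, stats["mean"], color="k", marker="o", zorder=3, label="mean")
ax.scatter(pos, stats["median"], color="tab:red", marker="_", s=300, zorder=3, label="median")
ax.scatter(pos, stats["mode"], color="tab:blue", marker="x", zorder=3, label="mode")
ax.legend()
```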

How to plot the count of anomalies grouped by the factory and component?

I have the following code using countplot to plot the count of anomalies grouped by the factory:
import seaborn as sns
sns.countplot(x='factory', hue='anomaly', data=train_df)
This works (although the image is very narrow), but I need a chart that shows the count of products grouped by factory and anomaly.
How can I do this?
The chart may be very large, as there are dozens of anomalies and components, so I'll probably have to generate a larger image. What do you suggest?
Here's a small sample of the data:
product_id,factory,anomaly,component
1,1,AC1,W2
2,3,AB1,J1
3,2,AC3,L3
4,4,BA2,T2
5,3,BA2,T2
6,1,AA1,X2
7,4,AC2,J1
8,2,CA1,N1
9,2,AB3,J1
10,4,BB3,W1
11,2,AC3,C3
12,4,CA1,M1
13,3,BC3,Q1
14,2,AC2,O3
And here is a link to the complete data: CSV
What the plot should look like:
I guess you want to create a countplot like this:
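import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(figsize=(20, 6))  # a wide figure leaves room for dozens of anomaly labels
sns.countplot(x='anomaly', hue='factory', data=df, ax=ax)
plt.setp(ax.get_xticklabels(), rotation=90)  # vertical tick labels avoid overlap
plt.show()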
You could also create a pivot table of the factories and anomalies with the number of different components as values.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data/component_factory.txt")
piv = df.pivot_table(values='component', index='anomaly', columns='factory',
                     aggfunc='nunique')  # number of distinct components per cell
piv.plot.bar(width=0.8)
plt.show()
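If the goal is the number of products (rows) for each anomaly and factory rather than the number of distinct components, pd.crosstab counts the rows directly and fills missing combinations with 0. This is a swapped-in technique, not part of the answer above. A minimal sketch using the 14 sample rows from the question, read from a string so it runs on its own; the figure size is an arbitrary choice:

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import pandas as pd

# The sample rows from the question; for the full data, read the CSV file instead.
sample = """product_id,factory,anomaly,component
1,1,AC1,W2
2,3,AB1,J1
3,2,AC3,L3
4,4,BA2,T2
5,3,BA2,T2
6,1,AA1,X2
7,4,AC2,J1
8,2,CA1,N1
9,2,AB3,J1
10,4,BB3,W1
11,2,AC3,C3
12,4,CA1,M1
13,3,BC3,Q1
14,2,AC2,O3
"""
df = pd.read_csv(io.StringIO(sample))

# Number of products for every (anomaly, factory) pair; missing pairs become 0.
counts = pd.crosstab(df["anomaly"], df["factory"])

# One group of bars per anomaly, one bar per factory.
ax = counts.plot.bar(figsize=(12, 5), width=0.8)
ax.set_ylabel("number of products")
```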

Python categorical plot with error bands

I need to make a plot of the following data, with the year_week on x-axis, the test_duration on the y-axis, and each operator as a different series. There may be multiple data points for the same operator in one week. I need to show standard deviation bands around each series.
data = pd.DataFrame({'year_week':[1601,1602,1603,1604,1604,1604],
'operator':['jones','jack','john','jones','jones','jack'],
'test_duration':[10,12,43,7,23,9]})
prints as:
I have looked at seaborn, matplotlib, and pandas, but I cannot find a solution.
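It could be that you are looking for seaborn's pointplot, which draws the mean test_duration per week for each operator, with error bars showing a 95% confidence interval by default: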
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame({'year_week':[1601,1602,1603,1604,1604,1604],
'operator':['jones','jack','john','jones','jones','jack'],
'test_duration':[10,12,43,7,23,9]})
sns.pointplot(x="year_week", y="test_duration", hue="operator", data=data)
plt.show()
