Is there a way to add a mean and a mode to a violin plot? I have categorical data in one column and the corresponding values in the next. I looked into matplotlib's violin plot, which technically offers the functionality I am looking for, but it does not let me specify a categorical variable on the x axis. That is crucial, as I am looking at the distribution of the data per category. I have added a small table illustrating the shape of the data.
plt.figure(figsize=(10, 15))
ax=sns.violinplot(x='category',y='value',data=df)
First we calculate the modes and means:
import seaborn as sns
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'Category':[1,2,5,1,2,4,3,4,2],
'Value':[1.5,1.2,2.2,2.6,2.3,2.7,5,3,0]})
Means = df.groupby('Category')['Value'].mean()
Modes = df.groupby('Category')['Value'].agg(lambda x: pd.Series.mode(x)[0])
You can use seaborn to make the basic plot. Below I remove the inner boxplot with inner=None so that the mode and means are visible:
fig, ax = plt.subplots()
sns.violinplot(x='Category',y='Value',data=df,inner=None)
plt.setp(ax.collections, alpha=.3)
plt.scatter(x=range(len(Means)),y=Means,c="k")
plt.scatter(x=range(len(Modes)),y=Modes)
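Putting the pieces together, a minimal sketch might look like the following. The function names category_stats and plot_violin_with_markers, the marker styles and the legend labels are my own choices, not part of seaborn. Passing order= to violinplot is one way to make sure each marker lands on the violin it belongs to instead of relying on the default ordering.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt


def category_stats(data, cat_col, val_col):
    """Per-category mean and first mode, indexed by sorted category."""
    grouped = data.groupby(cat_col)[val_col]
    return pd.DataFrame({'mean': grouped.mean(),
                         'mode': grouped.agg(lambda s: s.mode().iloc[0])})


def plot_violin_with_markers(data, cat_col, val_col):
    """Violin plot with labelled mean and mode markers (hypothetical helper)."""
    stats = category_stats(data, cat_col, val_col)
    fig, ax = plt.subplots()
    # order= pins each violin to the same position as its row in stats.
    sns.violinplot(x=cat_col, y=val_col, data=data, inner=None,
                   order=list(stats.index), ax=ax)
    # Fade the violins before adding markers so the markers stay opaque.
    plt.setp(ax.collections, alpha=.3)
    positions = range(len(stats))
    ax.scatter(positions, stats['mean'], c='k', marker='o',
               label='mean', zorder=3)
    ax.scatter(positions, stats['mode'], c='tab:red', marker='D',
               label='mode', zorder=3)
    ax.legend()
    return ax


df = pd.DataFrame({'Category': [1, 2, 5, 1, 2, 4, 3, 4, 2],
                   'Value': [1.5, 1.2, 2.2, 2.6, 2.3, 2.7, 5, 3, 0]})
print(category_stats(df, 'Category', 'Value'))
```

Calling plot_violin_with_markers(df, 'Category', 'Value') followed by plt.show() then draws the figure. Note that when every value in a category is unique, the "mode" is simply the smallest value, because pandas returns all tied values in sorted order.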
I have numerous sets of seasonal data that I am looking to show in a heatmap format. I am not worried about the magnitude of the values in the dataset but more the overall direction and any patterns that I can look at in more detail later. To do this I want to create a heatmap that only shows 2 colours (red for below zero and green for zero and above).
I can create a normal heatmap with seaborn but the normal colour maps do not have only 2 colours and I am not able to create one myself. Even if I could I am unable to set the parameters to reflect the criteria of below zero = red and zero+ = green.
I managed to create this simply by styling the dataframe, but I was unable to export it as a .png because the table_conversion='matplotlib' option removes the formatting.
Below is an example of what I would like to create, made from random data. Could someone help or point me in the direction of a helpful Stack Overflow answer?
I have also included the code I used to style and export the dataframe.
Desired output - this is created with random data in an Excel spreadsheet
#Code to create a regular heatmap - can this be easily amended?
df_hm = pd.read_csv(filename+h)
pivot = df_hm.pivot_table(index='Year', columns='Month', values='delta', aggfunc='sum')
fig, ax = plt.subplots(figsize=(10,5))
ax.set_title('M1 '+h[:-7])
sns.heatmap(pivot, annot=True, fmt='.2f', cmap='RdYlGn')
plt.savefig(chartpath+h[:-7]+" M1.png", bbox_inches='tight')
plt.close()
#code used to export the dataframe, which loses its formatting in the .png
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import dataframe_image as dfi
#pivot is the dataframe name
pivot = pd.DataFrame(np.random.randint(-100, 100, size=(5, 12)), columns=list('ABCDEFGHIJKL'))
styles = [dict(selector="caption", props=[("font-size", "120%"),("font-weight", "bold")])]
pivot = (pivot.style.format(precision=2)
         .highlight_between(left=-100000, right=-0.01, props='color:white;background-color:red')
         .highlight_between(left=0, right=100000, props='color:white;background-color:green')
         .set_caption(title)
         .set_table_styles(styles))
dfi.export(pivot, root+'testhm.png', table_conversion='matplotlib',chrome_path=None)
You can set the cmap argument to a list of colours. Annotations still show the original values, because the data itself is not converted to -1 or 1.
import numpy as np
import seaborn as sns
arr = np.random.randn(10,10)
sns.heatmap(arr,cmap=["grey",'green'],annot=True,center=0)
# center=0 makes zero the dividing point between the two colours
Output:
PS. If you don't want the colour bar, you can pass cbar=False to `sns.heatmap`.
Welcome to SO!
To achieve what you need, you just need to pass delta through the sign function. Here is an example:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
arr = np.random.randn(25,25)
sns.heatmap(np.sign(arr))
This results in a binary heatmap, albeit one with a rather ugly colormap. You can experiment with seaborn's colormaps to make it look like the Excel version.
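The two ideas in this thread can be combined: colour each cell by whether it is below zero, but annotate it with the original value. The sketch below assumes that heatmap's annot argument accepts an array of the same shape as the data, which is how I understand the seaborn documentation. The helper name two_colour_heatmap and the random pivot table are hypothetical stand-ins for the data in the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap


def two_colour_heatmap(table, ax=None):
    """Red below zero, green at zero or above, annotated with real values."""
    is_non_negative = (table >= 0).astype(int)   # 0 -> red, 1 -> green
    cmap = ListedColormap(['red', 'green'])
    return sns.heatmap(is_non_negative, cmap=cmap, vmin=0, vmax=1,
                       annot=table, fmt='.2f', cbar=False,
                       mask=table.isna(),        # leave missing cells blank
                       linewidths=0.5, ax=ax)


# Hypothetical stand-in for the Year x Month pivot table in the question.
rng = np.random.default_rng(0)
pivot = pd.DataFrame(rng.uniform(-100, 100, size=(5, 12)),
                     index=range(2016, 2021),
                     columns=list('ABCDEFGHIJKL'))

fig, ax = plt.subplots(figsize=(10, 5))
two_colour_heatmap(pivot, ax=ax)
```

Because the result is an ordinary matplotlib figure, the plt.savefig call from the question should export it to .png with the colours intact.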
I have a dataframe corresponding to a multivariate time series which I'd like to plot. Each channel would appear on its own set of axes, with all plots arranged vertically. I'd also like to add the interactive options available with Bokeh, including the ability to remove one channel from view by clicking on its label.
Without Bokeh, I can use subplots to get the separate "static" plots stacked vertically as follows:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
A=np.random.rand(800,10)
df=pd.DataFrame(data=A,columns=['a','b','c','d','e','f','g','h','i','j'])
df.plot(subplots=True)
plt.show()
I can plot the 10 channels on one set of axes using Bokeh using this:
import numpy as np
import pandas as pd
pd.set_option('plotting.backend', 'pandas_bokeh')
A=np.random.rand(800,10)
df=pd.DataFrame(data=A,columns=['a','b','c','d','e','f','g','h','i','j'])
df.plot_bokeh(kind="line")
The resulting graph allows for zooming, panning, channel de-selection, etc. However, all signals are plotted on the same set of axes, which I would rather not do.
I use this code snippet to plot my figures in a grid.
import itertools
import pandas as pd
import pandas_bokeh
from bokeh.palettes import Dark2_5 as palette
def plot_grid(df: pd.DataFrame):
    # Build one figure per column, cycling through the palette for line colours
    figs = []
    color = itertools.cycle(palette)
    for c in df.columns:
        figs.append(df[c].plot_bokeh(show_figure=False, color=next(color)))
    # Stack the figures vertically in a single-column grid
    pandas_bokeh.plot_grid(figs, ncols=1, plot_width=1500)
The ncols parameter sets how many plots appear in each row of the grid.
Hope this helps!
When exploring a dataset I often use pandas' DataFrame.hist() method to quickly display a grid of histograms for every numeric column in the dataframe, for example:
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df.hist(bins=50, figsize=(10,7))
plt.show()
Which produces a figure with separate plots for each column:
I've tried the following:
import pandas as pd
import seaborn as sns
from sklearn import datasets
data = datasets.load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
for col_id in df.columns:
    sns.distplot(df[col_id])
But this produces a figure with a single plot and all columns overlaid:
Is there a way to produce a grid of histograms showing the data from a DataFrame's columns with Seaborn?
You can take advantage of seaborn's FacetGrid if you reorganize your dataframe using melt. Seaborn typically expects data organized this way (long format).
g = sns.FacetGrid(df.melt(), col='variable', col_wrap=2)
g.map(plt.hist, 'value')
There is no direct equivalent, as seaborn's distplot only accepts a 1-D array or list. You can generate the subplots yourself instead:
fig, ax = plt.subplots(2, 2, figsize=(10, 10))
for i in range(ax.shape[0]):
    for j in range(ax.shape[1]):
        sns.distplot(df[df.columns[i*2+j]], ax=ax[i][j])
https://seaborn.pydata.org/examples/distplot_options.html
Here is an example of how you can show 4 graphs using subplots with seaborn.
Another useful seaborn function for quickly displaying a grid of histograms for every numeric column in the dataframe is sns.pairplot(), which is quick, clean and handy.
try:
sns.pairplot(df)
It has a lot of useful parameters you can explore, such as hue.
pairplot example for iris dataset
If you don't want the scatter plots, you can create a customised grid very quickly using sns.PairGrid(df).
This creates an empty grid of axes onto which you can map whatever you want: g = sns.PairGrid(df)
`g.map_diag(sns.distplot)` or `g.map_offdiag(plt.scatter)`
etc
I ended up adapting jcaliz's answer to make it work more generally, i.e. not just when the DataFrame has four columns. I also added code to remove any unused axes and to ensure the axes appear in alphabetical order (as with df.hist()).
import math
# Smallest square grid that fits one histogram per column
size = int(math.ceil(len(df.columns)**0.5))
fig, ax = plt.subplots(size, size, figsize=(10, 10))
for i in range(ax.shape[0]):
    for j in range(ax.shape[1]):
        data_index = i*ax.shape[1]+j
        if data_index < len(df.columns):
            sns.distplot(df[df.columns.sort_values()[data_index]], ax=ax[i][j])
# Remove any axes left unused
for i in range(len(df.columns), size ** 2):
    fig.delaxes(ax[i // size][i % size])
I have 2 data sets in Pandas Dataframe and I want to visualize them on the same scatter plot so I tried:
import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(x_vars=['Std'], y_vars=['ATR'], data=set1, hue='Asset Subclass')
sns.pairplot(x_vars=['Std'], y_vars=['ATR'], data=set2, hue='Asset Subclass')
plt.show()
But I always get 2 separate charts instead of a single one.
How can I visualize both data sets on the same plot? Also can I have the same legend for both data sets but different colors for the second data set?
The following should work in seaborn 0.9.0 or later:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
First we concatenate the two datasets into one and assign a dataset column which will allow us to preserve the information as to which row is from which dataset.
concatenated = pd.concat([set1.assign(dataset='set1'), set2.assign(dataset='set2')])
Then we use the sns.scatterplot function (added in seaborn 0.9.0) and, via the style keyword argument, set the markers according to the dataset column:
sns.scatterplot(x='Std', y='ATR', data=concatenated,
hue='Asset Subclass', style='dataset')
plt.show()
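If colour rather than marker shape should distinguish the two datasets, as the question asks, one option is to swap the roles of hue and style. The sketch below uses small made-up frames in place of set1 and set2, so the values and subclass names are hypothetical.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical stand-ins for set1 and set2 from the question.
set1 = pd.DataFrame({'Std': [0.10, 0.20, 0.15],
                     'ATR': [1.0, 1.8, 1.2],
                     'Asset Subclass': ['Equity', 'Bond', 'Equity']})
set2 = pd.DataFrame({'Std': [0.12, 0.25, 0.30],
                     'ATR': [1.1, 2.0, 2.4],
                     'Asset Subclass': ['Bond', 'Equity', 'Commodity']})

# Tag each row with its source before stacking the two frames.
concatenated = pd.concat([set1.assign(dataset='set1'),
                          set2.assign(dataset='set2')],
                         ignore_index=True)

fig, ax = plt.subplots()
# Colour encodes the dataset; marker shape encodes the asset subclass.
sns.scatterplot(x='Std', y='ATR', data=concatenated,
                hue='dataset', style='Asset Subclass', ax=ax)
```

Either way, a single call produces one axes with one combined legend, which is what the two separate pairplot calls could not do.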
I need to make a plot of the following data, with the year_week on x-axis, the test_duration on the y-axis, and each operator as a different series. There may be multiple data points for the same operator in one week. I need to show standard deviation bands around each series.
data = pd.DataFrame({'year_week':[1601,1602,1603,1604,1604,1604],
'operator':['jones','jack','john','jones','jones','jack'],
'test_duration':[10,12,43,7,23,9]})
prints as:
I have looked at seaborn, matplotlib, and pandas, but I cannot find a solution.
It could be that you are looking for seaborn pointplot.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame({'year_week':[1601,1602,1603,1604,1604,1604],
'operator':['jones','jack','john','jones','jones','jack'],
'test_duration':[10,12,43,7,23,9]})
sns.pointplot(x="year_week", y="test_duration", hue="operator", data=data)
plt.show()
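pointplot draws error bars rather than bands. If shaded standard deviation bands are needed, one approach is to compute the mean and standard deviation with pandas and draw the bands with matplotlib's fill_between. This is a sketch under two assumptions of mine: weeks are treated as evenly spaced categories, and a week with a single observation gets a band of zero width.

```python
import pandas as pd
from matplotlib import pyplot as plt

data = pd.DataFrame({'year_week': [1601, 1602, 1603, 1604, 1604, 1604],
                     'operator': ['jones', 'jack', 'john',
                                  'jones', 'jones', 'jack'],
                     'test_duration': [10, 12, 43, 7, 23, 9]})

# Mean and sample standard deviation per operator and week.
stats = (data.groupby(['operator', 'year_week'])['test_duration']
             .agg(['mean', 'std'])
             .fillna({'std': 0})   # a single observation has no spread
             .reset_index())

# Map weeks to consecutive positions so year boundaries leave no gap.
weeks = sorted(data['year_week'].unique())
position = {week: i for i, week in enumerate(weeks)}

fig, ax = plt.subplots()
for operator, grp in stats.groupby('operator'):
    x = grp['year_week'].map(position)
    ax.plot(x, grp['mean'], marker='o', label=operator)
    ax.fill_between(x, grp['mean'] - grp['std'],
                    grp['mean'] + grp['std'], alpha=0.2)
ax.set_xticks(range(len(weeks)))
ax.set_xticklabels([str(week) for week in weeks])
ax.set_xlabel('year_week')
ax.set_ylabel('test_duration')
ax.legend(title='operator')
```

With the sample data only jones in week 1604 has more than one observation, so that is the only point where a visible band appears.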