With matplotlib, I can make a histogram with two datasets on one plot (side by side, not overlaid).
import matplotlib.pyplot as plt
import random
x = [random.randrange(100) for i in range(100)]
y = [random.randrange(100) for i in range(100)]
plt.hist([x, y])
plt.show()
This yields the following plot.
However, when I try to do this with seaborn:
import seaborn as sns
sns.distplot([x, y])
I get the following error:
ValueError: color kwarg must have one color per dataset
So then I try to add some color values:
sns.distplot([x, y], color=['r', 'b'])
And I get the same error. I saw this post on how to overlay graphs, but I would like these histograms to be side by side, not overlaid.
And the docs don't say how to pass a list of lists as the first argument, a.
How can I achieve this style of histogram using seaborn?
If I understand you correctly, you may want to try something like this:
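import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots()
for a in [x, y]:
    # distplot is deprecated since seaborn 0.11; histplot is its replacement
    sns.distplot(a, bins=range(1, 110, 10), ax=ax, kde=False)
ax.set_xlim([0, 100])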
Which should yield a plot like this:
UPDATE:
It looks like you want the seaborn look rather than seaborn's plotting functionality.
For this you only need to:
import seaborn as sns
plt.hist([x, y], color=['r','b'], alpha=0.5)
Which will produce:
UPDATE for newer seaborn versions:
In newer seaborn versions, importing seaborn no longer changes the look of plots, so to get seaborn-styled plots you need to:
import seaborn as sns
sns.set_theme() # <-- This actually changes the look of plots.
plt.hist([x, y], color=['r','b'], alpha=0.5)
See seaborn docs for more information.
Merge x and y into one DataFrame, then use histplot with the hue option and multiple='dodge':
import random
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
x = [random.randrange(100) for _ in range(100)]
y = [random.randrange(100) for _ in range(100)]
df = pd.concat(axis=0, ignore_index=True, objs=[
    pd.DataFrame.from_dict({'value': x, 'name': 'x'}),
    pd.DataFrame.from_dict({'value': y, 'name': 'y'})
])
fig, ax = plt.subplots()
sns.histplot(
    data=df, x='value', hue='name', multiple='dodge',
    bins=range(1, 110, 10), ax=ax
)
ax.set_xlim([0, 100])
I'm working with data that has 3 plotting parameters: x, y, c. How do you create a custom color value for a scatter plot?
Extending this example, I'm trying to do:
import matplotlib
import matplotlib.pyplot as plt
cm = matplotlib.cm.get_cmap('RdYlBu')
colors=[cm(1.*i/20) for i in range(20)]
xy = range(20)
plt.subplot(111)
colorlist=[colors[x/2] for x in xy] #actually some other non-linear relationship
plt.scatter(xy, xy, c=colorlist, s=35, vmin=0, vmax=20)
plt.colorbar()
plt.show()
but the result is TypeError: You must first set_array for mappable
From the matplotlib docs on scatter:
cmap is only used if c is an array of floats
So colorlist needs to be a list of floats rather than a list of tuples as you have it now.
plt.colorbar() wants a mappable object, like the PathCollection that plt.scatter() returns.
vmin and vmax then set the limits of your colorbar. Values outside vmin/vmax get the colors of the endpoints.
How does this work for you?
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')  # matplotlib.cm.get_cmap was removed in newer matplotlib
xy = range(20)
z = xy
sc = plt.scatter(xy, xy, c=z, vmin=0, vmax=20, s=35, cmap=cm)
plt.colorbar(sc)
plt.show()
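To make the vmin/vmax point concrete, the sketch below applies a Normalize and a colormap by hand, which is the same float-to-color mapping scatter performs when c is an array of floats. It assumes nothing beyond standard matplotlib; the specific values (-5, 10, 25) are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize

cmap = plt.get_cmap("RdYlBu")
norm = Normalize(vmin=0, vmax=20)

# A value inside [vmin, vmax] maps somewhere inside the colormap...
inside = cmap(norm(10.0))
# ...while values above vmax or below vmin get the endpoint colors,
# because a colormap's default "over"/"under" colors are its last/first entries.
above, top = cmap(norm(25.0)), cmap(1.0)
below, bottom = cmap(norm(-5.0)), cmap(0.0)

# scatter does the same mapping when c is an array of floats.
values = np.array([-5.0, 10.0, 25.0])
fig, ax = plt.subplots()
sc = ax.scatter([0, 1, 2], [0, 1, 2], c=values, cmap=cmap, norm=norm, s=35)
fig.colorbar(sc, ax=ax)
```

Passing a list of RGBA tuples as c (as in the question) skips this mapping entirely, which is why there is nothing for the colorbar to show.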
Here is the OOP way of adding a colorbar:
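import matplotlib.pyplot as plt
import numpy as np
x, y, c = np.random.rand(3, 50)  # example data: positions and color values
fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c)
fig.colorbar(im, ax=ax)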
If you're looking to scatter by two variables and color by the third, Altair can be a great choice.
Creating the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(40*np.random.randn(10, 3), columns=['A', 'B','C'])
Altair plot
import altair as alt
alt.Chart(df).mark_circle().encode(x='A', y='B', color='C').properties(width=200, height=150)
Plot
I created a boxplot like this:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
f, ax = plt.subplots(figsize=(15,7))
sns.despine(bottom=True, left=True)
sns.boxplot(x=x)
ax.set(xlim=(0, 120))
ax.grid(linestyle='-', axis="x")
ax.xaxis.set_major_locator(ticker.MultipleLocator(24))
ax.set_axisbelow(True)
plt.show()
Which looks like this:
As marked in the picture, I want a different xtick spacing for part of the graph: up to a value of 24 the ticks should follow ticker.MultipleLocator(8), and after that they should continue with ticker.MultipleLocator(24).
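An axis can only have one major locator, so two MultipleLocators cannot be mixed. Instead, you can build the tick positions for each range separately with np.arange, concatenate them, and set them with set_xticks: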
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
tips = sns.load_dataset("tips")
fig, ax = plt.subplots(figsize=(15,7))
sns.despine(bottom=True, left=True)
g = sns.boxplot(x=tips['total_bill'])
ax.set(xlim=(0, 120))
ax.grid(linestyle='-', axis="x")
tickA = np.arange(0,24,8)
tickB = np.arange(24,120,24)
new_ticks = np.concatenate([tickA, tickB])
ax.set_xticks(new_ticks)
ax.set_axisbelow(True)
plt.show()
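If you need more than two spacings, the concatenation can be wrapped in a small helper. This is a sketch: piecewise_ticks is a hypothetical name, not a matplotlib function. Setting a ticker.FixedLocator directly is equivalent to ax.set_xticks(ticks), which installs a FixedLocator under the hood.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np


def piecewise_ticks(segments):
    """Hypothetical helper: concatenate evenly spaced ticks from several ranges.

    Each segment is (start, stop, step); like np.arange, stop is exclusive.
    """
    return np.concatenate([np.arange(start, stop, step) for start, stop, step in segments])


# Step 8 below 24, then step 24 below 120 (the same values as tickA and tickB above).
ticks = piecewise_ticks([(0, 24, 8), (24, 120, 24)])

fig, ax = plt.subplots(figsize=(15, 7))
ax.set(xlim=(0, 120))
ax.xaxis.set_major_locator(ticker.FixedLocator(ticks))
ax.grid(linestyle="-", axis="x")
```

Because stop is exclusive, 120 itself is not a tick; use (24, 121, 24) as the last segment if you want it.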
I want to format y-axis labels in a seaborn FacetGrid plot, with a number of decimals, and/or with some text added.
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
sns.set(style="ticks")
exercise = sns.load_dataset("exercise")
g = sns.catplot(x="time", y="pulse", hue="kind", col="diet", data=exercise)
#g.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.2f}'.format(x) + 'K'))
#g.set(xticks=['a','try',0.5])
g.yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.2f}'.format(x) + 'K'))
plt.show()
Inspired by How to format seaborn/matplotlib axis tick labels from number to thousands or Millions? (125,436 to 125.4K)
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: '{:,.2f}'.format(x) + 'K'))
It results in the following error.
AttributeError: 'FacetGrid' object has no attribute 'xaxis'
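xaxis and yaxis are attributes of each plot's axes, not of the seaborn.axisgrid.FacetGrid object itself, so the formatter has to be set on every axes in g.axes.
In the linked answer, the object is a single axes of type matplotlib.axes._subplots.AxesSubplot, which is why ax.xaxis works there.
p in the lambda expressions is the tick position (the index of the tick), which FuncFormatter passes along with the tick value.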
seaborn: Building structured multi-plot grids
matplotlib: Creating multiple subplots
Tested and working with the following versions:
matplotlib v3.3.4
seaborn v0.11.1
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
sns.set(style="ticks")
# load data
exercise = sns.load_dataset("exercise")
# plot data
g = sns.catplot(x="time", y="pulse", hue="kind", col="diet", data=exercise)
# format the labels with f-strings
for ax in g.axes.flat:
    ax.yaxis.set_major_formatter(tkr.FuncFormatter(lambda y, p: f'{y:.2f}: Oh baby, baby'))
    ax.xaxis.set_major_formatter(tkr.FuncFormatter(lambda x, p: f'{x}: Is that your best'))
As noted in a comment by Patrick FitzGerald, the following code, without using tkr.FuncFormatter, also works to generate the previous plot.
See matplotlib.axis.Axis.set_major_formatter (passing a plain function directly has been supported since matplotlib 3.3).
# format the labels with f-strings
for ax in g.axes.flat:
    ax.yaxis.set_major_formatter(lambda y, p: f'{y:.2f}: Oh baby, baby')
    ax.xaxis.set_major_formatter(lambda x, p: f'{x}: Is that your best')
I would like to overplot a swarmplot and regplot in seaborn, so that I can have a y=x line through my swarmplot.
Here is my code:
import matplotlib.pyplot as plt
import seaborn as sns
sns.regplot(y=y, x=x, marker=' ', color='k')
sns.swarmplot(x=x_data, y=y_data)
I don't get any errors when I plot, but the regplot never shows on the plot. How can I fix this?
EDIT: My regplot and swarmplot don't overplot; instead, they are drawn in the same frame but separated by some unspecified y amount. If I flip them so that the regplot call comes before the swarmplot call, regplot doesn't show up at all.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({"x":x_data,"y":y_data} )
sns.regplot(y="y", x="x", data= df, color='k', scatter_kws={"alpha" : 0.0})
sns.swarmplot(y="y", x="x", data= df)
SECOND EDIT: The double axis solution from below works beautifully!
In principle, the approach of plotting a swarmplot and a regplot simultaneously works fine.
The problem here is that you set an empty marker (marker = " "). This destroys the regplot, such that it's not shown. Apparently this is only an issue when plotting several things to the same graph; plotting a single regplot with empty marker works fine.
The solution would be not to specify the marker argument, but instead set the markers invisible by using the scatter_kws argument: scatter_kws={"alpha" : 0.0}.
Here is a complete example:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
## generate some data
n=19; m=9
y_data = []
for i in range(m):
    a = (np.random.poisson(lam=0.99-float(i)/m,size=n)+i*.9+np.random.rand(1)*2)
    a+=(np.random.rand(n)-0.5)*2
    y_data.append(a*m)
y_data = np.array(y_data).flatten()
x_data = np.floor(np.sort(np.random.rand(n*m))*m)
## put them into dataframe
df = pd.DataFrame({"x":x_data,"y":y_data} )
## plotting
sns.regplot(y="y", x="x", data= df, color='k', scatter_kws={"alpha" : 0.0})
sns.swarmplot(x="x", y="y", data= df)
plt.show()
Concerning the edited part of the question:
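Since swarmplot is a categorical plot, its x axis actually runs from -0.5 to 8.5 (one position per category), not from 10 to 18 as the tick labels suggest. The regplot is drawn at the real x values, so the two plots end up side by side instead of on top of each other.
A possible workaround is to use two axes that share the y axis, created with twiny: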
fig, ax = plt.subplots()
ax2 = ax.twiny()
sns.swarmplot(x="x", y="y", data= df, ax=ax)
sns.regplot(y="y", x="x", data= df, color='k', scatter_kws={"alpha" : 0.0}, ax=ax2)
ax2.grid(False) #remove grid as it overlays the other plot
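You can check the categorical positions yourself by inspecting the axes after a swarmplot. This sketch uses a tiny hypothetical DataFrame with x values 10, 11 and 12; the exact tick label text may vary between seaborn versions, so only positions and limits are inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: three categories whose labels look numeric.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": np.repeat([10.0, 11.0, 12.0], 5),
    "y": rng.normal(size=15),
})

fig, ax = plt.subplots()
sns.swarmplot(x="x", y="y", data=df, ax=ax)

# Categories sit at 0, 1, 2; only the labels show 10, 11, 12.
positions = list(ax.get_xticks())
xlim = ax.get_xlim()
```

A regplot on the same axes would be drawn around x = 10 to 12, well to the right of these positions, which is the offset described in the question's edit.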
Dataframes in Pandas have a boxplot method, but is there any way to create dot-boxplots in Pandas, or otherwise with seaborn?
By a dot-boxplot, I mean a boxplot that shows the actual data points (or a relevant sample of them) inside the plot, e.g. like the example below (obtained in R).
For a more precise answer related to OP's question (with Pandas):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame({ "A":np.random.normal(0.8,0.2,20),
"B":np.random.normal(0.8,0.1,20),
"C":np.random.normal(0.9,0.1,20)} )
data.boxplot()
for i,d in enumerate(data):
    y = data[d]
    x = np.random.normal(i+1, 0.04, len(y))
    plt.plot(x, y, mfc = ["orange","blue","yellow"][i], mec='k', ms=7, marker="o", linestyle="None")
plt.hlines(1,0,4,linestyle="--")
Old version (more generic) :
With matplotlib :
import numpy as np
import matplotlib.pyplot as plt
a = np.random.normal(0,2,1000)
b = np.random.normal(-2,7,100)
data = [a,b]
plt.boxplot(data) # Or you can use the boxplot from Pandas
for i in [1,2]:
    y = data[i-1]
    x = np.random.normal(i, 0.02, len(y))
    plt.plot(x, y, 'r.', alpha=0.2)
Which gives:
Inspired by this tutorial.
Hope this helps!
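The jitter trick above can be wrapped in a reusable function. This is a sketch under a few assumptions: boxplot_with_points is a hypothetical helper, it uses a seeded np.random.default_rng for reproducibility, and it swaps the normal jitter for uniform jitter so points never stray more than jitter away from their box. It also passes showfliers=False so outliers are not drawn twice (once as fliers and once as points).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def boxplot_with_points(ax, data, jitter=0.04, seed=0):
    """Hypothetical helper: boxplot of each dataset with its points overlaid.

    Boxes sit at x = 1, 2, ..., len(data) (matplotlib's default positions),
    so each point is placed at its box position plus a small random offset.
    Returns the jittered x positions, one array per dataset.
    """
    rng = np.random.default_rng(seed)
    ax.boxplot(data, showfliers=False)  # points below already show outliers
    xs = []
    for i, y in enumerate(data, start=1):
        x = rng.uniform(i - jitter, i + jitter, size=len(y))
        ax.plot(x, y, "r.", alpha=0.2)
        xs.append(x)
    return xs


rng = np.random.default_rng(1)
a = rng.normal(0, 2, 1000)
b = rng.normal(-2, 7, 100)

fig, ax = plt.subplots()
xs = boxplot_with_points(ax, [a, b])
```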
Since seaborn version 0.6, this is possible with the stripplot function. Here's an example:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=tips)
sns.stripplot(x="day", y="total_bill", data=tips,
              size=4, jitter=True, edgecolor="gray")
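The same two calls work on any long-form DataFrame (one row per observation, one column naming the group). The sketch below uses made-up data instead of the tips dataset (an assumption, so it runs offline) and draws both plots on an explicit ax. It also passes showfliers=False to boxplot, a matplotlib boxplot option, so outliers are not drawn both as fliers and as strip points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical long-form data: 20 observations per day.
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat"], 20),
    "total_bill": rng.normal(20, 5, 60),
})

fig, ax = plt.subplots()
sns.boxplot(x="day", y="total_bill", data=df, ax=ax, showfliers=False)
sns.stripplot(x="day", y="total_bill", data=df, ax=ax,
              size=4, jitter=True, color="k", alpha=0.5)
```

Every observation ends up as one point in the scatter collections that stripplot adds to the axes.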