I'm drawing several point plots in seaborn on the same graph. The x-axis is ordinal, not numerical; the ordinal values are the same for each point plot. I would like to shift each plot a bit to the side, the way the dodge parameter of pointplot does for multiple lines within a single plot, but in this case for multiple different plots drawn on top of each other. How can I do that?
Ideally, I'd like a technique that works for any matplotlib plot, not just seaborn specifically. Adding an offset to the data won't work easily, since the data is not numerical.
Here is an example in which the plots overlap, which makes them hard to read (dodge within each plot works okay):
import pandas as pd
import seaborn as sns
df1 = pd.DataFrame({'x':list('ffffssss'), 'y':[1,2,3,4,5,6,7,8], 'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
sns.pointplot(data=df1, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='<')
sns.pointplot(data=df2, x='x', y='y', hue='h', ci='sd', errwidth=2, capsize=0.05, dodge=0.1, markers='>')
I could use something other than seaborn, but the automatic confidence / error bars are very convenient so I'd prefer to stick with seaborn here.
Answering this for the most general case first.
A dodge can be implemented by shifting the artists in the figure by some amount. It might be useful to use points as units of that shift. E.g. you may want to shift your markers on the plot by 5 points.
This shift can be accomplished by adding a translation to the data transform of the artist. Here I propose a ScaledTranslation.
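As a quick check of the idea, here is a minimal sketch (the 5-point shift and the probe point (0.5, 0.5) are arbitrary choices for illustration). It composes the data transform with a ScaledTranslation and compares where the same data point lands in display coordinates with and without the shift.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from matplotlib import transforms

fig, ax = plt.subplots()
dodge_points = 5  # arbitrary shift for illustration, in points (1 point = 1/72 inch)

# dpi_scale_trans maps inches to display pixels, so dodge_points/72 inches
# corresponds to a horizontal shift of dodge_points points.
offset = transforms.ScaledTranslation(dodge_points / 72., 0, fig.dpi_scale_trans)
shifted = ax.transData + offset

# Where does the same data point end up in display (pixel) coordinates?
plain_px = ax.transData.transform((0.5, 0.5))
shifted_px = shifted.transform((0.5, 0.5))
shift_px = shifted_px - plain_px
print(shift_px)  # horizontal shift in pixels, no vertical shift
```

Because the offset is expressed in points rather than data units, it should not depend on whether the x-axis is numerical or categorical.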
Now to keep this most general, one may write a function which takes the plotting method, the axes and the data as input, and in addition some dodge to apply, e.g.
draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
The full functional code:
import matplotlib.pyplot as plt
from matplotlib import transforms
import numpy as np
import pandas as pd
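def draw_dodge(*args, **kwargs):
    # args[0] is the plotting method; the remaining args are passed on to it
    func = args[0]
    dodge = kwargs.pop("dodge", 0)
    ax = kwargs.pop("ax", None)
    if ax is None:
        ax = plt.gca()
    # shift by `dodge` points (1 point = 1/72 inch) on top of the data transform
    trans = ax.transData + transforms.ScaledTranslation(dodge/72., 0,
                                                        ax.figure.dpi_scale_trans)
    artist = func(*args[1:], **kwargs)
    def iterate(artist):
        # containers (e.g. from errorbar) are nested; recurse down to the artists
        if hasattr(artist, '__iter__'):
            for obj in artist:
                iterate(obj)
        elif artist is not None:
            artist.set_transform(trans)
    iterate(artist)
    return artist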
X = ["a", "b"]
Y = np.array([[1,2],[2,2],[3,2],[1,4]])
Dodge = np.arange(len(Y),dtype=float)*10
Dodge -= Dodge.mean()
fig, ax = plt.subplots()
for y,d in zip(Y,Dodge):
    draw_dodge(ax.errorbar, X, y, yerr =y/4., ax=ax, dodge=d, marker="d" )
ax.margins(x=0.4)
plt.show()
You may use this with ax.plot, ax.scatter etc., but not with the seaborn functions, because they return the axes rather than the artists they draw, so there is nothing useful to apply the transform to.
Now for the case in question, the remaining problem is to get the data in a useful format. One option would be the following.
df1 = pd.DataFrame({'x':list('ffffssss'),
                    'y':[1,2,3,4,5,6,7,8],
                    'h':list('abababab')})
df2 = df1.copy()
df2['y'] = df2['y']+0.5
N = len(np.unique(df1["h"].values))*len([df1,df2])  # one dodge value per hue group per dataframe
Dodge = np.linspace(-N,N,N)/N*10
fig, ax = plt.subplots()
k = 0
for df in [df1,df2]:
    for (n, grp) in df.groupby("h"):
        # select "y" explicitly so that newer pandas versions do not try
        # to aggregate the string column "h"
        mean = grp.groupby("x")["y"].mean()
        std = grp.groupby("x")["y"].std()
        draw_dodge(ax.errorbar, mean.index, mean.values,
                   yerr=std.values, ax=ax,
                   dodge=Dodge[k], marker="o", label=n)
        k+=1
ax.legend()
ax.margins(x=0.4)
plt.show()
You can use np.linspace to choose where each graph starts and ends on the x-axis. It also makes it easy to stretch graphs with different numbers of points over the same visual width.
import numpy as np
import matplotlib.pyplot as plt
start_offset = 3
end_offset = start_offset
y1 = np.random.randint(0, 10, 20) ## y1 has 20 random ints from 0 to 9 (the upper bound is exclusive)
y2 = np.random.randint(0, 10, 10) ## y2 has 10 random ints from 0 to 9
x1 = np.linspace(0, 20, y1.size) ## y1.size evenly spaced x values from 0 to 20, inclusive
x2 = np.linspace(0, 20, y2.size) ## same span with fewer points, so both lines cover the same width
plt.plot(x1, y1)
plt.plot(x2, y2)
plt.show()
Let's assume I have the following data:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame(dict(x=[1,2,3]*4,
                         y=list(range(12)),
                         row_separator = ["a"]*6 + ["b"]*6,
                         col_separator = (["a"]*3 + ["b"]*3) * 2
                         )
                    )
data
I'd like to have a simple line plot of x and y for each combination of row_separator and col_separator, and I want to do this using sns.relplot(). y-scales are different, which is fine, but I want to have the same lower bound 0 for all sub-plots.
With plt.subplots, it's like this:
fig, axs = plt.subplots(2,2)
for i in [0,1]:
    for j in [0,1]:
        ind = (data.row_separator == data.row_separator.unique()[i]) &\
              (data.col_separator == data.col_separator.unique()[j])
        axs[i,j].plot(data.loc[ind,"x"], data.loc[ind,"y"])
        axs[i,j].set_ylim(bottom=0)
With seaborn, I'm struggling to do the same. My starting point would be
g = sns.relplot(data=data,
                x="x",
                y="y",
                col="row_separator",
                row="col_separator",
                kind="line"
                )
I would expect facet_kws={"ylim":(0, None)} to do what I want, but it doesn't (instead, it limits all plots to the range from 0 to 1).
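A possible explanation, sketched with plain matplotlib rather than seaborn: if the limits are applied when the axes are created (that is an assumption about how facet_kws is handled here), then set_ylim(0, None) keeps the default top of 1 and switches autoscaling off, so data plotted later no longer rescales the axis. Setting only the bottom after plotting keeps the autoscaled top. The data values below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

y = [3, 5, 11]  # made-up data, all values above 1

fig, (ax_before, ax_after) = plt.subplots(ncols=2)

# Limits set before plotting: top=None keeps the current top (1.0 by default),
# and set_ylim turns autoscaling off for this axis.
ax_before.set_ylim(0, None)
ax_before.plot([1, 2, 3], y)

# Limits set after plotting: the top keeps its autoscaled value,
# only the bottom is moved to 0.
ax_after.plot([1, 2, 3], y)
ax_after.set_ylim(bottom=0)

print(ax_before.get_ylim())  # still the default range
print(ax_after.get_ylim())   # bottom at 0, top follows the data
```

With the relplot call from the question, the same idea would be to pass facet_kws={"sharey": False} and then call set_ylim(bottom=0) on each axes in g.axes.flat. Treat that as a suggestion to try, not a confirmed fix.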
I want to create a graph of 2 * height (which is the meter values in the index) versus the time squared (which are the decimal values in the columns). How can I go about doing this? (In matplotlib)
For clarity, I want the y-axis to be 2 * index values, and the x-axis to be the times squared from within the columns. I would like this to be a series of line graphs
It should end up looking something like this:
In your comment you say you use df1.plot() to draw the lines. df.plot() uses the dataframe index as x values by default. You want the y-axis to be 2 * index values and the x-axis to be the squared times from the columns. That requires transforming the dataframe values, so I suggest using ax.plot() for more control.
Here is a program that uses numpy.linalg.lstsq, which performs a least-squares fit, to find the best-fitting line through the given points.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from io import StringIO
TESTDATA = StringIO("""Height Trial:1 Trial:2 Trial:3 Trial:4 Trial:5 Trial:6 Trial:7
1.029 0.4667 0.4616 0.4569 0.4579 0.4653 0.4578 0.4484
1.095 0.4752 0.4773 0.4721 0.4738 0.4713 0.4745 0.4663
1.168 0.4836 0.4834 0.4873 0.4890 0.4890 0.4904 0.4902
1.315 0.5139 0.5117 0.5161 0.5108 0.5224 0.5129 0.5187
1.540 0.5644 0.5677 0.5804 0.5535 0.5636 0.5605 0.5609
1.807 0.6051 0.6124 0.6014 0.6035 0.5977 0.6012 0.6209
""")
df = pd.read_csv(TESTDATA, sep=r"\s+")  # whitespace-separated columns
df.set_index(['Height'], inplace=True)
fig, ax = plt.subplots()
for column in df:
    x = df[column]**2
    y = df.index*2
    # design matrix with rows [x_i, 1], so that lstsq solves y = k*x + b
    A = np.vstack([x, np.ones(len(x))]).T
    k, b = np.linalg.lstsq(A, y, rcond=None)[0]
    line = ax.plot(x, y, 'o')
    ax.plot(x, k*x+b, label=f'y={k:.5f}x+{b:.5f}', color=line[0].get_color(), linestyle='dashed')
plt.legend()
plt.xlabel('Fall time, squared (s²)')
plt.ylabel('Twice the height (m)')
plt.title('Measurement of Acceleration due to Gravity on Earth')
plt.show()
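To see what the lstsq call in the program above returns, here is a small self-contained sketch on made-up points that lie exactly on y = 2x + 1, so the fitted slope and intercept can be checked by eye. The data and the unpacked variable names are illustrative only.

```python
import numpy as np

# Made-up points that lie exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Each row of A is [x_i, 1], so A @ [k, b] equals k*x_i + b
A = np.vstack([x, np.ones(len(x))]).T

# lstsq returns the solution, the residuals, the rank of A and its singular values.
# rcond=None avoids the FutureWarning that some NumPy 1.x versions emit.
solution, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
k, b = solution
print(k, b)  # slope and intercept of the fitted line
```

In the gravity experiment, the slope k of 2 * height against time squared is the estimate of the acceleration, since h = g*t²/2 for a fall from rest.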
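import matplotlib.pyplot as plt
x_values = [1, 2, 3]  # placeholder: the list of things on the x-axis
y_values = [2, 4, 6]  # placeholder: the list of things on the y-axis
plt.plot(x_values, y_values)
plt.show()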
import matplotlib.pyplot as plt
# times_squared_variable and twice_height_variable are placeholders for your own data
plt.plot(times_squared_variable, twice_height_variable, '--', color='tab:blue')  # pick any color
# Label the axes and the plot
plt.xlabel('Name_x_axis')
plt.ylabel('Name_y_axis')
plt.title('Plot_name')
# Show the plot
plt.show()
Sorry for the noob question, but how can I add a shaded area between the upper and lower lines in a seaborn chart?
The main code I've been working on is the following:
plt.figure(figsize=(18,10))
sns.set(style="darkgrid")
palette = sns.color_palette("mako_r", 3)
sns.lineplot(x="Date", y="Value", hue='Std_Type', style='Value_Type', sizes=(.25, 2.5), palette = palette, data=tbl4)
The idea is to get some effect like below (the example from seaborn website):
But I could not replicate the effect although my data structure is pretty much in the same fashion as fmri (seaborn example)
from seaborn link:
import seaborn as sns
sns.set(style="darkgrid")
# Load an example dataset with long-form data
fmri = sns.load_dataset("fmri")
# Plot the responses for different events and regions
sns.lineplot(x="timepoint", y="signal",
             hue="region", style="event",
             data=fmri)
Do you have some ideas?
I tried to change the chart style, but if I go to a distplot or relplot, for example, the x_axis cannot show the timeframe...
Check this code:
# import
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set(style = 'darkgrid')
# data generation
time = pd.date_range(start = '2006-01-01', end = '2020-01-01', freq = 'M')
tbl4 = pd.DataFrame({'Date': time,
                     'down': 1 - 0.5*np.random.randn(len(time)),
                     'up': 4 + 0.5*np.random.randn(len(time))})
tbl4 = tbl4.melt(id_vars = 'Date',
                 value_vars = ['down', 'up'],
                 var_name = 'Std_Type',
                 value_name = 'Value')
# figure plot
fig, ax = plt.subplots(figsize=(18,10))
sns.lineplot(ax = ax,
             x = 'Date',
             y = 'Value',
             hue = 'Std_Type',
             data = tbl4)
# fill area
plt.fill_between(x = tbl4[tbl4['Std_Type'] == 'down']['Date'],
                 y1 = tbl4[tbl4['Std_Type'] == 'down']['Value'],
                 y2 = tbl4[tbl4['Std_Type'] == 'up']['Value'],
                 alpha = 0.3,
                 facecolor = 'green')
plt.show()
which gives me this plot:
Since I do not have access to your data, I generated random ones. Replace them with yours.
The shaded area is drawn with plt.fill_between, where you specify the x array (common to both curves), the two boundaries of the area as y1 and y2 and, optionally, a color and its transparency with the facecolor and alpha parameters respectively.
You cannot do it through the ci parameter, since that is used to show the confidence interval of your data.
How can I rotate a seaborn.lineplot so that the result is drawn as a function of y rather than a function of x?
For example, this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
sns.lineplot(x='group',y='val',data=df)
creates this figure:
But is there a way to rotate the figure by 90°, so that "val" is on the x-axis, "group" is on the y-axis, and the std band runs from left to right instead of from bottom to top?
Thanks
EDIT: I've opened a ticket in seaborn to ask for this feature: https://github.com/mwaskom/seaborn/issues/1661
Per the seaborn docs on lineplot, the dataframe passed to data must be
Tidy (“long-form”) dataframe where each column is a variable and each row is an observation.
This seems to imply that there is no way to force the axes to switch, even by manipulating the data. If there is a way, I haven't found it. There is probably a more elegant solution, but one option is to do it by hand, so to speak. Something like this would do the trick:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
group = df['group'].tolist()
val = df['val'].tolist()
yl = list()
yu = list()
avg = list()
ii = 0
while ii < len(group): #Loop through all the groups
    g = group[ii]
    y0 = val[ii]
    y1 = val[ii]
    s = 0
    jj = ii
    while (jj < len(group) and group[jj] == g):
        s += val[jj]
        #This takes the min and max, but could easily take the standard deviation
        if val[jj] > y1:
            y1 = val[jj]
        if val[jj] < y0:
            y0 = val[jj]
        jj += 1
    avg.append(s/(jj - ii))
    ii = jj
    yl.append(y0)
    yu.append(y1)
x = np.linspace(min(group), max(group), len(yl))
plt.ylabel(df.columns[0])
plt.xlabel(df.columns[1])
plt.plot(avg, x, color="#5a9edd", linestyle="-", linewidth=1.5)
plt.fill_betweenx(x, yl, yu, alpha=0.3)
This will give you the following plot:
For brevity this uses the minimum and maximum from each group to give the error band, but that can be easily changed to standard error or standard deviation as needed.
Consider what you'd do if not using seaborn. You would calculate the mean and standard deviation and plot those as a function of the group. It is then quite straightforward to exchange x and y: instead of plot(x,y), call plot(y,x). For the filled region, you can use fill_betweenx instead of fill_between.
Below are the two cases for comparison.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame([[0,1],[0,2],[0,1.5],[1,1],[1,5]], columns=['group','val'])
mean = df.groupby("group").mean()
std = df.groupby("group").std()
fig, (ax, ax2) = plt.subplots(ncols=2)
ax.plot(mean.index, mean["val"].values)
ax.fill_between(mean.index, (mean-std)["val"].values, (mean+std)["val"].values, alpha=.5)
ax.set(xlabel="group", ylabel="val")
ax2.plot(mean["val"].values, mean.index)
ax2.fill_betweenx(mean.index, (mean-std)["val"].values, (mean+std)["val"].values, alpha=.5)
ax2.set(ylabel="group", xlabel="val")
fig.tight_layout()
plt.show()
I want to set the x tick density by specifying how many ticks to skip each time. For example, if the x axis is labelled by 100 consecutive dates, and I want to skip every 10 dates, then I will do something like
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ts = pd.period_range("20060101", periods=100).strftime("%Y%m%d")
y = np.random.randn(100)
ax = plt.subplot(1, 1, 1)
ax.plot(ts, y)
xticks = ax.get_xticks()
ax.set_xticks(xticks[::10])
plt.xticks(rotation="vertical")
plt.show()
However, the output is out of place. Pyplot only picks the first few ticks and places them all in the wrong positions, although the spacing is correct:
What can I do to get the desired output? Namely the ticks should be instead:
['20060101' '20060111' '20060121' '20060131' '20060210' '20060220'
'20060302' '20060312' '20060322' '20060401']
@klim's answer seems to put the correct marks on the axis, but the labels still won't show. An example where the date axis is correctly marked yet without labels:
Set xticklabels also. Like this.
xticks = ax.get_xticks()
xticklabels = ax.get_xticklabels()
ax.set_xticks(xticks[::10])
ax.set_xticklabels(xticklabels[::10], rotation=90)
Forget the above, which doesn't work.
How about this?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ts = pd.period_range("20060101", periods=100).strftime("%Y%m%d")
x = np.arange(len(ts))
y = np.random.randn(100)
ax = plt.subplot(1, 1, 1)
ax.plot(x, y)
ax.set_xticks(x[::10])
ax.set_xticklabels(ts[::10], rotation="vertical")
plt.show()
This works on my machine.