I am trying to change the scatterplot to a line plot. I tried plot.lines[0].set_linestyle("-"), but this only affects the regression line, which is already a line.
As I understand it, sns.lineplot has a setting to turn on regression; however, I am trying to do this using regplot.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [10, 30, 60, 90, 60, 30]})
plot = sns.regplot(x="x", y="y", data=df, ci=65)
plt.show()
The reason I want to change the scatterplot to a line plot is that it is hard to see what is going on with large datasets otherwise.
To clarify, I am trying to display a line plot instead of a scatterplot for the original data.
This looks a bit odd, but I'm guessing it's what you want?
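import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6], "y": [10, 30, 60, 90, 60, 30]})
sns.regplot(x="x", y="y", data=df, ci=65, scatter=False)  # regression line only
sns.lineplot(x="x", y="y", data=df)  # original data as a line
plt.show()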
As per the Plotly website, in a simple line chart one can change the legend entry from the column name to a manually specified string of text. For example, this code results in the following chart:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
x = [1, 2, 3, 4],
y = [2, 3, 4, 3]
))
fig = px.line(
df,
x="x",
y="y",
width=800, height=600,
labels={
"y": "Series"
},
)
fig.show()
label changed:
However, when one plots multiple columns to the line chart, this label specification no longer works. There is no error message, but the legend entries are simply not changed. See this example and output:
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
x = [1, 2, 3, 4],
y1 = [2, 3, 4, 3],
y2 = [2, 4, 6, 8]
))
fig = px.line(
df,
x="x",
y=["y1", "y2"],
width=800, height=600,
labels={
"y1": "Series 1",
"y2": "Series 2"
},
)
fig.show()
legend entries not changed:
Is this a bug, or am I missing something? Any idea how this can be fixed?
In case anybody read my previous post, I did some more digging and found the solution to this issue. At heart, the entries shown in the legend on the right are attributes known as "names", not "labels". Searching for how to revise those names, I came across another post about this issue with a solution, "Legend Label Update". Using that information, here is a revised version of your program.
import pandas as pd
import plotly.express as px
df = pd.DataFrame(dict(
x = [1, 2, 3, 4],
y1 = [2, 3, 4, 3],
y2 = [2, 4, 6, 8]
))
fig = px.line(df, x="x", y=["y1", "y2"], width=800, height=600)
fig.update_layout(legend_title_text='Variable', xaxis_title="X", yaxis_title="Series")
newnames = {'y1':'Series 1', 'y2': 'Series 2'} # From the other post
fig.for_each_trace(lambda t: t.update(name = newnames[t.name]))
fig.show()
Following is a sample graph.
Try that out to see if that addresses your situation.
Regards.
I'm trying to combine a seaborn barplot with a seaborn lineplot. For some reason, I am able to do both separately, but when combining the two the x-axis is all over the place.
Figure 1 shows the bar plot, Figure 2 shows the line plot (both working fine) and Figure 3 is my attempt at combining both. I've read somewhere that seaborn uses categorical x-axis values, so my feeling is that this is part of the answer. Nevertheless, I can't seem to get it right.
Worth mentioning, my goal of this whole exercise is to get a moving-average line that follows the barplot. So any insights/workarounds to achieve that are also welcome.
This is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dfGroup = pd.DataFrame({
    'Year': [1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920],
    'Total Deaths': [0, 0, 2, 3, 2, 3, 4, 5, 6, 7, 8],
    'Total Affected': [0, 1, 0, 2, 3, 6, 9, 8, 12, 13, 15]
})
# Add 3-year rolling average
dfGroup['rolling_3years'] = dfGroup['Total Deaths'].rolling(3).mean().shift(0)
dfGroup = dfGroup.fillna(0)
# Make a smooth line from the 3-year rolling average
from scipy.interpolate import make_interp_spline
X_Y_Spline = make_interp_spline(dfGroup['Year'], dfGroup['rolling_3years'])
# Returns evenly spaced numbers over a specified interval.
X_ = np.linspace(dfGroup['Year'].min(), dfGroup['Year'].max(), 500)
Y_ = X_Y_Spline(X_)
# Plot the data
a4_dims = (15, 10)
fig, ax1 = plt.subplots(figsize=a4_dims)
ax1 = sns.barplot(x = "Year", y = "Total Deaths",
data = dfGroup, color='#42b7bd')
ax2 = ax1.twinx()
ax2 = sns.lineplot(x=X_, y=Y_, marker='o')
This is what my dfGroup dataframe looks like:
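A minimal sketch of one possible workaround, assuming the mismatch comes from sns.barplot treating "Year" as categorical and drawing the bars at positions 0, 1, ..., n-1 rather than at the year values. The line is drawn with plain matplotlib at those same positions. The spline smoothing and the second y-axis from the code above are left out to keep the example short, and the variable positions is a name introduced here.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

dfGroup = pd.DataFrame({
    "Year": list(range(1910, 1921)),
    "Total Deaths": [0, 0, 2, 3, 2, 3, 4, 5, 6, 7, 8],
})

# 3-year rolling average; the first two rows have no full window, so fill with 0.
dfGroup["rolling_3years"] = dfGroup["Total Deaths"].rolling(3).mean().fillna(0)

fig, ax = plt.subplots(figsize=(15, 10))
sns.barplot(x="Year", y="Total Deaths", data=dfGroup, color="#42b7bd", ax=ax)

# The bars sit at 0..n-1, so the line uses the same positions instead of the years.
positions = np.arange(len(dfGroup))
(line,) = ax.plot(positions, dfGroup["rolling_3years"], color="black", marker="o")
```

Because the years here are consecutive, a curve sampled at fractional years could be shifted onto the same axis by subtracting the first year from its x-values. Call plt.show() to display the figure.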
I'm trying to create a matplotlib bar chart with categories on the X-axis, but I can't get the categories right. Here's a minimal example of what I'm trying to do.
data = [[46, 11000], [97, 15000], [27, 24000], [36, 9000], [9, 17000]]
df = pd.DataFrame(data, columns=['car_id', 'price'])
fig1, ax1 = plt.subplots(figsize=(10,5))
ax1.set_title('Car prices')
ax1.bar(df['car_id'], df['price'])
plt.xticks(np.arange(len(df)), list(df['car_id']))
plt.legend()
plt.show()
I need the five categories (car_id) on the X-axis. What am I doing wrong?
You can turn car_id into category:
df['car_id'] = df['car_id'].astype('category')
df.plot.bar(x='car_id')
Output:
You can also plot just the price column and relabel:
ax = df.plot.bar(y='price')
ax.set_xticklabels(df['car_id'])
You mixed up the positions and the labels in xticks. Here you specify the positions np.arange(len(df)) and the labels list(df['car_id']), so the labels are placed at the positions np.arange(len(df)), i.e. array([0, 1, 2, 3, 4]), while the bars are drawn at the car_id values.
If the positions and the labels should be the same, just replace plt.xticks(np.arange(len(df)), list(df['car_id'])) with plt.xticks(df['car_id']).
If you want the bars to be evenly spaced, your approach is right, but you also need to change ax1.bar(df['car_id'], df['price']) to ax1.bar(np.arange(len(df)), df['price']), so that the bar x-positions are evenly spaced.
Full code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = [[46, 11000], [97, 15000], [27, 24000], [36, 9000], [9, 17000]]
df = pd.DataFrame(data, columns=['car_id', 'price'])
fig1, ax1 = plt.subplots(figsize=(10,5))
ax1.set_title('Car prices')
ax1.bar(np.arange(len(df)), df['price'])
ax1.set_xticks(np.arange(len(df)))
ax1.set_xticklabels(df['car_id'])
plt.show()
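As a further sketch (an alternative, not part of the answer above): matplotlib treats string x-values as categories, so converting car_id to strings should give evenly spaced bars with the ids as tick labels, without setting the ticks by hand.

```python
import matplotlib.pyplot as plt
import pandas as pd

data = [[46, 11000], [97, 15000], [27, 24000], [36, 9000], [9, 17000]]
df = pd.DataFrame(data, columns=["car_id", "price"])

fig, ax = plt.subplots(figsize=(10, 5))
ax.set_title("Car prices")

# String x-values are treated as categories: one evenly spaced bar per label,
# in the order in which the labels first appear.
ax.bar(df["car_id"].astype(str), df["price"])
```

Call plt.show() to display the chart.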
I am stuck on what seems like an easy problem trying to color the different groups on a scatterplot I am creating. I have the following example dataframe and graph:
test_df = pd.DataFrame({ 'A' : 1.,
'B' : np.array([1, 5, 9, 7, 3], dtype='int32'),
'C' : np.array([6, 7, 8, 9, 3], dtype='int32'),
'D' : np.array([2, 2, 3, 4, 4], dtype='int32'),
'E' : pd.Categorical(["test","train","test","train","train"]),
'F' : 'foo' })
# fix to category
# test_df['D'] = test_df["D"].astype('category')
# and test plot
f, ax = plt.subplots(figsize=(6,6))
ax = sns.scatterplot(x="B", y="C", hue="D", s=100,
data=test_df)
which creates this graph:
However, instead of a continuous scale, I'd like a categorical scale for each of the 3 categories [2, 3, 4]. After I uncomment the line of code test_df['D'] = ..., to change this column to a category column-type for category-coloring in the seaborn plot, I receive the following error from the seaborn plot: TypeError: data type not understood
Does anybody know the correct way to convert this numeric column to a factor / categorical column to use for coloring?
Thanks!
I copy/pasted your code, added libraries for import and removed the comment as I thought it looked good. I get a plot with 'categorical' colouring for value [2,3,4] without changing any of your code.
Try updating your seaborn module using: pip install --upgrade seaborn
Here is a list of working libraries used with your code.
matplotlib==3.1.2
numpy==1.18.1
seaborn==0.10.0
pandas==0.25.3
With these versions, the code below ran without errors.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
test_df = pd.DataFrame({ 'A' : 1.,
'B' : np.array([1, 5, 9, 7, 3], dtype='int32'),
'C' : np.array([6, 7, 8, 9, 3], dtype='int32'),
'D' : np.array([2, 2, 3, 4, 4], dtype='int32'),
'E' : pd.Categorical(["test","train","test","train","train"]),
'F' : 'foo' })
# fix to category
test_df['D'] = test_df["D"].astype('category')
# and test plot
f, ax = plt.subplots(figsize=(6,6))
ax = sns.scatterplot(x="B", y="C", hue="D", s=100,
data=test_df)
plt.show()
I encountered the same error, TypeError: data type not understood.
A workaround that works is to use the option legend="full". Conversion to a categorical type is not necessary with this approach:
ax = sns.scatterplot(x="B", y="C", hue="D", s=100, legend="full", data=test_df)
Another solution is to use a custom palette:
ax = sns.scatterplot(x="B", y="C", hue="D", s=100, palette=["b", "g", "r"], data=test_df)
In this case the number of colours must equal the number of unique values in column "D".
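A small sketch of a variation on the custom palette: seaborn also accepts a dictionary as palette, which ties each value of "D" to a specific colour instead of relying on list order. The colour names chosen here are arbitrary, and only the columns needed for the plot are kept.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

test_df = pd.DataFrame({
    "B": np.array([1, 5, 9, 7, 3], dtype="int32"),
    "C": np.array([6, 7, 8, 9, 3], dtype="int32"),
    "D": np.array([2, 2, 3, 4, 4], dtype="int32"),
})

# One explicit colour per unique value in column "D".
palette = {2: "tab:blue", 3: "tab:green", 4: "tab:red"}

f, ax = plt.subplots(figsize=(6, 6))
sns.scatterplot(x="B", y="C", hue="D", s=100, palette=palette,
                legend="full", data=test_df, ax=ax)
```

Every unique value in "D" needs a key in the dictionary. Call plt.show() to display the plot.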
I'd like to create a list of boxplots with the color of the box dependent on the name of the pandas.DataFrame column I use as input.
The column names contain strings that indicate an experimental condition based on which I want the box of the boxplot colored.
I do this to make the boxplots:
sns.boxplot(data = data.dropna(), orient="h")
plt.show()
This creates a beautiful list of boxplots with correct names. Now I want to give every boxplot that has 'prog +, DMSO+' in its name a red color, leaving the rest as blue.
I tried creating a dictionary with column names as keys and colors as values:
color = {}
for column in data.columns:
    if 'prog+, DMSO+' in column:
        color[column] = 'red'
    else:
        color[column] = 'blue'
And then using the dictionary as color:
sns.boxplot(data = data.dropna(), orient="h", color=color[column])
plt.show()
This does not work, understandably (there is no loop to go through the dictionary). So I make a loop:
for column in data.columns:
    sns.boxplot(data = data[column], orient='h', color=color[column])
plt.show()
This does make boxplots of different colors but all on top of each other and without the correct labels. If I could somehow put these boxplot nicely in one plot below each other I'd be almost at what I want. Or is there a better way?
You should use the palette parameter, which handles multiple colors, rather than color, which handles a specific one. You can give palette a name, an ordered list, or a dictionary. The latter seems best suited to your question:
import seaborn as sns
sns.set_color_codes()
tips = sns.load_dataset("tips")
pal = {day: "r" if day == "Sat" else "b" for day in tips.day.unique()}
sns.boxplot(x="day", y="total_bill", data=tips, palette=pal)
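You can set the facecolor of individual boxes after plotting them all in one go, using ax.artists[i].set_facecolor('r'). Depending on the matplotlib and seaborn versions, the boxes may be stored in ax.patches rather than ax.artists.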
For example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(
[[2, 4, 5, 6, 1],
[4, 5, 6, 7, 2],
[5, 4, 5, 5, 1],
[10, 4, 7, 8, 2],
[9, 3, 4, 6, 2],
[3, 3, 4, 4, 1]
],columns=['bar', 'prog +, DMSO+ 1', 'foo', 'something', 'prog +, DMSO+ 2'])
ax = sns.boxplot(data=df,orient='h')
boxes = ax.artists
for i, box in enumerate(boxes):
    if 'prog +, DMSO+' in df.columns[i]:
        box.set_facecolor('r')
    else:
        box.set_facecolor('b')
plt.tight_layout()
plt.show()