Explicitly set colours of the boxplot in plotly - python

I am using plotly express to plot boxplot as shown below:
px.box(data_frame=df,
y="price",
x="products",
points="all")
However, the box plots for the products all show up in the same colour. There are four products, and I would like to give each one a different colour. Using the additional parameter color_discrete_sequence does not work.

I am using plotly.express.data.tips() as an example dataset and am creating a new column called mcolour to show how an additional column can be used for colouring. See below:
## packages
import plotly.express as px
import numpy as np
import pandas as pd
## example dataset:
df = px.data.tips()
## creating a new column with colors
df['mcolour'] = np.where(
df['day'] == "Sun" ,
'#636EFA',
np.where(
df['day'] == 'Sat', '#EF553B', '#00CC96'
)
)
## plot
fig = px.box(df, x="day", y="total_bill", color="mcolour")
fig = fig.update_layout(showlegend=False)
fig.show()
So, as you see, you can simply assign colors based on another column using color argument in plotly.express.box().

Before showing the figure, you also need the following setting (as part of a complete solution) so that the newly coloured box plots are aligned correctly:
fig.update_layout(boxmode = "overlay")
Setting boxmode to "overlay" restores the normal layout, which appears to be overridden (set to "group") once color has been set.
The plotly help says this about boxmode:
"Determines how boxes at the same location coordinate are displayed on
the graph. If 'group', the boxes are plotted next to one another
centered around the shared location. If 'overlay', the boxes are
plotted over one another [...]"
Hope this helps! R


Can mark_rule be extended outside the chart with Altair?

Is there a way to make a rule mark longer without disrupting the axes of a chart? If I have this:
random.seed(0)
df = pd.DataFrame({'x':[i for i in range(1,21)],'y':random.sample(range(1,50), 20)})
chart = alt.Chart(df).mark_area().encode(x='x',y='y')
ruler = alt.Chart(pd.DataFrame({'x':[5]})).mark_rule().encode(x='x')
chart+ruler
But I want the rule to extend outside the chart area.
You can set an explicit y-domain and then set clip=False inside mark_rule, but you also need to define the y-range of the rule since the default is to stretch over the entire plot:
import altair as alt
import pandas as pd
import random
random.seed(0)
df = pd.DataFrame({'x':[i for i in range(1,21)],'y':random.sample(range(1,50), 20)})
chart = alt.Chart(df).mark_area().encode(x='x', y=alt.Y('y', scale=alt.Scale(domain=(0, 50))))
ruler = alt.Chart(pd.DataFrame({'x':[5], 'y': [-10], 'y2': [50]})).mark_rule(clip=False, fill='black').encode(x='x', y='y', y2='y2')
chart+ruler
Have you tried to overlay an empty plot on top with wider margins? So the overlay plot just includes the line, but since it has larger margins on the bottom it will extend past the original plot.

Size legend for plotly express scatterplot in Python

Here is a Plotly Express scatterplot with marker color, size and symbol representing different fields in the data frame. There is a legend for symbol and a colorbar for color, but there is nothing to indicate what marker size represents.
Is it possible to display a "size" legend? In the legend I'm hoping to show some example marker sizes and their respective values.
A similar question was asked for R and I'm hoping for a similar result in Python. I've tried adding markers using fig.add_trace(), and this would work, except I don't know how to make the sizes equal.
import pandas as pd
import plotly.express as px
import random
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':list(range(1,11,1)),
'Size':random.sample(range(10,150), 10)
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
Thank you
You cannot achieve this if you use metric (continuous numeric) data like in your range. Plotly will always try to interpret it as metric, even if it looks discrete in the output. So your data has to be categorical, like a factor in R, since you are showing groups. One possible solution is to use a list comprehension to convert everything to str. I did it in two steps so you can follow:
import pandas as pd
import plotly.express as px
import random
check = sorted(random.sample(range(10,150), 10))
check = [str(num) for num in check]
# create data frame
df = pd.DataFrame({
'X':list(range(1,11,1)),
'Y':list(range(1,11,1)),
'Symbol':['Yes']*5+['No']*5,
'Color':check,
'Size':list(range(1,11,1))
})
# create scatterplot
fig = px.scatter(df, y='Y', x='X',color='Color',symbol='Symbol',size='Size')
# move legend
fig.update_layout(legend=dict(y=1, x=0.1))
fig.show()
Keep in mind that you also get the symbol label, as you now have TWO groups!
You may want to sort the values in the list before converting them to strings (the code above already does this).
UPDATE
Hey there,
Yes, but as far as I know only in matplotlib, and it is a little hacky, as you simulate scatter plots. I can only show you a modified example from matplotlib, but maybe it helps you work it out yourself:
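import matplotlib.pyplot as plt
from numpy.random import randn
z = randn(10)
# two plots with different marker sizes, one per legend entry
red_dot, = plt.plot(z, "ro", markersize=5)
red_dot_other, = plt.plot(z*2, "ro", markersize=20)
# markerscale scales the legend markers relative to each plot's markersize
plt.legend([red_dot, red_dot_other], ["Yes", "No"], markerscale=0.5)
plt.show()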
As you can see, you are working with two different plots, to be exact one plot for each size shown in the legend. In the legend these plots are merged together. The size of the legend markers is further controlled through markerscale, which is applied relative to the markersize of each plot. Because we have two plots with TWO different marker sizes, we get a legend with different marker sizes. markerscale is a relative factor: values below 1 shrink the legend markers, and values above 1 (for example 1.5 for 150%) enlarge them.
You can achieve this by working with the legend handler in matplotlib, see here:
https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html
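Another matplotlib route, sketched here under the assumption that the points are drawn with a single ax.scatter call: the returned collection has a legend_elements method that can build size handles directly. The data below is random and only for illustration, and num=4 is an arbitrary request for roughly four reference sizes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import random

random.seed(0)
x = list(range(1, 11))
y = list(range(1, 11))
sizes = random.sample(range(10, 150), 10)

fig, ax = plt.subplots()
scatter = ax.scatter(x, y, s=sizes)

# legend_elements(prop="sizes") creates proxy handles and labels for a
# few representative marker sizes taken from the plotted data.
handles, labels = scatter.legend_elements(prop="sizes", num=4)
ax.legend(handles, labels, title="Size", loc="upper left")
```

This avoids creating one plot per legend entry by hand, at the cost of letting matplotlib pick which sizes are shown.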

Plotly: How to define colors in a figure using Plotly Graph Objects and Plotly Express?

There are many questions and answers that touch upon this topic one way or another. With this contribution I'd like to clearly show why an easy approach such as marker = {'color' : 'red'} will work for plotly.graph_objects (go), but color='red' will not for plotly.express (px), although color is an argument of both px.line and px.scatter. And I'd like to demonstrate why it's awesome that it doesn't.
So, if px is supposed to be the easiest way to make a plotly figure, then why does something as apparently obvious as color='red' return the following error?
ValueError: Value of 'color' is not the name of a column in 'data_frame'.
In short, it's because color in px does not accept an arbitrary color name or code, but rather the name of a variable in your dataset, in order to assign a color cycle to its unique values and display them as lines or markers with different colors.
Let me demonstrate by using the gapminder dataset to show a scatter plot of life expectancy versus GDP per capita for all (or at least most) countries across the world as of 2007. A basic setup like the one below will produce the following plot:
Figure 1, plot using go:
By default the color is taken from a cycle named plotly, but here it is specified using marker = {'color' : 'red'}.
Code 1, plot using go:
import plotly.express as px  # needed for px.data.gapminder()
import plotly.graph_objects as go
df = px.data.gapminder()
df=df.query("year==2007")
fig = go.Figure()
fig.add_traces(go.Scatter(x=df['gdpPercap'], y=df["lifeExp"],
                          mode = 'markers',
                          marker = {'color' : 'red'}
                          ))
fig.show()
So let's try this with px, and assume that color='red' would do the trick:
Code 2, attempt at scatter plot with defined color using px:
# imports
import plotly.express as px
import pandas as pd
# dataframe
df = px.data.gapminder()
df=df.query("year==2007")
# plotly express scatter plot
px.scatter(df, x="gdpPercap", y="lifeExp",
color = 'red',
)
Result:
ValueError: Value of 'color' is not the name of a column in
'data_frame'. Expected one of ['country', 'continent', 'year',
'lifeExp', 'pop', 'gdpPercap', 'iso_alpha', 'iso_num'] but received:
red
So what's going on here?
First, if an explanation of the broader differences between go and px is required, please take a look here and here. And if absolutely no explanations are needed, you'll find a complete code snippet at the very end of the answer which shows much of what you can do with colors in plotly.express.
Part 1: The Essence:
It might not seem so at first, but there are very good reasons why color='red' does not work as you might expect using px. But first of all, if all you'd like to do is manually set a particular color for all markers, you can do so using .update_traces(marker=dict(color='red')) thanks to Python's method chaining. But first, let's look at the default settings:
1.1 Plotly express defaults
Figure 1, px default scatterplot using px.scatter
Code 1, px default scatterplot using px.scatter
# imports
import plotly.express as px
import pandas as pd
# dataframe
df = px.data.gapminder()
df=df.query("year==2007")
# plotly express scatter plot
px.scatter(df, x="gdpPercap", y="lifeExp")
Here, as already mentioned in the question, the color is set as the first color in the default plotly sequence available through px.colors.qualitative.Plotly:
['#636EFA', # the plotly blue you can see above
'#EF553B',
'#00CC96',
'#AB63FA',
'#FFA15A',
'#19D3F3',
'#FF6692',
'#B6E880',
'#FF97FF',
'#FECB52']
And that looks pretty good. But what if you want to change things and even add more information at the same time?
1.2: How to override the defaults and do exactly what you want with px colors:
As we already touched upon with px.scatter, the color argument does not take a color like red as a value. Rather, you can for example use color='continent' to easily distinguish between different variables in a dataset. But there's so much more to colors in px:
The combination of the six following methods will let you do exactly what you'd like with colors using plotly express. Bear in mind that you do not even have to choose. You can use one, some, or all of the methods below at the same time. And one particularly useful approach will reveal itself as a combination of 1 and 3. But we'll get to that in a bit. This is what you need to know:
1. Change the color sequence used by px with:
color_discrete_sequence=px.colors.qualitative.Alphabet
2. Assign different colors to different variables with the color argument
color = 'continent'
3. Customize one or more variable colors with
color_discrete_map={"Asia": 'red'}
4. Easily group a larger subset of your variables using dict comprehension and color_discrete_map
subset = {"Asia", "Africa", "Oceania"}
group_color = {i: 'red' for i in subset}
5. Set opacity using rgba() color codes.
color_discrete_map={"Asia": 'rgba(255,0,0,0.4)'}
6. Override all settings with:
.update_traces(marker=dict(color='red'))
Part 2: The details and the plots
The following snippet will produce the plot below, which shows life expectancy for all continents at varying levels of GDP. The size of the markers represents different population levels to make things more interesting right from the get-go.
Plot 2:
Code 2:
import plotly.express as px
import pandas as pd
# dataframe, input
df = px.data.gapminder()
df=df.query("year==2007")
px.scatter(df, x="gdpPercap", y="lifeExp",
color = 'continent',
size='pop',
)
To illustrate the flexibility of the methods above, let's first just change the color sequence. Since for starters we are only showing one category and one color, you'll have to wait for the subsequent steps to see the real effects. But here's the same plot, now with color_discrete_sequence=px.colors.qualitative.Alphabet as per step 1:
1. Change the color sequence used by px with
color_discrete_sequence=px.colors.qualitative.Alphabet
Now, let's apply the colors from the Alphabet color sequence to the different continents:
2. Assign different colors to different variables with the color argument
color = 'continent'
If you, like me, think that this particular color sequence is easy on the eye but perhaps a bit indistinguishable, you can assign a color of your choosing to one or more continents like this:
3. Customize one or more variable colors with
color_discrete_map={"Asia": 'red'}
And this is pretty awesome: Now you can change the sequence and choose any color you'd like for particularly interesting variables. But the method above can get a bit tedious if you'd like to assign a particular color to a larger subset. So here's how you can do that too with a dict comprehension:
4. Assign colors to a group using a dict comprehension and color_discrete_map
# imports
import plotly.express as px
import pandas as pd
# dataframe
df = px.data.gapminder()
df=df.query("year==2007")
subset = {"Asia", "Europe", "Oceania"}
group_color = {i: 'red' for i in subset}
# plotly express scatter plot
px.scatter(df, x="gdpPercap", y="lifeExp",
size='pop',
color='continent',
color_discrete_sequence=px.colors.qualitative.Alphabet,
color_discrete_map=group_color
)
5. Set opacity using rgba() color codes.
Now let's take one step back. If you think red suits Asia just fine, but is perhaps a bit too strong, you can adjust the opacity using an rgba color like 'rgba(255,0,0,0.4)' to get this:
Complete code for the last plot:
import plotly.express as px
import pandas as pd
# dataframe, input
df = px.data.gapminder()
df=df.query("year==2007")
px.scatter(df, x="gdpPercap", y="lifeExp",
color_discrete_sequence=px.colors.qualitative.Alphabet,
color = 'continent',
size='pop',
color_discrete_map={"Asia": 'rgba(255,0,0,0.4)'}
)
And if you think we're getting a bit too complicated by now, you can override all settings like this again:
6. Override all settings with:
.update_traces(marker=dict(color='red'))
And this brings us right back to where we started. I hope you'll find this useful!
Complete code snippet with all options available:
# imports
import plotly.express as px
import pandas as pd
# dataframe
df = px.data.gapminder()
df=df.query("year==2007")
subset = {"Asia", "Europe", "Oceania"}
group_color = {i: 'red' for i in subset}
# plotly express scatter plot
px.scatter(df, x="gdpPercap", y="lifeExp",
size='pop',
color='continent',
color_discrete_sequence=px.colors.qualitative.Alphabet,
#color_discrete_map=group_color
color_discrete_map={"Asia": 'rgba(255,0,0,0.4)'}
)#.update_traces(marker=dict(color='red'))

Plotly: How to manually set the color of points in plotly express scatter plots?

https://plotly.com/python/line-and-scatter/ has many scatter plot examples, but not a single one showing you how to set all the points' colours within px.scatter:
# x and y given as DataFrame columns
import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.show()
I've tried adding colour = 'red' etc., but it doesn't work. These examples only show you how to colour by some other variable.
In principle I could add another feature and set it to the same value everywhere, but that seems a bizarre way of accomplishing the task...
For that you may use the color_discrete_sequence argument.
fig = px.scatter(df, x="sepal_width", y="sepal_length", color_discrete_sequence=['red'])
This argument is meant for supplying a custom color palette for discrete color factors, but if you are not using any factor for color, the first element is used for all the points in the plot.
More about discrete color palettes: https://plotly.com/python/discrete-color/
As far as I understand your question, here is my answer.
The parameter 'color' only accepts the column names.
In your case, you can consider using update_traces()
import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df, x="sepal_width", y="sepal_length")
fig.update_traces(marker=dict(
color='red'))
fig.show()
Reference: https://plotly.com/python/marker-style/
You do not have to add another feature to get what you want here. Thanks to Python's method chaining, you can just include .update_traces(marker=dict(color='red')) to manually assign any color of your choosing to all markers.
Plot:
Code:
# x and y given as DataFrame columns
import plotly.express as px
df = px.data.iris() # iris is a pandas DataFrame
fig = px.scatter(df,x="sepal_width",
y="sepal_length"
).update_traces(marker=dict(color='red'))
fig.show()

Matplotlib bar chart - overlay bars similar to stacked

I want to create a matplotlib bar plot that has the look of a stacked plot without being additive from a multi-index pandas dataframe.
The below code gives the basic behaviour
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import io
data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime,Red,3.0
''')
df_unindexed = pd.read_csv(data)
df_unindexed
df = df_unindexed.set_index(['Fruit', 'Color'])
df.unstack().plot(kind='bar')
The plot command df.unstack().plot(kind='bar') shows all the apple prices grouped next to each other. If you choose the option df.unstack().plot(kind='bar',stacked=True) - it adds the prices for Red and Green together and stacks them.
I want a plot that is halfway between the two: it shows each group as a single bar, but overlays the values so you can see them all. The figure below (done in PowerPoint) shows the behaviour I am looking for; I want the image on the right.
Short of calculating all the values and then using the stacked option, is this possible?
This seems (to me) like a bad idea, since this representation leads to several problems. Will a reader understand that those are not stacked bars? What happens when the front bar is taller than the ones behind?
In any case, to accomplish what you want, I would simply repeatedly call plot() on each subset of the data and using the same axes so that the bars are drawn on top of each other.
In your example, the "Red" prices are always higher, so I had to adjust the order to plot them in the back, or they would hide the "Green" bars.
fig,ax = plt.subplots()
my_groups = ['Red','Green']  # plot order: the first group is drawn in the back
df_group = df_unindexed.groupby("Color")
for color in my_groups:
    temp_df = df_group.get_group(color)
    temp_df.plot(kind='bar', ax=ax, x='Fruit', y='Price', color=color, label=color)
There are two problems with this kind of plot. (1) What if the background bar is smaller than the foreground bar? It would simply be hidden and not visible. (2) A chart like this is not distinguishable from a stacked bar chart. Readers will have severe problems interpreting it.
That being said, you can plot both columns individually.
import matplotlib.pyplot as plt
import pandas as pd
import io
data = io.StringIO('''Fruit,Color,Price
Apple,Red,1.5
Apple,Green,1.0
Pear,Red,2.5
Pear,Green,2.3
Lime,Green,0.5
Lime,Red,3.0''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Fruit', 'Color']).unstack()
df.columns = df.columns.droplevel()
plt.bar(df.index, df["Red"].values, label="Red")
plt.bar(df.index, df["Green"].values, label="Green")
plt.legend()
plt.show()
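One way to soften problem (1), sketched here as a suggestion rather than a fix for the readability concern: draw the front bars narrower than the back bars, so a taller front bar cannot completely hide the one behind it. The widths 0.8 and 0.4 are arbitrary, and the small DataFrame simply restates the prices from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Pear", "Lime"],
    "Red": [1.5, 2.5, 3.0],
    "Green": [1.0, 2.3, 0.5],
}).set_index("Fruit")

fig, ax = plt.subplots()
# Wide bars in the back, narrow bars in front: both stay visible
# regardless of which of the two values is larger.
back = ax.bar(df.index, df["Red"], width=0.8, color="red", label="Red")
front = ax.bar(df.index, df["Green"], width=0.4, color="green", label="Green")
ax.set_ylabel("Price")
ax.legend()
```

The differing widths also give readers a visual hint that the bars are overlaid rather than stacked.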
