I'm creating a Sankey diagram using plotly, and there is a built-in node option, 'groups', to combine nodes. However, when I use it, the grouped node is colored black and no label is shown. That is to be expected, since the grouped nodes could have different colors. However, I don't see how I can set the color of the group, and the same goes for its label.
Is there a way to define this?
Example code:
import plotly.graph_objs as go
from plotly.offline import plot
value = [3,5,2,4,6]
source = [0,0,1,0,3]
target = [1,4,2,3,4]
color = ["blue","yellow","orange","orange","purple"]
label = ["A","B","C1","C2","D"]
data = dict(
type='sankey',
arrangement = 'freeform',
node = dict(
pad = 15,
thickness = 20,
line = dict(
color = "black",
width = 0.1
),
groups = [[2,3]],
label = label,
color = color,
),
link = dict(
source = source,
target = target,
value = value,
)
)
layout = dict(
title = "Sankey test",
font = dict(
size = 10
)
)
f = go.FigureWidget(data=[data], layout=layout)
plot(f)
Which renders:
Since I'm getting the following error with your snippet:
ValueError: Invalid property specified for object of type plotly.graph_objs.sankey.Node: 'groups'
Since I don't know which versions of plotly and Python (and Jupyter Notebook?) you are running, I would simply suggest restructuring your source data so that C1 and C2 are merged into a single node C before you build the plot. Keep in mind that links are assigned in the order they appear in the dataset, and that node colors are matched to nodes by their position in the node list.
Plot:
Code:
# imports
import pandas as pd
import numpy as np
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# settings
init_notebook_mode(connected=True)
# Nodes & links
nodes = [['ID', 'Label', 'Color'],
[0,'A','blue'],
[1,'B','yellow'],
[2,'C','orange'],
[3,'D','purple'],
]
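# links with your data (C1 and C2 merged into C = node 2, D = node 3)
links = [['Source','Target','Value','Link Color'],
[0,1,3,'rgba(200, 205, 206, 0.6)'],  # A -> B
[0,2,4,'rgba(200, 205, 206, 0.6)'],  # A -> C (was A -> C2)
[0,3,5,'rgba(200, 205, 206, 0.6)'],  # A -> D
[1,2,2,'rgba(200, 205, 206, 0.6)'],  # B -> C (was B -> C1)
[2,3,6,'rgba(200, 205, 206, 0.6)'],  # C -> D (was C2 -> D)
]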
# Retrieve headers and build dataframes
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)
df_nodes = pd.DataFrame(nodes, columns = nodes_headers)
df_links = pd.DataFrame(links, columns = links_headers)
# Sankey plot setup
data_trace = dict(
type='sankey',
domain = dict(
x = [0,1],
y = [0,1]
),
orientation = "h",
valueformat = ".0f",
node = dict(
pad = 10,
# thickness = 30,
line = dict(
color = "black",
width = 0
),
label = df_nodes['Label'].dropna(axis=0, how='any'),
color = df_nodes['Color']
),
link = dict(
source = df_links['Source'].dropna(axis=0, how='any'),
target = df_links['Target'].dropna(axis=0, how='any'),
value = df_links['Value'].dropna(axis=0, how='any'),
color = df_links['Link Color'].dropna(axis=0, how='any'),
)
)
layout = dict(
title = "Sankey Test",
height = 772,
font = dict(
size = 10),)
fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)
My system info:
The version of the notebook server is: 5.6.0
The server is running on this version of Python:
Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
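The C1/C2 restructuring suggested above can also be done in code rather than by hand. Below is a minimal sketch with a hypothetical helper, `merge_nodes` (not part of plotly), that collapses a list of node indices into one node with a label and color you choose, remaps the links, and sums links that end up with the same source and target. It only prepares plain lists, so it should not depend on the plotly version or on the 'groups' property.

```python
def merge_nodes(labels, colors, source, target, value, group, group_label, group_color):
    """Hypothetical helper: collapse the node indices in `group` into one node.

    Returns new (labels, colors, source, target, value) lists. Links between two
    members of the group are dropped; links that end up with the same
    (source, target) pair are summed.
    """
    group = set(group)
    first = min(group)
    remap = {}  # old node index -> new node index
    new_labels, new_colors = [], []
    for old, (lab, col) in enumerate(zip(labels, colors)):
        if old in group and old != first:
            remap[old] = remap[first]  # reuse the merged node created earlier
            continue
        if old == first:
            lab, col = group_label, group_color
        remap[old] = len(new_labels)
        new_labels.append(lab)
        new_colors.append(col)

    totals = {}  # (new source, new target) -> summed value, in first-seen order
    for s, t, v in zip(source, target, value):
        key = (remap[s], remap[t])
        if key[0] == key[1]:
            continue  # link inside the group: nothing left to draw
        totals[key] = totals.get(key, 0) + v
    new_source = [s for s, _ in totals]
    new_target = [t for _, t in totals]
    return new_labels, new_colors, new_source, new_target, list(totals.values())


# The question's data, with C1 (index 2) and C2 (index 3) merged into an orange "C"
labels, colors, source, target, value = merge_nodes(
    ["A", "B", "C1", "C2", "D"],
    ["blue", "yellow", "orange", "orange", "purple"],
    [0, 0, 1, 0, 3],
    [1, 4, 2, 3, 4],
    [3, 5, 2, 4, 6],
    group=[2, 3], group_label="C", group_color="orange",
)
```

The returned lists can then be passed as node = dict(label=labels, color=colors) and link = dict(source=source, target=target, value=value), so the merged node gets the label and color you picked instead of the black, unlabeled group node.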
Related
I'm new to plotly, and I'm trying to make two different graphs and show them one at a time using buttons; however, when I do, the legend entries are duplicated, which makes the data hard to read. Here's the code I'm running right now:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly as ply
import plotly.express as px
import plotly.graph_objects as go
url = "https://raw.githubusercontent.com/m23chaffee/DS100-Repository/main/Aluminum%20Alloy%20Data%20Set.csv"
alloy = pd.read_csv('https://raw.githubusercontent.com/m23chaffee/DS100-Repository/main/Aluminum%20Alloy%20Data%20Set.csv')
del alloy['temper']
alloy = alloy.rename(columns={'aluminum_alloy':'Alloy Number',
'modulus_elastic': 'Elastic Modulus',
'modulus_shear': 'Shear Modulus',
'strength_yield': 'Yield Strength',
'strength_tensile': 'Tensile Strength'
})
bar1 = px.bar(alloy,
x = "Alloy Number",
y = ["Elastic Modulus", "Shear Modulus","Yield Strength","Tensile Strength"],
barmode = 'group',
width = 1100,
height =500,
orientation = 'v',
color_discrete_sequence = px.colors.qualitative.Pastel,
labels={"value": "Data Values"},
template = 'seaborn').update_traces(legendgroup="group").update_layout(showlegend=False)
line1 = px.line(alloy,
x = "Alloy Number",
y = ["Elastic Modulus", "Shear Modulus","Yield Strength","Tensile Strength"],
width = 1100,
height =500,
orientation = 'v',
color_discrete_sequence = px.colors.qualitative.Pastel,
labels={"value": "Data Values"},
template = 'seaborn').update_traces(legendgroup="group", visible = 'legendonly').update_layout(showlegend=False)
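# Combine both figures' traces (assumed: the original snippet never defines `fig`)
fig = go.Figure(data=bar1.data + line1.data)
# Add buttons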
fig.update_layout(
updatemenus=[
dict(
type = "buttons",
direction = "left",
buttons=list([
dict(
args=['type', 'bar'],
label="Bar Graph",
method="restyle",
),
dict(
args=["type", "line"],
label="Line Graph",
method="restyle"
)
]),
pad={"r": 10, "t": 10},
showactive=True,
x=0.11,
xanchor="left",
y=1.1,
yanchor="middle"
),
]
)
fig.show()
The result looks like this:
[Image: result of the code above]
Attempted Solution
I tried hiding the duplicate legend entries through the trace settings and by following the documentation, but it didn't work for me. I also found a similar Stack Overflow post from 8 years ago and tried its solution, but it made no change to my graph.
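One likely cause of the duplicated legend is visible='legendonly': such traces are not drawn but still get a legend entry, whereas traces with visible=False are hidden from both the plot and the legend. Also, 'line' is not a plotly trace type (px.line produces scatter traces), so restyling "type" to "line" will not switch the chart. A sketch under those assumptions: put all traces in one figure, bar traces first and line traces second (as in go.Figure(data=bar1.data + line1.data)), and let each button switch the "visible" list. The helper name `toggle_buttons` is hypothetical.

```python
def toggle_buttons(n_bar, n_line):
    """Hypothetical helper: buttons that show either the bar or the line traces.

    Assumes the figure's data holds the n_bar bar traces first, then the
    n_line line traces.
    """
    show_bar = [True] * n_bar + [False] * n_line
    show_line = [False] * n_bar + [True] * n_line
    return [
        # "update" applies the first args dict to the traces, like restyle
        dict(label="Bar Graph", method="update", args=[{"visible": show_bar}]),
        dict(label="Line Graph", method="update", args=[{"visible": show_line}]),
    ]


# Four y columns -> px creates four bar traces and four line traces
buttons = toggle_buttons(4, 4)
updatemenus = [
    dict(
        type="buttons",
        direction="left",
        buttons=buttons,
        pad={"r": 10, "t": 10},
        showactive=True,
        x=0.11,
        xanchor="left",
        y=1.1,
        yanchor="middle",
    )
]
```

To use it, start the line traces hidden with line1.update_traces(visible=False) instead of visible='legendonly', then call fig.update_layout(updatemenus=updatemenus). If showlegend=False is left in place there will be no legend at all, so drop it if you want one.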
I haven't found a clear solution in the R documentation for Sankey diagrams, and I'm hoping someone can help! All I want to do is make each link the same color as its source node, and have a link darken when hovering over it. Here's my Sankey diagram as it currently stands; unfortunately, I can't share the data because of confidentiality issues. At the bottom you'll find a link to an image of the plot.
library(data.table)  # setDT(), copy()
library(plotly)      # plot_ly(), layout(), %>%
dt <- setDT(copy(dt_minors2016))
nodes <- dt[,unique(c(citizen,geo))]
sources <- match(dt[,citizen],nodes)-1
targets <- match(dt[,geo], nodes) -1
values <- dt[,V1]
fig <- plot_ly(
type = "sankey",
#default= 1000,
domain = list(
x = c(0,1),
y = c(0,1)
),
orientation = "h",
valueformat = ".0f",
valuesuffix = "Persons",
node = list(
label = nodes,
# color = colors,
pad = 15,
thickness = 15,
line = list(
color = "black",
width = 0.5
)
),
link = list(
source = sources,
target = targets,
value = values,
color = 'rgba(0,255,255,0.4)'
)
)
fig <- fig %>% layout(
title = "UAM asylum seekers from top 5 origin countries to EU countries - 2016",
font = list(
size = 10
),
xaxis = list(showgrid = F, zeroline = F),
yaxis = list(showgrid = F, zeroline = F),
hovermode = "x unified"
)
fig
https://i.stack.imgur.com/oAMh8.png
As per the comments, the solution is provided in Python, not R.
The core of the solution is setting the color of both the nodes and the links, using the index associated with each node name to pick a color from a predefined color list.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import itertools
df = pd.DataFrame(
itertools.product(
["AF", "SY", "IQ", "SO", "ER"], ["DE", "AT", "BG", "SE", "UK", "CH"]
),
columns=["source", "target"],
).pipe(lambda d: d.assign(value=np.random.uniform(1, 10000, 1000)[:len(d)]))
nodes = np.unique(df[["source", "target"]], axis=None)
nodes = pd.Series(index=nodes, data=range(len(nodes)))
fig = go.Figure(
go.Sankey(
node={
"label": nodes.index,
"color": [
px.colors.qualitative.Plotly[i % len(px.colors.qualitative.Plotly)]
for i in nodes
],
},
link={
"source": nodes.loc[df["source"]],
"target": nodes.loc[df["target"]],
"value": df["value"],
"color": [
px.colors.qualitative.Plotly[i % len(px.colors.qualitative.Plotly)]
for i in nodes.loc[df["source"]]
],
},
)
)
fig
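If the links should match their source node but stay readable where they overlap, a common variant of the index-based lookup above is to reuse each node's color with reduced opacity. A minimal sketch, assuming hex colors such as those in px.colors.qualitative.Plotly (the first three of which are listed below); `hex_to_rgba` and `link_colors` are hypothetical helper names.

```python
def hex_to_rgba(hex_color, alpha):
    """Convert '#RRGGBB' to an 'rgba(r, g, b, a)' string."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return f"rgba({r}, {g}, {b}, {alpha})"


def link_colors(node_colors, sources, alpha=0.4):
    """Give each link a translucent version of its source node's color."""
    return [hex_to_rgba(node_colors[s], alpha) for s in sources]


# First three colors of plotly's qualitative "Plotly" palette
palette = ["#636EFA", "#EF553B", "#00CC96"]
colors_for_links = link_colors(palette, [0, 0, 1, 2])
```

In the answer's code, the link "color" list could then be link_colors(node_colors, nodes.loc[df["source"]]), where node_colors is the list built for the nodes. This sketch only handles the matching colors; it does not implement the darken-on-hover behaviour.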
I want to keep the labels in the hover tooltip, but hide the text labels that are drawn directly on the Sankey nodes.
Here is my code:
import random
import plotly.graph_objects as go

# df_mapping, df_subset, unique_broad and the incoming labels list come from elsewhere in my code;
# make_sankey is only a placeholder name for the enclosing function
def make_sankey(df_mapping, df_subset, unique_broad, labels):
    labels = df_mapping['Name'].to_numpy().tolist() + labels
    count_dict = {}
    source = []
    target = []
    value = df_subset['Stuff'].to_numpy().tolist()
    index = 0
    for x in unique_broad:
        count_dict[x] = len(df_mapping.loc[df_mapping['Stuff'] == x])
    # indentation reconstructed: one source node per name...
    for key in count_dict:
        for i in range(count_dict[key]):
            source.append(index)
            index += 1
    # ...and one target node per broad category
    for key in count_dict:
        for i in range(count_dict[key]):
            target.append(index)
        index += 1
    # one random hex color per link
    number_of_colors = len(source)
    color_link = ["#" + ''.join([random.choice('0123456789ABCDEF') for j in range(6)])
                  for i in range(number_of_colors)]
    link = dict(source=source, target=target, value=value, color=color_link)
    node = dict(label=labels, pad=35, thickness=10)
    data = go.Sankey(link=link, node=node)
    fig = go.Figure(data)
    fig.update_layout(
        hovermode='x',
        title="Sankey for Stuff",
        font=dict(size=8, color='white'),
        paper_bgcolor='#51504f'
    )
    return fig
You can make the labels invisible by setting their color to rgba(0,0,0,0). The labels will still show up in the hover text, but will not be drawn on the nodes.
To do this, pass textfont=dict(color="rgba(0,0,0,0)", size=1) to go.Sankey, as in this version of the example from the Plotly Sankey diagram documentation that you used:
import plotly.graph_objects as go
import urllib.request, json
url = 'https://raw.githubusercontent.com/plotly/plotly.js/master/test/image/mocks/sankey_energy.json'
response = urllib.request.urlopen(url)
data = json.loads(response.read())
# override gray link colors with 'source' colors
opacity = 0.4
# change 'magenta' to its 'rgba' value to add opacity
data['data'][0]['node']['color'] = ['rgba(255,0,255, 0.8)' if color == "magenta" else color for color in data['data'][0]['node']['color']]
data['data'][0]['link']['color'] = [data['data'][0]['node']['color'][src].replace("0.8", str(opacity))
for src in data['data'][0]['link']['source']]
fig = go.Figure(data=[go.Sankey(
textfont=dict(color="rgba(0,0,0,0)", size=1),
valueformat = ".0f",
valuesuffix = "TWh",
# Define nodes
node = dict(
pad = 15,
thickness = 15,
line = dict(color = "black", width = 0.5),
label = data['data'][0]['node']['label'],
color = data['data'][0]['node']['color']
),
# Add links
link = dict(
source = data['data'][0]['link']['source'],
target = data['data'][0]['link']['target'],
value = data['data'][0]['link']['value'],
label = data['data'][0]['link']['label'],
color = data['data'][0]['link']['color']
))])
fig.update_layout(title_text="Energy forecast for 2050<br>Source: Department of Energy & Climate Change, Tom Counsell via <a href='https://bost.ocks.org/mike/sankey/'>Mike Bostock</a>",
font_size=10)
fig.show()
You get the following:
I want to make a choropleth world map that shows the hits (number of searches) for a word.
Following is the code:
import plotly
import plotly.offline
import pandas as pd
df = pd.read_excel('F:\\Intern\\csir\\1yr\\news\\region_2016_2017.xlsx')
df = df.query('keyword==["addiction"]')
scl = [[0.0, 'rgb(242,240,247)'],[0.2, 'rgb(218,218,235)'],[0.4, 'rgb(188,189,220)'],\
[0.6, 'rgb(158,154,200)'],[0.8, 'rgb(117,107,177)'],[1.0, 'rgb(84,39,143)']]
data = [dict(
type='choropleth',
colorscale=scl,
locations = df['location'],
z = df['hits'].astype(int),
locationmode = "country names",
autocolorscale = False,
reversescale = False,
marker = dict(
line = dict (
color = 'rgb(180,180,180)',
width = 0.5)),
colorbar = dict(
autotick = False,
title = 'Hits'),)]
layout = dict(
title = 'Addiction keyword 1yr analysis',
geo = dict(
showframe = False,
showcoastlines = False,
projection = dict(
type = 'Mercator'
)
)
)
fig = dict(data = data,layout = layout)
plotly.offline.plot(fig,validate=False,filename = 'd3-world-map.html')
And the plotted map is:
As you can clearly see, many countries are missing. This is probably because many countries have no entries in the data at all, rather than entries explicitly stating zero hits.
I don't want to add those zero rows to my data explicitly. Is there another way, so that all of the countries are visible?
Data set can be found here.
Note that the dataset I've linked is a .csv file, whereas the program uses an .xlsx version of the same file.
You need to turn on country outlines in the geo section of the layout:
"geo":{
"countrycolor": "#444444",
"showcountries": true
},
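Since the question builds its figure from plain Python dicts, the same settings would go into the geo dict of layout (with Python's True instead of JSON's true). A sketch that also turns on showland, so countries without a z value should be filled with a neutral land color under their outlines instead of being left blank; the colors are only examples, and the projection type is written in lowercase, which is how plotly lists its projection names.

```python
# Layout for the choropleth in the question, with country outlines and land fill enabled
layout = dict(
    title="Addiction keyword 1yr analysis",
    geo=dict(
        showframe=False,
        showcoastlines=False,
        showcountries=True,           # draw every country's outline, with or without data
        countrycolor="#444444",
        showland=True,                # fill land that has no z value
        landcolor="rgb(242,240,247)", # lightest color of the question's color scale
        projection=dict(type="mercator"),
    ),
)
```

The rest of the question's code stays the same: fig = dict(data=data, layout=layout), then plotly.offline.plot(fig, ...).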
This is what I'm doing in an IPython notebook, where the plotly graphs and everything else are generated without problems. I then export the notebook as HTML and embed it in a Django template, where everything works except the plotly graphs. I'm not sure what needs to be done, which is why I also tried installing plotly via npm and including a reference to plotly.js in my template. Below is the code.
import pandas as pd
import numpy as np
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()
data = pd.read_csv("storage/AviationDataUp.csv")
useful_columns = ['Event.Date', 'Location', 'Country', 'Latitude', 'Longitude', 'Purpose.of.Flight',\
'Total.Fatal.Injuries','Number.of.Engines','Air.Carrier']
data = data[useful_columns]
data = data[data['Country']=='United States']
accident_trace = Scattergeo(
locationmode = 'ISO-3',
lon = data['Longitude'],
lat = data['Latitude'],
mode = 'markers',
marker = dict(
size = 2,
opacity = 0.75,
color="rgb(0, 130, 250)"),
name = 'Accidents'
)
layout = dict(
title = 'Aviation Accidents in USA',
geo = dict(
scope = 'usa',
projection = dict(),
showland = True,
landcolor = 'rgb(250, 250, 250)',
subunitwidth = 1,
subunitcolor = 'rgb(217, 217, 217)',
countrywidth = 1,
countrycolor = 'rgb(217, 217, 217)',
showlakes = True,
lakecolor = 'rgb(255, 255, 255)'
) )
figure = dict(data=Data([accident_trace]), layout=layout)
iplot(figure)