I have a Plotly graph and want to replace its x-axis labels.
I pasted this graph as an example. At the bottom you will see ARI, ATL, BAL, etc. Is it possible to replace these with images or icons?
same approach that @r-beginners provided in the referenced answer
have sourced all logos from Kaggle and used PIL to load them as images
have synthesized an axis by creating a second trace at a negative y value and used that plot area to place the logos
have set the xaxis to invisible, so hover on the second trace provides the team abbreviation
import kaggle.cli
import sys, requests
import urllib.parse
import numpy as np
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
import plotly.express as px
from PIL import Image
# fmt: off
# download data set
url = "https://www.kaggle.com/anzhemeng/nfl-team-logos"
sys.argv = [sys.argv[0]] + f"datasets download {urllib.parse.urlparse(url).path[1:]}".split(" ")
kaggle.cli.main()
zfile = ZipFile(f'{urllib.parse.urlparse(url).path.split("/")[-1]}.zip')
# fmt: on
zfile.extractall("nfl-logos")
df = pd.DataFrame(Path.cwd().joinpath("nfl-logos").glob("*.png"), columns=["filename"])
df["team"] = df["filename"].apply(lambda d: d.stem)
df["passResult"] = np.random.uniform(0, 1, len(df))
df = df.sort_values("team")
fig = px.scatter(df, x="team", y="passResult").add_traces(
    # invisible second trace at y=-0.05 acts as the synthetic axis
    px.scatter(df, x="team", y=np.full(len(df), -0.05))
    .update_traces(marker_color="rgba(0,0,0,0)", hovertemplate="%{x}")
    .data
)
# place each logo just below zero, centred on its team's x position
for x in fig.data[0].x:
    fig.add_layout_image(
        source=Image.open(df.loc[df["team"].eq(x), "filename"].values[0]),
        x=x,
        y=-0.01,
        xref="x",
        yref="y",
        xanchor="center",
        sizex=1,
        sizey=1,
    )
fig.update_layout(xaxis={"visible": False})
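The loop above adds one layout image per team. As a rough sketch, the same placement can be built as a plain list of specs first, which makes the geometry easy to inspect before any figure is rendered. The helper name `logo_image_specs` and the sample file names are hypothetical; the dictionary keys mirror the arguments used above.

```python
from pathlib import Path


def logo_image_specs(logo_paths, y=-0.01, size=1.0):
    """Build one layout-image spec per logo, keyed on the file stem (team code).

    Hypothetical helper: returns plain dicts so the placement can be checked
    without rendering a figure.
    """
    specs = []
    for path in sorted(Path(p) for p in logo_paths):
        specs.append(
            {
                "source": str(path),  # swap in an opened PIL image before plotting
                "x": path.stem,       # categorical x value, e.g. "ARI"
                "y": y,               # just below zero, inside the synthetic axis band
                "xref": "x",
                "yref": "y",
                "xanchor": "center",
                "sizex": size,
                "sizey": size,
            }
        )
    return specs


# Made-up file names, only to show the ordering and keys
specs = logo_image_specs(["logos/BAL.png", "logos/ARI.png", "logos/ATL.png"])
print([s["x"] for s in specs])
```

Once `source` holds an image object, each spec could be passed on with `fig.add_layout_image(**spec)`.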
I want to mix Plotly with a dropdown widget, the idea being to make some scatter plots and modify the x axis through the widget. Let's say that my dataset is the following :
import plotly.graph_objects as go
import pandas as pd
import ipywidgets as widgets
import seaborn as sns
df = sns.load_dataset('diamonds')
And my target is the column carat. What I tried so far is to create the scatters, include them into the widget and display it :
predictors = df.columns.tolist()
predictors.remove("carat")
target = df["carat"]
data = []
for predictor in predictors:
    chart = go.Scatter(x = df[predictor],
                       y = target,
                       mode="markers")
    fig = go.Figure(data=chart)
    data.append((predictor,fig))
widgets.Dropdown(options = [item[0] for item in data],
                 value = [item[0] for item in data][0],
                 description = "Select :",
                 disabled=False)
I am new to ipywidgets/plotly and don't understand what is not working here: it displays the widget but not the charts, even when I change its value. How can I modify the code so that it displays the chart when a predictor is selected?
You can use interact to read the values from the DropDown and plot your graph:
import plotly.graph_objects as go
import pandas as pd
import seaborn as sns
from ipywidgets import widgets
from ipywidgets import interact
import plotly.express as px
df = sns.load_dataset('diamonds')
predictors = df.columns.tolist()
predictors.remove("carat")
target = df["carat"]
@interact
def read_values(
    predictor=widgets.Dropdown(
        description="Select :", value="clarity", options=predictors
    )
):
    # rebuild the scatter for the selected predictor and show it
    fig = px.scatter(df, x = predictor, y = target)
    go.FigureWidget(fig.to_dict()).show()
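The original attempt shows the dropdown but never connects it to the figures. Below is a minimal sketch of that missing link, assuming the callback receives a change mapping with a `"new"` key (the shape ipywidgets passes to handlers registered through `observe`). The names `make_handler` and `render` are hypothetical, and plain strings stand in for real figures so the sketch runs outside a notebook.

```python
def make_handler(figures, render):
    """Return a callback that renders the figure for the newly selected key."""

    def handler(change):
        # change["new"] holds the value the dropdown switched to
        render(figures[change["new"]])

    return handler


# Stand-ins for real figures so the sketch runs without a notebook
figures = {
    "depth": "<figure: depth vs carat>",
    "price": "<figure: price vs carat>",
}
shown = []
handler = make_handler(figures, shown.append)

# Simulate the user picking "price" in the dropdown
handler({"name": "value", "old": "depth", "new": "price"})
print(shown)
```

In a notebook, the handler would be registered with something like `dropdown.observe(handler, names="value")`, with `render` displaying the figure.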
I am working on a choropleth map and it is showing a white page instead of the map as shown here
https://i.stack.imgur.com/boYKY.png
I have both the geojson and the excel file downloaded in the same folder.
geojson https://drive.google.com/file/d/1N-rp9yHqE1Rzn2VxoAAweJ8-5XIjk61j/view?usp=sharing
excel https://docs.google.com/spreadsheets/d/1NKeUg20XxJe0jccMgjj9pMxrTIIWeuQk/edit?usp=sharing&ouid=100050178655652050254&rtpof=true&sd=true
Here is my code
import json
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.read_excel('kraje.xlsx', sheet_name='List1')
regions_json = json.load(open("KRAJE.geojson", "r"))
fig = px.choropleth(df,
locations="K_KRAJ",
geojson=regions_json,
color='OB1506')
fig.show()
The console of my browser in which I am viewing the map shows
this
I am using a jupyter notebook in the brave browser.
Can anyone please help me solve this? Thanks
EDIT:
I found the correct geojson file, but now I have a different issue. Only one region is colored, and not even in the correct color; the rest of the map, even outside my regions, is filled with that same color. When I hover over my regions I can see that they are in the correct place, just with the wrong color. I also have no idea why the whole map is colored rather than only the regions from the geojson file. Here is an image of the output.
new (should be correct) geojson https://drive.google.com/file/d/1S03NX5Q0pqgAsbJnjqt8O5w8gUHH1rt_/view?usp=sharing
import json
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.read_excel('kraje.xlsx', sheet_name='List1')
regions_json = json.load(open("KRAJE.geojson", "r"))
for feature in regions_json['features']:
    feature["id"] = feature["properties"]["K_KRAJ"]
fig = px.choropleth(df,
locations="K_KRAJ",
geojson=regions_json,
color='OB1506')
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
SOLUTION
Thanks to Rob Raymond it finally works. There was an issue with the geojson file. I also had a ton of problems installing geopandas and the only tutorial that actually worked was installing each package separately (https://stackoverflow.com/a/69210111/17646343)
there are multiple issues with your geojson
need to define the CRS; it's clearly not EPSG:4326. It appears to be a UTM CRS for the Czech Republic
even with this there are invalid polygons
with valid geojson, there are a few points you have missed
locations needs to be common across your data frame and geojson
featureidkey needs to be used to state which geojson property you are joining on (here the name)
import json
import numpy as np
import pandas as pd
import plotly.express as px
import geopandas as gpd
from pathlib import Path
files = {
f.suffix: f
for p in ["KRAJE*.*", "KRAJE*.*".lower()]
for f in Path.home().joinpath("Downloads").glob(p)
}
# df = pd.read_excel('kraje.xlsx', sheet_name='List1')
df = pd.read_excel(files[".xlsx"], sheet_name="List1")
# regions_json = json.load(open("KRAJE.geojson", "r"))
regions_json = json.load(open(files[".geojson"], "r"))
regions_json = (
gpd.read_file(files[".geojson"])
.dropna()
.set_crs("EPSG:32633", allow_override=True)
.to_crs("epsg:4326")
.__geo_interface__
)
fig = px.choropleth(
df,
locations="N_KRAJ",
featureidkey="properties.name",
geojson=regions_json,
color="OB1506",
)
fig.update_geos(fitbounds="locations", visible=True)
fig
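The join between `locations` and `featureidkey` is what decides which regions get colored, so it can help to check it before plotting. The sketch below uses only the standard library; `feature_key` and `unmatched_locations` are hypothetical helper names, and the region codes are made-up sample values rather than the contents of the real file.

```python
def feature_key(feature, featureidkey):
    """Resolve a dotted key such as 'properties.K_KRAJ' on one GeoJSON feature."""
    value = feature
    for part in featureidkey.split("."):
        value = value[part]
    return value


def unmatched_locations(geojson, locations, featureidkey="id"):
    """Return the locations that have no feature with a matching key."""
    keys = {feature_key(f, featureidkey) for f in geojson["features"]}
    return [loc for loc in locations if loc not in keys]


# Tiny made-up GeoJSON; geometries omitted because only the keys matter here
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"K_KRAJ": "CZ010"}, "geometry": None},
        {"type": "Feature", "properties": {"K_KRAJ": "CZ020"}, "geometry": None},
    ],
}
missing = unmatched_locations(geojson, ["CZ010", "CZ031"], "properties.K_KRAJ")
print(missing)
```

An empty result means every row in the data frame has a feature to attach to; any value listed would be silently left unshaded on the map.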
updated
there are still issues with your geojson. Have fixed it using geopandas and buffer(0) (see Fix invalid polygon in Shapely)
with this and change to plotly parameters I can now generate a figure
import json
import numpy as np
import pandas as pd
import plotly.express as px
import geopandas as gpd
from pathlib import Path
files = {
f.suffix: f
for p in ["KRAJ_*.*", "KRAJE*.*".lower()]
for f in Path.home().joinpath("Downloads").glob(p)
}
# df = pd.read_excel('kraje.xlsx', sheet_name='List1')
df = pd.read_excel(files[".xlsx"], sheet_name="List1")
# regions_json = json.load(open("KRAJE.geojson", "r"))
regions_json = json.load(open(files[".json"], "r"))
# geometry is still invalid!!! force it to valid by buffer(0)
regions_json = gpd.read_file(files[".json"]).assign(geometry=lambda d: d["geometry"].buffer(0)).__geo_interface__
fig = px.choropleth(
df,
locations="K_KRAJ",
featureidkey="properties.K_KRAJ",
geojson=regions_json,
color="OB1506",
)
fig.update_geos(fitbounds="locations", visible=True)
fig
I need to plot a choropleth on a Plotly map using a custom SHP file.
The SHP file provides the boundary information. I convert it to geojson and feed it to Plotly, but all I get is an empty base map with no error messages.
Here is what i tried:
import json
import random
import pandas as pd
import geopandas as gpd
import plotly.graph_objects as go
# Create GeoDataFrame for SHP file
geodf = gpd.read_file('data/postal-areas-2021/PKS_postinumeroalueet_2021_shp.shp')
# Save as geojson
geodf.to_file("data.geojson", encoding='utf-8', driver = "GeoJSON")
# Open the file
with open("data.geojson", encoding='utf-8') as geofile:
    counties = json.load(geofile)
# Create a new Dataframe for supplying z values(colors) to Choropleth.
df = pd.DataFrame()
# Create lists to store the values
postal_code = []
rand = []
# Using Posno (postal code) as ID and generating random integers from 1-100 as a color value
for i,v in enumerate(counties['features']):
    postal_code.append(counties['features'][i]['properties']['Posno'])
    rand.append(random.randint(1,100))
# Adding the columns to the dataframe
df['Posno'] = postal_code
df['rand'] = rand
# Creating the figure and assigning the values
fig = go.Figure(go.Choroplethmapbox(geojson=counties, locations=df['Posno'], z=df['rand'],
colorscale="Viridis", zmin=0, zmax=12, marker_line_width=5))
fig.update_layout(mapbox_style="open-street-map",
height = 1000,
autosize=True,
margin={"r":0,"t":0,"l":0,"b":0},
paper_bgcolor='#303030',
plot_bgcolor='#303030',
mapbox=dict(center=dict(lat=60.1699, lon=24.9384),zoom=11),
)
fig.show()
Question:
How to plot a choropleth from an SHP file with Plotly go.Choroplethmapbox()?
you have missed one very important step, consideration of CRS projection https://geopandas.org/docs/user_guide/projections.html
this is resolved with geodf = geodf.to_crs("WGS84")
additionally it's far simpler to use https://plotly.com/python/mapbox-county-choropleth/#using-geopandas-data-frames to generate this mapbox
plotly graph objects
import requests
from pathlib import Path
from zipfile import ZipFile
import geopandas as gpd
import numpy as np
import plotly.graph_objects as go
import json
# get the shape file...
url = "https://avoidatastr.blob.core.windows.net/avoindata/AvoinData/9_Kartat/PKS%20postinumeroalueet/Shp/PKS_postinumeroalueet_2021_shp.zip"
fn = Path.cwd().joinpath(url.split("/")[-1])
if not fn.exists():
    r = requests.get(url, stream=True)
    with open(fn, "wb") as f:
        for chunk in r.raw.stream(1024, decode_content=False):
            if chunk:
                f.write(chunk)
zfile = ZipFile(fn)
zfile.extractall()
# open it...
geodf = gpd.read_file(list(Path.cwd().glob("PKS*.shp"))[0])
geodf["rand"] = np.random.randint(1, 100, len(geodf))
# shape file is a different CRS, change to lon/lat GPS co-ordinates
geodf = geodf.to_crs("WGS84").set_index("Posno")
fig = go.Figure(go.Choroplethmapbox(geojson=json.loads(geodf.to_json()),
locations=geodf.index, z=geodf['rand'],
colorscale="Viridis", marker_line_width=.5))
fig.update_layout(mapbox_style="open-street-map",
height = 1000,
autosize=True,
margin={"r":0,"t":0,"l":0,"b":0},
paper_bgcolor='#303030',
plot_bgcolor='#303030',
mapbox=dict(center=dict(lat=60.1699, lon=24.9384),zoom=9),
)
plotly express
import requests
from pathlib import Path
from zipfile import ZipFile
import geopandas as gpd
import numpy as np
import plotly.express as px
# get the shape file...
url = "https://avoidatastr.blob.core.windows.net/avoindata/AvoinData/9_Kartat/PKS%20postinumeroalueet/Shp/PKS_postinumeroalueet_2021_shp.zip"
fn = Path.cwd().joinpath(url.split("/")[-1])
if not fn.exists():
    r = requests.get(url, stream=True)
    with open(fn, "wb") as f:
        for chunk in r.raw.stream(1024, decode_content=False):
            if chunk:
                f.write(chunk)
zfile = ZipFile(fn)
zfile.extractall()
# open it...
geodf = gpd.read_file(list(Path.cwd().glob("PKS*.shp"))[0])
geodf["rand"] = np.random.randint(1, 100, len(geodf))
# shape file is a different CRS, change to lon/lat GPS co-ordinates
geodf = geodf.to_crs("WGS84")
fig = px.choropleth_mapbox(
geodf.set_index("Posno"),
geojson=geodf.geometry,
locations=geodf.index,
color="rand",
center=dict(lat=60.1699, lon=24.9384),
mapbox_style="open-street-map",
zoom=9,
)
fig.update_layout(
height=1000,
autosize=True,
margin={"r": 0, "t": 0, "l": 0, "b": 0},
paper_bgcolor="#303030",
plot_bgcolor="#303030",
)
fig
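A quick way to spot the CRS problem before plotting is to check whether the GeoJSON coordinates even look like longitude/latitude degrees. The sketch below is a heuristic only, not a substitute for reading the file's CRS; the function names are hypothetical and the coordinates are made-up sample values.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for item in coords:
            yield from iter_positions(item)


def looks_like_lonlat(geojson):
    """Heuristic: True if every coordinate fits within lon/lat degree ranges."""
    for feature in geojson["features"]:
        geometry = feature.get("geometry")
        if not geometry:
            continue
        for x, y in iter_positions(geometry["coordinates"]):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True


def square(x, y, d):
    """Build a one-feature collection holding a small square polygon."""
    ring = [[x, y], [x + d, y], [x + d, y + d], [x, y + d], [x, y]]
    geometry = {"type": "Polygon", "coordinates": [ring]}
    feature = {"type": "Feature", "properties": {}, "geometry": geometry}
    return {"type": "FeatureCollection", "features": [feature]}


projected = square(25496000.0, 6672000.0, 500.0)  # metres in some projected CRS
geographic = square(24.93, 60.16, 0.01)           # degrees near Helsinki
print(looks_like_lonlat(projected), looks_like_lonlat(geographic))
```

When the check fails, reprojecting as shown above with `to_crs("WGS84")` is the actual fix; a passing check does not prove the CRS is right, since small projected values would also fall inside the degree ranges.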
I am new to python and wanted to try using a choropleth map. I have the following code for the graph.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.read_csv(r'C:\Users\lukee\Desktop\COVID Visualisation\time_series_covid_19_confirmed.csv')
#Data for number of cases for each country across the different dates
geojson = df['Country/Region']
#define the colour codes for the number of cases across the different dates
colourscale = px.colors.sequential.Plasma
#world map to show the intensity of cases in each country
fig = px.choropleth(df,
geojson=geojson,
locationmode= 'country names',
color = df['5/16/21'],
color_continuous_scale = colourscale,
scope='world',
hover_name=df["Country/Region"],
labels={'COVID Cases'})
fig.update(layout_coloraxis_showscale=False)
fig.show()
this solution sources data from Our World in Data (OWID), not Kaggle
plotting code: there were some inconsistencies in how you referenced columns in the data frame. Added the featureidkey parameter so the dataframe and geojson join correctly
data sourcing
import requests
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
import json, io
# source geojson for country boundaries
geosrc = pd.json_normalize(requests.get("https://pkgstore.datahub.io/core/geo-countries/7/datapackage.json").json()["resources"])
fn = Path(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0]).name
if not Path.cwd().joinpath(fn).exists():
    r = requests.get(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0],stream=True,)
    with open(fn, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
zfile = ZipFile(fn)
with zfile.open(zfile.infolist()[0]) as f:
    geojson = json.load(f)
# source COVID data
dfall = pd.read_csv(io.StringIO(requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv").text))
dfall["date"] = pd.to_datetime(dfall["date"])
dflatest = (dfall.sort_values(["iso_code", "date"]).groupby("iso_code", as_index=False).last())
colorcol = "new_cases_smoothed_per_million"
# filter out rows that are not a country, have no data for the latest date, or are a big outlier
dflatest = dflatest.loc[
dflatest[colorcol].gt(0).fillna(False)
& dflatest["iso_code"].str.len().eq(3)
& dflatest[colorcol].lt(dflatest[colorcol].quantile(0.97))
]
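The "latest row per country" step above can be sketched without pandas to make the logic explicit. This is an illustration under stated assumptions: `latest_per_country` is a hypothetical helper, the rows are made-up sample values, and dates are ISO strings so that string comparison matches chronological order.

```python
def latest_per_country(rows):
    """Keep the most recent row for each three-letter iso_code."""
    latest = {}
    for row in rows:
        code = row["iso_code"]
        if len(code) != 3:
            # skip rows whose code is not a three-letter country code
            continue
        if code not in latest or row["date"] > latest[code]["date"]:
            latest[code] = row
    return latest


# Made-up sample rows; the values are not real data
rows = [
    {"iso_code": "FIN", "date": "2021-05-15", "value": 40.1},
    {"iso_code": "FIN", "date": "2021-05-16", "value": 38.7},
    {"iso_code": "CZE", "date": "2021-05-16", "value": 95.2},
    {"iso_code": "OWID_WRL", "date": "2021-05-16", "value": 88.0},
]
latest = latest_per_country(rows)
print(sorted(latest), latest["FIN"]["value"])
```

The pandas version does the same thing with `sort_values`, `groupby(...).last()` and a boolean mask, and additionally drops non-positive values and the top 3% as outliers.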
plotting
import plotly.express as px
#define the colour codes for the number of cases across the different dates
colourscale = px.colors.sequential.Plasma
#world map to show the intensity of cases in each country
fig = px.choropleth(dflatest,
geojson=geojson,
locations= 'iso_code',
featureidkey="properties.ISO_A3",
color = colorcol,
color_continuous_scale = colourscale,
scope='world',
hover_name="location",
labels={colorcol:'COVID Cases'}
)
fig.update_layout(coloraxis_showscale=False, margin={"l":0,"r":0,"t":0,"b":0})
fig
output
Background:
This question is related, but not identical, to Plotly: How to retrieve values for major ticks and gridlines?. A similar question has also been asked but not answered for matplotlib here: How do I show major ticks as the first day of each months and minor ticks as each day?
Plotly is fantastic, and maybe the only thing that bothers me is the autoselection of ticks / gridlines and the labels chosen for the x-axis like in this plot:
Plot 1:
I think the natural thing to display here is the first of each month (depending on the period, of course), or maybe even just an abbreviated month name like 'Jan' on each tick. I realize there are both technical and visual challenges, since months are not all the same length. But does anyone know how to do this?
Reproducible snippet:
import plotly
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas as pd
import numpy as np
from IPython.display import HTML
from IPython.core.display import display, HTML
import copy
# setup
init_notebook_mode(connected=True)
np.random.seed(123)
cf.set_config_file(theme='pearl')
# Random data using cufflinks
df = cf.datagen.lines()
#df = df['UUN.XY']
fig = df.iplot(asFigure=True, kind='scatter',
xTitle='Dates',yTitle='Returns',title='Returns')
iplot(fig)
(updated answer for newer versions of plotly)
With newer versions of plotly, you can specify dtick = 'M1' to set gridlines at the beginning of each month. You can also format the display of the month through tickformat:
Snippet 1
fig.update_xaxes(dtick="M1",
                 tickformat="%b\n%Y"
                 )
Plot 1
And if you'd like to set the gridlines at every second month, just change "M1" to "M2"
Plot 2
Complete code:
# imports
import pandas as pd
import plotly.express as px
# data
df = px.data.stocks()
df = df.tail(40)
colors = px.colors.qualitative.T10
# plotly
fig = px.line(df,x = 'date',
y = [c for c in df.columns if c != 'date'],
template = 'plotly_dark',
color_discrete_sequence = colors,
title = 'Stocks',
)
fig.update_xaxes(dtick="M2",
tickformat="%b\n%Y"
)
fig.show()
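For cases where explicit tick positions are wanted instead of `dtick`, the first day of each month in a date range can be computed with the standard library. This is a minimal sketch; `month_starts` is a hypothetical helper and the date range is arbitrary.

```python
from datetime import date


def month_starts(start, end):
    """Return the first day of every month that falls within [start, end]."""
    year, month = start.year, start.month
    if start.day > 1:
        # the first of the start month is already behind us; begin with the next one
        month += 1
        if month > 12:
            year, month = year + 1, 1
    ticks = []
    while date(year, month, 1) <= end:
        ticks.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return ticks


ticks = month_starts(date(2019, 11, 15), date(2020, 2, 10))
print([t.isoformat() for t in ticks])
```

The resulting dates could be supplied as `tickvals` with `tickmode="array"`, which handles the unequal month lengths without any manual arithmetic.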
Old Solution:
How to set the gridlines will depend entirely on what you'd like to display, and how the figure is built before you try to edit the settings. But to obtain the result specified in the question, you can do it like this.
Step1:
Edit fig['data'][series]['x'] for each series in fig['data'].
Step2:
set tickmode, tickvals and ticktext in:
go.Layout(xaxis = go.layout.XAxis(tickmode = 'array',
                                  tickvals = [some_values],
                                  ticktext = [other_values])
          )
Result:
Complete code for a Jupyter Notebook:
# imports
import plotly
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas as pd
import numpy as np
from IPython.display import HTML
from IPython.core.display import display, HTML
import copy
import plotly.graph_objs as go
# setup
init_notebook_mode(connected=True)
np.random.seed(123)
cf.set_config_file(theme='pearl')
#%qtconsole --style vim
# Random data using cufflinks
df = cf.datagen.lines()
# create figure setup
fig = df.iplot(asFigure=True, kind='scatter',
xTitle='Dates',yTitle='Returns',title='Returns')
# create df1 to mess around with while
# keeping the source intact in df
df1 = df.copy(deep = True)
df1['idx'] = range(0, len(df))
# time variable operations and formatting
df1['yr'] = df1.index.year
df1['mth'] = df1.index.month_name()
# function to replace the month name with
# a lower-case abbreviated month name
# (the year is appended later, on the first tick of each year)
def mthFormat(month):
    dDict = {'January':'jan','February':'feb', 'March':'mar',
             'April':'apr', 'May':'may','June':'jun', 'July':'jul',
             'August':'aug','September':'sep', 'October':'oct',
             'November':'nov', 'December':'dec'}
    mth = dDict[month]
    return(mth)
# replace month name with abbreviated month name
df1['mth'] = [mthFormat(m) for m in df1['mth']]
# remove adjacent duplicates for year and month
# (use .loc instead of chained assignment; cast yr so it can hold '')
df1['yr'] = df1['yr'].astype(object)
df1.loc[df1['yr'].shift() == df1['yr'], 'yr'] = ''
df1.loc[df1['mth'].shift() == df1['mth'], 'mth'] = ''
# select and format values to be displayed
df1['display'] = df1['idx'][df1['mth']!='']
display = df1['display'].dropna()
displayVal = display.values.astype('int')
df_display = df1.iloc[displayVal].copy()
df_display['display'] = df_display['display'].astype('int')
df_display['yrmth'] = df_display['mth'] + '<br>' + df_display['yr'].astype(str)
# set properties for each trace
for ser in range(0,len(fig['data'])):
    fig['data'][ser]['x'] = df1['idx'].values.tolist()
    fig['data'][ser]['text'] = df1['mth'].values.tolist()
    fig['data'][ser]['hoverinfo']='all'
# layout for entire figure
f2Data = fig['data']
f2Layout = go.Layout(
xaxis = go.layout.XAxis(
tickmode = 'array',
tickvals = df_display['display'].values.tolist(),
ticktext = df_display['yrmth'].values.tolist(),
zeroline = False)#,
)
# plot figure with specified major ticks and gridlines
fig2 = go.Figure(data=f2Data, layout=f2Layout)
iplot(fig2)
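The core of the old solution is turning a run of dates into positional `tickvals` and matching `ticktext`. Here is a compact sketch of just that step using the standard library; `month_change_ticks` is a hypothetical helper and the daily date range is an assumption chosen for illustration.

```python
from datetime import date, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_change_ticks(dates):
    """Return (tickvals, ticktext) marking the first row of each new month.

    The year is only appended to the first tick of each year.
    """
    tickvals, ticktext = [], []
    previous = None
    for position, day in enumerate(dates):
        new_month = previous is None or (day.year, day.month) != (previous.year, previous.month)
        if new_month:
            new_year = previous is None or day.year != previous.year
            year = str(day.year) if new_year else ""
            tickvals.append(position)
            ticktext.append(f"{MONTHS[day.month - 1]}<br>{year}")
        previous = day
    return tickvals, ticktext


# 120 consecutive days starting on 1 January 2015
days = [date(2015, 1, 1) + timedelta(days=n) for n in range(120)]
tickvals, ticktext = month_change_ticks(days)
print(tickvals, ticktext)
```

The two lists have the same shape as the `tickvals` and `ticktext` passed to `go.layout.XAxis` above, so they could replace the data frame manipulation when the index is a simple list of dates.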
Some important details:
1. Flexibility and limitations with iplot():
This approach with iplot() and editing all those settings is a bit clunky, but it's very flexible with regards to the number of columns / variables in the dataset, and arguably preferable to building each trace manually like trace1 = go.Scatter() for each and every column in the df.
2. Why do you have to edit each series / trace?
If you try to skip the middle part with
for ser in range(0,len(fig['data'])):
    fig['data'][ser]['x'] = df1['idx'].values.tolist()
    fig['data'][ser]['text'] = df1['mth'].values.tolist()
    fig['data'][ser]['hoverinfo']='all'
and try to set tickvals and ticktext directly on the entire plot, it will have no effect:
I think that's a bit weird, but I think it's caused by some underlying settings initiated by iplot().
3. One thing is still missing:
In order for this setup to work, the structure of tickvals and ticktext is [0, 31, 59, 90] and ['jan<br>2015', 'feb<br>', 'mar<br>', 'apr<br>'], respectively. This causes the x-axis hover text to show the position of the data wherever tickvals and ticktext are empty:
Any suggestions on how to improve the whole thing are highly appreciated. Better solutions than my own will instantly receive Accepted Answer status!