I'm running this in Jupyter Notebook. I'll attach my full code. I'm using a csv file from Kaggle to plot the cumulative coronavirus cases throughout different countries in the world.
Here's the link to the Kaggle dataset download: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
I'm using the "covid_19_data.csv" file.
import chart_studio.plotly as py
import plotly.graph_objs as go
import pandas as pd
import cufflinks as cf  # needed for cf.go_offline() below
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
init_notebook_mode(connected = True)
cf.go_offline()
df = pd.read_csv('covid_19_data.csv')
data = dict(type = 'choropleth',
            locations = df['Country/Region'],
            z = df['Confirmed'],
            text = df['Province/State'],
            colorbar = {'title':'Cases of COVID-19'} )
layout = dict(title = '2020 Global Coronavirus Cases', geo = dict(showframe = False, projection = {'type':'natural earth'}))
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap)
The output is a gray map of the world. There is a legend with color, and a title as well. I'm confused why the data is not being plotted!
I am new to Python and wanted to try using a choropleth map. I have the following code for the graph.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.read_csv(r'C:\Users\lukee\Desktop\COVID Visualisation\time_series_covid_19_confirmed.csv')
#Data for number of cases for each country across the different dates
geojson = df['Country/Region']
#define the colour codes for the number of cases across the different dates
colourscale = px.colors.sequential.Plasma
#world map to show the intensity of cases in each country
fig = px.choropleth(df,
                    geojson=geojson,
                    locationmode= 'country names',
                    color = df['5/16/21'],
                    color_continuous_scale = colourscale,
                    scope='world',
                    hover_name=df["Country/Region"],
                    labels={'COVID Cases'})
fig.update(layout_coloraxis_showscale=False)
fig.show()
This solution sources its data from Our World in Data (OWID), not Kaggle.
In the plotting code, there were some inconsistencies in how you referenced columns in the data frame. I also added the featureidkey parameter so the dataframe and the GeoJSON join correctly.
data sourcing
import requests
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
import json, io
# source geojson for country boundaries
geosrc = pd.json_normalize(requests.get("https://pkgstore.datahub.io/core/geo-countries/7/datapackage.json").json()["resources"])
fn = Path(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0]).name
# download the zip file only if it is not already in the working directory
if not Path.cwd().joinpath(fn).exists():
    r = requests.get(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0],stream=True,)
    with open(fn, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
zfile = ZipFile(fn)
with zfile.open(zfile.infolist()[0]) as f:
    geojson = json.load(f)
# source COVID data
dfall = pd.read_csv(io.StringIO(requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv").text))
dfall["date"] = pd.to_datetime(dfall["date"])
# keep the most recent row for each iso_code
dflatest = (dfall.sort_values(["iso_code", "date"]).groupby("iso_code", as_index=False).last())
colorcol = "new_cases_smoothed_per_million"
# filter out rows that are not for a country, have no data for the latest date, or are big outliers
dflatest = dflatest.loc[
    dflatest[colorcol].gt(0).fillna(False)
    & dflatest["iso_code"].str.len().eq(3)
    & dflatest[colorcol].lt(dflatest[colorcol].quantile(0.97))
]
plotting
import plotly.express as px
#define the colour codes for the number of cases across the different dates
colourscale = px.colors.sequential.Plasma
#world map to show the intensity of cases in each country
fig = px.choropleth(dflatest,
                    geojson=geojson,
                    locations= 'iso_code',
                    featureidkey="properties.ISO_A3",
                    color = colorcol,
                    color_continuous_scale = colourscale,
                    scope='world',
                    hover_name="location",
                    labels={colorcol:'COVID Cases'}
                    )
fig.update_layout(coloraxis_showscale=False, margin={"l":0,"r":0,"t":0,"b":0})
fig
output
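If some countries still come out blank, it can help to check the join by hand: featureidkey names the GeoJSON property that is matched against the locations column, and any value with no matching feature is simply not drawn. A rough sketch of that check, using tiny hand-made stand-ins for the boundary file and the COVID table; unmatched_locations is a hypothetical helper, not part of plotly.

```python
import pandas as pd

def unmatched_locations(df, geojson, locations="iso_code", prop="ISO_A3"):
    """Return location values in df that have no matching GeoJSON feature.

    Hypothetical helper: `prop` is the property name that would be passed
    to plotly as featureidkey="properties.<prop>".
    """
    feature_ids = {f["properties"].get(prop) for f in geojson["features"]}
    return sorted(set(df[locations]) - feature_ids)

# Tiny hand-made stand-ins for the real boundary file and COVID table.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ISO_A3": "FRA"}, "geometry": None},
        {"type": "Feature", "properties": {"ISO_A3": "DEU"}, "geometry": None},
    ],
}
df = pd.DataFrame({"iso_code": ["FRA", "DEU", "OWID_WRL"], "cases": [1, 2, 3]})

# Any code listed here has no country outline to colour.
print(unmatched_locations(df, geojson))
```

With the real data, the same call on dflatest and the downloaded geojson would list the codes that have no boundary to fill.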
I'm using Bokeh and Geopandas to plot an interactive map of Germany. Germany has 16 states in total, but the plot shows only 15. It does not display Berlin, which is the capital city (Berlin is also a state). I'm using a shapefile as the input for the map. I have tried different shapefiles and looked for different solutions, but I'm unable to find the root of the problem. Please have a look at the code and the output.
`
import pandas as pd
# Import geopandas package
import geopandas as gpd
# Read in shapefile and examine data
germany = gpd.read_file('Igismap/Germany_Polygon.shp')
pop_states = germany
vargeojson = pop_states.to_json()
import json
from bokeh.io import show, output_notebook
from bokeh.models import (ColumnDataSource,
                          GeoJSONDataSource, HoverTool,
                          LinearColorMapper)
from bokeh.layouts import column, row, widgetbox
from bokeh.plotting import figure
output_notebook()
# Input GeoJSON source that contains features for plotting
geosource = GeoJSONDataSource(geojson = vargeojson)
tools = "pan, wheel_zoom, box_zoom, reset"
p = figure(title = 'All states of Germany',
           plot_height = 600 ,
           plot_width = 600,
           toolbar_location = 'right',
           tools = tools)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
# Add patch renderer to figure.
states = p.patches('xs','ys', source = geosource,
                   line_color = "grey",
                   line_width = 0.25,
                   fill_alpha = 1)
# Create hover tool (Bokeh tooltips reference data columns with '@')
p.add_tools(HoverTool(renderers = [states],
                      tooltips = [('Lander','@name')]))
show(p)
`
Click here to see the output of above code.... and
Click here to see the desired output
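One way to narrow this down (a diagnostic sketch, not a confirmed fix) is to check whether Berlin is in the GeoJSON at all, and whether the surrounding state is stored as a polygon with a hole where Berlin sits. If Berlin is present but invisible, it may be getting covered by the surrounding polygon. The "name" property key is an assumption taken from the hover tooltip, and summarize_features is a hypothetical helper.

```python
import json

def summarize_features(geojson_str, name_key="name"):
    """List (name, geometry type, polygon count, hole count) per feature."""
    rows = []
    for feat in json.loads(geojson_str)["features"]:
        geom = feat["geometry"]
        # Normalise Polygon and MultiPolygon to a list of polygons,
        # each polygon being [exterior_ring, hole_1, hole_2, ...].
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            polygons = []
        holes = sum(len(rings) - 1 for rings in polygons)
        rows.append((feat["properties"].get(name_key), geom["type"], len(polygons), holes))
    return rows

# Toy data: an outer state with a hole, and a city-state filling the hole.
square = [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
inner = [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]]
toy = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Outer"},
         "geometry": {"type": "Polygon", "coordinates": [square, inner]}},
        {"type": "Feature", "properties": {"name": "City"},
         "geometry": {"type": "Polygon", "coordinates": [inner]}},
    ],
})

for row in summarize_features(toy):
    print(row)
```

With the code from the question, summarize_features(vargeojson) would show whether a Berlin row exists and how many holes each state has.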
I created a small Excel file listing the confirmed cases, deaths, and recovered cases of the coronavirus here in the U.S., but I can't seem to get the choropleth map working.
Here's my code:
import pandas as pd
import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
col_Names=["State", "Country", "Time Discovered", "Confirmed", "Deaths","Recovered"]
df = pd.read_csv("coronavirusUS.csv", names=col_Names)
data = dict(type='choropleth',
            colorscale= 'magma',
            locations = df['State'],
            locationmode= 'USA-states',
            z = df['Confirmed'],
            text = df['Confirmed'],
            marker = dict(line=dict(color='rgb(255, 255, 255)', width=2)),
            colorbar = {'title':'Coronavirus in the U.S'})
layout = dict(title = 'Coronavirus in the US',
              geo= dict(scope = 'usa',
                        showlakes = True,
                        lakecolor = 'rgb(85, 173, 240)'))
choromap = go.Figure(data = [data], layout = layout)
iplot(choromap)
And then my map comes out looking like this:
empty map
As you can see, the colorbar is accurate, but the map itself is blank. Except for the lakes.
Here's the .csv file I'm referring to.
data table
I'm using jupyter notebook.
I've tried switching from a .xls to .csv, but that didn't work.
Thanks in advance.
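Two things may be worth checking here, both assumptions since the CSV is only shown as an image. First, locationmode 'USA-states' expects two-letter state abbreviations such as 'WA', not full state names. Second, passing names= to read_csv without header=0 treats an existing header line as a data row. A small sketch with a stand-in file and a deliberately partial name-to-code mapping:

```python
import io
import pandas as pd

# Stand-in for coronavirusUS.csv; the real file's contents are assumed.
csv_text = """State,Country,Time Discovered,Confirmed,Deaths,Recovered
Washington,US,2020-01-21,1,0,1
California,US,2020-01-26,2,0,0
"""

col_names = ["State", "Country", "Time Discovered", "Confirmed", "Deaths", "Recovered"]
# header=0 tells pandas the first line is a header to be replaced by col_names,
# rather than a data row.
df = pd.read_csv(io.StringIO(csv_text), names=col_names, header=0)

# Partial, illustrative mapping; a real one would cover every state.
STATE_ABBREV = {"Washington": "WA", "California": "CA", "Illinois": "IL"}
df["StateCode"] = df["State"].map(STATE_ABBREV)

# Rows that failed to map would be left blank on the map.
print(df[["State", "StateCode", "Confirmed"]])
print(df["StateCode"].isna().sum(), "unmapped rows")
```

The StateCode column would then be passed as locations in place of df['State'].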
I have some sample code to plot a map of Ontario using Bokeh. The code reads in the shapefile and converts it to GeoJSON, as suggested by examples available on the internet.
The shapefile source data is the Ontario census subdivision geographic boundary from the StatsCan website downloaded as a shapefile.
Image screenshot: https://imgur.com/xn1Zzdh
The result so far is an empty chart and I can't figure out what's wrong.
The shapefile is loaded first as a geopandas dataframe and converted to geojson.
Apologies for my lack of stackoverflow etiquette. I'm a new user.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import geopandas
import os
from bokeh.plotting import figure, output_file, show, save,output_notebook
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.palettes import brewer
pd.options.display.max_rows = 10
workspace = r'C:\Users\user\Documents\lcsd000b16a_e'
CSD_LAYER = geopandas.read_file(os.path.join(workspace,r"lcsd000b16a_e.shp"))
ONT_CSD = CSD_LAYER[CSD_LAYER['PRUID']=='35']
ONT_CSD['geometry'].head()
1372 POLYGON ((7202895.13143 1077367.822855, 720382...
1373 POLYGON ((7205717.394285 1098087.974285, 72058...
1374 POLYGON ((7169056.905715 1216085.682855, 71693...
1614 POLYGON ((7162217.717145 948748.982855, 716229...
1809 POLYGON ((7506330.95143 1116872.145715, 750632...
# # Get the CRS of our grid
CRS = ONT_CSD.crs
print('FROM:' + str(CRS))
ONT_CSD = ONT_CSD.to_crs(epsg=3857) #transform to webmercator
print('TO: '+ str(ONT_CSD.crs))
FROM:{'init': 'epsg:3347'}
TO: {'init': 'epsg:3857', 'no_defs': True}
import json
#read data to json file
ONT_CSD_json = json.loads(ONT_CSD.to_json())
#convert to string like object
ONT_CSD_JSON_DATA = json.dumps(ONT_CSD_json)
#Input GeoJSON source that contains features for plotting.
geosource = GeoJSONDataSource(geojson = ONT_CSD_JSON_DATA)
#Create figure object.
p = figure(title = 'test', plot_height = 600 , plot_width = 950)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
#Add patch renderer to figure.
p.patch('xs','ys', source = geosource,
        line_color = 'black', line_width = 1, fill_alpha = 0.75)
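Two things might explain the empty chart. This is a guess from the posted code, not verified against the StatsCan shapefile. A GeoJSONDataSource exposes polygons as 'xs' and 'ys' columns holding one coordinate list per feature, which matches the plural patches glyph rather than patch, and the snippet never calls show(p). A minimal sketch on a toy polygon; build_map is a hypothetical helper.

```python
import json
from bokeh.models import GeoJSONDataSource
from bokeh.plotting import figure

def build_map(geojson_str, title="test"):
    """Return (figure, renderer) with one patch per GeoJSON feature."""
    geosource = GeoJSONDataSource(geojson=geojson_str)
    p = figure(title=title)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    # patches (plural) draws one polygon per row of the xs/ys columns
    renderer = p.patches("xs", "ys", source=geosource,
                         line_color="black", line_width=1, fill_alpha=0.75)
    return p, renderer

# Toy square standing in for the reprojected census subdivisions.
toy = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    ],
})

p, renderer = build_map(toy)
# In a notebook: call output_notebook() once, then show(p) to display the figure.
```

With the question's data, build_map(ONT_CSD_JSON_DATA) followed by show(p) would be the equivalent call.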
Background:
This question is related, but not identical, to Plotly: How to retrieve values for major ticks and gridlines?. A similar question has also been asked but not answered for matplotlib here: How do I show major ticks as the first day of each months and minor ticks as each day?
Plotly is fantastic, and maybe the only thing that bothers me is the autoselection of ticks / gridlines and the labels chosen for the x-axis like in this plot:
Plot 1:
I think the natural thing to display here is the first of each month (depending on the period, of course). Or maybe even just an abbreviated month name like 'Jan' on each tick. I realize there are both technical and visual challenges because not all months are of equal length. But does anyone know how to do this?
Reproducible snippet:
import plotly
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas as pd
import numpy as np
from IPython.display import HTML
from IPython.core.display import display, HTML
import copy
# setup
init_notebook_mode(connected=True)
np.random.seed(123)
cf.set_config_file(theme='pearl')
# Random data using cufflinks
df = cf.datagen.lines()
#df = df['UUN.XY']
fig = df.iplot(asFigure=True, kind='scatter',
               xTitle='Dates',yTitle='Returns',title='Returns')
iplot(fig)
(updated answer for newer versions of plotly)
With newer versions of plotly, you can specify dtick = 'M1' to set gridlines at the beginning of each month. You can also format the display of the month through tickformat:
Snippet 1
fig.update_xaxes(dtick="M1",
                 tickformat="%b\n%Y"
                 )
Plot 1
And if you'd like to set the gridlines at every second month, just change "M1" to "M2"
Plot 2
Complete code:
# imports
import pandas as pd
import plotly.express as px
# data
df = px.data.stocks()
df = df.tail(40)
colors = px.colors.qualitative.T10
# plotly
fig = px.line(df,x = 'date',
              y = [c for c in df.columns if c != 'date'],
              template = 'plotly_dark',
              color_discrete_sequence = colors,
              title = 'Stocks',
              )
fig.update_xaxes(dtick="M2",
                 tickformat="%b\n%Y"
                 )
fig.show()
Old Solution:
How to set the gridlines will depend entirely on what you'd like to display, and how the figure is built before you try to edit the settings. But to obtain the result specified in the question, you can do it like this.
Step1:
Edit fig['data'][series]['x'] for each series in fig['data'].
Step2:
Set tickmode, tickvals and ticktext in:
go.Layout(xaxis = go.layout.XAxis(tickmode = 'array',
                                  tickvals = [some_values],
                                  ticktext = [other_values])
          )
Result:
Complete code for a Jupyter Notebook:
# imports
import plotly
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import pandas as pd
import numpy as np
from IPython.display import HTML
from IPython.core.display import display, HTML
import copy
import plotly.graph_objs as go
# setup
init_notebook_mode(connected=True)
np.random.seed(123)
cf.set_config_file(theme='pearl')
#%qtconsole --style vim
# Random data using cufflinks
df = cf.datagen.lines()
# create figure setup
fig = df.iplot(asFigure=True, kind='scatter',
               xTitle='Dates',yTitle='Returns',title='Returns')
# create df1 to mess around with while
# keeping the source intact in df
df1 = df.copy(deep = True)
df1['idx'] = range(0, len(df))
# time variable operations and formatting
df1['yr'] = df1.index.year
df1['mth'] = df1.index.month_name()
# function to replace a full month name
# with its lower-case abbreviation
def mthFormat(month):
    dDict = {'January':'jan','February':'feb', 'March':'mar',
             'April':'apr', 'May':'may','June':'jun', 'July':'jul',
             'August':'aug','September':'sep', 'October':'oct',
             'November':'nov', 'December':'dec'}
    mth = dDict[month]
    return(mth)
# replace month name with abbreviated month name
df1['mth'] = [mthFormat(m) for m in df1['mth']]
# remove adjacent duplicates for year and month
# (Series.where keeps a value where the condition holds, else inserts '')
df1['yr'] = df1['yr'].where(df1['yr'].shift() != df1['yr'], '')
df1['mth'] = df1['mth'].where(df1['mth'].shift() != df1['mth'], '')
# select and format values to be displayed
df1['display'] = df1['idx'][df1['mth']!='']
display = df1['display'].dropna()
displayVal = display.values.astype('int')
# copy so the assignments below do not act on a view of df1
df_display = df1.iloc[displayVal].copy()
df_display['display'] = df_display['display'].astype('int')
df_display['yrmth'] = df_display['mth'] + '<br>' + df_display['yr'].astype(str)
# set properties for each trace
for ser in range(0,len(fig['data'])):
    fig['data'][ser]['x'] = df1['idx'].values.tolist()
    fig['data'][ser]['text'] = df1['mth'].values.tolist()
    fig['data'][ser]['hoverinfo']='all'
# layout for entire figure
f2Data = fig['data']
f2Layout = go.Layout(
    xaxis = go.layout.XAxis(
        tickmode = 'array',
        tickvals = df_display['display'].values.tolist(),
        ticktext = df_display['yrmth'].values.tolist(),
        zeroline = False)
)
# plot figure with specified major ticks and gridlines
fig2 = go.Figure(data=f2Data, layout=f2Layout)
iplot(fig2)
Some important details:
1. Flexibility and limitations with iplot():
This approach with iplot() and editing all those settings is a bit clunky, but it's very flexible with regards to the number of columns / variables in the dataset, and arguably preferable to building each trace manually like trace1 = go.Scatter() for each and every column in the df.
2. Why do you have to edit each series / trace?
If you try to skip the middle part with
for ser in range(0,len(fig['data'])):
    fig['data'][ser]['x'] = df1['idx'].values.tolist()
    fig['data'][ser]['text'] = df1['mth'].values.tolist()
    fig['data'][ser]['hoverinfo']='all'
and try to set tickvals and ticktext directly on the entire plot, it will have no effect:
I think that's a bit weird, but I think it's caused by some underlying settings initiated by iplot().
3. One thing is still missing:
In order for this setup to work, the structure of tickvals and ticktext is [0, 31, 59, 90] and ['jan<br>2015', 'feb<br>', 'mar<br>', 'apr<br>'], respectively. This causes the x-axis hovertext to show the position of the data where tickvals and ticktext are empty:
Any suggestions on how to improve the whole thing are highly appreciated. Better solutions than my own will instantly receive Accepted Answer status!
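For what it's worth, the tick positions and labels for an integer x-axis can probably be computed more compactly with plain pandas. This is only a sketch under the assumption that the data has a DatetimeIndex; month_ticks is a hypothetical helper, and it reproduces the 'jan<br>2015', 'feb<br>' label style described above.

```python
import calendar
import pandas as pd

def month_ticks(index):
    """Return (tickvals, ticktext) marking the first row of each month.

    tickvals are integer row positions; ticktext is the lower-case month
    abbreviation, with the year appended on the first tick of each year.
    """
    # Encode year and month as one integer, e.g. 201501, per row position.
    ym = pd.Series(index.year * 100 + index.month, index=range(len(index)))
    first = ym[ym != ym.shift()]  # rows where the month changes
    tickvals = first.index.tolist()
    ticktext, prev_year = [], None
    for value in first:
        year, month = divmod(int(value), 100)
        label = calendar.month_abbr[month].lower() + "<br>"
        if year != prev_year:
            label += str(year)
            prev_year = year
        ticktext.append(label)
    return tickvals, ticktext

idx = pd.date_range("2015-01-01", periods=90, freq="D")
tickvals, ticktext = month_ticks(idx)
print(tickvals)
print(ticktext)
```

The two lists could then go into the layout as xaxis = dict(tickmode = 'array', tickvals = tickvals, ticktext = ticktext).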