How do I pull data from both of these files? - python

I am trying to create a map that tracks COVID-19 confirmed cases by county using FIPS codes. How can I make this code gather the data from both of those data files?
If you run the code as is (with the NY Times data), the map does not fill in counties with zero cases, because the NY Times data does not list places with zero cases. The other data set does list them. So, whatever doesn't get filled in from the NY Times data I would like to fill in from the other data set. How do I do this, or how else can I fix my problem? Also, when hovering over the map, how do I make it show the county name instead of the FIPS number?
Furthermore, how do I make this a live map that auto-updates when there is new data?
from urllib.request import urlopen
import json
# county boundaries, keyed by FIPS code
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)
import pandas as pd
# NY Times data: does not list counties with zero cases
df = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv",
                 dtype={"fips": str})
# the other data set, which does list counties with zero cases (kept under its own name so it does not overwrite df)
df_other = pd.read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-28-2020.csv",
                       dtype={"fips": str})
import plotly.express as px
fig = px.choropleth(df, geojson=counties, locations='fips', color='cases',
                    color_continuous_scale="dense",
                    range_color=(0, 100),
                    scope="usa",
                    labels={'cases':'Confirmed COVID-19 Cases'},
                    )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
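One possible way to fill the gaps, sketched on tiny in-memory stand-ins rather than the real downloads: rename the second data set's columns to match the first, stack the two frames, and keep the first row seen for each FIPS code so the NY Times value wins wherever both sources have one. The column names FIPS, Admin2, and Confirmed are assumptions about the second file, and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two downloads
nyt = pd.DataFrame({
    "fips": ["01001", "01003"],
    "county": ["Autauga", "Baldwin"],
    "cases": [6, 10],
})
other = pd.DataFrame({
    "FIPS": ["01001", "01003", "01005"],      # assumed column name
    "Admin2": ["Autauga", "Baldwin", "Barbour"],  # assumed column name
    "Confirmed": [5, 9, 0],                   # assumed column name
})

# Align the second data set's column names with the first
other = other.rename(columns={"FIPS": "fips", "Admin2": "county", "Confirmed": "cases"})

# Stack both frames; for each FIPS code keep the first row, i.e. the NY Times one when present
combined = (
    pd.concat([nyt, other], ignore_index=True)
    .drop_duplicates(subset="fips", keep="first")
    .reset_index(drop=True)
)
print(combined)
```

Passing combined to px.choropleth with hover_name="county" should then show the county name as the hover title. With the real files, the NY Times frame would probably need filtering to a single date first, and the second file's FIPS values may need zero-padding to five characters before they match.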

Related

How to use data in a csv file with geopandas?

I have a CSV file which has numerical variables, including longitude and latitude, and also a few categorical variables. I want to use this CSV with geopandas to plot a map, but I am confused about shapefiles and how to use them. Can anyone tell me how to start?
As per the comments, this is fully covered in the documentation. Another example:
import pandas as pd
import io, requests
import geopandas as gpd
# read some CSV data (this file uses "¬" as its field separator)
df = pd.read_csv(
    io.StringIO(requests.get("https://assets.nhs.uk/data/foi/Hospital.csv").text),
    sep="¬",
    engine="python",
)
# longitude / latitude columns become the GeoDataFrame geometry; all other columns become attributes
gdf = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(df.Longitude, df.Latitude, crs="EPSG:4326"), data=df
)
# exclude empty geometries and show it works
gdf.loc[~gdf.geometry.is_empty, :].explore(
    "Sector", cmap=["blue", "green"], height=300, width=300
)

plotly express for large data sets

import plotly.express as px
import pandas as pd
dfa = pd.DataFrame()
dfa["travel_time(min)"] = range(100000)
fig = px.ecdf(dfa["travel_time(min)"], x="travel_time(min)")
#fig.write_html("debug.html")
fig.show()
With 100k points the resulting figure is laggy (with 10k points it works fine).
How can I fix this? Is it possible to precalculate the curve somehow?
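One way to precalculate the curve, offered as a sketch rather than a Plotly feature: compute the empirical CDF with NumPy and keep only an evenly spaced subset of points before plotting. The helper name downsampled_ecdf and the default of 1000 points are assumptions chosen for illustration.

```python
import numpy as np

def downsampled_ecdf(values, max_points=1000):
    """Return (x, y) for an empirical CDF, thinned to at most max_points points."""
    x = np.sort(np.asarray(values))
    # fraction of observations less than or equal to each sorted value
    y = np.arange(1, x.size + 1) / x.size
    if x.size > max_points:
        # evenly spaced positions that always include the first and last point
        idx = np.unique(np.linspace(0, x.size - 1, max_points).round().astype(int))
        x, y = x[idx], y[idx]
    return x, y

x, y = downsampled_ecdf(range(100000))
print(len(x), x[0], x[-1], y[-1])
```

The thinned arrays can then be drawn with something like px.line(x=x, y=y), which sends about a thousand points to the browser instead of 100k. The curve is an approximation between the kept points, so max_points trades smoothness for responsiveness.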

My Choropleth map only shows the map, and not the colours

I am new to python and wanted to try using a choropleth map. I have the following code for the graph.
import numpy as np
import pandas as pd
import plotly.express as px
df = pd.read_csv(r'C:\Users\lukee\Desktop\COVID Visualisation\time_series_covid_19_confirmed.csv')
#Data for number of cases for each country across the different dates
geojson = df['Country/Region']
#define the colour codes for the number of cases across the different dates
colourscale = px.colors.sequential.Plasma
#world map to show the intensity of cases in each country
fig = px.choropleth(df,
geojson=geojson,
locationmode= 'country names',
color = df['5/16/21'],
color_continuous_scale = colourscale,
scope='world',
hover_name=df["Country/Region"],
labels={'COVID Cases'})
fig.update(layout_coloraxis_showscale=False)
fig.show()
This solution sources its data from Our World in Data rather than Kaggle.
In the plotting code there were some inconsistencies in how you referenced columns of the data frame. I also added the featureidkey parameter so that the data frame and the GeoJSON join correctly.
Data sourcing
import requests
import pandas as pd
from pathlib import Path
from zipfile import ZipFile
import json, io
# source geojson for country boundaries
geosrc = pd.json_normalize(requests.get("https://pkgstore.datahub.io/core/geo-countries/7/datapackage.json").json()["resources"])
fn = Path(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0]).name
# download the zip file only if it is not already in the working directory
if not Path.cwd().joinpath(fn).exists():
    r = requests.get(geosrc.loc[geosrc["name"].eq("geo-countries_zip"), "path"].values[0],stream=True,)
    with open(fn, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
zfile = ZipFile(fn)
with zfile.open(zfile.infolist()[0]) as f:
    geojson = json.load(f)
# source COVID data
dfall = pd.read_csv(io.StringIO(requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv").text))
dfall["date"] = pd.to_datetime(dfall["date"])
# keep the latest row for each country
dflatest = (dfall.sort_values(["iso_code", "date"]).groupby("iso_code", as_index=False).last())
colorcol = "new_cases_smoothed_per_million"
# filter out rows that are not a country, have no data for the latest date, or are big outliers
dflatest = dflatest.loc[
    dflatest[colorcol].gt(0).fillna(False)
    & dflatest["iso_code"].str.len().eq(3)
    & dflatest[colorcol].lt(dflatest[colorcol].quantile(0.97))
]
Plotting
import plotly.express as px
# define the colour scale for the number of cases
colourscale = px.colors.sequential.Plasma
# world map to show the intensity of cases in each country
fig = px.choropleth(dflatest,
                    geojson=geojson,
                    locations='iso_code',
                    featureidkey="properties.ISO_A3",
                    color=colorcol,
                    color_continuous_scale=colourscale,
                    scope='world',
                    hover_name="location",
                    labels={colorcol: 'COVID Cases'}
                    )
# the original margin dict listed "r" twice; the second key was meant to be "b"
fig.update_layout(coloraxis_showscale=False, margin={"l":0,"r":0,"t":0,"b":0})
fig.show()

Fastest way to parse multiple header names to Plotly (Python)

I've been experimenting with Plotly and trying to plot multiple traces. I wrote the following code, which plots two traces on the same graph:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data = pd.read_csv("data.csv")
headers = pd.read_csv("data.csv", index_col=0, nrows=0).columns.tolist()
fig = go.Figure()
fig = px.line(data, x="DateTime", y=[headers[0], headers[1]])
fig.show()
In this example the first and second headers are plotted as traces on the graph. I was wondering if there was a way other than y=[headers[n],headers[n+1]]... to get all the lines drawn on? I tried just using the headers array without an index, but it gives a ValueError
Plotly Express cannot process wide-form data with columns of different type.
So, is there a plotly-specific way to make this more efficient & readable than just writing every index in the plot header definition, or can it be done with standard python?
EDIT: the actual data sample is a csv providing int values with a header and date :
DateTime X Y Z
01-JAN-2018,5,6,7...
02-JAN-2018,7,8,9
If your sample data is what is in your CSV, it's a simple case of defining y as the numeric columns:
import io
import pandas as pd
import plotly.express as px
headers = pd.read_csv(io.StringIO("""DateTime,X,Y,Z
01-JAN-2018,5,6,7
02-JAN-2018,7,8,9
"""))
px.line(headers, x="DateTime", y=headers.select_dtypes("number").columns)
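An alternative sketch, in case the wide-form route keeps raising the error: reshape the frame to long form with pandas' melt, so that every series shares one value column and one name column. The names long_df, series, and value are chosen for illustration only.

```python
import io

import pandas as pd

# Sample data copied from the question
df = pd.read_csv(io.StringIO("""DateTime,X,Y,Z
01-JAN-2018,5,6,7
02-JAN-2018,7,8,9
"""))

# One row per (DateTime, series) pair instead of one column per series
long_df = df.melt(id_vars="DateTime", var_name="series", value_name="value")
print(long_df)
```

The long frame can then be plotted with px.line(long_df, x="DateTime", y="value", color="series"), which draws one line per original column without listing the headers by index.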

Plotly px.choropleth not drawing data from json file

I have a CSV file with the following structure
cardodb_id,CONCELHO,LAT,LONG,DATA,INC
225,Abrantes,39.466667,-8.2,2020-03-25,1000
And a Json file with the following structure:
{"type":"FeatureCollection", "features": [ {"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[-8.163874,39.626553],[-8.164286,39.626686],[-8.165384,39.626633],*(more coordinates' pairs)*,[-8.163874,39.626553]]]},"properties":{"cartodb_id":225,"id_2":225,"id_1":16,"id_0":182,"varname_2":null,"nl_name_2":null,"engtype_2":"Municipality","type_2":"Concelho","name_2":"Abrantes","name_1":"Santarém","name_0":"Portugal","iso":"PRT","split_id":"1"}} ]}
Both the CSV and the json file here are part of a larger set but this will do as an example
My code is as follows
import json
with open('abrantes.json') as json_file:
    abr = json.load(json_file)
import pandas as pd
df = pd.read_csv("abrantes.csv")
import plotly.express as px
fig = px.choropleth(df, geojson=abr, locations='cardodb_id', color='INC',
                    color_continuous_scale="Viridis",
                    range_color=(0, 5000),
                    labels={'INC':'Incidência'}
                    )
fig.show()
The end result is an empty map with the scale from 0 to 5000 on the right side, when I was expecting the polygon to be filled with the color corresponding to "INC", i.e., "1000".
What am I doing wrong? Thank you in advance for all the help you can provide.
To draw a map, px.choropleth() must match the IDs in your dataframe with the IDs in your GeoJSON.
With the locations parameter you specify the dataframe column that holds the IDs.
What you are missing is the featureidkey parameter, which specifies where the same IDs live in the GeoJSON. Alternatively, you can omit featureidkey, but then each feature in your GeoJSON needs a top-level id member.
You also have to pay attention to spelling: your CSV file has a column cardodb_id, while your GeoJSON has a property cartodb_id.
And since the polygon you provided is quite small, it is not visible on a world map. Thus, I recommend adding fig.update_geos(fitbounds="locations") to zoom the map to the area of interest.
import json
import pandas as pd
import plotly.express as px
with open('abrantes.json') as json_file:
    abr = json.load(json_file)
df = pd.read_csv("abrantes.csv")
fig = px.choropleth(df, geojson=abr, locations='cardodb_id', color='INC',
                    color_continuous_scale="Viridis",
                    featureidkey="properties.cartodb_id",
                    range_color=(0, 5000),
                    labels={'INC':'Incidência'}
                    )
fig.update_geos(fitbounds="locations")
fig.show()
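When a choropleth comes out empty, it can help to check the join before plotting. The sketch below uses a hypothetical helper, unmatched_locations, which is not part of Plotly: it walks the dotted featureidkey path through each feature and reports the dataframe IDs that have no counterpart in the GeoJSON. The GeoJSON here is trimmed to its properties, and the ID 999 is an invented row added only to show a mismatch.

```python
import pandas as pd

def unmatched_locations(df, locations, geojson, featureidkey="id"):
    """Return dataframe IDs that have no matching feature in the GeoJSON."""
    def lookup(feature):
        # follow a dotted path such as "properties.cartodb_id"
        value = feature
        for key in featureidkey.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        return value

    feature_ids = {lookup(feature) for feature in geojson["features"]}
    return sorted(set(df[locations].tolist()) - feature_ids)

# Trimmed-down stand-ins for abrantes.json and abrantes.csv
abr = {"type": "FeatureCollection",
       "features": [{"type": "Feature", "properties": {"cartodb_id": 225}}]}
df = pd.DataFrame({"cardodb_id": [225, 999], "INC": [1000, 50]})

print(unmatched_locations(df, "cardodb_id", abr, "properties.cartodb_id"))  # [999]
# With the misspelled key nothing matches, so every ID is reported
print(unmatched_locations(df, "cardodb_id", abr, "properties.cardodb_id"))  # [225, 999]
```

An empty list means every row has a feature to colour. The check compares values exactly, so an ID stored as a string in one source and as a number in the other would also be reported.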
