I'm relatively new here and new to Python in general. I'm trying to use Plotly Express to create a choropleth map that lets me color-code custom data by country, but I'm having trouble getting the map to load. I've been able to load the native choropleth map with none of my data, but when I link my own GeoJSON data set to the function, it gets stuck loading on line 30. I haven't been able to figure out why, since it just hangs and gives me nothing else to troubleshoot with.
Any help is greatly appreciated!
import random
from token_file import token
import pandas as pd
import plotly.express as px
import json
with open('documents/countries.geojson') as f:
    data = json.load(f)
countries = {}
for features in data["features"]:
    if features["properties"]["ISO_A3"] != "-99":
        name = features["properties"]["ADMIN"]
        iso = features["properties"]["ISO_A3"]
        geo = features["geometry"]
        val = random.randint(0, 100)
        values = pd.Series([name, iso, val], index=["Name", "ISO", "Val"])
        countries[name] = values
    else:
        continue
countries = pd.DataFrame(countries)
countries = countries.transpose()
print(countries)
px.set_mapbox_access_token(token)
map = px.choropleth_mapbox(countries, locations="ISO", zoom=1, hover_name="Name", hover_data=["ISO"],
                           color="Val", color_continuous_scale="Viridis", mapbox_style="carto-positron")
map.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
map.show()
Objective: Create a plotly HISTOGRAM with a dropdown menu and put it in a html. The html will have multiple different types of graphs (not mentioned in this question).
I actually have a large dataframe/csv with many columns; but for this question I'm considering a very SIMPLE csv/dataframe.
The csv/dataframe has three columns - HOST,REQUEST,INCIDENT. Below is the sample csv.
HOST,REQUEST,INCIDENT
host1,GET,error
host1,GET,warning
host1,GET,warning
host1,POST,warning
host1,POST,error
host1,POST,warning
host2,GET,warning
host2,GET,error
host2,GET,warning
host2,POST,error
host2,POST,warning
host2,POST,error
host3,GET,error
host3,GET,error
host3,GET,error
host3,POST,error
host3,POST,error
host3,POST,warning
host4,GET,warning
host4,GET,error
host4,GET,error
host4,POST,error
host4,POST,warning
host4,POST,warning
Currently I'm plotting a separate HISTOGRAM graph of 'REQUEST vs. INCIDENT' for each HOST and then creating an html out of them. That means if there are four different hosts, I'm plotting four different HISTOGRAM graphs in my html.
Below is my code.
import pandas as pd
import plotly.express as px
print(f"START")
df = pd.read_csv("dropdown.csv")
hosts = list(df['HOST'].unique())
print(hosts)
for host in hosts:
    title = "Dropdown graph for host = " + host
    df1 = df.loc[(df['HOST'] == host)]
    graph = px.histogram(df1, x='REQUEST', color='INCIDENT', title=title)
    with open("dropdown.html", 'a') as f:
        f.write(graph.to_html(full_html=False, include_plotlyjs=True))
print(f"END")
Below is my output html having four graphs
But my objective is to plot a single HISTOGRAM graph in my output html, with HOST as the dropdown. I should be able to select different HOSTs from the dropdown to get the graph for each respective HOST.
I can't find any option in plotly express to achieve this and need help. If it can be done with plotly.express itself, that would be great!
Other options are also welcome.
You can loop through all possible hosts and create a corresponding figure for each one using fig = px.histogram(df_subset_by_host, x='REQUEST', color='INCIDENT'), then extract the x array data stored in the fig._data object and assign it to the "x" arg of that host's selection button.
For example:
from io import StringIO
import pandas as pd
import plotly.express as px
data_str = StringIO("""HOST,REQUEST,INCIDENT
host1,GET,error
host1,GET,warning
host1,GET,warning
host1,POST,warning
host1,POST,error
host1,POST,warning
host2,GET,warning
host2,GET,error
host2,GET,warning
host2,POST,error
host2,POST,warning
host2,POST,error
host3,GET,error
host3,GET,error
host3,GET,error
host3,POST,error
host3,POST,error
host3,POST,warning
host4,GET,warning
host4,GET,error
host4,GET,error
host4,POST,error
host4,POST,warning
host4,POST,warning""")
df = pd.read_csv(data_str)
hosts = list(df['HOST'].unique())
host_default = "host1"
title = f"Dropdown graph for host = {host_default}"
fig = px.histogram(df.loc[df['HOST'] == host_default], x='REQUEST', color='INCIDENT', title=title)
buttons = []
for host in hosts:
    df_host = df.loc[(df['HOST'] == host)]
    fig_host = px.histogram(df_host, x='REQUEST', color='INCIDENT')
    buttons.append(
        dict(
            label=host,
            method="update",
            args=[
                # first dict: trace (data) updates
                {"x": [trace['x'] for trace in fig_host._data]},
                # second dict: layout updates, so the title actually changes
                {"title": f"Dropdown graph for host = {host}"}
            ]
        )
    )
fig.update_layout(
    updatemenus=[
        dict(
            type="dropdown",
            direction="down",
            showactive=True,
            buttons=buttons
        )
    ]
)
fig.show()
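One thing to watch with this approach: as far as I can tell, px.histogram creates one trace per INCIDENT value in the order the values first appear in the subset, so host1 (error first) and host2 (warning first) may not line up trace for trace. The sketch below is not the code from the answer; it uses only the standard library to build the per-host "x" arrays in a fixed incident order and the matching button dicts. INCIDENT_ORDER and both helper names are my own. With "update", the first dict in args goes to the traces and the second to the layout, which is why the title sits in the second one.

```python
import csv
from collections import defaultdict
from io import StringIO

# Shortened sample in the same HOST,REQUEST,INCIDENT layout as the question.
CSV_TEXT = """HOST,REQUEST,INCIDENT
host1,GET,error
host1,POST,warning
host2,GET,warning
host2,POST,error
"""

# Assumed fixed trace order (hypothetical constant, not from the original answer).
INCIDENT_ORDER = ["error", "warning"]


def x_arrays_by_host(csv_text, incident_order):
    """Return {host: [x values per incident]} with incidents in a fixed order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(StringIO(csv_text)):
        grouped[row["HOST"]][row["INCIDENT"]].append(row["REQUEST"])
    return {
        host: [by_incident.get(incident, []) for incident in incident_order]
        for host, by_incident in grouped.items()
    }


def make_buttons(x_by_host):
    """Build one dropdown button per host for fig.update_layout(updatemenus=...)."""
    return [
        dict(
            label=host,
            method="update",
            # first dict updates the traces, second dict updates the layout
            args=[{"x": xs}, {"title": f"Dropdown graph for host = {host}"}],
        )
        for host, xs in x_by_host.items()
    ]


x_by_host = x_arrays_by_host(CSV_TEXT, INCIDENT_ORDER)
buttons = make_buttons(x_by_host)
```

If you stay with px.histogram for the figures, passing category_orders={"INCIDENT": [...]} to each call should pin the trace order in the same way.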
I'm having trouble getting some air pollution data to show different colors in a choropleth map using folium. Please let me know where my code may be going wrong. I think it is the key_on parameter, but I need help.
This is how my map turns out.
What I would like is for the mean concentration of the air pollution data to show up on the map but the map is still greyed out.
Here are the files I used:
Geojson file - Used "download zip" in upper right of this website https://gist.github.com/miguelpaz/edbc79fc55447ae736704654b3b2ef90#file-uhf42-geojson
Data file - Exported data from here https://a816-dohbesp.nyc.gov/IndicatorPublic/VisualizationData.aspx?id=2023,719b87,122,Summarize
Here is my code:
import pandas as pd
import geopandas as gpd
import folium
#clean pollution data
pm_df1 = pd.read_csv('/work/Fine Particulate Matter (PM2.5).csv',header = 5, usecols = ['GeoTypeName', 'Borough','Geography', 'Geography ID','Mean (mcg per cubic meter)'], nrows = 140)
#limit dataframe to rows with neighborhood (UHF 42) that matches geojson file
pm_df2 = pm_df1[(pm_df1['GeoTypeName'] == 'Neighborhood (UHF 42)')]
pm_df2
#clean geojson file
uhf_df2 = gpd.read_file('/work/uhf42.geojson', driver='GeoJSON')
uhf_df2.head()
#drop row 1 that has no geography
uhf_df3 = uhf_df2.iloc[1:]
uhf_df3.head()
## create a map
pm_testmap = folium.Map(location=[40.65639,-73.97379], tiles = "cartodbpositron", zoom_start=10)
# generate choropleth map
pm_testmap.choropleth(
    geo_data=uhf_df3,
    data=pm_df2,
    columns=['Geography', 'Mean (mcg per cubic meter)'],
    key_on='feature.properties.uhf_neigh', #think this is where I mess up.
    fill_color='BuPu',
    fill_opacity=0.2,
    line_opacity=0.7,
    legend_name='Average dust concentration',
    smooth_factor=0)
# display map
pm_testmap
You're right that the problem is with key_on.
Both data sets contain the UHF name, but in completely different forms.
To link the two, the data must first be preprocessed.
I can't see your data; it would help if you posted df.head() for both data sets. I'll explain based on the data I found through the links you provided.
In your geojson file, uhf_neigh simply says Northeast Bronx. However, your PM data appears to list the region as Bronx: Northeast Bronx. Something like the following seems necessary to unify the neighborhood names before plotting the map.
uhf_df2['UHF_NEIGH'] = uhf_df2['BOROUGH']+ ': ' + uhf_df2['UHF_NEIGH']
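Before plotting, it can help to check which data names have no matching GeoJSON feature. This is a small sketch with made-up stand-ins for the two sources. The lowercase property names uhf_neigh and borough are an assumption based on the key_on in the question, the second neighborhood is purely illustrative, and the helper name unmatched is my own.

```python
# Hypothetical miniature stand-ins for the GeoJSON file and the PM2.5 data.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"uhf_neigh": "Northeast Bronx", "borough": "Bronx"}},
        {"type": "Feature", "geometry": None,
         "properties": {"uhf_neigh": "Greenpoint", "borough": "Brooklyn"}},
    ],
}
data_names = ["Bronx: Northeast Bronx", "Brooklyn: Greenpoint"]


def unmatched(geo, names, make_key):
    """Return the data names that no feature produces via make_key(properties)."""
    geo_keys = {make_key(feature["properties"]) for feature in geo["features"]}
    return sorted(set(names) - geo_keys)


# Matching on the bare neighborhood name leaves every row unmatched ...
raw = unmatched(geojson, data_names, lambda p: p["uhf_neigh"])
# ... while prefixing the borough, as suggested above, matches all of them.
prefixed = unmatched(geojson, data_names, lambda p: f'{p["borough"]}: {p["uhf_neigh"]}')
```

An empty list means every row in the data has a feature to bind to; anything left over would likely show up grey on the map.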
I tried to run it with your data and code, but it did not even display the map. There should be no problem in your code, because you have associated the place name in the data frame with the place name in the geojson. I gave up on matching by name and matched on the neighborhood code instead (Geography ID in the data, uhfcode in the geojson), and the map was displayed. The provided csv file failed to load as-is, so I deleted the unnecessary lines before loading it. I also read the geojson file with json instead of geopandas.
import pandas as pd
import geopandas as gpd
import json
import folium
pm_df1 = pd.read_csv('./data/test_20211221.csv')
pm_df1 = pm_df1[['GeoTypeName', 'Borough', 'Geography', 'Geography ID', 'Mean (mcg per cubic meter)']]
pm_df2 = pm_df1[(pm_df1['GeoTypeName'] == 'Neighborhood (UHF 42)')]
with open('./data/uhf42.geojson') as f:
    uhf_df3 = json.load(f)
pm_testmap = folium.Map(location=[40.65639,-73.97379], tiles = "cartodbpositron", zoom_start=10)
# generate choropleth map
pm_testmap.choropleth(
    geo_data=uhf_df3,
    data=pm_df2,
    columns=['Geography ID', 'Mean (mcg per cubic meter)'],
    key_on='feature.properties.uhfcode', # match on the UHF code instead of the name
    fill_color='BuPu',
    fill_opacity=0.2,
    line_opacity=0.7,
    legend_name='Average dust concentration',
    smooth_factor=0)
# display map
pm_testmap
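When matching on a code rather than a name, a type mismatch (for example an integer in the GeoJSON against a string read from a CSV) can also prevent the join. The sketch below is not folium's implementation; it only mimics how a dotted key_on path that starts with 'feature' points into one feature, so you can inspect the value and its type. The feature, the IDs, and the helper name resolve_key_on are made up for illustration.

```python
def resolve_key_on(feature, key_on):
    """Follow a dotted path such as 'feature.properties.uhfcode' into one feature."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on is expected to start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value


# Illustrative feature and data IDs (hypothetical values).
feature = {"type": "Feature", "geometry": None, "properties": {"uhfcode": 101}}
data_ids = ["101", "102"]  # e.g. IDs that were read from a CSV as strings

geo_id = resolve_key_on(feature, "feature.properties.uhfcode")
same_type = type(geo_id) is type(data_ids[0])   # int vs. str here
matches_after_cast = str(geo_id) in data_ids    # compare after casting to one type
```

If same_type comes out False for your files, casting the data column to the GeoJSON's type before plotting is worth trying.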
Good afternoon Stack!!
Pretty straightforward question here. I built a choropleth to show the density of where the soldiers in my Army Reserve unit are. The problem is that the map comes back pretty much monochrome; it's not the green scale I used in fill_color. Am I dumb and missed something?
For obvious reasons I'm not going to share what the map looks like now so any advice is warmly welcomed.
import folium
import pandas as pd
import json
from folium import plugins
##Create data frame
df = pd.read_csv('soldier data.csv')
numSoldiersSeries = df.groupby('Zip').count()
soldiersByZip = pd.DataFrame()
soldiersByZip['ZCTA5CE10'] = [str(i) for i in numSoldiersSeries.index]
soldiersByZip['numSoldiers'] = numSoldiersSeries.values
#Create Zipcode Overlay
with open('pa.geojson', 'r') as f:
    pa = json.load(f)
##Create starting map
neMap = folium.Map(location=[40.767937,-73.982155], zoom_start=7)
##Create color overlay
folium.Choropleth(geo_data = pa, data=soldiersByZip,
                  columns=['ZCTA5CE10', 'numSoldiers'],
                  key_on='feature.properties.ZCTA5CE10',
                  fill_color='YlGn',
                  fill_opacity=.2,
                  line_opacity=1).add_to(neMap)
neMap.save('nemap.html')
I have seen the TripsLayer example on the deck.gl website and it looks really cool. I would like to accomplish the same using pydeck, the Python bindings for deck.gl. The example on pydeck's webpage is not animated, and I am not sure how to get a smooth animation like the one in the JavaScript example. I have tried multiple things (passing lists, functions, variables with changing values, etc.), but none of them have worked and I can't find any example with pydeck.
Thanks!
It's true that the example should include more trips. Here is how to achieve the animation of multiple trips in a jupyter notebook.
import time
import pandas as pd
import pydeck as pdk
data = '[{"agent_id":0,"path":[[-0.63968,50.83091,0.0],[-0.78175,50.83205,0.0]],"time":[65100,65520],"color":[228,87,86]},{"agent_id":1,"path":[[-0.63968,50.83091,0.0],[-0.78175,50.83205,0.0]],"time":[65940,66420],"color":[178,121,162]},{"agent_id":2,"path":[[-0.63968,50.83091,0.0],[-0.37617,50.8185,0.0]],"time":[65340,66360],"color":[157,117,93]},{"agent_id":3,"path":[[-0.63968,50.83091,0.0],[-0.78175,50.83205,0.0]],"time":[65940,66420],"color":[238,202,59]},{"agent_id":4,"path":[[-0.63968,50.83091,0.0],[-0.78175,50.83205,0.0]],"time":[67740,68160],"color":[157,117,93]}]'
df = pd.read_json(data)
view = {"bearing": 0, "latitude": 50.85, "longitude": -0.16, "pitch": 0, "zoom": 9}
time_min = 65_000
time_max = 80_000
layer = pdk.Layer(
    "TripsLayer",
    df,
    get_path='path',
    get_timestamps='time',
    get_color='color',
    opacity=0.8,
    width_min_pixels=3,
    rounded=True,
    trail_length=900,
    current_time=0
)
# Render
r = pdk.Deck(layers=[layer], initial_view_state=view, map_style='dark_no_labels')
r.show()
# Animate
for ct in range(time_min, time_max, 100):
    layer.current_time = ct
    r.update()
    time.sleep(0.1)
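The time_min and time_max values in the loop above are hard-coded. As a rough sketch, they could instead be derived from the trips themselves, assuming every record carries a "time" list like the sample data. The helper name time_bounds and the frames list are my own additions, and the paths are left out here because only the timestamps matter.

```python
import json

# Same trips as in the sample data, reduced to the fields needed here.
TRIPS_JSON = """[
  {"agent_id": 0, "time": [65100, 65520]},
  {"agent_id": 1, "time": [65940, 66420]},
  {"agent_id": 2, "time": [65340, 66360]},
  {"agent_id": 3, "time": [65940, 66420]},
  {"agent_id": 4, "time": [67740, 68160]}
]"""

TRAIL_LENGTH = 900  # same value as trail_length in the layer
STEP = 100          # same step as the animation loop


def time_bounds(trips):
    """Return the earliest and latest timestamp found across all trips."""
    stamps = [t for trip in trips for t in trip["time"]]
    return min(stamps), max(stamps)


trips = json.loads(TRIPS_JSON)
time_min, time_max = time_bounds(trips)
# Run one trail length past the last timestamp so the final trail can fade out.
frames = list(range(time_min, time_max + TRAIL_LENGTH, STEP))
```

The animation loop could then iterate over frames instead of a fixed range, which avoids animating long stretches where no trip is active.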
I am using Bokeh in an experiment to plot data in realtime and the library provides a convenient way to do that.
Here is a snippet of my code that accomplishes this task:
# do the imports
import pandas as pd
import numpy as np
import time
from bokeh.plotting import *
from bokeh.models import ColumnDataSource
# here is simulated fake time series data
ts = pd.date_range("8:00", "10:00", freq="5S")
ts.name = 'timestamp'
ms = pd.Series(np.arange(0, len(ts)), index=ts)
ms.name = 'measurement'
data = pd.DataFrame(ms)
data['state'] = np.random.choice(3, len(ts))
data['observation'] = np.random.choice(2, len(ts))
data.reset_index(inplace=True)
data.head()
This is what the data looks like.
Next I used the following snippet to push the data to the server in real time:
output_server("observation")
p = figure(plot_width=800, plot_height=400, x_axis_type="datetime")
x = np.array(data.head(2).timestamp, dtype=np.datetime64)
y = np.array(data.head(2).observation)
p.diamond_cross(x,y, size=30, fill_color=None, line_width=2, name='observation')
show(p)
renderer = p.select(dict(name="observation"))[0]
ds = renderer.data_source
for mes in range(len(data)):
    x = np.append(x, np.datetime64(data.loc[mes].timestamp))
    y = np.append(y, np.int64(data.loc[mes].observation))
    ds.data["x"] = x
    ds.data["y"] = y
    ds._dirty = True
    cursession().store_objects(ds)
    time.sleep(.1)
This produces a very nice result, however I need to change the color of each data point conditioned on a value.
In this case, the condition is the state variable which takes three values -- 0, 1, and 2. So my data should be able to reflect that.
I have spent hours trying to figure it out (admittedly I am very new to Bokeh), and any help will be greatly appreciated.
When you push the data, you have to separate the groups by desired color and then supply the corresponding colors as a palette. There's a longer discussion with several variations at https://github.com/bokeh/bokeh/issues/1967, such as the simple bokeh.charts Dot example bryevdv posted on 28 Feb:
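# legacy bokeh.charts interface, as used in the linked discussion
from bokeh.charts import Dot
from bokeh.plotting import show
cat = ['foo', 'bar', 'baz']
xyvalues=dict(x=[1,4,5], y=[2,7,3], z=[3,4,5])
dots = Dot(
    xyvalues, cat=cat, title="Data",
    ylabel='FP Rate', xlabel='Vendors',
    legend=False, palette=["red", "green", "blue"])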
show(dots)
Please remember to read and follow the posting guidelines at https://stackoverflow.com/help/how-to-ask; I found this and several other potentially useful hits with my first search attempt, "Bokeh 'change color' plot". If none of these solve your problem, you need to differentiate what you're doing from the answers already out there.
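For the streaming case in the question, one way to read "supply the corresponding colors" is to keep a color list in step with the x and y arrays. This is only a sketch of the mapping step in plain Python: the palette, the sample states, and the helper name colors_for are my own choices, and wiring the result into the plot (for example as an extra "color" column in the data source that the glyph's fill_color refers to) is an assumption I have not tested against that Bokeh version.

```python
# Hypothetical palette: one color per value of the state variable (0, 1, 2).
STATE_COLORS = {0: "red", 1: "green", 2: "blue"}


def colors_for(states, palette=STATE_COLORS):
    """Map each state value to its color, preserving the order of the points."""
    return [palette[state] for state in states]


states = [0, 2, 1, 1, 0]      # illustrative values of data['state']
colors = colors_for(states)   # one color per data point, same length as states
```

Because the list has one entry per point, it can grow by one item in each pass of the update loop alongside x and y.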