I am trying to create a Plotly choropleth map of the UK local authorities, using predicted autism prevalence. The script gets stuck loading indefinitely when I try to assign color='predictedprevalence' so that the choropleth shows the predicted autism rate from the dataset in each authority area. I am unsure what I need to do.
I have attached an image showing where the script hangs.
import pandas as pd
import json
from urllib.request import urlopen
import numpy as np
#With Plotly
import plotly.express as px
from geojson_rewind import rewind
with urlopen('https://raw.githubusercontent.com/plummy95/1312306/main/localauth.json') as response:
    counties = json.load(response)
# note: this URL is the GeoJSON file, not the lauth2020.csv dataset
df=pd.read_csv('https://raw.githubusercontent.com/plummy95/1312306/main/localauth.json')
counties_corrected=rewind(counties,rfc7946=False)
fig = px.choropleth(df, geojson=counties_corrected, locations='nuts318cd', featureidkey="properties.nuts318cd", color='predictedprevalence',
color_continuous_scale="PurPor", labels={'label name':'label name'}, title='MAP TITLE',
scope="europe")
fig.update_geos(fitbounds="locations", visible=False)
This is what my dataset (lauth2020.csv) looks like. There are 311 entries:
[Image: preview of the lauth2020.csv dataset]
I have been a wee bit stuck on this for a few days and would really appreciate some assistance.
Thank you
Are you sure you are not loading the user data incorrectly? It should be read from the CSV file, but you are passing the URL of the GeoJSON file to pd.read_csv(). I transcribed the data from your CSV screenshot and ran the code, and the map appears to be displayed correctly.
import pandas as pd
import json
from urllib.request import urlopen
import numpy as np
#With Plotly
import plotly.express as px
from geojson_rewind import rewind
with urlopen('https://raw.githubusercontent.com/plummy95/1312306/main/localauth.json') as response:
    counties = json.load(response)
counties_corrected=rewind(counties,rfc7946=False)
import io
data = '''
nuts318cd "local authority" year predictedprevalence latitude longitude
UKC11 Adur 2020 357 50.84572 -0.32417
UKC12 Allerdale 2020 551 54.68524 -3.2809
UKC13 "Amber Valley" 2020 746 53.02884 -1.46219
UKC14 Arun 2020 845 50.84321 -0.64999
UKC21 Ashfield 2020 761 53.09747 -1.25422
UKC22 Ashford 2020 742 51.13096 0.823374
UKC23 Babergh 2020 496 52.0645 0.916149
UKD11 "Barking and Dagenham" 2020 1307 51.54555 0.129479
UKD12 Barnet 2020 2476 51.61107 -0.21819
UKD33 Barnsley 2020 1470 53.52577 -1.54925
UKD34 "Barrow-in-Furness" 2020 390 54.15731 -3.1999
UKD35 Basildon 2020 1091 51.59036 0.475055
UKD36 "Basingstoke and Deane" 2020 1055 51.25937 -1.22021
UKD37 Bassetlaw 2020 686 53.35604 -0.9787
UKD41 "Bath and North East Somerset" 2020 1226 51.35604 -2.48654
UKD42 Bedford 2020 1011 52.19628 -0.45463
UKD44 Bexley 2020 1475 51.45822 0.146212
UKD45 Birmingham 2020 7088 52.48404 -1.88141
UKD46 Blaby 2020 595 52.57706 -1.19887
UKD47 "Blackburn with Darwen" 2020 899 53.7008 -2.4636
UKD61 Blackpool 2020 821 53.82164 -3.02199
UKD62 Bolsover 2020 485 53.23875 -1.27228
UKD63 Bolton 2020 1682 53.58449 -2.47952
UKD71 Boston 2020 419 52.97794 -0.11218
UKD72 "Bournemouth, Christchurch and Poole" 2020 2381 50.74609 -1.84807
UKD73 "Bracknell Forest" 2020 755 51.4113 -0.73363
UKD74 Bradford 2020 3145 53.84382 -1.87389
UKE11 Braintree 2020 871 51.91634 0.575911
UKE12 Breckland 2020 790 52.59421 0.818716
UKE13 Brent 2020 2216 51.56438 -0.27568
UKE21 Brentwood 2020 432 51.64108 0.290091
UKE22 "Brighton and Hove" 2020 2070 50.8465 -0.15079
'''
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
fig = px.choropleth(df,
                    geojson=counties_corrected,
                    locations='nuts318cd',
                    featureidkey="properties.nuts318cd",
                    projection='mercator',
                    color='predictedprevalence',
                    color_continuous_scale="PurPor",
                    labels={'label name':'label name'},
                    title='MAP TITLE',
                    scope="europe"
                    )
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(autosize=False,
                  width=800,
                  height=600,
                  margin={"r":0,"t":20,"l":0,"b":0})
fig.show()
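When switching back to the real CSV, one thing worth checking is whether every value in the locations column actually appears in the GeoJSON under the featureidkey path; Plotly simply does not draw rows that have no matching feature. Below is a minimal sketch in plain Python/pandas. The helper name unmatched_codes and the two-feature GeoJSON are hypothetical illustration data, not part of the original answer.

```python
import pandas as pd

def unmatched_codes(df, geojson, location_col, feature_key):
    """Return location codes in df that have no matching GeoJSON feature.

    feature_key is a dotted path such as "properties.nuts318cd",
    mirroring the featureidkey argument passed to px.choropleth.
    """
    feature_ids = set()
    for feature in geojson["features"]:
        value = feature
        # walk the dotted path, e.g. feature["properties"]["nuts318cd"]
        for part in feature_key.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        if value is not None:
            feature_ids.add(str(value))
    codes = set(df[location_col].astype(str))
    return sorted(codes - feature_ids)

# Hypothetical example: two features, three rows of data
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"nuts318cd": "UKC11"}, "geometry": None},
        {"type": "Feature", "properties": {"nuts318cd": "UKC12"}, "geometry": None},
    ],
}
df = pd.DataFrame({"nuts318cd": ["UKC11", "UKC12", "UKC13"],
                   "predictedprevalence": [357, 551, 746]})
print(unmatched_codes(df, geojson, "nuts318cd", "properties.nuts318cd"))
```

With the real data you would call unmatched_codes(df, counties_corrected, 'nuts318cd', 'properties.nuts318cd') after loading lauth2020.csv; an empty list means every row has a polygon to colour.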
I am trying to create a stacked area chart, which shows the number of customers by country.
So my data frame is:
date people country
2021-11-18 509 USA
2021-11-18 289 France
2021-11-18 234 Germany
2021-11-18 148 Poland
2021-11-18 101 China
I don't understand how to change the chart's design (the colors).
table.groupby(['date','country'])['people'].sum().unstack().plot(
kind='area',
figsize=(10,4))
I also tried to use the Bokeh library for a nicer visualization, but I don't know how to write the code.
Thanks for your help. It's my first post. Sorry if I missed something.
I think you are looking for the varea_stack() function in Bokeh.
My solution is based on the varea_stack example in the official documentation.
Let's assume this is your data (I added one day):
text = """date people country
2021-11-18 509 USA
2021-11-18 289 France
2021-11-18 234 Germany
2021-11-18 148 Poland
2021-11-18 101 China
2021-11-19 409 USA
2021-11-19 389 France
2021-11-19 134 Germany
2021-11-19 158 Poland
2021-11-19 191 China"""
First, I bring the data into the same shape as in the example:
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(text), sep=r'\s+', parse_dates=True, index_col=0)
df = df.groupby(['date','country']).sum().unstack()
df.columns = df.columns.droplevel(0)
df.index.name=None
df.columns.name=None
Now the DataFrame looks like this:
China France Germany Poland USA
2021-11-18 101 289 234 148 509
2021-11-19 191 389 134 158 409
Now the rest is straightforward. If your index is a DatetimeIndex, you have to set x_axis_type='datetime' on the Bokeh figure; I did this for the plot below.
from bokeh.palettes import brewer
from bokeh.plotting import figure, show, output_notebook
output_notebook()
n = df.shape[1]
p = figure(x_axis_type='datetime')
p.varea_stack(stackers=df.columns, x='index', source=df, color=brewer['Spectral'][n],)
show(p)
The output looks like this: [image of the resulting stacked area chart]
You can redefine the colors using the color keyword if you like.
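For the pandas/matplotlib version in the question, the same idea works without Bokeh: DataFrame.plot accepts a color argument with one color per column, applied in column order. A minimal sketch reusing the two-day example data from this answer; the hex colors are arbitrary choices, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Same wide layout as above: one row per date, one column per country
df = pd.DataFrame(
    {"China": [101, 191], "France": [289, 389], "Germany": [234, 134],
     "Poland": [148, 158], "USA": [509, 409]},
    index=pd.to_datetime(["2021-11-18", "2021-11-19"]),
)

# One color per column, in column order (any matplotlib color spec works)
colors = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]

ax = df.plot(kind="area", stacked=True, color=colors, figsize=(10, 4))
ax.set_ylabel("people")
# call plt.show() here when running interactively
```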
You should add colors to your source, or you could use the color palettes in Bokeh; please check here.
I got the idea to try to visualize election donation data from the FEC website. Basically, I would like to create a stacked bar chart, with the x-axis being the state, the y-axis being the donated amount, and the 'stacks' being the different candidates, showing how much each candidate received from each state.
Code:
import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path
pathName = r"R:\Downloads\indiv20\by_date"
dataDir = Path(pathName)
filename = "itcont_2020_20010425_20190425.txt"
fullName = dataDir / filename
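# the FEC bulk files have no header row, so don't let pandas use the first record as one
data = pd.read_csv(fullName, low_memory=False, sep="|", header=None, usecols=[0, 9, 12, 14])
data.columns = ['Filer ID', 'State', 'Occupation', 'Donation Amount ($)']
data = data.dropna(subset=['Donation Amount ($)'])
# sum only the amount column; the text columns do not need to be summed
donations_by_state = data.groupby('State')['Donation Amount ($)'].sum()
plt.bar(donations_by_state.index, donations_by_state.values)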
plt.ylabel('Donation Amount ($)')
plt.xlabel('State')
plt.title('Donations per State')
plt.show()
This plots the total contributions per state and works great. However, when I use groupby to arrange the data the way I want, I'm not sure how to plot a stacked bar chart from the result:
donations_per_candidate_per_state = data['Donation Amount ($)'].groupby([data['State'], data['Filer ID']]).sum()
State Filer ID
AA C00005561 350
C00010603 600
C00042366 115
C00309567 1675
C00331694 2500
C00365536 270
C00401224 4495
C00411330 100
C00492991 300
C00540500 300
C00641381 250
C00696948 2800
C00697441 250
C00699090 67
C00703108 1400
AB C00401224 1386
AE C00000935 295
C00003418 276
C00010603 1750
C00027466 320
C00193433 105
C00211037 251
C00216614 226
C00341396 20
C00369033 150
C00394957 50
C00401224 26538
C00438713 50
C00457325 310
C00492785 300
...
ZZ C00580100 1490
C00603084 95
C00607861 750
C00608380 125
C00618371 2199
C00630665 1000
C00632133 600
C00632398 400
C00639500 208
C00639591 1450
C00640623 6402
C00653816 1000
C00666149 1000
C00666453 2800
C00683102 1000
C00689430 3524
C00693234 13283
C00693713 1000
C00694018 2750
C00694455 12761
C00695510 1045
C00696245 250
C00696419 3000
C00696526 500
C00696948 31296
C00697441 34396
C00698050 350
C00698258 2800
C00699090 5757
C00700732 475
Name: Donation Amount ($), Length: 32662, dtype: int64
It seems to have the data tabulated in the way I need, just not sure how to plot it.
You can move the Filer ID level into columns with unstack() and then plot with stacked=True, as described here:
df = donations_per_candidate_per_state.unstack('Filer ID')
df.plot(kind='bar', stacked=True)
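With 32,662 (State, Filer ID) pairs, the unstacked table gets one column per Filer ID, which makes the legend and the stacks hard to read. One option, sketched under the assumption that you mainly care about the biggest recipients, is to keep the top n filers overall and sum the rest into an 'Other' column before plotting. The helper name top_n_stacked_frame and the tiny sample Series are hypothetical.

```python
import pandas as pd

def top_n_stacked_frame(series, n=10):
    """Turn a (State, Filer ID) Series into a State x Filer ID table,
    keeping the n largest filers overall and lumping the rest into 'Other'."""
    wide = series.unstack("Filer ID", fill_value=0)
    top = wide.sum().nlargest(n).index
    result = wide[top].copy()
    other = wide.drop(columns=top).sum(axis=1)
    if (other != 0).any():
        result["Other"] = other
    return result

# Hypothetical sample in the same shape as donations_per_candidate_per_state
idx = pd.MultiIndex.from_tuples(
    [("AA", "C1"), ("AA", "C2"), ("AE", "C1"), ("AE", "C3"), ("ZZ", "C2")],
    names=["State", "Filer ID"],
)
s = pd.Series([350, 600, 295, 20, 1000], index=idx, name="Donation Amount ($)")
stacked = top_n_stacked_frame(s, n=2)
# stacked.plot(kind="bar", stacked=True)
```

On the real data this would be top_n_stacked_frame(donations_per_candidate_per_state, n=10).plot(kind='bar', stacked=True, figsize=(16, 6)); the 'Other' column keeps each bar's total equal to the state's full donation amount.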
I'm trying to create a choropleth map with zipcode and temperature data to overlay the counties; however, I keep getting a JavaScript error when trying to encode my data. I've looked at the GitHub issues and found that this is sometimes a problem when pulling in DataFrames, but I also tried using a CSV file as the data source. It seems that the :Q type is not recognizing the temp column as a number?
import altair as alt
from vega_datasets import data
counties = alt.topo_feature(data.us_10m.url, 'counties')
source = max_2007_df
alt.Chart(counties).mark_geoshape().encode(
color='temp:Q').transform_lookup(
lookup='zipcode',
from_=alt.LookupData(source, 'zipcode', ['temp'])
).project(
type='albersUsa'
).properties(
width=500,
height=300
)
Javascript Error: Failed to execute 'addColorStop' on 'CanvasGradient': The provided float value is non-finite.. This usually means there's a typo in your chart specification. See the JavaScript console for the full traceback.
This is part of the max_2007_temp df
zipcode temp
0 1002 33.6
1 1011 31.8
2 1013 34.1
3 1098 31.9
4 1108 34.3
5 1129 34.1
6 1453 33.3
7 1545 33.5
8 1568 33.4
9 1571 32.8
10 1603 33.5
11 1604 33.8
12 1702 35.5
13 1721 35.5
14 1746 35.5
15 1752 35.5
16 1760 35.5
17 1772 34.4
18 1773 35.5
19 1776 35.5
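The map data you reference, data.us_10m, does not have any zipcode information, so joining this data on zipcode will not work: the lookup finds no matches, temp is empty for every county, and that is likely why the color gradient ends up with non-finite values.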
If you would like to make the chart you have in mind, you'll need to find a source of geographic data indexed by zipcode rather than by county.
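A different route, if you are happy to show county-level averages instead of zipcodes, is to aggregate the zipcode temperatures to counties first and then look them up by the county id that the us_10m county features are keyed on. This needs a zipcode-to-county (FIPS) crosswalk, which vega_datasets does not provide; the crosswalk below is a hypothetical placeholder, and its FIPS values are illustrative only. Zipcodes are kept as integers to match the question's DataFrame (leading zeros dropped), so a real crosswalk would need the same representation.

```python
import pandas as pd

# Hypothetical crosswalk: one row per (zipcode, county FIPS) pair.
# A real one must come from an external zip-to-county source.
crosswalk = pd.DataFrame({
    "zipcode": [1002, 1011, 1013, 1098],
    "county_fips": [25015, 25013, 25013, 25015],
})

# A few rows in the shape of max_2007_df
temps = pd.DataFrame({
    "zipcode": [1002, 1011, 1013, 1098],
    "temp": [33.6, 31.8, 34.1, 31.9],
})

# Attach a county to every zipcode, then average the temperatures per county
county_temps = (
    temps.merge(crosswalk, on="zipcode", how="inner")
         .groupby("county_fips", as_index=False)["temp"].mean()
         .rename(columns={"county_fips": "id"})  # us_10m county features use 'id'
)
print(county_temps)
```

county_temps could then replace source in the chart, with transform_lookup(lookup='id', from_=alt.LookupData(county_temps, 'id', ['temp'])).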
I am new to Python and I'm trying to plot an overlaid histogram for a manipulated dataset from Kaggle. I tried doing it with matplotlib. The dataset shows the history of gun violence in the USA in recent years. I have selected only a few columns for EDA.
import pandas as pd
data_set = pd.read_csv("C:/Users/Lenovo/Documents/R related Topics/Assignment/Assignment_day2/04 Assignment/GunViolence.csv")
# .copy() avoids SettingWithCopyWarning when adding a column below
state_wise_crime = data_set[['date', 'state', 'n_killed', 'n_injured']].copy()
date_value = pd.to_datetime(state_wise_crime['date'])
# the groupby below needs a Year column
state_wise_crime['Year'] = date_value.dt.year
state_wise_crime = state_wise_crime.drop('date', axis=1)
# double brackets select a list of columns from the grouped frame
no_of_killed = state_wise_crime.groupby(['state', 'Year'])[['n_killed', 'n_injured']].sum().reset_index()
I want an overlaid histogram that shows the number of people killed and the number of people injured, with the different states on the x-axis.
Welcome to Stack Overflow! Next time, please post your data as text in a format like the one below (not as a link or an image); that makes it easier for us to work on the problem. Also, if you are asking about a graph, showing what the desired graph should look like (even a hand drawing) would be very helpful :)
df
state Year n_killed n_injured
0 Alabama 2013 9 3
1 Alabama 2014 591 325
2 Alabama 2015 562 385
3 Alabama 2016 761 488
4 Alabama 2017 856 544
5 Alabama 2018 219 135
6 Alaska 2014 49 29
7 Alaska 2015 84 70
8 Alaska 2016 103 88
9 Alaska 2017 70 69
As I commented on your original post, a bar plot would be more appropriate than a histogram in this case, since your purpose appears to be visualizing a summary statistic (the sum) for each year, compared across states. As far as I know, the easiest option is to use Seaborn. It depends on how you want to show the data, but below is one example; the code is as simple as this:
import seaborn as sns
sns.barplot(x='Year', y='n_killed', hue='state', data=df)
Output: [image of the resulting bar plot]
Hope this helps.
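Since the question asks for the states on the x-axis with killed and injured side by side, one alternative that stays with pandas/matplotlib (no Seaborn) is to sum over the years and plot both columns as grouped bars. A minimal sketch using only the ten sample rows above; summing over every year is an assumption about what you want to compare.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit this line in a notebook
import pandas as pd

# The sample rows from above
df = pd.DataFrame({
    "state": ["Alabama"] * 6 + ["Alaska"] * 4,
    "Year": [2013, 2014, 2015, 2016, 2017, 2018, 2014, 2015, 2016, 2017],
    "n_killed": [9, 591, 562, 761, 856, 219, 49, 84, 103, 70],
    "n_injured": [3, 325, 385, 488, 544, 135, 29, 70, 88, 69],
})

# Sum over all years so there is one row per state, then draw the two
# columns as side-by-side bars for each state
per_state = df.groupby("state")[["n_killed", "n_injured"]].sum()
ax = per_state.plot(kind="bar", rot=0, figsize=(8, 4))
ax.set_xlabel("state")
ax.set_ylabel("people")
```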
I am trying to build a model for predicting energy production using an ARMA model.
The data I can use for training is as follows:
(https://github.com/soma11soma11/EnergyDataSimulationChallenge/blob/master/challenge1/data/training_dataset_500.csv)
ID Label House Year Month Temperature Daylight EnergyProduction
0 0 1 2011 7 26.2 178.9 740
1 1 1 2011 8 25.8 169.7 731
2 2 1 2011 9 22.8 170.2 694
3 3 1 2011 10 16.4 169.1 688
4 4 1 2011 11 11.4 169.1 650
5 5 1 2011 12 4.2 199.5 763
...............
11995 19 500 2013 2 4.2 201.8 638
11996 20 500 2013 3 11.2 234 778
11997 21 500 2013 4 13.6 237.1 758
11998 22 500 2013 5 19.2 258.4 838
11999 23 500 2013 6 22.7 122.9 586
As shown above, I can use data from July 2011 to May 2013 for training.
Using the training data, I want to predict the energy production in June 2013 for each of the 500 houses.
The problem is that the time series is not stationary: it has trend and seasonal components (I checked this as follows):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data_train = pd.read_csv('../../data/training_dataset_500.csv')
house1 = data_train[data_train.House==1][['EnergyProduction','Daylight','Temperature']]
# monthly index starting July 2011; built from the row count so the lengths always match
rng = pd.date_range('2011-07-01', periods=len(house1), freq='MS')
house1 = house1.set_index(rng)
fig, axes = plt.subplots(nrows=1, ncols=3)
for i, column in enumerate(house1.columns):
    house1[column].plot(ax=axes[i], figsize=(14,3), title=column)
plt.show()
With this data, I cannot fit an ARMA model that gives good predictions. So I want to remove the trend and seasonal components and make the time series stationary. I tried, but I could not remove these components and make it stationary.
I would recommend the Hodrick-Prescott (HP) filter, which is widely used in macroeconometrics to separate the long-term trend component from short-term fluctuations. It is implemented as statsmodels.api.tsa.filters.hpfilter.
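For reference, the HP filter chooses the trend τ_t of a series y_t (t = 1, …, T) by solving the penalized least-squares problem below. λ is the lamb argument in statsmodels: a larger λ penalizes changes in the trend's slope more heavily and so gives a smoother trend. The cyclical component c_t is what remains after subtracting the trend.

$$\min_{\tau_1,\dots,\tau_T} \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \big[(\tau_{t+1} - \tau_t) - (\tau_t - \tau_{t-1})\big]^2, \qquad c_t = y_t - \tau_t$$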
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df = pd.read_csv('/home/Jian/Downloads/data.csv', index_col=[0])
# get part of the data
x = df.loc[df.House==1, 'Daylight']
# hp-filter, set parameter lamb=129600 following the suggestions for monthly data;
# hpfilter returns (cycle, trend), where cycle = x - trend
x_cycle, x_trend = sm.tsa.filters.hpfilter(x, lamb=129600)
fig, axes = plt.subplots(figsize=(12,4), ncols=3)
axes[0].plot(x)
axes[0].set_title('raw x')
axes[1].plot(x_trend)
axes[1].set_title('trend')
axes[2].plot(x_cycle)
axes[2].set_title('cycle (x - trend)')
plt.show()
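The HP filter separates the trend, but a 12-month seasonal pattern stays in what is left over. A common alternative (differencing, as used by ARIMA/SARIMA models, rather than the HP approach above) is a first difference to remove the trend followed by a lag-12 difference to remove the seasonality. A sketch on a synthetic series with an exact linear trend and a fixed seasonal pattern, both made up for illustration. Note that with only 24 monthly observations per house, this leaves just 11 points, which is very little for fitting an ARMA model.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + a fixed 12-month seasonal pattern
months = pd.date_range("2011-07-01", periods=24, freq="MS")
seasonal = np.array([5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6], dtype=float)
t = np.arange(24)
y = pd.Series(10.0 + 2.0 * t + seasonal[t % 12], index=months)

# First difference removes the linear trend;
# the lag-12 difference then removes the repeating seasonal pattern
y_stationary = y.diff().diff(12).dropna()
print(len(y_stationary))
```

On the real data the same line would be applied to, for example, house1['EnergyProduction'].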