I'm trying to create a choropleth chart using plotly express. I have two files: my GeoJSON file and my data file. Here is an example snippet for one country in my GeoJSON file:
{'type': 'Feature',
'properties': {'ADMIN': 'Aruba', 'ISO_A3': 'ABW'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-69.99693762899992, 12.577582098000036],
[-69.93639075399994, 12.53172435100005],
[-69.92467200399994, 12.519232489000046],
[-69.91576087099992, 12.497015692000076],
[-69.88019771999984, 12.453558661000045],
[-69.87682044199994, 12.427394924000097],
[-69.88809160099993, 12.417669989000046],
[-69.90880286399994, 12.417792059000107],
[-69.93053137899989, 12.425970770000035],
[-69.94513912699992, 12.44037506700009],
[-69.92467200399994, 12.44037506700009],
[-69.92467200399994, 12.447211005000014],
[-69.95856686099992, 12.463202216000099],
[-70.02765865799992, 12.522935289000088],
[-70.04808508999989, 12.53115469000008],
[-70.05809485599988, 12.537176825000088],
[-70.06240800699987, 12.546820380000057],
[-70.06037350199995, 12.556952216000113],
[-70.0510961579999, 12.574042059000064],
[-70.04873613199993, 12.583726304000024],
[-70.05264238199993, 12.600002346000053],
[-70.05964107999992, 12.614243882000054],
[-70.06110592399997, 12.625392971000068],
[-70.04873613199993, 12.632147528000104],
[-70.00715084499987, 12.5855166690001],
[-69.99693762899992, 12.577582098000036]]]},
'id': 'ABW'}
The head of df is shown below. Its 'data' column will be used to create the heatmap:
   Location_Site  Country               City    Cluster_Name                 Market_Type  data    id
2  IT-MIL         Italy                 Milan   Italy                        Mature       73.14%  ITA
3  ES-MAD         Spain                 Madrid  Iberia                       Mature       55.27%  ESP
4  PT-LIS         Portugal              Lisbon  Iberia                       Medium       45.71%  PRT
5  AE-DXB         United Arab Emirates  Dubai   EMEA Emerging Markets (EEM)  Emerging     62.98%  ARE
6  EG-CAI         Egypt                 Cairo   EMEA Emerging Markets (EEM)  Emerging     20.36%  EGY
The code snippet below is what I'm running to plot my choropleth:
fig = px.choropleth(df,
locations = 'id',
geojson = data,
color = 'data')
fig.show()
I get the following error when I run it:
ValueError: The first argument to the plotly.graph_objs.layout.Template constructor must be a dict or an instance of :class:`plotly.graph_objs.layout.Template`
Any ideas on what might be creating this error? Thanks!
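To solve your problem, you need to tie the id values in the data frame to the ISO_A3 property in the GeoJSON, which is what featureidkey="properties.ISO_A3" does below. Because the sample GeoJSON only contains Aruba, I changed Italy's id from ITA to ABW so that one row matches, and the map was drawn.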
import plotly.express as px
geo_data = {'type': 'Feature',
'properties': {'ADMIN': 'Aruba', 'ISO_A3': 'ABW'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-69.99693762899992, 12.577582098000036],
[-69.93639075399994, 12.53172435100005],
[-69.92467200399994, 12.519232489000046],
[-69.91576087099992, 12.497015692000076],
[-69.88019771999984, 12.453558661000045],
[-69.87682044199994, 12.427394924000097],
[-69.88809160099993, 12.417669989000046],
[-69.90880286399994, 12.417792059000107],
[-69.93053137899989, 12.425970770000035],
[-69.94513912699992, 12.44037506700009],
[-69.92467200399994, 12.44037506700009],
[-69.92467200399994, 12.447211005000014],
[-69.95856686099992, 12.463202216000099],
[-70.02765865799992, 12.522935289000088],
[-70.04808508999989, 12.53115469000008],
[-70.05809485599988, 12.537176825000088],
[-70.06240800699987, 12.546820380000057],
[-70.06037350199995, 12.556952216000113],
[-70.0510961579999, 12.574042059000064],
[-70.04873613199993, 12.583726304000024],
[-70.05264238199993, 12.600002346000053],
[-70.05964107999992, 12.614243882000054],
[-70.06110592399997, 12.625392971000068],
[-70.04873613199993, 12.632147528000104],
[-70.00715084499987, 12.5855166690001],
[-69.99693762899992, 12.577582098000036]]]},
'id': 'ABW'}
import pandas as pd
import numpy as np
import io
data = '''
Location_Site Country City Cluster_Name Market_Type data id
2 IT-MIL Italy Milan Italy Mature 73.14% ABW
3 ES-MAD Spain Madrid Iberia Mature 55.27% ESP
4 PT-LIS Portugal Lisbon Iberia Medium 45.71% PRT
5 AE-DXB "United Arab Emirates" Dubai "EMEA Emerging Markets (EEM)" Emerging 62.98% ARE
6 EG-CAI Egypt Cairo "EMEA Emerging Markets (EEM)" Emerging 20.36% EGY
'''
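df = pd.read_csv(io.StringIO(data), sep=r'\s+')  # whitespace-separated; delim_whitespace is deprecated in pandas 2.2
df['data'] = df['data'].str.rstrip('%').astype(float)  # '73.14%' -> 73.14 so the color scale is continuous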
fig = px.choropleth(df,
locations = 'id',
geojson = geo_data,
featureidkey="properties.ISO_A3",
color = 'data')
fig.show()
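Plotly only draws a location when its value matches a feature's featureidkey value, and unmatched rows are silently left blank, so a quick pre-check can tell you which ids have no shape. A minimal sketch, assuming the GeoJSON is either a single Feature (as in the question) or a FeatureCollection; the helper names feature_ids and unmatched_ids are hypothetical. It also converts the 'data' percentages to floats, since plotly express treats strings like '73.14%' as categories rather than a continuous scale:

```python
import pandas as pd


def feature_ids(geojson, key="ISO_A3"):
    """Collect properties[key] from a Feature or a FeatureCollection."""
    if geojson.get("type") == "FeatureCollection":
        features = geojson["features"]
    else:
        # A bare Feature (as in the question) is treated as a one-feature collection
        features = [geojson]
    return {f["properties"][key] for f in features}


def unmatched_ids(df, geojson, column="id", key="ISO_A3"):
    """Return the sorted values of df[column] that have no matching feature."""
    return sorted(set(df[column]) - feature_ids(geojson, key))


# Toy inputs: the Aruba feature (geometry trimmed) and the question's ids
aruba = {"type": "Feature",
         "properties": {"ADMIN": "Aruba", "ISO_A3": "ABW"},
         "geometry": {"type": "Polygon", "coordinates": []},
         "id": "ABW"}
df = pd.DataFrame({"id": ["ITA", "ESP", "PRT", "ARE", "EGY"],
                   "data": ["73.14%", "55.27%", "45.71%", "62.98%", "20.36%"]})

# '73.14%' -> 73.14 so that color='data' gets a continuous scale
df["data"] = df["data"].str.rstrip("%").astype(float)

print(unmatched_ids(df, aruba))
```

With only the Aruba feature loaded, every id from the question is unmatched, which is why the answer above swaps ITA for ABW to get something on the map.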
Related
Say instead of a dictionary I have these lists:
cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')
I want to create a pd.DataFrame from this, like:
City       Continent
New York   America
Vancouver  America
London     Europe
Berlin     Europe
Tokyo      Asia
Bangkok    Asia
Note: this is a minimal reproducible example to keep it simple, but the real dataset is more like city -> country -> continent.
I understand with such a small sample it would be possible to manually create a dictionary, but in the real example there are many more data-points. So I need to automate it.
I've tried a for loop and a while loop with conditions such as "if Europe in cities", but that doesn't do anything. I think that's because it's "false", since it compares the whole list "Europe" against the whole list "cities".
Either way, my idea was that the loops would go through every city in the cities list and return (city + continent) for each. I just don't know how to um... actually make that work.
I am very new and I wasn't able to figure anything out from looking at similar questions.
Thank you for any direction!
Problem in your Code:
First, let's look at the condition you used: if Europe in cities: never matches, as you noticed.
That is because "in" checks whether the whole tuple Europe is one of the elements of cities, rather than checking its individual elements 'London' and 'Berlin'.
Solution:
First, I imported the required module and recreated your sample data as lists:
# Import all the Important Modules
import pandas as pd
# Read Data
cities = ['New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok']
Europe = ['London', 'Berlin']
America = ['New York', 'Vancouver']
Asia = ['Tokyo', 'Bangkok']
As you can see in your expected output, we need two columns:
City [already available as the cities list]
Continent [which we have to generate from the other lists; in our case Europe, America and Asia]
To generate the continent list, use the code below:
# Make the continent list
continent = []
# Look up each city in the Europe, America and Asia lists
for city in cities:
    if city in Europe:
        continent.append('Europe')
    elif city in America:
        continent.append('America')
    elif city in Asia:
        continent.append('Asia')
    else:
        # Keep both lists the same length if a city has no continent
        continent.append(None)
# Print the continent list
print(continent)
# Output of Above Code:
['America', 'America', 'Europe', 'Europe', 'Asia', 'Asia']
That gives the expected continent list. Now let's build the pd.DataFrame() from the two lists:
# Make a DataFrame from the 'City' and 'Continent' lists
data_df = pd.DataFrame({'City': cities, 'Continent': continent})
# Print the result
print(data_df)
# Output of the above Code:
City Continent
0 New York America
1 Vancouver America
2 London Europe
3 Berlin Europe
4 Tokyo Asia
5 Bangkok Asia
Hope this solution helps. If you are still facing errors, feel free to ask in the comments below.
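The question notes that the real data is city -> country -> continent. The same lookup idea extends to that case by mapping twice with Series.map. A minimal sketch; the country names and both dictionaries below are made-up assumptions, not the asker's data:

```python
import pandas as pd

# Hypothetical lookup tables (not from the question)
city_to_country = {"New York": "USA", "Vancouver": "Canada", "London": "UK",
                   "Berlin": "Germany", "Tokyo": "Japan", "Bangkok": "Thailand"}
country_to_continent = {"USA": "America", "Canada": "America", "UK": "Europe",
                        "Germany": "Europe", "Japan": "Asia", "Thailand": "Asia"}

df = pd.DataFrame({"City": ["New York", "Vancouver", "London",
                            "Berlin", "Tokyo", "Bangkok"]})

# First hop: city -> country; second hop: country -> continent
df["Country"] = df["City"].map(city_to_country)
df["Continent"] = df["Country"].map(country_to_continent)
print(df)
```

Cities missing from the first dictionary come out as NaN in both new columns, so gaps are easy to find with df[df['Continent'].isna()].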
1: Counting elements
You just count the number of cities in each continent and build the lists from that:
import pandas as pd
cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')
continent = []
cities = []  # note: this rebinds the cities tuple to a new, empty list
for name, cont in zip(['Europe', 'America', 'Asia'], [Europe, America, Asia]):
    # Repeat the continent name once per city in that continent
    continent += [name for _ in range(len(cont))]
    cities += [city for city in cont]
df = pd.DataFrame({'City': cities, 'Continent': continent})
print(df)
And this gives you the following result:
City Continent
0 London Europe
1 Berlin Europe
2 New York America
3 Vancouver America
4 Tokyo Asia
5 Bangkok Asia
I think this is the best solution.
2: With a dictionary
You can create an intermediate dictionary.
Starting from your code:
cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')
You would do this:
continent = dict()
for cont_name, cont_cities in zip(['Europe', 'America', 'Asia'], [Europe, America, Asia]):
    for city in cont_cities:
        continent[city] = cont_name
This gives you the following result:
{
'London': 'Europe', 'Berlin': 'Europe',
'New York': 'America', 'Vancouver': 'America',
'Tokyo': 'Asia', 'Bangkok': 'Asia'
}
Then, you can create your DataFrame :
df = pd.DataFrame(list(continent.items()), columns=['City', 'Continent'])
print(df)
City Continent
0 London Europe
1 Berlin Europe
2 New York America
3 Vancouver America
4 Tokyo Asia
5 Bangkok Asia
This solution does not overwrite your cities tuple.
I think in the long run you might want to eliminate loops for large datasets. Also, you might need to include more continents depending on the content of your data.
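import pandas as pd
# Assumes the Europe, America and Asia tuples from the question are defined
# Row number (as a string) -> continent name
continent = {
    '0': 'Europe',
    '1': 'America',
    '2': 'Asia'
}
# One row per continent, stacked into one row per city; level_0 holds the original row number
df = pd.DataFrame([Europe, America, Asia]).stack().reset_index()
df['continent'] = df['level_0'].astype(str).map(continent)
# Drop the helper index columns, leaving the city (column 0) and its continent
df.drop(['level_0', 'level_1'], inplace=True, axis=1)
You should get this output: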
0 continent
0 London Europe
1 Berlin Europe
2 New York America
3 Vancouver America
4 Tokyo Asia
5 Bangkok Asia
Feel free to adjust it to suit your use case.
I tried plotting a data frame with pandas, but when I run it a blank image shows up, with no errors or anything. Does anyone know what the problem could be and how I can solve it?
Could this be a backend issue? Thank you!
To get faster answers, please post your code as text together with sample data that reproduces the problem. Since I don't have your code or data, the cause is only a guess, but I think the country names are not being retrieved from the dictionary. I applied the sample data from the official plotly reference to your code: I extracted the top 10 countries by population from the sample data, then drew a graph from the rows of the original data frame for those country names. The loop is driven by a dictionary of country names and arbitrary colors.
import plotly.express as px
from plotly.subplots import make_subplots
df1 = px.data.gapminder().query('year==2007').sort_values('pop', ascending=False).head(10)
df1
      country        continent  year  lifeExp  pop         gdpPercap  iso_alpha  iso_num
299   China          Asia       2007  72.961   1318683096  4959.11    CHN        156
707   India          Asia       2007  64.698   1110396331  2452.21    IND        356
1619  United States  Americas   2007  78.242   301139947   42951.7    USA        840
719   Indonesia      Asia       2007  70.65    223547000   3540.65    IDN        360
179   Brazil         Americas   2007  72.39    190010647   9065.8     BRA        76
1175  Pakistan       Asia       2007  65.483   169270617   2605.95    PAK        586
107   Bangladesh     Asia       2007  64.062   150448339   1391.25    BGD        50
1139  Nigeria        Africa     2007  46.859   135031164   2013.98    NGA        566
803   Japan          Asia       2007  82.603   127467972   31656.1    JPN        392
995   Mexico         Americas   2007  76.195   108700891   11977.6    MEX        484
# create dict country and color
colors = px.colors.sequential.Plasma
color = {k:v for k,v in zip(df1.country,colors)}
{'China': '#0d0887',
'India': '#46039f',
'United States': '#7201a8',
'Indonesia': '#9c179e',
'Brazil': '#bd3786',
'Pakistan': '#d8576b',
'Bangladesh': '#ed7953',
'Nigeria': '#fb9f3a',
'Japan': '#fdca26',
'Mexico': '#f0f921'}
# Rows for the top-10 countries across all years (@ refers to the local df1 variable)
df1_top10 = px.data.gapminder().query('country in @df1.country')
import plotly.graph_objects as go
fig = go.Figure()
# One line trace per country, colored from the dict above
for k, v in color.items():
    fig.add_trace(go.Scatter(
        x=df1_top10[df1_top10['country'] == k]['year'],
        y=df1_top10[df1_top10['country'] == k]['lifeExp'],
        name=k,
        mode='markers+text+lines',
        marker_color='black',
        marker_size=3,
        line=dict(color=v),
        yaxis='y1'))
fig.update_layout(
title="Top 10 Country wise Life Ladder trend",
xaxis_title="Year",
yaxis_title="Life Ladder",
template='ggplot2',
font=dict( size=16,
color="Black",
family="Garamond"
),
xaxis=dict(showgrid=True),
yaxis=dict(showgrid=True)
)
fig.show()
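Filtering with df1_top10['country'] == k does not raise an error when k is spelled differently from the data; it just selects no rows, so that trace comes out empty. A minimal sketch of a check for this before plotting, with made-up data; the helper name empty_keys is hypothetical:

```python
import pandas as pd


def empty_keys(df, column, keys):
    """Return the keys that do not appear in df[column], i.e. would select no rows."""
    present = set(df[column])
    return [k for k in keys if k not in present]


# Made-up sample: note the different capitalization of "United States"
df = pd.DataFrame({"country": ["China", "India", "United States"],
                   "year": [2007, 2007, 2007],
                   "lifeExp": [72.961, 64.698, 78.242]})
color = {"China": "#0d0887", "India": "#46039f", "United states": "#7201a8"}

# Iterating over the dict checks its keys
missing = empty_keys(df, "country", color)
print(missing)  # ['United states']
```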
I have a dictionary that looks like this:
{"regions":[
{"name": "South America", "code": "SA01,SA02,SA03"},
{"name": "Asia Pacific", "code": "AP01,AP02,AP03"}
]}
I have a df that looks like this:
id code
1 SA01
2 SA02
3 SA03
4 AP01
5 AP02
6 AP03
I'd like to create a column region in df whose values are based on the code values in regions, so the result will look like this:
id code region
1 SA01 South America
2 SA02 South America
3 SA03 South America
4 AP01 Asia Pacific
5 AP02 Asia Pacific
6 AP03 Asia Pacific
I am wondering what's the best way to do this.
You could redefine your dictionary (d here) to have an individual code:region entry for each code that appears in the strings and use it to map the values in the code column:
d_ = {code:sd['name'] for sd in d['regions'] for code in sd['code'].split(',')}
# {'SA01': 'South America', 'SA02': 'South America', 'SA03': 'South America',...
df['region'] = df.code.map(d_)
print(df)
id code region
0 1 SA01 South America
1 2 SA02 South America
2 3 SA03 South America
3 4 AP01 Asia Pacific
4 5 AP02 Asia Pacific
5 6 AP03 Asia Pacific
This example takes your current dataset without any modifications. I'm sure that this can be refined by someone with more pandas experience.
import pandas as pd
mydict = {"regions":[
{"name": "South America", "code": "SA01,SA02,SA03"},
{"name": "Asia Pacific", "code": "AP01,AP02,AP03"}
]}
col_names_regions = ['code', 'region name']
rows = []
for key, values in mydict.items():
    for value in values:
        codes = value.get('code')
        name = value.get('name')
        # One row per individual code in the comma-separated string
        for code in codes.split(','):
            rows.append({'code': code, 'region name': name})
# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
df_regions = pd.DataFrame(rows, columns=col_names_regions)
print(df_regions)
# output
code region name
0 SA01 South America
1 SA02 South America
2 SA03 South America
3 AP01 Asia Pacific
4 AP02 Asia Pacific
5 AP03 Asia Pacific
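If you'd rather avoid explicit loops, a vectorized variant (swapping in Series.str.split plus DataFrame.explode, which needs pandas 0.25 or later) can build the same code-to-region table and merge it onto df. A minimal sketch reusing the question's data:

```python
import pandas as pd

regions = {"regions": [
    {"name": "South America", "code": "SA01,SA02,SA03"},
    {"name": "Asia Pacific", "code": "AP01,AP02,AP03"},
]}
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "code": ["SA01", "SA02", "SA03", "AP01", "AP02", "AP03"]})

# One row per region, then split the code string and explode to one row per code
lookup = (pd.DataFrame(regions["regions"])
          .assign(code=lambda t: t["code"].str.split(","))
          .explode("code")
          .rename(columns={"name": "region"}))

# A left merge keeps every row of df, even if a code has no region (region is NaN then)
result = df.merge(lookup, on="code", how="left")
print(result)
```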
I have this CSV:
Name Species Country
0 Hobbes Tiger U.S.
1 SherKhan Tiger India
2 Rescuer Mouse Australia
3 Mickey Mouse U.S.
And I have a second CSV:
Continent Countries Unnamed: 2 Unnamed: 3 Unnamed: 4
0 North America U.S. Mexico Guatemala Honduras
1 Asia India China Nepal NaN
2 Australia Australia NaN NaN NaN
3 Africa South Africa Botswana Zimbabwe NaN
I want to use the second CSV to update the first file so that the output is:
Name Species Country
0 Hobbes Tiger North America
1 SherKhan Tiger Asia
2 Rescuer Mouse Australia
3 Mickey Mouse North America
So far, this is the closest I have gotten:
import pandas as pd
# Import my data.
data = pd.read_csv('Continents.csv')
Animals = pd.read_csv('Animals.csv')
Animalsdf = pd.DataFrame(Animals)
# Transpose my data from horizontal to vertical.
data1 = data.T
# Clean my data and update my header with the first column.
data1.columns = data1.iloc[0]
# Drop now duplicated data.
data1.drop(data1.index[[0]], inplace = True)
# Build the dictionary.
data_dict = {col: list(data1[col]) for col in data1.columns}
# Update my csv.
Animals['Country'] = Animals['Country'].map(data_dict)
print(Animals)
This results in a dictionary that has lists as its values, and therefore I just get NaN out:
Name Species Country
0 Hobbes Tiger NaN
1 SherKhan Tiger NaN
2 Rescuer Mole [Australia, nan, nan, nan]
3 Mickey Mole NaN
I've tried flipping from list to tuples and this doesn't work. Have tried multiple ways to pull in the dictionary etc. I am just out of ideas.
Sorry if the code is super junky. I'm learning this as I go. Figured a project was the best way to learn a new language. Didn't think it would be this difficult.
Any suggestions would be appreciated. I need to be able to use the code so that when I get multiple reference CSVs, I can update my data with new keys. Hope this is clear.
Thanks in advance.
One intuitive solution is to use a dictionary mapping. The data is from @WillMonge's answer.
pd.DataFrame.itertuples produces namedtuples, but their fields can also be accessed by position: row[0] is the index, row[1] is the continent and row[2:] are the countries.
# create mapping dictionary: each country -> its continent (filter(None, ...) skips the None cells)
d = {}
for row in df.itertuples():
    d.update(dict.fromkeys(filter(None, row[2:]), row[1]))
# apply mapping dictionary
data['Continent'] = data['Country'].map(d)
print(data)
Country name Continent
0 China 2 Asia
1 China 5 Asia
2 Canada 9 America
3 Egypt 0 Africa
4 Mexico 3 America
You should use DictReader and DictWriter from the csv module. You can learn how to use them from the link below:
https://docs.python.org/2/library/csv.html
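A minimal sketch of that idea using only the standard library. Because the continent file has blank header cells after Countries (which would collide as dictionary keys), it is read with csv.reader, while the animals file uses csv.DictReader and csv.DictWriter. In-memory strings stand in for Continents.csv and Animals.csv, and the column layout is assumed from the question's printout:

```python
import csv
import io

continents_csv = """Continent,Countries,,,
North America,U.S.,Mexico,Guatemala,Honduras
Asia,India,China,Nepal,
Australia,Australia,,,
Africa,South Africa,Botswana,Zimbabwe,
"""
animals_csv = """Name,Species,Country
Hobbes,Tiger,U.S.
SherKhan,Tiger,India
Rescuer,Mouse,Australia
Mickey,Mouse,U.S.
"""

# Build country -> continent from the wide file: the first cell of each row is the continent
country_to_continent = {}
reader = csv.reader(io.StringIO(continents_csv))
next(reader)  # skip the header row
for row in reader:
    continent, *countries = row
    for country in countries:
        if country:  # skip empty trailing cells
            country_to_continent[country] = continent

# Rewrite each animal row with its continent (unknown countries are kept as-is)
out = io.StringIO()
animal_reader = csv.DictReader(io.StringIO(animals_csv))
writer = csv.DictWriter(out, fieldnames=animal_reader.fieldnames, lineterminator="\n")
writer.writeheader()
for rec in animal_reader:
    rec["Country"] = country_to_continent.get(rec["Country"], rec["Country"])
    writer.writerow(rec)
print(out.getvalue())
```

With real files, the io.StringIO objects would be replaced by open(..., newline="") handles.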
Here is an updated version of your code; I have added comments to explain each step:
import pandas as pd
# Read data in (read_csv also returns a DataFrame directly)
data = pd.DataFrame({'name': [2, 5, 9, 0, 3], 'Country': ['China', 'China', 'Canada', 'Egypt', 'Mexico']})
df = pd.DataFrame({'Continent': ['Asia', 'America', 'Africa'],
'Country1': ['China', 'Mexico', 'Egypt'],
'Country2': ['Japan', 'Canada', None],
'Country3': ['Thailand', None, None ]})
# Unstack to get a row for each country (remove the continent rows)
premap_df = pd.DataFrame(df.unstack('Continent').drop('Continent')).dropna().reset_index()
premap_df.columns = ['_', 'continent_key', 'Country']
# Merge the continent back based on the continent_key (old row number)
map_df = pd.merge(premap_df, df[['Continent']], left_on='continent_key', right_index=True)[['Continent', 'Country']]
# Merge with the data now
pd.merge(data, map_df, on='Country')
For further reference, Wes McKinney's Python for Data Analysis is one of the best books out there for learning pandas.
You can also create buckets (one list per continent) and run conditions with np.select:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['Hobbes','SherKhan','Rescuer','Mickey'], 'Species':['Tiger','Tiger','Mouse','Mouse'],'Country':['U.S.','India','Australia','U.S.']})
North_America = ['U.S.', 'Mexico', 'Guatemala', 'Honduras']
Asia = ['India', 'China', 'Nepal']
Australia = ['Australia']
Africa = ['South Africa', 'Botswana', 'Zimbabwe']
conditions = [
(df['Country'].isin(North_America)),
(df['Country'].isin(Asia)),
(df['Country'].isin(Australia)),
(df['Country'].isin(Africa))
]
choices = [
'North America',
'Asia',
'Australia',
'Africa'
]
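# np.nan as the default can fail with string choices in newer NumPy (no common dtype); None gives an object result
df['Continent'] = np.select(conditions, choices, default=None)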
df
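For the original question's wide layout, DataFrame.melt is another way to flatten the continent table: it turns every country cell into its own row, after which a plain country -> continent dictionary works with Series.map. A minimal sketch that rebuilds both CSVs inline; the column names are copied from the question's printout:

```python
import numpy as np
import pandas as pd

continents = pd.DataFrame({
    "Continent": ["North America", "Asia", "Australia", "Africa"],
    "Countries": ["U.S.", "India", "Australia", "South Africa"],
    "Unnamed: 2": ["Mexico", "China", np.nan, "Botswana"],
    "Unnamed: 3": ["Guatemala", "Nepal", np.nan, "Zimbabwe"],
    "Unnamed: 4": ["Honduras", np.nan, np.nan, np.nan],
})
animals = pd.DataFrame({"Name": ["Hobbes", "SherKhan", "Rescuer", "Mickey"],
                        "Species": ["Tiger", "Tiger", "Mouse", "Mouse"],
                        "Country": ["U.S.", "India", "Australia", "U.S."]})

# Wide -> long: one (Continent, country) row per non-empty cell
long = (continents.melt(id_vars="Continent", value_name="country")
        .dropna(subset=["country"]))
country_to_continent = dict(zip(long["country"], long["Continent"]))

# Overwrite Country with its continent, as in the desired output
animals["Country"] = animals["Country"].map(country_to_continent)
print(animals)
```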
I have a data frame with country and traffic columns:
Country | Traffic
US 8687
Italy 902834
Germany 2343
Brazil 4254
France 23453
I want to add a third column called "Region" to this data frame. It would look like this:
Country | Traffic | Region
US 8687 US
Italy 902834 EU
Germany 2343 EU
Brazil 4254 LA
France 23453 EU
The following code works if I have only two Regions. I am looking more for an if/else, map, or lambda statement:
df['Region'] = np.where(df['Country'] == 'US', 'US', 'EU')
Thank You.
One simple approach is this:
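# Avoid naming it dict, which shadows the built-in; countries missing from it raise KeyError here
country_to_region = {'US': 'US', 'Italy': 'EU', 'Germany': 'EU', 'Brazil': 'LA', 'France': 'EU'}
df['Region'] = df['Country'].apply(lambda x: country_to_region[x])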
You could use a dictionary:
region_from_country = {
'US': 'US',
'Italy': 'EU',
'Germany': 'EU',
'Brazil': 'LA',
'France': 'EU',
}
df['Region'] = df['Country'].replace(region_from_country)
The keys in the dictionary are the countries and the values are the corresponding regions.
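One difference between the two dictionary answers is worth knowing: with a dict, Series.map turns countries missing from the dict into NaN, while Series.replace leaves them unchanged. A minimal sketch with a made-up 'Japan' row to show the difference; the 'Other' fallback label is an assumption, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["US", "Italy", "Germany", "Brazil", "France", "Japan"],
                   "Traffic": [8687, 902834, 2343, 4254, 23453, 100]})
region_from_country = {"US": "US", "Italy": "EU", "Germany": "EU",
                       "Brazil": "LA", "France": "EU"}

mapped = df["Country"].map(region_from_country)        # Japan -> NaN
replaced = df["Country"].replace(region_from_country)  # Japan -> "Japan"

# Hypothetical fallback label for countries without a region
df["Region"] = mapped.fillna("Other")
print(df)
```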