I want to visualize the number of crimes by state using plotly express.
This is the code:
import plotly.express as px
fig = px.choropleth(grouped, locations="Code",
color="Incident",
hover_name="Code",
animation_frame='Year',
scope='usa')
fig.show()
The dataframe itself looks like this:
I only get a blank map.
What is wrong with the code?
The map has no colors because no location mode is specified. The default location mode is 'ISO-3' (three-letter country codes), so the two-letter state codes in 'Code' are not matched to any US state. Adding locationmode='USA-states' fixes this, as in the code below. I created sample data in the same shape as yours.
df.head()
Year Code State incident
0 1980 AL Alabama 1445
1 1980 AK Alaska 970
2 1980 AZ Arizona 3092
3 1980 AR Arkansas 1557
4 1980 CA California 1614
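import plotly.express as px
# df is the sample data shown above; use your own dataframe (grouped) here
fig = px.choropleth(df,
                    locations='Code',
                    locationmode='USA-states',  # interpret 'Code' as two-letter US state codes
                    color='incident',
                    hover_name="Code",
                    animation_frame='Year',
                    scope="usa")
fig.show()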
I'm trying to create a choropleth chart using plotly express. I have two files: my geojson file and my data file. An example snippet for one country in my geojson file is below:
{'type': 'Feature',
'properties': {'ADMIN': 'Aruba', 'ISO_A3': 'ABW'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-69.99693762899992, 12.577582098000036],
[-69.93639075399994, 12.53172435100005],
[-69.92467200399994, 12.519232489000046],
[-69.91576087099992, 12.497015692000076],
[-69.88019771999984, 12.453558661000045],
[-69.87682044199994, 12.427394924000097],
[-69.88809160099993, 12.417669989000046],
[-69.90880286399994, 12.417792059000107],
[-69.93053137899989, 12.425970770000035],
[-69.94513912699992, 12.44037506700009],
[-69.92467200399994, 12.44037506700009],
[-69.92467200399994, 12.447211005000014],
[-69.95856686099992, 12.463202216000099],
[-70.02765865799992, 12.522935289000088],
[-70.04808508999989, 12.53115469000008],
[-70.05809485599988, 12.537176825000088],
[-70.06240800699987, 12.546820380000057],
[-70.06037350199995, 12.556952216000113],
[-70.0510961579999, 12.574042059000064],
[-70.04873613199993, 12.583726304000024],
[-70.05264238199993, 12.600002346000053],
[-70.05964107999992, 12.614243882000054],
[-70.06110592399997, 12.625392971000068],
[-70.04873613199993, 12.632147528000104],
[-70.00715084499987, 12.5855166690001],
[-69.99693762899992, 12.577582098000036]]]},
'id': 'ABW'}
The head of df is shown below. The 'data' column is the one that will be used to color the map:
   Location_Site  Country               City    Cluster_Name                 Market_Type  data    id
2  IT-MIL         Italy                 Milan   Italy                        Mature       73.14%  ITA
3  ES-MAD         Spain                 Madrid  Iberia                       Mature       55.27%  ESP
4  PT-LIS         Portugal              Lisbon  Iberia                       Medium       45.71%  PRT
5  AE-DXB         United Arab Emirates  Dubai   EMEA Emerging Markets (EEM)  Emerging     62.98%  ARE
6  EG-CAI         Egypt                 Cairo   EMEA Emerging Markets (EEM)  Emerging     20.36%  EGY
The code snippet below is what I'm running to plot my choropleth graph:
fig = px.choropleth(df,
locations = 'id',
geojson = data,
color = 'data')
fig.show()
I am receiving the below error after execution:
ValueError: The first argument to the plotly.graph_objs.layout.Template
constructor must be a dict or
an instance of :class:`plotly.graph_objs.layout.Template`
Any ideas on what might be creating this error? Thanks!
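To solve your problem, you need to tie the 'id' column of the dataframe to the ISO_A3 property of the GeoJSON features, which is what featureidkey="properties.ISO_A3" does. Because the sample GeoJSON only contains Aruba, I changed Italy's id from ITA to ABW in the sample data so that one row matches a feature, and the map was drawn.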
import plotly.express as px
geo_data = {'type': 'Feature',
'properties': {'ADMIN': 'Aruba', 'ISO_A3': 'ABW'},
'geometry': {'type': 'Polygon',
'coordinates': [[[-69.99693762899992, 12.577582098000036],
[-69.93639075399994, 12.53172435100005],
[-69.92467200399994, 12.519232489000046],
[-69.91576087099992, 12.497015692000076],
[-69.88019771999984, 12.453558661000045],
[-69.87682044199994, 12.427394924000097],
[-69.88809160099993, 12.417669989000046],
[-69.90880286399994, 12.417792059000107],
[-69.93053137899989, 12.425970770000035],
[-69.94513912699992, 12.44037506700009],
[-69.92467200399994, 12.44037506700009],
[-69.92467200399994, 12.447211005000014],
[-69.95856686099992, 12.463202216000099],
[-70.02765865799992, 12.522935289000088],
[-70.04808508999989, 12.53115469000008],
[-70.05809485599988, 12.537176825000088],
[-70.06240800699987, 12.546820380000057],
[-70.06037350199995, 12.556952216000113],
[-70.0510961579999, 12.574042059000064],
[-70.04873613199993, 12.583726304000024],
[-70.05264238199993, 12.600002346000053],
[-70.05964107999992, 12.614243882000054],
[-70.06110592399997, 12.625392971000068],
[-70.04873613199993, 12.632147528000104],
[-70.00715084499987, 12.5855166690001],
[-69.99693762899992, 12.577582098000036]]]},
'id': 'ABW'}
import pandas as pd
import numpy as np
import io
data = '''
Location_Site Country City Cluster_Name Market_Type data id
2 IT-MIL Italy Milan Italy Mature 73.14% ABW
3 ES-MAD Spain Madrid Iberia Mature 55.27% ESP
4 PT-LIS Portugal Lisbon Iberia Medium 45.71% PRT
5 AE-DXB "United Arab Emirates" Dubai "EMEA Emerging Markets (EEM)" Emerging 62.98% ARE
6 EG-CAI Egypt Cairo "EMEA Emerging Markets (EEM)" Emerging 20.36% EGY
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')  # delim_whitespace is deprecated since pandas 2.2
fig = px.choropleth(df,
locations = 'id',
geojson = geo_data,
featureidkey="properties.ISO_A3",
color = 'data')
fig.show()
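One further point, offered as a suggestion rather than part of the fix above: the 'data' column holds strings such as '73.14%', and plotly express treats a string column passed to color as categorical, so each value gets its own discrete color. A minimal sketch of converting it to numbers first, assuming a new column name `data_pct` (hypothetical):

```python
import pandas as pd

# Three rows from the question's dataframe
df = pd.DataFrame({
    "id": ["ITA", "ESP", "PRT"],
    "data": ["73.14%", "55.27%", "45.71%"],
})

# Strip the trailing percent sign and parse the rest as a float
df["data_pct"] = df["data"].str.rstrip("%").astype(float)
print(df)
```

Passing color='data_pct' to px.choropleth should then give a continuous color scale.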
I'm having trouble avoiding negative values in interpolation. I have the following data in a DataFrame:
current_country =
idx Country Region Rank Score GDP capita Family Life Expect. Freedom Trust Gov. Generosity Residual Year
289 South Sudan Sub-Saharan Africa 143 3.83200 0.393940 0.185190 0.157810 0.196620 0.130150 0.258990 2.509300 2016
449 South Sudan Sub-Saharan Africa 147 3.59100 0.397249 0.601323 0.163486 0.147062 0.116794 0.285671 1.879416 2017
610 South Sudan Sub-Saharan Africa 154 3.25400 0.337000 0.608000 0.177000 0.112000 0.106000 0.224000 1.690000 2018
765 South Sudan Sub-Saharan Africa 156 2.85300 0.306000 0.575000 0.295000 0.010000 0.091000 0.202000 1.374000 2019
I want to fill in one more year (2015 in the row shown below) using pandas' df.interpolate():
new_row =
idx Country Region Rank Score GDP capita Family Life Expect. Freedom Trust Gov. Generosity Residual Year
593 South Sudan Sub-Saharan Africa 0 np.nan np.nan np.nan np.nan np.nan np.nan np.nan np.nan 2015
I create a one-row df with NaN in every column to be interpolated (as above), append it to the original dataframe, and then interpolate to fill in the NaN cells.
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
interpol_subset = pd.concat([current_country, new_row])
interpol_subset = interpol_subset.interpolate(method = "pchip", order = 2)
This produces the following df
idx Country Region Rank Score GDP capita Family Life Expect. Freedom Trust Gov. Generosity Residual Year
289 South Sudan Sub-Saharan Africa 143 3.83200 0.393940 0.185190 0.157810 0.196620 0.130150 0.258990 2.509300 2016
449 South Sudan Sub-Saharan Africa 147 3.59100 0.397249 0.601323 0.163486 0.147062 0.116794 0.285671 1.879416 2017
610 South Sudan Sub-Saharan Africa 154 3.25400 0.337000 0.608000 0.177000 0.112000 0.106000 0.224000 1.690000 2018
765 South Sudan Sub-Saharan Africa 156 2.85300 0.306000 0.575000 0.295000 0.010000 0.091000 0.202000 1.374000 2019
4 South Sudan Sub-Saharan Africa 0 2.39355 0.313624 0.528646 0.434473 -0.126247 0.072480 0.238480 0.963119 2015
The issue: In the last row, the value in "Freedom" is negative. Is there a way to parameterize the df.interpolate function such that it doesn't produce negative values? I can't find anything in the documentation. I'm fine with the estimates besides that negative value (Although they're a bit skewed)
I considered simply flipping the negative to a positive, but the "Score" value is a sum of all the other continuous features and I would like to keep it that way. What can I do here?
Thanks for reading.
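I doubt this is a problem with interpolate itself. The cause is the method you are using: 'pchip' returns a negative value for 'Freedom' here regardless, because the new row sits after the last known point and the value is therefore extrapolated along the downward trend. If we take the values from your dataframe: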
import numpy as np
import scipy.interpolate
y = np.array([0.196620, 0.147062, 0.112000, 0.010000])
x = np.array([0, 1, 2, 3])
pchip_obj = scipy.interpolate.PchipInterpolator(x, y)
print(pchip_obj(4))
The result is -0.126. If you need a non-negative result, you should switch to a different method.
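Two possible alternatives, sketched under the assumption that either a floor at zero or holding the last value is acceptable for your data. The names `clipped` and `held` are illustrative only. Note that either choice changes 'Freedom', so 'Score' would have to be recomputed from the components if it must remain their sum.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import PchipInterpolator

y = np.array([0.196620, 0.147062, 0.112000, 0.010000])
x = np.array([0, 1, 2, 3])

# PCHIP extrapolates beyond the last point by default, which gives the negative value
pchip_value = float(PchipInterpolator(x, y)(4))

# Option 1: floor the extrapolated value at zero
clipped = max(pchip_value, 0.0)

# Option 2: pandas' linear method fills a trailing NaN with the last valid value
s = pd.Series([0.196620, 0.147062, 0.112000, 0.010000, np.nan])
held = s.interpolate(method="linear").iloc[-1]

print(pchip_value, clipped, held)
```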
I have this data for 2007, with population in millions and GDP in billions. The index column is the country:
continent year lifeExpectancy population gdpPerCapita GDP Billions
country
China Asia 2007 72.961 1318.6831 4959.11485 6539.50093
India Asia 2007 64.698 1110.39633 2452.21041 2722.92544
United States Americas 2007 78.242 301.139947 42951.6531 12934.4585
Indonesia Asia 2007 70.65 223.547 3540.65156 791.502035
Brazil Americas 2007 72.39 190.010647 9065.80083 1722.59868
Pakistan Asia 2007 65.483 169.270617 2605.94758 441.110355
Bangladesh Asia 2007 64.062 150.448339 1391.25379 209.311822
Nigeria Africa 2007 46.859 135.031164 2013.97731 271.9497
Japan Asia 2007 82.603 127.467972 31656.0681 4035.1348
Mexico Americas 2007 76.195 108.700891 11977.575 1301.97307
I am trying to plot a bar chart like the following:
This was plotted using matplotlib (code below), and I want to get this with df.plot method.
The code for plotting with matplotlib:
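import matplotlib.pyplot as plt
# keep the Axes returned by the first plot so the second plot can reuse it
ax = data.plot(y=[3], kind="bar")
data.plot(y=[3, 5], kind="bar", secondary_y=True, ax=ax, style='g:', figsize=(24, 6))
plt.show()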
You can use df.plot() with just the columns you want on the y axes, and pass the name of the second column as the secondary_y argument:
data[['population','gdpPerCapita']].plot(kind='bar', secondary_y='gdpPerCapita')
If you want to set the y label for each side, get all the axes of the figure (in this case two y-axes) and set the labels on each one.
ax1, ax2 = plt.gcf().get_axes()
ax1.set_ylabel('Population')
ax2.set_ylabel('GDP')
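For a self-contained version, here is a minimal sketch using three rows from the table in the question. It labels the right-hand axis through the right_ax attribute that pandas attaches to the returned Axes when only some columns are on the secondary axis. The Agg backend line and the label texts are assumptions for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, drop this line in a notebook
import pandas as pd

# Three rows taken from the 2007 table in the question
data = pd.DataFrame(
    {
        "population": [1318.6831, 1110.39633, 301.139947],
        "gdpPerCapita": [4959.11485, 2452.21041, 42951.6531],
    },
    index=pd.Index(["China", "India", "United States"], name="country"),
)

# Bars for both columns; gdpPerCapita is drawn against the right-hand y axis
ax = data[["population", "gdpPerCapita"]].plot(kind="bar", secondary_y="gdpPerCapita")
ax.set_ylabel("Population (millions)")
ax.right_ax.set_ylabel("GDP per capita")
ax.figure.savefig("population_gdp.png")
```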
I have a dataframe called wine that contains a bunch of rows I need to drop.
How do I drop all rows whose 'country' makes up less than 1% of the whole dataset?
Here are the proportions:
#proportion of wine countries in the data set
wine.country.value_counts() / len(wine.country)
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
New Zealand 0.009069
Israel 0.006133
Greece 0.004493
Canada 0.002526
Hungary 0.001755
Romania 0.001558
...
I got lazy and didn't include all of the results, but I think you catch my drift. I need to drop all rows whose country has a proportion less than .01.
Here is the head of my dataframe:
country designation points price province taster_name variety year price_category
Portugal Avidagos 87 15.0 Douro Roger Voss Portuguese Red 2011.0 low
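If the dataframe has a column holding each row's country proportion (called proportion here, which you would have to create first), you can use something like this:
df = df[df.proportion >= .01]
With the proportions listed above, that keeps only these countries: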
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
I figured it out:
# share of rows per country, flagged True where it exceeds 1%
country_filter = wine.country.value_counts(normalize=True) > 0.01
# keep only the country names that pass the threshold
country_index = country_filter[country_filter].index
wine = wine[wine.country.isin(country_index)]
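A shorter variant of the same idea, as a sketch on made-up data: map each row's country to its share of the dataset and filter on that. The toy `wine` frame and the name `share` are illustrative only.

```python
import pandas as pd

# Toy data: 200 rows, Canada accounts for 0.5% of them
wine = pd.DataFrame({
    "country": ["US"] * 150 + ["France"] * 49 + ["Canada"] * 1,
    "points": range(200),
})

# Per-row share of the row's country in the whole dataset
share = wine["country"].map(wine["country"].value_counts(normalize=True))

# Keep rows whose country makes up at least 1% of the data
filtered = wine[share >= 0.01]
print(filtered["country"].value_counts())
```

This avoids building the intermediate index, and the per-row `share` series can also be stored as a column if the proportion is needed later.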