I'm trying to add weights to my folium heatmap layer, but I can't figure out how to correctly implement this.
I have a dataframe with 3 columns: LAT, LON and VALUE, where VALUE is the total sales of that location.
self.map = folium.Map([mlat, mlon], tiles=tiles, zoom_start=8)
locs = zip(self.data.LAT, self.data.LON, self.data.VALUE)
HeatMap(locs, radius=30, blur=10).add_to(self.map)
I tried to use the absolute sales values and I also tried to normalize sales/sales.sum(). Both give me similar results.
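For reference, a minimal sketch of the two weightings just described (raw sales vs. sales normalised by the total); either list of (lat, lon, weight) triples can be passed to HeatMap:

# Sketch of the two weight variants mentioned above; both build one
# (lat, lon, weight) triple per store for the HeatMap plugin.
raw_weights = list(zip(self.data.LAT, self.data.LON, self.data.VALUE))
norm_weights = list(zip(self.data.LAT, self.data.LON,
                        self.data.VALUE / self.data.VALUE.sum()))
HeatMap(norm_weights, radius=30, blur=10).add_to(self.map)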
The problem is:
The heatmap shows stronger red levels for regions with more stores, even if the total sales of those stores together is much smaller than the sales of a distant, isolated large store.
Expected behaviour:
I would expect the intensity of the heatmap to reflect the sales value of each store, since sales was passed in the zip object to the HeatMap plugin.
Let's say I have 2 regions: A and B.
In region A I have 3 stores: 10 + 15 + 10 = 35 total sales.
In region B I have 1 big store: 100 total sales.
I'd expect a greater intensity for region B than for region A. I noticed that this behaviour only shows up when the difference is very large (if I try 35 vs 5000000, then region B becomes more relevant).
My CSV file is just a random sample, like this:
LAT,LON,VALUE,DATE,DIFFLAT1,DIFFLON1
-22.4056,-53.6193,14,2010,0.0242,0.4505
-22.0516,-53.7025,12,2010,0.3137,0.6636
-22.3239,-52.9108,100,2010,0.0514,0.0002
-22.6891,-53.7424,6,2010,0.0002,0.7887
-21.8762,-53.6866,16,2010,0.7283,0.6180
-22.1861,-53.5353,11,2010,0.1420,0.2924
import folium
from folium.plugins import HeatMap

heat_df = df.loc[:, ["lat", "lon", "weight"]]
map_hooray = folium.Map(location=[45.517999, -73.568184], zoom_start=12)

# Format: a list of [lat, lon, weight] lists
heat_data = heat_df.values.tolist()

# Plot it on the map
HeatMap(heat_data, radius=13).add_to(map_hooray)

# Save the map
map_hooray.save('heat_map.html')
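A note on the behaviour described in the first question: the rendering adds up the contributions of nearby points, which is consistent with several small stores close together out-glowing one distant large store. A sketch worth trying (an assumption, not something the folium docs prescribe) is to scale the weights so the single largest store maps to 1.0:

# Sketch: scale weights so the largest store has weight 1.0; the max-scaling
# is an assumption, intended to make relative differences more visible.
scaled = self.data.VALUE / self.data.VALUE.max()
locs = list(zip(self.data.LAT, self.data.LON, scaled))
HeatMap(locs, radius=30, blur=10).add_to(self.map)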
Related
I have two shapefiles of a city. The first one is extremely detailed, down to the level of blocks, and carries several attributes for each block, including the population density. The second one is the same city divided into a square grid of 1.45 km² cells, with no other information.
I want to calculate the population density in each cell of the square grid. I tried:
import geopandas as gpd

enriched = gpd.read_file('enriched.shp')  # gdf with pop density info
grid = gpd.read_file('grid.shp')  # grid gdf
popd = gpd.sjoin(grid[['cell_id', 'geometry']], enriched, op='intersects')  # merge grid with enriched shp
popd = popd[['cell_id', 'popdens']].groupby(['cell_id']).sum().reset_index()  # group by cell and sum the densities of the blocks within
grid = grid.merge(popd, on='cell_id', how='left').fillna(0)
but I am not sure this is the proper way, since I am getting very high density values in some cells (like > 200k per km²). Is this right? How can I check that I am not missing anything?
EDIT: Here are the column headers of the two shapefiles
enriched.columns
Index(['REGION', 'PROVINCIA', 'COMUNA', 'COD_DISTRI', 'COD_ZONA', 'area', 'popdens', 'geometry'],
dtype='object')
enriched.head(2)
REGION PROVINCIA COMUNA COD_DISTRI COD_ZONA area popdens geometry
0 13 131 13121 2.0 1.0 0.442290 4589.75053 POLYGON ((-70.65571 -33.47856, -70.65575 -33.4...
1 13 131 13121 6.0 1.0 0.773985 7661.64421 POLYGON ((-70.68182 -33.47654, -70.68144 -33.4...
Don't worry about the first 5 columns; you can see them as a primary key in the dataset: together they uniquely identify a zone.
grid.columns
Index(['cell_id', 'geometry'], dtype='object')
grid.head(2)
cell_id geometry
0 sq00024 POLYGON ((-70.79970 -33.50447, -70.78894 -33.5...
1 sq00025 POLYGON ((-70.79989 -33.51349, -70.78913 -33.5...
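The very high values are what summing per-block densities would produce: a cell intersected by ten blocks adds up ten densities, regardless of how much area each block actually contributes to the cell. One way to cross-check is an area-weighted aggregation. This is only a sketch, and it assumes both layers have been reprojected to the same metric CRS (so .area is meaningful) and that popdens is people per unit of that area:

# Sketch of an area-weighted density per cell (assumptions: common metric CRS,
# popdens expressed per unit of that CRS's area).
pieces = gpd.overlay(grid[['cell_id', 'geometry']],
                     enriched[['popdens', 'geometry']],
                     how='intersection')
pieces['piece_area'] = pieces.geometry.area
pieces['people'] = pieces['popdens'] * pieces['piece_area']   # density * area = head count
per_cell = (pieces.groupby('cell_id')
                  .agg(people=('people', 'sum'), covered=('piece_area', 'sum'))
                  .reset_index())
per_cell['popdens_cell'] = per_cell['people'] / per_cell['covered']
grid = grid.merge(per_cell[['cell_id', 'popdens_cell']], on='cell_id', how='left').fillna(0)

If the weighted numbers come out much lower than the summed ones, the > 200k/km² cells were most likely an artefact of summing densities.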
I have 4 dataframes for different locations: Indonesia, Singapore, Malaysia and Total, each containing the percentage of the 5 top revenue-generating products. I have plotted them separately.
I want to combine them on one plot where the x-axis shows the different locations and the top revenue-generating products for each location.
I have printed the data frames and, as you can see, they have different products in them.
print(Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat)
Ind_top_cat (Indonesia):
Category Amt
M020P 0.144131
MH 0.099439
ML 0.055052
PB 0.050057
PPDR 0.048315

Sin_top_cat (Singapore):
Category Amt
ML 0.480781
M015 0.073034
PPDR 0.035412
M025 0.033418
M020 0.031836

Mal_top_cat (Malaysia):
Category Amt
TN 0.343650
PPDR 0.190773
NMCN 0.118425
M015 0.047539
NN 0.038140

Tot_top_cat (Total):
Category Amt
M020P 0.158575
MH 0.092012
ML 0.064179
PPDR 0.050803
PB 0.044301
Thanks to joelostblom I was able to construct a plot; however, there are still some issues.
all_countries = pd.concat([Ind_top_cat, Sin_top_cat, Mal_top_cat, Tot_top_cat])
all_countries['Category'] = all_countries.index
sns.barplot(x='Country', y='Amt',hue = 'Category',data=all_countries)
Is there any way I can put the legend values (the categories) on the x-axis instead, colour the bars by country rather than by category, and put the data values on top of the bars? Also, the bars are not centred and I have no idea how to solve that.
You could create a new column in each dataframe with the country name, e.g.
Ind_top_cat['Country'] = 'Indonesia'
Sin_top_cat['Country'] = 'Singapore'
Then you can create one big dataframe by concatenating the country dataframes together:
all_countries = pd.concat([Ind_top_cat, Sin_top_cat])
And finally, you can use a high-level plotting library such as seaborn to assign one column to the x-axis location and one to the colour of the bars:
import seaborn as sns
sns.barplot(x='Country', y='Amt', hue='Category', data=all_countries)
You can scroll down to the second example on this page to get an idea of what such a plot would look like.
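Following up on the question's extra asks (categories on the x-axis, bars coloured by country, and the values printed on top of the bars), here is a minimal sketch; it assumes the concatenated all_countries frame built above:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Categories on the x-axis, one bar colour per country
ax = sns.barplot(x='Category', y='Amt', hue='Country', data=all_countries)

# Write each bar's value above it; skip empty category/country combinations
for p in ax.patches:
    h = p.get_height()
    if h and not pd.isna(h):
        ax.annotate(f'{h:.2f}', (p.get_x() + p.get_width() / 2, h),
                    ha='center', va='bottom', fontsize=8)

plt.xticks(rotation=45)
plt.tight_layout()
plt.show()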
I have a data frame for UK data that looks something like this:
latitude longitude region priority
51.307733 -0.75708898 South East High
51.527477 -0.20646542 London Medium
51.725135 0.4747223 East of England Low
This dataframe is several thousand rows long. I want a heatmap of the UK broken down by region, with the colour intensity depending on the priority in each region.
I would like to know the best way to turn this into a heatmap of the UK. I have tried GeoPandas and Plotly, but I have no working knowledge of these. Are they the best way to do it, or is there a tool out there that you can simply upload your data to and it will plot it for you? Thanks!
For this kind of job I usually go with folium, which is great for working with maps.
But for the HeatMap your "priority" column has to be numeric (a float)!
import folium
from folium import plugins

my_map = folium.Map(location=[51.5074, 0.1278], zoom_start=13)  # zoom 13 is city level; lower it to see the whole UK

your_dataframe['latitude'] = your_dataframe['latitude'].astype(float)
your_dataframe['longitude'] = your_dataframe['longitude'].astype(float)

# 'priority' holds text (High/Medium/Low) in the question, so map it to numbers
# first; this particular mapping is only an example.
your_dataframe['priority'] = your_dataframe['priority'].map({'Low': 1.0, 'Medium': 2.0, 'High': 3.0})

heat_df = your_dataframe[['latitude', 'longitude', 'priority']]
heat_df = heat_df.dropna(axis=0, subset=['latitude', 'longitude', 'priority'])

# List comprehension to make our list of [lat, lon, weight] lists
heat_data = [[row['latitude'], row['longitude'], row['priority']]
             for index, row in heat_df.iterrows()]

my_map.add_child(plugins.HeatMap(heat_data))
my_map.save('map.html')
Then open map.html with your browser.
I am trying to visualize the correlation of the Result column with every other column.
A_B A_C B_C Result
0 0.318182 0.925311 0.860465 91
1 -0.384030 0.991803 0.996344 12
2 -0.818182 0.411765 0.920000 53
3 0.444444 0.978261 0.944444 64
A_B = (A-B)/(A+B), and correspondingly for all the other columns.
This works for a small number of columns, but if I increase the number of columns, the number of rows in the heatmap keeps stacking up. Is there a more compact way to represent it?
The following code will reproduce the output:
import pandas as pd
import seaborn as sns

data = {'A': [232, 243, 12, 546, 67, 12, 78, 11, 245],
        'B': [120, 546, 120, 210, 56, 120, 56, 89, 12],
        'C': [9, 1, 5, 6, 7, 43, 7, 12, 64],
        'Result': [91, 12, 53, 64, 71, 436, 74, 123, 641],
        }
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'Result'])

# Responsible for (A-B)/(A+B), (A-C)/(A+C) and so on
colnames = df.columns.tolist()[:-1]
for i, c in enumerate(colnames):
    for k in range(i + 1, len(colnames)):
        df[c + '_' + colnames[k]] = (df[c] - df[colnames[k]]) / (df[c] + df[colnames[k]])

newdf = df[['A_B', 'A_C', 'B_C', 'Result']].copy()

# Plotting the correlation of A_B, A_C, B_C with Result, ignoring Result's correlation with itself
plot = pd.DataFrame(newdf.corr().iloc[:-1, -1])
sns.heatmap(plot, annot=True)
A technique which I have heard of, but cannot find any source for, is representing each correlation factor in mini-rectangles.
According to it, treating the map as a 3x3 matrix with (0,0) at the bottom left, A_B would be placed at (1,1), A_C at (2,1) and B_C at (2,2).
But I am not getting how to do it.
You can plot the correlation of each column against the Result column and other columns as well. Below is one way to do so. Providing the x- and y-ticklabels guides you better for comparing the correlations. You can also annotate the correlation values to be displayed on the heat map.
cor = newdf.corr()
sns.heatmap(cor, xticklabels=cor.columns.values,
yticklabels=cor.columns.values, annot=True)
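For the compact mini-rectangle layout the question describes, one possibility (a sketch, reusing df and newdf from the code above) is to place each corr(Result, X_Y) in a small square grid indexed by the original columns and mask the unused cells:

import numpy as np

cols = ['A', 'B', 'C']
tri = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        # correlation of Result with the (a, b) ratio column, e.g. A_B lands at row B, column A
        tri.loc[b, a] = newdf['Result'].corr(newdf[a + '_' + b])

sns.heatmap(tri, mask=tri.isna(), annot=True)

The orientation may need flipping (e.g. by sorting the index) to match the exact (row, column) positions described, but each pair then occupies a single cell instead of its own row.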
I have a Pandas DataFrame which has two columns, pageviews and type:
pageviews type
0 48.0 original
1 1.0 licensed
2 181.0 licensed
...
I'm trying to create a separate histogram for original and licensed. Each histogram would (ideally) chart the number of occurrences in a given range for that particular type, so the x-axis would be a range of pageviews and the y-axis would be the number of pageviews that fall within that range.
Any recs on how to do this? I feel like it should be straightforward...
Thanks!
Using your current dataframe: df.hist(by='type')
For example:
import numpy as np
import pandas as pd

# Me recreating your dataframe
pageviews = np.random.randint(200, size=100)
types = np.random.choice(['original', 'licensed'], size=100)
df = pd.DataFrame({'pageviews': pageviews, 'type': types})

# Code you need to create the faceted histograms by type
df.hist(by='type')
pandas.DataFrame.hist documentation
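If you want the two histograms to be directly comparable, DataFrame.hist also accepts bin and shared-axis arguments, for example:

# Same faceted histograms, but with a common bin count and shared axes
df.hist(by='type', bins=20, sharex=True, sharey=True)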