Folium TopoJSON heatmap does not populate as expected - python

I am trying to display a folium choropleth heatmap using a custom topoJSON file and a dataframe. The map generates with a uniformly shaded choropleth instead of the expected heatmap.
Here's a snippet of the code I am using (excluding basic imports and creation of the dataframe):
cols = ['dma', 'values']
center_us_long_lat = [39.50, -98.35]
topo_path = r'../../data/designated_marketing_areas_us_topo.json'
us_map = folium.Map(location=center_us_long_lat, attr='dma_code',
                    tiles='Mapbox Bright', zoom_start=4, min_zoom=4)
us_map.choropleth(geo_path=topo_path, topojson='objects.nielsen_dma',
                  data=df, columns=cols,
                  fill_opacity=0.7,
                  key_on="feature.properties.dma",
                  line_color='white', fill_color='YlOrRd',
                  highlight=True
                  )
The output looks like this:
I've tried adjusting the key_on argument to feature.dma but this results in the same output.
For reference, here's a sample of the df data:
In[1]:
df.head():
Out[1]:
dma values
1 501 16.749
2 740 8.858
3 807 15.790
4 511 15.315
5 798 8.425
The topojson can be found here
What am I doing wrong? Thanks!

I know this might sound silly, but I always find these issues come down to mismatched data types or leading/trailing spaces. Good luck!
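A minimal sketch of how such a mismatch could be checked, assuming the TopoJSON stores dma as a string while the dataframe holds integers. That is an assumption, since the actual file was not inspected, and the topo dictionary below is a hypothetical stand-in for it.

```python
import pandas as pd

# Sample rows from the question's dataframe (integer dma codes).
df = pd.DataFrame({"dma": [501, 740, 807], "values": [16.749, 8.858, 15.790]})

# Hypothetical stand-in for the TopoJSON content; the real file may differ.
topo = {
    "objects": {
        "nielsen_dma": {
            "geometries": [
                {"properties": {"dma": "501"}},
                {"properties": {"dma": "740"}},
                {"properties": {"dma": "807"}},
            ]
        }
    }
}

geo_keys = [g["properties"]["dma"] for g in topo["objects"]["nielsen_dma"]["geometries"]]

# Raw comparison: integer keys never equal string keys, so nothing matches.
matches_before = set(df["dma"]) & set(geo_keys)

# Normalize the dataframe keys to stripped strings, then compare again.
df["dma"] = df["dma"].astype(str).str.strip()
matches_after = set(df["dma"]) & {str(k).strip() for k in geo_keys}

print(len(matches_before), len(matches_after))
```

If the keys inside the file itself carry stray whitespace, normalizing the dataframe alone would not be enough and the file would need cleaning as well.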

Related

Python data visualization: too small value to be visible - how to solve?

Here is the dataset I have:
Source      All Leads  Not Junks  Warms  Hots  Deals  Weighted Sum
web         281316     269490     10252  2508  1602   4376.5
telesales   30458      29732      431    138   85     316.2
networking  4249       4195       763    547   476    539.1
promos      1356       1308       30     1     0      10.8
I visualized it:
df.plot.bar()
And got this output:
Some columns have values so small that they are not visible. How can I tackle this problem?
Setting a bigger figure size isn't useful: it makes the chart bigger, but the column ratios stay the same, so nothing changes.
Any ideas how to make it look more sophisticated? Or maybe I should try a different type of chart? Thank you!
You could try df.plot.bar(logy=True), but it's going to make useful interpretation messy. A Sankey diagram would probably be a better fit for showing how the data breaks down in each category.
Seaborn comes out a little nicer, but takes some transformation to produce the same type of output:
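import matplotlib.pyplot as plt
import seaborn as sns
# reshape from wide to long format: one row per (Source, Category) pair
df2 = df.melt('Source').rename(columns={'variable': 'Category', 'value': 'Values'})
sns.barplot(x='Source', y='Values', data=df2, hue='Category')
plt.show()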
Output:
Or with log=True
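For the pandas-only route, here is a minimal sketch of the log-scale bar chart built from the question's numbers. The Agg backend and the figure size are assumptions made so the script runs without a display; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

# The dataset from the question, with Source kept as a regular column.
df = pd.DataFrame(
    {
        "Source": ["web", "telesales", "networking", "promos"],
        "All Leads": [281316, 30458, 4249, 1356],
        "Not Junks": [269490, 29732, 4195, 1308],
        "Warms": [10252, 431, 763, 30],
        "Hots": [2508, 138, 547, 1],
        "Deals": [1602, 85, 476, 0],
        "Weighted Sum": [4376.5, 316.2, 539.1, 10.8],
    }
)

# logy=True puts the y axis on a log scale so small counts stay visible.
ax = df.plot.bar(x="Source", logy=True, figsize=(10, 5))
ax.set_ylabel("Count (log scale)")
plt.tight_layout()
```

Note that a zero (promos / Deals) cannot be represented on a log axis, so that bar will still be missing.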

Plotly parallel plot gives wrong categorization after specifying color parameter

I have a dataframe with categorical columns and a target column with two categories - 0 and 1.
dfx.target.value_counts()
0 8062
1 3919
Name: target, dtype: int64
I tried to create a parallel categories plot for them using plotly. This works fine; here is the output for my target column:
fig = px.parallel_categories(dfx)
fig.show()
Then I tried to color-code them; according to the documentation, we can specify a column's name.
fig = px.parallel_categories(dfx, color = 'target')
fig.show()
However, when specifying the color scheme (which can be done using a column's name), I get the wrong distribution in the target column:
An additional category appears in gray, and the counts of 0 and 1 in the target column are wrong.
Note: There are no NA values in the data.
Update: It turns out it was a version issue. After updating the package, I was able to do it.
I have generated a sample dataset to plot.
Following your code, specifying color="target" caused errors (invalid colors).
I changed it to use pandas Series.map() to build a series that maps each target value to a color.
This required adding the dimensions parameter so that color was not added as an additional category in the trace.
The distributions are identical with or without the color parameter.
import pandas as pd
import numpy as np
import plotly.express as px
# build a dataframe for use in plot
V=6
a = [chr(i) for i in range(ord("A"), ord("A")+V)]
R=2000
dfx = pd.DataFrame({c:np.random.choice(a[0:V//(i+1)], R) for i, c in enumerate(["source","interim","target"])})
# the plot - use "target" for colors
px.parallel_categories(
    dfx,
    dimensions=dfx.columns,
    color=dfx["target"].map(
        {
            l: px.colors.qualitative.Light24[i % len(px.colors.qualitative.Light24)]
            for i, l in enumerate(dfx["target"].unique())
        }
    ),
)

How to display values as different colours in pandas' pivot_table?

I am trying to recreate this graph from here
My code is:
impute_grps = data.pivot_table(values=["Loan_Status"], index=["Credit_History", "Gender"],
                               aggfunc='count')
print(impute_grps)
impute_grps.plot(kind='bar', stacked=True, color=['red','blue'], grid=False)
This gives the image below. Can someone tell me how to split Loan_Status as per the original? I have tried adding it to index, but I get the error ValueError: Grouper for 'Loan_Status' not 1-dimensional.
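One possible approach, sketched under assumptions: counting rows per Loan_Status value with pd.crosstab gives one column per status, which a stacked bar plot can then colour separately. The sample rows below are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
data = pd.DataFrame(
    {
        "Credit_History": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0],
        "Gender": ["Female", "Male", "Female", "Male", "Male", "Male"],
        "Loan_Status": ["N", "N", "Y", "Y", "N", "Y"],
    }
)

# Rows: (Credit_History, Gender) pairs; columns: one per Loan_Status value.
counts = pd.crosstab([data["Credit_History"], data["Gender"]], data["Loan_Status"])
print(counts)
```

Calling counts.plot(kind='bar', stacked=True, color=['red', 'blue'], grid=False) on this frame should then draw one colour per Loan_Status value.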

Plots do not appear when calling seaborn's pairplot on a pandas Dataframe

I have a Dataframe that looks like so
Price Mileage Age
4250 71000 8
6500 43100 6
26950 10000 3
1295 78000 17
5999 61600 8
This is assigned to dataset. I simply call sns.pairplot(dataset) and I'm left with just a single graph - the distribution of prices across my dataset. I expected a 3x3 grid of plots.
When I import a pre-configured dataset from seaborn I get the expected multiplot pair plot.
I'm new to seaborn so apologies if this is a silly question, but what am I doing wrong? It seems like a simple task.
From your comment, it seems like you're trying to plot on non-numeric columns. Try coercing them first:
dataset = dataset.apply(lambda x: pd.to_numeric(x, errors='coerce'))
sns.pairplot(dataset)
The errors='coerce' argument will replace non-coercible values (the reason your columns are objects in the first place) with NaN.
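A small sketch of what the coercion does, using a hypothetical frame in which the numbers were read in as strings and one entry ("n/a") cannot be parsed:

```python
import pandas as pd

# Hypothetical frame: Mileage and Age hold strings rather than numbers.
dataset = pd.DataFrame(
    {
        "Price": [4250, 6500, 26950],
        "Mileage": ["71000", "43100", "n/a"],
        "Age": ["8", "6", "3"],
    }
)
print(dataset.dtypes)  # Mileage and Age are not numeric here

# Convert every column; anything that cannot be parsed becomes NaN.
dataset = dataset.apply(lambda col: pd.to_numeric(col, errors="coerce"))
print(dataset.dtypes)
print(dataset)
```

After the conversion all three columns are numeric, so pairplot has more than one variable to draw.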

Heatmap with specific axis labels coloured

I am trying to plot a heatmap with 2 columns of data from a pandas dataframe. However, I would like to use a 3rd column to label the x axis, ideally by colour though another method such as an additional axis would be equally suitable. My dataframe is:
MUT SAMPLE VAR GROUP
True s1 1_1334442_T CC002
True s2 1_1334442_T CC006
True s1 1_1480354_GAC CC002
True s2 1_1480355_C CC006
True s2 1_1653038_C CC006
True s3 1_1730932_G CC002
...
Just to give a better idea of the data; there are 9 different types of 'GROUP', ~60,000 types of 'VAR' and 540 'SAMPLE's. I am not sure if this is the best way to build a heatmap in python but here is what I figured out so far:
pivot = pd.crosstab(df_all['VAR'],df_all['SAMPLE'])
sns.set(font_scale=0.4)
g = sns.clustermap(pivot, row_cluster=False, yticklabels=False, linewidths=0.1, cmap="YlGnBu", cbar=False)
plt.show()
I am not sure how to get 'GROUP' to display along the x-axis, either as an additional axis or just colouring the axis labels? Any help would be much appreciated.
I'm not sure if the 'MUT' column being a boolean variable is an issue here, df_all is 'TRUE' on every 'VAR' but as pivot is made, any samples which do not have a particular 'VAR' are filled as 0, others are filled with 1. My aim was to try and cluster samples with similar 'VAR' profiles. I hope this helps.
Please let me know if I can clarify anything further. Many thanks.
Take a look at this example. You can give a list or a dataframe column to the clustermap function. By specifying either the col_colors argument or the row_colors argument you can give colours to either the rows or the columns based on that list.
In the example below I use the iris dataset and make a pandas series object that specifies which colour the specific row should have. That pandas series is given as an argument for row_colors.
import seaborn as sns
iris = sns.load_dataset("iris")
species = iris.pop("species")
# map each species to a colour ("r", "b", "g")
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
g = sns.clustermap(iris, row_colors=row_colors, row_cluster=False)
This code results in the following image.
You may need to tweak a bit further to also include a legend for the colouring for groups.
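Applied to the question's data, the same idea could look roughly like the sketch below. It assumes each SAMPLE belongs to exactly one GROUP, and the colour names are arbitrary picks. Only the colour series is built here, so the snippet needs pandas alone.

```python
import pandas as pd

# A few rows shaped like the question's dataframe.
df_all = pd.DataFrame(
    {
        "SAMPLE": ["s1", "s2", "s1", "s2", "s2", "s3"],
        "VAR": ["1_1334442_T", "1_1334442_T", "1_1480354_GAC",
                "1_1480355_C", "1_1653038_C", "1_1730932_G"],
        "GROUP": ["CC002", "CC006", "CC002", "CC006", "CC006", "CC002"],
    }
)

pivot = pd.crosstab(df_all["VAR"], df_all["SAMPLE"])

# One GROUP per SAMPLE (assumption: a sample never appears in two groups).
sample_group = df_all.drop_duplicates("SAMPLE").set_index("SAMPLE")["GROUP"]

# Assign a colour to each group; nine entries to cover the nine groups.
palette = ["tab:blue", "tab:orange", "tab:green", "tab:red", "tab:purple",
           "tab:brown", "tab:pink", "tab:gray", "tab:olive"]
lut = dict(zip(sorted(sample_group.unique()), palette))

# Align the colours with the heatmap's columns (the samples).
col_colors = sample_group.reindex(pivot.columns).map(lut)
print(col_colors)
```

Passing col_colors=col_colors to sns.clustermap(pivot, ...) should then draw a coloured strip above the columns, one colour per GROUP.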
