Pie charts and geopandas map - python

I want to represent the ethnic distribution within each region of my map.
I'm new to geopandas, and so far I can only make a map that shows the share of one single ethnic group by region.
My code is the following:
geodf.plot(column="resid_preto", cmap="Blues", figsize=(20,12),
           edgecolor='black', linewidth=0.5, alpha=0.9, legend=False)
plt.axis('off')
Here 'resid_preto' is a column that contains the share of the black population within each region.
I want to make a map like this one, so that all ethnic groups are represented in one single map instead of one map per group.

AFAIK there is no straightforward way to do this with geopandas alone, but combining it with scatter from pyplot gives the desired result.
The first trick is to obtain the x,y coordinates where each pie chart is going to be plotted. In my case I used the centroid attribute of geopandas together with its x and y attributes, i.e., the x and y coordinates are given by (i is the row index in the geopandas DataFrame):
centroids = geo_df.centroid
x_coord = centroids.x[i]
y_coord = centroids.y[i]
The next step is to use pyplot to create a pie chart and store the returned structure, which comprises, among other things, a list of patches with the wedges of the pie:
wedges=plt.pie([1,1,1])
The third step is to create a figure and axes and then plot the geopandas DataFrame onto them with the plot API, e.g.,
fig, ax = plt.subplots(1, 1)
geo_df.boundary.plot(ax=ax,zorder=1,edgecolor='black')
Finally, use scatter to add the pie charts on top of the geopandas plot. The for loop adds each of the 3 wedges obtained from plt.pie([1,1,1]) (for more wedges, change the range and the list of colors accordingly; there must be a more efficient/elegant way of doing this):
piecolors=['b','g','y']
for j in range(3):
    ax.scatter([x_coord],[y_coord],marker=(wedges[0][j].get_path().vertices.tolist()),facecolor=piecolors[j], s=800)
plt.show()
The trick is to pass a non-default marker to scatter, in this case the wedges. For each element in wedges first get the path, then the vertices and finally convert it to a list and that's the marker.
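The loop above reuses equal-sized wedges from plt.pie([1,1,1]). As a rough sketch of how the same idea might be extended to real shares, the wedge outlines can also be computed directly. The helper names wedge_vertices and draw_pie and the default marker size are made up for this illustration, and ax is assumed to be the Matplotlib Axes that already holds the map. The size correction is there because Matplotlib rescales a custom vertex marker to fit a unit cell, so a narrow wedge could otherwise come out larger than a wide one.

```python
import math


def wedge_vertices(start, stop, n=30):
    """Vertices of a unit-circle wedge covering the fractions start..stop
    of a full turn. The first vertex is the centre of the pie."""
    angles = [2 * math.pi * (start + (stop - start) * k / n) for k in range(n + 1)]
    return [(0.0, 0.0)] + [(math.cos(a), math.sin(a)) for a in angles]


def draw_pie(ax, x, y, shares, colors, size=800):
    """Draw one pie at (x, y) by stacking one scatter marker per wedge.

    ax is assumed to be a Matplotlib Axes; shares need not sum to 1.
    """
    total = float(sum(shares))
    start = 0.0
    for share, color in zip(shares, colors):
        stop = start + share / total
        verts = wedge_vertices(start, stop)
        # Compensate for Matplotlib normalising the marker to a unit cell.
        scale = max(max(abs(vx), abs(vy)) for vx, vy in verts)
        ax.scatter([x], [y], marker=verts, s=size * scale ** 2,
                   facecolor=color, zorder=2)
        start = stop
```

For the shares of one region this could be called as draw_pie(ax, x_coord, y_coord, [0.5, 0.3, 0.2], ['b', 'g', 'y']), once per row of the DataFrame.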

Related

Add labels ONLY to SELECTED data points in seaborn scatter plot

I have created a seaborn scatter plot and added a trendline to it. I have some datapoints that fall very far away from the trendline (see the ones highlighted in yellow) so I'd like to add data labels only to these points, NOT to all the datapoints in the graph.
Does anyone know what's the best way to do this?
So far I've found answers to "how to add labels to ALL data points" (see this link) but this is not my case.
In the accepted answer to the question that you reference you can see that the way they add labels to all data points is by looping over the data points and calling .text(x, y, string) on the axes. You can find the documentation for this method here (seaborn is implemented on top of matplotlib). You'll have to call this method for the selected points.
In your specific case I don't know exactly what formula you want to use to find your outliers but to literally get the ones beyond the limits of the yellow rectangle that you've drawn you could try the following:
for x,y in zip(xarr, yarr):
    if x < 5 and y > 5.5:
        ax.text(x+0.01, y, 'outlier', horizontalalignment='left', size='medium', color='black')
Where xarr is your x-values, yarr your y-values and ax the returned axes from your call to seaborn.
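If a fixed rectangle is too crude, one possible alternative is to pick the points by their vertical distance from the trendline. The sketch below assumes an ordinary least-squares straight line, which may not be the trendline seaborn actually drew for you, and the function names and the threshold are invented for the example. ax is again assumed to be the axes returned by seaborn.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def far_from_trend(xs, ys, threshold):
    """Indices of points whose vertical distance to the fitted line
    exceeds threshold (in y units)."""
    slope, intercept = fit_line(xs, ys)
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (slope * x + intercept)) > threshold]


def label_points(ax, xs, ys, indices, labels=None):
    """Call ax.text only for the selected points."""
    for i in indices:
        text = labels[i] if labels is not None else str(i)
        ax.text(xs[i] + 0.01, ys[i], text,
                horizontalalignment='left', size='medium', color='black')
```

A call such as label_points(ax, xarr, yarr, far_from_trend(xarr, yarr, 3.0)) would then label only the points more than 3 y-units away from the fitted line.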

Annotating a few points on a tSNE plot - if possible, a couple of points per cluster

I have a list of ~500 embedding vectors. Each embedding vector has length 400, which is too long to post, but this is an example of the start of one of them:
[-1.5425615, -0.52326035, 0.48309317, -1.3839878, -1.3774203, -0.44861528, 3.026304, -0.23582345, 4.3516054, -2.1284392, -3.0056703, 1.4997623, 0.51767087, -2.3668504, 0.9771546, -2.5286832, -1.1869463, -1.2889853, -4.272979...]
So there are ~500 of these vector lists in a list called 'list_of_vectors'.
There is also a list_of_labels, where each vector list is assigned to a label.
I want to plot them on a t-SNE plot, so I wrote:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2)
X_tsne = tsne.fit_transform(list_of_vectors)
The output is:
So there are ~500 dots in the below plot, each one has one label (from list_of_labels)
You can see the dots are very roughly clustered, and I want to just add a couple of labels to each rough cluster, so I know which cluster is which. Alternatively, can I colour the clusters differently and have a legend with a sample word from each cluster?
Is there a way for me to annotate/label a couple of the dots in each cluster?
Or any method that would add say 5/10 labels to the below graph, so I can understand the plot better?
It doesn't have to be super exact, I'm just trying to broadly understand the plot better.
If I understand correctly, you want to annotate some points in your graph based on the group they belong to. And you want to annotate them with the group label. If that's the case, just iterate over the groups and annotate some randomly selected points. You could do it as I did in the first script or you can just plot the scatterplot with eg seaborn with hue and then add the loop over the points with annotation (second solution). But it would be much easier to read if you also assigned different colours to your groups:
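import pandas as pd
import matplotlib.pyplot as plt
# assumed layout: one row per point, with its t-SNE coordinates and its label
data = pd.DataFrame({'x': X_tsne[:, 0], 'y': X_tsne[:, 1], 'label': list_of_labels})
# how many samples to annotate
m = 4
# create a new figure
plt.figure(figsize=(10,10))
# loop through labels and plot each cluster separately
for label in data.label.unique():
    # plot the given group
    plt.scatter(x=data.loc[data['label']==label, 'x'], y=data.loc[data['label']==label,'y'], alpha=0.5)
    # randomly sample (sample(m) raises an error if the group has fewer than m points)
    tmp = data.loc[data['label']==label].sample(m)
    # add label to some random points per group
    for _,row in tmp.iterrows():
        plt.annotate(label, (row['x'], row['y']), size=10, weight='bold', color='k')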
With seaborn:
import seaborn as sns
sns.scatterplot(x="x", y="y", hue="label", data=data)
# loop through labels and annotate each cluster
for label in data.label.unique():
    # randomly sample
    tmp = data.loc[data['label']==label].sample(m)
    # add label to some random points per group
    for _,row in tmp.iterrows():
        plt.annotate(label, (row['x'], row['y']), size=10, weight='bold', color='k')
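Random sampling can land on points at the fuzzy edge of a group. A possible variation, sketched here as an assumption rather than as part of the answer above, is to annotate the points closest to each group's mean position. The function name representative_points is invented for the example; points can be any sequence of (x, y) pairs, such as the rows of X_tsne.

```python
from collections import defaultdict


def representative_points(points, labels, m=2):
    """For each label, return the indices of the m points closest to the
    mean position of that label's points."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)

    chosen = {}
    for label, idxs in groups.items():
        cx = sum(points[i][0] for i in idxs) / len(idxs)
        cy = sum(points[i][1] for i in idxs) / len(idxs)
        # Sort this group's indices by squared distance to the group mean.
        by_distance = sorted(
            idxs,
            key=lambda i: (points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2,
        )
        chosen[label] = by_distance[:m]
    return chosen
```

Each returned index i can then be annotated with plt.annotate(label, (points[i][0], points[i][1])) in the same way as in the loops above. Groups with fewer than m points simply return all of their points.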

Clipping a Pandas dataframe to a shapefile using Python

I have a Pandas dataframe containing Latitude and Longitude data for an abundance of points, and I would like to clip them to a shapefile, i.e., everything outside the boundaries of the shapefile is dropped.
Is there a way to achieve this within a Python package such as MatPlotLib?
I have searched on Google, but 'clipping' seems to refer to data being cut off by map legends/axis rather than what I'm describing.
This image displays what I'm after - I did this in ArcMap but need to automate the process. The grey points represent the entire dataframe, while red points are 'clipped' to the green shapefile below.
Here is the code I'm running to over-lay the points in case it is of any use:
f, ax = plt.subplots(1, figsize=(12, 12))
ax.plot(Easting,Northing, 'bo', markersize=5)
IMD.plot(column='imd_score', cmap='Blues', linewidth=0.1, ax=ax)
locs, labels = plt.xticks()
plt.setp(labels, rotation=90)
plt.axis('equal')
You can try geopandas. It's a very good library that works basically like pandas dataframes, but geolocated, i.e. containing a Geometry column. You can import and export shapefiles directly with it, and then do something like an intersection between your points and the shapefile's polygons.
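To make concrete what clipping points to a polygon boils down to, here is a dependency-free sketch of the underlying test (ray casting). This is only an illustration of the idea, not how geopandas implements it; the function names and the polygon format (a list of (x, y) vertices for a single ring without holes) are assumptions of the example. A real shapefile with many polygons, holes and a coordinate reference system is better handled by geopandas itself.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at (x, y) and going right crosses. Odd means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(points, polygon):
    """Keep only the (x, y) points that fall inside the polygon."""
    return [p for p in points if point_in_polygon(p[0], p[1], polygon)]
```

With the Easting and Northing columns zipped into pairs, clip_points(list(zip(Easting, Northing)), boundary) would return the points to keep, provided boundary is in the same coordinate system as the points.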

How to express classes on the axis of a heatmap in Seaborn

I created a very simple heatmap chart with Seaborn displaying a similarity square matrix. Here is the one line of code I used:
import matplotlib.pyplot as plt
sns.heatmap(sim_mat, linewidths=0, square=True, robust=True)
plt.show()
and this is the output I get:
What I'd like to do is to show on the x and y axes not the labels of my instances but a colored indicator (imagine something like a small palplot on each axis), where each color represents another variable associated with each instance (let's say I have this info stored in a list named labels). I would also like a second legend for this information next to the one specifying the colors of the heatmap (one like that of lmplot). It is important that the two pieces of information use different color palettes.
Is this possible in Seaborn?
UPDATE
What I am looking for is a clustermap as correctly suggested.
sns.clustermap(sim_mat, row_colors=label_cols, col_colors=label_cols,
               row_cluster=False, col_cluster=False)
Here is what I am getting btw, the dots and lines are too small and I do not see a way to enlarge them in the documentation. I'd like to
Plus, how can I add a legend and place the two legends next to each other?
There are two options:
First, heatmap is an Axes-level function, so you could set up a large main heatmap axes for the correlation matrix and flank it with heatmaps that you then pass class colors to yourself. This will be a little bit of work, but gives you lots of control over how everything works.
This is more or less an option in clustermap though, so I'm going to demonstrate how to do it that way here. It's a bit of a hack, but it will work.
First, we'll load the sample data and do a bit of roundabout transformations to get colors for the class labels.
import pandas as pd
import seaborn as sns
networks = sns.load_dataset("brain_networks", index_col=0, header=[0, 1, 2])
network_labels = networks.columns.get_level_values("network")
network_pal = sns.cubehelix_palette(network_labels.unique().size,
                                    light=.9, dark=.1, reverse=True,
                                    start=1, rot=-2)
network_lut = dict(zip(map(str, network_labels.unique()), network_pal))
network_colors = pd.Series(network_labels).map(network_lut)
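In case the lookup step looks opaque, the following stripped-down sketch shows the same idea without pandas or seaborn: one color per distinct class, then one color per instance. The function name build_color_lookup and the toy labels and palette are made up for the illustration.

```python
def build_color_lookup(labels, palette):
    """Assign one palette color to each distinct label (in order of first
    appearance) and return both the lookup and the per-instance colors."""
    unique_labels = list(dict.fromkeys(labels))  # de-duplicate, keep order
    if len(palette) < len(unique_labels):
        raise ValueError("palette has fewer colors than there are classes")
    lut = dict(zip(unique_labels, palette))
    return lut, [lut[label] for label in labels]


# Toy data: five instances belonging to three classes.
lut, colors = build_color_lookup(
    ["1", "1", "2", "3", "2"],
    ["red", "green", "blue"],
)
```

The per-instance list plays the role of network_colors above, and the lookup plays the role of network_lut, which is useful again later for building a legend.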
Next we call clustermap to make the main plot.
g = sns.clustermap(networks.corr(),
                   # Turn off the clustering
                   row_cluster=False, col_cluster=False,
                   # Add colored class labels
                   row_colors=network_colors, col_colors=network_colors,
                   # Make the plot look better when many rows/cols
                   linewidths=0, xticklabels=False, yticklabels=False)
The side colors are drawn with a heatmap, which matplotlib thinks of as quantitative data and thus there's not a straightforward way to get a legend directly from it. Instead of that, we'll add an invisible barplot with the right colors and labels, then add a legend for that.
for label in network_labels.unique():
    g.ax_col_dendrogram.bar(0, 0, color=network_lut[label],
                            label=label, linewidth=0)
g.ax_col_dendrogram.legend(loc="center", ncol=6)
Finally, let's move the colorbar to take up the empty space where the row dendrogram would normally be and save the figure.
g.cax.set_position([.15, .2, .03, .45])
g.savefig("clustermap.png")
Building on the above answer, I think it's worth noting the possibility of multiple colour levels for labels - as noted in the clustermap docs ({row,col}_colors). I couldn't find an example of multiple levels, so I thought I'd share an example here.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
networks = sns.load_dataset("brain_networks", index_col=0, header=[0, 1, 2])
# network level
network_labels = networks.columns.get_level_values("network")
network_pal = sns.cubehelix_palette(network_labels.unique().size, light=.9, dark=.1, reverse=True, start=1, rot=-2)
network_lut = dict(zip(map(str, network_labels.unique()), network_pal))
# Create index using the columns for networks
network_colors = pd.Series(network_labels, index=networks.columns).map(network_lut)
# node level
node_labels = networks.columns.get_level_values("node")
node_pal = sns.cubehelix_palette(node_labels.unique().size)
node_lut = dict(zip(map(str, node_labels.unique()), node_pal))
# Create index using the columns for nodes
node_colors = pd.Series(node_labels, index=networks.columns).map(node_lut)
# Create dataframe for row and column color levels
network_node_colors = pd.DataFrame(network_colors).join(pd.DataFrame(node_colors))
# create clustermap
g = sns.clustermap(networks.corr(),
                   # Turn off the clustering
                   row_cluster=False, col_cluster=False,
                   # Add colored class labels using data frame created from node and network colors
                   row_colors=network_node_colors,
                   col_colors=network_node_colors,
                   # Make the plot look better when many rows/cols
                   linewidths=0,
                   xticklabels=False, yticklabels=False,
                   center=0, cmap="vlag")
# create two legends - one for each level by creating invisible column and row barplots (as per above)
# network legend
from matplotlib.pyplot import gcf
for label in network_labels.unique():
    g.ax_col_dendrogram.bar(0, 0, color=network_lut[label], label=label, linewidth=0)
l1 = g.ax_col_dendrogram.legend(title='Network', loc="center", ncol=5, bbox_to_anchor=(0.47, 0.8), bbox_transform=gcf().transFigure)
# node legend
for label in node_labels.unique():
    g.ax_row_dendrogram.bar(0, 0, color=node_lut[label], label=label, linewidth=0)
l2 = g.ax_row_dendrogram.legend(title='Node', loc="center", ncol=2, bbox_to_anchor=(0.8, 0.8), bbox_transform=gcf().transFigure)
plt.show()
When both dendrograms are used one can also add a new hidden axis and draw the legend there:
f = g.fig  # the Figure that the clustermap was drawn on
ax = f.add_axes((0,0,0,0))
ax.xaxis.set_visible(False)
ax.yaxis.set_visible(False)
for label in node_labels.unique():
    ax.bar(0, 0, color=node_lut[label], label=label, linewidth=0)
l2 = ax.legend(title='Node', loc="center", ncol=2, bbox_to_anchor=(0.8, 0.8), bbox_transform=f.transFigure)

How can I draw a graph or plot with 4 quadrants using Python matplotlib?

My objective is to draw a graph with 4 quadrants and plot points in it. Also, how can I divide a quadrant into several sectors? How can I do this in matplotlib: a graph/plot with 4 quadrants, with x-axis (1-9) and y-axis (1-9)?
From the question, it sounds like you want a single graph with several delineated regions with a specific xy range. This is pretty straightforward to do. You can always just draw lines on the plot to delineate the regions of interest. Here is a quick example based on your stated objectives:
import matplotlib.pyplot as plt
plt.figure()
# Set x-axis range
plt.xlim((1,9))
# Set y-axis range
plt.ylim((1,9))
# Draw lines to split quadrants
plt.plot([5,5],[1,9], linewidth=4, color='red' )
plt.plot([1,9],[5,5], linewidth=4, color='red' )
plt.title('Quadrant plot')
# Draw some sub-regions in upper left quadrant
plt.plot([3,3],[5,9], linewidth=2, color='blue')
plt.plot([1,5],[7,7], linewidth=2, color='blue')
plt.show()
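If the points also need to be told apart by quadrant (for coloring or counting, say), a small helper can do the bookkeeping. The sketch below is one possible way to organise it: the function names are invented, the quadrant numbering (1 = upper right, counter-clockwise) is a convention chosen for the example, and ax is assumed to be a Matplotlib Axes. It uses axvline and axhline, which draw lines spanning the whole axes, instead of plotting the dividers as data.

```python
def quadrant(x, y, cx=5, cy=5):
    """Quadrant of (x, y) relative to the centre (cx, cy).
    1 = upper right, 2 = upper left, 3 = lower left, 4 = lower right.
    Points exactly on a dividing line go to the upper/right side."""
    if x >= cx and y >= cy:
        return 1
    if x < cx and y >= cy:
        return 2
    if x < cx and y < cy:
        return 3
    return 4


def draw_quadrants(ax, xlim=(1, 9), ylim=(1, 9)):
    """Set the axis ranges and draw the two dividing lines through the
    middle of the ranges. Returns the centre that was used."""
    cx = (xlim[0] + xlim[1]) / 2
    cy = (ylim[0] + ylim[1]) / 2
    ax.set_xlim(*xlim)
    ax.set_ylim(*ylim)
    ax.axvline(cx, linewidth=4, color='red')
    ax.axhline(cy, linewidth=4, color='red')
    return cx, cy
```

After fig, ax = plt.subplots(), calling draw_quadrants(ax) sets up the 1-9 ranges from the question, and quadrant(x, y) can be used to pick a color for each point before passing it to ax.scatter.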
I would take a look at the AxesGrid toolkit:
http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/index.html
Perhaps the middle image at the top of this page is something along the lines of what you are looking for. There are examples on the following page in the API documentation that should be a good starting point:
http://matplotlib.sourceforge.net/mpl_toolkits/axes_grid/users/overview.html
Without an example of what you want to do exactly it is difficult to give you the best advice.
You need subplot; see this example:
http://matplotlib.sourceforge.net/examples/pylab_examples/subplot_toolbar.html
