Geopandas: Color specific clusters in existing figure from shapefile - python

I have plotted a figure from a shapefile through a function using geopandas. It represents Germany, with each federal state represented as a cluster. Now I would like to color the region with the highest electricity production and the one with the lowest, each with a different color.
I am new to geopandas and couldn't find an answer. I guess I have to create a subplot, but I'm not sure. How can I select and color each cluster? (I have already determined the name and electricity production of the max and min clusters.)
Thanks a lot in advance.

The type of plot you are trying to create is called a choropleth map. Create a data frame with the name of the federal state, the electricity production of the max and min clusters, and the geometry of each state extracted from the shapefile. With only those two rows, the plot shows just 2 states in different colors. If you want the rest of Germany as well, include the geometry of all the states and set their electricity production to NaN.
Then you can call plot() on the data frame with column='electricity_production'. Note that recent geopandas versions do not draw rows whose value is NaN unless you also pass missing_kwds (for example missing_kwds={'color': 'lightgrey'}) to plot().
df.head()
>> name, electricity_production, geometry
'cluster1', max/min, MULTIPOLYGON(((...
'cluster2', max/min, POLYGON((..
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
df.plot(column='electricity_production', ax=ax, legend=True)
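The NaN approach described above can be sketched as follows. This is a minimal sketch under assumptions, not the answerer's exact code: the helper names keep_extremes and plot_extremes are hypothetical, the demo values are made up, and missing_kwds assumes a geopandas version that supports it (0.7 or later). The masking step works on any pandas DataFrame, so it is demonstrated on a plain one; plot_extremes expects a GeoDataFrame with a geometry column.

```python
import pandas as pd
import matplotlib.pyplot as plt


def keep_extremes(df, column="electricity_production"):
    """Return a copy in which only the max and min rows keep their value."""
    out = df.copy()
    extremes = [out[column].idxmax(), out[column].idxmin()]
    # Every row that is not one of the two extremes becomes NaN.
    out[column] = out[column].where(out.index.isin(extremes))
    return out


def plot_extremes(gdf, column="electricity_production"):
    """Draw all states: grey where the value is NaN, coloured for the extremes."""
    fig, ax = plt.subplots(1, 1)
    keep_extremes(gdf, column).plot(
        column=column,
        ax=ax,
        legend=True,
        cmap="coolwarm",
        edgecolor="black",
        linewidth=0.5,
        missing_kwds={"color": "lightgrey"},  # style for the NaN rows
    )
    return ax


# Hypothetical values, only to show the masking step on a plain DataFrame.
demo = pd.DataFrame(
    {
        "name": ["cluster1", "cluster2", "cluster3", "cluster4"],
        "electricity_production": [50.0, 12.0, 80.0, 3.0],
    }
)
masked = keep_extremes(demo)
```

With a real GeoDataFrame, calling plot_extremes(gdf) followed by plt.show() should draw the whole country in grey with only the two extreme states coloured.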

Related

Create Bar Chart

How can I create a bar chart like the one below for the following question using Python?
Chart
Question: plot job role based on the number of individuals who have a master's degree AND earn <=50K in the United States
My table is below; I imported it from a CSV file.
Degree    job role  Country  earning
Bachelor  Admin     US       <=50k
Master    HR        England  >=50k
I tried many ways, but could not do it.
You can import matplotlib.pyplot as plt and numpy as np to make a bar graph. Create an array of bar heights, for example y = np.array([...]), and call plt.bar() with the bar positions (or labels) as the first argument and y as the heights.
Finally, call plt.show() to display the graph.
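The answer does not show how to get the bar heights from the table, so here is a minimal sketch of that step with pandas. The rows are hypothetical, and the column names (Degree, job_role, Country, earning) and the helper plot_counts are assumptions based on the question's table, not the asker's real CSV.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample rows; replace with pd.read_csv(...) on the real file.
df = pd.DataFrame(
    {
        "Degree": ["Bachelor", "Master", "Master", "Master", "Master", "Master"],
        "job_role": ["Admin", "HR", "HR", "Admin", "Sales", "HR"],
        "Country": ["US", "England", "US", "US", "US", "US"],
        "earning": ["<=50k", ">=50k", "<=50k", "<=50k", "<=50k", "<=50k"],
    }
)

# Keep only master's degree holders in the US earning <=50k.
mask = (
    (df["Degree"] == "Master")
    & (df["earning"] == "<=50k")
    & (df["Country"] == "US")
)

# Count how many of the remaining individuals hold each job role.
counts = df.loc[mask, "job_role"].value_counts()


def plot_counts(counts):
    """Draw one bar per job role, with the count as the bar height."""
    fig, ax = plt.subplots()
    ax.bar(counts.index, counts.values)
    ax.set(xlabel="Job role", ylabel="Number of individuals")
    return ax
```

Calling plot_counts(counts) and then plt.show() displays the chart.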

Seaborn | Matplotlib: Scatter plot: 3rd variable is encoded by both size and colour

I want to create a plot that shows the geographical distribution of nightly prices using longitude and latitude as coordinates, with the price encoded by both the colour and the size of the circles. I currently have no idea how to encode the price in both colour and size. I come to you in search of help; I don't understand the seaborn documentation for this scenario.
3 columns of interest:
longitude latitude Price
50.1235156 4.1236436 160
52.3697862 4.8935462 300
52.3640489 4.8895343 8000
52.3729765 4.8931707 1300
52.3657530 4.8796741 5000
52.2957663 4.3058365 60
52.6709324 4.6028347 100
In my scenario, each column is of equal length, but I only want to include prices that are >150.
I'm stuck with this filter in play, as the column with the applied filter is half the size of longitude and latitude.
My clueless attempt:
plt.scatter(df.longitude, df.latitude, s=(df.price)>150, c= (df.price)>150)
The way I understand it is that the latitude and longitude create the space/plane, and then apply the price data. But implementing it seems to work differently?
First of all, you need to filter the dataframe before plotting. In your attempt (which won't work anyway), s=(df.price)>150 does not filter anything: the comparison returns a boolean Series of the same length as the dataframe, so the sizes and colours would be True/False flags rather than prices. If you filtered only the price column instead, it would be shorter than the x and y coordinates, which still span the entire dataframe.
Secondly, matplotlib can encode a numeric column in both size and colour if you pass numeric arrays to s and c, but the legend takes extra work, so I'd suggest using seaborn for simplicity.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df_plot = df.loc[df.price > 150]
ax = sns.scatterplot(data=df_plot, x='longitude', y='latitude', size='price', hue='price')
plt.show()
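For comparison, here is a minimal sketch of the same idea with plain matplotlib, whose scatter accepts numeric arrays for both s (marker area) and c (colour). The rows are copied from the question; the lowercase column names follow the question's code, and the 20 to 300 size range, the viridis colormap and the helper name plot_prices are arbitrary assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "longitude": [50.1235156, 52.3697862, 52.3640489, 52.3729765,
                      52.3657530, 52.2957663, 52.6709324],
        "latitude": [4.1236436, 4.8935462, 4.8895343, 4.8931707,
                     4.8796741, 4.3058365, 4.6028347],
        "price": [160, 300, 8000, 1300, 5000, 60, 100],
    }
)

# Filter the rows first, so x, y, size and colour all have the same length.
df_plot = df.loc[df["price"] > 150]

# Rescale prices to marker areas between 20 and 300 (arbitrary range).
price = df_plot["price"]
sizes = 20 + 280 * (price - price.min()) / (price.max() - price.min())


def plot_prices(data, sizes):
    """Scatter the points with price encoded in both size and colour."""
    fig, ax = plt.subplots()
    points = ax.scatter(
        data["longitude"], data["latitude"],
        s=sizes, c=data["price"], cmap="viridis",
    )
    fig.colorbar(points, ax=ax, label="price")
    ax.set(xlabel="longitude", ylabel="latitude")
    return ax
```

Calling plot_prices(df_plot, sizes) and then plt.show() displays the figure; the colorbar stands in for the legend that seaborn would build automatically.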

Plot mean and 95% CI values using seaborn

   Region     Year    Crop   Yield        Lower CI     Upper CI
0  Argentina  2017.0  Soya   2770.885366  2647.711922  2937.259244
1  Argentina  2018.0  Soya   3442.598073  3375.280283  3512.806645
2  Argentina  2019.0  Soya   3472.638859  3415.621142  3536.144550
3  Argentina  2020.0  Maize  6203.009997  6020.164203  6387.457295
Using the dataframe above, I want to plot each row using the data in the Yield, Lower CI and Upper CI columns. The Yield value should be represented as a dot and the Lower and Upper CI values should be represented like a box plot, sort of like:
Each Crop should be represented using a different color, while each year should be a different shade of the same color for that crop. Is there a way to do this using either seaborn or matplotlib?
Here is an answer using matplotlib that will get you started; then you can tell me what else you need. Note that with the data you've given us, the plot isn't so interesting, since none of the crops have overlapping years. So far the different shades per year aren't included (are they really needed?).
import matplotlib.pyplot as plt
from matplotlib import ticker
# Get the lower and upper "error bars"
df = df.assign(Lower=df["Yield"] - df["Lower CI"], Upper=df["Upper CI"] - df["Yield"])
# Make a new figure and axes
fig, ax = plt.subplots()
# Group the data by crop, and iterate through each crop
for crop, data in df.groupby("Crop"):
    ax.errorbar(
        data["Year"],
        data["Yield"],
        yerr=(data["Lower"], data["Upper"]),
        fmt="o",
        label=crop,
        capsize=5,
    )
ax.legend()
# This sets the x-axis to have ticks on years (so we don't get e.g. 2019.5)
ax.xaxis.set_major_locator(ticker.MultipleLocator())
ax.set(xlabel="Year", ylabel="Yield")
plt.show()
Here is the plot it produces
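If the per-year shades are wanted after all, one possible sketch is to give each crop its own sequential colormap and pick one shade per year from it. The crop-to-colormap mapping, the 0.4 to 0.9 sampling range and the helper names year_shades and plot_yield are assumptions made for illustration; the rows are copied from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Rows copied from the question's table.
df = pd.DataFrame(
    {
        "Year": [2017, 2018, 2019, 2020],
        "Crop": ["Soya", "Soya", "Soya", "Maize"],
        "Yield": [2770.885366, 3442.598073, 3472.638859, 6203.009997],
        "Lower CI": [2647.711922, 3375.280283, 3415.621142, 6020.164203],
        "Upper CI": [2937.259244, 3512.806645, 3536.144550, 6387.457295],
    }
)

# Assumed mapping from crop to a sequential matplotlib colormap.
CROP_CMAPS = {"Soya": "Greens", "Maize": "Oranges"}


def year_shades(years, cmap_name):
    """Map each distinct year to one shade sampled from a single colormap."""
    years = sorted(set(years))
    cmap = plt.get_cmap(cmap_name)
    # Skip the nearly white end of the colormap so every point stays visible.
    positions = np.linspace(0.4, 0.9, len(years))
    return {year: cmap(pos) for year, pos in zip(years, positions)}


def plot_yield(df):
    """One error bar per row: colour family by crop, shade by year."""
    fig, ax = plt.subplots()
    for crop, data in df.groupby("Crop"):
        shades = year_shades(data["Year"], CROP_CMAPS[crop])
        for _, row in data.iterrows():
            ax.errorbar(
                [row["Year"]],
                [row["Yield"]],
                yerr=[[row["Yield"] - row["Lower CI"]],
                      [row["Upper CI"] - row["Yield"]]],
                fmt="o",
                capsize=5,
                color=shades[row["Year"]],
                label=f"{crop} {int(row['Year'])}",
            )
    ax.legend()
    ax.set(xlabel="Year", ylabel="Yield")
    return ax
```

Calling plot_yield(df) and then plt.show() displays the figure. With many years per crop the legend gets long, so a single legend entry per crop may read better.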

Pie charts and geopandas map

I want to represent the ethnic distribution within each region of my map.
I'm a newbie in geopandas, and so far I can only make a map that shows the share of one single ethnic group by region.
My code is the following:
geodf.plot(column="resid_preto", cmap="Blues", figsize=(20,12),
           edgecolor='black', linewidth=0.5, alpha=0.9, legend=False)
plt.axis('off')
Where 'resid_preto' is a column that contains the share of the black population within the region.
I want to make a map like this one, so that I can represent all ethnic groups in one single map instead of creating one map per group.
AFAIK there is no straightforward way to do this with geopandas. But combining it with scatter from pyplot the desired result can be obtained.
The first trick is to obtain the x,y coordinates where the pie chart is going to be plotted. In my case I used the centroid attribute of geopandas together with its x and y attributes, i.e., the x and y coordinates are given by (i is the row index in the geopandas DataFrame):
centroids = geo_df.centroid
x_coord = centroids.x[i]
y_coord = centroids.y[i]
The next step is to use pyplot to create a pie chart and store the returned structure, which comprises, among other things, a list of patches with the wedges of the pie:
wedges=plt.pie([1,1,1])
The 3rd step is to create a plot object and then plot the geopandas DataFrame with the plot API, e.g.,
fig, ax = plt.subplots(1, 1)
geo_df.boundary.plot(ax=ax,zorder=1,edgecolor='black')
Finally, use scatter to add the pie charts on top of the geopandas plot. The for loop adds each of the 3 wedges obtained from plt.pie([1,1,1]) (for more wedges, change the range and the colour list accordingly; there must be a more efficient/elegant way of doing this):
piecolors=['b','g','y']
for j in range(3):
    ax.scatter([x_coord],[y_coord],marker=(wedges[0][j].get_path().vertices.tolist()),facecolor=piecolors[j], s=800)
plt.show()
The trick is to pass a non-default marker to scatter, in this case the wedges. For each element in wedges first get the path, then the vertices and finally convert it to a list and that's the marker.
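As a variation on the same trick, the wedge outlines can be computed directly with numpy instead of being taken from a throwaway plt.pie call, which also makes unequal shares easy. This is a sketch under assumptions: the helper names wedge_markers and draw_pie are hypothetical, and draw_pie expects an existing matplotlib Axes.

```python
import numpy as np


def wedge_markers(shares, n_points=30):
    """Return one (vertices, scale) pair per share, usable as scatter markers."""
    shares = np.asarray(shares, dtype=float)
    # Cumulative angles (in radians) at which each wedge starts and stops.
    bounds = 2 * np.pi * np.concatenate([[0.0], np.cumsum(shares / shares.sum())])
    markers = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        theta = np.linspace(start, stop, n_points)
        arc = np.column_stack([np.cos(theta), np.sin(theta)])
        # Wedge outline: the centre followed by the points along the arc.
        verts = np.vstack([[0.0, 0.0], arc])
        # scatter rescales each custom marker by its largest coordinate,
        # so keep that extent to give all wedges the same radius later.
        markers.append((verts, np.abs(verts).max()))
    return markers


def draw_pie(ax, x, y, shares, colors, size=800):
    """Draw one pie at (x, y) on an existing Axes, one scatter call per wedge."""
    for (verts, scale), color in zip(wedge_markers(shares), colors):
        ax.scatter([x], [y], marker=verts, s=scale**2 * size, facecolor=color)


# Three equal wedges, matching plt.pie([1,1,1]) in the answer above.
markers = wedge_markers([1, 1, 1])
```

A call such as draw_pie(ax, x_coord, y_coord, [1, 1, 1], ['b', 'g', 'y']) for each region would then replace the inner loop.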

Clipping a Pandas dataframe to a shapefile using Python

I have a Pandas dataframe containing Latitude and Longitude data for an abundance of points, and I would like to clip them to a shapefile, i.e., everything outside the boundaries of the shapefile is dropped.
Is there a way to achieve this within a Python package such as MatPlotLib?
I have searched on Google, but 'clipping' seems to refer to data being cut off by map legends/axis rather than what I'm describing.
This image displays what I'm after - I did this in ArcMap but need to automate the process. The grey points represent the entire dataframe, while red points are 'clipped' to the green shapefile below.
Here is the code I'm running to over-lay the points in case it is of any use:
f, ax = plt.subplots(1, figsize=(12, 12))
ax.plot(Easting,Northing, 'bo', markersize=5)
IMD.plot(column='imd_score', cmap='Blues', linewidth=0.1, ax=ax)
locs, labels = plt.xticks()
plt.setp(labels, rotation=90)
plt.axis('equal')
You can try geopandas. It's a very good library that works much like pandas dataframes, but georeferenced, i.e. with a geometry column. You can import and export shapefiles directly with it, and then use a spatial operation such as an intersection to keep only the points inside the shapefile.
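A minimal sketch of that suggestion, assuming a geopandas version that provides geopandas.clip and points_from_xy with a crs argument. The helper name clip_points, the demo coordinates and the square boundary are hypothetical; the Latitude and Longitude column names are taken from the question, and EPSG:4326 is an assumed coordinate reference system.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import box


def clip_points(df, boundary, lon_col="Longitude", lat_col="Latitude",
                crs="EPSG:4326"):
    """Return only the rows of df whose point lies inside the boundary layer."""
    points = gpd.GeoDataFrame(
        df.copy(),
        geometry=gpd.points_from_xy(df[lon_col], df[lat_col]),
        crs=crs,
    )
    # Both layers must share a coordinate reference system before clipping.
    return gpd.clip(points, boundary.to_crs(points.crs))


# Hypothetical demo data: two points inside a unit square, one outside.
df = pd.DataFrame({"Latitude": [0.5, 0.2, 2.0], "Longitude": [0.5, 0.8, 2.0]})
boundary = gpd.GeoDataFrame(geometry=[box(0, 0, 1, 1)], crs="EPSG:4326")
clipped = clip_points(df, boundary)
```

In practice the boundary would come from gpd.read_file() on the shapefile rather than from a hand-built box. The result is a GeoDataFrame, so clipped.plot(ax=ax) can draw the kept points on top of the polygon layer.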
