Clipping a Pandas dataframe to a shapefile using Python

I have a Pandas dataframe containing latitude and longitude data for a large number of points, and I would like to clip them to a shapefile, i.e., everything outside the boundaries of the shapefile is dropped.
Is there a way to achieve this with a Python package such as Matplotlib?
I have searched on Google, but 'clipping' seems to refer to data being cut off by map legends/axis rather than what I'm describing.
This image displays what I'm after - I did this in ArcMap but need to automate the process. The grey points represent the entire dataframe, while red points are 'clipped' to the green shapefile below.
Here is the code I'm running to overlay the points, in case it is of any use:
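import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(12, 12))
ax.plot(Easting, Northing, 'bo', markersize=5)  # the points to be clipped
# IMD is a GeoDataFrame; current geopandas uses cmap= and ax= (not colormap= / axes=)
IMD.plot(column='imd_score', cmap='Blues', linewidth=0.1, ax=ax)
locs, labels = plt.xticks()
plt.setp(labels, rotation=90)
plt.axis('equal')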

You can try geopandas. It's a very good library that works essentially like pandas DataFrames, but geolocated, i.e., containing a geometry column. You can read and write shapefiles directly with it, and then do a spatial operation such as an intersection to keep only the points that fall inside your polygons.
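If you would rather stay with pandas and Matplotlib, as the question asks, `matplotlib.path.Path.contains_points` performs a point-in-polygon test that can filter the dataframe directly. This is a swapped-in technique, not geopandas: a minimal sketch assuming the boundary is a single polygon given as a list of (x, y) vertices in the same coordinate system as the points (for a real shapefile you would first read the polygon's exterior coordinates, e.g. with geopandas or shapely; holes and multi-part polygons are not handled here). `clip_points_to_polygon` and the column names are hypothetical. With geopandas itself, the equivalent is to build a GeoDataFrame with `gpd.points_from_xy` and call `gpd.clip` (or a spatial join) against the shapefile's polygons.

```python
import pandas as pd
from matplotlib.path import Path


def clip_points_to_polygon(df, polygon_xy, x_col="Longitude", y_col="Latitude"):
    """Keep only the rows of df whose (x_col, y_col) point lies inside polygon_xy.

    polygon_xy is a sequence of (x, y) vertices in the same coordinate system
    as the points; Matplotlib treats the path as implicitly closed.
    """
    boundary = Path(polygon_xy)
    points = df[[x_col, y_col]].to_numpy()
    inside = boundary.contains_points(points)  # boolean array, one entry per row
    return df[inside]


# Hypothetical points and a unit-square "shapefile" boundary
df = pd.DataFrame({"Longitude": [0.5, 2.0, 0.2], "Latitude": [0.5, 0.5, 0.9]})
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
clipped = clip_points_to_polygon(df, square)
print(clipped)  # rows 0 and 2 are inside the square; row 1 is dropped
```

Because the result is an ordinary boolean-indexed DataFrame, the clipped points can be plotted with the same `ax.plot` call as in the question.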

Related

Plot time series of paired columns

I have an Excel file and would like to create time series plots. For a quick view of the data for one site, a captured image is given below. I would like to plot two categories of data: one is modelled and the other is monitoring data. For example, "pH" and "pH data" on one plot, "WQ[SO4,Dissolved]" and "WQ data[SO4,Dissolved]" on another, and so on for all the remaining 30 pairs. That means 60 columns of data to plot.
My approach was:
1) read the Excel data as a DataFrame;
2) create a list for each category of parameters to plot;
3) use the zip function to create a parallel list: parameters_pair = zip(parameters_model, parameters_monitor);
4) plot, with some of the code shown below.
for i, j in parameters_pair:
    fig = plt.figure(figsize=(10, 7))
    ax.plot(df_Plot['Tstamps'], df_Plot[i],
            label=site, color='blue', linestyle='solid')  # fillStyle='none'
    ax.plot(df_Plot['Tstamps'], df_Plot[j],
            label=site, color='orange', marker='s', markersize=4, linestyle='')
My code can plot i or j individually, but it does not put the modelled and monitoring data on one plot and iterate in parallel as expected. Could you please suggest what functions to use to solve this issue? Thank you very much.
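Two things in the snippet above are likely culprits, sketched here under assumptions since the full script isn't shown: `ax` is never taken from the figure just created, so each call draws onto some earlier axes; and in Python 3 `zip()` returns a one-shot iterator, so `parameters_pair` is empty the second time it is looped over. The sketch below creates a fresh figure and axes per pair with `plt.subplots` and zips the two lists directly in the `for` statement. `plot_pairs` and the small `df_Plot` are hypothetical stand-ins for the question's Excel data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_pairs(df, model_cols, monitor_cols, time_col="Tstamps"):
    """Draw one figure per (modelled, monitored) column pair and return the figures."""
    figures = []
    # zip() is consumed here, once, so it cannot be exhausted by an earlier loop
    for model_col, monitor_col in zip(model_cols, monitor_cols):
        fig, ax = plt.subplots(figsize=(10, 7))  # new axes bound to this figure
        ax.plot(df[time_col], df[model_col], color="blue",
                linestyle="solid", label=model_col)
        ax.plot(df[time_col], df[monitor_col], color="orange", marker="s",
                markersize=4, linestyle="", label=monitor_col)
        ax.set_title(model_col)
        ax.legend()
        figures.append(fig)
    return figures


# Hypothetical data with two of the question's column pairs
df_Plot = pd.DataFrame({
    "Tstamps": pd.date_range("2020-01-01", periods=5, freq="D"),
    "pH": [7.0, 7.1, 7.2, 7.1, 7.0],
    "pH data": [7.05, 7.0, 7.3, 7.1, 6.9],
    "WQ[SO4,Dissolved]": [12.0, 12.5, 13.1, 12.8, 12.2],
    "WQ data[SO4,Dissolved]": [11.8, 12.9, 13.0, 12.4, 12.6],
})
figs = plot_pairs(df_Plot, ["pH", "WQ[SO4,Dissolved]"],
                  ["pH data", "WQ data[SO4,Dissolved]"])
```

With 30 pairs you may want to call `fig.savefig(...)` and `plt.close(fig)` inside the loop so that open figures do not accumulate.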

Is there a way to add a line plot on top of all plots within a Catplot grid in Seaborn/Python?

Hello I am very new to using python, I am starting to use it for creating graphs at work (for papers and reports etc). I was just wondering if someone could help with the problem which I have detailed below? I am guessing there is a very simple solution but I can't figure it out and it is driving me insane!
Basically, I am plotting the results from an experiment whereby the y-axis shows the result, a numeric value (Result), against the x-axis, which is categorical and labeled Location. The data are then split across four graphs based on which machine the experiment was carried out on (Machine, also categorical).
This first part is easy; the code used is:
sns.catplot(x='Location', y='Result', data=df3, hue='Machine', col='Machine', col_wrap=2, linewidth=2, kind='swarm')
this provides me with the following graph:
I now want to add another layer to the plot: a red line representing the upper spec limit for the data.
So I add the following line of code to the above:
sns.lineplot(x='Location', y=1.8, data=df3, linestyle='--', color='r', linewidth=2)
This then gives the following graph:
As you can see the red line which I want is only on one of the graphs, all I want to do is add the same red line across all four graphs in the exact same position etc.
Can anyone help me???
You could use .map to draw a horizontal line on each of the subplots. You need to capture the FacetGrid object returned by catplot in a variable.
Here is an example:
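import matplotlib.pyplot as plt
import seaborn as sns

titanic = sns.load_dataset('titanic').dropna()
# Keep the FacetGrid returned by catplot so every facet can be addressed
g = sns.catplot(x='class', y='age', data=titanic,
                hue='embark_town', col='embark_town', col_wrap=2, linewidth=2, kind='swarm')
# FacetGrid.map makes each facet the current axes in turn and calls plt.axhline there
g.map(plt.axhline, y=50, ls='--', color='r', linewidth=2)
plt.tight_layout()
plt.show()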
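Applied to the question's data, the same idea can be written as an explicit loop over the facets, which avoids relying on the per-facet state that `map` sets up. A sketch, assuming a `df3` with `Location`, `Result` and `Machine` columns (the small DataFrame below is made up for illustration) and an upper spec limit of 1.8 as in the question. Recent seaborn versions (0.11.2 and later) also provide `FacetGrid.refline`, e.g. `g.refline(y=1.8, color='r')`, which draws the same reference line on every facet.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the question's df3: 4 machines x 3 locations x 2 repeats
df3 = pd.DataFrame({
    "Location": ["A", "B", "C"] * 8,
    "Result": [1.2, 1.5, 1.7, 1.1, 1.6, 1.4] * 4,
    "Machine": ["M1"] * 6 + ["M2"] * 6 + ["M3"] * 6 + ["M4"] * 6,
})

g = sns.catplot(x="Location", y="Result", data=df3, hue="Machine", col="Machine",
                col_wrap=2, linewidth=2, kind="swarm")

USL = 1.8  # upper spec limit from the question
# With col_wrap, g.axes is a 1-D array; .flat also works for 2-D grids
for ax in g.axes.flat:
    ax.axhline(USL, linestyle="--", color="r", linewidth=2)
```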

Pie charts and geopandas map

I want to represent the ethnic distribution within each region of my map.
I'm a newbie to geopandas, and so far I can only make a map that shows the share of a single ethnic group by region.
My code is the following:
geodf.plot(column="resid_preto", cmap="Blues", figsize=(20,12),
edgecolor='black', linewidth=0.5, alpha=0.9, legend=False)
plt.axis('off')
where 'resid_preto' is a column that contains the share of the Black population within each region.
I want to make a map like this one, so that I can represent all ethnic groups in one single map instead of creating one map per group.
AFAIK there is no straightforward way to do this with geopandas alone, but combining it with scatter from pyplot gives the desired result.
The first trick is to obtain the x, y coordinates where each pie chart will be plotted. In my case I used the centroid property of geopandas together with its x and y attributes, i.e., the x and y coordinates are given by (i is the row position in the GeoDataFrame):
centroids = geo_df.centroid
x_coord = centroids.x.iloc[i]
y_coord = centroids.y.iloc[i]
The next step is to use pyplot to create a pie chart and store the returned structure, which comprises, among other things, a list of patches with the wedges of the pie:
wedges=plt.pie([1,1,1])
The 3rd step is to create a plot object and then plot the geopandas DataFrame with the plot API, e.g.,
fig, ax = plt.subplots(1, 1)
geo_df.boundary.plot(ax=ax,zorder=1,edgecolor='black')
Finally, use scatter to add the pie charts on top of the geopandas plot. The for loop adds each of the 3 wedges obtained from plt.pie([1,1,1]) (for more wedges, just change this; there must be a more efficient/elegant way of doing this):
piecolors = ['b', 'g', 'y']
for j in range(3):
    # use the outline of the j-th wedge as a custom scatter marker
    ax.scatter([x_coord], [y_coord], marker=wedges[0][j].get_path().vertices.tolist(),
               facecolor=piecolors[j], s=800)
plt.show()
The trick is to pass a non-default marker to scatter, in this case the wedges. For each element in wedges first get the path, then the vertices and finally convert it to a list and that's the marker.
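The loop above uses plt.pie([1,1,1]), so every wedge is the same size; for the ethnic-share map the wedges need to follow each region's shares. The sketch below is an assumption-laden variant of the same marker trick: instead of borrowing vertices from plt.pie, it builds each wedge's vertices with NumPy and, following the approach in Matplotlib's "scatter plot with pie chart markers" gallery example, scales `s` to undo the automatic rescaling Matplotlib applies to custom markers (they are normalised by their largest absolute coordinate). `wedge_vertices`, `draw_pie` and the region data are hypothetical; for a GeoDataFrame you would loop over the rows, using the centroid x/y and that row's share columns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def wedge_vertices(shares, n_arc=30):
    """Return one closed (N, 2) vertex array per wedge, with angles proportional to shares."""
    shares = np.asarray(shares, dtype=float)
    edges = 2 * np.pi * np.concatenate([[0.0], np.cumsum(shares) / shares.sum()])
    wedges = []
    for start, stop in zip(edges[:-1], edges[1:]):
        theta = np.linspace(start, stop, n_arc)
        arc = np.column_stack([np.cos(theta), np.sin(theta)])
        # start and end at the pie centre so the polygon is a closed sector
        wedges.append(np.vstack([[0.0, 0.0], arc, [0.0, 0.0]]))
    return wedges


def draw_pie(ax, x, y, shares, colors, size=800):
    """Draw a pie at data position (x, y) using one scatter marker per wedge."""
    for verts, color in zip(wedge_vertices(shares), colors):
        # Matplotlib rescales custom markers by their largest |coordinate|,
        # so scale s by that value squared to keep all wedges the same radius
        scale = np.abs(verts).max()
        ax.scatter([x], [y], marker=verts, s=size * scale**2, facecolor=color)


fig, ax = plt.subplots()
# Hypothetical regions: (centroid x, centroid y) and population shares per group
regions = [((0.0, 0.0), [0.5, 0.3, 0.2]), ((3.0, 1.0), [0.1, 0.6, 0.3])]
for (cx, cy), shares in regions:
    draw_pie(ax, cx, cy, shares, colors=["b", "g", "y"])
```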

Data not plotting, but no errors

I am trying to plot some precipitation data. The code I'm using is modified slightly from this code here.
The code works fine when I plot the data from the site used in the link, but when I use a different dataset I have, it doesn't plot. The biggest difference between my dataset and the one used in the link's example is that my dataset is global. My dataset is also netCDF, is not masked, and I am loading it the same way as in the example.
I am familiar with the data and know for a fact I should be seeing something and the contour values used in the example are reasonable for this other set of data I am using.
My code is the same, except for some changes in the section that plots the figure (below), which I have modified to plot a specific area instead of CONUS as in the example (using ax.set_extent).
When I do not set the extent, it appears to plot the data, but none of the boundaries (coastlines, state lines, etc.) plot. Based on this, I'm guessing it's something with either the dataset itself, set_extent, or a combination of things that is causing it to go wrong. I am not getting any errors either way. However, there might be something else I'm missing.
In the end, I'm actually comparing my dataset to the dataset used in the example link, so I would like them in the same projection.
Thanks for any insight and let me know if you need more information about the data itself!
import matplotlib.pyplot as plt
import cartopy.feature as cfeature

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.set_extent((x1, x0, y0, y1))
# draw coastlines, state and country boundaries, edge of map.
ax.coastlines()
ax.add_feature(cfeature.BORDERS)
ax.add_feature(cfeature.STATES)
cs1 = ax.contourf(ym, xm, data1, clevs, cmap=cmap, norm=norm)
# add colorbar.
cbar = plt.colorbar(cs1, orientation='horizontal')
#cbar.set_label(data1.units)
#ax.set_title(prcpvar.long_name + ' for period ending ' + nc.creation_time)
# save before show(): after show() the figure may already be closed and save as blank
plt.savefig('ncep_model')
plt.show()
Results when extent is not included in code above:
Edit 1:
I'll add that I was able to successfully plot the data with this code below (from a default template I made). I tried to change the projection to stereographic, but I was having trouble getting it to plot correctly using basemap because I've never used it before. As an alternative, if you can't figure out the error with the code above and could instead help with changing the projection for the code below, I would also take that. At this point I just want my data to plot correctly in the correct projection I want!
(I also included the results for the code below to confirm that the data should be showing up in this location)
LLlat = 40.
LLlon = 263.
URlat = 44.
URlon = 270.
lat = xm
lon = ym
%matplotlib inline
plt.figure(1,figsize=(10, 8),)
plt.title('Convective Precipitation 8/28/2018 0Z (in) Valid July 2018')
map = Basemap(projection='cyl',\
llcrnrlat=LLlat,urcrnrlat=URlat,\
llcrnrlon=LLlon,urcrnrlon=URlon,\
rsphere=6371200.,resolution='i')
map.drawcoastlines(linewidth=0.5) # Draw some coastlines
map.drawstates(linewidth=0.5) # Draw state boundaries
map.drawrivers(color='#000000')
map.drawparallels(np.arange(-90.,91.,30),labels=[1,0,0,0]) # Drawing lines of latitude
map.drawmeridians(np.arange(0.,330.,60),labels=[0,0,0,1]) # Drawing lines of longitude
lons,lats = map(lon,lat) # Setting up the grid in cylindrical coords.
cs = plt.contourf(lons,lats,data1[:,:], clevs,cmap=cmap, norm=norm)
cb = plt.colorbar(cs,orientation='horizontal')
plt.show()
Edit 2:
I've added the resulting plot when I don't include the set_extent in the first chunk of code (Don't know if that will help at all, but thought I'd include it as well)
It would be really useful to have more information on your data, such as a link to a sample file, but my guess is that your data do not give coordinates in a stereographic projection, unlike the original data. When plotting with Cartopy, unless you specify otherwise, all plot commands assume that the x, y values given are in the projection specified for the axes (for the original code this was ccrs.Stereographic). If this is not the case, such as when plotting longitude/latitude values, you need to say so by passing transform to the plotting command. Below, I specify that the x, y values are longitude/latitude:
import cartopy.crs as ccrs

data_proj = ccrs.PlateCarree()  # x, y are longitude/latitude in degrees
cs1 = ax.contourf(ym, xm, data1, clevs, cmap=cmap, norm=norm,
                  transform=data_proj)
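To see why a missing transform makes the data vanish, it helps to compare magnitudes. This is a hedged illustration assuming a north-polar stereographic projection on a sphere (the original example's exact projection parameters are not shown here); `polar_stereographic` is a hypothetical helper implementing the standard spherical forward formulas, not a Cartopy function. A point at 263°E, 40°N lies millions of metres from the projection origin, whereas without `transform=ccrs.PlateCarree()` Cartopy reads the raw values 263 and 40 as projected metres, squeezing the whole dataset into a patch a few hundred metres across near the origin. That patch is far outside an extent set over the region of interest, and when no extent is set the axes zoom in on a spot with no coastlines or state lines in view.

```python
import numpy as np

EARTH_RADIUS_M = 6371200.0  # sphere radius used in the question's Basemap call


def polar_stereographic(lon_deg, lat_deg, central_lon_deg=0.0, radius=EARTH_RADIUS_M):
    """Forward north-polar stereographic projection on a sphere (true scale at the pole)."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon0 = np.radians(central_lon_deg)
    rho = 2.0 * radius * np.tan(np.pi / 4.0 - lat / 2.0)  # distance from the pole
    x = rho * np.sin(lon - lon0)
    y = -rho * np.cos(lon - lon0)
    return x, y


# Lower-left corner of the question's region (lon 263 E, lat 40 N)
x, y = polar_stereographic(263.0, 40.0)
print(f"projected: x={float(x):,.0f} m, y={float(y):,.0f} m")
print("raw lon/lat misread as metres:", (263.0, 40.0))
```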

Change certain squares in a seaborn heat map

Say I have a heat map that looks like this (axes are trimmed off):
I want to be able to alter certain squares to denote statistical significance. I know that I could mask out the squares that are not statistically significant, but I still want to retain that information (and not set the values to zero). Options for doing this include 1) making the text on certain squares bold, 2) adding a hatch-like functionality so that certain squares have stippling, or 3) adding a symbol to certain squares.
What should I do?
One approach is to access the Text objects directly and change their weight/style. The code below loads some sample data and makes every entry equal to 118 stand out:
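import matplotlib.pyplot as plt
import seaborn as sns

flights = sns.load_dataset("flights")
# keyword arguments are required by DataFrame.pivot in pandas 2.x
flights = flights.pivot(index="month", columns="year", values="passengers")
ax = sns.heatmap(flights, annot=True, fmt="d")
for text in ax.texts:
    text.set_size(14)
    # emphasise every annotation whose label is exactly "118"
    if text.get_text() == '118':
        text.set_size(18)
        text.set_weight('bold')
        text.set_style('italic')
plt.show()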
I'm not a matplotlib/seaborn expert, but it appears to me that hatching an individual cell in the heatmap would require a bit of work. In short, seaborn draws the heatmap as a single matplotlib collection (a QuadMesh created by pcolormesh), and the hatching of a collection can only be set on the collection as a whole. To hatch an individual cell, you need it to be a distinct patch, and things get messy. Perhaps (hopefully) someone more knowledgeable than I can come along and say that this is wrong and that it's quite easy, but if I had to guess, I'd say that changing the text style will be easier than setting a hatch.
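One way around the collection limitation is to leave the heatmap untouched and overlay an unfilled, hatched `Rectangle` patch on each cell you want to emphasise. A minimal sketch, assuming seaborn's usual layout in which cell (row r, column c) spans data coordinates x from c to c+1 and y from r to r+1 (the y-axis is merely inverted for display); `hatch_cells` is a hypothetical helper, not a seaborn API, and the threshold `x > 1` stands in for whatever significance test you use.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import numpy as np
import seaborn as sns


def hatch_cells(ax, mask, hatch="///", color="black"):
    """Overlay an unfilled, hatched rectangle on every heatmap cell where mask is True."""
    rows, cols = np.nonzero(mask)
    for r, c in zip(rows, cols):
        # cell (r, c) occupies the unit square whose lower-left corner is (c, r)
        ax.add_patch(Rectangle((c, r), 1, 1, fill=False, hatch=hatch, edgecolor=color))
    return len(rows)


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
ax = sns.heatmap(x, annot=True, fmt=".1f")
n = hatch_cells(ax, x > 1)  # hatch the cells that pass the (hypothetical) test
```

The hatch line width is controlled globally by `rcParams["hatch.linewidth"]` if the default stippling is too faint.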
You could plot twice, applying a mask to the cells you do not want to emphasize the second time:
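import numpy as np
import seaborn as sns
x = np.random.randn(10, 10)
# share vmin/vmax so both layers use the same colour scale
vmin, vmax = x.min(), x.max()
sns.heatmap(x, annot=True, vmin=vmin, vmax=vmax)
# second pass draws only cells >= 1 (the mask hides the rest), with bold annotations
sns.heatmap(x, mask=x < 1, cbar=False, vmin=vmin, vmax=vmax,
            annot=True, annot_kws={"weight": "bold"})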
