Python: Convert map in kilometres to degrees - python

I have a pandas Dataframe with a few million rows, each with an X and Y attribute with their location in kilometres according to the WGS 1984 World Mercator projection (created using ArcGIS).
What is the easiest way to project these points back to degrees, without leaving the Python/pandas environment?

There is already a Python module that can do this kind of transformation for you, called pyproj. Admittedly, it is not the easiest module to find via Google. Examples of its use can be found in the pyproj documentation.

Many years later, this is how I would do it, keeping everything in GeoPandas to minimise the possibility of footguns.
Some imports:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
Create a dataframe (note the values must be in metres, so multiply kilometre coordinates by 1000 first!)
df = pd.DataFrame({"X": [50e3, 900e3], "Y": [20e3, 900e3]})
Create geometries from the X/Y coordinates
df["geometry"] = df.apply(lambda row: Point(row.X, row.Y), axis=1)
Convert to a GeoDataFrame, setting the current CRS.
In this case EPSG:3395 (WGS 84 / World Mercator), the projection from the question. Note that EPSG:3857 is the similar-sounding but different Web Mercator.
gdf = gpd.GeoDataFrame(df, crs=3395)
Project it to the standard WGS84 CRS in degrees (EPSG:4326).
gdf = gdf.to_crs(4326)
And then (optionally), extract the X/Y coordinates in degrees back into standard columns:
gdf["X_deg"] = gdf.geometry.x
gdf["Y_deg"] = gdf.geometry.y
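For readers who want to see what the reprojection actually computes, here is a minimal dependency-free sketch of the inverse ellipsoidal Mercator formula on the WGS84 ellipsoid. It is an illustration only, not how pyproj or GeoPandas implement it, and the function name `world_mercator_to_degrees` is made up for this example. It assumes the data really is in World Mercator (EPSG:3395) with no false easting or northing.

```python
import math

# WGS84 ellipsoid parameters (standard published values).
WGS84_A = 6378137.0                            # semi-major axis in metres
WGS84_F = 1 / 298.257223563                    # flattening
WGS84_E = math.sqrt(WGS84_F * (2 - WGS84_F))   # first eccentricity


def world_mercator_to_degrees(x_m, y_m, tol=1e-12, max_iter=20):
    """Invert the ellipsoidal Mercator projection.

    x_m, y_m are in metres; returns (longitude, latitude) in degrees.
    """
    lon = x_m / WGS84_A
    t = math.exp(-y_m / WGS84_A)
    # Start from the spherical solution and refine by fixed-point iteration.
    lat = math.pi / 2 - 2 * math.atan(t)
    for _ in range(max_iter):
        es = WGS84_E * math.sin(lat)
        new_lat = math.pi / 2 - 2 * math.atan(
            t * ((1 - es) / (1 + es)) ** (WGS84_E / 2)
        )
        converged = abs(new_lat - lat) < tol
        lat = new_lat
        if converged:
            break
    return math.degrees(lon), math.degrees(lat)


# Coordinates stored in kilometres must be scaled to metres first.
x_km, y_km = 1001.875417, 6948.849385
lon_deg, lat_deg = world_mercator_to_degrees(x_km * 1000, y_km * 1000)
print(lon_deg, lat_deg)
```

The sample input is the EPSG:3395 coordinate that GeoPandas prints for Point(9, 53) in a later thread on this page, so the result should be very close to 9 degrees east, 53 degrees north. For millions of rows, the vectorised GeoPandas or pyproj route remains the practical choice.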

Related

Extract values from xarray dataset using geopandas multilinestring

I have a few hundred geopandas multilinestrings that trace along an object of interest (one line each week over a few years tracing the Gulf Stream) and I want to use those lines to extract values from a few other xarray datasets to know sea surface temperature, chlorophyll-a, and other variables along this path each week.
I'm unsure though how exactly to use these geopandas lines to extract values from the xarray datasets. I have thought about breaking them into points and grabbing the dataset values at each point but that seems a bit cumbersome. Is there any straightforward way to do this operation?
Breaking the lines into points and then extracting the values at those points is actually quite straightforward!
import geopandas as gpd
import numpy as np
import shapely.geometry as sg
import xarray as xr
# Setup an example DataArray:
y = np.arange(20.0)
x = np.arange(20.0)
da = xr.DataArray(
    data=np.random.rand(y.size, x.size),
    coords={"y": y, "x": x},
    dims=["y", "x"],
)
# Setup an example geodataframe:
gdf = gpd.GeoDataFrame(
    geometry=[
        sg.LineString([(0.0, 0.0), (5.0, 5.0)]),
        sg.LineString([(10.0, 10.0), (15.0, 15.0)]),
    ]
)
# Get the centroids, and create the indexers for the DataArray:
centroids = gdf.centroid
x_indexer = xr.DataArray(centroids.x, dims=["point"])
y_indexer = xr.DataArray(centroids.y, dims=["point"])
# Grab the results:
da.sel(x=x_indexer, y=y_indexer, method="nearest")
<xarray.DataArray (point: 2)>
array([0.80121949, 0.34728138])
Coordinates:
y (point) float64 3.0 13.0
x (point) float64 3.0 13.0
* point (point) int64 0 1
The main thing is to decide on which point you'd like to sample, or how many points, etc.
Note that the geometry objects in the geodataframe also have an interpolate method, if you'd like to draw values at specific points along the trajectory:
https://shapely.readthedocs.io/en/stable/manual.html#object.interpolate
In such a case, .apply can come in handy:
gdf.geometry.apply(lambda geom: geom.interpolate(3.0))
0 POINT (2.12132 2.12132)
1 POINT (12.12132 12.12132)
Name: geometry, dtype: geometry
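To make the "how many points" decision concrete, the following is a small pure-Python sketch of sampling a line at a fixed spacing. It reimplements the planar behaviour of interpolate for non-negative distances so that it runs without shapely. The helper names `interpolate_along` and `sample_points` are invented for this illustration.

```python
import math


def interpolate_along(coords, distance):
    """Return the (x, y) point at `distance` along a polyline.

    Mirrors shapely's LineString.interpolate for non-negative distances
    on planar coordinates; distances past the end are clamped to the
    last vertex.
    """
    if distance <= 0:
        return coords[0]
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and travelled + seg >= distance:
            frac = (distance - travelled) / seg
            return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))
        travelled += seg
    return coords[-1]


def sample_points(coords, spacing):
    """Points every `spacing` units along the polyline, starting at its first vertex."""
    total = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(coords, coords[1:])
    )
    n = int(total // spacing)
    return [interpolate_along(coords, i * spacing) for i in range(n + 1)]


line = [(0.0, 0.0), (5.0, 5.0)]
point = interpolate_along(line, 3.0)   # same point as geom.interpolate(3.0) above
samples = sample_points(line, 2.0)     # points at distances 0, 2, 4 and 6
```

The x and y values of the sampled points can then be wrapped in indexer DataArrays and passed to da.sel(..., method="nearest"), exactly as done with the centroids.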
I have used regionmask and it is pretty fast and easy to use. The mask_geopandas method is what you need.
Since GeoPandas uses the same conventions as Pandas, the best way is to unify the data type when you're working on it. You can do this in xarray with:
xr.Dataset.from_dataframe(df)

Find in what polygon is each point

I am new to Python, so I apologize for my rudimentary programming skills; I am aware that I am using too many for loops (a habit from Matlab that is holding me back).
I have millions of points (timestep, long, lat, pointID) and hundreds of irregular non-overlapping polygons (vertex_long, vertex_lat, polygonID).
I want to know what polygon contains each point.
I was able to do it this way:
from matplotlib import path
def inpolygon(lon_point, lat_point, lon_poly, lat_poly):
    shape = lon_point.shape
    lon_point = lon_point.reshape(-1)
    lat_point = lat_point.reshape(-1)
    lon_poly = lon_poly.values.reshape(-1)
    lat_poly = lat_poly.values.reshape(-1)
    points = [(lon_point[i], lat_point[i]) for i in range(lon_point.shape[0])]
    polys = path.Path([(lon_poly[i], lat_poly[i]) for i in range(lon_poly.shape[0])])
    return polys.contains_points(points).reshape(shape)
And then
import numpy as np
import pandas as pd
Areas_Lon = Areas.iloc[:,0]
Areas_Lat = Areas.iloc[:,1]
Areas_ID = Areas.iloc[:,2]
Unique_Areas = np.unique(Areas_ID)
Areas_true=np.zeros((Areas_ID.shape[0],Unique_Areas.shape[0]))
for i in range(Areas_ID.shape[0]):
    for ii in range(Unique_Areas.shape[0]):
        Areas_true[i,ii]=(Areas_ID[i]==Unique_Areas[ii])
Areas_Lon_Vertex=np.zeros(Unique_Areas.shape[0],dtype=object)
Areas_Lat_Vertex=np.zeros(Unique_Areas.shape[0],dtype=object)
for i in range(Unique_Areas.shape[0]):
    Areas_Lon_Vertex[i]=(Areas_Lon[(Areas_true[:,i]==1)])
    Areas_Lat_Vertex[i]=(Areas_Lat[(Areas_true[:,i]==1)])
import f_inpolygon as inpolygon
Areas_in=np.zeros((Unique_Areas.shape[0],Points.shape[0]))
for i in range(Unique_Areas.shape[0]):
    for ii in range(Points.shape[0]):
        Areas_in[i,ii]=(inpolygon.inpolygon(Points[ii,2], Points[ii,3], Areas_Lon_Vertex[i], Areas_Lat_Vertex[i]))
This way the final result, Areas_in, contains as many rows as polygons and as many columns as points, where each column is 1 (true) in the row of the polygon that contains the point (first polygon ID in the first row, and so on).
The code works, but very slowly for what it is supposed to do. When locating points in a regular grid or within a radius of a point, I have successfully implemented a KD-tree, which dramatically increases the speed, but I can't do the same, or anything faster, for irregular non-overlapping polygons.
I have seen some related questions, but rather than asking which polygon a point is in, they were about whether or not a point is inside a given polygon.
Any ideas, please?
Have you tried a GeoPandas spatial join?
Install the package using pip
pip install geopandas
or conda
conda install -c conda-forge geopandas
Then you should be able to read the data as a GeoDataFrame
import geopandas
from shapely.geometry import Point
df = geopandas.read_file("file_name1.csv") # you can read shp files too.
right_df = geopandas.read_file("file_name2.csv") # you can read shp files too.
# Convert the coordinate columns into a geometry column
# (columns read from a CSV may arrive as strings, hence the cast to float)
geometry = [Point(xy) for xy in zip(df['longitude'].astype(float), df['latitude'].astype(float))]
# Coordinate reference system: WGS84
crs = 'EPSG:4326'
# Creating a geographic data frame
left_df = geopandas.GeoDataFrame(df, crs=crs, geometry=geometry)
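Then you can apply sjoin (in GeoPandas 0.10 and later the keyword is predicate; older releases call it op):
jdf = geopandas.sjoin(left_df, right_df, how='inner', predicate='intersects', lsuffix='left', rsuffix='right')
The options for the predicate include:
intersects
contains
within
The predicate is evaluated from the left frame to the right one. With points on the left and polygons on the right, intersects and within give essentially the same matches (they differ only for points lying exactly on a polygon boundary), while contains matches nothing, because a point cannot contain a polygon.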
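To make explicit what the spatial join computes for points and polygons, here is a minimal pure-Python sketch: a ray-casting point-in-polygon test with a cheap bounding-box check in front of it. This is not how GeoPandas implements sjoin (it relies on a spatial index and compiled geometry routines). The function names and the toy data are invented for illustration.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # Does a horizontal ray going right from the point cross this edge?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def assign_polygons(points, polygons):
    """Map each point ID to the ID of the polygon containing it (or None)."""
    # Precompute bounding boxes so most polygons are rejected cheaply.
    boxes = {
        pid: (min(x for x, _ in v), min(y for _, y in v),
              max(x for x, _ in v), max(y for _, y in v))
        for pid, v in polygons.items()
    }
    result = {}
    for point_id, (x, y) in points.items():
        result[point_id] = None
        for pid, verts in polygons.items():
            xmin, ymin, xmax, ymax = boxes[pid]
            in_box = xmin <= x <= xmax and ymin <= y <= ymax
            if in_box and point_in_polygon(x, y, verts):
                # Polygons do not overlap, so the first hit is the answer.
                result[point_id] = pid
                break
    return result


polygons = {
    "A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "B": [(5, 0), (9, 0), (7, 5)],
}
points = {"p1": (1, 1), "p2": (7, 2), "p3": (10, 10)}
assignment = assign_polygons(points, polygons)
```

The result is one polygon ID per point rather than a polygons-by-points matrix. That is also the shape of the table sjoin returns, and it is far more compact for millions of points.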

How do I correctly reproject a geodataframe with multiple geometry colums?

In the geopandas documentation it says that
A GeoDataFrame may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the set_geometry method.
I'm wondering how to use such a GeoDataFrame if the goal is to flexibly reproject the geometrical data in these various columns to one or more other coordinate reference systems. Here's what I tried.
First try
import geopandas as gpd
from shapely.geometry import Point
crs_lonlat = 'epsg:4326' #geometries entered in this crs (lon, lat in degrees)
crs_new = 'epsg:3395' #geometries needed in (among others) this crs
gdf = gpd.GeoDataFrame(crs=crs_lonlat)
gdf['geom1'] = [Point(9,53), Point(9,54)]
gdf['geom2'] = [Point(8,63), Point(8,64)]
#Working: setting geometry and reprojecting for first time.
gdf = gdf.set_geometry('geom1')
gdf = gdf.to_crs(crs_new) #geom1 is reprojected to crs_new, geom2 still in crs_lonlat
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8 63)
1 POINT (1001875.417 7135562.568) POINT (8 64)
gdf.crs
Out: 'epsg:3395'
So far, so good. Things go off the rails if I want to set geom2 as the geometry column, and reproject that one as well:
#Not working: setting geometry and reprojecting for second time.
gdf = gdf.set_geometry('geom2') #still in crs_lonlat...
gdf.crs #...but this still says crs_new...
Out: 'epsg:3395'
gdf = gdf.to_crs(crs_new) #...so this doesn't do anything! (geom2 unchanged)
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8.00000 63.00000)
1 POINT (1001875.417 7135562.568) POINT (8.00000 64.00000)
OK, so apparently the .crs attribute of the gdf is not reset to its original value when changing the column that serves as the geometry; it seems the CRS is not stored for the individual columns. If that is the case, the only way I see to use reprojection with this dataframe is to backtrack: start --> select column as geometry --> reproject gdf to crs_new --> use/visualize/... --> reproject gdf back to crs_lonlat --> goto start. This is not usable if I want to visualise both columns in one figure.
Second try
My second attempt was to store the CRS with each column separately, by changing the corresponding lines in the script above to:
gdf = gpd.GeoDataFrame()
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat)
However, it soon became clear that, though initialised as a GeoSeries, these columns are normal pandas Series, and don't have a .crs attribute the same way GeoSeries do:
gdf['geom1'].crs
AttributeError: 'Series' object has no attribute 'crs'
s = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
s.crs
Out: 'epsg:4326'
Is there something I'm missing here?
Is the only solution to decide on the 'final' CRS beforehand and do all the reprojecting before adding the columns? Like so...
gdf = gpd.GeoDataFrame(crs=crs_new)
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat).to_crs(crs_new)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat).to_crs(crs_new)
#no more reprojecting done/necessary/possible! :/
...and then, when another crs is needed, rebuild the entire gdf from scratch? That can't be the way this was intended to be used.
Unfortunately, the desired behaviour is currently not possible. Due to limitations in the package, geopandas does not accommodate this use case at the moment, as can be seen in an issue in the GitHub repo.
My workaround is to not use a GeoDataFrame at all, but rather to combine a normal pandas DataFrame, for the non-shapely data, with several separate geopandas GeoSeries, for the shapely geometry data. Each GeoSeries has its own CRS and can be correctly reprojected whenever necessary.

Plotting with folium

The task is to make an address popularity map for Moscow. Basically, it should look like this:
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb
For my map I use public geojson: http://gis-lab.info/qa/moscow-atd.html
The only data I have are point coordinates; there is no information about the district each point belongs to.
Question 1:
Do I have to manually check, for each district, whether the point belongs to it, or is there a more efficient way to do this?
Question 2:
If there is no easier way, how can I get all the coordinates for each district from the geojson file (link above)?
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
Reading in the Moscow area shape file with geopandas
districts = gpd.read_file('mo-shape/mo.shp')
Construct a mock user dataset
moscow = [55.7, 37.6]
data = (
    np.random.normal(size=(100, 2)) *
    np.array([[.25, .25]]) +
    np.array([moscow])
)
my_df = pd.DataFrame(data, columns=['lat', 'lon'])
my_df['pop'] = np.random.randint(500, 100000, size=len(data))
Create Point objects from the user data
geom = [Point(x, y) for x,y in zip(my_df['lon'], my_df['lat'])]
# and a geopandas dataframe using the same crs from the shape file
my_gdf = gpd.GeoDataFrame(my_df, geometry=geom)
my_gdf.crs = districts.crs
Then the join, using the default how='inner' (the keyword is predicate in GeoPandas 0.10 and later; older releases call it op)
gpd.sjoin(districts, my_gdf, predicate='contains')
Thanks to @BobHaffner, I tried to solve the problem using geopandas.
Here are my steps:
I downloaded shapefiles for Moscow from the linked page.
From a list of tuples containing x and y (longitude and latitude) coordinates, I create a list of Points (see the shapely docs).
Assuming that the dataframe read from the shapefile holds polygons, I can write a simple loop that checks whether each Point is inside a given polygon.

Geopandas: not able to change the crs of a geopandas object

I am trying to set the crs of a geopandas object as described here.
The example file can be downloaded from here
import geopandas as gpd
import pandas as pd
df = pd.read_pickle('myShp.pickle')
I upload the screenshot to show the values of the coordinates
then if I try to change the crs the values of the polygon don't change
tmp = gpd.GeoDataFrame(df, geometry='geometry')
tmp.crs = {'init' :'epsg:32618'}
I show again the screenshot
If I try:
import geopandas as gpd
import pandas as pd
df = pd.read_pickle('myShp.pickle')
df = gpd.GeoDataFrame(df, geometry='geometry')
dfNew=df.to_crs(epsg=32618)
I get:
ValueError: Cannot transform naive geometries. Please set a crs on the object first.
Setting the crs like:
gdf.crs = {'init' :'epsg:32618'}
does not transform your data, it only sets the CRS (it basically says: "my data is represented in this CRS"). In most cases, the CRS is already set while reading the data with geopandas.read_file (if your file has CRS information). So you only need the above when your data has no CRS information yet.
If you actually want to convert the coordinates to a different CRS, you can use the to_crs method:
gdf_new = gdf.to_crs(epsg=32618)
See https://geopandas.readthedocs.io/en/latest/projections.html
Super late answer, but it's:
tmp = tmp.set_crs(...)
Use this to declare which coordinate system the data is already in, i.e. 'telling the computer what coordinate system we started in'.
Then:
tmp = tmp.to_crs(...)
Use this to convert to your new preferred CRS. Both methods return a new object by default, so assign the result.
