Geopandas: not able to change the crs of a geopandas object - python

I am trying to set the crs of a geopandas object as described here.
The example file can be downloaded from here
import pandas as pd
import geopandas as gpd
df = pd.read_pickle('myShp.pickle')
I uploaded a screenshot to show the values of the coordinates.
Then, if I try to change the CRS, the values of the polygon don't change:
tmp = gpd.GeoDataFrame(df, geometry='geometry')
tmp.crs = {'init' :'epsg:32618'}
The screenshot again shows the same coordinate values.
If I try:
import pandas as pd
import geopandas as gpd
df = pd.read_pickle('myShp.pickle')
df = gpd.GeoDataFrame(df, geometry='geometry')
dfNew = df.to_crs(epsg=32618)
I get:
ValueError: Cannot transform naive geometries. Please set a crs on the object first.

Setting the crs like:
gdf.crs = {'init' :'epsg:32618'}
does not transform your data, it only sets the CRS (it basically says: "my data is represented in this CRS"). In most cases, the CRS is already set while reading the data with geopandas.read_file (if your file has CRS information). So you only need the above when your data has no CRS information yet.
If you actually want to convert the coordinates to a different CRS, you can use the to_crs method:
gdf_new = gdf.to_crs(epsg=32618)
See https://geopandas.readthedocs.io/en/latest/projections.html
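A minimal sketch of the distinction, assuming a reasonably recent GeoPandas (one that has `set_crs`) and a single made-up point given in longitude/latitude. Declaring the CRS leaves the numbers alone, `to_crs` transforms them, and calling `to_crs` before any CRS is declared raises the same `ValueError` as in the question. The point sits on the central meridian of UTM zone 18N (EPSG:32618) at the equator, so it should land at roughly easting 500000 m, northing 0 m.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data: one lon/lat point on the central meridian (75 degrees W) of UTM zone 18N
gdf = gpd.GeoDataFrame({"name": ["a"]}, geometry=[Point(-75, 0)])

# 1) A "naive" frame (crs is None) cannot be reprojected
naive_error = None
try:
    gdf.to_crs(epsg=32618)
except ValueError as err:
    naive_error = str(err)  # "Cannot transform naive geometries. Please set a crs ..."

# 2) Declaring the CRS only labels the data: the coordinates are untouched
labelled = gdf.set_crs(epsg=4326)

# 3) to_crs performs the actual transformation, here into UTM 18N (metres)
utm = labelled.to_crs(epsg=32618)
first = utm.geometry.iloc[0]  # easting close to 500000 m, northing close to 0 m
```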

Super late answer, but it's:
tmp.set_crs(...)
Use this to define whatever coordinate system the data is already in, i.e. 'telling the computer which coordinate system we started in'.
Then:
tmp.to_crs(...)
Use this to change to your new preferred CRS.
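The two calls can be chained. One more detail worth knowing, shown here with a made-up point: `set_crs` raises a `ValueError` if the data already carries a different CRS, and only relabels it (still without transforming anything) when you pass `allow_override=True`. This is a sketch assuming a recent GeoPandas.

```python
import geopandas as gpd
from shapely.geometry import Point

tmp = gpd.GeoDataFrame(geometry=[Point(-75, 0)])  # hypothetical data, no CRS yet

# Declare the source CRS, then reproject (both return new objects by default)
reprojected = tmp.set_crs("EPSG:4326").to_crs("EPSG:32618")

# set_crs refuses to silently relabel data that already has a different CRS ...
try:
    reprojected.set_crs("EPSG:4326")
    refused = False
except ValueError:
    refused = True

# ... unless allow_override=True, which relabels WITHOUT transforming coordinates
relabelled = reprojected.set_crs("EPSG:4326", allow_override=True)
```

Because `allow_override=True` only changes the label, it is the right tool for fixing a wrongly declared CRS and the wrong tool for converting coordinates.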

Related

How to save a GeoDataFrame with many geometry columns (polygon, point and linestring) to a GeoJSON file (or a CSV file)?

I am trying to save the following geodataframe with columns (geometry, area, centroid, and boundary) to a json file using df.to_file('result.geojson', driver="GeoJSON"):
However, I get the following error because I have centroid and boundary columns.
TypeError: Cannot interpret '<geopandas.array.GeometryDtype object at 0x7fb7fff86940>' as a data type
This works perfectly fine when there is only geometry column and area column.
Also, when I try to save it as a CSV file and then read it back as a GeoDataFrame, I can only convert geometry into the geometry datatype. However, centroid and boundary show up as object datatype. How do I convert them to the geometry datatype?
As per the comments, GeoJSON only supports one geometry per feature, and a GeoDataFrame only has one active geometry column at a time.
Hence your data frame has the polygons as its geometry, and the other columns are series of objects (shapely objects).
I have simulated a data set. This clearly demonstrates that your data structure is not normalised: boundary, centroid and area are calculated columns, so in relational terms the table is not normalised.
This can be saved as CSV; the shapely objects will be encoded as WKT.
It can then simply be loaded, with a second step to decode the WKT back into shapely objects.
I have demonstrated that this works by plotting the geometries loaded from the CSV.
import geopandas as gpd
import pandas as pd
import shapely.wkt  # needed for shapely.wkt.loads below
from pathlib import Path
# note: gpd.datasets was removed in geopandas 1.0; substitute your own file there
gdf = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
# derive columns in question...
gdf["boundary"] = gdf["geometry"].apply(lambda p: p.boundary )
gdf["centroid"] = gdf["geometry"].apply(lambda p: p.centroid )
gdf["area"] = gdf["geometry"].apply(lambda p: p.area )
# save as CSV; the geometry columns are written out as WKT text
gdf.to_csv(Path.cwd().joinpath("SO_geom.csv"), index=False)
# load encoded dataframe
df = pd.read_csv(Path.cwd().joinpath("SO_geom.csv"))
# decode geometry columns as strings back into shapely objects
for c in ["geometry","boundary","centroid"]:
df[c] = df[c].apply(shapely.wkt.loads)
# finally reconstruct geodataframe
gdf = gpd.GeoDataFrame(df, geometry="geometry")
# show it has worked
gdf.plot()
gpd.GeoSeries(gdf["boundary"]).plot()
gpd.GeoSeries(gdf["centroid"]).plot()
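If the file has to be GeoJSON, one option (a sketch, not the only way) is to keep the polygons as the active geometry and store the extra geometry columns as WKT text properties. The sketch uses a made-up unit square instead of the Natural Earth data, writes to in-memory buffers instead of files, and relies on `GeoSeries.to_wkt` / `GeoSeries.from_wkt`, which I believe need GeoPandas 0.9 or newer. Encoding to WKT explicitly also avoids depending on how `to_csv` happens to stringify geometries.

```python
import io
import json

import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical stand-in data: one unit-square polygon plus derived columns
gdf = gpd.GeoDataFrame({"name": ["square"]}, geometry=[box(0, 0, 1, 1)])
gdf["boundary"] = gdf.boundary
gdf["centroid"] = gdf.centroid
gdf["area"] = gdf.area

extra_geoms = ["boundary", "centroid"]
geom_cols = ["geometry"] + extra_geoms

# GeoJSON: only the active geometry stays a geometry, the others become WKT strings
for_geojson = gdf.copy()
for c in extra_geoms:
    for_geojson[c] = gdf[c].to_wkt()
geojson_dict = json.loads(for_geojson.to_json())  # parsed here only to inspect it

# CSV round trip: encode every geometry column as WKT ...
flat = pd.DataFrame(gdf).copy()
for c in geom_cols:
    flat[c] = gdf[c].to_wkt()
buf = io.StringIO()
flat.to_csv(buf, index=False)
buf.seek(0)

# ... then decode each WKT column back into a geometry-dtype GeoSeries
loaded = pd.read_csv(buf)
for c in geom_cols:
    loaded[c] = gpd.GeoSeries.from_wkt(loaded[c])
restored = gpd.GeoDataFrame(loaded, geometry="geometry")
```

With real files, `for_geojson.to_file("result.geojson", driver="GeoJSON")` should then work, since only one geometry column remains.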

How do I correctly reproject a GeoDataFrame with multiple geometry columns?

In the geopandas documentation it says that
A GeoDataFrame may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the set_geometry method.
I'm wondering how to use such a GeoDataFrame if the goal is to flexibly reproject the geometrical data in these various columns to one or more other coordinate reference systems. Here's what I tried.
First try
import geopandas as gpd
from shapely.geometry import Point
crs_lonlat = 'epsg:4326' #geometries entered in this crs (lon, lat in degrees)
crs_new = 'epsg:3395' #geometries needed in (among others) this crs
gdf = gpd.GeoDataFrame(crs=crs_lonlat)
gdf['geom1'] = [Point(9,53), Point(9,54)]
gdf['geom2'] = [Point(8,63), Point(8,64)]
#Working: setting geometry and reprojecting for first time.
gdf = gdf.set_geometry('geom1')
gdf = gdf.to_crs(crs_new) #geom1 is reprojected to crs_new, geom2 still in crs_lonlat
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8 63)
1 POINT (1001875.417 7135562.568) POINT (8 64)
gdf.crs
Out: 'epsg:3395'
So far, so good. Things go off the rails if I want to set geom2 as the geometry column, and reproject that one as well:
#Not working: setting geometry and reprojecting for second time.
gdf = gdf.set_geometry('geom2') #still in crs_lonlat...
gdf.crs #...but this still says crs_new...
Out: 'epsg:3395'
gdf = gdf.to_crs(crs_new) #...so this doesn't do anything! (geom2 unchanged)
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8.00000 63.00000)
1 POINT (1001875.417 7135562.568) POINT (8.00000 64.00000)
Ok, so, apparently, the .crs attribute of the gdf is not reset to its original value when changing the column that serves as the geometry - it seems, the crs is not stored for the individual columns. If that is the case, the only way I see to use reprojection with this dataframe, is to backtrack: start --> select column as geometry --> reproject gdf to crs_new --> use/visualize/... --> reproject gdf back to crs_lonlat --> goto start. This is not usable if I want to visualise both columns in one figure.
Second try
My second attempt was, to store the crs with each column separately, by changing the corresponding lines in the script above to:
gdf = gpd.GeoDataFrame()
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat)
However, it soon became clear that, though initialised as a GeoSeries, these columns are normal pandas Series, and don't have a .crs attribute the same way GeoSeries do:
gdf['geom1'].crs
AttributeError: 'Series' object has no attribute 'crs'
s = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
s.crs
Out: 'epsg:4326'
Is there something I'm missing here?
Is the only solution, to decide on the 'final' crs beforehand - and do all the reprojecting before adding the columns? Like so...
gdf = gpd.GeoDataFrame(crs=crs_new)
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat).to_crs(crs_new)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat).to_crs(crs_new)
#no more reprojecting done/necessary/possible! :/
...and then, when another crs is needed, rebuild the entire gdf from scratch? That can't be the way this was intended to be used.
Unfortunately, the desired behaviour is currently not possible. Due to limitations in the package, geopandas does not accommodate this use case at the moment, as can be seen in this issue in the github repo.
My workaround is to not use a GeoDataFrame at all, but rather to combine a normal pandas DataFrame for the non-shapely data with several separate geopandas GeoSeries for the shapely geometry data. Each GeoSeries has its own CRS and can be correctly reprojected whenever necessary.
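The workaround can be sketched like this, using the point data from the question: a plain pandas `DataFrame` for attributes plus a dictionary of `GeoSeries` sharing its index, each carrying its own CRS. The helper name `reproject_all` is hypothetical. Whether newer GeoPandas releases track a CRS per geometry column natively is worth checking against the version you have installed.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

crs_lonlat = "EPSG:4326"  # geometries entered in this CRS (lon, lat in degrees)
crs_new = "EPSG:3395"     # geometries needed in this CRS

# Non-geometric attributes live in a plain pandas DataFrame ...
attrs = pd.DataFrame({"label": ["first", "second"]})

# ... and every geometry column is its own GeoSeries with its own CRS (same index as attrs)
geoms = {
    "geom1": gpd.GeoSeries([Point(9, 53), Point(9, 54)], index=attrs.index, crs=crs_lonlat),
    "geom2": gpd.GeoSeries([Point(8, 63), Point(8, 64)], index=attrs.index, crs=crs_lonlat),
}


def reproject_all(geom_dict, crs):
    """Return a new dict with every GeoSeries transformed to `crs` (originals untouched)."""
    return {name: series.to_crs(crs) for name, series in geom_dict.items()}


geoms_new = reproject_all(geoms, crs_new)
```

To show both columns in one figure, plot the reprojected series on the same axes, e.g. `ax = geoms_new["geom1"].plot()` followed by `geoms_new["geom2"].plot(ax=ax)`. Switching to another CRS is just another `reproject_all(geoms, ...)` call on the untouched originals.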

Plotting with folium

The task is to make an address popularity map for Moscow. Basically, it should look like this:
https://nbviewer.jupyter.org/github/python-visualization/folium/blob/master/examples/GeoJSON_and_choropleth.ipynb
For my map I use public geojson: http://gis-lab.info/qa/moscow-atd.html
The only data I have is point coordinates; there's no information about the district each point belongs to.
Question 1:
Do I have to manually check for each district whether the point belongs to it, or is there a more efficient way to do this?
Question 2:
If there is no easier way, how can I get all the coordinates for each district from the geojson file (link above)?
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point
Reading in the Moscow area shape file with geopandas
districts = gpd.read_file('mo-shape/mo.shp')
Construct a mock user dataset
moscow = [55.7, 37.6]
data = (
np.random.normal(size=(100, 2)) *
np.array([[.25, .25]]) +
np.array([moscow])
)
my_df = pd.DataFrame(data, columns=['lat', 'lon'])
my_df['pop'] = np.random.randint(500, 100000, size=len(data))
Create Point objects from the user data
geom = [Point(x, y) for x,y in zip(my_df['lon'], my_df['lat'])]
# and a geopandas dataframe using the same crs from the shape file
my_gdf = gpd.GeoDataFrame(my_df, geometry=geom)
my_gdf.crs = districts.crs
Then do the spatial join, using the default how='inner':
gpd.sjoin(districts, my_gdf, predicate='contains')  # use op='contains' on geopandas < 0.10
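To get from the spatial join to the numbers a choropleth needs, the joined rows can be grouped per district and merged back onto the polygons. This is a sketch on made-up square "districts" rather than the Moscow shapefile; the column names `district` and `pop` are assumptions. It uses the `predicate=` keyword of `gpd.sjoin`; older GeoPandas versions called it `op=`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins: two square "districts" and a few user points with a "pop" value
districts = gpd.GeoDataFrame(
    {"district": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
points = gpd.GeoDataFrame(
    {"pop": [10, 20, 30, 40]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5), Point(5, 5)],
    crs=districts.crs,
)

# Which district contains each point? The inner join drops points outside every district
joined = gpd.sjoin(districts, points, how="inner", predicate="contains")

# Aggregate per district, then attach the numbers back to the polygons
stats = (
    joined.groupby("district")
    .agg(n_points=("pop", "count"), total_pop=("pop", "sum"))
    .reset_index()
)
choropleth = districts.merge(stats, on="district", how="left")
```

`choropleth.plot(column="total_pop", legend=True)` gives a quick static check before moving the result to folium.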
Thanks to @BobHaffner, I tried to solve the problem using geopandas.
Here are my steps:
1. I downloaded the shapefiles for Moscow using this link.
2. From a list of tuples containing x and y (longitude and latitude) coordinates, I created a list of Points (docs).
3. Assuming that the dataframe from the first step contains polygons, I can write a simple loop that checks whether each Point is inside a polygon. For details, read this.
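The loop from step 3 might look like this with shapely. Note the argument order: `Point` takes x first, so longitude, then latitude. The polygons and the helper `find_district` are hypothetical; for many points the `sjoin` approach above will likely scale better because it uses a spatial index.

```python
from shapely.geometry import Point, box

# Hypothetical district polygons keyed by name
# (with real data, something like dict(zip(districts["name"], districts.geometry)))
district_polygons = {"A": box(0, 0, 1, 1), "B": box(1, 0, 2, 1)}

# (longitude, latitude) pairs -> Point(x=lon, y=lat)
coords = [(0.5, 0.5), (1.5, 0.5), (5.0, 5.0)]
points = [Point(lon, lat) for lon, lat in coords]


def find_district(point, polygons):
    """Return the name of the first polygon containing `point`, or None."""
    for name, polygon in polygons.items():
        if polygon.contains(point):
            return name
    return None


assignment = [find_district(p, district_polygons) for p in points]
```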

Change background map for contextily

I have this code:
import pandas as pd
import numpy as np
from geopandas import GeoDataFrame
import geopandas
from shapely.geometry import LineString, Point
import matplotlib.pyplot as plt
import contextily
''' Do Something'''
df = start_stop_df.drop('track', axis=1)
crs = {'init': 'epsg:4326'}
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
ax = gdf.plot()
contextily.add_basemap(ax)
ax.set_axis_off()
plt.show()
Basically, this generates a background map that is in Singapore. However, when I run it, I get the following error: HTTPError: Tile URL resulted in a 404 error. Double-check your tile url:http://tile.stamen.com/terrain/29/268436843/268435436.png
However, it still produces this image:
How can I change the Tile URL? I would still like to have the map of Singapore as the base layer.
EDIT:
Also tried including this argument to add_basemap:
url ='https://www.openstreetmap.org/#map=12/1.3332/103.7987'
Which produced this error:
OSError: cannot identify image file <_io.BytesIO object at 0x000001CC3CC4BC50>
First make sure that your GeoDataFrame is in the Web Mercator projection (EPSG:3857). Once your GeoDataFrame is correctly georeferenced, you can achieve this with GeoPandas reprojection:
df = df.to_crs(epsg=3857)
Once that is done, you can easily choose any of the supported map styles. A full list can be found in the contextily.sources module; at the time of writing:
### Tile provider sources ###
ST_TONER = 'http://tile.stamen.com/toner/tileZ/tileX/tileY.png'
ST_TONER_HYBRID = 'http://tile.stamen.com/toner-hybrid/tileZ/tileX/tileY.png'
ST_TONER_LABELS = 'http://tile.stamen.com/toner-labels/tileZ/tileX/tileY.png'
ST_TONER_LINES = 'http://tile.stamen.com/toner-lines/tileZ/tileX/tileY.png'
ST_TONER_BACKGROUND = 'http://tile.stamen.com/toner-background/tileZ/tileX/tileY.png'
ST_TONER_LITE = 'http://tile.stamen.com/toner-lite/tileZ/tileX/tileY.png'
ST_TERRAIN = 'http://tile.stamen.com/terrain/tileZ/tileX/tileY.png'
ST_TERRAIN_LABELS = 'http://tile.stamen.com/terrain-labels/tileZ/tileX/tileY.png'
ST_TERRAIN_LINES = 'http://tile.stamen.com/terrain-lines/tileZ/tileX/tileY.png'
ST_TERRAIN_BACKGROUND = 'http://tile.stamen.com/terrain-background/tileZ/tileX/tileY.png'
ST_WATERCOLOR = 'http://tile.stamen.com/watercolor/tileZ/tileX/tileY.png'
# OpenStreetMap as an alternative
OSM_A = 'http://a.tile.openstreetmap.org/tileZ/tileX/tileY.png'
OSM_B = 'http://b.tile.openstreetmap.org/tileZ/tileX/tileY.png'
OSM_C = 'http://c.tile.openstreetmap.org/tileZ/tileX/tileY.png'
Keep in mind that you should not put actual x, y, z tile numbers in your tile URL (as you did in your "EDIT" example, which is also a web page address rather than a tile template). ctx will take care of all this.
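To illustrate what "ctx will take care of all this" means, here is the standard slippy-map (XYZ) formula that turns a longitude/latitude and zoom level into tile numbers, plus a hypothetical helper that fills them into one of the templates listed above. This is a sketch of the general tile scheme, not contextily's actual code. It also hints at the zoom level 29 in the error: if coordinates in degrees are treated as Web Mercator metres, the plotted extent is tiny, which presumably pushes the automatically chosen zoom far beyond what the tile server offers.

```python
import math

# Old-style contextily template from the list above (placeholders tileZ/tileX/tileY)
TEMPLATE = "http://a.tile.openstreetmap.org/tileZ/tileX/tileY.png"


def lonlat_to_tile(lon, lat, zoom):
    """Standard slippy-map (XYZ) tile indices for a lon/lat point at a zoom level."""
    n = 2 ** zoom  # number of tiles per axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def fill_template(template, x, y, z):
    """Hypothetical helper: substitute tile numbers into an old-style template."""
    return template.replace("tileZ", str(z)).replace("tileX", str(x)).replace("tileY", str(y))
```

For a point in Singapore, `fill_template(TEMPLATE, *lonlat_to_tile(103.8, 1.35, 12), 12)` gives `http://a.tile.openstreetmap.org/12/3229/2032.png`.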
You can find a working copy-pastable example and further info at GeoPandas docs.
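import contextily as ctx
import matplotlib.pyplot as plt
from geopandas import GeoDataFrame
# Dataframe you want to plot (df must already have a shapely "geometry" column)
gdf = GeoDataFrame(df, crs="EPSG:4326")  # create a georeferenced dataframe
gdf = gdf.to_crs(epsg=3857)  # reproject it to Web Mercator
ax = gdf.plot()
# choose any of the supported maps from ctx.sources
# (newer contextily versions call this argument `source` and list tiles in ctx.providers)
ctx.add_basemap(ax, url=ctx.sources.ST_TERRAIN)
ax.set_axis_off()
plt.show()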
Contextily's default CRS is EPSG:3857, but your dataframe is in a different CRS. Use the following (refer to the manual here):
ctx.add_basemap(ax, crs='epsg:4326', source=ctx.providers.Stamen.TonerLite)
Refer to this link for using other sources such as Stamen.Toner, Stamen.Terrain, etc. (Stamen.Terrain is used as the default).
Alternatively, you can reproject your dataframe to EPSG:3857 with df.to_crs(). In that case, you should omit the crs argument of ctx.add_basemap().
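A minimal sketch of the reprojection route, assuming the data is in EPSG:4326 whenever no CRS is set; the helper names `to_web_mercator` and `plot_with_basemap` are hypothetical. It swaps in `ctx.providers.OpenStreetMap.Mapnik` because, as far as I know, the Stamen tile service referenced in this thread has since moved to Stadia Maps. The plotting helper imports contextily and matplotlib lazily and needs network access, so only the reprojection step runs at module level.

```python
import geopandas as gpd
from shapely.geometry import Point


def to_web_mercator(gdf, source_crs="EPSG:4326"):
    """Label naive data with `source_crs` (an assumption) and reproject to EPSG:3857."""
    if gdf.crs is None:
        gdf = gdf.set_crs(source_crs)
    return gdf.to_crs(epsg=3857)


def plot_with_basemap(gdf):
    """Plot on an OpenStreetMap basemap; needs matplotlib, contextily and network access."""
    import contextily as ctx
    import matplotlib.pyplot as plt

    ax = to_web_mercator(gdf).plot(figsize=(8, 8), alpha=0.6)
    ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)
    ax.set_axis_off()
    plt.show()


# Hypothetical point in Singapore (lon, lat)
singapore = gpd.GeoDataFrame({"name": ["Singapore"]}, geometry=[Point(103.8, 1.35)])
merc = to_web_mercator(singapore)  # coordinates are now metres in Web Mercator
```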
I'm too new to add a comment, but I wanted to point something out to those saying in the comments that they get a 404 error: check your capitalization, etc. Stamen's names are specific about this. For instance, there is no all-caps form; only the first letter is capitalized. For example:
ctx.add_basemap(ax=ax,url=ctx.providers.Stamen.Toner, zoom=10)

Python: Convert map in kilometres to degrees

I have a pandas Dataframe with a few million rows, each with an X and Y attribute with their location in kilometres according to the WGS 1984 World Mercator projection (created using ArcGIS).
What is the easiest way to project these points back to degrees, without leaving the Python/pandas environment?
There is already a Python module that can do these kinds of transformations for you, called pyproj. I will admit it is not the easiest module to find via Google. Some examples of its use can be seen here.
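A minimal pyproj sketch, assuming the data is EPSG:3395 ("WGS 84 / World Mercator", which is what ArcGIS calls "WGS 1984 World Mercator") and that X/Y are in kilometres as the question says. The helper name `km_to_degrees` is hypothetical. The test values are the World Mercator coordinates of the point (9, 53) from the reprojection example earlier on this page, divided by 1000.

```python
import numpy as np
from pyproj import Transformer

# EPSG:3395 -> EPSG:4326; always_xy=True keeps (x=lon, y=lat) order on the output side
transformer = Transformer.from_crs("EPSG:3395", "EPSG:4326", always_xy=True)


def km_to_degrees(x_km, y_km):
    """Convert World Mercator coordinates given in kilometres to (lon, lat) in degrees."""
    x_m = np.asarray(x_km, dtype=float) * 1000.0  # the projection works in metres
    y_m = np.asarray(y_km, dtype=float) * 1000.0
    lon, lat = transformer.transform(x_m, y_m)
    return lon, lat
```

With the question's dataframe this would be `df["lon"], df["lat"] = km_to_degrees(df["X"], df["Y"])`; `Transformer.transform` accepts whole NumPy arrays, so millions of rows are converted in one call.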
Many years later, this is how I would do it, keeping everything in GeoPandas to minimise the possibility of footguns.
Some imports:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
Create a dataframe (note the values must be in metres!)
df = pd.DataFrame({"X": [50e3, 900e3], "Y": [20e3, 900e3]})
Create geometries from the X/Y coordinates
df["geometry"] = df.apply(lambda row: Point(row.X, row.Y), axis=1)
Convert to a GeoDataFrame, setting the current CRS.
In this case that is EPSG:3395 (WGS 84 / World Mercator), the projection from the question.
gdf = gpd.GeoDataFrame(df, crs=3395)
Project it to the standard WGS84 CRS in degrees (EPSG:4326).
gdf = gdf.to_crs(4326)
And then (optionally), extract the X/Y coordinates in degrees back into standard columns:
gdf["X_deg"] = gdf.geometry.apply(lambda p: p.x)
gdf["Y_deg"] = gdf.geometry.apply(lambda p: p.y)
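With a few million rows, the per-row `apply` calls above may be slow. A vectorised variant (a sketch, assuming EPSG:3395 and input in kilometres as in the question) builds the points with `gpd.points_from_xy` and reads the coordinates back with `.x` / `.y`. The sample values are the World Mercator coordinates of (9, 53) in kilometres.

```python
import geopandas as gpd
import pandas as pd

# Example input in kilometres, World Mercator (EPSG:3395)
df = pd.DataFrame({"X": [0.0, 1001.875417], "Y": [0.0, 6948.849385]})

gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["X"] * 1000, df["Y"] * 1000),  # km -> m
    crs="EPSG:3395",
).to_crs("EPSG:4326")

# Vectorised coordinate extraction (no per-row lambda)
gdf["X_deg"] = gdf.geometry.x
gdf["Y_deg"] = gdf.geometry.y
```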
