Using pyproj to transform shapely data is giving me strange results


I've got a polygon which looks something like this in WKT:
POLYGON ((-2.5079473598836624 51.34385834919997, -2.5081726654409133 51.34353499032948, -2.507909808957454 51.343441165566986, -2.507679138982173 51.34359530614682, -2.5079473598836624 51.34385834919997))
I'm trying to transform this from EPSG:3857 (web mercator) to EPSG:32630 (UTM 30N) to do some distance calculations on it but the results look weird:
import pyproj
import shapely.ops
import shapely.wkt
wgs_proj = pyproj.CRS("EPSG:3857")
utm_proj = pyproj.CRS("EPSG:32630")
transform = pyproj.Transformer.from_crs(wgs_proj, utm_proj, always_xy=True).transform
shape = shapely.wkt.loads("POLYGON ((-2.5079473598836624 51.34385834919997, -2.5081726654409133 51.34353499032948, -2.507909808957454 51.343441165566986, -2.507679138982173 51.34359530614682, -2.5079473598836624 51.34385834919997))")
boundary = shapely.ops.transform(transform, shape)
print(str(boundary))
This outputs:
POLYGON ((833976.0465009063 51.05017626883035, 833976.04627538 51.04985475944731, 833976.0465384943 51.049761471464706, 833976.0467693905 51.0499147304722, 833976.0465009063 51.05017626883035))
This looks to me like it's got the longitude conversion roughly right but the latitude conversion completely wrong. The units are supposed to be in metres, I think. So unless the shape happens to be at a latitude about 51m North of the origin of UTM30N, something has gone wrong. Can anyone point me to what?

This is because the data is in longitude/latitude, not in EPSG:3857. Almost everything online says that EPSG:3857 is what Google Maps uses, but this is only true internally: EPSG:3857 is WGS84 projected into metres. Externally, Google still uses WGS84 longitude/latitude, i.e. EPSG:4326. Changing the source CRS in the question's code from EPSG:3857 to EPSG:4326 produces the right result.
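Here is a minimal sketch of both the problem and the fix, assuming pyproj 2+ and Shapely. Read as EPSG:3857 metres, the polygon's numbers are only a few metres from (0°, 0°). That point lies about 3° east of UTM zone 30's central meridian, which is why the question got an easting near 834 km and a northing of about 51 m. Read as EPSG:4326, the same numbers land in south-west England.

```python
import pyproj
import shapely.wkt
from shapely.ops import transform

WKT = ("POLYGON ((-2.5079473598836624 51.34385834919997, -2.5081726654409133 51.34353499032948, "
       "-2.507909808957454 51.343441165566986, -2.507679138982173 51.34359530614682, "
       "-2.5079473598836624 51.34385834919997))")
shape = shapely.wkt.loads(WKT)

# What went wrong: interpreted as EPSG:3857 metres, the first vertex is a few metres from (0, 0)
from_3857 = pyproj.Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)
lon0, lat0 = from_3857.transform(-2.5079473598836624, 51.34385834919997)

# The fix: the coordinates are longitude/latitude, so the source CRS is EPSG:4326
to_utm = pyproj.Transformer.from_crs("EPSG:4326", "EPSG:32630", always_xy=True)
boundary = transform(to_utm.transform, shape)

print(lon0, lat0)        # tiny angles close to 0 degrees
print(boundary.bounds)   # easting/northing in metres (UTM 30N)
print(boundary.length)   # perimeter in metres, usable for distance calculations
```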

Related

How to create Isopolygon using python?

I have a set of points for a location. I am trying to create an isoline from those points. To generate isolines I used a convex hull and alphashape, which produce a box-shaped or straight-cut polygon like the one below. How do I get a proper isoline shape? What is the best way to generate an isochrone using Python?
print(df)
id latitude longitude geometry
8758520180 53.334261 -2.569419 POINT (-2.56942 53.33426)
9339285446 53.346211 -2.575348 POINT (-2.57535 53.34621)
616761660 53.340828 -2.566912 POINT (-2.56691 53.34083)
9454070930 53.338889 -2.574538 POINT (-2.57454 53.33889)
9454071045 53.339388 -2.574591 POINT (-2.57459 53.33939)
and so on.
import alphashape
from geopandas import GeoDataFrame
polygon = alphashape.alphashape(df['geometry'], 0.20)
GeoDataFrame(geometry=[polygon], crs="EPSG:4326")
Final output from alphashape: (image not included)
Expected output (sketch): (image not included)
Points: (image not included)
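One option worth trying is a concave hull in place of a convex hull. This is not from the thread; it swaps in Shapely's `shapely.concave_hull`, which requires Shapely 2.0 or newer. The sketch uses only the five sample points shown in the question, so the result is illustrative rather than a real isochrone. A hull only outlines the points; it does not compute travel times.

```python
import shapely
from shapely.geometry import MultiPoint

# The five sample points from the question, as (longitude, latitude)
coords = [
    (-2.569419, 53.334261),
    (-2.575348, 53.346211),
    (-2.566912, 53.340828),
    (-2.574538, 53.338889),
    (-2.574591, 53.339388),
]
points = MultiPoint(coords)

convex = points.convex_hull
# ratio=1 gives the convex hull; smaller ratios let the outline follow the points more closely
concave = shapely.concave_hull(points, ratio=0.3)

print(convex.area, concave.area)  # areas in square degrees, only useful for comparison
```

With the full point set, lowering `ratio` step by step and plotting the result is a reasonable way to find an outline that is tight but not fragmented.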

zonal_stats: width and height must be > 0 error

I am trying to use the zonal_stats function from the rasterstats Python package to get raster statistics from a .tif file for each shape in a .shp file. I can do it in QGIS without any problems, but I have to do the same with more than 200 files, which would take a lot of time, so I'm trying to do it in Python. Both files and the replication code are in my Google Drive.
My script is:
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.plot import show
from rasterstats import zonal_stats
from rasterio.transform import Affine
# Import .tif file
raster = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
# Read the raster values
array = raster.read(1)
# Get the affine
affine = raster.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
I get the following error:
Input In [1] in <cell line: 22>
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:32 in zonal_stats
return list(gen_zonal_stats(*args, **kwargs))
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:164 in gen_zonal_stats
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
File ~\Anaconda3\lib\site-packages\rasterstats\utils.py:41 in rasterize_geom
rv_array = features.rasterize(
File ~\Anaconda3\lib\site-packages\rasterio\env.py:387 in wrapper
return f(*args, **kwds)
File ~\Anaconda3\lib\site-packages\rasterio\features.py:353 in rasterize
raise ValueError("width and height must be > 0")
I found a question about the same problem, but I can't get its solution to work. I tried reversing the signs of the Affine coefficients of my raster, without success:
''' Trying to use the same solution of question: https://stackoverflow.com/questions/62010050/from-zonal-stats-i-get-this-error-valueerror-width-and-height-must-be-0 '''
old_tif = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
print(old_tif.profile) # copy & paste the output and change signs
new_tif_profile = old_tif.profile
# Affine(0.004611149999999995, 0.0, -46.828504575,
# 0.0, 0.006521380000000008, -24.01169169)
new_tif_profile['transform'] = Affine(0.004611149999999995, 0.0, -46.828504575,
0.0, -0.006521380000000008, 24.01169169)
new_tif_array = old_tif.read(1)
new_tif_array = np.fliplr(np.flip(new_tif_array))
with rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif', "w", **new_tif_profile) as dest:
dest.write(new_tif_array, indexes=1)
dem = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif')
# Read the raster values
array = dem.read(1)
# Get the affine
affine = dem.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
Doing it this way, I don't get the "width and height must be > 0" error, but every stat in zs_shapefile is None, so it doesn't solve my problem.
Does anyone understand why this error happens, and which sign I have to reverse to make it work? Thanks in advance!

I would be careful about overriding the geotransform of your raster like this unless you are really convinced the original metadata is incorrect. I'm not too familiar with Affine, but it looks like you're now setting the origin latitude to a positive value, which places the raster in the northern hemisphere. My guess is that this lack of intersection between the vector and the raster causes the None results.
I'm also not familiar with rasterstats, but I'm guessing it relies on GDAL and NumPy at its core. So, as a test, you could try adding the all_touched=True keyword:
https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy
If that works, it might indicate that the rasterization fails because your polygons are so small compared to the pixels that the default rasterization method produces a rasterized polygon of size 0 (in at least one dimension). That's also what the error seems to hint at.
Keep in mind that all_touched=True changes the stats you get in result, so I would only do it for testing, or if you're comfortable with this difference.
If you really need a valid value for these (too) small polygons, there are a few workarounds you could try. Something I've done is to simply take the centroid for these polygons, and take the value of the pixel where this centroid falls on.
One way to identify these polygons would be to use all_touched with the "count" statistic: every polygon with a count of only 1 might be too small to be rasterized correctly. To really find out, you would probably have to do the rasterization yourself using GDAL, since rasterstats doesn't seem to allow it.
Note that, because of the shape of some of your polygons, the centroid might fall outside the polygon. But given how coarse your raster data is relative to the vector, I don't think it would affect the result much.
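The centroid workaround can be sketched without rasterstats. `value_at` is a hypothetical helper that inverts a non-rotated geotransform given in Affine order (a, b, c, d, e, f), so it works with either sign of `e`. It uses `representative_point()` rather than `.centroid` because Shapely guarantees the former lies inside the polygon, which sidesteps the "centroid outside the polygon" issue mentioned above. The transform values come from the question; the array and the polygon are made up.

```python
import math

import numpy as np
from shapely.geometry import box


def value_at(array, transform, x, y):
    """Pixel value under (x, y) for a non-rotated (a, b, c, d, e, f) geotransform, or None outside."""
    # tuple(...)[:6] also accepts an affine.Affine, which carries three extra trailing values
    a, b, c, d, e, f = tuple(transform)[:6]
    if b != 0 or d != 0:
        raise ValueError("this sketch only handles non-rotated transforms")
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)
    if 0 <= row < array.shape[0] and 0 <= col < array.shape[1]:
        return array[row, col]
    return None


# Transform from the question; the array contents are hypothetical
transform = (0.004611149999999995, 0.0, -46.828504575, 0.0, 0.006521380000000008, -24.01169169)
array = np.arange(100).reshape(10, 10)

# Hypothetical polygon much smaller than one pixel
tiny = box(-46.8001, -24.0001, -46.7999, -23.9999)
pt = tiny.representative_point()  # always inside the polygon, unlike .centroid
print(value_at(array, transform, pt.x, pt.y))
```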
An alternative is, instead of modifying the vector, to significantly increase the resolution of your raster. You could use gdal_translate to output this to a VRT, with some form of resampling, and avoid having to write this data to disk. Once the resolution is high enough that all polygons rasterize to at least a 1x1 array, it should probably work. But your polygons are tiny compared to the pixels, so it'll be a lot. You could guess it, or analyze the envelopes of all polygons. For example take the smallest edge of the envelope as more or less the resolution that's necessary for a correct rasterization.
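The "smallest envelope edge" estimate can be computed directly from the polygons' bounding boxes. The sketch below does that and turns it into a rough refinement factor. `smallest_envelope_edge` is a hypothetical helper, and the polygons are made up. As the text says, this is only "more or less" the resolution needed: a pixel centre inside a polygon's envelope is not guaranteed to be inside the polygon itself.

```python
import math

from shapely.geometry import box


def smallest_envelope_edge(polygons):
    """Shortest side over the bounding boxes (envelopes) of all polygons."""
    return min(min(maxx - minx, maxy - miny) for minx, miny, maxx, maxy in (p.bounds for p in polygons))


# Hypothetical polygons in degrees, far smaller than the raster pixels
polygons = [box(-46.70, -23.60, -46.699, -23.598), box(-46.65, -23.55, -46.6495, -23.547)]
pixel_size = 0.004611149999999995  # x pixel size from the question's transform

target = smallest_envelope_edge(polygons)
factor = math.ceil(pixel_size / target)  # roughly how many times finer the raster should be
print(target, factor)
```

The resulting `target` could then be passed as the `-tr` resolution to `gdal_translate -of VRT`, together with a `-r` resampling method, to build the finer virtual raster described above.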
Edit: to clarify the above a bit further:
The default rasterization strategy of GDAL (all_touched=False) is to consider a pixel "within" the polygon if the centroid of the pixel intersects with the polygon.
In QGIS you can, for example, convert the pixels to points and then do a spatial join with your vector. If you remove the polygons that can't be joined (there's a checkbox for this), you'll get a different vector that should most likely work with rasterstats, given your current raster.
You could perhaps use that in the normal way (all_touched=False), and get the stats for the small polygons using all_touched=True.
In the image below, the green polygons are the ones that intersect the centre of a pixel. The red ones don't, and those are probably the ones rasterstats "tries" to rasterize to a size-0 array.
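The pixel-centre rule (all_touched=False) and the QGIS points-plus-spatial-join workflow can be mimicked in plain Shapely and NumPy. The sketch below splits polygons into "green" (contains at least one pixel centre) and "red" (contains none). The helpers are hypothetical, the raster is a made-up 10 x 10 north-up grid, and the brute-force loop is only meant for small inputs.

```python
import numpy as np
from shapely.geometry import Point, box


def pixel_centres(transform, shape):
    """Centre points of every pixel for a non-rotated (a, b, c, d, e, f) geotransform."""
    a, b, c, d, e, f = tuple(transform)[:6]
    if b != 0 or d != 0:
        raise ValueError("this sketch only handles non-rotated transforms")
    rows, cols = np.indices(shape)
    xs = c + (cols + 0.5) * a
    ys = f + (rows + 0.5) * e
    return [Point(x, y) for x, y in zip(xs.ravel(), ys.ravel())]


def split_by_centre_hit(polygons, transform, shape):
    """Indices of polygons containing at least one pixel centre (green) and of those containing none (red)."""
    centres = pixel_centres(transform, shape)
    green, red = [], []
    for i, poly in enumerate(polygons):
        (green if any(poly.intersects(pt) for pt in centres) else red).append(i)
    return green, red


# Hypothetical north-up raster: 10 x 10 pixels of size 1, top-left corner at (0, 10)
transform = (1.0, 0.0, 0.0, 0.0, -1.0, 10.0)
polygons = [box(0, 0, 3, 3), box(0.1, 0.1, 0.3, 0.3)]  # one large, one smaller than a pixel
green, red = split_by_centre_hit(polygons, transform, (10, 10))
print(green, red)
```

The "green" indices could go through zonal_stats as usual, and the "red" ones through all_touched=True or the point-sampling workaround.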

Reproject coordinates with rasterio creates a irregular grid from an regular grid

I have GeoTIFF files loaded into xarray with crs = EPSG:31467. I want to transform/reproject (I don't know if there is a difference) these files into EPSG:4326. To do that, I use the rasterio.warp.transform function, which needs 1D arrays for x and y. To generate these, I use numpy.meshgrid and flatten. Here is a small example with my data:
import numpy as np
#x (easting) and y (northing) in EPSG:31467, in metres
lon = [3280914, 3281914, 3282914]
lat = [6103001, 6102001, 6101001]
#create 2d meshgrid
xv, yv = np.meshgrid(lon, lat)
xv, yv
(array([[3280914, 3281914, 3282914],
[3280914, 3281914, 3282914],
[3280914, 3281914, 3282914]]),
array([[6103001, 6103001, 6103001],
[6102001, 6102001, 6102001],
[6101001, 6101001, 6101001]]))
Now I have a sequence of different longitude [3280914, 3281914, 3282914] for the same latitude [6103001, 6103001, 6103001]
When I now use rasterio.warp.transform(src_crs, dst_crs, x, y), these sequences disappear, and I don't understand why.
from rasterio.warp import transform
# Compute the lon/lat coordinates with rasterio.warp.transform
lon, lat = transform('EPSG:31467','EPSG:4326',
xv.flatten(), yv.flatten())
np.asarray(lon).reshape(3,3), np.asarray(lat).reshape(3,3)
> (array([[5.57397386, 5.58957607, 5.6051787 ],
> [5.57473921, 5.59033795, 5.60593711],
> [5.57550412, 5.5910994 , 5.60669509]]), array([[55.00756605, 55.00800488, 55.00844171],
> [54.9985994 , 54.99903809, 54.99947477],
> [54.98963274, 54.99007128, 54.99050782]]))
np.unique(xv).shape, np.unique(yv).shape
> ((3,), (3,))
np.unique(lon).shape, np.unique(lat).shape
> ((9,), (9,))
To put the reprojected coordinates back into xarray, I need the same structure as before, i.e. equal values along each row and each column. Which part don't I understand: the transform function or the concept of projections?

I can't understand what exactly you are trying to do after np.asarray(lon).reshape(3,3)
Which process I don't understand, is it the function of transform or the concept of projections?
It seems to be a bit of both.
EPSG:31467 and EPSG:4326 are fundamentally different kinds of coordinates. EPSG:31467 (DHDN / 3-degree Gauss-Kruger zone 3) is a planar rectangular coordinate system in a zonal projection. EPSG:4326 is not a projection at all: it is purely geodetic coordinates in the WGS-84 terrestrial reference system on the WGS-84 ellipsoid. What matters here is that points sharing a coordinate value in EPSG:31467 don't have to share one in EPSG:4326. In EPSG:4326 a coordinate is an angle, while in EPSG:31467 it is a distance from the equator or from the zone's central meridian (plus a false easting). The axes of the two systems are not parallel; the angle between them is the meridian convergence. So if you change only the northing or only the easting in EPSG:31467, both latitude and longitude can change.
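The convergence can be checked against the question's own numbers. This sketch uses the spherical approximation gamma = atan(tan(lon - lon0) * sin(lat)) with lon0 = 9° E, the central meridian of EPSG:31467. It then measures how much the first output row (constant northing, easting increasing by 2 km) tilts away from local east. Both come out at roughly 2.8°, which is exactly why a row of constant northing does not keep a constant latitude. `convergence_deg` is a hypothetical helper, and the spherical formulas are approximations.

```python
import math


def convergence_deg(lon_deg, lat_deg, central_meridian_deg):
    """Approximate grid convergence (spherical formula) for a transverse Mercator zone."""
    dlon = math.radians(lon_deg - central_meridian_deg)
    return math.degrees(math.atan(math.tan(dlon) * math.sin(math.radians(lat_deg))))


# EPSG:31467 is Gauss-Kruger zone 3, central meridian 9 deg E
gamma = convergence_deg(5.59, 55.0, 9.0)

# First row of the question's output: constant northing, easting increasing by 2 km
lon_row = [5.57397386, 5.58957607, 5.6051787]
lat_row = [55.00756605, 55.00800488, 55.00844171]

# Tilt of that row relative to local east (east-west degrees shrink by cos(lat))
dlat = lat_row[-1] - lat_row[0]
dlon = (lon_row[-1] - lon_row[0]) * math.cos(math.radians(lat_row[0]))
tilt = math.degrees(math.atan2(dlat, dlon))

print(gamma, tilt)  # same magnitude; the sign of gamma depends on the convention
```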
In the image at the link below, you can see the angle between the blue lines (each cell is the EPSG:31467 analogue) and the black lines (the whole grid is the EPSG:4326 analogue):
https://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Soviet_topographic_map_kilometer_grid.svg
It's easy to check that the transformation works correctly: just do it backwards.
lon, lat = transform('EPSG:31467','EPSG:4326',
xv.flatten(), yv.flatten())
x_check, y_check = transform('EPSG:4326', 'EPSG:31467', lon, lat)
#the round trip is off by tiny floating-point errors, so round to whole metres
x_check = [int(round(i, 0)) for i in x_check]
print(lon)
print(x_check)
print(xv.flatten())
>[5.574033001416839, 5.5896346633743175, 5.605236748547687, 5.574797816145165, 5.5903960110246524, 5.605994628800234, 5.5755622060626155, 5.591156935778857, 5.6067520880717225]
>[3280914, 3281914, 3282914, 3280914, 3281914, 3282914, 3280914, 3281914, 3282914]
>[3280914 3281914 3282914 3280914 3281914 3282914 3280914 3281914 3282914]
The output shows that transform() returns exactly what it should: the round trip gives back the original x values.
The next snippet also works as expected (you can match its output with the one above):
print(np.asarray(lon).reshape(3,3))
print(xv)
>[[5.574033 5.58963466 5.60523675]
> [5.57479782 5.59039601 5.60599463]
> [5.57556221 5.59115694 5.60675209]]
>[[3280914 3281914 3282914]
> [3280914 3281914 3282914]
> [3280914 3281914 3282914]]
I have never worked with rasterio, so I can't provide you working solution.
Some notes:
I don't see why you need a grid to transform a raster.
Rasterio docs are clear and have solution for you: https://rasterio.readthedocs.io/en/latest/topics/reproject.html#reprojecting-a-geotiff-dataset
You can transform a raster between CRSs directly. If rasterio doesn't suit you, try osgeo.gdal: gdal.Warp(dst_file, src_file, srcSRS='EPSG:31467', dstSRS='EPSG:4326')
Note the difference between reprojecting a raster and defining its projection: the first changes the image, the second changes only the metadata. For a direct transform to work correctly, your GeoTIFF must have a valid projection definition in its metadata (one that matches the actual projection of the raster).
If you're not developing a standalone app and just need to reproject 2-3 rasters, use QGIS and do it without coding. It's also helpful to understand geodetic concepts through 2-3 examples in QGIS before coding; just use it as a playground.
If you're not developing a standalone app, you can solve your automation task with the QGIS Python API. You can test the workflow in the UI and then call QGIS/GDAL tools from a Python script as a batch. What's more, rasterio and the other packages can be installed into QGIS's Python. Of course, that's a bad idea for deployment unless you are creating a QGIS plugin.
In EPSG:31467, a coordinate value of 0.001 is 1 mm, so more precision than that is useless. In EPSG:4326, 1 degree of latitude is approximately 111.1 km and 1 degree of longitude approximately 111.3*cos(lat) km, so you can work out the useful precision. Anything beyond 4-5 digits after the decimal point (roughly 10 m to 1 m) may also be useless, depending on your accuracy needs.
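The precision rule of thumb above is easy to turn into numbers. This small sketch, using the approximate constants from the note (111.1 km per degree of latitude, 111.3*cos(lat) km per degree of longitude), gives the ground size of one unit in the last decimal place of a degree value. `degree_step_in_metres` is a hypothetical helper.

```python
import math


def degree_step_in_metres(decimals, lat_deg):
    """Approximate ground size (N-S, E-W) of one unit in the last decimal place of a degree value."""
    step = 10.0 ** -decimals
    north_south = step * 111_100.0                                   # metres per degree of latitude (approx.)
    east_west = step * 111_300.0 * math.cos(math.radians(lat_deg))   # metres per degree of longitude (approx.)
    return north_south, east_west


for d in (4, 5, 8):
    ns, ew = degree_step_in_metres(d, 55.0)
    print(f"{d} decimals: {ns:.6f} m N-S, {ew:.6f} m E-W")
```

At the question's latitude, about 8 decimal places correspond to the millimetre precision of EPSG:31467. So anything printed beyond that is just floating-point noise.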

in GeoPandas, select (line string) data within a latitude longitude box defined by user

I have a geopandas dataframe consisting of a combination of LineStrings and MultiLineStrings. I would like to select those LineStrings and MultiLineStrings containing a point within a box (defined by me) of latitude longitude, for which I don't have a geometry. In other words, I have some mapped USGS fault traces and I would like to pick a square inset of those fault lines within a certain distance from some lat/lons. So far I've had some success unwrapping just coordinates from the entire data frame and only saving points that fall within a box of lat/lon, but then I no longer keep the original geometry or information saved in the data frame. (i.e. like this:)
xvals=[]
yvals=[]
for flt in qfaults['geometry']:
    for coord in flt.coords:
        if coord[1] >= centroid[1]-1 and coord[1] <= centroid[1]+1 and coord[0]<=centroid[0]+1 and coord[0]>=centroid[0]-1:
            xvals.append(coord[0])
            yvals.append(coord[1])
Is there any intuition as to how to do this using the GeoPandas data frame? Thanks in advance.

GeoPandas has a .cx indexer that does exactly this. See https://geopandas.readthedocs.io/en/latest/docs/user_guide/indexing.html
The syntax is gdf.cx[xmin:xmax, ymin:ymax]:
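import geopandas
# geopandas.datasets was removed in GeoPandas 1.0; on newer versions, read a downloaded Natural Earth file instead
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))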
southern_world = world.cx[:, :0]
western_world = world.cx[:0, :]
western_europe = world.cx[1:10, 40:60]
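Applied to the question's case, a self-contained sketch might look like the one below. The fault traces and the point of interest are made up, mixing LineStrings and MultiLineStrings as in the question. `.cx` keeps whole rows, attributes included, whose geometry intersects the box. `geopandas.clip`, which is not mentioned in the answer, then cuts those geometries down to the box if a true square inset is wanted.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString, box

# Hypothetical stand-in for the USGS fault traces
qfaults = gpd.GeoDataFrame(
    {"name": ["a", "b", "c"]},
    geometry=[
        LineString([(-118.5, 34.0), (-118.0, 34.5)]),
        MultiLineString([[(-120.0, 36.0), (-119.5, 36.5)], [(-117.9, 34.2), (-117.5, 34.9)]]),
        LineString([(-110.0, 40.0), (-109.0, 41.0)]),
    ],
    crs="EPSG:4326",
)

centroid = (-118.0, 34.3)  # hypothetical (lon, lat) of interest, as in the question's loop
xmin, xmax = centroid[0] - 1, centroid[0] + 1
ymin, ymax = centroid[1] - 1, centroid[1] + 1

# Whole rows whose geometry intersects the box; the original attributes are kept
subset = qfaults.cx[xmin:xmax, ymin:ymax]

# Optional: cut the kept geometries down to the box for a square inset
inset = gpd.clip(subset, box(xmin, ymin, xmax, ymax))

print(subset["name"].tolist())
```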

Cartesian projection issue in a FITS image through PyFITS / AstroPy

I've looked and looked for a solution to this problem and am turning up nothing.
I'm generating rectangular FITS images through matplotlib and subsequently applying WCS coordinates to them using AstroPy (or PyFITS). My images are in galactic latitude and longitude, so the header keywords appropriate for my maps should be GLON-CAR and GLAT-CAR (for Cartesian projection). I've looked at other maps that use this same map projection in SAO DS9 and the coordinates work great... the grid is perfectly orthogonal as it should be. The FITS standard projections can be found here.
But when I generate my maps, the coordinates are not at all Cartesian. Here's a side-by-side comparison of my map (left) and another reference map of roughly the same region (right). Both are listed GLON-CAR and GLAT-CAR in the FITS header, but mine is screwy when looked at in SAO DS9 (note that the coordinate grid is something SAO DS9 generates based on the data in the FITS header, or at least stored somewhere in the FITS file):
This is problematic, because the coordinate-assigning algorithm will assign incorrect coordinates to each pixel if the projection is wrong.
Has anyone encountered this, or know what could be the problem?
I've tried applying other projections (just to see how they perform in SAO DS9) and they come out fine... but my Cartesian and Mercator projections do not come out with the orthogonal grid like they should.
I can't believe this would be a bug in AstroPy, but I can't find any other cause... unless my arguments in the header are incorrectly formatted, but I still don't see how that could cause the problem I'm experiencing. Or would you recommend using something else? (I've looked at matplotlib basemap but have had some trouble getting that to work on my computer).
My header code is below:
from __future__ import division
import numpy as np
from astropy.io import fits as pyfits # or use 'import pyfits, same thing'
#(lots of code in between: defining variables and simple calculations...
#probably not relevant)
header['BSCALE'] = (1.00000, 'REAL = TAPE*BSCALE + BZERO')
header['BZERO'] = (0.0)
header['BUNIT'] = ('mag ', 'UNIT OF INTENSITY')
header['BLANK'] = (-100.00, 'BLANK VALUE')
header['CRVAL1'] = (glon_center, 'REF VALUE POINT DEGR') #FIRST COORDINATE OF THE CENTER
header['CRPIX1'] = (center_x+0.5, 'REF POINT PIXEL LOCATION') ## REFERENCE X PIXEL
header['CTYPE1'] = ('GLON-CAR', 'COORD TYPE : VALUE IS DEGR')
header['CDELT1'] = (-glon_length/x_length, 'COORD VALUE INCREMENT WITH COUNT DGR') ### degrees per pixel
header['CROTA1'] = (0, 'CCW ROTATION in DGR')
header['CRVAL2'] = (glat_center, 'REF VALUE POINT DEGR') #Y COORDINATE OF THE CENTER
header['CRPIX2'] = (center_y+0.5, 'REF POINT PIXEL LOCATION') #Y REFERENCE PIXEL
header['CTYPE2'] = ('GLAT-CAR', 'COORD TYPE: VALUE IS DEGR') # WAS CAR OR TAN
header['CDELT2'] = (glat_length/y_length, 'COORD VALUE INCREMENT WITH COUNT DGR') #degrees per pixel
header['CROTA2'] = (rotation, 'CCW ROTATION IN DEGR') #NEGATIVE ROTATES CCW around origin (bottom left).
header['DATAMIN'] = (data_min, 'Minimum data value in the file')
header['DATAMAX'] = (data_max, 'Maximum data value in the file')
header['TELESCOP'] = ("Produced from 2MASS")
pyfits.update(filename, map_data, header)
Thanks for any help you can provide.

In the modern definition of the -CAR projection (from Calabretta et al.), a GLON-CAR/GLAT-CAR projection only produces a rectilinear grid if CRVAL2 is set to zero. If CRVAL2 is not zero, the grid is curved (this should have nothing to do with Astropy). You can try to fix this by adjusting CRVAL2 and CRPIX2 so that CRVAL2 is zero. Does this help?
Just to clarify what I mean, try, after your code above, and before writing out the file:
header['CRPIX2'] -= header['CRVAL2'] / header['CDELT2']
header['CRVAL2'] = 0.
Any luck?
If you look at the header for the 'reference' file you looked at, you'll see that CRVAL2 is zero there. Just to be clear, there's nothing wrong with CRVAL2 being non-zero, but the grid is then no longer rectilinear.
