Mask raster by extent in Python using rasterio


I want to clip one raster based on the extent of another (smaller) raster. First I determine the coordinates of the corners of the smaller raster using
import rasterio as rio
import rasterio.mask
from osgeo import gdal
from shapely.geometry import Polygon
src = gdal.Open("smaller_file.tif")
ulx, xres, xskew, uly, yskew, yres = src.GetGeoTransform()
lrx = ulx + (src.RasterXSize * xres)
lry = uly + (src.RasterYSize * yres)
geometry = [[ulx,lry], [ulx,uly], [lrx,uly], [lrx,lry]]
This gives me the following output: geometry = [[-174740.0, 592900.0], [-174740.0, 2112760.0], [900180.0, 2112760.0], [900180.0, 592900.0]]. (Note that the CRS is EPSG:32651.)
Now I would like to clip the larger file using rio.mask.mask(). According to the documentation, the shapes argument should be a GeoJSON-like dict or an object that implements the Python geo interface protocol (such as a Shapely Polygon). Therefore I create a Shapely Polygon from the variable geometry, using
roi = Polygon(geometry)
Now everything is ready to use the rio.mask.mask() function.
output = rio.mask.mask(larger_file.tif, roi, crop = True)
But this gives me the following error
TypeError: 'Polygon' object is not iterable
What am I doing wrong? Or if someone knows a more elegant way to do it, please let me know.
(Unfortunately I cannot upload the two files since they're too large)

I found your question when I needed to figure out this kind of clipping myself. I got the same error and fixed it as follows.
rasterio.mask.mask expects an iterable of geometries, not a single geometry. It runs the masking over all shapes bundled in that iterable (e.g. a list or tuple), so the polygon has to be passed inside a list or tuple.
The code you posted works after the following change:
roi = [Polygon(geometry)]
With the geometry wrapped in a list or tuple, rasterio.mask.mask works as expected.
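For reference, the extent-to-shapes step can also be sketched without GDAL or Shapely objects. This is a minimal sketch, assuming a north-up raster (zero skew terms); `extent_to_shapes` is a hypothetical helper name, and the geotransform, width and height below are made-up values chosen only to reproduce the corner coordinates from the question.

```python
def extent_to_shapes(geotransform, width, height):
    """Return the raster extent as a list holding one GeoJSON-like polygon.

    Assumes a north-up raster, i.e. both skew terms are zero.
    """
    ulx, xres, xskew, uly, yskew, yres = geotransform
    lrx = ulx + width * xres
    lry = uly + height * yres
    # Closed ring: the first vertex is repeated at the end.
    ring = [(ulx, lry), (ulx, uly), (lrx, uly), (lrx, lry), (ulx, lry)]
    polygon = {"type": "Polygon", "coordinates": [ring]}
    # The masking function iterates over its shapes argument, hence the list.
    return [polygon]


# Hypothetical geotransform and raster size matching the corners in the question.
shapes = extent_to_shapes((-174740.0, 20.0, 0.0, 2112760.0, 0.0, -20.0), 53746, 75993)
print(shapes[0]["coordinates"][0][:4])
```

The resulting list could then be passed as the shapes argument, for example rasterio.mask.mask(dataset, shapes, crop=True), where dataset is a raster opened with rasterio.open() rather than a file name.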

Related

Convert RGB PNG and depth TXT to point cloud

I have a series of RGB files in PNG format, as well as the corresponding depth files in TXT format, which can be loaded with np.loadtxt. How can I merge these two files into a point cloud using open3d?
I followed the procedure from "obtain point cloud from depth numpy array using open3d - python", but the result is not human-readable.
An example: [image: source PNG] [image: PCD result]
You can get the source files from this link (Google Drive) to reproduce my result.
By the way, the depth and RGB images are not registered.
Thanks.

I had to play a bit with the settings and data and used mainly the answer of your SO link.
import cv2
import numpy as np
import open3d as o3d
color = o3d.io.read_image("a542c.png")
depth = np.loadtxt("a542d.txt")
vertices = []
for x in range(depth.shape[0]):
    for y in range(depth.shape[1]):
        vertices.append((float(x), float(y), depth[x][y]))
pcd = o3d.geometry.PointCloud()
point_cloud = np.asarray(np.array(vertices))
pcd.points = o3d.utility.Vector3dVector(point_cloud)
pcd.estimate_normals()
pcd = pcd.normalize_normals()
o3d.visualization.draw_geometries([pcd])
However, if you keep the code as provided, the whole scene looks very weird and unfamiliar. That is because your depth file contains data between 0 and almost 2.5 m.
I introduced a cut-off at 500 or 1000 mm plus removed all 0s as suggested in the other answer. Additionally I flipped the x-axis (float(-x) instead of float(x)) to resemble your photo.
# ...
vertices = []
for x in range(depth.shape[0]):
    for y in range(depth.shape[1]):
        if 0 < depth[x][y] < 500:
            vertices.append((float(-x), float(y), depth[x][y]))
For a good perspective I had to rotate the images manually. Probably open3d provides methods to do it automatically (I quickly tried pcd.transform() from your SO link above, it can help you if needed).
Results: [image: 500 mm cut-off] [image: 1000 mm cut-off]
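The nested loops can also be written with NumPy indexing, which tends to be much faster on a full-size depth image. A minimal sketch, assuming the depth values are in millimetres and using a tiny made-up array in place of a542d.txt; `depth_to_points` is a hypothetical helper name.

```python
import numpy as np


def depth_to_points(depth, max_depth=500.0):
    """Turn a 2D depth array into an (N, 3) array of x, y, z points.

    Keeps only pixels with 0 < depth < max_depth and negates the row index,
    mirroring the filtered loop in the answer above.
    """
    rows, cols = np.nonzero((depth > 0) & (depth < max_depth))
    z = depth[rows, cols]
    return np.column_stack((-rows.astype(float), cols.astype(float), z))


# Made-up 2x3 depth image standing in for the real file.
depth = np.array([[0.0, 120.0, 800.0],
                  [450.0, 0.0, 499.0]])
points = depth_to_points(depth)
print(points.shape)
```

The resulting array can be handed to o3d.utility.Vector3dVector(points) exactly like point_cloud in the code above.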

I used laspy instead of open3d because I wanted to give some colors to your image:
import imageio
import matplotlib.pyplot as plt
import numpy as np
# first reading the image for RGB values
image = imageio.imread(".../a542c.png")
# loading the depth file
depth = np.loadtxt("/home/shaig93/Documents/internship_FWF/a542d.txt")
# creating fake x, y coordinates with meshgrid
xv, yv = np.meshgrid(np.arange(400), np.arange(640), indexing='ij')
# save_las is a function based on laspy that was provided to me by my supervisor
save_las("fn.laz", image[:400, :, 0].flatten(), np.c_[yv.flatten(), xv.flatten(), depth.flatten()], cmap = plt.cm.magma_r)
and the result is this. As you can see, the objects are visible from the front.
However, from the side they are not easy to distinguish.
This makes me think that your depth file is not that good.
Another idea would be to get rid of the 0 values in your depth file so that you get a point cloud without the wall-like structure in the front. That still does not solve the depth issue, of course.
PS: I know this is not a proper answer, but I hope it helps to identify the problem.

zonal_stats: width and height must be > 0 error

I am trying to use the function zonal_stats from rasterstats Python package to get the raster statistics from a .tif file of each shape in a .shp file. I manage to do it in QGIS without any problems, but I have to do the same with more than 200 files, which will take a lot of time, so I'm trying the Python way. Both files and replication code are in my Google Drive.
My script is:
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.plot import show
from rasterstats import zonal_stats
from rasterio.transform import Affine
# Import .tif file
raster = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
# Read the raster values
array = raster.read(1)
# Get the affine
affine = raster.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
I get the following error:
Input In [1] in <cell line: 22>
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:32 in zonal_stats
return list(gen_zonal_stats(*args, **kwargs))
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:164 in gen_zonal_stats
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
File ~\Anaconda3\lib\site-packages\rasterstats\utils.py:41 in rasterize_geom
rv_array = features.rasterize(
File ~\Anaconda3\lib\site-packages\rasterio\env.py:387 in wrapper
return f(*args, **kwds)
File ~\Anaconda3\lib\site-packages\rasterio\features.py:353 in rasterize
raise ValueError("width and height must be > 0")
I have found this question about the same problem, but I can't make it work with the solution: I have tried to reverse the sign of the items in the Affine of my raster data, but I couldn't make it work:
''' Trying to use the same solution of question: https://stackoverflow.com/questions/62010050/from-zonal-stats-i-get-this-error-valueerror-width-and-height-must-be-0 '''
old_tif = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
print(old_tif.profile) # copy & paste the output and change signs
new_tif_profile = old_tif.profile
# Affine(0.004611149999999995, 0.0, -46.828504575,
# 0.0, 0.006521380000000008, -24.01169169)
new_tif_profile['transform'] = Affine(0.004611149999999995, 0.0, -46.828504575,
0.0, -0.006521380000000008, 24.01169169)
new_tif_array = old_tif.read(1)
new_tif_array = np.fliplr(np.flip(new_tif_array))
with rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif', "w", **new_tif_profile) as dest:
dest.write(new_tif_array, indexes=1)
dem = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif')
# Read the raster values
array = dem.read(1)
# Get the affine
affine = dem.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
Doing it this way, I don't get the "width and height must be > 0" error! But every stat in zs_shapefile is "NoneType", so it doesn't solve my problem.
Does anyone understand why this error happens, and which sign I have to reverse to make it work? Thanks in advance!

I would be careful with overriding the geotransform of your raster like this, unless you are really convinced the original metadata is incorrect. I'm not too familiar with Affine, but it looks like you're now setting the latitude as positive, which places the raster in the northern hemisphere. My guess would be that this lack of intersection between the vector and the raster causes the NoneType results.
I'm also not familiar with rasterstats, but I'm guessing it boils down to GDAL and NumPy at its core. So something you could try as a test is to add the all_touched=True keyword:
https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy
If that works, it might indicate that the rasterization fails because your polygons are so small compared to the pixels, that the default rasterization method results in a rasterized polygon of size 0 (in at least one of the dimensions). And that's what the error also hints at (my guess).
Keep in mind that all_touched=True changes the stats you get in result, so I would only do it for testing, or if you're comfortable with this difference.
If you really need a valid value for these (too) small polygons, there are a few workarounds you could try. Something I've done is to simply take the centroid for these polygons, and take the value of the pixel where this centroid falls on.
A potential way to identify these polygons would be to use all_touched with the "count" statistic: every polygon with a count of only 1 might be too small to get rasterized correctly. To really find this out you would probably have to do the rasterization yourself using GDAL, given that rasterstats doesn't seem to allow it.
Note that due to the shape of some of the polygons you use, the centroid might fall outside of the polygon. But given how coarse your raster data is relative to the vector, I don't think it would impact the result all that much.
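The centroid fallback described above can be sketched in plain Python. This is only an illustration, assuming a north-up raster without rotation; `pixel_at` is a hypothetical helper, and the raster origin, pixel size, values and centroid are made-up numbers, not those of the files in the question.

```python
import math


def pixel_at(x, y, origin_x, origin_y, pixel_width, pixel_height, n_rows, n_cols):
    """Return (row, col) of the pixel containing point (x, y), or None if outside.

    pixel_height is negative for the usual north-up raster.
    """
    col = math.floor((x - origin_x) / pixel_width)
    row = math.floor((y - origin_y) / pixel_height)
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None


# Made-up 3x3 raster: 0.5-degree pixels, upper-left corner at (-47.0, -23.0).
values = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
centroid = (-46.2, -23.7)  # hypothetical centroid of a tiny polygon
index = pixel_at(*centroid, -47.0, -23.0, 0.5, -0.5, 3, 3)
print(values[index[0]][index[1]])
```

In practice the centroid would come from the polygon itself (for example the centroid attribute of a Shapely geometry) and the origin and pixel size from the raster's affine transform.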
An alternative is, instead of modifying the vector, to significantly increase the resolution of your raster. You could use gdal_translate to output this to a VRT, with some form of resampling, and avoid having to write this data to disk. Once the resolution is high enough that all polygons rasterize to at least a 1x1 array, it should probably work. But your polygons are tiny compared to the pixels, so it'll be a lot. You could guess it, or analyze the envelopes of all polygons. For example take the smallest edge of the envelope as more or less the resolution that's necessary for a correct rasterization.
Edit: to clarify the above a bit further.
The default rasterization strategy of GDAL (all_touched=False) is to consider a pixel "within" the polygon if the centroid of the pixel intersects with the polygon.
Using QGIS you can, for example, convert the pixels to points and then do a spatial join with your vector. If you remove polygons that can't be joined (there's a checkbox), you'll get a different vector that most likely should work with rasterstats, given your current raster.
You could perhaps use that in the normal way (all_touched=False), and get the stats for the small polygons using all_touched=True.
In the image below, the green polygons are the ones that intersect with the centroid of a pixel, the red ones don't (and those are probably the ones rasterstats "tries" to rasterize to a size 0 array).
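The pixel-centre rule can be illustrated with a small sketch that counts how many pixel centres fall inside a rectangular zone. It assumes a north-up raster with square pixels and axis-aligned rectangles instead of real polygons; `centres_inside` and all numbers are hypothetical.

```python
import math


def centres_inside(bounds, origin_x, origin_y, pixel_size):
    """Count pixel centres that fall inside an axis-aligned rectangle.

    bounds is (min_x, min_y, max_x, max_y); the raster is north-up with
    square pixels whose upper-left corner is (origin_x, origin_y).
    """
    min_x, min_y, max_x, max_y = bounds
    # Centres sit at origin + (k + 0.5) * pixel_size along each axis.
    first_col = math.ceil((min_x - origin_x) / pixel_size - 0.5)
    last_col = math.floor((max_x - origin_x) / pixel_size - 0.5)
    first_row = math.ceil((origin_y - max_y) / pixel_size - 0.5)
    last_row = math.floor((origin_y - min_y) / pixel_size - 0.5)
    n_cols = max(0, last_col - first_col + 1)
    n_rows = max(0, last_row - first_row + 1)
    return n_cols * n_rows


# Made-up raster with 1.0-unit pixels and upper-left corner at (0, 10).
small = centres_inside((0.6, 8.6, 0.9, 8.9), 0.0, 10.0, 1.0)  # smaller than a pixel
large = centres_inside((0.2, 7.2, 2.8, 9.8), 0.0, 10.0, 1.0)  # spans several pixels
print(small, large)
```

A zone that contains no pixel centre gets no pixels under the default strategy, which is consistent with the guess above about very small polygons.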

Reprojecting coordinates with rasterio creates an irregular grid from a regular grid

I have GeoTIFF files loaded into xarray with crs = EPSG:31467. I want to transform/reproject (I don't know if there is a difference) these files into EPSG:4326. To do that, I use the rasterio.warp.transform function, which needs 1D arrays for x and y. To generate these I use the numpy.meshgrid and flatten functions. Here is a small example with my data:
import numpy as np
#Longitude and Latitude in EPSG:31467
lon = [3280914, 3281914, 3282914]
lat = [6103001, 6102001, 6101001]
#create 2d meshgrid
xv, yv = np.meshgrid(lon, lat)
xv, yv
(array([[3280914, 3281914, 3282914],
[3280914, 3281914, 3282914],
[3280914, 3281914, 3282914]]),
array([[6103001, 6103001, 6103001],
[6102001, 6102001, 6102001],
[6101001, 6101001, 6101001]]))
Now I have a sequence of different longitudes [3280914, 3281914, 3282914] for the same latitude [6103001, 6103001, 6103001].
When I now use rasterio.warp.transform(src_crs, dst_crs, x, y), these sequences disappear and I don't understand why.
from rasterio.warp import transform
# Compute the lon/lat coordinates with rasterio.warp.transform
lon, lat = transform('EPSG:31467','EPSG:4326',
xv.flatten(), yv.flatten())
np.asarray(lon).reshape(3,3), np.asarray(lat).reshape(3,3)
> (array([[5.57397386, 5.58957607, 5.6051787 ],
> [5.57473921, 5.59033795, 5.60593711],
> [5.57550412, 5.5910994 , 5.60669509]]), array([[55.00756605, 55.00800488, 55.00844171],
> [54.9985994 , 54.99903809, 54.99947477],
> [54.98963274, 54.99007128, 54.99050782]]))
np.unique(xv).shape, np.unique(yv).shape
> ((3,), (3,))
np.unique(lon).shape, np.unique(lat).shape
> ((9,), (9,))
To put the reprojected coordinates back into xarray, I need them to keep the same regular structure, with equal values along each row and column. Which process I don't understand, is it the function of transform or the concept of projections?

I can't understand what exactly you are trying to do after np.asarray(lon).reshape(3,3).
Which process I don't understand, is it the function of transform or the concept of projections?
It seems you don't understand either.
EPSG:31467 and EPSG:4326 are fundamentally different kinds of coordinate systems. EPSG:31467 is a planar rectangular coordinate system in a zonal (Gauss-Krüger) projection. EPSG:4326 is not a projection at all; it holds pure geodetic coordinates on the WGS 84 ellipsoid. What matters here is that points sharing a coordinate in EPSG:31467 do not have to share one in EPSG:4326, because in 4326 a coordinate is an angle, while in 31467 it is a distance from the equator or from the false meridian. The axes of the two systems are not parallel; they differ by the convergence of meridians. So if you change only the northing or only the easting in 31467, both latitude and longitude can change.
In the figure linked here you can see the angle between the blue lines (one cell is the 31467 analogue) and the black lines (the whole grid is the 4326 analogue):
https://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Soviet_topographic_map_kilometer_grid.svg
It's pretty easy to check that the transformation works correctly: just do it backwards.
lon, lat = transform('EPSG:31467','EPSG:4326',
xv.flatten(), yv.flatten())
x_check, y_check = transform('EPSG:4326', 'EPSG:31467', lon, lat)
#we'll have some troubles because of computational errors, so let's round
x_check = [int(round(i, 0)) for i in x_check]
print(lon)
print(x_check)
print(xv.flatten())
>[5.574033001416839, 5.5896346633743175, 5.605236748547687, 5.574797816145165, 5.5903960110246524, 5.605994628800234, 5.5755622060626155, 5.591156935778857, 5.6067520880717225]
>[3280914, 3281914, 3282914, 3280914, 3281914, 3282914, 3280914, 3281914, 3282914]
>[3280914 3281914 3282914 3280914 3281914 3282914 3280914 3281914 3282914]
The output shows that transform() returns exactly what it is expected to return.
The next snippet also works as expected (you can compare its output with the one above):
print(np.asarray(lon).reshape(3,3))
print(xv)
>[[5.574033 5.58963466 5.60523675]
> [5.57479782 5.59039601 5.60599463]
> [5.57556221 5.59115694 5.60675209]]
>[[3280914 3281914 3282914]
> [3280914 3281914 3282914]
> [3280914 3281914 3282914]]
I have never worked with rasterio, so I can't provide you with a working solution.
Some notes:
I have no idea why you need a grid for raster transformation.
The rasterio docs are clear and have a solution for you: https://rasterio.readthedocs.io/en/latest/topics/reproject.html#reprojecting-a-geotiff-dataset
You can transform a raster between CRSs directly. If not in rasterio, try osgeo.gdal: gdal.Warp(dst_file, src_file, srcSRS='EPSG:31467', dstSRS='EPSG:4326').
Note the difference between reprojecting a raster and defining its projection. The first changes the image, the second changes only the metadata. For a direct transform to work correctly, your GeoTIFF must have a valid projection definition in its metadata (one that matches the actual projection of your raster).
If you're not developing a standalone app and just need to reproject 2-3 rasters, use QGIS and do it without coding. It also helps to try out geodetic concepts on 2-3 examples in QGIS before coding. Just use it as a playground.
If you're not developing a standalone app, you can also automate the task with the QGIS Python API. You can test the workflow in the UI and then call QGIS/GDAL tools from a Python script as a batch. What is more, rasterio and other packages can be installed into QGIS's Python. Of course, it's a bad idea for deployment unless you are creating a QGIS plugin.
In EPSG:31467 a coordinate difference of 0.001 is 1 mm, so more precision is useless. In EPSG:4326 one degree is roughly 111.1 km (or about 111.3*cos(lat) km for longitude), so you can calculate the useful precision. Anything beyond 4-5 digits after the decimal point may also be useless.
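The precision estimate in the last note can be worked out with a few lines of Python. This is a rough sketch based on the spherical rule of thumb quoted above (about 111.3 km per degree); `metres_per_degree` is a hypothetical helper, and the latitude of 55 degrees is taken from the coordinates in the question.

```python
import math


def metres_per_degree(latitude_deg):
    """Approximate ground length of one degree of latitude and of longitude.

    Uses the spherical rule of thumb from the note above
    (about 111.3 km per degree, scaled by cos(latitude) for longitude).
    """
    per_degree_lat = 111_300.0
    per_degree_lon = 111_300.0 * math.cos(math.radians(latitude_deg))
    return per_degree_lat, per_degree_lon


lat_m, lon_m = metres_per_degree(55.0)
# Ground distance represented by the last decimal place, for a few precisions.
for decimals in (3, 4, 5):
    step = 10 ** -decimals
    print(decimals, round(lat_m * step, 2), round(lon_m * step, 2))
```

Each extra decimal place divides the ground distance by ten, so the number of digits worth keeping depends on the accuracy of the source data.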

Given a geotiff file, how does one find the single pixel closest to a given latitude/longitude?

I have a geotiff file that I'm opening with gdal in Python, and I need to find the single pixel closest to a specified latitude/longitude. I was previously working with an unrelated file type for similar data, so I'm completely new to both gdal and geotiff.
How does one do this? What I have so far is
from osgeo import gdal
import numpy as np
ds = gdal.Open('foo.tiff')
width = ds.RasterXSize
height = ds.RasterYSize
gt = ds.GetGeoTransform()
gp = ds.GetProjection()
data = np.array(ds.ReadAsArray())
print(gt)
print(gp)
which produces (for my files)
(-3272421.457337171, 2539.703, 0.0, 3790842.1060354356, 0.0, -2539.703)
and
PROJCS["unnamed",GEOGCS["Coordinate System imported from GRIB file",DATUM["unnamed",SPHEROID["Sphere",6371200,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",25],PARAMETER["central_meridian",265],PARAMETER["standard_parallel_1",25],PARAMETER["standard_parallel_2",25],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
Ideally, there'd be a single simple function call, and it would also return an indication whether the specified location falls outside the bounds of the raster.
My fallback is to obtain a grid from another source containing the latitudes and longitudes for each pixel and then do a brute force search for the desired location, but I'm hoping there's a more elegant way.
Note: I think what I'm trying to do is equivalent to the command line
gdallocationinfo -wgs84 foo.tif <longitude> <latitude>
which returns results like
Report:
Location: (1475P,1181L)
Band 1:
Value: 66
This suggests to me that the functionality is probably already in the gdal module, if I can just find the right method to call.

You basically need two steps:
Convert the lat/lon point to the raster-projection
Convert the mapx/mapy (in raster proj) to pixel coordinates
Given the code you already posted above, defining both projection systems can be done with:
from osgeo import gdal, osr
point_srs = osr.SpatialReference()
point_srs.ImportFromEPSG(4326) # hardcode for lon/lat
# GDAL>=3: make sure it's x/y
# see https://trac.osgeo.org/gdal/wiki/rfc73_proj6_wkt2_srsbarn
point_srs.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
file_srs = osr.SpatialReference()
file_srs.ImportFromWkt(gp)
Creating the coordinate transformation, and using it to convert the point from lon/lat to mapx/mapy coordinates (whatever projection it is) with:
ct = osr.CoordinateTransformation(point_srs, file_srs)
point_x = -114.06138 # lon
point_y = 51.03163 # lat
mapx, mapy, z = ct.TransformPoint(point_x, point_y)
To go from map coordinates to pixel coordinates, the geotransform needs to be inverted first. The inverse can then be used to retrieve the pixel coordinates:
gt_inv = gdal.InvGeoTransform(gt)
pixel_x, pixel_y = gdal.ApplyGeoTransform(gt_inv, mapx, mapy)
Flooring those pixel coordinates gives the pixel that contains the point, so they can be used to index the data array. You might need to clip them if the point you're querying is outside the raster.
import math
# floor to the containing pixel (pixel coordinates start at the upper-left corner)
pixel_x = math.floor(pixel_x)
pixel_y = math.floor(pixel_y)
# clip to file extent
pixel_x = max(min(pixel_x, width-1), 0)
pixel_y = max(min(pixel_y, height-1), 0)
pixel_data = data[pixel_y, pixel_x]
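The question also asked for an indication of whether the location falls outside the raster, which clipping hides. Below is a minimal sketch of the map-to-pixel step in plain Python, assuming a north-up raster (zero rotation terms, as in the geotransform printed in the question). It uses floor because GDAL pixel coordinates are measured from the upper-left corner of each pixel. `map_to_pixel` is a hypothetical helper, and the raster size and map coordinates are made-up values.

```python
import math


def map_to_pixel(gt, mapx, mapy, width, height):
    """Return (col, row, inside) for a map coordinate.

    gt is a GDAL-style geotransform with zero rotation terms (gt[2], gt[4]).
    """
    col = math.floor((mapx - gt[0]) / gt[1])
    row = math.floor((mapy - gt[3]) / gt[5])
    inside = 0 <= col < width and 0 <= row < height
    return col, row, inside


# Geotransform from the question; raster size and points are assumptions.
gt = (-3272421.457337171, 2539.703, 0.0, 3790842.1060354356, 0.0, -2539.703)
inside_result = map_to_pixel(gt, gt[0] + 10.5 * gt[1], gt[3] + 3.2 * gt[5], 2000, 1500)
outside_result = map_to_pixel(gt, -4000000.0, 0.0, 2000, 1500)
print(inside_result)
print(outside_result)
```

When the rotation terms are not zero, gdal.InvGeoTransform and gdal.ApplyGeoTransform from the answer above remain the safer route; the bounds check on the resulting indices stays the same.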

Why is transforming a shapely polygon not working in some cases?

I'm trying to calculate the area of a polygon given in geographic coordinates using shapely, which seems to require a transformation into a suitable projection to yield a result in square metres. I found a couple of examples online, but I couldn't get them working for my example polygon.
I therefore tried the same example polygons that came with the code snippets I found, and I noticed that the code works for some while not for others. To reproduce the results, here's the minimal example code:
import json
import pyproj
from shapely.ops import transform
from shapely.geometry import Polygon, mapping
from functools import partial
coords1 = [(-97.59238135821987, 43.47456565304017),
(-97.59244690469288, 43.47962399877412),
(-97.59191951546768, 43.47962728271748),
(-97.59185396090983, 43.47456565304017),
(-97.59238135821987, 43.47456565304017)]
coords1 = reversed(coords1) # Not sure if important, but https://geojsonlint.com says it's wrong handedness
# Doesn't seem to affect the error message though
coords2 = [(13.65374516425911, 52.38533382814119),
(13.65239769133293, 52.38675829106993),
(13.64970274383571, 52.38675829106993),
(13.64835527090953, 52.38533382814119),
(13.64970274383571, 52.38390931824483),
(13.65239769133293, 52.38390931824483),
(13.65374516425911, 52.38533382814119)]
coords = coords1 # DOES NOT WORK
#coords = coords2 # WORKS
polygon = Polygon(coords)
# Print GeoJSON to check on https://geojsonlint.com
print(json.dumps(mapping(polygon)))
projection = partial(pyproj.transform,
pyproj.Proj('epsg:4326'),
pyproj.Proj('esri:54009'))
transform(projection, polygon)
Both coords1 and coords2 are just copied from code snippets that supposedly work. However, only coords2 works for me. I've used https://geojsonlint.com to see if there's a difference between the two polygons, and it seems that the handedness/orientation of the polygon is not valid GeoJSON. I don't know if shapely even cares, but reversing the order -- and https://geojsonlint.com says it's valid GeoJSON then, and it shows the polygon on the map -- does not change the error.
So, it works with coords2, but when I use coords1 I get the following error:
~/env/anaconda3/envs/py36/lib/python3.6/site-packages/shapely/geometry/base.py in _repr_svg_(self)
398 if xmin == xmax and ymin == ymax:
399 # This is a point; buffer using an arbitrary size
--> 400 xmin, ymin, xmax, ymax = self.buffer(1).bounds
401 else:
402 # Expand bounds by a fraction of the data ranges
ValueError: not enough values to unpack (expected 4, got 0)
I assume there's something different about coords1 (and the example polygon from my own data) that causes the problem, but I cannot tell what could be different compared to coords2.
In short, what's the difference between coords1 and coords2, with one working and the other not?
UPDATE: I got it working by adding always_xy=True to the definition of the projections. Together with the newer syntax provided by shapely, avoiding partial, the working snippet looks like this:
project = pyproj.Transformer.from_proj(
    pyproj.Proj('epsg:4326'),  # source coordinate system
    pyproj.Proj('epsg:3857'),  # destination coordinate system
    always_xy=True
)
transform(project.transform, polygon)
To be honest, even after reading the docs, I don't really know what always_xy is doing. Hence I don't want to post this as an answer.
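A likely explanation, offered as an assumption rather than a confirmed diagnosis: with pyproj 2 and later, 'epsg:4326' follows the authority axis order (latitude, longitude), so without always_xy=True the first value of each pair is read as a latitude. The sketch below, in plain Python with a hypothetical helper `valid_as_lat_lon`, shows that the first vertex of coords1 cannot be a valid latitude while that of coords2 can. That would explain why only coords1 fails outright, although coords2 would then still be transformed with its axes swapped.

```python
def valid_as_lat_lon(first, second):
    """Check whether a pair is plausible when read as (latitude, longitude)."""
    return -90.0 <= first <= 90.0 and -180.0 <= second <= 180.0


# First vertex of each polygon from the question, written as (lon, lat).
coords1_vertex = (-97.59238135821987, 43.47456565304017)
coords2_vertex = (13.65374516425911, 52.38533382814119)

# Reading the pairs in (lat, lon) order, as the authority axis order would:
coords1_ok = valid_as_lat_lon(*coords1_vertex)
coords2_ok = valid_as_lat_lon(*coords2_vertex)
print(coords1_ok, coords2_ok)
```

Setting always_xy=True tells pyproj to treat input and output as (x, y), i.e. (longitude, latitude) for geographic coordinates, regardless of the axis order defined by the CRS.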

I think your approach is fine, except that reversed() returns an iterator rather than a new list.
Try this function to create a list in reversed order:
def rev_slice(mylist):
    '''
    Return a reversed copy of a list.
    mylist: a list
    '''
    a = mylist[::-1]
    return a
Call the function like so:
coords = rev_slice(coords1)
