I am trying to use the zonal_stats function from the rasterstats Python package to get raster statistics from a .tif file for each shape in a .shp file. I managed to do it in QGIS without any problems, but I have to do the same with more than 200 files, which would take a lot of time, so I'm trying the Python way. Both files and replication code are in my Google Drive.
My script is:
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.plot import show
from rasterstats import zonal_stats
from rasterio.transform import Affine
# Import .tif file
raster = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
# Read the raster values
array = raster.read(1)
# Get the affine
affine = raster.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
I get the following error:
Input In [1] in <cell line: 22>
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:32 in zonal_stats
return list(gen_zonal_stats(*args, **kwargs))
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:164 in gen_zonal_stats
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
File ~\Anaconda3\lib\site-packages\rasterstats\utils.py:41 in rasterize_geom
rv_array = features.rasterize(
File ~\Anaconda3\lib\site-packages\rasterio\env.py:387 in wrapper
return f(*args, **kwds)
File ~\Anaconda3\lib\site-packages\rasterio\features.py:353 in rasterize
raise ValueError("width and height must be > 0")
I have found this question about the same problem, but I can't make its solution work: I tried reversing the signs of the items in the Affine of my raster data, without success:
''' Trying to use the same solution of question: https://stackoverflow.com/questions/62010050/from-zonal-stats-i-get-this-error-valueerror-width-and-height-must-be-0 '''
old_tif = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
print(old_tif.profile) # copy & paste the output and change signs
new_tif_profile = old_tif.profile
# Affine(0.004611149999999995, 0.0, -46.828504575,
# 0.0, 0.006521380000000008, -24.01169169)
new_tif_profile['transform'] = Affine(0.004611149999999995, 0.0, -46.828504575,
0.0, -0.006521380000000008, 24.01169169)
new_tif_array = old_tif.read(1)
new_tif_array = np.fliplr(np.flip(new_tif_array))
with rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif', "w", **new_tif_profile) as dest:
dest.write(new_tif_array, indexes=1)
dem = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif')
# Read the raster values
array = dem.read(1)
# Get the affine
affine = dem.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
Doing it this way, I don't get the "width and height must be > 0" error! But every stat in zs_shapefile is "NoneType", so it doesn't solve my problem.
Does anyone understand why this error happens, and which sign I have to reverse to make it work? Thanks in advance!
I would be careful about overriding the geotransform of your raster like this, unless you are really convinced the original metadata is incorrect. I'm not too familiar with Affine, but it looks like you're now setting the latitude as positive, which places the raster in the northern hemisphere. My guess would be that the resulting lack of intersection between the vector and the raster causes the NoneType results.
I'm also not familiar with rasterstats, but I'm guessing it boils down to GDAL and NumPy at its core. So something you could try as a test is to add the all_touched=True keyword:
https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy
If that works, it might indicate that the rasterization fails because your polygons are so small compared to the pixels, that the default rasterization method results in a rasterized polygon of size 0 (in at least one of the dimensions). And that's what the error also hints at (my guess).
Keep in mind that all_touched=True changes the stats you get as a result, so I would only do it for testing, or if you're comfortable with this difference.
If you really need a valid value for these (too) small polygons, there are a few workarounds you could try. Something I've done is to simply take the centroid for these polygons, and take the value of the pixel where this centroid falls on.
A potential way to identify these polygons would be to use all_touched with the "count" statistic: every polygon with a count of only 1 might be too small to get rasterized correctly. To really find this out you would probably have to do the rasterization yourself using GDAL, given that rasterstats doesn't seem to allow it.
Note that due to the shape of some of the polygons you use, the centroid might fall outside of the polygon. But given how coarse your raster data is relative to the vector, I don't think it would impact the result all that much.
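The centroid workaround described above can be sketched without any GIS library, assuming a raster with no rotation or shear. The helper name sample_at_point and the toy transform are made up for illustration; with rasterio, the six coefficients would presumably be the first six entries of raster.transform, and the point would be a polygon's centroid.

```python
import math
import numpy as np


def sample_at_point(array, transform, x, y):
    """Return the value of the pixel containing (x, y), or None if outside.

    transform is (a, b, c, d, e, f) in the same order as affine.Affine:
    x = a*col + b*row + c and y = d*col + e*row + f.
    Assumption of this sketch: b == d == 0 (no rotation or shear).
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated or sheared transforms are not handled here")
    col = math.floor((x - c) / a)
    row = math.floor((y - f) / e)
    height, width = array.shape
    if 0 <= row < height and 0 <= col < width:
        return array[row, col]
    return None


# Toy 3 x 4 raster with 0.5-unit pixels and its top-left corner at (10, 20).
toy_array = np.arange(12).reshape(3, 4)
toy_transform = (0.5, 0.0, 10.0, 0.0, -0.5, 20.0)

inside = sample_at_point(toy_array, toy_transform, 10.75, 19.25)
outside = sample_at_point(toy_array, toy_transform, 9.0, 19.25)
print(inside, outside)
```

The lookup simply inverts the affine transform for one point, so it returns a value even for polygons that are far smaller than a pixel.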
An alternative is, instead of modifying the vector, to significantly increase the resolution of your raster. You could use gdal_translate to output this to a VRT, with some form of resampling, and avoid having to write this data to disk. Once the resolution is high enough that all polygons rasterize to at least a 1x1 array, it should probably work. But your polygons are tiny compared to the pixels, so it'll be a lot. You could guess it, or analyze the envelopes of all polygons. For example take the smallest edge of the envelope as more or less the resolution that's necessary for a correct rasterization.
Edit: to clarify the above a bit further.
The default rasterization strategy of GDAL (all_touched=False) is to consider a pixel "within" the polygon if the centroid of the pixel intersects with the polygon.
Using QGIS you can, for example, convert the pixels to points, and then do a spatial join with your vector. If you remove polygons that can't be joined (there's a checkbox), you'll get a different vector that most likely should work with rasterstats, given your current raster.
You could perhaps use that in the normal way (all_touched=False), and get the stats for the small polygons using all_touched=True.
In the image below, the green polygons are the ones that intersect with the centroid of a pixel, the red ones don't (and those are probably the ones rasterstats "tries" to rasterize to a size 0 array).
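To make the pixel-centre rule concrete, here is a minimal sketch that counts how many pixel centres fall inside a polygon's envelope (bounding box). It is only a necessary condition: if no centre falls inside the envelope, none can fall inside the polygon itself. The function name pixel_centres_in_box and the toy grid are hypothetical, and the transform is assumed to have no rotation.

```python
import numpy as np


def pixel_centres_in_box(transform, shape, bounds):
    """Count pixel centres lying inside bounds = (minx, miny, maxx, maxy).

    transform is (a, b, c, d, e, f) as in affine.Affine, with b == d == 0
    assumed; shape is (height, width) of the raster.
    """
    a, _, c, _, e, f = transform
    height, width = shape
    minx, miny, maxx, maxy = bounds
    # Coordinates of the pixel centres along each axis.
    xs = c + a * (np.arange(width) + 0.5)
    ys = f + e * (np.arange(height) + 0.5)
    n_cols = np.count_nonzero((xs >= minx) & (xs <= maxx))
    n_rows = np.count_nonzero((ys >= miny) & (ys <= maxy))
    return n_rows * n_cols


# 4 x 4 raster of unit pixels covering x in [0, 4] and y in [0, 4].
grid_transform = (1.0, 0.0, 0.0, 0.0, -1.0, 4.0)
grid_shape = (4, 4)

tiny = pixel_centres_in_box(grid_transform, grid_shape, (0.6, 0.6, 0.9, 0.9))
larger = pixel_centres_in_box(grid_transform, grid_shape, (0.4, 0.4, 1.6, 1.6))
print(tiny, larger)
```

Envelopes with a count of zero are candidates for the polygons that fail with the default strategy and need either all_touched=True or the centroid approach.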
I have a ply file that I am attempting to turn into a mesh for the purposes of ray tracing. This is what it looks like in the Open3D visualizer; it is supposed to represent part of a city:
I used Open3D to build the mesh as follows (the k-d tree is just there to select a small number of points, as the file is huge):
import numpy as np
import open3d as o3d
import scipy.spatial
input_file = "san.ply"
pcd = o3d.io.read_point_cloud(input_file)
point_cloud_in_numpy = np.asarray(pcd.points)
color = np.asarray(pcd.colors)
kd = scipy.spatial.cKDTree(point_cloud_in_numpy)  # create kdtree for fast querying
near = kd.query_ball_point([0, 0, 0], 100)  # indices of points within 100 units of the origin
items = point_cloud_in_numpy[near]
colors = color[near]
# Build a smaller point cloud from the selected points
pcd2 = o3d.geometry.PointCloud()
pcd2.colors = o3d.utility.Vector3dVector(colors)
pcd2.points = o3d.utility.Vector3dVector(items)
pcd2.estimate_normals()
# Choose the ball-pivoting radii from the average point spacing
distances = pcd2.compute_nearest_neighbor_distance()
avg_dist = np.mean(distances)
radius = 2 * avg_dist
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
    pcd2,
    o3d.utility.DoubleVector([radius, radius * 2]))
vertices = np.asarray(mesh.vertices)
faces = np.asarray(mesh.triangles)
o3d.visualization.draw_geometries([mesh])
However, when graphing the mesh, we get something that looks like this:
It has many holes and is not at all suitable for ray tracing. I also tried the create_from_point_cloud_poisson method instead, but I kept getting the following error:
[ERROR] /Users/yixing/repo/Open3D/build/poisson/src/ext_poisson/PoissonRecon/Src/FEMTree.IsoSurface.specialized.inl (Line 1463)
operator()
Failed to close loop [6: 87 64 18] | (113981): (2752,2560,2196)
which I found no way to fix online. I tried looking around, but the best I found was pymeshfix, which doesn't work here because "The input is assumed to represent a single closed solid object", which my point cloud obviously is not. I'm just looking for a good way to perform surface reconstruction that keeps the shape of the city while fixing all the holes and making the surfaces formed by points near each other watertight.
Maybe you can close the holes with fill_holes() from the tensor-based TriangleMesh:
mesh = o3d.t.geometry.TriangleMesh.from_legacy(mesh).fill_holes().to_legacy()
fill_holes() takes a parameter for the maximum size of the holes to be closed:
http://www.open3d.org/docs/latest/python_api/open3d.t.geometry.TriangleMesh.html#open3d.t.geometry.TriangleMesh.fill_holes
I've got a high-resolution healpix map (nside = 4096) that I want to smooth in disks of a given radius, let's say 10 arcmin.
Being very new to healpy and having read the documentation, I found that one (not so good) way to do this is to perform a "cone search": for each pixel, find the pixels inside the disk around it, average them, and assign this new value to the pixel at the center. However, this is very time-consuming.
import numpy as np
import healpy as hp
kappa = hp.read_map("zs_1.0334.fits") #Reading my file
NSIDE = 4096
t = 0.00290888 #10 arcmin
new_array = []
n = len(kappa)
for i in range(n):
    a = hp.query_disc(NSIDE, hp.pix2vec(NSIDE, i), t)
    new_array.append(np.mean(kappa[a]))
I think the healpy.sphtfunc.smoothing function could be of some help as it states that you can enter any custom beam window function but I don't understand how this works at all...
Thanks a lot for your help !
As suggested, I can easily make use of the healpy.sphtfunc.smoothing function by specifying a custom (circular) beam window.
To compute the beam window, which was my problem, healpy.sphtfunc.beam2bl is very useful and simple in the case of a top-hat.
The appropriate l_max would be roughly 2*Nside, but it can be smaller depending on the specific map. One could, for example, compute the angular power spectrum (the Cls) and check whether it is already damped at some l smaller than l_max, which would save some more time.
Thanks a lot to everyone who helped in the comments section!
Since I spent a certain amount of time figuring out how the smoothing function works, here is a bit of code that performs a top-hat smoothing.
Cheers,
import healpy as hp
import numpy as np
import matplotlib.pyplot as plt
def top_hat(b, radius):
    return np.where(abs(b)<=radius, 1, 0)
nside = 128
npix = hp.nside2npix(nside)
#create an empty map
tst_map = np.zeros(npix)
#put a source in the middle of the map with value = 100
pix = hp.ang2pix(nside, np.pi/2, 0)
tst_map[pix] = 100
#Compute the window function in the harmonic spherical space which will smooth the map.
b = np.linspace(0,np.pi,10000)
bw = top_hat(b, np.radians(45)) #top_hat function of radius 45°
beam = hp.sphtfunc.beam2bl(bw, b, nside*3)
#Smooth map
tst_map_smoothed = hp.smoothing(tst_map, beam_window=beam)
hp.mollview(tst_map_smoothed)
plt.show()
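As a possible alternative to integrating the beam numerically, the window function of a top-hat can be written in closed form using the identity ∫ P_l(x) dx from x0 to 1 = (P_{l-1}(x0) - P_{l+1}(x0)) / (2l+1). The sketch below uses only NumPy; the names legendre_up_to and top_hat_window are made up for illustration. It normalises the window so that B_0 = 1, which is what an averaging kernel needs; as far as I can tell, beam2bl does not normalise, so with the unnormalised top_hat above the smoothed map is scaled by the solid angle of the disk.

```python
import numpy as np


def legendre_up_to(lmax, x):
    """Legendre polynomials P_0(x) .. P_lmax(x) for a scalar x (recurrence)."""
    p = np.empty(lmax + 1)
    p[0] = 1.0
    if lmax >= 1:
        p[1] = x
    for n in range(1, lmax):
        # (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        p[n + 1] = ((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1)
    return p


def top_hat_window(radius, lmax):
    """Window function B_l (l = 0..lmax) of a disk of angular radius `radius`
    (in radians), normalised so that B_0 = 1."""
    x0 = np.cos(radius)
    p = legendre_up_to(lmax + 1, x0)
    ell = np.arange(1, lmax + 1)
    bl = np.empty(lmax + 1)
    bl[0] = 1.0
    bl[1:] = (p[ell - 1] - p[ell + 1]) / ((2 * ell + 1) * (1.0 - x0))
    return bl


# Same disk as in the example above: 45 degrees, lmax = 3 * nside with nside = 128.
bl = top_hat_window(np.radians(45), lmax=3 * 128)
print(bl[:3])
```

Assuming healpy is available, the result could then be passed as hp.smoothing(tst_map, beam_window=bl). For very small radii, 1 - cos(radius) loses some precision, so it may be worth comparing a few values against beam2bl before relying on it.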
I am working on some code for converting an image to the palette of the NES. My current code is somewhat successful, but very very slow.
I am doing it by using Pythagoras' theorem. I'm using the RGB colour values as coordinates in 3D space and doing it that way. The colour in the palette with the smallest distance from the pixel's RGB is the colour that gets used.
import time
from PIL import Image
# NES is the palette list shown below; it must be defined before this class.
class image_filter():
    def load(self,path):
        self.i = Image.open(path)
        self.i = self.i.convert("RGB")
        self.pix = self.i.load()
    def colour_filter(self,colours=NES):
        start = time.time()
        for y in range(self.i.size[1]):
            for x in range(self.i.size[0]):
                pixel = list(self.pix[x,y])
                distances = []
                for colour in colours:
                    # Squared Euclidean distance in RGB space
                    distance = ((colour[0]-pixel[0])**2)+((colour[1]-pixel[1])**2)+((colour[2]-pixel[2])**2)
                    distances.append(distance)
                pixel = colours[distances.index(sorted(distances,key=lambda x:x)[0])]
                self.pix[x,y] = tuple(pixel)
        print("Took "+str(time.time()-start)+" seconds.")
f = image_filter()
f.load("C:\\path\\to\\image.png")
f.colour_filter()
f.i.save("C:\\path\\to\\new\\image.png")
Using the list:
NES = [(124,124,124),(0,0,252),
(0,0,188),(68,40,188),
(148,0,132),(168,0,32),
(168,16,0),(136,20,0),
(80,48,0),(0,120,0),
(0,104,0),(0,88,0),
(0,64,88),(0,0,0),
(188,188,188),(0,120,248),
(0,88,248),(104,68,252),
(216,0,204),(228,0,88),
(248,56,0),(228,92,16),
(172,124,0),(0,184,0),
(0,168,0),(0,168,68),
(0,136,136),(248,248,248),
(60,188,252),(104,136,252),
(152,120,248),
(248,120,248),(248,88,152),
(248,120,88),(252,160,68),
(184,248,24),(88,216,84),
(88,248,152),(0,232,216),
(120,120,120),(252,252,252),(164,228,252),
(184,184,248),(216,184,248),
(248,184,248),(248,164,192),
(240,208,176),(252,224,168),
(248,216,120),(216,248,120),
(184,248,184),(184,248,216),
(0,252,252),(216,216,216)]
This produces the following Input:
and Output:
This takes between 14 and 20 seconds, which is much too long for its intended application. Does anyone know of any ways to greatly speed this up?
As an idea, I was thinking it may be possible to use numpy arrays for this; however I am not at all familiar enough with numpy arrays to be able to pull it off.
If possible, I would also like to try avoiding using scipy -- I know that, at least under Windows, it can be a pain to install and would prefer to steer clear.
Approach #1 : We could use Scipy's cdist to get the euclidean distances and then look for the min distance arg and thus select the appropriate colour.
Thus, with NumPy arrays as the inputs, we would have an implementation like so -
from scipy.spatial.distance import cdist
out = colours[cdist(pix.reshape(-1,3),colours).argmin(1)].reshape(pix.shape)
Approach #2 : Here's another approach with broadcasting and np.einsum -
subs = pix - colours[:,None,None]
out = colours[np.einsum('ijkl,ijkl->ijk',subs,subs).argmin(0)]
Interfacing between PIL/lists and NumPy arrays
To accept images read through PIL, use :
pix = np.asarray(Image.open('input_filename'))
To Use colours as array :
colours = np.asarray(NES)
# .... Use one of the listed approaches and get out as output array
To output the image :
i = Image.fromarray(out.astype('uint8'),'RGB')
i.save("output_filename")
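Putting the pieces together, here is a minimal NumPy-only sketch (no SciPy) of the nearest-colour mapping. It is a variation on Approach #2 rather than the exact code above: the function name nearest_palette and the rows_per_chunk parameter are made up, the image is assumed to be an H x W x 3 RGB array, and rows are processed in chunks so that the intermediate (rows, W, N, 3) array stays small for large images.

```python
import numpy as np


def nearest_palette(pix, colours, rows_per_chunk=64):
    """Map every pixel of an (H, W, 3) image to its nearest palette colour."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    pix = np.asarray(pix, dtype=np.int32)
    colours = np.asarray(colours, dtype=np.int32)
    out = np.empty(pix.shape, dtype=np.uint8)
    for start in range(0, pix.shape[0], rows_per_chunk):
        block = pix[start:start + rows_per_chunk]
        # diff has shape (rows, W, N, 3): every pixel against every colour.
        diff = block[:, :, None, :] - colours[None, None, :, :]
        # Squared Euclidean distance, shape (rows, W, N).
        dist2 = np.einsum('hwnc,hwnc->hwn', diff, diff)
        out[start:start + rows_per_chunk] = colours[dist2.argmin(axis=2)]
    return out


# Tiny 1 x 2 image and a three-colour subset of the NES palette.
small_palette = [(0, 0, 0), (0, 0, 252), (248, 248, 248)]
small_image = np.array([[[250, 250, 250], [10, 0, 240]]], dtype=np.uint8)
mapped = nearest_palette(small_image, small_palette)
print(mapped.tolist())
```

With a real image, pix would come from np.asarray(Image.open(...).convert("RGB")) and the result can be saved with Image.fromarray as shown above.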
Sample input, output using given colour palette NES -
I've created a class to which I pass an image (2D array, 1280x720). It's supposed to iterate through it, looking for the highest value:
import numpy as np
class myCv:
    def maxIntLoc(self,image):
        intensity = image[0,0] #columns, rows
        coordinates = (0,0)
        for y in xrange(0,len(image)):
            for x in xrange(0,len(image[0])):
                if np.all(image[x,y] > intensity):
                    intensity = image[x,y]
                    coordinates = (x,y)
        return (intensity,coordinates)
Yet when I run it I get the error:
if np.all(image[x,y] > intensity):
IndexError: index 720 is out of bounds for axis 0 with size 720
Any help would be great as I'm new to Python.
Thanks,
Shaun
Regardless of the index error that you are experiencing, which has been addressed by others, iterating through pixels/voxels is not an efficient way of manipulating images. The issue becomes particularly evident in multi-dimensional images, where you face the curse of dimensionality.
The better way to do this is to use vectorisation (also known as array programming) in programming languages that support it (e.g. Python, Julia, MATLAB). With this method, you will get the results you're looking for much more efficiently (often orders of magnitude faster). In Python, this can be achieved either using generators, which are not suitable for images as they don't really produce the results until called, or using NumPy arrays.
Here is an example:
Masking image matrices by vectorisation
from numpy.random import randint
from matplotlib.pyplot import figure, imshow, title, grid, show
def mask_img(img, thresh, replacement):
    # Copy of the image for masking. Use of |.copy()| is essential so that
    # the original array is not modified in place.
    masked = img.copy()
    # Replacement is the value to replace anything that
    # (in this case) is below the threshold.
    masked[img < thresh] = replacement  # Mask using vectorisation methods.
    return masked
# Initial image to be masked (arbitrary example here).
# In this example, we assign a 100 x 100 matrix of random integers
# between 0 and 255 as our sample image.
initial_image = randint(0, 256, [100, 100])
threshold = 150 # Threshold
# Masking process.
masked_image = mask_img(initial_image, threshold, 0)
# Plots.
fig = figure(figsize=[16,9])
fig.add_subplot(121)
imshow(initial_image, interpolation='none', cmap='gray')
title('Initial image')
grid(False)
fig.add_subplot(122)
imshow(masked_image, interpolation='none', cmap='gray')
title('Masked image')
grid(False)
show()
Which returns:
Of course you can put the masking process (function) in a loop to do this on a batch of images. You can modify the indices and do it on 3D, 4D (e.g. MRI), or 5D (e.g. CAT scan) images too, without the need to iterate over each individual pixel or voxel.
Hope this helps.
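In Python, as in most programming languages, indexes start at 0.
So on an axis of size 720 you can access only indexes 0 to 719.
Check with a debug print that len(image) and len(image[0]) really return the values you expect. For a NumPy image the shape is (rows, columns), so a 1280x720 image gives 720 and 1280, and the pixel at column x, row y is image[y, x], not image[x, y].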
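For a single-channel (2D) image, the whole search can also be done without explicit loops. The sketch below is one way to do it, assuming a NumPy array of shape (rows, columns); the function name max_intensity_location is made up, and the coordinates are returned as (x, y) to match the convention used in the question.

```python
import numpy as np


def max_intensity_location(image):
    """Return (max value, (x, y)) for a 2D image indexed as image[row, col]."""
    image = np.asarray(image)
    # argmax works on the flattened array; unravel_index converts the flat
    # position back to (row, col).
    row, col = np.unravel_index(np.argmax(image), image.shape)
    return image[row, col], (int(col), int(row))


# A 1280x720 image is stored as 720 rows by 1280 columns.
test_image = np.zeros((720, 1280), dtype=np.uint8)
test_image[100, 900] = 7
value, coords = max_intensity_location(test_image)
print(value, coords)
```

If several pixels share the maximum value, np.argmax returns the first one in row-major order.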