delete pixel from the raster image with specific range value

delete pixel from the raster image with specific range value - python

update :
any idea how to delete pixel from specific range value raster image with
using numpy/scipy or gdal?
or how to can create new raster with some class using raster calculation expressions(better)
for example i have a raster image with the
5 class :
1. 0-100
2. 100-200
3. 200-300
4. 300-500
5. 500-1000
and i want to delete class 1 range value
or maybe class 1,2,4,5
i begin with this script :
import numpy as np
from osgeo import gdal
ds = gdal.Open("raster3.tif")
myarray = np.array(ds.GetRasterBand(1).ReadAsArray())
#print myarray.shape
#print myarray.size
#print myarray
new=np.delete(myarray[::2], 1)
but i cant to complete
the image
White in class 5 and black class 1

Rasters are 2-D arrays of values, with each value being stored in a pixel (which stands for picture element). Each pixel must contain some information. It is not possible to delete or remove pixels from the array because rasters are usually encoded as a simple 1-dimensional string of bits. Metadata commonly helps explain where line breaks are and the length of the bitstring, so that the 1-D bitstring can be understood as a 2-D array. If you "remove" a pixel, then you break the raster. The 2-D grid is no longer valid.
Of course, there are many instances where you do want to effectively discard or clean the raster of data. Such an example might be to remove pixels that cover land from a raster of sea-surface temperatures. To accomplish this goal, many geospatial raster formats hold metadata describing what are called NoData values. Pixels containing a NoData value are interpreted as not existing. Recall that in a raster, each pixel must contain some information. The NoData paradigm allows the structure and format of rasters to be met, while also giving a way to mask pixels from being displayed or analyzed. There is still data (bits, 1s and 0s) at the masked pixels, but it only serves to identify the pixel as invalid.
With this in mind, here is an example using gdal which will mask values in the range of 0-100 so they are NoData, and "do not exist". The NoData value will be specified as 0.
from osgeo import gdal
# open dataset to read, and get a numpy array
ds = gdal.Open("raster3.tif", 'r')
myarray = ds.GetRasterBand(1).ReadAsArray()
# modify numpy array to mask values
myarray[myarray <= 100] = 0
# open output dataset, which is a copy of original
driver = gdal.GetDriverByName('GTiff')
ds_out = driver.CreateCopy("raster3_with_nodata.tif", ds)
# write the modified array to the raster
ds_out.GetRasterBand(1).WriteArray(myarray)
# set the NoData metadata flag
ds_out.GetRasterBand(1).SetNoDataValue(0)
# clear the buffer, and ensure file is written
ds_out.FlushCache()

I want to contribute with an example for landsat data.
In this qick guide, you will be able to exclude Landsat cloud pixels.
Landsat offers the Quality Assessment Band (BQA), which includes int32 values (classes) regarding natural features such as Clouds, Rocks, Ice, Water, Cloud Shadow etc.
We will use the BQA to clip the cloud pixels in the other bands.
# Import Packages
import rasterio as rio
import earthpy.plot as ep
from matplotlib import pyplot
import rioxarray as rxr
from numpy import ma
# Open the Landsat Band 3
Landsat_Image = rxr.open_rasterio(r"C:\...\LC08_L1TP_223075_20210311_20210317_01_T1_B3.tif")
# Open the Quality Assessment Band
BQA = rxr.open_rasterio(r"C:\...\LC08_L1TP_223075_20210311_20210317_01_T1_BQA.tif").squeeze()
# Create a list with the QA values that represent cloud, cloud_shadow, etc.
Cloud_Values = [6816, 6848, 6896, 7072]
# Mask the data using the pixel QA layer
landsat_masked = Landsat_Image.where(~BQA.isin(Cloud_Values))
landsat_masked
# Plot the masked data
landsat_masked_plot = ma.masked_array(landsat_masked.values,landsat_masked.isnull())
# Plot
ep.plot_rgb(landsat_masked_plot, rgb=[2, 1, 0], title = "Masked Data")
plt.show()
###############################################################################
# Export the masked Landsat Scenes to Directory "Masked_Bands_QA"
out_img = landsat_masked
out_img.shape
out_transform = landsat_masked.rio.transform()
# Get a Band of the same Scene for reference
rastDat = rio.open(r"C:\Dados_Espaciais\NDVI_Usinas\Adeco\Indices\Imagens\LC08_L1TP_223075_20210311_20210317_01_T1\LC08_L1TP_223075_20210311_20210317_01_T1_B3.tif")
#copying metadata from original raster
out_meta = rastDat.meta.copy()
#amending original metadata
out_meta.update({'nodata': 0,
'height' : out_img.shape[1],
'width' : out_img.shape[2],
'transform' : out_transform})
# writing and then re-reading the output data to see if it looks good
for i in range(out_img.shape[0]):
with rio.open(rf"C:\Dados_Espaciais\DSM\Bare_Soil_Landsat\Teste_{i+1}_masked.tif",'w',**out_meta) as dst:
dst.write(out_img[i,:,:],1)
This way you tell the program:
Check the areas in BQA with these "Cloud_Values" and exclude these areas, but in the landsat image that I provided.
I hope it works.

Related

covert rgb png and depth txt to point cloud

I have a series of rgb files in png format, as well as the corresponding depth file in txt format, which can be loaded with np.loadtxt. How could I merge these two files to point cloud using open3d?
I followed the procedure as obtain point cloud from depth numpy array using open3d - python, but the result is not readable for human.
The examples is listed here:
the source png:
the pcd result:
You can get the source file from this link ![google drive] to reproduce my result.
By the way, the depth and rgb are not registerd.
Thanks.

I had to play a bit with the settings and data and used mainly the answer of your SO link.
import cv2
import numpy as np
import open3d as o3d
color = o3d.io.read_image("a542c.png")
depth = np.loadtxt("a542d.txt")
vertices = []
for x in range(depth.shape[0]):
for y in range(depth.shape[1]):
vertices.append((float(x), float(y), depth[x][y]))
pcd = o3d.geometry.PointCloud()
point_cloud = np.asarray(np.array(vertices))
pcd.points = o3d.utility.Vector3dVector(point_cloud)
pcd.estimate_normals()
pcd = pcd.normalize_normals()
o3d.visualization.draw_geometries([pcd])
However, if you keep the code as provided, the whole scene looks very weird and unfamiliar. That is because your depth file contains data between 0 and almost 2.5 m.
I introduced a cut-off at 500 or 1000 mm plus removed all 0s as suggested in the other answer. Additionally I flipped the x-axis (float(-x) instead of float(x)) to resemble your photo.
# ...
vertices = []
for x in range(depth.shape[0]):
for y in range(depth.shape[1]):
if 0< depth[x][y]<500:
vertices.append((float(-x), float(y), depth[x][y]))
For a good perspective I had to rotate the images manually. Probably open3d provides methods to do it automatically (I quickly tried pcd.transform() from your SO link above, it can help you if needed).
Results
500 mm cut-off: and 1000 mm cut-off: .

I used laspy instead of open3d because wanted to give some colors to your image:
import imageio
import numpy as np
# first reading the image for RGB values
image = imageio.imread(".../a542c.png")
loading the depth file
depth = np.loadtxt("/home/shaig93/Documents/internship_FWF/a542d.txt")
# creating fake x, y coordinates with meshgrid
xv, yv = np.meshgrid(np.arange(400), np.arange(640), indexing='ij')
# save_las is a function based on laspy that was provided to me by my supervisor
save_las("fn.laz", image[:400, :, 0].flatten(), np.c_[yv.flatten(), xv.flatten(), depth.flatten()], cmap = plt.cm.magma_r)
and the result is this. As you can see objects are visible from front.
However from side they are not easy to distinguish.
This means to me to think that your depth file is not that good.
Another idea would be also getting rid off 0 values from your depth file so that you can get point cloud without a wall kind of structure in the front. But still does not solve depth issue of course.
ps. I know this is not a proper answer but I hope it was helpful on identifying the problem.

zonal_stats: width and height must be > 0 error

I am trying to use the function zonal_stats from rasterstats Python package to get the raster statistics from a .tif file of each shape in a .shp file. I manage to do it in QGIS without any problems, but I have to do the same with more than 200 files, which will take a lot of time, so I'm trying the Python way. Both files and replication code are in my Google Drive.
My script is:
import rasterio
import geopandas as gpd
import numpy as np
from rasterio.plot import show
from rasterstats import zonal_stats
from rasterio.transform import Affine
# Import .tif file
raster = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
# Read the raster values
array = raster.read(1)
# Get the affine
affine = raster.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
I get the following error:
Input In [1] in <cell line: 22>
zs_shapefile = zonal_stats(shapefile, array, affine = affine,
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:32 in zonal_stats
return list(gen_zonal_stats(*args, **kwargs))
File ~\Anaconda3\lib\site-packages\rasterstats\main.py:164 in gen_zonal_stats
rv_array = rasterize_geom(geom, like=fsrc, all_touched=all_touched)
File ~\Anaconda3\lib\site-packages\rasterstats\utils.py:41 in rasterize_geom
rv_array = features.rasterize(
File ~\Anaconda3\lib\site-packages\rasterio\env.py:387 in wrapper
return f(*args, **kwds)
File ~\Anaconda3\lib\site-packages\rasterio\features.py:353 in rasterize
raise ValueError("width and height must be > 0")
I have found this question about the same problem, but I can't make it work with the solution: I have tried to reverse the signal of the items in the Affine of my raster data, but I couldn't make it work:
''' Trying to use the same solution of question: https://stackoverflow.com/questions/62010050/from-zonal-stats-i-get-this-error-valueerror-width-and-height-must-be-0 '''
old_tif = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Arroz_2019-03.tif')
print(old_tif.profile) # copy & paste the output and change signs
new_tif_profile = old_tif.profile
# Affine(0.004611149999999995, 0.0, -46.828504575,
# 0.0, 0.006521380000000008, -24.01169169)
new_tif_profile['transform'] = Affine(0.004611149999999995, 0.0, -46.828504575,
0.0, -0.006521380000000008, 24.01169169)
new_tif_array = old_tif.read(1)
new_tif_array = np.fliplr(np.flip(new_tif_array))
with rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif', "w", **new_tif_profile) as dest:
dest.write(new_tif_array, indexes=1)
dem = rasterio.open(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\tentativa.tif')
# Read the raster values
array = dem.read(1)
# Get the affine
affine = dem.transform
# Import shape file
shapefile = gpd.read_file(r'M:\PUBLIC\Felipe Dias\Pesquisa\Interpolação Espacial\Setores_Censit_SP_WGS84.shp')
# Zonal stats
zs_shapefile = zonal_stats(shapefile, array, affine=affine,
stats=['min', 'max', 'mean', 'median', 'majority'])
Doing this way, I don't get the "width and height must be > 0" error! But every stat in zs_shapefile is "NoneType", so it doesn't help my problem.
Does anyone understands why this error happens, and which sign I have to reverse for making it work? Thanks in advance!

I would be careful with overriding the geotransform of your raster like this, unless you are really convinced the original metadata is incorrect. I'm not too familiar with Affine, but it looks like you're setting the latitude now as positive? Placing the raster on the northern hemisphere. My guess would be that this lack of intersection between the vector and raster causes the NoneType results.
I'm also not familiar with raster_stats, but I'm guessing it boils down to GDAL & Numpy at the core of it. So something you could try as a test is to add the all_touched=True keyword:
https://pythonhosted.org/rasterstats/manual.html#rasterization-strategy
If that works, it might indicate that the rasterization fails because your polygons are so small compared to the pixels, that the default rasterization method results in a rasterized polygon of size 0 (in at least one of the dimensions). And that's what the error also hints at (my guess).
Keep in mind that all_touched=True changes the stats you get in result, so I would only do it for testing, or if you're comfortable with this difference.
If you really need a valid value for these (too) small polygons, there are a few workarounds you could try. Something I've done is to simply take the centroid for these polygons, and take the value of the pixel where this centroid falls on.
A potential way to identify these polygons would be to use all_touched with the "count" statistic, every polygon with a count of only 1 might be too small to get rasterized correctly. To really find this out you would probably have to do the rasterization yourself using GDAL, given that raster_stats doesn't seem to allow it.
Note that due to the shape of some of the polygons you use, the centroid might fall outside of the polygon. But given how course your raster data is, relative to the vector, I don't think it would impact the result all that much.
An alternative is, instead of modifying the vector, to significantly increase the resolution of your raster. You could use gdal_translate to output this to a VRT, with some form of resampling, and avoid having to write this data to disk. Once the resolution is high enough that all polygons rasterize to at least a 1x1 array, it should probably work. But your polygons are tiny compared to the pixels, so it'll be a lot. You could guess it, or analyze the envelopes of all polygons. For example take the smallest edge of the envelope as more or less the resolution that's necessary for a correct rasterization.
Edit; To clarify the above a bit further.
The default rasterization strategy of GDAL (all_touched=False) is to consider a pixel "within" the polygon if the centroid of the pixel intersects with the polygon.
Using QGIS you can for example convert the pixels to points, and then do a spatial join with your vector. If you remove polygons that can't be joined (there's a checkbox), you'll get a different vector that most likely should work with raster_stats, given your current raster.
You could perhaps use that in the normal way (all_touched=False), and get the stats for the small polygons using all_touched=True.
In the image below, the green polygons are the ones that intersect with the centroid of a pixel, the red ones don't (and those are probably the ones raster_stats "tries" to rasterize to a size 0 array).

cant save an 4d array int .txt file

I am using the sliding window technic to an image and i am extracting the mean values of pixels of each one window. So the results are someting like this [[[[215.015625][123.55036272][111.66057478]]]].now the question is how could i save all these values for every one window into a txt file or at a CSV because i want to use them for further compare similarities? whatever i tried the error is same..that it is a 4D array and not an 1D or 2D. I ll appreciate any help really.! Thank you in advance
import cv2
import matplotlib.pyplot as plt
import numpy as np
# read the image and define the stepSize and window size
# (width,height)
image2 = cv2.imread("bird.jpg")# your image path
image = cv2.resize(image2, (224, 224))
tmp = image # for drawing a rectangle
stepSize = 10
(w_width, w_height) = (60, 60 ) # window size
for x in range(0, image.shape[1] - w_width, stepSize):
for y in range(0, image.shape[0] - w_height, stepSize):
window = image[x:x + w_width, y:y + w_height, :]
# classify content of the window with your classifier and
# determine if the window includes an object (cell) or not
# draw window on image
cv2.rectangle(tmp, (x, y), (x + w_width, y + w_height), (255, 0, 0), 2) # draw rectangle on image
plt.imshow(np.array(tmp).astype('uint8'))
# show all windows
plt.show()
mean_values=[]
mean_val, std_dev = cv2.meanStdDev(image)
mean_val = mean_val[:3]
mean_values.append([mean_val])
mean_values = np.asarray(mean_values)
print(mean_values)

Human Readable Option
Assuming that you want the data to be human readable, saving the data takes a little bit more work. My search showed me that there's this solution for saving 3D data to a text file. However, it's pretty simple to extend this example to 4D for your use case. This code is taken and adapted from that post, thank you Joe Kington and David Cheung.
import numpy as np
data = np.arange(2*3*4*5).reshape((2,3,4,5))
with open('test.csv', 'w') as outfile:
# We write this header for readable, the pound symbol
# will cause numpy to ignore it
outfile.write('# Array shape: {0}\n'.format(data.shape))
# Iterating through a ndimensional array produces slices along
# the last axis. This is equivalent to data[i,:,:] in this case.
# Because we are dealing with 4D data instead of 3D data,
# we need to add another for loop that's nested inside of the
# previous one.
for threeD_data_slice in data:
for twoD_data_slice in threeD_data_slice:
# The formatting string indicates that I'm writing out
# the values in left-justified columns 7 characters in width
# with 2 decimal places.
np.savetxt(outfile, twoD_data_slice, fmt='%-7.2f')
# Writing out a break to indicate different slices...
outfile.write('# New slice\n')
And then once the data has been saved all you need to do is load it and reshape it (np.load()) will default to reading in the data as a 2D array but np.reshape() will allow us to recover the structure. Again, this code is adapted from the previous post.
new_data = np.loadtxt('test.csv')
# Note that this returned a 2D array!
print(new_data.shape)
# However, going back to 3D is easy if we know the
# original shape of the array
new_data = new_data.reshape((2,3,4,5))
# Just to check that they're the same...
assert np.all(new_data == data)
Binary Option
Assuming that human readability is not necessary, I would recommend using the built-in *.npy format which is described here. This stores the data in a binary format.
You can save the array by doing np.save('NAME_OF_ARRAY.npy', ARRAY_TO_BE_SAVED) and then load it with SAVED_ARRAY = np.load('NAME_OF_ARRAY.npy').
You can also save several numpy array in a single zip file with the np.savez() function like so np.savez('MANY_ARRAYS.npz', ARRAY_ONE, ARRAY_TWO). And you load the zipped arrays in a similar fashion SEVERAL_ARRAYS = np.load('MANY_ARRAYS.npz').

Read large numpy-array as czi-file in python

I have the following snippet:
from aicspylibczi import CziFile
from pathlib import Path
pth = Path('/Volumes/USB/20x_HE.czi')
czi = CziFile(pth)
image, shp = czi.read_image(C=0, M=0) # very slow
The parameters C und M are there to slice the big array in to little numpy pieces.
The File is 3,4GB big and it is taking to long(with 8GB RAM Macbook) so I abort it always.
I think thats not okay because I want to have the first slice of the array, not the whole matrix.

You can try slideio python package (http://slideio.com). It makes use of internal image pyramids. You can read the image partially with high resolution or the whole image with low resolution.
The code below rescales the image so that the width of the delivered raster will be 500 pixels (the height is computed to keep the image size ratio).
import slideio
slide = slideio.open_slidei(file_path="/data/a.czi",driver_id="CZI")
scene = slide.get_scene(0)
block = scene.read_block(size=(500,0))

By slice do you mean the first slice of a z-stack? The package you are using, aicspylibczi, allows you to specify a z coordinate e.g. to read the first z-slice:
image, shp = czi.read_image(C=0, M=0, Z=0)

Python IndexError: Out of bounds

I've created a class of which I pass an image (2D array, 1280x720). It's suppose to iterate through, looking for the highest value:
import bumpy as np
class myCv:
def maxIntLoc(self,image):
intensity = image[0,0] #columns, rows
coordinates = (0,0)
for y in xrange(0,len(image)):
for x in xrange(0,len(image[0])):
if np.all(image[x,y] > intensity):
intensity = image[x,y]
coordinates = (x,y)
return (intensity,coordinates)
Yet when I run it I get the error:
if np.all(image[x,y] > intensity):
IndexError: index 720 is out of bounds for axis 0 with size 720
Any help would be great as I'm new to Python.
Thanks,
Shaun

Regardless of the index error that you are experience, which has been addressed by others, iterating through pixels/voxels is not a valid method for manipulating images. The issue becomes particularly evident in multi-dimensional images, where you face the curse of dimensionality.
The correct way to do this is to use vectorisation in programming languages that support it (e.g. Python, Julia, MATLAB). Through this method, you will achieve the results you're looking for much more efficiently (and thousands of times faster). Click here to find out more about vectorisation (aka. array programming). In Python, this can be achieved either using generators, which are not suitable for images as they don't really produce the results until called; or using NumPy arrays.
Here is an example:
Masking image matrices by vectorisation
from numpy.random import randint
from matplotlib.pyplot import figure, imshow, title, grid, show
def mask_img(img, thresh, replacement):
# Copy of the image for masking. Use of |.copy()| is essential to
# prevent memory mapping.
masked = initial_image.copy()
# Replacement is the value to replace anything that
# (in this case) is bellow the threshold.
masked[initial_image<thresh] = replacement # Mask using vectorisation methods.
return masked
# Initial image to be masked (arbitrary example here).
# In this example, we assign a 100 x 100 matrix of random integers
# between 1 and 256 as our sample image.
initial_image = randint(0, 256, [100, 100])
threshold = 150 # Threshold
# Masking process.
masked_image = mask_img(initial_image, threshold, 0)
# Plots.
fig = figure(figsize=[16,9])
fig.add_subplot(121)
imshow(initial_image, interpolation='None', cmap='gray')
title('Initial image')
grid('off')
fig.add_subplot(122)
imshow(masked_image, interpolation='None', cmap='gray')
title('Masked image')
grid('off')
show()
Which returns:
Of course you can put the masking process (function) in a loop to do this on a batch of images. You can modify the indices and do it on 3D, 4D (e.g. MRI), or 5D (e.g. CAT scan) images too, without the need to iterate over each individual pixel or voxel.
Hope this helps.

In python, like most programming languages, indexes start at 0.
So you can access only pixels from 0 to 719.
Check with a debug print that len(image) and len(image[0]) are indeed returning 1280 and 720.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.