I have a raster image of shape 9000x10000 that has RGB bands. I use the code below to get the XY coordinates of all pixels in the image, but it is very slow. Is there a faster way to do it?
from osgeo import gdal

filename = 'file.dat'
inDs = gdal.Open(filename)
outDs = gdal.Translate('{}.xyz'.format(filename), inDs, format='XYZ', creationOptions=["ADD_HEADER_LINE=YES"])
I want to save the XY coordinates and the pixel values in a dataframe.
If your raster file has a GeoTransform attribute, you can try this:
import itertools
from osgeo import gdal
import pandas as pd

def ix2xy(col, row, gt):
    '''Gets x,y of a pixel's top-left corner from its column and row'''
    x = gt[0] + col * gt[1]
    y = gt[3] + row * gt[5]
    return (x, y)
This little function gets the X/Y coordinates from the GeoTransform attribute, which for a north-up raster is a tuple with (xorigin, xres, 0, yorigin, 0, yres). The coordinates refer to the top-left corner of each pixel.
ds = gdal.Open('file.dat')
gt = ds.GetGeoTransform()
df = pd.DataFrame.from_records(itertools.product(range(ds.RasterYSize),range(ds.RasterXSize)),columns=['Row','Column'])
ds = None
df['X'], df['Y'] = zip(*df.apply(lambda x: ix2xy(x['Column'],x['Row'],gt),axis=1))
This should give you a tidy dataframe with the columns Row, Column, X and Y.
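Because df.apply calls the Python function once per pixel, it is likely to be slow on a 9000x10000 raster. A vectorized variant with NumPy is sketched below. It assumes a north-up geotransform (no rotation terms); the function name pixel_coords and the sample geotransform values are made up for illustration.

```python
import numpy as np
import pandas as pd

def pixel_coords(gt, n_rows, n_cols):
    """Return a DataFrame with Row, Column, X, Y for every pixel (top-left corners)."""
    # Row and column index of every pixel, flattened in row-major order
    rows, cols = np.indices((n_rows, n_cols))
    rows = rows.ravel()
    cols = cols.ravel()
    return pd.DataFrame({
        "Row": rows,
        "Column": cols,
        "X": gt[0] + cols * gt[1],
        "Y": gt[3] + rows * gt[5],
    })

# Hypothetical geotransform: origin (100, 200), 10-unit pixels, north-up
gt = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
df = pixel_coords(gt, n_rows=2, n_cols=3)
```

With a real file, gt would come from ds.GetGeoTransform() and the sizes from ds.RasterYSize and ds.RasterXSize. Pixel values could then be added as extra columns, for example from ds.GetRasterBand(1).ReadAsArray().ravel().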
I tried using numpy to find the mean of each column and row of the image, but the output is just a long list of numbers.
import cv2
import numpy as np
img = cv2.imread('image.jpg')  # placeholder path; the original post does not show how img is loaded
shape = np.shape(img)
pixels = np.array(img)
column_means = np.mean(pixels, axis=0)  # use axis=1 for rows
column_max = np.max(pixels, axis=0)  # use axis=1 for rows
column_min = np.min(pixels, axis=0)  # use axis=1 for rows
column_subtract = column_max - column_min
print(shape)
print(column_means)
print(column_max)
print(column_min)
print(column_subtract)
Here is the sample photo I'm working on.
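A likely reason for the long output: a colour image loaded with cv2 is a 3D array (rows, columns, channels), so a mean over axis=0 returns one value per column and per channel. The sketch below uses a small synthetic array instead of the photo, and a plain average over the channels as a stand-in for a proper grayscale conversion; both are assumptions.

```python
import numpy as np

# Synthetic stand-in for a colour image: 4 rows, 6 columns, 3 channels
img = np.arange(4 * 6 * 3, dtype=float).reshape(4, 6, 3)

# Mean over rows only: still one value per column AND per channel
per_column_per_channel = img.mean(axis=0)   # shape (6, 3)

# Collapse the channels first, then reduce along one image axis
gray = img.mean(axis=2)                     # shape (4, 6)
column_means = gray.mean(axis=0)            # one value per column, shape (6,)
row_means = gray.mean(axis=1)               # one value per row, shape (4,)
```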
I have a 3D numpy array:
data0 = np.random.rand(30, 50, 50)
I have a 2D surface:
surf = np.random.rand(50, 50) * 30
surf = surf.astype(int)
Now I want to assign 0 to data0 along the surface profile. I know a for loop can achieve this:
for xx in range(50):
    for yy in range(50):
        data0[0:surf[xx, yy], xx, yy] = 0
data0 is a 3D volume of size 30 * 50 * 50 and surf is a 2D surface profile of size 50 * 50. What I am trying to do is fill the volume with 0 from the top down to the surface (along axis=0).
The for loop is very slow and inefficient when data0 is very large. Could someone advise how to assign the values efficiently based on the surf profile?
If you want to use numpy, you can create a mask with z-index values below your surf values set to True, then fill those cells with zeros:
import numpy as np
np.random.seed(123)
x, y, z = 4, 5, 3
data0 = np.random.rand(z, x, y)
surf = np.random.rand(x, y) * z
surf = surf.astype(int)
# your attempt
# we copy the data just for the comparison
data_loop = data0.copy()
for xx in range(x):
    for yy in range(y):
        data_loop[0:surf[xx, yy], xx, yy] = 0
#again, we copy the data just for the comparison
data_np = data0.copy()
#masking the cells according to your index comparison criteria
mask = np.broadcast_to(np.arange(data0.shape[0])[:,None, None], data0.shape) < surf[None, :]
#set masked values to zero
data_np[mask] = 0
#check for equivalence of resulting arrays
print((data_np==data_loop).all())
I am sure there is a better, numpier way to generate the index number mask. As it is, this version is not necessarily faster. This depends on the shape of your array.
For x=500, y=200, and z=3000, your loop takes 1.42 s and my numpy approach 1.94 s.
For the same array size but with shape x=5000, y=2000, and z=30, your loop approach takes 7.06 s and the numpy approach 1.95 s.
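One possibly "numpier" way to build the mask is to let broadcasting do the work, so that np.broadcast_to is not needed. A minimal sketch on small shapes, checked against the double loop; whether it beats the loop still depends on the array shape.

```python
import numpy as np

rng = np.random.default_rng(123)
z, x, y = 3, 4, 5
data0 = rng.random((z, x, y))
surf = (rng.random((x, y)) * z).astype(int)

# Reference result from the original double loop
data_loop = data0.copy()
for xx in range(x):
    for yy in range(y):
        data_loop[0:surf[xx, yy], xx, yy] = 0

# Depth indices of shape (z, 1, 1) compared with the (x, y) surface
# broadcast to a boolean mask of shape (z, x, y)
mask = np.arange(z)[:, None, None] < surf

# Zero everything above the surface without modifying data0 in place
data_np = np.where(mask, 0.0, data0)
```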
I am trying to deconstruct a TIFF image to XY coordinates in Python. The first column in dataframe should be X and second column as Y. A sample file is attached. Would it be done the same way as a JPEG? So far, I have tried the code below.
Sample Tiff Image:
https://file-examples.com/index.php/sample-images-download/sample-tiff-download/
Sample Code (can't figure out what I'm missing to get coordinates):
from PIL import Image
import numpy as np
import pandas as pd
image= Image.open(r"file_example_TIFF_1MB.tiff")
mypixels = image.convert("RGB")
colors = np.array(mypixels)  # shape (height, width, 3)
colors is an HxWx3 array (rows, columns, channels). Let's look at colors[:, :, 0], which is the red channel. The X coordinate is the column index, and the Y coordinate is the row index. To create the dataframe you want, you need to iterate over these.
result = []
h, w, _ = colors.shape
for x in range(w):
    for y in range(h):
        result.append([x, y, *colors[y, x, :]])
Then, create your dataframe
df = pd.DataFrame(result, columns=["X", "Y", "R", "G", "B"])
Alternatively, use numpy.meshgrid() to create the x and y numbers for you, and then flatten them into a column vector. This will be much faster than the loopy approach for larger images.
h, w, c = colors.shape
result = np.zeros((w * h, 2 + c))
xg, yg = np.meshgrid(range(w), range(h))
result[:, 0] = xg.flatten()
result[:, 1] = yg.flatten()
for i in range(c):
    result[:, 2 + i] = colors[:, :, i].flatten()
Now you have the same thing as before in result, so you can convert it to a dataframe.
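The whole conversion, including the dataframe step, could look like the sketch below. It uses a small synthetic array in place of the TIFF so that it runs on its own (an assumption), and np.indices with np.column_stack instead of filling a preallocated array.

```python
import numpy as np
import pandas as pd

# Synthetic 2x3 RGB image standing in for the real file
h, w = 2, 3
colors = np.arange(h * w * 3).reshape(h, w, 3)

# Row (y) and column (x) index of every pixel
yg, xg = np.indices((h, w))

# One row per pixel: X, Y, then the three channel values
result = np.column_stack([xg.ravel(), yg.ravel(), colors.reshape(-1, 3)])
df = pd.DataFrame(result, columns=["X", "Y", "R", "G", "B"])
```

With a real image, colors would come from np.array(image.convert("RGB")).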
I have a set of netcdf datasets that basically look like a CSV file with columns for latitude, longitude, value. These are points along tracks that I want to aggregate to a regular grid of (say) 1 degree from -90 to 90 and -180 to 180 degrees, by for example calculating the mean and/or standard deviation of all points that fall within a given cell.
This is quite easily done with a loop
D = np.zeros((180, 360))
for ilat in np.arange(-90, 90, 1, dtype=int):
    for ilon in np.arange(-180, 180, 1, dtype=int):
        p1 = np.logical_and(ds.lat >= ilat,
                            ds.lat <= ilat + 1)
        p2 = np.logical_and(ds.lon >= ilon,
                            ds.lon <= ilon + 1)
        if np.sum(p1 * p2) == 0:
            D[90 + ilat, 180 + ilon] = np.nan
        else:
            D[90 + ilat, 180 + ilon] = np.mean(ds["var"].values[p1 * p2])
            # D[90 + ilat, 180 + ilon] = np.std(ds["var"].values[p1 * p2])
Other than using numba/cython to speed this up, I was wondering whether this is something you can directly do with xarray in a more efficient way?
You should be able to solve this using pandas and xarray.
You will first need to convert your data set to a pandas data frame.
Once this is done, with df as the dataframe and assuming the longitude and latitude columns are named lon and lat, round the lon/lat values to the nearest integer and then calculate the mean for each lon/lat pair. The groupby leaves lat and lon as the index, so you can then call the dataframe's to_xarray method to convert the result to an xarray dataset:
import xarray as xr
import pandas as pd
import numpy as np
df = df.assign(lon = lambda x: np.round(x.lon))
df = df.assign(lat = lambda x: np.round(x.lat))
# groupby makes lat/lon the index, so no set_index call is needed
df = df.groupby(["lat", "lon"]).mean()
df.to_xarray()
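A small self-contained illustration of the rounding and grouping step, using pandas only. The track points and the column name value are made up, and the mean, standard deviation and count are computed in one pass with agg.

```python
import numpy as np
import pandas as pd

# Made-up track points; the column name "value" is an assumption
df = pd.DataFrame({
    "lat": [10.2, 10.4, 10.3, -5.6],
    "lon": [20.1, 19.8, 20.4, 100.2],
    "value": [1.0, 3.0, 5.0, 7.0],
})

# Snap every point to the nearest whole degree
cells = df.assign(lat=np.round(df["lat"]), lon=np.round(df["lon"]))

# Per-cell statistics; lat/lon end up as a MultiIndex
stats = cells.groupby(["lat", "lon"])["value"].agg(["mean", "std", "count"])
```

If xarray is installed, stats.to_xarray() turns this into a gridded dataset. Note that rounding centres each cell on an integer coordinate, whereas the loop in the question uses cells that start at an integer; np.floor instead of np.round would match the loop.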
I used @robert-wilson's answer as a starting point, and to_xarray is indeed part of my solution. Other inspiration came from here. The approach that I used is shown below. It's probably slower than numba-ing my solution above, but much simpler.
import netCDF4
import numpy as np
import xarray as xr
import pandas as pd
fname = "super_funky_file.nc"
f = netCDF4.Dataset(fname)
lat = f.variables['lat'][:]
lon = f.variables['lon'][:]
vari = f.variables['super_duper_variable'][:]
df = pd.DataFrame({"lat":lat,
"lon":lon,
"vari":vari})
# Simple functions to calculate the grid location in rows/cols
# using lat/lon as inputs. Global 0.5 deg grid
# Remember to cast to integer
to_col = lambda x: np.floor((x + 90) / 0.5).astype(int)
to_row = lambda x: np.floor((x + 180.) / 0.5).astype(int)
# Map the latitudes to columns
# Map the longitudes to rows
df['col'] = df.lat.map(to_col)
df['row'] = df.lon.map(to_row)
# Aggregate by row and col
gg = df.groupby(['col', 'row'])
# Now, create an xarray dataset with
# the mean of vari per grid cell
ds = gg.mean().to_xarray()
dx = gg.std().to_xarray()
ds['stdi'] = dx['vari']
dx = gg.count().to_xarray()
ds['counti'] = dx['vari']
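If only the per-cell count and mean are needed, the same binning can also be sketched with np.histogram2d instead of the groupby used above. The points below are synthetic and the 1-degree grid is an assumption taken from the question.

```python
import numpy as np

# Synthetic track points (assumed data)
lat = np.array([10.2, 10.4, 10.9, -5.6])
lon = np.array([20.1, 20.8, 20.4, 100.2])
vari = np.array([1.0, 3.0, 5.0, 7.0])

# Cell edges of a global 1-degree grid
lat_edges = np.arange(-90, 91, 1)     # 180 cells
lon_edges = np.arange(-180, 181, 1)   # 360 cells

# Number of points and sum of values per cell
counts, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
sums, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges], weights=vari)

# Mean per cell; cells without points stay NaN
mean = np.full(counts.shape, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)
```

A standard deviation could be derived the same way from a third histogram weighted by vari**2, at the cost of some numerical precision.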
I want to use a raster for an A* and bidirectional Dijkstra path analysis in NetworkX. I am using Python for this project.
Raster example (it's a png file converted when uploaded, but the real problem is TIFF):
First I read in the raster with GDAL
input_raster = "raster.tif"
raster = gdal.Open(input_raster)
Next I read the raster as an array
bandraster = raster.GetRasterBand(1)
arr = bandraster.ReadAsArray()
So, I'll transform coords using a function:
def coord2pixelOffset(rasterfn, x, y):
    raster = gdal.Open(rasterfn)
    geotransform = raster.GetGeoTransform()
    originX = geotransform[0]
    originY = geotransform[3]
    pixelWidth = geotransform[1]
    pixelHeight = geotransform[5]
    xOffset = int((x - originX)/pixelWidth)
    yOffset = int((y - originY)/pixelHeight)
    return xOffset, yOffset
CostSurfacefn = 'raster.tif'
source_coord = (-41.1823753163, -13.83393276)
target_coord = (-40.3726182077, -14.2361991946)
# coordinates to array index
source = coord2pixelOffset(CostSurfacefn, source_coord[0], source_coord[1])
target = coord2pixelOffset(CostSurfacefn, target_coord[0], target_coord[1])
The array is like this (example):
# Grid with 2x2. The float numbers are the pixel values
[[ 1.83781120e+08 1.90789248e+08]
[ 1.83781120e+08 1.90789248e+08]]
# array[0][0] is 1.83781120e+08
# array[0][1] is 1.90789248e+08
# array[1][0] is 1.83781120e+08
# array[1][1] is 1.90789248e+08
Next, the graph is built and the bidirectional Dijkstra function is called (for example, I want the path from array[0][0] to array[1][1]):
G = nx.from_numpy_matrix(np.array(arr))
length, path = nx.bidirectional_dijkstra(G, source, target)
How do I get the node ids of source and target from the array?
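One possible reading of the problem, offered as a sketch rather than a definitive answer: nx.from_numpy_matrix (removed in NetworkX 3.0 in favour of nx.from_numpy_array) interprets the array as an adjacency matrix, so its nodes do not correspond to pixels. If every pixel should be a node, a grid graph whose nodes are (row, col) tuples is an alternative. The helper name raster_to_graph and the edge cost (mean of the two pixel values) are assumptions made for illustration.

```python
import networkx as nx
import numpy as np

def raster_to_graph(arr):
    """Build a 4-connected grid graph with one node (row, col) per pixel."""
    n_rows, n_cols = arr.shape
    G = nx.grid_2d_graph(n_rows, n_cols)
    for u, v in list(G.edges()):
        # Assumed cost model: average of the two neighbouring pixel values
        G.edges[u, v]["weight"] = (arr[u] + arr[v]) / 2.0
    return G

# Small made-up cost surface
arr = np.array([[1.0, 5.0],
                [1.0, 1.0]])
G = raster_to_graph(arr)

# Nodes are addressed like array cells: arr[0][0] is node (0, 0)
length, path = nx.bidirectional_dijkstra(G, (0, 0), (1, 1), weight="weight")
```

Since coord2pixelOffset returns (xOffset, yOffset), that is (column, row), the matching node in this graph would be (yOffset, xOffset).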