I am currently working on a project creating raster files from numpy arrays containing data from NetCDFs. The rasters were created without errors, but when I plotted them I realized they were actually upside down (mirrored vertically, not rotated).
# find data values and append to correct list
if tas_val >= min and tas_val <= max:
    GDDs[row, column] += 1
# create the transform
x, y = lat, lon
xres, yres = 0.25, -0.25
transform = Affine.translation(x[0] - xres / 2, y[0] - yres / 2) * Affine.scale(xres, yres)
# write out GDDs array to a new raster
with rasterio.open(
    f"/content/drive/Shareddrives/Brazil/GDDTiffs/GDD_Count_{ssp}_{year}_{crop}.tif",
    mode="w",
    driver="GTiff",
    height=GDDs.shape[0],
    width=GDDs.shape[1],
    count=1,
    dtype=GDDs.dtype,
    crs="+proj=latlong",
    transform=transform,
) as new_dataset:
    new_dataset.write(GDDs, 1)
I essentially loop through values in a NetCDF, check whether the temperature information is within bounds, and increment the cell in the GDDs array whose position matches the pixel in the NetCDF. My transformation is just an Affine translation and scale and seems to work correctly - it returns a raster with the same dimensions and extent as the original NetCDF, which is the goal.
I've tried using different arrays and different transforms, as well as the np.invert(array) function in the new_dataset.write() line, but nothing actually flips my raster. Any ideas would be appreciated!
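For reference, the usual cause is that NetCDF latitude coordinates run south-to-north while GeoTIFF rows run north-to-south, so the row order needs reversing before writing. A minimal sketch (GDDs here is a small made-up stand-in for the real array):

```python
import numpy as np

# Hypothetical stand-in for the real GDDs array (rows ordered
# south-to-north, as is common in NetCDF files).
GDDs = np.arange(12).reshape(3, 4)

# GeoTIFF rows run north-to-south, so flip the rows before writing,
# keeping the negative y-resolution in the transform.
flipped = np.flipud(GDDs)

# new_dataset.write(flipped, 1) would then produce a right-side-up raster;
# the last row of GDDs is now the first row of 'flipped'.
print(flipped[0])
```

np.invert is a bitwise NOT (it changes the values, not their order), which is why it had no effect on orientation.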
A piece of equipment outputs a heatmap with a scale bar as an image, but has no option to save the data as a .csv or something that can easily be imported into Python for analysis.
I have used PIL to pull in the image, then create an array of the heatmap, frame1, with dimensions 680, 900, 3 (an XY array with the 3 RGB values for each pixel). I then made an array from the scalebar, scale1, with dimensions 254, 3 (a line with the 3 RGB values for each point on the scale). To relate this to the actual scale values I create a linear space scaleval = np.linspace(maxval,minval, 254), where maxval and minval are the max and min of the scalebar, which I transcribe from the image.
I want to match each pixel in frame1 to its closest colour match in scale1, and then store the corresponding scale value from scaleval into a dataframe df. In terms of for loops, what I want to do is:
# function returning the distance between two RGB values
def distance(c1, c2):
    (r1, g1, b1) = c1
    (r2, g2, b2) = c2
    return math.sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2)
# cycle through columns in frame1
for j in range(frame1.shape[1]):
    # cycle through rows in frame1
    for k in range(frame1.shape[0]):
        # create empty list for the distances between the selected pixel and the values in scale1
        distances = []
        # cycle through scale1 creating list of distances with current pixel
        for i in range(len(scale1)):
            distances.append(distance(scale1[i], frame1[k, j, :]))
        # find the index position of the minimum value, and store the scale value to a dataframe in the current XY position
        distarr = np.asarray(distances)
        idx = distarr.argmin()
        df.loc[k, j] = scaleval[idx]
    print("Column " + str(j + 1) + " completed")
However this would be quite slow. Any advice on how to avoid using for loops here?
In case anyone with a similar problem finds this while searching later:
I was able to vectorise the inner-most loop. The function cdist in SciPy (scipy.spatial.distance.cdist) generates the distances between one point and an array of points without iterating.
So this portion:
distances = []
# cycle through scale1 creating list of distances with current pixel
for i in range(len(scale1)):
    distances.append(distance(scale1[i], frame1[k, j, :]))
# find the index position of the minimum value, and store the scale value to a dataframe in the current XY position
distarr = np.asarray(distances)
idx = distarr.argmin()
df.loc[k, j] = scaleval[idx]
became
# create list of distances from current pixel to values in scale1 and store index of minimum distance
idx = cdist([frame1[k,j,:]],scale1).argmin()
df.loc[k,j] = scaleval[idx]
While there are still two for loops iterating through each pixel in frame1, the above change cut the run time to less than a third of what it was.
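The two remaining loops can be removed as well: pass every pixel to cdist at once and take the argmin per row. A sketch with small random stand-ins for frame1, scale1 and scaleval:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Small stand-ins for the real frame1 (H x W x 3) and scale1 (N x 3)
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, (6, 9, 3)).astype(float)
scale1 = rng.integers(0, 256, (254, 3)).astype(float)
scaleval = np.linspace(100.0, 0.0, 254)

# Flatten image pixels to (H*W, 3), compute all pairwise distances
# in one call, then take the argmin along the scale axis per pixel.
pixels = frame1.reshape(-1, 3)
idx = cdist(pixels, scale1).argmin(axis=1)        # shape (H*W,)
values = scaleval[idx].reshape(frame1.shape[:2])  # back to (H, W)
```

This removes Python-level iteration entirely, at the cost of an (H*W, N) distance matrix in memory; for a 680x900 image against 254 scale colours that is about 155 million floats, so chunking the pixel rows may be needed on small machines.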
I have a GeoTIFF of river inundation I want to associate to segments of the river for analysis.
I read the GeoTIFF into an array (shape 2832 x 14151) and now want to split the array into smaller bins based on a shapefile polyline with 6 segments. Basically I want to assign every point in the array to a bin, based on proximity to the river-reach shapefile. The array has three values: NaN, 0, and 1.
I thought the best strategy was scipy.spatial.KDTree, but that's not working for me. I tried np.digitize as well.
# Load in array and shapefile
array = rasterio.open(river).read(1)
shp = gpd.read_file(polyline)
What's the best way to sort the array into bins based on the polyline in my shapefile?
This failed because np.digitize only works on numeric arrays, not shapefile geometries:
A = array
k = shp
ki = np.argsort(k)
ks = k[ki]
i = np.digitize(A, (ks[:-1] + ks[1:]) / 2)
Thank you for the help!! I am confused about how to bin an array by geographic proximity to a polyline.
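One possible sketch (all names here are made up; scipy.spatial.cKDTree assumed): densely sample points along each polyline segment, build a KD-tree over the samples while remembering which segment each came from, then query the tree with every cell's x/y coordinate - the nearest sample's segment id is the cell's bin:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical stand-in: 3 segments, each a dense set of (x, y) points.
# In practice, interpolate points along each of the 6 shapefile segments.
segments = [
    np.column_stack([np.linspace(0, 10, 50), np.full(50, 0.0)]),
    np.column_stack([np.full(50, 10.0), np.linspace(0, 10, 50)]),
    np.column_stack([np.linspace(10, 20, 50), np.full(50, 10.0)]),
]
points = np.vstack(segments)
seg_id = np.repeat(np.arange(len(segments)), [len(s) for s in segments])

tree = cKDTree(points)

# Coordinates of every raster cell (in practice, derive real-world x/y
# from the raster's transform; here just a small synthetic grid).
ny, nx = 20, 30
xs, ys = np.meshgrid(np.linspace(0, 20, nx), np.linspace(0, 10, ny))
cells = np.column_stack([xs.ravel(), ys.ravel()])

# Nearest sampled point per cell -> that point's segment is the bin
_, nearest = tree.query(cells)
bins = seg_id[nearest].reshape(ny, nx)
```

With `bins` in hand, the inundation values (0/1, ignoring NaN) can be aggregated per segment with boolean masks like `array[bins == 2]`.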
I have (a bunch of 3D) stacks of tomographic data, in those I have deduced a certain (3D) coordinate around which I need to cut out a spherical region.
My code produces me the following image which gives an overview of what I do.
I calculate the orange and green points, based on the dashed white and dashed green region.
Around the midpoint of these, I'd like to cut out a spherical region, the representation of it is now marked with a circle in the image (also drawn by my code).
Constructing a sphere with skimage.morphology.ball and multiplying this with the original image is easy to do, but how can I set the center of the sphere at the desired 3D location in the image?
The ~30 3D stacks of images are all of different size with different regions, but I have all the necessary coordinates ready for further use.
You have some radius r and an index (i,j,k) into the data.
kernel = skimage.morphology.ball(r) returns a mask/kernel that is a = 2*r + 1 along each side. It's cube-shaped.
Take a cube-shaped slice, the size of your kernel, out of the tomograph. Starting indices depend on where you need the center to be and what radius the kernel has.
piece = data[i-r:i-r+a, j-r:j-r+a, k-r:k-r+a]
Apply the binary "ball" mask/kernel to the slice.
piece *= kernel
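Putting the two steps together, a self-contained sketch with synthetic data (the spherical mask is built with plain numpy here, but it is the same mask skimage.morphology.ball(r) produces; note the slice assumes the whole ball fits inside the volume, otherwise the shapes won't match and you'd need to clip at the edges):

```python
import numpy as np

# Synthetic stand-in for a tomographic stack, plus a centre (i, j, k)
rng = np.random.default_rng(0)
data = rng.random((40, 40, 40))
i, j, k = 20, 15, 25
r = 4
a = 2 * r + 1  # side length of the cube containing the ball

# Boolean ball: True within Euclidean distance r of the cube's centre
# (equivalent to skimage.morphology.ball(r))
zz, yy, xx = np.ogrid[-r:r + 1, -r:r + 1, -r:r + 1]
kernel = (zz ** 2 + yy ** 2 + xx ** 2 <= r ** 2).astype(data.dtype)

# Cube-shaped slice centred on (i, j, k); assumes i - r >= 0,
# i + r < data.shape[0], and likewise for j and k
piece = data[i - r:i - r + a, j - r:j - r + a, k - r:k - r + a].copy()

# Zero out everything outside the sphere
piece *= kernel
```

The `.copy()` keeps the in-place multiplication from modifying the original stack; drop it if overwriting the data is intended.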
Here are two approaches I use for 'operating' on (calculating values within) sub-regions of an array. Say you want to calculate the mean of only the values in your spherical region:
1 - Specify the coordinates of the region directly as a 'slice':
data[region_coords].mean()
2 - Use a masked version of your array, where the mask specifies the region:
data_masked.mean()
Which is better depends on what you want to do with the values in the region. Both can be used interchangeably; just choose whichever makes your code clearer/easier/faster.
In my work, I use both approaches, but more commonly the first approach (where you specify a region as a 'slice' of coordinates).
For me, the coordinate slice approach has advantages:
1 - It's more explicitly obvious what is going on
2 - You can more easily apply geometric operations to your 'region' if you need to. (e.g. rotate, translate, scale, ...)
Here is example code, and methods you can use for either approach:
import numpy as np
import skimage.morphology as mo
from typing import Tuple
def get_ball_coords(radius: int, center: Tuple[int]) -> Tuple[np.ndarray]:
    """
    Use radius and center to return the coordinates within that 3d region
    as a 'slice'.
    """
    coords = np.nonzero(mo.ball(radius))
    # 'coords' is a tuple of 1d arrays - to move center using pure numpy,
    # first convert to a 2d array
    coords_array = np.array(coords)
    center_array = np.array([center]).T
    # transform coordinates to be centered at 'center'
    coords_array = coords_array - radius + center_array
    # convert coordinates back to tuple of 1d arrays, which can be used
    # directly as a slice specification
    coords_tuple = (
        coords_array[0, :],
        coords_array[1, :],
        coords_array[2, :]
    )
    return coords_tuple
def get_masked_array(data: np.ndarray, radius: int, center: Tuple[int]) -> np.ndarray:
    """
    Return a masked version of the data array, where all values are masked
    except for the values within the sphere specified by radius and center.
    """
    # get 'ball' as a 3d array of booleans
    ball = np.array(mo.ball(radius), dtype=bool)
    # create full mask over entire data array
    mask = np.full_like(data, True, dtype=bool)
    # un-mask the 'ball' region, translated to the 'center'
    mask[
        center[0] - radius: center[0] + radius + 1,
        center[1] - radius: center[1] + radius + 1,
        center[2] - radius: center[2] + radius + 1
    ] = ~ball
    # mask is now True everywhere, except inside the 'ball'
    # at 'center' - create masked array version of data using this.
    masked_data = np.ma.array(data=data, mask=mask)
    return masked_data
# make some 3D data
data_size = (100,100,100)
data = np.random.rand(*data_size)
# define some spherical region by radius and center
region_radius = 2
region_center = (23, 41, 53)
# get coordinates of this region
region_coords = get_ball_coords(region_radius, region_center)
# get masked version of the data, based on this region
data_masked = get_masked_array(data, region_radius, region_center)
# now you can use 'region_coords' as a single 'index' (slice)
# to specify only the points with those coordinates
print('\nUSING COORDINATES:')
print('here is mean value in the region:')
print(data[region_coords].mean())
print('here is the total data mean:')
print(data.mean())
# or you can use the masked array as-is:
print('\nUSING MASKED DATA:')
print('here is mean value in the region:')
print(data_masked.mean())
print('here is the total data mean:')
print(data.mean())
I am trying to calculate the divergence of a 3D velocity field in a multi-phase flow setting (with solids immersed in a fluid). If we assume u, v, w to be the three velocity components (each an n x n x n 3D numpy array), here is the function I have for calculating divergence:
def calc_divergence_velocity(df, h=0.025):
    """
    :param df: A dataframe holding the entire vector field, with columns [x,y,z,u,v,w];
               x,y,z are the 3D coordinates of each point in the field and u,v,w
               the velocities in the x,y,z directions respectively.
    :param h: The dimension of a single side of the 3D (uniform) grid. Used
              as input to the numpy.gradient() function.
    """
    # Reshape dataframe columns to get 3D numpy arrays, so each of u,v,w is
    # an 80x80x80 ndarray.
    dim = 80
    u = df['u'].values.reshape((dim, dim, dim))
    v = df['v'].values.reshape((dim, dim, dim))
    w = df['w'].values.reshape((dim, dim, dim))
    # Note: only a scalar `h` is supplied to np.gradient because the grid
    # is uniform, with each cell having the same dimensions in the
    # x, y and z directions.
    u_grad = np.gradient(u, h, axis=0)  # central diff. du_dx
    v_grad = np.gradient(v, h, axis=1)  # central diff. dv_dy
    w_grad = np.gradient(w, h, axis=2)  # central diff. dw_dz
    # The `mask` column in the dataframe is a binary column indicating the
    # locations in the field where we are interested in measuring divergence.
    # The problem is multi-phase flow with solid particles and a fluid,
    # hence we are only interested in the fluid locations.
    sdf = df['mask'].values.reshape((dim, dim, dim))
    div = (u_grad * sdf) + (v_grad * sdf) + (w_grad * sdf)
    return div
The problem I'm having is that the divergence values I am seeing are far too high.
For example, the image below shows a distribution with values between [-350, 350], whereas most values should technically be close to zero, somewhere between [-20, 20] in my case. This tells me I'm calculating the divergence incorrectly, and I would like some pointers on how to correct the above function to calculate the divergence appropriately. As far as I can tell (please correct me if I'm wrong), I think I have done something similar to this upvoted SO response. Thanks in advance!
I have multiple 2D numpy arrays (image data of a bright object, each of size 600x600), and I ran a cross-correlation on each of the individual images vs. a stacked composite image using skimage.feature.register_translation to obtain the relative subpixel shifts of each image's centroid with respect to the centroid of the composite image. I'd now like to create a weighted 2d histogram of all my individual image data, using the relative shifts of each in order to have all of them exactly centered. But I'm confused on how to do this. My code so far is below (after finding the shifts):
import numpy as np
data = ...  # individual image data; an array of multiple 2D (600x600) arrays
# Shifts in x and y (each are same length as 'data')
dx = np.array([0.346, 0.23, 0.113, ...])
dy = np.array([-0.416, -0.298, 0.275, ...])
# Bins
bins = np.arange(-300, 300, 1)
# Weighted histogram
h, xe, ye = np.histogram2d(dx.ravel(), dy.ravel(), bins=bins, weights=data.ravel())
This isn't getting me anywhere though -- I think my weights parameter is wrong (shouldn't there be just one weight per image, instead of the whole image?), but I don't know what else to put there. The images are of different bright sources, so I can't just assume they all have the same widths either. How can I accomplish this?
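For what it's worth, rather than a weighted histogram, one common way to build a centered composite (a sketch with made-up small images and offsets, assuming scipy.ndimage.shift is acceptable) is to shift each image by the negative of its measured offset using subpixel spline interpolation, then sum the aligned images:

```python
import numpy as np
from scipy import ndimage

# Stand-ins for the real data: three small random images and their
# measured (dy, dx) offsets relative to the composite.
rng = np.random.default_rng(0)
data = rng.random((3, 32, 32))
dy = np.array([-0.416, -0.298, 0.275])
dx = np.array([0.346, 0.23, 0.113])

# Shift each image by minus its offset (subpixel, cubic spline
# interpolation), then sum the aligned images into one composite.
aligned = np.zeros_like(data[0])
for img, sy, sx in zip(data, dy, dx):
    aligned += ndimage.shift(img, (-sy, -sx), order=3, mode='constant')
```

Note the (row, col) = (y, x) ordering of the shift tuple; whether dy or dx comes first depends on the axis convention your cross-correlation returned, so that is worth double-checking against a test image.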