I am currently working on a project creating raster files using numpy arrays containing data from netCDFs. The rasters were created smoothly but when I plotted them, I realized they were actually upside down (mirrored vertically, not rotated).
# find data values and append to correct list
if tas_val >= min and tas_val <= max:
    GDDs[row, column] += 1
# create the transform
x, y = lat, lon
xres, yres = 0.25, -0.25
transform = Affine.translation(x[0] - xres / 2, y[0] - yres / 2) * Affine.scale(xres, yres)
# write out GDDs array to a new raster
with rasterio.open(
    f"/content/drive/Shareddrives/Brazil/GDDTiffs/GDD_Count_{ssp}_{year}_{crop}.tif",
    mode="w",
    driver="GTiff",
    height=GDDs.shape[0],
    width=GDDs.shape[1],
    count=1,
    dtype=GDDs.dtype,
    crs="+proj=latlong",
    transform=transform,
) as new_dataset:
    new_dataset.write(GDDs, 1)
I essentially loop through values in a NetCDF, check whether the temperature information is within bounds, and increment the element of the GDDs array whose position matches the pixel in the NetCDF. My transformation is just an Affine translation and scale and seems to work correctly - it returns a raster with the same dimensions and size as the original NetCDF, which is the goal.
I've tried using different arrays, different transforms, as well as the np.invert(array) function in the new_dataset.write() line but nothing is able to actually flip my raster. Any ideas would be appreciated!
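One thing that may be worth checking: np.invert is a bitwise NOT on the values and does not reorder rows, so it would not be expected to flip a raster. Below is a minimal sketch of a vertical flip with np.flipud. It assumes (this is not confirmed by the post) that the NetCDF latitudes run south to north, while the negative yres expects row 0 to be the northern edge. The small array is made up for illustration.

```python
import numpy as np

# Hypothetical 3x2 count array; row 0 is ASSUMED to be the southernmost row.
GDDs = np.array([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=np.int32)

# np.invert is a bitwise NOT: it changes the values, not the row order.
inverted = np.invert(GDDs)

# np.flipud reverses the row order, which mirrors the array vertically.
flipped = np.flipud(GDDs)

print(inverted)
print(flipped)
```

If that assumption holds, the flipped array would be the one to pass to new_dataset.write(), and the transform origin would need to be the northern edge of the grid rather than the first latitude value.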
Let's say I have a large array of values that represent terrain latitude locations that is shape x. I also have another array of values that represent terrain longitude values that is shape y. All of the values in x as well as y are equally spaced at 0.005-degrees. In other words:
lons[0:10] = [-130.0, -129.995, -129.99, -129.985, -129.98, -129.975, -129.97, -129.965, -129.96, -129.955]
lats[0:10] = [55.0, 54.995, 54.99, 54.985, 54.98, 54.975, 54.97, 54.965, 54.96, 54.955]
I have a second dataset that is projected in an irregularly-spaced lat/lon grid (but equally spaced ~ 25 meters apart) that is [m,n] dimensions big, and falls within the domain of x and y. Furthermore, we also have all of the lat/lon points within this second dataset. I would like to 'line up' the grids such that every value of [m,n] matches the nearest neighbor terrain value within the larger grid. I am able to do this with the following code, where I basically loop through every lat/lon value in dataset two and try to find the argmin of the calculated lat/lon values from dataset one:
for a in range(0,lats.shape[0]):
    # Loop through the ranges
    for r in range(0,lons.shape[0]):
        # Access the elements
        tmp_lon = lons[r]
        tmp_lat = lats[a]
        # Now we need to find where the tmp_lon and tmp_lat match best with the index from new_lats and new_lons
        idx = (np.abs(new_lats - tmp_lat)).argmin()
        idy = (np.abs(new_lons - tmp_lon)).argmin()
        # Make our final array!
        second_dataset_trn[a,r] = first_dataset_trn[idy,idx]
Except it is exceptionally slow. Is there another method, either through a package, library, etc. that can speed this up?
Please take a look at the following previous question for iterating over two lists, which may improve the speed: Is there a better way to iterate over two lists, getting one element from each list for each iteration?
A possible correction to the sample code: assuming that the arrays are organized in the standard GIS fashion of Latitude, Longitude, I believe there is an error in the idx and idy variable assignments - the variables receiving the assignments should be swapped (idx should be idy, and the other way around). For example:
# Now we need to find where the tmp_lon and tmp_lat match best with the index from new_lats and new_lons
idy = (np.abs(new_lats - tmp_lat)).argmin()
idx = (np.abs(new_lons - tmp_lon)).argmin()
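As a hedged sketch of a way to avoid the double loop altogether: because each nearest index depends only on one coordinate axis, the indices can be computed once per axis with NumPy broadcasting and then combined with np.ix_. The helper name nearest_indices and the small arrays are made up for illustration, and the source array is assumed to be indexed [latitude, longitude], matching the correction above.

```python
import numpy as np

def nearest_indices(source_coords, target_coords):
    """For each target coordinate, return the index of the nearest source coordinate."""
    source = np.asarray(source_coords, dtype=float)
    target = np.asarray(target_coords, dtype=float)
    # Broadcasting builds a (n_target, n_source) matrix of absolute differences.
    diffs = np.abs(target[:, None] - source[None, :])
    return diffs.argmin(axis=1)

# Made-up source grid, assumed to be indexed [lat, lon].
new_lats = np.array([55.0, 54.9, 54.8])
new_lons = np.array([-130.0, -129.9, -129.8, -129.7])
first_dataset_trn = np.arange(12).reshape(3, 4)

# Made-up target coordinates.
lats = np.array([54.99, 54.82])
lons = np.array([-129.91, -129.72, -129.99])

idy = nearest_indices(new_lats, lats)  # row (latitude) index for each target lat
idx = nearest_indices(new_lons, lons)  # column (longitude) index for each target lon

# np.ix_ picks every (row, column) combination in one indexing operation.
second_dataset_trn = first_dataset_trn[np.ix_(idy, idx)]
print(second_dataset_trn)
```

The difference matrix holds n_target x n_source values per axis, so for very long coordinate arrays it may need to be built in chunks.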
So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in SO inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d 3d ..., inverse-distance weighting is smooth and local, and the k= number of nearest neighbours can be varied to tradeoff speed / accuracy.
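A minimal sketch of that combination, not the code from the linked post: scipy.spatial.cKDTree finds the k nearest grid points for each query point, and the values are averaged with weights of 1 / distance**power. The function name idw_interpolate, the eps guard and the sample data are illustrative assumptions, and lat/lon are treated as planar coordinates, which is only reasonable over small areas.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_interpolate(known_xy, known_values, query_xy, k=4, power=2.0, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest known points."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    tree = cKDTree(known_xy)
    dist, idx = tree.query(query_xy, k=k)
    # With k == 1 the results are 1-D; reshape so the code below is uniform.
    dist = dist.reshape(len(query_xy), -1)
    idx = idx.reshape(len(query_xy), -1)

    # Clamp tiny distances so a query that sits on a data point does not divide by zero.
    weights = 1.0 / np.maximum(dist, eps) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * known_values[idx]).sum(axis=1)

# Made-up data: four corners of a unit square.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [1.0, 2.0, 3.0, 4.0]
result = idw_interpolate(corners, values, [[0.5, 0.5], [0.0, 0.0]])
print(result)
```

For the gridded case in the question, known_xy would be np.column_stack([LON.ravel(), LAT.ravel()]) and known_values would be T.ravel().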
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of course for a regular grid, but it should still work if you first project the data to a pixel grid with pyproj or something similar, all the while being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator=0
    denominator=0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if(dist<0.0000000001):
            return values[i]
        nominator=nominator+(values[i]/pow(dist,power))
        denominator=denominator+(1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":
    power=1
    smoothing=20
    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data.
However, I don't know of an out-of-the-box solution for you.
You say your input data is from tripolar data. There are three main cases for how this data could be structured.
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
3. Unstructured data in tripolar space projected into 2d LAT LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid type structure to speed up this search.) Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
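The Delaunay option mentioned above can be sketched with scipy.interpolate.LinearNDInterpolator, which triangulates the input points and interpolates linearly inside each triangle. This is only an illustration under assumptions: the points and the linear "temperature" field are made up, and lon/lat are treated as planar coordinates.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)

# Made-up scattered points (columns: lon, lat); the corners are added so the
# query points are guaranteed to fall inside the convex hull.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
points = np.vstack([corners, rng.uniform(0.0, 1.0, size=(200, 2))])

# Hypothetical field that is linear in lon and lat, so linear interpolation is exact.
temperature = 2.0 * points[:, 0] + 3.0 * points[:, 1]

# The interpolator builds a Delaunay triangulation of the points internally.
interp = LinearNDInterpolator(points, temperature)

query = np.array([[0.5, 0.5], [0.25, 0.75]])
result = interp(query)
print(result)
```

Query points outside the convex hull of the data come back as NaN by default, so the input grid needs to cover all of the target locations.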
I suggest taking a look at the interpolation features of GRASS, an open source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
[Image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png]
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
I have multiple 2D numpy arrays (image data of a bright object, each of size 600x600), and I ran a cross-correlation on each of the individual images vs. a stacked composite image using skimage.feature.register_translation to obtain the relative subpixel shifts of each image's centroid with respect to the centroid of the composite image. I'd now like to create a weighted 2d histogram of all my individual image data, using the relative shifts of each in order to have all of them exactly centered. But I'm confused on how to do this. My code so far is below (after finding the shifts):
import numpy as np
data = ...  # individual image data; this is an array of multiple 2D (600x600) arrays
# Shifts in x and y (each are same length as 'data')
dx = np.array([0.346, 0.23, 0.113, ...])
dy = np.array([-0.416, -0.298, 0.275, ...])
# Bins
bins = np.arange(-300, 300, 1)
# Weighted histogram
h, xe, ye = np.histogram2d(dx.ravel(), dy.ravel(), bins=bins, weights=data.ravel())
This isn't getting me anywhere though -- I think my weights parameter is wrong (I think there should be just one weight per image, instead of the whole image?), but I don't know what else I would put for it. The images are of different bright sources, so I can't just assume they all have the same widths either. How can I accomplish this?
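One possible reading of the goal, sketched under assumptions rather than offered as a definitive answer: pass the coordinates of every pixel (not the shifts themselves) to np.histogram2d, move those coordinates by each image's shift, and use the pixel values as weights. The function name stack_with_shifts and the tiny test images are made up, and the sign of the shift depends on the convention of the registration routine, so it may need to be negated.

```python
import numpy as np

def stack_with_shifts(images, dx, dy):
    """Sum images into one 2D histogram after moving each one by its (dx, dy) shift."""
    images = np.asarray(images, dtype=float)
    n_img, ny, nx = images.shape
    rows, cols = np.indices((ny, nx))
    # Pixel-centre coordinates, flattened to match img.ravel().
    y_centres = rows.ravel() + 0.5
    x_centres = cols.ravel() + 0.5
    # One bin per pixel; finer bins would be needed to resolve sub-pixel shifts.
    y_edges = np.arange(ny + 1)
    x_edges = np.arange(nx + 1)
    stacked = np.zeros((ny, nx))
    for img, sx, sy in zip(images, dx, dy):
        # ASSUMPTION: a positive shift moves the image towards larger indices.
        h, _, _ = np.histogram2d(
            y_centres + sy, x_centres + sx,
            bins=[y_edges, x_edges], weights=img.ravel())
        stacked += h
    return stacked

# Two made-up 4x4 images, each with one bright pixel, offset by one column.
img0 = np.zeros((4, 4)); img0[1, 1] = 1.0
img1 = np.zeros((4, 4)); img1[1, 2] = 2.0
stacked = stack_with_shifts([img0, img1], dx=[1.0, 0.0], dy=[0.0, 0.0])
print(stacked)
```

Because a histogram assigns each pixel to a single bin, shifts smaller than a bin width only matter when they push a pixel centre across a bin edge. Pixels shifted outside the bin range are dropped.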
I'm trying to use KMeans centroids to label/clump pixels for a land cover analysis. I'm hoping to do this only using sklearn and matplotlib. At the moment my code looks like this:
kmeans.fit(band_5)
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1])
The shape of band_5 is (713, 1163), yet from the scatter plot I can tell that the centroid coordinates have values well in excess of that shape.
From my understanding, the centroids that KMeans provides need to be converted into the correct coordinates and then a shapefile, which would then be used in a supervised process to label/clump pixels.
How do I convert those centroids to the correct coordinates and then export to a shapefile? Also, do I need to create a shapefile?
I tried to adopt some of the code from this post, but I could not get that to work. http://scikit-learn.org/stable/auto_examples/cluster/plot_color_quantization.html#sphx-glr-auto-examples-cluster-plot-color-quantization-py
A couple of points:
scikit-learn expects data in columns (think a table in a spreadsheet), so simply passing in an array representing a raster band will actually try and classify the data as if you had 1163 sample points and 713 values (bands) for each sample. Instead you'll need to flatten the array, and what kmeans will return will be equivalent to quantile classification of your raster if you're looking at it in something like ArcGIS, with centroids in the range of band minimum value to band maximum value (not in cell coordinates).
Looking at the example you provide, they have a three band jpeg, which they reshape into three long columns:
image_array = np.reshape(china, (w * h, d))
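For a single band the same idea reduces to one long column. The sketch below uses a made-up 20x30 band with two brightness populations (a stand-in for band_5) to illustrate that the centroids come back in band-value units rather than in cell coordinates. The KMeans settings are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical single-band raster: values near 10 or near 200, plus a little noise.
band = np.where(rng.random((20, 30)) < 0.5, 10.0, 200.0) + rng.normal(0.0, 1.0, (20, 30))

# One row per pixel, one column per band (here a single band).
samples = band.reshape(-1, 1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(samples)

centroids = kmeans.cluster_centers_        # shape (2, 1), in band-value units
label_image = labels.reshape(band.shape)   # back to the raster's shape for plotting

print(np.sort(centroids.ravel()))
```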
If you need to have spatially constrained pixels then you have two choices: choose a connectivity constrained cluster method such as Agglomerative Clustering or Affinity Propagation, or look at adding the normalised cell coordinates to your sample-set, e.g.:
xs, ys = np.meshgrid(
    np.linspace(0, 1, 1163), # x
    np.linspace(0, 1, 713), # y
)
data_with_coordinates = np.column_stack([
    band_5.flatten(),
    xs.flatten(),
    ys.flatten()
])
# And on with the clustering
Once you've done the clustering with scikit-learn, assuming you use fit_predict you'll get a label back for each value by cluster, and you can reshape back to the original shape of the band to plot the clustered results.
labels = classifier.fit_predict(data_with_coordinates)
plt.imshow(labels.reshape(band_5.shape))
Do you actually need the cluster centroids given you have labelled points? And do you need them in real world spatial coordinates? If yes, then you need to be looking at the rasterio and the affine methods to transform from map coordinates to array coordinates and vice versa. And then look into fiona to write the points to a shapefile.
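To make the array-to-map step concrete, here is a minimal sketch of the arithmetic that an affine geotransform performs, written in plain Python rather than with rasterio. The function name pixel_to_map and the transform values are hypothetical; the six coefficients follow the (a, b, c, d, e, f) ordering used by the affine package, where x = a*col + b*row + c and y = d*col + e*row + f.

```python
def pixel_to_map(row, col, transform):
    """Convert an array (row, col) position to map (x, y) at the pixel centre."""
    a, b, c, d, e, f = transform
    # Adding 0.5 moves from the pixel's upper-left corner to its centre.
    x = a * (col + 0.5) + b * (row + 0.5) + c
    y = d * (col + 0.5) + e * (row + 0.5) + f
    return x, y

# Hypothetical north-up raster: 0.25 degree pixels, upper-left corner at (-60.0, 5.0).
transform = (0.25, 0.0, -60.0, 0.0, -0.25, 5.0)

x, y = pixel_to_map(0, 0, transform)
print(x, y)
```

With a real dataset the coefficients would come from the raster's own transform, and the resulting points could then be written out with fiona.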