Moving window over large raster map

Moving window over large raster map - python

What is the best way to resample an 11000 x 13000 cell raster image? I want to recalculate every cell using the values of all cells within a circle of x meters. It is similar to a convolution, but the calculation will differ on each run.
I could work this out by exporting the map to an ASCII format such as asciiGrid, but I hope ArcGIS provides a better way of doing it.
The desired result would look something like this:
for y in lyrYrange:
    for x in lyrXrange:
        window = lyr.getWindow(x, y, sizeX, sizeY)
        newlyr[x, y] = DoMyProcessing(window)
The unknown part here is what function to call instead of the fictitious "getWindow()".
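A minimal sketch of the loop above, assuming the raster has already been read into a 2-D NumPy array (for example with arcpy.RasterToNumPyArray, if ArcGIS is the source). The names circular_mask and moving_window are illustrative, not ArcGIS functions, and the radius is given in cells rather than meters:

```python
import numpy as np

def circular_mask(radius):
    """Boolean (2r+1, 2r+1) array, True inside a circle of `radius` cells."""
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return xx * xx + yy * yy <= radius * radius

def moving_window(grid, radius, func):
    """Apply func to the cells within `radius` of every cell of `grid`."""
    mask = circular_mask(radius)
    # Pad with NaN so that edge cells also get a full-size window.
    padded = np.pad(grid.astype(float), radius, mode="constant",
                    constant_values=np.nan)
    out = np.empty(grid.shape, dtype=float)
    size = 2 * radius + 1
    for y in range(grid.shape[0]):
        for x in range(grid.shape[1]):
            window = padded[y:y + size, x:x + size]
            # window[mask] is the 1-D list of values inside the circle.
            out[y, x] = func(window[mask])
    return out

# Small demonstration grid; func must tolerate the NaN padding.
demo = np.arange(25, dtype=float).reshape(5, 5)
smoothed = moving_window(demo, 1, np.nanmean)
```

For a raster of this size the pure-Python double loop will be slow. scipy.ndimage.generic_filter accepts the same kind of boolean mask through its footprint argument and runs the loop in compiled code, although it still calls the Python function once per cell.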

Related

Convert 2D Array to a 3D Space

I am trying to develop a 3D cube with values from a flat 2D Plane. I am having a lot of difficulty trying to pseudo code it out so I was hoping to get some input from you guys.
I will try my best to express myself through pictures as I am able to visualize what I am trying to achieve.
I have a 2D output based on the black line in this figure:
I have an array of amplitude data in which each index is a sample and its value is the amplitude, e.g. (0; 1), where 0 is the x coordinate (sample) and 1 is the y coordinate (amplitude), or as another example (~1900; ~0.25).
How do I take this 1 dimensional sequence and extrude it into a 3D picture like below:
Is there perhaps a library that does such? Or am I going about it the wrong way? The data is from a matched filter output of a sonar signal and I wish to visualize the concentration of the intensity versus where it is located in a sample on a 3D plane. The data has peaks that have inclining and declining gradient slopes before a peak.
I cannot seem to wrap my mind around such a task. Is there a library or a term used to associate what I wish to accomplish?
EDIT: I found this https://www.tutorialspoint.com/matplotlib/matplotlib_3d_surface_plot.htm
But it requires all x, y and z points. Whereas I only have x and y. Additionally I need to be able to access every coordinate (x, y, z) to be able to do range and angle estimation from sample (0, 1) (Transmitted sound where power is highest). I would only like to basically see the top of this though on another 2D axis...
EDIT 2: Following up on a comment below, I would like to convert Figure 1 above into the image below using a library, if one exists.
Thanks so much in advance!

Build Shapely point objects from .TIF

I would like to convert an image (.tiff) into Shapely points. There are 45 million pixels, so I need a way to accomplish this without a loop (the loop currently takes 15+ hours).
For example, I have a .tiff file which when opened is a 5000x9000 array. The values are pixel values (colors) that range from 1 to 215.
I open tif with rasterio.open(xxxx.tif).
Desired epsg is 32615
I need to preserve the pixel value but also attach geospatial positioning. This is to be able to sjoin over a polygon to see if the points are inside. I can handle the transform after processing, but I cannot figure a way to accomplish this without a loop. Any help would be greatly appreciated!

If you just want a boolean array indicating whether the points are within any of the geometries, I'd dissolve the shapes into a single MultiPolygon then use shapely.vectorized.contains. The shapely.vectorized module is currently not covered in the documentation, but it's really good to know about!
Something along the lines of
# for a gridded dataset with 2-D arrays lats, lons
# and a list of shapely polygons/multipolygons all_shapes
XX = lons.ravel()
YY = lats.ravel()
single_multipolygon = shapely.ops.unary_union(all_shapes)
in_any_shape = shapely.vectorized.contains(single_multipolygon, XX, YY)
If you're looking to identify which shape the points are in, use geopandas.points_from_xy to convert your x, y point coordinates into a GeometryArray, then use geopandas.sjoin to find the index of the shape corresponding to each (x, y) point:
geoarray = geopandas.points_from_xy(XX, YY)
points_gdf = geopandas.GeoDataFrame(geometry=geoarray)
shapes_gdf = geopandas.GeoDataFrame(geometry=all_shapes)
shape_index_by_point = geopandas.sjoin(
    shapes_gdf, points_gdf, how='right', predicate='contains',
)
This is still a large operation, but it's vectorized and will be significantly faster than a looped solution. The geopandas route is also a good option if you'd like to convert the projection of your data or use other geopandas functionality.
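The XX and YY arrays used above can themselves be built without a loop. A rough sketch, assuming the raster's affine transform has the six coefficients (a, b, c, d, e, f) in the order rasterio reports them on dataset.transform; the helper name pixel_centers and the sample numbers are made up for illustration:

```python
import numpy as np

def pixel_centers(height, width, a, b, c, d, e, f):
    """Return flat x, y arrays of pixel-centre coordinates.

    Applies x = a*col + b*row + c and y = d*col + e*row + f
    to every (row, col) of the raster at once.
    """
    rows, cols = np.mgrid[0:height, 0:width]
    cols = cols + 0.5   # shift from the pixel corner to the pixel centre
    rows = rows + 0.5
    x = a * cols + b * rows + c
    y = d * cols + e * rows + f
    return x.ravel(), y.ravel()

# Hypothetical 2 x 3 raster with 10 m pixels and its
# upper-left corner at (500000, 4000000).
XX, YY = pixel_centers(2, 3, 10.0, 0.0, 500000.0, 0.0, -10.0, 4000000.0)
```

Flattening the pixel values the same way (for example band.ravel() on the array read from the file) keeps each value aligned with its XX, YY pair.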

2D interpolate list of many points [duplicate]

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!

Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO question inverse-distance-weighted-idw-interpolation-with-python.
Kd-trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local, and k, the number of nearest neighbours, can be varied to trade off speed against accuracy.
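A small sketch of inverse-distance weighting over the k nearest neighbours, using NumPy only. The function name idw_knn and its parameters are illustrative. The brute-force distance matrix stands in for the kd-tree and is only practical for small inputs; with SciPy, the distances and indices would instead come from scipy.spatial.cKDTree(src_xy).query(query_xy, k=k). Coordinates are treated as planar here, which is an assumption that does not hold for raw lat/lon over large areas:

```python
import numpy as np

def idw_knn(src_xy, src_values, query_xy, k=4, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation from the k nearest source points."""
    src_xy = np.asarray(src_xy, dtype=float)
    src_values = np.asarray(src_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Brute-force distance matrix, shape (n_query, n_source).
    diff = query_xy[:, None, :] - src_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k smallest distances for each query point (unordered).
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    d = np.take_along_axis(dist, nearest, axis=1)
    v = src_values[nearest]
    # Clamp tiny distances so an exact hit gets a huge, finite weight.
    w = 1.0 / np.maximum(d, eps) ** power
    return (w * v).sum(axis=1) / w.sum(axis=1)

# Four corners of a unit square, queried at the centre and at one corner.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
vals = [0.0, 1.0, 2.0, 3.0]
out = idw_knn(src, vals, [(0.5, 0.5), (0, 0)], k=4)
```

At the centre all four neighbours are equidistant, so the result is their plain average; at the corner the coincident source point dominates.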

There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of course for a regular grid, but you could project the data to a pixel grid first with pyproj or something similar, being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i]) + (y-yv[i])*(y-yv[i]) + smoothing*smoothing)
        # If the point is really close to one of the data points, return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20
    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    # Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()

There are a number of options here; which one is best will depend on your data. However, I don't know of an out-of-the-box solution for you.
You say your input data is tripolar. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT, LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid-type structure to speed up this search.) Pick one of the cells and interpolate inside it.
Finally, there's a heap of unstructured interpolation options, but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally, if my mesh was case 1, I'd use an unstructured strategy, as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
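Interpolating on a triangular mesh, as mentioned above, comes down to barycentric weights inside whichever triangle contains the sample point. A minimal sketch for a single triangle follows; the function name is illustrative, and the triangles themselves are assumed to come from elsewhere (scipy.spatial.Delaunay would be one source):

```python
def barycentric_interpolate(p, tri, values):
    """Linearly interpolate `values` given at the triangle's corners to point p.

    Returns None when p lies outside the triangle or the triangle is degenerate.
    """
    (x1, y1), (x2, y2), (x3, y3) = tri
    px, py = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None  # the three corners are collinear
    # Barycentric weights of p with respect to corners 1, 2 and 3.
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # p is outside this triangle
    return w1 * values[0] + w2 * values[1] + w3 * values[2]

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
corner_values = [10.0, 20.0, 30.0]
inside = barycentric_interpolate((0.25, 0.25), triangle, corner_values)
outside = barycentric_interpolate((1.0, 1.0), triangle, corner_values)
```

If SciPy is available, scipy.interpolate.LinearNDInterpolator combines the triangulation and this per-triangle step.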

I suggest taking a look at the interpolation features of GRASS, an open source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.

Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
[image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png]
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.

There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.

How to parallelize writing to same cell in a numpy array?

Background: I have millions of points in 2D space with (x_position, y_position, value) associated with each point. I am trying to summarize these points by creating an image, where each pixel can contain multiple points. To summarize, each pixel stores the sum of values at that (x_pixel, y_pixel) location in the image.
Question: How can I do this efficiently? Currently, my code does something like this:
image = np.zeros((4096, 4096))
for point in data:
    x_pixel, y_pixel = convertPointPos2PixelPos(point)
    image[x_pixel, y_pixel] += point.getValue()
but the ETA for this code completing is 450 hours, which is unacceptable. Is there a way to parallelize this? The code is writing to the same image[x,y] index multiple times. I found StackOverflow posts that suggest using multiprocessing, but I think needing to lock to prevent race conditions will mean this will take just as much time as it would without parallelizing.

Assuming you want something on a regular grid, you can use simple division to bin your data. Here is an example:
size = (4096, 4096)
data = np.random.rand(100000000, 3)
image = np.zeros(size)
coords = data[:, :2]
min = coords.min(0)
max = coords.max(0)
index = np.floor_divide(coords - min, (max - min) / np.subtract(size, 1), out=np.empty(coords.shape, dtype=int), casting='unsafe')
index is now an array of indices into image where you want to add the corresponding values. You can do an unbuffered add using np.add.at:
np.add.at(image, tuple(index.T), data[:, -1])
If your data range is better defined than just the bounding box of the coordinates, you can save a little time by not computing coords.max() and coords.min().
The result is something like this:
This entire operation takes 6.4sec on my very moderately powered machine for 10M points, including the call to plt.imshow, plt.colorbar and garbage collection before runs.
Timing collected using the %%timeit cell magic in IPython.
Either way, you're well under 450 hours. Even if your coordinate transformation is not linear binning, I expect that you can run in reasonable time as long as you vectorize it properly. Also, multiprocessing is not likely to give you a huge boost since it requires copying data around.
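As a possible alternative to np.add.at, the same sums can be accumulated with np.bincount on flattened pixel indices. This is only a sketch on random demonstration data: the helper name bin_sum is made up, and whether it is actually faster depends on the NumPy version, so it is worth timing both:

```python
import numpy as np

def bin_sum(index, values, shape):
    """Sum `values` into an image of `shape`; index is an (N, 2) integer array."""
    # Turn each (row, col) pair into a single position in the flattened image.
    flat = np.ravel_multi_index((index[:, 0], index[:, 1]), shape)
    # bincount adds up the weights that share the same flat position.
    sums = np.bincount(flat, weights=values, minlength=shape[0] * shape[1])
    return sums.reshape(shape)

# Random demonstration data: 1000 points falling into an 8 x 8 image.
rng = np.random.default_rng(0)
shape = (8, 8)
index = rng.integers(0, 8, size=(1000, 2))
values = rng.random(1000)

image_bincount = bin_sum(index, values, shape)

# Reference result computed with np.add.at, as in the answer above.
image_add_at = np.zeros(shape)
np.add.at(image_add_at, tuple(index.T), values)
```

Both images should agree up to floating-point rounding, since the only difference is how the repeated indices are accumulated.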

Creating a fool proof graphing calculator using python - Python 2.7

I am trying to create a fool proof graphing calculator using python and pygame.
I created a graphing calculator that works for most functions. It takes a user string infix expression and converts it to postfix for easier calculations. I then loop through and pass in x values into the postfix expression to get a Y value for graphing using pygame.
The first problem I ran into was with calculations that are impossible (like dividing by zero, the square root of -1, or 0 ^ non-positive number). When something like this happens, I output None and that pixel isn't added to the list of points to be graphed.
* I have shown all the different attempts I have made at this to help you understand where I am coming from. If you would like to see only my most current code and method, jump down to where it says "Current".
Method 1
My first method was after I acquired all my pixel values, I would paint them using the pygame aalines function. This worked, except it wouldn't work when there were missing points in between actual points because it would just draw the line across the points. (1/x would not work but something like 0^x would)
This is what 1/x looks like using the aalines method
Method 1.1
My next idea was to split the line into two lines every time a None was returned. This worked for 1/x, but I quickly realized that it would only work if one of the passed-in x values landed exactly on a Y value of None. 1/x might work, but 1/(x+0.0001) wouldn't.
Method 2
My next method was to convert each pixel x value into the corresponding x point value in the graphing window (for example, (0,0) on the graphing window would actually be pixel (249,249) on a 500x500 program window). I would then calculate every y value with the x values I just created. This would work for any line that doesn't have a slope > 1 or < -1.
This is what 1/x would look like using this method.
Current
My most current method is supposed to be an advanced working version of method 2.
It's kind of hard to explain. Basically, I take the x values at the boundaries between columns of the display window: for every pixel column, one just to the left and one just to the right of it. I then plug those two values into the expression to get two Y values. I then loop through each y value in that column and check whether it lies between the two Y values calculated earlier.
size is a list of size two that is the dimensions of the program window.
xWin is a list of size two that holds the x Min and x Max of the graphing window.
yWin is a list of size two that holds the y Min and y Max of the graphing window.
pixelToPoint is a function that takes scalar pixel value (just x or just y) and converts it to its corresponding value on the graphing window
pixels = []
for x in range(size[0]):
    leftX = pixelToPoint(x, size[0]+1, xWin, False)
    rightX = pixelToPoint(x+1, size[0]+1, xWin, False)
    leftY = calcPostfix(postfix, leftX)
    rightY = calcPostfix(postfix, rightX)
    for y in range(size[1]):
        if leftY is not None and rightY is not None:
            yPoint = pixelToPoint(y, size[1], yWin, True)
            if (rightY <= yPoint <= leftY) or (rightY >= yPoint >= leftY):
                pixels.append((x, y))
for p in pixels:
    screen.fill(BLACK, (p, (1, 1)))
This fixed the problem in method 2 of having the pixels not connected into a continuous line. However, it wouldn't fix the problem of method 1 and when graphing 1/x, it looked exactly the same as the aalines method.
-------------------------------------------------------------------------------------------------------------------------------
I am stuck and can't think of a solution. The only way I can think of fixing this is by using a whole bunch of x values. But this way seems really inefficient. Also I am trying to make my program as resizable and customizable as possible so everything must be variably driven and I am not sure what type of calculations are needed to find out how many x values are needed to be used depending on the program window size and the graph's window size.
I'm not sure if I am on the right track or if there is a completely different method of doing this, but I want my graphing calculator to be able to graph any function (just like my actual graphing calculator).
Edit 1
I just tried using as many x values as there are pixels (a 500x500 display window calculates 250,000 y values).
It has worked for every function I've tried with it, but it is really slow. It takes about 4 seconds to calculate (it fluctuates depending on the equation). I've looked around online and have found graphing calculators that are almost instantaneous in their graphing, but I can't figure out how they do it.
This online graphing calculator is extremely fast and effective. There must be some algorithm other than using a bunch of x values that can achieve what I want, because that site is doing it.

The problem is that, to know whether you can reasonably draw a line between two points, you have to know whether the function is continuous on the interval between them.
That is a hard problem in general. What you could do is use the following heuristic: if the slope of the line has changed too much from the previous one, assume there is a discontinuity in the interval and don't draw the line.
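A rough sketch of such a heuristic, simplified to look at the size of the jump in y between neighbouring samples rather than at the change in slope. The name split_segments and the max_jump threshold are assumptions for illustration, and the threshold would have to be tuned to the graphing window:

```python
def split_segments(points, max_jump):
    """Split sampled (x, y) points into drawable polylines.

    A new segment starts at every undefined sample (y is None) and wherever
    |dy| between neighbouring samples exceeds max_jump.
    """
    segments, current = [], []
    for x, y in points:
        if y is None:
            # Undefined sample: close the current polyline.
            if len(current) > 1:
                segments.append(current)
            current = []
            continue
        if current and abs(y - current[-1][1]) > max_jump:
            # Suspected discontinuity: do not connect across it.
            if len(current) > 1:
                segments.append(current)
            current = []
        current.append((x, y))
    if len(current) > 1:
        segments.append(current)
    return segments

def f(x):
    return None if x == 0 else 1.0 / x

# Samples from -0.95 to 0.95 that never land exactly on x = 0.
xs = [i / 10.0 - 0.95 for i in range(20)]
segments = split_segments([(x, f(x)) for x in xs], max_jump=15.0)
```

Here 1/x comes back as two polylines, one on each side of the pole. After conversion to pixel coordinates, each one could be drawn separately, for example with pygame.draw.aalines.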

Another solution would build on your method 2.
After drawing the points that correspond to every x value on the axis, try to draw, for every pair of adjacent x values (x1, x2), the y values between y1 = f(x1) and y2 = f(x2) that can be reached by some x within (x1, x2).
This can be done by searching for a suitable x by dichotomy (bisection) or with Newton's method.
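One possible reading of the dichotomy search, as a sketch: bisect on x between x1 and x2 and accept the y value only if f actually gets close to it. The function name reachable, the tolerance and the iteration cap are all assumptions; f is expected to return None where it is undefined, as in the question:

```python
def reachable(f, x1, x2, y_target, tol=1e-6, max_iter=60):
    """Bisection: is there an x in [x1, x2] with f(x) close to y_target?

    Assumes f(x1) and f(x2) are defined and bracket y_target. Returns False
    when the search ends at a point where f is undefined or stays far from
    y_target, as it does across a pole.
    """
    lo, hi = x1, x2
    g_lo = f(lo) - y_target
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        y_mid = f(mid)
        if y_mid is None:
            return False
        g_mid = y_mid - y_target
        if abs(g_mid) <= tol:
            return True
        # Keep the half-interval whose end points still bracket y_target.
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return False

def inv(x):
    return None if x == 0 else 1.0 / x

continuous_hit = reachable(lambda x: x * x, 1.0, 2.0, 2.0)   # x*x reaches 2 near 1.414
across_pole = reachable(inv, -0.05, 0.15, 0.0)               # 1/x never reaches 0
same_branch = reachable(inv, 0.5, 1.0, 1.5)                  # 1/x reaches 1.5 at x = 2/3
```

A pixel (column, y) would then be drawn only when reachable returns True, so the vertical line across the pole of 1/x is skipped, at the cost of many extra function evaluations per column.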
