Create a land mask from latitude and longitude arrays - python

Given latitude and longitude arrays, I'm trying to generate a land_mask, an array of the same size that tells whether a coordinate is land or not.
import numpy as np

lon = np.random.uniform(0, 150, size=[1000, 1000])
lat = np.random.uniform(-90, 90, size=[1000, 1000])

from global_land_mask import globe
land_mask = globe.is_land(lat, lon)
This is a very efficient method to create a land mask if all values are defined, but if some values in lat or lon are masked or are NaN, it throws an error.
I've tried using for loops to avoid that error, but it takes almost 15-20 minutes to run. I have to run it on an array with 3000×3000 elements, some of which are masked.
What would be a better way for generating land mask for arrays with masked/nan values?

So it seems globe.is_land(y, x) doesn't take a masked array. A workable solution would be to use a coordinate outside your domain (if possible). So:
lon[lon==327.67] = 170
lat[lat==327.67] = -90
from global_land_mask import globe
land_mask=globe.is_land(lat,lon)
masked = np.where((lat==-90)|(lon==170), False, land_mask)
Alternatively, you could mask the values prior to passing them in:
lat_mask = np.where(lat == 327.67, np.nan, lat)
lon_mask = np.where(lon == 327.67, np.nan, lon)
master_mask = np.where(np.isnan(lat_mask) | np.isnan(lon_mask), False, True)
lat = lat[master_mask]
lon = lon[master_mask]
from global_land_mask import globe
land_mask = globe.is_land(lat, lon)
The second solution will change (flatten) your lat/lon arrays, but it does not require you to find a coordinate outside of your domain.
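If neither substituting values nor flattening is convenient, a third option (just a sketch, not part of the answer above) is to call globe.is_land only on the valid points and leave everything else False:
import numpy as np
from global_land_mask import globe

# Fill masked entries with NaN so masked arrays and plain arrays are treated the same
lat_f = np.ma.filled(np.ma.masked_invalid(lat), np.nan)
lon_f = np.ma.filled(np.ma.masked_invalid(lon), np.nan)
valid = np.isfinite(lat_f) & np.isfinite(lon_f)

# Evaluate only the valid points; masked/NaN points stay False
land_mask = np.zeros(lat_f.shape, dtype=bool)
land_mask[valid] = globe.is_land(lat_f[valid], lon_f[valid])
This keeps the original 2D shape of the mask and avoids having to pick a sentinel coordinate.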

Related

'Lining up' large lat/lon grid with smaller lat/lon grid

Let's say I have a large array of values representing terrain latitude locations, with shape x. I also have another array of values representing terrain longitude values, with shape y. All of the values in x as well as y are equally spaced at 0.005 degrees. In other words:
lons[0:10] = [-130.0, -129.995, -129.99, -129.985, -129.98, -129.975, -129.97, -129.965, -129.96, -129.955]
lats[0:10] = [55.0, 54.995, 54.99, 54.985, 54.98, 54.975, 54.97, 54.965, 54.96, 54.955]
I have a second dataset that is projected on an irregularly-spaced lat/lon grid (but equally spaced ~25 meters apart) that is [m,n] dimensions big, and falls within the domain of x and y. Furthermore, we also have all of the lat/lon points within this second dataset. I would like to 'line up' the grids such that every value of [m,n] matches the nearest-neighbor terrain value within the larger grid. I am able to do this with the following code, where I basically loop through every lat/lon value in dataset two and try to find the argmin of the calculated lat/lon differences from dataset 1:
for a in range(0, lats.shape[0]):
    # Loop through the ranges
    for r in range(0, lons.shape[0]):
        # Access the elements
        tmp_lon = lons[r]
        tmp_lat = lats[a]
        # Now we need to find where the tmp_lon and tmp_lat match best with the index from new_lats and new_lons
        idx = (np.abs(new_lats - tmp_lat)).argmin()
        idy = (np.abs(new_lons - tmp_lon)).argmin()
        # Make our final array!
        second_dataset_trn[a, r] = first_dataset_trn[idy, idx]
Except it is exceptionally slow. Is there another method, either through a package, library, etc. that can speed this up?
Please take a look at the following previous question for iterating over two lists, which may improve the speed: Is there a better way to iterate over two lists, getting one element from each list for each iteration?
A possible correction to the sample code: assuming that the arrays are organized in the standard GIS fashion of Latitude, Longitude, I believe there is an error in the idx and idy variable assignments - the variables receiving the assignments should be swapped (idx should be idy, and the other way around). For example:
# Now we need to find where the tmp_lon and tmp_lat match best with the index from new_lats and new_lons
idy = (np.abs(new_lats - tmp_lat)).argmin()
idx = (np.abs(new_lons - tmp_lon)).argmin()
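If the speed of the double loop itself is the problem, here is a vectorized sketch of the same nearest-neighbour lookup (variable names follow the question; it assumes new_lats and new_lons are 1-D and that broadcasting them against lats/lons fits in memory):
import numpy as np

# Each argmin depends on only one coordinate, so both index arrays can be
# computed once, outside any loop
idy = np.abs(new_lats[None, :] - lats[:, None]).argmin(axis=1)  # nearest new_lat per lat
idx = np.abs(new_lons[None, :] - lons[:, None]).argmin(axis=1)  # nearest new_lon per lon

# Fancy indexing then fills the whole output grid in one step
second_dataset_trn = first_dataset_trn[idy[:, None], idx[None, :]]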

interpolate / downsample 2D array in Python

I have 2 separate arrays with different sizes:
len(range_data) = 4320
len(az1) = 385
len(az2) = 347
data1.shape = (385,4320)
data2.shape = (347,4320)
I would like the dimensions of data2 to equal those of data1, such that data2.shape becomes (385,4320). I have tried scipy's interpolate, for example:
f = interpolate.interp2d(az1,range_data,data1,kind='cubic')
znew = f(az2,range_data)
print(znew.shape)
(347,4320)
znew.shape should be (385,4320), any ideas why this is happening and/or what might need to be done to fix this?
I don't think interp2d actually generates more points for you; it defines an interpolation function over a grid. That means that what you've created is a way to interpolate points within the grid defined by your first set of data points. znew will return an interpolated grid with the same number of values as the x and y passed to it.
See the source code.
Returns
-------
z : 2-D array with shape (len(y), len(x))
The interpolated values.
If you want to add extra data points, I would suggest deriving a regression function (or whatever ML technique you want, NNs if you're so inclined) on the second data set and using that function to produce the extra 38 data points you need.
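If the goal is simply to put data2 on the same azimuth grid as data1, one alternative sketch (not part of the answer above; the names az1, az2, data2 come from the question) is a 1-D interpolation along the azimuth axis:
from scipy.interpolate import interp1d

# Fit an interpolant in azimuth for every range gate at once (axis=0),
# then evaluate it at the az1 locations
f2 = interp1d(az2, data2, axis=0, kind='linear', fill_value='extrapolate')
data2_on_az1 = f2(az1)  # shape (385, 4320), matching data1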

Build Shapely point objects from .TIF

I would like to convert an image (.tiff) into Shapely points. There are 45 million pixels, so I need a way to accomplish this without a loop (it currently takes 15+ hours).
For example, I have a .tiff file which when opened is a 5000x9000 array. The values are pixel values (colors) that range from 1 to 215.
I open the .tif with rasterio.open(xxxx.tif).
The desired EPSG is 32615.
I need to preserve the pixel value but also attach geospatial positioning. This is to be able to sjoin over a polygon to see if the points are inside. I can handle the transform after processing, but I cannot figure a way to accomplish this without a loop. Any help would be greatly appreciated!
If you just want a boolean array indicating whether the points are within any of the geometries, I'd dissolve the shapes into a single MultiPolygon then use shapely.vectorized.contains. The shapely.vectorized module is currently not covered in the documentation, but it's really good to know about!
Something along the lines of
import shapely.ops
import shapely.vectorized

# for a gridded dataset with 2-D arrays lats, lons
# and a list of shapely polygons/multipolygons all_shapes
XX = lons.ravel()
YY = lats.ravel()
single_multipolygon = shapely.ops.unary_union(all_shapes)
in_any_shape = shapely.vectorized.contains(single_multipolygon, XX, YY)
If you're looking to identify which shape the points are in, use geopandas.points_from_xy to convert your x, y point coordinates into a GeometryArray, then use geopandas.sjoin to find the index of the shape corresponding to each (x, y) point:
import geopandas

geoarray = geopandas.points_from_xy(XX, YY)
points_gdf = geopandas.GeoDataFrame(geometry=geoarray)
shapes_gdf = geopandas.GeoDataFrame(geometry=all_shapes)
shape_index_by_point = geopandas.sjoin(
    shapes_gdf, points_gdf, how='right', predicate='contains',
)
This is still a large operation, but it's vectorized and will be significantly faster than a looped solution. The geopandas route is also a good option if you'd like to convert the projection of your data or use other geopandas functionality.
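As a complement (not part of the original answer), here is a sketch of building the flat XX/YY coordinate arrays from the raster's affine transform without a Python loop; the filename is the placeholder from the question and the band index is assumed:
import numpy as np
import rasterio

with rasterio.open('xxxx.tif') as src:
    values = src.read(1)      # pixel values, shape (5000, 9000)
    t = src.transform         # affine transform of the raster

# Pixel-centre coordinates straight from the affine coefficients
rows, cols = np.mgrid[0:values.shape[0], 0:values.shape[1]]
XX = (t.a * (cols + 0.5) + t.b * (rows + 0.5) + t.c).ravel()
YY = (t.d * (cols + 0.5) + t.e * (rows + 0.5) + t.f).ravel()
pixel_values = values.ravel()  # stays aligned with XX/YY
These XX/YY arrays can then be fed to shapely.vectorized.contains or geopandas.points_from_xy as above (reprojecting to EPSG:32615 beforehand if needed).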

Average a Data Set while maintaining its variables?

I am currently trying to plot some data into cartopy, but I am having some issues.
I have a dataset that has a shape of (180, 180, 360), corresponding to time, lat, and lon respectively.
I would like to get an annual mean of this data. I had been using the code
def global_mean_3D(var, weights):
    # make sure masking is correct, otherwise we get nans
    var = np.ma.masked_invalid(var)
    # resulting variable should have dimensions of depth and time (x)
    ave = np.zeros([var.shape[0], var.shape[1]])
    # loop over time
    for t in np.arange(var.shape[0]):
        # loop over each depth slice
        for d in np.arange(var.shape[1]):
            ave[t, d] = np.ma.average(var[t, d, :], weights=weights)
    return ave
which I then use to plot
ax=plt.axes(projection=ccrs.Robinson())
ax.coastlines()
ax.contourf(x,y, ann_total_5tg)
But this code gives me a one-dimensional shape (over time), which I can't plot in cartopy using pcolormesh.
I am left with the error
TypeError: Input z must be a 2D array.
Would it be possible to get an annual mean whilst maintaining the variables within the dataset?
I suspect that you have to reshape your numpy array to use it with the contour method.
Using your variable name, it can be done like this:
ann_total_5tg = ann_total_5tg.reshape((180, 180))
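If the goal is an annual mean that keeps the lat/lon dimensions, an alternative sketch (not from the answer above; var is the (time, lat, lon) array from the question) is to average over the time axis instead:
import numpy as np

# Mask invalid values, then average over time (axis 0); lat and lon survive
ann_mean = np.ma.masked_invalid(var).mean(axis=0)  # shape (180, 360)
# ax.contourf(x, y, ann_mean) then works, since z is 2-D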

Incoherent handling of coordinates in basemap addcyclic

I have used all mpl_toolkits.basemap functions successfully on several global GCM netcdf datasets. Until I met this grid, with longitudes starting at 0.9375 (instead of 0 as I have always seen) and ending at 359.062.
To prepare a plot, I need to:
make the plot continuous with:
# input_var is a 2D numpy array
var_cyclicDUMMY, lons_cyclicDUMMY = addcyclic(input_var, lons)
I thus obtain a 2D array var_cyclicDUMMY with an extra column (one extra longitude), and a 1D array lons_cyclicDUMMY with one extra element at the end, i.e. one extra longitude, but at 0.9375 instead of the 360 that is needed.
Indeed in the next step, where I
shift the grid, so longitudes go from -180 to 180 instead of 0 to 360, with:
var_cyclic, lons_cyclic = shiftgrid(180., var_cyclicDUMMY,
                                    lons_cyclicDUMMY, start=False)
I get a ValueError: lon0 outside of range of lonsin.
Any suggestions on how to get around this with basemap, or another solution?
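A sketch of one possible workaround (an assumption, not an answer from this thread): since the question notes the extra longitude comes back as 0.9375 instead of 360, patch it by hand so the longitudes stay monotonic before shifting:
import numpy as np
from mpl_toolkits.basemap import addcyclic, shiftgrid

var_cyclic, lons_cyclic = addcyclic(input_var, lons)
# addcyclic duplicated the first longitude (0.9375); bump it by a full cycle
lons_cyclic = np.asarray(lons_cyclic, dtype=float)
lons_cyclic[-1] = lons_cyclic[0] + 360.0
# 180. now lies inside the range of lons_cyclic, so shiftgrid accepts it
var_shift, lons_shift = shiftgrid(180., var_cyclic, lons_cyclic, start=False)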
