How to draw coastlines over a custom map without resampling


I would like to display a satellite image (preferably using python, but other solutions are welcome). It consists of a floating-point parameter P, with dimension NxM, and each pixel is geolocated by the fields latitude and longitude (each of size NxM). So I would like to:
(1) create an image of parameter P with an associated color scale. The image should not be resampled, so it should have dimension NxM
(2) display coastlines over this image
Currently, I can do (1) using PIL. I can also use the basemap library to display an image and the coastlines, but I don't know how to do it without reprojection, by staying in the image native projection with size NxM.
Edit: the parameter P does not contain any information about the coastline. Only the location (lat, lon) of the pixels should be used to overlay the coastline. The coordinates for the coastline can be obtained from gshhs for example. gshhs is actually used in the basemap library.

If all you're trying to do is enhance the boundaries between land and water, it might be good to use a high-pass filter.
For instance, start out with Lena:
and apply a highpass filter:
then overlay the highpass on top of the original:
(more details and examples can be found here).
You can find filters in scipy here.

For those in the community still looking for an answer to this question, the method which I am currently implementing (for very similar purposes: I'm trying to test the geolocation of satellite data) requires a landmask.
There are landmask datasets available all over the place online, each with different rules and characteristics. I am working with netCDF4 data in python and my landmask is a gridded .nc dataset in which ocean elements are valued as 1 and land elements are valued as 0.
Iterating through my satellite data I multiply each latitude and longitude value by the number of elements per degree in the landmask. In my case there are 120 elements per degree in lat/lon, so
lon_inds = (lons*120).astype(int)
lat_inds = (lats*120).astype(int)
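A more general way of writing this would replace 120 with the following (where lons and lats here are presumably the landmask's own coordinate axes, covering the full globe):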
len(lons)/360
len(lats)/180
respectively. Both examples of these operations can be done nearly instantaneously if using numpy arrays (which is the case for the python netCDF4 module).
Now I create a mask of my own: it must have the same dimensions as the data array (for those not intimately acquainted with satellites, the data, lats and lons arrays will all have identical dimensions):
my_mask = np.zeros(data.shape, dtype=int)
Now all we need to do is replace values in the mask where there is a coastline. This is done by iterating through the lat_inds and lon_inds arrays, looking up the value in the landmask of
landmask[lon_inds[i,j],lat_inds[i,j]]
and changing the value of
my_mask[i,j]
to 1 if any of the neighbors
landmask[lon_inds[i,j]-1,lat_inds[i,j]]
landmask[lon_inds[i,j]+1,lat_inds[i,j]]
landmask[lon_inds[i,j],lat_inds[i,j]-1]
landmask[lon_inds[i,j],lat_inds[i,j]+1]
are not equal to 0 (of course, a smoother coastline can be generated by adding in the diagonal neighboring cells, but this should not be necessary as hopefully you should be using a landmask dataset with sharper spatial resolution than your satellite data).
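The lookup described above can be sketched without an explicit loop. This is a minimal NumPy sketch, not the answerer's actual code. It assumes the landmask is indexed as landmask[lon, lat], covers the whole globe starting at lon = -180 and lat = -90, and uses 1 for ocean and 0 for land. The function name coastline_mask and the reading that a coastline cell is a land cell with at least one ocean neighbor are illustrative assumptions.

```python
import numpy as np

def coastline_mask(lats, lons, landmask, cells_per_degree):
    """Return an int mask, shaped like lats/lons, that is 1 on coastline pixels.

    Assumptions (not from the original answer): landmask[lon_idx, lat_idx],
    1 = ocean, 0 = land, grid origin at lon = -180, lat = -90.
    """
    n_lon, n_lat = landmask.shape

    # Convert each pixel's geolocation to landmask indices.
    lon_inds = np.floor((lons + 180.0) * cells_per_degree).astype(int) % n_lon
    lat_inds = np.floor((lats + 90.0) * cells_per_degree).astype(int)
    lat_inds = np.clip(lat_inds, 0, n_lat - 1)

    land = landmask == 0
    ocean = ~land

    # A landmask cell is "near ocean" if any of its 4 neighbors is ocean.
    near_ocean = np.zeros_like(land)
    near_ocean |= np.roll(ocean, 1, axis=0)   # neighbor at lon - 1 (wraps around)
    near_ocean |= np.roll(ocean, -1, axis=0)  # neighbor at lon + 1 (wraps around)
    near_ocean[:, 1:] |= ocean[:, :-1]        # neighbor at lat - 1
    near_ocean[:, :-1] |= ocean[:, 1:]        # neighbor at lat + 1

    coast = land & near_ocean

    # Sample the coastline grid at every satellite pixel; no resampling of data.
    return coast[lon_inds, lat_inds].astype(int)
```

The returned mask has the same NxM shape as the data, so it can be drawn over the image of P directly, for example by setting the masked pixels to a fixed color.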

Related

Points in Polygons. How can I match them spatially with given coordinates?

I have a dataset of georeferenced flickr posts (ca. 35k, picture below) and I have an unrelated dataset of georeferenced polygons (ca. 40k, picture below); both are currently pandas DataFrames. The polygons do not cover the entire area where flickr posts are possible. I am having trouble understanding how to sort many different points into many different polygons (or check if they are close). In the end I want a map with the points from the flickr data in polygons, colored according to an attribute (Tag). I am trying to do this in Python. Do you have any ideas or recommendations?
Point dataframe Polygon dataframe

Since there is no sample data to load and play with, my answer will be descriptive in nature, trying to explain some possible strategies to approach the problem you are trying to solve.
I assume that:
these polygons are probably some addresses and you essentially want to place the geolocated flickr posts to the nearest best-match among the polygons.
First of all, you need to identify or acquire information on the precision of those flickr geolocations. How off could they possibly be because of numerous sources of errors (the reason behind those errors is not your concern, but the amount of error is). This will give you an idea of a circle of confusion (2D) or more likely a sphere of confusion (3D). Why 3D? Well, you might have flickr post from a certain elevation on a high-rise apartment, and so, (x: latitude,y: longitude, z: altitude) all may be necessary to consider. But, you have to study the data and any other information available to you to determine the best option here (2D/3D space-of-confusion).
Once you have figured out the type of ND-space-of-confusion, you will need a distance metric (typically just a distance between two points) -- call this sigma. Just to be on the safe side, find all the addresses (geopolygons) within a radius of 1 sigma and additionally within 2 sigma -- these are your possible set of target addresses. For each of these addresses, compute the distances from the flickr geolocation to its centroid and to the four corners of its rectangular bounding box.
You will then want to rank these addresses for each flickr geolocation, based on their distances for all the five points. You will need a way of identifying a flickr point that is far from a big building's center (distance from centroid could be way more than distance from the corners) but closer to its edges vs. a different property with a smaller area-footprint.
For each flickr point, thus you would have multiple predictions with different probabilities (convert the distance metric based scores into probabilities) using the distances, on which polygon they belong to.
Thus, if you choose any flickr location, you should be able to show top-k geopolygons that flickr location could belong to (with probabilities).
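The ranking idea above can be sketched as follows. This is only an illustration under stated assumptions: polygons are given as plain lists of (x, y) vertices in a projected coordinate system, the centroid is approximated by the vertex mean, the score is the smallest of the five distances, and the scores are turned into probabilities with a softmax-style weighting. The function name rank_polygons and these choices are hypothetical, not part of the answer.

```python
import numpy as np

def rank_polygons(point, polygons, k=3):
    """Return the top-k (polygon_index, probability) pairs for one point.

    Assumptions: each polygon is a sequence of (x, y) vertices; the centroid
    is approximated by the vertex mean; closer polygons get higher probability.
    """
    point = np.asarray(point, dtype=float)
    scores = []
    for verts in polygons:
        verts = np.asarray(verts, dtype=float)
        xmin, ymin = verts.min(axis=0)
        xmax, ymax = verts.max(axis=0)
        # Five reference points: approximate centroid + bounding-box corners.
        ref = np.array([
            verts.mean(axis=0),
            [xmin, ymin], [xmin, ymax], [xmax, ymin], [xmax, ymax],
        ])
        # Score = distance to the closest of the five reference points.
        scores.append(np.linalg.norm(ref - point, axis=1).min())
    scores = np.asarray(scores)

    # Convert distances to probabilities (smaller distance -> larger weight).
    weights = np.exp(-(scores - scores.min()))
    probs = weights / weights.sum()

    order = np.argsort(scores)[:k]
    return [(int(i), float(probs[i])) for i in order]
```

In practice the candidate list would first be cut down to polygons within 1 or 2 sigma of the point, and the distance scale in the weighting would need tuning to the units of the coordinates.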
For visualizations, I would suggest using holoviews with datashader, as that should be able to cope with the large number of points in your data. Also, please take a look at leafmap (or geemap).
References
holoviews: https://holoviews.org/
datashader: https://datashader.org/
leafmap: https://leafmap.org/
geemap: https://geemap.org/

How to consistently number contours in an image time series?

I have automated the task of measuring plant area over time to extrapolate growth rate using an image time-series and the following two methods: (1) Python + ArcGIS, and (2) Python + OpenCV.
In the first method, ArcGIS allows me to create a vector grid on the image. Each cell of the grid contains a single plant, so I number each cell starting from top-left to bottom-right. After creating a binary image in which plant pixels == 1 and everything else == 0, I apply Zonal Statistics to find my plant area. In this way the plant numbers stay consistent because I use the same grid over all the images in the time series, but it requires manual intervention.
In the second method, I use OpenCV to find plants via contours. The numbering of each contour is done automatically based on its centroid coordinates and bounding box dimensions. Currently I have them sorted 'top-to-bottom', but it obviously isn't as perfect a sort as the manually-made grid. In addition, plant #1 may not stay plant #1 in the second or third image because each plant grows and moves over the course of the experiment, and new plants emerge and change the total number of contours (images are taken every hour for up to several weeks). Therefore, I cannot compare plant #1 in the first image and plant #1 in subsequent images because they may not even be the same plant.
How can I consistently number the same plant through the entire time-series using the second method? I considered associating centroids in subsequent images to (x,y) coordinates in the previous image that were the most similar (once the data is in tabular form), but this would fail to provide an updated numbered contour image.

The solution to this problem lay in automatic circle detection via the OpenCV Hough Transform function (cv2.HoughCircles()), finding the resulting Hough Circle centroids and then overlaying them on the original RGB image to create a reference key. As I did not have an image without any plants in it at all, I adapted the method so it found the correct number of origins, but the result would be better in an image with no plants.
I converted the resulting csv files for the hough circles reference image (columns: OID, X, Y) and plant contours (columns: CID, X, Y, Area etc.) to GeoPandas GeoDataFrames and used Scipy's cKDTree to combine them through a nearest neighbour algorithm.
Special thanks to JHuw's answer in https://gis.stackexchange.com/questions/222315/geopandas-find-nearest-point-in-other-dataframe as Shapely's nearest_points function did not work for me.
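The nearest-neighbour join can be illustrated with a small sketch. Note that this is a stand-in, not the answer's code: it uses a brute-force NumPy distance matrix rather than GeoPandas and Scipy's cKDTree, which is fine for a few hundred plants but scales worse. The function name assign_plant_ids and the max_dist cut-off are assumptions added for illustration.

```python
import numpy as np

def assign_plant_ids(reference_xy, contour_xy, max_dist=np.inf):
    """Give each contour centroid the index of its nearest reference origin.

    reference_xy: (R, 2) fixed origins (e.g. Hough circle centroids).
    contour_xy:   (C, 2) contour centroids found in one frame.
    Contours farther than max_dist from every origin get the id -1.
    """
    reference_xy = np.asarray(reference_xy, dtype=float)
    contour_xy = np.asarray(contour_xy, dtype=float)

    # Pairwise distances, shape (C, R).
    diff = contour_xy[:, None, :] - reference_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(contour_xy)), nearest]
    return np.where(nearest_dist <= max_dist, nearest, -1)
```

Because the reference origins are the same for every frame, the returned ids stay consistent across the time series even when OpenCV returns the contours in a different order.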

How to plot 2-D navigation data points on a floor map using Matlab/Python?

I want to show the tracking result of my indoor localization algorithm with respect to the ground truth reference path on the floor map. The floor plan and the walking route representing the ground truth is as follows:
Here, the red line is the ground truth route. The right-left side of the image represents the x-axis and it is compressed (original x-axis length is much larger) owing to space. The top-bottom of the image is the y-axis and it represents precisely according to the coordinate.
I want to draw the localization estimation points (2-D) on it. I tried to do it using Origin. I got the following image.
As seen in the figure above, the image does not resemble the floor plan precisely (using log on y-axis can reduce the dimension of y-axis but it does not yield a complete solution in my case).
To summarize:
What I have: (a) A set of 2-D coordinate points from each localization algorithm (I'm comparing my method with two other methods, so there are 3 sets of 2-D coordinate points) and (b) a floor plan image.
What I want: To plot the sets of 2-D coordinate points on the floor plan image.
If anyone could drop a sample Matlab/python code to plot the 2-D coordinates, I'd highly appreciate it.
Thank you.

To plot on top of an image, you have to provide the necessary scaling information. This can be achieved using the image function, passing x, y and C: https://de.mathworks.com/help/matlab/ref/image.html?s_tid=doc_ta
I don't know how your floor plan is scaled, but the resulting code should be something like:
image(x,y,C) % x and y provide the scaling information, C is the image.
hold on
plot(...) % code you already have
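Since the question also asks for Python, the same scaling step can be sketched there. This is a minimal NumPy sketch that assumes the floor plan image covers a known rectangle of floor coordinates (x_range and y_range, both hypothetical inputs) and that image row 0 is the top edge. The function name world_to_pixel is made up for this example.

```python
import numpy as np

def world_to_pixel(points_xy, image_shape, x_range, y_range):
    """Map floor coordinates (x, y) to image pixel coordinates (col, row).

    Assumptions: the image spans x_range = (x_min, x_max) left to right and
    y_range = (y_min, y_max) bottom to top; row 0 is the top of the image.
    """
    pts = np.asarray(points_xy, dtype=float)
    height, width = image_shape[:2]
    x_min, x_max = x_range
    y_min, y_max = y_range

    cols = (pts[:, 0] - x_min) / (x_max - x_min) * (width - 1)
    rows = (y_max - pts[:, 1]) / (y_max - y_min) * (height - 1)  # flip y axis
    return np.column_stack([cols, rows])
```

The resulting pixel coordinates can be plotted directly on top of the image. Alternatively, matplotlib's imshow accepts an extent argument that applies this kind of scaling to the image itself, so the points can stay in floor coordinates.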

Converting an AutoCAD model to a matrix of points/volumes with the mass density specified at each location

I am an experimental physicist (grad student) who is trying to take an AutoCAD model of the experiment I've built and find the gravitational potential from the whole instrument over a specified volume. Before I find the potential, I'm trying to make a map of the mass density at each point in the model.
What's important is that I already have a model and in the end I'll have something that says "At (x,y,z) the value is d". If that's a crazy csv file, a numpy array, an excel sheet, or... whatever, I'll be happy.
Here's what I've come up with so far:
Step 1: I color code the AutoCAD file so that color associates with material.
Step 2: I send the new drawing/model to a slicer (made for 3D printing). This takes my 3D object and turns it into equally spaced (in z-direction) 2d objects... but then that's all output as g-code. But hey! G-code is a way of telling a motor how to move.
Step 3: This is the 'hard part' and the meat of this question. I'm thinking that I take that g-code, which is in essence just a set of instructions on how to move a nozzle and use it to populate a numpy array. Basically I have 3D array, each level corresponds to one position in z, and the grid left is my x-y plane. It reads what color is being put where, and follows the nozzle and puts that mass into those spots. It knows the mass because of the color. It follows the path by parsing the g-code.
When it is done with that level, it moves to the next grid and repeats.
Does this sound insane? Better yet, does it sound plausible? Or maybe someone has a smarter way of thinking about this.
Even if you just read all that, thank you. Seriously.

Does this sound insane? Better yet, does it sound plausible?
It's very reasonable and plausible. Using the g-code could do that, but it would require a g-code interpreter that could map the instructions to a 2D path. (Not 3D, since you mentioned that you're taking fixed z-slices.) That could be problematic, but, if you found one, it could work, but may require some parser manipulation. There are several of these in a variety of languages, that could be useful.
SUGGESTION
From what you describe, it's akin to doing an MRI scan of the object, and trying to determine its constituent mass profile along a given axis. In this case, and unlike MRI, you have multiple colors, so that can be used to your advantage in region selection / identification.
Even if you used a g-code interpreter, it would reproduce an image whose area you'll still have to calculate, so noting that and given that you seek to determine and classify material composition by path (in that the path defines the boundary of a particular material, which has a unique color), there may be a couple ways to approach this without resorting to g-code:
1) If the colors of your material are easily (or reasonably) distinguishable, you can create a color mask which will quantify the occupied area, from which you can then determine the mass.
That is, if you take a photograph of the slice, load the image into a numpy array, and then search for a specific value (say red), you can identify the area of the region. Then, you apply a mask on your array. Once done, you count the occupied elements within your array, and then you divide it by the array size (i.e. rows by columns), which would give you the relative area occupied. Since you know the mass of the material, and there is a constant z-thickness, this will give you the relative mass. An example of color masking using numpy alone is shown here: http://scikit-image.org/docs/dev/user_guide/numpy_images.html
As such, let's define an example that's analogous to your problem: let's say we have a picture of a red cabbage, and we want to know how much of the picture contains red / purple-like pixels.
To simplify our life, we'll set any pixel whose red channel is below a certain threshold to white (RGB: 255,255,255), and then count how many non-white values there are:
import numpy as np
import matplotlib.pyplot as plt

def plot_image(fname, color=128, replacement=(255, 255, 255), plot=False):
    # 128 is a reasonable guess, since most of the pixels in the image that
    # have the purplish hue have a red-channel value above it.
    data = plt.imread(fname)
    image_data = data.copy()  # keep the original data (for later use if need be)
    mask = image_data[:, :, 0] < color  # pixels whose red channel is below the threshold
    image_data[mask] = np.array(replacement, dtype=image_data.dtype)  # whiten them
    if plot:
        plt.imshow(image_data)
        plt.show()
    return data, image_data

data, image_data = plot_image('cabbage.jpg')  # load the image, and apply the mask

# Find the locations of all the channel values that are non-white (i.e. not 255).
# For an RGB image this returns 3 index arrays of the same size.
indices = np.where(image_data != 255)

# Now, calculate the area: in this case, ~ 62.04 %
effective_area = indices[0].size / float(data.size)
The selected region in question is shown here below:
Note that image_data contains the pixel information that has been masked, and would provide the coordinates (albeit in pixel space) of where each occupied (i.e. non-white) pixel occurs. The issue with this of course is that these are pixel coordinates and not physical ones. But, since you know the physical dimensions, converting between the two is easily done.
Furthermore, with the effective area known, and knowledge of the physical dimension, you have a good estimate of the real area occupied. To obtain better results, tweak the value of the color threshold (i.e. color). In your real-life example, since you know the color, search within a pixel range around that value (to offset noise and lighting issues).
The above method is a bit crude - but effective - and, it may be worth exploring using it in tandem with edge-detection, as that could help improve the region identification, and area selection. (Note that isn't always strictly true!) Also, color deconvolution may be useful: http://scikit-image.org/docs/dev/auto_examples/color_exposure/plot_ihc_color_separation.html#sphx-glr-auto-examples-color-exposure-plot-ihc-color-separation-py
The downside to this is that the analysis requires a high quality image and good lighting; and, most importantly, it's likely that you'll lose some of the finer details of the edges, which would impact your masses.
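To make the step from relative area to mass concrete, here is a small worked sketch. It assumes the photographed slice covers a known physical rectangle, has a constant thickness, and that the material density is known. The function name slice_mass and the SI units are illustrative choices, not something from the question.

```python
def slice_mass(area_fraction, width_m, height_m, thickness_m, density_kg_m3):
    """Mass of one material in one slice, from its relative image area.

    area_fraction: fraction of the image occupied by the material (0..1).
    width_m, height_m: physical size of the region the image covers.
    thickness_m: constant z-thickness of the slice.
    density_kg_m3: known density of the material.
    """
    area_m2 = area_fraction * width_m * height_m   # real occupied area
    volume_m3 = area_m2 * thickness_m              # occupied volume in this slice
    return volume_m3 * density_kg_m3
```

For example, a material covering half of a 0.2 m by 0.1 m image, in a 1 mm slice with a density of 2700 kg/m^3, would weigh 0.5 * 0.2 * 0.1 * 0.001 * 2700 = 0.027 kg.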
2) Instead of resorting to camera work, and given that you have the AutoCAD model, you can use that and the software itself in addition to the above prescribed method.
Since you've colored each material in the model differently, you can use AutoCAD's slicing tool, and can do something similar to what the first method suggests doing physically: slicing the model, and taking pictures of the slice to expose the surface. Then, using a similar method described above of color masking / edge detection / region determination through color selection, you should obtain a much better and (arguably) very accurate result.
The downside to this, is that you're also limited by the image quality used. But, as it's software, that shouldn't be much of an issue, and you can get extremely high accuracy - close to its actual result.
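Stacking the color-masked slices gives the "at (x,y,z) the value is d" array the question asks for. The following is a rough sketch, assuming each exported slice is already loaded as an RGB array of the same size and that each material color maps to a known density. The function name slices_to_density, the color_to_density dictionary and the tolerance tol are hypothetical.

```python
import numpy as np

def slices_to_density(slices, color_to_density, tol=10):
    """Build a 3-D density array indexed as volume[z, row, col].

    slices: sequence of (H, W, 3) RGB arrays, one per z position.
    color_to_density: dict mapping (r, g, b) tuples to a density value.
    tol: allowed per-channel deviation when matching a color.
    """
    height, width = np.asarray(slices[0]).shape[:2]
    volume = np.zeros((len(slices), height, width), dtype=float)

    for z, img in enumerate(slices):
        img = np.asarray(img).astype(int)  # avoid uint8 wrap-around in subtraction
        for color, density in color_to_density.items():
            # A pixel matches when every channel is within tol of the color.
            match = np.all(np.abs(img - np.array(color)) <= tol, axis=2)
            volume[z][match] = density
    return volume
```

Pixels that match none of the listed colors keep a density of zero, which corresponds to empty space in the model.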
The last suggestion to improve these results would be to script numerous random thin slices of the AutoCAD model along a particular directional vector shared by every subsequent slice, exporting each exposed surface, analyzing each image in the manner described above, and then collecting those results to give you a Monte Carlo-like and statistically quantifiable determination of the mass (to correct for geometry effects due to slicing along one given axis).

Plotting gridded data using KML

We are beginning a project to visualize the results of a finite volume (FV) calculation using Google Earth. The FV data is essentially 2d (lat/long) data consisting of a Cartesian array of values (sea surface height, for example). Each value should be mapped to a color from some colormap, and then displayed as a single mesh cell in a gridded array suitable for Google Earth. The Cartesian array could be 100x100 or larger.
My question is, do we construct polygons for each mesh cell C_{ij} in the array, assigning a color corresponding to the q_{ij} value for that mesh cell? This would seem to create a huge KML file, if the coordinates of the four corners of every mesh cell must be described, (i.e. 10,000 polygons, for example).
Or are there KML tools we could use that would allow us to specify, for example, the lower and upper coordinates of the array, a generic mesh cell size (e.g. dX, dY values), and the array of q data (or, equivalently, colours) that should be used to fill the "patch"?
Alternatively, we could create an image file, containing for example, a rendered image of our data array (created by some other means), and then referenced from the KML file.
Our aim is to use PyKML for this project.
Any suggestions would be very helpful.

After much digging around, I think I now have a better understanding of what Google Earth can and cannot do, (or is not designed to do). It seems that Google Earth is not designed as a visualization tool for numerical data. This does not mean it cannot be done, but that one must create the image files elsewhere, and then overlay them onto Google Earth. For example, this link provides instructions for visualizing the output from a fire modeling code :
http://www.openwfm.org/wiki/Visualization_in_Google_Earth
The instructions there show how pseudocolor plots can be used, in at least one special case, to visualize output in Google Earth.
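As an illustration of the image-overlay route, the KML side can be quite small. This sketch writes a KML GroundOverlay that references an already rendered image and places it by its bounding box. It uses only the Python standard library rather than PyKML, and the function name ground_overlay_kml and its arguments are assumptions for the example.

```python
from xml.sax.saxutils import escape

def ground_overlay_kml(name, image_href, north, south, east, west):
    """Return a KML document that drapes one image over a lat/lon box.

    image_href: path or URL of the rendered pseudocolor image.
    north, south, east, west: bounding box of the data array in degrees.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        '  <GroundOverlay>\n'
        f'    <name>{escape(name)}</name>\n'
        f'    <Icon><href>{escape(image_href)}</href></Icon>\n'
        '    <LatLonBox>\n'
        f'      <north>{north}</north>\n'
        f'      <south>{south}</south>\n'
        f'      <east>{east}</east>\n'
        f'      <west>{west}</west>\n'
        '    </LatLonBox>\n'
        '  </GroundOverlay>\n'
        '</kml>\n'
    )
```

With this approach the KML file stays a few lines long no matter how large the array is, since the 100x100 (or larger) grid lives entirely in the image file.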
