Points in Polygons. How can I match them spatially with given coordinates? - python

I have a dataset of georeferenced flickr posts (ca. 35k, picture below) and I have an unrelated dataset of georeferenced polygons (ca. 40k, picture below), both are currently panda dataframes. The polygons do not cover the entire area where flickr posts are possible. I am having trouble understanding how to sort many different points in many different polygons (or check if they are close). In the end I want a map with the points from the flickerdata in polygons colord to an attribute (Tag). I am trying to do this in Python. Do you have any ideas or recommendations?
Point dataframe Polygon dataframe

Since, you don't have any sample data to load and play with, my answer will be descriptive in nature, trying to explain some possible strategies to approach the problem you are trying to solve.
I assume that:
these polygons are probably some addresses and you essentially want to place the geolocated flickr posts to the nearest best-match among the polygons.
First of all, you need to identify or acquire information on the precision of those flickr geolocations. How off could they possibly be because of numerous sources of errors (the reason behind those errors is not your concern, but the amount of error is). This will give you an idea of a circle of confusion (2D) or more likely a sphere of confusion (3D). Why 3D? Well, you might have flickr post from a certain elevation on a high-rise apartment, and so, (x: latitude,y: longitude, z: altitude) all may be necessary to consider. But, you have to study the data and any other information available to you to determine the best option here (2D/3D space-of-confusion).
Once you have figured out the type of ND-space-of-confusion, you will need a distance metric (typically just a distance between two points) -- call this sigma. Just to be on the safe side, find all the addresses (geopolygons) within a radius of 1 sigma and additionally within 2 sigma -- these are your possible set of target addresses. For each of these addresses have a variable that calculates its distances of its centroid, and the four corners of its rectangular outer bounding box from the flickr geolocations.
You will then want to rank these addresses for each flickr geolocation, based on their distances for all the five points. You will need a way of identifying a flickr point that is far from a big building's center (distance from centroid could be way more than distance from the corners) but closer to it's edges vs. a different property with smaller area-footprint.
For each flickr point, thus you would have multiple predictions with different probabilities (convert the distance metric based scores into probabilities) using the distances, on which polygon they belong to.
Thus, if you choose any flickr location, you should be able to show top-k geopolygons that flickr location could belong to (with probabilities).
For visualizations, I would suggest you to use holoviews with datashader as that should be able to take care of curse of dimension in your data. Also, please take a look at leafmap (or, geemap).
References
holoviews: https://holoviews.org/
datshader: https://datashader.org/
leafmap: https://leafmap.org/
geemap: https://geemap.org/

Related

How to snap coordinates to road and calculate distance

maybe somebody knows something, since I am not able to find anything that makes sense to me.
I have a dataset positions (lon, lat) and I want to snap them to the nearest road and calculate the distance between them.
So far I discovered OSM, however I can't find a working example on how to use the API using python.
If any of you could help, I am thankful for ever little detail.
Will try to find it out by myself in the meantime and publish the answer if successful (couldn't find any similar question so maybe it will help someone in the future)
Welcome! OSM is a wonderful resource, but is essentially a raw dataset that you have to download and do your own processing on. There are a number of ways to do this, if you need a relatively small extract of the data (as opposed to the full planet file) the Overpass API is the place to look. Overpass turbo (docs) is a useful tool to help with this API.
Once you have the road network data you need, you can use a library like Shapely to snap your points to the road network geometry, and then either calculate the distance between them (if you need "as the crow flies" distance), or split the road geometry by the snapped points and calculate the length of the line. If you need real-world distance that takes the curvature of the earth into consideration (as opposed to the distance as it appears on a projected map), you can use something like Geopy.
You may also want to look into the Map Matching API from Mapbox (full disclosure, I work there), which takes a set of coordinates, snaps them to the road network, and returns the snapped geometry as well as information about the route, including distance.
You might use KDTree of sklearn for this. You fill an array with coordinates of candidate roads (I downloaded this from openstreetmap). Then use KDTree to make a tree of this array. Finally, use KDTree.query(your_point, k=1) to get the nearest point of the tree (which is the nearest node of the coordinate roads). Since searching the tree is very fast (essentially log(N) for N points that form the tree), you can query lots of points.

Select a smaller sample of "uniformly" distributed co-ordinates, out of a larger population of co-ordinates

I have a set of co-ordinates(latitudes and longitudes) of different buildings of a city. The sample size is around 16,000. I plan to use these co-ordinates as the central point of their locality/neighbourhood, and do some analysis on the different neighbourhoods of the city. The "radius/size" for each neighbourhood is still undecided as of now.
However, a lot of these co-ordinates are too close to each other. So, many of them actually represent the same locality/neighbourhood.
As a result, I want to select a smaller sample(say, 3-6k) of co-ordinates that will be more evenly spread out.
Example:- If two of the co-ordinates are representing two neighbouring buildings, I don't want to include both as they pretty much represent the same area. So we must select only one of them.
This way, I was hoping to reduce the population to a smaller size, while at the same time being able to cover most of the city through the remaining co-ordinates.
One way I was imagining the solution is to plot these co-ordinates on a 2D graph(for visualisation). Then, we can select different values of "radius" to see how many co-ordinates would remain. But I do not know how to implement such a "graph".
I am doing this analysis in Python. Is there a way I can obtain such a sample of these co-ordinates that are evenly distributed with minimal overlap?
Thanks for your help,
It seems like for your use case, you might need clustering instead of sampling to reduce your analysis set.
Given that you'd want to reduce your "houses" data to "neighborhoods" data, I'd suggest exploring geospatial clustering to cluster houses that are closer together and then take your ~3-4K clusters as your data set to begin with.
That being said, if your objective still is to remove houses that are closer together, you can obviously create an N*N matrix of the geospatial distance between each house vs. others and remove pairs that are within (0, X] where X is your threshold.

How to identify particles in this complex image?

I have been trying Python+OpenCV for quite long time already and followed many tutorials in order to identify particles in the following image:
My ultimate goal is to identify every particle, from there I will be able to e.g. count number of particles, calculate a size distribution, etc.
I have already tried to customize many examples several sites.
I got good hints based on:
How to define the markers for Watershed in OpenCV?
Counting particles using image processing in python
Although I was not able to achieve decent results.
How can I identify particles in this image using Python and OpenCV?
IMO, the only hope to get meaningful results is to use the fact that the particles are round. By using some homogeneity criterion, you could find candidate particle centers, and from these grow contours in such a way that they remain round and stop at edges. An option could be to draw rays from the seed point, find the closest edge points and use a robust fit of a circle or an ellipse.
Reject the shapes that are too far from roundness. This should allow you to find the unoccluded particles. Then you can continue the game from other seed points, this time growing contours that can be occluded by the already detected particles. (When an edge is hit, if it is known to belong to a particle, ignore it.)
Let's pretend the goal is to get an estimated number of particles. Also, let's assume those particles are spheres.
With that being said it should be possible to build a model, based on highlight, shadow, halftone to make the final result as accurate as it can be.
With that being said a simple proof of the concept based on highlight segmentation can be verified.
Initial result doesn't seem to be promising, but a tiny change of the contrast improves it:
Should be enough to get estimated number of stones and apply more advanced models for identified regions.

Thiessen-like polygons out of pre-labeled points

I have a list of coordinate points that are already clustered. Each point is available to me as a row in a csv file, with one of the fields being the "zone id": the ID of the cluster to which a point belongs. I was wondering if there is a way, given the latitude, longitude and zone ID of each point, to draw polygons similar to Voronoi cells, such that:
each cluster is entirely contained within a polygon
each polygon contains points belonging to only one cluster
the union of the polygons is contiguous polygon that contains all the points. No holes: the polygons must border each other except at the edges. A fun extension would be to supply the "holes" (water bodies, for example) as part of the input.
I realise the problem is very abstract and could be very resource intensive, but I am curious to hear of any approaches. I am open to solutions using a variety or combination of tools, such as GIS software, Python, R, etc. I am also open to implementations that would be integrated into the clustering process.

Corner detection in an array of points

I get a pointcloud from my lidar which is basically an numpy array of points in 2D cartesian coordinates. Is there any efficient way to detect corners formed by such 2D points?
What I tried until now was to detect clusters, then apply RANSAC on each cluster to detect two lines and then estimate the intersection point of those two lines. This method works well when I know how many clusters I have (in this case I put 3 boxes in front of my robot) and when the surrounding of the robot is free and no other objects are detected.
What I would like to do is run a general corner detection, then take the points surrounding each corner and check if lines are orthogonal. If it is the case then I can consider this corner as feature. This would make my algorithm more flexible when it comes to the surrounding environment.
Here is a visualization of the data I get:
There are many many ways to do this. First thing I'd try in your case would be to chain with a reasonable distance threshold for discontinuities, using the natural lidar scan ordering of the points. Then it become a problem of either estimating local curature or, as you have done, grow and merge linear segments.

Categories