Removing duplicate labels within a given radius - python

I have 500,000 house-number (address) records with longitude and latitude in a particular area, but I don't want the same house number to appear more than once within a 50 m radius. How can I detect identical adjacent house numbers within a 50 m radius?
Is there a method in Python, PostGIS, ArcGIS, QGIS, or another spatial tool to solve this?
Thank you.
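One possible approach (a sketch, not a definitive answer): project the points to metres and use a k-d tree to find all pairs within 50 m that share a house number. The DataFrame `df` and its column names ("house_number", "lon", "lat") are assumptions; adjust to your data.

```python
import numpy as np
from scipy.spatial import cKDTree

R = 6_371_000  # mean Earth radius in metres

def duplicate_pairs(df, radius_m=50):
    # Approximate local projection to metres; fine for a 50 m search radius.
    lat0 = np.radians(df["lat"].mean())
    x = R * np.radians(df["lon"].to_numpy()) * np.cos(lat0)
    y = R * np.radians(df["lat"].to_numpy())
    tree = cKDTree(np.column_stack([x, y]))
    nums = df["house_number"].to_numpy()
    # All index pairs closer than radius_m metres with the same house number.
    return [(i, j) for i, j in tree.query_pairs(r=radius_m) if nums[i] == nums[j]]
```

In PostGIS the equivalent check would essentially be a self-join with `ST_DWithin(a.geom, b.geom, 50)` on equal house numbers, using a metre-based CRS or the geography type.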

Related

Points in Polygons. How can I match them spatially with given coordinates?

I have a dataset of georeferenced Flickr posts (ca. 35k, pictured below) and an unrelated dataset of georeferenced polygons (ca. 40k, pictured below); both are currently pandas DataFrames. The polygons do not cover the entire area where Flickr posts can occur. I am having trouble understanding how to sort many different points into many different polygons (or check whether they are close). In the end I want a map with the points from the Flickr data inside the polygons, coloured by an attribute (Tag). I am trying to do this in Python. Do you have any ideas or recommendations?
[Figures: point DataFrame and polygon DataFrame]
Since you don't have any sample data to load and play with, my answer will be descriptive, explaining some possible strategies for approaching the problem you are trying to solve.
I assume that these polygons are probably addresses, and that you essentially want to assign each geolocated Flickr post to the nearest best match among the polygons.
First of all, you need to identify or acquire information on the precision of those Flickr geolocations: how far off could they be, given the numerous sources of error (the reason behind those errors is not your concern, but the amount of error is)? This will give you an idea of a circle of confusion (2D) or, more likely, a sphere of confusion (3D). Why 3D? Well, you might have a Flickr post from a certain elevation in a high-rise apartment, so (x: latitude, y: longitude, z: altitude) may all be necessary to consider. But you have to study the data and any other information available to you to determine the best option here (2D/3D space-of-confusion).
Once you have figured out the type of N-D space-of-confusion, you will need a distance metric (typically just the distance between two points) -- call this sigma. Just to be on the safe side, find all the addresses (geopolygons) within a radius of 1 sigma, and additionally within 2 sigma -- these are your possible sets of target addresses. For each of these addresses, compute the distances from the Flickr geolocation to the polygon's centroid and to the four corners of its rectangular outer bounding box.
You will then want to rank these addresses for each Flickr geolocation, based on the distances to all five points. You will need a way of distinguishing a Flickr point that is far from a big building's centroid (the distance to the centroid could be much greater than the distance to the corners) but close to its edges, from one near a different property with a smaller area footprint.
For each Flickr point you would thus have multiple predictions of which polygon it belongs to, each with a different probability (convert the distance-based scores into probabilities).
Thus, if you choose any Flickr location, you should be able to show the top-k geopolygons that location could belong to (with probabilities).
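A sketch of the candidate search and five-distance ranking described above, assuming a geopandas GeoDataFrame `polys` in a projected (metre-based) CRS and a shapely Point `p` for one Flickr post; `sigma` is the positional error estimate:

```python
import numpy as np
from shapely.geometry import Point

def rank_candidates(p, polys, sigma):
    # Candidate polygons within 2 sigma of the point.
    cands = polys[polys.geometry.distance(p) <= 2 * sigma].copy()
    scores = []
    for geom in cands.geometry:
        minx, miny, maxx, maxy = geom.bounds
        corners = [Point(minx, miny), Point(minx, maxy),
                   Point(maxx, miny), Point(maxx, maxy)]
        # Five reference distances: centroid plus the four bbox corners.
        dists = [p.distance(geom.centroid)] + [p.distance(c) for c in corners]
        scores.append(np.mean(dists))
    cands["score"] = scores
    # Turn distance scores into probabilities (softmax on -score/sigma).
    w = np.exp(-cands["score"] / sigma)
    cands["prob"] = w / w.sum()
    return cands.sort_values("prob", ascending=False)  # top-k via .head(k)
```

Averaging the five distances is just one choice of score; a weighted combination (e.g. down-weighting the centroid for large-footprint buildings) would implement the centroid-vs-corner distinction above more directly.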
For visualizations, I would suggest using holoviews with datashader, as that should be able to handle the volume of your data. Also, please take a look at leafmap (or geemap).
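A hedged illustration of the datashader piece, with synthetic points standing in for the Flickr data (replace `df` with your own DataFrame of projected x/y coordinates):

```python
import numpy as np
import pandas as pd
import datashader as ds
from datashader import transfer_functions as tf

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=35_000),
                   "y": rng.normal(size=35_000)})

canvas = ds.Canvas(plot_width=800, plot_height=600)
agg = canvas.points(df, "x", "y", agg=ds.count())  # rasterise point counts
img = tf.shade(agg, how="log")                     # log shading for dense areas
```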
References
holoviews: https://holoviews.org/
datashader: https://datashader.org/
leafmap: https://leafmap.org/
geemap: https://geemap.org/

Select a smaller sample of "uniformly" distributed co-ordinates, out of a larger population of co-ordinates

I have a set of co-ordinates (latitudes and longitudes) of different buildings in a city. The sample size is around 16,000. I plan to use these co-ordinates as the central points of their localities/neighbourhoods and do some analysis on the different neighbourhoods of the city. The "radius/size" of each neighbourhood is still undecided as of now.
However, a lot of these co-ordinates are very close to each other, so many of them actually represent the same locality/neighbourhood.
As a result, I want to select a smaller sample (say, 3-6k) of co-ordinates that is more evenly spread out.
Example: if two of the co-ordinates represent two neighbouring buildings, I don't want to include both, as they pretty much represent the same area; we should select only one of them.
This way, I was hoping to reduce the population to a smaller size, while at the same time being able to cover most of the city through the remaining co-ordinates.
One way I imagine the solution is to plot these co-ordinates on a 2D graph (for visualisation). Then we could try different values of "radius" and see how many co-ordinates would remain. But I do not know how to implement such a graph.
I am doing this analysis in Python. Is there a way I can obtain a sample of these co-ordinates that is evenly distributed, with minimal overlap?
Thanks for your help,
It seems like for your use case, you might need clustering instead of sampling to reduce your analysis set.
Given that you want to reduce your "houses" data to "neighborhoods" data, I'd suggest exploring geospatial clustering to group houses that are close together, and then taking your ~3-4K clusters as your data set to begin with.
That being said, if your objective is still to remove houses that are too close together, you can create an N×N matrix of the geospatial distances between each pair of houses and remove pairs whose distance falls within (0, X], where X is your threshold.
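A minimal sketch of the clustering route with scikit-learn's DBSCAN and the haversine metric; the DataFrame `df` and its "lat"/"lon" column names are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000

def thin_by_clustering(df, radius_m=250):
    coords = np.radians(df[["lat", "lon"]].to_numpy())
    # haversine expects [lat, lon] in radians; eps is an angular distance.
    labels = DBSCAN(eps=radius_m / EARTH_RADIUS_M, min_samples=1,
                    metric="haversine").fit_predict(coords)
    # Keep one representative row per cluster.
    return df.assign(cluster=labels).groupby("cluster", as_index=False).first()
```

With `min_samples=1` every point joins some cluster, so taking one representative per cluster thins the data without dropping whole areas; tune `radius_m` until the cluster count lands in the desired 3-6k range.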

Thiessen-like polygons out of pre-labeled points

I have a list of coordinate points that are already clustered. Each point is available to me as a row in a csv file, with one of the fields being the "zone id": the ID of the cluster to which a point belongs. I was wondering if there is a way, given the latitude, longitude and zone ID of each point, to draw polygons similar to Voronoi cells, such that:
each cluster is entirely contained within a polygon
each polygon contains points belonging to only one cluster
the union of the polygons is a contiguous polygon that contains all the points, with no holes: the polygons must border each other everywhere except at the outer edges. A fun extension would be to supply the "holes" (water bodies, for example) as part of the input.
I realise the problem is very abstract and could be very resource intensive, but I am curious to hear of any approaches. I am open to solutions using a variety or combination of tools, such as GIS software, Python, R, etc. I am also open to implementations that would be integrated into the clustering process.
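One approach that satisfies the one-cluster-per-polygon condition by construction (a sketch under assumptions, not a definitive implementation): build an ordinary Voronoi diagram of all the points, then dissolve each cell into the zone of the point it contains. The names `points` (a list of (x, y) tuples) and `zones` (the matching list of zone IDs) are hypothetical.

```python
from shapely.geometry import MultiPoint, Point
from shapely.ops import voronoi_diagram, unary_union

def zone_polygons(points, zones, pad=0.01):
    mp = MultiPoint(points)
    bounds = mp.buffer(pad).envelope      # study area; could be a city outline
    cells = voronoi_diagram(mp, envelope=bounds)
    by_zone = {}
    for cell in cells.geoms:
        cell = cell.intersection(bounds)  # clip the unbounded outer cells
        # Assign the cell to the zone of the single point it contains.
        for (x, y), z in zip(points, zones):
            if cell.contains(Point(x, y)):
                by_zone.setdefault(z, []).append(cell)
                break
    # Dissolve cells per zone; together the results tile the study area.
    return {z: unary_union(cs) for z, cs in by_zone.items()}
```

Each Voronoi cell contains exactly one input point, so each dissolved polygon contains points of only one zone and the cells tile the envelope with no holes; note, though, that a zone whose cells are not mutually adjacent will dissolve into a MultiPolygon rather than a single polygon. Supplied holes such as water bodies could be subtracted afterwards with `.difference()`.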

Create Geographical Grid based on data using K-D Trees (python)

For my research, I need to divide the geographical area of a city (e.g. Chicago or New York) using a grid. Later, I have data points consisting of GPS longitude and latitude locations that I want to associate with their corresponding cells in the grid.
The simplest way to do this is to divide the space into square cells of the same size. However, this leads to cells with very few points in non-populated (rural) areas and cells with a high number of points in the city centre. For a fairer relation between the number of points and cell size, an adaptive grid that creates cells sized according to data density would be a better option.
I came across this paper, which utilises a k-d tree to partition the space and retrieve the cells from the nodes. However, I cannot find any implementation (in Python) that does this. Many of the implementations out there only index data points in the tree to perform nearest-neighbour search; they do not provide code to extract the rectangular cells that the k-d tree generates.
For example, given the following image:
My resulting grid would contain 5 cells (node1 to node5), where each cell contains its associated data points.
Any idea on how to do that?
Anyone knows any implementation?
Many thanks,
David
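The leaf rectangles are straightforward to recover if you write the split yourself rather than relying on a library's nearest-neighbour index. A minimal sketch (my own, not the paper's implementation) that alternates median splits and returns the leaf cells, where `pts` is an (N, 2) NumPy array of projected x/y coordinates:

```python
import numpy as np

def kd_cells(pts, bounds, leaf_size=500, axis=0):
    # bounds = (minx, miny, maxx, maxy) of the current cell.
    if len(pts) <= leaf_size:
        return [(bounds, pts)]
    split = float(np.median(pts[:, axis]))  # median split on this axis
    left = pts[pts[:, axis] <= split]
    right = pts[pts[:, axis] > split]
    if len(left) == 0 or len(right) == 0:   # degenerate: identical coordinates
        return [(bounds, pts)]
    minx, miny, maxx, maxy = bounds
    if axis == 0:
        b_l, b_r = (minx, miny, split, maxy), (split, miny, maxx, maxy)
    else:
        b_l, b_r = (minx, miny, maxx, split), (minx, split, maxx, maxy)
    nxt = 1 - axis                          # alternate x and y splits
    return (kd_cells(left, b_l, leaf_size, nxt)
            + kd_cells(right, b_r, leaf_size, nxt))
```

Dense regions recurse deeper and produce small cells; sparse regions stop early and keep large ones, which gives the density-adaptive grid you describe.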

Find direction of vehicle on road

I am working with GPS data that has latitude/longitude, vehicle speed, vehicle IDs, and so on.
At different times of day, vehicle speeds differ for each side of the road.
I created this graph with Plotly Mapbox; the color differences correspond to vehicle speed.
So my question is: can I use a clustering algorithm to find which side of the road a vehicle is on? I tried DBSCAN but could not get a clear answer.
It depends on the data you have about the different spots: if you know the time and speed at each point, you can estimate the range within which the next point should fall, and afterwards order the points as a function of distance. Otherwise it is going to be complicated without more information than position and speed for all those points.
P.S. There is a computationally heavy method: estimate the route by using tangents to determine the angle between segments of consecutive points.
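A hedged sketch of that segment-angle idea: compute the forward azimuth (bearing) between consecutive fixes of each vehicle. It assumes a pandas DataFrame `df` sorted by vehicle and timestamp, with "lat"/"lon" in degrees and a "vehicle_id" column (all names are assumptions).

```python
import numpy as np
import pandas as pd

def add_bearing(df):
    lat1, lon1 = np.radians(df["lat"]), np.radians(df["lon"])
    # Next fix of the same vehicle.
    lat2 = lat1.groupby(df["vehicle_id"]).shift(-1)
    lon2 = lon1.groupby(df["vehicle_id"]).shift(-1)
    dlon = lon2 - lon1
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    # Degrees clockwise from North, in [0, 360); NaN on each vehicle's last fix.
    df["bearing"] = (np.degrees(np.arctan2(x, y)) + 360) % 360
    return df
```

Opposing directions of travel then differ by roughly 180 degrees, so thresholding or clustering the bearing column separates the two sides of the road.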
Many GPS hardware sets compute direction and insert this information into the standard output data alongside latitude, longitude, speed, etc. You might check whether your source data contains information about direction or "heading", often specified in degrees, where zero degrees is North, 90 degrees is East, and so on. You may need to parse the data and convert from binary or ASCII hexadecimal values to integer values, depending on the data-structure specifications, which vary between hardware designs. If such data exists in your source, this may be a simpler and more reliable approach to determining direction.
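For example, if the raw stream contains standard NMEA RMC sentences, the course over ground is field 8 (zero-indexed, in degrees from true North). A minimal parser, assuming plain-text NMEA input:

```python
def rmc_heading(sentence):
    fields = sentence.split(",")
    if fields[0].endswith("RMC") and fields[2] == "A":  # "A" means a valid fix
        return float(fields[8]) if fields[8] else None
    return None

# Example (a standard sample sentence):
# rmc_heading("$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A")
# -> 84.4
```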
