I am working with GPS data that has latitude/longitude, speed, vehicle IDs, and so on.
At different times of day, vehicle speeds differ for each side of the road.
I created this graph with Plotly's Mapbox support; the color differences correspond to vehicle speed.
So my question is: can I use a clustering algorithm to find which side of the road a vehicle is on? I tried DBSCAN but could not get a clear answer.
It depends on the data you have about the different spots. If you know the time and speed at each point, you can estimate the range within which the next point should fall, and then order the points by distance. Otherwise it is going to be complicated with no more information than position and speed for all those points.
PS: there is a computationally heavy method to estimate the route, using tangents to determine the angle between segments of consecutive points.
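The tangent/segment-angle idea can be sketched by computing the compass bearing between consecutive GPS fixes; vehicles on opposite sides of the road should have bearings roughly 180 degrees apart, which gives you a feature to cluster on. A minimal stdlib sketch (the function name is mine):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2,
    in degrees clockwise from North."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
```

Computing this for each pair of consecutive fixes of the same vehicle ID gives a heading series; two clusters around opposite bearings then correspond to the two travel directions.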
Many GPS hardware units compute direction and insert it into the standard output alongside latitude, longitude, speed, etc. Check whether your source data contains a direction or "heading" field, often specified in degrees where zero degrees is North, 90 degrees is East, and so on. You may need to parse the data and convert binary or ASCII hexadecimal values to integers, depending on the data-structure specifications, which vary between hardware designs. If such data exists in your source, this may be a simpler and more reliable approach to determining direction.
I have a dataset of georeferenced Flickr posts (ca. 35k, picture below) and an unrelated dataset of georeferenced polygons (ca. 40k, picture below); both are currently pandas DataFrames. The polygons do not cover the entire area where Flickr posts can occur. I am having trouble understanding how to sort many different points into many different polygons (or check whether they are close). In the end I want a map with the points from the Flickr data placed in polygons and colored by an attribute (tag). I am trying to do this in Python. Do you have any ideas or recommendations?
(images: point dataframe and polygon dataframe)
Since you don't have any sample data to load and play with, my answer will be descriptive, explaining some possible strategies for approaching the problem you are trying to solve.
I assume that:
these polygons probably represent addresses, and you essentially want to assign each geolocated Flickr post to the best-matching polygon.
First of all, you need to identify or acquire information on the precision of those Flickr geolocations: how far off they could be because of numerous sources of error (the reason behind the errors is not your concern, but their magnitude is). This will give you an idea of a circle of confusion (2D) or, more likely, a sphere of confusion (3D). Why 3D? Well, you might have a Flickr post from a certain elevation in a high-rise apartment, so (x: latitude, y: longitude, z: altitude) may all be necessary to consider. But you have to study the data and any other information available to you to determine the best option here (2D/3D space of confusion).
Once you have figured out the type of ND space of confusion, you will need a distance metric (typically just the distance between two points) -- call this sigma. Just to be on the safe side, find all the addresses (geopolygons) within a radius of 1 sigma, and additionally within 2 sigma -- these are your possible set of target addresses. For each of these addresses, compute the distances from its centroid and from the four corners of its rectangular outer bounding box to the Flickr geolocation.
You will then want to rank these addresses for each Flickr geolocation, based on the distances for all five points. You will need a way of distinguishing a Flickr point that is far from a big building's center (the distance from the centroid could be much more than the distance from the corners) but close to its edges, from one near a property with a smaller area footprint.
For each Flickr point you would thus have multiple predictions, with different probabilities (convert the distance-based scores into probabilities), for which polygon it belongs to.
Thus, if you choose any Flickr location, you should be able to show the top-k geopolygons that location could belong to, with probabilities.
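As a rough sketch of the centroid-plus-corners scoring described above (all names are illustrative, and a softmax over negative distances stands in for whatever distance-to-probability conversion you prefer):

```python
import math

def candidate_scores(point, polygons):
    """Rank candidate polygons for one geolocated post.

    `point` is (x, y); each polygon is a list of (x, y) vertices.
    The score is the mean distance from the point to the polygon's
    centroid and the four corners of its bounding box; scores are
    turned into probabilities with a softmax over negative distances.
    """
    scores = []
    for poly in polygons:
        xs = [v[0] for v in poly]
        ys = [v[1] for v in poly]
        centroid = (sum(xs) / len(xs), sum(ys) / len(ys))
        corners = [(min(xs), min(ys)), (min(xs), max(ys)),
                   (max(xs), min(ys)), (max(xs), max(ys))]
        dists = [math.dist(point, q) for q in [centroid] + corners]
        scores.append(sum(dists) / len(dists))
    weights = [math.exp(-s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

The top-k polygons for a post are then simply the k highest probabilities in the returned list.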
For visualizations, I would suggest using HoloViews with Datashader, as they can handle the sheer number of points in your data. Also, please take a look at leafmap (or geemap).
References
holoviews: https://holoviews.org/
datashader: https://datashader.org/
leafmap: https://leafmap.org/
geemap: https://geemap.org/
Maybe somebody knows something, since I am not able to find anything that makes sense to me.
I have a dataset of positions (lon, lat) and I want to snap them to the nearest road and calculate the distance between them.
So far I have discovered OSM; however, I can't find a working example of how to use the API from Python.
If any of you could help, I am thankful for every little detail.
I will try to figure it out myself in the meantime and publish the answer if successful (I couldn't find any similar question, so maybe it will help someone in the future).
Welcome! OSM is a wonderful resource, but it is essentially a raw dataset that you have to download and process yourself. There are a number of ways to do this; if you need a relatively small extract of the data (as opposed to the full planet file), the Overpass API is the place to look. Overpass turbo (docs) is a useful tool for working with this API.
Once you have the road network data you need, you can use a library like Shapely to snap your points to the road network geometry, and then either calculate the distance between them (if you need "as the crow flies" distance) or split the road geometry at the snapped points and calculate the length of the line. If you need real-world distance that takes the curvature of the earth into account (as opposed to the distance as it appears on a projected map), you can use something like Geopy.
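With Shapely, snapping is essentially `line.interpolate(line.project(point))`. The underlying geometry is just a point-to-segment projection, which you can also do by hand in projected (planar) coordinates; a minimal stdlib sketch:

```python
def snap_to_segment(p, a, b):
    """Project point p onto segment a-b (planar coordinates).

    Returns the closest point on the segment, clamped to its
    endpoints when the perpendicular foot falls outside it.
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:          # degenerate segment: a == b
        return a
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp onto the segment
    return (ax + t * dx, ay + t * dy)
```

To snap to a whole polyline, apply this to each segment and keep the result with the smallest distance to the original point.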
You may also want to look into the Map Matching API from Mapbox (full disclosure, I work there), which takes a set of coordinates, snaps them to the road network, and returns the snapped geometry as well as information about the route, including distance.
You might use sklearn's KDTree for this. Fill an array with the coordinates of candidate roads (I downloaded these from OpenStreetMap), then build a KDTree from the array. Finally, use KDTree.query(your_point, k=1) to get the nearest point in the tree (which is the nearest node of the road coordinates). Since searching the tree is very fast (essentially O(log N) for a tree of N points), you can query lots of points.
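A minimal sketch of that workflow (the coordinates are made up; note that KDTree's default Euclidean metric on raw lat/lon is only an approximation, so for accuracy either project the coordinates first or use sklearn's BallTree with the haversine metric):

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical road-node coordinates (in practice, extracted from
# OpenStreetMap), as (lat, lon) rows.
road_nodes = np.array([
    [52.5200, 13.4050],
    [52.5205, 13.4060],
    [52.5300, 13.4200],
])

tree = KDTree(road_nodes)

# Query one GPS fix; returns distances and indices of the nearest node.
dist, idx = tree.query([[52.5201, 13.4052]], k=1)
nearest = road_nodes[idx[0][0]]
```

Queries can be batched by passing many rows at once, which is where the log-time lookups really pay off.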
I have a set of co-ordinates (latitudes and longitudes) of different buildings in a city. The sample size is around 16,000. I plan to use these co-ordinates as the central points of their localities/neighbourhoods and do some analysis on the different neighbourhoods of the city. The "radius/size" of each neighbourhood is still undecided.
However, a lot of these co-ordinates are too close to each other, so many of them actually represent the same locality/neighbourhood.
As a result, I want to select a smaller sample (say, 3-6k) of co-ordinates that is more evenly spread out.
Example: if two of the co-ordinates represent two neighbouring buildings, I don't want to include both, as they represent pretty much the same area; we should select only one of them.
This way, I was hoping to reduce the population to a smaller size, while still covering most of the city with the remaining co-ordinates.
One way I imagine the solution is to plot these co-ordinates on a 2D graph (for visualisation). Then we could try different values of "radius" and see how many co-ordinates would remain. But I do not know how to implement such a graph.
I am doing this analysis in Python. Is there a way I can obtain such a sample of these co-ordinates, evenly distributed with minimal overlap?
Thanks for your help!
It seems like for your use case you might need clustering, rather than sampling, to reduce your analysis set.
Given that you want to reduce your "houses" data to "neighbourhoods" data, I'd suggest exploring geospatial clustering to group houses that are close together, and then take your ~3-4K clusters as your data set to begin with.
That being said, if your objective is still to remove houses that are close together, you can create an N×N matrix of the geospatial distance between each pair of houses and remove pairs whose distance falls within (0, X], where X is your threshold.
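A cheaper alternative to the full N×N matrix is a greedy pass that keeps a point only if no already-kept point lies within the threshold; a stdlib sketch (assumes planar/projected coordinates, so for raw lat/lon either project first or swap in a haversine distance):

```python
import math

def thin_points(points, min_dist):
    """Greedy thinning: keep a point only if every already-kept
    point is at least `min_dist` away (planar units)."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept
```

Running this with a few different values of `min_dist` is exactly the "try different radii and see how many co-ordinates remain" experiment described in the question.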
My goal is to compute, for a given day, the number of hours that the sun shines on a given location, using Python and assuming a clear sky. The problem arises from a real-estate search: I would like to know how much sun I will actually get on a property, so that I do not have to rely on the statements of the real-estate salesperson and can judge a property solely based on its address. I am searching in an area with several nearby mountains, which should be considered in the calculation.
The approach that I would like to use is the following:
For a whole year (1.1.2020 till 31.12.2020), with a given temporal resolution (e.g. minutes), compute the altitude and azimuth angles of the sun for the defined location.
Find the angular height of nearby obstacles as seen from the location, yielding value pairs of azimuth and altitude angles. This can be trees, buildings or the already mentioned mountains. Let's assume that trees and buildings are negligible and it's mainly the mountains that take away the sun.
For each day, check for each time with the given resolution, whether the altitude angle of the sun is higher than the altitude angle of the obstacle, at the azimuthal location of the sun. For each day of the year, I can then know at which times the sun is visible.
Steps 1 and 3 are easy. For step 1, I can use for example the Python module pysolar.

Step 2 is more complicated. If I were standing on a perfectly flat plane extending to the horizon, the obstacle altitude would be 0 for all azimuth angles. If there were a mountain nearby, I would need to know the shape of the mountain as seen from the location. Unfortunately, I do not even know where to start solving step 2, and I do not know what this problem is called, i.e. I do not know how to google for a solution. In the best case there would be a Python module that does the calculation for me, e.g. by connecting to topography data based on OpenStreetMap or other services. If such a module does not exist, I would have to program access to topography data manually and then do some type of grid search: divide the landscape into a fine grid (possibly in spherical coordinates), compute the altitude and azimuth angle of each grid point as seen from the location (taking into account the earth's curvature and the elevation at the location), and find the maximum for each azimuth angle. Are there easier ways to do this? Are there Python modules that do this?
Step 2 isn't that hard if you have a height map, and I'm sure height maps are available.
Cast n rays in an evenly spread 360-degree pattern out from your location, and visit all map positions every delta meters along each ray. If d is the distance from a point to you, h is its height, and h0 is your height on the heightmap, keep the maximum value of (h - h0)/d along the path (if something is twice as far away, it needs to be twice as high to cast the same shadow).
You can pretty much ignore the earth's curvature; its effect on sunlight occlusion is negligible.
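A minimal sketch of that ray-casting idea on a regular height grid (nearest-cell sampling, distances in grid units, curvature ignored as suggested; all names are mine):

```python
import math

def horizon_angles(heightmap, x0, y0, n_rays=360, step=1.0):
    """For each azimuth, the maximum obstacle elevation angle (degrees)
    seen from cell (x0, y0) of a regular height grid.

    `heightmap[y][x]` is terrain height; `step` is the sampling
    distance along each ray, in grid units.
    """
    h0 = heightmap[y0][x0]
    ny, nx = len(heightmap), len(heightmap[0])
    result = []
    for i in range(n_rays):
        az = 2 * math.pi * i / n_rays
        dx, dy = math.sin(az), math.cos(az)  # azimuth from "North" (+y)
        best = 0.0                            # flat horizon by default
        d = step
        while True:
            x = int(round(x0 + dx * d))
            y = int(round(y0 + dy * d))
            if not (0 <= x < nx and 0 <= y < ny):
                break                         # ray left the map
            best = max(best, (heightmap[y][x] - h0) / d)
            d += step
        result.append(math.degrees(math.atan(best)))
    return result
```

Step 3 then reduces to checking, for each timestamp, whether the sun's altitude exceeds `result[azimuth]` at the sun's azimuth.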
How do I calculate the distance between 2 coordinates by sea? I also want to be able to draw a route between the two coordinates.
The only solution I have found so far is to split a map into pixels, identify each pixel as LAND or SEA, and then try to find the path using the A* algorithm, then transform the pixels back to coordinates.
There are some software packages I could buy, but none have online extensions. One service that calculates distances between sea ports and plots the path on a map is searates.com.
Beware of the fact that maps can distort distances. For example, in a Mercator projection, segments far from the equator represent less actual distance than equal-length segments near the equator. If you just assign uniform cost to your pixels/squares/etc., you will end up with non-optimal routing and erroneous distance calculations.
If you project a grid onto your map (pixels being just one particular grid out of many possible ones) and search for the optimal path using A*, all you need to do to make the search behave properly is set the edge weights according to the real distance along the surface of the sphere (the earth), not the distance on the map.
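For example, with the haversine formula as the edge weight (a standard great-circle formula, not specific to any library):

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km.

    Using this as the A* edge weight (instead of pixel distance)
    keeps high-latitude Mercator pixels from being over-weighted.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

One degree of longitude spans about 111 km at the equator but only about 56 km at 60° latitude, which is exactly the distortion the uniform-cost grid would get wrong.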
Beware that simply saying "sea or not-sea" is not enough to determine navigability. There are also issues of depth, traffic routing (e.g. shipping traffic through the English Channel is split into lanes) and political considerations (territorial waters etc.). You also want to add routes manually for channels that are too small to show up on the map (Panama, Suez) and adjust their cost to cover for any overhead incurred.
Pretty much you'll need to split the sea into pixels and do something like A*. You could optimize a bit by coalescing contiguous pixels into larger areas, but keeping everything square will probably make the search easier. The search would no longer be Manhattan-style, but with large enough squares the additional connection-decision time would be more than made up for.
Alternatively, you could iteratively "grow" polygons from all of your ports, building up convex polygons (so that any point within a polygon is reachable from any other without going outside; you want to avoid a Pac-Man shape, for instance), although this is a refinement/complication/optimization of the "squares" approach I first mentioned. The key is that once you're in an area, you know you can get to anywhere else in that area.
I don't know if this helps, sorry. It's been a long day. Good luck, though. It sounds like a fun problem!
Edit: Forgot to mention, you could also preprocess your area into a quadtree. That is, take your entire map and split it in half vertically and horizontally (you don't need to do both splits at the same time, and if you want to spend some time making "better" splits, you can do that later), and recurse until each node is entirely land or entirely sea. From this you can trivially build a network of connections (just connect neighboring leaves), and the A* should be easy enough to implement from there. This is probably the easiest way to implement my first suggestion anyway. :)
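A minimal sketch of that recursive split on a boolean land/sea grid (illustrative only; assumes a square power-of-two window):

```python
def quadtree_leaves(grid, x, y, size):
    """Recursively split a `size` x `size` window of a boolean
    land/sea grid (True = sea) until each leaf is uniform.

    Returns a list of (x, y, size, is_sea) leaves; `size` should
    be a power of two.
    """
    cells = [grid[j][i]
             for j in range(y, y + size)
             for i in range(x, x + size)]
    if all(cells) or not any(cells):
        return [(x, y, size, cells[0])]   # uniform: stop splitting
    half = size // 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            leaves += quadtree_leaves(grid, x + dx, y + dy, half)
    return leaves
```

Connecting each sea leaf to its adjacent sea leaves then gives the graph to run A* on, with far fewer nodes than the raw pixel grid in open water.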
I reached a satisfactory solution. It is along the lines of what you suggested and what I had in mind initially, but it took me a while to figure out the software and GIS concepts; I am a GIS newbie. If someone bumps into something similar again, here's my setup: PostGIS for PostgreSQL, maps from Natural Earth, the GIS editing software QGIS and OpenJUMP, and routing algorithms from pgRouting.
The Natural Earth maps needed some processing to be useful: I joined the marine polygons and the rivers to be able to get accurate paths to the most inland points. Then I used the 1-degree graticules to get paths from one continent to another (I need to find a more elegant solution than this, because some paths look like chessboard patterns). All these operations can be done from the command line using PostGIS, but I found it easier to use the desktop software (next, next). An alternative to the Natural Earth maps might be OpenStreetMap, but the planet.osm dump is around 200 GB and that discouraged me.
I think this setup also solves the distance-accuracy problem: PostGIS takes the Earth's actual shape into account, so distances should be pretty accurate.
I still need to do some testing and fine-tuning, but I can say it can calculate and draw a route between any 2 points on the world's coastlines (no small isolated islands yet) and display the names of the routing points (channels, seas, rivers, oceans).