I am creating an Android app which connects runners in close proximity. I am using a Tornado web server (Python) and a NoSQL database.
My solution:
Store the {lon, lat} of every user (regularly updated) in a DataLocation.
When a user wants to see the users around him, the app calls a specific function on my server which builds a bounding box from his current position. The next step is to return the users in my DataLocation whose coordinates fall inside his bounding box.
Is that a good way to do it? Any advice? Is GeoJSON useful for me? How can I do that in Python?
If you don't have a library and want to do it yourself, you can calculate distances using the great-circle distance. The formula is not too complex. To find the group of points that are within a certain radius, you will need to query the database for points whose distance (as calculated by the great-circle formula) is less than your radius.
In addition, if you want better speed, you will want to cache intermediate calculations such as sines and cosines in another column, as that will speed things up considerably, especially if you want to do the query at the database level.
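A minimal sketch of both pieces in Python: the haversine form of the great-circle distance, plus a bounding-box helper like the one the question describes, for cheap prefiltering before the exact distance test (the function names and the radius constant are my own choices):

```python
import math

EARTH_RADIUS_KM = 6371.0  # Earth's mean radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula.
    Inputs in degrees, result in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_km):
    """(lat_min, lat_max, lon_min, lon_max) of a box containing the radius;
    naive near the poles and the antimeridian."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlon = dlat / math.cos(math.radians(lat))  # degrees of longitude shrink with latitude
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon
```

The box query is what an index can answer quickly; the exact great-circle test then discards the corners of the box.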
Related
I am designing an app that will keep track of all users' lat/longs. For each user it will calculate the distance to all other users in the same city. I will be using Python.
For example:
My client will update a database with its long/lat every x seconds. Each time it updates, it will have to recalculate the distance between itself and all other users that are logged on.
My plan was to split cities into their own tables, so as to keep the data set and the calculations smaller. But the more I think about this, the worse the idea gets. I don't think it would scale at all if there were any significant amount of traffic. People would have to be confined to a major metropolitan area to use the app, which would limit the user base.
So my question is:
Is there a storage backend that is optimized for these calculations? I only heard about PostGIS this morning, but from what I have read it seems like it might be overkill. All I plan on doing is calculating the distance between lat/longs.
Thank you
There is the "Aviation Formulary". Look there under "distance between points".
Here's the equirectangular approximation, which is a hair faster.
Angles (la1, lo1), (la2, lo2) are in radians; you must convert from degrees.
The result, c, is likewise in radians. You don't want an angle, though, but statute miles or kilometres; for that, multiply by the Earth's mean radius: about 6,371 km (3,959 mi).
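A sketch of that formula in Python, keeping the variable names used above (the radius constant is the mean value just quoted):

```python
import math

EARTH_RADIUS_KM = 6371.0  # swap in 3959.0 to get statute miles

def equirectangular(la1, lo1, la2, lo2):
    """Equirectangular approximation; angles must already be in radians,
    as noted above. Returns kilometres."""
    x = (lo2 - lo1) * math.cos((la1 + la2) / 2)
    y = la2 - la1
    c = math.sqrt(x * x + y * y)  # angular distance in radians
    return EARTH_RADIUS_KM * c
```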
I am trying to implement a Python script which writes to and reads from a database to track changes within a 3D game (Minecraft). These changes are made by various clients and can be represented by a player name, coordinates (x, y, z), and a description. I am storing a high volume of changes and would like to know an easy and preferably fast way to store and retrieve them. What kinds of databases would be suited to this job?
Any kind. A NoSQL option like MongoDB might be especially interesting.
PostgreSQL has a cube module that supports simple storage, indexing and spatial operations on 3D points and cubes.
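As a rough sketch of what that looks like from Python, assuming psycopg2 and a database where the cube extension can be enabled (the table layout and all names here are made up):

```python
import psycopg2

conn = psycopg2.connect("dbname=minecraft")  # hypothetical database
cur = conn.cursor()

# One-time setup: a cube column holds each change's (x, y, z) point,
# and a GiST index makes spatial lookups fast.
cur.execute("CREATE EXTENSION IF NOT EXISTS cube")
cur.execute("""
    CREATE TABLE IF NOT EXISTS changes (
        id serial PRIMARY KEY,
        player text,
        description text,
        pos cube
    )""")
cur.execute("CREATE INDEX IF NOT EXISTS changes_pos_idx"
            " ON changes USING gist (pos)")

# Record one change at (x, y, z).
cur.execute(
    "INSERT INTO changes (player, description, pos)"
    " VALUES (%s, %s, cube(ARRAY[%s, %s, %s]::float8[]))",
    ("steve", "placed block", 10.0, 64.0, -3.0))

# All changes inside an axis-aligned box; <@ means "contained in".
cur.execute(
    "SELECT id, player, description FROM changes"
    " WHERE pos <@ cube(ARRAY[0, 0, -10]::float8[], ARRAY[20, 70, 10]::float8[])")
print(cur.fetchall())
conn.commit()
```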
How do I calculate the distance between 2 coordinates by sea? I also want to be able to draw the route between the two coordinates.
The only solution I have found so far is to split a map into pixels, identify each pixel as LAND or SEA, and then find the path using the A* algorithm, transforming the pixels back to coordinates afterwards.
There are some software packages I could buy, but none of them have online extensions. One service that calculates distances between sea ports and plots the path on a map is searates.com.
Beware of the fact that maps can distort distances. For example, in a Mercator projection, segments far away from the equator represent less actual distance than segments of equal length near the equator. If you just assign a uniform cost to your pixels/squares/etc., you will end up with non-optimal routing and erroneous distance calculations.
If you project a grid onto your map (pixels being just one particular grid out of many possible ones) and search for the optimal path using A*, all you need to do to get the search algorithm to behave properly is set the edge weight according to the real distance along the surface of the sphere (the Earth), not the distance on the map.
Beware also that simply labelling each pixel "sea or not-sea" is not enough to determine navigability. There are issues of depth, traffic routing (e.g. shipping traffic through the English Channel is split into lanes) and political considerations (territorial waters, etc.). You will also want to add routes manually for channels that are too small to show up on the map (Panama, Suez) and adjust their cost to cover any overhead incurred.
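A sketch of the edge weighting described above, assuming a dict-based grid labelled "sea"/"land" and a lookup table of cell centres in degrees (all the names here are mine):

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres; inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def neighbours(cell, grid, cell_centre):
    """Yield (neighbour, cost) pairs for A*: 4-connected sea cells, costed
    by real surface distance so high-latitude cells aren't over-counted."""
    r, c = cell
    for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if grid.get(nxt) == "sea":
            yield nxt, great_circle_km(*cell_centre[cell], *cell_centre[nxt])
```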
Pretty much you'll need to split the sea into pixels and do something like A*. You could optimize it a bit by coalescing contiguous pixels into larger areas, but if you keep everything squares it'll probably make the search easier. The search would no longer be Manhattan-style, but if you had large enough squares, the additional connection decision time would be more than made up for.
Alternatively, you could iteratively "grow" polygons from all of your ports, building up convex polygons (so that any point within the polygon is reachable from any other without going outside, you want to avoid the PacMan shape, for instance), although this is a refinement/complication/optimization of the "squares" approach I first mentioned. The key is that you know once you're in an area that you can get to anywhere else in that area.
I don't know if this helps, sorry. It's been a long day. Good luck, though. It sounds like a fun problem!
Edit: Forgot to mention, you could also preprocess your area into a quadtree. That is, take your entire map and split it in half vertically and horizontally (you don't need to do both splits at the same time, and if you want to spend some time making "better" splits, you can do that later), and do that recursively until each node is entirely land or sea. From this you can trivially make a network of connections (just connect neighboring leaves), and the A* should be easy enough to implement from there. This'll probably be the easiest way to implement my first suggestion anyway. :)
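A sketch of that preprocessing step, assuming an is_sea(x, y) predicate that classifies one map pixel (the function and the dict layout are my own):

```python
def build_quadtree(x, y, w, h, is_sea):
    """Recursively split the (x, y, w, h) pixel region until each node is
    uniformly land or sea; is_sea(i, j) classifies a single map pixel."""
    cells = [is_sea(i, j) for i in range(x, x + w) for j in range(y, y + h)]
    if all(cells) or not any(cells):
        return {"x": x, "y": y, "w": w, "h": h, "sea": all(cells)}
    hw, hh = (w + 1) // 2, (h + 1) // 2  # ceiling halves, so no empty children
    children = [build_quadtree(cx, cy, cw, ch, is_sea)
                for cx, cw in ((x, hw), (x + hw, w - hw))
                for cy, ch in ((y, hh), (y + hh, h - hh))
                if cw > 0 and ch > 0]
    return {"children": children}
```

Neighbouring sea leaves then become the graph edges for the A* search.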
I reached a satisfactory solution. It is along the lines of what you suggested and what I had in mind initially, but it took me a while to figure out the software and the GIS concepts; I am a GIS newbie. If someone bumps into something similar again, here's my setup: PostGIS on PostgreSQL, maps from Natural Earth, the GIS editors QGIS and OpenJump, and pgRouting for the routing algorithms.
The Natural Earth maps needed some processing to be useful: I joined the marine polygons and the rivers to be able to get accurate paths to the most inland points. Then I used the 1-degree graticules to get paths from one continent to another (I need to find a more elegant solution, because some paths zig-zag like chessboard squares). All these operations can be done from the command line with PostGIS, but I found it easier to use the desktop software (next, next). An alternative to the Natural Earth maps might be OpenStreetMap, but the planet.osm dump is around 200 GB, which discouraged me.
I think this setup also solves the distance-accuracy problem: PostGIS takes the Earth's actual shape into account, so distances should be pretty accurate.
I still need to do some testing and fine-tuning, but I can say it can calculate and draw a route between any 2 points on the world's coastlines (no small isolated islands yet) and display the names of the routing points (channels, seas, rivers, oceans).
Given a scenario where there are millions of potentially overlapping bounding boxes of variable sizes, each less than 5 km in width:
Create a fast function findIntersections(Longitude, Latitude, Radius) whose output is a list of the ids of those bounding boxes whose origin lies inside the perimeter defined by the function's arguments.
How do I solve this problem elegantly?
This is normally done using an R-tree data structure.
Databases like MySQL and PostgreSQL have GIS modules that use an R-tree under the hood to quickly retrieve locations within a certain proximity to a point on a map.
From http://en.wikipedia.org/wiki/R-tree:
R-trees are tree data structures that are similar to B-trees, but are used for spatial access methods, i.e., for indexing multi-dimensional information; for example, the (X, Y) coordinates of geographical data. A common real-world usage for an R-tree might be: "Find all museums within 2 kilometres (1.2 mi) of my current location".
The data structure splits space with hierarchically nested, and possibly overlapping, minimum bounding rectangles (MBRs, otherwise known as bounding boxes, i.e. "rectangle", which is what the "R" in R-tree stands for).
The Priority R-Tree (PR-tree) is a variant that has a maximum running time of "O((N/B)^(1-1/d) + T/B) I/Os, where N is the number of d-dimensional (hyper-)rectangles stored in the R-tree, B is the disk block size, and T is the output size."
In practice most real-world queries will have a much quicker average case run time.
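For a quick feel of the structure in Python, the rtree package (bindings for libspatialindex) can index the box origins as degenerate point rectangles; the sample data and the radius-as-square simplification below are mine:

```python
from rtree import index  # pip install Rtree

idx = index.Index()

# Index each box's origin as a degenerate (point) rectangle keyed by box id.
origins = {1: (-0.01, 51.51), 2: (2.30, 48.86)}
for box_id, (lon, lat) in origins.items():
    idx.insert(box_id, (lon, lat, lon, lat))

def find_intersections(lon, lat, radius_deg):
    """Ids of boxes whose origin lies in a square window around the point;
    an exact great-circle test on each candidate would refine this to a
    true circular radius."""
    window = (lon - radius_deg, lat - radius_deg,
              lon + radius_deg, lat + radius_deg)
    return list(idx.intersection(window))

print(find_intersections(0.0, 51.5, 0.1))  # -> [1]
```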
FYI, in addition to the other great answers posted, there is some cool stuff like SpatiaLite and the SQLite R-tree module.
PostGIS is an open-source GIS extension for PostgreSQL.
It has ST_Intersects and ST_Intersection functions available.
If you're interested, you can dig around and see how it's implemented there:
http://svn.osgeo.org/postgis/trunk/postgis/
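For reference, a query of that general shape from Python (hypothetical table and column names), using ST_DWithin on the geography type so the radius is in metres:

```python
import psycopg2

conn = psycopg2.connect("dbname=gisdb")  # hypothetical database
cur = conn.cursor()

lon, lat, radius_m = 0.0, 51.5, 5000.0

# Ids of boxes whose origin point lies within radius_m of (lon, lat);
# the ::geography casts make ST_DWithin measure metres on the spheroid.
cur.execute("""
    SELECT id
    FROM bounding_boxes
    WHERE ST_DWithin(
        origin::geography,
        ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
        %s)
""", (lon, lat, radius_m))
ids = [row[0] for row in cur.fetchall()]
```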
GiST seems like a better, more general approach:
http://en.wikipedia.org/wiki/GiST
After experimenting with a client-side approach to clustering large numbers of Google Maps markers, I decided that it won't be possible for my project (a social network with 28,000+ users).
Are there any examples of clustering the coordinates on the server side - preferably in Python/Django?
The way I would like this to work is to gradually index the markers based on their proximity (radius) and zoom level.
In other words, when a new user registers, he/she is automatically assigned to a certain 'group' of markers that are close to each other, thus incrementing that group's counter. What's sent from the server is just a small number of 'groups'. Only when the zoom level/scale of the map is 1:1 are actual users shown on the map.
That way the client side will have to deal only with 10-50 markers per request/zoom level.
This is a paid service that uses server-side clustering, but I'm not sure how it works. I'm guessing that they just use your data to generate the markers to be shown at each zoom level.
Update: This tutorial demonstrates a basic server-side clustering function. It's written in PHP for the Static Maps API, but you could use it as a starting point.
You might want to take a look at the DBSCAN and OPTICS pages on Wikipedia; these look very suitable for clustering places on a map. There is also a page on cluster analysis that shows all the possible algorithms you could use; most would be trivial to implement in the language of your choice.
With 28k+ points, you might want to skip Django and jump into C/C++ directly, and you surely shouldn't expect this to be calculated in real time in response to web requests.
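For prototyping before any such rewrite, a DBSCAN pass over lat/lon points is only a few lines with scikit-learn, if that dependency is acceptable (the eps value and the sample coordinates are invented):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# (n, 2) array of [lat, lon] in degrees for all users.
coords = np.array([[40.71, -74.00], [40.72, -74.01], [51.50, -0.12]])

# The haversine metric expects radians, and eps is an angular distance:
# a 2 km neighbourhood is 2 / 6371 (Earth's mean radius in km).
db = DBSCAN(eps=2 / 6371.0, min_samples=2, metric="haversine")
labels = db.fit_predict(np.radians(coords))
print(labels)  # e.g. [0, 0, -1]: the two NYC points cluster, London is noise
```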
One way to do it would be to define a grid with a unit size based on the zoom level, and collect all the items in a cell by truncating their lat/lon to, say, one decimal place. For example, a point at 42.2003x73.4021 falls in the cell 42.2x73.4, which is bounded by 42.2x73.4 and 42.3x73.5.
If there are one or more points in a grid cell, you place a marker in the center of that cell.
You then hook up the zoomend event and change your grid size accordingly, and redraw the markers.
http://code.google.com/apis/maps/documentation/reference.html#GMap2.zoomend
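A minimal version of that grid pass in Python; truncating to one decimal place corresponds to cell_deg=0.1, and the names are my own:

```python
from collections import defaultdict

def grid_clusters(points, cell_deg):
    """Group (lat, lon) points into cells of cell_deg degrees and return one
    marker per non-empty cell: (centre_lat, centre_lon, count)."""
    cells = defaultdict(list)
    for lat, lon in points:
        # Floor division keys the cell; works for negative coordinates too.
        key = (int(lat // cell_deg), int(lon // cell_deg))
        cells[key].append((lat, lon))
    return [((i + 0.5) * cell_deg, (j + 0.5) * cell_deg, len(members))
            for (i, j), members in cells.items()]

# Finer grid at higher zoom: recompute on the zoomend event.
print(grid_clusters([(42.2003, 73.4021), (42.2400, 73.4100)], 0.1))
```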
You can try my server-side clustering Django app:
https://github.com/biodiv/anycluster
It provides k-means and grid-based clustering.