Spatial index/query (finding k nearest points) - python

I have more than 10k points (latitude, longitude), and I'm building an app that shows the k nearest points to a user's location.
I think this is a very common problem, and I don't want to reinvent the wheel. I'm learning about quadtrees, which seem to be a good approach to this spatial problem.
I'm using these tools:
Python 2.5
MySQL
MongoDb
Building the quadtree is not that hard (see http://donar.umiacs.umd.edu/quadtree/points/pointquad.html). But once I've created the tree and saved it to a database (MySQL or MongoDB), how do I run the query?
I need to run queries like these:
Find all points within 10 km of the user's location.
Find the 6 (or at least 6) nearest points to the user's location.
What's the standard and common approach to do it?
EDIT 1:
I've loaded the 10k+ points into MongoDB (geospatial indexing), and it works fine at first glance. I've also found PostGIS:
PostGIS is an extension to the PostgreSQL object-relational database system which allows GIS (Geographic Information Systems) objects to be stored in the database.
So I think I'll give PostGIS a try.
I've also found SimpleGeo. You can store points/places in the cloud and then query them via an API: https://simplegeo.com/docs/tutorials/python#how-do-radial-nearby-query

MongoDB has support for spatial indexes built-in, so all you'd need to do is load your points using the correct format, create the spatial index, and then run your queries.
For a quick example, I loaded the center points for all 50 states in the mongo shell:
> db.places.ensureIndex({loc: "2d"})
> db.places.save({name: "AK", loc: {long: -152.2683, lat: 61.3850}})
> db.places.save({name: "AL", loc: {long: -86.8073, lat: 32.7990}})
> db.places.save({name: "AR", loc: {long: -92.3809, lat: 34.9513}})
> db.places.save({name: "AS", loc: {long: -170.7197, lat: 14.2417}})
> ...
Next, to query for the 6 nearest points to a given location:
> db.places.find({loc: { $near: {long: -90, lat: 50}}}).limit(6)
{"name" : "WI", "loc" : { "long" : -89.6385, "lat" : 44.2563 } }
{"name" : "MN", "loc" : { "long" : -93.9196, "lat" : 45.7326 } }
{"name" : "MI", "loc" : { "long" : -84.5603, "lat" : 43.3504 } }
{"name" : "IA", "loc" : { "long" : -93.214, "lat" : 42.0046 } }
{"name" : "IL", "loc" : { "long" : -89.0022, "lat" : 40.3363 } }
{"name" : "ND", "loc" : { "long" : -99.793, "lat" : 47.5362 } }
Next, to query for all points within 10km of a given location. Since I'm calculating the nearest states, I'll use 888km (which is approximately 8 degrees of latitude):
> db.places.find({loc: { $near: {long: -90, lat: 50}, $maxDistance: 8}})
{"name" : "WI", "loc" : { "long" : -89.6385, "lat" : 44.2563 } }
{"name" : "MN", "loc" : { "long" : -93.9196, "lat" : 45.7326 } }
Since one degree of latitude is approximately 111.12km, you'd use a $maxDistance: 0.08999 to represent 10km for your application.
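The conversion is simple arithmetic. A minimal sketch, assuming the same 111.12 km per degree of latitude used above; `km_to_degrees` is a hypothetical helper name, and it only applies to the flat "2d" model, where distances are measured in degrees:

```python
# Flat-earth conversion for a "2d" index: $maxDistance is expressed in degrees.
KM_PER_DEGREE_LAT = 111.12  # approximate length of one degree of latitude, as above


def km_to_degrees(km):
    """Convert a distance in km to degrees of latitude (flat-earth approximation)."""
    return km / KM_PER_DEGREE_LAT


print(round(km_to_degrees(10), 5))   # 0.08999
print(round(km_to_degrees(888), 2))  # 7.99, i.e. roughly the 8 degrees used above
```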
Update: By default MongoDB assumes an "idealized flat earth model", but this results in inaccuracies because lines of longitude converge at the poles. MongoDB 1.7+ supports spherical distance calculations, which provide greater precision.
Here is an example of running a similar query (with an 800 km radius) using spherical distance. maxDistance is in radians, so we divide the distance in km by the Earth's radius (6378 km here, the equatorial radius; the mean radius is about 6371 km):
> db.runCommand({geoNear: "places", near: [-90, 50], spherical: true,
maxDistance: 800/6378});
(summarizing results as they're too verbose to include)
"MN" dis: 0.087..
"WI" dis: 0.100..
"ND" dis: 0.120..
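To see what the spherical "dis" values mean, here is a brute-force sketch that computes the same great-circle angle (in radians) with the haversine formula for the six state centers returned by the $near query above, then applies the 800/6378 cutoff. It only illustrates the distance semantics, not how MongoDB evaluates the query internally; `angular_distance` and `near_sphere` are hypothetical helper names.

```python
import math

# State center points from the example above, as (longitude, latitude).
PLACES = {
    "WI": (-89.6385, 44.2563),
    "MN": (-93.9196, 45.7326),
    "MI": (-84.5603, 43.3504),
    "IA": (-93.214, 42.0046),
    "IL": (-89.0022, 40.3363),
    "ND": (-99.793, 47.5362),
}


def angular_distance(lon1, lat1, lon2, lat2):
    """Great-circle angle between two points, in radians (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * math.asin(math.sqrt(min(1.0, a)))  # clamp guards against rounding above 1


def near_sphere(places, lon, lat, max_distance):
    """Return (name, distance) pairs within max_distance radians, nearest first."""
    hits = []
    for name, (plon, plat) in places.items():
        d = angular_distance(lon, lat, plon, plat)
        if d <= max_distance:
            hits.append((name, d))
    return sorted(hits, key=lambda hit: hit[1])


for name, d in near_sphere(PLACES, -90, 50, 800 / 6378):
    print(name, round(d, 3))
```

This prints MN, WI and ND in that order; MI, IA and IL fall just outside 800/6378 radians, matching the geoNear output above. Multiplying a distance by 6378 converts it back to km.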

You may want to look at the k-d tree entry on Wikipedia. Unlike quadtrees, k-d trees are also useful when you have more than two dimensions. I suggest the k-d tree because the entry has Python code for building and querying the tree.
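A minimal 2-d tree sketch in plain Python, assuming points are stored as (longitude, latitude) and compared with flat Euclidean distance in degrees, the same simplification MongoDB's default "2d" model makes. `build_kdtree` and `k_nearest` are hypothetical names; for production use, `scipy.spatial.KDTree` provides a tested implementation with a `query(x, k)` method.

```python
import heapq
import math


class KDNode:
    """One node of a 2-d tree: a point, its label, the split axis and two subtrees."""

    def __init__(self, point, name, axis, left, right):
        self.point = point
        self.name = name
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(items, depth=0):
    """Build a balanced 2-d tree from a list of ((x, y), name) pairs."""
    if not items:
        return None
    axis = depth % 2  # alternate between x (longitude) and y (latitude)
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2  # the median point becomes the splitting node
    point, name = items[mid]
    return KDNode(point, name, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def k_nearest(root, target, k):
    """Return the names of the k points nearest to target, nearest first."""
    best = []  # max-heap of (-distance, name) holding the k best candidates so far

    def visit(node):
        if node is None:
            return
        d = math.hypot(node.point[0] - target[0], node.point[1] - target[1])
        if len(best) < k:
            heapq.heappush(best, (-d, node.name))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, node.name))
        diff = target[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Only search the other side if the splitting line is closer than the current k-th best.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(root)
    return [name for _, name in sorted(best, reverse=True)]


# State centers from the MongoDB example, as ((longitude, latitude), name).
STATES = [
    ((-152.2683, 61.3850), "AK"), ((-86.8073, 32.7990), "AL"),
    ((-92.3809, 34.9513), "AR"), ((-170.7197, 14.2417), "AS"),
    ((-89.6385, 44.2563), "WI"), ((-93.9196, 45.7326), "MN"),
    ((-84.5603, 43.3504), "MI"), ((-93.214, 42.0046), "IA"),
    ((-89.0022, 40.3363), "IL"), ((-99.793, 47.5362), "ND"),
]
tree = build_kdtree(STATES)
print(k_nearest(tree, (-90, 50), 6))
```

This returns WI, MN, MI, IA, IL, ND, the same order as the $near query in the MongoDB answer, since both measure flat distance in degrees.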

If you want to use MongoDB, read their docs carefully. The default model is a flat earth: it assumes that a degree of longitude has the same length as a degree of latitude.
I quote: """The current implementation assumes an idealized model of a flat earth, meaning that an arcdegree of latitude (y) and longitude (x) represent the same distance everywhere. This is only true at the equator where they are both about equal to 69 miles or 111km. However, at the 10gen offices at { x : -74 , y : 40.74 } one arcdegree of longitude is about 52 miles or 83 km (latitude is unchanged). This means that something 1 mile to the north would seem closer than something 1 mile to the east."""
You need their "new spherical model". Be warned: you need to use (longitude, latitude) in that order; again, read their docs carefully.
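A quick way to see the distortion described in the quote: express a point 1 mile north and a point 1 mile east of the 10gen office as flat degree offsets, the way a flat "2d" model measures them. This sketch assumes a spherical Earth with a mean radius of 3959 miles; `flat_degree_offset` is a hypothetical helper.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # mean radius, spherical approximation
MILES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_MILES / 360  # about 69 miles per degree of latitude


def flat_degree_offset(miles_north, miles_east, lat):
    """Size, in 'flat' degrees, of a small offset at latitude lat."""
    dlat = miles_north / MILES_PER_DEGREE
    # A degree of longitude shrinks by cos(latitude), so the same eastward
    # distance spans more degrees and looks "farther" in the flat model.
    dlon = miles_east / (MILES_PER_DEGREE * math.cos(math.radians(lat)))
    return math.hypot(dlat, dlon)


north = flat_degree_offset(1, 0, 40.74)
east = flat_degree_offset(0, 1, 40.74)
print(north < east)  # True: the point to the north looks closer
print(MILES_PER_DEGREE * math.cos(math.radians(40.74)))  # roughly 52 miles per degree of longitude
```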

Related

How to find locations near a given location

I am trying to create a Bounding Box (or a circle) for the given latitude and longitude with some distance(or radius) using Python3.
I have gone through previous solutions to this problem, but I'm unsure how they work. There are variables like halfsideinKm and several degree-to-radian and radian-to-degree conversions, and I don't understand what those conversions are for or how they work.
Given lat and long finding binding box
Geocoding calculate bounding box
I have a MongoDB collection, Locations, which holds the lat and long.
My requirement: if I enter a lat and long, I want the list of places (from my MongoDB) that lie inside the bounding box region (within a distance of, say, 20 km).
Can anyone provide a solution to this problem or explain how that code works?
Can this be achieved using geopy? (It mentions great-circle distance calculation.)
Database values
{
"place_id":"151142295",
"osm_type":"relation",
"osm_id":"65606",
"lat":"51.5073219",
"lon":"-0.1276474",
"display_name":"London, Greater London, England, United Kingdom",
"class":"place",
"type":"city",
"importance":0.9754895765402
},
{
"place_id":"4566287",
"osm_type":"node",
"osm_id":"485248691",
"lat":"42.988097",
"lon":"-81.2460295",
"display_name":"London, Ontario, Canada",
"class":"place",
"type":"city",
"importance":0.6515723047601
}
(just a sample of how data is stored in my db)
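The degree/radian conversions in those answers come from treating the Earth as a sphere: a north-south distance d corresponds to d / R radians of latitude, and an east-west distance needs an extra division by cos(latitude), because circles of latitude shrink toward the poles. A minimal sketch of that idea, assuming a spherical Earth of radius 6371 km (the referenced answers may use a different radius, e.g. from WGS84) and a hypothetical `bounding_box` helper whose `half_side_km` plays the role of `halfsideinKm`:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; an assumption, WGS84-based code may differ slightly


def bounding_box(lat, lon, half_side_km):
    """Return (min_lat, min_lon, max_lat, max_lon) in degrees for a square of side
    2 * half_side_km centered on (lat, lon). Not valid near the poles or across
    the 180th meridian."""
    # Arc length / radius = angle in radians; convert that angle to degrees.
    dlat = math.degrees(half_side_km / EARTH_RADIUS_KM)
    # A circle of latitude has radius R * cos(lat), so one degree of longitude is shorter.
    dlon = math.degrees(half_side_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lon - dlon, lat + dlat, lon + dlon


def in_box(box, lat, lon):
    """True if (lat, lon) lies inside the box returned by bounding_box."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


# The sample stores coordinates as strings, so convert with float(doc["lat"]) etc. first.
box = bounding_box(51.5073219, -0.1276474, 20)
print(box)
print(in_box(box, 42.988097, -81.2460295))  # False: London, Ontario is far outside
```

The corners of the box are farther than 20 km from the center; to keep only points within a true 20 km circle, filter the candidates by great-circle distance afterwards, for example with `geopy.distance.great_circle((lat1, lon1), (lat2, lon2)).km`.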
The very first thing you must do is change how you store your data if you intend to use geospatial queries with MongoDB. You have the option of legacy coordinate pairs or GeoJSON format, but your current storage, with "lat" and "lon" in separate fields and stored as strings, will not work.
Here is a schema fix for your collection, written for the mongo shell because this should be a one-off operation. I'm recommending GeoJSON, as it is compatible with quite a few libraries, and distances returned for GeoJSON points are in meters rather than radians.
var bulk = db.collection.initializeUnorderedBulkOp(),
count = 0;
db.collection.find().forEach(function(doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": {
"location": {
"type": "Point",
"coordinates": [parseFloat(doc.lon),parseFloat(doc.lat)]
}
},
"$unset": { "lat": "", "lon": "" }
});
count++;
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.collection.initializeUnorderedBulkOp();
}
});
if ( count % 1000 !=0 )
bulk.execute();
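If you would rather run the migration from Python, the per-document transformation can be written as a plain function that builds the same $set/$unset update. A minimal sketch; `to_geojson_update` is a hypothetical name, and the commented PyMongo lines assume a `collection` object you already have:

```python
def to_geojson_update(doc):
    """Build the update that moves string "lat"/"lon" fields into a GeoJSON Point.
    GeoJSON coordinates are [longitude, latitude], in that order."""
    return {
        "$set": {
            "location": {
                "type": "Point",
                "coordinates": [float(doc["lon"]), float(doc["lat"])],
            }
        },
        "$unset": {"lat": "", "lon": ""},
    }


# With PyMongo (not imported here), the updates could be sent as one unordered bulk write;
# for very large collections, split `ops` into chunks as the shell version does:
#   from pymongo import UpdateOne
#   ops = [UpdateOne({"_id": d["_id"]}, to_geojson_update(d)) for d in collection.find()]
#   collection.bulk_write(ops, ordered=False)

sample = {"_id": 1, "lat": "51.5073219", "lon": "-0.1276474"}
print(to_geojson_update(sample)["$set"]["location"])
```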
Now that the data is fixed and compatible with an index, create the index. What makes sense here with GeoJSON data is a "2dsphere" index:
db.collection.createIndex({ "location": "2dsphere" })
Now you are ready to query. I'm sticking with the shell, as the Python syntax is nearly identical (the query is the same nested dict) and I don't know which library calls you use:
db.collection.find({
"location": {
"$nearSphere": {
"$geometry": {
"type": "Point",
"coordinates": [lon,lat]
},
"$maxDistance": distance
}
}
})
This query uses $nearSphere, which calculates distance properly based on the curvature of the Earth, ideal for real location data, and returns results nearest first. Your three variables are the longitude and latitude (in that order) in the coordinates array, and the distance under $maxDistance (in meters, for a GeoJSON point) that sets the radius within which you want to find things.
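In Python the query document is the same nested structure as a dict. A minimal sketch, assuming PyMongo and the `location` field and "2dsphere" index created above; `near_query` is a hypothetical helper. Because the point is GeoJSON, `$maxDistance` is in meters, so 20 km becomes 20000:

```python
def near_query(lon, lat, max_km):
    """Build a $nearSphere filter for a GeoJSON "location" field.
    max_km is converted to meters, the unit $maxDistance uses for GeoJSON points."""
    return {
        "location": {
            "$nearSphere": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_km * 1000,
            }
        }
    }


query = near_query(-0.1276474, 51.5073219, 20)
print(query["location"]["$nearSphere"]["$maxDistance"])  # 20000
# With PyMongo:
#   for place in db.collection.find(query):
#       print(place["display_name"])
```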
This is a very simple query procedure once your data is suitable and the required geospatial index is in place.
No need for messy calculations in your client, as the server does all the work.
See the MongoDB documentation on geospatial queries, $nearSphere and 2dsphere indexes to learn more.

Creating dictionary using the data of planets

I have a bunch of details about the planets in the solar system, and I am supposed to make a Python dictionary out of them. For each planet, I have its radius, distance from the sun, number of moons, whether an atmosphere exists, the names of its moons, whether it's a gas planet, and so on.
For example, here is the data for Mercury:
Mercury
Radius - 2,439.7 km
Distance from the sun - 58 million km
Moons - none
Atmosphere? True
Gas planet? False
How would I use all this data to create a dictionary?
So far I have:
radius = {} #radius of planets
radius['Mercury'] = 2439.7
radius['Venus'] = 6051.8
radius['Earth'] = 6371.0
radius['Mars'] = 3396.2
radius['Jupiter'] = 69911
radius['Saturn'] = 60268
radius['Uranus'] = 25559
radius['Neptune'] = 24764
distance = {} # distance from sun
distance['Mercury'] = 58000000
distance['Venus'] = 108000000
I was planning to continue this, creating a dictionary for each type of data I have, so that each type of data would have its own section.
However, I don't know if this is the right way to do it. Could somebody tell me if I am going in the right direction? If not, how would I fix it?
It's probably much easier to make a dictionary of planets, each member of which contains a dictionary of that planet's properties. You can also save yourself some effort and avoid repetition by using the dictionary literal syntax.
Doing it as suggested above looks something like this:
planets = {
'Mercury': {
'radius': 2439.7,
'distance': 58000000,
'moons': []
# etc...
},
'Venus': {
'radius': 6051.8,
'distance': 108000000,
'moons': []
# etc...
},
'Earth': {
'radius': 6371.0,
'distance': 150000000,
'moons': ['Luna']
# etc...
}
# etc...
}
It would probably be better to structure your data so it looks like this:
planets = {
"mercury": {
"radius": 2439.7,
"distance": 58000000,
# etc
},
"venus": {
"radius": 6051.8,
"distance": 108000000,
# etc
},
#etc
}
That way, we only need a single variable, and can automatically keep all the data related to a single planet in one place.
Then, if you want to obtain one property for all planets (for example, all the distances), you can construct another temporary dictionary using a dictionary comprehension:
distances = {planet: planets[planet]['distance'] for planet in planets}
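A short runnable sketch of this layout, using figures from the question and the answer above; a list comprehension works the same way for filtering on a property:

```python
planets = {
    "mercury": {"radius": 2439.7, "distance": 58000000, "moons": []},
    "venus": {"radius": 6051.8, "distance": 108000000, "moons": []},
    "earth": {"radius": 6371.0, "distance": 150000000, "moons": ["Luna"]},
}

# One property across all planets, via a dict comprehension.
distances = {name: props["distance"] for name, props in planets.items()}

# Filtering on a property, via a list comprehension.
with_moons = [name for name, props in planets.items() if props["moons"]]

print(distances["venus"])  # 108000000
print(with_moons)          # ['earth']
```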
I would design the dictionary so that the key is the planet name, and the values are another dictionary, the keys of which are the properties.
You would access it as follows:
mercury_radius = planets["mercury"]["radius"]
And declare it as follows:
planets = {
"mercury" : {
"radius" : 2439.7,
"distance_sun" : 58000000,
"moons" : 0,
"atmosphere" : True,
"gas_planet" : False
}
}

How can I use Mongodb Aggregation in this example?

I am currently using Python to build many of my results instead of MongoDB itself. I am trying to get my head around Aggregation, but I'm struggling a bit. Here is an example of what I am doing currently which perhaps could be better handled by MongoDB.
I have a collection of programs and a collection of episodes. Each program has a list of episodes (DBRefs) associated with it. (The episodes are stored in their own collection because both programs and episodes are quite complex and deep, so embedding is impractical). Each episode has a duration (float). If I want to find a program's average episode duration, I do this:
episodes = list(db.Episodes.find({'Program':DBRef('Programs',ObjectId(...))}))
durations = [e['Duration'] for e in episodes if e['Duration'] > 0]
avg_mins = int(sum(durations) / len(durations) / 60)
This is pretty slow when a program has over 1000 episodes. Is there a way I can do it in MongoDB?
Here is some sample data in Mongo shell format. There are three episodes belonging to the same program. How can I calculate the average episode duration for the program?
> db.Episodes.find({
'_Program':DBRef('Programs',ObjectId('4ec634fbf4c4005664000313'))},
{'_Program':1,'Duration':1}).limit(3)
{
"_id" : ObjectId("506c15cbf4c4005f9c40f830"),
"Duration" : 1643.856,
"_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313"))
}
{
"_id" : ObjectId("506c15d3f4c4005f9c40f8cf"),
"Duration" : 1598.088,
"_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313"))
}
{
"_id" : ObjectId("506c15caf4c4005f9c40f80e"),
"_Program" : DBRef("Programs", ObjectId("4ec634fbf4c4005664000313")),
"Duration" : 1667.04
}
I figured it out, and it is ridiculously fast compared to pulling it all into Python.
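from bson.dbref import DBRef
p = db.Programs.find_one({'Title':'...'})
pipe = [
{'$match':{'_Program':DBRef('Programs',p['_id']),'Duration':{'$gt':0}}},
{'$group':{'_id':'$_Program', 'AverageDuration':{'$avg':'$Duration'}}}
]
# PyMongo 3+ returns a cursor of result documents (older versions returned a dict with a 'result' key)
for doc in db.Episodes.aggregate(pipe):
    print(doc['AverageDuration'])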
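As a sanity check on what the pipeline computes, here is the same $match / $group logic in plain Python over the three sample episodes from the question (all belong to the same program, so the match on `_Program` is implied). It only reproduces the arithmetic; the point of the aggregation is that the server does this without shipping every episode to the client.

```python
durations = [1643.856, 1598.088, 1667.04]  # "Duration" values from the sample documents

# $match: keep positive durations; $group with $avg: arithmetic mean.
kept = [d for d in durations if d > 0]
average_seconds = sum(kept) / len(kept)
average_minutes = int(average_seconds / 60)

print(round(average_seconds, 3), average_minutes)
```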

Raster: How to get elevation at lat/long using python?

I also posted this question in the GIS section of SO, but as I'm not sure whether it is rather a 'pure' Python question, I'm asking it here as well.
I was wondering if anyone has experience getting elevation data from a raster without using ArcGIS, returning the information as a Python list or dict instead?
I get my XY data as a list of tuples.
I'd like to loop through the list or pass it to a function or class-method to get the corresponding elevation for the xy-pairs.
I did some research on the topic, and the GDAL API sounds promising. Can anyone advise me on how to go about it, pitfalls, sample code? Other options?
Thanks for your efforts, LarsVegas
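With GDAL, the key step is mapping each (x, y) pair to a pixel using the raster's geotransform, the six-number tuple returned by `dataset.GetGeoTransform()`. A minimal sketch of that mapping in plain Python, assuming a north-up raster (no rotation terms) whose band has already been read into a 2-D array-like `grid` (e.g. via `band.ReadAsArray()`), and points already in the raster's coordinate system (reproject lat/long first if the raster is projected). The helper name and the tiny synthetic raster are hypothetical.

```python
import math


def elevations_at(points, geotransform, grid):
    """Look up grid values for a list of (x, y) tuples.

    geotransform follows GDAL's layout:
    (origin_x, pixel_width, row_rotation, origin_y, col_rotation, pixel_height),
    where pixel_height is negative for north-up rasters.
    """
    origin_x, pixel_w, rot_x, origin_y, rot_y, pixel_h = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    result = {}
    for x, y in points:
        col = math.floor((x - origin_x) / pixel_w)
        row = math.floor((y - origin_y) / pixel_h)
        if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
            result[(x, y)] = grid[row][col]
        else:
            result[(x, y)] = None  # the point falls outside the raster
    return result


GEOTRANSFORM = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)  # hypothetical 3x3 raster, 10-unit pixels
GRID = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(elevations_at([(115, 485), (125, 475), (95, 485)], GEOTRANSFORM, GRID))
```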
I recommend checking out the Google Elevation API
It's very straightforward to use:
http://maps.googleapis.com/maps/api/elevation/json?locations=39.7391536,-104.9847034&sensor=true_or_false
{
"results" : [
{
"elevation" : 1608.637939453125,
"location" : {
"lat" : 39.73915360,
"lng" : -104.98470340
},
"resolution" : 4.771975994110107
}
],
"status" : "OK"
}
Note that the free version is limited to 2500 requests per day.
We used this code to get elevation for a given latitude/longitude (NOTE: we only asked to print the elevation, and the rounded lat and long values).
import urllib.request
import json
lati = input("Enter the latitude:")
lngi = input("Enter the longitude:")
# url_params completes the base url with the given latitude and longitude values
ELEVATION_BASE_URL = 'http://maps.googleapis.com/maps/api/elevation/json?'
URL_PARAMS = "locations=%s,%s&sensor=%s" % (lati, lngi, "false")
url=ELEVATION_BASE_URL + URL_PARAMS
with urllib.request.urlopen(url) as f:
response = json.loads(f.read().decode())
status = response["status"]
result = response["results"][0]
print(float(result["elevation"]))
print(float(result["location"]["lat"]))
print(float(result["location"]["lng"]))
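The Elevation API also accepts several points in one request, separated by `|` in the `locations` parameter, which suits a list of (lat, lng) tuples. A minimal sketch that only builds the URL; current versions of the API require a `key` parameter, `YOUR_API_KEY` is a placeholder, and `elevation_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

ELEVATION_BASE_URL = "https://maps.googleapis.com/maps/api/elevation/json"


def elevation_url(points, api_key):
    """Build one request URL for many (lat, lng) tuples."""
    locations = "|".join("%s,%s" % (lat, lng) for lat, lng in points)
    return ELEVATION_BASE_URL + "?" + urlencode({"locations": locations, "key": api_key})


url = elevation_url([(39.7391536, -104.9847034), (50.2111, 18.1233)], "YOUR_API_KEY")
print(url)
```

The URL can then be fetched with `urllib.request.urlopen` as in the code above, and the `results` array holds one entry per requested location.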
Have a look at altimeter, a wrapper for the Google Elevation API.
Here is another nice API that I've built: https://algorithmia.com/algorithms/Gaploid/Elevation
import Algorithmia
input_data = {
"lat": "50.2111",
"lon": "18.1233"
}
client = Algorithmia.client('YOUR_API_KEY')
algo = client.algo('Gaploid/Elevation/0.3.0')
print(algo.pipe(input_data).result)

When I do Geospatial querying in MongoDB, how do I also return the distance from my desired point?

for post in db.datasets.find({"loc":{"$near":[50,50]}}).limit(10):
How do I get the distance between the document and "50,50"?
Using the "geoNear" command returns a "dis" value as part of each result. With spherical distance, "dis" is an angle in radians, so multiplying it by the Earth's radius in the unit of your choice gives the distance between that result and your original location.
For example, where "places" is the name of the collection and the lng/lat of your original point is 50, 50:
db.command({"geoNear": "places", "near": [50, 50], "spherical": True})
Will return a result in the format:
[
{
"dis" : 0.3886630122897946,
"obj" : {
"_id" : ObjectId("4d9123026ccc7e2cf22925c4"),
"pos" : {
"lon" : -10,
"lat" : -20
}
}
}
]
Multiplying each result's "dis" value by 6371 will give you the distance in km, while 3959 gives miles.
There are more details and examples in the MongoDB docs.
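The geoNear command was deprecated in MongoDB 4.0 and removed in 4.2; on newer servers the $geoNear aggregation stage writes the distance into a field you name with `distanceField`. A minimal sketch, assuming legacy [lon, lat] pairs in `loc` with a geospatial index; with `spherical: True` the raw distance for legacy pairs is in radians, so a `distanceMultiplier` of 6371 turns it into kilometers. `nearest_with_distance` is a hypothetical helper.

```python
def nearest_with_distance(lon, lat, limit=10, earth_radius=6371):
    """Aggregation pipeline returning the nearest documents with a computed "dist" field.
    $geoNear must be the first stage of the pipeline."""
    return [
        {
            "$geoNear": {
                "near": [lon, lat],                  # legacy coordinate pair, longitude first
                "distanceField": "dist",             # output field that receives the distance
                "spherical": True,                   # great-circle distance, radians for legacy pairs
                "distanceMultiplier": earth_radius,  # radians * 6371 = kilometers
            }
        },
        {"$limit": limit},
    ]


pipeline = nearest_with_distance(50, 50)
# With PyMongo:
#   for post in db.datasets.aggregate(pipeline):
#       print(post["dist"])
```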
