I have been trying to find the best option on Google App Engine to search a Model by GPS coordinates. There seem to be a few decent options out there such as GeoModel (which is out of date), but the new Search API seems to be the most current and gives a good bit of functionality. This, of course, comes with the possibility of it getting expensive beyond 1,000 searches once it leaves the experimental stage.
I am having trouble going through the docs and creating a coherent full example to be able to use the Search API to search by location and I want to know if anyone has examples they are willing to share or create to help make this process a little more straightforward. I understand how to create the actual geosearch query but I am unclear as to how to glue that together with the construction and indexing of the document.
I don't know of any examples that show geo specifically, but the process is very much the same. You can index documents containing GeoFields, each of which has a latitude and a longitude. Then, when you construct your query, you can:
limit the results by distance from a fixed point by using a query like distance(my_geo_field, geopoint(41, 65)) < 100
sort by distance from a point with a sort expression like distance(my_geo_field, geopoint(55, -20))
calculate expressions based on the distance between points by using a FieldExpression like distance(my_geo_field, geopoint(10, -30))
They work pretty much like any other field, except you can use the distance function in the query and expression languages. If you have any specific questions, feel free to ask here.
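To tie those pieces to the construction and indexing of the document, here is a rough sketch using the Python Search API (the index name, doc_id and field names are placeholders, and distance() works in meters):

from google.appengine.api import search

# Build and index a document that carries a GeoField
index = search.Index(name='places')
doc = search.Document(
    doc_id='store-123',
    fields=[
        search.TextField(name='name', value='Example store'),
        search.GeoField(name='my_geo_field', value=search.GeoPoint(41, 65)),
    ])
index.put(doc)

# Query: everything within 100 m of (41, 65), sorted nearest-first
expr = 'distance(my_geo_field, geopoint(41, 65))'
query = search.Query(
    query_string='distance(my_geo_field, geopoint(41, 65)) < 100',
    options=search.QueryOptions(
        sort_options=search.SortOptions(
            expressions=[search.SortExpression(
                expression=expr,
                direction=search.SortExpression.ASCENDING,
                default_value=0)])))
for result in index.search(query):
    print(result.doc_id)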
I’m currently working on research for a faceted search backend implementation and want to hear your advice about picking the right technology and approach to tackle this before I perform POCs.
The input
n*m matrix, where n is ~50m records, and m is ~5k columns. About half of the columns are boolean.
Size: ~10 GB
Requirements
Each of the ~5k columns should be considered an optional attribute for the faceted search. Only the relevant attributes should appear.
The types vary (Boolean, String, etc.).
Sub-second response for each search (Multiple filters applied)
Each filter should provide the value counts (e.g. 100 valid options, based on the currently applied filters).
The solution should serve ~500 concurrent users.
Solution alternatives (Backend solution to serve the UI)
Implement an in-memory structure. This might require custom optimizations and indices to be implemented in order to achieve sub-second response.
Work with a DB/search engine which might provide the required latency. Among the solutions I have considered are ClickHouse and Elasticsearch/OpenSearch (see the facet-count sketch below).
I would love to hear your thoughts.
Cost-wise, the desired solution should be as simple as possible (e.g. using an out-of-the-box solution rather than implementing a complex custom structure).
*The mentioned matrix is only the input; each solution will probably require indexing it or reconstructing it into the right data structure.
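For context on the Elasticsearch/OpenSearch route: facet counts are usually expressed as terms aggregations on top of a filtered query. A rough Python sketch, just to show the shape of such a request (the index name, field names and local endpoint are made-up placeholders):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Filter on one boolean attribute and return value counts for another,
# which is the core of a faceted search. Field names are hypothetical.
body = {
    "size": 0,  # we only need the counts, not the documents
    "query": {"bool": {"filter": [{"term": {"attr_17": True}}]}},
    "aggs": {"attr_color_counts": {"terms": {"field": "attr_color", "size": 100}}},
}
resp = es.search(index="records", body=body)
for bucket in resp["aggregations"]["attr_color_counts"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])

Whether this meets the sub-second and concurrency targets would still need a POC, which is exactly what the question is about.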
I am starting a web app in Django which must provide one simple task: get all records from the DB which are close enough to another record.
For example: I am at lat/lng (50, 10), and I need to get all records with a lat/lng closer than 5 km to me.
I found GeoDjango, but it contains a lot of other dependencies and libraries like GEOS, PostGIS, and other stuff which I don't really need. I only need this one range functionality.
So should I use GeoDjango, or just write my own range calculation query?
Most definitely do not write your own. As you get more familiar with geographic data you will realize that this particular calculation isn't at all simple; see for example this question for a detailed discussion. However, most of the solutions (answers) given in that question only produce approximate results, partly due to the fact that the earth is not a perfect sphere.
On the other hand, if you use the geospatial extensions for MySQL (5.7 onwards) or PostgreSQL, you can make use of the ST_DWithin function.
ST_DWithin — Returns true if the geometries are within the specified distance of one another. For geometry, units are in those of the spatial reference; for geography, units are in meters and the measurement defaults to use_spheroid=true (measure around the spheroid). For a faster check, use use_spheroid=false to measure along the sphere.
ST_DWithin makes use of spatial indexes, which home-made solutions are unable to do. When GeoDjango is enabled, ST_DWithin becomes available as a filter on Django querysets in the form of the dwithin lookup.
Last but not least, if you write your own code, you will have to write a lot of code to test it too, whereas dwithin is thoroughly tested.
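A minimal sketch of how that looks, assuming a PostGIS backend and a model with a geography PointField (the model, field and variable names are illustrative):

from django.contrib.gis.db import models
from django.contrib.gis.geos import Point
from django.contrib.gis.measure import D

class Place(models.Model):
    name = models.CharField(max_length=100)
    location = models.PointField(geography=True)  # lon/lat, stored as geography

# Everything within 5 km of lat/lng (50, 10); note that Point takes (lon, lat)
me = Point(10, 50, srid=4326)
nearby = Place.objects.filter(location__dwithin=(me, D(km=5)))

With a geography field on PostGIS, dwithin accepts Distance objects like D(km=5); on a plain geometry field the distance would have to be given in the units of the field (degrees for WGS84).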
I'm not really interested in generating tiles if I can help it. Instead, what I'm looking for is a means of getting "what is near me" kind of information, specifically bodies of water and green space, or even civil services.
If I had the map tiles, I suppose I could parse them for the colours I want, but I'm thinking that there must be a better/smarter way. Isn't it possible to get a list of objects near lat,lng that belong to categories A and B?
I'm a competent Python programmer, but am completely new to OSM. I understand that I can download a Very Large XML file and have all the data, but accessing it, especially for this sort of purpose is totally foreign to me.
I should mention, however, that I have at my disposal complete access to a PostgreSQL database with PostGIS in a GeoDjango setup.
Tiles are not necessary for this, generating tiles is just one possible way of using OSM data.
Do you need an online or offline solution? For an online solution you don't even need a local copy of the data; you can directly fetch the data around a specific position. Rather than using the official API, which is mainly for editing and not for bulk querying, just use the Overpass API, which is much faster and features a complex query language.
Here is an example Overpass API query for querying all shops and parking places inside the given bounding box 50.6,7.0,50.65,7.05:
(
  node
    ["shop"]
    (50.6,7.0,50.65,7.05);
  node
    ["amenity"="parking"]
    (50.6,7.0,50.65,7.05);
  way
    ["shop"]
    (50.6,7.0,50.65,7.05);
  way
    ["amenity"="parking"]
    (50.6,7.0,50.65,7.05);
  relation
    ["shop"]
    (50.6,7.0,50.65,7.05);
  relation
    ["amenity"="parking"]
    (50.6,7.0,50.65,7.05);
);
(
  ._;
  >;
);
out;
(The result can be downloaded as either XML or JSON. You can also visualize it using overpass turbo)
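Since the question mentions Python, here is a minimal sketch of running such a query programmatically with the requests library against the public Overpass endpoint (the [out:json] prefix just asks for JSON output; please be considerate with query volume on the public server):

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
query = """
[out:json];
(
  node["shop"](50.6,7.0,50.65,7.05);
  node["amenity"="parking"](50.6,7.0,50.65,7.05);
  way["shop"](50.6,7.0,50.65,7.05);
  way["amenity"="parking"](50.6,7.0,50.65,7.05);
  relation["shop"](50.6,7.0,50.65,7.05);
  relation["amenity"="parking"](50.6,7.0,50.65,7.05);
);
(._; >;);
out;
"""
response = requests.post(OVERPASS_URL, data={"data": query})
elements = response.json()["elements"]
print(len(elements), "elements returned")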
In order to understand the query you have to get familiar with OSM's basic elements (nodes, ways and relations) as well as the tagging system and the most common tags.
If you need an offline solution then you had better set up a local database. For instructions you can read the serving-tiles howto on switch2osm and just skip the Apache/mod_tile/Mapnik steps. Importing an extract instead of the whole planet will often suffice. Live-parsing an XML file instead will be very slow unless you have a very small area, say a city, and have done some filtering beforehand.
There is a very nice package, OSMnx, by Geoff Boeing: https://geoffboeing.com/tag/osmnx/
You can easily get all the amenities near you from OSM.
import osmnx as ox
import geopandas as gpd  # geometries_from_place returns a GeoDataFrame

# Placeholder: any place name that geocodes to a polygon works here
place_name = "Berkeley, California, USA"
tags = {'amenity': 'charging_station'}
gdf = ox.geometries_from_place(place_name, tags)  # features_from_place in OSMnx >= 2.0
For my app, I need to determine the nearest points to some other point and I am looking for a simple but relatively fast (in terms of performance) solution. I was thinking about using PostGIS and GeoDjango, but I think my app is not really that "geographic" (I still don't really know what that means, though). The geographic part (around 5 percent of the whole) is that I need to keep coordinates of objects (people and places) and then there is this task to find the nearest points. To put it simply, PostGIS and GeoDjango seem to be overkill here.
I was also thinking of django-haystack with Solr or Elasticsearch, because I am going to need strong text-search capabilities and these engines also have these "geographic" features. But I'm not sure about it either, as I am afraid of core-DB <-> search-engine-DB synchronisation and of the hardware requirements for these engines. At the moment I am more inclined to use PostgreSQL trigrams and some custom way to solve that "find nearest points" problem. Is there any good one?
To find points or bounding boxes that are near each other, consider using the Rtree Python package. It uses a spatial index technique similar to PostGIS, except that it is not database software and can be used directly in your application code. In my tests it was faster than PostGIS at finding nearby points among millions of objects.
See the examples in the tutorial to get a good feel for finding the nearest objects.
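A small sketch of the idea (the coordinates and ids are made up); note that Rtree works with planar coordinates, so for raw lat/lng the nearest-neighbour results are approximate:

from rtree import index

idx = index.Index()
points = {1: (13.40, 52.52), 2: (2.35, 48.85), 3: (-0.13, 51.51)}
for pid, (x, y) in points.items():
    idx.insert(pid, (x, y, x, y))  # a point is just a degenerate bounding box

# ids of the 2 indexed points nearest to (1.0, 50.0)
nearest_ids = list(idx.nearest((1.0, 50.0, 1.0, 50.0), 2))
print(nearest_ids)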
You're probably right that PostGIS/GeoDjango is overkill, but making your own Django app would not be too much trouble for your simple task. Django offers a lot in terms of templating, etc., and with the built-in admin it makes it pretty easy to enter single records. And GeoDjango is part of contrib, so you can always use it later if your project needs it.
Check out Shapely. It looks like the geometry object's project() method may be what you're looking for.
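For reference, a tiny sketch of what project() does (it is linear referencing: the distance along a geometry to the point on it closest to another point), which fits best when your data lies along lines:

from shapely.geometry import Point, LineString

line = LineString([(0, 0), (10, 0)])
p = Point(3, 4)
d = line.project(p)            # 3.0 -- distance along the line to the closest point
closest = line.interpolate(d)  # POINT (3 0) -- the nearest point on the line
print(d, closest.wkt)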
I am a newbie in Python and have been trying my hand at different problems which introduce me to different modules and functionalities (I find it a good way of learning).
I have googled around a lot but haven't found anything close to a solution to the problem.
I have a large data set of Facebook posts from various groups that use Facebook as a medium to broadcast information.
I want to group these posts that are content-wise the same.
For example, one of the posts is "xyz.com is selling free domains. Go register at xyz.com"
and another is "Everyone needs to register again at xyz.com. Due to server failure, all data has been lost."
These are similar, as they both ask the reader to go to the group's website and register.
P.S: Just a clarification, if any one of the links would have been abc.com, they wouldn't have been similar.
Priority is to the source and then to the action (action being registering here).
Is there a simple way to do it in python? (a module maybe?)
I know it requires some sort of clustering algorithm (correct me if I am wrong); my question is, can Python make this job easier for me somehow? Some module or anything?
Any help is much appreciated!
Assuming you have a function called geturls that takes a string and returns a list of urls contained within, I would do it like this:
from collections import defaultdict

groups = defaultdict(list)  # maps each url to the list of posts mentioning it
for post in facebook_posts:
    for url in geturls(post):
        groups[url].append(post)
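geturls itself is left to you; a possible sketch, assuming a "url" here just means anything that looks like a domain or link in the post text (the pattern is deliberately simple):

import re

URL_RE = re.compile(r'(?:https?://)?(?:www\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)', re.I)

def geturls(text):
    # Return the lowercased domains found in the text, e.g. ['xyz.com']
    return [m.lower() for m in URL_RE.findall(text)]

print(geturls("Everyone needs to register again at xyz.com."))  # ['xyz.com']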
That greatly depends on your definition of being "content-wise same". A straightforward approach is to use a so-called Term Frequency - Inverse Document Frequency (TF-IDF) model.
Simply put, make a long list of all words in all your posts, filter out stop words (articles, determiners, etc.), and for each document (= post) count how often each term occurs, multiplying that by the importance of the term (its inverse document frequency, calculated as the log of the total number of documents divided by the number of documents in which the term occurs). This way, words which are very rare will be more important than common words.
You end up with a huge table in which every document (still, we're talking about group posts here) is represented by a (very sparse) vector of terms. Now you have a metric for comparing documents. As your documents are very short, only a few terms will be significantly high, so similar documents might be the ones where the same term achieved the highest score (i.e. the highest component of the document vectors is the same), or maybe the Euclidean distance between the three highest values is below some parameter. That sounds very complicated, but (of course) there's a module for that.
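One such module is scikit-learn; a rough sketch of the idea (the example posts are the ones from the question, and plain cosine similarity is used here as the comparison metric):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "xyz.com is selling free domains. Go register at xyz.com",
    "Everyone needs to register again at xyz.com. Due to server failure, all data has been lost.",
    "Completely unrelated post about something else.",
]

# Sparse TF-IDF term/document matrix, as described above
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(posts)

# Pairwise cosine similarity between posts; higher means "more alike"
similarity = cosine_similarity(tfidf)
print(similarity.round(2))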