What I assumed would be a fairly trivial query turned out to be difficult for me.
I have a collection of event data for visitors to a website, and I am trying to find the landing page for each user. Unique users have the field anonymousId, and the user's current page path is in the nested field context.page.path.
I can find the first date on which the user visits, but I am unsure how to extract the context.page.path that goes along with that date in the same query. I used the $first operator on the page path as well, but I'm fairly certain this is incorrect...
Here is the code I'm using to create the cursor in Python (pymongo):
cursor = db.events.aggregate([
    {
        '$group': {
            '_id': '$anonymousId',
            'date': {'$first': '$timestamp'},
            'page': {'$first': '$context.page.path'}
        }
    }
])
Edit:
Here is the document structure for this collection (redacted)...
{
    "anonymousId": "...",
    "timestamp": "2016-04-05T13:05:06.076Z",
    "context": {
        "page": {
            "path": "...",
            "referrer": "...",
            "title": "...",
            "url": "..."
        }
    },
    ... more fields here but not relevant to this question
}
I think I may have figured it out... but I'm still not 100% positive.
cursor = db.events.aggregate([
    {
        '$group': {
            '_id': '$anonymousId',
            'date': {'$first': '$timestamp'},
            'page': {'$push': '$context.page.path'}
        }
    },
    {
        '$unwind': '$page'
    }
])
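For what it's worth, $first only becomes deterministic after a $sort stage; a minimal sketch of that approach, assuming the same `db.events` collection from the question:

```python
# Sketch: sort by user and time first so $first picks each user's
# earliest event; without the sort, $first returns an arbitrary document.
pipeline = [
    {'$sort': {'anonymousId': 1, 'timestamp': 1}},
    {'$group': {
        '_id': '$anonymousId',
        'date': {'$first': '$timestamp'},                 # first visit time
        'page': {'$first': '$context.page.path'},         # landing page path
    }},
]
# cursor = db.events.aggregate(pipeline)  # db: a pymongo database handle
```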
I'm curious about the best approach to count the instances of a particular field across all documents in a given Elasticsearch index.
For example, if I've got the following documents in index goober:
{
    '_id': 'foo',
    'field1': 'a value',
    'field2': 'a value'
},
{
    '_id': 'bar',
    'field1': 'a value',
    'field2': 'a value'
},
{
    '_id': 'baz',
    'field1': 'a value',
    'field3': 'a value'
}
I'd like to know something like the following:
{
    'index': 'goober',
    'field_counts': {
        'field1': 3,
        'field2': 2,
        'field3': 1
    }
}
Is this doable with a single query, or does it need multiple? For what it's worth, I'm using the Python elasticsearch and elasticsearch-dsl clients.
I've successfully issued a GET request to /goober and retrieved the mappings, and am learning how to submit requests for aggregations for each field, but I'm interested in learning how many times a particular field appears across all documents.
Coming from using Solr, still getting my bearings with ES. Thanks in advance for any suggestions.
The query below will return the count of docs that contain "field2":
POST /INDEX/_search
{
    "size": 0,
    "query": {
        "bool": {
            "filter": {
                "exists": {
                    "field": "field2"
                }
            }
        }
    }
}
And here is an example using multiple aggregations (each one returns a bucket with a doc count), based on exists filters:
POST /INDEX/_search
{
    "size": 0,
    "aggs": {
        "field_has1": {
            "filter": {
                "exists": {
                    "field": "field1"
                }
            }
        },
        "field_has2": {
            "filter": {
                "exists": {
                    "field": "field2"
                }
            }
        }
    }
}
The behavior within each agg on the second example will mimic the behavior of the first query. In many cases, you can take a regular search query and nest those lookups within aggregate buckets.
Quick time-saver based on existing answer:
import requests

interesting_fields = ['field1', 'field2']
body = {
    'size': 0,
    'aggs': {
        f'has_{field_name}': {
            'filter': {
                'exists': {
                    'field': f'export.{field_name}'
                }
            }
        }
        for field_name in interesting_fields
    },
}
print(requests.post('http://localhost:9200/INDEX/_search', json=body).json())
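Since the question mentions the official Python client, the same aggregation body can be sent through it instead of raw requests; a sketch, where the client instance and the `goober` index name are taken from the question:

```python
# Build one exists-filter aggregation per field of interest.
interesting_fields = ['field1', 'field2']
body = {
    'size': 0,
    'aggs': {
        f'has_{f}': {'filter': {'exists': {'field': f}}}
        for f in interesting_fields
    },
}
# from elasticsearch import Elasticsearch
# es = Elasticsearch(['http://localhost:9200'])
# resp = es.search(index='goober', body=body)
# counts = {name: agg['doc_count'] for name, agg in resp['aggregations'].items()}
```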
I have created an Elasticsearch index, and one of the nested fields has the following mapping.
"groups": {
    "type": "nested",
    "properties": {
        "name": {
            "type": "text"
        },
        "value": {
            "type": "text"
        }
    }
}
As for the ES version, it's 5.0, and I am using the official Python client elasticsearch-py on the client side. I want to query this nested field based on its value.
Let's say there is another field called name, which is a text field. I want to find all names starting with A that fall under a specified group.
Some sample data,
Groups - HR(name=HR, value=hr), Marketing(name=Marketing, value=marketing)
Names - Andrew, Alpha, Barry, John
Andrew and Alpha belong to group HR.
Based on this, I tried the following query:
{
    'query': {
        'bool': {
            'must': [{
                'match_phrase_prefix': {
                    'title': 'A'
                }
            }]
        },
        'nested': {
            'path': 'groups',
            'query': {
                'bool': {
                    'must': [{
                        'match': {
                            'groups.value': 'hr'
                        }
                    }]
                }
            }
        }
    }
}
I referred to the ES docs for this query, but it does not return anything. It would be great if someone could point out what is wrong with this query or with the mapping itself.
You're almost there, you simply need to move the nested query inside the bool/must query:
{
    'query': {
        'bool': {
            'must': [
                {
                    'match_phrase_prefix': {
                        'title': 'A'
                    }
                },
                {
                    'nested': {
                        'path': 'groups',
                        'query': {
                            'bool': {
                                'must': [{
                                    'match': {
                                        'groups.value': 'hr'
                                    }
                                }]
                            }
                        }
                    }
                }
            ]
        }
    }
}
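Since the question uses elasticsearch-py, the corrected query can also be built as a Python dict and passed to the client; a sketch, where the index name is an assumption on my part:

```python
# Corrected query as a Python dict: the nested clause sits inside bool/must,
# alongside the match_phrase_prefix clause.
body = {
    'query': {
        'bool': {
            'must': [
                {'match_phrase_prefix': {'title': 'A'}},
                {'nested': {
                    'path': 'groups',
                    'query': {'bool': {'must': [
                        {'match': {'groups.value': 'hr'}}
                    ]}},
                }},
            ]
        }
    }
}
# resp = es.search(index='my_index', body=body)  # es: elasticsearch-py client
```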
I'm using Elasticsearch with Python. I can't find a way to make search insensitive to accents.
For example:
I have two words, "Camión" and "Camion".
When a user searches for "camion", I'd like both results to show up.
Creating index:
es = Elasticsearch([{u'host': u'127.0.0.1', u'port': u'9200'}])
es.indices.create(index='name', ignore=400)
es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }
)
Search:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': '9200'}])
resul = es.search(
    index="name",
    body={
        "query": {
            "query_string": {
                "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                "analyze_wildcard": False
            }
        },
        "size": 9999,
    }
)
print(resul)
I've searched on Google, Stack Overflow and elastic.co, but I didn't find anything that works.
You need to change the mapping of those fields you have in the query. Changing the mapping requires re-indexing so that the fields will be analyzed differently and the query will work.
Basically, you need something like the example below. The field called text is just an example; you need to apply the same settings to the other fields as well. Note that I used fields so that the root field keeps the original text analyzed by default, while text.folded strips the accented characters and makes your query work. I have also changed the query a bit so that it matches both versions of that field (camion will match, but so will camión).
PUT /my_index
{
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    "mappings": {
        "test": {
            "properties": {
                "text": {
                    "type": "string",
                    "fields": {
                        "folded": {
                            "type": "string",
                            "analyzer": "folding"
                        }
                    }
                }
            }
        }
    }
}
And the query:
"query": {
    "query_string": {
        "query": "\\*.folded:camion"
    }
}
Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
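Put together with elasticsearch-py, the index creation and folded query might look like the sketch below; the index name, doc type, and field are taken from the question, and documents would need reindexing after the mapping change:

```python
# Sketch: create an index whose "title" field has an accent-folding subfield,
# then query the folded subfield so "camion" matches "Camión".
settings = {
    'settings': {'analysis': {'analyzer': {'folding': {
        'tokenizer': 'standard',
        'filter': ['lowercase', 'asciifolding'],   # strip case and accents
    }}}},
    'mappings': {'producto': {'properties': {'title': {
        'type': 'string',
        'fields': {'folded': {'type': 'string', 'analyzer': 'folding'}},
    }}}},
}
# es.indices.create(index='name', body=settings)  # then reindex documents
# resul = es.search(index='name', body={
#     'query': {'query_string': {'query': 'title.folded:camion'}}})
```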
I am using pymongo MongoClient to do multiple fields distinct count.
I found a similar example here: Link
But it doesn't work for me.
For example, by given:
import random

data = [{"name": random.choice(all_names),
         "value": random.randint(1, 1000)} for i in range(1000)]
collection.insert(data)
I want to count how many distinct name/value combinations there are. So I followed the link above and wrote this just as a test (I know this solution is not what I want; I'm just following the pattern from the link and trying to understand how it works, and at least this code returns something):
collection.aggregate([
    {
        "$group": {
            "_id": {
                "name": "$name",
                "value": "$value",
            }
        }
    },
    {
        "$group": {
            "_id": {
                "name": "$_id.name",
            },
            "count": {"$sum": 1},
        },
    }
])
But the console gives me this:
on namespace test.$cmd failed: exception: A pipeline stage
specification object must contain exactly one field.
So, what is the right code to do this? Thank you for all your help.
Finally I found a solution: Group by Null
res = col.aggregate([
    {
        "$group": {
            "_id": {
                "name": "$name",
                "value": "$value",
            },
        }
    },
    {
        "$group": {"_id": None, "count": {"$sum": 1}}
    },
])
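On MongoDB 3.4+, a `$count` stage can replace the group-by-null stage; a small variation on the same approach, assuming the same `col` collection:

```python
# Variation: $count (MongoDB 3.4+) replaces the {"_id": None} group stage.
pipeline = [
    {'$group': {'_id': {'name': '$name', 'value': '$value'}}},
    {'$count': 'count'},  # emits a single doc shaped like {'count': <n>}
]
# res = col.aggregate(pipeline)  # col: a pymongo collection
```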
I am using Python to generate a MongoDB collection, and I need to find some specific values in the database. A document looks like this:
{
    "_id": ObjectId("215454541245"),
    "category": "food",
    "venues": {"Thai Restaurant": 251, "KFC": 124, "Chinese Restaurant": 21, .....}
}
I want to query this database and find all venues which have a value smaller than 200, so in my example "KFC" and "Chinese Restaurant" would be returned by this query.
Does anyone know how to do that?
If you can change your schema it would be much easier to issue queries against your collection. As it is, having dynamic values as your keys is considered a bad design pattern with MongoDB as they are extremely difficult to query.
A recommended approach would be to follow an embedded model like this:
{
    "_id": ObjectId("553799187174b8c402151d06"),
    "category": "food",
    "venues": [
        {
            "name": "Thai Restaurant",
            "value": 251
        },
        {
            "name": "KFC",
            "value": 124
        },
        {
            "name": "Chinese Restaurant",
            "value": 21
        }
    ]
}
With this structure you could then issue a query to find venues with a value smaller than 200. Note that the $elemMatch projection returns only the first matching array element:
db.collection.findOne(
    { "venues.value": { "$lt": 200 } },
    {
        "venues": { "$elemMatch": { "value": { "$lt": 200 } } },
        "_id": 0
    }
);
This will return the result:
/* 0 */
{
    "venues" : [
        {
            "name" : "KFC",
            "value" : 124
        }
    ]
}
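If you need every matching venue rather than just the first, an aggregation with $filter (MongoDB 3.2+) can do it; a sketch in pymongo terms against the restructured schema above:

```python
# Sketch: $filter keeps all array elements with value < 200, unlike the
# $elemMatch projection, which returns only the first matching element.
pipeline = [
    {'$match': {'venues.value': {'$lt': 200}}},    # only docs with a match
    {'$project': {
        '_id': 0,
        'venues': {'$filter': {
            'input': '$venues',
            'as': 'v',
            'cond': {'$lt': ['$$v.value', 200]},    # per-element predicate
        }},
    }},
]
# db.collection.aggregate(pipeline)  # would return both KFC and Chinese Restaurant
```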