Searching letter by letter in elastic search

I am using Elasticsearch with Python as the client. I want to query a list of companies. Say the Company field values are:
Gokl
Normn
Nerth
Scenario 1 (using elasticsearch-dsl for Python):
s = Search(using=client, index="index-test") \
    .query("match", Company="N")
When I put N in the match query, I don't get Normn or Nerth. I think this is probably because the field is tokenized into whole words.
Scenario 2 (using elasticsearch-dsl for Python):
s = Search(using=client, index="index-test") \
    .query("match", Company="Normn")
When I enter Normn, I get the expected result. So how can I make the search return matches when I enter just the letter N, as in Scenario 1?

I think you are looking for a prefix search. I don't know the Python syntax, but the direct query would look like this:
GET index-test/_search
{
  "query": {
    "prefix": {
      "Company": {
        "value": "N"
      }
    }
  }
}
See here for more info.
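Since the answer leaves the Python side open, here is a minimal sketch of the same prefix query built as a plain Python dict. The helper name `build_prefix_query` is hypothetical, and the field name `Company` is taken from the question. Sending the body to a cluster is shown only as a comment, because it needs a running Elasticsearch instance.

```python
import json


def build_prefix_query(field, prefix):
    """Return a search body matching documents whose `field` has a term starting with `prefix`."""
    return {"query": {"prefix": {field: {"value": prefix}}}}


body = build_prefix_query("Company", "N")
print(json.dumps(body, indent=2))

# With a configured client, the body could then be sent, for example, as:
# client.search(index="index-test", body=body)
```

If Company is an analyzed text field, the indexed terms are usually lowercased, so a lowercase prefix such as "n" may be needed. That depends on the mapping, which the question does not show.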

If I understand correctly, you need to query companies starting with a specific letter.
In this case you can use this query:
{
  "query": {
    "regexp": {
      "Company": "n.*"
    }
  }
}
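To see what a pattern like n.* selects, the following sketch approximates it locally with Python's re module. It is only an illustration: Elasticsearch uses Lucene's regular expression syntax rather than Python's, and the lowercasing step assumes the field is analyzed with a lowercasing analyzer. The helper names are hypothetical.

```python
import re

companies = ["Gokl", "Normn", "Nerth"]


def starts_with_pattern(letter):
    # Escape the letter so that characters such as "." are treated literally.
    return re.escape(letter.lower()) + ".*"


def local_matches(values, letter):
    pattern = re.compile(starts_with_pattern(letter))
    # fullmatch mimics a regexp that has to match the whole term.
    return [v for v in values if pattern.fullmatch(v.lower())]


matches = local_matches(companies, "N")
print(matches)  # ['Normn', 'Nerth']
```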

Please read about the query types here.
For this case, you can use the code below:
s = Search(using=client, index="index-test") \
    .query("match_phrase_prefix", Company="N")
You can also use a multi_match query over Company and another field, like this:
s = Search(using=client, index="index-test") \
    .query("multi_match", query="N", fields=['Company', 'Another_field'], type='phrase_prefix')
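For reference, the two elasticsearch-dsl calls in this answer correspond roughly to the request bodies below, written as plain Python dicts. This is a sketch: the helper names are hypothetical, and Another_field is the placeholder field name from the answer.

```python
def match_phrase_prefix_body(field, text):
    # Matches documents where `field` contains a phrase whose last term starts with `text`.
    return {"query": {"match_phrase_prefix": {field: text}}}


def multi_match_prefix_body(text, fields):
    # Same idea, applied to several fields at once.
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": "phrase_prefix",
            }
        }
    }


single = match_phrase_prefix_body("Company", "N")
multi = multi_match_prefix_body("N", ["Company", "Another_field"])
print(single)
print(multi)
```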

Related

Get second level domain with regexp in elasticsearch

I want to search an index in my Elasticsearch database for domains that contain a given second-level domain (sld), but the query returns None.
Here is what I've done so far:
sld = "smth"
query = client.search(
    index = "x",
    body = {
        "query": {
            "regexp": {
                "site_domain.keyword": fr"*\.{sld}\.*"
            }
        }
    }
)
EDIT:
I think the problem is with the regex I wrote
any help would be appreciated.

TL;DR
GET /so_regex_url/_search
{
  "query": {
    "regexp": {
      "site_domain": ".*api\\.[a-z]+\\.[a-z]+"
    }
  }
}
This regex will match api.google.com but won't match google.com.
Watch out for reserved characters such as the dot (.).
They require a proper escape sequence.
To understand
First, let's talk about the pattern you are looking for.
You want to match every URL that has a given subdomain.
1. Check that the subdomain string exists in the URL
Something like .*<subdomain>.* will work. .* means any character, any number of times.
2. Check that it is a subdomain
A subdomain in a URL looks like <subdomain>.<domain>.<top level domain>
You need to make sure that the subdomain is followed by a . before the domain and by another . before the top-level domain.
Something like .*<subdomain>\.[a-z]+\.[a-z]+ will work. [a-z]+ means at least one character from a to z, and because . has a special meaning you need to escape it with \
This will match https://<subdomain>.google.com, but won't match https://<subdomain>.com
/!\ This is a naive implementation.
https://<subdomain>.1234.com won't match, because the digits 1, 2, ... are not in [a-z]
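The behaviour described in steps 1 and 2 can be checked locally with Python's re module before involving Elasticsearch. This is only a sketch: Lucene's regexp syntax is not identical to Python's, the helper names are hypothetical, and fullmatch is used because a regexp query has to match the whole term.

```python
import re


def naive_subdomain_pattern(subdomain):
    # "\." matches a literal dot; "[a-z]+" matches one or more lowercase letters.
    return rf".*{re.escape(subdomain)}\.[a-z]+\.[a-z]+"


def matches(url, subdomain):
    return re.fullmatch(naive_subdomain_pattern(subdomain), url) is not None


for url in ["api.google.com", "google.com", "https://api.1234.com"]:
    print(url, matches(url, "api"))
# api.google.com True
# google.com False
# https://api.1234.com False
```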
3. Create Elastic DSL
I am performing the request on the text field, not the keyword field; this keeps my example leaner but works the same way.
GET /so_regex_url/_search
{
  "query": {
    "regexp": {
      "site_domain": ".*api\\.[a-z]+\\.[a-z]+"
    }
  }
}
You may have noticed the \\. As explained in the thread, the payload travels as JSON, so the backslash itself also needs to be escaped.
4. Python implementation
I imagine it should be:
sld = "smth"
query = client.search(
    index="x",
    body={
        "query": {
            "regexp": {
                # raw f-string: the body is serialized to JSON, which doubles the backslashes
                "site_domain.keyword": rf".*{sld}\.[a-z]+\.[a-z]+"
            }
        }
    }
)
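The point about \\ can be made concrete with a small sketch that builds the query body in Python and serializes it with json.dumps. The helper name `build_sld_regexp_query` is hypothetical; the field name and the value "smth" come from the question. The Python string holds single backslashes, and the doubling appears only in the JSON text.

```python
import json


def build_sld_regexp_query(sld, field="site_domain.keyword"):
    # Raw f-string: each "\." is one backslash followed by a dot in the Python string.
    pattern = rf".*{sld}\.[a-z]+\.[a-z]+"
    return {"query": {"regexp": {field: pattern}}}


body = build_sld_regexp_query("smth")
payload = json.dumps(body)

print(body["query"]["regexp"]["site_domain.keyword"])  # single backslashes
print(payload)  # JSON text, where each backslash is doubled
```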

MongoDB (PyMongo) Pagination with distinct not giving consistent result

I am trying to achieve pagination with distinct using pymongo.
I have records
{
name: string,
roll: integer,
address: string,
.
.
}
I only want the name of each record. Names can be duplicated, so I want the distinct names, with pagination.
result = collection.aggregate([
    {'$sort': {"name": 1}},
    {'$group': {"_id": "$name"}},
    {'$skip': skip},
    {'$limit': limit}
])
The problem is that, with this query, I get a different result for the same page number each time I run it.
I looked into this answer:
Distinct() command used with skip() and limit()
but it didn't help in my case.
How do I resolve this?
Thanks in advance!

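I tried sorting after the group stage, and that seems to solve the problem. $group does not guarantee the order of its output documents, so a $sort placed before it does not fix the order of the pages; sorting on _id after grouping makes $skip and $limit deterministic:
db.collection.aggregate([
  {
    "$group": {
      "_id": "$name"
    }
  },
  {
    "$sort": {
      "_id": 1
    }
  },
  {
    "$skip": 0
  },
  {
    "$limit": 1
  }
])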
try it here
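For PyMongo, the same stage order can be wrapped in a small helper that turns a page number into $skip and $limit values. This is a sketch: the function name, the 1-based page convention, and the example page size are assumptions, not part of the original answer.

```python
def distinct_names_pipeline(page, page_size):
    """Build a pipeline returning one page of distinct names (page is 1-based)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return [
        {"$group": {"_id": "$name"}},        # one document per distinct name
        {"$sort": {"_id": 1}},               # sort after grouping for a stable order
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]


pipeline = distinct_names_pipeline(page=3, page_size=10)
print(pipeline)

# With a PyMongo collection, the pipeline would be run as:
# result = collection.aggregate(pipeline)
```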

Is there a way to find record in mongo by matching field string with an array of values

I have the below record
{
"title": "Kim floral jacquard minidress",
"designer": "Rotate Birger Christensen"
}
How can I find a record in the collection using an array of values? For example, I have the array below. Because the "title" field contains the value "floral", the record should be selected.
['floral', 'dresses']
The query I am using below doesn't work. :(
queryParam = ['floral', 'dresses']
def get_query(queryParam, gender):
    query = {
        "gender": gender
    }
    if (len(queryParam) != 0):
        query["title"] = {"$in": queryParam}
    return query
products_query = get_query(queryParam, gender)
products = mongo.db.products.find(products_query)

To add to the previous answer, there's a little bit more to do to get this to work in pymongo. You have to use re.compile() to get the regex search to work:
import re
queryParam = [re.compile('floral'), re.compile('dresses')]
Alternatively you could use this approach which removes the need for the $in operator:
import re
queryParam = [re.compile('floral|dresses')]
And once you've done that you don't even need to use re.compile:
queryParam = 'floral|dress'
...
query = {"title": {"$regex": queryParam}}
Take your pick.
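Putting the last variant together with the question's get_query function could look like the sketch below. The function name `build_title_query` and the gender value "women" are hypothetical. re.escape is used so that keywords are matched literally, which is safe for plain words like these; the final re.search only checks the pattern locally against the sample title and is not a MongoDB call.

```python
import re


def build_title_query(keywords, gender):
    query = {"gender": gender}
    if keywords:
        # Join the escaped keywords with "|" so that any one of them may appear in the title.
        query["title"] = {"$regex": "|".join(re.escape(k) for k in keywords)}
    return query


query = build_title_query(["floral", "dresses"], "women")
title = "Kim floral jacquard minidress"

print(query)
print(bool(re.search(query["title"]["$regex"], title)))  # True

# With PyMongo, the query would be passed on as in the question:
# products = mongo.db.products.find(query)
```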

You need to do a regex search along with the $in operator:
db.collectionName.find( { title: { $in: [ /floral/, /dresses/ ] } })

How to return only aggregation results not hits in elasticsearch query dsl

I am writing a query DSL in Python using http://elasticsearch-dsl.readthedocs.io
and I have the following code:
search.aggs.bucket('per_ts', 'terms', field='ts')\
.bucket('load_time', 'percentiles', field='total_req', percents=[99])
response = search.execute()
This works fine, but it also returns hits, and I don't want hits.
In a raw query (for example with curl), I can get what I want by setting size: 0:
GET /twitter/tweet/_search
{
  "size": 0,
  "aggregations": {
    "my_agg": {
      "terms": {
        "field": "text"
      }
    }
  }
}
I couldn't find a way to set size = 0 in the query DSL.

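According to the code in elasticsearch-dsl-py/search.py, extra search parameters can be passed with extra():
s = Search().query(...).extra(from_=0, size=25)
Setting size=0 in the same way, for example search = search.extra(size=0) before calling execute(), should return only the aggregations.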
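As a cross-check, the request body that the question's aggregation plus size 0 is expected to produce can be written out as a plain Python dict. This is a sketch under the assumption that the nested bucket and metric map to nested "aggs" keys; the function name is hypothetical, and the field names are the ones from the question.

```python
def aggregation_only_body():
    """Search body with size 0, so that only aggregation results are returned."""
    return {
        "size": 0,  # no hits, aggregations only
        "aggs": {
            "per_ts": {
                "terms": {"field": "ts"},
                "aggs": {
                    "load_time": {
                        "percentiles": {"field": "total_req", "percents": [99]}
                    }
                },
            }
        },
    }


body = aggregation_only_body()
print(body)
```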

How to use mapreduce in mongodb?

I have the following code in python:
from pymongo import Connection
import bson
c = Connection()
db = c.twitter
ids = db.users_from_united_states.distinct("user.id")
for i in ids:
    count = db.users_from_united_states.find({"user.id":i}).count()
    for u in db.users_from_united_states.find({"user.id":i, "tweets_text": {"$size": count}}).limit(1):
        db.my_usa_fitness_network.insert(u)
I need to go through all the users and, for each user, find the record where the number of tweets_text entries equals the number of times that user appears in the collection (meaning that this document contains ALL the tweets the user posted).
Then I need to save it in another collection, or just group it in the same collection.
When I run this code, it gives me fewer documents than the number of ids.
I saw something about mapReduce, but I can't figure out how to use it in my case.
I tried to run similar code directly in the mongo shell, but it didn't work at all:
var ids = db.users_from_united_states.distinct("user.id")
for (i=0; i< ids.length; i++){
  var count = db.users_from_united_states.find({"user.id":ids[i]}).count()
  db.users_from_united_states.find({"user.id":ids[i], "tweets_text": {$size: count}}).limit(1).forEach(function(doc){db.my_usa_fitness_network.insert(doc)})
}
Can you help me please? I have a huge project and I need help. Thank you.

[
{
"$group": {
"_id": "$user.id",
"my_fitness_data": {
"$push": "$text"
}
}
},
{
"$project": {
"UserId": "$_id",
"TweetsCount": {
"$size": "$my_fitness_data"
},
"Tweets": "$my_fitness_data"
}
}
]
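The pipeline above groups the documents by user.id, collects each user's text values into an array, and reports the array length. The sketch below reproduces that logic on an in-memory list so the result shape is easy to inspect. The sample documents and the function name are hypothetical, and the output omits the _id field that $project would keep by default.

```python
from collections import defaultdict


def group_tweets_by_user(docs):
    # Equivalent of $group with $push: collect every text per user id.
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["user"]["id"]].append(doc["text"])
    # Equivalent of $project with $size: report the count next to the tweets.
    return [
        {"UserId": user_id, "TweetsCount": len(texts), "Tweets": texts}
        for user_id, texts in grouped.items()
    ]


sample_docs = [
    {"user": {"id": 1}, "text": "morning run"},
    {"user": {"id": 2}, "text": "gym day"},
    {"user": {"id": 1}, "text": "evening yoga"},
]

summary = group_tweets_by_user(sample_docs)
print(summary)

# With PyMongo, the pipeline itself would be run as, for example:
# db.users_from_united_states.aggregate(pipeline)
```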
