kNN search query using Python and Elasticsearch

I am trying to run this query with the Elasticsearch Python client:
curl -X GET "localhost:9200/articles/_knn_search" -H 'Content-Type: application/json' -d '
{
  "knn": {
    "field": "title_vector",
    "query_vector": [-0.01807806, 0.024579186, ...],
    "k": 10,
    "num_candidates": 100
  },
  "_source": ["title", "category"]
}
'
If anyone can help me with this, thanks.
EDIT: with the Elasticsearch Python client >= 8.0 there is a new function named knn_search, so we can run a kNN search very easily:
query = {
    "field": "title_vector",
    "query_vector": [-0.01807806, 0.024579186, ...],
    "k": 10,
    "num_candidates": 100
}

es = Elasticsearch(request_timeout=600, hosts='http://localhost:9200')
res = es.knn_search(index="index_name", knn=query, source=["field1", "field2"])
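Note that in later 8.x releases kNN moved into the regular _search API as a top-level knn option, and the dedicated knn_search helper was deprecated. A minimal sketch of that variant (assuming a client and cluster around 8.4 or later; the vector below is a truncated placeholder, not a real embedding):

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts='http://localhost:9200', request_timeout=600)

# kNN as a top-level option of the regular search API
res = es.search(
    index="articles",
    knn={
        "field": "title_vector",
        "query_vector": [-0.01807806, 0.024579186],  # pass the full embedding here
        "k": 10,
        "num_candidates": 100,
    },
    source=["title", "category"],
)
print(res["hits"]["hits"])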

Related

Convert terminal input to a Python dictionary for use with an API

$ curl https://api.goclimate.com/v1/flight_footprint \
-u YOUR_API_KEY: \
-d 'segments[0][origin]=ARN' \
-d 'segments[0][destination]=BCN' \
-d 'segments[1][origin]=BCN' \
-d 'segments[1][destination]=ARN' \
-d 'cabin_class=economy' \
-d 'currencies[]=SEK' \
-d 'currencies[]=USD' \
-G
I have the following input, provided as an example by the creators of the API. It is meant to be used in the terminal and gives output in the form of a dictionary. How could I write the input above as a list or dictionary so it can be used in a Python script? I tried it as below, but the response from the API is solely b' '
payload = {
    "segments": [
        {
            "origin": "ARN",
            "destination": "BCN"
        },
        {
            "origin": "BCN",
            "destination": "ARN"
        }
    ],
    "cabin_class": "economy",
    "currencies": [
        "SEK", "USD"
    ]
}

r = requests.get('https://api.goclimate.com/v1/flight_footprint', auth=('my_API_key', ''), data=payload)
print(r.content)
You are making a GET request with requests, but you are trying to pass data, which would be appropriate for making a POST request. Here you want to use params instead:
response = requests.get(
    "https://api.goclimate.com/v1/flight_footprint",
    auth=("my_API_key", ""),
    params=payload,
)
print(response.content)
Now, what should payload be? It can be a dictionary, but it can't be nested in the way you had it, since it needs to be encoded into the URL as parameters (N.B. this is what your -G option was doing in the curl request).
Looking at the docs and your curl example, I think it should be:
payload = {
    "segments[0][origin]": "ARN",
    "segments[0][destination]": "BCN",
    "segments[1][origin]": "BCN",
    "segments[1][destination]": "ARN",
    "cabin_class": "economy",
    "currencies[]": "SEK",  # this will actually be overwritten,
    "currencies[]": "USD",  # since this key is a duplicate (see below)
}

response = requests.get(
    "https://api.goclimate.com/v1/flight_footprint",
    auth=("my_API_key", ""),
    params=payload,
)
print(response.content)
Thinking of how we might parse your original dictionary into this structure:
data = {
    "segments": [
        {
            "origin": "ARN",
            "destination": "BCN"
        },
        {
            "origin": "BCN",
            "destination": "ARN"
        }
    ],
    "cabin_class": "economy",
    "currencies": [
        "SEK", "USD"
    ]
}

payload = {}
for index, segment in enumerate(data["segments"]):
    origin = segment["origin"]
    destination = segment["destination"]
    # Python 3.6+ needed (f-strings):
    payload[f"segments[{index}][origin]"] = origin
    payload[f"segments[{index}][destination]"] = destination

payload["cabin_class"] = data["cabin_class"]
# requests can handle repeated parameters with the same name this way:
payload["currencies[]"] = data["currencies"]
... should do it.
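If you want to sanity-check the encoding without calling the API, requests can show you the final URL it would build. This snippet is illustrative, not part of the original answer:

from requests.models import PreparedRequest

req = PreparedRequest()
req.prepare_url("https://api.goclimate.com/v1/flight_footprint", payload)
print(req.url)
# ...?segments%5B0%5D%5Borigin%5D=ARN&...&currencies%5B%5D=SEK&currencies%5B%5D=USD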

FastAPI - receive list of objects in body request

I need to create an endpoint that can receive the following JSON and recognize the objects contained in it:
{​
"data": [
{​
"start": "A", "end": "B", "distance": 6
}​,
{​
"start": "A", "end": "E", "distance": 4
}​
]
}
I created a model to handle a single object:
class GraphBase(BaseModel):
    start: str
    end: str
    distance: int
And with it, I could save it in a database. But now I need to receive a list of objects and save them all.
I tried to do something like this:
class GraphList(BaseModel):
    data: Dict[str, List[GraphBase]]

@app.post("/dummypath")
async def get_body(data: schemas.GraphList):
    return data
But I keep getting this error from FastAPI: Error getting request body: Expecting property name enclosed in double quotes: line 1 column 2 (char 1), and this message in the response:
{
"detail": "There was an error parsing the body"
}
I'm new to Python and even newer to FastAPI. How can I transform that JSON into a list of GraphBase to save them in my db?
This is a working example.
from typing import List

from pydantic import BaseModel
from fastapi import FastAPI

app = FastAPI()

class GraphBase(BaseModel):
    start: str
    end: str
    distance: int

class GraphList(BaseModel):
    data: List[GraphBase]

@app.post("/dummypath")
async def get_body(data: GraphList):
    return data
You can try this API on the autogenerated docs (at /docs).
Or, on the console (you may need to adjust the URL depending on your setup):
curl -X 'POST' \
  'http://localhost:8000/dummypath' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "data": [
    {
      "start": "string",
      "end": "string",
      "distance": 0
    }
  ]
}'
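The same request can also be sent from Python with requests; a small illustrative sketch (the URL assumes the local dev server above):

import requests

payload = {
    "data": [
        {"start": "A", "end": "B", "distance": 6},
        {"start": "A", "end": "E", "distance": 4},
    ]
}
# json= serializes the dict and sets the Content-Type header for us
r = requests.post("http://localhost:8000/dummypath", json=payload)
print(r.json())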
The error looks like a data problem: your JSON contains extra invisible characters in several places. Try the following instead:
{
  "data": [
    {
      "start": "A", "end": "B", "distance": 6
    },
    {
      "start": "A", "end": "E", "distance": 4
    }
  ]
}
The extra characters (which I removed) were zero-width spaces sitting right after the opening and closing braces in the original JSON.
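If you want to check a pasted payload for such characters yourself, here is a small sketch (my addition, not part of the original answer) that flags any non-ASCII character:

raw = '{\u200b "start": "A", "end": "B", "distance": 6 }\u200b'  # \u200b = zero-width space
for i, ch in enumerate(raw):
    if ord(ch) > 127:
        print(f"suspicious character at index {i}: U+{ord(ch):04X}")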

How to execute a curl PUT in Elasticsearch through the Python API

Below are three curl commands. I need to index the content below into an Elasticsearch index through the Python API. How can I achieve this?
curl -XPUT 'http://localhost:9200/my_country_index_5/country/1' -d '
{
  "name": "Afginastan"
}'

curl -XPUT 'http://localhost:9200/my_country_index_5/state/1?parent=3' -d '
{
  "name": "Andra Pradesh",
  "country": "India"
}'

curl -XPUT 'http://localhost:9200/my_country_index_5/city/1?parent=5' -d '
{
  "name": "Kolhapur",
  "state": "Maharashtra"
}'
I have created the index in Python; below is the code:
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index='my_country_index_5', ignore=400)
How do I put documents into the same index (my_country_index_5) but with different document types (country, state, city)?
doc = {
    "name": "Afginastan"
}
res = es.index(index="my_country_index_5", id=1, body=doc)
print(res['result'])
If I understand correctly, you want mapping types, which are exposed in the Python client as doc_type. Types were removed in Elasticsearch 7. In Elasticsearch versions lower than 7, you can get this by passing doc_type in the index arguments:
res = es.index(index="my_country_index_5", id=1, doc_type="state", body=doc)
res = es.index(index="my_country_index_5", id=1, doc_type="city", body=doc)
res = es.index(index="my_country_index_5", id=1, doc_type="country", body=doc)
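On Elasticsearch 7 and later, where mapping types no longer exist, two common replacements (an assumption on my part, not from the original answer) are one index per entity type, or a single index with a discriminator field:

# One index per entity type
res = es.index(index="my_country_index_5_country", id=1, body={"name": "Afginastan"})
res = es.index(index="my_country_index_5_state", id=1, body={"name": "Andra Pradesh", "country": "India"})

# Or a single index with a field that records the kind of document
res = es.index(index="my_country_index_5", id="state-1",
               body={"kind": "state", "name": "Andra Pradesh", "country": "India"})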

How to fetch data from an Elastic database periodically using Python?

I need to fetch data from an Elastic database every 4 minutes, but I am having trouble modifying the #timestamp variable in the query below so that I can send the appropriate query to fetch the data from the URL.
I am using Python as the language.
Curl:
curl -XGET "URL" -H 'Content-Type: application/json' -k -u u_name:XXX -d'
{
  "query": {
    "query_string": {
      "query": "#timestamp:[2018-06-29T06:47:40.000Z TO *]"
    }
  },
  "size": 1000
}
' | json_pp
I can use cron to run the script on a schedule every 7 minutes, but I can't work out how to modify the #timestamp variable in the above query so that I get all new data since the last run.
Any inputs are valuable.
You can use the date command in Bash to format the timestamp.
current date and time
date +%Y-%m-%dT%H:%M:%S
# 2018-07-14T03:00:58
minus 7 minutes
date --date '-7 min' +%Y-%m-%dT%H:%M:%S
# 2018-07-14T02:53:58
Using backticks you can try to embed it in another Bash command (but you may need to use double quotes " " instead of single quotes ' ' around the -d body, since backticks are not expanded inside single quotes):
curl -XGET "URL" -H 'Content-Type: application/json' -k -u u_name:XXX -d'
{
  "query": {
    "query_string": {
      "query": "#timestamp:[`date --date \'-7 min\' +%Y-%m-%dT%H:%M:%S`.000Z TO *]"
    }
  },
  "size": 1000
}
' | json_pp
If you need it as Python code, you can use the page https://curl.trillworks.com/ to convert curl to requests, and then make your modifications.
import requests
import datetime
import pprint  # pretty print

# dt = datetime.datetime(2018, 6, 29, 6, 47, 40)
dt = datetime.datetime.now()
td_7mins = datetime.timedelta(minutes=7)
dt = dt - td_7mins  # now - 7 minutes

# timestamp = "#timestamp:[{}.000Z TO *]".format(dt.strftime("%Y-%m-%dT%H:%M:%S"))
timestamp = dt.strftime("#timestamp:[%Y-%m-%dT%H:%M:%S.000Z TO *]")

data = {
    "query": {
        "query_string": {
            "query": timestamp
        }
    },
    "size": 1000
}
print(data)

headers = {'Content-Type': 'application/json'}
url = "https://httpbin.org/get"  # good for tests
r = requests.get(url, json=data, headers=headers, verify=False, auth=('u_name', 'XXX'))
pprint.pprint(r.json())
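To avoid gaps or duplicates between runs, you could also persist the time of the last successful run instead of always subtracting a fixed 7 minutes. A minimal sketch (my assumption, not part of the original answer; the state file name last_run.txt is illustrative, and fromisoformat needs Python 3.7+):

import datetime
import pathlib

STATE = pathlib.Path("last_run.txt")

def since():
    # Fall back to "7 minutes ago" on the very first run
    if STATE.exists():
        return datetime.datetime.fromisoformat(STATE.read_text().strip())
    return datetime.datetime.now() - datetime.timedelta(minutes=7)

def mark_done():
    STATE.write_text(datetime.datetime.now().isoformat())

timestamp = since().strftime("#timestamp:[%Y-%m-%dT%H:%M:%S.000Z TO *]")
# ... run the query as above, then:
mark_done()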

Indexing synonyms in Elasticsearch with Python

Problem description
I want to run a query string like this, for example:
{"query": {
"query_string" : {
"fields" : ["description"],
"query" : "illegal~"
}
}
}
I have a separate synonyms.txt file that contains the synonyms:
illegal, banned, criminal, illegitimate, illicit, irregular, outlawed, prohibited
otherWord, synonym1, synonym2...
I want to find all documents containing any one of these synonyms.
What I tried
First I want to index those synonyms in my ES database.
I tried to run this query with curl:
curl -X PUT "https://instanceAdress.europe-west1.gcp.cloud.es.io:9243/app/kibana#/dev_tools/console/sources" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  }
}
'
but it doesn't work: {"statusCode":404,"error":"Not Found"}
I then need to change my query so that it takes the synonyms into account, but I have no idea how.
So my questions are:
How can I index my synonyms?
How can I change my query so that it searches across all synonyms?
Is there any way to index them in Python?
Example of a GET query using the Python Elasticsearch client:
es = Elasticsearch(
    ['fullAdress.europe-west1.gcp.cloud.es.io'],
    http_auth=('login', 'password'),
    scheme="https",
    port=9243,
)
es.get(index="sources", doc_type='rcp', id="301495")
You can index using synonyms in Python with the elasticsearch_dsl package.
First, create a token filter:
from elasticsearch_dsl import analyzer, token_filter

synonyms_token_filter = token_filter(
    'synonyms_token_filter',   # any name for the filter
    'synonym',                 # synonym filter type
    synonyms=your_synonyms     # your list of synonym rules, inlined into the settings
)
And then create an analyzer that uses it:
custom_analyzer = analyzer(
    'custom_analyzer',
    tokenizer='standard',
    filter=[
        'lowercase',
        synonyms_token_filter
    ]
)
There's also a package for this: https://github.com/agora-team/elasticsearch-synonyms
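To actually use the analyzer it has to end up in the index settings and on the field mapping. A minimal sketch with elasticsearch_dsl (the index name sources and field description come from the question; treat the rest as an assumption):

from elasticsearch_dsl import Document, Text

class Source(Document):
    description = Text(analyzer=custom_analyzer)

    class Index:
        name = 'sources'

Source.init()  # creates the index, including the analyzer and mapping

Once documents are indexed through this mapping, your original query_string query on the description field should match any of the synonyms, because the expansion happens at analysis time.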
