kNN search query using Python and Elasticsearch

I am trying to run this query with the Elasticsearch Python client:
curl -X GET "localhost:9200/articles/_knn_search" -H 'Content-Type: application/json' -d '
{
  "knn": {
    "field": "title_vector",
    "query_vector": [-0.01807806, 0.024579186, ...],
    "k": 10,
    "num_candidates": 100
  },
  "_source": ["title", "category"]
}
'
If anyone can help me with this, thanks.
EDIT: with the Elasticsearch Python client >= 8.0 there is a new function named knn_search, so we can run a kNN search very easily:
query = {
    "field": "title_vector",
    "query_vector": [-0.01807806, 0.024579186, ...],
    "k": 10,
    "num_candidates": 100
}

es = Elasticsearch(request_timeout=600, hosts='http://localhost:9200')
res = es.knn_search(index="index_name", knn=query, source=["field1", "field2"])
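Note that in later 8.x releases kNN moved into the regular _search API as a top-level knn option, and the dedicated knn_search helper was deprecated. A minimal sketch of that variant (assuming a client and cluster around 8.4 or later; the vector below is a truncated placeholder, not a real embedding):

from elasticsearch import Elasticsearch

es = Elasticsearch(hosts='http://localhost:9200', request_timeout=600)

# kNN as a top-level option of the regular search API
res = es.search(
    index="articles",
    knn={
        "field": "title_vector",
        "query_vector": [-0.01807806, 0.024579186],  # pass the full embedding here
        "k": 10,
        "num_candidates": 100,
    },
    source=["title", "category"],
)
print(res["hits"]["hits"])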

Related

Convert terminal input to a Python dictionary for use with an API

$ curl https://api.goclimate.com/v1/flight_footprint \
-u YOUR_API_KEY: \
-d 'segments[0][origin]=ARN' \
-d 'segments[0][destination]=BCN' \
-d 'segments[1][origin]=BCN' \
-d 'segments[1][destination]=ARN' \
-d 'cabin_class=economy' \
-d 'currencies[]=SEK' \
-d 'currencies[]=USD' \
-G
I have the following input, provided as an example by the creators of the API. It is meant to be used in the terminal and gives output in the form of a dictionary. How could I write the input above as a list or dictionary so it can be used in a Python script? I tried it as below, but the response from the API is solely b' '
payload = {
    "segments": [
        {
            "origin": "ARN",
            "destination": "BCN"
        },
        {
            "origin": "BCN",
            "destination": "ARN"
        }
    ],
    "cabin_class": "economy",
    "currencies": [
        "SEK", "USD"
    ]
}

r = requests.get('https://api.goclimate.com/v1/flight_footprint', auth=('my_API_key', ''), data=payload)
print(r.content)
You are making a GET request with requests, but you are trying to pass data, which would be appropriate for making a POST request. Here you want to use params instead:
response = requests.get(
    "https://api.goclimate.com/v1/flight_footprint",
    auth=("my_API_key", ""),
    params=payload,
)
print(response.content)
Now, what should payload be? It can be a dictionary, but it can't be nested in the way you had it, since it needs to be encoded into the URL as parameters (N.B. this is what your -G option was doing in the curl request).
Looking at the docs and your curl example, I think it should be:
payload = {
    "segments[0][origin]": "ARN",
    "segments[0][destination]": "BCN",
    "segments[1][origin]": "BCN",
    "segments[1][destination]": "ARN",
    "cabin_class": "economy",
    "currencies[]": "SEK",  # this will actually be overwritten,
    "currencies[]": "USD",  # since this key is a duplicate (see below)
}

response = requests.get(
    "https://api.goclimate.com/v1/flight_footprint",
    auth=("my_API_key", ""),
    params=payload,
)
print(response.content)
Thinking of how we might parse your original dictionary into this structure:
data = {
    "segments": [
        {
            "origin": "ARN",
            "destination": "BCN"
        },
        {
            "origin": "BCN",
            "destination": "ARN"
        }
    ],
    "cabin_class": "economy",
    "currencies": [
        "SEK", "USD"
    ]
}

payload = {}
for index, segment in enumerate(data["segments"]):
    origin = segment["origin"]
    destination = segment["destination"]
    # Python 3.6+ needed (f-strings):
    payload[f"segments[{index}][origin]"] = origin
    payload[f"segments[{index}][destination]"] = destination

payload["cabin_class"] = data["cabin_class"]
# requests can handle repeated parameters with the same name this way:
payload["currencies[]"] = data["currencies"]
... should do it.
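If you want to sanity-check the encoding without calling the API, requests can show you the final URL it would build. This snippet is illustrative, not part of the original answer:

from requests.models import PreparedRequest

req = PreparedRequest()
req.prepare_url("https://api.goclimate.com/v1/flight_footprint", payload)
print(req.url)
# ...?segments%5B0%5D%5Borigin%5D=ARN&...&currencies%5B%5D=SEK&currencies%5B%5D=USD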

FastAPI - receive list of objects in body request

I need to create an endpoint that can receive the following JSON and recognize the objects contained in it:
{​
"data": [
{​
"start": "A", "end": "B", "distance": 6
}​,
{​
"start": "A", "end": "E", "distance": 4
}​
]
}
I created a model to handle a single object:
class GraphBase(BaseModel):
    start: str
    end: str
    distance: int
And with it, I could save it in a database. But now I need to receive a list of objects and save them all.
I tried to do something like this:
class GraphList(BaseModel):
    data: Dict[str, List[GraphBase]]

@app.post("/dummypath")
async def get_body(data: schemas.GraphList):
    return data
But I keep getting this error from FastAPI: Error getting request body: Expecting property name enclosed in double quotes: line 1 column 2 (char 1), and this message in the response:
{
"detail": "There was an error parsing the body"
}
I'm new to Python and even newer to FastAPI. How can I transform that JSON into a list of GraphBase to save them in my db?
This is a working example.
from typing import List

from pydantic import BaseModel
from fastapi import FastAPI

app = FastAPI()

class GraphBase(BaseModel):
    start: str
    end: str
    distance: int

class GraphList(BaseModel):
    data: List[GraphBase]

@app.post("/dummypath")
async def get_body(data: GraphList):
    return data
You can try this API on the autogenerated docs (at /docs).
Or, on the console (you may need to adjust the URL depending on your setup):
curl -X 'POST' \
  'http://localhost:8000/dummypath' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "data": [
    {
      "start": "string",
      "end": "string",
      "distance": 0
    }
  ]
}'
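The same request can also be sent from Python with requests; a small illustrative sketch (the URL assumes the local dev server above):

import requests

payload = {
    "data": [
        {"start": "A", "end": "B", "distance": 6},
        {"start": "A", "end": "E", "distance": 4},
    ]
}
# json= serializes the dict and sets the Content-Type header for us
r = requests.post("http://localhost:8000/dummypath", json=payload)
print(r.json())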
The error looks like a data problem: your JSON contains extra invisible characters in several places. Try the following instead:
{
  "data": [
    {
      "start": "A", "end": "B", "distance": 6
    },
    {
      "start": "A", "end": "E", "distance": 4
    }
  ]
}
The extra characters (which I removed) were zero-width spaces sitting right after the opening and closing braces in the original JSON.
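If you want to check a pasted payload for such characters yourself, here is a small sketch (my addition, not part of the original answer) that flags any non-ASCII character:

raw = '{\u200b "start": "A", "end": "B", "distance": 6 }\u200b'  # \u200b = zero-width space
for i, ch in enumerate(raw):
    if ord(ch) > 127:
        print(f"suspicious character at index {i}: U+{ord(ch):04X}")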

How to execute a curl PUT in Elasticsearch through the Python API

Below are three curl commands. I need to index the content below into an Elasticsearch index through the Python API. How can I achieve this?
curl -XPUT 'http://localhost:9200/my_country_index_5/country/1' -d '
{
  "name": "Afginastan"
}'

curl -XPUT 'http://localhost:9200/my_country_index_5/state/1?parent=3' -d '
{
  "name": "Andra Pradesh",
  "country": "India"
}'

curl -XPUT 'http://localhost:9200/my_country_index_5/city/1?parent=5' -d '
{
  "name": "Kolhapur",
  "state": "Maharashtra"
}'
I have created the index in Python; below is the code:
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index='my_country_index_5', ignore=400)
How do I put documents into the same index (my_country_index_5) but with different document types (country, state, city)?
doc = {
    "name": "Afginastan"
}
res = es.index(index="my_country_index_5", id=1, body=doc)
print(res['result'])
If I understand correctly, you want mapping types, which are exposed in the Python client as doc_type. Types were removed in Elasticsearch 7. In Elasticsearch versions lower than 7, you can get this by passing doc_type in the index arguments:
res = es.index(index="my_country_index_5", id=1, doc_type="state", body=doc)
res = es.index(index="my_country_index_5", id=1, doc_type="city", body=doc)
res = es.index(index="my_country_index_5", id=1, doc_type="country", body=doc)
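On Elasticsearch 7 and later, where mapping types no longer exist, two common replacements (an assumption on my part, not from the original answer) are one index per entity type, or a single index with a discriminator field:

# One index per entity type
res = es.index(index="my_country_index_5_country", id=1, body={"name": "Afginastan"})
res = es.index(index="my_country_index_5_state", id=1, body={"name": "Andra Pradesh", "country": "India"})

# Or a single index with a field that records the kind of document
res = es.index(index="my_country_index_5", id="state-1",
               body={"kind": "state", "name": "Andra Pradesh", "country": "India"})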

How to fetch data from an Elastic database periodically using Python?

I need to fetch data from an Elastic database every 4 minutes, but I am having trouble modifying the #timestamp variable in the query below so that I can send the appropriate query to fetch the data from the URL.
I am using Python as the language.
Curl:
curl -XGET "URL" -H 'Content-Type: application/json' -k -u u_name:XXX -d'
{
  "query": {
    "query_string": {
      "query": "#timestamp:[2018-06-29T06:47:40.000Z TO *]"
    }
  },
  "size": 1000
}
' | json_pp
I can use cron to run the script on a schedule every 7 minutes, but I can't work out how to modify the #timestamp variable in the above query so that I get all new data since the last run.
Any inputs are valuable.
You can use the date command in Bash to format the timestamp.
current date and time
date +%Y-%m-%dT%H:%M:%S
# 2018-07-14T03:00:58
minus 7 minutes
date --date '-7 min' +%Y-%m-%dT%H:%M:%S
# 2018-07-14T02:53:58
Using backticks you can try to embed it in another Bash command (but you may need to use double quotes " " instead of single quotes ' ' around the -d body, since backticks are not expanded inside single quotes):
curl -XGET "URL" -H 'Content-Type: application/json' -k -u u_name:XXX -d'
{
  "query": {
    "query_string": {
      "query": "#timestamp:[`date --date \'-7 min\' +%Y-%m-%dT%H:%M:%S`.000Z TO *]"
    }
  },
  "size": 1000
}
' | json_pp
If you need it as Python code, you can use the page https://curl.trillworks.com/ to convert curl to requests, and then make your modifications.
import requests
import datetime
import pprint  # pretty print

# dt = datetime.datetime(2018, 6, 29, 6, 47, 40)
dt = datetime.datetime.now()
td_7mins = datetime.timedelta(minutes=7)
dt = dt - td_7mins  # now - 7 minutes

# timestamp = "#timestamp:[{}.000Z TO *]".format(dt.strftime("%Y-%m-%dT%H:%M:%S"))
timestamp = dt.strftime("#timestamp:[%Y-%m-%dT%H:%M:%S.000Z TO *]")

data = {
    "query": {
        "query_string": {
            "query": timestamp
        }
    },
    "size": 1000
}
print(data)

headers = {'Content-Type': 'application/json'}
url = "https://httpbin.org/get"  # good for tests
r = requests.get(url, json=data, headers=headers, verify=False, auth=('u_name', 'XXX'))
pprint.pprint(r.json())
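To avoid gaps or duplicates between runs, you could also persist the time of the last successful run instead of always subtracting a fixed 7 minutes. A minimal sketch (my assumption, not part of the original answer; the state file name last_run.txt is illustrative, and fromisoformat needs Python 3.7+):

import datetime
import pathlib

STATE = pathlib.Path("last_run.txt")

def since():
    # Fall back to "7 minutes ago" on the very first run
    if STATE.exists():
        return datetime.datetime.fromisoformat(STATE.read_text().strip())
    return datetime.datetime.now() - datetime.timedelta(minutes=7)

def mark_done():
    STATE.write_text(datetime.datetime.now().isoformat())

timestamp = since().strftime("#timestamp:[%Y-%m-%dT%H:%M:%S.000Z TO *]")
# ... run the query as above, then:
mark_done()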

Indexing synonyms in Elasticsearch with Python

Problem description
I want to run a query string like this, for example:
{"query": {
"query_string" : {
"fields" : ["description"],
"query" : "illegal~"
}
}
}
I have a separate synonyms.txt file that contains the synonyms:
illegal, banned, criminal, illegitimate, illicit, irregular, outlawed, prohibited
otherWord, synonym1, synonym2...
I want to find all documents containing any one of these synonyms.
What I tried
First I want to index those synonyms in my ES database.
I tried to run this query with curl:
curl -X PUT "https://instanceAdress.europe-west1.gcp.cloud.es.io:9243/app/kibana#/dev_tools/console/sources" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym": {
            "tokenizer": "whitespace",
            "filter": ["synonym"]
          }
        },
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  }
}
'
but it doesn't work: {"statusCode":404,"error":"Not Found"}
I then need to change my query so that it takes the synonyms into account, but I have no idea how.
So my questions are:
How can I index my synonyms?
How can I change my query so that it searches across all synonyms?
Is there any way to index them in Python?
Example of a GET query using the Python Elasticsearch client:
es = Elasticsearch(
    ['fullAdress.europe-west1.gcp.cloud.es.io'],
    http_auth=('login', 'password'),
    scheme="https",
    port=9243,
)
es.get(index="sources", doc_type='rcp', id="301495")
You can index using synonyms in Python with the elasticsearch_dsl package.
First, create a token filter:
from elasticsearch_dsl import analyzer, token_filter

synonyms_token_filter = token_filter(
    'synonyms_token_filter',   # any name for the filter
    'synonym',                 # synonym filter type
    synonyms=your_synonyms     # your list of synonym rules, inlined into the settings
)
And then create an analyzer that uses it:
custom_analyzer = analyzer(
    'custom_analyzer',
    tokenizer='standard',
    filter=[
        'lowercase',
        synonyms_token_filter
    ]
)
There's also a package for this: https://github.com/agora-team/elasticsearch-synonyms
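To actually use the analyzer it has to end up in the index settings and on the field mapping. A minimal sketch with elasticsearch_dsl (the index name sources and field description come from the question; treat the rest as an assumption):

from elasticsearch_dsl import Document, Text

class Source(Document):
    description = Text(analyzer=custom_analyzer)

    class Index:
        name = 'sources'

Source.init()  # creates the index, including the analyzer and mapping

Once documents are indexed through this mapping, your original query_string query on the description field should match any of the synonyms, because the expansion happens at analysis time.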
