I'm trying to do full-text search on a MongoDB database with the Elasticsearch engine, but I ran into a problem: no matter what search term I provide (and whether I use query1 or query2), the engine always returns the same results. I think the problem is in the way I make the requests, but I don't know how to solve it.
Here is the code:
import json
import requests
from pprint import pprint

def search(search_term):
    query1 = {
        "fuzzy": {
            "art_text": {
                "value": search_term,
                "boost": 1.0,
                "min_similarity": 0.5,
                "prefix_length": 0
            }
        },
        "filter": {
            "range": {
                "published": {
                    "from": "20130409T000000",
                    "to": "20130410T235959"
                }
            }
        }
    }
    query2 = {
        "match_phrase": {"art_text": search_term}
    }
    es_query = json.dumps(query1)
    uri = 'http://localhost:9200/newsidx/_search'
    r = requests.get(uri, params=es_query)
    results = json.loads(r.text)
    data = [res['_source']['api_id'] for res in results['hits']['hits']]
    print "results: %d" % len(data)
    pprint(data)
The params parameter is not for data being sent. If you're trying to send data to the server, you should be using the data parameter. If you're trying to send query parameters, then you shouldn't JSON-encode them; just pass them to params as a dict.
I suspect your first request should be the following:
r = requests.get(uri, data=es_query)
And before someone downvotes me, yes the HTTP/1.1 spec allows data to be sent with GET requests and yes requests does support it.
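For example, a minimal sketch of the corrected call (assuming the newsidx index from the question; note that Elasticsearch also expects the search clause wrapped in a top-level "query" key, which the question's code is missing):

import json
import requests

uri = 'http://localhost:9200/newsidx/_search'
# Wrap the match_phrase clause in a "query" key, as the _search API expects
body = {"query": {"match_phrase": {"art_text": "your search term"}}}
r = requests.get(uri, data=json.dumps(body))
print r.json()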
search = {'query': {'match': {'test_id': 13}}, 'sort': {'date_utc': {'order': 'desc'}}}
data = requests.get('http://localhost:9200/newsidx/test/_search?pretty', params=search)
print data.json()
http://docs.python-requests.org/en/latest/user/quickstart/
Related
I would like to retrieve data from an API; the problem is that it only returns 49 records each time.
I get startCursor, hasNextPage and endCursor back, but I don't know how to make the script loop, using endCursor, until hasNextPage is False, so that I end up with all the data for my request.
Here is the code:
import requests
import json
query = """
query {
player(slug:"lionel-andres-messi-cuccittini"){
cards(rarities:[limited]) {
nodes {
slug
userOwnerWithRate {
from
}
}
pageInfo{
startCursor
hasNextPage
endCursor
}
}
}
}
"""
url = 'https://api.sorare.com/graphql/'
r = requests.post(url, json={'query': query})
json_data = json.loads(r.text)
print(json_data)
Does anyone have an idea how I can get all the pages for a request, please?
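A minimal sketch of the usual cursor loop, assuming the cards connection accepts an after argument (standard Relay-style pagination; check Sorare's schema to confirm the argument name):

import requests

# Hypothetical paginated query: assumes `cards` accepts an `after` cursor argument
query = """
query ($cursor: String) {
  player(slug: "lionel-andres-messi-cuccittini") {
    cards(rarities: [limited], after: $cursor) {
      nodes {
        slug
      }
      pageInfo {
        hasNextPage
        endCursor
      }
    }
  }
}
"""

url = 'https://api.sorare.com/graphql/'
all_nodes = []
cursor = None
while True:
    r = requests.post(url, json={'query': query, 'variables': {'cursor': cursor}})
    cards = r.json()['data']['player']['cards']
    all_nodes.extend(cards['nodes'])
    if not cards['pageInfo']['hasNextPage']:
        break
    cursor = cards['pageInfo']['endCursor']  # resume after the last page

print(len(all_nodes))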
I'm using Python to make requests to the Pipefy GraphQL API.
I already read the documentation and searched the Pipefy forum, but I could not figure out what is wrong with the query below:
pipeId = '171258'
query = """
{
  "query": "{allCards(pipeId: %s, first: 30, after: 'WyIxLjAiLCI1ODAuMCIsMzI0OTU0NF0'){pageInfo{endCursor hasNextPage}edges{node{id title}}}}"
}
""" % (pipeId)
The query worked pretty well until I added the after parameter.
I already tried variations like:
after: "WyIxLjAiLCI1ODAuMCIsMzI0OTU0NF0"
after: \"WyIxLjAiLCI1ODAuMCIsMzI0OTU0NF0\"
after: \n"WyIxLjAiLCI1ODAuMCIsMzI0OTU0NF0\n"
I know the issue is related to the escaping, because the API returns messages like this:
'{"errors":[{"locations":[{"column":45,"line":1}],"message":"token recognition error at: \'\'\'"},{"locations":[{"column":77,"line":1}],"message":"token recognition error at: \'\'\'"}]}\n'
(this message is returned when the request is made with after: 'WyIxLjAiLCI1ODAuMCIsMzI0OTU0NF0')
Any help here would be immensely helpful!
Thanks
I had the same problem as you today (and saw your post on Pipefy's support page). I contacted Pipefy's developers personally, but they weren't helpful at all.
I solved it by escaping the query correctly.
Try it like this:
query = '{"query": "{ allCards(pipeId: %s, first: 30, after: \\"WyIxLjAiLCI1ODAuMCIsMzI0OTU0NF0\\"){ pageInfo{endCursor hasNextPage } edges { node { id title } } } }"}'
Use single quotes to define the string, and double backslashes before the double quotes that wrap the cursor.
With the code snippet below you can call the function get_card_list, passing the authentication token (as a string) and the pipe_id (as an integer), to retrieve the whole card list of your pipe.
The get_card_list function calls the function request_card_list until hasNextPage is False, updating the cursor on each call.
import json
from datetime import datetime

import requests

# Assumed timestamp format for the phases_history fields; adjust to what your pipe returns
date_format = "%Y-%m-%dT%H:%M:%S%z"

# Function responsible to get cards from a pipe using Pipefy's GraphQL API
def request_card_list(auth_token, pipe_id, hasNextPage=False, endCursor=""):
    url = "https://api.pipefy.com/graphql"
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer %s' % auth_token
    }
    if not hasNextPage:
        payload = '{"query": "{ allCards(pipeId: %i, first: 50) { edges { node { id title phases_history { phase { name } firstTimeIn lastTimeOut } } cursor } pageInfo { endCursor hasNextPage } } }"}' % pipe_id
    else:
        payload = '{"query": "{ allCards(pipeId: %i, first: 50, after: \\"%s\\") { edges { node { id title phases_history { phase { name } firstTimeIn lastTimeOut } } cursor } pageInfo { endCursor hasNextPage } } }"}' % (pipe_id, endCursor)
    response = requests.request("POST", url, data=payload, headers=headers)
    response_body_dict = json.loads(response.text)
    response_dict_list = response_body_dict['data']['allCards']['edges']
    card_list = []
    for d in response_dict_list:
        # Parse the phase timestamps into datetime objects
        for h in d['node']['phases_history']:
            h['firstTimeIn'] = datetime.strptime(h['firstTimeIn'], date_format)
            if h['lastTimeOut']:
                h['lastTimeOut'] = datetime.strptime(h['lastTimeOut'], date_format)
        card_list.append(d['node'])
    page_info = response_body_dict['data']['allCards']['pageInfo']
    return [card_list, page_info['hasNextPage'], page_info['endCursor']]

# Function responsible to get all cards from a pipe using Pipefy's GraphQL API and pagination
def get_card_list(auth_token, pipe_id):
    card_list = []
    response = request_card_list(auth_token, pipe_id)
    card_list = card_list + response[0]
    while response[1]:
        response = request_card_list(auth_token, pipe_id, response[1], response[2])
        card_list = card_list + response[0]
    return card_list
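Usage might look like this (the token is a placeholder; the pipe id is the one from the question):

cards = get_card_list('your_api_token_here', 171258)
print(len(cards))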
Thanks to Lodi's answer, I was able to take the next step:
how to use a variable to pass the "after" parameter to the query.
As it was quite difficult, I decided to share it here for those facing the same challenge.
pipeId = '171258'
end_cursor = 'WyIxLjAiLCI2NTcuMCIsNDgwNDA2OV0'
end_cursor = "\\" + "\"" + end_cursor + "\\" + "\""
# desired output: end_cursor = '\"WyIxLjAiLCI2NTcuMCIsNDgwNDA2OV0\"'
query = """
{
  "query": "{allCards(pipeId: %s, first: 50, after: %s){pageInfo{endCursor hasNextPage}edges{node{id title}}}}"
}
""" % (pipeId, end_cursor)
I'm having trouble posting mutations with GraphQL and Python Requests.
My function looks like:
import requests
from django.shortcuts import render

def create(request):
    access_token = 'REDACTED'
    headers = {
        "X-Shopify-Storefront-Access-Token": access_token
    }
    mutation = """
    {
      checkoutCreate(input: {
        lineItems: [{ variantId: "Z2lkOi8vc2hvcGlmeS9Qcm9kdWN0VmFyaWFudC80", quantity: 1 }]
      }) {
        checkout {
          id
          webUrl
          lineItems(first: 5) {
            edges {
              node {
                title
                quantity
              }
            }
          }
        }
      }
    }
    """
    data = requests.post('https://catsinuniform.myshopify.com/api/graphql', json={'mutation': mutation}, headers=headers).json()
    print(data)
    return render(request, 'Stock/create.html', {'create': data})
I'm getting an error in my JSON response saying I have a bad request: "bad_request - Parameter Missing or Invalid".
Even though you're sending a mutation, your request body should still include a query property, the value of which should be the string representing your operation. It's a bit confusing, but informally both queries and mutations are called "queries" (you're still "querying" the server either way). Change your request to:
requests.post('https://catsinuniform.myshopify.com/api/graphql', json={'query': mutation}, headers=headers)
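If it still fails after that, note that GraphQL-level errors come back in the response body even with a 200 status, so it's worth printing them (a small sketch, reusing the mutation and headers from the question):

data = requests.post('https://catsinuniform.myshopify.com/api/graphql', json={'query': mutation}, headers=headers).json()
if data.get('errors'):
    print(data['errors'])  # GraphQL reports problems here rather than via HTTP errors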
Size of data to get: approx. 20,000 documents.
Issue: searching Elasticsearch indexed data using the command below in Python, but not getting any results back.
from pyelasticsearch import ElasticSearch
es_repo = ElasticSearch(settings.ES_INDEX_URL)
search_results = es_repo.search(
query, index=advertiser_name, es_from=_from, size=_size)
If I give a size less than or equal to 10,000 it works fine, but not with 20,000.
Please help me find an optimal solution to this.
PS: on digging deeper into ES, I found this error message:
Result window is too large, from + size must be less than or equal to: [10000] but was [19999]. See the scrolling API for a more efficient way to request large data sets.
For real-time use, the best solution is the search_after query. You only need a date field and another field that uniquely identifies a doc; the _id or _uid field is enough.
Try something like this. In my example I extract all the documents that belong to a single user; the user field has a keyword datatype:
from elasticsearch import Elasticsearch

es = Elasticsearch()
es_index = "your_index_name"
documento = "your_doc_type"
user = "Francesco Totti"

body2 = {
    "query": {
        "term": {"user": user}
    }
}

# Total number of docs for this user, used as the stop condition below
res = es.count(index=es_index, doc_type=documento, body=body2)
size = res['count']

body = {
    "size": 10,
    "query": {
        "term": {"user": user}
    },
    "sort": [
        {"date": "asc"},
        {"_uid": "desc"}
    ]
}

result = es.search(index=es_index, doc_type=documento, body=body)
# The sort values of the last hit become the search_after bookmark
bookmark = [result['hits']['hits'][-1]['sort'][0], str(result['hits']['hits'][-1]['sort'][1])]

body1 = {
    "size": 10,
    "query": {
        "term": {"user": user}
    },
    "search_after": bookmark,
    "sort": [
        {"date": "asc"},
        {"_uid": "desc"}
    ]
}

while len(result['hits']['hits']) < size:
    res = es.search(index=es_index, doc_type=documento, body=body1)
    for el in res['hits']['hits']:
        result['hits']['hits'].append(el)
    # Advance the bookmark to the last hit of the page just fetched
    bookmark = [res['hits']['hits'][-1]['sort'][0], str(res['hits']['hits'][-1]['sort'][1])]
    body1 = {
        "size": 10,
        "query": {
            "term": {"user": user}
        },
        "search_after": bookmark,
        "sort": [
            {"date": "asc"},
            {"_uid": "desc"}
        ]
    }
Then you will find all the docs appended to the result var.
If you would like to use the scroll query - docs here:
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
es_index = "your_index_name"
documento = "your_doc_type"
user = "Francesco Totti"

body = {
    "query": {
        "term": {"user": user}
    }
}

res = helpers.scan(
    client=es,
    scroll='2m',
    query=body,
    index=es_index)

for i in res:
    print(i)
Probably it's an Elasticsearch constraint: the index.max_result_window index setting, which defaults to 10,000.
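If you really do want from + size to reach 20,000 (at the cost of more memory per search), the limit can be raised per index. A sketch with the elasticsearch-py client (the index name is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch()
# Raise the window for one index; prefer search_after or scroll for large exports
es.indices.put_settings(
    index="your_index_name",
    body={"index": {"max_result_window": 20000}})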
How do you use the "suggest" feature in pyes? I can't seem to figure it out due to the poor documentation. Could someone provide a working example? Nothing I've tried appears to work. In the docs it's listed under query, but using:
query = Suggest(fields="fieldname")
connectionobject.search(query=query)
Since version 5:
_suggest endpoint has been deprecated in favour of using suggest via _search endpoint. In 5.0, the _search endpoint has been optimized for suggest only search requests.
(from https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-suggesters.html)
A better way to do this is to use the search API with the suggest option:
from elasticsearch import Elasticsearch

es = Elasticsearch()
text = 'ra'
suggest_dictionary = {
    "my-entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
query_dictionary = {'suggest': suggest_dictionary}
res = es.search(
    index='auto_sugg',
    doc_type='entity',
    body=query_dictionary)
print(res)
Make sure you have indexed each document with a suggest field:
sample_entity = {
    'id': 'test123',
    'name': 'Ramtin Seraj',
    'title': 'XYZ',
    "suggest": {
        "input": ['Ramtin', 'Seraj', 'XYZ'],
        "output": "Ramtin Seraj",
        "weight": 34  # a prior weight
    }
}
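For the suggest field to work, the index mapping must declare it as a completion field. A minimal sketch (the index and type names match the example above; the rest of the mapping is an assumption about your data):

from elasticsearch import Elasticsearch

es = Elasticsearch()
es.indices.create(index='auto_sugg', body={
    "mappings": {
        "entity": {
            "properties": {
                "name": {"type": "text"},
                "suggest": {"type": "completion"}  # enables the completion suggester
            }
        }
    }
})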
Here is my code which runs perfectly.
from elasticsearch import Elasticsearch

es = Elasticsearch()
text = 'ra'
suggDoc = {
    "entity-suggest": {
        'text': text,
        "completion": {
            "field": "suggest"
        }
    }
}
res = es.suggest(body=suggDoc, index="auto_sugg", params=None)
print(res)
I used the same client mentioned on the Elasticsearch site here.
I indexed the data in the Elasticsearch index using the completion suggester from here.