Using Python elasticsearch_dsl with nested objects

I want to use elasticsearch_dsl with Python for the following queries:
from elasticsearch import Elasticsearch
es_server = 'my_server_name'
es_port = 9200
es_index_name = 'my_index_name'
es_connection = Elasticsearch([{'host': es_server, 'port': es_port}])
# Query 1: term query on a regular (non-nested) field
es_query = '{"query":{"bool":{"must":[{"term":{"data.party.fullName":"john do"}}],"must_not":[],"should":[]}},"from":0,"size":1,"sort":[],"facets":{}}'
my_results = es_connection.search(index=es_index_name, body=es_query)
print(my_results)
# Query 2: term query on a field inside the nested party.phoneList objects
es_query = '{"query": {"nested" : {"filter" : {"term" : {"party.phoneList.phoneFullNumber" : "4081234567"}},"path" : "party.phoneList"}},"from" :0,"size" : 1}'
my_results = es_connection.search(index=es_index_name, body=es_query)
print(my_results)
I was able to write the first query with the DSL, but I am not sure about the second one:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
client = Elasticsearch('my_server:9200')
s = Search(using=client, index="my_index").query("term", fullName="john do")
response = s.execute()
print(response)
I am not sure how to write the query for the nested field party.phoneList.phoneFullNumber with the DSL. I am new to ES, so I could not figure out how to handle nested objects.
I looked at https://github.com/elastic/elasticsearch-dsl-py/issues/28 but could not quite figure it out.
Thanks!

Use __ instead of . in the field name (a Python keyword argument cannot contain a dot) and wrap the term query in a nested query:
s = Search(using=client, index="my_index")
s = s.query(
    "nested",
    path="party.phoneList",
    query=Q("term", party__phoneList__phoneFullNumber="4081234567"),
)
response = s.execute()
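To check what this sends to the server, it can help to build the equivalent raw request body by hand and compare it with s.to_dict(). The sketch below assumes, as the answer says, that each __ in a keyword argument simply becomes a .; nested_term_body and kwarg_to_field are hypothetical helpers, not part of elasticsearch_dsl. The resulting dict could be passed as body= to es_connection.search().

```python
def kwarg_to_field(name):
    # Same convention the answer relies on: "__" in a keyword argument means "."
    return name.replace("__", ".")


def nested_term_body(path, **term):
    # A term query takes exactly one field=value pair
    (field, value), = term.items()
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {kwarg_to_field(field): value}},
            }
        }
    }


body = nested_term_body(
    "party.phoneList", party__phoneList__phoneFullNumber="4081234567"
)
print(body)
```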

Related

Adding multiple values for a key in a dictionary for Python

I am trying to retrieve multiple indcode values using the code below. However, I get an error saying "cannot specify a field more than once". Can anyone please help?
import requests
URL = "https://data.colorado.gov/resource/cjkq-q9ih.json"
D = dict()
D["area"] = 57
D["indcode"] = 10, 23, 81
document = requests.get(URL, D)
print(document.request.url)
Error message received:
{
"error" : true,
"message" : "cannot specify a field more than once"
}
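Requesting several indcode values by repeating the parameter is not possible at that URL: the endpoint rejects a field that is specified more than once, and that would have to be supported by its developers. You can work around it another way.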
To retrieve multiple indcode values, fetch all the data for the area in one request from https://data.colorado.gov/resource/cjkq-q9ih.json?area=57 and filter it on the client side. Try this:
import requests
URL = "https://data.colorado.gov/resource/cjkq-q9ih.json"
D = dict()
D["area"] = 57
# D["indcode"] = 10,23,81
indcode = [10,23,81]
document = requests.get(URL, D)
data = document.json()
filteredData = list(filter(lambda p: int(p['indcode']) in indcode, data))
print(filteredData)
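For reference, the error in the question comes from how the parameters are encoded: D["indcode"] = 10,23,81 stores a tuple, and requests sends a list or tuple value as a repeated key. The standard library's urlencode with doseq=True builds the same kind of query string, which shows that the server receives indcode three times:

```python
from urllib.parse import urlencode

params = {"area": 57, "indcode": (10, 23, 81)}  # same as D in the question
# doseq=True expands a sequence value into one key=value pair per element
query_string = urlencode(params, doseq=True)
print(query_string)  # area=57&indcode=10&indcode=23&indcode=81
```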

passing parameters in neo4j using python

I want to pass parameters to a CREATE statement from Python, e.g.:
n = "abc"
a = 1234
cqlCreate = "CREATE (cornell:university { name: $n,yob:$a})"
but it doesn't work. Any suggestions?
It depends on the Python driver you use to connect to Neo4j (see https://neo4j.com/developer/python/ for the list of available drivers). If you are using the official neo4j-driver, the query you wrote is correct; you just need to pass the values for $n and $a when you run it. To execute the Cypher query, you could do something like this:
from neo4j import GraphDatabase
uri = "bolt://localhost:7687"  # placeholder: your Neo4j URI
user = "neo4j"  # placeholder: your Neo4j username
password = "password"  # placeholder: your Neo4j password
n = "abc"
a = 1234
query = "CREATE (cornell:university { name: $n,yob:$a})"
driver = GraphDatabase.driver(uri, auth=(user, password))
# The values are sent as query parameters matching $n and $a in the query
with driver.session() as session:
    session.run(query, n=n, a=a)
driver.close()
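Although âńōŋŷXmoůŜ's answer will probably work, it is not the recommended way to do it: pasting values into the query text exposes you to Cypher injection and prevents Neo4j from reusing cached query plans, so pass them as parameters instead.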
See also:
https://neo4j.com/docs/api/python-driver/current/api.html
You can use f-strings in Python, as in the example below. Note that:
1. You need to use {{ to escape { (and }} to escape }).
2. You need to use \ as the escape character for ".
n = "abc"
a = 1234
cqlCreate = f"CREATE (cornell:university {{name: \"{n}\", yob: {a}}})"
print (cqlCreate)
Result:
CREATE (cornell:university {name: "abc", yob: 1234})
reference: https://www.python.org/dev/peps/pep-0498/
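One reason parameters are preferred over this approach, shown as a pure-Python sketch that needs no driver (create_query_fstring is a hypothetical helper wrapping the f-string above): a value containing a double quote changes the structure of the generated Cypher, whereas with session.run(query, n=n, a=a) the query text stays fixed and the values are sent separately.

```python
def create_query_fstring(n, a):
    # The f-string approach: values are pasted directly into the Cypher text
    return f"CREATE (cornell:university {{name: \"{n}\", yob: {a}}})"


ok = create_query_fstring("abc", 1234)
# A value containing a double quote injects an extra property into the statement
injected = create_query_fstring('abc", admin: "true', 1234)
print(ok)        # CREATE (cornell:university {name: "abc", yob: 1234})
print(injected)  # CREATE (cornell:university {name: "abc", admin: "true", yob: 1234})
```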

List out auto scaling group names with a specific application tag using boto3

I am trying to fetch the Auto Scaling groups whose Application tag value is 'CCC'.
The groups (Application tag value, then group name) are as follows:
gweb  prd-dcc-eap-w2
gweb  prd-dcc-emc
gweb  prd-dcc-ems
CCC   dev-ccc-wer
CCC   dev-ccc-gbg
CCC   dev-ccc-wer
The script below returns a list that includes one ASG without the CCC tag:
#!/usr/bin/python
import boto3
client = boto3.client('autoscaling',region_name='us-west-2')
response = client.describe_auto_scaling_groups()
ccc_asg = []
all_asg = response['AutoScalingGroups']
for i in range(len(all_asg)):
    all_tags = all_asg[i]['Tags']
    for j in range(len(all_tags)):
        if all_tags[j]['Key'] == 'Name':
            asg_name = all_tags[j]['Value']
            # print asg_name
        if all_tags[j]['Key'] == 'Application':
            app = all_tags[j]['Value']
            # print app
        if all_tags[j]['Value'] == 'CCC':
            ccc_asg.append(asg_name)
print(ccc_asg)
The output I get is:
['prd-dcc-ein-w2', 'dev-ccc-hap', 'dev-ccc-wfd', 'dev-ccc-sdf']
However, 'prd-dcc-ein-w2' is an ASG with a different tag ('gweb'), and the last group in the CCC list (dev-ccc-msp-agt-asg) is missing. I need the following output:
dev-ccc-hap-sdf
dev-ccc-hap-gfh
dev-ccc-hap-tyu
dev-ccc-mso-hjk
Am I missing something?
In boto3 you can use paginators with JMESPath filtering to do this efficiently and more concisely.
From the boto3 docs:
JMESPath is a query language for JSON that can be used directly on
paginated results. You can filter results client-side using JMESPath
expressions that are applied to each page of results through the
search method of a PageIterator.
When filtering with JMESPath expressions, each page of results that is
yielded by the paginator is mapped through the JMESPath expression. If
a JMESPath expression returns a single value that is not an array,
that value is yielded directly. If the result of applying the JMESPath
expression to a page of results is a list, then each value of the list
is yielded individually (essentially implementing a flat map).
Here is how it looks in Python, filtering on the value CCP for the Application tag of the Auto Scaling groups:
import boto3
client = boto3.client('autoscaling')
paginator = client.get_paginator('describe_auto_scaling_groups')
page_iterator = paginator.paginate(
    PaginationConfig={'PageSize': 100}
)
filtered_asgs = page_iterator.search(
    'AutoScalingGroups[] | [?contains(Tags[?Key==`{}`].Value, `{}`)]'.format(
        'Application', 'CCP')
)
for asg in filtered_asgs:
    print(asg['AutoScalingGroupName'])
Elaborating on Michal Gasek's answer, here's an option that filters ASGs based on a dict of tag:value pairs.
import boto3
def get_asg_name_from_tags(tags):
    asg_name = None
    client = boto3.client('autoscaling')
    while True:
        paginator = client.get_paginator('describe_auto_scaling_groups')
        page_iterator = paginator.paginate(
            PaginationConfig={'PageSize': 100}
        )
        # Chain one JMESPath filter per tag:value pair
        jmes_filter = 'AutoScalingGroups[]'
        for tag in tags:
            jmes_filter = ('{} | [?contains(Tags[?Key==`{}`].Value, `{}`)]'
                           .format(jmes_filter, tag, tags[tag]))
        filtered_asgs = page_iterator.search(jmes_filter)
        asg = next(filtered_asgs)
        asg_name = asg['AutoScalingGroupName']
        try:
            asgX = next(filtered_asgs)
            asgX_name = asgX['AutoScalingGroupName']
            raise AssertionError('multiple ASG\'s found for {} = {},{}'
                                 .format(tags, asg_name, asgX_name))
        except StopIteration:
            break
    return asg_name
For example:
asg_name = get_asg_name_from_tags({'Env':env, 'Application':'app'})
It expects there to be only one result and checks this by trying to use next() to get another. The StopIteration is the "good" case, which then breaks out of the paginator loop.
I got it working with the script below:
#!/usr/bin/python
import boto3
client = boto3.client('autoscaling',region_name='us-west-2')
response = client.describe_auto_scaling_groups()
ccp_asg = []
all_asg = response['AutoScalingGroups']
for i in range(len(all_asg)):
    all_tags = all_asg[i]['Tags']
    app = False
    asg_name = ''
    for j in range(len(all_tags)):
        # Exact comparisons: ('CCP') is just the string 'CCP', so "in" was a substring test
        if all_tags[j]['Key'] == 'Application' and all_tags[j]['Value'] == 'CCP':
            app = True
        # Relies on the Application tag appearing before the Name tag in the list
        if app:
            if all_tags[j]['Key'] == 'Name':
                asg_name = all_tags[j]['Value']
                ccp_asg.append(asg_name)
print(ccp_asg)
Feel free to ask if you have any doubts.
The right way to do this isn't via describe_auto_scaling_groups at all but via describe_tags, which will allow you to make the filtering happen on the server side.
You can construct a filter that matches the Application tag with any of a number of values:
Filters=[
    {
        'Name': 'key',
        'Values': [
            'Application',
        ]
    },
    {
        'Name': 'value',
        'Values': [
            'CCC',
        ]
    },
],
The results (in Tags in the response) are then all the places where a matching tag is applied to an Auto Scaling group. You will have to make the call repeatedly, passing back NextToken each time one is returned, to go through all pages of results.
Each result includes a ResourceId identifying the ASG the matching tag is applied to; for Auto Scaling tags this is the group name. If you need more than the names, pass them to describe_auto_scaling_groups (AutoScalingGroupNames=[...]).
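A minimal sketch of that loop, assuming client is boto3.client('autoscaling') and that ResourceId in each returned tag is the group name; asg_names_with_tag is a hypothetical helper name. It follows NextToken manually, as described above, until no token is returned.

```python
def asg_names_with_tag(client, key, values):
    """Return names of Auto Scaling groups tagged key=<one of values>.

    `client` is assumed to be boto3.client('autoscaling').
    """
    kwargs = {
        "Filters": [
            {"Name": "key", "Values": [key]},
            {"Name": "value", "Values": list(values)},
        ]
    }
    names = []
    while True:
        response = client.describe_tags(**kwargs)
        for tag in response["Tags"]:
            # For Auto Scaling tags, ResourceId holds the group name
            names.append(tag["ResourceId"])
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # request the next page
    return names
```

Usage would look like asg_names_with_tag(boto3.client('autoscaling'), 'Application', ['CCC']).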
Yet another solution, which in my opinion is simple enough to extend:
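import boto3
client = boto3.client('autoscaling')
search_tags = {"environment": "stage"}
filtered_asgs = []
# Paginate so groups beyond the first page of results are included too
paginator = client.get_paginator('describe_auto_scaling_groups')
for page in paginator.paginate():
    for group in page['AutoScalingGroups']:
        flattened_tags = {
            tag_info['Key']: tag_info['Value']
            for tag_info in group['Tags']
        }
        # Dict item views compare like sets: True if every search tag matches
        if search_tags.items() <= flattened_tags.items():
            filtered_asgs.append(group)
print(filtered_asgs)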
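The symptoms in the question (a gweb group reported as CCC, and the last CCC group missing) can be reproduced offline if each group's Tags list has the Application tag before the Name tag; that ordering is an assumption here, and the sample data is hypothetical. In that case asg_name still holds the previous group's name when the CCC check runs. The sketch compares a simplified version of the question's loop with one that reads all of a group's tags before deciding:

```python
def names_with_tag_buggy(groups, key, value):
    # Simplified question loop: asg_name only changes when a Name tag is seen
    result = []
    asg_name = None
    for group in groups:
        for tag in group["Tags"]:
            if tag["Key"] == "Name":
                asg_name = tag["Value"]
            if tag["Key"] == key and tag["Value"] == value:
                result.append(asg_name)
    return result


def names_with_tag(groups, key, value):
    # Fix: collect all of a group's tags into a dict first
    result = []
    for group in groups:
        tags = {t["Key"]: t["Value"] for t in group["Tags"]}
        if tags.get(key) == value:
            result.append(tags.get("Name"))
    return result


# Hypothetical sample data with Application listed before Name in each group
groups = [
    {"Tags": [{"Key": "Application", "Value": "gweb"}, {"Key": "Name", "Value": "prd-dcc-ems"}]},
    {"Tags": [{"Key": "Application", "Value": "CCC"}, {"Key": "Name", "Value": "dev-ccc-wer"}]},
    {"Tags": [{"Key": "Application", "Value": "CCC"}, {"Key": "Name", "Value": "dev-ccc-gbg"}]},
]
print(names_with_tag_buggy(groups, "Application", "CCC"))  # ['prd-dcc-ems', 'dev-ccc-wer']
print(names_with_tag(groups, "Application", "CCC"))        # ['dev-ccc-wer', 'dev-ccc-gbg']
```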

KeyError accessing inner_hits in elasticsearch_dsl Python client

I've encountered a problem when accessing inner_hits data using the Python elasticsearch_dsl client. Any attempt to use the embedded Response object within meta.inner_hits yields a KeyError on "_type" in the container object. The following code is completely self-contained so anyone should be able to reproduce the same result using Python 2.7 and elasticsearch_dsl 5.0.0:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Index, Mapping, Nested, Search, Q
from elasticsearch_dsl.connections import connections
index = "test_index"
es_con = connections.create_connection(hosts=["localhost:9200"])
es_index = Index(index)
if es_index.exists():
    es_index.delete()
es_index.create()
para_mapping = Mapping("paragraph")
para_mapping.field("sentences", Nested())
para_mapping.save(index)
test_paras = {}
for a in range(2):
    test_paras[a] = {
        "label": "Test Paragraph p{}".format(a),
        "sentences": []
    }
    for b in range(2):
        test_sent = {
            "text": "Test Sentence p{}s{}".format(a, b),
        }
        test_paras[a]["sentences"].append(test_sent)
for idx, para in test_paras.iteritems():
    para_id = "para_id_p{}".format(idx)
    es_con.create(
        index=index,
        doc_type="paragraph",
        id=para_id,
        body=para
    )
es_index.flush()
q_1 = Search(using=es_con).index(index).doc_type('paragraph')
q_2 = q_1 = q_1.query(
    'nested', path='sentences', query=Q('term', **{"sentences.text": "p0s1"}), inner_hits={}
)
q_1 = q_1.execute()
# We got the expected paragraph
print "PARA_ID: ", q_1.hits[0].meta.id
# With all sentences
print "PARA[SENTENCES]: ", q_1.hits[0].sentences
# We can see inner_hits is included in para.meta
print "DIR PARA.META: ", dir(q_1.hits[0].meta)
# And it contains a "sentences" object
print "DIR PARA.META.INNER_HITS: ", dir(q_1.hits[0].meta.inner_hits)
# Of type elasticsearch_dsl.result.Response
print "TYPE PARA.META.INNER_HITS.SENTENCES:", type(q_1.hits[0].meta.inner_hits.sentences)
# That contains a "hits" object
print "DIR PARA.META.INNER_HITS.SENTENCES: ", dir(q_1.hits[0].meta.inner_hits.sentences)
# But every attempted action yields a KeyError: '_type' in result.AttrList()
try:
    print q_1.hits[0].meta.inner_hits.sentences
except KeyError as e:
    print "\nException:", type(e)
    print "Message:", e
    # Uncomment the following line to see full exception detail
    # print dir(q_1.hits[0].meta.inner_hits.sentences.hits)
# The same query using the base-level API
es = Elasticsearch()
results = es.search(index=index, body=q_2.to_dict())
# This works just fine.
print "\nES RESULT:"
print results["hits"]["hits"][0]["inner_hits"]["sentences"]["hits"]["hits"][0]["_source"]["text"]
Is this a bug in the API?
This is a bug: we only account for inner_hits across a parent/child relationship, not for nested documents. I created an issue to track the fix: https://github.com/elastic/elasticsearch-dsl-py/issues/565
Thanks for the report!
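As a workaround while the bug is open, the inner hits can be read from the raw response of the low-level client, as the last lines of the question do. The helper below is a hypothetical sketch (not part of elasticsearch_dsl) that walks hits.hits[].inner_hits.<name>.hits.hits[]._source, returning an empty list for outer hits without inner hits; the sample response is made up but follows that shape.

```python
def inner_hit_sources(raw_response, name):
    """For each outer hit, list the _source of its inner hits called `name`."""
    result = []
    for hit in raw_response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(name, {})
        result.append([h["_source"] for h in inner.get("hits", {}).get("hits", [])])
    return result


# Hypothetical response, shaped like the one es.search() returns in the question
raw = {
    "hits": {
        "hits": [
            {
                "_id": "para_id_p0",
                "inner_hits": {
                    "sentences": {
                        "hits": {"hits": [{"_source": {"text": "Test Sentence p0s1"}}]}
                    }
                },
            }
        ]
    }
}
print(inner_hit_sources(raw, "sentences"))
```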

How to solve "TypeError: expected string or buffer" when importing json data via api?

I'm trying to import JSON data via an API, and use the imported data to construct a DataFrame.
import json
import pandas as pd
import numpy as np
import requests
api_username = 'acb'
api_password = 'efg'
germany_name = 'Germany'
germany_api_url = "https://api.country_data.com/stats/?country=" + germany_name + "&year=2014"
germany_api_resp = requests.get(germany_api_url,auth=(api_username,api_password))
germany_data_json = json.loads(germany_api_resp)
germany_frame = pd.DataFrame(germany_data_json['data']).set_index('tag')
print(germany_frame) shows me the desired DataFrame.
I want to repeat the process for many countries, not just 'Germany', so I created a country object like this:
class Country(object):
    def __init__(self,name):
        self.name = name
        self.api_url = "https://api.country_data.com/stats/?country=" + name + "&year=2014"
        self.api_resp = requests.get(self.api_url,auth=(api_username,api_password))
        self.data_json = json.loads(self.api_resp)
        self.frame = pd.DataFrame(self.data_json['data']).set_index('tag')
When I create my first object, like this:
Germany = Country('Germany')
I get an Error message:
TypeError: expected string or buffer
Can someone help me with this issue?
I don't know which version of Python or requests you're using, but I recommend updating everything. Here is an error I found:
self.data_json = json.loads(self.api_resp)
json.loads() expects a string, but you are passing it the Response object returned by requests, which causes the TypeError. Change it to:
self.data_json = self.api_resp.json()
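A small self-contained illustration of the difference, using a hypothetical FakeResponse class in place of a real requests.Response so it runs without network access: json.loads() only accepts str, bytes or bytearray, so passing the response object raises TypeError (the exact message differs between Python versions), while .json() decodes the body.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for requests.Response (only .text and .json())."""

    def __init__(self, text):
        self.text = text

    def json(self):
        # requests' Response.json() does roughly this: decode the body as JSON
        return json.loads(self.text)


resp = FakeResponse('{"data": [{"tag": "gdp", "value": 1.5}]}')

try:
    json.loads(resp)  # passing the response object itself, as in the question
    error = None
except TypeError as exc:
    error = exc

data = resp.json()  # decode the body instead
print(type(error).__name__)  # TypeError
print(data["data"][0]["tag"])  # gdp
```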
I replaced your API URL with another one because yours is not a valid endpoint, and it works for me.
See ya!
