Extracting certain value from MongoDB using Python


I have a MongoDB database containing a collection with documents like the following:
{
"_id": {
"$oid": "12345"
},
"id": "333555",
"token": [
{
"access_token": "ac_33bc",
"expires_in": 3737,
"token_type": "bearer",
"expires_at": {
"$date": "2021-07-02T13:37:28.123Z"
}
}
]
}
In the following Python script I'm trying to return and print only the access_token, but I can't figure out how to do so. I've tried various methods, none of which worked. I've given the "id" as a parameter:
import logging
import pymongo

def con_mongo():
    try:
        client = pymongo.MongoClient("mongodb://localhost")
        # DB name
        db = client["db1"]
        # Collection
        coll = db["coll1"]
        # 1st method
        x = coll.find({"id": "333555"}, {"token": "access_token"})
        for data in x:
            print(data)
        # 2nd method
        x = coll.find({"id": "333555"})
        tok = x.distinct("access_token")
        # print(x[0])
        for data in tok:
            print(data)
    except Exception:
        logging.exception("Query failed")
It doesn't work this way. If I replace the "access_token" with simply "token" (or remove it), it works, but then I get back all the information in the "token" field, whereas I only need the value of "access_token".

Since access_token is a field inside an element of the token array, you need to qualify its name with the name of the array to access its value properly.
Alternatively, you can first retrieve the whole document and get the desired value through simple list and dict indexing.
So, assuming several documents may share that same id:
x = [doc["token"][0]["access_token"] for doc in coll.find({"id":"333555"})]
The list comprehension above builds a list of the access_token values of all documents matching the given id.
If you only need the first (and maybe only) document with that id, you can use find_one() instead:
x = coll.find_one({"id":"333555"})["token"][0]["access_token"]
# returns ac_33bc
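A minimal sketch that combines the two ideas above: let MongoDB project only the nested field using dot notation, then pull the values out of the (now much smaller) documents in Python. PROJECTION and access_tokens are illustrative names, not part of pymongo.

```python
from typing import Any, Dict, List

# Projection for coll.find(): dot notation reaches into the token array and
# keeps only access_token in each element; "_id": 0 drops the ObjectId.
PROJECTION = {"_id": 0, "token.access_token": 1}


def access_tokens(doc: Dict[str, Any]) -> List[str]:
    """Return every access_token found in a document's token array."""
    return [t["access_token"] for t in doc.get("token", []) if "access_token" in t]
```

With a collection handle this might be used as `for doc in coll.find({"id": "333555"}, PROJECTION): print(access_tokens(doc))`. If you only need the distinct values, pymongo's `coll.distinct("token.access_token", {"id": "333555"})` should return them as a list in a single call.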

token is a list, so you have to reference the list element, e.g.:
x = coll.find({"id": "333555"}, {"token.access_token": 1})
for data in x:
    print(data.get('token')[0].get('access_token'))
prints:
ac_33bc

Related

Python GraphQL API call composition

I've recently started learning Python and I'm having some trouble with a GraphQL API call.
I'm trying to set up a loop to grab all the information using pagination, and my first request is working just fine.
values = """
{"query" : "{organizations(ids:) {pipes {id name phases {id name cards_count cards(first:30){pageInfo{endCursor hasNextPage} edges {node {id title current_phase{name} assignees {name} due_date createdAt finished_at fields{name value filled_at updated_at} } } } } }}}"}
"""
but the second call, which uses the end cursor as a variable, isn't working for me. I assume it's because I don't understand how to properly escape the variable inside the string, but for the life of me I can't figure out how it should be done.
Here's what I've got for it so far...
values = """
{"query" : "{phase(id: """ + phaseID+ """ ){id name cards_count cards(first:30, after:"""\" + pointer + "\"""){pageInfo{endCursor hasNextPage} edges {node {id title assignees {name} due_date createdAt finished_at fields{name value datetime_value updated_at phase_field { id label } } } } } } }"}
"""
The second call, made inside the loop, just returns a 400 Bad Request.
Any help would be greatly appreciated.

As a general rule you should avoid building up queries using string manipulation like this.
In the GraphQL query itself, GraphQL supports variables: placeholders in the query for values you plug in later. You declare the variables at the top of the query and can then reference them anywhere inside it. The query itself, without the JSON wrapper, would look something like:
query = """
query MoreCards($phase: ID!, $cursor: String) {
  phase(id: $phase) {
    id, name, cards_count
    cards(first: 30, after: $cursor) {
      ... CardConnectionData
    }
  }
}
"""
To supply the variable values, pass them as an ordinary dictionary:
variables = {
"phase": phaseID,
"cursor": pointer
}
The actual request body is a straightforward JSON structure. You can construct this as a dictionary too:
body = {
"query": query,
"variables": variables
}
Now you can use the standard json module to format it as a string:
print(json.dumps(body))
or pass it along to something like the requests package that can directly accept the object and encode it for you.
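Putting the variables approach into the question's pagination loop might look like the sketch below. It assumes the endpoint returns the standard {"data": ...} envelope and the phase/cards/pageInfo shape from the question's own query; iter_cards, build_body and the injected post callable are hypothetical names, and injecting post also lets the loop run without a network.

```python
import json
from typing import Callable, Iterator, Optional

# Assumed query: the question's card fields trimmed to what the loop needs.
MORE_CARDS = """
query MoreCards($phase: ID!, $cursor: String) {
  phase(id: $phase) {
    cards(first: 30, after: $cursor) {
      pageInfo { endCursor hasNextPage }
      edges { node { id title } }
    }
  }
}
"""


def build_body(phase_id: str, cursor: Optional[str]) -> str:
    """Serialize the query plus its variables; json.dumps does all escaping."""
    return json.dumps({
        "query": MORE_CARDS,
        "variables": {"phase": phase_id, "cursor": cursor},
    })


def iter_cards(phase_id: str, post: Callable[[str], dict]) -> Iterator[dict]:
    """Yield card nodes page by page until pageInfo.hasNextPage is false.

    `post` sends a JSON body and returns the decoded JSON response.
    """
    cursor = None  # first page: a null cursor (assumed to mean "no after")
    while True:
        result = post(build_body(phase_id, cursor))
        cards = result["data"]["phase"]["cards"]
        for edge in cards["edges"]:
            yield edge["node"]
        if not cards["pageInfo"]["hasNextPage"]:
            break
        cursor = cards["pageInfo"]["endCursor"]
```

With requests, post could be `lambda body: requests.post(api_url, data=body, headers={"Content-Type": "application/json"}).json()`. The point of the design is that the cursor never gets spliced into the query text, so there is nothing to escape by hand.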

I had a similar situation where I had to aggregate data by paginating through a GraphQL endpoint. The solution above didn't work that well for me.
To start, my header config for GraphQL was like this:
headers = {
"Authorization":f"Bearer {token}",
"Content-Type":"application/graphql"
}
For my query string, I used a triple-quoted string with a variable placeholder:
user_query = """
{
  user(
    limit: 100,
    page: $page,
    sort: [{field: "email", order: "ASC"}]
  ) {
    list {
      email,
      count
    }
  }
}
"""
Then I looped over the pages:
import requests

for page in range(1, 9):
    # substitute the page number for the $page placeholder
    formatted_query = user_query.replace("$page", f"{page}")
    response = requests.post(api_url, data=formatted_query,
                             headers=headers)
    status_code, payload = response.status_code, response.json()

Update nested map dynamodb

I have a DynamoDB table with an attribute containing a nested map (a list of maps), and I would like to update a specific inventory item, selected by a filter that matches exactly one element of that list.
How can I write an update expression that sets location to "in place three" for the item with name=opel whose tags include "x1" (and possibly also "f3")?
This should only update the location attribute of the first list element.
{
"inventory": [
{
"location": "in place one", # I want to update this
"name": "opel",
"tags": [
"x1",
"f3"
]
},
{
"location": "in place two",
"name": "abc",
"tags": [
"a3",
"f5"
]
}],
"User" :"test"
}

Updated Answer - based on updated question statement
You can update attributes in a nested map using update expressions such that only a part of the item gets updated (i.e. DynamoDB applies the equivalent of a patch to your item), but, because DynamoDB is a document database, all operations (Put, Get, Update, Delete etc.) work on the item as a whole.
So, in your example, assuming User is the partition key and that there is no sort key (I didn't see any attribute that could be a sort key in that example), an Update request might look like this:
table.update_item(
    Key={
        'User': 'test'
    },
    UpdateExpression="SET #inv[0].#loc = :locVal",
    ExpressionAttributeNames={
        '#inv': 'inventory',
        '#loc': 'location'
    },
    ExpressionAttributeValues={
        ':locVal': 'in place three',
    },
)
That said, you do have to know what the item schema looks like and which attributes within the item should be updated exactly.
DynamoDB does NOT have a way to operate on sub-items. That is, there is no way to tell Dynamo to execute an operation such as "update item, set the 'location' property of the elements of the 'inventory' array that have a 'name' property equal to 'opel'".
This is probably not the answer you were hoping for, but it is what's available today. You may be able to get closer to what you want by changing the schema a bit.
If you need to reference the sub-items by name, perhaps storing something like:
{
"inventory": {
"opel": {
"location": "in place one", # I want to update this
"tags": [ "x1", "f3" ]
},
"abc": {
"location": "in place two",
"tags": [ "a3", "f5" ]
}
},
"User" :"test"
}
Then your update request would be:
table.update_item(
    Key={
        'User': 'test'
    },
    UpdateExpression="SET #inv.#brand.#loc = :locVal",
    ExpressionAttributeNames={
        '#inv': 'inventory',
        '#loc': 'location',
        '#brand': 'opel'
    },
    ExpressionAttributeValues={
        ':locVal': 'in place three',
    },
)
But YMMV, as even this has limitations: you are limited to identifying inventory items by name (i.e. you still can't say "update the inventory item with tag 'x1'").
Ultimately you should carefully consider why you need Dynamo to perform these complex operations for you as opposed to you being specific about what you want to update.
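One possible workaround for that limitation, sketched under the assumptions of the example (User is the partition key, inventory is a list of maps): read the item, find the index of the matching element in Python, then update that index with a ConditionExpression guard, so the write fails instead of hitting the wrong element if the list changed in between. build_inventory_update is a hypothetical helper; the dict it returns is meant to be unpacked into boto3's Table.update_item.

```python
from typing import Any, Dict


def build_inventory_update(item: Dict[str, Any], name: str, tag: str,
                           new_location: str) -> Dict[str, Any]:
    """Build update_item kwargs that set `location` on the first inventory
    element whose name matches and whose tags contain `tag`."""
    for index, entry in enumerate(item.get("inventory", [])):
        if entry.get("name") == name and tag in entry.get("tags", []):
            break
    else:
        raise LookupError(f"no inventory entry named {name!r} with tag {tag!r}")

    return {
        "Key": {"User": item["User"]},
        # The list index is computed client-side; DynamoDB cannot search for it.
        "UpdateExpression": f"SET #inv[{index}].#loc = :locVal",
        # Reject the write if the element at that index is no longer `name`.
        "ConditionExpression": f"#inv[{index}].#nm = :name",
        "ExpressionAttributeNames": {"#inv": "inventory", "#loc": "location", "#nm": "name"},
        "ExpressionAttributeValues": {":locVal": new_location, ":name": name},
    }
```

Usage might be `item = table.get_item(Key={"User": "test"})["Item"]` followed by `table.update_item(**build_inventory_update(item, "opel", "x1", "in place three"))`. If the condition fails, boto3 raises a ClientError with the ConditionalCheckFailedException code, and you can re-read the item and retry. The condition only re-checks the name, not the tags; extend it if that matters for your data.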

You can update the nested map as follows.
First create an empty attribute of type map. In this example, graph is the empty attribute.
import boto3

dynamodb = boto3.resource('dynamodb')
dynamoTable = dynamodb.Table('abc')
dynamoTable.put_item(
    Item={
        'email': email_add,
        'graph': {},
    }
)
Then update the nested map as follows:
brand_name = 'opel'
dynamoTable.update_item(
    Key={
        'email': email_add,
    },
    UpdateExpression="SET #Graph.#brand = :name",
    ExpressionAttributeNames={
        '#Graph': 'graph',
        '#brand': str(brand_name),
    },
    ExpressionAttributeValues={
        ':name': {
            "location": "in place two",
            'tag': {
                'graph_type': 'a3',
                'graph_title': 'f5'
            }
        }
    }
)

Updating Mike's answer, because that way doesn't work any more (at least for me).
It works like this now (note the UpdateExpression and ExpressionAttributeNames):
table.update_item(
    Key={
        'User': 'test'
    },
    UpdateExpression="SET inv.#brand.loc = :locVal",
    ExpressionAttributeNames={
        '#brand': 'opel'
    },
    ExpressionAttributeValues={
        ':locVal': 'in place three',
    },
)
Whatever goes in Key={} is always the partition key (and the sort key, if any).
EDIT:
It seems this only works with properties nested two levels deep. In that case you use ExpressionAttributeNames only for the "middle" property (in this example, #brand in inv.#brand.loc). I'm not yet sure what the real rule is.

A DynamoDB UpdateExpression does not search the table for matching items the way SQL does (where you can update all rows that match some condition). To update an item you first need to identify it and get its primary key (the partition key, plus the sort key if the table has one); if many items match your criteria, you need to update them one by one.
The remaining problem for nested objects is building the UpdateExpression, ExpressionAttributeValues and ExpressionAttributeNames to pass to the DynamoDB Update API.
I use a recursive function to update nested objects in DynamoDB. You asked for Python but I use JavaScript; I think the code is easy to follow and to implement in Python:
https://gist.github.com/crsepulv/4b4a44ccbd165b0abc2b91f76117baa5
/**
 * Recursive function to get UpdateExpression, ExpressionAttributeValues & ExpressionAttributeNames to update a nested object on DynamoDB.
 * All levels of the nested object must already exist in DynamoDB; this only updates the value, it does not create the branch.
 * Only works with objects of objects, not tested with arrays.
 * @param obj , the object to update.
 * @param k , the seed; can be any value, it only matters on the last iteration.
 */
function getDynamoExpression(obj, k) {
  const key = Object.keys(obj);
  let UpdateExpression = 'SET ';
  let ExpressionAttributeValues = {};
  let ExpressionAttributeNames = {};
  let response = {
    UpdateExpression: ' ',
    ExpressionAttributeNames: {},
    ExpressionAttributeValues: {}
  };
  //https://stackoverflow.com/a/16608074/1210463
  /**
   * true when input is object, this means on all levels except the last one.
   */
  if (((!!obj) && (obj.constructor === Object))) {
    response = getDynamoExpression(obj[key[0]], key);
    UpdateExpression = 'SET #' + key + '.' + response['UpdateExpression'].substring(4); //substring deletes 'SET ' for the mid level values.
    ExpressionAttributeNames = {['#' + key]: key[0], ...response['ExpressionAttributeNames']};
    ExpressionAttributeValues = response['ExpressionAttributeValues'];
  } else {
    // two spaces after SET: substring(4) keeps ' = :k', so the parent gets ". " to clean up below
    UpdateExpression = 'SET  = :' + k;
    ExpressionAttributeValues = {
      [':' + k]: obj
    }
  }
  //removes trailing dot on the last level
  if (UpdateExpression.includes(". ")) {
    UpdateExpression = UpdateExpression.replace(". ", "");
  }
  return {UpdateExpression, ExpressionAttributeValues, ExpressionAttributeNames};
}
//you can try many levels.
const obj = {
  level1: {
    level2: {
      level3: {
        level4: 'value'
      }
    }
  }
}
console.log(getDynamoExpression(obj));

I had the same need.
Hope this code helps. You only need to invoke compose_update_expression_attr_name_values passing the dictionary containing the new values.
import random
from typing import Tuple


def compose_update_expression_attr_name_values(data: dict) -> Tuple[str, dict, dict]:
    """ Constructs UpdateExpression, ExpressionAttributeNames, and ExpressionAttributeValues for updating an entry of a DynamoDB table.
    :param data: the dictionary of attribute_values to be updated
    :return: a tuple (UpdateExpression: str, ExpressionAttributeNames: dict(str: str), ExpressionAttributeValues: dict(str: str))
    """
    # prepare recursion input
    expression_list = []
    value_map = {}
    name_map = {}
    # navigate the dict and fill expressions and dictionaries
    _rec_update_expression_attr_name_values(data, "", expression_list, name_map, value_map)
    # compose update expression from single paths
    expression = "SET " + ", ".join(expression_list)
    return expression, name_map, value_map


def _rec_update_expression_attr_name_values(data: dict, path: str, expressions: list, attribute_names: dict,
                                            attribute_values: dict):
    """ Recursively navigates the input and injects contents into expressions, names, and attribute_values.
    :param data: the data dictionary with updated data
    :param path: the navigation path in the original data dictionary to this recursive call
    :param expressions: the list of update expressions constructed so far
    :param attribute_names: a map associating "expression attribute name identifiers" to their actual names in ``data``
    :param attribute_values: a map associating "expression attribute value identifiers" to their actual values in ``data``
    :return: None, since ``expressions``, ``attribute_names``, and ``attribute_values`` get updated during the recursion
    """
    for k in data.keys():
        # generate non-ambiguous identifiers
        rdm = random.randrange(0, 1000)
        attr_name = f"#k_{rdm}_{k}"
        while attr_name in attribute_names.keys():
            rdm = random.randrange(0, 1000)
            attr_name = f"#k_{rdm}_{k}"
        attribute_names[attr_name] = k
        _path = f"{path}.{attr_name}"
        # recursion
        if isinstance(data[k], dict):
            # recursive case
            _rec_update_expression_attr_name_values(data[k], _path, expressions, attribute_names, attribute_values)
        else:
            # base case
            attr_val = f":v_{rdm}_{k}"
            attribute_values[attr_val] = data[k]
            expression = f"{_path} = {attr_val}"
            # remove the initial "."
            expressions.append(expression[1:])
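If the random suffixes get in the way (for example, when you want to assert on the generated expression in a test), a counter can play the same role. This is a variant sketch, not the answer's code: build_update is a hypothetical name and, like the original, it assumes every parent map already exists in the stored item.

```python
from typing import Any, Dict, List, Tuple


def build_update(data: Dict[str, Any]) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (UpdateExpression, ExpressionAttributeNames,
    ExpressionAttributeValues) setting every leaf value of `data`."""
    names: Dict[str, str] = {}
    values: Dict[str, Any] = {}
    clauses: List[str] = []
    counter = 0

    def walk(node: Dict[str, Any], path: List[str]) -> None:
        nonlocal counter
        for key, value in node.items():
            # one name placeholder per path segment, numbered in visit order
            placeholder = f"#n{counter}"
            counter += 1
            names[placeholder] = key
            new_path = path + [placeholder]
            if isinstance(value, dict):
                walk(value, new_path)
            else:
                value_placeholder = f":v{len(values)}"
                values[value_placeholder] = value
                clauses.append(f"{'.'.join(new_path)} = {value_placeholder}")

    walk(data, [])
    return "SET " + ", ".join(clauses), names, values
```

The three results map directly onto update_item's UpdateExpression, ExpressionAttributeNames and ExpressionAttributeValues arguments. An empty input yields a bare "SET ", which DynamoDB would reject, so callers should skip the update in that case.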

Python - Search and export information from JSON

This is the structure of my JSON file:
{
  "client1": {
    "description": "blabla",
    "contact name": "",
    "contact email": "",
    "third party organisation": "",
    "third party contact name": "",
    "third party contact email": "",
    "ranges": [
      "1.1.1.1",
      "2.2.2.2",
      "3.3.3.3"
    ]
  },
  "client2": {
    "description": "blabla",
    "contact name": "",
    "contact email": "",
    "third party organisation": "",
    "third party contact name": "",
    "third party contact email": "",
    "ranges": [
      "4.4.4.4",
      "2.2.2.2"
    ]
  }
}
I've seen ways to export specific parts of this JSON file, but not everything. Basically, all I want to do is search through the file using user input.
What I'm struggling with is how to actually use the user input to search for and print everything under either client1 or client2, depending on the input. I'm sure this is only one or two lines of code but I can't figure it out; I'm new to Python. This is my code:
import json

data = json.load(open('clients.json'))

def client():
    searchq = input('Client to export: '.capitalize())
    search = ('""'+searchq+'"')
    a = open('Log.json', 'a+')
    a.write('Client: \n')

client()

This should get you going:
import json

# Safely open the file and load the data into a dictionary
with open('clients.json', 'rt') as dfile:
    data = json.load(dfile)
# Ask for the name of the client
query = input('Client to export: ')
# Show the corresponding entry if it exists,
# otherwise show a message
print(data.get(query, 'Not found'))
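The question's code calls .capitalize() on the prompt rather than on the answer, and the keys in clients.json are lower-case, so an exact lookup is easy to miss. A small sketch that matches the client name case-insensitively and formats the entry for appending to Log.json; find_client and format_entry are hypothetical helpers, and the log layout simply mirrors the question's "Client:" line.

```python
import json
from typing import Any, Dict, Optional, Tuple


def find_client(data: Dict[str, Any], query: str) -> Optional[Tuple[str, Any]]:
    """Return (name, entry) for the client matching `query`, ignoring case
    and surrounding whitespace, or None if there is no such client."""
    wanted = query.strip().lower()
    for name, entry in data.items():
        if name.lower() == wanted:
            return name, entry
    return None


def format_entry(name: str, entry: Any) -> str:
    """Render one client as text suitable for appending to a log file."""
    return f"Client: {name}\n{json.dumps(entry, indent=2)}\n"
```

For example, `match = find_client(data, input('Client to export: '))`, and if match is not None, `with open('Log.json', 'a') as log: log.write(format_entry(*match))`.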

I'll preface this by saying this is 100% a drive-by answer, but one thing you could do is have your user specify the 'path' to the key in the dictionary/JSON structure in a dot-delimited format, then implement a recursive function that looks up the value under that path, like so:
def get(query='', default=None, fragment=None):
    """
    Recursive function which returns the value of the terminal
    key of the query string supplied, or if no query
    is supplied returns the whole fragment (dict).
    Query string should take the form: 'each.item.is.a.key', allowing
    the user to retrieve the value of a key nested within the fragment to
    an arbitrary depth.
    :param query: String representation of the path to the key for which
    the value should be retrieved
    :param default: If specified, returned instead of None when the query is invalid
    :param fragment: The dictionary to inspect
    :return: value of the specified key or fragment if no query is supplied
    """
    if not query:
        return fragment
    query = query.split('.')
    try:
        key = query.pop(0)
        try:
            if isinstance(fragment, dict) and fragment:
                # use an int key if the dict's keys are ints
                key = int(key) if isinstance(next(iter(fragment)), int) else key
            else:
                # sequences (lists, tuples) are indexed by position
                key = int(key)
        except ValueError:
            pass
        fragment = fragment[key]
        query = '.'.join(query)
    except (IndexError, KeyError, TypeError):
        return default
    if not fragment:
        return fragment
    return get(query=query, default=default, fragment=fragment)
There are going to be a million people who come by here with better suggestions than this and there are doubtless many improvements to be made to this function as well, but since I had it lying around I thought I'd put it here, at least as a starting point for you.
Note:
fragment should probably be made a positional argument. It isn't because I had to rip out some application-specific context (it used to have a sensible default), and I didn't want to start rewriting things, so I leave that up to you.
You can do some cool stuff with this function, given some data:
d = {
'woofage': 1,
'woofalot': 2,
'wooftastic': ('woof1', 'woof2', 'woof3'),
'woofgeddon': {
'woofvengers': 'infinity woof'
}
}
Try these:
get(fragment=d, query='woofage')
get(fragment=d, query='wooftastic')
get(fragment=d, query='wooftastic.0')
get(fragment=d, query='woofgeddon.woofvengers')
get(fragment=d, query='woofalistic', default='ultraWOOF')
Bon voyage!

Load the JSON into a dict, then look up the key (topic) you want and read or write it:
import json
r = {'is_claimed': True, 'rating': 3.5}
r = json.dumps(r)  # Here you have JSON: {"is_claimed": true, "rating": 3.5}
JSON to dict:
loaded_r = json.loads(r)  # {'is_claimed': True, 'rating': 3.5}
print(r)  # print the JSON string
print(loaded_r)  # print the dict
Read the topic:
value = loaded_r['is_claimed']  # read the desired topic
print(value)  # True
Overwrite the topic:
loaded_r['is_claimed'] = False
Applied to your clients file (loaded into data with json.load), the same indexing reads a nested field:
print(data['client1']['description'])

Traversing json array in python

I'm using urllib.request.urlopen to get a JSON response that looks like this:
{
"batchcomplete": "",
"query": {
"pages": {
"76972": {
"pageid": 76972,
"ns": 0,
"title": "Title",
"thumbnail": {
"original": "https://linktofile.com"
}
}
}
}
}
The relevant code to get the response:
response = urllib.request.urlopen("https://example.com?title="+object.title)
data = response.read()
encoding = response.info().get_content_charset('utf-8')
json_object = json.loads(data.decode(encoding))
I'm trying to retrieve the value of "original", but I'm having a hard time getting there.
I can do print(json_object['query']['pages']), but once I do print(json_object['query']['pages'][0]) I get KeyError: 0.
How would I be able to, with python retrieve the value of original?

Do this instead:
my_content = json_object['query']['pages']['76972']['thumbnail']['original']
The reason is that you only index with [0] when the object is a list. In your case every level is a dict, so you need to specify a key instead of an index.
If the page number is dynamic, you can do:
page_content = json_object['query']['pages']
for content in page_content.values():
    my_content = content['thumbnail']['original']
where my_content is the required information.

[0] looks up the key 0, which doesn't exist. Assuming you don't always know the key of the page, try this:
pages = json_object['query']['pages']
for key, value in pages.items():  # this is python3
    original = value['thumbnail']['original']
Otherwise, if you do know the key (which appears to be the pageid), you can simply grab it directly:
json_object['query']['pages']['76972']['thumbnail']['original']

You can iterate over keys:
for page_no in json_object['query']['pages']:
    page_data = json_object['query']['pages'][page_no]
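Putting the answers together, here is a small sketch that walks every page without knowing its id up front and skips pages that have no thumbnail (an assumption about the API, since the sample response always has one). thumbnail_urls is a hypothetical helper that takes the already-decoded json_object.

```python
from typing import Any, Dict, List


def thumbnail_urls(json_object: Dict[str, Any]) -> List[str]:
    """Collect thumbnail.original from every page in query.pages."""
    pages = json_object.get("query", {}).get("pages", {})
    urls = []
    for page in pages.values():  # page ids are dict keys, not list indices
        original = page.get("thumbnail", {}).get("original")
        if original is not None:
            urls.append(original)
    return urls
```

Using .get at each level means a missing "query", "pages" or "thumbnail" yields an empty result instead of a KeyError.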

aggregate a field in elasticsearch-dsl using python

Can someone tell me how to write Python statements that will aggregate (sum and count) stuff about my documents?
SCRIPT
from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="attendance")
s = s.execute()
for tag in s.aggregations.per_tag.buckets:
    print(tag.key)
OUTPUT
File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
'%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Response' object has no attribute 'aggregations'
What is causing this? Is the "aggregations" keyword wrong? Is there some other package I need to import? If a document in the "attendance" index has a field called emailAddress, how would I count which documents have a value for that field?

First of all, I now notice that what I wrote above actually has no aggregations defined. The documentation on how to use this is not very readable for me, so I'll expand on what I wrote above. I'm changing the index name to make for a nicer example.
from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="airbnb", doc_type="sleep_overs")
# Don't execute yet: the aggregation has to be added to the Search first.
# invalid! You haven't defined an aggregation.
#for tag in s.aggregations.per_tag.buckets:
#    print(tag.key)
# Lets make an aggregation
# 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator
# 'field' is also a keyword, and 'house_number' is a field in our ES index
s.aggs.bucket('by_house', 'terms', field='house_number', size=0)
Above we're creating one bucket per house number, so the key of each bucket will be a house number. Elasticsearch (ES) always gives the count of documents that fall into each bucket. size=0 means to give us all buckets, since by default ES returns only the top 10 (or whatever your dev set it up to do). Note that size=0 only works on older Elasticsearch versions; since 5.0 a terms aggregation rejects it, so there you have to pass an explicit upper bound instead.
# This runs the query.
s = s.execute()
# let's see what's in our results
print(s.hits.total)
print(s.aggregations.by_house.buckets)
for item in s.aggregations.by_house.buckets:
    print(item.doc_count)
My mistake before was thinking an Elasticsearch query had aggregations by default. You define them yourself, then execute the search, and the response can then be split by the aggregations you defined.
The equivalent raw request (as you would send it with curl) should look like this:
NOTE: I use SENSE, an Elasticsearch extension for Google Chrome. In SENSE you can use // to comment things out.
POST /airbnb/sleep_overs/_search
{
// the size 0 here actually means to not return any hits, just the aggregation part of the result
"size": 0,
"aggs": {
"by_house": {
"terms": {
// the size 0 here means to return all results, not just the the default 10 results
"field": "house_number",
"size": 0
}
}
}
}
Workaround: someone on the elasticsearch-dsl Git repository told me to forget translating and just use this method. It's simpler, and you can write the tough parts as raw request bodies, the way you would with curl. That's why I call it a workaround.
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])
s = Search(using=client, index="airbnb", doc_type="sleep_overs")
# just paste the request body you tested with curl here
body = {
"size": 0,
"aggs": {
"by_house": {
"terms": {
"field": "house_number",
"size": 0
}
}
}
}
s = Search.from_dict(body)
s = s.index("airbnb")
s = s.doc_type("sleep_overs")
body = s.to_dict()
t = s.execute()
for item in t.aggregations.by_house.buckets:
    # item.key will be the house number
    print(item.key, item.doc_count)
I hope this helps. I now design every query as a raw request first (with curl), then use Python statements to peel away at the results to get what I want. This helps for aggregations with multiple levels (sub-aggregations).

I do not have the rep to comment yet, but wanted to make a small fix to Matthew's comment on VISQL's answer regarding from_dict. If you want to keep the search properties, use update_from_dict rather than from_dict.
According to the docs, from_dict creates a new search object, whereas update_from_dict modifies it in place, which is what you want if the Search already has properties such as index, using, etc.
So you would want to declare the query body before the search and then create the search like this:
query_body = {
"size": 0,
"aggs": {
"by_house": {
"terms": {
"field": "house_number",
"size": 0
}
}
}
}
s = Search(using=client, index="airbnb", doc_type="sleep_overs").update_from_dict(query_body)
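The question also asked how to count documents that have a value for emailAddress, which none of the answers show. A hedged sketch: a filter aggregation wrapping an exists query returns that count as the bucket's doc_count. The request body is plain data, so it can be checked without a cluster; with_email, email_count_body and read_email_count are illustrative names.

```python
from typing import Any, Dict


def email_count_body(field: str = "emailAddress") -> Dict[str, Any]:
    """Request body: no hits, one filter aggregation over documents where
    `field` exists (i.e. has at least one non-null value)."""
    return {
        "size": 0,
        "aggs": {
            "with_email": {
                "filter": {"exists": {"field": field}}
            }
        },
    }


def read_email_count(response: Dict[str, Any]) -> int:
    """Extract the count from a raw search response dict."""
    return response["aggregations"]["with_email"]["doc_count"]
```

With elasticsearch-dsl this could be run as `Search(using=client, index="attendance").update_from_dict(email_count_body()).execute()`, after which `response.aggregations.with_email.doc_count` should hold the number.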
