Create nested maps in DynamoDB - Python

I am trying to create a new map and assign it a new value at the same time.
This is the data format I want to store in my db:
{
    "user_id": 1,
    "project_id": 1,
    "MRR": {
        "NICHE": {
            "define your niche": {
                "vertical": "test",
                "ideal prospect": "He is the best"
            }
        },
        "Environment": {
            "Trend 1": {
                "description": "something"
            },
            "Trend 2": {
                "description": "something else"
            }
        }
    }
}
My code so far for inserting data is:
from boto3.dynamodb.conditions import Attr

def update_dynamo(user_id, project_id, group, sub_type, data):
    # First call: create the (empty) nested map for the group
    dynmoTable.update_item(
        Key={
            "user_id": user_id
        },
        ConditionExpression=Attr("project_id").eq(project_id),
        UpdateExpression="SET MRR.#group = :group_value",
        ExpressionAttributeNames={
            "#group": group
        },
        ExpressionAttributeValues={
            ":group_value": {}
        }
    )
    # Second call: fill in the subgroup inside the map created above
    dynmoTable.update_item(
        Key={
            "user_id": user_id
        },
        ConditionExpression=Attr("project_id").eq(project_id),
        UpdateExpression="SET MRR.#group.#subgroup = :sub_value",
        ExpressionAttributeNames={
            "#group": group,
            "#subgroup": sub_type
        },
        ExpressionAttributeValues={
            ":sub_value": data
        }
    )
data = {
    "description": "world",
}

if __name__ == "__main__":
    update_dynamo(1, 1, "New Category", "Hello", data)
My question is: can these two update_item calls somehow be merged into one?

Sure, you can assign an entire nested "document" to the top-level attribute; you don't need to assign only scalars.
Something like this should work:
dynmoTable.update_item(
    Key={
        "user_id": user_id
    },
    ConditionExpression=Attr("project_id").eq(project_id),
    UpdateExpression="SET MRR.#group = :group_value",
    ExpressionAttributeNames={
        "#group": group
    },
    ExpressionAttributeValues={
        ":group_value": {sub_type: data}
    }
)
Note how you set the "group" attribute to the Python dictionary {sub_type: data}. boto3 will convert this dictionary into the appropriate DynamoDB map attribute, as you expect. You can nest dictionaries and lists inside one another this way and write the whole structure in a single update.
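For completeness, a minimal sketch of the merged function (untested; it assumes dynmoTable is the boto3 Table resource from the question):

from boto3.dynamodb.conditions import Attr

def update_dynamo(user_id, project_id, group, sub_type, data):
    # Single update_item call: the nested map is built client-side and
    # boto3 serializes the whole dictionary into a DynamoDB map attribute.
    dynmoTable.update_item(
        Key={"user_id": user_id},
        ConditionExpression=Attr("project_id").eq(project_id),
        UpdateExpression="SET MRR.#group = :group_value",
        ExpressionAttributeNames={"#group": group},
        ExpressionAttributeValues={":group_value": {sub_type: data}}
    )

update_dynamo(1, 1, "New Category", "Hello", {"description": "world"})

One caveat: SET MRR.#group = :group_value replaces the whole group map, so any existing subgroups under #group are overwritten; the two-call version in the question behaves the same way, because its first call resets the map to {}.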

Related

Python: How to loop through data to access similar keys present inside nested dict

I have an API that returns a very big JSON response.
I want to access similar keys that are present inside the nested dicts.
I'm using the following lines to make a GET request and store the JSON data:
import json
import requests

p25_st_devices = r'https://url_from_where_im_getting_data.com'
header_events = {
    'Authorization': 'Basic random_keys'}
r2 = requests.get(p25_st_devices, headers=header_events)
r2_json = json.loads(r2.content)
A sample of the JSON is as follows:
{
    "next": "value",
    "self": "value",
    "managedObjects": [
        {
            "creationTime": "2021-08-02T10:48:15.120Z",
            "type": " c8y_MQTTdevice",
            "lastUpdated": "2022-03-24T17:09:01.240+03:00",
            "childAdditions": {
                "self": "value",
                "references": []
            },
            "name": "PS_MQTT1",
            "assetParents": {
                "self": "value",
                "references": []
            },
            "self": "value",
            "id": "338",
            "Building": "value"
        },
        {
            "creationTime": "2021-08-02T13:06:09.834Z",
            "type": " c8y_MQTTdevice",
            "lastUpdated": "2021-12-27T12:08:20.186+03:00",
            "childAdditions": {
                "self": "value",
                "references": []
            },
            "name": "FS_MQTT2",
            "assetParents": {
                "self": "value",
                "references": []
            },
            "self": "value",
            "id": "339",
            "c8y_IsDevice": {}
        },
        {
            "creationTime": "2021-08-02T13:06:39.602Z",
            "type": " c8y_MQTTdevice",
            "lastUpdated": "2021-12-27T12:08:20.433+03:00",
            "childAdditions": {
                "self": "value",
                "references": []
            },
            "name": "PS_MQTT3",
            "assetParents": {
                "self": "value",
                "references": []
            },
            "self": "value",
            "id": "340",
            "c8y_IsDevice": {}
        }
    ],
    "statistics": {
        "totalPages": 423,
        "currentPage": 1,
        "pageSize": 3
    }
}
As per my understanding, I can access the name key using r2_json['managedObjects'][0]['name'].
But how do I iterate over this JSON and store all the values of name in an array?
EDIT 1:
Another thing I'm trying to achieve: get all id values from the JSON data and store them in an array, but only for the managedObjects entries whose name starts with PS_.
Therefore, the expected output would be device_id = ['338','340']
You should not just take index [0] of the list; loop over it instead:
all_names = []
for object in r2_json['managedObjects']:
    all_names.append(object['name'])
print(all_names)
edit: Updated answer after OP updated theirs.
For your second question, you can use startswith(). The code is almost the same.
PS_names = []
for object in r2_json['managedObjects']:
    if object['name'].startswith("PS_"):
        PS_names.append(object['id'])  # we append the id if startswith("PS_") returns True
print(PS_names)
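If you prefer list comprehensions, the same two results fit in one line each (same r2_json as above):

all_names = [obj['name'] for obj in r2_json['managedObjects']]
device_id = [obj['id'] for obj in r2_json['managedObjects'] if obj['name'].startswith("PS_")]
# device_id == ['338', '340'] for the sample above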

Very nested JSON with optional fields into pandas dataframe

I have a JSON with the following structure. I want to extract some data into different lists so that I can transform them into a pandas DataFrame.
{
    "ratings": {
        "like": {
            "average": null,
            "counts": {
                "1": {
                    "total": 0,
                    "users": []
                }
            }
        }
    },
    "sharefile_vault_url": null,
    "last_event_on": "2021-02-03 00:00:01",
    ],
    "fields": [
        {
            "type": "text",
            "field_id": 130987800,
            "label": "Name and Surname",
            "values": [
                {
                    "value": "John Smith"
                }
            ],
        {
            "type": "category",
            "field_id": 139057651,
            "label": "Gender",
            "values": [
                {
                    "value": {
                        "status": "active",
                        "text": "Male",
                        "id": 1,
                        "color": "DCEBD8"
                    }
                }
            ],
        {
            "type": "category",
            "field_id": 151333010,
            "label": "Field of Studies",
            "values": [
                {
                    "value": {
                        "status": "active",
                        "text": "Languages",
                        "id": 3,
                        "color": "DCEBD8"
                    }
                }
            ],
        }
    }
}
For example, I create a list
names = []
and when a dict in the "fields" list has "label" equal to "Name and Surname", I append its ["values"][0]["value"], so names now contains "John Smith". I do exactly the same for the "Gender" label and append the value to the list genders.
The above dictionary is contained in a list of dictionaries, so I just have to loop through the list and extract the relevant fields like this:
names = []
genders = []
for r in range(len(users)):
    for i in range(len(users[r].json()["items"])):
        for field in users[r].json()["items"][i]["fields"]:
            if field["label"] == "Name and Surname":
                names.append(field["values"][0]["value"])
            elif field["label"] == "Gender":
                genders.append(field["values"][0]["value"]["text"])
            else:
                pass  # Something else
Here users is a list of responses from the API. Each response's JSON has an items key holding a list of dictionaries, and each of those dictionaries has a fields key whose value is a list of dictionaries for the different fields (like Name and Surname and Gender).
The problem is that the dictionary with "label": "Field of Studies" is optional and is not always present in the list of fields.
How can I check for its presence, appending its value to a list if it is there and None otherwise?
To me it seems that the data you have is not valid JSON. However, if I were you I would try pandas.json_normalize. According to the documentation, this function will put None when it encounters an object that does not contain the requested label.
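If you'd rather keep the plain loops, here is a minimal sketch (assuming the users/items/fields layout from the question; by_label is a hypothetical helper dict introduced here):

names, genders, studies = [], [], []
for r in range(len(users)):
    for item in users[r].json()["items"]:
        # Map each field's label to its first value, so optional labels
        # can be looked up with .get() and default to None.
        by_label = {f["label"]: f["values"][0]["value"] for f in item["fields"]}
        names.append(by_label.get("Name and Surname"))
        gender = by_label.get("Gender")
        genders.append(gender["text"] if gender else None)
        field = by_label.get("Field of Studies")
        studies.append(field["text"] if field else None)

The three lists stay aligned row by row, so they can be passed straight to pandas.DataFrame({"name": names, "gender": genders, "studies": studies}).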

How to write match condition for array values?

I have stored values in multiple variables. Below are the input variables:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
These values change dynamically, and below is my code:
user_posts.aggregate([
    {
        "$match": {
            "$or": [
                {"userid": uid},
                {"userid": {"$eq": disuid}}
            ]
        }
    },
    {
        "$lookup": {
            "from": "user_profile",
            "localField": "userid",
            "foreignField": "_id",
            "as": "details"
        }
    },
    {"$unwind": "$details"},
    {"$sort": {"created_ts": -1}},
    {
        "$project": {
            "userid": 1,
            "type": 1,
            "location": 1,
            "caption": 1
        }
    }
])
In the above code, I only get documents matching uid, but I need the documents matching disuid as well.
The userid field stores ObjectId values only.
So my question is: how do I turn the strings in disuid into ObjectIds, and how do I write a match condition covering both variables on the userid field?
OK, you can do it in two ways.
Given that you have this:
uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
you need to convert your list of strings to a list of ObjectIds in Python code:
from bson.objectid import ObjectId

disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
my_list = []
for i in disuid:
    my_list.append(ObjectId(i))
It will look like this:
[ObjectId('5d76b2c847c8d3000184a090'), ObjectId('5d7abb7a97a90b0001326010')]
Then, using the new list my_list, you can do the query like this:
user_posts.aggregate([
    {"$match": {"$or": [{"userid": uid}, {"userid": {"$in": my_list}}]}}
])
Or the other way, which I wouldn't prefer, since converting a few values in code is easier than converting the userid field over all documents in the DB; but in case you want it done in the DB query (note that $toString requires MongoDB 4.0+):
user_posts.aggregate([
    {"$addFields": {"userStrings": {"$toString": "$userid"}}},
    {"$match": {"$or": [{"userid": uid}, {"userStrings": {"$in": disuid}}]}}
])
Note: the bson package ships with pymongo, so if the import fails, install pymongo (pip install pymongo) rather than the standalone bson package, which conflicts with it.
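Putting it together with the rest of the pipeline from the question (a sketch; it assumes user_posts is the pymongo collection, and since uid is itself an ObjectId, the $or can collapse into a single $in):

from bson.objectid import ObjectId

uid = ObjectId("5d518caed55bc00001d235c1")
disuid = ['5d76b2c847c8d3000184a090', '5d7abb7a97a90b0001326010']
my_list = [ObjectId(i) for i in disuid]  # list comprehension form of the loop above

cursor = user_posts.aggregate([
    {"$match": {"userid": {"$in": [uid] + my_list}}},  # matches uid or any id from disuid
    {"$lookup": {"from": "user_profile", "localField": "userid",
                 "foreignField": "_id", "as": "details"}},
    {"$unwind": "$details"},
    {"$sort": {"created_ts": -1}},
    {"$project": {"userid": 1, "type": 1, "location": 1, "caption": 1}},
])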

Add #timestamp field in ElasticSearch with Python

I'm using Python to add entries to a local Elasticsearch instance (localhost:9200).
Currently, I use this method:
from elasticsearch import Elasticsearch

def insertintoes(data):
    """
    Insert data into Elasticsearch
    :param data: dict
    :return:
    """
    timestamp = data.get('#timestamp')
    logstashIndex = 'logstash-' + timestamp.strftime("%Y.%m.%d")
    es = Elasticsearch()
    if not es.indices.exists(logstashIndex):
        # Setting mappings for index
        mapping = '''
        {
          "mappings": {
            "_default_": {
              "_all": {
                "enabled": true,
                "norms": false
              },
              "dynamic_templates": [
                {
                  "message_field": {
                    "path_match": "message",
                    "match_mapping_type": "string",
                    "mapping": {
                      "norms": false,
                      "type": "text"
                    }
                  }
                },
                {
                  "string_fields": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                      "fields": {
                        "keyword": {
                          "type": "keyword"
                        }
                      },
                      "norms": false,
                      "type": "text"
                    }
                  }
                }
              ],
              "properties": {
                "#timestamp": {
                  "type": "date",
                  "include_in_all": true
                },
                "#version": {
                  "type": "keyword",
                  "include_in_all": true
                }
              }
            }
          }
        }
        '''
        es.indices.create(logstashIndex, ignore=400, body=mapping)
    es.index(index=logstashIndex, doc_type='system', timestamp=timestamp, body=data)
data is a dict with a valid #timestamp, defined like this: data['#timestamp'] = datetime.datetime.now()
The problem is, even if there is a timestamp value in my data, Kibana doesn't show the entry in the Discover view. :(
Here is an example of a full entry in Elasticsearch:
{
    "_index": "logstash-2017.06.25",
    "_type": "system",
    "_id": "AVzf3QX3iazKBndbIkg4",
    "_score": 1,
    "_source": {
        "priority": 6,
        "uid": 0,
        "gid": 0,
        "systemd_slice": "system.slice",
        "cap_effective": "1fffffffff",
        "exe": "/usr/bin/bash",
        "hostname": "ns3003395",
        "syslog_facility": 9,
        "comm": "crond",
        "systemd_cgroup": "/system.slice/cronie.service",
        "systemd_unit": "cronie.service",
        "syslog_identifier": "CROND",
        "message": "(root) CMD (/usr/local/rtm/bin/rtm 14 > /dev/null 2> /dev/null)",
        "systemd_invocation_id": "9228b6c72e6a4624a1806e4c59af8d04",
        "syslog_pid": 26652,
        "pid": 26652,
        "#timestamp": "2017-06-25T17:27:01.734453"
    }
}
As you can see, there IS a #timestamp field, but it doesn't seem to be what Kibana expects.
I don't know what to do to make my entries visible in Kibana.
Any idea?
Elasticsearch is not recognizing #timestamp as a date, but as a string. If your data['#timestamp'] is a datetime object, you can try converting it to an ISO 8601 string, which is automatically recognized. Try:
timestamp = data.get('#timestamp').isoformat()
timestamp is now a string, but in ISO format.
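A minimal, self-contained sketch of the fix (the field names mirror the question; the document content is hypothetical):

import datetime

data = {'message': 'hello', '#timestamp': datetime.datetime.now()}

# Convert the datetime to an ISO 8601 string before indexing so
# Elasticsearch's date detection maps #timestamp as a date field.
data['#timestamp'] = data['#timestamp'].isoformat()
# data['#timestamp'] is now e.g. '2017-06-25T17:27:01.734453'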

output every attribute, value in an uneven JSON object

I have a very long and uneven JSON object, and I want to output every attribute, value pair at the end points (leaves) of the object.
For instance, it could look like this:
data = {
    "Response": {
        "Version": "2.0",
        "Detail": {
            "TransactionID": "Ib410c-2",
            "Timestamp": "04:00"
        },
        "Transaction": {
            "Severity": "Info",
            "ID": "2222",
            "Text": "Success"
        },
        "Detail": {
            "InquiryDetail": {
                "Value": "804",
                "CountryISOAlpha2Code": "US"
            },
            "Product": {
                "ID": "PRD",
                "Org": {
                    "Header": {
                        "valuer": "804"
                    },
                    "Location": {
                        "Address": [
                            {
                                "CountryISOAlpha2Code": "US",
                                "Address": [
                                    {
                                        "Text": {
                                            "#Value": 2,
                                            "$": "Hill St"
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            }
        }
    }
}
I want to output each leaf. The output can be either the final attribute or the entire path, together with the value.
I know I just need to add something to this:
data = json.loads(inputFile)
small = repeat(data)
for attribute, value in small.iteritems():
    print attribute, value
You could use recursion:
def print_leaf_keyvalues(d):
    for key, value in d.iteritems():
        if hasattr(value, 'iteritems'):
            # recurse into nested dictionary
            print_leaf_keyvalues(value)
        else:
            print key, value
Demo on your sample data:
>>> print_leaf_keyvalues(data)
Version 2.0
valuer 804
Address [{'CountryISOAlpha2Code': 'US', 'Address': [{'Text': {'#Value': 2, '$': 'Hill St'}}]}]
ID PRD
CountryISOAlpha2Code US
Value 804
Text Success
Severity Info
ID 2222
This will not handle the list value of Address, however. You can always add an additional test for sequences and iterate and recurse again, as sketched below.
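A minimal sketch of that extension, staying with the answer's Python 2 style (the isinstance(value, list) test is the assumed definition of "sequence" here):

def print_leaf_keyvalues(d):
    for key, value in d.iteritems():
        if hasattr(value, 'iteritems'):
            # recurse into nested dictionary
            print_leaf_keyvalues(value)
        elif isinstance(value, list):
            # recurse into each dictionary inside the list;
            # print any non-dict items directly
            for item in value:
                if hasattr(item, 'iteritems'):
                    print_leaf_keyvalues(item)
                else:
                    print key, item
        else:
            print key, value

On the sample data this descends into both Address lists and prints the inner CountryISOAlpha2Code, #Value and $ leaves instead of the raw list.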
