How to Store dictionary as array object within a mongodb field - python

I have a MongoDB collection with many documents, each with fields that look like the ones shown in the picture. The problem is with the field "searched". Its values are stored as a string, because of which I cannot query for values like this: {"searched.image_hash":"some_value"}. I use Python to store values into MongoDB. In Python, the variable "to_search", which is stored as "searched" in Mongo, is in fact a dictionary. I am not sure why the dictionary in the "to_search" variable is stored as a string within the MongoDB "searched" field. Any suggestions on how to store the dictionary as an array of objects in MongoDB?
The code I used in Python is as follows:
# I have many other keys going into dictionary 'di'
di['account_id'] = acc_num
di['searched']= to_search
di['breakdown_queried'] = breakdown_to_query
di['combination']= [ele for ele in to_search.keys()]
di['ad_ids'] = ad_ids
di['date'] = date.today()
lo_str= ''
di = {k: str(v) for k, v in di.items()}
mongo_obj_remote.client["dev"]["ad_stats_tracker"].delete_one({"_id": {"$in": [di['_id']]}})
di_key_li = ['_id','account_id','date', 'combination','searched', 'ad_ids','breakdown_queried']
mongo_obj_remote.insert_single_document("dev", "ad_stats_tracker", {key: di[key] for key in di_key_li})

If your data in to_search is a Python dict (and not a string that looks like one), then the pymongo driver will store the data as a BSON "object", not a string; effectively this creates a sub-document within the document being stored in the collection.
Looking at your data I would suggest that your to_search is actually a string; the format is not valid for a dict as it contains a set (which pymongo won't be able to store in mongodb anyway).
You can check it with a small amount of debug code just before you insert the data:
print(type(di['searched']))
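One more thing worth checking in the question's code is the line di = {k: str(v) for k, v in di.items()}, which calls str() on every value, including any nested dict. Below is a minimal sketch of that effect with no database connection; the key image_hash comes from the question, while the sample values and the names stringified and preserved are made up for illustration.

```python
# Hypothetical sample data; only the key names come from the question.
to_search = {"image_hash": "abc123"}
di = {"account_id": 42, "searched": to_search}

# The comprehension from the question stringifies every value, dicts included.
stringified = {k: str(v) for k, v in di.items()}
print(type(stringified["searched"]))  # <class 'str'>

# Converting only the non-container values keeps the nested dict intact,
# so the driver would receive a dict rather than a string.
preserved = {k: v if isinstance(v, (dict, list)) else str(v) for k, v in di.items()}
print(type(preserved["searched"]))  # <class 'dict'>
```

If the second form matches what you intended, the document passed to the insert call would carry "searched" as a real sub-document, and a query on "searched.image_hash" should then be able to match it.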

Related

I can't get a value from a JSON API response in python

So I am struggling with getting a value from a JSON response. Looking at other posts, I have managed to write this code, but when I try to search for the key (character_id) that I want in the dictionary, Python says that the key doesn't exist. My solution consists of getting the JSON object from the response, converting it into a string with json.dumps(), and then converting it into a dictionary with json.loads(). Then I try to get 'character_id' from the dictionary, but it doesn't exist. I am guessing it is related to the format of the dictionary, but I have little to no experience in Python. The code that makes the query and tries to get the values is this (dataRequest is a function that makes the request and returns the response from the API):
characterName = sys.argv[1];
response = dataRequest('http://census.daybreakgames.com/s:888/get/ps2:v2/character/?name.first_lower=' + characterName + '&c:show=character_id')
jsonString = json.dumps(response.json())
print(jsonString)
dic = json.loads(jsonString)
print(dic)
if 'character_id' in dic:
    print(dic['character_id'])
The output of the code is:
{"character_list": [{"character_id": "5428662532301799649"}], "returned": 1}
{'character_list': [{'character_id': '5428662532301799649'}], 'returned': 1}
Welcome @Prieto! From what I can see, you probably don't need to serialize/de-serialize the JSON: response.json() already returns a Python dictionary object.
The issue is that you are looking for the 'character_id' key at the top-level of the dictionary, when it seems to be embedded inside another dictionary, that is inside a list. Try something like this:
# ...omitted code
for char_obj in dic["character_list"]:
    if "character_id" in char_obj:
        print(char_obj["character_id"])
If your dic is like {"character_list": [{"character_id": "5428662532301799649"}], "returned": 1}
you get the value of character_id by
print(dic['character_list'][0]['character_id'])
The problem here is that you're looking for character_id at the top level of the dictionary, where the key is actually character_list.
What you need to do is access the character_list value and then iterate over it, or pick out the entry whose character_id you want.
Like this:
print(jsonString)
dic = json.loads(jsonString)
print(dic)
character_information = dic['character_list'][0] # we access the character list and assume it is the first value
print(character_information["character_id"]) # this is your character id
The way I see it, the only hiccup with the code is this:
if 'character_id' in dic:
    print(dic['character_id'])
The problem is that the JSON actually consists of two dictionaries. The first is the main one, which has two keys, character_list and returned. The second is a sub-dictionary inside the array that is the value for the key character_list.
So, what your code should actually look like is something like this:
for i in dic["character_list"]:
    print(i["character_id"])
On a side note, it helps to look at the JSON this way:
{
    "character_list": [
        {
            "character_id": "5428662532301799649"
        }
    ],
    "returned": 1
}
where elements enclosed in curly brackets '{}' are in a dictionary, whereas elements enclosed in square brackets '[]' are in a list.

Reading nested JSON dynamically

I am currently trying out something which I am unsure if it is possible.
I am trying to map API values from a JSON string (which has nested values) to a database field but I wish for it to be dynamic.
In the YAML example below, the key would be the database field name and the database field value would be where to obtain the information from the JSON string ("-" delimited for nested values). I am able to read the YAML config but what I don't understand is how to translate it to python code. If it were to be dynamic I have no idea how many [] I would have to put.
YAML: (PYYAML package)
employer: "properties-employer_name"
...
employee_name: "employee"
Python Code: (Python 3.8)
json_data = {"properties": {"employer_name": "XYZ"}, "employee": "Sam"}
employer = json_data["properties"]["employer_name"] # How Do I add [] based on how nested the value is dynamically?
employee = json_data["employee"]
Many thanks!
You could try something like this:
def get_value(data, keys):
    # Go over each key and adjust data value to current level
    for key in keys:
        data = data[key]
    return data  # Once last key is reached return value
You would get your keys by splitting on '-', if that is how you have it in your YAML. In my example I just saved the value to a string and did it this way:
employer = "properties-employer_name"
keys = employer.split('-') # Gives us ['properties', 'employer_name']
Now we can call our get_value function defined above:
get_value(json_data, keys)
Which returns 'XYZ'
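Putting the pieces together, the whole mapping could be applied in one pass. This is only a sketch: it assumes the YAML has already been loaded into a plain dict (hard-coded here as field_map, a hypothetical name), that no JSON key itself contains a '-', and that every path in the config exists in the data.

```python
def get_value(data, keys):
    # Walk down one nesting level per key.
    for key in keys:
        data = data[key]
    return data

json_data = {"properties": {"employer_name": "XYZ"}, "employee": "Sam"}

# Stand-in for the loaded YAML config: database field -> '-'-delimited JSON path.
field_map = {"employer": "properties-employer_name", "employee_name": "employee"}

# Resolve every configured path, however deeply it is nested.
row = {field: get_value(json_data, path.split("-")) for field, path in field_map.items()}
print(row)  # {'employer': 'XYZ', 'employee_name': 'Sam'}
```

A missing key raises KeyError here; whether to catch that and fall back to a default depends on how strict the mapping should be.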

Process malformed JSON string in Python

I'm trying to process a log from Symphony using Pandas, but have some trouble with a malformed JSON which I can't parse.
An example of the log :
'{id:46025,
work_assignment:43313=>43313,
declaration:<p><strong>Bijkomende interventie.</strong></p>\r\n\r\n<p>H </p>\r\n\r\n<p><strong><em>Vaststellingen.</em></strong></p>\r\n\r\n<p><strong><em>CV. </em></strong>De.</p>=><p><strong>Bijkomende interventie.</strong></p>\r\n\r\n<p>He </p>\r\n\r\n<p><strong><em>Vaststellingen.</em></strong></p>\r\n\r\n<p><strong><em>CV. </em></strong>De.</p>,conclusions:<p>H </p>=><p>H </p>}'
What is the best way to process this?
For each part (id/work_assignment/declaration/etc) I would like to retrieve the old and new value (which are separated by "=>").
Use the following code:
def clean(my_log):
    # str.replace returns a new string, so the result has to be assigned back
    my_log = my_log.replace("{", "").replace("}", "")  # Removes the unneeded { }
    my_items = my_log.split(",")  # Split at the comma to get the pairs
    my_dict = {}
    for i in my_items:
        key, value = i.split(":", 1)  # Split at the first colon to separate the key and value
        my_dict[key.strip()] = value  # Add to the dictionary (key stripped of stray whitespace)
    return my_dict
The function returns a Python dictionary, which can then be converted to JSON using a serializer if needed, or used directly. Note that it assumes the values themselves contain no commas.
Hope I helped :D
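To also get the old and new value for each part, every value could additionally be split at "=>". A rough sketch, assuming (as in the sample log) that values contain no commas and that "=>" only ever appears as the old/new separator; parse_changes and the shortened sample string are hypothetical.

```python
def parse_changes(log):
    # Drop surrounding whitespace and the outer braces.
    body = log.strip().strip("{}")
    changes = {}
    for item in body.split(","):
        key, _, value = item.partition(":")     # split at the first colon only
        old, sep, new = value.partition("=>")   # old value, separator, new value
        # Fields without "=>" (such as id) get None as their new value.
        changes[key.strip()] = (old, new if sep else None)
    return changes

sample = "{id:46025,\nwork_assignment:43313=>43313,\nconclusions:<p>H </p>=><p>H </p>}"
result = parse_changes(sample)
print(result["work_assignment"])  # ('43313', '43313')
print(result["id"])               # ('46025', None)
```

If real log values can contain commas, a plain split will not be enough and a more careful parser would be needed.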

I need to create a social network using Python and Mongodb

I want to create a graph from a Mongodb collection. Nodes of this graph should be inventors of patents and they should be linked by a common id (that represents the patent in common).
Here is the code I wrote in order to print only nodes.
from pymongo import MongoClient
from pymongo import ASCENDING, DESCENDING
import networkx as nx
import matplotlib.pyplot as plt
uri ="mongodb://127.0.0.1:27017/Patent"
client = MongoClient(uri)
righe ={1:'CODINV2', 2:'INCY', 3:'INNAME', 4:'INADDR',5:'INADOTH',6:'INCITY',7:'INCOUNTY',8:'INREGION',9:'INSTATE',10:'INZIP',11:'nuts3',12:'alive',13:'APPLN_ID',14:'PROGR'}
db = client['Patent']
collection2 = db['projects']
collection = db['myprova']
nodi={}
i=0
G=nx.Graph()
k=1 #this parameter represents the fact that an inventor is still alive
db.projects.aggregate([{"$match": {"$and": [{"alive": k}, {"INCY": "IT"}]}}, {"$group": {"_id": "$CODINV2"}}, {"$out": "myprova"}], allowDiskUse=True)
inventor = collection.find()
newList=[]
for inv in inventor:
    newList.append(inv)
print(newList)
for idi in newList:
    nodi[idi] = i
    G.add_node(i)
    i += 1
#print(G.number_of_nodes())
nx.draw(G)
plt.show()
The attribute CODINV2 represents each inventor's id.
Running this code produces the following errors in the console:
http://i.stack.imgur.com/BC9wd.png
How can I solve this problem? Do you know another solution to reach my goal? I'm new to MongoDB and Python.
From the error, I infer that idi is a dictionary. A dictionary cannot be hashed and therefore cannot be used as a key in another dictionary. It seems that your find query is returning a sequence of dictionaries, one per document.
You are trying to use a dictionary as a key of another dictionary, and that is not allowed because a dictionary is not hashable. See below:
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys.
Basically you have
nodi = {}  # which is a dictionary
And the code below tries to store the dictionary idi as a key of nodi:
for idi in newList:
    nodi[idi] = i
Because idi is a dictionary (as shown below) you get error
{"uid": xxxxxx} where xxx is numbers
if you replace the following
nodi[idi] = i
With
nodi[i] = idi
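Then you won't get an error, because i is hashable (just like a string, and unlike a list or a dictionary).
You might then need to change the way you add nodes to G. Keep in mind that networkx nodes must be hashable too, so add a hashable value taken from the document rather than the dictionary itself, so something like:
G.add_node(nodi[i]["uid"]) where nodi[i] is nothing but {"uid": xxxxx}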
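The hashability issue can be reproduced without MongoDB or networkx. The sketch below assumes each document returned by find() looks like {"_id": <inventor id>}, which is what the $group stage in the question would produce; the sample ids are made up.

```python
# Hypothetical documents shaped like the output of the $group stage.
docs = [{"_id": "INV001"}, {"_id": "INV002"}]

nodi = {}
try:
    nodi[docs[0]] = 0  # a dict is unhashable, so this raises TypeError
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)

# Key the mapping by the hashable inventor id instead of the whole document.
nodi = {doc["_id"]: index for index, doc in enumerate(docs)}
print(nodi)  # {'INV001': 0, 'INV002': 1}
```

With a mapping like this, either the integer index or the inventor id itself could be used as the graph node, since both are hashable.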

creating a dictionary named as variable

I am using pyodbc to query database and retrieve some data (obviously).
What I would like is to create new dictionaries named according to values returned from the query.
For example, I have a table with 1000 items, and one of the columns in that table is an INT number in the range 1-51.
I want to do a rundown through my table, and create dictionaries named : bought_I, sold_I, created_I, etc... where I stands for INT number.
I hope I am clear enough :)
I know I could create those dicts in advance, but the range will not always be 1-51, and it's nicer and cleaner to do it programmatically than to hardcode it.
Don't.
Create a dictionary bought, and give it keys based on your number:
bought = {}
for number in column_x:
    bought[number] = "whatever object you need here"
Same for sold, created etc.
Or just one big dict:
mydict = {"bought": {}, "sold": {}, "created": {}}
for number in column_x:
    for key in mydict:
        mydict[key][number] = "whatever"
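If the numbers are not known in advance, collections.defaultdict can create the inner entries on first use. A small sketch with made-up rows standing in for the query result; the tuple layout (name, number, action) is an assumption, and real rows would come from the pyodbc cursor.

```python
from collections import defaultdict

# Hypothetical rows: (item name, INT column value, action).
rows = [("apple", 3, "bought"), ("pear", 3, "sold"), ("plum", 7, "bought")]

# Outer key is the action, inner key is the number; missing keys start as empty lists.
stats = defaultdict(lambda: defaultdict(list))
for name, number, action in rows:
    stats[action][number].append(name)

print(stats["bought"][3])     # ['apple']
print(dict(stats["bought"]))  # {3: ['apple'], 7: ['plum']}
```

This way nothing has to be hardcoded for the range of numbers: a new number simply becomes a new key the first time it appears.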
