I want to create a graph from a MongoDB collection. The nodes of this graph should be patent inventors, linked by a common id (representing the patent they share).
Here is the code I wrote in order to print only nodes.
from pymongo import MongoClient
from pymongo import ASCENDING, DESCENDING
import networkx as nx
import matplotlib.pyplot as plt
uri ="mongodb://127.0.0.1:27017/Patent"
client = MongoClient(uri)
righe ={1:'CODINV2', 2:'INCY', 3:'INNAME', 4:'INADDR',5:'INADOTH',6:'INCITY',7:'INCOUNTY',8:'INREGION',9:'INSTATE',10:'INZIP',11:'nuts3',12:'alive',13:'APPLN_ID',14:'PROGR'}
db = client['Patent']
collection2 = db['projects']
collection = db['myprova']
nodi={}
i=0
G=nx.Graph()
k=1 #this parameter represents the fact that an inventor is still alive
db.projects.aggregate([{"$match": {"$and": [{"alive": k}, {"INCY": "IT"}]}}, {"$group": {"_id": "$CODINV2"}}, {"$out": "myprova"}], allowDiskUse=True)
inventor = collection.find()
newList=[]
for inv in inventor:
    newList.append(inv)
print(newList)
for idi in newList:
    nodi[idi] = i
    G.add_node(i)
    i += 1
#print(G.number_of_nodes())
nx.draw(G)
plt.show()
The attribute CODINV2 represents each inventor's id.
Running this code, the following errors appear in the console:
http://i.stack.imgur.com/BC9wd.png
How can I solve this problem? Do you know another way to reach my goal? I'm new to MongoDB and Python.
From the error, I infer that idi is a dictionary. A dictionary cannot be hashed and therefore cannot be used as a key to another dictionary. It seems that your find query is returning a set of dictionaries.
You are trying to store a dictionary as key for another dictionary and that is not allowed because dictionary is not hashable. See below
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object
identity) may not be used as keys.
Basically you have
nodi = {}  # which is a dictionary
And the code below tries to store the dictionary idi as a key of nodi
for idi in newList:
    nodi[idi] = i
Because idi is a dictionary (as shown below) you get error
{"uid": xxxxxx} where xxx is numbers
If you replace the following
nodi[idi] = i
With
nodi[i] = idi
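Then you won't get an error, because i is an integer and integers are hashable (like strings, and unlike lists and dictionaries).
You might then need to change the way you add nodes to G. Note that networkx nodes must be hashable too, so add the id value held inside the dictionary rather than the dictionary itself, something like:
G.add_node(nodi[i]['_id']) since each document produced by the $group stage holds the inventor id under the _id key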
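As for the other part of the question (linking inventors who share a patent), here is a minimal sketch using only the standard library. The sample rows are hypothetical stand-ins for what collection.find() might return, and it assumes APPLN_ID identifies the patent and CODINV2 the inventor, as the question suggests.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows; real ones would come from collection.find().
rows = [
    {"CODINV2": "A1", "APPLN_ID": 100},
    {"CODINV2": "B2", "APPLN_ID": 100},
    {"CODINV2": "C3", "APPLN_ID": 100},
    {"CODINV2": "A1", "APPLN_ID": 200},
    {"CODINV2": "D4", "APPLN_ID": 200},
]

# Using a whole document (a dict) as a dictionary key raises TypeError,
# which is the error from the question.
try:
    {}[rows[0]] = 0
    dict_key_error = None
except TypeError as exc:
    dict_key_error = str(exc)

# Group the hashable inventor ids by the patent they share.
inventors_by_patent = defaultdict(set)
for row in rows:
    inventors_by_patent[row["APPLN_ID"]].add(row["CODINV2"])

# One edge per pair of inventors listed on the same patent.
edges = set()
for inventors in inventors_by_patent.values():
    edges.update(combinations(sorted(inventors), 2))

nodes = sorted({row["CODINV2"] for row in rows})
print(nodes)
print(sorted(edges))
```

The resulting nodes and edges can then be handed to networkx with G.add_nodes_from(nodes) and G.add_edges_from(edges).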
Related
I have a MongoDB collection with many documents, each with fields that look like the ones shown in the picture. The problem is with the field "searched". Its values are stored as strings, so I cannot query for values like this: {"searched.image_hash":"some_value"}. I use Python to store values in MongoDB. In Python, the variable "to_search", which is stored as "searched" in MongoDB, is in fact a dictionary. I am not sure why the dictionary in the "to_search" variable is stored as a string in the "searched" field. Any suggestions on how to store the dictionary as an array of objects in MongoDB?
The code I used in python is as follows
# many other keys also go into the dictionary 'di'
di['account_id'] = acc_num
di['searched']= to_search
di['breakdown_queried'] = breakdown_to_query
di['combination']= [ele for ele in to_search.keys()]
di['ad_ids'] = ad_ids
di['date'] = date.today()
lo_str= ''
di = {k: str(v) for k, v in di.items()}
mongo_obj_remote.client["dev"]["ad_stats_tracker"].delete_one({"_id": {"$in": [di['_id']]}})
di_key_li = ['_id','account_id','date', 'combination','searched', 'ad_ids','breakdown_queried']
mongo_obj_remote.insert_single_document("dev", "ad_stats_tracker", {key: di[key] for key in di_key_li})
If your data in to_search is a Python dict (and not a string that looks like one), then the pymongo driver will store the data as a BSON "object", not a string; effectively this creates a sub-document within the document being stored in the collection.
Looking at your data I would suggest that your to_search is actually a string; the format is not valid for a dict as it contains a set (which pymongo won't be able to store in mongodb anyway).
You can check it with a small amount of debug code just before you insert the data:
print(type(di['searched']))
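One possible cause visible in the question's code is the line di = {k: str(v) for k, v in di.items()}, which converts every value to a string, including a genuine dict. A minimal sketch of the effect, using hypothetical sample values rather than the asker's real data:

```python
from datetime import date

# Hypothetical stand-ins for the question's variables.
to_search = {"image_hash": "abc123"}
di = {"account_id": 42, "searched": to_search, "date": date(2020, 1, 1)}

# The comprehension from the question turns every value into a string.
stringified = {k: str(v) for k, v in di.items()}
print(type(stringified["searched"]))

# Converting only the values that need it leaves the dict intact.
kept = dict(di)
kept["date"] = str(di["date"])
print(type(kept["searched"]))
```

Here the date is converted on the assumption that it was the reason for the blanket str() call, since pymongo encodes datetime.datetime values but not plain datetime.date ones.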
I'm using an API call in Python 3.7 which returns JSON data.
result = (someapicall)
The data returned appears to be in the form of two dictionaries nested within a list, i.e.
[{name:foo, firmware:boo}, {name:foo, firmware:bar}]
I would like to retrieve the value of the key "name" from the first dictionary and also the value of the key "firmware" from both dictionaries, and store them in a new dictionary in the following format.
{foo:(boo,bar)}
So far I've managed to retrieve the value of both the first "name" and the first "firmware" and store them in a dictionary using the following.
dict1={}
for i in result:
    dict1[(i["networkId"])] = (i['firmware'])
I've tried:
d7[(a["networkId"])] = (a['firmware'],(a['firmware']))
But as expected, the above just seems to return the same firmware twice.
Can anyone help achieve the desired result above?
You can use defaultdict to accumulate values in a list, like this:
from collections import defaultdict
result = [{'name':'foo', 'firmware':'boo'},{'name':'foo', 'firmware':'bar'}]
# create a dict with a default of empty list for non existing keys
dict1=defaultdict(list)
# iterate and add firmwares of same name to list
for i in result:
    dict1[i['name']].append(i['firmware'])
# reformat to regular dict with tuples
final = {k:tuple(v) for k,v in dict1.items()}
print(final)
Output:
{'foo': ('boo', 'bar')}
I have a list of dictionaries in a json file.
I have iterated through the list and each dictionary to obtain two specific key:value pairs from each dictionary for each element.
i.e. List[dictionary{i(key_x:value_x, key_y:value_y)}]
My question is now:
How do I place these two new key: value pairs in a new list/dictionary/array/tuple, representing the two key: value pairs extracted for each listed element in the original?
To be clear:
ORIGINAL_LIST (i.e. with each element being a nested dictionary) =
[{"a":{"blah":"blah",
"key_1":value_a1,
"key_2":value_a2,
"key_3":value_a3,
"key_4":value_a4,
"key_5":value_a5,},
"b":"something_a"},
{"a":{"blah":"blah",
"key_1":value_b1,
"key_2":value_b2,
"key_3":value_b3,
"key_4":value_b4,
"key_5":value_b5,},
"b":"something_b"}]
So my code so far is:
import json
from collections import *
from pprint import pprint
json_file = "/some/path/to/json/file"
with open(json_file) as json_data:
    data = json.load(json_data)
    json_data.close()
for i in data:
    event = dict(i)
    event_key_b = event.get('b')
    event_key_2 = event.get('key_2')
    print(event_key_b)#print value of "b" for each nested dict for 'i'
    print(event_key_2)#print value of "key_2" for each nested dict for 'i'
To be clear:
FINAL_LIST(i.e. with each element being a nested dictionary) =
[{"b":"something_a", "key_2":value_2},
{"b":"something_b", "key_2":value_2}]
So I have an answer to getting the keys into individual dictionaries, as follows in the code below. The only problem is that the value for 'key_2' in the original json dictionaries is either an int value or it is "" for values which are 0. My script just returns 'None' for all instances of value_2 for key_2. How can I get it to read the appropriate values for 'value_2'? I want to only return dictionaries for cases where 'value_2' > 0 (i.e. where value_2 != "")
Below is the current code:
import json
from pprint import pprint
json_file = "/some/path/to/json/file"
with open(json_file) as json_data:
    data = json.load(json_data)
    json_data.close()
for i in data:
    event_key_b = event.get('b')
    for x in i:
        event_key_2 = event.get('key_2')
        x = {'b' : something_b, 'key_2' : value_2}
        print(x)
Also, if there are any more elegant solutions anyone can think of I would really be interested in learning them ... Some of the json files I'm looking at can range from 200 dictionary entries in the original list to 2,000,000. I'm planning to feed my parsed results into a message queue for processing by a different service and any efficiencies in the code will help for scalability in processing. Also if anyone has any recommendations to give on Redis vs. RabbitMQ, I'd really appreciate it
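A possible approach, sketched under the assumption that the data has the shape shown in ORIGINAL_LIST: "key_2" lives inside the nested "a" dictionary, so looking it up on the outer dictionary returns None. The helper name extract_events and the sample values are hypothetical.

```python
def extract_events(data):
    """Yield {'b': ..., 'key_2': ...} for entries whose key_2 is a positive int."""
    for entry in data:
        # key_2 is nested under "a", not at the top level of each entry.
        value_2 = entry.get("a", {}).get("key_2")
        # "" stands for zero in the source data, so skip it and any non-int value.
        if isinstance(value_2, int) and value_2 > 0:
            yield {"b": entry.get("b"), "key_2": value_2}


# Hypothetical sample in the same shape as ORIGINAL_LIST.
data = [
    {"a": {"blah": "blah", "key_1": 1, "key_2": 7}, "b": "something_a"},
    {"a": {"blah": "blah", "key_1": 2, "key_2": ""}, "b": "something_b"},
]

final_list = list(extract_events(data))
print(final_list)
```

Because extract_events is a generator, results can be consumed one at a time (for example, pushed to a queue) without building the full list, which may help with the larger files.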
I am trying to append function objects to a list which is a value in a dictionary:
jobs = {}
job = sched.add_date_job(callback, run_at, [params])
jobs[hereCanBeRandomNumber].append(job)
But it seems I have a problem in the last line. The interpreter says: KeyError: 118096950.
What's the problem?
The way you're adding to your dictionary is incorrect.
jobs[hereCanBeRandomNumber].append(job)
Translates to "Append the job to the value of the dictionary with key hereCanBeRandomNumber"
If you're trying to add to the dictionary, use:
jobs[hereCanBeRandomNumber] = job
This will add to the jobs dict, so it looks like:
jobs = {118096950: job}
So your problem here is you are trying to append to a key that does not exist yet.
Taking a dictionary like
jobs = {}
and doing
jobs[123].append(foo)
will produce a KeyError as nothing exists at 123 yet.
To get around this you can do either of the following:
from collections import defaultdict
jobs = defaultdict(list)
jobs[123].append(foo)
which means if a key does not exist it is initialised to an empty list first or
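jobs = {}
jobs.setdefault(123, []).append(job)
which checks jobs for the presence of the key and, if it doesn't exist, stores an empty list under it before appending. (Assigning the result of append back to jobs[123] would not work, because list.append returns None.)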
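A small self-contained sketch of both approaches, with strings standing in for the scheduler's job objects (the job values here are placeholders, not real jobs):

```python
from collections import defaultdict

# Approach 1: defaultdict creates an empty list the first time a key is used.
jobs = defaultdict(list)
jobs[118096950].append("job-a")
jobs[118096950].append("job-b")

# Approach 2: a plain dict with setdefault, which stores and returns the list.
plain_jobs = {}
plain_jobs.setdefault(123, []).append("job-c")

# Pitfall: list.append returns None, so assigning its result stores None.
broken = {}
broken[123] = broken.get(123, []).append("job-d")

print(dict(jobs), plain_jobs, broken)
```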
So I just started with Neo4j, and I'm trying to figure out how I might populate the database. I have a dictionary with words as keys and lists of synonyms as values, and I want to load it into Neo4j, which seems like it would be an interesting way to learn how to use the database.
An example would be:
'CRUNK' : [u'drunk', u'wasted', u'high', u'crunked', u'crazy', u'hammered', u'alcohol', u'hyphy', u'party']
The lists are not going to be of equal length so converting it to a more typical csv format is not an option, and I haven't found an explanation of how I could populate the database like I would for the SQL database in a Django app. I want to do something like this:
for each k,v in dictionary:
    add k and add relationship to each value in v
Does anyone have any tutorials, documentation or answers that could help point me in the right direction?
I think what you want to do you can do in Cypher directly:
MERGE (w:Word {text:{root}})
UNWIND {words} as word
MERGE (w2:Word {text:word})
MERGE (w2)-[:SYNONYM]->(w)
You would then run this statement with http://py2neo.org's cypher-session API and the two parameters, a single root word and a list of words.
You can also use FOREACH instead of UNWIND
MERGE (w:Word {text:{root}})
FOREACH (word IN {words} |
    MERGE (w2:Word {text:word})
    MERGE (w2)-[:SYNONYM]->(w)
)
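On the Python side, each statement needs one root word and one list of words. A minimal sketch of building those parameter sets from the dictionary, assuming the parameter names root and words from the Cypher above; the helper name cypher_params and the sample dictionary are hypothetical, and how the parameters are passed depends on the driver version in use:

```python
def cypher_params(dictionary):
    """Yield one {'root': ..., 'words': [...]} parameter dict per word."""
    for root, words in dictionary.items():
        # Assumption: only list values are usable synonym lists; skip the rest.
        if not isinstance(words, list):
            continue
        yield {"root": root, "words": words}


# Hypothetical sample dictionary of words and their synonyms.
synonyms = {
    "CRUNK": ["drunk", "wasted", "high"],
    "EMPTY": None,
}

params = list(cypher_params(synonyms))
print(params)
```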
FINAL EDIT INCORPORATING MERGE:
This checks each dictionary value to make sure it isn't None or 'NOT FOUND', and populates the graph with a 'SYNONYM' relationship, using the merge function to ensure there aren't duplicates.
import pickle
from py2neo import Graph
from py2neo import Node, Relationship
import random
# pw holds the Neo4j password and must be defined before this line
graph = Graph(f'http://neo4j:{pw}@localhost:7474/db/data/')
udSyn = pickle.load(open('lookup_ud', 'rb'))
myWords = udSyn.keys()
for key in myWords:
    print(key)
    values = udSyn[key]
    if values in [None, 'NOT FOUND']:
        continue
    node = graph.merge_one('WORD', 'name', key)
    for value in values:
        node2 = graph.merge_one('WORD', 'name', value)
        synOfNode = Relationship(node, 'SYNONYM', node2)
        graph.create(synOfNode)
graph.push()