I have to add data to a nested dictionary, where the nested key names can be unknown. It should create new keys itself if it doesn't find one, or else append to the existing key.
This is my logic:
if os.path.exists(str(base_path) + "/face_encodings.pickle"):
    with open(str(base_path) + "/face_encodings.pickle", 'rb') as handle:
        faces_encodings = pickle.load(handle)
        try:
            faces_encodings[location][name] = encoding
        except:
            faces_encodings[location] = {}
            faces_encodings[location][name] = encoding
        handle.close()
    print(faces_encodings)
else:
    faces_encodings = {location: {}}
    with open(str(base_path) + "/face_encodings.pickle", 'wb') as handle:
        faces_encodings[location][name] = encoding
        pickle.dump(faces_encodings, handle, protocol=pickle.HIGHEST_PROTOCOL)
        handle.close()
    print(faces_encodings)
In brief, suppose this is what the dictionary looks like:
{
    location1: {
        id1: encoding1,
        id2: encoding2
    },
    location2: {
        id3: encoding3,
        id4: encoding4
    },
    location3: {
        id5: encoding5,
        id6: encoding6
    }
}
So by my logic, if I have to save a new encoding for a location that does not exist, it should create a new nested dict, or else push the encoding into the existing location's nested dict. But the issue is that it's replacing the other ids' data.
If I understand your question correctly,
you could check if a key exists in a dictionary using the "in" keyword. For example, if you have a dict myDict = {"message": "Hello"}, then this statement
if "message" in myDict:
    return True
else:
    return False
will return True.
Using this logic, you can then either 1) create a new dict OR 2) change the existing content of the nested dict by adding a new key.
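Applying that check to the nested structure from the question, a minimal sketch could look like this (the helper name `add_encoding` is mine; `location`/`name`/`encoding` are placeholder strings):

```python
faces_encodings = {}

def add_encoding(faces_encodings, location, name, encoding):
    # Create the inner dict only when the location key is missing,
    # so existing entries under that location are preserved.
    if location not in faces_encodings:
        faces_encodings[location] = {}
    faces_encodings[location][name] = encoding

add_encoding(faces_encodings, "location1", "id1", "encoding1")
add_encoding(faces_encodings, "location1", "id2", "encoding2")
# faces_encodings == {"location1": {"id1": "encoding1", "id2": "encoding2"}}
```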
The defaultdict is perfect for this. It automatically creates the dict values if they don't already exist.
from collections import defaultdict
d = defaultdict(dict)
d[location][name] = encoding
For example:
d = defaultdict(dict)
d['giraffe']['description'] = 'tall'
d['giraffe']['legs'] = 4
d['fish']['legs'] = 0
# > defaultdict(dict,
# > {'giraffe': {'description': 'tall', 'legs': 4},
# > 'fish': {'legs': 0}})
When I call this function to remove an item from a dict imported from JSON, it doesn't work:
def removeKey(key):
    with open("keys.json") as f:
        data = json.loads(f.read())
        for d in data["keys"]:
            if(d["key"] == key):
                del d
        print(data)
    with open("keys.json", "w") as f:
        json.dump(data, f)
This is how the dict is set up in JSON
{"keys": [
{"key": 1599853953652, "role": "MODERATOR", "Server": 753230650181550141, "uses": 1, "maxuses": 0}
]
}
It seems like you actually want to remove any dictionaries in the list of dicts under "keys" if the value of their "key" entry matches a certain number (e.g. 1599853953652).
Assuming that's the case, the cleanest approach is actually to create a new list that filters out the unwanted entries; you can't simply call del to remove an element while iterating.
data = json.loads(f.read())
filtered_keys = [d for d in data["keys"] if d["key"] != key]
data["keys"] = filtered_keys
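Putting that together, the whole function could look like the following sketch (the `path` parameter is my addition, so it isn't hard-wired to "keys.json"):

```python
import json

def removeKey(key, path="keys.json"):
    with open(path) as f:
        data = json.load(f)
    # Rebuild the list instead of deleting entries while iterating over it.
    data["keys"] = [d for d in data["keys"] if d["key"] != key]
    with open(path, "w") as f:
        json.dump(data, f)
```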
To remove 'role', a one-liner demo:
data = {"keys": [
{"key": 1599853953652, "role": "MODERATOR", "Server": 753230650181550141, "uses": 1, "maxuses": 0}
]
}
data['keys'][0].pop('role',None)
print(data)
output
{'keys': [{'key': 1599853953652, 'Server': 753230650181550141, 'uses': 1, 'maxuses': 0}]}
Using del d only deletes the variable d but doesn't affect the list you want to remove from. You need to operate on the list itself. I would use the pop() method, although del would work too. (Note that popping while iterating can skip elements; it is fine here as long as at most one entry matches.)
def removeKey(key):
    with open("keys.json") as f:
        data = json.loads(f.read())
        for i, d in enumerate(data["keys"]):
            if d["key"] == key:
                data["keys"].pop(i)
        print(data)
    with open("keys.json", "w") as f:
        json.dump(data, f)
for d in data["keys"]:
    if(d["key"] == key):
        del d
del deletes names. Deleting a name does not delete the associated value, unless it was the only name for the value.
The for d in ... loop creates d as an additional name for each element of the data["keys"] list in turn. When you del d, you are only deleting the name d -- you are not deleting the value from data.
This is equivalent to:
x = 1
y = x
del y
After running this code, x still exists. del y just removed the name y.
To put it another way, think of post-it notes stuck onto boxes. The post-it notes are names, and the boxes are values. One box can have many notes stuck on it, and removing a note doesn't destroy the box (unless it was the only note).
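The distinction shows up directly in code: deleting a bare name leaves the list alone, while deleting a list element actually removes the value from the container.

```python
x = [10, 20, 30]
y = x[1]
del y           # only removes the name 'y'; the list is untouched
assert x == [10, 20, 30]

del x[1]        # removes the element from the list itself
assert x == [10, 30]
```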
I have a string 'request.context.user_id' and I want to split the string by '.' and use each element in the list as a dictionary key. Is there a way to do this for lists of varying lengths without trying to hard code all the different possible list lengths after the split?
parts = string.split('.')
if len(parts) == 1:
    data = [x for x in logData if x[parts[0]] in listX]
elif len(parts) == 2:
    data = [x for x in logData if x[parts[0]][parts[1]] in listX]
else:
    print("Add more hard code")
listX is a list of string values that should be retrieved by x[parts[0]][parts[1]].
logData is a list obtained from reading a JSON file; the list can then be read into a dataframe using json_normalize. The df portion is provided to give some context about its structure: a list of dicts.
import json
from pandas.io.json import json_normalize

with open(project_root + "filename") as f:
    logData = json.load(f)
df = json_normalize(logData)
If you want arbitrary counts, that means you need a loop. You can index repeatedly to drill through layers of dictionaries.
parts = "request.context.user_id".split(".")
logData = [{"request": {"context": {"user_id": "jim"}}}]
listX = ["jim"]

def generate(logData, parts):
    for x in logData:
        ref = x
        # ref will be, successively, x, then the 'request' dictionary, then the
        # 'context' dictionary, then the 'user_id' value 'jim'.
        for key in parts:
            ref = ref[key]
        if ref in listX:
            yield x

data = list(generate(logData, parts))  # the matching records: here, the one dict containing 'jim'
I just realized in the comments you said that you didn't want to create a new dictionary but access an existing one x via chaining up the parts in the list.
(3.b) Use a for loop to get/set the value at the key path.
In case you only want to read the value at the end of the path:
import copy

def get_val(key_list, dict_):
    reduced = copy.deepcopy(dict_)
    for i in range(len(key_list)):
        reduced = reduced[key_list[i]]
    return reduced

# this solution isn't mine, see the link below
def set_val(dict_, key_list, value_):
    for key in key_list[:-1]:
        dict_ = dict_.setdefault(key, {})
    dict_[key_list[-1]] = value_
get_val(): where key_list is the result of string.split('.') and dict_ is the x dictionary in your case.
You can leave out the copy.deepcopy() part, that's just for paranoid peeps like me. The reason is that a Python dict is not immutable, so working on a deepcopy (a separate but exact copy in memory) avoids mutating the original.
set_val(): as I said, it's not my idea; credit to #Bakuriu.
dict.setdefault(key, default_value) will take care of non-existing keys in x.
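A quick self-contained walk-through of the two helpers (repeated here with the deepcopy omitted for brevity, so the snippet runs on its own):

```python
def get_val(key_list, dict_):
    # Walk down one key at a time until the end of the path.
    reduced = dict_
    for key in key_list:
        reduced = reduced[key]
    return reduced

def set_val(dict_, key_list, value_):
    # setdefault creates each missing intermediate dict on the way down.
    for key in key_list[:-1]:
        dict_ = dict_.setdefault(key, {})
    dict_[key_list[-1]] = value_

x = {}
set_val(x, 'request.context.user_id'.split('.'), 'jim')
# x == {'request': {'context': {'user_id': 'jim'}}}
get_val('request.context.user_id'.split('.'), x)  # 'jim'
```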
(3) evaluating a string as code with eval() and/or exec()
So here's an ugly unsafe solution:
def chainer(key_list):
    new_str = ''
    for key in key_list:
        new_str = "{}['{}']".format(new_str, key)
    return new_str
x = {'request': {'context': {'user_id': 'is this what you are looking for?'}}}
keys = 'request.context.user_id'.split('.')
chained_keys = chainer(keys)
# quite dirty but you may use eval() to evaluate a string
print( eval("x{}".format(chained_keys)) )
# will print
is this what you are looking for?
which is the innermost value of the mockup x dict
I assume you could use this in your code like this
data = [x for x in logData if eval("x{}".format(chained_keys)) in listX]
# or in python 3.x with f-string
data = [x for x in logData if eval(f"x{chained_keys}") in listX]
...or something similar.
Similarly, you can use exec() to execute a string as code if you wanted to write to x, though it's just as dirty and unsafe.
exec("x{} = '...or this, maybe?'".format(chained_keys))
print(x)
# will print
{'request': {'context': {'user_id': '...or this, maybe?'}}}
(2) An actual solution could be a recursive function as so:
def nester(key_list):
    if len(key_list) == 0:
        return 'value'  # can change this to whatever you like
    else:
        return {key_list.pop(0): nester(key_list)}
keys = 'request.context.user_id'.split('.')
# ['request', 'context', 'user_id']
data = nester(keys)
print(data)
# will result
{'request': {'context': {'user_id': 'value'}}}
(1) A solution with a list comprehension to split the string by '.' and use each element in the list as a dictionary key.
data = {}
parts = 'request.context.user_id'.split('.')
if parts:  # one or more items
    [data.update({part: 'value'}) for part in parts]
print(data)
# the result
{'request': 'value', 'context': 'value', 'user_id': 'value'}
You can overwrite the values in data afterwards.
Still new to Python and need a little help here. I've found some answers for iterating through a list of dictionaries but not for nested dictionaries in a list of dictionaries.
Here is a rough structure of a single dictionary within the dictionary list:
[{ 'a':'1',
'b':'2',
'c':'3',
'd':{ 'ab':'12',
'cd':'34',
'ef':'56'},
'e':'4',
'f':'etc...'
}]
dict_list = [{ 'a':'1', 'b':'2', 'c':'3', 'd':{ 'ab':'12','cd':'34', 'ef':'56'}, 'e':'4', 'f':'etc...'}, { 'a':'2', 'b':'3', 'c':'4', 'd':{ 'ab':'23','cd':'45', 'ef':'67'}, 'e':'5', 'f':'etcx2...'},{},........,{}]
That's more or less what I am looking at, although there are some keys with lists as values instead of a dictionary. I don't think I need to worry about them right now, although code that would catch those would be great.
Here is what I have so far, which does a great job of iterating through the JSON and returning all the values for each 'high level' key:
import ujson as json

with open('test.json', 'r') as f:
    json_text = f.read()
dict_list = json.loads(json_text)
for dic in dict_list:
    for val in dic.values():
        print(val)
Here is the first set of values that are returned when that loop runs
1
2
3
{'ab':'12','cd':'34','ef':'56'}
4
etc...
What I need to be able to do is pick specific values from the top level, then go one level deeper, grab specific values in that nested dictionary, and append them to a list (or lists). I'm sure I am missing a simple solution. Maybe I'm looking at multiple loops?
Following the duck-typing style encouraged in Python, just assume everything has a .values member, and catch the cases that don't:
import ujson as json

with open('test.json', 'r') as f:
    json_text = f.read()
dict_list = json.loads(json_text)
for dic in dict_list:
    for val in dic.values():
        try:
            for l2_val in val.values():
                print(l2_val)
        except AttributeError:
            print(val)
Bazingaa's solution would be faster if inner dictionaries are expected to be rare.
Of course, for anything deeper you would probably need some recursion:
def print_dict(d):
    for val in d.values():
        try:
            print_dict(val)
        except AttributeError:
            print(val)
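A quick check of the recursive printer on a structure like the question's (the function is repeated so the snippet runs standalone): non-dict values raise AttributeError on .values(), which is what routes them to print().

```python
def print_dict(d):
    for val in d.values():
        try:
            print_dict(val)   # works if val is (dict-like and) has .values()
        except AttributeError:
            print(val)        # leaf value: just print it

print_dict({'a': '1', 'd': {'ab': '12'}})
# prints:
# 1
# 12
```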
How about checking for the instance type using isinstance (of course, this only works for one level deeper). Might not be the best way, though:
for dic in dict_list:
    for val in dic.values():
        if not isinstance(val, dict):
            print(val)
        else:
            for val2 in val.values():
                print(val2)
# 1
# 2
# 3
# 12
# 34
# 56
# 4
# etc...
# 2
# 3
I am trying to filter out a number of values from a python dictionary. Based on the answer seen here: Filter dict to contain only certain keys. I am doing something like:
new = {k:data[k] for k in FIELDS if k in data}
Basically create the new dictionary and care only about the keys listed in the FIELDS array. My array looks like:
FIELDS = ["timestamp", "unqiueID",etc...]
However, how do I do this if the key is nested? I.E. ['user']['color']?
How do I add a nested key to this array? I've tried:
[user][color], ['user']['color'], 'user]['color, and none of them are right :) Many of the values I need are nested fields. How can I add a nested key to this array and still have the new = {k:data[k] for k in FIELDS if k in data} bit work?
A quite simple approach could look like the following (it will not work for all possibilities, e.g. objects in lists/arrays). You just need to specify a 'format' for how you want to look for nested values.
findValue will split the searchKey (here on dots); if a sub-key is found in the given object, it searches for the next sub-key in the corresponding value (assuming it is a dict/object)...
myObj = {
    "foo": "bar",
    "baz": {
        "foo": {
            "bar": True
        }
    }
}

def findValue(obj, searchKey):
    keys = searchKey.split('.')
    for i, subKey in enumerate(keys):
        if subKey in obj:
            if i == len(keys) - 1:
                return obj[subKey]
            else:
                obj = obj[subKey]
        else:
            print("Key not found: %s (%s)" % (subKey, keys))
            return None

res = findValue(myObj, 'foo')
print(res)
res = findValue(myObj, 'baz.foo.bar')
print(res)
res = findValue(myObj, 'cantFind')
print(res)
Returns:
bar
True
Key not found: cantFind (['cantFind'])
None
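To tie this back to the original dict comprehension, one hypothetical way is to keep dotted paths in FIELDS and look each one up with a compact variant of findValue (no printing, returns None on a miss; assumes intermediate values are dicts). The field names here are illustrative:

```python
def find_value(obj, search_key):
    keys = search_key.split('.')
    for i, sub_key in enumerate(keys):
        if sub_key not in obj:
            return None          # path broken: treat as missing
        if i == len(keys) - 1:
            return obj[sub_key]  # last component: this is the value
        obj = obj[sub_key]       # descend one level

data = {"timestamp": 1, "user": {"color": "red"}}
FIELDS = ["timestamp", "user.color"]
new = {k: find_value(data, k) for k in FIELDS if find_value(data, k) is not None}
# {'timestamp': 1, 'user.color': 'red'}
```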
Create a recursive function which checks whether the dictionary key holds a plain value or another dictionary.
If the key holds a dictionary, call the function again until you reach a non-dictionary value.
When you find the value, just add it to your newly created dictionary.
Hope this helps.
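A minimal sketch of that recipe (the names are mine; it copies leaf values into a new flat dict, recursing whenever a value is itself a dict):

```python
def collect_leaves(d, out=None):
    if out is None:
        out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            collect_leaves(value, out)  # nested dict: recurse into it
        else:
            out[key] = value            # non-dict value: keep it
    return out

collect_leaves({'a': '1', 'd': {'ab': '12', 'cd': '34'}, 'e': '4'})
# {'a': '1', 'ab': '12', 'cd': '34', 'e': '4'}
```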
The way I go about nested dictionary is this:
dicty = dict()
tmp = dict()
tmp["a"] = 1
tmp["b"] = 2
dicty["A"] = tmp
dicty == {"A": {"a": 1, "b": 2}}
The problem starts when I try to implement this on a big file, reading in line by line.
This is printing the content per line in a list:
['proA', 'macbook', '0.666667']
['proA', 'smart', '0.666667']
['proA', 'ssd', '0.666667']
['FrontPage', 'frontpage', '0.710145']
['FrontPage', 'troubleshooting', '0.971014']
I would like to end up with a nested dictionary (ignore decimals):
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
As I am reading in line by line, I have to check whether or not the first word is still found in the file (they are all grouped), before I add it as a complete dict to the higher dict.
This is my implementation:
def doubleDict(filename):
    dicty = dict()
    with open(filename, "r") as f:
        row = 0
        tmp = dict()
        oldword = ""
        for line in f:
            values = line.rstrip().split(" ")
            print(values)
            if oldword == values[0]:
                tmp[values[1]] = values[2]
            else:
                if oldword is not "":
                    dicty[oldword] = tmp
                    tmp.clear()
                oldword = values[0]
                tmp[values[1]] = values[2]
            row += 1
            if row % 25 == 0:
                print(dicty)
                break  # print(row)
    return dicty
I would actually like to have this in pandas, but for now I would be happy if this would work as a dict. For some reason after reading in just the first 5 lines, I end up with:
{'proA': {'frontpage': '0.710145', 'troubleshooting': '0.971014'}},
which is clearly incorrect. What is wrong?
Use a collections.defaultdict() object to auto-instantiate nested dictionaries:
from collections import defaultdict

def doubleDict(filename):
    dicty = defaultdict(dict)
    with open(filename, "r") as f:
        for i, line in enumerate(f, 1):
            outer, inner, value = line.split()
            dicty[outer][inner] = value
            if i % 25 == 0:
                print(dicty)
                break
    return dicty
I used enumerate() to generate the line count here; much simpler than keeping a separate counter going.
Even without a defaultdict, you can let the outer dictionary keep the reference to the nested dictionary, and retrieve it again by using values[0]; there is no need to keep the temp reference around:
>>> dicty = {}
>>> dicty['A'] = {}
>>> dicty['A']['a'] = 1
>>> dicty['A']['b'] = 2
>>> dicty
{'A': {'a': 1, 'b': 2}}
All the defaultdict then does is keep us from having to test if we already created that nested dictionary. Instead of:
if outer not in dicty:
    dicty[outer] = {}
dicty[outer][inner] = value
we simply omit the if test as defaultdict will create a new dictionary for us if the key was not yet present.
While this isn't the ideal way to do things, you're pretty close to making it work.
Your main problem is that you're reusing the same tmp dictionary. After you insert it into dicty under the first key, you then clear it and start filling it with the new values. Replace tmp.clear() with tmp = {} to fix that, so you have a different dictionary for each key, instead of the same one for all keys.
Your second problem is that you're never storing the last tmp value in the dictionary when you reach the end, so add another dicty[oldword] = tmp after the for loop.
Your third problem is that you're checking if oldword is not "":. That may be true even if it's an empty string, because you're comparing identity, not equality. Just change that to if oldword:. (This one, you'll usually get away with, because small strings are usually interned and will usually share identity… but you shouldn't count on that.)
If you fix all three of those, you get this:
{'FrontPage': {'frontpage': '0.710145', 'troubleshooting': '0.971014'},
'proA': {'macbook': '0.666667', 'smart': '0.666667', 'ssd': '0.666667'}}
I'm not sure how to turn this into the format you claim to want, because that format isn't even a valid dictionary. But hopefully this gets you close.
There are two simpler ways to do it:
Group the values with, e.g., itertools.groupby, then transform each group into a dict and insert it all in one step. This, like your existing code, requires that the input already be batched by values[0].
Use the dictionary as a dictionary. You can look up each key as it comes in and add to the value if found, create a new one if not. A defaultdict or the setdefault method will make this concise, but even if you don't know about those, it's pretty simple to write it out explicitly, and it'll still be less verbose than what you have now.
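A minimal sketch of that second approach using dict.setdefault; for easy demonstration it takes an iterable of "outer inner value" lines rather than a filename, and the name `double_dict` is mine:

```python
def double_dict(rows):
    dicty = {}
    for line in rows:
        outer, inner, value = line.split()
        # Look up (or create) the inner dict for this outer key, then assign.
        dicty.setdefault(outer, {})[inner] = value
    return dicty

double_dict(["proA macbook 0.666667", "FrontPage frontpage 0.710145"])
# {'proA': {'macbook': '0.666667'}, 'FrontPage': {'frontpage': '0.710145'}}
```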
The second version is already explained very nicely in Martijn Pieters's answer.
The first can be written like this:
import itertools
import operator

def doubleDict(filename):
    with open(filename, "r") as f:
        rows = (line.rstrip().split(" ") for line in f)
        return {k: {values[1]: values[2] for values in g}
                for k, g in itertools.groupby(rows, key=operator.itemgetter(0))}
Of course that doesn't print out the dict so far after every 25 rows, but that's easy to add by turning the comprehension into an explicit loop (and ideally using enumerate instead of keeping an explicit row counter).