Convert multi-line slash-delimited string into a nested dictionary - python

abc/pqr123/xy2/yes//T
abc/pqr245/kl3/yes//T
abc/ijk123/op5/yes//T
abc/pqr245/kl4/yes//T
These are the input values that I want to convert to a nested dictionary. Each value, such as abc, pqr123, xy2, yes, or T, represents the name of a product.
My output should look something like this:
{"abc":{"pqr123":{"xy2":{"yes":{"T":[]}}},"pqr245":{"kl3":{"yes":{"T":[]}},"kl4":{"yes":{"T":[]}}},"ijk123":{"op5":{"yes":{"T":[]}}}}}
So I need a nested dictionary of all unique values, where the last key in each branch has an empty list as its value.
Below is my code snippet that generates the output I require, but I want to do it more dynamically so that it still works if the length of the input lines grows or shrinks. Please let me know if there is a better solution to this problem.
data_dict={}
for item in meta_line.split(','):
    item = item.replace('//','/')
    item = str(item)
    item = item.split('/')
    if item[0] == "":
        continue
    if item[0] not in data_dict.keys():
        data_dict[item[0]] = {}
    if item[1] not in data_dict[item[0]].keys():
        data_dict[item[0]][item[1]] = {}
    if item[2] not in data_dict[item[0]][item[1]].keys():
        data_dict[item[0]][item[1]][item[2]] = {}
    if item[3] not in data_dict[item[0]][item[1]][item[2]].keys():
        data_dict[item[0]][item[1]][item[2]][item[3]] = {}
    if item[4] not in data_dict[item[0]][item[1]][item[2]][item[3]].keys():
        data_dict[item[0]][item[1]][item[2]][item[3]][item[4]] = []

You probably want something that doesn't depend on so many deeply nested brackets. This problem is a good fit for walking down the structure with a reference to a mutable object.
meta_line = 'abc/pqr123/xy2/yes//T,abc/pqr245/kl3/yes//T,abc/ijk123/op5/yes//T,abc/pqr245/kl4/yes//T'
data = dict()
for item in meta_line.split(','):
    dref = data
    dict_tree = item.strip().replace('//', '/').split('/')
    for i, val in enumerate(dict_tree):
        if val in dref:
            pass
        elif i != len(dict_tree) - 1:
            dref[val] = dict()
        elif i == len(dict_tree) - 1:
            dref[val] = list()
        dref = dref[val]
Every iteration of the inner loop moves the reference dref down one level, and every iteration of the outer loop resets it to the top. At the end, data should hold your nested dict.
Edit: Sorry, I just noticed that you wanted the last level to be a list. This is one solution to that problem, but it isn't the best (it will raise errors if there's a list in a spot where a later data entry needs a dict instead). I would probably build the nested dict first and then recursively replace any empty dicts with empty lists afterwards to avoid that problem.
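The two-pass idea described above could be sketched roughly as follows: build the tree using only dicts, then walk it and swap every empty dict for an empty list. The helper names build_tree and empty_dicts_to_lists are hypothetical, and the sketch assumes that empty path segments (from `//`) should simply be skipped.

```python
def build_tree(lines):
    """Build a nested dict from slash-delimited paths, using dicts at every level."""
    tree = {}
    for line in lines:
        node = tree
        for part in line.strip().split('/'):
            if part:  # skip empty segments produced by '//'
                node = node.setdefault(part, {})
    return tree


def empty_dicts_to_lists(node):
    """Recursively replace every empty dict (a leaf) with an empty list."""
    for key, child in node.items():
        if child:
            empty_dicts_to_lists(child)
        else:
            node[key] = []  # reassigning an existing key is safe while iterating
    return node


meta_line = 'abc/pqr123/xy2/yes//T,abc/pqr245/kl3/yes//T,abc/ijk123/op5/yes//T,abc/pqr245/kl4/yes//T'
data = empty_dicts_to_lists(build_tree(meta_line.split(',')))
print(data)
```

Because leaves only become lists after all paths have been inserted, a shorter path such as a/b followed by a longer one such as a/b/c does not run into a list where a dict is needed.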

You can use the dict.setdefault method in a loop to build the nested dictionary. I'll use the pprint module to display the output. Note that pprint.pprint sorts dictionary keys before the output is computed.
from pprint import pprint
data = '''\
abc/pqr123/xy2/yes//T
abc/pqr245/kl3/yes//T
abc/ijk123/op5/yes//T
abc/pqr245/kl4/yes//T
'''.splitlines()
nested_dict = {}
for row in data:
    d = nested_dict
    keys = [s for s in row.split('/') if s]
    for key in keys[:-1]:
        d = d.setdefault(key, {})
    d[keys[-1]] = []
pprint(nested_dict)
Output:
{'abc': {'ijk123': {'op5': {'yes': {'T': []}}},
         'pqr123': {'xy2': {'yes': {'T': []}}},
         'pqr245': {'kl3': {'yes': {'T': []}}, 'kl4': {'yes': {'T': []}}}}}

Related

How do you combine dictionaries' key values if they have another key value that is the same while leaving out the dictionary that was not combined?

I have a list of objects, like so:
[
{"title":"cdap_tests", "datacenter":"B1", "count_failed": 1},
{"title":"cdap_tests", "datacenter":"G1", "count_failed": 1},
{"title":"cdap_tests", "datacenter":"GOV1", "count_failed": 1},
{"title":"developer_portal_tests", "datacenter":"B1", "count_failed": 1}
]
and I want to combine the objects that have the same title attribute together like so:
[
{"title":"cdap_tests", "datacenter":"B1,G1,GOV1", "count_failed": 1},
{"title":"developer_portal_tests", "datacenter":"B1", "count_failed": 1}
]
I have tried comparing each one to the others based on their attributes and appending one string to the other when they matched, but for some reason it is not combining them. I simply get the same data back from the function:
new_data_list = []
for row_to_compare_to in data:
    for row_to_compare_from in data:
        if row_to_compare_from["datacenter"] == row_to_compare_to["datacenter"]:
            pass
        elif row_to_compare_from["title"] == row_to_compare_to["title"]:
            row_to_compare_to["datacenter"] = f"{row_to_compare_from['datacenter']}, {row_to_compare_to['datacenter']}"
            row_to_compare_to["count_failed"] = f"{row_to_compare_from['count_failed']}, {row_to_compare_to['count_failed']}"
    new_data_list.append(row_to_compare_to)
return new_data_list
Could someone point me in the direction of what I am doing wrong? Or maybe a cleaner solution?
The code produces an error because the "count_failed" is not in every dictionary.
If I were starting from scratch, I might replace the original data list with a dictionary of dictionaries, where the key of the outer dictionary is the title of each entry. This would result in code that is easier to read. I might also store the appended data as a list, ["B1", "G1", "GOV1"], instead of a string like "B1,G1,GOV1".
I'm sure my approach isn't the most efficient or Pythonic, but I believe it works:
new_data_list = []
for raw_dct in data:
    if raw_dct['title'] in [dct['title'] for dct in new_data_list]:
        for new_dct in new_data_list:
            if raw_dct['title'] == new_dct['title']:
                for k, v in raw_dct.items():
                    if k in new_dct.keys():
                        if type(v) is str:
                            if v not in new_dct[k]:
                                new_dct[k] += "," + str(v)
                        if type(v) is int:
                            new_dct[k] += v
                    else:
                        new_dct[k] = v
    else:
        new_data_list.append(raw_dct)
This gives the combined new_data_list and also sums integer attributes like "count_failed".
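The dictionary-of-dictionaries idea mentioned above might look something like the following minimal sketch. The names grouped, datacenters, and combined are made up for illustration, and the sketch assumes that count_failed should be summed and that a missing count counts as 0.

```python
data = [
    {"title": "cdap_tests", "datacenter": "B1", "count_failed": 1},
    {"title": "cdap_tests", "datacenter": "G1", "count_failed": 1},
    {"title": "cdap_tests", "datacenter": "GOV1", "count_failed": 1},
    {"title": "developer_portal_tests", "datacenter": "B1", "count_failed": 1},
]

grouped = {}
for row in data:
    # One entry per title, created the first time that title is seen.
    entry = grouped.setdefault(
        row["title"],
        {"title": row["title"], "datacenters": [], "count_failed": 0},
    )
    if row["datacenter"] not in entry["datacenters"]:
        entry["datacenters"].append(row["datacenter"])
    # Assumption: failures are summed; a missing count is treated as 0.
    entry["count_failed"] += row.get("count_failed", 0)

# Flatten back to the list-of-dicts shape with a comma-joined datacenter string.
combined = [
    {
        "title": entry["title"],
        "datacenter": ",".join(entry["datacenters"]),
        "count_failed": entry["count_failed"],
    }
    for entry in grouped.values()
]
print(combined)
```

Unlike the loop above, this version leaves the original rows in data unmodified, since it only reads from them.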

iteratively appending N items to list gives last item N times instead

I am accessing a list of dictionary items list_of_dict = [{'ka':'1a', 'kb':'1b', 'kc':'1c'},{'ka':'2a'},{'ka':'3a', 'kb':'3b', 'kc':'3c'}], and trying to conditionally append each entry to another list article_list using a function add_entries.
My desired output
article_list = [{x:1a, y:1b, z:1c}, {x:2a}, {x:3a, y:3b, z:3c}]
My code:
def add_entries(list_of_dict):
    keys = ['x','y','z']
    #defining a temporary dictionary here
    my_dict = dict.fromkeys(keys,0)
    entry_keys = ['ka','kb','kc']
    my_list = []
    for item in list_of_dict:
        # conditionally append the entries into the temporary dictionary maintaining desired key names
        my_dict.update({a: item[b] for a,b in zip(keys, entry_keys) if b in item})
        my_list.append(my_dict)
    return my_list
if __name__ == "__main__":
    list_of_dict = [{'ka':'1a', 'kb':'1b', 'kc':'1c'},{'ka':'2a'},{'ka':'3a', 'kb':'3b', 'kc':'3c'}]
    article_list = []
    returned_list = add_entries(list_of_dict)
    article_list.extend(returned_list)
My output
article_list = [{x:3a, y:3b, z:3c}, {x:3a, y:3b, z:3c}, {x:3a, y:3b, z:3c}]
What's wrong
my_list.append(my_dict) appends a reference to the my_dict object to my_list. Therefore, at the end of the for loop in your example, my_list is a list of references to the exact same dictionary in memory.
You can see this for yourself using the function id. id, at least in CPython, basically gives you the memory address of an object. If you do
article_list.extend(returned_list)
print([id(d) for d in article_list])
You'll get a list of identical memory addresses. On my machine I get
[139920570625792, 139920570625792, 139920570625792]
The consequence is that updating the dictionary affects 'all of the dictionaries' in your list. (quotes because really there are not multiple dictionaries in your list, just many times the exact same one). So in the end, only the last update operation is visible to you.
A good discussion on references and objects in Python can be found in this answer https://stackoverflow.com/a/30340596/8791491
The fix
Moving the definition of my_dict inside the for loop means that you get a new, separate dictionary for each element of my_list. Now the update operation won't affect the other dictionaries in the list, and my_list is a list of references to several different dictionaries.
def add_entries(list_of_dict):
    keys = ['x','y','z']
    entry_keys = ['ka','kb','kc']
    my_list = []
    for item in list_of_dict:
        #defining a new dictionary here
        my_dict = dict.fromkeys(keys,0)
        # conditionally append the entries into the temporary dictionary maintaining desired key names
        my_dict.update({a: item[b] for a,b in zip(keys, entry_keys) if b in item})
        my_list.append(my_dict)
    return my_list
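Note that dict.fromkeys(keys, 0) pre-fills every key with 0, so an entry such as {'ka':'2a'} comes out as {'x': '2a', 'y': 0, 'z': 0}. If keys missing from the input should be left out instead, as in the desired output, one possible sketch is to build each dictionary directly with a comprehension. The function name rename_entries is made up for illustration.

```python
def rename_entries(list_of_dict):
    keys = ['x', 'y', 'z']
    entry_keys = ['ka', 'kb', 'kc']
    # A new dict is created for every item, so nothing is shared between entries.
    return [
        {new: item[old] for new, old in zip(keys, entry_keys) if old in item}
        for item in list_of_dict
    ]


list_of_dict = [{'ka': '1a', 'kb': '1b', 'kc': '1c'}, {'ka': '2a'}, {'ka': '3a', 'kb': '3b', 'kc': '3c'}]
article_list = rename_entries(list_of_dict)
print(article_list)
```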
You can use this.
keys=['x','y','z']
res=[{k1:d[k]} for d in list_of_dict for k,k1 in zip(d,keys)]
Output:
[{'x': '1a'},
 {'y': '1b'},
 {'z': '1c'},
 {'x': '2a'},
 {'x': '3a'},
 {'y': '3b'},
 {'z': '3c'}]
Try this:
list_new_d=[]
for d in list_of_dict:
    new_d={}
    for k,v in d.items():
        if k == 'ka':
            new_d['x'] = v
        if k == 'kb':
            new_d['y'] = v
        if k == 'kc':
            new_d['z'] = v
    list_new_d.append(new_d)

Find value in list of dicts and return id of dict

I'd like to try and find if a value is in a list of dicts, which can be done easily with:
if any(x['aKey'] == 'aValue' for x in my_list_of_dicts):
But this is only a boolean response, I'd like to not only check if the value is there, but to also access it afterwards, so something like:
for i, dictionary in enumerate(my_list_of_dicts):
    if dictionary['aKey'] == 'aValue':
        # Then do some stuff to that dictionary here
        # my_list_of_dicts[i]['aNewKey'] = 'aNewValue'
Is there a better/more pythonic way of writing this out?
With the next function, if you expect to find only one target dict:
my_list_of_dicts = [{'aKey': 1}, {'aKey': 'aValue'}]
target_dict = next((d for d in my_list_of_dicts if d['aKey'] == 'aValue'), None)
if target_dict: target_dict['aKey'] = 'new_value'
print(my_list_of_dicts)
The output (input list with updated dictionary):
[{'aKey': 1}, {'aKey': 'new_value'}]
You can use a list comprehension. This will return a list of dictionaries that match your condition.
[x for x in my_list_of_dicts if x['aKey']=='aValue' ]
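If the position of the match is needed as well, the next and enumerate ideas above can be combined. This is only a sketch: it assumes that only the first match matters, and it uses dict.get so that dictionaries without 'aKey' are skipped rather than raising a KeyError. The sample list is invented for illustration.

```python
my_list_of_dicts = [{'aKey': 1}, {'other': 2}, {'aKey': 'aValue'}]

# next() returns the first (index, dict) pair that matches,
# or the default (None, None) when nothing matches.
index, match = next(
    ((i, d) for i, d in enumerate(my_list_of_dicts) if d.get('aKey') == 'aValue'),
    (None, None),
)

if match is not None:
    match['aNewKey'] = 'aNewValue'  # mutates the dict stored in the list

print(index, my_list_of_dicts)
```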

Get key by more than one value in dictionary?

Maybe the dict is not intended to be used in this way, but I need to add more than one value to the same key. My intention is to use a kind of transitive property. If my dict is A:B and B:C, then I want to have the dict A:[B,C].
Let's make an example in order to explain better what I'd like to do:
numDict={'60':['4869'], '4869':['629'], '13':['2']}
I want it to return:
{'60':['4869','629'], '13':['2']}
For just two elements, it is possible to use something like this:
result={}
for key in numDict.keys():
    if [key] in numDict.values():
        result[list(numDict.keys())[list(numDict.values()).index([key])]]=[key]+numDict[key]
But what about if I have more elements? For example:
numDict={'60':['4869'], '4869':['629'], '13':['2'], '629':['427']}
What can I do in order to get {'60':['4869','629','427'], '13':['2']} returned?
def unchain(d):
    #assemble a collection of keys that are not also values. These will be the keys of the final dict.
    top_level_keys = set(d.keys()) - set(d.values())
    result = {}
    for k in top_level_keys:
        chain = []
        #follow the reference chain as far as necessary.
        value = d[k]
        while True:
            if value in chain: raise Exception("Referential loop detected: {} encountered twice".format(value))
            chain.append(value)
            if value not in d: break
            value = d[value]
        result[k] = chain
    return result
numDict={'60':'4869', '4869':'629', '13':'2', '629':'427'}
print(unchain(numDict))
Result:
{'60': ['4869', '629', '427'], '13': ['2']}
You might notice that I changed the layout of numDict since it's easier to process if the values aren't one-element lists. But if you're dead set on keeping it that way, you can just add d = {k:v[0] for k,v in d.items()} to the top of unchain, to convert from one to the other.
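For reference, here is a rough, self-contained sketch of that variant, which accepts the original layout with one-element lists. The name unchain_lists is hypothetical, and the sketch assumes that every value is a list with exactly one element. It iterates over the dict rather than a set, so the result keeps the input's key order.

```python
def unchain_lists(d):
    # Convert one-element lists to plain values: {'60': ['4869']} -> {'60': '4869'}
    flat = {k: v[0] for k, v in d.items()}
    referenced = set(flat.values())
    result = {}
    for key in flat:
        if key in referenced:
            continue  # this key is some other key's value, so it is not top-level
        chain = []
        value = flat[key]
        while True:
            if value in chain:
                raise ValueError("Referential loop detected: {} encountered twice".format(value))
            chain.append(value)
            if value not in flat:
                break
            value = flat[value]
        result[key] = chain
    return result


numDict = {'60': ['4869'], '4869': ['629'], '13': ['2'], '629': ['427']}
print(unchain_lists(numDict))
```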
You can build your own structure, consisting of a reverse mapping of (values, key), and a dictionary of (key, [values]). Adding a key, value pair consists of following a chain of existing entries via the reverse mapping, until it finds the correct location; in case it does not exist, it introduces a new key entry:
class Groupir:
    def __init__(self):
        self.mapping = {}
        self.reverse_mapping = {}
    def add_key_value(self, k, v):
        self.reverse_mapping[v] = k
        val = v
        key = k
        while True:
            try:
                self.reverse_mapping[val]
                key = val
                val = self.reverse_mapping[val]
            except KeyError:
                try:
                    self.mapping[val].append(v)
                except KeyError:
                    self.mapping[val] = [v]
                break
with this test client:
groupir = Groupir()
groupir.add_key_value(60, 4869)
print(groupir.mapping)
groupir.add_key_value(4869, 629)
print(groupir.mapping)
groupir.add_key_value(13, 2)
print(groupir.mapping)
groupir.add_key_value(629, 427)
print(groupir.mapping)
outputs:
{60: [4869]}
{60: [4869, 629]}
{60: [4869, 629], 13: [2]}
{60: [4869, 629, 427], 13: [2]}
Restrictions:
Cycles as mentioned in comments.
Non unique keys
Non unique values
Probably some corner cases to take care of.
I have written some code for it. See if it helps.
Starting from each root key, I keep following the chain as deep as it goes and mark every key reached along the way as visited, since it will no longer be needed as a key of its own. At the end I filter out everything except the root keys.
numDict={'60':['4869'], '4869':['629'], '13':['2'], '629':['427']}
l = list(numDict) # list of keys
l1 = {i:-1 for i in numDict} # to track visited keys (initialized to -1 initially)
for i in numDict:
    # if key is root and diving in is possible
    if l1[i] == -1 and numDict[i][0] in l:
        t = numDict[i][0]
        while(t in l): # dive deeper and deeper
            numDict[i].extend(numDict[t]) # update the value of key
            l1[t] = 1 # mark as visited
            t = numDict[t][0]
# filter the root keys
answer = {i:numDict[i] for i in numDict if l1[i] == -1}
print(answer)
Output:
{'60': ['4869', '629', '427'], '13': ['2']}

Deleting multiple items in a dict

I've searched around for the error it gives me, but I don't understand it very well. The answers did something with for k, v in dbdata.items, but that didn't work for me either; it gives me other errors.
What I want is to delete multiple items.
tskinspath = ['1', '2']
#
dbdata = {}
dbdata['test'] = {}
dbdata['test']['skins_t'] = {}
# Adds the items
dbdata['test']['skins_t']['1'] = 1
dbdata['test']['skins_t']['2'] = 0
dbdata['test']['skins_t']['3'] = 0
dbdata['test']['skins_t']['4'] = 0
# This doesn't work
for item in dbdata["test"]["skins_t"]:
    if item not in tskinspath:
        if dbdata["test"]["skins_t"][item] == 0:
            del dbdata["test"]["skins_t"][item]
# RuntimeError: dictionary changed size during iteration
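Instead of iterating over the dictionary itself, iterate over a copy of its items. In Python 2, dict.items() returns a list, so this works as written:
for key, value in dbdata["test"]["skins_t"].items():
    if key not in tskinspath:
        if value == 0:
            del dbdata["test"]["skins_t"][key]
On py3.x, items() returns a view of the dictionary, so use list(dbdata["test"]["skins_t"].items()) instead.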
Alternative:
to_be_deleted = []
# .items() works on both Python 2 and 3 (.iteritems() exists only on Python 2)
for key, value in dbdata["test"]["skins_t"].items():
    if key not in tskinspath:
        if value == 0:
            to_be_deleted.append(key)
for k in to_be_deleted:
    del dbdata["test"]["skins_t"][k]
The error message says it: you shouldn't modify the dictionary that you are iterating over. Try
for item in set(dbdata['test']['skins_t']):
    ...
This way you are iterating over a set that contains all keys from dbdata['test']['skins_t'].
Since the question's details stray from its title: if you are looking for a way to delete multiple keys from a given dict, use this snippet, where s is the dict and keep holds the keys to retain:
[s.pop(k) for k in list(s.keys()) if k not in keep]
Additionally, you can create a new dict via comprehension.
new_dict = { k: old_dict[k] for k in keep }
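Putting the approaches above together for the data in the question, a minimal Python 3 sketch might look like this. The variable names skins, original, and filtered are introduced here for illustration only.

```python
tskinspath = ['1', '2']
dbdata = {'test': {'skins_t': {'1': 1, '2': 0, '3': 0, '4': 0}}}

skins = dbdata['test']['skins_t']

# Option 1: delete in place while iterating over a snapshot of the keys.
for key in list(skins):
    if key not in tskinspath and skins[key] == 0:
        del skins[key]

print(dbdata)

# Option 2: build a new dict that keeps only the wanted entries.
original = {'1': 1, '2': 0, '3': 0, '4': 0}
filtered = {k: v for k, v in original.items() if k in tskinspath or v != 0}
print(filtered)
```

Option 1 changes the existing dictionary, which matters if other code holds a reference to it. Option 2 leaves the original untouched and returns a fresh dict.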
