Sorting a Dictionary by Nested Key - python

Consider a dict of the following form:
dic = {
"First": {
3: "Three"
},
"Second": {
1: "One"
},
"Third": {
2:"Two"
}
}
I would like to sort it by the nested dict key (3, 1, 2).
I tried using a lambda function in the following manner, but it raises "KeyError: 0":
dic = sorted(dic.items(), key=lambda x: x[1][0])
The expected output would be:
{
"Second": {
1: "One"
},
"Third": {
2: "Two"
},
"First": {
3:"Three"
}
}
In essence what I want to know is how to designate a nested key independently from the main dictionary key.

In the lambda function, x is a key-value pair, x[1] is the value, which is itself a dictionary. x[1].keys() is its keys, but it needs to be turned into a list if you want to get its one and only item by its index. Thus:
sorted(dic.items(), key = lambda x: list(x[1].keys())[0])
which evaluates to:
[('Second', {1: 'One'}), ('Third', {2: 'Two'}), ('First', {3: 'Three'})]

dic = {'First': {3: 'Three'}, 'Second': {1: 'One'}, 'Third': {2: 'Two'}}
sorted_list = sorted(dic.items(), key=lambda x:list(x[1].keys())[0])
sorted_dict = dict(sorted_list)
print(sorted_dict)
You first need to get the keys of the nested dictionary, convert them to a list, and sort on its first element. That gives you a sorted list of key-value pairs; all you need to do then is convert the list back to a dictionary using dict(). I hope that helps. This snippet works in Python 3 (the result keeps its sorted order because dicts preserve insertion order from Python 3.7 onward).
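As a variation on the answers above, here is a minimal sketch that reads the nested key with next(iter(...)) instead of building a list first. It assumes every nested dict is non-empty and that Python 3.7+ is used, so the rebuilt dict keeps the sorted order. The helper name first_nested_key is made up for illustration.

```python
dic = {"First": {3: "Three"}, "Second": {1: "One"}, "Third": {2: "Two"}}

def first_nested_key(item):
    # item is a (key, nested_dict) pair from dic.items().
    # Return the first key of the nested dict (assumed non-empty).
    _, nested = item
    return next(iter(nested))

# sorted() returns a list of pairs; dict() turns it back into a dictionary.
sorted_dict = dict(sorted(dic.items(), key=first_nested_key))
print(sorted_dict)
```

If a nested dict could hold several keys, min(nested) in place of next(iter(nested)) would sort by the smallest one instead.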

Related

TypeError: string indices must be integers while extracting the keys

Dictionary is as the following
my = {
"1": {
"first": 'A,B',
"column": "value",
"test":"test",
"output": "Out1",
"second": "Cost",
"Out2": "Rev"
},
"2": {
"first": 'None',
"column": "value",
"test":"test",
"output": "Out2",
"Out2": "Rev"
}
}
Code I tried is the following
{k:{l:l[i] for i in ['first','test'] for l,m in v.items()} for k,v in my.items()}
I am trying to extract only the two keys ['first','test']. There is also a chance that ['first','test'] do not exist.
I am getting
TypeError: string indices must be integers. What is the problem with the code?
Let's take one of the subdictionaries to understand what is going wrong here.
"1": {
"first": 'A,B',
"column": "value",
"test":"test",
"output": "Out1",
"second": "Cost",
"Out2": "Rev"
},
{k:{l:l[i] for i in ['first','test'] for l,m in v.items()} for k,v in my.items()}
The variable k in your code will be the key "1" and the value v will be the subdictionary.
Then, in "for l, m in v.items()", l is a dictionary key, which is a string such as "first" or "test". So when you write l:l[i], you are indexing the string "first" with another string rather than with an integer; in effect you are evaluating "first"["first"].
That is why you see a TypeError with the message "string indices must be integers".
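The failure described above can be reproduced in isolation. This is only a small sketch; the variable names are illustrative, and the exact wording of the error message varies between Python versions, so it is printed rather than assumed.

```python
sub_key = "first"   # what l holds inside the comprehension
index = "first"     # what i holds inside the comprehension

try:
    sub_key[index]  # indexing a str with a str, as l[i] does
except TypeError as exc:
    error = exc     # keep a reference so it can be inspected afterwards
    print(type(exc).__name__, exc)
```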
If you want a clever one liner, this should work
{
    key: {sub_key: sub_dict[sub_key] for sub_key in ["first", "test"]}
    for key, sub_dict in my.items()
}
Personally, I would write
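selected_dict = dict()
for key, value in my.items():
    # one sub-dictionary per top-level key, so nothing gets overwritten
    selected_dict[key] = dict()
    for sub_key in ["first", "test"]:
        selected_dict[key][sub_key] = value[sub_key]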
Both of them could work:
print({k: {"first": v["first"], "test": v["test"]} for k, v in my.items()})
print({k: {i: v[i] for i in ['first','test']} for k, v in my.items()})
This works as expected
print({x: {i: y[i] for i in ['first', 'test']} for x, y in my.items()})
output
{'1': {'first': 'A,B', 'test': 'test'}, '2': {'first': 'None', 'test': 'test'}}
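The question also mentions that 'first' or 'test' may be missing, in which case the comprehensions above would raise a KeyError. A minimal sketch of one way to handle that, assuming missing keys should simply be skipped (the trimmed sample data and the names wanted and selected are illustrative):

```python
my = {
    "1": {"first": "A,B", "column": "value", "test": "test"},
    "2": {"column": "value", "test": "test"},  # no "first" key here
}

wanted = ["first", "test"]

# Keep only the wanted keys that are actually present in each sub-dictionary.
selected = {
    key: {k: sub[k] for k in wanted if k in sub}
    for key, sub in my.items()
}
print(selected)
# {'1': {'first': 'A,B', 'test': 'test'}, '2': {'test': 'test'}}
```

If a placeholder is preferred over skipping, sub.get(k) without the "if k in sub" filter would fill missing keys with None.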

How to get total number of repeated objects and respective keys from a python dictionary having multiple objects?

I have a python dictionary which consists of many nested dictionaries. I.e. it looks like this:
result = {
    123: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    456: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    789: {
        'route1': 'abc2',
        'route2': 'abc3'
    },
    101: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    102: {
        'route1': 'ab4',
        'route2': 'abc5'
    }
}
Here we can see that 123, 456 and 101 have the same values.
What I am trying to do is find the repeated object, which in this case is:
{
    'route1': 'abc',
    'route2': 'abc1'
}
and the keys which have this repeated object i.e. 123, 456 and 101.
How can we do this?
Along with the repeated objects, I also want to know which objects do not repeat, i.e. 789 and its respective object and 102 and its respective object.
PS: Please note that I don't know beforehand which objects are repeating, as this structure will be generated inside code. So it's possible that there are no repeated objects at all, or that there is more than one.
Also, I can not use pandas or numpy etc. due to some restrictions.
You can do this by creating a dictionary holding all the matching keys for each distinct value in your result dict (where the values are themselves dicts). This is a fairly common pattern in Python, iterating through one container and aggregating values into a dict. Then, once you've created the aggregation dict, you can split it into duplicate and single values.
To build the aggregation dict, you need to use each subdict from result as a key and append the matching keys from the original dict to a list associated with that subdict. The challenge is that you can't use the subdicts directly as dictionary keys, because they are not hashable. But you can solve that by converting them to tuples. The tuples should also be sorted, to avoid missing duplicates that happen to pop out with different ordering.
It may be easier to understand just by looking at some example code:
result = {
123: {'route1': 'abc', 'route2': 'abc1'},
456: {'route1': 'abc', 'route2': 'abc1'},
789: {'route1': 'abc2', 'route2': 'abc3'},
101: {'route1': 'abc', 'route2': 'abc1'},
102: {'route1': 'ab4', 'route2': 'abc5'}
}
# make a dict showing all the keys that match each subdict
cross_refs = dict()
for key, subdict in result.items():
    # make hashable version of subdict (can't use dict as lookup key)
    subdict_tuple = tuple(sorted(subdict.items()))
    # create an empty list of keys that match this val
    # (if needed), or retrieve existing list
    matching_keys = cross_refs.setdefault(subdict_tuple, [])
    # add this item to the list
    matching_keys.append(key)
# make lists of duplicates and non-duplicates
dups = {}
singles = {}
for subdict_tuple, keys in cross_refs.items():
    # convert hashed value back to a dict
    subdict = dict(subdict_tuple)
    if len(keys) > 1:
        # convert the list of matching keys to a tuple and use as the key
        dups[tuple(keys)] = subdict
    else:
        # there's only one matching key, so use that as the key
        singles[keys[0]] = subdict
print(dups)
# {
# (456, 123, 101): {'route2': 'abc1', 'route1': 'abc'}
# }
print(singles)
# {
# 789: {'route2': 'abc3', 'route1': 'abc2'},
# 102: {'route2': 'abc5', 'route1': 'ab4'}
# }
Use collections.defaultdict:
from collections import defaultdict
d = defaultdict(list)
for k, v in result.items():
    d[tuple(v.items())].append(k)
desired = {
'route1': 'abc',
'route2': 'abc1'
}
d[tuple(desired.items())]
Output:
[456, 123, 101]
For not-repeated items, use list comprehension:
[v for v in d.values() if len(v) == 1]
Output:
[[102], [789]]
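One caveat with tuple(v.items()) is that it depends on the insertion order of each subdict, so two subdicts with the same content but built in a different order would land in different groups. A minimal sketch that sorts the items first, assuming the subdict values are hashable and the subdict keys can be compared with each other (the names groups, repeated and unique are illustrative):

```python
from collections import defaultdict

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route2': 'abc1', 'route1': 'abc'},  # same content, different order
    789: {'route1': 'abc2', 'route2': 'abc3'},
}

groups = defaultdict(list)
for key, subdict in result.items():
    # sorted() makes the lookup key independent of insertion order
    groups[tuple(sorted(subdict.items()))].append(key)

# Split the groups into repeated and non-repeated objects.
repeated = {items: keys for items, keys in groups.items() if len(keys) > 1}
unique = {items: keys for items, keys in groups.items() if len(keys) == 1}

print(repeated)
print(unique)
```

Each key of repeated and unique can be turned back into a dict with dict(items).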
You can use the drop_duplicates() function of pandas.
First, transform your dict into a DataFrame:
import pandas as pd
df = pd.DataFrame(result).T
Output :
    route1 route2
123    abc   abc1
456    abc   abc1
789   abc2   abc3
101    abc   abc1
102    ab4   abc5
Then use drop_duplicates (which keeps the first row of each duplicated group) and transform the result back to a dict:
df2 = df.drop_duplicates(subset=['route1', 'route2']).T.to_dict()
Output :
{
123: {
'route1': 'abc',
'route2': 'abc1'
},
789: {
'route1': 'abc2',
'route2': 'abc3'
},
102: {
'route1': 'ab4',
'route2': 'abc5'
}
}

Make arrays the same length, append json string

I have a Python script that adds to a list:
column = defaultdict(list)
[...]
for line in out.splitlines():
    column[i + 1].append({"row": str(line)})
[...]
f = open(save_dir + 'table_data.json', "w+")
f.write(json.dumps(column))
f.close()
This will ultimately generate a JSON file, with a string like below:
{ "1":[
{
"row":"Product/Descriptian"
}
],
"2":[
{
"row":"Qty/unit"
},
{
"row":"Text"
}
],
"3":[
{
"row":""
}
]}
As you can see, array["2"] has two values. I am trying to make all arrays the same length, so array["1"] and array["3"] should ultimately also have two values.
So in order to do this, I figure I have to find the longest array first:
longest_array = (max(map(len, column.values())))
This should return 2. Now I want to append an empty {"row":""} to the other arrays, to make them the same length:
final = ([v + ["{'row'}: ''"] * (longest_array - len(v)) for v in column.values()])
Which outputs below JSON string:
[
[
{
"row":"Product/Descriptian"
},
{
"row":""
}
],
[
{
"row":"Qty/unit"
},
{
"row":"Text"
}
],
[
{
"row":""
},
{
"row":""
}
]
]
This seems to work partially. However, I spot two errors in the newly created JSON string:
It seems to add another array around the first array. The JSON string now starts with [ [ {
It removes the "parent" arrays "1", "2" and "3"
The culprit is in line:
final = ([v + ["{'row'}: ''"] * (longest_array - len(v)) for v in column.values()])
which has two problems:
It's a list comprehension (instead of a dict comprehension): by iterating on column.values(), you lose all the keys, and all the lists corresponding to the values get "packed" into an outer (master) list
Not sure what you are trying to achieve with the double quotes (") in ["{'row'}: ''"]: that's a list with one string element, not a list with one dict
To solve your problem, turn the above line into:
final = {k: v + [{'row': ''}] * (longest_array - len(v)) for k, v in column.items()}
and final will become the expected dictionary:
>>> column
defaultdict(<class 'list'>, {'1': [{'row': 'Product/Descriptian'}], '2': [{'row': 'Qty/unit'}, {'row': 'Text'}], '3': [{'row': ''}]})
>>>
>>> longest_array_len = max((len(v) for v in column.values()))
>>> longest_array_len
2
>>> final = {k: v + [{'row': ''}] * (longest_array_len - len(v)) for k, v in column.items()}
>>>
>>> final
{'1': [{'row': 'Product/Descriptian'}, {'row': ''}], '2': [{'row': 'Qty/unit'}, {'row': 'Text'}], '3': [{'row': ''}, {'row': ''}]}
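One detail worth knowing about [{'row': ''}] * n: list multiplication repeats references to the same dict object rather than creating n separate dicts. That is harmless if the result is dumped to JSON straight away, but it matters if the padding entries are edited later. A minimal sketch that builds a fresh dict for each padding slot; the sample data is trimmed for illustration and a plain dict stands in for the defaultdict:

```python
import json

column = {
    "1": [{"row": "Product/Descriptian"}],
    "2": [{"row": "Qty/unit"}, {"row": "Text"}],
    "3": [],
}

longest = max(len(v) for v in column.values())

# The inner list comprehension creates a new {"row": ""} for every missing slot.
final = {
    k: v + [{"row": ""} for _ in range(longest - len(v))]
    for k, v in column.items()
}

print(json.dumps(final))
```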

iterating over values list in Python dictionary

Hi, I am looking to iterate over a Python dictionary where each key has a list of values. I want to either create a new list/dictionary in which I have separated each value[x], or do stuff directly with the separated values.
Here's a simplified example of the dictionary I have:
all_parameters = {"first": ["1a","1b","1c"], "second": ["2a","2b","2c"], "third": ["3a","3b","3c"]}
I am looking to separate the values like this (either by creating a new dictionary or list or directly doing stuff with the separated values).
grouped_parameters = [{"first": "1a", "second": "2a", "third": "3a"},
{"first": "1b", "second": "2b", "third": "3b"},
{"first": "1c", "second": "2c", "third": "3c"}]
I am unsure how to iterate correctly over each key:value pair.
i = 0
for k, v in all_parameters.items():
    for item in v:
        # stuck here
        i+=1
Eventually the 'stuff' I am looking to do is convert each output (e.g. '{"first": "1a", "second": "2a", "third": "3a"}') into a string so that I can post each parameter group to a cell in a table, so ideally I'd prefer to do this dynamically instead of creating a new dictionary.
Any help would be greatly appreciated.
Assuming all lists have the same length:
>>> length = len(next(all_parameters.itervalues()))
>>> [{k:v[i] for k,v in all_parameters.iteritems()} for i in range(length)]
[{'second': '2a', 'third': '3a', 'first': '1a'}, {'second': '2b', 'third': '3b', 'first': '1b'}, {'second': '2c', 'third': '3c', 'first': '1c'}]
In Python 3, use len(next(iter(all_parameters.values()))) and items instead of iteritems.
(The iterator shenanigans are done because you don't need a list of all the dictionary values if you only want the length of an arbitrary value-list.)
If there's a chance of the lists being of different length, you could use map with None like so:
all_parameters = {"first": ["1a", "1b", "1c", "1d"], "second": ["2a", "2b", "2c"], "third": ["3a", "3b", "3c"]}
final = [dict(zip(all_parameters.keys(), values)) for values in map(None, *all_parameters.values())]
print final
map(None, *all_parameters.values()) gives you a tuple of the values for each key at each index - e.g. ('1a', '2a', '3a'), and by zipping this to the keys and creating a dictionary, we get the required combination.
Note: this will only work in Python 2.x as map changed in 3.x. For Python 3.x we can use itertools.zip_longest:
from itertools import zip_longest
all_parameters = {"first": ["1a", "1b", "1c", "1d"], "second": ["2a", "2b", "2c"], "third": ["3a", "3b", "3c"]}
final = [dict(zip(all_parameters.keys(), values)) for values in zip_longest(*all_parameters.values())]
print(final)
In both cases we get:
[{'second': '2a', 'third': '3a', 'first': '1a'}, {'second': '2b', 'third': '3b', 'first': '1b'}, {'second': '2c', 'third': '3c', 'first': '1c'}, {'second': None, 'third': None, 'first': '1d'}]
The items in a plain dict aren't ordered *, so you need to be careful when converting a dict to a string if you want the fields to be in a certain order.
This code uses a tuple containing the key strings in the order we want them to appear in the output dict strings.
all_parameters = {
"first": ["1a","1b","1c"],
"second": ["2a","2b","2c"],
"third": ["3a","3b","3c"],
}
# We want keys to be in this order
all_keys = ("first", "second", "third")
# Assumes all value lists are the same length.
for t in zip(*(all_parameters[k] for k in all_keys)):
    a = ['"{}": "{}"'.format(u, v) for u, v in zip(all_keys, t)]
    print('{' + ', '.join(a) + '}')
output
{"first": "1a", "second": "2a", "third": "3a"}
{"first": "1b", "second": "2b", "third": "3b"}
{"first": "1c", "second": "2c", "third": "3c"}
How it works
We first use a generator expression (all_parameters[k] for k in all_keys) which yields the value lists from all_parameters in the order specified by all_keys. We pass those lists as args to zip using the * "splat" operator. So for your example data, it's equivalent to calling zip like this:
zip(["1a","1b","1c"], ["2a","2b","2c"], ["3a","3b","3c"])
zip effectively transposes the iterables you pass it, so the result of that call is an iterator that produces these tuples:
('1a', '2a', '3a'), ('1b', '2b', '3b'), ('1c', '2c', '3c')
We then loop over those tuples one by one, with the for t in zip(...), so on the first loop t gets the value ('1a', '2a', '3a'), then ('1b', '2b', '3b'), etc.
Next we have a list comprehension that zips the value strings up with the corresponding key string and formats them into a string with double-quotes around each key and value string. We then join those strings together with commas and spaces as separators (and add brace characters) to make our final dict strings.
* Actually, in CPython 3.6 plain dicts do retain insertion order as an implementation detail, and from Python 3.7 onward this is guaranteed by the language. On older versions it should not be relied upon.
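Since the goal is a string per parameter group, json.dumps can do the quoting instead of manual string formatting. This is a sketch under the same assumptions as the answer above (all value lists have the same length), plus Python 3.7+ so that each small dict keeps the key order given by all_keys; the name rows is illustrative.

```python
import json

all_parameters = {
    "first": ["1a", "1b", "1c"],
    "second": ["2a", "2b", "2c"],
    "third": ["3a", "3b", "3c"],
}

# We want keys to be in this order
all_keys = ("first", "second", "third")

# zip(*...) transposes the value lists; dict(zip(...)) pairs keys with values.
rows = [
    json.dumps(dict(zip(all_keys, t)))
    for t in zip(*(all_parameters[k] for k in all_keys))
]

for row in rows:
    print(row)
# {"first": "1a", "second": "2a", "third": "3a"}
# {"first": "1b", "second": "2b", "third": "3b"}
# {"first": "1c", "second": "2c", "third": "3c"}
```

Unlike the hand-built strings, json.dumps also escapes quotes and other special characters inside the values.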
I'm not sure if this is what you want but
for i in range(len(all_parameters['first'])):
    for position in all_parameters:
        print(position+" "+all_parameters[position][i])
gives an output like this
first 1a
second 2a
third 3a
first 1b
second 2b
third 3b
first 1c
second 2c
third 3c
This will work only if each dictionary element has a list the same size as the first one.

Get or delete value from a nested list in Python

I read 112 values from an Excel file, transferred them to a dict, and then to a list. They are grades from three exams, keyed by student name.
The data looks like this:
grade = ["Mike", {"First": 0.9134344089913918, "Second": 0.9342180467620398, "Third": 0.8703466591191937}]
["Lisa", {"First": 0.8940552022848847, "Second": 0.9342180467620398, "Third": 0.881441786523737}]
["James", {"First": 0.8324328746220233, "Second": 0.9342180467620398, "Third": 0.683570699129642}]
Above are the first three sets of values.
My goal is to obtain just the first exam value from the list.
The result should look like this:
["Mike", {"First": 0.9134344089913918}]
["Lisa", {"First": 0.8940552022848847}]
["James", {"First": 0.8324328746220233}]
I can think of two ways: a) delete the second and third exam values, or b) just get the first exam value.
Can anyone help me? It is not a typical list, but it is not a dict either. When I use
print type(grade), it shows that it is a list.
Thank you.
It is more natural to represent your initial structure as a dict (as well as the resulting structure).
all_grades = dict(grade)
Now, use a dict-comprehension, creating a new one-item-dict for each student:
first_grades = {
    name: { 'First': grades_dict['First'] }
    for name, grades_dict in all_grades.items()
}
Here's a generalisation of this concept:
first_grades = {
    name: {
        grade_key: grades_dict[grade_key]
        for grade_key in ( 'First', )
    }
    for name, grades_dict in all_grades.items()
}
You end up with:
{'James': {'First': 0.8324328746220233},
'Lisa': {'First': 0.8940552022848847},
'Mike': {'First': 0.9134344089913918}}
grade = [["Mike", {"First": 0.9134344089913918, "Second": 0.9342180467620398, "Third": 0.8703466591191937}],["James", {"First": 0.8324328746220233, "Second": 0.9342180467620398, "Third": 0.683570699129642}]]
newdict = []
for person, marks in grade:
    print(marks)
    newdict.append([person, {'First': marks['First']}])
# newdict is now [['Mike', {'First': 0.9134344089913918}], ['James', {'First': 0.8324328746220233}]]
Please make sure your grades are a list of lists, each containing the person and their marks, as in mine. Then the value can be retrieved easily.
I'm not sure if I understand your data structure. You said you have a dict before you made a list. Maybe you could use this:
Data structure
>>> grade
{'Lisa': {'Third': 0.881441786523737, 'First': 0.8940552022848847, 'Second': 0.9342180467620398}, 'Mike': {'Third': 0.8703466591191937, 'First': 0.9134344089913918, 'Second': 0.9342180467620398}}
The result
>>> [{key:{'First':value['First']}} for key,value in grade.items()]
[{'Lisa': {'First': 0.8940552022848847}}, {'Mike': {'First': 0.9134344089913918}}]
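The answers above all take route b) from the question (build a new structure holding only the first exam). For route a), deleting the other exams in place, a minimal sketch could look like this. It assumes grade is a list of [name, marks_dict] pairs as in the second answer, and the shortened scores are made up for illustration.

```python
grade = [
    ["Mike", {"First": 0.91, "Second": 0.93, "Third": 0.87}],
    ["Lisa", {"First": 0.89, "Second": 0.93, "Third": 0.88}],
]

for name, marks in grade:
    # Iterate over a copy of the keys, because the dict is modified in the loop.
    for exam in list(marks):
        if exam != "First":
            del marks[exam]

print(grade)
# [['Mike', {'First': 0.91}], ['Lisa', {'First': 0.89}]]
```

This changes the original dicts, so it only suits cases where the second and third exam values are no longer needed.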
