I have a list of dictionaries in the following format:
data = [
{
"Members": [
"user11",
"user12",
"user13"
],
"Group": "Group1"
},
{
"Members": [
"user11",
"user21",
"user22",
"user23"
],
"Group": "Group2"
},
{
"Members": [
"user11",
"user22",
"user31",
"user32",
"user33",
],
"Group": "Group3"
}]
I'd like to return a dictionary where every user is a key and the value is a list of all the groups which they belong to. So for the above example, this dict would be:
newdict = {
"user11": ["Group1", "Group2", "Group3"],
"user12": ["Group1"],
"user13": ["Group1"],
"user21": ["Group2"],
"user22": ["Group2", "Group3"],
"user23": ["Group2"],
"user31": ["Group3"],
"user32": ["Group3"],
"user33": ["Group3"],
}
My initial attempt was using a defaultdict in a nested loop, but this is slow (and also isn't returning what I expected). Here was that attempt:
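from collections import defaultdict

# users: a list of all user names, built elsewhere (not shown in the question)
user_groups = defaultdict(list)
for user in users:
    for item in data:
        if user in item["Members"]:
            user_groups[user].append(item["Group"])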
Does anyone have any suggestions for improvement for speed, and also just a generally better way to do this?
Code
new_dict = {}
for d in data:  # each item is a dictionary
    members = d["Members"]
    for m in members:
        # append the corresponding group for each member
        new_dict.setdefault(m, []).append(d["Group"])
print(new_dict)
Out
{'user11': ['Group1', 'Group2', 'Group3'],
'user12': ['Group1'],
'user13': ['Group1'],
'user21': ['Group2'],
'user22': ['Group2', 'Group3'],
'user23': ['Group2'],
'user31': ['Group3'],
'user32': ['Group3'],
'user33': ['Group3']}
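For comparison, the same single pass can be written with the defaultdict the question already reached for. The original attempt was slow because it rescanned every group's member list once per user; walking the groups once avoids that. A minimal sketch, assuming data has the shape shown in the question (shortened here for brevity):

```python
from collections import defaultdict

# Shortened sample in the same shape as the question's data.
data = [
    {"Members": ["user11", "user12"], "Group": "Group1"},
    {"Members": ["user11", "user21"], "Group": "Group2"},
]

user_groups = defaultdict(list)
for item in data:                   # one pass over the groups
    for member in item["Members"]:  # one pass over each member list
        user_groups[member].append(item["Group"])

print(dict(user_groups))
# {'user11': ['Group1', 'Group2'], 'user12': ['Group1'], 'user21': ['Group2']}
```

Converting with dict(...) at the end is optional; it just stops later lookups of unknown users from silently creating empty lists.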
Related
I have some json that I would like to transform from this:
[
{
"name":"field1",
"intValue":"1"
},
{
"name":"field2",
"intValue":"2"
},
...
{
"name":"fieldN",
"intValue":"N"
}
]
into this:
{ "field1" : "1",
"field2" : "2",
...
"fieldN" : "N",
}
For each pair, I need to change the value of the name field to a key, and the values of the intValue field to a value. This doesn't seem like flattening or denormalizing. Are there any tools that might do this out-of-the-box, or will this have to be brute-forced? What's the most pythonic way to accomplish this?
parameters = [ # assuming this is loaded already
{
"name":"field1",
"intValue":"1"
},
{
"name":"field2",
"intValue":"2"
},
{
"name":"fieldN",
"intValue":"N"
}
]
field_int_map = dict()
for p in parameters:
    field_int_map[p['name']] = p['intValue']
yields {'field1': '1', 'field2': '2', 'fieldN': 'N'}
or as a dict comprehension
field_int_map = {p['name']:p['intValue'] for p in parameters}
This works to combine the name attribute with the intValue as key:value pairs, but the result is a dictionary instead of the original input type which was a list.
Use dictionary comprehension:
json_dct = {"parameters":
[
{
"name":"field1",
"intValue":"1"
},
{
"name":"field2",
"intValue":"2"
},
{
"name":"fieldN",
"intValue":"N"
}
]}
dct = {d["name"]: d["intValue"] for d in json_dct["parameters"]}
print(dct)
# {'field1': '1', 'field2': '2', 'fieldN': 'N'}
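If the data arrives as a JSON string rather than an already-loaded Python object, the standard json module can parse it first. A small sketch, assuming a hypothetical string raw that holds the same kind of list as in the question:

```python
import json

# Hypothetical raw JSON text; in practice this might come from a file or an API.
raw = '[{"name": "field1", "intValue": "1"}, {"name": "field2", "intValue": "2"}]'

parameters = json.loads(raw)  # list of dicts
field_int_map = {p["name"]: p["intValue"] for p in parameters}

print(field_int_map)
# {'field1': '1', 'field2': '2'}
```

Note that if two entries share the same name, the later one overwrites the earlier one in the resulting dictionary.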
I want to delete the 'date' and 'last_modified' keys from the following nested dictionary. Kindly suggest an elegant way to do this dynamically in Python.
{
"total_pages":1,
"datasets":[
{
"dataset_name":"enterpriseqa-landing-zone_census2017",
"database":"enterpriseqa-landing-zone",
"table":"census2017",
"owner":"qadataengineer",
"zone":"landing",
"date":"2020-06-09T07:11:25+00:00",
"location":"s3://enterpriseqa-landing-zone/static/census2017/",
"count":"5507",
"classification":"csv",
"last_modified":"2020-06-09T07:15:49+00:00",
"type":"Static"
}
]
}
If d is your dictionary from the question, you can use this example to delete the keys:
for dataset in d['datasets']:
    del dataset['date']
    del dataset['last_modified']
Produces this dictionary:
{
"total_pages": 1,
"datasets": [
{
"dataset_name": "enterpriseqa-landing-zone_census2017",
"database": "enterpriseqa-landing-zone",
"table": "census2017",
"owner": "qadataengineer",
"zone": "landing",
"location": "s3://enterpriseqa-landing-zone/static/census2017/",
"count": "5507",
"classification": "csv",
"type": "Static"
}
]
}
You can do it like this:
keys = ["date", "last_modified"]
for d in dictionary["datasets"]:
    for key in keys:
        d.pop(key, None)  # the None default avoids a KeyError if a key is missing
Where dictionary is your dictionary.
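Both answers above assume the keys live directly under each entry of "datasets". If the keys can appear at an unknown depth, one possible approach is a small recursive helper. This is only a sketch: drop_keys is a hypothetical name, and it returns a cleaned copy rather than modifying the input.

```python
def drop_keys(obj, keys):
    """Return a copy of obj with the given keys removed at every nesting level."""
    if isinstance(obj, dict):
        return {k: drop_keys(v, keys) for k, v in obj.items() if k not in keys}
    if isinstance(obj, list):
        return [drop_keys(item, keys) for item in obj]
    return obj  # scalars are returned unchanged


# Shortened version of the dictionary from the question.
d = {
    "total_pages": 1,
    "datasets": [
        {
            "table": "census2017",
            "date": "2020-06-09T07:11:25+00:00",
            "last_modified": "2020-06-09T07:15:49+00:00",
            "type": "Static",
        }
    ],
}

cleaned = drop_keys(d, {"date", "last_modified"})
print(cleaned)
# {'total_pages': 1, 'datasets': [{'table': 'census2017', 'type': 'Static'}]}
```

Because a new structure is built, the original d still contains the removed keys afterwards.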
I have a list of dictionaries as below. I need to iterate over the list and, in every entry of each sections list, replace the contents of parameters with an empty dictionary.
input = [
{
"category":"Configuration",
"sections":[
{
"section_name":"Global",
"parameters":{
"name":"first",
"age":"second"
}
},
{
"section_name":"Operator",
"parameters":{
"adrress":"first",
"city":"first"
}
}
]
},
{
"category":"Module",
"sections":[
{
"section_name":"Global",
"parameters":{
"name":"first",
"age":"second"
}
}
]
}
]
Expected Output:
[
{
"category":"Configuration",
"sections":[
{
"section_name":"Global",
"parameters":{}
},
{
"section_name":"Operator",
"parameters":{}
}
]
},
{
"category":"Module",
"sections":[
{
"section_name":"Global",
"parameters":{}
}
]
}
]
My current code looks like below:
category_list = []
for categories in input:
    sections_list = []
    category_name_dict = {"category": categories["category"]}
    for sections_dict in categories["sections"]:
        section = {}
        section["section_name"] = sections_dict['section_name']
        section["parameters"] = {}
        sections_list.append(section)
    category_name_dict["sections"] = sections_list
    category_list.append(category_name_dict)
Is there a more elegant and more performant way to implement this logic? Keys such as category, sections, section_name, and parameters are constants.
The easier way is not to rebuild the dictionary without the parameters; just clear it in every section (values here is the list from the question):
for value in values:
    for section in value['sections']:
        section['parameters'] = {}
Elegance is in the eye of the beholder, but rather than creating empty lists and dictionaries then filling them why not do it in one go with a list comprehension:
category_list = [
{
**category,
"sections": [
{
**section,
"parameters": {},
}
for section in category["sections"]
],
}
for category in input
]
This is more efficient and (in my opinion) makes it clearer that the intention is to change a single key.
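One practical difference between the two answers is whether the original list is modified. The sketch below illustrates this on a shortened sample; the name categories is an assumption used here instead of input, which shadows a Python builtin.

```python
# Shortened sample in the same shape as the question's list.
categories = [
    {
        "category": "Module",
        "sections": [
            {"section_name": "Global", "parameters": {"name": "first"}},
        ],
    },
]

# Rebuilding with a comprehension creates new dicts; the input is left untouched.
rebuilt = [
    {**cat, "sections": [{**sec, "parameters": {}} for sec in cat["sections"]]}
    for cat in categories
]
original_after_rebuild = categories[0]["sections"][0]["parameters"]
print(rebuilt[0]["sections"][0]["parameters"])  # {}
print(original_after_rebuild)                   # {'name': 'first'}

# Assigning in place modifies the original section dicts directly.
for cat in categories:
    for sec in cat["sections"]:
        sec["parameters"] = {}
print(categories[0]["sections"][0]["parameters"])  # {}
```

Which behaviour is preferable depends on whether other code still needs the original parameters.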
I am trying to write a program where I am having a list of dictionaries in the following manner
[
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':2,
}
]
Can we convert it into a structure where each distinct value of 'unique' appears only once, with all of its corresponding 'duplicate' values collected in a list?
Example:
[
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':8,
},
{
'unique':2,
'duplicate':2,
},
{
'unique':1,
'duplicate':4,
}
]
The above list should be converted into the following
---- Expected Outcome ---
[
{
'unique':1,
'duplicates':[2,8,4]
},
{
'unique':2,
'duplicates':[2]
}
]
PS: I am doing this in python
Thanks for the code in advance
You can also use itertools.groupby:
from itertools import groupby
from operator import itemgetter
l = [
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':8,
},
{
'unique':2,
'duplicate':2,
},
{
'unique':1,
'duplicate':4,
}
]
key = itemgetter('unique')
result = [{'unique':k, 'duplicate': list(map(itemgetter('duplicate'), g))}
for k, g in groupby(sorted(l, key=key ), key = key)]
print(result)
output:
[{'unique': 1, 'duplicate': [2, 8, 4]}, {'unique': 2, 'duplicate': [2]}]
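The sorted() call in the groupby answer matters: itertools.groupby only merges adjacent items with equal keys. A short sketch, using a hypothetical three-row sample named rows, shows what happens without sorting:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample where the two rows with unique == 1 are not adjacent.
rows = [
    {"unique": 1, "duplicate": 2},
    {"unique": 2, "duplicate": 2},
    {"unique": 1, "duplicate": 4},
]
key = itemgetter("unique")

# Without sorting, key 1 shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=key)]
# After sorting by the same key, each key forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```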
I think this list comprehension can solve your problem:
result = [{'unique': id, 'duplicates': [d['duplicate'] for d in l if d['unique'] == id]} for id in set(map(lambda d: d['unique'], l))]
This might help you:
l = [
{
'unique':1,
'duplicate':2,
},
{
'unique':1,
'duplicate':8,
},
{
'unique':2,
'duplicate':2,
},
{
'unique':1,
'duplicate':4,
}
]
a = set()
for i in l:
    a.add(i['unique'])
d = {i: [] for i in a}
for i in l:
    d[i['unique']].append(i['duplicate'])
output = [{'unique': i, 'duplicate': j} for i, j in d.items()]
The output will be:
[{'unique': 1, 'duplicate': [2, 8, 4]}, {'unique': 2, 'duplicate': [2]}]
defaultdict(list) may help you here:
from collections import defaultdict
# data = [ {'unique': 1, 'duplicate': 2}, ... ] # your data
dups = defaultdict(list) # {unique: [duplicate]}
for dd in data:
    dups[dd['unique']].append(dd['duplicate'])
answer = [dict(unique = k, duplicates = v) for k, v in dups.items()]
If you don't know the name of unique key, then replace 'unique' with something like
unique_key = list(data[0].keys())[0]
unique = []
duplicate = {}
for items in data:
    if items['unique'] not in unique:
        unique.append(items['unique'])
        duplicate[items['unique']] = [items['duplicate']]
    else:
        duplicate[items['unique']].append(items['duplicate'])
new_data = []
for key in unique:
    new_data.append({'unique': key, 'duplicate': duplicate[key]})
Explanation: In the first for loop, I collect the unique keys in 'unique'. If a key doesn't exist in 'unique' yet, I append it to 'unique' and add an entry to 'duplicate' whose value is a single-element list. If the same key is found again, I simply append the value to the list in 'duplicate' for that key. In the second loop, I build 'new_data', adding each unique key together with its list of duplicate values.
Please see the JSON below taken from an API.
my_json =
{
"cities":[
{
"portland":[
{"more_info":[{"rank": "3", "games_played": "5"}
],
"team_name": "blazers"
},
{
"cleveland":[
{"more_info":[{"rank": "2", "games_played": "7"}
],
"team_name": "cavaliers"
}
]
}
I would like to create a new dictionary from this my_json with "team_name" as the key and "rank" as the value.
Like this: {'Blazers': 3, 'Cavaliers': 2, 'Bulls': 7}
I'm not sure how to accomplish this... I can return a list of cities, and I can return a list of ranks, but they end up being two separate lists with no relation, I'm not sure how to relate the two.
Any help would be appreciated (I'm also open to organizing this info in a list rather than dict if that is easier).
If I run this:
results_dict = {}
cities = my_json.get('cities', [])
for x in cities:
    for k, v in x.items():
        print(k, v)
it returns:
team_name blazers
portland [{"rank": "3", "games_played": "5"}
team_name cavaliers
cavaliers [{"rank": "2", "games_played": "7"}
If you want to take your cities list and your ranks list and combine them, you could use zip() and a dictionary comprehension:
output = {city: rank for city, rank in zip(cities, ranks)}
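As a concrete illustration of the zip() suggestion, here is a minimal sketch with two hypothetical lists, cities and ranks, standing in for the ones the question says it can already produce. It assumes both lists are in the same order.

```python
# Hypothetical lists, assumed to be index-aligned.
cities = ["portland", "cleveland"]
ranks = ["3", "2"]

# dict(zip(...)) is equivalent to the dictionary comprehension above.
output = dict(zip(cities, ranks))

print(output)
# {'portland': '3', 'cleveland': '2'}
```

Keep in mind that zip() stops at the shorter input, so a missing rank would silently drop a city.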
Valid JSON looks like:
{
"cities":[
{
"portland":[
{"more_info":
[{"rank": "3", "games_played": "5"}],
"team_name":
"blazers"
}
]
},
{
"cleveland":[
{"more_info":
[{"rank": "2", "games_played": "7"}],
"team_name":
"cavaliers"
}
]
}
]
}
This code returns everything you want, but I'll try to write a more readable version of it:
results_dict = {}
cities = my_json.get('cities', [])
for x in cities:
    for k, v in x.items():
        for element in v:
            team = element.get('team_name', '')
            meta_data = element.get('more_info', [])
            for item in meta_data:
                rank = item.get('rank')
                results_dict.update({team: rank})
>>> results_dict
{'blazers': '3', 'cavaliers': '2'}
What API is that? The JSON structure (if pivanchy got it right) seems to be unnecessarily nested in lists. (Can a city have more than one team? Probably yes. Can a team have more than one rank, though?)
But just for sport, here is a gigantic dictionary comprehension to extract the data you want:
{ team['team_name']: team['more_info'][0]['rank']
for ((team,),) in (
city.values() for city in my_json['cities']
)
}
The json seemed to be missing some closing brackets. After adding them I got this:
my_json = {
"cities": [
{"portland":[
{"more_info":[{"rank": "3", "games_played": "5"}],"team_name": "blazers"}]},
{"cleveland":[{"more_info":[{"rank": "2", "games_played": "7"}],"team_name": "cavaliers"}]}
]
}
Given that structure, which is extremely nested, the following code will extract the data you want, but it's very messy:
results = {}
for el in my_json["cities"]:
    # Each element is a one-key dict: {city_name: [team_dict]}
    name = next(iter(el))  # dict views are not indexable in Python 3
    rank = next(iter(el.values()))[0]["more_info"][0]["rank"]
    results[name] = rank
print(results)
Which will give you:
{'portland': '3', 'cleveland': '2'}
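The question asked for team names as keys and numeric ranks as values, whereas the result above is keyed by city. A possible variant is sketched below, assuming the repaired JSON structure shown earlier, that every team has at least one "more_info" entry, and that each rank string is a valid integer. The name team_ranks is introduced here for illustration.

```python
my_json = {
    "cities": [
        {"portland": [{"more_info": [{"rank": "3", "games_played": "5"}],
                       "team_name": "blazers"}]},
        {"cleveland": [{"more_info": [{"rank": "2", "games_played": "7"}],
                        "team_name": "cavaliers"}]},
    ]
}

team_ranks = {}
for city in my_json["cities"]:
    for teams in city.values():  # each value is a list of team dicts
        for team in teams:       # also covers cities with several teams
            # Assumption: the first "more_info" entry holds the rank.
            team_ranks[team["team_name"]] = int(team["more_info"][0]["rank"])

print(team_ranks)
# {'blazers': 3, 'cavaliers': 2}
```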