Get all fieldnames from the union of two lists - python

I want to get all fieldnames from the union of two lists to later export as csv, but I'm only getting the fieldnames from one list.
I want to get all fieldnames because when I go to export to csv I get the following error:
ValueError: dict contains fields not in fieldnames: 'amzlink', 'url', 'asin'
amazondata = [{'amzlink': '', 'asin': 'B084ZZ7VY3', 'url': ''}]
amazonPage = [{'price': '$14.95', 'image': '', 'rating': '4.7 out of 5'}]
result = []
for myDict in amazonPage:
    if myDict not in result:
        result.append(myDict)
print(result[0])

If you are just looking to get a list of all field names in the dictionaries:
Extract the keys from the dictionaries, convert to set, and take union of sets.
Borrowed @Baramr's amazondata list to demonstrate this below:
amazondata = [{'amzlink': '', 'asin': 'B084ZZ7VY3', 'url': ''}]
amazonPage = [{'price': '$14.95', 'image': '', 'rating': '4.7 out of 5'}]
amazondata_fields = set(amazondata[0].keys())
amazonPage_fields = set(amazonPage[0].keys())
all_fields = amazondata_fields.union(amazonPage_fields)
> {'price', 'rating', 'asin', 'image', 'amzlink', 'url'}
If you are looking to fuse two dictionaries: Use the update method.
> {'amzlink': '', 'asin': 'B084ZZ7VY3', 'url': '', 'price': '$14.95', 'image': '', 'rating': '4.7 out of 5'}
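The fused dictionary shown above can be produced along these lines. This is only a minimal sketch, assuming each list holds exactly one dictionary as in the question; the name `merged` is introduced here for illustration.

```python
amazondata = [{'amzlink': '', 'asin': 'B084ZZ7VY3', 'url': ''}]
amazonPage = [{'price': '$14.95', 'image': '', 'rating': '4.7 out of 5'}]

# Copy the first dictionary so the original is not modified in place
merged = dict(amazondata[0])
# update() adds the keys of the second dictionary (overwriting duplicates)
merged.update(amazonPage[0])
print(merged)
```

Copying first is a design choice: calling `amazondata[0].update(...)` directly would also work, but it changes the source list.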

Loop over all the dictionaries, adding the keys to a set.
amazondata = [{'amzlink': '', 'asin': 'B084ZZ7VY3', 'url': ''}]
amazonPage = [{'price': '$14.95', 'image': '', 'rating': '4.7 out of 5'}]
all_fields = set()
for myDict in amazondata + amazonPage:
    all_fields |= myDict.keys()
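Since the goal is a CSV export, here is a rough sketch of how the union of keys could feed csv.DictWriter so that the ValueError from the question no longer occurs. The names `rows`, `fieldnames` and `buffer` are made up for this example, and an in-memory io.StringIO stands in for a real output file.

```python
import csv
import io

amazondata = [{'amzlink': '', 'asin': 'B084ZZ7VY3', 'url': ''}]
amazonPage = [{'price': '$14.95', 'image': '', 'rating': '4.7 out of 5'}]

rows = amazondata + amazonPage
# Union of all keys; sorted so the column order is reproducible
fieldnames = sorted(set().union(*(row.keys() for row in rows)))

buffer = io.StringIO()  # stand-in for a file opened with newline=''
# restval fills in columns that a given row does not have
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Every row is written with all six columns; keys missing from a row are left empty.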


How can I remove nested keys and create a new dict and link both with an ID?

I have a problem. I have a dict my_Dict that is somewhat nested. I would like to 'clean up' my_Dict: that is, separate out all nested dicts and generate a unique ID for each, so that I can later find the corresponding object again.
For example, the nested detail: {...} should later become an independent dict my_Detail_Dict, and detail should receive a unique ID within my_Dict. Unfortunately, the list that I return is empty. How can I remove the nested keys and give them an ID?
my_Dict = {
    '_key': '1',
    'group': 'test',
    'data': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    }
}
def nested_dict(my_Dict):
    my_new_dict_list = []
    for key in my_Dict.keys():
        #print(f"Looking for {key}")
        if isinstance(my_Dict[key], dict):
            print(f"{key} is nested")
            # Add id to nested stuff
            my_Dict[key]["__id"] = 1
            my_nested_Dict = my_Dict[key]
            # Delete all nested from the key
            del my_Dict[key]
            # Add id to key, but not the nested stuff
            my_Dict[key] = 1
    return my_new_dict_list
[OUT] []
# What I want
[my_Dict, my_Details_Dict, my_Data_Dict]
What I have
{'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]}}}
What I want
my_Dict = {'_key': '1',
'group': 'test',
'data': 18,
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': 22}
my_Data_Dict = {'__id': 18}
my_Detail_Dict = {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]}, '__id': 22}
The following code snippet will solve what you are trying to do:
my_Dict = {
    '_key': '1',
    'group': 'test',
    'data': {},
    'type': '',
    'code': '007',
    'conType': '1',
    'flag': None,
    'createdAt': '2021',
    'currency': 'EUR',
    'detail': {
        'selector': {
            'number': '12312',
            'isTrue': True,
            'requirements': [{
                'type': 'customer',
                'requirement': '1'}]
        }
    }
}
def nested_dict(my_Dict):
    # Initializing a dictionary that will store all the nested dictionaries
    my_new_dict = {}
    idx = 0
    for key in my_Dict.keys():
        # Checking which keys are nested i.e are dictionaries
        if isinstance(my_Dict[key], dict):
            # Generating ID
            idx += 1
            # Adding generated ID as another key
            my_Dict[key]["__id"] = idx
            # Adding nested key with the ID to the new dictionary
            my_new_dict[key] = my_Dict[key]
            # Replacing nested key value with the generated ID
            my_Dict[key] = idx
    # Returning new dictionary containing all nested dictionaries with ID
    return my_new_dict
result = nested_dict(my_Dict)
# Iterating through dictionary to get all nested dictionaries
for item in result.items():
    print(item)
If I understand you correctly, you wish to automatically make each nested dictionary its own variable, and remove it from the main dictionary.
Finding the nested dictionaries and removing them from the main dictionary is not so difficult. However, automatically assigning them to a variable is not recommended for various reasons. Instead, what I would do is store all these dictionaries in a list, and then assign them manually to a variable.
# Prepare a list to store data in
individual_dicts = []
id_index = 1
for key in my_Dict.keys():
    # For each key, we get the current value
    value = my_Dict[key]
    # Determine if the current value is a dictionary. If so, then it's a nested dict
    if isinstance(value, dict):
        print(key + " is a nested dict")
        # Get the nested dictionary, and replace it with the ID
        dict_value = my_Dict[key]
        my_Dict[key] = id_index
        # Add the id to previously nested dictionary
        dict_value['__id'] = id_index
        id_index = id_index + 1 # increase for next nested dict
        individual_dicts.append(dict_value) # store it as a new dictionary
# Manually write out variable names, and assign the nested dictionaries to them.
# 'data' precedes 'detail' in my_Dict, so its dictionary comes first in the list.
[my_Data_Dict, my_Details_Dict] = individual_dicts
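If modifying my_Dict in place is not wanted, a variant that builds new dictionaries could look roughly like this. It is a sketch only: the function name `split_nested`, the `start` parameter and the shortened `example` dict are all invented for illustration, and the IDs are simple counters rather than globally unique values.

```python
from itertools import count

def split_nested(source, start=1):
    """Return (flat_copy, extracted) without modifying source."""
    ids = count(start)
    flat = {}
    extracted = {}
    for key, value in source.items():
        if isinstance(value, dict):
            new_id = next(ids)
            # The flat copy only keeps the ID in place of the nested dict
            flat[key] = new_id
            # Shallow copy of the nested dict, extended with its ID
            extracted[key] = {**value, '__id': new_id}
        else:
            flat[key] = value
    return flat, extracted

example = {'_key': '1', 'data': {}, 'detail': {'selector': {'number': '12312'}}}
flat, extracted = split_nested(example)
print(flat)
print(extracted)
```

Returning the extracted dictionaries keyed by their original name avoids relying on list order when assigning them to variables afterwards.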

Create a new dictionary from a nested JSON output after parsing

In Python 3 I need to get a JSON response from an API call
and parse it so that I get a dictionary that only contains the data I need.
The final dictionary I expect to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
the JSON response returns the following output:
'RuleGroups': [{
'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rules',
'Order': 1,
'Enabled': True,
'Rules': [{
'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rule',
'Description': 'Look for default severity text',
'Enabled': False,
'RuleMatchers': None,
'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
'SourceField': 'text',
'DestinationField': 'text',
'ReplaceNewVal': '',
'Type': 'extract',
'Order': 21520,
'KeepBlockedLogs': False
}],
'Type': 'user'
}, {
'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
'Name': 'auto_collector',
'Order': 4,
'Enabled': True,
'Rules': [{
'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
'Name': 'auto_collector',
'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
'Enabled': False,
'RuleMatchers': None,
'SourceField': 'subsystemName',
'DestinationField': 'subsystemName',
'ReplaceNewVal': '',
'Type': 'block',
'Order': 1,
'KeepBlockedLogs': False
}],
'Type': 'user'
}]
I was able to create a dictionary that contains the name and the RuleGroupsID, like that:
response = requests.get(url,headers=headers)
output = response.json()
# assumption: the groups sit under the 'RuleGroups' key shown above
outputlist = output['RuleGroups']
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
# Create a dictionary of NAME + ID
ruleDic = {}
for key, value in zip(groupRuleName, groupRuleID):
    ruleDic[key] = value
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
but when I tried to parse the nested JSON, things just didn't work.
In the end, I managed to write a function that returns this dictionary.
I break the JSON into three lists holding the needed elements (Name, Id, and Rules from the first level of nesting), and then build another list from the nested JSON (everything under Rules) that keeps only the "Id" values.
Finally, I create the dictionary by zipping the lists created earlier.
def get_filtered_rules() -> dict:
    ruleDic = {}
    try:
        groupRuleName = [li['Name'] for li in outputlist]
        groupRuleID = [li['Id'] for li in outputlist]
        ruleIDList = [li['Rules'] for li in outputlist]
        ruleIDListClean = []
        ruleClean = []
        for sublist in ruleIDList:
            # .get() because not every rule carries a 'Rule' pattern
            lstRule = [item.get('Rule') for item in sublist]
            ruleClean.append(lstRule)
            lstID = [item['Id'] for item in sublist]
            ruleIDListClean.append(lstID)
        ruleContent = list(zip(groupRuleName, ruleClean))
        ruleContentDictionary = dict(ruleContent)
        # Create a dictionary of NAME + ID + RuleID
        ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
    except Exception as e:
        print(e)
    return ruleDic
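For comparison, the same name-to-(Id, [rule Ids]) mapping can probably be built in a single dict comprehension. The sketch below uses a trimmed-down stand-in for the API response (only the keys that are needed, with values copied from the sample above); `rule_dic` is a name chosen for this example.

```python
# Trimmed-down stand-in for response.json(); only the keys used below are kept
output = {
    'RuleGroups': [
        {'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
         'Name': 'Severity Rules',
         'Rules': [{'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e'}]},
        {'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
         'Name': 'auto_collector',
         'Rules': [{'Id': '2d6bdc1d-4064-11eb-8856-06364196e782'}]},
    ]
}

# One entry per group: name -> (group Id, list of rule Ids)
rule_dic = {
    group['Name']: (group['Id'], [rule['Id'] for rule in group.get('Rules') or []])
    for group in output['RuleGroups']
}
print(rule_dic)
```

The `group.get('Rules') or []` part is a defensive assumption for groups whose Rules entry is missing or None.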

Adding items to list in dict if entry already exists

I'm receiving many CSV-files that contain orders for different products. Those CSV-files need to be "converted" into a specific JSON-structure.
Each row of the CSV-file represents the order of one product. This means that if I would order two products, the CSV would contain two rows.
A simplified version of the CSV-file may look like this (please note the orderId "111" in the first and third row):
111,123,testitem,john doe,samplestreet 1
222,345,anothertestitem,jane doe,samplestreet 1
111,345,anothertestitem,john doe,samplestreet 1
My current solution works but I think I'm overcomplicating things.
Currently, I'm iterating over each CSV-row and create the JSON-structure where I use a helper-function that will either add the order or append a list that contains ordered items like so:
def add_orderitem(orderitem, order, all_orders):
    """ Adds an ordered product to the order or "create" a new order if it doesn't exist """
    for row in all_orders:
        # Order already exists
        if any(order["orderNumber"] == value for field, value in row.items()):
            print(f"Order '{order['orderNumber']}' already exists, adding product #{orderitem['sku']}")
            row["orderItems"].append(orderitem)
            return all_orders
    # New order
    print(f"New Order found, creating order '{order['orderNumber']}' and adding product #{orderitem['sku']}")
    order["orderItems"].append(orderitem)
    all_orders.append(order)
    return all_orders
def parse_orders():
    """ Converts CSV-orders into JSON """
    results = []
    orders = read_csv("testorder.csv") # helper-function returns CSV-dictreader (list of dicts)
    for order in orders:
        # Create basic structure
        orderdata = {
            "orderNumber": order["orderId"],
            "address": {
                "name": order["orderId"],
                "street": order["street"]
            },
            "orderItems": [] # <-- this will be filled later
        }
        # Extract product-information that will be inserted in above 'orderItems' list
        product = {
            "sku": order["itemNumber"],
            "name": order["itemName"]
        }
        # Add order to final list or add item if order already exists
        results = add_orderitem(product, orderdata, results)
    return results
def main():
    from pprint import pprint
    parsed_orders = parse_orders()
    pprint(parsed_orders)
if __name__ == "__main__":
    main()
The script works fine; the output below is what I'm expecting:
New Order found, creating order '111' and adding product #123
New Order found, creating order '222' and adding product #345
Order '111' already exists, adding product #345
[{'address': {'name': '111', 'street': 'samplestreet 1'},
'orderItems': [{'name': 'testitem', 'sku': '123'},
{'name': 'anothertestitem', 'sku': '345'}],
'orderNumber': '111'},
{'address': {'name': '222', 'street': 'samplestreet 1'},
'orderItems': [{'name': 'anothertestitem', 'sku': '345'}],
'orderNumber': '222'}]
Is there a way to do this "smarter"?
Imo a namedtuple and a groupby would make your code clearer:
from collections import namedtuple
from itertools import groupby
# csv data or file
data = """orderId,itemNumber,itemName,name,street
111,123,testitem,john doe,samplestreet 1
222,345,anothertestitem,jane doe,samplestreet 1
111,345,anothertestitem,john doe,samplestreet 1
"""
# the Order tuple
Order = namedtuple('Order', 'orderId itemNumber itemName name street')
# load the csv into orders
orders = [Order(*values) for line in data.split("\n")[1:] if line for values in [line.split(",")]]
# sort it by orderId (groupby only groups consecutive items)
orders = sorted(orders, key = lambda order: order.orderId)
# group it by orderId
output = list()
for key, values in groupby(orders, key=lambda order: order.orderId):
    items = list(values)
    dct = {"address": {"name": items[0].name, "street": items[0].street},
           "orderItems": [{"name": item.itemName, "sku": item.itemNumber} for item in items]}
    output.append(dct)
print(output)
This yields
[{'address': {'name': 'john doe', 'street': 'samplestreet 1'}, 'orderItems': [{'name': 'testitem', 'sku': '123'}, {'name': 'anothertestitem', 'sku': '345'}]},
{'address': {'name': 'jane doe', 'street': 'samplestreet 1'}, 'orderItems': [{'name': 'anothertestitem', 'sku': '345'}]}]
You could even put it in one big comprehension, but that would not make it more readable.
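Another option that avoids both the sorting step and the linear search through existing orders is a plain dict keyed by orderId. The following is a sketch under a few assumptions: the CSV has the header row used in the answer above, the address name comes from the name column (the question's code used orderId there), and `orders_by_id` is a name invented for this example.

```python
import csv
import io

data = """orderId,itemNumber,itemName,name,street
111,123,testitem,john doe,samplestreet 1
222,345,anothertestitem,jane doe,samplestreet 1
111,345,anothertestitem,john doe,samplestreet 1
"""

orders_by_id = {}
for row in csv.DictReader(io.StringIO(data)):
    # setdefault creates the order on first sight and returns it afterwards
    order = orders_by_id.setdefault(row["orderId"], {
        "orderNumber": row["orderId"],
        "address": {"name": row["name"], "street": row["street"]},
        "orderItems": [],
    })
    order["orderItems"].append({"sku": row["itemNumber"], "name": row["itemName"]})

results = list(orders_by_id.values())
print(results)
```

Because dicts keep insertion order, the orders come out in the order in which each orderId first appeared in the file.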

Comparing two dictionaries for specific value within list of dict

I have two lists of dictionaries. I am looping through them and looking for a matching id. If the id in src_dict matches one in dest_dict, I need to call an update method; otherwise, an insert method. When I use the code below, I get an unintended result.
This is the outcome I want. While updating, I need to preserve rec_id from dest_dict together with the corresponding values from src_dict. The insert list is just the src_dict elements that are not in dest_dict. Appreciate any help!
records_update = [{'rec_id': 'abc', 'fields': {'id': 111, 'name': 'sam'}}, {'rec_id': 'xyz', 'fields': {'id': 333, 'name': 'name_changed_to_not_ross'}}]
# the rec_id is from dest_dict, while the rest of the fields should come from src_dict, since those values may have changed and need to be updated
records_insert = [{"id": 444, "name": "jack"}]
src_dict = [{"id": 111, "name": "sam"}, {"id": 333, "name": "name_changed_to_not_ross"}, {"id": 444, "name": "jack"}]
dest_dict = [{"rec_id":"abc","fields":{"id":111,"name":"sam"}},
             {"rec_id":"xyz","fields":{"id":333,"name":"ross"}}]
records_update = []
records_insert = []
for rec_src in src_dict:
    for rec_dest in dest_dict:
        if rec_src['id'] == rec_dest['fields']['id']:
            print('match and add this element to update list')
        else:
            print('no match add this element to insert list')
You can create a dict indexed by IDs from dest_dict for efficient lookups, and then use list comprehensions to filter src_dict for respective records:
dest = {d['fields']['id']: d for d in dest_dict}
records_update = [dest[d['id']] for d in src_dict if d['id'] in dest]
records_insert = [d for d in src_dict if d['id'] not in dest]
You need to do an append to insert list only when no append is done to update list.
records_update = []
records_insert = []
for rec_src in src_dict:
    flag = 0
    for rec_dest in dest_dict:
        if rec_src['id'] == rec_dest['fields']['id']:
            print('match and add this element to update list')
            records_update.append(rec_dest)
            flag = 1
            break
    if flag != 1:
        print('no match add this element to insert list')
        records_insert.append(rec_src)
print(records_update)
print(records_insert)
# [{'rec_id': 'abc', 'fields': {'id': 111, 'name': 'sam'}}, {'rec_id': 'xyz', 'fields': {'id': 333, 'name': 'ross'}}]
# [{'id': 444, 'name': 'jack'}]
I am able to achieve the desired result using this:
records_update = []
records_insert = []
for rec_src in src_dict:
    flag = 0
    for rec_dest in dest_dict:
        if rec_src['id'] == rec_dest['fields']['id']:
            new_dict = {}
            print('match and add this element to update list')
            new_dict['rec_id'] = rec_dest['rec_id']
            new_dict['fields'] = rec_src
            records_update.append(new_dict)
            flag = 1
    if flag != 1:
        print('no match add this element to insert list')
        records_insert.append(rec_src)
[{'rec_id': 'abc', 'fields': {'id': 111, 'name': 'sam'}},
{'rec_id': 'xyz', 'fields': {'id': 333, 'name': 'name_changed_to_not_ross'}}]
[{'id': 444, 'name': 'jack'}]
Just curious how I can achieve the same result using @blhsing's approach, since it seems to be more efficient:
dest = {d['fields']['id']: d for d in dest_dict}
records_update = [dest[d['id']] for d in src_dict if d['id'] in dest]
records_insert = [d for d in src_dict if d['id'] not in dest]
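One way the lookup-dict idea might be adapted so that the update records keep rec_id from dest_dict but take their fields from src_dict is sketched below. The second dest_dict entry is an assumption reconstructed from the printed output earlier in the thread, and `rec_ids` is a name introduced here.

```python
src_dict = [{"id": 111, "name": "sam"},
            {"id": 333, "name": "name_changed_to_not_ross"},
            {"id": 444, "name": "jack"}]
# Assumed destination data (second record inferred from the output shown earlier)
dest_dict = [{"rec_id": "abc", "fields": {"id": 111, "name": "sam"}},
             {"rec_id": "xyz", "fields": {"id": 333, "name": "ross"}}]

# Map each destination id to its rec_id for constant-time lookups
rec_ids = {d["fields"]["id"]: d["rec_id"] for d in dest_dict}

# Matches: rec_id from the destination, fields from the source
records_update = [{"rec_id": rec_ids[s["id"]], "fields": s}
                  for s in src_dict if s["id"] in rec_ids]
# Non-matches: the source record as-is
records_insert = [s for s in src_dict if s["id"] not in rec_ids]

print(records_update)
print(records_insert)
```

This makes a single pass over each list instead of comparing every source record with every destination record.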

Python: retrieve arbitrary dictionary path and amend data?

Simple Python question, but I'm scratching my head over the answer!
I have an array of strings of arbitrary length called path, like this:
path = ['country', 'city', 'items']
I also have a dictionary, data, and a string, unwanted_property. I know that the dictionary is of arbitrary depth and is dictionaries all the way down, with the exception of the items property, which is always an array.
[CLARIFICATION: The point of this question is that I don't know what the contents of path will be. They could be anything. I also don't know what the dictionary will look like. I need to walk down the dictionary as far as the path indicates, and then delete the unwanted properties from there, without knowing in advance what the path looks like, or how long it will be.]
I want to retrieve the parts of the data object (if any) that matches the path, and then delete the unwanted_property from each.
So in the example above, I would like to retrieve:
and then delete unwanted_property from each of the items in the array. I want to amend the original data, not a copy. (CLARIFICATION: By this I mean, I'd like to end up with the original dict, just minus the unwanted properties.)
How can I do this in code?
I've got this far:
path = ['country', 'city', 'items']
data = {
    'country': {
        'city': {
            'items': [
                {
                    'name': '114th Street',
                    'unwanted_property': 'foo',
                },
                {
                    'name': '8th Avenue',
                    'unwanted_property': 'foo',
                },
            ]
        }
    }
}
for p in path:
    if p == 'items':
        data = [i for i in data[p]]
    else:
        data = data[p]
if isinstance(data, list):
    for d in data:
        del d['unwanted_property']
else:
    del data['unwanted_property']
The problem is that this doesn't amend the original data. It also relies on items always being the last string in the path, which may not always be the case.
CLARIFICATION: I mean that I'd like to end up with:
{'country': {
    'city': {
        'items': [
            {'name': '114th Street'},
            {'name': '8th Avenue'}]}}}
Whereas what I have available in data is only [{'name': '114th Street'}, {'name': '8th Avenue'}].
I feel like I need something like XPath for the dictionary.
The problem is that you are overwriting the original data reference. Change your processing code to
temp = data
for p in path:
    temp = temp[p]
if isinstance(temp, list):
    for d in temp:
        del d['unwanted_property']
else:
    del temp['unwanted_property']
In this version, you set temp to point to the same object that data was referring to. temp is not a copy, so any changes you make to it will be visible in the original object. Then you step temp along itself, while data remains a reference to the root dictionary. When you find the path you are looking for, any changes made via temp will be visible in data.
I also removed the line data = [i for i in data[p]]. It creates an unnecessary copy of the list that you never need, since you are not modifying the references stored in the list, just the contents of the references.
The fact that path is not pre-determined (besides the fact that items is going to be a list) means that you may end up getting a KeyError in the first loop if the path does not exist in your dictionary. You can handle that gracefully by doing something more like:
temp = data
try:
    for p in path:
        temp = temp[p]
except KeyError:
    print('Path {} not in data'.format(path))
else:
    if isinstance(temp, list):
        for d in temp:
            del d['unwanted_property']
    else:
        del temp['unwanted_property']
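The same idea can be wrapped in a small reusable helper. This is a sketch rather than a definitive implementation: the function name `delete_at_path` and its True/False return convention are invented here, and it uses pop with a default so that items lacking the property are skipped silently.

```python
def delete_at_path(data, path, unwanted):
    """Remove `unwanted` from the dict (or list of dicts) found at `path`.

    Modifies `data` in place. Returns True if the path exists, else False.
    """
    node = data
    try:
        for key in path:
            node = node[key]
    except (KeyError, TypeError):
        # KeyError: missing key; TypeError: e.g. a string key applied to a list
        return False
    targets = node if isinstance(node, list) else [node]
    for target in targets:
        if isinstance(target, dict):
            target.pop(unwanted, None)
    return True

data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
    {'name': '8th Avenue', 'unwanted_property': 'foo'},
]}}}

print(delete_at_path(data, ['country', 'city', 'items'], 'unwanted_property'))
print(data)
print(delete_at_path(data, ['country', 'foo', 'items'], 'unwanted_property'))
```

Because `node` only ever points into the original structure, `data` itself still refers to the root dictionary afterwards.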
The problem you are facing is that you are re-assigning the data variable to an undesired value. In the body of your for loop you set data to the next level down the tree. Given your example, data takes the following values (in order) until it leaves the for loop:
data == {'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}}
data == {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}}
data == {'items': [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]}
data == [{'name': '114th Street', 'unwanted_property': 'foo',}, {'name': '8th Avenue', 'unwanted_property': 'foo',},]
Then, when you delete the items from your dictionaries at the end, data is left as a list of those dictionaries, because you have lost the higher parts of the structure. If you keep a backup reference to your data, you can get the correct output, for example:
path = ['country', 'city', 'items']
data = {
    'country': {
        'city': {
            'items': [
                {
                    'name': '114th Street',
                    'unwanted_property': 'foo',
                },
                {
                    'name': '8th Avenue',
                    'unwanted_property': 'foo',
                },
            ]
        }
    }
}
data_ref = data
for p in path:
    if p == 'items':
        data = [i for i in data[p]]
    else:
        data = data[p]
if isinstance(data, list):
    for d in data:
        del d['unwanted_property']
else:
    del data['unwanted_property']
data = data_ref
def delKey(your_dict,path):
    if len(path) == 1:
        for item in your_dict:
            del item[path[0]]
    else:
        delKey( your_dict[path[0]],path[1:])
{'country': {'city': {'items': [{'name': '114th Street', 'unwanted_property': 'foo'}, {'name': '8th Avenue', 'unwanted_property': 'foo'}]}}}
['country', 'city', 'items', 'unwanted_property']
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}}
You need to remove the key unwanted_property.
names_list = []
def remove_key_from_items(data):
    for d in data:
        if d != 'items':
            remove_key_from_items(data[d])
        else:
            for item in data[d]:
                unwanted_prop = item.pop('unwanted_property', None)
                names_list.append(item)
This will remove the key. The second parameter None is returned if the key unwanted_property does not exist.
You can use pop even without the second parameter. It will raise KeyError if the key does not exist.
EDIT 2: Updated to recursively go into depth of data dict until it finds the items key, where it pops the unwanted_property as desired and append into the names_list list to get the desired output.
Using operator.itemgetter you can compose a function to return the final key's value.
import operator, functools
def compose(*functions):
    '''returns a callable composed of the functions
    compose(f, g, h, k) -> f(g(h(k(x))))
    '''
    def compose2(f, g):
        return lambda x: f(g(x))
    return functools.reduce(compose2, functions, lambda x: x)
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
Then use it like this:
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'
for thing in get_items(data):
del thing[unwanted_property]
Of course if the path contains non-existent keys it will throw a KeyError - you probably should account for that:
path = ['country', 'foo', 'items']
get_items = compose(*[operator.itemgetter(key) for key in path[::-1]])
try:
    for thing in get_items(data):
        del thing[unwanted_property]
except KeyError as e:
    print('missing key:', e)
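If composing itemgetters feels heavy, a shorter route to the same lookup may be functools.reduce with operator.getitem, which indexes one key at a time. This sketch swaps that technique in for the compose helper above and uses pop with a default instead of del, so items lacking the property do not raise.

```python
import functools
import operator

data = {'country': {'city': {'items': [
    {'name': '114th Street', 'unwanted_property': 'foo'},
    {'name': '8th Avenue', 'unwanted_property': 'foo'},
]}}}
path = ['country', 'city', 'items']
unwanted_property = 'unwanted_property'

# Equivalent to data['country']['city']['items'], for any path length
items = functools.reduce(operator.getitem, path, data)
for thing in items:
    thing.pop(unwanted_property, None)
print(data)
```

As with the itemgetter version, a key in path that does not exist still raises KeyError during the reduce call.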
You can try this:
path = ['country', 'city', 'items']
previous_data = data[path[0]]
previous_key = path[0]
for i in path[1:]:
    previous_data = previous_data[i]
    previous_key = i
    if isinstance(previous_data, list):
        for c, b in enumerate(previous_data):
            if "unwanted_property" in b:
                del previous_data[c]["unwanted_property"]
current_dict = {}
previous_data_dict = {}
for i, a in enumerate(path):
    if i == 0:
        current_dict[a] = data[a]
        previous_data_dict = data[a]
    else:
        if a == previous_key:
            current_dict[a] = previous_data
        else:
            current_dict[a] = previous_data_dict[a]
            previous_data_dict = previous_data_dict[a]
data = current_dict
{'country': {'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}, 'items': [{'name': '114th Street'}, {'name': '8th Avenue'}], 'city': {'items': [{'name': '114th Street'}, {'name': '8th Avenue'}]}}
