Create a new dictionary from a nested JSON output after parsing - python

In python3 I need to get a JSON response from an API call,
and parse it so I will get a dictionary That only contains the data I need.
The final dictionary I ecxpt to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
the JSON response returns the following output:
{
'RuleGroups': [{
'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rules',
'Order': 1,
'Enabled': True,
'Rules': [{
'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rule',
'Description': 'Look for default severity text',
'Enabled': False,
'RuleMatchers': None,
'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
'SourceField': 'text',
'DestinationField': 'text',
'ReplaceNewVal': '',
'Type': 'extract',
'Order': 21520,
'KeepBlockedLogs': False
}],
'Type': 'user'
}, {
'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
'Name': 'auto_collector',
'Order': 4,
'Enabled': True,
'Rules': [{
'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
'Name': 'auto_collector',
'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
'Enabled': False,
'RuleMatchers': None,
'Rule': 'AUTODISABLED',
'SourceField': 'subsystemName',
'DestinationField': 'subsystemName',
'ReplaceNewVal': '',
'Type': 'block',
'Order': 1,
'KeepBlockedLogs': False
}],
'Type': 'user'
}]
}
I was able to create a dictionary that contains the name and the RuleGroupsID, like that:
response = requests.get(url,headers=headers)
output = response.json()
outputlist=(output["RuleGroups"])
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
# Create a dictionary of NAME + ID
ruleDic = {}
for key in groupRuleName:
for value in groupRuleID:
ruleDic[key] = value
groupRuleID.remove(value)
break
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
but when I tried to parse it as nested JSON things just didn't work.

In the end, I managed to create a function that returns this dictionary,
I'm doing it by breaking the JSON into 3 lists by the needed elements (which are Name, Id, and Rules from the first nest), and then create another list from the nested JSON ( which listed everything under Rule) which only create a list from the keyword "Id".
Finally creating a dictionary using a zip command on the lists and dictionaries created earlier.
def get_filtered_rules() -> List[dict]:
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
ruleIDList = [li['Rules'] for li in outputlist]
ruleIDListClean = []
ruleClean = []
for sublist in ruleIDList:
try:
lstRule = [item['Rule'] for item in sublist]
ruleClean.append(lstRule)
ruleContent=list(zip(groupRuleName, ruleClean))
ruleContentDictionary = dict(ruleContent)
lstID = [item['Id'] for item in sublist]
ruleIDListClean.append(lstID)
# Create a dictionary of NAME + ID + RuleID
ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
except Exception as e: print(e)
return ruleDic

Related

problem with iterating over non existing indexes python

I have extracted data from woocommerce webshop with api. Part of the top structure is like this:
{'id': 12345,
'attributes': [{'id': 1,
'name': 'kleur',
'position': 0,
'visible': True,
'variation': False,
'options': ['blauw']},
{'id': 2,
'name': 'maat',
'position': 1,
'visible': True,
'variation': True,
'options': ['s',
'm',
'l']}],
..................
}
try to make a list of dicts
all_webshop_skus = []
for item in all_data:
product= {}
product = {
'id':item['id'],
'sku': item['sku'],
'name' : item['name'],
'date_created': item['date_created'],
'brands': item['brands'][0]['name'],
'attributes': item['attributes'][1]['name']
}
all_webshop_skus.append(product)
iteration has index issues
---> 16 'attributes': item['attributes'][1]['name']
17 }
18 all_webshop_skus.append(product)
IndexError: list index out of range
IndexError: list index out of range
I think because not every item has a second element in the 'attributes' list of dicts. How can I extract 'name'_values from 'attributes' with 'attribute_id'_value = 2?
Loop through the attributes until you find the one you want, and use its name.
all_webshop_skus = []
for item in all_data:
for attr in item['attributes']:
if attr['id'] == 2:
name = attr['name']
break
else: # default if not found
name = ''
product = {
'id': item['id'],
'sku': item['sku'],
'name' : item['name'],
'date_created': item['date_created'],
'brands': item['brands'][0]['name'],
'attributes': name
}
all_webshop_skus.append(product)

How can I remove nested keys and create a new dict and link both with an ID?

I have a problem. I have a dict my_Dict. This is somewhat nested. However, I would like to 'clean up' the dict my_Dict, by this I mean that I would like to separate all nested ones and also generate a unique ID so that I can later find the corresponding object again.
For example, I have detail: {...}, this nested, should later map an independent dict my_Detail_Dict and in addition, detail should receive a unique ID within my_Dict. Unfortunately, my list that I give out is empty. How can I remove my slaughtered keys and give them an ID?
my_Dict = {
'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
def nested_dict(my_Dict):
my_new_dict_list = []
for key in my_Dict.keys():
#print(f"Looking for {key}")
if isinstance(my_Dict[key], dict):
print(f"{key} is nested")
# Add id to nested stuff
my_Dict[key]["__id"] = 1
my_nested_Dict = my_Dict[key]
# Delete all nested from the key
del my_Dict[key]
# Add id to key, but not the nested stuff
my_Dict[key] = 1
my_new_dict_list.append(my_Dict[key])
my_new_dict_list.append(my_Dict)
return my_new_dict_list
nested_dict(my_Dict)
[OUT] []
# What I want
[my_Dict, my_Details_Dict, my_Data_Dict]
What I have
{'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]}}}
What I want
my_Dict = {'_key': '1',
'group': 'test',
'data': 18,
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': 22}
my_Data_Dict = {'__id': 18}
my_Detail_Dict = {'selector': {'number': '12312',
'isTrue': True,
'requirements': [{'type': 'customer', 'requirement': '1'}]}, '__id': 22}
The following code snippet will solve what you are trying to do:
my_Dict = {
'_key': '1',
'group': 'test',
'data': {},
'type': '',
'code': '007',
'conType': '1',
'flag': None,
'createdAt': '2021',
'currency': 'EUR',
'detail': {
'selector': {
'number': '12312',
'isTrue': True,
'requirements': [{
'type': 'customer',
'requirement': '1'}]
}
}
}
def nested_dict(my_Dict):
# Initializing a dictionary that will store all the nested dictionaries
my_new_dict = {}
idx = 0
for key in my_Dict.keys():
# Checking which keys are nested i.e are dictionaries
if isinstance(my_Dict[key], dict):
# Generating ID
idx += 1
# Adding generated ID as another key
my_Dict[key]["__id"] = idx
# Adding nested key with the ID to the new dictionary
my_new_dict[key] = my_Dict[key]
# Replacing nested key value with the generated ID
my_Dict[key] = idx
# Returning new dictionary containing all nested dictionaries with ID
return my_new_dict
result = nested_dict(my_Dict)
print(my_Dict)
# Iterating through dictionary to get all nested dictionaries
for item in result.items():
print(item)
If I understand you correctly, you wish to automatically make each nested dictionary it's own variable, and remove it from the main dictionary.
Finding the nested dictionaries and removing them from the main dictionary is not so difficult. However, automatically assigning them to a variable is not recommended for various reasons. Instead, what I would do is store all these dictionaries in a list, and then assign them manually to a variable.
# Prepare a list to store data in
inidividual_dicts = []
id_index = 1
for key in my_Dict.keys():
# For each key, we get the current value
value = my_Dict[key]
# Determine if the current value is a dictionary. If so, then it's a nested dict
if isinstance(value, dict):
print(key + " is a nested dict")
# Get the nested dictionary, and replace it with the ID
dict_value = my_Dict[key]
my_Dict[key] = id_index
# Add the id to previously nested dictionary
dict_value['__id'] = id_index
id_index = id_index + 1 # increase for next nested dic
inidividual_dicts.append(dict_value) # store it as a new dictionary
# Manually write out variables names, and assign the nested dictionaries to it.
[my_Details_Dict, my_Data_Dict] = inidividual_dicts

Looping through JSON object with a conditional

Having a bit of difficulties here with looping through this json object content.
The json file is as such:
[{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
}, ... ]
And I want to loop through each dict in the list check that the collection is not empty and then for each collection if the location equals '/450/' return append that dict to a list.
My code is as follows.
content = json.loads(res.text)
for q in content:
if q['collection']:
for col in q['collection']:
if col['location'] == '/450/':
data.append(q)
print(data)
Having played around with it I keep either getting ValueError: too many values to unpack (expected 2) OR TypeError: string indices must be integers
Any help with my structure would be much appreciated thanks.
Disclaimer:
I had previously written this as a list comprehension and it worked like a charm however that doesnt work anymore as I now need to check if the collection is empty.
How I wrote it previously:
content = [ x for x in content if x['collection']['location'] == '/450/']
That should work for you:
for q in content:
if q['collection']['location'] == '/450/':
data.append(q)
print(data)
If you go with for loop with for col in q['collection'], you just iterate over keys inside q['collection'], so cols = ['archived', 'authority_level', ...].
From your previous list comprehension, "location" is a key in q["collection"].
When you write
for col in q["collection"]
You are iterating over the keys in q["collection"]. One of these keys is "location". Your for loop seems to iterate more than necessary:
if q['collection'] and "location" in q["collection"] and q["collection"]["location"] == "/450/":
data.append(q)
Your Code Has Way too Iterations Than needed.
The error TypeError: string indices must be integers occurs at the second conditional statement when you check col['location'] = "/450/".
That's because not all tokens in the collection object have sub-objects where you can get data with their key.
Take a look at your old code and the modified code for more in depth understanding.
# Your old json datas
content = [{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
} ]
data = []
for q in content:
if q['collection']:
for col in q['collection']:
if col['location'] == '/450/': # The first object in collection object is [archived] which is a string, this causes the program to throw error
data.append(q)
print(data)
Here is the modified code
# Your json datas
json_datas = [{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
} ]
list_data = [] # Your list data in which appends the json data if the location is /450/
for data in json_datas: # Getting each Json data
if len(data["collection"]): # Continue if the length of collection is not 0 [NOTE: 0 = False, 1 or more = True]
if data['collection']['location'] == "/450/": # Check the location
list_data.append(data) # Append if true
print(list_data)
Don't need to iterate over the collection object since it's a dictionary and just need to check the location property.
Also, in case the "collection" or "location" properties are not present then use dict.get(key) function rather than dict[key] since the latter will raise a KeyError exception if key is not found and get() returns None value if key is not found.
content = [{'archived': False,
'cache_ttl': None,
'collection': {'archived': False,
'authority_level': None,
'color': '#509EE3',
'description': None,
'id': 525,
'location': '/450/',
'name': 'eaf',
'namespace': None,
'personal_owner_id': None,
'slug': 'eaf'},
'collection_id': 525,
'collection_position': None,
'created_at': '2022-01-06T20:51:17.06376Z',
'creator_id': 1,
'database_id': 4,
},
{'foo': None}
]
#content = json.loads(res.text)
data = []
for q in content:
c = q.get('collection')
if c and c.get('location') == '/450/':
data.append(q)
print(data)
Output:
[{'archived': False, 'cache_ttl': None, 'collection': { 'location': '/450/', 'name': 'eaf', 'namespace': None }, ...}]

Find item in a list of dictionaries

I have this data
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
The id's are unique, it's impossible that multiple dictonaries have the same id.
For example I want to get the item with the id "ieow83janx".
My current solution looks like this:
search_id = 'ieow83janx'
item = [x for x in data if x['id'] == search_id][0]
Do you think that's the be solution or does anyone know an alternative solution?
Since the ids are unique, you can store the items in a dictionary to achieve O(1) lookup.
lookup = {ele['id']: ele for ele in data}
then you can do
user_info = lookup[user_id]
to retrieve it
If you are going to get this kind of operations more than once on this particular object, I would recommend to translate it into a dictionary with id as a key.
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
data_dict = {item['id']: item for item in data}
#=> {'ieow83janx': {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}, 'abcd738asdwe': {'mail': 'test#test.com', 'id': 'abcd738asdwe', 'name': 'John'}}
data_dict['ieow83janx']
#=> {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}
In this case, this lookup operation will cost you some constant* O(1) time instead of O(N).
How about the next built-in function (docs):
>>> data = [
... {
... 'id': 'abcd738asdwe',
... 'name': 'John',
... 'mail': 'test#test.com',
... },
... {
... 'id': 'ieow83janx',
... 'name': 'Jane',
... 'mail': 'test#foobar.com',
... }
... ]
>>> search_id = 'ieow83janx'
>>> next(x for x in data if x['id'] == search_id)
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
EDIT:
It raises StopIteration if no match is found, which is a beautiful way to handle absence:
>>> search_id = 'does_not_exist'
>>> try:
... next(x for x in data if x['id'] == search_id)
... except StopIteration:
... print('Handled absence!')
...
Handled absence!
Without creating a new dictionary or without writing several lines of code, you can simply use the built-in filter function to get the item lazily, not checking after it finds the match.
next(filter(lambda d: d['id']==search_id, data))
should for just fine.
Would this not achieve your goal?
for i in data:
if i.get('id') == 'ieow83janx':
print(i)
(xenial)vash#localhost:~/python$ python3.7 split.py
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
Using comprehension:
[i for i in data if i.get('id') == 'ieow83janx']
if any(item['id']=='ieow83janx' for item in data):
#return item
As any function returns true if iterable (List of dictionaries in your case) has value present.
While using Generator Expression there will not be need of creating internal List. As there will not be duplicate values for the id in List of dictionaries, any will stop the iteration until the condition returns true. i.e the generator expression with any will stop iterating on shortcircuiting. Using List comprehension will create a entire List in the memory where as GE creates the element on the fly which will be better if you are having large items as it uses less memory.

how to use nested dictionary in python?

I am trying to write some code with the Hunter.io API to automate some of my b2b email scraping. It's been a long time since I've written any code and I could use some input. I have a CSV file of Urls, and I want to call a function on each URL that outputs a dictionary like this:
`{'domain': 'fromthebachrow.com', 'webmail': False, 'pattern': '{f}{last}', 'organization': None, 'emails': [{'value': 'fbach#fromthebachrow.com', 'type': 'personal', 'confidence': 91, 'sources': [{'domain': 'fromthebachrow.com', 'uri': 'http://fromthebachrow.com/contact', 'extracted_on': '2017-07-01'}], 'first_name': None, 'last_name': None, 'position': None, 'linkedin': None, 'twitter': None, 'phone_number': None}]}`
for each URL I call my function on. I want my code to return just the email address for each key labeled 'value'.
Value is a key that is contained in a list that itself is an element of the directory my function outputs. I am able to access the output dictionary to grab the list that is keyed to 'emails', but I don't know how to access the dictionary contained in the list. I want my code to return the value in that dictionary that is keyed with 'value', and I want it to do so for all of my urls.
from pyhunyrt import PyHunter
import csv
file=open('urls.csv')
reader=cvs.reader (file)
urls=list(reader)
hunter=PyHunter('API Key')
for item in urls:
output=hunter.domain_search(item)
output['emails'`
which returns a list that looks like this for each item:
[{
'value': 'fbach#fromthebachrow.com',
'type': 'personal',
'confidence': 91,
'sources': [{
'domain': 'fromthebachrow.com',
'uri': 'http://fromthebachrow.com/contact',
'extracted_on': '2017-07-01'
}],
'first_name': None,
'last_name': None,
'position': None,
'linkedin': None,
'twitter': None,
'phone_number': None
}]
How do I grab the first dictionary in that list and then access the email paired with 'value' so that my output is just an email address for each url I input initially?
To grab the first dict (or any item) in a list, use list[0], then to grab a value of a key value use ["value"]. To combine it, you should use list[0]["value"]

Categories