Convert XML to List of Dictionaries in python

I'm very new to Python, so please bear with me.
When I try to convert the XML content into a list of dictionaries, I get output, but not the output I expect, even after a lot of experimenting.
XML content:
<project>
  <panelists>
    <panelist panelist_login="pradeep">
      <login/>
      <firstname/>
      <lastname/>
      <gender/>
      <age>0</age>
    </panelist>
    <panelist panelist_login="kumar">
      <login>kumar</login>
      <firstname>kumar</firstname>
      <lastname>Pradeep</lastname>
      <gender/>
      <age>24</age>
    </panelist>
  </panelists>
</project>
Code I have used:
import xml.etree.ElementTree as ET
tree = ET.parse('xml_file.xml')  # parse the XML file
root = tree.getroot()
Panelist_list = []
for item in root.findall('./panelists/panelist'):  # find all panelist nodes
    Panelist = {}  # dictionary to store the content of each panelist
    panelist_login = {}
    panelist_login = item.attrib
    Panelist_list.append(panelist_login)
    for child in item:
        Panelist[child.tag] = child.text
    Panelist_list.append(Panelist)
print(Panelist_list)
Output:
[{
'panelist_login': 'pradeep'
}, {
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'
}, {
'panelist_login': 'kumar'
}, {
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'
}]
I'm expecting the output below:
[{
'panelist_login': 'pradeep',
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'
}, {
'panelist_login': 'kumar',
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'
}]
I have read many Stack Overflow questions on XML trees, but none of them helped.
Any help or suggestion is appreciated.

Your code is appending the dict panelist_login with the tag attributes to the list, in this line: Panelist_list.append(panelist_login) separately from the Panelist dict. So for every <panelist> tag the code appends 2 dicts: one dict of tag attributes and one dict of subtags. Inside the loop you have 2 append() calls, which means 2 items in the list for each time through the loop.
But you actually want a single dict for each <panelist> tag, and you want the tag attribute to appear inside the Panelist dict as if it were a subtag also.
So have a single dict, and update the Panelist dict with the tag attributes instead of keeping the tag attributes in a separate dict.
for item in root.findall('./panelists/panelist'):  # find all panelist nodes
    Panelist = {}  # dictionary to store the content of each panelist
    panelist_login = item.attrib
    Panelist.update(panelist_login)  # make panelist_login the first key of the dict
    for child in item:
        Panelist[child.tag] = child.text
    Panelist_list.append(Panelist)
print(Panelist_list)
I get this output, which I think is what you had in mind:
[
{'panelist_login': 'pradeep',
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'},
{'panelist_login': 'kumar',
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'}
]
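For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same fix. It assumes the XML is held in a string; the names xml_text, panelists and row are only illustrative and are not from the question.

```python
import xml.etree.ElementTree as ET

# Sample XML copied from the question. Parsing from a string is an
# assumption made here so the sketch runs without a file.
xml_text = """
<project>
  <panelists>
    <panelist panelist_login="pradeep">
      <login/><firstname/><lastname/><gender/><age>0</age>
    </panelist>
    <panelist panelist_login="kumar">
      <login>kumar</login><firstname>kumar</firstname>
      <lastname>Pradeep</lastname><gender/><age>24</age>
    </panelist>
  </panelists>
</project>
"""

root = ET.fromstring(xml_text)
panelists = []
for item in root.findall('./panelists/panelist'):
    row = dict(item.attrib)          # start from a copy of the tag attributes
    for child in item:
        row[child.tag] = child.text  # empty elements such as <login/> give None
    panelists.append(row)            # exactly one append per <panelist>

print(panelists)
```

Copying item.attrib with dict() rather than updating an empty dict is just a shorter way to get the same single dictionary per panelist.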

Related

Create a new dictionary from a nested JSON output after parsing

In Python 3 I need to get a JSON response from an API call
and parse it so that I end up with a dictionary that contains only the data I need.
The final dictionary I expect to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
the JSON response returns the following output:
{
'RuleGroups': [{
'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rules',
'Order': 1,
'Enabled': True,
'Rules': [{
'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rule',
'Description': 'Look for default severity text',
'Enabled': False,
'RuleMatchers': None,
'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
'SourceField': 'text',
'DestinationField': 'text',
'ReplaceNewVal': '',
'Type': 'extract',
'Order': 21520,
'KeepBlockedLogs': False
}],
'Type': 'user'
}, {
'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
'Name': 'auto_collector',
'Order': 4,
'Enabled': True,
'Rules': [{
'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
'Name': 'auto_collector',
'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
'Enabled': False,
'RuleMatchers': None,
'Rule': 'AUTODISABLED',
'SourceField': 'subsystemName',
'DestinationField': 'subsystemName',
'ReplaceNewVal': '',
'Type': 'block',
'Order': 1,
'KeepBlockedLogs': False
}],
'Type': 'user'
}]
}
I was able to create a dictionary that contains the name and the rule group ID, like this:
response = requests.get(url, headers=headers)
output = response.json()
outputlist = output["RuleGroups"]
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
# Create a dictionary of NAME + ID
ruleDic = {}
for key in groupRuleName:
    for value in groupRuleID:
        ruleDic[key] = value
        groupRuleID.remove(value)
        break
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
but when I tried to parse it as nested JSON things just didn't work.

In the end, I managed to write a function that returns this dictionary.
I do it by breaking the JSON into three lists of the needed elements (Name, Id, and Rules from the first level of nesting), and then building another list from the nested JSON (everything listed under Rules) that keeps only the "Id" key.
Finally, I create the dictionary by zipping together the lists built earlier.
def get_filtered_rules() -> dict:
    # outputlist is the list stored under "RuleGroups" in the API response
    groupRuleName = [li['Name'] for li in outputlist]
    groupRuleID = [li['Id'] for li in outputlist]
    ruleIDList = [li['Rules'] for li in outputlist]
    ruleIDListClean = []
    ruleClean = []
    for sublist in ruleIDList:
        try:
            lstRule = [item['Rule'] for item in sublist]
            ruleClean.append(lstRule)
            ruleContent = list(zip(groupRuleName, ruleClean))
            ruleContentDictionary = dict(ruleContent)
            lstID = [item['Id'] for item in sublist]
            ruleIDListClean.append(lstID)
            # Create a dictionary of NAME + ID + RuleID
            ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
        except Exception as e:
            print(e)
    return ruleDic
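The same mapping can probably be built more directly with one dictionary comprehension over the rule groups. This is only a sketch: the sample data below is a trimmed copy of the response in the question (only the keys that are used are kept), and the function name rules_by_group is made up for illustration.

```python
# Trimmed-down copy of the API response from the question.
output = {
    'RuleGroups': [
        {'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
         'Name': 'Severity Rules',
         'Rules': [{'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e'}]},
        {'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
         'Name': 'auto_collector',
         'Rules': [{'Id': '2d6bdc1d-4064-11eb-8856-06364196e782'}]},
    ]
}

def rules_by_group(response):
    """Map each group name to a tuple of (group id, [rule ids])."""
    return {
        group['Name']: (group['Id'], [rule['Id'] for rule in group['Rules']])
        for group in response['RuleGroups']
    }

rule_dic = rules_by_group(output)
print(rule_dic)
```

Because each group is handled on its own, the name, the group id and the rule ids cannot drift out of step the way separately built lists can.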

Problem when saving json data into python list

I'm trying to get two attributes at a time from my JSON data and add them as items to my Python list. However, when I try to read both with ['emailTypeDesc']['createdDate'], it throws an error. Could someone help with this? Thanks in advance!
json:
{
'readOnly': False,
'senderDetails': {'firstName': 'John', 'lastName': 'Doe', 'emailAddress': 'johndoe#gmail.com', 'emailAddressId': 123456, 'personalId': 123, 'companyName': 'ACME'},
'clientDetails': {'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com', 'emailAddressId': 654321, 'personalId': 456, 'companyName': 'Lorem Ipsum'},
'notesSection': {},
'emailList': [{'requestId': 12345667, 'emailId': 9876543211, 'emailType': 3, 'emailTypeDesc': 'Email-In', 'emailTitle': 'SampleTitle 1', 'createdDate': '15-May-2020 11:15:52', 'fromMailList': [{'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com',}]},
{'requestId': 12345667, 'emailId': 14567775, 'emailType': 3, 'emailTypeDesc': 'Email-Out', 'emailTitle': 'SampleTitle 2', 'createdDate': '16-May-2020 16:15:52', 'fromMailList': [{'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com',}]},
{'requestId': 12345667, 'emailId': 12345, 'emailType': 3, 'emailTypeDesc': 'Email-In', 'emailTitle': 'SampleTitle 3', 'createdDate': '17-May-2020 20:15:52', 'fromMailList': [{'firstName': 'Jane', 'lastName': 'Doe', 'emailAddress': 'janedoe#gmail.com',}]}]
}
python:
final_list = []
data = json.loads(r.text)
myId = [(data['emailList'][0]['requestId'])]
for each_req in myId:
    final_list.append(each_req)
myEmailList = [mails['emailTypeDesc']['createdDate'] for mails in data['emailList']]
for each_requ in myEmailList:
    final_list.append(each_requ)
return final_list
This error comes up when I run the above code:
TypeError: string indices must be integers
Desired output for final_list:
[12345667, 'Email-In', '15-May-2020 11:15:52', 'Email-Out', '16-May-2020 16:15:52', 'Email-In', '17-May-2020 20:15:52']
My problem is definitely in this line:
myEmailList = [mails['emailTypeDesc']['createdDate'] for mails in data['emailList']]
because when I run this without the second attribute ['createdDate'] it would work, but I need both attributes on my final_list:
myEmailList = [mails['emailTypeDesc'] for mails in data['emailList']]

I think you're misunderstanding the syntax. mails['emailTypeDesc']['createdDate'] is looking for the key 'createdDate' inside the object mails['emailTypeDesc'], but in fact they are two items at the same level.
Since mails['emailTypeDesc'] is a string, not a dictionary, you get the error you have quoted. It seems that you want to add the two items mails['emailTypeDesc'] and mails['createdDate'] to your list. I'm not sure if you'd rather join these together into a single string or create a sub-list or something else. Here's a sublist option.
myEmailList = [[mails['emailTypeDesc'], mails['createdDate']] for mails in data['emailList']]
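If the flat list from the question is what you are after, one possible sketch is to extend the list with both values for each mail instead of nesting them. The data below is a cut-down stand-in for the parsed JSON (only the three keys that are used), so treat it as an assumption rather than the real response.

```python
# Cut-down stand-in for json.loads(r.text); only the keys used are kept.
data = {
    'emailList': [
        {'requestId': 12345667, 'emailTypeDesc': 'Email-In',
         'createdDate': '15-May-2020 11:15:52'},
        {'requestId': 12345667, 'emailTypeDesc': 'Email-Out',
         'createdDate': '16-May-2020 16:15:52'},
        {'requestId': 12345667, 'emailTypeDesc': 'Email-In',
         'createdDate': '17-May-2020 20:15:52'},
    ]
}

# Start with the request id of the first mail, as in the question.
final_list = [data['emailList'][0]['requestId']]
for mail in data['emailList']:
    # Both keys live at the same level of each mail dict.
    final_list.extend([mail['emailTypeDesc'], mail['createdDate']])

print(final_list)
```

extend adds the two values as separate items, whereas append would add them as one nested list.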

Strings in JSON must be in double quotes, not single.
Edit: As well as names.

Find item in a list of dictionaries

I have this data
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
The ids are unique; no two dictionaries can have the same id.
For example I want to get the item with the id "ieow83janx".
My current solution looks like this:
search_id = 'ieow83janx'
item = [x for x in data if x['id'] == search_id][0]
Do you think that's the best solution, or does anyone know an alternative?

Since the ids are unique, you can store the items in a dictionary to achieve O(1) lookup.
lookup = {ele['id']: ele for ele in data}
then you can do
user_info = lookup[user_id]
to retrieve it

If you are going to perform this kind of lookup more than once on this particular object, I would recommend converting it into a dictionary keyed by id.
data = [
{
'id': 'abcd738asdwe',
'name': 'John',
'mail': 'test#test.com',
},
{
'id': 'ieow83janx',
'name': 'Jane',
'mail': 'test#foobar.com',
}
]
data_dict = {item['id']: item for item in data}
#=> {'ieow83janx': {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}, 'abcd738asdwe': {'mail': 'test#test.com', 'id': 'abcd738asdwe', 'name': 'John'}}
data_dict['ieow83janx']
#=> {'mail': 'test#foobar.com', 'id': 'ieow83janx', 'name': 'Jane'}
In this case, each lookup costs constant O(1) time on average instead of O(N).

How about the built-in next function:
>>> data = [
... {
... 'id': 'abcd738asdwe',
... 'name': 'John',
... 'mail': 'test#test.com',
... },
... {
... 'id': 'ieow83janx',
... 'name': 'Jane',
... 'mail': 'test#foobar.com',
... }
... ]
>>> search_id = 'ieow83janx'
>>> next(x for x in data if x['id'] == search_id)
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
EDIT:
It raises StopIteration if no match is found, which is a beautiful way to handle absence:
>>> search_id = 'does_not_exist'
>>> try:
... next(x for x in data if x['id'] == search_id)
... except StopIteration:
... print('Handled absence!')
...
Handled absence!
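If you would rather not catch the exception, next also accepts a default as its second argument, which is returned when the iterator is exhausted. A small sketch (the helper name find_by_id is made up for illustration):

```python
data = [
    {'id': 'abcd738asdwe', 'name': 'John', 'mail': 'test#test.com'},
    {'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'},
]

def find_by_id(items, search_id):
    """Return the first dict whose 'id' matches, or None if there is none."""
    # The generator expression needs its own parentheses here because
    # next() is given a second argument (the default).
    return next((x for x in items if x['id'] == search_id), None)

found = find_by_id(data, 'ieow83janx')
missing = find_by_id(data, 'does_not_exist')
print(found)
print(missing)
```

This keeps the lazy, stop-at-first-match behaviour and leaves it to the caller to check for None.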

Without creating a new dictionary or writing several lines of code, you can simply use the built-in filter function to get the item lazily; iteration stops as soon as the match is found.
next(filter(lambda d: d['id']==search_id, data))
should work just fine.

Would this not achieve your goal?
for i in data:
    if i.get('id') == 'ieow83janx':
        print(i)
(xenial)vash#localhost:~/python$ python3.7 split.py
{'id': 'ieow83janx', 'name': 'Jane', 'mail': 'test#foobar.com'}
Using comprehension:
[i for i in data if i.get('id') == 'ieow83janx']

if any(item['id'] == 'ieow83janx' for item in data):
    pass  # a matching item exists; any() itself only returns True or False
The any function returns True if at least one element of the iterable (the list of dictionaries in your case) satisfies the condition.
With a generator expression there is no need to build an intermediate list. any short-circuits: it stops iterating as soon as the condition is true for some element, which is safe here because the ids in the list of dictionaries are not duplicated. A list comprehension creates the entire list in memory, whereas a generator expression produces elements on the fly, which uses less memory when there are many items.

how to use nested dictionary in python?

I am trying to write some code with the Hunter.io API to automate some of my b2b email scraping. It's been a long time since I've written any code and I could use some input. I have a CSV file of Urls, and I want to call a function on each URL that outputs a dictionary like this:
`{'domain': 'fromthebachrow.com', 'webmail': False, 'pattern': '{f}{last}', 'organization': None, 'emails': [{'value': 'fbach#fromthebachrow.com', 'type': 'personal', 'confidence': 91, 'sources': [{'domain': 'fromthebachrow.com', 'uri': 'http://fromthebachrow.com/contact', 'extracted_on': '2017-07-01'}], 'first_name': None, 'last_name': None, 'position': None, 'linkedin': None, 'twitter': None, 'phone_number': None}]}`
for each URL I call my function on. I want my code to return just the email address for each key labeled 'value'.
'value' is a key of a dictionary that sits inside a list, and that list is itself an element of the dictionary my function outputs. I am able to access the output dictionary to grab the list that is keyed to 'emails', but I don't know how to access the dictionary contained in the list. I want my code to return the value in that dictionary that is keyed with 'value', and I want it to do so for all of my URLs.
from pyhunter import PyHunter
import csv
file = open('urls.csv')
reader = csv.reader(file)
urls = list(reader)
hunter = PyHunter('API Key')
for item in urls:
    output = hunter.domain_search(item)
    output['emails']
which returns a list that looks like this for each item:
[{
'value': 'fbach#fromthebachrow.com',
'type': 'personal',
'confidence': 91,
'sources': [{
'domain': 'fromthebachrow.com',
'uri': 'http://fromthebachrow.com/contact',
'extracted_on': '2017-07-01'
}],
'first_name': None,
'last_name': None,
'position': None,
'linkedin': None,
'twitter': None,
'phone_number': None
}]
How do I grab the first dictionary in that list and then access the email paired with 'value' so that my output is just an email address for each url I input initially?

To grab the first dict (or any item) in a list, index it, e.g. list[0]; then, to get the value stored under a key, use ["value"]. Combined, that is list[0]["value"].
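Applied to the sample output from the question, that might look like the sketch below. The dictionary is a shortened copy of the one in the question, and the guard for an empty 'emails' list is an extra assumption about what a search with no results returns.

```python
# Shortened copy of the dictionary shown in the question.
output = {
    'domain': 'fromthebachrow.com',
    'emails': [
        {'value': 'fbach#fromthebachrow.com', 'type': 'personal',
         'confidence': 91},
    ],
}

emails = output['emails']  # the list keyed to 'emails'

# First address only; guard against an empty list (assumed possible).
first_email = emails[0]['value'] if emails else None

# Every address in the list, in case a domain has more than one.
all_emails = [entry['value'] for entry in emails]

print(first_email)
print(all_emails)
```

Inside the loop over URLs, the same two lines can be applied to each output in turn.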

Build nested tree-like dict from an array of dicts with children

I have an array of dicts retrieved from a web API. Each dict has a name, description, 'parent', and children key. The children key has an array of dicts as its value. For the sake of clarity, here is a dummy example:
[
{'name': 'top_parent', 'description': None, 'parent': None,
'children': [{'name': 'child_one'},
{'name': 'child_two'}]},
{'name': 'child_one', 'description': None, 'parent': 'top_parent',
'children': []},
{'name': 'child_two', 'description': None, 'parent': 'top_parent',
'children': [{'name': 'grand_child'}]},
{'name': 'grand_child', 'description': None, 'parent': 'child_two',
'children': []}
]
Every item is in the array. An item could be the top-most parent, and thus not exist in any of the children arrays. An item could be both a child and a parent. Or an item could only be a child (have no children of its own).
So, in a tree structure, you'd have something like this:
top_parent
    child_one
    child_two
        grand_child
In this contrived and simplified example top_parent is a parent but not a child; child_one is a child but not a parent; child_two is a parent and a child; and grand_child is a child but not a parent. This covers every possible state.
What I want is to iterate over the array of dicts once and generate a nested dict that properly represents the tree structure (or, if a single pass is impossible, to do it in the most efficient way possible). So, in this example, I would get a dict that looks like this:
{
    'top_parent': {
        'child_one': {},
        'child_two': {
            'grand_child': {}
        }
    }
}
Strictly speaking, it is not necessary for items without children to not be keys, but that is preferable.

Fourth edit, showing three versions, cleaned up a bit. First version works top-down and returns None, as you requested, but essentially loops through the top level array 3 times. The next version only loops through it once, but returns empty dicts instead of None.
The final version works bottom up and is very clean. It can return empty dicts with a single loop, or None with additional looping:
from collections import defaultdict
my_array = [
    {'name': 'top_parent', 'description': None, 'parent': None,
     'children': [{'name': 'child_one'},
                  {'name': 'child_two'}]},
    {'name': 'child_one', 'description': None, 'parent': 'top_parent',
     'children': []},
    {'name': 'child_two', 'description': None, 'parent': 'top_parent',
     'children': [{'name': 'grand_child'}]},
    {'name': 'grand_child', 'description': None, 'parent': 'child_two',
     'children': []}
]
def build_nest_None(my_array):
    childmap = [(d['name'], set(x['name'] for x in d['children']) or None)
                for d in my_array]
    all_dicts = dict((name, kids and {}) for (name, kids) in childmap)
    results = all_dicts.copy()
    for (name, kids) in ((x, y) for x, y in childmap if y is not None):
        all_dicts[name].update((kid, results.pop(kid)) for kid in kids)
    return results
def build_nest_empty(my_array):
    all_children = set()
    all_dicts = defaultdict(dict)
    for d in my_array:
        children = set(x['name'] for x in d['children'])
        all_dicts[d['name']].update((x, all_dicts[x]) for x in children)
        all_children.update(children)
    top_name, = set(all_dicts) - all_children
    return {top_name: all_dicts[top_name]}
def build_bottom_up(my_array, use_None=False):
    all_dicts = defaultdict(dict)
    for d in my_array:
        name = d['name']
        all_dicts[d['parent']][name] = all_dicts[name]
    if use_None:
        for d in all_dicts.values():
            for x, y in d.items():
                if not y:
                    d[x] = None
    return all_dicts[None]
print(build_nest_None(my_array))
print(build_nest_empty(my_array))
print(build_bottom_up(my_array, True))
print(build_bottom_up(my_array))
Results in:
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}

You can keep a lazy mapping from names to nodes and then rebuild the hierarchy by processing just the parent link (I'm assuming the data is consistent, i.e. A is marked as the parent of B if and only if B is listed among the children of A).
nmap = {}
for n in nodes:
    name = n["name"]
    parent = n["parent"]
    try:
        # Was this node built before?
        me = nmap[name]
    except KeyError:
        # No... create it now
        if n["children"]:
            nmap[name] = me = {}
        else:
            me = None
    if parent:
        try:
            nmap[parent][name] = me
        except KeyError:
            # My parent will follow later
            nmap[parent] = {name: me}
    else:
        root = me
The children property of the input is used only to know if the element should be stored as a None in its parent (because has no children) or if it should be a dictionary because it will have children at the end of the rebuild process. Storing nodes without children as empty dictionaries would simplify the code a bit by avoiding the need of this special case.
Using collections.defaultdict the code can also be simplified for the creation of new nodes
import collections
nmap = collections.defaultdict(dict)
for n in nodes:
    name = n["name"]
    parent = n["parent"]
    me = nmap[name]
    if parent:
        nmap[parent][name] = me
    else:
        root = me
This algorithm is O(N) assuming constant-time dictionary access and makes only one pass on the input and requires O(N) space for the name->node map (the space requirement is O(Nc) for the original nochildren->None version where Nc is the number of nodes with children).
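To try the defaultdict variant end to end, here is a small runnable sketch. The nodes list is the sample data from the question, deliberately shuffled so that a child appears before its parent; the function name build_tree and the choice to return the root wrapped under its own name are additions for illustration.

```python
from collections import defaultdict

# Sample data from the question, reordered so grand_child comes first.
nodes = [
    {'name': 'grand_child', 'description': None, 'parent': 'child_two',
     'children': []},
    {'name': 'top_parent', 'description': None, 'parent': None,
     'children': [{'name': 'child_one'}, {'name': 'child_two'}]},
    {'name': 'child_two', 'description': None, 'parent': 'top_parent',
     'children': [{'name': 'grand_child'}]},
    {'name': 'child_one', 'description': None, 'parent': 'top_parent',
     'children': []},
]

def build_tree(nodes):
    """Single pass: link every node into its parent's dict."""
    nmap = defaultdict(dict)   # name -> dict of that node's children
    tree = {}
    for n in nodes:
        me = nmap[n['name']]   # created on first access if missing
        if n['parent']:
            nmap[n['parent']][n['name']] = me
        else:
            tree[n['name']] = me   # no parent: this is a root
    return tree

tree = build_tree(nodes)
print(tree)
```

Because nmap hands out the same dict object for a name every time, a parent that shows up later still picks up the children that were attached earlier.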

My stab at it:
persons = [
    {'name': 'top_parent', 'description': None, 'parent': None,
     'children': [{'name': 'child_one'},
                  {'name': 'child_two'}]},
    {'name': 'grand_child', 'description': None, 'parent': 'child_two',
     'children': []},
    {'name': 'child_two', 'description': None, 'parent': 'top_parent',
     'children': [{'name': 'grand_child'}]},
    {'name': 'child_one', 'description': None, 'parent': 'top_parent',
     'children': []},
]
def findParent(name, parent, tree, found=False):
    # Search the tree recursively for parent and attach name under it
    if tree == {}:
        return False
    if parent in tree:
        tree[parent][name] = {}
        return True
    else:
        for p in tree:
            found = findParent(name, parent, tree[p], False) or found
        return found
tree = {}
outOfOrder = []
for person in persons:
    if person['parent'] is None:
        tree[person['name']] = {}
    else:
        if not findParent(person['name'], person['parent'], tree):
            # parent not in the tree yet; retry after the first pass
            outOfOrder.append(person)
for person in outOfOrder:
    if not findParent(person['name'], person['parent'], tree):
        print('parent of ' + person['name'] + ' not found')
print(tree)
results in:
{'top_parent': {'child_two': {'grand_child': {}}, 'child_one': {}}}
It also picks up any children whose parent has not been added yet, and then reconciles this at the end.
