Parsing XML into a dictionary of lists Python/Django

I'm having a little trouble parsing an XML document with Python. I'm trying to get my data to look like the following:
listDict = [{'name':'Sales','id':'1','position':'1','order_by_type':'True','order_by_asc':'True'}, {'name':'Information','id':'2','position':'1','order_by_type':'True','order_by_asc':'True'}]
I think the loop I run after pulling the data from the XML string is wrong.
from collections import defaultdict
from xml.etree import ElementTree

xml_data = ElementTree.fromstring(self.data)
# Let's grab all the base category info and add it to a dict containing lists
base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_postion = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')
# store all information into lists
base_cat = [t.text for t in base_cats]
base_id = [t.text for t in base_cats_id]
base_p = [t.text for t in base_postion]
base_obt = [t.text for t in base_order_by_type]
base_asc = [t.text for t in base_order_by_asc]
base_dict = defaultdict(list)
# let's put everything in the lists into a dictionary
for base in range(len(base_cat)):  # for each base in base_cat
    base_dict[base].append(base_cat[base])
    base_dict[base].append(base_id[base])
    base_dict[base].append(base_p[base])
    base_dict[base].append(base_obt[base])
    base_dict[base].append(base_asc[base])
This produces the following:
instance = {0: ['Sales 2', '1', '10', 'True', 'True'], 1: ['Information 2', '2', '20', 'True', 'True'], 2: ['Listing 2', '3', '30', 'True', 'True'], 3: ['Information', '4', '40', 'True', 'True'], 4: ['Land', '5', '50', 'True', 'True'], 5: ['&', '6', '60', 'True', 'True'], 6: ['Tax', '7', '70', 'True', 'True'], 7: ['Construction', '9', '90', 'True', 'True'], 8: ['Interior/Utilites', '10', '100', 'True', 'True'], 9: ['HOA/Community', '11', '110', 'True', 'True'], 10: ['Remarks', '12', '120', 'True', 'True'], 11: ['Exterior', '8', '80', 'True', 'True']}
My end goal is to be able to do the following in my Django template:
{% for item in instance %}
    {{ item.name }}
{% endfor %}
Any help figuring out what I have wrong would be much appreciated. Thanks in advance.
EDIT:
As requested, here is the XML I have:
<?xml version="1.0" ?>
<FormInstance>
  <BaseCategory>
    <Name>Sales</Name>
    <base_id>1</base_id>
    <position>10</position>
    <order_by_type>True</order_by_type>
    <order_by_asc>True</order_by_asc>
  </BaseCategory>
  <BaseCategory>
    <Name>Information</Name>
    <base_id>2</base_id>
    <position>20</position>
    <order_by_type>True</order_by_type>
    <order_by_asc>True</order_by_asc>
    <MainCategory>
      <main_id>1</main_id>
      <Name>Address 3</Name>
      <is_visible>True</is_visible>
      <position>10</position>
      <order_by_type>True</order_by_type>
      <order_by_asc>True</order_by_asc>
      <SubCategory>
        <sub_id>1</sub_id>
        <Name>Street Number 2</Name>
        <sub_library_id>StreetNumber</sub_library_id>
        <field_display_type>[u'input']</field_display_type>
        <field_type>[u'varchar']</field_type>
        <is_active>True</is_active>
        <is_required>True</is_required>
        <help_text>Street Number</help_text>
        <main_category>1</main_category>
        <is_visible>True</is_visible>
        <position>10</position>
        <order_by_type>True</order_by_type>
        <order_by_asc>True</order_by_asc>
        <show_seller>True</show_seller>
        <Enumerations>
          <enum_id>4</enum_id>
          <Name>Test Enum</Name>
          <library_id>test enum</library_id>
          <is_active>True</is_active>
          <sub_category>1</sub_category>
          <is_visible>True</is_visible>
          <position>10</position>
          <order_by_type>True</order_by_type>
          <order_by_asc>True</order_by_asc>
        </Enumerations>
      </SubCategory>
    </MainCategory>
  </BaseCategory>
</FormInstance>

From what I gather from the expected results, it looks like you only want the information held directly in the BaseCategory nodes, right? The XML provided in the edit has two of those.
It helps to see the XML as a tree of nodes. In the example, you have something like:
                 FormInstance  # this is the root
                 /          \
                /            \
       BaseCategory        BaseCategory
       (name:Sales)     (name:Information)
                                  \
                                   \
                               MainCategory
                             (name:Address 3)
                                     \
                                      \
                                  SubCategory
                            (name:Street Number 2)
But you only need the information in the BaseCategory elements, right?
You could just position yourself at the root (which... well... is what ElementTree.fromstring gives you anyway), iterate over its BaseCategory nodes, get the items you need from each of them, and put those in your list of dictionaries.
Something like:
import pprint
from xml.etree import ElementTree

with open("sample_xml.xml", 'r') as f:
    data = f.read()
xml_data = ElementTree.fromstring(data)
base_categories = xml_data.findall("./BaseCategory")
print("Found %s base_categories." % len(base_categories))
list_dict = []
for base_category in base_categories:
    list_dict.append({
        "name": base_category.find("Name").text,
        "id": int(base_category.find("base_id").text),
        "position": int(base_category.find("position").text),
        # A comparison already yields True or False
        "order_by_type": base_category.find("order_by_type").text.lower() == "true",
        "order_by_asc": base_category.find("order_by_asc").text.lower() == "true",
    })
print("list_dict=%s" % (pprint.pformat(list_dict)))
Which outputs:
Found 2 base_categories.
list_dict=[{'id': 1,
  'name': 'Sales',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 10},
 {'id': 2,
  'name': 'Information',
  'order_by_asc': True,
  'order_by_type': True,
  'position': 20}]
The idea is that each BaseCategory element can be seen as a self-contained record (like a dict, if that helps you picture it) that contains the following attributes:
- a string name, in Name
- a numeric id, in base_id
- a numeric position, in position
- a boolean, in order_by_type
- a boolean, in order_by_asc
- another object, MainCategory, with its own fields...
So every time you position yourself on one of these BaseCategory nodes, you just gather the fields you are interested in and put them in a dictionary.
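The per-record idea can be packaged as a small helper. This is a minimal sketch, not the answer's own code: the names `to_bool`, `base_category_to_dict` and the trimmed `SAMPLE_XML` are hypothetical. It uses `Element.findtext`, which returns the text of the first matching direct child (or `None` if there is none), so only the `Name` directly under `BaseCategory` is read, never the one inside `MainCategory`.

```python
from xml.etree import ElementTree

# Trimmed, hypothetical sample in the same shape as the question's XML.
SAMPLE_XML = """<FormInstance>
  <BaseCategory>
    <Name>Sales</Name>
    <base_id>1</base_id>
    <position>10</position>
    <order_by_type>True</order_by_type>
    <order_by_asc>True</order_by_asc>
  </BaseCategory>
  <BaseCategory>
    <Name>Information</Name>
    <base_id>2</base_id>
    <position>20</position>
    <order_by_type>True</order_by_type>
    <order_by_asc>False</order_by_asc>
    <MainCategory>
      <Name>Address 3</Name>
    </MainCategory>
  </BaseCategory>
</FormInstance>"""


def to_bool(text):
    # Only a case-insensitive "true" counts as True; None or anything else is False.
    return (text or "").strip().lower() == "true"


def base_category_to_dict(base_category):
    # findtext("Name") matches direct children only. A missing number makes
    # int(None) raise TypeError, which fails loudly instead of guessing a value.
    return {
        "name": base_category.findtext("Name"),
        "id": int(base_category.findtext("base_id")),
        "position": int(base_category.findtext("position")),
        "order_by_type": to_bool(base_category.findtext("order_by_type")),
        "order_by_asc": to_bool(base_category.findtext("order_by_asc")),
    }


root = ElementTree.fromstring(SAMPLE_XML)
list_dict = [base_category_to_dict(bc) for bc in root.findall("BaseCategory")]
print(list_dict)
```

Because the results are plain dicts, `{{ item.name }}` in the template from the question should work: when Django resolves a dotted variable, it tries a dictionary lookup first.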
When you do:
base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_postion = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')
You are treating those elements (Name, base_id, position...) as independent lists, when in your XML they are really fields of the same BaseCategory record.
However, if you are absolutely certain that all those lists (base_cats, base_cats_id, base_postion...) contain the same number of items, you can still rebuild your records by using the length of one of them to iterate through all the lists in the same step. The example below uses len(base_cats), but len(base_cats_id), len(base_postion) and so on would work just as well, since all those lists have the same length:
base_cats = xml_data.findall('./BaseCategory/Name')
base_cats_id = xml_data.findall('./BaseCategory/base_id')
base_postion = xml_data.findall('./BaseCategory/position')
base_order_by_type = xml_data.findall('./BaseCategory/order_by_type')
base_order_by_asc = xml_data.findall('./BaseCategory/order_by_asc')
list_dict = []
for i in range(len(base_cats)):
    list_dict.append({
        "name": base_cats[i].text,
        "id": int(base_cats_id[i].text),
        "position": int(base_postion[i].text),
        "order_by_type": base_order_by_type[i].text.lower() == "true",
        "order_by_asc": base_order_by_asc[i].text.lower() == "true",
    })
print("list_dict=%s" % (pprint.pformat(list_dict)))
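If you keep the parallel `findall()` lists, `zip()` walks them in step without index arithmetic. A minimal sketch, assuming the lists really do line up; `SAMPLE_XML` and the helper `texts` are hypothetical. Plain `zip()` silently stops at the shortest list, so a BaseCategory missing one child would shift every later record; on Python 3.10+ you can pass `strict=True` to get a `ValueError` instead.

```python
from xml.etree import ElementTree

# Hypothetical two-record sample with the same child names as the question.
SAMPLE_XML = """<FormInstance>
  <BaseCategory><Name>Sales</Name><base_id>1</base_id><position>10</position>
    <order_by_type>True</order_by_type><order_by_asc>True</order_by_asc></BaseCategory>
  <BaseCategory><Name>Information</Name><base_id>2</base_id><position>20</position>
    <order_by_type>True</order_by_type><order_by_asc>True</order_by_asc></BaseCategory>
</FormInstance>"""

xml_data = ElementTree.fromstring(SAMPLE_XML)


def texts(path):
    # Text of every element matching path, in document order.
    return [el.text for el in xml_data.findall(path)]


list_dict = []
# zip() pairs the i-th item of every list, i.e. the fields of the i-th record.
for name, base_id, position, obt, asc in zip(
    texts("./BaseCategory/Name"),
    texts("./BaseCategory/base_id"),
    texts("./BaseCategory/position"),
    texts("./BaseCategory/order_by_type"),
    texts("./BaseCategory/order_by_asc"),
):
    list_dict.append({
        "name": name,
        "id": int(base_id),
        "position": int(position),
        "order_by_type": obt.lower() == "true",
        "order_by_asc": asc.lower() == "true",
    })
print(list_dict)
```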

Related

Renaming a key in every dictionary in a list, Python

I can't figure out how to rename "Number" to "ProjectId" in every dictionary in the list of dictionaries. Can anyone help? I tried it in the renaming() function but it doesn't work.
projects = [{'Number': '5',
             'Name': 'CFO'},
            {'Number': '7',
             'Name': 'Head of Product'},
            {'Number': '6',
             'Name': 'CEO'}]
def renaming(projects):
    for i in projects:
        i['ProjectId'] = i.pop('Number')
    return projects
You can add a new key named 'ProjectId' that takes the value of 'Number', and then delete 'Number':
for item in projects:
    item['ProjectId'] = item['Number']
    del item['Number']
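For what it's worth, the `pop()` version in the question is itself correct, since `pop()` removes the key and returns its value in one step. Because the original indentation was lost, one plausible cause (an assumption, not something the asker confirmed) is a `return` placed inside the loop, which exits after the first dictionary. The sketch below contrasts the two; `renaming_early_return` is a hypothetical name.

```python
import copy

projects = [
    {'Number': '5', 'Name': 'CFO'},
    {'Number': '7', 'Name': 'Head of Product'},
    {'Number': '6', 'Name': 'CEO'},
]


def renaming(projects):
    for project in projects:
        # pop() deletes 'Number' and hands back its value for the new key.
        project['ProjectId'] = project.pop('Number')
    return projects  # returned only after every dict has been visited


def renaming_early_return(projects):
    for project in projects:
        project['ProjectId'] = project.pop('Number')
        return projects  # bug: returns during the first iteration


# deepcopy() so each function works on its own copy of the sample data.
broken = renaming_early_return(copy.deepcopy(projects))
fixed = renaming(copy.deepcopy(projects))
print(fixed)
```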

Why doesn't my dictionary comprehension work correctly?

I need to read a CSV file and fill a dict with the data from it, so I wrote this method:
def read_data(self):
    with open('storage/data/heart.csv') as f:
        self.raw_data = {
            len(self.raw_data): {
                'age': line[0],
                'sex': line[1],
                'cp': line[2],
                'trtbps': line[3],
                'chol': line[4],
                'fbs': line[5],
                'restecg': line[6],
                'thalachh': line[7]
            } for line in csv.reader(f)}
But print(raw_data) returns this:
{0: {'age': '57', 'sex': '0', 'cp': '1', 'trtbps': '130', 'chol': '236', 'fbs': '0', 'restecg': '0', 'thalachh': '174'}}
As you can see, my method saves only one line to the dict, and it is the last line of the file. Please help.
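len(self.raw_data) never changes inside the dict comprehension: it is re-evaluated for every row, but self.raw_data is only reassigned after the whole comprehension has finished, so every row gets the same key and overwrites the previous one, leaving only the last line. Just use a normal loop, or enumerate, like this: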
def read_data(self):
    with open('storage/data/heart.csv') as f:
        self.raw_data = {
            i: {
                'age': line[0],
                'sex': line[1],
                'cp': line[2],
                'trtbps': line[3],
                'chol': line[4],
                'fbs': line[5],
                'restecg': line[6],
                'thalachh': line[7]
            } for i, line in enumerate(csv.reader(f))}  # i is the row index
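To see the key collision concretely, here is a minimal, self-contained reproduction. It is a sketch: `io.StringIO` stands in for `storage/data/heart.csv`, only two columns are kept, and the names `rows_broken` and `rows_fixed` are illustrative.

```python
import csv
import io

CSV_TEXT = "57,0\n63,1\n41,0\n"

raw_data = {}
# len(raw_data) is computed for every row, but raw_data is only rebound after
# the comprehension finishes, so it is 0 each time: every row gets key 0 and
# overwrites the previous one.
raw_data = {
    len(raw_data): {'age': line[0], 'sex': line[1]}
    for line in csv.reader(io.StringIO(CSV_TEXT))
}
rows_broken = raw_data

# enumerate() supplies a distinct index for each row instead.
rows_fixed = {
    i: {'age': line[0], 'sex': line[1]}
    for i, line in enumerate(csv.reader(io.StringIO(CSV_TEXT)))
}
print(rows_broken)
print(rows_fixed)
```

If the real file starts with a header row, that row will become entry 0; calling `next()` on the reader before the comprehension, or using `csv.DictReader`, skips it.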

Why isn't the nested dictionary appended to the list correctly?

I'm trying to append a dictionary to a list, but the sub-dictionary always ends up holding the values from the last row I read from the CSV file (deviceProfile.csv). Does anyone know why?
Here is my CSV file.
name,description,primaryTable,startingAddress,boolIndex
test_name_1,1,table_1,1,1
test_name_2,2,table_2,2,2
test_name_3,3,table_3,3,3
Here is my python code.
import csv
import yaml
from pprint import pprint
resource = {
    'name': "",
    'description': "",
    'attributes':
        {'primaryTable': "", 'startingAddress': "", 'boolIndex': ""},
}
resourceArray = []
with open("deviceProfile.csv") as f:
    myCsvDic = csv.DictReader(f)
    for row in myCsvDic:
        resource['name'] = row['name']
        resource['description'] = row['description']
        resource['attributes']['primaryTable'] = row['primaryTable']
        resource['attributes']['startingAddress'] = row['startingAddress']
        resource['attributes']['boolIndex'] = row['boolIndex']
        test = resource.copy()
        resourceArray.append(test)
pprint(resourceArray)
And the result is
[{'attributes': {'boolIndex': '3',
                 'primaryTable': 'table_3',
                 'startingAddress': '3'},
  'description': '1',
  'name': 'test_name_1'},
 {'attributes': {'boolIndex': '3',
                 'primaryTable': 'table_3',
                 'startingAddress': '3'},
  'description': '2',
  'name': 'test_name_2'},
 {'attributes': {'boolIndex': '3',
                 'primaryTable': 'table_3',
                 'startingAddress': '3'},
  'description': '3',
  'name': 'test_name_3'}]
Strangely, name and description are appended to the list correctly, but attributes is not: every entry ends up with the last sub-dictionary.
Any help will be appreciated. Thanks.
This is because of copy(). By default, dict.copy() makes a shallow copy: it copies only the top-level entries, so every copy still refers to the same nested attributes dict.
You should use deepcopy in your case. Replace test = resource.copy() with:
from copy import deepcopy
test = deepcopy(resource)
Take a look at this link for more information, or at any other resource that explains shallow versus deep copies.
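A quick way to see the difference is to check whether the nested dict is the same object after copying. A minimal sketch; the dict mirrors the question's `resource` template, and the names `shallow` and `deep` are illustrative.

```python
from copy import deepcopy

resource = {
    'name': "",
    'description': "",
    'attributes': {'primaryTable': "", 'startingAddress': "", 'boolIndex': ""},
}

shallow = resource.copy()
deep = deepcopy(resource)

# The shallow copy shares the inner 'attributes' dict with the template...
print(shallow['attributes'] is resource['attributes'])  # True
# ...while deepcopy() creates an independent inner dict.
print(deep['attributes'] is resource['attributes'])     # False

# So writing into the template's inner dict shows up in the shallow copy only.
resource['attributes']['boolIndex'] = '3'
print(shallow['attributes']['boolIndex'], repr(deep['attributes']['boolIndex']))
```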
Why do you define resource at the top, outside the loop?
resource = {
'name': "",
'description': "",
'attributes':
{ 'primaryTable': "", 'startingAddress': "", 'boolIndex': "" },
}
Remove that and change the loop to this:
for row in myCsvDic:
    resource = {}  # a brand-new dict for every row
    resource['name'] = row['name']
    resource['description'] = row['description']
    resource['attributes'] = {}  # the nested dict has to be created too
    resource['attributes']['primaryTable'] = row['primaryTable']
    resource['attributes']['startingAddress'] = row['startingAddress']
    resource['attributes']['boolIndex'] = row['boolIndex']
    resourceArray.append(resource)

Convert nested XML content into CSV using ElementTree in Python

I'm very new to Python, so please bear with me. When I try to convert the XML content into a list of dictionaries, I get output, but not what I expected, even after a lot of experimenting.
XML Content
<project>
  <data>
    <row>
      <respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
      <timestamp>10-06-16 11:30</timestamp>
      <product>1</product>
      <replica>1</replica>
      <seqnr>1</seqnr>
      <session>1</session>
      <column>
        <question>Q1</question>
        <answer>a1</answer>
      </column>
      <column>
        <question>Q2</question>
        <answer>a2</answer>
      </column>
    </row>
    <row>
      <respondent>w42h3fot34m7s6x</respondent>
      <timestamp>10-06-16 11:30</timestamp>
      <product>1</product>
      <replica>1</replica>
      <seqnr>1</seqnr>
      <session>1</session>
      <column>
        <question>Q3</question>
        <answer>a3</answer>
      </column>
      <column>
        <question>Q4</question>
        <answer>a4</answer>
      </column>
      <column>
        <question>Q5</question>
        <answer>a5</answer>
      </column>
    </row>
  </data>
</project>
Code I have used:
import csv
import xml.etree.ElementTree as ET
tree = ET.parse('xml_file.xml')  # import xml from file
root = tree.getroot()
data_list = []
for item in root.find('./data'):  # find all projects node
    data = {}  # dictionary to store content of each projects
    for child in item:
        data[child.tag] = child.text  # add item to dictionary
        #-----------------for loop with subchild is not working as expected in my case
        for subchild in child:
            data[subchild.tag] = subchild.text
    data_list.append(data)
print(data_list)
headers = {k for d in data_list for k in d.keys()}  # headers for csv
with open(csv_file, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=headers)  # creating a DictWriter object
    writer.writeheader()  # write headers to csv
    writer.writerows(data_list)
In the data_list output, each dictionary only gets the last question and answer of its row.
I guess the issue is in the subchild for loop, but I don't understand how to append the dictionaries to the list.
[{
'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'column': '\n      ',
'question': 'Q2',
'answer': 'a2'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'column': '\n      ',
'question': 'Q2',
'answer': 'a2'
}.......
]
I expect the output below. I tried a lot, but I was unable to loop over the column tags.
[{
'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q1',
'answer': 'a1'
},
{
'respondent': 'anonymous_m0wxo5f6w42h3fot34m7s6xij',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q2',
'answer': 'a2'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q3',
'answer': 'a3'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q4',
'answer': 'a4'
},
{
'respondent': 'w42h3fot34m7s6x',
'timestamp': '10-06-16 11:30',
'product': '1',
'replica': '1',
'seqnr': '1',
'session': '1',
'question': 'Q5',
'answer': 'a5'
}
]
I have read many Stack Overflow questions on ElementTree, but none of them helped me.
Any help or suggestion is appreciated.
I had a problem understanding what this code is supposed to do because it uses abstract variable names like item, child, subchild and this makes it hard to reason about the code. I'm not as clever as that, so I renamed the variables to row, tag, and column to make it easier for me to see what the code is doing. (In my book, even row and column are a bit abstract, but I suppose the opacity of the XML input is hardly your fault.)
You have 2 rows but you want 5 dictionaries, because you have 5 <column> tags and you want each <column>'s data in a separate dictionary. But you want the other tags in the <row> to be repeated along with each <column>'s data.
That means you need to build a dictionary for every <row>, then, for each <column>, add that column's data to the dictionary, then output it before going on to the next column.
This code makes the simplifying assumption that all of your <column>s have the same structure, with exactly one <question>, exactly one <answer>, and nothing else. If this assumption does not hold, a <column> may be reported with stale data inherited from the previous <column> in the same row. It will also produce no output at all for any <row> that does not have at least one <column>.
The code has to loop through the tags twice, once for the non-<column>s and once for the <column>s. Otherwise it can't be sure it has seen all the non-<column> tags before it starts outputting the <column>s.
There are other (no doubt more elegant) ways to do this, but I kept the code structure as close to your original as I could, other than making the variable names less opaque.
for row in root.find('./data'):  # find all row nodes
    data = {}  # dictionary to store the content of each row
    for tag in row:
        if tag.tag != "column":
            data[tag.tag] = tag.text  # add row-level tag to dictionary
    # Now the dictionary data is built for the row level
    for tag in row:
        if tag.tag == "column":
            for column in tag:
                data[column.tag] = column.text
            # Now we have added the column-level data for one column tag
            data_list.append(data.copy())
Output is as below. The keys appear in alphabetical order because I used pprint.pprint for convenience, and it sorts dictionary keys by default; the dicts themselves keep the original insertion order.
[{'answer': 'a1',
  'product': '1',
  'question': 'Q1',
  'replica': '1',
  'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a2',
  'product': '1',
  'question': 'Q2',
  'replica': '1',
  'respondent': 'm0wxo5f6w42h3fot34m7s6xij',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a3',
  'product': '1',
  'question': 'Q3',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a4',
  'product': '1',
  'question': 'Q4',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'},
 {'answer': 'a5',
  'product': '1',
  'question': 'Q5',
  'replica': '1',
  'respondent': 'w42h3fot34m7s6x',
  'seqnr': '1',
  'session': '1',
  'timestamp': '10-06-16 11:30'}]
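A more compact variant of the same two-pass idea: collect the non-column fields of each row once, then build each output dict as a fresh merge of those fields with one column's children, and write the result with `csv.DictWriter`. This is a sketch, assuming the same row/column layout as the question; the XML is trimmed, output goes to an `io.StringIO` instead of a real file, and the `fieldnames` order is chosen by hand.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed version of the question's XML (fewer row-level tags).
XML_TEXT = """<project><data>
  <row>
    <respondent>m0wxo5f6w42h3fot34m7s6xij</respondent>
    <session>1</session>
    <column><question>Q1</question><answer>a1</answer></column>
    <column><question>Q2</question><answer>a2</answer></column>
  </row>
  <row>
    <respondent>w42h3fot34m7s6x</respondent>
    <session>1</session>
    <column><question>Q3</question><answer>a3</answer></column>
  </row>
</data></project>"""

root = ET.fromstring(XML_TEXT)
data_list = []
for row in root.find('./data'):
    # Row-level fields: every direct child that is not a <column>.
    base = {tag.tag: tag.text for tag in row if tag.tag != 'column'}
    for column in row.findall('column'):
        # A new dict per <column>, so nothing leaks from the previous column.
        data_list.append({**base, **{field.tag: field.text for field in column}})

fieldnames = ['respondent', 'session', 'question', 'answer']
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(data_list)
print(out.getvalue())
```

Since each dict starts from `base`, a column that lacks an `<answer>` simply has no `answer` key, and `DictWriter` writes an empty cell for it instead of stale data. By default `DictWriter` raises `ValueError` for keys not listed in `fieldnames`. When writing to a real file, open it with `newline=''`, as the `csv` module documentation recommends.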

Python Dict append list with given vars

I'm trying to structure the data as a dict of lists. I tried using defaultdict, but it gives an error.
data = """
[{'transit01_net': '192.168.1.0',
  'transit01_subnet': '26',
  'transit02_net': '192.168.2.0',
  'transit02_subnet': '26',
  'transit03_net': '192.168.3.0',
  'transit03_subnet': '26',
}]
"""
output = {
'transit01': [],
'transit02': [],
'transit03': []
}
I would like to get:
{
'transit01': ['192.168.1.0', '26', 'Transit01'],
'transit02': ['192.168.2.0', '26', 'Transit02'],
'transit03': ['192.168.3.0', '26', 'Transit03'],
}
I have tried the following, but I'm only able to print the first one:
for item in data:
    # Iterating the elements in list
    output['transit01'].append(item['transit01_net'])
    output['transit01'].append(item['transit01_subnet'])
    output['transit01'].append('Transit01')
    output['transit02'].append(item['transit02_net'])
    output['transit02'].append(item['transit02_subnet'])
    output['transit02'].append('Transit02')
    output['transit03'].append(item['transit03_net'])
    output['transit03'].append(item['transit03_subnet'])
    output['transit03'].append('Transit03')
Step through this. You want to get from this:
data = """
[{'transit01_net': '192.168.1.0',
  'transit01_subnet': '26',
  'transit02_net': '192.168.2.0',
  'transit02_subnet': '26',
  'transit03_net': '192.168.3.0',
  'transit03_subnet': '26',
}]
"""
To this
{
'transit01': ['192.168.1.0', '26', 'Transit01'],
'transit02': ['192.168.2.0', '26', 'Transit02'],
'transit03': ['192.168.3.0', '26', 'Transit03'],
}
The former is a string that describes a literal data structure. Python's ast module can turn that into a Python object for you: ast.literal_eval safely evaluates a string that contains only Python literals.
import ast
evald_data = ast.literal_eval(data)
From there you need to do the harder work of actually reshaping the structure. It looks like you can split each key on the underscore and get what you need. Let's first group the fields under each name:
result = {}
for d in evald_data:  # for each dictionary in the (single-item) list
    for k, v in d.items():
        name, key = k.split("_")
        result.setdefault(name, {})[key] = v
# this should give you
expected = {
    'transit01': {'net': '192.168.1.0', 'subnet': '26'},
    'transit02': {'net': '192.168.2.0', 'subnet': '26'},
    'transit03': {'net': '192.168.3.0', 'subnet': '26'},
}
assert result == expected
From there it's pretty simple. I'd posit that you probably want a tuple instead of a list, since the order of these values seems to matter (sorting them isn't just bad, it's incorrect).
final_result = {k: (v['net'], v['subnet'], k.title()) for k, v in result.items()}
# the values are tuples, so the expected values must be tuples too
expected = {
    'transit01': ('192.168.1.0', '26', 'Transit01'),
    'transit02': ('192.168.2.0', '26', 'Transit02'),
    'transit03': ('192.168.3.0', '26', 'Transit03'),
}
assert final_result == expected
Use collections.defaultdict
Ex.
from collections import defaultdict
data = [{'transit01_net': '192.168.1.0',
         'transit01_subnet': '26',
         'transit02_net': '192.168.2.0',
         'transit02_subnet': '26',
         'transit03_net': '192.168.3.0',
         'transit03_subnet': '26',
         }]
output = defaultdict(list)
temp = 1
for x in data[0]:
    key = x.split("_")[0]
    output[key].append(data[0][x])
    sub_key = "transit0{}_subnet".format(temp)
    if x == sub_key:
        output[key].append(key.capitalize())
        temp += 1
print(dict(output))
O/P
{'transit01': ['192.168.1.0', '26', 'Transit01'], 'transit02': ['192.168.2.0', '26', 'Transit02'], 'transit03': ['192.168.3.0', '26', 'Transit03']}
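The counter above relies on the keys arriving in exactly that order and on there being fewer than ten transits. A sketch that avoids both assumptions groups by prefix first and builds the lists afterwards. It assumes every key has the form `<name>_net` or `<name>_subnet`; the intermediate name `grouped` is illustrative.

```python
from collections import defaultdict

data = [{'transit01_net': '192.168.1.0',
         'transit01_subnet': '26',
         'transit02_net': '192.168.2.0',
         'transit02_subnet': '26',
         'transit03_net': '192.168.3.0',
         'transit03_subnet': '26'}]

grouped = defaultdict(dict)
for record in data:
    for key, value in record.items():
        # rsplit with maxsplit=1 splits on the last underscore only,
        # so names that contain underscores stay intact.
        name, field = key.rsplit('_', 1)
        grouped[name][field] = value

# Build the lists in a fixed order, independent of the input key order.
output = {
    name: [fields['net'], fields['subnet'], name.capitalize()]
    for name, fields in grouped.items()
}
print(output)
```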
