How to iterate over an XML file and sum a specific field

I want to iterate over an XML file and get the sum of the field "PremieTot" (marked in the XML below):
<?xml version="1.0" encoding="iso-8859-1" ?>
<Bericht Version="1.0" xmlns="http://www.test.nl/test/2022/01">
<Bericht>
<IdBer>1111</IdBer>
<IdLcr>2323</IdLcr>
<NmLcr>Test Company</NmLcr>
</Bericht>
<AdministratieveEenheid>
<LhNr>3434</LhNr>
<NmIP>Test Company</NmIP>
<TvkCd>MND</TvkCd>
<TijdvakAangifte>
<DatAanvTv>2022-01-01</DatAanvTv>
<DatEindTv>2022-01-31</DatEindTv>
<VolledigeAangifte>
<CollectieveAangifte>
<TotaalRegelingen>
<RelNrAansl>3434</RelNrAansl>
</TotaalRegelingen>
<TotaalRegelingen>
<RelNrAansl>3434</RelNrAansl>
</TotaalRegelingen>
</CollectieveAangifte>
<InkomstenverhoudingInitieel>
<NumIV>1</NumIV>
<DatAanv>2020-01-01</DatAanv>
<PersNr>2364</PersNr>
<RegelingGegevens>
<PremieTot>0.52</PremieTot> //I want to sum this field
</RegelingGegevens>
</InkomstenverhoudingInitieel>
<InkomstenverhoudingInitieel>
<NumIV>1</NumIV>
<DatAanv>2020-07-01</DatAanv>
<PersNr>2365</PersNr>
<RegelingGegevens>
<PremieTot>0.66</PremieTot> //I want to sum this field
<AantVerlUPens>29.12</AantVerlUPens>
</RegelingGegevens>
</InkomstenverhoudingInitieel>
</VolledigeAangifte>
</TijdvakAangifte>
</AdministratieveEenheid>
</Bericht>
I am trying to parse the XML file into a dict with xmldict, but for some reason I can't get the value "PremieTot":
info_dict = xml_dict["PensioenAangifte"]["AdministratieveEenheid"]["TijdvakAangifte"]["VolledigeAangifte"]
premieTotal = [xml_data["RegelingGegevens"]["PremieTot"] for xml_data in info_dict]

Quite easy with ElementTree:
from xml.etree import ElementTree as ET
et = ET.fromstring(xml)
result = sum(
    float(el.text)
    for el in et.findall('.//{*}PremieTot')
)
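A self-contained sketch of the same idea, assuming Python 3.8 or later (the `{*}` namespace wildcard in `findall` was added in 3.8). The trimmed sample document and the switch from `float` to `Decimal` are illustrative choices, not part of the original answer; `Decimal` simply avoids binary rounding noise when adding amounts.

```python
from decimal import Decimal
from xml.etree import ElementTree as ET

# Trimmed-down sample with the same default namespace as the question.
xml = """<Bericht xmlns="http://www.test.nl/test/2022/01">
  <InkomstenverhoudingInitieel>
    <RegelingGegevens><PremieTot>0.52</PremieTot></RegelingGegevens>
  </InkomstenverhoudingInitieel>
  <InkomstenverhoudingInitieel>
    <RegelingGegevens><PremieTot>0.66</PremieTot></RegelingGegevens>
  </InkomstenverhoudingInitieel>
</Bericht>"""

root = ET.fromstring(xml)

# '{*}PremieTot' matches PremieTot elements in any namespace (Python 3.8+).
total = sum(
    (Decimal(el.text) for el in root.findall(".//{*}PremieTot")),
    Decimal("0"),
)
print(total)  # 1.18
```

On older Python versions the full tag name `{http://www.test.nl/test/2022/01}PremieTot` should work in place of the wildcard.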

Related

How to add subelements and their children in XML

I have an XML file which has subelements:
.......
.......
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
.....
.....
I want to add a new subelement <FeatureEntitlementDetail> with all its children (UserName, FeatureList, FeatureDetail, FeatureId). I tried the SubElement function, but it only adds <FeatureEntitlementDetail />. The code I used was:
import xml.etree.ElementTree as ET
filename = "XYZ.xml"
xmlTree = ET.parse(filename)
root = xmlTree.getroot()
for element in root.iter('EnabledFeatureListForUsers'):
    ET.SubElement(element,"FeatureEntitlementDetail")
Any help is appreciated.

See below
import xml.etree.ElementTree as ET
xml = """
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
"""
root = ET.fromstring(xml)
fed = ET.SubElement(root,'FeatureEntitlementDetail')
un = ET.SubElement(fed,'UserName')
un.text = 'abc.zz.net'
fl = ET.SubElement(fed,'FeatureList')
df = ET.SubElement(fl,'FeatureDetail')
fi = ET.SubElement(df,'FeatureId')
fi.text = 'Z'
ET.dump(root)
Output (line breaks added for readability; ET.dump does not print an XML declaration):
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
<FeatureEntitlementDetail>
<UserName>abc.zz.net</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>Z</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
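If several entries have to be added, the chain of `SubElement` calls can be wrapped in a small function. This is only a sketch: `add_feature_entitlement` is a hypothetical helper name, and `ET.indent` assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def add_feature_entitlement(parent, user_name, feature_id):
    """Append a FeatureEntitlementDetail subtree to parent and return it."""
    fed = ET.SubElement(parent, "FeatureEntitlementDetail")
    ET.SubElement(fed, "UserName").text = user_name
    feature_list = ET.SubElement(fed, "FeatureList")
    feature_detail = ET.SubElement(feature_list, "FeatureDetail")
    ET.SubElement(feature_detail, "FeatureId").text = feature_id
    return fed


root = ET.fromstring(
    "<EnabledFeatureListForUsers>"
    "<FeatureEntitlementDetail><UserName>xyz#xyz.com</UserName>"
    "<FeatureList><FeatureDetail><FeatureId>X</FeatureId></FeatureDetail></FeatureList>"
    "</FeatureEntitlementDetail>"
    "</EnabledFeatureListForUsers>"
)
add_feature_entitlement(root, "abc.zz.net", "Z")

ET.indent(root)  # pretty-print in place; available from Python 3.9
print(ET.tostring(root, encoding="unicode"))
```

When the tree was loaded with `ET.parse(filename)`, calling `xmlTree.write(filename)` afterwards saves the change back to disk.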

How to format attributes, prefixes, and tags using xml.etree.ElementTree in Python

I'm trying to create a python script that will create a schema to then fill data based on an existing reference.
This is what I need to create:
<srp:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
This is what I have:
from xml.etree.ElementTree import *
from xml.dom import minidom
def prettify(elem):
    rough_string = tostring(elem, "utf-8")
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=" ")
ns = { "SOAP-ENV": "http://www.w3.org/2003/05/soap-envelope",
       "SOAP-ENC": "http://www.w3.org/2003/05/soap-encoding",
       "xsi": "http://www.w3.org/2001/XMLSchema-instance",
       "srp": "http://www.-redacted-standards.org/Schemas/MSRP.xsd"}
def gen():
    root = Element(QName(ns["xsi"],'root'))
    print(prettify(root))
gen()
which gives me:
<xsi:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
How do I fix it so that the beginning of the tag matches?

The exact result that you ask for is incomplete, but with a few edits to the gen() function, it is possible to generate well-formed output.
The root element should be bound to the http://www.-redacted-standards.org/Schemas/MSRP.xsd namespace (srp prefix). In order to generate the xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declaration, the namespace must be used in the XML document.
def gen():
    root = Element(QName(ns["srp"], 'root'))
    root.set(QName(ns["xsi"], "schemaLocation"), "whatever") # Add xsi:schemaLocation attribute
    register_namespace("srp", ns["srp"]) # Needed to get 'srp' instead of 'ns0'
    print(prettify(root))
Result (linebreaks added for readability):
<?xml version="1.0" ?>
<srp:root xmlns:srp="http://www.-redacted-standards.org/Schemas/MSRP.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="whatever"/>
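The same idea as a standalone sketch without minidom. The namespace URI below is a placeholder chosen for illustration (the real one is redacted in the question), and `"whatever"` is kept as the dummy attribute value.

```python
from xml.etree.ElementTree import Element, QName, register_namespace, tostring

# Placeholder URI for illustration only; substitute the real schema namespace.
SRP_NS = "http://example.com/Schemas/MSRP.xsd"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Registration is global and has to happen before the tree is serialized,
# otherwise ElementTree falls back to generated prefixes such as ns0.
register_namespace("srp", SRP_NS)

root = Element(QName(SRP_NS, "root"))
# Using the xsi namespace on an attribute is what makes xmlns:xsi appear.
root.set(QName(XSI_NS, "schemaLocation"), "whatever")

xml_text = tostring(root, encoding="unicode")
print(xml_text)
```

ElementTree only declares namespaces that are actually used by an element or attribute, which is why an unused `xmlns:xsi` cannot be produced this way.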

How to get the content of specific grandchild from xml file through python

Hi I am very new to python programming. I have an xml file of structure:
<?xml version="1.0" encoding="UTF-8"?>
<LidcReadMessage xsi:schemaLocation="http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.nih.gov" uid="1.3.6.1.4.1.14519.5.2.1.6279.6001.1307390687803.0">
<ResponseHeader>
<Version>1.8.1</Version>
<MessageId>-421198203</MessageId>
<DateRequest>2007-11-01</DateRequest>
<TimeRequest>12:30:44</TimeRequest>
<RequestingSite>removed</RequestingSite>
<ServicingSite>removed</ServicingSite>
<TaskDescription>Second unblinded read</TaskDescription>
<CtImageFile>removed</CtImageFile>
<SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192</SeriesInstanceUid>
<DateService>2008-08-18</DateService>
<TimeService>02:05:51</TimeService>
<ResponseDescription>1 - Reading complete</ResponseDescription>
<StudyInstanceUID>1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178</StudyInstanceUID>
</ResponseHeader>
<readingSession>
<annotationVersion>3.12</annotationVersion>
<servicingRadiologistID>540461523</servicingRadiologistID>
<unblindedReadNodule>
<noduleID>Nodule 001</noduleID>
<characteristics>
<subtlety>5</subtlety>
<internalStructure>1</internalStructure>
<calcification>6</calcification>
<sphericity>3</sphericity>
<margin>3</margin>
<lobulation>3</lobulation>
<spiculation>4</spiculation>
<texture>5</texture>
<malignancy>5</malignancy>
</characteristics>
<roi>
<imageZposition>-125.000000 </imageZposition>
<imageSOP_UID>1.3.6.1.4.1.14519.5.2.1.6279.6001.110383487652933113465768208719</imageSOP_UID>
......
There are four <readingSession> elements, each of which contains multiple <roi> elements. Each <roi> contains an <imageSOP_UID>. I need to extract the information in <imageSOP_UID> from all of them.
Right now I am doing this:
import xml.etree.ElementTree as ET
tree = ET.parse('069.xml')
root = tree.getroot()
#lst = []
for readingsession in root.iter('readingSession'):
    for roi in readingsession.findall('roi'):
        id = roi.findtext('imageSOP_UID')
        print(id)
but it outputs only this:
Process finished with exit code 0.
Can anyone help?

The real problem was the namespace. I had tried with and without it, but the code in the question didn't work. The following does:
import pydicom
import xml.etree.ElementTree as ET

ds = pydicom.dcmread("000071.dcm")
uid = ds.SOPInstanceUID
tree = ET.parse("069.xml")
root = tree.getroot()
for child in root:
    print(child.tag)
    if child.tag == '{http://www.nih.gov}readingSession':
        read = child.find('{http://www.nih.gov}unblindedReadNodule')
        if read is not None:
            nodule_id = read.find('{http://www.nih.gov}noduleID').text
            xml_uid = read.find('{http://www.nih.gov}roi').find('{http://www.nih.gov}imageSOP_UID').text
            if xml_uid == uid:
                print(xml_uid, "=", uid)
                roi = read.find('{http://www.nih.gov}roi')
                print(roi)
This works completely fine: it takes a UID from a DICOM image of the LIDC/IDRI dataset and then finds the same UID in the XML file to get its region of interest.
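A more compact variant passes a prefix-to-URI mapping to `findall` and `findtext` instead of spelling out `{http://www.nih.gov}` on every tag. This is a sketch on a trimmed sample: the prefix `n` is arbitrary and the short UIDs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample with the same default namespace as the question's file.
xml = """<LidcReadMessage xmlns="http://www.nih.gov">
  <readingSession>
    <unblindedReadNodule>
      <noduleID>Nodule 001</noduleID>
      <roi><imageSOP_UID>1.2.3</imageSOP_UID></roi>
      <roi><imageSOP_UID>1.2.4</imageSOP_UID></roi>
    </unblindedReadNodule>
  </readingSession>
</LidcReadMessage>"""

ns = {"n": "http://www.nih.gov"}  # the prefix "n" is arbitrary
root = ET.fromstring(xml)

# roi is not a direct child of readingSession, so search descendants with './/'.
uids = [
    roi.findtext("n:imageSOP_UID", namespaces=ns)
    for session in root.findall("n:readingSession", ns)
    for roi in session.findall(".//n:roi", ns)
]
print(uids)  # ['1.2.3', '1.2.4']
```

This also explains why the question's loop printed nothing: without the namespace, `root.iter('readingSession')` matches no element, so the loop body never runs.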

Parsing complex Xml Python 3.4

I have the following xml :
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Suite>
<TestCase>
<TestCaseID>001</TestCaseID>
<TestCaseDescription>Hello</TestCaseDescription>
<TestSetup>
<Action>
<ActionCommand>gfdg</ActionCommand>
<TimeOut>dfgd</TimeOut>
<BamSymbol>gff</BamSymbol>
<Side>vfbgc</Side>
<PrimeBroker>fgfd</PrimeBroker>
<Size>fbcgc</Size>
<PMCode>fdgd</PMCode>
<Strategy>fdgf</Strategy>
<SubStrategy>fgf</SubStrategy>
<ActionLogEndPoint>fdgf</ActionLogEndPoint>
<IsActionResultLogged>fdgf</IsActionResultLogged>
<ValidationStep>
<IsValidated>fgdf</IsValidated>
<ValidationFormat>dfgf</ValidationFormat>
<ResponseEndpoint>gdf</ResponseEndpoint>
<ResponseParameterName>fdgfdg</ResponseParameterName>
<ResponseParameterValue>gff</ResponseParameterValue>
<ExpectedValue>fdgf</ExpectedValue>
<IsValidationResultLogged>gdfgf</IsValidationResultLogged>
<ValidationLogEndpoint>fdgf</ValidationLogEndpoint>
</ValidationStep>
</Action>
<Action>
<ActionCommand>New Order</ActionCommand>
<TimeOut>fdgf</TimeOut>
<BamSymbol>fdg</BamSymbol>
<Side>C(COVER)</Side>
<PrimeBroker>CSPB</PrimeBroker>
<Size>fdgd</Size>
<PMCode>GREE</PMCode>
<Strategy>Generalist</Strategy>
<SubStrategy>USLC</SubStrategy>
<ActionLogEndPoint>gfbhgf</ActionLogEndPoint>
<IsActionResultLogged>fdgf</IsActionResultLogged>
<ValidationStep>
<IsValidated>fdgd</IsValidated>
<ValidationFormat>dfgfd</ValidationFormat>
<ResponseEndpoint>dfgf</ResponseEndpoint>
<ResponseParameterName>fdgfd</ResponseParameterName>
<ResponseParameterValue>dfgf</ResponseParameterValue>
<ExpectedValue>fdg</ExpectedValue>
<IsValidationResultLogged>fdgdf</IsValidationResultLogged>
<ValidationLogEndpoint>fdgfd</ValidationLogEndpoint>
</ValidationStep>
</Action>
</TestSetup>
</TestCase>
</Suite>
Based on the ActionCommand, I select one of the Action blocks. The issue is that I cannot get the nested parent tag (ValidationStep) and all its child tags. Can anyone help?
My code:
for testSetup4 in root.findall(".TestCase/TestSetup/Action"):
    if testSetup4.find('ActionCommand').text == "gfdg":
        for c1 in testSetup4:
            t2.append(c1.tag)
            v2.append(c1.text)
        for k,v in zip(t2, v2):
            test_case[k] = v
I am not able to get ValidationStep (the nested parent) and its corresponding tags.

Simply add another loop to iterate through the <ValidationStep> node and its children. Also, you do not need the two other lists, as you can update a dictionary during the parsing loop:
import xml.etree.ElementTree as et
dom = et.parse('Input.xml')
root = dom.getroot()
test_case = {}
for testSetup4 in root.findall(".TestCase/TestSetup/Action"):
    if testSetup4.find('ActionCommand').text == "gfdg":
        for c1 in testSetup4:
            test_case[c1.tag]= c1.text
        for vd in testSetup4.findall("./ValidationStep/*"):
            test_case[vd.tag]= vd.text
Alternatively, use the double slash operator to search all descendants (children and grandchildren) of the <Action> element:
for testSetup4 in root.findall(".TestCase/TestSetup/Action"):
    if testSetup4.find('ActionCommand').text == "gfdg":
        for c1 in testSetup4.findall(".//*"):
            test_case[c1.tag]= c1.text
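Both variants above produce a flat dictionary, in which the ValidationStep entry itself holds only whitespace text. If the nesting should be preserved, a small recursive function is one option. This is a sketch on a shortened sample; `to_dict` is a hypothetical helper, and it assumes tag names do not repeat within one parent.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's document.
xml = """<Suite><TestCase><TestSetup>
  <Action>
    <ActionCommand>gfdg</ActionCommand>
    <TimeOut>dfgd</TimeOut>
    <ValidationStep>
      <IsValidated>fgdf</IsValidated>
      <ExpectedValue>fdgf</ExpectedValue>
    </ValidationStep>
  </Action>
</TestSetup></TestCase></Suite>"""


def to_dict(elem):
    """Leaf children map to their text; children with children become nested dicts."""
    return {
        child.tag: to_dict(child) if len(child) else child.text
        for child in elem
    }


root = ET.fromstring(xml)
test_case = {}
for action in root.findall("./TestCase/TestSetup/Action"):
    if action.findtext("ActionCommand") == "gfdg":
        test_case = to_dict(action)

print(test_case["ValidationStep"]["ExpectedValue"])  # fdgf
```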

Conditional Search in XML Python

I have an XML file Orders.xml (excerpt follows):
<?xml version="1.0"?>
<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersResult>
<Orders>
<Order>
<LatestShipDate>2015-06-02T18:29:59Z</LatestShipDate>
<OrderType>StandardOrder</OrderType>
<PurchaseDate>2015-05-31T03:58:30Z</PurchaseDate>
<AmazonOrderId>171-6355256-9594715</AmazonOrderId>
<LastUpdateDate>2015-06-01T04:18:58Z</LastUpdateDate>
<ShipServiceLevel>IN Std Domestic</ShipServiceLevel>
<NumberOfItemsShipped>0</NumberOfItemsShipped>
<OrderStatus>Canceled</OrderStatus>
<SalesChannel>Amazon.in</SalesChannel>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<IsPremiumOrder>false</IsPremiumOrder>
<EarliestShipDate>2015-05-31T18:30:00Z</EarliestShipDate>
<MarketplaceId>A21TJRUUN4KGV</MarketplaceId>
<FulfillmentChannel>MFN</FulfillmentChannel>
<IsPrime>false</IsPrime>
<ShipmentServiceLevelCategory>Standard</ShipmentServiceLevelCategory>
</Order>
<Order>
<LatestShipDate>2015-06-02T18:29:59Z</LatestShipDate>
<OrderType>StandardOrder</OrderType>
<PurchaseDate>2015-05-31T04:50:07Z</PurchaseDate>
<BuyerEmail>dr7h1rhy6457rng#marketplace.amazon.in</BuyerEmail>
<AmazonOrderId>403-5551715-2566754</AmazonOrderId>
<LastUpdateDate>2015-06-01T07:52:49Z</LastUpdateDate>
<ShipServiceLevel>IN Exp Dom 2</ShipServiceLevel>
<NumberOfItemsShipped>2</NumberOfItemsShipped>
<OrderStatus>Shipped</OrderStatus>
<SalesChannel>Amazon.in</SalesChannel>
<ShippedByAmazonTFM>false</ShippedByAmazonTFM>
<LatestDeliveryDate>2015-06-06T18:29:59Z</LatestDeliveryDate>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<BuyerName>Ajit Nair</BuyerName>
<EarliestDeliveryDate>2015-06-02T18:30:00Z</EarliestDeliveryDate>
<OrderTotal>
<CurrencyCode>INR</CurrencyCode>
<Amount>938.00</Amount>
</OrderTotal>
<IsPremiumOrder>false</IsPremiumOrder>
<EarliestShipDate>2015-05-31T18:30:00Z</EarliestShipDate>
<MarketplaceId>A21TJRUUN4KGV</MarketplaceId>
<FulfillmentChannel>MFN</FulfillmentChannel>
<TFMShipmentStatus>Delivered</TFMShipmentStatus>
<PaymentMethod>Other</PaymentMethod>
<ShippingAddress>
<StateOrRegion>MAHARASHTRA</StateOrRegion>
<City>THANE</City>
<Phone>9769994355</Phone>
<CountryCode>IN</CountryCode>
<PostalCode>400709</PostalCode>
<Name>Ajit Nair</Name>
<AddressLine1>C-25 / con-7 / Chandralok CHS</AddressLine1>
<AddressLine2>Sector-10 ,Koper khairne</AddressLine2>
</ShippingAddress>
<IsPrime>false</IsPrime>
<ShipmentServiceLevelCategory>Expedited</ShipmentServiceLevelCategory>
</Order>
</Orders>
<CreatedBefore>2015-06-08T06:45:22Z</CreatedBefore>
<NextToken>smN7fNREdZyaJqJYLDm0ZIfVkJJPpovRb7YcCAmB0tlUojdU4H46trQzazHyYVyLqBXdLk4iogxpJASl2BeRezElfc2tdWR3lK0FtvOjoEqUrelVme04kSJ0wMvlylZkWQWPqGlbsnPaEpJjLWtrc27Vm9nDvRdgFtvOhjiqTWA16vKmtecRgbuZIF9n45mtnrZ4AbBdBTdge/hBzh1HtoVw85GaTVKBVfeXMWcfhX25HmwX5IAmwKfxnqm3JqvZ0Rjw/YZARKQMcjl5+H0CsJGesRwkZOQCBLVDshZ93sFo8v4Do3XuodaFg8ZGJDSTcawcthgh/MGM4KOIYd79q7Aq3I/8b9+STDy5JVgPyI0jQ6ftKc7EcAIwpq2cHuPbP+HgZXNbc7qI4HDvHa5YloEDUrIQbaP8qbwRHLZm6VTmGvVwLKwj6AZ0GNanrGO6</NextToken>
</ListOrdersResult>
<ResponseMetadata>
<RequestId>f2b55344-d281-4bd3-b8b3-788be07b7656</RequestId>
</ResponseMetadata>
</ListOrdersResponse>
I am using a Python script to parse data from the XML file. I want two fields from it: AmazonOrderId and BuyerName. Some Order elements might not have a BuyerName. When I parse both fields individually, I get a list of 100 AmazonOrderId values and 70 BuyerName values.
I want an empty string instead of nothing, i.e. if an Order doesn't have a buyer name, I want to include '' in the list.
My Code:
from xml.etree import ElementTree
with open('orders.xml', 'rb') as f:
    tree = ElementTree.parse(f)
ns = {'d': 'https://mws.amazonservices.com/Orders/2013-09-01'}
oid = []  # order IDs
bn = []   # buyer names
for node in tree.findall('.//d:Order/d:AmazonOrderId', ns):
    oid.append(node.text)
for node in tree.findall('.//d:Order/d:BuyerName', ns):
    bn.append(node.text)
print(oid)
print(bn)

You can do it in a single loop using findtext(), specifying the default as an empty string:
for node in tree.findall('.//d:Order', namespaces=ns):
    oid.append(node.findtext("d:AmazonOrderId", default='', namespaces=ns))
    bn.append(node.findtext("d:BuyerName", default='', namespaces=ns))
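A runnable sketch of the single-loop approach on a shortened sample. Collecting one tuple per Order, rather than two parallel lists, is an illustrative choice that keeps each order ID paired with its buyer name.

```python
from xml.etree import ElementTree

# Shortened sample: the first order has no BuyerName.
xml = """<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
  <ListOrdersResult><Orders>
    <Order><AmazonOrderId>171-6355256-9594715</AmazonOrderId></Order>
    <Order>
      <AmazonOrderId>403-5551715-2566754</AmazonOrderId>
      <BuyerName>Ajit Nair</BuyerName>
    </Order>
  </Orders></ListOrdersResult>
</ListOrdersResponse>"""

ns = {"d": "https://mws.amazonservices.com/Orders/2013-09-01"}
root = ElementTree.fromstring(xml)

# findtext returns the default when the child element is missing.
orders = [
    (
        order.findtext("d:AmazonOrderId", default="", namespaces=ns),
        order.findtext("d:BuyerName", default="", namespaces=ns),
    )
    for order in root.findall(".//d:Order", ns)
]
print(orders)
```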
