I want to iterate over an XML file and get the sum of the field "PremieTot" (marked in the XML below):
<?xml version="1.0" encoding="iso-8859-1" ?>
<Bericht Version="1.0" xmlns="http://www.test.nl/test/2022/01">
<Bericht>
<IdBer>1111</IdBer>
<IdLcr>2323</IdLcr>
<NmLcr>Test Company</NmLcr>
</Bericht>
<AdministratieveEenheid>
<LhNr>3434</LhNr>
<NmIP>Test Company</NmIP>
<TvkCd>MND</TvkCd>
<TijdvakAangifte>
<DatAanvTv>2022-01-01</DatAanvTv>
<DatEindTv>2022-01-31</DatEindTv>
<VolledigeAangifte>
<CollectieveAangifte>
<TotaalRegelingen>
<RelNrAansl>3434</RelNrAansl>
</TotaalRegelingen>
<TotaalRegelingen>
<RelNrAansl>3434</RelNrAansl>
</TotaalRegelingen>
</CollectieveAangifte>
<InkomstenverhoudingInitieel>
<NumIV>1</NumIV>
<DatAanv>2020-01-01</DatAanv>
<PersNr>2364</PersNr>
<RegelingGegevens>
<PremieTot>0.52</PremieTot> <!-- I want to sum this field -->
</RegelingGegevens>
</InkomstenverhoudingInitieel>
<InkomstenverhoudingInitieel>
<NumIV>1</NumIV>
<DatAanv>2020-07-01</DatAanv>
<PersNr>2365</PersNr>
<RegelingGegevens>
<PremieTot>0.66</PremieTot> <!-- I want to sum this field -->
<AantVerlUPens>29.12</AantVerlUPens>
</RegelingGegevens>
</InkomstenverhoudingInitieel>
</VolledigeAangifte>
</TijdvakAangifte>
</AdministratieveEenheid>
</Bericht>
I'm trying it with xmltodict to parse the XML file into a dict, but for some reason I can't get the value of "PremieTot":
info_dict = xml_dict["PensioenAangifte"]["AdministratieveEenheid"]["TijdvakAangifte"]["VolledigeAangifte"]
premieTotal = [xml_data["RegelingGegevens"]["PremieTot"] for xml_data in info_dict]
Quite easy with ElementTree:
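from xml.etree import ElementTree as ET
# xml is the document as a string
et = ET.fromstring(xml)
# {*} matches any namespace (Python 3.8+), so the default xmlns does not get in the way
result = sum(
    float(el.text)
    for el in et.findall('.//{*}PremieTot')
)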
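On Python versions before 3.8 (no {*} wildcard), or if you want an exact decimal total instead of a float, a minimal sketch is to bind the default namespace to a prefix and sum with decimal.Decimal. It assumes the namespace URI from the sample's xmlns attribute; sum_premie_tot and the trimmed sample are hypothetical.

```python
from decimal import Decimal
from xml.etree import ElementTree as ET

# Namespace URI copied from the xmlns attribute of the sample document
NS = {"b": "http://www.test.nl/test/2022/01"}

def sum_premie_tot(xml_text):
    """Sum every PremieTot element anywhere in the tree as an exact Decimal."""
    root = ET.fromstring(xml_text)
    # './/b:PremieTot' searches all descendants; prefix 'b' is resolved via NS
    return sum(
        (Decimal(el.text) for el in root.findall(".//b:PremieTot", NS)),
        Decimal("0"),
    )

sample = """<Bericht Version="1.0" xmlns="http://www.test.nl/test/2022/01">
<InkomstenverhoudingInitieel>
<RegelingGegevens><PremieTot>0.52</PremieTot></RegelingGegevens>
</InkomstenverhoudingInitieel>
<InkomstenverhoudingInitieel>
<RegelingGegevens><PremieTot>0.66</PremieTot><AantVerlUPens>29.12</AantVerlUPens></RegelingGegevens>
</InkomstenverhoudingInitieel>
</Bericht>"""

print(sum_premie_tot(sample))  # 1.18
```

Passing Decimal("0") as the start value keeps the result a Decimal even when no PremieTot elements are found.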
I have an XML file with these subelements:
.......
.......
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
.....
.....
I want to add a new FeatureEntitlementDetail subelement with all of its children (UserName, FeatureList, FeatureDetail, FeatureId). I tried the SubElement function, but it only adds an empty <FeatureEntitlementDetail />. The code I used was:
import xml.etree.ElementTree as ET
filename = "XYZ.xml"
xmlTree = ET.parse(filename)
root = xmlTree.getroot()
for element in root.iter('EnabledFeatureListForUsers'):
    ET.SubElement(element, "FeatureEntitlementDetail")
Any help is appreciated.
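See below. Each child element needs its own SubElement call on its parent, and leaf values are set through the .text attribute: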
import xml.etree.ElementTree as ET
xml = """
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
"""
root = ET.fromstring(xml)
fed = ET.SubElement(root,'FeatureEntitlementDetail')
un = ET.SubElement(fed,'UserName')
un.text = 'abc.zz.net'
fl = ET.SubElement(fed,'FeatureList')
df = ET.SubElement(fl,'FeatureDetail')
fi = ET.SubElement(df,'FeatureId')
fi.text = 'Z'
ET.dump(root)
Output (indented here for readability; ET.dump adds no indentation and prints no XML declaration):
<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>X</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
<FeatureEntitlementDetail>
<UserName>abc.zz.net</UserName>
<FeatureList>
<FeatureDetail>
<FeatureId>Z</FeatureId>
</FeatureDetail>
</FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>
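When the same subtree has to be added repeatedly, or with several features, the SubElement calls can be wrapped in a small function. A sketch, where add_entitlement is a hypothetical helper and ET.indent requires Python 3.9+. The loop iterates over a list copy because the ElementTree documentation leaves the result undefined if the tree is modified during iter().

```python
import xml.etree.ElementTree as ET

def add_entitlement(parent, user_name, feature_ids):
    """Append a FeatureEntitlementDetail subtree to parent and return it."""
    detail = ET.SubElement(parent, "FeatureEntitlementDetail")
    ET.SubElement(detail, "UserName").text = user_name
    feature_list = ET.SubElement(detail, "FeatureList")
    for fid in feature_ids:
        # one FeatureDetail/FeatureId pair per feature
        feature_detail = ET.SubElement(feature_list, "FeatureDetail")
        ET.SubElement(feature_detail, "FeatureId").text = fid
    return detail

root = ET.fromstring("""<EnabledFeatureListForUsers>
<FeatureEntitlementDetail>
<UserName>xyz#xyz.com</UserName>
<FeatureList><FeatureDetail><FeatureId>X</FeatureId></FeatureDetail></FeatureList>
</FeatureEntitlementDetail>
</EnabledFeatureListForUsers>""")

# iter() also yields root itself; list() avoids mutating the tree mid-iteration
for element in list(root.iter("EnabledFeatureListForUsers")):
    add_entitlement(element, "abc.zz.net", ["Z", "Y"])

ET.indent(root)  # pretty-print in place (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

With a parsed file, as in the question, call xmlTree.write(filename) afterwards to save the change.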
I'm trying to write a Python script that creates an XML template, which I will then fill with data based on an existing reference.
This is what I need to create:
<srp:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
This is what I have:
from xml.etree.ElementTree import *
from xml.dom import minidom
def prettify(elem):
    rough_string = tostring(elem, "utf-8")
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent=" ")
ns = {"SOAP-ENV": "http://www.w3.org/2003/05/soap-envelope",
      "SOAP-ENC": "http://www.w3.org/2003/05/soap-encoding",
      "xsi": "http://www.w3.org/2001/XMLSchema-instance",
      "srp": "http://www.-redacted-standards.org/Schemas/MSRP.xsd"}
def gen():
    root = Element(QName(ns["xsi"], 'root'))
    print(prettify(root))
gen()
which gives me:
<xsi:root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
How do I fix it so that the opening tag matches?
The exact result that you ask for is incomplete, but with a few edits to the gen() function, it is possible to generate well-formed output.
The root element should be bound to the http://www.-redacted-standards.org/Schemas/MSRP.xsd namespace (srp prefix). In order to generate the xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declaration, the namespace must be used in the XML document.
def gen():
    root = Element(QName(ns["srp"], 'root'))
    root.set(QName(ns["xsi"], "schemaLocation"), "whatever")  # Add xsi:schemaLocation attribute
    register_namespace("srp", ns["srp"])  # Needed to get 'srp' instead of 'ns0'
    print(prettify(root))
Result (linebreaks added for readability):
<?xml version="1.0" ?>
<srp:root xmlns:srp="http://www.-redacted-standards.org/Schemas/MSRP.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="whatever"/>
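A sketch of the same idea without minidom, using module-qualified ElementTree calls: register both prefixes up front, build the root and a child in the srp namespace, and pretty-print with ET.indent (Python 3.9+). The child name item and its text are hypothetical placeholders for the data to be filled in later.

```python
import xml.etree.ElementTree as ET

SRP = "http://www.-redacted-standards.org/Schemas/MSRP.xsd"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Register prefixes before serializing so ElementTree emits srp:/xsi:
# instead of auto-generated ns0:/ns1:
ET.register_namespace("srp", SRP)
ET.register_namespace("xsi", XSI)

def gen():
    root = ET.Element(ET.QName(SRP, "root"))
    root.set(ET.QName(XSI, "schemaLocation"), "whatever")
    # Hypothetical child element standing in for the real data
    child = ET.SubElement(root, ET.QName(SRP, "item"))
    child.text = "example"
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")

print(gen())
```

ElementTree only declares a namespace on the root when some tag or attribute actually uses it, which is why the xsi declaration appears only once schemaLocation is set.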
Hi, I am very new to Python programming. I have an XML file with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<LidcReadMessage xsi:schemaLocation="http://www.nih.gov http://troll.rad.med.umich.edu/lidc/LidcReadMessage.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.nih.gov" uid="1.3.6.1.4.1.14519.5.2.1.6279.6001.1307390687803.0">
<ResponseHeader>
<Version>1.8.1</Version>
<MessageId>-421198203</MessageId>
<DateRequest>2007-11-01</DateRequest>
<TimeRequest>12:30:44</TimeRequest>
<RequestingSite>removed</RequestingSite>
<ServicingSite>removed</ServicingSite>
<TaskDescription>Second unblinded read</TaskDescription>
<CtImageFile>removed</CtImageFile>
<SeriesInstanceUid>1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192</SeriesInstanceUid>
<DateService>2008-08-18</DateService>
<TimeService>02:05:51</TimeService>
<ResponseDescription>1 - Reading complete</ResponseDescription>
<StudyInstanceUID>1.3.6.1.4.1.14519.5.2.1.6279.6001.298806137288633453246975630178</StudyInstanceUID>
</ResponseHeader>
<readingSession>
<annotationVersion>3.12</annotationVersion>
<servicingRadiologistID>540461523</servicingRadiologistID>
<unblindedReadNodule>
<noduleID>Nodule 001</noduleID>
<characteristics>
<subtlety>5</subtlety>
<internalStructure>1</internalStructure>
<calcification>6</calcification>
<sphericity>3</sphericity>
<margin>3</margin>
<lobulation>3</lobulation>
<spiculation>4</spiculation>
<texture>5</texture>
<malignancy>5</malignancy>
</characteristics>
<roi>
<imageZposition>-125.000000 </imageZposition>
<imageSOP_UID>1.3.6.1.4.1.14519.5.2.1.6279.6001.110383487652933113465768208719</imageSOP_UID>
......
There are four <readingSession> elements, each containing multiple <unblindedReadNodule> elements, and each nodule contains <roi> elements. I need to extract the <imageSOP_UID> value from all of them.
Right now I am doing this:
import xml.etree.ElementTree as ET
tree = ET.parse('069.xml')
root = tree.getroot()
#lst = []
for readingsession in root.iter('readingSession'):
    for roi in readingsession.findall('roi'):
        id = roi.findtext('imageSOP_UID')
        print(id)
but it only outputs this:
Process finished with exit code 0.
Can anyone help?
The real problem was the namespace. I had tried with and without it, but it didn't work with my earlier code. This version handles it:
import xml.etree.ElementTree as ET
import pydicom

ds = pydicom.dcmread("000071.dcm")
uid = ds.SOPInstanceUID
tree = ET.parse("069.xml")
root = tree.getroot()
for child in root:
    print(child.tag)
    if child.tag == '{http://www.nih.gov}readingSession':
        read = child.find('{http://www.nih.gov}unblindedReadNodule')
        if read is not None:
            nodule_id = read.find('{http://www.nih.gov}noduleID').text
            xml_uid = read.find('{http://www.nih.gov}roi').find('{http://www.nih.gov}imageSOP_UID').text
            if xml_uid == uid:
                print(xml_uid, "=", uid)
                roi = read.find('{http://www.nih.gov}roi')
                print(roi)
This works fine for reading the SOP Instance UID from a DICOM image of the LIDC-IDRI dataset and then finding the same UID in the XML file to get its region of interest.
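The code above only looks at the first nodule and first ROI of each session. To collect every imageSOP_UID from every reading session, a minimal sketch binds the http://www.nih.gov namespace to a prefix and walks readingSession, unblindedReadNodule, roi. The hierarchy is taken from the excerpt in the question; iter_roi_uids and the tiny sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <LidcReadMessage>, bound to a prefix for find()
NS = {"nih": "http://www.nih.gov"}

def iter_roi_uids(root):
    """Yield (noduleID, imageSOP_UID) for every <roi> in every <readingSession>."""
    for session in root.iterfind("nih:readingSession", NS):
        for nodule in session.iterfind("nih:unblindedReadNodule", NS):
            nodule_id = nodule.findtext("nih:noduleID", default="", namespaces=NS)
            for roi in nodule.iterfind("nih:roi", NS):
                yield nodule_id, roi.findtext("nih:imageSOP_UID", default="", namespaces=NS)

sample = """<LidcReadMessage xmlns="http://www.nih.gov">
<readingSession>
<unblindedReadNodule>
<noduleID>Nodule 001</noduleID>
<roi><imageSOP_UID>1.1</imageSOP_UID></roi>
<roi><imageSOP_UID>1.2</imageSOP_UID></roi>
</unblindedReadNodule>
</readingSession>
<readingSession>
<unblindedReadNodule>
<noduleID>Nodule 002</noduleID>
<roi><imageSOP_UID>2.1</imageSOP_UID></roi>
</unblindedReadNodule>
</readingSession>
</LidcReadMessage>"""

for nodule_id, sop_uid in iter_roi_uids(ET.fromstring(sample)):
    print(nodule_id, sop_uid)
```

To match a DICOM slice, each sop_uid can be compared with ds.SOPInstanceUID in the same way as the code above.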
I have the following XML:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Suite>
<TestCase>
<TestCaseID>001</TestCaseID>
<TestCaseDescription>Hello</TestCaseDescription>
<TestSetup>
<Action>
<ActionCommand>gfdg</ActionCommand>
<TimeOut>dfgd</TimeOut>
<BamSymbol>gff</BamSymbol>
<Side>vfbgc</Side>
<PrimeBroker>fgfd</PrimeBroker>
<Size>fbcgc</Size>
<PMCode>fdgd</PMCode>
<Strategy>fdgf</Strategy>
<SubStrategy>fgf</SubStrategy>
<ActionLogEndPoint>fdgf</ActionLogEndPoint>
<IsActionResultLogged>fdgf</IsActionResultLogged>
<ValidationStep>
<IsValidated>fgdf</IsValidated>
<ValidationFormat>dfgf</ValidationFormat>
<ResponseEndpoint>gdf</ResponseEndpoint>
<ResponseParameterName>fdgfdg</ResponseParameterName>
<ResponseParameterValue>gff</ResponseParameterValue>
<ExpectedValue>fdgf</ExpectedValue>
<IsValidationResultLogged>gdfgf</IsValidationResultLogged>
<ValidationLogEndpoint>fdgf</ValidationLogEndpoint>
</ValidationStep>
</Action>
<Action>
<ActionCommand>New Order</ActionCommand>
<TimeOut>fdgf</TimeOut>
<BamSymbol>fdg</BamSymbol>
<Side>C(COVER)</Side>
<PrimeBroker>CSPB</PrimeBroker>
<Size>fdgd</Size>
<PMCode>GREE</PMCode>
<Strategy>Generalist</Strategy>
<SubStrategy>USLC</SubStrategy>
<ActionLogEndPoint>gfbhgf</ActionLogEndPoint>
<IsActionResultLogged>fdgf</IsActionResultLogged>
<ValidationStep>
<IsValidated>fdgd</IsValidated>
<ValidationFormat>dfgfd</ValidationFormat>
<ResponseEndpoint>dfgf</ResponseEndpoint>
<ResponseParameterName>fdgfd</ResponseParameterName>
<ResponseParameterValue>dfgf</ResponseParameterValue>
<ExpectedValue>fdg</ExpectedValue>
<IsValidationResultLogged>fdgdf</IsValidationResultLogged>
<ValidationLogEndpoint>fdgfd</ValidationLogEndpoint>
</ValidationStep>
</Action>
</TestCase>
</Suite>
Based on the ActionCommand I select one of the <Action> blocks, but I cannot get the nested parent tag (ValidationStep) and all of its child tags. Can anyone help?
My code:
t2, v2, test_case = [], [], {}
for testSetup4 in root.findall("./TestCase/TestSetup/Action"):
    if testSetup4.find('ActionCommand').text == "gfdg":
        for c1 in testSetup4:
            t2.append(c1.tag)
            v2.append(c1.text)
for k, v in zip(t2, v2):
    test_case[k] = v
I am not able to get ValidationStep (sub parent) and its corresponding tags.
Simply add another loop to iterate through the <ValidationStep> node and its children. Also, you do not need the two other lists as you can update a dictionary during the parsing loop:
import xml.etree.ElementTree as et
dom = et.parse('Input.xml')
root = dom.getroot()
test_case = {}
for testSetup4 in root.findall("./TestCase/TestSetup/Action"):
    if testSetup4.find('ActionCommand').text == "gfdg":
        for c1 in testSetup4:
            test_case[c1.tag] = c1.text
        for vd in testSetup4.findall("./ValidationStep/*"):
            test_case[vd.tag] = vd.text
Alternatively, use the double slash operator to search for all children including grandchildren of <Action> element:
for testSetup4 in root.findall("./TestCase/TestSetup/Action"):
    if testSetup4.find('ActionCommand').text == "gfdg":
        for c1 in testSetup4.findall(".//*"):
            test_case[c1.tag] = c1.text
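If you would rather keep ValidationStep grouped than flatten its children into the same dict (a flat dict cannot tell which level a tag came from), one option is a small recursive converter plus an ElementPath [tag='text'] predicate to select the Action. element_to_dict is a hypothetical helper, the sample is trimmed from the question, and repeated sibling tags would still overwrite each other.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert the children of elem to a dict; nested parents become nested dicts."""
    result = {}
    for child in elem:
        if len(child):  # child has sub-elements, e.g. <ValidationStep>
            result[child.tag] = element_to_dict(child)
        else:           # leaf element: keep its text
            result[child.tag] = child.text
    return result

xml_text = """<Suite><TestCase><TestSetup>
<Action><ActionCommand>gfdg</ActionCommand><TimeOut>dfgd</TimeOut>
<ValidationStep><IsValidated>fgdf</IsValidated><ExpectedValue>fdgf</ExpectedValue></ValidationStep>
</Action>
<Action><ActionCommand>New Order</ActionCommand><TimeOut>fdgf</TimeOut></Action>
</TestSetup></TestCase></Suite>"""

root = ET.fromstring(xml_text)
# Predicate [ActionCommand='gfdg'] keeps only the Action whose child has that exact text
action = root.find("./TestCase/TestSetup/Action[ActionCommand='gfdg']")
test_case = element_to_dict(action) if action is not None else {}
print(test_case)
```

The predicate string is plain text, so a command containing a single quote would need a different lookup (for example the findtext comparison loop from the answer).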
I have an XML file, Orders.xml (excerpt follows):
<?xml version="1.0"?>
<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersResult>
<Orders>
<Order>
<LatestShipDate>2015-06-02T18:29:59Z</LatestShipDate>
<OrderType>StandardOrder</OrderType>
<PurchaseDate>2015-05-31T03:58:30Z</PurchaseDate>
<AmazonOrderId>171-6355256-9594715</AmazonOrderId>
<LastUpdateDate>2015-06-01T04:18:58Z</LastUpdateDate>
<ShipServiceLevel>IN Std Domestic</ShipServiceLevel>
<NumberOfItemsShipped>0</NumberOfItemsShipped>
<OrderStatus>Canceled</OrderStatus>
<SalesChannel>Amazon.in</SalesChannel>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<IsPremiumOrder>false</IsPremiumOrder>
<EarliestShipDate>2015-05-31T18:30:00Z</EarliestShipDate>
<MarketplaceId>A21TJRUUN4KGV</MarketplaceId>
<FulfillmentChannel>MFN</FulfillmentChannel>
<IsPrime>false</IsPrime>
<ShipmentServiceLevelCategory>Standard</ShipmentServiceLevelCategory>
</Order>
<Order>
<LatestShipDate>2015-06-02T18:29:59Z</LatestShipDate>
<OrderType>StandardOrder</OrderType>
<PurchaseDate>2015-05-31T04:50:07Z</PurchaseDate>
<BuyerEmail>dr7h1rhy6457rng#marketplace.amazon.in</BuyerEmail>
<AmazonOrderId>403-5551715-2566754</AmazonOrderId>
<LastUpdateDate>2015-06-01T07:52:49Z</LastUpdateDate>
<ShipServiceLevel>IN Exp Dom 2</ShipServiceLevel>
<NumberOfItemsShipped>2</NumberOfItemsShipped>
<OrderStatus>Shipped</OrderStatus>
<SalesChannel>Amazon.in</SalesChannel>
<ShippedByAmazonTFM>false</ShippedByAmazonTFM>
<LatestDeliveryDate>2015-06-06T18:29:59Z</LatestDeliveryDate>
<NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
<BuyerName>Ajit Nair</BuyerName>
<EarliestDeliveryDate>2015-06-02T18:30:00Z</EarliestDeliveryDate>
<OrderTotal>
<CurrencyCode>INR</CurrencyCode>
<Amount>938.00</Amount>
</OrderTotal>
<IsPremiumOrder>false</IsPremiumOrder>
<EarliestShipDate>2015-05-31T18:30:00Z</EarliestShipDate>
<MarketplaceId>A21TJRUUN4KGV</MarketplaceId>
<FulfillmentChannel>MFN</FulfillmentChannel>
<TFMShipmentStatus>Delivered</TFMShipmentStatus>
<PaymentMethod>Other</PaymentMethod>
<ShippingAddress>
<StateOrRegion>MAHARASHTRA</StateOrRegion>
<City>THANE</City>
<Phone>9769994355</Phone>
<CountryCode>IN</CountryCode>
<PostalCode>400709</PostalCode>
<Name>Ajit Nair</Name>
<AddressLine1>C-25 / con-7 / Chandralok CHS</AddressLine1>
<AddressLine2>Sector-10 ,Koper khairne</AddressLine2>
</ShippingAddress>
<IsPrime>false</IsPrime>
<ShipmentServiceLevelCategory>Expedited</ShipmentServiceLevelCategory>
</Order>
</Orders>
<CreatedBefore>2015-06-08T06:45:22Z</CreatedBefore>
<NextToken>smN7fNREdZyaJqJYLDm0ZIfVkJJPpovRb7YcCAmB0tlUojdU4H46trQzazHyYVyLqBXdLk4iogxpJASl2BeRezElfc2tdWR3lK0FtvOjoEqUrelVme04kSJ0wMvlylZkWQWPqGlbsnPaEpJjLWtrc27Vm9nDvRdgFtvOhjiqTWA16vKmtecRgbuZIF9n45mtnrZ4AbBdBTdge/hBzh1HtoVw85GaTVKBVfeXMWcfhX25HmwX5IAmwKfxnqm3JqvZ0Rjw/YZARKQMcjl5+H0CsJGesRwkZOQCBLVDshZ93sFo8v4Do3XuodaFg8ZGJDSTcawcthgh/MGM4KOIYd79q7Aq3I/8b9+STDy5JVgPyI0jQ6ftKc7EcAIwpq2cHuPbP+HgZXNbc7qI4HDvHa5YloEDUrIQbaP8qbwRHLZm6VTmGvVwLKwj6AZ0GNanrGO6</NextToken>
</ListOrdersResult>
<ResponseMetadata>
<RequestId>f2b55344-d281-4bd3-b8b3-788be07b7656</RequestId>
</ResponseMetadata>
</ListOrdersResponse>
I am using a Python script to parse data from the XML file. I want two fields from it: AmazonOrderId and BuyerName. Some <Order> elements do not have a BuyerName. When I parse the two fields separately, I get a list of 100 AmazonOrderIds but only 70 BuyerNames.
I want an empty string instead of nothing, i.e. if an order has no buyer name, I want to include '' in its place.
My Code:
from xml.etree import ElementTree
with open('orders.xml', 'rb') as f:
    tree = ElementTree.parse(f)
ns = {'d': 'https://mws.amazonservices.com/Orders/2013-09-01'}
oid, bn = [], []
for node in tree.findall('.//d:Order/d:AmazonOrderId', ns):
    oid.append(node.text)
for node in tree.findall('.//d:Order/d:BuyerName', ns):
    bn.append(node.text)
print(oid)
print(bn)
You can do it in a single loop using findtext(), specifying an empty string as the default:
for node in tree.findall('.//d:Order', namespaces=ns):
    oid.append(node.findtext("d:AmazonOrderId", default='', namespaces=ns))
    bn.append(node.findtext("d:BuyerName", default='', namespaces=ns))
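Keeping each order's values together in one tuple avoids having to line up two separate lists afterwards. A sketch along the same lines as the answer; order_rows is a hypothetical name, and the Amount column is an illustrative extra taken from the excerpt to show that findtext paths can reach nested elements.

```python
from xml.etree import ElementTree as ET

NS = {"d": "https://mws.amazonservices.com/Orders/2013-09-01"}

def order_rows(root):
    """One (AmazonOrderId, BuyerName, Amount) tuple per <Order>; missing values become ''."""
    return [
        (
            order.findtext("d:AmazonOrderId", default="", namespaces=NS),
            order.findtext("d:BuyerName", default="", namespaces=NS),
            # a relative path can descend into <OrderTotal><Amount>
            order.findtext("d:OrderTotal/d:Amount", default="", namespaces=NS),
        )
        for order in root.iterfind(".//d:Order", NS)
    ]

sample = """<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
<ListOrdersResult><Orders>
<Order><AmazonOrderId>171-6355256-9594715</AmazonOrderId></Order>
<Order><AmazonOrderId>403-5551715-2566754</AmazonOrderId><BuyerName>Ajit Nair</BuyerName>
<OrderTotal><CurrencyCode>INR</CurrencyCode><Amount>938.00</Amount></OrderTotal></Order>
</Orders></ListOrdersResult>
</ListOrdersResponse>"""

for row in order_rows(ET.fromstring(sample)):
    print(row)
```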