I have an xml with a structure as follows:
<routes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sumo.dlr.de/xsd/routes_file.xsd">
<vType id="Bus" vClass="ignoring" guiShape="bus" color="cyan"/>
<vehicle id="1.0" type="Bus" depart="0.00">
<route edges="207358226 206878618#0 206878618#1 206878618#2 206878571 206878624#0 195427225#1 25450515 171767377#0 171767377#1 195427224#0 96336873 96336870"/>
</vehicle>
<vehicle id="2.0" type="Taxi" depart="0.00">
<route edges="172428613 -25301974#1 -25301974#0 172428582 -172428593 -25301969#5 -25301969#4 -165310768#1 -165310768#0 -45073854#4 -45073854#3 -45073854#0 -32932418#2 172436826#1 172436826#2 172436826#3 172436826#4 172436826#5 172405270#0 24629564 172405301#1 -172405301#1 -24629564 -172405270#0"/>
</vehicle>
<vehicle id="1.1" type="Bus" depart="0.00">
<route edges="207358226 206878618#0 206878618#1 206878618#2 206878571 206878624#0 195427225#1 25450515 171767377#0 171767377#1 195427224#0 96336873 96336870"/>
</vehicle>
There are multiple vType elements (e.g. Bus, Taxi, passenger car) and, for each vType, multiple vehicle instances (numbered 1.0, 1.1, etc.), each with a route child whose edges attribute lists the route edges.
I now want to extend the file so that vehicle elements get stop sub-elements that specify the stops, as follows:
<routes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sumo.dlr.de/xsd/routes_file.xsd">
<vType id="Bus" vClass="ignoring" guiShape="bus" color="cyan"/>
<vehicle id="1.0" type="Bus" depart="0.00">
<route edges="207358226 206878618#0 206878618#1 206878618#2 206878571 206878624#0 195427225#1 25450515 171767377#0 171767377#1 195427224#0 96336873 96336870"/>
<stop lane="207358226" endPos="10" duration="20"/>
<stop lane="206878618#0" endPos="10" duration="20"/>
<stop lane="206878618#1" endPos="10" duration="20"/>
..........
..........
</vehicle>
<vehicle id="2.0" type="Taxi" depart="0.00">
<route edges="172428613 -25301974#1 -25301974#0 172428582 -172428593 -25301969#5 -25301969#4 -165310768#1 -165310768#0 -45073854#4 -45073854#3 -45073854#0 -32932418#2 172436826#1 172436826#2 172436826#3 172436826#4 172436826#5 172405270#0 24629564 172405301#1 -172405301#1 -24629564 -172405270#0"/>
</vehicle>
<vehicle id="1.1" type="Bus" depart="0.00">
<route edges="207358226 206878618#0 206878618#1 206878618#2 206878571 206878624#0 195427225#1 25450515 171767377#0 171767377#1 195427224#0 96336873 96336870"/>
<stop lane="207358226" endPos="10" duration="20"/>
<stop lane="206878618#0" endPos="10" duration="20"/>
<stop lane="206878618#1" endPos="10" duration="20"/>
..........
..........
</vehicle>
My initial approach is to iterate over the XML and pick out the elements with tag vehicle whose type attribute is Bus. I then copy the edges into a list, edgesnew, and in a loop create sub-elements named stop under vehicle. The code is as follows:
from lxml import etree

parser = etree.XMLParser(encoding='utf-8', recover=True)
routesFileTree = etree.parse('kaiserslautern.rou1.xml', parser)
routesFileRoot = routesFileTree.getroot()
vehicle = routesFileRoot.find('vehicle')
route = etree.SubElement(vehicle, 'route')
for elem in routesFileRoot.iter(tag = 'vehicle'):
    if elem.attrib['type'] == 'Bus':
        for subelem in elem.iter(tag = 'route'):
            if subelem.attrib.get('edges'):
                edgesnew = subelem.attrib['edges'].split(' ')
                for edges in range(0,len(edgesnew),3):
                    stop = etree.SubElement(vehicle,'stop', lane = stops[edgesnew[edges]], duration = "30")
The program executes but my algorithm is wrong as it returns me the following output when I try to print
<routes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://sumo.dlr.de/xsd/routes_file.xsd">
<vType id="Bus" vClass="ignoring" guiShape="bus" color="cyan"/>
<vehicle id="1.0" type="Bus" depart="0.00">
<route edges="207358226 206878618#0 206878618#1 206878618#2 206878571 206878624#0 195427225#1 25450515 171767377#0 171767377#1 195427224#0 96336873 96336870"/>
<route><stop lane="207358226" duration="30" endPos="10"/><stop lane="206878618#0" duration="30" endPos="10"/><.........../></route></vehicle>
<vehicle id="2.0" type="Taxi" depart="0.00">
<route edges="172428613 -25301974#1 -25301974#0 172428582 -172428593 -25301969#5 -25301969#4 -165310768#1 -165310768#0 -45073854#4 -45073854#3 -45073854#0 -32932418#2 172436826#1 172436826#2 172436826#3 172436826#4 172436826#5 172405270#0 24629564 172405301#1 -172405301#1 -24629564 -172405270#0"/>
</vehicle>
<vehicle id="1.1" type="Bus" depart="0.00">
<route edges="207358226 206878618#0 206878618#1 206878618#2 206878571 206878624#0 195427225#1 25450515 171767377#0 171767377#1 195427224#0 96336873 96336870"/>
</vehicle>
There are multiple problems in the code. First, it creates the new sub-elements for only one instance of the vehicle. Second, it creates a new route element rather than appending to the existing XML. I have seen that I need to use element.append, but I can't figure out where and how.
Thanks in advance for the help
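Both symptoms described above trace back to using one fixed parent element. A minimal sketch of one possible fix follows, assuming every vehicle has a single route child and that each edge ID is used directly as the lane value, as in the desired output. It uses the standard library's xml.etree.ElementTree so it runs without extra packages; lxml.etree provides SubElement with the same calling convention. The helper name add_stops and the shortened sample data are illustrative, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the shape of the routes file from the question.
SAMPLE = """<routes>
    <vType id="Bus" vClass="ignoring" guiShape="bus" color="cyan"/>
    <vehicle id="1.0" type="Bus" depart="0.00">
        <route edges="207358226 206878618#0 206878618#1 206878618#2"/>
    </vehicle>
    <vehicle id="2.0" type="Taxi" depart="0.00">
        <route edges="172428613 -25301974#1"/>
    </vehicle>
    <vehicle id="1.1" type="Bus" depart="0.00">
        <route edges="207358226 206878618#0"/>
    </vehicle>
</routes>"""


def add_stops(root, vehicle_type="Bus", end_pos="10", duration="20"):
    """Append one <stop> per route edge to every vehicle of the given type."""
    # findall returns a list, so adding children while looping is safe.
    for vehicle in root.findall("vehicle"):
        if vehicle.get("type") != vehicle_type:
            continue
        route = vehicle.find("route")
        if route is None or not route.get("edges"):
            continue
        for edge in route.get("edges").split():
            # SubElement creates the element and appends it to the vehicle
            # currently being visited, so no extra <route> is created.
            ET.SubElement(vehicle, "stop", lane=edge,
                          endPos=end_pos, duration=duration)
    return root


root = add_stops(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

Because SubElement already appends the new element to its parent, a separate element.append call is not needed here. To stop only at every third edge, as the original loop does, the inner loop could iterate over a slice of the edge list with a step of 3.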
I want to create an XML file using python
like this:
<?xml version="1.0" encoding="utf-8"?>
<vehicle id="m0">
<timestep pos="2.3000" angle="11.1766" lane="-250709918#7_0" speed="0.0000" time="8.0"
</vehicle>
<vehicle id="m1">
<timestep pos="2.3000" angle="11.1766" lane="-250709918#7_0" speed="0.0000" time="8.0"
</vehicle>
........
my code:
import xml.dom.minidom

doc = xml.dom.minidom.Document()
root = doc.createElement('vehicle')
for veh in veh_dict:
    root.setAttribute('id', veh)
    doc.appendChild(root)
    for index, value in enumerate(veh_dict[veh]):
        nodeManager = doc.createElement('timestep')
        nodeManager.setAttribute('time', str(veh_dict[veh][index]['time']))
        nodeManager.setAttribute('angle', str(veh_dict[veh][index]['angle']))
        nodeManager.setAttribute('lane', str(veh_dict[veh][index]['lane']))
        nodeManager.setAttribute('pos', str(veh_dict[veh][index]['pos']))
        nodeManager.setAttribute('speed', str(veh_dict[veh][index]['speed']))
        nodeManager.setAttribute('type', str(veh_dict[veh][index]['type']))
        nodeManager.setAttribute('x', str(veh_dict[veh][index]['x']))
        nodeManager.setAttribute('y', str(veh_dict[veh][index]['y']))
        root.appendChild(nodeManager)
fp = open('Manager.xml', 'w')
doc.writexml(fp, indent='\t', addindent='\t', newl='\n', encoding="utf-8")
My output has all the data, but it is all written into a single 'vehicle' element,
like this:
<vehicle id="m2.9">
<timestep pos="2.3000" angle="11.1766" lane="-250709918#7_0" speed="0.0000" time="8.0" type="custom_moto" x="469.2605" y="5896.8761"/>
<timestep pos="3.3001" angle="12.9664" lane="-250709918#7_0" speed="1.0001" time="9.0" type="custom_moto" x="470.1134" y="5907.0132"/>
<timestep pos="6.4467" angle="12.2144" lane="-250709918#7_0" speed="3.1466" time="10.0" type="custom_moto" x="470.849" y="5900.3489"/>
<timestep pos="12.7147" angle="11.8696" lane="-250709918#7_0" speed="6.2681" time="11.0"
.......
Is the root always being overwritten?
How can I solve it?
Create a new 'vehicle' element inside the loop, and collect all of them under a single top-level element:
import xml.dom.minidom

doc = xml.dom.minidom.Document()
topElem = doc.createElement('vehicles')  # single document root
doc.appendChild(topElem)
for veh in veh_dict:
    # a fresh <vehicle> element for each key, instead of reusing one
    root = doc.createElement('vehicle')
    root.setAttribute('id', veh)
    for index, value in enumerate(veh_dict[veh]):
        nodeManager = doc.createElement('timestep')
        nodeManager.setAttribute('time', str(veh_dict[veh][index]['time']))
        nodeManager.setAttribute('angle', str(veh_dict[veh][index]['angle']))
        nodeManager.setAttribute('lane', str(veh_dict[veh][index]['lane']))
        nodeManager.setAttribute('pos', str(veh_dict[veh][index]['pos']))
        nodeManager.setAttribute('speed', str(veh_dict[veh][index]['speed']))
        nodeManager.setAttribute('type', str(veh_dict[veh][index]['type']))
        nodeManager.setAttribute('x', str(veh_dict[veh][index]['x']))
        nodeManager.setAttribute('y', str(veh_dict[veh][index]['y']))
        root.appendChild(nodeManager)
    topElem.appendChild(root)
with open('Manager.xml', 'w') as fp:
    doc.writexml(fp, indent='\t', addindent='\t', newl='\n', encoding="utf-8")
Consider using a top-level root above the <vehicle> elements, since a well-formed XML document requires a single root. Also, avoid the repetitive lines by iterating over the inner dictionary's keys. Finally, use a context manager (with) to write the built XML to file.
import xml.dom.minidom

# LIST OF DICTS
veh_dicts = [{'x': '469.2605', 'y': '5896.8761', 'time': 8.0, 'lane': '-250709918#7_0',
              'angle': '11.1766', 'pos': '2.3000', 'speed': '0.0000', 'type': 'custom_moto'},
             {'x': '470.1134', 'y': '5907.0132', 'time': 9.0, 'lane': '-250709918#7_0',
              'angle': '12.9664', 'pos': '3.3001', 'speed': '1.0001', 'type': 'custom_moto'}]
doc = xml.dom.minidom.Document()
root = doc.createElement('vehicles')  # TOP-LEVEL ROOT
doc.appendChild(root)
# ITERATE THROUGH EACH DICT
for i, veh in enumerate(veh_dicts, start=1):
    vehicleElem = doc.createElement('vehicle')
    vehicleElem.setAttribute('id', f'm{i}')  # USES F-STRING (Python 3.6+)
    root.appendChild(vehicleElem)
    nodeManager = doc.createElement('timestep')
    for k in veh.keys():
        nodeManager.setAttribute(k, str(veh[k]))
    vehicleElem.appendChild(nodeManager)
with open('MiniDomXMLBuild.xml', 'w') as fp:  # CONTEXT MANAGER (NO close() NEEDED)
    doc.writexml(fp, addindent='\t', newl='\n', encoding="utf-8")
Output
<?xml version="1.0" encoding="utf-8"?>
<vehicles>
<vehicle id="m1">
<timestep angle="11.1766" lane="-250709918#7_0" pos="2.3000" speed="0.0000" time="8.0" type="custom_moto" x="469.2605" y="5896.8761"/>
</vehicle>
<vehicle id="m2">
<timestep angle="12.9664" lane="-250709918#7_0" pos="3.3001" speed="1.0001" time="9.0" type="custom_moto" x="470.1134" y="5907.0132"/>
</vehicle>
</vehicles>
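The question's data appears to map each vehicle id to a list of timestep dicts, so one vehicle may need several timestep children. A rough sketch for that shape follows. The sample veh_dict contents and the helper name build_document are made up for illustration; only the element and attribute names come from the question.

```python
import xml.dom.minidom

# Assumed input shape: vehicle id -> list of per-timestep attribute dicts.
veh_dict = {
    'm0': [{'time': 8.0, 'pos': '2.3000', 'speed': '0.0000'},
           {'time': 9.0, 'pos': '3.3001', 'speed': '1.0001'}],
    'm1': [{'time': 8.0, 'pos': '5.1000', 'speed': '0.0000'}],
}


def build_document(vehicles):
    """Build <vehicles><vehicle id=...><timestep .../>...</vehicle></vehicles>."""
    doc = xml.dom.minidom.Document()
    root = doc.createElement('vehicles')  # single top-level root
    doc.appendChild(root)
    for veh_id, timesteps in vehicles.items():
        # A new <vehicle> element is created for every id.
        vehicle_elem = doc.createElement('vehicle')
        vehicle_elem.setAttribute('id', veh_id)
        root.appendChild(vehicle_elem)
        for step in timesteps:
            # One <timestep> per dict, with every key copied as an attribute.
            timestep_elem = doc.createElement('timestep')
            for key, value in step.items():
                timestep_elem.setAttribute(key, str(value))
            vehicle_elem.appendChild(timestep_elem)
    return doc


doc = build_document(veh_dict)
print(doc.toprettyxml(indent='\t'))
```

Returning the Document keeps building separate from writing, so the same function can feed either toprettyxml for inspection or writexml for the final file.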
I need to generate and sign an XML document. I'm using Windows 10.
I have a certificate in PFX format (also known as P12). I've successfully extracted the key and certificate from that file in order to sign the XML.
But now I need to know how to sign that XML. More precisely, how do I generate the content of these XML elements: ds:DigestValue, ds:SignatureValue and ds:X509Certificate?
According to the documentation of the entity that will verify the signed XML, these elements are:
ds:SignatureValue: contains the Base64-encoded signature. The signature is the result of a series of transformations applied to the binary data of the <ds:SignedInfo> element; <ds:SignatureValue> holds that binary value encoded in Base64.
ds:DigestMethod: defines the hash function to be used, through the Algorithm attribute.
ds:DigestValue: the hash value, in Base64.
ds:X509Certificate: according to the documentation, it is just the signature.
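To make two of these values concrete, here is a small sketch using only the standard library. It assumes the bytes passed in have already been transformed and canonicalized as the Reference requires, which this sketch does not do. In XML Signature, ds:X509Certificate normally carries the Base64-encoded certificate itself, which is the body of a PEM file without its header and footer lines. The function names and sample inputs are illustrative placeholders, and the PEM text is not a real certificate. ds:SignatureValue additionally needs an RSA signature over the canonicalized ds:SignedInfo, which is left to a signing library here.

```python
import base64
import hashlib


def digest_value(canonical_bytes):
    """Base64 text of the SHA-1 hash, the form ds:DigestValue carries."""
    digest = hashlib.sha1(canonical_bytes).digest()
    return base64.b64encode(digest).decode("ascii")


def x509_certificate_text(pem_text):
    """Return the Base64 body of a PEM certificate, header/footer removed."""
    body_lines = [
        line.strip()
        for line in pem_text.splitlines()
        if line.strip() and not line.startswith("-----")
    ]
    return "".join(body_lines)


# Placeholder inputs for illustration only (not a real certificate).
example_bytes = b"<Test></Test>"
example_pem = """-----BEGIN CERTIFICATE-----
QUJDREVG
R0hJSktM
-----END CERTIFICATE-----"""

print(digest_value(example_bytes))
print(x509_certificate_text(example_pem))
```

A SHA-1 digest is 20 bytes, so its Base64 form is always 28 characters long, which matches the length of the DigestValue in the sample signature.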
The final XML has other parts that hold different information, but I would like to understand how to construct the tags mentioned above:
<ds:Signature Id="IDSignSA">
<ds:SignedInfo>
<ds:CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
<ds:Reference URI="">
<ds:Transforms>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
</ds:Transforms>
<ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<ds:DigestValue>X4V0Z4K8CNcrud9vojN Iy/8hjkk=</ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue>U8bSaZUgOiD F3VDtwFRPiZ/6AQkSht7ezF8QVv+g5ELiLqkQHmRoL7VMLUtaHr+A9VhkIb5RZU5P EoRcvxI9v37zJg2WUe2wpKAY7AHm7kvvUHEs09K8Q+c0rqNaAAK1kvXPHbVFXnn0o2oLWI1bI/rS sWsFtjdmVtyLjkDEzf8=</ds:SignatureValue>
<ds:KeyInfo>
<ds:X509Data>
<ds:X509SubjectName>1.2.840.113549.1.9.1=#161a4253554c434140534f55544845524e504552552e434f4d2e5045,CN=Pedro Solano,OU=10261110983,O=SOLANO ALMAGRO PEDRO,L=TACNA,ST=TACNA,C=PE</ds:X509SubjectName>
<ds:X509Certificate>MIIESTCCAzGgAwIBAgI
KWOCRzgAAAAAAIjANBgkqhkiG9w0BAQUFADAnMRUwEwYKCZImiZPyLGQB
GRYFU1VOQVQxDjAMBgNVBAMTBVNVTkFUMB4XDTEwMTIyODE5NTExMFoXDTExMTIyODIwMDExMFow
gZUxCzAJBgNVBAYTAlBFMQ0wCwYDVQQIEwRMSU1BMQ0wCwYDVQQHEwRMSU1BMREwDwYDVQQKEwhT
T1VUSEVSTjEUMBIGA1UECxMLMjAxMDAxNDc1MTQxFDASBgNVBAMTC0JvcmlzIFN1bGNhMSkwJwYJ
KoZIhvcNAQkBFhpCU1VMQ0FAU09VVEhFUk5QRVJVLkNPTS5QRTCBnzANBgkqhkiG9w0BAQEFAAOB
jQAwgYkCgYEAtRtcpfBLzyajuEmYt4mVH8EE02KQiETsdKStUThVYM7g3Lkx5zq3SH5nLH00EKGC
tota6RR+V40sgIbnh+Nfs1SOQcAohNwRfWhho7sKNZFR971rFxj4cTKMEvpt8Dr98UYFkJhph6Wn
sniGM2tJDq9KJ52UXrlScMfBityx0AsCAwEAAaOCAYowggGGMA4GA1UdDwEB/wQEAwIE8DBEBgkq
hkiG9w0BCQ8ENzA1MA4GCCqGSIb3DQMCAgIAgDAOBggqhkiG9w0DBAICAIAwBwYFKw4DAgcwCgYI
KoZIhvcNAwcwHQYDVR0OBBYEFG/m6twbiRNzRINavjq+U0j/sZECMBMGA1UdJQQMMAoGCCsGAQUF
BwMCMB8GA1UdIwQYMBaAFN9kHQDqWONmozw3xdNSIMFW2t+7MFkGA1UdHwRSMFAwTqBMoEqGImh0
dHA6Ly9wY2IyMjYvQ2VydEVucm9sbC9TVU5BVC5jcmyGJGZpbGU6Ly9cXHBjYjIyNlxDZXJ0RW5y
b2xsXFNVTkFULmNybDB+BggrBgEFBQcBAQRyMHAwNQYIKwYBBQUHMAKGKWh0dHA6Ly9wY2IyMjYv
Q2VydEVucm9sbC9wY2IyMjZfU1VOQVQuY3J0MDcGCCsGAQUFBzAChitmaWxlOi8vXFxwY2IyMjZc
Q2VydEVucm9sbFxwY2IyMjZfU1VOQVQuY3J0MA0GCSqGSIb3DQEBBQUAA4IBAQBI6wJ/QmRpz3C3
rorBflOvA9DOa3GNiiB7rtPIjF4mPmtgfo2pK9gvnxmV2pST3ovfu0nbG2kpjzzaaelRjEodHvkc
M3abGsOE53wfxqQF5uf/jkzZA9hbLHtE1aLKBD0Mhzc6cvI072alnE6QU3RZ16ie9CYsHmMrs+sP
HMy8DJU5YrdnqHdSn2D3nhKBi4QfT/WURPOuo6DF4iWgrCyMf3eJgmGKSUN3At5fK4HSpfyURT0k
boaJKNBgQwy0HhGh5BLM7DsTi/KwfdUYkoFgrY71Pm23+ra+xTow1Vk9gj5NqrlpMY5gAVQXEIo1
++GxDtaK/5EiVKSqzJ6geIfz</ds:X509Certificate>
</ds:X509Data>
</ds:KeyInfo>
</ds:Signature>
<cac:Signature>
<cbc:ID>IDSignSA</cbc:ID>
<cac:SignatoryParty>
<cac:PartyIdentification>
<cbc:ID>10261110983</cbc:ID>
</cac:PartyIdentification>
<cac:PartyName>
<cbc:Name>SOLANO ALMAGRO PEDRO</cbc:Name>
</cac:PartyName>
</cac:SignatoryParty>
<cac:DigitalSignatureAttachment>
<cac:ExternalReference>
<cbc:URI>#signature</cbc:URI>
</cac:ExternalReference>
</cac:DigitalSignatureAttachment>
</cac:Signature>
I've found the signxml library. And it has an example:
from lxml import etree
from signxml import XMLSigner, XMLVerifier
data_to_sign = "<Test/>"
cert = open("example.pem").read()
key = open("example.key").read()
root = etree.fromstring(data_to_sign)
signed_root = XMLSigner().sign(root, key=key, cert=cert)
verified_data = XMLVerifier().verify(signed_root).signed_xml
But would this example generate the contents for the 3 tags?
UPDATE 1:
I've tried to sign my XML like this, but I'm getting an error:
from lxml import etree
from signxml import XMLSigner, XMLVerifier
passwd = 'caballo123'
cd = 'D:\\facturacion_electronica\\cetificado_prueba\\'
my_data_to_sign = '''
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:ccts="urn:un:unece:uncefact:documentation:2"
xmlns:ds="http://www.w3.org/2000/09/xmldsig#"
xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"
xmlns:qdt="urn:oasis:names:specification:ubl:schema:xsd:QualifiedDatatypes-2"
xmlns:sac="urn:sunat:names:specification:ubl:peru:schema:xsd:SunatAggregateComponents-1"
xmlns:udt="urn:un:unece:uncefact:data:specification:UnqualifiedDataTypesSchemaModule:2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ext:UBLExtensions>
<ext:UBLExtension>
<ext:ExtensionContent>
<sac:AdditionalInformation>
<sac:AdditionalMonetaryTotal>
<cbc:ID>1001</cbc:ID>
<cbc:PayableAmount currencyID="PEN">1407.29</cbc:PayableAmount>
</sac:AdditionalMonetaryTotal>
<sac:AdditionalMonetaryTotal>
<cbc:ID>1004</cbc:ID>
<cbc:PayableAmount currencyID="PEN">48.00</cbc:PayableAmount>
</sac:AdditionalMonetaryTotal>
<sac:AdditionalMonetaryTotal>
<cbc:ID>2005</cbc:ID>
<cbc:PayableAmount currencyID="PEN">74.07</cbc:PayableAmount>
</sac:AdditionalMonetaryTotal>
<sac:AdditionalProperty>
<cbc:ID>1000</cbc:ID>
<cbc:Value>SON MIL SEISCIENTOS SESENTA Y 60/100</cbc:Value>
</sac:AdditionalProperty>
</sac:AdditionalInformation>
</ext:ExtensionContent>
</ext:UBLExtension>
<ext:UBLExtension>
<ext:ExtensionContent>
<ds:Signature Id="signatureKG">
<ds:SignedInfo>
<ds:CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
<ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
<ds:Reference URI="">
<ds:Transforms>
<ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/>
</ds:Transforms>
<ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
<ds:DigestValue></ds:DigestValue>
</ds:Reference>
</ds:SignedInfo>
<ds:SignatureValue></ds:SignatureValue>
<ds:KeyInfo>
<ds:X509Data>
<ds:X509SubjectName>1.2.840.113549.1.9.1=#161a4253554c434140534f55544845524e504552552e434f4d2e5045,CN=Carlos Vega,OU=10200545523,O=Vega Poblete Carlos Enrique,L=CHICLAYO,ST=LAMBAYEQUE,C=PE</ds:X509SubjectName>
<ds:X509Certificate></ds:X509Certificate>
</ds:X509Data>
</ds:KeyInfo>
</ds:Signature>
</ext:ExtensionContent>
</ext:UBLExtension>
</ext:UBLExtensions>
<cbc:UBLVersionID>2.0</cbc:UBLVersionID>
<cbc:CustomizationID>1.0</cbc:CustomizationID>
<cbc:ID>BC01-3652</cbc:ID>
<cbc:IssueDate>2012-06-24</cbc:IssueDate>
<cbc:InvoiceTypeCode>03</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode>PEN</cbc:DocumentCurrencyCode>
<cac:Signature>
<cbc:ID>IDSignKG</cbc:ID>
<cac:SignatoryParty>
<cac:PartyIdentification>
<cbc:ID>10200545523</cbc:ID>
</cac:PartyIdentification>
<cac:PartyName>
<cbc:Name>VEGA POBLETE CARLOS ENRIQUE</cbc:Name>
</cac:PartyName>
</cac:SignatoryParty>
<cac:DigitalSignatureAttachment>
<cac:ExternalReference>
<cbc:URI>#SignatureKG</cbc:URI>
</cac:ExternalReference>
</cac:DigitalSignatureAttachment>
</cac:Signature>
<cac:AccountingSupplierParty>
<cbc:CustomerAssignedAccountID>10200545523</cbc:CustomerAssignedAccountID>
<cbc:AdditionalAccountID>6</cbc:AdditionalAccountID>
<cac:Party>
<cac:PartyName>
<cbc:Name>ELECTRODOMESTICOS CRUZ DE MOTUPE</cbc:Name>
</cac:PartyName>
<cac:PostalAddress>
<cbc:ID>140106</cbc:ID>
<cbc:StreetName>AV. LOS TALLANES #235</cbc:StreetName>
<cbc:CitySubdivisionName>URB. MIGUEL GRAU</cbc:CitySubdivisionName>
<cbc:CityName>CHICLAYO</cbc:CityName>
<cbc:CountrySubentity>LAMBAYEQUE</cbc:CountrySubentity>
<cbc:District>LA VICTORIA</cbc:District>
<cac:Country>
<cbc:IdentificationCode>PE</cbc:IdentificationCode>
</cac:Country>
</cac:PostalAddress>
<cac:PartyLegalEntity>
<cbc:RegistrationName>VEGA POBLETE CARLOS ENRIQUE</cbc:RegistrationName>
</cac:PartyLegalEntity>
</cac:Party>
</cac:AccountingSupplierParty>
<cac:AccountingCustomerParty>
<cbc:CustomerAssignedAccountID>00078647</cbc:CustomerAssignedAccountID>
<cbc:AdditionalAccountID>1</cbc:AdditionalAccountID>
<cac:Party>
<cac:PartyLegalEntity>
<cbc:RegistrationName>SOLEDAD ASUNCION CARRASCO PEREZ</cbc:RegistrationName>
</cac:PartyLegalEntity>
</cac:Party>
</cac:AccountingCustomerParty>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="PEN">253.31</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxAmount currencyID="PEN">253.31</cbc:TaxAmount>
<cac:TaxCategory>
<cac:TaxScheme>
<cbc:ID>1000</cbc:ID>
<cbc:Name>IGV</cbc:Name>
<cbc:TaxTypeCode>VAT</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:LegalMonetaryTotal>
<cbc:PayableAmount currencyID="PEN">1660.60</cbc:PayableAmount>
</cac:LegalMonetaryTotal>
<cac:InvoiceLine>
<cbc:ID>1</cbc:ID>
<cbc:InvoicedQuantity unitCode="NIU">1</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="PEN">845.76</cbc:LineExtensionAmount>
<cac:PricingReference>
<cac:AlternativeConditionPrice>
<cbc:PriceAmount currencyID="PEN">998.00</cbc:PriceAmount>
<cbc:PriceTypeCode>01</cbc:PriceTypeCode>
</cac:AlternativeConditionPrice>
</cac:PricingReference>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="PEN">152.24</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxAmount currencyID="PEN">152.24</cbc:TaxAmount>
<cac:TaxCategory>
<cbc:TaxExemptionReasonCode>10</cbc:TaxExemptionReasonCode>
<cac:TaxScheme>
<cbc:ID>1000</cbc:ID>
<cbc:Name>IGV</cbc:Name>
<cbc:TaxTypeCode>VAT</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:Item>
<cbc:Description>Refrigeradora marca "AXM" no frost de 200 ltrs</cbc:Description>
<cac:SellersItemIdentification>
<cbc:ID>REF564</cbc:ID>
</cac:SellersItemIdentification>
</cac:Item>
<cac:Price>
<cbc:PriceAmount currencyID="PEN">845.76</cbc:PriceAmount>
</cac:Price>
</cac:InvoiceLine>
<cac:InvoiceLine>
<cbc:ID>2</cbc:ID>
<cbc:InvoicedQuantity unitCode="NIU">1</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="PEN">635.59</cbc:LineExtensionAmount>
<cac:PricingReference>
<cac:AlternativeConditionPrice>
<cbc:PriceAmount currencyID="PEN">750.00</cbc:PriceAmount>
<cbc:PriceTypeCode>01</cbc:PriceTypeCode>
</cac:AlternativeConditionPrice>
</cac:PricingReference>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="PEN">114.41</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxAmount currencyID="PEN">114.41</cbc:TaxAmount>
<cac:TaxCategory>
<cbc:TaxExemptionReasonCode>10</cbc:TaxExemptionReasonCode>
<cac:TaxScheme>
<cbc:ID>1000</cbc:ID>
<cbc:Name>IGV</cbc:Name>
<cbc:TaxTypeCode>VAT</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:Item>
<cbc:Description>Cocina a gas GLP, marca "AXM" de 5 hornillas</cbc:Description>
<cac:SellersItemIdentification>
<cbc:ID>COC124</cbc:ID>
</cac:SellersItemIdentification>
</cac:Item>
<cac:Price>
<cbc:PriceAmount currencyID="PEN">635.59</cbc:PriceAmount>
</cac:Price>
</cac:InvoiceLine>
<cac:InvoiceLine>
<cbc:ID>3</cbc:ID>
<cbc:InvoicedQuantity unitCode="NIU">1</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="PEN">0.00</cbc:LineExtensionAmount>
<cac:PricingReference>
<cac:AlternativeConditionPrice>
<cbc:PriceAmount currencyID="PEN">0.00</cbc:PriceAmount>
<cbc:PriceTypeCode>01</cbc:PriceTypeCode>
</cac:AlternativeConditionPrice>
<cac:AlternativeConditionPrice>
<cbc:PriceAmount currencyID="PEN">4.80</cbc:PriceAmount>
<cbc:PriceTypeCode>02</cbc:PriceTypeCode>
</cac:AlternativeConditionPrice>
</cac:PricingReference>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="PEN">0.00</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxAmount currencyID="PEN">0.00</cbc:TaxAmount>
<cac:TaxCategory>
<cbc:TaxExemptionReasonCode>31</cbc:TaxExemptionReasonCode>
<cac:TaxScheme>
<cbc:ID>1000</cbc:ID>
<cbc:Name>IGV</cbc:Name>
<cbc:TaxTypeCode>VAT</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:Item>
<cbc:Description>Sixpack de gaseosa "Guaraná" de 400 ml.</cbc:Description>
<cac:SellersItemIdentification>
<cbc:ID>NOB012</cbc:ID>
</cac:SellersItemIdentification>
</cac:Item>
<cac:Price>
<cbc:PriceAmount currencyID="PEN">0.00</cbc:PriceAmount>
</cac:Price>
</cac:InvoiceLine>
</Invoice>
'''
data_to_sign = my_data_to_sign
cert = open("llama_cert.pem").read()
key = open("llama.key").read()
root = etree.fromstring(data_to_sign)
signed_root = XMLSigner().sign(root, key=key, cert=cert)
verified_data = XMLVerifier().verify(signed_root, x509_cert=cert).signed_xml
print(verified_data)
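The update does not show the error message, so the following is only a guess at one likely cause. The triple-quoted string starts with a newline, which puts the XML declaration on the second line, and XML parsers reject a declaration that is not at the very start of the input. lxml is also known to refuse a str that carries an encoding declaration, so passing bytes is the safer choice. The sketch below reproduces the first problem with the standard library parser on a made-up miniature document, then parses the stripped, encoded text.

```python
import xml.etree.ElementTree as ET

# Miniature stand-in for the invoice; note the newline before <?xml ...?>.
raw = '''
<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
<Invoice><Description>Guaraná</Description></Invoice>
'''

try:
    ET.fromstring(raw)
    first_error = None
except ET.ParseError as exc:
    # The declaration is not at the start of the document.
    first_error = exc

# Strip the surrounding whitespace and hand the parser bytes that match
# the encoding named in the declaration.
root = ET.fromstring(raw.strip().encode("ISO-8859-1"))

print(first_error)
print(root.findtext("Description"))
```

If this is indeed the cause, calling strip() on the string and encoding it before passing it to etree.fromstring in the signing script would be the equivalent change.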
I am trying to convert an XML file to CSV, but I am getting the error below.
I have tested the logic with another, simpler XML file and it seems to work. Below are the error, the XML file, the Python code, and my desired output. So far I have only added two of my columns. I have been looking at this for hours, so another set of eyes would be much appreciated. Thank you!
Error:
name = member.find('CaseName').tag AttributeError: 'NoneType' object has no attribute 'tag'
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<Nuix version="7.2.2" architecture="amd64">
<Export
startTime="Sun Feb 25 22:07:07 2018 (America/Chicago)"
endTime="Sun Feb 25 22:08:03 2018 (America/Chicago)"
exportDuration="55s"
processingDuration="55s">
<ExportConfiguration>
<LoadFiles>
</LoadFiles>
<MessageFormat>NATIVE</MessageFormat>
<ExportDirectory>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix\OCR cache directory</ExportDirectory>
<SeparateEmailAttachments>false</SeparateEmailAttachments>
<RegenerateNatives>false</RegenerateNatives>
<RegeneratePdfs>false</RegeneratePdfs>
<FindTopLevelItems>false</FindTopLevelItems>
<DescendantItems>false</DescendantItems>
<ExportContainers>false</ExportContainers>
<SortOrder>position</SortOrder>
<CaseName>Brooklyn</CaseName>
<CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>
<TimeZone>America/Chicago</TimeZone>
<Numbering>
<Strategy>Document ID numbering</Strategy>
<DocumentPagesInSameFolder>true</DocumentPagesInSameFolder>
<FamilyDocumentsInSameFolder>false</FamilyDocumentsInSameFolder>
<FirstItemNumber>DOC-000000001</FirstItemNumber>
</Numbering>
<Imaging>
<ImagingProfile>Default</ImagingProfile>
</Imaging>
<Naming>
<NativeNamingScheme>Page only</NativeNamingScheme>
<PdfNamingScheme>Page only</PdfNamingScheme>
</Naming>
<OcrSettings>
<Recognition>High Quality - Slow</Recognition>
<Deskewed/>
<UpdateTextStore append="true"/>
<Rotation>Auto</Rotation>
<Languages>English</Languages>
</OcrSettings>
<ResemblanceThreshold>0.85</ResemblanceThreshold>
</ExportConfiguration>
<ExportStatistics>
<SelectedItems>4</SelectedItems>
<ExcludedCount>0</ExcludedCount>
<TotalItemsToExport>4</TotalItemsToExport>
<FailedItems>0</FailedItems>
<DocumentNumbers>
<First></First>
<Last></Last>
</DocumentNumbers>
</ExportStatistics>
<ExportStageDetails>
<Stage
name="WORK_QUEUE"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="NATIVE"
successfulItems="4"
failedItems="0"
duration="33s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="STORED_EMAIL_FIXUP"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="PDF"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="BINARY_STORE"
successfulItems="0"
failedItems="0"
duration="0s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="OCR_INITIALISATION"
successfulItems="4"
failedItems="0"
duration="0s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="OCR"
successfulItems="4"
failedItems="0"
duration="17s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="POST_OCR"
successfulItems="4"
failedItems="0"
duration="0s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
<Stage
name="TEXT_REPLACEMENT"
successfulItems="4"
failedItems="0"
duration="1s">
<SlipsheetItemDetails>
</SlipsheetItemDetails>
<FailedItemDetails>
</FailedItemDetails>
</Stage>
</ExportStageDetails>
<FileStatistics>
<NativeFilesExported>3</NativeFilesExported>
<NativeFilesFromStore>0</NativeFilesFromStore>
<NativeFilesExportedInline>0</NativeFilesExportedInline>
<NativeFilesExportedParallel>3</NativeFilesExportedParallel>
<NativeFilesExportedParallelLocal>0</NativeFilesExportedParallelLocal>
<NativeFilesWithInvalidTimes>0</NativeFilesWithInvalidTimes>
<NativePlaceHolderFilesExported>0</NativePlaceHolderFilesExported>
<NativeFilesRegenerated>0</NativeFilesRegenerated>
<TextFilesExported>0</TextFilesExported>
<TextPlaceHolderFilesExported>0</TextPlaceHolderFilesExported>
<PdfFilesExported>0</PdfFilesExported>
<PdfFilesStamped>0</PdfFilesStamped>
<TiffFilesExported>0</TiffFilesExported>
<PdfDetails>
<PdfFilesFromStore>0</PdfFilesFromStore>
<PdfFilesRegenerated>0</PdfFilesRegenerated>
<PdfFilesExportedInline>0</PdfFilesExportedInline>
<PdfFilesExportedParallel>0</PdfFilesExportedParallel>
<PdfFilesExportedParallelLocal>0</PdfFilesExportedParallelLocal>
<UserImportedPdfs>0</UserImportedPdfs>
<PrintedPdfs>0</PrintedPdfs>
<UnformattedTextPdfs>0</UnformattedTextPdfs>
<ItemEncryptedPdfs>0</ItemEncryptedPdfs>
<UnprintableItemPdfs>0</UnprintableItemPdfs>
</PdfDetails>
</FileStatistics>
<PageCountStatistics>
<PdfPages>0</PdfPages>
<StampedPages>0</StampedPages>
<FailedStampedPages>0</FailedStampedPages>
<AveragePageCount>0.0</AveragePageCount>
</PageCountStatistics>
<ThroughputStatistics>
<NativeDocRate>0.0857363321997085</NativeDocRate>
<PdfDocRate>0.0</PdfDocRate>
<StampedDocRate>0.0</StampedDocRate>
<PdfPageRate>0.0</PdfPageRate>
<StampingPageRate>0.0</StampingPageRate>
</ThroughputStatistics>
<MimeTypeStatistics>
<MimeTypes>
<MimeType name="application/pdf" count="4" />
</MimeTypes>
</MimeTypeStatistics>
</Export>
</Nuix>
Python Code:
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('D:\\Users\\eferse\\Desktop\\XML_parsing\\summary-report.xml')
root = tree.getroot()
# open a file for writing
Resident_data = open('D:\\Users\\eferse\\Desktop\\XML_parsing\\Nuix Export XML Parse_PythonOutput.csv', 'w')
# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []
count = 0
for member in root.findall('Export'):
    resident = []
    address_list = []
    if count == 0:
        name = member.find('CaseName').tag
        resident_head.append(CaseName)
        location= member.find('CaseLocation').tag
        resident_head.append(CaseLocation)
        csvwriter.writerow(resident_head)
        count = count + 1
    name = member.find('CaseName').text
    resident.append(CaseName)
    location= member.find('CaseLocation').text
    resident.append(CaseLocation)
    csvwriter.writerow(resident)
Resident_data.close()
Desired Output:
Output
I have used indexing to access the child elements in question. This is sometimes easier when you know where the information is.
You can check what is at a given level with the following:
for child in root[0]:
    print(child.tag, child.attrib)
and you can navigate further by extending the index as far as you like: root[0][0][1], and so on.
Remember that the element you index is the parent, and find only searches its direct children. In your case root is Nuix, whose only child here is Export.
root[0] is Export, whose children include ExportConfiguration, and inside ExportConfiguration are the elements you are looking for, CaseName and CaseLocation.
If you do
for child in root[0][0]:
    print(child.tag, child.attrib)
this prints the tags CaseName etc., but you cannot use find at this level: you would be searching inside CaseName for CaseName.
Once you have the parent, it is easier to find the children.
This code works.
I have moved the empty lists out of the loop.
I have also changed the appended values, because they referred to names that were never defined as variables. I have also indented some appends that were outside the loop.
I have left the print statements in so you can see what is going on.
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('summary-report.xml')
root = tree.getroot()
# newline='' stops the csv module from writing blank lines on Windows
Resident_data = open('Parse_PythonOutput.csv', 'a', newline='')
# create the csv writer object
csvwriter = csv.writer(Resident_data)
resident_head = []
resident = []
address_list = []
count = 0
# root[0] is <Export>; its first child, <ExportConfiguration>, holds both fields
for member in root[0]:
    if count == 0:
        name = member.find('CaseName').tag
        print(name)
        resident_head.append(name)
        location = member.find('CaseLocation').tag
        print(location)
        resident_head.append(location)
        csvwriter.writerow(resident_head)
        count = count + 1
        name_text = member.find('CaseName').text
        print(name_text)
        resident.append(name_text)
        text_location = member.find('CaseLocation').text
        print(text_location)
        resident.append(text_location)
        print(resident)
        csvwriter.writerow(resident)
Resident_data.close()
The CSV data file looks like this:
CaseName,CaseLocation
Brooklyn,C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix
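As an alternative to positional indexing, the same two columns can be reached with a path expression, which keeps working if the order of the children changes. The sketch below is one way to do that under the assumption that CaseName and CaseLocation always sit under Export/ExportConfiguration, as in the file shown. The sample is a cut-down version of that file, and the names FIELDS and export_rows are illustrative.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Cut-down version of the report; a raw string keeps the backslashes intact.
SAMPLE = r"""<Nuix version="7.2.2" architecture="amd64">
  <Export>
    <ExportConfiguration>
      <CaseName>Brooklyn</CaseName>
      <CaseLocation>C:\Users\KK132WQ\Desktop\Brooklyn Case - Nuix</CaseLocation>
    </ExportConfiguration>
    <ExportStatistics><SelectedItems>4</SelectedItems></ExportStatistics>
  </Export>
</Nuix>"""

FIELDS = ["CaseName", "CaseLocation"]


def export_rows(root):
    """Return one row of field values per ExportConfiguration element."""
    rows = []
    for config in root.findall("Export/ExportConfiguration"):
        # findtext returns the default instead of failing on a missing tag.
        rows.append([config.findtext(field, default="") for field in FIELDS])
    return rows


root = ET.fromstring(SAMPLE)
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(FIELDS)
writer.writerows(export_rows(root))
print(buffer.getvalue())
```

Writing to an in-memory buffer is only for demonstration; opening a real file with newline='' and passing it to csv.writer would work the same way.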
This is my XML file:
<?xml version="1.0" ?>
<Items>
<Item>
<ASIN>3570102769</ASIN>
<DetailPageURL>http://www.amazon.de/Inside-IS-Tage-Islamischen-Staat/dp/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3570102769</DetailPageURL>
<ItemLinks>
<ItemLink>
<Description>Add To Wishlist</Description>
<URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3570102769%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
<ItemLink>
<Description>Tell A Friend</Description>
<URL>http://www.amazon.de/gp/pdp/taf/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
<ItemLink>
<Description>All Customer Reviews</Description>
<URL>http://www.amazon.de/review/product/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
<ItemLink>
<Description>All Offers</Description>
<URL>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</URL>
</ItemLink>
</ItemLinks>
<ItemAttributes>
<Author>Jürgen Todenhöfer</Author>
<Binding>Gebundene Ausgabe</Binding>
<EAN>9783570102763</EAN>
<EANList>
<EANListElement>9783570102763</EANListElement>
</EANList>
<ISBN>3570102769</ISBN>
<IsEligibleForTradeIn>1</IsEligibleForTradeIn>
<ItemDimensions>
<Height Units="hundredths-inches">874</Height>
<Length Units="hundredths-inches">575</Length>
<Width Units="hundredths-inches">126</Width>
</ItemDimensions>
<Label>C. Bertelsmann Verlag</Label>
<Languages>
<Language>
<Name>Deutsch</Name>
<Type>Published</Type>
</Language>
<Language>
<Name>Deutsch</Name>
<Type>Original</Type>
</Language>
<Language>
<Name>Deutsch</Name>
<Type>Unbekannt</Type>
</Language>
</Languages>
<ListPrice>
<Amount>1799</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 17,99</FormattedPrice>
</ListPrice>
<Manufacturer>C. Bertelsmann Verlag</Manufacturer>
<ManufacturerMinimumAge Units="months">192</ManufacturerMinimumAge>
<NumberOfPages>288</NumberOfPages>
<PackageDimensions>
<Height Units="hundredths-inches">118</Height>
<Length Units="hundredths-inches">567</Length>
<Weight Units="hundredths-pounds">93</Weight>
<Width Units="hundredths-inches">252</Width>
</PackageDimensions>
<PackageQuantity>1</PackageQuantity>
<ProductGroup>Book</ProductGroup>
<ProductTypeName>ABIS_BOOK</ProductTypeName>
<PublicationDate>2015-04-27</PublicationDate>
<Publisher>C. Bertelsmann Verlag</Publisher>
<Studio>C. Bertelsmann Verlag</Studio>
<Title>Inside IS - 10 Tage im 'Islamischen Staat'</Title>
<TradeInValue>
<Amount>930</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 9,30</FormattedPrice>
</TradeInValue>
</ItemAttributes>
<OfferSummary>
<LowestNewPrice>
<Amount>1799</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 17,99</FormattedPrice>
</LowestNewPrice>
<LowestUsedPrice>
<Amount>1390</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 13,90</FormattedPrice>
</LowestUsedPrice>
<LowestCollectiblePrice>
<Amount>4999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 49,99</FormattedPrice>
</LowestCollectiblePrice>
<TotalNew>56</TotalNew>
<TotalUsed>8</TotalUsed>
<TotalCollectible>1</TotalCollectible>
<TotalRefurbished>0</TotalRefurbished>
</OfferSummary>
<Offers>
<TotalOffers>1</TotalOffers>
<TotalOfferPages>1</TotalOfferPages>
<MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3570102769%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3570102769</MoreOffersUrl>
<Offer>
<OfferAttributes>
<Condition>New</Condition>
</OfferAttributes>
<OfferListing>
<OfferListingId>9KHCZj9qtL6ucVBPASfXaryQjU8tWbc0n%2F3F4F7GraOKW6Csji2OxpD93%2FkoHwgIGQctlnrtx4RWIeJULAcvvsFhiopFi08JdsZ%2FeO3u6g0%3D</OfferListingId>
<Price>
<Amount>1799</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 17,99</FormattedPrice>
</Price>
<Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
<AvailabilityAttributes>
<AvailabilityType>now</AvailabilityType>
<MinimumHours>0</MinimumHours>
<MaximumHours>0</MaximumHours>
</AvailabilityAttributes>
<IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
</OfferListing>
</Offer>
</Offers>
</Item>
<Item>
<ASIN>3813506479</ASIN>
<DetailPageURL>http://www.amazon.de/Altes-Land-Roman-D%C3%B6rte-Hansen/dp/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D3813506479</DetailPageURL>
<ItemLinks>
<ItemLink>
<Description>Add To Wishlist</Description>
<URL>http://www.amazon.de/gp/registry/wishlist/add-item.html%3Fasin.0%3D3813506479%26SubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
<ItemLink>
<Description>Tell A Friend</Description>
<URL>http://www.amazon.de/gp/pdp/taf/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
<ItemLink>
<Description>All Customer Reviews</Description>
<URL>http://www.amazon.de/review/product/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
<ItemLink>
<Description>All Offers</Description>
<URL>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</URL>
</ItemLink>
</ItemLinks>
<ItemAttributes>
<Author>Dörte Hansen</Author>
<Binding>Gebundene Ausgabe</Binding>
<EAN>9783813506471</EAN>
<EANList>
<EANListElement>9783813506471</EANListElement>
</EANList>
<ISBN>3813506479</ISBN>
<IsEligibleForTradeIn>1</IsEligibleForTradeIn>
<ItemDimensions>
<Height Units="hundredths-inches">870</Height>
<Length Units="hundredths-inches">567</Length>
<Width Units="hundredths-inches">114</Width>
</ItemDimensions>
<Label>Albrecht Knaus Verlag</Label>
<Languages>
<Language>
<Name>Deutsch</Name>
<Type>Published</Type>
</Language>
<Language>
<Name>Deutsch</Name>
<Type>Original</Type>
</Language>
</Languages>
<ListPrice>
<Amount>1999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 19,99</FormattedPrice>
</ListPrice>
<Manufacturer>Albrecht Knaus Verlag</Manufacturer>
<NumberOfPages>288</NumberOfPages>
<PackageDimensions>
<Height Units="hundredths-inches">118</Height>
<Length Units="hundredths-inches">858</Length>
<Weight Units="hundredths-pounds">101</Weight>
<Width Units="hundredths-inches">559</Width>
</PackageDimensions>
<ProductGroup>Book</ProductGroup>
<ProductTypeName>ABIS_BOOK</ProductTypeName>
<PublicationDate>2015-02-16</PublicationDate>
<Publisher>Albrecht Knaus Verlag</Publisher>
<Studio>Albrecht Knaus Verlag</Studio>
<Title>Altes Land: Roman</Title>
<TradeInValue>
<Amount>965</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 9,65</FormattedPrice>
</TradeInValue>
</ItemAttributes>
<OfferSummary>
<LowestNewPrice>
<Amount>1999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 19,99</FormattedPrice>
</LowestNewPrice>
<LowestUsedPrice>
<Amount>1599</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 15,99</FormattedPrice>
</LowestUsedPrice>
<TotalNew>72</TotalNew>
<TotalUsed>8</TotalUsed>
<TotalCollectible>0</TotalCollectible>
<TotalRefurbished>0</TotalRefurbished>
</OfferSummary>
<Offers>
<TotalOffers>1</TotalOffers>
<TotalOfferPages>1</TotalOfferPages>
<MoreOffersUrl>http://www.amazon.de/gp/offer-listing/3813506479%3FSubscriptionId%3DAKIAI554OLCUMRCYB7ZA%26tag%3DjPp08vuSO4osfgfbCbEdF7TNqnWOm7YtprtqRPB9%26linkCode%3Dxm2%26camp%3D2025%26creative%3D12738%26creativeASIN%3D3813506479</MoreOffersUrl>
<Offer>
<OfferAttributes>
<Condition>New</Condition>
</OfferAttributes>
<OfferListing>
<OfferListingId>aeRv5KPt26T8S0hLrgV8Bv9UPYABYOMijGRxffbNJXUZSN4XfeeOZZpCZ28EURzmgMLlcYEBSRlMXS%2F8Z0pN1JbYerndME%2B2VK3RosfdQJA%3D</OfferListingId>
<Price>
<Amount>1999</Amount>
<CurrencyCode>EUR</CurrencyCode>
<FormattedPrice>EUR 19,99</FormattedPrice>
</Price>
<Availability>Gewöhnlich versandfertig in 24 Stunden</Availability>
<AvailabilityAttributes>
<AvailabilityType>now</AvailabilityType>
<MinimumHours>0</MinimumHours>
<MaximumHours>0</MaximumHours>
</AvailabilityAttributes>
<IsEligibleForSuperSaverShipping>1</IsEligibleForSuperSaverShipping>
</OfferListing>
</Offer>
</Offers>
</Item>
</Items>
I want to get the ASIN element of each Item, so I tried this:
from lxml import etree
doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('//ASIN/text()')
    print(asin)
What I get is this:
['3570102769', '3813506479']
['3570102769', '3813506479']
But I want this:
['3570102769']
['3813506479']
I don't understand what the problem is. I thought I was iterating over each Item, and each Item contains exactly one ASIN, so why does it return both ASINs twice?
When you're searching for a.xpath('//ASIN/text()') you're searching the complete document tree again. Quoting from the XML Path language specification:
//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
So what you're doing is iterating over the matched Item nodes and saying "Give me all ASIN nodes in this document please". The context for this (the Item node) is ignored.
What you should do instead is select the ASIN child node directly. Keeping to your original implementation, this could look like this:
from lxml import etree
doc = etree.fromstring(xmlstring)
items = doc.xpath('//Items/Item')
for a in items:
    asin = a.xpath('ASIN/text()')
    print(asin)
which gives the output you desire:
['3570102769']
['3813506479']
Alternatively, if you're not certain where in the Item node your ASIN appears, you could use .//ASIN/text(), which searches all descendants of the current Item rather than only its direct children.
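The difference between the child path and the descendant path can be sketched with the standard library's xml.etree.ElementTree, used here in place of lxml so the example has no third-party dependency. The trimmed-down document, with the second ASIN moved under ItemAttributes, is hypothetical. ElementTree's findall accepts the same relative forms (ASIN and .//ASIN) but not the XPath text() step, so the text is read from each matched element instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of the Items document; the second ASIN is
# deliberately nested one level deeper to show where the two paths differ.
XML = """
<Items>
  <Item>
    <ASIN>3570102769</ASIN>
  </Item>
  <Item>
    <ItemAttributes>
      <ASIN>3813506479</ASIN>
    </ItemAttributes>
  </Item>
</Items>
"""

root = ET.fromstring(XML)
items = root.findall('Item')

# 'ASIN' matches only direct children of each Item.
direct = [[e.text for e in item.findall('ASIN')] for item in items]

# './/ASIN' matches ASIN elements at any depth below each Item.
nested = [[e.text for e in item.findall('.//ASIN')] for item in items]

print(direct)
print(nested)
```

Both paths are evaluated relative to the current Item, so neither leaks results from sibling items; they differ only in how deep they look.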
I have an XML description like this:
<Car xmlns="http://example.com/vocab/xml/cars#">
<dateStarted>{{date_started|escape}}</dateStarted>
<dateSold>{{date_sold|escape}}</dateSold>
<name type="{{name_type}}" abbrev="{{name_abbrev}}" value="{{name_value}}" >{{name|escape}}</name>
<brandName type="{{brand_name_type}}" abbrev="{{brand_name_abbrev}}" value="{{brand_name_value}}" >{{brand_name|escape}}</brandName>
<maxspeed>
<value>{{speed_value}}</value>
<unit type="{{speed_unit_type}}" value="{{speed_unit_value}}" abbrev="{{speed_unit_abbrev}}" />
</maxspeed>
<route type="{{route_type}}" value="{{route_value}}" abbrev="{{route_abbrev}}">{{by_route|escape}}</route>
<power>
<value>{{strength_value|escape}}</value>
<unit type="{{strength_unit_type}}" value="{{ strength_unit_value }}" abbrev="{{ strength_unit_abbrev }}" />
</power>
<frequency type="{{ frequency_type }}" value="{{frequency_value}}" abbrev="{{ frequency_abbrev }}">{{frequency|escape}}</frequency>
</Car>
I wrote a Python function, parse_car, to parse a string in the above format:
def parse_car(etree):
    NS = "{http://example.com/vocab/xml/cars#}"
    CODES_NS = "{http://example.com/codes/}"
    return {'date_started' : etree.findtext('%sdateStarted' % NS),
            'date_stopped' : etree.findtext('%sdateStopped' % NS),
            'name': etree.findtext('%sname' % NS),
            'brand_name': etree.findtext('%sbrandName' % NS),
            'maxspeed': etree.findtext('%smaxspeed/value' % NS),
            'maxspeed_unit': etree.findtext('%smaxspeed/value' % NS).get('abbrev'),
            'route': etree.findtext('%sroute' % NS),
            'power': etree.findtext('%spower/value' % NS),
            'power_unit': etree.find('%spower/value' % NS).get('abbrev'),
            'frequency': etree.findtext('%sfrequency' % NS) }
But I only get part of the result; it stops at route:
<Car xmlns="http://example.com/vocab/xml/cars#">
<dateStarted>2011-02-05</dateStarted>
<dateStopped>2011-02-13</dateStopped>
<name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
<brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
<maxspeed>
<value>250</value>
<unit type="http://example.com/codes/units#" value="miles" abbrev="mph" />
</maxspeed>
<route type="http://...'
And here is the expected final result:
<Car xmlns="http://example.com/vocab/xml/cars#">
<dateStarted>2011-02-05</dateStarted>
<dateSold>2011-02-13</dateSold>
<name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
<brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
<maxspeed>
<value>250</value>
<unit type="http://example.com/codes/units#" value="miles" abbrev="mph" />
</maxspeed>
<route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
<power>
<value>{{strength_value|escape}}</value>
<unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
</power>
<frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>
</Car>
Could you please give me some advice on why it doesn't work? Am I missing something here?
Thank you very much!