I am trying to parse an XML file using Python. Example XML snippet:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<raml xmlns="raml21.xsd" version="2.1">
<series xmlns="" scope="USA" name="Arizona">
<header>
<log action="created"/>
</header>
<x_ns color="Blue">
<p name="timeZone">(GMT-10)</p>
</x_ns>
<x_ns color="Red">
<p name="AvgHeight">175</p>
</x_ns>
<x_ns color="black">
<p name="AvgWeight">235</p>
</x_ns>
The problem is that the namespace keeps changing, so as an alternative I tried reading the xmlns string from the root tag first and then building a dictionary of namespaces with the code below:
root = raw_xml.getroot()
namespace_temp1=root.tag.split("}")
namespace_temp2=namespace_temp1[0].strip('{')
namespaces_auto={}
tag_name =["x","y","z","w","v"]
ns_name=[namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2]
namespace_temp3=zip(tag_name,ns_name)
for tag,ns in namespace_temp3:
    namespaces_auto[tag]=ns
namespaces=namespaces_auto
To access a particular tag with a namespace I use code like this:
for data in raw_xml.findall('x:x_ns',namespaces):
This mostly solves the problem, but it fails when a child node has a blank xmlns, as in the series tag (xmlns=""). I am not sure how to handle this condition in the code.
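One way to sidestep the changing (or blank) namespace is to compare local names only. The following is a minimal sketch under the assumption that the snippet above is closed with </series> and </raml>; the helper names local_name and iter_local are made up for illustration and are not part of ElementTree. On Python 3.8 and later, findall('.//{*}x_ns') should give a similar result with the built-in wildcard.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet from the question, with the closing tags
# added (assumption: the original file is well-formed).
SAMPLE = """<raml xmlns="raml21.xsd" version="2.1">
  <series xmlns="" scope="USA" name="Arizona">
    <x_ns color="Blue"><p name="timeZone">(GMT-10)</p></x_ns>
    <x_ns color="Red"><p name="AvgHeight">175</p></x_ns>
    <x_ns color="black"><p name="AvgWeight">235</p></x_ns>
  </series>
</raml>"""


def local_name(tag):
    """Return the tag without its '{namespace}' prefix, if it has one."""
    return tag.rpartition("}")[2]


def iter_local(root, name):
    """Hypothetical helper: all elements whose local name equals `name`."""
    return [
        el for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == name
    ]


root = ET.fromstring(SAMPLE)
root_name = local_name(root.tag)
colors = [el.get("color") for el in iter_local(root, "x_ns")]
print(root_name, colors)
```

Because the series element resets the default namespace with xmlns="", its descendants have plain tag names, so a prefix mapped to the root namespace never matches them; matching on the local name works in both cases.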
I am a beginner using Python. What I am trying to do is update the InvoiceStatus of a certain list of invoices: I want to set it to "N" instead of "A". Below is an extract of the XML file:
<?xml version="1.0" encoding="WINDOWS-1252"?>
<AuditFile>
<Header>
<AuditFileVersion>1.04_01</AuditFileVersion>
<CompanyID>51630</CompanyID>
</Header>
<MasterFiles>
<Customer>
<CustomerID>20201376</CustomerID>
<AccountID>20000</AccountID>
</Customer>
</MasterFiles>
<SourceDocuments>
<SalesInvoices>
<NumberOfEntries>981</NumberOfEntries>
<Invoice>
<InvoiceNo>F2 UF/3510000211</InvoiceNo>
<ATCUD>0</ATCUD>
<DocumentStatus>
<InvoiceStatus>A</InvoiceStatus>
<SourceBilling>P</SourceBilling>
</DocumentStatus>
<InvoiceNo>F2 UF/3510020247</InvoiceNo>
<ATCUD>0</ATCUD>
<DocumentStatus>
<InvoiceStatus>A</InvoiceStatus>
<SourceBilling>P</SourceBilling>
</DocumentStatus>
<InvoiceNo>F2 UF/3510020247</InvoiceNo>
<ATCUD>0</ATCUD>
<DocumentStatus>
<InvoiceStatus>A</InvoiceStatus>
<SourceBilling>P</SourceBilling>
</DocumentStatus>
</Invoice>
</SalesInvoices>
</SourceDocuments>
</AuditFile>
Here is the script:
from xml.dom import minidom
def reemplazaTexto(nodo, textonuevo):
    nodo.firstChild.replaceWholeText(textonuevo)
doc = minidom.parse('sample.xml')
print(doc.toxml())
invoices = doc.getElementsByTagName('InvoiceStatus')
for nodo in invoices:
    reemplazaTexto(nodo, 'N')
print(doc.toxml())
But this script modifies every InvoiceStatus element, not just the ones for the invoices I want. I would appreciate a hand with this.
Cheers,
Axel
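A possible approach is to walk each Invoice in document order, remember the most recent InvoiceNo, and only touch the DocumentStatus that follows a number from your list. This is a minimal sketch using xml.etree.ElementTree instead of minidom; the function name update_status and the shortened sample are assumptions for illustration, not taken from the original file.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the file from the question.
SAMPLE = """<AuditFile>
  <SourceDocuments>
    <SalesInvoices>
      <Invoice>
        <InvoiceNo>F2 UF/3510000211</InvoiceNo>
        <DocumentStatus>
          <InvoiceStatus>A</InvoiceStatus>
          <SourceBilling>P</SourceBilling>
        </DocumentStatus>
        <InvoiceNo>F2 UF/3510020247</InvoiceNo>
        <DocumentStatus>
          <InvoiceStatus>A</InvoiceStatus>
          <SourceBilling>P</SourceBilling>
        </DocumentStatus>
      </Invoice>
    </SalesInvoices>
  </SourceDocuments>
</AuditFile>"""


def update_status(root, invoice_numbers, new_status="N"):
    """Set InvoiceStatus to new_status for the listed invoice numbers only.

    Hypothetical helper: returns how many statuses were changed.
    """
    changed = 0
    for invoice in root.iter("Invoice"):
        current_no = None
        for child in invoice:
            if child.tag == "InvoiceNo":
                # Remember which invoice the following elements belong to.
                current_no = child.text
            elif child.tag == "DocumentStatus" and current_no in invoice_numbers:
                status = child.find("InvoiceStatus")
                if status is not None:
                    status.text = new_status
                    changed += 1
    return changed


root = ET.fromstring(SAMPLE)
changed = update_status(root, {"F2 UF/3510000211"})
statuses = [s.text for s in root.iter("InvoiceStatus")]
print(changed, statuses)
```

The same loop also works if every Invoice element holds a single InvoiceNo, since the number is tracked per Invoice.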
I am new to XML and stuck on a feature. I have a list and an XML string (the structure of the XML is not fixed). I have defined an identifier in my XML string (in my case "{some_values}") with the same name as the list. When my code executes, I want the XML string to pick up that list variable so that the values in the list are added dynamically at run time.
some_values=[1,2,3]
Input XML:
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>{some_values}</intA>
</Add>
</Body>
</Envelope>
Output XML:
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>1</intA>
<intA>2</intA>
<intA>3</intA>
</Add>
</Body>
</Envelope>
I need an approach or solution for this problem. I have looked at some Python XML parser libraries and read that XML strings can also be handled with Python templating, but I could not find a solution that fits this particular problem.
Try something along these lines:
import lxml.etree as ET
parser = ET.XMLParser()
some_values=[1,2,3]
content='''<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>{some_values}</intA>
</Add>
</Body>
</Envelope>
'''
tree = ET.fromstring(content, parser)
item = tree.xpath('.//*[local-name()="intA"]')
par = item[0].getparent()
for val in reversed(some_values):
    new = ET.XML(f'<intA>{val}</intA>')
    par.insert(par.index(item[0])+1,new)
par.remove(item[0])
print(ET.tostring(tree).decode())
Output (you can fix the formatting later):
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>1</intA><intA>2</intA><intA>3</intA></Add>
</Body>
</Envelope>
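If lxml is not available, the same idea can probably be done with the standard library as well. The sketch below is only an illustration: expand_placeholder is a made-up helper name, and it assumes the placeholder is the entire text of the element. Since xml.etree.ElementTree has no getparent(), it visits each parent and checks its children instead.

```python
import xml.etree.ElementTree as ET

CONTENT = """<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <Add xmlns="http://tempuri.org/">
      <intA>{some_values}</intA>
    </Add>
  </Body>
</Envelope>"""


def expand_placeholder(root, placeholder, values):
    """Replace each element whose text is `placeholder` by one copy per value."""
    for parent in list(root.iter()):
        # Walk the children from the end so earlier indexes stay valid
        # while elements are inserted.
        for index, child in reversed(list(enumerate(parent))):
            if child.text != placeholder:
                continue
            parent.remove(child)
            for offset, value in enumerate(values):
                new = ET.Element(child.tag, dict(child.attrib))
                new.text = str(value)
                new.tail = child.tail  # keep the original whitespace
                parent.insert(index + offset, new)


some_values = [1, 2, 3]
root = ET.fromstring(CONTENT)
expand_placeholder(root, "{some_values}", some_values)
texts = [el.text for el in root.iter("{http://tempuri.org/}intA")]
print(ET.tostring(root, encoding="unicode"))
```

Copying the tail of the placeholder onto every new element keeps each intA on its own line, at the cost of serializing the namespaces with generated prefixes (ns0, ns1) unless they are registered first.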
I want to add an entire tag to an XML file. Below is my XML format:
<?xml version="1.0" encoding="UTF-8"?>
<ca st="true" name="XMLConfig">
<app>
<!--- I want to add entire commneted tag to XML . !
<ar ty="co" name="st">
<ly ty="pt">
<pt>value</pt>
</Layout>
</ar> -->
<roll name="roll" fN="file.log" fP="logs.gz">
<ly type="ptl">
<pt>value</pt>
</ly>
<po>
<!-- Comment /> -->
<si size="100 MB" />
<!-- Comment /> -->
</po>
<de fI="max" max="10"/>
</roll>
</app>
As shown in the file above, I want to add this tag to the file:
<ar ty="co" name="st">
<ly ty="pt">
<pt>value</pt>
</Layout>
</ar>
This is what I have so far:
for appenders in tree.xpath('//Appenders'):
    if appenders.getchildren():
        appenders.remove(appenders.getchildren()[0])
    appenders.insert(0, appenders.getparent().append(etree.fromstring('<ar ty="co" name="st"> <ly ty="pt"><pt>value</pt></Layout></ar>')))
This removes all the other content after the new content.
Any help will be appreciated!
In my opinion the first way you did it is much better. You just made some mistakes in your insert line. It should be this:
appenders.insert(0, etree.fromstring('<ar ty="co" name="st"> <ly ty="pt"><pt>value</pt></ly></ar>'))
I'm surprised it didn't throw an error for you because your insert line is basically this:
appenders.insert(0,None)
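To see why the original line boils down to inserting None, here is a tiny sketch with the standard library's xml.etree.ElementTree (an assumption on my part that lxml behaves the same way here): append modifies the parent in place and returns nothing.

```python
import xml.etree.ElementTree as ET

parent = ET.Element("app")
child = ET.Element("ar")

# append() mutates `parent` and has no return value, so `result` is None.
result = parent.append(child)
print(result, len(parent))
```

So the element has to be passed to insert directly rather than the return value of append.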
Also, I noticed two things you do in all of your questions:
You leave out some line(s) of your XML file. (Why?)
You shorten the tag names in your XML but keep the long versions in the code, which is annoying because anyone who wants to answer has to change the code again to check whether it works.
I got it working!
for appenders in tree.xpath('//app'):
    if appenders.tag == 'app':
        appenders.insert(0, etree.SubElement(appenders, 'ar', ty="Co", name="st"))
for appender in tree.xpath('//ar'):
    appender.insert(0, etree.SubElement(appender, 'ly', ty="pt"))
for layout in tree.xpath('//ly'):
    layout.insert(0, etree.SubElement(layout, 'pt'))
for pattern in tree.xpath('//pt'):
    pattern.text = 'value'
tree.write(r'C:\value.xml', xml_declaration=True, encoding='UTF-8')
If anyone has a better way to do this, please let me know so I can improve on it.
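A shorter alternative might be to parse the whole new tag from a string and insert it as the first child of app in one step. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, and the CONFIG string is a shortened, well-formed stand-in for the real file (both are assumptions for illustration).

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the configuration file.
CONFIG = """<ca st="true" name="XMLConfig">
  <app>
    <roll name="roll" fN="file.log" fP="logs.gz">
      <de fI="max" max="10"/>
    </roll>
  </app>
</ca>"""

# The tag to add, with matching <ly>...</ly> tags.
NEW_TAG = '<ar ty="co" name="st"><ly ty="pt"><pt>value</pt></ly></ar>'

root = ET.fromstring(CONFIG)
app = root.find("app")
# Insert the parsed element as the first child; existing children are kept.
app.insert(0, ET.fromstring(NEW_TAG))

child_tags = [child.tag for child in app]
pattern_text = app[0].find("ly/pt").text
print(ET.tostring(root, encoding="unicode"))
```

This avoids the four separate loops, and because nothing is removed, the roll element and its children stay in place after the new ar element.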
I'm new to Python and to parsing XML, and I'm having trouble with a particular XML file that is spat out by a program I work with. I'm trying to parse this XML file using Python and ElementTree in order to extract the URL data (the URL below is fake). Any ideas as to why this isn't working?
My Python code:
import xml.etree.ElementTree as ET
def xmlTreeParser(fileName,attribute,tagName):
    tree = ET.parse(fileName)
    root = tree.getroot()
    attribArray = [element.attrib[attribute] for element in root.findall(tagName)]
    print(attribArray)
xmlTreeParser("xml_file.xml",'text','Expr')
Here's my XML file:
<Query id="f9cef041-085d-47e0-8d16-15e36bba1ec8" name="">
<Description />
<JustSortedColumns />
<Conditions linking="All">
<Condition class="PDCT" enabled="True" readOnly="False" linking="Any">
<Condition class="SMPL" enabled="True" readOnly="False">
<Operator id="Contains" />
<Expressions>
<Expr class="ENTATTR" id="Person.LinkedInUrl" />
<Expr class="CONST" type="String" kind="Scalar" value="https://www.linkedin.com/Bill-Smith" text="https://www.linkedin.com/Bill-Smith" />
</Expressions>
</Condition>
</Condition>
</Conditions>
</Query>
The Python code I wrote works just fine on another test XML file that I wrote myself. I'm at a loss as to why I can't parse this particular block of XML. Thanks everyone.
For the specific call you make, findall('Expr') only matches direct children of the root element. To reach Expr elements nested deeper in the tree, prefix the tag name with .// like this:
xmlTreeParser("xml_file.xml",'text','.//Expr')
Also, not every Expr element in your XML has a text attribute, so you should use attrib.get with a default value to avoid a KeyError:
attribArray = [element.attrib.get(attribute, '') for element in root.findall(tagName)]
# -----------------------------^
print(attribArray)
xmlTreeParser("xml_file.xml",'text','.//Expr')
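Putting both fixes together, a self-contained sketch could look like the following. It parses an inline copy of the XML instead of a file, and the function name collect_attribute is made up for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Query id="f9cef041-085d-47e0-8d16-15e36bba1ec8" name="">
  <Conditions linking="All">
    <Condition class="PDCT" enabled="True" readOnly="False" linking="Any">
      <Condition class="SMPL" enabled="True" readOnly="False">
        <Operator id="Contains" />
        <Expressions>
          <Expr class="ENTATTR" id="Person.LinkedInUrl" />
          <Expr class="CONST" type="String" kind="Scalar" value="https://www.linkedin.com/Bill-Smith" text="https://www.linkedin.com/Bill-Smith" />
        </Expressions>
      </Condition>
    </Condition>
  </Conditions>
</Query>"""


def collect_attribute(root, attribute, tag_name):
    """Collect `attribute` from every `tag_name` element at any depth.

    Elements without the attribute contribute an empty string.
    """
    return [el.attrib.get(attribute, "") for el in root.findall(".//" + tag_name)]


root = ET.fromstring(SAMPLE)
direct_children = root.findall("Expr")   # bare name: direct children only
texts = collect_attribute(root, "text", "Expr")
print(direct_children, texts)
```

The bare 'Expr' search comes back empty because the Expr elements sit several levels below the root, which is why the original code appeared to do nothing on this file.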
I have a problem related to a Google Earth exported KML, as it doesn't seem to work well with Element Tree. I don't have a clue where the problem might lie, so I will explain how I do everything.
Here is the relevant code:
kmlFile = open( filePath, 'r' ).read( -1 ) # read the whole file as text
kmlFile = kmlFile.replace( 'gx:', 'gx' ) # we need this as otherwise the Element Tree parser
# will give an error
kmlData = ET.fromstring( kmlFile )
document = kmlData.find( 'Document' )
With this code, ET (Element Tree object) creates an Element object accessible via variable kmlData. It points to the root element ('kml' tag). However, when I run a search for the sub-element 'Document', it returns None. Although the 'Document' tag is present in the KML file!
Are there any other discrepancies between KML and XML apart from the 'gx:' tags? I have searched through the KML files I am dealing with and found nothing suspicious. Here is a simplified structure of a KML file the program is supposed to deal with:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2">
<Document>
<name>UK.kmz</name>
<Style id="sh_blu-blank">
<IconStyle>
<scale>1.3</scale>
<Icon>
<href>http://maps.google.com/mapfiles/kml/paddle/blu-blank.png</href>
</Icon>
<hotSpot x="32" y="1" xunits="pixels" yunits="pixels"/>
</IconStyle>
<ListStyle>
<ItemIcon>
<href>http://maps.google.com/mapfiles/kml/paddle/blu-blank-lv.png</href>
</ItemIcon>
</ListStyle>
</Style>
[other style tags...]
<Folder>
<name>UK</name>
<Placemark>
<name>1262 Crossness Pumping Station</name>
<LookAt>
<longitude>0.1329926667038817</longitude>
<latitude>51.50303535104574</latitude>
<altitude>0</altitude>
<range>4246.539753518848</range>
<tilt>0</tilt>
<heading>-4.295161152207489</heading>
<altitudeMode>relativeToGround</altitudeMode>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</LookAt>
<styleUrl>#msn_blu-blank15000</styleUrl>
<Point>
<coordinates>0.1389579668507301,51.50888923518947,0</coordinates>
</Point>
</Placemark>
[other placemark tags...]
</Folder>
</Document>
</kml>
Do you have an idea why I can't access any sub-elements of 'kml'? By the way, Python version is 2.7.
The KML document is in the http://earth.google.com/kml/2.2 namespace, as indicated by
<kml xmlns="http://earth.google.com/kml/2.2">
This means that the name of the Document element is in fact {http://earth.google.com/kml/2.2}Document.
Instead of this:
document = kmlData.find('Document')
you need this:
document = kmlData.find('{http://earth.google.com/kml/2.2}Document')
However, there is a problem with the XML file. There is an element called gx:altitudeMode. The gx bit is a namespace prefix. Such a prefix needs to be declared, but the declaration is missing.
You have worked around the problem by simply replacing gx: with gx. But the proper way to do this would be to add the namespace declaration. Based on https://developers.google.com/kml/documentation/altitudemode, I take it that gx is associated with the http://www.google.com/kml/ext/2.2 namespace. So for the document to be well-formed, the root element start tag should read
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
Now the document can be parsed:
In [1]: from xml.etree import ElementTree as ET
In [2]: kmlData = ET.parse("kml2.xml")
In [3]: document = kmlData.find('{http://earth.google.com/kml/2.2}Document')
In [4]: document
Out[4]: <Element '{http://earth.google.com/kml/2.2}Document' at 0x1895810>
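To avoid repeating the full namespace URI in every search, find and findall also accept a prefix-to-URI mapping. This is a minimal sketch on a shortened copy of the document with the gx declaration added; the prefix names in NS are arbitrary labels chosen for the example and do not have to match the prefixes used in the file.

```python
import xml.etree.ElementTree as ET

# Shortened KML with the missing xmlns:gx declaration added to the root.
KML = """<kml xmlns="http://earth.google.com/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <name>UK.kmz</name>
    <Folder>
      <name>UK</name>
      <Placemark>
        <name>1262 Crossness Pumping Station</name>
        <LookAt>
          <altitudeMode>relativeToGround</altitudeMode>
          <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
        </LookAt>
      </Placemark>
    </Folder>
  </Document>
</kml>"""

# Prefixes used in the search paths below (arbitrary names).
NS = {
    "kml": "http://earth.google.com/kml/2.2",
    "gx": "http://www.google.com/kml/ext/2.2",
}

root = ET.fromstring(KML)
document = root.find("kml:Document", NS)
placemark_names = [el.text for el in root.findall(".//kml:Placemark/kml:name", NS)]
gx_mode = root.find(".//gx:altitudeMode", NS).text
print(document.tag, placemark_names, gx_mode)
```

With the declaration in place there is no need to rewrite 'gx:' in the file text before parsing.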