How to create an objectified element with text with lxml - python

I'd like to create an element tree (not parsing!) with lxml.objectify that might look like this:
<root>
<child>Hello World</child>
</root>
My first attempt was to write code like this:
import lxml.objectify as o
from lxml.etree import tounicode
r = o.Element("root")
c = o.Element("child", text="Hello World")
r.append(c)
print(tounicode(r, pretty_print=True))
But that produces:
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema" py:pytype="TREE">
<child text="Hello World" data="Test" py:pytype="TREE"/>
</root>
Other answers suggest calling _setText, but this <child> has no such method.
Apparently, lxml.objectify does not allow creating an element with text or changing its text content. So, did I miss something?

According to the docs and the answer you linked, you should use SubElement:
r = o.E.root() # same as o.Element("root")
c = o.SubElement(r, "child")
c._setText("Hello World")
print(tounicode(r, pretty_print=True))
c._setText("Changed it!")
print(tounicode(r, pretty_print=True))
Output:
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<child>Hello World</child>
</root>
<root xmlns:py="http://codespeak.net/lxml/objectify/pytype" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<child>Changed it!</child>
</root>
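As a further option, objectify also lets you create and replace a text child by plain attribute assignment. The following is only a minimal sketch, assuming lxml is installed; the deannotate call is there to strip the py:pytype annotations and unused namespace declarations seen in the outputs above.

```python
from lxml import etree, objectify

# Build the tree without parsing: create the root, then assign the child by name.
root = objectify.Element("root")
root.child = "Hello World"      # creates <child> holding the text

# Assigning again replaces the text of the existing child.
root.child = "Changed it!"

# Strip the py:pytype / xsi annotations and the now-unused namespace declarations.
objectify.deannotate(root, cleanup_namespaces=True)

xml_text = etree.tostring(root, pretty_print=True, encoding="unicode")
print(xml_text)
```

Whether you prefer this over SubElement plus _setText is a matter of taste; attribute assignment avoids the underscore-prefixed method.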

Related

How to rewrite this XML file?

I am trying to rewrite an XML file containing this code:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
My python code:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
for model in root.findall('Model'):
    name = model.find('Name').text
    if name == 'token':
        model.find('Value').text = '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if name == 'chat_id':
        model.find('Value').text = '1234567890'
tree.write('xml_file.xml')
It runs without errors, but the output file is unchanged:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
What's wrong with my code?
Even ChatGPT can't help me haha
I even tried to print it, but that doesn't work either.
What should I do?
Please help me.
As described in the documentation, Element.findall() finds only elements with a given tag that are direct children of the current element. To make ET select all subelements, on all levels beneath the current element, prefix the path with .//.
Since <Model> is not a direct child of root (it's a grandchild), root.findall('Model') finds nothing. So to get ET to find it, you need to change that to
root.findall('.//Model')
and it should work.
You could also use for model in root.findall('ModelList/Model').
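To make the difference concrete, here is a small self-contained sketch using a trimmed-down copy of the file as an inline string. The placeholder values and the new_values dict are assumptions for illustration, not part of the original code.

```python
import xml.etree.ElementTree as ET

xml_text = """<BrowserAutomationStudioProject>
  <ModelList>
    <Model><Name>token</Name><Value>old-token</Value></Model>
    <Model><Name>chat_id</Name><Value>old-id</Value></Model>
  </ModelList>
</BrowserAutomationStudioProject>"""

root = ET.fromstring(xml_text)

# 'Model' only matches direct children of the root, so nothing is found.
direct = root.findall('Model')
# './/Model' searches every level below the root.
nested = root.findall('.//Model')
print(len(direct), len(nested))

# Hypothetical replacement values, keyed by the <Name> text.
new_values = {'token': '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ',
              'chat_id': '1234567890'}
for model in nested:
    name = model.find('Name').text
    if name in new_values:
        model.find('Value').text = new_values[name]

values = [m.find('Value').text for m in root.findall('ModelList/Model')]
print(values)
```

Keying the replacements by name means the result does not depend on the order of the <Model> elements in the file.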
If you know the order of the <Value> tags, you can instead pop() the new values from a list while iterating through the tree (pop() takes from the end, so the list is in reverse document order):
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
input_value = ['1234567890','123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ']
for elem in root.iter():
    if elem.tag == "Value":
        elem.text = input_value.pop()
        print(elem.tag, elem.text)
tree.write('xml_file.xml')
Output:
<?xml version="1.0"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token" />
<Value>123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ</Value>
</Model>
<Defaults />
<Model>
<Name>chat_id</Name>
<Value>1234567890</Value>
</Model>
<Defaults />
</ModelList>
</BrowserAutomationStudioProject>

How to get the content of child->child->child->child in XML file using Python

<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<FIToFIPmtCxlReq>
<Assgnmt>
<Id>TEST-ISO-81</Id>
<Assgnr>
<Agt>
<FinInstnId>
<BIC>CCCCGB2L</BIC>
</FinInstnId>
</Agt>
</Assgnr>
<Assgne>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Assgne>
<CreDtTm>2009-03-24T11:22:59</CreDtTm>
</Assgnmt>
<Undrlyg>
<TxInf>
<CxlId>103012345</CxlId>
<Case>
<Id>ISO_TEST_CASE</Id>
<Cretr>
<Agt>
<FinInstnId>
<BIC>MMMMGB2L</BIC>
</FinInstnId>
</Agt>
</Cretr>
</Case>
</TxInf>
</Undrlyg>
</FIToFIPmtCxlReq>
</Document>
Here I want to get the content of "TxInf", that is, all of its children, their children, and the data they hold.
What I have tried is:
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
tree = ET.parse('R3-CAMT.056.001.07-ISO-V.XML')
root = tree.getroot()
for element in root.iter():
    if element.tag == "{urn:iso:std:iso:20022:tech:xsd:camt.056.001.01}TxInf":
        tree._setroot(element.tag)
print(root.tag)
print(root.attrib)
Please suggest whether I can change the root with _setroot, or with any other possible method.
Try something along these lines on your code to see if it works:
for r in root.findall(".//*"):
    if 'TxInf' in r.tag:
        print(ET.tostring(r))
By the way, it may be easier to do it with lxml, if you can use it.
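If you would rather stay with the standard library, a namespace mapping lets you address TxInf directly. This is only a sketch on a shortened inline copy of the document; the prefix c and the variable names are arbitrary choices. Note that _setroot expects an element rather than a tag string, and wrapping the element in a new ElementTree avoids the private method altogether.

```python
import xml.etree.ElementTree as ET

# The prefix 'c' is arbitrary; only the URI has to match the document.
NS = {'c': 'urn:iso:std:iso:20022:tech:xsd:camt.056.001.01'}

xml_text = """<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.056.001.01">
  <FIToFIPmtCxlReq>
    <Undrlyg>
      <TxInf>
        <CxlId>103012345</CxlId>
        <Case><Id>ISO_TEST_CASE</Id></Case>
      </TxInf>
    </Undrlyg>
  </FIToFIPmtCxlReq>
</Document>"""

root = ET.fromstring(xml_text)
tx_inf = root.find('.//c:TxInf', NS)

# Wrap the element in its own tree instead of calling the private _setroot().
sub_tree = ET.ElementTree(tx_inf)
for elem in sub_tree.iter():
    local_name = elem.tag.split('}', 1)[-1]   # drop the '{namespace}' part
    text = (elem.text or '').strip()
    print(local_name, text)

cxl_id = tx_inf.find('c:CxlId', NS).text
```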

Cannot Parse XML file using Python

<?xml version="1.0" encoding="utf-8"?>
<AcResponse
Command="hist"
TaskId="408709">
<element
name="/build.gradle"
id="93527">
<transaction
id="1117194"
type="promote"
time="1529083792"
user="soarfa99">
<comment>Automated promotion to parent stream by module build: jenkins-SC-MODULE-CS-SC-TRUNK-MedRec-DEV-CI-430</comment>
<version
virtual="11007/75"
real="36877/2"
virtualNamedVersion="CS-SC-TRUNK-INTG/75"
realNamedVersion="CS-SC-TRUNK-MedRec-DEV2_ar037601/2"
elem_type="text"
dir="no">
<issueNum>72768</issueNum>
</version>
</transaction>
<transaction
id="1111652"
type="promote"
time="1528100495"
user="dm041068">
<comment>SEDA file add- Debajyoti</comment>
<version
virtual="11007/74"
real="39225/1"
virtualNamedVersion="CS-SC-TRUNK-INTG/74"
realNamedVersion="CS-SC-TRUNK-CM-DEV-Debajyoti_dm041068/1"
elem_type="text"
dir="no">
<issueNum>72629</issueNum>
</version>
</transaction>
</element>
<streams>
<stream
id="11007"
name="CS-SC-TRUNK-INTG"
type="normal"/>
</streams>
</AcResponse>
This is the XML I am trying to parse, and I am trying to extract the attribute 'issueNum' with the following code:
tree=ET.parse(xml)
root=tree.getroot()
for item in root.findall('version'):
    for child in item:
        print(child.attrib['issueNum'])
Can you please help me get the value of "issueNum"?
You can use an XPath expression to find the values of issueNum:
from lxml import etree
xml = '''<?xml version="1.0" encoding="utf-8"?>
<AcResponse
Command="hist"
TaskId="408709">....'''
# lxml rejects str input that carries an encoding declaration, so pass bytes
tree = etree.fromstring(xml.encode('utf-8'))
issues = tree.xpath('//version/issueNum')
for issue in issues:
    print(issue.text)
This prints:
72768
72629
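The same lookup can be sketched with the standard library alone. The inline XML below is a shortened copy of the document in the question. The point it illustrates is that issueNum is a child element of <version>, so its value lives in .text, while .attrib only holds things like virtual or real.

```python
import xml.etree.ElementTree as ET

xml_text = """<AcResponse Command="hist" TaskId="408709">
  <element name="/build.gradle" id="93527">
    <transaction id="1117194" type="promote">
      <version virtual="11007/75"><issueNum>72768</issueNum></version>
    </transaction>
    <transaction id="1111652" type="promote">
      <version virtual="11007/74"><issueNum>72629</issueNum></version>
    </transaction>
  </element>
</AcResponse>"""

root = ET.fromstring(xml_text)

# iter() walks the whole tree, so nesting depth does not matter.
issue_numbers = [node.text for node in root.iter('issueNum')]
print(issue_numbers)

# Attributes of the enclosing <version> are still available through .attrib.
virtuals = [v.attrib['virtual'] for v in root.findall('.//version')]
```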

keep original xml declaration when editing with python

My original XML file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<foo/>
and I want to change it to
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar>confusing dev</bar>
</foo>
I am using xml.etree.ElementTree as suggested by this tutorial
import xml.etree.ElementTree as etree
with open('file.xml', 'r+b') as f:
    tree = etree.parse(f)
    f.seek(0, 0)
    tree.write(f, xml_declaration=True)  # default argument: encoding="us-ascii"
this outputs
<?xml version='1.0' encoding='us-ascii'?>
<foo/>
But how do I get the encoding of file.xml at runtime and pass it as an argument to tree.write? Or is there a better way to edit XML in Python? I just want to change some Element.text but keep the declaration and namespace unchanged.
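One possible approach, offered only as a sketch: xml.etree.ElementTree does not expose the encoding of a parsed document, so the code below reads it from the declaration with a regular expression and passes it back when serializing. The helper declared_encoding is a hypothetical name, and the inline bytes stand in for the contents of file.xml. The standard library writes the declaration with single quotes, so the quote style of the original is not preserved.

```python
import re
import xml.etree.ElementTree as ET

def declared_encoding(raw, default='utf-8'):
    """Return the encoding named in the XML declaration, or the default."""
    match = re.match(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    return match.group(1).decode('ascii') if match else default

# Stand-in for open('file.xml', 'rb').read()
raw = b'<?xml version="1.0" encoding="utf-8"?>\n<foo/>'
encoding = declared_encoding(raw)

root = ET.fromstring(raw)
bar = ET.SubElement(root, 'bar')
bar.text = 'confusing dev'

# The xml_declaration argument of tostring() needs Python 3.8 or later.
result = ET.tostring(root, encoding=encoding, xml_declaration=True)
print(result.decode(encoding))
```

If lxml is an option, its parsed trees carry this information in tree.docinfo.encoding, which may be simpler than parsing the declaration by hand.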

Delete entire node using lxml

I have an XML document like the following:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
<dependencies>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[3.8,)</version>
</dependency>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[4.1,)</version>
</dependency>
</dependencies>
</project>
How can I delete the entire node "dependencies"?
I have looked at other questions and answers on Stack Overflow. What is different here is the namespace aspect of this XML, and the other questions ask how to delete a subelement like "dependency", while I want to delete the whole node "dependencies". Is there an easy way using lxml to delete the entire node?
The following gives a "'NoneType' object has no attribute 'remove'" error:
from lxml import etree as ET
tree = ET.parse('pom.xml')
namespace = '{http://maven.apache.org/POM/4.0.0}'
root = ET.Element(namespace+'project')
root.find(namespace+'dependencies').remove()
You can create a dict mapping for your namespace(s), find the node, and then call root.remove, passing it the node. You call remove on the parent, not on the node itself:
x = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
<dependencies>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[3.8,)</version>
</dependency>
<dependency>
<groupId>asdf</groupId>
<artifactId>asdf</artifactId>
<version>[4.1,)</version>
</dependency>
</dependencies>
</project>"""
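import lxml.etree as et
from io import BytesIO  # Python 3; the original used the Python 2 StringIO module
# Feed bytes, since the document carries its own encoding declaration
tree = et.parse(BytesIO(x.encode("utf-8")))
root = tree.getroot()
nsmap = {"mav": "http://maven.apache.org/POM/4.0.0"}
root.remove(root.find("mav:dependencies", namespaces=nsmap))
print(et.tostring(tree).decode())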
Which would give you:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>company</groupId>
<artifactId>art-id</artifactId>
<version>RELEASE</version>
</parent>
<properties>
<tomcat.username>admin</tomcat.username>
<tomcat.password>admin</tomcat.password>
</properties>
</project>
First, grab the root node. Since it is <project ... > (vs <project .../>) the "parent" element of dependencies is project. Example from the documentation:
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
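Once you have the root, check root.tag (an attribute, not a method). Because the document declares a default namespace, it will be "{http://maven.apache.org/POM/4.0.0}project" rather than plain "project".
Then do root.remove(root.find('{http://maven.apache.org/POM/4.0.0}dependencies')), where root is the project node. The tag passed to find must include the namespace, otherwise find returns None.
If it were <project .../>, the element would be empty and there would be no dependencies child to remove. I can see exactly where you are coming from, though.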
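Put together with the standard library, the removal might look like the sketch below. It uses a shortened inline POM instead of pom.xml, and the prefix mav is an arbitrary choice. The register_namespace call is optional and only keeps the default namespace unprefixed in the output.

```python
import xml.etree.ElementTree as ET

POM_NS = 'http://maven.apache.org/POM/4.0.0'
# Keep the default namespace unprefixed when serializing.
ET.register_namespace('', POM_NS)

pom = f"""<project xmlns="{POM_NS}">
  <modelVersion>4.0.0</modelVersion>
  <dependencies>
    <dependency><groupId>asdf</groupId></dependency>
  </dependencies>
</project>"""

root = ET.fromstring(pom)
ns = {'mav': POM_NS}

deps = root.find('mav:dependencies', ns)
if deps is not None:      # find() returns None when nothing matches
    root.remove(deps)     # remove() is called on the parent element

result = ET.tostring(root, encoding='unicode')
print(result)
```

The None check is what guards against the 'NoneType' error from the question: the lookup and the removal are separate steps, so a failed lookup is easy to spot.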
