ElementTree.findall returns none - python

I want to find all password attribute while xml parsing and replace that with string "password". To find the password attribute I tried findall(), but it return "None".
Python version: python2.6
Sample code :
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
a= tree.parse("/home/xxxx/securityfile_test.xml")
z = tree.findall(".//password")
print z
Can any one please help
Sample xml
<?xml version="1.0" encoding="UTF-8"?>
<security xmlns="http:xxxxx">
<group name="Abc" description="xxxxx.">
<rMember ref="A"/>
</group>
<user name="yyyy" password="**####***">
<gMember ref="A"/>
</user>
<group name="oooo" description="XXXXx">
<rMember ref="O"/>
</group>
<user name="zzzz" password="****###***">
<gMember ref="A"/>
</user>
</security>

EDIT: OP is using Python 2.6, this answer only valid for Python 2.7+
See the elementtree documentation.
To select elements based on attributes, you need to use a different syntax. If you use:
z = tree.findall(".//*[#password]")
This will work. The * means 'select all elements', the [#password] means 'with the password attribute'.
Results on your XML file with Python 2.7.12:
[<Element '{http:xxxxx}user' at 0x585ad30>, <Element '{http:xxxxx}user' at 0x585ae48>]

This will pull out the elements you need, update the attribute values and dump the xml result:
import xml.etree.ElementTree as Etree
tree = Etree.parse("file.xml")
root = tree.getroot()
print(root)
z = root.findall(".//*[#password]")
for elm in z:
elm.attrib['password'] = "Password"
print(Etree.dump(root))
I could only test on 2.7.12 as it's the only version I have for python2. I would definitely recommend you upgrade if you can, but this should hopefully give you a point in the right direction if you can't.

Related

How to rewrite thid XML file?

I trying to rewrite this xml file containing this XML code:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
My python code:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
for model in root.findall('Model'):
name = model.find('Name').text
if name == 'token':
model.find('Value').text = '123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ'
if name == 'chat_id':
model.find('Value').text = '1234567890'
tree.write('xml_file.xml')
It works but I get the same file:
<?xml version="1.0" encoding="UTF-8"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token"/>
<Value>5660191076:AAEY8RI3hXcI3dEvjWAj7p2e7DdxOMNjPfk8</Value>
</Model>
<Defaults/>
<Model>
<Name>chat_id</Name>
<Value>5578940124</Value>
</Model>
<Defaults/>
</ModelList>
</BrowserAutomationStudioProject>
What's wrong with my code?
Even ChatGPT can't help me haha
I even tried to print it but it doesn't work
What I should do?
Please help me.
As described in the documentation, Element.findall() finds only elements with a tag which are direct children of the current element.. You need to force ET to selects all subelements, on all levels beneath the current element by using //.
Since <Model> is not a direct child of root (it's a grandchild, or something to that effect :)), root.findall('Model') finds nothing. So to get ET to find it, you need to modify that to
root.findall('.//Model')
and it should work.
You could also use for model in root.findall('ModelList/Model').
If you know the order of the xml tag you can do something like pop() the values from a list by iterate through the tree:
import xml.etree.ElementTree as ET
tree = ET.parse('Actual.xml')
root = tree.getroot()
input_value = ['1234567890','123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ']
for elem in root.iter():
if elem.tag == "Value":
elem.text = input_value.pop()
print(elem.tag, elem.text)
tree.write('xml_file.xml')
Output:
<?xml version="1.0"?>
<BrowserAutomationStudioProject>
<ModelList>
<Model>
<Name>token</Name>
<Description ru="token" en="token" />
<Value>123456789:ABCDEFGHIJKLMNOPQRSTUVWXYZ</Value>
</Model>
<Defaults />
<Model>
<Name>chat_id</Name>
<Value>1234567890</Value>
</Model>
<Defaults />
</ModelList>
</BrowserAutomationStudioProject>

Parsing subchilds in XML with ElementTree

Im trying to extract information from a XML-document with ElementTree in Python 3.2.
The XML looks like this:
<Page Id="1">
<Group>4</Group>
<Type>
<Letter>B</Letter>
<Number>101</Number>
<Deep>
<A>900</A>
<B>900</B>
</Deep>
</Type>
</Page>
I manage to get the elementdata from "Group" with:
for Page in root.iter('Page'):
Group = Page.find('Group').text
And "Letter"-data with:
for Type in root.iter('Type'):
Dim = Type.find('Letter').text
However I can't figure out how to get the data from the subchilds of "Deep" (A and B).
All help is greatly appreciated!
You are very close. Use find to find the Deep tag and the iterate over it.
Ex:
import xml.etree.ElementTree as ET
tree = ET.parse(filename)
root = tree.getroot()
for Type in root.iter('Type'):
for deep_tag in Type.find("Deep"):
print( deep_tag.text )
Output:
900
900

How to force ElementTree to keep xmlns attribute within its original element?

I have an input XML file:
<?xml version='1.0' encoding='utf-8'?>
<configuration>
<runtime name="test" version="1.2" xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<ns0:assemblyBinding>
<ns0:dependentAssembly />
</ns0:assemblyBinding>
</runtime>
</configuration>
...and Python script:
import xml.etree.ElementTree as ET
file_xml = 'test.xml'
tree = ET.parse(file_xml)
root = tree.getroot()
print (root.tag)
print (root.attrib)
element_runtime = root.find('.//runtime')
print (element_runtime.tag)
print (element_runtime.attrib)
tree.write(file_xml, xml_declaration=True, encoding='utf-8', method="xml")
...which gives the following output:
>test.py
configuration
{}
runtime
{'name': 'test', 'version': '1.2'}
...and has an undesirable side-effect of modifying XML into:
<?xml version='1.0' encoding='utf-8'?>
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<runtime name="test" version="1.2">
<ns0:assemblyBinding>
<ns0:dependentAssembly />
</ns0:assemblyBinding>
</runtime>
</configuration>
My original script modifies XML so I do have to call tree.write and save edited file. But the problem is that ElementTree parser moves xmlns attribute from runtime element up to the root element configuration which is not desirable in my case.
I can't remove xmlns attribute from the root element (remove it from the dictionary of its attributes) as it is not listed in a list of its attributes (unlike the attributes listed for runtime element).
Why does xmlns attribute never gets listed within the list of attributes for any element?
How to force ElementTree to keep xmlns attribute within its original element?
I am using Python 3.5.1 on Windows.
xml.etree.ElementTree pulls all namespaces into the first element as it internally doesn't track on which element the namespace was declared originally.
If you don't want that, you'll have to write your own serialisation logic.
The better alternative would be to use lxml instead of xml.etree, because it preserves the location where a namespace prefix is declared.
Following #mata advice, here I give an answer with an example with code and xml file attached.
The xml input is as shown in the picture (original and modified)
The python codes check the NtnlCcy Name and if it is "EUR", convert the Price to USD (by multiplying EURUSD: = 1.2) and change the NtnlCcy Name to "USD".
The python code is as follows:
from lxml import etree
pathToXMLfile = r"C:\Xiang\codes\Python\afmreports\test_original.xml"
tree = etree.parse(pathToXMLfile)
root = tree.getroot()
EURUSD = 1.2
for Rchild in root:
print ("Root child: ", Rchild.tag, ". \n")
if Rchild.tag.endswith("Pyld"):
for PyldChild in Rchild:
print ("Pyld Child: ", PyldChild.tag, ". \n")
Doc = Rchild.find('{001.003}Document')
FinInstrNodes = Doc.findall('{001.003}FinInstr')
for FinInstrNode in FinInstrNodes:
FinCcyNode = FinInstrNode.find('{001.003}NtnlCcy')
FinPriceNode = FinInstrNode.find('{001.003}Price')
FinCcyNodeText = ""
if FinCcyNode is not None:
CcyNodeText = FinCcyNode.text
if CcyNodeText == "EUR":
PriceText = FinPriceNode.text
Price = float(PriceText)
FinPriceNode.text = str(Price * EURUSD)
FinCcyNode.text = "USD"
tree.write(r"C:\Xiang\codes\Python\afmreports\test_modified.xml", encoding="utf-8", xml_declaration=True)
print("\n the program runs to the end! \n")
As we compare the original and modified xml files, the namespace remains unchanged, the whole structure of the xml remains unchanged, only some NtnlCcy and Price Nodes have been changed, as desired.
The only minor difference we do not want is the first line. In the original xml file, it is <?xml version="1.0" encoding="UTF-8"?>, while in the modified xml file, it is <?xml version='1.0' encoding='UTF-8'?>. The quotation sign changes from double quotation to single quotation. But we think this minor difference should not matter.
The original file context will be attached for your easy test:
<?xml version="1.0" encoding="UTF-8"?>
<BizData xmlns="001.001">
<Hdr>
<AppHdr xmlns="001.002">
<Fr>
<Id>XXX01</Id>
</Fr>
<To>
<Id>XXX02</Id>
</To>
<CreDt>2019-10-25T15:38:30</CreDt>
</AppHdr>
</Hdr>
<Pyld>
<Document xmlns="001.003">
<FinInstr>
<Id>NLENX240</Id>
<FullNm>AO.AAI</FullNm>
<NtnlCcy>EUR</NtnlCcy>
<Price>9</Price>
</FinInstr>
<FinInstr>
<Id>NLENX681</Id>
<FullNm>AO.ABN</FullNm>
<NtnlCcy>USD</NtnlCcy>
<Price>10</Price>
</FinInstr>
<FinInstr>
<Id>NLENX320</Id>
<FullNm>AO.ING</FullNm>
<NtnlCcy>EUR</NtnlCcy>
<Price>11</Price>
</FinInstr>
</Document>
</Pyld>

XPath with LXML Element

I am trying to parse an XML document using lxml etree. The XML doc I am parsing looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.openarchives.org/OAI/2.0/">\t
<codeBook version="2.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ddi:codebook:2_5" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd">
<docDscr>
<citation>
<titlStmt>
<titl>Test Title</titl>
</titlStmt>
<prodStmt>
<prodDate/>
</prodStmt>
</citation>
</docDscr>
<stdyDscr>
<citation>
<titlStmt>
<titl>Test Title 2</titl>
<IDNo agency="UKDA">101</IDNo>
</titlStmt>
<rspStmt>
<AuthEnty>TestAuthEntry</AuthEnty>
</rspStmt>
<prodStmt>
<copyright>Yes</copyright>
</prodStmt>
<distStmt/>
<verStmt>
<version date="">1</version>
</verStmt>
</citation>
<stdyInfo>
<subject>
<keyword>2009</keyword>
<keyword>2010</keyword>
<topcClas>CLASS</topcClas>
<topcClas>ffdsf</topcClas>
</subject>
<abstract>This is an abstract piece of text.</abstract>
<sumDscr>
<timePrd event="single">2020</timePrd>
<nation>UK</nation>
<anlyUnit>Test</anlyUnit>
<universe>test</universe>
<universe>hello</universe>
<dataKind>fdsfdsf</dataKind>
</sumDscr>
</stdyInfo>
<method>
<dataColl>
<timeMeth>test timemeth</timeMeth>
<dataCollector>test data collector</dataCollector>
<sampProc>test sampprocess</sampProc>
<deviat>test deviat</deviat>
<collMode>test collMode</collMode>
<sources/>
</dataColl>
</method>
<dataAccs>
<setAvail>
<accsPlac>Test accsPlac</accsPlac>
</setAvail>
<useStmt>
<restrctn>NONE</restrctn>
</useStmt>
</dataAccs>
<othrStdyMat>
<relPubl>122</relPubl>
<relPubl>12332</relPubl>
</othrStdyMat>
</stdyDscr>
</codeBook>
</metadata>
I wrote the following code to try and process it:
from lxml import etree
import pdb
f = open('/vagrant/out2.xml', 'r')
xml_str = f.read()
xml_doc = etree.fromstring(xml_str)
f.close()
From what I understand from the lxml xpath docs, I should be able to get the text from a specific element as follows:
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
However, when I run this it returns an empty array.
The only xpath I can get to return something is using a wildcard:
xml_doc.xpath('*')
Which returns [<Element {ddi:codebook:2_5}codeBook at 0x7f8da8a413f8>].
I've read through the docs and I'm not understanding what is going wrong with this. Any help is appreciated.
You need to take the default namespace into account so instead of
xml_doc.xpath('/metadata/codeBook/docDscr/citation/titlStmt/titl/text()')
use
xml_doc.xpath.xpath(
'/oai:metadata/ddi:codeBook/ddi:docDscr/ddi:citation/ddi:titlStmt/ddi:titl/text()',
namespaces={
'oai': 'http://www.openarchives.org/OAI/2.0/',
'ddi': 'ddi:codebook:2_5'
}
)

Python - parse xml with lxml trouble

I've found a lot of questions on this issue but nothing I saw fits mine. I'm new to lxml so need some help.
my users.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<user>
<login>elena</login>
<password>elena</password>
<group>1</group>
</user>
<user>
<login>anele</login>
<password>anele</password>
<group>2</group>
</user>
</root>
the trouble function:
def analize_data(login):
doc = etree.parse("/myapp/users.xml")
for elem in doc.iter(tag='login'):
if elem.text == login:
parent = elem.getparent()
group = etree.SubElement(parent, 'group')
return group.text
What I need:
to find a user tag with login passed to function and get the text of group subelement of this user. But this function returns None when testing. What am I doing wrong and how to fix it?
I'm new to all these things, so need help. Thanks in advance!
Try using:
group = parent.iterchildren(tag="group").next()
etree.SubElement does something completely different:
This function creates an element instance, and appends it to an existing element.
Which is clearly not what you want.

Categories