Python XML modifying by ElementTree destroys the XML structure - python

I am using Python 3.5.1 on Windows to modify the text inside an XML file. The modification works, but after saving the tree all the empty tags are collapsed, as in the following example:
<HOSTNAME></HOSTNAME> is changed to <HOSTNAME />
A child with text between the tags is left alone:
<HOSTNAME>tnas2</HOSTNAME> is written back as
<HOSTNAME>tnas2</HOSTNAME>, which is the same as the source.
The source XML file is:
<ROOT>
<DeletedName>
<VERIFY_DEST_SIZE>Y</VERIFY_DEST_SIZE>
<VERIFY_BYTES>Y</VERIFY_BYTES>
<TIMESTAMP>XXXXXXXXXDeletedXXXXXXXXXX</TIMESTAMP>
<EM_USERS>XXXXXXXXXDeletedXXXXXXXXXX</EM_USERS>
<EM_GROUPS></EM_GROUPS>
<LOCAL>
<HOSTNAME></HOSTNAME>
<PORT></PORT>
<USERNAME>XXXXXXXXXDeletedXXXXXXXXXX</USERNAME>
<PASSWORD>XXXXXXXXXDeletedXXXXXXXXXX</PASSWORD>
<HOME_DIR></HOME_DIR>
<OS_TYPE>Windows</OS_TYPE>
</LOCAL>
<REMOTE>
<HOSTNAME>DeletedHostName</HOSTNAME>
<PORT>22</PORT>
<USERNAME>XXXXXXXXXDeletedXXXXXXXXXX</USERNAME>
<PASSWORD>XXXXXXXXXDeletedXXXXXXXXXX</PASSWORD>
<HOME_DIR>XXXXXXXXXDeletedXXXXXXXXXX</HOME_DIR>
<OS_TYPE>Unix</OS_TYPE>
<CHAR_SET>UTF-8</CHAR_SET>
<SFTP>Y</SFTP>
<ENCRYPTION>Blowfish</ENCRYPTION>
<COMPRESSION>N</COMPRESSION>
</REMOTE>
</DeletedName>
</ROOT>
the code is:
import os
import xml.etree.ElementTree as ET
from shutil import copyfile
import datetime
def AddAuthUserToAccountsFile(AccountsFile,RemoteMachine,UserToAdd):
    today = datetime.date.today()
    today = str(today)
    print(today)
    BackUpAccountsFile = AccountsFile + "-" + today
    try:
        tree = ET.parse(AccountsFile)
    except:
        pass
    try:
        copyfile(AccountsFile,BackUpAccountsFile)
    except:
        pass
    root = tree.getroot()
    UsersTags = tree.findall('.//EM_USERS')
    for UsersList in UsersTags:
        Users = UsersList.text
        Users = UsersList.text = Users.replace("||","|")
        if UserToAdd not in Users:
            print("The Users were : ",Users, "--->> Adding ",UserToAdd)
            UsersList.text = Users + UserToAdd +"|"
    tree.write(AccountsFile)
I would appreciate any help with this strange behavior.
Thanks,
Miki

OK, I found the solution:
adding method='html' to the tree.write call keeps the empty tags as needed.
tree.write(AccountsFile, method='html')
Thanks.
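A possible alternative, sketched here on a trimmed copy of the LOCAL block from the question rather than on the real accounts file: since Python 3.4, `ElementTree.write` and `ET.tostring` accept a keyword-only `short_empty_elements` argument. Passing `False` keeps the XML serializer and still writes empty elements as an explicit open/close pair, whereas `method='html'` switches to HTML serialization rules. The sample string and variable names below are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed sample based on the LOCAL block of the question's file.
SOURCE = "<LOCAL><HOSTNAME></HOSTNAME><PORT></PORT><OS_TYPE>Windows</OS_TYPE></LOCAL>"

root = ET.fromstring(SOURCE)

# Default XML serialization collapses empty elements to <TAG />.
collapsed = ET.tostring(root, encoding="unicode")

# short_empty_elements=False keeps them as <TAG></TAG>.
expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(collapsed)
print(expanded)
```

The same keyword can be passed when writing to a file, for example `tree.write(AccountsFile, short_empty_elements=False)`.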

Related

How to add namespace prefix at root XML using Python LXML?

I would like to have the following NS prefix <qsp: and </qsp:
<qsp:QSPart xmlns:qsp="urn:qvalent:quicksuper:gateway">
<qsp:MemberRegistrationRequest/>
</qsp:QSPart>
How do I do that with lxml in Python?
from lxml import etree
nsmap = {'qsp': 'urn:qvalent:quicksuper:gateway'}
nsprefix = nsmap['qsp']
QSPart = etree.Element('QSPart', nsmap=nsmap)
MemberRegistrationRequest = etree.SubElement(QSPart, etree.QName(nsprefix, 'MemberRegistrationRequest'))
print(etree.tostring(QSPart, pretty_print=True, encoding=str))
Result:
<QSPart xmlns:qsp="urn:qvalent:quicksuper:gateway">
<qsp:MemberRegistrationRequest/>
</QSPart>
According to the documentation, you need to fully qualify the element name in your call to etree.Element:
from lxml import etree
nsmap = {'qsp': 'urn:qvalent:quicksuper:gateway'}
nsprefix = nsmap['qsp']
QSPart = etree.Element(f'{{{nsmap["qsp"]}}}QSPart')
MemberRegistrationRequest = etree.SubElement(QSPart, etree.QName(nsprefix, 'MemberRegistrationRequest'))
print(etree.tostring(QSPart, pretty_print=True, encoding=str))
This outputs:
<ns0:QSPart xmlns:ns0="urn:qvalent:quicksuper:gateway">
<ns0:MemberRegistrationRequest/>
</ns0:QSPart>
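The generated `ns0` prefix appears because no prefix mapping was supplied for the namespace. In lxml, passing the qualified tag together with `nsmap=nsmap` to `etree.Element` keeps the `qsp` prefix. The sketch below shows the same idea with the standard library's `xml.etree.ElementTree` instead of lxml (a swapped-in library, chosen so the example runs without third-party packages): the prefix is registered once, and both elements use fully qualified `{uri}name` tags.

```python
import xml.etree.ElementTree as ET

NS = "urn:qvalent:quicksuper:gateway"

# Tell the serializer which prefix to use for this namespace URI.
ET.register_namespace("qsp", NS)

# Both elements are created with fully qualified {uri}name tags.
qs_part = ET.Element(f"{{{NS}}}QSPart")
ET.SubElement(qs_part, f"{{{NS}}}MemberRegistrationRequest")

xml_text = ET.tostring(qs_part, encoding="unicode")
print(xml_text)
```

Note that `register_namespace` is global to the process, so it affects every tree serialized afterwards.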
Since you know your expected output, I wouldn't bother with all that (though I understand many people frown on this approach...) - just use from and to string:
frag_text = """<qsp:QSPart xmlns:qsp="urn:qvalent:quicksuper:gateway">
<qsp:MemberRegistrationRequest/>
</qsp:QSPart>"""
fragment = etree.fromstring(frag_text)
print(etree.tostring(fragment).decode())
Output should be your expected output.

AttributeError when assigning value to function for XML data extraction

I'm coding a script to extract information from several XML files that share the same structure but are missing sections when there is no information for a tag. The easiest way to handle this was try/except, so instead of getting "AttributeError: 'NoneType' object has no attribute 'find'" I assign an empty string ('') to the variable in the except block. Something like this:
try:
    string1=root.find('value1').find('value2').find('value3').text
except:
    string1=''
The issue is that I want to shrink my code by using a function:
def extract(string):
    tempstr=''
    try:
        tempstr=string.replace("\n", "")
    except:
        if tempstr is None:
            tempstr=""
    return string
And then I try to call it like this:
string1=extract(root.find('value1').find('value2').find('value3').text)
When value2 or value3 does not exist in the XML being processed, I get an AttributeError even if I don't use the variable inside the function, which makes the function useless.
Is there a way to make the function work, perhaps without checking beforehand whether the value passed in is valid?
Solution:
I'm using a mix of both answers:
def extract(root, xpath):
    tempstr=''
    try:
        tempstr=root.findall(xpath)[0].text.replace("\n", "")
    except:
        tempstr=''#To avoid getting a Nonetype object
    return tempstr
You can try something like this:
def extract(root, children_keys: list):
    target_object = root
    result_text = ''
    try:
        for child_key in children_keys:
            target_object = target_object.find(child_key)
        result_text = target_object.text
    except:
        pass
    return result_text
The for loop walks one level deeper into the XML structure on each iteration (children_keys is a list of nested tag names that you define yourself, i.e. the path to your element).
If an error is raised anywhere inside that code, you get '' as the result.
Example XML (source):
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>
<y>Don't forget me this weekend!</y>
</body>
</note>
Example:
import xml.etree.ElementTree as ET
tree = ET.parse('note.xml')
root = tree.getroot()
children_keys = ['body', 'y']
result_string = extract(root, children_keys)
print(result_string)
Output:
"Don't forget me this weekend!"
Use an XPath expression:
import xml.etree.ElementTree as ET
xml1 = '''<r><v1><v2><v3>a string</v3></v2></v1></r>'''
root = ET.fromstring(xml1)
v3 = root.findall('./v1/v2/v3')
if v3:
    print(v3[0].text)
else:
    print('v3 not found')
xml2 = '''<r><v1><v3>a string</v3></v1></r>'''
root = ET.fromstring(xml2)
v3 = root.findall('./v1/v2/v3')
if v3:
    print(v3[0].text)
else:
    print('v3 not found')
Output:
a string
v3 not found
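The original call fails because Python evaluates the argument expression, including the whole `find(...).find(...)` chain, before `extract` is ever entered, so no try/except inside the function can catch the error. A minimal sketch of one way around this, assuming an empty string is an acceptable fallback: pass the path instead of the value and let `Element.findtext` do the lookup, since it returns `None` (or a supplied default) when nothing matches. The `extract` signature here is illustrative and is not the asker's final code.

```python
import xml.etree.ElementTree as ET

def extract(root, path, default=""):
    """Return the text at path with newlines removed, or default if absent."""
    text = root.findtext(path)  # None when no element matches the path
    if text is None:
        return default
    return text.replace("\n", "")

complete = ET.fromstring("<r><v1><v2><v3>a\nstring</v3></v2></v1></r>")
missing = ET.fromstring("<r><v1><v3>a string</v3></v1></r>")

found = extract(complete, "v1/v2/v3")
not_found = extract(missing, "v1/v2/v3")
print(repr(found), repr(not_found))
```

Because the lookup happens inside the function, a missing intermediate element never raises at the call site.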

Extract data from xml to Excel (Python 2.7 )

I'm attempting to extract some data from an XML file and create an Excel file with the information.
XML File:
<UniversalTransaction>
<TransactionInfo>
<DataContext>
<DataSourceCollection>
<DataSource>
<Type>AccountingInvoice</Type>
<Key>AR INV 00001006</Key>
</DataSource>
</DataSourceCollection>
<Company>
<Code>DCL</Code>
<Country>
<Code>CL</Code>
<Name>Chile</Name>
</Country>
<Name>Your Chile Corp</Name>
</Company>
...etc
Then I wrote this code in Python 2.7:
import xml.etree.ElementTree as ET
import xlwt
from datetime import datetime
tree = ET.parse('ar.xml')
root = tree.getroot()
#extract xml
invoice = root.findall('DataSource')
arinv = root.find('Key').text
country = root.findall('Company')
ctry = root.find('Name').text
wb = xlwt.Workbook()
ws = wb.add_sheet('A Test Sheet')
ws.write(0, 0, arinv)
ws.write(0, 1, ctry)
wb.save('example2.xls')
But I get this error:
arinv = root.find('Key').text
'NoneType' object has no attribute 'text'
And I guess it will be the same with
ctry = root.find('Name').text
Also, when I change the "extract xml" part of the code to this
for ar in root.findall('DataContext'):
    nro = []
    ctry = []
    inv = ar.find('Key').text
    nro.append(inv)
    country = ar.find('Name').text
    ctry.append(country)
I get the following error:
ws.write(0, 0, arinv)
name 'arinv' is not defined
Then again, I guess it's the same with "ctry".
Windows 10, Python 2.7
I'd appreciate any help, thanks.
It is better to ask shorter questions, without the bulk of surrounding code. You will probably find the solution yourself once you carefully narrow the problem down to a short, exact question.
According to the docs, Element.find only searches the direct children of the element. You need an XPath expression (see the XPath support section of the docs), such as
root.findall('.//Key')[0].text
(this assumes that Key always exists, contains text, and is unique within the document; that is, no validation is done)
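A minimal sketch of the difference, using an abbreviated copy of the question's fragment with the open tags closed off by hand (the real file continues past this point, so the closing tags are an assumption). `find('Key')` on the root returns `None` because `Key` is not a direct child, while a `.//` path searches all descendants. The Excel step is left out so the example needs only the standard library.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample; the closing tags after </Company> are assumed.
SAMPLE = """<UniversalTransaction>
  <TransactionInfo>
    <DataContext>
      <DataSourceCollection>
        <DataSource>
          <Type>AccountingInvoice</Type>
          <Key>AR INV 00001006</Key>
        </DataSource>
      </DataSourceCollection>
      <Company>
        <Code>DCL</Code>
        <Country>
          <Code>CL</Code>
          <Name>Chile</Name>
        </Country>
        <Name>Your Chile Corp</Name>
      </Company>
    </DataContext>
  </TransactionInfo>
</UniversalTransaction>"""

root = ET.fromstring(SAMPLE)

# Key is several levels down, so a direct-child lookup finds nothing.
direct_child = root.find("Key")

# './/' searches descendants; findtext returns the default when nothing matches.
invoice_key = root.findtext(".//DataSource/Key", default="")
country_name = root.findtext(".//Company/Country/Name", default="")
company_name = root.findtext(".//Company/Name", default="")

print(direct_child, invoice_key, country_name, company_name)
```

Spelling out the parent in the path matters here because the document contains two different `Name` elements.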

Python ElementTree how do send the value of a variable to xml output

I want to update the xml file with the current date in lastrun date attribute.
The code below results in + str(mprocessdate) + and I want it to say 2015-04-16.
What's wrong with my code? Why do I get that string instead of the actual date?
company1.xml
<corp>
<lastrun date="20150123" />
<company id="18888802223">
<name>South Plantation</name>
<P_DNIS>99603</P_DNIS>
<Tracking_Phone>+18888802223</Tracking_Phone>
<Account>South Plantation</Account>
<AppendValue> Coupon</AppendValue>
<InsertCoupon>Y</InsertCoupon>
</company>
</corp>
Script
import datetime
from xml.etree import ElementTree as ET
mprocessdate = datetime.date.today()
print (mprocessdate)
tree = ET.parse("company1.xml")
mlastrun = tree.find('lastrun')
mlastrun.set('date', '+ str(mprocessdate) + ')
tree.write('company.xml')
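Leave off the quotes and the + signs and pass the expression itself. Inside quotes, + str(mprocessdate) + is just literal text, so it is written to the file unchanged.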
import datetime
from xml.etree import ElementTree as ET
mprocessdate = datetime.date.today()
print (mprocessdate)
tree = ET.parse("company1.xml")
mlastrun = tree.find('lastrun')
mlastrun.set('date', str(mprocessdate))
tree.write('company.xml')
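A small self-contained sketch of the difference, parsing a one-element version of the file from a string and using a fixed date so the result is reproducible (both are assumptions made for the example). It also shows `strftime`, in case the attribute should keep the compact `20150123` style already used in the file.

```python
import datetime
import xml.etree.ElementTree as ET

root = ET.fromstring('<corp><lastrun date="20150123" /></corp>')
lastrun = root.find("lastrun")

# Fixed date for a reproducible example; the script uses date.today().
process_date = datetime.date(2015, 4, 16)

# Quoted: the expression is never evaluated, it is stored as literal text.
lastrun.set("date", "+ str(process_date) +")
literal_value = lastrun.get("date")

# Unquoted: str() runs and yields the ISO form YYYY-MM-DD.
lastrun.set("date", str(process_date))
iso_value = lastrun.get("date")

# strftime gives control over the format, e.g. YYYYMMDD as in the source file.
lastrun.set("date", process_date.strftime("%Y%m%d"))
compact_value = lastrun.get("date")

print(literal_value, iso_value, compact_value)
```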

Using "info.get" for a child element in Python / lxml

I'm trying to get the attribute of a child element in Python, using lxml.
This is the structure of the xml:
<GroupInformation groupId="crid://thing.com/654321" ordered="true">
<GroupType value="show" xsi:type="ProgramGroupTypeType"/>
<BasicDescription>
<Title type="main" xml:lang="EN">A programme</Title>
<RelatedMaterial>
<HowRelated href="urn:eventis:metadata:cs:HowRelatedCS:2010:boxCover">
<Name>Box cover</Name>
</HowRelated>
<MediaLocator>
<mpeg7:MediaUri>file://ftp.something.com/Images/123456.jpg</mpeg7:MediaUri>
</MediaLocator>
</RelatedMaterial>
</BasicDescription>
The code I've got is below. The bit I want to return is the 'value' attribute ("show" in the example) of the GroupType element (the 'grouptype' variable, third line from the bottom):
file_name = input('Enter the file name, including .xml extension: ')
print('Parsing ' + file_name)
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2010','mpeg7':'urn:tva:mpeg7:2008'}
with open(file_name+'.log', 'w', encoding='utf-8') as f:
    for info in root.xpath('//xmlns:GroupInformation', namespaces=nsmap):
        crid = (info.get('groupId'))
        grouptype = info.find('.//xmlns:GroupType', namespaces=nsmap)
        gtype = grouptype.get('value')
        titlex = info.find('.//xmlns:BasicDescription/xmlns:Title', namespaces=nsmap)
        title = titlex.text if titlex is not None else 'Missing'
Can anyone explain to me how to implement it? I had a quick look at the xsi namespace, but was unable to get it to work (and didn't know if it was the right thing to do).
Is this what you are looking for?
grouptype.attrib['value']
PS: why the parentheses around the assigned values? They look unnecessary.
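For comparison, a hedged sketch of both ways to read the attribute, written against the standard library's `xml.etree.ElementTree` rather than lxml so it runs without extra packages; `get`, `attrib`, `find` and `findtext` are also available on lxml elements. The `TVAMain` root, the `tva` prefix and the trimmed sample are assumptions, since the question shows only part of the file. `get` returns `None` for a missing attribute, while `attrib[...]` raises `KeyError`.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample with the default namespace from the question's nsmap.
SAMPLE = """<TVAMain xmlns="urn:tva:metadata:2010">
  <GroupInformation groupId="crid://thing.com/654321" ordered="true">
    <GroupType value="show"/>
    <BasicDescription>
      <Title type="main">A programme</Title>
    </BasicDescription>
  </GroupInformation>
</TVAMain>"""

ns = {"tva": "urn:tva:metadata:2010"}
root = ET.fromstring(SAMPLE)

results = []
for info in root.findall(".//tva:GroupInformation", ns):
    crid = info.get("groupId")
    grouptype = info.find("tva:GroupType", ns)
    # get() tolerates a missing attribute; guard against a missing element too.
    gtype = grouptype.get("value") if grouptype is not None else "Missing"
    title = info.findtext("tva:BasicDescription/tva:Title",
                          default="Missing", namespaces=ns)
    results.append((crid, gtype, title))

# Dictionary-style access returns the same value but raises KeyError if absent.
same_value = root.find(".//tva:GroupType", ns).attrib["value"]

print(results, same_value)
```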
