Change value in specific child element - python

I have some issues with my script.
First, I want to make sure that the user has given the correct name. For example, if I enter "Name", it should not match anything in my XML. If I enter "NameY", it should match that exact name and nothing else (for example, "NameX" should not be matched). I tried using a word boundary in a regex, but it did not help as I thought it would.
When I try to change the value for NameY, the script changes all the values (even those for NameX). I believe this is because it can't separate the names?
I am using argparse for this and the code is in another function.
args.name = "the name to look for"
args.dirname = "file location"
args.value = "the value I want to change to"
wordboundry = re.compile(r'(<NAME>(%s\b)<\/NAME>)' % args.name)
getName = ("{0}ROOTS/{0}ROOT/{0}ELEM/{0}SPEC/[{0}NAME]")
getDigit = ("{0}ROOTS/{0}ROOT/{0}ELEM/{0}SPEC/{0}DP/{0}NVP/[{0}CHECK]/[{0}DIGIT]")
for fileName in glob.glob(args.dirname):
print fileName
with open(fileName,"r") as file:
my_files_content = file.read()
if args.name in my_files_content: #instead of using args.name i was trying to use if re.search(wordboundry, my_files_content):
print "Parameter exist in this file\n"
tree = ElementTree.ElementTree()
tree.parse(fileName)
ElementTree.register_namespace("", "http://something.com")
namespace = "{http://something.com}"
for names in tree.findall(getName.format(namespace)):
mathcName = names.find('{http://something.com}NAME').text
if mathcName == args.paramName:
for digits in tree.findall(getDigit.format(namespace)):
digitNumber = digits.find('{http://something.com}DIGIT').text
convertToString = ' '.join([str(mystring) for mystring in args.value]) #Convert to string
digits.find('{http://something.com}DIGIT').text = convertToString #This allows the script to changes the digit in the xml file
tree.write("test.xml", encoding='utf-8', xml_declaration=True)
else: print"NO MATCHES\n"
else:
print "i am not here"
else:
print "Parameter does not exist in this file\n"
My XML
<?xml version="1.0" encoding="utf-8"?>
<TOP xmlns="http://something.com">
<ROOTS>
<ROOT>
<ELEM>
<SPEC>
<NAME>NameX</NAME>
<DP>
<NVP>
<CHECK>NameX_1</CHECK>
<DIGIT>3</DIGIT>
</NVP>
<NVP>
<CHECK>NameX_2</CHECK>
<DIGIT>20</DIGIT>
</NVP>
</DP>
</SPEC>
<SPEC>
<NAME>NameY</NAME>
<DP>
<NVP>
<CHECK>NameY</CHECK>
<DIGIT>7</DIGIT>
</NVP>
</DP>
</SPEC>
</ELEM>
</ROOT>
</ROOTS>
</TOP>
Edit: I fixed the problem by changing "is not None" to "== args.paramName".
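The behaviour described above (every DIGIT changes, not just the ones under NameY) is what you would expect if the digit lookup runs against the whole tree instead of against the matched SPEC element. Here is a minimal Python 3 sketch of one way to scope the change, assuming the XML layout shown in the question. The function name set_digits, the prefix "x" and the shortened sample document are illustrative assumptions and not part of the original script.

```python
import xml.etree.ElementTree as ET

# Prefix "x" is an arbitrary label for the document's default namespace.
NS = {"x": "http://something.com"}

# Shortened copy of the question's XML (XML declaration omitted).
XML = """<TOP xmlns="http://something.com">
  <ROOTS><ROOT><ELEM>
    <SPEC>
      <NAME>NameX</NAME>
      <DP>
        <NVP><CHECK>NameX_1</CHECK><DIGIT>3</DIGIT></NVP>
        <NVP><CHECK>NameX_2</CHECK><DIGIT>20</DIGIT></NVP>
      </DP>
    </SPEC>
    <SPEC>
      <NAME>NameY</NAME>
      <DP>
        <NVP><CHECK>NameY</CHECK><DIGIT>7</DIGIT></NVP>
      </DP>
    </SPEC>
  </ELEM></ROOT></ROOTS>
</TOP>"""


def set_digits(root, name, new_value):
    """Set DIGIT only under the SPEC whose NAME text equals `name` exactly.

    Returns the number of DIGIT elements that were changed.
    """
    changed = 0
    for spec in root.iterfind(".//x:SPEC", NS):
        # Exact string comparison: "Name" does not match "NameY".
        if spec.findtext("x:NAME", namespaces=NS) != name:
            continue
        # Search relative to this SPEC, not relative to the whole tree.
        for digit in spec.iterfind("x:DP/x:NVP/x:DIGIT", NS):
            digit.text = str(new_value)
            changed += 1
    return changed


root = ET.fromstring(XML)
changed_for_name_y = set_digits(root, "NameY", 42)
changed_for_partial = set_digits(root, "Name", 99)
all_digits = [d.text for d in root.iter("{http://something.com}DIGIT")]
```

Because the comparison is a plain equality test on the NAME text, no regex or word boundary is needed for the exact-match requirement.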

Related

"not well-formed (invalid token): " error for trying to parse an XML file

I am getting this error when I try to access an XML file called "people-kb.xml".
The error occurs on this line: xmldoc = minidom.parse(xmlfile) #Accesses file.
xmlfile is "people-kb.xml", which is passed into a method like this:
parseXML('people-kb.xml')
The problem came from the save file I had created. I am trying to store multiple trials, each containing information on two people. For now I only have one trial, because I am starting by creating the file; later I will edit it if it already exists.
The code for making the file is:
import xml.etree.cElementTree as ET
def saveXML(xmlfile):
    root = ET.Element("Simulation")
    ET.SubElement(root, "chaserStartingCoords").text = "1,1"
    ET.SubElement(root, "runnerStartingCoords").text = "9,9"
    doc = ET.SubElement(root, "trail")
    ET.SubElement(doc, "number").text = "1"
    doc1 = ET.SubElement(doc, "number", name="number").text = "1" #Trying to make multiple trials
    ET.SubElement(doc1, "chaserEndCoords").text = "10,10"
    ET.SubElement(doc1, "runnerInitialCoords").text = "10,10"
    tree = ET.ElementTree(root)
    tree.write(xmlfile)
if __name__ == '__main__':
    saveXML('output.xml')
Where it says "number", I want to put the trial number. The output I expect looks like this:
<simulation>
<chaserStartingCoords>1,1</chaserStartingCoords>
<runnerStartingCoords>9,9</runnerStartingCoords>
<trial>
<number>1</number>
<move number="1">
<chaserEndCoords>10,10</chaserEndCoords>
<runnerInitialCoords>10,10</runnerInitialCoords>
</move>
</trial>
</simulation>
I've been having trouble producing the <move number="1"> part. Later I expect to go into the file and iterate through each node called "move" to check positions.
You say "when trying to name a node of the file, it shows a red highlight on "1" "
That suggests you're trying to use "1", or something beginning with "1", as an element or attribute name, which would be invalid.
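One detail in the saveXML code is worth noting: doc1 = ET.SubElement(...).text = "1" is a chained assignment, so doc1 ends up bound to the string "1" rather than to the new element. A minimal sketch of building the desired structure follows, assuming Python 3 and xml.etree.ElementTree (xml.etree.cElementTree was removed in Python 3.9). The function name build_simulation is an illustrative assumption.

```python
import xml.etree.ElementTree as ET


def build_simulation():
    """Build the <simulation> tree with one <trial> holding one <move number="1">."""
    root = ET.Element("simulation")
    ET.SubElement(root, "chaserStartingCoords").text = "1,1"
    ET.SubElement(root, "runnerStartingCoords").text = "9,9"

    trial = ET.SubElement(root, "trial")
    ET.SubElement(trial, "number").text = "1"

    # Keep a reference to the element itself; the attribute is passed as a
    # keyword argument, and no .text is chained onto this assignment.
    move = ET.SubElement(trial, "move", number="1")
    ET.SubElement(move, "chaserEndCoords").text = "10,10"
    ET.SubElement(move, "runnerInitialCoords").text = "10,10"
    return root


root = build_simulation()
xml_text = ET.tostring(root, encoding="unicode")

# Iterating over every <move> later on, as the question intends.
moves = [(m.get("number"), m.findtext("chaserEndCoords")) for m in root.iter("move")]
```

Attribute values such as "1" are fine; only element and attribute names are restricted from starting with a digit.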

AttributeError when assigning value to function for XML data extraction

I'm coding a script to extract information from several XML files that share the same structure, but sections are missing when there is no information related to a tag. The easiest way to handle this was try/except: instead of getting "AttributeError: 'NoneType' object has no attribute 'find'", I assign an empty string ('') to the variable in the except block. Something like this:
try:
    string1=root.find('value1').find('value2').find('value3').text
except:
    string1=''
The issue is that I want to shrink my code by using a function:
def extract(string):
    tempstr=''
    try:
        tempstr=string.replace("\n", "")
    except:
        if tempstr is None:
            tempstr=""
    return string
And then I try to call it like this:
string1=extract(root.find('value1').find('value2').find('value3').text)
If value2 or value3 does not exist in the XML being processed, I get an AttributeError even if I don't use the variable in the function, which makes the function useless.
Is there a way to make a function like this work, perhaps by having it run without first checking whether the value passed in is valid?
Solution:
I'm using a mix of both answers:
def extract(root, xpath):
    tempstr=''
    try:
        tempstr=root.findall(xpath)[0].text.replace("\n", "")
    except:
        tempstr=''#To avoid getting a Nonetype object
    return tempstr
You can try something like this:
def extract(root, children_keys: list):
    target_object = root
    result_text = ''
    try:
        for child_key in children_keys:
            target_object = target_object.find(child_key)
        result_text = target_object.text
    except:
        pass
    return result_text
The for loop goes deeper into the XML structure one level at a time; children_keys is a list, predefined by you, of nested keys (the XML path to your object).
If an error is raised inside that code, you get '' as the result.
Example XML (source):
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>
<y>Don't forget me this weekend!</y>
</body>
</note>
Example:
import xml.etree.ElementTree as ET
tree = ET.parse('note.xml')
root = tree.getroot()
children_keys = ['body', 'y']
result_string = extract(root, children_keys)
print(result_string)
Output:
"Don't forget me this weekend!"
Use an XPath expression:
import xml.etree.ElementTree as ET
xml1 = '''<r><v1><v2><v3>a string</v3></v2></v1></r>'''
root = ET.fromstring(xml1)
v3 = root.findall('./v1/v2/v3')
if v3:
    print(v3[0].text)
else:
    print('v3 not found')
xml2 = '''<r><v1><v3>a string</v3></v1></r>'''
root = ET.fromstring(xml2)
v3 = root.findall('./v1/v2/v3')
if v3:
    print(v3[0].text)
else:
    print('v3 not found')
output
a string
v3 not found
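The original call fails because Python evaluates the argument expression root.find('value1').find('value2')... before extract() is ever entered, so the function never gets a chance to catch the error. Passing the root and a path, as the answers above do, moves the lookup inside the function. A compact variant without try/except is sketched below, using Element.findtext with a default value from the standard library. The sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def extract(root, path):
    """Return the text at `path` with newlines removed, or '' if any step is missing."""
    # findtext returns the default when no element matches the path.
    text = root.findtext(path, default="")
    return (text or "").replace("\n", "")


# Hypothetical sample documents: one complete, one missing <value2>.
complete = ET.fromstring(
    "<r><value1><value2><value3>a\nstring</value3></value2></value1></r>"
)
partial = ET.fromstring("<r><value1><value3>x</value3></value1></r>")

found = extract(complete, "value1/value2/value3")
missing = extract(partial, "value1/value2/value3")
```

Here found is the text with the newline stripped and missing is the empty string, with no exception raised in either case.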

Can't figure out why script isn't going any further

I have this function called zms_add_bridge that calls a function called xmlbuilder. xmlbuilder creates a global variable called xml and populates it with stuff.
When zms_add_bridge is called, it prints "here" to say that it's gotten at least that far in the script. Then, it calls xmlbuilder, which prints "xml1" along with the xml output to say that it's at least gotten that far in the script.
The issue is that once xmlbuilder exits and returns to zms_add_bridge, the script seems to die. xml is a global variable, so it should print "yes", but it's not even printing "no".
I even tried to return xml at the end of xmlbuilder to see if that would work, but it didn't (and even still, that'd be redundant since xml was declared globally, wouldn't it?)
Any clues? I'm lost.
def xmlbuilder(requestType,xbra):
    global xmlbuild
    global xmlbuild__request
    global xmlbuild__requestElement
    global xmlbuild__requestElementAttribute
    global xml
    xmlbuild = etree.Element("OSS", attrib={"{"+xsi+"}schemaLocation":schema}, nsmap={'xsi':xsi, None:xmlns})
    xmlbuild__request = etree.SubElement(xmlbuild, "Request")
    etree.SubElement(xmlbuild__request, "RequestType").text = requestType
    etree.SubElement(xmlbuild__request, "RequestMode").text = "online"
    etree.SubElement(xmlbuild__request, "SessionID").text = session_id
    etree.SubElement(xmlbuild__request, "operName").text = apiUser
    etree.SubElement(xmlbuild__request, "Version").text = version
    etree.SubElement(xmlbuild__request, "Overwrite").text = "false"
    xmlbuild__requestElement = etree.SubElement(xmlbuild, "RequestElement")
    for index, data in xbra.items():
        for key in data:
            xmlbuild__requestElementAttribute = etree.SubElement(xmlbuild__requestElement, "Attribute")
            etree.SubElement(xmlbuild__requestElementAttribute, "Name").text = "%s" % key
            etree.SubElement(xmlbuild__requestElementAttribute, "Value").text = "%s" % data[key]
    xml = etree.tostring(xmlbuild, encoding="UTF-8", pretty_print=True, xml_declaration=True)
    if debug >= 1:
        print "\nDEBUG: Generated XML for transmission\n"
        print xml
    print "xml1: " + xml
def zms_add_bridge(fsan,vlanId,maxUnicast,secure):
    if debug >= 3:
        print "\nDEBUG: Function: zms_add_bridge\n"
        print "\nDEBUG: Get GPON parameters of ONT\n"
    xbra = defaultdict(list)
    xbra[1] = {'Filter_Type': 'GponOnuPhysical'}
    xbra[2] = {'Filter_Condition': 'Device_Id=' + device_Handle_Id}
    xbra[3] = {'Filter_Condition': 'serialNoVendorId=ZNTS'}
    xbra[4] = {'Filter_Condition': 'serialNoVendorSpecificHex=' + fsan}
    if debug >= 4:
        print "\nDEBUG: Dictionary\n"
        for index, data in xbra.items():
            for key in data:
                print "Index: %s: Key: %s, Value: %s" % (index, key,data[key])
    print "here"
    xmlbuilder("list",xbra)
    if xml:
        print "yes"
    else:
        print "no"
Output:
here
xml1: <?xml version='1.0' encoding='UTF-8'?>
<OSS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.zhone.com/OSSXML" xsi:schemaLocation="http://www.zhone.com/OSSXML ossxml.xsd">
<Request>
<RequestType>list</RequestType>
<RequestMode>online</RequestMode>
<SessionID>0.379081153249641</SessionID>
<operName>boss</operName>
<Version>2.5.1</Version>
<Overwrite>false</Overwrite>
</Request>
<RequestElement>
<Attribute>
<Name>Filter_Type</Name>
<Value>GponOnuPhysical</Value>
</Attribute>
<Attribute>
<Name>Filter_Condition</Name>
<Value>Device_Id=2</Value>
</Attribute>
<Attribute>
<Name>Filter_Condition</Name>
<Value>serialNoVendorId=ZNTS</Value>
</Attribute>
<Attribute>
<Name>Filter_Condition</Name>
<Value>serialNoVendorSpecificHex=03739175</Value>
</Attribute>
</RequestElement>
</OSS>
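This is not a diagnosis of why the script stops. It is only a sketch of one way to take the global variable out of the picture while debugging: have the builder return the serialized XML and let the caller bind it to a local name. It assumes Python 3 and the standard-library xml.etree.ElementTree instead of lxml, so the namespace and pretty-print options from the original are left out. The name build_request_xml and the list-of-pairs argument are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def build_request_xml(request_type, attributes):
    """Return the request document as a string instead of storing it in a global."""
    root = ET.Element("OSS")
    request = ET.SubElement(root, "Request")
    ET.SubElement(request, "RequestType").text = request_type

    element = ET.SubElement(root, "RequestElement")
    # `attributes` is a list of (name, value) pairs, so repeated names such
    # as Filter_Condition are kept in order.
    for name, value in attributes:
        attribute = ET.SubElement(element, "Attribute")
        ET.SubElement(attribute, "Name").text = str(name)
        ET.SubElement(attribute, "Value").text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = build_request_xml(
    "list",
    [("Filter_Type", "GponOnuPhysical"), ("Filter_Condition", "Device_Id=2")],
)
print("yes" if xml_text else "no")
```

With the value returned explicitly, the caller no longer depends on which module-level name the builder happened to assign.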

How can I parse a XML file to a dictionary in Python?

I am trying to parse an XML file using the Python library minidom (I also tried the xml.etree.ElementTree API).
My XML (resource.xml)
<?xml version='1.0'?>
<quota_result xmlns="https://some_url">
</quota_rule>
<quota_rule name='max_mem_per_user/5'>
<users>user1</users>
<limit resource='mem' limit='1550' value='921'/>
</quota_rule>
<quota_rule name='max_mem_per_user/6'>
<users>user2</users>
<limit resource='mem' limit='2150' value='3'/>
</quota_rule>
</quota_result>
I would like to parse this file and store the information in a dictionary of the following form, and be able to access it:
dict = {user1: [resource, limit, value], user2: [resource, limit, value]}
So far I have only been able to do things like:
docXML = minidom.parse("resource.xml")
for node in docXML.getElementsByTagName('limit'):
    print node.getAttribute('value')
You can use getElementsByTagName and getAttribute to trace the result:
from xml.dom.minidom import parse
dict_users = dict()
docXML = parse('mydata.xml')
users = docXML.getElementsByTagName("quota_rule")
for node in users:
    user = 'None'
    # check the length of tag_user to see whether the <users> tag exists
    tag_user = node.getElementsByTagName("users")
    if len(tag_user) == 0:
        print "tag <users> is not exist"
    else:
        user = tag_user[0]
    resource = node.getElementsByTagName("limit")[0].getAttribute("resource")
    limit = node.getElementsByTagName("limit")[0].getAttribute("limit")
    value = node.getElementsByTagName("limit")[0].getAttribute("value")
    if user == 'None':
        dict_users['None'] = [resource, limit, value]
    else:
        dict_users[user.firstChild.data] = [resource, limit, value]
print(dict_users) # the output below was produced with <users>user1</users> removed from the XML
Output:
tag <users> is not exist
{'None': [u'mem', u'1550', u'921'], u'user2': [u'mem', u'2150', u'3']}
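Since the question also mentions xml.etree.ElementTree, here is a minimal Python 3 sketch of the same user-to-list mapping with that API. It assumes a well-formed version of the sample, with the stray closing tag dropped and both <users> elements present. The prefix "q" and the function name quotas_by_user are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the sample document (an assumption for this sketch).
XML = """<quota_result xmlns="https://some_url">
  <quota_rule name='max_mem_per_user/5'>
    <users>user1</users>
    <limit resource='mem' limit='1550' value='921'/>
  </quota_rule>
  <quota_rule name='max_mem_per_user/6'>
    <users>user2</users>
    <limit resource='mem' limit='2150' value='3'/>
  </quota_rule>
</quota_result>"""

# The document declares a default namespace, so every lookup needs it.
NS = {"q": "https://some_url"}


def quotas_by_user(root):
    """Map each user name to [resource, limit, value] from its <limit> element."""
    result = {}
    for rule in root.findall("q:quota_rule", NS):
        # Fall back to the key 'None' when a rule has no <users> child.
        user = rule.findtext("q:users", default="None", namespaces=NS).strip()
        limit = rule.find("q:limit", NS)
        if limit is None:
            continue  # skip rules without a <limit> element
        result[user] = [limit.get("resource"), limit.get("limit"), limit.get("value")]
    return result


quotas = quotas_by_user(ET.fromstring(XML))
print(quotas["user1"])
```

All attribute values come back as strings; convert them with int() if you need numbers.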

How to retrieve certain child elements using python and lxml

With lots of help from Stack Overflow, I managed to get some Python code working to process XML files (using lxml). I've been able to adapt it for lots of different purposes, but there is one thing I can't work out.
Example XML:
<?xml version="1.0" encoding="UTF-8" ?>
<TVAMain xml:lang="PL" publisher="Someone" publicationTime="2014-01-03T06:24:24+00:00" version="217" xmlns="urn:tva:metadata:2010" xmlns:mpeg7="urn:tva:mpeg7:2008" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:tva:metadata:2010 http://Something.xsd">
<ProgramDescription>
<ProgramInformationTable>
<ProgramInformation programId="crid://bds.tv/88032"><BasicDescription>
<Title xml:lang="PL" type="episodeTitle">Some Title</Title>
<Synopsis xml:lang="PL" length="short">Some Synopsis</Synopsis>
<Genre href="urn:tva:metadata:cs:EventGenreCS:2009:96">
<Name xml:lang="EN">Some Genre</Name>
</Genre>
<Language>PL</Language>
<RelatedMaterial>
<HowRelated href="urn:eventis:metadata:cs:HowRelatedCS:2010:boxCover">
<Name>Box cover</Name>
</HowRelated>
<MediaLocator>
<mpeg7:MediaUri>file://Images/98528834.p.jpg</mpeg7:MediaUri>
</MediaLocator>
</RelatedMaterial>
The Python code returns the Title, Genre and Synopsis, but it does not return the image reference (third line from the bottom). I presume this is because of the name format 'mpeg7:MediaUri' (which I cannot change). The code returns the 'No Image' string instead.
This is the relevant Python code:
file_name = input('Enter the file name, including .xml extension: ')
print('Parsing ' + file_name)
from lxml import etree
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2010'}
with open(file_name+'.log', 'w', encoding='utf-8') as f:
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        crid = (info.get('programId'))
        titlex = (info.find('.//xmlns:Title', namespaces=nsmap))
        title = (titlex.text if titlex is not None else 'No title')
        genrex = (info.find('.//xmlns:Genre/xmlns:Name', namespaces=nsmap))
        genre = (genrex.text if genrex is not None else 'No Genre')
        imagex = (info.find('.//xmlns:RelatedMaterial/xmlns:MediaLocator/xmlns:"mpeg7:MediaUri"', namespaces=nsmap))
        image = (imagex.text if imagex is not None else 'No Image')
        f.write('{}|{}|{}|{}\n'.format(crid, title, genre, image))
Can someone explain how I can adapt the 'imagex' line so that it returns 'file://Images/98528834.p.jpg' from the example? I tried using square brackets, but that caused an error.
The node you are interested in is in the mpeg7 namespace rather than the default namespace. You can use the syntax *[local-name() = "elementName"] to match an element by its local name (ignoring the namespace):
imagex = info.xpath(
'.//xmlns:RelatedMaterial/xmlns:MediaLocator/*[local-name() = "MediaUri"]',
namespaces=nsmap)[0]
Or add the mpeg7 prefix to the namespaces declaration:
nsmap = {'xmlns': 'urn:tva:metadata:2010', 'mpeg7':'urn:tva:mpeg7:2008'}
Then you can use the mpeg7 prefix in the XPath query:
imagex = (info.find('.//xmlns:RelatedMaterial/xmlns:MediaLocator/mpeg7:MediaUri', namespaces=nsmap))
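The second approach can be tried end to end with the sketch below. It uses the standard-library xml.etree.ElementTree rather than lxml, on the assumption that the namespaces mapping behaves the same way for this simple path. The prefix "tva" is an arbitrary label for the default namespace, and the document is a shortened version of the example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example document, keeping both namespace declarations.
XML = """<TVAMain xmlns="urn:tva:metadata:2010" xmlns:mpeg7="urn:tva:mpeg7:2008">
  <ProgramDescription>
    <ProgramInformationTable>
      <ProgramInformation programId="crid://bds.tv/88032">
        <BasicDescription>
          <Title type="episodeTitle">Some Title</Title>
          <RelatedMaterial>
            <MediaLocator>
              <mpeg7:MediaUri>file://Images/98528834.p.jpg</mpeg7:MediaUri>
            </MediaLocator>
          </RelatedMaterial>
        </BasicDescription>
      </ProgramInformation>
    </ProgramInformationTable>
  </ProgramDescription>
</TVAMain>"""

# Each prefix used in a path must map to the namespace URI of that element.
NS = {"tva": "urn:tva:metadata:2010", "mpeg7": "urn:tva:mpeg7:2008"}

root = ET.fromstring(XML)
info = root.find(".//tva:ProgramInformation", NS)

title = info.findtext(".//tva:Title", default="No title", namespaces=NS)
image = info.findtext(
    ".//tva:RelatedMaterial/tva:MediaLocator/mpeg7:MediaUri",
    default="No Image",
    namespaces=NS,
)
print(title, image)
```

The prefixes in the mapping only need to agree with the ones used in the path; they do not have to match the prefixes written in the XML file.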
