I am a beginner using Python. I am trying to update the InvoiceStatus of a certain list of invoices: I want to set it to "N" instead of "A". Below is an extract of the XML file:
<?xml version="1.0" encoding="WINDOWS-1252"?>
<AuditFile>
<Header>
<AuditFileVersion>1.04_01</AuditFileVersion>
<CompanyID>51630</CompanyID>
</Header>
<MasterFiles>
<Customer>
<CustomerID>20201376</CustomerID>
<AccountID>20000</AccountID>
</Customer>
</MasterFiles>
<SourceDocuments>
<SalesInvoices>
<NumberOfEntries>981</NumberOfEntries>
<Invoice>
<InvoiceNo>F2 UF/3510000211</InvoiceNo>
<ATCUD>0</ATCUD>
<DocumentStatus>
<InvoiceStatus>A</InvoiceStatus>
<SourceBilling>P</SourceBilling>
</DocumentStatus>
<InvoiceNo>F2 UF/3510020247</InvoiceNo>
<ATCUD>0</ATCUD>
<DocumentStatus>
<InvoiceStatus>A</InvoiceStatus>
<SourceBilling>P</SourceBilling>
</DocumentStatus>
<InvoiceNo>F2 UF/3510020247</InvoiceNo>
<ATCUD>0</ATCUD>
<DocumentStatus>
<InvoiceStatus>A</InvoiceStatus>
<SourceBilling>P</SourceBilling>
</DocumentStatus>
</Invoice>
</SalesInvoices>
</SourceDocuments>
</AuditFile>
Here is the script:
from xml.dom import minidom
def reemplazaTexto(nodo, textonuevo):
    nodo.firstChild.replaceWholeText(textonuevo)
doc = minidom.parse('sample.xml')
print(doc.toxml())
invoices = doc.getElementsByTagName('InvoiceStatus')
for nodo in invoices:
    reemplazaTexto(nodo, 'N')
print(doc.toxml())
But this script modifies every InvoiceStatus element, not just those of the invoices I want to change. I would appreciate a hand with this.
Cheers,
Axel
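One way to restrict the update is to look at each invoice's number before touching its status. The following is a minimal sketch with ElementTree, not a tested solution for the real file: it assumes each <Invoice> element wraps exactly one <InvoiceNo> and one <DocumentStatus> (the extract above shows several inside a single <Invoice>, which may be an artifact of trimming), and `targets` and `set_invoice_status` are hypothetical names introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real file; assumes one InvoiceNo per Invoice element.
SAMPLE = """<AuditFile>
  <SourceDocuments>
    <SalesInvoices>
      <Invoice>
        <InvoiceNo>F2 UF/3510000211</InvoiceNo>
        <DocumentStatus><InvoiceStatus>A</InvoiceStatus></DocumentStatus>
      </Invoice>
      <Invoice>
        <InvoiceNo>F2 UF/3510020247</InvoiceNo>
        <DocumentStatus><InvoiceStatus>A</InvoiceStatus></DocumentStatus>
      </Invoice>
    </SalesInvoices>
  </SourceDocuments>
</AuditFile>"""


def set_invoice_status(root, invoice_numbers, new_status):
    """Set InvoiceStatus only for invoices whose number is in invoice_numbers."""
    updated = 0
    for invoice in root.iter("Invoice"):
        number = invoice.findtext("InvoiceNo")
        status = invoice.find("DocumentStatus/InvoiceStatus")
        if number in invoice_numbers and status is not None:
            status.text = new_status
            updated += 1
    return updated


root = ET.fromstring(SAMPLE)
targets = {"F2 UF/3510000211"}  # hypothetical list of invoices to change
count = set_invoice_status(root, targets, "N")

# Collect the resulting status per invoice number to check the effect.
statuses = {
    inv.findtext("InvoiceNo"): inv.findtext("DocumentStatus/InvoiceStatus")
    for inv in root.iter("Invoice")
}
print(count, statuses)
```

For the real file you would parse it with ET.parse('sample.xml') and write the result back with tree.write(...), passing the file's own encoding.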
Related
I have an XML file structured like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<book>
<toc> <tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv></toc>
<chapter><title>9thmemo</title>
<para>...</para>
<para>...</para></chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>
There are several chapters in <book>...</book>, and each chapter has a title. I only want to read the full content of the chapter titled "9thmemo" (not the others).
I tried to read it with the following code:
from xml.dom import minidom
filename = "result.xml"
file = minidom.parse(filename)
chapters = file.getElementsByTagName('chapter')
for i in range(10):
    print(chapters[i])
This only prints the object representation (the memory address) of each chapter.
If I try to access a sub-element with something like chapters[i].title, I get an error saying the attribute cannot be found.
The problem with your code is that it never tries to locate the specific chapter. The code below uses an XPath expression to find it.
Try the following:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<book>
<toc>
<tocdiv pagenum="564">
<title>9thmemo</title>
<tocdiv pagenum="588">
<title>b</title>
</tocdiv>
</tocdiv>
</toc>
<chapter>
<title>9thmemo</title>
<para>A</para>
<para>B</para>
</chapter>
<chapter>...</chapter>
<chapter>...</chapter>
</book>'''
root = ET.fromstring(xml)
chapter = root.find('.//chapter[title="9thmemo"]')
para_data = ','.join(p.text for p in chapter.findall('para'))
print(para_data)
Output:
A,B
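If "all content" means every piece of text inside the chapter, including the title and any nested elements, Element.itertext() walks the whole subtree. A small sketch under that assumption; the nested <b> element is made up here to show the effect, and the guard covers the case where no chapter matches:

```python
import xml.etree.ElementTree as ET

# Sample data; the <b> element is invented for illustration.
xml = """<book>
  <chapter><title>9thmemo</title><para>A</para><para>B <b>bold</b></para></chapter>
  <chapter><title>other</title><para>C</para></chapter>
</book>"""

root = ET.fromstring(xml)
chapter = root.find('.//chapter[title="9thmemo"]')

# find() returns None when nothing matches, so guard before iterating.
if chapter is None:
    texts = []
else:
    # itertext() yields the text of the element and all its descendants in order.
    texts = [t.strip() for t in chapter.itertext() if t.strip()]

print(texts)
```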
So I have the following .txt file of data, where the data highlighted with yellow needs to be saved to a new txt file:
I managed to print certain sections in Python, but that's about it:
with open('Podatki-zima-MEDVES.txt', mode='r+t') as file:
    for line in file:
        print(line[18:39])
Resulting in:
EntryDate="20101126"
EntryDate="20101126"
EntryDate="20101126"
EntryDate="20101126"
EntryDate="20101127"
EntryDate="20101128"
EntryDate="20101128"
EntryDate="20101128"
EntryDate="20101128"
I know it's a very basic question, but for someone experienced this wouldn't take a minute.
Thanks
It looks like you're trying to parse XML data.
There is a standard library package that can do this. The documentation is pretty good and it includes a tutorial. Take a look at The ElementTree XML API.
In your case the code would look something like:
import xml.etree.ElementTree as ET

# Sample rows; the attribute is EntryDate, as in the question's output
data = """
<data>
<ROW EntryDate="20101126" SnowDepth="4"/>
<ROW EntryDate="20101127" SnowDepth="8"/>
</data>"""
root = ET.fromstring(data)
for child in root:
    entries = child.attrib  # dict of the ROW element's attributes
    print(entries["EntryDate"], entries["SnowDepth"])
This gives the output you're looking for:
20101126 4
20101127 8
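The question also asks for the values to be saved to a new text file rather than printed. A minimal sketch of that step, assuming the attributes are named EntryDate and SnowDepth as in the question's output, that the rows have been wrapped in a single root element, and using snow_depth.txt as a made-up output file name:

```python
import xml.etree.ElementTree as ET

# Sample rows wrapped in a root element; values are illustrative.
data = """<data>
<ROW RowState="5" EntryDate="20101126" Entry="" SnowDepth="4"/>
<ROW RowState="13" EntryDate="20101127" Entry="Prvi sneg to zimo" SnowDepth="8"/>
</data>"""

root = ET.fromstring(data)

# Build one "date depth" line per ROW element.
lines = [f'{row.get("EntryDate")} {row.get("SnowDepth")}' for row in root.iter("ROW")]

# Write the lines to a new text file (hypothetical name).
with open("snow_depth.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(lines) + "\n")
```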
As an alternative to ElementTree, you could use an Expat parser for your structured markup data.
First wrap your data in a single top-level element (and, optionally, add an XML declaration), as follows:
<?xml version="1.0"?>
<podatki>
<ROW RowState="5" EntryDate="20101126" Entry="" SnowDepth="4" />
<ROW RowState="13" EntryDate="20101126" Entry="Prvi sneg to zimo" SnowDepth="10" />
</podatki>
Then you could use an expat parser.
import xml.parsers.expat
def podatki(name, attrs):
    if name == "ROW":
        print(f'EntryDate={attrs["EntryDate"]},',
              f'SnowDepth={attrs["SnowDepth"]}')
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = podatki
with open('podatki.xml', 'rb') as input_file:
    parser.ParseFile(input_file)
The result should be
EntryDate=20101126, SnowDepth=4
EntryDate=20101126, SnowDepth=10
Hi, I'm new to XML files in general, but I am trying to replace specific lines in an XML file using 'if' statements in Python 3.6. I've been looking at suggestions to use ElementTree, but none of the posts online quite fit the problem I have, so here I am.
My file is as follows:
<?xml version="1.0" encoding="UTF-8"?>
-<StructureDefinition xmlns="http://hl7.org/fhir">
<url value="http://example.org/fhir/StructureDefinition/MyObservation"/>
<name value="MyObservation"/>
<status value="draft"/>
<fhirVersion value="3.0.1"/>
<kind value="resource"/>
<abstract value="false"/>
<type value="Observation"/>
<baseDefinition value="http://hl7.org/fhir/StructureDefinition/Observation"/>
<derivation value="constraint"/>
</StructureDefinition>
I want to replace
url value="http://example.org/fhir/StructureDefinition/MyObservation"/
to something like
url value="http://example.org/fhir/StructureDefinition/NewObservation"/
by using conditional statements - because these are repeated multiple times in other files.
I have tried looping through the XML file to find the exact string match (which worked), but I wasn't able to delete or replace the line (probably because this isn't a .txt file).
Any help is greatly appreciated!
Your sample file contains a stray "-" character in front of <StructureDefinition> that is easy to overlook when copying and pasting. Remove it first, otherwise the file will not parse.
Input File
<?xml version="1.0" encoding="UTF-8"?>
<StructureDefinition xmlns="http://hl7.org/fhir">
<url value="http://example.org/fhir/StructureDefinition/MyObservation"/>
<name value="MyObservation"/>
<status value="draft"/>
<fhirVersion value="3.0.1"/>
<kind value="resource"/>
<abstract value="false"/>
<type value="Observation"/>
<baseDefinition value="http://hl7.org/fhir/StructureDefinition/Observation"/>
<derivation value="constraint"/>
</StructureDefinition>
Script
from xml.dom.minidom import parse  # use minidom for this task
dom = parse('june.xml')  # read in your file
search = "http://example.org/fhir/StructureDefinition/MyObservation"  # value to look for
replace = "http://example.org/fhir/StructureDefinition/NewObservation"  # value to write instead
res = dom.getElementsByTagName('url')  # collect all url tags
for element in res:
    if element.getAttribute('value') == search:  # in case of a match
        element.setAttribute('value', replace)  # replace the attribute value
with open('june_updated.xml', 'w') as f:
    f.write(dom.toxml())  # serialize the updated DOM to a new XML file
Output file
<?xml version="1.0" ?><StructureDefinition xmlns="http://hl7.org/fhir">
<url value="http://example.org/fhir/StructureDefinition/NewObservation"/>
<name value="MyObservation"/>
<status value="draft"/>
<fhirVersion value="3.0.1"/>
<kind value="resource"/>
<abstract value="false"/>
<type value="Observation"/>
<baseDefinition value="http://hl7.org/fhir/StructureDefinition/Observation"/>
<derivation value="constraint"/>
</StructureDefinition>
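Since the question mentions ElementTree, here is a rough equivalent with that module. It is only a sketch: the document is inlined and shortened so the example runs on its own, and the names FHIR_NS, SEARCH and REPLACE are introduced for illustration. Registering the namespace with an empty prefix keeps ElementTree from writing ns0: prefixes on output.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"
SEARCH = "http://example.org/fhir/StructureDefinition/MyObservation"
REPLACE = "http://example.org/fhir/StructureDefinition/NewObservation"

# Shortened copy of the input file, inlined for the example.
source = f"""<StructureDefinition xmlns="{FHIR_NS}">
  <url value="{SEARCH}"/>
  <name value="MyObservation"/>
</StructureDefinition>"""

# Serialize the FHIR namespace as the default namespace (no prefix).
ET.register_namespace("", FHIR_NS)
root = ET.fromstring(source)

# Tags are namespace-qualified internally: {http://hl7.org/fhir}url
for url in root.iter(f"{{{FHIR_NS}}}url"):
    if url.get("value") == SEARCH:
        url.set("value", REPLACE)

result = ET.tostring(root, encoding="unicode")
print(result)
```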
I am trying to parse an XML file using Python. Example snippet of the XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<raml xmlns="raml21.xsd" version="2.1">
<series xmlns="" scope="USA" name="Arizona">
<header>
<log action="created"/>
</header>
<x_ns color="Blue">
<p name="timeZone">(GMT-10)</p>
</x_ns>
<x_ns color="Red">
<p name="AvgHeight">175</p>
</x_ns>
<x_ns color="black">
<p name="AvgWeight">235</p>
</x_ns>
The problem is that the namespace keeps changing, so as a workaround I read the xmlns string from the root tag first and then build a namespace dictionary from it, using the code below:
root = raw_xml.getroot()
namespace_temp1=root.tag.split("}")
namespace_temp2=namespace_temp1[0].strip('{')
namespaces_auto={}
tag_name =["x","y","z","w","v"]
ns_name=[namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2]
namespace_temp3=zip(tag_name,ns_name)
for tag,ns in namespace_temp3:
    namespaces_auto[tag]=ns
namespaces=namespaces_auto
To access a particular tag with its namespace I use code like this:
for data in raw_xml.findall('x:x_ns', namespaces):
    ...
This mostly solves the problem, but it fails when a child node has a blank xmlns, as in the series tag (xmlns=""). I am not sure how to handle this condition in the code.
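An element with xmlns="" and its descendants are in no namespace at all, so a lookup qualified with the root's namespace will not match them. One possible approach, sketched here on the assumption that Python 3.8 or later is available, is the {*} wildcard in ElementTree paths, which matches a tag name in any namespace or in none. The sample is a closed-off, shortened copy of the snippet above.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the snippet from the question.
raw = """<raml xmlns="raml21.xsd" version="2.1">
  <series xmlns="" scope="USA" name="Arizona">
    <x_ns color="Blue"><p name="timeZone">(GMT-10)</p></x_ns>
    <x_ns color="Red"><p name="AvgHeight">175</p></x_ns>
  </series>
</raml>"""

root = ET.fromstring(raw)

values = {}
# "{*}x_ns" matches x_ns whether it is in a namespace or not (Python 3.8+).
for x_ns in root.iterfind(".//{*}x_ns"):
    for p in x_ns.iterfind("{*}p"):
        values[p.get("name")] = p.text

print(values)
```

With this form there is no need to build the namespace dictionary at all, at the cost of no longer distinguishing same-named tags from different namespaces.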
I am using ElementTree to parse an XML file, edit the contents and write them to a new XML file. I have this all working apart from one issue: when I generate the file, there are a lot of extra lines containing namespace information. Here are some snippets of code:
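import xml.etree.ElementTree as ET

# Register the schema namespace as the default so no "ns0:" prefixes are written
ET.register_namespace("", "http://clish.sourceforge.net/XMLSchema")
tree = ET.parse('ethernet.xml')
root = tree.getroot()
# Find every COMMAND element under a VIEW (both tags are namespace-qualified)
commands = root.findall('{http://clish.sourceforge.net/XMLSchema}'
                        'VIEW/{http://clish.sourceforge.net/XMLSchema}COMMAND')
all1 = []  # assumed to be a list defined elsewhere in the full script
for command in commands:
    all1.append(list(command.iter()))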
And a sample of the output file, with the unwanted attribute xmlns="http://clish.sourceforge.net/XMLSchema":
<COMMAND xmlns="http://clish.sourceforge.net/XMLSchema" help="Interface specific description" name="description">
<PARAM help="Description (must be in double-quotes)" name="description" ptype="LINE" />
<CONFIG />
</COMMAND>
Can I remove this with ElementTree, and if so, how? Or will I have to use a regex (I am writing a string to the file)?
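The attribute appears because each COMMAND element is serialized on its own, and any fragment serialized by itself has to declare the namespace its tags belong to. One possible workaround that avoids regexes is to strip the namespace from the tags before serializing. This is a sketch: strip_namespace is a hypothetical helper, and the inlined document (including its root element name) is a made-up stand-in for ethernet.xml.

```python
import xml.etree.ElementTree as ET

NS = "http://clish.sourceforge.net/XMLSchema"

# Made-up stand-in for ethernet.xml so the example runs on its own.
source = f"""<CLISH_MODULE xmlns="{NS}">
  <VIEW name="ethernet">
    <COMMAND name="description" help="Interface specific description">
      <PARAM name="description" help="Description" ptype="LINE"/>
    </COMMAND>
  </VIEW>
</CLISH_MODULE>"""


def strip_namespace(element, uri):
    """Remove the {uri} qualifier from the tags of element and its descendants."""
    prefix = f"{{{uri}}}"
    for node in element.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(node.tag, str) and node.tag.startswith(prefix):
            node.tag = node.tag[len(prefix):]


root = ET.fromstring(source)
command = root.find(f"{{{NS}}}VIEW/{{{NS}}}COMMAND")
strip_namespace(command, NS)

# With unqualified tags, no xmlns declaration is emitted for the fragment.
fragment = ET.tostring(command, encoding="unicode")
print(fragment)
```

Note that this changes the tree in place, so later namespace-qualified lookups on those elements will no longer match; work on a copy (copy.deepcopy) if the original tree is still needed.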