Generating one XML file from another using Python

How would one go about generating a local XML file from a URL?
I need to select certain values from the remote XML file and place them into a local one. Currently I only have snippets.
from xml.dom import minidom
from urllib.request import urlopen
import requests
url = 'http://url.php'
private_list = [16735, 7456, 18114]
xmldoc = minidom.parse(urlopen(url))
public_list = xmldoc.getElementsByTagName('server')
for public_server in public_list:
    for private_server in private_list:
        if public_server.attributes['id'].value == str(private_server):
            print("Found one!")
Sadly, that is about as far as I can get. I am able to grab the correct number of elements from the public list, but I am not sure how to take those elements and build a local copy of them.
Can anyone assist?
Edit: Example
The example XML looks like this:
<settings>
<servers>
<server url="192.168.1.100" name="CentOS" id="12" host="Kirk.corporate.lan"/>
<server url="10.0.0.95" name="Ubunutu" id="4" url2="192.168.1.50" host="Spock.corporate.lan"/>
<server url="10.0.1.95" id="30" host="scotty.corporate.lan"/>
</servers>
</settings>
In this example my list will only include ID 4, so I want to take just the subelement whose ID is 4. I can find it with my code above, but I don't know how to take that entire element and write it into a new file.
It looks like with xml.etree.ElementTree I can do
import xml.etree.ElementTree as ET
settings = ET.Element('settings')
servers = ET.SubElement(settings, 'servers')
That will build the base, but that's about as far as I can get successfully.
Edit 2:
Got a little further
from lxml import etree
root = etree.Element('settings')
main = etree.SubElement(root,'servers')
main = etree.SubElement(main, "server", url = "192.168.1.100", name="CentOS", id="12", host="Kirk.corporate.lan")
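One possible way to finish this, sketched with the standard library's xml.etree.ElementTree rather than lxml: parse the remote document, build the settings/servers skeleton, and copy each matching server element into it with all of its attributes. This is only a sketch under assumptions: the XML is inlined here in place of the urlopen(url) response, the ID set contains just "4" as in the example, and the output file name local_servers.xml is made up.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Stand-in for the remote document; in practice this would be the bytes
# returned by urlopen(url).read(). It mirrors the sample XML in the question.
remote_xml = """<settings>
<servers>
<server url="192.168.1.100" name="CentOS" id="12" host="Kirk.corporate.lan"/>
<server url="10.0.0.95" name="Ubunutu" id="4" url2="192.168.1.50" host="Spock.corporate.lan"/>
<server url="10.0.1.95" id="30" host="scotty.corporate.lan"/>
</servers>
</settings>"""

# IDs to keep, stored as strings because attribute values are strings.
private_ids = {"4"}

remote_root = ET.fromstring(remote_xml)

# Build the skeleton of the local document.
local_root = ET.Element("settings")
local_servers = ET.SubElement(local_root, "servers")

# Copy every <server> whose id is wanted, attributes included.
for server in remote_root.iter("server"):
    if server.get("id") in private_ids:
        ET.SubElement(local_servers, "server", dict(server.attrib))

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "local_servers.xml")
ET.ElementTree(local_root).write(out_path, encoding="utf-8", xml_declaration=True)
```

Using a set of strings avoids the inner loop over private_list and the repeated str() conversion.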

Related

Python XML element tree data extract

I am very new to Python and Airflow. I have been asked to create a Python script, run in Airflow, which goes through XML, extracts the data, and places it into a variable. So far I have extracted data successfully, but I have now hit a problem and I am not sure why. This is the XML I am trying to extract from:
<?xml version="1.0"?>
<message>
<m_control>
</m_control>
<m_content>
<b_control>
</b_control>
<intermediary type="IFA">
</intermediary>
<application>
<personal_client id="pc1">
</personal_client>
<product type="xxx" product_code="xxx">
<risk_benefit id="xx1" type="xxx">
<cover_purpose>xxxxxx</cover_purpose>
<cover_period>
<end_age definition="">xx</end_age>
</cover_period>
</risk_benefit>
</application>
</m_content>
</message>
Below is the code I am using which has worked before
from xml.etree import ElementTree as ET
import re
strip_namespace_regex = re.compile(' xmlns="[^"]+"')
product = root.findall('.//product')
for product in product:
    risk_benefit_node = product.find('.//risk_benefit')
    result['cover_purpose'] = risk_benefit_node.find('cover_purpose').text if risk_benefit_node.find('cover_purpose').text is not None else ''
However at the moment I get
ERROR - Failed to execute task: 'NoneType' object has no attribute 'text'
When I try the below
from xml.etree import ElementTree as ET
import re
strip_namespace_regex = re.compile(' xmlns="[^"]+"')
product = root.findall('.//product')
for product in product:
    risk_benefit_node = product.find('.//risk_benefit')
    result['risk_benefit_id'] = risk_benefit_node.attrib['id'] if risk_benefit_node.attrib['id'] is not None else ''
I get
ERROR - Failed to execute task: 'id'
I am not sure whether it's because the risk benefit section has two attributes or it is not picking up the risk benefit section.
Does anyone know what I am doing wrong?
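Both messages look like lookups that came back empty: find() returns None when nothing matches, so .text then fails, and attrib['id'] raises KeyError when the attribute is absent. The conditional expression cannot catch either case because the failing lookup runs before the comparison. Below is a minimal sketch of more defensive lookups. It assumes a well-formed, namespace-free cut-down of the message above; the second, empty product and the results list are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed stand-in for the message; values are placeholders.
sample = """<message><m_content><application>
<product type="xxx" product_code="xxx">
<risk_benefit id="xx1" type="xxx">
<cover_purpose>xxxxxx</cover_purpose>
</risk_benefit>
</product>
<product type="yyy" product_code="yyy"/>
</application></m_content></message>"""

root = ET.fromstring(sample)
results = []
for product in root.findall(".//product"):
    risk_benefit = product.find(".//risk_benefit")
    if risk_benefit is None:
        # No <risk_benefit> under this product: record empty values.
        results.append({"risk_benefit_id": "", "cover_purpose": ""})
        continue
    results.append({
        # get() returns the default instead of raising KeyError.
        "risk_benefit_id": risk_benefit.get("id", ""),
        # findtext() returns the default when the child element is missing.
        "cover_purpose": risk_benefit.findtext("cover_purpose", default=""),
    })
print(results)
```

If the real documents declare a default namespace, the paths would also need to be namespace-qualified, which is a separate possible cause of find() returning None.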

Update the xml using python3 at specific subelement?

I am trying to update the XML file below in Python 3 using import xml.etree.ElementTree as ET, but I am not able to add anything between the tags.
The issue I am facing is that I cannot fetch the tags nested below fileSets.
Can someone let me know how to update the XML?
abc.xml
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2
http://maven.apache.org/xsd/assembly-1.1.2-xsd"
>
<id></id>
<formats>
<format>zip</format>
</formats>
<fileSets>
<fileSet>
<outputDirectory>/</outputDirectory>
<directory>../</directory>
<useDefaultExcludes>false</useDefaultExcludes>
<includes>
</includes>
</fileSet>
</fileSets>
</assembly>
Expected output:(file names will be added dynamically)
abc.xml
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2
http://maven.apache.org/xsd/assembly-1.1.2-xsd"
>
<id></id>
<formats>
<format>zip</format>
</formats>
<fileSets>
<fileSet>
<outputDirectory>/</outputDirectory>
<directory>../</directory>
<useDefaultExcludes>false</useDefaultExcludes>
<includes>
<include>abc.text</include>
<include>def.text</include>
<include>ghi.text</include>
</includes>
</fileSet>
</fileSets>
</assembly>
I am trying this, and it prints the elements inside the file, but I don't know how to access includes and then add abc.text and so on inside it.
import xml.etree.ElementTree as ET
tree = ET.parse("abc.xml")
root = tree.getroot()
for actor in root.findall('{http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2}fileSets'):
    for name in actor.findall('{http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2}fileSet'):
        print(name)
You don't have to do anything with fileSets or fileSet. Since you want to add children to includes, get that element directly.
import xml.etree.ElementTree as ET
# Ensure that the proper prefix is used in the output (in this case, no prefix at all)
ET.register_namespace("", "http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2")
tree = ET.parse("abc.xml")
# Find the 'includes' element (.// means search the whole document).
# {*} is a wildcard and matches any namespace (requires Python 3.8 or later)
includes = tree.find(".//{*}includes")
# Create three new 'include' elements
include1 = ET.Element("include")
include1.text = "abc.text"
include2 = ET.Element("include")
include2.text = "def.text"
include3 = ET.Element("include")
include3.text = "ghi.text"
# Add the new elements as children of 'includes'
includes.append(include1)
includes.append(include2)
includes.append(include3)
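Since the file names are meant to be added dynamically, the three hand-written elements can be replaced by a loop. The following is a sketch under assumptions: the XML is a cut-down inline stand-in for abc.xml, and the file_names list is a placeholder for wherever the names actually come from. It also qualifies each new tag with the namespace URI, so the new elements are in the same namespace as their parent.

```python
import xml.etree.ElementTree as ET

NS = "http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
# Serialize NS as the default namespace (no prefix in the output).
ET.register_namespace("", NS)

# Cut-down stand-in for abc.xml.
source = f"""<assembly xmlns="{NS}">
<fileSets><fileSet><includes></includes></fileSet></fileSets>
</assembly>"""

root = ET.fromstring(source)
includes = root.find(f".//{{{NS}}}includes")

# Hypothetical dynamic input.
file_names = ["abc.text", "def.text", "ghi.text"]
for name in file_names:
    # SubElement creates the child and appends it in one step.
    ET.SubElement(includes, f"{{{NS}}}include").text = name

result = ET.tostring(root, encoding="unicode")
print(result)
```

With a tree loaded through ET.parse("abc.xml"), calling tree.write("abc.xml") afterwards would save the modified document back to disk.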

My input is in the XML format that needs to be converted into a list structure in Python?

I'm trying to take an API call response and parse the XML data into a list, but I am struggling with the multiple child/parent relationships.
My hope is to export a new XML file that lines up each job ID with its tracking number, which I could then import into Excel.
Here is what I have so far.
The source XML file looks like this:
<project>
<name>October 2019</name>
<jobs>
<job>
<id>5654206</id>
<tracking>
<mailPiece>
<barCode>00270200802095682022</barCode>
<address>Accounts Payable,1661 Knott Ave,La Mirada,CA,90638</address>
<status>En Route</status>
<dateTime>2019-10-12 00:04:21.0</dateTime>
<statusLocation>PONTIAC,MI</statusLocation>
</mailPiece>
</tracking>...
Code:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement
tree = ET.parse('mailings.xml')
root = tree.getroot()
print(root.tag)
for x in root[1].findall('job'):
    id = x.find('id').text
    tracking = x.find('tracking').text
    print(root[1].tag, id, tracking)
The script currently returns the following:
jobs 5654206 None
jobs 5654203 None
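The None in the last column is likely expected behaviour: .text only holds character data sitting directly inside an element, and tracking contains child elements rather than text. A possible approach is to descend to the leaf elements that carry the values. The sketch below assumes each mailPiece has barCode and status children; the XML is a cut-down inline version of the file, and the empty tracking element on the second job is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for mailings.xml; the second job's content is made up.
sample = """<project><name>October 2019</name><jobs>
<job><id>5654206</id><tracking>
<mailPiece><barCode>00270200802095682022</barCode><status>En Route</status></mailPiece>
</tracking></job>
<job><id>5654203</id><tracking/></job>
</jobs></project>"""

root = ET.fromstring(sample)
rows = []
for job in root.iter("job"):
    job_id = job.findtext("id")
    # One row per mail piece, so a job with several pieces yields several rows.
    for piece in job.findall("./tracking/mailPiece"):
        rows.append((job_id, piece.findtext("barCode"), piece.findtext("status")))

for row in rows:
    print(",".join(row))
```

Rows in this shape could be written out with the csv module, which Excel opens directly.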

Create array of values from specific element in XML using Python

I have an XML file which has many elements. I would like to create a list/array of all the values which have a specific element name, in my case "pair:ApplicationNumber".
I've gone over a lot of the other questions however I am not able to find an answer. I know that I can do this by loading the text file and going over it using pandas however, I'm sure there's a much better way.
I was unsuccessful trying ElementTree as well as XML.Dom using minidom
My code currently looks as follows:
import os
from xml.dom import minidom
WindowsUser = os.getenv('username')
XMLPath = os.path.join('C:\\Users', WindowsUser, 'Downloads', 'ApplicationsByCustomerNumber.xml')
xmldoc = minidom.parse(XMLPath)
itemlist = xmldoc.getElementsByTagName('pair:ApplicationNumber')
for s in itemlist:
    print(s.attributes['pair:ApplicationNumber'].value)
An example XML file looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<pair:PatentApplicationList xsi:schemaLocation="urn:us:gov:uspto:pair PatentApplicationList.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pair="urn:us:gov:uspto:pair">
<pair:FileHeader>
<pair:FileCreationTimeStamp>2017-07-10T10:52:12.12</pair:FileCreationTimeStamp>
</pair:FileHeader>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62383607</pair:ApplicationNumber>
<pair:ApplicationStatusCode>20</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Application Dispatched from Preexam, Not Yet Docketed</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-09-16</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1354-T-02-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-09-06</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-05-30T21:40:37.37</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-05-30</pair:LastTransactionDate>
<pair:LastTransactionDescription>Email Notification</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62292372</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-11-01</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>681-S-23-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-02-08</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-20T21:59:26.26</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-20</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
<pair:ApplicationStatusData>
<pair:ApplicationNumber>62289245</pair:ApplicationNumber>
<pair:ApplicationStatusCode>160</pair:ApplicationStatusCode>
<pair:ApplicationStatusText>Abandoned -- Incomplete Application (Pre-examination)</pair:ApplicationStatusText>
<pair:ApplicationStatusDate>2016-10-26</pair:ApplicationStatusDate>
<pair:AttorneyDocketNumber>1526-P-01-US</pair:AttorneyDocketNumber>
<pair:FilingDate>2016-01-31</pair:FilingDate>
<pair:LastModifiedTimestamp>2017-06-15T21:24:13.13</pair:LastModifiedTimestamp>
<pair:CustomerNumber>122761</pair:CustomerNumber><pair:LastFileHistoryTransaction>
<pair:LastTransactionDate>2017-06-15</pair:LastTransactionDate>
<pair:LastTransactionDescription>Petition Entered</pair:LastTransactionDescription> </pair:LastFileHistoryTransaction>
<pair:ImageAvailabilityIndicator>true</pair:ImageAvailabilityIndicator>
</pair:ApplicationStatusData>
</pair:PatentApplicationList>
ElementTree expands the "pair:" prefix of each tag to the namespace URI declared by xmlns:pair, so the element is actually named {urn:us:gov:uspto:pair}ApplicationNumber and doesn't match 'pair:ApplicationNumber', even though it looks like it should.
I've used ElementTree to extract the application numbers as follows (I've just used a local XML file in my examples, rather than the full path in your code).
Example 1:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root:
    if 'ApplicationStatusData' in item.tag:
        for child in item:
            if 'ApplicationNumber' in child.tag:
                print(child.text)
Example 2:
from xml.etree import ElementTree
tree = ElementTree.parse('ApplicationsByCustomerNumber.xml')
root = tree.getroot()
for item in root.iter('{urn:us:gov:uspto:pair}ApplicationStatusData'):
    for child in item.iter('{urn:us:gov:uspto:pair}ApplicationNumber'):
        print(child.text)
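Since the question asks for a list rather than printed output, the same lookup can be collected with a list comprehension. This is a sketch assuming a cut-down inline copy of the file; the NS mapping lets the path use the pair: prefix instead of spelling out the URI in braces.

```python
from xml.etree import ElementTree

# Prefix-to-URI mapping, taken from the xmlns:pair declaration in the file.
NS = {"pair": "urn:us:gov:uspto:pair"}

# Cut-down stand-in for ApplicationsByCustomerNumber.xml.
sample = """<pair:PatentApplicationList xmlns:pair="urn:us:gov:uspto:pair">
<pair:ApplicationStatusData><pair:ApplicationNumber>62383607</pair:ApplicationNumber></pair:ApplicationStatusData>
<pair:ApplicationStatusData><pair:ApplicationNumber>62292372</pair:ApplicationNumber></pair:ApplicationStatusData>
<pair:ApplicationStatusData><pair:ApplicationNumber>62289245</pair:ApplicationNumber></pair:ApplicationStatusData>
</pair:PatentApplicationList>"""

root = ElementTree.fromstring(sample)
# The value is the element's text, not an attribute.
application_numbers = [el.text for el in root.findall(".//pair:ApplicationNumber", NS)]
print(application_numbers)
```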
Hope this may be useful.

Python XML parsing from website

I am trying to parse XML from a website and I am stuck. I will provide the XML below. I have two questions: what is the best way to read XML from a website, and how do I dig into the XML to get the rate I need?
The figure I need back is Base:OBS_VALUE 0.12
What I have so far:
from xml.dom import minidom
import urllib
document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)
ff_DataSet = xmldoc.getElementsByTagName('ff:DataSet')[0]
ff_series = ff_DataSet.getElementsByTagName('ff:Series')[0]
for line in ff_series:
    price = line.getElementsByTagName('base:OBS_VALUE')[0].firstChild.data
print(price)
XML code from website:
<Header> <ID>FFD</ID>
<Test>false</Test>
<Name xml:lang="en">Federal Funds daily averages</Name> <Prepared>2013-05-08</Prepared>
<Sender id="FRBNY"> <Name xml:lang="en">Federal Reserve Bank of New York</Name>
<Contact>
<Name xml:lang="en">Public Information Web Team</Name> <Email>ny.piwebteam#ny.frb.org</Email>
</Contact>
</Sender>
<!--ReportingBegin></ReportingBegin-->
</Header>
<ff:DataSet> <ff:Series TIME_FORMAT="P1D" DISCLAIMER="G" FF_METHOD="D" DECIMALS="2" AVAILABILITY="A">
<ffbase:Key>
<base:FREQ>D</base:FREQ>
<base:RATE>FF</base:RATE>
<base:MATURITY>O</base:MATURITY>
<ffbase:FF_SCOPE>D</ffbase:FF_SCOPE>
</ffbase:Key>
<ff:Obs OBS_CONF="F" OBS_STATUS="A">
<base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
<base:OBS_VALUE>0.12</base:OBS_VALUE>
If you wanted to stick with xml.dom.minidom, try this...
from xml.dom import minidom
from urllib.request import urlopen  # Python 3 location of urlopen
url_str = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
xml_str = urlopen(url_str).read()
xmldoc = minidom.parseString(xml_str)
obs_values = xmldoc.getElementsByTagName('base:OBS_VALUE')
# prints the first base:OBS_VALUE it finds
print(obs_values[0].firstChild.nodeValue)
# prints the second base:OBS_VALUE it finds
print(obs_values[1].firstChild.nodeValue)
# prints all base:OBS_VALUE in the XML document
for obs_val in obs_values:
    print(obs_val.firstChild.nodeValue)
However, if you want to use lxml, use underrun's solution. Also, your original code had some errors. You were actually attempting to parse the document variable, which was the web address. You needed to parse the xml returned from the website, which in your example is the get_web variable.
Take a look at your code:
document = ('http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily''r')
web = urllib.urlopen(document)
get_web = web.read()
xmldoc = minidom.parseString(document)
I'm not sure you have document correct unless you want http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=dailyr because that's what you'll get (the parens group in this case and strings listed next to each other automatically concatenate).
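The concatenation is easy to check in isolation. In this small sketch the URL is a placeholder (example.invalid), not the real feed address:

```python
# Adjacent string literals are joined into one string;
# the surrounding parentheses only group the expression.
document = ('http://example.invalid/feed?type=daily''r')
print(document)
```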
After that you do some work to create get_web but then you don't use it in the next line. Instead you try to parse your document which is the url...
Beyond that, I would totally suggest you use ElementTree, preferably lxml's ElementTree (http://lxml.de/). Also, lxml's etree parser takes a file-like object which can be a urllib object. If you did, after straightening out the rest of your doc, you could do this:
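from urllib.request import urlopen
from lxml import etree
url = 'http://www.newyorkfed.org/markets/omo/dmm/fftoXML.cfm?type=daily'
tree = etree.parse(urlopen(url))
# Assumes the ff: and base: prefixes are declared on the root element.
# The default namespace (prefix None) is dropped because XPath rejects it.
ns = {prefix: uri for prefix, uri in tree.getroot().nsmap.items() if prefix}
for obs in tree.xpath('//ff:DataSet/ff:Series/ff:Obs', namespaces=ns):
    # xpath() returns a list, so take the first match before reading .text
    price = obs.xpath('./base:OBS_VALUE', namespaces=ns)[0].text
    print(price)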
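If installing lxml is not an option, roughly the same lookup can be sketched with the standard library's xml.etree.ElementTree. Everything about the namespaces here is an assumption: the excerpt above does not show the xmlns declarations, so the urn:example:... URIs and the Root wrapper element are placeholders that would need to be replaced with the values declared in the real feed.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the feed. The namespace URIs and the <Root> wrapper
# are placeholders; the real ones are declared on the feed's root element.
sample = """<Root xmlns:ff="urn:example:ff" xmlns:base="urn:example:base">
<ff:DataSet><ff:Series><ff:Obs OBS_CONF="F" OBS_STATUS="A">
<base:TIME_PERIOD>2013-05-07</base:TIME_PERIOD>
<base:OBS_VALUE>0.12</base:OBS_VALUE>
</ff:Obs></ff:Series></ff:DataSet></Root>"""

# Prefix-to-URI mapping used to resolve the prefixes in the paths below.
ns = {"ff": "urn:example:ff", "base": "urn:example:base"}

root = ET.fromstring(sample)
prices = [
    obs.findtext("base:OBS_VALUE", namespaces=ns)
    for obs in root.findall(".//ff:DataSet/ff:Series/ff:Obs", ns)
]
print(prices)
```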
