I have just started with lxml basics and I am stuck with namespaces: I need to generate an XML document like this:
<CityModel xmlns:bldg="http://www.opengis.net/citygml/building/2.0">
<cityObjectMember>
<bldg:Building>
<bldg:function>1000</bldg:function>
</bldg:Building>
</cityObjectMember>
</CityModel>
By using the following code:
from lxml import etree
cityModel = etree.Element("cityModel")
cityObject = etree.SubElement(cityModel, "cityObjectMember")
bldg = etree.SubElement(cityObject, "{http://schemas.opengis.net/citygml/building/2.0/building.xsd}bldg")
function = etree.SubElement(bldg, "{bldg:}function")
function.text = "1000"
print etree.tostring(cityModel, pretty_print=True)
I get this:
<cityModel>
<cityObjectMember>
<ns0:bldg xmlns:ns0="http://schemas.opengis.net/citygml/building/2.0/building.xsd">
<ns1:function xmlns:ns1="bldg:">1000</ns1:function>
</ns0:bldg>
</cityObjectMember>
</cityModel>
which is quite different from what I want, and my software can't parse it.
How do I get the correct XML?
from lxml import etree
ns_bldg = "http://www.opengis.net/citygml/building/2.0"
nsmap = {
'bldg': ns_bldg,
}
cityModel = etree.Element("cityModel", nsmap=nsmap)
cityObject = etree.SubElement(cityModel, "cityObjectMember")
bldg = etree.SubElement(cityObject, "{%s}Building" % ns_bldg)
function = etree.SubElement(bldg, "{%s}function" % ns_bldg)
function.text = "1000"
print etree.tostring(cityModel, pretty_print=True)
prints
<cityModel xmlns:bldg="http://www.opengis.net/citygml/building/2.0">
<cityObjectMember>
<bldg:Building>
<bldg:function>1000</bldg:function>
</bldg:Building>
</cityObjectMember>
</cityModel>
See lxml.etree Tutorial - Namespaces.
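If CityModel itself also needs to live in a namespace (as it typically does in CityGML), lxml lets you declare a default, unprefixed namespace by using None as a key in nsmap. A minimal sketch along the same lines, assuming the CityGML 2.0 core namespace URI:
from lxml import etree

ns_core = "http://www.opengis.net/citygml/2.0"   # assumed CityGML core namespace URI
ns_bldg = "http://www.opengis.net/citygml/building/2.0"
nsmap = {None: ns_core, 'bldg': ns_bldg}         # None => default (unprefixed) namespace

cityModel = etree.Element("{%s}CityModel" % ns_core, nsmap=nsmap)
cityObject = etree.SubElement(cityModel, "{%s}cityObjectMember" % ns_core)
bldg = etree.SubElement(cityObject, "{%s}Building" % ns_bldg)
function = etree.SubElement(bldg, "{%s}function" % ns_bldg)
function.text = "1000"

print(etree.tostring(cityModel, pretty_print=True))
With this map, CityModel and cityObjectMember serialize without a prefix while the building elements keep the bldg: prefix.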
I have exported my Google Maps Points of Interest (saved places / locations) via the Takeout tool. How can I convert this to GPX, so that I can import it into OSMAnd?
I tried using gpsbabel:
gpsbabel -i geojson -f my-saved-locations.json -o gpx -F my-saved-locations_converted.gpx
But this did not retain the title/name of each point of interest - and instead just used names like WPT001, WPT002, etc.
In the end I solved this by creating a small Python script to convert between the formats.
This could be easily adapted for specific needs:
#!/usr/bin/env python3
import argparse
import json
import xml.etree.ElementTree as ET
from xml.dom import minidom
def ingestJson(geoJsonFilepath):
    poiList = []
    with open(geoJsonFilepath) as fileObj:
        data = json.load(fileObj)
        for f in data["features"]:
            poiList.append({'title': f["properties"]["Title"],
                            'lon': f["geometry"]["coordinates"][0],
                            'lat': f["geometry"]["coordinates"][1],
                            'link': f["properties"].get("Google Maps URL", ''),
                            'address': f["properties"]["Location"].get("Address", '')})
    return poiList

def dumpGpx(gpxFilePath, poiList):
    gpx = ET.Element("gpx", version="1.1", creator="", xmlns="http://www.topografix.com/GPX/1/1")
    for poi in poiList:
        wpt = ET.SubElement(gpx, "wpt", lat=str(poi["lat"]), lon=str(poi["lon"]))
        ET.SubElement(wpt, "name").text = poi["title"]
        ET.SubElement(wpt, "desc").text = poi["address"]
        ET.SubElement(wpt, "link").text = poi["link"]
    xmlstr = minidom.parseString(ET.tostring(gpx)).toprettyxml(encoding="utf-8", indent=" ")
    with open(gpxFilePath, "wb") as f:
        f.write(xmlstr)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--inputGeoJsonFilepath', required=True)
    parser.add_argument('--outputGpxFilepath', required=True)
    args = parser.parse_args()

    poiList = ingestJson(args.inputGeoJsonFilepath)
    dumpGpx(args.outputGpxFilepath, poiList=poiList)

if __name__ == "__main__":
    main()
...
it can be called like so:
./convert-googlemaps-geojson-to-gpx.py \
--inputGeoJsonFilepath my-saved-locations.json \
--outputGpxFilepath my-saved-locations_converted.gpx
There is also an NPM package called "togpx":
https://github.com/tyrasd/togpx
I didn't try it, but it claims to keep as much information as possible.
I am new to Python, and I am presently converting XML to CSV using Python 3.6.1.
The input file is file1.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Package>
<name>AllFeatureRules</name>
<pkgId>13569656</pkgId>
<pkgMetadata>
<creator>rsikhapa</creator>
<createdDate>13-05-2018 10:07:16</createdDate>
<pkgVersion>3.0.29</pkgVersion>
<application>All</application>
<icType>Feature</icType>
<businessService>Common</businessService>
<technology>All,NA</technology>
<runTimeFormat>RBML</runTimeFormat>
<inputForTranslation></inputForTranslation>
<pkgDescription></pkgDescription>
</pkgMetadata>
<rules>
<rule>
<name>ip_slas_scheduling</name>
<ruleId>46288</ruleId>
<ruleVersion>1.3.0</ruleVersion>
<ruleVersionId>1698132</ruleVersionId>
<nuggetId>619577</nuggetId>
<nuggetVersionId>225380</nuggetVersionId>
<icType>Feature</icType>
<creator>paws</creator>
<customer></customer>
</rule>
</rules>
<versionChanges>
<rulesAdded/>
<rulesModified/>
<rulesDeleted/>
</versionChanges>
</Package>
Python code:
import xml.etree.ElementTree as ET
import pandas as pd
tree = ET.parse("file1.xml")
root = tree.getroot()
get_range = lambda col: range(len(col))
l = [{r[i].tag:r[i].text for i in get_range(r)} for r in root]
df = pd.DataFrame.from_dict(l)
df.to_csv('ABC.csv')
With the Python code written as above, the problem is that the CSV conversion only covers the parent element (pkgMetadata), not the child elements under rules; it is not converting the whole XML file into CSV. Please let me know a solution.
To iterate over every entry, you can use the element tree's .iter() method.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file1.xml")
root = tree.getroot()

iter_root = root.iter()
l = {}
for elem in iter_root:
    l[str(elem.tag)] = str(elem.text)

df = pd.DataFrame.from_dict(l, orient="index")
df.to_csv('ABC.csv')
producing a CSV:
;0
Package;"
"
name;ip_slas_scheduling
pkgId;13569656
pkgMetadata;"
"
creator;paws
createdDate;13-05-2018 10:07:16
pkgVersion;3.0.29
application;All
icType;Feature
businessService;Common
technology;All,NA
runTimeFormat;RBML
inputForTranslation;None
pkgDescription;None
rules;"
"
rule;"
"
ruleId;46288
ruleVersion;1.3.0
ruleVersionId;1698132
nuggetId;619577
nuggetVersionId;225380
customer;None
versionChanges;"
"
rulesAdded;None
rulesModified;None
rulesDeleted;None
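If the goal is instead one CSV row per <rule>, with the package metadata repeated on each row, a small variation may fit better. This is only a sketch based on the sample file above (the element names pkgMetadata and rule are taken from it):
import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse("file1.xml")
root = tree.getroot()

# Package-level metadata, collected once.
meta = {child.tag: child.text for child in root.find("pkgMetadata")}

# One row per <rule>, with the metadata merged in.
rows = []
for rule in root.iter("rule"):
    row = dict(meta)
    row.update({child.tag: child.text for child in rule})
    rows.append(row)

pd.DataFrame(rows).to_csv("ABC.csv", index=False)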
I am trying to use findall to select some XML elements, but I can't get any results.
import xml.etree.ElementTree as ET
import sys
storefront = sys.argv[1]
xmlFileName = 'promotions{0}.xml'
xmlFile = xmlFileName.format(storefront)
csvFileName = 'hrz{0}.csv'
csvFile = csvFileName.format(storefront)
ET.register_namespace('', "http://www.demandware.com/xml/impex/promotion/2008-01-31")
tree = ET.parse(xmlFile)
root = tree.getroot()
print('------------------Generate test-------------\n')
csv = open(csvFile,'w')
n = 0
for child in root.findall('campaign'):
    print(child.attrib['campaign-id'])
    print(n)
    n += 1
The XML looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<promotions xmlns="http://www.demandware.com/xml/impex/promotion/2008-01-31">
<campaign campaign-id="10off-310781">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
</campaign>
<campaign campaign-id="MNT-deals">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<start-date>2017-07-03T22:00:00.000Z</start-date>
<end-date>2017-07-31T22:00:00.000Z</end-date>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
</campaign>
<campaign campaign-id="black-friday">
<enabled-flag>true</enabled-flag>
<campaign-scope>
<applicable-online/>
</campaign-scope>
<start-date>2017-11-23T23:00:00.000Z</start-date>
<end-date>2017-11-24T23:00:00.000Z</end-date>
<customer-groups match-mode="any">
<customer-group group-id="Everyone"/>
</customer-groups>
<custom-attributes>
<custom-attribute attribute-id="expires_date">2017-11-29</custom-attribute>
</custom-attributes>
</campaign>
<promotion-campaign-assignment promotion-id="winter17-new-bubble" campaign-id="winter17-new-bubble">
<qualifiers match-mode="any">
<customer-groups/>
<source-codes/>
<coupons/>
</qualifiers>
<rank>100</rank>
</promotion-campaign-assignment>
<promotion-campaign-assignment promotion-id="xmas" campaign-id="xmas">
<qualifiers match-mode="any">
<customer-groups/>
<source-codes/>
<coupons/>
</qualifiers>
</promotion-campaign-assignment>
</promotions>
Any ideas what I am doing wrong?
I have tried different solutions that I found on Stack Overflow, but nothing seems to work for me so far.
The list is empty.
Sorry if it is something very obvious; I am new to Python.
As mentioned here by @MartijnPieters, etree's .findall uses the namespaces argument, while .register_namespace() is used for XML output of the tree. Therefore, consider mapping the default namespace with an explicit prefix. Below uses doc, but it can be any prefix (even cosmin).
Additionally, consider a with block, enumerate(), and even the csv module as better handlers for your print and CSV output.
import csv
...
root = tree.getroot()
print('------------------Generate test-------------\n')
with open(csvFile, 'w') as f:
    c = csv.writer(f, lineterminator='\n')
    for n, child in enumerate(root.findall('doc:campaign', namespaces={'doc': 'http://www.demandware.com/xml/impex/promotion/2008-01-31'})):
        print(child.attrib['campaign-id'])
        print(n)
        c.writerow([child.attrib['campaign-id']])
# ------------------Generate test-------------
# 10off-310781
# 0
# MNT-deals
# 1
# black-friday
# 2
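As an aside, the prefix map can also be skipped entirely by writing the namespace URI directly into the path in Clark notation, or (on Python 3.8 and later) by using the {*} wildcard. A couple of equivalent lookups, assuming the same root as above:
ns_uri = 'http://www.demandware.com/xml/impex/promotion/2008-01-31'
campaigns = root.findall('{%s}campaign' % ns_uri)   # Clark-notation lookup, works on any Python version
# campaigns = root.findall('{*}campaign')           # wildcard namespace, Python 3.8+
print(len(campaigns))                               # should be 3 for the sample XML above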
I want to filter failure messages from the output files generated after executing my test cases in Robot Framework. I have tried modules like robot.api's ExecutionResult, but it only gives me the count of passed and failed test cases.
I have also tried other Robot Framework libraries like robot.errors to filter out all error messages, but didn't have any luck. Below is my code block:
#!/usr/bin/python
from robot.api import ExecutionResult
import robot.errors
from robot.result.visitor import ResultVisitor
xmlpath = "<output.xml PATH>"
result = ExecutionResult(xmlpath)
result.configure(stat_config={'suite_stat_level': 2,
'tag_stat_combine': 'tagANDanother'})
stats = result.statistics
print stats.total.critical.failed
print stats.total.critical.passed
print stats.total.critical.passed + stats.total.critical.failed
class FailureCollector(ResultVisitor):
    def __init__(self):
        self.failures = []

    def visit_test(self, test):
        if not test.passed:
            self.failures += [test]

failure_collector = FailureCollector()
result.visit(failure_collector)
print failure_collector.failures
# the above print gives me all failed test cases as a list, e.g. ['test1:My example Testcase1', 'test2:My example Testcase2']
Any example of how to get this done would be very helpful.
I tried hard to get my expected output using the Robot Framework APIs but didn't find a proper solution. Finally I got my solution using the xml.etree.ElementTree module: I parse the Robot Framework output XML file with it and get my work done.
import xml.etree.ElementTree as ET
import re

tree = ET.parse('<output.xml file Path>')
root = tree.getroot()
testplans = <Testplans as a list>
i = 0
err_dict = {}
for testplan in testplans:
    full_err_list = []
    err_list = []
    for suite_level_1 in root:
        try:
            if suite_level_1.tag == "suite":
                for suite_level_2 in suite_level_1:
                    if suite_level_2.tag == "suite" and suite_level_2.attrib['name'] == testplan:
                        for suite_level_3 in suite_level_2:
                            if suite_level_3.tag == "suite":
                                for test in suite_level_3:
                                    if test.tag == "test":
                                        for kw_level_5 in test:
                                            if kw_level_5.tag == "kw" and kw_level_5.attrib['name'] == '<specific keyword under which you expect your result (error or Success message)>':
                                                for msg in kw_level_5:
                                                    if msg.tag == 'msg':
                                                        err_str = msg.text
                                                        #print err_str
                                                        mat = re.match(r'\$\{FinalResult\}\s=\s(.*)', err_str)
                                                        if mat and mat.group(1) != 'Succeeded.':
                                                            i = i + 1
                                                            #print mat.group(1), i
                                                            err = mat.group(1)
                                                            full_err_list.append(err)
                                                            if err not in err_list:
                                                                err_list.append(err)
        except:
            print "Errors found"
            break
    err_dict[testplan] = err_list
    print "\n########## " + testplan + " ##########\n"
    print "Total no of failures", len(full_err_list)
    for err_name in err_list:
        print err_name, "===>>", full_err_list.count(err_name)
# The above will print each error name and its count for a specific test plan.
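For completeness, the failure text can often be read straight from the result model without walking the XML by hand; a rough sketch using the same ResultVisitor approach as in the question, assuming a Robot Framework version where a failed test exposes its error text as test.message:
from robot.api import ExecutionResult
from robot.result.visitor import ResultVisitor

class FailureMessageCollector(ResultVisitor):
    def __init__(self):
        self.failures = []                     # (test name, failure message) pairs

    def visit_test(self, test):
        if not test.passed:
            self.failures.append((test.name, test.message))

result = ExecutionResult("output.xml")         # path to the Robot Framework output file
collector = FailureMessageCollector()
result.visit(collector)

for name, message in collector.failures:
    print("%s ===>> %s" % (name, message))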
I'm trying to capture all instances of "Catalina 320" SO LONG as they occur before the "These boats" string (see generic sample below).
I have the code to capture ALL instances of "Catalina 320" but I can't figure out how to stop it at the "These boats" string.
resultsArray = re.findall(r'<tag>(Catalina 320)</tag>', string, re.DOTALL)
Can anyone help me solve this missing piece? I tried adding '.+These boats' but it didn't work.
Blah blah blah
<tag>**Catalina 320**</tag>
Blah
<td>**Catalina 320**</td>
Blah Blah
<tag>**These boats** are fully booked for the day</tag>
Blah blah blah
<tag>Catalina 320</tag>
<tag>Catalina 320</tag>
You could solve this with a regular expression, but a regex isn't required given the way you stated the problem (see End Note 1).
You should use lxml to parse this...
import lxml.etree as ET
from lxml.etree import XMLParser
resultsArray = []
parser = XMLParser(ns_clean=True, recover=True)
tree = ET.parse('foo.html', parser) # See End-Note 2
for elem in tree.findall("//"):
    if "These boats" in elem.text:
        break
    elif "Catalina 320" in elem.text:
        resultsArray.append(ET.tostring(elem).strip())

print resultsArray
Executing this yields:
[mpenning@Bucksnort ~]$ python foo.py
['<tag>**Catalina 320**</tag>', '<td>**Catalina 320**</td>']
[mpenning@Bucksnort ~]$
End Notes:
The current version of your question doesn't have valid markup, but I assumed you have either XML or HTML (which is what you had in version 1 of the question)... my answer can handle your text as written, but it makes more sense to assume some kind of structured markup, so I used the following input text, which I saved locally as foo.html:
<body>
<tag>Blah blah blah</tag>
<tag>**Catalina 320**</tag>
<tag>Blah<tag>
<td>**Catalina 320**</td>
</tag>Blah Blah </tag>
<tag>**These boats** are fully booked for the day</tag>
<tag>Blah blah blah</tag>
<tag>Catalina 320</tag>
<tag>Catalina 320</tag>
</body>
If you want to be a bit more careful about encoding issues, you can use lxml.soupparser as a fallback when parsing HTML with lxml:
from lxml.html import soupparser
# ...
try:
    parser = XMLParser(ns_clean=True, recover=True)
    tree = ET.parse('foo.html', parser)
except UnicodeDecodeError:
    tree = soupparser.parse('foo.html')
If there is no other context to your problem, you can just search before the first occurrence of 'These boats':
re.findall('Catalina 320', string.split('These boats')[0])
groups = re.findall(r'(Catalina 320)*.*These boats', r.read(), re.DOTALL)
The first group in groups will contain the list of Catalina 320 matches.
With a file named 'foo.html' containing
<body>
<tag>Blah blah blah</tag>
<tag>**Catalina 320**</tag>
<tag>Blah<tag>
<td>**Catalina 320**</td>
</tag>Blah Blah </tag>
<tag>**These boats** are fully booked for the day</tag>
<tag>Blah blah blah</tag>
<tag>Catalina 320</tag>
<tag>Catalina 320</tag>
</body>
code:
from time import clock
n = 1000
########################################################################
import lxml.etree as ET
from lxml.etree import XMLParser
parser = XMLParser(ns_clean=True, recover=True)
etree = ET.parse('foo.html', parser)
te = clock()
for i in xrange(n):
    resultsArray = []
    for thing in etree.findall("//"):
        if "These boats" in thing.text:
            break
        elif "Catalina 320" in thing.text:
            resultsArray.append(ET.tostring(thing).strip())
tf = clock()
print 'Solution with lxml'
print tf-te,'\n',resultsArray
########################################################################
with open('foo.html') as f:
    text = f.read()
import re
print '\n\n----------------------------------'
rigx = re.compile('(Catalina 320)(?:(?:.(?!Catalina 320))*These boats.*\Z)?',re.DOTALL)
te = clock()
for i in xrange(n):
    yi = rigx.findall(text)
tf = clock()
print 'Solution 1 with a regex'
print tf-te,'\n',yi
print '\n----------------------------------'
ragx = re.compile('(Catalina 320)|(These boats)')
te = clock()
for i in xrange(n):
    li = []
    for mat in ragx.finditer(text):
        if mat.group(2):
            break
        else:
            li.append(mat.group(1))
tf = clock()
print 'Solution 2 with a regex, similar to solution with lxml'
print tf-te,'\n',li
print '\n----------------------------------'
regx = re.compile('(Catalina 320)')
te = clock()
for i in xrange(n):
    ye = regx.findall(text, 0, text.find('These boats') if 'These boats' in text else len(text))
tf = clock()
print 'Solution 3 with a regex'
print tf-te,'\n',ye
result
Solution with lxml
0.30324105438
['<tag>**Catalina 320**</tag>', '<td>**Catalina 320**</td>']
----------------------------------
Solution 1 with regex
0.0245033935877
['Catalina 320', 'Catalina 320']
----------------------------------
Solution 2 with a regex, similar to solution with lxml
0.0233258696287
['Catalina 320', 'Catalina 320']
----------------------------------
Solution 3 with regex
0.00784708671074
['Catalina 320', 'Catalina 320']
What is wrong with my regex solutions?
Times:
lxml - 100 %
solution 1 - 8.1 %
solution 2 - 7.7 %
solution 3 - 2.6 %
Using a regex doesn't require the text to be XML or HTML.
So, what are the remaining arguments for claiming that regexes are inferior to lxml for this problem?
EDIT 1
The solution with rigx = re.compile('(Catalina 320)(?:(?:.(?!Catalina 320))*These boats.*\Z)?', re.DOTALL) isn't good:
this regex will catch the occurrences of 'Catalina 320' situated AFTER 'These boats' IF there are no occurrences of 'Catalina 320' BEFORE 'These boats'.
The pattern must be:
rigx = re.compile('(<tag>Catalina 320</tag>)(?:(?:.(?!<tag>Catalina 320</tag>))*These boats.*\Z)?|These boats.*\Z',re.DOTALL)
But this is a rather complicated pattern compared to the other solutions.