I have to generate an XML document that takes a param value from the model.
The resulting XML should look like this: <read><test><value string/></test></read>
root = etree.Element('read')
node = etree.Element('test')
value = etree.SubElement(node, "value")
for dict in Listing.objects.values('string'):
    value.text = dict['string']
I get this instead: <read><test><value>string</value></test></read>
How should I add the string to the value tag rather than as text?
Firstly, as #parfait mentioned, the XML result you want (<read><test><value string/></test></read>) is not valid XML.
Using the code you provided, the node test doesn't get added to the root node read.
1. Code to get your current result:
If you want something like this as the result: <read><test><value>string</value></test></read> (which you said you don't), then the code for it would be:
>>> root = etree.Element('read')
>>> node = etree.SubElement(root, 'test') # `node` should be a `SubElement` of root
>>> # or create as `Element` and use `root.append(node)` to make it a `SubElement` later
... value = etree.SubElement(node, 'value')
>>> value.text = 'string' # or populate based on your `Listing.objects`
>>> etree.dump(root)
<read><test><value>string</value></test></read>
2. "string" as a value (text) of "test":
If you want 'string' to be the value of test and not as a node 'value' under 'test', then you should set 'string' as the .text attribute of 'test':
>>> root = etree.Element('read')
>>> node = etree.SubElement(root, 'test')
>>> node.text = 'string'
>>> etree.dump(root)
<read><test>string</test></read>
3. "value" as an attribute of "test" with the value "string":
The one I think you're trying to get:
>>> root = etree.Element('read')
>>> node = etree.SubElement(root, 'test')
>>> node.attrib # the current attributes of the node, nothing, empty dict
{}
>>> node.attrib['value'] = 'string' # this is how you set an attribute
>>> etree.dump(root)
<read><test value="string" /></read>
By the way, in terms of XML style the 2nd option is nicer than the 3rd; but both are valid XML.
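For completeness, here is a sketch of option 3 wired into the loop from the question (Listing.objects is the Django queryset from the question; like the original loop, only the last row's value is kept):
root = etree.Element('read')
node = etree.SubElement(root, 'test')
for row in Listing.objects.values('string'):
    # produces <test value="...">; as in the original loop, the last row wins
    node.set('value', row['string'])
etree.dump(root)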
I use ruamel.yaml in order to parse YAML files and I'd like to identify if the key is the anchor itself or just a pointer. Given the following:
foo: &some_anchor
  bar: 1
baz: *some_anchor
I'd like to understand that foo is the actual anchor and baz is a pointer. From what I can see, there's an anchor property on the node (and also yaml_anchor method), but both baz and foo show that their anchor is some_anchor - meaning that I cannot differentiate.
How can I get this info?
Since PyYAML and ruamel.yaml load an alias node as a reference to the object loaded from the corresponding anchor node, you can traverse the object tree and check whether each node is a reference to a previously visited object.
The following is a simple example only checking dictionaries.
from ruamel.yaml import YAML

root = YAML().load('''
foo: &some_anchor
  bar: 1
baz: *some_anchor
''')

dict_ids = set()

def visit(parent):
    if isinstance(parent, dict):
        i = id(parent)
        print(parent, ', is_alias:', i in dict_ids)
        dict_ids.add(i)
        for k, v in parent.items():
            visit(v)
    elif isinstance(parent, list):
        for e in parent:
            visit(e)

visit(root)
This will output the following.
ordereddict([('foo', ordereddict([('bar', 1)])), ('baz', ordereddict([('bar', 1)]))]) , is_alias: False
ordereddict([('bar', 1)]) , is_alias: False
ordereddict([('bar', 1)]) , is_alias: True
In your example &some_anchor is the anchor for the single-element mapping bar: 1 and
*some_anchor is the alias. Writing that "foo is the actual anchor and baz is a pointer" is,
in my opinion, both incorrect terminology and confuses keys with their (anchored/aliased) values. If you had a YAML document:
- 3
- 5
- 9
- &some_anchor
  bar: 1
- 42
- *some_anchor
would you actually say, probably after carefully counting,
that 4 is the anchor and 6 is the pointer (or 3 and 5, depending on
where you start counting)?
If you want to test whether a key of a dict has a value that was an anchored node in YAML, or whether that
value was an aliased node, you'll have to look at the value, and you'll find that it is the same Python data structure
for the keys foo resp. baz.
Which key's value gets the anchor on dumping, and which key's (or keys') value(s) are dumped as an alias, is entirely
determined by which gets dumped first, as the YAML specification states that an anchor has to come before its use as an
alias (an anchor can come after an alias if it is re-defined).
As #relent95 describes, you should recursively walk over the
data structure you loaded (to see which key gets there first) and, in both ruamel.yaml and PyYAML, look at the id().
But for PyYAML that only works for complex data (dict, list, objects), as it throws away anchoring information and will
not find the same id() on e.g. an anchored integer value.
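For example, a minimal sketch with PyYAML (assuming it is installed) shows that the anchored and aliased mappings load as the very same dict object, while the anchor name itself is gone:
import yaml  # PyYAML

data = yaml.safe_load("""
foo: &some_anchor
  bar: 1
baz: *some_anchor
""")
print(data['foo'] is data['baz'])  # True: same object, so an id() check works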
The alternative to using the id is to look at the actual anchor name that ruamel.yaml stores in attribute/property anchor.
If you know up front that your YAML document is as simple as your example (anchored/aliased nodes are values of
the root-level mapping) you can do:
import sys
import ruamel.yaml

yaml_str = """\
foo: &some_anchor
  bar: 1
baz: *some_anchor
oof: 42
"""

def is_keys_value_anchor(key, data, verbose=0):
    anchor_found = set()
    for k, v in data.items():
        res = None
        try:
            anchor = v.anchor.value
            if anchor is not None:
                res = anchor not in anchor_found
                anchor_found.add(anchor)
        except AttributeError:
            pass
        if k == key:
            break
    if verbose > 0:
        print(f'key "{key}" {res}')
    return res

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
is_keys_value_anchor('foo', data, verbose=1)
is_keys_value_anchor('baz', data, verbose=1)
is_keys_value_anchor('oof', data, verbose=1)
which gives:
key "foo" True
key "baz" False
key "oof" None
But this is inefficient for root mappings with lots of keys, and it won't find anchors/aliases that are nested deeply
in the document. A more generic approach is to recursively walk the data structure once and create a dict with
the anchor name as key, and as value a list of "paths". A path itself is a list of keys/indices with which
you can traverse the data structure starting at the root; the first path in the list is the anchor, the rest are aliases:
import sys
import ruamel.yaml

yaml_str = """\
foo: &some_anchor
- bar: 1
- klm: &anchored_num 42
baz:
  xyz:
  - *some_anchor
oof: [1, 2, c: 13, magic: [*anchored_num]]
"""

def find_anchor_alias_paths(data, path=None, res=None):
    def check_add_anchor(d, path, anchors):
        # returns False when an alias is found, to prevent recursing into a node twice.
        try:
            anchor = d.anchor.value
            if anchor is not None:
                tmp = anchors.setdefault(anchor, [])
                tmp.append(path)
                return len(tmp) == 1
        except AttributeError:
            pass
        return True

    if path is None:
        path = []
    if res is None:
        res = {}
    if isinstance(data, dict):
        for k, v in data.items():
            next_path = path.copy()
            next_path.append(k)
            if check_add_anchor(v, next_path, res):
                find_anchor_alias_paths(v, next_path, res)
    elif isinstance(data, list):
        for idx, elem in enumerate(data):
            next_path = path.copy()
            next_path.append(idx)
            if check_add_anchor(elem, next_path, res):
                find_anchor_alias_paths(elem, next_path, res)
    return res

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
anchor_alias_paths = find_anchor_alias_paths(data)
for anchor, paths in anchor_alias_paths.items():
    print(f'anchor: "{anchor}", anchor_path: {paths[0]}, alias_path(s): {paths[1:]}')
print('value for last anchor/alias found', data.mlget(paths[-1], list_ok=True))
which gives:
anchor: "some_anchor", anchor_path: ['foo'], alias_path(s): [['baz', 'xyz', 0]]
anchor: "anchored_num", anchor_path: ['foo', 1, 'klm'], alias_path(s): [['oof', 3, 'magic', 0]]
value for last anchor/alias found 42
You can then test the paths you are interested in against the values returned by find_anchor_alias_paths,
or test the key against the final elements of such paths.
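As an illustration, here is a small hypothetical helper (path_is_alias is my name, not part of the answer) that classifies a path using the dict returned above:
def path_is_alias(paths_by_anchor, path):
    # first path in each list is the anchor, the rest are aliases
    for paths in paths_by_anchor.values():
        if path in paths:
            return paths.index(path) > 0
    return None  # the path carries no anchor/alias at all

print(path_is_alias(anchor_alias_paths, ['foo']))            # False -> anchored
print(path_is_alias(anchor_alias_paths, ['baz', 'xyz', 0]))  # True  -> alias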
I have a config file with loads of paths and I want to organize them somehow. So I decided to use types.SimpleNamespace, like:
paths = SimpleNamespace()
paths.base = '...'
paths.dataset.encoded = '...'
and I got:
AttributeError: 'types.SimpleNamespace' object has no attribute 'dataset'
I tried defining paths.dataset first, even though I didn't need it yet, but it didn't work:
paths = SimpleNamespace()
paths.base = '...'
paths.dataset = '...'
paths.dataset.encoded = '...'
AttributeError: 'str' object has no attribute 'encoded'
I also tried this:
_ = {
    'base': '...',
    'dataset': {
        'encoded': '...',
    }
}
paths = SimpleNamespace(**_)
and here is the result:
>>> paths.dataset.encoded # Error
AttributeError: 'dict' object has no attribute 'encoded'
>>> paths.dataset['encoded'] # works
'...'
This means that SimpleNamespace only works for one layer namespacing, right?
Is there another solution to this? I mean a solution other than using SimpleNamespace for every layer like this:
dataset = SimpleNamespace()
dataset.encoded = '...'
paths = SimpleNamespace()
paths.base = '???'
paths.dataset = dataset
>>> paths.base
'???'
>>> paths.dataset.encoded
'...'
Any ideas?
I came up with this solution:
from types import SimpleNamespace


def create_namespace(dictionary: dict):
    """Create a namespace from the given dictionary.

    The difference between create_namespace and Python's types.SimpleNamespace
    is that the former creates the namespace recursively, while the latter
    only creates the namespace one layer deep. See the examples for the
    difference.

    Parameters
    ----------
    dictionary : dict
        A dict to be converted to a namespace object

    Returns
    -------
    types.SimpleNamespace
        A combination of SimpleNamespaces forming a multilayer namespace

    Examples
    --------
    >>> dictionary = {
    ...     'layer1_a': '1a',
    ...     'layer1_b': {
    ...         'layer2_a': '2a',
    ...     },
    ... }
    >>> # types.SimpleNamespace
    >>> simple = SimpleNamespace(**dictionary)
    >>> simple.layer1_a
    '1a'
    >>> simple.layer1_b.layer2_a
    AttributeError: 'dict' object has no attribute 'layer2_a'
    # because layer1_b is still a dictionary
    >>> # create_namespace
    >>> complex = create_namespace(dictionary)
    >>> complex.layer1_a
    '1a'
    >>> complex.layer1_b.layer2_a
    '2a'
    """
    space = {}
    for key, value in dictionary.items():
        if isinstance(value, dict):
            value = create_namespace(value)
        space[key] = value
    return SimpleNamespace(**space)
but I think there is a better way that I'm not seeing. I appreciate any comments on this.
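One alternative that is sometimes suggested (a sketch, and only valid for JSON-serializable values such as strings, numbers, lists and dicts) is to round-trip the dict through json and let object_hook build a SimpleNamespace for every mapping:
import json
from types import SimpleNamespace

paths_dict = {
    'base': '...',
    'dataset': {
        'encoded': '...',
    },
}

# object_hook is called for every decoded mapping, bottom-up
paths = json.loads(json.dumps(paths_dict),
                   object_hook=lambda d: SimpleNamespace(**d))
print(paths.dataset.encoded)  # '...'
This avoids the explicit recursion, but it silently drops non-JSON types such as tuples or datetime objects.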
My XML string is -
xmlData = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Cancelled>false</Cancelled>
<MessageID>00000000-0000-0000-0000-000000000000</MessageID>
<Queued>false</Queued>
<SMSError>NoError</SMSError>
<SMSIncomingMessages i:nil="true"/>
<Sent>false</Sent>
<SentDateTime>0001-01-01T00:00:00</SentDateTime>
</SMSResponse>"""
I am trying to parse and get the values of the tags - Cancelled, MessageID, SMSError, etc. I am using Python's ElementTree library. So far, I have tried things like -
root = ET.fromstring(xmlData)
print root.find('Sent')  # gives None
for child in root:
    print child.find('MessageId')  # also gives None
Although, I am able to print the tags with -
for child in root:
    print child.tag
# child.tag for the tag Cancelled is - {http://example.com}Cancelled
and their respective values with -
for child in root:
    print child.text
How do I get something like -
print child.Queued  # will print false
Like in PHP we can access them with the root -
$xml = simplexml_load_string($data);
$status = $xml->SMSError;
Your document has a namespace on it; you need to include the namespace when searching:
root = ET.fromstring(xmlData)
print root.find('{http://example.com}Sent',)
print root.find('{http://example.com}MessageID')
output:
<Element '{http://example.com}Sent' at 0x1043e0690>
<Element '{http://example.com}MessageID' at 0x1043e0350>
The find() and findall() methods also take a namespace map; you can search with an arbitrary prefix, and the prefix will be looked up in that map, to save typing:
nsmap = {'n': 'http://example.com'}
print root.find('n:Sent', namespaces=nsmap)
print root.find('n:MessageID', namespaces=nsmap)
If you're set on Python standard XML libraries, you could use something like this:
root = ET.fromstring(xmlData)
namespace = 'http://example.com'

def query(tree, nodename):
    return tree.find('{{{ex}}}{nodename}'.format(ex=namespace, nodename=nodename))

queued = query(root, 'Queued')
print queued.text
You can create a dictionary and directly get values out of it...
tree = ET.fromstring(xmlData)
root = {}
for child in tree:
    root[child.tag.split("}")[1]] = child.text
print root["Queued"]
With lxml.etree:
In [8]: import lxml.etree as et
In [9]: doc=et.fromstring(xmlData)
In [10]: ns={'n':'http://example.com'}
In [11]: doc.xpath('n:Queued/text()',namespaces=ns)
Out[11]: ['false']
With ElementTree you can do:
import xml.etree.ElementTree as ET
root=ET.fromstring(xmlData)
ns={'n':'http://example.com'}
root.find('n:Queued',namespaces=ns).text
Out[13]: 'false'
I have an XML document in which I want to search for some elements and, if they match some criteria, I would like to delete them.
However, I cannot seem to be able to access the parent of the element so that I can delete it.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"

props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.attrib.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            # here I need to access the parent of prop
            # in order to delete the prop
Is there a way I can do this?
Thanks
You can remove child elements with the remove() method. To remove an element you have to call its parent's remove() method. Unfortunately Element does not provide a reference to its parent, so it is up to you to keep track of parent/child relations (which speaks against your use of elem.findall()).
A proposed solution could look like this:
root = elem.getroot()
for child in root:
    if child.tag != "{http://somens}prop":  # Element has .tag, not .name
        continue
    if True:  # TODO: do your check here!
        root.remove(child)
PS: don't use prop.attrib.get(), use prop.get(), as explained here.
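As a sketch of the "keep track of parents yourself" idea (building on the question's code, not part of the original answer), you can build a child-to-parent map once and then keep using findall():
import json
import xml.etree.ElementTree as ElementTree

tree = ElementTree.parse('test.xml')
# map every element to its parent in a single pass
parent_map = {child: parent for parent in tree.iter() for child in parent}

namespace = "{http://somens}"
for prop in tree.findall('.//{0}prop'.format(namespace)):
    if prop.get('type') == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            parent_map[prop].remove(prop)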
You could use XPath to select an Element's parent.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"

props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            # Get parent and remove this prop
            parent = prop.find("..")
            parent.remove(prop)
http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax
Except if you try that it doesn't work: http://elmpowered.skawaii.net/?p=74
So instead you have to:
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)

# Use xpath to get all parents of props
prop_parents = elem.findall(search + '/..')
for parent in prop_parents:
    # Still have to find and iterate through child props
    for prop in parent.findall(search):
        type = prop.get('type', None)
        if type == 'json':
            value = json.loads(prop.attrib['value'])
            if value['name'] == 'Page1.Button1':
                parent.remove(prop)
It is two searches and a nested loop. The inner search is only on elements known to contain props as direct children, but that may not mean much depending on your schema.
I know this is an old thread but this kept popping up while I was trying to figure out a similar task. I did not like the accepted answer for two reasons:
1) It doesn't handle multiple nested levels of tags.
2) It will break if multiple XML tags are deleted in the same level one after another. Since each element is accessed by index in Element._children, you shouldn't delete while iterating forward (see the short illustration below).
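Here is a minimal, self-contained illustration (not part of the original answer) of why forward iteration skips siblings when you remove as you go:
import xml.etree.ElementTree as et

root = et.fromstring("<r><a/><a/><a/></r>")
for child in root:        # forward iteration while mutating
    root.remove(child)
print(len(root))          # 1 -- one <a/> survives because the siblings shift left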
I think a better, more versatile solution is this:
import xml.etree.ElementTree as et

file = 'test.xml'
tree = et.parse(file)
root = tree.getroot()

def iterator(parents, nested=False):
    for child in reversed(parents):
        if nested:
            if len(child) >= 1:
                iterator(child)
        if True:  # Add your entire condition here
            parents.remove(child)

iterator(root, nested=True)
For the OP, this should work - but I don't have the data you're working with to test if it's perfect.
import json
import xml.etree.ElementTree as et

file = 'test.xml'
tree = et.parse(file)
namespace = "{http://somens}"
props = tree.findall('.//{0}prop'.format(namespace))

def iterator(parents, nested=False):
    for child in reversed(parents):
        if nested:
            if len(child) >= 1:
                iterator(child)
        if child.attrib.get('type') == 'json':
            value = json.loads(child.attrib['value'])
            if value['name'] == 'Page1.Button1':
                parents.remove(child)

iterator(props, nested=True)
A solution using the lxml module:
from lxml import etree

root = etree.fromstring(xml_str)
for e in root.findall('.//{http://some.name.space}node'):
    parent = e.getparent()
    for child in parent.findall('./{http://some.name.space}node'):
        try:
            parent.remove(child)
        except ValueError:
            pass
Using the fact that every child must have a parent, I'm going to simplify #kitsu.eb's example. If you use the findall command to get both the children and the parents, their indices will be equivalent.
file = open('test.xml', "r")
elem = ElementTree.parse(file)
namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)

# Use xpath to get all parents of props
prop_parents = elem.findall(search + '/..')
props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.attrib.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            # use the index of the current child to find
            # its parent and remove the child
            prop_parents[props.index(prop)].remove(prop)
I also used XPath for this issue, but in a different way:
root = elem.getroot()
elementName = "YourElement"

# this will find all the parents of the elements with elementName
for elementParent in root.findall(".//{}/..".format(elementName)):
    # this will find all the elements under the parent, and remove them
    for element in elementParent.findall("{}".format(elementName)):
        elementParent.remove(element)
I like to use an XPath expression for this kind of filtering. Unless I know otherwise, such an expression must be applied at the root level, which means I can't just get a parent and apply the same expression on that parent. However, it seems to me that there is a nice and flexible solution that should work with any supported XPath, as long as none of the sought nodes is the root. It goes something like this:
root = elem.getroot()
# Find all nodes matching the filter string (flt)
nodes = root.findall(flt)
while len(nodes):
    # As long as there are nodes, there should be parents
    # Get the first of all parents to the found nodes
    parent = root.findall(flt + '/..')[0]
    # Use this parent to remove the first node
    parent.remove(nodes[0])
    # Find all remaining nodes
    nodes = root.findall(flt)
I would only like to add a comment on the accepted answer, but my lack of reputation doesn't allow me to. I wanted to add that it is important to add .findall("*") to the iterator to avoid issues, as stated in the documentation:
Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.
Therefore, in the accepted answer the iteration should be for child in root.findall("*"): instead of for child in root:. Not doing so made my code skip some elements from the list.
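For clarity, here is a sketch of the accepted answer's loop with that change applied (the placeholder check and the namespaced tag from the question are carried over from above):
root = elem.getroot()
for child in root.findall("*"):   # snapshot of the direct children
    if child.tag != "{http://somens}prop":
        continue
    if True:  # TODO: do your check here!
        root.remove(child)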
I have an xml file like so:
<host name='ip-10-196-55-2.ec2.internal'>
  <hostvalue name='arch_string'>lx24-x86</hostvalue>
  <hostvalue name='num_proc'>1</hostvalue>
  <hostvalue name='load_avg'>0.01</hostvalue>
</host>
I can get the Node.data out of a Node.TEXT_NODE, but I also need the attribute name; e.g. I want to know load_avg = 0.01 without writing out load_avg, num_proc, etc. one by one. I want them all.
My code looks like this, but I can't figure out what part of the Node has the attribute name for me.
for stat in h.getElementsByTagName("hostvalue"):
    for node3 in stat.childNodes:
        attr = "foo"
        val = "poo"
        if node3.nodeType == Node.ATTRINUTE_NODE:
            attr = node3.tagName
        if node3.nodeType == Node.TEXT_NODE:
            #attr = node3.tagName
            val = node3.data
From the above code, I'm able to get val, but not attr (compile error:
Here's a short example of what you could achieve:
from xml.dom import minidom

xmldoc = minidom.parse("so.xml")
values = {}
for stat in xmldoc.getElementsByTagName("hostvalue"):
    attr = stat.attributes["name"].value
    value = "\n".join([x.data for x in stat.childNodes])
    values[attr] = value
print repr(values)
This outputs, given your XML file:
$ ./parse.py
{u'num_proc': u'1', u'arch_string': u'lx24-x86', u'load_avg': u'0.01'}
Be warned that this is not failsafe; e.g. it will break if you have nested elements inside <hostvalue>.
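If that is a concern, a slightly more defensive sketch (my own variation, not from the original answer) is to join only the direct text nodes, so nested elements are ignored instead of raising an error:
from xml.dom import minidom, Node

xmldoc = minidom.parse("so.xml")
values = {}
for stat in xmldoc.getElementsByTagName("hostvalue"):
    attr = stat.attributes["name"].value
    # keep only the text children; element children are skipped
    values[attr] = "".join(x.data for x in stat.childNodes
                           if x.nodeType == Node.TEXT_NODE)
print(values)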