xml.etree.ElementTree getElementByID()? - python

How to get the equivalent of getElementByID() with the Python library xml.etree.ElementTree?
There seems to be a method called parseid() but my tree is already parsed. I don't want to parse it again.

I found it myself:
tree.findall('''.//*[@id='fooID']''')[0]
Better or other solutions are still welcome. :-)

The accepted answer does work, but performance can be quite poor: my guess (I didn't verify this, and it is perhaps also related to the complexity of XPath) is that the tree is traversed on every call to findall(), which may or may not be a concern for your use case.
Probably parseid() is indeed what you want if performance is a concern. If you want to obtain such an id mapping on an existing tree, you can also easily perform the traversal once manually.
class getElementById():
    def __init__(self, tree):
        self.di = {}
        def v(node):
            i = node.attrib.get("id")
            if i is not None:
                self.di[i] = node
            for child in node:
                v(child)
        v(tree.getroot())
    def __call__(self, k):
        return self.di[k]
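As a minimal sketch of the same one-pass idea, the mapping can also be built with Element.iter(), which walks the element and all of its descendants. The helper name build_id_map and the sample document are made up for illustration and are not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def build_id_map(root):
    """Traverse the tree once and map each id attribute value to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}

# Hypothetical sample document, for illustration only.
root = ET.fromstring('<a id="top"><b id="fooID">hello</b><c/></a>')
id_map = build_id_map(root)

# Each later lookup is a plain dictionary access, with no further tree traversal.
found = id_map["fooID"]
```

The map is a snapshot: if elements are added or their id attributes change afterwards, it has to be rebuilt.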

Related

What is faster: iterating through Python AST to find particular type nodes, or override the visit_type method?

The ast module in Python allows multiple traversal strategies. I want to understand, is there any significant gain in terms of complexity when choosing a specific way of traversal?
Here are two examples:
Example 1
class GlobalVisitor(ast.NodeTransformer):
    def generic_visit(self, tree):
        for node in tree.body:
            if isinstance(node, ast.Global):
                pass  # transform the AST here
Example 2
class GlobalVisitor(ast.NodeTransformer):
    def visit_Global(self, tree):
        pass  # transform the AST here
In Example 1, I override the generic_visit method, providing my own implementation of how I want to traverse the tree. This, however, happens by visiting every node in the body, so it is O(n).
In Example 2, I override the visit_Global, and I am thus able to do stuff with all Global type nodes immediately. That's how ast works.
I want to understand, in Example 2, does ast have instant O(1) access to the nodes I specify through overriding visit_field(self, node), or it just goes through the tree again in O(n), looking for the nodes I need in the background, just simplifying my life a little bit?
Some takeaways from the comments provided by @metatoaster, @user2357112 and @rici:
1. Example 1 is completely wrong. One should not traverse the tree in the way that was described, because tree.body isn't a collection of every node in an AST. It's an attribute of Module nodes that gives a list of the nodes for top-level statements in the module. Iterating over it will miss every global statement that matters (since, barring extremely weird exec cases, a correct global statement is never top-level), and it will crash on non-Module node input.
If you want to implement a correct version of Example 1, just recursively iterate using ast.iter_child_nodes. However, note that iter_child_nodes is correctly named. It is not iter_descendant_nodes. It does not visit anything other than direct children. The recursive walk must be implemented in the action performed on each child.
2. When implemented correctly, the two approaches are equivalent, and both imply a recursive traversal; overriding visit_type(self, node) just saves you some time. No gain in terms of complexity will be achieved.
3. Only use NodeTransformer if you want to alter the AST, otherwise just use NodeVisitor.
Finally, ast doesn't seem to be documented exhaustively enough, refer to this for a more detailed documentation. It is a bit outdated (by ~ a year), but explains some fundamentals better than the original ast.
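To illustrate takeaway 2, here is a minimal sketch that finds the same global statements both ways. The class name GlobalCollector and the sample source are made up for illustration. Both variants walk the whole tree, so neither gets O(1) access to the Global nodes.

```python
import ast

# Hypothetical sample module, for illustration only.
source = """
def f():
    global x
    x = 1

def g():
    global y, z
"""

class GlobalCollector(ast.NodeVisitor):
    """Collects the names of every `global` statement via visit_Global."""
    def __init__(self):
        self.names = []

    def visit_Global(self, node):
        self.names.extend(node.names)

tree = ast.parse(source)

# Variant A: NodeVisitor recurses through the tree and dispatches on node type.
collector = GlobalCollector()
collector.visit(tree)
visitor_names = collector.names

# Variant B: explicit traversal; ast.walk also yields every node in the tree.
walk_names = [name
              for node in ast.walk(tree)
              if isinstance(node, ast.Global)
              for name in node.names]
```

NodeVisitor is used here rather than NodeTransformer because nothing in the tree is modified.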

Bi-Directional Binary Search Trees?

I have tried to implement a BST. As of now it only adds keys according to the BST property (Left-Lower, Right-Bigger), though I implemented it in a different way.
This is how I think BSTs are supposed to be:
[Figure: Single Direction BST]
This is how I have implemented my BST:
[Figure: Bi-Directional BST]
The question is whether this is a correct implementation of a BST.
(The way I see it, in a double-sided BST it would be easier to search, delete and insert.)
import pdb
class Node:
    def __init__(self, value):
        self.value=value
        self.parent=None
        self.left_child=None
        self.right_child=None
class BST:
    def __init__(self,root=None):
        self.root=root
    def add(self,value):
        #pdb.set_trace()
        new_node=Node(value)
        self.tp=self.root
        if self.root is not None:
            while True:
                if self.tp.parent is None:
                    break
                else:
                    self.tp=self.tp.parent
            #the self.tp variable always is at the first node.
            while True:
                if new_node.value >= self.tp.value :
                    if self.tp.right_child is None:
                        new_node.parent=self.tp
                        self.tp.right_child=new_node
                        break
                    elif self.tp.right_child is not None:
                        self.tp=self.tp.right_child
                        print("Going Down Right")
                        print(new_node.value)
                elif new_node.value < self.tp.value :
                    if self.tp.left_child is None:
                        new_node.parent=self.tp
                        self.tp.left_child=new_node
                        break
                    elif self.tp.left_child is not None:
                        self.tp=self.tp.left_child
                        print("Going Down Left")
                        print(new_node.value)
        self.root=new_node
newBST=BST()
newBST.add(9)
newBST.add(10)
newBST.add(2)
newBST.add(15)
newBST.add(14)
newBST.add(1)
newBST.add(3)
Edit: I have used while loops instead of recursion. Could someone please elaborate on why using while loops instead of recursion is a bad idea, both in this particular case and in general?
BSTs with parent links are used occasionally.
The benefit is not that the links make it easier to search or update (they don't really), but that you can insert before or after any given node, or traverse forward or backward from that node, without having to search from the root.
It becomes convenient to use a pointer to a node to represent a position in the tree, instead of a full path, even when the tree contains duplicates, and that position remains valid as updates or deletions are performed elsewhere.
In an abstract data type, these properties make it easy, for example, to provide iterators that aren't invalidated by mutations.
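A minimal sketch of what "traverse forward from that node without searching from the root" can look like, reusing the Node fields from the question. The helpers attach and successor are made-up names for illustration.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.left_child = None
        self.right_child = None

def attach(parent, child, side):
    """Link child under parent on the given side ('left' or 'right')."""
    setattr(parent, side + "_child", child)
    child.parent = parent
    return child

def successor(node):
    """Return the next node in sorted order, or None if node is the last one."""
    if node.right_child is not None:
        # The successor is the leftmost node of the right subtree.
        node = node.right_child
        while node.left_child is not None:
            node = node.left_child
        return node
    # Otherwise climb until we come up out of a left subtree.
    while node.parent is not None and node is node.parent.right_child:
        node = node.parent
    return node.parent

# Small hand-built tree: 9 at the root, 2 and 10 below it, 3 to the right of 2.
root = Node(9)
n2 = attach(root, Node(2), "left")
n10 = attach(root, Node(10), "right")
n3 = attach(n2, Node(3), "right")
```

Starting from n3, successor only follows parent links up to the root; no key comparison and no search from the top is needed.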
You haven't described how you gain anything with the parent pointer. An algorithm that cares about rewinding to the parent node, will do so by crawling back up the call stack.
I've been there -- in my data structures class, I implemented my stuff with bi-directional pointers. When we got to binary trees, those pointers ceased to be useful. Proper use of recursion replaces the need to follow a link back up the tree.

Refactoring between public and private methods in Python

I'm looking at the Binary Search Trees section in the tutorial "Problem Solving with Algorithms and Data Structures", (http://interactivepython.org/runestone/static/pythonds/Trees/SearchTreeImplementation.html). On several occasions, they use "public" and "private" helper methods with the same name, e.g. for the "put" method:
def put(self,key,val):
    if self.root:
        self._put(key,val,self.root)
    else:
        self.root = TreeNode(key,val)
    self.size = self.size + 1
def _put(self,key,val,currentNode):
    if key < currentNode.key:
        if currentNode.hasLeftChild():
            self._put(key,val,currentNode.leftChild)
        else:
            currentNode.leftChild = TreeNode(key,val,parent=currentNode)
    else:
        if currentNode.hasRightChild():
            self._put(key,val,currentNode.rightChild)
        else:
            currentNode.rightChild = TreeNode(key,val,parent=currentNode)
I also have seen this approach elsewhere, but I don't really understand the motivation. What is the advantage compared to putting everything directly into one method, is it just to improve readability?
The rationale is that the user of the class should not know anything about "current node". The current node only makes sense during the recursive insert process, it's not a permanent property of the tree. The user treats the tree as a whole, and only does insert/lookup operations on that.
That said, you could mix both methods into one, by using a default value currentNode=None and checking it. However, the two methods are doing significantly different things. The put method just initialises the root, while the _put does the recursive insertion, so it would probably be better to keep them separate.
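For comparison, here is a rough sketch of the merged variant with a default currentNode=None. The TreeNode and BinarySearchTree classes below are simplified stand-ins, not the tutorial's actual classes.

```python
class TreeNode:
    """Simplified stand-in for the tutorial's TreeNode."""
    def __init__(self, key, val, parent=None):
        self.key = key
        self.val = val
        self.parent = parent
        self.leftChild = None
        self.rightChild = None

class BinarySearchTree:
    def __init__(self):
        self.root = None
        self.size = 0

    def put(self, key, val, currentNode=None):
        if currentNode is None:
            # Public call: count the key, handle the empty tree, start at the root.
            self.size += 1
            if self.root is None:
                self.root = TreeNode(key, val)
                return
            currentNode = self.root
        # From here on this is the recursive insertion that _put used to do.
        if key < currentNode.key:
            if currentNode.leftChild is not None:
                self.put(key, val, currentNode.leftChild)
            else:
                currentNode.leftChild = TreeNode(key, val, parent=currentNode)
        else:
            if currentNode.rightChild is not None:
                self.put(key, val, currentNode.rightChild)
            else:
                currentNode.rightChild = TreeNode(key, val, parent=currentNode)

tree = BinarySearchTree()
tree.put(5, "a")
tree.put(3, "b")
tree.put(8, "c")
```

The single method now has two jobs and a parameter that callers are not supposed to pass, which is the downside described above.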
Here, the motivation is to use recursion. As you have probably noticed, the _put method calls itself, and the two method signatures are different. If you folded _put into the public method, you would have to change the public method's signature to handle a put operation on a given node; that is, you would have to add a currentNode parameter. The original public method does not have this parameter, I assume because the author does not want to expose this functionality to the end user.

Python 'ast' module with Visitor pattern - get node's group, not concrete class

I'm using ast python library and want to traverse through my ast nodes.
Visitor pattern is supported in the library pretty well, but if I use it, I will have to implement methods for visiting items of concrete classes, e.g. def visit_Load. In my case, concrete classes are not so important - I'd like to know whether the node is an operator or an expr according to the structure given here.
Of course, I can add generic_visit method and then check all the conditions here, but that looks like a wrong way to use the pattern.
Is there any other pretty way to implement this idea without massive code duplication?
This probably isn't pretty, but you can dynamically create all the methods by introspection:
import ast
def visit_expr(self, node):
    """Do something in here!"""
    self.generic_visit(node)
# Install the same handler as visit_<ClassName> for each direct subclass of ast.expr.
ExprVisitor = type('ExprVisitor', (ast.NodeVisitor,), {
    'visit_%s' % cls.__name__: visit_expr for cls in ast.expr.__subclasses__()})
Of course, you can keep doing this for whatever ast node types you need to deal with...
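A small usage sketch of the introspection approach, assuming the shared handler just records the class name of each expression node it sees. The seen list, the _init helper and the sample expression are made up for illustration. Note that __subclasses__() only returns direct subclasses.

```python
import ast

def _init(self):
    self.seen = []

def visit_expr(self, node):
    """Shared handler installed under every visit_<ExprSubclass> name."""
    self.seen.append(type(node).__name__)
    self.generic_visit(node)

# Build the method table by introspection, then create the visitor class.
methods = {'visit_%s' % cls.__name__: visit_expr
           for cls in ast.expr.__subclasses__()}
methods['__init__'] = _init
ExprVisitor = type('ExprVisitor', (ast.NodeVisitor,), methods)

visitor = ExprVisitor()
visitor.visit(ast.parse("a + b * 2"))
seen = visitor.seen
```

Operator nodes such as Add and Mult are not recorded, since they derive from ast.operator rather than ast.expr; a second handler can be generated for that group in the same way.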

In Python ElementTree how can I get list of all ancestors of an element in tree?

I need "get_ancestors_recursively" function.
A sample run can be
>>> dump(tr)
<anc1>
<anc2>
<element> </element>
</anc2>
</anc1>
>>> input_element = tr.getiterator("element")[0]
>>> get_ancestors_recursively(input_element)
['anc1', 'anc2']
Can somebody help me with this ?
Another option is lxml, which provides useful extensions to the built-in ElementTree API. If you're willing to install an external module, it has a nice Element.getparent() function that you could simply call repeatedly until reaching ElementTree.getroot(). This will probably be the fastest and most elegant solution, as the lxml.etree module gives Elements pointer attributes that point to their parents, so there is no need to search the entire tree for the proper parent/child pairs.
In the latest version of ElementTree (v1.3 or later), you can simply do
input_element.find('..')
recursively. However, the ElementTree that ships with Python doesn't have this functionality, and I don't see anything in the Element class that looks upwards.
I believe this means you have to do it the hard way: via an exhaustive search of the element tree.
def get_ancestors_recursively(e, b):
    "Finds ancestors of b in the element tree e."
    return _get_ancestors_recursively(e.getroot(), b, [])
def _get_ancestors_recursively(s, b, acc):
    "Recursive variant. acc is the built-up list of ancestors so far."
    if s == b:
        return acc
    else:
        # Iterate over the element directly; getchildren() was removed in Python 3.9.
        for child in s:
            newacc = acc[:]
            newacc.append(s)
            res = _get_ancestors_recursively(child, b, newacc)
            if res is not None:
                return res
        return None
This is slow because of the DFS, and cranks out a lot of lists for garbage collection, but if you can deal with that it should be fine.
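If the per-call list copying is a concern, one common alternative is to build a child-to-parent map in a single pass and then walk upwards. This is a sketch using only the standard library; the function name get_ancestors and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def get_ancestors(root, target):
    """Return the tags of target's ancestors, outermost first."""
    # One pass over the tree: map every child element to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    ancestors = []
    node = target
    while node in parent_of:
        node = parent_of[node]
        ancestors.append(node.tag)
    ancestors.reverse()
    return ancestors

# Sample document mirroring the question's example.
root = ET.fromstring("<anc1><anc2><element> </element></anc2></anc1>")
target = root.find(".//element")
result = get_ancestors(root, target)
```

When ancestors are needed for many elements, the map can be built once and reused instead of being rebuilt inside the function.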
Found this little gem from lots of googling (http://elmpowered.skawaii.net/?p=74)
parent = root.findall(".//{0}/..".format(elem.tag))
root here is your root node of the tree. elem is the actual element object you get from iterating.
This does require you to know the root, which may mean changing how you set up for XML parsing, but it's minor at best.
