Using the value stored in the ._fields attribute in Python ast - python

This question is a result of my Python ast work.
I have a node in the AST and I want to obtain its children.
The ._fields attribute gives the names of all the children of a node. However, it differs from node to node, depending on the syntax the node represents.
For example, if the node is of type BinOp, then node._fields will yield ('left', 'op', 'right').
Hence, to access the children of the node I have to use node.left, node.op and node.right.
But I want to do it for any general node.
Given any node, node._fields will give me a tuple. How do I use this tuple to obtain the children? The node can be any general node, so I do not know beforehand what the tuple will look like.
Examples in the form of code would be really nice!
Thanks!

To iterate over the children of an arbitrary node, use ast.iter_child_nodes(). To iterate over the fields instead, use ast.iter_fields().
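A minimal sketch of both approaches, using a BinOp parsed from the made-up expression a + b * 2 (the variable names here are only for illustration). It also shows the direct answer to the question: each name in node._fields can be turned into its value with getattr().

```python
import ast

tree = ast.parse("a + b * 2")
binop = tree.body[0].value  # the outermost BinOp node

# node._fields is a tuple of attribute names; getattr() maps each name to its value.
values_by_name = {name: getattr(binop, name) for name in binop._fields}

# ast.iter_fields() performs the same lookup and yields (name, value) pairs.
field_names = [name for name, _ in ast.iter_fields(binop)]

# ast.iter_child_nodes() yields only the values that are AST nodes
# (and flattens fields that hold lists of nodes).
child_types = [type(child).__name__ for child in ast.iter_child_nodes(binop)]
```

Note that a field value is not always a node: it can also be a plain value such as a string, or a list of nodes, which is why iter_child_nodes() is usually the more convenient of the two for walking a tree.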

Related

What is faster: iterating through Python AST to find particular type nodes, or override the visit_type method?

The ast module in Python allows multiple traversal strategies. I want to understand whether there is any significant gain in terms of complexity when choosing a specific way of traversal.
Here are two examples:
Example 1
    import ast

    class GlobalVisitor(ast.NodeTransformer):
        def generic_visit(self, tree):
            for node in tree.body:
                if isinstance(node, ast.Global):
                    ...  # transform the AST
Example 2
    class GlobalVisitor(ast.NodeTransformer):
        def visit_Global(self, tree):
            ...  # transform the AST
In Example 1, I override the generic_visit method, providing my own implementation of how I want to traverse the tree. This, however, happens through visiting every node in the body, so it is O(n).
In Example 2, I override visit_Global, and I am thus able to do stuff with all Global type nodes immediately. That's how ast works.
I want to understand: in Example 2, does ast have instant O(1) access to the nodes I specify by overriding visit_type(self, node), or does it just go through the tree again in O(n), looking for the nodes I need in the background and merely simplifying my life a little bit?
Some takeaways from the comments provided by @metatoaster, @user2357112 and @rici:
1. Example 1 is completely wrong. One should not aim to traverse the tree in the way that was described, because tree.body isn't a collection of every node in an AST. It's an attribute of Module nodes that gives a list of the nodes for the top-level statements in the module. Iterating over it will miss every global statement that matters (since, barring extremely weird exec cases, a correct global statement is never top-level), and it will crash on non-Module node input.
If you want to implement a correct version of Example 1, just iterate recursively using ast.iter_child_nodes. However, note that iter_child_nodes is correctly named. It is not iter_descendant_nodes. It does not visit anything other than direct children. The recursive walk must be implemented in the action performed on each child.
2. When implemented correctly, the two approaches are equivalent and both imply a recursive traversal, although overriding visit_type(self, node) saves you some time. No gain in terms of complexity will be achieved.
3. Only use NodeTransformer if you want to alter the AST; otherwise just use NodeVisitor.
Finally, ast doesn't seem to be documented exhaustively enough; refer to this for more detailed documentation. It is a bit outdated (by about a year), but explains some fundamentals better than the original ast documentation.
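To make the takeaways concrete, here is a small sketch of the two correct strategies side by side: a hand-written recursive walk built on ast.iter_child_nodes, and a NodeVisitor subclass that overrides visit_Global. The sample source and the names find_globals and GlobalCollector are made up for this illustration. Both visit every node once, so both are O(n).

```python
import ast

source = """
def f():
    global x
    x = 1
    def g():
        global y
        y = 2
"""
tree = ast.parse(source)

def find_globals(node):
    """Recursive walk; iter_child_nodes only yields direct children."""
    found = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.Global):
            found.append(child)
        found.extend(find_globals(child))  # recurse into each child ourselves
    return found

class GlobalCollector(ast.NodeVisitor):
    """NodeVisitor does the same walk and dispatches Global nodes here."""
    def __init__(self):
        self.found = []

    def visit_Global(self, node):
        self.found.append(node)

manual_names = [name for g in find_globals(tree) for name in g.names]

collector = GlobalCollector()
collector.visit(tree)
visitor_names = [name for g in collector.found for name in g.names]
```

Since nothing is modified here, NodeVisitor is used rather than NodeTransformer, in line with point 3.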

Python XML - Iterate through elements, and if attribute condition is met, append that element with all its children to the list

I have a script that is supposed to filter some elements out of an XML file. I did it like this because I knew exactly what the depth of the element was and how many children there were.
But can you please give me an example of how this can be done without knowing the nesting depth?
The code looks like this:
    def Filter_Modules(folder_name, corresponding_list):
        for element in delta_root.iter('folder'):
            if element.attrib.get('name') == str(folder_name):
                corresponding_list.append(element)
                for child in element:
                    corresponding_list.append(child)
                    for ch in child:
                        corresponding_list.append(ch)
                        for c in ch:
                            corresponding_list.append(c)
All suggestions are welcome.
I understand that you want to put into corresponding_list all descendant elements of the folder element whose name attribute equals some string.
A good solution for that is to use a recursive function. (In general, recursion is a good approach for handling data structures like trees, graphs, ...)
The recursive function add_sub_tree appends an element to corresponding_list and then recursively calls itself on all its children. The children will also be appended to corresponding_list, and the function will recursively call itself to append all grandchildren, and so on.
    def Filter_Modules(folder_name, corresponding_list):
        def add_sub_tree(element):
            # Append this element, then descend into each of its children.
            corresponding_list.append(element)
            for child in element:
                add_sub_tree(child)

        for element in delta_root.iter('folder'):
            if element.attrib.get('name') == str(folder_name):
                add_sub_tree(element)
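As an alternative sketch, not part of the answer above: Element.iter() called without a tag yields the element itself followed by all of its descendants in document order, which is the same order the recursive helper produces. The sample XML and the function name filter_modules are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for the real file.
xml_text = """
<root>
  <folder name="A"><module id="1"><item/></module></folder>
  <folder name="B"><module id="2"/></folder>
</root>
"""
delta_root = ET.fromstring(xml_text)

def filter_modules(root, folder_name):
    """Collect each matching folder element together with all its descendants."""
    collected = []
    for folder in root.iter('folder'):
        if folder.get('name') == str(folder_name):
            # iter() with no tag walks the whole subtree, any depth.
            collected.extend(folder.iter())
    return collected

tags = [element.tag for element in filter_modules(delta_root, "A")]
```

As with the recursive version, a matching folder nested inside another matching folder would have its elements collected twice.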

With python ElementTree, how to add a node to a tree having namespace?

The tree I'm adding nodes to uses a namespace:
xmlns:ns0="http://someplace.net/xml/"
Before inserting the child, you must reach the parent node first, and in my case findall() gives me the parent node that looks like this:
<ns0:parent xmlns:ns0="http://someplace.net/xml/" someattrib="some value">
I tried to construct a child node like this:
    node = ET.Element('mytag')
or
    node = ET.Element('ns0:mytag')
or
    node = ET.Element('ns0:mytag')
    node.set('xmlns:ns0', "http://someplace.net/xml/")
Then
    parent.extend(node)
But the node was nowhere to be found in the resulting tree. None of the three methods worked.
What am I missing here?
I figured it out.
I should use ET.SubElement(parent, tag) in place of ET.Element and forget about .extend().
I should also remove the hard-coded namespace prefixes from the tags.
After making the above changes, it worked as expected.
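A small sketch of what that fix might look like. The enclosing ns0:root element is invented so the example is self-contained, and writing the tag as {namespace-uri}localname is one common way to avoid a hard-coded prefix; the self-answer does not say exactly which form was used. The sketch also shows why the original attempt silently did nothing: extend() takes an iterable of elements, so passing a single childless Element adds none.

```python
import xml.etree.ElementTree as ET

NS = "http://someplace.net/xml/"  # namespace URI taken from the question
root = ET.fromstring(
    f'<ns0:root xmlns:ns0="{NS}"><ns0:parent someattrib="some value"/></ns0:root>'
)
parent = root.find(f"{{{NS}}}parent")

# extend() iterates over its argument, so a single childless Element
# contributes zero children and parent is left unchanged.
orphan = ET.Element("mytag")
parent.extend(orphan)
children_after_extend = len(parent)

# SubElement creates the child and attaches it to parent in one step.
child = ET.SubElement(parent, f"{{{NS}}}mytag")
child.text = "hello"

serialized = ET.tostring(root, encoding="unicode")
```

parent.append(node) would also have attached an already constructed element, since append() takes one element rather than an iterable.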

How to make list lookup faster in this recursive function

I have a recursive function which creates a JSON object:
    def add_to_tree(name, parent, start_tree):
        for x in start_tree:
            if x["name"] == parent:
                x["children"].append({"name": name, "parent": parent, "children": []})
            else:
                add_to_tree(name, parent, x["children"])
It is called from another function:
    def caller():
        # basic structure of the JSON object which holds the d3.js tree data
        start_tree = [{"name": "root", "parent": "null", "children": []}]
        for x in new_list:
            name = x.split('/')[-2]
            parent = x.split('/')[-3]
            add_to_tree(name, parent, start_tree)
new_list is a list which holds links in this form:
    /root/A/
    /root/A/B/
    /root/A/B/C/
    /root/A/D/
    /root/E/
    /root/E/F/
    /root/E/F/G/
    /root/E/F/G/H/
    ...
Everything is working fine except for the fact that the run time grows exponentially with the input size.
Normally new_list has ~500k links and the depth of these links can be more than 10, so there is a lot of looping and many lookups involved in the add_to_tree() function.
Any ideas on how to make this faster?
You are searching your whole tree each time you add a new entry. This is hugely inefficient as your tree grows; you can easily end up with O(N^2) work this way, because for each new element you search the whole tree again.
You could use a dictionary mapping names to specific tree entries, for fast O(1) lookups; this lets you avoid having to traverse the tree each time. It can be as simple as treeindex[parent]. This'll take some more memory however, and you may need to handle the case where the parent is added after the children (using a queue).
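A minimal sketch of the dictionary idea. It assumes every link has the /root/.../ shape with a trailing slash and that a parent always appears before its children (so no queue is needed); the function name build_tree and the choice to key the index by full path rather than by bare name are assumptions made here, the latter so that two nodes with the same name in different branches do not collide.

```python
def build_tree(links):
    """Build the nested tree with O(1) parent lookups via a path index."""
    root = {"name": "root", "parent": "null", "children": []}
    index = {"/root/": root}  # full path -> node
    for link in links:
        parts = link.strip("/").split("/")
        if len(parts) < 2:
            continue  # the root itself is already in the tree
        parent_path = "/" + "/".join(parts[:-1]) + "/"
        node = {"name": parts[-1], "parent": parts[-2], "children": []}
        # Direct lookup replaces the recursive search through the tree.
        index[parent_path]["children"].append(node)
        index[link] = node
    return [root]

links = ["/root/A/", "/root/A/B/", "/root/A/B/C/", "/root/A/D/", "/root/E/"]
tree = build_tree(links)
```

A link whose parent has not been seen yet raises KeyError in this sketch; handling that case is where the queue mentioned above would come in.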
However, since your input list appears to be sorted, you could just process your list recursively or use a stack, and take advantage of the fact that you have just found the parent already. If your path is longer than the previous entry, it'll be a child of that entry. If the path is equal or shorter, it'll be a sibling of the previous node or of one of its parents, so return or pop the stack.
For example, for these three elements:
    /root/A/B/
    /root/A/B/C/
    /root/A/D/
/root/A/B/C/ does not have to search the tree from the root for /root/A/B/; it was the previously processed entry. That'll be the parent call for this recursive iteration, or the top of the stack. Just add to that parent directly.
/root/A/D/ is a sibling of a parent; the path is shorter than /root/A/B/C/, so return or pop that entry off the stack. The length is equal to that of /root/A/B/, so it is a direct sibling; again return or pop the stack. Now you'll be at the /root/A/ level, and /root/A/D/ is a child. Add it, and continue your process.
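The stack variant described above could look roughly like this. It is only a sketch under the stated assumption that the input is sorted so that each link's parent is the most recent entry one level up; the function name build_tree_sorted is made up.

```python
def build_tree_sorted(links):
    """Build the tree from depth-first sorted links using a stack of ancestors."""
    root = {"name": "root", "parent": "null", "children": []}
    stack = [root]  # stack[i] is the node at depth i on the current path
    for link in links:
        parts = link.strip("/").split("/")
        depth = len(parts) - 1  # the root sits at depth 0
        if depth == 0:
            continue  # the root itself is already in the tree
        # Pop siblings and deeper entries so the parent is on top of the stack.
        del stack[depth:]
        node = {"name": parts[-1], "parent": parts[-2], "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return [root]

links = ["/root/A/", "/root/A/B/", "/root/A/B/C/", "/root/A/D/", "/root/E/"]
tree = build_tree_sorted(links)
```

Each link is handled in time proportional to its own depth, with no searching at all, but unsorted input would attach nodes to the wrong parent without any error.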
I have not tested this, but it looks like the loop does not stop when an insertion has been made, so every entry in new_list will cause a recursive search through all of the tree. This should speed it up:
    def add_to_tree(name, parent, start_tree):
        for x in start_tree:
            if x["name"] == parent:
                x["children"].append({"name": name, "parent": parent, "children": []})
                return True
            elif add_to_tree(name, parent, x["children"]):
                return True
        return False
It stops searching as soon as the parent is found.
That said, I think there is a bug in the approach. What if you have:
    /root/A/B/C/
    /root/D/B/E/
Your algorithm only parses the last two elements, and it seems that both C and E will be placed under the same B. I think you will need to take all elements of the path into account and make your way down the tree element by element. That is better anyway, since you will know at each level which branch to take, and the correct version will be much faster. Each insert will be O(log N) for a reasonably balanced tree.
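One way the element-by-element descent might be written is sketched below; the function name insert_path is hypothetical. Every component of the path is used, so the two B nodes from the example end up in separate branches. Children are found with a linear scan of the sibling list here, so the cost per insert depends on branching as well as depth.

```python
def insert_path(root, link):
    """Walk down from the root one path component at a time, creating nodes as needed."""
    node = root
    for name in link.strip("/").split("/")[1:]:  # skip the leading "root"
        for child in node["children"]:
            if child["name"] == name:
                break  # this branch already exists, follow it
        else:
            child = {"name": name, "parent": node["name"], "children": []}
            node["children"].append(child)
        node = child
    return node

root = {"name": "root", "parent": "null", "children": []}
for link in ["/root/A/B/C/", "/root/D/B/E/"]:
    insert_path(root, link)
```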

Directed graph nodes: Keep track of successors and predecessors

I am trying to implement a class Node representing a node in a directed graph, which in particular has a set of successors and a set of predecessors. I would like Node.successors and Node.predecessors to behave like sets; in particular I want to iterate over their elements, add and remove elements, check containment, and set them from an iterable. However, after node_1.successors.add(node_2) it should be True that node_1 in node_2.predecessors.
It seems possible to write a new subclass of set that implements this magic, but as far as I can see an implementation of such a class would be quite cumbersome, because it would have to know about the Node object it belongs to and whether it holds predecessors or successors, and it would need some special methods for addition and so on, so that node_1.successors.add(node_2) calling node_2.predecessors.add(node_1) does not lead to an infinite loop.
Generating one of the two attributes on the fly (node for node in all_nodes if self in node.successors) should be possible, but then I need to keep track of all Nodes belonging to a graph. That is easy if I have only one graph (adding each node to a weakref.WeakSet class attribute in __init__), but using one big set for all nodes leads to a large computational effort if I have multiple disjoint graphs, and I do not see how to modify the set of predecessors.
Does anybody have a good solution for this?
What if you wrap the add method in your class and then, inside that wrapper method, update both attributes, predecessors and successors? Something like this is the first solution that comes to my mind:
    class Node:
        def __init__(self):
            self.pred = set()
            self.suce = set()

        def addSucessor(self, node):
            # Record the edge on both endpoints so the two sets stay in sync.
            self.suce.add(node)
            node.pred.add(self)
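For the set-like behaviour the question asks for, one possible sketch is a small collections.abc.MutableSet subclass that mirrors every change onto the other node. The class name _LinkedSet and the attribute names successors and predecessors are choices made for this example, not something from the answer above. The membership check before each mirrored call is what stops the mutual recursion.

```python
from collections.abc import MutableSet

class _LinkedSet(MutableSet):
    """Set of nodes that mirrors add/discard onto the opposite set of each member."""

    def __init__(self, owner, mirror_attr):
        self._owner = owner
        self._mirror_attr = mirror_attr  # attribute name of the opposite set
        self._items = set()

    def __contains__(self, node):
        return node in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, node):
        if node not in self._items:  # guard prevents infinite recursion
            self._items.add(node)
            getattr(node, self._mirror_attr).add(self._owner)

    def discard(self, node):
        if node in self._items:
            self._items.discard(node)
            getattr(node, self._mirror_attr).discard(self._owner)

class Node:
    def __init__(self):
        self.successors = _LinkedSet(self, "predecessors")
        self.predecessors = _LinkedSet(self, "successors")

node_1, node_2 = Node(), Node()
node_1.successors.add(node_2)
linked = node_1 in node_2.predecessors
node_2.predecessors.discard(node_1)
unlinked = node_2 not in node_1.successors
```

MutableSet supplies remove(), clear(), |= and the comparison operators on top of these five methods. Assigning a whole iterable (node.successors = [...]) is not covered here and would need a property setter on Node.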
