Extract the object names from different nesting levels in JSON - python

I've been trying to get the solution from my former question running here, but so far without success. I'm now trying to change the code so that it returns not the IDs but the "name" values themselves. This is my JSON; I want to extract the SUB, SUBSUB and NAME values, and when using a chain of nested for loops I could not get back up the hierarchy to reach SUBSUB2... Could anyone please put me on the right track?
The solution code from the former question:
def locateByName(e, name):
    if e.get('name', None) == name:
        return e
    for child in e.get('children', []):
        result = locateByName(child, name)
        if result is not None:
            return result
    return None
What exactly I want to achieve is a simple flat list: SUB1, SUBSUB1, NAME1, NAME2, SUBSUB2, etc...

Assuming x is your JSON,
def trav(node, acc):
    acc += [node['name']]
    if 'children' in node:
        for child in node['children']:
            trav(child, acc)

acc = []
trav(x, acc)
print(acc)
Output:
['MAIN', 'SUB1', 'SUBSUB1', 'NAME1', 'NAME2', 'SUBSUB2', 'SUBSUB3']
Another, more compact solution:
from itertools import chain

def trav(node):
    if 'children' in node:
        return [node['name']] + list(chain.from_iterable(trav(child) for child in node['children']))
    else:
        return [node['name']]

print(trav(x))
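For comparison, the same pre-order traversal can also be written as a generator using yield from. This is a sketch; the sample dictionary x below is a hypothetical reconstruction matching the output shown above:

```python
def iter_names(node):
    # Yield this node's name, then recurse into its children (pre-order).
    yield node['name']
    for child in node.get('children', []):
        yield from iter_names(child)

# Hypothetical sample data reconstructed from the output above
x = {'name': 'MAIN', 'children': [
    {'name': 'SUB1', 'children': [
        {'name': 'SUBSUB1', 'children': [
            {'name': 'NAME1'},
            {'name': 'NAME2'}]},
        {'name': 'SUBSUB2'},
        {'name': 'SUBSUB3'}]}]}

print(list(iter_names(x)))
```

The generator form avoids building the whole list eagerly, which can matter for large trees.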

Related

python list dependency based on other value in same list

I am struggling to figure out how to build dependencies between values in a Python list.
Basically I have a list like the one below. From this list I can pass an input value such as TABLE_VIEW, and based on that input value I want to generate an ordered dependency list.
INPUT
1.EXP_TABLE_NAME_STG,TARGET_TABLE
2.SQ_TABLE_NAME,EXP_TABLE_NAME_STG
3.TABLE_VIEW,SQ_TABLE_NAME
4.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_3
5.SQ_TABLE_NAME,LKP_NEW_TABLE_1
6.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_2
7.LKP_NEW_TABLE_1,TARGET_TABLE
For example, the 3rd entry is TABLE_VIEW,SQ_TABLE_NAME, so based on its 2nd value, i.e. SQ_TABLE_NAME, I want to find the next dependencies, which in this case are
SQ_TABLE_NAME,EXP_TABLE_NAME_STG
SQ_TABLE_NAME,LKP_NEW_TABLE_1
Again, from the above two, take the 2nd value and find its dependencies in turn:
EXP_TABLE_NAME_STG,LKP_NEW_TABLE_3
EXP_TABLE_NAME_STG,LKP_NEW_TABLE_2
EXP_TABLE_NAME_STG,TARGET_TABLE
LKP_NEW_TABLE_1,TARGET_TABLE
I may have up to 50 entries like this, but I want to put them in dependency order based on the 2nd value.
OUTPUT:
1.TABLE_VIEW,SQ_TABLE_NAME
2.SQ_TABLE_NAME,EXP_TABLE_NAME_STG
3.SQ_TABLE_NAME,LKP_NEW_TABLE_1
4.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_3
5.EXP_TABLE_NAME_STG,LKP_NEW_TABLE_2
6.EXP_TABLE_NAME_STG,TARGET_TABLE
7.LKP_NEW_TABLE_1,TARGET_TABLE
I have tried writing a static query by taking multiple list variables and deleting already-processed entries from the original list, but I never know when all the values end. Can you please share some thoughts on how to implement this dynamically?
sq_order_dependency = []
for sq_dep in job_dependent_details:
    if 'SQ' in sq_dep.split(',')[0]:
        sq_order_dependency.append(sq_dep)
        job_dependent_details.remove(sq_dep)
sq_order_dependency1 = []
for sq_depenent_order in sq_order_dependency:
    next_dependency = sq_depenent_order.split(',')[1]
    # print(next_dependency)
    for job_dependent_details_list in job_dependent_details:
        if next_dependency in job_dependent_details_list.split(',')[0]:
            # print(job_dependent_details_list)
            sq_order_dependency.append(job_dependent_details_list)
for i in sq_order_dependency:
    job_dependent_details.remove(i)
I would follow a different approach: build a tree-like structure from the dependency pairs, and then print out the tree.
In the following code I defined a simple Dep class and chose a depth-first traversal for showing the tree, both for readability; and since we meet the dependencies in an unspecified order, I used a helper dictionary. Oh, and I abbreviated the table names out of laziness :)
class Dep():
    def __init__(self, name, children=None):
        self.name = name
        if children:
            self.children = [children]
        else:
            self.children = []

    def add_child(self, child):
        self.children.append(child)

    def show(self, level=0):
        for c in self.children:
            print('\t' * level, self.name, c.name)
            c.show(level + 1)

def show_dependencies(deps):
    out = {}
    root = deps[0][0]
    for d in deps:
        pname, cname = d
        if cname in out:
            c = out[cname]
        else:
            c = Dep(cname)
            out[cname] = c
        if pname in out:
            out[pname].add_child(c)
        else:
            out[pname] = Dep(pname, c)
        if root == cname:
            root = pname
    out[root].show()
>>> show_dependencies([('EXP','TARGET'),('SQ','EXP'),('TABLE','SQ'),('EXP','LKP3'),('SQ','LKP1'),('EXP','LKP2'),('LKP1','TARGET')])
TABLE SQ
	SQ EXP
		EXP TARGET
		EXP LKP3
		EXP LKP2
	SQ LKP1
		LKP1 TARGET
Well, according to your examples and data there should be no duplicates, so I think this should work.
The main function is based on a recursive call (something like DFS).
Note that this works only for directed edges and without self-loops.
from collections import defaultdict

a = ['EXP_TABLE_NAME_STG, TARGET_TABLE',
     'SQ_TABLE_NAME, EXP_TABLE_NAME_STG',
     'TABLE_VIEW, SQ_TABLE_NAME',
     'EXP_TABLE_NAME_STG, LKP_NEW_TABLE_3',
     'SQ_TABLE_NAME, LKP_NEW_TABLE_1',
     'EXP_TABLE_NAME_STG, LKP_NEW_TABLE_2',
     'LKP_NEW_TABLE_1, TARGET_TABLE']
a = [i.replace(' ', '') for i in a]
sql_data = [tuple(i.split(',')) for i in a]

class CustomGraphDependency:
    # desired for question
    def __init__(self, data: list):
        self.graph = defaultdict(set)  # no self edge
        self.add_dependency(data)
        self.count = 1

    def add_dependency(self, data: list):
        for node1, node2 in data:  # directed edge only
            self.graph[node1].add(node2)

    def dependency_finder_with_count(self, node: str, sq_order_dependency: list, flag: list):
        # (e.g. 1.TABLE_VIEW,SQ_TABLE_NAME)
        flag.append(node)
        for item in self.graph[node]:
            sq_order_dependency.append((self.count, node, item))
            if item not in flag:
                self.count += 1
                self.dependency_finder_with_count(item, sq_order_dependency, flag)
        return sorted(sq_order_dependency, key=lambda x: x[0])

obj_test = CustomGraphDependency(sql_data).dependency_finder_with_count('TABLE_VIEW', [], [])
for i in obj_test:
    print(i)
'''
(1, 'TABLE_VIEW', 'SQ_TABLE_NAME')
(2, 'SQ_TABLE_NAME', 'EXP_TABLE_NAME_STG')
(3, 'EXP_TABLE_NAME_STG', 'LKP_NEW_TABLE_3')
(4, 'EXP_TABLE_NAME_STG', 'TARGET_TABLE')
(5, 'EXP_TABLE_NAME_STG', 'LKP_NEW_TABLE_2')
(6, 'SQ_TABLE_NAME', 'LKP_NEW_TABLE_1')
(7, 'LKP_NEW_TABLE_1', 'TARGET_TABLE')
'''
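For completeness, the same ordering can be produced without recursion by using an explicit stack. This is a sketch over the same pairs, without the counter bookkeeping; a defaultdict of lists (instead of sets) keeps the child order deterministic:

```python
from collections import defaultdict

pairs = [('EXP_TABLE_NAME_STG', 'TARGET_TABLE'),
         ('SQ_TABLE_NAME', 'EXP_TABLE_NAME_STG'),
         ('TABLE_VIEW', 'SQ_TABLE_NAME'),
         ('EXP_TABLE_NAME_STG', 'LKP_NEW_TABLE_3'),
         ('SQ_TABLE_NAME', 'LKP_NEW_TABLE_1'),
         ('EXP_TABLE_NAME_STG', 'LKP_NEW_TABLE_2'),
         ('LKP_NEW_TABLE_1', 'TARGET_TABLE')]

def order_dependencies(pairs, root):
    # Build an adjacency list, then walk it depth-first with a stack,
    # recording each (parent, child) edge as its parent is visited.
    graph = defaultdict(list)
    for parent, child in pairs:
        graph[parent].append(child)
    ordered, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for child in graph[node]:
            ordered.append((node, child))
            stack.append(child)
    return ordered

for edge in order_dependencies(pairs, 'TABLE_VIEW'):
    print(edge)
```

Each node is expanded once, so every edge appears exactly once, starting from the chosen root.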

Python: How can I implement yield in my recursion?

How can I implement yield from in my recursion? I am trying to understand how to implement it, but failing:
import pandas as pd

# some data
init_parent = [1020253]
df = pd.DataFrame({'parent': [1020253, 1020253],
                   'id': [1101941, 1101945]})

# look for parent child
def recur1(df, parents, parentChild=None, step=0):
    if len(parents) != 0:
        yield parents, parentChild
    else:
        parents = df.loc[df['parent'].isin(parents)][['id', 'parent']]
        parentChild = parents['parent'].to_numpy()
        parents = parents['id'].to_numpy()
        yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1)

# exec / only printing results atm
out = recur1(df, init_parent, step=0)
[x for x in out]
I'd say your biggest issue here is that recur1 isn't always guaranteed to return a generator. For example, suppose your stack calls into the else branch three times before calling into the if branch. In this case, the top three frames would be returning a generator received from the lower frame, but the lowest frame would be returning from this:
yield parents, parentChild
So, then, there is a really simple way you can fix this code to ensure that yield from works. Simply transform your return from a tuple to a generator-compatible type by enclosing it in a list:
yield [(parents, parentChild)]
Then, when you call yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1) you'll always be working with something for which yield from makes sense.
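As an alternative to wrapping the tuple in a list, the branches can be reordered so the function yields each level and then recurses until no parents remain. This is a hedged sketch of that idea (walk_levels is a hypothetical name, not the original poster's code):

```python
import pandas as pd

df = pd.DataFrame({'parent': [1020253, 1020253],
                   'id': [1101941, 1101945]})

def walk_levels(df, parents, parent_child=None):
    # Hypothetical rewrite: stop when no parents remain, otherwise
    # yield the current level and recurse into the children found for it.
    if len(parents) == 0:
        return
    yield parents, parent_child
    rows = df.loc[df['parent'].isin(parents)][['id', 'parent']]
    yield from walk_levels(df, rows['id'].to_numpy(), rows['parent'].to_numpy())

levels = list(walk_levels(df, [1020253]))
```

With the sample data this yields two levels: the initial parent, then its two children.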

How to implement DFS with recursive function on JSON dict of tree?

I use a recursive Depth-First-Search function to traverse a tree where each node has an index.
During traversing, I need to assign one node (whose type is dict) to a variable to further process from outer scope.
It seems that I use a useless assignment. What is the most efficient way to do that?
def dfs(json_tree, index, result):
    if json_tree['index'] == index:
        result = json_tree['index']  ## not work!
        return
    if 'children' not in json_tree:
        return
    for c in json_tree['children']:
        dfs(c, index, result)
Try returning result instead. Note that I changed your function signature. This will also short-circuit the search as soon as index is found.
def dfs(json_tree, index):
    if json_tree['index'] == index:
        return json_tree['index']
    if 'children' not in json_tree:
        return None
    for c in json_tree['children']:
        result = dfs(c, index)
        if result is not None:
            return result
    return None
Edit: Updated with a final return path in case index is never found.
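If the goal is to hand back the matching node itself (the question says the node is a dict to be processed further), the same short-circuiting pattern can return the subtree instead of the index. A sketch, with hypothetical sample data:

```python
def find_node(tree, index):
    # Depth-first search returning the matching subtree dict, or None.
    if tree['index'] == index:
        return tree
    for child in tree.get('children', []):
        found = find_node(child, index)
        if found is not None:
            return found
    return None

# Hypothetical sample tree
tree = {'index': 0, 'children': [
    {'index': 1},
    {'index': 2, 'children': [{'index': 3}]}]}

node = find_node(tree, 3)
```

The caller then works with the returned dict directly, so no outer-scope assignment is needed.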

Sorting through a nested list

I have a list that describes a hierarchy, as such:
[obj1, obj2, [child1, child2, [gchild1, gchild2]], onemoreobject]
Where child1 (and others) are children of obj2, while gchild1 and gchild2 are children of child2.
Each of these objects has attributes like date, for example, and I want to sort them according to such attributes. With a regular list I would go like this:
sorted(obj_list, key=attrgetter('date'))
In this case, nonetheless, that method won't work, since lists don't have a date attribute... Even if they did, if a list's attribute differed from its parent's, the hierarchical ordering would be broken. Is there a simple and elegant way to do this in Python?
I think you just need to put your key in the sort(key=None) calls and this will work. I tested it with strings and it seems to work. I wasn't sure of the structure of onemoreobject; it was sorted to the beginning with obj1 and obj2. I thought that onemoreobject might represent a new hierarchy, so I enclosed each hierarchy in a list to keep like objects together.
def embededsort(alist):
    islist = False
    temp = []
    for index, obj in enumerate(alist):
        if isinstance(obj, list):
            islist = True
            embededsort(obj)
            temp.append((index, obj))
    if islist:
        for lists in reversed(temp):
            del alist[lists[0]]
        alist.sort(key=None)
        for lists in temp:
            alist.append(lists[1])
    else:
        alist.sort(key=None)
    return alist
>>>l=[['obj2', 'obj1', ['child2', 'child1', ['gchild2', 'gchild1']]], ['obj22', 'obj21', ['child22', 'child21', ['gchild22', 'gchild21']]]]
>>>print(embededsort(l))
[['obj1', 'obj2', ['child1', 'child2', ['gchild1', 'gchild2']]], ['obj21', 'obj22', ['child21', 'child22', ['gchild21', 'gchild22']]]]
This is an implementation of the quicksort algorithm using the polymorphism provided by Python. Since comparisons are delegated to the elements, it works for ints, floats, strings, tuples, lists and nested lists (list comparisons are lexicographic):
def qsort(lst):
    if not lst:
        return []
    first = lst[0]
    lesser = [x for x in lst[1:] if x < first]
    greater = [x for x in lst[1:] if x >= first]
    return qsort(lesser) + [first] + qsort(greater)

>>> qsort([[3, 1], [1, 2], [1, 1]])
[[1, 1], [1, 2], [3, 1]]
Thanks for the answers, as they gave me quite a few ideas and new stuff to learn from. The final code, which seems to work, looks like this. Not as short and elegant as I imagined, but it works:
def sort_by_date(element_list):
    last_item = None
    last_comparisson = None
    sorted_list = []
    for item in element_list:
        # if item is a list, recurse and store it right below last item (parent)
        if type(item) == list:
            if last_comparisson:
                if last_comparisson == 'greater':
                    sorted_list.append(sort_by_date(item))
                else:
                    sorted_list.insert(1, sort_by_date(item))
        # if not a list, check if it is greater or smaller than the last comparison
        else:
            if last_item == None or item.date > last_item:
                last_comparisson = 'greater'
                sorted_list.append(item)
            else:
                last_comparisson = 'smaller'
                sorted_list.insert(0, item)
            last_item = item.date
    return sorted_list
If you want to sort all children of a node without taking into consideration those nodes which are not siblings, go for a tree structure:
class Tree:
    def __init__(self, payload):
        self.payload = payload
        self.__children = []

    def __iadd__(self, child):
        self.__children.append(child)
        return self

    def sort(self, attr):
        self.__children = sorted(self.__children, key=lambda x: getattr(x.payload, attr))
        for child in self.__children:
            child.sort(attr)

    def __repr__(self):
        return '{}: {}'.format(self.payload, self.__children)
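If the hierarchy stays as plain nested lists, as in the question, a recursive sort can keep each sublist attached to the element it follows. A sketch, assuming a sublist always directly follows its parent element and that elements are directly comparable (swap the key for attrgetter('date') with real objects):

```python
def sort_nested(items, key=lambda x: x):
    # Pair each non-list element with the sublist (its children) that
    # follows it, sort the pairs by key, then flatten back out.
    paired = []
    for item in items:
        if isinstance(item, list):
            # attach children to the preceding element, sorted recursively
            paired[-1] = (paired[-1][0], sort_nested(item, key))
        else:
            paired.append((item, None))
    paired.sort(key=lambda p: key(p[0]))
    result = []
    for obj, children in paired:
        result.append(obj)
        if children is not None:
            result.append(children)
    return result

print(sort_nested(['b', 'a', ['d', 'c']]))
```

Siblings are reordered, but each child list still sits right after its parent, so the hierarchy survives the sort.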

Detect last iteration over dictionary.iteritems() in python

Is there a simple way to detect the last iteration while iterating over a dictionary using iteritems()?
There is an ugly way to do this:
for i, (k, v) in enumerate(your_dict.items()):
    if i == len(your_dict) - 1:
        # do special stuff here
        pass
But you should really consider if you need this. I am almost certain that there is another way.
As others have stated, dictionaries have no defined order, so it's hard to imagine why you would need this, but here it is:
last = None
for current in your_dict.iteritems():
    if last is not None:
        # process last
        pass
    last = current
# now last contains the last thing in dict.iteritems()
if last is not None:  # this could happen if the dict was empty
    # process the last item
    pass
it = spam_dict.iteritems()
try:
    eggs1 = it.next()
    while True:
        eggs2 = it.next()
        do_something(eggs1)
        eggs1 = eggs2
except StopIteration:
    do_final(eggs1)
Quick and quite dirty. Does it solve your issue?
I know this is late, but here's how I've solved this issue:
dictItemCount = len(dict)
dictPosition = 1
for key, value in dict.items():
    if dictPosition == dictItemCount:
        print('last item in dictionary')
    dictPosition += 1
This is a special case of this broader question. My suggestion was to create an enumerate-like generator that returns -1 on the last item:
def annotate(gen):
    prev_i, prev_val = 0, next(gen)
    for i, val in enumerate(gen, start=1):
        yield prev_i, prev_val
        prev_i, prev_val = i, val
    yield -1, prev_val
Add gen = iter(gen) if you want it to handle sequences as well as generators.
I recently had this issue. I thought this was the most elegant solution, because it allows you to write for i, value, isLast in lastEnumerate(...):
def lastEnumerate(iterator):
    x = list(iterator)
    for i, value in enumerate(x):
        yield i, value, i == len(x) - 1
For example:
for i, value, isLast in lastEnumerate(range(5)):
    print(value)
    if not isLast:
        print(',')
The last item in a for loop hangs around after the for loop anyway:
for current_item in my_dict:
    do_something(current_item)
try:
    do_last(current_item)
except NameError:
    print("my_dict was empty")
Even if the name current_item is in use before the for loop, attempting to loop over an empty dict seems to have the effect of deleting current_item, hence the NameError.
You stated in an above comment that you need this to construct the WHERE clause of an SQL SELECT statement. Perhaps this will help:
def make_filter(colname, value):
    if isinstance(value, str):
        if '%' in value:
            return "%s LIKE '%s'" % (colname, value)
        else:
            return "%s = '%s'" % (colname, value)
    return "%s = %s" % (colname, value)

filters = {'USER_ID': '123456', 'CHECK_NUM': 23459, 'CHECK_STATUS': 'C%'}
whereclause = 'WHERE ' + '\nAND '.join(make_filter(*x) for x in filters.iteritems())
print(whereclause)
which prints
WHERE CHECK_NUM = 23459
AND CHECK_STATUS LIKE 'C%'
AND USER_ID = '123456'
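As an aside, interpolating values into SQL strings invites injection; the same WHERE clause can be assembled with placeholders and a parameter list instead. A sketch, assuming a DB-API driver that uses ? placeholders (e.g. sqlite3):

```python
filters = {'USER_ID': '123456', 'CHECK_NUM': 23459, 'CHECK_STATUS': 'C%'}

clauses = []
params = []
for colname, value in filters.items():
    # Use LIKE for string values containing a wildcard, = otherwise
    op = 'LIKE' if isinstance(value, str) and '%' in value else '='
    clauses.append('%s %s ?' % (colname, op))
    params.append(value)

whereclause = 'WHERE ' + '\nAND '.join(clauses)
# pass both pieces to the driver: cursor.execute(query + whereclause, params)
```

The driver then handles quoting and escaping of the values.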
The approach that makes the most sense is to wrap the loop in some call which contains a hook to call your post-iteration functionality afterwards.
This could be implemented as a context manager and called through a with statement or, for older versions of Python, you could use the old try: ... finally: construct. It could also be wrapped in a class where the dictionary iteration is self-dispatched (a "private" method) and the appendix code follows it in the public method. (Understanding that the distinction between public and private is a matter of intention and documentation, not something enforced by Python.)
Another approach is to enumerate your dict and compare the current iteration count against the final one. It's easier to look at and understand, in my opinion:
for n, (key, value) in enumerate(yourDict.items()):
    if n == len(yourDict) - 1:
        print('Found the last iteration!:', n)
OR you could just do something once the iteration is finished:
for key, value in yourDict.items():
    pass
else:
    print('Finished iterating over `yourDict`')
No. When using an iterator you do not know anything about the position; in fact, the iterator could be infinite.
Besides that, a dictionary is not ordered. So if you need this, e.g. to insert commas between the elements, you should take the items, sort them, and then iterate over the list of (key, value) tuples. When iterating over this list you can easily count the number of iterations and thus know when you have the last element.
