Hi, I am new to Python and I am trying to filter this dictionary for entries whose values are an empty list or a null value. Below is the dictionary, called response:
response = {'name': 'py',
'title': 'Py',
'description': 'Python library and collection of scripts that automate work on MediaWiki sites', 'url': 'https://www.me',
'keywords': [],
'author': [{'name': 'team'}],
'repository': 'https://gecore',
'subtitle': None,
'id': None,
'alternates': [],
'username': None,
'deprecated': False,
'by': None,
'experimental': False,
'for': ['*'],
'icon': 'https://commons.wikimedia.org/wiki/File:Pywikibot_MW_gear_icon.svg',
'license': 'MIT',
'sponsor': [],
'available_languages': [],
'technology_used': ['python'],
'tool_type': 'framework',
'api': None,
'developer_url': [{'d_url': 'https://www.mement', 'language': 'en'}],
'user_url': [{'language': 'en', 'm_url': 'https://www.Spec'}, {'t_url': 'https://doc.media', 'language': 'en'}],
'feedback': [],
'privacy': [],
'translate_url': 'https://translate.bot',
'bugtracker_url': 'https://phabbot/',
'annotations': {
'wid': None,
'depre': False,
'by': None,
'exp': False,
'for': [],
'icon': None,
'available_languages': ['en'],
'ttype': None,
'rey': Null,
'api': None,
'dev_doc': [{'url': 'https://www.medial:t', 'language': 'en'}], 'user_url': [],
'feedback': [],
'privacy': [],
'translat': None,
'bugtracker': None},
'_schema': None,
'_language': 'en',
'origin': 'api',
'created_by': {'id': 10, 'username': 'JJMC89'},
'created_date': '2021-10-12T20:26:29.012245Z',
'modified_by': {
'id': 3,
'username': 'BD'
},
'modified_date': Null}
my code:
print([response if response.values() == [] or Null or None])
I got an error while running this in a Jupyter notebook. My code is trying to filter the dictionary on the condition that a value is an empty list or a null value, in order to build a list of the entries that meet this condition.
The error here is twofold: the list comprehension has no for clause, which by itself is a SyntaxError, and response.values() == [] or Null or None does not compare each value against [] and None. The or chains three separate expressions, and Null is not even defined in Python. What you want to do is iterate over the values and test each one, such as
print([r for r in response.values() if not r or r is None])
The not r catches an empty list, None, and the boolean False as well, so the r is None part is redundant but harmless.
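A quick check of the truthiness rules involved (the sample values are just illustrations):

```python
# All of these are falsy, so `not v` is True for each of them.
for v in ([], None, False, '', 0):
    assert not v

# Non-empty containers stay truthy, so entries like ['*'] are kept.
assert bool(['*'])
```

Note that not v also treats False and 0 as empty, which is why 'deprecated' and 'experimental' appear in the output below.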
Alternatively you can use the filter function
list(filter(lambda x: not x, response.values()))
But that will give you only the values. If you want the keys, you might want to do:
print([k for k,v in response.items() if not v])
That will output
['keywords', 'subtitle', 'id', 'alternates', 'username', 'deprecated', 'by', 'experimental', 'sponsor', 'available_languages', 'api', 'feedback', 'privacy', '_schema', 'modified_date']
If you also want to support the nested dicts, a single line might not be enough, but recursion will do it:
def get_nones(resp, result):
    for k, v in resp.items():
        if not v:
            result.append(k)
        elif isinstance(v, dict):
            get_nones(v, result)
    return result

r = []
print(get_nones(response, r))
['keywords', 'subtitle', 'id', 'alternates', 'username', 'deprecated', 'by', 'experimental', 'sponsor', 'available_languages', 'api', 'feedback', 'privacy', 'wid', 'depre', 'by', 'exp', 'for', 'icon', 'ttype', 'rey', 'api', 'user_url', 'feedback', 'privacy', 'translat', 'bugtracker', '_schema', 'modified_date']
By the way, this works because Python passes the list as a reference to the same object, not as a copy. The final return result just hands the same list back for convenience; there is no need to reassign anything during recursion, because every call mutates the very same object.
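A minimal sketch of that shared-object behaviour (the names are hypothetical):

```python
def append_one(result):
    # Mutates the very list object the caller passed in.
    result.append(1)
    return result

r = []
returned = append_one(r)
assert returned is r  # same object, not a copy
assert r == [1]       # the caller sees the mutation
```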
Hope it helps
Null does not exist in Python. You can use np.nan from NumPy (import numpy as np) instead.
response = {'name': 'py',
'title': 'Py',
'description': 'Python library and collection of scripts that automate work on MediaWiki sites', 'url': 'https://www.me',
'keywords': [],
'author': [{'name': 'team'}],
'repository': 'https://gecore',
'subtitle': None,
'id': None,
'alternates': [],
'username': None,
'deprecated': False,
'by': None,
'experimental': False,
'for': ['*'],
'icon': 'https://commons.wikimedia.org/wiki/File:Pywikibot_MW_gear_icon.svg',
'license': 'MIT',
'sponsor': [],
'available_languages': [],
'technology_used': ['python'],
'tool_type': 'framework',
'api': None,
'developer_url': [{'d_url': 'https://www.mement', 'language': 'en'}],
'user_url': [{'language': 'en', 'm_url': 'https://www.Spec'}, {'t_url': 'https://doc.media', 'language': 'en'}],
'feedback': [],
'privacy': [],
'translate_url': 'https://translate.bot',
'bugtracker_url': 'https://phabbot/',
'annotations': {
'wid': None,
'depre': False,
'by': None,
'exp': False,
'for': [],
'icon': None,
'available_languages': ['en'],
'ttype': None,
'rey': np.nan,
'api': None,
'dev_doc': [{'url': 'https://www.medial:t', 'language': 'en'}], 'user_url': [],
'feedback': [],
'privacy': [],
'translat': None,
'bugtracker': None},
'_schema': None,
'_language': 'en',
'origin': 'api',
'created_by': {'id': 10, 'username': 'JJMC89'},
'created_date': '2021-10-12T20:26:29.012245Z',
'modified_by': {
'id': 3,
'username': 'BD'
},
'modified_date': np.nan}
You should collect the name of the key, because the value itself carries no information. You can do it like this (also searching one level of nested dicts):
import numpy as np

nulls = []
for key in response:
    if isinstance(response[key], dict):
        nested_dict = response[key]
        for key_ in nested_dict:
            if nested_dict[key_] in [np.nan, [], None]:
                nulls.append(key_)
    elif response[key] in [np.nan, [], None]:
        nulls.append(key)
Output:
['keywords',
'subtitle',
'id',
'alternates',
'username',
'by',
'sponsor',
'available_languages',
'api',
'feedback',
'privacy',
'wid',
'by',
'for',
'icon',
'ttype',
'rey',
'api',
'user_url',
'feedback',
'privacy',
'translat',
'bugtracker',
'_schema',
'modified_date']
Null is not a keyword in Python, so you will have to convert it to None or the string "Null" first:
print([resp for resp in response.values() if resp == [] or resp == "Null" or resp is None])
Let's say that I have a list of dictionaries like this
dict1 = [{
'Name': 'Team1',
'id': '1',
'Members': [
{
'type': 'user',
'id': '11'
},
{
'type': 'user',
'id': '12'
}
]
},
{
'Name': 'Team2',
'id': '2',
'Members': [
{
'type': 'group',
'id': '1'
},
{
'type': 'user',
'id': '21'
}
]
},
{
'Name': 'Team3',
'id': '3',
'Members': [
{
'type': 'group',
'id': '2'
}
]
}]
and I want to get an output that can replace all the groups and nested groups with all distinct users.
In this case the output should look like this:
dict2 = [{
'Name': 'Team1',
'id': '1',
'Members': [
{
'type': 'user',
'id': '11'
},
{
'type': 'user',
'id': '12'
}
]
},
{
'Name': 'Team2',
'id': '2',
'Members': [
{
'type': 'user',
'id': '11'
},
{
'type': 'user',
'id': '12'
},
{
'type': 'user',
'id': '21'
}
]
},
{
'Name': 'Team3',
'id': '3',
'Members': [
{
'type': 'user',
'id': '11'
},
{
'type': 'user',
'id': '12'
},
{
'type': 'user',
'id': '21'
}
]
}]
Now let's assume that I have a large dataset to perform these actions on (approx. 20k individual groups).
What would be the best way to code this? I am attempting recursion, but I am not sure how to search through the dictionaries and lists in a way that doesn't end up using too much memory.
I do not think you need recursion; looping is enough.
You can simply walk each Members list, fetch the users for every group-type member, and deduplicate them. Then you replace the Members value with the distinct users.
You might have a dictionary for groups like:
group_dict = {
'1': [
{'type': 'user', 'id': '11'},
{'type': 'user', 'id': '12'}
],
'2': [
{'type': 'user', 'id': '11'},
{'type': 'user', 'id': '12'},
{'type': 'user', 'id': '21'}
],
'3': [
{'type': 'group', 'id': '1'},
{'type': 'group', 'id': '2'},
{'type': 'group', 'id': '3'} # recursive
]
...
}
You can try:
def users_in_group(group_id):
    users = []
    groups_to_fetch = []
    for user_or_group in group_dict[group_id]:
        if user_or_group['type'] == 'group':
            groups_to_fetch.append(user_or_group)
        else:  # 'user' type
            users.append(user_or_group)
    groups_fetched = set()  # not to loop forever
    while groups_to_fetch:
        group = groups_to_fetch.pop()
        if group['id'] not in groups_fetched:
            groups_fetched.add(group['id'])
            for user_or_group in group_dict[group['id']]:
                if user_or_group['type'] == 'group' and user_or_group['id'] not in groups_fetched:
                    groups_to_fetch.append(user_or_group)
                else:  # 'user' type
                    users.append(user_or_group)
    return users
def distinct_users_in(members):
    distinct_users = []
    user_id_set = set()

    def add(user):
        if user['id'] not in user_id_set:
            distinct_users.append(user)
            user_id_set.add(user['id'])

    for member in members:
        if member['type'] == 'group':
            for user in users_in_group(member['id']):
                add(user)
        else:  # 'user'
            add(member)
    return distinct_users

dict2 = dict1  # or `copy.deepcopy`
for element in dict2:
    element['Members'] = distinct_users_in(element['Members'])
Each Members list is re-assigned to the distinct_users returned by the corresponding function.
The function takes Members and fetches the users for each group-type member; a user-type member is a user itself. While the fetched users are appended to distinct_users, their ids are used to keep the list unique.
When you fetch users_in_group, you use two collections: groups_to_fetch and groups_fetched. The former is a stack for iteratively fetching all the groups inside a group. The latter prevents fetching an already-fetched group again; otherwise, it could loop forever.
Finally, if your data already fit in memory, this approach should work without exhausting it.
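One thing the sketch above assumes is a ready-made group_dict. If, as the expected dict2 suggests, a group id refers to a team id (group '1' expands to Team1's users), you could build it from dict1 with a comprehension before running the loop; the data below is a trimmed stand-in for the question's dict1:

```python
# Trimmed stand-in for dict1 from the question.
dict1 = [
    {'Name': 'Team1', 'id': '1', 'Members': [
        {'type': 'user', 'id': '11'},
        {'type': 'user', 'id': '12'},
    ]},
    {'Name': 'Team2', 'id': '2', 'Members': [
        {'type': 'group', 'id': '1'},
        {'type': 'user', 'id': '21'},
    ]},
]

# Assumption: a group's members are the Members list of the team with that id.
group_dict = {team['id']: team['Members'] for team in dict1}
assert group_dict['1'] == [{'type': 'user', 'id': '11'}, {'type': 'user', 'id': '12'}]
```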
I have a CSV with 500+ rows where one column, "_source", is stored as JSON, and I want to extract that into a pandas DataFrame with each key as its own column. The data is a ~1 MB JSON file of online social media posts (Facebook, Twitter, web-crawled, etc.): approximately 528 separate rows of posts/tweets/text, each with many dictionaries nested inside dictionaries. I am attaching a few steps from my Jupyter notebook below to give a more complete understanding. I need to turn all key:value pairs, including those in nested dictionaries, into their own columns in a DataFrame.
Thank you so much this will be a huge help!!!
I have tried changing it to a dataframe by doing this
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary but it did not.
#source.head()
#_source
#0 {'sub_organization_id': 'default', 'uid': 'aba...
#1 {'sub_organization_id': 'default', 'uid': 'ab0...
#2 {'sub_organization_id': 'default', 'uid': 'ac0...
below is the shape
#source.shape (528, 1)
Below is what an actual "_source" row looks like stretched out. There are many dictionaries and key:value pairs, and each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Before you post, please make sure the code actually works on the data attached. Thanks!
I tried the code below, but it did not work; there was a syntax error that I could not figure out:
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!
I had to do something like that a while back. Basically I used a function that completely flattens the JSON to identify the keys that will become columns, then iterated through the flattened keys to reconstruct each row and append it to a "results" DataFrame. With the data you provided it created a row of 52 columns, and looking through it, it looks like every key got its own column. Anything nested, for example 'meta': {'rule_matcher': [{'atribs': {'website': ...}}]}, ends up with a compound column name built from the nested keys, such as atribs_website or results.metadata_campaign_title in the output below.
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
flat = flatten_json(data_source)

import pandas as pd
import re

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
    try:
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except IndexError:  # no list index in the key: this is a top-level column
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = re.sub(r'\_\d+\_', '.', column)
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value
for item in special_cols:
    results[item] = flat[item]
Output:
print (results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f
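As an aside, on pandas 1.0+ a one-liner close to what the question attempted should also work (older versions spell it pd.io.json.json_normalize). Note the bracket access source_data['_source']: the stray dot in source_data.[_source] is what caused the SyntaxError. A toy stand-in for the csv column:

```python
import json
import pandas as pd

# Toy stand-in: one column of JSON strings, as read from the csv.
source_data = pd.DataFrame({'_source': ['{"doc": {"status_code": 200}, "type": "crawl"}']})

# json.loads turns each string into a dict; json_normalize flattens
# nested keys into dot-separated column names.
flat = pd.json_normalize(source_data['_source'].apply(json.loads).tolist())
print(sorted(flat.columns))  # ['doc.status_code', 'type']
```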
I need help :P
I have this code and...
lista_final = []  # store the difference between these two lists
lista1 = (
{
'ip': '127.0.0.1',
'hostname': 'abc',
'state': 'open',
'scan_id': '2'
},
{
'ip': '127.0.0.2',
'hostname': 'bca',
'state': 'closed',
'scan_id': '2'
}
)
lista2 = (
{
'ip': '127.0.0.1',
'hostname': 'abc',
'state': 'closed',
'scan_id': '3'
},
{
'ip': '127.0.0.3',
'hostname': 'qwe',
'state': 'open',
'scan_id': '3'
},
{
'ip': '127.0.0.2',
'hostname': 'xxx',
'state': 'up',
'scan_id': '3'
},
)
And I need to find the difference between them.
So I wrote this code:
for l1 in lista1:
    for l2 in lista2:
        if l1['ip'] == l2['ip']:  # if ip is equal
            ip = l1['ip']  # store ip
            hostname = l1['hostname']  # default hostname
            if l1['hostname'] != l2['hostname']:  # if hostnames are different, store the diff
                hostname = '({scan_id_l1}:{valuel1}) != ({scan_id_l2}:{valuel2})'.format(scan_id_l1=l1['scan_id'], valuel1=l1['hostname'], scan_id_l2=l2['scan_id'], valuel2=l2['hostname'])
            state = l1['state']  # default state
            if l1['state'] != l2['state']:  # if states are different, store the diff
                state = '({scan_id_l1}:{valuel1}) != ({scan_id_l2}:{valuel2})'.format(scan_id_l1=l1['scan_id'], valuel1=l1['state'], scan_id_l2=l2['scan_id'], valuel2=l2['state'])
            # create a temp dict
            tl = {
                'ip': ip,
                'hostname': hostname,
                'state': state
            }
            # append the temp dict to lista_final
            lista_final.append(tl)
            break  # ok, go to the next ip
print(lista_final)
and my output is
[
{
'ip': '127.0.0.1',
'hostname': 'abc',
'state': '(2:open) != (3:closed)'
},
{
'ip': '127.0.0.2',
'hostname': '(2:bca) != (3:xxx)',
'state': '(2:closed) != (3:up)'
}
]
Note that in lista2 there is an ip '127.0.0.3' that does not appear in lista_final. The result I want is this:
[
{
'ip': '127.0.0.1',
'hostname': 'abc',
'state': '(2:open) != (3:closed)'
},
{
'ip': '127.0.0.2',
'hostname': '(2:bca) != (3:xxx)',
'state': '(2:closed) != (3:up)'
},
{
'ip': '127.0.0.3',
'hostname': '(2:NOT EXIST) != (3:qwe)',
'state': '(2:NOT EXIST) != (3:open)'
}
]
Can you help me find a better solution?
Let's first clean up your solution a bit:
# let's make these tuples lists
lista1 = list(lista1)
lista2 = list(lista2)
# let's sort them by ip
lista1.sort(key=lambda d: d['ip'])
lista2.sort(key=lambda d: d['ip'])
for dd in zip(lista1, lista2):
    for k, v in dd[0].items():  # iteritems() is Python 2 only
        if v != dd[1].get(k) and k != 'scan_id':
            dd[0][k] = "({}:{}) != ({}:{})".format(dd[0]['scan_id'], v, dd[1]['scan_id'], dd[1].get(k))
    dd[0].pop('scan_id')
    lista_final.append(dd[0])
This does pretty much the same as your code, only doing it in place and in a more Pythonic spirit. This is the output:
[
{
'hostname': 'abc',
'ip': '127.0.0.1',
'state': '(2:open) != (3:closed)'
},
{
'hostname': '(2:bca) != (3:xxx)',
'ip': '127.0.0.2',
'state': '(2:closed) != (3:up)'
}
]
If you want to cover the edge case where one list is longer than the other, you can compare their lengths and then repeat the operation on the leftover entries, like the following:
longer_lista = lista1 if len(lista1) > len(lista2) else lista2
# iterating only through the dictionaries in the longer part of the list
for d in longer_lista[min(len(lista1), len(lista2)):]:
    for k, v in d.items():
        if k != 'ip' and k != 'scan_id':
            d[k] = "({}) != ({}:{})".format("NOT EXISTS", d['scan_id'], v)
    lista_final.append(d)
This will give you an output more or less as you were expecting. Surely the code doesn't cover all the possible edge cases, but it is a good starting point.
[
{
'hostname': 'abc',
'ip': '127.0.0.1',
'state': '(2:open) != (3:closed)'
},
{
'hostname': '(2:bca) != (3:xxx)',
'ip': '127.0.0.2',
'state': '(2:closed) != (3:up)'
}
{
'hostname': '(NOT EXISTS) != (3:qwe)',
'ip': '127.0.0.3',
'scan_id': '3',
'state': '(NOT EXISTS) != (3:open)'
}
]
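An alternative sketch, not part of the answer above: indexing both lists by ip first handles addresses missing from either side symmetrically, with no sorting or positional zipping. The 'NOT EXIST' formatting follows the question's expected output:

```python
def diff_scans(lista1, lista2):
    # Index each scan by ip for direct lookups.
    by_ip1 = {d['ip']: d for d in lista1}
    by_ip2 = {d['ip']: d for d in lista2}
    # scan_id is constant within one scan, so take it from any entry.
    sid1 = lista1[0]['scan_id'] if lista1 else '?'
    sid2 = lista2[0]['scan_id'] if lista2 else '?'
    result = []
    for ip in sorted(set(by_ip1) | set(by_ip2)):
        d1, d2 = by_ip1.get(ip, {}), by_ip2.get(ip, {})
        entry = {'ip': ip}
        for field in ('hostname', 'state'):
            v1 = d1.get(field, 'NOT EXIST')
            v2 = d2.get(field, 'NOT EXIST')
            # Keep equal values as-is; otherwise format the difference.
            entry[field] = v1 if v1 == v2 else '({}:{}) != ({}:{})'.format(sid1, v1, sid2, v2)
        result.append(entry)
    return result

lista1 = [{'ip': '127.0.0.1', 'hostname': 'abc', 'state': 'open', 'scan_id': '2'}]
lista2 = [{'ip': '127.0.0.1', 'hostname': 'abc', 'state': 'closed', 'scan_id': '3'},
          {'ip': '127.0.0.3', 'hostname': 'qwe', 'state': 'open', 'scan_id': '3'}]
print(diff_scans(lista1, lista2))
```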
I used some of the functions that you posted as an answer here and I was able to solve the problem!
Thanks.
Here's the commented solution.
The function
def search_diffences(list_one, list_two):
    # Store the result
    list_final = []
    # Sort by IP
    list_one.sort(key=lambda d: d['ip'])
    list_two.sort(key=lambda d: d['ip'])
    # Find the bigger list
    bigger_list = list_one
    smaller_list = list_two
    if len(list_two) > len(list_one):
        bigger_list = list_two
        smaller_list = list_one
    # Start the for inside for
    for lo in bigger_list:
        found = False  # Store whether a match was found
        pop_i = 0  # Track which dict will be removed from smaller_list (for optimization)
        for lt in smaller_list:
            print("lo['ip']({0}) == lt['ip']({1}): {2}".format(lo['ip'], lt['ip'], lo['ip'] == lt['ip']))  # For debug
            if lo['ip'] == lt['ip']:  # If lo['ip'] from bigger_list was found in smaller_list
                found = True  # Set found as True
                # Store scan_id because it will be used more than once
                scan_id_lo = lo['scan_id']
                scan_id_lt = lt['scan_id']
                # Build the values for the temp dict to add to list_final
                ip = lo['ip']
                # Structure the result the way I want it
                hostname = lo['hostname']
                if lo['hostname'] != lt['hostname']:
                    hostname = '({SCAN_ID_LO}:{VALUE_LO}) != ({SCAN_ID_LT}:{VALUE_LT})'.format(
                        SCAN_ID_LO=scan_id_lo,
                        SCAN_ID_LT=scan_id_lt,
                        VALUE_LO=lo['hostname'],
                        VALUE_LT=lt['hostname']
                    )
                state = lo['state']
                if lo['state'] != lt['state']:
                    state = '({SCAN_ID_LO}:{VALUE_LO}) != ({SCAN_ID_LT}:{VALUE_LT})'.format(
                        SCAN_ID_LO=scan_id_lo,
                        SCAN_ID_LT=scan_id_lt,
                        VALUE_LO=lo['state'],
                        VALUE_LT=lt['state']
                    )
                # Create the temp dict
                temp_list = {
                    'ip': ip,
                    'hostname': hostname,
                    'state': state
                }
                # Append temp_list to list_final
                list_final.append(temp_list)
                # Pop the matched value so the next pass over bigger_list does not
                # go through this entry of the smaller list again
                smaller_list.pop(pop_i)
                # Go to the next value of bigger_list
                break
            # pop_i += 1: if the smaller list does not match on the first
            # attempt, the pop index moves forward with it
            pop_i += 1
        print(found)  # Debug
        if not found:  # If the value of bigger_list does not exist in smaller_list, append it to list_final anyway
            scan_id_lo = lo['scan_id']
            scan_id_lt = lt['scan_id']
            ip = lo['ip']
            print("This was not found, adding to list_final", ip)
            hostname = '({SCAN_ID_LO}:{VALUE_LO}) != ({SCAN_ID_LT}:{VALUE_LT})'.format(
                SCAN_ID_LO=scan_id_lo,
                SCAN_ID_LT=scan_id_lt,
                VALUE_LO='NOT EXIST',
                VALUE_LT=lo['hostname']
            )
            state = '({SCAN_ID_LO}:{VALUE_LO}) != ({SCAN_ID_LT}:{VALUE_LT})'.format(
                SCAN_ID_LO=scan_id_lo,
                SCAN_ID_LT=scan_id_lt,
                VALUE_LO='NOT EXIST',
                VALUE_LT=lo['state']
            )
            temp_list = {
                'ip': ip,
                'hostname': hostname,
                'state': state
            }
            list_final.append(temp_list)
            # bigger_list.pop(0)
    # If smaller_list still has elements
    for lt in smaller_list:
        scan_id_lt = lt['scan_id']
        ip = lt['ip']
        hostname = '({SCAN_ID_LO}:{VALUE_LO}) != ({SCAN_ID_LT}:{VALUE_LT})'.format(
            SCAN_ID_LO='NOT EXIST',
            SCAN_ID_LT=scan_id_lt,
            VALUE_LO='NOT EXIST',
            VALUE_LT=lt['hostname']
        )
        state = '({SCAN_ID_LO}:{VALUE_LO}) != ({SCAN_ID_LT}:{VALUE_LT})'.format(
            SCAN_ID_LO='NOT EXIST',
            SCAN_ID_LT=scan_id_lt,
            VALUE_LO='NOT EXIST',
            VALUE_LT=lt['state']
        )
        temp_list = {
            'ip': ip,
            'hostname': hostname,
            'state': state
        }
        list_final.append(temp_list)  # Simply append
    return list_final
The main code and the lists
# First List
list_one = [
{
'ip': '127.0.0.1',
'hostname': 'abc',
'state': 'open',
'scan_id': '2'
},
{
'ip': '127.0.0.2',
'hostname': 'bca',
'state': 'closed',
'scan_id': '2'
},
{
'ip': '100.0.0.4',
'hostname': 'ddd',
'state': 'closed',
'scan_id': '2'
},
{
'ip': '100.0.0.1',
'hostname': 'ggg',
'state': 'up',
'scan_id': '2'
},
]
# Second List
list_two = [
{
'ip': '127.0.0.1',
'hostname': 'abc',
'state': 'closed',
'scan_id': '3'
},
{
'ip': '127.0.0.3',
'hostname': 'qwe',
'state': 'open',
'scan_id': '3'
},
{
'ip': '127.0.0.2',
'hostname': 'xxx',
'state': 'up',
'scan_id': '3'
},
{
'ip': '10.0.0.1',
'hostname': 'ddd',
'state': 'open',
'scan_id': '3'
},
{
'ip': '100.0.0.1',
'hostname': 'ggg',
'state': 'down',
'scan_id': '3'
},
]
print(search_diffences(list_one, list_two))
Resulting in
[{
'ip': '10.0.0.1',
'hostname': '(3:NOT EXIST) != (2:ddd)',
'state': '(3:NOT EXIST) != (2:open)'
}, {
'ip': '100.0.0.1',
'hostname': 'ggg',
'state': '(3:down) != (2:up)'
}, {
'ip': '127.0.0.1',
'hostname': 'abc',
'state': '(3:closed) != (2:open)'
}, {
'ip': '127.0.0.2',
'hostname': '(3:xxx) != (2:bca)',
'state': '(3:up) != (2:closed)'
}, {
'ip': '127.0.0.3',
'hostname': '(3:NOT EXIST) != (2:qwe)',
'state': '(3:NOT EXIST) != (2:open)'
}, {
'ip': '100.0.0.4',
'hostname': '(NOT EXIST:NOT EXIST) != (2:ddd)',
'state': '(NOT EXIST:NOT EXIST) != (2:closed)'
}]
I am having difficulty creating a function that will produce a family tree in JSON format.
An example of a two parent, two offspring tree can be seen here:
{
"children": [
{
"id": 409,
"name": "Joe Bloggs",
"no_parent": "true"
},
{
"children": [
{
"children": [],
"id": 411,
"name": "Alice Bloggs"
},
{
"children": [],
"id": 412,
"name": "John Bloggs"
}
],
"hidden": "true",
"id": "empty_node_id_9",
"name": "",
"no_parent": "true"
},
{
"children": [],
"id": 410,
"name": "Sarah Smith",
"no_parent": "true"
}
],
"hidden": "true",
"id": "year0",
"name": ""
}
Joe Bloggs is married to Sarah Smith, with children Alice Bloggs and John Bloggs. The empty nodes exist purely to handle vertices in the tree-map diagram (see jsfiddle below).
The above example should help explain the syntax. A more complex tree can be found on this jsfiddle: http://jsfiddle.net/cyril123/0vbtvoon/22/
The JSON associated with the jsfiddle can be found from lines 34 to lines 101.
I am having difficulty writing a function that recursively produces the JSON for a family tree. I begin with a Person instance that represents the oldest member of the family. The function then checks for marriages, children, etc., and continues until the tree is complete, returning the JSON.
My code involves a Person class as well as an associated Marriage class. I have appropriate methods such as ids for each person, a get_marriage() function, get_children() methods, etc. I am wondering what the best way to go about this is.
My attempt at a recursive function can be found below. The methods/functions involved are not detailed, but their purpose should be self-explanatory. Many thanks.
def root_nodes(people, first_node=False):  # begin by passing in the oldest family member and first_node=True
    global obj, current_obj, people_used
    if obj is not None: print len(str(obj))
    if type(people) != list:
        people = [people]
    for x in people:
        if x in rootPeople and first_node:  # handles the beginning of the JSON with an empty 'root' starting node.
            first_node = False
            obj = {'name': "", 'id': 'year0', 'hidden': 'true', 'children': root_nodes(people)}
            return obj
        else:
            marriage_info = get_marriage(x)
            if marriage_info is None:  # if person is not married
                current_obj = {'name': x.get_name(), 'id': x.get_id(), 'children': []}
                people_used.append(x)
            else:
                partners = marriage_info.get_members()
                husband, wife = partners[0].get_name(), partners[1].get_name()
                husband_id, wife_id = marriage_info.husband.get_id(), marriage_info.wife.get_id()
                marriage_year = marriage_info.year
                children = marriage_info.get_children()
                people_used.append(partners[0])
                people_used.append(partners[1])
                if partners[0].get_parents() == ['None', 'None'] or partners[1].get_parents() == ['None', 'None']:
                    if partners[0].get_parents() == ['None', 'None'] and partners[1].get_parents() == ['None', 'None']:
                        current_obj = {'name': str(husband), 'id': husband_id, 'no_parent': 'true'}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'no_parent': 'true', 'children': []}
                    if partners[0].get_parents() == ['None', 'None'] and partners[1].get_parents() != ['None', 'None']:
                        current_obj = {'name': str(husband), 'id': husband_id, 'no_parent': 'true'}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'children': []}
                    if partners[0].get_parents() != ['None', 'None'] and partners[1].get_parents() == ['None', 'None']:
                        current_obj = {'name': str(husband), 'id': husband_id}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'no_parent': 'true', 'children': []}
                else:
                    if not any((True for x in partners[0].get_parents() if x in people_used)):
                        current_obj = {'name': str(husband), 'id': husband_id, 'no_parent' : 'true'}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'children': []}
                    elif not any((True for x in partners[1].get_parents() if x in people_used)):
                        current_obj = {'name': str(husband), 'id': husband_id}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'no_parent': 'true', 'children': []}
                    else:
                        current_obj = {'name': str(husband), 'id': husband_id}, {'name': '', 'id': 'empty_node_id_' + empty_node(), 'no_parent': 'true', 'hidden': 'true', 'children': root_nodes(children)}, {'name': str(wife), 'id': wife_id, 'children': []}
                return current_obj
        if obj is None:
            obj = current_obj
        else:
            obj = obj, current_obj
        if people.index(x) == len(people)-1:
            return obj
Even though the function above is badly written, it is almost successful. The only instance where it fails is when one child is married; then the other children are missed out of the JSON. This is because the result is returned from inside the for loop, without going on to the next iteration. Any suggestions on how to fix this would be appreciated.
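On the early-return bug specifically, a hedged sketch of the usual fix: accumulate one node per person in a list and return only after the loop finishes. node_for below is a hypothetical stand-in for the marriage/children logic in root_nodes:

```python
def build_nodes(people, node_for):
    # Collect every sibling's node; do not return from inside the loop.
    nodes = []
    for person in people:
        nodes.append(node_for(person))  # recurse into the subtree here
    return nodes  # every sibling survives, married or not

# Toy demo with a stand-in node builder: both children are kept.
people = ['Alice Bloggs', 'John Bloggs']
nodes = build_nodes(people, lambda name: {'name': name, 'children': []})
assert [n['name'] for n in nodes] == ['Alice Bloggs', 'John Bloggs']
```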