Search for dictionary value and output "address" in python - python

I have a list of dictionaries, with a list value (of dictionaries), etc etc.
filesystem = [
{
"type":"dir",
"name":"examples",
"ext":"",
"contents":[
{
"type":"file",
"name":"text_document",
"ext":"txt",
"contents":"This is a text document.\nIt has 2 lines!"
}
]
},
{
"type":"file",
"name":"helloworld",
"ext":"py",
"contents":"print(\"Hello world\")"
}
]
I need a way to search for a dictionary. For example, I want to get the examples folder. I want to write a path: /examples and search for the directory the path directs to. This needs to work with nested directories as well.
I have tried to match to a dictionary using wildcards:
target = {
"type":"dir",
"name":currentSearchDir,
"ext":"",
"contents":*
}
if currentSearch == target:
print("found")
but, of course, it doesn't work.
Thanks.

Here is a recursive search:
filesystem = [
{
"type":"dir",
"name":"examples",
"ext":"",
"contents":[
{
"type":"file",
"name":"text_document",
"ext":"txt",
"contents":"This is a text document.\nIt has 2 lines!"
}
]
},
{
"type":"file",
"name":"helloworld",
"ext":"py",
"contents":"print(\"Hello world\")"
}
]
def search(data, name):
for entry in data:
if entry['name'] == name:
return entry
if isinstance( entry['contents'], list ):
sub = search( entry['contents'], name )
if sub:
return sub
return None
print( search( filesystem, "examples" ) )
print( search( filesystem, "text_document" ) )
Output:
{'type': 'dir', 'name': 'examples', 'ext': '', 'contents': [{'type': 'file', 'name': 'text_document', 'ext': 'txt', 'contents': 'This is a text document.\nIt has 2 lines!'}]}
{'type': 'file', 'name': 'text_document', 'ext': 'txt', 'contents': 'This is a text document.\nIt has 2 lines!'}

Related

Comparing dictionary of list of dictionary/nested dictionary

There are two dict main and input, I want to validate the "input" such that all the keys in the list of dictionary and nested dictionary (if present/all keys are optional) matches that of the main if not the wrong/different key should be returned as the output.
main = "app":[{
"name": str,
"info": [
{
"role": str,
"scope": {"groups": list}
}
]
},{
"name": str,
"info": [
{"role": str}
]
}]
input_data = "app":[{
'name': 'nms',
'info': [
{
'role': 'user',
'scope': {'groups': ['xyz']
}
}]
},{
'name': 'abc',
'info': [
{'rol': 'user'}
]
}]
when compared input with main the wrong/different key should be given as output, in this case
['rol']
The schema module does exactly this.
You can catch SchemaUnexpectedTypeError to see which data doesn't match your pattern.
Also, make sure you don't use the word input as a variable name, as it's the name of a built-in function.
keys = []
def print_dict(d):
if type(d) == dict:
for val in d.keys():
df = d[val]
try:
if type(df) == list:
for i in range(0,len(df)):
if type(df[i]) == dict:
print_dict(df[i])
except AttributeError:
pass
keys.append(val)
else:
try:
x = d[0]
if type(x) == dict:
print_dict(d[0])
except:
pass
return keys
keys_input = print_dict(input)
keys = []
keys_main = print_dict(main)
print(keys_input)
print(keys_main)
for i in keys_input[:]:
if i in keys_main:
keys_input.remove(i)
print(keys_input)
This has worked for me. you can check above code snippet and if any changes provide more information so any chances if required.
Dictionary and lists compare theire content nested by default.
input_data == main should result in the right output if you format your dicts correctly. Try adding curly brackets "{"/"}" arround your dicts. It should probably look like something like this:
main = {"app": [{
"name": str,
"info": [
{
"role": str,
"scope": {"groups": list}
}
]
},{
"name": str,
"info": [
{"role": str}
]
}]}
input_data = {"app":[{
'name': 'nms',
'info': [
{
'role': 'user',
'scope': {'groups': ['xyz']
}
}]
},{
'name': 'abc',
'info': [
{'rol': 'user'}
]
}]}
input_data2 = {"app": [{
'name': 'nms',
'info': [
{
'role': 'user',
'scope': {'groups': ['xyz']
}
}]
}, {
'name': 'abc',
'info': [
{'rol': 'user'}
]
}]}
Comparision results should look like this:
input_data2 == input_data # True
main == input_data # False

How to build a nested dictionary of varying depth using for loop?

Given a Pandas table of thousands of rows, where the left most spaces of a row determine if it's a sub structure of the above row or not.
Parameter | Value
'country' 'Germany'
' city' 'Berlin'
' area' 'A1'
' city' 'Munchen'
' comment' 'a comment'
'country' 'France'
' city' 'Paris'
' comment' 'a comment'
'state' 'California'
' comment' '123'
Where I have information about if a parameter is a list or not.
{
'country': list,
'city': list
'state': list
}
I would want to create the following nested structure
{
"country": [
{
"Germany": {
"city": [
{
"Berlin": {
"area": "A1"
}
},
{
"Munchen": {
"comment": "a comment"
}
}
]
}
},
{
"France": {
"city": [
{
"Paris": {
"comment": "a comment"
}
}
]
}
}
],
"state": [
{
"California": {
"comment": 123
}
}
]
}
Since the knowledge about what level the sub structure depends on only the row before, I thought that a for loop would be good. But I am clearly missing something fundamental about creating nested dictionaries using for loops. It could be a recursive solution as well, but I am unsure if it would be easier here.
This is my current attempt which is obviously a mess.
import pandas as pd
params = ['country',' city',' area',' city',' comment','country',' city',' comment','state',' comment']
vals = ['Germany','Berlin','A1','Munich','acomment','France','Paris','acomment','California','123']
conf = {'country':'list','city':'list'}
df = pd.DataFrame()
df['param'] = params
df['vals']= vals
output_dict = dict()
level_path = dict()
for param,vals in df.values:
d = output_dict
hiearchy_level = sum( 1 for _ in itertools.takewhile(str.isspace,param)) ## Count number of left most spaces
param = param.lstrip()
if hiearchy_level > 0:
base_path = level_path[str(hiearchy_level-1)]
else:
base_path = []
path = base_path + [param]
for p in path:
if p in conf: ## It should be a list
d.setdefault(p,[{}])
d = d[p][-1] ## How to understand if I should push a new list element or enter an existing one?
else:
d.setdefault(p,{})
d = d[p]
d[param] = vals
level_path[str(hiearchy_level)] = path
and the output being
{'country': [{'country': 'France',
'city': [{'city': 'Paris',
'area': {'area': 'A1'},
'comment': {'comment': 'a comment'}}]}],
'state': {'state': 'California', 'comment': {'comment': '123'}}}
I don't understand how I should be able to step in and out of the list elements in the for loop, knowing if I should push a new dictionary or enter an already existing one.
Any input on what I am missing would be appreciated.

Custom sorting a Python list with nested dictionaries

I'm trying to sort a list of dictionaries and lists in Python that represents a file structure. I am aiming to have the list sorted so that all folders (dictionaries with a list inside of it) appear first in alphabetical order. I've taken a stab at sorting but run into a KeyError. Does anyone have a recommended solution?
Here is what I currently have:
[
{
'file_name': 'abc.txt',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/abc.txt'
},
{
'src': [
{
'file_name': 'jump.sql',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/jump.sql'
},
{
'file_name': 'test.txt',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/test.txt'
},
{
'file_name': 'tester.txt',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/tester.txt'
}
]
},
{
'test': [
{
'file_name': 'test.java',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/test.java'
},
{
'file_name': 'testerjunit.cpp',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/testerj.cpp'
}
]
},
{
'file_name': 'test.log',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/test.log'
}
]
And here is what I am looking to have the sorted output look like:
[
{
'src': [
{
'file_name': 'jump.sql',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/jump.sql'
},
{
'file_name': 'test.txt',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/test.txt'
},
{
'file_name': 'tester.txt',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/tester.txt'
}
]
},
{
'test': [
{
'file_name': 'test.java',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/test.java'
},
{
'file_name': 'testerjunit.cpp',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/testerj.cpp'
}
]
},
{
'file_name': 'abc.txt',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/abc.txt'
},
{
'file_name': 'test.log',
'endpoint': '/code/d20cb114-b68c-11ec-b468-a063919f3f30/test.log'
}
]
I attempted to use a lambda function to sort by the key file_name, but that gives me a KeyError as the key is not directly in every dict.
res.sort(key=lambda e: e['file_name'], reverse=True)
Where res is the list object.
Anyone know of a better way to go about doing this?
TIA!
You could do the following:
folders, files = [], []
for obj in res:
if len(obj) == 1:
folders.append(obj)
else:
files.append(obj)
folders.sort(key=lambda e: next(iter(e.keys())))
files.sort(key=lambda e: e['file_name'])
res = folders + files
This code simply separates the objects into two separate lists (where it assumes that every entry that represents a folder is an object of length one), then sorts both lists separately and finally concatenates them. The folders list is sorted based on the keys (folder names) of the single entries in the dictionaries (folder objects). Note that this approach also sorts the files that are not in folders, which could easily be avoided by removing the line files.sort(key=lambda e: e['file_name'])). Also note that this does not sort the files within folders, which could be achieved by adding the following code:
for folder in folders:
folder_name, file_names = next(iter(folder.items()))
folder[folder_name] = sorted(file_names, key=lambda e: e['file_name'])
Edit: The following function puts all this together and also allows arbitrary nesting levels:
def sort_objects(objects):
folders = list(filter(lambda o: len(o) == 1, objects))
files = list(filter(lambda o: len(o) != 1, objects))
for folder in folders:
name, inner_objects = next(iter(folder.items()))
folder[name] = sort_objects(inner_objects)
sorted_folders = sorted(folders, key=lambda e: next(iter(e.keys())))
sorted_files = sorted(files, key=lambda e: e['file_name'])
return sorted_folders + sorted_files
res = sort_objects(res)

How to parse text output separated with space

First lines are field names. Others are values but if no corresponding data, values are filled with spaces.
In particular, bindings has no values in SHORTNAMES and APIGROUP.
pods has no value in APIGROUP
$ kubectl api-resources
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
pods po true Pod
deployments deploy apps true Deployment
Finally, I would like to treat output data as python dict, which key is field name.
First of all, it seems to replace spaced no value with the dummy value by regex.
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings no-value no-value true Binding
Is it possibe?
Here is a solution with regex.
import re
data = """NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
pods po true Pod
deployments deploy apps true Deployment"""
regex = re.compile(
"(?P<name>\S+)\s+"
"(?P<shortname>\S+)\s+"
"(?P<group>\S+)\s+"
"(?P<namespace>\S+)\s+"
"(?P<kind>\S+)"
)
header = data.splitlines()[0]
for match in regex.finditer(header):
name_index = match.start('name')
shortname_index = match.start('shortname')
group_index = match.start('group')
namespace_index = match.start('namespace')
kind_index = match.start('kind')
def get_text(line, index):
result = ''
for char in line[index:]:
if char == ' ':
break
result += char
if result:
return result
else:
return "no-value"
resources = []
for line in data.splitlines()[1:]:
resources.append({
"name" : get_text(line, name_index),
"shortname": get_text(line, shortname_index),
"group": get_text(line, group_index),
"namespace": get_text(line, namespace_index),
"kind": get_text(line, kind_index)
})
print(resources)
And the output is(formatted):
[
{
'name': 'bindings',
'shortname': 'no-value',
'group': 'no-value',
'namespace': 'true',
'kind': 'Binding'
},
{
'name': 'pods',
'shortname': 'po',
'group': 'no-value',
'namespace': 'true',
'kind': 'Pod'
},
{
'name': 'deployments',
'shortname': 'deploy',
'group': 'apps',
'namespace': 'true',
'kind': 'Deployment'
}
]

Google DLP: "ValueError: Protocol message Value has no "stringValue" field."

I have a method where I build a table for multiple items for Google's DLP inspect API which can take either a ContentItem, or a table of values
Here is how the request is constructed:
def redact_text(text_list):
dlp = google.cloud.dlp.DlpServiceClient()
project = 'my-project'
parent = dlp.project_path(project)
items = build_item_table(text_list)
info_types = [{'name': 'EMAIL_ADDRESS'}, {'name': 'PHONE_NUMBER'}]
inspect_config = {
'min_likelihood': "LIKELIHOOD_UNSPECIFIED",
'include_quote': True,
'info_types': info_types
}
response = dlp.inspect_content(parent, inspect_config, items)
return response
def build_item_table(text_list):
rows = []
for item in text_list:
row = {"values": [{"stringValue": item}]}
rows.append(row)
table = {"table": {"headers": [{"name": "something"}], "rows": rows}}
return table
When I run this I get back the error ValueError: Protocol message Value has no "stringValue" field. Even though the this example and the docs say otherwise.
Is there something off in how I build the request?
Edit: Here's the output from build_item_table
{
'table':
{
'headers':
[
{'name': 'value'}
],
'rows':
[
{
'values':
[
{
'stringValue': 'My name is Jenny and my number is (555) 867-5309, you can also email me at anemail#gmail.com, another email you can reach me at is email#email.com. '
}
]
},
{
'values':
[
{
'stringValue': 'Jimbob Doe (555) 111-1233, that one place down the road some_email#yahoo.com'
}
]
}
]
}
}
Try string_value .... python uses the field names, not the type name.

Categories