Python parse JSON file - python

{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft": "Warning",
"Microsoft.Hosting.Lifetime": "Information",
"Microsoft.AspNetCore": "Warning",
"System.Net.Http.HttpClient.Default.ClientHandler": "Warning",
"System.Net.Http.HttpClient.Default.LogicalHandler": "Warning"
}
},
"AllowedHosts": "*",
"AutomaticTransferOptions": {
"DateOffsetForDirectoriesInDays": -1,
"DateOffsetForPortfoliosInDays": -3,
"Clause": {
"Item1": "1"
}
},
"Authentication": {
"ApiKeys": [
{
"Key": "AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6",
"OwnerName": "Cron job",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestManager"
}
]
},
{
"Key": "B11D4F27-483A-4234-8EC7-CA121712D5BE",
"OwnerName": "Test admin",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestAdmin"
},
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
},
{
"Key": "EBF98F2E-555E-4E66-9D77-5667E0AA1B54",
"OwnerName": "Test manager",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
}
],
"LDAP": {
"Domain": "domain.local",
"MachineAccountName": "Soft13",
"MachineAccountPassword": "vixuUEY7884*",
"EnableLdapClaimResolution": true
}
},
"Authorization": {
"Permissions": {
"Roles": [
{
"Role": "TestAdmin",
"Permissions": [
"transfers.create",
"bindings.create"
]
},
{
"Role": "TestManager",
"Permissions": [
"transfers.create"
]
}
]
}
}
}
I have JSON above and need to parse it with output like this
Logging__LogLevel__Default
Authentication__ApiKeys__0__Claims__0__Type
It mostly works, but I always get some strings like these in the output:
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Authorization__Permissions__Roles__1
Why does my code print incomplete strings like
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__1
And why doesn't it print every value from
Authorization__Permissions__Roles__0__Permissions__*
and from
Authorization__Permissions__Roles__1__Permissions__*
I have this code in python3:
def checkdepth(sub_key, variable):
    delmt = '__'
    for item in sub_key:
        try:
            if isinstance(sub_key[item], dict):
                sub_variable = variable + delmt + item
                checkdepth(sub_key[item], sub_variable)
        except TypeError:
            continue
        if isinstance(sub_key[item], list):
            sub_variable = variable + delmt + item
            for it in sub_key[item]:
                sub_variable = variable + delmt + item + delmt + str(sub_key[item].index(it))
                checkdepth(it, sub_variable)
            print(sub_variable)
        if isinstance(sub_key[item], int) or isinstance(sub_key[item], str):
            sub_variable = variable + delmt + item
            print(sub_variable)
for key in data:
    if type(data[key]) is str:
        print(key + '=' + str(data[key]))
    else:
        variable = key
        checkdepth(data[key], variable)
I know that the problem is in the block where I process the list data type, but I don't know exactly where the problem is.

Use a recursive generator:
import json

with open('input.json') as f:
    data = json.load(f)

def strkeys(data):
    if isinstance(data, dict):
        for k, v in data.items():
            for item in strkeys(v):
                yield f'{k}__{item}' if item else k
    elif isinstance(data, list):
        for i, v in enumerate(data):
            for item in strkeys(v):
                yield f'{i}__{item}' if item else str(i)
    else:
        yield None  # termination condition, not a list or dict

for s in strkeys(data):
    print(s)
Output:
Logging__LogLevel__Default
Logging__LogLevel__Microsoft
Logging__LogLevel__Microsoft.Hosting.Lifetime
Logging__LogLevel__Microsoft.AspNetCore
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler
AllowedHosts
AutomaticTransferOptions__DateOffsetForDirectoriesInDays
AutomaticTransferOptions__DateOffsetForPortfoliosInDays
AutomaticTransferOptions__Clause__Item1
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__1__Key
Authentication__ApiKeys__1__OwnerName
Authentication__ApiKeys__1__Claims__0__Type
Authentication__ApiKeys__1__Claims__0__Value
Authentication__ApiKeys__1__Claims__1__Type
Authentication__ApiKeys__1__Claims__1__Value
Authentication__ApiKeys__2__Key
Authentication__ApiKeys__2__OwnerName
Authentication__ApiKeys__2__Claims__0__Type
Authentication__ApiKeys__2__Claims__0__Value
Authentication__LDAP__Domain
Authentication__LDAP__MachineAccountName
Authentication__LDAP__MachineAccountPassword
Authentication__LDAP__EnableLdapClaimResolution
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__0
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0

Using flatten_json, this can be converted to a pandas DataFrame, though it's not clear if that's what you want. Once converted, you can use df.iloc[0] to see the value stored under each flattened key.
Note: you need to pass a list, so I just wrapped your JSON above in [].
# https://github.com/amirziai/flatten
import pandas as pd
from flatten_json import flatten

dic = data  # your JSON from above, already parsed into a dict
dic = [dic]  # put it in a list
dic_flattened = (flatten(d, '__') for d in dic)  # add your delimiter
df = pd.DataFrame(dic_flattened)
df.iloc[0]
Logging__LogLevel__Default Information
Logging__LogLevel__Microsoft Warning
Logging__LogLevel__Microsoft.Hosting.Lifetime Information
Logging__LogLevel__Microsoft.AspNetCore Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler Warning
AllowedHosts *
AutomaticTransferOptions__DateOffsetForDirectoriesInDays -1
AutomaticTransferOptions__DateOffsetForPortfoliosInDays -3
AutomaticTransferOptions__Clause__Item1 1
Authentication__ApiKeys__0__Key AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6
Authentication__ApiKeys__0__OwnerName Cron job
Authentication__ApiKeys__0__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__0__Claims__0__Value StressTestManager
Authentication__ApiKeys__1__Key B11D4F27-483A-4234-8EC7-CA121712D5BE
Authentication__ApiKeys__1__OwnerName Test admin
Authentication__ApiKeys__1__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__0__Value StressTestAdmin
Authentication__ApiKeys__1__Claims__1__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__1__Value TestManager
Authentication__ApiKeys__2__Key EBF98F2E-555E-4E66-9D77-5667E0AA1B54
Authentication__ApiKeys__2__OwnerName Test manager
Authentication__ApiKeys__2__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__2__Claims__0__Value TestManager
Authentication__LDAP__Domain domain.local
Authentication__LDAP__MachineAccountName Soft13
Authentication__LDAP__MachineAccountPassword vixuUEY7884*
Authentication__LDAP__EnableLdapClaimResolution true
Authorization__Permissions__Roles__0__Role TestAdmin
Authorization__Permissions__Roles__0__Permissions__0 transfers.create
Authorization__Permissions__Roles__0__Permissions__1 bindings.create
Authorization__Permissions__Roles__1__Role TestManager
Authorization__Permissions__Roles__1__Permissions__0 transfers.create

OK, I looked at your code and it's hard to follow. The purpose of your variable and function names is not easy to understand. That's fine, because everyone has to learn best practices and all the little tips and tricks in Python. So hopefully I can help you out.
You have a recursive-ish function, which is definitely the best way to handle a situation like this. However, your code is part recursive and part not. If you go recursive to solve a problem, you should go recursive all the way.
Also, the only time you should print in a recursive function is for debugging. A recursive function should instead take an object that is passed down through the calls, gets appended to or altered, and is then passed back once the recursion ends.
When you get a problem like this, think about which data you actually need or care about. In this problem we don't care about the values stored in the object, only the keys. So we should write code that doesn't look at a value except to determine its type.
Here is some code I wrote that should do what you want. Note that because the function is purely recursive, the code is small. It also passes a list around, adds to it, and returns it at the end so that we can use it for whatever we need. If you have questions, just comment on this question and I'll answer as best I can.
def convert_to_delimited_keys(obj, parent_key='', delimiter='__', keys_list=None):
    if keys_list is None:
        keys_list = []
    if isinstance(obj, dict):
        for k in obj:
            convert_to_delimited_keys(obj[k], delimiter.join((parent_key, str(k))), delimiter, keys_list)
    elif isinstance(obj, list):
        for i, _ in enumerate(obj):
            convert_to_delimited_keys(obj[i], delimiter.join((parent_key, str(i))), delimiter, keys_list)
    else:
        # Append to list, but remove the leading delimiter due to string.join
        keys_list.append(parent_key[len(delimiter):])
    return keys_list
for item in convert_to_delimited_keys(data):
    print(item)
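If the values are wanted as well (the question prints key=value for top-level strings), the same recursion can yield pairs instead of bare keys. The following is only a sketch under that assumption; flatten_items and the small sample dict are hypothetical names and data, not part of the original code. It uses enumerate for list positions rather than list.index(), which returns the position of the first equal element and so can report the wrong index when a list contains duplicates.

```python
def flatten_items(obj, prefix="", delimiter="__"):
    """Yield (flattened_key, value) pairs for every leaf in nested dicts/lists."""
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)  # list positions become key segments
    else:
        # Leaf value (str, int, bool, None, ...): emit it with the path so far.
        yield prefix, obj
        return
    for key, value in children:
        child_prefix = f"{prefix}{delimiter}{key}" if prefix else str(key)
        yield from flatten_items(value, child_prefix, delimiter)


# Hypothetical sample, shaped like a fragment of the JSON in the question.
sample = {
    "AllowedHosts": "*",
    "Authorization": {
        "Roles": [
            {"Role": "TestAdmin", "Permissions": ["transfers.create", "bindings.create"]}
        ]
    },
}

for key, value in flatten_items(sample):
    print(f"{key}={value}")
```

Because the function is a generator, dict(flatten_items(sample)) gives a flat mapping when one is more convenient than printing.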

Related

python dict recursion returns empty list

I have this dictionary that I am trying to iterate through recursively. When I hit a matching node, I want to return that node, which is a list.
Currently my code keeps returning an empty list. I have stepped through the code and I can see my check condition being hit, but the recursion still returns an empty value. What am I doing wrong here? Thanks.
dictionary data:
{
"apiVersion": "v1",
"kind": "Deployment",
"metadata": {
"name": "cluster",
"namespace": "namespace",
},
"spec": {
"template": {
"metadata": {
"labels": {
"app": "flink",
"cluster": "repo_name-cluster",
"component": "jobmanager",
"track": "prod",
}
},
"spec": {
"containers": [
{
"name": "jobmanager",
"image": "IMAGE_TAG_",
"imagePullPolicy": "Always",
"args": ["jobmanager"],
"resources": {
"requests": {"cpu": "100.0", "memory": "100Gi"},
"limits": {"cpu": "100.0", "memory": "100Gi"},
},
"env": [
{
"name": "ADDRESS",
"value": "jobmanager-prod",
},
{"name": "HADOOP_USER_NAME", "value": "yarn"},
{"name": "JOB_MANAGER_MEMORY", "value": "1000m"},
{"name": "HADOOP_CONF_DIR", "value": "/etc/hadoop/conf"},
{
"name": "TRACK",
"valueFrom": {
"fieldRef": {
"fieldPath": "metadata.labels['track']"
}
},
},
],
}
]
},
},
},
}
code:
test = iterdict(data, "env")
print(test)
def iterdict(data, match):
    output = []
    if not isinstance(data, str):
        for k, v in data.items():
            print("key ", k)
            if isinstance(v, dict):
                iterdict(v, match)
            elif isinstance(v, list):
                if k.lower() == match.lower():
                    # print(v)
                    output += v
                    return output
                else:
                    for i in v:
                        iterdict(i, match)
    return output
expected return value:
[{'name': 'JOB_MANAGER_RPC_ADDRESS', 'value': 'repo_name-cluster-jobmanager-prod'}, {'name': 'HADOOP_USER_NAME', 'value': 'yarn'}, {'name': 'JOB_MANAGER_MEMORY', 'value': '1000m'}, {'name': 'HADOOP_CONF_DIR', 'value': '/etc/hadoop/conf'}, {'name': 'TRACK', 'valueFrom': {...}}]
When you recurse to iterdict, you're simply throwing away the return value. Thus, since every value in the top level of your dictionary is either a string or a dict, you will end up just returning an empty list.
You probably want to append the recursive outputs:
output += iterdict(v, match)
and
output += iterdict(i, match)
However, this is potentially inefficient as you will build a lot of intermediate lists. A better strategy might be to make your function a generator; the name iterdict would suggest this anyway. To do so, get rid of your output variable and the return statements, and use yield instead:
yield from iterdict(v, match)
yield from v
yield from iterdict(i, match)
and then, at the top level, you can just iterate over your results:
for value in iterdict(data, "env"):
...
or, if you really need a list, collect the generator output into a list:
test = list(iterdict(data, "env"))
This will likely be faster (no intermediate lists) and more Pythonic.
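For comparison, the first fix described above (keeping the list-returning function and adding the recursive results to output) could look roughly like the sketch below. It assumes the same iterdict name and matching rule as the question; the isinstance(data, dict) guard and the small sample dict are my own additions for illustration.

```python
def iterdict(data, match):
    """Return the items of the first list found under a key equal to `match`."""
    output = []
    if isinstance(data, dict):  # assumption: anything that is not a dict has no keys to search
        for k, v in data.items():
            if isinstance(v, dict):
                # Keep what the recursive call found instead of discarding it.
                output += iterdict(v, match)
            elif isinstance(v, list):
                if k.lower() == match.lower():
                    output += v
                    return output
                for i in v:
                    output += iterdict(i, match)
    return output


# Hypothetical, much smaller stand-in for the deployment dict in the question.
sample = {
    "spec": {
        "containers": [
            {
                "name": "jobmanager",
                "args": ["jobmanager"],
                "env": [
                    {"name": "HADOOP_USER_NAME", "value": "yarn"},
                    {"name": "JOB_MANAGER_MEMORY", "value": "1000m"},
                ],
            }
        ]
    }
}

print(iterdict(sample, "env"))
```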
You are not adding the results of the recursive calls to the output list.
You can either append those results or use the yield keyword to make the function a generator. Returning lists creates temporary lists, which are memory intensive and hurt performance when the function runs recursively. That's why generators are a good fit here.
def iterdict(data, match):
    if isinstance(data, str):
        return  # strings have no keys to search; ends the generator
    for k, v in data.items():
        if isinstance(v, dict):
            yield from iterdict(v, match)
        elif isinstance(v, list):
            if k.lower() == match.lower():
                yield from v
            for i in v:
                yield from iterdict(i, match)
test = list(iterdict(data, "env"))
print(test)

How to return specific key value pair from nested json without knowing location?

I want to get the value of a specific key in a nested json file, without knowing the exact location. So basically looking through all the keys (and nested keys) until it finds the match, and return a dictionary {match: "value"}
Nested json_data:
{
"$id": "1",
"DataChangedEntry": {
"$id": "2",
"PathProperty": "/",
"Metadata": null,
"PreviousValue": null,
"CurrentValue": {
"CosewicWsRefId": {
"Value": "QkNlrjq2HL9bhTQqU8-qH"
},
"Date": {
"Value": "2022-05-20T00:00:00Z"
},
"YearSentToMinister": {
"Value": "0001-01-01T00:00:00"
},
"DateSentToMinister": {
"Value": "0001-01-01T00:00:00"
},
"Order": null,
"Type": {
"Value": "REGULAR"
},
"ReportType": {
"Value": "NEW"
},
"Stage": {
"Value": "ASSESSED"
},
"State": {
"Value": "PUBLISHED"
},
"StatusAndCriteria": {
"Status": {
"Value": "EXTINCT"
},
"StatusComment": {
"EnglishText": null,
"FrenchText": null
},
"StatusChange": {
"Value": "NOT_INITIALIZED"
},
"StatusCriteria": {
"EnglishText": null,
"FrenchText": null
},
"ApplicabilityOfCriteria": {
"ApplicabilityCriteriaList": []
}
},
"Designation": null,
"Note": null,
"DomainEvents": [],
"Version": {
"Value": 1651756761385.1248
},
"Id": {
"Value": "3z3XlCkaXY9xinAbK5PrU"
},
"CreatedAt": {
"Value": 1651756761384
},
"ModifiedAt": {
"Value": 1651756785274
},
"CreatedBy": {
"Value": "G#a"
},
"ModifiedBy": {
"Value": "G#a"
}
}
},
"EventAction": "Create",
"EventDataChange": {
"$ref": "2"
},
"CorrelationId": "3z3XlCkaXY9xinAbK5PrU",
"EventId": "WGxlewsUAHayLHZ2LHvFk",
"EventTimeUtc": "2022-05-06T13:15:31.7463355Z",
"EventDataVersion": "1.0.0",
"EventType": "AssessmentCreatedInfrastructure"
}
Desired return is the value from json_data["DataChangedEntry"]["CurrentValue"]["Date"]["Value"]:
"2022-05-20T00:00:00Z"
So far I've tried a recursive function, but it keeps returning None:
match_dict = {}
def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            return recursive_json(v,attr,m_dict)
print('RETURN: ',recursive_json(json_data, "Date", match_dict))
Output:
RETURN: None
I tried removing the second return statement, and it now prints the value I want in the function, but still returns None:
match_dict = {}
def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            recursive_json(v,attr,m_dict)
print('RETURN: ',recursive_json(json_data, "Date", match_dict))
Output:
IF: {'Date', '2022-05-20T00:00:00Z'}
RETURN: None
I don't get why it keeps returning None. Is there a better way to return the value I want?
The underlying question is: how can we make multiple recursive calls in a loop, return the recursive result if any of them returns something useful, and fail otherwise?
If we blindly return inside the loop, then only one recursive call can be made. Whatever it returns, gets returned at this level. If it didn't find the useful result, we don't get a useful result.
If we blindly don't return inside the loop, then the values that were returned don't matter. Nothing in the current call makes use of them, so we will finish looping, make all the recursive calls, reach the end of the function... and thus implicitly return None.
The way around this, of course, is to check whether the recursive call returned something useful. If it did, we can return that; otherwise, we keep going. If we reach the end, then we signal that we couldn't find anything useful - that way, if we are being recursively called, the caller can do the right thing.
Assuming that None cannot be a "useful" value, we can naturally use that as the signal. We don't even have to return it explicitly at the end.
After fixing some other typos (we should not overwrite the global built-in dict name, and anyway we don't need to name the dict that we pass in at the start, and the parameter should be m_dict so that it's properly defined when we make the recursive call), we get:
def recursive_json(data, attr, m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ', m_dict)
                return m_dict
        elif isinstance(v,dict):
            result = recursive_json(v, attr, m_dict)
            if result:
                return result
# call it:
recursive_json(json_data, "Date", {})
We can see that the debug trace is printed, and the value is also returned.
Let's improve this a bit:
First off, the inner for k2,v2 in v.items(): loop doesn't make any sense. Again, we can only return once per call, so this would skip any values in the dict after the first. We would be better served just returning v directly. Also, the m_dict parameter doesn't actually help implement the logic; we don't modify it between calls. It doesn't make sense to use a set for our return value, since it's fundamentally unordered; we care about the order here. Finally, we don't need the debug trace any more. That gives us:
def recursive_json(data, attr):
    for k, v in data.items():
        if k == attr:
            return attr, v
        elif isinstance(v,dict):
            result = recursive_json(v, attr)
            if result:
                return result
To get fancier, we can separate the base case from the recursive case, and use more elegant tools for each. To check if any of the keys matches, we can simply check with the in operator. To recurse and return the first fruitful result, the built-in next is useful. We get:
def recursive_json(data, attr):
    if not isinstance(data, dict):
        # reached a leaf, can't search in here.
        return None
    if attr in data:
        return attr, data[attr]
    candidates = (recursive_json(v, attr) for v in data.values())
    try:
        # the first non-None candidate, if any.
        return next(c for c in candidates if c is not None)
    except StopIteration:
        return None # all candidates were None.
It seems like you're trying to write something like this:
from json import loads
from typing import Any
test_json = """
{
"a": {
"b": {
"value": 1
}
},
"b": {
"value": 2
},
"c": {
"b": {
"value": 3
},
"c": {
"value": 4
}
},
"d": {}
}
"""
json_data = loads(test_json)
def find_value(data: dict, attr: str, depth_first: bool=True) -> (bool, Any):
    # assumes data is a dict, with 'value' attributes for the attr to be found
    # returns [whether value was found]: bool, [actual value]: Any
    for k, v in data.items():
        if k == attr and 'value' in v:
            return True, v['value']
        elif depth_first and isinstance(v, dict):
            if (t := find_value(v, attr, depth_first))[0]:
                return t
    if not depth_first:
        for _, v in data.items():
            if isinstance(v, dict) and (t := find_value(v, attr, depth_first))[0]:
                return t
    return False, None
# returns True, 1 - first 'b' with a 'value', depth-first
print(find_value(json_data, 'b'))
# returns True, 2 - first 'b' with a 'value', breadth-first
print(find_value(json_data, 'b', False))
# returns True, 4 - first 'c' with a 'value' - the 'c' at the root level has no 'value'
print(find_value(json_data, 'c'))
# returns False, None - no 'd' with a value
print(find_value(json_data, 'd'))
# returns False, None - no 'e' in data
print(find_value(json_data, 'e'))
Your own function can return None because you don't actually return the value that a recursive call returns, and the default return value of a function is None.
However, your code also doesn't account for the case where there is nothing to be found.
(Note: this solution only works in Python 3.8 or later, due to its use of the walrus operator :=. Of course, it's not that hard to write it without, but that's left as an exercise for the reader.)
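For readers on an older interpreter, one possible way to write the same search without the walrus operator is sketched below. It keeps the find_value name and the (found, value) return convention from the answer; the extra isinstance check before testing for 'value' and the use of typing.Tuple are my own choices, not part of the original.

```python
from typing import Any, Tuple


def find_value(data: dict, attr: str, depth_first: bool = True) -> Tuple[bool, Any]:
    """Return (True, value) for the first `attr` dict holding a 'value' key, else (False, None)."""
    for k, v in data.items():
        if k == attr and isinstance(v, dict) and 'value' in v:
            return True, v['value']
        if depth_first and isinstance(v, dict):
            found = find_value(v, attr, depth_first)
            if found[0]:
                return found
    if not depth_first:
        # Only descend after every key at this level has been checked.
        for v in data.values():
            if isinstance(v, dict):
                found = find_value(v, attr, depth_first)
                if found[0]:
                    return found
    return False, None


# Same shape as the test data used in the answer.
json_data = {
    "a": {"b": {"value": 1}},
    "b": {"value": 2},
    "c": {"b": {"value": 3}, "c": {"value": 4}},
    "d": {},
}

print(find_value(json_data, 'b'))
print(find_value(json_data, 'b', False))
```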

Get value from search in Dict

"data": {
"0": {
"name": "test",
"tag": "123"
},
"1": {
"name": "test123",
"tag": "456"
Let's say I have the example data above and want to get the tag value 456, but only after checking that "name" has the value test123. How should I loop over this dict?
def test():
    response = requests.get(data_above)
    data_dict = json.loads(response.text)
    # need to loop here to find the entry whose "name" is test123 and assign its tag value (456) to a variable; the search should be dynamic
The data structure in the original question is incomplete. Making an assumption about how it really looks then this would work:
mydict = [{"data": {
"0": {
"name": "test",
"tag": "123"
},
"1": {
"name": "test123",
"tag": "456"}
}
}]
for v in mydict[0]['data'].values():
    if v['name'] == 'test123':
        print(v['tag'])
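If the tag should end up in a variable rather than being printed, the same loop can be folded into a small helper built on next() with a default. This is a sketch that assumes the data is a dict of dicts under "data", as guessed above; find_tag and payload are hypothetical names.

```python
def find_tag(payload, searched_name):
    """Return the 'tag' of the first entry whose 'name' matches, or None."""
    return next(
        (
            entry["tag"]
            for entry in payload["data"].values()
            if entry.get("name") == searched_name
        ),
        None,  # default when no entry matches
    )


# Assumed shape of the (incomplete) data from the question.
payload = {
    "data": {
        "0": {"name": "test", "tag": "123"},
        "1": {"name": "test123", "tag": "456"},
    }
}

tag = find_tag(payload, "test123")
print(tag)
```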
searched_name = "test123"
for v in myDict["data"].values():
    if v["name"] == searched_name:
        tag = v["tag"]
Expected outcome:
the tag variable will now hold the value 456.
This is working for me. I might have posted my question badly, but thanks to everyone who tried to help me. Some of you certainly gave me ideas for how to loop over it.
You can use ["the item name"] to access an item,
like:
mydict = {"data": {
    "0": {
        "name": "test",
        "tag": "123"
    },
    "1": {
        "name": "test123",
        "tag": "456"
    }
}}
# searching
for val in mydict:
    # scan each level
    if val == "test":
        print("i found")
    else:
        for i in mydict[val]:
            if i == "test":
                print("i found")
            else:
                for item in mydict[val][i]:
                    res = mydict[val][i][item]
                    if res == "test":
                        print("I found it in final step!")

Safe get when parent is null in dictionary

I am looking for a way to safe get a value from a nested dictionary.
.get() returns None if the key is not present in a dictionary, but if a value is itself None, then None.get("value_2") throws an error.
Sample Dictionary:
[
{
"value": {
"value_2": "string"
}
},
{
"value": null
}
]
When iterating through the array, call the current element a. For the 0th element, a.get("value").get("value_2") gives string as output, but for the second element a.get("value").get("value_2") raises an error. There needs to be a check that value is not None before getting value_2.
Is there any way to skip the if check and make Python return None? If the dictionary is nested more than one level deep, I will have to check for None at multiple levels.
I would suggest implementing a function like the one below:
vals = [
{
"value": {
"value_2": "string"
}
},
{
"value": None
}
]
def get_from_dict(dict_, path):
    path = path.split("/")[::-1]
    dict_ = dict_.get(path.pop())
    while dict_ is not None and len(path) > 0:
        dict_ = dict_.get(path.pop())
    return dict_
for a in vals:
    print(get_from_dict(a, "value/value_2"))
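A variation on the same idea, offered only as a sketch: pass the keys as separate arguments instead of a "/"-separated string, so keys that contain "/" still work, and stop as soon as the current object is not a dict. safe_get is a hypothetical name, and treating a stored None the same as a missing key is an assumption that matches the question's wish to get None back.

```python
def safe_get(obj, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` if the path breaks."""
    for key in keys:
        if not isinstance(obj, dict):
            # Parent is None (or some other non-dict), so the path cannot continue.
            return default
        obj = obj.get(key)
    return default if obj is None else obj


vals = [
    {"value": {"value_2": "string"}},
    {"value": None},
]

for a in vals:
    print(safe_get(a, "value", "value_2"))
```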

Python Object Iteration

Can someone provide an example of how to loop through this object in python and pull out 'value' where api = 'interesting' and arguments.name = 'FileName'?
Here is what I have so far.
This object has many more processes and calls....output has been omitted.
edit: I should mention that I am getting the following error when running this code:
"TypeError: list indices must be integers, not str"
for k, v in object['behavior']['processes']['calls'].items():
    if v['api'] == "interesting":
        <loop through arguments next>
Object:
{"behavior": {
"processes": [
{
"parent_id": "312",
"process_name": "test.exe",
"process_id": "1184",
"first_seen": "2013-03-02 17:22:48,359",
"calls": [
{
"category": "filesystem",
"status": "FAILURE",
"return": "0xc000003a",
"timestamp": "2013-03-02 17:22:48,519",
"thread_id": "364",
"repeated": 0,
"api": "interesting",
"arguments": [
{
"name": "FileHandle",
"value": "0x00000000"
},
{
"name": "DesiredAccess",
"value": "0x80100080"
},
{
"name": "FileName",
"value": "c:\\cgvi5r6i\\vgdgfd.72g"
}, ...
What you've given as a starter in the question won't work because you are not iterating through the elements of the lists stored under the keys "processes" and "calls". That is, you will need something more like
for proc in object['behavior']['processes']:
    for call in proc['calls']:
        if call['api'] == "interesting":
            fname = None
            for arg in call['arguments']:
                if arg['name'] == "FileName":
                    fname = arg['value']
Then the file name you're looking for will be in fname. This has no error checking, since I don't know where your data has come from.
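Since the real object has many processes and calls, it may be more useful to collect every match than to keep only the last one in fname. The sketch below assumes the structure shown in the question; find_argument_values and the cut-down report dict are hypothetical, and the .get() calls are there only so that entries without "calls" or "arguments" are skipped rather than raising.

```python
def find_argument_values(report, api_name, arg_name):
    """Yield the 'value' of each argument named `arg_name` in calls whose 'api' is `api_name`."""
    for proc in report["behavior"]["processes"]:      # "processes" is a list
        for call in proc.get("calls", []):            # "calls" is a list too
            if call.get("api") != api_name:
                continue
            for arg in call.get("arguments", []):
                if arg.get("name") == arg_name:
                    yield arg["value"]


# Cut-down stand-in for the object in the question.
report = {
    "behavior": {
        "processes": [
            {
                "process_name": "test.exe",
                "calls": [
                    {
                        "api": "interesting",
                        "arguments": [
                            {"name": "FileHandle", "value": "0x00000000"},
                            {"name": "FileName", "value": "c:\\cgvi5r6i\\vgdgfd.72g"},
                        ],
                    }
                ],
            }
        ]
    }
}

print(list(find_argument_values(report, "interesting", "FileName")))
```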
What you're doing seems OK,
but
Your indexes are off (look carefully, there are lists).
Your check seems to be invalid (v is a string, so v['api'] is invalid).
So, try doing this instead (I've taken your object as i):
for k, v in i['behavior']['processes'][0]['calls'][0].items():
    if k == 'api' and v == "interesting":
        print(k, v)
OR
for dct in i['behavior']['processes'][0]['calls']:
    if dct['api'] == "interesting":
        print('api', dct['api'])
OR
for dct in i['behavior']['processes'][0]['calls']:
    for k, v in dct.items():
        if k == 'api' and v == "interesting":
            print('api', dct['api'])
OR, if there are multiple parts to each list,
for proc in i['behavior']['processes']:
    for call in proc['calls']:
        print('api =>', call['api'])  # an if here
        for args in call['arguments']:
            print(' argument.name =>', args['name'])  # and another if here should do the trick.
Why you get the error
Try the following piece of code, and you'll understand what you were doing wrong:
print(type(i['behavior']['processes']))
print(type(i['behavior']['processes'][0]))
print(type(i['behavior']['processes'][0]['calls']))
print(type(i['behavior']['processes'][0]['calls'][0]))
In object['behavior']['processes']['calls'], the part object['behavior']['processes'] is a list, not a dict, so it cannot be indexed with ['calls'].
