I have a JSON object like so, and I need to extract the name value of any object, using the id. I have tried many different iterations of this but I can't seem to get anything to work. Any general pointers would be much appreciated. Thank you.
{
"weeks":[
{
"1":[
{
"name":"Stackoverflow Question",
"description":"Have you ever asked a question on StackoverFlow?",
"date":"11/25/2019",
"id":"whewhewhkahfasdjkhgjks"
},
{
"name":"I Can't Believe It's Not Butter!",
"description":"Can you believe it? I sure can't.",
"date":"11/25/2019",
"id":"agfasdgasdgasdgawe"
}
]
},
{
"2":[
{
"name":"Hello World",
"description":"A hello world.",
"date":"12/02/2019",
"id":"aewgasdgewa"
},
{
"name":"Testing 123",
"description":"Sometimes people don't say it be like it is but it do.",
"date":"12/04/2019",
"id":"asdgasdgasdgasd"
}
]
}
]
}
If you need to find the name based on the id, try the code below:
def get_name(data, id):
    for week in data['weeks']:
        for i in week:
            for j in week[i]:
                if j['id'] == id:
                    return j['name']
    return None

get_name(data, 'asdgasdgasdgasd')
output
'Testing 123'
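If many lookups by id are needed, scanning the whole structure each time may be wasteful. A minimal sketch, assuming the same weeks -> week number -> entries layout as the question, builds a dict from id to name once. The names build_name_index and sample are made up for illustration, and the sample is a trimmed copy of the question's data.

```python
def build_name_index(data):
    """Map every entry's id to its name (assumes the weeks -> week number -> entries layout)."""
    index = {}
    for week in data["weeks"]:
        for entries in week.values():
            for entry in entries:
                index[entry["id"]] = entry["name"]
    return index


# Trimmed-down copy of the question's data, for illustration only.
sample = {
    "weeks": [
        {"1": [{"name": "Stackoverflow Question", "id": "whewhewhkahfasdjkhgjks"}]},
        {"2": [{"name": "Hello World", "id": "aewgasdgewa"},
               {"name": "Testing 123", "id": "asdgasdgasdgasd"}]},
    ]
}

name_by_id = build_name_index(sample)
print(name_by_id.get("asdgasdgasdgasd"))  # Testing 123
print(name_by_id.get("missing"))          # None
```

Using .get means an unknown id yields None instead of raising KeyError, which matches the behaviour of get_name above.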
Not sure if this is what you are looking for, but this prints every name:
for week in a["weeks"]:
    for entries in week.values():
        for entry in entries:
            print(entry['name'])
Here the variable a is your dict.
Is the structure fixed, or can the depth of the JSON differ from the example?
This one also works if there are more or fewer levels of nesting.
It searches every dictionary inside a JSON-like structure for a field_name whose value equals matching_value and returns that dictionary's value for output_name.
Maybe it helps you when your data structure changes :)
data = {
"weeks":[
{
"1":[
{
"name":"Stackoverflow Question",
"description":"Have you ever asked a question on StackoverFlow?",
"date":"11/25/2019",
"id":"whewhewhkahfasdjkhgjks"
},
{
"name":"I Can't Believe It's Not Butter!",
"description":"Can you believe it? I sure can't.",
"date":"11/25/2019",
"id":"agfasdgasdgasdgawe"
}
]
},
{
"2":[
{
"name":"Hello World",
"description":"A hello world.",
"date":"12/02/2019",
"id":"aewgasdgewa"
},
{
"name":"Testing 123",
"description":"Sometimes people don't say it be like it is but it do.",
"date":"12/04/2019",
"id":"asdgasdgasdgasd"
}
]
}
]
}
def extract_name(data, field_name: str, matching_value: str, output_name: str):
    """
    :param data: json-like datastructure in which you want to search
    :param field_name: the field name with which you want to match
    :param matching_value: the value you want to match
    :param output_name: the name of the value which you want to get
    :return: the output_name value of the first matching dict, or None if there is no match
    """
    if isinstance(data, list):
        for item in data:
            res = _inner_extract_name(item, field_name, matching_value, output_name)
            if res is not None:
                return res
    elif isinstance(data, dict):
        for item in data.values():
            res = _inner_extract_name(item, field_name, matching_value, output_name)
            if res is not None:
                return res


def _inner_extract_name(item, field_name, matching_value, output_name):
    if isinstance(item, dict):
        # search deeper first, then check this dict itself
        res = extract_name(item, field_name, matching_value, output_name)
        if field_name in item:
            if item[field_name] == matching_value:
                if output_name in item:
                    return item[output_name]
    else:
        res = extract_name(item, field_name, matching_value, output_name)
    return res


if __name__ == "__main__":
    name = extract_name(data, "id", "aewgasdgewa", "name")
    print(name)
I want to get the value of a specific key in a nested json file, without knowing the exact location. So basically looking through all the keys (and nested keys) until it finds the match, and return a dictionary {match: "value"}
Nested json_data:
{
"$id": "1",
"DataChangedEntry": {
"$id": "2",
"PathProperty": "/",
"Metadata": null,
"PreviousValue": null,
"CurrentValue": {
"CosewicWsRefId": {
"Value": "QkNlrjq2HL9bhTQqU8-qH"
},
"Date": {
"Value": "2022-05-20T00:00:00Z"
},
"YearSentToMinister": {
"Value": "0001-01-01T00:00:00"
},
"DateSentToMinister": {
"Value": "0001-01-01T00:00:00"
},
"Order": null,
"Type": {
"Value": "REGULAR"
},
"ReportType": {
"Value": "NEW"
},
"Stage": {
"Value": "ASSESSED"
},
"State": {
"Value": "PUBLISHED"
},
"StatusAndCriteria": {
"Status": {
"Value": "EXTINCT"
},
"StatusComment": {
"EnglishText": null,
"FrenchText": null
},
"StatusChange": {
"Value": "NOT_INITIALIZED"
},
"StatusCriteria": {
"EnglishText": null,
"FrenchText": null
},
"ApplicabilityOfCriteria": {
"ApplicabilityCriteriaList": []
}
},
"Designation": null,
"Note": null,
"DomainEvents": [],
"Version": {
"Value": 1651756761385.1248
},
"Id": {
"Value": "3z3XlCkaXY9xinAbK5PrU"
},
"CreatedAt": {
"Value": 1651756761384
},
"ModifiedAt": {
"Value": 1651756785274
},
"CreatedBy": {
"Value": "G#a"
},
"ModifiedBy": {
"Value": "G#a"
}
}
},
"EventAction": "Create",
"EventDataChange": {
"$ref": "2"
},
"CorrelationId": "3z3XlCkaXY9xinAbK5PrU",
"EventId": "WGxlewsUAHayLHZ2LHvFk",
"EventTimeUtc": "2022-05-06T13:15:31.7463355Z",
"EventDataVersion": "1.0.0",
"EventType": "AssessmentCreatedInfrastructure"
}
Desired return is the value from json_data["DataChangedEntry"]["CurrentValue"]["Date"]["Value"]:
"2022-05-20T00:00:00Z"
So far I've tried a recursive function, but it keeps returning None:
match_dict = {}

def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            return recursive_json(v,attr,m_dict)

print('RETURN: ',recursive_json(json_data, "Date", match_dict))
Output:
RETURN: None
I tried removing the second return statement, and it now prints the value I want in the function, but still returns None:
match_dict = {}

def recursive_json(data,attr,m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ',m_dict)
                return m_dict
        elif isinstance(v,dict):
            recursive_json(v,attr,m_dict)

print('RETURN: ',recursive_json(json_data, "Date", match_dict))
Output:
IF: {'Date', '2022-05-20T00:00:00Z'}
RETURN: None
I don't get why it keeps returning None. Is there a better way to return the value I want?
The underlying question is: how can we make multiple recursive calls in a loop, return the recursive result if any of them returns something useful, and fail otherwise?
If we blindly return inside the loop, then only one recursive call can be made. Whatever it returns, gets returned at this level. If it didn't find the useful result, we don't get a useful result.
If we blindly don't return inside the loop, then the values that were returned don't matter. Nothing in the current call makes use of them, so we will finish looping, make all the recursive calls, reach the end of the function... and thus implicitly return None.
The way around this, of course, is to check whether the recursive call returned something useful. If it did, we can return that; otherwise, we keep going. If we reach the end, then we signal that we couldn't find anything useful - that way, if we are being recursively called, the caller can do the right thing.
Assuming that None cannot be a "useful" value, we can naturally use that as the signal. We don't even have to return it explicitly at the end.
After fixing some other typos (we should not overwrite the global built-in dict name, and anyway we don't need to name the dict that we pass in at the start, and the parameter should be m_dict so that it's properly defined when we make the recursive call), we get:
def recursive_json(data, attr, m_dict):
    for k,v in data.items():
        if k == attr:
            for k2,v2 in v.items():
                m_dict = {attr, v2}
                print('IF: ', m_dict)
                return m_dict
        elif isinstance(v,dict):
            result = recursive_json(v, attr, m_dict)
            if result:
                return result

# call it:
recursive_json(json_data, "Date", {})
We can see that the debug trace is printed, and the value is also returned.
Let's improve this a bit:
First off, the inner for k2,v2 in v.items(): loop doesn't make any sense. Again, we can only return once per call, so this would skip any values in the dict after the first. We would be better served just returning v directly. Also, the m_dict parameter doesn't actually help implement the logic; we don't modify it between calls. It doesn't make sense to use a set for our return value, since it's fundamentally unordered; we care about the order here. Finally, we don't need the debug trace any more. That gives us:
def recursive_json(data, attr):
    for k, v in data.items():
        if k == attr:
            return attr, v
        elif isinstance(v,dict):
            result = recursive_json(v, attr)
            if result:
                return result
To get fancier, we can separate the base case from the recursive case, and use more elegant tools for each. To check if any of the keys matches, we can simply check with the in operator. To recurse and return the first fruitful result, the built-in next is useful. We get:
def recursive_json(data, attr):
    if not isinstance(data, dict):
        # reached a leaf, can't search in here.
        return None
    if attr in data:
        # k is not defined in this version; use attr directly.
        return attr, data[attr]
    candidates = (recursive_json(v, attr) for v in data.values())
    try:
        # the first non-None candidate, if any.
        return next(c for c in candidates if c is not None)
    except StopIteration:
        return None # all candidates were None.
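As a quick check of the base-case/recursive-case split described above, here is a small self-contained sketch. The name find_first and the sample dict (a trimmed-down version of the question's data) are made up for illustration, and it uses the default argument of next instead of the try/except, which is a variation rather than the code shown in this answer.

```python
def find_first(data, attr):
    """Return (attr, value) for the first key named attr, searching nested dicts depth-first."""
    if not isinstance(data, dict):
        return None  # leaf: nothing to search in here
    if attr in data:
        return attr, data[attr]
    candidates = (find_first(v, attr) for v in data.values())
    # next() with a default returns None when every candidate was None
    return next((c for c in candidates if c is not None), None)


# Trimmed-down stand-in for the question's json_data.
sample = {
    "$id": "1",
    "DataChangedEntry": {
        "CurrentValue": {
            "CosewicWsRefId": {"Value": "QkNlrjq2HL9bhTQqU8-qH"},
            "Date": {"Value": "2022-05-20T00:00:00Z"},
        }
    },
    "EventAction": "Create",
}

print(find_first(sample, "Date"))     # ('Date', {'Value': '2022-05-20T00:00:00Z'})
print(find_first(sample, "Missing"))  # None
```

Because the generator is lazy, the search stops at the first fruitful branch instead of visiting the whole structure.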
It seems like you're trying to write something like this:
from json import loads
from typing import Any
test_json = """
{
"a": {
"b": {
"value": 1
}
},
"b": {
"value": 2
},
"c": {
"b": {
"value": 3
},
"c": {
"value": 4
}
},
"d": {}
}
"""
json_data = loads(test_json)
def find_value(data: dict, attr: str, depth_first: bool=True) -> (bool, Any):
    # assumes data is a dict, with 'value' attributes for the attr to be found
    # returns [whether value was found]: bool, [actual value]: Any
    for k, v in data.items():
        if k == attr and 'value' in v:
            return True, v['value']
        elif depth_first and isinstance(v, dict):
            if (t := find_value(v, attr, depth_first))[0]:
                return t
    if not depth_first:
        for _, v in data.items():
            if isinstance(v, dict) and (t := find_value(v, attr, depth_first))[0]:
                return t
    return False, None

# returns True, 1 - first 'b' with a 'value', depth-first
print(find_value(json_data, 'b'))
# returns True, 2 - first 'b' with a 'value', breadth-first
print(find_value(json_data, 'b', False))
# returns True, 4 - first 'c' with a 'value' - the 'c' at the root level has no 'value'
print(find_value(json_data, 'c'))
# returns False, None - no 'd' with a value
print(find_value(json_data, 'd'))
# returns False, None - no 'e' in data
print(find_value(json_data, 'e'))
Your own function returns None because you don't actually return the value that a recursive call returns, and the default return value of a function is None.
However, your code also doesn't account for the case where there is nothing to be found.
(Note: this solution only works in Python 3.8 or later, due to its use of the walrus operator :=. Of course it's not that hard to write it without, but that's left as an exercise for the reader.)
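For readers on an older Python, one possible walrus-free variant might look like the sketch below. The name find_value_no_walrus and the sample dict are made up for illustration. It also adds an isinstance guard before the 'value' check, which is an assumption on my part so that non-dict values are skipped rather than tested with in.

```python
from typing import Any, Tuple


def find_value_no_walrus(data: dict, attr: str, depth_first: bool = True) -> Tuple[bool, Any]:
    """Same idea as find_value, with the assignment expressions unrolled."""
    for k, v in data.items():
        if k == attr and isinstance(v, dict) and 'value' in v:
            return True, v['value']
        if depth_first and isinstance(v, dict):
            found, value = find_value_no_walrus(v, attr, depth_first)
            if found:
                return True, value
    if not depth_first:
        # breadth-first: only descend after every key at this level was checked
        for v in data.values():
            if isinstance(v, dict):
                found, value = find_value_no_walrus(v, attr, depth_first)
                if found:
                    return True, value
    return False, None


sample = {
    "a": {"b": {"value": 1}},
    "b": {"value": 2},
    "c": {"b": {"value": 3}, "c": {"value": 4}},
    "d": {},
}

print(find_value_no_walrus(sample, 'b'))         # (True, 1)
print(find_value_no_walrus(sample, 'b', False))  # (True, 2)
print(find_value_no_walrus(sample, 'c'))         # (True, 4)
print(find_value_no_walrus(sample, 'e'))         # (False, None)
```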
There is a JSON file with unknown structure.
I need to find an attribute of a known name in this file and, if it exists, return the name of its parent node, or nodes (if there are multiple instances of the attribute).
Example #1:
Input file:
{
"attr1": {
"attr2": {
"attr3": "somevalue",
"attr7": "someothervalue"
}
}
}
Attribute name: "attr7"
Expected return value: "attr2"
Example #2:
Input file:
{
"some": {
"deeply": {
"nested": {
"stuff": {
"array1": [
{"this":"value1"},
{"this":"value2"},
{"this":"value3"}
]
}
}
}
}
}
Attribute name: "this"
Expected return value: "array1"
Example #3:
(similar to #2 but with a duplicate)
Input file:
{
"some": {
"deeply": {
"nested": {
"this": {
"array1": [
{"this":"value1"},
{"this":"value2"},
{"this":"value3"}
]
}
}
}
}
}
Attribute name: "this"
Expected return value: "array1", "nested"
My starting point is:
import json

if __name__ == "__main__":
    jsonFileName = "file.json"
    attributeName = "this"
    # the with block closes the file once the data is loaded
    with open(jsonFileName, "r") as jsonFile:
        jsonData = json.load(jsonFile)
    # ???
I found this one: Access JSON element with parent name unknown but it is not really applicable in my case because they know the structure of their data and I don't.
Any hints?
So, with a bit of back and forth with a more experienced colleague, I came up with the following solution:
def findKey(jsonData, keyName: str, accessPath: str):
    # Only dicts have keys to inspect; strings, numbers and other leaves are skipped.
    if not isinstance(jsonData, dict):
        return None
    for key in jsonData.keys():
        if key == keyName:
            return accessPath + f"/{keyName};"
        if isinstance(jsonData[key], list):
            for jd in jsonData[key]:
                fk = findKey(jd, keyName, accessPath + "/[]" + key)
                if fk:
                    return fk
        elif isinstance(jsonData[key], dict):
            fk = findKey(jsonData[key], keyName, accessPath + "/{}" + key)
            if fk:
                return fk
    return None
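The function above returns the access path of the first match. If the parent names for every occurrence are wanted, as in Example #3 of the question, a generator is one option. This is only a sketch: find_parents and example3 are hypothetical names, list items are assumed to inherit the key of the list that contains them, and a match at the top level yields None because it has no parent.

```python
def find_parents(node, key_name, parent=None):
    """Yield the enclosing key's name for every occurrence of key_name."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == key_name:
                yield parent
            yield from find_parents(value, key_name, key)
    elif isinstance(node, list):
        for item in node:
            # list items have no name of their own, so keep the list's key
            yield from find_parents(item, key_name, parent)


example3 = {
    "some": {"deeply": {"nested": {"this": {"array1": [
        {"this": "value1"}, {"this": "value2"}, {"this": "value3"},
    ]}}}}
}

# dict.fromkeys removes duplicates while keeping first-seen order
parents = list(dict.fromkeys(find_parents(example3, "this")))
print(parents)  # ['nested', 'array1']
```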
{
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft": "Warning",
"Microsoft.Hosting.Lifetime": "Information",
"Microsoft.AspNetCore": "Warning",
"System.Net.Http.HttpClient.Default.ClientHandler": "Warning",
"System.Net.Http.HttpClient.Default.LogicalHandler": "Warning"
}
},
"AllowedHosts": "*",
"AutomaticTransferOptions": {
"DateOffsetForDirectoriesInDays": -1,
"DateOffsetForPortfoliosInDays": -3,
"Clause": {
"Item1": "1"
}
},
"Authentication": {
"ApiKeys": [
{
"Key": "AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6",
"OwnerName": "Cron job",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestManager"
}
]
},
{
"Key": "B11D4F27-483A-4234-8EC7-CA121712D5BE",
"OwnerName": "Test admin",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "StressTestAdmin"
},
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
},
{
"Key": "EBF98F2E-555E-4E66-9D77-5667E0AA1B54",
"OwnerName": "Test manager",
"Claims": [
{
"Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
"Value": "TestManager"
}
]
}
],
"LDAP": {
"Domain": "domain.local",
"MachineAccountName": "Soft13",
"MachineAccountPassword": "vixuUEY7884*",
"EnableLdapClaimResolution": true
}
},
"Authorization": {
"Permissions": {
"Roles": [
{
"Role": "TestAdmin",
"Permissions": [
"transfers.create",
"bindings.create"
]
},
{
"Role": "TestManager",
"Permissions": [
"transfers.create"
]
}
]
}
}
}
I have the JSON above and need to parse it into output like this:
Logging__LogLevel__Default
Authentication__ApiKeys__0__Claims__0__Type
Mostly this works, but I always get output like this:
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Authorization__Permissions__Roles__1
Why does my code add incomplete strings like
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__1
And why doesn't it print every value from
Authorization__Permissions__Roles__0__Permissions__*
and from
Authorization__Permissions__Roles__1__Permissions__*
I have this code in python3:
def checkdepth(sub_key, variable):
    delmt = '__'
    for item in sub_key:
        try:
            if isinstance(sub_key[item], dict):
                sub_variable = variable + delmt + item
                checkdepth(sub_key[item], sub_variable)
        except TypeError:
            continue
        if isinstance(sub_key[item], list):
            sub_variable = variable + delmt + item
            for it in sub_key[item]:
                sub_variable = variable + delmt + item + delmt + str(sub_key[item].index(it))
                checkdepth(it, sub_variable)
            print(sub_variable)
        if isinstance(sub_key[item], int) or isinstance(sub_key[item], str):
            sub_variable = variable + delmt + item
            print (sub_variable)

for key in data:
    if type(data[key]) is str:
        print(key + '=' +str(data[key]))
    else:
        variable = key
        checkdepth(data[key], variable)
I know the problem is in the block where I process the list data type, but I don't know exactly where.
Use a recursive generator:
import json

with open('input.json') as f:
    data = json.load(f)

def strkeys(data):
    if isinstance(data,dict):
        for k,v in data.items():
            for item in strkeys(v):
                yield f'{k}__{item}' if item else k
    elif isinstance(data,list):
        for i,v in enumerate(data):
            for item in strkeys(v):
                yield f'{i}__{item}' if item else str(i)
    else:
        yield None # termination condition, not a list or dict

for s in strkeys(data):
    print(s)
Output:
Logging__LogLevel__Default
Logging__LogLevel__Microsoft
Logging__LogLevel__Microsoft.Hosting.Lifetime
Logging__LogLevel__Microsoft.AspNetCore
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler
AllowedHosts
AutomaticTransferOptions__DateOffsetForDirectoriesInDays
AutomaticTransferOptions__DateOffsetForPortfoliosInDays
AutomaticTransferOptions__Clause__Item1
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__1__Key
Authentication__ApiKeys__1__OwnerName
Authentication__ApiKeys__1__Claims__0__Type
Authentication__ApiKeys__1__Claims__0__Value
Authentication__ApiKeys__1__Claims__1__Type
Authentication__ApiKeys__1__Claims__1__Value
Authentication__ApiKeys__2__Key
Authentication__ApiKeys__2__OwnerName
Authentication__ApiKeys__2__Claims__0__Type
Authentication__ApiKeys__2__Claims__0__Value
Authentication__LDAP__Domain
Authentication__LDAP__MachineAccountName
Authentication__LDAP__MachineAccountPassword
Authentication__LDAP__EnableLdapClaimResolution
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__0
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
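The question's code also prints key=value for top-level strings, so it may be useful to carry the values along with the flattened keys. The sketch below applies the same recursive-generator idea. The names flatten_items and sample are made up for illustration, and the sample is a small excerpt of the question's JSON.

```python
def flatten_items(data, prefix="", sep="__"):
    """Yield (path, value) for every leaf in nested dicts and lists."""
    if isinstance(data, dict):
        children = data.items()
    elif isinstance(data, list):
        children = enumerate(data)
    else:
        yield prefix, data  # leaf value: emit the accumulated path
        return
    for key, value in children:
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        yield from flatten_items(value, path, sep)


sample = {
    "AllowedHosts": "*",
    "Authorization": {
        "Permissions": {
            "Roles": [
                {"Role": "TestAdmin",
                 "Permissions": ["transfers.create", "bindings.create"]},
            ]
        }
    },
}

for path, value in flatten_items(sample):
    print(f"{path}={value}")
```

Each list index becomes one path segment, so every element of a Permissions list gets its own line.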
Using flatten_json, this can be converted to a pandas DataFrame, but it's not clear if that's what you want. Once converted, you can use df.iloc[0] to see the value stored under each flattened key.
Note: you need to pass a list, so I just wrapped your JSON above in [].
# https://github.com/amirziai/flatten (pip install flatten_json)
import pandas as pd
from flatten_json import flatten

dic = data  # your JSON from above, already parsed into a dict
dic = [dic]  # put it in a list
dic_flattened = (flatten(d, '__') for d in dic)  # add your delimiter
df = pd.DataFrame(dic_flattened)
df.iloc[0]
Logging__LogLevel__Default Information
Logging__LogLevel__Microsoft Warning
Logging__LogLevel__Microsoft.Hosting.Lifetime Information
Logging__LogLevel__Microsoft.AspNetCore Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler Warning
AllowedHosts *
AutomaticTransferOptions__DateOffsetForDirectoriesInDays -1
AutomaticTransferOptions__DateOffsetForPortfoliosInDays -3
AutomaticTransferOptions__Clause__Item1 1
Authentication__ApiKeys__0__Key AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6
Authentication__ApiKeys__0__OwnerName Cron job
Authentication__ApiKeys__0__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__0__Claims__0__Value StressTestManager
Authentication__ApiKeys__1__Key B11D4F27-483A-4234-8EC7-CA121712D5BE
Authentication__ApiKeys__1__OwnerName Test admin
Authentication__ApiKeys__1__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__0__Value StressTestAdmin
Authentication__ApiKeys__1__Claims__1__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__1__Value TestManager
Authentication__ApiKeys__2__Key EBF98F2E-555E-4E66-9D77-5667E0AA1B54
Authentication__ApiKeys__2__OwnerName Test manager
Authentication__ApiKeys__2__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__2__Claims__0__Value TestManager
Authentication__LDAP__Domain domain.local
Authentication__LDAP__MachineAccountName Soft13
Authentication__LDAP__MachineAccountPassword vixuUEY7884*
Authentication__LDAP__EnableLdapClaimResolution true
Authorization__Permissions__Roles__0__Role TestAdmin
Authorization__Permissions__Roles__0__Permissions__0 transfers.create
Authorization__Permissions__Roles__0__Permissions__1 bindings.create
Authorization__Permissions__Roles__1__Role TestManager
Authorization__Permissions__Roles__1__Permissions__0 transfers.create
OK, I looked at your code and it's hard to follow. Your variable and function names don't make their purpose clear. That's fine, because everyone has to learn best practices and all the little tips and tricks in Python. So hopefully I can help you out.
You have a recursive-ish function, which is definitely the best way to handle a situation like this. However, your code is part recursive and part not. If you go recursive to solve a problem, you have to go 100% recursive.
Also, the only time you should print in a recursive function is for debugging. Recursive functions should have an object that is passed down the function, gets appended to or altered, and is then passed back once the recursion reaches its end.
When you get a problem like this, think about which data you actually need or care about. In this problem we don't care about the values that are stored in the object, we just care about the keys. So we should write code that doesn't even bother looking at the value of something except to determine its type.
Here is some code I wrote up that should work for what you want to do. Note that because I wrote a purely recursive function, the code is small. The function also uses a list that is passed around and added to, and at the end I return it so that we can use it for whatever we need. If you have questions, just comment on this question and I'll answer as best I can.
def convert_to_delimited_keys(obj, parent_key='', delimiter='__', keys_list=None):
    if keys_list is None:
        keys_list = []
    if isinstance(obj, dict):
        for k in obj:
            convert_to_delimited_keys(obj[k], delimiter.join((parent_key, str(k))), delimiter, keys_list)
    elif isinstance(obj, list):
        for i, _ in enumerate(obj):
            convert_to_delimited_keys(obj[i], delimiter.join((parent_key, str(i))), delimiter, keys_list)
    else:
        # Append to list, but remove the leading delimiter due to string.join
        keys_list.append(parent_key[len(delimiter):])
    return keys_list

for item in convert_to_delimited_keys(data):
    print(item)
I am trying to get a JSON sub-schema's "name" based on its contents. This is kind of hard to explain, so an example would be better:
{
"dummy_name_1": {
"dummy_key_1": "unique_dummy_value_1",
"dummy_key_2": "dummy_value_2"
},
"dummy_name_2": {
"dummy_key_1": "unique_dummy_value_2",
"dummy_key_2": "dummy_value_2"
}
}
I want to get the name of dummy_name_1 (which would be "dummy_name_1") given the value of the key "dummy_key_1" (which would be "unique_dummy_value_1"). Basically, if I give the Python function "dummy_key_1" and "unique_dummy_value_1" as parameters, I want it to return the string "dummy_name_1".
Something like this? Here structure is your dict.
def get_dummy_name(dummy_key, dummy_value):
    for dummy_name, content in structure.items():
        if dummy_key in content.keys() and content[dummy_key] == dummy_value:
            return dummy_name
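If the value might not be unique, or if relying on a global variable is undesirable, a variant that takes the dict as a parameter and returns every match could look like this. It is only a sketch: find_names and sample are made-up names, and the isinstance check is an added assumption so that non-dict entries are skipped.

```python
def find_names(structure, key, value):
    """Return every top-level name whose sub-dict maps key to value."""
    return [
        name
        for name, content in structure.items()
        if isinstance(content, dict) and key in content and content[key] == value
    ]


sample = {
    "dummy_name_1": {"dummy_key_1": "unique_dummy_value_1", "dummy_key_2": "dummy_value_2"},
    "dummy_name_2": {"dummy_key_1": "unique_dummy_value_2", "dummy_key_2": "dummy_value_2"},
}

print(find_names(sample, "dummy_key_1", "unique_dummy_value_1"))  # ['dummy_name_1']
print(find_names(sample, "dummy_key_2", "dummy_value_2"))         # ['dummy_name_1', 'dummy_name_2']
```

An empty list signals that nothing matched, which avoids mixing string results with None or False.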
Try this:
def get_category_name(key_name, key_value):
    dictionary = {
        "dummy_name_1": {
            "dummy_key_1": "unique_dummy_value_1",
            "dummy_key_2": "dummy_value_2"
        },
        "dummy_name_2": {
            "dummy_key_1": "unique_dummy_value_2",
            "dummy_key_2": "dummy_value_2"
        }
    }
    for elem in dictionary.items():
        if key_name in elem[1] and elem[1][key_name] == key_value:
            return elem[0]
    return False

response = get_category_name('dummy_key_1', 'unique_dummy_value_1')
Input/Goal
My input data is an OrderedDict for which there can be a variable depth of nested OrderedDicts so I have opted to handle parsing this output recursively. The desired output is a csv with header.
Elaboration of Problem
My code below will work once I am able to correctly define field_name upon traversing back up a branch after completing all of a branch's leaves. (i.e. Type_1.Field_3.Data will incorrectly be called Type_1.Field_2.Field_3.Data).
Once the leaves on a branch have been exhausted, I want to remove the last .Field_x from the field_name so that a new (correct) one can be added for the following object.
Request for Help
Does anyone see where I can include this feature? Thanks,
...
Dependencies:
Code Snippet:
def get_soql_fields(soql):
    soql_fields = re.search('(?<=select)(?s)(.*)(?=from)', soql)  # get fields
    soql_fields = re.sub(' ', '', soql_fields.group())  # remove extra spaces
    fields = re.split(',|\n|\r', soql_fields)  # split on commas and newlines
    fields = [field for field in fields if field != '']  # remove empty strings
    return fields

def parse_output(data, soql):
    fields = get_soql_fields(soql)
    header = fields
    master = [header]
    for record in data['records']:  # for each 'record' in response
        row = []
        for obj, value in record.iteritems():  # for each obj in record
            if isinstance(value, basestring):  # if query base object has desired fields
                if obj in fields:
                    row.append(value)
            elif isinstance(value, dict):  # traverse down into object
                path = obj
                row.append(_traverse_output(obj, value, fields, row, path))
        master.append(row)
    return master

def _traverse_output(obj, value, fields, row, path):
    for f, v in value.iteritems():  # for each item in obj
        if not isinstance(v, (dict, list, tuple)):
            field_name = '{path}.{name}'.format(path=path, name=f)  # TODO fix this to full field name
            print('FName: {0}'.format(field_name))
            if field_name in fields:
                print('match')
                row.append(v)
        elif isinstance(v, dict):  # it is a dict
            path += '.{obj}'.format(obj=f)
            _traverse_output(f, v, fields, row, path)
Example Salesforce SOQL:
select
Type_1.Field_1,
Type_1.Field_2.Data,
Type_1.Field_3,
Type_1.Field_4,
Type_1.Field_5.Data_1.Data,
Type_1.Field_6,
Type_2.Field_1,
Type_2.Field_2
from
Obj_1
limit
1
;
Example Salesforce Output:
{
"records": [
{
"attributes": {
"type": "Obj_1",
"url": "<url>"
},
"Type_1": {
"attributes": {
"type": "Type_1",
"url": "<url>"
},
"Field_1": "<stuff>",
"Field_2": {
"attributes": {
"type": "Field_2",
"url": "<url>"
},
"Data": "<data>"
},
"Field_3": "<data>",
"Field_4": "<data>",
"Field_5": {
"attributes": {
"type": "Field_2",
"url": "<url>"
},
"Data_1": {
"attributes": {
"type": "Data_1",
"url": "<url>"
},
"Data": "<data>"
}
},
"Field_6": 1.0
},
"Type_2": {
"attributes": {
"type": "Type_2",
"url": "<url>"
},
"Field_1": "<data>",
"Field_2": "<data>"
}
}
]
}
I worked out a quick solution for this. I'll just note what I figured out, and append the code I wrote to the end.
Essentially your problem is that you keep trying to modify path in place, which isn't going to work. Instead do something like
new_path = path + '.{obj}'.format(obj=f)
_traverse_output(f, v, fields, row, new_path)
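To see the difference in isolation, here is a small sketch written for Python 3 (the thread's code is Python 2). The functions leaf_paths_buggy and leaf_paths_fixed and the record dict are made up for illustration. They only collect dotted names, which is a simplification of the question's traversal.

```python
def leaf_paths_buggy(node, path):
    """Mimics the question: `path +=` inside the loop leaks into later siblings."""
    found = []
    for key, value in node.items():
        if isinstance(value, dict):
            path += '.' + key  # rebinds path for the rest of this loop
            found += leaf_paths_buggy(value, path)
        else:
            found.append(path + '.' + key)
    return found


def leaf_paths_fixed(node, path):
    """Builds a new name for the child call and leaves path untouched."""
    found = []
    for key, value in node.items():
        if isinstance(value, dict):
            new_path = path + '.' + key
            found += leaf_paths_fixed(value, new_path)
        else:
            found.append(path + '.' + key)
    return found


record = {'Field_1': 'a', 'Field_2': {'Data': 'b'}, 'Field_3': 'c'}

print(leaf_paths_buggy(record, 'Type_1'))
# ['Type_1.Field_1', 'Type_1.Field_2.Data', 'Type_1.Field_2.Field_3']
print(leaf_paths_fixed(record, 'Type_1'))
# ['Type_1.Field_1', 'Type_1.Field_2.Data', 'Type_1.Field_3']
```

The buggy version reproduces the symptom from the question: Field_3 is reported under Field_2 because path was extended while visiting the previous sibling.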
A note about this: it will NOT necessarily result in a row where the values are in the same order as the header (i.e., if Type_1.Field_1 is in position 0 of the header list, then the value corresponding to it might not be).
The easy way to solve this (and handle csvs in general) is to use DictWriter from the csv module, then pass an empty dictionary to your first call where the keys will be the field names and the values will be their values.
Another way to solve the problem is to pre-populate your row list with None or empty strings, then use the list.index method to assign the value to the appropriate position.
I wrote an implementation of _traverse_output as examples for each, though they differ slightly from your code. They take an element of the 'records' list.
Dictionary Example
def _traverse_output_with_dict(record, fields, row_values, field_name=''):
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row_values[new_field_name] = value
        else:
            _traverse_output_with_dict(value, fields, row_values, new_field_name)
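A dictionary built this way can be handed straight to csv.DictWriter, which places each value under the matching header regardless of traversal order. The sketch below is written for Python 3 (so it uses .items() rather than .iteritems()). The function collect_fields, the fields list and the records data are invented for illustration, and an in-memory io.StringIO buffer stands in for a real output file.

```python
import csv
import io


def collect_fields(record, fields, row_values, field_name=''):
    """Python 3 port of the dictionary-based traversal; fills row_values in place."""
    for obj, value in record.items():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        if isinstance(value, dict):
            collect_fields(value, fields, row_values, new_field_name)
        elif new_field_name in fields:
            row_values[new_field_name] = value


# Hypothetical header and response data, loosely modelled on the question.
fields = ['Type_1.Field_1', 'Type_1.Field_2.Data', 'Type_2.Field_1']
records = [
    {
        'attributes': {'type': 'Obj_1'},
        'Type_1': {'Field_1': 'x', 'Field_2': {'Data': 'y'}},
        'Type_2': {'Field_1': 'z'},
    }
]

buffer = io.StringIO()  # with a real file, open it with newline=''
writer = csv.DictWriter(buffer, fieldnames=fields, restval='')
writer.writeheader()
for record in records:
    row_values = {}
    collect_fields(record, fields, row_values)
    writer.writerow(row_values)  # missing fields are filled with restval

print(buffer.getvalue())
```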
List Example
def _traverse_output_with_list(record, fields, row, field_name=''):
    while len(row) < len(fields):
        row.append('')
    for obj, value in record.iteritems():
        new_field_name = '{}.{}'.format(field_name, obj) if field_name else obj
        print new_field_name
        if not isinstance(value, dict):
            if new_field_name in fields:
                row[fields.index(new_field_name)] = value
        else:
            _traverse_output_with_list(value, fields, row, new_field_name)