Validating "Parallel" JSON Arrays - python

I am trying to use pydantic to validate JSON that is being returned in a "parallel" array format. Namely, there is an array defining the column names/types followed by an array of "rows" (this is similar to how pandas handles df.to_json(orient='split') seen here)
{
  "columns": [
    "sensor",
    "value",
    "range"
  ],
  "data": [
    [
      "a",
      1,
      {"low": 1, "high": 2}
    ],
    [
      "b",
      2,
      {"low": 0, "high": 2}
    ]
  ]
}
I know that I could do this:
class ValueRange(BaseModel):
    low: int
    high: int

class Response(BaseModel):
    columns: Tuple[Literal['sensor'], Literal['value'], Literal['range']]
    data: List[Tuple[str, int, ValueRange]]
But this has a few downsides:
1. After parsing, it doesn't allow for an association of the data with the column names, so you have to do everything by index. Ideally, I would like to parse a response into a List[Row] and then be able to do things like response.data[0].sensor.
2. It hardcodes the column order.
3. It doesn't allow for responses that have variable columns. For example, the same endpoint could also return the following:
{
  "columns": ["sensor", "value"],
  "data": [
    ["a", 1],
    ["b", 2]
  ]
}
At first I thought that I could use pydantic's discriminated unions, but I'm not seeing how to do this across arrays.
Anyone know of the best approach for validating this type of data? (I'm currently using pydantic, but am open to other libraries if it makes sense).
Thanks!

TL;DR
A very interesting use case for a custom validator.
from collections.abc import Sequence
from typing import Literal, Optional

from pydantic import BaseModel, validator
from pydantic.fields import Field

Column = Literal["sensor", "value", "range"]

class ValueRange(BaseModel):
    low: int
    high: int

class DataPoint(BaseModel):
    sensor: str
    value: int
    range: Optional[ValueRange]

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    @validator("columns", pre=True)
    def ensure_distinct(cls, v: object) -> object:
        if isinstance(v, Sequence) and len(v) != len(set(v)):
            raise ValueError("`columns` must all be distinct")
        return v

    @validator("data", pre=True, each_item=True)
    def parse_nested_sequence(
        cls,
        v: object,
        values: dict[str, object],
    ) -> object:
        if not isinstance(v, Sequence):
            return v
        columns = values.get("columns")
        if not isinstance(columns, Sequence):
            raise TypeError(
                "If `data` items are provided as sequences, "
                "`columns` must also be present as a sequence."
            )
        if len(columns) != len(v):
            raise ValueError(
                "`data` item must be the same length as `columns`"
            )
        return dict(zip(columns, v))
Explanation
Schema
First we need to set up the models to reflect the schema we want to have after parsing and validation are complete.
Since you mentioned that you would like the data field in the response model to be a list of model instances corresponding to a certain schema, we need to define that schema. From your example it seems to need at least the fields sensor, value and range. The range field should be its own model once again. You also mentioned that range should be optional, so we'll encode that too.
The actual top-level Response model will still have the columns field because we will need that for validation later on. But we can hide that field from the string representation and the exporter methods like dict and json. To communicate the variable number of elements in that columns tuple, we'll change the annotation a bit.
Here are the models I would suggest:
from collections.abc import Sequence
from typing import Literal, Optional

from pydantic import BaseModel, validator
from pydantic.fields import Field

Column = Literal["sensor", "value", "range"]

class ValueRange(BaseModel):
    low: int
    high: int

class DataPoint(BaseModel):
    sensor: str
    value: int
    range: Optional[ValueRange]

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    ...  # more code
Field types and parameters
To hide columns we can use the appropriate parameters of the Field constructor. Making it Optional (default will be None) means we could still initialize an instance of Response without it, but we would then of course be forced to provide the data list in the correct format (as dictionaries or DataPoint instances).
Since we used tuple[Column, ...] as the annotation for columns, the elements could be in any order, which is what you wanted, but it could also theoretically be of arbitrary length and contain a bunch of duplicates. The Python typing system does not provide any elegant tool to define the type in a way that indicates that all elements of the tuple must be distinct. We could of course construct a huge type union of all permutations of the literal values that should be valid, but this is hardly practical.
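For illustration (a small sketch assuming the models above), the columns field can simply be left out when the rows are already dictionaries or DataPoint instances:

response = Response(
    data=[
        {"sensor": "a", "value": 1, "range": {"low": 1, "high": 2}},
        DataPoint(sensor="b", value=2, range=None),
    ]
)
print(response)
# columns is hidden from the repr, so this prints something like:
# data=[DataPoint(sensor='a', value=1, range=ValueRange(low=1, high=2)), DataPoint(sensor='b', value=2, range=None)]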
Validators
Instead I would suggest a very simple validator to do this check for us:
...

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    @validator("columns", pre=True)
    def ensure_distinct(cls, v: object) -> object:
        if isinstance(v, Sequence) and len(v) != len(set(v)):
            raise ValueError("`columns` must all be distinct")
        return v

    ...  # more code
The pre=True is actually important here, but only in conjunction with the second validator. That one is much more interesting because it is supposed to bring those columns together with the sequences of data. Here is what I propose:
...

class Response(BaseModel):
    columns: Optional[tuple[Column, ...]] = Field(exclude=True, repr=False)
    data: list[DataPoint]

    @validator("columns", pre=True)
    def ensure_distinct(cls, v: object) -> object:
        if isinstance(v, Sequence) and len(v) != len(set(v)):
            raise ValueError("`columns` must all be distinct")
        return v

    @validator("data", pre=True, each_item=True)
    def parse_nested_sequence(
        cls,
        v: object,
        values: dict[str, object],
    ) -> object:
        if not isinstance(v, Sequence):
            return v
        columns = values.get("columns")
        if not isinstance(columns, Sequence):
            raise TypeError(
                "If `data` items are provided as sequences, "
                "`columns` must also be present as a sequence."
            )
        if len(columns) != len(v):
            raise ValueError(
                "`data` item must be the same length as `columns`"
            )
        return dict(zip(columns, v))
The pre=True on both validators ensures that they are run before the default validators for those field types (otherwise we would immediately get a validation error from your example data). Field validators are always called in the order the fields were defined, so we ensured that our custom columns validator is called before our custom data validator.
That order also allows us to access the output of our columns validator in the values dictionary of our data validator.
The each_item=True flag changes the behavior of the validator so that it is applied to each individual element of the data list as opposed to the entire list. That means with our example data the v argument will always be a single "sub-list" (e.g. ["a", 1]).
If the value to validate is not of a sequence type, we don't bother with it. It will then be handled appropriately by the default field validator. If it is a sequence, we need to make sure that columns is present and also a sequence and that they are of the same length. If those checks pass, we can just zip them, consume that zip into a dictionary and send it on its merry way to the default field validator.
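To make the zip step concrete, here is what a single row from the second test case below turns into before it is handed to the default validator:

columns = ("value", "sensor")
row = [1, "a"]
print(dict(zip(columns, row)))  # {'value': 1, 'sensor': 'a'}

That dictionary is then validated into a DataPoint just like any other dict would be.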
That's all.
Demo
Here is a little demo script:
def main() -> None:
    from pydantic import ValidationError

    print(Response.parse_raw(TEST_JSON_VALID_1).json(indent=2), "\n")
    print(Response.parse_raw(TEST_JSON_VALID_2).json(indent=2), "\n")
    try:
        Response.parse_raw(TEST_JSON_INVALID_1)
    except ValidationError as exc:
        print(exc.json(indent=2), "\n")
    try:
        Response.parse_raw(TEST_JSON_INVALID_2)
    except ValidationError as exc:
        print(exc.json(indent=2), "\n")
    try:
        Response.parse_raw(TEST_JSON_INVALID_3)
    except ValidationError as exc:
        print(exc.json(indent=2))

if __name__ == "__main__":
    main()
And here is the test data and its corresponding output:
TEST_JSON_VALID_1
{
  "columns": [
    "sensor",
    "value",
    "range"
  ],
  "data": [
    [
      "a",
      1,
      {"low": 1, "high": 2}
    ],
    [
      "b",
      2,
      {"low": 0, "high": 2}
    ]
  ]
}

{
  "data": [
    {
      "sensor": "a",
      "value": 1,
      "range": {
        "low": 1,
        "high": 2
      }
    },
    {
      "sensor": "b",
      "value": 2,
      "range": {
        "low": 0,
        "high": 2
      }
    }
  ]
}
TEST_JSON_VALID_2 (different order and no range)
{
  "columns": ["value", "sensor"],
  "data": [
    [1, "a"],
    [2, "b"]
  ]
}

{
  "data": [
    {
      "sensor": "a",
      "value": 1,
      "range": null
    },
    {
      "sensor": "b",
      "value": 2,
      "range": null
    }
  ]
}
TEST_JSON_INVALID_1
{
  "columns": ["foo", "value"],
  "data": []
}

[
  {
    "loc": [
      "columns",
      0
    ],
    "msg": "unexpected value; permitted: 'sensor', 'value', 'range'",
    "type": "value_error.const",
    "ctx": {
      "given": "foo",
      "permitted": [
        "sensor",
        "value",
        "range"
      ]
    }
  }
]
TEST_JSON_INVALID_2
{
  "columns": ["value", "value"],
  "data": []
}

[
  {
    "loc": [
      "columns"
    ],
    "msg": "`columns` must all be distinct",
    "type": "value_error"
  }
]
TEST_JSON_INVALID_3
{
  "columns": ["sensor", "value"],
  "data": [["a", 1], ["b"]]
}

[
  {
    "loc": [
      "data",
      1
    ],
    "msg": "`data` item must be the same length as `columns`",
    "type": "value_error"
  }
]

Related

Conditional return value typing based on enum argument

I have some pydantic BaseModels, that I want to populate with values.
from enum import Enum
from typing import Dict, List, Literal, Type, Union, overload

from pydantic import BaseModel

class Document(BaseModel):
    name: str
    pages: int

class DocumentA(Document):
    reviewer: str

class DocumentB(Document):
    columns: Dict[str, Dict]

class DocumentC(Document):
    reviewer: str
    tools: List[str]
Example Values:
db = {
    "A": {
        0: {"name": "Document 1", "pages": 2, "reviewer": "Person A"},
        1: {"name": "Document 2", "pages": 3, "reviewer": "Person B"},
    },
    "B": {
        0: {"name": "Document 1", "pages": 1, "columns": {"colA": "A", "colB": "B"}},
        1: {"name": "Document 2", "pages": 5, "columns": {"colC": "C", "colD": "D"}},
    },
    "C": {
        0: {"name": "Document 1", "pages": 7, "reviewer": "Person C", "tools": ["hammer"]},
        1: {"name": "Document 2", "pages": 2, "reviewer": "Person A", "tools": ["hammer", "chisel"]},
    },
}
To load the values into the correct BaseModel class, I have created a System class, which is also needed elsewhere and has more functionality, but I omitted details for clarity.
class System(Enum):
    A = ("A", DocumentA)
    B = ("B", DocumentB)
    C = ("C", DocumentC)

    @property
    def key(self) -> str:
        return self.value[0]

    @property
    def Document(self) -> Union[Type[DocumentA], Type[DocumentB], Type[DocumentC]]:
        return self.value[1]
Then, through System["A"].Document I can access DocumentA directly.
To load the values, I use this function (disregard handling IndexErrors for now):
def load_document(db: Dict, idx: int, system: System) -> Union[DocumentA, DocumentB, DocumentC]:
    data = db[system.key][idx]
    return system.Document(**data)
Now, I might need to handle some data of type B, which I load directly in the handling function.
def handle_document_B(db: Dict, idx: int):
    doc = load_document(db=db, idx=idx, system=System.B)
    # The following line raises mypy errors:
    # Item "DocumentA" of "Union[DocumentA, DocumentB, DocumentC]" has no attribute "columns"
    # Item "DocumentC" of "Union[DocumentA, DocumentB, DocumentC]" has no attribute "columns"
    print(doc.columns)
Running mypy raises errors on the line print(doc.columns), since load_document has a typed return value of Union[DocumentA, DocumentB, DocumentC], and obviously DocumentA and DocumentC cannot access the columns attribute. But the only Document type that could be loaded here is DocumentB anyway.
I know I could load the Document outside of the handler function and pass it instead, but I would prefer to load it in the handler.
I circumvented the type issue by overloading the load_document function with the correct Document class, but this seems like a tedious solution, since I need to manually add an overload for each System that might be added in the future.
Is it possible to conditionally type hint a function's return value based on an Enum input value?
You can explicitly annotate the return type depending on the selected option:
from typing import Literal, overload

@overload
def load_document(db: Dict, idx: int, system: Literal[System.A]) -> DocumentA: ...
@overload
def load_document(db: Dict, idx: int, system: Literal[System.B]) -> DocumentB: ...
@overload
def load_document(db: Dict, idx: int, system: Literal[System.C]) -> DocumentC: ...
def load_document(db: Dict, idx: int, system: System) -> Union[DocumentA, DocumentB, DocumentC]:
    data = db[system.key][idx]
    return system.Document(**data)
The rest of your code typechecks now (playground).
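For illustration (assuming the models and System enum from the question), the handlers can now access model-specific attributes without the union-attribute errors:

def handle_document_B(db: Dict, idx: int) -> None:
    doc = load_document(db=db, idx=idx, system=System.B)  # mypy narrows this to DocumentB
    print(doc.columns)  # no more union-attribute error

def handle_document_A(db: Dict, idx: int) -> None:
    doc = load_document(db=db, idx=idx, system=System.A)  # narrowed to DocumentA
    print(doc.reviewer)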

Validating nested dict with Pydantic `create_model`

I am using create_model to validate a config file which runs into many nested dicts. In the below example I can validate everything except the last nest of sunrise and sunset.
class System(BaseModel):
    data: Optional[create_model('Data', type=(str, ...), daytime=(dict, ...))] = None

try:
    p = System.parse_obj({
        'data': {
            'type': 'solar',
            'daytime': {
                'sunrise': 5,
                'sunset': 10
            }
        }})
    print(p.dict())
    vals = p.dict()
    print(vals['data']['daytime'], type(vals['data']['daytime']['sunrise']))
except ValidationError as e:
    print(e)
How can I incorporate a nested dict in create_model and ensure sunrise and sunset are validated, or is there another way to validate this?
Thanks for your feedback.
It's not entirely clear what you are trying to achieve, so here's my best guess:
Pydantic's create_model() is meant to be used when the shape of a model is not known until runtime.
From your code it's not clear whether or not this is really necessary.
Here's how I would approach this.
(The code below is using Python 3.9 and above type hints. If you're using an earlier version, you might have to replace list with typing.List and dict with typing.Dict.)
Using a "Static" Model
from typing import Optional

from pydantic import BaseModel, create_model

class Data(BaseModel):
    type: str
    daytime: dict[str, int]  # <- explicit types in the dict, values will be coerced

class System(BaseModel):
    data: Optional[Data]

system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": 1,
            "sunset": 10
        }
    }
}

p = System.parse_obj(system)
print(repr(p))
# System(data=Data(type='solar', daytime={'sunrise': 1, 'sunset': 10}))
This will accept integers in daytime but not other types, like strings.
That means, something like this:
system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": "some string",
            "sunset": 10
        }
    }
}
p = System.parse_obj(system)
will fail with:
pydantic.error_wrappers.ValidationError: 1 validation error for System
data -> daytime -> sunrise
value is not a valid integer (type=type_error.integer)
If you want more control over how "daytime" is being parsed, you could create an additional model for it, e.g.:
class Daytime(BaseModel):
    sunrise: int
    sunset: int

class Data(BaseModel):
    type: str
    daytime: Daytime

class System(BaseModel):
    data: Optional[Data]
This will work as above; however, only the parameters sunrise and sunset will be parsed, and everything else that might be inside "daytime" will be ignored (by default). A sketch of how to forbid extra keys instead follows below.
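If you would rather reject unexpected keys than silently ignore them, pydantic (v1) lets you forbid extra fields via the model Config; a minimal sketch of that option:

from pydantic import BaseModel, Extra

class Daytime(BaseModel):
    sunrise: int
    sunset: int

    class Config:
        extra = Extra.forbid  # unknown keys inside "daytime" now raise a ValidationError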
Using a "Dynamic" Model
If you really need to create the model dynamically, you can use a similar approach:
class System(BaseModel):
    data: Optional[
        create_model(
            "Data",
            type=(str, ...),
            daytime=(dict[str, int], ...),  # <- explicit types in dict
        )
    ] = None

system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": 1,
            "sunset": 10
        }
    }
}

p = System.parse_obj(system)
print(repr(p))
will work, while
system = {
    "data": {
        "type": "solar",
        "daytime": {
            "sunrise": "abc",
            "sunset": 10
        }
    }
}
p = System.parse_obj(system)
will fail with
pydantic.error_wrappers.ValidationError: 1 validation error for System
data -> daytime -> sunrise
value is not a valid integer (type=type_error.integer)

Pydantic create model for list with nested dictionary

I have a body looks like this:
{
  "data": [
    {
      "my_api": {
        "label": "First name",
        "value": "Micheal"
      }
    },
    {
      "my_api": {
        "label": "Last name",
        "value": [
          "Jackson"
        ]
      }
    },
    {
      "my_api": {
        "label": "Favourite colour",
        "value": "I don't have any"
      }
    }
  ]
}
This is my model.py so far:
class DictParameter(BaseModel):  # pylint: disable=R0903
    """
    `my_api` children
    """
    label: Optional[str]
    value: Optional[str]

class DataParameter(BaseModel):  # pylint: disable=R0903
    """
    `data` children
    """
    my_api: Optional[dict]  # NOTE: Future readers, this incorrect reference is part of the OP's Q

class InputParameter(BaseModel):  # pylint: disable=R0903
    """
    Takes predefined params
    """
    data: Optional[List[DataParameter]]
In main.py:
from model import InputParameter

@router.post("/v2/workflow", status_code=200)
def get_parameter(user_input: InputParameter):
    """
    Version 2 : No decoding & retrieve workflow params
    """
    data = user_input.data
    print(data)
Output:
[DataParameter(my_api={'label': 'First name', 'value': 'Micheal'}), DataParameter(my_api={'label': 'Last name', 'value': ['Jackson']}), DataParameter(my_api={'label': 'Favourite colour', 'value': "I don't have any"})]
I want to access the value inside the my_api key, but I keep getting a type error. I'm not sure how to access a list of dictionaries with nested children. Plus, the value of value can be str or array. It is dynamic.
Is there any other way of doing this?
Plus, the value of value can be str or array. It is dynamic.
What you currently have will cast single element lists to strs, which is probably what you want. If you want lists to stay as lists, use:
from typing import Union

class DictParameter(BaseModel):
    value: Union[str, list[str]]
Unless you have the good luck to be running Python 3.10, in which case str | list[str] is equivalent.
However, you do not actually use this model! You have my_api: Optional[dict] not my_api: Optional[DictParameter], so your current output is a plain old dict, and you need to do data[0].my_api["value"]. Currently this returns a str or a list, which is probably the problem. I suspect, though, that you meant to use the pydantic schema.
Note that data is a list: if you want all the values you need to iterate, something like
apis = [x.my_api for x in data]
Assuming you fix the issue in DictParameter (as pointed out in the other answer by @2e0byo):
class DictParameter(BaseModel):
    label: Optional[str]
    value: Optional[Union[str, List[str]]]
And you fix the issue in DataParameter:
class DataParameter(BaseModel):
    # my_api: Optional[dict] <-- prev value
    my_api: Optional[DictParameter]
You can access the values in your object the following way:
def get_value_from_data_param(param_obj: InputParameter, key: str):
    """
    Returns a value from an InputParameter object,
    or returns `None` if not found
    """
    # Iterate over objects
    for item in param_obj.data:
        # Skip if no value
        if not item.my_api:
            continue
        # This assumes there are no duplicate labels;
        # if there are, perhaps make a list and append values
        if item.my_api.label == key:
            return item.my_api.value
    # If nothing is found, return None (or some `default`)
    return None
Now let's test it:
input_data = {
    "data": [
        {"my_api": {"label": "First name", "value": "Micheal"}},
        {"my_api": {"label": "Last name", "value": ["Jordan"]}},
        {"my_api": {"label": "Favourite colour", "value": "I don't have any"}}
    ]
}

# Create an object
input_param_obj = InputParameter.parse_obj(input_data)

# Let's see if we can get values:
f_name = get_value_from_data_param(input_param_obj, "First name")
assert f_name == 'Micheal'

l_name = get_value_from_data_param(input_param_obj, "Last name")
assert l_name == ['Jordan']

nums = get_value_from_data_param(input_param_obj, "Numbers")
assert nums is None  # there is no "Numbers" label in the input data

erroneous = get_value_from_data_param(input_param_obj, "KEY DOES NOT EXIST")
assert erroneous is None

Python parse JSON file

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Hosting.Lifetime": "Information",
      "Microsoft.AspNetCore": "Warning",
      "System.Net.Http.HttpClient.Default.ClientHandler": "Warning",
      "System.Net.Http.HttpClient.Default.LogicalHandler": "Warning"
    }
  },
  "AllowedHosts": "*",
  "AutomaticTransferOptions": {
    "DateOffsetForDirectoriesInDays": -1,
    "DateOffsetForPortfoliosInDays": -3,
    "Clause": {
      "Item1": "1"
    }
  },
  "Authentication": {
    "ApiKeys": [
      {
        "Key": "AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6",
        "OwnerName": "Cron job",
        "Claims": [
          {
            "Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
            "Value": "StressTestManager"
          }
        ]
      },
      {
        "Key": "B11D4F27-483A-4234-8EC7-CA121712D5BE",
        "OwnerName": "Test admin",
        "Claims": [
          {
            "Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
            "Value": "StressTestAdmin"
          },
          {
            "Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
            "Value": "TestManager"
          }
        ]
      },
      {
        "Key": "EBF98F2E-555E-4E66-9D77-5667E0AA1B54",
        "OwnerName": "Test manager",
        "Claims": [
          {
            "Type": "http://schemas.microsoft.com/ws/2008/06/identity/claims/role",
            "Value": "TestManager"
          }
        ]
      }
    ],
    "LDAP": {
      "Domain": "domain.local",
      "MachineAccountName": "Soft13",
      "MachineAccountPassword": "vixuUEY7884*",
      "EnableLdapClaimResolution": true
    }
  },
  "Authorization": {
    "Permissions": {
      "Roles": [
        {
          "Role": "TestAdmin",
          "Permissions": [
            "transfers.create",
            "bindings.create"
          ]
        },
        {
          "Role": "TestManager",
          "Permissions": [
            "transfers.create"
          ]
        }
      ]
    }
  }
}
I have JSON above and need to parse it with output like this
Logging__LogLevel__Default
Authentication__ApiKeys__0__Claims__0__Type
Everything is OK, but I always get some strings like these in the output:
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Authorization__Permissions__Roles__1
Why does my code add incomplete strings like
Authentication__ApiKeys__0__Claims__0
Authentication__ApiKeys__2
Authorization__Permissions__Roles__1
And why doesn't it print every value from
Authorization__Permissions__Roles__0__Permissions__*
and from
Authorization__Permissions__Roles__1__Permissions__*
I have this code in python3:
def checkdepth(sub_key, variable):
    delmt = '__'
    for item in sub_key:
        try:
            if isinstance(sub_key[item], dict):
                sub_variable = variable + delmt + item
                checkdepth(sub_key[item], sub_variable)
        except TypeError:
            continue
        if isinstance(sub_key[item], list):
            sub_variable = variable + delmt + item
            for it in sub_key[item]:
                sub_variable = variable + delmt + item + delmt + str(sub_key[item].index(it))
                checkdepth(it, sub_variable)
            print(sub_variable)
        if isinstance(sub_key[item], int) or isinstance(sub_key[item], str):
            sub_variable = variable + delmt + item
            print(sub_variable)

for key in data:
    if type(data[key]) is str:
        print(key + '=' + str(data[key]))
    else:
        variable = key
        checkdepth(data[key], variable)
I know that the problem is in the block where I process the list data type, but I don't know exactly where.
Use a recursive generator:
import json

with open('input.json') as f:
    data = json.load(f)

def strkeys(data):
    if isinstance(data, dict):
        for k, v in data.items():
            for item in strkeys(v):
                yield f'{k}__{item}' if item else k
    elif isinstance(data, list):
        for i, v in enumerate(data):
            for item in strkeys(v):
                yield f'{i}__{item}' if item else str(i)
    else:
        yield None  # termination condition, not a list or dict

for s in strkeys(data):
    print(s)
Output:
Logging__LogLevel__Default
Logging__LogLevel__Microsoft
Logging__LogLevel__Microsoft.Hosting.Lifetime
Logging__LogLevel__Microsoft.AspNetCore
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler
AllowedHosts
AutomaticTransferOptions__DateOffsetForDirectoriesInDays
AutomaticTransferOptions__DateOffsetForPortfoliosInDays
AutomaticTransferOptions__Clause__Item1
Authentication__ApiKeys__0__Key
Authentication__ApiKeys__0__OwnerName
Authentication__ApiKeys__0__Claims__0__Type
Authentication__ApiKeys__0__Claims__0__Value
Authentication__ApiKeys__1__Key
Authentication__ApiKeys__1__OwnerName
Authentication__ApiKeys__1__Claims__0__Type
Authentication__ApiKeys__1__Claims__0__Value
Authentication__ApiKeys__1__Claims__1__Type
Authentication__ApiKeys__1__Claims__1__Value
Authentication__ApiKeys__2__Key
Authentication__ApiKeys__2__OwnerName
Authentication__ApiKeys__2__Claims__0__Type
Authentication__ApiKeys__2__Claims__0__Value
Authentication__LDAP__Domain
Authentication__LDAP__MachineAccountName
Authentication__LDAP__MachineAccountPassword
Authentication__LDAP__EnableLdapClaimResolution
Authorization__Permissions__Roles__0__Role
Authorization__Permissions__Roles__0__Permissions__0
Authorization__Permissions__Roles__0__Permissions__1
Authorization__Permissions__Roles__1__Role
Authorization__Permissions__Roles__1__Permissions__0
Using json_flatten, this can be converted to pandas, but it's not clear if that's what you want. Also, once you do convert, you can use df.iloc[0] to see why each column is being produced (i.e., you see the value for that key).
Note: you need to pass a list so I just wrapped your json above in [].
# https://github.com/amirziai/flatten
from flatten_json import flatten  # pip install flatten_json
import pandas as pd

dic = ...  # your JSON from above, loaded as a dict
dic = [dic]  # put it in a list
dic_flattened = (flatten(d, '__') for d in dic)  # add your delimiter
df = pd.DataFrame(dic_flattened)
df.iloc[0]
Logging__LogLevel__Default Information
Logging__LogLevel__Microsoft Warning
Logging__LogLevel__Microsoft.Hosting.Lifetime Information
Logging__LogLevel__Microsoft.AspNetCore Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.ClientHandler Warning
Logging__LogLevel__System.Net.Http.HttpClient.Default.LogicalHandler Warning
AllowedHosts *
AutomaticTransferOptions__DateOffsetForDirectoriesInDays -1
AutomaticTransferOptions__DateOffsetForPortfoliosInDays -3
AutomaticTransferOptions__Clause__Item1 1
Authentication__ApiKeys__0__Key AB8E5976-2A7C-4EEE-92C1-7B0B4DC840F6
Authentication__ApiKeys__0__OwnerName Cron job
Authentication__ApiKeys__0__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__0__Claims__0__Value StressTestManager
Authentication__ApiKeys__1__Key B11D4F27-483A-4234-8EC7-CA121712D5BE
Authentication__ApiKeys__1__OwnerName Test admin
Authentication__ApiKeys__1__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__0__Value StressTestAdmin
Authentication__ApiKeys__1__Claims__1__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__1__Claims__1__Value TestManager
Authentication__ApiKeys__2__Key EBF98F2E-555E-4E66-9D77-5667E0AA1B54
Authentication__ApiKeys__2__OwnerName Test manager
Authentication__ApiKeys__2__Claims__0__Type http://schemas.microsoft.com/ws/2008/06/identi...
Authentication__ApiKeys__2__Claims__0__Value TestManager
Authentication__LDAP__Domain domain.local
Authentication__LDAP__MachineAccountName Soft13
Authentication__LDAP__MachineAccountPassword vixuUEY7884*
Authentication__LDAP__EnableLdapClaimResolution true
Authorization__Permissions__Roles__0__Role TestAdmin
Authorization__Permissions__Roles__0__Permissions__0 transfers.create
Authorization__Permissions__Roles__0__Permissions__1 bindings.create
Authorization__Permissions__Roles__1__Role TestManager
Authorization__Permissions__Roles__1__Permissions__0 transfers.create
OK, I looked at your code and it's hard to follow. Your variable and function names don't make their purpose easy to understand. Which is fine, because everyone has to learn best practices and all the little tips and tricks in Python. So hopefully I can help you out.
You have a recursive-ish function, which is definitely the best way to handle a situation like this. However, your code is part recursive and part not. If you go recursive to solve a problem, you have to go 100% recursive.
Also the only time you should print in a recursive function is for debugging. Recursive functions should have an object that is passed down the function and gets appended to or altered and then passed back once it gets to the end of the recursion.
When you get a problem like this, think about which data you actually need or care about. In this problem we don't care about the values that are stored in the object, we just care about the keys. So we should write code that doesn't even bother looking at the value of something except to determine its type.
Here is some code I wrote up that should work for what you're wanting to do. But take note that because I did purely a recursive function my code base is small. Also my function uses a list that is passed around and added to and then at the end I return it so that we can use it for whatever we need. If you have questions just comment on this question and I'll answer the best I can.
def convert_to_delimited_keys(obj, parent_key='', delimiter='__', keys_list=None):
    if keys_list is None:
        keys_list = []
    if isinstance(obj, dict):
        for k in obj:
            convert_to_delimited_keys(obj[k], delimiter.join((parent_key, str(k))), delimiter, keys_list)
    elif isinstance(obj, list):
        for i, _ in enumerate(obj):
            convert_to_delimited_keys(obj[i], delimiter.join((parent_key, str(i))), delimiter, keys_list)
    else:
        # Append to list, but remove the leading delimiter due to string.join
        keys_list.append(parent_key[len(delimiter):])
    return keys_list

for item in convert_to_delimited_keys(data):
    print(item)

Python ---- TypeError: string indices must be integers

I have the below Python code
from flask import Flask, jsonify, json

app = Flask(__name__)

with open('C:/test.json', encoding="latin-1") as f:
    dataset = json.loads(f.read())

@app.route('/api/PDL/<string:dataset_identifier>', methods=['GET'])
def get_task(dataset_identifier):
    global dataset
    dataset = [dataset for dataset in dataset if dataset['identifier'] == dataset_identifier]
    if len(task) == 0:
        abort(404)
    return jsonify({'dataset': dataset})

if __name__ == '__main__':
    app.run(debug=True)
Test.json looks like this:
{
  "dataset": [{
    "bureauCode": [
      "016:00"
    ],
    "description": "XYZ",
    "contactPoint": {
      "fn": "AG",
      "hasEmail": "mailto:AG#AG.com"
    },
    "distribution": [
      {
        "format": "XLS",
        "mediaType": "application/vnd.ms-excel",
        "downloadURL": "https://www.example.com/xyz.xls"
      }
    ],
    "programCode": [
      "000:000"
    ],
    "keyword": [ "return to work",
    ],
    "modified": "2015-10-14",
    "title": "September 2015",
    "publisher": {
      "name": "abc"
    },
    "identifier": US-XYZ-ABC-36,
    "rights": null,
    "temporal": null,
    "describedBy": null,
    "accessLevel": "public",
    "spatial": null,
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "references": [
      "http://www.example.com/example.html"
    ]
  }
  ],
  "conformsTo": "https://example.com"
}
When I pass the variable in the URL like this: http://127.0.0.1:5000/api/PDL/1403
I get the following error: TypeError: string indices must be integers
Knowing that the "identifier" field is a string and I am passing the following in the URL:
http://127.0.0.1:5000/api/PDL/"US-XYZ-ABC-36"
http://127.0.0.1:5000/api/PDL/US-XYZ-ABC-36
I keep getting the following error:
TypeError: string indices must be integers
Any idea on what am I missing here? I am new to Python!
The problem is that you are trying to iterate the dictionary instead of the list of datasources inside it. As a consequence, you're iterating through the keys of the dictionary, which are strings. Additionally, as was mentioned above, you will have problems if you use the same name for the list and the iterator variable.
This worked for me:
[ds for ds in dataset['dataset'] if ds['identifier'] == dataset_identifier]
The problem you have right now is that during iteration in the list comprehension, the very first iteration changes the name dataset from meaning the dict you json.loads-ed to a key of that dict (dicts iterate their keys). So when you try to look up a value in dataset with dataset['identifier'], dataset isn't the dict anymore, it's the str key you're currently iterating over.
Stop reusing the same name to mean different things.
From the JSON you posted, what you probably want is something like:
with open('C:/test.json', encoding="latin-1") as f:
    alldata = json.loads(f.read())

@app.route('/api/PDL/<string:dataset_identifier>', methods=['GET'])
def get_task(dataset_identifier):
    # Gets the list of data objects from the top-level object
    # Could be inlined into the list comprehension, replacing dataset with alldata['dataset']
    dataset = alldata['dataset']
    # data is a single object in that list, which should have an identifier key
    # data_for_id is the list of objects passing the filter
    data_for_id = [data for data in dataset if data['identifier'] == dataset_identifier]
    if len(data_for_id) == 0:
        abort(404)
    return jsonify({'dataset': data_for_id})
