Python - Parse complex JSON with objectpath

I need to parse a Terraform state file, which is written in JSON format. I have to extract two pieces of data for each resource: the resource name and its id. This is an example file:
{
"version": 1,
"serial": 1,
"modules": [
{
"path": [
"root"
],
"outputs": {
},
"resources": {
"aws_security_group.vpc-xxxxxxx-test-1": {
"type": "aws_security_group",
"primary": {
"id": "sg-xxxxxxxxxxxxxx",
"attributes": {
"description": "test-1",
"name": "test-1"
}
}
},
"aws_security_group.vpc-xxxxxxx-test-2": {
"type": "aws_security_group",
"primary": {
"id": "sg-yyyyyyyyyyyy",
"attributes": {
"description": "test-2",
"name": "test-2"
}
}
}
}
}
]
}
For every resource I need to export the resource key and the value of id. In this case that is aws_security_group.vpc-xxxxxxx-test-1 sg-xxxxxxxxxxxxxx and aws_security_group.vpc-xxxxxxx-test-2 sg-yyyyyyyyyyyy.
I have tried to write this in Python:
#!/usr/bin/python3.6
import json
import objectpath
with open('file.json') as json_file:
    data = json.load(json_file)
json_tree = objectpath.Tree(data['modules'])
result = tuple(json_tree.execute('$..resources[0]'))
The result is:
('aws_security_group.vpc-xxxxxxx-test-1', 'aws_security_group.vpc-xxxxxxx-test-2')
That part is OK, but I can't extract the id. Any help is appreciated, including solutions that use other methods.
Thanks

I don't know objectpath, but I think you need:
tree.execute('$..resources[0]..primary.id')
or even just
tree.execute('$..resources[0]..id')
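
Since the question also welcomes other methods, here is a minimal sketch that skips objectpath and walks the parsed dictionary directly. The helper name resource_ids and the abbreviated STATE_JSON string are illustrative assumptions, not part of the original post; the layout (modules, then resources, then primary.id) is taken from the example file in the question.

```python
import json

# Abbreviated copy of the state file shown in the question (assumption:
# the real file has the same modules -> resources -> primary -> id layout).
STATE_JSON = """
{
  "version": 1,
  "serial": 1,
  "modules": [
    {
      "path": ["root"],
      "outputs": {},
      "resources": {
        "aws_security_group.vpc-xxxxxxx-test-1": {
          "type": "aws_security_group",
          "primary": {"id": "sg-xxxxxxxxxxxxxx"}
        },
        "aws_security_group.vpc-xxxxxxx-test-2": {
          "type": "aws_security_group",
          "primary": {"id": "sg-yyyyyyyyyyyy"}
        }
      }
    }
  ]
}
"""


def resource_ids(state):
    """Yield (resource_name, primary_id) for every resource in every module."""
    for module in state.get("modules", []):
        for name, resource in module.get("resources", {}).items():
            # .get() keeps the loop going if a resource has no "primary" block.
            yield name, resource.get("primary", {}).get("id")


state = json.loads(STATE_JSON)
pairs = list(resource_ids(state))
for name, resource_id in pairs:
    print(name, resource_id)
```

To read from disk instead, replace json.loads(STATE_JSON) with json.load() on the opened file, as in the question's own code.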

Related

python: construct multipart/form data within defined in swagger.json

I am trying to construct a Python request based on a swagger.json schema. The schema specifies multipart/form-data, which I have done some research on. The remaining issue is the property of type "array", which I am not sure how to send. Below is the swagger.json schema.
"requestBody": {
"required": true,
"content": {
"multipart/form-data": {
"schema": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"file": {
"items": {
"type": "string",
"format": "binary"
},
"type": "array"
}
},
"required": [
"id",
"name",
"file"
]
}
}
}
}
I found that the files parameter of the Python requests module can send multipart form data (see "How to send a "multipart/form-data" with requests in python?"), but I don't know how to handle the 'file' part, which is an array here. If it were a single object rather than an array, I would use 'file': ('testfile', open('testfile', 'rb')).
The UI side has not been deployed yet, so I cannot test this. Could anyone help? Thanks
data = {
'id' : test_id,
'name' : test_name,
'file': []
}
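
One way to express an array of files with requests is to pass files as a list of (field_name, file_info) tuples that repeat the same field name, while the plain string fields go in data. The sketch below only builds those arguments. The helper name build_multipart_args, the sample file names and the placeholder URL are assumptions for illustration, and whether the server expects the repeated field to be called exactly "file" depends on the API behind the schema.

```python
import io


def build_multipart_args(test_id, test_name, blobs):
    """Return (data, files) suitable for requests.post(url, data=data, files=files).

    blobs maps a file name to its bytes. Every entry is sent under the same
    form field name, "file", which is how an array of binaries is usually
    represented in multipart/form-data.
    """
    data = {"id": test_id, "name": test_name}
    files = [
        ("file", (filename, io.BytesIO(content), "application/octet-stream"))
        for filename, content in blobs.items()
    ]
    return data, files


data, files = build_multipart_args(
    "42", "demo", {"a.txt": b"first", "b.txt": b"second"}
)

# Sending is left commented out because the endpoint is not deployed yet.
# The URL is a placeholder, not a real service.
# import requests
# response = requests.post("https://example.invalid/upload", data=data, files=files)
```

For files on disk, open(path, 'rb') can take the place of io.BytesIO(content) in each tuple.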

How to parse nested JSON in python

I'm struggling to access some values in this nested JSON in Python.
How can I access ['Records'][0]['s3']['bucket']['name']? I searched a lot for a simple Python snippet, but had no luck. Thanks in advance!
{
"Records": [
{
"eventName": "xxxxxxx",
"userIdentity": {
"principalId": "AWS:XXXXXXXXXXXXXX"
},
"requestParameters": {
"sourceIPAddress": "XX.XX.XX.XX"
},
"responseElements": {
"x-amz-request-id": "8CXXXXXXXXXXHRQX",
"x-amz-id-2": "doZ3+gxxxxxxx"
},
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "X-Event",
"bucket": {
"name": "bucket-name",
"ownerIdentity": {
"principalId": "xxxxxxx"
},
"arn": "arn:aws:s3:::bucket-name"
},
"object": {
"key": "object.png",
"sequencer": "0060XXXXXXX75X"
}
}
}
]
}

Since this is a string, use the json.loads function from the built-in json module.
import json
json_string = '{"Records": [{"s3": {"bucket": {"name": "bucket-name"}}}]}'  # your JSON string (abbreviated here)
parsed_string = json.loads(json_string)
print(parsed_string)  # it will be a Python dict
print(parsed_string['Records'][0]['s3']['bucket']['name'])  # prints bucket-name

Have you tried running your example? If you're loading the json from elsewhere, you'd need to convert it to this native dictionary object using the json library (as mentioned by others, json.loads(data))
kv = {
"Records": [
{
"eventName": "xxxxxxx",
"userIdentity": {
"principalId": "AWS:XXXXXXXXXXXXXX"
},
"requestParameters": {
"sourceIPAddress": "XX.XX.XX.XX"
},
"responseElements": {
"x-amz-request-id": "8CXXXXXXXXXXHRQX",
"x-amz-id-2": "doZ3+gxxxxxxx"
},
"s3": {
"s3SchemaVersion": "1.0",
"configurationId": "X-Event",
"bucket": {
"name": "bucket-name",
"ownerIdentity": {
"principalId": "xxxxxxx"
},
"arn": "arn:aws:s3:::bucket-name"
},
"object": {
"key": "object.png",
"sequencer": "0060XXXXXXX75X"
}
}
}
]
}
print("RESULT:",kv['Records'][0]['s3']['bucket']['name'])
RESULT: bucket-name
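
If the event can carry several records, or records without an s3 section, direct indexing raises KeyError or IndexError. A small sketch of a more defensive variant follows. The helper name bucket_names and the second, s3-less record are assumptions added for illustration.

```python
import json

# Abbreviated event: the first record mirrors the question, the second one
# is a made-up record without an "s3" section.
event_json = """
{
  "Records": [
    {"eventName": "xxxxxxx", "s3": {"bucket": {"name": "bucket-name"}}},
    {"eventName": "yyyyyyy"}
  ]
}
"""


def bucket_names(event):
    """Return the bucket name of every record that has one."""
    names = []
    for record in event.get("Records", []):
        # Chained .get() calls return None instead of raising on missing keys.
        name = record.get("s3", {}).get("bucket", {}).get("name")
        if name is not None:
            names.append(name)
    return names


event = json.loads(event_json)
print(bucket_names(event))
```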

Python: JSON to CSV

I am receiving a JSON file from a Docparser API, which I would like to convert to a CSV document.
The structure is here below:
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"document_id": {
"type": "string"
},
"remote_id": {
"type": "string"
},
"file_name": {
"type": "string"
},
"page_count": {
"type": "integer"
},
"uploaded_at": {
"type": "string"
},
"processed_at": {
"type": "string"
},
"table_data": [
{
"type": "array",
"items": {
"type": "object",
"properties": {
"account_ref": {
"type": "string"
},
"client": {
"type": "string"
},
"transaction_type": {
"type": "string"
},
"key_4": {
"type": "string"
},
"date_yyyymmdd": {
"type": "string"
},
"amount_excl": {
"type": "string"
}
},
"required": [
"account_ref",
"client",
"transaction_type",
"key_4",
"date_yyyymmdd",
"amount_excl"
]
}
}
]
}
}
The first problem that I have is how to only work with the table_data section?
My second problem is writing the actual code that allows me to put each section, i.e. account_ref, client, etc., into their own columns. I had so many changes to my code, the output varied from adding the properties into columns and dumping the table_data part into one cell, to only printing the headers into a single cell (as a list).
Here's my current code (which is not working correctly):
import pydocparser
import json
import pandas as pd
parser = pydocparser.Parser()
parser.login('API')
data2 = str(parser.fetch("Name of Parser", 'documentID'))
data2 = str(data2).replace("'", '"') # I had to put this in because it kept saying that it needs double quotes.
y = json.loads(str(data2))
json_file = open(r"C:\File.json", "w")
json_file.write(str(y))
json_file.close()
df1 = df = pd.DataFrame({str(y)})
df1.to_csv(r"C:\jsonCSV.csv")
Thanks for your help!

pandas has a nice built-in function, pandas.json_normalize(), which should split the nested fields into columns nicely.
If you're using a pandas version lower than 1.0.0, use pandas.io.json.json_normalize() instead.
Read more about it here:
pandas >= 1.0.0:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
pandas < 1.0.0:
https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.io.json.json_normalize.html
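
As a pandas-free alternative for the first part of the question (working only with table_data), the standard csv module can write one row per table entry. This is a sketch under assumptions: the question shows a schema rather than real data, so the document dictionary below is invented to match that schema, and the helper name table_data_to_csv is hypothetical.

```python
import csv
import io

# Invented sample shaped like the schema in the question: a few top-level
# fields plus a "table_data" list with one dict per table row.
document = {
    "id": "abc123",
    "file_name": "statement.pdf",
    "table_data": [
        {"account_ref": "A1", "client": "Acme", "transaction_type": "INV",
         "key_4": "", "date_yyyymmdd": "20200131", "amount_excl": "100.00"},
        {"account_ref": "B2", "client": "Globex", "transaction_type": "CRN",
         "key_4": "x", "date_yyyymmdd": "20200215", "amount_excl": "-25.50"},
    ],
}

# Column order taken from the "required" list in the schema.
COLUMNS = ["account_ref", "client", "transaction_type",
           "key_4", "date_yyyymmdd", "amount_excl"]


def table_data_to_csv(doc, out):
    """Write only the table_data rows of doc to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(doc["table_data"])


buffer = io.StringIO()
table_data_to_csv(document, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass open(path, "w", newline="") instead of the StringIO buffer. With pandas available, the same rows could be loaded with json_normalize using record_path="table_data".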

How to model a complex json file as a python class

Are there any Python helper libraries I can use to create models for generating complex JSON files such as the one below? I've read about colander, but I'm not sure it does what I need. The tricky bit is that the trigger-rule section may have nested match rules, as described at https://github.com/adnanh/webhook/wiki/Hook-Rules
[
{
"id": "webhook",
"execute-command": "/home/adnan/redeploy-go-webhook.sh",
"command-working-directory": "/home/adnan/go",
"pass-arguments-to-command":
[
{
"source": "payload",
"name": "head_commit.id"
},
{
"source": "payload",
"name": "pusher.name"
},
{
"source": "payload",
"name": "pusher.email"
}
],
"trigger-rule":
{
"and":
[
{
"match":
{
"type": "payload-hash-sha1",
"secret": "mysecret",
"parameter":
{
"source": "header",
"name": "X-Hub-Signature"
}
}
},
{
"match":
{
"type": "value",
"value": "refs/heads/master",
"parameter":
{
"source": "payload",
"name": "ref"
}
}
}
]
}
}
]

Define a class like this:
class AttributeDictionary(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
When you load your JSON, pass AttributeDictionary as the object_hook:
import json
data = json.loads(json_str, object_hook=AttributeDictionary)
Then you can access dict entries by specifying the key as an attribute:
print(data[0].id)
Output
webhook
Note: Attribute access only works for keys that are valid Python identifiers, so keys containing dashes (such as execute-command) need their dashes replaced with underscores first. Until then, those keys are still reachable with normal indexing, e.g. data[0]['execute-command'].
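
The dash replacement mentioned in the note can be done inside the object_hook itself. The sketch below is one possible way, not the answer author's code: the function name underscore_keys is an assumption, the JSON string is an abbreviated version of the hook file from the question, and __getattr__ is written out so that a missing key raises AttributeError rather than KeyError.

```python
import json


class AttributeDictionary(dict):
    """dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            # Attribute protocols (hasattr, copy, etc.) expect AttributeError.
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


def underscore_keys(obj):
    """object_hook: rename 'some-key' to 'some_key' and wrap the dict."""
    return AttributeDictionary((k.replace("-", "_"), v) for k, v in obj.items())


# Abbreviated version of the hook definition from the question.
json_str = """
[
  {
    "id": "webhook",
    "execute-command": "/home/adnan/redeploy-go-webhook.sh",
    "pass-arguments-to-command": [
      {"source": "payload", "name": "head_commit.id"}
    ]
  }
]
"""

hooks = json.loads(json_str, object_hook=underscore_keys)
print(hooks[0].id)
print(hooks[0].execute_command)
print(hooks[0].pass_arguments_to_command[0].name)
```

Note that the renaming is one-way: dumping hooks back with json.dumps produces underscore keys, so a reverse mapping would be needed to regenerate the original file format.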

How to modify nested JSON with python

I need to update (CRUD) a nested JSON file using Python. I want to be able to call Python functions that create, update or delete entries and then write the result back to the JSON file.
Here is a sample file.
I am looking at the remap library, but I'm not sure if it will work.
{
"groups": [
{
"name": "group1",
"properties": [
{
"name": "Test-Key-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value1"
}
},
{
"name": "Test-Key-Integer",
"value": {
"type": "Integer",
"data": 1000
}
}
],
"groups": [
{
"name": "group-child",
"properties": [
{
"name": "Test-Key-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value1"
}
},
{
"name": "Test-Key-Integer",
"value": {
"type": "Integer",
"data": 1000
}
}
]
}
]
},
{
"name": "group2",
"properties": [
{
"name": "Test-Key2-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value2"
}
}
]
}
]
}

I feel like I'm missing something in your question. In any event, what I understand is that you want to read a json file, edit the data as a python object, then write it back out with the updated data?
Read the json file:
import json
with open("data.json") as f:
    data = json.load(f)
That creates a dictionary (given the format you've given) that you can manipulate however you want. Assuming you want to write it out:
with open("data.json", "w") as f:
    json.dump(data, f)
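
Between the load and the dump, the edits themselves are ordinary dict and list operations. The following sketch shows one way to find a nested group by name and then create, update or delete its properties. The helper names find_group, set_property and delete_property are assumptions for illustration, and the data dictionary is a shortened copy of the sample file.

```python
import json

# Shortened copy of the sample file from the question.
data = {
    "groups": [
        {
            "name": "group1",
            "properties": [],
            "groups": [
                {
                    "name": "group-child",
                    "properties": [
                        {"name": "Test-Key-String",
                         "value": {"type": "String", "encoding": "utf-8", "data": "value1"}},
                        {"name": "Test-Key-Integer",
                         "value": {"type": "Integer", "data": 1000}},
                    ],
                }
            ],
        },
        {"name": "group2", "properties": []},
    ]
}


def find_group(node, name):
    """Depth-first search for the group called name below node."""
    for group in node.get("groups", []):
        if group["name"] == name:
            return group
        found = find_group(group, name)
        if found is not None:
            return found
    return None


def set_property(group, prop_name, value):
    """Update the property if it exists, otherwise create it."""
    for prop in group.setdefault("properties", []):
        if prop["name"] == prop_name:
            prop["value"] = value
            return
    group["properties"].append({"name": prop_name, "value": value})


def delete_property(group, prop_name):
    """Remove every property called prop_name from the group."""
    group["properties"] = [
        p for p in group.get("properties", []) if p["name"] != prop_name
    ]


child = find_group(data, "group-child")
set_property(child, "Test-Key-Integer", {"type": "Integer", "data": 2000})  # update
set_property(child, "New-Key", {"type": "String", "encoding": "utf-8", "data": "hello"})  # create
delete_property(child, "Test-Key-String")  # delete

updated_json = json.dumps(data, indent=2)
print(updated_json)
```

Because find_group returns a reference into data, the changes are made in place and end up in the file when data is dumped.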
