Python: JSON to CSV - python

I am receiving a JSON file from a Docparser API, which I would like to convert to a CSV document.
The structure is here below:
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"document_id": {
"type": "string"
},
"remote_id": {
"type": "string"
},
"file_name": {
"type": "string"
},
"page_count": {
"type": "integer"
},
"uploaded_at": {
"type": "string"
},
"processed_at": {
"type": "string"
},
"table_data": [
{
"type": "array",
"items": {
"type": "object",
"properties": {
"account_ref": {
"type": "string"
},
"client": {
"type": "string"
},
"transaction_type": {
"type": "string"
},
"key_4": {
"type": "string"
},
"date_yyyymmdd": {
"type": "string"
},
"amount_excl": {
"type": "string"
}
},
"required": [
"account_ref",
"client",
"transaction_type",
"key_4",
"date_yyyymmdd",
"amount_excl"
]
}
}
]
}
}
The first problem that I have is how to only work with the table_data section?
My second problem is writing the actual code that allows me to put each section, i.e. account_ref, client, etc., into their own columns. I had so many changes to my code, the output varied from adding the properties into columns and dumping the table_data part into one cell, to only printing the headers into a single cell (as a list).
Here's my current code (which is not working correctly):
import pydocparser
import json
import pandas as pd
parser = pydocparser.Parser()
parser.login('API')
data2 = str(parser.fetch("Name of Parser", 'documentID'))
data2 = str(data2).replace("'", '"') # I had to put this in because it kept saying that it needs double quotes.
y = json.loads(str(data2))
json_file = open(r"C:\File.json", "w")
json_file.write(str(y))
json_file.close()
df1 = df = pd.DataFrame({str(y)})
df1.to_csv(r"C:\jsonCSV.csv")
Thanks for your help!

Pandas has a nice built in function called pandas.json_noramlize()
If you're using pandas version lower then 1.0.0 use pandas.io.json.json_normalize(), it should split the columns nicely.
read more about it here:
>1.0.0:
https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.io.json.json_normalize.html
=<1.0.0
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html

Related

python: construct multipart/form data within defined in swagger.json

I am trying to construct a python request based on swagger.json schema. It mentioned multipart/form data and I did some research. And now the remaining issue is about type "array", not sure how to do it. Below is swagger.json schema.
"requestBody": {
"required": true,
"content": {
"multipart/form-data": {
"schema": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"file": {
"items": {
"type": "string",
"format": "binary"
},
"type": "array"
}
},
"required": [
"id",
"name",
"file"
]
}
}
}
}
I found files parameter in python requests module could do the multiform(How to send a "multipart/form-data" with requests in python?), but I don't know how to do the 'file' part which is an array here...if it is not array, just one object. I will go with 'file': ('testfile', open('testfile', 'rb')
current the UI side has not been deployed, so I cannot test. so could anyone help here? Thanks
data = {
'id' : test_id,
'name' : test_name,
'file': []
}

How to require data in the JSON list?

how does one require data in a list? How does one do this and specify the type e.g. dictionary?
Below are two JSON samples and a schema. Both JSON samples are valid according to the schema. The sample with the empty list should fail validation IMHO. How do I make that happen?
from jsonschema import validate
# this is ok per the schema
{
"mylist":[
{
"num_items":8,
"freq":8.5,
"other":2
},
{
"num_items":8,
"freq":8.5,
"other":4
}
]
}
# this should fail validation, but does not.
{
"mylist":[
]
}
# schema
{
"$schema": "http://json-schema.org/schema#",
"type": "object",
"properties": {
"mylist": {
"type": "array",
"items": {
"type": "object",
"properties": {
"num_items": {
"type": "integer"
},
"freq": {
"type": "number"
},
"other": {
"type": "integer"
}
},
"required": [
"freq",
"num_items",
"other"
]
}
}
},
"required": [
"mylist"
]
}

How to add data to a topic using AvroProducer

I have a topic with the following schema. Could someone help me out on how to add data to the different fields.
{
"name": "Project",
"type": "record",
"namespace": "abcdefg",
"fields": [
{
"name": "Object",
"type": {
"name": "Object",
"type": "record",
"fields": [
{
"name": "Number_ID",
"type": "int"
},
{
"name": "Accept",
"type": "boolean"
}
]
}
},
{
"name": "DataStructureType",
"type": "string"
},
{
"name": "ProjectID",
"type": "string"
}
]
}
I tried the following code. I get list is not iterable or list is out of range.
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer
AvroProducerConf = {'bootstrap.servers': 'localhost:9092','schema.registry.url': 'http://localhost:8081'}
value_schema = avro.load('project.avsc')
avroProducer = AvroProducer(AvroProducerConf, default_value_schema = value_schema)
while True:
avroProducer.produce(topic = 'my_topic', value = {['Object'][0] : "value", ['Object'] [1] : "true", ['DataStructureType'] : "testvalue", ['ProjectID'] : "123"})
avroProducer.flush()
It's not clear what you're expecting something like this to do... ['Object'][0] and keys of a dict cannot be lists.
Try sending this, which matches your Avro schema
value = {
'Object': {
"Number_ID", 1,
"Accept": true
},
'DataStructureType' : "testvalue",
'ProjectID' : "123"
}

Python - Parse complex JSON with objectpath

i need parse terraform file, write in JSON format. I have to extract two data, resource and id, this is example file:
{
"version": 1,
"serial": 1,
"modules": [
{
"path": [
"root"
],
"outputs": {
},
"resources": {
"aws_security_group.vpc-xxxxxxx-test-1": {
"type": "aws_security_group",
"primary": {
"id": "sg-xxxxxxxxxxxxxx",
"attributes": {
"description": "test-1",
"name": "test-1"
}
}
},
"aws_security_group.vpc-xxxxxxx-test-2": {
"type": "aws_security_group",
"primary": {
"id": "sg-yyyyyyyyyyyy",
"attributes": {
"description": "test-2",
"name": "test-2"
}
}
}
}
}
]
}
I need export for any resources, the first key and value of id, in this case, aws_security_group.vpc-xxxxxxx-test-1 sg-xxxxxxxxxxxxxx and aws_security_group.vpc-xxxxxxx-test-2 sg-yyyyyyyyyyyy
I have tried to write this in python:
#!/usr/bin/python3.6
import json
import objectpath
with open('file.json') as json_file:
data = json.load(json_file)
json_tree = objectpath.Tree(data['modules'])
result = tuple(json_tree.execute('$..resources[0]'))
result is
('aws_security_group.vpc-xxxxxxx-test-1', 'aws_security_group.vpc-xxxxxxx-test-2')
It's'ok but I can't extract the id, any help is appreciated, also use other methods
Thanks
I don't know objectpath, but I think you need:
tree.execute('$..resources[0]..primary.id')
or even just
tree.execute('$..resources[0]..id')

How to modify nested JSON with python

I need to update (CRUD) a nested JSON file using Python. To be able to call python function(s)(to update/delete/create) entires and write it back to the json file.
Here is a sample file.
I am looking at the remap library but not sure if this will work.
{
"groups": [
{
"name": "group1",
"properties": [
{
"name": "Test-Key-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value1"
}
},
{
"name": "Test-Key-Integer",
"value": {
"type": "Integer",
"data": 1000
}
}
],
"groups": [
{
"name": "group-child",
"properties": [
{
"name": "Test-Key-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value1"
}
},
{
"name": "Test-Key-Integer",
"value": {
"type": "Integer",
"data": 1000
}
}
]
}
]
},
{
"name": "group2",
"properties": [
{
"name": "Test-Key2-String",
"value": {
"type": "String",
"encoding": "utf-8",
"data": "value2"
}
}
]
}
]
}
I feel like I'm missing something in your question. In any event, what I understand is that you want to read a json file, edit the data as a python object, then write it back out with the updated data?
Read the json file:
import json
with open("data.json") as f:
data = json.load(f)
That creates a dictionary (given the format you've given) that you can manipulate however you want. Assuming you want to write it out:
with open("data.json","w") as f:
json.dump(data,f)

Categories