I have a JSON file that I load into Python. I want to take a keyword from the file (which is very big), like country rank or review, from info taken from the internet. I tried
json.load('filename.json')
but I am getting an error:
AttributeError: 'str' object has no attribute 'read'
What am I doing wrong?
Additionally, how do I select part of a json file if it is very big?
I think you need to open the file and then pass the file object to json.load, like this:
import json
from pprint import pprint
with open('filename.json') as data:
    output = json.load(data)
    pprint(output)
Try the following:
import json
json_data_file = open("json_file_path", 'r').read() # r for reading the file
json_data = json.loads(json_data_file)
Access the data using the keys as follows:
json_data['key']
json.load() expects an open file object, not a filename string, so open the file first:
with open('filename.json') as datafile:
    data = json.load(datafile)
For example, if your JSON data looked like this:
{
    "maps": [
        {
            "id": "blabla",
            "iscategorical": "0"
        },
        {
            "id": "blabla",
            "iscategorical": "0"
        }
    ],
    "masks": {
        "id": "valore"
    },
    "om_points": "value",
    "parameters": {
        "id": "valore"
    }
}
To access parts of the data, use:
data["maps"][0]["id"]
data["masks"]["id"]
data["om_points"]
That code can be found in this SO answer:
Parsing values from a JSON file using Python?
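On the second part of the question (selecting only part of a big file): json.load still parses the whole document into memory, and you then index or iterate over just the section you need. A minimal sketch, reusing the example structure above:
import json

with open('filename.json') as datafile:
    data = json.load(datafile)

# Pull out only the part you care about, e.g. every "id" inside "maps"
for entry in data["maps"]:
    print(entry["id"])
For files too large to hold in memory at once, a streaming parser (for example the third-party ijson package) is the usual alternative, but that is beyond what these answers cover.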
I am a beginner when it comes to programming. I'm trying to extract elements from a JSON log file, but I get an error and I don't know how to deal with it.
import json
with open("/Users/milosz/Desktop/logi.json") as f:
    data = json.load(f)
    print(type(data['Objects']))
    print(data)
    for object in data['Objects']:
        print(object)
Error:
File "/Users/milosz/PycharmProjects/JsonDataExtracter/Program/Python Exracter.py", line 4, in <module>
print(type(data['Objects']))
TypeError: list indices must be integers or slices, not str
Process finished with exit code 1
Here is the log:
{
    "_id": "635bd4bfc594743ce9b1a5a3",
    "dateStart": "2022-10-28T13:09:28.609Z",
    "dateFinish": "2022-10-28T13:10:23.698Z",
    "method": "customer.file.upsert",
    "request": {
        "Objects": [
            {
                "ERPId": "6915",
                "B24Id": 403772,
                "FileName": "B2B000202",
                "FileContent": "JVBERi0xLjMNJeLjz9MN",
                "B24EntityId": 3334
            }
        ]
Following up on the guidance from #accdias, here is a code snippet that closes the gaps in your JSON snippet and demonstrates how to access the Objects section:
import json

json_string = """
{
    "_id": "635bd4bfc594743ce9b1a5a3",
    "dateStart": "2022-10-28T13:09:28.609Z",
    "dateFinish": "2022-10-28T13:10:23.698Z",
    "method": "customer.file.upsert",
    "request": {
        "Objects": [
            {
                "ERPId": "6915",
                "B24Id": 403772,
                "FileName": "B2B000202",
                "FileContent": "JVBERi0xLjMNJeLjz9MN",
                "B24EntityId": 3334
            }
        ]
    }
}
"""

json_dict = json.loads(json_string)
print(json_dict["request"]["Objects"])
Output:
[{'ERPId': '6915', 'B24Id': 403772, 'FileName': 'B2B000202', 'FileContent': 'JVBERi0xLjMNJeLjz9MN', 'B24EntityId': 3334}]
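As for the original TypeError: "list indices must be integers or slices, not str" usually means the top-level value json.load returned was a JSON array (a Python list), not an object. If your real log file contains a list of entries shaped like the snippet above, index into each entry first; a hedged sketch (the file path is the one from your question, and the assumption is that the file holds a list of such entries):
import json

with open("/Users/milosz/Desktop/logi.json") as f:
    data = json.load(f)

# If the file holds a list of log entries, walk the list first,
# then drill into each entry's "request" -> "Objects" section.
for entry in data:
    for obj in entry["request"]["Objects"]:
        print(obj["FileName"], obj["B24Id"])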
I am trying to add data into a JSON key from a CSV file and maintain the original structure as is. The JSON file looks like this:
{
    "inputDocuments": {
        "gcsDocuments": {
            "documents": [
                {
                    "gcsUri": "gs://test/.PDF",
                    "mimeType": "application/pdf"
                }
            ]
        }
    },
    "documentOutputConfig": {
        "gcsOutputConfig": {
            "gcsUri": "gs://test"
        }
    },
    "skipHumanReview": false
}
The CSV file I am trying to load has the following structure; note that the mimeType is not included in the CSV file.
I already have code that can do this; however, it's a bit manual, and I am looking for a simpler approach that would just require a CSV file with the values, so this data will be added into the JSON structure. The expected outcome should look like this:
{
    "inputDocuments": {
        "gcsDocuments": {
            "documents": [
                {
                    "gcsUri": "gs://sampleinvoices/Handwritten/1.pdf",
                    "mimeType": "application/pdf"
                },
                {
                    "gcsUri": "gs://sampleinvoices/Handwritten/2.pdf",
                    "mimeType": "application/pdf"
                }
            ]
        }
    },
    "documentOutputConfig": {
        "gcsOutputConfig": {
            "gcsUri": "gs://test"
        }
    },
    "skipHumanReview": false
}
The code that I am currently using, which is a bit manual, looks like this:
import json

# function to add to JSON
def write_json(new_data, filename='keyvalue.json'):
    with open(filename, 'r+') as file:
        # load existing data into a dict.
        file_data = json.load(file)
        # Join new_data with file_data inside documents
        file_data["inputDocuments"]["gcsDocuments"]["documents"].append(new_data)
        # Sets file's current position at offset.
        file.seek(0)
        # convert back to json.
        json.dump(file_data, file, indent=4)

# python object to be appended
y = {
    "gcsUri": "gs://test/.PDF",
    "mimeType": "application/pdf"
}

write_json(y)
I would suggest something like this:
import pandas as pd
import json
from pathlib import Path

df_csv = pd.read_csv("your_data.csv")

json_file = Path("your_data.json")
json_data = json.loads(json_file.read_text())

documents = [
    {
        "gcsUri": cell,
        "mimeType": "application/pdf"
    }
    for cell in df_csv["column_name"]
]

json_data["inputDocuments"]["gcsDocuments"]["documents"] = documents

json_file.write_text(json.dumps(json_data))
Probably you should split this into separate functions, but it should communicate the general idea.
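One assumption in that snippet worth flagging: column_name is only a placeholder for whatever header your CSV actually uses for the gs:// paths. For example, if the CSV had a (hypothetical) header called fileUri, you would build the documents list from that column instead:
# Hypothetical CSV layout assumed here (header name is an assumption):
#
#   fileUri
#   gs://sampleinvoices/Handwritten/1.pdf
#   gs://sampleinvoices/Handwritten/2.pdf

documents = [
    {"gcsUri": cell, "mimeType": "application/pdf"}
    for cell in df_csv["fileUri"]
]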
I am trying to achieve the below JSON format and store it in a json file:
{
    "Name": "Anurag",
    "resetRecordedDate": false,
    "ED": {
        "Link": "google.com"
    }
}
I know how to create a simple JSON file using json.dumps, but I'm not really sure how to add a nested dictionary for one of the records within the JSON file.
Assuming the input JSON content (example.json) is
{
    "Name": "Anurag",
    "resetRecordedDate": false
}
Program
import json

# read file
with open('example.json', 'r') as infile:
    data = infile.read()

# parse file
parsed_json = json.loads(data)

# Add dictionary element
parsed_json["ED"] = {
    "Link": "google.com"
}

# print(json.dumps(parsed_json, indent=4))

# write to json
with open('data.json', 'w') as outfile:
    json.dump(parsed_json, outfile)
Output:
{
    "Name": "Anurag",
    "resetRecordedDate": false,
    "ED": {
        "Link": "google.com"
    }
}
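A small caveat from me, not part of the answer above: json.dump without an indent argument writes data.json on a single line. To get the pretty-printed layout shown in the output, pass indent when writing:
# Write the file with the same indentation as the printed example
with open('data.json', 'w') as outfile:
    json.dump(parsed_json, outfile, indent=4)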
I have a very large JSON file (about 1.5 GB) and I need to transform it into CSV.
The problem is that sometimes there's an extra field like:
[
    {
        "item": {
            "name": "something",
            "colors": {
                "color_of_something": "something",
                "color_of_something2": "something",
                "color_of_something3": "something"
            },
            "dimensions": {
                "dimensions1": "something",
                "dimensions2": "something",
                "dimensions3": "something"
            },
            "This_field_appears_sometimes": "something",
            "description": {
                "text": "something"
            }
        }
    }
]
I have this code to transform the JSON file into a CSV file:
# -*- coding: utf-8 -*-
import json, csv

with open("items.json") as file:
    data = json.load(file)

csv_data = csv.writer(open('items.csv', 'wb+'))
csv_data.writerow(['item_name', 'item_color', 'item_dimension', 'item_random_field', 'item_description'])

for json_parsed in data:
    csv_data.writerow([
        json_parsed['item']['name'],
        json_parsed['item']['colors']['color_of_something'],
        json_parsed['item']['dimensions']['dimensions1'],
        json_parsed['item']['This_field_appears_sometimes'],
        json_parsed['item']['description']['text']
    ])
When I run the task I'm getting this error:
KeyError: 'This_field_appears_sometimes'
I need some tips or advice to fix this; meanwhile I'll try whether a len check works in this code.
You can use a "safe get" like this:
json_parsed['item'].get('This_field_appears_sometimes', '')
or check with a condition whether that key is inside item:
if 'This_field_appears_sometimes' in json_parsed['item']:
The reason is that some items have no 'This_field_appears_sometimes' key.
You can use json_parsed['item'].get('This_field_appears_sometimes') or check the JSON file.
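Putting the .get() suggestion back into the loop from the question, a sketch (the empty-string default is an assumption about how missing values should appear in the CSV):
for json_parsed in data:
    item = json_parsed['item']
    csv_data.writerow([
        item['name'],
        item['colors']['color_of_something'],
        item['dimensions']['dimensions1'],
        # A missing key falls back to an empty cell instead of raising KeyError
        item.get('This_field_appears_sometimes', ''),
        item['description']['text']
    ])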
With the following simple Python script:
import json
file = 'toy.json'
data = json.loads(file)
print(data['gas']) # example
My data generates the error ...is not JSON serializable.
With this, slightly more sophisticated, Python script:
import json
import sys
#load the data into an element
data = open('transactions000000000029.json', 'r')
#dumps the json object into an element
json_str = json.dumps(data)
#load the json to a string
resp = json.loads(json_str)
#extract an element in the response
print(resp['gas'])
I get the same error.
What I'd like to do is extract all the values of a particular index, so ideally I'd like to render the input like so:
...
"hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",
"gasUsed": "21000",
"hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26"
"gasUsed": "21000"
...
The data looks like this:
{
    "blockNumber": "1941794",
    "blockHash": "0x41ee74e34cbf9ef4116febea958dbc260e2da3a6bf6f601bfaeb2cd9ab944a29",
    "hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",
    "from": "0x3c0cbb196e3847d40cb4d77d7dd3b386222998d9",
    "to": "0x2ba24c66cbff0bda0e3053ea07325479b3ed1393",
    "gas": "121000",
    "gasUsed": "21000",
    "gasPrice": "20000000000",
    "input": "",
    "logs": [],
    "nonce": "14",
    "value": "0x24406420d09ce7440000",
    "timestamp": "2016-07-24 20:28:11 UTC"
}
{
    "blockNumber": "1941716",
    "blockHash": "0x75e1602cad967a781f4a2ea9e19c97405fe1acaa8b9ad333fb7288d98f7b49e3",
    "hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26",
    "from": "0xa0480c6f402b036e33e46f993d9c7b93913e7461",
    "to": "0xb2ea1f1f997365d1036dd6f00c51b361e9a3f351",
    "gas": "121000",
    "gasUsed": "21000",
    "gasPrice": "20000000000",
    "input": "",
    "logs": [],
    "nonce": "1",
    "value": "0xde0b6b3a7640000",
    "timestamp": "2016-07-24 20:12:17 UTC"
}
What would be the best way to achieve that?
I've been thinking that perhaps the best way would be to reformat it as valid json?
Or maybe to just treat it like regex?
Your JSON file is not valid. This data should be a list of dictionaries, with each dictionary separated by a comma, like this:
[
    {
        "blockNumber": "1941794",
        "blockHash": "0x41ee74bf9ef411d9ab944a29",
        "hash": "0xf2ef9daf63",
        "from": "0x3c0cbb196e3847d40cb4d77d7dd3b386222998d9",
        "to": "0x2ba24c66cbff0bda0e3053ea07325479b3ed1393",
        "gas": "121000",
        "gasUsed": "21000",
        "gasPrice": "20000000000",
        "input": "",
        "logs": [],
        "nonce": "14",
        "value": "0x24406420d09ce7440000",
        "timestamp": "2016-07-24 20:28:11 UTC"
    },
    {
        "blockNumber": "1941716",
        "blockHash": "0x75e1602ca8d98f7b49e3",
        "hash": "0xf8f2a397b0f7bb1ff212e193c0252fab26",
        "from": "0xa0480c6f402b036e33e46f993d9c7b93913e7461",
        "to": "0xb2ea1f1f997365d1036dd6f00c51b361e9a3f351",
        "gas": "121000",
        "gasUsed": "21000",
        "gasPrice": "20000000000",
        "input": "",
        "logs": [],
        "nonce": "1",
        "value": "0xde0b6b3a7640000",
        "timestamp": "2016-07-24 20:12:17 UTC"
    }
]
Then use this to open the file:
with open('toy.json') as data_file:
    data = json.load(data_file)
You can then render the desired output like:
for item in data:
    print(item['hash'])
    print(item['gasUsed'])
If each block is valid JSON data, you can parse them separately:
import json

data = []
with open('transactions000000000029.json') as inpt:
    lines = []
    for line in inpt:
        if line.startswith('{'):  # block starts
            lines = [line]
        else:
            lines.append(line)
        if line.startswith('}'):  # block ends
            data.append(json.loads(''.join(lines)))

for block in data:
    print("hash: {}".format(block['hash']))
    print("gasUsed: {}".format(block['gasUsed']))