Pandas to JSON file formatting issue, adding \ to strings - python

I am using the pandas.DataFrame.to_json to convert a data frame to JSON data.
data = df.to_json(orient="records")
print(data)
This works fine and the output when printing is as expected in the console.
[{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040655,"u":"V","v":237.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:power","bt":1610040836,"u":"W","v":512.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040840,"u":"V","v":238.4}]
The problem comes when uploading it to an external API which converts it to a file format or writing it to a file locally. The output has added \ to the beginning and ends of strings.
def dataToFile(processedData):
    with open('data.json', 'w') as outfile:
        json.dump(processedData, outfile)
The result is shown in the clip below
[{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024746,\"u\":\"W\",\"v\":40.3},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:voltage\",\"bt\":1610024751,\"u\":\"V\",\"v\":238.5},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024764,\"u\":\"W\",\"v\":39.7}]
Is there any formatting specifically I should be including/excluding when converting the data to a file format?

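Your data variable is already a string of JSON text, not a Python list or dictionary. Passing that string to json.dump() encodes it a second time, which wraps it in quotes and escapes every inner quote with a backslash. You can do a few things: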
Use DataFrame.to_json() to write the file, the first argument of to_json() is the file path:
df.to_json('./data.json', orient='records')
Write the json string directly as text:
def write_text(text: str, path: str):
    with open(path, 'w') as file:
        file.write(text)
data = df.to_json(orient="records")
write_text(data, './data.json')
If you want to play around with the dictionary data:
import json
def write_json(data, path, indent=4):
    with open(path, 'w') as file:
        json.dump(data, file, indent=indent)
df_data = df.to_dict(orient='records')
# ...some operations here...
write_json(df_data, './data.json')
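To see where the backslashes come from, here is a minimal sketch that uses only the json module. The records list is a made-up sample, and json_text stands in for the string that df.to_json(orient="records") returns.

```python
import json

# Hypothetical sample standing in for the string returned by df.to_json(orient="records")
records = [{"n": "voltage", "bt": 1610040655, "u": "V", "v": 237.3}]
json_text = json.dumps(records)

# Encoding the string again turns the whole text into one JSON string literal,
# so every inner quote is escaped with a backslash.
double_encoded = json.dumps(json_text)
print(double_encoded)

# Encoding the Python objects themselves happens exactly once.
encoded_once = json.dumps(records)
print(encoded_once)
```

The first print shows the escaped form from the question, the second shows plain JSON.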

Related

I need help creating a simple python script that stores an attribute value from a custom json file

JSON file looks like this:
{"Clear":"Pass","Email":"noname#email.com","ID":1234}
There are hundreds of json files with different email values, which is why I need a script to run against all files.
I need to extract the value associated with the Email attribute, which is noname#email.com.
I tried using import json but I'm getting a decoder error:
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Script looks like this:
import json
json_data = json.loads("file.json")
print(json_data["Email"])
Thanks!
According to the docs, json.loads() takes a str, bytes or bytearray as argument. So if you want to load a json file this way, you should pass the content of the file instead of its path.
import json
file = open("file.json", "r") # Opens file.json in read mode
file_data = file.read()
json_data = json.loads(file_data)
file.close() # Remember to close the file after using it
You can also use json.load(), which takes a file object as its argument:
import json
file = open("file.json", "r")
json_data = json.load(file)
file.close()
Your script needs to open the file to get a file handle; then it can read the JSON.
This sample contains code that can read the JSON file. To simulate that, it uses a string that is identical to the data coming from the file.
import json
# this is to read from the real json file
# file_name = 'email.json'
# with open(file_name, 'r') as f_obj:
#     json_data = json.load(f_obj)
# this is a string that equals the result from reading json file
json_data = '{"Clear":"Pass","Email":"noname#email.com","ID":1234}'
json_data = json.loads(json_data)
print (json_data["Email"])
result: noname#email.com
import json
with open("file.json", 'r') as f:
    file_content = f.read()
# convert json to python dict
tmp = json.loads(file_content)
email = tmp["Email"]
As already pointed out in previous comments, json.loads() takes the contents of a file, not the path to it.
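Since the question mentions hundreds of files, the same idea can be wrapped in a loop. This is a minimal sketch, assuming all files sit in one folder and end in .json; the function name collect_emails and the folder name in the usage line are hypothetical.

```python
import json
from pathlib import Path

def collect_emails(folder):
    """Return {file name: Email value} for every .json file in folder."""
    emails = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # .get() returns None instead of raising if a file has no "Email" key
        emails[path.name] = data.get("Email")
    return emails
```

Calling collect_emails("json_files") would then return a dictionary that maps each file name to its email address.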

Read JSON file correctly

I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line, you can do either of these:
import json
def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)
data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can GET request the json from GitHub
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
Both give the same result: data will be a list of dictionaries that you can iterate over in a for loop for further processing.
Your code is reading one line at a time and parsing each line individually as JSON. Unless the creator of the file created the file in this format (which given it has a .json extension is unlikely) then that won't work, as JSON does not use line breaks to indicate end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
The label appears to be buried in item["interaction"].
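The difference between the two approaches can be reproduced with a tiny in-memory example. The text below is a made-up stand-in for a file that holds one JSON array spread over several lines; it is not taken from the BioRelEx dataset.

```python
import json

# Hypothetical miniature of a file that holds one JSON array over several lines
text = '[\n  {"text": "first sentence"},\n  {"text": "second sentence"}\n]'

# Parsing the whole document at once works
items = json.loads(text)
texts = [item["text"] for item in items]

# Parsing line by line fails on the very first line, which is just "["
try:
    json.loads(text.splitlines()[0])
    first_line_parses = True
except json.JSONDecodeError:
    first_line_parses = False
```

Line-by-line parsing only makes sense when every line is a complete JSON document on its own, as in the JSON Lines format.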

gzip a list of nested dictionaries

I have a group of .jsonl.gz files.
I can read them using the script:
import json
import gzip
with gzip.open(filepath, "r") as read_file:  # file path ends with .jsonl.gz
    try:
        # read gzip file which contains a list of json files (json lines)
        # each json file is a dictionary of nested dictionaries
        json_list = list(read_file)
    except:
        print("fail to read the zip")
Then I do some processing and get some .json files and store them in a list.
for num, json_file in enumerate(json_list):
    try:
        j_file = json.loads(json_file)
        # ...some code...
    except:
        print("fail")
My question is: what is the right way to write them back into a .jsonl.gz file?
This is my attempt
jsonfilename = 'valid_' + str(num) + '.jsonl.gz'
with gzip.open(jsonfilename, 'wb') as f:
    for dict in list_of_nested_dictionaries:
        content.append(json.dumps(dict).encode('utf-8'))
    f.write(content)
But I got this error:
TypeError: memoryview: a bytes-like object is required, not 'list'
Then I tried just to gzip the list of dictionaries as is:
jsonfilename = 'valid_' + str(num) + '.jsonl.gz'
with gzip.open(jsonfilename, 'wb') as f:
    f.write(json.dumps(list_of_nested_dictionaries).encode('utf-8'))
But the problem here is that it gzips the whole list as one block, so when I read it back I get a single element containing the whole stored list, not a list of JSON documents like the one I got in the first step.
This is the code that I use for reading:
with gzip.open('valid_3.jsonl.gz', "r") as read_file:
    try:
        json_list = list(read_file)  # read zip file
        print(len(json_list))  # I got 1 here
    except:
        print("fail")
json_list[0].decode('utf-8')
f.write(content) takes a byte-string, but you're passing it a list of byte-strings.
f.writelines(content) will iterate over and write each byte-string from the list.
Edit: by the way, gzip is meant for compressing a single file. If you need to compress multiple files into one, I suggest to pack them together in a tarball first and then gzip that.
The solution is simply this: end each JSON document with a newline and write the list with writelines().
content = []
with gzip.open(jsonfilename, 'wb') as f:
    for record in list_of_nested_dictionaries:
        content.append((json.dumps(record) + '\n').encode('utf-8'))
    f.writelines(content)
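For completeness, a write and read round trip could look like the sketch below. It opens the gzip file in text mode ("wt" and "rt") so no manual encode or decode is needed; the helper names write_jsonl_gz and read_jsonl_gz are made up for illustration.

```python
import gzip
import json

def write_jsonl_gz(path, records):
    """Write one JSON document per line into a gzip-compressed file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def read_jsonl_gz(path):
    """Read the file back into a list of Python objects, one per line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because json.dumps() escapes newlines inside strings, each document stays on a single line, and reading the file back yields one element per original dictionary.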

Update a JSON file with Python and keep the original format

I have a Python script that updates some value of a JSON file, and the original format of my JSON looks like:
To edit the value I use this code:
import json
status_wifi = "ok"
with open("config_wifi.json", "r") as jsonFile:
    data = json.load(jsonFile)
data['wifi_session']['status'] = status_wifi
with open("config_wifi.json", "w") as jsonFile:
    json.dump(data, jsonFile)
But when the values are updated, the format of my JSON is compressed like this:
I want the JSON file to keep its original format with all spaces and line breaks. How could I do that?
Try passing indent=4 to json.dump() (json.dumps() accepts the same argument).
Example:
import json
status_wifi = "ok"
with open("config_wifi.json", "r") as jsonFile:
    data = json.load(jsonFile)
data['wifi_session']['status'] = status_wifi
with open("config_wifi.json", "w") as jsonFile:
    json.dump(data, jsonFile, indent=4)
indent is the number of spaces used for each nesting level.
If you set this parameter, the JSON is written across multiple lines with that indentation instead of on a single line.
You can read more about it here.
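The effect of indent is easy to compare on a small in-memory object. This is only a sketch: the dictionary below is a made-up stand-in, since the real config_wifi.json is not shown in the question.

```python
import json

# Hypothetical config; the real config_wifi.json is not shown in the question
data = {"wifi_session": {"status": "ok"}}

compact = json.dumps(data)           # everything on one line
pretty = json.dumps(data, indent=4)  # one key per line, 4 spaces per level
print(compact)
print(pretty)
```

Note that indent regenerates the layout from scratch, so the output matches the original file only if that file used the same indentation style.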

nested JSON to CSV using python script

I'm new to Python and I've got a large JSON file that I need to convert to CSV. Below is a sample:
{ "status": "success","Name": "Theresa May","Location": "87654321","AccountCategory": "Business","AccountType": "Current","TicketNo": "12345-12","AvailableBal": "12775.0400","BookBa": "123475.0400","TotalCredit": "1234567","TotalDebit": "0","Usage": "5","Period": "May 11 2014 to Jul 11 2014","Currency": "GBP","Applicants": "Angel","Signatories": [{"Name": "Not Available","BVB":"Not Available"}],"Details": [{"PTransactionDate":"24-Jul-14","PValueDate":"24-Jul-13","PNarration":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"24-Jul-14","PValueDate":"23-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"22-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"21-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"20-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"}]}
I need this to show up as
name, status, location, accountcategory, accounttype, availablebal, totalcredit, totaldebit, etc as columns,
with the pcredit, pdebit, pbalance, ptransactiondate, pvaluedate and 'ptest' having new values each row as the JSON file shows
I've managed to put the script below together by looking online, but it gives me an empty CSV file at the end. What have I done wrong? I have used online JSON to CSV converters and they work, but as these are sensitive files I'm hoping to write/manage this with my own script so I can see exactly how it works. Please see below for my Python script. Can I have some advice on what to change? Thanks
import csv
import json
infile = open("BankStatementJSON1.json","r")
outfile = open("testing.csv","w")
writer = csv.writer(outfile)
for row in json.loads(infile.read()):
    writer.writerow(row)
import csv, json, sys
# use the input and output paths from the command line if both are given
if len(sys.argv) >= 3:
    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]
else:
    fileInput = "BankStatementJSON1.json"
    fileOutput = "testing.csv"
with open(fileInput, "r", encoding="utf-8") as inputFile:  # open json file
    data = json.load(inputFile)  # load json content
# the sample is a single object, so wrap it in a list to reuse the row loop
rows = data if isinstance(data, list) else [data]
with open(fileOutput, "w", newline="", encoding="utf-8") as outputFile:  # open csv file
    output = csv.writer(outputFile)  # create a csv.writer
    output.writerow(rows[0].keys())  # header row
    for row in rows:
        output.writerow(row.values())  # values row
This works for the JSON example you posted. The issue is that the data contains nested dicts, so you can't create sub-headers and sub-rows for pcredit, pdebit, pbalance, ptransactiondate, pvaluedate and ptest as you want.
You can use csv.DictWriter:
import csv
import json
with open("BankStatementJSON1.json", "r") as inputFile:  # open json file
    data = json.loads(inputFile.read())  # load json content
with open("testing.csv", "w") as outputFile:  # open csv file
    output = csv.DictWriter(outputFile, data.keys())  # create a writer
    output.writeheader()
    output.writerow(data)
Make sure you're closing the output file at the end as well.
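To get one CSV row per entry in "Details", with the top-level fields repeated on every row as the question describes, the nested list has to be flattened first. The sketch below is one possible way to do that, not the method from the answers above; the helper names flatten_statement and rows_to_csv are made up, and nested lists other than "Details" (such as "Signatories") are simply dropped.

```python
import csv
import io

def flatten_statement(statement, nested_key="Details"):
    """Return one flat dict per nested entry, repeating the top-level scalar fields."""
    # Keep only top-level values that are not lists or dicts
    base = {k: v for k, v in statement.items() if not isinstance(v, (list, dict))}
    return [{**base, **detail} for detail in statement.get(nested_key, [])]

def rows_to_csv(rows):
    """Serialise the rows as CSV text, using the union of all keys as the header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills columns that a row lacks (e.g. PNarration vs. PTest)
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With data loaded through json.load(), rows_to_csv(flatten_statement(data)) returns the CSV text, which can then be written to a file opened with newline="".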
