save JSON to a file, load it and turn it back into an object - Python

I'm currently building an application in Python where I have a class Corpus. I would like to convert this class to JSON, save it to a JSON file, then load the file and finally turn the JSON back into its Corpus class.
To do that I use the jsonpickle library. The problem is that when I load the JSON, its type is a dictionary, while jsonpickle.decode expects a string. I tried to convert the dictionary to a string, but it's not working.
I hope someone will be able to help me.
Here is the code of my class "Json" (used to save and load my Corpus):
import json, jsonpickle

class Json:
    def __init__(self):
        self.corpus = {}

    def saveCorpus(self, corpus):
        jsonCorpus = jsonpickle.encode(corpus, indent=4, make_refs=False)
        with open('json_data.json', 'w') as outfile:
            outfile.write(jsonCorpus)

    def loadCorpus(self):
        with open('json_data.json', 'r') as f:
            self.corpus = json.load(f)

    def getCorpus(self):
        return self.corpus
Error:
TypeError: the JSON object must be str, bytes or bytearray, not dict

I found the problem.
The issue was the way I was saving my json to a file.
Here is the solution:
def saveCorpus(self, corpus):
    jsonCorpus = jsonpickle.encode(corpus, indent=4, make_refs=False)
    with open('json_data.json', 'w') as outfile:
        json.dump(jsonCorpus, outfile)
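Note that json.dump() here JSON-encodes the already-encoded string a second time, which is why json.load() later returns a plain string that jsonpickle.decode() accepts. An alternative that avoids the double encoding is to write the encoded string directly and decode it with jsonpickle when loading; a minimal sketch:
import jsonpickle

class Json:
    def saveCorpus(self, corpus):
        # jsonpickle.encode already returns a JSON string, so write it as-is
        with open('json_data.json', 'w') as outfile:
            outfile.write(jsonpickle.encode(corpus, indent=4, make_refs=False))

    def loadCorpus(self):
        # read the file back as a string and let jsonpickle rebuild the object
        with open('json_data.json', 'r') as f:
            self.corpus = jsonpickle.decode(f.read())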

Related

Pandas to JSON file formatting issue, adding \ to strings

I am using the pandas.DataFrame.to_json to convert a data frame to JSON data.
data = df.to_json(orient="records")
print(data)
This works fine and the output when printing is as expected in the console.
[{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040655,"u":"V","v":237.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:power","bt":1610040836,"u":"W","v":512.3},
{"n":"f89be390-5706-4ef5-a110-23f1657f4aec:voltage","bt":1610040840,"u":"V","v":238.4}]
The problem comes when uploading it to an external API (which converts it to a file) or when writing it to a file locally: the output has \ added at the beginning and end of each string.
def dataToFile(processedData):
    with open('data.json', 'w') as outfile:
        json.dump(processedData, outfile)
The result is shown in the snippet below:
[{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024746,\"u\":\"W\",\"v\":40.3},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:voltage\",\"bt\":1610024751,\"u\":\"V\",\"v\":238.5},
{\"n\":\"f1097ac5-0ee4-48a4-8af5-bf2b58f3268c:power\",\"bt\":1610024764,\"u\":\"W\",\"v\":39.7}]
Is there any formatting specifically I should be including/excluding when converting the data to a file format?
Your data variable is a string of JSON data, not an actual dictionary. You can do a few things:
Use DataFrame.to_json() to write the file directly; the first argument of to_json() is the file path:
df.to_json('./data.json', orient='records')
Write the json string directly as text:
def write_text(text: str, path: str):
    with open(path, 'w') as file:
        file.write(text)

data = df.to_json(orient="records")
write_text(data, './data.json')
If you want to play around with the dictionary data:
def write_json(data, path, indent=4):
    with open(path, 'w') as file:
        json.dump(data, file, indent=indent)

df_data = df.to_dict(orient='records')
# ...some operations here...
write_json(df_data, './data.json')
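If the string from to_json() is already in hand (as in the question's dataToFile()), another option, sketched here rather than taken from the original answer, is to parse it back to Python objects before dumping, so the backslashes never appear:
import json

def dataToFile(processedData):
    # processedData is the JSON string from df.to_json(); json.loads turns it
    # back into a list of dicts so json.dump doesn't escape it a second time
    with open('data.json', 'w') as outfile:
        json.dump(json.loads(processedData), outfile)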

File handling with functions?

So I got this code that is supposed to sort a dictionary within a json file alphabetically by key:
import json

def values(infile,outfile):
    with open(infile):
        data=json.load(infile)
    data=sorted(data)
    with open(outfile,"w"):
        json.dump(outfile,data)

values("values.json","values_out.json")
And when I run it I get this error:
AttributeError: 'str' object has no attribute 'read'
I'm pretty sure I messed something up when I made the function but I don't know what.
EDIT: This is what the json file contains:
{"two": 2,"one": 1,"three": 3}
You are using the strings infile and outfile in your json calls; you need to use the file object, which you get with the as keyword:
def values(infile, outfile):
    with open(infile) as fic_in:
        data = json.load(fic_in)
    data = sorted(data)
    with open(outfile, "w") as fic_out:
        json.dump(data, fic_out)
You can group the with statements:
def values(infile, outfile):
    with open(infile) as fic_in, open(outfile, "w") as fic_out:
        json.dump(sorted(json.load(fic_in)), fic_out)
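Note that sorted() on a dict returns only its keys, so the snippets above write a sorted list of keys rather than a sorted dictionary. If the goal is to keep the key-value pairs, a small sketch (my variant, not from the original answers):
import json

def values(infile, outfile):
    with open(infile) as fic_in, open(outfile, "w") as fic_out:
        data = json.load(fic_in)
        # dicts preserve insertion order in Python 3.7+, so sorting the
        # (key, value) pairs first yields a key-sorted dictionary
        json.dump(dict(sorted(data.items())), fic_out)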
You forgot to assign the file you opened to a variable. In your current code you open a file but then try to load the filename rather than the actual file. This version should run because the file objects are assigned to my_file and my_out (note that json.dump takes the data first, then the file):
import json

def values(infile, outfile):
    with open(infile) as my_file:
        data = json.load(my_file)
    data = sorted(data)
    with open(outfile, "w") as my_out:
        json.dump(data, my_out)

values("values.json", "values_out.json")

Read JSON file correctly

I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence.
This is how I try to do it:
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")
My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?
Thanks in advance for your time!
Best,
Julia
With json.load() you don't need to read each line; you can do either of these:
import json

def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)

data = open_json('./1.0alpha7.dev.json')
Or, even cooler, you can GET the JSON straight from GitHub:
import json
import requests
url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()
Both approaches give the same output: the data variable will be a list of dictionaries that you can iterate over in a for loop for your further processing.
Your code is reading one line at a time and parsing each line individually as JSON. Unless the creator of the file wrote it one JSON object per line (which, given its .json extension, is unlikely), that won't work: JSON does not use line breaks to indicate the end of an object.
Load the whole file content as JSON instead, then process the resulting items in the array.
def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
The label appears to be buried in item["interaction"].
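A sketch of that, assuming each entry in item["interaction"] is a dict with a "label" key (the exact schema should be checked against the dataset):
import json

def _read(file_path):
    with open(file_path, "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]
        # assumption: interactions are dicts that may carry a "label" key
        for interaction in item.get("interaction", []):
            label = interaction.get("label")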

how to clean a JSON file and store it to another file in Python

I am trying to read a JSON file with Python. This file is described by the authors as not strict JSON. In order to convert it to strict JSON, they suggest this approach:
import gzip, json

def parse(path):
    g = gzip.open(path, 'r')
    for l in g:
        yield json.dumps(eval(l))
However, not being familiar with Python, I am able to execute the script but not to produce any output file with the new clean JSON. How should I modify the script so that it produces a new JSON file? I have tried this:
import json

class Amazon():
    def parse(self, inpath, outpath):
        g = open(inpath, 'r')
        out = open(outpath, 'w')
        for l in g:
            yield json.dumps(eval(l), out)

amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
but the output is an empty file. Any help is more than welcome.
Your parse() is a generator (it uses yield), so nothing runs until something iterates over it, which is why the output file stays empty. Write to the output file directly instead:
import json

class Amazon():
    def parse(self, inpath, outpath):
        g = open(inpath, 'r')
        with open(outpath, 'w') as fout:
            for l in g:
                # the newline keeps one JSON object per line in the output
                fout.write(json.dumps(eval(l)) + '\n')

amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
Another, shorter way of doing this:
import json

class Amazon():
    def parse(self, readpath, writepath):
        with open(readpath) as g, open(writepath, 'w') as fout:
            for l in g:
                json.dump(eval(l), fout)
                fout.write('\n')

amazon = Amazon()
amazon.parse("original.json", "cleaned.json")
When handling JSON data it is better to use the json module: json.dump(obj, output_file) to dump JSON to a file and json.load(file_object) to load the data back. That way the JSON stays valid while saving and reading it.
For very large amounts of data, say 1k+ records, use the pandas module.
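For example, a sketch assuming the cleaned file holds one JSON object per line (process() is a hypothetical per-chunk handler):
import pandas as pd

# read_json with lines=True parses one JSON object per line;
# chunksize returns an iterator so the whole file never sits in memory
for chunk in pd.read_json("cleaned.json", lines=True, chunksize=10000):
    process(chunk)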

parse CSV in Flask-Admin/WTForms on model_change

Suppose the Flask-Admin view below (note I'm using flask_wtf not wtforms). I'd like to upload a csv file, and then on the model_change, parse the csv and do some stuff to it before returning the result which will then be stored into the model. However, I get the error: TypeError: coercing to Unicode: need string or buffer, FileField found
from flask_wtf.file import FileField

class myView(ModelView):
    [...]

    def scaffold_form(self):
        form_class = super(myView, self).scaffold_form()
        form_class.csv = FileField('Upload CSV')
        return form_class

    def on_model_change(self, form, model):
        csv = form.csv
        csv_data = self.parse_file(csv)
        model.csv_data = csv_data

    def parse_file(self, csv):
        with open(csv, 'rb') as csvfile:
            data = csv.reader(csvfile, delimiter=',')
            for row in data:
                doSomething()
When accessing csv.data, I'll get <FileStorage: u'samplefile.csv' ('text/csv')> but this object doesn't actually contain the csv's data.
Okay, after digging further into the flask_wtf module I was able to find enough to go on and get a workaround. The FileField object has a data attribute that wraps the werkzeug.datastructures.FileStorage class, which exposes the stream attribute. The docs say this typically points to an open file resource, but since the upload is handled in memory here, it is an io.BytesIO stream buffer in this case.
Attempting to open() it:
with open(field.data.stream, 'rU') as csv_data:
    [...]
will result in a TypeError: coercing to Unicode: need string or buffer, _io.BytesIO found.
BUT, csv.reader can also take a string or buffer directly, so we pass in the straight shootin' buffer to the csv.reader:
buffer = csv_field.data.stream  # csv_field is the FileField obj
csv_data = csv.reader(buffer, delimiter=',')
for row in csv_data:
    print row
I found it interesting that if you need additional coercion to/from UTF-8 Unicode, the csv examples in the docs provide a snippet on wrapping an encoder/decoder.
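A minimal sketch of that idea, assuming Python 2 where csv.reader yields rows of byte strings (the helper name is mine, not from the docs):
import csv

def unicode_rows(stream, **kwargs):
    # decode each cell from UTF-8 after parsing, loosely following the
    # encoder/decoder recipe in the Python 2 csv documentation
    for row in csv.reader(stream, **kwargs):
        yield [cell.decode('utf-8') for cell in row]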
For me, this did the trick:
def on_model_change(self, form, model):
    tweet_file = form.tweet_keywords_file
    buffer = tweet_file.data.stream
    file_data = buffer.getvalue()
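From there, the raw contents can be fed to csv.reader; a small sketch tying it back to the question's form field (doSomething is the question's hypothetical per-row handler):
import csv

def on_model_change(self, form, model):
    buffer = form.csv.data.stream       # in-memory upload buffer
    file_data = buffer.getvalue()       # entire upload as one byte string
    # csv.reader accepts any iterable of lines, so split the data manually
    for row in csv.reader(file_data.splitlines(), delimiter=','):
        doSomething(row)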
