I have a .txt with JSON formatted content, that I would like to read, convert it to a JSON object and then log the result. I could read the file and I'm really close, but unfortunately json_data is a string object instead of a JSON object/dictionary. I assume it's something trivial, but I have no idea, because I'm new to Python, so I would really appreciate if somebody could show me the right solution.
import json
filename = 'html-json.txt'
with open(filename, encoding="utf8") as f:
    jsonContentTxt = f.readlines()
json_data = json.dumps(jsonContentTxt)
print (json_data)
You may want to consult the docs for the json module. The Python docs are generally pretty great and this is no exception.
f.readlines() reads the file that f points to (in your case, html-json.txt) and returns its lines as a list of strings, not as a single string. Passing that list to json.dumps then serializes it into one JSON-encoded string, which is why json_data ends up as a str rather than a dict.
If you simply want to print the file's contents, you could just print jsonContentTxt. On the other hand, if you want to load that JSON into a Python data structure, manipulate it, and then output it, you could do something like this (which uses json.load, a function that takes a file-like object and returns an object such as a dict or list depending on the JSON):
with open(filename, encoding="utf8") as f:
    json_content = json.load(f)

# do stuff with json_content, e.g. json_content['foo'] = 'bar'
# then when you're ready to output:
print(json.dumps(json_content))
You may also want to use the indent argument to json.dumps, which will give you a nicely formatted string.
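To make the difference concrete, here is a minimal sketch that compares the two approaches. The sample file contents and the temporary path are made up for illustration; only the json and file calls mirror the ones discussed above.

```python
import json
import os
import tempfile

# Hypothetical sample file standing in for html-json.txt.
path = os.path.join(tempfile.mkdtemp(), "html-json.txt")
with open(path, "w", encoding="utf8") as f:
    f.write('{"title": "Hello",\n "tags": ["a", "b"]}\n')

# readlines() gives a list of str; dumps() turns that list into one str.
with open(path, encoding="utf8") as f:
    lines = f.readlines()
as_string = json.dumps(lines)

# json.load() parses the file and gives a Python object (here a dict).
with open(path, encoding="utf8") as f:
    parsed = json.load(f)

print(type(lines).__name__, type(as_string).__name__, type(parsed).__name__)
print(json.dumps(parsed, indent=4))
```

The first print shows the three types side by side, and the second shows the effect of indent on the parsed data.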
See the json module documentation for Python 2.7 or 3.5. In short:
json.loads(json_as_string) # Deserializes a string to a JSON hierarchy
Once you have a deserialized form you can convert it back to a JSON string with dumps:
json.dumps(json_as_hierarchy)
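A short round trip may help show how the two calls fit together. The input string below is a hypothetical example, not taken from the question.

```python
import json

json_as_string = '{"name": "example", "values": [1, 2, 3]}'  # hypothetical input

# loads: JSON text -> Python objects (dict, list, str, int, ...)
json_as_hierarchy = json.loads(json_as_string)
json_as_hierarchy["values"].append(4)

# dumps: Python objects -> JSON text
round_tripped = json.dumps(json_as_hierarchy)
print(round_tripped)
```

Note that the functions ending in "s" work on strings, while json.load and json.dump take a file object instead.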
I'm trying to convert a dict that I can't serialize to string type and write it to a json file. However, when using the dumps method, pretty printing does not occur in the json file.
data = ''
for db_name in client.list_database_names():
    db = client[db_name]
    for coll_name in db.list_collection_names():
        data += str("DATABASE NAME: {}, Collection:{}".format(db_name, coll_name))
data = json.dumps(data, default=str)
json.loads(data)
return data
Here is the result: JSON Image
You need to use the indent argument in json.dumps() to create the pretty effect.
with open('filename.json', 'w') as f:
    f.write(json.dumps(data, indent=4))
To get pretty printing using json.dumps() you need to include a parameter like indent=4, as described in the json module docs.
Update, after seeing the image:
The problem you have here is that in your JSON, DbCollectionName is a string that contains more JSON. This is "Nested JSON". You need to call json.loads() on each of those strings to convert them to objects.
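As a rough sketch of that second decoding step: the document below is invented for illustration (the real structure is only visible in the image), and it assumes DbCollectionName holds a single JSON-encoded string.

```python
import json

# Hypothetical document: the value of DbCollectionName is itself JSON text.
outer_text = json.dumps({
    "DbCollectionName": json.dumps({"database": "shop", "collection": "orders"})
})

outer = json.loads(outer_text)
# The first pass leaves the inner value as a str, so decode it a second time.
if isinstance(outer["DbCollectionName"], str):
    outer["DbCollectionName"] = json.loads(outer["DbCollectionName"])

pretty = json.dumps(outer, indent=4)
print(pretty)
```

Once the inner string has been decoded, indent applies to the nested object as well and the escaped quotes disappear from the output.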
So I am working on a conversion file that is taking a dictionary and converting it to a JSON file. Current code looks like:
data = {json_object}
json_string = jsonpickle.encode(data)
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(json_string, outfile)
But when I go to open that rendered file, it is adding three \ on the front and back of each string.
ps: sorry if I am using the wrong terminology, I am still new to python and don't know the vocabulary that well yet.
Try this
import json
data = {"k": "v"}
with open('path_to_file.json', 'w') as f:
    json.dump(data, f)
You don't need to use jsonpickle to encode dict data.
json.dump is a wrapper function that first converts the data to JSON text and then writes that text to your file.
The backslashes appear because jsonpickle.encode has already turned your data into a JSON string; json.dump then encodes that string a second time, so every double quote (") inside it is written as an escape sequence (\").
Just use the following code to write dict data to json
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(data, outfile)
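The double encoding can be reproduced with the standard library alone. In this sketch json.dumps stands in for jsonpickle.encode, on the assumption (taken from the explanation above) that both return a JSON string.

```python
import json

data = {"k": "v"}

encoded_once = json.dumps(data)           # JSON text for the dict
encoded_twice = json.dumps(encoded_once)  # JSON text for a *string* holding JSON

print(encoded_once)   # {"k": "v"}
print(encoded_twice)  # "{\"k\": \"v\"}"

# Undoing the double encoding takes two decoding passes.
recovered = json.loads(json.loads(encoded_twice))
```

Encoding the dict exactly once, as in the code above, avoids the extra quotes and backslashes.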
I am using python and json to construct a json file. I have a string, 'outputString' which consists of multiple lines of dictionaries turned into jsons, in the following format:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
I would like to turn this string of jsons and write a new json file entirely, with each item still being its own line. I would like to turn the string of multiple json objects and turn it into one json file. I have attached the code on how I got outputString and what I have tried to do. Right now, the code I have writes the file, but all on one line. I would like the lines to be separated as the string is.
for value in outputList:
    newOutputString = json.dumps(value)
    outputString += (newOutputString + "\n")
with open('data.json', 'w') as outfile:
    for item in outputString.splitlines():
        json.dump(item, outfile)
        json.dump("\n",outfile)
PROBLEM: json.dump("\n", outfile) writes the JSON encoding of the newline, that is, the literal characters "\n" including the quotes, so no real line break is written and everything ends up on one line.
SOLUTION: ensure that you write a new line using python and not a json encoded string:
with open('data.json', 'a') as outfile:  # appending, so each JSON string is added on its own new line
    for item in outputString.splitlines():
        outfile.write(item)  # item is already a JSON string; json.dump would encode it a second time
        outfile.write("\n")  # write a real newline with a plain Python string, no JSON encoding needed
See comments for explanation.
Please ensure that the file you write to is empty if you just want these json objects in them.
Your rows are not valid JSON unless the property names are enclosed in double quotes.
This would be a proper json data format:
{"size":1, "title":"Hello", "space":0}
Having said that here is a solution to your question with the type of data you provided.
I am assuming your data comes like this:
outputList = ['{size:1, title:"Hello", space:0}',
'{size:21, title:"World", space:10}',
'{size:3, title:"Goodbye", space:20}']
so the only thing you need to do is write each value using the file.write() function
Python 3.6 and above:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(f"{value}\n")
Python 3.5 and below:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(value + "\n")
data.json file will look like this:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
Note: As someone already commented, your data.json file will not be a true JSON-formatted file, but it serves the purpose of your question. Enjoy! :)
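If the rows are available as Python dicts rather than pre-built strings, a variant of the same idea produces valid JSON on every line. This is only a sketch: the rows list and the temporary path are assumptions made for the example.

```python
import json
import os
import tempfile

# Hypothetical input: dicts rather than pre-built strings.
rows = [
    {"size": 1, "title": "Hello", "space": 0},
    {"size": 21, "title": "World", "space": 10},
    {"size": 3, "title": "Goodbye", "space": 20},
]

path = os.path.join(tempfile.mkdtemp(), "data.json")

# Encode each row once and end it with a real newline.
with open(path, "w") as outfile:
    for row in rows:
        outfile.write(json.dumps(row) + "\n")

# Each line is a complete JSON document, so it can be parsed on its own.
with open(path) as infile:
    loaded = [json.loads(line) for line in infile]

print(loaded)
```

The file as a whole is still not a single JSON document, but each line can be handed to json.loads separately.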
"afile" is a previously existing file.
handle=open("afile",'r+b')
data=handle.readline()
handle.close()
# signgenerator is a hashlib.md5() object
signgenerator.update(data)
hex=signgenerator.hexdigest()
print(hex) # prints out 061e3f139c80d04f039b7753de5313ce
and write this to a file
f=open("syncDB.txt",'a')
#hex=hex.encode('utf-8')
pickle.dump(hex,f)
f.close()
But when i read back the file as
while True:
    data=f.readline()
    print(data)
This gives the output:
b'\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00b857c3b319036d72cb85fe8a679531b0q\x00.\x80\x03X \x00\x00\x007532fb972cdb019630a2e5a1373fe1c5q\x00.\x80\x03X \x00\x00\x000126bb23767677d0a246d6be1d2e4d5cq\x00.'
How do I decode these bytes to get the same hexdigest back?
Also, I am getting some gibberish characters in syncDB.txt like "€X" after each line. How do I correctly write the data in a readable form?
You need to unpickle the data:
pickle.load(open('syncDB.txt', 'r+b'))
What you have there is pickled data. Proof:
>>> import pickle
>>> pickle.loads(b'\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00061e3f139c80d04f039b7753de5313ceq\x00.\x80\x03X \x00\x00\x00d9afd4bb6bc57679f6b10c0b9610d2e0q\x00.\x80\x03X \x00\x00\x008b70452c46285d825d3670d433151841q\x00.\x80\x03X \x00\x00\x00b857c3b319036d72cb85fe8a679531b0q\x00.\x80\x03X \x00\x00\x007532fb972cdb019630a2e5a1373fe1c5q\x00.\x80\x03X \x00\x00\x000126bb23767677d0a246d6be1d2e4d5cq\x00.')
'061e3f139c80d04f039b7753de5313ce'
But there's no point in pickling a hex string. You can just put it in the file. The pickle module should be used with more complex structures, like arrays, dicts, or even classes.
Don't pickle the hexdigest, just write it out as text.
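with open("afile", 'rb') as handle:
    data = handle.readline()
signgenerator.update(data)  # signgenerator is a hashlib.md5() object
hex = signgenerator.hexdigest()
with open("syncDB.txt", 'a') as f:  # text mode, because hexdigest() returns a str
    f.write(hex + '\n')
with open("syncDB.txt", 'r') as f:
    for data in f:
        print(data.rstrip('\n'))  # drop the stored newline before printing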
If you really want to use pickle, you need to use the pickle.load function to read the data back from the file.
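Because the file was appended to several times, it holds several pickled records, and a single pickle.load call returns only the first one. One possible way to read them all is to keep loading until the file runs out. The helper name load_all and the temporary path are made up for this sketch.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "syncDB.txt")
digests = ["061e3f139c80d04f039b7753de5313ce",
           "d9afd4bb6bc57679f6b10c0b9610d2e0"]

# Pickle needs binary mode; each dump() appends one record.
for digest in digests:
    with open(path, "ab") as f:
        pickle.dump(digest, f)

def load_all(filename):
    """Return every object pickled into filename, in order."""
    items = []
    with open(filename, "rb") as f:
        while True:
            try:
                items.append(pickle.load(f))
            except EOFError:  # raised when no more records remain
                break
    return items

print(load_all(path))
```

The result is a plain list of the original hex strings, in the order they were written.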
I am using json and jsonpickle sometimes to serialize objects to files, using the following function:
def json_serialize(obj, filename, use_jsonpickle=True):
    f = open(filename, 'w')
    if use_jsonpickle:
        import jsonpickle
        json_obj = jsonpickle.encode(obj)
        f.write(json_obj)
    else:
        simplejson.dump(obj, f)
    f.close()
The problem is that if I serialize a dictionary for example, using "json_serialize(mydict, myfilename)" then the entire serialization gets put on one line. This means that I can't grep the file for entries to be inspected by hand, like I would a CSV file. Is there a way to make it so each element of an object (e.g. each entry in a dict, or each element in a list) is placed on a separate line in the JSON output file?
thanks.
(simple)json.dump() has the indent argument. jsonpickle probably has something similar, or in the worst case you can decode it and encode it again.
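For the plain json path, a minimal sketch of the indent argument looks like this. It uses the standard-library json module rather than simplejson, and the dictionary and temporary path are invented for the example.

```python
import json
import os
import tempfile

mydict = {"beta": [1, 2], "alpha": "x"}  # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "out.json")

# indent puts every entry on its own line; sort_keys keeps the order stable.
with open(path, "w") as f:
    json.dump(mydict, f, indent=4, sort_keys=True)

with open(path) as f:
    lines = f.read().splitlines()
print("\n".join(lines))
```

With one entry per line, tools such as grep can match individual keys or values in the output file.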
jsonpickle uses one of the json backends, so you can try adding this to your code:
jsonpickle.set_encoder_options('simplejson', sort_keys=True, indent=4)
Update: simplejson has been incorporated into the Python standard library as json, so just replace simplejson with json and you'll get the pretty-printed (formatted, non-minified) JSON:
jsonpickle.set_encoder_options('json', sort_keys=True, indent=4)