Serializing JSON files with newlines in Python

I am using json and jsonpickle sometimes to serialize objects to files, using the following function:
def json_serialize(obj, filename, use_jsonpickle=True):
    f = open(filename, 'w')
    if use_jsonpickle:
        import jsonpickle
        json_obj = jsonpickle.encode(obj)
        f.write(json_obj)
    else:
        simplejson.dump(obj, f)
    f.close()
The problem is that if I serialize a dictionary for example, using "json_serialize(mydict, myfilename)" then the entire serialization gets put on one line. This means that I can't grep the file for entries to be inspected by hand, like I would a CSV file. Is there a way to make it so each element of an object (e.g. each entry in a dict, or each element in a list) is placed on a separate line in the JSON output file?
thanks.

(simple)json.dump() has the indent argument. jsonpickle probably has something similar, or in the worst case you can decode it and encode it again.
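As a minimal sketch of both suggestions, using only the standard json module: indent puts each entry on its own line, and a string that is already serialized (such as jsonpickle's output) can be decoded and encoded again. The sample dictionary and the file name are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample data; any JSON-serializable object works.
mydict = {"alice": 1, "bob": [2, 3], "carol": {"nested": True}}

# 1) Pass indent (and optionally sort_keys) when dumping to a file.
path = os.path.join(tempfile.mkdtemp(), "pretty.json")
with open(path, "w") as f:
    json.dump(mydict, f, indent=4, sort_keys=True)

with open(path) as f:
    pretty_lines = f.read().splitlines()

# 2) Worst case: decode an already-serialized one-line string and
#    encode it again with indentation.
one_line = json.dumps(mydict)
reindented = json.dumps(json.loads(one_line), indent=4, sort_keys=True)

print(reindented)
```

With one entry per line, something like grep '"alice"' on the output file shows just the matching entry.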

jsonpickle uses one of the JSON backends, so you can try adding this to your code:
jsonpickle.set_encoder_options('simplejson', sort_keys=True, indent=4)
Update: simplejson has been incorporated into the Python standard library as json (since Python 2.6), so just replace simplejson with json and you'll get pretty-printed (formatted, non-minified) JSON:
jsonpickle.set_encoder_options('json', sort_keys=True, indent=4)

Related

json.dumps does not pretty-print to my JSON file even though I am already using "indent=4" (JSON view added)

I'm trying to convert a dict that I can't serialize to string type and write it to a json file. However, when using the dumps method, pretty printing does not occur in the json file.
data = ''
for db_name in client.list_database_names():
    db = client[db_name]
    for coll_name in db.list_collection_names():
        data += str("DATABASE NAME: {}, Collection:{}".format(db_name, coll_name))
data = json.dumps(data, default=str)
json.loads(data)
return data
Here is the result: JSON Image
You need to use the indent argument in json.dumps() to create the pretty effect.
with open('filename.json', 'w') as f:
    f.write(json.dumps(data, indent=4))
To get pretty printing using json.dumps() you need to include a parameter like indent=4. See the docs here.
Update, after seeing the image:
The problem you have here is that in your JSON, DbCollectionName is a string that contains more JSON. This is "Nested JSON". You need to call json.loads() on each of those strings to convert them to objects.
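A minimal sketch of that fix, assuming the data looks roughly like the description above: a dictionary whose DbCollectionName values are strings that themselves contain JSON. The sample records are invented for illustration and are not taken from the asker's database.

```python
import json

# Hypothetical data: each value is a JSON document stored as a string.
data = {
    "DbCollectionName": [
        '{"database": "shop", "collection": "orders"}',
        '{"database": "shop", "collection": "users"}',
    ]
}

# Dumping as-is keeps the inner JSON as escaped strings.
escaped = json.dumps(data, indent=4)

# Decode each nested string first, then dump the real objects.
decoded = {
    key: [json.loads(item) for item in items]
    for key, items in data.items()
}
pretty = json.dumps(decoded, indent=4)
print(pretty)
```

Once the inner strings are decoded, indent applies to them too, and the backslash-escaped quotes disappear from the output.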

Writing a JSON file from dictionary, correcting the output

So I am working on a conversion file that is taking a dictionary and converting it to a JSON file. Current code looks like:
data = {json_object}
json_string = jsonpickle.encode(data)
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(json_string, outfile)
But when I go to open that rendered file, it is adding three \ on the front and back of each string.
ps: sorry if I am using the wrong terminology, I am still new to python and don't know the vocabulary that well yet.
Try this
import json
data = {"k": "v"}
with open('path_to_file.json', 'w') as f:
    json.dump(data, f)
You don't need to use jsonpickle to encode dict data.
json.dump is a convenience function that first converts the data to JSON text and then writes that text to your file.
The backslashes appear because jsonpickle.encode has already turned your data into a JSON string. When json.dump then serializes that string a second time, it wraps it in quotes and escapes every double quote (") inside it with a backslash.
Just use the following code to write dict data to json
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(data, outfile)
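The double-encoding effect can be reproduced with the standard json module alone. This is only a rough illustration: json.dumps stands in here for jsonpickle.encode, which likewise returns a string.

```python
import json

data = {"k": "v"}

# First encoding: dict -> JSON text.
json_string = json.dumps(data)

# Second encoding: the text is serialized again as a JSON *string*,
# so it gets wrapped in quotes and its inner quotes are escaped.
double_encoded = json.dumps(json_string)

print(json_string)
print(double_encoded)

# Decoding once gives back the string; decoding twice gives the dict.
once = json.loads(double_encoded)
twice = json.loads(once)
```

A file that already contains such double-encoded text can be recovered by decoding it twice, but it is simpler to encode only once in the first place.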

Save/Load a Dictionary

I've found a couple of others asking for help with this, but not specifically what I'm trying to do. I have a dictionary full of various formats (int, str, bool, etc) and I'm trying to save it so I can load it at a later time. Here is a basic version of the code without all the extra trappings that are irrelevant for this.
petStats = { 'name':"", 'int':1, 'bool':False }
def petSave(pet):
    with open(pet['name']+".txt", "w+") as file:
        for k,v in pet.items():
            file.write(str(k) + ':' + str(v) + "\n")
def digimonLoad(petName):
    dStat = {}
    with open(petName+".txt", "r") as file:
        for line in file:
            (key, val) = line.split(":")
            dStat[str(key)] = val
    print(petName,"found. Loading",petName+".")
    return dStat
In short, I'm just brute-forcing it by saving a text file with a Key:Value pair on each line, then splitting them all back up on load. Unfortunately this turns all of my int and bool values into strings. Is there a file format I could use to save a dictionary to (I don't need to be able to read it, but the convenience would be nice) that I could easily load back in?
This works for a basic dictionary but if I start adding things like arrays this is going to get out of hand as it is.
Use the json module.
import json
def save_pet(pet):
    filename = pet['name'] + '.json'  # or whatever filename you want
    with open(filename, 'w') as f:
        json.dump(pet, f)
def load_pet(filename):
    with open(filename) as f:
        return json.load(f)
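A quick round trip shows that JSON keeps int, bool, and list values as their own types, unlike the Key:Value text file. This is a sketch under some assumptions: the pet data and the temporary path are invented, and the functions here take an explicit filename, which is a small variation on the answer's code.

```python
import json
import os
import tempfile

def save_pet(pet, filename):
    with open(filename, "w") as f:
        json.dump(pet, f, indent=4)

def load_pet(filename):
    with open(filename) as f:
        return json.load(f)

# Hypothetical pet with mixed value types, including a list.
pet_stats = {"name": "Rex", "int": 1, "bool": False, "moves": ["bite", "nap"]}

path = os.path.join(tempfile.mkdtemp(), pet_stats["name"] + ".json")
save_pet(pet_stats, path)
loaded = load_pet(path)

print(loaded)
```

Two caveats: JSON has no tuple type, so tuples come back as lists, and dictionary keys are always written as strings.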
Use pickle. This is part of the standard library, so you can just import it.
import pickle
pet_stats = {'name':"", 'int':1, 'bool':False}
def pet_save(pet):
    with open(pet['name'] + '.pickle', 'wb') as f:
        pickle.dump(pet, f, pickle.HIGHEST_PROTOCOL)
def digimon_load(pet_name):
    with open(pet_name + '.pickle', 'rb') as f:
        return pickle.load(f)
Pickle works on more data types than JSON, and automatically loads them as the right Python type. (There are ways to save more types with JSON, but it takes more work.) JSON (or XML) is better if you need the output to be human-readable, or need to share it with non-Python programs, but neither appears to be necessary for your use case. Pickle will be easiest.
If you need to see what's in the file, just load it using Python or
python -m pickle foo.pickle
instead of a text editor. (Only do this with pickle files from sources you trust: loading a pickle can execute arbitrary code, so the format is not secure against malicious data.)
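To illustrate the "more data types" point, here is a small sketch with a tuple and a set, two types JSON cannot represent directly. The dictionary contents and the temporary path are invented for the example.

```python
import json
import os
import pickle
import tempfile

# Hypothetical stats using types that JSON has no equivalent for.
pet_stats = {"name": "Rex", "position": (3, 4), "tags": {"fast", "loud"}}

path = os.path.join(tempfile.mkdtemp(), "Rex.pickle")
with open(path, "wb") as f:
    pickle.dump(pet_stats, f, pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:  # only unpickle files you trust
    loaded = pickle.load(f)

# The same dictionary cannot be dumped by json without extra work,
# because a set is not JSON serializable.
try:
    json.dumps(pet_stats)
    json_ok = True
except TypeError:
    json_ok = False

print(loaded, json_ok)
```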
Q: Is there a file format I could use to save a dictionary to load back in?
A: Yes, there are many. XML and JSON come immediately to mind.
For example:
jsonfile.txt
{
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
Here's an example reading the file into a dictionary:
import json
with open('jsonfile.txt', 'r') as json_file:
    data = json.load(json_file)
... and an example writing the dictionary to JSON:
import json
with open('jsonfile.txt', 'w') as fp:
    fp.write(json.dumps(data))
If you prefer XML, there are many libraries, including xmltodict:
import xmltodict
with open('path/to/file.xml') as fd:
    doc = xmltodict.parse(fd.read())
There are two useful words that you may not know about yet: serialization and pickle.
Serialization refers to the process of converting a data structure (like your dictionary) to a stream of bytes that can be written to storage, and later retrieved from storage to recreate that data structure. This is a common task and your intuition is correct: trying to do this all by yourself will quickly get out of hand.
Pickle is the standard Python module for implementing serialization. It’s easy to use, mature and works with a large set of Python data types. You can read more about pickle here: https://docs.python.org/3/library/pickle.html

Can't read JSON from .txt and convert back to JSON object

I have a .txt with JSON formatted content, that I would like to read, convert it to a JSON object and then log the result. I could read the file and I'm really close, but unfortunately json_data is a string object instead of a JSON object/dictionary. I assume it's something trivial, but I have no idea, because I'm new to Python, so I would really appreciate if somebody could show me the right solution.
import json
filename = 'html-json.txt'
with open(filename, encoding="utf8") as f:
    jsonContentTxt = f.readlines()
    json_data = json.dumps(jsonContentTxt)
    print (json_data)
You may want to consult the docs for the json module. The Python docs are generally pretty great and this is no exception.
f.readlines() reads the file that f points to (in your case, html-json.txt) and returns its lines as a list of strings, not as a single string. So jsonContentTxt is a list of lines of JSON text, and json.dumps(jsonContentTxt) just encodes that list as a JSON array of strings.
If you simply want to print the file's contents, you could read them with f.read() and print the result. On the other hand, if you want to load that JSON into a Python data structure, manipulate it, and then output it, you could do something like this (which uses json.load, a function that takes a file-like object and returns an object such as a dict or list depending on the JSON):
with open(filename, encoding="utf8") as f:
    json_content = json.load(f)
# do stuff with json_content, e.g. json_content['foo'] = 'bar'
# then when you're ready to output:
print(json.dumps(json_content))
You may also want to use the indent argument to json.dumps, which will give you a nicely formatted string.
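The difference between the two approaches can be checked with a small self-contained sketch. The file written here is a made-up stand-in for html-json.txt, created in a temporary directory.

```python
import json
import os
import tempfile

# Hypothetical stand-in for html-json.txt.
filename = os.path.join(tempfile.mkdtemp(), "html-json.txt")
with open(filename, "w", encoding="utf8") as f:
    f.write('{\n  "title": "Example",\n  "tags": ["a", "b"]\n}\n')

with open(filename, encoding="utf8") as f:
    lines = f.readlines()          # list of str, one per line
wrong = json.dumps(lines)          # JSON array of those raw lines

with open(filename, encoding="utf8") as f:
    json_content = json.load(f)    # parsed into a dict

json_content["foo"] = "bar"
print(json.dumps(json_content, indent=4))
```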
Read the 2.7 documentation here or the 3.5 documentation here:
json.loads(json_as_string) # Deserializes a string to a JSON hierarchy
Once you have the deserialized form, you can convert it back to a JSON string with dumps (json.dump, without the s, writes to a file object instead):
json.dumps(json_as_hierarchy)

Python: Converting Entire Directory of JSON to Python Dictionaries to send to MongoDB

I'm relatively new to Python, and extremely new to MongoDB (as such, I'll only be concerned with taking the text files and converting them). I'm currently trying to take a bunch of .txt files that are in JSON to move them into MongoDB. So, my approach is to open each file in the directory, read each line, convert it from JSON to a dictionary, and then over-write that line that was JSON as a dictionary. Then it'll be in a format to send to MongoDB
(If there's any flaw in my reasoning, please point it out)
At the moment, I've written this:
"""
Kalil's step by step iteration / write.
JSON dumps takes a python object and serializes it to JSON.
Loads takes a JSON string and turns it into a python dictionary.
So we return json.loads so that we can take that JSON string from the tweet and save it as a dictionary for Pymongo
"""
import os
import json
import pymongo
rootdir='~/Tweets'
def convert(line):
    line = file.readline()
    d = json.loads(lines)
    return d
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file, 'r')
        lines = f.readlines()
        f.close()
        f=open(file, 'w')
        for line in lines:
            newline = convert(line)
            f.write(newline)
        f.close()
But it isn't writing.
Which... As a rule of thumb, if you're not getting the effect that you're wanting, you're making a mistake somewhere.
Does anyone have any suggestions?
When you decode a json file you don't need to convert line by line as the parser will iterate over the file for you (that is unless you have one json document per line).
Once you've loaded the json document you'll have a dictionary which is a data structure and cannot be directly written back to file without first serializing it into a certain format such as json, yaml or many others (the format mongodb uses is called bson but your driver will handle the encoding for you).
The overall process to load a json file and dump it into mongo is actually pretty simple and looks something like this:
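import json
import os
from glob import glob
from pymongo import MongoClient  # Connection was removed in PyMongo 3
db = MongoClient().test
# glob() does not expand "~", so expand it explicitly
for filename in glob(os.path.expanduser('~/Tweets/*.txt')):
    with open(filename) as fp:
        doc = json.load(fp)
    db.tweets.insert_one(doc)  # save() was removed in PyMongo 4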
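If the files instead hold one JSON document per line (the exception mentioned above), each line has to be decoded separately. Here is a minimal sketch of that case using only the standard library. The helper name load_json_lines and the sample file are invented for illustration, and no MongoDB call is actually made.

```python
import json
import os
import tempfile

# Hypothetical file with one JSON document per line.
path = os.path.join(tempfile.mkdtemp(), "tweets.txt")
with open(path, "w") as fp:
    fp.write('{"id": 1, "text": "hello"}\n')
    fp.write('\n')  # blank lines are skipped below
    fp.write('{"id": 2, "text": "world"}\n')

def load_json_lines(filename):
    """Return a list of dicts, one per non-empty line."""
    docs = []
    with open(filename) as fp:
        for line in fp:
            line = line.strip()
            if line:
                docs.append(json.loads(line))
    return docs

docs = load_json_lines(path)
print(docs)
# With PyMongo, a list like this could be passed to
# collection.insert_many(docs); that call is not made here.
```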
A dictionary in Python is an object that lives within the program; you can't save it directly to a file unless you serialize it, for example by pickling it (pickling is a way to save objects in files so you can retrieve them later). I think a better approach would be to read the lines from the file, load the JSON (which converts it to a dictionary), and save that data into MongoDB right away; there is no need to write it back to a file.
