I've found a couple of others asking for help with this, but not specifically what I'm trying to do. I have a dictionary full of values of various types (int, str, bool, etc.) and I'm trying to save it so I can load it at a later time. Here is a basic version of the code, without all the extra trappings that are irrelevant here.
petStats = { 'name':"", 'int':1, 'bool':False }

def petSave(pet):
    with open(pet['name']+".txt", "w+") as file:
        for k,v in pet.items():
            file.write(str(k) + ':' + str(v) + "\n")
def digimonLoad(petName):
    dStat = {}
    with open(petName+".txt", "r") as file:
        for line in file:
            (key, val) = line.split(":")
            dStat[str(key)] = val
    print(petName,"found. Loading",petName+".")
    return dStat
In short, I'm just brute-forcing it by saving a text file with a Key:Value pair on each line, then splitting them all back up on load. Unfortunately this turns all of my int and bool values into strings. Is there a file format I could use to save a dictionary (I don't need to be able to read it, but the convenience would be nice) that I could easily load back in?
This works for a basic dictionary, but if I start adding things like arrays it is going to get out of hand.
Use the json module.
import json

def save_pet(pet):
    filename = <Whatever filename you want>
    with open(filename, 'w') as f:
        f.write(json.dumps(pet))

def load_pet(filename):
    with open(filename) as f:
        pet = json.loads(f.read())
    return pet
Use pickle. This is part of the standard library, so you can just import it.
import pickle

pet_stats = {'name':"", 'int':1, 'bool':False}

def pet_save(pet):
    with open(pet['name'] + '.pickle', 'wb') as f:
        pickle.dump(pet, f, pickle.HIGHEST_PROTOCOL)

def digimon_load(pet_name):
    with open(pet_name + '.pickle', 'rb') as f:
        return pickle.load(f)
Pickle works on more data types than JSON, and automatically loads them as the right Python type. (There are ways to save more types with JSON, but it takes more work.) JSON (or XML) is better if you need the output to be human-readable, or need to share it with non-Python programs, but neither appears to be necessary for your use case. Pickle will be easiest.
If you need to see what's in the file, just load it using Python or
python -m pickle foo.pickle
instead of a text editor. (Only do this with pickle files from sources you trust; pickle is not at all secure against maliciously crafted files.)
Q: Is there a file format I could use to save a dictionary to load back in?
A: Yes, there are many. XML and JSON come immediately to mind.
For example:
jsonfile.txt
{
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964
}
Here's an example reading the file into a dictionary:
import json
with open('data.txt', 'r') as json_file:
    data = json.load(json_file)
... and an example writing the dictionary to JSON:
import json
with open('data.txt', 'w') as fp:
    fp.write(json.dumps(data))
If you prefer XML, there are many libraries, including xmltodict:
import xmltodict
with open('path/to/file.xml') as fd:
    doc = xmltodict.parse(fd.read())
There are two useful words that you may not know about yet: serialization and pickle.
Serialization refers to the process of converting a data structure (like your dictionary) to a stream of bytes that can be written to storage, and later retrieved from storage to recreate that data structure. This is a common task and your intuition is correct: trying to do this all by yourself will quickly get out of hand.
Pickle is the standard Python module for implementing serialization. It's easy to use, mature, and works with a large set of Python data types. You can read more about pickle here: https://docs.python.org/3/library/pickle.html
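A minimal sketch of that round trip (the filename pet.pickle and the example values are just illustrations):

import pickle

pet = {'name': 'Agumon', 'level': 1, 'is_hungry': False}

# Serialize the dictionary to a file...
with open('pet.pickle', 'wb') as f:
    pickle.dump(pet, f)

# ...and later recreate it, with int and bool values preserved as their original types.
with open('pet.pickle', 'rb') as f:
    restored = pickle.load(f)

print(restored == pet)  # True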
The question is fairly self-explanatory.
I need to write to or append to a specific key/value of an object in a JSON file via Python.
I'm not sure how to do it because I'm not good with JSON, but here is an example of how I tried to do it (I know it is wrong).
with open('info.json', 'a') as f:
    json.dumps(data, ['key1'])
this is the json file:
{"key0":"xxxxx#gmail.com","key1":"12345678"}
A typical usage pattern for JSONs in Python is to load the JSON object into Python, edit that object, and then write the resulting object back out to file.
import json

with open('info.json', 'r') as infile:
    my_data = json.load(infile)

my_data['key1'] = my_data['key1'] + 'random string'
# perform other alterations to my_data here, as appropriate ...

with open('info.json', 'w') as outfile:
    json.dump(my_data, outfile)
Contents of 'info.json' are now
{"key0": "xxxxx#gmail.com", "key1": "12345678random string"}
The key operations were json.load(fp), which deserialized the file into a Python object in memory, and json.dump(obj, fp), which reserialized the edited object to the file being written out.
This may be unsuitable if you're editing very large JSON objects and cannot easily pull the entire object into memory at once, but if you're just trying to learn the basics of Python's JSON library it should help you get started.
An example of adding data to a JSON object using the json library:
import json
raw = '{ "color": "green", "type": "car" }'
data_to_add = { "gear": "manual" }
parsed = json.loads(raw)
parsed.update(data_to_add)
You can then save your changes with json.dumps.
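For example, a minimal sketch that writes the updated object back out to a file (the filename car.json is just an example):

import json

with open('car.json', 'w') as f:
    f.write(json.dumps(parsed))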
I have written the following python code to populate a JSON file.
import json

data = {}
data['people'] = []

for i in range(0, 3):
    data['people'].append({
        'name': 'C%d' % (i),
        'div': i,
        'from': 'City%d' % (i)
    })

with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)
However, my JSON file looks something like this:
{"people": [{"div":0,"from":,"City0":"name":"C0"},{"div":0,"from":,"City0":"name":"C0"}]}
The order of the keys in the output is different from the order I inserted them. What is the reason, and how do I rectify this?
What Python version do you use? You create a dict, but before Python 3.6 the insertion order of a dict is not preserved. In Python 3.6 insertion order is preserved, but it is considered an implementation detail and should not be relied upon. In Python 3.7 the insertion-order-preserving nature of dict objects was declared an official part of the Python language spec.
If you are using a Python version lower than 3.7, use OrderedDict from collections:
import json
from collections import OrderedDict

data = {}
data['people'] = []

for i in range(0, 3):
    data['people'].append(OrderedDict((
        ('name', 'C%d' % (i)),
        ('div', i),
        ('from', 'City%d' % (i))
    )))

with open('data.json', 'w') as outfile:
    json.dump(data, outfile)
By the way, why is the extension of the file txt and not json? It doesn't matter and is not related to your problem, but I am curious.
The reason your output looks like that is that JSON files don't really care what order their keys are in; they hold data, and as long as you can get to the file and read the values back, it's all good. You more or less want the output to be exactly how you typed it in, which json.dumps won't give you by itself. If you absolutely need it that way, I'd just build a string like
string = '''{"people": [{#arrange in order you want it}]}'''
and save it how you would any other file.
If you're looking to sort your JSON, try something I found here: Sorting Json
I have some JSON files of 500 MB.
If I use the "trivial" json.load() to load their content all at once, it will consume a lot of memory.
Is there a way to read the file partially? If it were a text, line-delimited file, I would be able to iterate over the lines. I am looking for an analogy to that.
There was a duplicate to this question that had a better answer. See https://stackoverflow.com/a/10382359/1623645, which suggests ijson.
Update:
I tried it out, and ijson is to JSON what SAX is to XML. For instance, you can do this:
import ijson
for prefix, the_type, value in ijson.parse(open(json_file_name)):
    print(prefix, the_type, value)
where prefix is a dot-separated index into the JSON tree (what happens if your key names contain dots? I guess that would be bad for JavaScript, too...), the_type describes a SAX-like event, one of 'null', 'boolean', 'number', 'string', 'map_key', 'start_map', 'end_map', 'start_array', 'end_array', and value is the value of the object, or None if the_type is an event like starting/ending a map/array.
The project has some docstrings, but not enough global documentation. I had to dig into ijson/common.py to find what I was looking for.
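To make the event stream concrete, here is a small sketch of the tuples ijson.parse yields for a tiny document (assuming a reasonably recent ijson that accepts an in-memory byte stream; exact number types can differ between backends):

import io
import ijson

sample = io.BytesIO(b'{"name": "Agumon", "stats": {"level": 1}}')

for prefix, event, value in ijson.parse(sample):
    print(prefix, event, value)

# Roughly:
#   ''            start_map  None
#   ''            map_key    name
#   'name'        string     Agumon
#   ''            map_key    stats
#   'stats'       start_map  None
#   'stats'       map_key    level
#   'stats.level' number     1
#   'stats'       end_map    None
#   ''            end_map    None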
So the problem is not that each file is too big, but that there are too many of them, and they seem to be adding up in memory. Python's garbage collector should be fine, unless you are keeping around references you don't need. It's hard to tell exactly what's happening without any further information, but some things you can try:
Modularize your code. Do something like:
for json_file in list_of_files:
    process_file(json_file)
If you write process_file() in such a way that it doesn't rely on any global state, and doesn't change any global state, the garbage collector should be able to do its job.

Deal with each file in a separate process. Instead of parsing all the JSON files at once, write a program that parses just one, and pass each one in from a shell script, or from another Python process that calls your script via subprocess.Popen. This is a little less elegant, but if nothing else works, it will ensure that you're not holding on to stale data from one file to the next.
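A minimal sketch of that second approach, assuming a hypothetical single-file parser script named parse_one.py:

import subprocess
import sys

list_of_files = sys.argv[1:]

for json_file in list_of_files:
    # Each file is parsed in its own short-lived Python process, so any
    # memory it used is returned to the OS when that process exits.
    subprocess.Popen([sys.executable, 'parse_one.py', json_file]).wait()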
Hope this helps.
Yes.
You can use jsonstreamer, a SAX-like push parser that I have written, which will allow you to parse arbitrarily sized chunks. You can get it here and check out the README for examples. It's fast because it uses the 'C' yajl library.
It can be done by using ijson. The workings of ijson have been very well explained by Jim Pivarski in the answer above. The code below will read a file and print each JSON object from the list. For example, the file content is as below:
[{"name": "rantidine", "drug": {"type": "tablet", "content_type": "solid"}},
{"name": "nicip", "drug": {"type": "capsule", "content_type": "solid"}}]
You can print every element of the array using the below method
import ijson

def extract_json(filename):
    with open(filename, 'rb') as input_file:
        jsonobj = ijson.items(input_file, 'item')
        for j in jsonobj:
            print(j)
Note: 'item' is the prefix ijson gives to the elements of a top-level array.
If you want to access only specific objects based on a condition, you can do it in the following way:
import ijson

def extract_tabtype(filename):
    with open(filename, 'rb') as input_file:
        objects = ijson.items(input_file, 'item.drug')
        tabtype = (o for o in objects if o['type'] == 'tablet')
        for prop in tabtype:
            print(prop)
This will print only those drug entries whose type is tablet.
On your mention of running out of memory I must question if you're actually managing memory. Are you using the "del" keyword to remove your old object before trying to read a new one? Python should never silently retain something in memory if you remove it.
Update
See the other answers for advice.
Original answer from 2010, now outdated
Short answer: no.
Properly dividing a json file would take intimate knowledge of the json object graph to get right.
However, if you have this knowledge, then you could implement a file-like object that wraps the json file and spits out proper chunks.
For instance, if you know that your json file is a single array of objects, you could create a generator that wraps the json file and returns chunks of the array.
You would have to do some string content parsing to get the chunking of the json file right.
I don't know what generates your JSON content. If possible, I would consider generating a number of manageable files instead of one huge file.
Another idea is to try load it into a document-store database like MongoDB.
It deals with large blobs of JSON well, although you might run into the same problem loading the JSON; avoid that by loading the files one at a time.
If that path works for you, then you can interact with the JSON data via their client and potentially not have to hold the entire blob in memory.
http://www.mongodb.org/
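A minimal sketch of that idea, assuming a local MongoDB instance, the pymongo client, and that each file holds a single JSON object (database, collection, and file names here are only examples):

import json
from pymongo import MongoClient

client = MongoClient()  # assumes MongoDB running on localhost:27017
collection = client['json_dump']['documents']

for json_file in ['file1.json', 'file2.json']:  # hypothetical file names
    with open(json_file) as f:
        doc = json.load(f)  # one file at a time, not all at once
    collection.insert_one(doc)

# Later, query the data without holding the whole blob in memory:
for doc in collection.find():
    print(doc)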
"the garbage collector should free the memory"
Correct.
Since it doesn't, something else is wrong. Generally, the problem with infinite memory growth is global variables.
Remove all global variables.
Make all module-level code into smaller functions.
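As a small illustration of that advice (the function and file names are hypothetical), the refactor below keeps each parsed object local to a function, so nothing references it after the function returns and the garbage collector can reclaim it:

import json

def summarize_file(path):
    # 'data' is local to this function; once it returns, the parsed
    # object is unreferenced and can be garbage collected.
    with open(path) as f:
        data = json.load(f)
    return len(data)  # return only the small summary you actually need

def main():
    for path in ['a.json', 'b.json', 'c.json']:  # hypothetical file names
        print(path, summarize_file(path))

if __name__ == '__main__':
    main()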
In addition to @codeape:
I would try writing a custom JSON parser to help you figure out the structure of the JSON blob you are dealing with. Print out the key names only, etc. Make a hierarchical tree and decide (yourself) how you can chunk it. This way you can do what @codeape suggests: break the file up into smaller chunks, etc. A rough sketch of that idea is below.
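You don't have to write the tokenizer from scratch; a minimal sketch using the ijson event stream mentioned above can already print just the key names (the file name huge.json is only an example):

import ijson

def print_keys(path):
    # Walk the event stream and print only the key names, so you can
    # inspect the shape of the document without loading it all into memory.
    seen = set()
    with open(path, 'rb') as f:
        for prefix, event, value in ijson.parse(f):
            if event == 'map_key' and (prefix, value) not in seen:
                seen.add((prefix, value))
                print(prefix, '->', value)

print_keys('huge.json')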
You can convert the JSON file to a CSV file and then process it record by record:
import ijson
import csv

def convert_json(file_path):
    did_write_headers = False
    headers = []
    row = []
    iterable_json = ijson.parse(open(file_path, 'r'))
    with open(file_path + '.csv', 'w') as csv_file:
        csv_writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        for prefix, event, value in iterable_json:
            if event == 'end_map':
                if not did_write_headers:
                    csv_writer.writerow(headers)
                    did_write_headers = True
                csv_writer.writerow(row)
                row = []
            if event == 'map_key' and not did_write_headers:
                headers.append(value)
            if event == 'string':
                row.append(value)
Simply using json.load() will take a lot of time. Instead, you can load the JSON data line by line (assuming one JSON object per line), collect each line's key/value pairs into a dictionary, add that dictionary to a final dictionary keyed by line number, and then convert it to a pandas DataFrame, which will help you with further analysis:
import json
import pandas as pd

def get_data():
    with open('Your_json_file_name', 'r') as f:
        for line in f:
            yield line

data = get_data()
data_dict = {}

for i, line in enumerate(data):
    each = {}
    # k and v are the key and value pair of one line's JSON object
    for k, v in json.loads(line).items():
        each[f'{k}'] = f'{v}'
    data_dict[i] = each

Data = pd.DataFrame(data_dict)
# Data holds the dictionary data as a DataFrame (table format), but it will
# be in transposed form, so finally transpose the DataFrame:
Data_1 = Data.T
I have a .txt file with JSON-formatted content that I would like to read, convert to a JSON object, and then log the result. I can read the file and I'm really close, but unfortunately json_data is a string object instead of a JSON object/dictionary. I assume it's something trivial, but I have no idea, because I'm new to Python, so I would really appreciate it if somebody could show me the right solution.
import json

filename = 'html-json.txt'

with open(filename, encoding="utf8") as f:
    jsonContentTxt = f.readlines()

json_data = json.dumps(jsonContentTxt)
print(json_data)
You may want to consult the docs for the json module. The Python docs are generally pretty great and this is no exception.
f.readlines() will read the lines of the file f points to (in your case, html-json.txt) and return them as a list of strings, so jsonContentTxt is a list of raw lines rather than a parsed JSON object.
If you simply want to print the file's contents, you could just print that. On the other hand, if you want to load the JSON into a Python data structure, manipulate it, and then output it, you could do something like this (which uses json.load, a function that takes a file-like object and returns an object such as a dict or list depending on the JSON):
with open(filename, encoding="utf8") as f:
    json_content = json.load(f)

# do stuff with json_content, e.g. json_content['foo'] = 'bar'
# then when you're ready to output:
print(json.dumps(json_content))
You may also want to use the indent argument to json.dumps (link here) which will give you a nicely-formatted string.
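For example (the dictionary here is just an illustration):

import json

print(json.dumps({'name': 'C0', 'div': 0}, indent=4))
# {
#     "name": "C0",
#     "div": 0
# }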
Read the 2.7 documentation here or the 3.5 documentation here:
json.loads(json_as_string)  # Deserializes a string into a JSON object hierarchy
Once you have the deserialized form, you can convert it back to JSON with a dump:
json.dumps(json_hierarchy)  # Serializes the object hierarchy back to a JSON string
I'm relatively new to encoding and decoding; in fact I don't have any experience with it at all.
I was wondering, how would I encode a dictionary in Python 3 into an unreadable format that would prevent someone from modifying it outside the program?
Likewise, how would I then read from that file and decode it back into a dictionary?
My test code right now only writes to and reads from a plain text file.
import ast
myDict = {}
#Writer
fileModifier = open('file.txt', 'w')
fileModifier.write(str(myDict))
fileModifier.close()
#Reader
fileModifier = open('file.txt', 'r')
myDict = ast.literal_eval(fileModifier.read())
fileModifier.close()
Depending on what your dictionary is holding, you can use an encoding library like json or pickle (useful for storing more complex Python data structures).
Here is an example using json; to use pickle, just replace all instances of json with pickle and you should be good to go.
import json

myDict = {}

#Writer
fileModifier = open('file.txt', 'w')
json.dump(myDict, fileModifier)
fileModifier.close()

#Reader
fileModifier = open('file.txt', 'r')
myDict = json.load(fileModifier)
fileModifier.close()
The typical thing to use here is either the json or pickle module (both in the standard library). The process is called "serialization". pickle can serialize almost arbitrary Python objects, whereas json can only serialize basic types/objects (integers, floats, strings, lists, dictionaries). json is human-readable at the end of the day, whereas pickle files aren't.
An alternative to encoding/decoding is to simply use a file as a dict; Python has the shelve module, which does exactly this. It uses a file as a database and provides a dict-like interface to it.
It has some limitations, for example keys must be strings, and it's obviously slower than a normal dict since it performs I/O operations.
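A minimal sketch of the shelve approach (the filename pet_shelf is just an example):

import shelve

# Open (or create) the shelf file; it behaves like a persistent dict.
with shelve.open('pet_shelf') as db:
    db['name'] = ""
    db['int'] = 1
    db['bool'] = False

# Later, re-open it and read the values back with their Python types preserved.
with shelve.open('pet_shelf') as db:
    print(db['int'] + 1, db['bool'])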