I have a function in Python that generates a dict like this:
player_data = {
    "player": "death-eater-01",
    "guild": "monster",
    "points": 50
}
I get this data by calling a function. Once I have it, I want to write it to a file, so I call:
g = open('team.json', 'a')
with g as outfile:
    json.dump(player_data, outfile)
This works fine. However, since a team consists of multiple players, I call the function again to get data for a new player:
player_data = {
    "player": "moon-master",
    "guild": "mage",
    "points": 250
}
Now when I write this data into the same file, the JSON breaks; it shows up like this (with no delimiter between the two objects):
{
    "player": "death-eater-01",
    "guild": "monster",
    "points": 50
}
{
    "player": "moon-master",
    "guild": "mage",
    "points": 250
}
What I want is to store both records as proper JSON in the file. For various reasons I cannot prepare the full JSON object up front and save it in a single shot; I have to do it incrementally because of network breakage, performance, and other issues.
Can anyone guide me on how to do this? I am using Python.
You shouldn't append data to an existing JSON file directly. Rather, you should build up a list in Python that contains all the dicts you want to write, and only then dump it to JSON and write it to the file.
If you really can't do that, one option is to load the existing file, parse it back into Python, append your new dict, dump the result to JSON, and write it back, replacing the whole file.
To produce valid JSON you will need to load the previous contents of the file, append the new data to that and then write it back to the file.
Like so:
import json
import os

def append_player_data(player_data, file_name="team.json"):
    # Load the existing list of players, or start a new one
    if os.path.exists(file_name):
        with open(file_name, 'r') as f:
            all_data = json.load(f)
    else:
        all_data = []
    all_data.append(player_data)
    # Rewrite the whole file with the updated list
    with open(file_name, 'w') as f:
        json.dump(all_data, f)
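A quick usage sketch of the read-modify-write approach (the helper is repeated here so the example is self-contained, and the demo deletes any existing `team.json` first):

```python
import json
import os

def append_player_data(player_data, file_name="team.json"):
    # Read-modify-write: load the existing list (if any), append, rewrite
    if os.path.exists(file_name):
        with open(file_name) as f:
            all_data = json.load(f)
    else:
        all_data = []
    all_data.append(player_data)
    with open(file_name, "w") as f:
        json.dump(all_data, f)

# Start from a clean file for the demonstration
if os.path.exists("team.json"):
    os.remove("team.json")

append_player_data({"player": "death-eater-01", "guild": "monster", "points": 50})
append_player_data({"player": "moon-master", "guild": "mage", "points": 250})

with open("team.json") as f:
    team = json.load(f)  # a single valid JSON array with both players
```

After the two calls, `team.json` holds one JSON array containing both player dicts, which `json.load` can read back in one step.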
Related
I want to iterate through a range of pages and save all of them into one JSON file, i.e. append page 2 to page 1, page 3 to the already-appended pages 1 and 2, and so on.
for i in range(4):
    response = requests.post("https://API&page=" + str(i))
    data = response.json()
    my_data = json.load(open("data.json"))
    my_data.update(my_data)
    json.dump(data, open("data.json", 'w'))
Based on some answers to similar questions I wrote the code above, but it overwrites the file instead of appending one page to another.
The JSON data structure ends with a page number that increments on every page.
Any idea what I did wrong?
What is it that you are trying to achieve?
You are overwriting the file data.json each time with the result of the response saved in the variable data.
Your code has 2 issues: you are updating a dictionary with itself, and you are overwriting the file. Fixing either could solve your problem, depending on what you want to achieve.
It looks like you instead want to save the contents of my_data, like this:
json.dump(my_data, open( "data.json", 'w' ))
Anyway, my_data is a dictionary whose contents get overwritten each time. Depending on the structure of data, this may not be what you want.
Let me explain further: if your structure is, for any page, something like
{
    "username": "retne",
    "page": <page-number>
}
my_data will just be equal to the last data page.
Moreover, about the second issue: if you open the file in 'w' mode, you will always overwrite it.
If you open it in 'a' mode instead, you will append data to it, obtaining something like this:
{
    "username": "retne",
    "page": 1
}
{
    "username": "pentracchiano",
    "page": 2
}
{
    "username": "foo",
    "page": 3
}
but this is not a valid .json file, because it contains multiple objects with no delimiters.
Try being clearer about your intents and I can provide additional support.
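A sketch of what this answer suggests, with the network call replaced by a stub (`fetch_page` is a hypothetical stand-in for the `requests.post` call from the question): collect every page in a list and write the file once, so it contains a single valid JSON array.

```python
import json

def fetch_page(i):
    # Stand-in for requests.post("https://API&page=" + str(i)).json()
    return {"username": "retne", "page": i}

all_pages = []
for i in range(4):
    all_pages.append(fetch_page(i))  # accumulate, don't overwrite

# One write at the end: the file holds a single valid JSON document
with open("data.json", "w") as f:
    json.dump(all_pages, f)
```

Reading the file back with `json.load` then returns the list of all four pages at once.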
Your code is overwriting the contents of the data.json file on each iteration of the loop. This is because you are using the 'w' mode when calling json.dump, which will overwrite the contents of the file.
To append the data to the file, you can open it in 'a' mode instead of 'w'. This will append the new data to the end of the file rather than overwriting the existing contents.
Like this:
for i in range(4):
    response = requests.post("https://API&page=" + str(i))
    data = response.json()
    my_data = json.load(open("data.json"))
    my_data.update(my_data)
    json.dump(data, open("data.json", 'a'))
I have a Python script that pulls all records from a SQL table, transforms them, and dumps them to a JSON file. It looks something like this.
data = []

# accumulate data into one large list
for record in get_sql_data():
    obj = MyObject(record)
    transformed_obj = transform(obj)
    data.append(transformed_obj)

# dump all data at once to json file
with open("result.json", "w") as f:
    json.dump({
        "name": "My data",
        "timestamp": datetime.datetime.now(),
        "data": data,
    }, f)
This used to work fine when my SQL table was small. Now it has millions of records and the script crashes because it runs out of memory when adding objects to the data list.
Is there any way to incrementally dump these objects to the file, so I don't need to build this massive data list? Something like this:
with open("result.json", "w") as f:
    for record in get_sql_data():
        obj = MyObject(record)
        transformed_obj = transform(obj)
        # write transformed_obj to json file
Note, the object transformation step is very complex so I can't do that step in the database. I need to do it in Python.
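One way to sketch the incremental pattern the question asks for (not from the question itself: `get_sql_data` and `transform` are stubbed out here, and the timestamp is written as an ISO string so it serializes cleanly) is to emit the surrounding JSON structure by hand and `json.dump` each transformed object individually, so only one record is in memory at a time:

```python
import datetime
import json

def get_sql_data():
    # Stub for the real SQL query in the question
    yield from ({"id": i} for i in range(3))

def transform(obj):
    # Stub for the question's complex transformation step
    return {"id": obj["id"], "doubled": obj["id"] * 2}

with open("result.json", "w") as f:
    # Write the enclosing object and the opening of the array manually...
    f.write('{"name": "My data", "timestamp": "%s", "data": ['
            % datetime.datetime.now().isoformat())
    first = True
    for record in get_sql_data():
        if not first:
            f.write(",")  # separator between array elements
        json.dump(transform(record), f)  # ...and each record normally
        first = False
    f.write("]}")
```

The resulting file is the same shape as the original single-shot dump, but memory use stays constant regardless of how many records the query returns.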
The question is very self-explanatory.
I need to write or append at a specific key/value of an object in json via python.
I'm not sure how to do it because I'm not good with JSON, but here is an example of how I tried to do it (I know it is wrong):
with open('info.json', 'a') as f:
    json.dumps(data, ['key1'])
this is the json file:
{"key0":"xxxxx#gmail.com","key1":"12345678"}
A typical usage pattern for JSONs in Python is to load the JSON object into Python, edit that object, and then write the resulting object back out to file.
import json

with open('info.json', 'r') as infile:
    my_data = json.load(infile)

my_data['key1'] = my_data['key1'] + 'random string'
# perform other alterations to my_data here, as appropriate ...

with open('info.json', 'w') as outfile:
    json.dump(my_data, outfile)
Contents of 'info.json' are now
{"key0": "xxxxx#gmail.com", "key1": "12345678random string"}
The key operations were json.load(fp), which deserialized the file into a Python object in memory, and json.dump(obj, fp), which reserialized the edited object to the file being written out.
This may be unsuitable if you're editing very large JSON objects and cannot easily pull the entire object into memory at once, but if you're just trying to learn the basics of Python's JSON library it should help you get started.
An example of adding data to a JSON object using the json library:
import json
raw = '{ "color": "green", "type": "car" }'
data_to_add = { "gear": "manual" }
parsed = json.loads(raw)
parsed.update(data_to_add)
You can then save your changes with json.dumps.
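For completeness, the full round trip might look like this (the final json.dumps step produces the string you would write back to the file, or you could use json.dump with a file object instead):

```python
import json

raw = '{ "color": "green", "type": "car" }'
data_to_add = {"gear": "manual"}

parsed = json.loads(raw)      # JSON string -> dict
parsed.update(data_to_add)    # merge in the new key
merged = json.dumps(parsed)   # dict -> JSON string, ready to write out
```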
I have been pulling some data through the Graph API of Facebook and saving it in JSON format to a new file. However, whenever I save it, the newlines don't actually show as newlines; they show up as "\n", and a backslash appears before every quotation mark in the data.
For example,
I want the data to be saved in this format:
{
    "feed": {
        "data": [
            {
                "message": "XYZ",
                "created_time": "0000-00-0000:00:00+0000",
                "id": "ABC"
            }
But it is being saved in this format (in a single line)
"{\n\"feed\": {\n\"data\": [\n{\n\"message\": \"XYZ\",\n\"created_time\": \"0000-00-0000:00:00+0000\",\n\"id\": \"ABC\"\n}
How do I save it in the first format and not the second?
I have been using this code:
url2 = '{0}?fields={1}&access_token={2}'.format(url, fields, token)  # the request URL the API expects
# token is the access token, url connects to Facebook, and fields is the data I want
size = os.path.getsize('new.json')  # size of the existing file
content = requests.get(url2).json()  # obtain the content
obj = json.dumps(content, indent=4)
with open('new.json', 'r+') as f:
    if size > 0:  # if the file already has content, delete it before rewriting
        f.truncate(0)
    json.dump(obj, f)
Even though I have used indent, it does not pretty-print in the way I want it to. Help appreciated!
You're using json.dumps to create a JSON representation of your data. Then you're using json.dump to create a JSON representation of that representation. You're double-JSONifying it. Just use one or the other.
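A minimal sketch of the fix, with the Graph API response replaced by a sample dict (`content` here stands in for `requests.get(url2).json()` from the question): serialize exactly once, directly to the file, and pass indent to json.dump for pretty-printing.

```python
import json

# Stand-in for content = requests.get(url2).json()
content = {"feed": {"data": [{"message": "XYZ", "id": "ABC"}]}}

with open("new.json", "w") as f:
    json.dump(content, f, indent=4)  # one serialization step, pretty-printed
```

The file now contains real newlines and unescaped quotes, because the dict is converted to JSON only once.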
I'm trying to load a large JSON file (300 MB) to parse into Excel. I just started running into a MemoryError when I do json.load(file). Questions similar to this have been posted, but they haven't answered my specific question. I want to be able to return all the data from the JSON file in one block, as I did in the code. What is the best way to do that?
The code looks like this.
def parse_from_file(filename):
    """Load the given (verified) JSON file and return the data it
    contained, so it can actually be read.

    Args:
        filename (string): full branch location, used to grab the json file plus '_metrics.json'

    Returns:
        data: whatever data was loaded from the json file
    """
    print("STARTING PARSE FROM FILE")
    with open(filename) as json_file:
        d = json.load(json_file)  # the with block closes the file automatically
    return d
The structure looks like this.
[
    {
        "analysis_type": "test_one",
        "date": 1505900472.25,
        "_id": "my_id_1.1.1",
        "content": {
            ...
        }
    },
    {
        "analysis_type": "test_two",
        "date": 1605939478.91,
        "_id": "my_id_1.1.2",
        "content": {
            ...
        }
    },
    ...
]
Inside "content" the information is not consistent, but it follows one of 3 distinct templates that can be predicted based on analysis_type.
I did it this way; I hope it helps you. Note that you may need to skip the first line ("[") and remove the trailing comma when a line ends with "},".
with open(file) as f:
    for line in f:
        while True:
            try:
                jfile = ujson.loads(line)
                break
            except ValueError:
                # Not yet a complete JSON value
                line += next(f)
        # do something with jfile
If all the libraries you have tried give you memory problems, my approach would be to split the file into one file per object inside the array.
If the file has the newlines and padding you showed in the OP, I would read it line by line, discarding lines that are just "[" or "]" and writing the accumulated lines to a new file every time you find "}," (where you also need to remove the comma). Then try to load every file, printing a message as each one finishes, to see where it fails, if it does.
If the file has no newlines or is not padded consistently, you would need to read character by character, keeping two counters: increase them when you find "[" or "{" and decrease them when you find "]" or "}" respectively. Also take into account that you may need to ignore any curly or square bracket that appears inside a string, though that may not be needed.