How to convert a series of JSON strings into one json file? - python

I am using python and json to construct a json file. I have a string, 'outputString' which consists of multiple lines of dictionaries turned into jsons, in the following format:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
I would like to write this string of JSON objects to a single new JSON file, with each object still on its own line. I have attached the code that builds outputString and what I have tried so far. Right now, the code writes the file, but everything ends up on one line. I would like the lines to be separated as they are in the string.
for value in outputList:
    newOutputString = json.dumps(value)
    outputString += (newOutputString + "\n")
with open('data.json', 'w') as outfile:
    for item in outputString.splitlines():
        json.dump(item, outfile)
        json.dump("\n",outfile)

PROBLEM: json.dump("\n", outfile) does not write a line break. It writes the JSON encoding of that string, i.e. the literal characters "\n" (quotes, backslash and n), so everything stays on the same line.
SOLUTION: write the newline with Python's own file.write() rather than as a JSON-encoded string:
with open('data.json', 'a') as outfile:  # append mode: every run adds new lines to the end of the file
    for item in outputString.splitlines():
        outfile.write(item)  # item is already a JSON string; json.dump would encode it a second time
        outfile.write("\n")  # a plain Python newline, no need to encode it with json
See comments for explanation.
Because the file is opened in append mode, make sure it is empty (or does not exist yet) if you want it to contain only these JSON objects.
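A minimal sketch of the whole round trip, assuming outputList holds plain dictionaries. The sample values and the temporary file path are illustrative and not taken from the question. Each dictionary is encoded once with json.dumps, and the line break is written as an ordinary Python string, so the file ends up with one JSON object per line:

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the asker's outputList.
outputList = [
    {"size": 1, "title": "Hello", "space": 0},
    {"size": 21, "title": "World", "space": 10},
    {"size": 3, "title": "Goodbye", "space": 20},
]

# A temporary path is used so the sketch does not touch any real file.
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w") as outfile:
    for value in outputList:
        # Encode each dictionary exactly once, then end the line in plain Python.
        outfile.write(json.dumps(value) + "\n")

# Reading it back: every line is a complete JSON document on its own.
with open(path) as infile:
    lines = infile.read().splitlines()
restored = [json.loads(line) for line in lines]

print(lines[0])
```

Opening with 'w' instead of 'a' here means the file is rewritten on every run, so there is no need to empty it first.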

Your rows are not valid JSON unless the property names are enclosed in double quotes.
This would be a proper JSON format:
{"size":1, "title":"Hello", "space":0}
Having said that here is a solution to your question with the type of data you provided.
I am assuming your data comes like this:
outputList = ['{size:1, title:"Hello", space:0}',
              '{size:21, title:"World", space:10}',
              '{size:3, title:"Goodbye", space:20}']
so the only thing you need to do is write each value using the file.write() function
Python 3.6 and above:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(f"{value}\n")
Python 3.5 and below:
with open('data.json', 'w') as outfile:
    for value in outputList:
        outfile.write(value+"\n")
data.json file will look like this:
{size:1, title:"Hello", space:0}
{size:21, title:"World", space:10}
{size:3, title:"Goodbye", space:20}
Note: As someone already commented, your data.json file will not be a true JSON-formatted file, but it serves the purpose of your question. Enjoy! :)
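To see the difference between the two row formats in practice, here is a small sketch that feeds both to json.loads. The helper name is_valid_json is made up for this illustration:

```python
import json

bad_line = '{size:1, title:"Hello", space:0}'          # property names not quoted
good_line = '{"size":1, "title":"Hello", "space":0}'   # property names quoted


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(bad_line))
print(is_valid_json(good_line))
```

A file made of lines like bad_line can still be written and read as plain text, but no JSON parser will accept it.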

Related

Writing a JSON file from dictionary, correcting the output

So I am working on a conversion file that is taking a dictionary and converting it to a JSON file. Current code looks like:
data = {json_object}
json_string = jsonpickle.encode(data)
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(json_string, outfile)
But when I go to open that rendered file, it is adding three \ on the front and back of each string.
ps: sorry if I am using the wrong terminology, I am still new to python and don't know the vocabulary that well yet.
Try this
import json
data = {"k": "v"}
with open('path_to_file.json', 'w') as f:
    json.dump(data, f)
You don't need to use jsonpickle to encode dict data.
json.dump is a wrapper function that first converts the data to JSON and then writes the resulting string to your file.
The backslashes appear because jsonpickle.encode has already turned your data into a string. When json.dump then encodes that string a second time, every double quote (") inside it is escaped with a backslash.
Just use the following code to write dict data to json
with open('/Users/machd/Mac/Documents/VISUAL CODE/CSV_to_JSON/JSON FILES/test.json', 'w') as outfile:
    json.dump(data, outfile)
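The effect can be reproduced with the standard library alone. In this sketch json.dumps stands in for jsonpickle.encode, on the assumption that both return a JSON string; the variable names are illustrative:

```python
import json

data = {"k": "v"}

# First encoding: the dictionary becomes a JSON string.
encoded_once = json.dumps(data)

# Second encoding: the string itself is encoded, so its inner quotes get escaped.
encoded_twice = json.dumps(encoded_once)

print(encoded_once)
print(encoded_twice)
```

The second line printed is what ends up in the file when an already-encoded string is passed to json.dump.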

Exporting and Importing a list of Data Frames as json file in Pandas

Pandas has the DataFrame.to_json and pd.read_json functions that work for single Data Frames. However, I have been trying to figure a way to export and import a list with many Data Frames into and from a single json file. So far, I have come to successfully export the list with this code:
with open('my_file.json', 'w') as outfile:
    outfile.writelines([json.dumps(df.to_dict()) for df in list_of_df])
This creates a json file with all the Data Frames converted to dicts. However, when I try to do the reverse to read the file and extract my Data Frames, I get an error. This is the code:
with open('my_file.json', 'r') as outfile:
    list_of_df = [pd.DataFrame.from_dict(json.loads(item)) for item in outfile]
The error I get is:
JSONDecodeError: Extra data
I think the problem is that I have to include somehow the opposite of 'writelines', which is 'readlines' in the code that reads the json file, but I do not know how to do it. Any help will be appreciated!
writelines does not add line separators, so all the JSON objects end up back to back on a single line, and json.loads fails with "Extra data" as soon as it reaches the second object. I'd recommend writing a single JSON list to the file instead:
with open('my_file.json', 'w') as outfile:
    outfile.write(json.dumps([df.to_dict() for df in list_of_df]))
Which means we can read it back just as simply using:
with open('my_file.json', 'r') as outfile:
    list_of_df = [pd.DataFrame.from_dict(item) for item in json.loads(outfile.read())]
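The "Extra data" error can be reproduced without pandas. In this sketch plain dictionaries stand in for the output of df.to_dict(), and the names records and concatenated are illustrative:

```python
import json

# Plain dicts stand in for df.to_dict(); no pandas is needed for this sketch.
records = [{"a": {"0": 1}}, {"b": {"0": 2}}]

# writelines() adds no separators, so the objects end up back to back.
concatenated = "".join(json.dumps(r) for r in records)

try:
    json.loads(concatenated)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg

# Dumping the whole list produces one valid JSON document instead.
as_one_document = json.dumps(records)
restored = json.loads(as_one_document)

print(error_message)
```

The parser reads the first object successfully and then objects to whatever follows it, which is why the message talks about extra data rather than malformed data.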

How do I setup a loop that convert the dictionary into string?

I have an Excel CSV file, and I need to convert a dictionary into a string. How do I do that? The purpose of my assignment is to write the contents of a database1 dictionary back to a CSV file in the same format as the original input file.
The original file name is vt.csv.
I know I have to set up a loop that converts the dictionary into a string, but I just can't figure out how. Please help.
# I have to use this format
def write_vt(db, filename):
    file = open('new_vt.csv', 'w')
This works:
with open(filename, 'r') as f:
    contents = f.read()  # named 'contents' rather than 'str' to avoid shadowing the built-in
You don't need the csv module.
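For the writing side, here is a rough sketch of the loop the question asks about. The layout of vt.csv is not shown, so this assumes, purely for illustration, that each key in db is the first column of a row and its value is a list of the remaining columns. The sample data and the temporary path are made up:

```python
import os
import tempfile


def write_vt(db, filename):
    """Write each dictionary entry as one comma-separated line."""
    with open(filename, "w") as f:
        for key, values in db.items():
            # Convert every field to a string before joining with commas.
            fields = [str(key)] + [str(v) for v in values]
            f.write(",".join(fields) + "\n")


# Hypothetical data; the real structure of database1 is an assumption.
database1 = {"alice": [1, 2], "bob": [3, 4]}
path = os.path.join(tempfile.mkdtemp(), "new_vt.csv")
write_vt(database1, path)

with open(path) as f:
    written = f.read()
print(written)
```

Joining by hand like this is only safe when no field contains a comma or a quote; otherwise csv.writer from the standard library handles the quoting.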

Python Loop through dictionary

I have a file that I wish to parse. It has data in the json format, but the file is not a json file. I want to loop through the file, and pull out the ID where totalReplyCount is greater than 0.
{ "totalReplyCount": 0,
  "newLevel":{
    "main":{
      "url":"http://www.someURL.com",
      "name":"Ronald Whitlock",
      "timestamp":"2016-07-26T01:22:03.000Z",
      "text":"something great"
    },
    "id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
  }
},
{ "totalReplyCount": 4,
  "newLevel":{
    "main":{
      "url":"http://www.someUR2L.com",
      "name":"other name",
      "timestamp":"2016-07-26T01:22:03.000Z",
      "text":"something else great"
    },
    "id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
  }
},
My initial attempt was to do the following
def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row
but I get an error stating
TypeError: 'file' object has no attribute '__getitem__'
I know this is just an attempt at printing and not doing what I want to do, but I am a novice at python and lost as to what I am doing wrong. What is the correct way to do this? My end result should look like this for the ids:
['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]
EDIT 1- 7/26/16
I saw that I made a mistake in my formatting when I copied the code (it was late, I was tired..). I switched it to a proper format that is more like JSON. This new edit properly matches file I am parsing. I then tried to parse it with JSON, and got the ValueError: Extra data: line 2 column 1 - line X column 1:, where line X is the end of the line.
def readCsv(filename):
    with open(filename, 'r') as file:
        data=json.load(file)
        pprint(data)
I also tried DictReader, and got a KeyError: 'totalReplyCount'. Is the dictionary un-ordered?
EDIT 2 -7/27/16
After taking a break, coming back to it, and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file, that contains a proper JSON object on each line. So, I have to parse the CSV file, then parse each line which is a top level, whole and complete JSON object. The code I used to try and parse this is below but all I get is the first string character, an open curly brace '{' :
def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]
I am guessing that the DictReader is converting the json object to a string, and that is why I am only getting a curly brace as opposed to the first key. If I was to do print item[0:5] I would get a mish mash of the first 4 characters in an un-ordered fashion on each line, which I assume is because the format has turned into an un-ordered list? I think I understand my problem a little bit better, but still wrapping my head around the data structures and the methods used to parse them. What am I missing?
After reading the question and all the above answers, please check if this is useful to you.
I have treated the input as a plain text file, not as a CSV or JSON file.
The flow of the code is as follows:
Open the file and read it in reverse order.
Search each line for the ID. Extract the ID and store it in a temp variable.
Keep reading line by line and search for totalReplyCount.
Once you find totalReplyCount, check whether it is greater than 0.
If so, store the temp ID in id_list and re-initialize the temp variable.
import re

tmp_id_to_store = ''
id_list = []
# Read bottom-up so that each "id" line is seen before its "totalReplyCount" line
with open("a.txt") as f:
    lines = f.readlines()
for line in reversed(lines):
    m = re.search(r'"id":"(\w+)"', line.rstrip())
    if m:
        tmp_id_to_store = m.group(1)
    n = re.search(r'{ "totalReplyCount": (\d+),', line.rstrip())
    if n:
        fou = n.group(1)
        if int(fou) > 0:
            id_list.append(tmp_id_to_store)
            tmp_id_to_store = ''
print(id_list)
More check points can be added.
As the error states, your csvFile is a file object, not a dict, so you can't get an item out of it by key.
If your file is in CSV format, you can use the csv module to read each line of the CSV into a dict:
import csv
with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['totalReplyCount'])
Note the DictReader class from the csv module: it reads each CSV row and parses it into a dict object keyed by the column names in the header row.
If your input file is JSON why not just use the JSON library to parse it and then run a for loop over that data. Then it is just a matter of iterating over the keys and extracting data.
import json
from pprint import pprint
with open('data.json') as data_file:
    data = json.load(data_file)
pprint(data)
Parsing values from a JSON file using Python?
Look at Justin Peel's answer. It should help.
The Stack Overflow question "Parsing values from a JSON file using Python?" covers all of this.
Here is a shell one-liner, should solve your problem, though it's not python.
egrep -o '"(?:totalReplyCount|id)":(.*?)$' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2
output:
"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
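For a pure Python route, one option is to turn the comma-separated objects into a real JSON array and let the json module do the parsing. This is a sketch that assumes the whole file looks like the sample in the question (objects separated by commas, with a trailing comma at the end). The sample below is shortened and the variable names are illustrative:

```python
import json

# Shortened sample in the same shape as the question's file:
# JSON objects separated by commas, with a trailing comma at the end.
raw = '''
{ "totalReplyCount": 0,
  "newLevel": {"main": {"name": "Ronald Whitlock"},
               "id": "z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"}
},
{ "totalReplyCount": 4,
  "newLevel": {"main": {"name": "other name"},
               "id": "kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"}
},
'''

# Drop the trailing comma and wrap everything in brackets to form a JSON array.
as_array = "[" + raw.strip().rstrip(",") + "]"
records = json.loads(as_array)

# Keep the id of every record whose reply count is above zero.
id_list = [r["newLevel"]["id"] for r in records if r["totalReplyCount"] > 0]
print(id_list)
```

With a real file, raw would come from reading the file's contents. Unlike the regex and shell approaches, this does not depend on how the lines happen to be laid out.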

How can I read a file that contains a list of dictionaries into python?

I've created a file that contains a list of dictionaries that I was working with. Unfortunately, I'm not sure how to re-import that file back into python in that same format.
I initially wrote the file out as JSON and as text, like this:
d = list_of_dics
jsonarray = json.dumps(d)
with open('list_of_dics.txt', 'w') as outfile:
    json.dump(jsonarray, outfile)
with open('list_of_dics.json', 'w') as outfile:
    json.dump(jsonarray, outfile)
Can anyone suggest a way to re-import these into python in the same format — i.e., a list of dictionaries?
You're using json.dump() incorrectly. You should be passing d to it directly, not the output of json.dumps(d). Once you do that, you can use json.load() to retrieve your data.
with open('list_of_dics.txt', 'r') as infile:
    d = json.load(infile)
With
json.dumps(d)
you've (JSON-)encoded list d in a string (which you assign to a variable misleadingly called jsonarray).
With
json.dump(jsonarray, outfile)
you've JSON-encoded that string and written the result to outfile.
So it's now (unnecessarily) doubly JSON-encoded in the files list_of_dics.txt and list_of_dics.json.
To cleanly get it back from there (without resorting to manual string manipulation) you have to decode it twice:
import json
with open('list_of_dics.json', 'r') as infile:
    recovered_d = json.loads(json.load(infile))
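For files written from now on, the double decoding can be avoided altogether. A minimal sketch of the single-encoding round trip, with made-up sample data and a temporary path so that no real file is touched:

```python
import json
import os
import tempfile

list_of_dics = [{"name": "a", "n": 1}, {"name": "b", "n": 2}]  # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "list_of_dics.json")

# Pass the Python object itself to json.dump, not an already-encoded string.
with open(path, "w") as outfile:
    json.dump(list_of_dics, outfile)

# A single json.load is then enough to get the list of dictionaries back.
with open(path) as infile:
    recovered = json.load(infile)

print(type(recovered).__name__, recovered == list_of_dics)
```

The file extension makes no difference to any of this; .txt and .json files are written and read the same way.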
