I have a text file temp.txt of the sort --
{
"names" : [ {"index" : 0, "cards": "\n\nbingo" ...} ]
"more stuff": ...
}
{
"names" : [ {"index" : 0, "cards": "\nfalse" ...} ]
"more stuff": ...
}
.
.
Here's how I am trying to load it --
def read_op(filename):
with open("temp.txt", 'r') as file:
for line in file:
print (json.load(line))
return lines
But this throws the error:
JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1
I'm not sure what I am doing wrong here. Are there alternatives to reading this file another way?
Reading it line-by-line will not work because every line is not a valid JSON object by itself.
You should pre-process the data before loading it as a JSON, for example by doing the following:
Read the whole content
Add commas between every 2 objects
Add [] to contain the data
Load with json.loads
import re
import json
with open(r'test.txt', 'r') as fp:
data = fp.read()
concat_data = re.sub(r"\}\n\{", "},{", data)
json_data_as_str = f"[{concat_data}]"
json_data = json.loads(json_data_as_str)
print(json_data)
Related
https://github.com/Asabeneh/30-Days-Of-Python/blob/ff24ab221faaec455b664ad5bbdc6e0de76c3caf/data/countries_data.json
how can i loop through this countries_data.json file (see link above) to get 'languages'
i have tried:
import json
f = open("countries_data.json")
file = f.read()
# print(file)
for item in file:
print(item)
You have everything correct and set up but you didn't load the json file. Also there is a double space on "f = open". You also didn't open the file with the read parameter, not too sure if its needed though.
Correct code:
import json
f = open("countries_data.json", "r")
file = json.loads(f.read())
for item in file:
print(item)
Hope this helped, always double check your code.
You can see that you import the json module at the beginning, so you might as well use it
If you go to the documentation you will see a function allowing you to read this file directly.
In the end you end up with just a dictionary list, the code can be summarized as follows.
import json
with open("test/countries_data.json") as file:
data = json.load(file)
for item in data:
print(item["languages"])
You are missing one essential step, which is parsing the JSON data to Python datastructures.
import json
# read file
f = open("countries.json")
# parse JSON to Python datastructures
countries = json.load(f)
# now you have a list of countries
print(type(countries))
# loop through list of countries
for country in countries:
# you can access languages with country["languages"]; JSON objects are Python dictionaries now
print(type(country))
for language in country["languages"]:
print(language)
f.close()
Expected output:
<class 'list'>
<class 'dict'>
Pashto
Uzbek
Turkmen
...
You can use the json built-in package to deserialize the content of that file.
A sample of usage
data = """[
{
"name": "Afghanistan",
"capital": "Kabul",
"languages": [
"Pashto",
"Uzbek",
"Turkmen"
],
"population": 27657145,
"flag": "https://restcountries.eu/data/afg.svg",
"currency": "Afghan afghani"
},
{
"name": "Ă…land Islands",
"capital": "Mariehamn",
"languages": [
"Swedish"
],
"population": 28875,
"flag": "https://restcountries.eu/data/ala.svg",
"currency": "Euro"
}]"""
# deserializing
print(json.loads(data))
For more complex content have a look to the JSONDecoder.
doc
EDIT:
import json
path = # my file
with open(path, 'r') as fd:
# iterate over the dictionaries
for d in json.loads(fd.read()):
print(d['languages'])
EDIT: extra - top 10 languages
import json
import itertools as it
path = # path to file
with open(path, 'r') as fd:
text = fd.read()
languages_from_file = list(it.chain(*(d['languages'] for d in json.loads(text))))
# get unique "list" of languages
languages_all = set(languages_from_file)
# count the repeated languages
languages_count = {l: languages_from_file.count(l) for l in languages_all}
# order them per descending value
top_ten_languages = sorted(languages_count.items(), key=lambda k: k[1], reverse=True)[:10]
print(top_ten_languages)
I want to get the dictionary below inside my data array, couldn't find much on this as most tutorials show you how to chuck variable right into the json, without ordering it into an array or object.
the JSON:
#JSON file
{
"data" :
[
]
}
the python file I want to export from:
import json
#Python file
your_facebook_info = {'name':'johhny', 'age':213, 'money':'lots'}
desired outcome:
{
"data" :
[
your_facebook_info = {
'name':'johhny',
'age':213,
'money':'lots'
}
]
}
Try this. You read the file, append the new data, write that databack to the file.
import json
#Python file
your_facebook_info = {'name':'johhny', 'age':213, 'money':'lots'}
with open('text.json','r') as inputfile:
data = json.load(inputfile)
data['data'].append(your_facebook_info)
with open("text.json", "w") as outfile:
json.dump(data, outfile)
I had a code, which gave me an empty DataFrame with no saved tweets.
I tried to debug it by putting print(line) under the for line in json file: and json_data = json.loads(line).
That resulted a KeyError.
How do I fix it?
Thank you.
list_df = list()
# read the .txt file, line by line, and append the json data in each line to the list
with open('tweet_json.txt', 'r') as json_file:
for line in json_file:
print(line)
json_data = json.loads(line)
print(line)
tweet_id = json_data['tweet_id']
fvrt_count = json_data['favorite_count']
rtwt_count = json_data['retweet_count']
list_df.append({'tweet_id': tweet_id,
'favorite_count': fvrt_count,
'retweet_count': rtwt_count})
# create a pandas DataFrame using the list
df = pd.DataFrame(list_df, columns = ['tweet_id', 'favorite_count', 'retweet_count'])
df.head()
Your comment says you're trying to save to a file, but your code kind of says that you're trying to read from a file. Here are examples of how to do both:
Writing to JSON
import json
import pandas as pd
content = { # This just dummy data, in the form of a dictionary
"tweet1": {
"id": 1,
"msg": "Yay, first!"
},
"tweet2": {
"id": 2,
"msg": "I'm always second :("
}
}
# Write it to a file called "tweet_json.txt" in JSON
with open("tweet_json.txt", "w") as json_file:
json.dump(content, json_file, indent=4) # indent=4 is optional, it makes it easier to read
Note the w (as in write) in open("tweet_json.txt", "w"). You're using r (as in read), which doesn't give you permission to write anything. Also note the use of json.dump() rather than json.load(). We then get a file that looks like this:
$ cat tweet_json.txt
{
"tweet1": {
"id": 1,
"msg": "Yay, first!"
},
"tweet2": {
"id": 2,
"msg": "I'm always second :("
}
}
Reading from JSON
Let's read the file that we just wrote, using pandas read_json():
import pandas as pd
df = pd.read_json("tweet_json.txt")
print(df)
Output looks like this:
>>> df
tweet1 tweet2
id 1 2
msg Yay, first! I'm always second :(
I have a txt file with json structures. the problem is the file does not only contain json structures but also raw text like log error:
2019-01-18 21:00:05.4521|INFO|Technical|Batch Started|
2019-01-18 21:00:08.8740|INFO|Technical|Got Entities List from 20160101 00:00 :
{
"name": "1111",
"results": [{
"filename": "xxxx",
"numberID": "7412"
}, {
"filename": "xgjhh",
"numberID": "E52"
}]
}
2019-01-18 21:00:05.4521|INFO|Technical|Batch Started|
2019-01-18 21:00:08.8740|INFO|Technical|Got Entities List from 20160101 00:00 :
{
"name": "jfkjgjkf",
"results": [{
"filename": "hhhhh",
"numberID": "478962"
}, {
"filename": "jkhgfc",
"number": "12544"
}]
}
I read the .txt file but trying to patch the jason structures I have an error:
IN :
import json
with open("data.txt", "r", encoding="utf-8", errors='ignore') as f:
json_data = json.load(f)
OUT : json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)
I would like to parce json and save as csv file.
A more general solution to parsing a file with JSON objects mixed with other content without any assumption of the non-JSON content would be to split the file content into fragments by the curly brackets, start with the first fragment that is an opening curly bracket, and then join the rest of fragments one by one until the joined string is parsable as JSON:
import re
fragments = iter(re.split('([{}])', f.read()))
while True:
try:
while True:
candidate = next(fragments)
if candidate == '{':
break
while True:
candidate += next(fragments)
try:
print(json.loads(candidate))
break
except json.decoder.JSONDecodeError:
pass
except StopIteration:
break
This outputs:
{'name': '1111', 'results': [{'filename': 'xxxx', 'numberID': '7412'}, {'filename': 'xgjhh', 'numberID': 'E52'}]}
{'name': 'jfkjgjkf', 'results': [{'filename': 'hhhhh', 'numberID': '478962'}, {'filename': 'jkhgfc', 'number': '12544'}]}
This solution will strip out the non-JSON structures, and wrap them in a containing JSON structure.This should do the job for you. I'm posting this as is for expediency, then I'll edit my answer for a more clear explanation. I'll edit this first bit when I've done that:
import json
with open("data.txt", "r", encoding="utf-8", errors='ignore') as f:
cleaned = ''.join([item.strip() if item.strip() is not '' else '-split_here-' for item in f.readlines() if '|INFO|' not in item]).split('-split_here-')
json_data = json.loads(json.dumps(('{"entries":[' + ''.join([entry + ', ' for entry in cleaned])[:-2] + ']}')))
Output:
{"entries":[{"name": "1111","results": [{"filename": "xxxx","numberID": "7412"}, {"filename": "xgjhh","numberID": "E52"}]}, {"name": "jfkjgjkf","results": [{"filename": "hhhhh","numberID": "478962"}, {"filename": "jkhgfc","number": "12544"}]}]}
What's going on here?
In the cleaned = ... line, we're using a list comprehension that creates a list of the lines in the file (f.readlines()) that do not contain the string |INFO| and adds the string -split_here- to the list whenever there's a blank line (where .strip() yields '').
Then, we're converting that list of lines (''.join()) into a string.
Finally we're converting that string (.split('-split_here-') into a list of lists, separating the JSON structures into their own lists, marked by blank lines in data.txt.
In the json_data = ... line, we're appending a ', ' to each of the JSON structures using a list comprehension.
Then, we convert that list back into a single string, stripping off the last ', ' (.join()[:-2]. [:-2]slices of the last two characters from the string.).
We then wrap the string with '{"entries":[' and ']}' to make the whole thing a valid JSON structure, and feed it to json.dumps and json.loads to clean any encoding and load your data a a python object.
You could do one of several things:
On the Command Line, remove all lines where, say, "|INFO|Technical|" appears (assuming this appears in every line of raw text):
sed -i '' -e '/\|INFO\|Technical/d' yourfilename (if on Mac),
sed -i '/\|INFO\|Technical/d' yourfilename (if on Linux).
Move these raw lines into their own JSON fields
Use the "text structures" as a delimiter between JSON objects.
Iterate over the lines in the file, saving them to a buffer until you encounter a line that is a text line, at which point parse the lines you've saved as a JSON object.
import re
import json
def is_text(line):
# returns True if line starts with a date and time in "YYYY-MM-DD HH:MM:SS" format
line = line.lstrip('|') # you said some lines start with a leading |, remove it
return re.match("^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", line)
json_objects = []
with open("data.txt") as f:
json_lines = []
for line in f:
if not is_text(line):
json_lines.append(line)
else:
# if there's multiple text lines in a row json_lines will be empty
if json_lines:
json_objects.append(json.loads("".join(json_lines)))
json_lines = []
# we still need to parse the remaining object in json_lines
# if the file doesn't end in a text line
if json_lines:
json_objects.append(json.loads("".join(json_lines)))
print(json_objects)
Repeating logic in the last two lines is a bit ugly, but you need to handle the case where the last line in your file is not a text line, so when you're done with the for loop you need parse the last object sitting in json_lines if there is one.
I'm assuming there's never more than one JSON object between text lines and also my regex expression for a date will break in 8,000 years.
You could count curly brackets in your file to find beginning and ending of your jsons, and store them in list, here found_jsons.
import json
open_chars = 0
saved_content = []
found_jsons = []
for i in content.splitlines():
open_chars += i.count('{')
if open_chars:
saved_content.append(i)
open_chars -= i.count('}')
if open_chars == 0 and saved_content:
found_jsons.append(json.loads('\n'.join(saved_content)))
saved_content = []
for i in found_jsons:
print(json.dumps(i, indent=4))
Output
{
"results": [
{
"numberID": "7412",
"filename": "xxxx"
},
{
"numberID": "E52",
"filename": "xgjhh"
}
],
"name": "1111"
}
{
"results": [
{
"numberID": "478962",
"filename": "hhhhh"
},
{
"number": "12544",
"filename": "jkhgfc"
}
],
"name": "jfkjgjkf"
}
I would like to ask if there is an easy way to modify JSON by using Python?
I have found some of the relevant topic- How to update json file with python But could not figure out the solution for my current issue.
Currently, JSON looks like this:
{
"X": [
{
"sample_topic_x":"sample_content_x_1",
...
}
{
"sample_topic_x":"sample_content_x_2",
...
}
......
]
"Y": [
{
"sample_topic_y":"sample_content_y_1",
...
}
{
"sample_topic_y":"sample_content_y_2",
...
}
......
]
}
Required: To be accepted by BQ / Need to remove "Y", keep only "X" in this format.
{"sample_topic_x":"sample_content_x_1",.....}
{"sample_topic_x":"sample_content_x_2",.....}
{"sample_topic_x":"sample_content_x_3",.....}
Any relevant documentation, topics?
P.S> Update 1.0
import json
json_path = 'C:\XXX\exportReport.json'
def updateJsonFile():
jsonFile = open(json_path, "r") # Open the JSON file for reading
data = json.load(jsonFile) # Read the JSON into the buffer
jsonFile.close() # Close the JSON file
updateJsonFile()
Solution:
import json
json_path = 'C:\XXX\exportReport.json'
output_path = 'C:\XXX\your_output_file.txt'
with open(json_path) as f:
data = json.loads(f.read())
# Opening output file in append mode
# Note: Output file is not JSON, as the required format is not valid json
with open(output_file, "a+") as op:
for element in data.get('X'):
op.write(json.dumps(element) + "\n")
Explanation:
Load the input json file using json.loads. The output file will be a plain text file and not a JSON file as the required output format is not a valid JSON. Use a .txt file for storing the output. Store value of json.loads() in data. To get inner element X which is a list of dictionaries, use data.get('X'), which will return list. Iterate over it and write json.dumps() to the output file, each element in a newline.
C:\XXX\exportReport.json
{
"X": [
{
"sample_topic_x":"sample_content_x_1",
...
}
{
"sample_topic_x":"sample_content_x_2",
...
}
......
]
"Y": [
{
"sample_topic_y":"sample_content_y_1",
...
}
{
"sample_topic_y":"sample_content_y_2",
...
}
......
]
}
C:\XXX\your_output_file.txt
{"sample_topic_x":"sample_content_x_1",.....}
{"sample_topic_x":"sample_content_x_2",.....}
{"sample_topic_x":"sample_content_x_3",.....}
You need extract parent data at first. And definite this to a variable. And search "X" data in this variable.