Extracting text from a JSON file and saving it into a text file - Python

import json
file = open('webtext.txt', 'a+')
with open('output-dataset_v1_webtext.test.jsonl') as json_file:
    data = json.load(json_file)
    for item in data:
        file.write(item)
        print(item)
I am getting this error:
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 656)
I have already tried json.loads() as well.
My JSON file contains multiple objects, one per line, like this:
{"id": 255000, "ended": true, "length": 134, "text": "Is this restaurant fami"}
{"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of 'refle"}
Any advice on how to resolve this issue and write dict['text'] into a text file would be highly appreciated.

You need to loop through the file line by line:
import json
with open('output-dataset_v1_webtext.test.jsonl', 'r') as json_file:
    for line in json_file:
        data = json.loads(line)  # each line is one JSON object, parsed into a dict
        for item in data:  # iterating a dict yields its keys
            print(item)

Looks like you need to iterate over each line in the file and then use json.loads.
Ex:
import json
with open('output-dataset_v1_webtext.test.jsonl') as json_file, open('webtext.txt', 'a') as file:
    for line in json_file:  # Iterate over each line
        data = json.loads(line.strip())  # Parse one JSON object per line
        file.write(data['text'] + '\n')  # Write only the "text" field
        print(data['text'])
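The line-by-line idea can also be wrapped in a small helper. This is only a sketch: the name iter_jsonl and the decision to skip blank lines are my own assumptions, and an in-memory string stands in for the real .jsonl file.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed object per non-blank line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:  # assumption: blank lines are skipped, not treated as errors
            continue
        yield json.loads(line)


# In-memory stand-in for the .jsonl file, using the two sample records
sample = io.StringIO(
    '{"id": 255000, "ended": true, "length": 134, "text": "Is this restaurant fami"}\n'
    '\n'
    '{"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of \'refle"}\n'
)
texts = [record["text"] for record in iter_jsonl(sample)]
print(texts)
```

With a real file you would pass the open file object instead of the StringIO, since iterating a file also yields one line at a time.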

I'm certainly not a JSON expert, so there might be a better way to do this, but you should be able to resolve your issue by putting your top-level data into an array:
[
{"id": 255000, "ended": true, "length": 134, "text": "Is this restaurant fami"},
{"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of 'refle"}
]
The error you're getting is telling you that a JSON document may contain no more than one top-level value. If you want more than one object, they have to be put in an array.
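If editing the file by hand is impractical, the array can be built in code instead. A rough sketch, assuming every non-blank line holds exactly one JSON object; jsonl_to_array is a made-up name and the records are shortened versions of the sample data.

```python
import json


def jsonl_to_array(jsonl_text):
    """Turn newline-delimited JSON objects into one JSON array string."""
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    return json.dumps(records)


# Shortened stand-ins for the two records in the question
jsonl_text = '{"id": 255000, "ended": true}\n{"id": 255001, "ended": true}\n'
array_text = jsonl_to_array(jsonl_text)
print(array_text)
```

The result is a single top-level array, so json.load or json.loads can read it in one call.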

As others have pointed out, your JSON must be surrounded by square brackets, as a document can only have one top-level value.
Like this:
[
{"id": 255000,"ended": true, "length": 134, "text": "Is this restaurant fami"},
{"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of 'refle"}
]
Then you should be able to use this code to do what you're trying to do:
import json
file = open('webtext.txt', 'a')
with open('test.json') as json_file:
    data = json.load(json_file)
    for item in data:
        file.write(str(item))
        print(item)
To fix your file.write issue you need to convert item to a string, like so: str(item).
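One thing worth knowing about str(item): it writes the Python representation of the dict, which is not the same as JSON. A small sketch of the difference, using one record from the question; the variable names are mine.

```python
import json

item = {"id": 255000, "ended": True, "length": 134, "text": "Is this restaurant fami"}

as_repr = str(item)         # Python repr: single quotes, True instead of true
as_json = json.dumps(item)  # valid JSON: double quotes, true
only_text = item["text"]    # just the field the question asks for

print(as_repr)
print(as_json)
print(only_text)
```

If the goal is only the text, writing item["text"] avoids the conversion question entirely.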

Related

How to read multiple JSON objects in a single file?

I want to read multiple JSON objects from a single file imported from local dir. So far this is my simple work:
Data:
[{
"uuid": "6f476e26",
"created": "2018-09-26T06:57:04.142232",
"creator": "admin"
}, {
"uuid": "11d1e78a",
"created": "2019-09-21T11:19:39.845876",
"creator": "admin"
}]
Code:
import json
with open('/home/data.json') as f:
    for line in f:
        data = json.load(f)
Error:
File "/usr/lib64/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 8 (char 7)
My question is similar to "Loading and parsing a JSON file with multiple JSON objects", and I've tried the approach from there, but the same issue appears. What should I do to solve this issue?

for line in f:
    data = json.load(f)
This makes no sense. You are trying to parse the file over and over again, once for every line in the file. This is more problematic than it sounds: the for loop has already consumed the first line by the time json.load(f) runs, so the parser only sees the rest of the file (hence "Extra data"), and f is exhausted after the first call to json.load(f).
You don't need the loop, just pass f to json.load:
with open('/home/data.json') as f:
    data = json.load(f)
    print(data)
outputs
[{'uuid': '6f476e26', 'created': '2018-09-26T06:57:04.142232', 'creator': 'admin'},
{'uuid': '11d1e78a', 'created': '2019-09-21T11:19:39.845876', 'creator': 'admin'}]
Now you can loop over data or directly access a specific index, i.e. data[0] or data[1].
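To see the difference without touching the real file, here is a sketch that reproduces both behaviours in memory. The string raw is a cut-down stand-in for /home/data.json, and io.StringIO is used in place of the opened file.

```python
import io
import json

# Cut-down stand-in for the contents of /home/data.json
raw = '[{\n "uuid": "6f476e26",\n "creator": "admin"\n}]'

# Looping first: the for statement consumes the first line ("[{") before json.load runs
f = io.StringIO(raw)
error_message = None
for line in f:
    try:
        json.load(f)
    except json.JSONDecodeError as exc:
        error_message = str(exc)
    break  # one iteration is enough to show the problem

# Without the loop the whole document is parsed in one call
data = json.load(io.StringIO(raw))

print(error_message)
print(data[0]["uuid"])
```

The first attempt fails because the parser only receives what is left after the first line, while the second attempt sees the full array.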

Trying to load data from a txt file into a variable for requests

with open('data.txt', 'r') as file:
    dat2 = file.read()
post2 = {
    "id": 5,
    "method": "set",
    "params": [
        {
            "data": [
                dat2
            ],
            "url": "/config/url"
        },
    ],
    "session": sessionkey,
    "verbose": 1
}
Data from the file I am trying to read looks like this:
{"name": "Host1","type": "ipmask","subnet": ["0.0.0.0","255.255.255.255"],"dynamic_mapping": null},
{"name": "Host2","type": "ipmask","subnet": ["0.0.0.0","255.255.255.255"],"dynamic_mapping": null},
I am trying to read this data and insert it into a variable to put it into post2 for a request. What I have tried so far includes reading the file and replacing null with None so Python can read it, as well as stripping all of the whitespace. I have tried using json.loads(), json.load() and json.dumps(), but nothing seems to work. When I try to use json.load() I get the following error:
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\json\__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\json\decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 18 (char 145)
After the data is placed into dat2, it gets inserted into post2 as '{data}' instead of {data}. Also yes, I know that file.read() will read the contents of the file into a string, but I have been trying everything since I am struggling to have success using json. I have been stuck on this part of my code for the longest time now and would appreciate any ideas. NOTE: I HAVE LOOKED AT MULTIPLE PYTHON/JSON POSTS FOR READING PYTHON AND NOTHING WORKS SO PLEASE DON'T MARK AS DUPLICATE.

Remove the trailing comma and put the dictionaries in a list in your file; it should look like this:
[{"name": "Host1","type": "ipmask","subnet": ["0.0.0.0","255.255.255.255"],"dynamic_mapping": null}, {"name": "Host2","type": "ipmask","subnet": ["0.0.0.0","255.255.255.255"],"dynamic_mapping": null}]
Then use json.loads to turn it into a list:
import json
with open('data.txt', 'r') as file:
    dat2 = file.read()
post2 = {"data": json.loads(dat2)}
And this will make post2 be
{'data': [{'name': 'Host1', 'type': 'ipmask', 'subnet': ['0.0.0.0', '255.255.255.255'], 'dynamic_mapping': None}, {'name': 'Host2', 'type': 'ipmask', 'subnet': ['0.0.0.0', '255.255.255.255'], 'dynamic_mapping': None}]}
Hope this helps!
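If you cannot change the file itself, the same fix can be applied to the string after reading it. This is a sketch under the assumption that the file holds nothing but objects separated by commas, with an optional trailing comma; raw stands in for the result of file.read().

```python
import json

# Stand-in for file.read() on data.txt, including the trailing comma
raw = (
    '{"name": "Host1","type": "ipmask","subnet": ["0.0.0.0","255.255.255.255"],"dynamic_mapping": null},\n'
    '{"name": "Host2","type": "ipmask","subnet": ["0.0.0.0","255.255.255.255"],"dynamic_mapping": null},\n'
)

# Drop surrounding whitespace and the trailing comma, then wrap in brackets
wrapped = "[" + raw.strip().rstrip(",") + "]"
hosts = json.loads(wrapped)

post2 = {"data": hosts}  # only the part of the payload that depends on the file
print(post2["data"][0]["name"])
```

json.loads already maps null to None, so there is no need to replace it in the text beforehand.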

Is there a way to just grab one subset of json data from a large text file?

I'm looking to pull the "name" field from a large json text file and be able to store them in another file for later, but I'm getting every piece of data that was in my previous json file albeit slightly modified. How do I make it so I only grab the data after the "name": field in my json file?
I've tried
names = []
with open('./out.json', 'r') as f:
    data = json.load(f)
    for name in data:
        names.append(data[name])
with open('./names.json','w') as f:
    for name in names:
        f.write('%s\r\n' % name)
and I'm getting my exact json file back, with no formatting and u' in front of everything, likely from the json.load(f), but I have no idea how to remedy this.
My text file is formatted like this, if it matters:
{
  "array":[
    {
      "name": "Seranul",
      "id": 5,
      "type": "Paladin",
      "itemLevel": 414,
      "icon": "Paladin-Holy",
      "total": 11107150,
      "activeTime": 2205387,
      "activeTimeReduced": 2205387
    },
    {
      "name": "Contherious",
      "id": 9,
      "type": "Hunter",
      "itemLevel": 412,
      "icon": "Hunter-Marksmanship",
      "total": 51102811,
      "activeTime": 2637303,
      "activeTimeReduced": 2637303
    },
    {
      "name": "Unicorns",
      "id": 17,
      "type": "Priest",
      "itemLevel": null,
      "icon": "Priest",
      "total": 12252005,
      "activeTime": 1768883,
      "activeTimeReduced": 1761797
    },
    ...
    }
]}
I'm expecting to see the corresponding data for each name field, but I'm getting my entire document back.

It looks like your code is ignoring the structure of the JSON data. Specifically, you are iterating through the keys of the top-level JSON dictionary, of which there is only one, array, and then appending its value to your names list. This results in the whole array property being put into your names variable.
Here is what I believe you want: iterate through the entries in array and add each name to a list, then export that as JSON to another file.
import json
names = []
with open('./out.json', 'r') as f:
    data = json.load(f)
    for entry in data["array"]:
        names.append(entry["name"])
with open('./names.json', 'w') as f:
    f.write(json.dumps(names))
This will result in the following JSON in names.json:
["Seranul", "Contherious", "Unicorns"]
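If you would rather have one name per line in a plain text file, as the original code attempted, something like the following could work. It is a sketch: the data is a shortened stand-in for out.json, the third entry deliberately has no "name" to show the filtering, and io.StringIO replaces the output file.

```python
import io
import json

# Shortened stand-in for out.json; the last entry has no "name" on purpose
data = json.loads(
    '{"array": [{"name": "Seranul", "id": 5}, {"name": "Contherious", "id": 9}, {"id": 17}]}'
)

# .get() returns None for entries without a "name" key; those are filtered out
names = [entry.get("name") for entry in data["array"]]
names = [name for name in names if name is not None]

out = io.StringIO()  # stand-in for an output file opened with mode 'w'
out.write("\n".join(names) + "\n")
print(out.getvalue())
```

Writing the strings directly also avoids the u'...' prefixes, which come from printing the repr of Python 2 unicode objects rather than their text.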

Can't access JSON loaded with json.dumps(json.loads(input))

Suppose I have json data like this.
{"id": {"$oid": "57dbv34346"}, "from": {"$oid": "57dbv34346sbgwe"}, "type": "int"}
{"id": {"$oid": "57dbv34345"}, "from": {"$oid": "57dbv34345sbgwe"}, "type": "int"}
I wrote a script like this in python
import json
with open('klinks_buildson.json', 'r') as f:
    for line in f:
        distros_dict = json.dumps(json.loads(line), sort_keys=True, indent=4)
        print distros_dict['from']
        print "\n"
But it is giving me an error:
print distros_dict['from']
TypeError: string indices must be integers, not str
I want the "from" data from both lines.

You don't need to load each line; you can load the whole file at once, assuming it's valid JSON, i.e. a single document such as an array of objects. (A file with one object per line, like the sample above, is not a single document, so for that layout keep the per-line loop.) Like this:
with open('klinks_buildson.json', 'r') as f:
    data = json.load(f)
Now data is a list, where each item is an object. You can iterate through it:
for row in data:
    print(row['from'])
To fix your immediate problem, remove json.dumps, which converts an object to a string and is not what you want here:
distros_dict = json.loads(line)
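The TypeError comes down to the types involved: json.loads gives you a dict, json.dumps gives you a str. A short sketch with the two sample lines held in a list instead of a file; the variable names are mine.

```python
import json

# The two sample lines from the question, in a list instead of a file
lines = [
    '{"id": {"$oid": "57dbv34346"}, "from": {"$oid": "57dbv34346sbgwe"}, "type": "int"}',
    '{"id": {"$oid": "57dbv34345"}, "from": {"$oid": "57dbv34345sbgwe"}, "type": "int"}',
]

from_values = []
for line in lines:
    record = json.loads(line)              # str -> dict, can be indexed by key
    pretty = json.dumps(record, indent=4)  # dict -> str, only useful for display
    assert isinstance(record, dict) and isinstance(pretty, str)
    from_values.append(record["from"])

print(from_values)
```

Indexing pretty with 'from' would raise the same "string indices must be integers" error, because at that point it is just text.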

List Indices in json in Python

I've got a json file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the json data in each example appears to be very simplistic in nature. Likewise, the json example data in the python docs is very simple and does not reflect what I'm trying to work with. Here is what the json looks like:
{"RecordResponse": {
    "Id": blah
    "Status": {
        "state": "complete",
        "datetime": "2016-01-01 01:00"
    },
    "Results": {
        "resultNumber": "500",
        "Summary": [
            {
                "Type": "blah",
                "Size": "10000000000",
                "OtherStuff": {
                    "valueOne": "first",
                    "valueTwo": "second"
                },
                "fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["Summary"]:
        print(i["fieldIWant"])
Not only am I not getting to the field I want, but I'm also getting a key error when trying to access "Summary".
I don't know how the indices work within the array; once I get into the "Summary" field, do I have to supply an index manually to return the value from the field I need?

The example you posted is not valid JSON (for instance, there is no comma after the "Id" field and its value is unquoted), so it's hard to dig in much. If it's straight from the web service, something's messed up. If you fix it with proper commas, the "Summary" key is within the "Results" object, which in the posted example sits inside "RecordResponse", so you'd need to change your loop as follows. Also open the file with 'r' rather than 'w': mode 'w' truncates the file and does not allow reading.
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["RecordResponse"]["Results"]["Summary"]:
        print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # Recurse with the same key and field names
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
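The function above only descends into dicts. If the field might also sit inside nested lists, a variant along these lines could be used. It is a sketch, not a drop-in replacement: find_all is a made-up name, and the sample is a corrected, cut-down version of the posted JSON with a second, invented Summary entry.

```python
def find_all(obj, fieldname):
    """Collect every value stored under fieldname, at any depth."""
    found = []
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == fieldname:
                found.append(val)
            found.extend(find_all(val, fieldname))
    elif isinstance(obj, list):
        for element in obj:
            found.extend(find_all(element, fieldname))
    return found


# Cut-down sample; the second Summary entry is invented for illustration
sample = {
    "RecordResponse": {
        "Results": {
            "Summary": [
                {"Type": "blah", "fieldIWant": "value i want is here"},
                {"Type": "other", "fieldIWant": "second value"},
            ]
        }
    }
}
print(find_all(sample, "fieldIWant"))
```

Because "Summary" is a list, no manual index is needed; looping over it (or recursing as here) visits each element in turn.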
