How to read JSON objects from Tweet.py results

I am trying to read the JSON file created by Tweet.py. However, whatever I try, I consistently receive a ValueError.
ValueError: Expecting property name: line 1 column 3 (char 2)
JSON results are in the format of:
{ 'Twitter Data' : [ {
"contributors": null,
"coordinates": null,
"created_at": "Tue Oct 24 15:55:21 +0000 2017",
"entities": {
"hashtags": ["#football"]
}
} , {
"contributors": johnny,
"coordinates": null,
"created_at": "Tue Oct 24 15:55:21 +0000 2017",
"entities": {
"hashtags": ["#football" , "#FCB"]
}
} , ... ] }
There are at least 50 of these JSON objects in the file, which are separated by commas.
My Python script to read this json file is:
twitter_data=[]
with open('#account.json' , 'r') as json_data:
    for line in json_data:
        twitter_data.append(json.loads(line))
print twitter_data
Tweet.py writes these Json objects by using:
json.dump(status._json,file,sort_keys = True,indent = 4)
I would appreciate any help and guidance on how to read this file!
Thank you.

The { 'Twitter Data' bit should be { "Twitter Data", and johnny should be "johnny".
That is to say, keys and string values must be enclosed in double quotes.
Once the file is valid JSON, read it whole rather than line by line:
with open("#account.json","r") as json_data:
    data = json_data.read()
twitter_data = json.loads(data)
Also, I haven't used this myself, but this might be of help as well: https://jsonlint.com
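To make the quoting rule concrete, here is a minimal sketch. The two strings are shortened stand-ins for the start of the file, not the real data; the point is only that the parser rejects single-quoted keys and accepts double-quoted ones.

```python
import json

# Shortened, hypothetical stand-ins for the start of the file.
bad = "{ 'Twitter Data' : [] }"   # single quotes: a Python literal, not JSON
good = '{ "Twitter Data" : [] }'  # double quotes: valid JSON

try:
    json.loads(bad)
    bad_parsed = True
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    bad_parsed = False
    print("rejected:", exc)

data = json.loads(good)
print(data)
```

The same applies to bare values such as johnny: anything that is not a number, true, false or null has to be a double-quoted string.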

First off, as both @Rob and @silent have noted, 'Twitter Data' should be "Twitter Data". JSON needs double quotes, not single quotes, to delimit a string.
Secondly, json.load() expects a file object, so when calling json.load(), just pass in json_data and it will read the whole JSON file into memory:
with open('#account.json' , 'r') as json_data:
    contents = json.load(json_data)
EDIT:
For handling multiple objects at once:
def get_objs(f):
    content = f.read()
    # Get each object in the contents of the file object.
    # This is kinda clunky and inelegant, but it should work
    objs = ['{}{}'.format(i, '}') for i in content.split('},')]
    # Last json_obj probably got an unnecessary "}" at the end, so trim the
    # last character from it
    objs[-1] = objs[-1][0:-1]
    json_objs = [json.loads(i) for i in objs]
    return json_objs
and then just go:
with open('#account.json', 'r') as json_data:
    json_objs = get_objs(json_data)
Hopefully this will work for you. It did for me when I tested it on a similarly formatted JSON file.
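Splitting on '},' can misfire when an object contains nested objects, as the "entities" field does here. As an alternative sketch (not the approach used above), the standard library's json.JSONDecoder.raw_decode can walk through several objects written back to back, which is what repeated json.dump calls produce. The helper name iter_json_objects and the sample text are made up for illustration.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here;
        # stray commas between objects are skipped as well.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical sample: two objects dumped one after the other.
sample = (
    '{"created_at": "Tue Oct 24", "entities": {"hashtags": ["#football"]}}\n'
    '{"created_at": "Tue Oct 24", "entities": {"hashtags": ["#football", "#FCB"]}}'
)
tweets = list(iter_json_objects(sample))
print(len(tweets))
```

With a real file you would pass f.read() instead of sample. This assumes every object in the file is itself valid JSON.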

Related

Inconsistent error: json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590)

I have .json documents generated from the same code. Multiple nested dicts are dumped to these JSON documents. While loading with json.load(opened_json), I get an error like json.decoder.JSONDecodeError: Extra data: line 30 column 2 (char 590) for some of the files but not for others. I don't understand why. What is the proper way to dump multiple (possibly nested) dicts into JSON docs, and in my current case, how can I read them all? (Extra: dicts can span multiple lines, so splitting by line probably does not work.)
Ex: Say I am json.dump(data, file) with data = {'meta_data':{some_data}, 'real_data':{more_data}}.
Let us take these two fake files:
{
"meta_data": {
"id": 0,
"start": 1238397024.0,
"end": 1238397056.0,
"best": []
},
"real_data": {
"YAS": {
"t1": [
1238397047.2182617
],
"v1": [
5.0438767766574255
],
"v2": [
4.371670270544587
]
}
}
}
and
{
"meta_data": {
"id": 0,
"start": 1238397056.0,
"end": 1238397088.0,
"best": []
},
"real_data": {
"XAS": {
"t1": [
1238397047.2182617
],
"v1": [
5.0438767766574255
],
"v2": [
4.371670270544587
]
}
}
}
and try to load them using json.load(open(file_path)) to reproduce the problem.

You chose not to offer a
reprex.
Here is the code I'm running
which is intended to represent what you're running.
If there is some discrepancy, update the original
question to clarify the details.
import json
from io import StringIO
some_data = dict(a=1)
more_data = dict(b=2)
data = {"meta_data": some_data, "real_data": more_data}
file = StringIO()
json.dump(data, file)
file.seek(0)
d = json.load(file)
print(json.dumps(d, indent=4))
output
{
"meta_data": {
"a": 1
},
"real_data": {
"b": 2
}
}
As is apparent, under the circumstances you have
described the JSON library does exactly what we
would expect of it.
EDIT
Your screenshot makes it pretty clear
that a bunch of ASCII NUL characters are appended
to the 1st file.
We can easily reproduce that JSONDecodeError: Extra data
symptom by adding a single line:
json.dump(data, file)
file.write(chr(0))
(Or perhaps chr(0) * 80 more closely matches the truncated screenshot.)
If your file ends with extraneous characters, such as NUL,
then it will no longer be valid JSON and compliant
parsers will report a diagnostic message when they
attempt to read it.
And there's nothing special about NUL, as a simple
file.write("X") suffices to produce that same
diagnostic.
You will need to trim those NULs from the file's end
before attempting to parse it.
For best results, use UTF-8 encoding with no
BOM.
Your editor should have settings for
switching to UTF-8.
Use $ file foo.json to verify encoding details,
and $ iconv --to-code=UTF-8 < foo.json
to alter an unfortunate encoding.
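A minimal sketch of the trimming step described above, assuming the only damage is a run of trailing NUL characters. The sample string is a shortened stand-in for the first file, not its real contents.

```python
import json

# Hypothetical stand-in: valid JSON followed by 80 NUL characters.
raw = '{"meta_data": {"id": 0}, "real_data": {}}' + "\x00" * 80

try:
    json.loads(raw)
    failed = False
except json.JSONDecodeError as exc:
    failed = True
    print("parse failed:", exc)

# Strip trailing NULs (and ordinary whitespace) before parsing.
cleaned = raw.rstrip("\x00 \t\r\n")
data = json.loads(cleaned)
print(data)
```

This only hides the symptom. It is still worth finding out why the writing side appends NULs in the first place.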

You need to read the file. You can do either of these:
data = json.loads(open("data.json").read())
or
with open("data.json", "r") as file:
    data = json.load(file)

Is there a way to just grab one subset of json data from a large text file?

I'm looking to pull the "name" field from a large json text file and be able to store them in another file for later, but I'm getting every piece of data that was in my previous json file albeit slightly modified. How do I make it so I only grab the data after the "name": field in my json file?
I've tried
names = []
with open('./out.json', 'r') as f:
    data = json.load(f)
for name in data:
    names.append(data[name])
with open('./names.json','w') as f:
    for name in names:
        f.write('%s\r\n' % name)
and I'm getting my exact json file back, with no formatting and u' in front of everything, likely from the json.load(f), but I have no idea how to remedy this.
my text file is formatted like this, if it matters:
{
"array":[
{
"name": "Seranul",
"id": 5,
"type": "Paladin",
"itemLevel": 414,
"icon": "Paladin-Holy",
"total": 11107150,
"activeTime": 2205387,
"activeTimeReduced": 2205387
},
{
"name": "Contherious",
"id": 9,
"type": "Hunter",
"itemLevel": 412,
"icon": "Hunter-Marksmanship",
"total": 51102811,
"activeTime": 2637303,
"activeTimeReduced": 2637303
},
{
"name": "Unicorns",
"id": 17,
"type": "Priest",
"itemLevel": null,
"icon": "Priest",
"total": 12252005,
"activeTime": 1768883,
"activeTimeReduced": 1761797
},
...
}
]}
I'm expecting to see the corresponding data for each name field, but I'm getting my entire document back.

It looks like your code is ignoring the structure of the JSON data. Specifically, you are iterating through the keys of the top-level JSON dictionary, of which there is only one, array, and then appending its value to your names list. This results in the whole array property being put into your names variable.
Here is what I believe you want: iterate through the entries in array, add each name to a list, then export that list as JSON to another file.
import json
names = []
with open('./out.json', 'r') as f:
    data = json.load(f)
for entry in data["array"]:
    names.append(entry["name"])
with open('./names.json', 'w') as f:
    f.write(json.dumps(names))
This will result in the following JSON in names.json:
["Seranul", "Contherious", "Unicorns"]

Parse JSON structures in a txt file containing JSON and text structures

I have a txt file with JSON structures. The problem is that the file contains not only JSON structures but also raw text such as log lines:
2019-01-18 21:00:05.4521|INFO|Technical|Batch Started|
2019-01-18 21:00:08.8740|INFO|Technical|Got Entities List from 20160101 00:00 :
{
"name": "1111",
"results": [{
"filename": "xxxx",
"numberID": "7412"
}, {
"filename": "xgjhh",
"numberID": "E52"
}]
}
2019-01-18 21:00:05.4521|INFO|Technical|Batch Started|
2019-01-18 21:00:08.8740|INFO|Technical|Got Entities List from 20160101 00:00 :
{
"name": "jfkjgjkf",
"results": [{
"filename": "hhhhh",
"numberID": "478962"
}, {
"filename": "jkhgfc",
"number": "12544"
}]
}
I read the .txt file, but when trying to parse the JSON structures I get an error:
IN :
import json
with open("data.txt", "r", encoding="utf-8", errors='ignore') as f:
    json_data = json.load(f)
OUT : json.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)
I would like to parse the JSON and save it as a CSV file.

A more general solution to parsing a file with JSON objects mixed with other content without any assumption of the non-JSON content would be to split the file content into fragments by the curly brackets, start with the first fragment that is an opening curly bracket, and then join the rest of fragments one by one until the joined string is parsable as JSON:
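import json
import re
with open("data.txt", "r", encoding="utf-8", errors="ignore") as f:
    fragments = iter(re.split('([{}])', f.read()))
while True:
    try:
        # skip ahead to the next opening curly bracket
        while True:
            candidate = next(fragments)
            if candidate == '{':
                break
        # keep appending fragments until the candidate parses as JSON
        while True:
            candidate += next(fragments)
            try:
                print(json.loads(candidate))
                break
            except json.decoder.JSONDecodeError:
                pass
    except StopIteration:
        break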
This outputs:
{'name': '1111', 'results': [{'filename': 'xxxx', 'numberID': '7412'}, {'filename': 'xgjhh', 'numberID': 'E52'}]}
{'name': 'jfkjgjkf', 'results': [{'filename': 'hhhhh', 'numberID': '478962'}, {'filename': 'jkhgfc', 'number': '12544'}]}

This solution will strip out the non-JSON lines and wrap the remaining JSON structures in a containing JSON structure. This should do the job for you. I'm posting this as is for expediency, then I'll edit my answer for a clearer explanation. I'll edit this first bit when I've done that:
import json
with open("data.txt", "r", encoding="utf-8", errors='ignore') as f:
    cleaned = ''.join([item.strip() if item.strip() != '' else '-split_here-' for item in f.readlines() if '|INFO|' not in item]).split('-split_here-')
json_data = json.loads(json.dumps(('{"entries":[' + ''.join([entry + ', ' for entry in cleaned])[:-2] + ']}')))
Output:
{"entries":[{"name": "1111","results": [{"filename": "xxxx","numberID": "7412"}, {"filename": "xgjhh","numberID": "E52"}]}, {"name": "jfkjgjkf","results": [{"filename": "hhhhh","numberID": "478962"}, {"filename": "jkhgfc","number": "12544"}]}]}
What's going on here?
In the cleaned = ... line, we're using a list comprehension that creates a list of the lines in the file (f.readlines()) that do not contain the string |INFO| and adds the string -split_here- to the list whenever there's a blank line (where .strip() yields '').
Then, we're converting that list of lines (''.join()) into a string.
Finally, we're splitting that string (.split('-split_here-')) into a list of strings, one per JSON structure, using the blank lines in data.txt as separators.
In the json_data = ... line, we're appending a ', ' to each of the JSON structures using a list comprehension.
Then, we convert that list back into a single string, stripping off the last ', ' (.join()[:-2]; the [:-2] slices off the last two characters of the string).
We then wrap the string with '{"entries":[' and ']}' to make the whole thing a valid JSON structure, and feed it to json.dumps and json.loads to clean any encoding and load your data as a Python object.

You could do one of several things:
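On the command line, remove all lines where, say, "|INFO|Technical|" appears (assuming this appears in every line of raw text). The | is left unescaped because in sed's basic regular expressions a plain | is a literal character, whereas GNU sed treats \| as alternation:
sed -i '' -e '/|INFO|Technical|/d' yourfilename (if on Mac),
sed -i '/|INFO|Technical|/d' yourfilename (if on Linux).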
Move these raw lines into their own JSON fields

Use the "text structures" as a delimiter between JSON objects.
Iterate over the lines in the file, saving them to a buffer until you encounter a line that is a text line, at which point parse the lines you've saved as a JSON object.
import re
import json
def is_text(line):
    # returns a match object (truthy) if line starts with a date and time in "YYYY-MM-DD HH:MM:SS" format
    line = line.lstrip('|') # you said some lines start with a leading |, remove it
    return re.match(r"^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", line)
json_objects = []
with open("data.txt") as f:
    json_lines = []
    for line in f:
        if not is_text(line):
            json_lines.append(line)
        else:
            # if there's multiple text lines in a row json_lines will be empty
            if json_lines:
                json_objects.append(json.loads("".join(json_lines)))
                json_lines = []
    # we still need to parse the remaining object in json_lines
    # if the file doesn't end in a text line
    if json_lines:
        json_objects.append(json.loads("".join(json_lines)))
print(json_objects)
Repeating logic in the last two lines is a bit ugly, but you need to handle the case where the last line in your file is not a text line, so when you're done with the for loop you need to parse the last object sitting in json_lines if there is one.
I'm assuming there's never more than one JSON object between text lines, and also my regular expression for a date will break in 8,000 years.

You could count curly brackets in your file to find the beginning and end of each JSON object, and store the parsed objects in a list, here found_jsons.
import json
with open("data.txt") as f:
    content = f.read()
open_chars = 0
saved_content = []
found_jsons = []
for i in content.splitlines():
    open_chars += i.count('{')
    if open_chars:
        saved_content.append(i)
    open_chars -= i.count('}')
    if open_chars == 0 and saved_content:
        found_jsons.append(json.loads('\n'.join(saved_content)))
        saved_content = []
for i in found_jsons:
    print(json.dumps(i, indent=4))
Output
{
"results": [
{
"numberID": "7412",
"filename": "xxxx"
},
{
"numberID": "E52",
"filename": "xgjhh"
}
],
"name": "1111"
}
{
"results": [
{
"numberID": "478962",
"filename": "hhhhh"
},
{
"number": "12544",
"filename": "jkhgfc"
}
],
"name": "jfkjgjkf"
}
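The question also asks for CSV output, which none of the answers cover. Here is a rough sketch of that last step, assuming the objects have already been parsed into a list as above. The column layout (one row per entry in "results"), the helper name to_rows, and the choice to fold the stray "number" key into the numberID column are all assumptions for illustration.

```python
import csv
import io

# The two objects from the question, as they look after parsing.
records = [
    {"name": "1111", "results": [
        {"filename": "xxxx", "numberID": "7412"},
        {"filename": "xgjhh", "numberID": "E52"}]},
    {"name": "jfkjgjkf", "results": [
        {"filename": "hhhhh", "numberID": "478962"},
        {"filename": "jkhgfc", "number": "12544"}]},
]

def to_rows(records):
    """Flatten each record into one row per entry in its results list."""
    for rec in records:
        for res in rec.get("results", []):
            yield {
                "name": rec.get("name"),
                "filename": res.get("filename"),
                # Assumption: "number" is a variant spelling of "numberID".
                "numberID": res.get("numberID", res.get("number")),
            }

# StringIO keeps the sketch self-contained; for a real file use
# open("out.csv", "w", newline="") instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["name", "filename", "numberID"])
writer.writeheader()
writer.writerows(to_rows(records))
csv_text = out.getvalue()
print(csv_text)
```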

TypeError: string indices must be integers // working with JSON as dict in python

Okay, so I've been banging my head on this for the last 2 days, with no real progress. I am a beginner with python and coding in general, but this is the first issue I haven't been able to solve myself.
So I have this long file with JSON formatting with about 7000 entries from the YouTube API.
Right now I want to have a short script to print certain info ('videoId') for a certain dictionary key (referred to as 'key'):
My script:
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key']['Items']['id']['videoId'])
# print(trailers['key']['videoId']) gives same response
Error:
print(trailers['key']['Items']['id']['videoId'])
TypeError: string indices must be integers
It does work when I want to print all the information for the dictionary key:
This script works
import json
f = open ('path file.txt', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['key'])
Also print(type(trailers)) results in class 'dict', as it's supposed to.
My JSON File is formatted like this and is from the youtube API, youtube#searchListResponse.
{
"kind": "youtube#searchListResponse",
"etag": "",
"nextPageToken": "",
"regionCode": "",
"pageInfo": {
"totalResults": 1000000,
"resultsPerPage": 1
},
"items": [
{
"kind": "youtube#searchResult",
"etag": "",
"id": {
"kind": "youtube#video",
"videoId": ""
},
"snippet": {
"publishedAt": "",
"channelId": "",
"title": "",
"description": "",
"thumbnails": {
"default": {
"url": "",
"width": 120,
"height": 90
},
"medium": {
"url": "",
"width": 320,
"height": 180
},
"high": {
"url": "",
"width": 480,
"height": 360
}
},
"channelTitle": "",
"liveBroadcastContent": "none"
}
}
]
}
What other information is needed to be given for you to understand the problem?

The following code gives me all the videoIds from the provided sample data (which in fact contains no IDs at all):
import json
with open('sampledata', 'r') as datafile:
    data = json.loads(datafile.read())
print([item['id']['videoId'] for item in data['items']])
Perhaps you can try this with more data.
Hope this helps.

I didn't really look into the YouTube API, but looking at the code and the sample you gave, it seems you missed out a [0]. Looking at the structure of the JSON, the value under the key items is a list.
import json
f = open ('json1.json', 'r')
s = f.read()
trailers = json.loads(s)
print(trailers['items'][0]['id']['videoId'])

I've not used json before at all. But it's basically imported in the form of dicts containing more dicts, lists and so on, where applicable. At least that's my understanding.
So when you do type(trailers) you get type dict. Then you index into the dict with trailers['key']. If you take the type of that, it should also be a dict, if things work correctly. Working through the items in each dict should in the end find your error.
Python's error says you are trying to index a string, which only accepts integers, with something that is not an integer, as if the string were a dict. So you need to find out why you are getting a string and not a dict at each step.
Edit to add an example: if your dict contains a string under the key 'item', then you get a string in return, not a new dict from which you can fetch further values. items in the JSON, for example, seems to be a list with dicts in it, not a dict itself.
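The advice to check the type at every step can be sketched as follows. The helper describe_path and the videoId value "abc123" are made up for illustration; the document is a cut-down version of the sample in the question.

```python
import json

# Cut-down sample; "abc123" is a placeholder, not a real videoId.
doc = json.loads(
    '{"items": [{"id": {"kind": "youtube#video", "videoId": "abc123"}}]}'
)

def describe_path(obj, path):
    """Walk obj along path and return the type name seen at each step."""
    seen = [type(obj).__name__]
    for step in path:
        obj = obj[step]
        seen.append(type(obj).__name__)
    return seen

# "items" is a list, so an integer index is needed before "id".
print(describe_path(doc, ["items", 0, "id", "videoId"]))

# Indexing a str with a str raises the TypeError from the question.
value = doc["items"][0]["id"]["videoId"]
try:
    value["videoId"]
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

Printing the types this way shows where a list or a string sits in a position where the code expected a dict.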

List Indices in json in Python

I've got a json file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the json data in each example appears to be very simplistic in nature. Likewise, the json example data in the python docs is very simple and does not reflect what I'm trying to work with. Here is what the json looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["Summary"]:
        print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?

The example you posted is not valid JSON (for example, there is no comma after the "Id" field, and blah is unquoted), so it's hard to dig in much. If it's straight from the web service, something's messed up. If you did fix it, there are two more problems: the file is opened with 'w', which truncates it instead of reading it, and the "Summary" key is within the "Results" object, which is itself within "RecordResponse". So you'd need to change your code to
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
    for i in json_obj["RecordResponse"]["Results"]["Summary"]:
        print(i["fieldIWant"])
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # pass the names along so non-default arguments survive recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
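The function above only descends through dicts. As a variation (a sketch, not part of the original answer), a generator can also walk into lists and collect every value stored under a given key. The name find_values and the trimmed sample are illustrative assumptions.

```python
def find_values(obj, key):
    """Yield every value stored under key anywhere in nested dicts/lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep searching below this value as well.
            yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)

# Trimmed, repaired version of the structure from the question.
sample = {
    "RecordResponse": {
        "Results": {
            "resultNumber": "500",
            "Summary": [
                {"Type": "blah", "fieldIWant": "value i want is here"}
            ],
        }
    }
}
found = list(find_values(sample, "fieldIWant"))
print(found)
```

This answers the index question too: iterating over the "Summary" list visits each element in turn, so no manual index is needed.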
