I'm facing this error in python3.6.
My json file looks like this:
{
"id":"776",
"text":"Scientists have just discovered a bizarre pattern in global weather. Extreme heat waves like the one that hit the Eastern US in 2012, leaving at least 82 dead, don't just come out of nowhere."
}
It's encoded as UTF-8, and I checked it online: it is a valid JSON file. I tried to load it this way:
p = 'doc1.json'
json.loads(p)
I tried this as well:
p = "doc1.json"
with open(p, "r") as f:
    doc = json.load(f)
The error is the same:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Can anyone help? Thank you!
p = 'doc1.json'
json.loads(p)
Here you're asking the json module to parse the string 'doc1.json', which isn't valid JSON; it's a filename.
You want to open the file, read the contents, then load the contents using json.loads():
p = 'doc1.json'
with open(p, 'r') as f:
    doc = json.loads(f.read())
As suggested in the comments, this could be further simplified to:
p = 'doc1.json'
with open(p, 'r') as f:
    doc = json.load(f)
where json.load() takes a file handle and reads it for you.
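One more thing worth checking, since the question reports the same error from json.load(f): json raises exactly this "Expecting value: line 1 column 1 (char 0)" message when the text it receives is empty, so the file being opened may be empty, or may not be the file you think it is. A minimal sketch of a loader that makes that case explicit (load_json_file is a hypothetical helper name, not part of the json module, and the demo writes its own throwaway files):

```python
import json
import os
import tempfile


def load_json_file(path):
    """Hypothetical helper: load JSON from a path and flag an empty file clearly."""
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()
    if not text.strip():
        raise ValueError("{!r} is empty, so there is no JSON to parse".format(path))
    return json.loads(text)


# Demonstration with throwaway files (the real file would be doc1.json).
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "doc1.json")
empty_path = os.path.join(tmp_dir, "empty.json")

with open(good_path, "w", encoding="utf-8") as f:
    f.write('{"id": "776", "text": "Scientists have just discovered..."}')
open(empty_path, "w").close()  # create an empty file

doc = load_json_file(good_path)
print(doc["id"])

try:
    load_json_file(empty_path)
except ValueError as exc:
    empty_error = str(exc)
    print(empty_error)
```

If the helper complains about an empty file, the problem is with the file on disk (or the working directory), not with the json call.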
Aside
First, a side note on your path. My response won't be about that, but 'doc1.json' on its own only works if the file sits in the current working directory; otherwise your path should be something like './path/to/the/doc1.json' (this example is a relative path).
TL;DR
json.loads is for loading str objects directly; json.load wants a fp or file pointer object which represents a file.
Solution
It appears you are misusing json.loads vs json.load (notice the s in one and not the other). I believe the s stands for string or Python object type str though I may be wrong. There is a very important distinction here; your path is represented by a string, but you actually care about the file as an object.
So of course this breaks because json.loads thinks it is trying to parse a type str object that is actually an invalid json:
path = 'a/path/like/this/is/a/string.json'
json.loads(path)
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
...
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Using this properly would look something like this:
json_str = '{"hello": "world!"}'
json.loads(json_str)
# The expected output.
{'hello': 'world!'}
Since json.loads does not meet our needs (it can, but it takes extra, unnecessary code), we can use its friend json.load. json.load wants its first parameter to be an fp, but what is that? Well, it stands for file pointer, which is a fancy way of saying "an object that represents a file." This has to do with opening a file to do something to or with it. In our case, we want to read the file into json.load.
We will use open() as a context manager since that is a good thing to do. Note, I do not know what the contents of your doc1.json are, so I replaced the output with my own.
path = 'path/to/the/doc1.json'
with open(path, 'r') as fp:
    print(json.load(fp))
# The expected output.
{'hello': 'world!'}
Generally, I think I would use json.load a lot more than json.loads (with the s) since I read directly from json files. If you load some json into your code using a third party package, you may find yourself reading that in your code and then passing it as a str to json.loads.
Resources
Python's json — https://docs.python.org/3/library/json.html#module-json
Related
I am trying to document the Reports, Visuals and measures used in a PBIX file. I have a PBIX file(containing some visuals and pointing to Tabular Model in Live Mode), I then exported it as a PBIT, renamed to zip. Now in this zip file we have a folder called Report, within that we have a file called Layout. The layout file looks like a JSON file but when I try to read it via python,
import json
# Opening JSON file
f = open("C://Layout",)
# returns JSON object as
# a dictionary
#f1 = str.replace("\'", "\"")
data = json.load(f)
I get below issue,
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
Renaming it to Layout.json doesn't help either and gives the same issue. Is there an easy way, or a parser, to specifically parse this Layout file and get the information below out of it?
Report Name | Visual Name | Column or Measure Name
Not sure if you have come across an answer for your question yet, but I have been looking into something similar.
Here is what I have had to do in order to get the file to parse correctly.
The big items to note here are the encoding and all the control-character replacements.
data will then contain the parsed object.
with open('path/to/Layout', 'r', encoding="cp1252") as json_file:
    data_str = json_file.read().replace(chr(0), "").replace(chr(28), "").replace(chr(29), "").replace(chr(25), "")
    data = json.loads(data_str)
This script may help: https://github.com/grenzi/powerbi-model-utilization
a portion of the script is:
import json
import zipfile

def get_layout_from_pbix(pbixpath):
    """
    get_layout_from_pbix loads a pbix file, grabs the layout from it, and returns json
    :parameter pbixpath: file to read
    :return: json goodness
    """
    archive = zipfile.ZipFile(pbixpath, 'r')
    bytes_read = archive.read('Report/Layout')
    s = bytes_read.decode('utf-16-le')
    # parse_pbix_embedded_json is defined elsewhere in the linked script
    json_obj = json.loads(s, object_hook=parse_pbix_embedded_json)
    return json_obj
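For what it's worth, the two answers above are consistent with each other if the Layout entry is UTF-16-LE text, as the script assumes: read as cp1252, every ASCII character would be followed by a chr(0), which is what the first snippet strips out. Here is a small self-contained sketch of the zipfile route. It uses an in-memory archive and a made-up payload (fake_layout is invented for the demo and is not the real Power BI schema):

```python
import io
import json
import zipfile

# Build a stand-in archive in memory; a real .pbix/.pbit would be opened by path.
fake_layout = {"name": "demo report", "pages": 2}  # made-up payload
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Report/Layout", json.dumps(fake_layout).encode("utf-16-le"))

# Read the entry back as bytes and decode it before handing it to json.loads.
with zipfile.ZipFile(buffer, "r") as archive:
    raw = archive.read("Report/Layout")

layout = json.loads(raw.decode("utf-16-le"))
print(layout)

# The raw bytes show a NUL byte after every ASCII character.
print(raw[:6])
```

Decoding with the right codec avoids having to strip characters by hand, though whether every Layout file is UTF-16-LE is an assumption you would need to check against your own files.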
I had a similar issue.
My workaround was to save it as Layout.txt with UTF-8 encoding, then continue as you have.
I'm trying to extract a specific value from log files in a directory.
The log files contain JSON data and I want to extract the value of the id field.
JSON Data look something like this
{
id: "123",
name: "foo"
description: "bar baz"
}
Code Looks like this
def test_load_json_directly(self):
    with open('source_data/testing123.json') as log_file:
        data = json.load(log_file)
    print data

def test_load_json_from_iteration(self, dir_path, file_ext):
    path_name = os.path.join(dir_path, '*.' + file_ext)
    files = glob.glob(path_name)
    for filename in files:
        with open(filename) as log_file:
            data = json.load(log_file)
        print data
When I call the function test_load_json_directly, the JSON string gets loaded correctly. No problem there. This is just to check the correct behavior of the json.load function.
The issue is when I try to call the function test_load_json_from_iteration, the JSON string is not being recognized and returns an error.
ValueError: No JSON object could be decoded
What am I doing wrong here?
Your json is invalid. Property names must always be wrapped in double quotes, and so must string values (numbers, true, false and null are the only unquoted values). You're also missing the comma after the name line.
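For reference, a corrected version of the sample could look like this (a sketch; the values are just the ones from the question, embedded in a string so the snippet runs on its own):

```python
import json

# Same data as in the question, but with quoted property names
# and a comma after every member except the last.
valid_text = """
{
    "id": "123",
    "name": "foo",
    "description": "bar baz"
}
"""

data = json.loads(valid_text)
print(data["id"])
```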
The most probable cause of this error is a syntax error in one of the JSON files. Since the json module in Python 2 doesn't report where parsing failed, you can use the simplejson module to see what's actually happening.
Change your code to:
import simplejson
.
.
.
data = simplejson.load(log_file)
And look at the error message. It will show you the line and the column where it fails.
Ex:
simplejson.errors.JSONDecodeError: Expecting value: line 5 column 17 (char 84)
Hope it helps :) Feel free to ask if you have any doubts.
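If you are on Python 3, the standard json module reports the position as well: json.JSONDecodeError carries msg, lineno, colno and pos attributes. A rough sketch of using them while looping over several inputs, so you can see which one is the bad one (try_parse is a hypothetical helper, and the sample strings stand in for file contents):

```python
import json


def try_parse(name, text):
    """Hypothetical helper: return (data, None) on success or (None, error description)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        where = "{}: {} (line {}, column {})".format(name, exc.msg, exc.lineno, exc.colno)
        return None, where


samples = {
    "good.json": '{"id": "123", "name": "foo"}',
    "bad.json": '{id: "123", name: "foo"}',  # unquoted property names
}

results = {name: try_parse(name, text) for name, text in samples.items()}
for name, (data, problem) in results.items():
    print(problem if problem else "{}: id={}".format(name, data["id"]))
```

In the directory loop from the question, the same try/except around json.load(log_file) with filename in the message would point at the offending file.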
I am trying to unzip some .json.gz files, but gzip adds some characters to it, and hence makes it unreadable for JSON.
What do you think is the problem, and how can I solve it?
If I use unzipping software such as 7zip to unzip the file, this problem disappears.
This is my code:
with gzip.open('filename', 'rb') as f:
    json_content = json.loads(f.read())
This is the error I get:
Exception has occurred: json.decoder.JSONDecodeError
Extra data: line 2 column 1 (char 1585)
I used this code:
with gzip.open('filename', mode='rb') as f:
    print(f.read())
and realized that the file starts with b' (as shown below):
b'{"id":"tag:search.twitter.com,2005:5667817","objectType":"activity"
I think b' is what makes the file unworkable for the next stage. Do you have any solution to remove the b'? There are millions of these zipped files, and I cannot do that manually.
I uploaded a sample of these files at the following link: just a few json.gz files
The problem isn't with that b prefix you're seeing with print(f.read()), which just means the data is a bytes sequence (i.e. raw byte values) rather than a sequence of Unicode characters (i.e. a regular Python str); json.loads() will accept either as of Python 3.6. The JSONDecodeError is because the data in the gzipped file, taken as a whole, isn't valid JSON. The format looks like something known as JSON Lines, which the Python standard library json module doesn't (directly) support.
Dunes' answer to the question that @Charles Duffy marked this as a duplicate of (at one point) wouldn't have worked as presented because of this formatting issue. However, from the sample file you added a link to in your question, it looks like there is a valid JSON object on each line of the file. If that's true of all of your files, then a simple workaround is to process each file line by line.
Here's what I mean:
import json
import gzip
filename = '00_activities.json.gz' # Sample file.
json_content = []
with gzip.open(filename, 'rb') as gzip_file:
    for line in gzip_file:  # Read one line.
        line = line.rstrip()
        if line:  # Any JSON data on it?
            obj = json.loads(line)
            json_content.append(obj)
print(json.dumps(json_content, indent=4))  # Pretty-print data parsed.
Note that the output it prints shows what valid JSON might have looked like.
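Since there are millions of these files, it may also be worth not holding everything in a list. Below is a sketch of the same line-by-line idea written as a generator, opening the archive in text mode ('rt') so each line is already a str. iter_json_lines is a hypothetical name, and the demo writes its own tiny sample file rather than using yours:

```python
import gzip
import json
import os
import tempfile


def iter_json_lines(path):
    """Yield one parsed object per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as gzip_file:
        for line in gzip_file:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a two-record sample so the sketch can run on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as out:
    out.write('{"id": "a", "objectType": "activity"}\n')
    out.write('{"id": "b", "objectType": "activity"}\n')

ids = [record["id"] for record in iter_json_lines(sample_path)]
print(ids)
```

The UTF-8 encoding is an assumption about the files; adjust it if yours are encoded differently.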
I have a file that I wish to parse. It has data in the json format, but the file is not a json file. I want to loop through the file, and pull out the ID where totalReplyCount is greater than 0.
{ "totalReplyCount": 0,
"newLevel":{
"main":{
"url":"http://www.someURL.com",
"name":"Ronald Whitlock",
"timestamp":"2016-07-26T01:22:03.000Z",
"text":"something great"
},
"id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
}
},
{ "totalReplyCount": 4,
"newLevel":{
"main":{
"url":"http://www.someUR2L.com",
"name":"other name",
"timestamp":"2016-07-26T01:22:03.000Z",
"text":"something else great"
},
"id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
}
},
My initial attempt was to do the following
def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row
but I get an error stating
TypeError: 'file' object has no attribute '__getitem__'
I know this is just an attempt at printing and not doing what I want to do, but I am a novice at python and lost as to what I am doing wrong. What is the correct way to do this? My end result should look like this for the ids:
['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]
EDIT 1- 7/26/16
I saw that I made a mistake in my formatting when I copied the code (it was late, I was tired..). I switched it to a proper format that is more like JSON. This new edit properly matches file I am parsing. I then tried to parse it with JSON, and got the ValueError: Extra data: line 2 column 1 - line X column 1:, where line X is the end of the line.
def readCsv(filename):
    with open(filename, 'r') as file:
        data=json.load(file)
        pprint(data)
I also tried DictReader, and got a KeyError: 'totalReplyCount'. Is the dictionary un-ordered?
EDIT 2 -7/27/16
After taking a break, coming back to it, and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file, that contains a proper JSON object on each line. So, I have to parse the CSV file, then parse each line which is a top level, whole and complete JSON object. The code I used to try and parse this is below but all I get is the first string character, an open curly brace '{' :
def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]
I am guessing that the DictReader is converting the json object to a string, and that is why I am only getting a curly brace as opposed to the first key. If I was to do print item[0:5] I would get a mish mash of the first 4 characters in an un-ordered fashion on each line, which I assume is because the format has turned into an un-ordered list? I think I understand my problem a little bit better, but still wrapping my head around the data structures and the methods used to parse them. What am I missing?
After reading the question and all the above answers, please check if this is useful to you.
I have treated the input as a plain text file, not as a CSV or JSON file.
The flow of the code is as follows:
Open the file and read it in reverse order.
Search each line for an ID. Extract the ID and store it in a temp variable.
Keep reading line by line and search for totalReplyCount.
Once you find totalReplyCount, check whether it is greater than 0.
If yes, store the temp ID in id_list and re-initialize the temp variable.
import re

tmp_id_to_store = ''
id_list = []

# Read the whole file, then walk it bottom-up so each "id" is seen
# before the "totalReplyCount" of the object it belongs to.
with open("a.txt") as f:
    lines = f.readlines()

for line in reversed(lines):
    m = re.search(r'"id":"(\w+)"', line.rstrip())
    if m:
        tmp_id_to_store = m.group(1)
    n = re.search(r'{ "totalReplyCount": (\d+),', line.rstrip())
    if n:
        fou = n.group(1)
        if int(fou) > 0:
            id_list.append(tmp_id_to_store)
            tmp_id_to_store = ''

print(id_list)
More check points can be added.
As the error states, your csvFile is a file object, not a dict, so you can't get an item out of it by key.
If your file is in CSV format, you can use the csv module to read each row of the CSV into a dict:
import csv
with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['totalReplyCount'])
Note that csv.DictReader reads each row of the CSV and parses it into a dict keyed by the column names from the header row.
If your input file is JSON, why not just use the json library to parse it and then run a for loop over that data? Then it is just a matter of iterating over the keys and extracting data.
import json
from pprint import pprint
with open('data.json') as data_file:
    data = json.load(data_file)
pprint(data)
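One caveat: on the file as shown in the question, a plain json.load raises the "Extra data" error, because the file holds several top-level objects separated by commas rather than one JSON document. One possible way around that, assuming the objects really are just separated by commas and whitespace, is to decode them one at a time with json.JSONDecoder.raw_decode. In this sketch iter_json_objects is a hypothetical helper and the sample is a shortened copy of the question's data:

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value in text, skipping whitespace and commas between them."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        if text[pos] in " \t\r\n,":
            pos += 1
            continue
        # raw_decode parses one value starting at pos and returns where it stopped.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Shortened copy of the data in the question.
sample = """
{ "totalReplyCount": 0,
  "newLevel": {"main": {"name": "Ronald Whitlock"},
               "id": "z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"}
},
{ "totalReplyCount": 4,
  "newLevel": {"main": {"name": "other name"},
               "id": "kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"}
},
"""

id_list = [
    obj["newLevel"]["id"]
    for obj in iter_json_objects(sample)
    if obj["totalReplyCount"] > 0
]
print(id_list)
```

For a real file you would pass the result of f.read() instead of sample. This keeps everything in memory, which is fine for moderately sized files.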
See "Parsing values from a JSON file using Python?" and look at Justin Peel's answer. It should help.
This link has it all: "Parsing values from a JSON file using Python?" on Stack Overflow.
Here is a shell one-liner, should solve your problem, though it's not python.
egrep -o '"(?:totalReplyCount|id)":(.*?)$' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2
output:
"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
I have a .txt with JSON formatted content, that I would like to read, convert it to a JSON object and then log the result. I could read the file and I'm really close, but unfortunately json_data is a string object instead of a JSON object/dictionary. I assume it's something trivial, but I have no idea, because I'm new to Python, so I would really appreciate if somebody could show me the right solution.
import json
filename = 'html-json.txt'
with open(filename, encoding="utf8") as f:
    jsonContentTxt = f.readlines()
    json_data = json.dumps(jsonContentTxt)
    print (json_data)
You may want to consult the docs for the json module. The Python docs are generally pretty great and this is no exception.
f.readlines() reads the file that f points to (in your case, html-json.txt) and returns its lines as a list of strings, not as a single string. json.dumps(jsonContentTxt) then serializes that list into one JSON-formatted string, which is why json_data ends up as a str rather than a dict.
If you simply want to print the file's text, you could read it with f.read() and print that. On the other hand, if you want to load that JSON into a Python data structure, manipulate it, and then output it, you could do something like this (which uses json.load, a function that takes a file-like object and returns an object such as a dict or list depending on the JSON):
with open(filename, encoding="utf8") as f:
    json_content = json.load(f)
# do stuff with json_content, e.g. json_content['foo'] = 'bar'
# then when you're ready to output:
print(json.dumps(json_content))
You may also want to use the indent argument to json.dumps, which will give you a nicely formatted string.
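To make the difference concrete, here is a small sketch (the sample text is made up and stands in for the contents of html-json.txt): json.dumps on the list from readlines() just produces a string, while json.loads on the joined text produces a dict, which can then be pretty-printed with indent.

```python
import io
import json

# Stand-in for the opened file; the content is invented for the demo.
f = io.StringIO('{\n  "title": "example",\n  "tags": ["a", "b"]\n}\n')

lines = f.readlines()                 # a list of str, one per line
as_string = json.dumps(lines)         # a str holding a JSON array of those lines
as_dict = json.loads("".join(lines))  # the actual parsed content

print(type(as_string).__name__, type(as_dict).__name__)
print(json.dumps(as_dict, indent=4))
```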
Read the json module documentation for Python 2.7 or 3.5:
json.loads(json_as_string)  # Deserializes a string to a Python object hierarchy
Once you have the deserialized form, you can convert it back to JSON with dumps:
json.dumps(json_as_hierarchy)
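In case the naming is confusing: json.dumps returns a string, while json.dump writes to a file-like object passed as its second argument. A minimal sketch, using io.StringIO as a stand-in for a real open file:

```python
import io
import json

hierarchy = json.loads('{"hello": "world!"}')  # str -> Python objects

as_text = json.dumps(hierarchy)                # Python objects -> str

buffer = io.StringIO()                         # stand-in for an open file
json.dump(hierarchy, buffer)                   # Python objects -> written to the file
print(as_text == buffer.getvalue())
```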