JSONDecodeError: Expecting value: line 1 column 1 (char 0) - python

I am getting error Expecting value: line 1 column 1 (char 0) when trying to decode JSON.
The URL I use for the API call works fine in the browser, but gives this error when done through a curl request. The following is the code I use for the curl request.
The error happens at return simplejson.loads(response_json)
response_json = self.web_fetch(url)
response_json = response_json.decode('utf-8')
return json.loads(response_json)
def web_fetch(self, url):
buffer = StringIO()
curl = pycurl.Curl()
curl.setopt(curl.URL, url)
curl.setopt(curl.TIMEOUT, self.timeout)
curl.setopt(curl.WRITEFUNCTION, buffer.write)
curl.perform()
curl.close()
response = buffer.getvalue().strip()
return response
Traceback:
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
111. response = callback(request, *callback_args, **callback_kwargs)
File "/Users/nab/Desktop/pricestore/pricemodels/views.py" in view_category
620. apicall=api.API().search_parts(category_id= str(categoryofpart.api_id), manufacturer = manufacturer, filter = filters, start=(catpage-1)*20, limit=20, sort_by='[["mpn","asc"]]')
File "/Users/nab/Desktop/pricestore/pricemodels/api.py" in search_parts
176. return simplejson.loads(response_json)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/__init__.py" in loads
455. return _default_decoder.decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in decode
374. obj, end = self.raw_decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in raw_decode
393. return self.scan_once(s, idx=_w(s, idx).end())
Exception Type: JSONDecodeError at /pricemodels/2/dir/
Exception Value: Expecting value: line 1 column 1 (char 0)

Your code produced an empty response body, you'd want to check for that or catch the exception raised. It is possible the server responded with a 204 No Content response, or a non-200-range status code was returned (404 Not Found, etc.). Check for this.
Note:
There is no need to use simplejson library, the same library is included with Python as the json module.
There is no need to decode a response from UTF8 to unicode, the simplejson / json .loads() method can handle UTF8 encoded data natively.
pycurl has a very archaic API. Unless you have a specific requirement for using it, there are better choices.
Either the requests or httpx offers much friendlier APIs, including JSON support. If you can, replace your call with:
import requests
response = requests.get(url)
response.raise_for_status() # raises exception when not a 2xx response
if response.status_code != 204:
return response.json()
Of course, this won't protect you from a URL that doesn't comply with HTTP standards; when using arbirary URLs where this is a possibility, check if the server intended to give you JSON by checking the Content-Type header, and for good measure catch the exception:
if (
response.status_code != 204 and
response.headers["content-type"].strip().startswith("application/json")
):
try:
return response.json()
except ValueError:
# decide how to handle a server that's misbehaving to this extent

Be sure to remember to invoke json.loads() on the contents of the file, as opposed to the file path of that JSON:
json_file_path = "/path/to/example.json"
with open(json_file_path, 'r') as j:
contents = json.loads(j.read())
I think a lot of people are guilty of doing this every once in a while (myself included):
contents = json.load(json_file_path)

Check the response data-body, whether actual data is present and a data-dump appears to be well-formatted.
In most cases your json.loads- JSONDecodeError: Expecting value: line 1 column 1 (char 0) error is due to :
non-JSON conforming quoting
XML/HTML output (that is, a string starting with <), or
incompatible character encoding
Ultimately the error tells you that at the very first position the string already doesn't conform to JSON.
As such, if parsing fails despite having a data-body that looks JSON like at first glance, try replacing the quotes of the data-body:
import sys, json
struct = {}
try:
try: #try parsing to dict
dataform = str(response_json).strip("'<>() ").replace('\'', '\"')
struct = json.loads(dataform)
except:
print repr(resonse_json)
print sys.exc_info()
Note: Quotes within the data must be properly escaped

With the requests lib JSONDecodeError can happen when you have an http error code like 404 and try to parse the response as JSON !
You must first check for 200 (OK) or let it raise on error to avoid this case.
I wish it failed with a less cryptic error message.
NOTE: as Martijn Pieters stated in the comments servers can respond with JSON in case of errors (it depends on the implementation), so checking the Content-Type header is more reliable.

Check encoding format of your file and use corresponding encoding format while reading file. It will solve your problem.
with open("AB.json", encoding='utf-8', errors='ignore') as json_data:
data = json.load(json_data, strict=False)

I had the same issue trying to read json files with
json.loads("file.json")
I solved the problem with
with open("file.json", "r") as read_file:
data = json.load(read_file)
maybe this can help in your case

A lot of times, this will be because the string you're trying to parse is blank:
>>> import json
>>> x = json.loads("")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
You can remedy by checking whether json_string is empty beforehand:
import json
if json_string:
x = json.loads(json_string)
else:
# Your code/logic here
x = {}

I encounterred the same problem, while print out the json string opened from a json file, found the json string starts with '', which by doing some reserach is due to the file is by default decoded with UTF-8, and by changing encoding to utf-8-sig, the mark out is stripped out and loads json no problem:
open('test.json', encoding='utf-8-sig')

This is the minimalist solution I found when you want to load json file in python
import json
data = json.load(open('file_name.json'))
If this give error saying character doesn't match on position X and Y, then just add encoding='utf-8' inside the open round bracket
data = json.load(open('file_name.json', encoding='utf-8'))
Explanation
open opens the file and reads the containts which later parse inside json.load.
Do note that using with open() as f is more reliable than above syntax, since it make sure that file get closed after execution, the complete sytax would be
with open('file_name.json') as f:
data = json.load(f)

There may be embedded 0's, even after calling decode(). Use replace():
import json
struct = {}
try:
response_json = response_json.decode('utf-8').replace('\0', '')
struct = json.loads(response_json)
except:
print('bad json: ', response_json)
return struct

I had the same issue, in my case I solved like this:
import json
with open("migrate.json", "rb") as read_file:
data = json.load(read_file)

I was having the same problem with requests (the python library). It happened to be the accept-encoding header.
It was set this way: 'accept-encoding': 'gzip, deflate, br'
I simply removed it from the request and stopped getting the error.

Just check if the request has a status code 200. So for example:
if status != 200:
print("An error has occured. [Status code", status, "]")
else:
data = response.json() #Only convert to Json when status is OK.
if not data["elements"]:
print("Empty JSON")
else:
"You can extract data here"

In my case I was doing file.read() two times in if and else block which was causing this error. so make sure to not do this mistake and hold contain in variable and use variable multiple times.

I had exactly this issue using requests.
Thanks to Christophe Roussy for his explanation.
To debug, I used:
response = requests.get(url)
logger.info(type(response))
I was getting a 404 response back from the API.

In my case it occured because i read the data of the file using file.read() and then tried to parse it using json.load(file).I fixed the problem by replacing json.load(file) with json.loads(data)
Not working code
with open("text.json") as file:
data=file.read()
json_dict=json.load(file)
working code
with open("text.json") as file:
data=file.read()
json_dict=json.loads(data)

For me, it was not using authentication in the request.

For me it was server responding with something other than 200 and the response was not json formatted. I ended up doing this before the json parse:
# this is the https request for data in json format
response_json = requests.get()
# only proceed if I have a 200 response which is saved in status_code
if (response_json.status_code == 200):
response = response_json.json() #converting from json to dictionary using json library

I received such an error in a Python-based web API's response .text, but it led me here, so this may help others with a similar issue (it's very difficult to filter response and request issues in a search when using requests..)
Using json.dumps() on the request data arg to create a correctly-escaped string of JSON before POSTing fixed the issue for me
requests.post(url, data=json.dumps(data))

In my case it is because the server is giving http error occasionally. So basically once in a while my script gets the response like this rahter than the expected response:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<h1>502 Bad Gateway</h1>
<p>The proxy server received an invalid response from an upstream server.<hr/>Powered by Tengine</body>
</html>
Clearly this is not in json format and trying to call .json() will yield JSONDecodeError: Expecting value: line 1 column 1 (char 0)
You can print the exact response that causes this error to better debug.
For example if you are using requests and then simply print the .text field (before you call .json()) would do.

I did:
Open test.txt file, write data
Open test.txt file, read data
So I didn't close file after 1.
I added
outfile.close()
and now it works

If you are a Windows user, Tweepy API can generate an empty line between data objects. Because of this situation, you can get "JSONDecodeError: Expecting value: line 1 column 1 (char 0)" error. To avoid this error, you can delete empty lines.
For example:
def on_data(self, data):
try:
with open('sentiment.json', 'a', newline='\n') as f:
f.write(data)
return True
except BaseException as e:
print("Error on_data: %s" % str(e))
return True
Reference:
Twitter stream API gives JSONDecodeError("Expecting value", s, err.value) from None

if you use headers and have "Accept-Encoding": "gzip, deflate, br" install brotli library with pip install. You don't need to import brotli to your py file.

In my case it was a simple solution of replacing single quotes with double.
You can find my answer here

Related

How to get file size if content-length does not exist?

I have a file I'd like Python to find out and print its file size. Unfortunately, the headers do not have the key content-length. Is there a workaround to finding the file size without content-length? Thanks.
import requests
link = "https://web.archive.org/cdx/search/cdx?url=twitter.com/realdonaldtrump/status&matchType=prefix&filter=statuscode:200"
response = requests.head(link)
print(response.headers['content-length'])
Error:
return self._store[key.lower()][1]
KeyError: 'content-length'

normally my api returns json, but sometimes returns a full response object?

All,
I have a script i have in place which fetches JSON off of a webserver. Simple as the following:
url = "foo.com/json"
response = requests.get(url).content
data = json.loads(response)
but i noticed is that sometimes instead of returning the JSON object, it will return what looks like a response dump. See here: https://pastebin.com/fUy5YMuY
What confuses me is to how to continue on.
Right now i took the above python and wrapped it
try:
url = "foo.com/json"
response = requests.get(url).content
data = json.loads(response)
except Exception as ex:
with open("test.txt", "w") as t:
t.write(response)
print("Error", sys.exc_info())
Is there a way to catch this? Right now I get a ValueError... and then reparse it? I was thinking to do something like:
except Exception as ex:
response = reparse(response)
but im still confused as to why it will sometimes return the JSON and other times, the header info + content.
def reparse(response):
"""
Catch the ValueError and attempt to reparse it for the json contnet
"""
Can i feed something like the pastebin dump into some sort of requests.Reponse class or similar?
Edit Here is the full stack trace I am getting.
File "scrape_people_by_fcc_docket.py", line 82, in main
json_data = get_page(limit, page*limit)
File "scrape_people_by_fcc_docket.py", line 13, in get_page
data = json.loads(response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 16 column 367717 (char 3 - 368222)
None
In the above code, the response variable is defined by:
response = requests.get(url).content
which is odd because most of the time, reponse will return a JSON object which is completely parsable.
Ideally, I have been trying to find a way to, when content isnt JSON, pass some how parse it for the actual content and then continue on.
instead of using .text or .content you can use the response method: .json() which so far seems to resolve my issues. I am doing continual testing and watching for errors and will update this as needed, but it seems that the json function will return the data i need without headers, and similarly already calls json.loads or similar to parse the information.

urlib.request.urlopen not accepting query string with spaces

I am taking a udacity course on python where we are supposed to check for profane words in a document. I am using the website http://www.wdylike.appspot.com/?q= (text_to_be_checked_for_profanity). The text to be checked can be passed as a query string in the above URL and the website would return a true or false after checking for profane words. Below is my code.
import urllib.request
# Read the content from a document
def read_content():
quotes = open("movie_quotes.txt")
content = quotes.read()
quotes.close()
check_profanity(content)
def check_profanity(text_to_read):
connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
result = connection.read()
print(result)
connection.close
read_content()
It gives me the following error
Traceback (most recent call last):
File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 21, in <module>
read_content()
File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 11, in read_content
check_profanity(content)
File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 16, in check_profanity
connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
The document that I am trying to read the content from contains a string "Hello world" However, if I change the string to "Hello+world", the same code works and returns the desired result. Can someone explain why this is happening and what is a workaround for this?
urllib accepts it, the server doesn't. And well it should not, because a space is not a valid URL character.
Escape your query string properly with urllib.parse.quote_plus(); it'll ensure your string is valid for use in query parameters. Or better still, use the urllib.parse.urlencode() function to encode all key-value pairs:
from urllib.parse import urlencode
params = urlencode({'q': text_to_read})
connection = urllib.request.urlopen(f"http://www.wdylike.appspot.com/?{params}")
The below response is for python 3.*
400 Bad request occurs when there is space within your input text.
To avoid this use parse.
so import it.
from urllib import request, parse
If you are sending any text along with the url then parse the text.
url = "http://www.wdylike.appspot.com/?q="
url = url + parse.quote(input_to_check)
Check the explanation here - https://discussions.udacity.com/t/problem-in-profanity-with-python-3-solved/227328
The Udacity profanity checker program -
from urllib import request, parse
def read_file():
fhand = open(r"E:\Python_Programming\Udacity\movie_quotes.txt")
file_content = fhand.read()
#print (file_content)
fhand.close()
profanity_check(file_content)
def profanity_check(input_to_check):
url = "http://www.wdylike.appspot.com/?q="
url = url + parse.quote(input_to_check)
req = request.urlopen(url)
answer = req.read()
#print(answer)
req.close()
if b"true" in answer:
print ("Profanity Alret!!!")
else:
print ("Nothing to worry")
read_file()
I think this code is closer to what the Lesson was aiming to, inferencing the difference between native functions, classes and functions inside classes:
from urllib import request, parse
def read_text():
quotes = open('C:/Users/Alejandro/Desktop/movie_quotes.txt', 'r+')
contents_of_file = quotes.read()
print(contents_of_file)
check_profanity(contents_of_file)
quotes.close()
def check_profanity(text_to_check):
connection = request.urlopen('http://www.wdylike.appspot.com/?q=' + parse.quote(text_to_check))
output = connection.read()
# print(output)
connection.close()
if b"true" in output:
print("Profanity Alert!!!")
elif b"false" in output:
print("This document has no curse words!")
else:
print("Could not scan the document properly")
read_text()
I'm working on the same project also using Python 3 like the most.
While looking for the solution in Python 3, I found this HowTo, and I decided to give it a try.
It seems that on some websites, including Google, connections through programming code (for example, via the urllib module), sometimes does not work properly. Apparently this has to do with the User Agent, which is recieved by the website when building the connection.
I did some further researches and came up with the following solution:
First I imported URLopener from urllib.request and created a class called ForceOpen as a subclass of URLopener.
Now I could create a "regular" User Agent by setting the variable version inside the ForceOpen class. Then just created an instance of it and used the open method in place of urlopen to open the URL.
(It works fine, but I'd still appreciate comments, suggestions or any feedback, also because I'm not absolute sure, if this way is a good alternative - many thanks)
from urllib.request import URLopener
class ForceOpen(URLopener): # create a subclass of URLopener
version = "Mozilla/5.0 (cmp; Konqueror ...)(Kubuntu)"
force_open = ForceOpen() # create an instance of it
def read_text():
quotes = open(
"/.../profanity_editor/data/quotes.txt"
)
contents_of_file = quotes.read()
print(contents_of_file)
quotes.close()
check_profanity(contents_of_file)
def check_profanity(text_to_check):
# now use the open method to open the URL
connection = force_open.open(
"http://www.wdylike.appspot.com/?q=" + text_to_check
)
output = connection.read()
connection.close()
if b"true" in output:
print("Attention! Curse word(s) have been detected.")
elif b"false" in output:
print("No curse word(s) found.")
else:
print("Error! Unable to scan document.")
read_text()

read xml files online

i'm new to programing and I'm trying to accese the webservice provided in http://indicadoreseconomicos.bccr.fi.cr/indicadoreseconomicos/WebServices/wsindicadoreseconomicos.asmx?op=ObtenerIndicadoresEconomicosXML, i've added the parameters I need to acces it but when I try to read the file in python I get
TypeError: 'HTTPResponse' object cannot be interpreted as an integer
this is my code
import urllib
import http.client
import time
HEADERS={"Content-type":"application/x-www-form-urlencoded","Accept":"text/plain"}
HOST = "indicadoreseconomicos.bccr.fi.cr"
POST = "/indicadoreseconomicos/WebServices/wsIndicadoresEconomicos.asmx/ObtenerIndicadoresEconomicos"
data = urllib.parse.urlencode({'tcIndicador': 317,
'tcFechaInicio':str(time.strftime("%d/%m/%Y")),
'tcFechaFinal':str(time.strftime("%d/%m/%Y")),
'tcNombre' : 'TI1400',
'tnSubNiveles' : 'N'})
conn=http.client.HTTPConnection(HOST)
conn.request("POST",POST,data,headers=HEADERS)
response= conn.getresponse()
responseSTR= response.read(response)
print (response)
Any suggestions are apreciated
response.read() takes an optional argument that is the number of bytes to read from the response; an integer, whole number. Now you passed the response object instead.
As you want to read the entire response, you should omit the argument altogether, thus:
response_str = response.read()
print(response_str)

Python requests unicode error

I'm using a simple Python script to POST files to a PHP script:
...
url = "http://example.com/upload.php"
r = requests.post(url, data=data, files=files)
...
I need to catch the response text, which is stored in
r.text
But when the response contains ASCII characters (e.g. an image file), the Python fails with this error:
content = str(self.content, encoding, errors='replace')
TypeError: unicode() argument 2 must be string, not None
Is there any way to avoid this error?
Binary file such as an image has no associated character encoding. Use r.content instead of r.text.

Categories