What is the difference between json.load() and json.loads() functions - python

In Python, what is the difference between json.load() and json.loads()?
I guess that the load() function must be used with a file object (thus I need to use a context manager), while the loads() function takes the path to the file as a string. It is a bit confusing.
Does the letter "s" in json.loads() stand for string?
Thanks a lot for your answers!

Yes, s stands for string. The json.loads function does not take a file path, but the file contents as a string. Look at the documentation.

Just going to add a simple example to what everyone has explained.
json.load()
json.load() can deserialize a file itself, i.e. it accepts a file object. For example,
import json

# open a JSON file for reading and print its content using json.load
with open("/xyz/json_data.json", "r") as content:
    print(json.load(content))
will output,
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
If I use json.loads to open a file instead,
# you cannot use json.loads on a file object
with open("json_data.json", "r") as content:
    print(json.loads(content))
I would get this error:
TypeError: expected string or buffer
json.loads()
json.loads() deserializes a string.
So in order to use json.loads I will have to pass the contents of the file using the read() function. For example,
using content.read() with json.loads() returns the content of the file:
with open("json_data.json", "r") as content:
    print(json.loads(content.read()))
Output,
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
That's because the type of content.read() is a string, i.e. <type 'str'>.
If I use json.load() with content.read(), I will get an error:
with open("json_data.json", "r") as content:
    print(json.load(content.read()))
Gives,
AttributeError: 'str' object has no attribute 'read'
So, now you know that json.load deserializes a file and json.loads deserializes a string.
Another example,
sys.stdin returns a file object, so if I do print(json.load(sys.stdin)), I will get the actual JSON data:
cat json_data.json | ./test.py
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
If I want to use json.loads(), I would do print(json.loads(sys.stdin.read())) instead.
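For completeness, here is a minimal sketch of what a test.py like the one above could look like; only the use of json.load(sys.stdin) comes from the example, the rest of the layout is an assumption:
#!/usr/bin/env python
# minimal sketch of a test.py that reads JSON from stdin (layout assumed)
import json
import sys

if __name__ == "__main__":
    # sys.stdin is a file-like object, so json.load() can consume it directly
    print(json.load(sys.stdin))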

The documentation is quite clear: https://docs.python.org/2/library/json.html
json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Deserialize fp (a .read()-supporting file-like object containing a
JSON document) to a Python object using this conversion table.
json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Deserialize s (a str or unicode instance containing a JSON document)
to a Python object using this conversion table.
So load is for a file, loads is for a string.

QUICK ANSWER (very simplified!)
json.load() takes a FILE
json.load() expects a file (file object) - e.g. a file you opened before, given by a filepath like 'files/example.json'.
json.loads() takes a STRING
json.loads() expects a (valid) JSON string - i.e. {"foo": "bar"}
EXAMPLES
Assuming you have a file example.json with this content: { "key_1": 1, "key_2": "foo", "Key_3": null }
>>> import json
>>> file = open("example.json")
>>> type(file)
<class '_io.TextIOWrapper'>
>>> file
<_io.TextIOWrapper name='example.json' mode='r' encoding='UTF-8'>
>>> json.load(file)
{'key_1': 1, 'key_2': 'foo', 'Key_3': None}
>>> json.loads(file)
Traceback (most recent call last):
File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper
>>> string = '{"foo": "bar"}'
>>> type(string)
<class 'str'>
>>> string
'{"foo": "bar"}'
>>> json.loads(string)
{'foo': 'bar'}
>>> json.load(string)
Traceback (most recent call last):
File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

In Python 3.7.7, the definition of json.load is as below, according to the CPython source code:
def load(fp, *, cls=None, object_hook=None, parse_float=None,
         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    return loads(fp.read(),
                 cls=cls, object_hook=object_hook,
                 parse_float=parse_float, parse_int=parse_int,
                 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
json.load actually calls json.loads and uses fp.read() as the first argument.
So if your code is:
with open(file) as fp:
    s = fp.read()
json.loads(s)
it's the same as doing this:
with open(file) as fp:
    json.load(fp)
But if you need to read only part of the file, such as fp.read(10), or the string/bytes you want to deserialize does not come from a file, you should use json.loads().
As for json.loads(), it deserializes not only a string but also bytes. If s is bytes or bytearray, it will be decoded to a string first. You can also find that in the source code:
def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
          parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.
    ...
    """
    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                  s, 0)
    else:
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                            f'not {s.__class__.__name__}')
        s = s.decode(detect_encoding(s), 'surrogatepass')
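A quick sketch to illustrate that bytes handling (the payload here is made up for the example):
import json

# json.loads() accepts str, bytes and bytearray; bytes are decoded to str first
print(json.loads('{"a": 1}'))               # {'a': 1}
print(json.loads(b'{"a": 1}'))              # {'a': 1}
print(json.loads(bytearray(b'{"a": 1}')))   # {'a': 1}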

Related

Uploading bytes using PySFTP with putfo

So I have a base64 string and want to decode it into a file object and upload it with PySFTP. I'm getting an error:
'bytes' object has no attribute 'read'
Is my decoding wrong here?
fileObj = base64.b64decode(attach["payload"])
srv.putfo(fileObj, filename)
The Connection.putfo takes a file-like object, not just "bytes":
from io import BytesIO

fileObj = BytesIO(base64.b64decode(attach["payload"]))
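For context, a minimal sketch of how the pieces could fit together; the connection details, the attach dict and the filename are placeholders, not values from the original question:
import base64
from io import BytesIO

import pysftp

# hypothetical stand-ins for the question's variables
attach = {"payload": base64.b64encode(b"hello world").decode("ascii")}
filename = "/upload/hello.txt"

with pysftp.Connection("example.com", username="user", password="secret") as srv:
    # wrap the decoded bytes in a file-like object, which putfo() expects
    fileObj = BytesIO(base64.b64decode(attach["payload"]))
    srv.putfo(fileObj, filename)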

How to format JSON style text using python?

I wrote a program to convert KML to GeoJSON. But, when I look at the output files, they are written without whitespace, making them very hard to read.
I tried to use the json module like so:
file = json.load("<filename>")
But it returned the following:
File "/usr/lib/python3.6/json/__init__.py", line 296, in load
return loads(fp.read())
AttributeError: 'str' has no attribute 'read'
load takes a file object, not a file name.
with open("filename") as fh:
d = json.load(fh)
Once you've parsed it, you can dump it again, but formatted a bit more nicely
with open("formatted-filename.json", "w") as fh:
json.dump(d, fh, indent=4)

Write Data To File in Python Gives Error Related to UNICODE

I am basically parsing data from XML using SAX Parser in Python.
I am able to parse and print. However I wanted to put the data to a text file.
sample:
def startElement(self, name, attrs):
    file.write("startElement'" + name + " ' ")
While trying to write some text to test.txt with the above sample code, I get the error below:
TypeError: descriptor 'write' requires a 'file' object but received a 'unicode'
Any help is greatly appreciated.
You are not using an open file. You are using the file type. The file.write method is then unbound; it expects an open file to be bound to it:
>>> file
<type 'file'>
>>> file.write
<method 'write' of 'file' objects>
>>> file.write(u'Hello')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'write' requires a 'file' object but received a 'unicode'
If you have an already opened file object, then use that; perhaps you have an attribute named file on self:
self.file.write("startElement'" + name + " ' ")
but take into account that because name is a Unicode value you probably want to encode the information to bytes:
self.file.write("startElement'" + name.encode('utf8') + " ' ")
You could also use the io.open() function to create a file object that'll accept Unicode values and encode them to a given encoding for you when writing:
file_object = io.open(filename, 'w', encoding='utf8')
but then you need to be explicit about always writing Unicode values and not mix byte strings (type str) and Unicode strings (type unicode).
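A minimal sketch of a handler that opens its own output file with io.open(); the class name, output filename and input filename are made up for illustration:
# sketch only: a SAX handler that owns its Unicode-aware output file
import io
import xml.sax

class EchoHandler(xml.sax.ContentHandler):
    def startDocument(self):
        # io.open() returns a file object that accepts Unicode and encodes it for us
        self.file = io.open("test.txt", "w", encoding="utf8")

    def startElement(self, name, attrs):
        self.file.write(u"startElement '" + name + u"'\n")

    def endDocument(self):
        self.file.close()

xml.sax.parse("sample.xml", EchoHandler())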

Python 3.4 - reading data from a webpage

I'm currently trying to learn how to read from a webpage, and have tried the following:
>>> import urllib.request
>>> page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
>>> contents = page.read()
>>> lines = contents.split('\n')
This gives the following error:
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API
Now I assumed that reading from a URL would be pretty similar to reading from a text file, and that the contents of contents would be of type str. Is this not the case?
When I try >>> contents I can see that the content of contents is just the HTML document, so why doesn't .split('\n') work? How can I make it work?
Please note that I'm splitting at the newline characters so I can print the webpage line by line.
Following the same train of thought, I then tried contents.readlines() which gave this error:
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'
Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how to read the webpage properly?
You need to wrap it in an io.TextIOWrapper() object and decode your file's bytes (utf-8 is a common default; you can change it to the proper encoding too):
import urllib.request
import io
u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
f = io.TextIOWrapper(u,encoding='utf-8')
text = f.read()
Decode the bytes object to produce a string:
lines = contents.decode(encoding="UTF-8").split("\n")
The return type of the read() method is of type bytes. You need to properly decode it to a string before you can use a string method like split. Assuming it is UTF-8 you can use:
s = contents.decode('utf-8')
lines = s.split('\n')
As a general solution you should check the character encoding the server provides in the response to your request and use that.
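For instance, a small sketch of that general approach; the UTF-8 fallback is an assumption for servers that don't declare a charset:
import urllib.request

# use the charset the server declares in the Content-Type header,
# falling back to UTF-8 if none is declared (assumed fallback)
page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/")
charset = page.headers.get_content_charset() or "utf-8"
lines = page.read().decode(charset).split("\n")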

Base 64 encode a JSON variable in Python

I have a variable that stores a JSON value. I want to base64 encode it in Python, but the error 'does not support the buffer interface' is thrown. I know that base64 needs bytes to convert. But as I am a newbie in Python, I have no idea how to convert JSON to a base64 encoded string. Is there a straightforward way to do it?
In Python 3.x you need to convert your str object to a bytes object for base64 to be able to encode them. You can do that using the str.encode method:
>>> import json
>>> import base64
>>> d = {"alg": "ES256"}
>>> s = json.dumps(d) # Turns your json dict into a str
>>> print(s)
{"alg": "ES256"}
>>> type(s)
<class 'str'>
>>> base64.b64encode(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.2/base64.py", line 56, in b64encode
raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
>>> base64.b64encode(s.encode('utf-8'))
b'eyJhbGciOiAiRVMyNTYifQ=='
If you pass the output of your_str_object.encode('utf-8') to the base64 module, you should be able to encode it fine.
Here are two methods that work on Python 3.
encodestring is deprecated and the suggested one to use is encodebytes.
import json
import base64

with open('test.json') as jsonfile:
    data = json.load(jsonfile)

print(type(data))     # dict
datastr = json.dumps(data)
print(type(datastr))  # str
print(datastr)
encoded = base64.b64encode(datastr.encode('utf-8'))  # method 1
print(encoded)
print(base64.encodebytes(datastr.encode()))          # method 2
You could encode the string first, as UTF-8 for example, then base64 encode it:
data = '{"hello": "world"}'
enc = data.encode() # utf-8 by default
print base64.encodestring(enc)
This also works in 2.7 :)
Here's a function that you can feed a string and it will output a base64 string.
import base64

def b64EncodeString(msg):
    msg_bytes = msg.encode('ascii')
    base64_bytes = base64.b64encode(msg_bytes)
    return base64_bytes.decode('ascii')
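For example, to apply it to a JSON value (a quick sketch using the helper above; the expected output matches the earlier ES256 example):
import json

# serialize a dict to a JSON string, then base64 encode it with the helper above
payload = json.dumps({"alg": "ES256"})
print(b64EncodeString(payload))  # eyJhbGciOiAiRVMyNTYifQ==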
