Write Data To File in Python Gives Error Related to UNICODE

I am basically parsing data from XML using SAX Parser in Python.
I am able to parse and print. However I wanted to put the data to a text file.
sample:
def startElement(self, name, attrs):
    file.write("startElement '" + name + "' ")
While trying to write some text to test.txt with the above sample code, I get the error below:
TypeError: descriptor 'write' requires a 'file' object but received a 'unicode'
Any help is greatly appreciated.

You are not using an open file; you are using the file type itself. The file.write method is then an unbound method that expects an open file object to be bound to:
>>> file
<type 'file'>
>>> file.write
<method 'write' of 'file' objects>
>>> file.write(u'Hello')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'write' requires a 'file' object but received a 'unicode'
If you have an already opened file object, then use that; perhaps you have an attribute named file on self:
self.file.write("startElement'" + name + " ' ")
but take into account that because name is a Unicode value you probably want to encode the information to bytes:
self.file.write("startElement'" + name.encode('utf8') + " ' ")
You could also use the io.open() function to create a file object that'll accept Unicode values and encode them to a given encoding for you when writing:
file_object = io.open(filename, 'w', encoding='utf8')
but then you need to be explicit about always writing Unicode values and not mix byte strings (type str) with Unicode strings (type unicode).
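Putting the answer's pieces together, a minimal sketch (the handler class, file name, and XML input are illustrative, not from the question) of a SAX handler that keeps an io.open() file on self and writes element names to it:

```python
import io
import xml.sax

class NameWriter(xml.sax.ContentHandler):
    # Keeps an io.open()-created file object on self, so write() accepts Unicode.
    def __init__(self, out):
        xml.sax.ContentHandler.__init__(self)
        self.file = out

    def startElement(self, name, attrs):
        self.file.write(u"startElement '" + name + u"'\n")

with io.open('test.txt', 'w', encoding='utf8') as out:
    xml.sax.parseString(b'<root><child/></root>', NameWriter(out))
```

This sidesteps the unbound-method error because self.file is a real, open file object rather than the built-in file type.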

Related

How to turn a string into a binary object in Python

I'm using this library to download and decode MMS PDUs:
https://github.com/pmarti/python-messaging
The sample code almost works, except that this method:
mms = MMSMessage.from_data(response)
Is throwing an exception:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
Which seems to obviously be some sort of binary formatting problem.
In the sample code, the HTTP response is passed directly into the from_data method. In my case, however, the response comes through with HTTP headers on it, so I split the response on the double CRLF and pass in just the PDU data:
data = buf.getvalue()
split = data.split("\r\n\r\n")
mms = MMSMessage.from_data(split[1].strip())
This throws an error, BUT if I first write the exact same data to a file and then use the from_file method, it works:
data = buf.getvalue()
split = data.split("\r\n\r\n")
f = open('dump', 'w+')
f.write(split[1])
f.close()
path = 'dump'
mms = MMSMessage.from_file(path)
I looked in the from_file method, and all it does is load the contents and then pass it into the same method as the from_data method, so the first way should Just Work™.
What I did notice is that the file is opened in binary format, and the content is loaded like this:
data = array.array('B')
with open(filename, 'rb') as f:
    data.fromfile(f, num_bytes)
return self.decode_data(data)
So it seems obvious that somehow what I'm passing into the first function is actually a "string representation of binary data" and what's being read from the file is "actual binary data".
I tried using bytearray like this to "binaryfy" the string:
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
but that throws the error:
Traceback (most recent call last):
File "decodepdu.py", line 41, in <module>
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 0: ordinal not in range(128)
which seems weird because it's using an 'ascii' codec but I specified utf8 encoding.
Anyway, at this point I'm in over my head because I'm not all that familiar with Python, so for now I'm just writing the content to a temporary file, but I would really rather not.
Any help would be most appreciated!
Okay, thanks to Paul M. in the comments, this works:
data = buf.getvalue()
split = data.split("\r\n\r\n")
pdu = array.array('B')
pdu.fromstring(split[1])
mms = MMSMessage.from_data(pdu)
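One caveat if this ever runs on Python 3: array.fromstring() was deprecated there and removed in 3.9; frombytes() is the modern equivalent. A small sketch with made-up bytes (not real PDU data):

```python
import array

raw = b"\x8c\x84\x01"   # illustrative bytes standing in for PDU content
pdu = array.array('B')  # 'B' = unsigned bytes, as python-messaging expects
pdu.frombytes(raw)      # Python 3 spelling of fromstring()
print(list(pdu))        # [140, 132, 1]
```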

TypeError: a bytes-like object is required, not 'str' while loading with pickle

I'm using Python 3.6 and Spyder (Anaconda).
I have tried many things, but nothing worked.
I don't know why I always get this error when loading with pickle.
filename = "allfeatures.txt"
allfeatures = open(filename, 'r').read()
with open(filename) as f:
    allfeatures = list(f)
allconcat = np.vstack(list(allfeatures.values()))
AttributeError Traceback (most recent call last)
AttributeError: 'list' object has no attribute 'values'
You need to open your file as a binary file:
pickle.loads(open("accounts.txt", 'rb').read())
Otherwise, the data is read as str, but pickle needs bytes.
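A minimal round trip showing the difference (the filename and data are illustrative): writing and reading the pickle in binary mode avoids the bytes-like error entirely.

```python
import pickle

data = {"name": "alice", "scores": [1, 2, 3]}

# dump in binary mode ('wb')...
with open("accounts.pkl", "wb") as f:
    pickle.dump(data, f)

# ...and load in binary mode ('rb'); opening with plain 'r' would hand
# pickle a str and raise "a bytes-like object is required, not 'str'".
with open("accounts.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```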

What is the difference between json.load() and json.loads() functions

In Python, what is the difference between json.load() and json.loads()?
I guess that the load() function must be used with a file object (I thus need to use a context manager) while the loads() function takes the path to the file as a string. It is a bit confusing.
Does the letter "s" in json.loads() stand for string?
Thanks a lot for your answers!
Yes, s stands for string. The json.loads function does not take the file path, but the file contents as a string. Look at the documentation.
Just going to add a simple example to what everyone has explained,
json.load()
json.load can deserialize a file itself i.e. it accepts a file object, for example,
# open a json file for reading and print content using json.load
with open("/xyz/json_data.json", "r") as content:
    print(json.load(content))
will output,
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
If I use json.loads to open a file instead,
# you cannot use json.loads on file object
with open("json_data.json", "r") as content:
    print(json.loads(content))
I would get this error:
TypeError: expected string or buffer
json.loads()
json.loads() deserializes a string.
So in order to use json.loads, I have to pass it the content of the file using the read() function. For example,
using content.read() with json.loads() returns the content of the file:
with open("json_data.json", "r") as content:
    print(json.loads(content.read()))
Output,
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
That's because the type of content.read() is string, i.e. <type 'str'>.
If I use json.load() with content.read(), I will get an error:
with open("json_data.json", "r") as content:
    print(json.load(content.read()))
Gives,
AttributeError: 'str' object has no attribute 'read'
So, now you know: json.load deserializes a file and json.loads deserializes a string.
Another example,
sys.stdin returns a file object, so if I do print(json.load(sys.stdin)), I will get the actual JSON data,
cat json_data.json | ./test.py
{u'event': {u'id': u'5206c7e2-da67-42da-9341-6ea403c632c7', u'name': u'Sufiyan Ghori'}}
If I want to use json.loads(), I would do print(json.loads(sys.stdin.read())) instead.
Documentation is quite clear: https://docs.python.org/2/library/json.html
json.load(fp[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Deserialize fp (a .read()-supporting file-like object containing a
JSON document) to a Python object using this conversion table.
json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Deserialize s (a str or unicode instance containing a JSON document)
to a Python object using this conversion table.
So load is for a file, loads for a string
QUICK ANSWER (very simplified!)
json.load() takes a FILE
json.load() expects a file (file object) - e.g. a file you opened before, given by a file path like 'files/example.json'.
json.loads() takes a STRING
json.loads() expects a (valid) JSON string - i.e. {"foo": "bar"}
EXAMPLES
Assuming you have a file example.json with this content: { "key_1": 1, "key_2": "foo", "Key_3": null }
>>> import json
>>> file = open("example.json")
>>> type(file)
<class '_io.TextIOWrapper'>
>>> file
<_io.TextIOWrapper name='example.json' mode='r' encoding='UTF-8'>
>>> json.load(file)
{'key_1': 1, 'key_2': 'foo', 'Key_3': None}
>>> json.loads(file)
Traceback (most recent call last):
File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 341, in loads
TypeError: the JSON object must be str, bytes or bytearray, not TextIOWrapper
>>> string = '{"foo": "bar"}'
>>> type(string)
<class 'str'>
>>> string
'{"foo": "bar"}'
>>> json.loads(string)
{'foo': 'bar'}
>>> json.load(string)
Traceback (most recent call last):
File "/usr/local/python/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
In Python 3.7.7, the definition of json.load is as follows, according to the CPython source code:
def load(fp, *, cls=None, object_hook=None, parse_float=None,
         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    return loads(fp.read(),
                 cls=cls, object_hook=object_hook,
                 parse_float=parse_float, parse_int=parse_int,
                 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
json.load actually calls json.loads, using fp.read() as the first argument.
So if your code is:
with open(file) as fp:
    s = fp.read()
json.loads(s)
it's the same as doing this:
with open(file) as fp:
    json.load(fp)
But if you need to read only part of the file (e.g. fp.read(10)), or the string/bytes you want to deserialize does not come from a file, you should use json.loads().
As for json.loads(), it deserializes not only str but also bytes. If s is bytes or bytearray, it will be decoded to str first. You can also find this in the source code:
def loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None,
          parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.
    ...
    """
    if isinstance(s, str):
        if s.startswith('\ufeff'):
            raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
                                  s, 0)
    else:
        if not isinstance(s, (bytes, bytearray)):
            raise TypeError(f'the JSON object must be str, bytes or bytearray, '
                            f'not {s.__class__.__name__}')
        s = s.decode(detect_encoding(s), 'surrogatepass')
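A quick check of both branches above (Python 3.6+): bytes are decoded before parsing, a str with a BOM is rejected, while BOM-prefixed bytes are detected and decoded.

```python
import json

# bytes (and bytearray) are accepted and decoded first
print(json.loads(b'{"foo": "bar"}'))                  # {'foo': 'bar'}
print(json.loads('{"foo": "bar"}'.encode('utf-16')))  # {'foo': 'bar'}

# the str branch refuses a UTF-8 BOM...
try:
    json.loads('\ufeff{"foo": "bar"}')
except json.JSONDecodeError as e:
    print(e.msg)  # Unexpected UTF-8 BOM (decode using utf-8-sig)

# ...but the bytes branch detects the BOM and decodes it away
print(json.loads('\ufeff{"foo": "bar"}'.encode('utf-8')))  # {'foo': 'bar'}
```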

str object has no attribute 'close'

I'm analyzing a text for word frequency, and I am getting this error message after it is done:
'str' object has no attribute 'close'
I've used the close() method before, so I don't know what to do.
Here is the code:
def main():
    text = open("catinhat.txt").read()
    text = text.lower()
    for ch in '!"$%&()*+,-./:;<=>=?#[\\]^_{|}~':
        text = text.replace(ch, "")
    words = text.split()
    d = {}
    count = 0
    for w in words:
        count += 1
        d[w] = d.get(w, 0) + 1
    d["#"] = count
    print(d)
    text.close()
main()
You didn't save a reference to the file handle. You opened the file, read its contents, and saved the resulting string. There's no file handle to close. The best way to avoid this is to use the with context manager:
def main():
    with open("catinhat.txt") as f:
        text = f.read()
    ...
This will close the file automatically after the with block ends, without an explicit f.close().
That is because your variable text has type str (as you are reading the contents of a file).
Let me show you the exact example:
>>> t = open("test.txt").read()
# t now contains 'asdfasdfasdfEND' <- content of test.txt file
>>> type(t)
<class 'str'>
>>> t.close()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'close'
If you use an auxiliary variable for the open() function (which returns a _io.TextIOWrapper), you can close it:
>>> f = open("test.txt")
>>> t = f.read() # t contains the text from test.txt and f is still a _io.TextIOWrapper, which has a close() method
>>> type(f)
<class '_io.TextIOWrapper'>
>>> f.close() # therefore I can close it here
>>>
text=open("catinhat.txt").read()
text is a str since that is what .read() returns. It does not have a close method. A file object would have a close method, but you didn't assign the file you opened to a name thus you can no longer refer to it to close it.
I recommend using a with statement to manage the file:
with open("catinhat.txt") as f:
    text = f.read()
...
The with statement will close the file whether the block finishes successfully or an exception is raised.
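Putting the fix together, a sketch of the corrected program (it swaps the manual counting loop for collections.Counter, and returns the dict so it can be inspected; otherwise it follows the question's code):

```python
from collections import Counter

def main():
    # The file handle (not the text) is what gets closed; `with` does it for us.
    with open("catinhat.txt") as f:
        text = f.read().lower()
    for ch in '!"$%&()*+,-./:;<=>=?#[\\]^_{|}~':
        text = text.replace(ch, "")
    words = text.split()
    d = dict(Counter(words))   # word -> frequency
    d["#"] = len(words)        # total word count, as in the original
    print(d)
    return d
```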

Python 3.4 - reading data from a webpage

I'm currently trying to learn how to read from a webpage, and have tried the following:
>>>import urllib.request
>>>page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
>>>contents = page.read()
>>>lines = contents.split('\n')
This gives the following error:
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API
Now I assumed that reading from a URL would be pretty similar to reading from a text file, and that the contents of contents would be of type str. Is this not the case?
When I try >>> contents I can see that the contents of contents is just the HTML document, so why doesn't .split('\n') work? How can I make it work?
Please note that I'm splitting at the newline characters so I can print the webpage line by line.
Following the same train of thought, I then tried contents.readlines() which gave this error:
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'
Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how to read the webpage properly?
You need to wrap it in an io.TextIOWrapper() object and specify the encoding (utf-8 is used here as a common default; change it to the proper encoding if needed):
import urllib.request
import io
u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
f = io.TextIOWrapper(u, encoding='utf-8')
text = f.read()
Decode the bytes object to produce a string:
lines = contents.decode(encoding="UTF-8").split("\n")
The return type of the read() method is of type bytes. You need to properly decode it to a string before you can use a string method like split. Assuming it is UTF-8 you can use:
s = contents.decode('utf-8')
lines = s.split('\n')
As a general solution you should check the character encoding the server provides in the response to your request and use that.
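For that last point, urllib's response headers expose the charset the server declared in Content-Type. A sketch (the helper name read_lines is made up) that decodes with it, falling back to UTF-8:

```python
import urllib.request

def read_lines(url):
    with urllib.request.urlopen(url) as page:
        # Use the charset from the Content-Type header if the server sent one
        charset = page.headers.get_content_charset() or "utf-8"
        return page.read().decode(charset).split("\n")
```

For example, read_lines("http://docs.python-requests.org/en/latest/") would return the page's HTML line by line.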
