String object has no attribute 'decode' when converting UTF-8

I'm trying to convert G\xc3\xb6del to Gödel (specifically, \xc3\xb6 to ö), but I can't find a way to do this. When I run the code below, I receive an error:
>>> string = '\xc3\xb6'
>>> string.decode(encoding='UTF-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
This question didn't seem to help, nor did any others that seemed similar, as they were all about Python 2.x. A friend mentioned base 64 encoding, but I'm not sure how that helps. I can't seem to find what I'm supposed to do to convert it in Python 3.8, so what would be the best way to go about doing this?

The issue here is that a string is already decoded. You encode a string object to a bytes object, and the inverse operation is decoding a bytes object to a string object. That's why a string has no decode attribute. Think of it like this:
String -> encode -> Byte
Byte -> decode -> String
In this case, the solution would be to call the encode method and pass in 'utf8' or 'ascii', depending on the context and situation.
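As a minimal sketch of the two directions above (the variable names are illustrative, not from the question): encoding turns a str into bytes, and decoding turns bytes back into a str. The last part assumes the text already arrived as a str whose characters are really UTF-8 byte values, as in the question's literal; encoding with 'latin-1' maps each such character back to the byte of the same value, which can then be decoded as UTF-8.

```python
# str -> bytes -> str round trip
text = "Gödel"
data = text.encode("utf-8")        # bytes object holding the UTF-8 encoding
restored = data.decode("utf-8")    # back to a str

# Assumption: this str holds UTF-8 byte values as individual characters,
# like the literal '\xc3\xb6' in the question.
mojibake = "G\xc3\xb6del"
repaired = mojibake.encode("latin-1").decode("utf-8")

print(data)
print(restored, repaired)
```

The 'latin-1' step only makes sense when every character in the str has a code point below 256; otherwise encode raises UnicodeEncodeError.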
However, converting to a string object was not the whole story here. As the original asker (OA) of this question, I know exactly what this was meant for and how I came to a solution. The value Gödel was obtained by scraping an SCP Foundation page to find the Object Class, which is then passed on to my Discord bot for a command. Here was my code:
import re
from requests import get  # assumption: `get` is requests.get, whose response provides iter_lines()
link = f"http://www.scp-wiki.net/scp-{num}"
page = get(link)
obj_class = [str(i) for i in page.iter_lines() if b"Object Class:" in i][0]
# ^ There should only be one line in the document matching that requirement.
# Each line is a bytes object, which is why conversion is necessary.
obj_class = re.findall(r'(?<=\<\/strong> )(.*?)(?=\<)', obj_class)[0]
# ^ Find the actual class in that line.
print(obj_class) # expected Gödel, got G\xc3\xb6del instead.
The above would not raise an exception; it simply wouldn't decode the characters as desired, because str(i) on a bytes object returns its repr rather than its decoded text. My fix was simple once I understood what was going on: replace str(i) with i.decode('utf8').
obj_class = [i.decode('utf8') for i in page.iter_lines() if b"Object Class:" in i][0]
# ^ Decoding the bytes here makes the difference: the line becomes a proper str
# right away, so there are no escaped byte sequences to deal with later on.
This would now return the desired value, Gödel, rather than G\xc3\xb6del. I hope that this helps. Please let me know if I've made any mistakes, so I can make any necessary corrections.
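A small sketch of the difference, using a made-up line in place of the scraped page (the HTML fragment is an assumption for illustration): calling str() on a bytes object produces the text of its repr, escapes included, while decode() produces the actual characters.

```python
# Hypothetical line, standing in for one item from page.iter_lines().
line = b"<strong>Object Class:</strong> G\xc3\xb6del<br>"

as_repr = str(line)             # text of the bytes repr, starts with b'
as_text = line.decode("utf-8")  # real text with the decoded character

# str() also accepts an encoding, which is equivalent to decode().
also_text = str(line, "utf-8")

print(as_repr)
print(as_text)
```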

Related

python - TypeError: a bytes-like object is required, not 'str'

file = open('file.txt' , 'wb')
text_in_bytes_format = '''\O£”Ã Ø<RVrF>ýtLš:]:B÷(2öÞ{åÐ5"-V¥D1¦òÒm° –^yêŒ·çNxrÌoTÑ³np$h/ñLˆ>U×bSƒƒxd‚`óJà(æçÛ‰#dõ'ŸÊ¤ÇŸVá0Üsß=r÷=÷ê:W«“ÇNÈ²UëeÆŒ™ê—éÀ§.Jÿ†šœýz«~êü;J×Oà©î•£’áæLªîq¨?{’ZæúëŸ_‰³Á¯ùóHzNyéëß”,v8oÚ²0TCðÎ9èV0‡\ò¼qmÆç—ßPø
ýŒ%Ta*çý¾Þ`3€r )Ü“êqcL›k4
Õ¾Ä!áß>2‚ë/¹lªz=ùëïg>lÌ9zL–c=á¦Hé13ö~]ü.¤“‡`SÄj<Wž–¬¬ˆD4'''
file.write(text_in_bytes_format)
When I run the code, I get this error:
TypeError: a bytes-like object is required, not 'str'
There are a couple of solutions to this problem on the internet, but my problem is slightly different because the text I am trying to write is in bytes format.
Is there any way to fix this problem?
It would be great if anyone could help me out.

You must convert it to bytes first. You can do this with bytes():
file = open('file.txt' , 'wb')
text_in_bytes_format = '''\O£”Ã Ø<RVrF>ýtLš:]:B÷(2öÞ{åÐ5"-V¥D1¦òÒm° –^yêŒ·çNxrÌoTÑ³np$h/ñLˆ>U×bSƒƒxd‚`óJà(æçÛ‰#dõ'ŸÊ¤ÇŸVá0Üsß=r÷=÷ê:W«“ÇNÈ²UëeÆŒ™ê—éÀ§.Jÿ†šœýz«~êü;J×Oà©î•£’áæLªîq¨?{’ZæúëŸ_‰³Á¯ùóHzNyéëß”,v8oÚ²0TCðÎ9èV0‡\ò¼qmÆç—ßPø
ýŒ%Ta*çý¾Þ`3€r )Ü“êqcL›k4
Õ¾Ä!áß>2‚ë/¹lªz=ùëïg>lÌ9zL–c=á¦Hé13ö~]ü.¤“‡`SÄj<Wž–¬¬ˆD4'''
b = bytes(text_in_bytes_format, 'utf-8')
file.write(b)
file.close()

You are a bit confused -- at least regarding terminology if not more
fundamental issues. String literals in Python 3 are Unicode text by default
(type str, not bytes).
text = 'foo' # A string literal.
print(type(text)) # <class 'str'>
Python does support literal byte strings too:
some_bytes = b'foo'
print(type(some_bytes)) # <class 'bytes'>
But your text is not currently a literal byte string. For example, using
a small snippet of your text, we can attempt to create a literal byte string:
text = b'B÷(2öÞ' # SyntaxError: bytes can only contain ASCII literal characters.
Here's what that snippet would look like as a byte literal (i.e., after
running it through text.encode('utf-8')):
some_bytes = b'B\xc3\xb7(2\xc3\xb6\xc3\x9e'
Back to your code:
# You have some literal text, so don't give it
# a variable name implying that it is bytes.
text = 'B÷(2öÞ'
# Convert to bytes.
bs = text.encode('utf-8')
# Open a file for writing bytes directly.
# BTW, it's also a good idea to open/close files using `with` context manager.
with open('file.txt', 'wb') as fh:
    fh.write(bs)
Some of your comments indicate that the text is not Unicode text or has some
unknown encoding. If so, you have a bigger problem and/or cannot use it
directly as a string literal in a Python program (you might need to read
the data in from a file, for example, and tell Python the file encoding
before reading in the data).
If you haven't read it already, Ned Batchelder's presentation on Unicode
and Python can clarify these issues and get you pointed in the right
direction regarding terminology and naming.

file.write(text_in_bytes_format.encode())
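For reference, str.encode() with no arguments uses UTF-8 in Python 3, so the one-liner above writes the same bytes as encode('utf-8'). The following sketch writes a short snippet of the text to a temporary file and reads it back both as raw bytes and as text; the temporary directory and variable names are illustrative assumptions, not part of the original question.

```python
import os
import tempfile

text = "B÷(2öÞ"  # a small snippet of the text from the question

# Write to a throwaway location rather than the working directory.
path = os.path.join(tempfile.mkdtemp(), "file.txt")

with open(path, "wb") as fh:
    fh.write(text.encode())  # encode() defaults to UTF-8

with open(path, "rb") as fh:
    raw = fh.read()  # the bytes exactly as stored on disk

with open(path, "r", encoding="utf-8") as fh:
    round_tripped = fh.read()  # decoded back to str by the file object

print(raw)
print(round_tripped)
```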

Is there a way to decode bytes inside a string object in Python? [duplicate]

This question already has an answer here:
Converting python string into bytes directly without eval()
(1 answer)
Closed 2 years ago.
Let me be more clear.
I'm receiving a string in Python like this:
file = "b'x\\x9c\\xb4'"
The type of file is str, but inside that string you can see the repr of a <class 'bytes'> object. It was the result of calling str(file) once file was already encoded. I would like to decode it, but I don't know how to decode the bytes inside a string object.
My question is: is there a way to interpret file as bytes instead of str without calling something like bytes(file, 'utf-8') or file.encode('utf-8')? The problem with these methods is that I would be encoding the already encoded bytes, as stated above.
Why do I need that?
I'm building an API and I need to send back a significantly big string as a JSON value. Since there was plenty of room to compress it, I ended up using zlib:
import zlib
file = BIG_STRING
file_compressed = zlib.compress(BIG_STRING.encode('utf-8')) # Since zlib expects a bytes object
send_back({"SOME_BIG_STRING": str(file_compressed)})
I'm sending it back as a string because I can't send it back as a bytes object; JSON doesn't support that. And if I try to decode the compressed data before sending it, I end up with an error:
send_back({"SOME_BIG_STRING": file_compressed.decode('utf-8')})
-> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
And when I receive the same string later in the program, I find myself stuck on the problem described initially.
I don't currently know enough to find a workaround and couldn't find an answer to this. I'd be extremely grateful if anyone could help me!

If you can't find any other solution, you can call eval("b'x\\x9c\\xb4'") and get your result b'x\x9c\xb4'. However, using eval is generally discouraged and considered bad practice, especially on input you do not control.
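Two alternatives may be worth considering, sketched here under assumptions. The standard library's ast.literal_eval parses Python literals, including bytes literals, without executing arbitrary code, so it can recover the bytes from a string like the one in the question. For the underlying goal of putting compressed data into JSON, a common approach is to Base64-encode the bytes so that they become plain ASCII text. The payload key comes from the question; big_string is a stand-in value.

```python
import ast
import base64
import json
import zlib

# Recover bytes from the text of a bytes repr without using eval().
received = "b'x\\x9c\\xb4'"
recovered = ast.literal_eval(received)

# Sender side: compress, then Base64-encode so the result is JSON-safe text.
big_string = "some long text " * 100  # stand-in for BIG_STRING
compressed = zlib.compress(big_string.encode("utf-8"))
payload = json.dumps(
    {"SOME_BIG_STRING": base64.b64encode(compressed).decode("ascii")}
)

# Receiver side: reverse each step in the opposite order.
encoded = json.loads(payload)["SOME_BIG_STRING"]
restored = zlib.decompress(base64.b64decode(encoded)).decode("utf-8")

print(recovered)
print(restored == big_string)
```

Base64 adds roughly a third to the size of the compressed data, so whether compression still pays off depends on how compressible the original string is.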

How to work 'Object' type variable with special character as string?

I am testing a text UI element of a Qt application. It has a special character in it. When I try to get the text with Squish, the received value is of type Object. The final goal is to do some operations on it, like printing it out or comparing it with another string. It's also completely fine to get rid of the special character and only look at the remaining value.
In an effort to find out what I can do with this value, I've tried the following:
value.split('')
SyntaxError: Ambiguous overload 'split(str)'. Candidates:
QString::split(const QString & sep)
QString::split(QChar sep)
str(value)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 2: ordinal not in range(128)
value.encode('utf-8')
AttributeError: Object does not have any properties
unicode(value, error='replace')
TypeError: coercing to Unicode: need string or buffer, Object found
Usually, in other cases, I can use str() just fine since there is no special character. This is Python 2, and upgrading is not really an option since it's quite a big project and would take time. Please give me some suggestions if this can be done in any way. Thank you.

value.decode("ascii", errors="ignore").encode()

Using dir() on that value, I found out that QString functions can be used. After some more experimenting, I found QString &QString::replace(int position, int n, const QChar *unicode, int size) to be the only method I could use without running into the Ambiguous overload error. I then replaced the special character and used the remaining value.

I cannot reproduce this problem with Squish for Qt 6.5.x, Python 2.7:
import os
def main():
    startApplication("%s/examples/qt/addressbook/addressbook" % os.getenv("SQUISH_PREFIX"))
    o = waitForObject({"type": "MainWindow"})
    o.windowTitle = "ä"
    str(o.windowTitle)
(Posted as an answer since a comment does not seem to support multiple lines, but not meant as an answer.)

Try to decode value with Qt:
value.toUtf8().constData()

How to solve this TypeError issue in python3? "TypeError: a bytes-like object is required, not 'str'"

I recently switched from Python 2.7 to Python 3.7.3. In my project, I very frequently run into this TypeError: "TypeError: a bytes-like object is required, not 'str'". I want to define it as a string only. One of the posts I read suggested encoding str objects, but that gives an error like "pass arguments to encode()", so it is not working. Is there any permanent solution for this, like importing or defining something once at the beginning?
Thank You.
My code is as follows.
ids = [1,2,3,4,5]
list_ = ['A','B','X','Y','Z','W']
df = [None for i in ids]
print(type(df))
TypeError: a bytes-like object is required, not 'str'

If you are looking to convert a string object to a bytes object, you should do something
like this:
st = "Roushan" # a string object
byte_object = st.encode('utf-8')
Here byte_object is the resulting bytes object and 'utf-8' is the encoding scheme.
There are a lot of encoding schemes, for example:
ASCII
UTF-16
For more on the types of encoding, see Encoding.
After this, figure out which argument needed to be passed as bytes instead of str and convert that object to bytes.
As I don't have the nk module installed on my system, I leave this to you.
EDIT:
Open a fresh, empty Python file and
write the following code:
ids = [1,2,3,4,5]
list_ = ['A','B','X','Y','Z','W']
df = [None for i in ids]
print(type(df))
open a terminal and execute
python2.7 mycode.py
python3.7 mycode.py
in case 1 you get
in case 2
Please post the error traceback you get. I believe the error originates in another part of the code, as this part is fine.
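For illustration only, here is one common way this TypeError arises in Python 3: calling a bytes method with a str argument. This is a guess at the kind of mismatch involved, not a diagnosis of the asker's project, and the sample data is made up. The fix is to make both sides the same type, either by using a bytes argument or by decoding the data first.

```python
data = b"A,B,X"  # hypothetical bytes value, e.g. read from a socket or a binary file

# Mixing bytes data with a str separator raises the TypeError.
try:
    data.split(",")
except TypeError as exc:
    message = str(exc)

# Option 1: stay in bytes by passing a bytes separator.
parts_bytes = data.split(b",")

# Option 2: decode once, then work with str everywhere.
parts_text = data.decode("utf-8").split(",")

print(message)
print(parts_bytes, parts_text)
```

Decoding once at the boundary where data enters the program, and encoding once where it leaves, tends to be easier to maintain than converting at every call site.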

Converting Unicode objects with non-ASCII symbols in them into strings objects (in Python)

I want to send Chinese characters to be translated by an online service, and have the resulting English string returned. I'm using simple JSON and urllib for this.
And yes, I am declaring.
# -*- coding: utf-8 -*-
on top of my code.
Now everything works fine if I feed urllib a string type object, even if that object contains what would be Unicode information. My function is called translate.
For example:
stringtest1 = '無與倫比的美麗'
print translate(stringtest1)
results in the proper translation and doing
type(stringtest1)
confirms this to be a string object.
But if I do
stringtest1 = u'無與倫比的美麗'
and try to use my translation function I get this error:
File "C:\Python27\lib\urllib.py", line 1275, in urlencode
v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-8: ordinal not in range(128)
After researching a bit, it seems this is a common problem:
Problem: neither urllib2.quote nor urllib.quote encode the unicode strings arguments
urllib.quote throws exception on Unicode URL
Now, if I type in a script
stringtest1 = '無與倫比的美麗'
stringtest2 = u'無與倫比的美麗'
print 'stringtest1',stringtest1
print 'stringtest2',stringtest2
executing it returns:
stringtest1 ç„¡èˆ‡å€«æ¯”çš„ç¾Žéº—
stringtest2 無與倫比的美麗
But just typing the variables in the console:
>>> stringtest1
'\xe7\x84\xa1\xe8\x88\x87\xe5\x80\xab\xe6\xaf\x94\xe7\x9a\x84\xe7\xbe\x8e\xe9\xba\x97'
>>> stringtest2
u'\u7121\u8207\u502b\u6bd4\u7684\u7f8e\u9e97'
gets me that.
My problem is that I don't control how the information to be translated reaches my function. It seems it will arrive in the Unicode form, which the function does not accept.
So, how do I convert one thing into the other?
I've read the Stack Overflow question Convert Unicode to a string in Python (containing extra symbols).
But this is not what I'm after. Urllib accepts the string object but not the Unicode object, even though both contain the same information.
Well, at least in the eyes of the web application I'm sending the unchanged information to; I'm not sure if they are still equivalent things in Python.

When you get a unicode object and want to return a UTF-8 encoded byte string from it, use theobject.encode('utf8').
It seems strange that you don't know whether the incoming object is a str or unicode -- surely you do control the call sites to that function, too?! But if that is indeed the case, for whatever weird reason, you may need something like:
def ensureutf8(s):
    if isinstance(s, unicode):
        s = s.encode('utf8')
    return s
which only encodes conditionally, that is, if it receives a unicode object, not if the object it receives is already a byte string. It returns a byte string in either case.
BTW, part of your confusion seems to be due to the fact that you don't know that just entering an expression at the interpreter prompt will show you its repr, which is not the same effect you get with print;-).
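The answer above targets Python 2. As a hedged sketch of the same idea in Python 3, where the text type is str and the byte string type is bytes, the helper could look like the following. The name ensure_utf8 is a hypothetical adaptation, not the answerer's code. The example also shows the repr versus print distinction mentioned above, and that urllib.parse.quote_plus accepts either type.

```python
from urllib.parse import quote_plus


def ensure_utf8(s):
    """Return UTF-8 bytes for a str; pass bytes through unchanged."""
    if isinstance(s, str):
        return s.encode("utf-8")
    return s


text = "無與倫比的美麗"
raw = ensure_utf8(text)

# Already-encoded input is returned as is, so it is never encoded twice.
same = ensure_utf8(raw)

# quote_plus encodes str as UTF-8 by default, so both calls agree.
quoted_from_text = quote_plus(text)
quoted_from_bytes = quote_plus(raw)

print(repr(text))  # what the interactive prompt would echo
print(text)        # what print shows
```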
