base64.encodestring failing in Python 3

The following piece of code runs successfully on a python 2 machine:
base64_str = base64.encodestring('%s:%s' % (username,password)).replace('\n', '')
I am trying to port it over to Python 3 but when I do so I encounter the following error:
>>> a = base64.encodestring('{0}:{1}'.format(username,password)).replace('\n','')
Traceback (most recent call last):
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 519, in _input_type_check
m = memoryview(s)
TypeError: memoryview: str object does not have the buffer interface
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 548, in encodestring
return encodebytes(s)
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 536, in encodebytes
_input_type_check(s)
File "/auto/pysw/cel55/python/3.4.1/lib/python3.4/base64.py", line 522, in _input_type_check
raise TypeError(msg) from err
TypeError: expected bytes-like object, not str
I tried searching for examples of encodestring usage but was not able to find a good document. Am I missing something obvious? I am running this on RHEL (kernel 2.6.18-371.11.1.el5).

You can encode() the string (to convert it to a byte string) before passing it into base64.encodestring. Example:
base64_str = base64.encodestring(('%s:%s' % (username,password)).encode()).decode().strip()
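Since encodestring is deprecated in Python 3, here is a sketch of building the same header value with base64.b64encode instead (the username/password values below are just placeholders):

```python
import base64

username, password = "alice", "secret"  # placeholder credentials

# Encode the str to bytes first, then Base64 encode.
credentials = '{}:{}'.format(username, password).encode('utf-8')
base64_str = base64.b64encode(credentials).decode('ascii')
# b64encode never inserts newlines, so no .replace('\n', '') is needed.
print(base64_str)
```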

To expand on Anand's answer (which is quite correct), Python 2 made little distinction between "Here's a string which I want to treat like text" and "Here's a string which I want to treat like a sequence of 8-bit byte values". Python 3 firmly distinguishes the two, and doesn't let you mix them up: the former is the str type, and the latter is the bytes type.
When you Base64 encode a string, you're not actually treating the string as text; you're treating it as a series of 8-bit byte values. That's why you're getting an error from base64.encodestring() in Python 3: that operation deals with the string's characters as 8-bit bytes, so you should pass it a parameter of type bytes rather than a parameter of type str.
Therefore, to convert your str object into a bytes object, you have to call its encode() method to turn it into a set of 8-bit byte values, in whatever Unicode encoding you have chosen to use. (Which should be UTF-8 unless you have a very specific reason to choose something else, but that's another topic).
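A minimal illustration of the two types (the values here are just examples):

```python
text = "café"                 # str: a sequence of Unicode code points
data = text.encode("utf-8")   # bytes: 8-bit values; 'é' becomes two bytes
print(type(text).__name__, len(text))  # str 4
print(type(data).__name__, len(data))  # bytes 5
```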

In Python 3, the encodestring docs say:
def encodestring(s):
    """Legacy alias of encodebytes()."""
    import warnings
    warnings.warn("encodestring() is a deprecated alias, use encodebytes()", DeprecationWarning, 2)
    return encodebytes(s)
Here is working code for Python 3.5.1; it also shows how to URL-encode:
import base64
import urllib.parse

def _encodeBase64(consumer_key, consumer_secret):
    """
    :type consumer_key: str
    :type consumer_secret: str
    :rtype: str
    """
    # 1. URL encode the consumer key and the consumer secret according to RFC 1738.
    dummy_param_name = 'bla'
    key_url_encoded = urllib.parse.urlencode({dummy_param_name: consumer_key})[len(dummy_param_name) + 1:]
    secret_url_encoded = urllib.parse.urlencode({dummy_param_name: consumer_secret})[len(dummy_param_name) + 1:]
    # 2. Concatenate the encoded consumer key, a colon character ":", and the encoded consumer secret into a single string.
    credentials = '{}:{}'.format(key_url_encoded, secret_url_encoded)
    # 3. Base64 encode the string from the previous step.
    bytes_base64_encoded_credentials = base64.encodebytes(credentials.encode('utf-8'))
    return bytes_base64_encoded_credentials.decode('utf-8').replace('\n', '')
(I am sure it could be more concise, I am new to Python...)
Also see: http://pythoncentral.io/encoding-and-decoding-strings-in-python-3-x/
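For what it's worth, a somewhat more compact sketch of the same three steps, assuming urllib.parse.quote for the percent-encoding (note it encodes a space as %20, where urlencode would produce +) and base64.b64encode, which never inserts newlines; the function name is illustrative:

```python
import base64
import urllib.parse

def encode_credentials(consumer_key, consumer_secret):
    # Percent-encode key and secret, join with ':', then Base64 encode.
    key = urllib.parse.quote(consumer_key, safe='')
    secret = urllib.parse.quote(consumer_secret, safe='')
    credentials = '{}:{}'.format(key, secret)
    return base64.b64encode(credentials.encode('utf-8')).decode('ascii')

print(encode_credentials('my key', 's/cret'))
```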


Is this the right way to make code working with Python3?

I am updating some Python2 code written by others, and this part:
def exec(self, content, query):
    # query = "city_68"
    content = content.strip().strip(',').decode('utf-8', 'ignore')
    query = query.decode('utf-8', 'ignore')
    query_list = query.split('|')
This gives an error in Python3:
File "/Users/cong/bexec.py", line 708, in bexec
content = content.strip().strip(',').decode('utf-8', 'ignore')
AttributeError: 'str' object has no attribute 'decode'
The parameters content and query are both strings. So I removed the decode part:
content = content.strip().strip(',')
# query = query.decode('utf-8', 'ignore')
Now it doesn't complain any more. Is this safe to do? I guess in Python3 it doesn't need decode() any more.
Correct. In Python 3, if you have a str value, you can assume it is a proper sequence of Unicode code points, not a sequence of bytes that need to be decoded from (say) UTF-8 to a Unicode string. If you have a bytes value, you must decode it first in order to get a proper Unicode string.
In Python 2, the boundaries were looser. A unicode value was definitely a proper Unicode string (and was renamed str in Python 3), while a str value could be a "real" ASCII-only string value or arbitrary binary data: you couldn't tell just from the type.
As such, the str type supported encode and decode methods to allow switching between the two sides of the str type.
In Python 3, with more strictly defined roles, you can call str.encode to get a bytes value, or you can call bytes.decode to get a str value. You cannot decode a str or further encode a bytes. str.decode and bytes.encode simply do not exist.
In some sense, all files are binary files: they consist of a stream of bytes. What we call a text file is just a file whose bytes are intended to be decoded using a particular text decoder, like ASCII or UTF-8, as opposed to something like a JPEG decoder, or a JVM, or your CPU itself.
When you use open to open a file in text mode (the default), its read method returns str values, resulting from applying file object's decoder to the raw bytes read from the file.
When you use open to open a file in binary mode, its read method returns bytes values, the raw bytes being left undecoded for you to handle as you see fit.
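A small sketch of the difference (the file name is made up):

```python
# Write some UTF-8 text, then read it back in both modes.
with open("demo.txt", "w", encoding="utf-8") as f:
    f.write("héllo")

with open("demo.txt", encoding="utf-8") as f:  # text mode: read() -> str
    text = f.read()

with open("demo.txt", "rb") as f:              # binary mode: read() -> bytes
    raw = f.read()

print(type(text).__name__, type(raw).__name__)  # str bytes
print(raw)  # b'h\xc3\xa9llo' -- the UTF-8 bytes behind the text
```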

Python script to encrypt a message fails

Trying to encrypt to HMAC-SHA256 by giving my script a key and message.
A popular example that I saw online fails to run on my machine:
import hmac
import hashlib
import binascii
def create_sha256_signature(key, message):
    byte_key = binascii.unhexlify(key)
    message = message.encode()
    enc = hmac.new(byte_key, message, hashlib.sha256).hexdigest().upper()
    print(enc)

create_sha256_signature("KeepMySecret", "aaaaa")
why am I getting this error?
Traceback (most recent call last):
File "encryption.py", line 12, in <module>
create_sha256_signature("SaveMyScret", "aaaaa")
File "encryption.py", line 8, in create_sha256_signature
byte_key = binascii.unhexlify(key)
binascii.Error: Odd-length string
How should I change my code so I will be able to give my own short key?
When you call unhexlify it implies that your key is a hexadecimal representation of bytes. E.g. A73FB0FF.... In this kind of encoding, every character represents just 4 bits and therefore you need two characters for a byte and an even number of characters for the whole input string.
From the docs:
hexstr must contain an even number of hexadecimal digits
But the given secrets "SaveMySecret" and "KeepMySecret" not only have an odd number of characters, they are not even valid hex, so the call would fail anyway with:
binascii.Error: Non-hexadecimal digit found
You can either provide a key in hex encoded form, or instead of calling unhexlify use something like
byte_key = key.encode('utf-8')
to get bytes as input for hmac.new()
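Putting that together, a sketch of the function with the key treated as plain text rather than hex (same shape as the original, minus the unhexlify call):

```python
import hmac
import hashlib

def create_sha256_signature(key, message):
    # The key is plain text, not hex, so just encode it to bytes.
    byte_key = key.encode('utf-8')
    return hmac.new(byte_key, message.encode('utf-8'), hashlib.sha256).hexdigest().upper()

print(create_sha256_signature("KeepMySecret", "aaaaa"))
```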

String object has no attribute 'decode' when converting UTF-8

I'm trying to convert G\xc3\xb6del to Gödel (specifically, \xc3\xb6d to ö), but I can't find a method for going about doing this. When I run the below code, I receive an error:
>>> string = '\xc3\xb6'
>>> string.decode(encoding='UTF-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
This question didn't seem to help, nor did any others that seemed similar, as they were all from 2.x. A friend mentioned base 64 encoding, but I'm not sure in what way that helps. I can't seem to find what I'm supposed to do to convert it in 3.8, so what would be the best way to go about doing this?
The issue here is that a string is already decoded. Basically you encode a string object to a byte object, and the inverse operation is decoding a byte object to a string object. That's why a string has no attribute decode. Think of it like this:
String -> encode -> Byte
Byte -> decode -> String
In this case, the solution would be to call the encode method and pass in 'utf8' or 'ascii', depending on the context and situation.
However, it isn't just converting it to a string object that is the issue here. As the OP of this question, I do know exactly what this was meant for, and how I came to a solution. The value Gödel was obtained by scraping an SCP Foundation page, finding the Object Class to then pass on to my Discord bot for a command. Here was my code:
link = f"http://www.scp-wiki.net/scp-{num}"
page = get(link)
obj_class = [str(i) for i in page.iter_lines() if b"Object Class:" in i][0]
# ^ There should only be one line in the document matching that requirement.
# The type of this line is a byte object, which is why conversion is necessary later on.
obj_class = re.findall('(?<=\<\/strong> )(.*?)(?=\<)', obj_class)[0]
# ^ Find the actual class in that line.
print(obj_class) # expected Gödel, got G\xc3\xb6del instead.
The above would not raise an exception; it simply wouldn't convert the character encoding as desired. My fix was simple, once I understood what was going on: replace str(i) with i.decode('utf8').
obj_class = [i.decode('utf8') for i in page.iter_lines() if b"Object Class:" in i][0]
# ^ decoding it there really makes the difference, converting it to utf-8 without dealing with
# the issues of decoded strings later on.
This would now return the desired value, Gödel, rather than G\xc3\xb6del. I hope that this helps. Please let me know if I've made any mistakes, so I can make any necessary corrections.
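If you only have the already-mangled str (rather than the original bytes), one hedged repair is the latin-1 round trip: latin-1 maps code points 0-255 straight back to single bytes, so you can recover the raw UTF-8 bytes and then decode them properly:

```python
s = 'G\xc3\xb6del'  # mojibake: UTF-8 bytes read as individual characters
fixed = s.encode('latin-1').decode('utf-8')
print(fixed)  # Gödel
```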

How do I encode hexadecimal to base64 in python?

If I try to do:
from base64 import b64encode
b64encode('ffffff')
I get this error:
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
base64.b64encode('ffffff')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
Because it said bytes-like object I then tried this:
b64encode(bytes('ffffff'))
Which failed.
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
b64encode(bytes('ffffff'))
TypeError: string argument without an encoding
Finally, using the .encode('utf-8') function:
b64encode('ffffff'.encode('utf-8'))
That gives the incorrect output b'ZmZmZmZm'; the correct base64 encoding is ////.
I already know how to decode b64 to hex so don't say how to do that.
Edit: This question got flagged for being the same as converting hex strings to hex bytes. This involves base64.
To fully go from the string ffffff to base64 of the hex value, you need to run it through some encoding and decoding, using the codecs module:
import codecs
# Convert string to hex
hex = codecs.decode('ffffff', 'hex')
# Encode as base64 (bytes)
codecs.encode(hex, 'base64')
For an odd-length string like 0xfffff you need to put a zero at the beginning of the hex string (0x0fffff), otherwise Python will give you an error.
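An equivalent sketch without codecs, using bytes.fromhex and base64.b64encode (the same even-length caveat applies):

```python
import base64

raw = bytes.fromhex('ffffff')    # b'\xff\xff\xff'
encoded = base64.b64encode(raw)  # b'////'
print(encoded.decode('ascii'))
```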
Here's an alternative to using codecs.
This one is a bit less readable, but works great and hopefully teaches you how codecs, hex and integers work. (A word of caution: it runs on odd-length input, but silently drops the trailing character.)
import struct
s = 'ffffff'
b''.join([struct.pack('B', int(''.join(x), 16)) for x in zip(s[0::2], s[1::2])])
Which should give you b'\xff\xff\xff'.
Your main problem is probably that you think 'ffffff' represents the values 255, 255, 255. It doesn't: it is still a string, containing the letters ff. Consequently you need to parse the string representation of hex into actual byte values. We can do this by first passing the string through int(), which can interpret hex in string form.
You will need to convert each pair of ff individually by doing int('ff', 16), which tells Python to interpret the string as a base-16 integer (a hex number).
And then convert that integer into a bytes like object representing that integer. That's where struct.pack comes in. It's meant for exactly this.
struct.pack('B', 255) # 255 is given to us by int('ff', 16)
Essentially, 'B' tells Python to pack the value 255 into a 1-byte-object, in this case, that gives us b'\xff' which is your end goal. Now, do this for every 2-pair of letters in your original data.
This is more of a manual approach where you'll iterate over 2 characters in the string at a time, and use the above description to bundle them into what you expect them to be. Or just use codecs, either way works.
Expanded version of the above oneliner:
import struct
hex_string = 'ffffff'
result = b''
for pair in zip(hex_string[0::2], hex_string[1::2]):
    value = int(''.join(pair), 16)
    result += struct.pack('B', value)
At the very least, I hope this explains how hex works on a practical level, and how the computer interprets our human-readable representation of bits and bytes.

Converting to Precomposed Unicode String using Python-AppKit-ObjectiveC

Apple Technical Q&A QA1235 describes a way to convert unicode strings from a composed to a decomposed version. Since I have a problem with file names containing some characters (e.g. an accent grave), I'd like to try the conversion function
void CFStringNormalize(CFMutableStringRef theString,
CFStringNormalizationForm theForm);
I am using this with Python and the AppKit library. If I pass a Python string as an argument, I get:
CoreFoundation.CFStringNormalize("abc",0)
2009-04-27 21:00:54.314 Python[4519:613] * -[OC_PythonString _cfNormalize:]: unrecognized selector sent to instance 0x1f02510
Traceback (most recent call last):
File "", line 1, in
ValueError: NSInvalidArgumentException - * -[OC_PythonString _cfNormalize:]: unrecognized selector sent to instance 0x1f02510
I suppose this is because a CFMutableStringRef is needed as an argument. How do I convert a Python String to CFMutableStringRef?
OC_PythonString (which is what Python strings are bridged to) is an NSString subclass, so you could get an NSMutableString with:
mutableString = NSMutableString.alloc().initWithString_("abc")
then use mutableString as the argument to CFStringNormalize.
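If the goal is only the normalization rather than CoreFoundation specifically, the standard library's unicodedata module exposes the same normalization forms directly (use 'NFD' for the decomposed form), which may be a simpler sketch:

```python
import unicodedata

decomposed = 'e\u0301'  # 'e' followed by a combining acute accent
composed = unicodedata.normalize('NFC', decomposed)
print(composed, len(decomposed), len(composed))  # é 2 1
```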
