If I try to do:
from base64 import b64encode
b64encode('ffffff')
I get this error:
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
base64.b64encode('ffffff')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
Because it said bytes-like object I then tried this:
b64encode(bytes('ffffff'))
Which failed.
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
b64encode(bytes('ffffff'))
TypeError: string argument without an encoding
Finally, using the .encode('utf-8') function:
b64encode('ffffff'.encode('utf-8'))
With incorrect output b'ZmZmZmZm', the correct base64 encoding is ////.
I already know how to decode b64 to hex so don't say how to do that.
Edit: This question got flagged for being the same as converting hex strings to hex bytes. This involves base64.
To fully go from the string ffffff to base64 of the hex value, you need to run it through some encoding and decoding, using the codecs module:
import codecs
# Convert string to hex
hex = codecs.decode('ffffff', 'hex')
# Encode as base64 (bytes)
codecs.encode(hex, 'base64')
For an odd-length string like 0xfffff you need to put a zero at the beginning of the hex string (0x0fffff), otherwise python will give you an error.
Here's an alternative to using codecs.
This one is a bit less readable, but works great and hopefully teaches you how codecs, hex and integers work. (word of caution, works on odd lengths, but will ignore the odd byte-string-representation)
import struct
s = 'ffffff'
b''.join([struct.pack('B', int(''.join(x), 16)) for x in zip(s[0::2], s[1::2])])
Which should give you b'\xff\xff\xff'.
Your main problem is probably that you think 'ffffff' represents the values 255, 255, 255. Which they don't. They're still in a string format with the letters ff. Subsequently you need to parse/convert the string representation of hex, into actual hex. We can do this by first passing the string through int() which can intemperate hex in string representation format.
You will need to convert each pair of ff individually by doing int('ff', 16) which tells Python to intemperate the string as a base-16 integer (hex-numbers).
And then convert that integer into a bytes like object representing that integer. That's where struct.pack comes in. It's meant for exactly this.
struct.pack('B', 255) # 255 is given to us by int('ff', 16)
Essentially, 'B' tells Python to pack the value 255 into a 1-byte-object, in this case, that gives us b'\xff' which is your end goal. Now, do this for every 2-pair of letters in your original data.
This is more of a manual approach where you'll iterate over 2 characters in the string at a time, and use the above description to bundle them into what you expect them to be. Or just use codecs, either way works.
Expanded version of the above oneliner:
import struct
hex_string = 'ffffff'
result = b''
for pair in zip(hex_string[0::2], hex_string[1::2]):
value = int(''.join(pair), 16)
result += struct.pack('B', value)
At the very least, I hope this explains how hex works on a practical level. And how the computer interpenetrates hour humanly readable version of bits and bytes.
Related
In python 3, I have a str like this, which is the exactly literal representation of bytes data:
'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
I would like to convert it to real byte,
b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
I tried to use .encode() on the str data, but the result added many "xc2":
b'8\xc2\x81p\xc2\x925\x00\x003dx\xc2\x91P\x00x\xc2\x923\x00\x00\xc2\x91Pd\x00\xc2\x921d\xc2\x81p1\x00\x00'.
I also tried:
import ast
ast.literal_eval("b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'")
The result is:
ValueError: source code string cannot contain null bytes
How to convert the str input to the bytes as exactly the same as follows?
b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
You are on the right track with the encode function already. Just try with this encoding:
>>> '8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'.encode('raw_unicode_escape')
b'8\x81p\x925\x00\x003dx\x91P\x00x\x923\x00\x00\x91Pd\x00\x921d\x81p1\x00\x00'
I took it from this table in Python's codecs documentation
Edit: I just found it needs raw_unicode_escape instead of unicode_escape
Trying to encrypt to HMAC-SHA256 by giving my script a key and message.
A popular example that I saw online fails to run on my machine:
import hmac
import hashlib
import binascii
def create_sha256_signature(key, message):
byte_key = binascii.unhexlify(key)
message = message.encode()
enc = hmac.new(byte_key, message, hashlib.sha256).hexdigest().upper()
print (enc)
create_sha256_signature("KeepMySecret", "aaaaa")
why am I getting this error?
Traceback (most recent call last):
File "encryption.py", line 12, in <module>
create_sha256_signature("SaveMyScret", "aaaaa")
File "encryption.py", line 8, in create_sha256_signature
byte_key = binascii.unhexlify(key)
binascii.Error: Odd-length string
How should I change my code so I will be able to give my own short key?
When you call unhexlify it implies that your key is a hexadecimal representation of bytes. E.g. A73FB0FF.... In this kind of encoding, every character represents just 4 bits and therefore you need two characters for a byte and an even number of characters for the whole input string.
From the docs:
hexstr must contain an even number of hexadecimal digits
But actually the given secrets "SaveMySecret" or "KeepMySecret have not only a odd number of characters, but are not even valid hex code, so it would fail anyway with:
binascii.Error: Non-hexadecimal digit found
You can either provide a key in hex encoded form, or instead of calling unhexlify use something like
byte_key = key.encode('utf-8')
to get bytes as input for hmac.new()
Apparently, the following is the valid syntax:
b'The string'
I would like to know:
What does this b character in front of the string mean?
What are the effects of using it?
What are appropriate situations to use it?
I found a related question right here on SO, but that question is about PHP though, and it states the b is used to indicate the string is binary, as opposed to Unicode, which was needed for code to be compatible from version of PHP < 6, when migrating to PHP 6. I don't think this applies to Python.
I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.
Also, just out of curiosity, are there more symbols than the b and u that do other things?
Python 3.x makes a clear distinction between the types:
str = '...' literals = a sequence of Unicode characters (Latin-1, UCS-2 or UCS-4, depending on the widest character in the string)
bytes = b'...' literals = a sequence of octets (integers between 0 and 255)
If you're familiar with:
Java or C#, think of str as String and bytes as byte[];
SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
Windows registry, think of str as REG_SZ and bytes as REG_BINARY.
If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.
You use str when you want to represent text.
print('שלום עולם')
You use bytes when you want to represent low-level binary data like structs.
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
You can encode a str to a bytes object.
>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'
And you can decode a bytes into a str.
>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
But you can't freely mix the two types.
>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.
>>> b'A' == b'\x41'
True
But I must emphasize, a character is not a byte.
>>> 'A' == b'A'
False
In Python 2.x
Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
unicode = u'...' literals = sequence of Unicode characters = 3.x str
str = '...' literals = sequences of confounded bytes/characters
Usually text, encoded in some unspecified encoding.
But also used to represent binary data like struct.pack output.
In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.
So yes, b'...' literals in Python have the same purpose that they do in PHP.
Also, just out of curiosity, are there
more symbols than the b and u that do
other things?
The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
To quote the Python 2.x documentation:
A prefix of 'b' or 'B' is ignored in
Python 2; it indicates that the
literal should become a bytes literal
in Python 3 (e.g. when code is
automatically converted with 2to3). A
'u' or 'b' prefix may be followed by
an 'r' prefix.
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
The b denotes a byte string.
Bytes are the actual data. Strings are an abstraction.
If you had multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on encoding.
If took 1 byte with a byte string, you'd get a single 8-bit value from 0-255 and it might not represent a complete character if those characters due to encoding were > 1 byte.
TBH I'd use strings unless I had some specific low level reason to use bytes.
From server side, if we send any response, it will be sent in the form of byte type, so it will appear in the client as b'Response from server'
In order get rid of b'....' simply use below code:
Server file:
stri="Response from server"
c.send(stri.encode())
Client file:
print(s.recv(1024).decode())
then it will print Response from server
The answer to the question is that, it does:
data.encode()
and in order to decode it(remove the b, because sometimes you don't need it)
use:
data.decode()
Here's an example where the absence of b would throw a TypeError exception in Python 3.x
>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
Adding a b prefix would fix the problem.
It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.
The r prefix causes backslashes to be "uninterpreted" (not ignored, and the difference does matter).
In addition to what others have said, note that a single character in unicode can consist of multiple bytes.
The way unicode works is that it took the old ASCII format (7-bit code that looks like 0xxx xxxx) and added multi-bytes sequences where all bytes start with 1 (1xxx xxxx) to represent characters beyond ASCII so that Unicode would be backwards-compatible with ASCII.
>>> len('Öl') # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8') # convert str to bytes
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8')) # 3 bytes encode 2 characters !
3
You can use JSON to convert it to dictionary
import json
data = b'{"key":"value"}'
print(json.loads(data))
{"key":"value"}
FLASK:
This is an example from flask. Run this on terminal line:
import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})
In flask/routes.py
#app.route('/', methods=['POST'])
def api_script_add():
print(request.data) # --> b'{"hi":"Hello"}'
print(json.loads(request.data))
return json.loads(request.data)
{'key':'value'}
b"hello" is not a string (even though it looks like one), but a byte sequence. It is a sequence of 5 numbers, which, if you mapped them to a character table, would look like h e l l o. However the value itself is not a string, Python just has a convenient syntax for defining byte sequences using text characters rather than the numbers itself. This saves you some typing, and also often byte sequences are meant to be interpreted as characters. However, this is not always the case - for example, reading a JPG file will produce a sequence of nonsense letters inside b"..." because JPGs have a non-text structure.
.encode() and .decode() convert between strings and bytes.
bytes(somestring.encode()) is the solution that worked for me in python 3.
def compare_types():
output = b'sometext'
print(output)
print(type(output))
somestring = 'sometext'
encoded_string = somestring.encode()
output = bytes(encoded_string)
print(output)
print(type(output))
compare_types()
If I have a unicode string such as:
s = u'c\r\x8f\x02\x00\x00\x02\u201d'
how can I convert this to just a regular string that isn't in unicode format; i.e. I want to extract:
f = '\x00\x00\x02\u201d'
and I do not want it in unicode format. The reason why I need to do this is because I need to convert the unicode in s to an integer value, but if I try it with just s:
int((s[-4]+s[-3]+s[-2]+s[-1]).encode('hex'), 16)
Traceback (most recent call last):
File "<pyshell#48>", line 1, in <module>
int((s[-4]+s[-3]+s[-2]+s[-1]).encode('hex'), 16)
File "C:\Python27\lib\encodings\hex_codec.py", line 24, in hex_encode
output = binascii.b2a_hex(input)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 3: ordinal not in range(128)
yet if I do it with f:
int(f.encode('hex'), 16)
664608376369508L
And this is the correct integer value I want to extract from s. Is there a method where I can do this?
Normally, the device sends back something like: \x00\x00\x03\xcc which I can easily convert to 972
OK, so I think what's happening here is you're trying to read four bytes from a byte-oriented device, and decode that to an integer, interpreting the bytes as a 32-bit word in big-endian order.
To do this, use the struct module and byte strings:
>>> struct.unpack('>i', '\x00\x00\x03\xCC')[0]
972
(I'm not sure why you were trying to reverse the string then hex-encode; that would put the bytes in the wrong order and give much too large output.)
I don't know how you're reading from the device, but at some point you've decoded the bytes into a text (Unicode) string. Judging from the U+201D character in there I would guess that the device originally gave you a byte 0x94 and you decoded it using code page 1252 or another similar Windows default (‘ANSI’) code page.
>>> struct.unpack('>i', '\x00\x00\x02\x94')[0]
660
It may be possible to reverse the incorrect decoding step by encoding back to bytes using the same mapping, but this is dicey and depends on which encoding are involved (not all bytes are mapped to anything usable in all encodings). Better would be to look at where the input is coming from, find where that decode step is happening, and get rid of it so you keep hold of the raw bytes the device sent you.
Apparently, the following is the valid syntax:
b'The string'
I would like to know:
What does this b character in front of the string mean?
What are the effects of using it?
What are appropriate situations to use it?
I found a related question right here on SO, but that question is about PHP though, and it states the b is used to indicate the string is binary, as opposed to Unicode, which was needed for code to be compatible from version of PHP < 6, when migrating to PHP 6. I don't think this applies to Python.
I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn't mention the b character anywhere in that document.
Also, just out of curiosity, are there more symbols than the b and u that do other things?
Python 3.x makes a clear distinction between the types:
str = '...' literals = a sequence of Unicode characters (Latin-1, UCS-2 or UCS-4, depending on the widest character in the string)
bytes = b'...' literals = a sequence of octets (integers between 0 and 255)
If you're familiar with:
Java or C#, think of str as String and bytes as byte[];
SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
Windows registry, think of str as REG_SZ and bytes as REG_BINARY.
If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.
You use str when you want to represent text.
print('שלום עולם')
You use bytes when you want to represent low-level binary data like structs.
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
You can encode a str to a bytes object.
>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'
And you can decode a bytes into a str.
>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
But you can't freely mix the two types.
>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.
>>> b'A' == b'\x41'
True
But I must emphasize, a character is not a byte.
>>> 'A' == b'A'
False
In Python 2.x
Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:
unicode = u'...' literals = sequence of Unicode characters = 3.x str
str = '...' literals = sequences of confounded bytes/characters
Usually text, encoded in some unspecified encoding.
But also used to represent binary data like struct.pack output.
In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.
So yes, b'...' literals in Python have the same purpose that they do in PHP.
Also, just out of curiosity, are there
more symbols than the b and u that do
other things?
The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
To quote the Python 2.x documentation:
A prefix of 'b' or 'B' is ignored in
Python 2; it indicates that the
literal should become a bytes literal
in Python 3 (e.g. when code is
automatically converted with 2to3). A
'u' or 'b' prefix may be followed by
an 'r' prefix.
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
The b denotes a byte string.
Bytes are the actual data. Strings are an abstraction.
If you had multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on encoding.
If took 1 byte with a byte string, you'd get a single 8-bit value from 0-255 and it might not represent a complete character if those characters due to encoding were > 1 byte.
TBH I'd use strings unless I had some specific low level reason to use bytes.
From server side, if we send any response, it will be sent in the form of byte type, so it will appear in the client as b'Response from server'
In order get rid of b'....' simply use below code:
Server file:
stri="Response from server"
c.send(stri.encode())
Client file:
print(s.recv(1024).decode())
then it will print Response from server
The answer to the question is that, it does:
data.encode()
and in order to decode it(remove the b, because sometimes you don't need it)
use:
data.decode()
Here's an example where the absence of b would throw a TypeError exception in Python 3.x
>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface
Adding a b prefix would fix the problem.
It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.
The r prefix causes backslashes to be "uninterpreted" (not ignored, and the difference does matter).
In addition to what others have said, note that a single character in unicode can consist of multiple bytes.
The way unicode works is that it took the old ASCII format (7-bit code that looks like 0xxx xxxx) and added multi-bytes sequences where all bytes start with 1 (1xxx xxxx) to represent characters beyond ASCII so that Unicode would be backwards-compatible with ASCII.
>>> len('Öl') # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8') # convert str to bytes
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8')) # 3 bytes encode 2 characters !
3
You can use JSON to convert it to dictionary
import json
data = b'{"key":"value"}'
print(json.loads(data))
{"key":"value"}
FLASK:
This is an example from flask. Run this on terminal line:
import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})
In flask/routes.py
#app.route('/', methods=['POST'])
def api_script_add():
print(request.data) # --> b'{"hi":"Hello"}'
print(json.loads(request.data))
return json.loads(request.data)
{'key':'value'}
b"hello" is not a string (even though it looks like one), but a byte sequence. It is a sequence of 5 numbers, which, if you mapped them to a character table, would look like h e l l o. However the value itself is not a string, Python just has a convenient syntax for defining byte sequences using text characters rather than the numbers itself. This saves you some typing, and also often byte sequences are meant to be interpreted as characters. However, this is not always the case - for example, reading a JPG file will produce a sequence of nonsense letters inside b"..." because JPGs have a non-text structure.
.encode() and .decode() convert between strings and bytes.
bytes(somestring.encode()) is the solution that worked for me in python 3.
def compare_types():
output = b'sometext'
print(output)
print(type(output))
somestring = 'sometext'
encoded_string = somestring.encode()
output = bytes(encoded_string)
print(output)
print(type(output))
compare_types()