I'm sending a file over small UDP packets. (python 3)
On the server I divide the file into small pieces and do
packets.append(b64encode(smallPart))
on the other side I do exactly the opposite
packets.append(b64decode(peice))
However, I keep getting an Incorrect Padding exception (in all but one packet).
Is there a standard size for b64decode that I'm missing?
Base64 works by encoding every 3 bytes into 4 bytes. When decoding, it takes those 4 bytes and converts them back to 3 bytes. If there were fewer than 3 bytes remaining in the input, the output is still padded with '=' to make 4 bytes. If the input to b64decode is not a multiple of 4 bytes long, you will get the exception.
The easiest solution for you will be to make sure each encoded piece you pass to b64decode is always a multiple of 4 bytes long.
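A minimal sketch of the behaviour described above, assuming each piece is encoded on its own with the standard library's base64 module. The payload and the piece size are made up for illustration.

```python
import base64
import binascii

payload = b"hello world, this is a test"   # hypothetical file contents
chunk_size = 5                              # hypothetical piece size

# Encode each piece on its own: every encoded piece is a multiple of 4 bytes.
pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
encoded = [base64.b64encode(p) for p in pieces]
all_multiple_of_4 = all(len(e) % 4 == 0 for e in encoded)

# Decoding complete pieces restores the original data.
restored = b"".join(base64.b64decode(e) for e in encoded)

# Decoding a piece that lost its last byte raises the padding error.
try:
    base64.b64decode(encoded[0][:-1])
    truncated_ok = True
except binascii.Error:
    truncated_ok = False
```

If the exception shows up on the client, comparing len(piece) % 4 on both sides is a quick way to see whether a piece was cut or merged in transit.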
Your description of what you are doing sounds OK. Choice of the input piece size affects only the efficiency. Padding bytes are minimised if the length of each input piece (except of course the last) is a multiple of 3.
You need to show us both your server code and your client code. Alternatively: on the server, log the input and the pieces transmitted. On the client, log the pieces received. Compare.
Curiosity: Why don't you just b64encode the whole string, split the encoded result however you like, transmit the pieces, at the client reassemble the pieces using b''.join(pieces) and b64decode that?
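The alternative suggested above could look roughly like this. It is only a sketch: the payload and packet size are invented, and it assumes the pieces arrive complete and in order, which UDP itself does not guarantee.

```python
import base64

payload = bytes(range(256))   # hypothetical data to send
packet_size = 50              # hypothetical size, deliberately not a multiple of 4

# Server side: encode once, then split the encoded bytes however you like.
encoded = base64.b64encode(payload)
packets = [encoded[i:i + packet_size] for i in range(0, len(encoded), packet_size)]

# Client side: reassemble first, decode once at the end.
received = b"".join(packets)
decoded = base64.b64decode(received)
```

Because decoding happens only after reassembly, the size of the individual packets no longer matters for padding.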
Further curiosity: I thought the contents of a UDP packet could be any old binary bunch of bytes; why are you doing base64 encoding at all?
The length of any properly encoded base64 string should be divisible by 4.
Base64 encodes 3 bytes as 4, so if you start out with a length of string that's not a multiple of 3, the algorithm adds one or two = characters on the end of the encoded form, one for each byte shy of some multiple of 3 (see http://en.wikipedia.org/wiki/Base64#Padding).
The way the alignment comes out, the number of = characters also equals the number of characters shy of a multiple of 4 in the encoded form.
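As a rough illustration of that arithmetic, the two helpers below (hypothetical names, not part of any library) compute how many = characters an input of a given length produces, and re-append padding that was stripped from an encoded value.

```python
import base64

def padding_count(n_bytes):
    """Number of '=' characters base64 appends for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3

def restore_padding(encoded):
    """Re-append '=' characters that were stripped from a base64 bytes value."""
    remainder = len(encoded) % 4
    if remainder == 1:
        # One leftover character carries only 6 bits, so no padding can fix it.
        raise ValueError("length cannot be 1 more than a multiple of 4")
    return encoded + b"=" * (-remainder % 4)

# b'aGVsbG8' is b'aGVsbG8=' with its padding removed.
decoded = base64.b64decode(restore_padding(b"aGVsbG8"))
```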
I had been trying to decode an URL-safe base64 encoded string. Simply replacing "." with "=" did the trick for me.
s = s.replace('.', '=')
# then base64decode
I received a base64-encoded string and am using Python to decode it, but decoding fails. I found that the string ends with a /. I don't know how to decode it and haven't found an answer. Can anyone help me?
import base64
data = 'dXN1c19pZD0xMDg2P2RvY01kPTE3Mzc4JnR5cGU9bmV3/'
print(base64.urlsafe_b64decode(data))
print(base64.standard_b64decode(data))
print(base64.b64decode(data))
data looks like a normal base64-encoded string, containing only characters from the base64 character set. The problem is indeed the / on the end, because the length of a base64 string should be divisible by 4 without remainder, with padding added on the end if necessary to achieve this. With the / on the end, data is 45 characters long: 44 base64 characters that decode to 33 bytes, plus one last character that encodes only 6 bits.
Just adding padding would not solve it: you can add at most two padding characters (=), but the lone / carries only 6 bits, so one more base64 character would be needed to supply the missing two bits of a full byte.
So you can either cut it off like this:
lenmax = len(data) - len(data)%4
print(base64.b64decode(data[0:lenmax]).decode())
or add something like 0== to fill it up to 48 characters. But then you would get an error in decode(), because the extra byte (0xff) is not valid UTF-8, and I'm not a friend of inventing extra data.
Or ask the sender (or check the sender's code) to find out why there is this lonely / on the end.
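The length arithmetic above can be checked with a short sketch. It uses only the data string from the question; the variable names are illustrative.

```python
import base64
import binascii

data = 'dXN1c19pZD0xMDg2P2RvY01kPTE3Mzc4JnR5cGU9bmV3/'

# Keep only the leading part whose length is a multiple of 4.
usable = len(data) - len(data) % 4
decoded = base64.b64decode(data[:usable])

# The full string, with the stray trailing character, is rejected.
try:
    base64.b64decode(data)
    full_string_decodes = True
except binascii.Error:
    full_string_decodes = False

print(len(data), usable, len(decoded), full_string_decodes)
```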
I'm a little confused about the encode and bytes functions, when I do a conversion from string to bytes, I'm getting additional bytes.
Where are the additional bytes coming from (in the conversion)?
I tried the bytes function and the encode function as explained here:
https://www.programiz.com/python-programming/methods/built-in/bytes
fake_serial_data = chr(176)+chr(0)+chr(0)+chr(0)+chr(73)+chr(0)+chr(0)+chr(0)+chr(0)+chr(255)
print("Number of bytes in original data:", len(fake_serial_data))
encoded_fake_serial_data = fake_serial_data.encode()
print(encoded_fake_serial_data)
print("Number of bytes in encoded data:", len(encoded_fake_serial_data))
PySerial already works in terms of bytes. You don't need to perform any string->bytes conversion, because PySerial won't give you a string. Just keep the bytes objects that PySerial gives you.
In your fake_serial_data test, len(fake_serial_data) is not a number of bytes. fake_serial_data is not a sequence of bytes, and asking how many bytes are in it is like asking how many pixels are in your house, or how much pencil lead is in these words you're reading.
len(fake_serial_data) is the number of Unicode code points in fake_serial_data, and len(encoded_fake_serial_data) is the number of bytes in the UTF-8 encoding of fake_serial_data. There is no reason to expect a one-to-one correspondence between Unicode code points and resulting bytes. In UTF-8, code points above 127 don't map to a single byte.
If you want an encoding that maps Unicode code points in the 0-255 range to individual bytes (and fails on Unicode code points outside that range), there's latin1, but wanting such a thing usually means you're making a mistake.
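A small sketch of the difference, rebuilding the string from the question out of the same ten values. The variable names other than the values themselves are made up for the example.

```python
# The ten byte values from the question, as a real bytes object.
raw = bytes([176, 0, 0, 0, 73, 0, 0, 0, 0, 255])

# The same values as a str: ten Unicode code points, not ten bytes.
text = "".join(chr(value) for value in raw)

utf8 = text.encode()            # default encoding is UTF-8
latin1 = text.encode("latin1")  # maps code points 0-255 to single bytes

# Code points 176 and 255 each take two bytes in UTF-8, hence 12 instead of 10.
print(len(text), len(utf8), len(latin1))
```

With PySerial there is no need for either conversion, since the data already arrives as bytes like raw.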
I want to convert a given hex string into base64 (in Python, without using any libraries). As I learned from other Stack Overflow answers, we can either group 3 hex digits (12 bits, i.e. 4 bits each) to get 2 base64 values (12 bits, i.e. 6 bits each), or group 6 hex digits (24 bits) into 4 base64 values (24 bits).
The standard procedure is to append the binary bits of all the hex digits together and group them from the left in packets of 6.
My question is regarding the situation we need padding for:
(Assuming we are converting 3 hex into 2 base64)
There will arise a situation when we are left with only 2 or 1 hex values to convert. Take the example below:
'a1' to base64
10100001 (binary of a1)
101000 01(0000) //making groups of 6 and adding additional 0's where required
This gives "oQ". One online converter gives oQ== as the answer, while another gives something different (wqE=).
Q1. Which of the two sources gives the correct answer? Why is the other one wrong, despite being a good online decoder?
Q2. How do we determine the number of '=' characters here? (We could have just added sufficient 0's wherever needed, as in the example above, so that the answer ends up as just oQ and not oQ==, assuming oQ== is correct.)
My understanding is: if the remaining hex is of length 2 (rather than 3), we pad with a single = (which agrees with the answer wqE= in the case above); if the hex is of length 1 (rather than 3), we pad with two ='s.
At the same time, I am confused, because if 3 hex digits are converted into 2 base64 characters, we would never need two ='s.
'a' to base64
1010 (binary of a)
Q3. How to convert hex 'a' to base64.
Base64 is defined by RFC 4648 as being "designed to represent arbitrary sequences of octets". Octet is a unit of 8 bits, in practice synonymous with byte. When your input is in the form of a hex string, your first step should be to decode it into a byte string. You need two hex characters for each byte. If the length of the input is odd, the reasonable course of action is to raise an error.
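That first step might be sketched as follows, without any library imports. The function name and error message are illustrative, not taken from any existing API.

```python
HEX_DIGITS = "0123456789abcdef"

def hex_to_bytes(hex_string):
    """Decode a hex string into bytes, two hex characters per byte."""
    if len(hex_string) % 2 != 0:
        raise ValueError("hex string must contain an even number of digits")
    values = []
    for i in range(0, len(hex_string), 2):
        # str.index raises ValueError for anything that is not a hex digit.
        high = HEX_DIGITS.index(hex_string[i].lower())
        low = HEX_DIGITS.index(hex_string[i + 1].lower())
        values.append(high * 16 + low)
    return bytes(values)
```

With this, 'a1' becomes the single byte 0xa1, and a lone 'a' is rejected instead of being guessed at.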
To address your numbered questions:
Q1: Even if you are going to implement your own encoder, you can use the Python standard library to investigate. Decoding the two results back to bytes gives:
>>> import base64
>>> base64.b64decode(b'oQ==')
b'\xa1'
>>> base64.b64decode(b'wqE=')
b'\xc2\xa1'
So, oQ== is correct, while wqE= has a c2 byte added in front. I can guess that it is the result of applying UTF-8 encoding before Base64. To confirm:
>>> '\u00a1'.encode('utf-8')
b'\xc2\xa1'
Q2: The rules for padding are detailed in the RFC.
Q3: This is ambiguous and you are right to be confused.
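For Q2, the padding rule works on bytes, not hex digits: a final group of 1 byte gets two = characters and a final group of 2 bytes gets one. A minimal sketch of an encoder along those lines, using only built-ins, is shown here. The function name is hypothetical and this is not the standard library's implementation.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64encode_manual(data):
    """Encode a bytes value as a base64 string, 3 bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)   # 0, 1 or 2 bytes short of a full group
        # Pack the group into a 24-bit integer, zero-filling the missing bytes.
        n = int.from_bytes(group + b"\x00" * missing, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that come purely from filler bytes are replaced by '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(b64encode_manual(b"\xa1"))       # the byte from the 'a1' example
print(b64encode_manual(b"\xc2\xa1"))   # the UTF-8 encoded variant
```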
I am using Python to read micro controller values in a windows based program. The encodings / byte decodings and values have begun to confuse me. Here is my situation:
In the software, I am allowed to call a receive function once per byte received by the Python interpreter, once per line (not quite sure what that is) or once per message which I assume is the entire transmission from the micro controller.
I am struggling with the best way to decode these values. The microcontroller is putting out specific values that correlate to a protocol. For example, calling a function that is supposed to return the hex values:
F0, 79, (the phrase standard_firmata.pde) [then] F7
returns:
b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7'
When set to "once per message". This is what I want; I can see that the correct values are being sent, but there are too many \x00 values included (after every byte, it seems). Additionally, the second byte is y when it is supposed to be 79. It seems like it printed that value in ASCII when all the others were in hex.
How can I ignore all these null characters and make everything in the right format (I am fine with normal hex values)
When Python represents a bytes value, it'll use the ASCII representation for anything that has a printable character. Thus the hex 0x79 byte is indeed represented by a y:
>>> b'\x79'
b'y'
Using ASCII characters makes the representation more readable, but doesn't affect the contents. You can use \x.. hex and ASCII notations interchangeably when creating bytes values.
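A short sketch of that equivalence, plus two ways to view the same bytes as hex when the mixed representation is confusing. The sample value is the first four bytes of the message from the question.

```python
data = b"\xf0\x79\x02\x03"

# Both literals describe the same four bytes; only the notation differs.
same = data == b"\xf0y\x02\x03"

# Hex views of the same content.
as_hex = data.hex()
as_list = [f"{value:02X}" for value in data]

print(same, as_hex, as_list)
```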
The data appears to encode a UTF-16 message, little endian:
>>> data = b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7'
>>> data[4:-1].decode('utf-16-le')
'StandardFirmata.ino'
UTF-16 uses 2 bytes for each of these characters, and for ASCII (and Latin-1) codepoints that means that every second byte is a null.
You can use simple comparisons to test for message types:
if data[:2] == b'\xf0\x79':
    assert data[-1] == 0xf7, "Message did not end with F7 closing byte"
    version = tuple(data[2:4])
    message = data[4:-1].decode('utf-16-le')
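Wrapped into a function, the checks above might look like the sketch below. The function name is made up, and the message layout (F0 79, two version bytes, a UTF-16-LE name, F7) is only what can be inferred from the sample in the question.

```python
def parse_firmware_report(data):
    """Split a message of the form F0 79 <major> <minor> <name...> F7."""
    if len(data) < 5 or data[:2] != b"\xf0\x79":
        raise ValueError("not a firmware report message")
    if data[-1] != 0xF7:
        raise ValueError("message did not end with F7 closing byte")
    version = tuple(data[2:4])
    name = data[4:-1].decode("utf-16-le")
    return version, name

sample = (b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r'
          b'\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7')
version, name = parse_firmware_report(sample)
```

Raising ValueError instead of using assert keeps the check active even when Python runs with assertions disabled.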
I have an input string and I'm encrypting it with MySQL's AES_ENCRYPT and then decrypting it with python (not mysql AES_DECRYPT). I printed out some tests of the decrypted string:
print decrypt_string
print "%sxxx" % decrypt_string
print len(decrypt_string)
print self.toHex(decrypt_string)
When the input string length is 8, i.e. abcdefgh, the tests output will be:
abcdefgh
xxxdefgh
16
0x610x620x630x640x650x660x670x680x80x80x80x80x80x80x80x8
If input string length is 7, i.e. abcdefg:
abcdefg
abcdefg xxx
16
0x610x620x630x640x650x660x670x90x90x90x90x90x90x90x90x9
I found that the value of the trailing characters decreases as the length of the input string increases. Why is there a difference? If I use PHP's AES encryption instead of MySQL's AES_ENCRYPT, the trailing characters will be 0x00. I am using a third-party Python AES library.
The reason for your observation is that AES is a block cipher, which can only encrypt data in blocks of 128 bits (= 16 bytes). To do so, it is normally used with a mode of operation (to allow encrypting larger pieces of data), and a padding mode. It looks like your python decrypting function does the decrypting, but doesn't undo the padding, giving you this result.
You are using PKCS#5 padding, which appends a number of bytes (at least one), each with the same value as that number, so that the final length is a multiple of the block length.
For your 8-byte string, one needs to append 8 bytes, each of value 8. ASCII 8 is the backspace character, which in your terminal moves the cursor one position to the left (8 times), resulting in the xxx overprinting the abc.
For your 7-byte string, one needs to append 9 bytes, each of value 9. This is the horizontal tab, which makes your xxx appear far to the right.
Either find out how to supply the right padding mode to your decryption function (it should have such an option), or remove the padding yourself: check the last byte of decrypt_string, convert it to a number (using its ASCII value), and remove that many bytes from the end of decrypt_string. You should also check that the removed bytes all have the same value.
(You should do this first, before interpreting the output as a string, i.e. applying an encoding like UTF-8 or ASCII.)
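A minimal sketch of removing the padding by hand, written for Python 3 bytes (the question's code is Python 2, where indexing a str would need ord()). The function name and the default block size of 16 are assumptions for illustration.

```python
def strip_padding(padded, block_size=16):
    """Remove PKCS#5/PKCS#7-style padding from decrypted bytes."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("length is not a positive multiple of the block size")
    count = padded[-1]   # the last byte says how many bytes were appended
    if count < 1 or count > block_size:
        raise ValueError("invalid padding length")
    if padded[-count:] != bytes([count]) * count:
        raise ValueError("padding bytes do not all have the same value")
    return padded[:-count]

# The two cases from the question: 8 bytes of value 8, and 9 bytes of value 9.
eight = strip_padding(b"abcdefgh" + b"\x08" * 8)
seven = strip_padding(b"abcdefg" + b"\x09" * 9)
```

Only after this step should the result be decoded into text.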