Python AES decrypt string end with different char - python

I have an input string and I'm encrypting it with MySQL's AES_ENCRYPT and then decrypting it with python (not mysql AES_DECRYPT). I printed out some tests of the decrypted string:
print decrypt_string
print "%sxxx" % decrypt_string
print len(decrypt_string)
print self.toHex(decrypt_string)
When the input string length is 8, i.e. abcdefgh, the tests output will be:
abcdefgh
xxxdefgh
16
0x610x620x630x640x650x660x670x680x80x80x80x80x80x80x80x8
If input string length is 7, i.e. abcdefg:
abcdefg
abcdefg xxx
16
0x610x620x630x640x650x660x670x90x90x90x90x90x90x90x90x9
I found out that the value of the ending char decreases as the length of the input string increases. Why is there a difference? If I use PHP AES encrypt instead of MySQL AES_ENCRYPT, the ending char will be 0x00. I am using a third-party Python AES lib.

The reason for your observation is that AES is a block cipher, which can only encrypt data in blocks of 128 bits (= 16 bytes). To do so, it is normally used with a mode of operation (to allow encrypting larger pieces of data), and a padding mode. It looks like your python decrypting function does the decrypting, but doesn't undo the padding, giving you this result.
You are using PKCS#5 padding, which appends a number of bytes (at least one), all of the same value as this number, so the final length is a multiple of the block length.
For your 8-byte string, one needs to append 8 bytes, each of the value 8. ASCII 8 is the backspace character, which in your terminal moves the cursor one position to the left (8 times), resulting in the xxx overprinting the abc.
For your 7-byte string, one needs to append 9 bytes, each of value 9. This is the horizontal tab character, which makes your xxx appear far to the right.
Either find out how to supply the right padding mode to your decryption function (it should have such an option), or remove the padding yourself: check the last byte of decrypt_string, convert it to a number (using its ASCII value), and remove that many bytes from the end of decrypt_string. You should also check that these bytes all have the same value.
(You should do this first, before interpreting the output as a string, i.e. applying an encoding like UTF-8 or ASCII.)
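The manual padding removal described above could look roughly like the following sketch. It is written for Python 3 and assumes the decryption function returns a bytes object; the helper name strip_pkcs7_padding is hypothetical and not part of any AES library. In Python 2, where indexing a string yields a one-character string, the last byte would have to be converted with ord() first.

```python
def strip_pkcs7_padding(data: bytes, block_size: int = 16) -> bytes:
    """Remove PKCS#5/PKCS#7 padding from decrypted bytes (hypothetical helper)."""
    if not data or len(data) % block_size != 0:
        raise ValueError("decrypted data must be a non-empty multiple of the block size")
    pad_len = data[-1]  # in Python 3, indexing bytes yields an int
    if not 1 <= pad_len <= block_size:
        raise ValueError("invalid padding length")
    # Every padding byte must carry the same value as the padding length.
    if data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("padding bytes are not all equal")
    return data[:-pad_len]


# The two decrypted values from the question:
print(strip_pkcs7_padding(b"abcdefgh" + b"\x08" * 8))  # b'abcdefgh'
print(strip_pkcs7_padding(b"abcdefg" + b"\x09" * 9))   # b'abcdefg'
```

Only after this step should the result be decoded to text, for example with .decode('utf-8').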

Related

The fundamental method to convert a hex to base64 in python3

I want to convert a given hex string into base64 (in Python, without using any libraries). As I learned from other Stack Overflow answers, we can either group 3 hex digits (12 bits, i.e. 4 bits each) to get 2 base64 values (12 bits, i.e. 6 bits each), or group 6 hex digits (24 bits) into 4 base64 values (24 bits).
The standard procedure is to append all the binary bits of hexs together and start grouping from left in packets of 6.
My question is regarding the situation we need padding for:
(Assuming we are converting 3 hex into 2 base64)
There will arise a situation when we are left with only 2 or 1 hex values to convert. Take the example below:
'a1' to base64
10100001 (binary of a1)
101000 01(0000) //making groups of 6 and adding additional 0's where required
This gives "oQ", but one online converter reports oQ== while another reports something different (wqE=).
Q1. Which of the two sources is giving the correct answer? Why is the other one wrong, given that it is a good online decoder?
Q2. How do we determine the number of '=' characters here? (We could have just added sufficient 0's wherever needed, as in the example above, ending with just oQ and not oQ==, assuming oQ== is correct.)
My understanding is: if the hex is of length 2 (rather than 3) we pad with a single = (which matches the answer wqE= in the case above), else if the hex is of length 1 (rather than 3), we pad with two ='s.
At the same time, I am confused that, if 3 hex is converted into 2 base64, we would never need two ='s.
'a' to base64
1010 (binary of a)
Q3. How to convert hex 'a' to base64.
Base64 is defined by RFC 4648 as being "designed to represent arbitrary sequences of octets". An octet is a unit of 8 bits, in practice synonymous with a byte. When your input is in the form of a hex string, your first step should be to decode it into a byte string. You need two hex characters for each byte. If the length of the input is odd, the reasonable course of action is to raise an error.
To address your numbered questions:
Q1: Even if you are going to implement your own encoder, you can make use of the Python standard library to investigate. Decoding the two results back to bytes gives:
>>> import base64
>>> base64.b64decode(b'oQ==')
b'\xa1'
>>> base64.b64decode(b'wqE=')
b'\xc2\xa1'
So, oQ== is correct, while wqE= has a c2 byte added in front. I can guess that it is the result of applying UTF-8 encoding before Base64. To confirm:
>>> '\u00a1'.encode('utf-8')
b'\xc2\xa1'
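Q2: The rules for padding are detailed in the RFC. Input is processed in groups of 3 bytes (24 bits), each producing 4 output characters. If the final group holds only 1 byte, it is encoded as 2 characters followed by ==; if it holds 2 bytes, it is encoded as 3 characters followed by a single =.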
Q3: This is ambiguous and you are right to be confused.
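To make the procedure concrete, here is a minimal sketch of an encoder that avoids the base64 module, assuming the byte-oriented interpretation described above (two hex digits per byte, odd-length input rejected). The name hex_to_base64 is made up for this illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def hex_to_base64(hex_string: str) -> str:
    """Encode a hex string as Base64 without using the base64 module."""
    if len(hex_string) % 2 != 0:
        raise ValueError("hex input must contain an even number of digits")
    # Step 1: hex digits -> byte values (two digits per byte).
    data = [int(hex_string[i:i + 2], 16) for i in range(0, len(hex_string), 2)]
    out = []
    # Step 2: take 3 bytes (24 bits) at a time and emit 4 six-bit symbols.
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1 or 2 bytes short of a full group
        value = 0
        for byte in chunk + [0] * missing:
            value = (value << 8) | byte  # zero bits fill the incomplete group
        symbols = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Step 3: symbols that consist only of fill bits become '='.
        if missing:
            symbols[-missing:] = "=" * missing
        out.extend(symbols)
    return "".join(out)


print(hex_to_base64("a1"))    # oQ==
print(hex_to_base64("c2a1"))  # wqE=
```

With this reading, the single hex digit 'a' from Q3 is rejected rather than guessed at, because it does not describe a whole byte.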

Encryption ValueError: Input strings must be a multiple of 16 in length

I am trying to encrypt a string which will be decrypted later for a password function.
However, when I try to encrypt it, I get an error saying that the input string must be a multiple of 16 in length.
This is my encryption code, which uses the library Jasypt2Python.
def test_basic_encryption(self):
    try:
        self.ciphertext = "encrypt123"
        self.j2p = J2PEngine(self.ciphertext)
        given_ciphertext = self.j2p.encrypt('mypw123.')
    except Exception:
        e_str = traceback.format_exc()
        print(e_str)
Any idea how to solve this or to make my password a multiple of 16 in length?
Most symmetric ciphers such as AES work as so-called block ciphers. They encrypt the data "block by block". AES uses 128-bit blocks, i.e. 16 bytes.
To pad the data means, for example, to expand the string "hello" (5 bytes in ASCII) to 16 bytes. One possibility is to add 11 bytes, each with the value 11. When you decrypt the data, you have to look at the last byte (11) and remove this number of bytes from the end. If your text already is a multiple of 16 bytes long, add a new block consisting only of padding (16 bytes, each of value 16).
Look up "PKCS5 padding" for the spec (for 16-byte blocks the same scheme is usually called PKCS#7 padding).
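As an illustration of the padding rule just described, here is a small sketch. The function name pkcs7_pad is hypothetical, and whether Jasypt2Python expects the caller to pad the input is not something the question establishes, so treat this only as a demonstration of the scheme itself.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Pad data to a multiple of block_size (illustrative helper)."""
    # pad_len is always between 1 and block_size, so input that is already
    # aligned gets one full extra block of padding.
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


padded = pkcs7_pad(b"hello")
print(len(padded), padded[-1])              # 16 11
print(len(pkcs7_pad(b"0123456789abcdef")))  # 32
```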

Represent string as an integer in python

I would like to be able to represent any string as a unique integer (means every integer in the world could mean only one string, and a certain string would result constantly in the same integer).
The obvious point is, that's how the computer works, representing the string 'Hello' (for example) as a number for each character, specifically a byte (assuming ASCII encoding).
But... I would like to perform arithmetic calculations over that number (Encode it as a number using RSA).
The reason this is getting messy is because assuming I have a bit larger string 'I am an average length string' I have more characters (29 in this case), and an integer with 29 bytes could come up HUGE, maybe too much for the computer to handle (when coming up with bigger strings...?).
Basically, my question is, how could I do? I wouldn't like to use any module for RSA, it's a task I would like to implement myself.
Here's how to turn a string into a single number. As you suspected, the number will get very large, but Python can handle integers of arbitrary size. The usual way of working with encryption is to do individual bytes all at once, but I'm assuming this is only for a learning experience. This assumes a byte string; if you have a Unicode string, you can encode it to UTF-8 first.
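my_string = 'Hello'  # example input (ASCII text)
num = 0
for ch in my_string:
    # Parentheses are required: + binds more tightly than <<.
    num = (num << 8) + ord(ch)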
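In Python 3 the same conversion, and its inverse, can be sketched with int.from_bytes and int.to_bytes. The function names below are invented for this example, and UTF-8 is an assumed encoding. One caveat: a string that starts with NUL characters maps to the same integer as the string without them, so the mapping is only unique if such strings are excluded.

```python
def string_to_int(text: str) -> int:
    """Interpret the UTF-8 bytes of text as one big-endian integer."""
    return int.from_bytes(text.encode("utf-8"), "big")


def int_to_string(number: int) -> str:
    """Inverse of string_to_int for strings without leading NUL characters."""
    length = (number.bit_length() + 7) // 8  # bytes needed to hold the number
    return number.to_bytes(length, "big").decode("utf-8")


n = string_to_int("Hello")
print(n)                 # 310939249775
print(int_to_string(n))  # Hello
```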

Issues with Bytes from a Microcontroller in Python

I am using Python to read microcontroller values in a Windows-based program. The encodings / byte decodings and values have begun to confuse me. Here is my situation:
In the software, I am allowed to call a receive function once per byte received by the Python interpreter, once per line (not quite sure what that is) or once per message which I assume is the entire transmission from the micro controller.
I am struggling with the best way to decode these values. The microcontroller is putting out specific values that correlate to a protocol. For example, calling a function that is supposed to return the hex values:
F0, 79, (the phrase standard_firmata.pde) [then] F7
returns:
b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7'
When set to "once per message". This is what I want: I can see that the correct values are being sent, but there are too many \x00 values included (they are after every byte, it seems). Additionally, the second byte is 0y when it is supposed to be 79. It seems like it printed its value in ASCII when all the others were in hex.
How can I ignore all these null characters and make everything in the right format (I am fine with normal hex values)
When Python represents a bytes value, it'll use the ASCII representation for anything that has a printable character. Thus the hex 0x79 byte is indeed represented by a y:
>>> b'\x79'
b'y'
Using ASCII characters makes the representation more readable, but doesn't affect the contents. You can use \x.. hex and ASCII notations interchangeably when creating bytes values.
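If a pure hex view is easier to read, the bytes can be formatted explicitly. A short sketch, using the first four bytes of the message from the question as sample data:

```python
data = b'\xf0y\x02\x03'  # first four bytes of the message in the question

# The two spellings create the same value.
print(b'\x79' == b'y')                      # True

# Hex views of the same bytes.
print(data.hex())                           # f0790203
print(' '.join(f'{b:02X}' for b in data))   # F0 79 02 03
```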
The data appears to encode a UTF-16 message, little endian:
>>> data = b'\xf0y\x02\x03S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00\xf7'
>>> data[4:-1].decode('utf-16-le')
'StandardFirmata.ino'
UTF-16 uses 2 bytes per character here, and for ASCII (and Latin-1) codepoints that means that every second byte is a null.
You can use simple comparisons to test for message types:
if data[:2] == b'\xf0\x79':
    assert data[-1] == 0xf7, "Message did not end with F7 closing byte"
    version = tuple(data[2:4])
    message = data[4:-1].decode('utf-16-le')

python b64decode incorrect padding

I'm sending a file over small UDP packets. (python 3)
On the server I divide the file into small pieces and do
packets.append(b64encode(smallPart))
on the other side I do exactly the opposite
packets.append(b64decode(peice))
However, I keep getting (in all but one packet) an Incorrect Padding exception.
Is there a standard size for b64decode that I'm missing?
Base64 works by encoding every 3 bytes into 4 bytes. When decoding, it takes those 4 bytes and converts them back to 3 bytes. If there were fewer than 3 bytes remaining in the input, the output is still padded with '=' to make 4 bytes. If the input to b64decode is not a multiple of 4 bytes, you will get the exception.
The easiest solution for you will be to make sure your packets are always a multiple of 4 bytes.
Your description of what you are doing sounds OK. Choice of the input piece size affects only the efficiency. Padding bytes are minimised if the length of each input piece (except of course the last) is a multiple of 3.
You need to show us both your server code and your client code. Alternatively: on the server, log the input and the pieces transmitted. On the client, log the pieces received. Compare.
Curiosity: Why don't you just b64encode the whole string, split the encoded result however you like, transmit the pieces, at the client reassemble the pieces using b''.join(pieces) and b64decode that?
Further curiosity: I thought the contents of a UDP packet could be any old binary bunch of bytes; why are you doing base64 encoding at all?
The length of any properly encoded base64 string should be divisible by 4.
Base64 encodes 3 bytes as 4, so if you start out with a length of string that's not a multiple of 3, the algorithm adds one or two = characters on the end of the encoded form, one for each byte shy of some multiple of 3 (see http://en.wikipedia.org/wiki/Base64#Padding).
The way the alignment comes out, the number of = characters also equals the number of characters shy of a multiple of 4 in the encoded form.
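The padding rule can be checked quickly with the standard library. In this sketch, expected_padding is a helper written for the demonstration, not a library function.

```python
import base64


def expected_padding(n_bytes: int) -> int:
    """Number of '=' characters for an input of n_bytes bytes."""
    return (3 - n_bytes % 3) % 3


for n in range(1, 7):
    encoded = base64.b64encode(b"x" * n)
    # Columns: input length, encoded form, actual '=' count, predicted count.
    print(n, encoded, encoded.count(b"="), expected_padding(n))
```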
I had been trying to decode a URL-safe base64 encoded string. Simply replacing "." with "=" did the trick for me.
s = s.replace('.', '=')
# then base64decode
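A slightly more general sketch of the same idea, assuming the sender either used "." as the padding character or dropped the padding altogether. The function name is hypothetical.

```python
import base64


def decode_urlsafe_loose(s: str) -> bytes:
    """Decode URL-safe Base64 whose padding is '.', '=' or missing."""
    s = s.replace(".", "=").rstrip("=")  # normalise, then drop any padding
    s += "=" * (-len(s) % 4)             # restore padding to a multiple of 4
    return base64.urlsafe_b64decode(s)


print(decode_urlsafe_loose("oQ.."))  # b'\xa1'
print(decode_urlsafe_loose("oQ"))    # b'\xa1'
```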
