PHP7.4: OpenSSL AES-CFB encryption different to Python

I'm trying to use PHP 7.4 to replicate a piece of Python code that uses PyCryptodome to do AES-128-CFB encryption.
For this I'm using the openssl_encrypt built-in function of PHP.
I tried several configuration parameters and CFB modes, but the results differ from the Python output every time.
I found out that PyCryptodome's CFB implementation seems to use an 8-bit segment size, which should correspond to the aes-128-cfb8 mode in PHP's OpenSSL implementation.
The IV is intentionally fixed to 0, so please ignore the fact that this is insecure.
Here is the code I want to replicate, followed by the PHP code trying to replicate the results with different approaches.
Something tells me it has to do with PHP's 'byte handling', because Python distinguishes between a byte string (returned by .encode('utf-8')) and a string.
At the end you can see the outputs of both codes:
Python code:
import hashlib
from Crypto.Cipher import AES
key = 'testKey'
IV = '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
ENC_KEY = hashlib.md5(key.encode('utf-8')).hexdigest()
print('key: "' + key + '"')
print('hashedKey: ' + ENC_KEY)
obj = AES.new(ENC_KEY.encode("utf8"), AES.MODE_CFB, IV.encode("utf8"))
test_data = 'test'
print('encrypting "' + test_data + '"')
encData = obj.encrypt(test_data.encode("utf8"))
print('encData: ' + encData.hex())
PHP code:
function encTest($testStr, $ENC_KEY)
{
$iv = hex2bin('00000000000000000000000000000000');
echo "aes-128-cfb8-1: ".bin2hex(openssl_encrypt($testStr, 'aes-128-cfb8', $ENC_KEY, OPENSSL_RAW_DATA, $iv))."\n";
echo "aes-128-cfb1-1: ".bin2hex(openssl_encrypt($testStr, 'aes-128-cfb1', $ENC_KEY, OPENSSL_RAW_DATA, $iv))."\n";
echo "aes-128-cfb-1: ".bin2hex(openssl_encrypt($testStr, 'aes-128-cfb', $ENC_KEY, OPENSSL_RAW_DATA, $iv))."\n";
echo "\n";
echo "aes-128-cfb8-2: ".bin2hex(openssl_encrypt($testStr, 'aes-128-cfb8', $ENC_KEY, OPENSSL_RAW_DATA|OPENSSL_ZERO_PADDING, $iv))."\n";
echo "aes-128-cfb1-2: ".bin2hex(openssl_encrypt($testStr, 'aes-128-cfb1', $ENC_KEY, OPENSSL_RAW_DATA|OPENSSL_ZERO_PADDING, $iv))."\n";
echo "aes-128-cfb-2: ".bin2hex(openssl_encrypt($testStr, 'aes-128-cfb', $ENC_KEY, OPENSSL_RAW_DATA|OPENSSL_ZERO_PADDING, $iv))."\n";
echo "\n";
echo "aes-128-cfb8-3: ".bin2hex(openssl_encrypt(utf8_encode($testStr), 'aes-128-cfb8', utf8_encode($ENC_KEY), OPENSSL_RAW_DATA|OPENSSL_ZERO_PADDING, $iv))."\n";
echo "aes-128-cfb1-3: ".bin2hex(openssl_encrypt(utf8_encode($testStr), 'aes-128-cfb1', utf8_encode($ENC_KEY), OPENSSL_RAW_DATA|OPENSSL_ZERO_PADDING, $iv))."\n";
echo "aes-128-cfb-3: ".bin2hex(openssl_encrypt(utf8_encode($testStr), 'aes-128-cfb', utf8_encode($ENC_KEY), OPENSSL_RAW_DATA|OPENSSL_ZERO_PADDING, $iv))."\n";
echo "\n";
echo "aes-128-cfb8-4: ".bin2hex(openssl_encrypt(utf8_encode($testStr), 'aes-128-cfb8', utf8_encode($ENC_KEY), OPENSSL_RAW_DATA, $iv))."\n";
echo "aes-128-cfb1-4: ".bin2hex(openssl_encrypt(utf8_encode($testStr), 'aes-128-cfb1', utf8_encode($ENC_KEY), OPENSSL_RAW_DATA, $iv))."\n";
echo "aes-128-cfb-4: ".bin2hex(openssl_encrypt(utf8_encode($testStr), 'aes-128-cfb', utf8_encode($ENC_KEY), OPENSSL_RAW_DATA, $iv))."\n";
echo "\n";
}
$key = "testKey";
$ENC_KEY = hash('md5', utf8_encode($key));
echo "ENC_KEY: ".$ENC_KEY."\n";
$test = "test";
echo "encrypting \"".$test."\"\n";
encTest($test, $ENC_KEY);
Python output (encData should be replicated):
key: "testKey"
hashedKey: 24afda34e3f74e54b61a8e4cbe921650
encrypting "test"
encData: 117c1974
PHP output:
key: "testKey"
hashedKey: 24afda34e3f74e54b61a8e4cbe921650
encrypting "test"
aes-128-cfb8-1: b0016a55
aes-128-cfb1-1: bac44c56
aes-128-cfb-1: b0f1c27a
aes-128-cfb8-2: b0016a55
aes-128-cfb1-2: bac44c56
aes-128-cfb-2: b0f1c27a
aes-128-cfb8-3: b0016a55
aes-128-cfb1-3: bac44c56
aes-128-cfb-3: b0f1c27a
aes-128-cfb8-4: b0016a55
aes-128-cfb1-4: bac44c56
aes-128-cfb-4: b0f1c27a

In the PHP code (more precisely for openssl_encrypt), the AES variant is specified explicitly, e.g. as in the current case with aes-128-..., i.e. PHP uses AES-128. A key that is too long is truncated, a key that is too short is padded with 0 values. Since the hash method in the PHP code returns its result as hex string, the 16 bytes MD5 hash is represented by 32 characters (32 bytes), i.e. in the current case PHP uses the first 16 bytes of the key (AES-128).
The hexdigest method in the Python code also returns the result as hex string. However, in the Python code (more precisely for PyCryptodome), the AES variant is specified by the keysize, i.e. the Python code uses the full 32 bytes key and thus AES-256.
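The key-length point can be checked without any crypto library. The following is a minimal sketch using only hashlib; the variable names are illustrative and do not come from the original code:

```python
import hashlib

key = "testKey"

# hexdigest() yields 32 hex characters, so encoding it gives a 32-byte key.
hex_key = hashlib.md5(key.encode("utf-8")).hexdigest().encode("utf-8")
# digest() yields the 16 raw bytes of the MD5 hash.
raw_key = hashlib.md5(key.encode("utf-8")).digest()

print(len(hex_key))       # 32 bytes: PyCryptodome would select AES-256
print(len(hex_key[:16]))  # 16 bytes: AES-128, what PHP uses after truncation
print(len(raw_key))       # 16 bytes: the binary digest also selects AES-128
```

Note that hex_key[:16] and raw_key are different keys, so both sides have to agree on which of the two representations is used.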
The different keys and AES variants are the main reason for the different results. To fix this issue, the same keys and AES variants must be used in both codes:
Option 1 is to use AES-128 in the Python code as well. This can be achieved by the following change:
obj = AES.new(ENC_KEY[:16].encode("utf8"), AES.MODE_CFB, IV.encode("utf8"))
Then the output b0016a55 is in accordance with the result of the PHP code for aes-128-cfb8.
Option 2 is to also use AES-256 in the PHP code. This can be done by replacing aes-128... with aes-256... Then the output is
aes-256-cfb8-1: 117c1974
aes-256-cfb1-1: 54096db1
aes-256-cfb-1 : 11bfdaa9
and, as expected, the output 117c1974 for aes-256-cfb8 matches the original value of the Python code.
The CFB mode turns a block cipher into a stream cipher. In each encryption step n bits are encrypted, which is denoted CFBn.
The term CFBn (or cfbn) is also used in PHP, i.e. CFB1 means encryption of one bit, CFB8 of 8 bit (= one byte) and CFB of a whole block (16 bytes). In Python, the number of bits per step is specified with segment_size.
I.e. the counterpart of ...-cfb8 in PHP is segment_size = 8 in Python and the counterpart of ...-cfb in PHP is segment_size = 128 in Python.
In the following it is assumed that an identical key and an identical AES variant are used in both codes.
Since segment_size = 8 is the default, the result from the Python code is the same as for ...-cfb8 from the PHP code. If segment_size = 128 is chosen in the Python code, the result is the same as for ...-cfb in the PHP code. However, in PyCryptodome the segment_size must be an integer multiple of 8, otherwise the error message "'segment_size' must be positive and multiple of 8 bits" is displayed. For this reason the CFB1 mode is not supported by PyCryptodome.
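The effect of the segment size can be illustrated with a small sketch of the CFB construction itself. It assumes a toy block function, toy_block_encrypt, built from SHA-256 as a stand-in for AES (it is not AES and not secure), so the bytes will not match the outputs above; all names here are hypothetical. What it does show is why CFB8 and full-block CFB agree on the first ciphertext byte only, just like b0016a55 and b0f1c27a above:

```python
import hashlib

BLOCK = 16  # block size in bytes, the same as for AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the AES block encryption (NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def cfb_encrypt(key: bytes, iv: bytes, data: bytes, segment_bytes: int) -> bytes:
    """CFB encryption processing segment_bytes bytes per step."""
    shift_register = iv
    out = bytearray()
    for i in range(0, len(data), segment_bytes):
        segment = data[i:i + segment_bytes]
        # Encrypt the register and use the leading bytes as keystream.
        keystream = toy_block_encrypt(key, shift_register)
        cipher_segment = bytes(p ^ k for p, k in zip(segment, keystream))
        out += cipher_segment
        # Shift the register left by one segment and feed the ciphertext back.
        shift_register = (shift_register + cipher_segment)[-BLOCK:]
    return bytes(out)


key = b"0123456789abcdef"
iv = bytes(BLOCK)

cfb8 = cfb_encrypt(key, iv, b"test", segment_bytes=1)        # like ...-cfb8
cfb128 = cfb_encrypt(key, iv, b"test", segment_bytes=BLOCK)  # like ...-cfb

print(cfb8.hex(), cfb128.hex())
print(cfb8[0] == cfb128[0])  # True: both start from the same encrypted IV
```

After the first byte, CFB8 re-encrypts the shifted register for every byte, while full-block CFB keeps consuming the same keystream block, so the remaining bytes diverge.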
Also note:
The result of the digest can also be returned binary in both codes and not as hex string. To do this, the third parameter of the PHP method hash must be set to TRUE (default: FALSE). In Python, simply use the digest method instead of hexdigest.
In the PHP code, for a stream cipher mode like CFB, padding is automatically disabled, so the OPENSSL_ZERO_PADDING flag (which can be used to explicitly disable padding) makes no difference.
utf8_encode allows you to convert from ISO-8859-1 encoding to UTF-8, but since the $ENC_KEY consists of alphanumeric characters (hex encoding) this has no effect. In general, however, arbitrary binary data (such as the result of a digest) must not be UTF8 encoded, as this would corrupt the data. There are other encodings for this purpose, such as Base64. If the results of the digest are returned in binary form (see 1st point), no UTF8 encoding may be performed.
There is a bug in the legacy PyCrypto library in the context of CFB mode that requires the plaintext to have a length that is an integer multiple of the segment size. Otherwise the following error occurs: Input strings must be a multiple of the segment size 16 in length.
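The warning about UTF-8 encoding binary data can be demonstrated in a few lines. This sketch mimics what utf8_encode does (interpret the bytes as ISO-8859-1 and re-encode as UTF-8) on three arbitrary example bytes chosen for illustration:

```python
raw = bytes([0x24, 0xAF, 0xDA])  # arbitrary example bytes, two of them >= 0x80

# Equivalent of PHP's utf8_encode: read the bytes as ISO-8859-1, write UTF-8.
utf8_encoded = raw.decode("latin-1").encode("utf-8")

print(raw.hex())           # 24afda
print(utf8_encoded.hex())  # 24c2afc39a
```

Every byte with a value of 0x80 or above becomes two bytes, so a binary key or digest would no longer be the same data.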

Related

Identifying and removing lines with non utf-8 characters in files

I have a Python program that parses text files line by line. A very small number of these lines are corrupt, meaning that they contain bytes that are not valid UTF-8. Once a line has a corrupt character, the whole content of the line is useless, so solutions that delete or replace single characters won't do. My first priority is to delete every line that contains invalid UTF-8; saving those lines to another file for further inspection would also be of interest, if possible. All previous solutions I have found only delete or replace the invalid characters.
My main language is python, however I am working in Linux so bash etc is a viable solution.

My main language is python, however I am working in Linux so bash etc is a viable solution.
I don't know Python well enough to use it for an answer, so here's a Perl version. The logic should be pretty similar:
#!/usr/bin/env perl
use warnings;
use strict;
use Encode;
# One argument: filename to log corrupt lines to. Reads from standard
# input, prints valid lines on standard output; redirect to another
# file if desired.
# Treat input and outputs as binary streams, except STDOUT is marked
# as UTF8 encoded.
open my $errors, ">:raw", $ARGV[0] or die "Unable to open $ARGV[0]: $!\n";
binmode STDIN, ":raw";
binmode STDOUT, ":raw:utf8";
# For each line read from standard input, print it to standard
# output if valid UTF-8, otherwise log it.
while (my $line = <STDIN>) {
    eval {
        # Default decode behavior is to replace invalid sequences with U+FFFD.
        # Raise an error instead.
        print decode("UTF-8", $line, Encode::FB_CROAK);
    } or print $errors $line;
}
close $errors;
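Since the question asks about Python, the same logic can be sketched there as well. This is a minimal version under the assumption that the file is read in binary mode so that each line arrives as bytes; the function name split_utf8_lines and the file names in the usage note are hypothetical:

```python
def split_utf8_lines(raw_lines):
    """Split an iterable of byte lines into (valid, corrupt) lists."""
    valid, corrupt = [], []
    for line in raw_lines:
        try:
            # Strict decoding raises instead of replacing bad sequences.
            line.decode("utf-8")
        except UnicodeDecodeError:
            corrupt.append(line)
        else:
            valid.append(line)
    return valid, corrupt


sample = [b"ok\n", b"caf\xc3\xa9\n", b"bad\xff\n"]
valid, corrupt = split_utf8_lines(sample)
print(valid)    # [b'ok\n', b'caf\xc3\xa9\n']
print(corrupt)  # [b'bad\xff\n']
```

For real files, something like `with open("input.txt", "rb") as fh: valid, corrupt = split_utf8_lines(fh)` would feed the lines in, and the two lists can then be written to separate files opened with "wb".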

Decrypt cipher text encrypted with PyCrypto using cryptopp

My server encrypts files using pycrypto with AES in CTR mode. My counter is a simple counter like this:
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03
I want to decrypt the cipher text with C++'s Crypto++ library in my clients. How should I do so?
Python code:
encryptor = AES.new(
CRYPTOGRAPHY_KEY,
AES.MODE_CTR,
counter=Counter.new(128),
)
cipher = encryptor.encrypt(plain_text)
C++ code so far:
byte ctr[] = "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01";
mDecryptor = new CryptoPP::CTR_Mode<CryptoPP::AES>::Decryption(key, 32, ctr);
std::string plain;
CryptoPP::StringSource(std::string(data, len), true, new CryptoPP::StreamTransformationFilter(*mDecryptor, new CryptoPP::StringSink(plain)));
but after running this plain is garbage.
Update:
Sample encrypted data you can try to decrypt with crypto++ so that you can help me even if you don't know python and you're just experienced with crypto++:
Try to decrypt this base64 encoded text:
2t0lLuSBY7NkfK5I4kML0qjcZl3xHcEQBPbDo4TbvQaXuUT8W7lNbRCl8hfSGJA00wgUXhAjQApcuTCZckb9e6EVOwsa+eLY78jo2CqYWzhGez9zn0D2LMKNmZQi88WuTFVw9r1GSKIHstoDWvn54zISmr/1JgjC++mv2yRvatcvs8GhcsZVZT8dueaNK6tXLd1fQumhXCjpMyFjOlPWVTBPjlnsC5Uh98V/YiIa898SF4dwfjtDrG/fQZYmWUzJ8k2AslYLKGs=
with this key:
12341234123412341234123412341234
with counter function described in the beginning of this post using crypto++. If you succeed post the decrypted text (which contains only numbers) and your solution please.
Update2:
I'm not providing an IV in the Python code; the Python module ignores the IV. I think the IV is what is causing the problem.

Having read their source code, I can say that PyCrypto and Crypto++ are both perfectly good cryptography libraries for Python and C++. The problem was that I was prefixing the encrypted data with some meta information about the file and had totally forgotten about that. After handling this metadata in the client, Crypto++ decrypted my files.
As I didn't find this documented explicitly anywhere (not even on Wikipedia), I'm writing it here:
Any combination of nonce, IV and counter (concatenation, XOR, or the like) will work for CTR mode, but the convention that most libraries implement is to concatenate these values in order. So the value that is fed to the block cipher is usually: Nonce + IV + Counter. The counter usually starts from 1 (not 0).
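The counter layout described above can be sketched in plain Python. The helper counter_blocks and its parameters are hypothetical names for illustration; it assumes a big-endian counter that fills whatever part of the 16-byte block is not taken up by a fixed prefix (nonce and/or IV):

```python
def counter_blocks(count, prefix=b"", initial_value=1, block_size=16):
    """Return the first `count` counter blocks: prefix + big-endian counter."""
    width = block_size - len(prefix)
    return [
        prefix + (initial_value + i).to_bytes(width, "big")
        for i in range(count)
    ]


# With no prefix this reproduces the 128-bit counter from the question.
for block in counter_blocks(3):
    print(block.hex())
# 00000000000000000000000000000001
# 00000000000000000000000000000002
# 00000000000000000000000000000003
```

Both sides must produce the same sequence of blocks; a differing start value or prefix is enough to turn the decrypted output into garbage.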

Calculate and add hash to a file in python

A way of checking whether a file has been modified is to calculate and store a hash (or checksum) for the file. Then at any point the hash can be recalculated and compared against the stored value.
I'm wondering if there is a way to store the hash of a file in the file itself? I'm thinking text files.
The algorithm to calculate the hash should be iterative and consider that the hash will be added to the file the hash is being calculated for... makes sense? Anything available?
Thanks!
edit:
https://security.stackexchange.com/questions/3851/can-a-file-contain-its-md5sum-inside-it

from Crypto.Hash import HMAC
secret_key = b"Don't tell anyone"
h = HMAC.new(secret_key)
text = b"whatever you want in the file"
## or: text = open("your_file_without_hash_yet", "rb").read()
h.update(text)
# open for writing in binary mode, since the HMAC works on bytes
with open("file_with_hash", "wb") as fh:
    fh.write(text)
    fh.write(h.hexdigest().encode("ascii"))
Now, as some people tried to point out, though they seemed confused - you need to remember that this file has the hash on the end of it and that the hash is itself not part of what gets hashed. So when you want to check the file, you would do something along the lines of:
end_len = len(h.hexdigest())
all_text = open("file_with_hash", "rb").read()
# the digest sits at the END of the file, so split from the right
text, expected_hmac = all_text[:-end_len], all_text[-end_len:]
h = HMAC.new(secret_key)
h.update(text)
if h.hexdigest().encode("ascii") != expected_hmac:
    raise ValueError("Somebody messed with your file!")
It should be clear though that this alone doesn't ensure your file hasn't been changed; the typical use case is to encrypt your file, but take the hash of the plaintext. That way, if someone changes the hash (at the end of the file) or tries changing any of the characters in the message (the encrypted portion), things will mismatch and you will know something was changed.
A malicious actor won't be able to change the file AND fix the hash to match, because they would need to change some data and then rehash everything with your secret key. So long as no one knows your secret key, they won't know how to recreate the correct hash.
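The same append-and-verify idea can also be sketched with the standard library's hmac module instead of PyCrypto. The key, the function names sign and verify, and the choice of SHA-256 are assumptions for illustration, not part of the answer above:

```python
import hashlib
import hmac

SECRET_KEY = b"Don't tell anyone"  # hypothetical key, keep it out of the file
TAG_LEN = hashlib.sha256().digest_size * 2  # length of the hex digest


def sign(text: bytes) -> bytes:
    """Return text with its hex HMAC-SHA256 tag appended."""
    tag = hmac.new(SECRET_KEY, text, hashlib.sha256).hexdigest()
    return text + tag.encode("ascii")


def verify(signed: bytes) -> bytes:
    """Return the original text, or raise ValueError if the tag mismatches."""
    text, tag = signed[:-TAG_LEN], signed[-TAG_LEN:]
    expected = hmac.new(SECRET_KEY, text, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected.encode("ascii")):
        raise ValueError("Somebody messed with your file!")
    return text


signed = sign(b"whatever you want in the file")
print(verify(signed))  # b'whatever you want in the file'
```

Writing `signed` to disk in binary mode and passing the file's bytes to verify later gives the check described above.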

This is an interesting question. You can do it if you adopt a proper convention for hashing and verifying the integrity of the files. Suppose you have this file, namely, main.py:
#!/usr/bin/env python
# encoding: utf-8
print "hello world"
Now, you could append an SHA-1 hash to the python file as a comment:
printf '#%s\n' "$(cat main.py | sha1sum)" >> main.py
Updated main.py:
#!/usr/bin/env python
# encoding: utf-8
print "hello world"
#30e3b19d4815ff5b5eca3a754d438dceab9e8814 -
Hence, to verify if the file was modified you can do this in Bash:
if [ "$(printf '#';head -n-1 main.py | sha1sum)" == "$(tail -n1 main.py)" ]
then
echo "Unmodified"
else
echo "Modified"
fi
Of course, someone could try to fool you by changing the hash string manually. In order to stop these bad guys, you can improve the system by tempering the file with a secret string before adding the hash to the last line.
Improved version
Add the hash in the last line including your secret string:
printf '#%s\n' "$( (cat main.py; echo 'MyUltraSecretTemperString12345') | sha1sum)" >> main.py
For checking if the file was modified:
if [ "$(printf '#';(head -n-1 main.py; echo 'MyUltraSecretTemperString12345') | sha1sum)" == "$(tail -n1 main.py)" ]
then
echo "Unmodified"
else
echo "Modified"
fi
Using this improved version, the bad guys only can fool you if they find your ultra secret key first.
EDIT: This is a rough approximation of the idea behind the keyed-hash message authentication code (HMAC).

Well, although it looks like a strange idea, it could be an application of a little-used but very powerful feature of the Windows NTFS file system: file streams.
It allows you to add many streams to a file without changing the content of the default stream. For example:
echo foo > foo.text
echo bar > foo.text:alt
type foo.text
=> foo
more < foo.text:alt
=> bar
But when listing the directory, you can only see one single file: foo.text
So in your use case, you could write the hash of main stream in stream named hash, and later compare the content of the hash stream with the hash of the main stream.
Just a remark: for a reason I do not know, type foo.text:alt generates the following error:
"The filename, directory name, or volume label syntax is incorrect."
that's why my example uses more < as recommended in the Using streams page on MSDN
So assuming you have a myhash function that gives the hash for a file (you can easily build one by using the hashlib module):
def myhash(filename):
    # compute the hash of the file
    ...
    return hash_string
You can do:
def store_hash(filename):
    hash_string = myhash(filename)
    # open the alternate stream for writing
    with open(filename + ":hash", "w") as fd:
        fd.write(hash_string)
def compare_hash(filename):
    hash_string = myhash(filename)
    with open(filename + ":hash") as fd:
        orig = fd.read()
    return (hash_string == orig)
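One possible way to fill in the hash function with hashlib is sketched below. The name file_sha256, the choice of SHA-256, and the chunk size are assumptions; any hashlib algorithm would work the same way:

```python
import hashlib


def file_sha256(filename, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(filename, "rb") as fd:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fd.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use constant for large files, and opening the plain file name hashes only the default stream, so storing the result in a :hash stream does not change the value.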

AES-ECB encryption (Difference between Python Crypto.Cipher and openssl)

I have a problem with encryption using Python and OpenSSL.
I wrote this small Python script:
#!/usr/bin/python
from Crypto.Cipher import AES
obj = AES.new('Thisisakey123456', AES.MODE_ECB)
message = "Sample text....."
ciphertext = obj.encrypt(message)
print ciphertext
When I run the script with this command:
$ ./enc.py | base64
I get E0lNh0wtSg9lxxKClBEITAo= as a result.
If I do the same (or obviously it's not the same ;) ) in OpenSSL, I get another result:
$ echo -n "Sample text....." | openssl aes-128-ecb -k "Thisisakey123456" -nosalt -nopad | base64
yvNTGC+gwUK38uyJXIk/sQ==
What am I doing wrong? I would expect the same base64-encoded string.
By the way: I know ECB is bad, but I'm just playing around, so it's no problem... ;)

You can try this command:
echo -n "Sample text....." | openssl aes-128-ecb -K 546869736973616b6579313233343536 -nopad | openssl base64
This explicitly specifies the key in hexadecimal. With -k, the following "key" is actually a password, which is converted to a key by an OpenSSL password-based key derivation function called EVP_BytesToKey (using one iteration of a message digest; the default digest is MD5 in OpenSSL versions before 1.1.0 and SHA-256 from 1.1.0 on).
The result is E0lNh0wtSg9lxxKClBEITA==. This is not identical to E0lNh0wtSg9lxxKClBEITAo= but that's because Python adds a single newline character \n to the ciphertext, resulting in one extra byte to encode.
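Both claims, the hexadecimal form of the key and the extra newline byte, can be checked with the standard library alone. A short sketch using the two base64 strings quoted above:

```python
import base64

key = b"Thisisakey123456"
print(key.hex())  # 546869736973616b6579313233343536

with_newline = base64.b64decode("E0lNh0wtSg9lxxKClBEITAo=")
without_newline = base64.b64decode("E0lNh0wtSg9lxxKClBEITA==")

print(len(with_newline), len(without_newline))  # 17 16
print(with_newline == without_newline + b"\n")  # True
```

The 17th byte is the newline that print appends to the ciphertext; writing the raw bytes with sys.stdout.write (Python 2) avoids it.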

mail body encoding after procmail processing

I've got the following line in my .procmailrc on SMTP server:
BODY=`formail -I ""`
Later I echo this body to a local file:
echo "$BODY" >> $HOME/$FILENAME; \
I've also tried printf (but I got the same effect):
printf "$BODY" >> $HOME/$FILENAME; \
When I read this file, I can see that the encoding has been changed. Here's what I got:
Administrator System=C3=B3w
while it should be (in Polish):
Administrator Systemów
How to decode/encode the body either directly in .procmailrc or later (bash/python) to get the right string?
Another line in my .procmailrc works properly but it needs additional pipe with perl encoder:
SUBJECT=`formail -xSubject: | tr -d '\n' | sed -e 's/^ //' | /usr/bin/perl -MEncode -ne 'print encode ("utf8",decode ("MIME-Header",$_ )) '`
SUBJECT contains UTF8 characters and everything looks OK. Maybe there's a way to use a similar solution with the body of the mail?
OK.
I finally got everything up and running. Here's what I did:
First the .procmailrc file:
VERBOSE=yes
LOGFILE=$HOME/procmail.log
:0f
* ^From.*(some_address#somedomain.com)
| $HOME/python_script.py
Now to the python_script.py:
#!/usr/bin/python
from email.parser import Parser
import sys
# Parse the message that procmail pipes in on standard input.
message = Parser().parse(sys.stdin)
# get_payload(decode=True) returns bytes, so write in binary mode
temp_file = open("/home/(user)/file.txt","wb")
temp_file.write(b"START\n")
if not message.is_multipart():
    temp_file.write(message.get_payload(decode=True))
else:
    for part in message.get_payload():
        if part.get_content_type() == 'text/plain':
            temp_file.write(part.get_payload(decode=True))
temp_file.close()
The most difficult part to debug was the .procmailrc recipe, where I had to test many options for :0, :0f, :0fbW etc... and finally found the one that suits best.
The next problematic step was the $BODY part decoded directly in .procmailrc. I figured out the solution though, by getting rid of all the stuff and moving everything to Python script. Just as tripleee suggested.

It is not changed, but you are zapping the headers so that the correct Content-Type: header is no longer present (you should also keep Mime-Version: and any other standard Content-* headers).
You should see, by examining the source of the message in your mail client, that Procmail or Bash have actually not changed anything. The text you receive is in fact literally Administrator System=C3=B3w but the MIME headers inform your email client that this is Content-Transfer-Encoding: quoted-printable and Content-type: text/plain; charset="utf-8" and so it knows how to decode and display this correctly.
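The quoted-printable decoding that the mail client performs can be reproduced with Python's standard quopri module. A minimal sketch using the string from the question, assuming the charset is UTF-8 as the headers above state:

```python
import quopri

encoded = b"Administrator System=C3=B3w"

# Undo the quoted-printable transfer encoding, then decode the charset.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)  # Administrator Systemów
```

This only works because the transfer encoding and charset are known, which is exactly the information the MIME headers carry.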
If you want just the payload, you will need to decode it yourself, but in order to do that, you need this information from the MIME headers, so you should not kill them before you have handled the message (if at all). Something like this, perhaps:
from email.parser import Parser
import sys
message = Parser().parse(sys.stdin)
if message['content-type'].lower().startswith('text/'):
    print(message.get_payload(decode=True))
else:
    raise DieScreamingInAnguish('aaaargh!') # pseudo-pseudocode
This is extremely simplistic in that it assumes (like your current, even more broken solution) that the message contains a single, textual part. Extending it to multipart messages is not technically hard, but how exactly you do that depends on what sort of multiparts you expect to receive, and what you want to do with the payload(s).
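As one possible way to extend this to multipart messages, the sketch below collects every text/plain part and decodes it using the charset declared in its headers. The function name plain_text_parts and the fallback to UTF-8 when no charset is declared are assumptions; what to do with other part types depends on the mail you expect:

```python
from email import message_from_string


def plain_text_parts(raw_message):
    """Return the decoded text of every text/plain part in a message."""
    message = message_from_string(raw_message)
    texts = []
    # walk() yields the message itself and, for multiparts, every subpart.
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # undoes QP/base64
            charset = part.get_content_charset() or "utf-8"  # assumed fallback
            texts.append(payload.decode(charset, errors="replace"))
    return texts


raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Administrator System=C3=B3w\n"
)
print(plain_text_parts(raw))
```

In a procmail recipe the raw message would come from sys.stdin.read() rather than from a literal string.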
Like in your previous question I would like to suggest that you move more, or all, of your email manipulation into Python, if you are going to be using it anyway. Procmail has no explicit MIME support so you would have to reinvent all of that in Procmail, which is neither simple nor particularly fruitful.

I think your echo may not be writing correct Unicode to the file in the first place. Here are 2 of many solutions that may help you:
Echo with escape characters enabled:
echo -e "$BODY" >> $HOME/$FILENAME; \
or, use iconv or similar to encode your file to utf-8, assuming you have iconv in linux
iconv -t UTF-8 original.txt > encoded_result.txt
