PyCrypto: Encode chinese charaters with RSA asymmetric key - python

I am trying to use PyCrypto to encrypt/decrypt some strings, and I am having troubles with Chinese characters.
When I try to encrypt "ni-hao" (hello)...
pemFile = open("/home/borrajax/keys/myKey.pem", "r")
encryptor = RSA.importKey(pemFile, passphrase="f00")
return encryptor.encrypt("你好", 0)[0]
... I keep getting errors:
Module Crypto.PublicKey.pubkey:64 in encrypt
>> ciphertext=self._encrypt(plaintext, K)
Module Crypto.PublicKey.RSA:92 in _encrypt
>> return (self.key._encrypt(c),)
ValueError: Plaintext too large
I have tried many combinations,
encryptor.encrypt(u"你好"...
encryptor.encrypt(u"你好".encode("utf-8")...
without any luck.
I guess I could always try to use base64 before encoding, but I'd like to leave that as a "last resource"... I was hoping for a more "elegant" way of doing this.
Has anyone encountered the same problems? Any hint will be appreciated. Thank you in advance.

First, you should be encrypting only binary data, not Unicode text. That means str type (in Python 2.x) or bytes (in Python 3.x and most recent Python 2.x). You must encode text before encrypting it, and you must decode it after you decrypt.
Second, the byte string you are encrypting must be smaller than the RSA modulus (e.g. less than 256 bytes for RSA2048). If your data is longer, think of using an intermediate AES session key.
Third, if you use PyCrypto 2.5, there is no good reason for using the .encrypt/.decrypt methods of the RSA key object. It is better and more secure to use one of the PKCS#1 methods: OAEP or v1.5. With them, the plaintext must be even shorter.

I tested with PyCrypto v2.5 installed from pip on Ubuntu Linux 10.04 on python 2.6.5 in the interactive interpreter from the yakuake terminal.
I'm not able to reproduce the error you're seeing, especially the "Plaintext too large" bit. Some errors I have seen:
encryptor.encrypt(u"你好",0)[0]
TypeError: argument 1 must be long, not unicode
Seems like it doesn't like unicode objects - only wants str.
Both of these work on my setup, and both produce the same output, however the first solution is more correct:
encryptor.encrypt(u"你好".encode("utf-8"), 0)[0]
encryptor.encrypt("你好", 0)[0]
Are you trying this from the interactive interpreter, or from a file? If file, is the file UTF-8 encoded? If console, does it have proper UTF-8 support?

I checked related codes of PyCrypto, this error is only thrown when plain-text (after converted to long) is bigger than one of the key parameter. Assuming encoding of the script is correctly set, it may be because your RSA key is invalid or too short. I tried this snippet, it works with no problem:
rsa = RSA.generate(1024)
print(rsa.encrypt("你好", 0))

Related

Validate base64 in python

I would like to check wether a string is base64 encoded in python. As the built in module is very forgiving, I tried the following
s = b'111='
b64encode(b64decode(s)) == s
To my surprise it returned False. Indeed, b64encode(b64decode(s)) returns b'110='. I expected it to return True as I'm under the impression '111=' is a valid base64 string.
My python version is 3.6.4
$ python --version
Python 3.6.4
Why is this? Can someone explain this?
Since "0" and "1" both only differ in the last 4 bits (and the encoding has 4 padding bits), both are valid encodings of the plaintext. b64encode(), however, will only encode using "0" there.
In order to be robust you will need to calculate how many padding bits and bytes the encoded text should have and then only compare the significant bytes and bits.

Encryption not working properly

i have this piece of code for encryption.
from cryptography.fernet import Fernet
key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"something cool")
k = f.decrypt(token)
print(k) `
This is the output
b'something cool'
According to the example on the website, that "b" should've gone. I'm very new at this and would like to know or understand how exactly the solution works.
Thanks
That ‘b’ means bytes. So instead of working with strings encryption algorythms are actually using bytes. My experience is that what you give a library (str/bytes/array) it should give you back, which Fernet is doing. I would simply convert the bytes back to a string k.decode(“utf-8”)
The encryption functions are doing what they should: bytes in and bytes out.
Cryptography and encryption work with bytes, not strings or other encoding, decrypt returns bytes. The actual low level decrypt has no idea of encodings, it can't the decryption could be a string, it could be an image, etc.
It is up to the caller to provide encodings in and out that are appropriate to the data being encrypted/decrypted.
As the caller wrap the encryption in a function you write that provides the correct encodings, in this case a string to bytes on encryption and bytes back to a string on decryption.

Python Encoding that ignores leading 0s

I'm writing code in python 3.5 that uses hashlib to spit out MD5 encryption for each packet once it is is given a pcap file and the password. I am traversing through the pcap file using pyshark. Currently, the values it is spitting out are not the same as the MD5 encryptions on the packets in the pcap file.
One of the reasons I have attributed this to is that in the hex representation of the packet, the values are represented with leading 0s. Eg: Protocol number is shown as b'06'. But the value I am updating the hashlib variable with is b'6'. And these two values are not the same for same reason:
>> b'06'==b'6'
False
The way I am encoding integers is:
(hex(int(value))[2:]).encode()
I am doing this encoding because otherwise it would result in this error: "TypeError: Unicode-objects must be encoded before hashing"
I was wondering if I could get some help finding a python encoding library that ignores leading 0s or if there was any way to get the inbuilt hex method to ignore the leading 0s.
Thanks!
Hashing b'06' and b'6' gives different results because, in this context, '06' and '6' are different.
The b string prefix in Python tells the Python interpreter to convert each character in the string into a byte. Thus, b'06' will be converted into the two bytes 0x30 0x36, whereas b'6' will be converted into the single byte 0x36. Just as hashing b'a' and b' a' (note the space) produces different results, hashing b'06' and b'6' will similarly produce different results.
If you don't understand why this happens, I recommend looking up how bytes work, both within Python and more generally - Python's handling of bytes has always been a bit counterintuitive, so don't worry if it seems confusing! It's also important to note that the way Python represents bytes has changed between Python 2 and Python 3, so be sure to check which version of Python any information you find is talking about. You can comment here, too,

Base64 decode does not work every time in Python

I have to decode a string. This one comes from Flash AS3 and I want to decode it in Python. I don't have any problems with PHP, but I cannot decode the following string with Python 2.6 'base64.b64decode'.
f3hvQgQaBFp9IC4NQhYZQiAhNhxBAkwIJC0pDR8fBl12ZjkWXwMEWn57bU0dGgBfcWdsTwAbGB4xLmVLAh0FXXd5a0gGHQRWdy5iQANNVAl/KmNLAhUBXyV8PkFQHwNefntjGgpPU18nK21OURtSC35wPE4FHFUJdi4/TlMUVFwlez9JVxtVDH0TB0IGHAc%Pr
Python returns "TypeError: Incorrect Padding". It seems to have superfious characters at the end of the string (from the '%'). But why Python base64 library do not manage this?
Thank you for your answer.
It seems to me you are not feeding a valid string to the function - it tells you so. You can't expect a function to "guess" what you wanted, and base its response on that. You have to use a valid parameter, or the function doesn't work.

How to fix unicode issue when using a web service with Python Suds

I am trying to work with the HORRIBLE web services at Commission Junction (CJ). I can get the client to connect and receive information from CJ, but their database seems to include a bunch of bad characters that cause a UnicideDecodeError.
Right now I am doing:
from suds.client import Client
wsdlLink = 'https://link-search.api.cj.com/wsdl/version2/linkSearchServiceV2.wsdl'
client = Client(wsdlLink)
result = client.service.searchLinks(developerKey='XXX', websiteId='XXX', promotionType='coupon')
This works fine until I hit a record that has something like 'CorpNet® 10% Off Any Service' then the ® causes it to break and I get
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 758: ordinal not in range(128)" error.
Is there a way to encode the ® on my end so that it does not break when SUDS reads in the result?
UPDATE:
To clarify, the ® is coming from the CJ database and is in their response. SO somehow I need to decode the non-ascii characters BEFORE SUDS deals with the response. I am not sure how (or if) this is done in SUDs.
Implicit UnicodeDecodeErrors is something you get when trying to add str and unicode objects. Python will then try to decode the str into unicode, but using the ASCII encoding. If your str then contains anything that is not ascii, you will get this error.
Your solution is the decode it manually like so:
thestring = thestring.decode('utf8')
Try, as much as possible, to decode any string that may contain non-ascii characters as soo as you are handed it from whatever module you get it from, in this case suds.
Then, if suds can't handle Unicode (which may be the case) make sure you encode it back just before handing the text back to suds (or any other library that breaks if you give it unicode).
That should solve things nicely. It may be a big change, as you need to move all your internal processing from str to unicode, but it's worth it. :)
The "registered" character is U+00AE and is encoded as "\xc2\xae" in UTF-8. It looks like you have a str object encoded in UTF-8 but some code is doing (probably by default) your_str_object.decode("ascii") which will fail with the error message you showed.
What you need to do is show us a complete example (i.e. ALL the code necessary to get the error), plus the full error message and traceback, so that at least we can guess whether the problem is in your code or in imported code.
I am using SUDS to interface with Salesforce via their SOAP API. I ran into the same situation until I followed #J.F.Sabastian's advice by not mixing str and unicode string types. For example, passing a SOQL string like this does work with SUDS 0.3.9:
qstr = u"select Id, FirstName, LastName from Contact where FirstName='%s' and LastName='%s'" % (u'Jorge', u'López')
I did not seem to need to do str.decode("utf-8") either.
If you're running your script from PyDev on Eclipse, you might want to go into Project => Properties and under Resource, set "Text File Encoding" to UTF-8, on my Mac, this defaults to "MacRoman". I suppose on Windoze, the default is either Cp1252 or ISO-8859-1 (Latin). You could also set this in your Workspace of your Projects inherit this setting from their workspace. This only effects the program source code.

Categories