I'm working on a Python RESTful API and I need to send bytes-encoded data (specifically, data encrypted using a public RSA key from the rsa package) over the network, via JSON.
Here is what it looks like:
>>> import rsa
>>> pubkey, privkey = rsa.newkeys(512, True) # Create a pair of public/private RSA keys; 512 is just for the example
>>> encStr = rsa.encrypt(b"Test string", pubkey) # Encrypt a bytes object using the public key
>>> encStr
b'r\x10\x03e\xc6*\xa8\xb1\xee\xbd\x18\x0f\x7f\xecz\xcex\xabP~\xb3]\x8f)R\x9b>i\x03\xab-m\x0c\x19\xd7\xa5f$\x07\xc1;X\x0b\xaa2\x99\xa8&\xfc/\x9f\x05!nk\x93%\xc0\xf5\x1d\xf8C\x1fo'
"encStr" is what I need to send, however, I can't tell what encoding it is, and the package documentation doesn't mention it. If you have any idea, please share it :)
>>> encStr.decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte
>>> encStr.decode("latin1")
'\x8eM\x96Æ\'zÈZ\x89\x85±\x98Z¯Ûzùæ¯;£zñ8\x9b§Ù\x9dÏ\x8eâ0®\x89ó(*?\x92ªg\x12ôsä\x1d\x96\x19\x82\x19-3\x15SBýh"3òÖß\x91Ô' # This could be it
>>> encStr.decode("latin1").encode("latin1")
b'\x8eM\x96\xc6\'z\xc8Z\x89\x85\xb1\x98Z\xaf\xdbz\xf9\xe6\xaf;\xa3z\xf18\x9b\xa7\xd9\x9d\xcf\x8e\xe20\xae\x89\xf3(*?\x92\xaag\x12\xf4s\xe4\x1d\x96\x19\x82\x19-3\x15SB\xfdh"3\xf2\xd6\xdf\x91\xd4' # Nop, garbage
After experimenting for a while, I found a way to get a proper string using base64.
>>> import base64
>>> b64_encStr = base64.b64encode(encStr)
>>> b64_encStr
b'jk2Wxid6yFqJhbGYWq/bevnmrzujevE4m6fZnc+O4jCuifMoKj+SqmcS9HPkHZYZghktMxVTQv1oIjPy1t+R1A=='
>>> b64_encStr.decode("utf-8")
'jk2Wxid6yFqJhbGYWq/bevnmrzujevE4m6fZnc+O4jCuifMoKj+SqmcS9HPkHZYZghktMxVTQv1oIjPy1t+R1A=='
Now I'd just have to send this; however, I would like to know if there is a more efficient way of doing it (a shorter string; fewer operations, considering the client has to encode and the server has to decode, etc.).
Thanks!
Shawn
Base64 is a relatively efficient method of sending bytes as text: each character carries 6 bits of data but occupies 8 bits on the wire in a normal single-byte character encoding, so the output is roughly a third larger than the input. The bytes may have any value, which is exactly the situation with ciphertext. There are more efficient encodings such as basE91, but they provide only small advantages for the complexity they bring.
However, I often see ciphertext being "stringified" when there is no need: files, HTTP, sockets, etc. all handle arbitrary byte values just fine. If you want to use it in a GET request, use base64url instead of the normal base64 encoding. Developers often encode values needlessly so that they can be easily read in traces and such, but in that case only the trace printout itself needs to be encoded.
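For example, the URL-safe variant is already in the standard library; a minimal sketch reusing the encStr from your session:
>>> import base64
>>> token = base64.urlsafe_b64encode(encStr).decode("ascii")  # '-' and '_' instead of '+' and '/'
>>> base64.urlsafe_b64decode(token.encode("ascii")) == encStr
True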
Note that I'd advise you to use OAEP padding rather than PKCS#1 v1.5, and a key size of at least 3072 bits, especially if you want to encrypt data that is transported rather than encrypted "in place".
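As far as I know the rsa package only implements PKCS#1 v1.5 encryption, so here is a minimal sketch with the pyca/cryptography package instead (an assumption on my part, not something from your setup):
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Generate a 3072-bit key pair and encrypt with RSAES-OAEP (SHA-256)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
public_key = private_key.public_key()
ciphertext = public_key.encrypt(
    b"Test string",
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()), algorithm=hashes.SHA256(), label=None),
)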
Related
If I have a unicode string such as:
s = u'c\r\x8f\x02\x00\x00\x02\u201d'
how can I convert this to just a regular string that isn't in unicode format; i.e. I want to extract:
f = '\x00\x00\x02\u201d'
and I do not want it in unicode format. I need to do this because I need to convert the unicode in s to an integer value, but if I try it with just s:
int((s[-4]+s[-3]+s[-2]+s[-1]).encode('hex'), 16)
Traceback (most recent call last):
File "<pyshell#48>", line 1, in <module>
int((s[-4]+s[-3]+s[-2]+s[-1]).encode('hex'), 16)
File "C:\Python27\lib\encodings\hex_codec.py", line 24, in hex_encode
output = binascii.b2a_hex(input)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 3: ordinal not in range(128)
yet if I do it with f:
int(f.encode('hex'), 16)
664608376369508L
And this is the correct integer value I want to extract from s. Is there a way to do this?
Normally, the device sends back something like: \x00\x00\x03\xcc which I can easily convert to 972
OK, so I think what's happening here is you're trying to read four bytes from a byte-oriented device, and decode that to an integer, interpreting the bytes as a 32-bit word in big-endian order.
To do this, use the struct module and byte strings:
>>> import struct
>>> struct.unpack('>i', '\x00\x00\x03\xCC')[0]
972
(I'm not sure why you were trying to reverse the string then hex-encode; that would put the bytes in the wrong order and give much too large output.)
I don't know how you're reading from the device, but at some point you've decoded the bytes into a text (Unicode) string. Judging from the U+201D character in there I would guess that the device originally gave you a byte 0x94 and you decoded it using code page 1252 or another similar Windows default (‘ANSI’) code page.
>>> struct.unpack('>i', '\x00\x00\x02\x94')[0]
660
It may be possible to reverse the incorrect decoding step by encoding back to bytes using the same mapping, but this is dicey and depends on which encodings are involved (not all bytes are mapped to anything usable in all encodings). Better would be to look at where the input is coming from, find where that decode step is happening, and get rid of it so you keep hold of the raw bytes the device sent you.
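For instance, if the stray decode really did use cp1252 (my guess above, not something you have confirmed), re-encoding just the last four characters recovers the device's bytes:
>>> import struct
>>> s = u'c\r\x8f\x02\x00\x00\x02\u201d'
>>> raw = s[-4:].encode('cp1252')     # u'\x00\x00\x02\u201d' -> '\x00\x00\x02\x94'
>>> struct.unpack('>i', raw)[0]
660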
I would like to include picture bytes into a JSON, but I struggle with a encoding issue:
import urllib
import json
data = urllib.urlopen('https://www.python.org/static/community_logos/python-logo-master-v3-TM-flattened.png').read()
json.dumps({'picture' : data})
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: invalid start byte
I don't know how to deal with this since I am handling an image, so I am a bit confused about the encoding error. I am using Python 2.7. Can anyone help me? :)
JSON handles Unicode text. Binary image data is not text, so when the json.dumps() function tries to decode the bytestring to Unicode using UTF-8 (the default), that decoding fails.
You'll have to wrap your binary data in a text-safe encoding first, such as Base-64:
json.dumps({'picture' : data.encode('base64')})
Of course, this then assumes that the receiver expects your data to be wrapped this way.
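On the receiving end the wrapper has to be undone again; a sketch, where json_string stands for whatever arrived over the wire:
import json, base64
payload = json.loads(json_string)
picture = base64.b64decode(payload['picture'].encode('ascii'))  # back to the original PNG bytes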
If your API endpoint has been designed so badly that it expects your image bytes to be passed in as text, then the alternative is to pretend that your bytes really are text; if you first decode them as Latin-1 you can map those bytes straight to Unicode codepoints:
json.dumps({'picture' : data.decode('latin-1')})
With the data decoded to a unicode object, the json library will then treat it as text. This does mean that it will replace non-ASCII codepoints with \uhhhh escapes.
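The receiving side then reverses the mapping by encoding back to Latin-1; again a sketch with a placeholder json_string:
import json
payload = json.loads(json_string)
picture = payload['picture'].encode('latin-1')  # code points U+0000..U+00FF map back to the original bytes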
The best solution that comes to mind for this situation, space-wise, is base85 encoding, which represents four bytes as five characters. You could also map every byte to the corresponding character in the range U+0000-U+00FF and then dump that in the JSON.
Still, those methods may be overkill here; ease-wise, base64 is the winner.
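For reference, Python 3.4+ ships base85 in the standard library (it is not available on the Python 2.7 used in the question); a sketch, with data standing for the image bytes:
import base64
encoded = base64.b85encode(data)          # ~25% overhead instead of ~33% for base64
assert base64.b85decode(encoded) == data  # round-trips back to the original bytes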
I am using the RSA python package to encrypt a message and try to pass it to a PHP site to decrypt. See below:
message = rsa.encrypt('abc', pubkey)
print message
print type(message)
What I get is some encrypted text
q??$$??kK?Y??p?[e?[??f???x??s!?s?>?z?*y?p?????????分?
? ???({u????NH?B???N?%?#5|?~?????\U?.??r?Y?q
<type 'str'>
What's the best way to pass it to other languages to decrypt?
That's not text, it's binary data. Since you are using Python 2, which doesn't fully distinguish bytes from str, you need to keep this in mind.
The other side should get these bytes exactly as rsa outputs them, so you can just write them to your connection or file (presuming you are talking binary to them).
For the web you can base64-encode the data. It's a common and good way to encode binary data.
>>> import base64
>>> base64.b64encode(b"data")
'ZGF0YQ=='
>>> base64.b64decode(base64.b64encode(b"data"))
'data'
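So a sketch of what you could hand to the PHP side, assuming it base64-decodes the field before decrypting (message is the rsa.encrypt(...) output from your code):
import json, base64
payload = json.dumps({'message': base64.b64encode(message).decode('ascii')})  # base64 text is plain ASCII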
By the way, you are not supposed to use RSA that way. It's highly insecure to use raw RSA. You must use a probabilistic encryption scheme with proper padding, e.g. RSAES-OAEP. PKCS#1 defines such schemes for encryption and signatures.
If I call os.urandom(64), I am given 64 random bytes. With reference to Convert bytes to a Python string I tried
a = os.urandom(64)
a.decode()
a.decode("utf-8")
but got the traceback error stating that the bytes are not in utf-8.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 0: invalid start byte
with the bytes
b'\x8bz\xaf$\xb6\x93q\xef\x94\x99$\x8c\x1eO\xeb\xed\x03O\xc6L%\xe70\xf9\xd8\xa4\xac\x01\xe1\xb5\x0bM#\x19\xea+\x81\xdc\xcb\xed7O\xec\xf5\\}\x029\x122\x8b\xbd\xa9\xca\xb2\x88\r+\x88\xf0\xeaE\x9c'
Is there a foolproof method to decode these bytes into some string representation? I am generating pseudo-random tokens to keep track of related documents across multiple database engines.
The code below will work on both Python 2.7 and 3:
from base64 import b64encode
from os import urandom
random_bytes = urandom(64)
token = b64encode(random_bytes).decode('utf-8')
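To recover the original bytes from the token later, a minimal sketch:
from base64 import b64decode
original = b64decode(token)  # == random_bytes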
You have random bytes; I'd be very surprised if those were ever decodable to a string.
If you have to have a unicode string, decode from Latin-1:
a.decode('latin1')
because it maps bytes one-to-one to the corresponding Unicode code points.
You can use base-64 encoding, via the codec built into Python 2's str type. In this case:
import os
a = os.urandom(64)
a.encode('base-64')
Also note that I'm using encode here rather than decode, as decode is trying to take it from whatever format you specify into unicode. So in your example, you're treating the random bytes as if they form a valid utf-8 string, which is rarely going to be the case with random bytes.
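On Python 3, bytes.encode does not exist, but the same codec is still reachable through the codecs module; a sketch:
import codecs, os
a = os.urandom(64)
codecs.encode(a, 'base64')  # bytes containing the base64 text (note: includes a trailing newline)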
Are you sure that you need 64 bytes represented as a string?
Maybe what you really need is an N-bit token?
If so, use secrets. The secrets module provides functions for generating secure tokens, suitable for applications such as password resets, hard-to-guess URLs, and similar.
>>> import secrets
>>> secrets.token_bytes(16)
b'\xebr\x17D*t\xae\xd4\xe3S\xb6\xe2\xebP1\x8b'
>>> secrets.token_hex(16)
'f9bf78b9a18ce6d46a0cd2b0b86df9da'
>>> secrets.token_urlsafe(16)
'Drmhze6EPcv0fN_81Bj-nA'
Or maybe you need a 64-character random string?
import string
import secrets
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet) for i in range(64))
This easy way:
import os
a = str(os.urandom(64))  # note: str() on bytes returns the repr, e.g. "b'\\x8bz...'", not decoded text
print(f"the: {a}")
print(type(a))
I am trying to convert an incoming byte string that contains non-ascii characters into a valid utf-8 string so that I can dump it as json.
b = '\x80'
u8 = b.encode('utf-8')
j = json.dumps(u8)
I expected j to be '\xc2\x80' but instead I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
In my situation, 'b' is coming from mysql via google protocol buffers and is filled out with some blob data.
Any ideas?
EDIT:
I have ethernet frames that are stored in a mysql table as a blob (please, everyone, stay on topic and keep from discussing why there are packets in a table). The table collation is utf-8 and the db layer (sqlalchemy, non-orm) is grabbing the data and creating structs (google protocol buffers) which store the blob as a python 'str'. In some cases I use the protocol buffers directly without any issue. In other cases, I need to expose the same data via json. What I noticed is that when json.dumps() does its thing, '\x80' can be replaced with the Unicode replacement character (\ufffd, iirc).
You need to examine the documentation for the software API that you are using. BLOB is an acronym: BINARY Large Object.
If your data is in fact binary, the idea of decoding it to Unicode is of course nonsense.
If it is in fact text, you need to know what encoding to use to decode it to Unicode.
Then you use json.dumps(a_Python_object) ... if you encode it to UTF-8 yourself, json will decode it back again:
>>> import json
>>> json.dumps(u"\u0100\u0404")
'"\\u0100\\u0404"'
>>> json.dumps(u"\u0100\u0404".encode('utf8'))
'"\\u0100\\u0404"'
UPDATE about latin1:
u'\x80' is a useless, meaningless C1 control character -- the encoding is extremely unlikely to be Latin-1. Latin-1 is "a snare and a delusion" -- all 8-bit bytes decode to Unicode without raising an exception. Don't confuse "works" with "doesn't raise an exception".
Use b.decode('name of source encoding') to get a unicode version. This was surprising to me when I learned it. E.g.:
In [123]: 'foo'.decode('latin-1')
Out[123]: u'foo'
I think what you are trying to do is decode a string object that is in some encoding. Do you know what that encoding is? To get the unicode object:
unicode_b = b.decode('some_encoding')
Then re-encode the unicode object back to a string object using the utf_8 encoding:
b = unicode_b.encode('utf_8')
This uses the unicode object as a translator. Without knowing what the original encoding of the string is, I can't say for certain, but there is a chance the conversion will not go as expected: the unicode object is not meant for converting strings of one encoding to another. I would work with the unicode object, assuming you know what the encoding is (if you don't, there really isn't a way to find out without trial and error), and convert back to an encoded string when you want a string object again.
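For instance, if the blob really were cp1252 text (an assumption purely for illustration), the round trip in Python 2 would look like this:
import json
b = '\x80'
u = b.decode('cp1252')    # u'\u20ac', the euro sign
utf8 = u.encode('utf_8')  # '\xe2\x82\xac'
json.dumps(u)             # '"\\u20ac"'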