Related
Having read through eBay's guide for including digital signatures to certain of their REST API calls, I am having trouble with generating the signature header. Rather than including all of the documentation here (there is a lot!), I'll provide links to the appropriate pages and some of the documentation. The following page it the starting point provided by eBay:
https://developer.ebay.com/develop/guides/digital-signatures-for-apis
The next page is where I am lead to from the previous page describing how to create the signature:
https://www.ietf.org/archive/id/draft-ietf-httpbis-message-signatures-13.html#name-eddsa-using-curve-edwards25
Which leads me onto the following :
https://www.rfc-editor.org/rfc/rfc8032#section-5.1.6
5.1.6. Sign
The inputs to the signing procedure is the private key, a 32-octet
string, and a message M of arbitrary size. For Ed25519ctx and
Ed25519ph, there is additionally a context C of at most 255 octets
and a flag F, 0 for Ed25519ctx and 1 for Ed25519ph.
1. Hash the private key, 32 octets, using SHA-512. Let h denote the
resulting digest. Construct the secret scalar s from the first
half of the digest, and the corresponding public key A, as
described in the previous section. Let prefix denote the second
half of the hash digest, h[32],...,h[63].
2. Compute SHA-512(dom2(F, C) || prefix || PH(M)), where M is the
message to be signed. Interpret the 64-octet digest as a little-
endian integer r.
3. Compute the point [r]B. For efficiency, do this by first
reducing r modulo L, the group order of B. Let the string R be
the encoding of this point.
4. Compute SHA512(dom2(F, C) || R || A || PH(M)), and interpret the
64-octet digest as a little-endian integer k.
5. Compute S = (r + k * s) mod L. For efficiency, again reduce k
modulo L first.
6. Form the signature of the concatenation of R (32 octets) and the
little-endian encoding of S (32 octets; the three most
significant bits of the final octet are always zero).
I have some Python code from the appendix from this same web page (https://www.rfc-editor.org/rfc/rfc8032#section-6):
## First, some preliminaries that will be needed.
import hashlib
def sha512(s):
return hashlib.sha512(s).digest()
# Base field Z_p
p = 2**255 - 19
def modp_inv(x):
return pow(x, p-2, p)
# Curve constant
d = -121665 * modp_inv(121666) % p
# Group order
q = 2**252 + 27742317777372353535851937790883648493
def sha512_modq(s):
return int.from_bytes(sha512(s), "little") % q
## Then follows functions to perform point operations.
# Points are represented as tuples (X, Y, Z, T) of extended
# coordinates, with x = X/Z, y = Y/Z, x*y = T/Z
def point_add(P, Q):
A, B = (P[1]-P[0]) * (Q[1]-Q[0]) % p, (P[1]+P[0]) * (Q[1]+Q[0]) % p;
C, D = 2 * P[3] * Q[3] * d % p, 2 * P[2] * Q[2] % p;
E, F, G, H = B-A, D-C, D+C, B+A;
return (E*F, G*H, F*G, E*H);
# Computes Q = s * Q
def point_mul(s, P):
Q = (0, 1, 1, 0) # Neutral element
while s > 0:
if s & 1:
Q = point_add(Q, P)
P = point_add(P, P)
s >>= 1
return Q
def point_equal(P, Q):
# x1 / z1 == x2 / z2 <==> x1 * z2 == x2 * z1
if (P[0] * Q[2] - Q[0] * P[2]) % p != 0:
return False
if (P[1] * Q[2] - Q[1] * P[2]) % p != 0:
return False
return True
## Now follows functions for point compression.
# Square root of -1
modp_sqrt_m1 = pow(2, (p-1) // 4, p)
# Compute corresponding x-coordinate, with low bit corresponding to
# sign, or return None on failure
def recover_x(y, sign):
if y >= p:
return None
x2 = (y*y-1) * modp_inv(d*y*y+1)
if x2 == 0:
if sign:
return None
else:
return 0
# Compute square root of x2
x = pow(x2, (p+3) // 8, p)
if (x*x - x2) % p != 0:
x = x * modp_sqrt_m1 % p
if (x*x - x2) % p != 0:
return None
if (x & 1) != sign:
x = p - x
return x
# Base point
g_y = 4 * modp_inv(5) % p
g_x = recover_x(g_y, 0)
G = (g_x, g_y, 1, g_x * g_y % p)
def point_compress(P):
zinv = modp_inv(P[2])
x = P[0] * zinv % p
y = P[1] * zinv % p
return int.to_bytes(y | ((x & 1) << 255), 32, "little")
def point_decompress(s):
if len(s) != 32:
raise Exception("Invalid input length for decompression")
y = int.from_bytes(s, "little")
sign = y >> 255
y &= (1 << 255) - 1
x = recover_x(y, sign)
if x is None:
return None
else:
return (x, y, 1, x*y % p)
## These are functions for manipulating the private key.
def secret_expand(secret):
if len(secret) != 32:
raise Exception("Bad size of private key")
h = sha512(secret)
a = int.from_bytes(h[:32], "little")
a &= (1 << 254) - 8
a |= (1 << 254)
return (a, h[32:])
def secret_to_public(secret):
(a, dummy) = secret_expand(secret)
return point_compress(point_mul(a, G))
## The signature function works as below.
def sign(secret, msg):
a, prefix = secret_expand(secret)
A = point_compress(point_mul(a, G))
r = sha512_modq(prefix + msg)
R = point_mul(r, G)
Rs = point_compress(R)
h = sha512_modq(Rs + A + msg)
s = (r + h * a) % q
return Rs + int.to_bytes(s, 32, "little")
## And finally the verification function.
def verify(public, msg, signature):
if len(public) != 32:
raise Exception("Bad public key length")
if len(signature) != 64:
Exception("Bad signature length")
A = point_decompress(public)
if not A:
return False
Rs = signature[:32]
R = point_decompress(Rs)
if not R:
return False
s = int.from_bytes(signature[32:], "little")
if s >= q: return False
h = sha512_modq(Rs + public + msg)
sB = point_mul(s, G)
hA = point_mul(h, A)
return point_equal(sB, point_add(R, hA))
Now, the problem that I am having is that this code insists on the "secret" consisting of a 32 byte array:
if len(secret) != 32: raise Exception("Bad size of private key")
However, the secret is described as being the private key provided by eBay's Key Management API (https://developer.ebay.com/api-docs/developer/key-management/overview.html), which is not a 32 byte array, but a 64 character ASCII string (see https://developer.ebay.com/api-docs/developer/key-management/resources/signing_key/methods/createSigningKey#h2-samples):
"privateKey": "MC4CAQAwBQYDK2VwBCIEI******************************************n"
When I try to generate a signature with the eBay private key using this Python code, it gives me an error saying it is a "Bad size of private key". If I convert the private key from eBay to a bytearray, it is 64 bytes long. How can I use the Python code to generate the signature header using the private key supplied by eBay?
To further complicate things, I am actually using Excel VBA (Visual Basic) to make the API call after using Python to generate the signature (simply because Python is better at this kind of thing!). eBay's PAID FOR technical support has confirmed that the following headers are correct and that there is no "message" as described in https://www.rfc-editor.org/rfc/rfc8032#section-5.1.6, but they have not yet been of any further help other than suggesting that there may be a "bug".
http.setRequestHeader "signature-input", "sig1=(""x-ebay-signature-key"" ""#method"" ""#path"" ""#authority"");created=1667386210"
http.setRequestHeader "x-ebay-signature-key", "<jwe returned by eBay>"
http.setRequestHeader "x-ebay-enforce-signature", "true"
The remaining header would be as follows once I can generate a valid signature:
http.setRequestHeader "signature" "sig1=:<signature>:"
Everything I have tried results in the same response:
{
"errors": [
{
"errorId": 215122,
"domain": "ACCESS",
"category": "REQUEST",
"message": "Signature validation failed",
"longMessage": "Signature validation failed to fulfill the request."
}
]
}
Here are some example keys like the ones generated by eBay. https://www.ietf.org/archive/id/draft-ietf-httpbis-message-signatures-11.html#appendix-B.1.4
"The following key is an elliptical curve key over the Edwards curve ed25519, referred to in this document as test-key-ed25519. This key is PCKS#8 encoded in PEM format, with no encryption."
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAJrQLj5P/89iXES9+vFgrIy29clF9CC/oPPsw3c5D0bs=
-----END PUBLIC KEY-----
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIJ+DYvh6SEqVTm50DFtMDoQikTmiCqirVv9mWG9qfSnF
-----END PRIVATE KEY-----
This is the format of private key that I believe that I need to convert to a 32-byte array to work with the above Python code. I believe that there is a typo on the linked to web page and it should be "PKCS", not "PCKS".
UPDATE:
If I run the following command:
openssl ec -in test.pem -text
Where test.pem is a text file containing:
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIJ+DYvh6SEqVTm50DFtMDoQikTmiCqirVv9mWG9qfSnF
-----END PRIVATE KEY-----
It displays private and public keys as 32 byte hex dumps, but even when using these values I get the same response as above with the 215122 error. When I verify using the Python "verify" method in the code above with these 32 byte hex dump keys, validation is successful.
Alright so this is where Im at right now, not using the content-digest as it's simply a GET request so just trying to get the basics working, but none of this seems to work.
$public = "xxx";
$private = "yyy";
$jwe = "jwe";
$path = "/sell/fulfillment/v1/order/" . "11-xxxx-yyyy";
$signature_input_txt = '("x-ebay-signature-key" "#method" "#path" "#authority");created=' . time();
// $signature_base = '"content-digest": sha-256=:' . base64_encode($contentDigest) . ":\n";
$signature_base = '"x-ebay-signature-key": ' . $jwe;
$signature_base .= '"#method": POST';
$signature_base .= '"#path": ' . $path;
$signature_base .= '"#authority": ' . "apiz.ebay.com";
$signature_base .= '"#signature-params": ' . $signature_input_txt;
// ensure signature_base is UTF-8
if (!mb_check_encoding($signature_base, 'UTF-8')) {
$signature_base = mb_convert_encoding($signature_base, 'UTF-8');
}
// dd($signature_base);
// base 64 encode our signature_base
$signature_base_base64_encoded = base64_encode($signature_base);
// format the private key as required
$formatted_private_key = "-----BEGIN RSA PRIVATE KEY-----" . PHP_EOL . $private . PHP_EOL . "-----END RSA PRIVATE KEY-----";
// sign
openssl_sign($signature_base_base64_encoded, $signed_signature, $formatted_private_key, "sha256WithRSAEncryption");
return [
'Authorization' => 'Bearer ' . $this->marketplace->getToken('oauth2.access_token', 'production'),
'Accept' => 'application/json',
'Content-Type' => 'application/json',
'Signature-Input' => 'sig1=' . $signature_input_txt,
'Signature' => 'sig1=:' . base64_encode($signed_signature) . ':',
'x-ebay-signature-key' => $jwe,
'x-ebay-enforce-signature' => true
];
I'm going to put this here for anyone struggling to get this working with PHP, adapted from Renegade_Mtl answer (you'd missed the need for a new line for each signature_base and it didn't need to be encoded).
/**
* #param $method - e.g. POST, GET
* #param $path - e.g /sell/finances/v1/seller_funds_summary
* #param $host - e.g. api.ebay.com
* #param $keyset // public, private and jwt keys generated from https://apiz.ebay.com/developer/key_management/v1/signing_key
* #param $timestamp - e.g. time()
* #return array of headers
*/
private function createHeaders(string $method, string $path, string $host, array $tokens, int $time) {
$signature_input_txt = '("x-ebay-signature-key" "#method" "#path" "#authority");created=' . $time;
// $signature_base = '"content-digest": sha-256=:' . base64_encode($contentDigest) . ":\n";
$signature_base = '"x-ebay-signature-key": ' . $tokens['jwe']."\n";
$signature_base .= '"#method": ' . $method."\n";
$signature_base .= '"#path": ' . $path."\n";
$signature_base .= '"#authority": ' . $host."\n";
$signature_base .= '"#signature-params": ' . $signature_input_txt;
// format the private key as required
$formatted_private_key = "-----BEGIN RSA PRIVATE KEY-----" . PHP_EOL . $tokens['privateKey'] . PHP_EOL . "-----END RSA PRIVATE KEY-----";
openssl_sign($signature_base, $signed_signature, $formatted_private_key, "sha256WithRSAEncryption");
return [
'Signature-Input' => 'sig1=' . $signature_input_txt,
'Signature' => 'sig1=:' . base64_encode($signed_signature) . ':',
'x-ebay-signature-key' => $tokens['jwe'],
'x-ebay-enforce-signature' => "true"
];
}
We only use GET's but if you also POST then you'd need also the content digest... Hope this helps someone from wasting hours and hours trying to figure it out.
I have the below code in CPP that I am trying to port to Python to send data to a UDPS
#define VERSION_MAIN "V6.60"
#define VERSION_BIN_MAJOR 0x06
#define VERSION_BIN_MINOR 0x60
unsigned char temp[3];
temp[0] = VERSION_MAIN[0];
temp[1] = VERSION_BIN_MAJOR;
temp[2] = VERSION_BIN_MINOR;
I have tried code like the following:
byteone = bytes(VERSION_MAIN, 'utf-8')
hex_string = '0x06'
decimal_int = int(hex_string, 16)
decimal_string = str(decimal_int)
digits = [int(c) for c in decimal_string]
zero_padded_BCD_digits = [format(d, '04b') for d in digits]
s = ''.join(zero_padded_BCD_digits)
bytetwo = bytes(int(s[i : i + 8], 2) for i in range(0, len(s), 8))
hex_string = '0x60'
decimal_int = int(hex_string, 16)
decimal_string = str(decimal_int)
digits = [int(c) for c in decimal_string]
zero_padded_BCD_digits = [format(d, '04b') for d in digits]
s = ''.join(zero_padded_BCD_digits)
bytethree = bytes(int(s[i : i + 8], 2) for i in range(0, len(s), 8))
values = (byteonw,bytetwo,bytethree )
s= struct.Struct(f'!3B')
packed_data = s.pack(*values)
but I keep getting pesky errors
struct.error: required argument is not an integer
Can anyone give me a hand please.
Thanks
No need to convert V into an int, pack can manage the char type as well.
from struct import pack
VERSION_MAIN = "V6.60"
VERSION_BIN_MAJOR = 0x06
VERSION_BIN_MINOR = 0x60
version = pack("!c2B", VERSION_MAIN[0].encode('utf-8'), VERSION_BIN_MAJOR, VERSION_BIN_MINOR)
# version = b'V\x06`'
While using a Python socket connection and I received the following from the server (simulator):
'C\xbc\x7fZC\xbc\x7f5C\xbc\x7f&C\xbc\x7f!C\xbc\x7f\x14C\xbc\x7f\x16C\xbc\x7f!C\xbc\x7f\x12C\xbc\x7f\x19C\xbc\x80\x03C\xbc\x81~C\xbc\x80FC\xbc\x7f\xd1C\xbc~\xcdC\xbc~\xd7C\xbc\x7f$;4\xe0\xb8;\x81\xf6\xd7;R\xf4T:\xc5v\xb4:\xc1>\xc1:\xd9\x15\x9a:\xa8M7;K\xde{:\xfb\\\xcd\xbc\tR\xd4\xba\xe1\xb6\xf3\xbb\xc9\xb8\xf4\xbby\x8a\x199\x9dH\x009\x0cp\x00:\xeb<\x00'
But my actual output should be (size 32x1):
376.99493 376.99380 376.99335 376.99319 376.99280 376.99286
376.99319 376.99274 376.99295 377.00009 377.01166 377.00214
376.99857 376.99063 376.99094 376.99329 0.0027599763
0.0039661932 0.0032189088 0.0015065284 0.0014743434
0.0016562224 0.0012840395 0.0031107950 0.0019177437
-0.0083815642 -0.0017220661 -0.0061560813 -0.0038076697
0.00029999018 0.00013393164 0.0017946959
How can I convert the ASCII codes to get those numbers in Python 2.7.
To convert the floats from a socket, you can use struct.unpack.
Code:
num_floats = int(len(raw_data) / 4)
format_str = '>' + 'f' * num_floats
data = struct.unpack(format_str, raw_data[:num_floats * 4])
Test Code:
import struct
raw_data = b'C\xbc\x7fZC\xbc\x7f5C\xbc\x7f&C\xbc\x7f!C\xbc\x7f\x14C\xbc' \
b'\x7f\x16C\xbc\x7f!C\xbc\x7f\x12C\xbc\x7f\x19C\xbc\x80\x03C' \
b'\xbc\x81~C\xbc\x80FC\xbc\x7f\xd1C\xbc~\xcdC\xbc~\xd7C\xbc' \
b'\x7f$;4\xe0\xb8;\x81\xf6\xd7;R\xf4T:\xc5v\xb4:\xc1>\xc1:\xd9' \
b'\x15\x9a:\xa8M7;K\xde{:\xfb\\xcd\xbc\tR\xd4\xba\xe1\xb6\xf3' \
b'\xbb\xc9\xb8\xf4\xbby\x8a\x199\x9dH\x009\x0cp\x00:\xeb'
num_floats = int(len(raw_data) / 4)
format_str = '>' + 'f' * num_floats
data = struct.unpack(format_str, raw_data[:num_floats * 4])
print(data)
Results:
(376.99493408203125, 376.9938049316406, 376.99334716796875,
376.9931945800781, 376.9927978515625, 376.99285888671875,
376.9931945800781, 376.99273681640625, 376.9929504394531,
377.0000915527344, 377.01165771484375, 377.00213623046875,
376.9985656738281, 376.9906311035156, 376.9909362792969,
376.9932861328125, 0.00275997631251812, 0.003966193180531263,
0.0032189087942242622, 0.001506528351455927, 0.0014743433566763997,
0.001656222390010953, 0.001284039462916553, 0.003110795048996806,
0.0019177338108420372, 4.2194070097596986e+21, 456834187264.0,
-7.263825409609126e-06, -0.00011669746163534, -7.377517891143851e-33,
131300.1875, 1.5874123484317317e+29)
I'm struggling with serial write communication. Basically I don't know which option to choose to combine both integers and float values in the correct way for serial write.
The problem: I need to send data values (chars,int, floats) to a microcontroller (ARM8 processor) for control of wheels for a robot platform. By means of a RS422 to USB converter I'm able to read data values from the same microcontroller, by means of this Python code:
import serial
from struct import unpack
# Initialization
counter = 0
#open serial for reading (BMW to PC communication)
s_port = 'COM8'
b_rate = 460800
ser = serial.Serial(port=s_port,baudrate=b_rate,timeout=0.01)
#method for reading incoming bytes on serial
while counter<20:
data = ser.readline()
data = data.encode("hex")
strlen = len(data)
rev_data = "".join(reversed([data[i:i+2] for i in range(0, len(data), 2)]))
if strlen==80:
Soh = data[0:2]
Nob = data[2:4]
Adr = data[4:6]
Cmd = data[6:8]
Hrt = data[8:12]
Po1 = data[12:28]
Po2 = data[28:44]
En1 = data[44:60]
En2 = data[60:76]
Crc = data[76:78]
Eot = data[78:80]
So1 = unpack('B', Soh.decode("hex")) # unsigned char
No1 = unpack('B', Nob.decode("hex")) # unsigned char
Ad1 = unpack('B', Adr.decode("hex")) # unsigned char
Cm1 = unpack('B', Cmd.decode("hex")) # unsigned char
Hr1 = unpack('h', Hrt.decode("hex")) # short
Po1 = unpack('d', Po1.decode("hex")) # double
Po2 = unpack('d', Po2.decode("hex")) # double
En1 = unpack('d', En1.decode("hex")) # double
En2 = unpack('d', En2.decode("hex")) # double
Cr1 = unpack('B', Crc.decode("hex")) # unsigned char
Eo1 = unpack('B', Eot.decode("hex")) # unsigned char
StartOfHeader = So1[0]
NoOfBytes = No1[0]
Address = Ad1[0]
Command = Cm1[0]
Heartbeat = Hr1[0]
Potentiometer1 = Po1[0]
Potentiometer2 = Po2[0]
Encoder1 = En1[0]
Encoder2 = En2[0]
CRC = Cr1[0]
EndOfTransmission = Eo1[0]
counter = counter+1
ser.close()
In Labview the serial write communication is already working, so that is my starting point:
But as I need to work with Python the trick is to make it working in Python.
To my knowledge the data is converted to ASCII, based on the "Type Cast" function in Labview (http://digital.ni.com/public.nsf/allkb/287D59BAF21F58C786256E8A00520ED5)
Honestly I don't nothing about ASCII messages, so I'm not 100% sure.
Now, I want to do the same trick for the serial write in Python 2.7. Starting from chr,int, float values converting to hexadecimal string and then write to the serial port. I suppose to use an ASCII conversion at the end, but I don't know if it's ASCII and the right Python command to do the conversion in the right way.
This is my serial write coding in Python so far (excuse me, for the long and inefficient coding lines):
# Initialization
counter = 0
#open serial for reading (BMW to PC communication)
s_port = 'COM9'
b_rate = 460800
ser = serial.Serial(port=s_port,baudrate=b_rate,timeout=0.05)
#method for writing to serial
while counter<100:
Soh = chr(1)
Nob = chr(46)
Adr = chr(49)
Cmd = chr(32)
DFS = float(1)
DRS = float(2)
DFD = int(3)
DRD = int(4)
DFC = int(5)
DRC = int(6)
SFCP = float(7)
SRCP = float(8)
SFC = int(9)
SRC = int(10)
CRC = chr(77)
EOT = chr(4)
S1 = Soh.encode("hex")
N1 = Nob.encode("hex")
A1 = Adr.encode("hex")
C1 = Cmd.encode("hex")
D11 = hex(struct.unpack('<I', struct.pack('<f', DFS))[0])
D1 = D11[2:]
D12 = hex(struct.unpack('<I', struct.pack('<f', DRS))[0])
D2 = D12[2:]
D3 = '{0:08x}'.format(DFD)
D4 = '{0:08x}'.format(DRD)
D5 = '{0:08x}'.format(DFC)
D6 = '{0:08x}'.format(DRC)
S11 = hex(struct.unpack('<I', struct.pack('<f', SFCP))[0])
S2 = S11[2:]
S12 = hex(struct.unpack('<I', struct.pack('<f', SRCP))[0])
S3 = S12[2:]
S4 = '{0:08x}'.format(SFC)
S5 = '{0:08x}'.format(SRC)
C2 = CRC.encode("hex")
E1 = EOT.encode("hex")
hex_string = E1 + C2 + S5 + S4 + S3 + S2 + D6 + D5 + D4 + D3 + D2 + D1 + C1 + A1 + N1 + S1
rev_hex_string = "".join(reversed([hex_string[i:i+2] for i in range(0, len(hex_string), 2)]))
##command = ...
ser.write(command)
counter = counter+1
ser.close()
This Python code generates the same hexadecimal string (called rev_hex_string) as the Labview programm. But till now, I was unable to do the serial write with Python.
I don't know how to proceed. Do I need ASCII for serial write? And which Python codes to use? encode("ascii"), decode("hex")? I'm totally lost in the many possibilities....
In case I use encode("ascii") or decode("hex") in Python it only generates a " .01" (ASCII???) message.
When I switch the normal view in Labview it has a much longer message:
" .01 €? # | - à# A M"
(I still missing some characters, but this is the most)
Besides that; do I need carriage returns and/or buffer flushing?
I have to say I'm new to serial write communication. I hope you can help me.
Sorry for the long post, I hope to be as precise as possible.
Does anyone know the name of a codec that can translate any random assortment of bytes into a string? I have been getting the following error after encoding, encrypting, and decoding a string in tkinter.Text.
UnicodeDecodeError: 'utf8' codec can't decode
byte 0x99 in position 151: unexpected code byte
Code used to generate the error follow below. The UTF8 codec listed at the top has problems translating some bytes back into a string. What I am looking for is an answer that solves the problem, not direction.
from tkinter import *
import traceback
from tkinter.scrolledtext import ScrolledText
CODEC = 'utf8'
################################################################################
class MarkovDemo:
def __init__(self, master):
self.prompt_size = Label(master, anchor=W, text='Encode Word Size')
self.prompt_size.pack(side=TOP, fill=X)
self.size_entry = Entry(master)
self.size_entry.insert(0, '8')
self.size_entry.pack(fill=X)
self.prompt_plain = Label(master, anchor=W, text='Plaintext Characters')
self.prompt_plain.pack(side=TOP, fill=X)
self.plain_entry = Entry(master)
self.plain_entry.insert(0, '""')
self.plain_entry.pack(fill=X)
self.showframe = Frame(master)
self.showframe.pack(fill=X, anchor=W)
self.showvar = StringVar(master)
self.showvar.set("encode")
self.showfirstradio = Radiobutton(self.showframe,
text="Encode Plaintext",
variable=self.showvar,
value="encode",
command=self.reevaluate)
self.showfirstradio.pack(side=LEFT)
self.showallradio = Radiobutton(self.showframe,
text="Decode Cyphertext",
variable=self.showvar,
value="decode",
command=self.reevaluate)
self.showallradio.pack(side=LEFT)
self.inputbox = ScrolledText(master, width=60, height=10, wrap=WORD)
self.inputbox.pack(fill=BOTH, expand=1)
self.dynamic_var = IntVar()
self.dynamic_box = Checkbutton(master, variable=self.dynamic_var,
text='Dynamic Evaluation',
offvalue=False, onvalue=True,
command=self.reevaluate)
self.dynamic_box.pack()
self.output = Label(master, anchor=W, text="This is your output:")
self.output.pack(fill=X)
self.outbox = ScrolledText(master, width=60, height=10, wrap=WORD)
self.outbox.pack(fill=BOTH, expand=1)
self.inputbox.bind('<Key>', self.reevaluate)
def select_all(event=None):
event.widget.tag_add(SEL, 1.0, 'end-1c')
event.widget.mark_set(INSERT, 1.0)
event.widget.see(INSERT)
return 'break'
self.inputbox.bind('<Control-Key-a>', select_all)
self.outbox.bind('<Control-Key-a>', select_all)
self.inputbox.bind('<Control-Key-/>', lambda event: 'break')
self.outbox.bind('<Control-Key-/>', lambda event: 'break')
self.outbox.config(state=DISABLED)
def reevaluate(self, event=None):
if event is not None:
if event.char == '':
return
if self.dynamic_var.get():
text = self.inputbox.get(1.0, END)[:-1]
if len(text) < 10:
return
text = text.replace('\n \n', '\n\n')
mode = self.showvar.get()
assert mode in ('decode', 'encode'), 'Bad mode!'
if mode == 'encode':
# Encode Plaintext
try:
# Evaluate the plaintext characters
plain = self.plain_entry.get()
if plain:
PC = eval(self.plain_entry.get())
else:
PC = ''
self.plain_entry.delete(0, END)
self.plain_entry.insert(0, '""')
# Evaluate the word size
size = self.size_entry.get()
if size:
XD = int(size)
while grid_size(text, XD, PC) > 1 << 20:
XD -= 1
else:
XD = 0
grid = 0
while grid <= 1 << 20:
grid = grid_size(text, XD, PC)
XD += 1
XD -= 1
# Correct the size and encode
self.size_entry.delete(0, END)
self.size_entry.insert(0, str(XD))
cyphertext, key, prime = encrypt_str(text, XD, PC)
except:
traceback.print_exc()
else:
buffer = ''
for block in key:
buffer += repr(block)[2:-1] + '\n'
buffer += repr(prime)[2:-1] + '\n\n' + cyphertext
self.outbox.config(state=NORMAL)
self.outbox.delete(1.0, END)
self.outbox.insert(END, buffer)
self.outbox.config(state=DISABLED)
else:
# Decode Cyphertext
try:
header, cypher = text.split('\n\n', 1)
lines = header.split('\n')
for index, item in enumerate(lines):
try:
lines[index] = eval('b"' + item + '"')
except:
lines[index] = eval("b'" + item + "'")
plain = decrypt_str(cypher, tuple(lines[:-1]), lines[-1])
except:
traceback.print_exc()
else:
self.outbox.config(state=NORMAL)
self.outbox.delete(1.0, END)
self.outbox.insert(END, plain)
self.outbox.config(state=DISABLED)
else:
text = self.inputbox.get(1.0, END)[:-1]
text = text.replace('\n \n', '\n\n')
mode = self.showvar.get()
assert mode in ('decode', 'encode'), 'Bad mode!'
if mode == 'encode':
try:
XD = int(self.size_entry.get())
PC = eval(self.plain_entry.get())
size = grid_size(text, XD, PC)
assert size
except:
pass
else:
buffer = 'Grid size will be:\n' + convert(size)
self.outbox.config(state=NORMAL)
self.outbox.delete(1.0, END)
self.outbox.insert(END, buffer)
self.outbox.config(state=DISABLED)
################################################################################
import random
CRYPT = random.SystemRandom()
################################################################################
# This section includes functions that
# can test the required key and bootstrap.
# sudoko_key
# - should be a proper "markov" key
def _check_sudoku_key(sudoku_key):
# Ensure key is a tuple with more than one item.
assert isinstance(sudoku_key, tuple), '"sudoku_key" must be a tuple'
assert len(sudoku_key) > 1, '"sudoku_key" must have more than one item'
# Test first item.
item = sudoku_key[0]
assert isinstance(item, bytes), 'first item must be an instance of bytes'
assert len(item) > 1, 'first item must have more than one byte'
assert len(item) == len(set(item)), 'first item must have unique bytes'
# Test the rest of the key.
for obj in sudoku_key[1:]:
assert isinstance(obj, bytes), 'remaining items must be of bytes'
assert len(obj) == len(item), 'all items must have the same length'
assert len(obj) == len(set(obj)), \
'remaining items must have unique bytes'
assert len(set(item)) == len(set(item).union(set(obj))), \
'all items must have the same bytes'
# boot_strap
# - should be a proper "markov" bootstrap
# - we will call this a "primer"
# sudoko_key
# - should be a proper "markov" key
def _check_boot_strap(boot_strap, sudoku_key):
assert isinstance(boot_strap, bytes), '"boot_strap" must be a bytes object'
assert len(boot_strap) == len(sudoku_key) - 1, \
'"boot_strap" length must be one less than "sudoku_key" length'
item = sudoku_key[0]
assert len(set(item)) == len(set(item).union(set(boot_strap))), \
'"boot_strap" may only have bytes found in "sudoku_key"'
################################################################################
# This section includes functions capable
# of creating the required key and bootstrap.
# bytes_set should be any collection of bytes
# - it should be possible to create a set from them
# - these should be the bytes on which encryption will follow
# word_size
# - this will be the size of the "markov" chains program uses
# - this will be the number of dimensions the "grid" will have
# - one less character will make up bootstrap (or primer)
def make_sudoku_key(bytes_set, word_size):
key_set = set(bytes_set)
blocks = []
for block in range(word_size):
blocks.append(bytes(CRYPT.sample(key_set, len(key_set))))
return tuple(blocks)
# sudoko_key
# - should be a proper "markov" key
def make_boot_strap(sudoku_key):
block = sudoku_key[0]
return bytes(CRYPT.choice(block) for byte in range(len(sudoku_key) - 1))
################################################################################
# This section contains functions needed to
# create the multidimensional encryption grid.
# sudoko_key
# - should be a proper "markov" key
def make_grid(sudoku_key):
grid = expand_array(sudoku_key[0], sudoku_key[1])
for block in sudoku_key[2:]:
grid = expand_array(grid, block)
return grid
# grid
# - should be an X dimensional grid from make_grid
# block_size
# - comes from length of one block in a sudoku_key
def make_decode_grid(grid, block_size):
cache = []
for part in range(0, len(grid), block_size):
old = grid[part:part+block_size]
new = [None] * block_size
key = sorted(old)
for index, byte in enumerate(old):
new[key.index(byte)] = key[index]
cache.append(bytes(new))
return b''.join(cache)
# grid
# - should be an X dimensional grid from make_grid
# block
# - should be a block from a sudoku_key
# - should have same unique bytes as the expanding grid
def expand_array(grid, block):
cache = []
grid_size = len(grid)
block_size = len(block)
for byte in block:
index = grid.index(bytes([byte]))
for part in range(0, grid_size, block_size):
cache.append(grid[part+index:part+block_size])
cache.append(grid[part:part+index])
return b''.join(cache)
################################################################################
# The first three functions can be used to check an encryption
# grid. The eval_index function is used to evaluate a grid cell.
# grid
# - grid object to be checked
# - grid should come from the make_grid function
# - must have unique bytes along each axis
# block_size
# - comes from length of one block in a sudoku_key
# - this is the length of one edge along the grid
# - each axis is this many unit long exactly
# word_size
# - this is the number of blocks in a sudoku_key
# - this is the number of dimensions in a grid
# - this is the length needed to create a needed markon chain
def check_grid(grid, block_size, word_size):
build_index(grid, block_size, word_size, [])
# create an index to access the grid with
def build_index(grid, block_size, word_size, index):
for number in range(block_size):
index.append(number)
if len(index) == word_size:
check_cell(grid, block_size, word_size, index)
else:
build_index(grid, block_size, word_size, index)
index.pop()
# compares the contents of a cell along each grid axis
def check_cell(grid, block_size, word_size, index):
master = eval_index(grid, block_size, index)
for axis in range(word_size):
for value in range(block_size):
if index[axis] != value:
copy = list(index)
copy[axis] = value
slave = eval_index(grid, block_size, copy)
assert slave != master, 'Cell not unique along axis!'
# grid
# - grid object to be accessed and evaluated
# - grid should come from the make_grid function
# - must have unique bytes along each axis
# block_size
# - comes from length of one block in a sudoku_key
# - this is the length of one edge along the grid
# - each axis is this many unit long exactly
# index
# - list of coordinates to access the grid
# - should be of length word_size
# - should be of length equal to number of dimensions in the grid
def eval_index(grid, block_size, index):
offset = 0
for power, value in enumerate(reversed(index)):
offset += value * block_size ** power
return grid[int(offset)]
################################################################################
# The following functions act as a suite that can ultimately
# encrpyt strings, though other functions can be built from them.
# bytes_obj
# - the bytes to encode
# byte_map
# - byte tranform map for inserting into the index
# grid
# - X dimensional grid used to evaluate markov chains
# index
# - list that starts the index for accessing grid (primer)
# - it should be of length word_size - 1
# block_size
# - length of each edge in a grid
def _encode(bytes_obj, byte_map, grid, index, block_size):
cache = bytes()
index = [0] + index
for byte in bytes_obj:
if byte in byte_map:
index.append(byte_map[byte])
index = index[1:]
cache += bytes([eval_index(grid, block_size, index)])
else:
cache += bytes([byte])
return cache, index[1:]
# bytes_obj
# - the bytes to encode
# sudoko_key
# - should be a proper "markov" key
# - this key will be automatically checked for correctness
# boot_strap
# - should be a proper "markov" bootstrap
def encrypt(bytes_obj, sudoku_key, boot_strap):
_check_sudoku_key(sudoku_key)
_check_boot_strap(boot_strap, sudoku_key)
# make byte_map
array = sorted(sudoku_key[0])
byte_map = dict((byte, value) for value, byte in enumerate(array))
# create two more arguments for encode
grid = make_grid(sudoku_key)
index = list(map(byte_map.__getitem__, boot_strap))
# run the actual encoding algorithm and create reversed map
code, index = _encode(bytes_obj, byte_map, grid, index, len(sudoku_key[0]))
rev_map = dict(reversed(item) for item in byte_map.items())
# fix the boot_strap and return the results
boot_strap = bytes(rev_map[number] for number in index)
return code, boot_strap
# string
# - should be the string that you want encoded
# word_size
# - length you want the markov chains to be of
# plain_chars
# - characters that you do not want to encrypt
def encrypt_str(string, word_size, plain_chars=''):
byte_obj = string.encode(CODEC)
encode_on = set(byte_obj).difference(set(plain_chars.encode()))
sudoku_key = make_sudoku_key(encode_on, word_size)
boot_strap = make_boot_strap(sudoku_key)
cyphertext = encrypt(byte_obj, sudoku_key, boot_strap)[0]
# return encrypted string, key, and original bootstrap
return cyphertext.decode(CODEC), sudoku_key, boot_strap
def grid_size(string, word_size, plain_chars):
encode_on = set(string.encode()).difference(set(plain_chars.encode()))
return len(encode_on) ** word_size
################################################################################
# The following functions act as a suite that can ultimately
# decrpyt strings, though other functions can be built from them.
# bytes_obj
# - the bytes to encode
# byte_map
# - byte tranform map for inserting into the index
# grid
# - X dimensional grid used to evaluate markov chains
# index
# - list that starts the index for accessing grid (primer)
# - it should be of length word_size - 1
# block_size
# - length of each edge in a grid
def _decode(bytes_obj, byte_map, grid, index, block_size):
cache = bytes()
index = [0] + index
for byte in bytes_obj:
if byte in byte_map:
index.append(byte_map[byte])
index = index[1:]
decoded = eval_index(grid, block_size, index)
index[-1] = byte_map[decoded]
cache += bytes([decoded])
else:
cache += bytes([byte])
return cache, index[1:]
# bytes_obj
# - the bytes to decode
# sudoko_key
# - should be a proper "markov" key
# - this key will be automatically checked for correctness
# boot_strap
# - should be a proper "markov" bootstrap
def decrypt(bytes_obj, sudoku_key, boot_strap):
_check_sudoku_key(sudoku_key)
_check_boot_strap(boot_strap, sudoku_key)
# make byte_map
array = sorted(sudoku_key[0])
byte_map = dict((byte, value) for value, byte in enumerate(array))
# create two more arguments for decode
grid = make_grid(sudoku_key)
grid = make_decode_grid(grid, len(sudoku_key[0]))
index = list(map(byte_map.__getitem__, boot_strap))
# run the actual decoding algorithm and create reversed map
code, index = _decode(bytes_obj, byte_map, grid, index, len(sudoku_key[0]))
rev_map = dict(reversed(item) for item in byte_map.items())
# fix the boot_strap and return the results
boot_strap = bytes(rev_map[number] for number in index)
return code, boot_strap
# string
# - should be the string that you want decoded
# word_size
# - length you want the markov chains to be of
# plain_chars
# - characters that you do not want to encrypt
def decrypt_str(string, sudoku_key, boot_strap):
byte_obj = string.encode(CODEC)
plaintext = decrypt(byte_obj, sudoku_key, boot_strap)[0]
# return encrypted string, key, and original bootstrap
return plaintext.decode(CODEC)
################################################################################
def convert(number):
"Convert bytes into human-readable representation."
assert 0 < number < 1 << 110, 'Number Out Of Range'
ordered = reversed(tuple(format_bytes(partition_number(number, 1 << 10))))
cleaned = ', '.join(item for item in ordered if item[0] != '0')
return cleaned
################################################################################
def partition_number(number, base):
"Continually divide number by base until zero."
div, mod = divmod(number, base)
yield mod
while div:
div, mod = divmod(div, base)
yield mod
def format_bytes(parts):
"Format partitioned bytes into human-readable strings."
for power, number in enumerate(parts):
yield '{} {}'.format(number, format_suffix(power, number))
def format_suffix(power, number):
"Compute the suffix for a certain power of bytes."
return (PREFIX[power] + 'byte').capitalize() + ('s' if number != 1 else '')
################################################################################
PREFIX = ' kilo mega giga tera peta exa zetta yotta bronto geop'.split(' ')
################################################################################
if __name__ == '__main__':
root = Tk()
root.title('Markov Demo')
demo = MarkovDemo(root)
root.mainloop()
Strings are by definition a sequence of bytes that only have meaning when interpreted with the knowledge of the encoding. That's one reason why the equivalent of Python 2's string type in Python 3 is the bytes type. As long as you know the encoding of the strings you're working with, I'm not sure you specifically need to recode it just to compress/encrypt it. Details of what you're actually doing might make a difference, though.
Python's decode has error settings. The default is strict which throws an exception.
Wherever you are doing the decoding, you can specify 'ignore' or 'replace' as a setting, and this will take care of your problems.
Please see the codecs documentation.
In Python HOWTOs from the Python v3.1.1 documentation, there is a helpful section regarding Unicode HOWTO. The table of content contains an entry to Python’s Unicode Support that explains string & byte.
The String Type
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
unexpected code byte
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
Converting to Bytes
>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'ꀀabcd'
One possible solution to the problem listed above involves covert all occurrences of
.encode(CODEC) with .encode(CODEC, 'ignore'). Likewise, all .decode(CODEC) become .decode(CODEC, 'ignore').