Need Assistance in Calculating Checksum - python

I am working on an interface in Python to a home automation system (ElkM1). I have sample code in C# below which apparently correctly calculates the checksum needed when sending messages to this system. I put together the Python code below, but it doesn't appear to be returning the correct value.
According to the documentation, the checksum of the message needs to be the sum of the ASCII values of the message in mod256, then taken as 2s complement. From their manual: "This is the hexadecimal two's complement of the modulo-256 sum of the ASCII values of all characters in the message excluding the checksum itself and the CR-LF terminator at the end of the message. Permissible characters are ASCII 0-9 and upper case A-F. When all the characters are added to the Checksum, the value should equal 0."
The vendor has a tool which will calculate the correct checksum. As test data I have been using '00300005000', which should return a checksum of 74.
My code returns 18
Thanks in advance.
My Code (Python)
def calc_checksum(string):
    '''
    Calculates checksum for sending commands to the ELKM1.
    Sums the ASCII character values mod256 and takes
    the Twos complement
    '''
    sum = 0
    for i in range(len(string)):
        sum = sum + ord(string[i])
    temp = sum % 256  # mod256
    rem = temp ^ 256  # inverse
    cc1 = hex(rem)
    cc = cc1.upper()
    p = len(cc)
    return cc[p-2:p]
Their Code C#:
private string checksum(string s)
{
    int sum = 0;
    foreach (char c in s)
        sum += (int)c;
    sum = -(sum % 256);
    return ((byte)sum).ToString("X2");
}

FWIW, here's a literal translation of the C# code into Python:
def calc_checksum(s):
    sum = 0
    for c in s:
        sum += ord(c)
    sum = -(sum % 256)
    return '%02X' % (sum & 0xFF)  # zero-padded, like C#'s "X2"

print(calc_checksum('00300005000'))
It outputs E8 for the message shown, which is different from both your result and the expected 74. Given the description in the manual and doing the calculations by hand, I don't see how their answer could be 74. How do you know that's the correct answer?
After seeing Mark Ransom's comment that the C# code does indeed return E8, I spent some time debugging your Python code and found out why it doesn't produce the same result. One problem is that it doesn't calculate the two's complement correctly on the line with the comment #inverse in your code. There are at least a couple of ways to do that correctly.
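To make the #inverse problem concrete, here is a small Python 3 check (a sketch, not part of the original answer) that compares the question's `temp ^ 256` with the two correct forms, each masked to one byte. XOR with 256 only sets bit 8, so the low byte is left unchanged, which is why the question's code ended up printing 18.

```python
# Sum of the test message modulo 256: 536 % 256 == 24 (0x18)
temp = sum(map(ord, "00300005000")) % 256

buggy = temp ^ 256                     # only sets bit 8: 0x118, low byte still 0x18
hard_way = ((temp ^ 0xFF) + 1) & 0xFF  # one's complement + 1, kept to one byte
easy_way = -temp & 0xFF                # negation, kept to one byte

print(hex(buggy), hex(hard_way), hex(easy_way))  # 0x118 0xe8 0xe8

# The two correct forms agree for every possible byte value.
assert all(((t ^ 0xFF) + 1) & 0xFF == -t & 0xFF for t in range(256))
```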
A second problem is that the way the hex() function handles negative numbers is not what you might expect. With -24, the two's complement value in this case, it produces -0x18, not 0xffe8 or something similar. This means that just taking the last two characters of the uppercased result would be incorrect. A really easy way to get the right digits is to convert just the lower byte of the value to uppercase hexadecimal using the % string formatting operator. Here's a working version of your function:
def calc_checksum(string):
    '''
    Calculates checksum for sending commands to the ELKM1.
    Sums the ASCII character values mod256 and takes
    the two's complement.
    '''
    sum = 0
    for i in range(len(string)):
        sum = sum + ord(string[i])
    temp = sum % 256  # mod256
    # rem = (temp ^ 0xFF) + 1  # two's complement, hard way (one's complement + 1)
    rem = -temp  # two's complement, easier way
    return '%02X' % (rem & 0xFF)  # lower byte as two zero-padded hex digits
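The hex() pitfall described above is easy to see directly. This Python 3 snippet (an illustration, not from the original answer) contrasts hex() on the negative value with masking to the low byte first. It also shows why a zero-padded `%02X` is the format that matches C#'s `X2`: plain `%2X` pads small values with a space instead of a zero.

```python
rem = -24  # two's complement of 0x18 as a plain Python int

print(hex(rem))               # -0x18 : sign and magnitude, not a byte pattern
print(hex(rem & 0xFF))        # 0xe8  : low byte only
print('%02X' % (rem & 0xFF))  # E8

# '%2X' pads with a space, '%02X' with a zero, which matters for small checksums.
print(repr('%2X' % 5), repr('%02X' % 5))  # ' 5' '05'
```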
A more Pythonic (and faster) implementation would be a one-liner like this, which makes use of the built-in sum() function:
def calc_checksum(s):
    """
    Calculates checksum for sending commands to the ELKM1.
    Sums the ASCII character values mod256 and returns
    the lower byte of the two's complement of that value.
    """
    return '%02X' % (-(sum(ord(c) for c in s) % 256) & 0xFF)
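The manual quoted in the question also describes a self-check: adding the checksum value to the sum of the message characters should give 0 modulo 256. A minimal Python 3 sketch of both directions follows, assuming a plain ASCII message without the CR-LF terminator; `elk_checksum` and `checksum_ok` are hypothetical names, not part of any Elk library. Note that the vendor's 74 fails this check for the test message, while E8 passes.

```python
def elk_checksum(message: str) -> str:
    """Hex two's complement of the modulo-256 sum of the message's ASCII codes."""
    total = sum(message.encode("ascii")) % 256  # modulo-256 sum
    return "%02X" % (-total & 0xFF)             # negate, keep one byte, 2 hex digits


def checksum_ok(message: str, checksum: str) -> bool:
    """Manual's self-check: character sum plus checksum value is 0 modulo 256."""
    return (sum(message.encode("ascii")) + int(checksum, 16)) % 256 == 0


print(elk_checksum("00300005000"))       # E8
print(checksum_ok("00300005000", "E8"))  # True
print(checksum_ok("00300005000", "74"))  # False
```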

Related

Calculate the checksum using xor in python 3

I am collecting some codes from an IP device, and I am struggling to calculate its checksum.
For example, this is the package that I collected using a simple socket in python:
b'\x07\x94ES(\xff\xceY:'
Converting it to a more human-readable form using .hex(), I got this:
0794455328ffce593a
3a is the given checksum. I should be able to get the same value by XORing the bytes (like 07^94^45^53^28^ff^ce^59^FF = 3a), but I can't figure out how. I tried to XOR the values as integers, but the result was way off.
BTW, 07 is the number of bytes of the package.
Another string example is
b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I'
Anyone have an idea?
With a little guesswork and two examples, it seems that the XOR algorithm used flips all the bits at some point. Doing that flip makes the values in the examples match.
data_list = [b'\x07\x94ES(\xff\xceY:', b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I']
for data in data_list:
    value = data[0]
    for d in data[1:-1]:
        value ^= d
    checksum = value ^ 0xFF  # negate all the bits
    if checksum == data[-1]:
        print("checksum match for {}".format(data))
    else:
        print("checksum DOES NOT MATCH for {}".format(data))
prints:
checksum match for b'\x07\x94ES(\xff\xceY:'
checksum match for b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I'
Not sure if it helps future readers, but at least this is solved.
If you're curious, here's a direct port of the C# implementation you put in a comment:
def calculate(data):
    xor = 0
    for byte in data:
        xor ^= byte
    xor ^= 0xff
    return xor
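A usage sketch for that port, in Python 3: pass everything except the final byte and compare the result with that final byte. The `verify` helper is hypothetical; the two packets are the ones from the question. The same function can also append a checksum to an outgoing packet body.

```python
def calculate(data: bytes) -> int:
    """XOR all bytes together, then flip every bit (same as the port above)."""
    xor = 0
    for byte in data:
        xor ^= byte
    return xor ^ 0xFF


def verify(packet: bytes) -> bool:
    """Hypothetical helper: treats the last byte of a packet as its checksum."""
    return calculate(packet[:-1]) == packet[-1]


packets = [
    b'\x07\x94ES(\xff\xceY:',
    b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I',
]
for p in packets:
    print(p.hex(), verify(p))  # both lines end in True

# Building a packet: append the checksum byte to the body.
body = b'\x07\x94ES(\xff\xceY'
print(body + bytes([calculate(body)]) == packets[0])  # True
```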
I didn't realise the last byte was in fact the checksum.

python calculate signed crc32 integer to checksum

I have the following string I would like to calculate a checksum for.
3556.5:200:3557.0:2:3556.4:84:3557.4:4:3555.7:6:3557.7:14:3555.1:46:3558.6:21:3552.9:14:3558.7:10:3552.8:194:3558.8:106:3552.7:10:3558.9:10:3552.6:25:3560.2:178:3552.5:4:3560.5:111:3551.7:1:3561.7:1:3551.6:65:3562.5:18:3551.0:103:3562.6:111:3550.7:3:3562.7:3:3550.6:4:3562.8:185:3550.5:1:3563.7:1:3550.3:84:3564.2:1:3550.2:156:3564.8:153:3550.0:82:3565.0:400:3549.7:1:3565.9:60:3548.4:104:3566.1:20:3547.2:177:3566.5:40:3545.9:1:3568.0:20:3545.1:11:3569.4:12:3545.0:71:3570.0:82:3544.9:1:3570.6:4
I do it the following way:
import zlib
string2 = string.encode('ascii')
checksum = zlib.crc32(string2)
This gives me an integer of 3467096777. However, the server provider says it should be -949017128. Additionally, I tried many variants of the string and always ended up with a positive number, which makes me suspect that my way of calculating a signed crc32 integer is wrong.
I converted the -949017128 via the following
checksum_server = -949017128 & 0xffffffff
it yields 3345950168, which is still different from mine.
Is there a way to calculate the string out of the signed crc32 integer -949017128?
I think it is the BTC's price at okex. What a good time!
zlib.crc32 returns an unsigned number, which is different from the signed number described in the API documentation. If the server check_sum is positive, the two should be equal; if the server check_sum is negative, you need to check as below (there is probably a better solution):
import zlib
check_sum = zlib.crc32(checksum_str.encode("utf-8"))
if (server_check_sum < 0 and 2 ** 32 - check_sum + server_check_sum == 0) or server_check_sum == check_sum:
    print(f"{instrument_id}: checksum successful")
You must ensure the string is correctly formatted: make sure no extra "0" gets added when you do the type conversion.
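A less ad hoc way to do the comparison, sketched in Python 3: reinterpret zlib.crc32's unsigned result as a signed 32-bit integer before comparing. The helper names are hypothetical; the conversion itself is plain two's-complement arithmetic, and it maps 3345950168 (the masked server value from the question) back to -949017128. In Python 3, zlib.crc32 always returns an unsigned value; the `& 0xFFFFFFFF` mask makes the helper accept either form.

```python
import zlib


def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def crc32_signed(text: str) -> int:
    """CRC-32 of the UTF-8 encoded text, as a signed 32-bit integer."""
    return to_signed32(zlib.crc32(text.encode("utf-8")))


def checksum_matches(text: str, server_checksum: int) -> bool:
    # Compare in signed form so positive and negative server values both work.
    return crc32_signed(text) == to_signed32(server_checksum)


print(to_signed32(3345950168))  # -949017128
print(to_signed32(3467096777))  # -827870519, so the string itself still differs
```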

Why can this code generate a random password?

Here is a snippet of code for generating a password.
I have two questions about it; could you please help me understand?
urandom(6): the help for urandom says it returns n random bytes suitable for cryptographic use, so it will return 6 bytes. Are these 6 ASCII characters?
ord(c): this gets the decimal value of each of the bytes above. Why convert to decimal here?
Help for urandom:
def urandom(n): # real signature unknown; restored from __doc__
    """
    urandom(n) -> str
    Return n random bytes suitable for cryptographic use.
    """
    return ""
Python script:
from os import urandom
letters = "ABCDEFGHJKLMNPRSTUVWXYZ"
password = "".join(letters[ord(c) % len(letters)] for c in urandom(6))
urandom returns random bytes, each with a value between 0 and 255. The sample code uses that value and the modulo operator (%) to convert it into a value between 0 and 22, so that it can return one of the 23 letters (I, O, and Q are excluded so they are not confused with numbers).
Note that it is not a perfectly balanced algorithm as it would favour the first 3 letters (A, B, and C) more, because 256 is not divisible by 23 and 256 % 23 is 3.
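The bias is small but easy to measure. This Python 3 sketch runs every possible byte value through the same modulo step and counts the letters picked. It then shows one unbiased alternative, `secrets.choice` (Python 3.6+), which is a swapped-in technique and not part of the original snippet.

```python
import secrets
from collections import Counter

letters = "ABCDEFGHJKLMNPRSTUVWXYZ"  # 23 letters; I, O and Q are omitted

# Feed every byte value 0-255 once: how often is each letter picked?
counts = Counter(letters[b % len(letters)] for b in range(256))
print(counts["A"], counts["C"], counts["D"], counts["Z"])  # 12 12 11 11

# Unbiased alternative: secrets.choice picks uniformly from the sequence.
password = "".join(secrets.choice(letters) for _ in range(6))
print(password)
```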
The ord() function takes a string containing a single character and returns its Unicode index (code point).
ex.
ord("A") => 65
ord("£") => 163
It is not used to get the decimal base of a byte as you mentioned, but rather its Unicode Index (its place in the Unicode Table).
P.S.: Even though it returns the Unicode index, that doesn't mean its range always covers the whole Unicode table: some Python builds (for example, narrow builds of Python 2) cannot represent code points above U+FFFF as single characters.
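The snippet in the question is Python 2 code: its help text shows urandom returning a str, so each c is a one-character string and ord(c) turns it into a byte value from 0 to 255. In Python 3, os.urandom returns bytes, and iterating over bytes already yields integers, so the ord() call is unnecessary there and actually fails. A minimal Python 3 sketch of the same line:

```python
from os import urandom

letters = "ABCDEFGHJKLMNPRSTUVWXYZ"

raw = urandom(6)             # a bytes object of length 6
print(type(raw), list(raw))  # <class 'bytes'> and six ints in the range 0-255

# Iterating over bytes gives ints directly, so no ord() is needed.
password = "".join(letters[b % len(letters)] for b in raw)
print(password)

try:
    ord(raw[0])  # raw[0] is already an int
except TypeError as exc:
    print("ord() on an int fails:", exc)
```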

Get the string that is the midpoint between two other strings

Is there a library or code snippet available that can take two strings and return the exact or approximate mid-point string between the two strings?
Preferably the code would be in Python.
Background:
This seems like a simple problem on the surface, but I'm kind of struggling with it:
Clearly, the midpoint string between "A" and "C" would be "B".
With base64 encoding, the midpoint string between "A" and "B" would probably be "Ag"
With UTF-8 encoding, I'm not sure what the valid midpoint would be because the middle character seems to be a control character: U+0088 c2 88 <control>
Practical Application:
The reason I am asking is that I was hoping to write a map-reduce type algorithm to read all of the entries out of our database and process them. The primary keys in the database are UTF-8 encoded strings with random distributions of characters. The database we are using is Cassandra.
I was hoping to get the lowest key and the highest key out of the database, then break that up into two ranges by finding the midpoint, then break those two ranges up into two smaller sections by finding each of their midpoints, until I had a few thousand sections. Then I could read each section asynchronously.
Example, if the strings were base-16 encoded (some of the midpoints are approximate):
Starting highest and lowest keys:      '000'          'FFF'
                                      /     \        /     \
                                 '000'      '8'   '8'      'FFF'
                                 /  \      /  \   /  \     /   \
Result:                      '000' '4' '4' '8' '8' 'B8' 'B8' 'FFF'
(After 3 levels of recursion)
Unfortunately, not all sequences of bytes are valid UTF-8, so it's not trivial to just take the midpoint of the UTF-8 values, like the following.
def midpoint(s, e):
    '''Midpoint of start and end strings'''
    (sb, eb) = (int.from_bytes(bytes(x, 'utf-8'), byteorder='big') for x in (s, e))
    midpoint = (sb + eb) // 2  # integer arithmetic: no float rounding on long keys
    # Smallest number of bytes that holds the value (no spurious leading zero byte)
    midpoint_bytes = midpoint.to_bytes((midpoint.bit_length() + 7) // 8, byteorder='big')
    return midpoint_bytes.decode('utf-8')
Basically this code converts each string into an integer represented by the sequence of bytes in memory, finds the midpoint of those two integers, and attempts to interpret the "midpoint" bytes as UTF-8 again.
Depending on exactly what behavior you would like, the next step could be to replace the invalid bytes in midpoint_bytes with some kind of replacement character to form a valid UTF-8 string. For your problem it might not matter much exactly which character you use for the replacement so long as you're consistent.
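A sketch of that replacement idea in Python 3, using the `errors='replace'` option of bytes.decode, which substitutes U+FFFD for invalid byte sequences. It uses exact integer division and a byte count of `(bit_length + 7) // 8`; the function name `midpoint_replace` is hypothetical.

```python
def midpoint_replace(s: str, e: str) -> str:
    """Byte-wise midpoint of two strings; invalid UTF-8 becomes U+FFFD."""
    sb, eb = (int.from_bytes(x.encode("utf-8"), byteorder="big") for x in (s, e))
    mid = (sb + eb) // 2  # exact integer midpoint, no float rounding
    mid_bytes = mid.to_bytes((mid.bit_length() + 7) // 8, byteorder="big")
    return mid_bytes.decode("utf-8", errors="replace")


print(midpoint_replace("A", "C"))              # B
print(ascii(midpoint_replace("A", "\u00c0")))  # 'a\ufffd'
```

In the second call the midpoint bytes are b'a\xe0', and 0xE0 starts a multi-byte sequence that never completes, so it is replaced.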
However, since you're trying to partition the data and don't seem to care too much about the string representation of the midpoint, another option is to just leave the midpoint representation as an integer and convert the keys to integers while doing the partition. Depending on the scale of your problem this option may or may not be feasible.
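A sketch of the integer-only option, assuming every key fits in a fixed number of UTF-8 bytes (`width`, a hypothetical parameter). Right-padding with zero bytes keeps the integers in the same order as the keys' UTF-8 byte order (keys that differ only by trailing NUL characters map to the same integer), and the recursive split mirrors the halving described in the question. Both helper names are hypothetical.

```python
def key_to_int(key: str, width: int) -> int:
    """Map a key to an int that sorts like its UTF-8 bytes (hypothetical helper)."""
    raw = key.encode("utf-8")
    if len(raw) > width:
        raise ValueError("key is longer than %d bytes" % width)
    return int.from_bytes(raw.ljust(width, b"\x00"), byteorder="big")


def bisect_ranges(lo: int, hi: int, depth: int):
    """Halve [lo, hi) `depth` times, returning up to 2**depth (start, end) pairs."""
    if depth == 0 or hi - lo < 2:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return bisect_ranges(lo, mid, depth - 1) + bisect_ranges(mid, hi, depth - 1)


lo, hi = key_to_int("000", 3), key_to_int("FFF", 3)
for start, end in bisect_ranges(lo, hi, 3):
    print(hex(start), hex(end))
```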
Here's a general solution that gives an approximate midpoint m between any two Unicode strings a and b, such that a < m < b if possible:
from os.path import commonprefix

# This should be set according to the range and frequency of
# characters used.
MIDCHAR = u'm'

def midpoint(a, b):
    prefix = commonprefix((a, b))
    p = len(prefix)
    # Find the codepoints at the position where the strings differ.
    ca = ord(a[p]) if len(a) > p else None
    cb = ord(b[p])
    # Find the approximate middle code point.
    cm = (cb // 2 if ca is None else (ca + cb) // 2)
    # If a middle code point was found, add it and return.
    # (ca is None means a is a prefix of b, so any cm < cb will do.)
    if (ca is None or ca < cm) and cm < cb:
        return prefix + chr(cm)
    # If b still has more characters after this, then just use
    # b's code point and return.
    if len(b) > p + 1:
        return prefix + chr(cb)
    # Otherwise, if cb == 0, then a and b are consecutive so there
    # is no midpoint. Return a.
    if cb == 0:
        return a
    # Otherwise, use part of a and an extra character so that
    # the result is greater than a.
    i = p + 1
    while i < len(a) and a[i] >= MIDCHAR:
        i += 1
    return a[:i] + MIDCHAR
The function assumes that a < b. Other than that, it should work with arbitrary Unicode strings, even ones containing u'\x00' characters. Note also that it may return strings containing u'\x00' or other nonstandard code points. If there is no midpoint due to b == a + u'\x00' then a is returned.

how to convert negative integer value to hex in python

I use python 2.6
>>> hex(-199703103)
'-0xbe73a3f'
>>> hex(199703103)
'0xbe73a3f'
Are the positive and negative values the same?
When I use calc, the value is FFFFFFFFF418C5C1.
Python's integers can grow arbitrarily large. In order to compute the raw two's-complement the way you want it, you would need to specify the desired bit width. Your example shows -199703103 in 64-bit two's complement, but it just as well could have been 32-bit or 128-bit, resulting in a different number of 0xf's at the start.
hex() doesn't do that. I suggest the following as an alternative:
def tohex(val, nbits):
    return hex((val + (1 << nbits)) % (1 << nbits))

print(tohex(-199703103, 64))
print(tohex(199703103, 64))
This prints out:
0xfffffffff418c5c1L
0xbe73a3fL
Because Python integers are arbitrarily large, you have to mask the values to limit conversion to the number of bits you want for your 2s complement representation.
>>> hex(-199703103 & (2**32-1)) # 32-bit
'0xf418c5c1L'
>>> hex(-199703103 & (2**64-1)) # 64-bit
'0xfffffffff418c5c1L'
Python displays the simple case of hex(-199703103) as a negative hex value (-0xbe73a3f) because the 2s complement representation would have an infinite number of Fs in front of it for an arbitrary precision number. The mask value (2**32-1 == 0xFFFFFFFF) limits this:
  FFF...FFFFFFFFFFFFFFFFFFFFFFFFF418c5c1
&                               FFFFFFFF
----------------------------------------
                                F418c5c1
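On Python 3 (where hex() no longer appends an L), the same masking works unchanged. int.to_bytes with `signed=True` is another way to get the fixed-width two's-complement bytes, and int.from_bytes with `signed=True` goes back the other way. A short sketch:

```python
val = -199703103

print(hex(val & 0xFFFFFFFF))          # 0xf418c5c1 (32-bit)
print(hex(val & 0xFFFFFFFFFFFFFFFF))  # 0xfffffffff418c5c1 (64-bit)

# to_bytes needs an explicit length, and signed=True for negative numbers.
print(val.to_bytes(8, byteorder="big", signed=True).hex())  # fffffffff418c5c1

# Reverse direction: read a 32-bit two's-complement hex string back as signed.
print(int.from_bytes(bytes.fromhex("f418c5c1"), byteorder="big", signed=True))  # -199703103
```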
Adding to Mark's answer, if you want a different output format, use
'{:X}'.format(-199703103 & (2**32-1))
For those who want leading zeros for positive numbers, try this:
val = 42
nbits = 16
'{:04X}'.format(val & ((1 << nbits)-1))
Thanks @tm1 for the inspiration!
