How to calculate CRC in python - python

My question is how should I calculate CRC? I know that when you calculate CRC, you have a checksum and you need to append this to the original data, and when you send the data+checksum together, the receiver can detect if there are any errors by simply dividing the data by a polynomial and if there is any remainder other than 0.
I don't know how to append since everyone is only talking about using the codes like the following:
crc32 = crcmod.mkCrcFun(0x104c11db7, 0, False, 0xFFFFFFFF)
bytes_read = f.read(BUFFER_SIZE)
this_chunk_crc=crc32(bytes_read)#will return some integer

You're already calculating the CRC. You are apparently asking how to append the CRC, an integer, to your message, a byte string.
You would use crc.to_bytes(4, 'big') to convert the returned CRC to a string of four bytes in big-endian order. You can then append that to your message. E.g.:
msg += crc32(msg).to_bytes(4, 'big')
I picked big-endian order because your CRC is defined with a False in the third argument. If that had been True (a reflected CRC), then I would have picked little-endian order, 'little'.
Having done all that correctly, the CRC of the resulting message with CRC appended will always be the same constant. But not zero in this case. The constant for that CRC definition is 0x38fb2284, or 955982468 in decimal.

Related

Convert compression algorithm to decompression

sorry for the simple question, but it's blowing by brain, since I'm not good at data structure.
First, I have an initial binary file with compressed raw data. My colleague helped me out to turn the bytes into an array of decimals in Python (the code is given below and works just fine, showing the result as a chart in pyplot).
Now, I want to do the reverse operation, e.g. turn an array of decimal numbers into a binary file, but I'm totally stuck. Thank you very much in advance!
data_out = []
# decode 1st point
data_out.append(int.from_bytes(data_in[0:4], byteorder='big', signed=True))
i = 4
while i < len(data_in):
# get next byte
curr = int.from_bytes(data_in[i:i+1], byteorder='big', signed=False)
if curr < 255:
res = curr - 127
data_out.append(res + data_out[-1])
i = i + 1
else:
res = int.from_bytes(data_in[i+1:i+5], byteorder='little', signed=True)
data_out.append(res)
i = i + 5
from matplotlib import pyplot as plt
plt.plot(data_out)
plt.show()
The original stream of bytes was encoded as one or four-byte integers. The first value is sent as a four-byte integer. After the first value, you have either one byte in the range 0..254, which represents a difference of -127 to 127, or you have 255 followed by four-byte signed little-endian integer, which is the next value (not a difference). The idea is that if the integers are changing slowly from one to the next, this will compress the sequence by up to a factor of four by sending small differences as one byte instead of four. Though if you have too many differences that don't fit in a byte, this could expand the data by 25%, since non-difference values take five bytes instead of four.
To encode such a stream, you start by encoding the first value directly as four bytes, little endian. For each subsequent value, you subtract the previous value from this one. If the result is in the range -127 to 127, then add 127 and send that byte. Otherwise send a 255 byte, followed by the value (not the difference) as a four-byte signed little-endian integer.
As pointed out by #greybeard, there is an error in your colleague's code (assuming it was copied here correctly), in that res is not initialized. The first point decoding needs to be:
# decode 1st point
res = int.from_bytes(data_in[0:4], byteorder='big', signed=True)
data_out.append(res)

Verifying CRC32 of UDP given .jpg file of payload

I'm running a server that receives UDP packets that contain a 2 byte CRC32 polynomial and a variable number of XOR'd DWORDs corresponding to a .jpg file. The packets also contain the index of the corresponding DWORD in the .jpg file for each DWORD in the packet. I am also given the actual .jpg file.
For example, the packet could contain 10 DWORDs and specify the starting index as 3, so we can expect the received DWORDs to correspond with the 4th through 11th DWORDs making up the .jpg.
I want to verify the integrity of each of the DWORDs by comparing their CRC32 values against the CRC32 values of the corresponding DWORDs in the .jpg.
I thought that the proper way to do this would be to divide each DWORD in the packet and its corresponding DWORD in the .jpg by the provided CRC polynomial and analyze the remainder. If the remainders are the same after doing these divisions, then there is no problem with the packet. However, even with packets that are guaranteed to be correct, these remainders are never equal.
Here is how I'm reading the bytes of the actual .jpg and splitting them up into DWORDs:
def split(data):
# Split the .jpg data into DWORDs
chunks = []
for i in range(0, len(data), 4):
chunks.append(data[i: i + 4])
return chunks
def get_image_bytes():
with open("dog.jpg", "rb") as image:
f = image.read()
jpg_bytes = split(f)
return jpg_bytes
Now I have verified my split() function works and to my knowledge, get_image_bytes() reads the .jpg correctly by calling image.read().
After receiving a packet, I convert each DWORD to binary and perform the mod 2 division like so:
jpg_bytes = get_image_bytes()
crc_key_bin = '1000110111100' # binary representation of the received CRC32 polynomial
d_words = [b'\xc3\xd4)v', ... , b'a4\x96\xbb']
iteration = 0 # For simplicity, assume the packet specified that the starting index is 0
for d in d_words:
d_bin = format(int(d.hex(), 16), "b") # binary representation of the DWORD from the packet
jpg_dword = format(int(jpg_bytes[iteration].hex(), 16), "b") # binary representation of the corresponding DWORD in dog.jpg
remainder1 = mod2div(d_bin, crc_key_bin) # <--- These remainders should be
remainder2 = mod2div(jpg_dword, crc_key_bin) # <--- equal, but they're not!
iteration += 1
I have tested the mod2div() function, and it returns the expected remainder after performing mod 2 division.
Where am I going wrong? I'm expecting the 2 remainders to be equal, but they never are. I'm not sure if the way I'm reading the bytes from the .jpg file is incorrect, if I'm performing the mod 2 division with the wrong values, or if I'm completely misunderstanding how to verify the CRC32 values. I'd appreciate any help.
First off, there's no such thing as a "2 byte CRC32 polynomial". A 32-bit CRC needs 32-bits to specify the polynomial.
Second, a CRC polynomial is something that is fixed for a given protocol. Why is a CRC polynomial being transmitted, as opposed to simply specified? Are you sure it's the polynomial? Where is this all documented?
What does "XOR'd DWORDs" means? Exclusive-or'd with what?
And, yes, I think you are completely misunderstanding how to verify CRC values. All you need to do is calculate the check values on the message the same way it was done at the other end, and compare that to the check values that were transmitted. (That is true for any check value, not just CRCs.) However I cannot tell from your description what was calculated on what, or how.

Calculate the checksum using xor in python 3

So, i am collecting some codes from a ip device, and i am struggling to calc it's checksum.
For example, this is the package that I collected using a simple socket in python:
b'\x07\x94ES(\xff\xceY:'
Converting it to a more human readable using .hex(), i got this:
0794455328ffce593a
3a is the given checksum, i should be able to get the same value by xor the code (like 07^94^45^53^28^ff^ce^59^FF = 3a), but i can't figure out how. I tried to xor the values as integers, but the result was way off.
BTW, 07 is the number of bytes of the package.
Another string example is
b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I'
Anyone have an idea?
with a little guess work and 2 examples, it seems that the xor algorithm used is flipping all the bits somewhere. Doing that flip makes the value of the examples match.
data_list = [b'\x07\x94ES(\xff\xceY:', b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I']
for data in data_list:
value = data[0]
for d in data[1:-1]:
value ^= d
checksum = value ^ 0xFF # negate all the bits
if checksum == data[-1]:
print("checksum match for {}".format(data))
else:
print("checksum DOES NOT MATCH for {}".format(data))
prints:
checksum match for b'\x07\x94ES(\xff\xceY:'
checksum match for b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I'
not sure if it helps future readers but at least this is solved.
If you're curious, here's a direct port of the C# implementation you put in a comment:
def calculate(data):
xor = 0
for byte in data:
xor ^= byte
xor ^= 0xff
return xor
I didn't realise the last byte was in fact the checksum.

Converting bytes to signed numbers in Python

I am faced with a problem in Python and I think I don't understand how signed numbers are handled in Python. My logic works in Java where everything is signed so need some help in Python.
I have some bytes that are coded in HEX and I need to decode them and interpret them to numbers. The protocol are defined.
Say the input may look like:
raw = '016402570389FFCF008F1205DB2206CA'
And I decode like this:
bin_bytes = binascii.a2b_hex(raw)
lsb = bin_bytes[5] & 0xff
msb = bin_bytes[6] << 8
aNumber = int(lsb | msb)
print(" X: " + str(aNumber / 4000.0))
After dividing by 4000.0, X can be in a range of -0.000025 to +0.25.
This logic works when X is in positive range. When X is expected
to be negative, I am getting back a positive number.
I think I am not handling "msb" correctly when it is a signed number.
How should I handlehandle negative signed number in
Python?
Any tips much appreciated.
You can use Python's struct module to convert the byte string to integers. It takes care of endianness and sign extension for you. I guess you are trying to interpret this 16-byte string as 8 2-byte signed integers, in big-endian byte order. The format string for this is '>8h. The > character tells Python to interpret the string as big endian, 8 means 8 of the following data type, and h means signed short integers.
import struct
nums = struct.unpack('>8h', bin_bytes)
Now nums is a tuple of integers that you can process further.
I'm not quite sure if your data is little or big endian. If it is little-endian, you can use < to indicate that in the struct.unpack format string.

How can I convert two bytes of an integer back into an integer in Python?

I am currently using an Arduino that's outputting some integers (int) through Serial (using pySerial) to a Python script that I'm writing for the Arduino to communicate with X-Plane, a flight simulation program.
I managed to separate the original into two bytes so that I could send it over to the script, but I'm having a little trouble reconstructing the original integer.
I tried using basic bitwise operators (<<, >> etc.) as I would have done in a C++like program, but it does not seem to be working.
I suspect it has to do with data types. I may be using integers with bytes in the same operations, but I can't really tell which type each variable holds, since you don't really declare variables in Python, as far as I know (I'm very new to Python).
self.pot=self.myline[2]<<8
self.pot|=self.myline[3]
You can use the struct module to convert between integers and representation as bytes. In your case, to convert from a Python integer to two bytes and back, you'd use:
>>> import struct
>>> struct.pack('>H', 12345)
'09'
>>> struct.unpack('>H', '09')
(12345,)
The first argument to struct.pack and struct.unpack represent how you want you data to be formatted. Here, I ask for it to be in big-ending mode by using the > prefix (you can use < for little-endian, or = for native) and then I say there is a single unsigned short (16-bits integer) represented by the H.
Other possibilities are b for a signed byte, B for an unsigned byte, h for a signed short (16-bits), i for a signed 32-bits integer, I for an unsigned 32-bits integer. You can get the complete list by looking at the documentation of the struct module.
For example, using Big Endian encoding:
int.from_bytes(my_bytes, byteorder='big')
What you have seems basically like it should work, assuming the data stored in myline has the high byte first:
myline = [0, 1, 2, 3]
pot = myline[2]<<8 | myline[3]
print 'pot: {:d}, 0x{:04x}'.format(pot, pot) # outputs "pot: 515, 0x0203"
Otherwise, if it's low-byte first you'd need to do the opposite way:
myline = [0, 1, 2, 3]
pot = myline[3]<<8 | myline[2]
print 'pot: {:d}, 0x{:04x}'.format(pot, pot) # outputs "pot: 770, 0x0302"
This totally works:
long = 500
first = long & 0xff #244
second = long >> 8 #1
result = (second << 8) + first #500
If you are not sure of types in 'myline' please check Stack Overflow question How to determine the variable type in Python?.
To convert a byte or char to the number it represents, use ord(). Here's a simple round trip from an int to bytes and back:
>>> number = 3**9
>>> hibyte = chr(number / 256)
>>> lobyte = chr(number % 256)
>>> hibyte, lobyte
('L', '\xe3')
>>> print number == (ord(hibyte) << 8) + ord(lobyte)
True
If your myline variable is string or bytestring, you can use the formula in the last line above. If it somehow is a list of integers, then of course you don't need ord.

Categories