Verifying CRC32 of UDP given .jpg file of payload - python

I'm running a server that receives UDP packets that contain a 2 byte CRC32 polynomial and a variable number of XOR'd DWORDs corresponding to a .jpg file. The packets also contain the index of the corresponding DWORD in the .jpg file for each DWORD in the packet. I am also given the actual .jpg file.
For example, the packet could contain 10 DWORDs and specify the starting index as 3, so we can expect the received DWORDs to correspond with the 4th through 11th DWORDs making up the .jpg.
I want to verify the integrity of each of the DWORDs by comparing their CRC32 values against the CRC32 values of the corresponding DWORDs in the .jpg.
I thought that the proper way to do this would be to divide each DWORD in the packet and its corresponding DWORD in the .jpg by the provided CRC polynomial and analyze the remainder. If the remainders are the same after doing these divisions, then there is no problem with the packet. However, even with packets that are guaranteed to be correct, these remainders are never equal.
Here is how I'm reading the bytes of the actual .jpg and splitting them up into DWORDs:
def split(data):
    # Split the .jpg data into DWORDs
    chunks = []
    for i in range(0, len(data), 4):
        chunks.append(data[i: i + 4])
    return chunks

def get_image_bytes():
    with open("dog.jpg", "rb") as image:
        f = image.read()
        jpg_bytes = split(f)
    return jpg_bytes
Now I have verified my split() function works and to my knowledge, get_image_bytes() reads the .jpg correctly by calling image.read().
After receiving a packet, I convert each DWORD to binary and perform the mod 2 division like so:
jpg_bytes = get_image_bytes()
crc_key_bin = '1000110111100'  # binary representation of the received CRC32 polynomial
d_words = [b'\xc3\xd4)v', ... , b'a4\x96\xbb']
iteration = 0  # For simplicity, assume the packet specified that the starting index is 0

for d in d_words:
    d_bin = format(int(d.hex(), 16), "b")  # binary representation of the DWORD from the packet
    jpg_dword = format(int(jpg_bytes[iteration].hex(), 16), "b")  # binary representation of the corresponding DWORD in dog.jpg
    remainder1 = mod2div(d_bin, crc_key_bin)  # <--- These remainders should be
    remainder2 = mod2div(jpg_dword, crc_key_bin)  # <--- equal, but they're not!
    iteration += 1
I have tested the mod2div() function, and it returns the expected remainder after performing mod 2 division.
Where am I going wrong? I'm expecting the 2 remainders to be equal, but they never are. I'm not sure if the way I'm reading the bytes from the .jpg file is incorrect, if I'm performing the mod 2 division with the wrong values, or if I'm completely misunderstanding how to verify the CRC32 values. I'd appreciate any help.

First off, there's no such thing as a "2 byte CRC32 polynomial". A 32-bit CRC needs 32 bits to specify the polynomial.
Second, a CRC polynomial is something that is fixed for a given protocol. Why is a CRC polynomial being transmitted, as opposed to simply specified? Are you sure it's the polynomial? Where is this all documented?
What does "XOR'd DWORDs" mean? Exclusive-or'd with what?
And, yes, I think you are completely misunderstanding how to verify CRC values. All you need to do is calculate the check values on the message the same way it was done at the other end, and compare that to the check values that were transmitted. (That is true for any check value, not just CRCs.) However, I cannot tell from your description what was calculated on what, or how.
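In other words, something like this minimal sketch, assuming (since your description doesn't say) that the sender computed a standard CRC-32 over the payload and transmitted it alongside; zlib.crc32 here stands in for whatever CRC definition your protocol actually uses:

import zlib

def crc_ok(payload, received_crc):
    # Recompute the check value over the received bytes exactly the way
    # the sender did (assumed here: standard CRC-32), then compare it
    # with the transmitted check value.
    return (zlib.crc32(payload) & 0xFFFFFFFF) == received_crc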

Convert compression algorithm to decompression

Sorry for the simple question, but it's blowing my brain, since I'm not good at data structures.
First, I have an initial binary file with compressed raw data. My colleague helped me turn the bytes into an array of decimals in Python (the code is given below and works just fine, showing the result as a chart in pyplot).
Now, I want to do the reverse operation, i.e. turn an array of decimal numbers back into a binary file, but I'm totally stuck. Thank you very much in advance!
data_out = []

# decode 1st point
data_out.append(int.from_bytes(data_in[0:4], byteorder='big', signed=True))

i = 4
while i < len(data_in):
    # get next byte
    curr = int.from_bytes(data_in[i:i+1], byteorder='big', signed=False)
    if curr < 255:
        res = curr - 127
        data_out.append(res + data_out[-1])
        i = i + 1
    else:
        res = int.from_bytes(data_in[i+1:i+5], byteorder='little', signed=True)
        data_out.append(res)
        i = i + 5

from matplotlib import pyplot as plt
plt.plot(data_out)
plt.show()
The original stream of bytes was encoded as one- or four-byte integers. The first value is sent as a four-byte integer. After the first value, you have either one byte in the range 0..254, which represents a difference of -127 to 127, or you have 255 followed by a four-byte signed little-endian integer, which is the next value (not a difference). The idea is that if the integers are changing slowly from one to the next, this will compress the sequence by up to a factor of four by sending small differences as one byte instead of four. Though if you have too many differences that don't fit in a byte, this could expand the data by 25%, since non-difference values take five bytes instead of four.
To encode such a stream, you start by encoding the first value directly as four bytes, big endian (to match the decoder above). For each subsequent value, you subtract the previous value from this one. If the result is in the range -127 to 127, then add 127 and send that byte. Otherwise send a 255 byte, followed by the value (not the difference) as a four-byte signed little-endian integer.
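Here is a minimal sketch of such an encoder, assuming Python 3's int.to_bytes and the byte orders the decoder above actually uses (big-endian for the first value, little-endian for escaped values):

def encode(values):
    # First value: four-byte signed integer, big-endian (matching the decoder).
    out = bytearray(values[0].to_bytes(4, byteorder='big', signed=True))
    for prev, curr in zip(values, values[1:]):
        diff = curr - prev
        if -127 <= diff <= 127:
            # Small difference: a single byte in the range 0..254.
            out.append(diff + 127)
        else:
            # Escape byte 255, then the value itself (not the difference).
            out.append(255)
            out += curr.to_bytes(4, byteorder='little', signed=True)
    return bytes(out)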
As pointed out by @greybeard, there is an error in your colleague's code (assuming it was copied here correctly), in that res is not initialized. The first point decoding needs to be:
# decode 1st point
res = int.from_bytes(data_in[0:4], byteorder='big', signed=True)
data_out.append(res)

How to calculate CRC in python

My question is: how should I calculate a CRC? I know that when you calculate a CRC, you get a checksum that you need to append to the original data, and when you send the data and checksum together, the receiver can detect whether there are any errors by simply dividing the received data by a polynomial and checking for a remainder other than 0.
I don't know how to do the appending, since everyone only talks about using code like the following:
crc32 = crcmod.mkCrcFun(0x104c11db7, 0, False, 0xFFFFFFFF)
bytes_read = f.read(BUFFER_SIZE)
this_chunk_crc=crc32(bytes_read)#will return some integer
You're already calculating the CRC. You are apparently asking how to append the CRC, an integer, to your message, a byte string.
You would use crc.to_bytes(4, 'big') to convert the returned CRC to a string of four bytes in big-endian order. You can then append that to your message. E.g.:
msg += crc32(msg).to_bytes(4, 'big')
I picked big-endian order because your CRC is defined with a False in the third argument. If that had been True (a reflected CRC), then I would have picked little-endian order, 'little'.
Having done all that correctly, the CRC of the resulting message with CRC appended will always be the same constant. But not zero in this case. The constant for that CRC definition is 0x38fb2284, or 955982468 in decimal.
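You can see that constant for yourself by reusing the exact definition from your question; the message content here is arbitrary:

import crcmod

crc32 = crcmod.mkCrcFun(0x104c11db7, 0, False, 0xFFFFFFFF)
msg = b'hello world'                  # any message gives the same residue
msg += crc32(msg).to_bytes(4, 'big')  # append the CRC, big-endian
print(hex(crc32(msg)))                # 0x38fb2284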

Calculate the checksum using xor in python 3

So, I am collecting some codes from an IP device, and I am struggling to calculate its checksum.
For example, this is the package that I collected using a simple socket in python:
b'\x07\x94ES(\xff\xceY:'
Converting it to something more human-readable using .hex(), I got this:
0794455328ffce593a
3a is the given checksum. I should be able to get the same value by XORing the bytes (like 07^94^45^53^28^ff^ce^59^FF = 3a), but I can't figure out how. I tried to XOR the values as integers, but the result was way off.
BTW, 07 is the number of bytes of the package.
Another string example is
b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I'
Anyone have an idea?
With a little guesswork and the two examples, it seems that the XOR algorithm used flips all the bits at the end. Doing that flip makes the values of both examples match.
data_list = [b'\x07\x94ES(\xff\xceY:', b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I']

for data in data_list:
    value = data[0]
    for d in data[1:-1]:
        value ^= d
    checksum = value ^ 0xFF  # negate all the bits
    if checksum == data[-1]:
        print("checksum match for {}".format(data))
    else:
        print("checksum DOES NOT MATCH for {}".format(data))
prints:
checksum match for b'\x07\x94ES(\xff\xceY:'
checksum match for b'\x11\xb0\x11\x05\x03\x02\x08\x01\x08\x01\x03\x08\x03\n\x01\n\n\x01I'
Not sure if it helps future readers, but at least this is solved.
If you're curious, here's a direct port of the C# implementation you put in a comment:
def calculate(data):
    xor = 0
    for byte in data:
        xor ^= byte
    xor ^= 0xff
    return xor
I didn't realise the last byte was in fact the checksum.
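A quick check against the first sample packet; the slice excludes the final byte because, as noted above, that byte is the checksum itself:

packet = b'\x07\x94ES(\xff\xceY:'
assert calculate(packet[:-1]) == packet[-1]  # checksum byte is 0x3a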

Python f.read() and Octave fread(). => Reading a binary file showing the same values

I'm reading a binary file with signal samples both in Octave and Python.
The thing is, I want to obtain the same values for both codes, which is not the case.
The binary file is basically a signal in complex format (I, Q) recorded as 16-bit ints.
So, based on the Octave code:
[data, cnt_data] = fread(fid, 2 * secondOfData * fs, 'int16');
and then:
data = data(1:2:end) + 1i * data(2:2:end);
It seems simple: just reading the binary data as 16-bit ints, and then creating the final array of complex numbers.
Therefore I assume that in Python I need to do as follows:
rel=int(f.read(2).encode("hex"),16)
img=int(f.read(2).encode("hex"),16)
in_clean.append(complex(rel,img))
Ok, the main problem I have is that both real and imaginary parts values are not the same.
For instance, in Octave, the first value is: -20390 - 10053i
While in Python (applying the code above), the value is: (23216+48088j)
As the signs are different, the first thing I thought was that maybe the endianness of the computer that recorded the file and the one I'm using to read the file are different. So I turned to the unpack function, as it allows you to force the endian type.
I was not able to find an "int16" in the unpack documentation:
https://docs.python.org/2/library/struct.html
Therefore I went for the "i" option, adding "x" (padding bytes) in order to meet the 32-bit requirement from the table in the "struct" documentation.
So with:
struct.unpack("i","xx"+f.read(2))[0]
the result is (-1336248200-658802568j).
Using:
struct.unpack("<i","xx"+f.read(2))[0]
provides the same result.
With:
struct.unpack(">i","xx"+f.read(2))[0]
The value is: (2021153456+2021178328j)
With:
struct.unpack(">i",f.read(2)+"xx")[0]
The value is: (1521514616-1143441288j)
With:
struct.unpack("<i",f.read(2)+"xx")[0]
The value is: (2021175386+2021185723j)
I also tried with numpy and "frombuffer":
np.frombuffer(f.read(1).encode("hex"),dtype=np.int16)
which provides: (24885+12386j)
So, any idea about what I'm doing wrong? I'd like to obtain the same value as in Octave.
What is the proper way of reading and interpreting the values in Python so I can obtain the same value as in Octave by applying fread with an 'int16'?
I've been searching the Internet for an answer to this, but I was not able to find a method that provides the same value.
Thanks a lot
Best regards
It looks like the binary data in your question is 5ab0bbd8. To unpack signed 16 bit integers with struct.unpack, you use the 'h' format character. From that (23216+48088j) output, it appears that the data is encoded as little-endian, so we need to use < as the first item in the format string.
from struct import unpack
data = b'\x5a\xb0\xbb\xd8'
# The wrong way
rel=int(data[:2].encode("hex"),16)
img=int(data[2:].encode("hex"),16)
c = complex(rel, img)
print c
# The right way
rel, img = unpack('<hh', data)
c = complex(rel, img)
print c
output
(23216+48088j)
(-20390-10053j)
Note that rel, img = unpack('<hh', data) will also work correctly on Python 3.
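If you want to convert a whole buffer of interleaved I,Q pairs at once on Python 3, struct.iter_unpack steps through it in one go (a sketch; the four bytes after your sample are made up for illustration):

from struct import iter_unpack

buf = b'\x5a\xb0\xbb\xd8\x00\x01\xff\xff'  # two I,Q sample pairs
samples = [complex(re, im) for re, im in iter_unpack('<hh', buf)]
print(samples)  # [(-20390-10053j), (256-1j)]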
FWIW, in Python 3, you could also decode 2 bytes to a signed integer like this:
def int16_bytes_to_int(b):
    n = int.from_bytes(b, 'little')
    if n > 0x7fff:
        n -= 0x10000
    return n
The rough equivalent in Python 2 is:
def int16_bytes_to_int(b):
    lo, hi = b
    n = (ord(hi) << 8) + ord(lo)
    if n > 0x7fff:
        n -= 0x10000
    return n
But having to do that subtraction to handle signed numbers is annoying, and using struct.unpack is bound to be much more efficient.
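Since you already tried numpy: here is a sketch that mirrors the Octave fread in a couple of lines, assuming a hypothetical file name and that the whole capture fits in memory:

import numpy as np

# '<i2' is a little-endian signed 16-bit integer, the equivalent of
# Octave's 'int16' for this file.
raw = np.fromfile('capture.bin', dtype='<i2')
iq = raw[0::2] + 1j * raw[1::2]  # interleaved I,Q -> complex samples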

Convert I2C Sensor (DS1624) reading into number

First off, sorry for the confusing title. It's pretty late here and I wasn't able to come up with a better one.
So, I have an I2C temperature sensor that outputs the current temperature as a 16-bit word. Reading from LEFT to RIGHT, the 1st bit is the MSB and the 13th bit is the LSB, so 13 bits are payload and the last 3 bits are zeros. I want to read out that sensor with a Raspberry Pi and convert the data.
The first byte (8 bits) is the integer part of the current temperature. If and only if the temperature is negative, the two's complement of the entire word has to be built.
The second byte is the decimal part, which has to be multiplied by 0.03125.
So, just a couple of examples (TEMP DIGITAL OUTPUT (Binary) / DIGITAL OUTPUT (Hex), taken from the data sheet here http://datasheets.maximintegrated.com/en/ds/DS1624.pdf)
+125˚C | 01111101 00000000 | 7D00h
+25.0625˚C | 00011001 00010000 | 1910h
+½˚C | 00000000 10000000 | 0080h
0˚C | 00000000 00000000 | 0000h
-½˚C | 11111111 10000000 | FF80h
-25.0625˚C | 11100110 11110000 | E6F0h
-55˚C | 11001001 00000000 | C900h
Because of a difference in endianness, the byte order is reversed when reading the sensor, which is not a problem. For example, the first line would become 0x007D instead of 0x7D00, 0xE6F0 becomes 0xF0E6, and so on...
However, once I build the two's complement for negative values I'm not able to come up with a correct conversion.
What I came up with (not working for negative values) is:
import smbus
import time
import logging

class TempSensor:
    """
    Class to read out a DS1624 temperature sensor at a given address.
    DS1624 data sheet: http://datasheets.maximintegrated.com/en/ds/DS1624.pdf

    Usage:
    >>> from TempSensor import TempSensor
    >>> sensor = TempSensor(0x48)
    >>> print "%02.02f" % sensor.get_temperature()
    23.66
    """

    # Some constants
    DS1624_READ_TEMP = 0xAA
    DS1624_START = 0xEE
    DS1624_STOP = 0x22

    def __init__(self, address):
        self.address = address
        self.bus = smbus.SMBus(0)

    def __send_start(self):
        self.bus.write_byte(self.address, self.DS1624_START)

    def __send_stop(self):
        self.bus.write_byte(self.address, self.DS1624_STOP)

    def __read_sensor(self):
        """
        Gets the two-byte temperature value. As the DS1624 is big-endian and
        the Pi little-endian, the byte order is reversed: the second byte
        represents the integer part of the temperature and the first byte the
        fractional part, in terms of a 0.03125 multiplier. The fractional byte
        carries its value in its 5 most significant bits; the remaining 3 bits
        are set to zero.
        """
        return self.bus.read_word_data(self.address, self.DS1624_READ_TEMP)

    def __convert_raw_to_decimal(self, raw):
        # Check if temperature is negative
        negative = ((raw & 0x00FF) & 0x80) == 0x80
        if negative:
            # perform two's complement
            raw = (~raw) + 1
        # Remove the fractional part (first byte) by doing a bitwise AND with 0x00FF
        temp_integer = raw & 0x00FF
        # Remove the integer part (second byte) by doing a bitwise AND with 0xFF00,
        # shift the result 8 bits to the right, and another 3 bits to the right
        # because the fraction's LSB is the 5th bit
        temp_fractional = ((raw & 0xFF00) >> 8) >> 3
        return temp_integer + (0.03125 * temp_fractional)

    def run_test(self):
        logging.basicConfig(filename='debug.log', level=logging.DEBUG)
        # Examples taken from the data sheet (byte order swapped)
        values = [0x7D, 0x1019, 0x8000, 0, 0x80FF, 0xF0E6, 0xC9]
        for value in values:
            logging.debug('value: ' + hex(value) + ' result: ' + str(self.__convert_raw_to_decimal(value)))

    def get_temperature(self):
        self.__send_start()
        time.sleep(0.1)
        return self.__convert_raw_to_decimal(self.__read_sensor())
If you run the run_test() method you'll see what I mean. All negative values are wrong.
The results I get are:
DEBUG:root:value: 0x7d result: 125.0
DEBUG:root:value: 0x1019 result: 25.0625
DEBUG:root:value: 0x8000 result: 0.5
DEBUG:root:value: 0x0 result: 0.0
DEBUG:root:value: 0x80ff result: 1.46875
DEBUG:root:value: 0xf0e6 result: 26.03125
DEBUG:root:value: 0xc9 result: 55.96875
So, I've been banging my head on this one for hours, but it seems I'm lacking the fundamentals of bitwise operations. I believe that the problem is the masking with the bitwise AND when values are negative.
EDIT: There are a couple of implementations on the web. None of them works for negative temperatures. I tried it by actually putting the sensor in ice water. I haven't tried the Arduino C++ version yet, but from looking at the source code it seems it doesn't build the two's complement at all, so no negative temperatures either (https://github.com/federico-galli/Arduino-i2c-temperature-sensor-DS1624/blob/master/DS1624.cpp).
Two things: first, you've got your masks turned around (raw & 0x00ff is the fractional part, not the integer part); second, this is my solution, which, given the inputs in your table, seems to work for me:
import struct

def convert_temp(bytes):
    raw_temp = (bytes & 0xff00) >> 8
    raw_frac = (bytes & 0x00ff) >> 3
    a, b = struct.unpack('bb', '{}{}'.format(chr(raw_temp), chr(raw_frac)))
    return a + (0.03125 * b)
The struct module is really nifty when working with more basic data types (such as signed bytes). Hope this helps!
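Note that the chr()/format() trick only builds a byte string on Python 2. On Python 3 the same unpack could be written as follows (my adaptation, not part of the original answer):

import struct

def convert_temp(word):
    raw_temp = (word & 0xff00) >> 8
    raw_frac = (word & 0x00ff) >> 3
    # bytes([...]) builds the two-byte string that 'bb' then reads back
    # as two signed 8-bit integers.
    a, b = struct.unpack('bb', bytes([raw_temp, raw_frac]))
    return a + 0.03125 * b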
Edit: ignore the comment on your masks, I see my own error now. You can switch around the bytes, should be no problem.
Struct explanation:
struct.pack and struct.unpack both take two arguments. The first is a string that specifies the layout of your struct (think in terms of C). In C a struct is just a bunch of bytes, with some information about their types. The second argument is the data that you need decoded (which needs to be a string, explaining the nasty format()).
I can't seem to explain it much further; I think if you read up on the struct module, and structs in C, and realize that a struct is nothing more than a bunch of bytes, then you should be ok :).
As for the two's complement, that is the regular representation for a signed byte, so there is no need to convert. The problem you were having is that Python doesn't understand 8-bit integers and signedness. For instance, you might have a signed byte 0b10101010, but if you long() that in Python, it doesn't interpret it as a signed 8-bit int. My guess is it just puts it inside a 32-bit int, in which case the sign bit gets interpreted as just the eighth bit.
What struct.unpack('b', ...) does is actually interpret the bits as an 8-bit signed integer. Not sure if this makes it any clearer, but I hope it helps.
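A one-line illustration of the difference:

import struct

print(struct.unpack('B', b'\xe6')[0])  # 230: unsigned 8-bit
print(struct.unpack('b', b'\xe6')[0])  # -26: signed 8-bit, two's complement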
