Calculate/validate bz2 (bzip2) CRC32 in Python - python

I'm trying to calculate/validate the CRC32 checksums for compressed bzip2 archives.
.magic:16 = 'BZ' signature/magic number
.version:8 = 'h' for Bzip2 ('H'uffman coding)
.hundred_k_blocksize:8 = '1'..'9' block-size 100 kB-900 kB
.compressed_magic:48 = 0x314159265359 (BCD (pi))
.crc:32 = checksum for this block
...
...
.eos_magic:48 = 0x177245385090 (BCD sqrt(pi))
.crc:32 = checksum for whole stream
.padding:0..7 = align to whole byte
http://en.wikipedia.org/wiki/Bzip2
So I know where the CRC checksums are in a bz2 file, but how would I go about validating them. What chunks should I binascii.crc32() to get both CRCs? I've tried calculating the CRC of various chunks, byte-by-byte, but have not managed to get a match.
Thank you. I'll be looking into the bzip2 sources and bz2 Python library code, to maybe find something, especially in the decompress() method.
Update 1:
The block headers are identified by the following tags as far as I can see. But tiny bz2 files do not contain the ENDMARK ones. (Thanks to adw, we've found out that one should look for bit shifted values of the ENDMARK, since the compressed data is not padded to bytes.)
#define BLOCK_HEADER_HI 0x00003141UL
#define BLOCK_HEADER_LO 0x59265359UL
#define BLOCK_ENDMARK_HI 0x00001772UL
#define BLOCK_ENDMARK_LO 0x45385090UL
This is from the bzlib2recover.c source, blocks seem to start always at bit 80, right before the CRC checksum, which should be omitted from the CRC calculation, as one can't CRC its own CRC to be the same CRC (you get my point).
searching for block boundaries ...
block 1 runs from 80 to 1182
Looking into the code that calculates this.
Update 2:
bzlib2recover.c does not have the CRC calculating functions, it just copies the CRC from the damaged files. However, I did manage to replicate the block calculator functionality in Python, to mark out the starting and ending bits of each block in a bz2 compressed file. Back on track, I have found that compress.c refers to some of the definitions in bzlib_private.h.
#define BZ_INITIALISE_CRC(crcVar) crcVar = 0xffffffffL;
#define BZ_FINALISE_CRC(crcVar) crcVar = ~(crcVar);
#define BZ_UPDATE_CRC(crcVar,cha) \
{ \
crcVar = (crcVar << 8) ^ \
BZ2_crc32Table[(crcVar >> 24) ^ \
((UChar)cha)]; \
}
These definitions are accessed by bzlib.c as well, s->blockCRC is initialized and updated in bzlib.c and finalized in compress.c. There's more than 2000 lines of C code, which will take some time to look through and figure out what goes in and what does not. I'm adding the C tag to the question as well.
By the way, here are the C sources for bzip2 http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz
Update 3:
Turns out bzlib2 block CRC32 is calculated using the following algorithm:
dataIn is the data to be encoded.
crcVar = 0xffffffff # Init
for cha in list(dataIn):
crcVar = crcVar & 0xffffffff # Unsigned
crcVar = ((crcVar << 8) ^ (BZ2_crc32Table[(crcVar >> 24) ^ (ord(cha))]))
return hex(~crcVar & 0xffffffff)[2:-1].upper()
Where BZ2_crc32Table is defined in crctable.c
For dataIn = "justatest" the CRC returned is 7948C8CB, having compressed a textfile with that data, the crc:32 checksum inside the bz2 file is 79 48 c8 cb which is a match.
Conclusion:
bzlib2 CRC32 is (quoting crctable.c)
Vaguely derived from code by Rob
Warnock, in Section 51 of the
comp.compression FAQ...
...thus, as far as I understand, cannot be precalculated/validated using standard CRC32 checksum calculators, but rather require the bz2lib implementation (lines 155-172 in bzlib_private.h).

The following is the CRC algorithm used by bzip2, written in Python:
crcVar = 0xffffffff # Init
for cha in list(dataIn):
crcVar = crcVar & 0xffffffff # Unsigned
crcVar = ((crcVar << 8) ^ (BZ2_crc32Table[(crcVar >> 24) ^ (ord(cha))]))
return hex(~crcVar & 0xffffffff)[2:-1].upper()
(C code definitions can be found on lines 155-172 in bzlib_private.h)
BZ2_crc32Table array/list can be found in crctable.c from the bzip2 source code. This CRC checksum algorithm is, quoting: "..vaguely derived from code by Rob Warnock, in Section 51 of the comp.compression FAQ..." (crctable.c)
The checksums are calculated over the uncompressed data.
Sources can be downloaded here: http://www.bzip.org/1.0.6/bzip2-1.0.6.tar.gz

To add onto the existing answer, there is a final checksum at the end of the stream (The one after eos_magic) It functions as a checksum for all the individual Huffman block checksums. It is initialized to zero. It is updated every time you have finished validating an existing Huffman block checksum. To update it, do as follows:
crc: u32 = # latest validated Huffman block CRC
ccrc: u32 = # current combined checksum
ccrc = (ccrc << 1) | (ccrc >> 31);
ccrc ^= crc;
In the end, validate the value of ccrc against the 32-bit unsigned value you read from the compressed file.

Check the fastcrc python library which have bzip2 crc32 implementation.
https://fastcrc.readthedocs.io/en/latest/#fastcrc.crc32.bzip2

Related

CRC value calculation

I'm working on a communication command protocol between a PLC and a 3rd party device.
The manufacturer has provided me with the following information for calculating the CRC values that will change depending on the address of the device I wish to read information from.
A CRC is performed on a block of data, for example the first seven bytes of all transmissions are followed by a two byte CRC for that data. This CRC will be virtually unique for that particular combination of bytes. The process of calculating the CRC follows:
Inputs:
N.BYTES = Number or data bytes to CRC ( maximum 64 bytes )
DATA() = An array of data bytes of quantity N.BYTES
CRC.MASK = 0xC9DA a hexadecimal constant used in the process
Outputs:
CRC = two byte code redundancy check made up of CRC1 (High byte) and CRC2 (Low byte)
Process:
START
CRC = 0xFFFF
FOR N = 1 TO N.BYTES
CRC = CRC XOR ( DATA(N) AND 0xFF )
FOR I = 1 TO 8
IF ( CRC AND 0x0001 ) = 0 THEN LSR CRC
ELSE LSR CRC ; CRC = CRC XOR CRC.MASK
NEXT I
NEXT N
X = CRC1 ; Change the two bytes in CRC around
CRC1 = CRC2
CRC2 = X
END
They also provided me with a couple of complete command strings for the first few device addresses.
RTU #1
05-64-00-02-10-01-00-6C-4B-53-45-EB-F7
RTU #2
05-64-00-02-10-02-00-1C-AE-53-45-EB-F7
RTU #3
05-64-00-02-10-03-00-CC-F2-53-45-EB-F7
The header CRC bytes in the previous three commands are 6C-4B, 1C-AE, and CC-F2 respectively.
I calculated out a the first few lines by hand to have something to compare against when I wrote out the following code in Python.
byte1 = 05
byte2 = 100
byte3 = 00
byte4 = 02
byte5 = 16
byte6 = 01
byte7 = 00
byte8 = 00
mask = 51674
hexarray = [byte1, byte2, byte3, byte4, byte5, byte6, byte7, byte8]
#print hexarray
CRCdata = 65535
for n in hexarray:
CRCdata = CRCdata ^ (n & 255)
print(n, CRCdata)
for i in range(1,8):
if (CRCdata & 1) == 0:
CRCdata = CRCdata >> 1
# print 'if'
else:
CRCdata = CRCdata >> 1
CRCdata = CRCdata ^ mask
# print 'else'
print(i, CRCdata)
print CRCdata
I added byte8 due to some research I did mentioning that an extra byte of 0s needs to be added to the end of the CRC array for calculations. I converted the final result and did the byte swap manually. The problem I've been running into, is that my CRC calculations, whether I keep byte8 or not, are not matching up with any of the three examples that have been provided.
I'm not quite sure where I am going wrong on this and any help would be greatly appreciated.
I was able to solve the issue by updating the code to range(0,8) and dropping byte8.

Incorrect CRC calculation in protocol. One is implemented using zlib and the other one is calculated in function

I am implementing a protocol in an STM32F412 board. It's almost done, I just need to do a CRC check for the received data.
I tried using the internal CRC module for calculating the CRC but I could not match the result to any online CRC algorithm online, so I decided to do a simple implementation of the Ethernet CRC.
static const uint32_t crc32_tab[] =
{
0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL,
0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L,
0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L,
0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L,
0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L,
0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL,
0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L,
0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L,
0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L,
0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL,
0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL,
0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL,
0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L,
0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L,
0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L,
0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL,
0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L,
0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L,
0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L,
0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L,
0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L,
0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L,
0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L,
0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL,
0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL,
0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L,
0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL,
0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL,
0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L,
0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL,
0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L,
0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL,
0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL,
0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L,
0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L,
0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL,
0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L,
0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L,
0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L,
0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L,
0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L,
0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
0x2d02ef8dL
};
uint32_t calc_crc_calculate(uint8_t *pData, uint32_t uLen)
{
uint32_t val = 0xFFFFFFFFU;
int i;
for(i = 0; i < uLen; i++) {
val = crc32_tab[(val ^ pData[i]) & 0xFF] ^ ((val >> 8) & 0x00FFFFFF);
}
return val^0xFFFFFFFF;
}
I calculated the crc of 0x6F and compared the result to the online calculators and it apparently matches.
When I try to test the protocol with my python code I'm just unable to match the CRCs. On python I'm using the following code:
d = 0x6f
crc = zlib.crc32(bytes(d))&0xFFFFFFFF
I'm now unable to tell which is right. Apparently my algorithm is OK because it matches the online calculator. BUT those online calculators do not seem to be reliable sometimes and I doubt that python's zlib implementation is wrong .. I may be using it wrong at worst.
Actually you can compute the Ethernet CRC32 with the builtin module of the STM32. It took me quite a while to make it match up as well.
This code should match up for sizes divisible by 4 (I also used python zlib on the other end):
#include "stm32l4xx_hal.h"
uint32_t CRC32_Compute(const uint32_t *data, size_t sizeIn32BitWords)
{
CRC_HandleTypeDef hcrc = {
.Instance = CRC,
.Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE,
.Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE,
.Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_WORD,
.Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_ENABLE,
.InputDataFormat = CRC_INPUTDATA_FORMAT_WORDS,
};
HAL_StatusTypeDef status = HAL_CRC_Init(&hcrc);
assert (status == HAL_OK)
uint32_t checksum = HAL_CRC_Calculate(&hcrc, data, sizeIn32BitWords);
uint32_t checksumInverted = ~checksum;
return checksumInverted;
}
The challenge with sizes not divisible by 4 is to get the "inversion/reversal" (changing the bit order) right. There is an example how the hardware handles this in the "RM0394 Reference manual STM32L43xxx STM32L44xxx STM32L45xxx STM32L46xxx advanced ARM®-based 32-bit MCUs Rev 3" on page 333.
The essence is that reversal reverses the bit order. For CRC32 this reversal must happen on the word level, i.e. over 32 bits.
Ok. It certainly was a bug on my part. But it was happening in my python code.
I suddenly realized that I was practically doing bytes(0x6F) which just creates an array with 111 positions.
What I actually needed to do was
import struct
d = pack('B', 0x6F)
crc = zlib.crc32(bytes(d))&0xFFFFFFFF
This question could have been avoided had I just done a little bit of rubber duck debugging. Hopefuly this will help someone else.

Arduino to Raspberry crc32 check

I'm trying to send messages through the serial USB interface of my Arduino (C++) to a Raspberry Pi (Python).
On the Arduino side I define a struct which I then copy into a char[]. The last part of the struct contains a checksum that I want to calculate using CRC32. I copy the struct into a temporary char array -4 bytes to strip the checksum field. The checksum is then calculated using the temporary array and the result is added to the struct. The struct is then copied into byteMsg which gets send over the serial connection.
On the raspberry end I do the reverse, I receive the bytestring and calculate the checksum over the message - 4 bytes. Then unpack the bytestring and compare the received and calculated checksum but this fails unfortunately.
For debugging I compared the crc32 check on both the python and arduino for the string "Hello World" and they generated the same checksum so doesn't seem to be a problem with the library. The raspberry is also able to decode the rest of the message just fine so the unpacking of the data into variables seem to be ok as well.
Any help would be much appreciated.
The Python Code:
def unpackMessage(self, message):
""" Processes a received byte string from the arduino """
# Unpack the received message into struct
(messageID, acknowledgeID, module, commandType,
data, recvChecksum) = struct.unpack('<LLBBLL', message)
# Calculate the checksum of the recv message minus the last 4
# bytes that contain the sent checksum
calcChecksum = crc32(message[:-4])
if recvChecksum == calcChecksum:
print "Checksum checks out"
The Aruino crc32 library taken from http://excamera.com/sphinx/article-crc.html
crc32.h
#include <avr/pgmspace.h>
static PROGMEM prog_uint32_t crc_table[16] = {
0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c
};
unsigned long crc_update(unsigned long crc, byte data)
{
byte tbl_idx;
tbl_idx = crc ^ (data >> (0 * 4));
crc = pgm_read_dword_near(crc_table + (tbl_idx & 0x0f)) ^ (crc >> 4);
tbl_idx = crc ^ (data >> (1 * 4));
crc = pgm_read_dword_near(crc_table + (tbl_idx & 0x0f)) ^ (crc >> 4);
return crc;
}
unsigned long crc_string(char *s)
{
unsigned long crc = ~0L;
while (*s)
crc = crc_update(crc, *s++);
crc = ~crc;
return crc;
}
Main Arduino Sketch
struct message_t {
unsigned long messageID;
unsigned long acknowledgeID;
byte module;
byte commandType;
unsigned long data;
unsigned long checksum;
};
void sendMessage(message_t &msg)
{
// Set the messageID
msg.messageID = 10;
msg.checksum = 0;
// Copy the message minus the checksum into a char*
// Then perform the checksum on the message and copy
// the full msg into byteMsg
char byteMsgForCrc32[sizeof(msg)-4];
memcpy(byteMsgForCrc32, &msg, sizeof(msg)-4);
msg.checksum = crc_string(byteMsgForCrc32);
char byteMsg[sizeof(msg)];
memcpy(byteMsg, &msg, sizeof(msg));
Serial.write(byteMsg, sizeof(byteMsg));
void loop() {
message_t msg;
msg.module = 0x31;
msg.commandType = 0x64;
msg.acknowledgeID = 0;
msg.data = 10;
sendMessage(msg);
Kind Regards,
Thiezn
You are making the classic struct-to-network/serial/insert communication layer mistake. Structs have hidden padding in order to align the members onto suitable memory boundaries. This is not guaranteed to be the same across different computers, let alone different CPUs/microcontrollers.
Take this struct as an example:
struct Byte_Int
{
int x;
char y;
int z;
}
Now on a basic 32-bit x86 CPU you have a 4-byte memory boundary. Meaning that variables are aligned to either 4 bytes, 2 bytes or not at all according to the type of variable. The example would look like this in memory: int x on bytes 0,1,2,3, char y on byte 4, int z on bytes 8,9,10,11. Why not use the three bytes on the second line? Because then the memory controller would have to do two fetches to get a single number! A controller can only read one line at a time. So, structs (and all other kinds of data) have hidden padding in order to get variables aligned in memory. The example struct would have a sizeof() of 12, and not 9!
Now, how does that relate to your problem? You are memcpy()ing a struct directly into a buffer, including the padding. The computer on the other end doesn't know about this padding and misinterprets the data. What you need a serialization function that takes the members of your structs and pasts them into a buffer one at a time, that way you lose the padding and end up with something like this:
[0,1,2,3: int x][4: char y][5,6,7,8: int z]. All as one lengthy bytearray/string which can be safely sent using Serial(). Of course on the other end you would have to parse this string into intelligible data. Python's unpack() does this for you as long as you give the right format string.
Lastly, an int on an Arduino is 16 bits long. On a pc generally 4! So assemble your unpack format string with care.
The char array I was passing to the crc_string function contained '\0' characters. The crc_string was iterating through the array until it found a '\0' which shouldn't happen in this case since I was using the char array as a stream of bytes to be sent over a serial connection.
I've changed the crc_string function to take the array size as argument and iterate through the array using that value. This solved the issue.
Here's the new function
unsigned long crc_string(char *s, size_t arraySize)
{
unsigned long crc = ~0L;
for (int i=0; i < arraySize; i++) {
crc = crc_update(crc, s[i]);
}
crc = ~crc;
return crc;
}

How to byte-swap a 32-bit integer in python?

Take this example:
i = 0x12345678
print("{:08x}".format(i))
# shows 12345678
i = swap32(i)
print("{:08x}".format(i))
# should print 78563412
What would be the swap32-function()? Is there a way to byte-swap an int in python, ideally with built-in tools?
One method is to use the struct module:
def swap32(i):
return struct.unpack("<I", struct.pack(">I", i))[0]
First you pack your integer into a binary format using one endianness, then you unpack it using the other (it doesn't even matter which combination you use, since all you want to do is swap endianness).
Big endian means the layout of a 32 bit int has the most significant byte first,
e.g. 0x12345678 has the memory layout
msb lsb
+------------------+
| 12 | 34 | 56 | 78|
+------------------+
while on little endian, the memory layout is
lsb msb
+------------------+
| 78 | 56 | 34 | 12|
+------------------+
So you can just convert between them with some bit masking and shifting:
def swap32(x):
return (((x << 24) & 0xFF000000) |
((x << 8) & 0x00FF0000) |
((x >> 8) & 0x0000FF00) |
((x >> 24) & 0x000000FF))
From python 3.2 you can define function swap32() as the following:
def swap32(x):
return int.from_bytes(x.to_bytes(4, byteorder='little'), byteorder='big', signed=False)
It uses array of bytes to represent the value and reverses order of bytes by changing endianness during conversion back to integer.
Maybe simpler use the socket library.
from socket import htonl
swapped = htonl (i)
print (hex(swapped))
that's it.
this library also works in the other direction with ntohl
The array module provides a byteswap() method for fixed sized items.
The array module appears to be in versions back to Python 2.7
array.byteswap()
“Byteswap” all items of the array. This is only supported for values which are 1, 2, 4, or 8 bytes in size;
Along with the fromfile() and tofile() methods, this module is quite easy to use:
import array
# Open a data file.
input_file = open( 'my_data_file.bin' , 'rb' )
# Create an empty array of unsigned 4-byte integers.
all_data = array.array( 'L' )
# Read all the data from the file.
all_data.fromfile( input_file , 16000 ) # assumes the size of the file
# Swap the bytes in all the data items.
all_data.byteswap( )
# Write all the data to a new file.
output_file = open( filename[:-4] + '.new' , 'wb' ) # assumes a three letter extension
all_data.tofile( output_file )
# Close the files.
input_file.close( )
output_file_close( )
The above code worked for me since I have fixed-size data files. There are more Pythonic ways to handle variable length files.

Base64 and non standard

I try to create a python client for bacula, but I have some problem with the authentication.
The algorithm is :
import hmac
import base64
import re
...
challenge = re.search("auth cram-md5 ()", data)
#exemple ''
passwd = 'b489c90f3ee5b3ca86365e1bae27186e'
hm = hmac.new(passwd, challenge).digest()
rep = base64.b64encode(hm).strp().rstrip('=')
#result with python : 9zKE3VzYQ1oIDTpBuMMowQ
#result with bacula client : 9z+E3V/YQ1oIDTpBu8MowB'
There's a way more simple than port the bacula's implemenation of base 64?
int
bin_to_base64(char *buf, int buflen, char *bin, int binlen, int compatible)
{
uint32_t reg, save, mask;
int rem, i;
int j = 0;
reg = 0;
rem = 0;
buflen--; /* allow for storing EOS */
for (i=0; i >= (rem - 6);
if (j
To verify your CRAM-MD5 implementation, it is best to use some simple test vectors and check combinations of (challenge, password, username) inputs against the expected output.
Here's one example (from http://blog.susam.in/2009/02/auth-cram-md5.html):
import hmac
username = 'foo#susam.in'
passwd = 'drowssap'
encoded_challenge = 'PDc0NTYuMTIzMzU5ODUzM0BzZGNsaW51eDIucmRzaW5kaWEuY29tPg=='
challenge = encoded_challenge.decode('base64')
digest = hmac.new(passwd, challenge).hexdigest()
response = username + ' ' + digest
encoded_response = response.encode('base64')
print encoded_response
# Zm9vQHN1c2FtLmluIDY2N2U5ZmE0NDcwZGZmM2RhOWQ2MjFmZTQwNjc2NzIy
That said, I've certainly found examples on the net where the response generated by the above code differs from the expected response stated on the relevant site, so I'm still not entirely clear as to what is happening in those cases.
I HAVE CRACKED THIS.
I ran into exactly the same problem you did, and have just spent about 4 hours identifying the problem, and reimplementing it.
The problem is the Bacula's base64 is BROKEN, AND WRONG!
There are two problems with it:
The first is that the incoming bytes are treated as signed, not unsigned. The effect of this is that, if a byte has the highest bit set (>127), then it is treated as a negative number; when it is combined with the "left over" bits from previous bytes are all set to (binary 1).
The second is that, after b64 has processed all the full 6-bit output blocks, there may be 0, 2 or 4 bits left over (depending on input block modulus 3). The standard Base64 way to handle this is to multiply the remaining bits, so they are the HIGHEST bits in the last 6-bit block, and process them - Bacula leaves them as the LOWEST bits.
Note that some versions of Bacula may accept both the "Bacula broken base64 encoding" and the standard ones, for incoming authentication; they seem to use the broken one for their authentication.
def bacula_broken_base64(binarystring):
b64_chars="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
remaining_bit_count=0
remaining_bits=0
output=""
for inputbyte in binarystring:
inputbyte=ord(inputbyte)
if inputbyte>127:
# REPRODUCING A BUG! set all the "remaining bits" to 1.
remaining_bits=(1 << remaining_bit_count) - 1
remaining_bits=(remaining_bits<<8)+inputbyte
remaining_bit_count+=8
while remaining_bit_count>=6:
# clean up:
remaining_bit_count-=6
new64=(remaining_bits>>remaining_bit_count) & 63 # 6 highest bits
output+=b64_chars[new64]
remaining_bits&=(1 << remaining_bit_count) - 1
if remaining_bit_count>0:
output+=b64_chars[remaining_bits]
return output
I realize it's been 6 years since you asked, but perhaps someone else will find this useful.

Categories