On several places I've read that crc32 is additive and so: CRC(A xor B) = CRC(A) xor CRC(B).
The above statement was disproven by the following code I wrote:
import zlib
def crc32(data):
return zlib.crc32(data) & 0xffffffff
print crc32(chr(ord("A") ^ ord("B")))
print crc32("A") ^ crc32("B")
Program output:
1259060791
2567524794
Could someone provide a proper code proving this theory or point me where I've failed?
CRC is additive in the mathematical sense since the CRC hash is just a remainder value from a carryless division of all the data (treated as a giant integer) divided by the polynomial constant. Using your example, it's akin to this sort of thing:
7 mod 5 = 2
6 mod 5 = 1
(7 mod 5) + (6 mod 5) = 3
(7 + 6) mod 5 = 3
In that analogy, '5' is our CRC polynomial.
Here's an example to play with (gcc based):
#include <stdio.h>
#include <x86intrin.h>
int main(void)
{
unsigned int crc_a = __builtin_ia32_crc32si( 0, 5);
printf( "crc(5) = %08X\n", crc_a );
unsigned int crc_b = __builtin_ia32_crc32si( 0, 7);
printf( "crc(7) = %08X\n", crc_b );
unsigned int crc_xor = crc_a ^ crc_b;
printf( "crc(5) XOR crc(7) = %08X\n", crc_xor );
unsigned int crc_xor2 = __builtin_ia32_crc32si( 0, 5 ^ 7);
printf( "crc(5 XOR 7) = %08X\n", crc_xor2 );
return 0;
}
The output is as expected:
plxc15034> gcc -mcrc32 -Wall -O3 crctest.c
plxc15034> ./a.out
crc(5) = A6679B4B
crc(7) = 1900B8CA
crc(5) XOR crc(7) = BF672381
crc(5 XOR 7) = BF672381
Because this code uses the x86 CRC32 instruction, it will only run on an Intel i7 or newer. The intrinsic function takes the running CRC hash as the first parameter and the new data to accumulate as the second parameter. The return value is the new running CRC.
The initial running CRC value of 0 in the code above is critical. Using any other initial value, then CRC is not "additive" in the practical sense because you have effectively thrown away information about the integer you are dividing into. And this is exactly what's happening in your example. CRC functions never initialize that initial running CRC value to zero, but usually -1. The reason is that an initial CRC of 0 allows any number of leading 0's in the data to simply fall through without changing the running CRC value, which remains 0. So, initializing the CRC to 0 is mathematically sound, but for practical purposes of calculating hash, it's the last thing you'd want.
The CRC-32 algorithm is based on polynomial division, with some extra steps added. Pure polynomial remainder is additive.
By that, I mean: mod(poly1 + poly2, poly3) = mod(mod(poly1, poly3) + mod(poly2, poly3), poly3)
The CRC-32 algorithm builds on this, and is non-additive. To compute the CRC-32 of a byte array m:
XOR the first 4 bytes by 0xFFFFFFFF.
Treat earlier bytes as higher polynomial powers and treat lower order bits as higher polynomial powers. For example, the bytes 0x01 0x04 would be the polynomial x^15 + x^3.
Multiply the polynomial by x^32.
Take the remainder of this polynomial divided by the CRC-32 polynomial, 0x104C11DB7. The remainder polynomial has degree < 32.
Treat lower powers as higher order bits. For example, the polynomial x^2 would be the 32-bit integer 0x40000000.
XOR the result by 0xFFFFFFFF.
The pure polynomial remainder operation is in step #4. It's steps #1 and #6 that make the CRC-32 algorithm non-additive. So if you undo the effect of steps #1 and #6, then you can modify the CRC-32 algorithm to be additive.
(See also: Python CRC-32 woes)
If a, b, and c are the same length, CRC(a) xor CRC(b) xor CRC(c) equals CRC(a xor b xor c). Returning to your original formulation, it means that CRC(a xor b) equals CRC(a) xor CRC(b) xor CRC(z), where z is a sequence of zeroes the same length as the other two sequences.
This would imply that each bit position of the CRC result is only driven by the equivalent bit position in the input. Consider your example with B == 0.
The relationship you're describing is more likely to be true for some primitive xor or additive checksum algorithms.
Related
Let's say I have this number i = -6884376.
How do I refer to it as to an unsigned variable?
Something like (unsigned long)i in C.
Assuming:
You have 2's-complement representations in mind; and,
By (unsigned long) you mean unsigned 32-bit integer,
then you just need to add 2**32 (or 1 << 32) to the negative value.
For example, apply this to -1:
>>> -1
-1
>>> _ + 2**32
4294967295L
>>> bin(_)
'0b11111111111111111111111111111111'
Assumption #1 means you want -1 to be viewed as a solid string of 1 bits, and assumption #2 means you want 32 of them.
Nobody but you can say what your hidden assumptions are, though. If, for example, you have 1's-complement representations in mind, then you need to apply the ~ prefix operator instead. Python integers work hard to give the illusion of using an infinitely wide 2's complement representation (like regular 2's complement, but with an infinite number of "sign bits").
And to duplicate what the platform C compiler does, you can use the ctypes module:
>>> import ctypes
>>> ctypes.c_ulong(-1) # stuff Python's -1 into a C unsigned long
c_ulong(4294967295L)
>>> _.value
4294967295L
C's unsigned long happens to be 4 bytes on the box that ran this sample.
To get the value equivalent to your C cast, just bitwise and with the appropriate mask. e.g. if unsigned long is 32 bit:
>>> i = -6884376
>>> i & 0xffffffff
4288082920
or if it is 64 bit:
>>> i & 0xffffffffffffffff
18446744073702667240
Do be aware though that although that gives you the value you would have in C, it is still a signed value, so any subsequent calculations may give a negative result and you'll have to continue to apply the mask to simulate a 32 or 64 bit calculation.
This works because although Python looks like it stores all numbers as sign and magnitude, the bitwise operations are defined as working on two's complement values. C stores integers in twos complement but with a fixed number of bits. Python bitwise operators act on twos complement values but as though they had an infinite number of bits: for positive numbers they extend leftwards to infinity with zeros, but negative numbers extend left with ones. The & operator will change that leftward string of ones into zeros and leave you with just the bits that would have fit into the C value.
Displaying the values in hex may make this clearer (and I rewrote to string of f's as an expression to show we are interested in either 32 or 64 bits):
>>> hex(i)
'-0x690c18'
>>> hex (i & ((1 << 32) - 1))
'0xff96f3e8'
>>> hex (i & ((1 << 64) - 1)
'0xffffffffff96f3e8L'
For a 32 bit value in C, positive numbers go up to 2147483647 (0x7fffffff), and negative numbers have the top bit set going from -1 (0xffffffff) down to -2147483648 (0x80000000). For values that fit entirely in the mask, we can reverse the process in Python by using a smaller mask to remove the sign bit and then subtracting the sign bit:
>>> u = i & ((1 << 32) - 1)
>>> (u & ((1 << 31) - 1)) - (u & (1 << 31))
-6884376
Or for the 64 bit version:
>>> u = 18446744073702667240
>>> (u & ((1 << 63) - 1)) - (u & (1 << 63))
-6884376
This inverse process will leave the value unchanged if the sign bit is 0, but obviously it isn't a true inverse because if you started with a value that wouldn't fit within the mask size then those bits are gone.
Python doesn't have builtin unsigned types. You can use mathematical operations to compute a new int representing the value you would get in C, but there is no "unsigned value" of a Python int. The Python int is an abstraction of an integer value, not a direct access to a fixed-byte-size integer.
Since version 3.2 :
def unsignedToSigned(n, byte_count):
return int.from_bytes(n.to_bytes(byte_count, 'little', signed=False), 'little', signed=True)
def signedToUnsigned(n, byte_count):
return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
output :
In [3]: unsignedToSigned(5, 1)
Out[3]: 5
In [4]: signedToUnsigned(5, 1)
Out[4]: 5
In [5]: unsignedToSigned(0xFF, 1)
Out[5]: -1
In [6]: signedToUnsigned(0xFF, 1)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 signedToUnsigned(0xFF, 1)
Input In [1], in signedToUnsigned(n, byte_count)
4 def signedToUnsigned(n, byte_count):
----> 5 return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
OverflowError: int too big to convert
In [7]: signedToUnsigned(-1, 1)
Out[7]: 255
Explanations : to/from_bytes convert to/from bytes, in 2's complement considering the number as one of size byte_count * 8 bits. In C/C++, chances are you should pass 4 or 8 as byte_count for respectively a 32 or 64 bit number (the int type).
I first pack the input number in the format it is supposed to be from (using the signed argument to control signed/unsigned), then unpack to the format we would like it to have been from. And you get the result.
Note the Exception when trying to use fewer bytes than required to represent the number (In [6]). 0xFF is 255 which can't be represented using a C's char type (-128 ≤ n ≤ 127). This is preferable to any other behavior.
You could use the struct Python built-in library:
Encode:
import struct
i = -6884376
print('{0:b}'.format(i))
packed = struct.pack('>l', i) # Packing a long number.
unpacked = struct.unpack('>L', packed)[0] # Unpacking a packed long number to unsigned long
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
-11010010000110000011000
4288082920
11111111100101101111001111101000
Decode:
dec_pack = struct.pack('>L', unpacked) # Packing an unsigned long number.
dec_unpack = struct.unpack('>l', dec_pack)[0] # Unpacking a packed unsigned long number to long (revert action).
print(dec_unpack)
Out:
-6884376
[NOTE]:
> is BigEndian operation.
l is long.
L is unsigned long.
In amd64 architecture int and long are 32bit, So you could use i and I instead of l and L respectively.
[UPDATE]
According to the #hl037_ comment, this approach works on int32 not int64 or int128 as I used long operation into struct.pack(). Nevertheless, in the case of int64, the written code would be changed simply using long long operand (q) in struct as follows:
Encode:
i = 9223372036854775807 # the largest int64 number
packed = struct.pack('>q', i) # Packing an int64 number
unpacked = struct.unpack('>Q', packed)[0] # Unpacking signed to unsigned
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
9223372036854775807
111111111111111111111111111111111111111111111111111111111111111
Next, follow the same way for the decoding stage. As well as this, keep in mind q is long long integer — 8byte and Q is unsigned long long
But in the case of int128, the situation is slightly different as there is no 16-byte operand for struct.pack(). Therefore, you should split your number into two int64.
Here's how it should be:
i = 10000000000000000000000000000000000000 # an int128 number
print(len('{0:b}'.format(i)))
max_int64 = 0xFFFFFFFFFFFFFFFF
packed = struct.pack('>qq', (i >> 64) & max_int64, i & max_int64)
a, b = struct.unpack('>QQ', packed)
unpacked = (a << 64) | b
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
123
10000000000000000000000000000000000000
111100001011110111000010000110101011101101001000110110110010000000011110100001101101010000000000000000000000000000000000000
just use abs for converting unsigned to signed in python
a=-12
b=abs(a)
print(b)
Output:
12
I was looking at the source of sorted_containers and was surprised to see this line:
self._load, self._twice, self._half = load, load * 2, load >> 1
Here load is an integer. Why use bit shift in one place, and multiplication in another? It seems reasonable that bit shifting may be faster than integral division by 2, but why not replace the multiplication by a shift as well? I benchmarked the the following cases:
(times, divide)
(shift, shift)
(times, shift)
(shift, divide)
and found that #3 is consistently faster than other alternatives:
# self._load, self._twice, self._half = load, load * 2, load >> 1
import random
import timeit
import pandas as pd
x = random.randint(10 ** 3, 10 ** 6)
def test_naive():
a, b, c = x, 2 * x, x // 2
def test_shift():
a, b, c = x, x << 1, x >> 1
def test_mixed():
a, b, c = x, x * 2, x >> 1
def test_mixed_swapped():
a, b, c = x, x << 1, x // 2
def observe(k):
print(k)
return {
'naive': timeit.timeit(test_naive),
'shift': timeit.timeit(test_shift),
'mixed': timeit.timeit(test_mixed),
'mixed_swapped': timeit.timeit(test_mixed_swapped),
}
def get_observations():
return pd.DataFrame([observe(k) for k in range(100)])
The question:
Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?
I run Python 3.5 on Ubuntu 14.04.
Edit
Above is the original statement of the question. Dan Getz provides an excellent explanation in his answer.
For the sake of completeness, here are sample illustrations for larger x when multiplication optimizations do not apply.
This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.
Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platform, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on if x were less than 32,768 or not.)
When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:
static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
PyLongObject *z;
CHECK_BINOP(a, b);
/* fast path for single-digit multiplication */
if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
/* if we don't have long long then we're almost certainly
using 15-bit digits, so v will fit in a long. In the
unlikely event that we're using 30-bit digits on a platform
without long long, a large v will just cause us to fall
through to the general multiplication code below. */
if (v >= LONG_MIN && v <= LONG_MAX)
return PyLong_FromLong((long)v);
#endif
}
So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.
In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)
static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
/*** ... ***/
wordshift = shiftby / PyLong_SHIFT; /*** zero for small w ***/
remshift = shiftby - wordshift * PyLong_SHIFT; /*** w for small w ***/
oldsize = Py_ABS(Py_SIZE(a)); /*** 1 for small v > 0 ***/
newsize = oldsize + wordshift;
if (remshift)
++newsize; /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
z = _PyLong_New(newsize);
/*** ... ***/
}
Integer division
You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.
Or at least, that was the case when this question was originally asked. In CPython 3.6, a fast path for small int // was added, so // now beats >> for small ints too.
How does Python hash long numbers? I guess it takes O(1) time for 32-bit ints, but the way long integers work in Python makes me think the complexity is not O(1) for them. I've looked for answers in relevant questions, but have found none straightforward enough to make me confident. Thank you in advance.
The long_hash() function indeed loops and depends on the size of the integer, yes:
/* The following loop produces a C unsigned long x such that x is
congruent to the absolute value of v modulo ULONG_MAX. The
resulting x is nonzero if and only if v is. */
while (--i >= 0) {
/* Force a native long #-bits (32 or 64) circular shift */
x = (x >> (8*SIZEOF_LONG-PyLong_SHIFT)) | (x << PyLong_SHIFT);
x += v->ob_digit[i];
/* If the addition above overflowed we compensate by
incrementing. This preserves the value modulo
ULONG_MAX. */
if (x < v->ob_digit[i])
x++;
}
where i is the 'object size', e.g. the number of digits used to represent the number, where the size of a digit depends on your platform.
Answer
Thanks to #TheDark for spotting the overflow. The new C++ solution is pretty freakin' funny, too. It's extremely redundant:
if(2*i > n && 2*i > i)
replaced the old line of code if(2*i > n).
Background
I'm doing this problem on HackerRank, though the problem may not be entirely related to this question. If you cannot see the webpage, or have to make an account and don't want to, the problem is listed in plain text below.
Question
My C++ code is timing out, but my python code is not. I first suspected this was due to overflow, but I used sizeof to be sure that unsigned long long can reach 2^64 - 1, the upper limit of the problem.
I practically translated my C++ code directly into Python to see if it was my algorithms causing the timeouts, but to my surprise my Python code passed every test case.
C++ code:
#include <iostream>
bool pot(unsigned long long n)
{
if (n % 2 == 0) return pot(n/2);
return (n==1); // returns true if n is power of two
}
unsigned long long gpt(unsigned long long n)
{
unsigned long long i = 1;
while(2*i < n) {
i *= 2;
}
return i; // returns greatest power of two less than n
}
int main()
{
unsigned int t;
std::cin >> t;
std::cout << sizeof(unsigned long long) << std::endl;
for(unsigned int i = 0; i < t; i++)
{
unsigned long long n;
unsigned long long count = 1;
std::cin >> n;
while(n > 1) {
if (pot(n)) n /= 2;
else n -= gpt(n);
count++;
}
if (count % 2 == 0) std::cout << "Louise" << std::endl;
else std::cout << "Richard" << std::endl;
}
}
Python 2.7 code:
def pot(n):
while n % 2 == 0:
n/=2
return n==1
def gpt(n):
i = 1
while 2*i < n:
i *= 2
return i
t = int(raw_input())
for i in range(t):
n = int(raw_input())
count = 1
while n != 1:
if pot(n):
n /= 2
else:
n -= gpt(n)
count += 1
if count % 2 == 0:
print "Louise"
else:
print "Richard"
To me, both versions look identical. I still think I'm somehow being fooled and am actually getting overflow, causing timeouts, in my C++ code.
Problem
Louise and Richard play a game. They have a counter is set to N. Louise gets the first turn and the turns alternate thereafter. In the game, they perform the following operations.
If N is not a power of 2, they reduce the counter by the largest power of 2 less than N.
If N is a power of 2, they reduce the counter by half of N.
The resultant value is the new N which is again used for subsequent operations.
The game ends when the counter reduces to 1, i.e., N == 1, and the last person to make a valid move wins.
Given N, your task is to find the winner of the game.
Input Format
The first line contains an integer T, the number of testcases.
T lines follow. Each line contains N, the initial number set in the counter.
Constraints
1 ≤ T ≤ 10
1 ≤ N ≤ 2^64 - 1
Output Format
For each test case, print the winner's name in a new line. So if Louise wins the game, print "Louise". Otherwise, print "Richard". (Quotes are for clarity)
Sample Input
1
6
Sample Output
Richard
Explanation
As 6 is not a power of 2, Louise reduces the largest power of 2 less than 6 i.e., 4, and hence the counter reduces to 2.
As 2 is a power of 2, Richard reduces the counter by half of 2 i.e., 1. Hence the counter reduces to 1.
As we reach the terminating condition with N == 1, Richard wins the game.
When n is greater than 2^63, your gpt function will eventually have i as 2^63 and then multiply 2^63 by 2, giving an overflow and a value of 0. This will then end up with an infinite loop, multiplying 0 by 2 each time.
Try this bit-twiddling hack, which is probably slightly faster:
unsigned long largest_power_of_two_not_greater_than(unsigned long x) {
for (unsigned long y; (y = x & (x - 1)); x = y) {}
return x;
}
x&(x-1) is x without its least significant one-bit. So y will be zero (terminating the loop) exactly when x has been reduced to a power of two, which will be the largest power of two not greater than the original x. The loop is executed once for every 1-bit in x, which is on average half as many iterations as your approach. Also, this one has not issues with overflow. (It does return 0 if the original x was 0. That may or may not be what you want.)
Note the if the original x was a power of two, that value is simply returned immediately. So the function doubles as a test whether x is a power of two (or 0).
While that is fun and all, in real-life code you'd probably be better off finding your compiler's equivalent to this gcc built-in (unless your compiler is gcc, in which case here it is):
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in X, starting at the most
significant bit position. If X is 0, the result is undefined.
(Also available as __builtin_clzl for unsigned long arguments and __builtin_clzll for unsigned long long.)
I want to write extendible hashing. On wiki I have found good implementation in python. But this code uses least significant bits, so when I have hash 1101 for d = 1 value is 1 and for d = 2 value is 01. I would like to use most significant bits. For exmaple: hash 1101, d = 1 value is 1, d = 2 value is 11. Is there any simple way to do that? I tried, but I can't.
Do you understand why it uses the least significant bits?
More or less. It makes efficient when we using arrays. Ok so for hash function I would like to use four least bits from 4-bytes integer but from left to right.
h = hash(k)
h = h & 0xf #use mask to get four least bits
p = self.pp[ h >> ( 4 - GD)]
And it doesn't work, and I don't know why.
Computing a hash using the least significant bits is the fastest way to compute a hash, because it only requires an AND bitwise operation. This makes it very popular.
Here is an implemetation (in C) for a hash using the most significant bits. Since there is no direct way to know the most significant bit, it repeatedly tests that the remaining value has only the specified amount of bits.
int significantHash(int value, int bits) {
int mask = (1 << bits) - 1;
while (value > mask) {
value >>= 1;
}
return value;
}
I recommend the overlapping hash, that makes use of all the bits of the number. Essentially, it cuts the number in parts of equal number of bits and XORs them. It runs slower than the least significant hash, but faster than the significant hash. Above all else, it offers a better dispersion than the other two methods, making it a better candidate when the numbers that must be hashed have a certain bit-related-pattern.
int overlappingHash(int value, int bits) {
int mask = (1 << bits) - 1;
int answer = 0;
do {
answer ^= (value & mask);
value >>= bits;
} while (value > 0);
return answer;
}