I'm researching pattern identification, which requires identifying the repeating patterns in the binary representations of fractions of rational numbers. bin(2**24/n) strips the leading zeros, e.g. bin(2**24/11) -> 0b101110100010111010001 instead of 0b000101110100010111010001. The number of leading zeros varies, of course. The obvious pattern here is 0001011101...
I am a newbie with Python, still on the learning curve. Is there a Python-appropriate way to approach this?
This can be done with string formatting, in 2.6+:
>>> '{0:024b}'.format(23)
'000000000000000000010111'
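Applied to the value from the question (using // so the division stays integral on Python 3 as well), this restores the stripped zeros:
>>> '{0:024b}'.format(2**24 // 11)
'000101110100010111010001'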
You might find the bitstring module useful if you have more advanced needs than string formatting can provide.
>>> from bitstring import BitArray
>>> a = BitArray(24) # 24 zero bits
>>> a.uint = 2**24 // 11 # set the unsigned integer property
>>> a.bin # get the binary property
'000101110100010111010001'
It will never cut off leading zero bits, and it can do a few other useful tricks:
>>> a.uint //= 2 # floor division, so the property stays an int
>>> a.bin
'000010111010001011101000'
>>> list(a.findall('0b1011'))
[4, 14]
>>> a *= 2 # repetition: the array concatenated with itself
>>> a.bin
'000010111010001011101000000010111010001011101000'
>>> a.replace('0b00001', '0xe')
2 # 2 replacements made
>>> a.bin
'1110011101000101110100011100111010001011101000'
I'm not sure what your exact needs are, so all this could be overkill, and you might not want to use an external library in any case, but Python's built-in support for bit arrays is a little basic.
I am going over the description of SHA-512. It is mentioned that the initial hash value consists of a sequence of 64-bit words obtained by taking the fractional parts of the square roots of the first eight primes. I am trying to replicate these values in Python, but I am not getting the same results. To include more digits, I am using the mpmath library.
from mpmath import *
mp.dps = 50
sqrt(2)
# mpf('1.4142135623730950488016887242096980785696718753769468')
mpf(0.4142135623730950488016887242096980785696718753769468 * 2 ** 64)
# mpf('7640891576956012544.0')
hex(7640891576956012544)
# '0x6a09e667f3bcc800'
However, the description indicates this value must be 6a09e667f3bcc908. As can be seen, the result I get differs in the last three digits from what I should be getting according to the description. I was wondering why that is the case, and what the correct approach is.
I have come across a similar question, but adjusting it for a 64-bit word would yield:
import math
hex(int(math.modf(math.sqrt(2))[0]*(1<<64)))
# '0x6a09e667f3bcd000'
which actually differs in the last four digits.
As a comment already explained, you're actually only using 53 bits in your calculations (native CPython float precision).
Here's an easy way to reproduce the result you're apparently after:
>>> import decimal
>>> x = decimal.getcontext().sqrt(2) - 1
>>> x
Decimal('0.414213562373095048801688724')
>>> hex(int(x * 2**64))
'0x6a09e667f3bcc908'
Nothing really magical about decimal. It just happens to use enough precision by default. You could certainly do the same with mpmath.
For example,
>>> import mpmath
>>> mpmath.mp.prec = 80
>>> hex(int(mpmath.frac(mpmath.sqrt(2)) * 2**64))
'0x6a09e667f3bcc908'
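To check all eight constants at once, here is a minimal sketch along the same lines (the loop over the first eight primes is my addition, not part of the answer above):

import mpmath

mpmath.mp.prec = 80 # comfortably more than the 64 bits we keep
for p in [2, 3, 5, 7, 11, 13, 17, 19]: # the first eight primes
    word = int(mpmath.frac(mpmath.sqrt(p)) * 2**64)
    print('%016x' % word) # first line: 6a09e667f3bcc908

This should reproduce the eight initial hash values from the SHA-512 specification.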
I'm trying to find an efficient way to do the following in Python:
a = 12345678901234567890123456**12345678
f = open('file', 'w')
f.write(str(a))
f.close()
Calculating the power takes about 40 minutes, and only one thread is utilized. Is there a quick and easy way to spread this operation over multiple threads?
As the number is quite huge, I think the str() conversion isn't quite up to the task - it's been running for almost three hours now. I need the number to end up in a text file.
Any ideas on how to better accomplish this?
I would like to give a lavish ;-) answer, but don't have the time now. Elaborating on my comment, the decimal module is what you really want here. It's much faster at computing the power, and very, very much faster at converting the result to a decimal string.
You need to change its internals so that it avoids floating point, giving it more than enough internal digits to store the final result. We want exact integer arithmetic here, not rounded floating-point. So we fiddle things so decimal uses as much precision as it's capable of using, and tell it to raise the "Inexact" exception if it ever loses information to rounding. Note that you need a 64-bit version of Python for decimal to be capable of using enough precision to hold the exact result in your example:
>>> import decimal
>>> c = decimal.getcontext()
>>> c.prec = decimal.MAX_PREC
>>> c.Emax = decimal.MAX_EMAX
>>> c.Emin = decimal.MIN_EMIN
>>> c.traps[decimal.Inexact] = 1
Now create a Decimal for the base:
>>> base = decimal.Decimal(12345678901234567890123456)
>>> base
Decimal('12345678901234567890123456')
And raise to the power - the exponent will automatically be converted to Decimal, because the base is already Decimal:
>>> x = base ** 12345678
That takes less than a minute on my box! The reasons for that are involved. It's not really because it's working in base 10, but because the person who wrote the decimal module implemented "advanced" algorithms for doing very large multiplications.
Now convert to a string. Because it's already stored in a variant of base 10, converting to a decimal string goes very fast (a few seconds on my box, just because the string has hundreds of millions of digits):
>>> y = str(x)
>>> len(y)
309771765
And, for sanity, let's just look at the last 10, and first 10, digits:
>>> y[-10:]
'6044706816'
>>> y[:10]
'2759594879'
As @StefanPochmann noted in a comment, the last 10 digits can be obtained very quickly with native ints by using modular (3-argument) pow():
>>> pow(int(base), 12345678, 10**10)
6044706816
Which matches the last 10 digits of the string above. For the first 10 digits, we can use decimal again, but with much less precision, which will cause it (you'll just have to trust me on this) to use a different approach under the covers:
>>> c.prec = 12
>>> c.traps[decimal.Inexact] = 0 # don't trap on rounding!
>>> base ** 12345678
Decimal('2.75959487945E+309771764')
Rounding that back to 10 digits matches the earlier result, and the exponent is consistent with the length of y too.
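Putting the pieces together, here is a minimal sketch of the whole job from the question (same context settings as above; the file name 'file' comes from the question):

import decimal

c = decimal.getcontext()
c.prec = decimal.MAX_PREC
c.Emax = decimal.MAX_EMAX
c.Emin = decimal.MIN_EMIN
c.traps[decimal.Inexact] = 1 # raise instead of silently rounding

x = decimal.Decimal(12345678901234567890123456) ** 12345678
with open('file', 'w') as f:
    f.write(str(x)) # hundreds of millions of digits, but str() is fast here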
I am currently working with very small numbers in my Python program, e.g.
x = 200 + 2e-26
One solution is to work with logarithmic values, which would increase the range of my float values. The problem is that I also have to run an FFT on these values, so the logarithmic approach is not usable (and neither is the Decimal module). Is there another way to solve this problem?
Edit: My problem with the decimal module is: How can I handle imaginary values? I tried a = Decimal(1e-26)+Decimal(1e-26*1j) and a = Decimal(1e-26)+Decimal(1e-26)*1j, and both ways failed (error on request).
Consider trying out the mpmath package.
>>> from mpmath import mpf, mpc, mp
>>> mp.dps = 40
>>> mpf(200) + mpf(2e-26) + mpc(1j)
mpc(real='200.0000000000000000000000000200000000000007', imag='1.0')
It's mostly accurate and can handle complex numbers; more details are in the documentation.
While numpy supports higher-precision float types (and complex versions of them), they don't help here:
>>> import numpy
>>> numpy.longfloat
<type 'numpy.float128'>
>>> a = numpy.array([200, 2e-26], dtype=numpy.longfloat)
>>> a
array([ 200.0, 2e-26], dtype=float128)
>>> a.sum()
200.0
>>> a = numpy.array([200, 2e-26], dtype=numpy.longdouble)
>>> a.sum()
200.0
The reason is explained here: internally, numpy uses 80-bit extended precision, which means a 64-bit mantissa, and that supports only about 19 significant decimal digits (64 * log10(2) is about 19.3), far fewer than the roughly 29 digits needed to represent 200 + 2e-26 exactly.
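You can confirm that limit directly (a quick check; output is for a typical x86 build where longdouble is 80-bit extended precision):
>>> numpy.finfo(numpy.longdouble).precision
18
>>> numpy.finfo(numpy.longdouble).eps
1.084202172485504434e-19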
What you need is a true 128-bit float type, like the one from Boost.
Try the Boost.Python module, which might give you access to that type. If it doesn't, you'll have to write your own wrapper class in C++, as explained here.
So I have a list of floating-point numbers, some of which have round-off errors and appear in the form 0.3599999. It is trivial to detect these by converting to a string and checking for a run of 9s. I wonder how a Python hacker would approach this, or whether there is a mathematical way to do it.
Thanks
Consider using Python's decimal module:
>>> from decimal import Decimal
>>> Decimal(0.35)
Decimal('0.34999999999999997779553950749686919152736663818359375')
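If you want an automated check, here is a hypothetical helper (my sketch, not part of the answer; the name looks_rounded_off and the tolerance are assumptions) that flags floats sitting suspiciously close to, but not exactly at, a shorter decimal:
>>> def looks_rounded_off(x, places=2, tol=1e-6):
...     nearest = round(x, places) # the short decimal the value seems to target
...     return x != nearest and abs(x - nearest) < tol
...
>>> looks_rounded_off(0.3599999)
True
>>> looks_rounded_off(0.36)
False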
Also have a look at NumPy's assert_approx_equal() function:
>>> import numpy as np
>>> np.testing.assert_approx_equal(0.12345677777777e-20, 0.1234567e-20)
>>> np.testing.assert_approx_equal(0.12345670e-20, 0.12345671e-20,
...                                significant=8)
>>> np.testing.assert_approx_equal(0.12345670e-20, 0.12345672e-20,
...                                significant=8)
...
<type 'exceptions.AssertionError'>:
Items are not equal to 8 significant digits:
ACTUAL: 1.234567e-021
DESIRED: 1.2345672000000001e-021
Can anyone explain the following? I'm using Python 2.5
Consider 1*3*5*7*9*11 ... *49. If you type all of that into the IPython(x,y) interactive console, you'll get 58435841445947272053455474390625L, which is correct. (Why odd numbers? Just the way I did it originally.)
NumPy's multiply.reduce() or prod() should yield the same result for the equivalent range. And it does, up to a certain point. Here, it is already wrong:
>>> from numpy import multiply
>>> k = range(1, 50, 2)
>>> multiply.reduce(k)
-108792223
Using prod(k) will also generate -108792223 as the result. Other incorrect results start to appear for equivalent ranges of length 12 (that is, k = range(1,24,2)).
I'm not sure why. Can anyone help?
This is because numpy.multiply.reduce() converts the range list to an array of type numpy.int32, and the reduce operation overflows what can be stored in 32 bits at some point:
>>> type(numpy.multiply.reduce(range(1, 50, 2)))
<type 'numpy.int32'>
As Mike Graham says, you can use the dtype parameter to use Python integers instead of the default:
>>> res = numpy.multiply.reduce(range(1, 50, 2), dtype=object)
>>> res
58435841445947272053455474390625L
>>> type(res)
<type 'long'>
But using numpy to work with Python objects is pointless in this case; the best solution is KennyTM's:
>>> import functools, operator
>>> functools.reduce(operator.mul, range(1, 50, 2))
58435841445947272053455474390625L
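(As an aside for readers on modern Python: 3.8+ ships math.prod, which does the same reduction with arbitrary-precision ints built in.)
>>> import math
>>> math.prod(range(1, 50, 2))
58435841445947272053455474390625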
The CPU doesn't multiply arbitrarily large numbers; it only performs specific operations defined on fixed-width binary values.
Python's * handles large integers perfectly through an arbitrary-precision representation and special code that goes beyond the CPU's or FPU's multiply instructions.
This is actually unusual as languages go.
In most other languages, a number is represented as a fixed-width array of bits. For example, in C or SQL you could choose an 8-bit integer that can represent 0 to 255 (or -128 to +127), or a 16-bit integer that can represent values up to 2^16 - 1 = 65535. When only a limited range of numbers can be represented, going past the limit with an operation like * or + has undesirable effects, such as producing a negative number. You likely hit exactly this problem when using the external library (numpy), which is implemented in C rather than pure Python.
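To see the difference directly, here is a small illustrative sketch (my example, not part of the answer above) contrasting numpy's fixed-width int32 wraparound with Python's unbounded int:
>>> import numpy
>>> a = numpy.array([2**31 - 1], dtype=numpy.int32) # largest 32-bit signed value
>>> a + 1 # wraps around on overflow
array([-2147483648], dtype=int32)
>>> (2**31 - 1) + 1 # Python's int just keeps growing
2147483648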