I need to simulate a piece of hardware that generates binary files where each word is 10 bits. How can I achieve this with a numpy array?
Something like:
outarray = np.zeros(512, dtype=np.int10)
Thanks!
NumPy doesn't have a uint10 type. But you can use uint16 plus a bitmask to check for overflow, and np.binary_repr to get the 10-bit binary representations:
import numpy as np
MAX_WORD = 2**10
unused_bits = ~np.array([MAX_WORD-1], dtype="uint16") # Binary mask of the 6 unused_bits
words = np.random.randint(MAX_WORD, size=10, dtype="uint16") # Create 10 bit words
assert not np.any(words & unused_bits) # Check for overflow
for word in words:
    print(word, np.binary_repr(word, width=10)) # Get 10 bit binary representation
binary_repr = "".join(np.binary_repr(word, width=10) for word in words)
print(binary_repr) # Full binary representation
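If the goal is to actually write these words to a binary file, one option (a sketch of my own, assuming the 10-bit words should be packed back-to-back, most significant bit first, with the final byte zero-padded; the filename is hypothetical) is np.packbits:
bits = np.array([int(b) for b in binary_repr], dtype=np.uint8)  # one array element per bit
packed = np.packbits(bits)   # 10 words * 10 bits = 100 bits -> 13 bytes, last byte zero-padded
packed.tofile("words.bin")   # hypothetical output file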
Another option you could consider, if you're mainly interested in understanding the accuracy of arithmetic operations on 10-bit numbers, is to use the spfpm package. This will simulate the effect of fixed-point arithmetic operations, including multiplication, division, square roots, trigonometric functions, etc., but doesn't currently support matrix operations.
It is apparent that NumPy has an upper bound for its integers. But my question is, is there a way to store the elements in NumPy arrays, like by keeping the values and the magnitudes separate? Wouldn't that technically allow storing of larger numbers than what the int64 limit allows?
For example, you can store arbitrary-precision integers in a NumPy array using dtype=object and perform addition, multiplication, element-wise multiplication, subtraction and integer division, but not operations that lead to float results; for example, np.exp(x) won't work.
import numpy as np

x = np.ones((10, 10), dtype=object)   # object dtype holds plain Python ints
x *= 2**100                           # well beyond the int64 range
x *= x                                # element-wise multiplication stays exact
print(x)
If you want truly arbitrary-precision matrix classes, I would implement them on my own with proper operator overloading, with the help of mpmath.
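For instance, a minimal sketch (my own illustration, assuming mpmath is installed) of holding mpmath's arbitrary-precision floats in an object-dtype array:
import numpy as np
from mpmath import mp, mpf

mp.dps = 50                                 # 50 decimal digits of precision
x = np.full((3, 3), mpf(1), dtype=object)   # object array of mpmath floats
x *= mpf(2) ** 100
print(x * x)                                # element-wise ops keep full precision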
I understand that NumPy can't handle non-native integers, but how can I store Python high-precision integers as an array of native integers (in either endianness)? E.g.
a = 105951305240504794066266398962584786593081186897777398483830058739006966285013
can't be stored as a native integer because it's 256 bit. But it can be stored as
A = array([18196013122530525909, 15462736877728584896,
12869567647602165677, 16879016735257358861], dtype=uint64)
using little-endian (i.e. a == A[0] + (A[1]<<64) + (A[2]<<128) + (A[3]<<192)) or A[::-1] as big-endian. How can I convert from a to A here?
I want to convert this "python-side" number to "numpy-side" so that I can run highly efficient algorithms on it (e.g. fast multiplication using Fourier transform).
I believe Python internally should already be using a similar structure. All I need to do is "expose" it to numpy, but I'm not sure about the exact structure or how to "expose" it. The most straightforward way is of course a while loop:
A = np.zeros(4, 'uint64')
i = 0
while a > 0:
    A[i] = a & (2**64 - 1)
    a >>= 64
    i += 1
But I'm wondering are there more "native" or "efficient" ways?
Thanks for your help!
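One more direct route (a sketch of my own, not from the original post, assuming a fixed 256-bit width and little-endian limb order) is to go through the raw byte representation:
import numpy as np

a = 105951305240504794066266398962584786593081186897777398483830058739006966285013
buf = a.to_bytes(32, byteorder="little")   # 256 bits = 32 bytes
A = np.frombuffer(buf, dtype="<u8")        # four little-endian uint64 limbs
print(A)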
For 1-D numpy arrays, these two expressions should yield the same result (theoretically):
(a*b).sum()/a.sum()
dot(a, b)/a.sum()
The latter uses dot() and is faster. But which one is more accurate? Why?
Some context follows.
I wanted to compute the weighted variance of a sample using numpy.
I found the dot() expression in another answer, with a comment stating that it should be more accurate. However no explanation is given there.
NumPy's dot is one of the routines that calls the BLAS library you link against at compile time (or the one NumPy builds itself). The importance of this is that the BLAS library can make use of multiply-accumulate operations (usually fused multiply-add), which limit the number of roundings the computation performs.
Take the following:
>>> a=np.ones(1000,dtype=np.float128)+1E-14
>>> (a*a).sum()
1000.0000000000199948
>>> np.dot(a,a)
1000.0000000000199948
Not exact, but close enough.
>>> a=np.ones(1000,dtype=np.float64)+1E-14
>>> np.dot(a,a)
1000.0000000000176 #off by 2.3948e-12
>>> (a*a).sum()
1000.0000000000059 #off by 1.40948e-11
np.dot(a, a) will be the more accurate of the two, as it uses approximately half the number of floating-point roundings that the naive (a*a).sum() does.
A book by Nvidia has the following example for 4 digits of precision, where rn stands for rounding to the nearest 4 digits:
x = 1.0008
x^2 = 1.00160064                    # true value
rn(x^2 - 1) = 1.6006 × 10^-4        # fused multiply-add
rn(rn(x^2) - 1) = 1.6000 × 10^-4    # multiply, then add
Of course floating point numbers are not rounded to the 16th decimal place in base 10, but you get the idea.
Placing np.dot(a,a) in the above notation with some additional pseudo code:
out = 0
for x in a:
    out = rn(x*x + out)  # fused multiply-add
While (a*a).sum() is:
arr = np.zeros(a.shape[0])
for x in range(len(arr)):
    arr[x] = rn(a[x]*a[x])
out = 0
for x in arr:
    out = rn(x + out)
From this it's easy to see that the number is rounded twice as many times using (a*a).sum() compared to np.dot(a,a). These small differences, summed up, can change the answer minutely. Additional examples can be found here.
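A runnable sketch of the two accumulation patterns above (my own illustration; it assumes Python 3.13 or newer, where math.fma is available, to mimic the fused multiply-add path):
import math
import numpy as np

a = np.ones(1000, dtype=np.float64) + 1e-14

fused = 0.0
for x in a:
    fused = math.fma(x, x, fused)   # one rounding per element

naive = 0.0
for x in a:
    naive += x * x                  # two roundings per element

print(fused, naive, np.dot(a, a), (a * a).sum())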
I would like to generate uniformly distributed random numbers between 0 and 0.5, but truncated to 2 decimal places.
without the truncation, I know this is done by
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
could anyone help me with suggestions on how to generate random numbers up to 2 d.p. only? Thanks!
A float cannot be truncated (or rounded) to 2 decimal digits, because there are many values with 2 decimal digits that just cannot be represented exactly as an IEEE double.
If you really want what you say you want, you need to use a type with exact precision, like Decimal.
Of course there are downsides to doing that—the most obvious one for numpy users being that you will have to use dtype=object, with all of the compactness and performance implications.
But it's the only way to actually do what you asked for.
Most likely, what you actually want to do is either Joran Beasley's answer (leave them untruncated, and just round at print-out time) or something similar to Lauritz V. Thaulow's answer (get the closest approximation you can, then use explicit epsilon checks everywhere).
Alternatively, you can use implicit fixed-point arithmetic, as David Heffernan suggests in a comment: generate random integers between 0 and 50, keep them as integers within numpy, and just format them as fixed-point decimals and/or convert to Decimal when necessary (e.g., for printing results). This gives you all of the advantages of Decimal without the costs… although it does open an obvious window for new bugs by forgetting to shift 2 places somewhere.
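A small sketch of that implicit fixed-point idea (my own illustration; the variable names and the inclusive upper bound of 50 are assumptions):
import numpy as np
from decimal import Decimal

rs = np.random.RandomState(123456)
cents = rs.randint(0, 51, size=(50, 1))                    # integers 0..50, meaning 0.00..0.50
as_decimal = [Decimal(int(c)) / 100 for c in cents.flat]   # exact two-digit decimals
print(["%0.2f" % (c / 100) for c in cents.flat])           # or just format at print time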
Decimals are never truncated to 2 decimal places ... however, their string representation may be:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,1))*0.5
print ["%0.2d"%val for val in set]
How about this?
np.random.randint(0, 50, size=(50,1)).astype("float") / 100
That is, create random integers between 0 and 50, and divide by 100.
EDIT:
As made clear in the comments, this will not give you exact two-digit decimals to work with, due to the nature of float representations in memory. It may look like you have the exact float 0.1 in your array, but it definitely isn't exactly 0.1. But it is very very close, and you can get it closer by using a "double" datatype instead.
You can postpone this problem by just keeping the numbers as integers, and remember that they're to be divided by 100 when you use them.
hundreds = np.random.randint(0, 50, size=(50, 1))
Then at least the roundoff won't happen until the last minute (or maybe not at all, if the numerator of the equation is a multiple of the denominator).
I managed to find another alternative:
import numpy as np
rs = np.random.RandomState(123456)
set = rs.uniform(size=(50,2))
for i in range(50):
    for j in range(2):
        set[i, j] = round(set[i, j], 2)
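A vectorized equivalent of that double loop (my own note; it carries the same caveat that the results are only the closest representable floats) would be:
set = np.round(rs.uniform(size=(50, 2)), 2)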
Is there a more efficient way to generate a 10 kBit (10,000 bits) random binary sequence in Python than appending 0s and 1s in a loop?
If you want a random binary sequence then it's probably quickest just to generate a random integer in the appropriate range:
import random
s = random.randint(0, 2**10000 - 1)
After this it really depends on what you want to do with your binary sequence. You can query individual bits using bitwise operations:
s & (1 << x) # is bit x set?
or you could use a library like bitarray or bitstring if you want to make checking, setting, slicing etc. easier:
from bitstring import BitArray
b = BitArray(uint=s, length=10000)
p, = b.find('0b000000')
if b[99]:
b[100] = False
...
The numpy package has a subpackage 'random' which can produce arrays of random numbers.
http://docs.scipy.org/doc/numpy/reference/routines.random.html
If you want an array of 'n' random bits, you can use
arr = numpy.random.randint(2, size=(n,))
... but depending on what you are doing with them, it may be more efficient to use e.g.
arr = numpy.random.randint(0x10000, size=(n,))
to get an array of 'n' numbers, each with 16 random bits; then
rstring = arr.astype(numpy.uint16).tobytes()
turns that into a byte string of 2*n bytes, containing the same random bits.
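Another option (a sketch of my own, assuming NumPy 1.17+ for the Generator API) is to draw all the bits at once and pack them into bytes:
import numpy as np

rng = np.random.default_rng()
bits = rng.integers(0, 2, size=10_000, dtype=np.uint8)   # array of 10,000 random 0s and 1s
packed = np.packbits(bits)                               # 1250 bytes holding the same bits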
Here is a one liner:
import random
[random.randrange(2) for _ in range(10000)]