I wrote the following method (in Python 2.7) that generates a set of integers and transforms them into their binary representation. It takes two self-explanatory parameters: total_num_nodes and dim. It returns a numpy matrix containing the binary representation of all these integers:
import random
import numpy as np

def generate(total_num_nodes, dim):
    # Generate random integers from the range [0, 2**dim - 1]
    nodes_matrix = [random.randint(0, 2 ** dim - 1) for _ in range(total_num_nodes)]
    # Remove duplicates
    nodes_matrix = list(set(nodes_matrix))
    # Transform each node from decimal to a binary string representation
    nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]
    # Transform each bit into an integer.
    nodes_matrix = np.asarray([list(map(int, list(x))) for x in nodes_matrix], dtype=np.uint8)
    return nodes_matrix
The problem is that when I pass very large values, say total_num_nodes = 10,000,000 and dim = 128, generation takes a really long time. A friend of mine hinted that the following line is actually the bottleneck and is likely responsible for the majority of the computation time:
# Transforms each node from decimal to string representation
nodes_matrix = [('{0:0' + str(dim) + 'b}').format(x) for x in nodes_matrix]
I cannot think of a faster method to replace this line so that I can speed up generation when running on a single processor. Any suggestion is really appreciated.
Thank you
Do it all in numpy and it will be faster.
The following generates total_num_nodes rows of dim columns of np.uint8 data, then keeps only the unique rows by providing a view of the data suitable for np.unique, and finally translates back to a 2D array:
import numpy as np

def generate(total_num_nodes, dim):
    a = np.random.choice(np.array([0, 1], dtype=np.uint8), size=(total_num_nodes, dim))
    dtype = a.dtype.descr * dim
    temp = a.view(dtype)
    uniq = np.unique(temp)
    return uniq.view(a.dtype).reshape(-1, dim)
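A quick sanity check, continuing from the function above (the numbers are illustrative only; the row count can be smaller than requested if duplicate rows were drawn):

m = generate(5, 8)
print(m.shape)   # e.g. (5, 8); fewer rows if duplicates were removed
print(m.dtype)   # uint8
print(m[:2])     # each row is one node as dim bits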
I saw a video about speed of loops in python, where it was explained that doing sum(range(N)) is much faster than manually looping through range and adding the variables together, since the former runs in C due to built-in functions being used, while in the latter the summation is done in (slow) python. I was curious what happens when adding numpy to the mix. As I expected np.sum(np.arange(N)) is the fastest, but sum(np.arange(N)) and np.sum(range(N)) are even slower than doing the naive for loop.
Why is this?
Here's the script I used to test, with some comments about the supposed causes of the slowdowns where I know them (taken mostly from the video), and the results I got on my machine (Python 3.10.0, numpy 1.21.2):
updated script:
import numpy as np
from timeit import timeit
N = 10_000_000
repetition = 10
def sum0(N = N):
    s = 0
    i = 0
    while i < N: # condition is checked in python
        s += i
        i += 1 # both additions are done in python
    return s

def sum1(N = N):
    s = 0
    for i in range(N): # increment in C
        s += i # addition in python
    return s

def sum2(N = N):
    return sum(range(N)) # everything in C

def sum3(N = N):
    return sum(list(range(N)))

def sum4(N = N):
    return np.sum(range(N)) # very slow np.array conversion

def sum5(N = N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N), dtype = int))

def sum5v2_(N = N):
    # much faster np.array conversion
    return np.sum(np.fromiter(range(N), dtype = np.int_))

def sum6(N = N):
    # possibly slow conversion to Py_long from np.int
    return sum(np.arange(N))

def sum7(N = N):
    # list returns a list of np.int-s
    return sum(list(np.arange(N)))

def sum7v2(N = N):
    # tolist conversion to python int seems faster than the implicit conversion
    # in sum(list()) (tolist returns a list of python int-s)
    return sum(np.arange(N).tolist())

def sum8(N = N):
    return np.sum(np.arange(N)) # everything in numpy (fortran libblas?)

def sum9(N = N):
    return np.arange(N).sum() # remove dispatch overhead

def array_basic(N = N):
    return np.array(range(N))

def array_dtype(N = N):
    return np.array(range(N), dtype = np.int_)

def array_iter(N = N):
    # np.sum's source code mentions to use fromiter to convert from generators
    return np.fromiter(range(N), dtype = np.int_)
print(f"while loop: {timeit(sum0, number = repetition)}")
print(f"for loop: {timeit(sum1, number = repetition)}")
print(f"sum_range: {timeit(sum2, number = repetition)}")
print(f"sum_rangelist: {timeit(sum3, number = repetition)}")
print(f"npsum_range: {timeit(sum4, number = repetition)}")
print(f"npsum_iterrange: {timeit(sum5, number = repetition)}")
print(f"npsum_iterrangev2: {timeit(sum5, number = repetition)}")
print(f"sum_arange: {timeit(sum6, number = repetition)}")
print(f"sum_list_arange: {timeit(sum7, number = repetition)}")
print(f"sum_arange_tolist: {timeit(sum7v2, number = repetition)}")
print(f"npsum_arange: {timeit(sum8, number = repetition)}")
print(f"nparangenpsum: {timeit(sum9, number = repetition)}")
print(f"array_basic: {timeit(array_basic, number = repetition)}")
print(f"array_dtype: {timeit(array_dtype, number = repetition)}")
print(f"array_iter: {timeit(array_iter, number = repetition)}")
print(f"npsumarangeREP: {timeit(lambda : sum8(N/1000), number = 100000*repetition)}")
print(f"npsumarangeREP: {timeit(lambda : sum9(N/1000), number = 100000*repetition)}")
# Example output:
#
# while loop: 11.493371912998555
# for loop: 7.385945574002108
# sum_range: 2.4605720699983067
# sum_rangelist: 4.509678105998319
# npsum_range: 11.85120212900074
# npsum_iterrange: 4.464334709002287
# npsum_iterrangev2: 4.498494338993623
# sum_arange: 9.537815956995473
# sum_list_arange: 13.290120724996086
# sum_arange_tolist: 5.231948580003518
# npsum_arange: 0.241889145996538
# nparangenpsum: 0.21876695199898677
# array_basic: 11.736577274998126
# array_dtype: 8.71628468400013
# array_iter: 4.303306431000237
# npsumarangeREP: 21.240833958996518
# npsumarangeREP: 16.690092379001726
np.sum(range(N)) is slow mostly because the current Numpy implementation does not use enough information about the exact type/content of the values provided by the generator range(N). The heart of the problem is inherently due to the dynamic typing of Python and big integers, although Numpy could optimize this specific case.
First of all, range(N) returns a dynamically-typed Python object which is a (special kind of) Python generator. The objects provided by this generator are also dynamically-typed: in practice they are pure-Python integers.
The thing is Numpy is written in the statically-typed language C, so it cannot efficiently work on dynamically-typed pure-Python objects. The strategy of Numpy is to convert such objects into C types when it can. One big problem in this case is that the integers provided by the generator can theoretically be huge: Numpy does not know whether the values can overflow a np.int32 or even a np.int64 type. Thus, Numpy first detects the appropriate type to use and then computes the result using this type.
This translation process can be quite expensive and appears unnecessary here, since all the values provided by range(10_000_000) fit in a np.int32. However, range(5_000_000_000) returns the same object type with pure-Python integers overflowing np.int32, and Numpy needs to detect this case automatically so as not to return wrong results. Even when the input type is correctly identified (np.int32 on my machine), that does not mean the output will be correct, because overflows can appear during the computation of the sum. This is sadly the case on my machine.
Numpy developers decided to deprecate such a use and state in the documentation that np.fromiter should be used instead. np.fromiter has a required dtype parameter that lets the user define which type to use.
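For example, a minimal sketch of the recommended replacement (the count argument is optional, but when the length is known it lets fromiter preallocate the array):

import numpy as np

N = 10_000_000
# Explicit dtype: no type detection and no deprecated generator-to-array path.
result = np.sum(np.fromiter(range(N), dtype=np.int64, count=N))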
One way to check this behaviour in practice is to simply create a temporary list:
tmp = list(range(10_000_000))
# Numpy implicitly converts the list into a Numpy array but
# still automatically detects the input type to use
np.sum(tmp)
A faster implementation is the following:
tmp = list(range(10_000_000))
# The array is explicitly converted using a well-defined type and
# thus there is no need to perform an automatic detection
# (note that the result is still wrong since it does not fit in a np.int32)
tmp2 = np.array(tmp, dtype=np.int32)
result = np.sum(tmp2)
The first case takes 476 ms on my machine while the second takes 289 ms. Note that np.sum itself takes only 4 ms. Thus, a large part of the time is spent in the conversion of pure-Python integer objects to internal int32 types (more specifically, in the management of pure-Python integers). list(range(10_000_000)) is expensive too, as it takes 205 ms. This is again due to the overhead of pure-Python integers (i.e. allocations, deallocations, reference counting, incrementing variable-sized integers, memory indirections and branches due to dynamic typing) as well as the overhead of the generator.
sum(np.arange(N)) is slow because sum is a pure-Python function working on a Numpy-defined object. The CPython interpreter needs to call Numpy functions to perform basic additions. Moreover, Numpy-defined integer objects are still Python objects and so they are subject to reference counting, allocation, deallocation, etc. Not to mention Numpy and CPython add many checks in functions whose final purpose is just to add two native numbers together. A Numpy-aware just-in-time compiler such as Numba can solve this issue. Indeed, Numba takes 23 ms on my machine to compute the sum of np.arange(10_000_000) (with code still written in Python) while the CPython interpreter takes 556 ms.
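A minimal sketch of what such a Numba version might look like (assuming Numba is installed; the 23 ms figure above is the answerer's measurement, not reproduced here):

import numpy as np
from numba import njit

@njit  # compiled to native code, so the loop does not run in the interpreter
def numba_sum(arr):
    s = 0
    for x in arr:
        s += x
    return s

print(numba_sum(np.arange(10_000_000)))  # the first call also pays the compilation cost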
Let's see if I can summarize the results.
sum can work with any iterable, repeatedly asking for the next value and adding it. range is an iterable that is happy to supply the next value:
# sum_range: 1.4830789409988938
Making a list from a range takes time:
# sum_rangelist: 3.6745876889999636
Summing a pregenerated list is actually faster than summing the range:
%%timeit x = list(range(N))
sum(x)
np.sum is designed to sum arrays. It's a wrapper to np.add.reduce.
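For instance (a small illustration, not from the original answer):

x = np.arange(10)
print(np.sum(x))          # 45
print(np.add.reduce(x))   # 45, the ufunc machinery that np.sum delegates to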
np.sum has a deprecation warning for np.sum(generator), recommending the use of fromiter or Python sum:
# npsum_range: 16.216972655000063
fromiter is the best way of making an array from a generator. Using np.array on range is legacy code and may go away in the future. I think it's the only generator that np.array will accept.
np.array is a general purpose function that can handle many cases, including nested arrays, and conversion to various dtypes. As such it has to process the whole input argument, deducing both shape and dtype.
# npsum_fromiterrange:3.47655400199983
Iteration on a numpy array is slower than on a list, since it has to "unbox" each element.
# sum_arange: 16.656015603000924
Similarly making a list from an array is slow; same sort of python level iteration.
# sum_list_arange: 19.500842117000502
arr.tolist() is relatively fast, creating a pure python list in compiled code. So speed is similar to making a list from range.
# sum_arange_tolist: 4.004777374000696
np.sum of an array is pure numpy and quite fast. np.sum(x), where x = np.arange(N) is created ahead of time, is even faster (by about 4x), since the array creation is then excluded from the timing.
# npsum_arange: 0.2332638230000157
np.sum from range or list is dominated by the cost of creating the array first:
# array_basic: 16.1631146109994
# array_dtype: 16.550737804000164
# array_iter: 3.9803170430004684
From the CPython source code for sum, sum initially seems to attempt a fast path that assumes all inputs are the same type. If that fails, it will just iterate:
/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
I'm not entirely certain what is happening under the hood, but it is likely the repeated creation/conversion of C types to Python objects that is causing these slow-downs. It's worth noting that both sum and range are implemented in C.
This next bit is not really an answer to the question, but I wondered if we could speed up sum for python ranges as range is quite a smart object.
To do this I've used functools.singledispatch to override the built-in sum function specifically for the range type; then implemented a small function to calculate the sum of an arithmetic progression.
from functools import singledispatch

def sum_range(range_, /, start=0):
    """Overloaded `sum` for range, compute arithmetic sum"""
    n = len(range_)
    if not n:
        return start
    return int(start + (n * (range_[0] + range_[-1]) / 2))

sum = singledispatch(sum)
sum.register(range, sum_range)

def test():
    """
    >>> sum(range(0, 100))
    4950
    >>> sum(range(0, 10, 2))
    20
    >>> sum(range(0, 9, 2))
    20
    >>> sum(range(0, -10, -1))
    -45
    >>> sum(range(-10, 10))
    -10
    >>> sum(range(-1, -100, -2))
    -2500
    >>> sum(range(0, 10, 100))
    0
    >>> sum(range(0, 0))
    0
    >>> sum(range(0, 100), 50)
    5000
    >>> sum(range(0, 0), 10)
    10
    """

if __name__ == "__main__":
    import doctest
    doctest.testmod()
I'm not sure if this is complete, but it's definitely faster than looping.
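A quick way to see the dispatch in action after running the snippet above (illustrative only; exact timings will vary):

print(sum(range(10_000_000)))   # 49999995000000, computed via the closed-form formula
print(sum([1, 2, 3]))           # 6, non-range iterables fall back to the built-in sum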
I have a list/array of numbers, which I want to save to a binary file.
The crucial part is that each number should not be saved as a pre-defined data type.
The bits per value are constant for all values in the list but do not correspond to the typical data types (e.g. byte or int).
import numpy as np
# create 10 random numbers in range 0-63
values = np.int32(np.round(np.random.random(10)*63))
# each value requires exactly 6 bits
# how to save this to a file?
# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: str(bin(x)[2:]).zfill(6), values));
print(bitstring)
In the real project, there are more than a million values I want to store with a given bit depth.
I already tried the module bitstring, but appending each value to the BitArray costs a lot of time...
There may be some numpy-specific way that makes things easier, but here's a pure Python (2.x) way to do it. It first converts the list of values into a single integer, since Python supports int values of any length. Next it converts that int value into a string of bytes and writes it to the file.
Note: If you're sure all the values will fit within the bit-width specified, the array_to_int() function could be sped up slightly by changing the (value & mask) it's using to just value.
import random

def array_to_int(values, bitwidth):
    mask = 2**bitwidth - 1
    shift = bitwidth * (len(values)-1)
    integer = 0
    for value in values:
        integer |= (value & mask) << shift
        shift -= bitwidth
    return integer

# In Python 2.7 int and long don't have the "to_bytes" method found in Python 3.x,
# so here's one way to do the same thing.
def to_bytes(n, length):
    return ('%%0%dx' % (length << 1) % n).decode('hex')[-length:]

BITWIDTH = 6
#values = [random.randint(0, 2**BITWIDTH - 1) for _ in range(10)]
values = [0b000001 for _ in range(10)] # create fixed pattern for debugging
values[9] = 0b011111 # make last one different so it can be spotted

# just for debug/information: bit string representation
bitstring = "".join(map(lambda x: bin(x)[2:].zfill(BITWIDTH), values))
print(bitstring)

bigint = array_to_int(values, BITWIDTH)
width = BITWIDTH * len(values)
print('{:0{width}b}'.format(bigint, width=width)) # show integer's value in binary

num_bytes = (width + 7) // 8 # round up to a whole number of 8-bit bytes
with open('data.bin', 'wb') as file:
    file.write(to_bytes(bigint, num_bytes))
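For completeness, a minimal sketch of how the packed bytes could be read back into the original values (Python 2.x, matching the code above; the int_to_array helper is made up here for illustration and simply reverses array_to_int):

def int_to_array(integer, bitwidth, count):
    # Peel values off from the most significant end, undoing array_to_int().
    mask = 2**bitwidth - 1
    return [(integer >> (bitwidth * (count - 1 - i))) & mask for i in range(count)]

with open('data.bin', 'rb') as f:
    data = f.read()
restored = int_to_array(int(data.encode('hex'), 16), BITWIDTH, len(values))
print(restored == values)  # True if the round trip preserved the values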
Since you give an example with a string, I'll assume that's how you get the results. This means performance is probably never going to be great. If you can, try creating bytes directly instead of via a string.
Side note: I'm using Python 3 which might require you to make some changes for Python 2. I think this code should work directly in Python 2, but there are some changes around bytearrays and strings between 2 and 3, so make sure to check.
byt = bytearray(len(bitstring)//8 + 1)
for i, b in enumerate(bitstring):
    byt[i//8] += (b=='1') << i%8
and for getting the bits back:
bitret = ''
for b in byt:
    for i in range(8):
        bitret += str((b >> i) & 1)
For millions of bits/bytes you'll want to convert this to a streaming method instead, as you'd need a lot of memory otherwise.
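A minimal sketch of what such a streaming approach could look like (Python 3, keeping the same LSB-first bit order within each byte as the snippet above; this is an assumption for illustration, not part of the original answer):

def stream_pack(bits):
    """Yield packed bytes one at a time so memory use stays bounded."""
    acc, n = 0, 0
    for b in bits:                 # bits is any iterable of '0'/'1' characters
        acc |= (b == '1') << n
        n += 1
        if n == 8:
            yield acc
            acc, n = 0, 0
    if n:                          # final partial byte, zero-padded
        yield acc

with open('data.bin', 'wb') as f:
    for byte in stream_pack(bitstring):
        f.write(bytes([byte]))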
I am currently using numpy to create an array. I would like to use vectorized implementations to more efficiently take the average of the elements at each position (i, j). These arrays come from images in a file directory which have all been standardized to a fixed size.
However, when I try to add the image arrays, the sum of each element is returned modulo 256. How can I change the maximum value of the elements?
Your arrays are presumably of type numpy.uint8, so they wrap around when they hit 256.
If you want to get larger results, use astype to convert the first argument to a larger data type, e.g.:
a = np.array(..., dtype=np.uint8)
b = np.array(..., dtype=np.uint8)
c = a.astype(np.uint32) + b
and you'll get a result array of the larger data type too.
Per @Eric, to avoid the temporary you can use the numpy add function (not the method) to do the addition, passing a dtype so the result is of the new type even though the inputs are not converted, avoiding a temporary (at least at the Python level):
c = np.add(a, b, dtype=np.uint32)
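A tiny illustration of the difference (values chosen to force a wraparound):

import numpy as np

a = np.array([200, 200], dtype=np.uint8)
b = np.array([100, 100], dtype=np.uint8)
print(a + b)                           # [44 44]  -- wrapped around modulo 256
print(a.astype(np.uint32) + b)         # [300 300], via a temporary uint32 copy of a
print(np.add(a, b, dtype=np.uint32))   # [300 300], without that temporary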
You would be better off creating the output array first:
average = numpy.zeros(a.shape, numpy.float32)
image = numpy.zeros_like(average)
Then traversing the images and adding them up in-place:
for i in images:
    image[:] = function_that_reads_images_as_uint8(i)
    average += image
average /= len(images)
You might get away with int types if you didn't need the precision in the division step.
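For example, if whole-number precision is enough, a sketch of the integer variant (reusing the placeholder reader function from above) could look like this:

# uint32 gives plenty of headroom: 255 * len(images) stays far below 2**32
# for any realistic number of images.
total = numpy.zeros(a.shape, numpy.uint32)
for i in images:
    total += function_that_reads_images_as_uint8(i)
average = total // len(images)   # integer division, truncates the fractional part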
I want to process a big list of uint numbers (test1), and I do that in chunks of "length". I need them as signed ints, and then I need the magnitude of each pair of even- and odd-indexed values in this list.
But I want to get rid of two problems:
it uses a lot of RAM
it takes ages!
So how could I make this faster? Any trick? I could also use numpy, no problem in doing so.
Thanks in advance!
test2 = -127 + test1[i:i+length*2048000*2 + 2048000*2*1]
test3 = (test2[::2]**2 + test2[1::2]**2)**0.5
An efficient way is to use Numpy functions, e.g.:
n = 10
ff = np.random.randint(0, 255, n) # generate some data
ff2 = ff.reshape(n//2, 2) # new view on ff (only makes copy if needed)
l_ff = np.linalg.norm(ff2, axis=1) # calculate vector length of each row
Note that when you modify an entry in ff2, ff will change as well, and vice versa.
Internally, Numpy stores data as contiguous memory blocks. So there are further methods besides np.reshape() to exploit that structure. For efficient conversion of data types, you can try:
dd_s = np.arange(-5, 10, dtype=np.int8)
dd_u = dd_s.astype(np.uint8) # conversion from signed to unsigned
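Putting those pieces together for the question's data, a sketch might look like this (the variable names follow the question, the -127 offset is taken from the posted snippet, and the random input is just a stand-in for the real data):

import numpy as np

test1 = np.random.randint(0, 255, 2048000).astype(np.uint8)  # stand-in for the real data
test2 = test1.astype(np.int16) - 127        # signed values, no uint8 wraparound
pairs = test2.reshape(-1, 2)                # view as (even, odd) pairs, no extra copy
test3 = np.linalg.norm(pairs, axis=1)       # magnitude of each pair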
Is there a more efficient way to generate a 10 kBit (10,000 bits) random binary sequence in Python than appending 0s and 1s in a loop?
If you want a random binary sequence then it's probably quickest just to generate a random integer in the appropriate range:
import random
s = random.randint(0, 2**10000 - 1)
After this it really depends on what you want to do with your binary sequence. You can query individual bits using bitwise operations:
s & (1 << x) # is bit x set?
or you could use a library like bitarray or bitstring if you want to make checking, setting slicing etc. easier:
from bitstring import BitArray
b = BitArray(uint=s, length=10000)
p, = b.find('0b000000')
if b[99]:
b[100] = False
...
The numpy package has a subpackage 'random' which can produce arrays of random numbers.
http://docs.scipy.org/doc/numpy/reference/routines.random.html
If you want an array of 'n' random bits, you can use
arr = numpy.random.randint(2, size=(n,))
... but depending on what you are doing with them, it may be more efficient to use e.g.
arr = numpy.random.randint(0x10000, size=(n,))
to get an array of 'n' numbers, each with 16 random bits; then
rstring = arr.astype(numpy.uint16).tostring()
turns that into a string of 2*n chars, containing the same random bits.
Here is a one-liner:
import random
[random.randrange(2) for _ in range(10000)]