slice assignment slower using memoryview (Python 3.5.0) - python

I have a large bytearray x and want to assign a slice of it to a slice of another bytearray y
x = bytearray(10**7) #something else in practice
y = bytearray(6*10**6)
y[::6] = x[:2*10**6:2]
I figured using memoryview would be faster, and indeed
memoryview(x)[:2*10**6:2]
is very fast. However,
y[::6] = memoryview(x)[:2*10**6:2]
takes 5 times as long as y[::6] = x[:2*10**6:2]
Am I missing something, or is this slowdown a bug in Python?
What is the fastest way to do this in Python (a) if I want to repeatedly assign a known number of 0's, and (b) in general?

The slowdown is not so much a bug as a sign that memoryview and the buffer protocol are still relatively new and poorly optimised. The code underlying y[::6] = memoryview(x)[:2*10**6:2] creates a contiguous copy of the data before copying it over, so it is slower than directly creating and assigning a normal slice of the bytearray. Indeed, in this particular instance (on my machine), using a memoryview is closer in speed to using y[::6] = islice(x, None, 2*10**6, 2) than to direct assignment.
numpy has existed for much longer and is much better optimised for the types of operations you are interested in doing.
Using ipython:
In [1]: import numpy as np; from itertools import islice
In [2]: x = bytearray(10**7)
In [3]: y = bytearray(6*10**6)
In [4]: x_np = np.array(x)
In [5]: y_np = np.array(y)
In [6]: %timeit y[::6] = memoryview(x)[:2*10**6:2]
100 loops, best of 3: 10.9 ms per loop
In [7]: %timeit y[::6] = x[:2*10**6:2]
1000 loops, best of 3: 1.65 ms per loop
In [8]: %timeit y[::6] = islice(x, None, 2*10**6, 2)
10 loops, best of 3: 22.9 ms per loop
In [9]: %timeit y_np[::6] = x_np[:2*10**6:2]
1000 loops, best of 3: 911 µs per loop
The last two have the added benefit of having very little memory overhead.
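The session above builds x_np and y_np with np.array, which copies the data, so writes to y_np do not reach y. A possible variant, sketched here under the assumption that the result has to end up in the original bytearray, wraps both buffers with np.frombuffer instead. The test data and the helper names assign_bytearray, assign_memoryview and assign_numpy are illustrative only; timings will vary by machine and Python version.

```python
import timeit
import numpy as np

n = 2 * 10**6
x = bytearray(range(256)) * (10**7 // 256)  # non-zero test data, roughly 10**7 bytes
y = bytearray(6 * 10**6)

# Reference result using plain bytearray slicing.
y_ref = bytearray(len(y))
y_ref[::6] = x[:n:2]

# np.frombuffer wraps the existing buffers; no bytes are copied here.
x_np = np.frombuffer(x, dtype=np.uint8)
y_np = np.frombuffer(y, dtype=np.uint8)

def assign_bytearray():
    y[::6] = x[:n:2]

def assign_memoryview():
    y[::6] = memoryview(x)[:n:2]

def assign_numpy():
    # Strided copy performed by numpy; it writes through to y.
    y_np[::6] = x_np[:n:2]

assign_numpy()
matches_reference = (y == y_ref)

for func in (assign_bytearray, assign_memoryview, assign_numpy):
    seconds = timeit.timeit(func, number=10) / 10
    print(func.__name__, seconds)

# Case (a) from the question: fill the strided positions with zeros.
y_np[::6] = 0
```

Because y_np shares memory with y, nothing needs to be copied back into the bytearray afterwards.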

Related

numpy's random vs python's default random subsampling

I observed that python's default random.sample is much faster than numpy's random.choice. Taking a small sample from an array of length 1 million, random.sample is more than 1000x faster than its numpy counterpart.
In [1]: import numpy as np
In [2]: import random
In [3]: arr = [x for x in range(1000000)]
In [4]: nparr = np.array(arr)
In [5]: %timeit random.sample(arr, 5)
The slowest run took 5.25 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 4.54 µs per loop
In [6]: %timeit np.random.choice(arr, 5)
10 loops, best of 3: 47.7 ms per loop
In [7]: %timeit np.random.choice(nparr, 5)
The slowest run took 6.79 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 7.79 µs per loop
Although numpy sampling from a numpy array was decently fast, it was still slower than the default random sampling.
Is the observation above correct, or am I missing the difference between what random.sample and np.random.choice compute?
What you're seeing in your first call of numpy.random.choice is simply the overhead of converting the list arr to a numpy array.
As for your second call, the slightly worse performance is probably due to the fact that numpy.random.choice offers the ability to sample non-uniformly, and can sample with replacement as well as without.
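The two calls also do not compute quite the same thing: random.sample always draws without replacement, while np.random.choice draws with replacement unless replace=False is passed. A minimal sketch of a like-for-like comparison follows; it assumes a numpy version that provides np.random.default_rng, and the population size is chosen only for illustration.

```python
import random
import numpy as np

arr = list(range(100000))
nparr = np.asarray(arr)  # convert once, outside any timed code

# random.sample never returns the same element twice.
py_sample = random.sample(arr, 5)

# The default (replace=True) may return duplicates.
np_with_replacement = np.random.choice(nparr, 5)

# Passing replace=False gives the same semantics as random.sample.
rng = np.random.default_rng()
np_without_replacement = rng.choice(nparr, 5, replace=False)
```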

Why does numpy's fromiter function require specifying the dtype when other array creation routines don't?

In order to improve memory efficiency, I've been working on converting some of my code from lists to generators/iterators where I can. I've found a lot of cases where I am just converting a list I've made to an np.array with the code pattern np.array(some_list).
Notably, some_list is often a list comprehension that is iterating over a generator.
I was looking into np.fromiter to see if I could use the generator more directly (rather than having to first cast it into a list and then convert it into a numpy array), but I noticed that np.fromiter, unlike any other array creation routine that uses existing data, requires specifying the dtype.
In most of my particular cases I can make that work (I'm mostly dealing with log-likelihoods, so float64 will be fine), but it left me wondering why this is necessary only for fromiter and not for the other array creators.
First attempts at a guess:
Memory preallocation?
What I understand is that if you know the dtype and the count, it allows preallocating memory to the resulting np.array, and that if you don't specify the optional count argument that it will "resize the output array on demand". But if you do not specify the count, it would seem that you should be able to infer the dtype on the fly in the same way that you can in a normal np.array call.
Datatype recasting?
I could see this being useful for recasting data into new dtypes, but that would hold for other array creation routines as well, and would seem to merit placement as an optional but not required argument.
A couple ways of restating the question
So why is it that you need to specify the dtype to use np.fromiter; or put another way what are the gains that result from specifying the dtype if the array is going to be resized on demand anyway?
A more subtle version of the same question that is more directly related to my problem:
I know many of the efficiency gains of np.ndarrays are lost when you're constantly resizing them, so what is gained from using np.fromiter(generator,dtype=d) over np.fromiter([gen_elem for gen_elem in generator],dtype=d) over np.array([gen_elem for gen_elem in generator],dtype=d)?
If this code was written a decade ago, and there hasn't been pressure to change it, then the old reasons still apply. Most people are happy using np.array. np.fromiter is mainly used by people who are trying to squeeze out some speed from iterative methods of generating values.
My impression is that np.array, the main alternative, reads/processes the whole input before deciding on the dtype (and other properties):
I can force a float return just by changing one element:
In [395]: np.array([0,1,2,3,4,5])
Out[395]: array([0, 1, 2, 3, 4, 5])
In [396]: np.array([0,1,2,3,4,5,6.])
Out[396]: array([ 0., 1., 2., 3., 4., 5., 6.])
I don't use fromiter much, but my sense is that by requiring dtype, it can start converting the inputs to that type right from the start. That could end up producing a faster iteration, though that needs time tests.
I know that the np.array generality comes at a certain time cost. Often for small lists it is faster to use a list comprehension than to convert it to an array - even though array operations are fast.
Some time tests:
In [404]: timeit np.fromiter([0,1,2,3,4,5,6.],dtype=int)
100000 loops, best of 3: 3.35 µs per loop
In [405]: timeit np.fromiter([0,1,2,3,4,5,6.],dtype=float)
100000 loops, best of 3: 3.88 µs per loop
In [406]: timeit np.array([0,1,2,3,4,5,6.])
100000 loops, best of 3: 4.51 µs per loop
In [407]: timeit np.array([0,1,2,3,4,5,6])
100000 loops, best of 3: 3.93 µs per loop
The differences are small, but suggest my reasoning is correct. Requiring dtype helps keep fromiter faster. count does not make a difference at this small size.
Curiously, specifying a dtype for np.array slows it down. It's as though it appends an astype call:
In [416]: timeit np.array([0,1,2,3,4,5,6],dtype=float)
100000 loops, best of 3: 6.52 µs per loop
In [417]: timeit np.array([0,1,2,3,4,5,6]).astype(float)
100000 loops, best of 3: 6.21 µs per loop
The differences between np.array and np.fromiter are more dramatic when I use range(1000) (the lazy Python 3 version, not a list):
In [430]: timeit np.array(range(1000))
1000 loops, best of 3: 704 µs per loop
Actually, turning the range into a list is faster:
In [431]: timeit np.array(list(range(1000)))
1000 loops, best of 3: 196 µs per loop
but fromiter is still faster:
In [432]: timeit np.fromiter(range(1000),dtype=int)
10000 loops, best of 3: 87.6 µs per loop
It is faster to apply the int to float conversion on the whole array than to each element during the generation/iteration
In [434]: timeit np.fromiter(range(1000),dtype=int).astype(float)
10000 loops, best of 3: 106 µs per loop
In [435]: timeit np.fromiter(range(1000),dtype=float)
1000 loops, best of 3: 189 µs per loop
Note that the astype conversion is not that expensive, only some 20 µs.
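To tie the timings back to the generator use case in the question, here is a small sketch. The generator loglikelihoods is a hypothetical stand-in for the asker's data; only np.array, np.fromiter and astype are real numpy calls.

```python
import numpy as np

def loglikelihoods(n):
    # Hypothetical generator standing in for the asker's log-likelihood values.
    for i in range(1, n + 1):
        yield -0.5 * i

n = 1000

# np.array does not iterate a generator: the result is a 0-d object
# array that simply wraps the generator object.
wrapped = np.array(loglikelihoods(n))

# fromiter consumes the generator element by element; passing count lets
# it allocate the output once instead of growing it on demand.
direct = np.fromiter(loglikelihoods(n), dtype=np.float64, count=n)

# The list-based pattern from the question, for comparison.
via_list = np.array(list(loglikelihoods(n)), dtype=np.float64)

# Building integers first and converting the whole array afterwards.
converted = np.fromiter(range(n), dtype=np.int64).astype(np.float64)
```

Whether fromiter is actually faster for a given generator still needs to be measured, since the cost of producing each value usually dominates.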
============================
array_fromiter(PyObject *NPY_UNUSED(ignored), PyObject *args, PyObject *keywds) is defined in:
https://github.com/numpy/numpy/blob/eeba2cbfa4c56447e36aad6d97e323ecfbdade56/numpy/core/src/multiarray/multiarraymodule.c
It processes the keywds and calls
PyArray_FromIter(PyObject *obj, PyArray_Descr *dtype, npy_intp count)
in
https://github.com/numpy/numpy/blob/97c35365beda55c6dead8c50df785eb857f843f0/numpy/core/src/multiarray/ctors.c
This makes an initial array ret using the defined dtype:
ret = (PyArrayObject *)PyArray_NewFromDescr(&PyArray_Type, dtype, 1,
&elcount, NULL,NULL, 0, NULL);
The data attribute of this array is grown with 50% overallocation => 0, 4, 8, 14, 23, 36, 56, 86 ..., and shrunk to fit at the end.
The dtype of this array, PyArray_DESCR(ret), apparently has a function that can take value (provided by the iterator next), convert it, and set it in the data.
`PyArray_DESCR(ret)->f->setitem(value, item, ret)`
In other words, all the dtype conversion is done by the defined dtype. The code would be a lot more complicated if it decided 'on the fly' how to convert the value (and all previously allocated ones). Most of the code in this function deals with allocating the data buffer.
I'll hold off on looking up np.array. I'm sure it is much more complex.
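The allocation strategy described above can be imitated in pure Python. This is only an illustration, not numpy's actual implementation: grow is one rule that happens to reproduce the quoted sequence 0, 4, 8, 14, 23, 36, 56, 86, and fromiter_sketch is a hypothetical name.

```python
import numpy as np

def grow(capacity):
    # Roughly 50% overallocation; reproduces 0, 4, 8, 14, 23, 36, 56, 86, ...
    return capacity + (capacity >> 1) + (4 if capacity < 4 else 2)

def fromiter_sketch(iterable, dtype):
    buf = np.empty(0, dtype=dtype)
    n = 0
    for value in iterable:
        if n == buf.shape[0]:
            # Buffer is full: allocate a larger one and copy what we have.
            bigger = np.empty(grow(buf.shape[0]), dtype=dtype)
            bigger[:n] = buf[:n]
            buf = bigger
        # Item assignment converts the value to the fixed dtype,
        # playing the role of the dtype's setitem function.
        buf[n] = value
        n += 1
    # Shrink to fit at the end.
    return buf[:n].copy()

capacities = [0]
while capacities[-1] < 86:
    capacities.append(grow(capacities[-1]))

result = fromiter_sketch(range(10), dtype=float)
```

Because the dtype is fixed before the first element arrives, a resize only ever copies raw data; nothing already stored has to be reinterpreted.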

Can I vectorize this Python code?

I'm kind of new to Python and I have to implement an "as fast as possible" version of this code.
s = "<%dH" % (int(width*height),)
z = struct.unpack(s, contents)
heights = np.zeros((height, width))
for r in range(0, height):
    for c in range(0, width):
        elevation = z[(width*r) + c]
        if (elevation == 65535 or elevation < 0 or elevation > 20000):
            elevation = 0.0
        heights[r][c] = float(elevation)
I've read some of the python vectorization questions... but I don't think it applies to my case. Most of the questions are things like using np.sum instead of for loops. I guess I have two questions:
Is it possible to speed up this code...I think heights[r][c]=float(elevation) is where the bottleneck is. I need to find some Python timing commands to confirm this.
If it is possible to speed up this code, what are my options? I have seen some people recommend cython, pypy, weave. I could do this faster in C, but this code also needs to generate plots, so I'd like to stick with Python so I can use matplotlib.
As you mention, the key to writing fast code with numpy involves vectorization, and pushing the work off to fast C-level routines instead of Python loops. The usual approach seems to improve things by a factor of ten or so relative to your original code:
def faster(elevation, height, width):
    heights = np.array(elevation, dtype=float)
    heights = heights.reshape((height, width))
    heights[(heights < 0) | (heights > 20000)] = 0
    return heights
>>> h,w = 100, 101; z = list(range(h*w))
>>> %timeit orig(z,h,w)
100 loops, best of 3: 9.71 ms per loop
>>> %timeit faster(z,h,w)
1000 loops, best of 3: 641 µs per loop
>>> np.allclose(orig(z,h,w), faster(z,h,w))
True
That ratio seems to hold even for longer z:
>>> h,w = 1000, 10001; z = list(range(h*w))
>>> %timeit orig(z,h,w)
1 loops, best of 3: 9.44 s per loop
>>> %timeit faster(z,h,w)
1 loops, best of 3: 675 ms per loop
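A further option, sketched here as an assumption rather than something timed in this answer, is to skip struct.unpack and let numpy read the raw bytes directly. The "<%dH" format in the question means little-endian unsigned 16-bit values, which corresponds to the numpy dtype "<u2". The function name load_heights and the sample data are hypothetical.

```python
import struct
import numpy as np

def load_heights(contents, height, width):
    # Interpret the raw bytes as little-endian unsigned 16-bit integers.
    z = np.frombuffer(contents, dtype="<u2", count=height * width)
    # astype returns a new, writable float array; reshape gives the 2-D grid.
    heights = z.astype(float).reshape(height, width)
    # Unsigned values cannot be negative, and 65535 is already above 20000,
    # so a single comparison covers all three conditions of the original loop.
    heights[heights > 20000] = 0.0
    return heights

# Small made-up input for demonstration.
sample = struct.pack("<6H", 0, 100, 65535, 20000, 20001, 5)
grid = load_heights(sample, 2, 3)
```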

sign() much slower in python than matlab?

I have a function in python that basically takes the sign of an array (75,150), for example.
I'm coming from Matlab, and the execution time looks more or less the same except for this function.
I'm wondering whether sign() is really that slow, and whether you know an alternative that does the same thing.
Thx,
I can't tell you if this is faster or slower than Matlab, since I have no idea what numbers you're seeing there (you provided no quantitative data at all). However, as far as alternatives go:
import numpy as np
a = np.random.randn(75, 150)
aSign = np.sign(a)
Testing using %timeit in IPython:
In [15]: %timeit np.sign(a)
10000 loops, best of 3: 180 µs per loop
Because the loop over the array (and what happens inside it) is implemented in optimized C code rather than generic Python code, it tends to be about an order of magnitude faster—in the same ballpark as Matlab.
Comparing the exact same code as a numpy vectorized operation vs. a Python loop:
In [276]: %timeit [np.sign(x) for x in a]
1000 loops, best of 3: 276 us per loop
In [277]: %timeit np.sign(a)
10000 loops, best of 3: 63.1 us per loop
So, only 4x as fast here. (But then a is pretty small here.)

Are Numpy functions slow?

Numpy is supposed to be fast. However, when comparing Numpy ufuncs with standard Python functions I find that the latter are much faster.
For example,
aa = np.arange(1000000, dtype = float)
%timeit np.mean(aa) # 1000 loops, best of 3: 1.15 ms per loop
%timeit aa.mean # 10000000 loops, best of 3: 69.5 ns per loop
I got similar results with other Numpy functions like max, power. I was under the impression that Numpy has an overhead that makes it slower for small arrays but would be faster for large arrays. In the code above aa is not small: it has 1 million elements. Am I missing something?
Of course, Numpy is fast, only the functions seem to be slow:
bb = range(1000000)
%timeit mean(bb) # 1 loops, best of 3: 551 ms per loop
%timeit mean(list(bb)) # 10 loops, best of 3: 136 ms per loop
Others already pointed out that your comparison is not a real comparison (you are not calling the function + both are numpy).
But to give an answer to the question "Are numpy functions slow?": generally speaking, no, numpy functions are not slow (or not slower than plain python functions). Of course there are some side notes to make:
'Slow' depends of course on what you compare with, and it can always be faster. With things like cython, numexpr, numba, calling C-code, ... and others it is in many cases certainly possible to get faster results.
Numpy has a certain overhead, which can be significant in some cases. For example, as you already mentioned, numpy can be slower on small arrays and scalar math. For a comparison on this, see e.g. Are NumPy's math functions faster than Python's?
To make the comparison you wanted to make:
In [1]: import numpy as np
In [2]: aa = np.arange(1000000)
In [3]: bb = range(1000000)
For the mean (note, there is no mean function in python standard library: Calculating arithmetic mean (average) in Python):
In [4]: %timeit np.mean(aa)
100 loops, best of 3: 2.07 ms per loop
In [5]: %timeit float(sum(bb))/len(bb)
10 loops, best of 3: 69.5 ms per loop
For max, numpy vs plain python:
In [6]: %timeit np.max(aa)
1000 loops, best of 3: 1.52 ms per loop
In [7]: %timeit max(bb)
10 loops, best of 3: 31.2 ms per loop
As a final note, in the above comparison I used a numpy array (aa) for the numpy functions and a list (bb) for the plain python functions. If you would use a list with numpy functions, in this case it would again be slower:
In [10]: %timeit np.max(bb)
10 loops, best of 3: 115 ms per loop
because the list is first converted to an array (which consumes most of the time). So, if you want to rely on numpy in your application, it is important to use numpy arrays to store your data (or, if you have a list, to convert it to an array once so that the conversion is not repeated).
You're not calling aa.mean. Put the function call parentheses on the end, to actually call it, and the speed difference will nearly vanish. (Both np.mean(aa) and aa.mean() are NumPy; neither uses Python builtins to do the math.)
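The difference can be checked outside IPython with the standard timeit module. This is a minimal sketch; the repeat counts are arbitrary and the absolute numbers depend on the machine.

```python
import timeit
import numpy as np

aa = np.arange(1000000, dtype=float)

bound_method = aa.mean   # attribute lookup only; nothing is computed
value = aa.mean()        # actually computes the mean

# Time the bare lookup, the method call, and the function call.
t_lookup = timeit.timeit(lambda: aa.mean, number=1000) / 1000
t_method = timeit.timeit(lambda: aa.mean(), number=20) / 20
t_function = timeit.timeit(lambda: np.mean(aa), number=20) / 20

print("lookup:", t_lookup)
print("aa.mean():", t_method)
print("np.mean(aa):", t_function)
```

The first timing measures only how long it takes to fetch the method object, which is why it looked so fast in the question.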
