Say we have a call like:
import pandas as pd

ser = pd.Series([1, 2, 3, 4])
ser[ser > 1].any()
Now my question is: is pandas "smart enough" to stop the computation and return True as soon as it encounters the 2, or does it really go through the whole array first and only check any() afterwards? If the latter is true: how do I avoid this behavior?
Is pandas "smart enough" to stop computation
pandas works differently: it relies heavily on vectorized operations (applying one operation/function to a whole sequence of values at once), and the expression ser[ser > 1].any() implies:
ser > 1 - evaluated over the whole series, returning a boolean array of per-value results: array([False, True, True, True])
ser[ser > 1] - filters the series by that boolean array
.any() - finally evaluates the function on the filtered series
Actually, your intention is already covered by (ser > 1).any() (without the interim filtering).
If you expect classical any behavior (returning immediately on encountering the first True), you can go the classical Python way:
any(x > 1 for x in ser)
And, of course, the classical way is faster in this case:
In [409]: %timeit (ser > 1).any()
75.4 µs ± 636 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [410]: %timeit any(x > 1 for x in ser)
1.44 µs ± 22.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
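If you want to see the short-circuiting explicitly, here is a small sketch with a hypothetical helper that also reports how many elements it inspected before stopping:
def first_greater_than_one(ser):
    # hypothetical helper: stops as soon as a value > 1 is found
    # and reports how many elements were inspected along the way
    inspected = 0
    for x in ser:
        inspected += 1
        if x > 1:
            return True, inspected
    return False, inspected

print(first_greater_than_one(pd.Series([1, 2, 3, 4])))  # (True, 2): it stops at the second element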
I would like to set an entire field of a NumPy structured scalar from within a Numba compiled nopython function. The desired_fn in the code below is a simple example of what I would like to do, and working_fn is an example of how I can currently accomplish this task.
import numpy as np
import numba as nb

test_numpy_dtype = np.dtype([("blah", np.int64)])
test_numba_dtype = nb.from_dtype(test_numpy_dtype)

@nb.njit
def working_fn(thing):
    for j in range(len(thing)):
        thing[j]['blah'] += j

@nb.njit
def desired_fn(thing):
    thing['blah'] += np.arange(len(thing))

a = np.zeros(3, test_numpy_dtype)
print(a)
working_fn(a)
print(a)
desired_fn(a)
The error generated from running desired_fn(a) is:
numba.errors.InternalError: unsupported array index type const('blah') in [const('blah')]
[1] During: typing of staticsetitem at /home/sam/PycharmProjects/ChessAI/playground.py (938)
This is needed for extremely performance critical code, and will be run billions of times, so eliminating the need for these types of loops seems to be crucial.
The following works (numba 0.37):
@nb.njit
def desired_fn(thing):
    thing.blah[:] += np.arange(len(thing))
    # or
    # thing['blah'][:] += np.arange(len(thing))
If you are operating primarily on columns of your data instead of rows, you might consider using a different data container. A numpy structured array is laid out like a vector of structs rather than a struct of arrays. This means that when you want to update blah, you are moving through non-contiguous memory space as you traverse the array.
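For illustration, here is a minimal sketch of what a struct-of-arrays layout could look like for this example (the separate blah/other arrays are made up for illustration, not taken from the question):
import numpy as np
import numba as nb

# struct of arrays: each field lives in its own contiguous array
blah = np.zeros(3, dtype=np.int64)      # contiguous memory for this field
other = np.zeros(3, dtype=np.float64)   # hypothetical second field

@nb.njit
def update_blah(blah):
    # walks over contiguous memory instead of striding across whole records
    blah += np.arange(len(blah))

update_blah(blah)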
Also, with any code optimization, it's always worth using timeit or some other timing harness (one that excludes the time required to JIT-compile the code) to see what the actual performance is. You might find with numba that explicit looping, while more verbose, can actually be faster than your vectorized code.
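For example, a bare-bones timing harness along those lines (array size and repeat count are arbitrary) could look like this:
import timeit

a = np.zeros(100000, test_numpy_dtype)
working_fn(a)    # call each function once so JIT compilation isn't included in the timing
desired_fn(a)

print(timeit.timeit(lambda: working_fn(a), number=1000))
print(timeit.timeit(lambda: desired_fn(a), number=1000))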
Without numba, accessing field values is no slower than accessing columns of a 2d array:
In [1]: arr2 = np.zeros((10000), dtype='i,i')
In [2]: arr2.dtype
Out[2]: dtype([('f0', '<i4'), ('f1', '<i4')])
Modifying a field:
In [4]: %%timeit x = arr2.copy()
...: x['f0'] += 1
...:
16.2 µs ± 13.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Similar time if I assign the field to a new variable:
In [5]: %%timeit x = arr2.copy()['f0']
...: x += 1
...:
15.2 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Much faster if I construct a 1d array of the same size:
In [6]: %%timeit x = np.zeros(arr2.shape, int)
...: x += 1
...:
8.01 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
But similar time when accessing the column of a 2d array:
In [7]: %%timeit x = np.zeros((arr2.shape[0],2), int)
...: x[:,0] += 1
...:
17.3 µs ± 23.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Let A be an (N, M, M) array (with N very large); I would like to compute scipy.linalg.expm(A[n,:,:]) for each n in range(N). I can of course just use a for loop, but I was wondering if there is some trick to do this in a better way (something like np.einsum).
I have the same question for other operations like inverting matrices (inverting solved in comments).
Depending on the size and structure of your matrices, you can do better than a loop.
Assuming your matrices can be diagonalized as A = V D V^(-1) (where D has the eigenvalues in its diagonal and V contains the corresponding eigenvectors as columns), you can compute the matrix exponential as
exp(A) = V exp(D) V^(-1)
where exp(D) simply contains exp(lambda) for each eigenvalue lambda in its diagonal. This is really easy to prove if we use the power series definition of the exponential function. If the matrix A is furthermore normal, the matrix V is unitary and thus its inverse can be computed by simply taking its adjoint.
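For completeness, here is the power-series argument spelled out (a short LaTeX sketch):

\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!}
        = \sum_{k=0}^{\infty} \frac{(V D V^{-1})^k}{k!}
        = V \left( \sum_{k=0}^{\infty} \frac{D^k}{k!} \right) V^{-1}
        = V \exp(D) V^{-1},

since (V D V^{-1})^k = V D^k V^{-1} (all the inner V^{-1} V factors cancel).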
The good news is that numpy.linalg.eig and numpy.linalg.inv both work with stacked matrices just fine:
import numpy as np
import scipy.linalg
A = np.random.rand(1000,10,10)
def loopy_expm(A):
    expmA = np.zeros_like(A)
    for n in range(A.shape[0]):
        expmA[n, ...] = scipy.linalg.expm(A[n, ...])
    return expmA

def eigy_expm(A):
    vals, vects = np.linalg.eig(A)
    return np.einsum('...ik, ...k, ...kj -> ...ij',
                     vects, np.exp(vals), np.linalg.inv(vects))
Note that there's probably some room for optimization in specifying the order of operations in the call to einsum, but I didn't investigate that.
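For what it's worth, one low-effort thing to try here (assuming a NumPy recent enough to support it, 1.12+) is einsum's optimize keyword, which lets NumPy choose a contraction order itself; I haven't benchmarked whether it helps for these shapes, and the timings below use the plain version without it:
expmA = np.einsum('...ik, ...k, ...kj -> ...ij',
                  vects, np.exp(vals), np.linalg.inv(vects),
                  optimize=True)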
Testing loopy_expm and eigy_expm on the random array:
In [59]: np.allclose(loopy_expm(A),eigy_expm(A))
Out[59]: True
In [60]: %timeit loopy_expm(A)
824 ms ± 55.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [61]: %timeit eigy_expm(A)
138 ms ± 992 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
That's already nice. If you're lucky enough that your matrices are all normal (say, because they are real symmetric):
A = np.random.rand(1000,10,10)
A = (A + A.transpose(0,2,1))/2
def eigy_expm_normal(A):
    vals, vects = np.linalg.eig(A)
    return np.einsum('...ik, ...k, ...jk -> ...ij',
                     vects, np.exp(vals), vects.conj())
Note the symmetric definition of the input matrix and the transpose inside the pattern of einsum. Results:
In [80]: np.allclose(loopy_expm(A),eigy_expm_normal(A))
Out[80]: True
In [79]: %timeit loopy_expm(A)
878 ms ± 89.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [80]: %timeit eigy_expm_normal(A)
55.8 ms ± 868 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
That is a 15-fold speedup for the above example shapes.
It should be noted, though, that scipy.linalg.expm uses a Padé approximation according to the documentation. This might imply that if your matrices are ill-conditioned, the eigenvalue decomposition may yield different results than scipy.linalg.expm. I'm not familiar with how that function works internally, but I expect it to be safer for pathological inputs.
I'm trying to optimize my code a bit. One call is pretty fast, but since it happens often I run into issues.
My input data looks like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randn(30),
                  index=pd.date_range('2016-01-01', periods=30))
df.iloc[:20] = np.nan
Now I just want to apply a simple function. Here is the part I want to optimize:
s = df >= df.shift(1)
s = s.applymap(lambda x: 1 if x else 0)
Right now I'm getting 1000 loops, best of 3: 1.36 ms per loop. I guess it should be possible to do this much faster. I'm not sure whether I should vectorize, work only with numpy, or maybe use cython. Any idea for the best approach? I struggle a bit with the shift operator.
You can cast the result of your comparison directly from bool to int:
(df >= df.shift(1)).astype(int)
@Paul H's answer is good, performant, and what I'd generally recommend.
That said, if you want to squeeze every last bit of performance, this is a decent candidate for numba which you can use to compute the answer in a single pass over the data.
import numpy as np
from numba import njit

@njit
def do_calc(arr):
    N = arr.shape[0]
    ans = np.empty(N, dtype=np.int_)
    ans[0] = 0
    for i in range(1, N):
        ans[i] = 1 if arr[i] > arr[i - 1] else 0
    return ans
a = (df >= df.shift(1)).astype(int)
b = pd.DataFrame(pd.Series(do_calc(df[0].values), df[0].index))
from pandas.testing import assert_frame_equal
assert_frame_equal(a, b)
Here are timings
In [45]: %timeit b = pd.DataFrame(pd.Series(do_calc(df[0].values), df[0].index))
135 µs ± 1.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [46]: %timeit a = (df >= df.shift(1)).astype(int)
762 µs ± 22.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
That's my current best solution:
values = df.values[1:] >= df.values[:-1]
data = np.array(values, dtype=int)
s = pd.DataFrame(data, df.index[1:])
I'm getting 10000 loops, best of 3: 125 µs per loop, a 10x improvement. But I think it could be done even faster.
PS: this solution isn't exactly correct since the first zero / nan is missing.
PPS: that can be corrected by pd.DataFrame(np.append([[0]],data), df.index)
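Putting the PS/PPS together, the corrected version of this solution would presumably look like this (a sketch assembled from the snippets above):
values = df.values[1:] >= df.values[:-1]
data = np.array(values, dtype=int)
s = pd.DataFrame(np.append([[0]], data), df.index)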
According to the documentation that I could find, when using fancy indexing a copy rather than a view is returned. However, I couldn't figure out what its behavior is during assignment to another array, for instance:
import numpy as np

A = np.arange(0, 10)
B = np.arange(-10, 0)
fancy_slice = np.array([0, 3, 5])
A[fancy_slice] = B[fancy_slice]
I understand that A will just receive a call to __setitem__ while B will get a call to __getitem__. What I am concerned about is whether an intermediate array is created before copying the values over to A.
The interpreter will parse the code and issue the method calls as:
A[idx] = B[idx]
A.__setitem__(idx, B.__getitem__(idx))
The B method is evaluated fully before being passed to the A method. numpy doesn't alter the Python interpreter or its syntax. Rather it just adds functions, objects, and methods.
Functionally, it should be equivalent to:
temp = B[idx]
A[idx] = temp
del temp
We could do some timeit tests just to be sure.
In [712]: A = np.zeros(10000,int)
In [713]: B = np.arange(10000)
In [714]: idx = np.arange(0,10000,100)
In [715]: timeit A[idx] = B[idx]
1.2 µs ± 3.24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [716]: %%timeit
...: temp = B[idx]
...: A[idx] = temp
...:
1.11 µs ± 0.669 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
There are some alternative functions/methods, like add.at, copyto, place, put, that may do some copies without an intermediate, but I haven't used them much. This indexed assignment is good enough - most of the time.
Example with copyto
In [718]: wh = np.zeros(A.shape, bool)
In [719]: wh[idx] = True
In [721]: np.copyto(A, B, where=wh)
In [722]: timeit np.copyto(A, B, where=wh)
7.47 µs ± 9.92 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So even without timing the construction of the boolean mask, copyto is slower.
put and take are no better:
In [727]: timeit np.put(A,idx, np.take(B,idx))
7.98 µs ± 8.34 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
An intermediate array is created. It has to be created. NumPy doesn't see
A[fancy_slice] = B[fancy_slice]
It sees
B[fancy_slice]
on its own, with no idea what the context is. This operation is defined to make a new array, and NumPy makes a new array.
Then, NumPy sees
A[fancy_slice] = <the array created by the previous operation>
and copies the data into A.
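A quick way to convince yourself that B[fancy_slice] really is a standalone copy (a small sketch):
import numpy as np

B = np.arange(-10, 0)
fancy_slice = np.array([0, 3, 5])

sub = B[fancy_slice]   # fancy indexing builds a brand-new array
sub[:] = 999           # modifying it ...
print(B[fancy_slice])  # ... leaves B untouched: [-10  -7  -5]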
Is there some difference between NumPy np.inf and float('Inf')?
float('Inf') == np.inf returns True, so it seems they are interchangeable. I was therefore wondering why NumPy has defined its own "inf" constant, and when should I use one constant instead of the other (considering style concerns too)?
TL;DR: There is no difference and they can be used interchangeably.
Besides having the same value as math.inf and float('inf'):
>>> import math
>>> import numpy as np
>>> np.inf == float('inf')
True
>>> np.inf == math.inf
True
It also has the same type:
>>> import numpy as np
>>> type(np.inf)
float
>>> type(np.inf) is type(float('inf'))
True
That's interesting because NumPy also has its own floating point types:
>>> np.float32(np.inf)
inf
>>> type(np.float32(np.inf))
numpy.float32
>>> np.float32('inf') == np.inf # nevertheless equal
True
So it has the same value and the same type as math.inf and float('inf') which means it's interchangeable.
Reasons for using np.inf
It's less to type:
np.inf (6 chars)
math.inf (8 chars; new in python 3.5)
float('inf') (12 chars)
That means if you already have NumPy imported you can save yourself 6 (or 2) chars per occurrence compared to float('inf') (or math.inf).
Because it's easier to remember.
At least for me, it's far easier to remember np.inf than to remember that I need to call float with a string.
Also, NumPy defines some additional aliases for infinity:
np.Inf
np.inf
np.infty
np.Infinity
np.PINF
It also defines an alias for negative infinity:
np.NINF
Similarly for nan:
np.nan
np.NaN
np.NAN
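As a side note, in NumPy versions where these aliases still exist (they were removed in NumPy 2.0), they all refer to the very same float objects:
>>> np.inf is np.Inf is np.infty is np.Infinity is np.PINF
True
>>> np.NINF == -np.inf
True
>>> np.nan is np.NaN is np.NAN
True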
Constants are constants
This point is based on CPython and could be completely different in another Python implementation.
A CPython float instance requires 24 bytes:
>>> import sys
>>> sys.getsizeof(np.inf)
24
If you can re-use the same instance, you might save a lot of memory compared to creating lots of new instances. Of course, this point is moot if you create your own inf constant, but if you don't, then:
a = [np.inf for _ in range(1000000)]
b = [float('inf') for _ in range(1000000)]
b would use 24 * 1000000 Bytes (~23 MB) more memory than a.
Accessing an existing constant is also faster than creating a new float instance:
%timeit np.inf
37.9 ns ± 0.692 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit float('inf')
232 ns ± 13.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit [np.inf for _ in range(10000)]
552 µs ± 15.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit [float('inf') for _ in range(10000)]
2.59 ms ± 78.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Of course, you can create your own constant to counter that point. But why bother if NumPy has already done that for you?