Numpy: get array where index greater than value and condition is true - python

I have the following array:
a = np.array([6,5,4,3,4,5,6])
Now I want to get all elements which are greater than 4 but also have in index value greater than 2.
The way that I have found to do that was the following:
a[2:][a[2:]>4]
Is there a better or more readable way to accomplish this?
UPDATE:
This is a simplified version. In reality the indexing is done with arithmetic operation over several variables like this:
a[len(trainPredict)+(look_back*2)+1:][a[len(trainPredict)+(look_back*2)+1:]>4]
trainPredict ist a numpy array, look_back an integer.
I wanted to see if there is an established way or how others do that.

If you're worried about the complexity of the slice and/or the number of conditions, you can always separate them:
a = np.array([6,5,4,3,4,5,6])
a_slice = a[2:]
cond_1 = a_slice > 4
res = a_slice[cond_1]
Is your example very simplified? There might be better solutions for more complex manipulations.

#AlexanderCécile's answer is not only more legible than the one liner you posted, but is also removes the redundant computation of a temp array. Despite that, it does not appear to be any faster than your original approach.
The timings below are all run with a preliminary setup of
import numpy as np
np.random.seed(0xDEADBEEF)
a = np.random.randint(8, size=N)
N varies from 1e3 to 1e8 in factors of 10. I tried four variants of the code:
CodePope: result = a[2:][a[2:] > 4]
AlexanderCécile: s = a[2:]; result = s[s > 4]
MadPhysicist1: result = a[np.flatnonzero(a[2:]) + 2]
MadPhysicist2: result = a[(a > 4) & (np.arange(a.size) >= 2)]
In all cases, the timing was obtained on the command line by running
python -m timeit -s 'import numpy as np; np.random.seed(0xDEADBEEF); a = np.random.randint(8, size=N)' '<X>'
Here, N was a power of 10 between 3 and 8, and <X> one of the expressions above. Timings are as follows:
Methods #1 and #2 are virtually indistinguishable. What is surprising is that in the range between ~5e3 and ~1e6 elements, method #3 seems to be slightly, but noticeably faster. I would not normally expect that from fancy indexing. Method #4 is of course going to be the slowest.
Here is the data, for completeness:
CodePope AlexanderCécile MadPhysicist1 MadPhysicist2
1000 3.77e-06 3.69e-06 5.48e-06 6.52e-06
10000 4.6e-05 4.59e-05 3.97e-05 5.93e-05
100000 0.000484 0.000483 0.0004 0.000592
1000000 0.00513 0.00515 0.00503 0.00675
10000000 0.0529 0.0525 0.0617 0.102
100000000 0.657 0.658 0.782 1.09

Related

Python: Very slow execution loops

I am writing a code for proposing typo correction using HMM and Viterbi algorithm. At some point for each word in the text I have to do the following. (lets assume I have 10,000 words)
#FYI Windows 10, 64bit, interl i7 4GRam, Python 2.7.3
import numpy as np
import pandas as pd
for k in range(10000):
tempWord = corruptList20[k] #Temp word read form the list which has all of the words
delta = np.zeros(26, len(tempWord)))
sai = np.chararray(26, len(tempWord)))
sai[:] = '#'
# INITIALIZATION DELTA
for i in range(26):
delta[i][0] = #CALCULATION matrix read and multiplication each cell is different
# INITILIZATION END
# 6.DELTA CALCULATION
for deltaIndex in range(1, len(tempWord)):
for j in range(26):
tempDelta = 0.0
maxDelta = 0.0
maxState = ''
for i in range(26):
# CALCULATION to fill each cell involve in:
# 1-matrix read and multiplication
# 2 Finding Column Max
# logical operation and if-then-else operations
# 7. SAI BACKWARD TRACKING
delta2 = pd.DataFrame(delta)
sai2 = pd.DataFrame(sai)
proposedWord = np.zeros(len(tempWord), str)
editId = 0
for col in delta2.columns:
# CALCULATION to fill each cell involve in:
# 1-matrix read and multiplication
# 2 Finding Column Max
# logical operation and if-then-else operations
editList20.append(''.join(editWord))
#END OF LOOP
As you can see it is computationally involved and When I run it takes too much time to run.
Currently my laptop is stolen and I run this on Windows 10, 64bit, 4GRam, Python 2.7.3
My question: Anybody can see any point that I can use to optimize? Do I have to delete the the matrices I created in the loop before loop goes to next round to make memory free or is this done automatically?
After the below comments and using xrange instead of range the performance increased almost by 30%. I am adding the screenshot here after this change.
I don't think that range discussion makes much difference. With Python3, where range is the iterator, expanding it into a list before iteration doesn't change time much.
In [107]: timeit for k in range(10000):x=k+1
1000 loops, best of 3: 1.43 ms per loop
In [108]: timeit for k in list(range(10000)):x=k+1
1000 loops, best of 3: 1.58 ms per loop
With numpy and pandas the real key to speeding up loops is to replace them with compiled operations that work on the whole array or dataframe. But even in pure Python, focus on streamlining the contents of the iteration, not the iteration mechanism.
======================
for i in range(26):
delta[i][0] = #CALCULATION matrix read and multiplication
A minor change: delta[i, 0] = ...; this is the array way of addressing a single element; functionally it often is the same, but the intent is clearer. But think, can't you set all of that column as once?
delta[:,0] = ...
====================
N = len(tempWord)
delta = np.zeros(26, N))
etc
In tight loops temporary variables like this can save time. This isn't tight, so here is just adds clarity.
===========================
This one ugly nested triple loop; admittedly 26 steps isn't large, but 26*26*N is:
for deltaIndex in range(1,N):
for j in range(26):
tempDelta = 0.0
maxDelta = 0.0
maxState = ''
for i in range(26):
# CALCULATION
# 1-matrix read and multiplication
# 2 Finding Column Max
# logical operation and if-then-else operations
Focus on replacing this with array operations. It's those 3 commented lines that need to be changed, not the iteration mechanism.
================
Make proposedWord a list rather than array might be faster. Small list operations are often faster than array one, since numpy arrays have a creation overhead.
In [136]: timeit np.zeros(20,str)
100000 loops, best of 3: 2.36 µs per loop
In [137]: timeit x=[' ']*20
1000000 loops, best of 3: 614 ns per loop
You have to careful when creating 'empty' lists that the elements are truly independent, not just copies of the same thing.
In [159]: %%timeit
x = np.zeros(20,str)
for i in range(20):
x[i] = chr(65+i)
.....:
100000 loops, best of 3: 14.1 µs per loop
In [160]: timeit [chr(65+i) for i in range(20)]
100000 loops, best of 3: 7.7 µs per loop
As noted in the comments, the behavior of range changed between Python 2 and 3.
In 2, range constructs an entire list populated with the numbers to iterate over, then iterates over the list. Doing this in a tight loop is very expensive.
In 3, range instead constructs a simple object that (as far as I know), consists only of 3 numbers: the starting number, the step (distance between numbers), and the end number. Using simple math, you can calculate any point along the range instead of needing to iterate necessarily. This makes "random access" on it O(1) instead of O(n) when the entire list is interated, and prevents the creation of a costly list.
In 2, use xrange to iterate over a range object instead of a list.
(#Tom: I'll delete this if you post an answer).
It's hard to see exactly what you need to do because of the missing code, but it's clear that you need to learn how to vectorize your numpy code. This can lead to a 100x speedup.
You can probably get rid of all the inner for-loops and replace them with vectorized operations.
eg. instead of
for i in range(26):
delta[i][0] = #CALCULATION matrix read and multiplication each cell is differen
do
delta[:, 0] = # Vectorized form of whatever operation you were going to do.

Numpy matrix row comparison

The question is more focused on performance of calculation.
I have 2 matrix with the same number of columns and different number of rows. One matrix is the 'pattern' whose rows have to be compared separately with the other matrix rows (all rows), then to be able to extract statistical values of mean equal to pattern, std,...
So, I have the following matrix and the computation is the following one:
numCols = 10
pattern = np.random.randint(0,2,size=(7,numCols))
matrix = np.random.randint(0,2,size=(5,numCols))
comp_mean = np.zeros(pattern.shape[0])
for i in range(pattern.shape[0]):
comp_mean[i] = np.mean(np.sum(pattern[i,:] == matrix, axis=1))
print comp_mean # Output example: [ 1.6 1. 1.6 2.2 2. 2. 1.6]
This is clear. The problem is that the number of matrix rows of both is much bigger (~1.000.000). So this code goes very slow. I tryed to implement numpy syntaxis as sometimes it surprises me improving the calculation time. So I did the following code (it could be strange, but it works!):
comp_mean = np.mean( np.sum( (pattern[np.repeat(np.arange(pattern.shape[0]), matrix.shape[0])].ravel() == np.tile(matrix.ravel(),pattern.shape[0])).reshape(pattern.shape[0],matrix.shape[0],matrix.shape[1]), axis=2 ),axis=1)
print comp_mean
However, this code is slower than the previous one where the 'for' bucle is used. So I would like to know if there is any possibility to speed up the calculation.
EDIT
I have checked the runtime of the different approaches for the real matrix and the result is the following:
Me - Approach 1: 18.04 seconds
Me - Approach 2: 303.10 seconds
Divakar - Approach 1: 18.79 seconds
Divakar - Approach 2: 65.11 seconds
Divakar - Approach 3.1: 137.78 seconds
Divakar - Approach 3.2: 59.59 seconds
Divakar - Approach 4: 6.06 seconds
EDIT(2)
Previous runs where performed in a laptop. I have run the code on a desktop. I have avoided the worst results, and the new runtimes are now different:
Me - Approach 1: 6.25 seconds
Divakar - Approach 1: 4.01 seconds
Divakar - Approach 2: 3.66 seconds
Divakar - Approach 4: 3.12 seconds
Few approaches with broadcasting could be suggested here.
Approach #1
out = np.mean(np.sum(pattern[:,None,:] == matrix[None,:,:],2),1)
Approach #2
mrows = matrix.shape[0]
prows = pattern.shape[0]
out = (pattern[:,None,:] == matrix[None,:,:]).reshape(prows,-1).sum(1)/mrows
Approach #3
mrows = matrix.shape[0]
prows = pattern.shape[0]
out = np.einsum('ijk->i',(pattern[:,None,:] == matrix[None,:,:]).astype(int))/mrows
# OR out = np.einsum('ijk->i',(pattern[:,None,:] == matrix[None,:,:])+0)/mrows
Approach #4
If the number of rows in matrix is a huge number, it could be better to stick to a for-loop to avoid the huge memory requirements for such a case, that might also lead to slow runtimes. Instead, we could do some optimizations within each loop iteration. Here's one such possible optimization shown -
mrows = matrix.shape[0]
comp_mean = np.zeros(pattern.shape[0])
for i in range(pattern.shape[0]):
comp_mean[i] = (pattern[i,:] == matrix).sum()
comp_mean = comp_mean/mrows
could you have a try at this:
import scipy.ndimage.measurements
comp_mean = np.zeros(pattern.shape[0])
for i in range(pattern.shape[0]):
m = scipy.ndimage.measurements.histogram(matrix,0,1,2,pattern[i],[0,1])
comp_mean[i] = m[0][0]+m[1][1]
comp_mean /= matrix.shape[0]
Regards.

Log-computations in Python

I'm looking to compute something like:
Where f(i) is a function that returns a real number in [-1,1] for any i in {1,2,...,5000}.
Obviously, the result of the sum is somewhere in [-1,1], but when I can't seem to be able to compute it in Python using straight forward coding, as 0.55000 becomes 0 and comb(5000,2000) becomes inf, which result in the computed sum turning into NaN.
The required solution is to use log on both sides.
That is using the identity a × b = 2log(a) + log(b), if I could compute log(a) and log(b) I could compute the sum, even if a is big and b is almost 0.
So I guess what I'm asking is if there's an easy way of computing
log2(scipy.misc.comb(5000,2000))
So I could compute my sum simply by
sum([2**(log2comb(5000,i)-5000) * f(i) for i in range(1,5000) ])
#abarnert's solution, while working for the 5000 figure addresses the problem by increasing the precision in which the comb is computed. This works for this example, but doesn't scale, as the memory required would significantly increase if instead of 5000 we had 1e7 for example.
Currently, I'm using a workaround which is ugly, but keeps memory consumption low:
log2(comb(5000,2000)) = sum([log2 (x) for x in 1:5000])-sum([log2 (x) for x in 1:2000])-sum([log2 (x) for x in 1:3000])
Is there a way of doing so in a readable expression?
The sum
is the expectation of f with respect to a binomial distribution with n = 5000 and p = 0.5.
You can compute this with scipy.stats.binom.expect:
import scipy.stats as stats
def f(i):
return i
n, p = 5000, 0.5
print(stats.binom.expect(f, (n, p), lb=0, ub=n))
# 2499.99999997
Also note that as n goes to infinity, with p fixed, the binomial distribution approaches the normal distribution with mean np and variance np*(1-p). Therefore, for large n you can instead compute:
import math
print(stats.norm.expect(f, loc=n*p, scale=math.sqrt((n*p*(1-p))), lb=0, ub=n))
# 2500.0
EDIT: #unutbu has answered the real question, but I'll leave this here in case log2comb(n, k) is useful to anyone.
comb(n, k) is n! / ((n-k)! k!), and n! can be computed using the Gamma function gamma(n+1). Scipy provides the function scipy.special.gamma. Scipy also provides gammaln, which is the log (natural log, that is) of the Gamma function.
So log(comb(n, k)) can be computed as gammaln(n+1) - gammaln(n-k+1) - gammaln(k+1)
For example, log(comb(100, 8)) (after executing from scipy.special import gammaln):
In [26]: log(comb(100, 8))
Out[26]: 25.949484949043022
In [27]: gammaln(101) - gammaln(93) - gammaln(9)
Out[27]: 25.949484949042962
and log(comb(5000, 2000)):
In [28]: log(comb(5000, 2000)) # Overflow!
Out[28]: inf
In [29]: gammaln(5001) - gammaln(3001) - gammaln(2001)
Out[29]: 3360.5943053174142
(Of course, to get the base-2 logarithm, just divide by log(2).)
For convenience, you can define:
from math import log
from scipy.special import gammaln
def log2comb(n, k):
return (gammaln(n+1) - gammaln(n-k+1) - gammaln(k+1)) / log(2)
By default, comb gives you a float64, which overflows and gives you inf.
But if you pass exact=True, it gives you a Python variable-sized int instead, which can't overflow (unless you get so ridiculously huge you run out of memory).
And, while you can't use np.log2 on an int, you can use Python's math.log2.
So:
math.log2(scipy.misc.comb(5000, 2000, exact=True))
As an alternative, you relative that n choose k is defined as n!k / k!, right? You can reduce that to ∏(i=1...k)((n+1-i)/i), which is simple to compute.
Or, if you want to avoid overflow, you can do it by alternating * (n-i) and / (k-i).
Which, of course, you can also reduce to adding and subtracting logs. I think looping in Python and computing 4000 logarithms is going to be slower than looping in C and computing 4000 multiplications, but we can always vectorize it, and then, it might be faster. Let's write it and test:
In [1327]: n, k = 5000, 2000
In [1328]: %timeit math.log2(scipy.misc.comb(5000, 2000, exact=True))
100 loops, best of 3: 1.6 ms per loop
In [1329]: %timeit np.log2(np.arange(n-k+1, n+1)).sum() - np.log2(np.arange(1, k+1)).sum()
10000 loops, best of 3: 91.1 µs per loop
Of course if you're more concerned with memory instead of time… well, this obviously makes it worse. We've got 2000 8-byte floats instead of one 608-byte integer at a time. And if you go up to 100000, 20000, you get 20000 8-byte floats instead of one 9K integer. And at 1000000, 200000, it's 200000 8-byte floats vs. one 720K integer.
I'm not sure why either way is a problem for you. Especially given that you're using a listcomp instead of a genexpr, and therefore creating an unnecessary list of 5000, 100000, or 1000000 Python floats—24MB is not a problem, but 720K is? But if it is, we can obviously just do the same thing iteratively, at the cost of some speed:
r = sum(math.log2(n-i) - math.log2(k-i) for i in range(n-k))
This isn't too much slower than the scipy solution, and it never uses more than a small constant number of bytes (a handful of Python floats). (Unless you're on Python 2, in which case… just use xrange instead of range and it's back to constant.)
As a side note, why are you using a list comprehension instead of an NumPy array with vectorized operations (for speed, and also a bit of compactness) or a generator expression instead of a list comprehension (for no memory usage at all, at no cost to speed)?

NumPy vs Cython - nested loop so slow?

I am confused how NumPy nested loop for 3D array is so slow in comparison with Cython.
I wrote trivial example.
Python/NumPy version:
import numpy as np
def my_func(a,b,c):
s=0
for z in xrange(401):
for y in xrange(401):
for x in xrange(401):
if a[z,y,x] == 0 and b[x,y,z] >= 0:
c[z,y,x] = 1
b[z,y,x] = z*y*x
s+=1
return s
a = np.zeros((401,401,401), dtype=np.float32)
b = np.zeros((401,401,401), dtype=np.uint32)
c = np.zeros((401,401,401), dtype=np.uint8)
s = my_func(a,b,c)
Cythonized version:
cimport numpy as np
cimport cython
#cython.boundscheck(False)
#cython.wraparound(False)
def my_func(np.float32_t[:,:,::1] a, np.uint32_t[:,:,::1] b, np.uint8_t[:,:,::1] c):
cdef np.uint16_t z,y,x
cdef np.uint32_t s = 0
for z in range(401):
for y in range(401):
for x in range(401):
if a[z,y,x] == 0 and b[x,y,z] >= 0:
c[z,y,x] = 1
b[z,y,x] = z*y*x
s = s+1
return s
Cythonized version of my_func() runs approx. 6500x faster. Simpler function only with if-statement and array access can be even 10000x faster. Python version of my_func() takes 500.651 sec. to finish. Is iterating over relatively small 3D array so slow or I made some mistake in code?
Cython version 0.21.1, Python 2.7.5, GCC 4.8.1, Xubuntu 13.10.
Python is an interpreted language. One of the benefits of compiling to machine code is the huge speedup you get, especially with things like nested loops.
I don't know what your expectations are, but all interpreted languages will be terribly slow at the things you are trying to do (JIT compiling may help to some extent though).
The trick of getting good performance out of Numpy (or MATLAB or anything similar) is to avoid looping altogether and instead try to refactor your code into a few operations on large matrices. This way, the looping will take place in the (heavily optimized) machine code libraries instead of your Python code.
As mentioned by Krumelur, python loops are definitely slow. You can, however, use numpy to your advantage. Operations on entire arrays are quite fast, although you need a little ingenuity sometimes.
For instance, in your code, since your loop never reads the value in b after you modify it (I think? My head is a little fuzzy at the moment, so you'll definitely want to go through this), the following should be equivalent:
# Precalculate a matrix of x*y*z
tmp = np.indices(a.shape)
prod = (tmp[:,:,:,0] * tmp[:,:,:,1] * tmp[:,:,:,2]).T
# Use array-wide logical operations to compute c using a and the transpose of b
condition = np.logical_and(a == 0, b.T >= 0)
# Use condition to alter b and c only where condition is true
b[condition] = prod[condition]
c[condition] = 1
s = condition.sum()
So this does calculate x*y*z even in cases where the condition is false. You could probably avoid that if it turns out that is using lots of time, but it's likely not to be a significant factor.
For loop with numpy array in python is slow, you should use vector calculation as possible. If the algorithm need for loop for every elements in the array, here is some speedup hint.
a[z,y,x] is a numpy scalar value, calculation with numpy scalar values is very slow:
x = 3.0
%timeit x > 0
x = np.float64(3.0)
%timeit x > 0
the output on my pc with numpy 1.8.2, windows 7:
10000000 loops, best of 3: 64.3 ns per loop
1000000 loops, best of 3: 657 ns per loop
you can use item() method to get the python value directly:
if a.item(z, y, x) == 0 and b.item(x, y, z) >= 0:
...
it can speedup the for loop about 8x.

How do I maximize efficiency with numpy arrays?

I am just getting to know numpy, and I am impressed by its claims of C-like efficiency with memory access in its ndarrays. I wanted to see the differences between these and pythonic lists for myself, so I ran a quick timing test, performing a few of the same simple tasks with numpy without it. Numpy outclassed regular lists by an order of magnitude in the allocation of and arithmetic operations on arrays, as expected. But this segment of code, identical in both tests, took about 1/8 of a second with a regular list, and slightly over 2.5 seconds with numpy:
file = open('timing.log','w')
for num in a2:
if num % 1000 == 0:
file.write("Multiple of 1000!\r\n")
file.close()
Does anyone know why this might be, and if there is some other syntax i should be using for operations like this to take better advantage of what the ndarray can do?
Thanks...
EDIT: To answer Wayne's comment... I timed them both repeatedly and in different orders and got pretty much identical results each time, so I doubt it's another process. I put start = time() at the top of the file after the numpy import and then I have statements like print 'Time after traversal:\t',(time() - start) throughout.
a2 is a NumPy array, right? One possible reason it might be taking so long in NumPy (if other processes' activity don't account for it as Wayne Werner suggested) is that you're iterating over the array using a Python loop. At every step of the iteration, Python has to fetch a single value out of the NumPy array and convert it to a Python integer, which is not a particularly fast operation.
NumPy works much better when you are able to perform operations on the whole array as a unit. In your case, one option (maybe not even the fastest) would be
file.write("Multiple of 1000!\r\n" * (a2 % 1000 == 0).sum())
Try comparing that to the pure-Python equivalent,
file.write("Multiple of 1000!\r\n" * sum(filter(lambda i: i % 1000 == 0, a2)))
or
file.write("Multiple of 1000!\r\n" * sum(1 for i in a2 if i % 1000 == 0))
I'm not surprised that NumPy does poorly w/r/t Python built-ins when using your snippet. A large fraction of the performance benefit in NumPy arises from avoiding the loops and instead access the array by indexing:
In NumPy, it's more common to do something like this:
A = NP.random.randint(10, 100, 100).reshape(10, 10)
w = A[A % 2 == 0]
NP.save("test_file.npy", w)
Per-element access is very slow for numpy arrays. Use vector operations:
$ python -mtimeit -s 'import numpy as np; a2=np.arange(10**6)' '
> sum(1 for i in a2 if i % 1000 == 0)'
10 loops, best of 3: 1.53 sec per loop
$ python -mtimeit -s 'import numpy as np; a2=np.arange(10**6)' '
> (a2 % 1000 == 0).sum()'
10 loops, best of 3: 22.6 msec per loop
$ python -mtimeit -s 'import numpy as np; a2= range(10**6)' '
> sum(1 for i in a2 if i % 1000 == 0)'
10 loops, best of 3: 90.9 msec per loop

Categories