Trigonometry functions from series expansion - python

I am trying to write functions which emulate math.sin and math.tan but, instead of using the math library, performing the calculation using a series expansion.
The formulae are from the Mathematics SE question "How would you calculate the Tangent without a calculator?":
sin(x) = x − x^3/3! + x^5/5! −...
tan(x) = sin(x) / √(1 − sin(x)^2)
This is my attempt, but I could not figure out how to perform the sign flipping + / - / + / ... part of the series expansion for sin:
from math import factorial
res = 0
for i in [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]:
    res += 1**i/factorial(i)
print(res) # 1.1752011936438016
The result is not correct because I have not applied a + / - switch. I could add an if / else clause but this seems messy. Is there a better way?
Note: This question is an embellished version of a now deleted question that was posted yesterday by @Lana.
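One simple possibility, shown here as a hedged sketch (the answers below give better approaches): keep a running sign variable and flip it on every iteration.

from math import factorial

res, sign = 0.0, 1
for i in [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]:
    res += sign * 1**i / factorial(i)   # x = 1 here, as in the snippet above
    sign = -sign
print(res)  # ~0.8414709848, i.e. sin(1)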

You can avoid recalculating x**n and the factorial at each step by calculating the next term of the sum using the previous one:
def sin2(x, n=20):
    curr = x
    res = curr
    for i in range(2, n, 2):
        curr *= -x**2 / (i*(i+1))
        res += curr
    return res
Compared to jpp's version, it's about twice as fast:
from math import factorial
def sin(x, n=20):
    return sum(x**j/factorial(j)*(1 if i%2==0 else -1)
               for i, j in enumerate(range(1, n, 2)))
%timeit sin(0.7)
# 100000 loops, best of 3: 8.52 µs per loop
%timeit sin2(0.7)
# 100000 loops, best of 3: 4.54 µs per loop
And it can get a bit faster if we calculate - x**2 once and for all:
def sin3(x, n=20):
    curr = x
    res = 0
    minus_x_squared = -x**2
    for i in range(2, n, 2):
        res += curr
        curr *= minus_x_squared / (i*(i+1))
    return res
%timeit sin2(0.7)
# 100000 loops, best of 3: 4.6 µs per loop
%timeit sin3(0.7)
# 100000 loops, best of 3: 3.54 µs per loop

You are close. Below is one way using sum with enumerate for your series expansion.
enumerate works by taking each value of an iterable and attaching an index, i.e. 0 for the first item, 1 for the second item, etc. Then we only need to test whether the index is even or odd and use a ternary statement.
In addition, you can use range instead of listing the odd numbers required in your expansion.
from math import factorial
def sin(x, n=20):
    return sum(x**j/factorial(j)*(1 if i%2==0 else -1)
               for i, j in enumerate(range(1, n, 2)))

def tan(x):
    return sin(x) / (1-(sin(x))**2)**0.5
print(tan(1.2)) # 2.572151622126318
You can avoid the need for a ternary statement and enumerate altogether:
def sin(x, n=20):
    return sum((-1)**i * x**(2*i+1) / factorial(2*i+1) for i in range(n))
If you write out the first few terms by hand, the equivalence will become clear.
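For instance, the first three terms of that expression are:
i = 0: (-1)^0 * x^1 / 1! = x
i = 1: (-1)^1 * x^3 / 3! = -x^3/3!
i = 2: (-1)^2 * x^5 / 5! = x^5/5!
which matches x − x^3/3! + x^5/5! − ... term by term.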
Notes:
The sign of the tan function is only correct for the 1st and 4th quadrants. This is consistent with the formulae you have provided. You can perform a trivial transformation of the input to account for this (a sketch follows these notes).
You can improve accuracy by increasing parameter n.
You can also calculate factorial without a library, but I'll leave that as an exercise.
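As a hedged sketch of that input transformation (my addition, not part of the original answer): tan has period pi, and the formula tan(x) = sin(x) / √(1 − sin(x)^2) is valid wherever cos(x) ≥ 0, so shifting x by a whole multiple of pi into [−pi/2, pi/2] fixes the sign for all quadrants. Only math.pi is borrowed from the standard library here.

from math import pi

def tan_any_quadrant(x, n=20):
    k = round(x / pi)          # nearest integer multiple of pi
    x -= k * pi                # now |x| <= pi/2, where cos(x) >= 0
    s = sin(x, n)              # the series-based sin from above
    return s / (1 - s**2) ** 0.5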

Related

Efficient summation in Python

I am trying to efficiently compute a summation of a summation in Python: the sum over x = 0..n of x^2 times the sum over y = 0..x of y.
WolframAlpha is able to compute it to a high n value: sum of sum.
I have two approaches: a for loop method and an np.sum method. I thought the np.sum approach would be faster. However, they are the same until a large n, after which the np.sum has overflow errors and gives the wrong result.
I am trying to find the fastest way to compute this sum.
import numpy as np
import time
def summation(start, end, func):
    sum = 0
    for i in range(start, end+1):
        sum += func(i)
    return sum

def x(y):
    return y

def x2(y):
    return y**2

def mysum(y):
    return x2(y)*summation(0, y, x)
n=100
# method #1
start=time.time()
summation(0,n,mysum)
print('Slow method:',time.time()-start)
# method #2
start=time.time()
w=np.arange(0,n+1)
(w**2*np.cumsum(w)).sum()
print('Fast method:',time.time()-start)
Here's a very fast way:
result = ((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120
How I got there:
1. Rewrite the inner sum as the well-known x*(x+1)//2. So the whole thing becomes sum(x**2 * x*(x+1)//2 for x in range(n+1)).
2. Rewrite to sum(x**4 + x**3 for x in range(n+1)) // 2.
3. Look up formulas for sum(x**4) and sum(x**3).
4. Simplify the resulting mess to (12*n**5 + 45*n**4 + 50*n**3 + 15*n**2 - 2*n) // 120.
5. Horner it.
Another way to derive it, if after steps 1 and 2 you know it's a polynomial of degree 5:
1. Compute six values with a naive implementation.
2. Compute the polynomial from the six equations with six unknowns (the polynomial coefficients). I did it similarly to this, but my matrix A is left-right mirrored compared to that, and I called my y-vector b.
Code:
from fractions import Fraction
import math
from functools import reduce

def naive(n):
    return sum(x**2 * sum(range(x+1)) for x in range(n+1))

def lcm(ints):
    return reduce(lambda r, i: r * i // math.gcd(r, i), ints)

def polynomial(xys):
    xs, ys = zip(*xys)
    n = len(xs)
    A = [[Fraction(x**i) for i in range(n)] for x in xs]
    b = list(ys)
    for _ in range(2):
        for i0 in range(n):
            for i in range(i0 + 1, n):
                f = A[i][i0] / A[i0][i0]
                for j in range(i0, n):
                    A[i][j] -= f * A[i0][j]
                b[i] -= f * b[i0]
        A = [row[::-1] for row in A[::-1]]
        b.reverse()
    coeffs = [b[i] / A[i][i] for i in range(n)]
    denominator = lcm(c.denominator for c in coeffs)
    coeffs = [int(c * denominator) for c in coeffs]
    horner = str(coeffs[-1])
    for c in coeffs[-2::-1]:
        horner += ' * n'
        if c:
            horner = f"({horner} {'+' if c > 0 else '-'} {abs(c)})"
    return f'{horner} // {denominator}'

print(polynomial((x, naive(x)) for x in range(6)))
Output (Try it online!):
((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120
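A quick sanity check (my addition, not part of the original answer), reusing the naive helper defined above:

def closed_form(n):
    return ((((12 * n + 45) * n + 50) * n + 15) * n - 2) * n // 120

# should hold for every n; checked here for the first 200 values
assert all(closed_form(n) == naive(n) for n in range(200))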
(fastest methods, 3 and 4, are at the end)
In a fast NumPy method you need to specify dtype=np.object so that NumPy does not convert Python int to its own dtypes (np.int64 or others). It will now give you correct results (checked it up to N=100000).
# method #2
start=time.time()
w=np.arange(0, n+1, dtype=np.object)
result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
Your fast solution is significantly faster than the slow one, and not only for large N: already at N=100 it is about 8 times faster:
start=time.time()
for i in range(100):
    result1 = summation(0, n, mysum)
print('Slow method:', time.time()-start)

# method #2
start=time.time()
for i in range(100):
    w = np.arange(0, n+1, dtype=np.object)
    result2 = (w**2*np.cumsum(w)).sum()
print('Fast method:', time.time()-start)
Slow method: 0.06906533241271973
Fast method: 0.008007287979125977
EDIT: An even faster method (by KellyBundy, the Pumpkin) uses pure Python. It turns out NumPy has no advantage here, because it has no vectorized code for object arrays.
# method #3
import itertools
start=time.time()
for i in range(100):
    result3 = sum(x*x * ysum for x, ysum in enumerate(itertools.accumulate(range(n+1))))
print('Faster, pure python:', (time.time()-start))
Faster, pure python: 0.0009944438934326172
EDIT2: Forss noticed that the fast NumPy method can be optimized by using w*w instead of w**2. For N > 200 it is faster than the pure Python method; for N < 200 it is slower (the exact boundary may depend on the machine, on mine it was around 200, so it's best to check it yourself):
# method #4
start=time.time()
for i in range(100):
    w = np.arange(0, n+1, dtype=np.object)
    result2 = (w*w*np.cumsum(w)).sum()
print('Fast method x*x:', time.time()-start)
Comparing Python with WolframAlpha like that is unfair, since Wolfram will simplify the equation before computing.
Fortunately, the Python ecosystem knows no limits, so you can use SymPy:
from sympy import summation
from sympy import symbols
n, x, y = symbols("n,x,y")
eq = summation(x ** 2 * summation(y, (y, 0, x)), (x, 0, n))
eq.evalf(subs={"n": 1000})
It will compute the expected result almost instantly: 100375416791650. This is because SymPy simplifies the equation for you, just like Wolfram does; you can see the simplified closed form by inspecting the value of eq.
@Kelly Bundy's answer is awesome, but if you are like me and use a calculator to compute 2 + 2, then you will love SymPy ❤. As you can see, it gets you the same result with just 3 lines of code, and it is a solution that would also work for other, more complex cases.
In a comment, you mention that it's really f(x) and g(y) instead of x^2 and y. If you only need an approximation to that sum, you can pretend the sums are midpoint Riemann sums, so that your sum is approximated by the double integral ∫_{-1/2}^{n+1/2} f(x) ∫_{-1/2}^{x+1/2} g(y) dy dx.
With your original f(x) = x^2 and g(y) = y, this simplifies to n^5/10 + 3n^4/8 + n^3/2 + 5n^2/16 + 3n/32 + 1/160, which differs from the correct result by n^3/12 + 3n^2/16 + 53n/480 + 1/160.
Based on this, I suspect that (actual - integral)/actual would be max(f'', g'') * O(n^-2), but I wasn't able to prove it.
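As a hedged illustration (my addition, not part of the original answer), the midpoint-integral approximation above can be reproduced symbolically with SymPy:

from sympy import Rational, expand, integrate, symbols

n, x, y = symbols("n x y")
half = Rational(1, 2)
inner = integrate(y, (y, -half, x + half))                      # g(y) = y
approx = expand(integrate(x**2 * inner, (x, -half, n + half)))  # f(x) = x**2
print(approx)  # the same polynomial as stated above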
All the answers use math to simplify the expression or implement the loop in Python trying to be CPU-optimal, but they are not memory-optimal.
Here is a naive implementation without any math simplification, which is memory efficient:
def function5():
    inner_sum = float()
    result = float()
    for x in range(0, n + 1):
        inner_sum += x
        result += x ** 2 * inner_sum
    return result
It is quite slow with respect to the other solutions by dankal444:
method 2 | 31 µs ± 2.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
method 3 | 116 µs ± 538 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
method 4 | 91 µs ± 356 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
function 5 | 217 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
By the way, if you JIT-compile the function with Numba (there may be better options):
from numba import jit
function5 = jit(nopython=True)(function5)
you get
59.8 ns ± 0.209 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Randomly generate 1 or -1 (positive or negative integer)

I wanted to generate 1 or -1 in Python as a step toward randomizing between non-negative and non-positive numbers, or toward randomly changing the sign of an existing integer. What would be the best way to generate 1 or -1 in Python? Assuming an even distribution, I know I could use:
import random
#method1
my_number = random.choice((-1, 1))
#method2
my_number = (-1)**random.randrange(2)
#method3
# if I understand correctly random.random() should never return exactly 1
# so I use "<", not "<="
if random.random() < 0.5:
    my_number = 1
else:
    my_number = -1
#method4
my_number = random.randint(0,1)*2-1
Using timeit module I got the following results:
#method1
s = "my_number = random.choice((-1, 1))"
timeit.timeit(stmt = s, setup = "import random")
>2.814896769857569
#method2
s = "my_number = (-1)**random.randrange(2)"
timeit.timeit(stmt = s, setup = "import random")
>3.521280517518562
#method3
s = """
if random.random() < 0.5: my_number = 1
else: my_number = -1"""
timeit.timeit(stmt = s, setup = "import random")
>0.25321546903273884
#method4
s = "random.randint(0,1)*2-1"
timeit.timeit(stmt = s, setup = "import random")
>4.526625442240402
So, unexpectedly, method 3 is the fastest. My bet was on method 1 to be the fastest, as it is also the shortest. Also, both method 1 (since Python 3.6, I think?) and method 3 give the possibility of introducing uneven distributions. Although method 1 is the shortest (its main advantage), for now I would choose method 3:
def positive_or_negative():
    if random.random() < 0.5:
        return 1
    else:
        return -1
Testing:
s = """
import random
def positive_or_negative():
    if random.random() < 0.5:
        return 1
    else:
        return -1
"""
timeit.timeit(stmt = "my_number = positive_or_negative()", setup = s)
>0.3916183138621818
Any better (faster or shorter) method to randomly generate -1 or 1 in Python? Any reason why would you choose method 1 over method 3 or vice versa?
A one liner variation of #3:
return 1 if random.random() < 0.5 else -1
It's fast(er) than the 'math' variants, because it doesn't involve additional arithmetic.
Here's another one-liner that my timings show to be faster than the if/else comparison to 0.5:
[-1,1][random.randrange(2)]
Not sure what your application is exactly, but I needed something similar for a large vectorized array.
Here's a good way to get a sign array:
(2*np.random.randint(0,2,size=(your_size))-1)
The result is an array, for example:
array([-1, -1, -1, 1, 1, 1, -1, -1, 1, 1, 1, -1, -1, 1, -1])
and you can use the reshape command to get the above to the size of your matrix:
(2*np.random.randint(0,2,size=(m*n))-1).reshape(m,n)
Then you can multiply a matrix by the above and get all of the members with random signs.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = A * (2*np.random.randint(0,2,size=(2*3))-1).reshape(2,3)
Then you get something like:
B = array([[ 1, 2, -3], [ 4, 5, -6]])
Pretty quick, if your data is vectorized.
Maths made simple:
Generate a random number: 0 or 1
Multiply it by 2: 0 or 2
Subtract 1: -1 or 1
Adapt that to any programming language. No need for test functions.
print(random.randint(0,1)*2-1)
It also works without randint:
print(int(random.random()*2)*2-1)
The fastest way to generate random numbers if you're going to be doing lots of them is by using numpy:
In [1]: import numpy as np
In [2]: import random
In [3]: %timeit [random.choice([-1,1]) for i in range(100000)]
10 loops, best of 3: 88.9 ms per loop
In [4]: %timeit [(-1)**random.randrange(2) for i in range(100000)]
10 loops, best of 3: 110 ms per loop
In [5]: %timeit [1 if random.random() < 0.5 else -1 for i in range(100000)]
100 loops, best of 3: 18.4 ms per loop
In [6]: %timeit [random.randint(0,1)*2-1 for i in range(100000)]
1 loop, best of 3: 180 ms per loop
In [7]: %timeit np.random.choice([-1,1],size=100000)
1000 loops, best of 3: 1.52 ms per loop
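As a hedged aside (my addition, not benchmarked above): on newer NumPy versions the Generator API is the usual entry point, and either of these produces the same kind of ±1 array.

rng = np.random.default_rng()
signs = rng.choice((-1, 1), size=100000)         # same idea as np.random.choice
signs = rng.integers(0, 2, size=100000) * 2 - 1  # same idea as randint(0,1)*2-1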
If you need single bits (one per call), you already did your benchmark and other answers provide additional info.
If you need many bits or can pre-calculate bit-arrays for later consumption, numpy's methods might shine.
Here is one more demo approach using numpy (which, surprisingly, does not have a method dedicated to exactly this job):
import numpy as np
import random
def sample_bits(N):
    assert N % 8 == 0  # demo only
    n_bytes = N // 8
    rbytes = np.random.randint(0, 256, dtype=np.uint8, size=n_bytes)  # high bound is exclusive
    return np.unpackbits(rbytes)

def alt(N):
    return np.random.choice([-1, 1], size=N)

def alt2(N):
    return [1 if random.random() < 0.5 else -1 for i in range(N)]

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("sample_bits(1024)", setup="from __main__ import sample_bits", number=10000))
    print(timeit.timeit("alt(1024)", setup="from __main__ import alt", number=10000))
    print(timeit.timeit("alt2(1024)", setup="from __main__ import alt2", number=10000))
Output:
0.06640421246836543
0.352129537507486
1.5522800431775592
The general idea is:
- use numpy to generate many uint8's in one step
- (there might be something better using internal functions without the randint API)
- unpack the uint8's into 8 bits each
- uniformity follows from randint's uniformity guarantees
Again, this is only a demo:
- for one specific case
- not caring about the different result types of these functions
- not caring about -1 vs. 0 (might be important in your use case; see the remap sketch below)
- (not even optimal compared to much more low-level approaches; the Mersenne Twister used internally can be used as a bit source, which does not need floating-point math, like many other PRNGs!)
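Regarding the -1 vs. 0 point, here is a hedged remap sketch (my addition, not benchmarked above) that turns the unpacked bits into ±1:

def sample_signs(N):
    assert N % 8 == 0  # demo only, same restriction as sample_bits
    rbytes = np.random.randint(0, 256, dtype=np.uint8, size=N // 8)
    bits = np.unpackbits(rbytes)
    return 2 * bits.astype(np.int8) - 1  # maps 0 -> -1 and 1 -> +1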
My code is as follows:
from array import array
from random import randint

vals = array("i", [-1, 1])

def my_rnd():
    return vals[randint(0, 7) % 2]

Comparing runtime of standard list comprehension vs. NumPy

I'm experimenting with NumPy to see how and where it is faster than using generic list comprehensions in Python. Here's a standard coding question I'm using for this experiment.
Find the sum of all the multiples of 3 or 5 below 1000000.
I have written three functions to compute this number.
import numpy as np

def fA(M):
    sum = 0
    for x in range(M):
        if x % 3 == 0 or x % 5 == 0:
            sum += x
    return sum

def fB(M):
    multiples_3 = range(0, M, 3)
    multiples_5 = range(0, M, 5)
    multiples_15 = range(0, M, 15)
    return sum(multiples_3) + sum(multiples_5) - sum(multiples_15)

def fC(M):
    arr = np.arange(M)
    return np.sum(arr[np.logical_or(arr % 3 == 0, arr % 5 == 0)])
I first did a quick sanity check to see that the three functions produced the same answer.
I then used timeit to compare the runtimes for the three functions.
%timeit -n 100 fA(1000000)
100 loops, best of 3: 182 ms per loop
%timeit -n 100 fB(1000000)
100 loops, best of 3: 14.4 ms per loop
%timeit -n 100 fC(1000000)
100 loops, best of 3: 44 ms per loop
It's no surprise that fA is the slowest. But why is fB so much better than fC? Is there a better way to compute this answer using NumPy?
I don't think size is an issue here. In fact, if I change the 1e6 to 1e9, fC becomes even slower when compared to fB.
fB is so much faster than fC because fC is not the NumPy equivalent of fB. fC is the NumPy equivalent of fA. This is the NumPy equivalent of fB:
def fD(M):
    multiples_3 = np.arange(0, M, 3)
    multiples_5 = np.arange(0, M, 5)
    multiples_15 = np.arange(0, M, 15)
    return multiples_3.sum() + multiples_5.sum() - multiples_15.sum()
It runs way faster:
In [4]: timeit fB(1000000)
100 loops, best of 3: 9.96 ms per loop
In [5]: timeit fD(1000000)
1000 loops, best of 3: 637 µs per loop
In fB you are constructing the ranges with the exact multiples you want. Their sizes become smaller from 3 to 5 to 15 and thus each takes less time to construct than the one before, after they are constructed you only need to take the sum and do some arithmetic.
In fC you are constructing a 1,000,000-element array; the size isn't really the issue as much as the two modulo comparisons, which must look at every single element in the array. This takes the lion's share of the execution time (about 90%) for fC.
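A rough way to see this (my addition, not from the original answer) is to time the mask construction on its own and compare it with the full body of fC:

import numpy as np
arr = np.arange(1000000)
%timeit np.logical_or(arr % 3 == 0, arr % 5 == 0)              # the mask alone
%timeit arr[np.logical_or(arr % 3 == 0, arr % 5 == 0)].sum()   # the full fC body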
You're only really using NumPy there to generate an array. You'd see a much bigger difference if you were performing operations on arrays rather than on lists or tuples. For this particular problem, take a look at the function fD in the code below, which just calculates how many multiples there should be in each range and then computes their sum, rather than generating an array. If you run the snippet below, you'll see how the times change as a function of M. Also, fC breaks down for M >= 100000; I couldn't tell you why.
import numpy as np
from time import time

def fA(M):
    sum = 0
    for x in range(M):
        if x % 3 == 0 or x % 5 == 0:
            sum += x
    return sum

def fB(M):
    multiples_3 = range(0, M, 3)
    multiples_5 = range(0, M, 5)
    multiples_15 = range(0, M, 15)
    return sum(multiples_3) + sum(multiples_5) - sum(multiples_15)

def fC(M):
    arr = np.arange(M)
    return np.sum(arr[np.logical_or(arr % 3 == 0, arr % 5 == 0)])

def fD(M):
    return sum_mult(M, 3) + sum_mult(M, 5) - sum_mult(M, 15)

def sum_mult(M, n):
    # closed form: n + 2n + ... = n * k*(k+1)/2, where k is the number of multiples of n below M
    instances = (M - 1) // n
    check = len(range(n, M, n))  # unused; presumably left in as a sanity check on instances
    return n * instances * (instances + 1) // 2

for x in range(5, 20):
    print("*" * 20)
    M = 2**x
    print(M)
    answers = []
    T = []
    for f in (fA, fB, fC, fD):
        ts = time()
        answers.append(f(M))
        for i in range(20):
            f(M)
        T.append(time() - ts)
    if not all(a == answers[0] for a in answers):
        print("Warning! Answers do not match!", answers)
    print(T)

Minimum, Maximum and Average in a single iteration

If I have a list of numbers, like l = [3, 5, 3, 6, 47, 89], we can calculate the minimum, maximum and average using the following Python code:
minimum = min(l)
maximum = max(l)
avg = sum(l) / len(l)
Since each of these iterates over the entire list, this is slow for large lists, and it is a lot of code. Is there any Python module that can calculate all of these values together?
Cython function:
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def minmaxAvg(list x):
    cdef int i
    cdef int _min, _max, total
    _min = x[0]
    _max = x[0]
    total = 0
    for i in x:
        if i < _min: _min = i
        elif i > _max: _max = i
        total += i
    return _min, _max, total/len(x)
pure python function to compare against:
def builtinfuncs(x):
    a = min(x)
    b = max(x)
    avg = sum(x) / len(x)
    return a, b, avg
In [16]: x = [random.randint(0,1000) for _ in range(10000)]
In [17]: %timeit minmaxAvg(x)
10000 loops, best of 3: 34 µs per loop
In [18]: %timeit builtinfuncs(x)
1000 loops, best of 3: 460 µs per loop
Disclaimer:
- Speed result from cython will be dependent on computer hardware.
- Not as flexible and foolproof as using builtins. You would have to change the function to handle anything but integers for example.
- Before going down this path, you should ask yourself if this operation really is a big bottleneck in your application. It's probably not.
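If you want the single-pass behaviour without compiling anything, a plain-Python sketch (my addition, and typically slower than calling the three builtins, as the disclaimer suggests) could look like this:

def minmax_avg(seq):
    it = iter(seq)
    first = next(it)            # raises StopIteration on empty input
    lo = hi = total = first
    count = 1
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
        total += v
        count += 1
    return lo, hi, total / count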
If you have pandas installed, you can do something like this:
import numpy as np
import pandas
s = pandas.Series(np.random.normal(size=37))
stats = s.describe()
stats will be another Series that behaves like a dictionary:
print(stats)
count 37.000000
mean 0.072138
std 0.932000
min -1.267888
25% -0.688728
50% -0.048624
75% 0.784244
max 2.501713
dtype: float64
stats['max']
2.501713
...etc. However, I don't recommend this unless you're striving simply for concise code. Here's why:
%%timeit
stats = s.describe()
# 100 loops, best of 3: 1.44 ms per loop
%%timeit
mymin = min(s)
mymax = max(s)
myavg = sum(s)/len(s)
# 10000 loops, best of 3: 89.5 µs per loop
I just can't imagine that you'll be able to squeeze any more performance out of the built-in functions with your own implementations (barring some cython voodoo, maybe).

For loop speed with Numpy

I am trying to get this code to run fast in Python, however I am having trouble getting it anywhere near the speed it runs at in MATLAB. The problem seems to be this for loop, which takes about 2 seconds to run when the number "SRpixels" is approximately 25000.
I can't seem to find any way to trim this down any further, and I am looking for suggestions.
The datatypes for the numpy arrays below are float32 for all except the **_Location[] which are uint32.
for j in range(0, SRpixels):
    # Skip data if outside valid range
    if (abs(SR_pointCloud[j,0]) > SR_xMax or SR_pointCloud[j,2] > SR_zMax or SR_pointCloud[j,2] < 0):
        pass
    else:
        RIGrid1_Location[j,0] = np.floor(((SR_pointCloud[j,0] + xPosition + 5) - xGrid1Center) / gridSize)
        RIGrid1_Location[j,1] = np.floor(((SR_pointCloud[j,2] + yPosition) - yGrid1LowerBound) / gridSize)
        RIGrid1_Count[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += 1
        RIGrid1_Sum[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += SR_pointCloud[j,1]
        RIGrid1_SumofSquares[RIGrid1_Location[j,0],RIGrid1_Location[j,1]] += SR_pointCloud[j,1] * SR_pointCloud[j,1]

        RIGrid2_Location[j,0] = np.floor(((SR_pointCloud[j,0] + xPosition + 5) - xGrid2Center) / gridSize)
        RIGrid2_Location[j,1] = np.floor(((SR_pointCloud[j,2] + yPosition) - yGrid2LowerBound) / gridSize)
        RIGrid2_Count[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += 1
        RIGrid2_Sum[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += SR_pointCloud[j,1]
        RIGrid2_SumofSquares[RIGrid2_Location[j,0],RIGrid2_Location[j,1]] += SR_pointCloud[j,1] * SR_pointCloud[j,1]
I did attempt to use Cython, where I replaced j with a cdef int j and compiled. There was no noticeable performance gain. Anyone have suggestions?
Vectorization is almost always the best way to speed up numpy code, and much of this seems vectorizable. To start, for example, the location arrays seem quite simple to do:
# these are all of your j values
inds = np.arange(0,SRpixels)
# these are the j values you don't want to skip
sel = np.invert((abs(SR_pointCloud[inds,0]) > SR_xMax) | (SR_pointCloud[inds,2] > SR_zMax) | (SR_pointCloud[inds,2] < 0))
RIGrid1_Location[sel,0] = np.floor(((SR_pointCloud[sel,0] + xPosition + 5) - xGrid1Center) / gridSize)
RIGrid1_Location[sel,1] = np.floor(((SR_pointCloud[sel,2] + yPosition) - yGrid1LowerBound) / gridSize)
RIGrid2_Location[sel,0] = np.floor(((SR_pointCloud[sel,0] + xPosition + 5) - xGrid2Center) / gridSize)
RIGrid2_Location[sel,1] = np.floor(((SR_pointCloud[sel,2] + yPosition) - yGrid2LowerBound) / gridSize)
This has no python loop.
The rest are trickier and will depend upon what you are doing, but should also be vectorizable if you think about them in this way.
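For the accumulation into the count/sum/sum-of-squares grids, one hedged sketch (my addition, reusing the sel mask and the vectorized locations above, and assuming the location arrays hold integer indices as in the question) uses NumPy's unbuffered scatter-add, np.add.at:

rows = RIGrid1_Location[sel, 0]
cols = RIGrid1_Location[sel, 1]
vals = SR_pointCloud[sel, 1]

np.add.at(RIGrid1_Count, (rows, cols), 1)                   # increment each hit cell
np.add.at(RIGrid1_Sum, (rows, cols), vals)                  # accumulate values
np.add.at(RIGrid1_SumofSquares, (rows, cols), vals * vals)  # accumulate squares
# ...and the same pattern with RIGrid2_Location for the second grid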
If you really have something that can't be vectorized and must be done with a loop—I've only had this happen a few times—I'd suggest Weave over Cython. It's harder to use, but should give speeds comparable to C.
Try vectorizing the calculation first. If you must do the calculation element by element, here are some speedup hints:
Calculations with NumPy scalars are much slower than with builtin scalars. array[i, j] will get a NumPy scalar, while array.item(i, j) will return a builtin scalar.
Functions in the math module are faster than NumPy's when doing scalar calculations.
Here is an example:
import numpy as np
import math
a = np.array([[1.1, 2.2, 3.3],[4.4, 5.5, 6.6]])
%timeit np.floor(a[0,0]*2)
%timeit math.floor(a[0,0]*2)
%timeit np.floor(a.item(0,0)*2)
%timeit math.floor(a.item(0,0)*2)
output:
100000 loops, best of 3: 10.2 µs per loop
100000 loops, best of 3: 3.49 µs per loop
100000 loops, best of 3: 6.49 µs per loop
1000000 loops, best of 3: 851 ns per loop
So changing np.floor to math.floor and SR_pointCloud[j,0] to SR_pointCloud.item(j,0) will speed up the loop a lot.
