Inaccurate Large Fibonacci Numbers in Python - python

I am currently implementing this simple code trying to find the n-th element of the Fibonacci sequence using Python 2.7:
import numpy as np
def fib(n):
F = np.empty(n+2)
F[1] = 1
F[0] = 0
for i in range(2,n+1):
F[i]=F[i-1]+F[i-2]
return int(F[n])
This works fine for F < 79, but after that I get wrong numbers. For example, according to wolfram alpha F79 should be equal to 14472334024676221, but fib(100) gives me 14472334024676220. I think this could be caused by the way python deals with integers, but I have no idea what exactly the problem is. Any help is greatly appreciated!

the default data type for a numpy array is depending on architecture a 64 (or 32) bit int.
pure python would let you have arbitrarily long integers; numpy does not.
so it's more the way numpy deals with integers; pure python would do just fine.

Python will deal with integers perfectly fine here. Indeed, that is the beauty of python. numpy, on the other hand, introduces ugliness and just happens to be completely unnecessary, and will likely slow you down. Your implementation will also require much more space. Python allows you to write beautiful, readable code. Here is Raymond Hettinger's canonical implementation of iterative fibonacci in Python:
def fib(n):
x, y = 0, 1
for _ in range(n):
x, y = y, x + y
return x
That is O(n) time and constant space. It is beautiful, readable, and succinct. It will also give you the correct integer as long as you have memory to store the number on your machine. Learn to use numpy when it is the appropriate tool, and as importantly, learn to not use it when it is inappropriate.

Unless you want to generate a list with all the fibonacci numbers until Fn, there is no need to use a list, numpy or anything else like that, a simple loop and 2 variables will be enough as you only really need to know the 2 previous values
def fib(n):
Fk, Fk1 = 0, 1
for _ in range(n):
Fk, Fk1 = Fk1, Fk+Fk1
return Fk
of course, there is better ways to do it using the mathematical properties of the Fibonacci numbers, with those we know that there is a matrix that give us the right result
import numpy
def fib_matrix(n):
mat = numpy.matrix( [[1,1],[1,0]], dtype=object) ** n
return mat[0,1]
to which I assume they have an optimized matrix exponentiation making it more efficient that the previous method.
Using the properties of the underlying Lucas sequence is possible to do it without the matriz, and equally as efficient as exponentiation by squaring and with the same number of variables as the other, but that is a little harder to understand at first glance unlike the first example because alongside the second example it require more mathematical.
The close form, the one with the golden ratio, will give you the result even faster, but that have the risk of being inaccurate because the use of floating point arithmetic.

As an additional word to the previous answer by hiro protagonist, note that if using Numpy is a requirement, you can solve very easely your issue by replacing:
F = np.empty(n+2)
with
F = np.empty(n+2, dtype=object)
but it will not do anything more than transferring back the computation to pure Python.

Related

Need help finding GCD (noob approach)

I am currently going through Math adventures with Python book by Peter Farrel. Now I am simply trying to improve my math skills while learning Python in a fun way. So we made a factors function as seen below:
def factors(num):
factorList = []
for i in range(1, num+1):
if num % i == 0:
factorList.append(i)
return factorList
Exercise 3-1 is asking to make GCF (Greatest Common Factor) function. All the answers here are how we could use builtin Python modules or recursive or Euclid algorithm. I have no clue what any of these things mean, let alone trying it on this assignment. I came with the following solution using the above function:
def gcFactor(num1, num2):
fnum1 = factors(num1)
fnum2 = factors(num2)
gcf = list(set(fnum1).intersection(fnum2))
return max(gcf)
print(gcFactor(28,21))
Is this the best way of doing it? Using the .intersection() function seems a little cheaty to me.
So what I wanted to do is if I could use a loop and separate the list values in fnum1 & fnum2 and compare them and then return the value that matches (which would make common factors) and is greatest (which would be GCF).
The idea behind your algorithm is sound, but there are a few problems:
In your original version, you used gcf[-1] to get the greatest factor, but that will not always work, since converting a set to list does not guarantee that the elements will be in sorted order, even if they were sorted before converting to set. Better use max (you already changed that).
Using set.intersection is definitely not "cheating" but just making good use of what the languages provides. It might be considered cheating to just use math.gcd, but not basic set or list functions.
Your algorithm is rather inefficient. I don't know the book, but I don't think you should actually use the factors function to calculate the gcf, but that was just an exercise to teach you stuff like loops and modulo. Consider two very different numbers as inputs, say 23764372 and 6. You'd calculate all the factors of 23764372 first, before testing the very few values that could actually be common factors. Instead of using factors directly, try to rewrite your gcFactor function to test which values up to the min of the two numbers are factors of both numbers.
Even then, your algorithm will not be very efficient. I would suggest reading up on Euclid's Algorithm and trying to implement that next. If unsure if you did it right, you can use your first function as a reference for testing, and to see the difference in performance.
About your factors function itself: Note that there is a symmetry: if i is a factor, so is n//i. If you use this, you do not have to test all the values up to n but just up to sqrt(n), which is a speed-up equivalent to reducing running time from O(n²) to O(n).

Timeit showing that regular python is faster than numpy?

I'm working on a piece of code for a game that calculates the distances between all the objects on the screen using their in-game coordinate positions. Originally I was going to use basic Python and lists to do this, but since the number of distances that will need calculated will increase exponentially with the number of objects, I thought it might be faster to do it with numpy.
I'm not very familiar with numpy, and I've been experimenting on basic bits of code with it. I wrote a little bit of code to time how long it takes for the same function to complete a calculation in numpy and in regular Python, and numpy seems to consistently take a good bit more time than the regular python.
The function is very simple. It starts with 1.1 and then increments 200,000 times, adding 0.1 to the last value and then finding the square root of the new value. It's not what I'll actually be doing in the game code, which will involve finding total distance vectors from position coordinates; it's just a quick test I threw together. I already read here that the initialization of arrays takes more time in NumPy, so I moved the initializations of both the numpy and python arrays outside their functions, but Python is still faster than numpy.
Here is the bit of code:
#!/usr/bin/python3
import numpy
from timeit import timeit
#from time import process_time as timer
import math
thing = numpy.array([1.1,0.0], dtype='float')
thing2 = [1.1,0.0]
def NPFunc():
for x in range(1,200000):
thing[0] += 0.1
thing[1] = numpy.sqrt(thing[0])
print(thing)
return None
def PyFunc():
for x in range(1,200000):
thing2[0] += 0.1
thing2[1] = math.sqrt(thing2[0])
print(thing2)
return None
print(timeit(NPFunc, number=1))
print(timeit(PyFunc, number=1))
It gives this result, which indicates normal Python is 3x faster:
[ 20000.99999999 141.42489173]
0.2917748889885843
[20000.99999998944, 141.42489172698504]
0.10341173503547907
Am I doing something wrong, is is this calculation just so simple that it isn't a good test for numpy?
Am I doing something wrong, is is this calculation just so simple that it isn't a good test for NumPy?
It's not really that the calculation is simple, but that you're not taking any advantage of NumPy.
The main benefit of NumPy is vectorization: you can apply an operation to every element of an array in one go, and whatever looping is needed happens inside some tightly-optimized C (or Fortran or C++ or whatever) loop inside NumPy, rather than in a slow generic Python iteration.
But you're only accessing a single value, so there's no looping to be done in C.
On top of that, because the values in an array are stored as "native" values, NumPy functions don't need to unbox them, pulling the raw C double out of a Python float, and then re-box them in a new Python float, the way any Python math functions have to.
But you're not doing that either. In fact, you're doubling that work: You're pulling the value out o the array as a float (boxing it), then passing it to a function (which has to unbox it, and then rebox it to return a result), then storing it back in an array (unboxing it again).
And meanwhile, because np.sqrt is designed to work on arrays, it has to first check the type of what you're passing it and decide whether it needs to loop over an array or unbox and rebox a single value or whatever, while math.sqrt just takes a single value. When you call np.sqrt on an array of 200000 elements, the added cost of that type switch is negligible, but when you're doing it every time through the inner loop, that's a different story.
So, it's not an unfair test.
You've demonstrated that using NumPy to pull out values one at a time, act on them one at a time, and store them back in the array one at a time is slower than just not using NumPy.
But, if you compare it to actually taking advantage of NumPy—e.g., by creating an array of 200000 floats and then calling np.sqrt on that array vs. looping over it and calling math.sqrt on each one—you'll demonstrate that using NumPy the way it was intended is faster than not using it.
you're comparing it wrong
a_list = np.arange(0,20000,0.1)
timeit(lambda:np.sqrt(a_list),number=1)

python one line for loop over a function returning a tuple

I've tried searching quite a lot on this one, but being relatively new to python I feel I am missing the required terminology to find what I'm looking for.
I have a function:
def my_function(x,y):
# code...
return(a,b,c)
Where x and y are numpy arrays of length 2000 and the return values are integers. I'm looking for a shorthand (one-liner) to loop over this function as such:
Output = [my_function(X[i],Y[i]) for i in range(len(Y))]
Where X and Y are of the shape (135,2000). However, after running this I am currently having to do the following to separate out 'Output' into three numpy arrays.
Output = np.asarray(Output)
a = Output.T[0]
b = Output.T[1]
c = Output.T[2]
Which I feel isn't the best practice. I have tried:
(a,b,c) = [my_function(X[i],Y[i]) for i in range(len(Y))]
But this doesn't seem to work. Does anyone know a quick way around my problem?
my_function(X[i], Y[i]) for i in range(len(Y))
On the verge of crossing the "opinion-based" border, ...Y[i]... for i in range(len(Y)) is usually a big no-no in Python. It is even a bigger no-no when working with numpy arrays. One of the advantages of working with numpy is the 'vectorization' that it provides, and thus pushing the for loop down to the C level rather than the (slower) Python level.
So, if you rewrite my_function so it can handle the arrays in a vectorized fashion using the multiple tools and methods that numpy provides, you may not even need that "one-liner" you are looking for.

How to perform an operation on every element in a numpy matrix?

Say I have a function foo() that takes in a single float and returns a single float. What's the fastest/most pythonic way to apply this function to every element in a numpy matrix or array?
What I essentially need is a version of this code that doesn't use a loop:
import numpy as np
big_matrix = np.matrix(np.ones((1000, 1000)))
for i in xrange(np.shape(big_matrix)[0]):
for j in xrange(np.shape(big_matrix)[1]):
big_matrix[i, j] = foo(big_matrix[i, j])
I was trying to find something in the numpy documentation that will allow me to do this but I haven't found anything.
Edit: As I mentioned in the comments, specifically the function I need to work with is the sigmoid function, f(z) = 1 / (1 + exp(-z)).
If foo is really a black box that takes a scalar, and returns a scalar, then you must use some sort of iteration. People often try np.vectorize and realize that, as documented, it does not speed things up much. It is most valuable as a way of broadcasting several inputs. It uses np.frompyfunc, which is slightly faster, but with a less convenient interface.
The proper numpy way is to change your function so it works with arrays. That shouldn't be hard to do with the function in your comments
f(z) = 1 / (1 + exp(-z))
There's a np.exp function. The rest is simple math.

Which programming language or a library can process Infinite Series?

Which programming language or a library is able to process infinite series (like geometric or harmonic)? It perhaps must have a database of some well-known series and automatically give proper values in case of convergence, and maybe generate an exception in case of divergence.
For example, in Python it could look like:
sum = 0
sign = -1.0
for i in range(1,Infinity,2):
sign = -sign
sum += sign / i
then, sum must be math.pi/4 without doing any computations in the loop (because it's a well-known sum).
Most functional languages which evaluate lazily can simulate the processing of infinite series. Of course, on a finite computer it is not possible to process infinite series, as I am sure you are aware. Off the top of my head, I guess Mathematica can do most of what you might want, I suspect that Maple can too, maybe Sage and other computer-algebra systems and I'd be surprised if you can't find a Haskell implementation that suits you.
EDIT to clarify for OP: I do not propose generating infinite loops. Lazy evaluation allows you to write programs (or functions) which simulate infinite series, programs which themselves are finite in time and space. With such languages you can determine many of the properties, such as convergence, of the simulated infinite series with considerable accuracy and some degree of certainty. Try Mathematica or, if you don't have access to it, try Wolfram Alpha to see what one system can do for you.
One place to look might be the Wikipedia category of Computer Algebra Systems.
There are two tools available in Haskell for this beyond simply supporting infinite lists.
First there is a module that supports looking up sequences in OEIS. This can be applied to the first few terms of your series and can help you identify a series for which you don't know the closed form, etc. The other is the 'CReal' library of computable reals. If you have the ability to generate an ever improving bound on your value (i.e. by summing over the prefix, you can declare that as a computable real number which admits a partial ordering, etc. In many ways this gives you a value that you can use like the sum above.
However in general computing the equality of two streams requires an oracle for the halting problem, so no language will do what you want in full generality, though some computer algebra systems like Mathematica can try.
Maxima can calculate some infinite sums, but in this particular case it doesn't seem to find the answer :-s
(%i1) sum((-1)^k/(2*k), k, 1, inf), simpsum;
inf
==== k
\ (- 1)
> ------
/ k
====
k = 1
(%o1) ------------
2
but for example, those work:
(%i2) sum(1/(k^2), k, 1, inf), simpsum;
2
%pi
(%o2) ----
6
(%i3) sum((1/2^k), k, 1, inf), simpsum;
(%o3) 1
You can solve the series problem in Sage (a free Python-based math software system) exactly as follows:
sage: k = var('k'); sum((-1)^k/(2*k+1), k, 1, infinity)
1/4*pi - 1
Behind the scenes, this is really using Maxima (a component of Sage).
For Python check out SymPy - clone of Mathematica and Matlab.
There is also a heavier Python-based math-processing tool called Sage.
You need something that can do a symbolic computation like Mathematica.
You can also consider quering wolframaplha: sum((-1)^i*1/i, i, 1 , inf)
There is a library called mpmath(python), a module of sympy, which provides the series support for sympy( I believe it also backs sage).
More specifically, all of the series stuff can be found here: Series documentation
The C++ iRRAM library performs real arithmetic exactly. Among other things it can compute limits exactly using the limit function. The homepage for iRRAM is here. Check out the limit function in the documentation. Note that I'm not talking about arbitrary precision arithmetic. This is exact arithmetic, for a sensible definition of exact. Here's their code to compute e exactly, pulled from the example on their web site:
//---------------------------------------------------------------------
// Compute an approximation to e=2.71.. up to an error of 2^p
REAL e_approx (int p)
{
if ( p >= 2 ) return 0;
REAL y=1,z=2;
int i=2;
while ( !bound(y,p-1) ) {
y=y/i;
z=z+y;
i+=1;
}
return z;
};
//---------------------------------------------------------------------
// Compute the exact value of e=2.71..
REAL e()
{
return limit(e_approx);
};
Clojure and Haskell off the top of my head.
Sorry I couldn't find a better link to haskell's sequences, if someone else has it, please let me know and I'll update.
Just install sympy on your computer. Then do the following code:
from sympy.abc import i, k, m, n, x
from sympy import Sum, factorial, oo, IndexedBase, Function
Sum((-1)**k/(2*k+1), (k, 0, oo)).doit()
Result will be: pi/4
I have worked in couple of Huge Data Series for Research purpose.
I used Matlab for that. I didn't know it can/can't process Infinite Series.
But I think there is a possibility.
U can try :)
This can be done in for instance sympy and sage (among open source alternatives) In the following, a few examples using sympy:
In [10]: summation(1/k**2,(k,1,oo))
Out[10]:
2
π
──
6
In [11]: summation(1/k**4, (k,1,oo))
Out[11]:
4
π
──
90
In [12]: summation( (-1)**k/k, (k,1,oo))
Out[12]: -log(2)
In [13]: summation( (-1)**(k+1)/k, (k,1,oo))
Out[13]: log(2)
Behind the scenes, this is using the theory for hypergeometric series, a nice introduction is the book "A=B" by Marko Petkovˇeks, Herbert S. Wilf
and Doron Zeilberger which you can find by googling. ¿What is a hypergeometric series?
Everybody knows what an geometric series is: $X_1, x_2, x_3, \dots, x_k, \dots $ is geometric if the contecutive terms ratio $x_{k+1}/x_k$ is constant. It is hypergeometric if the consecutive terms ratio is a rational function in $k$! sympy can handle basically all infinite sums where this last condition is fulfilled, but only very few others.

Categories