I'm trying to translate some Matlab code into Python. Unfortunately I don't have Matlab so I can't try out the syntax.
I'm confused about the if statement below:
for i = 1:200
if mod(i,10) == 0
i
end
The for loop then carries on to calculate some values which depend on i. What does the if statement do?
Can I also ask what the difference is between a sparse matrix and one made with zeros, e.g.
A = sparse(n,m)
B = zeros(n,m)
Thanks!
It is usually better to create separate questions, but I will try to address both:
1) The mod function performs the modulo operation, i.e. it gives the remainder after division. mod(i,10) == 0 is therefore 1 (true) exactly when i is divisible by 10, so the body of the if statement runs whenever i is a multiple of 10.
As there is no else part, nothing happens when the condition is false.
By just writing i (without semicolon), the current value of i is printed to the command window. The output of your example code will therefore be 10, 20, ..., 200.
2) The zeros command creates a "normal" (dense) matrix of zeros of dimension n x m. MATLAB also has a special sparse memory organization: since sparse matrices consist mostly of zeros, you don't need to fill memory with all those zeros; you only store the non-zero values and their positions. This is what the sparse function does. To convert a sparse matrix back to the "normal" format, use the full function.
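Since you are translating to Python anyway: a rough Python analogue of the same distinction, assuming scipy is installed (the sizes here are just for illustration), is

import numpy as np
from scipy import sparse

n, m = 1000, 1000
B = np.zeros((n, m))            # dense: all n*m zeros are stored
A = sparse.lil_matrix((n, m))   # sparse: only nonzero entries are stored
A[0, 0] = 5.0

print(B.nbytes)   # 8000000 bytes for the dense matrix
print(A.nnz)      # 1 stored value in the sparse matrix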
The if statement checks whether the modulus (remainder after division) of i divided by 10 is 0 or not.
When it evaluates to true, it prints the number i to the command window.
The naive Python translation would be
for i in range(1, 201):
    if not i % 10:
        print(i)
but we can save some work by specifying a step value,
for i in range(10, 201, 10):
    print(i)
See the documentation for mod. mod(i,10) returns the remainder after division of i by 10, where i is the dividend and 10 is the divisor. The if statement checks whether that remainder is equal to 0 or not.
The issues
So I have an array I imported containing values ranging from ~0.0 to ~0.76. When I started trying to find the min & max values using NumPy, I ran into some strange inconsistencies that I'd like to know how to solve if they're my fault, or avoid if they're programming errors on the NumPy developers' end.
The code
Let's start with finding the location of the maximum values using np.max & np.where.
print array.shape
print np.max(array)
print np.where(array == 0.763728955743)
print np.where(array == np.max(array))
print array[35,57]
The output is this:
(74, 145)
0.763728955743
(array([], dtype=int64), array([], dtype=int64))
(array([35]), array([57]))
0.763728955743
When I look for where the array exactly equals the maximum entry's value, NumPy doesn't find it. However, when I simply search for the location of the maximum value without specifying what that value is, it works. Note this doesn't happen with np.min.
Now I have a different issue regarding minima.
print array.shape
print np.min(array)
print np.where(array == 0.0)
print np.where(array == np.min(array))
print array[10,25], array[31,131]
Look at the output:
(74, 145)
0.0
(array([10, 25]), array([ 31, 131]))
(array([10, 25]), array([ 31, 131]))
0.0769331747301 1.54220192172e-09
1.54e-9 is close enough to 0.0 that it seems like it would be the minimum value. But why is a location with the value 0.077 also listed by np.where? That's not even close to 0.0 compared to the other value.
The Questions
Why doesn't np.where seem to work when I enter the maximum value of the array, but it does when I search for np.max(array) instead? And why does np.where() combined with np.min() return two locations, one of which is definitely not the minimum value?
You have two issues: the interpretation of floats and the interpretation of the results of np.where.
Floating point numbers are stored internally in binary, and a decimal literal like 0.763728955743 can not always be represented exactly in binary; conversely, the decimal string that gets printed is usually a rounded version of the stored binary value. This is why np.where(array == 0.763728955743) returns an empty array, while print np.where(array == np.max(array)) does the right thing: the second case compares the stored binary value against itself, without any decimal round-trip. The search for the minimum succeeds because 0.0 can be represented exactly in both decimal and binary. In general, it is a bad idea to compare floats using == for this and related reasons.
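A quick demonstration of the pitfall (nothing NumPy-specific):

>>> 0.1 + 0.2 == 0.3
False
>>> 0.1 + 0.2
0.30000000000000004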
For the version of np.where that you are using, it devolves into np.nonzero. You are misinterpreting the results here: it returns one array for each dimension of the input, not individual coordinate pairs. There are a number of ways of saying this differently:
If you had three matches, you would be getting two arrays back, each with three elements.
If you had a 3D input array with two matches, you would get three arrays back, each with two elements.
The first array is row-coordinates (dim 0) and the second array is column-coordinates (dim 1).
Notice how you are interpreting the output of where for the maximum case. This is correct, but it is not what you are doing in the minimum case.
There are a number of ways of dealing with these issues. The easiest could be to use np.argmax and np.argmin. These return the index of the first maximum or minimum in the flattened array, so you need np.unravel_index to turn that index into a coordinate pair:
>>> x = np.unravel_index(np.argmax(array), array.shape)
>>> print(x)
(35, 57)
>>> print(array[x])
0.763728955743
The only possible problem here is that you may want to get all of the coordinates.
In that case, using where or nonzero is fine. The only difference from your code is that you should print
print array[10,31], array[25,131]
instead of the transposed values as you are doing.
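If it helps, here is a minimal sketch of how to pair up the two arrays that where/nonzero return, assuming array is your 2-D array:

import numpy as np

rows, cols = np.where(array == np.min(array))
for r, c in zip(rows, cols):
    print(array[r, c])    # the value at each matching (row, col) pair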
Try using numpy.isclose() instead of ==, because floating point numbers cannot reliably be tested for exact equality.
That is, change this: np.where(array == 0.763728955743)
to: np.where(np.isclose(array, 0.763728955743))
np.min() and np.max() work as expected for me. Also note you can provide an axis like arr.min(axis=1) if you want to.
If this does not solve it, perhaps you could post some csv data somewhere to try to reproduce the problem? I highly doubt it is a bug in NumPy itself, but you never know!
I am trying to write a program in python 2.7 that will first see if a number divides the other evenly, and if it does get the result of the division.
However, I am getting some interesting results when I use large numbers.
Currently I am using:
from __future__ import division
import math
a=82348972389472433334783
b=2
if a/b==math.trunc(a/b):
    answer=a/b
    print 'True' #to quickly see if the if branch was taken
When I run this I get:
True
But 82348972389472433334783 is clearly not even.
Any help would be appreciated.
That's a crazy way to do it. Just use the remainder operator.
if a % b == 0:
    # then b divides a evenly
    quotient = a // b
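Applied to the numbers from your question, this correctly reports that a is not evenly divisible, since a is odd:

a = 82348972389472433334783
b = 2
if a % b == 0:
    print a // b
else:
    print 'not divisible'   # this branch runs: a is odd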
The true division implicitly converts the input to floats, which don't provide enough precision to store the value of a accurately. E.g. on my machine
>>> int(1E15+1)
1000000000000001
>>> int(1E16+1)
10000000000000000
hence you lose precision. A similar thing happens with your big number (compare int(float(a)) - a).
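You can see that directly: a is odd and larger than 2**53, so it can not survive the round-trip through float:

>>> a = 82348972389472433334783
>>> int(float(a)) == a
False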
Now, if you check your division, you see the result "is" actually found to be an integer
>>> (a/b).is_integer()
True
which is not really what you would expect beforehand.
The math.trunc function does something similar (from the docs):
Return the Real value x truncated to an Integral (usually a long integer).
The duck typing nature of python allows a comparison of the long integer and float, see
Checking if float is equivalent to an integer value in python and
Comparing a float and an int in Python.
Why don't you use the modulus operator instead to check if a number can be divided evenly?
n % x == 0
I wanted to use NumPy in a Fibonacci question because of its efficiency in matrix multiplication. You know that there is a method for finding Fibonacci numbers with the matrix [[1, 1], [1, 0]].
I wrote some very simple code but after increasing n, the matrix is starting to give negative numbers.
import numpy
def fib(n):
    return (numpy.matrix("1 1; 1 0")**n).item(1)
print fib(90)
# Gives -1581614984
What could be the reason for this?
Note: linalg.matrix_power also gives negative values.
Note2: I tried numbers from 0 to 100. It starts to give negative values after 47. Is it a large integer issue because NumPy is coded in C? If so, how could I solve this?
Edit: Using a regular Python list matrix with linalg.matrix_power also gave negative results. Also let me add that not all results after 47 are negative; it occurs seemingly at random.
Edit2: I tried using the method #AlbertoGarcia-Raboso suggested. It resolved the negative number problem, but another issue occurred. It gives the answer as -5.168070885485832e+19 where I need -51680708854858323072L. I tried using int(), which converted it to a long (the L suffix), but the answer still seems incorrect because of the loss of precision.
The reason you see negative values appearing is because NumPy has defaulted to using the np.int32 dtype for your matrix.
The maximum positive integer this dtype can represent is 2^31 - 1, which is 2147483647. Unfortunately, this is less than the 47th Fibonacci number, 2971215073. The resulting overflow is causing the negative number to appear:
>>> np.int32(2971215073)
-1323752223
Using a bigger integer type (like np.int64) would fix this, but only temporarily: you'd still run into problems if you kept on asking for larger and larger Fibonacci numbers.
The only sure fix is to use an unlimited-size integer type, such as Python's int type. To do this, modify your matrix to be of np.object type:
import numpy as np

def fib_2(n):
    return (np.matrix("1 1; 1 0", dtype=np.object)**n).item(1)
The np.object type allows a matrix or array to hold any mix of native Python types. Essentially, instead of holding machine types, the matrix is now behaving like a Python list and simply consists of pointers to integer objects in memory. Python integers will be used in the calculation of the Fibonacci numbers now and overflow is not an issue.
>>> fib_2(300)
222232244629420445529739893461909967206666939096499764990979600
This flexibility comes at the cost of decreased performance: NumPy's speed originates from direct storage of integer/float types which can be manipulated by your hardware.
Okay, for my numerical methods class I have the following question:
Write a Python function to solve Ax = b by back substitution, where A is an upper triangular nonsingular matrix. MATLAB code for this is on page 190 which you can use as a pseudocode guide if you wish. The function should take as input A and b and return x. Your function need not check that A is nonsingular. That is, assume that only nonsingular A will be passed to your function.
The MATLAB code that it refers to is:
x(n) = c(n)/U(n,n)
for i = n-1 : -1 : 1
x(i) = c(i);
for j = i+1 : n
x(i) = x(i) - U(i,j)*x(j);
end
x(i) = x(i)/U(i,i);
end
My Python code, which I wrote using the MATLAB code snippet, is below, with an upper triangular test matrix (not sure if it's nonsingular! How do I test for singularity?):
from scipy import mat
c=[3,2,1]
U=([[6,5,1],[0,1,7],[0,0,2]])
a=0
x=[]
while a<3:
    x.append(1)
    a=a+1
n=3
i=n-1
x[n-1]=c[n-1]/U[n-1][n-1]
while i>1:
    x[i]=c[i]
    j=i+1
    while j<n-1:
        x[i]=x[i]-U[i][j]*x[j];
    x[i]=x[i]/U[i][i]
    i=i-1
print mat(x)
The answer I am getting is [[1 1 0]] for x. I'm not sure if I am doing this correctly. I assume it is wrong and can't figure out what to do next. Any clues?
j=i+1
while j<n-1:
    x[i]=x[i]-U[i][j]*x[j];
never gets executed with your bounds (the condition j<n-1 is false immediately), and would be infinite if it did run, since j is never incremented.
your indexing is fubared:
for i in range(n-2,-1,-1):
    ...
    for j in range(i+1,n):
        ...
Notice that range is half-open, unlike MATLAB's inclusive ranges.
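Putting that together, here is a minimal sketch of the whole routine (the function name is mine; U is the upper triangular matrix, c is the right-hand side):

def back_substitute(U, c):
    n = len(c)
    x = [0.0] * n
    x[n-1] = float(c[n-1]) / U[n-1][n-1]    # solve the bottom row first
    for i in range(n-2, -1, -1):            # then walk upward
        s = float(c[i])
        for j in range(i+1, n):             # subtract the already-solved terms
            s -= U[i][j] * x[j]
        x[i] = s / U[i][i]
    return x

With the U and c from the question this gives [1.666..., -1.5, 0.5].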
One problem I see is that your input consists of integers, which means that Python is going to do integer division on them, which will turn 3/4 into 0 when what you want is floating point division. You can tell python to do floating point division by default by adding
from __future__ import division
To the top of your code. From the use of scipy, I'm assuming you're using Python 2.x here.
You ask how to test for singularity of an upper triangular matrix?
Please don't compute the determinant!
Simply look at the diagonal elements: are any of them zero?
How about effective numerical singularity? Compare the smallest diagonal element in absolute value to the largest in absolute value. If that ratio is smaller than something on the order of eps, the matrix is effectively singular.
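As a rough sketch of that test (the function name and tolerance are my choices, assuming NumPy is available):

import numpy as np

def is_effectively_singular(U):
    # Diagonal test for an upper triangular matrix, as described above.
    d = np.abs(np.diag(U))
    if d.min() == 0:
        return True    # exactly singular: a zero on the diagonal
    # effectively singular: smallest/largest diagonal ratio below ~eps
    return d.min() / d.max() < np.finfo(float).eps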
When you use the POISSON function in Excel (or in OpenOffice Calc), it takes two arguments:
an integer
an 'average' number
and returns a float.
In Python (I tried RandomArray and NumPy) it returns an array of random Poisson-distributed numbers.
What I really want is the percentage that this event will occur (it is a constant number, while the array has different numbers every time - so is it an average?).
for example:
print poisson(2.6,6)
returns [1 3 3 0 1 3] (and every time I run it, it's different).
The number I get from calc/excel is 3.19 (POISSON(6,2.16,0)*100).
Am I using Python's poisson wrong (no pun intended!) or am I missing something?
scipy has what you want
>>> scipy.stats.distributions
<module 'scipy.stats.distributions' from '/home/coventry/lib/python2.5/site-packages/scipy/stats/distributions.pyc'>
>>> scipy.stats.distributions.poisson.pmf(6, 2.6)
array(0.031867055625524499)
It's worth noting that it's pretty easy to calculate by hand, too.
It is easy to do by hand, but you can overflow doing it that way. You can do the exponent and factorial in a loop to avoid the overflow:
import math

def poisson_probability(actual, mean):
    # naive: math.exp(-mean) * mean**actual / factorial(actual)
    # iterative, to keep the components from getting too large or small:
    p = math.exp(-mean)
    for i in xrange(actual):
        p *= mean
        p /= i+1
    return p
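For the numbers from the question this agrees with the scipy value quoted above (up to floating-point rounding):

>>> poisson_probability(6, 2.6)
0.031867055625524...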
This page explains why you get an array, and the meaning of the numbers in it, at least.