I have found myself with a very confusing array in Python. The following is the output from IPython when I work with it (with the pylab flag):
In [1]: x = np.load('x.npy')
In [2]: x.shape
Out[2]: (504000,)
In [3]: x
Out[3]:
array([ 98.20354462, 98.26583099, 98.26529694, ..., 98.20297241,
98.19876862, 98.29492188], dtype=float32)
In [4]: min(x), mean(x), max(x)
Out[4]: (97.950058, 98.689438, 98.329773)
I have no idea what is going on. Why does mean() return an obviously wrong answer, one that is even larger than max(x)?
I don't even know where to begin to debug this problem.
I am using Python 2.7.6.
I would be willing to share the .npy file if necessary.
This is probably due to accumulated rounding error in computing mean(). float32 has a relative precision of about 1e-7, and you have about 500,000 elements, so direct (sequential) computation of sum() can accumulate a rounding error of up to roughly 5% (1e-7 * 5e5 = 5e-2).
NumPy 1.9.0 (the latest version at the time of writing) uses a more sophisticated algorithm, pairwise summation, to compute sum() and mean():
>>> import numpy
>>> numpy.__version__
'1.9.0'
>>> x = numpy.random.random(500000).astype("float32") + 300
>>> min(x), numpy.mean(x), max(x)
(300.0, 300.50024, 301.0)
In the meantime, you may want to use a higher-precision accumulator type: numpy.mean(x, dtype=numpy.float64)
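The effect can be reproduced without the original x.npy. The sketch below uses a synthetic constant float32 array (an assumption standing in for the asker's data) and compares three accumulations. np.cumsum adds strictly left to right, so its last element serves here as a stand-in for the direct summation described above; np.mean(x) is the float32 pairwise version, and dtype=np.float64 is the higher-precision accumulator.

```python
import numpy as np

# Synthetic stand-in for the asker's data: 504000 float32 values, all 98.2.
x = np.full(504000, 98.2, dtype=np.float32)

# Direct left-to-right float32 accumulation: cumsum adds one element at a time,
# so its last entry is the naively accumulated sum.
naive_mean = np.cumsum(x, dtype=np.float32)[-1] / x.size

# np.mean keeps a float32 accumulator for float32 input, but uses pairwise
# summation (NumPy >= 1.9), which keeps the rounding error small.
pairwise_mean = np.mean(x)

# A float64 accumulator, as recommended in the answer.
float64_mean = np.mean(x, dtype=np.float64)

print(naive_mean, pairwise_mean, float64_mean)
```

With these inputs the naive result lands well above 98.2, much like the mean exceeding max(x) in the question: once the running sum passes 2**25, adjacent float32 values are 4 apart, so every addition of 98.2 gets rounded to an addition of 100.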
Below is a snippet from np.mean.__doc__. You should try using np.mean(x, dtype=np.float64).
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below). Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.
In single precision, `mean` can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64)
0.55000000074505806
Related
Consider the following matrix equation:
x=Ab
where:
In[1]:A
Out[1]:
matrix([[ 0.477, -0.277, -0.2 ],
[-0.277, 0.444, -0.167],
[-0.2 , -0.167, 0.367]])
In[2]: b
Out[2]: [0, 60, 40]
How come I get the following results when I use numpy.linalg.solve()?
import numpy as np
x = np.linalg.solve(A, b)
res=x.tolist()
# res=[1.8014398509481981e+18, 1.801439850948198e+18, 1.8014398509481984e+18]
These numbers are huge! What's wrong here? I suspect A is in the wrong form: it multiplies b in my equation, whereas numpy.linalg.solve() treats A as multiplying x.
What you give as an equation (x = A b) is just a matrix multiplication, not a set of linear equations to solve (A x = b), for which you would use np.linalg.solve. To get x in your case, simply use np.dot (A.dot(b)).
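A minimal sketch of that suggestion, using a plain np.array for A (an assumption; the question uses np.matrix):

```python
import numpy as np

A = np.array([[ 0.477, -0.277, -0.2  ],
              [-0.277,  0.444, -0.167],
              [-0.2  , -0.167,  0.367]])
b = np.array([0, 60, 40])

# x = A b is an ordinary matrix-vector product, so no solver is needed.
x = A.dot(b)   # same as A @ b
print(x)       # approximately [-24.62, 19.96, 4.66]
```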
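Your matrix is singular, as you can see by adding up its columns: they sum to the zero vector. Mathematically, this system is only solvable when b lies in the column space of A. Because A is symmetric and A times [1, 1, 1] is zero, that means the entries of b must sum to zero, which is not the case for b = [0, 60, 40].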
The solution you're getting is most likely just numerical noise.
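The singularity can be checked numerically. The sketch below reuses A and b from the question (as plain arrays) and passes an explicit tolerance to np.linalg.matrix_rank; the value 1e-10 is my choice, picked to sit well above the rounding noise in the entries of A.

```python
import numpy as np

A = np.array([[ 0.477, -0.277, -0.2  ],
              [-0.277,  0.444, -0.167],
              [-0.2  , -0.167,  0.367]])
b = np.array([0.0, 60.0, 40.0])

# Summing the columns is the same as multiplying by a vector of ones.
col_sum = A @ np.ones(3)                     # zero up to rounding error
rank = np.linalg.matrix_rank(A, tol=1e-10)   # 2, so A is singular

# A is symmetric with null space spanned by [1, 1, 1], so A x = b is solvable
# only if the entries of b sum to zero.
b_total = b.sum()                            # 100.0, so no exact solution exists
print(col_sum, rank, b_total)
```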
I am trying to split the multiplication by a DFT matrix into real and imaginary parts:
from scipy.linalg import dft
import numpy as np
# x is always real
x = np.ones(4)
W = dft(4)
Wx = W.dot(x)
Wxme = np.real(W).dot(x) + np.imag(W).dot(x)*1.0j
I expected Wx and Wxme to give the same values, but they do not match at all. I have narrowed the bug down a bit more:
In [62]: W[1]
Out[62]:
array([ 1.00000000e+00 +0.00000000e+00j,
6.12323400e-17 -1.00000000e+00j,
-1.00000000e+00 -1.22464680e-16j, -1.83697020e-16 +1.00000000e+00j])
In [63]: np.sum(W[1])
Out[63]: (-2.2204460492503131e-16-1.1102230246251565e-16j)
In [64]: sum(W[1])
Out[64]: (-1.8369701987210297e-16-2.2204460492503131e-16j)
Why do sum and np.sum give different values? Adding complex numbers should be nothing more than adding the real parts and the imaginary parts separately, right?
Adding the real parts by hand gives me the result I would expect, as opposed to what NumPy gives me:
In [65]: 1.00000000e+00 + 6.12323400e-17 + -1.00000000e+00 + 1.83697020e-16
Out[65]: 1.8369702e-16
What am I missing?
Up to rounding error, these results are equal. The results have slightly different rounding error due to factors such as different summation order or different levels of precision used to represent intermediate results.
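A minimal sketch of both points. The values in vals are made up to make the order dependence obvious, and math.fsum (Python's correctly rounded summation) is used only as a reference:

```python
import math

import numpy as np
from scipy.linalg import dft

W = dft(4)
x = np.ones(4)

# Both ways of forming the product agree to within rounding error.
Wx = W.dot(x)
Wxme = np.real(W).dot(x) + 1j * np.imag(W).dot(x)
print(np.allclose(Wx, Wxme))          # True

# Floating-point addition is not associative: the order matters.
vals = [1.0, 1e-16, -1.0]
left_to_right = (vals[0] + vals[1]) + vals[2]  # 0.0: 1e-16 is lost next to 1.0
reordered = (vals[0] + vals[2]) + vals[1]      # 1e-16: cancel first, then add
exact = math.fsum(vals)                        # 1e-16: correctly rounded sum
print(left_to_right, reordered, exact)
```

Comparing floating-point results with np.allclose rather than == is the usual way to allow for this kind of difference.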
I'm trying to use Python to calculate the Rodrigues formula, P_n(x).
http://en.wikipedia.org/wiki/Rodrigues%27_formula
That is, I would like a function which takes two input parameters, n and x, and returns the output of this formula.
However, I don't think SciPy has this function yet. NumPy does offer a Legendre module:
http://docs.scipy.org/doc/numpy/reference/routines.polynomials.legendre.html
I don't think any of these is the Rodrigues formula. Am I wrong?
Is there a standard way SciPy offers to do this?
EDIT: I would like the input parameters to be arrays, not just single input values.
If you simply want P_n(x), then you can create a suitable object representing the P_n polynomial using scipy.special.legendre and call it with your values of x:
In [1]: from scipy.special import legendre
In [2]: n = 3
In [3]: Pn = legendre(n)
In [4]: Pn(2.5)
Out[4]: 35.3125 # P_3(2.5)
The object Pn is, in a sense, the "output" of the Rodrigues formula: it is a polynomial of the required order, which can be evaluated at a provided value of x. If you want a single function that takes n and x, you can use eval_legendre:
In [5]: from scipy.special import eval_legendre
In [6]: eval_legendre(3, 2.5)
Out[6]: 35.3125
As noted in the docs, this is the recommended approach for large-ish n (e.g. n > 20), instead of creating a polynomial object with all the coefficients, which handles rounding errors and numerical stability less well.
EDIT: Both approaches work with arrays (at least for the x argument). For example:
In [7]: x = np.array([0, 1, 2, 5, 10])
In [8]: Pn(x)
Out[8]:
array([ 0.00000000e+00, 1.00000000e+00, 1.70000000e+01,
3.05000000e+02, 2.48500000e+03])
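As a cross-check that these functions really give the Rodrigues polynomials, the sketch below applies Rodrigues' formula literally with numpy.polynomial.Polynomial and compares the result with eval_legendre. The helper rodrigues_legendre is hypothetical (it is not part of SciPy), and because it builds the full coefficient list it has the same stability caveat for large n as the polynomial-object approach. The last part shows eval_legendre, which is a ufunc, broadcasting over arrays of both n and x.

```python
import math

import numpy as np
from numpy.polynomial import Polynomial
from scipy.special import eval_legendre


def rodrigues_legendre(n, x):
    """Evaluate P_n(x) by applying Rodrigues' formula literally (hypothetical helper):
    P_n(x) = 1 / (2**n * n!) * d^n/dx^n (x**2 - 1)**n
    """
    base = Polynomial([-1.0, 0.0, 1.0]) ** n              # (x**2 - 1)**n
    pn = base.deriv(n) * (1.0 / (2.0 ** n * math.factorial(n)))
    return pn(np.asarray(x, dtype=float))


x = np.array([0, 1, 2, 5, 10])
print(rodrigues_legendre(3, x))      # 0, 1, 17, 305, 2485 (as floats)

# eval_legendre broadcasts n against x:
# row k of the table holds P_k evaluated at every entry of x.
orders = np.arange(5)[:, None]
table = eval_legendre(orders, x)
print(table.shape)                   # (5, 5)
```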
I use NumPy and mpmath in my Python program. I use NumPy because it gives easy access to many linear algebra operations, but because NumPy's solver for linear equations is not that exact, I use mpmath for higher-precision operations. After I compute the solution of a system:
solution = mpmath.lu_solve(A,b)
I want the solution as an array, so I use
array = np.zeros(m)
and then loop over it to set the values:
for i in range(m):
    array[i] = solution[i]
or
for i in range(m):
    array.put([i], solution[i])
but either way I again get numerical errors like:
solution[0] = 12.375
array[0] = 12.37500000000000177636
Is there a way to avoid these errors?
NumPy ndarrays have a homogeneous type. When you create array, the default dtype is float64, which doesn't have as much precision as you want:
>>> array = np.zeros(3)
>>> array
array([ 0., 0., 0.])
>>> array.dtype
dtype('float64')
You can get around this by using dtype=object:
>>> mp.mp.prec = 65
>>> mp.mpf("12.37500000000000177636")
mpf('12.37500000000000177636')
>>> array = np.zeros(3, dtype=object)
>>> array[0] = 12.375
>>> array[1] = mp.mpf("12.37500000000000177636")
>>> array
array([12.375, mpf('12.37500000000000177636'), 0], dtype=object)
but note that there's a significant performance hit when you do this.
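A minimal sketch of converting an mpmath solution, using a small hypothetical 2x2 system in place of the question's A and b. Building the object array directly from the solution keeps the mpf entries untouched, whereas converting to float64 rounds each entry to the nearest double:

```python
import mpmath
import numpy as np

mpmath.mp.dps = 30  # work with 30 significant decimal digits

# Hypothetical system: 4x + y = 1, 2x + 3y = 2, with solution x = 0.1, y = 0.6.
A = mpmath.matrix([[4, 1], [2, 3]])
b = mpmath.matrix([1, 2])
solution = mpmath.lu_solve(A, b)

# dtype=object stores the mpf values themselves, so no precision is lost.
sol_obj = np.array([solution[i] for i in range(solution.rows)], dtype=object)

# A float64 array rounds every entry to the nearest double.
sol_f64 = np.array([float(solution[i]) for i in range(solution.rows)])

print(sol_obj)
print(sol_f64)
```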
For completeness, and for people like me who stumbled upon this question because NumPy's linear solver is not exact enough (it only seems to handle 64-bit floats), there is also SymPy.
The API is somewhat similar to numpy, but needs a few tweaks every now and then.
In [104]: A = Matrix([
[17928014155669123106522437234449354737723367262236489360399559922258155650097260907649387867023242961198972825743674594974017771680414642705007756271459833, 13639120912900071306285490050678803027789658562874829601953000463099941578381997997439951418291413106684405816668933580642992992427754602071359317117391198, 2921704428390104906296926198429197524950528698980675801502622843572749765891484935643316840553487900050392953088680445022408396921815210925936936841894852],
[14748352608418286197003564525494635018601621545162877692512866070970871867506554630832144013042243382377181384934564249544095078709598306314885920519882886, 2008780320611667023380867301336185953729900109553256489538663036530355388609791926150229595099056264556936500639831205368501493630132784265435798020329993, 6522019637107271075642013448499575736343559556365957230686263307525076970365959710482607736811185215265865108154015798779344536283754814484220624537361159],
[ 5150176345214410132408539250659057272148578629292610140888144535962281110335200803482349370429701981480772441369390017612518504366585966665444365683628345, 1682449005116141683379601801172780644784704357790687066410713584101285844416803438769144460036425678359908733332182367776587521824356306545308780262109501, 16960598957857158004200152340723768697140517883876375860074812414430009210110270596775612236591317858945274366804448872120458103728483749408926203642159476]])
In [105]: B = Matrix([
.....: [13229751631544149067279482127723938103350882358472000559554652108051830355519740001369711685002280481793927699976894894174915494730969365689796995942384549941729746359],
.....: [ 6297029075285965452192058994038386805630769499325569222070251145048742874865001443012028449109256920653330699534131011838924934761256065730590598587324702855905568792],
.....: [ 2716399059127712137195258185543449422406957647413815998750448343033195453621025416723402896107144066438026581899370740803775830118300462801794954824323092548703851334]])
In [106]: A.solve(B)
Out[106]:
Matrix([
[358183301733],
[498758543457],
[ 1919512167]])
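A smaller, reproducible sketch of the same idea, using a Hilbert matrix (my choice of test case, not from the answer) whose exact solution is known to be all ones. It uses Matrix.LUsolve, one of SymPy's exact solvers, rather than the solve method shown above, and solves the same system in float64 with NumPy for comparison.

```python
import numpy as np
from sympy import Matrix, Rational

n = 10

# Hilbert matrix with exact rational entries, a classic ill-conditioned system.
H = Matrix(n, n, lambda i, j: Rational(1, i + j + 1))
b = H * Matrix([1] * n)      # right-hand side whose exact solution is all ones

# LUsolve works in exact rational arithmetic, so no rounding error is introduced.
x_exact = H.LUsolve(b)
print(x_exact.T)

# The same system in float64 with NumPy, for comparison.
H_float = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_float = np.linalg.solve(H_float, H_float @ np.ones(n))
print(np.max(np.abs(x_float - 1.0)))   # the float64 error, typically far above 1e-16
```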
Simple rounding of a floating-point NumPy array does not seem to work for some reason.
I get a NumPy array from reading a huge image (shape (7352, 7472)). Example values:
>>> imarray[3500:3503, 5000:5003]
array([[ 73.33999634, 73.40000153, 73.45999908],
[ 73.30999756, 73.37999725, 73.43000031],
[ 73.30000305, 73.36000061, 73.41000366]], dtype=float32)
For rounding, I have been trying numpy.around() on the raw values, and also writing the values to a new array (a copy of the raw array), but for some reason it has no effect:
arr=imarray
numpy.around(imarray, decimals=3, out=arr)
arr[3500,5000] #results in 73.3399963379, as well as accessing imarray
So the result shows even more digits!
Is that because the array is so big?
I need to round the values to find the most frequent value (the mode), and I would like to avoid pulling in more and more libraries.
Your array has dtype float32. That is a 4-byte float.
The closest float to 73.340 representable using float32 is roughly 73.33999634:
In [62]: x = np.array([73.33999634, 73.340], dtype = np.float32)
In [63]: x
Out[63]: array([ 73.33999634, 73.33999634], dtype=float32)
So I think np.around is rounding correctly; it is just that your dtype is too coarse-grained to represent the number you might be expecting.
In [60]: y = np.around(x, decimals = 3)
In [61]: y
Out[61]: array([ 73.33999634, 73.33999634], dtype=float32)
Whereas, if the dtype were np.float64:
In [64]: x = np.array([73.33999634, 73.340], dtype = np.float64)
In [65]: y = np.around(x, decimals = 3)
In [66]: y
Out[66]: array([ 73.34, 73.34])
Note that even though the printed representation of y shows 73.34, the real number 73.34 is not exactly representable as a float64 either, since it has no finite binary expansion. The float64 value is just so close to 73.34 that NumPy prints it as 73.34.
The answer by unutbu is absolutely correct: NumPy rounds the value as close to the requested number as float32 precision allows. The only thing I have to add is that you can use numpy.set_printoptions to change how the array is displayed:
>>> import numpy as np
>>> x = np.array([73.33999634, 73.340], dtype = np.float32)
>>> y = np.round(x, decimals = 3)
>>> y
array([ 73.33999634, 73.33999634], dtype=float32)
>>> np.set_printoptions(precision=3)
>>> y
array([ 73.34, 73.34], dtype=float32)
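Since the goal in the question is the mode, it may help to note that rounding still works for that purpose even though the stored float32 values are not exact decimals: every value that rounds to the same 3-decimal number maps to the same float32. A minimal sketch with np.unique (the sample array is hypothetical, partly built from the values printed in the question):

```python
from decimal import Decimal

import numpy as np

# Hypothetical sample; several entries are taken from the question's output.
arr = np.array([[73.33999634, 73.40000153, 73.45999908],
                [73.34,       73.37999725, 73.43000031],
                [73.30000305, 73.34001,    73.41000366]], dtype=np.float32)

# Round to 3 decimals; the result is still float32.
rounded = np.around(arr, decimals=3)

# Count each distinct rounded value and pick the most frequent one.
values, counts = np.unique(rounded, return_counts=True)
mode = values[np.argmax(counts)]

print(mode)                  # 73.34
print(Decimal(float(mode)))  # 73.339996337890625, the exact stored float32 value
```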