I have a 4x4 matrix and a 4x1 vector. If I calculate the dot product by hand (in Excel) I get different values from the numpy result. I expect it has to do with float values, but a difference of 6.7E-3, for example, seems too large to be just float error? What am I missing?
Isolated code result (see below):
[-3.24218399e-06 1.73591630e-04 -3.49611749e+04 1.90697291e+05]
With hand calculation (Excel):
[-1.04791731E-11 7.08581638E-10 -3.49611670E+04 1.90697275E+05]
The input values for the matrix are pulled from code, where I do the same evaluation. There, the result is:
[-2.09037901e-04 6.77221033e-03 -3.49612277e+04 1.90697438e+05]
Isolated input values:
import numpy as np
arrX1 = np.array([
[-2.18181817e+01, 1.78512395e+03,-5.84222383e+04, 7.43555757e+05],
[ 8.92561977e+02,-6.81592780e+04, 2.07133390e+06,-2.43345520e+07],
[-9.73703971e+03, 6.90444632e+05,-1.96993992e+07, 2.21223199e+08],
[ 3.09814899e+04,-2.02787933e+06, 5.53057997e+07,-6.03335995e+08]],
dtype=np.float64)
arrX2 = np.array([0,-1.97479339E+00,-1.20681818E-01,-4.74107143E-03],dtype=np.float64)
print (np.dot(arrX1, arrX2))
#-> [-3.24218399e-06 1.73591630e-04 -3.49611749e+04 1.90697291e+05]
At a guess this is because you're pulling your values out of Excel with too little precision. The values in your question only have 9 significant figures, while the 64-bit floats used in Excel, and that you're requesting in Numpy, are good to about 15 digits.
Redoing the calculation with Python's arbitrary-precision Decimals gives me something very close to Numpy's answer:
from decimal import Decimal as D, getcontext
# first row of arrX1 and the matching entries of arrX2
# (the leading term is dropped because arrX2[0] is 0)
x = [1.78512395e+03, -5.84222383e+04, 7.43555757e+05]
y = [-1.97479339E+00, -1.20681818E-01, -4.74107143E-03]
# use plenty of working precision
getcontext().prec = 50
sum(D(xi) * D(yi) for xi, yi in zip(x, y))
This gets within ~4e-13 of the value from Numpy, which seems reasonable given the scale of the values involved.
np.allclose can be good for checking whether things are relatively close, but it has relatively loose default bounds. If I redo the spreadsheet calculation with the numbers you gave, then allclose says everything is consistent.
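For example, with the two result vectors from the question, the default tolerances reject the tiny first two components, while an absolute tolerance matched to the ~1e5 scale of the data accepts everything (a quick illustration, values copied from above):
import numpy as np

numpy_result = np.array([-3.24218399e-06,  1.73591630e-04, -3.49611749e+04,  1.90697291e+05])
excel_result = np.array([-1.04791731e-11,  7.08581638e-10, -3.49611670e+04,  1.90697275e+05])

print(np.allclose(numpy_result, excel_result))             # False (defaults: rtol=1e-5, atol=1e-8)
print(np.allclose(numpy_result, excel_result, atol=1e-2))  # True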
Imagine having some vectors (could be a torch tensor or a numpy array) with a huge number of components, each one very small (~1e-10).
Let's say we want to calculate the norm of one of these vectors (or the dot product between two of them). Even using a float64 data type, each component is only ~1e-10, while the product of two components (during the norm/dot-product computation) can easily reach ~1e-20, causing a lot of rounding errors that, summed together, return a wrong result.
Is there a way to deal with this situation? (For example, is there a way to define arbitrary-precision arrays for these operations, or some built-in operator that takes care of that automatically?)
You are dealing with two different issues here:
Underflow / Overflow
Calculating the norm of very small values may underflow to zero when you calculate the square. Large values may overflow to infinity. This can be solved by using a stable norm algorithm.
A simple way to deal with this is to scale the values temporarily. See for example this:
a = np.array((1e-30, 2e-30), dtype='f4')
np.linalg.norm(a) # result is 0 due to underflow in single precision
scale = 1. / np.max(np.abs(a))
np.linalg.norm(a * scale) / scale # result is 2.236e-30
This is now a two-pass algorithm because you have to iterate over all your data before determining a scaling value. If this is not to your liking, there are single-pass algorithms, though you probably don't want to implement them in Python. The classic would be Blue's algorithm:
http://degiorgi.math.hr/~singer/aaa_sem/Float_Norm/p15-blue.pdf
A simpler but much less efficient way is to simply chain calls to hypot (which uses a stable algorithm). You should never do this, but just for completeness:
import math

norm = 0.
for value in a:
    norm = math.hypot(norm, value)
Or even a hierarchical version like this to reduce the number of numpy calls:
norm = a
while len(norm) > 1:
    hlen = len(norm) >> 1
    front, back = norm[:hlen], norm[hlen: 2 * hlen]
    tail = norm[2 * hlen:]  # only present when the length is not even
    norm = np.append(np.hypot(front, back), tail)
norm = norm[0]
You are free to combine these strategies. For example if you don't have your data available all at once but blockwise (e.g. because the data set is too large and you read it from disk), you can pick a scaling value per block, then chain the blocks together with a few calls to hypot.
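A minimal sketch of that blockwise idea (the function name and structure are just for illustration): each block gets its own scale, and the partial norms are combined with hypot.
import numpy as np

def blockwise_norm(blocks):
    """Stable l2 norm of data that arrives in chunks (an illustrative sketch)."""
    total = 0.0
    for block in blocks:
        m = np.max(np.abs(block))
        if m == 0:
            continue                                # an all-zero block contributes nothing
        partial = m * np.linalg.norm(block / m)     # scaled, stable norm of this block
        total = np.hypot(total, partial)            # sqrt(total**2 + partial**2), computed stably
    return total

print(blockwise_norm([np.array((1e-30, 2e-30)), np.array((2e-30,))]))  # ~3e-30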
Rounding errors
You accumulate rounding errors, especially when accumulating values of different magnitude. If you accumulate values of different signs, you may also experience catastrophic cancellation. To avoid these issues, you need to use a compensated summation scheme. Python provides a very good one with math.fsum.
So if you absolutely need highest accuracy, go with something like this:
math.sqrt(math.fsum(np.square(a * scale))) / scale
Note that this is overkill for a simple norm since there are no sign changes in the accumulation (so no cancelation) and the squaring increases all differences in magnitude so that the result will always be dominated by its largest components, unless you are dealing with a truly horrifying dataset. That numpy does not provide built-in solutions for these issues tells you that the naive algorithm is actually good enough for most real-world applications. No reason to go overboard with the implementation before you actually run into trouble.
Application to dot products
I've focused on the l2 norm because that is the case that is more generally understood to be hazardous. Of course you can apply similar strategies to a dot product.
np.dot(a, b)
ascale = 1. / np.max(np.abs(a))
bscale = 1. / np.max(np.abs(b))
np.dot(a * ascale, b * bscale) / (ascale * bscale)
This is particularly useful if you use mixed precision. For example, the dot product could be calculated in single precision, but the final division by (ascale * bscale) could take place in double or even extended precision.
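For instance, here is a sketch (with made-up values) where the individual products would underflow in single precision, the scaled dot product is computed in float32, and only the final rescaling is done in float64:
import numpy as np

a = np.array([3e-30, 4e-30], dtype=np.float32)
b = np.array([2e-30, 1e-30], dtype=np.float32)

ascale = np.float32(1.) / np.max(np.abs(a))
bscale = np.float32(1.) / np.max(np.abs(b))

d = np.dot(a * ascale, b * bscale)                                   # accumulate in float32
result = np.float64(d) / (np.float64(ascale) * np.float64(bscale))   # rescale in float64; ascale*bscale (~1.25e59) would overflow float32
print(result)   # ~1e-59, whereas np.dot(a, b) in float32 underflows to 0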
And of course math.fsum is still available: dot = math.fsum(a * b)
Bonus thoughts
The scaling itself introduces some rounding errors, because nothing guarantees that the scaled values are exactly representable in floating point. However, you can avoid this by picking a scaling factor that is an exact power of 2. Multiplying by a power of 2 is always exact in FP (assuming you stay in the representable range). You can get the exponent with math.frexp.
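A minimal sketch of that trick (names are illustrative): frexp gives the binary exponent of the largest magnitude, and ldexp builds the exact power-of-two scale.
import math
import numpy as np

a = np.array([3e-30, 4e-30])

_, exp = math.frexp(np.max(np.abs(a)))    # max|a| == mantissa * 2**exp, mantissa in [0.5, 1)
scale = math.ldexp(1.0, -exp)             # scale = 2**-exp, an exact power of two
print(np.linalg.norm(a * scale) / scale)  # ~5e-30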
I'm trying to compute the sum of an array of random numbers. But there seems to be an inconsistency between the results when I do it one element at a time and when I use the built-in function. Furthermore, the error seems to increase when I decrease the data precision.
import torch
columns = 43*22
rows = 44
torch.manual_seed(0)
array = torch.rand([rows,columns], dtype = torch.float64)
array_sum = 0
for i in range(rows):
    for j in range(columns):
        array_sum += array[i, j]
torch.abs(array_sum - array.sum())
results in:
tensor(3.6380e-10, dtype=torch.float64)
using dtype = torch.float32 results in:
tensor(0.1426)
using dtype = torch.float16 results in (a whopping!):
tensor(18784., dtype=torch.float16)
I find it hard to believe no one has ever asked about it. Yet, I haven't found a similar question in SO.
Can anyone please help me find some explanation or the source of this error?
The first fix concerns the summation line: you should change it to
array_sum += float(array[i, j])
For float64 this causes no problems, but for the other dtypes it does; the explanation follows.
To start with: when doing floating-point arithmetic, you should always keep in mind that there are small errors due to rounding. The simplest way to see this is in a Python shell:
>>> .1+.1+.1-.3
5.551115123125783e-17
But how do you take these errors into account?
When summing n positive numbers to a total tot, the analysis is fairly simple and the rule is:
error(tot) < tot * n * machine_epsilon
Where the factor n is usually a gross over-estimation and the machine_epsilon depends on the type (representation size) of the floating-point number.
It is approximately:
float64: 2*10^-16
float32: 1*10^-7
float16: 1*10^-3
And one would generally expect an error within a reasonable factor of tot * machine_epsilon.
For my tests (always ±40000 values summing to a total of ±20000) we get:
error(float64) = 3*10^-10 ≈ 80* 20000 * 2*10^-16
error(float32) = 1*10^-1 ≈ 50* 20000 * 1*10^-7
which is acceptable.
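A quick way to sanity-check this rule is to sum synthetic data of roughly the same size as in the question and compare the observed float32 error against the bound (this is only an illustration with made-up data, not the OP's tensor):
import numpy as np

rng = np.random.default_rng(0)
vals = rng.random(40000)               # ~40000 positive values, total ~20000

exact = vals.sum()                     # float64 reference
naive32 = np.float32(0)
for v in vals.astype(np.float32):
    naive32 += v                       # accumulate in float32

eps32 = np.finfo(np.float32).eps       # ~1.2e-7
print(abs(float(naive32) - exact))     # observed error, typically a few tenths
print(exact * vals.size * eps32)       # worst-case bound tot * n * eps, ~95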
Then there is another problem with float16. Its machine epsilon is roughly 1e-3, and you can see the problem with:
>>> ar = torch.tensor([2048.], dtype=torch.float16)
>>> ar
tensor([2048.], dtype=torch.float16)
>>> ar[0] += .5
>>> ar
tensor([2048.], dtype=torch.float16)
Here the problem is that once the running sum reaches 2048, it is no longer precise enough to add a value of 1 or less. More specifically: with a float16 you can 'represent' the value 2048, and you can represent the value 2050, but nothing in between, because the type has too few bits for that precision. By keeping the sum in a float64 variable, you overcome this problem. Fixing this we get for float16:
error(float16) = 16 ≈ 0.8* 20000 * 1*10^-3
Which is large, but acceptable as a value relative to 20000 represented in float16.
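For reference, the machine epsilon of each dtype can be read straight from torch.finfo, and the behaviour around 2048 is easy to reproduce; a minimal sketch:
import torch

# relative spacing at 1.0 for each dtype
print(torch.finfo(torch.float64).eps)   # ~2.2e-16
print(torch.finfo(torch.float32).eps)   # ~1.2e-07
print(torch.finfo(torch.float16).eps)   # ~9.8e-04

# at 2048 the absolute spacing of float16 is 2048 * eps = 2,
# so adding anything smaller than 1 is rounded away
x = torch.tensor(2048., dtype=torch.float16)
print(x + 0.5)   # tensor(2048., dtype=torch.float16)
print(x + 2.0)   # tensor(2050., dtype=torch.float16)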
If you ask yourself which of the two methods is 'right', the answer is neither of them: both are approximations with the same precision, but a different error.
But as you probably guessed, using the built-in sum() method is faster, better and more reliable.
You can use float(array[i][j]) in place of array[i][j] in order to ensure ~0 difference between the loop-based sum and torch.sum(). The ~0 is easy to observe when the number of elements is taken into account, as shown in the two plots below.
The heatmaps below show the error per element = (absolute difference between torch.sum() and loop-based sum), divided by the number of elements. The heatmap value when using an array of r rows and c columns is computed as:
heatmap[r, c] = torch.abs(array_sum - array.sum())/ (r*c)
We vary the size of the array in order to observe how it affects the errors per element. Now, in the case of OP's code, the heatmaps show accumulating error with increasing size of matrix. However, when we use float(array[i,j]), the error is not dependent on the size of the array.
Top Image: when using array_sum += float(array[i][j])
Bottom Image: when using array_sum += (array[i][j])
The script used to generate these plots is reproduced below if someone wants to fiddle around with these.
import torch
import numpy as np
column_list = range(1,200,10)
row_list = range(1,200,10)
heatmap = np.zeros(shape=(len(row_list), len(column_list)))
for count_r, rows in enumerate(row_list):
    for count_c, columns in enumerate(column_list):
        ### OP's snippet start
        torch.manual_seed(0)
        array = torch.rand([rows, columns], dtype=torch.float16)
        array_sum = np.float16(0)
        for i in range(rows):
            for j in range(columns):
                array_sum += (array[i, j])
        ### OP's snippet end
        heatmap[count_r, count_c] = torch.abs(array_sum - array.sum()) / (rows * columns)
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
X = row_list
Y = column_list
Z = heatmap
df = pd.DataFrame(data=Z, columns=X, index=Y)
sns.heatmap(df, square=False)
plt.xlabel('number of rows')
plt.ylabel('number of columns')
plt.tight_layout()
plt.savefig('image_1.png', dpi=300)
plt.show()
You have hit the top of a rather big iceberg with regard to storing high-precision values in a computer.
There are two concerns here: first, Python always stores its scalars as double-precision floats, so you have casting between two different data types, which leads to some of the odd behavior; second, how floating-point numbers work in general (you can read more here).
In general, when you store a number in a float you are "guaranteed" some number of significant figures, say 10; any digits beyond those 10 places carry some error due to the precision they were stored at (often denoted ε). This means that if you sum two numbers that differ by 10 orders of magnitude, then ε will be significant in your answer, or (far more likely in this case) you will drop some of the values you care about because the total is much larger than one of the numbers you are adding. Below are some examples of this in numpy:
import numpy as np
val_v_small = np.float64(0.0000000000001)
val_small = np.float64(1.000000001)
val_big = np.float64(10000000)
print(val_big + val_small)    # here we pick up an extra .000000001 from the ε of val_big
>>> 10000001.000000002
print(val_big + val_v_small)  # here val_v_small is dropped entirely; it was truncated off val_big
>>> 10000000.0
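You can query that gap directly with np.spacing; near 1e7 the distance between adjacent float64 values is about 1.9e-9, which is why the 1e-13 contribution vanishes:
import numpy as np
print(np.spacing(10000000.0))   # ~1.86e-09, the gap to the next representable float64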
I have a 1D array data which I am trying to model as hyperbola using three parameters. I am trying to implement the Levenberg Marquardt algorithm using the leastsq function from scipy.optimize library. However, my program is getting stuck at an iteration where a number is getting divided by a zero, and I don't understand why.
Some background: The 1D array data are basically lacunarity values for different box sizes. I've generated the lacunarity data from some sound files, the context to which can be found here.
In the algorithm, the least squares function takes three inputs:
(a) initial guess for the three parameters
(b) the x coordinate for the least squares problem - that's basically a 1D array of integers from 1 to 100 in my problem
(c) the y coordinate for the least squares problem - this is the 1D array that stores the lacunarity values. So Lacunarity values are a function of x, where x varies from 1 to 100.
The hyperbola is modeled using three parameters a, b and c as y = a / x^b + c (see the pval function below).
The code gives the following error:
"OverflowError: cannot convert float infinity to integer"
The code:
#import
from scipy import *
from scipy.optimize import leastsq
import matplotlib.pylab as plt
import numpy as np
import codecs, json
from math import *
# Define your function to calculate the residuals.
#The fitting function holds your parameter values.
def residuals(p, y, x):
    err = y - pval(x, p)
    return err

def pval(x, p):
    z = x
    for i in range(100):
        print(x)
        print(x[i]**p[1])
        z[i] = p[0]/(x[i]**p[1]) + p[2]
    return z
#read in your data
obj_text = codecs.open('textfiles\CC1.json', 'r', encoding='utf-8').read()
b_new = json.loads(obj_text)
data = np.array(b_new)
x = np.arange(1,101)
y = data[1:101]
#guess at initial parameters
A1_0=1.0
A2_0=1.0
A3_0=0.5
#leastsq package calls the Levenberg-Marquardt algorithm
pname = (['A1','A2','A3'])
p0 = array([A1_0 , A2_0, A3_0])
plsq = leastsq(residuals, p0, args=(y, x), maxfev=2000)
# Now, plot your data
plt.plot(x,y,'xo',x,pval(x,plsq[0]),'x')
title('Least-squares fit to data')
xlabel('x')
ylabel('y')
legend(['Data', 'Fit'],loc=4)
# Your best-fit paramters are kept within plsq[0].
print(plsq[0])
According to the error, the value of x changes to 0 at some point in the iteration, and the first parameter a ends up getting divided by zero which gives the error.
To troubleshoot, I printed the values x[i]^b and the array x while executing the code, and you can see the values here. I see that the array x is getting modified, which shouldn't happen. x should remain a 1D array of natural numbers from 1 to 100 and not get modified during the iteration. I couldn't identify where exactly the code is modifying the array x.
I expect the array x to remain unchanged and the code to print the final three values of the parameters a,b and c.
EDIT: I made some changes to my code after which it worked successfully. Following are those edits in case anyone is interested:
Did not define z as z = x, but rather just defined it as z = np.arange(1,101). The result was that the array x did not change anymore, which is what was expected (z = x had made z a second name for the same array, so assigning to z[i] also modified x; see the small sketch after these edits).
Changed the datatype of arrays x and y to float using
x = np.array(x, dtype=np.float64)
I got stuck once more, at the piece of code which plots the data. I got the error 'title' is not defined, with similar errors for xlabel and ylabel. So I just removed those lines and stuck with
plt.plot(x,y,'red',x,pval(x,plsq[0]),'blue')
plt.show()
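As noted in the first edit: in NumPy, z = x does not copy the data, it just gives the same array a second name, so writing to z[i] inside pval also modified x. A minimal illustration, independent of the fitting code:
import numpy as np

x = np.arange(1, 6)
z = x            # z is another name for the same array, not a copy
z[0] = 99
print(x)         # [99  2  3  4  5] -- x changed as well
z = x.copy()     # an independent copy; writing to z no longer touches x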
Not a direct answer to your question, but since you're using exponentiation (**), I strongly recommend that you convert all your numbers to Decimal beforehand, in order to avoid the precision-loss inherent in floating-point arithmetic on large values.
For example:
from decimal import Decimal, getcontext
getcontext().prec = 100

A1_0 = Decimal("1.0")
A2_0 = Decimal("1.0")
A3_0 = Decimal("0.5")
x = [Decimal(f) for f in x]
y = [Decimal(f) for f in y]
Perhaps your zero will "turn up" to be a small value close to zero...
I'm using Python 3 and trying to plot the half-life time of a process. The formula for this half life time is -ln(2)/(ln(1-f)). In this formula, f is an extremely small number, of the order 10^-17 most of the time, and even less.
Because I have to plot a range of values of f, I have to repeat the calculation -ln(2)/(ln(1-f)) multiple times. I do this via the expression
np.log(2)/(-1*np.log(1-f))
When I plot the half life time for many values of f, I find that for really small values of f, Python starts rounding 1-f to the same number, even though I input the same values of f.
Is there any way I could increase the float precision so that Python could distinguish between the outputs of 1-f for small changes in f?
The result you want can be achieved using numpy.log1p. It computes log(1 + x) with a better numerical precision than numpy.log(1 + x), or, as the docs say:
For real-valued input, log1p is accurate also for x so small that
1 + x == 1 in floating-point accuracy.
With this your code becomes:
import numpy as np
min_f, max_f = -32, -15
f = np.logspace(min_f, max_f, max_f - min_f + 1)
y = np.log(2)/(-1*np.log1p(-f))
This can be evaluated consistently:
import matplotlib.pyplot as plt
plt.loglog(f, y)
plt.show()
This function will only stop working if your values of f leave the range of floats, i.e. down to 1e-308. This should be sufficient for any physical measurement (especially considering that there is such a thing as a smallest physical time-scale, the Planck-time t_P = 5.39116(13)e-44 s).
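To see the underlying problem in isolation, here is a minimal sketch for a single small f (the numbers are illustrative):
import numpy as np

f = 1e-17
print(1.0 - f)                    # 1.0 -- f is below the resolution of float64 near 1
print(np.log(1.0 - f))            # 0.0, so the original formula divides by zero
print(np.log1p(-f))               # -1e-17, the correct value of log(1 - f)
print(np.log(2) / -np.log1p(-f))  # ~6.93e16, the half-life time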
I'm using numpy and matplotlib to analyze data output from my simulations. There is one (apparent) inconsistency that I can't find the roots of. It's the following:
I have a signal that has a given energy a^2~1. When I use rfft to take the FFT and compute the energy in Fourier space, it comes out to be significantly larger. To avoid giving the details of my data etc., here is an example with a simple sine wave:
from pylab import *
xx=np.linspace(0.,2*pi,128)
a=np.zeros(128)
for i in range(0, 128):
    a[i] = sin(xx[i])
aft = rfft(a)
print(mean(abs(aft)**2), mean(a**2))
In principle both the numbers should be the same (at least in the numerical sense) but this is what I get out of this code:
62.523081632 0.49609375
I tried to go through numpy.fft documentation but could not find anything. A search here gave the following but I was not able to understand the explanations there:
Big FFT amplitude difference between the existing (synthesized) signal and the filtered signal
What am I missing/ misunderstanding? Any help/ pointer in this regard would be greatly appreciated.
Thanks!
Henry is right on the non-normalization part, but there is a little more to it, because you are using rfft, not fft. The following is consistent with his answer:
>>> x = np.linspace(0, 2 * np.pi, 128)
>>> y = 1 - np.sin(x)
>>> fft = np.fft.fft(y)
>>> np.mean((fft * fft.conj()).real)
191.49999999999991
>>> np.mean(y**2)
1.4960937500000004
>>> fft = fft / np.sqrt(len(fft))
>>> np.mean((fft * fft.conj()).real)
1.4960937499999991
But if you now try the same with rfft, things don't quite work out:
>>> rfft = np.fft.rfft(y)
>>> np.mean((rfft * rfft.conj()).real)
314.58462009358772
>>> rfft /= np.sqrt(len(rfft))
>>> np.mean((rfft * rfft.conj()).real)
4.8397633860551954
>>> len(rfft)
65
The following does work properly, though:
>>> rfft = np.fft.rfft(y) / np.sqrt(len(y))
>>> (rfft[0] * rfft[0].conj() +
...  2 * np.sum(rfft[1:] * rfft[1:].conj())).real / len(y)
1.4960937873636722
When you use rfft, what you get is not the full DFT of your data, but only the positive-frequency half of it, since the negative half is symmetric to it. To compute the mean over the full spectrum, you need to count every value other than the DC component twice, which is what the last snippet does.
In most FFT libraries, the various DFT flavours are not orthogonal. The numpy.fft library applies the necessary normalizations only during the inverse transform.
Consider the Wikipedia description of the DFT; the inverse DFT has the 1/N term that the DFT does not have (in which N is the length of the transform). To make an orthogonal version of the DFT, you need to scale the result of the un-normalised DFT by 1/sqrt(N). In this case, the transform is orthogonal (that is, if we define the orthogonal DFT as F, then the inverse DFT is the conjugate, or hermitian, transpose of F).
In your case, you can get the correct answer by simply scaling aft by 1.0/sqrt(len(a)) (note that N is found from the length of the transform; the real FFT just throws about half the values away, so it's the length of a that is important).
I suspect that the reason for leaving the normalization until the end is that in most situations, it doesn't matter and you therefore save the computational cost of doing the normalization twice. Indeed, the very quick FFTW library doesn't do any normalization in either direction, and leaves it entirely up to the user to deal with.
Edit: Just to be clear, the explanation above is not quite correct. The correct answer will not be arrived at with that simple scaling, as in your case the DC component will be added in twice, although 1.0/sqrt(len(a)) is still the correct scaling to produce the unitary transform.
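As a side note, recent NumPy versions can apply the unitary 1/sqrt(N) scaling for you via the norm keyword (assuming a NumPy version that supports it, >= 1.10); a minimal check with the full fft, which avoids the rfft double-counting issue entirely:
import numpy as np

x = np.linspace(0., 2 * np.pi, 128)
a = np.sin(x)

aft = np.fft.fft(a, norm="ortho")              # orthonormal DFT
print(np.mean(np.abs(aft)**2), np.mean(a**2))  # both ~0.496 -- Parseval holds directly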