numpy, same calculation different result

I want to calculate this formula.
I think the result should be A.
So I wrote Python code using numpy.
But depending on the order of computation, the result is not A.
What causes this?
import numpy as np
from numpy import *
from numpy.random import *
import decimal
#generate matrix A
A = randn(180,240)
A = np.array(A, dtype = decimal.Decimal )
#generate matrix P
h,w=A.shape
P = randn(0.9*h,h)
P = np.array(P, dtype = decimal.Decimal )
#it's OK. IA = A
PP = dot(P.T,P)
PPinv = np.linalg.inv(PP)
PPinvPP = dot(PPinv,PP)
PPinvPPinv = np.linalg.inv(PPinvPP)
I = dot(PPinvPPinv,PPinvPP)
IA = dot(I, A)
#I think IA2 must be A. but not A. Why?
PA = dot(P,A)
PPA = dot(P.T,PA)
PPinvPPA = dot(PPinv,PPA)
IA2 = dot(PPinvPPinv, PPinvPPA)
#print result
print "A;%0.2f" % A[0,0]
print "IA:%0.2f" % IA[0,0]
print "IA2:%0.2f" % IA2[0,0]

What happens here is quite interesting:
In general, your formula is only correct if PP is non-singular.
So why is IA == A then?
PP = dot(P.T,P)
PPinv = np.linalg.inv(PP)
PPinvPP = dot(PPinv,PP)
PPinvPPinv = np.linalg.inv(PPinvPP)
I = dot(PPinvPPinv,PPinvPP)
IA = dot(I, A)
There are a couple of things to note here:
PP = dot(P.T,P) is singular
=> PPinv is not a true inverse
but PPinvPP is invertible, so I is indeed the identity matrix
Note: You only get IA == A because of the particular order in which you evaluate the terms.
The second calculation, IA2, does not use this evaluation order, so it does not give you A as the result.

The main reason is that P is a non-square matrix whose height is less than its width, so PP = dot(P.T,P) is rank-deficient and its determinant is exactly zero in theory; because of rounding error, the computed determinant is tiny but != 0. After this it is impossible to calculate a real PPinv, and all subsequent steps are meaningless.
P = randn(2,3)
P = np.array(P, dtype = decimal.Decimal )
PP = dot(P.T,P)
np.linalg.det(PP) #-5.2536080570332981e-34
So why is IA == A?
I think this is a case where error*error happens to give you a normal result.
How to solve it?
Do not use Python for theoretical questions :)
Change P = randn(0.9*h,h) to P = randn(int(1.1*h),h), so that P has more rows than columns.
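The rank argument above can be checked numerically. The following is a minimal sketch, not the asker's code: the small shapes (2x3 and 4x3), the seed, and the tolerance 1e-10 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide P (2 rows, 3 columns): P.T @ P is 3x3 but its rank cannot exceed 2.
P_wide = rng.standard_normal((2, 3))
PP_wide = P_wide.T @ P_wide
rank_wide = np.linalg.matrix_rank(PP_wide, tol=1e-10)
cond_wide = np.linalg.cond(PP_wide)  # enormous, so inv(PP_wide) is meaningless

# Tall P (4 rows, 3 columns): P.T @ P is 3x3 with full rank for generic entries.
P_tall = rng.standard_normal((4, 3))
PP_tall = P_tall.T @ P_tall
A = rng.standard_normal((3, 5))
recovered = np.linalg.inv(PP_tall) @ (PP_tall @ A)

print("rank of wide P.T @ P:", rank_wide)
print("condition number of wide P.T @ P:", cond_wide)
print("tall case recovers A:", np.allclose(recovered, A))
```

Checking the rank or the condition number before calling inv is usually more informative than looking at the determinant, whose magnitude depends on the scale of the matrix.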

Not a direct answer to your question, but you could use SymPy for this kind of problem:
from IPython.display import display
import sympy as sy
sy.init_printing() # For LaTeX-like pretty printing in IPython
n = 5
A = sy.MatrixSymbol('A', 162, 240) # dimension 162x240
P = sy.MatrixSymbol('P', 162, 180) # dimensions 162x180
PTP = P*P.T
ex1 = (PTP.inverse() * PTP).inverse() * PTP.inverse() * PTP * A
display(ex1) # displays: A

Related

Discrepancy between analytic solution and solution by relaxation method

So I am trying to solve the differential equation $\frac{d^2y}{dx^2} = -y(x)$ subject to the boundary conditions y(0) = 0 and y(1) = 1; the analytic solution is y(x) = sin(x)/sin(1).
I am using a three-point stencil to approximate the second derivative.
The curves obtained in these two ways should match at least at the boundaries, but my solutions show small differences even there.
I am attaching the code. Please tell me what is wrong.
import numpy as np
import scipy.linalg as lg
from scipy.sparse.linalg import eigs
from scipy.sparse.linalg import inv
from scipy import sparse
import matplotlib.pyplot as plt
a = 0
b = 1
N = 1000
h = (b-a)/N
r = np.arange(a,b+h,h)
y_a = 0
y_b = 1
def lap_three(r):
    h = r[1]-r[0]
    n = len(r)
    M_d = -2*np.ones(n)
    #M_d = M_d + B_d
    O_d = np.ones(n-1)
    mat = sparse.diags([M_d,O_d,O_d],offsets=(0,+1,-1))
    #print(mat)
    return mat
def f(r):
    h = r[1]-r[0]
    n = len(r)
    return -1*np.ones(len(r))*(h**2)
def R_mat(f,r):
    r_d = f(r)
    R_mat = sparse.diags([r_d],offsets=[0])
    #print(R_mat)
    return R_mat
#def R_mat(r):
#    M_d = -1*np.ones(len(r))
def make_mat(r):
    main = lap_three(r) - R_mat(f,r)
    return main
main = make_mat(r)
main_mat = main.toarray()
print(main_mat)
'''
eig_val , eig_vec = eigs(main, k = 20,which = 'SM')
#print(eig_val)
Val = eig_vec.T
plt.plot(r,Val[0])
'''
main_inv = inv(main)
inv_mat = main_inv.toarray()
#print(inv_mat)
#print(np.dot(main_mat,inv_mat))
n = len(r)
B_d = np.zeros(n)
B_d[0] = 0
B_d[-1] = 1
#print(B_d)
#from scipy.sparse.linalg import spsolve
A = np.abs(np.dot(inv_mat,B_d))
plt.plot(r[0:10],A[0:10],label='calculated solution')
real = np.sin(r)/np.sin(1)
plt.plot(r[0:10],real[0:10],label='analytic solution')
plt.legend()
#plt.plot(r,real)
#plt.plot(r,A)
'''diff = A-real
plt.plot(r,diff)'''

There is no guarantee of what the last point in arange(a,b+h,h) will be: it will mostly be b, but in some cases it could also be b+h. It is better to use
r,h = np.linspace(a,b,N+1,retstep=True)
The linear system consists of the equations for the interior positions r[1],...,r[N-1]. These are N-1 equations, so your matrix, which is built with len(r) rows, is too large.
You could keep the matrix construction shorter by including the h^2 term already in M_d.
If you use sparse matrices, you can also use the sparse solver A = spsolve(main, B_d).
The equations that make up the system are
A[k-1] + (-2+h^2)*A[k] + A[k+1] = 0
The vector on the right side thus needs to contain the values -A[0] and -A[N]. This should clear up the sign problem, so there is no need to cheat with the absolute value.
The solution vector A corresponds, as constructed from the start, to r[1:-1]. As it contains no values for positions 0 and N, there can also be no difference there.
PS: There is no relaxation involved here, foremost because this is not an iterative method. Perhaps you meant a finite difference method.
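Putting the points above together, a minimal sketch of the corrected finite difference solve could look as follows. It keeps a, b, N, y_a and y_b from the question; the names main_diag, off_diag, rhs and max_err are introduced here for illustration only.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

a, b, N = 0.0, 1.0, 1000
y_a, y_b = 0.0, 1.0

# linspace guarantees that the last grid point is exactly b.
r, h = np.linspace(a, b, N + 1, retstep=True)

# One equation per interior point r[1], ..., r[N-1]:
#   y[k-1] + (-2 + h^2) * y[k] + y[k+1] = 0
n = N - 1
main_diag = (-2.0 + h**2) * np.ones(n)
off_diag = np.ones(n - 1)
mat = sparse.diags([main_diag, off_diag, off_diag], offsets=(0, 1, -1), format="csc")

# The known boundary values move to the right-hand side with a minus sign.
rhs = np.zeros(n)
rhs[0] = -y_a
rhs[-1] = -y_b

# Assemble the full solution, including the boundary values themselves.
y = np.empty(N + 1)
y[0], y[-1] = y_a, y_b
y[1:-1] = spsolve(mat, rhs)

exact = np.sin(r) / np.sin(1.0)
max_err = np.max(np.abs(y - exact))
print("max abs error:", max_err)
```

Because the boundary values are written into y directly, the computed and analytic curves agree at r[0] and r[-1] by construction, and the remaining difference is the discretization error of the stencil.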

Issue specifying datatype for the Mandelbrot set

I have optimized my Mandelbrot set calculation a bit, and I now wish to be able to specify whether my arrays should be float64 or float32, instead of the easier implementation with type complex128 or complex64. I use the fact that for a complex number (a+jb)^2 = a^2-b^2 + (2ab)j, but this seems to give me a slightly different, incorrect Mandelbrot set. The code is shown below:
from timeit import default_timer as timer
import numpy as np
from numexpr import evaluate
import matplotlib.pyplot as plt
#%% Inputs
N = 5000
I = 20
T = 2 #Threshold
#%% Functions
def mandel_brot_vector(I,C,T,datatype):
    Cre = np.array(C.real,dtype=datatype)
    Cim = np.array(C.imag,dtype=datatype)
    M = np.zeros(Cre.shape,dtype=datatype)
    zreal=0
    zimag=0
    for i in range(I):
        M[zreal*zreal+zimag*zimag<T**2] = i/I
        zreal = evaluate("zreal*zreal-zimag*zimag+Cre") #complex multiplication rule
        zimag = evaluate("2*zreal*zimag+Cim") #complex multiplication rule
    N = len(M[0])
    M = np.reshape(np.array(M),(N//2,N)).astype(datatype)
    M = np.concatenate((M,M[::-1]),axis=0)
    return M
def create_C(N,split):
    C_re = np.linspace(np.full((1,N),-2)[0],np.full((1,N),1)[0],N).T
    C_im = np.linspace(np.full((1,N),1.5*1j)[0],np.full((1,N),-1.5*1j)[0],N)
    C = C_re+C_im
    C = C[:N//2,:]
    if split != 0:
        C_split = np.array_split(C,split)
    else:
        C_split = C
    return np.array(C_split)
C = create_C(N, 0)
t0_32 = timer()
M32 = mandel_brot_vector(I,C,T,np.float32)
t_32 = timer() - t0_32
t0_64 = timer()
M64 = mandel_brot_vector(I,C,T,np.float64)
t_64 = timer() - t0_64
plt.matshow(M64,cmap="hot")
print(" "*10,f"N={N}")
print(f"{'Float 32':<20}{t_32:<40}",
f"\n{'Float 64':<20}{t_64:<40}"
)
Currently the image I get is an incorrect Mandelbrot set. For reference, the following function produces the correct Mandelbrot set, but with complex128:
def mandel_brot(I,C,T):
    M = np.zeros(C.shape)
    z=0
    for i in range(I):
        M[np.abs(z)<T] = i/I
        z = evaluate("z*z+C")
    N = len(M[0])
    # M is already float64 here; `datatype` is not defined in this function
    M = np.reshape(np.array(M),(N//2,N))
    M = np.concatenate((M,M[::-1]),axis=0)
    return M
I hope someone can help solve this issue; thanks in advance. By the way, do not bother with the split of the C array: it is set up to run with multiprocessing, which I am not using in the attached code.
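One thing worth checking in mandel_brot_vector is the update order: the line that computes zimag uses zreal after it has already been overwritten, whereas the rule (a+jb)^2 = a^2-b^2 + (2ab)j needs both parts taken from the same iteration. The following is a minimal sketch of the split update with a temporary. It uses plain NumPy in place of numexpr's evaluate (a simplification swapped in here), a small 60x60 grid, and the hypothetical names mandel_split and mandel_complex.

```python
import numpy as np

def mandel_split(C, iterations, threshold, dtype):
    """Escape-time map using separate real and imaginary arrays of `dtype`."""
    cre = np.asarray(C.real, dtype=dtype)
    cim = np.asarray(C.imag, dtype=dtype)
    zre = np.zeros_like(cre)
    zim = np.zeros_like(cim)
    M = np.zeros(cre.shape, dtype=dtype)
    with np.errstate(over="ignore", invalid="ignore"):
        for i in range(iterations):
            M[zre * zre + zim * zim < threshold**2] = i / iterations
            # Both updates must use the values from the previous iteration.
            new_re = zre * zre - zim * zim + cre
            zim = 2 * zre * zim + cim
            zre = new_re
    return M

def mandel_complex(C, iterations, threshold):
    """Reference version that iterates directly on complex numbers."""
    M = np.zeros(C.shape)
    z = np.zeros_like(C)
    with np.errstate(over="ignore", invalid="ignore"):
        for i in range(iterations):
            M[np.abs(z) < threshold] = i / iterations
            z = z * z + C
    return M

re = np.linspace(-2.0, 1.0, 60)
im = np.linspace(1.5, -1.5, 60)
C = re[None, :] + 1j * im[:, None]

M64 = mandel_split(C, 20, 2, np.float64)
M32 = mandel_split(C, 20, 2, np.float32)
M_ref = mandel_complex(C, 20, 2)
agreement = np.mean(M64 == M_ref)
print("fraction of pixels equal to the complex reference:", agreement)
```

The float32 result can still differ from the reference in a few pixels near the boundary of the set, since rounding affects when a point crosses the threshold.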

Crank Nicolson Method on Wave Function Python

I am trying to propagate a Gaussian wave packet using the Crank-Nicolson method in imaginary time (multiplying the time step by the imaginary unit). The code that I have written in an attempt to achieve this is shown here:
import matplotlib.pyplot as plt #this allows you to plot, and changes the name to plt
import numpy as np #this allows you to do math, and changes the name to np
import math
import scipy.linalg as la
def V(x):
    # k = 1
    # v = k*x**4
    v = 0.25*(x-3)**2+0.15*(x-3)**4
    return v
def Psi(x):
    psi = np.exp(-2*(x-3)**2)
    return psi
#Function for computing integral using trapezoid method
def TrapInt(y, h):
    trap = [(float(y[ii]) + float(y[ii+1])) for ii in range(0, len(y)-1)]
    return float(h)/2*sum(trap)
N = 1000
L = 3;
h = 0.01
x = np.arange(0,6,h);
t = np.linspace(0,L,300);
t = 1j*t;
dt = t[1] - t[0]
dx = x[1] - x[0]
A = 1j*dt/(2*dx**2)
pot = V(x)
Q = np.zeros([len(x),len(x)],dtype = complex)
P = np.zeros([len(x),len(x)],dtype = complex)
wave = np.zeros([len(x),len(t)],dtype = complex)
wave[:,0] = Psi(x)
B = (1- 2*A - 1j*dt*pot)
for ii in range(0,len(x)-1):
    Q[ii][ii] = -(B[ii])
    P[ii][ii] = (B[ii])
    Q[ii][ii+1] = (2-A)
    P[ii][ii+1] = A
    if ii >= 1:
        Q[ii][ii-1] = -A
        P[ii][ii-1] = A
plt.plot(wave[:,0])
for ii in range(0,len(t)-1):
    one = np.matmul(P,wave[:,ii])
    wave[:,ii+1] = np.matmul(la.inv(Q),one)
I can't seem to find any mathematical errors in my implementation of the Crank-Nicolson method; however, whenever I try to run it, I get an error saying that Q is singular (has no inverse). I'm not sure why this is occurring. Any help is appreciated. Thanks

You never assign to Q[-1]. Zero rows have been known to produce singular matrices in some cases.
Also, don’t repeatedly invert the matrix. Probably don’t invert it at all, but rather store some decomposition of it to allow efficient calculation of Q^{-1}x.
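Both points can be illustrated with a small sketch. The matrices below are hypothetical tridiagonal stand-ins, not the asker's Crank-Nicolson matrices; the sketch only shows that a zero last row costs one rank, and how scipy.linalg.lu_factor and lu_solve let one factorization serve every time step.

```python
import numpy as np
import scipy.linalg as la

n = 200
off = np.ones(n - 1)

# Hypothetical stand-ins for Q and P; every row, including the last, is filled.
Q = (np.diag(np.full(n, 4.0)) - np.diag(off, 1) - np.diag(off, -1)).astype(complex)
P = (np.diag(np.full(n, 2.0)) + np.diag(off, 1) + np.diag(off, -1)).astype(complex)

# Leaving the last row at zero, as a loop over range(len(x)-1) does, loses rank.
Q_missing_row = Q.copy()
Q_missing_row[-1, :] = 0
rank_missing = np.linalg.matrix_rank(Q_missing_row)

# Factor Q once, then reuse the factorization for every time step.
lu_piv = la.lu_factor(Q)
x = np.linspace(0, 6, n)
psi0 = np.exp(-2 * (x - 3) ** 2).astype(complex)

first_step = la.lu_solve(lu_piv, P @ psi0)
psi = first_step
for _ in range(9):
    psi = la.lu_solve(lu_piv, P @ psi)

print("rank with zero last row:", rank_missing, "of", n)
```

For a genuinely tridiagonal Q, a banded solver such as scipy.linalg.solve_banded would avoid storing the dense matrix at all.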

MATLAB fftfilt equivalent for Python

I am trying to translate the following function, written in MATLAB, into Python:
function output_phase = fix_phasedata180(phase_data_degrees, averaging_length)
x = exp(sqrt(-1)*phase_data_degrees*2/180*pi);
N = averaging_length;
b = 1/sqrt(N)*ones(1,N);
y = fftfilt(b,x);y = fftfilt(b,y(end:-1:1));y = y(end:-1:1); % This is a quick implementation of filtfilt using fftfilt instead of filter
output_phase = (phase_data_degrees-(round(mod(phase_data_degrees/180*pi-unwrap(angle(y))/2,2*pi)*180/pi/180)*180));
temp = mod(output_phase(1),90);
output_phase = output_phase-output_phase(1)+temp;
output_phase = mod(output_phase,360);
s = find(output_phase>= 180);
output_phase(s) = output_phase(s)-360;
So, I am trying to implement this MATLAB function in Python:
def fix_phasedata180(data_phase, averaging_length):
    x = np.exp(1j*data_phase*2./180.*np.pi)
    N = averaging_length
    b = 1./np.sqrt(N)*np.ones(N)
    y = fftfilt(b,x)
    y = fftfilt(b,y[::-1])
    y = y[::-1]
    output_phase = data_phase - np.array(map(round,((data_phase/180.*np.pi-np.unwrap(np.angle(y))/2.)%(2.*np.pi))*180./np.pi/180.))*180
    temp = output_phase[0]%90
    output_phase = output_phase-output_phase[0]+temp
    s = output_phase[output_phase >= 180]
    for s in range(len(output_phase)):
        output_phase[s] = output_phase[s]-360
    return output_phase
I assumed that this fftfilt function was a clone of MATLAB's fftfilt, but when I run it I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-40-eb6944fd1053> in <module>()
4 N = averaging_length
5 b = 1./np.sqrt(N)*np.ones(N)
----> 6 y = fftfilt(b,x)
D:/folder/fftfilt.pyc in fftfilt(b, x, *n)
66 k = min([i+N_fft,N_x])
67 yt = ifft(fft(x[i:il],N_fft)*H,N_fft) # Overlap..
---> 68 y[i:k] = y[i:k] + yt[:k-i] # and add
69 i += L
70 return y
ValueError: could not broadcast input array from shape (0,0) into shape (0)
So, my question is: is there any equivalent to MATLAB's fftfilt in Python? The aim of my function is to correct the fast variations in a phase signal and then correct n*90-degree shifts, as shown below.

The function you linked to is a Python equivalent to the Matlab function. It just happens to be broken.
Anyway, MNE also has an implementation of the overlap and add method used by the fftfilt function. It's a private function of the library, and I'm not sure if you can call it exactly equivalent to the Matlab function, but maybe it's useful. You can find the source code here: https://github.com/mne-tools/mne-python/blob/master/mne/filter.py#L41.

Finally, I got an improvement in my code. I replaced fftfilt (applied twice) with scipy.signal.filtfilt (which is basically the same). So my code translated into Python is:
import numpy as np
import scipy.signal as sg
AveragingLengthAmp = 10
AveragingLengthPhase = 10
PhaseFixLength = 60
averaging_length = channel_sampling_freq1*PhaseFixLength
def fix_phasedata180(data_phase, averaging_length):
    data_phase = np.reshape(data_phase,len(data_phase))
    x = np.exp(1j*data_phase*2./180.*np.pi)
    N = float(averaging_length)
    b, a = sg.butter(10, 1./np.sqrt(N))
    y = sg.filtfilt(b, a, x)
    output_phase = data_phase - np.array(map(round,((data_phase/180*np.pi-np.unwrap(np.angle(y))/2)%(2*np.pi))*180/np.pi/180))*180
    temp = output_phase[0]%90
    output_phase = output_phase-output_phase[0]+temp
    s = output_phase[output_phase >= 180]
    for s in range(len(output_phase)):
        output_phase[s] = output_phase[s]-360
    return output_phase
out1 = fix_phasedata180(data_phase, averaging_length)
def fix_phasedata90(data_phase, averaging_length):
    data_phase = np.reshape(data_phase,len(data_phase))
    x = np.exp(1j*data_phase*4./180.*np.pi)
    N = float(averaging_length)
    b, a = sg.butter(10, 1./np.sqrt(N))
    y = sg.filtfilt(b, a, x)
    output_phase = data_phase - np.array(map(round,((data_phase/180*np.pi-np.unwrap(np.angle(y))/4)%(2*np.pi))*180/np.pi/90))*90
    temp = output_phase[0]%90
    output_phase = output_phase-output_phase[0]+temp
    output_phase = output_phase%360
    s = output_phase[output_phase >= 180]
    for s in range(len(output_phase)):
        output_phase[s] = output_phase[s]-360
    return output_phase
offset = 0
data_phase_unwrapped = np.zeros(len(out2))
data_phase_unwrapped[0] = out2[0]
for jj in range(1,len(out2)):
    if out2[jj]-out2[jj-1] > 180:
        offset = offset + 360
    elif out2[jj]-out2[jj-1] < -180:
        offset = offset - 360
    data_phase_unwrapped[jj] = out2[jj] - offset
Here fix_phasedata180 fixes the 180-degree shifts, and fix_phasedata90 does the same for the 90-degree shifts. The channel_sampling_freq1 is 1/sec.
The result is mostly right. However, I have some questions about scipy.signal.butter and scipy.signal.filtfilt. As you see, I chose:
b, a = sg.butter(10, 1./np.sqrt(N))
Here the order of the filter (N) is 10 and the critical frequency (Wn) is 1/sqrt(60). My question is: how can I choose an appropriate filter order? I tried N=1 through N=21; for orders larger than 21, the values in data_phase_unwrapped are all NaN. I also tried passing values for padlen to filtfilt, but I didn't understand it well.
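One detail in the translated functions above differs from the MATLAB original: the for loop reuses the name s, which overwrites the selection made on the previous line, so 360 is subtracted from every element rather than only from those at or above 180. A minimal sketch of the MATLAB behaviour (mod 360, then shift the upper half down) with a boolean mask, using the hypothetical helper name wrap_to_180:

```python
import numpy as np

def wrap_to_180(phase_deg):
    """Map angles in degrees to the interval [-180, 180)."""
    wrapped = np.mod(np.atleast_1d(np.asarray(phase_deg, dtype=float)), 360.0)
    # Only the elements at or above 180 are shifted, as with find() in MATLAB.
    wrapped[wrapped >= 180.0] -= 360.0
    return wrapped

angles = np.array([0.0, 179.0, 180.0, 270.0, 360.0, -90.0, 540.0])
print(wrap_to_180(angles))
```

The mask replaces both the find call and the loop, and np.mod returns a new array, so the input is left unchanged.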

This is a bit late, but I found the answer to this while translating some MATLAB code of my own.
TLDR: Use mode="full" for any of the convolve functions in scipy.signal
I leaned on SciPy's recipes to guide me through this. The rest of my answer is effectively a summary of that page. MATLAB's fftfilt function can be replaced with any of the convolve functions mentioned in the cookbook (np.convolve, scipy.signal.convolve, .oaconvolve, .fftconvolve) if you pass mode='full'.
import numpy as np
from numpy import convolve as np_convolve
from scipy.signal import fftconvolve, lfilter, firwin
from scipy.signal import convolve as sig_convolve
# Create the m by n data to be filtered.
m = 1
n = 2 ** 18
x = np.random.random(size=(m, n))
ntaps_list = 2 ** np.arange(2, 14)
for ntaps in ntaps_list:
    # Create a FIR filter.
    b = firwin(ntaps, [0.05, 0.95], width=0.05, pass_zero=False)
    conv_result = sig_convolve(x, b[np.newaxis, :], mode='full')
Happy filtering!

I also had issues when converting a MATLAB code. I went from this MATLAB code:
signal_weighted = fftfilt( weight, signal.^2 ) / Ntau;
to this python code:
from scipy.signal import convolve
signal_weighted = convolve(signal**2 ,weightData, 'full', 'direct') / Ntau
signal_weighted = signal_weighted[:len(signal)]
If you want something faster than convolve, see this overlap and add fft implementation
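The truncation to len(signal) in the snippet above is what makes a 'full' convolution line up with fftfilt, which returns an output of the same length as its input. A minimal sketch for 1-D input, with fftfilt_like as a hypothetical helper name and scipy.signal.lfilter used as the reference FIR filter:

```python
import numpy as np
from scipy.signal import fftconvolve, lfilter

def fftfilt_like(b, x):
    """FIR-filter the 1-D signal x with taps b; output has the length of x."""
    return fftconvolve(x, b, mode="full")[: len(x)]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Same moving-average taps as in the question: N values of 1/sqrt(N).
N = 16
b = np.ones(N) / np.sqrt(N)

y_fft = fftfilt_like(b, x)
y_ref = lfilter(b, 1.0, x)  # direct-form FIR filtering for comparison
print("max abs difference:", np.max(np.abs(y_fft - y_ref)))
```

The two results agree up to floating-point rounding, so the choice between them is mainly a matter of speed for long filters.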

Do I underestimate the power of NumPy.. again?

I don't think I can optimize my function any further, but it wouldn't be the first time that I underestimated the power of NumPy.
Given:
A rank-2 NumPy array with coordinates
A rank-1 NumPy array with the elevation of each coordinate
A Pandas DataFrame with stations
Function:
def Function(xy_coord):
    # Apply a KDTree search for (and select) 8 nearest stations
    dist_tree_real, ix_tree_real = tree.query(xy_coord, k=8, eps=0, p=1)
    df_sel = df.ix[ix_tree_real]
    # Fits multi-linear regression to find coefficients
    M = np.vstack((np.ones(len(df_sel['POINT_X'])),df_sel['POINT_X'], df_sel['POINT_Y'],df_sel['Elev'])).T
    b1,b2,b3 = np.linalg.lstsq(M,df_sel['TEMP'])[0][1:4]
    # Compute IDW using the coefficients
    return sum( (1/dist_tree_real)**2)**-1 * sum((df_sel['TEMP'] + (b1*(xy_coord[0] - df_sel['POINT_X'])) +
        (b2*(xy_coord[1]-df_sel['POINT_Y'])) + (b3*(dem[index]-df_sel['Elev']))) *
        (1/dist_tree_real)**2)
And I apply the function to the coordinates as follows:
for index, coord in enumerate(xy):
    outarr[index] = func(coord)
This is an iterative process. If I try outarr = np.vectorize(func)(xy) instead, Python crashes, so I guess that's something I should avoid doing.
I also prepared an IPython Notebook, so I could write LaTeX, something I've always dreamed of doing for a long time. Till now. The day has come. Yeah
Off topic: the math won't show up in the nbviewer.. on my local machine it looks like this:

My suggestion is not to use a DataFrame for the calculation; use NumPy arrays only. Here is the code:
dist, idx = tree.query(xy, k=8, eps=0, p=1)
columns = ["POINT_X", "POINT_Y", "Elev", "TEMP"]
px, py, elev, tmp = df[columns].values.T[:, idx, None]
tmp = np.squeeze(tmp)
one = np.ones_like(px)
m = np.concatenate((one, px, py, elev), axis=-1)
mtm = np.einsum("ijx,ijy->ixy", m, m)
mty = np.einsum("ijx,ij->ix", m, tmp)
b1,b2,b3 = np.linalg.solve(mtm, mty)[:, 1:].T
px, py, elev = px.squeeze(), py.squeeze(), elev.squeeze()
b1 = b1[:,None]
b2 = b2[:,None]
b3 = b3[:,None]
rdist = (1/dist)**2
t0 = tmp + b1*(xy[:,0,None]-px) + b2*(xy[:,1,None]-py) + b3*(dem[:,None]-elev)
outarr = (t0*rdist).sum(1) / rdist.sum(1)
print outarr
output:
[ -499.24287422 -540.28111668 -512.43789349 -589.75389439 -411.65598912
-233.1779803 -1249.63803291 -232.4924416 -273.3978919 -289.35240473]
There are some tricks in the code:
np.linalg.solve in numpy 1.8 is a generalized ufunc that can solve many linear systems in one call, but lstsq is not. So I need to use solve (on the normal equations) to compute the least-squares fit.
To do many matrix multiplications in one call, we can't use dot; einsum() does the trick, but I think it may be slower than dot. You can use timeit on your real data.
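The equivalence between the batched normal equations and a per-point lstsq can be checked on synthetic data. This is a minimal sketch with random stand-in values rather than the station data; X, y, n_points and k are hypothetical names. The right-hand side is given an explicit trailing axis, on the assumption that newer NumPy releases (2.0 and later) treat b as a stack of vectors only when it is 1-D.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, k = 5, 8

# Hypothetical predictors (x, y, elevation) and targets for k neighbours per point.
X = rng.standard_normal((n_points, k, 3))
y = rng.standard_normal((n_points, k))

# Design matrices with an intercept column: shape (n_points, k, 4).
M = np.concatenate((np.ones((n_points, k, 1)), X), axis=-1)

# Batched normal equations: (M^T M) beta = M^T y for every point at once.
mtm = np.einsum("ijx,ijy->ixy", M, M)   # shape (n_points, 4, 4)
mty = np.einsum("ijx,ij->ix", M, y)     # shape (n_points, 4)
coef_batched = np.linalg.solve(mtm, mty[..., None])[..., 0]

# Reference: one lstsq call per point.
coef_loop = np.array(
    [np.linalg.lstsq(M[i], y[i], rcond=None)[0] for i in range(n_points)]
)
print("max abs difference:", np.max(np.abs(coef_batched - coef_loop)))
```

The normal equations square the condition number of M, so for nearly collinear stations the looped lstsq may be the more accurate of the two.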
