I use UnivariateSpline from the scipy module to fit data. It works in almost all cases except this one, which crashes with Process finished with exit code -1073741819 (0xC0000005). If I change the smoothing factor s to 0, it also works. Any suggestions for solving this problem would help.
Update 1
My working environment is:
python 3.7
scipy 1.3.2
numpy 1.17.4
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import UnivariateSpline, InterpolatedUnivariateSpline
x = np.arange(78)
y = np.asarray([
0., 0., 0., 0., 0., 0.,
0., 0., 5.03989319, 4.03191455, 4.03191455, 3.02393591,
3.02393591, 2.01595727, 2.01595727, 1.00797864, 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0.])
spl = UnivariateSpline(x, y, k=1, s=0.01)
knots = list(map(int, spl.get_knots()))
plt.plot(knots, y[knots], 'rx')
plt.plot(knots, y[knots], 'r-')
plt.plot(x, y, 'b-')
plt.show()
The combination of your s and k parameters is causing the issue.
According to the documentation, the number of knots is increased until the condition sum((w[i] * (y[i]-spl(x[i])))**2, axis=0) <= s is met. However, because you have a limited number of non-zero data points, only so many meaningful knots can be added to the data set, and because you are fitting a k=1 spline (as opposed to, say, a cubic), the difference between the spline values and the data values never reaches the prescribed s.
Your options are to increase k (I tested with k=3 and it worked) or to increase s so the condition is less strict (anything above s=0.08 worked for me). Note that your code worked with s=0 because in that case the algorithm does no smoothing at all and simply interpolates between the points (which may be what you want).
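For example, either of the following runs without the crash on the data above (a quick sketch; the exact s threshold is empirical):
from scipy.interpolate import UnivariateSpline

# x and y as defined in the question
spl_cubic = UnivariateSpline(x, y, k=3, s=0.01)  # cubic works at the original s
spl_loose = UnivariateSpline(x, y, k=1, s=0.1)   # k=1 works once s is above ~0.08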
Related
I am currently working on an image processing project and, among other things, I am using a Kalman filter in the algorithm. However, the Kalman filter's computation time is very slow compared to the other software components, despite the use of numpy.
The predict function is very fast; the update function, however, is not. I think the reason could be the inversion of the 2x2 matrix with np.linalg.inv().
Does anyone have an idea for a faster calculation? Possibly hardcoding it, or rearranging the equations to avoid computing the inverse?
I would also appreciate other comments on how to make the code faster; I may have overlooked something.
Thank you very much in advance!
KalmanFilter.py:
import numpy as np


class KalmanFilter(object):
    def __init__(self, dt, u_x, u_y, std_acc, x_std_meas, y_std_meas):
        """
        :param dt: sampling time (time for 1 cycle)
        :param u_x: acceleration in x-direction
        :param u_y: acceleration in y-direction
        :param std_acc: process noise magnitude
        :param x_std_meas: standard deviation of the measurement in x-direction
        :param y_std_meas: standard deviation of the measurement in y-direction
        """
        # Define sampling time
        self.dt = dt
        # Define the control input variables
        self.u = np.array([u_x, u_y])
        # Initial state
        self.x = np.array([0., 0., 0., 0., 0., 0.])  # x, x', x'', y, y', y''
        # Define the state transition matrix A
        self.A = np.array([[1., self.dt, 0.5*self.dt**2., 0., 0., 0.],
                           [0., 1., self.dt, 0., 0., 0.],
                           [0., 0., 1., 0., 0., 0.],
                           [0., 0., 0., 1., self.dt, 0.5*self.dt**2.],
                           [0., 0., 0., 0., 1., self.dt],
                           [0., 0., 0., 0., 0., 1.]])
        # Define the control input matrix B (6x2 so that B @ u matches the
        # 6-dimensional state; the original 4x2 version raises a shape error
        # in predict())
        self.B = np.array([[(self.dt**2.)/2., 0.],
                           [self.dt, 0.],
                           [1., 0.],
                           [0., (self.dt**2.)/2.],
                           [0., self.dt],
                           [0., 1.]])
        # Define the measurement mapping matrix
        self.H = np.array([[1., 0., 0., 0., 0., 0.],
                           [0., 0., 0., 1., 0., 0.]])
        # Initial process noise covariance
        self.Q = np.array([[(self.dt**4.)/4., (self.dt**3.)/2., (self.dt**2.)/2., 0., 0., 0.],
                           [(self.dt**3.)/2., self.dt**2., self.dt, 0., 0., 0.],
                           [(self.dt**2.)/2., self.dt, 1., 0., 0., 0.],
                           [0., 0., 0., (self.dt**4.)/4., (self.dt**3.)/2., (self.dt**2.)/2.],
                           [0., 0., 0., (self.dt**3.)/2., self.dt**2., self.dt],
                           [0., 0., 0., (self.dt**2.)/2., self.dt, 1.]]) * std_acc**2.
        # Initial measurement noise covariance
        self.R = np.array([[x_std_meas**2., 0.],
                           [0., y_std_meas**2.]])
        # Initial covariance matrix (500 on the diagonal)
        self.P = np.eye(6) * 500.
        # Initial Kalman gain
        self.K = np.zeros((6, 2))
        # Initial system uncertainty (innovation covariance)
        self.S = np.zeros((2, 2))

    def predict(self):
        # Propagate the state
        if self.A is not None and self.u[0] is not None:
            self.x = np.dot(self.A, self.x) + np.dot(self.B, self.u)
        else:
            self.x = np.dot(self.A, self.x)
        # Propagate the error covariance
        self.P = np.dot(np.dot(self.A, self.P), self.A.T) + self.Q
        return self.x[0], self.x[3]

    def update(self, z):
        self.S = np.dot(self.H, np.dot(self.P, self.H.T)) + self.R
        # Calculate the Kalman gain
        self.K = np.dot(np.dot(self.P, self.H.T), np.linalg.inv(self.S))
        self.x = self.x + np.dot(self.K, (z - np.dot(self.H, self.x)))
        I = np.eye(self.H.shape[1])
        # Update the error covariance matrix (Joseph form)
        helper = I - np.dot(self.K, self.H)
        self.P = np.dot(np.dot(helper, self.P), helper.T) + np.dot(self.K, np.dot(self.R, self.K.T))
        return self.x[0], self.x[3]
I replaced all the existing operations with their numpy equivalents, but not much changed.
In my searching so far I found that np.linalg.inv() could be the reason it is slow, but I can't find a way to get rid of it.
You might get some speedup this way:
def __init__(self, dt, u_x, u_y, std_acc, x_std_meas, y_std_meas):
    # your existing code
    self.I = np.eye(self.H.shape[1])
And,
def update(self, z):
    # you can cut one dot product by saving P @ H.T in an intermediate variable
    P_HT = np.dot(self.P, self.H.T)
    self.S = np.dot(self.H, P_HT) + self.R
    # Calculate the Kalman gain
    self.K = np.dot(P_HT, np.linalg.inv(self.S))
    self.x = self.x + np.dot(self.K, (z - np.dot(self.H, self.x)))
    # Update the error covariance matrix, reusing the precomputed self.I
    helper = self.I - np.dot(self.K, self.H)
    self.P = np.dot(np.dot(helper, self.P), helper.T) + np.dot(self.K, np.dot(self.R, self.K.T))
    return self.x[0], self.x[3]
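If np.linalg.inv() still shows up in profiling, a further option (a sketch, not benchmarked here) is to exploit the fact that S is only 2x2: hardcode the closed-form inverse, or avoid forming the inverse at all with np.linalg.solve.
import numpy as np

def inv_2x2(S):
    # Closed-form inverse of a 2x2 matrix: [[a, b], [c, d]] has inverse
    # [[d, -b], [-c, a]] / (a*d - b*c); this skips the general LAPACK
    # machinery behind np.linalg.inv, which dominates for tiny matrices.
    (a, b), (c, d) = S
    det = a * d - b * c
    return np.array([[d, -b], [-c, a]]) / det
Inside update() you could then write self.K = np.dot(P_HT, inv_2x2(self.S)), or solve for the gain directly with self.K = np.linalg.solve(self.S.T, P_HT.T).T, which never materializes the inverse.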
I have a .npy file here.
It's just a file holding an object that maps images to their labels. For example:
{
'2007_002760': array([0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0.], dtype=float32),
'2008_004036': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0.,0., 0., 0.], dtype=float32)
}
I want to open the file, get its length, and then possibly add to it or modify it.
I am able to open the file, but I can't get the number of items in it.
Here's how I open it:
import numpy as np
file = np.load('cls_labels.npy', allow_pickle = True)
print(file.size)
What am I missing here?
Your file contains a dictionary wrapped inside a 0-dimensional numpy object. The magic to extract the actual information is:
my_dictionary = file[()]
This is a standard dictionary whose keys are strings like '2008_004036' and whose values are numpy arrays.
Edit: and as mentioned above, you shouldn't be saving dictionaries with numpy.save(); you should have been using pickle. Otherwise you end up with horrors like file[()].
Here is the correct and easiest way to do it:
cls_labels = np.load('cls_labels.npy', allow_pickle = True).item()
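A sketch of the full load/inspect/modify/save round trip (the file name follows the question; the added key is purely illustrative):
import numpy as np

cls_labels = np.load('cls_labels.npy', allow_pickle=True).item()
print(len(cls_labels))  # number of image/label entries
cls_labels['2009_000001'] = np.zeros(20, dtype=np.float32)  # add or replace an entry (hypothetical key)
np.save('cls_labels.npy', cls_labels)  # write the dictionary back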
I want to run combined non-max suppression on a set of windows for an image.
I am using tf.image.combined_non_max_suppression from TensorFlow as follows:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
import numpy as np
import tensorflow as tf
boxesX=np.array(([200,100,150,100],[220,120,150,100],[190,110,150,100],[210,112,150,100])).astype('float32')
scoresX=np.array(([0.2,0.7,0.1],[0.1,0.8,0.1],[0.3,0.6,0.1],[0.05,0.9,0.05]))
boxes1=tf.reshape(boxesX,(1,4,1,4))
boxes2=tf.dtypes.cast(boxes1, tf.float32)
scores1=tf.reshape(scoresX,(1,4,3))
scores2=tf.dtypes.cast(scores1, tf.float32)
boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
boxes=boxes2,
scores=scores2,
max_output_size_per_class=10,
max_total_size=10,
iou_threshold=0.5,
score_threshold=0.2)
But the output 'boxes' is just an array of zeros and ones:
array([[[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]], dtype=float32)
The boxes are being clipped to the range [0, 1]. All you need to do is add the argument clip_boxes=False:
boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
boxes=boxes2,
scores=scores2,
max_output_size_per_class=10,
max_total_size=10,
iou_threshold=0.5,
score_threshold=0.2,
clip_boxes=False)
For example, I need 30x30 numpy arrays created from images to feed to a neural net. Given a directory of images to predict on, I should be able to loop through the directory, read each image, and build an array of shape (n, 30, 30).
This is my current method; I intend to reshape each row before feeding it to the model:
import os
import numpy as np
from PIL import Image

def get_image_vectors(path):
    img_list = os.listdir(path)
    print(img_list)
    X = np.empty((900,))  # placeholder row, dropped at the end
    for img_file in img_list:
        img = Image.open(os.path.join(path, img_file))
        img_grey = img.convert("L")
        resized = img_grey.resize((30, 30))
        flattened = np.array(resized.getdata())
        # print(flattened.shape)
        X = np.vstack((X, flattened))
        print(img_file, '=>', X.shape)
    return X[1:, :]
Instead of appending to an existing array, it is better to collect the rows in a list, appending to that, and convert it to an array once at the end; this avoids many redundant reallocations and copies of numpy arrays.
Here is a toy example:
import numpy as np

def get_image_vectors():
    X = []  # create an empty list
    for i in range(10):
        flattened = np.zeros(900)
        X.append(flattened)  # append each row (a numpy array) to it
    return np.array(X)  # build the array from the list, once
With result:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]])
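Applied to the question's code, the same idea might look like this (a sketch assuming the same os/PIL setup as above; it stacks directly to the desired (n, 30, 30) shape instead of flattening):
import os
import numpy as np
from PIL import Image

def get_image_arrays(path):
    rows = []  # collect one (30, 30) array per image
    for img_file in os.listdir(path):
        img = Image.open(os.path.join(path, img_file))
        resized = img.convert("L").resize((30, 30))  # greyscale, 30x30
        rows.append(np.asarray(resized))
    return np.stack(rows)  # single allocation, shape (n, 30, 30)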
I am attempting to classify some data with the scikit-learn LDA classifier. I'm not entirely sure what to "expect" from it, but what I am getting is weird. It seems like a good opportunity to learn about either a shortcoming of the technique or a way in which I am applying it wrong. I understand that no line could completely separate this data, but there seem to be much "better" lines than the one it finds. I'm just using the default options. Any thoughts on how to do this better? I'm using LDA because it is linear in the size of my dataset, although I think a linear SVM has similar complexity. Perhaps it would be better suited to such data? I will update when I have tested other possibilities.
The picture: (light blue is what my LDA classifier predicts will be dark blue)
The code:
import numpy as np
from numpy import array
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
import itertools
X = array([[ 0.23125754, 0.79170351],
[ 0.78021491, -0.24999486],
[ 0.00856446, 0.41452734],
[ 0.66381753, -0.09872504],
[-0.03178685, 0.04876317],
[ 0.65574645, -0.68214948],
[ 0.14290684, 0.38256002],
[ 0.05156987, 0.11094875],
[ 0.06843403, 0.19110019],
[ 0.24070898, -0.07403764],
[ 0.03184353, 0.4411446 ],
[ 0.58708124, -0.38838008],
[-0.00700369, 0.07540799],
[-0.01907816, 0.07641038],
[ 0.30778608, 0.30317186],
[ 0.55774143, -0.38017325],
[-0.00957214, -0.03303287],
[ 0.8410637 , 0.158594 ],
[-0.00294113, -0.00380608],
[ 0.26577841, 0.07833684],
[-0.32249375, 0.49290502],
[ 0.11313078, 0.35697211],
[ 0.41153679, -0.4471876 ],
[-0.00313315, 0.30065913],
[ 0.14344143, -0.19127107],
[ 0.04857767, 0.01339191],
[ 0.5865007 , 0.71209886],
[ 0.08157439, 0.40909955],
[ 0.72495202, 0.29583866],
[-0.09391461, 0.17976605],
[ 0.06149141, 0.79323099],
[ 0.52208024, -0.2877661 ],
[ 0.01992141, -0.00435266],
[ 0.68492617, -0.46981335],
[-0.00641231, 0.29699622],
[ 0.2369677 , 0.140319 ],
[ 0.6602586 , 0.11200433],
[ 0.25311836, -0.03085372],
[-0.0895014 , 0.45147252],
[-0.18485667, 0.43744524],
[ 0.94636701, 0.16534406],
[ 0.01887734, -0.07702135],
[ 0.91586801, 0.17693792],
[-0.18834833, 0.31944796],
[ 0.20468328, 0.07099982],
[-0.15506378, 0.94527383],
[-0.14560083, 0.72027034],
[-0.31037647, 0.81962815],
[ 0.01719756, -0.01802322],
[-0.08495304, 0.28148978],
[ 0.01487427, 0.07632112],
[ 0.65414479, 0.17391618],
[ 0.00626276, 0.01200355],
[ 0.43328095, -0.34016614],
[ 0.05728525, -0.05233956],
[ 0.61218382, 0.20922571],
[-0.69803697, 2.16018536],
[ 1.38616732, -1.86041621],
[-1.21724616, 2.72682759],
[-1.26584365, 1.80585403],
[ 1.67900048, -2.36561699],
[ 1.35537903, -1.60023078],
[-0.77289615, 2.67040114],
[ 1.62928969, -1.20851808],
[-0.95174264, 2.51515935],
[-1.61953649, 2.34420531],
[ 1.38580104, -1.9908369 ],
[ 1.53224512, -1.96537012]])
y = array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1.])
classifier = LDA()
classifier.fit(X,y)
xx = np.array(list(itertools.product(np.linspace(-4,4,300), np.linspace(-4,4,300))))
yy = classifier.predict(xx)
b_colors = ['salmon' if yyy==0 else 'deepskyblue' for yyy in yy]
p_colors = ['r' if yyy==0 else 'b' for yyy in y]
plt.scatter(xx[:,0],xx[:,1],s=1,marker='o',edgecolor=b_colors,c=b_colors)
plt.scatter(X[:,0], X[:,1], marker='o', s=5, c=p_colors, edgecolor=p_colors)
plt.show()
UPDATE: Switching from sklearn.discriminant_analysis.LinearDiscriminantAnalysis to sklearn.svm.LinearSVC, also with the default options, gives the following picture:
I think using the zero-one loss instead of the hinge loss would help, but sklearn.svm.LinearSVC doesn't seem to allow custom loss functions.
UPDATE: The loss function of sklearn.svm.LinearSVC approaches the zero-one loss as the parameter C goes to infinity. Setting C=1000 gives me what I was originally hoping for. I'm not posting this as an answer, because the original question was about LDA.
picture:
LDA models each class as a Gaussian (in LDA the classes additionally share a single covariance matrix), so each class model is determined by the class's estimated mean vector and the pooled covariance.
Judging by eye alone, your blue and red classes have approximately the same mean and roughly the same covariance, which means the two Gaussians 'sit' on top of each other and the discrimination will be poor. It also means the separator (the blue-pink border) will be noisy, that is, it will change a lot between random samples of your data.
By the way, your data is clearly not linearly separable, so any linear model will have a hard time discriminating it.
If you must use a linear model, try LDA with 3 classes, such that the top-left blue blob is labeled '0', the bottom-right blue blob '1', and the red blob '2'. This way you will get a much better linear model. You can do it by preprocessing the blue class with a clustering algorithm with k=2 clusters, as in the sketch below.
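A sketch of that preprocessing (X, y, np, and xx are from the question's code; the use of KMeans and the cluster-to-class mapping are illustrative):
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

blue_clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X[y == 0])  # split the blue class in two
y3 = y.copy()
y3[y == 0] = blue_clusters  # the two blue blobs become classes 0 and 1
y3[y == 1] = 2              # red becomes class 2
classifier3 = LDA().fit(X, y3)
# Collapse the three predicted classes back to two for plotting
yy3 = np.where(classifier3.predict(xx) < 2, 0., 1.)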