GPS data outliers

I have a set of latitude and longitude points, but I am having a hard time building an algorithm that removes the outlier points (shown in orange circles in the image). What I tried is to compute the angle between the vectors joining consecutive points and to remove a point when that angle exceeds a threshold (i.e. a large angle). Is there a better way? Also, when I use the DataFrame in pandas, I have a problem with the looping: when the angle is large, the loop should go back to the base point (red circle) and keep removing points until it reaches the next point that continues in a roughly straight line. Thanks in advance.
Here is my sample code:
import numpy as np

df_4 = df_3.copy()

def angle(vector1, vector2):
    # angle in degrees between two 2-D vectors
    unit_vector_1 = vector1 / np.linalg.norm(vector1)
    unit_vector_2 = vector2 / np.linalg.norm(vector2)
    dot_product = np.dot(unit_vector_1, unit_vector_2)
    angle = np.arccos(dot_product)
    return np.degrees(angle)

df_4.reset_index(inplace=True)
th = 20
angle_vector = 0
idx=0
a=0
for x in range(len(df_4)):
    print('for:',str(idx))
    vector1 = [(df_4.loc[idx+1,'Longitude'] - df_4.loc[idx,'Longitude']),(df_4.loc[idx+1,'Latitude']-df_4.loc[idx,'Latitude'])]
    vector2 = [(df_4.loc[idx+2,'Longitude'] - df_4.loc[idx+1,'Longitude']),(df_4.loc[idx+2,'Latitude']-df_4.loc[idx+1,'Latitude'])]
    angle_vector = angle(vector1,vector2)
    df_4.at[idx+1,'angle_vector'] = angle_vector
    #print(angle_vector)
    print('before:',len(df_4))
    i = 0
    # 'and' instead of '&': '&' binds tighter than '>', so 'th & (...)' was evaluated first
    while(angle_vector > th and (angle_vector not in [a for a in range(86,96)])):
        df_4.drop(df_4.index[idx+1], inplace=True )
        df_4.reset_index(inplace=True,drop=True)
        print('after:',len(df_4))
        print('while:',str(i))
        vector3 = [(df_4.loc[idx+i,'Longitude'] - df_4.loc[idx+i-a,'Longitude']), (df_4.loc[idx+i,'Latitude']-df_4.loc[idx+i-a,'Latitude'])]
        angle_vector = angle(vector1,vector3)
        i += 1
    idx+=1
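The angle test described above can also be written without the nested index bookkeeping. The following is only a sketch under stated assumptions, not a tested fix for the data in the question: it treats longitude and latitude as planar coordinates, the function names `turn_angles` and `spike_mask` are made up for illustration, and it removes the single sharpest turn on each pass, because one spike also inflates the angles at its two neighbours.

```python
import numpy as np

def turn_angles(points):
    """Turn angle in degrees at each interior point; NaN at the two endpoints."""
    points = np.asarray(points, dtype=float)
    angles = np.full(len(points), np.nan)
    if len(points) < 3:
        return angles
    seg = np.diff(points, axis=0)            # vectors between consecutive points
    v1, v2 = seg[:-1], seg[1:]
    norms = np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1)
    norms[norms == 0] = np.nan               # repeated points have no defined angle
    cos = np.einsum("ij,ij->i", v1, v2) / norms
    angles[1:-1] = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles

def spike_mask(points, threshold_deg=20.0):
    """Boolean mask of points to keep, dropping the sharpest turn one at a time."""
    points = np.asarray(points, dtype=float)
    kept = np.arange(len(points))
    while len(kept) >= 3:
        angles = turn_angles(points[kept])
        if np.all(np.isnan(angles)):
            break
        worst = np.nanargmax(angles)
        if angles[worst] <= threshold_deg:
            break
        kept = np.delete(kept, worst)        # remove the worst point, then re-evaluate
    mask = np.zeros(len(points), dtype=bool)
    mask[kept] = True
    return mask
```

With a DataFrame, the mask could be applied as `df_4[spike_mask(df_4[['Longitude', 'Latitude']].to_numpy())]`. Note that a genuine corner in the track (for example a 90° turn) would be removed as well, so the threshold and any exempt angle range still need tuning for the data.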

Related

The area & center of gravity of a polygon having non-uniform density of vertices? (in Python)

I would like to calculate the center of gravity (COG) of a polygon shaped exactly like the contour map of my town. However, using the available database of border points directly would produce a skewed result: some places have a much higher density of border points than others, so the center of gravity would be pulled towards those regions. I tried to equalise the density of vertices with this Python code:
import numpy as np
punkty = open("borderpoints.txt","r", encoding = "utf8")
tempp = []
a = []
for line in punkty:
    for c in line:
        if c != " ":
            tempp.append(c)
        else:
            p = "".join(tempp)
            a.append(p)
            tempp = []
i = 0
x= []
y = []
fx = open("outx1.txt", "w")
fy = open("outy1.txt", "w")
while i<len(a)-1:
    x.append(a[i])
    fx.write(a[i])
    fx.write("\n")
    y.append(a[i+1])
    fy.write(a[i+1])
    fy.write("\n")
    i= i+2
j = 0
jump = 20
newxs = []
newys = []
fnx = open("newxs.txt","w")
fny = open("newys.txt", "w")
while j<len(x):
    L = np.sqrt(pow((float(y[j+1])-float(y[j])),2)+pow((float(x[j+1])-float(x[j])),2))
    n = jump*L
    interval = (float(y[j+1])-float(y[j]))/n
    k = 1
    slope = (float(x[j+1])-float(x[j]))/(float(y[j+1])-float(y[j]))
    inters = float(x[j+1])-slope*float(y[j+1])
    while k<n+1:
        g = float(y[j])+k*interval
        newxs.append(g)
        fnx.write(str(g))
        fnx.write("\n")
        g = (slope*(float(y[j])+k*interval)+inters)
        newys.append(g)
        fny.write(str(g))
        fny.write("\n")
        k = k+1
    j = j+2
    k=1
newxs.append(x)
newys.append(y)
However, in the result the points became denser everywhere except in the places that were previously empty and were supposed to be populated by the algorithm.
The graphs of the map before the application of the algorithm and
after (some proportions may vary but the main problem is the empty spot).
What is the approach that I could use in solving this problem? How to make the points distributed equally or maybe it's possible to calculate the COG with some other method?
My aim is that the amount of points shouldn't determine the COG, but rather determine the position of polygon sides - these are most important here, but obviously there is no database for them and it's harder to calculate the COG having a lot of linear functions and their ranges.
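One method whose result depends on the polygon's sides rather than on how many vertices describe them is the standard area-weighted centroid (shoelace) formula. The sketch below assumes the border points are listed in order around a simple (non-self-intersecting) polygon and that the coordinates can be treated as planar; the function name `polygon_area_centroid` is made up for illustration.

```python
import numpy as np

def polygon_area_centroid(xs, ys):
    """Area and centroid of a simple polygon given its ordered vertices."""
    x = np.asarray(xs, dtype=float)
    y = np.asarray(ys, dtype=float)
    x_next = np.roll(x, -1)                  # vertex i+1, wrapping around to close the ring
    y_next = np.roll(y, -1)
    cross = x * y_next - x_next * y          # twice the signed area of each edge triangle
    signed_area = 0.5 * cross.sum()
    cx = ((x + x_next) * cross).sum() / (6.0 * signed_area)
    cy = ((y + y_next) * cross).sum() / (6.0 * signed_area)
    return abs(signed_area), (cx, cy)

# Unit square with extra vertices crowded along the bottom edge (made-up data).
xs = [0.0, 0.1, 0.2, 0.3, 1.0, 1.0, 0.0]
ys = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
area, (cx, cy) = polygon_area_centroid(xs, ys)
```

Extra vertices that lie on an existing side contribute nothing to the sums, so the crowded bottom edge does not pull the centroid towards it.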

How to display a 2d interpolation function in python as a matrix?

I looked around a lot, but it's hard to find an answer. Basically, when one interpolates v -> w, one would normally use one of the many interpolation functions. But I want to get the corresponding matrix A such that Av = w.
In my case w is a 200x200 matrix, with v being a random subset of w with half as many points. I don't really care for fancy math; it could be as simple as weighting the known points by inverse distance squared. I already tried implementing it all with some for loops, but that only really works for small sizes. Maybe it helps to explain my question:
from random import sample
import numpy as np
def testScatter(xbig, ybig):
    NumberOfPoints = int(xbig * ybig / 2) #half as many points as in full Sample
    #choose random coordinates
    Index = sample(range(xbig * ybig),NumberOfPoints)
    IndexYScatter = np.remainder(Index, xbig)
    IndexXScatter = np.array((Index - IndexYScatter) / xbig, dtype=int)
    InterpolationMatrix = np.zeros((xbig * ybig , NumberOfPoints), dtype=np.float32)
    WeightingSum = np.zeros(xbig * ybig )
    coordsSamplePoints = []
    for i in range(NumberOfPoints): #first set all the given points (no need to interpolate)
        coordsSamplePoints.append(IndexYScatter[i] + xbig * IndexXScatter[i])
        InterpolationMatrix[coordsSamplePoints[i], i] = 1
        WeightingSum[coordsSamplePoints[i]] = 1
    for x in range(xbig * ybig): #now comes the interpolation
        if x not in coordsSamplePoints:
            YIndexInterpol = x % xbig #xcoord in interpolated matrix
            XIndexInterpol = (x - YIndexInterpol) / xbig #ycoord in interp. matrix
            for y in range(NumberOfPoints):
                XIndexScatter = IndexXScatter[y]
                YIndexScatter = IndexYScatter[y]
                distanceSquared = (np.float32(YIndexInterpol) - np.float32(YIndexScatter))**2+(np.float32(XIndexInterpol) - np.float32(XIndexScatter))**2
                InterpolationMatrix[x,y] = 1/distanceSquared
                WeightingSum[x] += InterpolationMatrix[x,y]
    return InterpolationMatrix/ WeightingSum[:,None] , IndexXScatter, IndexYScatter

You need to spend some time with the Numpy documentation: start at the top of this page and work your way down. Studying answers here on SO to questions about how to vectorize an operation on Numpy arrays would also help. If you find that you are iterating over indices and performing calculations with Numpy arrays, there is probably a better way.
First cut...
The first for loop can be replaced with:
coordsSamplePoints = IndexYScatter + (xbig * IndexXScatter)
InterpolationMatrix[coordsSamplePoints,np.arange(coordsSamplePoints.shape[0])] = 1
WeightingSum[coordsSamplePoints] = 1
This mainly makes use of elementwise arithmetic and index arrays; the complete Indexing Tutorial should be read.
You can test this by extending the function so that it runs the for loop alongside the Numpy version and then compares the results.
...
IM = InterpolationMatrix.copy()
WS = WeightingSum.copy()
for i in range(NumberOfPoints): #first set all the given points (no need to interpolate)
    coordsSamplePoints.append(IndexYScatter[i] + xbig * IndexXScatter[i])
    InterpolationMatrix[coordsSamplePoints[i], i] = 1
    WeightingSum[coordsSamplePoints[i]] = 1
cSS = IndexYScatter + (xbig * IndexXScatter)
IM[cSS,np.arange(cSS.shape[0])] = 1
WS[cSS] = 1
# TEST Validity
print((cSS == coordsSamplePoints).all(),
      (IM == InterpolationMatrix).all(),
      (WS == WeightingSum).all())
...
The outer loop:
...
for x in range(xbig * ybig): #now comes the interpolation
    if x not in coordsSamplePoints:
        YIndexInterpol = x % xbig #xcoord in interpolated matrix
        XIndexInterpol = (x - YIndexInterpol) / xbig #ycoord in interp. matrix
...
Can be replaced with:
...
space = np.arange(xbig * ybig)
mask = ~(space == cSS[:,None]).any(0)
iP = space[mask] # points to interpolate
yIndices = iP % xbig
xIndices = (iP - yIndices) / xbig
...
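The comparison `space == cSS[:,None]` builds a boolean array of shape (number of sample points, xbig * ybig) before reducing it, which can become large for big grids. A possible alternative, sketched here on made-up toy values and reusing the names `space` and `cSS` from the answer, is `np.isin`, which produces the same mask without that two-dimensional intermediate.

```python
import numpy as np

xbig, ybig = 4, 3                      # toy grid size (assumption for illustration)
cSS = np.array([0, 5, 7, 10])          # made-up flat indices of the known sample points
space = np.arange(xbig * ybig)

mask_broadcast = ~(space == cSS[:, None]).any(0)   # the version shown above
mask_isin = np.isin(space, cSS, invert=True)       # same mask, no 2-D intermediate

iP = space[mask_isin]                  # flat indices that still need interpolating
yIndices = iP % xbig
xIndices = iP // xbig                  # integer division keeps the indices integral
```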
Complete solution:
import random
import numpy as np
def testScatter(xbig, ybig):
    NumberOfPoints = int(xbig * ybig / 2) #half as many points as in full Sample
    #choose random coordinates
    Index = random.sample(range(xbig * ybig),NumberOfPoints)
    IndexYScatter = np.remainder(Index, xbig)
    IndexXScatter = np.array((Index - IndexYScatter) / xbig, dtype=int)
    InterpolationMatrix = np.zeros((xbig * ybig , NumberOfPoints), dtype=np.float32)
    WeightingSum = np.zeros(xbig * ybig )
    coordsSamplePoints = IndexYScatter + (xbig * IndexXScatter)
    InterpolationMatrix[coordsSamplePoints,np.arange(coordsSamplePoints.shape[0])] = 1
    WeightingSum[coordsSamplePoints] = 1
    IM = InterpolationMatrix
    cSS = coordsSamplePoints
    WS = WeightingSum
    space = np.arange(xbig * ybig)
    mask = ~(space == cSS[:,None]).any(0)
    iP = space[mask] # points to interpolate
    yIndices = iP % xbig
    xIndices = (iP - yIndices) / xbig
    dSquared = ((yIndices[:,None] - IndexYScatter) ** 2) + ((xIndices[:,None] - IndexXScatter) ** 2)
    IM[iP,:] = 1/dSquared
    WS[iP] = IM[iP,:].sum(1)
    return IM / WS[:,None], IndexXScatter, IndexYScatter
I'm getting about a 200x improvement with this over your original with (100,100) for the arguments. There are probably some other minor improvements, but they won't affect execution time significantly.
Broadcasting is another Numpy skill that is a must.
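As a small illustration of the broadcasting used for `dSquared` in the solution above, here is a toy version with made-up coordinates (three points to interpolate, two known sample points). The variable names are hypothetical; only the `[:, None]` pattern is taken from the answer.

```python
import numpy as np

# Made-up coordinates: 3 points to interpolate, 2 known sample points.
y_interp = np.array([0.0, 1.0, 2.0])
x_interp = np.array([0.0, 0.0, 0.0])
y_known = np.array([0.0, 3.0])
x_known = np.array([1.0, 1.0])

# (3, 1) minus (2,) broadcasts to (3, 2): one row per target, one column per sample.
dSquared = (y_interp[:, None] - y_known) ** 2 + (x_interp[:, None] - x_known) ** 2

weights = 1.0 / dSquared
weights /= weights.sum(axis=1, keepdims=True)   # each row now sums to 1
```

Each row of `weights` is one row of the interpolation matrix for a point that has to be interpolated.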

np.where() to eliminate data, where coordinates are too close to each other

I'm doing aperture photometry on a cluster of stars, and to get easier detection of background signal, I want to only look at stars further apart than n pixels (n=16 in my case).
I have 2 arrays, xs and ys, with the x- and y-values of all the stars' coordinates:
Using np.where I'm supposed to find the indexes of all stars, where the distance to all other stars is >= n
So far, my method has been a for-loop
import numpy as np
# Lists of coordinates w. values between 0 and 2000 for 5000 stars
xs = np.random.rand(5000)*2000
ys = np.random.rand(5000)*2000
# for-loop, wherein the np.where statement in question is situated
n = 16
for i in range(len(xs)):
    index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) >= n)
Due to the stars being clustered pretty closely together, I expected a severe reduction in data, though even when I tried n=1000 I still had around 4000 datapoints left

Using just numpy (and part of the answer here)
X = np.random.rand(5000,2) * 2000
XX = np.einsum('ij, ij ->i', X, X)
D_squared = XX[:, None] + XX - 2 * X.dot(X.T)
np.fill_diagonal(D_squared, np.inf)  # ignore each star's zero distance to itself
out = np.where(D_squared.min(axis = 0) > n**2)
Using scipy.spatial.distance.pdist
from scipy.spatial.distance import pdist, squareform
D_squared = squareform(pdist(X, metric = 'sqeuclidean'))
np.fill_diagonal(D_squared, np.inf)  # ignore each star's zero distance to itself
out = np.where(D_squared.min(axis = 0) > n**2)
Using a KDTree for maximum speed:
from scipy.spatial import KDTree
X_tree = KDTree(X)
in_radius = np.array(list(X_tree.query_pairs(n))).flatten()
out = np.where(~np.isin(np.arange(X.shape[0]), in_radius))

np.random.seed(seed=1)
xs = np.random.rand(5000,1)*2000
ys = np.random.rand(5000,1)*2000
n = 16
mask = (xs>=0)
for i in range(len(xs)):
    if mask[i]:
        # compare star i against the full coordinate arrays xs, ys
        index = np.where( np.sqrt( pow(xs[i] - xs,2) + pow(ys[i] - ys,2)) <= n)
        mask[index] = False
        mask[i] = True
x = xs[mask]
y = ys[mask]
print(len(x))
4220

You can use np.subtract.outer to create the pairwise comparisons. Then you check, for each row, whether the distance is below 16 for exactly one item (which is the comparison of the particular star with itself):
distances = np.sqrt(
np.subtract.outer(xs, xs)**2
+ np.subtract.outer(ys, ys)**2
)
indices = np.nonzero(np.sum(distances < 16, axis=1) == 1)
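To see the counting trick on a case small enough to check by hand, here is the same computation on four made-up star positions along a line. The coordinates are assumptions chosen for illustration: the first two stars are 5 pixels apart and the other two are far from everything.

```python
import numpy as np

# Made-up positions: stars 0 and 1 are 5 px apart, stars 2 and 3 are isolated.
xs = np.array([0.0, 5.0, 100.0, 300.0])
ys = np.array([0.0, 0.0, 0.0, 0.0])
n = 16

distances = np.sqrt(
    np.subtract.outer(xs, xs) ** 2
    + np.subtract.outer(ys, ys) ** 2
)
# Each row always contains one zero (the star compared with itself),
# so a count of exactly 1 means no other star lies within n pixels.
close_counts = np.sum(distances < n, axis=1)
isolated = np.nonzero(close_counts == 1)[0]
```

Here `close_counts` is 2 for the two neighbouring stars and 1 for the isolated ones, so only the last two indices survive.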

Calculating mean value of a 2D array as a function of distance from the center in Python

I'm trying to calculate the mean value of a quantity(in the form of a 2D array) as a function of its distance from the center of a 2D grid. I understand that the idea is that I identify all the array elements that are at a distance R from the center, and then add them up and divide by the number of elements. However, I'm having trouble actually identifying an algorithm to go about doing this.
I have attached a working example of the code to generate the 2d array below. The code is for calculating some quantities that are resultant from gravitational lensing, so the way the array is made is irrelevant to this problem, but I have attached the entire code so that you could create the output array for testing.
import numpy as np
import multiprocessing
import matplotlib.pyplot as plt
n = 100 # grid size
c = 3e8
G = 6.67e-11
M_sun = 1.989e30
pc = 3.086e16 # parsec
Dds = 625e6*pc
Ds = 1726e6*pc #z=2
Dd = 1651e6*pc #z=1
FOV_arcsec = 0.0001
FOV_arcmin = FOV_arcsec/60.
pix2rad = ((FOV_arcmin/60.)/float(n))*np.pi/180.
rad2pix = 1./pix2rad
Renorm = (4*G*M_sun/c**2)*(Dds/(Dd*Ds))
#stretch = [10, 2]
# To create a random distribution of points
def randdist(PDF, x, n):
    #Create a distribution following PDF(x). PDF and x
    #must be of the same length. n is the number of samples
    fp = np.random.rand(n,)
    CDF = np.cumsum(PDF)
    return np.interp(fp, CDF, x)
def get_alpha(args):
    zeta_list_part, M_list_part, X, Y = args
    alpha_x = 0
    alpha_y = 0
    for key in range(len(M_list_part)):
        z_m_z_x = (X - zeta_list_part[key][0])*pix2rad
        z_m_z_y = (Y - zeta_list_part[key][1])*pix2rad
        alpha_x += M_list_part[key] * z_m_z_x / (z_m_z_x**2 + z_m_z_y**2)
        alpha_y += M_list_part[key] * z_m_z_y / (z_m_z_x**2 + z_m_z_y**2)
    return (alpha_x, alpha_y)
if __name__ == '__main__':
    # number of processes, scale accordingly
    num_processes = 1 # Number of CPUs to be used
    pool = multiprocessing.Pool(processes=num_processes)
    num = 100 # The number of points/microlenses
    r = np.linspace(-n, n, n)
    PDF = np.abs(1/r)
    PDF = PDF/np.sum(PDF) # PDF should be normalized
    R = randdist(PDF, r, num)
    Theta = 2*np.pi*np.random.rand(num,)
    x1= [R[k]*np.cos(Theta[k])*1 for k in range(num)]
    y1 = [R[k]*np.sin(Theta[k])*1 for k in range(num)]
    # Uniform distribution
    #R = np.random.uniform(-n,n,num)
    #x1= np.random.uniform(-n,n,num)
    #y1 = np.random.uniform(-n,n,num)
    zeta_list = np.column_stack((np.array(x1), np.array(y1))) # List of coordinates for the microlenses
    x = np.linspace(-n,n,n)
    y = np.linspace(-n,n,n)
    X, Y = np.meshgrid(x,y)
    M_list = np.array([0.1 for i in range(num)])
    # split zeta_list, M_list, X, and Y
    zeta_list_split = np.array_split(zeta_list, num_processes, axis=0)
    M_list_split = np.array_split(M_list, num_processes)
    X_list = [X for e in range(num_processes)]
    Y_list = [Y for e in range(num_processes)]
    alpha_list = pool.map(
        get_alpha, zip(zeta_list_split, M_list_split, X_list, Y_list))
    alpha_x = 0
    alpha_y = 0
    for e in alpha_list:
        alpha_x += e[0]
        alpha_y += e[1]
    alpha_x_y = 0
    alpha_x_x = 0
    alpha_y_y = 0
    alpha_y_x = 0
    alpha_x_y, alpha_x_x = np.gradient(alpha_x*rad2pix*Renorm,edge_order=2)
    alpha_y_y, alpha_y_x = np.gradient(alpha_y*rad2pix*Renorm,edge_order=2)
    det_A = 1 - alpha_y_y - alpha_x_x + (alpha_x_x)*(alpha_y_y) - (alpha_x_y)*(alpha_y_x)
    abs = np.absolute(det_A)
    I = abs**(-1.)
    O = np.log10(I+1)
    plt.contourf(X,Y,O,100)
The array of interest is O, and I have attached a plot of what it should look like. It can differ depending on the random distribution of points.
What I'm trying to do is to plot the mean values of O as a function of radius from the center of the grid. In the end, I want to be able to plot the average O as a function of distance from center in a 2d line graph. So I suppose the first step is to define circles of radius R, based on X and Y.
def circle(x,y):
    r = np.sqrt(x**2 + y**2)
    return r
Now I just have to figure out a way to find all the values of O, that have the same indices as equivalent values of R. Kinda confused on this part and would appreciate any help.

You can find the geometric coordinates of a circle with center (0,0) and radius R as such:
phi = np.linspace(0, 1, 50)
x = R*np.cos(2*np.pi*phi)
y = R*np.sin(2*np.pi*phi)
These values, however, will not fall on the regular pixel grid but in between.
To use them as sampling points, you can either round the values and use them as indexes, or interpolate the values from the neighbouring pixels.
Attention: The pixel indexes and the x, y are not the same. In your example (0,0) is at the picture location (50,50).
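Another way to get the whole profile at once, instead of sampling one circle at a time, is to compute the radius of every pixel and average the values that fall into each radial bin. The sketch below is one possible implementation under stated assumptions: `radial_profile` is a made-up name, `X` and `Y` are the meshgrid coordinates centred on (0, 0) as in the question, and the number of bins is arbitrary.

```python
import numpy as np

def radial_profile(values, X, Y, n_bins=50):
    """Mean of `values` in concentric rings around (0, 0)."""
    r = np.hypot(X, Y).ravel()
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    # Bin index of each pixel; the outermost pixel is clipped into the last ring.
    which = np.clip(np.digitize(r, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(which, weights=np.asarray(values, dtype=float).ravel(),
                       minlength=n_bins)
    counts = np.bincount(which, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts                 # NaN for rings that contain no pixels
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, means
```

With the arrays from the question this would be called as `centers, means = radial_profile(O, X, Y)` and plotted with `plt.plot(centers, means)`. Rings near the corners of the grid are only partially covered, so their averages come from fewer pixels.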

Astrometry of a CCD image with python

I am trying to implement a really easy astrometry code. I have manually found the coordinates of a couple of stars in my picture (RA/DEC and x/y in pixel).
Everything seems straightforward, but I still get weird results that are off by a couple of degrees.
I am trying to solve for the plate constants of a CCD image I took. I found the stars' coordinates and their positions in the picture by hand, and now I want to find the real-world coordinates of the (0,0) point.
I hope someone can help me with my code or can tell me how to do it properly.
Thanks a lot in advance!
Here is my code:
import numpy as np
import os
def astrometry(star_pos, xpix, ypix, focallength, target_RA, target_DEC):
    pi = np.pi
    DegToRad = pi / 180
    RadToDeg = 180 / pi
    n = len(star_pos)
    (target_RA, target_DEC) = (target_RA, target_DEC)
    print(target_RA, target_DEC)
    # 1) Obtain star coordinates in pixel and RA/DEC
    x_pix = [row[0] for row in star_pos]
    y_pix = [row[1] for row in star_pos]
    ra_star = [row[2] for row in star_pos]
    dec_star = [row[3] for row in star_pos]
    # 2) Calculate the standard coordinates of the stars
    X_star = np.zeros(n)
    Y_star = np.zeros(n)
    for i in range(n):
        X_star[i] = -(np.cos(DegToRad*dec_star[i])*np.sin(DegToRad*(ra_star[i] - target_RA)))/(np.cos(DegToRad*target_DEC)*np.cos(DegToRad*dec_star[i])*np.cos(DegToRad*(ra_star[i]-target_RA)) + np.sin(DegToRad*target_DEC)*np.sin(DegToRad*dec_star[i]))
        Y_star[i] = -(np.sin(DegToRad*target_DEC)*np.cos(DegToRad*dec_star[i])*np.cos(DegToRad*(ra_star[i]-target_RA)) - np.cos(DegToRad*target_DEC)*np.sin(DegToRad*dec_star[i]))/(np.cos(DegToRad*target_DEC)*np.cos(DegToRad*dec_star[i])*np.cos(DegToRad*(ra_star[i]-target_RA)) + np.sin(DegToRad*target_DEC)*np.sin(DegToRad*dec_star[i]))
    # 3) Calculate the plate constants (Check my notes)
    def calc_plate_const(k,x,y,X):
        c_down = ((x[k+1]-x[k])*(y[k]*x[k+2]-y[k+2]*x[k])-(x[k+2]-x[k])*(y[k]*x[k+1]-y[k+1]*x[k]))
        c_up = (X[k]*x[k+1]*(y[k]*x[k+2]-y[k+2]*x[k])-X[k+1]*x[k]*(y[k]*x[k+2]-y[k+2]*x[k])-X[k]*x[k+2]*(y[k]*x[k+1]-y[k+1]*x[k])-X[k+2]*x[k]*(y[k]*x[k+1]-y[k+1]*x[k]))
        c = c_up/c_down
        print('c',c)
        b = ((X[k]*x[k+1]-X[k+1]*x[k])-(x[k+1]-x[k])*c)/(y[k]*x[k+1]-y[k+1]*x[k])
        print('b',b)
        a = (X[k]-b*y[k]-c)/x[k]
        print('a', a)
        return(a,b,c)
    (a,b,c) = calc_plate_const(0,x_pix,y_pix,X_star)
    (d,e,f) = calc_plate_const(0,x_pix,y_pix,Y_star)
    print(target_RA,target_DEC)
    # 4) Calculate the standard coordinates for the object
    # HIER object at (0,0)
    (x_ob, y_ob) = (0,0)
    X_ob = a*x_ob + b*y_ob + c
    Y_ob = d*x_ob + e*y_ob + f
    print('x', x_pix, x_ob, 'y', y_pix, y_ob)
    print('X', X_star, X_ob, 'Y', Y_star, Y_ob)
    print('RA', ra_star, 'DEC', dec_star)
    # 5) Calculate the RA/DEC of the objects standard coordinates
    a = target_RA + np.arctan(DegToRad*((-X_ob)/(np.cos(DegToRad*target_DEC)- Y_ob*np.sin(DegToRad*target_DEC))))
    d = target_DEC - np.arcsin(DegToRad*((np.sin(DegToRad*target_DEC) + Y_ob*np.cos(DegToRad*target_DEC))/(np.sqrt(1 + X_ob**2 + Y_ob**2))))
    print('RA in rad', a, 'DEC in rad', d)
    print('RA',a,target_RA, 'DEC',d, target_DEC)
    return(a,d)
The input is, for example, an array with the stars' positions in pixels in the image and their real-world coordinates in degrees:
star_pos = [[1948.2, 1205.8, 132.34058333333334, -3.4429722222222225], [153.90000000000001, 1892.5, 131.08620833333333, -5.0947499999999994]
# star_pos [x_pos in pix, y_pos in pix, RA, DEC]
(x_pix, y_pix) = (0.0135, 0.0135)
# pixel size
focallength = 0.7168
(target_RA, target_DEC) = (131.683014444, -3.91890194444)
# But I am not sure how accurate that is, it is more of an assumption. I would say if I look at the star map manually, it looks quite a bit off...
I would expect to see for the (0,0) point RA to be around 133° and DEC -5.75°
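One thing worth checking in step 5 of the code above is the unit handling: `DegToRad` is applied to the argument of `arctan` and `arcsin` (which is a dimensionless ratio), and the results, which are in radians, are combined with `target_RA` and `target_DEC` in degrees. As a reference point, here is a sketch of the textbook tangent-plane (gnomonic) projection and its inverse, with all angles converted explicitly. The function names are made up, and the sign of `xi` follows the common convention, which is the opposite of `X_star` in the code above.

```python
import numpy as np

def to_standard(ra_deg, dec_deg, ra0_deg, dec0_deg):
    """Project (RA, DEC) in degrees onto the tangent plane at (ra0, dec0)."""
    ra, dec, ra0, dec0 = np.radians([ra_deg, dec_deg, ra0_deg, dec0_deg])
    denom = np.sin(dec0) * np.sin(dec) + np.cos(dec0) * np.cos(dec) * np.cos(ra - ra0)
    xi = np.cos(dec) * np.sin(ra - ra0) / denom
    eta = (np.cos(dec0) * np.sin(dec)
           - np.sin(dec0) * np.cos(dec) * np.cos(ra - ra0)) / denom
    return xi, eta

def from_standard(xi, eta, ra0_deg, dec0_deg):
    """Inverse projection: standard coordinates back to (RA, DEC) in degrees."""
    ra0, dec0 = np.radians([ra0_deg, dec0_deg])
    ra = ra0 + np.arctan2(xi, np.cos(dec0) - eta * np.sin(dec0))
    dec = np.arcsin((np.sin(dec0) + eta * np.cos(dec0))
                    / np.sqrt(1.0 + xi ** 2 + eta ** 2))
    return np.degrees(ra), np.degrees(dec)

# Round trip for the first star listed in the question.
ra0, dec0 = 131.683014444, -3.91890194444
xi, eta = to_standard(132.34058333333334, -3.4429722222222225, ra0, dec0)
ra_back, dec_back = from_standard(xi, eta, ra0, dec0)
```

A round trip like this is a cheap way to confirm that the forward and inverse formulas agree before fitting any plate constants.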
