I've got some satellite data which is essentially the geographical position of a satellite that circles the Earth, recorded at given times. The data is saved with a latitude, longitude and Unix time in a SQLite DB, and is retrieved as follows:
latitudes = [] #Long list of latitudes
longitudes = [] #Long list of longitudes
unixtimes = [] #Long list of corresponding unixtimes
So, I'm interested in distinguishing the latitude/longitude recordings for each time the satellite passes over a fairly large geographical area (i.e. for each passing). However, I'm unsure how to do this.
Until now I have manually, by visual inspection of the position plots, found the first 'occurrence' of the satellite in that area, and then found the next occurrence in the same way. The passing time is then the difference between those two events. However, this passing time varies over time, so the method is not very accurate in the long run. Another problem is that it depends on the geographical position: if I want the time of the first passing, and the passing time, for any other geographical position, I have to do the manual inspection all over again. I've included my code below. Note that the seq function is simply a function I've taken from SO that lets me iterate over non-integer increments.
import itertools

def seq(start, end, step):
    # iterate from start towards end in (possibly non-integer) steps
    assert step != 0
    sample_count = int(abs(end - start) / step)
    return itertools.islice(itertools.count(start, step), sample_count)
gridsize = 5            # Unit: degrees
upperleftlong =         # Upper corner of geographical area
upperleftlat =          # Upper corner of geographical area
lowerrightlong =        # Lower corner of geographical area
lowerrightlat =         # Lower corner of geographical area
passrate = 5500         # Time between passings in seconds
start = 1498902400      # Time of first passing
end = 1498905700        # Approximately passing length
numberofpassings = 600  # Number of passings that should be checked for

for p in range(0, numberofpassings + 1):
    start = 1398903400 + passrate * p
    end = 1398905400 + passrate * p
    for i in seq(lowerrightlat, upperleftlat + gridsize, gridsize):
        for j in seq(upperleftlong, lowerrightlong + gridsize, gridsize):
            positions = getPositionsFromDB(j, i, start, end, gridsize, databasepath, con)
So, does anyone have a clever way to determine the passing rate and passing time, and to work out which geographical positions belong to each passing?
I'm working with Python and SQLite.
From the period of your satellite (5500 seconds), I'm fairly certain that your satellite is the Space Station. Very few other satellites are normally active at that low an altitude (370 km) because of the short lifespan.
The Heavens-above site has many tools to predict sightings of the Space Station (and others). Spot-the-station is dedicated to it and provides the predictions. Satellites calculations on-line is a large collection of tools which can be of help too.
If interested in the workings of such programs, an open source project, Predict, is available with source code.
Of course, Wikipedia has to be present, with a list of apps, and references to many libraries with tools for predictions.
Note: integer increments are fine, but numpy can give you floating-point increments if you use numpy.arange. This is much more flexible, and you can work with physical, non-scaled values without the risk of running into integer overflows.
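For illustration, the grid loop from the question could be written with numpy.arange instead of the custom seq helper. This is only a sketch: the corner values below are made up purely so the snippet runs on its own.
import numpy as np

gridsize = 5  # degrees

# hypothetical corner values, purely for illustration
upperleftlat, upperleftlong = 60.0, -10.0
lowerrightlat, lowerrightlong = 40.0, 30.0

# numpy.arange accepts float steps directly, so no custom seq() is needed
for lat in np.arange(lowerrightlat, upperleftlat + gridsize, gridsize):
    for lon in np.arange(upperleftlong, lowerrightlong + gridsize, gridsize):
        print(lat, lon)  # here you would call getPositionsFromDB(lon, lat, ...)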
Summary/simplified version
Given a list of track points defined by three 1-dimensional arrays (lats, lons and dtime, all of the same length) and a gridded 3-dimensional array rr (defined by the 2-D coordinate arrays lat_radar, lon_radar and a 1-dimensional time array time_radar), I want to extract all the grid values in rr whose coordinates (latitude, longitude AND time) are closest to the three 1-dimensional arrays.
I've managed to use cKDTree to select points in space, but I don't know how to generalize the solution to space and time together. Right now I have to do the selection on time separately, which makes the code quite bulky and hard to read.
For more details about this problem, see the extended version below.
Extended version
I'm trying to develop an app that uses precipitation data obtained from weather radar composites to predict the precipitation along a track. Most apps usually predict the precipitation at a point without considering the point moving in time.
The idea is, given points identifying a track in space and time, find the closest grid points from radar data to obtain a precipitation estimate over the track (see plot). The final goal would be to shift the start time to identify the best time to leave to avoid rain.
I just optimized my previous algorithm, which used plain loops, to use cKDTree from scipy. Execution time went down from 30 s to 380 ms :). However, I think the code can still be optimized. Here is my attempt.
As input we have:
lons, lats: coordinates of the track, as arrays of length N
dtime: timedelta array of length T containing the time elapsed along the track
lon_radar, lat_radar: M x P matrices containing the coordinates of the radar data
dtime_radar: timedelta array of length Q containing the radar forecast times
rr: M x P x Q array containing the radar forecast at every time step
First, find the grid points closest to the trajectory using cKDTree:
import numpy as np
from scipy.spatial import cKDTree

combined_x_y_arrays = np.dstack([lon_radar.ravel(),
                                 lat_radar.ravel()])[0]
points_list = list(np.vstack([lons, lats]).T)

def do_kdtree(combined_x_y_arrays, points):
    # build a KD-tree on the radar grid and find the nearest grid node to each track point
    mytree = cKDTree(combined_x_y_arrays)
    dist, indexes = mytree.query(points)
    return indexes

results = do_kdtree(combined_x_y_arrays, points_list)

# As we have many duplicates, since the itinerary has a much higher resolution
# than the radar, we only keep the unique grid points
inds_itinerary = np.unique(results)
lon_lat_itinerary = combined_x_y_arrays[inds_itinerary]
Then find the closest points of the track to these grid points, in order to subset the track. It doesn't make sense to keep a track resolution of 10 m if the radar only has grid points every km.
combined_x_y_arrays = np.vstack([lons, lats]).T
points_list = list(lon_lat_itinerary)
results = do_kdtree(combined_x_y_arrays, points_list)
Now we can use these positions to get the elapsed time on the trajectory and the corresponding time steps in the radar data:
dtime_itinerary = dtime[results]
# find indices of these dtimes in radar dtime
inds_dtime_radar = np.abs(np.subtract.outer(dtime_radar, dtime_itinerary)).argmin(0)
Now we have everything we need to find the precipitation, so we only need one last loop. I also loop over shifts to obtain predictions for different start times.
shifts = (1, 3, 5, 7, 9)

rain = np.empty(shape=(len(shifts), len(inds_itinerary)))
for i, shift in enumerate(shifts):
    temp = []
    for i_time, i_space in zip(inds_dtime_radar, inds_itinerary):
        # shift the radar time index to simulate a later departure
        temp.append(rr[i_time + shift].ravel()[i_space])
    rain[i, :] = temp
In particular I would like to find a way to combine the time search with the lat-lon search for the closest points.
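One possible way to combine the spatial and temporal searches, shown here only as a sketch and not as part of the original code, is to scale time into a pseudo-coordinate and build a single 3-D cKDTree over (lon, lat, scaled time). The function name query_space_time and the time_scale factor (which converts seconds into roughly the same units as degrees) are assumptions you would have to tune to your grid spacing and radar time step; memory use grows with grid size times the number of time steps:
import numpy as np
from scipy.spatial import cKDTree

def query_space_time(lons, lats, dtime, lon_radar, lat_radar, dtime_radar,
                     time_scale=0.01):
    """Find the nearest (space, time) radar cell for every track point.

    time_scale is an assumed tuning factor that converts seconds into the
    same 'units' as degrees so one KD-tree can search space and time together.
    """
    n_time = len(dtime_radar)
    n_space = lon_radar.size

    # convert the timedeltas to scaled seconds
    t_grid = np.asarray(dtime_radar, dtype='timedelta64[s]').astype(float) * time_scale
    t_track = np.asarray(dtime, dtype='timedelta64[s]').astype(float) * time_scale

    # stack the (lon, lat, t) coordinates of every radar cell at every time step
    grid = np.column_stack([
        np.tile(lon_radar.ravel(), n_time),
        np.tile(lat_radar.ravel(), n_time),
        np.repeat(t_grid, n_space),
    ])
    tree = cKDTree(grid)

    track = np.column_stack([lons, lats, t_track])
    _, idx = tree.query(track)

    # decompose the flat index back into (time step, spatial cell)
    i_time, i_space = np.divmod(idx, n_space)
    return i_time, i_space

The returned indices can then be used just like inds_dtime_radar and inds_itinerary in the loop above (e.g. after np.unique on the spatial indices).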
I have time series data for the position of two objects. The second object roughly follows the path of the first object. I want to join the two objects with a curved line that best represents the combined paths of the two objects. This is post-processing, so I already know the future paths of both objects. I can use information about where the second object will be to compute the path. Link to .csv file of source data in Google Drive - blue is columns 0,1 and yellow is columns 3,4.
My source data looks like this:
The objects are spaced fairly equally. Object two reaches the position of object one in around 50 frames. My initial approach was to take the past 25 frames of the blue object and the future 25 frames of the yellow object, and smooth the result with scipy.signal.savgol_filter() (shown in pink).
from scipy import signal

positions = leading_object[frame_number - 25: frame_number]
positions += trailing_object[frame_number: frame_number + 25]

x, y = zip(*positions)
# savgol_filter needs an odd window length
window_length = int(len(x) * .5)
if window_length % 2 == 0:
    window_length -= 1
x = signal.savgol_filter(x, window_length, polyorder)
y = signal.savgol_filter(y, window_length, polyorder)
positions = list(zip(x, y))
This works okay, but the smoothed line jogs from one path to another. I'd like the path to be smooth.
Link to complete code used to generate animations.
You are essentially trying to do curve fitting: finding a curve that joins the two positions and interpolates some points of the two lines. As things stand, the problem is a little overdetermined in that you have rather too many points, which leads to 'kinks' in the curve.
Perhaps choosing fewer points, e.g. the 5th, 10th and 15th of each partial trajectory, to give 6 points plus your fixed endpoints, would work better.
I would then choose a curve-fitting strategy that gives good continuity of the derivatives, such as a non-uniform rational B-spline (NURBS) or maybe a Chebyshev polynomial.
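As a rough illustration of the spline idea (this is not the answer's own code: scipy's splprep fits an ordinary smoothing B-spline rather than a NURBS, and the subsampling step and smoothing factor s are assumptions to tune):
import numpy as np
from scipy import interpolate

# positions: list of (x, y) points built from the two partial trajectories,
# exactly as in the question
pts = np.asarray(positions)

# keep only every 5th point plus the two endpoints, to avoid over-constraining the fit
subset = np.vstack([pts[0], pts[5:-1:5], pts[-1]])

# fit a parametric smoothing spline through the subset (s controls smoothness)
tck, u = interpolate.splprep([subset[:, 0], subset[:, 1]], s=1.0, k=3)

# evaluate the spline densely to obtain a smooth joining curve
xs, ys = interpolate.splev(np.linspace(0, 1, 200), tck)
smooth_path = list(zip(xs, ys))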
I have point cloud data from airborne LiDAR. It is noisy, so I want to run a median filter which collects points within N metres of each point, finds the median elevation value, and returns the neighbourhood median as an adjusted elevation value.
This is roughly analogous to gridding the data, and taking the median of elevations within each bin. Or scipy.signal.medfilt.
But - I want to preserve the location (x,y) of each point. Also I'm not sure that medfilt preserves the spatial information required.
I have a method, but it involves multiple for loops, which gets expensive when millions of points go in.
Update: for each iteration of the first loop, a small patch of points is selected for the shapely intersection operation. The first version searched all input points for an intersection at every iteration; now only a small patch at a time is converted to a shapely geometry and used for the intersection:
import numpy as np
from shapely import geometry

def spatialmedian(pointcloud, radius):
    """
    Using shapely geometries, replace every point in a cloud with the
    median value of points within 'radius' units of the point.
    'pointcloud' must have no more than 3 dimensions (x, y, z)
    """
    new_z = []
    i = 0
    for point in pointcloud:
        # pick a point and make it a shapely Point
        point = geometry.Point(pointcloud[i, :])
        # select a patch around the point and make it a shapely MultiPoint
        patch = geometry.MultiPoint(list(pointcloud[
            (pointcloud[:, 0] > point.x - radius + 0.5) &
            (pointcloud[:, 0] < point.x + radius + 0.5) &
            (pointcloud[:, 1] > point.y - radius + 0.5) &
            (pointcloud[:, 1] < point.y + radius + 0.5)
            ]))
        # buffer the Point by radius
        pbuff = point.buffer(radius)
        # use the intersection method to find points in our
        # patch that lie inside the Point buffer
        isect = pbuff.intersection(patch)
        #print(isect.geom_type)
        # initialise another list
        plist = []
        # for every intersection set,
        # unpack it into a list and collect the median Z value
        if isect.geom_type == 'MultiPoint':
            #print('point has neighbours')
            for p in isect.geoms:  # .geoms works in Shapely 1.x and 2.x
                plist.append(p.z)
            new_z.append(np.median(plist))
        else:
            # if the intersection set isn't a MultiPoint,
            # it is an isolated point, whose median Z value
            # is its own
            #print('isolated point')
            # append it to the big list
            new_z.append(isect.z)
        # iterate i
        i += 1
        #print(i)
    # return a list of new median-filtered Z coordinates
    return new_z
This works by:
ingesting a list/array of XYZ points
the first for loop goes through the list and, for every point:
picks out a patch of the point cloud just bigger than the specified neighbourhood
uses shapely to place a 3 metre buffer around the point
finds the intersection of the buffer and the patch (originally the whole point cloud)
extracts the set of points from that operation in another for loop
finds the median and appends it to a list of new Z values
returning the list of new Z values
For 10^4 points, I get a result in 11 seconds. For 10^5 points 3 minutes, and most of my datasets run into 2- 5 * 10^6 points. On a 2 * 10^6 point cloud it's been running overnight.
What I want is a faster/more efficient method!
I've been tinkering with python-pcl, which is fast for filtering point clouds, but I don't know how to return indices of points which pass/fail pcl-python filters. I need those indices because each point has other attributes which must remain attached to it.
If anyone can suggest a more efficient method, please do so - I would highly appreciate your help. If it can't go faster and this code is helpful, feel free to use it.
Thanks!
After some good advice, I tried this:
#import numpy and scikit-learn's neighbours module
import numpy as np
from sklearn import neighbors as nb

#make a little time ticker
from datetime import datetime
startTime = datetime.now()

# generate a KDTree object. This takes ~95% of the
# processing time
tree = nb.KDTree(xyzi[:, 0:3], leaf_size=60)

# how long did tree generation take?
print(datetime.now() - startTime)

#initialise a list
new_z = []

#for each point, collect neighbours within radius r
nhoods = tree.query_radius(xyzi[:, 0:3], r=3)

# iterate through the list of neighbourhoods,
# find the median height, and add it to the output list
for point in nhoods:
    new_z.append(np.median(xyzi[point, 2]))

# how long did it take?
print(datetime.now() - startTime)
This version took ~33 minutes for just under two million points. Acceptable, but still could be better.
Can the KDTree generation go faster, for example with something like numba's @jit?
Is there a better method than looping through all the neighbourhoods to find the neighbourhood medians? Here, nhoods is an array of arrays, so I thought something like:
median = np.median(nhoods[:][:,2])
...would work, but it didn't.
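For what it's worth, here is a quick sketch (not from the original post) of why that last line fails and how the loop can at least be tightened: nhoods is a 1-D object array whose elements are index arrays of different lengths, so nhoods[:][:,2] is a two-axis index on a one-axis array and raises an IndexError. The per-neighbourhood median still has to be computed element by element, but it can be written as a single pass with np.fromiter:
import numpy as np

# nhoods[k] is an array of indices into xyzi for the k-th point's neighbourhood,
# exactly as returned by tree.query_radius above
new_z = np.fromiter(
    (np.median(xyzi[idx, 2]) for idx in nhoods),
    dtype=float,
    count=len(nhoods),
)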
Thanks!
I'm trying to go from alt/azi to RA/Dec for a point on the sky at a fixed location, trying out pyEphem. I've tried a couple of different ways, and I get sort of the right answer, within a couple of degrees, but I'm expecting better, and I can't figure out where the problems lie.
I've been using Canopus as a test case (I'm not after stars specifically, so I can't use the in-built catalogue). So in my case, I know that at
stn = ephem.Observer()
# yalgoo station, wa
stn.long = '116.6806'
stn.lat = '-28.3403'
stn.elevation = 328.0
stn.pressure = 0 # no refraction correction.
stn.epoch = ephem.J2000
stn.date = '2014/12/15 14:32:09' #UTC
Stellarium, cross-checked with other web sites, tells me Canopus should be at
azi, alt = '138:53:5.1', '56:09:52.6', or in equatorial coordinates RA 6h 23m 57.09s / Dec -52deg 41' 44.6"
but trying:
cano = ephem.FixedBody()
cano._ra = '6:23:57.1'
cano._dec = '-52:41:44.0'
cano._epoch = ephem.J2000
cano.compute( stn)
print( cano.az, cano.alt)
>>>(53:22:44.0, 142:08:03.0)
about 3 degrees out. I've also tried the reverse,
ra, dec = stn.radec_of('138:53:5.1', '56:09:52.6')
>>>(6:13:18.52, -49:46:09.5)
where I'm expecting 6:23 not 6:13. Turning on refraction correction makes a small difference, but not enough, and I've always understood aberration and nutation were much smaller effects than this offset as well?
As a follow up, I've tried manual calculations, based on 'Practical Astronomy with your calculator'; so for dec:
import math

LAT = math.radians(-28.340335)
LON = math.radians(116.680621667)
ALT = math.radians(56.16461)
AZ = math.radians(138.88475)

sinDEC = (math.sin(LAT) * math.sin(ALT)
          + math.cos(LAT) * math.cos(ALT) * math.cos(AZ))
DEC = math.asin(sinDEC)
DEC_deg = math.degrees(DEC)
print('dec = ', DEC_deg)
>>>('dec = ', -49.776032754148986)
again, quite different from the expected -52deg 41' 44.6", but reasonably close to pyEphem's value, so now I'm thoroughly confused! I'm starting to suspect that the problem is my understanding rather than pyEphem. Could someone enlighten me about the correct way to do RA/Dec to Alt/Azi conversions, and explain why things are not lining up?
First, some notes:
Atmospheric refraction and the relative speed between observer and object introduce at most about 0.6 degrees of error (near the horizon), which is nowhere near your error.
How can the altitude be over 90 degrees? You have the azimuth and altitude data swapped.
I put your observer data into my program and the result was similar to yours, but I visually searched for that star instead of entering its coordinates. The result was also about 3-4 degrees off in the RA axis:
RA = 6.4 h, Dec = -52.6 deg
azi = 142.4 deg, alt = 53.9 deg
My engine is in C++, using Kepler's equation.
Now, what can be wrong:
My stellar catalog could be wrongly converted
or rotated by some margin, but I strongly doubt that it amounts to 3 degrees. Perspective transforms can also add some error while rendering at 750 AU distance from the observer. I have never tested the Southern sky (it is not visible from my location).
We may be using a different Earth reference frame than the data you are comparing to
I have found that some sites, like NASA Horizons, use a reference frame which does not correspond with my observations. Look here:
calculate the time when the sun is X degrees below/above the Horizon
At the start of that answer is a link to 2 sites with different reference frames; when you compare the results they are off. The second link corresponds with my observations; the rest deals (source code included) with a Kepler's-equation-based Solar system simulation. The other sublinks are also worth looking into.
I could have a bug in my simulation/data
I have referenced my data against this engine, which could partially hide computation errors from my observer position, so keep that in mind for all of the above.
You could be using the wrong time / Julian date to sidereal time conversion
If your time is off, then the angles will not match...
How to resolve this?
Take your telescope, set up an equatorial coordinate system/mount on it, and measure RA/Dec and Azi/Alt for a known (distant) object in reality, then compare with the computed positions. Only this way can you decide which value is right or wrong (for the reference frame you are using). Do this on a star, not a planet! And do it at high altitude angles, not near the horizon!
How to transform between azimuthal and equatorial coordinates
I compute a transform matrix Earth representing the Earth's coordinate system within the heliocentric coordinate system used as the global frame, and then another matrix NEH representing the observer on the Earth's surface (North, East, High/Altitude).
After this it is just a matter of matrix and vector multiplications and conversions between Cartesian and spherical coordinate systems. Look here:
Representing Points on a Circular Radar Math approach
for more insight into azimuthal coordinates. If you use just a simple equation like the one in your example, then you do not account for many things... The Earth's position is computed by Kepler's equation; its rotation is the daily rotation with nutation and precession included.
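For reference, here is a minimal Python sketch of the plain spherical-trigonometry route for the alt/az to equatorial direction (this is not the matrix approach described above, and it ignores refraction, nutation and aberration). It assumes azimuth is measured from north through east; RA then follows from the local sidereal time, which is left as an input here:
import math

def altaz_to_hadec(alt_deg, az_deg, lat_deg):
    """Convert altitude/azimuth to hour angle/declination (all in degrees).

    Azimuth is assumed to be measured from north, increasing towards east.
    """
    alt = math.radians(alt_deg)
    az = math.radians(az_deg)
    lat = math.radians(lat_deg)

    sin_dec = math.sin(alt) * math.sin(lat) + math.cos(alt) * math.cos(lat) * math.cos(az)
    dec = math.asin(sin_dec)

    # hour angle via atan2 so the quadrant comes out right
    y = -math.sin(az) * math.cos(alt)
    x = math.sin(alt) * math.cos(lat) - math.cos(alt) * math.sin(lat) * math.cos(az)
    ha = math.atan2(y, x)

    return math.degrees(ha), math.degrees(dec)

# RA (in degrees) is then local sidereal time minus hour angle:
#   ra_deg = (lst_deg - ha_deg) % 360.0
ha, dec = altaz_to_hadec(56.16461, 138.88475, -28.340335)
print(ha, dec)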
I use 64-bit floating point values, which can introduce rounding errors, but not that large...
I use the geometric North Pole as the observer reference (this could add serious error near the poles).
The biggest thing that can affect this is the speed of light, but that matters for nearby 'moving' objects like planets, not for stars (except the Sun), because their computed position is only seen after some delay... For example, the Sun-Earth distance is about 8 light-minutes, so we see the Sun where it was 8 minutes ago. If the ephemeris data is geometric only (does not account for this), it can lead to large errors if not handled properly.
Newer ephemeris models use gravity integration instead of Kepler's equation, so their data must be geometric, and the final output is then corrected by the time shift...
Operators used to examine the spectrum, knowing the location and width of each peak, and judge which piece the spectrum belongs to. In the new way, the image is captured by a camera onto a screen, and the width of each band must be computed programmatically.
Old system: spectroscope -> human eye
New system: spectroscope -> camera -> program
What is a good method to compute the width of each band, given their approximate X-axis positions, considering that this task used to be performed perfectly by eye and must now be performed by a program?
Sorry if I am short of details, but they are scarce.
Program listing that generated the previous graph; I hope it is relevant:
import Image
from scipy import *
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = asarray(Image.open("spectrum.jpg"))

# Average the pixel values along the vertical axis
pic_avg = pic.mean(axis=2)
projection = pic_avg.sum(axis=0)

# Normalise and set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()
#print projection

# Fit function, two gaussians, adjust as needed
def fitfunc(p, x):
    return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
           p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))

errfunc = lambda p, x, y: fitfunc(p, x) - y

# Use scipy to fit, p0 is the initial guess
p0 = array([0, 20, 1, 0, 75, 10])
X = arange(len(projection))  # arange, not xrange: the fit function needs array arithmetic
p1, success = leastsq(errfunc, p0, args=(X, projection))
Y = fitfunc(p1, X)

# Output the result
print "Mean values at: ", p1[1], p1[4]

# Plot the result
from pylab import *
subplot(311)
imshow(pic)
subplot(312)
plot(projection)
subplot(313)
plot(X, Y, 'r', lw=5)
show()
Given an approximate starting point, you could use a simple algorithm that finds the local maximum closest to that point. Your fitting code may be doing that already (I wasn't sure whether you were using it successfully or not).
Here's some code that demonstrates simple peak finding from a user-given starting point:
#!/usr/bin/env python
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt
# Sample data with two peaks: small one at t=0.4, large one at t=0.8
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts-0.4)/0.1)**2) + 2*np.exp(-((ts-0.8)/0.1)**2)
# Say we have an approximate starting point of 0.35
start_point = 0.35
# Nearest index in "ts" to this starting point is...
start_index = np.argmin(np.abs(ts - start_point))
# Find the local maxima in our data by looking for a sign change in
# the first difference
# From http://stackoverflow.com/a/9667121/188535
maxes = (np.diff(np.sign(np.diff(xs))) < 0).nonzero()[0] + 1
# Find which of these peaks is closest to our starting point
index_of_peak = maxes[np.argmin(np.abs(maxes - start_index))]
print "Peak centre at: %.3f" % ts[index_of_peak]
# Quick plot showing the results: blue line is data, green dot is
# starting point, red dot is peak location
plt.plot(ts, xs, '-b')
plt.plot(ts[start_index], xs[start_index], 'og')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.show()
This method will only work if the ascent up the peak is perfectly smooth from your starting point. If it needs to be more resilient to noise, PyDSTool seems like it might help (I have not used it myself). This SciPy post details how to use it for detecting 1D peaks in a noisy data set.
So assume at this point you've found the centre of the peak. Now for the width: there are several methods you could use, but the easiest is probably the "full width at half maximum" (FWHM). Again, this is simple and therefore fragile. It will break for close double peaks or for noisy data.
The FWHM is exactly what its name suggests: you find the width of the peak where it's halfway to the maximum. Here's some code that does that (it just continues on from above):
# FWHM...
half_max = xs[index_of_peak]/2
# This finds where in the data we cross over the halfway point to our peak. Note
# that this is global, so we need an extra step to refine these results to find
# the closest crossovers to our peak.
# Same sign-change-in-first-diff technique as above
hm_left_indices = (np.diff(np.sign(np.diff(np.abs(xs[:index_of_peak] - half_max)))) > 0).nonzero()[0] + 1
# Add "index_of_peak" to result because we cut off the left side of the data!
hm_right_indices = (np.diff(np.sign(np.diff(np.abs(xs[index_of_peak:] - half_max)))) > 0).nonzero()[0] + 1 + index_of_peak
# Find closest half-max index to peak
hm_left_index = hm_left_indices[np.argmin(np.abs(hm_left_indices - index_of_peak))]
hm_right_index = hm_right_indices[np.argmin(np.abs(hm_right_indices - index_of_peak))]
# And the width is...
fwhm = ts[hm_right_index] - ts[hm_left_index]
print "Width: %.3f" % fwhm
# Plot to illustrate FWHM: blue line is data, red circle is peak, red line
# shows FWHM
plt.plot(ts, xs, '-b')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.plot(
[ts[hm_left_index], ts[hm_right_index]],
[xs[hm_left_index], xs[hm_right_index]], '-r')
plt.show()
It doesn't have to be the full width at half maximum — as one commenter points out, you can try to figure out where your operators' normal threshold for peak detection is, and turn that into an algorithm for this step of the process.
A more robust way might be to fit a Gaussian curve (or your own model) to a subset of the data centred around the peak, say from a local minimum on one side to a local minimum on the other, and use one of the parameters of that curve (e.g. sigma) to calculate the width.
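For illustration, here is a rough sketch of that Gaussian-fit idea, continuing from the toy data above (the window of 15 samples either side of the peak is an arbitrary stand-in for the local-minimum-to-local-minimum subset):
from scipy.optimize import curve_fit

def gauss(t, a, mu, sigma):
    return a * np.exp(-(t - mu)**2 / (2.0 * sigma**2))

# take a window around the peak found earlier
lo = max(index_of_peak - 15, 0)
hi = min(index_of_peak + 15, len(ts))
popt, pcov = curve_fit(gauss, ts[lo:hi], xs[lo:hi],
                       p0=[xs[index_of_peak], ts[index_of_peak], 0.05])

# sigma relates to the width via FWHM = 2*sqrt(2*ln(2))*sigma
sigma = abs(popt[2])
print("Gaussian sigma: %.3f, FWHM: %.3f" % (sigma, 2.3548 * sigma))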
I realise this is a lot of code, but I've deliberately avoided factoring out the index-finding functions to "show my working" a bit more, and of course the plotting functions are there just to demonstrate.
Hopefully this gives you at least a good starting point to come up with something more suitable to your particular set.
Late to the party, but for anyone coming across this question in the future...
Eye-movement data looks very similar to this; I'd base an approach on the one used by Nystrom + Holmqvist, 2010.
Smooth the data using a Savitzky-Golay filter (scipy.signal.savgol_filter in scipy v0.14+) to get rid of some of the low-level noise while keeping the large peaks intact. The authors recommend using order 2 and a window size of about twice the width of the smallest peak you want to be able to detect.
You can find where the bands are by arbitrarily removing all values above a certain y value (set them to numpy.nan). Then take the (nan)mean and (nan)standard deviation of the remainder, and remove all values greater than the mean + [parameter]*std (I think they use 6 in the paper). Iterate until you're not removing any data points, though depending on your data, certain values of [parameter] may not stabilise.
Then use numpy.isnan() to find events vs non-events, and numpy.diff() to find the start and end of each event (values of -1 and 1 respectively). To get even more accurate start and end points, you can scan along the data backward from each start and forward from each end to find the nearest local minimum with a value smaller than mean + [another parameter]*std (I think they use 3 in the paper). Then you just need to count the data points between each start and end.
This won't work for that double peak; you'd have to do some extrapolation for that.
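A minimal sketch of the smoothing-plus-iterative-thresholding steps described above (the multiplier 6 follows the paper as cited; the window length, the helper name find_bands and the rest are illustrative assumptions):
import numpy as np
from scipy.signal import savgol_filter

def find_bands(y, window=31, k=6.0):
    """Return (start, end) index pairs for the bands/peaks in a 1-D signal."""
    smoothed = savgol_filter(y, window_length=window, polyorder=2)

    # iteratively mask values that sit far above the current background level
    work = smoothed.astype(float).copy()
    while True:
        mean, std = np.nanmean(work), np.nanstd(work)
        above = work > mean + k * std
        if not above.any():
            break
        work[above] = np.nan

    # the masked (NaN) stretches are the events; diff of the mask gives their edges
    is_event = np.isnan(work).astype(int)
    edges = np.diff(is_event)
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1

    # a second, lower threshold (e.g. mean + 3*std) could be used here to walk
    # each start backward / each end forward to the nearest local minimum
    return list(zip(starts, ends))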
The best method might be to statistically compare a bunch of methods with human results.
You would take a large variety of data and a large variety of measurement estimates (widths at various thresholds, area above various thresholds, different threshold-selection methods, 2nd moments, polynomial curve fits of various degrees, pattern matching, etc.) and compare these estimates to human measurements of the same data sets. Pick the estimation method that correlates best with the expert human results. Or maybe pick several methods: the best one for each of various peak heights, various separations from other peaks, etc.
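As a toy illustration of that selection step (all of the numbers and names here are placeholders; in practice the estimates would come from the candidate width-measurement methods and human_widths from the operators):
import numpy as np

# hypothetical width estimates from three candidate methods, plus the
# corresponding expert (human) measurements for the same spectra
estimates = {
    "fwhm": np.array([4.2, 6.1, 3.8, 7.5]),
    "gauss_sigma": np.array([4.0, 5.9, 3.5, 7.9]),
    "second_moment": np.array([5.1, 6.8, 4.4, 8.3]),
}
human_widths = np.array([4.1, 6.0, 3.6, 7.7])

# pick the method whose estimates correlate best with the human measurements
best = max(estimates, key=lambda name: np.corrcoef(estimates[name], human_widths)[0, 1])
print("Best matching method:", best)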