How to do a second interpolation in Python

I did my first interpolation with numpy.polyfit() and numpy.polyval() for 50 longitude values for a full satellite orbit.
Now, I just want to look at a window of 0-4.5 degrees longitude and do a second interpolation so that I have 6,000 points for longitude in the window.
I need to use the equation/curve from the first interpolation to create the second one because there is only one point in the window range. I'm not sure how to do the second interpolation.
Inputs:
lon = [-109.73105744378498, -104.28690174554579, -99.2435132929552, -94.48533149079628, -89.91054414962821, -85.42671400689177, -80.94616150449806, -76.38135021210172, -71.6402674905218, -66.62178379632216, -61.21120467960157, -55.27684029674759, -48.66970878028004, -41.23083703244677, -32.813881865289346, -23.332386757370532, -12.832819226213942, -1.5659455609661785, 10.008077792630402, 21.33116444634303, 31.92601575632583, 41.51883213364072, 50.04498630545507, 57.58103957109249, 64.26993028992476, 70.2708323505337, 75.73441871754586, 80.7944079829813, 85.56734813043659, 90.1558676264546, 94.65309120129724, 99.14730128118617, 103.72658922048785, 108.48349841714494, 113.51966824008079, 118.95024882101737, 124.9072309203375, 131.5395221402974, 139.00523971191907, 147.44847902856114, 156.95146022590976, 167.46163867248032, 178.72228750873975, -169.72898181991064, -158.44642409799974, -147.8993300787564, -138.35373014113995, -129.86955508919888, -122.36868103811106, -115.70852432245486]
myOrbitJ2000Time = [ 20027712., 20027713., 20027714., 20027715., 20027716.,
20027717., 20027718., 20027719., 20027720., 20027721.,
20027722., 20027723., 20027724., 20027725., 20027726.,
20027727., 20027728., 20027729., 20027730., 20027731.,
20027732., 20027733., 20027734., 20027735., 20027736.,
20027737., 20027738., 20027739., 20027740., 20027741.,
20027742., 20027743., 20027744., 20027745., 20027746.,
20027747., 20027748., 20027749., 20027750., 20027751.,
20027752., 20027753., 20027754., 20027755., 20027756.,
20027757., 20027758., 20027759., 20027760., 20027761.]
Code:
import numpy as np
deg = 30  # polynomial degree for fit
fittime = np.asarray(myOrbitJ2000Time) - myOrbitJ2000Time[0]  # seconds since the first sample
# Longitude interpolation
fitLon = np.polyfit(fittime, lon, deg)  # gets fit coefficients
polyval_lon = np.polyval(fitLon, fittime)  # evaluates the fit at the sample times
# Get longitude values for a window of 0-4.5 deg longitude
lonwindow = []
for i in range(len(polyval_lon)):
    if 0 < polyval_lon[i] < 4.5:  # get lon vals in window
        lonwindow.append(polyval_lon[i])  # append lon vals
lonwindow = np.array(lonwindow)

First, generate the polynomial fit coefficients using the old time (x-axis) values, and interpolated longitude (y-axis) values.
import numpy as np
import matplotlib.pyplot as plt
poly_deg = 3 #degree of the polynomial fit
polynomial_fit_coeff = np.polyfit(original_times, interp_lon, poly_deg)
Next, use np.linspace() to generate arbitrary time values based on the number of desired points in the window.
start = 0
stop = 4
num_points = 6000
arbitrary_time = np.linspace(start, stop, num_points)
Finally, use the fit coefficients and the arbitrary time to get the actual interpolated longitude (y-axis) values and plot.
lon_intrp_2 = np.polyval(polynomial_fit_coeff, arbitrary_time)
plt.plot(arbitrary_time, lon_intrp_2, 'r') #interpolated window as a red curve
plt.plot(myOrbitJ2000Time, lon, '.') #original data plotted as points
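As a rough sketch of how the steps above could be applied to the 0-4.5 degree window itself: the snippet below fits only the samples surrounding the window, finds the time interval in which the fitted longitude lies inside the window, and then evaluates the same coefficients at 6,000 times in that interval. The choice of samples 14 to 21, the cubic degree and the names t_local, lon_local, t_window and lon_window are assumptions made for illustration; they are not part of the original question or answer.

```python
import numpy as np

# Samples 14..21 of the question's `lon` list (rounded to 4 decimals), i.e. the
# points surrounding the 0-4.5 degree window. Times are seconds since the first
# sample, matching `fittime` in the question.
t_local = np.arange(14.0, 22.0)
lon_local = np.array([-32.8139, -23.3324, -12.8328, -1.5659,
                      10.0081, 21.3312, 31.9260, 41.5188])

# Low-degree local fit. Degree 3 is an assumption, not the asker's deg = 30.
coeffs = np.polyfit(t_local, lon_local, 3)

# Evaluate the fit on a dense time grid and keep the times whose fitted
# longitude falls inside the window.
t_dense = np.linspace(t_local[0], t_local[-1], 100001)
lon_dense = np.polyval(coeffs, t_dense)
inside = (lon_dense >= 0.0) & (lon_dense <= 4.5)
t_start, t_stop = t_dense[inside][0], t_dense[inside][-1]

# Second, high-resolution evaluation: 6000 points inside the window.
t_window = np.linspace(t_start, t_stop, 6000)
lon_window = np.polyval(coeffs, t_window)
```

Fitting only a neighbourhood of the window also sidesteps the jump from about +179 to -170 degrees later in the orbit, which a single polynomial over all 50 samples would have to follow.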

Related

Finding the area of an overlap between curves (python)

Is it possible to calculate the area of the overlap of two curves?
I found two answers here, but they are written in R, which I am not familiar with, and I am struggling to convert them to Python.
Area between the two curves and Find area of overlap between two curves
For example, for a given dataset with defined x, y points (x1, y1, x2, y2), I am able to get the area of each curve using:
np.trapz
However, getting only the overlap is challenging, and I haven't found a solution to show. Any guidance or maths formulas will be appreciated.
So this can be done using the shapely module within Python.
First, join the two curves together to create one self-intersecting polygon (shown in the code below).
Then, using the unary_union() function from shapely, you will:
Split the complex polygon into separate simple polygons.
Find the area of each simple polygon.
Sum it to find the overall area of the two curves.
Full code shown below:
import numpy as np
from shapely.geometry import LineString
from shapely.ops import unary_union, polygonize
avg_coords = [(0.0, 0.0), (4.872117, 2.29658), (5.268545, 2.4639225), (5.664686, 2.6485724), (6.059776, 2.8966842), (6.695151, 3.0986626), (7.728006, 3.4045217), (8.522297, 3.652668), (9.157002, 3.895031), (10.191483, 4.1028132), (10.827622, 4.258638), (11.38593, 4.2933016), (11.86478, 4.3048816), (12.344586, 4.258769), (12.984073, 4.2126703), (13.942729, 4.1781383), (14.58212, 4.137809), (15.542498, 3.99943), (16.502588, 3.878359), (17.182951, 3.7745714), (18.262657, 3.6621647), (19.102558, 3.567045), (20.061789, 3.497897), (21.139917, 3.4806826), (22.097425, 3.5153809), (23.65388, 3.5414772), (24.851482, 3.541581), (26.04966, 3.507069), (27.72702, 3.463945), (28.925198, 3.429433), (29.883854, 3.3949006), (31.08246, 3.3344274), (31.92107, 3.317192), (33.716183, 3.3952322), (35.63192, 3.4213595), (37.427895, 3.4474766), (39.343628, 3.473604), (41.49874, 3.508406), (43.773468, 3.5518723), (46.287716, 3.595359), (49.28115, 3.6302335), (52.633293, 3.6997545), (54.30922, 3.7431688), (55.8651, 3.8038807), (58.738773, 3.8387446), (60.893887, 3.8735466), (63.647655, 3.9170544), (66.760704, 3.960593), (68.79663, 3.9607692), (70.23332, 3.986855), (72.867905, 3.995737), (75.38245, 4.0219164), (77.778656, 3.9615464), (79.337975, 3.8145657), (80.41826, 3.6675436), (80.899734, 3.5204697), (81.62059, 3.38207), (82.34045, 3.3042476), (83.30039, 3.1918304), (84.38039, 3.062116), (84.50359, 2.854434), (83.906364, 2.7591898), (83.669716, 2.586092), (83.43435, 2.3351095), (83.19727, 2.1879735), (82.84229, 1.9283267), (82.48516, 1.7984879), (81.65014, 1.5993768), (80.454544, 1.4781193), (79.13962, 1.3308897), (77.944595, 1.1750168), (76.39001, 1.0364205), (74.59633, 0.87184185), (71.60447, 0.741775), (70.04903, 0.6551017), (58.3, 0.0)]
model_coords = [(0.0, 0.0), (0.6699889, 0.18807), (1.339894, 0.37499), (2.009583, 0.55966), (2.67915, 0.74106), (3.348189, 0.91826), (4.016881, 1.0904), (4.685107, 1.2567), (5.359344, 1.418), (6.026172, 1.5706), (6.685472, 1.714), (7.350604, 1.8508), (8.021434, 1.9803), (8.684451, 2.0996), (9.346408, 2.2099), (10.0066, 2.311), (10.66665, 2.4028), (11.32436, 2.4853), (11.98068, 2.5585), (12.6356, 2.6225), (13.29005, 2.6775), (13.93507, 2.7232), (14.58554, 2.7609), (15.23346, 2.7903), (15.87982, 2.8116), (16.52556, 2.8254), (17.16867, 2.832), (17.80914, 2.8317), (18.44891, 2.825), (19.08598, 2.8124), (19.72132, 2.7944), (20.35491, 2.7713), (20.98673, 2.7438), (21.61675, 2.7121), (22.24398, 2.677), (22.86939, 2.6387), (23.49297, 2.5978), (24.1147, 2.5548), (24.73458, 2.51), (25.3526, 2.464), (25.96874, 2.4171), (26.58301, 2.3697), (27.1954, 2.3223), (27.80491, 2.2751), (28.41354, 2.2285), (29.02028, 2.1829), (29.62512, 2.1384), (30.22809, 2.0954), (30.82917, 2.0541), (31.42837, 2.0147), (32.02669, 1.9775), (32.62215, 1.9425), (33.21674, 1.9099), (33.80945, 1.8799), (34.40032, 1.8525), (34.98933, 1.8277), (35.5765, 1.8058), (36.16283, 1.7865), (36.74733, 1.7701), (37.33002, 1.7564), (37.91187, 1.7455), (38.49092, 1.7372), (39.06917, 1.7316), (39.64661, 1.7285), (40.22127, 1.7279), (40.79514, 1.7297), (41.36723, 1.7337), (41.93759, 1.7399), (42.50707, 1.748), (43.07386, 1.7581), (43.63995, 1.7699), (44.20512, 1.7832), (44.76772, 1.7981), (45.3295, 1.8143), (45.88948, 1.8318), (46.44767, 1.8504), (47.00525, 1.8703), (47.55994, 1.8911), (48.11392, 1.9129), (48.6661, 1.9356), (49.21658, 1.959), (49.76518, 1.9832), (50.31305, 2.0079), (50.85824, 2.033), (51.40252, 2.0586), (51.94501, 2.0845), (52.48579, 2.1107), (53.02467, 2.1369), (53.56185, 2.1632), (54.09715, 2.1895), (54.63171, 2.2156), (55.1634, 2.2416), (55.69329, 2.2674), (56.22236, 2.2928), (56.74855, 2.3179), (57.27392, 2.3426), (57.7964, 2.3668), (58.31709, 2.3905), (58.83687, 2.4136), (59.35905, 2.4365), 
(59.87414, 2.4585), (60.38831, 2.4798), (60.8996, 2.5006), (61.40888, 2.5207), (61.91636, 2.5401), (62.42194, 2.5589), (62.92551, 2.577), (63.42729, 2.5945), (63.92607, 2.6113), (64.42384, 2.6275), (64.91873, 2.643), (65.4127, 2.658), (65.90369, 2.6724), (66.39266, 2.6862), (66.87964, 2.6995), (67.36373, 2.7123), (67.84679, 2.7246), (68.32689, 2.7364), (68.80595, 2.7478), (69.28194, 2.7588), (69.756, 2.7695), (70.22709, 2.7798), (70.69707, 2.7898), (71.16405, 2.7995), (71.62902, 2.809), (72.0919, 2.8183), (72.55277, 2.8273), (73.01067, 2.8362), (73.46734, 2.845), (73.92112, 2.8536), (74.37269, 2.8622), (74.82127, 2.8706), (75.26884, 2.8791), (75.71322, 2.8875), (76.15559, 2.8958), (76.59488, 2.9042), (77.03304, 2.9126), (77.46812, 2.921), (77.90111, 2.9294), (78.33199, 2.9379), (78.75986, 2.9464), (79.18652, 2.955), (79.60912, 2.9637), (80.03049, 2.9724), (80.44985, 2.9811), (80.86613, 2.99), (81.2802, 2.9989), (81.69118, 3.0078), (82.10006, 3.0168), (82.50674, 3.0259), (82.91132, 3.035), (83.31379, 3.0441), (83.71307, 3.0533), (84.10925, 3.0625), (84.50421, 3.0717), (84.8961, 3.0809), (85.28577, 3.0901), (85.67334, 3.0993), (86.05771, 3.1085), (86.43989, 3.1176), (86.81896, 3.1267), (87.19585, 3.1358), (87.57063, 3.1448), (87.94319, 3.1537), (88.31257, 3.1626), (88.67973, 3.1713), (89.04372, 3.18), (89.40659, 3.1886), (89.7652, 3.197), (90.12457, 3.2053), (90.47256, 3.2135), (90.82946, 3.2216), (91.17545, 3.2295), (91.52045, 3.2373), (91.86441, 3.2449), (92.20641, 3.2524), (92.54739, 3.2597), (92.88728, 3.2669), (93.21538, 3.2739), (93.55325, 3.2807), (93.87924, 3.2874), (94.20424, 3.2939), (94.52822, 3.3002), (94.85012, 3.3064), (95.16219, 3.3123), (95.48208, 3.3182), (95.79107, 3.3238), (96.09807, 3.3293), (96.40505, 3.3346), (96.71003, 3.3397), (97.01401, 3.3447), (97.31592, 3.3496), (97.60799, 3.3542), (97.90789, 3.3587), (98.19686, 3.3631), (98.48386, 3.3673), (98.77085, 3.3714), (99.05574, 3.3753), (99.32983, 3.3791), (99.6127, 3.3828), (99.8837, 3.3863), 
(100.1538, 3.3897), (100.4326, 3.393), (100.6897, 3.3961), (100.9566, 3.3991), (101.2215, 3.402), (101.4756, 3.4048), (101.7375, 3.4075), (101.9885, 3.4101), (102.2385, 3.4126), (102.4875, 3.4149), (102.7354, 3.4172), (102.9714, 3.4194), (103.2163, 3.4214), (103.4493, 3.4234), (103.6823, 3.4253), (103.9133, 3.4271), (104.1433, 3.4288), (104.3712, 3.4304), (104.5882, 3.4319), (104.8141, 3.4333), (105.0291, 3.4346), (105.2421, 3.4358), (105.4541, 3.437), (105.6651, 3.438), (105.8751, 3.439), (106.083, 3.4399), (106.28, 3.4407), (106.4759, 3.4414), (106.6699, 3.442), (106.8629, 3.4425), (107.0549, 3.443), (107.2458, 3.4433), (107.4249, 3.4435), (107.6128, 3.4437), (107.7897, 3.4438), (107.9647, 3.4437), (108.1387, 3.4436), (108.3116, 3.4433), (108.4737, 3.443), (108.6436, 3.4426), (108.8027, 3.4421), (108.9706, 3.4414), (109.1265, 3.4407), (109.2814, 3.4399), (109.4255, 3.439), (109.5784, 3.4379), (109.7195, 3.4368), (109.8694, 3.4356), (110.0084, 3.4342), (110.1454, 3.4328), (110.2813, 3.4313), (110.4162, 3.4296), (110.5403, 3.4279), (110.6722, 3.426), (110.7932, 3.424), (110.9132, 3.422), (111.0322, 3.4198), (111.1492, 3.4175), (111.2651, 3.4151), (111.3701, 3.4127), (111.483, 3.4101), (111.585, 3.4074), (111.686, 3.4046), (111.786, 3.4017), (111.884, 3.3987), (111.9809, 3.3956), (112.0669, 3.3924), (112.1608, 3.3891), (112.2448, 3.3857), (112.3268, 3.3822), (112.4078, 3.3786), (112.4867, 3.3749), (112.5548, 3.3711), (112.6317, 3.3672), (112.6978, 3.3632), (112.7726, 3.3591), (112.8356, 3.3549), (112.8975, 3.3506), (112.9476, 3.3462), (113.0076, 3.3417), (113.0655, 3.3372), (113.1125, 3.3325), (113.1584, 3.3278), (113.2024, 3.3229), (113.2464, 3.318), (113.2884, 3.313), (113.3283, 3.3079), (113.3584, 3.3027), (113.3963, 3.2974), (113.4233, 3.292), (113.4492, 3.2865), (113.4742, 3.281), (113.4972, 3.2753), (113.5201, 3.2696), (113.5312, 3.2638), (113.5501, 3.2579), (113.5591, 3.2519), (113.5661, 3.2459), (113.5721, 3.2397), (113.577, 3.2335), (113.5809, 3.2272), 
(113.573, 3.2208), (113.5749, 3.2143), (113.5649, 3.2077), (113.5539, 3.2011), (113.5409, 3.1944), (113.5278, 3.1876), (113.5128, 3.1807), (113.4967, 3.1737), (113.4697, 3.1667), (113.4418, 3.1596), (113.4227, 3.1524), (113.3917, 3.145), (113.3597, 3.1375), (113.3266, 3.1298), (113.2827, 3.1218), (113.2475, 3.1136), (113.2016, 3.1051), (113.1635, 3.0964), (113.1155, 3.0873), (113.0655, 3.0779), (113.0144, 3.0683), (112.9525, 3.0583), (112.8994, 3.048), (112.8345, 3.0373), (112.7793, 3.0264), (112.7123, 3.0152), (112.6453, 3.0037), (112.5763, 2.9919), (112.5063, 2.9798), (112.4352, 2.9674), (112.3533, 2.9548), (112.2801, 2.9419), (112.1952, 2.9287), (112.1102, 2.9153), (112.034, 2.9017), (111.9361, 2.8879), (111.8481, 2.8739), (111.7581, 2.8597), (111.667, 2.8453), (111.5661, 2.8307), (111.473, 2.816), (111.3689, 2.801), (111.2639, 2.786), (111.1579, 2.7708), (111.0509, 2.7555), (110.9428, 2.74), (110.8239, 2.7245), (110.7138, 2.7088), (110.5928, 2.6931), (110.4709, 2.6772), (110.3578, 2.6613), (110.2338, 2.6453), (110.1087, 2.6292), (109.9826, 2.613), (109.8457, 2.5968), (109.7176, 2.5805), (109.5787, 2.5642), (109.4496, 2.5478), (109.3086, 2.5314), (109.1666, 2.5149), (109.0236, 2.4984), (108.8806, 2.4819), (108.7355, 2.4653), (108.5905, 2.4488), (108.4434, 2.4322), (108.2865, 2.4155), (108.1384, 2.3989), (107.9794, 2.3822), (107.8195, 2.3655), (107.6684, 2.3488), (107.5063, 2.3321), (107.3374, 2.3156), (107.1744, 2.2989), (107.0104, 2.2822), (106.8442, 2.2654), (106.6683, 2.2487), (106.5012, 2.232), (106.3242, 2.2152), (106.1452, 2.1985), (105.9662, 2.1818), (105.7862, 2.165), (105.6052, 2.1483), (105.4232, 2.1316), (105.2402, 2.1149), (105.0572, 2.0981), (104.8721, 2.0814), (104.6772, 2.0647), (104.492, 2.048), (104.295, 2.0313), (104.098, 2.0147), (103.9, 1.998), (103.701, 1.9813), (103.502, 1.9647), (103.301, 1.948), (103.1, 1.9314), (102.899, 1.9148), (102.6959, 1.8982), (102.483, 1.8816), (102.2789, 1.865), (102.0649, 1.8484), (101.8588, 1.8318), (101.6428, 
1.8153), (101.4268, 1.7988), (101.2098, 1.7822), (100.9918, 1.7657), (100.7728, 1.7492), (100.5538, 1.7328), (100.3338, 1.7163), (100.1128, 1.6999), (99.89169, 1.6834), (99.65978, 1.667), (99.43769, 1.6506), (99.20477, 1.6343), (98.98066, 1.6179), (98.74665, 1.6016), (98.51164, 1.5852), (98.27574, 1.5689), (98.04964, 1.5527), (97.81264, 1.5364), (97.57562, 1.5202), (97.33752, 1.5039), (97.08962, 1.4877), (96.8506, 1.4716), (96.61061, 1.4554), (96.37051, 1.4393), (96.12058, 1.4232), (95.87949, 1.4071), (95.62759, 1.391), (95.38547, 1.375), (95.13258, 1.359), (94.88946, 1.343), (94.63548, 1.3271), (94.38145, 1.3111), (94.12645, 1.2952), (93.87144, 1.2793), (93.61545, 1.2635), (93.35946, 1.2477), (93.10343, 1.2319), (92.84642, 1.2161), (92.58843, 1.2004), (92.33042, 1.1846), (92.07232, 1.169), (91.8034, 1.1533), (91.54331, 1.1377), (91.2744, 1.1221), (91.0133, 1.1065), (90.7434, 1.091), (90.48229, 1.0755), (90.21139, 1.0601), (89.9493, 1.0446), (89.67728, 1.0292), (89.40428, 1.0139), (89.13137, 0.99855), (88.86826, 0.98325), (88.59427, 0.96799), (88.32026, 0.95277), (88.04527, 0.93758), (87.77126, 0.92242), (87.4972, 0.90731), (87.21732, 0.89222), (86.94719, 0.87718), (86.66711, 0.86217), (86.3773, 0.8472), (86.10719, 0.83227), (85.82721, 0.81738), (85.5472, 0.80252), (85.26721, 0.7877), (84.9872, 0.77292), (84.7071, 0.75819), (84.41721, 0.74349), (84.1371, 0.72883), (83.84721, 0.71421), (83.5671, 0.69963), (83.27721, 0.68509), (82.99711, 0.6706), (82.70711, 0.65615), (82.41721, 0.64173), (82.1371, 0.62736), (81.8471, 0.61304), (81.55722, 0.59875), (81.27709, 0.58451), (80.98712, 0.57031), (80.697, 0.55616), (80.39711, 0.54205), (80.10722, 0.52798), (79.8271, 0.51396), (79.53701, 0.49999), (79.23711, 0.48605), (78.9471, 0.47217), (78.65701, 0.45833), (78.3571, 0.44453), (78.06712, 0.43078), (77.77701, 0.41708), (77.4771, 0.40343), (77.18701, 0.38982), (76.8871, 0.37626), (76.59711, 0.36274), (76.30701, 0.34928), (76.0071, 0.33586), (75.7169, 0.32249), (75.4071, 
0.30917), (75.11701, 0.29589), (74.8171, 0.28267), (74.52701, 0.26949), (74.22711, 0.25636), (73.937, 0.24329), (73.63691, 0.23026), (73.3271, 0.21728), (73.03699, 0.20436), (72.73712, 0.19148), (72.4469, 0.17865), (72.13712, 0.16588), (71.84701, 0.15315), (71.547, 0.14048), (71.24701, 0.12786), (70.947, 0.11528), (70.64701, 0.10277), (70.3471, 0.090298), (70.05691, 0.077883), (69.74712, 0.06552), (69.457, 0.05321), (69.1569, 0.040952), (68.84709, 0.028747), (68.557, 0.016595), (68.25701, 0.0)]
polygon_points = []  # creates an empty list where we will append the points to create the polygon
for xyvalue in avg_coords:
    polygon_points.append([xyvalue[0], xyvalue[1]])  # append all xy points for curve 1
for xyvalue in model_coords[::-1]:
    polygon_points.append([xyvalue[0], xyvalue[1]])  # append all xy points for curve 2 in reverse order (from last point to first point)
for xyvalue in avg_coords[0:1]:
    polygon_points.append([xyvalue[0], xyvalue[1]])  # append the first point in curve 1 again, so that it "closes" the polygon
avg_poly = []
model_poly = []
for xyvalue in avg_coords:
    avg_poly.append([xyvalue[0], xyvalue[1]])
for xyvalue in model_coords:
    model_poly.append([xyvalue[0], xyvalue[1]])
line_non_simple = LineString(polygon_points)
mls = unary_union(line_non_simple)
Area_cal = []
for polygon in polygonize(mls):
    Area_cal.append(polygon.area)
    print(polygon.area)  # print area of each section
Area_poly = (np.asarray(Area_cal).sum())
print(Area_poly)  # print combined area
If possible, represent your overlap regions as polygons. From there the polygon area is computable by a remarkably concise formula as explained on Paul Bourke's site.
Suppose (x[i], y[i]), i = 0, ..., N, are the polygon vertices, with (x[0], y[0]) = (x[N], y[N]) so that the polygon is closed, and consistently all in clockwise order or all in counter-clockwise order. Then the area is
area = |0.5 * sum_i (x[i] * y[i+1] - x[i+1] * y[i])|
where the sum goes over i = 0, ..., N-1. This is valid even for nonconvex polygons. This formula is essentially the same principle behind how a planimeter works to measure area of an arbitrary two-dimensional shape, a special case of Green's theorem.
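A minimal NumPy sketch of the formula above, in case it helps. The function name polygon_area and the two test polygons are made up for illustration. The vertex list is passed without repeating the first point, because np.roll pairs the last vertex with the first one.

```python
import numpy as np

def polygon_area(x, y):
    """Shoelace area of a simple polygon given its vertices in order."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.roll(..., -1) shifts each array so that index i holds vertex i + 1,
    # with the last vertex wrapping around to the first.
    return 0.5 * abs(np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y))

# Unit square and a nonconvex L-shaped polygon (2x1 bar plus a 1x1 block).
unit_square_area = polygon_area([0, 1, 1, 0], [0, 0, 1, 1])
l_shape_area = polygon_area([0, 2, 2, 1, 1, 0], [0, 0, 1, 1, 2, 2])
```

The absolute value makes the result independent of whether the vertices run clockwise or counter-clockwise.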
If your curves are actually functions, meaning that no vertical line intersects them more than once, then finding the overlaps is a matter of finding zeros.
import numpy as np
import matplotlib.pyplot as plt
dx = 0.01
x = np.arange(-2, 2, dx)
f1 = np.sin(4*x)
f2 = np.cos(4*x)
plt.plot(x, f1)
plt.plot(x, f2)
eps = 1e-1  # threshold of intersection points.
df = f1 - f2
idx_zeros = np.where(abs(df) <= eps)[0]
area = 0
for i in range(len(idx_zeros) - 1):
    idx_left = idx_zeros[i]
    idx_rite = idx_zeros[i+1]
    area += abs(np.trapz(df[idx_left:idx_rite])) * dx
I have assumed areas to be considered positive.
The analytical value for the example I used is sufficiently close to the computed value (area = 2.819). Of course, you can improve this by making the grid finer and the threshold eps smaller.
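One possible variation on the same idea, sketched under the same assumption that both curves are sampled on a common grid: instead of a threshold eps, look for sign changes of f1 - f2 and integrate the absolute difference between the first and last crossing. The names crossings, x_cross and gap are illustrative only, and the trapezoidal rule is written out by hand rather than calling np.trapz. For sin(4x) and cos(4x) the crossings are a quarter period apart and each enclosed lobe has area sqrt(2)/2, so the four lobes between the five crossings in this range add up to 2*sqrt(2), about 2.828.

```python
import numpy as np

dx = 0.001
x = np.arange(-2.0, 2.0, dx)
df = np.sin(4 * x) - np.cos(4 * x)  # difference of the two example curves

# Indices i where df changes sign between x[i] and x[i + 1].
crossings = np.where(df[:-1] * df[1:] < 0)[0]

# Linear interpolation for the x position of each crossing.
x_cross = x[crossings] - df[crossings] * dx / (df[crossings + 1] - df[crossings])

# Integrate |f1 - f2| over the grid points between the first and last
# crossing with a hand-written trapezoidal rule.
inside = (x >= x_cross[0]) & (x <= x_cross[-1])
xs, gap = x[inside], np.abs(df[inside])
area = np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(xs))
```

Splitting the integral at the entries of x_cross would give the area of each enclosed region separately.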

Xarray Data Array from netcdf returns numpy grid array larger than input

I have a netcdf file with float values representing chlorophyll concentration at latitudes and longitudes. I am trying to draw a line between two sets of lats/lons and return all chlorophyll values from points on the line.
I'm approaching it from a geometry point of view: for points (x1, y1) and (x2, y2), find the slope and intercept of the line and return all values of x for given values of y on the line. Once I have all x and y values (longitude and latitude) I hope to input those into the xarray select method to return the chlorophyll concentration.
import numpy as np
import xarray
ds = '~/apr1.nc'
ds = xarray.open_dataset(ds, decode_times=False)
x1, y1 = [34.3282, 32.4791]
x2, y2 = [34.7, 32.21]
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - (slope * x1)
step = 0.1  # assumed; inferred from the spacing of the line_lons output below
line_lons = np.arange(x1, x2, step)
line_lats = [slope * x + intercept for x in line_lons]
values = ds.CHL.sel(lat=line_lats, lon=line_lons, method='nearest')
values.values
>>> array([[[0.0908799 , 0.06634101, 0.07615771, 0.16289435],
[0.06787204, 0.07480557, 0.0655338 , 0.06064864],
[0.06352911, 0.06586582, 0.06702182, 0.10024723],
[0.0789495 , 0.07035938, 0.07455409, 0.08405576]]], dtype=float32)
line_lons
>>> array([34.3282, 34.4282, 34.5282, 34.6282])
I want to create a plot with longitudes on the x axis and values on the y axis. The problem is that the returned NumPy array has a shape of (1, 4, 4), while there are only 4 longitudes, so there are far more values in the returned array than input points.
plt.plot(line_lons, values.values)
Any idea why that is and how I can return one value for one input?
Thanks.
I assume it is because, by default, your output was taken from a box instead of along the selected transect.
I propose a more complex solution with NumPy and netCDF4, where you first build the transect from arbitrary coordinates and then turn these into the closest unique coordinates from the input file (unique, so that each point along the transect is encountered only once).
Afterwards, once you know your output coordinates, you have two ways to extract the data along the transect:
a) find the indices of the corresponding coordinates
b) interpolate the original data to those coordinates (either nearest or bilinear method)
Here is the code:
#!/usr/bin/env ipython
# --------------------------------------------------------------------------------------------------------------
import numpy as np
from netCDF4 import Dataset
# -----------------------------
# coordinates:
x1, y1 = [10., 55.]
x2, y2 = [20., 58.]
# --------------------------------
# ==============================================================================================================
# create some test data:
nx,ny = 100,100
dataout = np.random.random((ny,nx));
# -------------------------------
lonout=np.linspace(9.,30.,nx);
latout=np.linspace(54.,66.,ny);
# make data:
ncout=Dataset('test.nc','w',format='NETCDF3_CLASSIC');
ncout.createDimension('lon',nx);
ncout.createDimension('lat',ny);
ncout.createDimension('time',None);
ncout.createVariable('lon','float64',('lon'));ncout.variables['lon'][:]=lonout;
ncout.createVariable('lat','float64',('lat'));ncout.variables['lat'][:]=latout;
ncout.createVariable('var','float32',('lat','lon'));ncout.variables['var'][:]=dataout;
ncout.close()
#=================================================================================================================
# CUT THE DATA FROM FILE:
# make some arbitrary line between start-end point, later let us convert it to indices:
coords=np.linspace(x1+1j*y1,x2+1j*y2,1000);
xo=np.real(coords);yo=np.imag(coords);
# ------------------------------------------------------
# get transect:
ncin = Dataset('test.nc');
lonin=ncin.variables['lon'][:];
latin=ncin.variables['lat'][:];
# ------------------------------------------------------
# get the transect indices:
rxo=np.array([np.squeeze(np.min(lonout[np.where(np.abs(lonout-val)==np.abs(lonout-val).min())])) for val in xo]);
ryo=np.array([np.squeeze(np.min(latout[np.where(np.abs(latout-val)==np.abs(latout-val).min())])) for val in yo]);
rcoords=np.unique(rxo+1j*ryo);
rxo=np.real(rcoords);ryo=np.imag(rcoords);
# ------------------------------------------------------
ixo=[int(np.squeeze(np.where(lonin==val))) for val in rxo];
jxo=[int(np.squeeze(np.where(latin==val))) for val in ryo];
# ------------------------------------------------------
# get var data along transect:
trans_data=np.array([ncin.variables['var'][jxo[ii],ixo[ii]] for ii in range(len(ixo))]);
# ------------------------------------------------------
ncin.close()
# ================================================================================================================
# Another solution using interpolation, when we already know the target coordinates (original coordinates along the transect):
from scipy.interpolate import griddata
ncin = Dataset('test.nc');
lonin=ncin.variables['lon'][:];
latin=ncin.variables['lat'][:];
varin=ncin.variables['var'][:];
ncin.close()
# ----------------------------------------------------------------------------------------------------------------
lonm,latm = np.meshgrid(lonin,latin);
trans_data_b=griddata((lonm.flatten(),latm.flatten()),varin.flatten(),(rxo,ryo),'nearest')
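The shape mismatch from the question can be reproduced with plain NumPy, which may make the "box versus transect" distinction clearer. The sketch below uses a made-up regular grid (grid_lons, grid_lats, chl are hypothetical stand-ins for the netCDF variables) and compares outer indexing, which returns every combination of the selected rows and columns, with pointwise indexing, which returns one value per (lat, lon) pair.

```python
import numpy as np

# Hypothetical regular grid standing in for the file's lat/lon axes and data.
grid_lons = np.linspace(34.0, 35.0, 101)
grid_lats = np.linspace(32.0, 33.0, 101)
chl = np.random.default_rng(0).random((grid_lats.size, grid_lons.size))

# Four points along the line between the question's two end points.
x1, y1 = 34.3282, 32.4791
x2, y2 = 34.7, 32.21
line_lons = np.linspace(x1, x2, 4)
line_lats = np.linspace(y1, y2, 4)

# Nearest grid index for every point on the line.
ix = np.abs(grid_lons[None, :] - line_lons[:, None]).argmin(axis=1)
iy = np.abs(grid_lats[None, :] - line_lats[:, None]).argmin(axis=1)

box = chl[np.ix_(iy, ix)]  # outer indexing: all lat/lon combinations, shape (4, 4)
transect = chl[iy, ix]     # pointwise indexing: one value per point, shape (4,)
```

As far as I know, xarray gives the pointwise behaviour when the lat and lon indexers passed to .sel are xarray.DataArray objects sharing a common dimension name, whereas plain lists are applied to each dimension independently, as in the box case above.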

How to isolate data that deviate 2 and 3 sigma from the mean and then mark them in a plot in Python?

I am reading a dataset which looks like the following when plotted in matplotlib, and I then take the best-fit curve using linear regression.
The sample of data looks like the following:
# ID X Y px py pz M R
1.04826492772e-05 1.04828050287e-05 1.048233088e-05 0.000107002791008 0.000106552433081 0.000108704469007 387.02 4.81947797625e+13
1.87380963036e-05 1.87370588085e-05 1.87372620448e-05 0.000121616280029 0.000151924707761 0.00012371156585 428.77 6.54636174067e+13
3.95579877816e-05 3.95603773653e-05 3.95610756809e-05 0.000163470663023 0.000265203868883 0.000228031803626 470.74 8.66961875758e+13
My code looks like the following:
# Regression Function
def regress(x, y):
    # Return a tuple of predicted y values and parameters for linear regression.
    p = sp.stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = sp.polyval([b1, b0], x)
    return y_pred, p
# plotting z
xz, yz = M, Y_z # data, non-transformed
y_pred, _ = regress(xz, np.log(yz)) # change here # transformed input
plt.semilogy(xz, yz, marker='o',color ='b', markersize=4,linestyle='None', label="l.o.s within R500")
plt.semilogy(xz, np.exp(y_pred), "b", label = 'best fit') # transformed output
However, I can see a lot of upward scatter in the data, and the best-fit curve is affected by it. So first I want to isolate the data points which are 2 and 3 sigma away from the mean of my data, and mark them with a circle around them.
Then I want to take the best-fit curve considering only the points which fall within 1 sigma of the mean of my data.
Is there a good function in Python which can do that for me?
In addition, can I also isolate those rows from my actual dataset? For example, if the third row in the sample input represents a 2 sigma deviation, can I get that row as output too, to save it and investigate further later?
Your help is most appreciated.
Here's some code that goes through the data in a given number of windows, calculates statistics in said windows, and separates data in well- and misbehaved lists.
Hope this helps.
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
num_data = 10000
fake_data_x = np.sort(12.8+np.random.random(num_data))
fake_data_y = np.exp(fake_data_x) + np.random.normal(0,scale=50000,size=num_data)
# Regression Function
def regress(x, y):
    # Return a tuple of predicted y values and parameters for linear regression.
    p = stats.linregress(x, y)
    b1, b0, r, p_val, stderr = p
    y_pred = np.polyval([b1, b0], x)  # np.polyval replaces the deprecated scipy.polyval alias
    return y_pred, p
# plotting z
xz, yz = fake_data_x, fake_data_y # data, non-transformed
y_pred, _ = regress(xz, np.log(yz)) # change here # transformed input
plt.figure()
plt.semilogy(xz, yz, marker='o',color ='b', markersize=4,linestyle='None', label="l.o.s within R500")
plt.semilogy(xz, np.exp(y_pred), "b", label = 'best fit') # transformed output
plt.show()
num_bin_intervals = 10  # approx number of data points per averaging window
window_boundaries = np.linspace(min(fake_data_x),max(fake_data_x),int(len(fake_data_x)/num_bin_intervals))  # window boundaries
y_good = []  # list to collect the "well-behaved" y-axis data
x_good = []  # list to collect the "well-behaved" x-axis data
y_outlier = []
x_outlier = []
for i in range(len(window_boundaries)-1):
    # create a boolean mask to select the data within the averaging window
    window_indices = (fake_data_x<=window_boundaries[i+1]) & (fake_data_x>window_boundaries[i])
    # separate the pieces of data in the window
    fake_data_x_slice = fake_data_x[window_indices]
    fake_data_y_slice = fake_data_y[window_indices]
    # calculate the mean and standard deviation of the y values in the window
    y_mean = np.mean(fake_data_y_slice)
    y_std = np.std(fake_data_y_slice)
    # choose and select the outliers
    y_outliers = fake_data_y_slice[np.abs(fake_data_y_slice-y_mean)>=2*y_std]
    x_outliers = fake_data_x_slice[np.abs(fake_data_y_slice-y_mean)>=2*y_std]
    # choose and select the good ones
    y_goodies = fake_data_y_slice[np.abs(fake_data_y_slice-y_mean)<2*y_std]
    x_goodies = fake_data_x_slice[np.abs(fake_data_y_slice-y_mean)<2*y_std]
    # extend the lists with all the good and the bad
    y_good.extend(list(y_goodies))
    y_outlier.extend(list(y_outliers))
    x_good.extend(list(x_goodies))
    x_outlier.extend(list(x_outliers))
plt.figure()
plt.semilogy(x_good,y_good,'o')
plt.semilogy(x_outlier,y_outlier,'r*')
plt.show()
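An alternative sketch, in case the deviations are meant relative to the fitted line rather than to a windowed mean: fit a straight line to log(y), measure each residual in units of the residual standard deviation, and build boolean masks for the 1, 2 and 3 sigma bands. This interpretation, the synthetic data and all names (dev, within_1, beyond_2, beyond_3, rows_beyond_2) are assumptions for illustration and not something stated in the question.

```python
import numpy as np

# Synthetic stand-in for the question's M (x) and Y (y) columns.
rng = np.random.default_rng(42)
x = np.linspace(380.0, 480.0, 500)
log_y = 0.02 * x - 12.0 + rng.normal(0.0, 0.1, x.size)
log_y[[50, 200, 400]] += 1.0  # inject three strong upward outliers
y = np.exp(log_y)

# First fit in log space, then residuals in units of their standard deviation.
slope, intercept = np.polyfit(x, np.log(y), 1)
resid = np.log(y) - (slope * x + intercept)
dev = np.abs(resid) / resid.std()

within_1 = dev <= 1                 # points used for the refit
beyond_2 = (dev > 2) & (dev <= 3)   # between 2 and 3 sigma
beyond_3 = dev > 3                  # more than 3 sigma

# Refit using only the points within 1 sigma of the first fit.
refit = np.polyfit(x[within_1], np.log(y[within_1]), 1)

# Row indices of everything beyond 2 sigma, e.g. to pull those rows
# out of the original table for later inspection.
rows_beyond_2 = np.flatnonzero(dev > 2)
```

The masks can be passed straight to matplotlib to mark the points, for example plt.scatter(x[beyond_3], y[beyond_3], facecolors='none', edgecolors='r', s=80) draws open circles around the 3 sigma points.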

How to interpolate numpy.polyval and numpy.polyfit python

I did a numpy.polyfit() for latitude, longitude, & altitude data for a satellite orbit and interpolated (50 points) with numpy.polyval().
Now, I want to just take a window (0-4.5 degrees longitude) and do a higher resolution interpolation (6,000 points). I think that I need to use the fit coefficients from the first low res fit in order to interpolate for my longitude window, and I am not quite sure how to do this.
Inputs:
lat = [27.755611104020687, 22.50661883405905, 17.083576087905502, 11.53891099628959, 5.916633366002468, 0.2555772624429494, -5.407902834141322, -11.037514984810027, -16.594621304857206, -22.03556688048686, -27.308475759820045, -32.34927891621322, -37.07690156937186, -41.38803163295967, -45.15306971601912, -48.21703193866987, -50.41165326774015, -51.58419672864487, -51.63883932997542, -50.57025116952513, -48.46557920053242, -45.47329014246061, -41.76143266388077, -37.48707787049647, -32.782653540783, -27.754184631685046, -22.48503337048438, -17.041097574740743, -11.475689837873944, -5.833592289780744, -0.1543286595142316, 5.525119007560692, 11.167878192881306, 16.73476477885508, 22.18160021405449, 27.455997555900108, 32.493386953033685, 37.21222272985329, 41.508824407948275, 45.25350232626601, 48.291788915858554, 50.45698534747271, 51.59925055739275, 51.62660832560593, 50.53733379179681, 48.420673231121725, 45.42531420150485, 41.71819693220144, 37.45473807165676, 32.76569228387106]
lon = [-109.73105744378498, -104.28690174554579, -99.2435132929552, -94.48533149079628, -89.91054414962821, -85.42671400689177, -80.94616150449806, -76.38135021210172, -71.6402674905218, -66.62178379632216, -61.21120467960157, -55.27684029674759, -48.66970878028004, -41.23083703244677, -32.813881865289346, -23.332386757370532, -12.832819226213942, -1.5659455609661785, 10.008077792630402, 21.33116444634303, 31.92601575632583, 41.51883213364072, 50.04498630545507, 57.58103957109249, 64.26993028992476, 70.2708323505337, 75.73441871754586, 80.7944079829813, 85.56734813043659, 90.1558676264546, 94.65309120129724, 99.14730128118617, 103.72658922048785, 108.48349841714494, 113.51966824008079, 118.95024882101737, 124.9072309203375, 131.5395221402974, 139.00523971191907, 147.44847902856114, 156.95146022590976, 167.46163867248032, 178.72228750873975, -169.72898181991064, -158.44642409799974, -147.8993300787564, -138.35373014113995, -129.86955508919888, -122.36868103811106, -115.70852432245486]
alt = [374065.49207488785, 372510.1635949105, 371072.75959230476, 369836.3092635453, 368866.7921820211, 368209.0950216997, 367884.3703536549, 367888.97894243425, 368195.08833668986, 368752.88080031495, 369494.21701128664, 370337.49662954226, 371193.3839051864, 371971.0136622536, 372584.272228585, 372957.752022573, 373032.0104747458, 372767.8112563471, 372149.0940816824, 371184.49208500446, 369907.2992362557, 368373.8795969478, 366660.5935723809, 364859.4071422184, 363072.42955020745, 361405.69765685993, 359962.58417682414, 358837.24421522504, 358108.5277743581, 357834.7679493668, 358049.8054538341, 358760.531463618, 359946.1257064284, 361559.04646970675, 363527.70518032915, 365760.6377191965, 368151.8843206526, 370587.2165838985, 372950.8014553002, 375131.8814988529, 377031.06540952163, 378565.8596562773, 379675.13241518533, 380322.2707576381, 380496.8682141012, 380214.86538256245, 379517.14674525027, 378466.68079100474, 377144.36811517406, 375643.83731560566]
myOrbitJ2000Time =[ 20027712., 20027713., 20027714., 20027715., 20027716.,
20027717., 20027718., 20027719., 20027720., 20027721.,
20027722., 20027723., 20027724., 20027725., 20027726.,
20027727., 20027728., 20027729., 20027730., 20027731.,
20027732., 20027733., 20027734., 20027735., 20027736.,
20027737., 20027738., 20027739., 20027740., 20027741.,
20027742., 20027743., 20027744., 20027745., 20027746.,
20027747., 20027748., 20027749., 20027750., 20027751.,
20027752., 20027753., 20027754., 20027755., 20027756.,
20027757., 20027758., 20027759., 20027760., 20027761.]
Code:
import numpy as np

deg = 30  # polynomial degree for fit
# np.asarray so the subtraction also works if the times are a plain list
fittime = np.asarray(myOrbitJ2000Time) - myOrbitJ2000Time[0]
# Latitude interpolation
fitLat = np.polyfit(fittime, lat, deg)
polyval_lat = np.polyval(fitLat, fittime)
# Longitude interpolation
fitLon = np.polyfit(fittime, lon, deg)
polyval_lon = np.polyval(fitLon, fittime)
# Altitude interpolation
fitAlt = np.polyfit(fittime, alt, deg)
polyval_alt = np.polyval(fitAlt, fittime)
# Get lat, lon and alt values for a window of 0-4.5 deg longitude
lonwindow = []
latwindow = []
altwindow = []
for i in range(len(polyval_lat)):
    if 0 < polyval_lon[i] < 4.5:  # get lon vals in window
        lonwindow.append(polyval_lon[i])  # append lon vals
        latwindow.append(polyval_lat[i])  # append corresponding lat vals
        altwindow.append(polyval_alt[i])  # append corresponding alt vals
lonwindow = np.array(lonwindow)
Just to be clear: the issue is that I only have one point in the window range. I want to reuse the interpolation (the equation/curve) from the previous step so that I can interpolate again and generate 6,000 points in my window range.
Original answer posted here
First, generate the polynomial fit coefficients using the old time (x-axis) values, and interpolated longitude (y-axis) values.
import numpy as np
import matplotlib.pyplot as plt
poly_deg = 3 #degree of the polynomial fit
polynomial_fit_coeff = np.polyfit(original_times, interp_lon, poly_deg)
Next, use np.linspace() to generate arbitrary time values based on the number of desired points in the window. Note that start and stop are times, in the same units as original_times, not longitudes.
start = 0
stop = 4
num_points = 6000
arbitrary_time = np.linspace(start, stop, num_points)
Finally, use the fit coefficients and the arbitrary time to get the actual interpolated longitude (y-axis) values and plot.
lon_intrp_2 = np.polyval(polynomial_fit_coeff, arbitrary_time)
plt.plot(arbitrary_time, lon_intrp_2, 'r') #interpolated window as a red curve
plt.plot(original_times, lon, '.') #original data plotted as points, on the same time axis as the fit
plt.show()
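The window in the question is given in longitude, while the fit is a function of time, so the time interval that corresponds to 0-4.5 degrees has to be found first. A minimal sketch of one way to do that is below. It uses synthetic stand-in data rather than the posted orbit, and it assumes the fitted longitude passes through the window only once; the names t_dense, t_window and lon_window are illustrative and do not come from the thread.

```python
import numpy as np

# Synthetic stand-in for the orbit data (hypothetical values, not the asker's).
t = np.arange(0.0, 21.0)               # sample times, already offset to start at 0
lon = -20.0 + 3.0 * t + 0.05 * t**2    # smooth, increasing "longitude" in degrees

# First fit: longitude as a polynomial in time.
coeffs = np.polyfit(t, lon, 3)

# Evaluate the fitted curve on a dense time grid to locate the window.
t_dense = np.linspace(t[0], t[-1], 200_001)
lon_dense = np.polyval(coeffs, t_dense)
in_window = (lon_dense >= 0.0) & (lon_dense <= 4.5)

# First and last dense times at which the fitted longitude is inside the window.
# Assumption: the curve crosses the window once, so this is a single interval.
t_start, t_stop = t_dense[in_window][[0, -1]]

# Second evaluation: 6000 times inside that interval, same coefficients.
t_window = np.linspace(t_start, t_stop, 6000)
lon_window = np.polyval(coeffs, t_window)
```

Only one of the synthetic samples (t = 7) falls inside the window, which mirrors the situation in the question. The same t_window can be passed to np.polyval with the latitude and altitude coefficients to get matching values. The posted longitudes also jump from about 178.7 to -169.7 degrees, so the real data would probably need unwrapping before a polynomial is fitted across that jump.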

linear interpolation with gridded data in python

I have a gridded weather data set with dimensions 33 x 77 x 77. The first dimension is time and the other two are lat and lon, respectively. I need to interpolate (linear or nearest neighbour) the data to different points (lat and lon) for each time step and write the result to a CSV file. I've used the interp2d function from scipy, and it works for one time step. As I have many locations, I don't want to loop over time.
The piece of code I wrote is shown below. Can anyone suggest a better method to accomplish the task?
import sys, datetime, time
import numpy as np
import scipy as sp
import pygrib as pg
from scipy.interpolate import interp2d
grb_f = pg.open('20150331/gfs.20150331.grb2')
# NOTE: tmp (the GRIB messages read from grb_f) is not defined in the posted snippet
lat = tmp[0].data(lat1=4, lat2=42, lon1=64, lon2=102)[1]
lat = lat[:, 0]
lon = tmp[0].data(lat1=4, lat2=42, lon1=64, lon2=102)[2]
lon = lon[0, :]
temp = np.empty((0, lon.shape[0]))
for i in range(0, tmp.shape[0]):
    dat = tmp[i].data(lat1=4, lat2=42, lon1=64, lon2=102)
    temp = np.concatenate([temp, dat[0] - 273.15], axis=0)
temp1 = temp.reshape(tmp.shape[0], lat.shape[0], lon.shape[0])
x = 77 ; y = 28  # (many points)
f = interp2d(lon, lat, temp1[0, :, :], kind='linear', copy=False, bounds_error=True)
Z = f(x, y)
EDIT:
Instead of making a 3D matrix, I appended the data vertically and made a data matrix of size 2541 x 77, with lat and lon of size 2541 x 1. The interp2d function then gives an "Invalid length" error.
f=interp2d(lon,lat, temp1[0,:,:],kind='linear',copy=False,bounds_error=True )
"Invalid length for input z for non rectangular grid")
ValueError: Invalid length for input z for non rectangular grid
The lengths of my x, y and z arrays are the same (2541, 2541, 2541). Why did it throw an error?
Could anyone explain? Your help will be highly appreciated.
Processing of time series is very easy with RedBlackPy.
import datetime as dt
import redblackpy as rb
index = [dt.date(2018,1,1), dt.date(2018,1,3), dt.date(2018,1,5)]
lat = [10.0, 30.0, 50.0]
# create Series object
lat_series = rb.Series(index=index, values=lat, dtype='float32',
                       interpolate='linear')
# Now you can access at any key using linear interpolation
# Interpolation does not create new items in Series
# It uses neighbours to calculate value inplace when you call getitem
print(lat_series[dt.date(2018,1,2)]) #prints 20
So, if you just want to write interpolated values to a CSV file, you can iterate over the list of needed keys, call getitem on the Series object for each one, and write the values to the file:
# generator for dates range
def date_range(start, stop, step=dt.timedelta(1)):
    it = start - step
    # compare against stop (not step) so the loop ends at the last date
    while it < stop:
        it += step
        yield it
#------------------------------------------------
# create list for keeping output strings
out_data = []
# create output file
out_file = open('data.csv', 'w')
# add head for output table
out_data.append('Time,Lat\n')
for date in date_range(dt.date(2018,1,1), dt.date(2018,1,5)):
    out_data.append( '{:},{:}\n'.format(date, lat_series[date]) )
# write output Series
out_file.writelines(out_data)
out_file.close()
You can add the Lon data to your processing in the same way.
If you want to create an "interpolator" object once and use it to sequentially query just the specific points you need, you could take a look at the scipy.interpolate.Rbf class:
"A class for radial basis function approximation/interpolation of n-dimensional scattered data."
Here, n-dimensional would work for your data if you adjust the ratio between the temporal and spatial dimensions, and scattered means you can also use it for regular/uniform data.
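A rough sketch of that idea follows, using a small set of hypothetical scattered samples rather than the asker's grid. The variable time_scale is an assumed knob for the ratio between the temporal and spatial dimensions, and function="linear" is just one of the available kernels.

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(0)

# Hypothetical scattered samples: 60 (time, lat, lon) triples with one value each.
n = 60
t = rng.uniform(0.0, 32.0, n)        # time step index
lat = rng.uniform(4.0, 42.0, n)      # degrees
lon = rng.uniform(64.0, 102.0, n)    # degrees
temp = 20.0 + 0.1 * t + 0.2 * lat - 0.05 * lon

# Assumed scale factor that makes one time step comparable to one degree.
time_scale = 1.0

# Build the interpolator once; the last positional argument is the data values.
rbf = Rbf(t * time_scale, lat, lon, temp, function="linear")

# Query a few arbitrary (time, lat, lon) points with the same object.
query_t = np.array([0.5, 10.0, 31.0])
query_lat = np.array([28.0, 10.0, 40.0])
query_lon = np.array([77.0, 70.0, 100.0])
estimate = rbf(query_t * time_scale, query_lat, query_lon)
```

Rbf solves a dense linear system whose size is the number of input samples, so feeding it every node of a 33 x 77 x 77 grid at once is likely to be impractical; a subset of nodes around the query points would be a more realistic input.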
If it's the same lat and lon for each time, you could do it using slices and a manual interpolation. So if you want a 1D array of values at lat = 4.875, lon = 8.4 (obviously you would need to scale to match your actual spacing):
b = a[:,4:6, 8:10]
c = ((b[:,0,0] * 0.6 + b[:,0,1] * 0.4) * 0.125 + (b[:,1,0] * 0.6 + b[:,1,1] * 0.4) * 0.875)
Obviously you could do it all in one line, but it would be even uglier.
EDIT to allow variable lat and lon at each time period.
import numpy as np

lat = np.linspace(55.0, 75.0, 33)
lon = np.linspace(1.0, 25.0, 33)
data = np.linspace(18.0, 25.0, 33 * 77 * 77).reshape(33, 77, 77)
# NB for simplicity I map 0-360 and 0-180 rather than -180+180
# also need to ensure values on grid lines or edges work ok
lat_frac = lat * 77.0 / 360.0
lat_fr = np.floor(lat_frac).astype(int)
lat_to = lat_fr + 1
lat_frac -= lat_fr
lon_frac = lon * 77.0 / 180.0
lon_fr = np.floor(lon_frac).astype(int)
lon_to = lon_fr + 1
lon_frac -= lon_fr
# bilinear weights: lon_frac blends along the lon axis, lat_frac along the lat axis
data_interp = ((data[:, lat_fr, lon_fr] * (1.0 - lon_frac) +
                data[:, lat_fr, lon_to] * lon_frac) * (1.0 - lat_frac) +
               (data[:, lat_to, lon_fr] * (1.0 - lon_frac) +
                data[:, lat_to, lon_to] * lon_frac) * lat_frac)
# data_interp has shape (33, 33): row = time step, column = target point
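The same bilinear weighting can also be delegated to a library routine. The sketch below uses scipy.interpolate.RegularGridInterpolator, which is not mentioned in the thread and is swapped in here as an alternative to interp2d (interp2d has been deprecated and removed in recent SciPy releases). The grid axes, the synthetic field and the station coordinates are all hypothetical; only the 33 x 77 x 77 shape is taken from the question.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical grid with the shape from the question: (time, lat, lon) = (33, 77, 77).
n_time = 33
lat = np.linspace(4.0, 42.0, 77)      # strictly increasing grid axes
lon = np.linspace(64.0, 102.0, 77)
t_idx = np.arange(n_time)

# Synthetic field that is linear in lat and lon, so linear interpolation is exact.
data = (t_idx[:, None, None]
        + 2.0 * lat[None, :, None]
        + 3.0 * lon[None, None, :])

# Move time to the last axis: axes beyond the grid dimensions are carried along
# as extra values, so every time step is interpolated in a single call.
values = np.moveaxis(data, 0, -1)             # shape (77, 77, 33)
interp = RegularGridInterpolator((lat, lon), values, method="linear")

# Target stations as (lat, lon) pairs (hypothetical locations inside the grid).
points = np.array([[28.0, 77.0], [10.25, 65.5], [41.9, 101.3]])
result = interp(points)                       # shape (n_points, n_time)
```

Each row of result holds one station's values for all 33 time steps, so no explicit loop over time is needed; method="nearest" gives the nearest-neighbour variant.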
