I have a time series dataset where the pickup and drop-off latitude and longitude coordinates are given.
Since coordinates within a city hardly vary, how can I categorize them in Python?
I want to make groups so that a classification algorithm can be applied.
Here is a single row of pickup and drop-off longitude and latitude coordinates for New York City:
-73.973052978515625 40.793209075927734 -73.972923278808594 40.782520294189453
I have fixed the latitude range to 40.6 through 40.9 and the longitude range to -74.25 through -73.9, and now I want to group the coordinates so that a classification algorithm can be applied.
For example, you can put your coordinates in a list of tuples called coordinates. Note that I have also appended a coordinate that is out of range. Here is the code:
coordinates = [
(-73.973052978515625,40.793209075927734),
(-73.972923278808594,40.782520294189453),
(-75.9,40.7)
]
filtered = list()
# keep only coordinates inside the bounding box
for c in coordinates:
    if -74.25 <= c[0] <= -73.9 and 40.6 <= c[1] <= 40.9:
        filtered.append(c)
print(filtered)  # here you have your filtered coordinates
Output:
[(-73.97305297851562, 40.793209075927734), (-73.9729232788086, 40.78252029418945)]
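The question also asks how to turn the coordinates into groups. One simple approach, sketched below over the fixed ranges above, is to bin each filtered coordinate on a regular grid; the 0.01-degree cell size is an arbitrary illustrative choice, not part of the original answer:
# Bin each filtered coordinate into a grid cell over the fixed ranges;
# the 0.01-degree cell size is an arbitrary choice
cell = 0.01
groups = [(int((lon + 74.25) / cell), int((lat - 40.6) / cell))
          for lon, lat in filtered]
print(groups)  # each (lon_bin, lat_bin) tuple can serve as a class label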
As the title says: supposing I have a Dataset with coordinates [time, lat, lon], how can I obtain, for each timestep in time, the ['lat', 'lon'] pair at which the maximum (or minimum) value of a given variable is located?
Use xr.Dataset.idxmax to find the index label of the maximum along a dimension (one dimension at a time), and xr.Dataset.idxmin likewise for the minimum.
max_lons = ds.max(dim="lat").idxmax(dim="lon")
max_lats = ds.max(dim="lon").idxmax(dim="lat")
The results will be Datasets, with each variable giving the lon or lat corresponding to that variable's maximum at each time step.
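A minimal self-contained sketch of how this plays out; the dataset, the variable name temp, and the random values are invented for illustration:
import numpy as np
import xarray as xr

# Toy (time, lat, lon) dataset with a single variable
ds = xr.Dataset(
    {"temp": (("time", "lat", "lon"), np.random.rand(3, 4, 5))},
    coords={
        "time": np.arange(3),
        "lat": np.linspace(-10, 10, 4),
        "lon": np.linspace(0, 40, 5),
    },
)

max_lons = ds.max(dim="lat").idxmax(dim="lon")  # lon of the maximum, per time step
max_lats = ds.max(dim="lon").idxmax(dim="lat")  # lat of the maximum, per time step
print(max_lons["temp"].values)  # one lon label per time step
print(max_lats["temp"].values)  # one lat label per time step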
Let's say I have a large array of values representing terrain latitude locations, of shape x, and another array of terrain longitude values, of shape y. All of the values in x and y are equally spaced at 0.005 degrees. In other words:
lons[0:10] = [-130.0, -129.995, -129.99, -129.985, -129.98, -129.975, -129.97, -129.965, -129.96, -129.955]
lats[0:10] = [55.0, 54.995, 54.99, 54.985, 54.98, 54.975, 54.97, 54.965, 54.96, 54.955]
I have a second dataset projected on an irregularly-spaced lat/lon grid (though roughly equally spaced, ~25 meters apart) that is [m,n] in size and falls within the domain of x and y. We also have all of the lat/lon points of this second dataset. I would like to 'line up' the grids so that every value of [m,n] matches the nearest-neighbor terrain value within the larger grid. I am able to do this with the following code, where I basically loop through every lat/lon value and find the argmin of the calculated lat/lon differences between the two datasets:
for a in range(0, lats.shape[0]):
    # Loop through the ranges
    for r in range(0, lons.shape[0]):
        # Access the elements
        tmp_lon = lons[r]
        tmp_lat = lats[a]
        # Find where tmp_lon and tmp_lat match best with the index from new_lats and new_lons
        idx = (np.abs(new_lats - tmp_lat)).argmin()
        idy = (np.abs(new_lons - tmp_lon)).argmin()
        # Make our final array!
        second_dataset_trn[a, r] = first_dataset_trn[idy, idx]
Except it is exceptionally slow. Is there another method, either through a package, library, etc. that can speed this up?
Please take a look at the following previous question for iterating over two lists, which may improve the speed: Is there a better way to iterate over two lists, getting one element from each list for each iteration?
A possible correction to the sample code: assuming the arrays are organized in the standard GIS fashion of latitude, longitude, I believe there is an error in the idx and idy assignments - the variables receiving the two argmin results should be swapped (idx should be idy, and vice versa). For example:
# Now we need to find where the tmp_lon and tmp_lat match best with the index from new_lats and new_lons
idy = (np.abs(new_lats - tmp_lat)).argmin()
idx = (np.abs(new_lons - tmp_lon)).argmin()
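Beyond the swap, the double loop itself can be removed, which should address the speed concern. A sketch using NumPy broadcasting, under the assumption (implicit in the snippet above) that lats, lons, new_lats, and new_lons are all 1-D arrays:
import numpy as np

# Compute all pairwise distances at once; this trades the Python loops for two
# temporary arrays of shape (len(new_lats), len(lats)) and (len(new_lons), len(lons))
idys = np.abs(new_lats[:, None] - lats[None, :]).argmin(axis=0)  # lat index per row
idxs = np.abs(new_lons[:, None] - lons[None, :]).argmin(axis=0)  # lon index per column
# Fancy indexing builds the whole output in one step, with the swap applied
second_dataset_trn = first_dataset_trn[np.ix_(idys, idxs)]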
I am working with MODIS active fire data (data resolution 1 km). After deriving meaningful information from it, I have an array of size (72 x 4797 x 4797) [time x lat x lon], and meshes for lat (4797 x 4797) and lon (4797 x 4797). The latitude mesh decreases from 40 N to 0 with a uniform dy of 0.0108, so its values change along the rows while every column is identical to the others. The longitude mesh, however, has values changing along both rows and columns, which I guess is because the longitude values differ at each latitude due to the satellite swath.
My objective is to put this data on the WRF grid (lat 129 x lon 109 at 30 km resolution). The data has NaN at all points with no fire and values at points of active fire. Using scipy interpolation with griddata returns an array of all NaNs, which is of no use, as all information is lost.
To build coarse-resolution data on the new lat-lon grid, I am trying to use a nearest-neighbour approach: if multiple active fires are present in the fine grid, assign each one's lat-lon to the nearest coarse-grid lat-lon, and average all such points within that coarse grid square.
In the code below:
- lon_wrf, lat_wrf are the new longitude and latitude of interest at coarse resolution.
- lon_mosaic, lat_mosaic are both 2-D arrays of longitude and latitude at fine resolution.
- Block_OC_day is the 2-D array to be put on the new coarse resolution.
I first find all fire locations that are not NaN and extract the lon and lat of those locations. Next, I find the nearest lon and lat in the target grid; the function below gives me the indices and values. Finally, I would like to average the value of Block_OC_day over all points that share the same lat-lon on the target grid.
lat_1d = lat_mosaic[:,0]  # collapsing the 2-D latitude mesh to 1-D
fire_loc = np.argwhere(~np.isnan(Block_OC_day))  # all locations with non-NaN values
fire_lat = lat_1d[fire_loc[:,0]]  # latitudes at fine resolution
fire_lon = lon_mosaic[fire_loc[:,0],fire_loc[:,1]]  # longitudes at fine resolution

# Function to find the nearest value and its index
def nearest(value_to_search, lookup_array):
    '''Finds the value and index in an array closest to a given value'''
    idx = np.argmin(np.abs(lookup_array - value_to_search))
    closest_value = lookup_array[idx]
    return closest_value, idx

# Storing target latitude and index
va_lat = []
li_lat = []
for i in range(len(fire_lat)):
    a, idx = nearest(fire_lat[i], lat_wrf)
    va_lat.append(a)
    li_lat.append(idx)

# Storing target longitude and index
va_lon = []
li_lon = []
for i in range(len(fire_lon)):
    a, idx = nearest(fire_lon[i], lon_wrf)
    va_lon.append(a)
    li_lon.append(idx)
I have found a solution to the problem. It involves finding the common indices of lat-lon and then filling them with the averaged value of all points in the fine grid.
There may be a better solution, but this code works fine.
# Convert the index lists to arrays so elementwise comparison with == works
li_lat = np.asarray(li_lat)
li_lon = np.asarray(li_lon)

# Using specific indices
i = 108
j = 128
inter = np.intersect1d(np.argwhere(li_lon == i), np.argwhere(li_lat == j))  # finding locations to replace
NoNaNFire = Block_OC_day[~np.isnan(Block_OC_day)]  # extracting values at the fire locations
AvgFire = np.nanmean(NoNaNFire[inter])  # averaging all fine-grid points
# AvgFire is the average value of the variable at coarse-grid cell (j, i)

# For the full 2-D coarse grid
Block_WRF = np.empty((lat_wrf.shape[0], lon_wrf.shape[0]))
for i in range(len(lon_wrf)):
    for j in range(len(lat_wrf)):
        inter = np.intersect1d(np.argwhere(li_lon == i), np.argwhere(li_lat == j))
        Block_WRF[j, i] = np.nanmean(NoNaNFire[inter])
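As a side note, the double loop rescans every fire point once per coarse cell. A possibly faster equivalent, sketched with np.bincount under the same variable names (not part of the original answer); cells with no fire come out as NaN, matching np.nanmean on an empty slice:
import numpy as np

nlat, nlon = lat_wrf.shape[0], lon_wrf.shape[0]
# Flatten each (lat, lon) index pair into a single coarse-cell id per fire point
bins = li_lat * nlon + li_lon
# Per-cell sums and counts in one pass, then divide to get the mean
sums = np.bincount(bins, weights=NoNaNFire, minlength=nlat * nlon)
counts = np.bincount(bins, minlength=nlat * nlon)
with np.errstate(invalid="ignore"):
    Block_WRF = (sums / counts).reshape(nlat, nlon)  # NaN where a cell has no fires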
I have a .nc file that I open with xarray as a dataset. This dataset has 3 variables:
- Band (5000 x 300 x 250)
- latitude (300 x 250)
- longitude (300 x 250)
Its dimensions are:
- time (5000)
- y (300)
- x (250)
I created the dataset myself and made a mistake, because I would like to "grab" the time series of a specific point of Band based on its coordinate values:
dataset.Band.sel(longitude=6.696e+06,latitude=4.999e+05,method='nearest')
(I based the values to grab on the first values of both variables.)
The issue is that when I created the .nc file, I entered latitude and longitude not as dimensions but as variables. Is there a way to keep my code, with a few modifications, so I can grab the point based on the nearest values of the latitude and longitude variables? Or should I completely redefine the dimensions of my .nc file to replace x and y with longitude and latitude?
There isn't a great way to select data using the lat/lon values - as your data is structured, you essentially have multidimensional coordinates.
That said, if your lat/lon are actually only indexed by x OR y - that is, latitude has the same value repeated for all levels of x, and likewise longitude with y - you could reorganize your data pretty easily:
lats = dataset.latitude.mean(dim='x')   # 1-D, varies only along y
lons = dataset.longitude.mean(dim='y')  # 1-D, varies only along x
dataset = dataset.drop(['latitude', 'longitude'])
dataset.coords['latitude'] = lats
dataset.coords['longitude'] = lons
dataset = dataset.swap_dims({'x': 'longitude', 'y': 'latitude'})
At this point, your data is indexed by time, latitude, and longitude, and you can select the data however you'd like.
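For instance, the nearest-neighbour selection from the question should now work as written (coordinate values taken from the question):
# Time series of the grid point nearest the requested coordinates
ts = dataset.Band.sel(longitude=6.696e+06, latitude=4.999e+05, method='nearest')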
Can I convert longitude/latitude (x, y) coordinates to Cartesian (x, y, z) coordinates without having elevation? I have checked some forums discussing converting longitude/latitude into Cartesian coordinates, and here is the code in Python:
R = numpy.float64(6371000) # in meters
longitude = numpy.float64(lon)
latitude = numpy.float64(lat)
X = R * math.cos(longitude) * math.sin(latitude)
Y = R * math.sin(latitude) * math.sin(longitude)
Z = R * math.cos(latitude)
Problem statement: I have data gathered from different locations, but these locations are in longitude/latitude format. Are these two attributes enough to convert the locations into Cartesian format? Is the code above correct?
You need the distance from the center of the Earth to convert latitude + longitude to X, Y, Z (either Earth-Centered, Earth-Fixed or Earth-Centered Inertial). Earth-Centered Inertial also requires the time, to convert longitude into an angle.
The reason for this is simple: you need three independent variables, since it is a 3D coordinate system. Latitude and longitude are only two; the third is the distance from the center (R in your equations above).
If you are looking for ground coordinates, you can use the Google Elevation API to get R for your equations. Either way, you need this information for the coordinate transform.
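For completeness, a minimal sketch of the conventional spherical-Earth conversion (the function name and the R + h term are illustrative, not from the answer above). It also fixes two pitfalls in the question's snippet: math.cos and math.sin expect radians, not degrees, and the standard formulas use latitude directly rather than treating it as a polar angle:
import math

def latlon_to_cartesian(lat_deg, lon_deg, h=0.0, R=6371000.0):
    '''Spherical-Earth lat/lon in degrees (plus optional height h above the
    sphere, in meters) to Earth-centered X, Y, Z in meters.'''
    lat = math.radians(lat_deg)  # trig functions expect radians
    lon = math.radians(lon_deg)
    r = R + h                    # distance from the Earth's center
    x = r * math.cos(lat) * math.cos(lon)
    y = r * math.cos(lat) * math.sin(lon)
    z = r * math.sin(lat)
    return x, y, z

print(latlon_to_cartesian(40.7931, -73.9731))  # the NYC pickup point from earlier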