The current format of the longitude data is in the form (0,360) as shown in the picture. I want to get it to be (-180,180) where (-180,0) is equal to (180,360).
Assuming you have a column of longitude data (it's hard to tell from your sample), something like this should work:
# df is your DataFrame (not named pd, so it does not shadow the usual pandas alias)
pick = df['longitude'] > 180
df.loc[pick, 'longitude'] = df.loc[pick, 'longitude'] - 360
The first line chooses the rows where longitude needs adjusting. The second line does the math for only those rows.
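As a quick check of the masking approach described above, here is a minimal sketch on made-up longitude values. The DataFrame name df and the sample numbers are assumptions for illustration only.

```python
import pandas as pd

# Made-up longitudes on the 0..360 convention (illustrative values only).
df = pd.DataFrame({"longitude": [0.0, 90.0, 180.0, 181.0, 270.0, 359.5]})

# Boolean mask of the rows that lie in the (180, 360) half.
pick = df["longitude"] > 180

# Shift only those rows down by a full turn so they land in (-180, 0).
df.loc[pick, "longitude"] = df.loc[pick, "longitude"] - 360

print(df["longitude"].tolist())
```

Values at exactly 180 are left unchanged by this mask, so the result lies in the half-open range (-180, 180].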
Related
I have a database that requires input data to be in an atypical format. Normally latitude and longitude would be in separate columns. In this case I need to bring them into the same column and then add an additional column to differentiate coordinate type. I am a python novice so I am in a bit of a bind as to where to start.
I need to get from this:
Location  Latitude  Longitude
Place1    32.123    120.123
Place2    31.321    121.321
To this:
Location  Lat/Long   Coords
Place1    Latitude   32.123
Place1    Longitude  120.123
Place2    Latitude   31.321
Place2    Longitude  121.321
Edit - This is a simplified example. The data I'm working with has a dozen other columns all of which I would like to preserve, only lengthening with the lat/long columns.
My initial thought was to create two exports of the dataframe, one with the latitude field and one with the longitude field, and then recombine them (duplicating the records) while filling in the 'Lat/Long' and 'Coords' fields for each. I do not know (a) whether this is the right approach, or (b) a clean path to get there. Any thoughts would be appreciated.
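One possible way to do this reshaping without exporting two copies is pandas' melt. The sketch below is only an illustration: the sample frame mirrors the table above, and the extra column "Other" is a hypothetical stand-in for the dozen other columns that need to be preserved.

```python
import pandas as pd

# Sample data; "Other" is a made-up column representing the extra fields.
df = pd.DataFrame({
    "Location": ["Place1", "Place2"],
    "Latitude": [32.123, 31.321],
    "Longitude": [120.123, 121.321],
    "Other": ["a", "b"],
})

value_cols = ["Latitude", "Longitude"]
# Every column that is not a coordinate is kept as an identifier.
id_cols = [c for c in df.columns if c not in value_cols]

long_df = (
    df.melt(id_vars=id_cols, value_vars=value_cols,
            var_name="Lat/Long", value_name="Coords")
      # Put each place's latitude row directly above its longitude row.
      .sort_values(["Location", "Lat/Long"], kind="stable")
      .reset_index(drop=True)
)

print(long_df)
```

Each original row becomes two rows, and the identifier columns are repeated on both, which matches the "duplicate the records" idea without the manual recombination step.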
I have a csv file with a table that has the columns Longitude, Latitude, and Wind Speed. I have a code that takes a csv file and deletes values outside of a specified bound. I would like to retain values whose longitude/latitude is within a 0.5 lon/lat radius of a point located at -71.5 longitude and 40.5 latitude.
My example code below deletes any rows whose longitude is not between -72 and -71 or whose latitude is not between 40 and 41. Of course, this retains values within a square bound of ±0.5 lon/lat around my point of interest. But I am interested in finding values within a circular bound with a radius of 0.5 lon/lat around my point of interest. How should I modify my code?
import pandas as pd
import numpy
df = pd.read_csv(r"C:\\Users\\xil15102\\Documents\\results\\EasternLongIsland50.csv") #file path
indexNames=df[(df['Longitude'] <= -72)|(df['Longitude']>=-71)|(df['Latitude']<=40)|(df['Latitude']>=41)].index
df.drop(indexNames,inplace=True)
df.to_csv(r"C:\\Users\\xil15102\\Documents\\results\\EasternLongIsland50.csv")
Basically you need to check if a value is a certain distance from a central point (-71.5 and 40.5); to do this use the pythagorean theorem/distance formula:
d = sqrt(dx^2+dy^2).
So programmatically, I would do this like:
from math import sqrt
drop_indices = []
for idx, row in df.iterrows():
    # distance (in degrees) from the point of interest at (-71.5, 40.5)
    dist = sqrt((row['Longitude'] + 71.5) ** 2 + (row['Latitude'] - 40.5) ** 2)
    if dist > 0.5:
        drop_indices.append(idx)
# drop returns a new DataFrame, so assign the result back
df = df.drop(drop_indices)
Sorry, that is sort of a disgusting way to get rid of the rows and your way looks much better, but the code should work.
You should write a function to calculate the distance from your point of interest and drop the rows that are too far away. Some help here. Pretty sure the example below should work if you implement is_not_in_area as a function that calculates the distance and returns True where dist > 0.5.
df = df.drop(df[is_not_in_area(df.lat, df.lon)].index)
(This code lifted from here)
Edit: drop the ones that aren't in area, not the ones that are haha.
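For illustration, one way is_not_in_area could be written is sketched below. The default centre and radius come from the question; the parameter names and the sample rows are assumptions, and the distance is a plain Euclidean distance in degrees, as in the question, not a distance on the sphere.

```python
import numpy as np
import pandas as pd

def is_not_in_area(lat, lon, center_lat=40.5, center_lon=-71.5, radius=0.5):
    """Return True where a point lies outside the circle (distance in degrees)."""
    dist = np.hypot(lon - center_lon, lat - center_lat)
    return dist > radius

# Made-up rows: the second one is inside the +/-0.5 square but outside the circle.
df = pd.DataFrame({
    "Longitude": [-71.5, -71.1, -71.9, -70.9],
    "Latitude": [40.5, 40.9, 40.5, 40.5],
    "Wind Speed": [5.0, 6.0, 7.0, 8.0],
})

# Drop every row whose distance from the centre exceeds the radius.
df = df.drop(df[is_not_in_area(df["Latitude"], df["Longitude"])].index)

print(df)
```

Because the function works on whole columns at once, no explicit row loop is needed.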
I have a WRF output netCDF file. The file has the variables temp and prec. The dimension keys are time, south-north and west-east. How do I select different lat/long values in a region? The problem is that south-north and west-east are not variables. I have to find the index values of four lat/long values.
1) Change your Registry files (I think it is Registry.EM_COMMON) so that you print latitude and longitude in your wrfout_d01_time.nc files.
2) Go to your WRFV3 directory.
3) Clean, configure and recompile.
4) Run your model again the way you are used to.
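Once latitude and longitude are written to the output file, the grid indices nearest to a target point could be found along these lines. This is only a sketch: it uses small synthetic 2-D arrays in place of the file's coordinate fields (assumed here to be 2-D, south-north by west-east), and nearest_grid_index is a hypothetical helper name.

```python
import numpy as np

def nearest_grid_index(lat2d, lon2d, target_lat, target_lon):
    """Return (south_north, west_east) indices of the grid cell closest to the target."""
    dist2 = (lat2d - target_lat) ** 2 + (lon2d - target_lon) ** 2
    return np.unravel_index(np.argmin(dist2), dist2.shape)

# Synthetic stand-ins for the 2-D latitude/longitude fields of the model grid.
lats = np.linspace(30.0, 34.0, 5)     # 30, 31, 32, 33, 34
lons = np.linspace(100.0, 106.0, 7)   # 100, 101, ..., 106
lon2d, lat2d = np.meshgrid(lons, lats)

j, i = nearest_grid_index(lat2d, lon2d, 32.2, 103.9)
print(j, i)
```

Repeating this for the four corner points of a region gives the index ranges needed to slice temp and prec along south-north and west-east.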
I am new to pandas. I have a csv file which has a latitude and longitude columns and also a tile ID column, the file has around 1 million rows. I have a list of around a hundred tile ID's and want to get the latitude and longitude coordinates for these tile ID's. Currently I have:
good_tiles_str = [str(q) for q in good_tiles]  # setting list elements to string data type
file['tile'] = file.tile.astype(str)  # setting tile column to string data type
for i in range(len(good_tiles_str)):
    x = good_tiles_str[i]
    lat = file.loc[file['tile'].str.contains(x), 'BL_Latitude']  # finding lat coordinates
    long = file.loc[file['tile'].str.contains(x), 'BL_Longitude']  # finding long coordinates
    print(lat)
    print(long)
This method is very slow, and I know it is not the correct way, as I have heard you should not use for loops like this with pandas. It also does not work: it doesn't find all the latitude and longitude points for the tile IDs.
Any help would be gladly appreciated.
As far as I understand your question, there is no need to iterate over the rows explicitly.
If you want a particular assignment under a given condition, you can make it directly. Here's one way using numpy.where; we use ~ to indicate negation.
rule1= file['tile'].str.contains(x)
rule2= file['tile'].str.contains(x)
file['flag'] = np.where(rule1 , 'BL_Latitude', " " )
file['flag'] = np.where(rule2 & ~rule1, 'BL_Longitude', file['flag'])
Try this:
search_for = '|'.join(good_tiles_str)
good = file[file.tile.str.contains(search_for)]
good = good[['BL_Latitude', 'BL_Longitude']].drop_duplicates()
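Note that str.contains matches substrings, so a tile ID such as 101 would also match 1010. If the IDs are meant to match exactly (an assumption about the data), isin may be a better fit. The sketch below uses a small made-up frame named tiles in place of the real file.

```python
import pandas as pd

# Made-up data; the real frame would come from the CSV file.
good_tiles = [101, 205]
tiles = pd.DataFrame({
    "tile": [101, 1010, 205, 101, 300],
    "BL_Latitude": [10.0, 11.0, 12.0, 10.0, 13.0],
    "BL_Longitude": [20.0, 21.0, 22.0, 20.0, 23.0],
})

good_tiles_str = [str(q) for q in good_tiles]
tiles["tile"] = tiles["tile"].astype(str)

# isin keeps only rows whose tile ID is exactly one of the wanted IDs.
good = tiles.loc[
    tiles["tile"].isin(good_tiles_str),
    ["tile", "BL_Latitude", "BL_Longitude"],
].drop_duplicates()

print(good)
```

Here tile 1010 is excluded even though it contains "101", and the repeated row for tile 101 is removed by drop_duplicates.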
I have a netCDF file with a grid (each step 0.25°).
What I want is the value of a variable, let's say tempMax, at a certain grid point over the last 50 years.
I am aware that you read the data into python like this
lon = numpy.array(file.variables['longitude'][:])
lat = numpy.array(file.variables['latitude'][:])
temp = numpy.array(file.variables['tempMax'][:])
time = numpy.array(file.variables['time'][:])
That leaves me with an array and I do not know how to "untangle" it.
How to get the value at a certain coordinate (stored in temp) over the whole time (stored in time)?
What I want to display is the value over time at that coordinate.
Any ideas how I could achieve that?
Thanks!
I'm guessing that tempMax is 3D (time x lat x lon) and should then be read in as
temp = ncfile.variables['tempMax'][:,:,:]
(Note two things: (1) if you're using Python v2, it's best to avoid the word file and instead use something like ncfile as shown above, (2) temp will be automatically stored as a numpy.ndarray simply with the call above, you don't need to use the numpy.array() command during the read in of variables.)
Now you can extract temperatures for all times at a certain location with
temp_crd = temp[:,lat_idx,lon_idx]
where lat_idx and lon_idx are integers corresponding to the index of the latitude and longitude coordinates. If you know these indices beforehand, great, just plug them in, e.g. temp_crd = temp[:,25,30]. (You can use the tool ncdump to view the contents of a netCDF file, https://www.unidata.ucar.edu/software/netcdf/docs/netcdf/ncdump.html)
The more likely case is that you know the coordinates, but not their indices beforehand. Let's say you want temperatures at 50N and 270E. You can use the numpy.where function to extract the indices of the coordinates given the lat and lon arrays that you've already read in.
lat_idx = numpy.where(lat==50)[0][0]
lon_idx = numpy.where(lon==270)[0][0]
temp_crd = temp[:,lat_idx,lon_idx]
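An exact == comparison only finds a match when the requested coordinate is stored in the array with exactly that value. If the point of interest falls between grid points, one common alternative is to take the nearest grid point instead. The sketch below uses synthetic arrays in place of the file contents; the grid extents, the target values and the random data are assumptions for illustration.

```python
import numpy as np

# Synthetic 0.25-degree grid and data standing in for the netCDF variables.
lat = np.arange(40.0, 60.25, 0.25)
lon = np.arange(260.0, 280.25, 0.25)
time = np.arange(3)
temp = np.random.default_rng(0).random((time.size, lat.size, lon.size))

# Index of the grid point closest to the requested coordinate.
lat_idx = int(np.abs(lat - 50.1).argmin())
lon_idx = int(np.abs(lon - 269.9).argmin())

# Time series at that grid point: one value per time step.
temp_crd = temp[:, lat_idx, lon_idx]

print(lat[lat_idx], lon[lon_idx], temp_crd.shape)
```

The resulting temp_crd has the same length as time, so the two arrays can be plotted against each other directly.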