I have a csv file with a table that has the columns Longitude, Latitude, and Wind Speed. I have a code that takes a csv file and deletes values outside of a specified bound. I would like to retain values whose longitude/latitude is within a 0.5 lon/lat radius of a point located at -71.5 longitude and 40.5 latitude.
My example code below deletes any values whose longitude and latitude isn't between -71 to -72 and 40 to 41 respectively. Of course, this retains values within a square bound ±0.5 lon/lat around my point of interest. But I am interested in finding values within a circular bound with radius 0.5 lon/lat of my point of interest. How should I modify my code?
import pandas as pd
import numpy
df = pd.read_csv(r"C:\Users\xil15102\Documents\results\EasternLongIsland50.csv")  # file path
indexNames = df[(df['Longitude'] <= -72) | (df['Longitude'] >= -71) | (df['Latitude'] <= 40) | (df['Latitude'] >= 41)].index
df.drop(indexNames, inplace=True)
df.to_csv(r"C:\Users\xil15102\Documents\results\EasternLongIsland50.csv")
Basically, you need to check whether each value lies within a certain distance of the central point (-71.5, 40.5). To do this, use the Pythagorean theorem (the distance formula):
d = sqrt(dx^2+dy^2).
So programmatically, I would do this like:
from math import sqrt
drop_indices = []
for idx, row in df.iterrows():
    dx = row['Longitude'] - (-71.5)  # longitude offset from the center
    dy = row['Latitude'] - 40.5      # latitude offset from the center
    if sqrt(dx * dx + dy * dy) > 0.5:
        drop_indices.append(idx)
df = df.drop(drop_indices)  # drop returns a new frame unless inplace=True
Sorry, that is a sort of disgusting way to get rid of the rows, and your way looks much better, but the code should work.
You should write a function that calculates the distance from your point of interest and drop the rows that fall outside the radius. The example below should work if you implement is_not_in_area as a function that calculates the distance and returns True where dist > 0.5.
df = df.drop(df[is_not_in_area(df['Latitude'], df['Longitude'])].index)
(This code lifted from here)
Edit: drop the ones that aren't in area, not the ones that are haha.
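The circular filter described above can also be written as a vectorized boolean mask, which avoids both the explicit loop and the drop call. This is a minimal sketch, not the original poster's code: the helper name within_radius and the sample values are made up for illustration, and it treats one degree of longitude and one degree of latitude as equal lengths, exactly as the question does.

```python
import numpy as np
import pandas as pd

def within_radius(df, center_lon, center_lat, radius):
    """Return a boolean mask of rows whose (Longitude, Latitude) lies within
    `radius` degrees of the center. Hypothetical helper for illustration."""
    dx = df['Longitude'] - center_lon
    dy = df['Latitude'] - center_lat
    # np.hypot computes sqrt(dx**2 + dy**2) element-wise
    return np.hypot(dx, dy) <= radius

# Made-up sample rows: two inside the circle, one in the corner of the
# square but outside the circle, one far away.
sample = pd.DataFrame({
    'Longitude': [-71.5, -71.2, -71.1, -73.0],
    'Latitude': [40.5, 40.5, 40.9, 40.5],
    'Wind Speed': [5.0, 6.0, 7.0, 8.0],
})

inside = sample[within_radius(sample, -71.5, 40.5, 0.5)]
```

Keeping the rows where the mask is True is equivalent to dropping the rows where is_not_in_area would be True.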
Say I have a location A, identified by an IP address, and a geolocator API that gives me its latitude and longitude. Now I want to find all instances that are within a 25-mile radius of location A. How can I compute this in the fewest steps?
Solution A: I can compute the distance between location A and every instance in the database, and display the instances within the 25-mile radius. (Far too slow, especially if I want a dynamic location and have a large database of locations.)
Solution B: I can group all instances by zip code, in addition to IP and (lat, long), so that fewer distances between location A and the instances need to be computed. (Better, but if the IP address sits on the border of another zip code, this adds to the amount of computation needed.)
Solution C: I can use trigonometry. Using the latitude and longitude of location A, I can find each instance within the 25-mile radius.
Can someone please describe a better way of comparing distances? Ideas and suggestions are much appreciated (if further explanation is needed, please ask). Thanks.
I'd use a combination of your proposed solutions A and C. You can query your database directly using a filter that selects only the locations within a 25-mile radius (or any other radius). Calculating the longitudinal difference in miles is a little tricky because the length of one degree of longitude in miles varies with latitude. Kudos go out to this explanation: https://gis.stackexchange.com/questions/142326/calculating-longitude-length-in-miles#142327
Assuming you have the following DB schema with existing locations (only latitude and longitude as columns):
CREATE TABLE location (
lat REAL,
lon REAL
);
You're able to filter only the locations within a 25 mile radius using this query:
query = """
SELECT (lat - ?) AS difflat,
(lon - ?) AS difflon
FROM location
WHERE POWER(POWER(difflat * 69.172, 2) + POWER(difflon * COS(lat*3.14/180.) * 69.172, 2), 0.5) < ?;
"""
Using the query then like this:
radius = 25 #miles
cursor.execute(query, (querylocation['lat'], querylocation['lon'], radius))
Unfortunately, many SQLite3 builds don't provide basic mathematical functions like COS and POWER, but they can easily be created from Python:
import math
import sqlite3
con = sqlite3.connect(db_path)
con.create_function('POWER', 2, math.pow)
con.create_function('COS', 1, math.cos)
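Putting the pieces of this answer together, the radius filter could be run end to end roughly as follows. This is a sketch under assumptions: it uses an in-memory database with made-up rows, named parameters instead of positional ones, math.pi instead of 3.14, and the helper names connect and locations_within are hypothetical. The distance is the same flat approximation as in the query above, not a true great-circle distance.

```python
import math
import sqlite3

# Approximate miles per degree of latitude, the constant used in the answer.
MILES_PER_DEGREE = 69.172

QUERY = """
SELECT lat, lon
FROM location
WHERE POWER(
          POWER((lat - :lat) * :mpd, 2)
        + POWER((lon - :lon) * COS(lat * :deg2rad) * :mpd, 2),
      0.5) < :radius
ORDER BY lat, lon
"""

def connect(db_path=":memory:"):
    """Open a connection and register POWER and COS as SQL functions."""
    con = sqlite3.connect(db_path)
    con.create_function("POWER", 2, math.pow)
    con.create_function("COS", 1, math.cos)
    return con

def locations_within(con, lat, lon, radius_miles):
    """Return (lat, lon) rows closer than radius_miles to the query point."""
    params = {
        "lat": lat,
        "lon": lon,
        "mpd": MILES_PER_DEGREE,
        "deg2rad": math.pi / 180.0,
        "radius": radius_miles,
    }
    return con.execute(QUERY, params).fetchall()

# Made-up rows: two near the query point, one about 100 miles north.
con = connect()
con.execute("CREATE TABLE location (lat REAL, lon REAL)")
con.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [(40.5, -71.5), (40.6, -71.4), (42.0, -71.5)],
)
nearby = locations_within(con, 40.5, -71.5, 25)
```

The filter still evaluates the expression for every row, so for a large table it may help to first narrow the candidates with a plain bounding-box condition on lat and lon.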
I have a table in a Postgres DB containing places and their corresponding latitude and longitude values.
Places (id, name, lat, lng) # primary-key(id)
I will get an input of a pair of latitudes and longitudes forming a rectangle.
{
"long_ne": 12.34,
"lat_ne": 34.45,
"long_sw": 15.34,
"lat_sw": 35.56
}
I want to get all the rows that fall inside the rectangle.
The rows can't be sorted based on their lat-lng values as that will cause trouble while inserting new values.
What would be the best way to go about solving this to optimize queries to get the result?
I can obviously do it using a WHERE clause, but would that be the most optimized solution? There would be a massive number of rows in the table; is there a way to optimize this query to speed up the result?
Is this what you want?
select *
from places
where
lat between :lat_no and :lat_sw
and lng between :long_no and :long_sw
Here :lat_no, :lat_sw, :long_no and :long_sw are the input parameters to the query.
This gives you all rows whose latitude and longitude fall within the rectangle's boundaries. Note that between expects the lower bound first and treats the interval as inclusive at both ends. You can change this as needed with inequality conditions; for example, this makes the match inclusive on the lower bound and exclusive on the upper bound:
where
lat >= :lat_no and lat < :lat_sw
and lng >= :long_no and lng < :long_sw
Won't a simple SQL query suffice here?
SELECT *
FROM Places
WHERE (lat > lat_sw)
AND (lng > long_sw)
AND (lat < lat_ne)
AND (lng < long_ne)
I have over 1 million rows of Latitude Longitude positions. My goal is to check each of these rows against a data set of about 43000 ZipCodes that have a central Latitude Longitude.
I want to calculate the haversine distance between each row and every entry in the large ZipCodes list. I then want to take the closest lat/long and return it, or the corresponding zip code, to the leftmost frame (in essence, giving the closest ZipCode to each latitude/longitude in the large frame).
I have tried several things, including vectorized haversine functions and looping through each row, calculating and moving to the next, but I can't quite get them to work. Given the large size of my data, I know that simply looping through each row and calculating won't work. I need a new solution. I think it might involve vectorization.
Here are some sample frames of my data. df is the large frame; for each of its rows I am trying to find the smallest distance to an entry in zip_list and return the corresponding zip code to the large frame.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[42.801104, -76.827879], [38.187102, -83.433917],
                            [35.973115, -83.955932]]), columns=['Lat', 'Long'])
zip_list = pd.DataFrame(np.array([[49544, 42.999561, -85.75371], [49648, 45.000254, -85.3651],
                                  [49654, 45.023384, -85.75697], [50265, 41.570916, -93.73568]]),
                        columns=['ZipCode', 'Latitude', 'Longitude'])
I would like to return the minimum distance zip code to the corresponding row in the df frame.
Any ideas would be great. I am a beginner with vectorization and numpy/pandas.
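One possible direction, offered as a sketch rather than a definitive answer, is to compute the haversine distances with NumPy broadcasting and process the large frame in chunks so the full distance matrix never has to fit in memory at once. The function name nearest_zip, the chunk size, and the Earth radius of 6371 km are assumptions of this sketch; the column names follow the sample frames in the question, and the test points below are made up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def nearest_zip(points, zips, chunk_size=10_000):
    """For each row of `points` (columns Lat, Long), return the ZipCode of the
    closest row in `zips` (columns ZipCode, Latitude, Longitude)."""
    zlat = np.radians(zips['Latitude'].to_numpy())[None, :]
    zlon = np.radians(zips['Longitude'].to_numpy())[None, :]
    codes = zips['ZipCode'].to_numpy()
    out = np.empty(len(points), dtype=codes.dtype)
    for start in range(0, len(points), chunk_size):
        chunk = points.iloc[start:start + chunk_size]
        plat = np.radians(chunk['Lat'].to_numpy())[:, None]
        plon = np.radians(chunk['Long'].to_numpy())[:, None]
        # Haversine formula, broadcast to a (chunk rows x zip rows) matrix
        a = (np.sin((zlat - plat) / 2) ** 2
             + np.cos(plat) * np.cos(zlat) * np.sin((zlon - plon) / 2) ** 2)
        dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
        out[start:start + chunk_size] = codes[dist.argmin(axis=1)]
    return out

# Made-up data with an unambiguous nearest neighbor for each point.
demo_zips = pd.DataFrame({'ZipCode': [11111, 22222],
                          'Latitude': [40.0, 30.0],
                          'Longitude': [-75.0, -90.0]})
demo_points = pd.DataFrame({'Lat': [40.1, 30.2, 39.0],
                            'Long': [-75.1, -89.9, -76.0]})
demo_points['ZipCode'] = nearest_zip(demo_points, demo_zips, chunk_size=2)
```

With about 1 million points and 43,000 zip codes, each chunk of 10,000 rows produces a matrix of 430 million floats, so the chunk size would likely need to be lowered to suit the available memory.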
I have a DataFrame with latitude and longitude of places (restaurants) and a DataFrame with latitude and longitude of neighborhoods (area).
I would like, for each neighborhood, to count the number of restaurants in a 3km area (numberR).
I have written the following code, and it works:
df = pd.DataFrame()
numberR = []
radius = 3
for element in range(0, area['lon'].count()):  # for every neighborhood
    df = pd.DataFrame()
    df['destLat'] = restaurants['lat']
    df['originLat'] = area['lat'][element]
    df['destLon'] = restaurants['lng']
    df['originLon'] = area['lon'][element]
    for i, row in df.iterrows():
        # for every restaurant I compute the distance from my neighborhood in km
        l = [haversine(df.originLon[i], df.originLat[i], df.destLon[i], df.destLat[i]) for i, row in df.iterrows()]
    numberR.append(sum(x < radius for x in l))
However, I would like to make the code quicker as it is very slow.
Do you have any idea how I could reach the same result in less time?
Thanks in advance.
P.S. haversine is the well known function for getting distance in km starting from lat and lng.
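I would recommend using the functions in scipy.spatial.distance.
from scipy.spatial.distance import cdist
# cdist needs 2-D arrays of coordinates; metric accepts a callable that takes two rows
distances = cdist(area[['lat', 'lon']].to_numpy(), restaurants[['lat', 'lng']].to_numpy(), metric=haversine)
(distances < 3).sum(axis=1)  # number of restaurants within 3 km of each area
The cdist function computes the distance between every pair of rows from the two inputs.
You will also need to modify the haversine function so that it accepts two coordinate rows (1-D arrays) rather than four separate values.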
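As a minimal sketch of the cdist approach, assuming coordinates are stored as (lat, lon) pairs in degrees: haversine_rows and count_within are hypothetical names, the Earth radius of 6371 km is an assumption, and the coordinates are made up.

```python
import numpy as np
from scipy.spatial.distance import cdist

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_rows(u, v):
    """Haversine distance in km between two (lat, lon) rows given in degrees."""
    lat1, lon1 = np.radians(u)
    lat2, lon2 = np.radians(v)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def count_within(areas_latlon, restaurants_latlon, radius_km):
    """Count, for each area, the restaurants closer than radius_km."""
    distances = cdist(areas_latlon, restaurants_latlon, metric=haversine_rows)
    # One row per area, one column per restaurant; sum across the columns
    return (distances < radius_km).sum(axis=1)

# Made-up coordinates: two restaurants near the first area, none near the second.
areas_latlon = np.array([[45.00, 9.00], [46.00, 9.00]])
restaurants_latlon = np.array([[45.01, 9.00], [45.02, 9.01], [45.50, 9.00]])
counts = count_within(areas_latlon, restaurants_latlon, 3)
```

Because cdist calls a Python callable once per pair, this is mainly a readability gain; for very large inputs a fully vectorized NumPy haversine is likely to be faster.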
I have 4 Dataframes (ticket_data.csv, providers.csv, stations.csv and cities.csv)
In stations.csv I have 2 columns called o_city (origin city) and d_city (destination city). Those two columns give me the id of the city I need to look up in cities.csv.
In cities.csv I have the lat and long of each city.
How can I calculate the distance between o_city and d_city for each ticket? I tried to use pyproj, but I didn't find a way to make it work for each ticket.
[Screenshots of the csv files: ticket_data.csv, cities.csv]
Welcome to StackOverflow! Assuming your cities dataframe is called city_df, for each row you can use the haversine formula from spherical trigonometry to calculate the distance between two coordinate pairs on Earth's surface. Here is some dummy Python 3 code showing roughly how you might go about this (using just one pair of coordinates for ease of communication; the column names are placeholders):
from haversine import haversine
distance = haversine((city_df['origin_lat'][0], city_df['origin_lon'][0]), (city_df['destination_lat'][0], city_df['destination_lon'][0]))
The coordinates must be in decimal degrees, as in 43.9202, rather than in degrees-minutes-seconds notation, as in 43° 55' 13". Given this, the output value of distance will be in km.
Hope this helps you get closer to solving your problem!
P.S. You may need to install haversine (for example with pip install haversine), as it is not in the standard library.
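To compute the distance for every ticket at once rather than for a single row, the city ids can be looked up in the cities table and passed to a vectorized haversine. This is only a sketch: the screenshots are not available here, so the column names id, latitude and longitude in the cities frame are assumptions, as are the helper names and the tiny made-up frames. It uses a NumPy implementation of the haversine formula instead of the haversine package so that it works on whole columns.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorized haversine distance in km; inputs in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def add_trip_distance(tickets, cities):
    """Return a copy of `tickets` with a distance_km column.
    Assumes cities has columns id, latitude, longitude (hypothetical names)
    and tickets has columns o_city and d_city holding city ids."""
    coords = cities.set_index('id')[['latitude', 'longitude']]
    origin = coords.loc[tickets['o_city']].to_numpy()
    dest = coords.loc[tickets['d_city']].to_numpy()
    out = tickets.copy()
    out['distance_km'] = haversine_km(origin[:, 0], origin[:, 1],
                                      dest[:, 0], dest[:, 1])
    return out

# Made-up data: two cities one degree of longitude apart on the equator.
demo_cities = pd.DataFrame({'id': [1, 2],
                            'latitude': [0.0, 0.0],
                            'longitude': [0.0, 1.0]})
demo_tickets = pd.DataFrame({'o_city': [1, 1], 'd_city': [2, 1]})
result = add_trip_distance(demo_tickets, demo_cities)
```

One degree of longitude on the equator is roughly 111 km, which gives a quick sanity check on the first row; the second row goes from a city to itself and should be 0.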