I'm trying to scrape the entire map of France.
I've got one issue:
1 - I'm limited by the zoom level of the map
import requests
url ='https://www.iadfrance.fr/agent-search-location?southwestlat=47.0270782&southwestlng=-2.1560669&northeastlat=47.4930807&northeastlng=-1.0093689'
jsonObj = requests.get(url).json()
emails = jsonObj['agents']
#print (emails)
for agent in emails:
    email = agent['email']
    print(email)
Thank you
You'll have to use the latitude and longitude parameters in the request to "zoom out".
You can either change them manually or, since I'm a fan of osmnx, use that to get the boundaries of different areas and then set a distance in meters to create your bounding box:
import requests
import osmnx as ox
import os
os.environ["PROJ_LIB"] = "C:/Users/xxxxxxx/AppData/Local/Continuum/anaconda3/Library/share"; #fixr
# Get a boundary box of a city/place/address
city = ox.gdf_from_place('Paris, France')
# Distance to make boundary from center in meters
# Essentially allows you to zoom out
distance = 300000
# Get centroid of that city/place boundary box as (lat, lng):
# centroid.y is the latitude, centroid.x is the longitude
point = (city['geometry'].centroid.y.iloc[0], city['geometry'].centroid.x.iloc[0])
# Get a new bounding box reaching `distance` meters north, south, east and west of the point
# (assumes this osmnx version returns the tuple in the order north, south, east, west)
north, south, east, west = ox.bbox_from_point(point, distance=distance, project_utm=False, return_crs=False)
sw_lat = south
sw_lng = west
ne_lat = north
ne_lng = east
# website to scrape https://www.iadfrance.fr/trouver-un-conseiller
url ='https://www.iadfrance.fr/agent-search-location'
# Here are the coordinates from the original post
#payload = {
#'southwestlat': '47.0270782',
#'southwestlng': '-2.1560669',
#'northeastlat': '47.4930807',
#'northeastlng': '-1.0093689'}
payload = {
'southwestlat': sw_lat,
'southwestlng': sw_lng,
'northeastlat': ne_lat,
'northeastlng': ne_lng}
jsonObj = requests.get(url, params=payload).json()
emails = jsonObj['agents']
#print (emails)
for agent in emails:
    email = agent['email']
    print(email)
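If osmnx is not available, the bounding box can also be approximated with plain arithmetic. This is a minimal Python 3 sketch and not part of the answer above: the helper name bbox_around and the figure of roughly 111,320 meters per degree of latitude are assumptions for illustration, and the approximation gets worse near the poles and for very large distances.

```python
import math

# Rough average length of one degree of latitude in meters (an assumption,
# good enough for choosing a search window, not for surveying).
METERS_PER_DEGREE_LAT = 111_320


def bbox_around(lat, lng, distance_m):
    """Return request parameters for a box reaching distance_m from (lat, lng)."""
    dlat = distance_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlng = distance_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return {
        'southwestlat': lat - dlat,
        'southwestlng': lng - dlng,
        'northeastlat': lat + dlat,
        'northeastlng': lng + dlng,
    }


# Roughly central Paris, with the same 300 km distance as above.
payload = bbox_around(48.8566, 2.3522, 300000)
print(payload)
```

The resulting dictionary uses the same four parameter names as the request, so it could be passed as params to requests.get.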
I found the right way; I had to think outside the box.
I set the two corner coordinates manually to cover a very large area (one corner in the Atlantic and the other in Russia).
It works!
import requests
url ='https://www.iadfrance.fr/agent-search-location?southwestlat=9.884462&southwestlng=-35.58398&northeastlat=68.714264&northeastlng=44.796407'
jsonObj = requests.get(url).json()
emails = jsonObj['agents']
#print (emails)
for agent in emails:
    email = agent['email']
    print(email)
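With a box that large, some agents may have no email or may appear more than once. The following Python 3 sketch shows one way to collect the addresses defensively. Only the 'agents' and 'email' keys come from the code above; the helper name agent_emails and the sample data are hypothetical.

```python
def agent_emails(payload):
    """Collect unique, non-empty email addresses from a decoded JSON response."""
    seen = set()
    emails = []
    for agent in payload.get('agents', []):
        email = agent.get('email')
        # Skip agents without an email and addresses already collected.
        if email and email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


# Hypothetical sample shaped like the response used above.
sample = {'agents': [
    {'email': 'a@example.com'},
    {'email': None},
    {'name': 'no email key'},
    {'email': 'a@example.com'},
    {'email': 'b@example.com'},
]}
emails = agent_emails(sample)
print(emails)
```

In the script above, this would be called as agent_emails(jsonObj).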
I have written this code, which calls the Google Maps Geocoding API, and it does not work. Note that, as an entry-level developer, I am running it in Google Colab.
import requests
import smtplib
import pandas as pd
file_path = "C:/Users/30697/Downloads/addresses.xlsx"
api_key = "MED_316KQ_4XqvYLSYa2k="
lat1 = 38.1579862
lng1 = 23.9626608
lat2 = 38.1579862
lng2 = 23.9626608
lat3 = 38.1540804
lng3 = 23.9595323
lat4 = 38.1540804
lng4 = 23.9595323
def save_addresses_to_excel(addresses, file_path):
    # Create a DataFrame with the addresses
    df = pd.DataFrame({"Address": addresses})
    # Write the DataFrame to an Excel file
    df.to_excel(file_path, index=False)
def get_addresses_in_region(api_key, lat1, lng1, lat2, lng2, lat3, lng3, lat4, lng4):
    # Define the bounds of the region
    bounds = f"{lat1},{lng1}|{lat2},{lng2}|{lat3},{lng3}|{lat4},{lng4}"
    # Make the API request
    response = requests.get(f"https://maps.googleapis.com/maps/api/geocode/json?bounds={bounds}&key={api_key}")
    # Check if the request was successful
    if response.status_code == 200:
        # Parse the response JSON
        data = response.json()
        # Extract the addresses from the response
        addresses = [result["formatted_address"] for result in data["results"]]
        return addresses
    else:
        # Return an error message
        return "Error: Could not retrieve addresses."
I expect a series of addresses to be displayed as a result.
According to the documentation, the bounds parameter is an optional addition to the request, which helps to refine the results. You must also specify address, components, latlng or place_id. Assuming you're trying to do reverse geocoding, you probably need to specify a single coordinate using latlng.
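A reverse-geocoding request with latlng could look roughly like the Python 3 sketch below. It uses only the standard library; the helper names are hypothetical, and the code assumes the response carries a top-level "status" field next to the "results" list already used in the question.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def reverse_geocode_url(lat, lng, api_key):
    """Build a request URL that asks for the addresses at one coordinate."""
    query = urllib.parse.urlencode(
        {"latlng": f"{lat},{lng}", "key": api_key}, safe=","
    )
    return f"{GEOCODE_URL}?{query}"


def formatted_addresses(data):
    """Extract formatted addresses from a decoded response, or [] on failure."""
    if data.get("status") != "OK":
        return []
    return [result["formatted_address"] for result in data.get("results", [])]


def reverse_geocode(lat, lng, api_key):
    """Perform the request (needs network access and a valid key)."""
    with urllib.request.urlopen(reverse_geocode_url(lat, lng, api_key)) as response:
        return formatted_addresses(json.load(response))


print(reverse_geocode_url(38.1579862, 23.9626608, "YOUR_API_KEY"))
```

Returning an empty list on failure, rather than an error string, keeps the return type consistent, so the result can always be passed on to something like save_addresses_to_excel.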
My goal is to write a Python script that reads an email, selects a specific link in it, and then opens that link in a web browser.
At the moment I'm stuck at the part where I extract the URLs: I get all of them, but I want to filter them down to one specific link.
The specific URL contains "/user/cm-l.php?", but the part after the question mark is randomly generated.
Does anyone know how to fix this, or how to edit the script so that it only keeps URLs containing that part?
I tried re.search/findall/match, but I couldn't get it to filter for only that URL.
import imaplib
import email
import re
# imap and user credentials.
mail = imaplib.IMAP4_SSL('imap.domain.com')
mail.login('username#domain.com', 'password')
mail.list()
# connect to right mailbox inside inbox.
mail.select("inbox")
result, data = mail.search(None, "ALL")
# data is a list.
ids = data[0]
# ids is a space separated string.
id_list = ids.split()
# changes which e-mail to read. '-1': gets the latest e-mail.
latest_email_id = id_list[6]
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
raw_email = str(raw_email)
# this will search all the urls in an email.
def Find(string):
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/user)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex,string)
    return [x[0] for x in url]
# prints all of the URLs.
print(Find(raw_email))
By defining a regex pattern with capture groups (..), you can find the exact string together with an optional prefix and suffix. The pattern ([a-zA-Z\/]*?)(\/user\/cm-l\.php\?)(.*)? contains three groups.
The following example shows how to access the extracted content.
import re
mailstring = """
/user/cm-l.php?
some link : /main/home/user/cm-l.php?
link with suffix /user/cm-l.php?345TfvbzteW4rv#!_
"""
def Find(string):
    pattern = r'([a-zA-Z\/]*?)(\/user\/cm-l\.php\?)(.*)?'
    for idx,match in enumerate(re.findall(pattern,string)):
        print(f'### Match {idx}')
        print('full= ',''.join(match))
        print('0= ',match[0])
        print('1= ',match[1]) # match[1] is the base url
        print('2= ',match[2])
Find(mailstring)
'''
### Match 0
full= /user/cm-l.php?
0=
1= /user/cm-l.php?
2=
### Match 1
full= /main/home/user/cm-l.php?
0= /main/home
1= /user/cm-l.php?
2=
### Match 2
full= /user/cm-l.php?345TfvbzteW4rv#!_
0=
1= /user/cm-l.php?
2= 345TfvbzteW4rv#!_
'''
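To keep only complete URLs that contain that path, a single pattern without capture groups may be enough. This Python 3 sketch is a variation on the idea above, not the answerer's code; the sample text and the example.com links are made up, and the pattern assumes the URL ends at whitespace, a quote or an angle bracket.

```python
import re

# A full http(s) URL whose path contains /user/cm-l.php? followed by any
# run of characters that are not whitespace, quotes or angle brackets.
CM_LINK = re.compile(r"https?://[^\s\"'<>]+/user/cm-l\.php\?[^\s\"'<>]*")


def find_cm_links(text):
    """Return every URL in text that contains /user/cm-l.php?"""
    return CM_LINK.findall(text)


sample = (
    "Hello, visit https://example.com/about for info.\n"
    'Confirm here: <a href="https://example.com/user/cm-l.php?a1B2c3">link</a>\n'
    "Settings: http://example.com/user/settings.php?x=1\n"
)
cm_links = find_cm_links(sample)
print(cm_links)
```

Because the pattern has no groups, re.findall returns the whole match for each hit. The first hit could then be handed to webbrowser.open from the standard library.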
I'm using the Google API to obtain JSON data for nearby coffee outlets. To do this, I need to encode the latitude and longitude into the URL.
The required URL: https://maps.googleapis.com/maps/api/place/textsearch/json?query=coffee&location=22.303940,114.170372&radius=1000&maxprice=3&key=myAPIKey
The URL I'm obtaining using urlencode: https://maps.googleapis.com/maps/api/place/textsearch/json?query=coffee&location=22.303940%2C114.170372&radius=1000&maxprice=3&key=myAPIKEY
How can I remove the "%2C" from the URL? (My code is shown below.)
import urllib.parse
serviceurl_placesearch = 'https://maps.googleapis.com/maps/api/place/textsearch/json?'
parameters = dict()
query = input('What are you searching for?')
parameters['query'] = query
parameters['location'] = "22.303940,114.170372"
while True:
    radius = input('Enter radius of search in meters: ')
    try:
        radius = int(radius)
        parameters['radius'] = radius
        break
    except ValueError:
        print('Please enter number for radius')
while True:
    maxprice = input('Enter the maximum price level you are looking for(0 to 4): ')
    try:
        maxprice = int(maxprice)
        parameters['maxprice'] = maxprice
        break
    except ValueError:
        print('Valid inputs are 0,1,2,3,4')
# API_key must be defined before this point
parameters['key'] = API_key
url = serviceurl_placesearch + urllib.parse.urlencode(parameters)
I added this piece of code in to make the URL work however I don't think this is a long term solution. I'm looking for a more long term solution.
urlparts = url.split('%2C')
url = ','.join(urlparts)
You can pass safe="," to urlencode so that the comma is not percent-encoded:
import urllib.parse
parameters = {'location': "22.303940,114.170372"}
urllib.parse.urlencode(parameters, safe=',')
Result
location=22.303940,114.170372
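Applied to the whole query from the question, this could look like the Python 3 sketch below. The parameter values are copied from the required URL in the question, and myAPIKey is the question's placeholder, not a real key.

```python
from urllib.parse import urlencode

serviceurl_placesearch = 'https://maps.googleapis.com/maps/api/place/textsearch/json?'

# Values copied from the required URL in the question.
parameters = {
    'query': 'coffee',
    'location': '22.303940,114.170372',
    'radius': 1000,
    'maxprice': 3,
    'key': 'myAPIKey',
}

# safe=',' tells urlencode to leave commas unescaped in every value.
url = serviceurl_placesearch + urlencode(parameters, safe=',')
print(url)
```

Note that safe applies to every value in the dictionary, so any other value that contains a comma would also keep it unescaped.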
I want to find the distance between two locations using the Google API. I want the output to look like "The distance between location 1 and location 2 is 500 miles" (the distance here is just an example), but the current program prints various other output that I can't use to get the desired result. Can you please show me the exact procedure to do this?
import urllib
import json
serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'
while True:
    address = raw_input('Enter location: ')
    if len(address) < 1 : break
    url = serviceurl + urllib.urlencode({'sensor':'false', 'address': address})
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    print 'Retrieved',len(data),'characters'
    try: js = json.loads(str(data))
    except: js = None
    # js is None when the response could not be parsed as JSON
    if js is None or 'status' not in js or js['status'] != 'OK':
        print '==== Failure To Retrieve ===='
        print data
        continue
    print json.dumps(js, indent=4)
    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    print 'lat',lat,'lng',lng
    location = js['results'][0]['formatted_address']
    print location
Google has a specific API for that: the Google Maps Distance Matrix API.
Distance & duration for multiple destinations and transport modes.
Retrieve duration and distance values based on the recommended route
between start and end points.
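A minimal Python 3 sketch of how a Distance Matrix response could be turned into the sentence from the question. The helper names are hypothetical and the sample response is hand-written with made-up values; it only mirrors the rows/elements/distance layout of the JSON output.

```python
from urllib.parse import urlencode

DISTANCE_MATRIX_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def distance_matrix_url(origin, destination, api_key):
    """Build the request URL; units=imperial asks for distances in miles."""
    params = {
        "origins": origin,
        "destinations": destination,
        "units": "imperial",
        "key": api_key,
    }
    return DISTANCE_MATRIX_URL + "?" + urlencode(params)


def describe_distance(origin, destination, data):
    """Turn a decoded response into one human-readable sentence."""
    if data.get("status") != "OK":
        return f"No distance available between {origin} and {destination}"
    # One origin and one destination give exactly one row with one element.
    element = data["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        return f"No route found between {origin} and {destination}"
    text = element["distance"]["text"]
    return f"The distance between {origin} and {destination} is {text}"


# Hand-written stand-in for a response; the values are made up.
sample = {
    "status": "OK",
    "rows": [{"elements": [{"status": "OK",
                            "distance": {"text": "500 mi", "value": 804672}}]}],
}
sentence = describe_distance("location 1", "location 2", sample)
print(sentence)
```

In a real script, the URL from distance_matrix_url would be fetched and the decoded JSON passed to describe_distance.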
If you just need the distance between two points on the globe you may want to use the Haversine formula
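For reference, the Haversine formula fits in a few lines of Python 3. This is a sketch that treats the Earth as a sphere; the radius of 6371.009 km is a commonly used mean value and an assumption here, and the coordinates are example points for Newport, RI and Cleveland, OH.

```python
import math

# Mean Earth radius in km; treating the Earth as a sphere is an approximation.
EARTH_RADIUS_KM = 6371.009


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Example points: Newport, RI and Cleveland, OH.
distance = haversine_km(41.49008, -71.312796, 41.499498, -81.695391)
print(f"{distance:.1f} km")
```

The result is the straight-line distance over the surface, not the driving distance a routing API would report.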
If you know lat and lon, use the geopy package:
In [1]: from geopy.distance import great_circle
In [2]: newport_ri = (41.49008, -71.312796)
In [3]: cleveland_oh = (41.499498, -81.695391)
In [4]: great_circle(newport_ri, cleveland_oh).kilometers
Out[4]: 864.4567616296598
The following link provides JSON data about a BTC address -> https://blockchain.info/address/1GA9RVZHuEE8zm4ooMTiqLicfnvymhzRVm?format=json.
The bitcoin address can be viewed here --> https://blockchain.info/address/1GA9RVZHuEE8zm4ooMTiqLicfnvymhzRVm
As you can see in the first transaction on 2014-10-20 19:14:22, the TX had 10 inputs from 10 addresses. I want to retrieve these addresses using the API, but I have been struggling to get this to work. The following code only retrieves the first address instead of all 10. I know it has to do with the JSON structure, but I can't figure it out.
import json
import urllib2
import sys
#Random BTC adress (user input)
btc_adress = ("1GA9RVZHuEE8zm4ooMTiqLicfnvymhzRVm")
#API call to blockchain
url = "https://blockchain.info/address/"+(btc_adress)+"?format=json"
json_obj = urllib2.urlopen(url)
data = json.load(json_obj)
#Put tx's into a list
txs_list = []
for txs in data["txs"]:
    txs_list.append(txs)
#Cut the list down to 5 recent transactions
listcutter = len(txs_list)
if listcutter >= 5:
    del txs_list[5:listcutter]
# Get number of inputs for tx
recent_tx_1 = txs_list[1]
total_inputs_tx_1 = len(recent_tx_1["inputs"])
The block below should put all 10 input addresses in the list 'output_adress', but it only does so for the first one:
output_adress = []
output_adress.append(recent_tx_1["inputs"][0]["prev_out"]["addr"])
print output_adress
Your help is always appreciated, thanks in advance.
Because you only add one address to it. Change it to this:
output_adress = []
for i in xrange(len(recent_tx_1["inputs"])):
    output_adress.append(recent_tx_1["inputs"][i]["prev_out"]["addr"])
print output_adress
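In Python 3 the same loop can be written as a list comprehension. The sketch below uses a hypothetical transaction with placeholder addresses, shaped like the entries in data["txs"], and it skips inputs that carry no "addr" field, which is an assumption about what the data may contain.

```python
def input_addresses(tx):
    """Return the address of every input of one transaction dict."""
    return [
        tx_input["prev_out"]["addr"]
        for tx_input in tx["inputs"]
        # Skip inputs that have no previous output or no address.
        if "addr" in tx_input.get("prev_out", {})
    ]


# Hypothetical transaction; the addresses are placeholders, not real ones.
sample_tx = {"inputs": [
    {"prev_out": {"addr": "addr-1"}},
    {"prev_out": {"addr": "addr-2"}},
    {"prev_out": {}},
]}
addresses = input_addresses(sample_tx)
print(addresses)
```

With the data from the question, this would be called as input_addresses(recent_tx_1).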