Unable to let my script populate result using post requests - python

I've written a Python script that uses Selenium to parse the id, vikey and cbhtmlfragid values, which are meant to be used as the payload of an HTTP POST request. As I found it difficult to scrape id, vikey and cbhtmlfragid with requests alone, I decided to grab them with Selenium so that I can use them when making the POST request.
I'm trying to populate the results by entering a in the input box right next to Entity Name Or Identifier. I noticed that the results are loaded through a POST request, which is what I'm trying to reproduce programmatically.
website link
To populate the result it is necessary to follow the steps sequentially in this image which ultimately leads to this image
I've tried with:
import re
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = 'https://www.businessregistration.moc.gov.kh/'
post_url = 'https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/update.html?id={}'
payload = {
    'QueryString': 'a',
    'SourceAppCode': 'cambodia-br-soleproprietorships',
    'OriginalVersionIdentifier': '',
    'nodeW772-Advanced': 'N',
    '_CBASYNCUPDATE_': 'true',
    '_CBHTMLFRAGNODEID_': 'W762',
    '_CBHTMLFRAGID_': '',
    '_CBHTMLFRAG_': 'true',
    '_CBNODE_': 'W778',
    '_VIKEY_': '',
    '_CBNAME_': 'buttonPush'
}


def get_content(wait,link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[data-rel='#appMainNavigation']"))).click()
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[class$='menu-soleproprietorships']"))).click()
    elem = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"a[class$='menu-brSoleProprietorSearch']")))
    driver.execute_script("arguments[0].click();",elem)
    item_id = driver.current_url.split("id=")[1].split("&_timestamp")[0]
    x_catalyst = re.findall(r"sessionId:'(.*?)',", str(driver.page_source), flags=re.DOTALL)[0]
    item = re.findall(r"viewInstanceKey:'(.*?)',", str(driver.page_source), flags=re.DOTALL)[0]
    elem = re.findall(r"guid:(.*?),", str(driver.page_source), flags=re.DOTALL)[0]
    return item_id,x_catalyst,item,elem


def make_post_requests(item_id,x_catalyst,item,elem):
    payload['_VIKEY_'] = item
    payload['_CBHTMLFRAGID_'] = elem
    res = requests.post(post_url.format(item_id),data=payload,headers={
        'user-agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
        'x-requested-with':'XMLHttpRequest',
        'x-catalyst-session-global':x_catalyst
    })
    soup = BeautifulSoup(res.text,"lxml")
    result_count = soup.select_one("[class='appPagerBanner']")
    print(result_count)


if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    item_id,x_catalyst,item,elem = get_content(wait,link)
    make_post_requests(item_id,x_catalyst,item,elem)
    driver.quit()
When I execute the above script, it prints no result, so I suppose I've gone wrong somewhere.
How can I make my script populate the results using POST requests?

The POST request needs several parameters that come from the page itself, so here is the plan.
First, make a GET request to the target URL and collect all the required POST data.
https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-soleproprietorships%26service%3DregisterItemSearch&target=cambodia-master
Then inspect how the POST request is built, which means tracking each POST parameter and what it does behind the scenes:
nodeWxxx-Advanced, where xxx is an auto-generated three-digit number.
_CBHTMLFRAGNODEID_ also holds an auto-generated value.
_CBHTMLFRAGID_ holds an auto-generated Unix timestamp used to validate the POST request.
_CBNODE_ is a control key that is auto-generated on the first load; the value to send is the scraped node number plus 4, and it stays static from then on.
_VIKEY_ holds a value that is also present in the HTML source.
_CBNAME_ takes one of three values: buttonPush, selectPage and pageSizeChange. We don't need the first, so we will work with the second and third; the names are self-explanatory.
_CBVALUE_ is a dual-purpose key holding a number: with selectPage it is the page number, and with pageSizeChange it is the page-size option, where 4 means 200 items per page, the maximum.
This may look complicated if you parse the HTML and find multiple Wxxx values (where xxx is three digits), but I was able to work out the host's structure.
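To make the two roles of _CBVALUE_ concrete, here is a minimal sketch of building the form data for each action. The helper name with_action and the placeholder base dictionary are illustrative assumptions, not part of the site's API; the real base values are scraped from the page.

```python
def with_action(base_payload, action, value):
    """Return a copy of the form data for one _CBNAME_ action.

    action is 'selectPage' (value = page number) or
    'pageSizeChange' (value = size option; 4 means 200 items per page).
    """
    if action not in ("selectPage", "pageSizeChange"):
        raise ValueError(f"unsupported action: {action}")
    payload = dict(base_payload)       # leave the shared base untouched
    payload["_CBNAME_"] = action
    payload["_CBVALUE_"] = str(value)
    return payload


# Placeholder base data (assumption); the real values come from the page.
base = {"QueryString": "a", "_VIKEY_": "<from page>"}
size_request = with_action(base, "pageSizeChange", 4)
page_request = with_action(base, "selectPage", 2)
print(size_request)
print(page_request)
```

Copying the base dictionary keeps the page-size request and the per-page requests from overwriting each other's _CBNAME_ and _CBVALUE_.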
Our POST data is now prepared. Note, however, that the GET request is redirected to the following URL:
https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/view.html?id=
Here id is an automatically generated token that the host uses to decide whether your request is valid.
Before making the POST request, we need to change the path segment that selects the operation. In our case view becomes update, giving:
https://www.businessregistration.moc.gov.kh/cambodia-master/viewInstance/update.html?id
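The view-to-update rewrite can be sketched as a small string helper. The function name to_update_url and the example URL are hypothetical; in particular the id token and the _timestamp value below are made up for illustration.

```python
def to_update_url(view_url):
    """Turn the redirected view.html URL into the update.html endpoint.

    Keeps everything before the first '&' (dropping trailing query
    parameters such as a timestamp), then swaps 'view.' for 'update.'.
    """
    return view_url.split("&")[0].replace("view.", "update.")


# Hypothetical redirected URL; the id token and timestamp are invented.
example = ("https://www.businessregistration.moc.gov.kh/cambodia-master/"
           "viewInstance/view.html?id=EXAMPLE_TOKEN&_timestamp=123")
print(to_update_url(example))
```

Replacing "view." rather than "view" leaves the viewInstance path segment untouched.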
Now we are ready to call the target. First we make one POST request to change the page size to 200 items, as described above.
After that, we loop over the pages, sending one POST per page with our form data.
The site currently returns 1,421 results. Eight pages of 200 items would hold 1,600, but the last page contains only 21 items, which gives 7 × 200 + 21 = 1,421.
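The page count need not be hard-coded. As a rough sketch, assuming the total number of results is known, the number of pages and the size of the last page follow from a ceiling division (page_plan is an illustrative name):

```python
import math


def page_plan(total_items, page_size):
    """Return (number_of_pages, items_on_last_page) for a paged result set."""
    if total_items == 0:
        return 0, 0
    pages = math.ceil(total_items / page_size)
    last_page_items = total_items - (pages - 1) * page_size
    return pages, last_page_items


pages, last = page_plan(1421, 200)
print(pages, last)  # 8 21
```

With these numbers the loop over pages would be range(1, pages + 1), which matches the range(1, 9) used for 1,421 results.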
Below is the final code:
import requests
from bs4 import BeautifulSoup
import re

one = "https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-soleproprietorships%26service%3DregisterItemSearch&target=cambodia-master"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"
}


def main(url):
    with requests.session() as req:
        # The GET is redirected to view.html?id=...; the page embeds the POST values
        r = req.get(url, headers=headers, allow_redirects=True)
        node = re.search(r"nodeW\d{3}-Advanced", r.text).group()
        nodeid = re.search(r"AsyncWrapperW\d{3}", r.text).group()
        agid = re.search(r"guid:(\d+)", r.text).group(1)
        cbnode = re.findall(r"Callback\('([^']*)'", r.text)[5]
        vikey = re.search(r"viewInstanceKey:'([^']*)'", r.text).group(1)
        data = {
            'QueryString': 'a',
            'SourceAppCode': 'cambodia-br-soleproprietorships',
            'OriginalVersionIdentifier': '',
            f"{node}": 'N',
            '_CBASYNCUPDATE_': 'true',
            '_CBHTMLFRAGNODEID_': nodeid[-4:],
            '_CBHTMLFRAGID_': agid,
            '_CBHTMLFRAG_': 'true',
            '_CBNODE_': f"W{int(cbnode[1:])+4}",
            '_VIKEY_': vikey,
            '_CBNAME_': 'selectPage'
        }
        # Switch the endpoint from view.html to update.html
        target = r.url.split("&")[0].replace("view.", "update.")
        # First POST: set the page size to 200 items (option 4)
        change = data.copy()
        change['_CBNAME_'] = "pageSizeChange"
        change['_CBVALUE_'] = "4"
        r = req.post(target, headers=headers, data=change)
        # Then one POST per page
        for num in range(1, 9):
            print(f"Extracting Page# {num}\n {'*' * 40}")
            data['_CBVALUE_'] = num
            r = req.post(target, headers=headers, data=data)
            soup = BeautifulSoup(r.content, 'html.parser')
            for item in soup.findAll("span", class_="appReceiveFocus")[3:]:
                print(item.text)


main(one)
Please also stop deleting your questions once you have received an answer, as you have done with your previous posts.
Sample of output:
Extracting Page# 1
****************************************
អេមី ជេមស៍ / AMY GEMS (50009790)
អាហារដ្ឋាន លីន លៀនជីន ហ៊្វូដ ហ្វ្លេវើ / AHARATHAN LIN LIANJIN FOOD FLAVOR (50009800)
អាមេទីស ដាយមិន ខេធីវី / AMETHYST DIAMOND KTV (50009801)
អេស៊ា ដាយអឹម៉ិន ស៊ីធី អេនធើប្រាយ / ASIA DIAMOND CITY ENTERPRISE (50009845)
អរុណ យ៉ាណា / ARUN YANA (50009848)
អេ ដឺ ស៊ាវ ជី ឌាន / A DE XIAO CHI DIAN (50009856)
អង្គរ ផែល ប៊ូទិក / ANGKOR PAL BOUTIQUE (50009862)
អូហ្ក្រេ ឌុយ សូឡី / AUGRE DU SOLEIL (50001007)
អង្គរធំ មន្ទីរពហុព្យាបាល និង សម្ភព / ANGKOR THOM MONTY PEAHUK PYEABAL & SAMPHUB (50009905)
អាមីកា ត្រាវែល (កាំបូដ្យ) / AMICA TRAVEL (CAMBODGE) (50009916)
អង្គរ ជិនសារ​ ហ៊្វូដ / ANGKOR JINSHA FOOD (50009973)
អចលនទ្រព្យ វ៉ាន់សៀង / ACHALNAK TROP VEAN SIENG (50009986)
អាឡាដាំង សឹបផ្លាយអឺរ / ALADANG SUPPLIER (50010017)
អាន ស៊ី មីន ណាន ស៊ាវ ឈី / AN XI MIN NAN XIAO CHI (50010068)
អង្គរ មែនហ្គោ ទ្រី រីហ្សត / ANGKOR MANGO TREE RESORT (50010072)
អាហារដ្ឋាន ជៀ មីង យៀក កាយ / AHARATHAN JIA MING YUE CAI (50010073)
អូប៊ែ អេលេហ្វិន ប្លង្គ / AUBERGE ELEPHANT BLANC (50001029)
អានីស ស្តារ អ៊ិនវ៉ាយរ៉ិនម៉ិនថល ខូតធីង / ANISE STAR ENVIRONMENTAL COATINGS (50010091)
អរសាំ ហៅស៍ វីឡា / AWESOME HOUSE VILLA (50010131)
អាន វេងស្រ៊ុន ត្បូងឃ្មុំ / AN VENGSRUN TBONG KHMUM (50010150)
ភោជនីយដ្ឋាន អា យុង តា ផាយ តាំង / AH YONG TA PAI TANG RESTAURANT (50010177)
អា ឆ័ន ចុង ឆាង ធីង / A CHUAN ZHONG CHAN TING (50010232)
អាតឡូវ ខនសាល់ធីង / ARTLOW CONSULTING (50010235)
សណ្ឋាគារ អង្គរ យ៉ាយ័ន / ANGKOR YAYUAN HOTEL (50010236)
អេ.អេម ហាងបញ្ចាំ / A.M HANG BANH CHAM (50010326)
អង់ទ្រីក្ស៍ ត្រេឌើរ / ANTRIKSH TRADERS (50010335)
អង្គរ សុខសាន្តត្រាណ / ANGKOR SOKSAN TRAN (50001055)
អង្គរ ស៊ែរ ធួរ / ANGKOR SHARE TOURS (50010341)
អង្គរ សិលា ពេជ្រ / ANGKOR SEILA PICH (50001056)
អ័ង ចូង អ័ង / ANG CHOUNG ANG (50010382)
អៅ មិន ជិន ឡុង 18 / AO MEN JIN LONG 18 (50010387)
អង្គរ ភើល / ANGKOR PEARL (50001067)
ភោជនីយដ្ឋាន អូ ម៉ាស្សេ / AU MARCHE RESTAURANT (50010465)
ភោជនីយដ្ឋាន អេ អាយ / A I RESTAURANT (50010472)
អាន ស៊ី មីង ស៊័ន ជឹន ស័រ / AN XI MING XUAN ZHEN SUO (50010513)
អាម៉ាណូ ឃែរ វ៉ធើរ / AMANO CARE WATER (50010521)
អាសស្តា វ៉ធើរ / ASSDA WATER (50010555)
អង្គរ ឡូធើស អាត / ANGKOR LOTUS ART (50010576)
សណ្ឋាគារ អេផល 2 / APPLE 2 HOTEL (50010621)
អាហារដ្ឋាន ផ្ទះបាយខ្មែរ ភឿក / AHARATHAN PHTEAH BAI KHMER PERK (50001084)
អង្គរ ស៊ីវុត្ថា ហូថេល / ANGKOR SIVUTHA HOTEL (50010625)
អាទីហ្វិក ឃីតឈិនវែរ អិកសុីប៊ីសិន សេនធ័រ / ARTIFEX KITCHENWARE EXHIBITION CENTER (50010654)
សណ្ឋាគារ អង្គរ យៀ ធីម / ANGKOR YEAR THEME HOTEL (50010659)
អេប៊ីឡូស៍ ឌីជីធលអេដជ៍ / ABILOS DIGITALEDGE (50010666)
សណ្ឋាគារ អេភី 7 ដេ / AP 7 DAY HOTEL (50010700)
អង្គរ អឹមេហ្ស៊ីង ហូលីដេ & ធួរ / ANGKOR AMAZING HOLIDAY & TOURS (50001092)
អេអេហ្វអេសស៊ី ព្រីមៀម ស្ទ័រ / AFSC PREMIUM STORE (50010703)
ភោជនីយដ្ឋាន អូ រ៉ង់ដេ-វូ / AU RENDEZ-VOUS RESTAURANT (50010720)
អន្តរជាតិលេខ១ វី យា ណា / ANYARAKCHEAT LEK 1 VY YA NA (50010733)
អៅ ម៉ាយ យ៉ា លី យី មេន ជ័ង / AO MAY YA LY YIE MAN CHUANG (50010740)
អេស្រ៊ី អ៊ិនធើណេសិនណល អ៊ិនវេសម៉ិន / ATHREE INTERNATIONAL INVESTMENT (50010764)
អង្គរ សាន់ អែន គុយ គីមយ៉ា ហៅស៍ / ANGKOR SUN AND KUY KIM YA HOUSE (50010801)
អាតឡូវ ខនសាល់ធីង I / ARTLOW CONSULTING I (50010809)
អា ហ្វុង ស៊ុបភើម៉ាឃីត / AR FONG SUPERMARKET (50010826)
អាម៉ាហ្សូន អេឡិចទ្រីក សប / AMAZON ELECTRIC SHOP (50010830)
អេស៊ីអិល ប៊ី ផនសប / ACL B PAWNSHOP (50010857)
អេពិក អ៊េចឌី អេទ្បីវាទ័រ / APEX HD ELEVATOR (50010891)
អេ & អ៊ីវ៉ា ហ្វេសិន / A & EVA FASHION (50010895)
អេននី វីល ម៉ាត / ANNIE WILL MART (50010912)
ភោជនីយដ្ឋាន អានហួយ ហ័យណាន / AN HUI HUAI NAN RESTAURANT (50010941)
អេនថូនី អឹផាតម៉ិន / ANTHONY APARTMENT (50010982)
អាណែត ប៊ីយូធី បាយ តូណូ ប្រ៊េន / AH NETH BEAUTY BY TONO BRAND (50011007)
ភោជនីយដ្ឋាន អេរេប៉ាស៍ ខារ៉ាខាស៍ / AREPAS CARACAS RESTAURANT (50011033)
អង្គរ ប៊ែដមីនថុន ក្លឹប សៀមរាប / ANGKOR BADMINTON CLUB SIEMREAP (50011075)
ភោជនីយដ្ឋាន អា វ៉ាង ជឺ ប៉ាវ អ៊ី / A WANG CHEU BAO YI RESTAURANT (50011102)
ភោជនីយដ្ឋាន អាយ សាង សួយ ជូរ / AI SHANG SHUI ZHOU RESTAURANT (50011103)
អេ ធី អ៊ីនជិននារីង វើកសប / A T ENGINEERING WORKSHOP (50011140)
អចលនទ្រព្យ ទឹកមាស ធីអិម / AKCHALNAKTROP TEUK MEAS TM (50011184)
អារុណរះ រ៉ូមែន / ARUNRASS ROMAIN (50011207)
អាអយ ហូសថេល / AOI HOSTEL (50011221)
អំពូលភ្លើង ពន្លឺពត៌មាន / AMPUOULPHLEUNG PONLEU POR MEAN (50011335)
អន្លង់ធំ វ៉ធើរ / ANLONGTHOUM WATER (50011341)
អង្គរ សរ សេរី រីហ្សត / ANGKOR SOR SEREI RESORT (50011347)
ផលិតកម្ម អាថាន់ ភាពយន្ត / ATHAN FILM PRODUCTION (50011350)
អេ អាយ ជី អូ កាហ្វេ / A I G O CAFE (50011416)
អាឡេវ៉ូ / ALEVU (50011433)
អាហារសមុទ្រចំហុយ ក្តាមមាស / AHAR SAMUTH CHOMHUY KDAMMEAS (50011470)
អសនៈ ប៊ូទិក វីឡា / ASANAK BOUTIQUE VILLA (50011487)
អាហារដ្ឋាន ផ្ទះបាយខ្មែរ ភឿក I / AHARATHAN PHTEAH BAI KHMER PERK I (50011504)
អចលនទ្រព្យ យាន ឈីងប៊ូ / ACHALNAKTRORP YAN QINGBU (50011521)
ភោជនីយដ្ឋាន អា ឡាំង សាវ ខាវ / AH LAING SHAO KHAO (50011555)
អារ៉ាយ៉ា អង្គរ រេសុីដេន I / ARAYA ANGKOR RESIDENCE I (50011563)
អាស៊ីង ភ្នាក់ងារទេសចរណ៏ / AH XING PNAKNGEATESACHOR (50011586)
សណ្ឋាគារ អង្គរ បូស្គី / ANGKOR BOSKY HOTEL (50011621)
ហាងនំប័ុង អូសសីុ ហ្គួមេ / AUSSIE GOURMET BAKERY (50001188)
សណ្ឋាគារ អង្គរ ហាវជាំង / ANGKOR HAOJIANG HOTEL (50011644)
អាប៉ូឡូ រៀល អុីស្ទេត / APOLLO REAL ESTATE (50011655)
ផ្ទះសំណាក់ អរុណរះ ហេងហេង / ARUNRAS HENG HENG GUESTHOUSE (50011717)
អូតូ ហាប់ (ខេមបូឌា) / AUTO HUB (CAMBODIA) (50011772)
ផ្សារទំនើប អាយ សាំង / AI SANG SUPERMARKET (50011790)
អាណាមេន ព្រីនធីង ហៅស៍ / ANAMAN PRINTING HOUSE (50011839)
អាតម៉ាលែន រីសត៍ / ATMALAND RESORT (50011842)
អាសុី ទ្រីលីហ្គល អុិនធើណេសិនណល ស្កូល (អេ.ធី.អាយ.អេស) / ASIA TRILINGUAL INTERNATIONAL SCHOOL (A.T.I.S) (50011908)
អេ & ហ្សេត ម៉ាស្ទ័រ / A & Z MASTER (50011909)
អាយ យៀក ស៊ួយ ឡេវ ហ៊ួយ / AI YUE SHUI LIAO HUI (50011946)
អេស៊ា ម៉ាំងឃី ត្រាវែល (ខេមបូឌា) / ASIA MONKEY TRAVEL (CAMBODIA) (50011972)
អេលីស វីឡា / ALICE VILLA (50011993)
អង្គរ ប៊ែមប៊ូ ហ្វាយប៊ឺរ / ANGKOR BAMBOO FIBERS (50012035)
ភោជនីយដ្ឋាន អាដាហ្ស៊ី យេផេនីស / ADACHI JAPANESE RESTAURANT (50012044)
ភោជនីយដ្ឋាន អៃ សាង ជា ស៊ី / AI SHANG JIA SHI RESTAURANT (50012069)
អ៊ែតវេនឈ័រ សៀមរាប ​រេស៊ីដេន / ADVENTURE SIEMREAP RESIDENCE (50012135)
អាំង ប៊ីប៊ីឃ្យូ គ្រីល / ANG BBQ GRILL (50012143)
អចលនទ្រព្យ ចាសួន / AKCHALNAKTROP JASOUN (50012160)
អឹផាតម៉ិន 103 / APARTMENT 103 (50012161)
អេ.ឌី អាន ដុង អេនធើធេនមេន / A.D AN DONG ENTERTAINMENT (50012165)
អេស៊ា រីហ្វ្លេក ត្រាវែល អេន ធួរ / ASIA REFLECT TRAVEL AND TOUR (50012181)
អាណាខន់ដា តិចណូឡូជី / ANACONDA TECHNOLOGY (50012187)
អេឌីជេ គ្រីអេធីវ / ADJ CREATIVE (50012256)
អង្គរ ប៊ុន ធួរ / ANGKOR BUN TOUR (50012265)
អ៊ែមប៊ើ អង្គរ វីឡា ហូថេល អេន ស្ប៉ា / AMBER ANGKOR VILLA HOTEL AND SPA (50012311)
អៃ ឈីង ហាយ ប៊ូទីក ហូថេល / AI QING HAI BOUTIQUE HOTEL (50012317)
អនុស្សាវរីយ៍ល្អកោះរ៉ុង / ANUSAORY LAOR KOH RONG (50012340)
អង្គរ មែនហ្គោ ទ្រី រេស៊ីដេន ហូថេល / ANGKOR MANGO TREE RESIDENCE HOTEL (50012378)
សណ្ឋាគារ អេឡិចស៊ីស / ALEXIS HOTEL (50012418)
អចលនទ្រព្យ តាកទឺ / AKCHALNAK TROP TAK TEUR (50012422)
អាហារដ្ឋាន ហោ វ៉ាង សេង យ៉ាន / AHARATHAN HAO WANG SHENG YAN (50012429)
អាំង ថាណូ / AING TANO (50012439)
អាយសាង កុំព្យូទ័រ ណេត សប / AISANG COMPUTER NET SHOP (50012501)
អង្គរកាតាលីណា ប៊ូទីកវីឡា / ANGKOR CATALINA BOUTIQUE VILLA (50012509)
អេឌវ៉ាន់ឈ័រ ដូម រីសត / ADVENTURE DOME RESORT (50012521)
អេ.ភី.ធី.ធី ត្រាវែល & ធួរ / A.P.T.T TRAVEL & TOURS (50012528)
អឹផាតម៉ិន វីភី លី ហេង / APARTMENT VP LY HENG (50012540)
អាហារដ្ឋាន វីភី លី ហេង / AHARATHAN VP LY HENG (50012541)
អាដូរាស៍ ស្ប៉ា / ADORASS SPA (50012593)
ភោជនីយដ្ឋាន អាន់ណាពួរណា ឥណ្ឌាន / ANNAPURNA INDIAN RESTAURANT (50012620)
អវតាន ឆាតធើរ បាស / AVATAN CHARTER BUS (50012640)
អាយសាង កុំព្យូទ័រ ណេត សប សង្កាត់ 3 / AISANG COMPUTER NET SHOP SANGKAT 3 (50012660)
អាមីលីយ៉ា ប៊ីយូធី ខសមេធីក / AMILIYA BEAUTY COSMETICS (50012679)
អង្គរ ស្ក្រាប់ មរកត & ស្ប៉ា / ANGKOR SCRUB MOROKOT & SPA (50012680)
អេស៊ាន II ហាងបញ្ចាំ / ASIAN II HANG BANHCHAM (50012681)
អាហារដ្ឋាន ម្លប់សុបិន្ត / AHARATHAN MLOB SOBEN (50012682)
អា លីលី ប៊្រែន ប៊ីយូធី / AH LYLY BRAND BEAUTY (50012729)
អង្គរ ម៉ូនូមែន ហូថេល / ANGKOR MONUMENT HOTEL (50012733)
អេភីខេអ សៀមរាបអង្គរ ខនទ្រីសាយ / APKR SIEM REAP ANGKOR COUNTRY SIDE (50012752)
អង្គរ អុីន ហ្វ័រ ហូមស្តេ / ANGKOR UNE FOIS HOMESTAY (50000130)
អង្ករ ស្ថាបនិក / ANGKOR SATHABNEK (50001312)
អេ1 រីសាយខល / A1 RECYCLE (50012850)
អាដ័រ មី / ADORE ME (50012890)
អាវ គឹមហយ / AV KOEMHORY (50012896)
អឹផាតម៉ិន 19 / APARTMENT 19 (50012900)
អានខិន ខា រីភែ / ANKEN CAR REPAIR (50012909)
សណ្ឋាគារ អង្គរ សេនត្រល់ ផាក & ស្ប៉ា / ANGKOR CENTRAL PARK HOTEL & SPA (50012921)
អង្គរ ម៉ាវេលើស ត្រាវែល / ANGKOR MARVELLOUS TRAVEL (50012945)
អេនជូលី វីឡា / ANJULIE VILLA (50013023)
អាហារដ្ឋាន សំ សំអឿន ភូមិគៀនត្រជាក់ចិត្ត / AHARATHAN SAM SAM OEUN PHOM KEAN TRACHEAK CHET (50013043)
អេនី ផ្ទះ / ANNE PHTEASH (50013049)
អាបូរេស្ត ហូសថេល / ABOREST HOSTEL (50013073)
អេ.វី.អេស.អាយ.អេស.ប៊ី (ខេមបូឌា) / A.V.S.I.S.B (CAMBODIA) (50013098)
អា ហ៊ុយ ផាយ តាង / AR HUI PAI DANG (50013141)
អង្គរ ក្រូសង់ បេកឃើរី / ANGKOR CROISSANT BAKERY (50013147)
អាំង បានភាគ្យ / AING BANPHEAK (50001347)
អេនដ្រូ សម / ANDREW SAM (50013301)
ភោជនីយដ្ឋាន អារ៉ាប៊ីក សាវ៉ារម៉ា / ARABIC SHAWARMA RESTAURANT (50013339)
អាហារដ្ឋាន ថៅ យាន ទៅ ខៅ អ៊ី / AHARATHAN THAO YEAN TOUV KHAO YI (50013351)
អាហារដ្ឋាន យ៉ាន ជា ជឺ ប៉ៅ អ៊ី / AHARATHAN YAN CHEA CHEU PAO YI (50013352)
អាហារដ្ឋាន ចិន ឈីង អាយ ខៅ ប៉ា / AHARATHAN CHEN CHHING AI KHAO PA (50013353)
អាហារដ្ឋាន អ័រ អាយ អ័រ ជា ស៊ន់ ឆៃ អ៊ី / AHARATHAN OR AI OR CHEA SHUN CHHAI YI (50013354)
អាហារដ្ឋាន ចិន វួយ សា ស៊ាន សាវ ឈឺ / AHARATHAN CHEN WUY SHA SEAN SAV CHHER (50013355)
អាង-យី ប៊ីយូធី សប / ANG-YI BEAUTY SHOP (50013361)
ភោជនីយដ្ឋាន អាន់ណាពួរណា I ឥណ្ឌាន / ANNAPURNA I INDIAN RESTAURANT (50013381)
អឹឡាយអិន ត្រេនស្លេសិន រីសសស៍ / ALLIANCE TRANSLATION RESOURCE (50013407)
អង្គរ អ៉ិមប្រេស ឡេថិច / ANGKOR EMBRACE LATEX (50001377)
អេ.អ.វ៉ាយ ខសមេធីក / A.R.Y COSMETIC (50013492)
អេស៊ា ម៉ាឃីត ផាប់ស្ទ្រីត / ASIA MARKET PUB STREET (50001395)
អឹឈីវើ អិុនធើណេសិនណល ស្គូល / ACHIEVERS INTERNATIONAL SCHOOL (50001513)
អាតវ៉កស៍ ស្ទូឌីយោ (ខេមបូឌា) / ARTWORX STUDIOS (CAMBODIA) (50001559)
អេឃ្វើរៀស ហូថេល អេន អឺបេន រីហ្សត / AQUARIUS HOTEL AND URBAN RESORT (50001601)
អេមីធី អង្គរ សឺវីស / AMITY ANGKOR SERVICE (50001604)
អាទីឡា អង្គរ / ATTILA ANGKOR (50001613)
ផ្ទះសំណាក់ អង្គរ ភ្នំមាស / ANGKOR PHNOM MEAS GUESTHOUSE (50001646)
អង្គរ សេនលីន ត្រាវែល / ANGKOR SENLIN TRAVEL (50001659)
អាប៊ែនឌែន ឡៃ អុិនធើណេសិនណល​ ស្គូល / ABUNDANT LIFE INTERNATIONAL SCHOOL (50001664)
សណ្ឋាគារ អង្គរ ប៊ូធីក ត្រភីក / ANGKOR BOUTIQUE TROPIC HOTEL (50001668)
អង្គរ រង់ដេវូ វីឡា / ANGKOR RENDEZVOUS VILLA (50001703)
អង្គរ ហាត បឹងហ្គាឡូ / ANGKOR HEART BUNGALOW (50001751)
អេស៊ា អិចផ្លរើ ត្រាវែល / ASIA EXPLORER TRAVEL (50001758)
អាស្ត្រូ អេស៊ា ធួរ & ត្រាវែល សឺវីស / ASTRO ASIA TOUR & TRAVEL SERVICES (50001767)
អូន សុគន្ធា / AUN SOKUNTHEA (50001796)
អង្គរដេសធីណេសិនត្រាវែល / ANGKOR DESTINATION TRAVEL (50001799)
អង្គរ ធួរ បាស ត្រេនស្ព័តថេសិន / ANGKOR TOUR BUS TRANSPORTATION (50001809)
អង្គរ អ៉ិនធើ ហ្វីតណេស / ANGKOR INTER FITNESS (50001836)
អ៊ែតមឺស៊្វែរ រេសថឹរេន / ATMO SPHERE RESTAURANT (50001849)
អេស៊ា ជេមអាប់ ត្រាវែល / ASIA JAMUP TRAVEL (50001901)
អ៊ែបសឹលូត ធួរ & ត្រាវែល (ខេមបូឌា) / ABSOLUTE TOURS & TRAVEL (CAMBODIA) (50001903)
អង្គរ អេ ណាទួរ វ៉ូយ៉ាហ្ស៍ / ANGKOR ET NATURE VOYAGE (50001942)
អង្គរ ម៉ាល់ធីម៉ីឌៀ / ANGKOR MULTIMEDIA (50001949)
អេ.អេស.អេច ត្រាវែល I / A.S.H TRAVEL I (50001955)
អេអេ ប្រាយវេត ជើនី អេន វីឡា / AA PRIVATE JOURNEYS AND VILLAS (50001966)
អង្គរ ថោន ថ្មី / ANGKOR TOWN THMEY (50001990)
អឹឃូស្ទីក អេនធើធេនមិុន / ACOUSTIC ENTERTAINMENT (50002025)
អង្គរ បុរីប្រិមប្រិយ៍ ម៉ាស្សា ខ្មែរ បុរាណ / ANGKOR BOREIBREMBREI MASSAGE KHMER BORAN (50000204)
អប្សរា ស៊ីល សេនធ័រ / APSARA SILK CENTER (50002122)
អ៊ែតវេនឈ័រ អេស៊ា ត្រាវែល & ធួរ / ADVENTURE ASIA TRAVEL & TOURS (50002161)
អង្គរ ហ្វីលីង ប៊ូទីក / ANGKOR FEELING BOUTIQUE (50002206)
សណ្ឋាគារ អេ-វើខេសិន ប៊ូធីក / A-VACATION BOUTIQUE HOTEL (50002210)
អង្គរ ដឹ ថេន ប៊ែលស៍ / ANGKOR THE TEN BELLS (50002218)
អេមប្រាយ ថេលើរ / AMPRISE TAILOR (50002223)
អេស៊ា​ ហេបភី​ វីឡា / ASIA HAPPY VILLA (50002232)
អេ សិដ្ឋ អូតូ / A SETH AUTO (50002321)
អង្គរទិព្វ លីគង កាត់ដេរ / ANGKOR TIP LYKORNG KATDE (50002374)

I am not sure why you want to mimic the POST request when you can automate the flow easily with Selenium. As you are already using Selenium, you can automate it without worrying about the underlying complexity of preparing raw requests and passing parameters.
With Selenium it is quite easy to automate the whole flow, unless:
You ultimately want to use Python requests to avoid the browser load time.
You expect the site's HTML to change frequently (though the POST request can change too, and a small parameter change can break it).
If either of these reasons applies, or you have a third one, please don't accept this as an answer; update the question and I will remove this answer.
The code below automates the flow for you. I generated it with Selenium IDE, then modified it to use labels instead of other hardcoded XPath combinations, and added visibility and clickability checks.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://www.businessregistration.moc.gov.kh/")
driver.maximize_window()
# Open the menu: Online Services > Sole Proprietorships > Search Entity
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(text(),'Online Services')]"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Sole Proprietorships']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='appSubMenuLink menu-brSoleProprietorSearch']/span[text()='Search Entity']"))).click()
# Type the query and press Search
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "QueryString")))
driver.find_element(By.ID, "QueryString").send_keys("a")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='appReceiveFocus' and text()='Search']"))).click()
time.sleep(10)
driver.quit()
Of course, it doesn't cover extracting the data from the grid after the search. Tested with chromedriver 80.0 on Ubuntu.
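For the missing extraction step, one option is to parse driver.page_source after the results have rendered. The following is only a sketch: it assumes the result names sit in span elements with the class appReceiveFocus, uses the standard library's html.parser instead of BeautifulSoup to stay dependency-free, and runs on made-up markup rather than the live page. The names FocusSpanText and extract_names are illustrative.

```python
from html.parser import HTMLParser


class FocusSpanText(HTMLParser):
    """Collect the text of every <span> whose class list has appReceiveFocus."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching span
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or "appReceiveFocus" in classes:
            if self.depth == 0:
                self.items.append("")   # start a new captured item
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data


def extract_names(page_source):
    parser = FocusSpanText()
    parser.feed(page_source)
    return [text.strip() for text in parser.items if text.strip()]


# Made-up markup standing in for driver.page_source after the search.
sample = ('<div><span class="appReceiveFocus">Search</span>'
          '<span class="appReceiveFocus">ANGKOR PEARL (50001067)</span>'
          '<span class="other">ignored</span></div>')
print(extract_names(sample))
```

Buttons such as Search carry the same class, so the first few matches on the real page are likely to be interface labels that need to be skipped.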

Related

How can I match each member to its role in Python with bs4 + lxml?

I want to parse the role cards from an NFT collectible site in Python: https://imaginaryones.com
Here's my code:
find_title = soup.find("title")
find_all_card_titles = soup.findAll("div", class_="v-card__title")
find_all_names = soup.findAll("h4", class_="member__subtitle")
find_all_links_url = soup.findAll("a")
print('Name -', find_title.text)
for names in find_all_names:
    for roles in find_all_card_titles:
        print(names.text, '-', roles.text)
I get something like:
Clement - Creator
Clement - Biz / Strategist
Clement - Artist / Partnerships
Clement - PM / Community
Clement - Tech / Contracts
David - Creator
David - Biz / Strategist
David - Artist / Partnerships
David - PM / Community
David - Tech / Contracts
etc...
but I need a result like:
Clement - Creator
David - Biz / Strategist
Gregory - Artist / Partnerships
etc...
How can I do this?
Try:
import requests
from bs4 import BeautifulSoup
url = "https://imaginaryones.com"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for member in soup.select(".team__list .member__subtitle"):
    print(member.text, "-", member.find_previous(class_="v-card__title").text)
Prints:
Clement - Creator
David - Biz / Strategist
Gregory - Artist / Partnerships
Caleb - PM / Community
Jerome - Tech / Contracts
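The repeated lines in the question come from the nested loops, which combine every name with every role. A small sketch with plain lists shows the difference between that and pairing items one to one. Note that zip is an alternative shown for illustration, not the method used in the answer above, and it assumes both lists have the same length and order, which may not hold on the real page.

```python
names = ["Clement", "David", "Gregory"]
roles = ["Creator", "Biz / Strategist", "Artist / Partnerships"]

# Nested loops: every name is combined with every role (3 x 3 = 9 lines).
all_combinations = [f"{n} - {r}" for n in names for r in roles]

# zip: the i-th name is paired with the i-th role (3 lines).
paired = [f"{n} - {r}" for n, r in zip(names, roles)]

print(len(all_combinations))
print("\n".join(paired))
```

Looking up the nearest preceding title for each member, as find_previous does, avoids relying on the two lists lining up.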

How do I remove the <a href... tags from my web scraper

I'm trying to scrape a table from rottentomatoes.com, but every time I run the code it just prints the <a href tags. For now, all I want is a numbered list of the movie titles.
This is my code so far:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

url = "https://www.rottentomatoes.com/top/bestofrt/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
titles = []
year_released = []


def get_requests():
    try:
        result = requests.get(url=url)
        soup = BeautifulSoup(result.text, 'html.parser')
        table = soup.find('table', class_='table')
        for name in table:
            td = soup.find_all('a', class_='unstyled articleLink')
            titles.append(td)
            print(titles)
            break
    except:
        print("The result could not get fetched")
And this is my output:
[[Opening This Week, Top Box Office, Coming Soon to Theaters, Weekend Earnings, Certified Fresh Movies, On Dvd & Streaming, VUDU, Netflix Streaming, iTunes, Amazon and Amazon Prime, Top DVD & Streaming, New Releases, Coming Soon to DVD, Certified Fresh Movies, Browse All, Top Movies, Trailers, Forums,
View All
,
View All
, Top TV Shows, Certified Fresh TV, 24 Frames, All-Time Lists, Binge Guide, Comics on TV, Countdown, Critics Consensus, Five Favorite Films, Now Streaming, Parental Guidance, Red Carpet Roundup, Scorecards, Sub-Cult, Total Recall, Video Interviews, Weekend Box Office, Weekly Ketchup, What to Watch, The Zeros, View All, View All, View All,
It Happened One Night (1934),
Citizen Kane (1941),
The Wizard of Oz (1939),
Modern Times (1936),
Black Panther (2018),
Parasite (Gisaengchung) (2019),
Avengers: Endgame (2019),
Casablanca (1942),
Knives Out (2019),
Us (2019),
Toy Story 4 (2019),
Lady Bird (2017),
Mission: Impossible - Fallout (2018),
BlacKkKlansman (2018),
Get Out (2017),
The Irishman (2019),
The Godfather (1972),
Mad Max: Fury Road (2015),
Spider-Man: Into the Spider-Verse (2018),
Moonlight (2016),
Sunset Boulevard (1950),
All About Eve (1950),
The Cabinet of Dr. Caligari (Das Cabinet des Dr. Caligari) (1920),
The Philadelphia Story (1940),
Roma (2018),
Wonder Woman (2017),
A Star Is Born (2018),
Inside Out (2015),
A Quiet Place (2018),
One Night in Miami (2020),
Eighth Grade (2018),
Rebecca (1940),
Booksmart (2019),
Logan (2017),
His Girl Friday (1940),
Portrait of a Lady on Fire (Portrait de la jeune fille en feu) (2020),
Coco (2017),
Dunkirk (2017),
Star Wars: The Last Jedi (2017),
A Night at the Opera (1935),
The Shape of Water (2017),
Thor: Ragnarok (2017),
Spotlight (2015),
The Farewell (2019),
Selma (2014),
The Third Man (1949),
Rear Window (1954),
E.T. The Extra-Terrestrial (1982),
Seven Samurai (Shichinin no Samurai) (1956),
La Grande illusion (Grand Illusion) (1938),
Arrival (2016),
Singin' in the Rain (1952),
The Favourite (2018),
Double Indemnity (1944),
All Quiet on the Western Front (1930),
Snow White and the Seven Dwarfs (1937),
Marriage Story (2019),
The Big Sick (2017),
On the Waterfront (1954),
Star Wars: Episode VII - The Force Awakens (2015),
An American in Paris (1951),
The Best Years of Our Lives (1946),
Metropolis (1927),
Boyhood (2014),
Gravity (2013),
Leave No Trace (2018),
The Maltese Falcon (1941),
The Invisible Man (2020),
12 Years a Slave (2013),
Once Upon a Time In Hollywood (2019),
Argo (2012),
Soul (2020),
Ma Rainey's Black Bottom (2020),
The Kid (1921),
Manchester by the Sea (2016),
Nosferatu, a Symphony of Horror (Nosferatu, eine Symphonie des Grauens) (Nosferatu the Vampire) (1922),
The Adventures of Robin Hood (1938),
La La Land (2016),
North by Northwest (1959),
Laura (1944),
Spider-Man: Far From Home (2019),
Incredibles 2 (2018),
Zootopia (2016),
Alien (1979),
King Kong (1933),
Shadow of a Doubt (1943),
Call Me by Your Name (2018),
Psycho (1960),
1917 (2020),
L.A. Confidential (1997),
The Florida Project (2017),
War for the Planet of the Apes (2017),
Paddington 2 (2018),
A Hard Day's Night (1964),
Widows (2018),
Never Rarely Sometimes Always (2020),
Baby Driver (2017),
Spider-Man: Homecoming (2017),
The Godfather, Part II (1974),
The Battle of Algiers (La Battaglia di Algeri) (1967), View All, View All]]
Reading the table via pandas.read_html(), as suggested by @F.Hoque, would probably be the leaner approach, but you can also get your results with BeautifulSoup alone.
Iterate over all <tr> elements of the <table>, pick the information from the tags via .text / .get_text() and store it in a structured list of dicts:
data = []
for row in soup.select('table.table tr')[1:]:
    # Split at the last ' (' so titles that contain parentheses stay intact
    title, year = row.a.get_text(strip=True).rsplit(' (', 1)
    data.append({
        'rank': row.td.text,
        'title': title,
        'releaseYear': year.rstrip(')')
    })
Example
import requests
from bs4 import BeautifulSoup

url = "https://www.rottentomatoes.com/top/bestofrt/"
headers = {"Accept-Language": "en-US, en;q=0.5"}
result = requests.get(url=url)
soup = BeautifulSoup(result.text, 'html.parser')
data = []
for row in soup.select('table.table tr')[1:]:
    # Split at the last ' (' so titles that contain parentheses stay intact
    title, year = row.a.get_text(strip=True).rsplit(' (', 1)
    data.append({
        'rank': row.td.text,
        'title': title,
        'releaseYear': year.rstrip(')')
    })
data
Output
[{'rank': '1.', 'title': 'It Happened One Night', 'releaseYear': '1934'},
{'rank': '2.', 'title': 'Citizen Kane', 'releaseYear': '1941'},
{'rank': '3.', 'title': 'The Wizard of Oz', 'releaseYear': '1939'},
{'rank': '4.', 'title': 'Modern Times', 'releaseYear': '1936'},
{'rank': '5.', 'title': 'Black Panther', 'releaseYear': '2018'},...]
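Some titles in the question's output contain parentheses themselves, for example "Parasite (Gisaengchung) (2019)". Splitting such a label at the first " (" would put part of the title into the year field, so it is safer to split at the last one. A minimal sketch (split_title_year is an illustrative helper name):

```python
def split_title_year(label):
    """Split 'Title (Year)' into (title, year) at the LAST ' ('."""
    title, year = label.strip().rsplit(" (", 1)
    return title, year.rstrip(")")


# Splitting at the first ' (' picks up the alternative title instead of the year.
naive_year = "Parasite (Gisaengchung) (2019)".split(" (")[1][:-1]
print(naive_year)  # Gisaengchung

print(split_title_year("Citizen Kane (1941)"))
print(split_title_year("Parasite (Gisaengchung) (2019)"))
```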

Combine multiple lines (line break) into one line in python

I was crawling articles from a site, but the summary has multiple paragraphs and I want them on one line,
e.g.
Line 1 : Title 1
Line 2 : Summary para1 Summary para2
This is my current code for this site:
https://theaizawlpost.org/health-minister-in-fimkhur-turin-mipui-ngen-nawn/
import csv
import pandas as pd
import requests
from bs4 import BeautifulSoup
from datetime import date
import urllib
from urllib.request import urlopen
csv_file = open('cms_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['title', 'summary'])
source = requests.get('https://theaizawlpost.org/health-minister-in-fimkhur-turin-mipui-ngen-nawn/').text
soup = BeautifulSoup(source, 'lxml')
article = soup.find('article')
title = article.find('span', class_='current').text
print(title)
summary = article.find('div', class_='entry-content entry clearfix').text
print(summary)
csv_writer.writerow([title, summary.strip()])
csv_file.close()
Pass strip=True to get_text() to strip the leading and trailing whitespace (including \n) from each piece of text:
summary = article.find('div', class_='entry-content entry clearfix').get_text(strip=True)
Since the whitespace has already been stripped from summary, don't call .strip() when writing to the CSV file; instead, use:
csv_writer.writerow([title, summary])
Output:
Health minister-in fimkhur turin mipui ngen nawn
Sawrkarin May ni 31 thleng total lockdown a pawhsei leh hnuah, health minister Dr. R. Lalthangliana chuan nimin khan mipui hnena ngenna leh thuchah tichhuakin, total lockdown chu kan damkhawchhuahna tur a nih tih hriaa inkhuahkhirhna dan te tha taka zawm chunga fimkhurna ngai pawimawh zel turin mipui a chah.Health minister Dr. Thangtea chuan, kan zavaia kan tanrual a, kan tawrh leh rih hram hram a tul dawn a, chutih rualin, tumah riltam leh chhuanchhama kan awm hi sawrkarin a phal lova, kohhran leh khawtlang hruaitute, Local task Force te nen tangrualin theihtawp kan chhuah zel dawn a ni, a ti a. Hetih rual hian sum lakluhna te a lo kiam tak avangin chhungtinin mahni zawnah theuh inrenchem tum ila, fimkhur takin, chi-ai si lovin awm ila, inlenpawh lo turin leh a tul tawpkhawkah lo chuan pawn chhuak rih lo turin kan inchah nawn leh a, kan duh reng vang pawh ni lovin, nunna chhan nan kan tawrh tlan rih hram hram a ngai a ni, a ti bawk.Total lockdown kar hnih kalpui hnu pawha hri kaiin kian lam a la pan theih loh chungchangah health minister chuan, inkharkhip laiin mipui lam hi kan fimkhur tawk lo deuh em tih zawhna a awm hial a ni, a ti a. “Nikum lama khauh taka bazar-na hmuna social distancing kan zawm ang khan kan zawm ta lo em ni aw? ka ti a, mipuite pawh bazar-ah leh puipunna hmunah duty te hmuh phak loha kan awm hian kan fimkhur tawk lo palh ang tih ka hlau a. Mahni theuh kan pawimawh ber a ni tih hriain kan inkhuahkhirhna dan hi khauh deuh mah se, kan zavaia kan himna tura ruahman a ni tih i hre nawn fo ang u,” tiin mipui a chah a.Hri vanga thi awm thin chu lungchhiatthlak a tih thu sawiin health minister chuan, “Kan state-a thi zat hi a tam tawh viaua a lan laiin hmarchhak state dang te leh India ram ngaihtuah chuan kan dinhmun a la ziaawm hle a. 
Kan positivity rate a sang kan tih pawh hi test kan neih that vang a ni ve pakhat a, kan test percentage hi 31.49 niin hmarchhak state-ah Arunachal Pradesh tih lohah chuan test tam ber kan ni,” a ti.Vaccine chungchangah, sawrkarin vaccine a chah mek thu leh, Central Ministry lamin inkaihhruaina a siam ang zelin chak taka vaccine pek hna kalpui zel tum a nih thu a sawi a. Mizoramin khawvel ram hrang hrang United Kingdom, Egypt, Ireland, Switzerland, Turkey, China, Taiwan atangtein kan mamawh hmanrua leh khawl chi hrang hrang kan dawng tawh tih sawiin, USA, Spain leh Kuwait atang pawhin tanpuina dawn tur a la awm thu te, World Health Organisation atanga oxygen concentrator 150 dawn a nih thu te pawh a sawi bawk.Minister chuan, ram hruaitu Minister-te leh MLA zawng zawngte an thawkrim hle tih sawiin, “Kan thawhhona a thatna leh mipuite thlawpna avangin he hripui hi kan hneh ngei dawn a ni,” a ti.“Total lockdown puang tura kohhran, Local Council Association, NGO leh VC Association te bakah mi thahnemngaite ngenna a lawmawmin a zahawm ka ti hle a, nitina eizawngte lakah inthlahrunawm viau mahse state dangte pawhin lockdown lo chu kawng dang zawh tur an hre bik meuh lo tih i hre tlang ila. Lockdown chhung hian kan frontline worker te leh healthcare worker ten nasa takin hma an la a, theihtawp an chhuah a ni tih i hriatsak ang u. Kan rorelah sawisel tur leh khamkhawp loh tam tak in hmu thin tih pawh ka hria a, khawvel pum luhchhuahtu hri a ni a, mi zawng zawng min nghawng vek a, rorel thiam a har a, vawiina tha kan tih kha naktuk lawkah a lo tha leh lova, engmah experiment han tih hman a ni si lo, kan rorelah leh kan tawngkam chhuakah te in rilru kan tih nat a awm chuan khawngaihtakin min ngaidam ula, inhriatthiamna nen dawhthei takin indawm tlang ila, thurawn tha leh fing engtiklai pawhin kan dawng thei reng a ni,” health minister chuan a ti a. 
Pathian venhimna leh a chhanchhuahna bang lova dil turin leh, malsawm tlak ni tura mahni lamin kan tih ve theihte ti ve turin Zoram mipuite a ngen a ni.
What you want to do is replace all the newlines in the string. You can do this by
summary.replace("\n"," ")
The first argument is the substring we want to replace.
The second argument is what we want in its place.
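One thing worth keeping in mind: Python strings are immutable, so replace() returns a new string and leaves the original alone. A minimal sketch, where the sample text and the name cleaned are made up for illustration:

```python
# Sample text standing in for the scraped summary (illustrative only).
summary = "First line.\nSecond line.\n"

# Calling replace() without using the result has no visible effect,
# because the original string is never modified in place.
summary.replace("\n", " ")

# Assign the result to keep it; strip() drops the space left by the last newline.
cleaned = summary.replace("\n", " ").strip()

print(repr(summary))
print(repr(cleaned))
```

So in the scraper you would assign the result back, for example summary = summary.replace("\n", " "), before writing the row.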

Unable to scrape similar links from different depth out of a webpage

I've created a script in Python to parse different links from a webpage. There are two sections on the landing page: Top Experiences and More Experiences. My current attempt can fetch the links from both categories.
The type of links I want to collect are (a few of them) under the Top Experiences section at the moment. However, when I traverse the links under the More Experiences section, I can see that they all lead to a page containing a section named Experiences, under which there are links similar to those under Top Experiences on the landing page. I want to grab them all.
One such desirable link I'm after looks like: https://www.airbnb.com/experiences/20712?source=seo.
website link
My current attempt fetches the links from both the categories:
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
URL = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
def get_links(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    items = [urljoin(link,item.get("href")) for item in soup.select("div[style='margin-top:16px'] a._1f0v6pq")]
    return items
if __name__ == '__main__':
    for item in get_links(URL):
        print(item)
How can I parse all the links under the Top Experiences section along with the links under the Experiences section that can be found by traversing the links under More Experiences?
Please check out the image if anything is unclear. I used the pen tool in Paint, so the writing may be a little hard to read.
The solution is slightly tricky, and it can be achieved in several ways. The one I find most useful is to feed the links under More Experiences back into the get_links() function recursively. All the links under More Experiences share a common keyword, _pdp-.
So, if you add a conditional statement inside the function that sends those links through get_links() again, the else block produces the desired links. The most important thing to notice is that all the desired links have the class _1f0v6pq, so the logic for getting them is fairly easy.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
URL = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
def get_links(link):
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select("div[style='margin-top:16px'] a._1f0v6pq"):
        # Links containing "_pdp-" are further sitemap pages, so recurse into them
        if "_pdp-" in item.get("href"):
            get_links(urljoin(URL,item.get("href")))
        # Everything else is one of the desired experience links
        else:
            print(urljoin(URL,item.get("href")))
if __name__ == '__main__':
    get_links(URL)
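The recursive version above prints the links rather than returning them, and it has no guard against visiting the same sitemap page twice. Here is a minimal sketch of the same sieve written iteratively. Note the assumptions: fetch_hrefs is a hypothetical callable (not part of requests or BeautifulSoup) that returns the href values found on a page, and FAKE_SITE with its example.com URLs is made-up data so the sketch runs without network access.

```python
from collections import deque
from urllib.parse import urljoin


def collect_experience_links(start_url, fetch_hrefs, marker="_pdp-"):
    """Walk sitemap pages breadth-first and return the non-sitemap links found."""
    seen_pages = set()   # sitemap pages already visited
    results = []         # desired links, in discovery order, without duplicates
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        if page in seen_pages:
            continue
        seen_pages.add(page)
        for href in fetch_hrefs(page):
            absolute = urljoin(page, href)
            if marker in href:
                # Another sitemap page: queue it instead of recursing
                queue.append(absolute)
            elif absolute not in results:
                results.append(absolute)
    return results


# Made-up pages (hypothetical data) mapping a URL to the hrefs found on it.
FAKE_SITE = {
    "https://example.com/sitemaps/v2/experiences_pdp-L0-0": [
        "/experiences/1?source=seo",
        "/sitemaps/v2/experiences_pdp-L1-0",
    ],
    "https://example.com/sitemaps/v2/experiences_pdp-L1-0": [
        "/experiences/2?source=seo",
        "/experiences/1?source=seo",
        "/sitemaps/v2/experiences_pdp-L0-0",  # link back to the start page
    ],
}

links = collect_experience_links(
    "https://example.com/sitemaps/v2/experiences_pdp-L0-0",
    lambda page: FAKE_SITE.get(page, []),
)
for link in links:
    print(link)
```

Against the real site, fetch_hrefs could presumably wrap the requests.get and soup.select calls from the answer above and return item.get("href") for each match.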
Process:
Get all Top Experiences links
Get all More Experiences links
Send a request to all More Experiences links one by one and get the links under Experiences in each page.
The div that contains the links has the same class, _12kw8n71, on all the pages.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from time import sleep
from random import randint
URL = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
res = requests.get(URL)
soup = BeautifulSoup(res.text,"lxml")
top_experiences= [urljoin(URL,item.get("href")) for item in soup.find_all("div",class_="_12kw8n71")[0].find_all('a')]
more_experiences= [urljoin(URL,item.get("href")) for item in soup.find_all("div",class_="_12kw8n71")[1].find_all('a')]
generated_experiences=[]
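# Visit each link in more_experiences
for url in more_experiences:
    sleep(randint(1,10))  # random delay to reduce the risk of being blocked
    # Request and parse the linked page itself, then read the links under its Experiences section
    page = BeautifulSoup(requests.get(url).text,"lxml")
    generated_experiences.extend([urljoin(URL,item.get("href")) for item in page.find_all("div",class_="_12kw8n71")[0].find_all('a')])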
Notes:
Your required links will be present in three lists: top_experiences, more_experiences and generated_experiences.
I have added random delay to avoid getting blocked.
Not printing the lists as it will be too long.
top_experiences - 50 links
more_experiences - 299 links
generated_experiences - 14950 links
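If the same experience can show up in more than one list (an assumption, I have not checked this against the site), you may want to merge the lists and drop duplicates while keeping the original order. A minimal sketch, where merge_unique is a hypothetical helper and the example.com URLs are placeholders rather than real results:

```python
def merge_unique(*link_lists):
    """Concatenate lists of URLs, keeping only the first occurrence of each."""
    merged = {}  # dicts preserve insertion order, so this keeps discovery order
    for links in link_lists:
        for link in links:
            merged.setdefault(link, None)
    return list(merged)


# Placeholder data standing in for the lists built by the scraper.
top_experiences = ["https://example.com/experiences/1?source=seo"]
generated_experiences = [
    "https://example.com/experiences/2?source=seo",
    "https://example.com/experiences/1?source=seo",  # duplicate of a top link
]

all_experiences = merge_unique(top_experiences, generated_experiences)
print(len(all_experiences))
```

more_experiences is left out here on purpose, since it holds the intermediate sitemap pages rather than the experience links themselves.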
It seems that both the "Top Experiences" and "More Experiences" links share the same class, so you can just use .find_all to obtain the links.
import requests
#from urllib.parse import urljoin
from bs4 import BeautifulSoup
# URL to scrape
url = "https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0"
# Make request and Initialize BS4 with request content
req = requests.get(url)
soup = BeautifulSoup(req.content, "lxml")
# Tag that contains "Top Experiences" and "More Experiences"
soup.find_all(class_="_l8g1fr")
# Test Code
#Prints title of links and the href
links = soup.find_all(class_="_l8g1fr")
for link in links:
    print(link.find("a").get_text())
    print(link.find("a").get('href'))
Refactor the code to suit your own coding style.
You can scrape from the divs with class "_12kw8n71":
from bs4 import BeautifulSoup as soup
import requests
d = soup(requests.get('https://www.airbnb.com/sitemaps/v2/experiences_pdp-L0-0').text, 'html.parser')
a, b, *_ = d.find_all('div', {'class':'_12kw8n71'})
result = {'top_experiences':[[i.text, i['href']] for i in a.find_all('a')], 'more_experiences':[[i.text, i['href']] for i in b.find_all('a')]}
Output (Only top experiences and part of the links from more experiences, as the full output exceeds Stackoverflow's character limit):
{'top_experiences': [["#1 HOLLYWOOD SIGN TOUR - WORLD'S BEST", '/experiences/26790?source=seo'], ['**#1-COMEDY/INSTAGRAM HOLLYWOOD WALK**', '/experiences/94033?source=seo'], ["A Potter's Wheel in Brooklyn", '/experiences/139532?source=seo'], ['Absolute: Seoul Pub Crawl & Party', '/experiences/161886?source=seo'], ['Amsterdam Experience Cruise', '/experiences/145329?source=seo'], ['Argentine Tango classes in London', '/experiences/241771?source=seo'], ['Axe throwing Canadian Experience', '/experiences/372251?source=seo'], ['Be a chocolate maker for a day', '/experiences/161912?source=seo'], ['Chianti in a Glass', '/experiences/179469?source=seo'], ['Cooking class in the Chianti Hills', '/experiences/25122?source=seo'], ['Cycle and Snack through hidden Bangkok', '/experiences/315116?source=seo'], ['Discover Valpolicella', '/experiences/77468?source=seo'], ["Don't watch Soweto, be part of Soweto", '/experiences/243188?source=seo'], ['Explore Pompeii w/ an Archaeologist', '/experiences/176605?source=seo'], ['Explore backstreets with a historian', '/experiences/90528?source=seo'], ['Explore the hidden food gems of Athens', '/experiences/256850?source=seo'], ['Fall and Winter Forest Trail Rides', '/experiences/236040?source=seo'], ['Foodies of Palermo - street food tour', '/experiences/212673?source=seo'], ['Forge a silver ring with jewellers', '/experiences/209198?source=seo'], ['Handmade Family Pasta & Tiramisù', '/experiences/115906?source=seo'], ['Hidden Gems of Old Delhi', '/experiences/154177?source=seo'], ['Hike Runyon Canyon with a rescue dog', '/experiences/82265?source=seo'], ['Hike Table Mountain with a Guide!', '/experiences/137248?source=seo'], ['Kayak in La Jolla', '/experiences/132375?source=seo'], ['Lakehouse Jazz', '/experiences/53239?source=seo'], ["Lisbon's best flavors", '/experiences/64564?source=seo'], ["London's Top Sights Tour : Fun Guide!", '/experiences/203605?source=seo'], ['Make ramen noodles from scratch', '/experiences/61279?source=seo'], 
['Mount Etna excursion and lava tubes', '/experiences/221142?source=seo'], ['Must Have L.A. Pictures!', '/experiences/59665?source=seo'], ['MySurf School Bali', '/experiences/153910?source=seo'], ['PASTAMANIA', '/experiences/103280?source=seo'], ['Paddle with the Penguins', '/experiences/112515?source=seo'], ['Paella Maestro', '/experiences/51311?source=seo'], ["Paris' Best Kept Secrets Tour", '/experiences/113388?source=seo'], ['Play, Walk & Feed Anteaters +Kinkajou', '/experiences/270405?source=seo'], ['Red Dunes Safari & BBQ #The Real Camp', '/experiences/105242?source=seo'], ['San José Chocolate Tasting Tour', '/experiences/280822?source=seo'], ['Snorkel in Manly with a local expert', '/experiences/66873?source=seo'], ['Sunrise SUP into Caves & Grottos', '/experiences/242911?source=seo'], ['Sushi-making Experience', '/experiences/53271?source=seo'], ['Swim with sea lions in their natural habitat', '/experiences/327235?source=seo'], ['The Drinking & Prohibition walk', '/experiences/102778?source=seo'], ['The Ultimate Harry Potter Walking Tour', '/experiences/188432?source=seo'], ['Tsukiji (Old) vs Toyosu (New) S.S Tour', '/experiences/71924?source=seo'], ['Wolf Encounter', '/experiences/47240?source=seo'], ['XXL Wild&Organic Roman food: Eat&Learn', '/experiences/161461?source=seo'], ['Your First Chinese Seal-engraving Work', '/experiences/166172?source=seo'], ['Под руку с духами по старой Москве', '/experiences/94558?source=seo'], ["⭐BIKE to taste the world's BEST TACOS!", '/experiences/75047?source=seo']], 'more_experiences': [[" Flying Trapeze - 'Ten Things I Hate About You' Tour", '/sitemaps/v2/experiences_pdp-L1-0'], ["'The Devil in the White City' Tour - 23:59 in Paris", '/sitemaps/v2/experiences_pdp-L1-1'], ["24h en Republica Dominicana - 80s Cover Band's Secret Gig[FUKUOKA]", '/sitemaps/v2/experiences_pdp-L1-2'], ['80s Nights - A Journey Through History', '/sitemaps/v2/experiences_pdp-L1-3'], ["A Journey for the Tastebuds, in Malaga - A Tea Sommelier's 
Zen Tea Party", '/sitemaps/v2/experiences_pdp-L1-4'], ['A Time for Home - A music walk with a french composer', '/sitemaps/v2/experiences_pdp-L1-5'], ['A musical walk in Salzburg - ANALOG PHOTOGRAPHY in DC! Shoot FILM!', '/sitemaps/v2/experiences_pdp-L1-6'], ['ANGEL, ENERGIA GASTRONOMICA ANCESTRAL! - Acuarela al aire libre en Poblenou', '/sitemaps/v2/experiences_pdp-L1-7'], ['Acupuncture in Historic Clinic - Afternoon tea at the Casavant Villa', '/sitemaps/v2/experiences_pdp-L1-8'], ["Afterwork dans un Atelier d'Artistes - Alpine Raft Expérience", '/sitemaps/v2/experiences_pdp-L1-9'], ['Alpine Tour Skiing Uphill/Downhill - Amélie en live!', '/sitemaps/v2/experiences_pdp-L1-10'], ['An African Dance & Cultural Experience - Another BKK', '/sitemaps/v2/experiences_pdp-L1-11'], ['Another Etna: Trekking, Food and Wine - Aprende o perfecciona bailes caribeños', '/sitemaps/v2/experiences_pdp-L1-12'], ['Aprende una actividad unica de Oaxaca - Around historical area with lovely dog', '/sitemaps/v2/experiences_pdp-L1-13'], ['Around the Neva & its embankments - Art tour Soumaya & Jumex Museums visit', '/sitemaps/v2/experiences_pdp-L1-14'], ['Art tour at Bellas Artes - Ascent to the crater of the Puracé volcano', '/sitemaps/v2/experiences_pdp-L1-15'], ['Asciende a la cima del Batea Mahuida - Atlas Mountains Day Trip: imlil Valley', '/sitemaps/v2/experiences_pdp-L1-16'], ['Atlas Mountains Full day Cultural Trip - Authentic Tango Night', '/sitemaps/v2/experiences_pdp-L1-17'], ['Authentic Tango with Live Music - BIARRITZ WALKING HISTORICAL TOUR', '/sitemaps/v2/experiences_pdp-L1-18'], ['BIG SUP Fun - Paddle Sevilla! 
- Bake ‘Dorayaki’ Japanese pancake', '/sitemaps/v2/experiences_pdp-L1-19'], ['Bakery Hop with Perth Brunch Bloggers - Banyuatis-Munduk Amazing Bike Tour', '/sitemaps/v2/experiences_pdp-L1-20'], ["Baptism of Rome for First-Timers - Batik: What's That?", '/sitemaps/v2/experiences_pdp-L1-21'], ['Batom vermelho - um estilo de vida - Beach Exercise Under The Sun', '/sitemaps/v2/experiences_pdp-L1-22'], ['Beach FlowJAM - Become Captain of the Day with Vicky', '/sitemaps/v2/experiences_pdp-L1-23'], ['Become Reiki 1 Certified in Hawaii! - Beginner Workshop', '/sitemaps/v2/experiences_pdp-L1-24'], ['Beginner and Kid Friendly Kayak Tour - Best Croissants + Coffee of Paris Walk', '/sitemaps/v2/experiences_pdp-L1-25'], ['Best Dirt Biking Experience in KL! - Big Easy Food Tours', '/sitemaps/v2/experiences_pdp-L1-26'], ['Big Fish In The Big Apple - Bike the Wineries', '/sitemaps/v2/experiences_pdp-L1-27'], ['Bike the local backroads - Birdwatching al Parco del Circeo', '/sitemaps/v2/experiences_pdp-L1-28'], ['Birdwatching and camping - Boat Trip at Arrabida with Lunch', '/sitemaps/v2/experiences_pdp-L1-29'], ['Boat Trip to historic Clonmacnoise - Boozin in Brooklyn: Spirits & Beer', '/sitemaps/v2/experiences_pdp-L1-30'], ["Boozy Icy Secrets - Breakfast Coffee Yoga # Kellogg's Cafe", '/sitemaps/v2/experiences_pdp-L1-31'], ['Breakfast With The (Chicago) Bear - Brugge voor jou.', '/sitemaps/v2/experiences_pdp-L1-32'], ['Brumby (wild) Horse Experience - Bushwalk with Bob', '/sitemaps/v2/experiences_pdp-L1-33'], ["Bushwalking; Airlie's best kept secret - Cabalgata, pasión por la aventura.", '/sitemaps/v2/experiences_pdp-L1-34'], ['Cabalgatas "Buscando el sol de ayer" - Camina por el parque observando aves', '/sitemaps/v2/experiences_pdp-L1-35'], ['Camina por la Historia de Jujuy - Canoeing with Champagne and Pastries', '/sitemaps/v2/experiences_pdp-L1-36'], ['Canoeing, Kayaking, Cycling in Catba - Capture your relationship in photos', '/sitemaps/v2/experiences_pdp-L1-37'], ['Capture 
your smile in beautiful hanbok - Catalonia Sailing Xperience', '/sitemaps/v2/experiences_pdp-L1-38'], ['Catalonia from Above - Cervantes Tapas Crawl Alcalá d Henares', '/sitemaps/v2/experiences_pdp-L1-39'], ['Cervezas artesanales de Querétaro - Cherry blossom viewing party in Ueno', '/sitemaps/v2/experiences_pdp-L1-40'], ['Cherry blossoms and your photo session - Chocolate tasting and tour', '/sitemaps/v2/experiences_pdp-L1-41'], ['Chocolate therapy - City Tour With Visit to Teatro Colon', '/sitemaps/v2/experiences_pdp-L1-42'], ['City Tour around Cali - Climb a sea cliff in El Empordà coast', '/sitemaps/v2/experiences_pdp-L1-43'], ['Climb an iconic hill with an instructor - Cocktails and Canapés', '/sitemaps/v2/experiences_pdp-L1-44'], ['Cocktails and Canvas - Columbia River King Salmon Fishing', '/sitemaps/v2/experiences_pdp-L1-45'], ['Columbus Cemetery. History and Art - Conhecendo o centro do recife', '/sitemaps/v2/experiences_pdp-L1-46'], ['Conhecendo o litoral e as belas praias - Cook great traditional dish with Heba', '/sitemaps/v2/experiences_pdp-L1-47'], ['Cook with Cristiana and Mamma Nora - Cook shelf mania.. authentic n savory', '/sitemaps/v2/experiences_pdp-L1-48'], ['Cook tapas at a secret private eatery - Cooking Spanish "tapas"', '/sitemaps/v2/experiences_pdp-L1-49'], ['Cooking Spanish classics with Soul! 
- Copenhagen Skyline Art Workshop', '/sitemaps/v2/experiences_pdp-L1-50'], ['Copenhagen Small Group Tour - Couture dans un atelier de Montmartre', '/sitemaps/v2/experiences_pdp-L1-51'], ["Covered Walkways Through a Writer’s Eyes - Crawlin' in the City - Bar Crawl", '/sitemaps/v2/experiences_pdp-L1-52'], ["Crazy French Sauces - Create a Story in a Writer's Retreat", '/sitemaps/v2/experiences_pdp-L1-53'], ['Create a Travel Journal - Create with a pro tie-dye artist', '/sitemaps/v2/experiences_pdp-L1-54'], ['Create wooden guitar picks and jewelry - Creative Raw Vegan Cooking Workshop', '/sitemaps/v2/experiences_pdp-L1-55'], ['Creative Rendezvous in Copenhagen - Créer ses cosmétiques naturels (DIY)', '/sitemaps/v2/experiences_pdp-L1-56'], ['Créer un Mandala Energétique - Cupcake Bake & Decorate - Spitalfields', '/sitemaps/v2/experiences_pdp-L1-57'], ['Cupcake Bouquet Class - Cycling in quiet NE. Connecticut.', '/sitemaps/v2/experiences_pdp-L1-58'], ['Cycling in the Don Valley - DJ WORKSHOP IN AACHEN GEBE DEN TAKT AN', '/sitemaps/v2/experiences_pdp-L1-59'], ['DJ Workshop and Indie Scene Night - Dark tour of Split', '/sitemaps/v2/experiences_pdp-L1-60'], ['Darkroom Photography in Kuala Lumpur - Decorative Wreath Workshop', '/sitemaps/v2/experiences_pdp-L1-61'], ['Decoupage for DIY memorabilia & gifts - Descubre México a través de la foto', '/sitemaps/v2/experiences_pdp-L1-62'], ['Descubre Playa Blanca, Saltos - Design a Garment in the World Market', '/sitemaps/v2/experiences_pdp-L1-63'], ['Design a game level in a game studio - Dinghy Sailing at El Portet', '/sitemaps/v2/experiences_pdp-L1-64'], ['Dingle Traditional Rowing - Discover CDMX from the best Rooftops', '/sitemaps/v2/experiences_pdp-L1-65'], ['Discover Caen "Art de vivre" - Discover Manzanita Scavenger Hunt', '/sitemaps/v2/experiences_pdp-L1-66'], ["Discover Marrakech Secrets by Bike - Discover Tarraco's living history", '/sitemaps/v2/experiences_pdp-L1-67'], ['Discover Tarragona coast from the sea - Discover 
nd paint your Aztec protector', '/sitemaps/v2/experiences_pdp-L1-68'], ['Discover new flavors with an oenophile - Discover the cosmos', '/sitemaps/v2/experiences_pdp-L1-69'], ['Discover the emotion of Kitesurf - Discovering Montjuic Hill', '/sitemaps/v2/experiences_pdp-L1-70'], ['Discovering Old Medina of Casablanca - Diving, Dipping, Swiming in Varadero', '/sitemaps/v2/experiences_pdp-L1-71'], ['Dj Workshop - Downtown Toronto Photoshoot Experience', '/sitemaps/v2/experiences_pdp-L1-72'], ['Downtown Traditional Cantina Pub Crawl - Drinking&Eating with Tokyohandsomeboy', '/sitemaps/v2/experiences_pdp-L1-73'], ['Drinks and a Show Walking Tour - Découverte des dauphins', '/sitemaps/v2/experiences_pdp-L1-74'], ['Découverte des spiritueux québécois - EL FASCINANTE MUNDO DEL MOSAICO', '/sitemaps/v2/experiences_pdp-L1-75'], ['ELDORADO: sus plantas y lugares. - Eat Like a Local - Dinner Experience', '/sitemaps/v2/experiences_pdp-L1-76'], ['Eat Like a Local at Butterworth - Eco Friendly Ranch & Petting Zoo', '/sitemaps/v2/experiences_pdp-L1-77'], ['Eco Hike in Pinar del Rio - Electric Skateboard down Paris', '/sitemaps/v2/experiences_pdp-L1-78'], ['Electric Skateboard the Highline Canal - Enigmatic / Sintra', '/sitemaps/v2/experiences_pdp-L1-79'], ['Enjoin Local Meal By Taking Vegetable - Enjoy local food experience in Tokyo!', '/sitemaps/v2/experiences_pdp-L1-80'], ['Enjoy local western part of Jeju ! - Escalada en Roca en Acantilados Valpo', '/sitemaps/v2/experiences_pdp-L1-81'], ['Escalada en Tarifa - Everyone Can Dance! Brussels', '/sitemaps/v2/experiences_pdp-L1-82'], ['Everyone can do Calligraphy - Experience Fabulous698B! 
Spring menu!', '/sitemaps/v2/experiences_pdp-L1-83'], ['Experience Falconry Firsthand - Experience a tasting menu in Saigon', '/sitemaps/v2/experiences_pdp-L1-84'], ['Experience a traditional Sunday lunch - Experiencia de golf en Buenos Aires', '/sitemaps/v2/experiences_pdp-L1-85'], ['Experiencia de un día en Salamanca - Explore Bowen Island Walking Adventure', '/sitemaps/v2/experiences_pdp-L1-86'], ['Explore Brazilian cuisine with a chef - Explore Korea broadcast Town & MBC', '/sitemaps/v2/experiences_pdp-L1-87'], ['Explore Korčula with Art historian - Explore Saratoga Passage by Kayak', '/sitemaps/v2/experiences_pdp-L1-88'], ['Explore Sceneries of Sokcho - Explore city nightlife with a musician', '/sitemaps/v2/experiences_pdp-L1-89'], ['Explore city nightlife with locals! - Explore the Old District Sham Shui Po', '/sitemaps/v2/experiences_pdp-L1-90'], ['Explore the Oslo Wilderness in Kayak - Exploring Maspalomas by E-Scooter', '/sitemaps/v2/experiences_pdp-L1-91'], ['Exploring RawaPening Lake of Salatiga - FOOTBALL MATCH IN MEDELLIN', '/sitemaps/v2/experiences_pdp-L1-92'], ['FOREST BATHING AND CREATING MEMORIES - Fantasy Photo~Shoot with A Unicorn...', '/sitemaps/v2/experiences_pdp-L1-93'], ['Fantasy adventure in the mountains - Fazendo Arte na Serra da Cantareira!', '/sitemaps/v2/experiences_pdp-L1-94'], ['Faça acrobacia em tecido como circense - Film your experience in Osaka', '/sitemaps/v2/experiences_pdp-L1-95'], ['FilmNeverDies-Filmphotography Workshop - Fishing with the Presidents', '/sitemaps/v2/experiences_pdp-L1-96'], ['Fishing, Malecon and Sunset - Florist in Paris', '/sitemaps/v2/experiences_pdp-L1-97'], ['Florist terrace & aperol spritz - Follow a historian through Havana', '/sitemaps/v2/experiences_pdp-L1-98'], ['Follow a local in vibrant East London - Food, Fotos & The French', '/sitemaps/v2/experiences_pdp-L1-99'], ['Food, Nature and Art on Tuscan Hills - Fossil Recovery Exploration', '/sitemaps/v2/experiences_pdp-L1-100'], ['Fossil hunting, Lets 
GO! - French food market tour & aperitif', '/sitemaps/v2/experiences_pdp-L1-101'], ['French macarons baking class - Full day surf experience', '/sitemaps/v2/experiences_pdp-L1-102'], ['Full day trail ride - Gaelic Craic and Cuisine', '/sitemaps/v2/experiences_pdp-L1-103'], ['Gain insight from a spiritual healer - Gems of Rome...', '/sitemaps/v2/experiences_pdp-L1-104'], ['Generations of Beekeeping - Get uplifted with gospel in Harlem', '/sitemaps/v2/experiences_pdp-L1-105'], ['Get ur Shine On Moonshine/Whiskey Tour - Glenorchy and Paradise Explorer', '/sitemaps/v2/experiences_pdp-L1-106'], ['Gliding on the Ice in the Red Dot - Goat Yoga!', '/sitemaps/v2/experiences_pdp-L1-107'], ['Goatherd for one day! - Graffiti, Street & Young Art in Vienna', '/sitemaps/v2/experiences_pdp-L1-108'], ['Grampians hike & waterfalls - Guide tours with E-bike in Menorca', '/sitemaps/v2/experiences_pdp-L1-109'], ['Guided Photo Tour ART Walk - Guitar workshop for the Byron Vibe', '/sitemaps/v2/experiences_pdp-L1-110'], ['Gulf Coast Birding For Beginners - Hand Building with clay in Brooklyn', '/sitemaps/v2/experiences_pdp-L1-111'], ['Hand Coffee Roasting & Cupping Class - Hang loose w/ Island Style Surf School', '/sitemaps/v2/experiences_pdp-L1-112'], ['Hang out in Shibuya and craft bonsai - Have a personal photographer for a day', '/sitemaps/v2/experiences_pdp-L1-113'], ['Have a picnic in a national park - Helicopter tour from Paris to Versailles', '/sitemaps/v2/experiences_pdp-L1-114'], ['Helicopter tour in São Paulo - Hidden Sculpture Stories Tour', '/sitemaps/v2/experiences_pdp-L1-115'], ['Hidden Sights & Cosmopolitan Magic - Hike Forest+ City Summit+ Rose Garden', '/sitemaps/v2/experiences_pdp-L1-116'], ['Hike Gracia hills for view & workout - Hike the Braids and Blackford Hill', '/sitemaps/v2/experiences_pdp-L1-117'], ['Hike the Bridge to Nowhere - Hiking Ajusco (1day)', '/sitemaps/v2/experiences_pdp-L1-118'], ['Hiking Ancient Mountain Olive Trees - Hill Country Estate Winery 
Tour', '/sitemaps/v2/experiences_pdp-L1-119'], ['Hill Country Yoga Hike - History and Tapas', '/sitemaps/v2/experiences_pdp-L1-120'], ['History and art inside Tijuca forest - Home-based cooking class in Panaji', '/sitemaps/v2/experiences_pdp-L1-121'], ['Home-cooked traditional Indian Food - Horseback ride California cowboy style', '/sitemaps/v2/experiences_pdp-L1-122'], ['Horseback ride and lunch with a farmer - Hungarian Cooking Class & Market Walk', '/sitemaps/v2/experiences_pdp-L1-123'], ['Hungarian pasta workshop and dinner - Ikebana experience', '/sitemaps/v2/experiences_pdp-L1-124'], ['Ikebana exprence - In the middle of the wet forest.', '/sitemaps/v2/experiences_pdp-L1-125'], ['In the trails of Rio - Inside Scoop LA', '/sitemaps/v2/experiences_pdp-L1-126'], ['Inside Showjumping with a Journalist - Interactive experience of lightpainting room', '/sitemaps/v2/experiences_pdp-L1-127'], ['Interactive outdoor navigation course - Invigorating Yoga Flow at Balboa Park', '/sitemaps/v2/experiences_pdp-L1-128'], ['Invitation To Ginza Personal Shopping - JUNGLE ART WALK TULUM', '/sitemaps/v2/experiences_pdp-L1-129'], ['JUNGLE IN THE CITY - Jazz in Cape Town with Lee Thomson', '/sitemaps/v2/experiences_pdp-L1-130'], ['Jazz in a Secret Venue - Joshua Tree Macramé Journey', '/sitemaps/v2/experiences_pdp-L1-131'], ['Joshua Tree Recording Studio & Lodging - Kamikochi hike and visit the Shrine', '/sitemaps/v2/experiences_pdp-L1-132'], ['Kanazawa Haul at Depachika - Kayak with the Dolphins', '/sitemaps/v2/experiences_pdp-L1-133'], ['Kayak&Snorkeling experience in Genoa - Kitesurf In Lake Garda', '/sitemaps/v2/experiences_pdp-L1-134'], ['Kitesurf Lecce-Scuola Kite Salento - Kundalini Yoga to awaken Inner Lover', '/sitemaps/v2/experiences_pdp-L1-135'], ['Kundalini Yoga, Gong and Yogi Tea - LGBTQ+ History Tour of Brighton', '/sitemaps/v2/experiences_pdp-L1-136'], ['LGBTQ+ queer and trans tour of London - Lamawandern im Mostviertel', '/sitemaps/v2/experiences_pdp-L1-137'], 
['Lamb Farm Life - Learn 3D printing at Maker Bean Cafe!', '/sitemaps/v2/experiences_pdp-L1-138'], ['Learn 3D printing: design a souvenir - Learn Pencak Silat (Bali Martial Art)', '/sitemaps/v2/experiences_pdp-L1-139'], ['Learn Phone Photography in Mongkok - Learn and indulge in Quercy cuisine', '/sitemaps/v2/experiences_pdp-L1-140'], ['Learn and play poker in Barão Geraldo - Learn how to wakeboard', '/sitemaps/v2/experiences_pdp-L1-141'], ['Learn how to wakeboard or wakesurf - Learn to DJ and dance at a queer club', '/sitemaps/v2/experiences_pdp-L1-142'], ['Learn to DJ at Private Studio - Learn to bake sourdough in my kitchen', '/sitemaps/v2/experiences_pdp-L1-143'], ['Learn to bake traditional Irish Breads - Learn to play badminton!', '/sitemaps/v2/experiences_pdp-L1-144'], ['Learn to play poker from a Vegas pro - Legends and secrets with traditional music', '/sitemaps/v2/experiences_pdp-L1-145'], ["Lei Po'o (Flower Crown) Workshop - Let's learn how to wearing kimono", '/sitemaps/v2/experiences_pdp-L1-146'], ["Let's learn to SKETCH & DRAW! - Life changing Hike on Crowders Mtn.", '/sitemaps/v2/experiences_pdp-L1-147'], ['Life coaching with horses - Lisbon with the historic tram', '/sitemaps/v2/experiences_pdp-L1-148'], ["Lisbon's CatWalk - Living the Santa Cruz Canyon!", '/sitemaps/v2/experiences_pdp-L1-149'], ['Living the dying city! 
- London Chicken Wing Tour!', '/sitemaps/v2/experiences_pdp-L1-150'], ['London Christmas Lights Running Tour - Luis Barragán en 5 tiempos.', '/sitemaps/v2/experiences_pdp-L1-151'], ['Lujan de Cuyo Private Wine Tour - Macaron Class in Paris', '/sitemaps/v2/experiences_pdp-L1-152'], ["Macaron Masterclass in Borough - Makati's best food drinks and music", '/sitemaps/v2/experiences_pdp-L1-153'], ['Make Fujiyama Katsu-Curry with a chef - Make Wine Candles+Bottle Cutting (DIY)', '/sitemaps/v2/experiences_pdp-L1-154'], ['Make Your Leather Tortellino Keychain - Make a seashell frame after a boatride', '/sitemaps/v2/experiences_pdp-L1-155'], ['Make a signature scent with a perfumer - Make sushi in a traditional folk house', '/sitemaps/v2/experiences_pdp-L1-156'], ['Make sustainable leather espadrilles - Make your own silk scarf in Paris !', '/sitemaps/v2/experiences_pdp-L1-157'], ['Make your own silver jewel in Florence - Malen und Zeichnen', '/sitemaps/v2/experiences_pdp-L1-158'], ['Maleny Yoga Rainforest Wellness - Market Tour and a Home Cooked Meal', '/sitemaps/v2/experiences_pdp-L1-159'], ['Market Tour&Cook Kenyan cultural food - Matera Photo Journey', '/sitemaps/v2/experiences_pdp-L1-160'], ['Mathematics Walking Tour of Amsterdam - Meditazione rilassamento profondo', '/sitemaps/v2/experiences_pdp-L1-161'], ['Meditazione e benEssere in spiaggia - Men & Womens Wellness Retreat', '/sitemaps/v2/experiences_pdp-L1-162'], ['Mental Wellness Picnic Chat - Midday Vermouth Madrileño & Tortilla', '/sitemaps/v2/experiences_pdp-L1-163'], ['Middle Eastern Food Cooking Class - Mini Horse Experience!', '/sitemaps/v2/experiences_pdp-L1-164'], ['Mini Nature Retreat - Moniz Family Surf: Private Surf Lesson', '/sitemaps/v2/experiences_pdp-L1-165'], ['Moniz Family Surf: Waikiki Surf Lesson - Morocco in frames:Culture &photography', '/sitemaps/v2/experiences_pdp-L1-166'], ['Mortadella & Bologna: a love story - Mountain plateau hike (5-7 hr)', '/sitemaps/v2/experiences_pdp-L1-167'], 
['Mountain summit and cozy restaurant - Music, Mummies, and Museums!', '/sitemaps/v2/experiences_pdp-L1-168'], ['Music, art, history, Rio nightlife - NEW! Amsterdam light cruise', '/sitemaps/v2/experiences_pdp-L1-169'], ['NEW! Aperitivo tour - Fun Fun Fun! ❤️ - National Arboretum and #madeinDC Tour', '/sitemaps/v2/experiences_pdp-L1-170'], ['National Gallery with an Art Historian - Nature trails & Birding near Cali', '/sitemaps/v2/experiences_pdp-L1-171'], ['Nature walk near Hotspring in Jozankei - Night Cheese: California Artisans', '/sitemaps/v2/experiences_pdp-L1-172'], ["Night Cherry Viewing - Nonna Cecilia's Pasta Workshop!", '/sitemaps/v2/experiences_pdp-L1-173'], ['Noodle & Roll & Water Puppet Theater - Obstacle Course Adventure', '/sitemaps/v2/experiences_pdp-L1-174'], ['Ocean Beach Coffee Cart Bike Tour - Old Montreal: Intimate & Personalized', '/sitemaps/v2/experiences_pdp-L1-175'], ['Old Quebec : Mesdames & Mesdemoiselles - Ontdek de ambacht van het jam maken', '/sitemaps/v2/experiences_pdp-L1-176'], ['OoLong Tea Tasting - Osaka Old City Walk Tour with Sweets!', '/sitemaps/v2/experiences_pdp-L1-177'], ['Osaka Shinsekai/Dotonbori Walking Tour - PASTA MATRICIANA MAKING CLASS', '/sitemaps/v2/experiences_pdp-L1-178'], ['PASTA grani antichi PANE lievito nat - Paddle with the Penguins', '/sitemaps/v2/experiences_pdp-L1-179'], ['Paddle your way in history - Paint a skateboard deck with an artist', '/sitemaps/v2/experiences_pdp-L1-180'], ['Paint a surfboard to send home! - Painting on the Amalfi Coast.', '/sitemaps/v2/experiences_pdp-L1-181'], ['Painting on the Pier - Paris fabrics & crafts', '/sitemaps/v2/experiences_pdp-L1-182'], ['Paris for Families - Passeio a cavalo no por do Sol', '/sitemaps/v2/experiences_pdp-L1-183'], ['Passeio agradável no bairro do Bexiga! 
- Pedal untamed back roads of Outaouais', '/sitemaps/v2/experiences_pdp-L1-184'], ['Pedala nelle pinete di Ravenna - Personal Shopping Consultant', '/sitemaps/v2/experiences_pdp-L1-185'], ['Personal Shopping For Bespoke Suits - Photo Session in beautiful Basel', '/sitemaps/v2/experiences_pdp-L1-186'], ['Photo Session in sunny Ibiza - Photo tour of Dublin city', '/sitemaps/v2/experiences_pdp-L1-187'], ["Photo tour on bike with an expert - Photographing in 'off-piste' Tokyo.", '/sitemaps/v2/experiences_pdp-L1-188'], ['Photographing in Palermo markets - Photoshoot for yoga lovers in Paris', '/sitemaps/v2/experiences_pdp-L1-189'], ['Photoshoot from Varadero to Havana - Pick plant n Cook,KANOM JAAK,dessert', '/sitemaps/v2/experiences_pdp-L1-190'], ['Pick wild apples and can applesauce - Pizza Class on a Renaissance Terrace', '/sitemaps/v2/experiences_pdp-L1-191'], ['Pizza Cooking Class in Milan - Play the Chinese ink art with Runkuan', '/sitemaps/v2/experiences_pdp-L1-192'], ['Play the Fiddle in Under an Hour - Por los Cerros de Santiago.', '/sitemaps/v2/experiences_pdp-L1-193'], ['Por los cementerios de Chepe. 
- Pose for pin-ups with a photographer', '/sitemaps/v2/experiences_pdp-L1-194'], ['Pose for portraits in Cambridge - Prague experience: archaeology&history', '/sitemaps/v2/experiences_pdp-L1-195'], ['Prague for Locals and You - Private Alpine Backpacking Trip', '/sitemaps/v2/experiences_pdp-L1-196'], ['Private Amalfi & Positano HalfDay tour - Private Yoga Class & Eco Snack', '/sitemaps/v2/experiences_pdp-L1-197'], ['Private Yoga Class for All Levels - Professional photos of your LA Vacay!', '/sitemaps/v2/experiences_pdp-L1-198'], ['Professional photoshoot by the sea - Pêche dans le Parc du Mercantour', '/sitemaps/v2/experiences_pdp-L1-199'], ['Pêche des Calamars en mer - Rafting the Isar', '/sitemaps/v2/experiences_pdp-L1-200'], ['Rafting trip-Snorkeling&cliff jumping - Recording Studio Magic (Immersive)', '/sitemaps/v2/experiences_pdp-L1-201'], ['Recording your own k-pop song - Relaxing Costarican Typical Trails', '/sitemaps/v2/experiences_pdp-L1-202'], ['Relaxing Countryside : Somewhere Snowy - Ride OFF Road in Puerto Rico', '/sitemaps/v2/experiences_pdp-L1-203'], ['Ride Retired Racehorses at Lake Fork - River hike captured on drone video', '/sitemaps/v2/experiences_pdp-L1-204'], ['River view cooking with super mom chef - Romancing the Bean', '/sitemaps/v2/experiences_pdp-L1-205'], ['Romantic "Aperitivo" with lake view - Rugged North Cascades Private Day Tour', '/sitemaps/v2/experiences_pdp-L1-206'], ['Ruin Pub Crawl with a Local Hostess - Running Tour of Brooklyn!', '/sitemaps/v2/experiences_pdp-L1-207'], ['Running Tour of San Diego - SAORI freestyle weaving for everyone', '/sitemaps/v2/experiences_pdp-L1-208'], ['SASHIKO; Learn the Japanese embroidery - SUP around Balboa Isle & the Back Bay', '/sitemaps/v2/experiences_pdp-L1-209'], ['SUP at sunset - Sail around the city with a boat lover', '/sitemaps/v2/experiences_pdp-L1-210'], ['Sail at Sunset in Porto - Sails, Whales, and Trails', '/sitemaps/v2/experiences_pdp-L1-211'], ['Sailtour Lake Lucerne - Samurai 
Kendo Experience', '/sitemaps/v2/experiences_pdp-L1-212'], ['Samurai Ninja Experience & Guided Tour - Sardinia: sip by sip', '/sitemaps/v2/experiences_pdp-L1-213'], ['Sardinian identity in the kitchen - Scotch Whisky & Cheese Tasting', '/sitemaps/v2/experiences_pdp-L1-214'], ['Scotland Adventure Adrenaline Day - Seaside Yoga', '/sitemaps/v2/experiences_pdp-L1-215'], ['Seaside bike ride and paddle surfing - Secretos de la Perla Tapatía', '/sitemaps/v2/experiences_pdp-L1-216'], ['Secretos del centro de Coyoacán - Segeln in Berlin', '/sitemaps/v2/experiences_pdp-L1-217'], ['Segeltörn auf dem Steinhuder Meer - Seville From The Rooftops', '/sitemaps/v2/experiences_pdp-L1-218'], ['Seville Great Experience in E-Bike - Shmurgle a Goat, Cosy Winter Edition', '/sitemaps/v2/experiences_pdp-L1-219'], ['Shochu tasting workshop in Asakusa - Shop-til-you-drop with Fashion Stylist', '/sitemaps/v2/experiences_pdp-L1-220'], ['Shopper bag making class - Silicon Valley High Tech Company Tour', '/sitemaps/v2/experiences_pdp-L1-221']]}

Scraping Google Destinations

I'm preparing a tour around the world and am curious about the top sights, so I'm trying to scrape the top destinations within a given place. I want to end up with the top places in a country and their best sights. Google Destinations was recently added and is a great feature for this.
For example, when googling Cuba Destinations, Google shows a card with destinations Havana, Varadero, Trinidad, Santiago de Cuba.
Then, when googling Havana Cuba Destinations, it shows Old Havana, Malecon, Castillo de los Tres Reyes Magos del Morro, El Capitolio.
Finally, I'll turn it into a table that looks like:
Cuba, Havana, Old Havana.
Cuba, Havana, Malecon.
Cuba, Havana, Castillo de los Tres Reyes Magos del Morro.
Cuba, Havana, El Capitolio.
Cuba, Varadero, Hicacos Peninsula.
and so on.
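The target table above can be sketched as a flattening of a nested mapping. This is only an illustration of the desired shape: the `destinations` dict and the `to_rows` helper are hypothetical names, and the sample values are the ones listed in the example.

```python
def to_rows(destinations):
    """Flatten {country: {city: [sights]}} into (country, city, sight) rows."""
    rows = []
    for country, cities in destinations.items():
        for city, sights in cities.items():
            for sight in sights:
                rows.append((country, city, sight))
    return rows


# Sample data copied from the example table in the question.
destinations = {
    "Cuba": {
        "Havana": [
            "Old Havana",
            "Malecon",
            "Castillo de los Tres Reyes Magos del Morro",
            "El Capitolio",
        ],
        "Varadero": ["Hicacos Peninsula"],
    }
}

rows = to_rows(destinations)
for row in rows:
    print(", ".join(row))
```

Keeping the scraped data nested until the end means each level (country, city, sight) can be filled in by a separate scraping pass before the table is written out.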
I have tried the API call as shown in Travel destinations API, but that does not provide the right feedback, and it often yields OVER_QUERY_LIMIT.
The code below returns an error:
URL = "https://www.google.nl/destination/compare?q=cuba+destinations&site=search&output=search&dest_mid=/m/0d04z6&sa=X&ved=0API_KEY"
import requests
from bs4 import BeautifulSoup
#URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
print(soup.prettify())
Any tips?
You will need to use something like Selenium for this: the page makes multiple XHRs, so you will not be able to get the rendered page using requests alone. First install Selenium.
sudo pip3 install selenium
Then get a driver https://sites.google.com/a/chromium.org/chromedriver/downloads
(Depending upon your OS you may need to specify the location of your driver)
from bs4 import BeautifulSoup
from selenium import webdriver
import time
browser = webdriver.Chrome()
url = ("https://www.google.nl/destination/compare?q=cuba+destinations&site=search&output=search&dest_mid=/m/0d04z6&sa=X&ved=0API_KEY")
browser.get(url)
time.sleep (2)
html_source = browser.page_source
browser.quit()
soup = BeautifulSoup(html_source, "lxml")
# Get the headings
hs = [tag.text for tag in soup.find_all('h2')]
# Get the divs containing the text
divs = [tag.text for tag in soup.find_all('div', {'class': False})]
# Delete surplus divs
del divs[:22]
del divs[-1:]
print(list(zip(hs,divs)))
Outputs:
[('Havana', "Cuban capital known for Old Havana's colonial architecture, live salsa music & nearby beaches."), ('Varadero', 'Major Cuban resort town on Hicacos Peninsula, with a 20km beach, a golf course & several parks.'), ('Trinidad', 'Cuban town known for Plaza Mayor, colonial architecture & plantations of Valle de los Ingenios.'), ('Santiago de Cuba', 'Cuban city known for Afro-Cuban festivals & music, plus Spanish colonial & revolutionary history.'), ('Viñales', 'Cuban town known for Viñales Valley, Casa de Caridad Botanical Gardens & nearby tobacco farms.'), ('Cienfuegos', 'Cuban coastal city, known for Tomás Terry Theater, Arco de Triunfo & Playa Rancho Luna resorts.'), ('Santa Clara', 'Cuban city home to the Che Guevara Mausoleum, Parque Vidal & ornate Teatro La Caridad.'), ('Cayo Coco', 'Cuban island known for its white-sand beaches & resorts, plus reef snorkeling & flamingos.'), ('Cayo Santa María', 'Cuban island known for Gaviotas Beach, Cayo Santa María Wildlife Refuge & Pueblo La Estrella.'), ('Cayo Largo del Sur', 'Cuban island, known for beaches like Playa Blanca & Playa Sirena, plus a sea turtle center & diving.'), ('Plaza de la Revolución', 'Che Guevara and monuments'), ('Camagüey', 'Ballet, churches, history, and beaches'), ('Holguín', 'Cuban city known for Parque Calixto García, the Hacha de Holguín axe head & Guardalavaca beaches.'), ('Cayo Guillermo', 'Cuban island with beaches like Playa del Medio & Playa Pilar, plus vast expanses of coral reef.'), ('Matanzas', 'Caves, theater, beaches, history, and rivers'), ('Baracoa', 'Beaches, rivers, and nature'), ('Centro Habana', '\xa0'), ('Playa Girón', 'Beaches, snorkeling, and museums'), ('Topes de Collantes', 'Scenic nature reserve park for hiking'), ('Guardalavaca', 'Cuban resort known for Esmeralda Beach, the Cayo Naranjo Aquarium & the Chorro de Maíta Museum.'), ('Bay of Pigs', 'Snorkeling, scuba diving, and beaches'), ('Isla de la Juventud', 'Scuba diving and beaches'), ('Zapata Swamp', 'Parks, 
crocodiles, birdwatching, and swamps'), ('Pinar del Río', 'History'), ('Remedios', 'Churches, beaches, and museums'), ('Bayamo', 'Wax museums, monuments, history, and music'), ('Sierra Maestra', 'Peaks with a storied political history'), ('Las Terrazas', 'Zip-lining, nature reserves, and hiking'), ('Sancti Spíritus', 'History and museums'), ('Playa Ancon', 'Beaches, snorkeling, and scuba diving'), ('Jibacoa', 'Beaches, snorkeling, and jellyfish'), ('Jardines de la Reina', 'Scuba diving, fly-fishing, and gardens'), ('Cayo Jutías', 'Beach and snorkeling'), ('Guamá, Cuba', 'Crocodiles, beaches, snorkeling, and lakes'), ('Morón', 'Crocodiles, lagoons, and beaches'), ('Las Tunas', 'Beaches, nightlife, and history'), ('Soroa', 'Waterfalls, gardens, nature, and ecotourism'), ('Guanabo', 'Beach'), ('María la Gorda', 'Scuba diving, beaches, and snorkeling'), ('Alejandro de Humboldt National Park', 'Park, protected area, and hiking'), ('Ciego de Ávila', 'Zoos and beaches'), ('Bacunayagua', '\xa0'), ('Guantánamo', 'Beaches, history, and nature'), ('Cárdenas', 'Beaches, museums, monuments, and history'), ('Canarreos Archipelago', 'Sailing and coral reefs'), ('Caibarién', 'Beaches'), ('El Nicho', 'Waterfalls, parks, and nature'), ('San Luis Valley', 'Cranes, national wildlife refuge, and elk')]
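Some of the pairs in the output above have a description that is only a non-breaking space ('\xa0'), for example Centro Habana and Bacunayagua. A minimal sketch of a clean-up step, assuming you want to drop those entries; `clean_pairs` is a hypothetical helper, not part of the script above:

```python
def clean_pairs(pairs):
    """Normalise whitespace and drop pairs whose description is blank."""
    cleaned = []
    for name, description in pairs:
        # '\xa0' is a non-breaking space; treat it like ordinary whitespace.
        description = description.replace("\xa0", " ").strip()
        if description:
            cleaned.append((name.strip(), description))
    return cleaned


# A few pairs taken from the output shown above.
sample = [
    ("Havana", "Cuban capital known for Old Havana's colonial architecture, "
               "live salsa music & nearby beaches."),
    ("Centro Habana", "\xa0"),
    ("Guanabo", "Beach"),
]

result = clean_pairs(sample)
print(result)
```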
UPDATED IN RESPONSE TO COMMENT:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
browser = webdriver.Chrome()
for place in ["Cuba", "Belgium", "France"]:
    url = "https://www.google.nl/destination/compare?site=destination&output=search"
    browser.get(url)  # you may not need to do this every time if you clear the search box
    time.sleep(2)
    element = browser.find_element(By.NAME, 'q')  # get the query box
    time.sleep(2)
    element.send_keys(place)  # populate the search box
    time.sleep(2)
    search_box = browser.find_element(By.CLASS_NAME, 'sbsb_c')  # get the first element in the list
    search_box.click()  # click it
    time.sleep(2)
    destinations = browser.find_element(By.ID, 'DESTINATIONS')  # get the destinations link
    destinations.click()  # click it
    time.sleep(2)
    html_source = browser.page_source
    soup = BeautifulSoup(html_source, "lxml")
    # Get the headings
    hs = [tag.text for tag in soup.find_all('h2')]
    # Get the divs containing the text
    divs = [tag.text for tag in soup.find_all('div', {'class': False})]
    # Delete surplus divs
    del divs[:22]
    del divs[-1:]
    print(list(zip(hs, divs)))
browser.quit()
Try this Google Places API URL. It returns the points of interest, attractions, and tourist places in (for example) New York City. You have to use the city name together with the keyword "point of interest".
https://maps.googleapis.com/maps/api/place/textsearch/json?query=new+york+city+point+of+interest&language=en&key=API_KEY
These API results are the same as the results of the Google search below.
https://www.google.com/search?sclient=psy-ab&site=&source=hp&btnG=Search&q=New+York+point+of+interest
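The text search URL above can be built for any city with the standard library. This is a small sketch: `build_textsearch_url` is a hypothetical helper name, and "API_KEY" is a placeholder for your own key.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/place/textsearch/json"


def build_textsearch_url(city, api_key, language="en"):
    """Build a Places text search URL for '<city> point of interest'."""
    params = {
        "query": f"{city} point of interest",
        "language": language,
        "key": api_key,
    }
    # urlencode turns spaces into '+', matching the URL shown above.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_textsearch_url("new york city", "API_KEY")
print(url)
```

Passing the resulting URL to `requests.get(url).json()` should give the parsed response, with the places under the `results` key.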
Two more little tips for you:
You can use the Python Client for Google Maps Services: https://github.com/googlemaps/google-maps-services-python
For the OVER_QUERY_LIMIT problem, make sure that you add a billing method to your Google Cloud project (with your credit card or free trial credit balance). Don't worry too much, because Google will give you some thousand free queries each month.
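Independently of billing, a script can also back off and retry when a response reports OVER_QUERY_LIMIT. The following is a minimal sketch, not an official client feature: `fetch_with_backoff` is a hypothetical helper, `fetch` is any callable you supply that returns the parsed JSON response as a dict, and the check assumes the status is reported in the response's `status` field.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until its status is not OVER_QUERY_LIMIT or attempts run out."""
    response = {}
    for attempt in range(max_attempts):
        response = fetch()
        if response.get("status") != "OVER_QUERY_LIMIT":
            return response
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    return response


# Demonstration with canned responses instead of real HTTP calls.
replies = iter([
    {"status": "OVER_QUERY_LIMIT"},
    {"status": "OK", "results": []},
])
delays = []  # records the requested waits instead of actually sleeping
result = fetch_with_backoff(lambda: next(replies), sleep=delays.append)
print(result["status"], delays)
```

In real use, `fetch` could be something like `lambda: requests.get(url).json()`, with the default `sleep` left in place.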
