Unexpected behaviour when navigating through webpages using Selenium - Python

I want to navigate through all the continents/countries here and collect the tables into a pandas data frame, but sometimes the process clicks on the same link a couple of times before continuing to the next. This is my current implementation:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
DRIVER_PATH = '/path/to/chromedriver'
driver = webdriver.Chrome(executable_path=DRIVER_PATH, options=chrome_options)
driver.get('https://www.ertms.net/deployment-world-map/')
continents = driver.find_element(by='id', value='panel')
continent_names = continents.text.split()
# navigating through continent links
for i, cont in enumerate(continent_names):
    cont_buttons = driver.find_elements_by_class_name('accordion')
    continent_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(cont_buttons[i + 1]))
    time.sleep(0.5)
    ActionChains(driver).move_to_element(continent_element).click().perform()
    time.sleep(3)
    child_buttons = driver.find_elements_by_class_name('accordion')
    # going through country links for each continent. Here is where the same link is sometimes clicked twice
    for j, country in enumerate(child_buttons):
        time.sleep(3)
        child_buttons = driver.find_elements_by_class_name('accordion')
        country_element = WebDriverWait(driver, 10).until(EC.element_to_be_clickable(child_buttons[j]))
        time.sleep(0.5)
        ActionChains(driver).move_to_element(country_element).click().perform()
        # going back to page with list of countries for current continent
        back_button = driver.find_element_by_class_name('go-back')
        driver.execute_script("arguments[0].click();", back_button)
        time.sleep(3)
    # going back to list of continents
    back_button = driver.find_element_by_class_name('go-back')
    driver.execute_script("arguments[0].click();", back_button)
    time.sleep(3)
I navigate around using EC.element_to_be_clickable and a combination of the By.LINK_TEXT and find_elements_by_class_name methods. Any advice on best practices would be appreciated.
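One pattern that tends to avoid duplicate clicks (and stale references) is to re-locate the button list on every pass and wait on a freshly fetched element before clicking it. A minimal sketch under the question's own assumptions: the 'accordion' class name is taken from the code above, and click_nth_accordion is a hypothetical helper, not part of the original script.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def click_nth_accordion(driver, n, timeout=10):
    # wait until at least n+1 accordion buttons exist, then fetch a fresh reference
    WebDriverWait(driver, timeout).until(
        lambda d: len(d.find_elements(By.CLASS_NAME, 'accordion')) > n)
    button = driver.find_elements(By.CLASS_NAME, 'accordion')[n]
    # wait on the freshly located element, then click it directly
    WebDriverWait(driver, timeout).until(EC.element_to_be_clickable(button)).click()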


Getting text from multiple webpages (pagination) in Selenium Python

I want to extract text from multiple pages. Currently, I am able to extract data from the first page, but I want to follow the pagination through the remaining pages and append their data too. I have written this simple code, which extracts data from the first page; I am not able to extract the data from the other pages, whose number is dynamic.
element_list = []
opts = webdriver.ChromeOptions()
opts.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())
base_url = "XYZ"
driver.maximize_window()
driver.get(base_url)
driver.set_page_load_timeout(50)
element = WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.ID, 'all-my-groups')))
l = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
for i in l:
    print(i.text)
I have shared images of the relevant classes, in case that helps with the pagination.
If we could automate the extraction across all the pages, that would be awesome. Also, I am new, so please pardon me for asking silly questions. Thanks in advance.
You have provided the code just for the previous-page button. I guess you need to keep going to the next page while a next page exists. As I don't know which site we are talking about, I can only guess at its behavior, so I'm assuming the 'next' button disappears when no next page exists. If so, it can be done like this:
element_list = []
opts = webdriver.ChromeOptions()
opts.headless = True
driver = webdriver.Chrome(ChromeDriverManager().install())
base_url = "XYZ"
driver.maximize_window()
driver.get(base_url)
driver.set_page_load_timeout(50)
element = WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.ID, 'all-my-groups')))
l = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
while True:
    try:
        next_page = driver.find_element(By.XPATH, '//button[@label="Next page"]')
    except NoSuchElementException:
        break
    next_page.click()
    l.extend(driver.find_elements(By.XPATH, "//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]"))
for i in l:
    print(i.text)
To be able to catch the exception, this import has to be added:
from selenium.common.exceptions import NoSuchElementException
Also note that the method find_elements_by_xpath is deprecated, so it would be better to replace this line:
l = driver.find_elements_by_xpath("//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
with this one:
l = driver.find_elements(By.XPATH, "//div[contains(@class, 'alias-wrapper sim-ellipsis sim-list--shortId')]")
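Since the snippets above use By.XPATH and By.ID, they also need the By import if it is not already present:

from selenium.webdriver.common.by import By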

Selenium: Issue with reassignment in For Loop

I tried debugging my program with print statements to see what was going on during each iteration.
This part works fine:
The program goes through a total of 50 combinations of the drop-down menus (25 for each year).
This part isn't working:
However, for some reason the totals dictionary is only storing the inputs from the second iteration of the initial "year" for-loop. It is returning a dictionary with a length of 25 (only half of what I actually want).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# General Stuff about the website
path = '/Users/admin/desktop/projects/scraper/chromedriver'
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")
driver = webdriver.Chrome(options=options, executable_path=path)
website = 'http://siops.datasus.gov.br/filtro_rel_ges_covid_municipal.php'
driver.get(website)

# Initial Test: printing the title
print(driver.title)
print()

# Dictionary to Store stuff in
totals = {}

### Drop Down Menus ###
state_select = Select(driver.find_element(By.XPATH, '//*[@id="cmbUF"]'))
state_options = state_select.options
year_select = Select(driver.find_element(By.XPATH, '//*[@id="cmbAno"]'))
year_options = year_select.options
# county_select = Select(driver.find_element(By.XPATH, '//*[@id="cmbMunicipio"]'))
# county_select.select_by_value('120025')
# report_select = Select(driver.find_element(By.XPATH, '//*[@id="gesRelatorio"]'))
# report_select.select_by_value('rel_ges_covid_rep_uniao_municipal.php')
# period_select = Select(driver.find_element(By.XPATH, '//*[@id="cmbPeriodo"]'))
# period_select.select_by_value('14')

### Loop through all combinations ###
for year in range(1, 3):
    year_select = Select(driver.find_element(By.XPATH, '//*[@id="cmbAno"]'))
    year_select.select_by_index(year)
    for index in range(0, len(state_options) - 1):
        state_select = Select(driver.find_element(By.XPATH, '//*[@id="cmbUF"]'))
        state_select.select_by_index(index)
        # Click the Submit Button
        submit_button = driver.find_element(By.XPATH, '//*[@id="container"]/div[2]/form/div[2]/div/input[2]')
        submit_button.click()
        # Pulling data from the webpage
        nameof = driver.find_element(By.XPATH, '//*[@id="arearelatorio"]/div[1]/div/table[1]/tbody/tr[2]').text
        total_balance = driver.find_element(By.XPATH, '//*[@id="arearelatorio"]/div[1]/div/table[3]/tbody/tr[9]/td[2]').text
        paid_expenses = driver.find_element(By.XPATH, '//*[@id="arearelatorio"]/div[1]/div/table[4]/tbody/tr[11]/td[4]').text
        # Update Dictionary with the new info
        totals.update({nameof: [total_balance, paid_expenses, year]})
        print([nameof, year])
        driver.back()

# Print the final Dictionary and quit
print(len(totals))
print(totals)
driver.quit()
@Alex Karamfilov figured it out with his comment:
"Just a wild guess, but is it possible that you overwrite the value for the same key in the dictionary? Since this is a dictionary and keys should be unique, this might be the reason you only have the second iteration's values."
It was a dumb error on my part. The keys were the same during each iteration, so it was just modifying the values rather than creating a new key-value pair.
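A minimal sketch of the fix, assuming a state name plus year uniquely identifies each report: key the dictionary by the (nameof, year) tuple so the second year's pass cannot overwrite the first's entries.

# replaces the totals.update(...) line inside the inner loop;
# keying by (nameof, year) keeps both years' entries (assumes that pair is unique)
totals[(nameof, year)] = [total_balance, paid_expenses]

With that change, len(totals) should come out to 50 rather than 25.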

Selenium webdriver: How to delete/flag spam comments on YouTube platform (without API)

I've been trying to flag/report a list of spam comments in a particular YouTube video.
For that I've been using this Python code, which loads my existing Chrome profile so I'm logged in with my account:
URL = "https://www.youtube.com/watch?
v=dvecqwfU6xw&lc=Ugxw_nsUNUor9AUEBGp4AaABAg.9fDfvkgiqtW9fDkE2r6Blm"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
options = webdriver.ChromeOptions()
user = pathlib.Path().home()
print(user)
options.add_argument(f"user-data-dir={user}/AppData/Local/Google/Chrome/User Data/")
driver= webdriver.Chrome('chromedriver.exe',chrome_options=options)
driver.get(URL)
wait=WebDriverWait(driver, 100)
comment_box = '//*[#id="comment"]'
reply_box ='//*[#id="replies"]'
while(True):
driver.execute_script("window.scrollBy(0, 200);")
try:
reply_box = driver.find_element(By.XPATH, reply_box)
print(reply_box.text)
break
except:
pass
# resp = driver.request('POST', 'https://www.youtube.com/youtubei/v1/flag/get_form?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8&prettyPrint=false')
# print(resp.text)
button = wait.until(EC.presence_of_element_located((By.XPATH,'//*[#id="button"]')))
driver.execute_script("arguments[0].click();", button)
The problem comes with opening the menu: I believe that, since you have to hover over the three-dots menu before it becomes clickable, I never get to open the actual menu to report/flag the comment.
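For the hover step specifically, a minimal sketch with ActionChains: hover over the comment so YouTube renders its action menu, then click the menu button. The ytd-comment-renderer tag name comes from the XPaths in the solution below; the relative menu-button path is an assumption, not a verified YouTube selector.

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# hover over the comment first; the three-dots menu only appears on hover
comment = driver.find_element(By.XPATH, '//ytd-comment-renderer')
ActionChains(driver).move_to_element(comment).perform()
# assumed relative path to the three-dots button inside the hovered comment
menu_button = comment.find_element(By.XPATH, './/ytd-menu-renderer//button')
menu_button.click()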
My mistake was not taking the full XPath.... It works perfectly like this, THANKS
options = webdriver.ChromeOptions()
user = pathlib.Path().home()
print(user)
options.add_argument(f"user-data-dir={user}/AppData/Local/Google/Chrome/User Data/")
options.add_argument('--headless')
driver = webdriver.Chrome('chromedriver.exe', chrome_options=options)
driver.get(URL)
wait = WebDriverWait(driver, 100)
comment_box = '//*[@id="comment"]'
reply_box = '//*[@id="replies"]'
while True:
    driver.execute_script("window.scrollBy(0, 200);")
    try:
        reply_box = driver.find_element(By.XPATH, reply_box)
        print(reply_box.text)
        break
    except:
        pass
option_button = '/html/body/ytd-app/div[1]/ytd-page-manager/ytd-watch-flexy/div[5]/div[1]/div/div[2]/ytd-comments/ytd-item-section-renderer/div[3]/ytd-comment-thread-renderer[1]/div/ytd-comment-replies-renderer/div[2]/ytd-comment-renderer/div[3]/div[3]/ytd-menu-renderer/yt-icon-button/button'
option_button = wait.until(EC.presence_of_element_located((By.XPATH, option_button)))
driver.execute_script("arguments[0].click();", option_button)
report_button = '/html/body/ytd-app/ytd-popup-container/tp-yt-iron-dropdown/div/ytd-menu-popup-renderer/tp-yt-paper-listbox/ytd-menu-service-item-renderer/tp-yt-paper-item/yt-formatted-string'
report_button = wait.until(EC.presence_of_element_located((By.XPATH, report_button)))
driver.execute_script("arguments[0].click();", report_button)
report_button_spam = '/html/body/ytd-app/ytd-popup-container/tp-yt-paper-dialog/yt-report-form-modal-renderer/tp-yt-paper-dialog-scrollable/div/div/yt-options-renderer/div/tp-yt-paper-radio-group/tp-yt-paper-radio-button[1]/div[1]'
report_button_spam = wait.until(EC.presence_of_element_located((By.XPATH, report_button_spam)))
driver.execute_script("arguments[0].click();", report_button_spam)
report_button_send = '/html/body/ytd-app/ytd-popup-container/tp-yt-paper-dialog/yt-report-form-modal-renderer/div/yt-button-renderer[2]/a/tp-yt-paper-button'
report_button_send = wait.until(EC.presence_of_element_located((By.XPATH, report_button_send)))
driver.execute_script("arguments[0].click();", report_button_send)
popup_button_done = '/html/body/ytd-app/ytd-popup-container/tp-yt-paper-dialog[2]/yt-confirm-dialog-renderer/div[2]/div[2]/yt-button-renderer[3]/a/tp-yt-paper-button'
popup_button_done = wait.until(EC.presence_of_element_located((By.XPATH, popup_button_done)))
print(popup_button_done.text)

How to re-load page while looping over elements?

This is my code; it should be easy to reproduce:
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options

def main():
    # Setup chrome options
    chrome_options = Options()
    chrome_options.add_argument("--headless")  # Ensure GUI is off
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--window-size=1920x3500")
    # Set path to chromedriver as per your configuration
    webdriver_service = Service("/home/sumant/chromedriver/stable/chromedriver")
    # Choose Chrome Browser
    driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
    driver.maximize_window()
    # Get page
    url = "https://www.ibrance.com/"
    driver.get(url)
    time.sleep(2)
    ele = driver.find_elements_by_tag_name('a')
    for i, e in enumerate(ele):
        try:
            print(e.get_attribute('outerHTML'))
            e.click()
            time.sleep(2)
            driver.save_screenshot(f"/mnt/d/Work/ss{i}.png")
            driver.get(url)
            # driver.refresh()
        except:
            print("element not interactable")
    driver.close()
    driver.quit()

if __name__ == '__main__':
    main()
The idea is: I click on a link, take a screenshot, load the home page again, click on the next link, and so on.
After the first link, it is not able to find any other element on the reloaded page.
This is expected: after the page is reloaded, the previously located elements go stale, so the driver can no longer use them.
The elements need to be re-located after each reload.
Do this:
ele = driver.find_elements_by_tag_name('a')
for i, e in enumerate(ele):
    try:
        print(e.get_attribute('outerHTML'))
        e.click()
        time.sleep(2)
        driver.save_screenshot(f"/mnt/d/Work/ss{i}.png")
        driver.get(url)
        driver.refresh()
        # reload elements after the refresh
        ele = driver.find_elements_by_tag_name('a')
    except:
        print("element not interactable")
So this worked (thanks, YuMa, for the inspiration):
def main():
    # ...
    # Get page
    url = "https://www.ibrance.com/"
    driver.get(url)
    time.sleep(2)
    total_element = driver.find_elements_by_tag_name('a')
    total_clicks = len(total_element)

    def get_images(ele, i):
        try:
            ele[i].click()
            time.sleep(2)
            # driver.save_screenshot(f"/mnt/d/Work/ss{i}.png")
            print(driver.title)
            driver.get(url)
            time.sleep(2)
        except:
            print("")

    # re-locate the links on every iteration so they are never stale;
    # range(total_clicks) covers indices 0..total_clicks-1 without going out of bounds
    for i in range(total_clicks):
        ele = driver.find_elements_by_tag_name('a')
        get_images(ele, i)
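As noted in the pagination answer above, find_elements_by_tag_name is deprecated in Selenium 4; the equivalent modern call (assuming from selenium.webdriver.common.by import By) would be:

ele = driver.find_elements(By.TAG_NAME, 'a')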

Next Page Iteration in Selenium/BeautifulSoup for Scraping E-Commerce Website

I'm scraping an e-commerce website, Lazada, using Selenium and bs4. I manage to scrape the first page, but I am unable to iterate to the next page. What I'm trying to achieve is to scrape every page of the categories I've selected.
Here is what I've tried:
# Run the argument with incognito
option = webdriver.ChromeOptions()
option.add_argument('--incognito')
driver = webdriver.Chrome(executable_path='chromedriver', chrome_options=option)
driver.get('https://www.lazada.com.my/')
driver.maximize_window()

# Select category item #
element = driver.find_elements_by_class_name('card-categories-li-content')[0]
webdriver.ActionChains(driver).move_to_element(element).click(element).perform()
t = 10
try:
    WebDriverWait(driver, t).until(EC.visibility_of_element_located((By.ID, "a2o4k.searchlistcategory.0.i0.460b6883jV3Y0q")))
except TimeoutException:
    print('Page Refresh!')
    driver.refresh()
    element = driver.find_elements_by_class_name('card-categories-li-content')[0]
    webdriver.ActionChains(driver).move_to_element(element).click(element).perform()
    print('Page Load!')

# Soup and select element
def getData(np):
    soup = bs(driver.page_source, "lxml")
    product_containers = soup.findAll("div", class_='c2prKC')
    for p in product_containers:
        title = (p.find(class_='c16H9d').text)  # title
        selling_price = (p.find(class_='c13VH6').text)  # selling price
        try:
            original_price = (p.find("del", class_='c13VH6').text)  # original price
        except:
            original_price = "-1"
        if p.find("i", class_='ic-dynamic-badge ic-dynamic-badge-freeShipping ic-dynamic-group-2'):
            freeShipping = 1
        else:
            freeShipping = 0
        try:
            discount = (p.find("span", class_='c1hkC1').text)
        except:
            discount = "-1"
        if p.find(("div", {'class': ['c16H9d']})):
            url = "https:" + (p.find("a").get("href"))
        else:
            url = "-1"
        nextpage_elements = driver.find_elements_by_class_name('ant-pagination-next')[0]
        np = webdriver.ActionChains(driver).move_to_element(nextpage_elements).click(nextpage_elements).perform()
        print("- -" * 30)
        toSave = [title, selling_price, original_price, freeShipping, discount, url]
        print(toSave)
        writerows(toSave, filename)
getData(np)
The problem might be that the driver is trying to click the button before the element has properly loaded.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(PATH, chrome_options=option)
# use this right after driver initialization:
# it makes the driver wait up to 5 seconds for the page to load
driver.implicitly_wait(5)
url = "https://www.lazada.com.ph/catalog/?q=phone&_keyori=ss&from=input&spm=a2o4l.home.search.go.239e359dTYxZXo"
driver.get(url)

next_page_path = "//ul[@class='ant-pagination ']//li[@class=' ant-pagination-next']"
# the following waits up to 5 seconds for the element
# to become clickable and then tries clicking it
try:
    next_page = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH, next_page_path)))
    next_page.click()
except Exception as e:
    print(e)
EDIT 1
Changed the code to make the driver wait for the element to become clickable. You can put this code inside a while loop to iterate over multiple pages, breaking out of the loop when the button is not found or is no longer clickable, as in the sketch below.
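A minimal sketch of that loop, under the same assumptions as above (next_page_path as defined earlier; scrape_page is a hypothetical placeholder for the per-page extraction logic):

while True:
    scrape_page()  # hypothetical: collect the current page's products here
    try:
        # wait up to 5 seconds for a clickable "next" button
        next_page = WebDriverWait(driver, 5).until(
            EC.element_to_be_clickable((By.XPATH, next_page_path)))
        next_page.click()
    except Exception:
        break  # no next page left, stop paginating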
