I was trying to select all 264 Recipients from the second form on this website, using the code below:
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

s = Service('./chromedriver.exe')
driver = webdriver.Chrome(service=s)
url = "https://armstrade.sipri.org/armstrade/page/trade_register.php"
driver.get(url)
sleep(5)
and here's the loop:
for b in range(1, 264):
    Recipients = driver.find_element(By.XPATH, "//*[@id='selFrombuyer_country_code']/option[%d]" % b)
    Recipients.click()
    sleep(0.2)
    # click "ADD" option
    driver.find_element(By.XPATH, "/html/body/font/div/form/table/tbody/tr[3]/td/table/tbody/tr/td[2]/input[1]").click()
    sleep(0.2)
When I tested this code, the loop worked... partly: it skipped seemingly random elements and only selected about half of them.
And here are some of the elements my loop ignored, which look perfectly normal:
//*[@id="selFrombuyer_country_code"]/option[2]
//*[@id="selFrombuyer_country_code"]/option[26]
Why can't my loop traverse all the elements?
This should work for you. It iterates over the actual option elements, so there are no hard-coded ranges to fall out of sync:
# Below are the additional imports needed
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Code after the driver is instantiated
driver.get("https://armstrade.sipri.org/armstrade/page/trade_register.php")
c_codes = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='selFrombuyer_country_code']//option")))
print(len(c_codes))
ls1 = []
ls2 = []
for each_code in c_codes:
    ls1.append(each_code.text)
    each_code.click()
    time.sleep(0.2)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='selFrombuyer_country_code']/..//..//input[contains(@value, 'Add')]"))).click()
all_c = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@name='selselTobuyer_country_code']//option")))
for each_c in all_c:
    ls2.append(each_c.text)
print(len(ls1))
print(len(ls2))
print(ls1)
print(ls2)
driver.close()
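As a side note, if the recipient list is a standard multi-select <select> element, Selenium's built-in Select helper avoids hand-written option XPaths entirely. A minimal sketch, assuming the selFrombuyer_country_code id from the question, and that you still click the Add button after each pick as above:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

# wrap the <select>; Select exposes its <option> children directly
recipients = Select(driver.find_element(By.ID, "selFrombuyer_country_code"))
print(len(recipients.options))  # how many options actually exist right now

for option in recipients.options:
    option.click()  # same effect as clicking the option by XPath
    # ... click the Add button here, as in the loop above ...
    # if clicking Add removes options from the source list,
    # re-read recipients.options instead of iterating a stale snapshot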
The code below sometimes returns one (1) element, sometimes all, and sometimes none. For my application I need it to return all the matching elements on the page.
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By
def villanovan():
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    url = 'http://webcache.googleusercontent.com/search?q=cache:https://villanovan.com/&strip=0&vwsrc=0'
    url_2 = 'https://villanovan.com/'
    driver.get(url_2)
    a = driver.find_elements(By.CLASS_NAME, "homeheadline")
    titles = [i.text for i in a if len(i.text) != 0]
    links = [i.get_attribute('href') for i in a if len(i.text) != 0]
    return [titles, links]

if __name__ == "__main__":
    print(villanovan())
I was expecting a list with multiple links and article titles, but received a list with only the first element found, not all elements found.
To extract the values of the href attributes you can use a list comprehension with either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://villanovan.com/")
time.sleep(3)
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a.homeheadline[href]")])
Using XPATH:
driver.get("https://villanovan.com/")
time.sleep(3)
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//a[@class='homeheadline' and @href]")])
Note: you have to add the following imports:
import time
from selenium.webdriver.common.by import By
Console Output:
['https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/22098/news/decarbonizing-villanova-a-town-hall-on-fossil-fuel-divestment/', 'https://villanovan.com/22096/news/biology-professors-granted-1-million-for-wetlands-research/', 'https://villanovan.com/22093/news/students-create-the-space-supporting-sex-education/', 'https://villanovan.com/22098/news/decarbonizing-villanova-a-town-hall-on-fossil-fuel-divestment/', 'https://villanovan.com/22096/news/biology-professors-granted-1-million-for-wetlands-research/', 'https://villanovan.com/22044/culture/julia-staniscis-leaning-on-letters/', 'https://villanovan.com/22032/culture/villanova-sorority-recruitment-recap/', 'https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/21932/opinion/villanova-should-be-free-for-families-earning-less-than-100000/', 'https://villanovan.com/21897/opinion/grasshoppergate-the-state-of-villanova-dining/', 'https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/22093/news/students-create-the-space-supporting-sex-education/', 'https://villanovan.com/22090/news/mlk-day-of-service/', 'https://villanovan.com/22087/news/university-updates-covid-procedures/']
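If the headlines render asynchronously, an explicit wait for all of them is more reliable than a fixed sleep. A sketch of the same extraction using WebDriverWait, with the same locator as above:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://villanovan.com/")
# wait up to 10 seconds for every matching headline anchor to become visible,
# instead of sleeping a fixed 3 seconds and hoping the page finished rendering
headlines = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.homeheadline[href]")))
print([elem.get_attribute("href") for elem in headlines])
driver.quit()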
Here's the link of the website: website
I would like to have all the links of the hotels in this location.
Here's my script:
import pandas as pd
import numpy as np
from selenium import webdriver
import time
PATH = r"driver\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')
driver = webdriver.Chrome(options=options, executable_path=PATH)
driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')
cookie = driver.find_element_by_xpath('//button[@class="uolsaJ"]')
try:
    cookie.click()
except Exception:
    pass
for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)
time.sleep(5)
my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')
links = [my_elem.get_attribute("href") for my_elem in my_elems]
X = np.array(links)
print(X.shape)
#driver.close()
But I cannot find a way to tell the script : scroll down until there is nothing more to scroll.
I tried to change these parameters:
for i in range(30):
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(30)
I changed the time.sleep() value, the 1000, and so on, but my output keeps changing, and not in the right way.
output
As you can see, I scraped a different number of links each time. How can I make my script scrape the same amount each time? Not necessarily every link, but at least a stable number.
It scrolls, and at some point it seems to get stuck and scrapes only the links it has loaded at that moment. That's not what I want.
There are several issues here.
You are getting the elements and their links only AFTER you have finished scrolling, while you should do that inside the scrolling loop.
You should wait for the cookie consent alert to appear before closing it.
You can scroll until the footer element is present.
Something like this:
import pandas as pd
import numpy as np
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
PATH = r"driver\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1200,900")
options.add_argument('enable-logging')
driver = webdriver.Chrome(options=options, executable_path=PATH)
wait = WebDriverWait(driver, 20)
driver.get('https://fr.hotels.com/search.do?destination-id=10398359&q-check-in=2021-06-24&q-check-out=2021-06-25&q-rooms=1&q-room-0-adults=2&q-room-0-children=0&sort-order=BEST_SELLER')
wait.until(EC.visibility_of_element_located((By.XPATH, '//button[@class="uolsaJ"]'))).click()
def is_element_visible(xpath):
    wait1 = WebDriverWait(driver, 2)
    try:
        wait1.until(EC.visibility_of_element_located((By.XPATH, xpath)))
        return True
    except Exception:
        return False
while not is_element_visible("//footer[@id='footer']"):
    my_elems = driver.find_elements_by_xpath('//a[@class="_61P-R0"]')
    links = [my_elem.get_attribute("href") for my_elem in my_elems]
    X = np.array(links)
    print(X.shape)
    driver.execute_script("window.scrollBy(0, 1000)")
    time.sleep(5)
#driver.close()
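Another common way to answer "scroll until there is nothing more to scroll" is to compare the document height before and after each scroll and stop once it no longer grows. A generic sketch, not specific to hotels.com:
import time

def scroll_to_bottom(driver, pause=2.0):
    """Scroll down until the document height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded results time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the real bottom
        last_height = new_height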
You can try this by directly calling the DOM and locating an element that will only be at the bottom of the page, using the .is_displayed() Selenium method, which returns true/false:
# https://stackoverflow.com/a/57076690/15164646
while True:
    # "#message" is the "No more results" element at the bottom of a YouTube search;
    # is_displayed() keeps returning False until the element scrolls into view
    end_result = driver.find_element_by_css_selector('#message').is_displayed()
    driver.execute_script("var scrollingElement = (document.scrollingElement || document.body);scrollingElement.scrollTop = scrollingElement.scrollHeight;")
    # further code below
    # once the element is visible it returns True, and we break out of the while loop
    if end_result:
        break
I wrote a blog post where I used this method to scrape YouTube Search.
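One caveat with the snippet above: find_element raises NoSuchElementException when the element is not in the DOM at all, so it only works if the end-of-results marker is present but hidden from the start. A small variation using find_elements, which returns an empty list instead of raising, makes the check safe either way:
while True:
    driver.execute_script(
        "var scrollingElement = (document.scrollingElement || document.body);"
        "scrollingElement.scrollTop = scrollingElement.scrollHeight;")
    # find_elements returns [] rather than raising when nothing matches yet
    markers = driver.find_elements_by_css_selector('#message')
    if markers and markers[0].is_displayed():
        break  # the "No more results" marker is on screen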
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import pandas as pd
class FindByXpathCss():
    # Declaring variables
    Reviews = []  # List to store the final set of reviews
    reviewText = []  # List to store reviews extracted via XPath
    reviewFullText = []
    # Chromedriver path
    driver = webdriver.Chrome(executable_path=r"F:\Chrome-webdriver\chromedriver.exe")
    driver.maximize_window()
    baseUrl = "https://play.google.com/store/apps/details?id=com.delta.mobile.android&hl=en_US&showAllReviews=true"
    driver.get(baseUrl)
    # driver.execute_script("scrollBy(0,300);")
    # Scrolling down
    for i in range(20):
        driver.find_element_by_xpath('//*[@id="yDmH0d"]').send_keys(Keys.ARROW_DOWN, i)
        time.sleep(0.5)
    # To click on the Show more button
    #btnShowMore = driver.find_element_by_xpath('//*[@id="fcxH9b"]/div[4]/c-wiz/div/div[2]''/div/div[1]/div/div/div[1]/div[2]/div[2]/div/span/span').click()
    # Scrolling to top
    for j in range(10):
        driver.find_element_by_xpath('//*[@id="yDmH0d"]').send_keys(Keys.ARROW_UP, j)
    #for i in range(10):
    review_btn = driver.find_elements_by_xpath("//button[contains(@class,'')][contains(text(),'Full Review')]")
    single_review_btn = driver.find_element_by_xpath("//button[contains(@class,'')][contains(text(),'Full Review')]")
    #time.sleep(1)
The div tag contains two spans: one has jsname 'fbQN7e', which holds the longer reviews that end with a "Full Review" button, and the other, 'bN97Pc', holds the shorter reviews that have no "Full Review" button at the end. I couldn't get the reviews from both types of span. I also tried to write the reviewFullText list directly to a dataframe, but I get only the element objects, not their text, and I don't know why that happens either.
    for btn in review_btn:
        btn.click()
        reviewFullText = driver.find_elements_by_css_selector("span[jsname='fbQN7e']")
    #if(single_review_btn.is_enabled()==False):
    #    reviewText = driver.find_elements_by_css_selector("span[jsname=\"bN97Pc\"]")
    ##else:
    #    pass
    # Iterating over each review and appending it to the list Reviews
    for txtreview in reviewText:
        reviewFullText.append(txtreview.text)
    print(len(reviewFullText))
    # Writing the list values into a csv file
    df = pd.DataFrame(reviewFullText)
    #df = pd.DataFrame({'Reviews': 'Reviews'}) #'Sentiment': 'null'})
    df.to_csv('Reviews.csv', index=True, encoding='utf-8')
    driver.close()
I have modified your solution to retrieve all reviews from the page.
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
class FindByXpathCss():
    driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
    driver.maximize_window()
    baseUrl = "https://play.google.com/store/apps/details?id=com.delta.mobile.android&hl=en_US&showAllReviews=true"
    driver.get(baseUrl)
    scrolls = 3
    while True:
        scrolls -= 1
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(3)
        if scrolls < 0:
            break
    buttonClick = WebDriverWait(driver, 30).until(
        EC.visibility_of_all_elements_located((By.XPATH, "//button[contains(@class,'')][contains(text(),'Full Review')]")))
    for element in buttonClick:
        driver.execute_script("arguments[0].click();", element)
    reviewText = WebDriverWait(driver, 30).until(
        EC.presence_of_all_elements_located((By.XPATH, "//*[@class='UD7Dzf']")))
    for textreview in reviewText:
        print(textreview.text)
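A hedged aside on two of the choices above: clicking through execute_script("arguments[0].click();", element) bypasses Selenium's visibility and interactability checks, which helps when a plain element.click() is intercepted by an overlay. And the fixed scrolls = 3 can be replaced by scrolling until the number of loaded reviews stops growing, a sketch reusing the same UD7Dzf locator:
prev_count = 0
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    time.sleep(3)
    count = len(driver.find_elements_by_xpath("//*[@class='UD7Dzf']"))
    if count == prev_count:
        break  # no new reviews were lazy-loaded by the last scroll
    prev_count = count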
I'm trying to automate a form assertion with a list of data, but I'm struggling with when and how to use WebDriverWait or the driver's implicit wait. My list is 1000 strings. When I run a sample of 100, fewer than 100 are captured correctly. The code below catches ElementNotSelectableException/StaleElementReferenceException but doesn't handle them correctly.
import unittest
from selenium import webdriver
from selenium.common.exceptions import ElementNotVisibleException, ElementNotSelectableException, \
    ElementNotInteractableException, StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
oar_order_id = '#oar-order-id'
oar_zip_code = '#oar_zip'
#order confirmation list
order_id_list = ["WO-34284975","WO-50002804","WO-50004302","WO-34282964","WO-34214963"]
#zip code confirmation list
zip_code_list = ["23508","99338","62036","75074-7520","37763","98034","89406-4361"]
submit_button = 'body.sales-guest-form.page-layout-1column:nth-child(2) div.page-wrapper:nth-child(10) main.page-main:nth-child(4) div.columns:nth-child(4) div.column.main form.form.form-orders-search:nth-child(4) div.actions-toolbar div.primary > button.action.submit.primary'
# Enter orderid & Zip & click enter
found = []
notfound = []
length = len(order_id_list)
driver.implicitly_wait(10)
wait = WebDriverWait(driver, 15, poll_frequency=1,
                     ignored_exceptions=[ElementNotVisibleException, ElementNotSelectableException])
driver.get('https://www.ecklers.com/sales/orderlookup/index/')
# Sample of 10 from list
for i in range(10):
    try:
        element1 = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, oar_order_id)))
        element1.send_keys(order_id_list[i])
        element2 = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, oar_zip_code)))
        element2.send_keys(zip_code_list[i])
        element3 = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, submit_button)))
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, submit_button)))
        element3.click()
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "body.sales-guest-view.page-layout-1column:nth-child(2) div.page-wrapper:nth-child(10) main.page-main:nth-child(4) div.page-title-wrapper:nth-child(3) h1.page-title > span.base")))
        title = driver.title
        #driver.implicitly_wait(60)
    except(StaleElementReferenceException):
        print("StaleElementReferenceException")
    except(ElementNotInteractableException):
        print("ElementNotInteractableException")
    try:
        text = order_id_list[i]
        assert(title.endswith(text[3:]))
        found.append(text)
    except(AssertionError):
        notfound.append(order_id_list[i])
    driver.get('https://www.ecklers.com/sales/orderlookup/index/')
print(found, notfound)
I found an answer that helped me: I used time.sleep(.500) after driver.get('https://www.ecklers.com/sales/orderlookup/index/').
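For what it's worth, a sleep after navigation usually just papers over the staleness. A more direct pattern is to re-locate the element and retry when it goes stale; a minimal sketch (the helper name is hypothetical) reusing the oar_order_id selector from the question:
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def send_keys_with_retry(driver, css_selector, text, attempts=3, timeout=15):
    """Look the element up fresh on every attempt; retry if it goes stale."""
    for _ in range(attempts):
        try:
            field = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, css_selector)))
            field.clear()
            field.send_keys(text)
            return
        except StaleElementReferenceException:
            continue  # the page re-rendered between lookup and use; try again
    raise StaleElementReferenceException(css_selector + " kept going stale")

# usage: send_keys_with_retry(driver, oar_order_id, order_id_list[i])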
I'm trying to extract over 24 product names from multiple pages, but my for loop only returns one product name. I need the script to go to an individual product page, extract the product name, then go back to the page URL list and repeat the same steps.
The script below only returns the first product name and then stops.
from selenium import webdriver
import time
CHROMEDRIVER_PATH = '/Users/reezalaq/PycharmProjects/wholesale/driver/chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
driver = webdriver.Chrome(CHROMEDRIVER_PATH, options=chrome_options)
chrome_options.accept_untrusted_certs = True
chrome_options.assume_untrusted_cert_issuer = True
chrome_options.headless = False
home = "https://www.blibli.com/c/4/beli--mukena/MU-1000008/54912?page=1&start=0&category=MU-1000008&sort=7&intent=false"
driver.get(home)
for number in range(1, 24):
    elem = driver.find_elements_by_xpath('//*[@id="catalogProductListContentDiv"]/div[3]/div[' + str(number) + ']/div/div/a')
    for link in elem:
        producturl = link.get_attribute("href")
        time.sleep(24)
        driver.get(producturl)
        getproductname = driver.find_element_by_class_name("product__name-text")
        print(getproductname.text)
driver.close()
I do this by opening a new tab, then switching to the new tab.
First, wait until all your elements are visible.
Instead of time.sleep(24) you can use WebDriverWait.
This is the element you mean:
//div[@class="product__item"]/a
You can try the below code:
driver.get('https://www.blibli.com/c/4/beli--mukena/MU-1000008/54912?page=1&start=0&category=MU-1000008&sort=7&intent=false')
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="product__item"]/a')))
for element in elements:
    url = element.get_attribute('href')
    # open a new tab with the specific url
    driver.execute_script("window.open('" + url + "');")
    # switch to the new tab
    driver.switch_to.window(driver.window_handles[1])
    getproductname = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CLASS_NAME, 'product__name-text')))
    print(getproductname.text)
    # close the current tab
    driver.close()
    # back to the first tab
    driver.switch_to.window(driver.window_handles[0])
driver.quit()
Add the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
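An alternative to opening tabs is to read every href up front and then visit them one at a time in the same tab; the URL strings cannot go stale the way WebElements do after navigation. A sketch under the same imports:
driver.get('https://www.blibli.com/c/4/beli--mukena/MU-1000008/54912?page=1&start=0&category=MU-1000008&sort=7&intent=false')
elements = WebDriverWait(driver, 20).until(
    EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="product__item"]/a')))
# grab the hrefs immediately; the WebElements themselves would go stale
# as soon as the driver navigates away from the listing page
urls = [element.get_attribute('href') for element in elements]
for url in urls:
    driver.get(url)
    name = WebDriverWait(driver, 20).until(
        EC.visibility_of_element_located((By.CLASS_NAME, 'product__name-text')))
    print(name.text)
driver.quit()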