Selecting an element with conditions and multiple attributes using Python's Selenium

I'm having trouble accessing a specific span with Selenium. I want to click on a specific search result if the spans' texts match specific strings, on the website https://www.superherodb.com/battle/create/#add_member
The HTML code:
<span>
<img src="/pictures2/portraits/10/025/791.jpg" class="avatar avatar-sm" alt="Superman"> Superman
<span class="suffix level-1">Kal-El</span>
<span class="suffix level-2">Prime Earth</span>
</span>
I want to check whether the text of the first span is "Superman", the second span is "Kal-El", and the third span is "Prime Earth", matching strings from my lists.
My code:
driver = webdriver.Chrome(r'C:\Webdriver\chromedriver.exe', options=options)
driver.get('https://www.superherodb.com/battle/create')
wait = WebDriverWait(driver, 5)
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="team1"]/div/a'))).click()
Superheroes = ["Superman", "Spiderman"]  # check in span
Names = ["Kal-El", "Peter Parker"]  # check in span class="suffix level-1"
Universes = ["Prime Earth", "Prime, Earth"]  # check in span class="suffix level-2"
wait.until(EC.visibility_of_element_located((By.NAME, 'quickselect'))).send_keys(Superheroes[0]) #Search for Superman
search = [my_elem for my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH,
              ".//*[@id='quickselect_result']/li/span/[contains(text(), Superheroes[0] )]/..")))
          if my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH,
              ".//*[@id='quickselect_result']/li/span/span[1][@class='suffix level-1' and contains(text(), Names[0] )]/..")))
          if my_elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH,
              ".//*[@id='quickselect_result']/li/span/span[2][@class='suffix level-2' and contains(text(), Universes[0] )]/..")))
          ]
search.click()
My Error:
InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression .//*[@id='quickselect_result']/li/span/[contains(text(), Superheroes[0] )]/.. because of the following error:
SyntaxError: Failed to execute 'evaluate' on 'Document': The string './/*[@id='quickselect_result']/li/span/[contains(text(), Superheroes[0] )]/..' is not a valid XPath expression.
I ultimately want to turn this into a for loop to check a huge list of Superheroes, but I need to select the right one from the results because there are many versions of the same Superhero (i.e. the same Superhero in a different Universe).
What am I doing wrong?
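For what it's worth, the three checks can be combined into one XPath built in Python instead of intersecting three element lists. A minimal sketch, assuming the id='quickselect_result' list structure shown in the error message; note that Python variables have to be formatted into the XPath string rather than written inside the quotes:

```python
def hero_xpath(hero: str, name: str, universe: str) -> str:
    """Build one XPath that checks all three spans of a search result.

    Writing Superheroes[0] inside the XPath string sends that text to
    the browser literally; the value must be interpolated from Python.
    """
    return (
        "//*[@id='quickselect_result']/li"
        f"[span[contains(., '{hero}')"
        f" and span[@class='suffix level-1' and normalize-space()='{name}']"
        f" and span[@class='suffix level-2' and normalize-space()='{universe}']]]"
    )

# Hypothetical usage with the wait object from the question:
# wait.until(EC.element_to_be_clickable(
#     (By.XPATH, hero_xpath(Superheroes[0], Names[0], Universes[0])))).click()
```

This also sidesteps the `/[` syntax error: the predicate is attached to the `li` rather than dangling after a `/`.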

Related

How to implement expected condition for 'element_to_be_clickable' under for loop in python selenium

DLPRules = self.find_elements(By.CSS_SELECTOR, "div.ng-option")
print(len(DLPRules))
for dlpRule in DLPRules:
    dlp = dlpRule.find_element(By.CSS_SELECTOR, "div span.label-title")
    if dlp.text == DLPRule_name:
        print(dlp.text)
        WebDriverWait(self, 60).until(
            ec.element_to_be_clickable(
                (By.CSS_SELECTOR,
                 "div.ng-option div span.label-title")))
        dlp.click()
        time.sleep(3)
The locator points to the first element, so the code waits for the first matching element to be clickable even though the CSS selector matches multiple elements.
I want the expected condition to apply to the specific element that satisfies the if condition.
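Assuming Selenium 4, element_to_be_clickable() also accepts an already-found WebElement instead of a locator tuple, so the wait can target exactly the element that passed the if check. A sketch; first_with_text is an illustrative helper, and the commented usage reuses the names from the question:

```python
def first_with_text(elements, wanted):
    """Return the first element whose .text equals `wanted`, else None."""
    return next((el for el in elements if el.text == wanted), None)

# Hypothetical usage:
# labels = [row.find_element(By.CSS_SELECTOR, "div span.label-title")
#           for row in DLPRules]
# dlp = first_with_text(labels, DLPRule_name)
# if dlp is not None:
#     WebDriverWait(self, 60).until(ec.element_to_be_clickable(dlp)).click()
```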

Python selenium find element returns nothing

I would like to find titles containing '募集说明书' (prospectus), but the following code just returns nothing.
No error, just nothing: empty results.
driver.get('http://www.szse.cn/disclosure/bond/notice/index.html')
wait = WebDriverWait(driver, 30)
datefield_st = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='input-group-wrap form-control dropdown-btn']/input[1]")))
datefield_st.click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='calendar-control'][1]//div[3]//a"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//ul[@class='monthselect'][1]//li[text()='{}']".format("1")))).click()
s1 = driver.find_element_by_class_name('input-left')
s1.send_keys("2022-1-1")
s2 = driver.find_element_by_class_name('input-right')
s2.send_keys("2022-1-18")
driver.find_element_by_id("query-btn").click()
while True:
    time.sleep(2)
    try:
        links = [link.get_attribute('href') for link in wait.until(EC.presence_of_all_elements_located((By.XPATH, "//a[@attachformat][.//span[contains(text(),'募集说明书')]]")))]
        titles = [title.text for title in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='pull-left ellipsis title-text' and contains(text(), '募集说明书')]//parent::a")))]
        dates = [date.text for date in wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='pull-left ellipsis title-text' and contains(text(), '募集说明书')]//ancestor::td//following-sibling::td")))]
        for link, title, date in zip(links, titles, dates):
            print(link, title, date)
<div class="text-title-box">
<a attachformat="pdf" attachpath="/disc/disk03/finalpage/2022-01-21/bb9854c5-9d89-4914-a6ea-219b487b874a.PDF" target="_blank" href="/disclosure/listed/bulletinDetail/index.html?bd5fd845-e810-42d3-98b3-d2501daaabc3" class="annon-title-link">
<span class="pull-left title-text multiline" title="22新资01:新疆金投资产管理股份有限公司2022年面向专业投资者公开发行公司债券(第一期)募集说明书">22新资01:新疆金投资产管理股份有限公司2022年面向专业投资者公开发行公司债券(第一期)募集说明书</span>
<span class="pull-left ellipsis title-icon" title="点击下载公告文件"><img src="http://res.static.szse.cn/modules/disclosure/images/icon_pdf.png">(5822k)</span>
<span class="titledownload-icon" title="点击下载公告文件"></span>
</a>
</div>
Could someone please help with this issue? Many thanks
Elements matching the //a[@attachformat][.//span[contains(text(),'募集说明书')]] XPath are located at the bottom of the presented search results; they are out of the visible screen until you scroll them into view.
Also, you are using wrong locators for the titles. See my fixes below.
The same applies to the dates.
There is also no need to use wait.until(EC.visibility_of_all_elements_located(...)) three times. Once the elements are found (and scrolled into view) you can simply get them with driver.find_elements.
I also see no need for the while True: loop here, since it will keep your code from completing after getting that data, but I left it as is because you mentioned you intend to click "next page" there.
from selenium.webdriver.common.action_chains import ActionChains
driver.get('http://www.szse.cn/disclosure/bond/notice/index.html')
wait = WebDriverWait(driver, 30)
actions = ActionChains(driver)
datefield_st = wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='input-group-wrap form-control dropdown-btn']/input[1]")))
datefield_st.click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='calendar-control'][1]//div[3]//a"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//ul[@class='monthselect'][1]//li[text()='{}']".format("1")))).click()
s1 = driver.find_element_by_class_name('input-left')
s1.send_keys("2022-1-1")
s2 = driver.find_element_by_class_name('input-right')
s2.send_keys("2022-1-18")
driver.find_element_by_id("query-btn").click()
while True:
    time.sleep(2)
    try:
        lower_link = wait.until(EC.presence_of_element_located((By.XPATH, "(//a[@attachformat][.//span[contains(text(),'募集说明书')]])[last()]")))
        actions.move_to_element(lower_link).perform()
        time.sleep(0.5)
        links = [link.get_attribute('href') for link in driver.find_elements(By.XPATH, "//a[@attachformat][.//span[contains(text(),'募集说明书')]]")]
        titles = [title.text for title in driver.find_elements(By.XPATH, "//span[contains(@class,'pull-left') and contains(text(), '募集说明书')]//parent::a")]
        dates = [date.text for date in driver.find_elements(By.XPATH, "//span[contains(@class,'pull-left') and contains(text(), '募集说明书')]//ancestor::td//following-sibling::td")]
        for link, title, date in zip(links, titles, dates):
            print(link, title, date)
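If the three parallel lists are needed beyond printing, they can be zipped into one record per announcement. A small sketch with illustrative field names; zip stops at the shortest list, which also guards against a partially loaded page:

```python
def to_records(links, titles, dates):
    """Pair the parallel result lists into one dict per announcement."""
    return [
        {"link": link, "title": title, "date": date}
        for link, title, date in zip(links, titles, dates)
    ]
```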

Loop through the div tag containing paragraph tags

I am trying to scrape information from an automobile blog, but I can't loop through the div tag containing the paragraph tags that hold the info.
driver.get("https://www.autocar.co.uk/car-news")
driver.maximize_window()
for i in range(3):
    i += 1
    info = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, f'//*[@id="page"]/div[2]/div[1]/div[1]/div[2]/div/div[1]/div/div[1]/div[1]/div[{i}]/div')))
    heading = info.find_element_by_tag_name('h2')
    clickable = heading.find_element_by_tag_name('a')
    driver.execute_script("arguments[0].click();", clickable)
    # the code starts to fail around here
    try:
        body_info = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'field-item even')))
        main_text = []
        for j in range(3):
            j += 1
            text = body_info.find_element_by_tag_name('p')
            main_text.append(text)
        for t in main_text:
            t_info = t.text
            print(f'{heading.text}\n{t_info}')
    except:
        print("couldn't find tag")
    driver.back()
There's an issue with your code: (By.CLASS_NAME, 'field-item even').
Selenium's By.CLASS_NAME does not support multiple classes (i.e. class names containing a space).
Simply replace the space with . and use that as a CSS_SELECTOR.
Try something like this:
try:
    body_info = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '.field-item.even')))
It must be '.field-item.even' and not 'field-item even' if you are using By.CSS_SELECTOR for presence_of_element_located().
So replace
body_info = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, 'field-item even')))
with
body_info = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, '.field-item.even')))
Official docs. https://selenium-python.readthedocs.io/api.html#locate-elements-by
To select by multiple classes you must use the class selector; you can't separate classes with a space as in an HTML class attribute. Put a dot before each class and leave no space between them.
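The space-to-dot conversion described above can be sketched as a one-line helper (the function name is illustrative):

```python
def classes_to_css(class_attr: str) -> str:
    """Turn an HTML class attribute value ('field-item even') into a CSS
    selector requiring all of those classes ('.field-item.even')."""
    return "." + ".".join(class_attr.split())
```

The result can then be passed with By.CSS_SELECTOR, e.g. classes_to_css('field-item even').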

Select an element of a drop down - selenium - python

I am trying to select the sport 'Football' in a sports drop-down, but it is impossible to click on it.
I tried with the Select() method:
driver = webdriver.Chrome()
url = "https://www.flashscore.com/"
driver.get(url)
Team = 'Paris SG'
Type = 'Teams'
sport = 'Football'
buttonSearch = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".header__button--search"))).click()
fill_search_bar = driver.find_element(By.CSS_SELECTOR, ".input___1NGxU3-")
fill_search_bar.clear()
fill_search_bar.send_keys(Team)
driver.find_element(By.CSS_SELECTOR, ".dropDown").click()
select_sport = Select(driver.find_element(By.XPATH, "//div[contains(@class, 'dropDown__list')]"))
select_sport.select_by_visible_text(sport)
This code return this error : UnexpectedTagNameException: Message: Select only works on <select> elements, not on <div>.
Here is my second version:
fill_search_bar = driver.find_element(By.CSS_SELECTOR, ".input___1NGxU3-")
fill_search_bar.clear()
fill_search_bar.send_keys(Team)
driver.find_element(By.CSS_SELECTOR, ".dropDown").click()
select_sport = WebDriverWait(driver, timeout=10).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='dropDown__list']/div[contains(text(),'" + sport + "')]"))).click()
This code return this error : selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[#class='dropDown__list']/div[contains(text(),'Football')]"}.
How can I solve this problem ?
I would suggest breaking the wait-until call into two lines for simplicity. It's totally optional and wouldn't make much of a difference.
wait = WebDriverWait(driver, 300)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".header__button--search")))
element_to_be_clicked=driver.find_element_by_css_selector(".header__button--search")
element_to_be_clicked.click()
For the second part, Select() only works on real <select> elements, as your first error shows, so for this div-based drop-down click the option by its visible text instead:
fill_search_bar.clear()
fill_search_bar.send_keys(Team)
driver.find_element_by_xpath("//div[@class='dropDown__selectedValue dropDownValueSelected___3msxRQS']").click()
driver.find_element_by_xpath("//div[contains(@class,'dropDown__list')]//div[contains(text(),'" + sport + "')]").click()

Selenium Web scraping nested divs with no ids or class names

I am trying to get the product name and the quantity from a nested HTML table using Selenium. My problem is that some of the divs don't have any id or class names. The table I am trying to access is the Critical Product list. Here is what I have done, but I am lost as to how I can reach the nested divs. The site is in the code.
options = Options()
options.add_argument('start-maximized')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'/usr/local/bin/chromedriver/')
url = 'https://www.rrpcanada.org/#/'  # site I'm scraping
driver.get(url)
time.sleep(150)
page = driver.page_source
driver.quit()
html_soup = BeautifulSoup(page, 'html.parser')
item_containers = html_soup.find_all('div', class_='critical-products-title hide-mobile')
if item_containers:
    for item in item_containers:
        for link in item.findAll('a'):  # need to loop the inner divs to reach the href and then get to the left and right classes to get title and quantity
            print(item)
Here is the image from the inspection. I want to be able to loop through all the divs and get the title and quantity.
You don't need Beautiful Soup, nor do you need to save the page_source.
I used a CSS selector to select all the target rows in the table and then applied list comprehension to choose the left and right sides of each row. I outputted the results to a list of tuples.
options = Options()
options.add_argument('start-maximized')
driver = webdriver.Chrome(chrome_options=options, executable_path=r'/usr/local/bin/chromedriver/')
url = 'https://www.rrpcanada.org/#/' # site I'm scraping
driver.get(url)
time.sleep(150)
elements = driver.find_elements_by_css_selector('#app > div:nth-child(1) > div.header-wrapper > div.header-right > div.critical-product-table-container > div.table.shorten.hide-mobile > div > div > div > a > div')
targetted_values = [(element.find_element_by_css_selector('.line-item-left').text, element.find_element_by_css_selector('.line-item-right').text) for element in elements]
driver.quit()
Example output of targetted_values:
[('Surgical & Reusable Masks', '376,713,363 available'),
('Disposable Gloves', '66,962,093 available'),
('Gowns and Coveralls', '40,502,145 available'),
('Respirators', '22,189,273 available'),
('Surface Wipes', '20,650,831 available'),
('Face Shields', '16,535,686 available'),
('Hand Sanitizer', '11,152,890 L available'),
('Thermometers', '8,457,993 available'),
('Testing Kits', '2,110,815 available'),
('Surface Solutions', '107,452 L available'),
('Protective Barriers', '10,833 available'),
('Ventilators', '410 available')]
To print the product names and quantities you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR and text attribute:
driver.get('https://www.rrpcanada.org/#/')
items = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table.shorten.hide-mobile > div div.line-item-title")))]
quantities = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.table.shorten.hide-mobile > div div.line-item-bold.available")))]
for i, j in zip(items, quantities):
    print(i, j)
Using XPATH and get_attribute("innerHTML"):
driver.get('https://www.rrpcanada.org/#/')
items = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='table shorten hide-mobile']/div//div[@class='line-item-title']")))]
quantities = [my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='table shorten hide-mobile']/div//div[@class='line-item-bold available']")))]
for i, j in zip(items, quantities):
    print(i, j)
Console Output:
Surgical & Reusable Masks 376,713,363 available
Disposable Gloves 66,962,093 available
Gowns and Coveralls 40,502,145 available
Respirators 22,189,273 available
Surface Wipes 20,650,831 available
Face Shields 16,535,686 available
Hand Sanitizer 11,152,890 L available
Thermometers 8,457,993 available
Testing Kits 2,110,815 available
Surface Solutions 107,452 L available
Protective Barriers 10,833 available
Ventilators 410 available
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
Outro
Link to useful documentation:
get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
You have to use relative xpath to find the element with class="line-item-left" for the name of each item and the element with class="line-item-right" for the number of available items.
driver.find_elements_by_class_name("line-item-left")   # item names
driver.find_elements_by_class_name("line-item-right")  # number of items available
Note the 's' in elements
This is the selector for product name:
div.critical-product-table-container div.line-item-left
And for total:
div.critical-product-table-container div.line-item-right
But the following approach is without BeautifulSoup.
time.sleep(...) is bad practice, please use WebDriverWait instead.
And to pair the two lists above for parallel looping, I use the zip() function:
url = 'https://www.rrpcanada.org/#/' # site I'm scraping
driver.get(url)
wait = WebDriverWait(driver, 150)
product_names = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.critical-product-table-container div.line-item-left')))
totals = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.critical-product-table-container div.line-item-right')))
for product_name, total in zip(product_names, totals):
    print(product_name.text + ' -- ' + total.text)
driver.quit()
You need the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
