Finding element with Selenium on Python

I am trying to automate clicking "next" on my university's online lectures when the current slide ends, since the player requires the user to press "next" manually whenever a slide has ended.
Using Selenium with Python, I have managed to get to the webpage, log in, and navigate to the lecture slides themselves, but I am unable to progress further.
The HTML is on pastebin; the element I'm trying to find is on line 3397.
I am trying to get the elapsedTime / totalTime and do something like:
while currentSlide != totalSlide:
    if elapsedTime == totalTime:
        find and click on 'next'
I've tried:
duration = driver.find_element(By.XPATH, "//div[@class='label time'][@style='display: none;]")
duration = driver.find_element(By.XPATH, "//div[@class='.label.time'][@style='display: none;']")
Any help would be appreciated!
EDIT:
Got it working with the approach suggested by Lukas:
content = urllib.request.urlopen(URL, timeout=10).read().decode("utf-8")
duration = content.split('<div class="label time" style="display: none;">')[1].split("</div>")[0]

Did you try using a normal parser instead of Selenium?
Check this out:
content = urllib.request.urlopen(URL, timeout=10).read().decode("utf-8")
duration = content.split('<div class="label time" style="display: none;">')[1].split("</div>")[0]
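
If you want something sturdier than raw string splitting, the standard library's html.parser can do the same job. A minimal sketch, assuming the same URL placeholder and markup as above, and that the target div contains no nested divs:
from html.parser import HTMLParser
import urllib.request

class TimeDivParser(HTMLParser):
    """Collects the text inside <div class="label time"> elements."""
    def __init__(self):
        super().__init__()
        self._inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "div" and ("class", "label time") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.values.append(data.strip())

content = urllib.request.urlopen(URL, timeout=10).read().decode("utf-8")
parser = TimeDivParser()
parser.feed(content)
duration = parser.values[0] if parser.values else None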

I would try:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
browser.get(URL)
delay = 30 # seconds
WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CSS_SELECTOR, '.label.time')))
print("Page is ready!")
# duration = browser.find_element_by_class_name("label time").text
# --> "InvalidSelectorException: Message: invalid selector: Compound class names not permitted"
duration = browser.find_element_by_css_selector(".label.time").text
# alternative:
duration = browser.find_element_by_xpath("//*[@class='label time']").text
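From there, a rough sketch of the polling loop described in the question, built on the selector above; the locators for the total time and the "next" button are hypothetical, since the real markup is only on the asker's pastebin:
import time

while True:
    elapsed = browser.find_element_by_css_selector(".label.time").text
    total = browser.find_element_by_css_selector(".label.time.total").text  # hypothetical locator
    if elapsed == total:
        # advance to the next slide once the timer runs out
        browser.find_element_by_css_selector("button.next").click()  # hypothetical locator
    time.sleep(1)  # poll once per second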

Related

Crawling data with Selenium throws TimeoutException

I am trying to crawl the reviews on some websites. For one website it runs fine, but when I create a loop to crawl many websites, it throws an error:
raise TimeoutException(message, screen, stacktrace) TimeoutException
I tried increasing the waiting time from 30 to 50 seconds, but it still does not run fine.
Here is my code:
import requests
import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from datetime import datetime

start_time = datetime.now()
result = pd.DataFrame()
df = pd.read_excel(r'D:\check_bols.xlsx')
ids = df['ids'].values.tolist()
link = "https://www.bol.com/nl/ajax/dataLayerEndpoint.html?product_id="
for i in ids:
    link3 = link + str(i[-17:].replace("/", ""))
    op = webdriver.ChromeOptions()
    op.add_argument('--ignore-certificate-errors')
    op.add_argument('--incognito')
    op.add_argument('--headless')
    driver = webdriver.Chrome(executable_path='D:/chromedriver.exe', options=op)
    driver.get(i)
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-test='consent-modal-confirm-btn']>span"))).click()
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.review-load-more__button.js-review-load-more-button"))).click()
    soup = BeautifulSoup(driver.page_source, 'lxml')
    product_attributes = requests.get(link3).json()
    reviewtitle = [i.get_text() for i in soup.find_all("strong", class_="review__title")]
    url = [i] * len(reviewtitle)
    productid = [product_attributes["dmp"]["productId"]] * len(reviewtitle)
    content = [i.get_text().strip() for i in soup.find_all("div", attrs={"class": "review__body"})]
    author = [i.get_text() for i in soup.find_all("li", attrs={"data-test": "review-author-name"})]
    date = [i.get_text() for i in soup.find_all("li", attrs={"data-test": "review-author-date"})]
    output = pd.DataFrame(list(zip(url, productid, reviewtitle, author, content, date)))
    result = result.append(output)  # DataFrame.append returns a new frame, so it must be reassigned
result.to_excel(r'D:\bols.xlsx', index=False)
end_time = datetime.now()
print('Duration: {}'.format(end_time - start_time))
Here are some links that I tried to crawl :
link1
link2
As mentioned in the comments, you're timing out because you're looking for a button that does not exist.
You need to catch the error(s) and skip those failing lines. You can do this with a try and except.
I've put together an example for you. It's hard-coded to one URL (as I don't have your data sheet), and it's a fixed loop whose purpose is to keep trying to click the "show more" button, even after it's gone.
With this solution, be careful of your sync time: each time the WebDriverWait is called, it waits the full duration if the element does not exist. Exit the expand loop the first time you trip the error and keep your sync time tight, or it will be a slow script.
First, add these to your imports:
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import StaleElementReferenceException
Then this will run and not error:
# not a fixed url:
driver.get('https://www.bol.com/nl/p/Matras-180x200-7-zones-koudschuim-premium-plus-tijk-15-cm-hard/9200000130825457/')
# accept the cookie once
WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-test='consent-modal-confirm-btn']>span"))).click()
for i in range(10):
    try:
        WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.review-load-more__button.js-review-load-more-button"))).click()
        print("I pressed load more")
    except (TimeoutException, StaleElementReferenceException):
        print("No more to load - but i didn't fail")
The output to the console is this:
DevTools listening on
ws://127.0.0.1:51223/devtools/browser/4b1a0033-8294-428d-802a-d0d2127c4b6f
I pressed load more
I pressed load more
No more to load - but i didn't fail
No more to load - but i didn't fail
No more to load - but i didn't fail
No more to load - but i didn't fail
(and so on).
This is how my browser looks - Note the size of the scroll bar for the link I used - it looks like it's got all the reviews:
I would suggest using an infinite while loop with a try...except block: if the element is found it will be clicked, otherwise control passes to the except block and breaks out of the while loop.
driver.get("https://www.bol.com/nl/p/Matras-180x200-7-zones-koudschuim-premium-plus-tijk-15-cm-hard/9200000130825457/")
WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-test='consent-modal-confirm-btn']>span"))).click()
while True:
try:
WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.review-load-more__button.js-review-load-more-button"))).click()
print("Lode more button found and clicked ")
except:
print("No more load more button available on the page.Please exit...")
break
Your console output will look like this:
Load more button found and clicked
Load more button found and clicked
Load more button found and clicked
Load more button found and clicked
No more load more button available on the page. Exiting...

Selenium - 'None' value from element

I want to get the value, or the price, of a stock from a trading website. The problem is that when I'm using the get_attribute method, like this:
.get_attribute('')
I can't seem to find anything to put between the quotes that will give me the value of the stock.
Here is an image of the line when using inspect:
<span _ngcontent-c31="" class="price__value" style="" xpath="1"> 187.510 </span>
Below is the code I've written for this:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
browser = webdriver.Chrome('/Users/ludvighenriksen/downloads/chromedriver')
browser.get('https://www.forex.com/en-uk/account-login/')
username_elem = browser.find_element_by_name('Username')
username_elem.send_keys('kebababdulaziz@gmail.com')
password_elem = browser.find_element_by_name('Password')
password_elem.send_keys('KEbababdulaziz')
password_elem.send_keys(Keys.ENTER)
time.sleep(5)
search_elem = WebDriverWait(browser, 20).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, "input.market-search__search-input")))
search_elem.click()
search_elem.send_keys('FB')
search_click_elem = WebDriverWait(browser, 20).until(EC.element_to_be_clickable(
    (By.XPATH, "//app-market-table[@class='search-results-element ng-star-inserted']//div[@class='price--buy clickable-price arrows-flashing']")))
browser.execute_script("arguments[0].click();", search_click_elem)
price_elem = browser.find_element_by_css_selector("div.mercury:nth-child(2) div.mercury__body:nth-child(4) div.mercury__body-content-container app-workspace.ng-star-inserted:nth-child(3) div.panel-container:nth-child(1) app-workspace-panel.active.ng-star-inserted div.workspace-panel-content.workspace-panel-content--no-scroll-vertical.workspace-panel-content--no-scroll-horizontal.workspace-panel-content--auto-size div.workspace-panel-content__component.workspace-panel-content__component--auto-size app-deal-ticket.ng-star-inserted form.ticket-form.ng-untouched.ng-pristine.ng-invalid.ng-star-inserted div.market-prices app-market-prices.main-prices.ng-untouched.ng-pristine.ng-valid div.market-prices div.market-prices__direction label.buy.selected span.price.ng-star-inserted:nth-child(2) > span.price__value")
price_value = price_elem.get_attribute('value')
print(price_value)
The ('value') isn't working, which makes sense I guess, but I think I've tried everything I could think of, and it prints out None.
The login for the website is included if you want to try it out. Thanks in advance.
If you want to access the text content of a tag, you can use the .text property.
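For example, a minimal sketch reusing the span from the question's inspect output (the short CSS selector here is an illustration; the long selector from the question should work the same way):
price_elem = browser.find_element_by_css_selector("span.price__value")
print(price_elem.text)  # e.g. "187.510"
If .text comes back empty because the element is hidden, get_attribute('textContent') or get_attribute('innerText') are common fallbacks.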

StaleElementReferenceException Using Selenium in Python

Yes, I know this type of question has been answered many times before, but none of the answers helped me. I don't know much about this, so I need your help!
My problem:
I am scraping a website that requires a CAPTCHA for every search. I use Firefox as my browser because it asks for the CAPTCHA once and doesn't change it. My code asks the user for the CAPTCHA once, clicks the search button, and scrapes the data, but when it clicks the search button again (as it is in a loop) it raises this error:
selenium.common.exceptions.StaleElementReferenceException:
Message: The element reference of <input id="txt_ALPHA_NUMERIC" class="ui-inputfield ui-inputtext ui-widget ui-state-default ui-corner-all" name="txt_ALPHA_NUMERIC" type="text"> is stale;
either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
My old code:
from selenium import webdriver # Import module
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys # For keyboard keys
import time
import pandas as pd
URL = 'https://vahan.nic.in/nrservices/faces/user/searchstatus.xhtml' # Define URL
browser = webdriver.Firefox(executable_path=r'C:\Users\intel\Downloads\Setups\geckodriver.exe')
browser.get(URL)
vehicle_no = browser.find_element_by_xpath("""//*[@id="regn_no1_exact"]""")
vehicle_no.send_keys('RJ14CX3238')
captcha_input = input("enter your captcha ")
captcha = browser.find_element_by_xpath("""//*[@id="txt_ALPHA_NUMERIC"]""")
captcha.send_keys(captcha_input)
button_click = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[5]/div/button/span").click()
i = 111
attempt = 1
max_attempts = 2
while True:
    i = i + 1
    time.sleep(4)
    reg_no = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[6]/div/div/div/table/tbody/tr[2]/td[2]/span").text
    date = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[6]/div/div/div/table/tbody/tr[2]/td[4]").text
    vehicle_no = browser.find_element_by_xpath("""//*[@id="regn_no1_exact"]""")
    vehicle_no.send_keys('RJ14CX3' + str(i))
    captcha.send_keys(captcha_input)
    button_click = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[5]/div/button/span").click()
    browser.execute_script("return arguments[0].scrollIntoView(true);", button_click)
My updated code:
from selenium import webdriver # Import module
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys # For keyboard keys
import time
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
URL = 'https://vahan.nic.in/nrservices/faces/user/searchstatus.xhtml' # Define URL
browser = webdriver.Firefox(executable_path=r'C:\Users\intel\Downloads\Setups\geckodriver.exe')
browser.get(URL)
vehicle_no = browser.find_element_by_xpath("""//*[@id="regn_no1_exact"]""")
vehicle_no.send_keys('RJ14CX3238')
captcha_input = input("enter your captcha ")
captcha = browser.find_element_by_xpath("""//*[@id="txt_ALPHA_NUMERIC"]""")
captcha.send_keys(captcha_input)
button_click = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[5]/div/button/span").click()
i = 111
while True:
    button_click = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[5]/div/button/span")
    WebDriverWait(browser, 10).until_not(EC.visibility_of_element_located((By.ID, "overley")))
    browser.execute_script("return arguments[0].scrollIntoView(true);", button_click)
    i = i + 1
    #reg_no = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[6]/div/div/div/table/tbody/tr[2]/td[2]/span").text
    #date = browser.find_element_by_xpath("/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[6]/div/div/div/table/tbody/tr[2]/td[4]").text
    time.sleep(5)
    vehicle_no.send_keys('RJ14CX3' + str(i))
    WebDriverWait(browser, 10).until_not(EC.visibility_of_element_located((By.ID, "overley")))
    captcha.send_keys(captcha_input)
Please also fix any other problems you see in my code. Any help would be appreciated!
Thanks in advance.
Simply re-find the button element inside the loop, each time, rather than once before the loop starts. Any time the DOM mutates, previous element references are marked as stale and require a new lookup. Interacting with the CAPTCHA mutates the DOM and marks the page as dirty (having changed/modified), which Selenium uses to flag "staleness".
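A minimal sketch of that pattern, reusing the search-button XPath and the browser/time names from the question (only the re-find-inside-the-loop structure is the point here):
while True:
    # Re-locate the button on every iteration; the reference from the
    # previous iteration goes stale once the page re-renders.
    button = browser.find_element_by_xpath(
        "/html/body/form/div[1]/div[3]/div/div[2]/div/div/div[2]/div[5]/div/button/span")
    button.click()
    time.sleep(4)  # give the results time to render before the next pass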

Executing an event from python

As a Python beginner I am trying to make a simple automated login project. One more thing I have to do is click on the 4th row of an HTML table to show the proper content. The HTML code of that segment is:
<tr class="tbl_seznam_barva_1" onclick="setTimeout('__doPostBack(\'ctl02$ctl00$BrowseSql1\',\'Select$0\')',470);" onmouseover="radekSeznamuClass=this.className;this.className='RowMouseOver';" onmouseout="this.className=radekSeznamuClass;">
<td>virtuálny terminál</td>
</tr>
How to execute this "onclick" event?
from selenium import webdriver
#...
browser = webdriver.Firefox()
elem = browser.find_element_by_name('txtUsername')
elem.send_keys('myLogin' + Keys.RETURN)
elem = browser.find_element_by_xpath("//tr[4]")
# some code for event execution goes here...
If you want to click() on the element with text as virtuálny terminál you can achieve it with:
browser.find_element_by_xpath("//*[text()='virtuálny terminál']").click()
If you need to click on more elements you can use a for-loop over all of them (note the plural find_elements):
elements = browser.find_elements_by_xpath("//tr[4]")
for i in elements:
    print(i.text)
Edit:
You can use ActionChains:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
my_elem = browser.find_element_by_xpath("//tr[4]")
action = ActionChains(browser)
action.move_to_element(my_elem)
# action.move_to_element_with_offset(my_elem, 5, 5)
action.click()
action.perform()
Edit2:
If you can't use chromedriver and nothing else works, you can use execute_script:
element = browser.find_element_by_xpath("//tr[4]")
browser.execute_script("arguments[0].click();", element)
The problem is that one should wait for the webpage to fully load.
After the line elem.send_keys('myLogin' + Keys.RETURN) the webpage needs time to render the content, so a delay should be added:
import time
# ...
elem.send_keys('myLogin' + Keys.RETURN)
time.sleep(1)
elem=browser.find_element_by_xpath("//tr[4]")
elem.click()
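A fixed sleep works, but an explicit wait is usually more robust; a sketch, assuming the same //tr[4] row from the question:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# wait up to 10 seconds for the row to become clickable, then click it
elem = WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//tr[4]")))
elem.click()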

Selenium load time errors - looking for possible workaround

I am trying to scrape data from a certain website. I am using Selenium so that I can log myself in and then start parsing through the data.
I have 3 main errors:
The last page number is not loading properly: here I am getting "1" when it should be "197", and I believe this is because of the load time of the website.
The 'test' element XPath is not being found properly (I commented it out in the last for loop):
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[1]/div[@class='col-lg-3 col-sm-3 result-info' and 2]/span[@class='brand-name' and 1]"}
Finally, I am trying to click the last page to test whether that works, but I am getting an error that the element is not found:
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
This is my code:
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

url = "https://marketplace.refersion.com/"
username = "jupoxar@b2bx.net"
password = "testpass123"
driver = webdriver.Chrome("/Users/xxx/Downloads/chromedriver")

if __name__ == "__main__":
    driver.get(url)
    driver.find_element_by_xpath("/html/body/div[@class='wrapper']/div[@class='top-block']/header[@class='header clearfix']/div[@class='login-button']/a[@class='login-link']").click()
    driver.find_element_by_id("email").send_keys(username)  # enters the username in the textbox
    driver.find_element_by_xpath("/html/body/div[@id='app']/div[@class='top-block']/div[@class='row']/div[@class='col-xs-12 col-sm-10 col-sm-offset-1 col-md-8 col-md-offset-2 col-lg-6 col-lg-offset-3 main-section']/div[@class='main-section-content']/div/form[@class='form-horizontal']/div[@class='form-group ']/div[@class='col-xs-12 col-sm-10 col-sm-offset-1 input-group input-group-lg']/input[@id='password']").send_keys(password)  # enters the password in the textbox
    # Find the submit button using class name and click on it.
    driver.find_element_by_class_name("btn-primary").click()
    driver.find_element_by_link_text("Find Offers").click()
    driver.find_element_by_id("sorting-dropdown").click()  # opens the sorting dropdown
    driver.find_element_by_link_text("Newest First").click()
    last_page = driver.find_element_by_class_name("right-center").text
    print(last_page)
    # try:
    #     last_page = WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.CLASS_NAME, 'right-center')))
    #     print("Page is ready!")
    # except TimeoutException:
    #     print("Loading took too much time!")
    for i in range(1, 10):
        # test = driver.find_element_by_xpath("//div[1]/div[@class='col-lg-3 col-sm-3 result-info' and 2]/span[@class='brand-name' and 1]")
        # print(test)
        WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'hover-link'))).click()
I think this has to do with the way the page is being loaded. My question is: is there any workaround for something like this?
You should have explicit waits in your code to handle the dynamic loading of the pages. Sorting the page by "Newest First" causes it to refresh the results and introduces a spinner to indicate the sorting.
<i class="fa fa-spinner fa-spin" aria-hidden="true" style="font-size: 48px;"></i>
Waiting for the spinner to disappear should give you the correct page count. Something on the following lines:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
.....
# your login code
.....
driver.find_element_by_link_text("Newest First").click()
element = WebDriverWait(driver, 10).until(
    EC.invisibility_of_element_located((By.XPATH, "//i[@class='fa fa-spinner fa-spin']"))
)
last_page = driver.find_element_by_class_name("right-center").text
To find all the brand names listed on the page, you need to find all the span tags with class='brand-name' by calling the plural method find_elements_by_xpath:
brand_names_list = driver.find_elements_by_xpath("//span[@class='brand-name']")
for brand_name in brand_names_list:
    print(brand_name.text)
