I am not sure how to simulate clicking the "Next" link; my current code just stops at the first instance. I also tried a for loop, but it raised "object is not iterable". Is there any way to revise my code?
from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get('https://www.autocodes.com/obd-code-list/powertrain/1')
linkElem = browser.find_element_by_css_selector("#pag > a")
type(linkElem)
linkElem.click()  # follows the "Next" link
Try the following code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import traceback

browser = webdriver.Firefox()
count = 0
browser.get('https://www.autocodes.com/obd-code-list/powertrain/1')
while True:
    try:
        linkElem = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, "//div[@id='pag']//a[contains(text(),'Next')]")))
    except:
        traceback.print_exc()
        print("last page reached.")
        break
    # type(linkElem)
    count += 1
    print("count", count)
    linkElem.click()  # follows the "Next" link
    time.sleep(1)
The code checks in an infinite while loop whether the Next element is present. If it is, it clicks it; otherwise it breaks out of the loop.
Note: the count variable is added as a debug aid to track the loop count. If it is not required, you can remove the code related to it.
Note: similarly, traceback is added to print the complete exception trace for your reference.
Note: the break keyword exits the infinite loop once the last page is reached, so execution continues with the code after the while loop.
References:
http://selenium-python.readthedocs.io/waits.html
Maybe the element id or type of linkElem changes after the first click.
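If so, a minimal sketch of that idea (reusing the question's selector, which is an assumption about the page structure) is to look the element up again on every iteration instead of reusing the old reference:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time

browser = webdriver.Firefox()
browser.get('https://www.autocodes.com/obd-code-list/powertrain/1')
while True:
    try:
        # Re-find the link after every page load; the reference from the
        # previous page is stale once the page reloads.
        linkElem = browser.find_element_by_css_selector("#pag > a")
    except NoSuchElementException:
        break  # assume no "Next" link means the last page
    linkElem.click()
    time.sleep(1)  # crude wait; the WebDriverWait version above is more robust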
Hi, I am trying to scrape Gujarat RERA using Selenium with Python.
main.py:
import csv
import os
import time

import chromedriver_autoinstaller
from selenium import webdriver
from selenium.webdriver import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support.select import Select

WORK_DONE = False

chromedriver_autoinstaller.install()
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://gujrerar1.gujarat.gov.in/home")
time.sleep(3)

# Click close on the popups; the original ran this block twice, so do two passes.
for _ in range(2):
    close = driver.find_elements(By.CSS_SELECTOR, ".close")
    for i in close:
        try:
            i.click()
            time.sleep(0.5)
        except:
            pass

search_bar = driver.find_element(By.ID, "password")
data = ["031218"]
for reg_no in data:
    # Take the last segment of the registration number; [-1] avoids an
    # IndexError when the number contains no "/" separators.
    last_thing_of_reg_no = reg_no.split("/")[-1]
    search_bar.send_keys(Keys.CONTROL + "a")  # select all, then overwrite
    search_bar.send_keys(last_thing_of_reg_no)
    driver.find_element(By.NAME, "btn1").click()  # click on the search button
    time.sleep(1.5)
    driver.find_element(By.TAG_NAME, "body").send_keys(Keys.END)  # scroll the page to the end
    time.sleep(1.5)
    total_projects = driver.find_elements(By.CSS_SELECTOR, ".search_result_list")
    for projects in total_projects:
        all_paragraphs = projects.find_elements(By.TAG_NAME, "p")
        for i in all_paragraphs:
            text = i.text.replace("Reg No. : ", "")
            print(text)
            print(reg_no)
            if reg_no == text:
                print("yes")
                print("done")
                time.sleep(100)

print("done")
time.sleep(50)
First, it opens the website, closes the pop-ups, finds the search bar, and sends reg_no. Then it gets all the paragraph tags on the page and iterates through each of them, and that is where the error is. Both strings look exactly the same, but execution never enters the if statement, and I don't know why.
I want this code to print "yes" and continue inside the if block, but it never prints "yes", even though both strings appear identical.
I don't know what else to add, but feel free to ask more questions.
Many thanks for considering my request.
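For what it's worth, a common cause of visually identical strings comparing unequal is invisible characters such as non-breaking spaces or trailing newlines. A minimal debugging sketch (variable names taken from the question's code) that makes such differences visible and compares normalized strings:

# repr() exposes hidden characters like '\xa0' (non-breaking space)
# or a trailing '\n' that look identical on screen.
print(repr(reg_no))
print(repr(text))

# Compare after stripping surrounding whitespace on both sides.
if reg_no.strip() == text.strip():
    print("yes")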
So I am trying to pre-order the latest PS5 (during the sale), for which I thought I would write a Python bot script (I am new to Python). I managed to write the code below for another product. However, the total time taken to reach the final transaction page is just over a minute, and that is when the server is not loaded.
Is there anything I can do to make the transaction run faster, ideally reaching the last page in under 30 seconds?
And since I am a newbie learning from Automate the Boring Stuff with Python, please suggest what improvements I can make to my code.
Also, is there a way to run this headless until I enter my OTP to confirm the transaction? That is basically the last page, which is why I am not using headless mode.
...
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

op = webdriver.ChromeOptions()
#op.add_argument('headless')
op.add_argument('disable-infobars')
op.add_argument('disable-extensions')
caps = DesiredCapabilities().CHROME
caps["pageLoadStrategy"] = "none"

url = 'https://www.flipkart.com/latibule-scratch-remover-wax/p/itm251fceb778c9f?pid=SCAG2W9BU2GEJFUW&lid=LSTSCAG2W9BU2GEJFUWHKPRKY&marketplace=FLIPKART&q=car+scratch+remover&store=1mt%2Fuhg%2Fzfb%2F2wf&srno=s_1_2&otracker=AS_QueryStore_OrganicAutoSuggest_1_8_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_1_8_na_na_na&fm=SEARCH&iid=3cbc56ce-4032-4ad3-9f8c-f58e3572b637.SCAG2W9BU2GEJFUW.SEARCH&ppt=sp&ppn=sp&ssid=duj7dx9q340000001622962739970&qH=7307be34aace13e8'
browser = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\FAST\Python\3.7.0\chromedriver.exe', options=op)
browser.get(url)

i = 0
while True:
    try:
        element = WebDriverWait(browser, 1).until(
            #EC.presence_of_element_located((By.ID, "myDynamicElement"))
            EC.element_to_be_clickable((By.CSS_SELECTOR, "._2KpZ6l._2U9uOA.ihZ75k._3AWRsL"))
        )
        break
    except:
        # print('element not found/not clickable')
        continue
print('element found')

try:
    while element.is_enabled():
        element.click()
except:
    print(url)

flag = 1
while browser.current_url:
    try:
        if flag == 1:
            elem = browser.find_element_by_css_selector('input._17N0em')
            while not elem.get_attribute('value'):
                elem.send_keys('8xxxxxxxxx')
            elem.send_keys(Keys.ENTER)
            flag = 2
        elif flag == 2:
            elem = browser.find_element_by_css_selector('input[type=password]')
            while not elem.get_attribute('value'):
                elem.send_keys('111#1234')
            time.sleep(2)
            elem.send_keys(Keys.ENTER)
            flag = 3
        elif flag == 3:
            elem = browser.find_element_by_id('to-payment')
            while elem is None:
                continue
            elem.click()
            flag = 4
        elif flag == 4:
            elem = browser.find_element_by_css_selector('input._1w3ZZo._2mFmU7')
            while elem is None:
                continue
            elem.send_keys('xxx')
            elem.send_keys(Keys.ENTER)
            flag = 5
        elif flag == 5:
            break
    except:
        continue

print('end')
print(url)
print(browser.current_url)
...
I see two things I would do differently.
1)
element = WebDriverWait(browser, 1).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "._2KpZ6l._2U9uOA.ihZ75k._3AWRsL")))
If that element is normally expected to appear there, you should use a much longer timeout. This is not a kind of sleep: Selenium continues at exactly the moment it detects the element is clickable. On the other hand, a timeout that is too short will cause timeout exceptions in many cases where the web page has not finished loading within that limited period.
2)
try:
    while element.is_enabled():
        element.click()
I'm not sure it's good practice to click the same element in an infinite loop with a very short delay between clicks; that is effectively bombarding the element, possibly thousands of times per second.
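Putting both points together, a minimal sketch (same locator as in the question; the 30-second timeout is an assumption) that waits once with a generous timeout and clicks exactly once:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

try:
    # Returns as soon as the element is clickable, so a long timeout
    # costs nothing when the page loads quickly.
    element = WebDriverWait(browser, 30).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "._2KpZ6l._2U9uOA.ihZ75k._3AWRsL"))
    )
    element.click()  # a single click instead of an is_enabled() loop
except TimeoutException:
    print('element not found/not clickable')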
So I'm trying to figure out how to run this loop properly. My issue is that, depending on the link being loaded, the page may show an access-denied error, though not for all the links. I would like the program to recognize when that particular element loads on the screen, break out, and start the next iteration of the for loop. So I am trying to determine whether the "Access-Denied" element is present: if it is, break; otherwise, continue the for loop.
idList = ["8573", "85678", "2378", "2579"]
for ID in idList:
    print(ID)
    driver.get(f"https://www.someWebsite/username/{ID}")
    element = driver.find_element_by_class_name("Access-Denied")
    print("error loading website")
    break
    if not element:
        print("you may continue the for loop")
Mind you, if the element showing the access-denied page isn't present, I get an error that the 'Access-Denied' element doesn't exist. How can I fix this?
You want to wait for the webpage to receive the proper response. Using the following code, you can wait for the full response to load and then take appropriate action based on the outcome:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
...
try:
    _ = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "Access-Denied"))
    )
    print("error loading website")
    break
except TimeoutException:
    print("you may continue the for loop")
...
So you want to loop through the IDs and break if the access-denied element is there.
wait = WebDriverWait(driver, 10)
idList = ["8573", "85678", "2378", "2579"]
for ID in idList:
    print(ID)
    driver.get(f"https://www.someWebsite/username/{ID}")
    try:
        element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'Access-Denied')))
        break
    except:
        continue
Import
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
How do I track dynamically updating code on a website?
On a website there is a part of the code that shows notifications. This code updates frequently, and I would like to use Selenium to capture the changes.
Example:
# Setting up the driver
from selenium import webdriver
EXE_PATH = r'C:/Users/mrx/Downloads/chromedriver.exe'
driver = webdriver.Chrome(executable_path=EXE_PATH)
# Navigating to website and element of interest
driver.get('https://whateverwebsite.com/')
element = driver.find_element_by_id('changing-element')
# Printing source at time 1
element.get_attribute('innerHTML')
# Printing source at time 2
element.get_attribute('innerHTML')
The code returned at time 1 and time 2 is different. I could of course capture this using some kind of loop.
# While loop capturing changes
results = list()
while True:
    print("New source")
    source = element.get_attribute('innerHTML')
    results.append(source)
    new_source = element.get_attribute('innerHTML')
    while source == new_source:
        time.sleep(1)
        new_source = element.get_attribute('innerHTML')
Is there a smarter way to do this using Selenium's event listener?
Try waiting the Selenium way with WebDriverWait; Selenium provides the expected condition text_to_be_present_in_element. You can try the following approach.
First you need the following imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
Try the code below:
element = driver.find_element_by_id('changing-element')
# Printing source at time 1
element.get_attribute('innerHTML')
#something that makes the element change
WebDriverWait(driver, 10).until(expected_conditions.text_to_be_present_in_element((By.ID, 'changing-element'), 'expected_value'))
# Printing source at time 2
element.get_attribute('innerHTML')
But if the text isn't found, it will raise a TimeoutException, so please handle it with try/except.
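For example, a sketch of that handling (the 'expected_value' text is a placeholder, as above):

from selenium.common.exceptions import TimeoutException

try:
    WebDriverWait(driver, 10).until(
        expected_conditions.text_to_be_present_in_element(
            (By.ID, 'changing-element'), 'expected_value'))
    print(element.get_attribute('innerHTML'))  # source at time 2
except TimeoutException:
    print("element text did not change within 10 seconds")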
I am trying to scrape data from the Sunshine List website (http://www.sunshinelist.ca/) using the Selenium package, but I get the error mentioned below. From several other related posts, I understand that I need to use WebDriverWait to explicitly ask the driver to wait/refresh, but I am unable to identify where and how I should call it.
Error (from screenshot):
StaleElementReferenceException: Message: The element reference of <tr class="even"> is stale: either the element is no longer attached to the DOM or the page has been refreshed
import numpy as np
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

ffx_bin = FirefoxBinary(r'C:\Users\BhagatM\AppData\Local\Mozilla Firefox\firefox.exe')
ffx_caps = DesiredCapabilities.FIREFOX
ffx_caps['marionette'] = True
driver = webdriver.Firefox(capabilities=ffx_caps, firefox_binary=ffx_bin)
driver.get("http://www.sunshinelist.ca/")
driver.maximize_window()

tablewotags1 = []
while True:
    divs = driver.find_element_by_id('datatable-disclosures')
    divs1 = divs.find_elements_by_tag_name('tbody')
    for d1 in divs1:
        div2 = d1.find_elements_by_tag_name('tr')
        for d2 in div2:
            tablewotags1.append(d2.text)
    try:
        driver.find_element_by_link_text('Next →').click()
    except NoSuchElementException:
        break

year1 = tablewotags1[0::10]
name1 = tablewotags1[3::10]
position1 = tablewotags1[4::10]
employer1 = tablewotags1[1::10]
df1 = pd.DataFrame({'Year': year1, 'Name': name1, 'Position': position1, 'Employer': employer1})
df1.to_csv('Sunshine List-1.csv', index=False)
If your problem is clicking the "Next" button, you can do that with this XPath:
driver = webdriver.Firefox(executable_path=r'/pathTo/geckodriver')
driver.get("http://www.sunshinelist.ca/")
wait = WebDriverWait(driver, 20)
el = wait.until(EC.presence_of_element_located((By.XPATH, "//ul[@class='pagination']/li[@class='next']/a[@href='#' and text()='Next → ']")))
el.click()
For each click on the "Next" button, you should find that button again and click on it.
Or do something like this:
max_attempts = 10
while True:
    try:
        # find_element_* raises NoSuchElementException rather than
        # returning None, so catch the exception to retry.
        next_btn = self.driver.find_element_by_css_selector(".next>a")
        break
    except NoSuchElementException:
        time.sleep(0.5)
        max_attempts -= 1
        if max_attempts == 0:
            self.fail("Cannot find element.")
And after this code, perform the click action.
PS: also try adding a plain time.sleep(x) after finding the element and then performing the click action.
Try the code below.
When the element is no longer attached to the DOM and a StaleElementReferenceException is raised, search for the element again to get a fresh reference.
Please note I checked this with Chrome:
try:
    driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a').click()
except StaleElementReferenceException:
    driver.find_element_by_css_selector('div[id="datatable-disclosures_wrapper"] li[class="next"]>a').click()
except NoSuchElementException:
    break
Stale-element errors can be handled by catching StaleElementReferenceException so the for loop continues to execute when you look up the element with any find_element method inside it.
from selenium.common import exceptions
and customize the body of your for loop as:
for item in items:  # hypothetical loop; substitute your own iterable
    try:
        driver.find_elements_by_id("data")  # method to find the element
        # your code
    except exceptions.StaleElementReferenceException:
        pass
When a StaleElementReferenceException is raised, it means something changed on the site but not in the element list you hold. So the trick is to refresh that list every time, inside the loop, like this:
while True:
    driver.implicitly_wait(4)
    for d1 in driver.find_element_by_id('datatable-disclosures').find_element_by_tag_name('tbody').find_elements_by_tag_name('tr'):
        tablewotags1.append(d1.text)
    try:
        driver.switch_to.default_content()
        driver.find_element_by_xpath('//*[@id="datatable-disclosures_wrapper"]/div[2]/div[2]/div/ul/li[7]/a').click()
    except NoSuchElementException:
        print('Don\'t be so cryptic about error messages, they are good\n'
              '...Script broke clicking next')  # jk aside, put some info there
        break
Hope this helps you, cheers.
Edit:
So I went to the said website; the layout is pretty straightforward, but the structure repeats itself about four times, so when you crawl the site like that something is bound to change.
I've edited the code to only scrape one tbody tree. This tree comes from the first datatable-disclosures element. I also added some waits.
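To answer the original question of where to call WebDriverWait: one common pattern (a sketch, not verified against the live site; selectors follow the question's code) is to wait for the table body before scraping each page, and to wait for the old reference to go stale after clicking Next:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException

wait = WebDriverWait(driver, 10)
tablewotags1 = []
while True:
    # Wait until the table body is present before reading its rows.
    tbody = wait.until(EC.presence_of_element_located(
        (By.CSS_SELECTOR, '#datatable-disclosures tbody')))
    for row in tbody.find_elements_by_tag_name('tr'):
        tablewotags1.append(row.text)
    try:
        driver.find_element_by_link_text('Next →').click()
    except NoSuchElementException:
        break
    # The old tbody reference goes stale once the next page renders;
    # waiting on that avoids reading the same page twice.
    wait.until(EC.staleness_of(tbody))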