Trying to make a small script that helps me fill in a table.
I can't seem to select from a dropdown list using Selenium. After running the code multiple times, it seems to randomly fail on some rows, but it never breaks down at the same spot twice.
For some reason, it works fine on the first two dropdown boxes, but the last two (deductible and company) won't work consistently.
Here's what I have so far:
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import Select
from random import randint

driver = webdriver.Chrome()
driver.get("https://www.ehail.ca/quotes/?1494142398325")
for x in range(5):
    driver.find_element_by_name("button").click()
acres = 100
croptype = "Wheat"
qrt = "NW"
sec = randint(1, 16)
twn = randint(1, 30)
rng = randint(1, 30)
mer = "W3"
ded = "Full"
comp = randint(1, 7)
cov = 100
for w in range(1, 8):
    w = str(w)
    element = driver.find_element_by_name("acres" + w)
    element.send_keys(acres)
    select = Select(driver.find_element_by_id('cropComboboxId' + w))
    select.select_by_visible_text(croptype)
    select = Select(driver.find_element_by_id("quarterComboboxId" + w))
    select.select_by_visible_text(qrt)
    element = driver.find_element_by_name("section" + w)
    element.send_keys(sec)
    element = driver.find_element_by_name("township" + w)
    element.send_keys(twn)
    element = driver.find_element_by_name("range" + w)
    element.send_keys(rng)
    select = Select(driver.find_element_by_name("meridian" + w))
    select.select_by_visible_text(mer)
    # THIS IS WHERE THE TROUBLE STARTS!
    select = Select(driver.find_element_by_name("deductible" + w))
    select.select_by_index(5)
    select = Select(driver.find_element_by_name('company' + w))
    for index in range(len(select.options)):
        select = Select(driver.find_element_by_name('company' + w))
        select.select_by_index(1)
    element = driver.find_element_by_name("coverageperacre" + w)
    element.send_keys(cov)
element = driver.find_element_by_name("quoteForm").submit()
I have tried selecting by index, name, id, and text, pretty much everything, but I can't even find a consistent breakdown point. In fact, the odd time it will run without an error. The error I usually get, though, is "can't locate element with index/name/id 'whatever'".
Any help would be greatly appreciated.
Cheers
The last two dropdowns' options are populated only after data is filled in the previous fields. You can wait until the options exist before choosing from them:
options_size = 0
while options_size == 0:
    select = Select(driver.find_element_by_name("meridian" + w))
    options_size = len(select.options)
select.select_by_visible_text(mer)
The select element is refreshed when populated, so it should be relocated to prevent a StaleElementReferenceException.
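The same polling can also be written as an explicit wait. Here is a minimal sketch for the deductible dropdown, assuming the same "deductible" + w naming as in the question; WebDriverWait retries the callable (ignoring NoSuchElementException by default) until it returns something truthy:

from selenium.webdriver.support.ui import Select, WebDriverWait

def deductible_has_options(driver):
    # relocate the element on every poll to avoid staleness
    select = Select(driver.find_element_by_name("deductible" + w))
    return select if len(select.options) > 0 else False

select = WebDriverWait(driver, 10).until(deductible_has_options)
select.select_by_index(5)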
Related
I'm a beginner and have a lot to learn, so please be patient with me.
Using Python and Selenium, I'm trying to scrape table data from a website while navigating through different pages. As I navigate through the pages, the table shows the updated data, but the page doesn't refresh and the URL remains the same.
To get the refreshed data from the table and avoid a stale element exception, I used WebDriverWait and expected_conditions (on the tr elements). Even with the wait, my code didn't get the refreshed data; it was getting the old data from the previous page and raising the exception. So I added time.sleep() after clicking the next page button, which solved the problem.
However, I noticed my code was getting slower as I navigated through more and more pages, and at around page 120 it gave me the stale element exception and was not able to get the refreshed data. I'm assuming it is because I'm using a for loop within a while loop, which slows down the performance.
I tried an implicit wait and increased time.sleep() gradually to avoid the staleness exception, but nothing worked. There are 100 table rows on each page and around 3,100 pages in total.
The problems are these:
Why do I get the stale element exception, and how can I avoid it?
How can I increase the efficiency of the code?
I searched a lot and really tried to fix it on my own before I decided to write here.
I'm stuck here and don't know what to do. Please help, and thank you so much for your time.
while True:
    # waits until the table elements are visible when the page is loaded;
    # this is a must for Selenium to scrape data from the dynamic table
    # when we navigate through different pages
    tr = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='erdashboard']/tbody/tr")))
    for record in tr:
        count += 1
        posted_date = datetime.strptime(record.find_element(By.XPATH, './td[7]').text, "%m/%d/%Y").date()
        exclusion_request_dict["ID"].append(int(record.find_element(By.XPATH, './td[1]').text))
        exclusion_request_dict["Company"].append(record.find_element(By.XPATH, './td[2]').text)
        exclusion_request_dict["Product"].append(record.find_element(By.XPATH, './td[3]').text)
        exclusion_request_dict["HTSUSCode"].append(record.find_element(By.XPATH, './td[4]').text)
        exclusion_request_dict["Status"].append(record.find_element(By.XPATH, './td[5]').text)
        exclusion_request_dict["Posted Date"].append(posted_date)
    next_button = driver.find_element(By.ID, "erdashboard_next")
    next_button_clickable = driver.find_element(By.ID, "erdashboard_next").get_attribute("class").split(" ")
    print(next_button_clickable)
    print("Current Page:", page, "Total Counts:", count)
    if next_button_clickable[-1] == "disabled":
        break
    next_button.click()  # goes to the next page
    time.sleep(wait + 0.01)
When you click the next page button, you can avoid the stale element exception by, for example, checking when the ID in the first row has changed. This is done in the # wait until new page is loaded section of the code (see the full code below).
When scraping data from a table, you can increase the efficiency of the code with two tricks. First, loop over columns rather than over rows, because there are (almost always) more rows than columns. Second, use JavaScript instead of the Selenium command .text, because JavaScript is way faster than .text. For example, to scrape the values in the first column, the Selenium command is
[td.text for td in driver.find_elements(By.XPATH, '//tbody/tr/td[1]')]
and it takes about 1.2 seconds on my computer, while the corresponding JavaScript command (see the code inside for idx in range(1,8) below) takes only about 0.008 seconds (150 times faster!). The first trick is only slightly noticeable when using .text, but with JavaScript it is really effective: for example, scraping the whole table by rows with JavaScript takes about 0.52 seconds, while by columns it takes about 0.05 seconds.
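For illustration, a standalone version of that JavaScript column scrape could look like the sketch below, assuming the same tbody>tr>td markup as the target table:

# Sketch: pull every cell of the first column in a single JavaScript call.
# Array.from maps each <td> node to its innerText and returns a plain list.
first_column = driver.execute_script(
    "return Array.from("
    "document.querySelectorAll('tbody>tr>td:nth-child(1)'),"
    "td => td.innerText);"
)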
Here is the full code:
import math, time, pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import StaleElementReferenceException

chromedriver_path = '...'
driver = webdriver.Chrome(service=Service(chromedriver_path))
wait = WebDriverWait(driver, 9)

driver.get('https://232app.azurewebsites.net/')
dropdown = wait.until(EC.element_to_be_clickable((By.NAME, 'erdashboard_length')))
target_number_of_rows = 100
Select(dropdown).select_by_value(str(target_number_of_rows))

# wait until 100 rows are loaded
current_number_of_rows = 0
while current_number_of_rows != target_number_of_rows:
    current_number_of_rows = len(driver.find_elements(By.CSS_SELECTOR, 'tbody tr'))

header = [th.text for th in driver.find_elements(By.XPATH, '//tr/th[position()<last()]')]
data = {key: [] for key in header}
number_of_pages = int(driver.find_element(By.CSS_SELECTOR, '.paginate_button:last-child').text)
times = []

while 1:
    start = time.time()
    if len(times) > 0:
        current_page = int(driver.find_element(By.CLASS_NAME, "current").text)
        mean = sum(times) / len(times)
        eta = (number_of_pages - current_page) * mean
        minutes = math.floor(eta / 60)
        seconds = round((eta / 60 - minutes) * 60)
        print(f'current page {current_page} (ETA {minutes}:{seconds}) (mean per page {mean:.2f}s) ({len(data[header[0]])} rows scraped)', end='\r')

    for idx in range(1, 8):
        data[header[idx - 1]] += driver.execute_script(
            "var result = [];" +
            f"var all = document.querySelectorAll('tbody>tr>td:nth-child({idx})');" +
            "for (var i=0, max=all.length; i < max; i++) {" +
            "    result.push(all[i].innerText);" +
            "} " +
            "return result;")

    # check if all lists in the dictionary have the same length; if not, there
    # is a problem (a column is missing or was not scraped properly)
    lens = [len(data[h]) for h in header]
    if len(set(lens)) != 1:
        print('\nerror: lists in the dictionary have different lengths')
        print(lens)
        break

    # click next page button if available
    next_btn = driver.find_element(By.ID, 'erdashboard_next')
    if 'disabled' not in next_btn.get_attribute('class'):
        next_btn.click()
    else:
        print('\nno more pages to load')
        break

    # wait until new page is loaded
    first_row_id_old = WebDriverWait(driver, 9).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'tbody>tr>td'))).text
    first_row_id_new = first_row_id_old
    while first_row_id_new == first_row_id_old:
        try:
            first_row_id_new = WebDriverWait(driver, 9).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'tbody>tr>td'))).text
        except StaleElementReferenceException:
            continue

    times += [time.time() - start]
While the loop is running, you get an output like this ("ETA" is the estimated remaining time in the format minutes:seconds; "mean per page" is the mean time it takes to execute each loop iteration):
current page 156 (ETA 73:58) (mean per page 1.52s) (15500 rows scraped)
Then, by running pandas.DataFrame(data), you get the scraped table as a DataFrame.
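As a usage note, the final dictionary can be turned into a DataFrame and written out; a minimal sketch (the CSV filename is just an example):

df = pandas.DataFrame(data)
print(df.head())  # quick sanity check of the first scraped rows
df.to_csv('exclusion_requests.csv', index=False)  # example output file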
I am trying to get the values I added to a list with Selenium and print them out, but I am only getting this: <generator object at 0x000001B924EC7990>. How can I print the values in the list?
I also tried to shorten the XPath with "//tr[@class='text3'][11]/td", but it didn't work.
As you can see, I tried to loop through the list and convert it to text, but that didn't work either.
Would this work: range(driver.find_elements(By.XPATH, "//table[2]/tbody/tr/td[2]/table[1]/tbody/tr/td[3]/table/tbody/tr[2]/td[position() >= last()]"))?
Can you guys help me out?
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)
website = "https://langerball.de/"
driver.get(website)
for i in range(7):
    xpath_test = "//table[2]/tbody/tr/td[2]/table[1]/tbody/tr/td[3]/table/tbody/tr[2]/td[position() >= last()]"
    a = driver.find_elements(By.XPATH, xpath_test)
    test_li = []
    test_li.append(a)
    print(b.text for b in test_li)
The driver.find_elements method returns a list of web elements, while you are looking for their text values. A web element's text value can be retrieved by applying .text to the web element.
So, you should iterate over the received list of web elements and extract the text from each web element in the list.
Also, test_li = [] should be defined outside of the loop.
So your code could be something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)
website = "https://langerball.de/"
driver.get(website)
test_li = []
for i in range(7):
    xpath_test = "//table[2]/tbody/tr/td[2]/table[1]/tbody/tr/td[3]/table/tbody/tr[2]/td[position() >= last()]"
    a_list = driver.find_elements(By.XPATH, xpath_test)
    for a in a_list:
        test_li.append(a.text)
print(test_li)
P.S.
I'm not sure about the rest of your code: the for i in range(7) loop and the xpath_test XPath expression
This worked:
test_li = []
xpath_test = "//table[2]/tbody/tr/td[2]/table[1]/tbody/tr/td[3]/table/tbody/tr[2]/td[position() <= last()]"
a_list = driver.find_elements(By.XPATH, xpath_test)
for a in a_list:
    test_li.append(a.text)
print(test_li)
I am using the below code to get data from http://www.bddk.org.tr/BultenHaftalik. Two table elements have the same class name. How can I get just one of them?
from selenium import webdriver
import time

driver_path = "C:\\Users\\Bacanli\\Desktop\\chromedriver.exe"
browser = webdriver.Chrome(driver_path)
browser.get("http://www.bddk.org.tr/BultenHaftalik")
time.sleep(3)
Krediler = browser.find_element_by_xpath("//*[@id='tabloListesiItem-253']/span")
Krediler.click()
elements = browser.find_elements_by_css_selector("td.ortala")
for element in elements:
    print(element.text)
browser.close()
If you want to select all rows for one column only that match a specific CSS selection, then you can use the :nth-child() selector.
Simply, the code will be like this:
elements = browser.find_elements_by_css_selector("td.ortala:nth-child(2)")
In this way, you will get the "Krediler" column rows only. You can also select the first child if you want, by applying the same idea.
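For example, applying the same idea to the first column might look like this sketch (assuming the first column's cells also carry the ortala class; otherwise a plain td:nth-child(1) selector would be needed):

# Sketch: the first column instead of the second one
first_col = browser.find_elements_by_css_selector("td.ortala:nth-child(1)")
for cell in first_col:
    print(cell.text)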
I guess what you want to do is extract the text and not the numbers; try this:
elements = []
for i in range(1, 21):
    css_selector = f'#Tablo > tbody:nth-child(2) > tr:nth-child({i}) > td:nth-child(2)'
    element = browser.find_element_by_css_selector(css_selector)
    elements.append(element)
for element in elements:
    print(element.text)
browser.close()
I want to select the first 100 elements in a list, so I used the action chains method.
Please suggest another method, if there is one. With the code I used below, I can select elements in the list, but I am not able to click any element:
for r in range(1, 100):
    r = str(r)
    print r
    row = GlobalVar.Driver.find_element_by_xpath("/html/body/table/tbody/tr[2]/td/table/tbody/tr[2]/td/div/div[3]/table/tbody/tr[2]/td/table/tbody/tr/td[1]/table/tbody/tr[2]/td/select/option[" + r + "]")
    action_chains = ActionChains(GlobalVar.Driver)
    action_chains.key_down(Keys.CONTROL).key_down(Keys.SHIFT).click(row).key_up(Keys.SHIFT).key_up(Keys.CONTROL).perform()
It is a select tag, so you can go with the Select class in Python:
from selenium.webdriver.support.ui import Select

ele = GlobalVar.Driver.find_element_by_xpath("/html/body/table/tbody/tr[2]/td/table/tbody/tr[2]/td/div/div[3]/table/tbody/tr[2]/td/table/tbody/tr/td[1]/table/tbody/tr[2]/td/select")
select = Select(ele)
for index in range(1, 100):
    select.select_by_index(index)
It is not advisable to use an absolute XPath; please use a relative path if possible, or other locator types.
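For instance, a relative locator could look like the sketch below; the name attribute is hypothetical, since the actual markup is not shown in the question:

# Sketch: locate the <select> by a (hypothetical) name attribute
# instead of a brittle absolute XPath
ele = GlobalVar.Driver.find_element_by_css_selector("select[name='itemList']")
select = Select(ele)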
On a typical eBay search query where more than 50 listings are returned, such as this one, eBay displays the results in a grid format (whether you have it set to display as a grid or a list).
I'm using the class name to pull out the prices using WebDriver:
prices = webdriver.find_elements_by_class_name("bidsold")
The challenge: although all prices on the page look identical in structure, the ones that are crossed out (where Buy It Now is not available and a Best Offer was accepted) are actually contained within a child span of the above span.
I could pull these out separately by repeating find_elements_by_class_name with the class sboffer, but (i) I would lose track of the order, and more importantly (ii) it would roughly double the time it takes to extract the prices.
The CSS selector for both types of prices also differ, as do the XPaths.
How do we catch all prices in one go?
Try this:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://www.ebay.com/sch/i.html?rt=nc&LH_Complete=1&_nkw=Columbia+Hiking+Pants&LH_Sold=1&_sacat=0&LH_BIN=1&_from=R40&_sop=3&LH_ItemCondition=1000&_pgn=2')
prices_list = driver.find_elements_by_css_selector('span.amt')
prices_on_page = []
for span in prices_list:
    unsold_item = span.find_elements_by_css_selector('span.bidsold.bold')
    sold_item = span.find_elements_by_css_selector('span.sboffer')
    if len(sold_item):
        prices_on_page.append(sold_item[0].text)
    elif len(unsold_item):
        prices_on_page.append(unsold_item[0].text)
    elif span.text:
        prices_on_page.append(span.text)
print prices_on_page
driver.quit()
In this case, you will keep track of the order, and you will only query the specific span elements instead of the entire page. This should improve performance.
I would go for XPath; the code below worked for me. It grabbed 50 prices!
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://www.ebay.com/sch/i.html?rt=nc&LH_Complete=1&_nkw=Columbia+Hiking+Pants&LH_Sold=1&_sacat=0&LH_BIN=1&_from=R40&_sop=3&LH_ItemCondition=1000&_pgn=2')
my_prices = []
itms = driver.find_elements_by_xpath("//div[@class='bin']")
for i in itms:
    prices = i.find_elements_by_xpath(".//span[contains(text(),'$')]")
    val = ','.join(i.text for i in prices)
    my_prices.append([val])
print my_prices
driver.quit()
The result is:
[[u'$64.95'], [u'$59.99'], [u'$49.95'], [u'$46.89,$69.99'], [u'$44.98'], [u'$42.95'], [u'$39.99'], [u'$39.99'], [u'$37.95'], [u'$36.68'], [u'$35.96,$44.95'], [u'$34.99'], [u'$34.99'], [u'$34.95'], [u'$30.98'], [u'$29.99'], [u'$29.99'], [u'$29.65,$32.95'], [u'$29.00'], [u'$27.96,$34.95'], [u'$27.50'], [u'$27.50'], [u'$26.99,$29.99'], [u'$26.95'], [u'$26.55,$29.50'], [u'$24.99'], [u'$24.99'], [u'$24.99'], [u'$24.99'], [u'$24.98'], [u'$24.98'], [u'$24.98'], [u'$24.98'], [u'$24.98'], [u'$22.00'], [u'$22.00'], [u'$22.00'], [u'$22.00'], [u'$18.00'], [u'$18.00'], [u'$17.95'], [u'$11.99'], [u'$9.99'], [u'$6.00']]