Selenium python : my current_url doesnt update after click - python

Im scraping a website where I need to retrieve values from the url when i click on a button providing different form values.
I have a problem: when i click the button and retrieve the current_url, the provided values in the forms doesnt reflect in the url which should be updated (it's a search button). There is no new tab created.
My code to retrieve the url value is :
driver = webdriver.Firefox()
driver.get(url)
arrlist = []
idlist = []
service=value
for i in key_list:
form = driver.find_elements(by=By.XPATH, value='//input[#id="geo_nav"]')
form[0].send_keys(i)
form2=driver.find_elements(by=By.XPATH, value='//input[#id="sev_nav"]')
form2[0].send_keys(service)
button=driver.find_elements(by=By.XPATH, value='//button[#data-role="filter-apply"]')
button[0].click()
time.sleep(5)
url=driver.current_url
print(dept)
print(i)
id=re.findall(r"(?<=\[population\]=)(\d{9})",url)[0]
arrlist.append(i)
idlist.append(id)
the button html code is :
<button class="filter-apply cta-navigate relative hide-mobile flex withNumber" data-role="filter-apply">
<p class="hide-mobile m-r-4">Appliquer</p>
<div class="svg relative">
<span class="filters-apply-length">2</span>
<svg height="18" viewBox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg"><path d="m10.877 17.457 2.026 1.533v-4.553c0-.166.042-.329.12-.475l4.3-7.962h-10.68l4.122 7.978c.074.142.112.3.112.459zm3.026 4.543c-.213 0-.426-.068-.603-.203l-4.026-3.045c-.25-.189-.397-.484-.397-.797v-3.274l-4.765-9.222c-.161-.31-.148-.681.034-.979.181-.298.505-.48.854-.48h14c.352 0 .678.185.859.488.18.302.188.677.021.987l-4.977 9.215v6.31c0 .379-.214.726-.554.895-.141.07-.294.105-.446.105z" fill="#0579c7" fill-rule="evenodd" transform="translate(-4 -4)"></path></svg> </div>
</button>
I've tried to use
driver.switch_to.window(driver.window_handles[-1]);
following this post : Python Selenium Chromedriver - Can't Get current_url of new opened tab after click()
But I dont have tab or new windows issues.
I tried to click autocompletion lists in the 2 forms in inputand one of the form produces a modification of the url but not the other (the one of which effects on the url i need to monitor).
The form code that works is :
<form data-component="sev_nav_input" data-no-results="Sans résultats" data-default-pho="Services" data-selected-name="Achat compulsif" data-selected-id="5928" class="filter-input filter-services relative">
<input type="text" placeholder="Services" autocomplete="off" name="sev_nav" id="sev_nav" data-role="js_filter" data-id="5928" class="autocomplete-with-result">
<span id="clear-sev-input" class="clear-sev-input" style="display: none;">
<img src="data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjI0IiB2aWV3Qm94PSIwIDAgMjQgMjQiIHdpZHRoPSIyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Im0wIDBoMjR2MjRoLTI0eiIvPjxwYXRoIGQ9Im0xMS43NSAyMC41YzIuNDIyIDAgNC40ODYtLjg1MyA2LjE5MS0yLjU1OSAxLjcwNi0xLjcwNSAyLjU1OS0zLjc3IDIuNTU5LTYuMTkxIDAtMi40MjItLjg1My00LjQ4Ni0yLjU1OS02LjE5MS0xLjcwNS0xLjcwNi0zLjc3LTIuNTU5LTYuMTkxLTIuNTU5LTIuNDIyIDAtNC40OTIuODYtNi4yMSAyLjU3OC0xLjY5NSAxLjY5My0yLjU0IDMuNzUtMi41NCA2LjE3MnMuODUzIDQuNDg2IDIuNTU5IDYuMTkxYzEuNzA1IDEuNzA2IDMuNzcgMi41NTkgNi4xOTEgMi41NTl6bTMuMTY0LTQuNDE0Yy0uMDUyIDAtLjExNy0uMDQtLjE5NS0uMTE3bC0yLjk2OS0yLjkzLTIuOTMgMi45NjljLS4wNTIuMDUyLS4xMy4wNzgtLjIzNC4wNzhzLS4xODItLjAyNi0uMjM0LS4wNzhsLS44Mi0uODZjLS4wNTMtLjA1Mi0uMDc5LS4xMy0uMDc5LS4yMzRzLjAyNi0uMTcuMDc4LS4xOTVsMi45NjktMi45NjktMi45NjktMi45M2MtLjE1Ni0uMTU2LS4xNTYtLjMxMiAwLS40NjhsLjgyLS44MmMuMDc5LS4wNzkuMTU3LS4xMTguMjM1LS4xMTguMDUyIDAgLjExNy4wNC4xOTUuMTE3bDIuOTY5IDIuODkgMi45NjktMi44OWMuMDc4LS4wNzguMTQzLS4xMTcuMTk1LS4xMTcuMDc4IDAgLjE1Ni4wNC4yMzQuMTE3bC44Ni44MmMuMTU2LjE1Ny4xNTYuMzEzIDAgLjQ3bC0yLjk2OSAyLjkyOSAyLjkzIDIuOTNjLjA3OC4wNzguMTE3LjE1Ni4xMTcuMjM0IDAgLjEwNC0uMDQuMTgyLS4xMTcuMjM0bC0uODIuODJjLS4wNzkuMDc5LS4xNTcuMTE4LS4yMzUuMTE4eiIgZmlsbD0iIzE0OWM5NyIvPjwvZz48L3N2Zz4K"></span>
<span class="gradient"></span>
<span class="gradient" style="display: none;"></span>
<div class="spinner" style="display: none;"></div>
<div id="services-list" class="services-list" style="display: none;"><ul data-role="autocomplete-list" class="autocomplete-list"> </ul></div></form>
The form code that doesnt work is :
<form data-component="geo_nav_input" data-selected-name="" data-selected-id="" data-selected-neighborhood-id="0" data-selected-type="" data-no-results="Sans résultats" data-pho="Localité" data-default-pho="Localité" class="filter-input relative">
<div class="hide">Chercher des professionnels en/à...</div>
<span class="icon-x toggle_geo_nav hide"></span>
<label for="geo_nav" class="hidden-label">Localité</label>
<input type="text" placeholder="Localité" autocomplete="off" name="geo_nav" id="geo_nav" data-role="js_filter" data-id="" data-neighborhoodid="0" data-type="" class="autocomplete-with-result"> <span id="clear-geo-input" class="clear-geo-input" style="display: none;">
<img src="data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjI0IiB2aWV3Qm94PSIwIDAgMjQgMjQiIHdpZHRoPSIyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Im0wIDBoMjR2MjRoLTI0eiIvPjxwYXRoIGQ9Im0xMS43NSAyMC41YzIuNDIyIDAgNC40ODYtLjg1MyA2LjE5MS0yLjU1OSAxLjcwNi0xLjcwNSAyLjU1OS0zLjc3IDIuNTU5LTYuMTkxIDAtMi40MjItLjg1My00LjQ4Ni0yLjU1OS02LjE5MS0xLjcwNS0xLjcwNi0zLjc3LTIuNTU5LTYuMTkxLTIuNTU5LTIuNDIyIDAtNC40OTIuODYtNi4yMSAyLjU3OC0xLjY5NSAxLjY5My0yLjU0IDMuNzUtMi41NCA2LjE3MnMuODUzIDQuNDg2IDIuNTU5IDYuMTkxYzEuNzA1IDEuNzA2IDMuNzcgMi41NTkgNi4xOTEgMi41NTl6bTMuMTY0LTQuNDE0Yy0uMDUyIDAtLjExNy0uMDQtLjE5NS0uMTE3bC0yLjk2OS0yLjkzLTIuOTMgMi45NjljLS4wNTIuMDUyLS4xMy4wNzgtLjIzNC4wNzhzLS4xODItLjAyNi0uMjM0LS4wNzhsLS44Mi0uODZjLS4wNTMtLjA1Mi0uMDc5LS4xMy0uMDc5LS4yMzRzLjAyNi0uMTcuMDc4LS4xOTVsMi45NjktMi45NjktMi45NjktMi45M2MtLjE1Ni0uMTU2LS4xNTYtLjMxMiAwLS40NjhsLjgyLS44MmMuMDc5LS4wNzkuMTU3LS4xMTguMjM1LS4xMTguMDUyIDAgLjExNy4wNC4xOTUuMTE3bDIuOTY5IDIuODkgMi45NjktMi44OWMuMDc4LS4wNzguMTQzLS4xMTcuMTk1LS4xMTcuMDc4IDAgLjE1Ni4wNC4yMzQuMTE3bC44Ni44MmMuMTU2LjE1Ny4xNTYuMzEzIDAgLjQ3bC0yLjk2OSAyLjkyOSAyLjkzIDIuOTNjLjA3OC4wNzguMTE3LjE1Ni4xMTcuMjM0IDAgLjEwNC0uMDQuMTgyLS4xMTcuMjM0bC0uODIuODJjLS4wNzkuMDc5LS4xNTcuMTE4LS4yMzUuMTE4eiIgZmlsbD0iIzE0OWM5NyIvPjwvZz48L3N2Zz4K"></span>
<span class="gradient"></span>
<span class="gradient" style="display: none;"></span>
<div class="spinner" style="display: none;"></div>
<div id="location-list" class="location-list" style="display: none;"><ul data-role="autocomplete-list" class="autocomplete-list"> </ul></div>
</form>

Can you make a function to navigate pages, and on each page do the actions you require. And with each call of the function use driver.switch_to.window to ensure you are on the latest page.
Although based on your edits, it now seems the issue is that you are having trouble locating and following one of the links on the pages.
def navigate(n):
""" Move through the pages. Select the relevant buttons on each page"""
window_after = driver.window_handles[0]
driver.switch_to.window(window_after)
if n == 0:
form = driver.find_elements(by=By.XPATH, value='//input[#id="geo_nav"]')
button = driver.find_elements(by=By.XPATH, value='//button[#data-role="filter-apply"]').click()
elif n == 1:
pass
# Do something
else:
pass
# Do something else
for i in range(3):
navigate(i)
time.sleep(3)

The solution was in fact linked to the autocompletion forms. They require you to click on the autocompletion suggestions so the button is actually working.
FYI, here is the full code to autocomplete with clicking the form, deleting the content, adding the content, clicking the list and clicking the button.
def get_city_locations(service):
url='url'
#options = Options()
#options.headless = True
driver = webdriver.Firefox()#options=options)
driver.get(url)
time.sleep(2)
buttoncookie = driver.find_elements(by=By.XPATH, value='//button[#class="cf2Lf6"]')
buttoncookie[0].click()
time.sleep(1)
form2 = driver.find_elements(by=By.XPATH, value='//input[#id="sev_nav"]')
form2[0].click()
time.sleep(1)
Static.clear_text(driver)
form2[0].send_keys(service)
time.sleep(1)
autocompleteservice = driver.find_elements(by=By.XPATH, value='//li[not(#class)]')
for f in autocompleteservice:
if f.text == service:
f.click()
df_pref=pd.read_csv('arrondissement_2022.csv',sep=',')
deptlist = []
arrlist = []
idlist = []
for i in df_pref['LIBELLE']:
df_dep=df_pref[df_pref['LIBELLE']==i]
dept = df_dep.loc[df_dep.index.values[0], 'DEP']
form = driver.find_elements(by=By.XPATH, value='//input[#id="geo_nav"]')
form[0].click()
time.sleep(1)
Static.clear_text(driver)
form[0].send_keys(i)
time.sleep(3)
autocompletelocation=driver.find_elements(by=By.XPATH, value='//li[not(#class)]')
cond=0
for a in autocompletelocation:
if a.text==i:
print ('condition ok')
cond=1
a.click()
break
time.sleep(3)
button=driver.find_elements(by=By.XPATH, value='//button[#data-role="filter-apply"]')
button[0].click()
time.sleep(3)
driver.switch_to.window(driver.window_handles[-1]);
url=driver.current_url
print(dept)
print(i)
print(url)
if cond==0:
id=0
else:
id=re.findall(r"(?<=\[population\]=)(\d{7,10})",url)[0]
print(f'id = {id}')
print('\n')
deptlist.append(dept)
arrlist.append(i)
idlist.append(id)
df0 = pd.DataFrame({"dept": deptlist, "arrondissement":arrlist,"id":idlist})
df0.to_csv('arr_id.csv',sep=';',index=False)

Related

Unable to click button Shopify/Selenium

Unable to click the "Continue to payment button" on shopify site. I have seen several similar post but most of them are for js and do not mention the spinner part of the error.
driver.find_element_by_xpath ('//*[#id="continue_button"]/svg')
<div class="content-box__row">
<div class="radio-wrapper" data-shipping-method="shopify-Standard%20Shipping-15.00">
<div class="radio__input">
<input class="input-radio" data-checkout-total-shipping="$15.00" data-checkout-total-shipping-cents="1500" data-checkout-shipping-rate="$15.00" data-checkout-original-shipping-rate="$15.00" data-checkout-total-price="$94.00" data-checkout-total-price-cents="9400" data-checkout-payment-due="$94.00" data-checkout-payment-due-cents="9400" data-checkout-payment-subform="required" data-checkout-subtotal-price="$79.00" data-checkout-subtotal-price-cents="7900" data-checkout-total-taxes="$0.00" data-checkout-total-taxes-cents="0" data-checkout-multiple-shipping-rates-group="false" data-backup="shopify-Standard%20Shipping-15.00" type="radio" value="shopify-Standard%20Shipping-15.00" name="checkout[shipping_rate][id]" id="checkout_shipping_rate_id_shopify-standard20shipping-15_00" />
</div>
<label class="radio__label" for="checkout_shipping_rate_id_shopify-standard20shipping-15_00">
<span class="radio__label__primary" data-shipping-method-label-title="Standard Shipping">
Standard Shipping
</span>
<span class="radio__label__accessory">
<span class="content-box__emphasis">
$15.00
</span>
</span>
</label> </div> <!-- /radio-wrapper-->
</div>
</div>
</div>
</div>
</div>
<div class="step__footer" data-step-footer>
<button name="button" type="submit" id="continue_button" class="step__footer__continue-btn btn" aria-busy="false"><span class="btn__content" data-continue-button-content="true">Continue to payment</span><svg class="icon-svg icon-svg--size-18 btn__spinner icon-svg--spinner-button" aria-hidden="true" focusable="false"> <use xlink:href="#spinner-button" /> </svg></button>
<a class="step__footer__previous-link" href="/18292275/checkouts/38df275516a513f1c08f6c470ef014d0?step=contact_information"><svg focusable="false" aria-hidden="true" class="icon-svg icon-svg--color-accent icon-svg--size-10 previous-link__icon" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"><path d="M8 1L7 0 3 4 2 5l1 1 4 4 1-1-4-4"/></svg><span class="step__footer__previous-link-content">Return to information</span></a>
</div>
Try this xpath :
//span[text()='Continue to payment']/..
In code :
Without explicit waits :
Code :
driver.find_element_by_xpath("//span[text()='Continue to payment']/..").click()
With Explicit waits :
Code :
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Continue to payment']/.."))).click()
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Update 1 :
from selenium.webdriver.common.action_chains import ActionChains
ActionChains(driver).move_to_element(driver.find_element_by_xpath("//span[text()='Continue to payment']/..")).click().perform()
Update 2 :
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get('https://seelbachs.com/products/sagamore-spirit-cask-strength-rye-whiskey')
wait = WebDriverWait(driver, 50)
frame_xpath = '/html/body/div[5]/div/div/div/div/iframe'
wait = WebDriverWait(driver, 10)
# wait until iframe appears and select iframe
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, frame_xpath)))
# select button
xpath = '//*[#id="enter"]'
time.sleep(2)
element= driver.find_element_by_xpath(xpath)
ActionChains(driver).move_to_element(element).click(element).perform()
# go back to main page
driver.get('https://seelbachs.com/products/sagamore-spirit-cask-strength-rye-whiskey')
# add to cart
atc= driver.find_element_by_xpath('//button[#class="btn product-form__cart-submit product-form__cart-submit--small"]')
atc.click()
# check out
co= driver.find_element_by_xpath ('//*[#id="shopify-section-cart-template"]/div/form/footer/div/div[2]/input[2]')
co.click()
# enter email
driver.find_element_by_xpath('//*[#id="checkout_email"]').send_keys('no#yahoo.com')
time.sleep(1)
# enter first name
driver.find_element_by_xpath('//*[#id="checkout_shipping_address_first_name"]').send_keys('John')
time.sleep(1)
# enter last name
driver.find_element_by_xpath('//*[#id="checkout_shipping_address_last_name"]').send_keys('Smith')
time.sleep(1)
# enter address
driver.find_element_by_xpath ('//*[#id="checkout_shipping_address_address1"]').send_keys('111 South Street')
# enter city
driver.find_element_by_xpath ('//*[#id="checkout_shipping_address_city"]').send_keys('Cocoa')
# enter zip
driver.find_element_by_xpath ('//*[#id="checkout_shipping_address_zip"]').send_keys('263153')
# enter phone
driver.find_element_by_xpath ('//*[#id="checkout_shipping_address_phone"]').send_keys('5555555'+ u'\ue007')
select = Select(wait.until(EC.visibility_of_element_located((By.ID, "checkout_shipping_address_province"))))
select.select_by_value('UK')
wait.until(EC.element_to_be_clickable((By.ID, "continue_button"))).click()
ctp = driver.find_element_by_id('continue_button')
ctp.click()
Solved this issue. I am now able to click the "Continue to Payment" button.

Python/Selenium- Class element cannot be clicked

I am trying to scrape data from this page: https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2
Here I am trying to expand all of the "compare odds" fields, which are contained in this HTML:
<div class="table-container">
<div class="table-header-light even"><strong>Over/Under +1 </strong><span class="avg chunk-odd-payout">93.4%</span><span class="avg chunk-odd nowrp">5.63</span>
<span
class="avg chunk-odd nowrp">1.12</span><span class="odds-cnt">(3)</span><span class="odds-co"><a class="more" href="" onclick="page.togleTableContent('P-1.00-0-0',this);return false;">Compare odds</a></span></div>
</div>
<div class="table-container" style="display: none;">
<div class="table-header-light"><strong>Over/Under +1.25 </strong><span class="avg chunk-odd-payout"></span><span class="avg chunk-odd nowrp"></span><span class="avg chunk-odd nowrp"></span>
<span
class="odds-cnt">(0)</span><span class="odds-co"><a class="more" href="" onclick="page.togleTableContent('P-1.25-0-0',this);return false;">Compare odds</a></span></div>
</div>
The part I am trying to access is the following:
span class="odds-co">Compare odds
I have tried all of the following:
#odds_rows = browser.find_elements_by_class_name('more')
# odds_rows=browser.find_elements_by_css_selector(".more")
# odds_rows=WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[#class='more']")))
odds_rows= WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".more")))
#odds_rows=WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "more")))
In order to subsequently loop click through the identified fields:
for i in odds_rows:
#browser.execute_script("arguments[0].click();", i)
i.click()
However already in the step of identifying the fields I am getting a timeout error on all WebDriverWait attempts except for
odds_rows=WebDriverWait(browser, 10).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "more")))
This option yields only one result:
[<selenium.webdriver.remote.webelement.WebElement (session="7cbc57173a57aadbc115264dff8ca620", element="3654f928-bca4-4033-9566-da9e6aa6294b")>]
However this result is not clicked subsequently.
What am I doing wrong?
driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get("https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;0.50;0")
driver.maximize_window()
odds_rows = WebDriverWait(driver, 10).until(
EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.table-header-light')))
for i in odds_rows:
count = i.find_element_by_xpath('./span[#class="odds-cnt"]')
elem = i.find_elements_by_xpath('.//*[contains(text(),"Compare")]')
txt = count.text
if txt != '' and len(elem):
elem = elem[0]
driver.execute_script("arguments[0].scrollIntoView();", elem)
elem.click()
The issue is the row with count '' is not visible and you cannot click it .
if you click on 'Compare odds' you can see the URL that changes from
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;0.50;0
if you follow clicking you will se that the last part:
2;0.50;0 will increase by 0.5
next is
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;1.00;0
https://www.oddsportal.com/soccer/chile/primera-division/curico-unido-o-higgins-CtsLggl6/#over-under;2;1.50;0
and continue...
In another way you have this class: "table-main detail-odds sortable" by default is hidden, because there is the data, you DON'T need to click. you can scrape that class
i hope be helpful for you.

Unable to locate an lists of elements using selenium

I need to scrape some pages. The exact structure of the part that I want is as follows:
<div class="someclasses">
<h3>...</h3> # Not needed
<ul class="ul-class1 ul-class2">
<li id="li1-id" class="li-class1 li-class2">
<div id ="div1-id" class="div-class1 div-class2 ... div-class6">
<div class="div2-class">
<div class="div3-class">...</div> #Not needed
<div class="div4-class1 div4-class2 div4-class3">
<a href="href1" data-control-id="id1" data-control-name="name" id ="a1-id" class="a-class1 a-class2">
<h3 class="h3-class1 h3-class2 h3-class3">Text1</h3>
</a></div>
<div>...</div> # Not needed
</div>
</li>
<li id="li2-id" class="li-class1 li-class2">
<div id ="div2-id" class="div-class1 div-class2 ... div-class6">
<div class="div2-class">
<div class="div3-class">...</div> #Not needed
<div class="div4-class1 div4-class2 div4-class3">
<a href="href2" data-control-id="id2" data-control-name="name" id ="a2-id" class="a-class1 a-class2">
<h3 class="h3-class1 h3-class2 h3-class3">Text2</h3>
</a></div>
<div>...</div> # Not needed
</div>
</li>
# More <li> elements
</ul>
</div>
Now what I want is to get the Texts as well as the hrefs.I have used the naming in above example exactly realistic i.e the same names are also the same in the real webpage. The code that I am currently using is:
elems = driver.find_elements_by_xpath("//div[#class='someclasses']/ul[#class='ul-class1']/li[#class='li-class1']")
print(len(elems))
for elem in elems:
elem1 = driver.find_element_by_xpath("./a[#data-control-name='name']")
names2.append(elem1.text)
print(elem1.text)
hrefs.append(elem.get_attribute("href"))
The result of the print statement above is 0 so basically the elements are not found. Can anyone please tell me what am I doing wrong.
You are using only part of the class name... in XPATH you need the full class name...
FYI: With CSS you can use part of the class name...
If you want to use XPATH try:
elems = driver.find_elements_by_xpath("//div[#class='someclasses']//li//a")
print(len(elems))
for elem in elems:
names2.append(elem.text)
print(elem.text)
new_href = elem.get_attribute("href")
print(new_href)
hrefs.append(new_href)
For CSS use: div.someclasses ul.ul-class1
elems = driver.find_elements_by_css_selector("div.someclasses ul.ul-class1 li a")
for elem in elems:
names2.append(elem.text)
print(elem.text)
new_href = elem.get_attribute("href")
print(new_href)
hrefs.append(new_href)

Unable to fetch the relevant links and discard others

I've written a script in python in combination with selenium along with BeautifulSoup to get the links leading to property details from a webpage. As the content are heavily dynamic, I made use of selenium to get the page source. When I run my script, I get lots of links including those required links.
How can I get only the relevant link from each container out of the three?
My try:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def fetch_info(link):
driver.get(link)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#community-search-homes .propertyWrapper > a")))
soup = BeautifulSoup(driver.page_source, "lxml")
linklist = [item.get("href") for item in soup.select("#community-search-homes .propertyWrapper > a")]
return linklist
if __name__ == '__main__':
url = "https://www.khov.com/find-new-homes/arizona/buckeye"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
for newlink in fetch_info(url):
print(newlink)
driver.quit()
Results I'm having:
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/summit-at-silverstone
/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/skye
/find-new-homes/arizona/phoenix/85020/k-hovnanian-homes/pointe-16
/find-new-homes/arizona/peoria/85383/k-hovnanian-homes/fusion-ii-at-the-meadows
/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/aire
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/pinnacle-at-silverstone
/find-new-homes/arizona/peoria/85383/k-hovnanian-homes/montage-at-the-meadows
/find-new-homes/arizona/sun-city/85373/four-seasons/k.-hovnanian-s-four-seasons-at-ventana-lakes
/find-new-homes/arizona/peoria/85382/k-hovnanian-homes/park-paseo
/find-new-homes/arizona/laveen/85339/k-hovnanian-homes/affinity-at-montana-vista
/find-new-homes/arizona/laveen/85339/k-hovnanian-homes/aspire-at-montana-vista
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/pinnacle-ii-at-silverstone
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/summit-ii-at-silverstone
Results I would like to get:
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
A chunk of html elements (the link I'm after is within the second line of the following elements):
<div class="propertyWrapper clear">
<span class="link-outside"></span>
<div class="propertyCarouselWrapper">
<div class="responsiveImageCarousel enabled" style="touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
<div class="prevBtn"></div>
<div class="nextBtn"></div>
<div class="images" data-detail-url="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills">
<ul style="width: 960px; left: 0px;">
<li style="width: 320px;"><img alt="holiday exterior new homes sienna hills usp" src="https://khovcachecdn.azureedge.net/azure/sitefinitylibraries/images/default-source/images/az/aspire-at-sienna-hills/community-thumbnails/holiday-exterior-new-homes-sienna-hills-usp.jpg?sfvrsn=4&build=1019&encoder=wic&useresizingpipeline=true&w=450&h=280&mode=crop"></li>
<li style="width: 320px;"><img alt="carnival exterior new homes sienna hills usp" src="https://khovcachecdn.azureedge.net/azure/sitefinitylibraries/images/default-source/images/az/aspire-at-sienna-hills/community-thumbnails/carnival-exterior-new-homes-sienna-hills-usp.jpg?sfvrsn=4&build=1019&encoder=wic&useresizingpipeline=true&w=450&h=280&mode=crop"></li>
</ul>
</div>
<div class="pagination" style="width: 56px;"><ul><li class="active"></li><li></li></ul></div>
</div>
</div>
<div class="propertyInfoWrapper">
<div class="marker-details-container">
<h3 class="marker-details">New Homes in Buckeye, Arizona</h3>
<div class="spacer"></div>
<h4 class="propertyListingHeader">Aspire at Sienna Hills</h4>
<p class="marker-details">21007 West Almeria Road, Buckeye, AZ 85396</p>
<p class="marker-details marker-status">Final Opportunities</p>
<div class="spacer"></div>
<p class="marker-details marker-price"><span class="bold">Priced from: </span>Mid $200s</p>
<p class="marker-details"><span class="bold">Home type: </span>Single Family Homes</p>
<p class="marker-details marker-amenities"><span class="bold">Amenities: </span>Pool, Hiking Trails, Park</p>
</div>
<div class="community-tag-container">
<a href="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills#quick-move-in-homes" onclick="KHOV.Analytics.trackEvent('Qmi_Icon_Qmi');">
<div class="community-tag">
<div class="ctaDesc quick-move-in-badge link-inside">Quick Move In Homes</div>
<div class="ctaIcon quick-move-in-badge-icon link-inside"></div>
</div>
</a>
</div>
<a href="#request-info-form-modal" class="open-inline-modal-link" onclick="KHOV.Analytics.trackEvent('Orange_Ri_Request_Info');">
<div class="button orange-color requestInfoButton link-inside" data-urlname="aspire-at-sienna-hills">Request Info</div>
</a>
</div>
</div>
You need to include the featured id as well as results. You can use Or to combine. Latest bs4 supports not.
#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], #propertyFeaturedResultsContainer .propertyWrapper :not([onclick])[href*=find]
This can also be shortened to
#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], #propertyFeaturedResultsContainer
But that shortening may be less robust.
You can just check for the desired keyword in the link and print those, and ignore the others:
if __name__ == '__main__':
url = "https://www.khov.com/find-new-homes/arizona/buckeye"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
for newlink in fetch_info(url):
if url.split('/')[-1] in newlink:
print(newlink)
driver.quit()
Output:
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
Would list slicing works?
def fetch_info(link):
driver.get(link)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#community-search-homes .propertyWrapper > a")))
soup = BeautifulSoup(driver.page_source, "lxml")
linklist = [item.get("href") for item in soup.select("#community-search-homes .propertyWrapper > a")][:3]
return linklist

Looping over multiple tooltips

I am trying to get names and affiliations of authors from a series of articles from this page (you'll need to have access to Proquest to visualise it). What I want to do is to open all the tooltips present at the top of the page, and extract some HTML text from them. This is my code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
browser = webdriver.Firefox()
url = 'http://search.proquest.com/econlit/docview/56607849/citation/2876523144F544E0PQ/3?accountid=13042'
browser.get(url)
#insert your username and password here
n_authors = browser.find_elements_by_class_name('zoom') #zoom is the class name of the three tooltips that I want to open in my loop
author = []
institution = []
for a in n_authors:
print(a)
ActionChains(browser).move_to_element(a).click().perform()
html_author = browser.find_element_by_xpath('//*[#id="authorResolveLinks"]/li/div/a').get_attribute('innerHTML')
html_institution = browser.find_element_by_xpath('//*[#id="authorResolveLinks"]/li/div/p').get_attribute('innerHTML')
author.append(html_author)
institution.append(html_institution)
Although n_authors has three entries that are apparently different from one another, selenium fails to get the info from all tooltips, instead returning this:
author
#['Nuttall, William J.',
#'Nuttall, William J.',
#'Nuttall, William J.']
And the same happens for the institution. What am I getting wrong? Thanks a lot
EDIT:
The array containing the xpaths of the tooltips:
n_authors
#[<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{008a2ade-fc82-4114-b1bf-cc014d41c40f}")>,
#<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{c4c2d89f-3b8a-42cc-8570-735a4bd56c07}")>,
#<selenium.webdriver.remote.webelement.WebElement (session="277c8abc-3883-
#43a8-9e93-235a8ded80ff", element="{9d06cb60-df58-4f90-ad6a-43afeed49a87}")>]
Which has length 3, and the three elements are different, which is why I don't understand why selenium won't distinguish them.
EDIT 2:
Here is the relevant HTML
<span class="titleAuthorETC small">
<span style="display:none" class="title">false</span>
Jamasb, Tooraj
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_0" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a><script type="text/javascript">Tips.images = '/assets/r20161.1.0-4/pqc/javascript/prototip/images/prototip/';</script>; Nuttall, William J
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_1" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a>; Pollitt, Michael G
<a class="zoom" onclick="return false;" href="#">
<img style="margin-left:4px; border:none" alt="Visualizza profilo" id="resolverCitation_previewTrigger_2" title="Visualizza profilo" src="/assets/r20161.1.0-4/ctx/images/scholarUniverse/ar_button.gif">
</a>.
UPDATE:
#parishodak's answer, for some reason does not work using Firefox, unless I manually hover over the tooltips first. It works with chromedriver, but only if I first hover over the tooltips, and only if I allow time.sleep(), as in
for i in itertools.count():
try:
tooltip = browser.find_element_by_xpath('//*[#id="resolverCitation_previewTrigger_' + str(i) + '"]')
print(tooltip)
ActionChains(browser).move_to_element(tooltip).perform() #
except NoSuchElementException:
break
time.sleep(2)
elements = browser.find_elements_by_xpath('//*[#id="authorResolveLinks"]/li/div/a')
author = []
for e in elements:
print(e)
attribute = e.get_attribute('innerHTML')
author.append(attribute)`
The reason it is returning the same element, because xpath is not changing for all the loop iterations.
Two ways to deal:
Use array notation for xpath as described below:
browser.find_elements_by_xpath('//*[#id="authorResolveLinks"]/li/div/a[1]').get_attribute('innerHTML')
browser.find_elements_by_xpath('//*[#id="authorResolveLinks"]/li/div/a[2]').get_attribute('innerHTML')
browser.find_elements_by_xpath('//*[#id="authorResolveLinks"]/li/div/a[3]').get_attribute('innerHTML')
Or
Instead of find_element_by_xpath use find_elements_by_xpath
elements = browser.find_elements_by_xpath('//*[#id="authorResolveLinks"]/li/div/a')
loop over elements and use get_attribute('innerHTML') on each element in loop iteration.

Categories