Selenium: Can't find an element in HTML - python

Hi, I am working on a script to automate downloading videos from this site: https://pixabay.com/videos/
I can find the elements with class item, which I expected to carry an href attribute with the URL, but after that Selenium gives me no error at all; the result of print(xy.get_attribute("href")) is simply None.
My code:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.common.by import By
from mutagen.mp3 import MP3
import requests
import time
tag = "city "
while True:
    s = Service(GeckoDriverManager().install())
    driver = webdriver.Firefox(service=s)
    driver.minimize_window()
    tester = tag.split()
    print(tester)
    print(len(tester))
    if len(tester) == 2:
        tag = tester[0] + "%20" + tester[1]
        print(tag)
    print("2 " + tag)
    driver.get("https://pixabay.com/cs/videos/search/" + tag)
    images = driver.find_elements(By.CLASS_NAME, 'item')
    print(images)
    n = 0
    lenght = 0
    for image in images:
        image = image.get_attribute("href")
        print(image)
    break
HTML on the site:
HTML on the side
<html lang="cs" prefix="og: http://ogp.me/ns#">
<head>...</head> <body class="" data-new-gr-C-5-check-loaded="14.1050." data-gr-ext-installed>
<noscript>...</noscript> <div id="wrapper"> <div id="header">...</div> <div id="content" class="clearfix">
<div id="search-term" style="display:none">city</div> <div class="media_list">
<div style="border-bottom:1px solid #f0f1f4"> </div> <div style="background:#e8eaec" class="external-media">.</div> <div style="background: #f6f5fa"> <div style="max-width: 1824px;padding: 10px 3px 20px;margin: auto">
<h1 style="font-size: 13px;color:#bbb;margin:0 19px; position:relative;top:2px">70 videa zdarma z city</h1> <div class="related-keywords">...</div> <div class="row-masonry video video-search-results"> <div class="row-masonry-cell" style="flex-basis: 355.55555555555554px; flex-grow: 1.7777777777777777; flex-shrink: 1.7777777777777777; max-width: 622.2222222222222px"> <div class="row-masonry-cell-outer" style="padding-top: 56.25%"> <div class="row-masonry-cell-inner"> <div itemscope itemtype="schema.org/videoobject" class="item" data-w="1920" data-h="1980">
<meta itemprop="license" content="https://creativecommons.org/licenses/publicdomain/"> <meta itemprop="contentUrl" content="//player.vimeo.com/external/142621375.mobile.mp4?s=e9a3c9616798b6f3de74d579ea8314acc75fad72&profile_id=116"> <meta itemprop="thumbnailUrl" content="https://i.vimeocdn.com/video/539965294-5d28c2680682aa5173e86fa74acb94671783ba7e2dc2892682e897c8a158af75-d_640x360.jpg"> <meta itemprop="name" content="New York City, Manhattan, Lidé"> <meta itemprop="description" content="New York City, Manhattan, Lidé, Auta, Rozcestí, Amerika"> <meta itemprop="duration" content="TM145"> <meta itemprop="uploadDate" content="2022-02-16"> <a href="/cs/videos/new-york-city-manhattan-lid%C3%A9-auta-1944/"> <div class="media" data-mp4="//player.vimeo.com/external/142621375.mobile.mp4?s=e9a3c9616798b6f3de74d579ea8314acc75fad72&profile_id=116">
<img class="video-preview" src="https://i.vimeocdn.com/video/539965294-5d28c26...d 640x360.jpg" alt="New York City, Manhattan, Lidé, Auta">
<i></i> </div> </a> <em class="info-corner">-</em>
<em class="info-line">-</em> </div> </div> </div>

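The div.item elements have no href attribute of their own, which is why get_attribute("href") returns None; the href is on the child <a> element. To extract the values of the href attributes, wait with WebDriverWait for visibility_of_all_elements_located() and use either of the following locator strategies: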
Using CSS_SELECTOR:
driver.get("https://pixabay.com/videos/search/madona/")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.item>a")))])
Using XPATH:
driver.get("https://pixabay.com/videos/search/madona/")
print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='item']/a")))])
Console Output:
['https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fthe-blessed-virgin-mary-in-front-of-the-roman-catholic-diocese-public-place-in-gm1297534089-390641862%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=96d4db58e5ed4dfa33719b6789ec2e54c6a9e93c', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fshining-star-landscape-above-the-nativity-scene-in-bethlehem-in-the-middle-of-the-gm1284414826-381542953%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=63c5e9d5424c6ed9234b7558a47c2f9bd34b0b2f', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fvirgin-mary-statue-and-stained-glass-window-cathedral-la-major-marseille-france-gm523679382-92792807%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=eba8300b6dc8d879d5c0c20ad4159667075b13a8', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fmary-and-joseph-kissing-and-touching-baby-jesus-gm1331078248-414315796%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=0d8025f74d6831ec1728a487c6c348fbb0489c0e', 
'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fthe-blessed-virgin-mary-in-front-of-the-roman-catholic-diocese-public-place-in-gm1297536160-390641882%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=1a8d9dc7db4a76fe4fe8b15d050e950a4ef6ceee', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fmary-and-joseph-speaking-and-taking-care-of-baby-jesus-gm1331074139-414312596%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=71968733d6752db862f8f614ec81965c72c045c7', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fvirgin-mary-over-the-village-of-maaloula-in-syria-view-of-the-virgin-mary-in-the-gm1225424859-360681300%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=77ee3cc8968570cce77488fd36324f9a21aa8cdb', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fstatue-of-st-mary-in-the-church-gm1192697685-338971692%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=0bc49cf096c333282f735ce562f59c9eb17309f4', 
'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fnotre-dame-de-paris-exterior-beautiful-statue-of-virgin-and-child-architecture-gm980707058-266394749%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=69ab20290bfcaef4b644d7f888618d569cd0c994', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fmary-and-joseph-with-baby-jesus-in-barn-gm1331079736-414317051%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=dbae2b824a22623d12eaf1127f0fd33f4de2cc41', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fchristmas-nativity-gm113747428-13537033%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=496f4633dd6cd6394f1a06959e4c6dd089d3aeff', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fsculpture-of-the-image-of-nossa-senhora-aparecida-the-patroness-of-brazil-gm1348138435-425434800%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=c745fbc9d7b69eeaa2186c9843f8d566590b5e61', 
'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fchurch-icon-close-up-gm824010680-134889129%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=89a9e0bee0d8d49c4dad52407e63e5ab37a98b55', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fangled-detail-of-icon-image-of-virgin-mary-in-st-nicholas-orthodox-cathedral-in-nice-gm860958856-143133885%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=0fc269e8cf9d4fc2c0a6384537349b33fa98f963', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fvitaleta-chapel-aerial-view-in-the-wonderful-valley-of-orcia-tuscany-la-toscana-drone-gm1309396875-399121012%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=356539420e275a4aae53df6dfc2ec1e22a469f5a', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fburning-candles-in-the-cathedral-of-chartres-gm1358325815-431987333%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=3e6b1cd3f8922c80cb2ea4aacbc5f0ca13b48024', 
'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Flourdes-france-sanctuary-of-our-lady-of-lourdes-a-famous-pilgrimage-place-gm1352878773-428151138%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=e29f6ee34464b11218e1bba7f72d321db571968a', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fsanctuary-of-our-lady-of-lourdes-gm1348213795-425484844%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=00fc1155414568f9c01d4d8e07b3e7d6396e9c76', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Frosaries-on-the-bridge-in-lourdes-gm1348213738-425484842%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=ffc4132d8125f9da46baff336297150fee0617e2', 'https://pixabay.com/link/?ua=t%3Devent%26ec%3Dapi_ad%26ea%3Dnavigate%26el%3Dgetty%26v%3D1%26tid%3DUA-20223345-1&next=https%3A%2F%2Fwww.istockphoto.com%2Fvideo%2Fa-statue-of-the-virgin-mary-in-lourdes-gm1348213561-425484841%3Futm_source%3Dpixabay%26utm_medium%3Daffiliate%26utm_campaign%3DSRP_video_sponsored%26utm_content%3Dhttp%253A%252F%252Fpixabay.com%252Fvideos%252Fsearch%252Fmadona%252F%26utm_term%3Dmadona&hash=198c52a01d8f0d8e2c2079ea5238cfc1f3da8d00']
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
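Why does the original loop print None? In the question's HTML the href sits on the <a> nested inside div.item, not on div.item itself. A stdlib-only sketch (no browser needed) that parses a trimmed copy of one search result and records each start tag's class and href makes this visible. The trimmed snippet and the HrefCollector class are illustrative assumptions, not part of Selenium.

```python
from html.parser import HTMLParser

# Trimmed copy of one search result from the question's HTML (most attributes dropped)
SNIPPET = """
<div itemscope class="item" data-w="1920">
  <a href="/cs/videos/new-york-city-manhattan-lid%C3%A9-auta-1944/">
    <div class="media" data-mp4="//player.vimeo.com/external/142621375.mobile.mp4">
      <img class="video-preview" alt="New York City, Manhattan">
    </div>
  </a>
</div>
"""


class HrefCollector(HTMLParser):
    """Record (tag, class, href) for every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.seen.append((tag, attrs.get("class"), attrs.get("href")))


parser = HrefCollector()
parser.feed(SNIPPET)
for tag, cls, href in parser.seen:
    print(tag, cls, href)
```

Only the <a> row has a non-None href, which is why the fix is to select the child anchor, as the div.item>a selector in this answer does.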

Well, first of all, you haven't actually provided what the error is, which would be helpful.
Additionally, your for loop overwrites the loop variable image, which is bad practice but shouldn't break anything.
Finally, it looks like you've written an infinite loop with a non-conditional break at the end and no continue. Is this supposed to be a loop?
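On the same code-review theme: the manual "%20" joining in the question only handles tags of exactly two words. A small sketch using the standard library's urllib.parse.quote percent-encodes a tag of any length; build_search_url is a hypothetical helper name, and the base URL is the one used in the question.

```python
from urllib.parse import quote

SEARCH_BASE = "https://pixabay.com/cs/videos/search/"


def build_search_url(tag):
    """Collapse surrounding/repeated whitespace, then percent-encode the tag."""
    cleaned = " ".join(tag.split())  # "city " -> "city"
    return SEARCH_BASE + quote(cleaned)  # spaces become %20


print(build_search_url("city "))
print(build_search_url("new york city"))
```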

Related

Selenium Python: my current_url doesn't update after click

I'm scraping a website where I need to retrieve values from the URL after clicking a button, with different values filled into the forms.
The problem: when I click the button and read current_url, the values I entered in the forms are not reflected in the URL, which should be updated (it's a search button). No new tab is created.
My code to retrieve the URL value is:
import re
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get(url)
arrlist = []
idlist = []
service = value
for i in key_list:
    form = driver.find_elements(by=By.XPATH, value='//input[@id="geo_nav"]')
    form[0].send_keys(i)
    form2 = driver.find_elements(by=By.XPATH, value='//input[@id="sev_nav"]')
    form2[0].send_keys(service)
    button = driver.find_elements(by=By.XPATH, value='//button[@data-role="filter-apply"]')
    button[0].click()
    time.sleep(5)
    url = driver.current_url
    print(dept)
    print(i)
    id = re.findall(r"(?<=\[population\]=)(\d{9})", url)[0]
    arrlist.append(i)
    idlist.append(id)
The button's HTML code is:
<button class="filter-apply cta-navigate relative hide-mobile flex withNumber" data-role="filter-apply">
<p class="hide-mobile m-r-4">Appliquer</p>
<div class="svg relative">
<span class="filters-apply-length">2</span>
<svg height="18" viewBox="0 0 16 18" width="16" xmlns="http://www.w3.org/2000/svg"><path d="m10.877 17.457 2.026 1.533v-4.553c0-.166.042-.329.12-.475l4.3-7.962h-10.68l4.122 7.978c.074.142.112.3.112.459zm3.026 4.543c-.213 0-.426-.068-.603-.203l-4.026-3.045c-.25-.189-.397-.484-.397-.797v-3.274l-4.765-9.222c-.161-.31-.148-.681.034-.979.181-.298.505-.48.854-.48h14c.352 0 .678.185.859.488.18.302.188.677.021.987l-4.977 9.215v6.31c0 .379-.214.726-.554.895-.141.07-.294.105-.446.105z" fill="#0579c7" fill-rule="evenodd" transform="translate(-4 -4)"></path></svg> </div>
</button>
I've tried to use
driver.switch_to.window(driver.window_handles[-1]);
following this post: Python Selenium Chromedriver - Can't Get current_url of new opened tab after click()
But I don't have any tab or new-window issues.
I tried clicking the autocompletion lists of the two input forms: one of them changes the URL, but the other one (the one whose effect on the URL I need to monitor) does not.
The form code that works is:
<form data-component="sev_nav_input" data-no-results="Sans résultats" data-default-pho="Services" data-selected-name="Achat compulsif" data-selected-id="5928" class="filter-input filter-services relative">
<input type="text" placeholder="Services" autocomplete="off" name="sev_nav" id="sev_nav" data-role="js_filter" data-id="5928" class="autocomplete-with-result">
<span id="clear-sev-input" class="clear-sev-input" style="display: none;">
<img src="data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjI0IiB2aWV3Qm94PSIwIDAgMjQgMjQiIHdpZHRoPSIyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Im0wIDBoMjR2MjRoLTI0eiIvPjxwYXRoIGQ9Im0xMS43NSAyMC41YzIuNDIyIDAgNC40ODYtLjg1MyA2LjE5MS0yLjU1OSAxLjcwNi0xLjcwNSAyLjU1OS0zLjc3IDIuNTU5LTYuMTkxIDAtMi40MjItLjg1My00LjQ4Ni0yLjU1OS02LjE5MS0xLjcwNS0xLjcwNi0zLjc3LTIuNTU5LTYuMTkxLTIuNTU5LTIuNDIyIDAtNC40OTIuODYtNi4yMSAyLjU3OC0xLjY5NSAxLjY5My0yLjU0IDMuNzUtMi41NCA2LjE3MnMuODUzIDQuNDg2IDIuNTU5IDYuMTkxYzEuNzA1IDEuNzA2IDMuNzcgMi41NTkgNi4xOTEgMi41NTl6bTMuMTY0LTQuNDE0Yy0uMDUyIDAtLjExNy0uMDQtLjE5NS0uMTE3bC0yLjk2OS0yLjkzLTIuOTMgMi45NjljLS4wNTIuMDUyLS4xMy4wNzgtLjIzNC4wNzhzLS4xODItLjAyNi0uMjM0LS4wNzhsLS44Mi0uODZjLS4wNTMtLjA1Mi0uMDc5LS4xMy0uMDc5LS4yMzRzLjAyNi0uMTcuMDc4LS4xOTVsMi45NjktMi45NjktMi45NjktMi45M2MtLjE1Ni0uMTU2LS4xNTYtLjMxMiAwLS40NjhsLjgyLS44MmMuMDc5LS4wNzkuMTU3LS4xMTguMjM1LS4xMTguMDUyIDAgLjExNy4wNC4xOTUuMTE3bDIuOTY5IDIuODkgMi45NjktMi44OWMuMDc4LS4wNzguMTQzLS4xMTcuMTk1LS4xMTcuMDc4IDAgLjE1Ni4wNC4yMzQuMTE3bC44Ni44MmMuMTU2LjE1Ny4xNTYuMzEzIDAgLjQ3bC0yLjk2OSAyLjkyOSAyLjkzIDIuOTNjLjA3OC4wNzguMTE3LjE1Ni4xMTcuMjM0IDAgLjEwNC0uMDQuMTgyLS4xMTcuMjM0bC0uODIuODJjLS4wNzkuMDc5LS4xNTcuMTE4LS4yMzUuMTE4eiIgZmlsbD0iIzE0OWM5NyIvPjwvZz48L3N2Zz4K"></span>
<span class="gradient"></span>
<span class="gradient" style="display: none;"></span>
<div class="spinner" style="display: none;"></div>
<div id="services-list" class="services-list" style="display: none;"><ul data-role="autocomplete-list" class="autocomplete-list"> </ul></div></form>
The form code that doesn't work is:
<form data-component="geo_nav_input" data-selected-name="" data-selected-id="" data-selected-neighborhood-id="0" data-selected-type="" data-no-results="Sans résultats" data-pho="Localité" data-default-pho="Localité" class="filter-input relative">
<div class="hide">Chercher des professionnels en/à...</div>
<span class="icon-x toggle_geo_nav hide"></span>
<label for="geo_nav" class="hidden-label">Localité</label>
<input type="text" placeholder="Localité" autocomplete="off" name="geo_nav" id="geo_nav" data-role="js_filter" data-id="" data-neighborhoodid="0" data-type="" class="autocomplete-with-result"> <span id="clear-geo-input" class="clear-geo-input" style="display: none;">
<img src="data:image/svg+xml;base64,PHN2ZyBoZWlnaHQ9IjI0IiB2aWV3Qm94PSIwIDAgMjQgMjQiIHdpZHRoPSIyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48ZyBmaWxsPSJub25lIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiPjxwYXRoIGQ9Im0wIDBoMjR2MjRoLTI0eiIvPjxwYXRoIGQ9Im0xMS43NSAyMC41YzIuNDIyIDAgNC40ODYtLjg1MyA2LjE5MS0yLjU1OSAxLjcwNi0xLjcwNSAyLjU1OS0zLjc3IDIuNTU5LTYuMTkxIDAtMi40MjItLjg1My00LjQ4Ni0yLjU1OS02LjE5MS0xLjcwNS0xLjcwNi0zLjc3LTIuNTU5LTYuMTkxLTIuNTU5LTIuNDIyIDAtNC40OTIuODYtNi4yMSAyLjU3OC0xLjY5NSAxLjY5My0yLjU0IDMuNzUtMi41NCA2LjE3MnMuODUzIDQuNDg2IDIuNTU5IDYuMTkxYzEuNzA1IDEuNzA2IDMuNzcgMi41NTkgNi4xOTEgMi41NTl6bTMuMTY0LTQuNDE0Yy0uMDUyIDAtLjExNy0uMDQtLjE5NS0uMTE3bC0yLjk2OS0yLjkzLTIuOTMgMi45NjljLS4wNTIuMDUyLS4xMy4wNzgtLjIzNC4wNzhzLS4xODItLjAyNi0uMjM0LS4wNzhsLS44Mi0uODZjLS4wNTMtLjA1Mi0uMDc5LS4xMy0uMDc5LS4yMzRzLjAyNi0uMTcuMDc4LS4xOTVsMi45NjktMi45NjktMi45NjktMi45M2MtLjE1Ni0uMTU2LS4xNTYtLjMxMiAwLS40NjhsLjgyLS44MmMuMDc5LS4wNzkuMTU3LS4xMTguMjM1LS4xMTguMDUyIDAgLjExNy4wNC4xOTUuMTE3bDIuOTY5IDIuODkgMi45NjktMi44OWMuMDc4LS4wNzguMTQzLS4xMTcuMTk1LS4xMTcuMDc4IDAgLjE1Ni4wNC4yMzQuMTE3bC44Ni44MmMuMTU2LjE1Ny4xNTYuMzEzIDAgLjQ3bC0yLjk2OSAyLjkyOSAyLjkzIDIuOTNjLjA3OC4wNzguMTE3LjE1Ni4xMTcuMjM0IDAgLjEwNC0uMDQuMTgyLS4xMTcuMjM0bC0uODIuODJjLS4wNzkuMDc5LS4xNTcuMTE4LS4yMzUuMTE4eiIgZmlsbD0iIzE0OWM5NyIvPjwvZz48L3N2Zz4K"></span>
<span class="gradient"></span>
<span class="gradient" style="display: none;"></span>
<div class="spinner" style="display: none;"></div>
<div id="location-list" class="location-list" style="display: none;"><ul data-role="autocomplete-list" class="autocomplete-list"> </ul></div>
</form>
Could you write a function that navigates the pages and performs the required actions on each one? On each call of the function, use driver.switch_to.window to make sure you are on the latest page.
That said, based on your edits, it now seems the issue is that you are having trouble locating and following one of the links on the pages.
def navigate(n):
    """Move through the pages. Select the relevant buttons on each page."""
    # In practice the last handle is usually the most recently opened window
    window_after = driver.window_handles[-1]
    driver.switch_to.window(window_after)
    if n == 0:
        form = driver.find_elements(by=By.XPATH, value='//input[@id="geo_nav"]')
        # find_elements returns a list, so click the first match
        button = driver.find_elements(by=By.XPATH, value='//button[@data-role="filter-apply"]')
        button[0].click()
    elif n == 1:
        pass
        # Do something
    else:
        pass
        # Do something else
for i in range(3):
    navigate(i)
    time.sleep(3)
The solution was in fact related to the autocompletion forms: you have to click one of the autocompletion suggestions for the button to actually work.
For reference, here is the full code: it clicks the form, clears its content, types the new value, clicks the matching suggestion in the list and then clicks the button.
import re
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
def get_city_locations(service):
    url = 'url'
    # options = Options()
    # options.headless = True
    driver = webdriver.Firefox()  # options=options)
    driver.get(url)
    time.sleep(2)
    buttoncookie = driver.find_elements(by=By.XPATH, value='//button[@class="cf2Lf6"]')
    buttoncookie[0].click()
    time.sleep(1)
    form2 = driver.find_elements(by=By.XPATH, value='//input[@id="sev_nav"]')
    form2[0].click()
    time.sleep(1)
    Static.clear_text(driver)  # the asker's own helper; its code is not shown
    form2[0].send_keys(service)
    time.sleep(1)
    autocompleteservice = driver.find_elements(by=By.XPATH, value='//li[not(@class)]')
    for f in autocompleteservice:
        if f.text == service:
            f.click()
    df_pref = pd.read_csv('arrondissement_2022.csv', sep=',')
    deptlist = []
    arrlist = []
    idlist = []
    for i in df_pref['LIBELLE']:
        df_dep = df_pref[df_pref['LIBELLE'] == i]
        dept = df_dep.loc[df_dep.index.values[0], 'DEP']
        form = driver.find_elements(by=By.XPATH, value='//input[@id="geo_nav"]')
        form[0].click()
        time.sleep(1)
        Static.clear_text(driver)
        form[0].send_keys(i)
        time.sleep(3)
        autocompletelocation = driver.find_elements(by=By.XPATH, value='//li[not(@class)]')
        cond = 0
        for a in autocompletelocation:
            if a.text == i:
                print('condition ok')
                cond = 1
                a.click()
                break
        time.sleep(3)
        button = driver.find_elements(by=By.XPATH, value='//button[@data-role="filter-apply"]')
        button[0].click()
        time.sleep(3)
        driver.switch_to.window(driver.window_handles[-1])
        url = driver.current_url
        print(dept)
        print(i)
        print(url)
        if cond == 0:
            id = 0
        else:
            id = re.findall(r"(?<=\[population\]=)(\d{7,10})", url)[0]
        print(f'id = {id}')
        print('\n')
        deptlist.append(dept)
        arrlist.append(i)
        idlist.append(id)
    df0 = pd.DataFrame({"dept": deptlist, "arrondissement": arrlist, "id": idlist})
    df0.to_csv('arr_id.csv', sep=';', index=False)
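Both versions index re.findall(...)[0], which raises IndexError whenever the pattern is not in the URL, i.e. exactly when the URL was not updated. A hedged sketch of a safer extraction: it returns None instead of raising, and it unquotes the URL first in case the browser reports the brackets percent-encoded as %5B/%5D (an assumption; the real URL format isn't shown). extract_population_id is a hypothetical helper and the example URLs are made up for illustration.

```python
import re
from urllib.parse import unquote

# Same lookbehind pattern as the code above
POPULATION_ID = re.compile(r"(?<=\[population\]=)(\d{7,10})")


def extract_population_id(url):
    """Return the population id from url, or None if it is absent."""
    match = POPULATION_ID.search(unquote(url))  # decode %5B/%5D to [ and ]
    return match.group(1) if match else None


# Hypothetical URLs, for illustration only
print(extract_population_id("https://example.com/search?filter[population]=12345678"))
print(extract_population_id("https://example.com/search?filter%5Bpopulation%5D=12345678"))
print(extract_population_id("https://example.com/search"))
```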

Unable to click button Shopify/Selenium

I'm unable to click the "Continue to payment" button on a Shopify site. I have seen several similar posts, but most of them are for JavaScript and do not mention the spinner part of the error.
driver.find_element_by_xpath('//*[@id="continue_button"]/svg')
<div class="content-box__row">
<div class="radio-wrapper" data-shipping-method="shopify-Standard%20Shipping-15.00">
<div class="radio__input">
<input class="input-radio" data-checkout-total-shipping="$15.00" data-checkout-total-shipping-cents="1500" data-checkout-shipping-rate="$15.00" data-checkout-original-shipping-rate="$15.00" data-checkout-total-price="$94.00" data-checkout-total-price-cents="9400" data-checkout-payment-due="$94.00" data-checkout-payment-due-cents="9400" data-checkout-payment-subform="required" data-checkout-subtotal-price="$79.00" data-checkout-subtotal-price-cents="7900" data-checkout-total-taxes="$0.00" data-checkout-total-taxes-cents="0" data-checkout-multiple-shipping-rates-group="false" data-backup="shopify-Standard%20Shipping-15.00" type="radio" value="shopify-Standard%20Shipping-15.00" name="checkout[shipping_rate][id]" id="checkout_shipping_rate_id_shopify-standard20shipping-15_00" />
</div>
<label class="radio__label" for="checkout_shipping_rate_id_shopify-standard20shipping-15_00">
<span class="radio__label__primary" data-shipping-method-label-title="Standard Shipping">
Standard Shipping
</span>
<span class="radio__label__accessory">
<span class="content-box__emphasis">
$15.00
</span>
</span>
</label> </div> <!-- /radio-wrapper-->
</div>
</div>
</div>
</div>
</div>
<div class="step__footer" data-step-footer>
<button name="button" type="submit" id="continue_button" class="step__footer__continue-btn btn" aria-busy="false"><span class="btn__content" data-continue-button-content="true">Continue to payment</span><svg class="icon-svg icon-svg--size-18 btn__spinner icon-svg--spinner-button" aria-hidden="true" focusable="false"> <use xlink:href="#spinner-button" /> </svg></button>
<a class="step__footer__previous-link" href="/18292275/checkouts/38df275516a513f1c08f6c470ef014d0?step=contact_information"><svg focusable="false" aria-hidden="true" class="icon-svg icon-svg--color-accent icon-svg--size-10 previous-link__icon" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10"><path d="M8 1L7 0 3 4 2 5l1 1 4 4 1-1-4-4"/></svg><span class="step__footer__previous-link-content">Return to information</span></a>
</div>
Try this XPath:
//span[text()='Continue to payment']/..
Without explicit waits:
driver.find_element_by_xpath("//span[text()='Continue to payment']/..").click()
With explicit waits:
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Continue to payment']/.."))).click()
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
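The XPath above finds the span by its text and then steps up to its parent with /.., so the click targets the <button> itself rather than the spinner <svg> the question tried to click. The same navigation can be checked offline with the standard library's ElementTree, which supports only a subset of XPath: there, [.='...'] plays the role of text()='...'. The markup below is a trimmed, XML-well-formed copy of the question's button (the <use xlink:href> element was dropped because its prefix is undeclared); this is an illustration, not how Selenium evaluates XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's footer markup (must be well-formed XML for ElementTree)
FOOTER = """
<div class="step__footer">
  <button name="button" type="submit" id="continue_button" aria-busy="false">
    <span class="btn__content">Continue to payment</span>
    <svg class="btn__spinner" aria-hidden="true" focusable="false"></svg>
  </button>
</div>
"""

root = ET.fromstring(FOOTER)
# ElementTree's equivalent of //span[text()='Continue to payment']/..
button = root.find(".//span[.='Continue to payment']/..")
print(button.tag, button.get("id"))
```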
Update 1:
from selenium.webdriver.common.action_chains import ActionChains
ActionChains(driver).move_to_element(driver.find_element_by_xpath("//span[text()='Continue to payment']/..")).click().perform()
Update 2:
import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver executable
driver.maximize_window()
driver.implicitly_wait(50)
driver.get('https://seelbachs.com/products/sagamore-spirit-cask-strength-rye-whiskey')
wait = WebDriverWait(driver, 50)
frame_xpath = '/html/body/div[5]/div/div/div/div/iframe'
wait = WebDriverWait(driver, 10)
# wait until iframe appears and select iframe
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, frame_xpath)))
# select button
xpath = '//*[@id="enter"]'
time.sleep(2)
element = driver.find_element_by_xpath(xpath)
ActionChains(driver).move_to_element(element).click(element).perform()
# go back to main page
driver.get('https://seelbachs.com/products/sagamore-spirit-cask-strength-rye-whiskey')
# add to cart
atc = driver.find_element_by_xpath('//button[@class="btn product-form__cart-submit product-form__cart-submit--small"]')
atc.click()
# check out
co = driver.find_element_by_xpath('//*[@id="shopify-section-cart-template"]/div/form/footer/div/div[2]/input[2]')
co.click()
# enter email
driver.find_element_by_xpath('//*[@id="checkout_email"]').send_keys('no@yahoo.com')
time.sleep(1)
# enter first name
driver.find_element_by_xpath('//*[@id="checkout_shipping_address_first_name"]').send_keys('John')
time.sleep(1)
# enter last name
driver.find_element_by_xpath('//*[@id="checkout_shipping_address_last_name"]').send_keys('Smith')
time.sleep(1)
# enter address
driver.find_element_by_xpath('//*[@id="checkout_shipping_address_address1"]').send_keys('111 South Street')
# enter city
driver.find_element_by_xpath('//*[@id="checkout_shipping_address_city"]').send_keys('Cocoa')
# enter zip
driver.find_element_by_xpath('//*[@id="checkout_shipping_address_zip"]').send_keys('263153')
# enter phone, then press ENTER ('\ue007' is Keys.ENTER)
driver.find_element_by_xpath('//*[@id="checkout_shipping_address_phone"]').send_keys('5555555' + u'\ue007')
select = Select(wait.until(EC.visibility_of_element_located((By.ID, "checkout_shipping_address_province"))))
select.select_by_value('UK')
wait.until(EC.element_to_be_clickable((By.ID, "continue_button"))).click()
ctp = driver.find_element_by_id('continue_button')
ctp.click()
Solved this issue. I am now able to click the "Continue to Payment" button.

Scraping nested HTML with Selenium

I'm looking for some help with scraping with selenium in python.
You need a paid account to view this page, so creating a reproducible example won't be possible.
The page I'm trying to scrape
I'm attempting to scrape the data from the pitch in the top right corner of the image under 'Spots on Field'.
<div class="player-details-football-map__UEFA player-details-football-map">
<div class="shots">
<div>
<a class="shot episode" style="left: 39.8529%; top: 28.9474%;"></a>
<div class="tooltip" style="left: 39.8529%; top: 28.9474%;">
<div class="tooltip-title">
<div class="tooltip-shoot-type">Shot on target</div>
<div class="tooltip-blow-type">Donyell Malen </div>
<div class="tooltip-shoot-name"></div>
</div>
<div class="tooltip-time">h Viktoria Koln</div>
<div class="tooltip-time">Half 1, 18:22 02/09/20</div>
<div class="tooltip-time">Length: 7.1 m</div>
<div class="tooltip-shoot-xg">Expected goals: 0.17</div>
</div>
</div>
The above is a snippet of just one of the data points I want to scrape.
I've tried using BeautifulSoup
from bs4 import BeautifulSoup
from requests import get
url = 'https://football.instatscout.com/players/294322/shots'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
type(html_soup)
shots = html_soup.find_all('div', class_ = 'tooltip')
print(type(shots))
print(len(shots))
and nothing was being returned.
So now I've tried using Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Users\James\OneDrive\Desktop\webdriver\chromedriver.exe')
driver.get('https://football.instatscout.com/players/294322/shots')
print("Page Title is : %s" %driver.title)
driver.find_element_by_name('email').send_keys('my username')
driver.find_element_by_name('pass').send_keys('my password')
driver.find_element_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "hRAqIl", " " ))]').click()
goals = driver.find_element_by_class_name('tooltip')
but I'm getting the error of
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".tooltip"}
Can someone please point me in the right direction? I'm basically trying to scrape everything in the HTML above whose class name includes 'tooltip'.
Thanks
Using CSS selectors with bs4:
from bs4 import BeautifulSoup as soup
import re  # for extracting offsets
# html: the page source containing the snippet above (e.g. driver.page_source once logged in)
r = [{**dict(zip(['left', 'top'], re.findall(r'[\d.]+', i.div['style']))),
      'shoot_type': i.select_one('.tooltip-shoot-type').text,
      'name': i.select_one('.tooltip-blow-type').text,
      'team': i.select_one('div:nth-of-type(2).tooltip-time').text,
      'time': i.select_one('div:nth-of-type(3).tooltip-time').text,
      'length': i.select_one('div:nth-of-type(4).tooltip-time').text[8:],
      'expected_goals': i.select_one('.tooltip-shoot-xg').text[16:]}
     for i in soup(html, 'html.parser').select('div.shots > div')]
Output:
[{'left': '39.8529', 'top': '28.9474', 'shoot_type': 'Shot on target', 'name': 'Donyell Malen ', 'team': 'h Viktoria Koln', 'time': 'Half 1, 18:22 02/09/20', 'length': '7.1 m', 'expected_goals': '0.17'}]
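The [8:] and [16:] slices strip the labels "Length: " and "Expected goals: " by character count, which breaks silently if the label wording changes. A small sketch of the same two extraction steps written against the label text instead; parse_offsets and strip_label are hypothetical helper names, and str.removeprefix needs Python 3.9+. The raw-string pattern also avoids the invalid-escape warning that '[\d\.]+' triggers on recent Python versions.

```python
import re


def parse_offsets(style):
    """Pull the left/top percentages out of an inline style string."""
    return dict(zip(["left", "top"], re.findall(r"[\d.]+", style)))


def strip_label(text, label):
    """Drop a known leading label such as 'Length:' and surrounding spaces."""
    return text.removeprefix(label).strip()


print(parse_offsets("left: 39.8529%; top: 28.9474%;"))
print(strip_label("Length: 7.1 m", "Length:"))
print(strip_label("Expected goals: 0.17", "Expected goals:"))
```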

How to iterate over child web elements in Python Webbot/Selenium?

I have a table of search results in a Selenium-driven browser, and each search result is defined in HTML like this:
<div class="item
itemWrapper
ItemPosition1
ItemMonitor
" data-position="1" data-it-name="NAME OF THE ITEM" data-it-category="Category" role="article">
<div class="item-image">
<a href="/some/link/" target="_blank" rel="noopener" class="itemRec">
<img src="https://some.jpg" alt="some name" class="img-responsive">
</a>
</div>
<h2 class="small-text item-title">
Link Text
</h2>
<div class="item-bottom">
<div class="pull-left item-price">
<span>999</span>
</div>
<div class="pull-right detail-link">
<a href="/link/to/detail" title="link title" class="detail">
Detail
</a>
</div>
</div>
</div>
I am able to find all web elements with the class name item:
elements = driver.find_elements(By.CLASS_NAME, "item")
I need to iterate over the elements and get each one's position, name and price so that I can click one of them:
for e in elements:
    position=e.get_attribute("data-position").value,
    name=e.get_attribute("data-it-name").value,
    price=e.find_element(By.CLASS_NAME,'item-price').value
but this does not work: get_attribute returns None and find_element does not find any child element.
Can you please advise me how to get the "data-" attributes and the child elements' values correctly?
The whole code, using Webbot:
import webbot
from selenium.webdriver.common.by import By
web = webbot.Browser()
web.go_to('www.***.cz')
web.type('bed', classname='header-search-form')
web.press(web.Key.ENTER)
elements = web.find_elements(classname="product-item")
for e in elements:
    name = e.get_attribute("data-it-name").value
    price = e.find_element(By.CLASS_NAME, 'item-price').value
    print(name, price)
    break
The classname argument behaves oddly in webbot. You are definitely not getting a product item there:
In [56]: elements[0].get_attribute('outerHTML')
Out[56]: '\n\n\t\t\t\t\t\t<img src="https://s.favi.cz/static/frontend/_global/images/favi-logo/favi-logo.60d511aff13247dd52f15acf6bdf2af9.svg" role="banner">\n\n\t\t\t\t\t'
Works well with a CSS selector:
In [58]: elements = web.find_elements(css_selector=".product-item")
In [59]: elements[0].get_attribute('outerHTML')
Out[59]: '<div class="\n\t\t\tproduct-item\n\t\t\titemWrapper\n\t\t\tproductItemPosition1\n\t\t\tproductItemMonitor\n\t\t\tproductItemWrapper\n\t\t\tsendProductTransactionWrapper\n\t\t\t\t\t" data-position="1" data-pr-name="Moderní box spring postel Alvares 160x200, bílá" data-tr-id="04d62b60-9d00-4d1b-b03c-2258c50bfdb9" data-pr-category="Postele" data-tr-ob-id="2144583" data-m-ob-id="2345478" role="article">\n\n\t\t<div class="product-image">\n\n\t\t\t\n\t\t\t\t\t\t\t\t\t<img src="https://s.favi.cz/static/images/t/product/300/6f/92/6f922779-bc84-483e-b1cd-ad8522ef0c92.jpg" alt="Moderní box spring postel Alvares 160x200, bílá" class="img-responsive">\n\t\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\t\t\t\t\t\t<span class="count">485</span>\n\t\t\t\t\t\t\t\n\n\t\t\t\n\t\t\t\n\t\t</div>\n\n\t\t<div class="product-labels stickers-holder">\n\n\t\t\t\t\t\t\t<span class="sticker storage white">\n\t\t\t\t\t<span class="text">Skladem</span>\n\t\t\t\t</span>\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t</div>\n\n\t\t<h2 class="small-text product-item-title">\n\t\t\tModerní box spring postel Alvares 160x200, bílá\n\t\t</h2>\n\n\t\t<div class="product-bottom">\n\n\t\t\t<div class="pull-left product-item-price">\n\t\t\t\t<span>15 599 Kč</span>\n\t\t\t\t\t\t\t</div>\n\n\t\t\t<div class="pull-right product-shop-link">\n\t\t\t\t\n\t\t\t\t\tDetail\n\t\t\t\t\n\n\t\t\t\t\n\t\t\t\t\t<strong>Do obchodu</strong>\n\t\t\t\t\n\t\t\t</div>\n\n\t\t</div>\n\n\t\t\n\t</div>'
In [60]: elements[0].get_attribute('data-position')
Out[60]: '1'
In [61]: elements[0].get_attribute('data-pr-name')
Out[61]: 'Moderní box spring postel Alvares 160x200, bílá'
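Applied to the original loop, the values can then be read directly: get_attribute() already returns a string (or None), and a child element's content is read with .text, so the .value calls are not needed. A minimal sketch, assuming the .product-item markup shown in the outerHTML above; read_item, CSS and PRICE_SELECTOR are hypothetical names:

```python
# Hypothetical helper for one search-result card. It only calls
# get_attribute() and find_element(), so it works on a Selenium WebElement
# (which is what webbot returns) or on any stub exposing those two methods.
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR
PRICE_SELECTOR = ".product-item-price span"  # taken from the outerHTML above


def read_item(element):
    """Return (position, name, price) for one .product-item element."""
    # get_attribute() returns a plain str (or None): no .value needed
    position = element.get_attribute("data-position")
    name = element.get_attribute("data-pr-name")
    # A child element's visible text is available through .text
    price = element.find_element(CSS, PRICE_SELECTOR).text.strip()
    return position, name, price


# Usage with webbot (requires a running browser):
# elements = web.find_elements(css_selector=".product-item")
# for e in elements:
#     print(read_item(e))
```

Because the helper relies only on those two methods, it can be checked with a stub object when no browser is available.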

Unable to fetch the relevant links and discard others

I've written a script in Python, using Selenium together with BeautifulSoup, to get the links leading to property details from a webpage. As the content is heavily dynamic, I used Selenium to get the page source. When I run my script, I get lots of links, including the ones I need.
How can I get only the relevant link from each of the three containers?
My attempt:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def fetch_info(link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#community-search-homes .propertyWrapper > a")))
    soup = BeautifulSoup(driver.page_source, "lxml")
    linklist = [item.get("href") for item in soup.select("#community-search-homes .propertyWrapper > a")]
    return linklist
if __name__ == '__main__':
    url = "https://www.khov.com/find-new-homes/arizona/buckeye"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    for newlink in fetch_info(url):
        print(newlink)
    driver.quit()
Results I'm getting:
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/summit-at-silverstone
/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/skye
/find-new-homes/arizona/phoenix/85020/k-hovnanian-homes/pointe-16
/find-new-homes/arizona/peoria/85383/k-hovnanian-homes/fusion-ii-at-the-meadows
/find-new-homes/arizona/scottsdale/85257/k-hovnanian-homes/aire
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/pinnacle-at-silverstone
/find-new-homes/arizona/peoria/85383/k-hovnanian-homes/montage-at-the-meadows
/find-new-homes/arizona/sun-city/85373/four-seasons/k.-hovnanian-s-four-seasons-at-ventana-lakes
/find-new-homes/arizona/peoria/85382/k-hovnanian-homes/park-paseo
/find-new-homes/arizona/laveen/85339/k-hovnanian-homes/affinity-at-montana-vista
/find-new-homes/arizona/laveen/85339/k-hovnanian-homes/aspire-at-montana-vista
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/pinnacle-ii-at-silverstone
/find-new-homes/arizona/scottsdale/85255/k-hovnanian-homes/summit-ii-at-silverstone
Results I would like to get:
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
A chunk of the HTML (the link I'm after is within the second line of the following elements):
<div class="propertyWrapper clear">
<span class="link-outside"></span>
<div class="propertyCarouselWrapper">
<div class="responsiveImageCarousel enabled" style="touch-action: pan-y; user-select: none; -webkit-user-drag: none; -webkit-tap-highlight-color: rgba(0, 0, 0, 0);">
<div class="prevBtn"></div>
<div class="nextBtn"></div>
<div class="images" data-detail-url="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills">
<ul style="width: 960px; left: 0px;">
<li style="width: 320px;"><img alt="holiday exterior new homes sienna hills usp" src="https://khovcachecdn.azureedge.net/azure/sitefinitylibraries/images/default-source/images/az/aspire-at-sienna-hills/community-thumbnails/holiday-exterior-new-homes-sienna-hills-usp.jpg?sfvrsn=4&build=1019&encoder=wic&useresizingpipeline=true&w=450&h=280&mode=crop"></li>
<li style="width: 320px;"><img alt="carnival exterior new homes sienna hills usp" src="https://khovcachecdn.azureedge.net/azure/sitefinitylibraries/images/default-source/images/az/aspire-at-sienna-hills/community-thumbnails/carnival-exterior-new-homes-sienna-hills-usp.jpg?sfvrsn=4&build=1019&encoder=wic&useresizingpipeline=true&w=450&h=280&mode=crop"></li>
</ul>
</div>
<div class="pagination" style="width: 56px;"><ul><li class="active"></li><li></li></ul></div>
</div>
</div>
<div class="propertyInfoWrapper">
<div class="marker-details-container">
<h3 class="marker-details">New Homes in Buckeye, Arizona</h3>
<div class="spacer"></div>
<h4 class="propertyListingHeader">Aspire at Sienna Hills</h4>
<p class="marker-details">21007 West Almeria Road, Buckeye, AZ 85396</p>
<p class="marker-details marker-status">Final Opportunities</p>
<div class="spacer"></div>
<p class="marker-details marker-price"><span class="bold">Priced from: </span>Mid $200s</p>
<p class="marker-details"><span class="bold">Home type: </span>Single Family Homes</p>
<p class="marker-details marker-amenities"><span class="bold">Amenities: </span>Pool, Hiking Trails, Park</p>
</div>
<div class="community-tag-container">
<a href="/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills#quick-move-in-homes" onclick="KHOV.Analytics.trackEvent('Qmi_Icon_Qmi');">
<div class="community-tag">
<div class="ctaDesc quick-move-in-badge link-inside">Quick Move In Homes</div>
<div class="ctaIcon quick-move-in-badge-icon link-inside"></div>
</div>
</a>
</div>
<a href="#request-info-form-modal" class="open-inline-modal-link" onclick="KHOV.Analytics.trackEvent('Orange_Ri_Request_Info');">
<div class="button orange-color requestInfoButton link-inside" data-urlname="aspire-at-sienna-hills">Request Info</div>
</a>
</div>
</div>
You need to include the featured-results container's id as well as the results container's id. You can combine the two selectors with a comma, which acts as an OR. The latest bs4 supports :not.
#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], #propertyFeaturedResultsContainer .propertyWrapper :not([onclick])[href*=find]
This can also be shortened to
#propertyResultsContainer .propertyWrapper :not([onclick])[href*=find], #propertyFeaturedResultsContainer
But that shortening may be less robust.
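The comma in that selector is CSS's OR: each part is the same inner selector prefixed with one container id. A small sketch that builds the full selector from a list of container ids; combined_selector and INNER are hypothetical names, and the ids are the ones used in the answer:

```python
# Inner part of the answer's selector: anchors inside .propertyWrapper that
# have no onclick attribute and whose href contains "find".
INNER = ".propertyWrapper :not([onclick])[href*=find]"


def combined_selector(container_ids):
    """Prefix INNER with each container id and join with ', ' (CSS OR)."""
    return ", ".join(f"#{cid} {INNER}" for cid in container_ids)


selector = combined_selector(["propertyResultsContainer",
                              "propertyFeaturedResultsContainer"])
print(selector)
# Inside fetch_info this would replace the original selector, e.g.:
# linklist = [a.get("href") for a in soup.select(selector)]
```

Building it this way keeps both parts identical, which is easy to get wrong when editing a long selector by hand.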
You can just check for the desired keyword in the link and print those, and ignore the others:
if __name__ == '__main__':
    url = "https://www.khov.com/find-new-homes/arizona/buckeye"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 10)
    for newlink in fetch_info(url):
        if url.split('/')[-1] in newlink:
            print(newlink)
    driver.quit()
Output:
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/aspire-at-sienna-hills
/find-new-homes/arizona/buckeye/85396/k-hovnanian-homes/affinity-at-verrado
/find-new-homes/arizona/buckeye/85396/four-seasons/k.-hovnanian's-four-seasons-at-victory-at-verrado
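The substring test above keeps any link that contains the city name anywhere in its path. A slightly stricter sketch compares the city segment itself, assuming every link follows the /find-new-homes/&lt;state&gt;/&lt;city&gt;/&lt;zip&gt;/... pattern seen in the results; same_city_links is a hypothetical helper:

```python
from urllib.parse import urlparse


def same_city_links(search_url, links):
    """Keep links whose city path segment equals the search page's last segment.

    Assumes links look like /find-new-homes/<state>/<city>/<zip>/...,
    as in the results listed in the question.
    """
    city = urlparse(search_url).path.rstrip('/').split('/')[-1]
    kept = []
    for link in links:
        parts = urlparse(link).path.split('/')
        # parts[0] is '' (leading slash), so the city sits at index 3
        if len(parts) > 3 and parts[3] == city:
            kept.append(link)
    return kept


# In the __main__ block:
# for newlink in same_city_links(url, fetch_info(url)):
#     print(newlink)
```

This avoids false matches if another city's community slug happened to contain the searched city's name.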
Would list slicing work?
def fetch_info(link):
    driver.get(link)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#community-search-homes .propertyWrapper > a")))
    soup = BeautifulSoup(driver.page_source, "lxml")
    linklist = [item.get("href") for item in soup.select("#community-search-homes .propertyWrapper > a")][:3]
    return linklist
