I have some trouble finding a element here Link
I want to scrape the names of the matches using:
WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH, '/html/body/div[7]/div[1]/div/div/div[4]/div/div/main/div[2]/div/div/div/div//div/div/div/article/main/a/span')))
(I don't like using the XPath, but otherwise I'd also get the bets that weren't just full time result bets)
But when I run this it returns an empty list.
I have tried figuring out wether the match names are within an iframe or something but I can't figure it out. Does someone know how I can scrape these elements?
N.B. I have checked multiple times whether the XPath is actually in the HTML and it is.
so the code you are using is to wait the page to load until the xpath is found. I wrote the code below and it works. It just prints dates too, you need to adjust. However I just run it so I am confident it works. Ensure you load all the dependencies in the top. It works with class and not xpath.
driver.get("https://sports.williamhill.com/betting/en-gb/football/competitions/OB_TY295/English-Premier-League/matches/OB_MGMB/Match-Betting")
WebDriverWait(driver, 10).until(
lambda x: x.find_element_by_class_name('sp-o-market__title').is_displayed())
out = driver.find_elements_by_class_name('sp-o-market__title')
for item in out:
item = item.get_attribute('innerHTML')
item = item.split('<span>')[1]
item = item.split("</span>")[0]
print(item)
Produces:
Arsenal v Norwich
Brentford v Brighton
Related
Basic concept I know:
find_element = find single elements. We can use .text or get.attribute('href') to make the element can be readable. Since find_elements is a list, we can't use .textor get.attribute('href') otherwise it shows no attribute.
To scrape information to be readable from find_elements, we can use for loop function:
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
for i in vegetables_search:
print(i.text)
Here is my problem, when I use find_element, it shows the same result. I searched the problem on the internet and the answer said that it's because using find_element would just show a single result only. Here is my code which hopes to grab different urls.
links.append(driver.find_element(By.XPATH, ".//a[#rel='noopener']").get_attribute('href'))
But I don't know how to combine the results into pandas. If I print these codes, links variable prints the same url on the csv file...
vegetables_search = driver.find_elements(By.CLASS_NAME, "product-brief-wrapper")
Product_name =[]
links = []
for search in vegetables_search:
Product_name.append(search.find_element(By.TAG_NAME, "h4").text)
links.append(driver.find_element(By.XPATH, ".//a[#rel='noopener']").get_attribute('href'))
#use panda modules to export the information
df = pd.DataFrame({'Product': Product_name,'Link': links})
df.to_csv('name.csv', index=False)
print(df)
Certainly, if I use loop function particularly, it shows different links.(That's mean my Xpath is correct(!?))
product_link = (driver.find_elements(By.XPATH, "//a[#rel='noopener']"))
for i in product_link:
print(i.get_attribute('href'))
My questions:
Besides using for loop function, how to make find_elements becomes readable? Just like find_element(By.attribute, 'content').text
How to go further step for my code? I cannot print out different urls.
Thanks so much. ORZ
This is the html code which's inspected from the website:
This line:
links.append(driver.find_element(By.XPATH, ".//a[#rel='noopener']").get_attribute('href'))
should be changed to be
links.append(search.find_element(By.XPATH, ".//a[#rel='noopener']").get_attribute('href'))
driver.find_element(By.XPATH, ".//a[#rel='noopener']").get_attribute('href') will always search for the first element on the DOM matching .//a[#rel='noopener'] XPath locator while you want to find the match inside another element.
To do so you need to change WebDriver driver object with WebElement search object you want to search inside, as shown above.
I am very new to this and i have tried to look for the answer to this but unable to find any.
I am using Selenium+chromedriver, trying to monitor some items I am interested in.
Example:
a page with 20 items in a list.
Code:
#list of items on the page
search_area = driver.find_elements_by_xpath("//li[#data-testid='test']")
search_area[19].find_element_by_xpath("//p[#class='sc-hKwDye name']").text
this returns the name of item[0]
search_area[19].find_element_by_css_selector('.name').text
this returns the name of item[19]
why is xpath looking at the parent html?
I want xpath to return the name of item within the WebElement /list item. is it possible?
found the answer, add a . in front
hope this is gonna help someone new like me in the future.
from
search_area[19].find_element_by_xpath("//p[#class='sc-hKwDye name']").text
to
search_area[19].find_element_by_xpath(".//p[#class='sc-hKwDye name']").text
What you are passing in find_element_by_xpath("//p[#class='sc-hKwDye name']") is relative Xpath. You can pass the full Xpath to get the desired result.
Scrapping links should be a simple feat, usually just grabbing the src value of the a tag.
I recently came across this website (https://sunteccity.com.sg/promotions) where the href value of a tags of each item cannot be found, but the redirection still works. I'm trying to figure out a way to grab the items and their corresponding links. My typical python selenium code looks something as such
all_items = bot.find_elements_by_class_name('thumb-img')
for promo in all_items:
a = promo.find_elements_by_tag_name("a")
print("a[0]: ", a[0].get_attribute("href"))
However, I can't seem to retrieve any href, onclick attributes, and I'm wondering if this is even possible. I noticed that I couldn't do a right-click, open link in new tab as well.
Are there any ways around getting the links of all these items?
Edit: Are there any ways to retrieve all the links of the items on the pages?
i.e.
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
...
Edit:
Adding an image of one such anchor tag for better clarity:
By reverse-engineering the Javascript that takes you to the promotions pages (seen in https://sunteccity.com.sg/_nuxt/d4b648f.js) that gives you a way to get all the links, which are based on the HappeningID. You can verify by running this in the JS console, which gives you the first promotion:
window.__NUXT__.state.Promotion.promotions[0].HappeningID
Based on that, you can create a Python loop to get all the promotions:
items = driver.execute_script("return window.__NUXT__.state.Promotion;")
for item in items["promotions"]:
base = "https://sunteccity.com.sg/promotions/"
happening_id = str(item["HappeningID"])
print(base + happening_id)
That generated the following output:
https://sunteccity.com.sg/promotions/724
https://sunteccity.com.sg/promotions/731
https://sunteccity.com.sg/promotions/751
https://sunteccity.com.sg/promotions/752
https://sunteccity.com.sg/promotions/754
https://sunteccity.com.sg/promotions/280
https://sunteccity.com.sg/promotions/764
https://sunteccity.com.sg/promotions/766
https://sunteccity.com.sg/promotions/762
https://sunteccity.com.sg/promotions/767
https://sunteccity.com.sg/promotions/732
https://sunteccity.com.sg/promotions/733
https://sunteccity.com.sg/promotions/735
https://sunteccity.com.sg/promotions/736
https://sunteccity.com.sg/promotions/737
https://sunteccity.com.sg/promotions/738
https://sunteccity.com.sg/promotions/739
https://sunteccity.com.sg/promotions/740
https://sunteccity.com.sg/promotions/741
https://sunteccity.com.sg/promotions/742
https://sunteccity.com.sg/promotions/743
https://sunteccity.com.sg/promotions/744
https://sunteccity.com.sg/promotions/745
https://sunteccity.com.sg/promotions/746
https://sunteccity.com.sg/promotions/747
https://sunteccity.com.sg/promotions/748
https://sunteccity.com.sg/promotions/749
https://sunteccity.com.sg/promotions/750
https://sunteccity.com.sg/promotions/753
https://sunteccity.com.sg/promotions/755
https://sunteccity.com.sg/promotions/756
https://sunteccity.com.sg/promotions/757
https://sunteccity.com.sg/promotions/758
https://sunteccity.com.sg/promotions/759
https://sunteccity.com.sg/promotions/760
https://sunteccity.com.sg/promotions/761
https://sunteccity.com.sg/promotions/763
https://sunteccity.com.sg/promotions/765
https://sunteccity.com.sg/promotions/730
https://sunteccity.com.sg/promotions/734
https://sunteccity.com.sg/promotions/623
You are using a wrong locator. It brings you a lot of irrelevant elements.
Instead of find_elements_by_class_name('thumb-img') please try find_elements_by_css_selector('.collections-page .thumb-img') so your code will be
all_items = bot.find_elements_by_css_selector('.collections-page .thumb-img')
for promo in all_items:
a = promo.find_elements_by_tag_name("a")
print("a[0]: ", a[0].get_attribute("href"))
You can also get the desired links directly by .collections-page .thumb-img a locator so that your code could be:
links = bot.find_elements_by_css_selector('.collections-page .thumb-img a')
for link in links:
print(link.get_attribute("href"))
I'm trying to extract some odds from a page using Selenium ChromeDriver, since the data is dynamic. The "find elements by XPath expression" usually works with these kind of websites for me, but this time, it can't seem to find the element in question, nor any element that belong to the section of the page that shows the relevant odds.
I'm probably making a simple error - if anyone has time to check the page out I'd be very grateful! Sample page: Nordic Bet NHL Odds
driver.get("https://www.nordicbet.com/en/odds#?cat=®=&sc=50&bgi=36")
time.sleep(5)
dayElems = driver.find_elements_by_xpath("//div[#class='ng-scope']")
print(len(dayElems))
Output:
0
It was a problem I used to face...
It is in another frame whose id is SportsbookIFrame. You need to navigate into the frame:
driver.switch_to_frame("SportsbookIFrame")
dayElems = driver.find_elements_by_xpath("//div[#class='ng-scope']")
len(dayElems)
Output:
26
For searching iframes, they are usual elements:
iframes = driver.find_elements_by_xpath("//iframe")
So to find a word in a certain page i decided to do this:
bodies = browser.find_elements_by_xpath('//*[self::span or self::strong or self::b or self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6 or self::p]')
for i in bodies:
bodytext = i.text
if "description" in bodytext or "Description" in bodytext:
print("found")
print(i.text)
I believe the code itself is fine, but ultimately I get nothing. To be honest I am not sure what is happening, it just seems like it doesn't work.
Here is an example of what website it would be ran on. Here is some of the page source:
<h2>product description</h2>
EDIT: It may be because the element is not in view, but I have tried to fix it with that in mind. Still, no luck.
Inspect the element and check if its inside the iframe,
if it does, you need switch to the frame first
driver.switch_to_frame(frame_id)
You could try waiting for a few ms to ensure the page has rendered and the expected element is visible with:
time.sleep(1)
Alternatively, try using IDs or custom data attributes to target the elements instead of xpath. Something like find_by_css_selector('.my_class') may work better.