My goal is to scrape definitions of words in Python. To begin with, I am trying to get just the first definition of the word "assist", which should be "to help". I am using dictionary.cambridge.org.
# the web driver goes to the page
driver.get("https://dictionary.cambridge.org/dictionary/english/assist")
# give the page time to load
time.sleep(4)
# click "accept cookies"
driver.find_element_by_xpath("/html[@class='i-amphtml-singledoc i-amphtml-standalone']/body[@class='break default_layout amp-mode-mouse']/div[@id='onetrust-consent-sdk']/div[@id='onetrust-banner-sdk']/div[@class='ot-sdk-container']/div[@class='ot-sdk-row']/div[@id='onetrust-button-group-parent']/div[@id='onetrust-button-group']/div[@class='banner-actions-container']/button[@id='onetrust-accept-btn-handler']").click()
Up to this point, everything is working correctly. However, when I try to print the first definition using find_element_by_xpath, I get a NoSuchElementException. I'm pretty familiar with Selenium and have scraped web pages hundreds of times before, but on this page I don't know what I'm doing wrong. Here's the code I am using:
print(driver.find_element_by_xpath("/html[#class='i-amphtml-singledoc i-amphtml-standalone']/body[#class='break default_layout amp-mode-mouse']/div[#class='cc fon']/div[#class='pr cc_pgwn']/div[#class='x lpl-10 lpr-10 lpt-10 lpb-25 lmax lp-m_l-20 lp-m_r-20']/div[#class='hfr-m ltab lp-m_l-15']/article[#id='page-content']/div[#class='page']/div[#class='pr dictionary'][1]/div[#class='link']/div[#class='pr di superentry']/div[#class='di-body']/div[#class='entry']/div[#class='entry-body']/div[#class='pr entry-body__el'][1]/div[#class='pos-body']/div[#class='pr dsense dsense-noh']/div[#class='sense-body dsense_b']/div[#class='def-block ddef_block ']/div[#class='ddef_h']/div[#class='def ddef_d db']").text())
To print the scraped definitions of words, you can use either of the following locator strategies:
Using xpath and text attribute:
print(driver.find_element_by_xpath("//span[contains(#class, 'epp-xref dxref')]//following::div[1]").text)
Using xpath and innerText:
print(driver.find_element_by_xpath("//span[contains(#class, 'epp-xref dxref')]//following::div[1]").get_attribute("innerText"))
Console Output:
to help:
Instead of absolute XPaths, opt for relative XPaths. You can refer to this link.
I tried with the code below and it retrieved the data.
driver.get("https://dictionary.cambridge.org/dictionary/english/assist")
print(driver.find_element_by_xpath("(//div[#class='ddef_h'])[1]/div").get_attribute("innerText"))
to help:
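If the page renders slowly, the relative locator above can also be combined with an explicit wait instead of time.sleep(). This is a minimal sketch, assuming the same Selenium 3-style API and page structure used above:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://dictionary.cambridge.org/dictionary/english/assist")

# Wait up to 10 seconds for the first definition block to render,
# then read its visible text.
definition = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.XPATH, "(//div[@class='ddef_h'])[1]/div"))
).text
print(definition)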
Here, I want to scrape a website called "fundsnetservices.com." Specifically, I want to grab the text below each program — it's about a paragraph's worth of text.
Using the Google Chrome Inspect method, I was able to pull this...
'/html/body/div[3]/div/div/div[1]/div/p[2]/text()'
... as the xpath. However, every time I print the text out, it returns [ ]. Why might this be?
import urllib.request
from lxml import etree

response = urllib.request.urlopen('http://www.fundsnetservices.com/searchresult/30/International-Grants-&-Funders/18.html')
tree = etree.HTML(response.read().decode('utf-16'))
text = tree.xpath('/html/body/div[3]/div/div/div[1]/div/p[2]/text()')
It seems your XPath is returning whitespace-only text nodes. Correct your XPath with:
//p[@class="tdclass"]/text()[3]
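To illustrate how whitespace-only text nodes show up, here is a minimal sketch using normalize-space() to get one clean string instead. The class name tdclass is taken from the answer above; the markup itself is made up for the example:
from lxml import etree

# Made-up markup: the <p> contains whitespace-only text nodes
# around the <br/> tags, plus the text we actually want.
html = """<p class="tdclass">
    <br/> Some funder description text. <br/>
</p>"""
tree = etree.HTML(html)

# Raw text() returns every text node, including pure whitespace.
print(tree.xpath('//p[@class="tdclass"]/text()'))

# normalize-space() collapses the whitespace into one clean string.
print(tree.xpath('normalize-space(//p[@class="tdclass"])'))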
I'm trying to extract some odds from a page using Selenium ChromeDriver, since the data is dynamic. The "find elements by XPath expression" approach usually works on these kinds of websites for me, but this time it can't seem to find the element in question, nor any element that belongs to the section of the page showing the relevant odds.
I'm probably making a simple error - if anyone has time to check the page out I'd be very grateful! Sample page: Nordic Bet NHL Odds
driver.get("https://www.nordicbet.com/en/odds#?cat=®=&sc=50&bgi=36")
time.sleep(5)
dayElems = driver.find_elements_by_xpath("//div[@class='ng-scope']")
print(len(dayElems))
Output:
0
It was a problem I used to face...
The element is in another frame whose id is SportsbookIFrame. You need to switch into that frame:
driver.switch_to.frame("SportsbookIFrame")
dayElems = driver.find_elements_by_xpath("//div[@class='ng-scope']")
len(dayElems)
Output:
26
To search for the iframes themselves, they can be located like any other element:
iframes = driver.find_elements_by_xpath("//iframe")
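Once you are done scraping inside the frame, switch back to the top-level document before locating anything outside it; otherwise later lookups will fail the same way. A minimal sketch, assuming the frame id above:
# Enter the sportsbook frame, scrape, then return to the main page.
driver.switch_to.frame("SportsbookIFrame")
day_elems = driver.find_elements_by_xpath("//div[@class='ng-scope']")
print(len(day_elems))

# Back to the top-level document for any further scraping.
driver.switch_to.default_content()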
I'm trying to use Selenium and Python to automatically get part of the HTML content.
I use XPath in the ways below to pick out the price for a specified flight number, but I always get an "unable to locate" failure.
Could anyone shed some light on it?
element_price = driver.find_element_by_xpath("//div[@id='flight_MU5401']")
element_price.find_element_by_xpath(".//span[@class='base_price02']")
Here's the HTML.
If I were to guess, I would say that you are probably getting that error because the element isn't loaded when your code runs. It's probably a slower loading page. You should try adding a wait and see if that helps.
You can also simplify your locator by using a CSS selector and only scraping the page once.
price = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#flight_MU5401 span.base_price02"))).text
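For completeness, the explicit-wait helpers used above come from these imports (standard Selenium modules, not specific to this site):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC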
I am trying to find an element on a web page using Selenium. When I right-click on the web element and inspect it, I can find the HTML code in the console. However, when I copy the XPath of the same element from the console and try to execute it in FirePath, it says "No matching nodes". Why is this, and how can I fix it?
Here is the HTML of the element.
<input id="mobile" type="text" onchange="javascript:dispLocMob(this);
"onkeydown="javascript:dispLocMob(this);
"onkeyup="javascript:dispLocMob(this);" value="" maxlength=
"10" placeholder="Mobile Number" name="mobile">
And this is what I am doing:
receiverMobileElement = browser.find_element_by_xpath('//*[@id="mobile"]')
Please help.
Why use XPath when you can find the element by its unique id?
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('http://example.com')
target_element = browser.find_element_by_id('mobile')
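If the field is rendered late by JavaScript, the same id lookup can be wrapped in an explicit wait. A minimal sketch; the phone number typed in is made up for illustration:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the input to be present before using it.
mobile_input = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.ID, "mobile"))
)
mobile_input.send_keys("5551234567")  # hypothetical value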
I always say screw Firebug, use the Chrome console.
From the looks of it, the best way to get this particular element would be one of the following two options (given that id="mobile" is unique):
By.cssSelector("#mobile");
By.xpath("//input[#id = 'mobile']");
To use the Chrome console:
Start Chrome
Right click and inspect your element.
You should see the HTML code and the console appear below it. (If you cannot see the console, click the three vertical dots and select 'Show Console'.)
Now, from here you can do some awesome stuff! You can search for the elements you are trying to identify and see if they can be found. For example, to see if By.cssSelector("#mobile") will work, type this:
document.querySelectorAll("#mobile")
For the XPath, type this:
$x("//input[@id = 'mobile']")
Just a suggestion... don't be concerned about Firebug if your code works, and seriously try out this way in Chrome!
I'm new to Selenium with Python. I'm trying to scrape some data, but I can't figure out how to parse the output of commands like this:
driver.find_elements_by_css_selector("div.flightbox")
I tried googling for a tutorial, but I found nothing for Python.
Could you give me a hint?
find_elements_by_css_selector() returns a list of WebElement instances. Each web element has a number of methods and attributes available. For example, to get the inner text of an element, use .text:
for element in driver.find_elements_by_css_selector("div.flightbox"):
    print(element.text)
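Besides .text, a few other commonly used WebElement members are shown below; this is a generic illustration rather than anything specific to the flightbox page:
for element in driver.find_elements_by_css_selector("div.flightbox"):
    print(element.text)                    # visible inner text
    print(element.get_attribute("class"))  # read any HTML attribute
    print(element.is_displayed())          # True if the element is visible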
You can also make a context-specific search to find other elements inside the current element. Taking into account that I know what site you are working with, here is example code to get the departure and arrival times for the first-way flight in a result box:
for result in driver.find_elements_by_css_selector("div.flightbox"):
    departure_time = result.find_element_by_css_selector("div.departure p.p05 strong").text
    arrival_time = result.find_element_by_css_selector("div.arrival p.p05 strong").text
    print([departure_time, arrival_time])
Make sure you study the Getting Started, Navigating, and Locating Elements documentation pages.