Web Scraping using Selenium with two drop down menus - python

I am trying to make a web scraper that cycles through two drop-down menus, but I cannot seem to locate the first drop-down box using Selenium. I want to cycle through all the names and years in the drop-down boxes and export a table of all pages and values to a CSV. The webpage is: http://surge.srcc.lsu.edu/s1.html
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://surge.srcc.lsu.edu/s1.html")
element = driver.find_element_by_xpath('//select[@id="storm_name"]')
all_options = element.find_elements_by_tag_name("option")
My error is:
NoSuchElementException: Unable to locate element:
{"method":"xpath","selector":"//select[#id=\"storm_name\"]"}

As @HumphreyTriscuit suggested, I will post this as an answer. Please take the time to mark it as accepted if it solved your problem.
So there you go:
It's because you are not loading the right page to find your element. Your webpage contains an iframe that points to the actual content, and Selenium, as far as I know, does not step into iframes automatically. Load the page the iframe points to instead, and your code should work.
And here is a little bonus: if you are unsure about your XPath, consider using the XPath Checker add-on for Firefox.
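Alternatively, you can keep loading the outer page and switch the driver into the frame before looking for the select element. A minimal sketch, assuming the drop-downs live inside the first iframe on the page (the exact frame locator may differ):
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://surge.srcc.lsu.edu/s1.html")

# assumption: the drop-downs sit inside the first iframe on the page
frame = driver.find_element_by_tag_name("iframe")
driver.switch_to.frame(frame)

element = driver.find_element_by_xpath('//select[@id="storm_name"]')
all_options = element.find_elements_by_tag_name("option")
for option in all_options:
    print(option.get_attribute("value"))

# return to the top-level document when you are done with the frame
driver.switch_to.default_content()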

Related

How can Selenium (Python, Chrome) find web elements visible in dev tools, but not visible in page source?

I need to click the first item in the menu in a webpage with Python3 using Selenium.
I manage to log in and navigate to the required page using Selenium, but there I get stuck: it looks like Selenium can't find any element in the page beyond the very first div in the body.
I tried to find the element by ID, class, xpath, selector... The problem is probably not about that. I thought it could be about an iframe, but the content I need does not seem to be in one.
I guess that the problem is that the element I need to find is visible in the devtools, but not in the page source, so Selenium just can't see it - does this make sense? If so, can this be fixed?
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("my site")
# log-in website and navigate to needed page
# [...]
# find element in page
# this works
first_div = driver.find_element(By.CSS_SELECTOR, "#app-wrapper")
# this does not work
second_div = driver.find_element(By.CSS_SELECTOR, "#app-wrapper > div.layout.flex.flex-col.overflow-x-hidden.h-display-flex.h-flex-direction-column.h-screen")
Edit
The problem is most likely due to a dynamic webpage with parts of the DOM tree attached later on by a script. I downloaded a local version of page.html, removed scripts, and successfully found the sought-after element in the local page with
from selenium import webdriver
from selenium.webdriver.common.by import By
from pathlib import Path
driver = webdriver.Chrome()
html_file = Path.cwd() / "page.html"
driver.get(html_file.as_uri())
my_element = driver.find_element(By.CSS_SELECTOR, "[title='my-title']")
The exact same driver.find_element query won't work on the online page. I'm trying to implement a waiting condition as suggested in Misc08's answer.
I guess that the problem is that the element I need to find is visible in the devtools, but not in the page source, so Selenium just can't see it - does this make sense? If so, can this be fixed?
No, this does not make sense, since Selenium runs a full browser in the background, the same kind of browser you use when you investigate the page with the devtools.
But you have some options to narrow down your problem. The first thing you can do is print the source the webdriver is "seeing" at this moment:
print(driver.page_source)
If you see the elements you are looking for in the page source, then you should try to improve your selector. It is helpful to go down the DOM step by step: look for an upper element in the page tree first; if this works, try to find the next child element, then the next child, and so on. You can check whether Selenium found the element like this:
from selenium.common.exceptions import NoSuchElementException

try:
    myelement = driver.find_element(By.CSS_SELECTOR, 'p.content')
    print("Found :)")
except NoSuchElementException:
    print("Not found :(")
By the way, I think your CSS selector is far too complex; just use one CSS class, not all of them:
second_div = driver.find_element(By.CSS_SELECTOR, "#app-wrapper > div.layout")
But there might be cases where the elements you are looking for are not present in the page source from the beginning. Dynamic webpages are getting more and more popular. In this case parts of the DOM tree are attached later on by a script, so you have to wait for the scripts to execute before you can find these "dynamic" elements. One dirty and unreliable option is to just add a sleep() here. Much better is to use an explicit waiting condition, see https://selenium-python.readthedocs.io/waits.html
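A minimal sketch of such an explicit wait, reusing the [title='my-title'] selector from the question's edit (adjust the selector and the timeout for your page):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("my site")

# wait up to 10 seconds for the script-generated element to be attached to the DOM
my_element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "[title='my-title']"))
)
print(my_element.text)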

Using Selenium to Scrape Java-Heavy Website - Returning None

New coder here. I've been trying for a while now to scrape just one piece of text from a very JavaScript-heavy website using Selenium. Not sure what I am doing wrong at this point.
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://explorer.helium.com/accounts/13pm9juR7WPjAf7EVWgq5EQAaRTppu2EE7ReuEL9jpkHQMJCjn9")
earnings = driver.find_elements_by_class_name('text-base text-gray-600 mb-1 tracking-tight w-full break-all')
print(earnings)
driver.quit()
Image of the element I'm attempting to scrape:
I am trying to scrape the dollar amount in this container so I can eventually use it in a daily report that I am building.
Everything I have tried has resulted in it returning none, even when I try to grab the text from that element.
Here is website link: https://explorer.helium.com/accounts/13pm9juR7WPjAf7EVWgq5EQAaRTppu2EE7ReuEL9jpkHQMJCjn9
You should wait until the JavaScript runs, the page loads, and the elements load. You can set an implicit wait on the driver:
driver.implicitly_wait(10)
You can also create a condition that waits until an element appears. The expected_conditions module defines Selenium wait conditions; this is how we specify the condition to wait on:
wait = WebDriverWait(driver, 10)
wait.until(EC.alert_is_present())
You can use XPath. The XPath of the dollar amount is:
/html/body/div[1]/div/article/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[2]
You can find XPaths with the XPath Finder add-on for Firefox:
https://addons.mozilla.org/en-US/firefox/addon/xpath_finder/
You can also use this XPath:
//*[@id="app"]/article/div[2]/div/div[2]/div/div[2]/div[3]/div[1]/div[1]/div[3]

Python Edge Driver Web Automation Help - Cannot find Xpath

I am new to Python and I am learning how to automate webpages. I understand the basics of using the different locators under the Inspect Element tab to drive my code.
I have written some basic code to skip YouTube ads; however, I am stuck on finding the correct page element to agree to the privacy policy pop-up box on YouTube. I have used ChroPath to try to find the XPath of the element, but there doesn't appear to be one. I was unable to locate any other page elements, and I was wondering if anyone has any ideas on how I can automate the click of the 'I Agree' button?
Python Code:
from msedge.selenium_tools import Edge, EdgeOptions
options = EdgeOptions()
options.use_chromium = True
driver = Edge(options=options)
driver.get('http://www.youtube.com')
def agree():
    while True:
        try:
            driver.find_element_by_xpath('/html/body/ytd-app/ytd-popup-container/paper-dialog/yt-upsell-dialog-renderer/div/div[3]/div[1]/yt-button-renderer/a/paper-button').click()
            driver.find_elements_by_xpath('.<span class="RveJvd snByac">I agree</span>').click()
        except:
            continue

if __name__ == '__main__':
    agree()
YouTube Inspect Element screenshot is below:
I don't know if the xpath in your code is right as I can't see the whole html structure of the page. But you can use F12 dev tools in Edge to find the xpath and to check if the xpath you find is right:
Open the page you want to automate and open F12 dev tools in Edge.
Use Ctrl+Shift+C and click the element you want to locate and find the html code of the element.
Right click the html code and select Copy -> Copy XPath.
Then you can try to use the xpath you copy.
Besides, find_elements_by_xpath(xpath) returns a list of all matching elements. I think you need to specify which element of the list to click. You need to index into the list with [x], for example:
driver.find_elements_by_xpath('.<span class="RveJvd snByac">I agree</span>')[0].click()
When inspecting the page elements I had overlooked an iframe element. After doing some digging I found that I had to tell the Selenium driver to switch from the main page to the iframe. I added the following code and now the click on the 'I Agree' button is automated:
frame_element = driver.find_element_by_id('iframe')
driver.switch_to.frame(frame_element)
agree2 = driver.find_element_by_xpath("/html/body/div/c-wiz/div[2]/div/div/div/div/div[2]/form/div/span/span").click()
driver.switch_to.default_content()
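If the consent dialog takes a moment to appear, the switch can fail intermittently. A variation of the same idea, sketched with an explicit wait (the 'iframe' id and the XPath are taken from the snippet above and may change as YouTube updates the dialog):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
# wait until the consent iframe exists, then switch into it in one step
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, 'iframe')))
# wait for the 'I Agree' button to become clickable, then click it
wait.until(EC.element_to_be_clickable(
    (By.XPATH, "/html/body/div/c-wiz/div[2]/div/div/div/div/div[2]/form/div/span/span")
)).click()
driver.switch_to.default_content()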

How to get HTML code from elements which don't appear in the source code with selenium?

I'm working on an app that uses web scraping, but I'm having a hard time figuring out how to get some data from a web page. I can see the info that I'm looking for when I use "inspect element" in Firefox:
The thing is that it doesn't appear in the HTML source of the page, which I can actually get using Selenium. The data I'm looking for is obviously database-driven, and I'm stuck right there. Is there a way to scrape this out with Selenium?
This is the url btw: http://2ez.gg/#gg?name=Doombag&server=lan
You should probably be trying to scrape http://lan.op.gg/summoner/userName=Doombag instead, http://2ez.gg/#gg?name=Doombag&server=lan contains an iframe which is why you can't find 55% in the document body.
The reason is that the data you want to retrieve is contained inside an iframe, so Selenium cannot access it directly.
Try the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
URL = 'http://2ez.gg/#gg?name=Doombag&server=lan'
driver.get(URL)
driver.switch_to.frame('iframe-content')
elem = driver.find_element_by_css_selector('.WinRatioGraph div.Text')
print(elem.text)
Output: 55%

Why does trying to click with selenium brings up "ElementNotInteractableException"?

I'm trying to click through to the webpage "https://2018.navalny.com/hq/arkhangelsk/" from the website's main page. However, I get this error:
selenium.common.exceptions.ElementNotInteractableException: Message:
There's nothing after "Message:"
My code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox()
browser.get('https://2018.navalny.com/')
time.sleep(5)
linkElem = browser.find_element_by_xpath("//a[contains(@href,'arkhangelsk')]")
type(linkElem)
linkElem.click()
I think XPath is necessary for me because, ultimately, my goal is to click not on a single link but on 80 links on this webpage. I've already managed to print all the relevant links using this:
driver.find_elements_by_xpath("//a[contains(@href,'hq')]")
However, for starters, I'm trying to make it click at least a single link.
Thanks for your help,
The best way to figure out issues like this is to look at the page source using the developer tools of your preferred browser. For instance, when I go to this page, open the HTML tab of Firebug, and look for //a[contains(@href,'arkhangelsk')], I see this:
So the link is located within a div which is currently not visible (in fact, the entire sub-section starting from the div with id="hqList" is hidden). Selenium will not allow you to click on invisible elements, although it will allow you to inspect them. Hence getting the element works, but clicking on it does not.
What you do with it depends on what your expectations are. In this particular case it looks like you need to click on <label class="branches-map__toggle-label" for="branchesToggle">Список</label> to get that link visible. So add this:
browser.find_element_by_link_text("Список").click()
after that you can click on any links in the list.
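Since the ultimate goal is to click all 80 links, here is a sketch of the full loop, assuming that clicking "Список" reveals the list each time and that you navigate back to the main page after visiting each link (element references go stale after navigation, so the links are re-fetched on every pass):
from selenium import webdriver
import time

browser = webdriver.Firefox()
browser.get('https://2018.navalny.com/')
time.sleep(5)

# the hidden links are already in the DOM, so we can count them up front
links = browser.find_elements_by_xpath("//a[contains(@href,'hq')]")
for i in range(len(links)):
    # reveal the list again, since it is hidden after each page load
    browser.find_element_by_link_text("Список").click()
    browser.find_elements_by_xpath("//a[contains(@href,'hq')]")[i].click()
    # ... scrape the opened regional page here ...
    browser.back()
    time.sleep(2)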
