I am trying to access an iframe within an iframe using Selenium, Python, and BS4
from bs4 import BeautifulSoup
from selenium import webdriver
import time

driver = webdriver.Firefox()
driver.implicitly_wait(10)
driver.get('http://myurl.com')
try:
    time.sleep(4)
    # grab the first (outer) iframe
    iframe = driver.find_elements_by_tag_name('iframe')[0]
    driver.switch_to_default_content()
    driver.switch_to_frame(iframe)
    driver.switch_to_default_content()
    # try to find the nested iframe
    driver.find_elements_by_tag_name('iframe')[0]
    output = driver.page_source
    print(output)
finally:
    driver.quit()
Within the returned source there appear to be two more iframes. How would I access those? I attempted this in the code above without success.
switch_to_default_content() returns you to the top of the document. What was happening is that you switched into the first iframe, switched back to the top of the document, and then tried to find the second iframe. Selenium can't find the second iframe because it's inside the first one.
If you remove the second switch_to_default_content() you should be fine:
iframe = driver.find_elements_by_tag_name('iframe')[0]
driver.switch_to.default_content()
driver.switch_to.frame(iframe)
driver.find_elements_by_tag_name('iframe')[0]
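To then reach the iframe nested inside the first one, keep descending instead of returning to the top. A minimal sketch, assuming (as in the question) that the frame you want is the first iframe at each level:
# switch into the outer iframe
outer = driver.find_elements_by_tag_name('iframe')[0]
driver.switch_to.frame(outer)
# only now are the nested iframes searchable; descend one more level
inner = driver.find_elements_by_tag_name('iframe')[0]
driver.switch_to.frame(inner)
print(driver.page_source)  # source of the innermost frame
# return to the top-level document when finished
driver.switch_to.default_content()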
Related
I am trying to scrape the data from the table ETH ZERO SEK at the given URL, but I can't make it work. Does anyone have advice on how I can get it to work?
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK'
driver = webdriver.Chrome()
driver.get(url)
element = driver.find_element(By.XPATH, './/*[@id="detailviewDiv"]/table/tbody/tr[1]/td/div')
What happens?
The content you are looking for is provided via an iframe, so your XPath won't work.
How to fix?
Option #1
Change your URL to https://mdweb.ngm.se/detailview.html?locale=sv_SE&symbol=ETH%20ZERO%20SEK and request the content directly.
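In code, that is just a direct get() against the iframe's own URL:
# load the iframe's document directly, bypassing the parent page
driver.get('https://mdweb.ngm.se/detailview.html?locale=sv_SE&symbol=ETH%20ZERO%20SEK')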
Option #2
Grab the src of the iframe from your original URL:
driver.get('https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK')
Get the src of the iframe that holds your table:
iframe = driver.find_element(By.XPATH, '//iframe').get_attribute("src")
Load the iframe directly:
driver.get(iframe)
Wait until the tbody of the table is located and store it in element:
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
Assign values from the cells to variables by splitting the element's text:
volym = element.text.split('\n')[-3]
vwap = element.text.split('\n')[-2]
Note that the waits require from selenium.webdriver.support.ui import WebDriverWait and from selenium.webdriver.support import expected_conditions as EC.
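Stitched together, Option #2 as a single runnable sketch (all locators taken from the steps above):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK')
# read the src of the iframe that holds the table, then open it directly
iframe = driver.find_element(By.XPATH, '//iframe').get_attribute("src")
driver.get(iframe)
# wait for the tbody, then pull the last cells out of its text
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
volym = element.text.split('\n')[-3]
vwap = element.text.split('\n')[-2]
print(volym, vwap)
driver.quit()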
I am trying to scrape information from a website where the information is not immediately present. When you click a certain button, the page begins to load new content at the bottom, and after it finishes loading, red text shows up reading "Assists (At Least)". I am able to find the first button, "Go to Prop builder", which doesn't immediately show up on the page; but after the script clicks that button, it times out when trying to find the "Assists (At Least)" text, even though the script sleeps and the text is present on the screen.
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('https://www.bovada.lv/sports/basketball/nba')
# this part succeeds
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, "//span[text()='Go to Prop builder']")
    )
)
element.click()
time.sleep(5)
# this part fails
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[text()='Assists (At Least)']")
    )
)
time.sleep(2)
innerHTML = driver.execute_script('return document.body.innerHTML')
driver.quit()
soup = BeautifulSoup(innerHTML, 'html.parser')
The problem is that the "Assists" element is inside an iframe. You need to switch to the frame like this:
frame = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME,"player-props-frame")))
driver.switch_to.frame(frame)
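Putting it together, the order of operations would be as follows (locators reused from the question and the snippet above):
# click the button that loads the prop builder
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//span[text()='Go to Prop builder']"))
).click()
# switch into the frame that hosts the newly loaded content
frame = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "player-props-frame"))
)
driver.switch_to.frame(frame)
# the wait that previously timed out now runs inside the frame
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located((By.XPATH, "//*[text()='Assists (At Least)']"))
)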
Increase the timeout to confirm that the timeout provided is long enough; you can also confirm this in debug mode. If the issue still persists, check whether the "Assists (At Least)" element falls under any frame.
You can also share the DOM and the exact error message if the issue is not resolved.
I have a couple of suggestions you could try:
Make sure that the content loaded at the bottom of the page is not in a frame. If it is, you need to switch to that particular frame.
Check that the XPath is correct; try matching it from the Developer Console.
Inspect the element in the browser; once the Developer Console is open, press CTRL+F and try your XPath. If it doesn't highlight anything, check the frames.
Check if there are any iframes on the page: search for "iframe" in the page source, and if the field you are looking for sits inside one, switch to that frame first.
driver.switch_to.frame("name of the iframe")
Try adding retry logic with a timeout, and use a refresh button if there is one on the page:
st = time.time()
while st + 180 > time.time():
    try:
        element2 = WebDriverWait(driver, 6).until(
            EC.visibility_of_element_located(
                (By.XPATH, "//*[text()='Assists (At Least)']")
            )
        )
        break  # found it -- stop retrying
    except TimeoutException:  # from selenium.common.exceptions import TimeoutException
        pass
The content you want is in an iframe. You can access it by switching to it first, like this:
iframe = driver.find_element_by_css_selector('iframe[class="player-props-frame"]')
driver.switch_to.frame(iframe)
Round brackets are the issue here (at least in some cases). If possible, use the contains() selector:
//*[contains(text(),'Assists ') and contains(text(),'At Least')]
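Dropped into the failing wait from the question, that locator looks like:
element2 = WebDriverWait(driver, 6).until(
    EC.visibility_of_element_located(
        (By.XPATH, "//*[contains(text(),'Assists ') and contains(text(),'At Least')]")
    )
)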
I would like to write a Python program that automatically downloads historical stock data from a web page. The corresponding HTML code of the element I would like to select is shown in the following picture:
There are two iframes. One is inside the other. I switch to the second iframe, but the element I would like to click can't be found. I get the following error: "Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id=":cu"]"} (Session info: chrome=75.0.3770.100)"
from selenium import webdriver
import ctypes  # standard library, used for the Windows message box
import time

driver = webdriver.Chrome()
driver.get("https://www.dukascopy.com/trading-tools/widgets/quotes/historical_data_feed")
driver.maximize_window()

## Give time for iframe to load ##
time.sleep(1)

# get the list of iframes present on the web page using tag "iframe"
seq = driver.find_elements_by_tag_name('iframe')
print("No of frames present in the web page are: ", len(seq))

# switch to the (presumably) correct iframe
driver.switch_to.default_content()
iframe = driver.find_elements_by_tag_name('iframe')[1]
driver.switch_to.frame(iframe)
driver.implicitly_wait(5)

elem = driver.find_element_by_id(':cu')
elem.click()

ctypes.windll.user32.MessageBoxW(0, "Test", "Test MsgBox", 1)
driver.close()
If my code were correct, the element "EUR/TRY" in the list would be selected.
There are 4 iframes in total.
The table you want to interact with is inside iframe[src^='https://freeserv'], and its parent iframe is widget-container. You have to switch into them one by one, like this:
Code:
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
driver.maximize_window()
driver.get("https://www.dukascopy.com/trading-tools/widgets/quotes/historical_data_feed")
# switch into the outer iframe, then into the nested one
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "widget-container")))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[src^='https://freeserv']")))
# locate the EUR/TRY checkbox, scroll to it, and click it
check_Box = wait.until(EC.visibility_of_element_located((By.XPATH, "//strong[text()='EUR/TRY']/../preceding-sibling::span/span")))
ActionChains(driver).move_to_element(check_Box).perform()
check_Box.click()
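One note on the design: after these two switches, every subsequent lookup runs inside the nested iframe. When you need to interact with the main document again, switch back out:
# leave the nested iframes and search the top-level document again
driver.switch_to.default_content()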
I am trying to click on the "chercher" button on the left of the page (middle).
url = "https://www.fpjq.org/repertoires/repertoire-des-medias/"
driver = webdriver.Firefox()
driver.get(url)
time.sleep(2)
driver.find_element_by_xpath('//*[@id="recherche"]/input[3]').click()
However, it can't find the element. I copy-pasted the XPath, so I am not sure why it's not working.
Thanks.
That's because the required button is located inside an iframe; to be able to click it, you need to switch to that iframe first:
url = "https://www.fpjq.org/repertoires/repertoire-des-medias/"
driver = webdriver.Firefox()
driver.get(url)
time.sleep(2)
driver.switch_to.frame(driver.find_element_by_tag_name("iframe"))
driver.find_element_by_xpath('//*[@id="recherche"]/input[3]').click()
Also note that using time.sleep() is not good practice. You can implement an explicit wait (WebDriverWait) instead.
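A sketch of the same flow with explicit waits in place of time.sleep() (selectors taken from the answer above):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("https://www.fpjq.org/repertoires/repertoire-des-medias/")
wait = WebDriverWait(driver, 10)
# wait for the iframe and switch into it in one step
wait.until(EC.frame_to_be_available_and_switch_to_it((By.TAG_NAME, "iframe")))
# wait until the button is clickable, then click it
wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="recherche"]/input[3]'))).click()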
I'm trying to crawl the website http://everydayhealth.com. However, I found that the page is rendered dynamically: when I click the "More" button, new news items load at the bottom of the page. Clicking the button through splinter, though, doesn't make browser.html update to the current HTML content. Is there a way to get the newest HTML source, using either splinter or selenium? My code in splinter is as follows:
from splinter import Browser
browser = Browser()
browser.visit('http://everydayhealth.com')
browser.click_link_by_text("More")
print(browser.html)
Based on #Louis's answer, I rewrote the program as follows:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")
more_xpath = '//a[@class="btn-more"]'
more_btn = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_xpath(more_xpath))
more_btn.click()
more_news_xpath = '(//a[@href="http://www.everydayhealth.com/recipe-rehab/5-herbs-and-spices-to-intensify-flavor.aspx"])[2]'
WebDriverWait(driver, 5).until(lambda driver: driver.find_element_by_xpath(more_news_xpath))
print(driver.execute_script("return document.documentElement.outerHTML;"))
driver.quit()
However, in the output text, I still couldn't find the text in the updated page. For example, when I search "Is Milk Your Friend or Foe?", it still returns nothing. What's the problem?
With Selenium, assuming that driver is your initialized WebDriver object, this will give you the HTML that corresponds to the state of the DOM at the time you make the call:
driver.execute_script("return document.documentElement.outerHTML;")
The return value is a string so you could do:
print(driver.execute_script("return document.documentElement.outerHTML;"))
When I use Selenium for tasks like this, I know browser.page_source does get updated.
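For reference, a minimal end-to-end sketch combining the pieces above (the XPaths are the ones from the rewritten program, so treat them as assumptions about the page):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")
# click "More" once it is clickable
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//a[@class="btn-more"]'))
).click()
# wait for one of the newly loaded links before reading the DOM
WebDriverWait(driver, 10).until(EC.presence_of_element_located(
    (By.XPATH, '(//a[@href="http://www.everydayhealth.com/recipe-rehab/5-herbs-and-spices-to-intensify-flavor.aspx"])[2]')))
# both of these reflect the DOM as it stands right now
html_js = driver.execute_script("return document.documentElement.outerHTML;")
html_api = driver.page_source
driver.quit()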