Get html of inspect element source with selenium - python

I'm working in selenium with Chrome.
The webpage I'm accessing updates dynamically.
I need the HTML that shows the results. I can see it when I do 'inspect element', but I don't understand how to access that HTML from my code; I always get the original HTML.
I tried this: Get HTML Source of WebElement in Selenium WebDriver using Python
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
searchform = browser.find_element_by_class_name('iceInpTxt')
searchform.send_keys('cefuroxim')
browser.find_element_by_class_name('iceCmdBtn').click()
element = browser.find_element_by_class_name('contentContainer')
html = element.get_attribute('innerHTML')
browser.close()
print(html)

It seems that it works after some delay. If I were you, I would experiment with the delay time:
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
searchform = browser.find_element_by_class_name('iceInpTxt')
searchform.send_keys('cefuroxim')
browser.find_element_by_class_name('iceCmdBtn').click()
time.sleep(10)  # give the JavaScript time to render the results
element = browser.find_element_by_class_name('contentContainer')
html = element.get_attribute('innerHTML')
browser.close()
print(html)
Addition: a nicer way is to let the script proceed as soon as the element you need is available (since it can take some time before JavaScript has added a specific element to the DOM). The element to look for in your example is the table with id iceDatTbl (from what I could find after a quick look).
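For example, a minimal sketch using an explicit wait instead of a fixed sleep (the table id iceDatTbl is what I spotted after a quick look, so verify it in your own inspector):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
browser.find_element_by_class_name('iceInpTxt').send_keys('cefuroxim')
browser.find_element_by_class_name('iceCmdBtn').click()

# proceed as soon as the results table is in the DOM (waits up to 20 s)
WebDriverWait(browser, 20).until(
    EC.presence_of_element_located((By.ID, 'iceDatTbl')))

html = browser.find_element_by_class_name('contentContainer').get_attribute('innerHTML')
browser.close()
print(html)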

Related

How do I make the driver navigate to a new page in selenium python

I am trying to write a script to automate job applications on Linkedin using selenium and python.
The steps are simple:
open the LinkedIn page, enter the id and password, and log in
open https://linkedin.com/jobs and enter the search keyword and location, then click Search (directly opening links like https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia gets stuck loading, probably due to missing POST data from the previous page)
The click opens the job search page, but this doesn't seem to update the driver, as it still searches on the previous page.
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd
import yaml
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
url = "https://linkedin.com/"
driver.get(url)
content = driver.page_source
stream = open("details.yaml", 'r')
details = yaml.safe_load(stream)
def login():
    username = driver.find_element_by_id("session_key")
    password = driver.find_element_by_id("session_password")
    username.send_keys(details["login_details"]["id"])
    password.send_keys(details["login_details"]["password"])
    driver.find_element_by_class_name("sign-in-form__submit-button").click()

def get_experience():
    return "1%2C2"  # URL-encoded "1,2": the two experience-level values
login()
jobs_url = 'https://www.linkedin.com/jobs/'
driver.get(jobs_url)
keyword = driver.find_element_by_xpath("//input[starts-with(@id, 'jobs-search-box-keyword-id-ember')]")
location = driver.find_element_by_xpath("//input[starts-with(@id, 'jobs-search-box-location-id-ember')]")
keyword.send_keys("python")
location.send_keys("Australia")
driver.find_element_by_xpath("//button[normalize-space()='Search']").click()
WebDriverWait(driver, 10)
# content = driver.page_source
# soup = BeautifulSoup(content)
# with open("a.html", 'w') as a:
#     a.write(str(soup))
print(driver.current_url)
driver.current_url returns https://linkedin.com/jobs/ instead of https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia as it should. I have tried printing the content to a file; it is indeed from the previous jobs page and not from the search page. I have also tried to find elements from the search page, like the experience filter and the Easy Apply button, but the search results in a not-found error.
I am not sure why this isn't working.
Any ideas? Thanks in advance.
UPDATE
It works if I directly open something like https://www.linkedin.com/jobs/search/?f_AL=True&f_E=2&keywords=python&location=Australia but not https://www.linkedin.com/jobs/search/?f_AL=True&f_E=1%2C2&keywords=python&location=Australia.
The difference between these links is that one of them takes only one value for the experience level while the other takes two values. This means it's probably not a POST-values issue.
You are getting and printing the current URL immediately after clicking the search button, before the page has changed with the response received from the server.
This is why it outputs https://linkedin.com/jobs/ instead of something like https://www.linkedin.com/jobs/search/?geoId=101452733&keywords=python&location=Australia.
WebDriverWait(driver, 10) or wait = WebDriverWait(driver, 20) will not cause any kind of delay the way time.sleep(10) does; it only instantiates a wait object, an instance of the WebDriverWait class.
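To actually wait, call .until() on it with an expected condition. A minimal sketch (assuming the results URL contains /jobs/search, as in the URLs quoted above):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.find_element_by_xpath("//button[normalize-space()='Search']").click()

# block (up to 20 s) until the browser has actually navigated
WebDriverWait(driver, 20).until(EC.url_contains("/jobs/search"))
print(driver.current_url)  # now the search results URL, not the previous page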

Python: finding content in dynamically generated HTML

I am trying to get stock options prices from this website based on the series code (for example FMM1), but the content is dynamically generated after the page loads, and my Python Selenium script is not able to extract the correct source code, so it does not find the content. When I inspect element, I can find it, but not when I click on "view source code".
This is my code:
from selenium import webdriver
import time

# Here, we open the website for options prices in Chrome
driver = webdriver.Chrome()
driver.get("http://www.bmfbovespa.com.br/pt_br/servicos/market-data/consultas/mercado-de-derivativos/precos-referenciais/precos-referenciais-bm-f-premios-de-opcoes/")
# Since the page is populated by JavaScript code *after* loading the page, we
# tell the browser to wait 10 seconds before getting the source html code
time.sleep(10)
html_file = driver.page_source # gets the html source of the page
print(html_file)
I have also tried the following, but it did not work:
WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.ID, "divContainerIframeBmf")))
Use this after the page loads:
driver.switch_to.frame(driver.find_element_by_xpath("//iframe"))
and continue performing your operations on the page.
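Putting it together, a sketch that waits for the container and then enters the iframe (the id divContainerIframeBmf comes from your own attempt; the bare //iframe locator assumes there is only one iframe of interest):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.bmfbovespa.com.br/pt_br/servicos/market-data/consultas/mercado-de-derivativos/precos-referenciais/precos-referenciais-bm-f-premios-de-opcoes/")

# wait for the container, then switch into the iframe that holds the table
WebDriverWait(driver, 60).until(
    EC.visibility_of_element_located((By.ID, "divContainerIframeBmf")))
driver.switch_to.frame(driver.find_element_by_xpath("//iframe"))

print(driver.page_source)  # now the source of the iframe document
# driver.switch_to.default_content()  # switch back to the outer page when done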

How to set the page source in python selenium?

When using python-selenium and loading a web page, I can get the source as follows:
driver.page_source
Is there a way to set the page source?
I want to 'read' the html from a file and perform a location action on it, i.e. something like this:
driver = webdriver.Firefox()
driver.set_source(open('my_file.html'))
driver.find_element(By.XPATH, "//div[@id='create']//input")
Is there a way to do this?
You can open the file directly.
from selenium import webdriver
import os
driver = webdriver.Firefox()
driver.get('file:///' + os.getcwd() + '/my_file.html')
inputElement = driver.find_element_by_xpath("//div[@id='create']//input")
driver.quit()
P.S. I recall that this doesn't work on IE. It works fine on Firefox and Chrome.
You can try to implement something like below:
# Get the "html" element of the current page
current_html = driver.find_element_by_tag_name("html")
# Read the markup between <html> and </html> from the saved HTML doc
saved_doc = open("my_file.html")
new_content = saved_doc.read().split("<html>")[-1].split("</html>")[0]
# Replace the current page's markup with that of the saved page
driver.execute_script("arguments[0].innerHTML = arguments[1]", current_html, new_content)
saved_doc.close()
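After the swap, locators resolve against the injected markup, so the XPath from the question should then find the input:

input_element = driver.find_element_by_xpath("//div[@id='create']//input")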

How to webscrape with Selenium when URL remains static after click

I am new to Selenium/Firefox. My goal is to go to my URL, fill in basic input, select a few items, let the browser change the content, and download a PDF from there. Ideally, I would like to repeat this later by looping over a number of new items. As a first step, I managed to get the browser to work and change the content once. But I am stuck getting the content out, as find_elements_by_tag_name() seems to get me something funny rather than the usual HTML tags that BeautifulSoup's .find_all() would return. I appreciate any help here.
Here is my code:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
url ='http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx'
browser = webdriver.Firefox(executable_path=r'C:\Program Files\Mozilla Firefox\geckodriver.exe')
browser.get(url)
StockElem = browser.find_element_by_id('ctl00_txt_stock_code')
StockElem.send_keys('00772')
StockElem.click()
select = Select(browser.find_element_by_id('ctl00_sel_tier_1'))
select.select_by_value('3')
select = Select(browser.find_element_by_id('ctl00_sel_tier_2'))
select.select_by_value('153')
select = Select(browser.find_element_by_id('ctl00_sel_DateOfReleaseFrom_d'))
select.select_by_value('01')
select = Select(browser.find_element_by_id('ctl00_sel_DateOfReleaseFrom_m'))
select.select_by_value('01')
select = Select(browser.find_element_by_id('ctl00_sel_DateOfReleaseFrom_y'))
select.select_by_value('2000')
# submit the search form (the page's Search button)
browser.execute_script("document.forms[0].submit()")
element = browser.find_elements_by_tag_name("a")
print(element)
After clicking on the Search button, you have 5 links to download PDF files.
You should find those links by the CSS selector .news.
Then go through the list of links by index and click on each link to download:
elements[0].click() clicks the first link.
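A minimal sketch of that approach (the .news selector and the count of 5 are from the answer above; clicking a link may change the page, so the list is re-fetched on each pass):

# collect the result links rendered after the search
links = browser.find_elements_by_css_selector(".news")
print(len(links))  # expected: 5 for this query

for i in range(len(links)):
    # re-find the links each iteration in case the DOM went stale
    links = browser.find_elements_by_css_selector(".news")
    links[i].click()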

How to loop through HTML and return id values

I am using selenium to navigate to a webpage and store the page source in a variable.
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://google.com")
html1 = driver.page_source
html1 now contains the page source of http://google.com.
My question is: how can I return HTML attributes such as id="id" or name="name"?
EDIT:
For example:
The webpage I navigated to with Selenium has a menu bar with 4 tabs. Each tab has an id attribute: id="tab1", id="tab2", and so on. I would like to return each id value, so I want tab1, tab2, and so on.
Edit#2:
Another example:
The homepage of my website (http://chrisarroyo.me) has several clickable links with ids. I would like to be able to return/print those ids to my console.
So I would like to return the id of the Learn More button and the ids of the links in the footer (facebookLnk, githubLnk, etc.).
If you are looking for a list of WebElements that have an ID, use:
elements = driver.find_elements_by_xpath("//*[@id]")
You can then iterate over that list and use get_attribute("id") to pull out each element's specific ID.
For name, it's pretty much the same code; just change id to name and you're set.
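For example, a sketch of the name variant:

named_elements = driver.find_elements_by_xpath("//*[@name]")
for element in named_elements:
    print(element.get_attribute("name"))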
Thank you @stewartm, your comment helped.
This ended up giving me the results I was looking for:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://chrisarroyo.me")
id_elements = driver.find_elements_by_xpath("//*[@id]")
for eachElement in id_elements:
    individual_ids = eachElement.get_attribute("id")
    print(individual_ids)
After running the above ^^ the output listed each of the ids on the webpage specified.
output:
navbarNavAltMarkup
learnBtn
githubLnk
facebookLnk
linkedinLnk
