How can I extract the table from the given URL - python

I am trying to scrape the data from the ETH ZERO SEK table at the given URL, but I can't make it work. Does anyone have any advice on how I can get it to work?
from selenium import webdriver
from selenium.webdriver.common.by import By
url = 'https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK'
driver = webdriver.Chrome()
driver.get(url)
element = driver.find_element(By.XPATH, './/*[@id="detailviewDiv"]/table/tbody/tr[1]/td/div')

What happens?
The content you are looking for is provided via an iframe, so your XPath won't work.
How to fix?
Option#1
Change your url to https://mdweb.ngm.se/detailview.html?locale=sv_SE&symbol=ETH%20ZERO%20SEK and call content directly
Option#2
Grab the src of the iframe from your original url:
driver.get('https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK')
Get the src of the iframe that holds your table:
iframe = driver.find_element(By.XPATH, '//iframe').get_attribute("src")
Load the iframe:
driver.get(iframe)
Wait until the tbody of your table is located and store it in element:
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
Assign values from the cells to variables by splitting the element's text:
volym = element.text.split('\n')[-3]
vwap = element.text.split('\n')[-2]
Note that the waits require from selenium.webdriver.support.ui import WebDriverWait and from selenium.webdriver.support import expected_conditions as EC.
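Putting Option #2 together, a minimal end-to-end sketch might look like this (assuming the detail view still uses the detailviewDiv container and a "Volym" column header):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get('https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK')
    # follow the iframe's src instead of switching into the frame
    iframe_src = driver.find_element(By.XPATH, '//iframe').get_attribute('src')
    driver.get(iframe_src)
    # wait for the tbody that follows the thead containing the "Volym" header
    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((
        By.XPATH,
        '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]'
        '/following-sibling::tbody')))
    rows = element.text.split('\n')
    volym, vwap = rows[-3], rows[-2]
    print(volym, vwap)
finally:
    driver.quit()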

Related

Python Selenium: Finding and filling out the dropbox and textbox fields from a given website

I am trying to build a function that looks up address information on the following website using Selenium in Python: https://www.smartystreets.com/products/single-address. Given a single-line address variable named address_to_search as an argument, the function should do the following:
Load the webpage url with Selenium
Change the lookup dropdown in Step 2 to “freeform address”
Insert address_to_search into the textbox in Step 3
Return the resulting address information (for now as a text file is fine)
Here’s the incomplete function code:
def get_property_info(address_to_search):
    url = 'https://www.smartystreets.com/products/single-address'
    driver = webdriver.Chrome('chromedriver')
    driver.get(url)
    return driver.page_source
Whenever I load the url, though, I do not see the dropdown for Step 2 in the driver's page source, so I can't change it, nor the address fields in Step 3 to insert the given search into. If changing the dropdown in Step 2 is not possible, I can change the input to load each address item separately (address line 1, address line 2, city, state, zip code), even though that approach is less ideal, but either way I can't access the items in Step 3 anyway.
Do you have any suggestions for how to locate these items on the webpage and otherwise build the functions?
As I've mentioned in my comment, you are stuck because it's a separate iframe.
Just navigate to that iframe and do the magic.
If your address has "\n" at the end, like mine does, you'll see the result displayed (the newline character acts as the Enter key).
I've also added a driver argument, since I declared the driver outside the function. You can modify that to your needs.
address = "3301 South Greenfield Rd Gilbert, AZ 85297\n"
def get_property_info(driver,address_to_search):
url = 'https://www.smartystreets.com/products/single-address-iframe'
driver.get(url)
driver.find_element_by_id("lookup-select-button").click()
driver.find_element_by_id("lookup-select").find_element_by_id("address-freeform").click()
driver.find_element_by_id("freeform-address").send_keys(address_to_search)
#return driver.page_source
get_property_info(driver,address)
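Note that the find_element_by_* helpers used above were removed in Selenium 4. A sketch of the same function with the current By-based locators (assuming the element ids on the page are unchanged) could look like:

from selenium.webdriver.common.by import By

def get_property_info(driver, address_to_search):
    # the single-address-iframe page hosts the widget directly, so no frame switching is needed
    driver.get('https://www.smartystreets.com/products/single-address-iframe')
    driver.find_element(By.ID, "lookup-select-button").click()
    driver.find_element(By.ID, "address-freeform").click()
    driver.find_element(By.ID, "freeform-address").send_keys(address_to_search)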
There is one iframe, as mentioned. I also think you should maximize the window through Selenium and use explicit waits as well.
Below is the code that does the magic:
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.smartystreets.com/products/single-address")
wait = WebDriverWait(driver, 20)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[title='iframe']")))
driver.execute_script("window.scrollTo(0, 500)")
ele = wait.until(EC.element_to_be_clickable((By.ID, "lookup-select")))
driver.execute_script("arguments[0].scrollIntoView(true);", ele)
ele.click()
time.sleep(5)
wait.until(EC.element_to_be_clickable((By.ID, "address-freeform"))).click()
wait.until(EC.element_to_be_clickable((By.ID, "freeform-address"))).send_keys("3503 Adonais Way, Norcross")
wait.until(EC.element_to_be_clickable((By.ID, "submit-request"))).click()
Imports:
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
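The question also wanted the resulting address information returned. The result markup isn't shown in the question, so no specific result ids are assumed here; a simple follow-up sketch is to give the lookup a moment and hand back the frame's rendered source for further parsing:

time.sleep(2)                       # crude pause for the lookup to complete; not robust
result_html = driver.page_source    # source of the iframe document we switched into
driver.switch_to.default_content()  # return to the top-level page when done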

Selenium is returning empty text for elements that definitely have text

I'm practicing trying to scrape my university's course catalog. I have a few lines in Python that open the url in Chrome and click the search button to bring up the course catalog. When I go to extract the text using find_elements_by_xpath(), it returns blank. When I use the dev tools in Chrome, there definitely is text there.
from selenium import webdriver
import time
driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
course = driver.find_elements_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]')
print(course)
I'm trying to extract the text from the element 'OSU_CAT_SRCH_OSR_CRSE_HEADER'. I don't understand why it's not returning the text values especially when I can see that it contains text with dev tools.
You are not reading the element's text, which is why you are not getting it. Also use find_element (singular), since find_elements returns a list:
course = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]').text
Try the above change in the second-to-last line.
Below is the full code after the changes
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
url = 'https://courses.osu.edu/psp/csosuct/EMPLOYEE/PUB/c/COMMUNITY_ACCESS.OSR_CAT_SRCH.GBL?'
driver.get(url)
time.sleep(3)
iframe = driver.find_element_by_id('ptifrmtgtframe')
driver.switch_to.frame(iframe)
element = driver.find_element_by_xpath('//*[@id="OSR_CAT_SRCH_WK_BUTTON1"]')
element.click()
# wait 10 seconds
course = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[@id="OSR_CAT_SRCH_OSR_CRSE_HEADER$0"]'))
).text
print(course)
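To collect every course header on the results page rather than just the first, a small extension (a sketch, assuming the ids follow the OSR_CAT_SRCH_OSR_CRSE_HEADER$0, $1, ... pattern that PeopleSoft grids typically use) could be:

# gather the text of every course header currently rendered in the results grid
headers = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(
        (By.XPATH, '//*[starts-with(@id, "OSR_CAT_SRCH_OSR_CRSE_HEADER$")]')))
for header in headers:
    print(header.text)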

How to get certain Tag Values

The following automated Selenium script correctly opens the given URL and opens the View Invoice tab, which brings up a detailed invoice.
I need to fetch some values like Number, Date and table values from the detailed invoice. The values are deeply nested, which makes them hard to get to correctly. I don't know how to scrape the page that opens up when View Invoice is clicked, or how to proceed with it in Selenium.
Is the element in the code an instance I can use to get the values from the opened detailed invoice page, or is there some different approach?
Here is the code:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'D:/Chrome driver/chromedriver.exe') # Get local session(use webdriver.Chrome() for chrome)
driver.implicitly_wait(3)
driver.get("URL") # load page from some url
driver.find_element_by_xpath("//input[@id='PNRId']").send_keys("MHUISZ")
driver.find_element_by_xpath("//input[@id='GstRetrievePageInteraction']").click()
element = driver.find_element_by_name('ViewInvoice')
element.click()
Can anyone please guide me on how to fetch the values from the invoice page?
Try to wait for elements to be visible or clickable. Clicking on the invoice actually creates new child window handles, so you have to switch to them. All you have to do now is figure out how to go through the table; try looking at its XPath.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path=r'D:/Chrome driver/chromedriver.exe') # Get local session(use webdriver.Chrome() for chrome)
driver.get("URL")
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//input[@id='PNRId']"))).send_keys("MHUISZ")
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='GstRetrievePageInteraction']"))).click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.NAME, 'ViewInvoice'))).click()
p = driver.current_window_handle
# get first child window
chwnd = driver.window_handles
for w in chwnd:
    # switch focus to child window
    if w != p:
        driver.switch_to.window(w)
        break
invoiceTable = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "TableHeader")))
print(invoiceTable.find_element_by_xpath("tbody/tr[1]/td").text)
driver.quit()
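If you need the whole table rather than a single cell, a sketch like the following (placed before driver.quit(), and assuming the invoice keeps a regular tbody/tr/td layout under the TableHeader element) loops over the rows and cells:

# iterate every row and cell of the located invoice table
for row in invoiceTable.find_elements_by_xpath(".//tbody/tr"):
    cells = [cell.text for cell in row.find_elements_by_xpath("./td")]
    print(cells)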

Extract Tables from an Iframe - Anbima Using python + selenium

Sup, I'm trying to extract some data tables from a website (https://www.anbima.com.br/pt_br/informar/curvas-de-juros-fechamento.htm), but as we can see the data is inside an iframe. It took me a while, since I'm not an expert at web scraping, to click the "Consultar" button to get to the page that I want. Basically, it loads the data (4 tables) inside an iframe too.
The problem is that I still haven't managed to get the tables; maybe it's because of the iframe.
For example, I tried to use XPath on the first table without success.
driver.find_element_by_xpath('//*[@id="Parametros"]/table').text
Here's the code to reach the page that I mentioned:
from selenium import webdriver
import time
import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as expectedCondition
from selenium.webdriver.chrome.options import Options
import pandas as pd
import numpy as np
#----------------------- SCRAPING INITIALIZATION -----------------------------#
want_to_scrape = True
if want_to_scrape:
    options = Options()
    #options.add_argument('--headless')
    driver = webdriver.Chrome("C:\\Users\\......\\chromedriver.exe", options=options)
    now = time.time()
    dataset_list = []
    url = 'https://www.anbima.com.br/pt_br/informar/curvas-de-juros-fechamento.htm'
    driver.get(url)
    #element = driver.find_element_by_class_name('full')
    #driver.switch_to.frame(element)
    driver.switch_to.frame(0)
    element = driver.find_elements_by_name('Consultar')
    element[0].click()
    time.sleep(1)
    try:
        alert = driver.switch_to.alert
        alert.accept()
        print("alert accepted")
    except:
        print("no alert")
    time.sleep(1)
    driver.switch_to.frame(0)
    driver.find_element_by_xpath
Try replacing your driver.switch_to.frame(0) line with this:
# Get the iframe element - note, you may need a more specialized selector here
iframe = driver.find_element_by_tag_name('iframe')
driver.switch_to.frame(iframe)
That will get your driver into the frame context so you can fetch the tables. You may need to use a different selector to get the iframe element. If you post the iframe HTML here, I can help you write a selector.
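Once the driver is inside the results frame, one way to pull all four tables at once (a sketch, assuming they are plain HTML table elements and that pandas plus a parser such as lxml are installed, in line with the question's imports; the Brazilian comma-decimal formatting is also an assumption) is to hand the frame's source to pandas:

import pandas as pd

# parse every <table> in the current frame into a DataFrame
tables = pd.read_html(driver.page_source, decimal=',', thousands='.')
for df in tables:
    print(df.head())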

Cannot get dynamic element with Python Selenium

There is a site that streams YouTube videos, and I want to get a playlist of them. So I use the Selenium webdriver to get the needed div element with class name ytp-title-text, where the YouTube link is located.
This is where it sits, for example, when I use the browser console to find the element:
<div class="ytp-title-text"><a class="ytp-title-link yt-uix-sessionlink" target="_blank" data-sessionlink="feature=player-title" href="https://www.youtube.com/watch?v=VyCY62ElJ3g">Fears - Jono McCleery</a><div class="ytp-title-subtext"><a class="ytp-title-channel-name" target="_blank" href=""></a></div></div>
I wrote a simple script for testing:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
driver = webdriver.Firefox()
driver.get('http://awsmtv.com')
try:
    element = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.CLASS_NAME, "ytp-title-text"))
    )
finally:
    driver.quit()
But no element is found and a timeout exception is thrown. I cannot understand what actions Selenium needs to perform to get the full page source.
The required link is hidden and is also located inside an iframe. Try the below to locate it:
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it("tvPlayer_1"))
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "ytp-title-link")))
    print(element.get_attribute('href'))
finally:
    driver.quit()
I just saw this element is inside an iframe... You need to switch to the iframe first: find it by class name, e.g. iframe = ...(By.CLASS_NAME, "player"), then switch to it with driver.switch_to.frame(iframe), and you should now be able to get the wanted element :)
An XPath locator like this one will work (or your locator): "//a[@class='ytp-title-link yt-uix-sessionlink']".
You then need to read the element's href property for the YouTube video URL, or its text for the song title.
If it is still not working, I can suggest getting the page source - html = driver.page_source - which will give you the source of the page, and then pulling out the info you want with some regex.
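A minimal sketch of that second suggestion, assuming the player iframe on this site really carries the class name "player":

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get('http://awsmtv.com')
    # locate the player iframe by its (assumed) class name and switch into it
    iframe = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "player")))
    driver.switch_to.frame(iframe)
    # the hidden title link holds the YouTube URL in its href and the title in its text
    link = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "ytp-title-link")))
    print(link.get_attribute('href'), link.text)
finally:
    driver.quit()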
