Cannot find href on a page - python

I am trying to find the URL of the trailer video on this page: https://www.binged.com/streaming-premiere-dates/black-monday/
I checked the various attributes of the div with class="wordkeeper-video" but cannot find it. Can someone help?

Play the video first. The page then inserts something like the following, and the link is in the iframe's src attribute:
<iframe frameborder="0" allowfullscreen="" allow="autoplay" src="https://www.youtube.com/embed/pzxGR6Q-7Mc?rel=0&showinfo=0&autoplay=1"></iframe>
PS: It is in div class="wordkeeper-video"

The video link is not present initially.
You first need to click the play button (actually an image); only after that is an iframe inserted that contains the video URL.
The iframe can be located with the CSS selector .wordkeeper-video iframe.
The src attribute belongs to the iframe element itself, so you can read it with get_attribute("src") from the main page; there is no need to switch into the frame for that.
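A minimal sketch of the last step, assuming you already have the iframe's src string (in Selenium, roughly driver.find_element(By.CSS_SELECTOR, ".wordkeeper-video iframe").get_attribute("src") after clicking the play image). It assumes YouTube's /embed/<id> URL form shown above; the helper name embed_src_to_watch_url is made up for illustration.

```python
from urllib.parse import urlparse


def embed_src_to_watch_url(src: str) -> str:
    """Turn a YouTube embed URL into a watch URL.

    Assumes the https://www.youtube.com/embed/<video id>?... form.
    """
    path = urlparse(src).path  # e.g. "/embed/pzxGR6Q-7Mc"; the query string is dropped
    prefix = "/embed/"
    if not path.startswith(prefix):
        raise ValueError(f"not a YouTube embed URL: {src!r}")
    video_id = path[len(prefix):].split("/")[0]
    return "https://www.youtube.com/watch?v=" + video_id


src = "https://www.youtube.com/embed/pzxGR6Q-7Mc?rel=0&showinfo=0&autoplay=1"
print(embed_src_to_watch_url(src))  # https://www.youtube.com/watch?v=pzxGR6Q-7Mc
```

Because the iframe only appears after the click, you would probably wait for it before reading src, for example with WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".wordkeeper-video iframe"))), where EC is selenium.webdriver.support.expected_conditions.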

The full URL isn't there, but everything you need to build it is:
<div class="wordkeeper-video " data-type="youtube" data-embed="pzxGR6Q-7Mc" ...>
The data-embed attribute holds the YouTube video ID. The URL is
https://www.youtube.com/watch?v=pzxGR6Q-7Mc
where the part after v= is the data-embed value.
You can get it with:
from selenium.webdriver.common.by import By
# data-embed holds the video ID, e.g. "pzxGR6Q-7Mc"
data_embed = driver.find_element(By.CSS_SELECTOR, ".wordkeeper-video").get_attribute("data-embed")
video_url = "https://www.youtube.com/watch?v=" + data_embed
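Since data-embed is already in the div before any click, you may not even need a browser for this part. The sketch below swaps in the standard library's html.parser and reads the attribute from an HTML string you already have (for example driver.page_source). It assumes the served HTML really contains the attribute without JavaScript running; the sample is trimmed to the attributes shown above, and the class name WordkeeperVideoParser is hypothetical.

```python
from html.parser import HTMLParser


class WordkeeperVideoParser(HTMLParser):
    """Collect data-embed values from <div class="wordkeeper-video ..."> elements."""

    def __init__(self):
        super().__init__()
        self.embeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        # class="wordkeeper-video " has a trailing space, so split into class names
        classes = (attributes.get("class") or "").split()
        if "wordkeeper-video" in classes and attributes.get("data-embed"):
            self.embeds.append(attributes["data-embed"])


html = '<div class="wordkeeper-video " data-type="youtube" data-embed="pzxGR6Q-7Mc"></div>'
parser = WordkeeperVideoParser()
parser.feed(html)
urls = ["https://www.youtube.com/watch?v=" + video_id for video_id in parser.embeds]
print(urls)  # ['https://www.youtube.com/watch?v=pzxGR6Q-7Mc']
```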

Related

Problem in scraping link(href) using selenium; href="#"

I'm an amateur at Python, and I'm trying to scrape a URL from the HTML below using Selenium.
<a class="" href="#" style="text-decoration: none; color: #1b1b1b;" onclick="toDetailOrUrl(event, '1641438','')">[Notice] Bithumb - Termination of the simple virtual asset transfer service between Bithumb and Bithumb Global</a>
Normally the URL I want would be the value of href=, but here the href is just "#".
When I run the code below, which is the usual way of scraping such an element with Selenium, it returns https://cafe.bithumb.com/view/boards/43. That is just the URL I passed to driver.get(), which is not what I want.
url = "https://cafe.bithumb.com/view/boards/43"
driver=webdriver.Chrome('chromedriver.exe')
driver.get(url)
driver.implicitly_wait(30)
bo = driver.find_element_by_xpath("//tbody[1]/tr[@style='cursor:pointer;border-top:1px solid #dee2e6;background-color: white']/td[2]/a")
print(bo.get_attribute('href'))
What I want is https://cafe.bithumb.com/view/board-contents/1641438, the URL you reach when you click the item matched by the XPath above.
I want to get this URL programmatically, with Selenium or some other way, without opening Chrome, typing the URL into the address bar and clicking with the mouse.
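You can use
bo.click()
to click the element you want (I assumed you want to click bo). If the click navigates in the same window, driver.current_url holds the new URL once the page has loaded.
To read the raw attribute value instead:
print(driver.execute_script('return arguments[0].getAttribute("href")', bo))
In Selenium, bo.get_attribute('href') behaves like reading the href property in JavaScript (document.getElementById("someLocator").href), which returns the fully resolved URL. Since '#' refers to the current page, you get back the URL you passed to get().
If you only need the literal '#', use the execute_script call above.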
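Another option, without clicking at all: the article ID is in the onclick attribute, which you can read with bo.get_attribute("onclick"). A minimal sketch that builds the detail URL from that string; the /view/board-contents/<id> pattern is an assumption inferred from the single example URL in the question, and the helper name board_content_url is hypothetical.

```python
import re
from typing import Optional

# Captures the first quoted argument of e.g. toDetailOrUrl(event, '1641438','')
ONCLICK_ID = re.compile(r"toDetailOrUrl\(\s*event\s*,\s*'(\d+)'")


def board_content_url(onclick: Optional[str]) -> Optional[str]:
    """Build the detail-page URL from an onclick value; None if the pattern is absent."""
    match = ONCLICK_ID.search(onclick or "")
    if match is None:
        return None
    # Assumption: detail pages live at /view/board-contents/<id>
    return "https://cafe.bithumb.com/view/board-contents/" + match.group(1)


print(board_content_url("toDetailOrUrl(event, '1641438','')"))
# https://cafe.bithumb.com/view/board-contents/1641438
```

If the site changes its JavaScript handler, the regex stops matching and the function returns None rather than a wrong URL.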

Selenium how to extract href from attributes

<div class="turbolink_scroller" id="container">
<article><div class="inner-article">
<a style="height:81px;" href="LINK TO EXTRACT">
<img width="81" height="81" src="//image.jpg" alt="code" />
Hello! I'm pretty new to Selenium and have been experimenting with getting page sources through my WebDriver. I'm trying to extract the href of a link given the alt text of the image inside it, as above, and I'm not sure whether the documentation offers a way to do this. I suspect the answer is find_element_by_xpath, but I'm not entirely sure. Thank you for any tips!
One way is as follows:
from selenium.webdriver.common.by import By
href = driver.find_element(By.TAG_NAME, 'a').get_attribute('href')
Of course, a page usually has many a tags, so narrow the search down to the one you want, e.g.:
div = driver.find_element(By.ID, 'container')
a = div.find_element(By.TAG_NAME, 'a')
href = a.get_attribute('href')
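To select the link by the image's alt text directly, an XPath such as driver.find_element(By.XPATH, "//a[img[@alt='code']]").get_attribute("href") picks the a element whose img child has alt="code". The sketch below shows the same "link that wraps this image" logic offline with the standard library's html.parser, for example on driver.page_source; the class name HrefByImageAlt is hypothetical and the sample closes the tags the question's snippet left open.

```python
from html.parser import HTMLParser


class HrefByImageAlt(HTMLParser):
    """Record the href of every <a> that contains an <img> with the given alt text."""

    def __init__(self, alt):
        super().__init__()
        self.alt = alt
        self.current_href = None  # href of the <a> we are currently inside, if any
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            self.current_href = attributes.get("href")
        elif tag == "img" and self.current_href is not None and attributes.get("alt") == self.alt:
            self.matches.append(self.current_href)

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None


html = """<div class="turbolink_scroller" id="container">
<article><div class="inner-article">
<a style="height:81px;" href="LINK TO EXTRACT">
<img width="81" height="81" src="//image.jpg" alt="code" />
</a></div></article></div>"""

parser = HrefByImageAlt("code")
parser.feed(html)
print(parser.matches)  # ['LINK TO EXTRACT']
```

Note that this returns the raw attribute text, while Selenium's get_attribute("href") returns the resolved absolute URL.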

Get Iframe Src content using Selenium Python

I have added an iframe to an HTML file, maintenance.html:
<iframe name="iframe_name" src="maintenance_state.txt" frameborder="0" height="40" allowtransparency="allowtransparency" width="800" align="middle" ></iframe>
I want to get the contents of the iframe's src file, maintenance_state.txt, using Python and Selenium.
I'm locating the iframe element using:
maintain = driver.find_element_by_name("iframe_name")
However, maintain.text returns an empty string.
How can I get the text written in the maintenance_state.txt file?
Thanks for your help.
As some sites' scripts stop the iframe from working properly if it's loaded as the main document, it's also worth knowing how to read the iframe's source without needing to issue a separate driver.get for its URL:
from selenium.webdriver.common.by import By
driver.switch_to.frame(driver.find_element(By.NAME, "iframe_name"))
print(driver.page_source)
driver.switch_to.default_content()  # back to the main document
The last line is needed only if you want to be able to do something else with the page afterwards.
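You can also read the iframe's src attribute, navigate to it and print the page_source:
from urllib.parse import urljoin  # Python 3; Python 2 used "from urlparse import urljoin"
from selenium.webdriver.common.by import By
base_url = driver.current_url  # URL of the page that contains the iframe
src = driver.find_element(By.NAME, "iframe_name").get_attribute("src")
url = urljoin(base_url, src)
driver.get(url)
print(driver.page_source)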
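The urljoin call is a safety net: the raw src in the HTML is relative (maintenance_state.txt), while Selenium's get_attribute("src") usually returns an already resolved absolute URL. urljoin handles both cases, as this sketch shows with a hypothetical page URL standing in for driver.current_url.

```python
from urllib.parse import urljoin

# Hypothetical URL of maintenance.html; in Selenium this would be driver.current_url
base_url = "http://example.com/site/maintenance.html"

# A relative src is resolved against the page's directory
print(urljoin(base_url, "maintenance_state.txt"))
# http://example.com/site/maintenance_state.txt

# An absolute src (what get_attribute("src") usually returns) is kept as-is
print(urljoin(base_url, "http://example.com/site/maintenance_state.txt"))
# http://example.com/site/maintenance_state.txt
```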

splitting the outerHTML attribute in python

I want to extract a particular piece of text from a link's outerHTML attribute.
while Id is true:
link = driver.find_element_by_xpath("//a[@id='bu:ms:all-sp:2']")
href = link.get_attribute("outerHTML")
link.click()
# This will load the link in the same page !
self.assertIn(href, self.page.get_current_url())
When I print href, the output is:
<a id="bu:ms:all-sp:8" href="/euro/tennis" class="Pointer"><span class="SportImg8"></span> Tennis <span class="NumEvt">51</span></a>
I want to extract just the href value (/euro/tennis) from this and assert it against the current URL.
Could anyone please help me out here?
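Get the href attribute instead of outerHTML:
href = link.get_attribute("href")
Note that this returns the fully resolved URL, including scheme and host, rather than the literal /euro/tennis.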
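If you do want the literal /euro/tennis, one option is to parse the outerHTML string you already printed and compare it with the path part of the current URL. A sketch using the standard library's html.parser and urllib.parse; the helper names are hypothetical and the host in current_url is a made-up stand-in for what the browser would report.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


class FirstHref(HTMLParser):
    """Remember the href of the first <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.href is None:
            self.href = dict(attrs).get("href")


def href_from_outer_html(outer_html: str) -> Optional[str]:
    parser = FirstHref()
    parser.feed(outer_html)
    parser.close()
    return parser.href


outer_html = ('<a id="bu:ms:all-sp:8" href="/euro/tennis" class="Pointer">'
              '<span class="SportImg8"></span> Tennis '
              '<span class="NumEvt">51</span></a>')
raw_href = href_from_outer_html(outer_html)
print(raw_href)  # /euro/tennis

# Hypothetical current URL; in the test it would come from the browser
current_url = "https://www.example.com/euro/tennis"
# Compare only the path, since the raw href has no scheme or host
print(urlparse(current_url).path == raw_href)  # True
```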

Selenium in Python: Select the second element with given link text

I am trying to click all of the links on a web page that contain the link text "View all hits in this text." Here's what some of the html on the web page looks like:
<a href="/searchCom.do?offset=24981670&entry=4&entries=112&area=Poetry&forward=textsCom&queryId=../session/1380145118_2069"><b>View all hits in this text</b>
<br>
</a>
[...]
<a href="/searchCom.do?offset=25280103&entry=5&entries=112&area=Poetry&forward=textsCom&queryId=../session/1380145118_2069"><b>View all hits in this text</b>
<br>
</a>
If there were only one such link on the page, I know I could click it using something like:
driver.find_element_by_link_text('View all hits in this text').click()
Unfortunately, this method only ever identifies and clicks the first link on the web page with the link text "View all hits in this text." I therefore wanted to ask: is there a method I can use to click the second (or nth) link with link text "View all hits in this text" on this page? I have a feeling I may need to use xpath, but I haven't quite figured out how I should go about implementing xpath in my script. I would be grateful for any advice others can lend.
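There is also find_elements, which returns every match instead of only the first:
from selenium.webdriver.common.by import By
links = driver.find_elements(By.LINK_TEXT, 'View all hits in this text')
# Clicking navigates away and makes the remaining elements stale,
# so collect the URLs first and then visit them one by one.
hrefs = [link.get_attribute('href') for link in links]
for href in hrefs:
    driver.get(href)
You can also use XPath to get all links with a given text. The text sits inside a <b> child, so compare against the whole string value of the a element rather than text(), then handle the links as above:
links = driver.find_elements(By.XPATH, "//a[normalize-space(.) = 'View all hits in this text']")
Hope that helps.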
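Try the code below; index 1 is the second match, since indexes start at 0:
from selenium.webdriver.common.by import By
driver.find_elements(By.LINK_TEXT, 'View all hits in this text')[1].click()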
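The "collect first, then visit" idea can be sketched offline on driver.page_source with the standard library's html.parser. The helper below mimics XPath's normalize-space(.) by joining all text inside each a element and collapsing whitespace; the names LinksByText and hrefs_with_text are hypothetical. In XPath, the nth match would be (//a[normalize-space(.) = 'View all hits in this text'])[2]; the parentheses matter, because without them [2] would mean "second a among its siblings".

```python
from html.parser import HTMLParser


class LinksByText(HTMLParser):
    """Collect (href, text) for every <a>; text is all nested text, whitespace-collapsed."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) tuples
        self._href = None
        self._chunks = None  # text pieces of the <a> currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href, self._chunks = None, None


def hrefs_with_text(html, wanted):
    parser = LinksByText()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if text == wanted]


page = """<a href="/searchCom.do?offset=24981670&entry=4&entries=112&area=Poetry&forward=textsCom&queryId=../session/1380145118_2069"><b>View all hits in this text</b>
<br>
</a>
<a href="/searchCom.do?offset=25280103&entry=5&entries=112&area=Poetry&forward=textsCom&queryId=../session/1380145118_2069"><b>View all hits in this text</b>
<br>
</a>"""

hrefs = hrefs_with_text(page, "View all hits in this text")
print(len(hrefs))  # 2
print(hrefs[1])    # the second matching link
```

These hrefs are relative, so in a live session you would resolve each one with urllib.parse.urljoin(driver.current_url, href) before calling driver.get.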
