I'm an amateur at Python, and I'm trying to scrape the URL from the HTML below using Selenium.
<a class="" href="#" style="text-decoration: none; color: #1b1b1b;" onclick="toDetailOrUrl(event, '1641438','')">[안내] 빗썸 - 빗썸 글로벌 간 간편 가상자산 이동 서비스 종료 안내</a>
Ordinarily, the URL I want would be the value of the href attribute, but in this HTML href is just "#".
When I run the code below, which is the usual way to scrape such HTML with Selenium, it returns https://cafe.bithumb.com/view/boards/43. That is just the URL I passed to driver.get(), which is not what I want.
from selenium import webdriver
url = "https://cafe.bithumb.com/view/boards/43"
driver = webdriver.Chrome('chromedriver.exe')
driver.get(url)
driver.implicitly_wait(30)
bo = driver.find_element_by_xpath("//tbody[1]/tr[@style='cursor:pointer;border-top:1px solid #dee2e6;background-color: white']/td[2]/a")
print(bo.get_attribute('href'))
What I want is https://cafe.bithumb.com/view/board-contents/1641438. You get this URL when you click the item matched by the XPath above.
I want to get this URL programmatically, with Selenium or some other way, without opening Chrome, typing the URL into the address bar, and clicking with the mouse.
You can use
bo.click()
to click the element you want (I'm assuming you want to click bo).
print(driver.execute_script('return arguments[0].getAttribute("href")',bo))
In Selenium, bo.get_attribute('href') effectively does document.getElementById("somelocator").href, which returns the fully resolved href. Because '#' refers to the current page, you get back the current URL, i.e. the one you passed to get().
If you only need the literal "#" value, use the execute_script call with getAttribute shown earlier.
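If you want the detail URL without clicking at all, one option is to rebuild it from the element's onclick attribute. This is a minimal sketch. It assumes, based only on the single example in the question, that every post link calls toDetailOrUrl(event, '<post id>', '') and that the post lives at https://cafe.bithumb.com/view/board-contents/<post id>. The names BASE_URL, ONCLICK_ID and detail_url_from_onclick are hypothetical.

```python
import re

# Assumption (inferred from one example in the question): post page = BASE_URL + <post id>.
BASE_URL = "https://cafe.bithumb.com/view/board-contents/"

# Matches toDetailOrUrl(event, '1641438', ...) and captures the numeric post id.
ONCLICK_ID = re.compile(r"toDetailOrUrl\(\s*event\s*,\s*'(\d+)'")


def detail_url_from_onclick(onclick):
    """Return the post URL encoded in an onclick attribute, or None if it doesn't match."""
    match = ONCLICK_ID.search(onclick or "")
    if match is None:
        return None
    return BASE_URL + match.group(1)


if __name__ == "__main__":
    print(detail_url_from_onclick("toDetailOrUrl(event, '1641438','')"))
    # → https://cafe.bithumb.com/view/board-contents/1641438
```

With the element from the question, this would be detail_url_from_onclick(bo.get_attribute('onclick')). Links whose onclick has a different shape return None instead of a wrong URL.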
Related
Situation
I'm using Selenium and Python to extract information from a page.
The divs I want to extract from are shown in the HTML section below.
I want to extract the "Registre-se" and "Login" texts.
My code
from selenium import webdriver
url = 'https://www.bet365.com/#/AVR/B146/R^1'
driver = webdriver.Chrome()
driver.get(url.format(q=''))
elements = driver.find_elements_by_class_name('hm-MainHeaderRHSLoggedOutNarrow_Join ')
for e in elements:
    print(e.text)
elements = driver.find_elements_by_class_name('hm-MainHeaderRHSLoggedOutNarrow_Login ')
for e in elements:
    print(e.text)
Problem
My code doesn't print anything.
HTML
<div class="hm-MainHeaderRHSLoggedOutNarrow_Join ">Registre-se</div>
<div class="hm-MainHeaderRHSLoggedOutNarrow_Login " style="">Login</div>
Looking at this HTML and at your code, your code looks okay to me, except that you are using find_elements for what is a single web element.
And from reading this comment:
"The class name "hm-MainHeaderRHSLoggedOutMed_Login " only appears when inspecting the website, not in the page source. What am I supposed to do now?"
This suggests that the element is inside either an iframe or a shadow root, because page_source does not include the contents of iframes.
Please check whether it is in an iframe. If it is, switch to the iframe first, and then you can use the code you already have.
Switch to it like this:
driver.switch_to.frame(driver.find_element_by_xpath('xpath here'))
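If you don't know in advance which iframe holds the element, a sketch like the one below tries the main document first and then each top-level iframe. It only handles one level of iframe nesting and does not look inside shadow roots. find_in_frames is a hypothetical helper, and it passes the locator strategy as a plain string ("tag name" is the value of Selenium's By.TAG_NAME), so it works with any driver exposing find_elements() and switch_to.

```python
# "tag name" is the string value of selenium.webdriver.common.by.By.TAG_NAME.
TAG_NAME = "tag name"


def find_in_frames(driver, by, value):
    """Look for elements in the main document, then inside each top-level iframe.

    Returns (frame_index, elements). frame_index is None when the elements were
    found in the main document (or nothing was found). When they were found in an
    iframe, the driver is left switched into that iframe so the elements stay usable.
    """
    driver.switch_to.default_content()
    elements = driver.find_elements(by, value)
    if elements:
        return None, elements

    frames = driver.find_elements(TAG_NAME, "iframe")
    for index, frame in enumerate(frames):
        driver.switch_to.frame(frame)
        elements = driver.find_elements(by, value)
        if elements:
            return index, elements
        # Go back to the main document before trying the next iframe.
        driver.switch_to.default_content()

    return None, []
```

For the question above, that would be find_in_frames(driver, By.CLASS_NAME, "hm-MainHeaderRHSLoggedOutNarrow_Join"), reading .text from the returned elements and calling driver.switch_to.default_content() when you are done.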
Hello colleagues, a question: how would I click a link with href="javascript:void(0)"? I have been trying to understand the answers to the same question on this forum, but I don't understand them very well. I look forward to your contributions.
The XPath of the link (the element with the href): //*[@id="course-link-_62332_1"]
The XPath of the h4: //*[@id="course-link-_62332_1"]/h4
You can do
driver.find_element_by_id('course-link-_62332_1').click()
href="javascript:void(0)" is used to keep the browser on the same page when the link is clicked. The link is probably triggering a task or event defined in a JavaScript/jQuery script.
As for your question, you can click the link like this:
element1 = self.driver.find_element_by_xpath('//*[@id="course-link-_62332_1"]')
element2 = self.driver.find_element_by_xpath('//*[@id="course-link-_62332_1"]/h4')
element1.click()
element2.click()
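If a normal .click() on such a link fails, for example with an "element click intercepted" error because another element overlaps it, a common workaround (not from the answers above) is to dispatch the click through JavaScript with execute_script. The name js_click is hypothetical. The script string and the execute_script(script, *args) call are standard Selenium.

```python
def js_click(driver, element):
    """Click `element` by calling its DOM click() method through JavaScript.

    This fires the element's JavaScript click handlers (which is where a
    href="javascript:void(0)" link does its real work) without WebDriver's
    native-click checks for visibility and overlapping elements.
    """
    driver.execute_script("arguments[0].click();", element)
```

Usage would be js_click(driver, driver.find_element(By.ID, 'course-link-_62332_1')), with By imported from selenium.webdriver.common.by. Because this skips the checks a real user's click would face, it is best kept as a fallback after the native click.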
I want to scrape the href link using Python 3.
Existing code:
import lxml.html
import requests
dom = lxml.html.fromstring(requests.get('https://www.tripadvisor.co.uk/Search?singleSearchBox=true&geo=191&pid=3825&redirect=&startTime=1576072392277&uiOrigin=MASTHEAD&q=the%20grilled%20cheese%20truck&supportedSearchTypes=find_near_stand_alone_query&enableNearPage=true&returnTo=https%253A__2F____2F__www__2E__tripadvisor__2E__co__2E__uk__2F__&searchSessionId=AF4BFA0308CF336B90FD9602FA122CD11576072382852ssid&social_typeahead_2018_feature=true&sid=AF4BFA0308CF336B90FD9602FA122CD11576072410521&blockRedirect=true&ssrc=a&rf=1').content)
result = dom.xpath("//a[@class='review_count']/@href")
print(result)
from this HTML:
<a class="review_count" href="/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html#REVIEWS" onclick="return false;" data-clicksource="ReviewCount">3 reviews</a>
With my existing code, the printed result is empty.
However, I have located the link in this call:
widgetEvCall('handlers.openResult', event, this, '/Restaurant_Review-g54774-d10073153-Reviews-The_Grilled_Cheese_Truck-Rapid_City_South_Dakota.html', {type: 'EATERY',element: this,index: 0,section: 1,locationId: '10073153',parentId: '54774',elementType: 'title',selectedId: '10073153'});
So I need help with this. In this case it would be even better to get locationId and selectedId and print them.
Any ideas?
The problem you're having is that the data is loaded via JavaScript; try viewing the page with JavaScript disabled.
You could try a tool that executes JavaScript, e.g. Selenium: https://selenium-python.readthedocs.io/
Or try to track down where the JavaScript loads the data from, and request that directly with Python.
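For the second part of the question (printing locationId and selectedId), once you have the widgetEvCall(...) text, for example from the result title's onclick attribute read through Selenium, plain regular expressions are enough. This is a sketch that assumes the call keeps the shape shown in the question. parse_open_result, PATH_RE and ID_RE are hypothetical names.

```python
import re
from urllib.parse import urljoin

SITE = "https://www.tripadvisor.co.uk/"

# The review page path is the single-quoted argument that starts with "/" and ends in ".html".
PATH_RE = re.compile(r"'(/[^']+\.html)'")
# Captures key: 'digits' pairs for the two ids we care about.
ID_RE = re.compile(r"\b(locationId|selectedId)\s*:\s*'(\d+)'")


def parse_open_result(call):
    """Extract the review URL, locationId and selectedId from a widgetEvCall(...) string."""
    path = PATH_RE.search(call)
    ids = dict(ID_RE.findall(call))
    return {
        "url": urljoin(SITE, path.group(1)) if path else None,
        "locationId": ids.get("locationId"),
        "selectedId": ids.get("selectedId"),
    }
```

Any field that is missing from the string comes back as None rather than raising, so results that use a different call shape are easy to spot.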
I am trying to test against this website (https://www.phptravels.net/), specifically its login feature. There is a "My Account" link that needs to be clicked first to show the drop-down with the Login and Sign Up buttons. The HTML code looks like this:
<li id="li_myaccount" class="">
<span class="ink animate" style="height: 137px; width: 137px; top: -10.7969px; left: -28.7344px;"></span><i class="icon_set_1_icon-70 go-right"></i> My Account <b class="lightcaret mt-2 go-left"></b>
<ul class="dropdown-menu">
<li><a class="go-text-right" href="https://www.phptravels.net/login"> Login</a></li>
<li><a class="go-text-right" href="https://www.phptravels.net/register"> Sign Up</a></li>
</ul>
</li>
When I try to click the "My Account" link, it throws an error saying "element not visible". I am confused because the link is apparently visible all the time. Here is the code:
elem = driver.find_element_by_xpath("//*[@id='li_myaccount']/a")
elem.click()
What is wrong with my code? Thank you.
I tried to interact with the My Account link, which is shown on the right side of the navigation bar on the page (https://www.phptravels.net/). Using the locator //*[@id='li_myaccount']/a, when I try to click the link with WebDriver, I get the following error:
ElementNotVisibleException: Message: element not interactable
When I explored the HTML in Chrome's console and searched for the element using the locator //*[@id='li_myaccount']/a, the My Account link you want to click doesn't get highlighted.
Further exploration led me to the locator //*[contains(@class,'navbar-nav navbar-right')]//*[@id='li_myaccount']/a, which does highlight the My Account link.
Then I used the new locator to click the My Account link with WebDriver, and it works!
If you look carefully, I just added a preceding path to the locator you shared so that it uniquely identifies the My Account link.
Change the XPath to this one:
(//*[@id='li_myaccount']/a)[2]
If you look at the source, there are actually two elements matching this locator: an <a> tag in a modal that is currently hidden, and the one you are trying to click. That is the cause of your issue: the method returns the first match, which is not the one you want.
This XPath returns the second element ([2]) of the whole result set. The parentheses around the path make [2] apply to all matches in document order, whereas //*[@id='li_myaccount']/a[2] would instead select the second <a> child of each matching element.
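If you'd rather not depend on the order of the matches, an alternative (not from the answers above) is to fetch every match and keep the first one that is actually visible, using WebElement.is_displayed(). first_displayed is a hypothetical helper:

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is True, or None if all are hidden."""
    return next((element for element in elements if element.is_displayed()), None)
```

Usage would be elem = first_displayed(driver.find_elements(By.XPATH, "//*[@id='li_myaccount']/a")) followed by elem.click(). If it returns None, the visible link may simply not be rendered yet, so it pairs well with an explicit wait.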
I am trying to click all of the links on a web page that contain the link text "View all hits in this text". Here's what some of the HTML on the page looks like:
<a href="/searchCom.do?offset=24981670&entry=4&entries=112&area=Poetry&forward=textsCom&queryId=../session/1380145118_2069"><b>View all hits in this text</b>
<br>
</a>
[...]
<a href="/searchCom.do?offset=25280103&entry=5&entries=112&area=Poetry&forward=textsCom&queryId=../session/1380145118_2069"><b>View all hits in this text</b>
<br>
</a>
If there were only one such link on the page, I know I could click it using something like:
driver.find_element_by_link_text('View all hits in this text').click()
Unfortunately, this method only ever identifies and clicks the first link on the web page with the link text "View all hits in this text." I therefore wanted to ask: is there a method I can use to click the second (or nth) link with link text "View all hits in this text" on this page? I have a feeling I may need to use xpath, but I haven't quite figured out how I should go about implementing xpath in my script. I would be grateful for any advice others can lend.
There is also find_elements_by_link_text(), which returns every matching link as a list:
links = driver.find_elements_by_link_text('View all hits in this text')
for link in links:
    link.click()
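You can also use XPath to get all links with a specified text. Because the text sits inside a <b> child, compare the link's whole text with normalize-space(.) rather than its direct text() nodes:
links = driver.find_elements_by_xpath("//a[normalize-space(.) = 'View all hits in this text']")
for link in links:
    link.click()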
Hope that helps.
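One caveat with clicking in a loop: if each click navigates to a new page, the remaining elements in links belong to the old page, and Selenium will typically raise StaleElementReferenceException on the next iteration. A sketch that avoids this reads all the hrefs first and then opens them one by one with driver.get(). visit_each_link and its handle_page callback are hypothetical names; "link text" is the string value of Selenium's By.LINK_TEXT.

```python
# "link text" is the string value of selenium.webdriver.common.by.By.LINK_TEXT.
LINK_TEXT = "link text"


def visit_each_link(driver, text, handle_page):
    """Open every link whose visible text is `text`, one at a time.

    The hrefs are collected before navigating, because after the first
    navigation the WebElements from the original page would be stale.
    """
    hrefs = [link.get_attribute("href") for link in driver.find_elements(LINK_TEXT, text)]
    for href in hrefs:
        driver.get(href)
        handle_page(driver)  # hypothetical callback, e.g. scrape the results page
    return hrefs
```

For example, visit_each_link(driver, 'View all hits in this text', lambda d: print(d.title)) would print the title of each results page in turn.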
Try the code below (index 1 selects the second match, since Python lists are zero-based):
driver.find_elements_by_link_text('linktext')[1].click()