Situation
I'm using Selenium and Python to extract info from a page.
Here is the div I want to extract from:
I want to extract the "Registre-se" and the "Login" text.
My code
from selenium import webdriver
url = 'https://www.bet365.com/#/AVR/B146/R^1'
driver = webdriver.Chrome()
driver.get(url.format(q=''))
elements = driver.find_elements_by_class_name('hm-MainHeaderRHSLoggedOutNarrow_Join ')
for e in elements:
    print(e.text)
elements = driver.find_elements_by_class_name('hm-MainHeaderRHSLoggedOutNarrow_Login ')
for e in elements:
    print(e.text)
Problem
My code doesn't produce any output.
HTML
<div class="hm-MainHeaderRHSLoggedOutNarrow_Join ">Registre-se</div>
<div class="hm-MainHeaderRHSLoggedOutNarrow_Login " style="">Login</div>
Consider this HTML:
<div class="hm-MainHeaderRHSLoggedOutNarrow_Join ">Registre-se</div>
<div class="hm-MainHeaderRHSLoggedOutNarrow_Login " style="">Login</div>
Your code looks okay to me, except that you are using find_elements for a single web element.
Now consider this comment:
The class name "hm-MainHeaderRHSLoggedOutMed_Login " only appear in
the inspect of the website, but not in the page source. What it's
supposed to do now?
This suggests that the element is inside either an iframe or a shadow root, because page_source does not include the contents of iframes.
Please check whether it is in an iframe. If it is, you have to switch to the iframe first, and then you can use the code that you already have.
Switch to it like this:
driver.switch_to.frame(driver.find_element_by_xpath('xpath here'))
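If you are not sure which frame holds the element, one option is to look in the top-level document and then in each iframe in turn. The helper below is only a sketch: texts_in_frames is a hypothetical name, it checks direct child iframes only (not nested frames or shadow roots), and it takes locators as plain (by, value) tuples so it works with any driver object that exposes find_elements and switch_to.

```python
def texts_in_frames(driver, locator, frame_locator=("tag name", "iframe")):
    """Collect the text of elements matching `locator` in the top-level
    document and in every direct child iframe.

    `locator` is a (by, value) tuple. The default frame locator uses the
    string "tag name", which is the value of Selenium's By.TAG_NAME.
    """
    # Start with whatever matches in the top-level document.
    texts = [element.text for element in driver.find_elements(*locator)]

    # Then visit each direct child iframe and repeat the same search there.
    for frame in driver.find_elements(*frame_locator):
        driver.switch_to.frame(frame)
        try:
            texts.extend(element.text for element in driver.find_elements(*locator))
        finally:
            # Always return to the top-level document before the next frame.
            driver.switch_to.default_content()
    return texts
```

With a real driver you would first do from selenium.webdriver.common.by import By and then call something like texts_in_frames(driver, (By.CLASS_NAME, 'hm-MainHeaderRHSLoggedOutNarrow_Join')). If this returns an empty list, the element is probably not in a direct iframe, and a shadow root or late JavaScript rendering is worth checking next.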
Related
I'm an amateur at Python, and I'm trying to scrape the URL from the HTML below using Selenium.
<a class="" href="#" style="text-decoration: none; color: #1b1b1b;" onclick="toDetailOrUrl(event, '1641438','')">[안내] 빗썸 - 빗썸 글로벌 간 간편 가상자산 이동 서비스 종료 안내</a>
In the ordinary case, the URL I want would be right next to href=, but in this HTML it is just "#".
When I run the code below, which is the usual way of scraping such HTML with Selenium, it returns https://cafe.bithumb.com/view/boards/43. But that is just the URL I passed to driver.get(), which is not what I want.
url = "https://cafe.bithumb.com/view/boards/43"
driver=webdriver.Chrome('chromedriver.exe')
driver.get(url)
driver.implicitly_wait(30)
bo = driver.find_element_by_xpath("//tbody[1]/tr[@style='cursor:pointer;border-top:1px solid #dee2e6;background-color: white']/td[2]/a")
print(bo.get_attribute('href'))
What I want is https://cafe.bithumb.com/view/board-contents/1641438. You get this URL when you click the item matched by the XPath I wrote above.
I want to get this URL with Selenium or some other programmatic way, without opening Chrome, typing the URL into the address bar, and clicking with the mouse.
You can use
bo.click()
to click the element you want (I assume you want to click bo).
print(driver.execute_script('return arguments[0].getAttribute("href")',bo))
In Selenium, bo.get_attribute('href') effectively does document.getElementById("someLocator").href, which returns the fully resolved href. Since '#' refers to the current page, you get back the URL you passed to get().
If you just need the raw '#', you can use execute_script as shown above.
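Another option that avoids clicking at all is to read the onclick attribute and build the URL from the ID inside it. This is a sketch under two assumptions: that the first quoted argument of toDetailOrUrl is the post ID, and that the detail page always lives under /view/board-contents/ as in the URL from the question. url_from_onclick is a hypothetical helper name.

```python
import re

# Assumed URL prefix, taken from the target URL given in the question.
BOARD_CONTENTS_BASE = "https://cafe.bithumb.com/view/board-contents/"


def url_from_onclick(onclick):
    """Return the detail-page URL for an onclick value such as
    toDetailOrUrl(event, '1641438',''), or None if no ID is found."""
    # Capture the digits inside the first quoted argument after `event`.
    match = re.search(r"toDetailOrUrl\(\s*event\s*,\s*'(\d+)'", onclick or "")
    if match is None:
        return None
    return BOARD_CONTENTS_BASE + match.group(1)


example = url_from_onclick("toDetailOrUrl(event, '1641438','')")
```

With the element from the question, the call would be url_from_onclick(bo.get_attribute('onclick')). If the site passes a non-empty third argument for some rows, those rows may point elsewhere, so it is worth checking a few by hand.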
I would like to click an image link and I need to be able to find it by its src, however it's still not working for some reason. Is this even possible? This is what I'm trying:
#Find item
item = WebDriverWait(driver, 100000).until(EC.presence_of_element_located((By.XPATH, "//img[@src=link]")))
#item = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//img[@alt='Bzs36xl9 xa']")))
item.click()
link = //assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg in the above code. This matches the HTML from below.
The second locator works (finding image using alt), but I will only have the image source when the program actually runs.
HTML for the webpage:
<article>
<div class="inner-article">
<a style="height:81px;" href="/shop/accessories/h68lyxo2h/llhxzvydj">
<img width="81" height="81" src="//assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg" alt="Bzs36xl9 xa">
</a>
</div>
</article>
I don't see why finding by alt would work and not src, is this possible? I saw another similar question which is where I got my solution but it didn't work for me. Thanks in advance.
EDIT
To find the link I have to parse through a website in JSON format, here's the code:
#Loads Supreme JSON website into an object
url = urllib2.urlopen('https://www.supremenewyork.com/mobile_stock.json')
obj = json.load(url)
items = obj["products_and_categories"]["Accessories"]
itm_name = "Sock"
index = 0;
for i in items:
    if(itm_name in items[index]["name"]):
        found_url = i["image_url"]
        break
    index += 1
str_link = str(found_url)
link = str_link.replace("ca","vi")
Use WebDriverWait and element_to_be_clickable. Try the following XPath. Hope this works.
link ='//assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg'
item = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='inner-article']/a/img[@src='{}']".format(link))))
print(item.get_attribute('src'))
item.click()
item = WebDriverWait(driver, 100000).until(EC.presence_of_element_located((By.XPATH, "//img[@src=link]")))
Here's your problem, I can't believe it didn't jump out at me. The XPath string contains the literal word link, NOT the value of the variable link that you defined earlier. I don't know how to pass variables into XPaths directly, but I do know that you can use string formatting to build the correct XPath string just before using it.
I also don't speak Python, so here's some pseudo Java/C# to help you get the picture:
String xPathString = String.Format("//img[@src='{0}']", link);
item = WebDriverWait(driver, 100000).until(EC.presence_of_element_located((By.XPATH, xPathString)))
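In Python the same idea might look like the sketch below. img_xpath_by_src is a hypothetical helper name; it picks a quote character that the value does not contain, since XPath 1.0 string literals have no escape syntax.

```python
def img_xpath_by_src(src):
    """Build an XPath that matches an <img> whose src equals `src` exactly."""
    # Prefer single quotes around the value when the value has none.
    if "'" not in src:
        return "//img[@src='{}']".format(src)
    # Otherwise fall back to double quotes if those are free.
    if '"' not in src:
        return '//img[@src="{}"]'.format(src)
    # A value with both quote types would need XPath concat(); not handled here.
    raise ValueError("src contains both quote characters")


link = "//assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg"
xpath = img_xpath_by_src(link)
```

The resulting string can then be passed to the wait, for example EC.presence_of_element_located((By.XPATH, xpath)).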
I am trying to scrape the list of followings for a given instagram user. This requires using Selenium to navigate to the user's Instagram page and then clicking "following". However, I cannot seem to click the "following" button with Selenium.
driver = webdriver.Chrome()
url = 'https://www.instagram.com/beforeeesunrise/'
driver.get(url)
driver.find_element_by_xpath('//*[@id="react-root"]/section/main/article/header/div[2]/ul/li[3]/a').click()
However, this results in a NoSuchElementException. I copied the XPath from the HTML, tried using the class name, partial link and full link, and cannot seem to get this to work! I've also made sure that the above XPath includes the element with a "click" event listener.
UPDATE: By logging in I was able to get the above information. However (!), now I cannot get the resulting list of "followings". When I click on the button with the driver, the html does not include the information in the pop up dialog that you see on Instagram. My goal is to get all of the users that the given username is following.
Make sure you are using the correct XPath.
Use the following link to get reliable XPaths for web elements, and then try again.
Selenium Command
Hope this helps to solve the problem!
Try a different XPath. I've verified this is unique on the page.
driver.find_element_by_xpath("//a[contains(.,'following')]")
Finding elements for web scraping is not Selenium's main strength, so a better option is to delegate that task to a dedicated tool such as BeautifulSoup. Once we have found what we're looking for, we can ask Selenium to interact with the element.
The bridge between Selenium and BeautifulSoup is the function below, which I found here. It takes a single BeautifulSoup element and generates a unique XPath that we can use in Selenium.
import os
import re
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import itertools
def xpath_soup(element):
    """
    Generate xpath of soup element
    :param element: bs4 text or node
    :return: xpath as string
    """
    components = []
    child = element if element.name else element.parent
    for parent in child.parents:
        """
        @type parent: bs4.element.Tag
        """
        previous = itertools.islice(parent.children, 0, parent.contents.index(child))
        xpath_tag = child.name
        xpath_index = sum(1 for i in previous if i.name == xpath_tag) + 1
        components.append(xpath_tag if xpath_index == 1 else '%s[%d]' % (xpath_tag, xpath_index))
        child = parent
    components.reverse()
    return '/%s' % '/'.join(components)
driver = webdriver.Chrome(executable_path=YOUR_CHROMEDRIVER_PATH)
driver.get(url = 'https://www.instagram.com/beforeeesunrise/')
source = driver.page_source
soup = bs(source, 'html.parser')
button = soup.find('button', text=re.compile(r'Follow'))
xpath_for_the_button = xpath_soup(button)
elm = driver.find_element_by_xpath(xpath_for_the_button)
elm.click()
...and it works!
(But you need to write some code to log in with an account.)
Trying to get the search bar ID from this website: http://www.pexels.com
browser = webdriver.Chrome(executable_path="C:\\Users\\James\\Documents\\PythonScripts\\chromedriver.exe")
url = "https://www.pexels.com"
browser.get(url)
browser.maximize_window()
search_bar = browser.find_element_by_id("//input[@id='search__input']")
search_bar.send_keys("sky")
search_button.click()
However this isn't correct and I'm not sure how to get the search to work. First time using selenium so all help is appreciated!
There's no id attribute in the tag you are searching for. You may use css selectors instead. Here's a sample snippet:
from selenium.webdriver.common.keys import Keys

search_bar = driver.find_element_by_css_selector("input[placeholder='Search for free photos…']")
search_bar.send_keys("sky")
search_bar.send_keys(Keys.RETURN)
The snippet above types 'sky' into the search bar and presses Enter.
Locating Elements gives an explanation on how to locate elements with Selenium.
The element you want to select looks like this in the DOM:
<input required="required" autofocus="" class="search__input" type="search" placeholder="Search for free photos…" name="s">
Since there is no id specified you can't use find_element_by_id.
For me this worked:
search_bar = driver.find_element_by_xpath('/html/body/header[2]/div/section/form/input')
search_bar.send_keys("sky")
search_bar.send_keys(Keys.RETURN)
To select it, you can also use the class name:
search_bar = driver.find_element_by_class_name("search__input")
or the name attribute:
search_bar = driver.find_element_by_name('s')
However, locating elements by name is probably not a good idea if more than one element shares the same name.
BTW, if you are unsure about the XPath, the Google Chrome inspection tool lets you copy it from the document.
How do I find what Selenium sees in the DOM when it misses an image I see on screen?
Context: I have a Selenium python test
browser.wait_to_find_visible_element(By.ID, 'image')
that sometimes can't find an image that I can see in the browser Selenium launched for the test:
<div id="container">
<img id='image' src=''/>
</div>
To find out what Selenium sees instead, I get the enclosing div:
element = browser.find_displayed_elements(By.CSS_SELECTOR, '#container')
print element
which prints:
<selenium.webdriver.remote.webelement.WebElement object at 0x9b3876c>
and try to get the dom:
dom = browser.driver.execute_script('return arguments[0].parentNode', element)
print dom
which prints
None
What am I missing?
Have you tried this?
element = browser.find_displayed_elements(By.CSS_SELECTOR, '#container')
source_code = element.get_attribute("innerHTML")
# or
source_code = element.get_attribute("outerHTML")