The webpage is: https://www.vpgame.com/market/gold?order_type=pro_price&order=desc&offset=0
As you can see, there are 25 items in the selling section of this page; when you click one, it opens a new tab showing that item's details.
Now I want to write a program that collects those 25 item URLs and saves them in a list. My problem is that, as you can see in the page inspector, the items are plain <div> elements where you would expect <a> tags, and I can't find any 'href' attribute related to them.
# using selenium and driver = webdriver.Chrome()
link = driver.find_elements_by_tag_name('a')
link2 = [l.get_attribute('href') for l in link]
I thought I could do it with the code above, but as I said, there are no href attributes to read. Any suggestions?
Looks like you are trying to scrape a page that is powered by React. There are no href attributes because JavaScript handles all the linking. Your best bet is to use Selenium to click each of the div elements, switch to the newly opened tab, and use something like the code below to get the URL of the page it takes you to:
import time

links = driver.find_elements_by_class_name('card-header')
urls = []
for link in links:
    link.click()  # opens the item detail page in a new tab
    driver.switch_to.window(driver.window_handles[1])
    url = driver.current_url
    urls.append(url)
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
    time.sleep(1)
Note that the code closes the new tab each time and returns to the main tab. I added time.sleep() so the loop doesn't run faster than the browser can open and close tabs.
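A fixed sleep can still race if a tab is slow to open. If your Selenium version provides it, you can wait explicitly for the second window handle instead; a sketch, replacing the click-and-switch lines inside the loop above:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link.click()
# wait up to 10 seconds for the new tab to actually appear
WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[1])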
I am trying to access the text of an element using selenium with Python. I can access the elements themselves just fine, but when I try to get the text it doesn't work.
This is my code:
from selenium import webdriver
driver = webdriver.Chrome() # I removed the path for my post, but there is one that works in my actual code
URL = "https://www.costco.com/laptops.html"
driver.get(URL)
prices = driver.find_elements_by_class_name("price")
print([price.text for price in prices])
If I run this code I get: selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document
However, if I were to print out the elements themselves, I have no problem.
I read some previous posts about the stale element exception, but I don't understand why it applies to me in this case. Why would the DOM change when I try to access the text? Why is this happening?
Turns out you just need to wait:
from selenium import webdriver
import time
driver = webdriver.Chrome() # I removed the path for my post, but there is one that works in my actual code
URL = "https://www.costco.com/laptops.html"
driver.get(URL)
time.sleep(3)
prices = driver.find_elements_by_class_name("price")
print([price.text for price in prices])
Output:
['$1,999.99', '$2,299.99', '', '', '$769.99', '', '$799.99', '$1,449.99', '$1,199.99', '$1,199.99', '$1,999.99', '$1,599.99', '$1,299.99', '$2,299.99', '$1,549.99', '$1,499.99', '$599.99', '$1,699.99', '$1,079.99', '$2,999.99', '$1,649.99', '$1,499.99', '$2,399.99', '$1,499.97', '$1,199.99', '$1,649.99', '$849.99', '']
The correct way to do this is to use WebDriverWait rather than a fixed sleep; a sketch follows.
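A minimal sketch of the WebDriverWait approach, assuming the same page and class name as above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.costco.com/laptops.html")
# block until at least one price element is in the DOM, up to 10 seconds
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "price"))
)
prices = driver.find_elements_by_class_name("price")
print([price.text for price in prices])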
Old answer:
I am not entirely sure why that is happening. But I would suggest you try BeautifulSoup:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome() # I removed the path for my post, but there is one that works in my actual code
URL = "https://www.costco.com/laptops.html"
driver.get(URL)
soup = BeautifulSoup(driver.page_source, "html.parser")
divs = soup.find_all("div", {"class": "price"})
[div.text.replace("\t", '').replace("\n", '') for div in divs]
Output:
['$1,099.99',
'$399.99',
'$1,199.99',
'$599.99',
'$1,049.99',
'$799.99',
'$699.99',
'$949.99',
'$699.99',
'$1,999.99',
'$449.99',
'$2,699.99',
'$1,149.99',
'$1,599.99',
'$1,049.99',
'$1,249.99',
'$299.99',
'$1,799.99',
'$749.99',
'$849.99',
'$2,299.99',
'$999.99',
'$649.99',
'$799.99']
I am hoping someone can help me with a nested loop in Selenium. I am trying to scrape a website, and I have to collect information from several different links.
So I got all the links and looped through them, but only the first link displayed the items I needed; after that the code breaks.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def get_financial_info(self):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--window-size=1920x1080")
    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path='/home/miracle/chromedriver')
    driver.get("https://www.financialjuice.com")
    try:
        WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='trendWrap']")))
    except TimeoutException:
        driver.quit()
    category_url = driver.find_elements_by_xpath("//ul[@class='nav navbar-nav']/li[@class='text-uppercase']/a[@href]")
    for record in category_url:
        driver.get(record.get_attribute("href"))
        news = {}
        title_element = driver.find_elements_by_xpath("//p[@class='headline-title']")
        for news_record in title_element:
            news['title'] = news_record.text
            print news
Your category_url elements are valid only on the page where you located them; after the first navigation to another page they become stale and raise StaleElementReferenceException.
You need to replace
category_url = driver.find_elements_by_xpath("//ul[@class='nav navbar-nav']/li[@class='text-uppercase']/a[@href]")
with
category_url = [a.get_attribute("href") for a in driver.find_elements_by_xpath("//ul[@class='nav navbar-nav']/li[@class='text-uppercase']/a")]
and then loop through the list of links as
for record in category_url:
    driver.get(record)
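Putting it together, the relevant part of the method becomes (a sketch under the same assumptions and Python 2 print style as the original code):

# collect the href strings up front so later navigation cannot invalidate them
category_url = [a.get_attribute("href")
                for a in driver.find_elements_by_xpath(
                    "//ul[@class='nav navbar-nav']/li[@class='text-uppercase']/a")]
for record in category_url:
    driver.get(record)
    news = {}
    for news_record in driver.find_elements_by_xpath("//p[@class='headline-title']"):
        news['title'] = news_record.text
        print news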
I tried to get all URLs from this website:
https://www.bbvavivienda.com/es/buscador/venta/vivienda/todos/la-coruna/
There are a lot of links like https://www.bbvavivienda.com/es/unidades/UV_n_UV00121705 inside, but I'm not able to retrieve them with Selenium. Any idea how to do it?
I've added more info below about how I tried it. Obviously I'm just starting with Python, Selenium, etc. Thanks in advance:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome("D:\Python27\selenium\webdriver\chrome\chromedriver.exe")
driver.implicitly_wait(30)
driver.maximize_window()
driver.get("https://www.bbvavivienda.com/es/buscador/venta/vivienda/todos/la-coruna/")
urls = driver.find_element_by_css_selector('a').get_attribute('href')
print urls
links = driver.find_elements_by_partial_link_text('_self')
for link in links:
    print link.get_attribute("href")
driver.quit()
The following code should work. You are using the wrong locator for the links: find_elements_by_partial_link_text matches against the visible link text, not the target attribute, so searching for '_self' never matches anything. An XPath on the target attribute works instead.
driver = webdriver.Chrome()
driver.implicitly_wait(30)
driver.maximize_window()
driver.get("https://www.bbvavivienda.com/es/buscador/venta/vivienda/todos/la-coruna/")
urls = driver.find_element_by_css_selector('a').get_attribute('href')
print urls
for link in driver.find_elements_by_xpath("//a[@target='_self']"):
    try:
        print link.get_attribute("href")
    except Exception:
        pass
driver.quit()
I don't know Python, but in Java we would normally find all elements with the tag name "a" to collect the links on a page. You may find the code snippet below useful.
List<WebElement> links = driver.findElements(By.tagName("a"));
System.out.println(links.size());
// indices run from 0 to size - 1; starting at 1 and using <= would skip
// the first link and throw IndexOutOfBoundsException on the last pass
for (int i = 0; i < links.size(); i++)
{
    System.out.println(links.get(i).getText());
}
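For reference, a rough Python equivalent of the Java snippet above, assuming a driver has already been created (keeping the Python 2 print style used elsewhere on this page):

# find every anchor tag and print its visible text
links = driver.find_elements_by_tag_name("a")
print len(links)
for link in links:
    print link.text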
I am trying to scrape a Google Finance page for the title inside a td. This is the code I have so far, but I am missing something.
from selenium import webdriver
case_url = "http://www.google.com/finance?q=NYSE%3Acalm&ei=7DIoVcKZNo2ZjALz8YCYCw"
driver = webdriver.Firefox()
driver.get(case_url)
elem = driver.find_element_by_class_name("ctsymbol")
print(elem[1])
assert "No results found." not in driver.page_source
driver.close()
The element with this class, as rendered in the browser, shows the text:
IBA
Help!!
There are eleven elements with this class.
The method you're using, find_element_by_class_name, returns only a single WebElement, so with elem[1] you're indexing into something that isn't a list.
If you want to have a list of all elements with this class, use find_elements_by_class_name - see http://selenium-python.readthedocs.org/en/latest/locating-elements.html for the difference.
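A minimal sketch of the plural variant on the same page:

# elements (plural) returns a list of every match for the class
elems = driver.find_elements_by_class_name("ctsymbol")
print(elems[1].text)  # text of the second matching element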