Using selenium to parse a class from browser - python

I am trying to scrape a Google web page for the title inside a td. This is the code I have so far, but I am missing something.
from selenium import webdriver
case_url = "http://www.google.com/finance?q=NYSE%3Acalm&ei=7DIoVcKZNo2ZjALz8YCYCw"
driver = webdriver.Firefox()
driver.get(case_url)
elem = driver.find_element_by_class_name("ctsymbol")
print(elem[1])
assert "No results found." not in driver.page_source
driver.close()
The class as seen in the browser is as follows:
IBA
Help!!

There are eleven elements with this class.
The method you're using, find_element_by_class_name, returns only a single element. So with elem[1] you're indexing into something that is not actually a list.
If you want to have a list of all elements with this class, use find_elements_by_class_name - see http://selenium-python.readthedocs.org/en/latest/locating-elements.html for the difference.
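To make the difference concrete, here is a minimal sketch that collects the text of every element carrying a given class. The helper name `texts_by_class` is hypothetical. It uses the generic `find_elements(by, value)` form and passes the locator strategy as the string "class name", which is the value of Selenium's `By.CLASS_NAME` constant.

```python
def texts_by_class(driver, class_name):
    """Return the visible text of every element that has the given class."""
    # find_elements always returns a list (empty when nothing matches),
    # so indexing into it or iterating over it is valid.
    elements = driver.find_elements("class name", class_name)
    return [element.text for element in elements]
```

With the driver from the question, `texts_by_class(driver, "ctsymbol")[1]` would give the text of the second match, assuming at least two elements are found.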

Related

Selenium python function find_elements_by_css_selector() not returning expected data

I am new to Selenium and am trying to scrape data (just names for now) from the bourbon product cards on thewhiskyexchange.com. I have tested all of my CSS (and XPath) selectors in the Scrapy shell, so I know they are correct, but the output contains coded information about the "session" and the element that I do not understand. The number of items in the list seems to be correct, so maybe Selenium is doing exactly what it is supposed to do and I just don't know how to convert the output into something I can use. How do I get just the names from the product cards?
I have tried both the driver and the local selector functions Selenium offers, with the same results. Beautiful Soup functions return the data I need, but that method is too inefficient for the scope of the project I am working on. Any insight into how I can fix this would be greatly appreciated.
IN[]:
chrome_options = Options()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("--window-size=1920x1080")
chrome_options.binary_location = r"C:\Program Files\Google\Chrome\Application\chrome.exe"  # raw string keeps the backslashes literal
IN[]:
driver = webdriver.Chrome(ChromeDriverManager().install())
IN[]:
url = "https://www.thewhiskyexchange.com/c/639/bourbon-whiskey"
driver.get(url)
time.sleep(5) # second delay to improve visual quality
html = driver.page_source
html # HTTP request response object is as expected
IN[]:
els = driver.find_elements_by_css_selector('p.product-card__name')
# local method: els = driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')
els
OUT[]:
[<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="b9384a19-f8c9-46b2-be99-780200dcba99")>,
<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="af76dfa8-b86c-426a-8ad8-30ea904ed11b")>,
<selenium.webdriver.remote.webelement.WebElement (session="e521768d8df1dd788b1fda816299b0b5", element="58b14e5a-6bc3-443a-807f-ec696e83b096")>, ...
find_elements returns a list of web elements, whereas find_element returns a single web element.
You can iterate over the list and extract the text like this:
IN[]:
from selenium.webdriver.common.by import By
els = driver.find_elements(By.CSS_SELECTOR, 'p.product-card__name')
for e in els:
    print(e.text)  # .text gives the visible text of each WebElement
Also, note that find_elements_by_css_selector has been deprecated in Selenium 4, so one should use find_elements(By.CSS_SELECTOR, "") instead.

Get html of inspect element source with selenium

I'm working in selenium with Chrome.
The webpage I'm accessing updates dynamically.
I need the html that shows the results, I can access it when I do 'inspect element'.
I don't get how I need to access that html from my code. I always get the original html.
I tried this: Get HTML Source of WebElement in Selenium WebDriver using Python
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
searchform = browser.find_element_by_class_name('iceInpTxt')
searchform.send_keys('cefuroxim')
button = browser.find_element_by_class_name('iceCmdBtn').click()
element = browser.find_element_by_class_name('contentContainer')
html = element.get_attribute('innerHTML')
browser.close()
print(html)
It seems to work after some delay. If I were you, I would experiment with the delay time.
from selenium import webdriver
import time
browser = webdriver.Chrome()
browser.get('http://bijsluiters.fagg-afmps.be/?localeValue=nl')
searchform = browser.find_element_by_class_name('iceInpTxt')
searchform.send_keys('cefuroxim')
button = browser.find_element_by_class_name('iceCmdBtn').click()
time.sleep(10)
element = browser.find_element_by_class_name('contentContainer')
html = element.get_attribute('innerHTML')
browser.close()
print(html)
Addition: a nicer way is to let the script proceed as soon as a specific element is available, since JavaScript (for example) may take some time to add that element to the DOM. The element to look for in your example seems to be the table with id iceDatTbl (from what I could find after a quick look).
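The idea of waiting for an element instead of sleeping a fixed time can be sketched as a small polling loop. This is a hand-rolled illustration, not Selenium's own implementation: the helper name `wait_for_element` and its `timeout` and `poll` parameters are hypothetical, and the locator strategy is passed as a plain string ("id" is the value of `By.ID`).

```python
import time


def wait_for_element(driver, by, value, timeout=10.0, poll=0.5):
    """Poll until at least one matching element exists, then return the first."""
    deadline = time.monotonic() + timeout
    while True:
        matches = driver.find_elements(by, value)  # empty list while absent
        if matches:
            return matches[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no element matched %s=%r within %s seconds" % (by, value, timeout)
            )
        time.sleep(poll)  # short pause before looking again
```

In the script above, `wait_for_element(browser, "id", "iceDatTbl")` would take the place of `time.sleep(10)`, assuming that id is correct. Selenium ships the same pattern ready-made as `WebDriverWait` combined with `expected_conditions.presence_of_element_located`, which is usually the better choice in real code.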

Selenium webdriver error - stale element reference

I am trying to practice my Selenium skills on the website https://instagram.com
I can find an object using Selenium and click it, but I cannot send keys to it.
Basically, I tried to automate comments on Instagram. I found the "Add Comment" field and successfully clicked on it, but when I tried to send keys I got an error.
CODE SECTION:
comment_picture = driver.find_elements_by_tag_name('textarea')
for l in comment_picture:
    try:
        print l.get_attribute("class")
        l.click()
        time.sleep(1)
        l.send_keys('test')
ERROR SECTION:
Message: stale element reference: element is not attached to the page document
The expected result is that I can comment on each photo on Instagram.
I don't want a full answer; I really want to learn Selenium. If someone knows what I'm doing wrong, a hint rather than a full answer would be great.
The element goes stale because it is modified when you click it, so you have to re-find the element, like this:
comment_picture = driver.find_elements_by_tag_name('textarea')
index = 1  # XPath indices start at 1
for txt in comment_picture:
    try:
        txt.click()
        time.sleep(1)
        # re-find the textarea, since the click may have replaced it in the DOM
        txt = driver.find_element_by_xpath('(//textarea)[%s]' % index)
        txt.send_keys('test')
    except Exception as exc:  # handler assumed; report the failure and continue
        print(exc)
    index = index + 1
Element references returned by the .find_element_* functions are from the initial page load. When you click(), you are navigating away from the initial page, which makes all of the element references stale. You will need to call find_elements again before you send keys to the new elements.
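One way to follow this advice is to never keep an element reference across a click, and to look the element up by position each time instead. The sketch below assumes the page has a fixed number of textareas; the helper name `type_into_each_textarea` and the `pause` parameter are hypothetical. Locator strategies are passed as strings ("tag name" and "xpath" are the values of `By.TAG_NAME` and `By.XPATH`).

```python
import time


def type_into_each_textarea(driver, text, pause=1.0):
    """Click and type into every textarea, re-locating it before each action."""
    count = len(driver.find_elements("tag name", "textarea"))
    for position in range(1, count + 1):  # XPath positions start at 1
        locator = "(//textarea)[%d]" % position
        driver.find_element("xpath", locator).click()
        time.sleep(pause)  # give the page time to re-render after the click
        # Look the element up again: the reference used for click() may be stale.
        driver.find_element("xpath", locator).send_keys(text)
    return count
```

If the click changes how many textareas are on the page, the positions no longer line up and a more specific locator would be needed.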
You have to make sure you are signed in and able to comment.
Also set executable_path='your/path' to the location of your chromedriver:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from time import sleep
driver = webdriver.Chrome(executable_path="/home/chromedriver")
driver.get('https://www.instagram.com/p/Bs0y4Myg3Hk/')
comment = driver.find_element_by_xpath('//*[@id="react-root"]/section/main/div/div/article/div[2]/section[3]/div/form/textarea')
comment.send_keys('hello')
comment.send_keys(Keys.RETURN)
sleep(10)
driver.close()

How to select a value from a drop-down using Selenium from a website with special setting- Python

Note: I particularly deal with this website
How can I use selenium with Python to get the reviews on this page to sort by 'Most recent'?
What I tried was:
driver.find_element_by_id('sort-order-dropdown').send_keys('Most recent')
This, taken from another post, didn't cause any error but didn't work either.
Then I tried
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element_by_id('sort-order-dropdown'))
select.select_by_value('recent')
select.select_by_visible_text('Most recent')
select.select_by_index(1)
I've got: Message: Element <select id="sort-order-dropdown" class="a-native-dropdown" name=""> is not clickable at point (66.18333435058594,843.7999877929688) because another element <span class="a-dropdown-prompt"> obscures it
This one
element = driver.find_element_by_id('sort-order-dropdown')
element.click()
li = driver.find_elements_by_css_selector('#sort-order-dropdown > option:nth-child(2)')
li.click()
This, also taken from another post, caused the same error message.
This one, from yet another post, caused the same error as well:
Select(driver.find_element_by_id('sort-order-dropdown')).select_by_value('recent').click()
So, I'm curious to know if there is any way that I can select the reviews to sort from the most recent first.
Thank you
This worked for me using Java:
@Test
public void amazonTest() throws InterruptedException {
    String URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews";
    String menuSelector = ".a-dropdown-prompt";
    String menuItemSelector = ".a-dropdown-common .a-dropdown-item";
    driver.get(URL);
    Thread.sleep(2000);
    WebElement menu = driver.findElement(By.cssSelector(menuSelector));
    menu.click();
    List<WebElement> menuItem = driver.findElements(By.cssSelector(menuItemSelector));
    menuItem.get(1).click();
}
You can reuse the element names and follow a similar path using Python.
The key points here are:
1. Click on the menu itself.
2. Click on the second menu item.
It is a better practice not to hard-code the item number but actually read the item names and select the correct one so it works even if the menu changes. This is just a note for future improvement.
EDIT
This is how the same can be done in Python.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews"
menuSelector = ".a-dropdown-prompt"
menuItemSelector = ".a-dropdown-common .a-dropdown-item"
driver = webdriver.Chrome()
driver.get(URL)
elem = driver.find_element_by_css_selector(menuSelector)
elem.click()
time.sleep(1)
elemItems = []
elemItems = driver.find_elements_by_css_selector(menuItemSelector)
elemItems[1].click()
time.sleep(5)
driver.close()
As a side note, CSS selectors are generally a better alternative to XPath here, as they tend to be faster, more robust, and easier to read and change.
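The earlier note about not hard-coding the item number can be sketched as follows: open the menu, read the item labels, and click the one that matches. The helper name `click_menu_item` is hypothetical, and the locator strategy is passed as the string "css selector", which is the value of `By.CSS_SELECTOR`.

```python
def click_menu_item(driver, menu_selector, item_selector, label):
    """Open a custom drop-down and click the item whose text equals label."""
    driver.find_element("css selector", menu_selector).click()
    for item in driver.find_elements("css selector", item_selector):
        # Compare on visible text so the code survives reordered menu items.
        if item.text.strip() == label:
            item.click()
            return True
    return False  # no item carried that label, so nothing was selected
```

With the selectors from this answer, the call would be `click_menu_item(driver, ".a-dropdown-prompt", ".a-dropdown-common .a-dropdown-item", "Most recent")`. On a real page a short wait between opening the menu and reading its items may still be needed, as in the code above.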
This is a simplified version of what I did to get the reviews sorted from the most recent ones. As "Eugene S" said above, the key point is to click on the button itself and then select/click the desired item from the list. However, my Python code uses XPath instead of a CSS selector.
# click on "Top rated" button
driver.find_element_by_xpath('//*[@id="a-autoid-4-announce"]').click()
# this one selects "Most recent"
driver.find_element_by_xpath('//*[@id="sort-order-dropdown_1"]').click()

How to loop though HTML and return id values

I am using selenium to navigate to a webpage and store the page source in a variable.
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://google.com")
html1 = driver.page_source
html1 now contains the page source of http://google.com.
My question is How can I return html selectors such as id="id" or name="name".
EDIT:
For example:
The webpage I navigated to with selenium has a menu bar with 4 tabs. Each tab has an id element; id="tab1", id="tab2", and so on. I would like to return each id value. So I want tab1, tab2, so on.
Edit#2:
Another example:
The homepage of my website (http://chrisarroyo.me) has several clickable links with ids. I would like to be able to return/print those ids to my console.
So I would like to return the ids for the Learn More button and the ids for the links in the footer (facebookLnk, githubLnk, etc..)
If you are looking for a list of WebElements that have an ID use:
elements = driver.find_elements_by_xpath("//*[@id]")
You can then iterate over that list and use get_attribute("id") to pull out each element's specific ID.
For name, it's pretty much the same code: change id to name and you're set.
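Since the id and name cases differ only in the attribute, both can be covered by one small helper. This is a sketch: the name `attribute_values` is hypothetical, and the locator strategy is passed as the string "xpath", which is the value of `By.XPATH`.

```python
def attribute_values(driver, attribute):
    """Return the value of `attribute` for every element that defines it."""
    # //*[@id] matches any element with an id attribute; the same pattern
    # works for name or any other attribute.
    elements = driver.find_elements("xpath", "//*[@%s]" % attribute)
    return [element.get_attribute(attribute) for element in elements]
```

Calling `attribute_values(driver, "id")` or `attribute_values(driver, "name")` after `driver.get(...)` would then return the values as a list of strings.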
Thank you @stewartm, your comment helped.
This ended up giving me the results I was looking for:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://chrisarroyo.me")
id_elements = driver.find_elements_by_xpath("//*[@id]")
for eachElement in id_elements:
    individual_ids = eachElement.get_attribute("id")
    print(individual_ids)
After running the above, the output listed each of the ids on the webpage specified.
output:
navbarNavAltMarkup
learnBtn
githubLnk
facebookLnk
linkedinLnk
