I have two questions.
I'm trying to get the text values (Medium Coverage, Liquid Formula), but there are no stable locators. These text values differ from page to page, but the locations are the same. Is there any way to get this text, like:
find_element_by_class('css_s6sd4y.eanm7710') -> go down to one more row then select the element
There is a class name, but I wonder if there is another way to find the element, using data-at='sku_size_label'. And what is the 'data-at' locator called?
For the first question, you can use the code below.
To get Medium Coverage:
wait = WebDriverWait(driver, 10)
ele = wait.until(EC.visibility_of_element_located((By.XPATH, "//img[@alt='Medium Coverage']/.."))).text
print(ele)
or
wait = WebDriverWait(driver, 10)
ele = wait.until(EC.visibility_of_element_located((By.XPATH, "//img[@alt='Medium Coverage']/.."))).get_attribute('innerHTML')
print(ele)
To get Liquid Formula:
wait = WebDriverWait(driver, 10)
ele = wait.until(EC.visibility_of_element_located((By.XPATH, "//img[@alt='Liquid Formula']/.."))).text
print(ele)
or
wait = WebDriverWait(driver, 10)
ele = wait.until(EC.visibility_of_element_located((By.XPATH, "//img[@alt='Liquid Formula']/.."))).get_attribute('innerHTML')
print(ele)
For your second question: yes, you can use data-at='sku_size_label' in XPath or CSS:
XPath:
//span[contains(@data-at, 'sku_size_label')]
CSS:
span[data-at='sku_size_label']
And as for "what is the 'data-at' locator called?": they are not called locators; they are simply attributes of the respective tag.
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
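Putting both together, here is a minimal sketch in Python; the URL is a placeholder and the selectors assume the data-at markup described above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.example.com/product")  # placeholder URL

wait = WebDriverWait(driver, 10)
# CSS: match the custom data-at attribute directly
label = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span[data-at='sku_size_label']")))
print(label.text)

# XPath equivalent of the same attribute match
label = driver.find_element(By.XPATH, "//span[@data-at='sku_size_label']")
print(label.text)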
The code below sometimes returns one (1) element, sometimes all, and sometimes none. For it to work for my application, I need it to return all the matching elements on the page.
Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By

def villanovan():
    driver = webdriver.Chrome()
    driver.implicitly_wait(10)
    url = 'http://webcache.googleusercontent.com/search?q=cache:https://villanovan.com/&strip=0&vwsrc=0'
    url_2 = 'https://villanovan.com/'
    driver.get(url_2)
    a = driver.find_elements(By.CLASS_NAME, "homeheadline")
    titles = [i.text for i in a if len(i.text) != 0]
    links = [i.get_attribute('href') for i in a if len(i.text) != 0]
    return [titles, links]

if __name__ == "__main__":
    print(villanovan())
I was expecting a list with multiple links and article titles, but received a list with only the first element found, not all of them.
To extract the values of the href attributes you can use a list comprehension with either of the following locator strategies:
Using CSS_SELECTOR:
driver.get("https://villanovan.com/")
time.sleep(3)
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.CSS_SELECTOR, "a.homeheadline[href]")])
Using XPATH:
driver.get("https://villanovan.com/")
time.sleep(3)
print([my_elem.get_attribute("href") for my_elem in driver.find_elements(By.XPATH, "//a[@class='homeheadline' and @href]")])
Note: you have to add the following imports:
import time
from selenium.webdriver.common.by import By
Console Output:
['https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/22098/news/decarbonizing-villanova-a-town-hall-on-fossil-fuel-divestment/', 'https://villanovan.com/22096/news/biology-professors-granted-1-million-for-wetlands-research/', 'https://villanovan.com/22093/news/students-create-the-space-supporting-sex-education/', 'https://villanovan.com/22098/news/decarbonizing-villanova-a-town-hall-on-fossil-fuel-divestment/', 'https://villanovan.com/22096/news/biology-professors-granted-1-million-for-wetlands-research/', 'https://villanovan.com/22044/culture/julia-staniscis-leaning-on-letters/', 'https://villanovan.com/22032/culture/villanova-sorority-recruitment-recap/', 'https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/21932/opinion/villanova-should-be-free-for-families-earning-less-than-100000/', 'https://villanovan.com/21897/opinion/grasshoppergate-the-state-of-villanova-dining/', 'https://villanovan.com/22105/sports/villanova-goes-cold-in-clutch-against-no-14-marquette/', 'https://villanovan.com/22102/sports/villanova-bests-marquette-in-blowout-win-73-54/', 'https://villanovan.com/22093/news/students-create-the-space-supporting-sex-education/', 'https://villanovan.com/22090/news/mlk-day-of-service/', 'https://villanovan.com/22087/news/university-updates-covid-procedures/']
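If you would rather not rely on a fixed time.sleep(3), here is a sketch of the same extraction with an explicit wait; only the waiting strategy is swapped:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://villanovan.com/")
# Wait until all matching headline anchors are visible instead of sleeping a fixed time
elems = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.homeheadline[href]")))
print([e.get_attribute("href") for e in elems])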
I was trying to select all the 264 Recipients from the second form on this website and I used the code below:
s = Service('./chromedriver.exe')
driver = webdriver.Chrome(service=s)
url = "https://armstrade.sipri.org/armstrade/page/trade_register.php"
driver.get(url)
sleep(5)
and here's the loop:
for b in range(1, 264):
    Recipients = driver.find_element(By.XPATH, "//*[@id='selFrombuyer_country_code']/option[%d]" % b)
    Recipients.click()
    sleep(0.2)
    # click "ADD" option
    driver.find_element(By.XPATH, "/html/body/font/div/form/table/tbody/tr[3]/td/table/tbody/tr/td[2]/input[1]").click()
    sleep(0.2)
When I tested this code, the loop worked only partly: it skips random elements and selected only about half of them.
And here are some of the elements my loop ignored, which look perfectly normal:
//*[#id="selFrombuyer_country_code"]/option[2]
//*[#id="selFrombuyer_country_code"]/option[26]
Why can't my loop traverse all the elements?
This should work for you. No hard-coded ranges:
# Below are the additional imports needed
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Code after the driver is instantiated
driver.get("https://armstrade.sipri.org/armstrade/page/trade_register.php")
c_codes = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@id='selFrombuyer_country_code']//option")))
print(len(c_codes))
ls1 = []
ls2 = []
for each_code in c_codes:
    ls1.append(each_code.text)
    each_code.click()
    time.sleep(0.2)
    WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//*[@id='selFrombuyer_country_code']/..//..//input[contains(@value, 'Add')]"))).click()
all_c = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@name='selselTobuyer_country_code']//option")))
for each_c in all_c:
    ls2.append(each_c.text)
print(len(ls1))
print(len(ls2))
print(ls1)
print(ls2)
driver.close()
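As an alternative, Selenium's built-in Select helper is designed for <select> elements; here is a minimal sketch of reading every option through it, reusing the element id from the question:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("https://armstrade.sipri.org/armstrade/page/trade_register.php")

# Wrap the recipients <select>; .options returns every <option> it contains
recipients = Select(driver.find_element(By.ID, "selFrombuyer_country_code"))
print(len(recipients.options))
for option in recipients.options:
    print(option.text)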
I want to crawl images from a single webpage. The image URLs sit in a div tag, embedded in the style value, like so:
<div class="v-image__image v-image__image--cover" style="background-image: url("https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png"); background-position: center center;"></div>
I want to get that: https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/f3ea4910e239eb704af755c65f548e35_car.png
But when I try Chrome driver find_elements or soup.find, they return empty lists, because there is no text between the div tags.
I'm looking for a way to get at what is inside the div tag's attributes, not the text between the tags.
To get all the images, you should wait for the presence of all the elements.
Once you have the list in Python, such as all_images (see below), you can strip the () and "" as shown below.
Sample code :
driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver binary
driver.maximize_window()
# driver.implicitly_wait(50)
wait = WebDriverWait(driver, 20)
links = []
driver.get("https://mashinbank.com/ad/GkbI20tzp3/%D8%AE%D8%B1%DB%8C%D8%AF-%D9%BE%D8%B1%D8%A7%DB%8C%D8%AF-111-SE-1397")
all_images = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'v-image__image--cover')]")))
for image in all_images:
    a = image.get_attribute('style')
    # style looks like: background-image: url("https://..."); keep what is inside the parentheses
    b = a.split("(")[1].split(")")[0].replace('"', '')
    links.append(b)
print(links)
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output :
['https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/cabdf9f3f379e5b839300f89a90ab27e_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/e1c6c75dda980a6b4b4a83932ed49832_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/81ef7c57ca349485a9ba78bf0e42e13f_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/02bd13f2c5ce936ec3db10706c03854d_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/cabdf9f3f379e5b839300f89a90ab27e_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/e1c6c75dda980a6b4b4a83932ed49832_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/81ef7c57ca349485a9ba78bf0e42e13f_car.png', 'https://mashinbank.com/api/parse/files/7uPtEVa0plEFoNExiYHcbtL1rQnpIGnnPHVuvKKu/02bd13f2c5ce936ec3db10706c03854d_car.png']
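Note that the output contains each URL twice, because the same image matches more than one div on the page; if that matters, a small order-preserving de-duplication pass fixes it:

# dict.fromkeys keeps the first occurrence of each URL and drops repeats
unique_links = list(dict.fromkeys(links))
print(unique_links)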
A Selenium solution for this issue can be as follows:
Wait for the element's visibility, and only after that extract the element's attribute.
Then split the style attribute to get the url value.
Like this:
wait = WebDriverWait(driver, 20)
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[contains(@style,'https://mashinbank.com/api/parse/files')]")))
style_content = element.get_attribute("style")
# background-image comes first, so keep the part before the first ";" and slice out the URL
url = style_content.split(";")[0].split('url("')[1].rstrip('")')
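Continuing from the element located above, an equivalent extraction with a regular expression is less sensitive to how the browser serializes the quotes:

import re

style_content = element.get_attribute("style")
# Match the first url(...) value, with or without surrounding quotes
match = re.search(r'url\("?([^")]+)"?\)', style_content)
if match:
    print(match.group(1))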
I have written the following code:
import os
from webdriver_manager.chrome import ChromeDriverManager
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--start-maximized')
options.page_load_strategy = 'eager'
driver = webdriver.Chrome(options=options)
url = "https://www.moneycontrol.com/india/stockpricequote/chemicals/tatachemicals/TC"
driver.get(url)
try:
    wait = WebDriverWait(driver, 10)
except Exception:
    driver.send_keys(Keys.CONTROL + 'Escape')
driver.find_element_by_link_text("Bonus").click()
try:
    wait = WebDriverWait(driver, 5)
except Exception:
    driver.send_keys(Keys.CONTROL + 'Escape')
for i in range(0, 50):
    bonus_month = driver.find_element_by_xpath("//*[@class= 'mctable1.thborder.frtab']/tbody/tr[%s]/td[1]" % (i))
    print(bonus_month.text)
    bonus = driver.find_element_by_xpath("//*[@class= 'mctable1.thborder.frtab']/tbody/tr[%s]/td[1]" % (i))
    print(bonus.text)
This gives me the error:
no such element: Unable to locate element: {"method":"xpath","selector":"//*[@class= 'mctable1.thborder.frtab']/tbody/tr[0]/td[1]"}
Where am I making a mistake in finding the Ex-Bonus and Ratio?
First, use the element_to_be_clickable expected condition to check that the element becomes clickable within the given time, just to make sure it is operational.
Once the click action is performed on the Bonus link, the table takes some time to finish loading. In the meantime, Selenium tries to fetch the table content and fails to get it. So add another wait for the element to load, then grab the table using XPath and iterate over its rows:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//ul[#id='corporation_tab']//a[#href='#ca_bonus']")))
driver.find_element_by_link_text("Bonus").click()
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//tbody[#id='id_b']/tr")))
tableRows = driver.find_elements_by_xpath('//tbody[#id="id_b"]/tr')
print(tableRows)
for i in tableRows:
AnnouncementDate = i.find_element_by_xpath('td[1]').text
exbonus = i.find_element_by_xpath('td[2]').text
ratio = i.find_element_by_xpath('td[3]').text
print(AnnouncementDate + " \t\t " + exbonus + " \t\t " + ratio)
This returns the announcement date, ex-bonus, and ratio for each row as output.
You will need the following extra import:
from selenium.webdriver.support import expected_conditions as EC
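As an alternative for tabular data, once the rows are present you can hand the rendered page to pandas, which parses every <table> it finds. A sketch, assuming pandas and lxml are installed (which table index holds the bonus data is not guaranteed, so inspect the list):

import pandas as pd

# read_html parses every <table> in the rendered HTML into a DataFrame
tables = pd.read_html(driver.page_source)
for df in tables:
    print(df.head())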
This will partially solve your issue with the locators:
1. To find Ex-Bonus, use the CSS selector #id_b>tr>td:nth-of-type(2)
2. To find Ratio, also use a CSS selector, #id_b>tr>td:nth-of-type(3)
To iterate, use:
#id_b>tr:nth-of-type(x)>td:nth-of-type(3)
where x is the number of the row.
For example, #id_b>tr:nth-of-type(1)>td:nth-of-type(3) will give you the text with ratio 3:5.
If you avoid using #id_b, this locator will not be unique.
Instead of the range function I'd use find_elements_by_css_selector. Try the following:
rows = driver.find_elements_by_css_selector("#id_b>tr")
for row in rows:
    bonus = row.find_element_by_css_selector("td:nth-of-type(2)").text
    ratio = row.find_element_by_css_selector("td:nth-of-type(3)").text
There are only 5 of these elements on the page; you'll have to click See More to load the rest, as sketched below.
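A minimal sketch of clicking that control first; the link text "See More" is an assumption, so replace it with the label the page actually shows:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# "See More" is an assumed link text; adjust it to the real label if it differs
see_more = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "See More")))
see_more.click()
# Re-query the rows after the extra content loads
rows = driver.find_elements_by_css_selector("#id_b>tr")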
I am facing an issue while parsing data from the 'Literature' tab of the third table. The steps I took to reach the table:
1. Go to ibl.mdanderson.org/fasmic/#!
2. Type and select AKT1 (3 mutations) (NOTE: the 'GO' button doesn't work, so click the option from the drop-down)
3. Click on the green button with the text 'MS'; a new table will appear.
4. In this new table there is a tab called Literature; I need the literature text and the PMID.
I tried the following code, but it gives an empty list:
xyz = driver.find_element_by_xpath("//*[contains(text(),'Literature')]").click()
for elements in driver.find_elements_by_xpath('//div[@class="tab-pane ng-scope active"]'):
    soup = BeautifulSoup(driver.page_source, 'lxml')
    table = soup.find('div', attrs={'id': "literature_div"})
    table_body = table.find('h4')
    rows = table.find_all('h4')
    for row in rows:
        cols = row.find_all('h4')
        # cols = [ele.text.strip() for ele in cols]
        litrature.append([ele for ele in cols if ele])  # Get rid of empty values
print("Data from COLUMN 1:")
print(litrature)
How can I resolve this?
UPDATE
When I try to click on the 'Next' button under the 'Literature' table, I get the following error:
"Message: The element reference of is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed "
Following is the line I added to click on the "NEXT" button: driver.find_element_by_xpath('//a[@ng-click="selectPage(page + 1, $event)"]').click()
How can I resolve this?
You need to wait 3 times:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://ibl.mdanderson.org/fasmic/#!/')
WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.XPATH , '//input')))
input = driver.find_element_by_xpath("//input")
input.send_keys("AKT1\n")
button = WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CLASS_NAME , 'btn-tab-avail')))
button.click()
driver.find_element_by_xpath("//*[contains(text(),'Literature')]").click()
WebDriverWait(driver, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, '#literature_div h4')))
rows = driver.find_elements_by_css_selector("#literature_div h4")
litrature = []
for item in rows:
    litrature.append(item.text)
print("Data from COLUMN 1:")
print(litrature)
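If you need the PMID separated from the surrounding literature text, here is a small sketch using a regular expression; the "PMID: 12345678" format is an assumption about how the entries cite PubMed IDs, so adjust the pattern to what the page actually renders:

import re

for entry in litrature:
    # Assumed citation format like "PMID: 12345678"; tweak the pattern if the page differs
    match = re.search(r'PMID[:\s]*(\d+)', entry)
    if match:
        print(match.group(1))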
Like this? Someone with more knowledge of Python waits can certainly improve on my wait lines.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
url = "https://ibl.mdanderson.org/fasmic/#!/"
d = webdriver.Chrome()
wait = WebDriverWait(d, 10)
d.get(url)
d.find_element_by_css_selector('[type=text]').send_keys('AKT1 (3 mutations)')
d.find_element_by_css_selector("input[type='text']").send_keys(Keys.RETURN)
btn = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".btn.btn-default.btn-tab-avail")))
btn.click()
d.find_element_by_css_selector("[heading=Literature]").click()
ele = wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#literature_div [ng-repeat]"), "PMID"))
eles = d.find_elements_by_css_selector("#literature_div [ng-repeat]")
for item in eles:
    print(item.text, "\n")
d.quit()