Get href using xpath + id - python

I have a list of 9 search results from this site, and I'd like to get the href link for each item in the results.
Here are the XPath and CSS selector for the links of the 1st, 2nd, and 3rd items:
'//*[@id="search-results"]/div[4]/div/ctl:cache/div[3]/div[1]/div/div[2]/div[2]/div[2]/p[4]/a'
#search-results > div.c_408104 > div > ctl:cache > div.product-list.grid > div:nth-child(8) > div > div.thumbnail > div.caption.link-behavior > div.caption > p.description > a
'//*[@id="search-results"]/div[4]/div/ctl:cache/div[3]/div[2]/div/div[2]/div[2]/div[2]/p[4]/a'
#search-results > div.c_408104 > div > ctl:cache > div.product-list.grid > div:nth-child(13) > div > div.thumbnail > div.caption.link-behavior > div.caption > p.description > a
'//*[@id="search-results"]/div[4]/div/ctl:cache/div[3]/div[4]/div/div[2]/div[2]/div[2]/p[2]/a'
#search-results > div.c_408104 > div > ctl:cache > div.product-list.grid > div:nth-child(14) > div > div.thumbnail > div.caption.link-behavior > div.caption > p.description > a
I've tried:
browser.find_elements_by_xpath("//a[@href]")
but this returns every link on the page, not just the ones in the search results. I've also tried scoping the search by the id, but I'm not sure of the proper syntax:
browser.find_elements_by_xpath('//*[@id="search-results"]//a')

What you want is the href attribute of every result.
Here is an example:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = 'https://www.costco.com/sofas-sectionals.html'
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
# Raw string so the backslashes in the Windows path are taken literally
browser = webdriver.Chrome(r"C:\workspace\TalSolutionQA\general_func_class\chromedriver.exe",
                           chrome_options=chrome_options)
browser.get(url)
# Every link inside an element whose class attribute is exactly "caption"
result_xpath = '//*[@class="caption"]//a'
all_results = browser.find_elements_by_xpath(result_xpath)
for i in all_results:
    print(i.get_attribute('href'))
Here I collect all the elements that I know contain the links into all_results, then use Selenium's get_attribute method to extract the href from each one.
Hope you find this helpful!
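The question also asked how to scope the search by id. As a minimal sketch of how the two XPath expressions differ, the snippet below runs them against a small, made-up page using the standard library's ElementTree instead of a browser. The markup and link targets are hypothetical, and ElementTree supports only a subset of XPath, so this illustrates the scoping idea rather than replacing the Selenium call.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the results page (not the real markup).
PAGE = """
<html><body>
  <a href="/home">Home</a>
  <div id="search-results">
    <div class="caption"><p class="description"><a href="/sofa-1">Sofa 1</a></p></div>
    <div class="caption"><p class="description"><a href="/sofa-2">Sofa 2</a></p></div>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Unscoped: every <a> that has an href, anywhere in the document.
all_links = [a.get("href") for a in root.findall(".//a[@href]")]

# Scoped: only <a> elements below the element with id="search-results".
scoped_links = [a.get("href") for a in root.findall(".//*[@id='search-results']//a")]

print(all_links)
print(scoped_links)
```

With Selenium, the scoped expression would be passed as '//*[@id="search-results"]//a', with @ rather than # in front of the attribute name.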

Related

I want to change an attribute and give it a value with selenium

I have 19 span elements with the same attribute, data-checked:
<span id="u25-accordion-panel--61" data-type="checkbox" data-checked style="display: none;"></span>
I want to change the attribute on all of the spans. I tried with one of them first, but I get the following error:
options = webdriver.EdgeOptions()
options.add_argument("start-maximized")
driver = webdriver.Edge(options=options)
driver.implicitly_wait(20)
driver.get('https://www.udemy.com/course/angular-10-fundamentos-8-app/')
elem = driver.find_element(By.CSS_SELECTOR,
    '#udemy > div.ud-main-content-wrapper > div.ud-main-content > div > div > div.paid-course-landing-page__container > div.paid-course-landing-page__body > div > div:nth-child(3) > div > button')
elem.click()
elems = driver.find_elements(By.CSS_SELECTOR, 'div.accordion-panel--panel--24beS > span')
driver.execute_script("arguments[0].data-checked = 'checked';", elems[1])
selenium.common.exceptions.JavascriptException: Message: javascript error: Invalid left-hand side in assignment
(Session info: MicrosoftEdge=109.0.1518.78)
The error occurs because data-checked is not a valid JavaScript property name: the hyphen is parsed as a minus sign, so the left-hand side of the assignment becomes a subtraction. To set the data-checked attribute to checked on all of the accordion-panel elements, you can use the setAttribute() method as follows:
Code block:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get('https://www.udemy.com/course/angular-10-fundamentos-8-app/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#udemy > div.ud-main-content-wrapper > div.ud-main-content > div > div > div.paid-course-landing-page__container > div.paid-course-landing-page__body > div > div:nth-child(3) > div > button"))).click()
elements = driver.find_elements(By.CSS_SELECTOR, 'div.accordion-panel--panel--24beS > span')
for element in elements:
    driver.execute_script("arguments[0].setAttribute('data-checked', 'checked')", element)
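If the same change is needed in several places, the loop can be wrapped in a small helper. This is only a sketch: set_attribute_on_all is a hypothetical name, and driver is assumed to be any object with Selenium's execute_script(script, *args) method. Passing the attribute name and value as extra arguments avoids building the JavaScript string by hand.

```python
def set_attribute_on_all(driver, elements, name, value):
    """Set attribute `name` to `value` on every element via JavaScript.

    Hypothetical helper: `driver` is assumed to expose Selenium's
    execute_script(script, *args); the extra args arrive in JS as arguments[i].
    """
    script = "arguments[0].setAttribute(arguments[1], arguments[2]);"
    for element in elements:
        driver.execute_script(script, element, name, value)
    # Return how many elements were touched, which is handy for a sanity check.
    return len(elements)
```

With the names from the answer above, the call would be set_attribute_on_all(driver, elements, 'data-checked', 'checked').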

how to access tag with css selector in selenium using descendant(>>)?

xxx and yyy are the elements that I want to access with a CSS selector in Selenium:
xxx = driver.find_element(By.CSS_SELECTOR, '#contents > div.tabWrap.pdtTabWrap.fixed > div.tabContents > section.tabCont.active > div > div > div.prdDetailConWrap > div.prdType.prdType11 > div.imgWrap.imgCrop > img')
yyy = driver.find_element(By.CSS_SELECTOR, '#contents > div.tabWrap.pdtTabWrap.fixed > div.tabContents > section.tabCont.active > div > div > div > div.prd_sec.prd_top_type01.sec01.mt0 > div > div.top_img_box > img')
The selectors for xxx and yyy look similar.
Is it possible to access such similar elements (same starting point, different middle, same end point) with a single selector using the descendant combinator (>>)?
I ran
driver.find_element(By.CSS_SELECTOR, '#contents > div.tabWrap.pdtTabWrap.fixed > div.tabContents > section.tabCont.active >> img')
but an error occurred.
Try using a space instead. In CSS the descendant combinator is written as whitespace, and browsers do not support >>:
driver.find_element(By.CSS_SELECTOR, '#contents > div.tabWrap.pdtTabWrap.fixed > div.tabContents > section.tabCont.active img')
You can read more here:
https://www.w3.org/TR/selectors/#descendant-combinators
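To see why a descendant match reaches both images while a child match does not, here is a rough sketch on made-up markup. ElementTree has no CSS selector support, so it uses the XPath analogues instead: / behaves like the CSS child combinator > and // like the CSS descendant combinator (a space). The element nesting and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two images nested at different depths under one section.
PAGE = """
<div id="contents">
  <section class="tabCont active">
    <div><div class="imgWrap"><img src="a.png"/></div></div>
    <div><div><div class="top_img_box"><img src="b.png"/></div></div></div>
  </section>
</div>
""".strip()

root = ET.fromstring(PAGE)

# Child step (like "section > img"): only images that are direct children.
direct = [img.get("src") for img in root.findall("./section/img")]

# Descendant step (like "section img"): images at any depth below the section.
nested = [img.get("src") for img in root.findall("./section//img")]

print(direct)
print(nested)
```

Note that in Selenium, find_element returns only the first match; to collect both images with the single descendant selector, find_elements would be the call to use.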

Getting error while fetching details with web scraping in python

I am getting an error while scraping data from a site. Could anyone please help me with it?
My code:
html = requests.get('https://www.cryptocompare.com/coins/btc/influence/USDT').text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one(' # col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a').text
print(total_commit)
error
soupsieve.util.SelectorSyntaxError: Malformed id selector at position 2
line 1:
# col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a
^
Also, could anyone tell me how to use CSS selectors copied directly from Inspect Element in bs4?
As mentioned by David Miró, removing the whitespace will fix the error, but to reach your goal you have to use Selenium.
Selenium will render the website, and you can then take the page_source and select your element with bs4:
soup.select_one('div.repo-tag a')['href']
Example
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome('YOUR PATH TO DRIVER')
driver.get('https://www.cryptocompare.com/coins/btc/influence/USDT')
soup=BeautifulSoup(driver.page_source, 'html.parser')
print(soup.select_one('div.repo-tag a')['href'])
Output
https://github.com/bitcoin/bitcoin
Try removing the space between # and col-body:
html = requests.get('https://www.cryptocompare.com/coins/btc/influence/USDT').text
soup = BeautifulSoup(html, 'html.parser')
total_commit = soup.select_one('#col-body > div > social-influence > div.row.row-zero.influence-others > div:nth-child(2) > div > div > div > div.col-md-3.td-col.brd-right > div > div.repo-tag > span > span > a').text
print(total_commit)
However, this still does not work, because part of the HTML is generated by JavaScript, so you need to render the page in a real browser (for example with Selenium). The HTML that requests receives contains only the loader placeholder:
<div class="col-body col-body-new" id="col-body" ui-view>
<div class="loader-ccc">
<div class="loader-ccc-logo"></div>
<div class="loader-ccc-sides"></div>
</div>
In a web browser, by contrast, the information is present once the page has rendered.

Selenium - switch to div class that is a window

I need help selecting an element on a webpage with Selenium. I have been using Selenium on this website for about 3 weeks and so far, I can usually find an element by css selector or XPath. However, this specific section of the website is giving me a very hard time.
After I click on “reset office 365 password”, a window comes up and I want to programmatically enter the new password, but Selenium can’t find anything in the popup window.
Here is what the page looks like:
(I am too low of score to post pictures here) https://cdn.discordapp.com/attachments/768594779344470022/845811910577881098/unknown.png
Here is the whole element’s information:
<input type="password" tabindex="1" name="password" class="m-third pass ng-pristine ng-empty ng-invalid ng-invalid-required ng-touched" ng-model="password.value" ng-blur="password.check = false" ng-focus="password.check = true" required="" autofocus="" ng-disabled="!active">
Here is what I tried: (I tried a lot of things)
Tried clicking on the password box by using css selector – failed: Invalid selector
im_blacklistaddbutton = browser_options.browser.find_element_by_css_selector('#ng-app > div.page-container > div > div > div.vertical-tabs.j-vertical-tabs.ng-scope > div.vertical-tabs-panes.p0 > div > div > div.page-content.ng-scope > div > div > form > div > div > div.ng-isolate-scope > div.modal > div.modal-body.ng-transclude > div > reset:password > ng-form > div:nth-child(1) > div > div.validation-input > input')
im_blacklistaddbutton.send_keys(email_pd.pd)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
Tried clicking on the password box by using xpath selector – failed: Namespace Error
im_blacklistaddbutton = browser_options.browser.find_element_by_xpath('//*[@id="ng-app"]/div[2]/div/div/div[3]/div[2]/div/div/div[2]/div/div/form/div/div/div[3]/div[1]/div[2]/div/reset:password/ng-form/div[1]/div/div[1]/input')
im_blacklistaddbutton.send_keys(email_pd.pd)
NamespaceError: Failed to execute 'evaluate' on 'Document': The string '//*[@id="ng-app"]/div[2]/div/div/div[3]/div[2]/div/div/div[2]/div/div/form/div/div/div[3]/div[1]/div[2]/div/reset:password/ng-form/div[1]/div/div[1]/input' contains unresolvable namespaces.
Tried waiting for the element by partial link text: It timed out
wait.until(EC.visibility_of_element_located((By.PARTIAL_LINK_TEXT, 'Generate password')))
selenium.common.exceptions.TimeoutException: Message:
Tried waiting for the element by ID name text value: It timed out
wait.until(EC.text_to_be_present_in_element((By.CLASS_NAME, 'btn m-link'), "Generate Password"))
selenium.common.exceptions.TimeoutException: Message:
Tried to switch to a window or iframe but it said that the div class of "model" is not a window or an iframe.
From here I am completely lost as to why this stupid window is not accessible. Text window - why are you so mean to me?
Here is my specific function in total:
def reset_im_oa_password():
    browser_options.browser.get('https://cpx.intermedia.net/ControlPanel/Menu/AccountMenu/?frameUrl=https://cpx.intermedia.net/aspx/Office365/Home/licenses#/installed/users')
    wait = WebDriverWait(browser_options.browser, 10)
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH, 'player')))
    except exceptions.TimeoutException as e:
        pass
    browser_options.browser.switch_to_frame('mainFrame')
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#ng-app > div.page-container > div > div > div.vertical-tabs.j-vertical-tabs.ng-scope > div.vertical-tabs-panes.p0 > div > div > div.page-content.ng-scope > div > div > form > div > div > div:nth-child(2) > div.table-wrap.table-fixed.j-table-wrap.s-wide.ng-isolate-scope > div.table-filter > div.table-filter-search.searchbox.ng-isolate-scope > div > span:nth-child(3) > input')))
    im_blacklistaddbutton = browser_options.browser.find_element_by_css_selector('#ng-app > div.page-container > div > div > div.vertical-tabs.j-vertical-tabs.ng-scope > div.vertical-tabs-panes.p0 > div > div > div.page-content.ng-scope > div > div > form > div > div > div:nth-child(2) > div.table-wrap.table-fixed.j-table-wrap.s-wide.ng-isolate-scope > div.table-filter > div.table-filter-search.searchbox.ng-isolate-scope > div > span:nth-child(3) > input')
    im_blacklistaddbutton.send_keys(email_or_user_selection.email_select)
    im_blacklistaddbutton = browser_options.browser.find_element_by_css_selector('#ng-app > div.page-container > div > div > div.vertical-tabs.j-vertical-tabs.ng-scope > div.vertical-tabs-panes.p0 > div > div > div.page-content.ng-scope > div > div > form > div > div > div:nth-child(2) > div.table-wrap.table-fixed.j-table-wrap.s-wide.ng-isolate-scope > div.table-filter > div.table-filter-search.searchbox.ng-isolate-scope > div > span:nth-child(3) > button')
    im_blacklistaddbutton.send_keys(Keys.ENTER)
    wait.until(EC.element_to_be_clickable((By.XPATH, ("//*[starts-with(@id, 'btnResetPassword')]"))))
    im_blacklistaddbutton = browser_options.browser.find_element_by_xpath(("//*[starts-with(@id, 'btnResetPassword')]"))
    im_blacklistaddbutton.send_keys(Keys.ENTER)
    try:
        wait.until(EC.visibility_of_element_located((By.PARTIAL_LINK_TEXT, 'Generate password')))
    except exceptions.TimeoutException as e:
        pass
    browser_options.browser.switch_to_window('model')  # anything past this section will fail
    wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'model')))
    im_blacklistaddbutton = browser_options.browser.find_element_by_xpath('//*[@id="ng-app"]/div[2]/div/div/div[3]/div[2]/div/div/div[2]/div/div/form/div/div/div[3]/div[1]/div[2]/div/reset:password/ng-form/div[1]/div/div[1]/input')
    im_blacklistaddbutton.send_keys(email_pd.pd)
    return
If anyone needs the full code from the webpage, let me know. Thanks.
If this element is not actually inside an iframe, as you write, then wait for it to become clickable, like this:
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[type='password']")))
im_blacklistaddbutton = browser.find_element_by_css_selector("input[type='password']")
im_blacklistaddbutton.send_keys("new_password")
But make sure that the CSS selector input[type='password'] is unique.
If it is not, try this one: .validation-input>input[type='password']
(Check that the validation-input class name is correct, as it is cut off in your screenshot.)
If the input is inside an iframe, nothing will work until you switch to that iframe.
Without the page's HTML, I can't say why Selenium cannot detect the element, but there is one thing you can try. Right-click the element (the input tag in the DOM shown in the picture), go to the "Copy" option and select "Copy JS path". Then go to the Console tab in dev tools and paste it. Then try to set its value to some dummy text and see whether it sets the password.
jsPath.value="some password" //this should set the password
If this works, then you can set the value by using JavaScript executor of Selenium in the same way.
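A minimal sketch of that last step is below. set_input_value is a hypothetical helper, and driver is assumed to expose Selenium's execute_script(script, *args). Dispatching the input and change events afterwards is an assumption on my part: the field uses ng-model, and Angular forms typically pick up a new value only when such an event fires, but whether this page needs it is untested.

```python
# JavaScript run in the page: arguments[0] is the element, arguments[1] the text.
SET_VALUE_JS = """
const el = arguments[0];
el.value = arguments[1];
// Assumption: notify framework listeners (e.g. ng-model) that the value changed.
el.dispatchEvent(new Event('input', { bubbles: true }));
el.dispatchEvent(new Event('change', { bubbles: true }));
"""


def set_input_value(driver, element, text):
    """Set an input's value through the JavaScript executor (hypothetical helper)."""
    driver.execute_script(SET_VALUE_JS, element, text)
```

Here element would be the password input once it has been located, for example with the input[type='password'] selector suggested in the other answer.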

I want to make a python script that scrapes (copies) all of the usernames from a person's following list

I tried to follow along with some youtube tutorials in order to make my code do what I want it to do, but I still haven't found any answer on the entire internet...
Here I tried to make the script using BeautifulSoup:
import bs4
import requests

resoult = requests.get("https://www.instagram.com/kyliejenner/following/")
src = resoult.content
Soup = bs4.BeautifulSoup(src, "lxml")
links = Soup.find_all("a")
print(links)
print("\n")
for link in links:
    if "FPmhX notranslate _0imsa " in link.text:
        print(link)
And here I tried to do the same thing with Selenium, but the problem is that I don't know the next steps in order to make my code copy the usernames a user is following
import selenium
from selenium import webdriver
import time

# Raw string so the backslashes in the Windows path are taken literally
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.instagram.com/")
time.sleep(2)
username = driver.find_element_by_css_selector("#loginForm > div > div:nth-child(1) > div > label > input")
username.send_keys("my_username")
password = driver.find_element_by_css_selector("#loginForm > div > div:nth-child(2) > div > label > input")
password.send_keys("password")
loginButton = driver.find_element_by_css_selector("#loginForm > div > div:nth-child(3)")
loginButton.click()
time.sleep(3)
saveinfoButton = driver.find_element_by_css_selector("#react-root > section > main > div > div > div > section > div > button")
saveinfoButton.click()
time.sleep(3)
notnowButton = driver.find_element_by_css_selector("body > div.RnEpo.Yx5HN > div > div > div > div.mt3GC > button.aOOlW.HoLwm")
notnowButton.click()
I would really appreciate it if someone could solve this problem. Again, all I want my script to do is copy the usernames from the "following" section of someone's profile.
