How to extract the text through a href using Selenium and Python

I have looked through every post I could find but did not get the solution I want.
browser.find_element_by_xpath("//*/section/main/div/div[3]/article/div[1]/div/div[1]/div[1]/a")
pic = browser.find_elements_by_xpath("//*/section/main/div/div[3]/article/div[1]/div/div[1]/div[1]/a").get(title)
print(title)
That's my code right now, but I need exactly this:
<div class="eLAPa"><div class="KL4Bh"><img class="FFVAD" decoding="auto" sizes="281.671875px" srcset="https://scontent-frx5-1.cdninstagram.com/vp/38a5fdde34937b2b3e4e600da56c46b5/5C0D05B3/t51.2885-15/e15/s150x150/46276509_388170972011907_7609813800358803282_n.jpg 150w,https://scontent-frx5-1.cdninstagram.com/vp/74b222ca252a488bb5eb110347283c3b/5C0CB7F9/t51.2885-15/e15/s240x240/46276509_388170972011907_7609813800358803282_n.jpg 240w,https://scontent-frx5-1.cdninstagram.com/vp/af2eff79d9c38b0f97a763041e3a99e5/5C0D10C3/t51.2885-15/e15/s320x320/46276509_388170972011907_7609813800358803282_n.jpg 320w,https://scontent-frx5-1.cdninstagram.com/vp/bad123f6264636de35489b7d8dc2cbed/5C0CC199/t51.2885-15/e15/s480x480/46276509_388170972011907_7609813800358803282_n.jpg 480w,https://scontent-frx5-1.cdninstagram.com/vp/93a4e36addf453f754e4caba706fe93a/5C0CC82C/t51.2885-15/e15/s640x640/46276509_388170972011907_7609813800358803282_n.jpg 640w" src="https://scontent-frx5-1.cdninstagram.com/vp/93a4e36addf453f754e4caba706fe93a/5C0CC82C/t51.2885-15/e15/s640x640/46276509_388170972011907_7609813800358803282_n.jpg" style=""></div><div class="_9AhH0"></div></div><div class="u7YqG"><div class="Byj2F"><span class="glyphsSpriteVideo_large u-__7" aria-label="Video"></span></div></div>
It is preceded by an a href, followed by a link, and that link is what I need.

To extract the text Video you can use either of the following solutions:
css_selector:
driver.find_element_by_css_selector("a[href='/p/BrDudIUBuNr/'] span.glyphsSpriteVideo_large").get_attribute("aria-label")
xpath:
driver.find_element_by_xpath("//a[@href='/p/BrDudIUBuNr/']//span[contains(@class,'glyphsSpriteVideo_large')]").get_attribute("aria-label")

Related

Python Web-scraping youtube.com BeautifulSoup4 problem

I am trying to get the author of every video on the YouTube homepage by web-scraping with BeautifulSoup4.
This is the chunk of HTML I am trying to navigate to.
<a class="yt-simple-endpoint style-scope yt-formatted-string" spellcheck="false" href="/c/ApertureScience" dir="auto">Aperture</a>
With the link: https://www.youtube.com/
And I am trying to get the item "Aperture".
The problem is that I can't seem to navigate correctly to the data, I have been trying this:
source = urllib.request.urlopen('https://www.youtube.com/').read()
soup = bs.BeautifulSoup(source,'lxml')
for i in soup.find_all('a', class_='yt-simple-endpoint style-scope yt-formatted-string'):
    print(i)
And nothing prints. I think it is because of the spaces in the class name, but I don't know how to get around that.
Any ideas would help, thank you!

Try this syntax:
find_all('a',{'class' : 'yt-simple-endpoint style-scope yt-formatted-string'})
To get the 'Aperture' text, use .string, .contents, or .text on the matched tag.
If the content is dynamic, you could use Selenium.
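The spaces in the class attribute separate three individual class names, so they are not a problem in themselves. As a rough illustration, the sketch below uses the standard library's html.parser (a swapped-in technique, not BeautifulSoup) on the static snippet from the question. The AnchorTextCollector class is a hypothetical helper written for this example. It shows that the markup is matchable once it is actually present in the downloaded source, which suggests the empty result more likely comes from the page being rendered dynamically.

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the text of <a> tags that carry all of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.inside_match = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # The class attribute is a space-separated list of class names
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.inside_match = True
            self.texts.append("")

    def handle_data(self, data):
        if self.inside_match:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self.inside_match = False


# Static snippet copied from the question
html = ('<a class="yt-simple-endpoint style-scope yt-formatted-string" '
        'spellcheck="false" href="/c/ApertureScience" dir="auto">Aperture</a>')

collector = AnchorTextCollector(["yt-simple-endpoint", "yt-formatted-string"])
collector.feed(html)
authors = collector.texts
print(authors)
```

Matching on a subset of the class names is enough here, and the order of the names in the attribute does not matter.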

Python selenium webdriver select href select partial text

I'm using Python 3.8.5 on Ubuntu 20.04. Using selenium webdriver on chrome, I want to download the attachment by specifying the licenceId number (1467250) which is included in this element:
<a xmlns="http://www.w3.org/1999/xhtml" href="#" onclick="if(typeof jsfcljs == 'function'){jsfcljs(document.forms['myApplicationsResultsForm'],'myApplicationsResultsForm:searchResultsTable:0:j_id339:0:j_id340,myApplicationsResultsForm:searchResultsTable:0:j_id339:0:j_id340,licenceId,1467250,statusHistoryId,2600790,fileName,ROL_1467250_20200817-142839.pdf,attachmentType,ROL','');}return false" class="pageLink"><img src="/oplinc2/images/pdf.jpg" alt="ROL_1467250_20200817-142839.pdf" height="24" style="border-width: 0px;" title="ROL_1467250_20200817-142839.pdf" width="24" /></a>
I am able to download this link by clicking on the css_selector:
pdf = driver.find_element_by_css_selector('#myApplicationsResultsForm\:searchResultsTable\:0\:j_id335 > a')
pdf.click()
Can I use partial text within the element, e.g. licenceId, 1467250, to locate and download the attachment? There are many of these attachments. I tried the partial link text example from the docs, but it didn't work for me:
>>> driver.find_element_by_partial_link_text('1467250')
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"1467250"}
Edit
This question is similar to the solution @Ajay linked to, except this element has a slightly different href. I am still not sure how to access onclick.

Try this:
links = driver.find_elements_by_partial_link_text('https://websites.com/activation.php?a=act')
for link in links:
    print(link.get_attribute("href"))

driver.find_element_by_xpath('//*[contains(@onclick, "1467250")]')
Replacing the a with * finds the element.
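Partial link text matches the visible text of a link, and this link contains only an image, so there is no text to match. Since there are many attachments, the XPath can be built from the licence ID instead. This is a minimal sketch: onclick_xpath is a hypothetical helper name, and the "licenceId,<number>," pattern is taken from the onclick value shown in the question.

```python
def onclick_xpath(licence_id):
    """Build an XPath matching any element whose onclick mentions the licence ID."""
    # Including "licenceId," and the trailing comma avoids matching
    # other numbers in the attribute that merely contain the same digits.
    return '//*[contains(@onclick, "licenceId,{},")]'.format(licence_id)


xpath = onclick_xpath(1467250)
print(xpath)

# With a live driver (not run here), this would be used as:
# driver.find_element_by_xpath(xpath).click()
```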

How to find and click an image link using the image's src (Selenium, Python)

I would like to click an image link and I need to be able to find it by its src, however it's still not working for some reason. Is this even possible? This is what I'm trying:
#Find item
item = WebDriverWait(driver, 100000).until(EC.presence_of_element_located((By.XPATH, "//img[@src=link]")))
#item = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//img[@alt='Bzs36xl9 xa']")))
item.click()
link = //assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg in the above code. This matches the HTML from below.
The second locator works (finding image using alt), but I will only have the image source when the program actually runs.
HTML for the webpage:
<article>
<div class="inner-article">
<a style="height:81px;" href="/shop/accessories/h68lyxo2h/llhxzvydj">
<img width="81" height="81" src="//assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg" alt="Bzs36xl9 xa">
</a>
</div>
</article>
I don't see why finding by alt would work and not src, is this possible? I saw another similar question which is where I got my solution but it didn't work for me. Thanks in advance.
EDIT
To find the link I have to parse through a website in JSON format, here's the code:
#Loads Supreme JSON website into an object
url = urllib2.urlopen('https://www.supremenewyork.com/mobile_stock.json')
obj = json.load(url)
items = obj["products_and_categories"]["Accessories"]
itm_name = "Sock"
index = 0
for i in items:
    if(itm_name in items[index]["name"]):
        found_url = i["image_url"]
        break
    index += 1
str_link = str(found_url)
link = str_link.replace("ca","vi")

Use WebDriverWait and element_to_be_clickable. Try the following XPath. Hope this will work.
link ='//assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg'
item = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='inner-article']/a/img[@src='{}']".format(link))))
print(item.get_attribute('src'))
item.click()

item = WebDriverWait(driver, 100000).until(EC.presence_of_element_located((By.XPATH, "//img[@src=link]")))
Here's your problem; I can't believe it didn't jump out at me. You're asking the driver to find an element with a src of "link", NOT the variable link that you defined earlier. I don't know how to pass variables into XPaths, but I do know that you can use String.Format to build the correct XPath string just before calling it.
I also don't speak Python, so here's some pseudo Java/C# to help you get the picture:
String xPathString = String.Format("//img[@src='{0}']", link);
item = WebDriverWait(driver, 100000).until(EC.presence_of_element_located((By.XPATH, xPathString)))
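In Python, the same idea might look like the sketch below. It only builds the strings, so it runs without a browser. The variable names literal_xpath and formatted_xpath are made up for this example, and the link value is the one given in the question.

```python
link = "//assets.supremenewyork.com/170065/vi/BZS36xl9-xA.jpg"

# Here "link" is just four characters inside the string. Python never
# substitutes the variable, and XPath reads the bare word as a child
# element name rather than as a value to compare against.
literal_xpath = "//img[@src=link]"

# Formatting inserts the variable's value, wrapped in quotes, into the XPath.
formatted_xpath = "//img[@src='{}']".format(link)

print(literal_xpath)
print(formatted_xpath)

# With a live driver (not run here):
# item = WebDriverWait(driver, 30).until(
#     EC.presence_of_element_located((By.XPATH, formatted_xpath)))
```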

How to click on the element with Selenium Python

I'm trying to click the element with the text "I don't have a telephone number" on this website.
I found the element with the inspector. Here is the element in HTML:
<span class="toggle-link link_has-no-phone" role="button">I don't have a telephone number</span>
In my non-working code I wrote this:
r = driver.find_element_by_xpath("//*[@id='root']/div/div[2]/div/main/div/div/div/form/div[3]/div/div[2]/div/span")
r.click
The button is never clicked and nothing happens. I get no error, but I can't click it. Any help would be appreciated.

You can use the CSS selector below to get the span:
r = driver.find_element_by_css_selector(".link_has-no-phone")
r.click()

r = driver.find_element_by_xpath("//*[@id='root']/div/div[2]/div/main/div/div/div/form/div[3]/div/div[2]/div/span")
r.click()
You just forgot the parentheses.
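This also explains why there was no error: without parentheses, Python only looks up the method and throws the result away. A small sketch with a stand-in object shows the difference. FakeElement is a hypothetical class for illustration only, not part of Selenium.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for illustration only."""

    def __init__(self):
        self.clicks = 0

    def click(self):
        self.clicks += 1


r = FakeElement()

r.click  # looks up the bound method and discards it; nothing is called
clicks_without_call = r.clicks

r.click()  # the parentheses actually call the method
clicks_with_call = r.clicks

print(clicks_without_call, clicks_with_call)
```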

To click on the element with the text "I don't have a telephone number", you can use either of the following locator strategies:
css_selector:
driver.find_element_by_css_selector("span.toggle-link.link_has-no-phone").click()
xpath:
driver.find_element_by_xpath("//span[@class='toggle-link link_has-no-phone']").click()

(Python) How to use driver.find_element_by_link_text when two texts are there in between <a> tag

I have this HTML
text1</span> <br /><span class="UC">text2</span>
I want to get the hyperlink and click on it. I write:
link = driver.find_element_by_link_text('text')
link.click()
But the problem is there are two texts in between "a" tag. How do I modify the syntax?

Try the code below:
link = driver.find_element_by_link_text('text1\ntext2')
link.click()
It is also possible to find the element by "text1" or "text2" using find_element_by_partial_link_text():
link = driver.find_element_by_partial_link_text('text1')
link.click()
