Looping over follower names on Instagram - Python

I am trying to use Selenium to collect the follower names of a particular person into a list.
The XPath for the first user in the list is:
/html/body/div[3]/div/div/div[2]/ul/div/li[1]/div/div[2]/div[1]/div/div/a
I have been trying to loop over the li elements, but could not get anything useful.
Or maybe I could take the title of each element of this class, but I cannot get that to work either.

What you are trying to accomplish can easily be done in plain Python:
Copy the page HTML into a text file
Extract all anchor tags using a regex
For each anchor tag, if the href value contains the anchor text, append the anchor text to the list of followers
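The steps above could be sketched roughly as follows. This is a minimal illustration, assuming the page HTML has already been saved and that each follower link looks like `<a href="/username/">username</a>`. The sample markup and the name `extract_followers` are made up for the example, and the real Instagram markup may differ.

```python
import re

# Matches <a ... href="...">text</a> and captures the href and the anchor text.
# Assumes double-quoted hrefs and anchors that contain plain text only.
ANCHOR_RE = re.compile(r'<a\b[^>]*\bhref="([^"]*)"[^>]*>([^<]*)</a>', re.IGNORECASE)


def extract_followers(html):
    """Return the anchor texts whose href contains that same text."""
    followers = []
    for href, text in ANCHOR_RE.findall(html):
        text = text.strip()
        if text and text in href:
            followers.append(text)
    return followers


# Hypothetical sample standing in for the saved page HTML.
sample_html = '''
<ul>
  <li><a title="alice" href="/alice/">alice</a></li>
  <li><a href="/explore/">Discover</a></li>
  <li><a title="bob_92" href="/bob_92/">bob_92</a></li>
</ul>
'''

print(extract_followers(sample_html))
```

In practice you would read the saved file into a string and pass it to `extract_followers` instead of `sample_html`.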

Please post your code so that it is easier for us to review it and help you:
- HTML code
- Your code (Java/Python)

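After the followers link is clicked, you need to wait until the followers dialog appears. Use WebDriverWait:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# after the link is clicked: poll until the dialog contains at least one follower link
followers = WebDriverWait(driver, 5).until(
    lambda d: d.find_elements(By.XPATH, '//div[@role="dialog"]//a[@title]')
)
for follower in followers:
    print(follower.get_attribute('textContent'))
Note: your XPath only returns the first follower.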
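To see why the lambda above works as a wait condition: `until` keeps calling it until it returns something truthy, and an empty list is falsy. The following is a simplified sketch of that polling idea, not Selenium's actual implementation. `wait_until` and the fake `find_followers` are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for a page whose follower dialog only appears on the third check.
calls = {"count": 0}


def find_followers():
    calls["count"] += 1
    return ["alice", "bob"] if calls["count"] >= 3 else []


followers = wait_until(find_followers, timeout=2.0, poll=0.01)
print(followers)
```

The first two calls return an empty list, so the loop keeps polling; the third call returns a non-empty list, which ends the wait and becomes the return value.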

Related

How do you turn a string with url into a url that is clickable using selenium?

url="https://fourminutebooks.com/book-summaries/"
driver.get(url)
fin_list = []
page_tabs = driver.find_elements(By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")
for i in range(len(page_tabs)):
    page_tabs[i] = page_tabs[i].get_attribute("href")
    fin_list.append(page_tabs[i])
fin_list[0].click()
#html = driver.find_elements(By.CSS_SELECTOR,"header[class='entry-header page-header']")
print(fin_list)
I am trying to create a program that randomly emails me book summaries, and I am having difficulty clicking on a link to get the HTML content. I have managed to get all the links, but they are saved as strings, and I cannot click on one of them without getting an error.
This is without trying to get the first element.
You cannot turn a string into a web element object.
These are fundamentally different kinds of objects.
As a user, you can click a web element on a web page.
Similarly, Selenium can click a web element. It cannot click a string or an int.
If you want a list of clickable objects, you have to collect the web elements themselves.
In your particular case, you can keep page_tabs as a list of elements (without overwriting them with their href strings) and click the first one later with
page_tabs[0].click()
To simply open a URL from the list, you can do:
page_tabs = [x.get_attribute('href') for x in driver.find_elements(By.CSS_SELECTOR, "a[class='post_title w4pl_post_title']")]
driver.get(page_tabs[0])

Extract a hyperlink from a website - Selenium

I have been trying to solve this issue for a while and tried multiple solutions posted here before opening this question.
I am currently attempting to run a scraper with the following code:
website = 'https://www.abitareco.it/nuove-costruzioni-milano.html'
path = Path().joinpath('util', 'chromedriver')
driver = webdriver.Chrome(path)
driver.get(website)
main = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.NAME, "p1")))
The hyperlinks I am after have the word scheda in them:
i = driver.find_element_by_xpath('.//a[contains(@href, "scheda")]')
i.text
My first issue is that find_element_by_xpath only returns a single hyperlink, and the second is that it is not extracting anything so far.
I'd appreciate any help and/or guidance.
You need to use find_elements instead:
for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.text)
Note that find_elements returns a list of web elements, whereas find_element returns a single web element.
If you are specifically looking for the href attribute, you can try the code below:
for name in driver.find_elements(By.XPATH, ".//a[contains(@href, 'scheda')]"):
    print(name.get_attribute('href'))
There are two issues, looking at the website.
You want to find all elements, not just one, so you need to use find_elements, not find_element.
The anchors don't actually have any text in them, so .text won't return anything.
Assuming what you want is to scrape the URLs of all these links, you can use .get_attribute('href') instead of .text, like so:
url_list = driver.find_elements(By.XPATH, './/a[contains(@href, "scheda")]')
for i in url_list:
    print(i.get_attribute('href'))
It will find all web elements that match your criteria and store them in a list. I just used print as an example, but obviously you may want to do more than just print the links.

Python selenium how to scrape list of values from website

I have a list of job roles on this website that I want to scrape. The code I am using is below:
driver.get('https://jobs.ubs.com/TGnewUI/Search/home/HomeWithPreLoad?partnerid=25008&siteid=5012&PageType=searchResults&SearchType=linkquery&LinkID=6017#keyWordSearch=&locationSearch=')
job_roles = driver.find_elements(By.XPATH, '/html/body/div[2]/div[2]/div[1]/div[6]/div[3]/div/div/div[5]/div[2]/div/div[1]/ul/li[1]/div[2]/div[1]/span/a')
for job_roles in job_roles:
    text = job_roles.text
    print(text)
With this code, I am able to retrieve the first role, which is: Business Analyst - IB Credit Risk Change
I am unable to retrieve the other roles. Can someone kindly assist?
Thanks
In this case all the job names have the two CSS classes jobProperty and jobtitle.
So, since you want all the jobs, I recommend selecting by CSS selector.
The following example should work:
driver.get('https://jobs.ubs.com/TGnewUI/Search/home/HomeWithPreLoad?partnerid=25008&siteid=5012&PageType=searchResults&SearchType=linkquery&LinkID=6017#keyWordSearch=&locationSearch=')
job_roles = driver.find_elements(By.CSS_SELECTOR, '.jobProperty.jobtitle')
for job_role in job_roles:
    text = job_role.text
    print(text)
If you want to use XPath, you were very close. Your XPath selects only the first li element (li[1]). Changing it to just li makes it match every li at that position, so all the job links are found:
driver.get('https://jobs.ubs.com/TGnewUI/Search/home/HomeWithPreLoad?partnerid=25008&siteid=5012&PageType=searchResults&SearchType=linkquery&LinkID=6017#keyWordSearch=&locationSearch=')
job_roles = driver.find_elements(By.XPATH, '/html/body/div[2]/div[2]/div[1]/div[6]/div[3]/div/div/div[5]/div[2]/div/div[1]/ul/li/div[2]/div[1]/span/a')
for job_role in job_roles:
    text = job_role.text
    print(text)
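The difference between `li[1]` and `li` can be checked without a browser. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports a small XPath subset, as a stand-in for Selenium. The markup and job titles are made-up placeholders, not the real page.

```python
import xml.etree.ElementTree as ET

# Made-up markup that mimics a list of job links.
snippet = """
<div>
  <ul>
    <li><span><a>Job A</a></span></li>
    <li><span><a>Job B</a></span></li>
    <li><span><a>Job C</a></span></li>
  </ul>
</div>
""".strip()

root = ET.fromstring(snippet)

# li[1] restricts the match to the first <li> only.
first_only = [a.text for a in root.findall("./ul/li[1]/span/a")]

# Plain li matches every <li> under the <ul>.
all_jobs = [a.text for a in root.findall("./ul/li/span/a")]

print(first_only)
print(all_jobs)
```

The same reasoning applies to the long absolute XPath in the question: the positional predicate on `li` is what limits the result to one element.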

Python Selenium get list of all links with the same text

Using Python Selenium, how can I find all the links with the same text as a list?
I can use the following code to find a link that has the text '...':
button = driver.find_element_by_link_text('...')
But I have more than one of these on the page and would like to click the second one.
find_element_by_ returns the first WebElement matching the search criteria. To get all the matching WebElements, use find_elements_by_:
driver.find_elements_by_link_text('...')
You can use find_elements_by_link_text and index into the result to click the second match:
driver.find_elements_by_link_text('...')[1].click()

How to click every link and extract content inside - Python Selenium

I want to get the content behind all the links with id = "LinkNoticia".
Currently my code opens the first link and extracts its content, but I can't get to the others.
How can I do it?
This is my code (it works for one link):
from selenium import webdriver
driver= webdriver.Chrome("/selenium/webdriver/chromedriver")
driver.get('http://www.emol.com/noticias/economia/todas.aspx')
driver.find_element_by_id("LinkNoticia").click()
title = driver.find_element_by_id("cuDetalle_cuTitular_tituloNoticia")
print(title.text)
First of all, the fact that the page has multiple elements with the same ID is a bug on its own. The whole point of an ID is to be unique for each element on the page. According to the HTML specs:
id = name
This attribute assigns a name to an element. This name must be unique in a document.
A lengthy discussion is here.
Since an ID is supposed to be unique, most (all?) implementations of Selenium only have a function to look for one element with a given ID (e.g. find_element_by_id). I have never seen a function for finding multiple elements by ID. So you cannot use the ID as your locator directly; you need to use one of the existing functions that can locate multiple elements, and treat the ID as just another attribute that selects a group of elements. Your choices are:
find_elements_by_xpath
find_elements_by_css_selector
For example, you could change your search like this:
links = driver.find_elements_by_xpath("//a[@id='LinkNoticia']")
That would give you the whole set of links, and you'd need to loop through them to retrieve the actual link (href). Note that if you just click on each link, you navigate away from the page and the references in links will no longer be valid. So instead you can do this:
Build a list of hrefs from the links:
hrefs = []
for link in links:
    hrefs.append(link.get_attribute("href"))
Navigate to each href to check its title:
for href in hrefs:
    driver.get(href)
    title = driver.find_element_by_id("cuDetalle_cuTitular_tituloNoticia")
    # etc
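Putting the two steps together, a minimal sketch might look like this. To keep it runnable without a browser, it uses small stand-in classes (`FakeLink`, `FakeDriver`) that only mimic `get_attribute` and `get`; with Selenium you would pass the real link elements and the real driver instead. The helper names and example URLs are hypothetical.

```python
def collect_hrefs(links):
    """Read each link's href as a plain string while the elements are still valid."""
    return [link.get_attribute("href") for link in links]


def visit_all(driver, hrefs, read_title):
    """Open each URL in turn and return whatever read_title(driver) extracts."""
    titles = []
    for href in hrefs:
        driver.get(href)
        titles.append(read_title(driver))
    return titles


# Stand-ins so the sketch runs without a browser.
class FakeLink:
    def __init__(self, href):
        self.href = href

    def get_attribute(self, name):
        return self.href if name == "href" else None


class FakeDriver:
    def __init__(self):
        self.current_url = None

    def get(self, url):
        self.current_url = url


links = [FakeLink("http://example.com/news/1"), FakeLink("http://example.com/news/2")]
hrefs = collect_hrefs(links)
titles = visit_all(FakeDriver(), hrefs, lambda d: "title of " + d.current_url)
print(titles)
```

The key design point is that `hrefs` holds plain strings, so navigating away from the list page cannot invalidate them the way it invalidates element references. With a real driver, `read_title` could be a function that looks up the `cuDetalle_cuTitular_tituloNoticia` element and returns its text.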
