Looking for an element across multiple pages with a while loop in Python/Selenium

In my application, I have a table of employees, but the table can span more than one page. I want to verify that the new employee I just created was added: find that employee in the table and click on them, with Selenium WebDriver in Python. The idea is to check the first page; if there is no employee with the id I'm looking for, click to the second page and check there, and if the employee isn't there either, click to the 3rd page, etc. I have a function that goes through the pages one by one, but it never actually looks for the employee on those pages:
id = ()

def add_new_employee(driver, first_name, last_name):
    driver.find_element_by_css_selector("#menu_pim_viewPimModule").click()
    driver.find_element_by_css_selector("[name='btnAdd']").click()
    driver.find_element_by_css_selector("#firstName").send_keys(first_name)
    driver.find_element_by_css_selector("#lastName").send_keys(last_name)
    driver.find_element_by_css_selector("#photofile").\
        send_keys(os.path.abspath("cloud-computing-IT.jpg"))
    global id
    id = driver.find_element_by_css_selector("#employeeId").get_attribute("value")

def new_employee_added(driver):
    global id
    driver.find_element_by_css_selector("#menu_pim_viewPimModule").click()
    el = len(driver.find_elements_by_link_text("%s" % id))
    while el < 1:
        try:
            driver.find_element_by_link_text("%s" % id).click()
        except NoSuchElementException:
            try:
                for i in range(1, 50):
                    driver.find_element_by_link_text("%s" % i).click()
            except NoSuchElementException:
                return False

def test_new_employee(driver, first_name="Patrick", last_name="Patterson"):
    login(driver, username="Admin", password="Password")
    add_new_employee(driver, first_name, last_name)
    new_employee_added(driver)
    logout(driver)
The problem is in this function:
def new_employee_added(driver):
    global id
    driver.find_element_by_css_selector("#menu_pim_viewPimModule").click()
    el = len(driver.find_elements_by_link_text("%s" % id))
    while el < 1:
        try:
            driver.find_element_by_link_text("%s" % id).click()
        except NoSuchElementException:
            try:
                for i in range(1, 50):
                    driver.find_element_by_link_text("%s" % i).click()
            except NoSuchElementException:
                return False
The loop should try to find the element on the 1st page; if it isn't there, go to the 2nd page and check there. But it seems like it tries to find it on the 1st page and then just runs this piece of the loop:
for i in range(1, 50):
    driver.find_element_by_link_text("%s" % i).click()
clicking through the pages without ever trying to find the employee.

Your main loop should iterate through all pages and break once you find the required id. In your code you try to find the id on the first page, and if it is absent there you just iterate through the other pages without ever searching for the required id again.
As it's hard to provide a good solution without the actual HTML source, the lines below are kind of pseudocode:
for i in range(1, 50):
    try:
        # search for the required element
        driver.find_element_by_link_text("%s" % id).click()
        break
    except NoSuchElementException:
        # not on the current page: click the next page's link ("2", "3", ...)
        driver.find_element_by_link_text("%s" % (i + 1)).click()
P.S. It seems there is no reason to use global in your function; you can simply add a parameter to new_employee_added, as new_employee_added(driver, id), and call it with the appropriate value.
P.P.S. Do not use "id" as a variable name, as id() is a Python built-in function.
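Putting both suggestions together, here is a minimal sketch of the refactored check. It assumes, as in your code, that the employee link text is the employee id and that the pagination links are labelled "2", "3", and so on; employee_id is a plain parameter replacing the global:
from selenium.common.exceptions import NoSuchElementException

def new_employee_added(driver, employee_id, max_pages=50):
    # open the PIM module, as in the original code
    driver.find_element_by_css_selector("#menu_pim_viewPimModule").click()
    for page in range(1, max_pages):
        try:
            # look for the employee on the current page
            driver.find_element_by_link_text(str(employee_id)).click()
            return True
        except NoSuchElementException:
            try:
                # not found here: click the next page's pagination link
                driver.find_element_by_link_text(str(page + 1)).click()
            except NoSuchElementException:
                return False  # no more pages to check
    return False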

Related

How to get data from Airbnb with selenium

I am trying to web-scrape Airbnb with Selenium. However, it's been a HUGE, impossible mission.
First, I create a driver, where the argument "executable_path" is where my chromedriver is installed.
driver = webdriver.Chrome(executable_path=r'C:\directory\directory\directory\chromedriver.exe')
Secondly, I do the other stuff:
driver.get('https://www.airbnb.com.br/')
a = driver.find_element(By.CLASS_NAME, "cqtsvk7 dir dir-ltr")
a.click()
a.send_keys('Poland')
Here I received the error: NoSuchWindowException: Message: no such window: target window already closed from unknown error: web view not found
Moreover, when I create the variables to store the HTML elements, it doesn't work either:
title = driver.find_elements(By.CLASS_NAME, 'a-size-base-plus a-color-base a-text-normal')
place = driver.find_elements(By.ID, 'title_49247685')
city = driver.find_elements(By.CLASS_NAME, 'f15liw5s s1cjsi4j dir dir-ltr')
price = driver.find_elements(By.CLASS_NAME, 'p11pu8yw dir dir-ltr')
Could someone please help me? How can I get the place, city and price for every result of my travel-destination query on Airbnb? (I know how to store it all in a pandas df; my problem is the use of Selenium. Those find_elements calls seem not to work properly on Airbnb.)
I received the error: NoSuchWindowException: Message: no such window: target window already closed from unknown error: web view not found
Which line is raising this error? I don't see anything in your snippets that could be causing it, but is there anything in your code [before the included snippet], or some external factor, that could be causing the automated window to get closed? You could see if any of the answers to this question help you with the issue, especially if you're using .switch_to.window anywhere in your code.
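For instance, a minimal recovery sketch (assuming at least one window handle is still open) would be to switch to a live handle before issuing any further commands:
# switch to the last window handle that is still open, so commands
# aren't sent to a window that has already been closed
if driver.window_handles:
    driver.switch_to.window(driver.window_handles[-1])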
Searching
(You should include screenshots or better descriptions of the fields you are targeting, especially when the issue is that you're having difficulty targeting them.)
Secondly, I do the other stuff:
driver.get('https://www.airbnb.com.br/')
a = driver.find_element(By.CLASS_NAME, "cqtsvk7 dir dir-ltr")
I want Selenium to search for the country I want to extract the data from (Poland, in this case)
If you mean that you're trying to enter "Poland" into this input field, then the class cqtsvk7 in cqtsvk7 dir dir-ltr appears to change. The id attribute might be more reliable; but it also seems you need to click on the search area to make the input interactable, and after entering "Poland" you have to click the search icon and wait for the results to load.
# from selenium.webdriver.support.ui import WebDriverWait
# from selenium.webdriver.support import expected_conditions as EC
def search_airbnb(search_for, browsr, wait_timeout=5):
    wait_til = WebDriverWait(browsr, wait_timeout).until
    browsr.get('https://www.airbnb.com.br/')
    wait_til(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, 'div[data-testid="little-search"]')))
    search_area = browsr.find_element(
        By.CSS_SELECTOR, 'div[data-testid="little-search"]')
    search_area.click()
    print('CLICKED search_area')
    wait_til(EC.visibility_of_all_elements_located(
        (By.ID, "bigsearch-query-location-input")))
    a = browsr.find_element(By.ID, "bigsearch-query-location-input")
    a.send_keys(search_for)
    print(f'ENTERED "{search_for}"')
    wait_til(EC.element_to_be_clickable((By.CSS_SELECTOR,
        'button[data-testid="structured-search-input-search-button"]')))
    search_btn = browsr.find_element(By.CSS_SELECTOR,
        'button[data-testid="structured-search-input-search-button"]')
    search_btn.click()
    print('CLICKED search_btn')

searchFor = 'Poland'
search_airbnb(searchFor, driver)  # , 15) # adjust wait_timeout if necessary
Notice that for the clicked elements, I used By.CSS_SELECTOR; if you're unfamiliar with CSS selectors, you can consult this reference. You can also use By.XPATH in these cases; this XPath cheatsheet might help then.
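For instance, the search-button locator used above could be written either way; both of these target the same element:
# CSS selector (as used above) and an equivalent XPath, purely for illustration
btn_css = driver.find_element(By.CSS_SELECTOR,
    'button[data-testid="structured-search-input-search-button"]')
btn_xpath = driver.find_element(By.XPATH,
    '//button[@data-testid="structured-search-input-search-button"]')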
Scraping Results
How can I get the place, city and price for every result of my travel-destination query on Airbnb?
Again, you can use CSS selectors [or XPaths] as they're quite versatile. If you use a function like
def select_get(elem, sel='', tAttr='innerText', defaultVal=None, isv=False):
    try:
        el = elem.find_element(By.CSS_SELECTOR, sel) if sel else elem
        rVal = el.get_attribute(tAttr)
        if isinstance(rVal, str): rVal = rVal.strip()
        return defaultVal if rVal is None else rVal
    except Exception as e:
        if isv: print(f'failed to get "{tAttr}" from "{sel}"\n', type(e), e)
        return defaultVal
then even if a certain element or attribute is missing in any of the cards, it'll just fill in with defaultVal and all the other cards will still be scraped, instead of raising an error and crashing the whole program.
You can get a list of dictionaries in listings by looping through the result cards with list comprehension like
listings = [{
    'name': select_get(el, 'meta[itemprop="name"]', 'content'),  # SAME TEXT AS
    # 'title_sub': select_get(el, 'div[id^="title_"]+div+div>span'),
    'city_title': select_get(el, 'div[id^="title_"]'),
    'beds': select_get(el, 'div[id^="title_"]+div+div+div>span'),
    'dates': select_get(el, 'div[id^="title_"]+div+div+div+div>span'),
    'price': select_get(el, 'div[id^="title_"]+div+div+div+div+div div+span'),
    'rating': select_get(el, 'div[id^="title_"]~span[aria-label]', 'aria-label'),
    # 'url': select_get(el, 'meta[itemprop="url"]', 'content', defaultVal='').split('?')[0],
} for el in driver.find_elements(
    By.CSS_SELECTOR, 'div[itemprop="itemListElement"]'  ## RESULT CARD SELECTOR
)]
Dealing with Pagination
If you want to scrape multiple pages, you can loop through them. [You can also use while True (instead of a for loop as below) for unlimited pages, but I feel it's safer like this, even if you set an absurdly high limit like maxPages=5000 or something; either way, it should break out of the loop once it reaches the last page.]
maxPages = 50  # adjust as preferred
wait = WebDriverWait(driver, 3)  # adjust timeout as necessary
listings, addedIds = [], []
isFirstPage = True
for pgi in range(maxPages):
    prevLen = len(listings)  # just for printing progress

    ## wait to load all the cards ##
    try:
        wait.until(EC.visibility_of_all_elements_located(
            (By.CSS_SELECTOR, 'div[itemprop="itemListElement"]')))
    except Exception as e:
        print(f'[{pgi}] Failed to load listings', type(e), e)
        continue  # losing one loop for additional wait time

    ## check current page number according to driver ##
    try:
        pgNum = driver.find_element(
            By.CSS_SELECTOR, 'button[aria-current="page"]'
        ).text.strip() if not isFirstPage else '1'
    except Exception as e:
        print('Failed to find pgNum', type(e), e)
        pgNum = f'?{pgi+1}?'

    ## collect listings ##
    pgListings = [{
        'listing_id': select_get(
            el, 'div[role="group"]>a[target^="listing_"]', 'target',
            defaultVal='').replace('listing_', '', 1).strip(),
        # 'position': 'pg_' + str(pgNum) + '-pos_' + select_get(
        #     el, 'meta[itemprop="position"]', 'content', defaultVal=''),
        'name': select_get(el, 'meta[itemprop="name"]', 'content'),
        #####################################################
        ### INCLUDE ALL THE key-value pairs THAT YOU WANT ###
        #####################################################
    } for el in driver.find_elements(
        By.CSS_SELECTOR, 'div[itemprop="itemListElement"]'
    )]

    ## [ only checks for duplicates against listings from previous pages ] ##
    listings += [pgl for pgl in pgListings if pgl['listing_id'] not in addedIds]
    addedIds += [l['listing_id'] for l in pgListings]
    ### [OR] check for duplicates within the same page as well ###
    ## for pgl in pgListings:
    ##     if pgl['listing_id'] not in addedIds:
    ##         listings.append(pgl)
    ##         addedIds.append(pgl['listing_id'])

    print(f'[{pgi}] extracted', len(listings)-prevLen,
          f'listings [of {len(pgListings)} total] from page', pgNum)

    ## go to next page ##
    nxtPg = driver.find_elements(By.CSS_SELECTOR, 'a[aria-label="Próximo"]')
    if not nxtPg:
        print(f'No more next page [{len(listings)} listings so far]\n')
        break  ### [OR] START AGAIN FROM page1 WITH:
        ## try: _, isFirstPage = search_airbnb(searchFor, driver), True
        ## except Exception as e: print('Failed to search again', type(e), e)
        ## continue
        ### bc airbnb doesn't show all results even across all pages
        ### so you can get a few more every re-scrape [but not many - less than 5 per page]
    try: _, isFirstPage = nxtPg[0].click(), False
    except Exception as e: print('Failed to click next', type(e), e)

dMsg = f'[reduced from {len(addedIds)} after removing duplicates]'
print('extracted', len(listings), 'listings with', dMsg)
[listing_id seems to be the easiest way to ensure that only unique listings are collected. You can also form a link to that listing like f'https://www.airbnb.com.br/rooms/{listing_id}'.]
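So, as a small illustrative post-processing step, you could build direct room links from the collected ids like this:
# build direct room links from the collected listing ids
room_urls = [f'https://www.airbnb.com.br/rooms/{l["listing_id"]}'
             for l in listings if l.get('listing_id')]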
Combining with Old Data [Load & Save]
If you want to save to CSV and also load previous data from the same file, with old and new data combined without duplicates, you can do something like
# import pandas as pd
# import os
fileName = 'pol_airbnb.csv'
maxPages = 50
try:
    listings = pd.read_csv(fileName).to_dict('records')
    addedIds = [str(l['listing_id']).strip() for l in listings]
    print(f'loaded {len(listings)} previously extracted listings')
except Exception as e:
    print('failed to load previous data', type(e), e)
    listings, addedIds = [], []

#################################################
# for pgi... ## LOOP THROUGH PAGES AS ABOVE #####
#################################################

dMsg = f'[reduced from {len(addedIds)} after removing duplicates]'
print('extracted', len(listings), 'listings with', dMsg)
pd.DataFrame(listings).set_index('listing_id').to_csv(fileName)
print('saved to', os.path.abspath(fileName))
Note that keeping the old data might mean that some of the listings are no longer available.
View pol_airbnb.csv for my results with maxPages=999 and searching again instead of break-ing in if not nxtPg.

How to loop through indeed job pages using selenium

I am trying to make a Selenium Python script to collect data from each job in an Indeed job search. I can easily get the data from the first and second page. The problem I am running into is that while looping through the pages, the script only clicks the next page and the previous page, in that order, going from page 1 -> 2 -> 1 -> 2 -> etc. I know it is doing this because both the next and previous buttons have the same class name. So when I redeclare the WebElement variable after the page loads, it hits the previous button because that is the first occurrence of the class in the DOM. I tried making it always click the next button by using the XPath (I would inspect the next-button element and copy the full XPath), but I still run into the same errors. My code is below; I am using Python 3.7.9 and pip version 21.2.4.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import time

PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
HTTPS = "https://"
# hard coded data to test
siteDomain = "indeed.com"
jobSearch = "Software Developer"
locationSearch = "Richmond, VA"
listOfJobs = []

def if_exists_by_id(id):
    try:
        driver.find_element_by_id(id)
    except NoSuchElementException:
        return False
    return True

def if_exists_by_class_name(class_name):
    try:
        driver.find_element_by_class_name(class_name)
    except NoSuchElementException:
        return False
    return True

def if_exists_by_xpath(xpath):
    try:
        driver.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return False
    return True

def removeSpaces(strArray):
    newjobCounter = 0
    jobCounter = 0
    for i, word in enumerate(strArray):
        jobCounter += 1
        if strArray[i].__contains__("\n"):
            strArray[i] = strArray[i].replace("\n", " ")
        if strArray[i].__contains__("new"):
            newjobCounter += 1
        print(strArray[i] + "\n")
    if newjobCounter == 0:
        print("Unfortunately, there are no new jobs for this search")
    else:
        print("With " + str(newjobCounter) + " out of " + str(jobCounter) + " new jobs!")
    return strArray

try:
    # Goes to Site
    driver.get(HTTPS + siteDomain)
    # obtains access to elements from website
    searchJob = driver.find_element_by_name("q")
    searchLocation = driver.find_element_by_name("l")
    # clear text fields
    searchJob.send_keys(Keys.CONTROL, "a", Keys.BACK_SPACE)
    searchLocation.send_keys(Keys.CONTROL, "a", Keys.BACK_SPACE)
    # inputs values into website elements
    searchJob.send_keys(jobSearch)
    searchLocation.send_keys(locationSearch)
    # presses button to search
    searchLocation.send_keys(Keys.RETURN)
    # Begin looping through pages
    pageList = driver.find_element_by_class_name("pagination")
    page = pageList.find_elements_by_tag_name("li")
    numPages = 0
    for i, x in enumerate(page):
        time.sleep(1)
        # checks for popup; if there is a popup, exit out and sleep
        if if_exists_by_id("popover-x"):
            driver.find_element_by_id("popover-x").click()
            time.sleep(1)
        # increment page counter variable
        numPages += 1
        # obtains data in class name value
        jobCards = driver.find_elements_by_class_name("jobCard_mainContent")
        # prints number of jobs returned
        print(str(len(jobCards)) + " jobs in: " + locationSearch)
        # inserts each job into list of jobs array
        # commented out to make debugging easier
        # for jobCard in jobCards:
        #     listOfJobs.append(jobCard.text)
        # supposed to click the next page, but keeps alternating
        # between next page and previous page
        driver.find_element_by_class_name("np").click()
        print("On page number: " + str(numPages))
    # print(removeSpaces(listOfJobs))
except ValueError:
    print(ValueError)
finally:
    driver.quit()
Any help will be greatly appreciated. Also, if I am implementing bad coding practices in the structure of the script, please let me know, as I am trying to learn as much as possible! :)
I have tested your code. The thing is, there are 2 'np' class elements when we go to the 2nd page. What you can do is: the first time, use find_element_by_class_name('np'), and every other time use find_elements_by_class_name('np')[1], which will select the next button. You can use find_elements_by_class_name('np')[0] for the previous button if needed. Here is the code!
if i == 0:
    driver.find_element_by_class_name("np").click()
else:
    driver.find_elements_by_class_name("np")[1].click()
Just replace the line driver.find_element_by_class_name("np").click() with the code snippet above. I have tested it and it worked like a charm.
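An equivalent way to avoid special-casing i == 0, under the same assumption (page 1 has a single 'np' element and later pages have two, with "Next" last), would be something like:
np_buttons = driver.find_elements_by_class_name("np")
if np_buttons:
    np_buttons[-1].click()  # the last "np" element is the "Next" button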
Also, I am not as experienced as the other devs here, but I am glad if I could help you. (This is my first answer ever on Stack Overflow.)

Find Value Using Selenium using a Variable that Contains String

I am trying to open up several URLs (because they contain data I want to append to a list). I have logic saying "if amount in icl_dollar_amount_l" then run the rest of the code. However, I want the script to only run the rest of the code on the specific amount in the variable "amount".
Example:
Selenium opens up X links and sees ['144,827.95', '5,199,024.87', '130,710.67'] in icl_dollar_amount_l, but I want it to skip '144,827.95' and '5,199,024.87' and only get the information for '130,710.67', which is already in the 'amount' variable.
Actual results:
It's getting web-scraping information for amount '144,827.95' only and not even going on to '5,199,024.87' or '130,710.67'. I only want it to get web-scraping information for '130,710.67', because my amount variable has this as the only amount.
print(icl_dollar_amount_l)
['144,827.95', '5,199,024.87', '130,710.67']
print(amount)
'130,710.67'
file2.py
def scrapeBOAWebsite(url, fcg_subject_l, gp_subject_l):
    from ICL_Awk_Checker import rps_amount_l2
    icl_dollar_amount_l = []
    amount_ack_missing_l = []
    file_total_l = []
    body_l = []
    for link in url:
        print(link)
        browser = webdriver.Chrome(options=options,
                                   executable_path=r'\\TEST\user$\TEST\Documents\driver\chromedriver.exe')
        # if 'P2 Cust ID 908554 File' in fcg_subject:
        browser.get(link)
        username = browser.find_element_by_name("dialog:username").get_attribute('value')
        submit = browser.find_element_by_xpath("//*[@id='dialog:continueButton']").click()
        body = browser.find_element_by_xpath("//*[contains(text(), 'Total:')]").text
        body_l.append(body)
        icl_dollar_amount = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
        icl_dollar_amount_l.append(icl_dollar_amount)

    if not missing_amount:
        logging.info("List is empty")
        print("List is empty")

    count = 0
    for amount in missing_amount:
        if amount in icl_dollar_amount_l:
            body = body_l[count]
            get_file_total = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
            file_total_l.append(get_file_total)

    return icl_dollar_amount_l, file_date_l, company_id_l, client_id_l, customer_name_l, file_name_l, file_total_l, \
           item_count_l, file_status_l, amount_ack_missing_l
I don't know if I understand the problem, but this
if amount in icl_dollar_amount_l:
doesn't give you the position of '130,710.67' in icl_dollar_amount_l, so you also need
count = icl_dollar_amount_l.index(amount)

for amount in missing_amount:
    if amount in icl_dollar_amount_l:
        count = icl_dollar_amount_l.index(amount)
        body = body_l[count]
But it will only work if you expect a single matching amount in the list icl_dollar_amount_l. For more elements you would have to use a for-loop instead and check every element separately:
for amount in missing_amount:
    for count, item in enumerate(icl_dollar_amount_l):
        if amount == item:
            body = body_l[count]
But frankly, I don't know why you don't check it in the first loop (for link in url:), where you have direct access to icl_dollar_amount and body.
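That might look something like the sketch below (illustrative only; it assumes missing_amount is already populated before the loop runs):
for link in url:
    browser.get(link)
    body = browser.find_element_by_xpath("//*[contains(text(), 'Total:')]").text
    body_l.append(body)
    icl_dollar_amount = re.findall('(?:[\£\$\€]{1}[,\d]+.?\d*)', body)[0].split('$', 1)[1]
    icl_dollar_amount_l.append(icl_dollar_amount)
    # check right away, while `body` is still in scope; the "file total" regex
    # is the same one, so it would return the value just extracted
    if icl_dollar_amount in missing_amount:
        file_total_l.append(icl_dollar_amount)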

Scroll Height Returns "None" in Selenium: [ arguments[0].scrollHeight ]

I'm working on a Python bot with Selenium, and infinite scrolling in a dialog box isn't working due to a "None" return from "arguments[0].scrollHeight":
dialogBx = driver.find_element_by_xpath("//div[@role='dialog']/div[2]")
print(dialogBx)        # <selenium.webdriver.remote.webelement.WebElement (session="fcec89cc11fa5fa5eaf29a8efa9989f9", element="31bfd470-de78-XXXX-XXXX-ac1ffa6224c4")>
print(type(dialogBx))  # <class 'selenium.webdriver.remote.webelement.WebElement'>
sleep(5)
last_height = driver.execute_script("arguments[0].scrollHeight", dialogBx)
print("Height : ", last_height)  # None
I need the last height to compare against; please suggest a solution.
Ok, to answer your question: since you are inside a dialog, we should focus on it. When you execute last_height=driver.execute_script("arguments[0].scrollHeight",dialogBx); I believe you are executing that in the main page or in a wrong div (not 100% sure). Either way, I took a different approach: we are going to select the last <li> item currently available in the dialog and scroll down to its position, which will force the dialog to update. I will extract a snippet from the full code you will see below:
last_li_item = driver.find_element_by_xpath('/html/body/div[4]/div/div[2]/ul/div/li[{p}]'.format(p=start_pos))
last_li_item.location_once_scrolled_into_view
We first select the last list item and then access the property location_once_scrolled_into_view. This property scrolls our dialog down to the last item, which then loads more items. start_pos is just the position of the last available <li> element, i.e. for <div><li></li><li></li><li></li></div>, start_pos=2, which is the last li item counting from 0. I chose this variable name because it is inside a for loop which watches the changes of li items inside the div; you will get it once you see the full code.
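As an aside about the original snippet: driver.execute_script only returns a value when the JavaScript itself contains a return statement; without it, Selenium gives you back None, which is very likely where your None came from. The direct fix for reading the height would be:
# note the explicit "return" - without it execute_script yields None
last_height = driver.execute_script("return arguments[0].scrollHeight", dialogBx)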
On the other hand, to execute this, simply change the parameters at the top and run the test function test(). If you are already logged in to Instagram you can just run get_list_of_followers().
Note: this function uses a Follower class that is also in this code. You can remove it if you wish, but you will need to modify the function.
IMPORTANT:
When you execute this program, the dialog box items will keep increasing until there are no more items to load, so a TODO would be to remove the elements you have already processed; otherwise I believe performance will get slower when you start hitting big numbers!
Let me know if you need any other explanation. Now the code:
import time
from selenium import webdriver
from selenium.webdriver.remote.webelement import WebElement

# instagram url as our base
base_url = "https://www.instagram.com"

# ===================== MODIFY THESE TO YOUR NEED =========
# the user we wish to get the followers from
base_user = "/nasa/"
# how much do you wish to sleep to wait for loading (seconds)
sleep_time = 3
# True will attempt login with facebook, False with instagram
login_with_facebook = True
# Credentials here
username = "YOUR_USERNAME"
password = "YOUR_PASSWORD"
# How many users do you wish to retrieve? -1 = all or n>0
get_users = 10
# =========================================================

# This is the div that contains all the followers info, not the dialog box itself
dialog_box_xpath = '/html/body/div[4]/div/div[2]/ul/div'
total_followers_xpath = '/html/body/div[1]/section/main/div/header/section/ul/li[2]/a/span'
followers_button_xpath = '/html/body/div[1]/section/main/div/header/section/ul/li[2]/a'
insta_username_xpath = '/html/body/div[5]/div/div[2]/div[2]/div/div/div[1]/div/form/div[2]/div/label/input'
insta_pwd_xpath = '/html/body/div[5]/div/div[2]/div[2]/div/div/div[1]/div/form/div[3]/div/label/input'
insta_login_button_xpath = '/html/body/div[5]/div/div[2]/div[2]/div/div/div[1]/div/form/div[4]/button'
insta_fb_login_button_xpath = '/html/body/div[5]/div/div[2]/div[2]/div/div/div[1]/div/form/div[6]/button'
fb_username_xpath = '/html/body/div[1]/div[3]/div[1]/div/div/div[2]/div[1]/form/div/div[1]/input'
fb_pwd_xpath = '/html/body/div[1]/div[3]/div[1]/div/div/div[2]/div[1]/form/div/div[2]/input'
fb_login_button_xpath = '/html/body/div[1]/div[3]/div[1]/div/div/div[2]/div[1]/form/div/div[3]/button'

u_path = fb_username_xpath if login_with_facebook else insta_username_xpath
p_path = fb_pwd_xpath if login_with_facebook else insta_pwd_xpath
lb_path = fb_login_button_xpath if login_with_facebook else insta_login_button_xpath

# Simple class for a follower; you don't actually need this, but for explanation it's ok.
class Follower:
    def __init__(self, user_name, href):
        self.username = user_name
        self.href = href

    @property
    def get_username(self):
        return self.username

    @property
    def get_href(self):
        return self.href

    def __repr__(self):
        return self.username

def test():
    base_user_path = base_url + base_user
    driver = webdriver.Chrome()
    driver.get(base_user_path)
    # click the followers button; this will ask for login
    driver.find_element_by_xpath(followers_button_xpath).click()
    time.sleep(sleep_time)
    # now we decide if we will login with facebook or instagram
    if login_with_facebook:
        driver.find_element_by_xpath(insta_fb_login_button_xpath).click()
        time.sleep(sleep_time)
    username_input = driver.find_element_by_xpath(u_path)
    username_input.send_keys(username)
    password_input = driver.find_element_by_xpath(p_path)
    password_input.send_keys(password)
    driver.find_element_by_xpath(lb_path).click()
    # We need to wait a little longer for the page to load. Feel free to change this to your needs.
    time.sleep(10)
    # click the followers button again
    driver.find_element_by_xpath(followers_button_xpath).click()
    time.sleep(sleep_time)
    # now we get the list of followers from the dialog box. This function will return a list of Follower objects.
    followers: list[Follower] = get_list_of_followers(driver, dialog_box_xpath, get_users)
    # close the driver, we do not need it anymore.
    driver.close()
    for follower in followers:
        print(follower, follower.get_href)

def get_list_of_followers(driver, d_xpath=dialog_box_xpath, get_items=10):
    """
    Get a list of followers from instagram
    :param driver: driver instance
    :param d_xpath: dialog box xpath. By default it gets the global parameter but you can change it
    :param get_items: how many items do you wish to obtain? -1 = try to get all of them. Any positive number will be
        the number of followers to obtain
    :return: list of Follower objects
    """
    # getting the dialog content element
    dialog_box: WebElement = driver.find_element_by_xpath(d_xpath)
    # getting all the list items (<li></li>) inside the dialog box.
    dialog_content: list[WebElement] = dialog_box.find_elements_by_tag_name("li")
    # Get the total number of followers. Since we get a string, we need to convert it with int(<str>)
    total_followers = int(driver.find_element_by_xpath('/html/body/div[1]/section/main/div/header/section/ul/li['
                                                       '2]/a/span').get_attribute("title").replace(".", ""))
    # how many items do we have without scrolling down?
    li_items = len(dialog_content)
    # We are trying to get n elements (n = get_items). Now we need to check if there are enough followers to
    # retrieve; if not, we will get the max quantity available. This applies only if n >= 0. If -1, then n is the
    # total amount of followers
    if get_items == -1:
        get_items = total_followers
    elif -1 < get_items <= total_followers:
        # no need to change anything, it is ok to work with get_items
        pass
    else:
        # if it is not -1 and not between 0 and total followers, then we raise an error
        raise IndexError
    # You can start from greater than 0, but that will give you a shorter list of followers than you wish if
    # there are not enough followers available. i.e.: total_followers = 10, get_items=10, start_from=1. This will
    # only return 9 followers, not 10, even if get_items is 10.
    return generate_followers(0, get_items, total_followers, dialog_box, driver)

def generate_followers(start_pos, get_items, total_followers, dialog_box_element: WebElement, driver):
    """
    Generate followers based on the parameters
    :param start_pos: index of where to start getting the followers from
    :param get_items: total items to get
    :param total_followers: total number of followers
    :param dialog_box_element: dialog box to get the list items count
    :param driver: driver object
    :return: followers list
    """
    if -1 < start_pos < total_followers:
        # we want to count items from our current position until the last element available without scrolling. We do
        # it this way so that when we scroll down and the list of items grows, we will start generating followers
        # from our last current position, not from the beginning!
        first = dialog_box_element.find_element_by_xpath("./li[{pos}]".format(pos=start_pos + 1))
        li_items = dialog_box_element.find_elements_by_xpath("./li[position()={pos}][last("
                                                             ")]/following-sibling::li"
                                                             .format(pos=(start_pos + 1)))
        li_items.insert(0, first)
        print("Generating followers from position: {pos} with {li_count} list items"
              .format(pos=(start_pos + 1), li_count=len(li_items)))
        followers = []
        for i in range(len(li_items)):
            anchors = li_items[i].find_elements_by_tag_name("a")
            anchor = anchors[0] if len(anchors) == 1 else anchors[1]
            follower = Follower(anchor.text, anchor.get_attribute("href"))
            followers.append(follower)
            get_items -= 1
            start_pos += 1
            print("Follower {f} added to the list".format(f=follower))
            # we stop if our starting position has reached the total number of followers or if get_items reached 0
            # (meaning that if we requested 10 items, we got them all and there is no need to continue)
            if start_pos >= total_followers or get_items == 0:
                print("finished")
                return followers
        print("finished loop, executing scroll down...")
        last_li_item = driver.find_element_by_xpath('/html/body/div[4]/div/div[2]/ul/div/li[{p}]'.format(p=start_pos))
        last_li_item.location_once_scrolled_into_view
        time.sleep(sleep_time)
        followers.extend(generate_followers(start_pos, get_items, total_followers, dialog_box_element, driver))
        return followers
    else:
        raise IndexError

Selenium (Python) - Checking parameter next to the LABEL and insert the value OR select an option

I would like to create generic code (using Selenium) which will look for a label, then find the input (OR select) tag next to that label and insert the value.
Main function:
for l in label:
    try:
        xpathInput = "//label[contains(.,'{}')]/following::input".format(l)
        checkXpathInput, pathInput = check_xpath(browser, xpathInput)
        if checkXpathInput is True:
            pathInput.clear()
            pathInput.send_keys("\b{}".format(value))
            break
        for op in option:
            xpathSelect = "//label[contains(.,'{}')]/following::select/option[text()='{}']".format(l, op)
            checkXpathSelect, pathSelect = check_xpath(browser, xpathSelect)
            if checkXpathSelect is True:
                pathSelect.click()
                break
    except:
        print("Can't match: {}".format(l))
Path checker:
def check_xpath(browser, xpath):
    try:
        path = browser.find_element_by_xpath(xpath)
    except NoSuchElementException:
        return False
    return True, path
What is the current issue?
I need the code to check, when the LABEL is for example TITLE, that there is NO input tag next to the "Title" label, and then go on to check whether there is a select tag next to the "Title" label, and so on.
Currently, it finds the label "Title" and then fills the value into the first following input (which is incorrect, as "Title" uses a SELECT tag).
I'd exploit the fact that find_elements_by_xpath returns a list of found elements and empty lists are falsy. So you wouldn't need a try/except and a function which returns bool or tuple values (which is not the most optimal behavior).
It would be easier to give a good answer with some HTML source example, but I assume what you'd like to do is this:
def handle_label_inputs(label, value):
    # if there is such a label, this result won't be empty
    found_labels = driver.find_elements_by_xpath('//label[contains(.,"{}")]'.format(label))
    # if the list is not empty
    if found_labels:
        l = found_labels[0]
        # any options with the given value as text
        following_select_option_values = l.find_elements_by_xpath('./following::select//option[text()="{}"]'.format(value))
        # any inputs next to the label
        following_inputs = l.find_elements_by_xpath('./following::input')
        # did we find an option?
        if following_select_option_values:
            following_select_option_values[0].click()
        # or is there an input?
        elif following_inputs:
            in_field = following_inputs[0]
            in_field.clear()
            in_field.send_keys(value)
        else:
            print("Can't match: {} - {}".format(label, value))

driver.get('http://thenewcode.com/166/HTML-Forms-Drop-down-Menus')
handle_label_inputs('State / Province / Territory', 'California')
I don't know how tidy the page you are working with is, but if it is well done, then your label should have a for="something" attribute. If that is the case, then you can simply find the label-related element and check whether its tag is input (or select):
related_element_if_done_properly = driver.find_elements_by_xpath('//*[@id="{}"]'.format(label_element.get_attribute("for")))
if related_element_if_done_properly:
    your_element = related_element_if_done_properly[0]
    is_input = your_element.tag_name.lower() == "input"
else:
    print('Ohnoes')
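If it does turn out to be a select, you could then act on it with Selenium's Select helper instead of send_keys; a rough continuation of the snippet above:
from selenium.webdriver.support.ui import Select

if is_input:
    your_element.clear()
    your_element.send_keys(value)
elif your_element.tag_name.lower() == "select":
    # pick the option whose visible text matches the value
    Select(your_element).select_by_visible_text(value)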
