Trying to grab the location of job postings on a website using Python

I'm trying to grab the location tag on each job so I can filter jobs by location, since Seek's Work From Home search doesn't offer that option. I've been using Python with Selenium.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("C:\\Program Files (x86)\\Google\\Chrome\\Application\\chromedriver.exe")
driver.get("https://www.seek.com.au/jobs?where=Work%20from%20home")
assert "SEEK" in driver.title
location = WebDriverWait(driver, 25).until(EC.visibility_of_all_elements_located((By.XPATH,
    '//*[@id="app"]/div/div/div[4]/div/div[3]/section'
    '/div[2]/div/div[2]/div[1]/div/div[2]/div/div[1]'
    '/div[2]/article/div[1]/span[2]/span/strong/span/span')))
The WebDriverWait times out when trying to find the element that has the location as its text, even with very long wait times.
Traceback (most recent call last):
File "C:/Users/meagl/Desktop/Python/grabjobs/grabjobs.py", line 13, in <module>
location = WebDriverWait(driver, 25).until(EC.visibility_of_all_elements_located((By.XPATH,
File "C:\Users\meagl\anaconda3\envs\Python\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
The XPath I am using is the one at the very top (the element currently shows "Sydney").
What is my next step here?

It looks like there is an issue in your XPath. I used the code below and it printed the location for all 20 jobs on the page:
driver.get("https://www.seek.com.au/jobs?where=Work%20from%20home")
assert "SEEK" in driver.title
location = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[contains(text(),'location:')]")))
for loc in location:
    print(loc.text)
Output: the location text for each job on the page.
Note: you can post-process the string if you just want the city name.
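For example, a minimal sketch of pulling out just the city, assuming each matched element's text looks like "location: Sydney" (the exact format depends on Seek's markup):
for loc in location:
    text = loc.text                        # e.g. "location: Sydney" (assumed format)
    city = text.split(":", 1)[-1].strip()  # keep only the part after the label
    print(city)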

Whenever you do this kind of operation, the locator should be chosen carefully, and here the XPath you used does not match anything. Using an XPath of the form //*[text()='location:'] will solve your issue.
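A minimal sketch of using that locator in the same kind of wait (depending on Seek's markup, you may need to read the text of a parent element rather than the matched span itself):
labels = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[text()='location:']")))
for label in labels:
    print(label.text)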

Related

Selenium fails to find an HTML element even though it's visible

I wanted to write a Selenium script with Python that types input into a textbox, but Selenium doesn't find the element. (I know the script looks really bad right now, but please concentrate only on the error.)
from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
link = "https://kahoot.it/"
name = input("Name: ")
pin = str(input("PIN: "))
bots = int(input("Bots: "))
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
for i in range(bots):
    time.sleep(1)
    driver.execute_script('''window.open("https://kahoot.it/", "_blank");''')
    wait1 = wait.until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]/div[1]/div/div/div/div[3]/div[2]/main/div/form/input")))
    inputid = driver.find_element(By.NAME, 'game-Id')
    inputid.send_keys(pin)
    inputid.send_keys(Keys.ENTER)
    time.sleep(5)
Here is the console output:
Traceback (most recent call last):
File "c:\Users\Blackbird\Documents\programmieren\kahootjoinbot\main.py", line 19, in <module>
wait1 = wait.until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]/div[1]/div/div/div/div[3]/div[2]/main/div/form/input")))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\selenium\webdriver\support\wait.py", line 95, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I tried some other things, like finding the element by name, ID, or CSS selector, but none of it worked.
This should work:
from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
name = input("Name: ")
pin = str(input("PIN: "))
bots = int(input("Bots: "))
driver = webdriver.Chrome()
for i in range(bots):
    driver.execute_script('''window.open("https://kahoot.it/", "_blank");''')
    # switch to the tab that was just opened before interacting with it
    driver.switch_to.window(driver.window_handles[i+1])
    wait = WebDriverWait(driver, 10)
    # wait for the game-PIN field and keep a reference to it
    game_input = wait.until(EC.visibility_of_element_located((By.XPATH, "//input[@name='gameId']")))
    game_input.send_keys(pin)
    game_input.send_keys(Keys.ENTER)
    time.sleep(5)
Why is your code not working?
Because you are using the line driver.execute_script('''window.open("https://kahoot.it/", "_blank");''').
That line opens a new tab, but when you run your next line, wait1 = wait.until(EC.visibility_of_element_located((By.XPATH, "/html/body/div[1]/div[1]/div/div/div/div[3]/div[2]/main/div/form/input"))), you are still driving the first tab while the page you opened is in the second tab.
So you open a second tab but try to locate the element in your first tab; it does not matter which element you look for in the first tab, because the first tab does not contain anything.
How to solve it? Navigate in your first tab using driver.get("https://kahoot.it/")
Or you can switch to your new tabs using driver.switch_to.window(driver.window_handles[i+1]), where i+1 is the tab index: the first tab is 0, the second is 1, and so on.
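A minimal sketch of the first option, assuming it is acceptable to reuse a single tab (driver.get navigates the tab the driver is already controlling):
driver.get("https://kahoot.it/")
wait = WebDriverWait(driver, 10)
# the gameId locator is the one used in the answer above
game_input = wait.until(EC.visibility_of_element_located((By.XPATH, "//input[@name='gameId']")))
game_input.send_keys(pin)
game_input.send_keys(Keys.ENTER)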
Advice:
Do not use full XPaths; the moment something changes on the page, your XPath will probably no longer be valid (see the sketch after this list).
You can wait for the element and save it in a variable, as I did, then reuse it for send_keys, click, etc.
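A small sketch of both points together; the relative locator is the gameId one from the answer above, and the absolute XPath is the one from your question:
# brittle: an absolute XPath breaks as soon as the page layout changes
# "/html/body/div[1]/div[1]/div/div/div/div[3]/div[2]/main/div/form/input"

# more robust: a relative XPath keyed on a stable attribute
game_input = wait.until(EC.visibility_of_element_located((By.XPATH, "//input[@name='gameId']")))

# the saved element can then be reused for send_keys, click, etc.
game_input.send_keys(pin)
game_input.send_keys(Keys.ENTER)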

How do I wait for this website to load using selenium in Python?

Currently I'm using BeautifulSoup in my Python web-scraping project. However, on one of the pages I need to scrape, I have to interact with a JavaScript element, so I'm forced to use Selenium (which I'm not that familiar with).
This is my code so far:
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
opts = Options()
opts.add_argument('--headless')
seleniumDriver = Firefox(options=opts, executable_path=r'D:\Programs\Python\Scripts\geckodriver.exe')
seleniumDriver.get("https://www.thecompleteuniversityguide.co.uk/courses/details/computing-bsc/57997898")
driverWait = WebDriverWait(seleniumDriver, 10)
driverWait.until(EC.invisibility_of_element_located((By.ID, "mainBody")))
moduleButton = seleniumDriver.find_element_by_xpath("//div[@class='mdldv']")#.find_element_by_tag_name("span")
print("MODULE BUTTON:", moduleButton)
moduleButton.click()
seleniumDriver.close()
Currently, I'm getting a timeout error, even though I'm certain that the mainBody ID element does exist.
(I don't know how to use the By class, so I have no idea how it will work).
Error Message:
Traceback (most recent call last):
File "D:/Web Scraping/selenium tests.py", line 12, in <module>
driverWait.until(EC.invisibility_of_element_located((By.ID, "mainBody")))
File "D:\Programs\Python\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
You are calling:
driverWait.until(EC.invisibility_of_element_located((By.ID, "mainBody")))
Per the doc, this will wait until the element is absent:
class invisibility_of_element_located(object):
""" An Expectation for checking that an element is either invisible or not
present on the DOM.
locator used to find the element
"""
The Timeout exception that was raised means that the element was found but was never removed from the DOM or never became invisible.
What you need is to wait until the element is found (present in the DOM). Use presence_of_element_located instead:
driverWait.until(EC.presence_of_element_located((By.ID, "mainBody")))
The timeout exception will be raised if the element is not found within the timeout requested when creating driverWait.
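Putting it together with the rest of your script, a minimal sketch (the mdldv locator is the one from your own code and is assumed to still be correct):
driverWait = WebDriverWait(seleniumDriver, 10)
# wait until the element is present in the DOM instead of waiting for it to disappear
driverWait.until(EC.presence_of_element_located((By.ID, "mainBody")))
moduleButton = seleniumDriver.find_element_by_xpath("//div[@class='mdldv']")
moduleButton.click()
seleniumDriver.close()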
(I don't know how to use the By class, so I have no idea how it will work)
The By is used under the hood when calling find_element_by_xpath/id/css_selector.
In your case, when you use EC, you pass a locator made of By.ID and its value. It is equivalent to find_element_by_id('yourValue').
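As a small illustration, with 'yourValue' as a placeholder id:
# these two calls locate the same element; By.ID is what the shortcut uses under the hood
element_a = seleniumDriver.find_element_by_id("yourValue")
element_b = seleniumDriver.find_element(By.ID, "yourValue")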

I am scraping the following website, which contains JavaScript, but am getting an error

I am trying to scrape a website, but when I run the program I get the following error. Here is my code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.expected_conditions import presence_of_element_located
driver = webdriver.Chrome(executable_path = '/home/danish-khan/webscraping/rgcrawler2/chromedriver')
driver.get('https://www.researchgate.net/institution/Islamia_College_Peshawar/department/Department_of_Computer_Science/members')
chrome_options = Options()
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="rgw9_5fac070727fc2"]/div[3]/h5/a]')))
print(element.text)
Traceback (most recent call last):
File "resgt3.py", line 14, in <module>
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="rgw9_5fac070727fc2"]/div[3]/h5/a]')))
File "/home/danish-khan/miniconda3/lib/python3.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="rgw9_5fac070727fc2"]/div[3]/h5/a]')))
While the page is loading, your program waits 20 seconds for the XPath element '//*[@id="rgw9_5fac070727fc2"]/div[3]/h5/a]' to appear.
If the element doesn't appear within the defined time, you get a timeout error, which is a good thing; otherwise your program would be stuck forever if the element never appears at all.
I think you should double-check whether the provided XPath is correct and whether it changes over time (ids like rgw9_5fac070727fc2 look auto-generated, so they may differ on every page load).
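As a hedged sketch, a locator built only from the stable tail of your own XPath (h5/a, i.e. links inside h5 headings) avoids the auto-generated id entirely; whether it matches only the member names would need to be verified against the actual page:
elements = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//h5/a")))
for element in elements:
    print(element.text)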

How do I fill out a form using Python and Selenium?

I am new to Python and wanted to use it for automatic login. I found https://automatetheboringstuff.com/chapter11/ and tried:
#! python3
from selenium import webdriver
browser = webdriver.Firefox()
type(browser)
browser.get('https://forum-studienstiftung.de/')
emailEl = browser.find_element_by_id(username)
Unfortunately, this leads to:
Traceback (most recent call last):
File "", line 1, in <module>
emailEl = browser.find_element_by_id(username)
NameError: name 'username' is not defined
According to the Firefox Developer Tools the correct ID is "username".
Wrap username in quotation marks. Right now you are passing a variable called username, which does not exist, so Python raises a NameError before Selenium even tries to look up the id on the page.
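In other words, a one-line sketch of that fix:
emailEl = browser.find_element_by_id("username")  # the id is passed as a string literal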
The page you are trying to access takes time to load. You have to wait for the element to be visible before accessing it.
Try this:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
browser = webdriver.Firefox()
type(browser)
browser.get('https://forum-studienstiftung.de/')
emailEl = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.ID, "username")))

Selenium timeoutexception with webdriver

First post on here and brand new to Python. I am trying to learn how to scrape data from a website. When you first load the website, a disclaimer window shows up, and all I am trying to do is hit the Accept button using browser.find_element_by_id.
I am using WebDriverWait to wait for the page to load before clicking the "Accept" button, but I keep getting a TimeoutException. Here is the code I currently have:
from selenium import webdriver
#get the chrome webdriver path file
browser = webdriver.Chrome(executable_path=r"C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe")
browser.get('http://foreclosures.guilfordcountync.gov/')
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
#wait until element is loaded
wait = WebDriverWait(browser, 10)
wait.until(EC.presence_of_element_located((By.ID, "cmdAccept")))
element = browser.find_element_by_id("cmdAccept")
element.click()
Here is the error I keep getting:
Traceback (most recent call last):
File "C:/Users/Abbas/Desktop/Foreclosure_Scraping/Foreclosure_Scraping.py", line 33, in <module>
wait.until(EC.presence_of_element_located((By.ID, "cmdAccept")))
File "C:\Users\Abbas\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
I believe it has something to do with the calling out the ID of the button itself from the website but I honestly do not know. Any help is greatly appreciated.
Your attempts to locate the element are unsuccessful because it is nested within an iframe. You must tell Selenium to switch to the iframe that contains the desired element before attempting to click it or use it in any way. Try the following:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
#get the chrome webdriver path file
browser = webdriver.Chrome(executable_path=r"C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe")
browser.get('http://foreclosures.guilfordcountync.gov/')
browser.switch_to.frame(browser.find_element_by_name("ctl06"))
wait = WebDriverWait(browser, 10)
wait.until(EC.presence_of_element_located((By.ID, "cmdAccept")))
element = browser.find_element_by_id("cmdAccept")
element.click()
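If you later need to interact with elements outside that iframe, switch back to the top-level document first; a short follow-up sketch:
browser.switch_to.default_content()  # return to the main page after finishing inside the iframe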
