Description:
I am trying to make a job ad parser that works on the indeed.com site (I am using Python + Selenium + ChromeDriver).
I am able to log in with my Facebook credentials, and then I am redirected to the default site, which is hu.indeed.com (as I live in Hungary).
I would like to search for jobs available in London, so I have the Selenium driver navigate to the uk.indeed.com site.
Then I get Selenium to locate the position input field and the locality field and enter my search criteria into both. Up until now everything works smoothly.
The problem:
After pressing the search button I can see the results window, but after a very short time I am automatically redirected to the hu.indeed.com site. As you can see from my code below, I have no command that would do this, and I have no clue why or how it is happening. My print statements show that driver.current_url changes at some point in time, and I don't understand why that happens or how I could prevent it.
Could you please let me know why the URL changes and how I could prevent it?
Code:
driver.get("https://uk.indeed.com/")
time.sleep(1)
job_type_input=driver.find_element_by_xpath('//*[@id="text-input-what"]')
search_text=f"{jobs[0]} {extra_info}"
job_type_input.send_keys(search_text)
time.sleep(1)
print(f"1 print:{driver.current_url}") #<--- 1. print
job_location_input=driver.find_element_by_xpath('//*[@id="text-input-where"]')
job_location_input.send_keys(cities[0])
search_button=driver.find_element_by_xpath('//*[@id="jobsearch"]/button')
search_button.click()
time.sleep(5)
print(f"2 print:{driver.current_url}") #<--- 2. print
print(f"3 print:{driver.current_url}") #<--- 3. print
try:
    mosaic_element=driver.find_element_by_id("mosaic-provider-jobcards")
    html=mosaic_element.get_attribute('innerHTML')
    print("success")
except:
    print("error in try")
print(f"4 print:{driver.current_url}") #<--- 4. print
Output:
1 print:https://uk.indeed.com/
2 print:https://hu.indeed.com/
3 print:https://hu.indeed.com/
error in try
4 print:https://hu.indeed.com/
I am the one who wrote the original post, and I found the solution to this problem. As Max Daroshchanka mentioned in his answer, the problem was caused by indeed.com reloading the page due to some plugin (or something similar). My solution was therefore to interact with the input field only after some time had passed (using time.sleep(2)).
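For what it is worth, a more robust alternative to a fixed time.sleep is an explicit wait. The sketch below assumes the same "text-input-what" field ID used above and an arbitrary 10-second timeout:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
# Block until the search field is actually clickable instead of sleeping a fixed time
job_type_input = wait.until(EC.element_to_be_clickable((By.ID, "text-input-what")))
job_type_input.send_keys(search_text)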
Related
So I have a Selenium script that will automatically enter a series of numbers into a website, and the website will redirect the user to another website based on whether the numbers match a PIN. However, the browser takes a short time to redirect the user, during which the next line of code will already have run and returned an error.
I was thinking something like this would work, but it doesn't, and I don't know why.
def checkElement():
    try:
        xpath = '//*[@id="name"]'
        print("Page is ready!")
    except TimeoutException:
        print("failed")

checkElement()
I believe you are looking for WebDriverWait. You can add a specific condition to it. Please find the sample code below.
first_result = wait.until(presence_of_element_located((By.XPATH, "//*[@id='name']")))
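For completeness, a self-contained version of that sample (a sketch, assuming a 10-second timeout, the same element ID as in the question, and that the driver is passed in) could look like this:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def checkElement(driver, timeout=10):
    try:
        # Wait until the element is present in the DOM, or raise TimeoutException
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, "//*[@id='name']"))
        )
        print("Page is ready!")
    except TimeoutException:
        print("failed")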
So, my goal was to write a script that scrapes users who used a specific hashtag on Instagram and writes their accounts into a .txt file, and it mostly works!
My problem is that even though some accounts posted several pictures, my script shows each name only once. Any idea how I might be able to count them, or how to get my script not to drop the duplicates?
I have looked everywhere but can't find a solution.
This is my part of writing code:
def generate_initial_information_txt(initial_information):
    initial_information_txt = open("initial_information", "w+")
    for user in initial_information:
        initial_information_txt.write(user + "\n")
This is the part to find the name:
for user in range(30):
    el = self.driver.find_element_by_xpath('/html/body/div[4]/div[2]/div/article/header/div[2]/div[1]/div[1]')
    el = el.find_element_by_tag_name('a')
    time.sleep(2)
    profile = el.get_attribute('href')
    open_recent_posts_set.add(profile)
    time.sleep(2)
    next_button = self.driver.find_element_by_xpath('/html/body/div[4]/div[1]/div/div/a[2]')
    next_button.click()
    time.sleep(2)
The URL would be
https://instagram.com/explore/tags/hansaviertel_ms
So I start scraping the "Recent" posts, and, for example, the "Hansaforum" posted 5 of the first 6. If I use a range of 6, the script just produces a .txt file with two accounts, not the "Hansaforum" 5 times. I'd like to get the number of occurrences in some way.
Thanks :)
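Since a set discards repeated entries by design, one option is to collect the profiles in a plain list and tally them with collections.Counter when writing the file. This is only a sketch under that assumption; the list replaces open_recent_posts_set and the counting replaces the original write loop:
from collections import Counter

open_recent_posts = []                      # a list keeps every occurrence, unlike a set
# inside the scraping loop, replace open_recent_posts_set.add(profile) with:
# open_recent_posts.append(profile)

def generate_initial_information_txt(initial_information):
    counts = Counter(initial_information)   # how many times each account appeared
    with open("initial_information", "w+") as f:
        for user, amount in counts.items():
            f.write(user + " " + str(amount) + "\n")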
I am trying to validate that a value changes to the correct text and, if it does not, to refresh the page and check again for up to a set time.
I have tried while loops, if statements, and nested variations of both with no success. I am not even sure how to format it at this point.
element = driver.find_element_by_xpath('xpath')
while True:
    if "textA" in element.text:
        break
    elif "textB" in element.text:
        driver.refresh()
    else:
        raise Exception("unexpected text")
Something along those lines. Ignore any syntax errors, I am just trying to get the idea across
I have also tried using EC and By with no luck
Edit: Adding some details
So what I have is a table. I am inserting a new row with no problems. Then I need to check that one of the column values of the new row gets updated from 'new' to 'old', which usually takes anywhere from 30 seconds to 2 minutes. This is all viewable from a web UI. I need to refresh the page in order to see the value change. I wish I had some more detailed code or an error to post along with it, but honestly I am just beginning to learn Selenium.
Can you please try the following:
from selenium.common.exceptions import NoSuchElementException

while True:
    try:
        driver.find_element_by_xpath('xpath')
    except NoSuchElementException:
        driver.refresh()
    else:
        print("Text found")
        break
Note: I suggest creating a text-based XPath to avoid an extra line of code to get and compare the text.
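For example, a text-based XPath combined with a bounded refresh loop could look like the sketch below; the //td[text()='old'] locator and the two-minute limit are assumptions based on the description in the question:
import time
from selenium.common.exceptions import NoSuchElementException

deadline = time.time() + 120                    # allow up to ~2 minutes for the value to change
while time.time() < deadline:
    try:
        # Text-based XPath: matches the cell only once its text reads 'old'
        driver.find_element_by_xpath("//td[text()='old']")
        print("Text found")
        break
    except NoSuchElementException:
        driver.refresh()
        time.sleep(5)                           # short pause between refreshes
else:
    print("Value never changed to 'old'")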
I have the following code:
if united_states_hidden is not None:
    print("Country removed successfully")
    time.sleep(10)
    print("type(united_states_hidden) = ")
    print(type(united_states_hidden))
    print("united_states_hidden.text = " + united_states_hidden.text)
    print("united_states_hidden.id = " + united_states_hidden.id)
    print(united_states_hidden.is_displayed())
    print(united_states_hidden.is_enabled())
    united_states_hidden.click()
The outputs to the console are as follows:
Country removed successfully
type(united_states_hidden) =
<class 'selenium.webdriver.remote.webelement.WebElement'>
united_states_hidden.text = United States
united_states_hidden.id = ccea7858-6a0b-4aa8-afd5-72f75636fa44
True
True
As far as I am aware this should work, as it is a clickable web element; however, no click is delivered to the element. Any help would be appreciated, as I can't seem to find anything anywhere else. The element I am attempting to click is within a selector box.
It seems like a valid WebElement, given that you can print all of its info like you did in your example.
It's possible the element located is not the element that is meant to be clicked, so perhaps the click is succeeding but not really clicking anything.
You could try using a JavaScript click and see if that helps:
driver.execute_script("arguments[0].click();", united_states_hidden)
If this does not work for you, we may need to see the HTML on the page and the locator strategy you are using to find united_states_hidden so that we can proceed.
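Another thing worth trying, as a sketch rather than a definitive fix, is to wait until Selenium reports the element as clickable before clicking; the XPath below is a placeholder for whatever locator strategy you used to find united_states_hidden:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)
# Replace the placeholder XPath with your actual locator for united_states_hidden
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//replace-with-your-locator")))
element.click()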
Hello all. I am trying to use Selenium and PhantomJS to do headless browsing; the goal is to log in to a forum.
What I did was record the login steps in Firefox and then edit them to fit PhantomJS, as below:
driver = webdriver.PhantomJS()
base_url = "http://6atxfootball.vbulletin.net/"
verificationErrors = []
accept_next_alert = True
driver.get(base_url)
driver.find_element_by_id("lnkLoginSignupMenu").click()
driver.find_element_by_id("idLoginUserName").clear()
driver.find_element_by_id("idLoginUserName").send_keys("USERNAME_HERE")
driver.find_element_by_id("idLoginPassword").clear()
driver.find_element_by_id("idLoginPassword").send_keys("PASSWORD_HERE ")
driver.find_element_by_id("idLoginBtn").click()
It failed, and the problem lies in this line; the error is "NoSuchElementException" etc.:
driver.find_element_by_id("idLoginUserName").clear()
Does this mean there is no such element when the page is opened by PhantomJS? Or, in a nutshell, is this not a way to do headless browsing?
Thanks.
P.S. I also tried to save the content retrieved by PhantomJS to a file to see what is happening:
# lxml is used below to clean and parse the page source
import lxml.html as LH
from lxml.html import clean

driver = webdriver.PhantomJS()
base_url = "http://6atxfootball.vbulletin.net/"
verificationErrors = []
accept_next_alert = True
driver.get(base_url)
content=driver.page_source
cleaner=clean.Cleaner()
content=cleaner.clean_html(content)
with open('6atxfootball.html','w') as f:
    f.write(content.encode('utf-8'))
doc=LH.fromstring(content)
The saved "6atxfootball.html" shows there isn't any form to fill.
I guess it is because the form is actually inside an iframe, so it would not be surprising that PhantomJS has some difficulty finding your element. You should try to log in directly at the URL of the iframe, that is http://6atxfootball.vbulletin.net/auth/login-form
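For example (a sketch, assuming the stand-alone login form at that URL keeps the same element IDs as the main page), the original steps could be run against the iframe's URL directly:
driver = webdriver.PhantomJS()
# Load the login form that normally lives inside the iframe
driver.get("http://6atxfootball.vbulletin.net/auth/login-form")
driver.find_element_by_id("idLoginUserName").clear()
driver.find_element_by_id("idLoginUserName").send_keys("USERNAME_HERE")
driver.find_element_by_id("idLoginPassword").clear()
driver.find_element_by_id("idLoginPassword").send_keys("PASSWORD_HERE")
driver.find_element_by_id("idLoginBtn").click()
Alternatively, you could stay on the main page and switch into the iframe with driver.switch_to.frame(...) before locating the login fields.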
The NoSuchElementException means that the element wasn't found on the page. It could mean one of two things:
There is really no such element on the page
The page is still loading and you check too early
Many times the main page will load, but parts of it will take a longer time to load.
The way to avoid that is to set proper timeouts. Again, you have two options:
Explicit wait - where you wait for a certain condition to occur before proceeding further in the code.
Implicit wait - tell WebDriver to poll the DOM for a certain amount of time when trying to find an element or elements if they are not immediately available.
You can read more about that here.
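To make the two options concrete, here is a short sketch of each, reusing the idLoginUserName field from the question and an arbitrary 10-second timeout:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Option 1: explicit wait - block until this specific element is present
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "idLoginUserName")))
driver.find_element_by_id("idLoginUserName").clear()

# Option 2: implicit wait - every find_element call polls the DOM for up to 10 seconds
driver.implicitly_wait(10)
driver.find_element_by_id("idLoginUserName").clear()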