Selenium with Python web crawler

I want to screen-scrape a website that has multiple pages. These pages are loaded dynamically without changing the URL, hence I'm using Selenium to screen-scrape it. But I'm getting an exception for this simple program:
import re
from contextlib import closing
from selenium.webdriver import Firefox

url = "http://www.samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/"
with closing(Firefox()) as browser:
    browser.get(url)
    n = 2
    link = browser.find_element_by_link_text(str(n))
    link.click()
    #web_page = browser.page_source
    #print type(web_page)
The error is as follows:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: u'Unable to locate element: {"method":"link text","selector":"2"}' ; Stacktrace: Method FirefoxDriver.prototype.findElementInternal_ threw an error in file:///tmp/tmpMJeeTr/extensions/fxdriver#googlecode.com/components/driver_component.js
Is the problem with the given URL or with the Firefox browser?
It would be a great help if someone helped me.

I think your main issue is that the page itself takes a while to load, and you are immediately trying to access that link (which likely hasn't yet rendered, hence the stack trace). One thing you can try is using an implicit wait with your browser, which tells the browser to wait for a certain period of time for elements to appear before timing out. In your case, you could try the following, which waits for up to 10 seconds while polling the DOM for a particular item (in this case, the link with text "2"):
browser.implicitly_wait(10)
n = 2
link = browser.find_element_by_link_text(str(n))
link.click()
#web_page=browser.page_source
#print type(web_page)
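For context, here is how the implicit wait fits into your original script; note that the implicitly_wait() call applies to every subsequent find_element call on that browser instance. This is a sketch of your code with the wait added:
from contextlib import closing
from selenium.webdriver import Firefox

url = "http://www.samsung.com/in/consumer/mobile-phone/mobile-phone/smartphone/"
with closing(Firefox()) as browser:
    # Poll the DOM for up to 10 seconds whenever locating elements
    browser.implicitly_wait(10)
    browser.get(url)
    n = 2
    link = browser.find_element_by_link_text(str(n))
    link.click()
    web_page = browser.page_source
    print(type(web_page))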

I'm developing a Python module that might cover your (or someone else's) use case:
https://github.com/cmwslw/selenium-crawler
It converts recorded Selenium scripts into crawling functions, thus avoiding writing any of the above code by hand. It works great with pages that load content dynamically. I hope someone finds this useful.

Related

Selenium driver: looking for a solution for locating and clicking a button when I open a Google page

[Screenshot: Google sign-in prompt with a "No thanks" button]
When I open a Google page, a small window appears asking whether I want to sign in. I want to click the "No thanks" button, shown above.
I have tried these methods so far, but I keep getting errors; none of the following works:
#self.driver.find_element(By.CSS_SELECTOR, 'button.M6CB1c')
#button=self.driver.find_elements(By.XPATH, '//button')
#abc=self.driver.find_elements(By.NAME, 'ZUkOIc').click()
#self.driver.find_element(By.TAG_NAME, 'button').click()
The error message for the first line of code:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".M6CB1c.rr4y5c"}
selenium.common.exceptions.NoSuchElementException is raised when the element is not present in the page at that point in time.
TL;DR: What you're looking for is an explicit wait in Selenium. You need to use WebDriverWait with the expected condition element_to_be_clickable.
When we load a page, modern pages tend to run JavaScript that often manipulates the DOM (the HTML page objects). The proper way to handle this is to wait for the page, or for the required element, to finish loading, and only then try to locate it.
The waits section of the Selenium documentation explains this very well with an example.
You should try this:
driver.find_element(By.XPATH, '//button[@id="W0wltc"]')
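Combining that locator with the explicit wait described above, a minimal sketch might look like this (the id "W0wltc" is taken from the line above; Google changes these identifiers frequently, so verify it against the live page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the button to become clickable, then click it
button = WebDriverWait(self.driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//button[@id="W0wltc"]'))
)
button.click()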

Python - Selenium xpath

I'm wondering why the code works sometimes and sometimes not. My IDE gives me this debugging error:
Message: no such element: Unable to locate element:
{"method":"xpath","selector":"/html/body/div[4]/div/div/div[2]"}
(Session info: chrome=90.0.4430.93)
def find_followers(self):
    self.driver.get(URL + ACCOUNT)
    follow = self.driver.find_element_by_xpath('/html/body/div[1]/section/main/div/header/section/ul/li[3]/a')
    follow.click()
    time.sleep(10)
    modal = self.driver.find_element_by_xpath('/html/body/div[6]/div/div/div[2]')
    for i in range(10):
        self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', modal)
        time.sleep(13)
I'm trying to make a script that goes to Instagram and opens up an Instagram account's followers. The script runs fine until this error. I've checked the XPath and it is surely right. The script worked for a few days, but now it doesn't. I'm new to Python and want to learn why this happens and how to solve it.
You need to close the cookie popup and log in to Instagram in order to see the followers list. I'd recommend creating a second function called "initiate_instagram" that does that, as sketched below.
You could also log in manually, because of two-factor authentication.
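A rough sketch of such a function, assuming credentials stored in USERNAME and PASSWORD; every selector below is a hypothetical placeholder, since Instagram changes its markup often, and must be checked against the live page:
def initiate_instagram(self):
    self.driver.get('https://www.instagram.com/')
    time.sleep(5)
    # Dismiss the cookie popup (hypothetical XPath; inspect the real button)
    self.driver.find_element_by_xpath('//button[text()="Accept All"]').click()
    # Fill in the login form (hypothetical field names)
    self.driver.find_element_by_name('username').send_keys(USERNAME)
    self.driver.find_element_by_name('password').send_keys(PASSWORD)
    self.driver.find_element_by_xpath('//button[@type="submit"]').click()
    time.sleep(30)  # leave time to complete two-factor authentication manually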

Can't find element with Selenium even though I know the element exists

Okay, so I'm trying to find
<span class="totalcount">171</span>
on https://boston.craigslist.org/search/sss?query=food+stamps&sort=rel&search_distance=200&postal=01841
with
pagelement = driver.find_element_by_class_name('totalcount')
but for some reason I keep getting the following error
selenium.common.exceptions.NoSuchElementException: Message: Unable to find element with css selector == .totalcount
For reference, I'm using Internet Explorer 11 with Selenium because my boss requested I switch over to that from Firefox. Could that be what is causing the problem? (Before someone asks, I know it isn't because the page hasn't loaded yet; I added a wait specifically to deal with that.)
It did work for me, but I'm using find_elements, as there is more than one element matching that locator, and I'm also using Chrome.
Not sure if it will help you, but you can use the code below for a better approach.
Update: tried with IE and it's working:
from selenium.webdriver import Ie

driver = Ie('path to IE driver')
driver.get('https://boston.craigslist.org/search/sss?query=food+stamps&sort=rel&search_distance=200&postal=01841')
total_count = [x.text for x in driver.find_elements_by_class_name('totalcount')]
print(total_count)
# ['170', '170']

Why does trying to click with selenium brings up "ElementNotInteractableException"?

I'm trying to click through to the page "https://2018.navalny.com/hq/arkhangelsk/" from the website's main page. However, I get this error:
selenium.common.exceptions.ElementNotInteractableException: Message:
There's nothing after "Message:"
My code
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
browser = webdriver.Firefox()
browser.get('https://2018.navalny.com/')
time.sleep(5)
linkElem = browser.find_element_by_xpath("//a[contains(@href,'arkhangelsk')]")
type(linkElem)
linkElem.click()
I think XPath is necessary for me because, ultimately, my goal is to click not just a single link but 80 links on this webpage. I've already managed to print all the relevant links using this:
driver.find_elements_by_xpath("//a[contains(@href,'hq')]")
However, for starters, I'm trying to make it click at least a single link.
Thanks for your help,
The best way to figure out issues like this is to look at the page source using the developer tools of your preferred browser. For instance, when I go to this page, open the HTML tab in Firebug, and search for //a[contains(@href,'arkhangelsk')], I can see that the link is located within a div that is currently not visible (in fact, the entire sub-section starting from the div with id="hqList" is hidden). Selenium will not allow you to click on invisible elements, although it will allow you to inspect them. Hence getting the element works, but clicking on it does not.
What you do with it depends on what your expectations are. In this particular case, it looks like you need to click on <label class="branches-map__toggle-label" for="branchesToggle">Список</label> ("List" in Russian) to make that link visible. So add this:
browser.find_element_by_link_text("Список").click()
After that, you can click on any of the links in the list.
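Since the ultimate goal is to visit all 80 links, note that clicking one link navigates away and makes the remaining elements stale. One way around that, sketched below, is to collect the href values first and then load each URL directly (reusing the XPath from the question):
browser.find_element_by_link_text("Список").click()
links = browser.find_elements_by_xpath("//a[contains(@href,'hq')]")
urls = [link.get_attribute("href") for link in links]
for url in urls:
    browser.get(url)
    # ... scrape each regional page here ...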

Python - Automating form entry on a .aspx website and storing output in a file (using Selenium?)

I've just started to learn coding this month and started with Python. I would like to automate a simple task (my first project) - visit a company's career website, retrieve all the jobs posted for the day and store them in a file. So this is what I would like to do, in sequence:
Go to http://www.nov.com/careers/jobsearch.aspx
Select the option - 25 Jobs per page
Select the date option - Today
Click on Search for Jobs
Store results in a file (just the job titles)
I looked around and found that Selenium is the best way to go about handling .aspx pages.
I have done steps 1-4 using Selenium. However, there are two issues:
I do not want the browser opening up. I just need the output saved to a file.
Even if I am OK with the browser popping up, running the Python code (exported from Selenium as WebDriver) in IDLE (I have Windows) results in errors. When I run the Python code, the browser opens and the link loads, but none of the form selections happen, and I get the following error message (link below) before the browser closes. So what does the error message mean?
http://i.stack.imgur.com/lmcDz.png
Any help/guidance will be appreciated...Thanks!
First, about the error you got: the exception name NoSuchElementException and the message Unable to locate element mean that the selector you provided is wrong and the web driver can't find the element.
Well, since you did not post your code, and I can't open the link to the website you mentioned, I can only give you sample code; I will include as much detail as I can.
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("url")  # replace "url" with the actual page address

# The IDs below are placeholders; look up the real ones in the page source
number_option = driver.find_element_by_id("id_for_25_option_indicator")
number_option.click()
date_option = driver.find_element_by_id("id_for_today_option_indicator")
date_option.click()
search_button = driver.find_element_by_id("id_for_search_button")
search_button.click()

all_results = driver.find_elements_by_xpath("some_xpath_that_is_common_between_all_job_results")
result_file = open("result_file.txt", "w")
for result in all_results:
    result_file.write(result.text + "\n")
driver.close()
result_file.close()
Since you said you just started learning to code, a few explanations:
I recommend using driver.find_element_by_id in all cases where the element has an id attribute; it's the most robust locator.
Instead of result.text, you can use result.get_attribute("value") or result.get_attribute("innerHTML").
That's all that comes to mind for now, but it would be better if you posted your code so we can see what is wrong with it. Additionally, it would be great if you gave me a working link to the website, so I can add more detail to the code; your current link is broken.
Concerning the first issue, you can simply use a headless browser. This is possible with Chrome as well as Firefox.
Check Grey Li's answer here for example: Python - Firefox Headless
from selenium import webdriver

options = webdriver.FirefoxOptions()
options.add_argument('--headless')  # Firefox needs the leading dashes on this flag
driver = webdriver.Firefox(options=options)
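The Chrome equivalent is nearly identical (assuming a recent Selenium version in which webdriver.Chrome accepts an options keyword):
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)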
