I am working on an Amazon web scraping script in Python 3 using Selenium, but I get this error:
webdriver.chrome()
TypeError: 'module' object is not callable
I saw suggestions to change chrome to Chrome, but then I get this error instead:
FileNotFoundError: [WinError 2] The system cannot find the file specified
This is my code:
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.chrome()
driver.get('https://www.amazon.com/international-sales-offers/b/?ie=UTF8&node=15529609011&ref_=nav_navm_intl_deal_btn')
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()
soup = BeautifulSoup(res , 'lxml')
box= soup.find('div',{'class':'a-row padCenterContainer widgetBorder'})
products=box.find_all('div',{'class':'a-section a-spacing-none tallCellView gridColumn5 singleCell'})
for details in products:
    name= details.find('span',{'class':'a-declarative'}).text
    link= details.find('a',{'class':'a-size-base a-link-normal dealTitleTwoLine singleCellTitle autoHeight'}).get('href')
    print(name,link)
I believe it should be Chrome(), not chrome(). Try:
from selenium import webdriver
driver = webdriver.Chrome()
You can also pass the path to your ChromeDriver explicitly by setting executable_path to the location of the chromedriver executable (chromedriver.exe on Windows, plain chromedriver on other platforms):
driver = webdriver.Chrome(executable_path='C:/path/to/chromedriver.exe')
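The FileNotFoundError: [WinError 2] from the question usually means that no chromedriver executable could be found to launch. Below is a minimal sketch for checking this before creating the driver, using only the standard library. The function name find_chromedriver and the candidate path are hypothetical and not part of Selenium.

```python
import os
import shutil


def find_chromedriver(extra_candidates=()):
    """Return a path to a chromedriver executable, or None if none is found.

    Hypothetical helper: checks explicit candidate paths first, then PATH.
    """
    for candidate in extra_candidates:
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    # shutil.which searches PATH; on Windows it also tries the extensions
    # listed in PATHEXT, such as .exe.
    return shutil.which("chromedriver")


if __name__ == "__main__":
    # The candidate below is the placeholder path from the answer above.
    path = find_chromedriver(["C:/path/to/chromedriver.exe"])
    if path is None:
        print("chromedriver not found; download it or add its folder to PATH")
    else:
        print("chromedriver found at", path)
```

If the helper returns a path, that value could be passed on as the driver location. If it returns None, the driver is probably missing or not on PATH.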
I'm currently using Selenium and BeautifulSoup to scrape a website, but I'm running into two major issues. First, I can't get Chrome to launch in headless mode, and it reports multiple "unexpected end of input" errors. Second, I keep getting an error on the line that contains "html.parser" saying that a 'str' object is not callable. Any advice on these issues would be greatly appreciated. Thank you.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import urllib.request
import lxml
import html5lib
import time
from bs4 import BeautifulSoup
#config options
options = Options()
options.headless = True
# Set the URL you want to webscrape from
url = 'https://tokcount.com/?user=mrsam993'
# Connect to the URL
browser = webdriver.Chrome(options=options, executable_path='D:\chromedriver') #chrome_options=options
browser.get(url)
# Parse HTML and save to BeautifulSoup object
soup = BeautifulSoup(browser.page_source(), "html.parser")
browser.quit()
# for i in range(10):
links = soup.findAll('span', class_= 'odometer-value')
print(links)
For headless mode, set up the options this way:
from selenium import webdriver
options = webdriver.ChromeOptions()
...
page_source is a property, not a method, so remove the parentheses:
browser.page_source
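The 'str' object is not callable error appears because page_source already evaluates to a string, and the extra parentheses then try to call that string. Here is a small sketch that reproduces the behaviour without a browser. FakeBrowser is a hypothetical stand-in for the real driver, used only to show the difference between a property and a method.

```python
class FakeBrowser:
    """Hypothetical stand-in for a WebDriver; not part of Selenium."""

    @property
    def page_source(self):
        # A property is read without parentheses and returns a plain str.
        return "<html><body>hello</body></html>"


browser = FakeBrowser()

html = browser.page_source  # correct: plain attribute access
print(type(html).__name__)

try:
    browser.page_source()  # wrong: this tries to call the returned str
except TypeError as exc:
    print("TypeError:", exc)
```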
I'm trying to scrape a client-side-rendered web page using Selenium.
I started by creating a virtual environment and installing the required dependencies. Then I downloaded the ChromeDriver for my Chrome version and placed it in the project's folder.
import os
import time
from bs4 import BeautifulSoup
from selenium import webdriver
driver_path = os.path.abspath('') + '/chromedriver'
driver = webdriver.Chrome(executable_path = driver_path)
print(' > Getting web page...')
url = 'https://www.someurl.com'
driver.get(url)
print(' > Sleeping...')
time.sleep(10)
print(' > Done. Html below:')
page_html = driver.page_source
print(page_source)
The browser opens and the page loads, but after the program wakes up I get NameError: name 'page_source' is not defined. Any clues about what I might be doing wrong?
One thing that got me concerned is that I'm using 64-bit Windows, but the only driver available on Chrome's webpage was 32-bit. Anyways, it seems that this isn't a problem since the browser and the page are rendered correctly by the script.
There is a typo in the print call. Use
print(page_html)
instead of
print(page_source)
The name page_source is never defined in your code; the HTML is stored in page_html.
I'm trying to get started with web scraping, but whenever I try to access a URL I get an error message.
My code is the following:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
driver.get('www.python.org')
This opens a new Chrome window, but that's all it does.
The error message I get is the following:
InvalidArgumentException: invalid argument
(Session info: chrome=80.0.3987.149)
I work with Spyder, which I got from Anaconda, and my chromedriver.exe is in both the Anaconda3 folder and the Spyder folder.
Thanks in advance!
This URL is not valid; it has to start with a scheme such as http://
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = None  # defined up front so the finally block can test it safely
try:
    driver = webdriver.Chrome()
    driver.get('http://www.python.org')
except Exception as e:
    print(e)
finally:
    if driver is not None:
        driver.quit()  # shut down the browser and the driver process
Please include the executable path and try the solution below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path=r"C:\New folder\chromedriver.exe")
driver.get("https://www.python.org")
Your URL, www.python.org, is missing the scheme. The correct URL is https://www.python.org
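Since driver.get() rejects a URL without a scheme, it can help to normalise URLs before passing them on. This is a minimal sketch using only the standard library. The helper name ensure_scheme and its default of https are assumptions for illustration, not part of Selenium.

```python
from urllib.parse import urlparse


def ensure_scheme(url, default_scheme="https"):
    """Return url unchanged if it has an http(s) scheme, else prepend one.

    Hypothetical helper for illustration; not part of Selenium.
    """
    if urlparse(url).scheme in ("http", "https"):
        return url
    return f"{default_scheme}://{url}"


print(ensure_scheme("www.python.org"))
print(ensure_scheme("http://www.python.org"))
```

The result of ensure_scheme(...) would then be what gets passed to driver.get().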
I have created a simple automation script in Python using Selenium, but I am getting an unexpected exception.
File:
import pandas as pd
from pandas import ExcelWriter
from selenium import webdriver
import selenium as sel
# Data = pd.read_excel(r"C:\Users\Admin\PycharmProjects\Web_Automation_Form_Filling\challenge.xlsx",sheet_name="Sheet1")
# browser = webdriver.Chrome(executable_path=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe')
browser = webdriver.Chrome("C:\Program Files (x86)\Google\Chrome\Application\chrome.exe");
browser.sleep(1000);
browser.get("http://www.python.org")
Error log:
C:\Users\Admin\PycharmProjects\Web_Automation_Form_Filling\venv\Scripts\python.exe C:/Users/Admin/PycharmProjects/Web_Automation_Form_Filling/venv/Web_Auto_Filling.py
Traceback (most recent call last):
  File "C:/Users/Admin/PycharmProjects/Web_Automation_Form_Filling/venv/Web_Auto_Filling.py", line 10, in <module>
    browser = webdriver.Chrome("C:\Program Files (x86)\Google\Chrome\Application\chrome.exe");
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 73, in __init__
    self.service.start()
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 98, in start
    self.assert_process_still_running()
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\common\service.py", line 109, in assert_process_still_running
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service C:\Program Files (x86)\Google\Chrome\Application\chrome.exe unexpectedly exited. Status code was: 0
Process finished with exit code 1
Any suggestion will be appreciated..
Thanks...
Instead of the Chrome application, provide the path to ChromeDriver.
More information is available here: https://sites.google.com/a/chromium.org/chromedriver/getting-started
Sample code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome('/path/to/chromedriver')  # Optional argument, if not specified will search path.
driver.get('http://www.google.com/')
time.sleep(5)  # Let the user actually see something!
# By-based lookup; the find_element_by_* helpers are gone in recent Selenium releases.
search_box = driver.find_element(By.NAME, 'q')
search_box.send_keys('ChromeDriver')
search_box.submit()
time.sleep(5)  # Let the user actually see something!
driver.quit()
Download the ChromeDriver binary for your platform from the downloads section of this site.
Reference link for the download: chrome driver
This code should work (it is better to use Firefox with Selenium):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
# noinspection PyUnresolvedReferences
import wget
DesiredCapabilities.PHANTOMJS[
    "phantomjs.page.settings.userAgent"
] = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:16.0) Gecko/20121026 Firefox/16.0"
# browser and CFG_phantomjs are not defined in this snippet; set them before running.
if browser == "firefox":
    driver = webdriver.Firefox()
else:
    driver = webdriver.PhantomJS(
        CFG_phantomjs
    )  # r"D:/_devs/webserver/phantomjs-1.9.8/phantomjs.exe"
driver.get("https://tourwebsite")
username = driver.find_element_by_id("login_field")
password = driver.find_element_by_id("password")
username.clear()
The problem in your code is that you are passing the path to the Chrome executable rather than the path to chromedriver, which is a different executable.
An appropriate version of chromedriver can be downloaded from here, according to your Chrome version.
For more info, you can refer to the chromedriver documentation here.
Your final code should look something like this:
from selenium import webdriver
path = 'C:/Users/Avinash/Downloads/chromedriver.exe'
driver = webdriver.Chrome(path)
driver.get('http://www.google.com/')
# ... do whatever you want with the page here
driver.quit()
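The mix-up between the browser executable and the driver executable can be caught before Selenium starts. Here is a rough sketch, assuming the driver keeps its default file name. check_driver_path is a hypothetical helper, not a Selenium function.

```python
import os


def check_driver_path(path):
    """Return a short diagnosis for a path meant to point at chromedriver.

    Hypothetical helper; it assumes the driver keeps its default file name.
    """
    # Normalise backslashes so Windows-style paths are also handled on POSIX.
    name = os.path.basename(path.replace("\\", "/")).lower()
    stem = os.path.splitext(name)[0]
    if stem in ("chrome", "google-chrome", "chromium"):
        return "browser binary, not chromedriver"
    if stem != "chromedriver":
        return "unexpected file name"
    if not os.path.isfile(path):
        return "chromedriver path does not exist"
    return "ok"


# The path from the question points at the browser itself.
print(check_driver_path(r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"))
```

A check like this is only a heuristic, since a renamed driver would be reported as an unexpected file name. It does flag the specific mistake from the question, where chrome.exe was passed in place of chromedriver.exe.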
I am trying to use Selenium WebDriver on CentOS to test my webpage, but I get an error message when I run the script.
Can someone help me?
from pyvirtualdisplay import Display
from selenium import webdriver
display=Display(visible=0, size=(320, 240)).start()
path = "/usr/bin/firefox"
driver= webdriver.Firefox(path)
driver.get("www.google.com")
html_source = driver.page_source
print html_source
driver.close()
And here is the error message:
  File "/var/www/test/test.py", line 19, in <module>
    driver= webdriver.Firefox(path)
  File "/usr/local/lib/python2.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 54, in __init__
    self.NATIVE_EVENTS_ALLOWED and self.profile.native_events_enabled)
AttributeError: 'str' object has no attribute 'native_events_enabled'
I'm pretty sure your problem is that you're passing the path to your Firefox binary as a string instead of as a FirefoxBinary object. Furthermore, the first positional argument to Firefox() is a FirefoxProfile, so the binary has to be passed with the firefox_binary keyword. Doing the following should resolve the issue:
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
display=Display(visible=0, size=(320, 240)).start()
binary = FirefoxBinary("/usr/bin/firefox")
driver= webdriver.Firefox(firefox_binary=binary)
driver.get("http://www.google.com")  # the URL needs a scheme
html_source = driver.page_source
print(html_source)  # the parenthesized form works on Python 2 and 3
driver.close()
See this post for an answer to a very similar problem.