I'm using this code to scrape some data from https://website.grader.com/results/www.dubizzle.com. Because the part of the page with the tags I want to extract only loads after about 15 seconds, someone recommended Selenium so that I could introduce a delay in the code.
The code is as below:
#!/usr/bin/python
import urllib
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
from dateutil.parser import parse
from datetime import timedelta
import MySQLdb
import re
import pdb
import sys
import string
driver = webdriver.Firefox()
driver.get('https://website.grader.com/results/dubizzle.com')
time.sleep(25)
html = driver.page_source
soup = BeautifulSoup(html)
# print soup
Sizeofweb = ""
try:
    # .text already returns a string, so encode it directly
    Sizeofweb = soup.find('span', {'data-reactid': ".0.0.3.0.0.3.$0.1.1.0"}).text
    print Sizeofweb.encode("utf-8")
except StandardError as e:
    converted_date = "Error was {0}".format(e)
    print converted_date
The part of the HTML which I am extracting is as below:
Snap: https://www.dropbox.com/s/7dwbaiyizwa36m6/5.PNG?dl=0
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
I installed geckodriver by downloading it from here, extracting it to the /home directory, and then adding it to the path with export PATH=$PATH:/home/geckodriver, as recommended by someone named #Ahn Smith here.
Now when I run the program, it gives this error:
Traceback (most recent call last):
File "ahmed.py", line 17, in <module>
driver = webdriver.Firefox()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.py", line 140, in __init__
self.service.start()
File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 74, in start
stdout=self.log_file, stderr=self.log_file)
File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 20] Not a directory
There are two ways to point Selenium to the appropriate webdriver. You can pass it as a parameter:
driver = webdriver.Firefox(executable_path='/path/to/geckodriver')
Or you can add the folder that contains geckodriver to the PATH variable in your shell:
$ export PATH=$PATH:/path/to/
I think your problem is that you're exporting a PATH variable to the geckodriver and not to the folder containing it.
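To check this on your own machine, here is a minimal sketch that lists PATH entries which exist but are files rather than folders. An entry such as /home/geckodriver pointing at the binary itself would show up here, which is consistent with OSError: [Errno 20] Not a directory. The helper name non_directory_path_entries is made up for illustration.

```python
import os
import shutil


def non_directory_path_entries(path_value):
    """Return PATH entries that exist but are not directories."""
    entries = [e for e in path_value.split(os.pathsep) if e]
    return [e for e in entries if os.path.exists(e) and not os.path.isdir(e)]


# Inspect the PATH of the current process.
bad = non_directory_path_entries(os.environ.get("PATH", ""))
print("PATH entries that are files, not folders:", bad)

# shutil.which searches PATH the same way a shell would; None means not found.
print("geckodriver resolved to:", shutil.which("geckodriver"))
```

If the first list is not empty, replace that entry with its parent folder and open a new shell before running the script again.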
I'm doing a school project and I am trying to scrape data from websites. Basically I'm following a tutorial in edureka - https://www.edureka.co/blog/web-scraping-with-python/#demo
The sample code is like this
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
driver.get("""https://www.flipkart.com/laptops/~buyback-guarantee-on-laptops-/pr?sid=6bo%2Cb5g&uniq""")
content = driver.page_source
soup = BeautifulSoup(content)
for a in soup.findAll('a',href=True, attrs={'class':'_31qSD5'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34 _2beYZw'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings})
df.to_csv('products.csv', index=False, encoding='utf-8')
I simply copied and pasted the sample code into Python to see how it works, and this is what I got:
PS D:\COSC2625_Team_Blue> & C:/Users/meowg/AppData/Local/Programs/Python/Python310/python.exe d:/COSC2625_Team_Blue/test.py
d:\COSC2625_Team_Blue\test.py:5: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
Traceback (most recent call last):
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\common\service.py", line 71, in start
self.process = subprocess.Popen(cmd, env=self.env,
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 969, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1438, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\COSC2625_Team_Blue\test.py", line 5, in <module>
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\chrome\webdriver.py", line 69, in __init__
super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\chromium\webdriver.py", line 89, in __init__
self.service.start()
File "C:\Users\meowg\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
Does anyone know what went wrong? I have no idea what happened.
It looks like you didn't download the chromedriver file that the tutorial expects at /usr/lib/chromium-browser/chromedriver. That is a Linux path, and your traceback shows you are running on Windows, so the file cannot be found there. You need to download chromedriver and pass its actual location on your machine.
I would also recommend trying Python Playwright instead of Selenium: it is a more modern library with, in my opinion, a slightly gentler learning curve. That's just a recommendation, though.
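If you stay with Selenium, the DeprecationWarning in your output asks for a Service object instead of a bare path. A rough sketch of that, assuming Selenium 4 and a chromedriver you have downloaded yourself; the helper names and the example location in the comment are hypothetical:

```python
import os
import shutil


def locate_chromedriver(explicit_path=None):
    """Return a usable chromedriver path, or None if none is found."""
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Fall back to searching the PATH environment variable.
    return shutil.which("chromedriver")


def make_chrome(driver_path):
    """Start Chrome through a Service object (Selenium 4 style)."""
    # Imported here so locate_chromedriver works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))


# Example (hypothetical location, adjust to where you saved the file):
#   path = locate_chromedriver(r"C:\tools\chromedriver.exe")
#   driver = make_chrome(path) if path else None
```

Checking the path first gives a plain None instead of the long WebDriverException traceback when the file is missing.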
I'm trying to use selenium for a python web scraper but when I try to run the program I get the following error:
/usr/local/bin/python3 /Users/xxx/Documents/Python/hello.py
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 72, in start
self.process = subprocess.Popen(cmd, env=self.env,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/subprocess.py", line 1702, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/xxx/Documents/Python/chromedriver.exe'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xxx/Documents/Python/hello.py", line 9, in <module>
wd = webdriver.Chrome(executable_path=DRIVER_PATH)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 81, in start
raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: 'chromedriver.exe' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
Here is the python code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
from selenium import webdriver
DRIVER_PATH = '/Users/xxx/Documents/Python/chromedriver.exe'
wd = webdriver.Chrome(executable_path=DRIVER_PATH)
I think the problem is that I'm not specifying the file path in the variable DRIVER_PATH properly but I'm not sure
I am using a Mac
If you are on Windows, you need to update DRIVER_PATH to include your drive root, which is usually C:\:
DRIVER_PATH = 'C:/Users/xxx/Documents/Python/chromedriver.exe'
Alternatively, you can follow this tutorial to add the folder containing chromedriver.exe (usually the chromedriver_win32 folder) to your Path environment variable:
https://docs.telerik.com/teststudio/features/test-runners/add-path-environment-variables
I would try this out (Just adding the 'r'):
wd = webdriver.Chrome(executable_path=r'/Users/xxx/Documents/Python/chromedriver.exe')
If you think it's the file path, check whether the file exists:
import os.path
os.path.exists(DRIVER_PATH)
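Also, BeautifulSoup works well with urllib (urllib2 in Python 2, urllib.request in Python 3, which you are already importing):
https://www.pythonforbeginners.com/beautifulsoup/beautifulsoup-4-python
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = "https://www.URL.com"
content = urlopen(url).read()
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(content, "html.parser")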
You have a mistake in the name of the file.
"chromedriver.exe" is for Windows.
If you use macOS and chromedriver for Mac, then the file name should be "chromedriver" without ".exe".
I had the same problem, but this solved it.
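A small sketch of the same idea in code, picking the file name from the operating system and checking that the file is really there. The function names are made up for illustration, and the folder is the one from the question:

```python
import os
import platform


def expected_driver_name(system=None):
    """Return the chromedriver file name for the given OS name."""
    system = system or platform.system()
    return "chromedriver.exe" if system == "Windows" else "chromedriver"


def driver_path_in(folder, system=None):
    """Join the folder with the OS-appropriate driver file name."""
    return os.path.join(folder, expected_driver_name(system))


# On a Mac, platform.system() reports "Darwin".
candidate = driver_path_in("/Users/xxx/Documents/Python", system="Darwin")
print(candidate, "exists:", os.path.isfile(candidate))
```

Note that renaming a Windows download is not enough; the file has to be the macOS build of chromedriver.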
I want to use Selenium to scrape a website. I can't access the website via my own internet connection, so I need to use the Browsec Firefox add-on for that.
I am unable to launch firefox with selenium with the add-on enabled.
Here is what I have tried:
import time
import selenium
from selenium import webdriver
url = "http://url"
profile = webdriver.FirefoxProfile()
profile.add_extension('browsec#browsec.com.xpi')
#profile.add_extension("C:\Users\urs\AppData\Roaming\Mozilla\Firefox\Profiles\abc.default\extensions\browsec#browsec.com.xpi")
driver = webdriver.Firefox(firefox_profile=profile)
if __name__ == "__main__":
    driver.get(url)
    # WebDriver has no wait() method; pause with time.sleep instead
    time.sleep(5)
    driver.quit()
I have tried putting the extension in the same directory where my script is and using the following
profile.add_extension('browsec#browsec.com.xpi')
which gives me this error when I run:
Traceback (most recent call last):
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 346, in _addon_details
with open(os.path.join(addon_path, 'install.rdf'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Usr\AppData\Local\Temp\tmp0hny31u3.browsec#browsec.com.xpi\install.rdf'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 7, in <module>
profile.add_extension("browsec#browsec.com.xpi")
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 95, in add_extension
self._install_extension(extension)
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 274, in _install_extension
addon_details = self._addon_details(addon)
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 351, in _addon_details
raise AddonFormatError(str(e), sys.exc_info()[2])
selenium.webdriver.firefox.firefox_profile.AddonFormatError: ("[Errno 2] No such file or directory: 'C:\\Users\\Usr\\AppData\\Local\\Temp\\tmp0hny31u3.browsec#browsec.com.xpi\\install.rdf'", )
I also tried giving the path to the extension:
profile.add_extension("C:\Users\urs\AppData\Roaming\Mozilla\Firefox\Profiles\abc.default\extensions\browsec#browsec.com.xpi")
And I ran into this error:
profile.add_extension("C:\Users\Hassan\AppData\Roaming\Mozilla\Firefox\Profiles\n5jwlj9l.default\extensions\browsec#browsec.com.xpi")
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Formatting the path string like below doesn't help either.
profile.add_extension(r"C:\Users\urs\AppData\Roaming\Mozilla\Firefox\Profiles\abc.default\extensions\browsec#browsec.com.xpi")
I get the following:
Traceback (most recent call last):
File "test.py", line 7, in <module>
profile.add_extension(r"C:\Users\Hassan\AppData\Roaming\Mozilla\Firefox\Profiles\n5jwlj9l.default\extensions\browsec#browsec.com.xpi")
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 95, in add_extension
self._install_extension(extension)
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 274, in _install_extension
addon_details = self._addon_details(addon)
File "C:\Python36\lib\site-packages\selenium\webdriver\firefox\firefox_profile.py", line 351, in _addon_details
raise AddonFormatError(str(e), sys.exc_info()[2])
selenium.webdriver.firefox.firefox_profile.AddonFormatError: ("[Errno 2] No such file or directory: 'C:\\Users\\usr\\AppData\\Local\\Temp\\tmp1he0fym_.browsec#browsec.com.xpi\\install.rdf'", )
How do I configure selenium to run firefox with browsec enabled by default?
I found this article rather helpful.
Instead of adding the extension to the profile, you install it after the browser has been created:
from selenium import webdriver
driver = webdriver.Firefox()
# This installs adblock plus
driver.install_addon("/home/your_username/coding/Project/seleniumTest/adblock.xpi", temporary=True)
driver.get('https://www.stackoverflow.com')
Be sure to add the .xpi to your project folder!
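Since a wrong or relative path is an easy mistake here, this is a hedged sketch that resolves the .xpi to an absolute path and fails early if it is missing. resolve_addon and start_firefox_with_addon are illustrative names, not part of Selenium; only install_addon is the Selenium call shown above.

```python
import os


def resolve_addon(xpi_name, base_dir=None):
    """Return an absolute path to the .xpi, or raise if it does not exist."""
    path = os.path.abspath(os.path.join(base_dir or os.getcwd(), xpi_name))
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path


def start_firefox_with_addon(xpi_path):
    """Start Firefox, then install the add-on into the running browser."""
    # Imported here so resolve_addon can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.install_addon(xpi_path, temporary=True)
    return driver


# Example, assuming the .xpi sits in the current working directory:
#   driver = start_firefox_with_addon(resolve_addon("adblock.xpi"))
```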
You can try creating a dedicated profile in Firefox. On Windows, open Run and type
"firefox.exe -P"
This opens the profile manager. Create a new profile, start Firefox with that profile, and add the plugins. Then use that same profile from your code. This has sometimes worked for me.
Sorry for my English))
Most likely you are using a new version of Firefox (Quantum, version 57 and later). In newer versions of Firefox, the extension metadata is stored in manifest.json rather than in install.rdf. Selenium does not know this yet (in version 3.11; it only learns it in 3.14), so when you try to load an extension it still looks for install.rdf out of habit.
Here the author wrote a class that slightly changes how the extension is loaded, so that Selenium falls back to the metadata in manifest.json when install.rdf is missing.
What you need to do:
# Add Import
import json
import os
import sys
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import AddonFormatError
# Add class
class FirefoxProfileWithWebExtensionSupport(webdriver.FirefoxProfile):
    def _addon_details(self, addon_path):
        try:
            # Old-style add-ons: let Selenium read install.rdf
            return super()._addon_details(addon_path)
        except AddonFormatError:
            try:
                # WebExtensions: read the metadata from manifest.json instead
                with open(os.path.join(addon_path, 'manifest.json'), 'r') as f:
                    manifest = json.load(f)
                    return {
                        'id': manifest['applications']['gecko']['id'],
                        'version': manifest['version'],
                        'name': manifest['name'],
                        'unpack': False,
                    }
            except (IOError, KeyError) as e:
                raise AddonFormatError(str(e), sys.exc_info()[2])
# Declare Firefox_profile written class
profile = FirefoxProfileWithWebExtensionSupport()
After that, carry on as usual)))
Good luck)))
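To see which kind of add-on you have before patching anything, here is a small sketch that looks inside the .xpi (it is a zip archive) for manifest.json or install.rdf. The function names are made up for illustration, and the browser_specific_settings lookup is an assumption that your add-on may use that newer manifest key instead of applications.

```python
import json
import zipfile


def addon_metadata_kind(xpi_path):
    """Return 'manifest.json', 'install.rdf', or None for an .xpi archive."""
    with zipfile.ZipFile(xpi_path) as xpi:
        names = set(xpi.namelist())
    for candidate in ("manifest.json", "install.rdf"):
        if candidate in names:
            return candidate
    return None


def gecko_id(xpi_path):
    """Read the add-on id from manifest.json, or return None if absent."""
    with zipfile.ZipFile(xpi_path) as xpi:
        manifest = json.loads(xpi.read("manifest.json").decode("utf-8"))
    for key in ("browser_specific_settings", "applications"):
        gecko = manifest.get(key, {}).get("gecko", {})
        if "id" in gecko:
            return gecko["id"]
    return None
```

If addon_metadata_kind returns 'manifest.json', the add-on is a WebExtension and an install.rdf lookup will fail as in the traceback above.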
I have this code:
#!/usr/bin/env python
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import time
driver = webdriver.Chrome()
The error occurs at the last line, driver = webdriver.Chrome().
It says this:
Traceback (most recent call last):
File "/Users/Edison/Desktop/untitled folder/huamai_jacket1.py", line 9, in <module>
driver = webdriver.Chrome()
File "/Users/Edison/anaconda2/lib/python2.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 61, in __init__
log_path=service_log_path)
File "/Users/Edison/anaconda2/lib/python2.7/site-packages/selenium/webdriver/chrome/service.py", line 42, in __init__
start_error_message="Please see https://sites.google.com/a/chromium.org/chromedriver/home")
File "/Users/Edison/anaconda2/lib/python2.7/site-packages/selenium/webdriver/common/service.py", line 42, in __init__
self.port = utils.free_port()
File "/Users/Edison/anaconda2/lib/python2.7/site-packages/selenium/webdriver/common/utils.py", line 36, in free_port
free_socket.bind(('0.0.0.0', 0))
File "/Users/Edison/anaconda2/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
socket.error: [Errno 49] Can't assign requested address
Exception AttributeError: "'Service' object has no attribute 'log_file'" in <bound method Service.__del__ of <selenium.webdriver.chrome.service.Service object at 0x1049a93d0>> ignored
[Finished in 0.1s with exit code 1]
[shell_cmd: python -u "/Users/Edison/Desktop/untitled folder/huamai_jacket1.py"]
[dir: /Users/Edison/Desktop/untitled folder]
[path: /usr/bin:/bin:/usr/sbin:/sbin]
This started happening this morning. Right after it happens, when I access a website, even one like google.com, the ERR_ADDRESS_INVALID page often shows up and I need to keep refreshing to get back to the regular site.
Is this because webdriver.Chrome() is not given the path to the chromedriver location? I used my script for a week before today, though, and everything worked perfectly.
Please help :(
If you do not pass a path, as in "webdriver.Chrome('path/to/chromedriver')", Selenium will search along the PATH environment variable to find it. So the first thing I would do is verify where my chromedriver is. If it is not in $PATH, you can put it in /usr/bin and try running again, or you can pass the path to your chromedriver executable to .Chrome().
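Separately from the PATH check, the traceback in the question stops at free_socket.bind(('0.0.0.0', 0)), before chromedriver is even launched. A minimal sketch to repeat that one step outside Selenium (Python 3; try_bind is a made-up helper name):

```python
import socket


def try_bind(host):
    """Try to bind an ephemeral TCP port on host; return (ok, detail)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))
        # On success, report the port number the OS handed out.
        return True, sock.getsockname()[1]
    except OSError as exc:
        return False, str(exc)
    finally:
        sock.close()


for host in ("0.0.0.0", "127.0.0.1"):
    print(host, try_bind(host))
```

If this also fails with "Can't assign requested address", the problem is probably in the machine's network configuration rather than in Selenium or chromedriver.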
I am attempting to run a simple program on an Ubuntu 16.04 instance using Python 3.5. The program is below:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.PhantomJS("p/phantomjs")
driver.get("http://www.bbc.co.uk")
s = BeautifulSoup(driver.page_source, "lxml")
print(s.findAll("a"))
try:
    driver.close()
except AttributeError:
    pass
All the modules are installed correctly. However, when I run the program, I receive the following errors:
Traceback (most recent call last):
File "t.py", line 4, in <module>
driver = webdriver.PhantomJS("p/phantomjs")
File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 52, in __init__
self.service.start()
File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 64, in start
stdout=self.log_file, stderr=self.log_file)
File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
OSError: [Errno 8] Exec format error
Exception ignored in: <bound method Service.__del__ of <selenium.webdriver.phantomjs.service.Service object at 0x7fb05cd964a8>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 163, in __del__
self.stop()
File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/common/service.py", line 135, in stop
if self.process is None:
AttributeError: 'Service' object has no attribute 'process'
It seems as though it is an issue with Selenium rather than with PhantomJS. However, I have no idea how to make the program work properly.
In other questions similar to this, the issue seems to be with closing the headless instance. However, this error is received as soon as I try to instantiate PhantomJS.
How can this be fixed?
If the p folder (as you've mentioned) is located in the same directory as your script, then you might need to start your code with something like
from bs4 import BeautifulSoup
from selenium import webdriver
import os
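# abspath avoids an empty dirname (and a wrong "/p/phantomjs") when run as "python t.py"
path_to_phantom_js = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'p', 'phantomjs')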
driver = webdriver.PhantomJS(path_to_phantom_js)
P.S. If it doesn't work, tell me the output of print(path_to_phantom_js)
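Errno 8 (Exec format error) usually means the operating system found the file but could not run it as a program. Here is a rough sketch that looks at the first bytes of the file to guess what it actually is; describe_executable is a made-up helper, and the list of signatures is deliberately short:

```python
import os


def describe_executable(path):
    """Guess the file type from its first bytes and report the exec bit."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == b"\x7fELF":
        kind = "ELF binary"
    elif head[:2] == b"#!":
        kind = "script with shebang"
    elif head[:2] == b"MZ":
        kind = "Windows executable"
    elif head[:3] == b"BZh" or head[:2] == b"\x1f\x8b":
        kind = "compressed archive (not yet extracted)"
    else:
        kind = "unknown format"
    if not os.access(path, os.X_OK):
        kind += ", not marked executable"
    return kind


# Example: print(describe_executable(path_to_phantom_js))
```

On Ubuntu the PhantomJS file should come out as an ELF binary. Anything else, or an ELF build for a different architecture than your instance, would be a plausible cause of this error.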