Web scraping with requests_html, but it says a Chromium file is missing - Python

I'm trying to web scrape using requests-html, but it returns an error saying a file is missing, even though I ran pip install requests-html and it said all requirements were satisfied. How do I get around this?
from requests_html import HTMLSession
import time
url = 'https://soundcloud.com/jujubucks'
s = HTMLSession()
r = s.get(url)
r.html.render()
songs = r.html.xpath('//*[@id="content"]/div/div[4]/div[1]/div/div[2]/div/div[2]', first=True)
print(songs)
This produces an sxstrace error:
OSError: [WinError 14001] The application has failed to start because its side-by-side
configuration is incorrect. Please see the application event log or use the command-line
sxstrace.exe tool for more detail
Apparently this is the missing file according to the event log, but I don't know where to get it:
Activation context generation failed for "C:\Users\houst\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe". Dependent Assembly 71.0.3542.0,language="*",type="win32",version="71.0.3542.0" could not be found. Please use sxstrace.exe for detailed diagnosis.

I came here with the same question, but the only answer didn't apply to me. My win10 x64 PC has 5 versions of Python: 4 installed via Anaconda, and Python 3.10 installed via the Microsoft Store. I was debugging the process in VS Code using the MS Store version, with requests-html pip-installed for that version of Python only.
The VS Code stack trace showed that subprocess.py failed to launch a subprocess.
Windows Event Viewer showed a failed attempt to launch chrome.exe in:
C:\Users\username\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32
Windows search showed that chrome.exe, which was downloaded and extracted automatically the first time response.html.render() was called, was actually located at:
C:\Users\username\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32
As a workaround, and although I have no idea why the issue occurred, I moved the chrome-win32 directory to the expected location and found that Chrome ran the JavaScript on the page and returned the HTML correctly.
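The move described above can be scripted. A minimal sketch, assuming the two paths from this answer (the username, Python version, and Chromium build number 588429 will differ on other machines):

```python
import shutil
from pathlib import Path

def relocate_chromium(src: Path, dst: Path) -> bool:
    """Copy an extracted chrome-win32 folder to the path pyppeteer expects.

    Returns True if a copy was made, False if the source is missing or the
    destination already exists.
    """
    if src.is_dir() and not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(src, dst)
        return True
    return False

# Paths taken from this answer; the MS Store Python sandboxes %LOCALAPPDATA%,
# so pyppeteer's download lands under Packages\...\LocalCache instead.
sandboxed = Path.home() / (
    "AppData/Local/Packages/"
    "PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0/LocalCache/Local/"
    "pyppeteer/pyppeteer/local-chromium/588429/chrome-win32")
expected = Path.home() / (
    "AppData/Local/pyppeteer/pyppeteer/local-chromium/588429/chrome-win32")
relocate_chromium(sandboxed, expected)
```

The function is a no-op when the source folder is absent, so it is safe to run unconditionally at startup.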

requests_html depends on pyppeteer, but it seems your pyppeteer has not installed Chromium completely. Try installing Chromium manually: activate the environment containing pyppeteer and run pyppeteer-install.exe.

Related

Python Selenium execute_script jQuery Error and Discrepancies Between OSs

I'm using Selenium with Python (3.5) to programmatically explore a site. One step of this exploration involves scrolling to the bottom of a given page, which I do with jQuery as follows, where driver is the webdriver object and scrollloadtime is the amount of time I want the scrolling to take:
driver.execute_script("$('html, body').animate({scrollTop: $(document).height() - $(window).height()}, %s);" % scrollloadtime)
This is where things get weird. When I run this code in a test environment (a VM running Kali Linux), I have no issues; I've never once had a problem with this line in that environment.
However, when I attempt to run the exact same code with the exact same package versions (listed below) on the exact same webpage inside a Docker container running Debian Stretch, I get the following error:
Message: TypeError: $(...).animate is not a function
I'd like to figure out why this is happening rather than just find a workaround. It's driving me insane!
I'm certainly no jQuery expert, but from the research I've done, this error normally occurs when an old or minified jQuery build is in use. What I can't figure out is how that ties into Selenium or even Python itself.
Things I have tried, to no avail:
Installing the jQuery-related packages that exist in my test environment but not in the Docker image (i.e. all libjs-jquery* packages) into the image.
Injecting jQuery into the page before running the script, which triggers the site's DDoS protection. (Additionally, this shouldn't be necessary, because the jQuery script worked without any injection in the test environment.)
Replacing the initial $('html, body') with a defined variable (var x = document.getElementsByTagName('html')[0]; x.animate(...)), though I'll be the first to admit I might not have done so correctly.
Versions:
Python 3.5
Selenium (Python) 3.141.0
Geckodriver 0.24.0
Firefox ESR 68.1.0
Debian Stretch and Kali Linux
Any assistance or troubleshooting guidance would be greatly appreciated. Let me know if I can provide any additional information.
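For what it's worth, this error often means the page's $ is not a full jQuery build (the slim build drops the effects module, including animate), so one way to sidestep it entirely is to animate the scroll with plain DOM calls through execute_script. A sketch, assuming driver is any Selenium webdriver; function name and parameters are my own:

```python
import time

def smooth_scroll_to_bottom(driver, duration_s=2.0, steps=20):
    """Scroll to the bottom of the page in small increments via plain
    JavaScript, with no dependency on the page's jQuery build."""
    # Total scrollable distance in pixels.
    total = driver.execute_script(
        "return document.body.scrollHeight - window.innerHeight;")
    for i in range(1, steps + 1):
        # Jump a fraction of the way down on each step.
        driver.execute_script(
            "window.scrollTo(0, arguments[0]);", total * i / steps)
        time.sleep(duration_s / steps)
    return steps
```

Because it only calls execute_script, it behaves the same under geckodriver and chromedriver, which removes the cross-environment variable from the equation.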

Python 3.6 - image scraping with google-image-download

I want to crawl some images for my machine learning practice and found google-images-download to be very useful; the code works out of the box.
However, at the moment it allows no more than 100 images, which is the limit of a Google Images page (it only loads 100 images per page).
The documentation says that if you use pip install google_images_download (which I do), selenium is installed along with it, and by using chromedriver you can download more than that limit.
However, every time I run the code with python gimages.py:
from google_images_download import google_images_download
response = google_images_download.googleimagesdownload()
arguments = {"keywords":"number plates","limit":200,"print_urls":True}
paths = response.download(arguments)
print(paths)
I get this error:
Looks like we cannot locate the path the 'chromedriver' (use the
'--chromedriver' argument to specify the path to the executable.) or
google chrome browser is not installed on your machine (exception:
expected str, bytes or os.PathLike object, not NoneType)
As I checked my installation, selenium is already installed.
Reading further, it said I can download chromedriver, put it in the same folder, and call python gimages.py --chromedriver "chromedriver", but I still get the same error.
How can I resolve this?
I am using conda with Python 3.6, running from the conda terminal. The code already works; only the chromedriver part does not.
You need to specify the path; "chromedriver" on its own is not a path.
Use the explicit path to the chromedriver executable, e.g. "/path/to/chromedriver".
In your case: python gimages.py --chromedriver "/path/to/chromedriver"
Hope this helps!
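A small helper can take the guesswork out of what counts as a valid path here: it accepts either a full path or a bare name on PATH and returns an absolute path to the executable, or None. This is a generic sketch of my own, not part of google_images_download; the resolved string is what you would pass via --chromedriver:

```python
import shutil
from pathlib import Path

def resolve_chromedriver(candidate):
    """Return an absolute path to an existing executable, or None.

    Accepts either a full path (e.g. "C:\\tools\\chromedriver.exe")
    or a bare name to be looked up on PATH.
    """
    p = Path(candidate)
    if p.is_file():
        return str(p.resolve())
    return shutil.which(candidate)  # None if not found on PATH
```

If this returns None for the value you are passing, the library will fail the same way, so it makes a quick pre-flight check before kicking off a long crawl.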

Errors while running an exe file built with PyInstaller and Google Cloud API integration in Python

I am working on a one-file Python project.
I integrated the Google Cloud API for real-time speech streaming and recognition.
It works well with the python aaa.py command.
Now I need a Windows build (.exe), so I used PyInstaller and successfully produced an aaa.exe file.
But I get this error when running speech streaming through the Google Cloud API:
[Errno 2] No such file or directory:
'D:\AI\ai\dist\AAA\google\cloud\gapic\speech\v1\speech_client_config.json'
So I copied the speech_client_config.json file to the needed path, after which I got the following error:
Exception in 'grpc._cython.cygrpc.ssl_roots_override_callback'
ignored E0511 01:13:14.320000000 3108
src/core/lib/security/security_connector/security _connector.cc:1170]
assertion failed: pem_root_certs != nullptr
Now I cannot find a solution that gets a working version with the Google Cloud API.
I am using Python version 2.7.14.
I need your friendly help.
Thanks.
I had the same problem. If you are willing to distribute roots.pem with your executable (just search for the file; it should be buried deep within the installation directory of grpcio), I had luck fixing this by setting the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to the full path of that roots.pem file.
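A sketch of that workaround from inside the program itself, assuming you bundle roots.pem with the app; the relative path below matches grpcio's usual layout but is an assumption, so verify it against your installed version:

```python
import os
from pathlib import Path

def point_grpc_at_bundled_roots(bundle_dir):
    """Set GRPC_DEFAULT_SSL_ROOTS_FILE_PATH to the roots.pem shipped with
    the frozen app. Must run before any grpc channel is created."""
    roots = Path(bundle_dir) / "grpc" / "_cython" / "_credentials" / "roots.pem"
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = str(roots)
    return str(roots)

# In a PyInstaller one-folder build, data files sit next to the executable:
# point_grpc_at_bundled_roots(os.path.dirname(sys.executable))
```

Because gRPC reads the variable lazily when the first secure channel is created, calling this at the very top of your entry script is enough.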
Update 2021
To anyone experiencing this issue: I got it working thanks to these amazing people. See the full conversation in this GitHub issue.
Here is the link
Step 1
Credits to @cbenhagen & @rising-stark in this GitHub link.
A PyInstaller hook called hook-grpc.py will do the trick.
Create a Python file named hook-grpc.py with this code:
from PyInstaller.utils.hooks import collect_data_files
datas = collect_data_files('grpc')
Step 2
Put the hook-grpc.py file in the site-packages\PyInstaller\hooks directory of the Python environment you are running. You can typically find it at
C:\Users\yourusername\AppData\Local\Programs\Python\Python37\Lib\site-packages\PyInstaller\hooks
Note:
Just change yourusername and Python37 to your respective username and Python version.
For Anaconda users it might be different. Check this site to find the Anaconda Python environment path you are using.
Step 3
Once you've done that, you can convert your .py program to .exe using PyInstaller, and it should work.
This looks to me like an SSL credentials mistake. I think you are not being authorized by Google Cloud. Check this code snippet and this documentation.

Anaconda Selenium issues with webdriver

I have been struggling to figure out why I keep getting errors trying to use Selenium. I'm using a local install of Anaconda3 on my /home/user Unix drive at the company I work for. I already pip-installed selenium, seemingly without issue, but when I try the following:
from selenium import webdriver
driver = webdriver.Firefox()
it fails with the following message:
WebDriverException: Message: Expected browser binary location, but unable to find binary in default location, no 'moz:firefoxOptions.binary' capability provided, and no binary flag set on the command line
I've tried downloading the most current chromedriver and trying with that, I've tried installing another geckodriver, I've tried all kinds of things, but nothing works. I'm happy to provide any additional information; I just want to get this off the ground at some point...
Thank you!
from selenium import webdriver
path = r'C:\yourgeckodriverpath\geckodriver.exe'
driver = webdriver.Firefox(executable_path=path)
Alright, through a combination of the responses to this question, I have figured out what (I think) went wrong. I was using a Linux Anaconda install on my company's servers, which (I believe) meant my Python had no access to a browser driver. The solution, sadly, was to install Anaconda locally, manually download/unzip/install Selenium and geckodriver, and then make sure I pass the executable_path=path parameter to the Firefox constructor. This didn't work for Chrome for some reason, which I'll assume has something to do with the unchangeable security specifications on my work machine. If any part of this doesn't sound right, feel free to chime in and shed more light on the issue. Thanks!

Pytractor not importing Firefox

I'm trying to use pytractor.
When I write the import statement
from pytractor.webdriver import Firefox
Firefox is not referenced/not found. Neither is Chrome.
However, the pytractor instructions and example clearly have no problem importing
Firefox:
readme file
Has anybody solved this?
(I read that pytractor is not actively maintained; however, the last activity on its GitHub page is currently 6 days old, so I guess it is maintained.)
example
UPDATE:
Firefox was underlined with a red squiggly line in PyCharm, but the code still works; I just hadn't run it. It was PyCharm's error checking that confused me.
I don't have any issues with that. I installed it from GitHub:
$ pip install git+https://github.com/kpodl/pytractor
and imported it in the Python shell:
$ ipython
In [1]: from pytractor.webdriver import Firefox
In [2]:
Make sure you don't have your own script named pytractor.
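A quick way to check for that kind of shadowing, without importing anything, is to ask where Python would load the module from. A generic sketch (the helper name is my own):

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if
    it cannot be found. If this points at a file in your own project rather
    than site-packages, a local script is shadowing the installed package."""
    spec = importlib.util.find_spec(name)
    return getattr(spec, "origin", None)
```

For example, if module_origin('pytractor') points at your own pytractor.py instead of a path under site-packages, rename that file (and delete its .pyc).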
