I have an instance created in AWS EC2, with these characteristics:
platform: windows
version: Windows_Server-2019-English-Full-Base-2022.07.13
Type: t2.micro
I want to make a scraper that runs on this instance at a fixed interval.
I connect to the instance using RDP, launch the scraper, and disconnect. The browser then stays as in image 1 (the URL is entered, but the page stays blank and never loads) and the console starts printing exceptions like [2]. The weird thing is that the page only loads while I am connected to the instance via RDP; then the scraper runs normally.
Could someone tell me what I'm doing wrong, or whether some configuration is missing on the instance?
# undetected_chromedriver == '3.1.5r4'
# selenium == 4.3.0
# python == 3.10.5
# Chrome == Version 104.0.5112.81
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time
import undetected_chromedriver.v2 as uc
from selenium.webdriver.common.keys import Keys
def main(driver):
    driver.get("https://www.google.com/")
    WebDriverWait(driver, LONG_TIME).until(EC.presence_of_element_located((By.TAG_NAME, "form")))
    input = WebDriverWait(driver, LONG_TIME).until(EC.presence_of_element_located((By.TAG_NAME, "input")))
    input.send_keys('stackoverflow')
    input.send_keys(Keys.ENTER)
    search = WebDriverWait(driver, LONG_TIME).until(EC.presence_of_element_located((By.ID, "search")))
    titles = search.find_elements(By.TAG_NAME, "h3")
    for title in titles:
        print(title.get_attribute('textContent'))
if __name__ == '__main__':
    options = uc.ChromeOptions()
    LONG_TIME = 30
    driver = uc.Chrome(options=options, user_data_dir="C:\\temp\\profile")
    while True:
        try:
            main(driver)
        except Exception as e:
            print(e)
        time.sleep(60)
[2]:
Message:
Stacktrace:
Backtrace:
Ordinal0 [0x012878B3+2193587]
Ordinal0 [0x01220681+1771137]
Ordinal0 [0x011341A8+803240]
Ordinal0 [0x011624A0+992416]
Ordinal0 [0x0116273B+993083]
Ordinal0 [0x0118F7C2+1177538]
Ordinal0 [0x0117D7F4+1103860]
Ordinal0 [0x0118DAE2+1170146]
Ordinal0 [0x0117D5C6+1103302]
Ordinal0 [0x011577E0+948192]
Ordinal0 [0x011586E6+952038]
GetHandleVerifier [0x01530CB2+2738370]
GetHandleVerifier [0x015221B8+2678216]
GetHandleVerifier [0x013117AA+512954]
GetHandleVerifier [0x01310856+509030]
Ordinal0 [0x0122743B+1799227]
Ordinal0 [0x0122BB68+1817448]
Ordinal0 [0x0122BC55+1817685]
Ordinal0 [0x01235230+1856048]
BaseThreadInitThunk [0x77320419+25]
RtlGetAppContainerNamedObjectPath [0x778377FD+237]
RtlGetAppContainerNamedObjectPath [0x778377CD+189]
Good morning. I am new to Python and web scraping. I have to download a series of images from this link (only the last part of the URL changes for the following pages): https://historisch.cbs.nl/detail.php?nav_id=0-1&index=2&id=30568043
What I want to do is click the two download buttons so that, in the end, the images are downloaded to my laptop (https://i.stack.imgur.com/zWAJg.png).
Here is my code so far:
browser = webdriver.Chrome(service=s, options=chrome_options)
for i in range(8047,8051):
    no = str(i)
    browser.get("https://historisch.cbs.nl/detail.php?nav_id=1-1&index=2&id=3056"+str(i)+".jpeg")
    download = WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.XPATH,'//*[@id="downloadDirect"]')))
    download.click()
    t.sleep(1)
    download = WebDriverWait(browser, 3).until(EC.element_to_be_clickable((By.XPATH,'//*[@id="downloadResLink"]')))
    download.click()
I receive this error:
----> 8 download = WebDriverWait(browser, 5).until(EC.element_to_be_clickable((By.XPATH,'//*[@id="downloadDirect"]')))
9 download.click()
10 t.sleep(1)
~\anaconda3\lib\site-packages\selenium\webdriver\support\wait.py in until(self, method, message)
88 if time.monotonic() > end_time:
89 break
---> 90 raise TimeoutException(message, screen, stacktrace)
91
92 def until_not(self, method, message: str = ""):
TimeoutException: Message:
Stacktrace:
Backtrace:
Ordinal0 [0x00371ED3+2236115]
Ordinal0 [0x003092F1+1807089]
Ordinal0 [0x002166FD+812797]
Ordinal0 [0x002455DF+1005023]
Ordinal0 [0x002457CB+1005515]
Ordinal0 [0x00277632+1209906]
Ordinal0 [0x00261AD4+1120980]
Ordinal0 [0x002759E2+1202658]
Ordinal0 [0x002618A6+1120422]
Ordinal0 [0x0023A73D+960317]
Ordinal0 [0x0023B71F+964383]
GetHandleVerifier [0x0061E7E2+2743074]
GetHandleVerifier [0x006108D4+2685972]
GetHandleVerifier [0x00402BAA+532202]
GetHandleVerifier [0x00401990+527568]
Ordinal0 [0x0031080C+1837068]
Ordinal0 [0x00314CD8+1854680]
Ordinal0 [0x00314DC5+1854917]
Ordinal0 [0x0031ED64+1895780]
BaseThreadInitThunk [0x75B4FA29+25]
RtlGetAppContainerNamedObjectPath [0x779A7BBE+286]
RtlGetAppContainerNamedObjectPath [0x779A7B8E+238]
If someone could help me, I would be very grateful.
You're trying to click a button located inside an iframe. You need to switch to that iframe first, then locate the button and click it:
WebDriverWait(browser, 5).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//iframe[@class="detail-media-viewer"]')))
t.sleep(5)  ## let elements within iframe load properly
[.. do your stuff locating/clicking whatever buttons are there]
If you plan to interact with or locate elements on the main page (outside the iframe) again, you need to switch back to the main page:
browser.switch_to.default_content()
Selenium documentation can be found here: https://www.selenium.dev/documentation/
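The switch-in / switch-back pattern above can be wrapped so that the switch back always happens, even when a click inside the iframe raises. This is only a minimal sketch: `inside_frame` is a hypothetical helper name, not part of Selenium, and it relies only on the `switch_to.frame` and `switch_to.default_content` calls the driver already provides.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame, then always switch back to the main document.

    `inside_frame` is a hypothetical helper, not part of Selenium.
    `frame_reference` may be anything `driver.switch_to.frame` accepts:
    a WebElement, a frame name/id string, or an index.
    """
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Runs even if locating or clicking inside the frame raises.
        driver.switch_to.default_content()


# Usage sketch (assumes `browser` and `By` from the question's code, and
# `iframe` being the already located <iframe> element):
#
#     with inside_frame(browser, iframe):
#         browser.find_element(By.ID, "downloadDirect").click()
#     # back on the main page here
```

Unlike `frame_to_be_available_and_switch_to_it`, this sketch does not wait for the iframe to appear, so the wait from the answer is still needed before entering the block.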
I am trying to browse a webpage and scrape data off it. I have a couple more links, along with the mentioned link, that are processed in a for loop. However, unlike the other links, when the get() function tries to access this link it gives me the error below:
Code statement:
driver = webdriver.Chrome(executable_path=r"..\Drivers\chromedriver.exe")  # raw string keeps the backslashes literal
def page_content_extractor(given_line):
    print('in page content extractor function. Given link is : ',given_line)
    try:
        driver.get(given_line)
    except Exception as e:
        print('Exception occurred !! Find these links under rogue links')
        print(e)
    time.sleep(2)
Exception statement:
selenium.common.exceptions.WebDriverException: Message: unknown error: unexpected command response
(Session info: chrome=103.0.5060.66)
Stacktrace:
Backtrace:
Ordinal0 [0x006FD953+2414931]
Ordinal0 [0x0068F5E1+1963489]
Ordinal0 [0x0057C6B8+837304]
Ordinal0 [0x0056EB34+781108]
Ordinal0 [0x0056E06A+778346]
Ordinal0 [0x0056D646+775750]
Ordinal0 [0x0056CEBC+773820]
Ordinal0 [0x0056CD59+773465]
Ordinal0 [0x0057DA70+842352]
Ordinal0 [0x005CAB6F+1157999]
Ordinal0 [0x005C4463+1131619]
Ordinal0 [0x0059E860+976992]
Ordinal0 [0x0059F756+980822]
GetHandleVerifier [0x0096CC62+2510274]
GetHandleVerifier [0x0095F760+2455744]
GetHandleVerifier [0x0078EABA+551962]
GetHandleVerifier [0x0078D916+547446]
Ordinal0 [0x00695F3B+1990459]
Ordinal0 [0x0069A898+2009240]
Ordinal0 [0x0069A985+2009477]
Ordinal0 [0x006A3AD1+2046673]
BaseThreadInitThunk [0x7660FA29+25]
RtlGetAppContainerNamedObjectPath [0x77B07A9E+286]
RtlGetAppContainerNamedObjectPath [0x77B07A6E+238]
Process finished with exit code 1
One more thing to mention is that this exception is not thrown for the other URLs. I'm not sure what is really wrong here or how to handle it. I tried using try-except, but even that doesn't seem to work.
You have to download the ChromeDriver release that matches your Chrome version and use it in your project. Update Chrome on your desktop to the latest version (three-dots menu on the right >> Help >> About Chrome), then use the chromedriver.exe from the link below that supports your Chrome version.
Chrome driver: https://chromedriver.chromium.org/downloads
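To check whether a mismatch like the one this answer describes is actually present, the two version strings can be compared by their first component. This is a rough sketch that assumes the browser and the driver need the same major version, which holds for recent ChromeDriver releases; the function names are made up for illustration, and the driver version in the example is an invented value.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, chromedriver_version: str) -> bool:
    """True when the browser and the driver share a major version."""
    return major_version(chrome_version) == major_version(chromedriver_version)


if __name__ == "__main__":
    # Browser version taken from the "Session info" line in the question.
    # The driver version is a made-up value for illustration only.
    print(versions_match("103.0.5060.66", "102.0.5005.61"))
```

The browser version appears in the "Session info" line of the exception, and the driver version is printed by `chromedriver --version`.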
This is the error message.
It appeared in a box with a red background.
I ran it in Visual Studio Code.
I have no clue what to do, and I don't understand what this error message means.
The goal was to run the automation software to automate some actions in Chrome.
I have the correct chromedriver version.
Exception has occurred: NoSuchElementException
Message: no such element: Unable to locate element: {"method":"css selector","selector":"#mat-option-165 .pr-2:nth-child(1)"}
(Session info: chrome=101.0.4951.67)
Stacktrace:
Backtrace:
Ordinal0 [0x00E8B8F3+2406643]
Ordinal0 [0x00E1AF31+1945393]
Ordinal0 [0x00D0C748+837448]
Ordinal0 [0x00D392E0+1020640]
Ordinal0 [0x00D3957B+1021307]
Ordinal0 [0x00D66372+1205106]
Ordinal0 [0x00D542C4+1131204]
Ordinal0 [0x00D64682+1197698]
Ordinal0 [0x00D54096+1130646]
Ordinal0 [0x00D2E636+976438]
Ordinal0 [0x00D2F546+980294]
GetHandleVerifier [0x010F9612+2498066]
GetHandleVerifier [0x010EC920+2445600]
GetHandleVerifier [0x00F24F2A+579370]
GetHandleVerifier [0x00F23D36+574774]
Ordinal0 [0x00E21C0B+1973259]
Ordinal0 [0x00E26688+1992328]
Ordinal0 [0x00E26775+1992565]
Ordinal0 [0x00E2F8D1+2029777]
BaseThreadInitThunk [0x761BFA29+25]
RtlGetAppContainerNamedObjectPath [0x77A47A7E+286]
RtlGetAppContainerNamedObjectPath [0x77A47A4E+238]
File "C:\Users\Public\Untitled-2.py", line 38, in <module>
driver.find_element(By.CSS_SELECTOR, "#mat-option-165 .pr-2:nth-child(1)").click()
Here is the code that led up to the error message.
The code between the "#####" and "###" markers is copied from the file generated by the Selenium IDE.
The last line of the code is what causes the error message.
#####
import pytest
import time
import json
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
###
from selenium import webdriver
driver = webdriver.Chrome(executable_path = r"C:\chromedriver\chromedriver.exe")
driver.get("https://www.travelstart.co.za")
#####
driver.get("https://www.travelstart.co.za/")
driver.set_window_size(1552, 840)
driver.find_element(By.ID, "dept_city0").click()
driver.find_element(By.ID, "dept_city0").send_keys("cpt")
time.sleep(1)
driver.find_element(By.CSS_SELECTOR, ".sub_txt:nth-child(6)").click()
driver.find_element(By.ID, "arr_city0").click()
driver.find_element(By.ID, "arr_city0").send_keys("jnb")
time.sleep(1)
driver.find_element(By.CSS_SELECTOR, "#mat-option-165 .pr-2:nth-child(1)").click()
###
I am using Python 3.10.4.
I am trying to write a scraper that will go on the eia.gov website and scrape electricity rates.
This is my scrape function:
from listOfElements import pieces
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pyperclip
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
#We need a function that scrapes the eia.gov website for electricity rates..
def scrape1():
    for piece in pieces:
        try:
            driver.get('https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=epmt_5_06_a')
            element = driver.find_element(By.XPATH, piece)
            #element.send_keys(Keys.CONTROL,'a')
            element.send_keys(Keys.CONTROL,'c')
            text = pyperclip.paste()
            with open('output.txt', 'w', encoding='utf-8') as f:
                f.write(text)
        except Exception as e:
            print(f'Exception while processing {piece} -> {e}')
This is my separate file (called listOfElements), which holds a list of element XPaths:
pieces = ['/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[3]/td[2]', #Maine Residential
'/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[3]/td[4]', #Maine Commercial
'/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[3]/td[6]', #Maine Industrial
'/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[4]/td[2]', #Massachusetts Residential
'/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[4]/td[5]', #Massachusetts Commercial
'/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[4]/td[7]', #Massachusetts Industrial
'/html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[4]/td[8]',] #Massachusetts Transportation
This is the error I am getting (partial stacktrace):
runfile('C:/Users/MYNAME/Desktop/Price Grabber/priceGrabber.py', wdir='C:/Users/MYNAME/Desktop/Price Grabber')
Reloaded modules: listOfElements
C:\Users\MYNAME\Desktop\Price Grabber\priceGrabber.py:21: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
driver = webdriver.Chrome(PATH)
Exception while processing /html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[3]/td[2] -> Message: element not interactable
(Session info: chrome=100.0.4896.127)
Stacktrace:
Backtrace:
Ordinal0 [0x00C97413+2389011]
Ordinal0 [0x00C29F61+1941345]
Ordinal0 [0x00B1C520+836896]
Ordinal0 [0x00B448E3+1001699]
Ordinal0 [0x00B43FBE+999358]
Ordinal0 [0x00B6414C+1130828]
Ordinal0 [0x00B3F974+981364]
Ordinal0 [0x00B64364+1131364]
Ordinal0 [0x00B74302+1196802]
Ordinal0 [0x00B63F66+1130342]
Ordinal0 [0x00B3E546+976198]
Ordinal0 [0x00B3F456+980054]
GetHandleVerifier [0x00E49632+1727522]
GetHandleVerifier [0x00EFBA4D+2457661]
GetHandleVerifier [0x00D2EB81+569713]
GetHandleVerifier [0x00D2DD76+566118]
Ordinal0 [0x00C30B2B+1968939]
Ordinal0 [0x00C35988+1989000]
Ordinal0 [0x00C35A75+1989237]
Ordinal0 [0x00C3ECB1+2026673]
BaseThreadInitThunk [0x776DFA29+25]
RtlGetAppContainerNamedObjectPath [0x77C37A7E+286]
RtlGetAppContainerNamedObjectPath [0x77C37A4E+238]
Exception while processing /html/body/div[1]/div[2]/div/div[3]/div/div/table/tbody/tr[3]/td[4] -> Message: element not interactable
(Session info: chrome=100.0.4896.127)
Stacktrace:
Backtrace:
Ordinal0 [0x00C97413+2389011]
Ordinal0 [0x00C29F61+1941345]
Ordinal0 [0x00B1C520+836896]
Ordinal0 [0x00B448E3+1001699]
Ordinal0 [0x00B43FBE+999358]
How do I fix this?
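The problem is in the way you get the data: it is not necessary to simulate CTRL+C, and locating the element as you already did is enough. A table cell is not an input element, which is most likely why send_keys fails with "element not interactable". To get the data, read the text attribute of the element returned by find_element, and that's it!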
#!/usr/bin/env python
from listOfElements import pieces
from selenium import webdriver
#from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
#import pyperclip
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
#We need a function that scrapes the eia.gov website for electricity rates..
def scrape1():
    url = 'https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=epmt_5_06_a'
    data = []
    for piece in pieces:
        try:
            driver.get(url)
            element = driver.find_element(By.XPATH, piece)
            #element.send_keys(Keys.CONTROL,'a')
            #element.send_keys(Keys.CONTROL,'c')
            #text = element.text
            # read the cell's visible text directly instead of copying it
            data.append(element.text)
            #with open('output.txt', 'w', encoding='utf-8') as f:
            #    f.write(text)
        except Exception as e:
            print(f'Exception while processing {piece} -> {e}')
    # write all collected values once, after the loop
    with open('output.txt', 'w') as output_file:
        for value in data:
            output_file.write(value)
            output_file.write(" ")
scrape1()
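The same idea can be sketched with a single page load, since every XPath in the list points into the same page. This is a rough variant, not the answer's code: `collect_cell_texts` and `save_lines` are hypothetical helper names, and the literal "xpath" is the string value of Selenium's `By.XPATH`, used here so the sketch needs no import of its own.

```python
def collect_cell_texts(driver, url, xpaths):
    """Load `url` once and return the visible text of each XPath match.

    Hypothetical helper. "xpath" is the string value of Selenium's
    By.XPATH, so By.XPATH can be passed instead in real code.
    """
    driver.get(url)  # one page load instead of one per XPath
    texts = []
    for xpath in xpaths:
        try:
            texts.append(driver.find_element("xpath", xpath).text)
        except Exception as exc:  # e.g. NoSuchElementException
            print(f"Exception while processing {xpath} -> {exc}")
    return texts


def save_lines(texts, path="output.txt"):
    """Write one value per line; the context manager closes the file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(texts))


# Usage sketch (assumes `driver` and `pieces` from the code above):
#
#     url = "https://www.eia.gov/electricity/monthly/epm_table_grapher.php?t=epmt_5_06_a"
#     save_lines(collect_cell_texts(driver, url, pieces))
```

Writing one value per line is a design choice of this sketch; the answer's version separates values with spaces instead.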
I am using Selenium with Python and chromedriver, and I got the usual error about incompatible chromedriver and Chrome versions:
session not created
from disconnected: unable to connect to renderer
(Session info: chrome=96.0.4664.110)
I went and downloaded chromedriver 96 but I keep getting that same error.
Here is my output when I check the chromedriver version:
> chromedriver --version
ChromeDriver 96.0.4664.45 (76e4c1bb2ab4671b8beba3444e61c0f17584b2fc-refs/branch-heads/4664@{#947})
Here is how I initialize the driver in my code:
options = Options()
port = '8888'
options.add_argument('--remote-debugging-port=' + port)
options.add_argument('headless')
try:
    driver = webdriver.Chrome(options=options)
except:
    ...
EDIT : After updating Selenium to ver. 4.1.0, I now get a stacktrace with the same error :
session not created
from disconnected: unable to connect to renderer
(Session info: chrome=96.0.4664.110)
Stacktrace:
Backtrace:
Ordinal0 [0x00916903+2517251]
Ordinal0 [0x008AF8E1+2095329]
Ordinal0 [0x007B2848+1058888]
Ordinal0 [0x007A376E+997230]
Ordinal0 [0x007B3A60+1063520]
Ordinal0 [0x007FBA7A+1358458]
Ordinal0 [0x007FA71A+1353498]
Ordinal0 [0x007F639B+1336219]
Ordinal0 [0x007D27A7+1189799]
Ordinal0 [0x007D3609+1193481]
GetHandleVerifier [0x00AA5904+1577972]
GetHandleVerifier [0x00B50B97+2279047]
GetHandleVerifier [0x009A6D09+534521]
GetHandleVerifier [0x009A5DB9+530601]
Ordinal0 [0x008B4FF9+2117625]
Ordinal0 [0x008B98A8+2136232]
Ordinal0 [0x008B99E2+2136546]
Ordinal0 [0x008C3541+2176321]
BaseThreadInitThunk [0x76ACFA29+25]
RtlGetAppContainerNamedObjectPath [0x77A17A9E+286]
RtlGetAppContainerNamedObjectPath [0x77A17A6E+238]
I think the problem was that I launched the driver in headless mode and the browser does not stop by itself when the program stops, so the leftover process made the program crash the next time it was started.
Solution: call driver.quit() at the end of the program.
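One way to make sure quit() runs even when the script fails halfway is to put it in a finally block. Below is a minimal sketch of that idea as a context manager; `managed_driver` is a hypothetical helper name, not part of Selenium, and it only assumes the driver object has a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver and guarantee that quit() is called afterwards.

    `managed_driver` is a hypothetical helper. `make_driver` is any
    zero-argument callable that returns a WebDriver.
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions, so no browser is left behind.
        driver.quit()


# Usage sketch (assumes `webdriver` and `options` from the question's code):
#
#     with managed_driver(lambda: webdriver.Chrome(options=options)) as driver:
#         driver.get("https://www.google.com/")
```

If creating the driver itself fails, there is nothing to quit, so the exception from `make_driver` simply propagates.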