Selenium cannot find element on workera.ai - Python

I am trying to scrape question answers from workera.ai, but I am stuck: Selenium cannot find any of the elements I search for by class name. When I check the page source the elements are there, yet Selenium cannot find them. Here is what I am doing.
Sign up at: https://workera.ai/candidates/signup
from selenium import webdriver
from selenium.webdriver.chrome import service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time, os
option = webdriver.ChromeOptions()
option.add_argument("start-maximized")
option.add_experimental_option("excludeSwitches", ["enable-automation"])
option.add_experimental_option('useAutomationExtension', False)
option.add_argument("--disable-blink-features")
option.add_argument("--disable-gpu")
option.add_argument(r"--user-data-dir=C:\Users\user_name\AppData\Local\Google\Chrome\User Data") #e.g. C:\Users\You\AppData\Local\Google\Chrome\User Data
option.add_argument(r'--profile-directory=Profile 2') # using profile which is logged into the website
#option.add_argument("--headless")
option.add_argument('--disable-blink-features=AutomationControlled')
wd = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
skill_sets = ['https://workera.ai/app/learner/skillset/82746bf6-4eb2-4065-b2fb-740bc3207d14','https://workera.ai/app/learner/skillset/7553e8f8-52bf-4136-a4ea-6aa63eb963d9','https://workera.ai/app/learner/skillset/e11cb698-38c1-4a4f-aa7b-43b85bdf5a51','https://workera.ai/app/learner/skillset/a999048c-ab99-4576-b849-4e72c9455418','https://workera.ai/app/learner/skillset/7df84ad9-ae67-4faf-a981-a95c1c02adbb', 'https://workera.ai/app/learner/skillset/737fa250-8c66-4ea0-810b-6847c304aa5b','https://workera.ai/app/learner/skillset/ed4f2f1f-2333-4b28-b36a-c7f736da9647','https://workera.ai/app/learner/skillset/323ba5d9-fffe-48c0-b7b4-966d1ebca99a','https://workera.ai/app/learner/skillset/488492e9-53c4-4600-b336-6dfe44340402']
# AI fluent AI literate DATA ANAlyst DATA Engineer DATA scientist Deep learn ML Responsible AI Software Engineer
for skill in skill_sets:
    wd.get(skill)
    time.sleep(20)
    num = wd.find_element(By.CLASS_NAME, "sc-jNHgKk hrMhpT")  # class name is different for every account
    num = num.split('of')[1]
    num = int(num)
    print(num)
    button = wd.find_elements(By.CLASS_NAME, "styled__SBase-sc-cmjz60-0 styled__SPrimary-sc-cmjz60-1 kSmXiJ hwoYMb sc-fKVqWL eOjNfz")
    print(len(button))
wd.close()
I don't know why this is happening. Does the site block Selenium WebDriver, or is it something else?
Edit
I tried getting the page source from Selenium and then accessing the elements with bs4, and that works. So I think the website is blocking Selenium by some means.
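A minimal sketch of that workaround, in case it helps (the generated class name is the one from my code above and will differ per account):
from bs4 import BeautifulSoup

wd.get(skill)
time.sleep(20)
soup = BeautifulSoup(wd.page_source, "html.parser")
# bs4 searches the raw HTML, so the element is found even when Selenium can't locate it
counter = soup.find(class_="sc-jNHgKk")  # assumption: the stable prefix of the generated class
if counter is not None:
    print(int(counter.get_text().split("of")[1]))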

The problem with Selenium is that you can't select elements that have more than one class this way.
To select them, you can either pass a single class as the value, or join the classes with ".",
for example:
wd.find_element(By.CLASS_NAME,"class1.class2")
You can also select the class that exists for all the answers, which I believe is "sc-jNHgKk", so you won't have the problem of picking a class for each account; or you can just use XPATH instead.
num = int(wd.find_element(By.CLASS_NAME, "sc-jNHgKk").text.split("of ")[1])
button = wd.find_elements(By.CLASS_NAME, "styled__SBase-sc-cmjz60-0")
print(len(button))
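If even the "sc-jNHgKk" prefix changes between builds, a sketch of a more defensive variant (my assumption, not tested against the site) is a CSS attribute selector combined with an explicit wait instead of time.sleep:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 20 s for the counter element whose class contains the stable prefix
counter = WebDriverWait(wd, 20).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "[class*='sc-jNHgKk']"))
)
num = int(counter.text.split("of ")[1])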

Related

WebDriver won't click on a second link

I want to make a bot using Selenium, but I'm having trouble getting it to move to a different part of a website. In my code, the driver successfully goes to nike.com (1), then successfully clicks and loads a different link within Nike (it clicks the circled area in (1) and goes to (2)). My problems begin here: I try to click and load a different link (2), but the driver does nothing. I know the driver found the second link, because if I print out 'second.text' I get the correct text (3)...
I am still new to Selenium and pretty much don't know what I am doing. Any help would be appreciated.
Thank you.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
if __name__ == '__main__':
    options = Options()
    options.add_argument("start-maximized")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get("https://www.nike.com/men")
    driver.implicitly_wait(5)
    first = driver.find_element(by=By.CLASS_NAME, value="prl3-sm")
    first.click()
    driver.implicitly_wait(5)
    second = driver.find_element(by=By.CSS_SELECTOR, value='a[class="JSftBPEZ"]')
    #print(second.text)
    second.click()
I have tested it. The element does get clicked through a JavaScript click.
Here is the code to click on the second link:
second = driver.find_element(by=By.CSS_SELECTOR, value='a[class="JSftBPEZ"]')
driver.execute_script("arguments[0].click();",second)
By the way, you may need to define the locator more precisely: the second locator currently points to 6 elements. The JavaScript click will simply click the first of them.
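Alternatively, a sketch that waits until the element is actually clickable before using a native click, which often fixes cases where an element is found but clicking does nothing:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 s for the link to become clickable, then click it natively
second = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, 'a[class="JSftBPEZ"]'))
)
second.click()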

Scraping with Selenium not showing all data (possible duplicate)

I was trying to write a simple script for scraping a dynamic website (I'm a newbie with Selenium). The data I intended to scrape are the product names and prices. The code ran and worked, but it only showed 10 entries, while there are 60 entries per page. Here is the code:
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.tokopedia.com/p/komputer-laptop/media-penyimpanan-data') # the link
product_name = driver.find_elements(By.CSS_SELECTOR, value='span.css-1bjwylw')
product_price = driver.find_elements(By.CSS_SELECTOR, value='span.css-o5uqvq')
list_product = []
list_price = []
for i in range(len(product_name)):
    list_product.append(product_name[i].text)
for j in range(len(product_price)):
    list_price.append(product_price[j].text)
driver.quit()
df = pd.DataFrame(columns=['product', 'price'])
df['product'] = list_product
df['price'] = list_price
print(df)
I used webdriver-manager to install the chromedriver instead of downloading the driver first and then locating it, because I thought that was simpler. Also, I used Service instead of Options (many tutorials use Options) because I got some errors with Options, and with Service it worked out fine. Oh, and I used PyCharm, if that makes any difference.
Any help or suggestions will be very much appreciated, thank you!
You need to scroll to the bottom of the page first for all 60 entries to load. The website is dynamic, and more data loads as you scroll down. You can scroll via JavaScript through the WebDriver as follows: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"). Add this below driver.get() and before the find_elements() calls.
Don't forget to sleep after scrolling, as the new items take time to load.
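A minimal sketch of that scroll-and-wait loop, assuming the page keeps loading more products until all 60 have rendered:
import time

# scroll repeatedly until the page height stops growing
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the lazily-loaded products time to render
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # height stopped growing, so everything is loaded
    last_height = new_height

product_name = driver.find_elements(By.CSS_SELECTOR, value='span.css-1bjwylw')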

Python Selenium - FedEx login not working

I want to automate generating shipping labels through FedEx, as we need to type 50+ at a time.
However, it looks like FedEx is blocking me from logging in with Selenium, but I can't quite tell whether it's the website or my code.
I am pretty new to both Python and Selenium.
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
email = "my_email"
pwd = "my_password"
chrome_driver_path = "/Users/siddhartha/Developer/chromedriver"
driver = webdriver.Chrome(executable_path=chrome_driver_path)
driver.get("https://www.fedex.com/en-us/home.html")
location_click = driver.find_element_by_xpath('/html/body/div[4]/div/div/div/div/div/div/div[2]/ul/li[1]/a/span')
location_click.click()
signup = driver.find_element_by_xpath('/html/body/div[1]/header/div/div/nav/div/div/div/div[1]/a/span')
signup.click()
user_id_field = driver.find_element_by_id("NavLoginUserId")
user_id_field.send_keys(email)
time.sleep(4)
pwd_id_field = driver.find_element_by_id("NavLoginPassword")
pwd_id_field.send_keys(pwd)
time.sleep(3)
log_in_button = driver.find_element_by_xpath('/html/body/div[1]/header/div/div/nav/div/div/div/div[1]/div/div/form/button')
log_in_button.click()
Everything works until it presses Log In, and then an error occurs.
I added the time.sleep calls thinking FedEx was blocking the sign-in because of the fast email/password typing.
Just encountered the same situation. The article linked by @Eric Ballard turned out to be very helpful. For me it was enough to add only the first option from that article: options.add_argument('--disable-blink-features=AutomationControlled')
So my code is the following:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
browser = webdriver.Chrome(options=options)
browser.get('https://www.fedex.com/fedexbillingonline/pages/accountsummary/accountSummaryFBO.xhtml')
I had the same issue and was never able to find a solution. Your code looks fine; it's just that FedEx does something to block the browser from logging in when it is run with Selenium.
Seems like an Imperva type of deal; as Robert stated, FedEx is likely detecting the use of bots. This may be of some use: https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth--7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html
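For reference, a sketch combining the common evasion tweaks from the answers above (no guarantee any of them get past FedEx's detection):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option('excludeSwitches', ['enable-automation'])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options)
# hide the navigator.webdriver flag that many bot detectors check
driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
    'source': "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
})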

How to add repetitive responses to a webpage using Selenium?

I'm trying to send a repetitive text response to a chat box on a gaming website, as part of an algorithm that guesses words based on a drawing and a hint.
The website is the popular Pictionary-style site: https://skribbl.io/
I've been working on the algorithm that guesses the words based on others' replies; I'm not familiar with Selenium and am just trying to print some simple text into the chat/guess textbox.
The website opens up, but nothing is printed into the box. How can I resolve this? Thank you.
from selenium import webdriver
from selenium.webdriver.support import ui
from selenium.webdriver.common.keys import Keys
def page_is_loaded(driver):
    return driver.find_element_by_tag_name("body") is not None

driver = webdriver.Firefox(executable_path=r'C:\Program Files\gecko\geckodriver.exe')
driver.get("https://skribbl.io/?p0YRvXqupiza")
wait = ui.WebDriverWait(driver, 10)
wait.until(page_is_loaded)
for x in range(0, 20):
    textbox = driver.find_element_by_name("text")
    textbox.send_keys("1")
This is what the homepage of skribbl.io looks like: https://i.imgur.com/Udth9vs.jpg
The textbox where I want my code's input to go is at the bottom right-hand side: https://i.imgur.com/frMTFjJ.jpg
Try the following:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
firefox_browser = webdriver.Firefox(executable_path=r'C:\Program Files\gecko\geckodriver.exe')
firefox_browser.get("https://skribbl.io/")
time.sleep(2)
name_input = firefox_browser.find_element_by_css_selector("#inputName")
play_button = firefox_browser.find_element_by_css_selector("button.btn:nth-child(3)")
name_input.send_keys("Drums3")
play_button.send_keys(Keys.ENTER)
for x in range(0, 20):
    time.sleep(3)
    chat_input = firefox_browser.find_element_by_css_selector("#inputChat")
    chat_input.send_keys("hello")
    chat_input.send_keys(Keys.ENTER)
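Note that the find_element_by_* helpers used here were deprecated in Selenium 4 and removed in later 4.x releases; on a current Selenium the equivalent lookup would be along these lines:
from selenium.webdriver.common.by import By

chat_input = firefox_browser.find_element(By.CSS_SELECTOR, "#inputChat")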

Using Python to Scrape a JS Form

I'm currently working on a research project in which we are trying to collect saved image files from Brazil's Hemeroteca database. I've done web scraping on PHP pages before using C/C++ with HTML forms, but as this is a shared script, I need to switch to Python so that everyone in the group can use this tool.
The page which I'm trying to scrape is: http://bndigital.bn.gov.br/hemeroteca-digital/
There are three form fields which populate, the first being the newspaper/journal. Upon selecting this, the available time ranges populate, and the final field is the search term. I've inspected the HTML page, and the three IDs are, respectively: 'PeriodicoCmb1_Input', 'PeriodoCmb1_Input', and 'PesquisaTxt1'.
Some google searches on this topic led me to the Selenium package, and I've put together this sample code to attempt to read the page:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import time
print("Begin...")
browser = webdriver.Chrome()
url = "http://bndigital.bn.gov.br/hemeroteca-digital/"
browser.get(url)
print("Waiting to load page... (Delay 3 seconds)")
time.sleep(3)
print("Searching for elements")
journal = browser.find_element_by_id("PeriodicoCmb1_Input")
timeRange = browser.find_element_by_id("PeriodoCmb1_Input")
searchTerm = browser.find_element_by_id("PesquisaTxt1")
print(journal)
print("Set fields, delay 3 seconds between input")
search_journal = "Relatorios dos Presidentes dos Estados Brasileiros (BA)"
search_timeRange = "1890 - 1899"
search_text = "Milho"
journal.send_keys(search_journal)
time.sleep(3)
timeRange.send_keys(search_timeRange)
time.sleep(3)
searchTerm.send_keys(search_text)
print("Perform search")
submitButton = browser.find_element_by_id("PesquisarBtn1_input")
submitButton.click()
The script runs up to the print(journal) statement, where an error is thrown saying the element cannot be found.
Can anyone take a quick sweep of the page in question and confirm I've got the general premise of this script correct, or point me toward some examples to get me going on this problem?
Thanks!
The DOM elements you are trying to find are located inside an iframe, so before using the find_element_by_id API you need to switch to the iframe's context.
Here is code showing how to switch to the iframe context:
# add your code
frame_ref = browser.find_elements_by_tag_name("iframe")[0]
browser.switch_to.frame(frame_ref)
journal = browser.find_element_by_id("PeriodicoCmb1_Input")
timeRange = browser.find_element_by_id("PeriodoCmb1_Input")
searchTerm = browser.find_element_by_id("PesquisaTxt1")
# add your code
The Selenium documentation describes switching to an iframe context in more detail.
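If you later need to locate elements outside the iframe again, switch back to the top-level document first; a short sketch:
# return to the top-level document after working inside the iframe
browser.switch_to.default_content()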
