Selenium Extraction Problems: Waits/Not Finding Elements - python

In both Chrome and Firefox, everything is fine up until I need to extract text. I get this error:
h3 = next(element for element in h3s if element.is_displayed())
StopIteration
I even added a fluent wait.
browser = webdriver.Firefox()
browser.get('https://www.voilanorbert.com/')
inputElement = browser.find_element_by_id("form-search-name")
inputElement.send_keys(leadslist[i][0])
inputElement = browser.find_element_by_id("form-search-domain")
inputElement.send_keys(leadslist[i][1])
searchbutton = browser.find_element_by_name("search")
searchbutton.click()
wait = WebDriverWait(browser, 20)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.results")))
wait2 = WebDriverWait(browser, 3000, poll_frequency=100, ignored_exceptions=[ElementNotVisibleException])
wait2.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h3.one")))
h3s = browser.find_elements_by_css_selector('h3.one')
h3 = next(element for element in h3s if element.is_displayed())
result = h3.text
I think it's because it's not actually extracting anything, so it's just an empty list.
[Before and after screenshots of the page were attached here.]
I need to extract what is in the "text-center displayed" class of the "result" class.

The answer is fairly simple: you just need a different selector when waiting for the search result.
The approach below (C#) works, and it will shorten your code by a few lines.
One "result DIV" becomes visible when the search is done. It is the only element with the "text-center displayed" class, so that's all your selector needs.
Once such a DIV is displayed, you know where to pinpoint the H3 element (it's a child of said DIV).
So simply wait for the below element to become visible after you've clicked the search button:
IWebElement headerResult = w.Until(ExpectedConditions.ElementIsVisible(By.CssSelector("div[class=\"text-center displayed\"] h3")));
string result = headerResult.Text;
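For readers following along in Python, the same idea can be sketched roughly as below. This is a hand-rolled polling loop, not Selenium's own WebDriverWait, so it needs nothing beyond the standard library. The function name first_visible_text and its parameters are made up for illustration. It assumes a driver whose find_elements(by, value) accepts the strategy string "css selector" (the value behind By.CSS_SELECTOR), and it reuses the selector from the C# line above.

```python
import time

CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR
RESULT_HEADER = 'div[class="text-center displayed"] h3'  # selector from the answer


def first_visible_text(driver, selector=RESULT_HEADER, timeout=20.0, poll=0.5):
    """Poll until an element matching `selector` is displayed; return its text.

    Returns None on timeout instead of raising StopIteration.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(CSS, selector)
        # next() with a default avoids StopIteration when nothing is visible yet
        visible = next((el for el in elements if el.is_displayed()), None)
        if visible is not None:
            return visible.text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

Calling result = first_visible_text(browser) after searchbutton.click() would then replace the two waits and the next(...) line from the question.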

Related

Scraping the total time uploaded to a Youtube channel with BeautifulSoup or selenium

I'm trying to grab the individual video lengths for all videos on one channel and store it in a list or something.
So first I tried Beautiful Soup by using requests library and doing findAll("div") but I get nothing useful. None of the elements look at all like the inspect element on the youtube channel page. Apparently it's because YouTube loads dynamically or something. So you have to use selenium. Idk what that means, but anyway I tried selenium and got this error:
Unable to locate element: {"method":"css selector","selector":"[id="video-title"]"}
from this code:
from selenium import webdriver
from selenium.webdriver.common.by import By
PATH = (path\chromedriver.exe)
driver = webdriver.Chrome(PATH)
driver.get(r"https://www.youtube.com/c/0214mex/videos?view=0&sort=dd&shelf_id=0")
print(driver.title)
search = driver.find_element(By.ID,"video-title")
print(search)
driver.quit()
I get the feeling I don't really understand how web scraping works. Usually if I wanted to grab elements from a webpage I'd just do the soup thing, findAll on div and then keep going down until I reached the a tag or whatever I needed. But I'm having no luck with doing that on YT channel pages.
Is there an easy way of doing this? I can clearly see the hierarchy when I do inspect element on the YouTube page. It goes:
body -> div id=content -> ytd-browse class... -> ytd-two-column-browse-results... -> div id=primary -> div id=contents -> div id =items -> div id = dismissible -> div id =details -> div id=meta -> h3 class... -> and inside an a tag there's all the information I need.
I'm probably naive for thinking that if I simply findAll on "div" it would just show me all the divs, I'd then go to the last one div id=meta and then searchAll "h3" and then search "a" tags and I'd have my info. But searching for "div" with findAll (in BeautifulSoup) has none of those divs and actually the ones it comes up with I can't even find in the select element thing.
So yeah, I seem to be misunderstanding how the findAll thing works. Can anyone provide a simple step-by-step way of getting the information which I'm looking for? Is it impossible without using selenium?
Problem explanation
YouTube loads its content dynamically: the further you scroll down, the more videos it adds to the page.
Selenium has to do the same thing: scroll down, then add the newly loaded titles to the list. New items typically take a few seconds to load, so an explicit wait helps you get every title.
You need to maximize the window and add time.sleep(.5) for visibility and a bit of stability.
Since the page is dynamic, I have defined number_of_title_to_scrape = 100, meaning the loop grabs 100 titles; you can use any sensible number and the script should do the rest.
Solution
driver = webdriver.Chrome(driver_path)  # driver_path: path to your chromedriver
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.youtube.com/c/0214mex/videos")
wait = WebDriverWait(driver, 20)
video_title = []
number_of_title_to_scrape = 100
for j in range(number_of_title_to_scrape):
    # Re-query on every pass: scrolling makes YouTube load more videos.
    elements = driver.find_elements(By.XPATH, "//a[@id='video-title']")
    if j >= len(elements):
        break  # no more videos were loaded
    driver.execute_script("arguments[0].scrollIntoView(true);", elements[j])
    time.sleep(.5)
    title = wait.until(EC.visibility_of(elements[j]))
    print(title.text)
    video_title.append(title.text)
Imports :
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output :
【遊戯王】千年パズルが難易度最高レベルでマジで無理wwwww
【ルームツアー】3億円の豪邸の中ってどうなってるの?
コムドットと心霊スポット行ったらヤンキー達に道を塞がれた。
【事件】コムドットやまと。盗撮した人が変装したはじめしゃちょーでもキレ過ぎて分からない説wwwww
はじめしゃちょーで曲を作ってみた【2021年ver.】
3億円の豪邸を買いました!
【コアラのマーチ】新種のコアラ考えたら商品化されてしまいましたwwwww
ヨギボー100万円分買って部屋に全部置いてみた結果wwwww
自販機からコーラが止まらなくなりました
絶対にむせる焼きそばがあるらしいぞ!んなわけねえよ!
はじめしゃちょーがsumika「Lovers」歌ってみた【THE FIRST TAKE】
Mr.マリックのマジックを全部失敗させたら空気が...
ビビりの後輩を永遠にビックリさせてみたwwwwwwww
【泣くなはじめ】大家さん。今まで6年間ありがとうございました。
液体窒素を口の中に入れてみたwwwwww
ヒカルに1億円の掴み取りさせたら大変な事になってしまった。
ヒカルさんに縁切られました。電話してみます。
玄関に透明なアクリル板があったらぶつかる説
なんとかしてください。
【+1 FES 3rd STAGE】誰が最強?YouTuber瓦割り対決!
【実況付き】アナウンサーとニュースっぽく心霊スポット行ったら怖くない説wwwww
体重が軽過ぎる男がシーソーに乗った結果wwwww
糸電話を10年間待ち続けた結果…
愛車を人にあげる事になったので最後に改造しました。
【閲覧注意】ゴギブリを退治したいのに勢いだけで結局何もできないヤツ
【12Kg】巨大なアルミ玉の断面ってどうなってるの?切断します。
打つのに10年間かかる超スローボールを投げた結果…
人の家の前にジンオウガがいた結果wwwwwwww
【野球】高く打ち上げたボールを10年間待った結果…
シャンプー泡立ったままでイルカショー見に行ってみたwwwwwww
水に10年間潜り続けた男
ランニングマシン10年間走り続けた結果…
コロナ流行ってるけど親友の結婚式行ってきた。
10年間タピオカ吸い続けた結果...
【バスケ】スリーポイント10000回練習したぞ。
かくれんぼ10年間放置した結果...
【危険】24時間エスカレーター生活してみた。
1日ヒカキンさんの執事として働いてみたwwwww
人が食ってたパスタを液体窒素でカチカチにしといたwwwwwwwwww
1人だけ乗ってる観覧車が檻になってるドッキリwwwwwww
【検証】コカ・コーラが1番美味しいのはどんなシチュエーション?
はじめしゃちょーをアニメ風に描いてもらった結果wwwww
コーラの容器がブツブツになった結果wwwww #shorts
絶対にケガする重い扉を倒してみた結果wwwwww
ショートケーキの缶売ってたwwwww #shorts
ガチの事故物件で1日生活してみたら何か起こるの?
【初公開】はじめしゃちょーの1日密着動画。
コーラに油を混ぜてメントス入れたらすごい事になる?! #shorts
【拡散希望】河野大臣…コロナワクチンって本当に大丈夫なん…?
ヤバい服見つけたんだがwwwwwwwwww
ヒカキンさんにチャンネル登録者数抜かれました。
What if the classroom you were hiding in the locker was a women's changing room?
コーラがトゲトゲになった結果wwwwww #shorts
エヴァンゲリオンが家の前に立っていたら...?
夏の始まり。1人で豪華客船を貸し切ってみた。女と行きたかった。
【検証】大食いYouTuber VS オレと業務用調理器。どっちが早いの?
天気良いのにオレの家だけ雨降らせたらバレる?バレない?wwwww
3000円ガチャ発見!PS5当たるまで回したらヤバい金額にwwwwww
カラオケで入れた曲のラスサビ永遠ループドッキリwwwwwwww
【ラーメン】ペヤング超超超超超超大盛りペタマックスの新作出たwwwwwww食います
深夜に急に家に入ってくる配達員。
オレは社会不適合者なのか
巨大なクマさん買い過ぎちゃった!
GACKTさん。オレGACKTさんみたいになりたいっス。
100万円の世界最強のスピーカー買ったんやけど全てがヤバいwwwww
【奇妙】ヒカキンさんにしか見えない人が1日中めっちゃ倒れてたらどうする?
ヘリウムガス吸い過ぎたら一生声が戻らなくなるドッキリwwwwwwww
スマブラ世界最強の男 VS 何でもして良いはじめしゃちょー
山田孝之とはじめしゃちょーの質問コーナー!そして消えた200万円。
山田孝之さんにめちゃくちゃ怒られました。
ヒカキンじゃんけんで絶対チョキを出させる方法を発見wwwwwwww
6年ぶりに銅羅で起こされたら同じ反応するの?
バイト先の後輩だった女性と結婚しました。
フォーエイトはエイトフォーで捕まえられるの?
ジムが素敵な女の子だらけだったら限界超えてバーベルめっちゃ上がる説
同棲?
はじめしゃちょー。バイクを買う。
【実話】過去に女性化していた事を話します。
【近未来】自走するスーツケースを買いました。もう乗り物。
ジェットコースターのレールの上を歩いてみた。
バカな後輩ならパイの実がめっちゃ大きくなってても気づかねえよwwwwwwww
久しぶりに他人を怒鳴ったわ
【42万円】Amazonですごいモノが売ってたので買いました。そして人の家の前へ
ペヤングの最新作がヤバ過ぎて全部食べれませんでした。
人の家の前で日本刀を持ったクマさんがずっと待ってる動画
3Pシュートを10000回練習したらどれくらい上手くなるの?【〜5000回】
おい佐藤二朗。オレとやり合おうや。
ひとりぼっちの君へ。
これはオレのしたかった東京の生活じゃない。
巨大なクマのぬいぐるみを浮かせたい
バスケットボール100個で試合したらプロに勝てるんじゃね?
【閲覧注意】100デシベル以上でねるねるねるね作ったら日本1うるさい動画になったwwwwww
オレずっと筋トレ続けてたんスよ。
収録までさせて勝手にDr.STONEの世界にオレがいる話作ってみた
失禁マシーンってのがあるらしいぞwwwwwwwww
謎の部屋に閉じ込められました。おや?真ん中になにかあるぞ?
家に来てほしくないから看板作りました。
これが未来のサウナです。
【恐怖映像】オレの後輩がガチでクズすぎる
就活あるある【ゲストが豪華】
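A variant of the scrolling loop that also stops on its own when the channel has no more videos to load could look roughly like this. It is only a sketch: collect_titles, max_idle_rounds and the other parameter names are made up here, and the driver is assumed to be a Selenium driver whose find_elements(by, value) accepts the strategy string "xpath" (the value behind By.XPATH).

```python
import time

XPATH = "xpath"  # same string as selenium's By.XPATH
TITLE_LINKS = "//a[@id='video-title']"


def collect_titles(driver, limit=100, pause=0.5, max_idle_rounds=3):
    """Scroll through the video list and return up to `limit` titles.

    Gives up after `max_idle_rounds` passes in which no new links appeared.
    """
    titles = []
    idle = 0
    while len(titles) < limit and idle < max_idle_rounds:
        links = driver.find_elements(XPATH, TITLE_LINKS)
        if len(links) <= len(titles):
            idle += 1          # nothing new loaded yet; wait and retry
            time.sleep(pause)
            continue
        idle = 0
        # Only handle links we have not seen yet, and never more than `limit`.
        for link in links[len(titles):limit]:
            driver.execute_script("arguments[0].scrollIntoView(true);", link)
            titles.append(link.text)
        time.sleep(pause)      # give the page time to load the next batch
    return titles
```

The idle counter is a design choice: a fixed number such as 100 breaks on channels with fewer videos, while counting empty passes lets the loop end once scrolling stops producing new links.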
If you want a specific number of videos, use a for loop as mentioned in another answer. The code below keeps scrolling until you manually close the browser.
from selenium import webdriver
driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.maximize_window()
driver.implicitly_wait(20)
driver.get("https://www.youtube.com/c/0214mex/videos")
j = 0
try:
    while True:  # Infinite loop, to keep scrolling.
        # Find all the videos. Initially there are 30; the list grows as we scroll down.
        videos = driver.find_elements_by_id("dismissible")
        # Scroll to the jth video so that more videos get loaded.
        driver.execute_script("arguments[0].scrollIntoView(true);", videos[j])
        # Get the name of the jth video.
        video_name = videos[j].find_element_by_tag_name("h3").get_attribute("innerText")
        # Get the length of the jth video.
        video_length = videos[j].find_element_by_xpath(".//span[@class='style-scope ytd-thumbnail-overlay-time-status-renderer']").text
        print("{}: {}-{}".format(j+1, video_name, video_length))  # Print in a specific format.
        j += 1
except Exception as e:
    print(e)
driver.quit()
Output: (Manually closed the browser)
1: 【遊戯王】千年パズルが難易度最高レベルでマジで無理wwwww-12:26
2: 【ルームツアー】3億円の豪邸の中ってどうなってるの?-8:20
3: コムドットと心霊スポット行ったらヤンキー達に道を塞がれた。-18:08
...
294: これがwww世界1巨大なwww人をダメにするソファwwwwww-8:06
295: 皆さまにお願いがあります。-4:18
Message: no such window: target window already closed
from unknown error: web view not found
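Since the original goal was the total time uploaded, the scraped lengths still need to be added up. A small sketch of that step, assuming every label has the form M:SS or H:MM:SS as in the output above (parse_duration and format_duration are names made up for this example; anything else, such as an empty string, would raise ValueError):

```python
def parse_duration(label):
    """Convert 'M:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in label.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_duration(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return "%d:%02d:%02d" % (hours, minutes, secs)


lengths = ["12:26", "8:20", "18:08"]  # the first three lengths printed above
total = sum(parse_duration(label) for label in lengths)
print(format_duration(total))  # prints 0:38:54
```

In the scraping loop you would append video_length to a list instead of only printing it, then run the sum once the loop ends.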

Pop-up saying element not interactable Selenium

There are two pop-ups: one that asks if you live in California, and a second one (screenshot omitted).
The second pop-up doesn't show up every time, and when it doesn't, the function works. When it does, I get an element not interactable error and I don't know why. (A screenshot of the inspector for the second pop-up's close button was attached here.)
Here is my code:
test_data = ['Los Angeles','San Ramon']
base_url = "https://www.hunterdouglas.com/locator"
def pop_up_one(base_url):
    driver.get(base_url)
    try:
        submit_btn = WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"[aria-label='No, I am not a California resident']")))
        submit_btn.click()
        time.sleep(5)
        url = driver.current_url
        submit_btn = WebDriverWait(driver,5).until(EC.presence_of_element_located((By.XPATH, "//*[contains(@class,'icon')]")))
        print(submit_btn.text)
        #submit_btn.click()
    except Exception as t:
        url = driver.current_url
        print(t)
        return url
    else:
        url = driver.current_url
        print("second pop_up clicked")
        return url
I have tried selecting by the aria-label, class_name, xpath, etc. The way I have it now, printing the element shows that there is a Selenium web element, but it doesn't let me click it for some reason. Any direction appreciated. Thanks!
There are 41 elements on that page matching the //*[contains(@class,'icon')] XPath locator. At least the first of them is not visible and not clickable, so trying to click this submit_btn element gives the element not interactable error.
Since this element does not always appear, you should click it only when it is actually there.
With the correct, unique locator, your code can be something like this:
submit_btn = WebDriverWait(driver,5).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"[aria-label='No, I am not a California resident']")))
submit_btn.click()
time.sleep(5)
url = driver.current_url
submit_btn = driver.find_elements_by_css_selector('button[aria-label="Close"]')
if submit_btn:
    submit_btn[0].click()
Here I'm using find_elements_by_css_selector, which returns a list of web elements. If no element is found, the list is empty, and Python treats an empty list as False.
If the element is found, we click the first element in the returned list, which will be the desired one.
UPD
The locator above is written in CSS selector syntax.
It can also be written in XPath syntax as //button[@aria-label="Close"]
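The "click it only if it is there" logic can be wrapped in a small helper. This is a sketch under assumptions: click_if_present is a made-up name, and the driver is assumed to accept find_elements(by, value) with the strategy string "css selector" (the value behind By.CSS_SELECTOR).

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def click_if_present(driver, selector):
    """Click the first displayed element matching `selector`, if any.

    Returns True when something was clicked, False otherwise.
    """
    # find_elements returns an empty list (no exception) when nothing matches
    for element in driver.find_elements(CSS, selector):
        if element.is_displayed():
            element.click()
            return True
    return False
```

For the second pop-up this would be called as click_if_present(driver, 'button[aria-label="Close"]'). Checking is_displayed() first skips hidden matches, which is the situation that produced the element not interactable error.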

Selenium can't find xpath or css selector

I'm trying to scrape this site that has pagination. The problem I'm facing is having selenium locate the next button.
What I've tried:
next_button = driver.find_element_by_css_selector(
'ul[class="css-12ke8jn e65zztl0"] button[aria-label="Next"]').click()
and
page_amount = driver.find_element_by_css_selector(
'/html/body/div[1]/div[2]/div/div/div/div[2]/div[1]/main/div[3]/div/div[2]/div[2]/nav/ul/button').click()
None of these work and I'm kinda stuck. The reason I'm using aria-label for the first one is because when the next button is selected the previous button changes to the same class as the next button. Note: The button is inside a ul.
The click may fail because the element is not visible in the UI: it is loaded, but not in view. The easiest way is to move to that element and then click on it.
from selenium.webdriver.common.action_chains import ActionChains
next_button = driver.find_element_by_css_selector('[aria-label="Next"]')
actions = ActionChains(driver)
actions.move_to_element(next_button).perform()
next_button.click()
driver.find_element_by_xpath('//button[@class="css-1lkjxdl eanm77i0"]').click()
You were passing an XPath expression to find_element_by_css_selector. For a CSS selector you have to use the class (.css-1lkjxdl) instead; the XPath version above should work.
aria-label is an attribute, not an element.
Your XPath should be fixed as follows:
button[@aria-label="Next"]
To find this button anywhere in the page, you can try:
//button[@aria-label="Next"]
Then, you can try:
button = driver.find_element_by_xpath('//button[@aria-label="Next"]')
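Since the mix-up in the question was between the two locator syntaxes, it may help to see the same attribute match written both ways. The two helpers below are made up for illustration and assume the value contains no double quotes.

```python
def css_for(tag, attr, value):
    """Build a CSS selector matching <tag attr="value">."""
    return '%s[%s="%s"]' % (tag, attr, value)


def xpath_for(tag, attr, value):
    """Build an XPath matching <tag attr="value"> anywhere in the page."""
    return '//%s[@%s="%s"]' % (tag, attr, value)


print(css_for("button", "aria-label", "Next"))    # button[aria-label="Next"]
print(xpath_for("button", "aria-label", "Next"))  # //button[@aria-label="Next"]
```

The first string goes to a CSS selector lookup and the second to an XPath lookup. A path that starts with /html/body/... is XPath and will not work as a CSS selector.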

Blocking login overlay window when scraping web page using Selenium

I am trying to scrape a long list of books in 10 web pages. When the loop clicks on next > button for the first time the website displays a login overlay so selenium can not find the target elements.
I have tried all the possible solutions:
Use some chrome options.
Use try-except to click X button on the overlay. But it appears only one time (when clicking next > for the first time). The problem is that when I put this try-except block at the end of while True: loop, it became infinite as I use continue in except as I do not want to break the loop.
Add some popup blocker extensions to Chrome but they do not work when I run the code although I add the extension using options.add_argument('load-extension=' + ExtensionPath).
This is my code:
options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('disable-avfoundation-overlays')
options.add_argument('disable-internal-flash')
options.add_argument('no-proxy-server')
options.add_argument("disable-notifications")
options.add_argument("disable-popup")
Extension = (r'C:\Users\DELL\AppData\Local\Google\Chrome\User Data\Profile 1\Extensions\ifnkdbpmgkdbfklnbfidaackdenlmhgh\1.1.9_0')
options.add_argument('load-extension=' + Extension)
options.add_argument('--disable-overlay-scrollbar')
driver = webdriver.Chrome(options=options)
driver.get('https://www.goodreads.com/list/show/32339._50_?page=')
wait = WebDriverWait(driver, 2)
review_dict = {'title':[], 'author':[],'rating':[]}
html_soup = BeautifulSoup(driver.page_source, 'html.parser')
prod_containers = html_soup.find_all('table', class_ = 'tableList js-dataTooltip')
while True:
    table = driver.find_element_by_xpath('//*[@id="all_votes"]/table')
    for product in table.find_elements_by_xpath(".//tr"):
        for td in product.find_elements_by_xpath('.//td[3]/a'):
            title = td.text
            review_dict['title'].append(title)
        for td in product.find_elements_by_xpath('.//td[3]/span[2]'):
            author = td.text
            review_dict['author'].append(author)
        for td in product.find_elements_by_xpath('.//td[3]/div[1]'):
            rating = td.text[0:4]
            review_dict['rating'].append(rating)
    try:
        close = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div[3]/div/div/div[1]/button')))
        close.click()
    except NoSuchElementException:
        continue
    try:
        element = wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'next_page')))
        element.click()
    except TimeoutException:
        break
df = pd.DataFrame.from_dict(review_dict)
df
Any help is appreciated: can I change the while loop to a for loop that clicks the next > button until the end, where should I put the try-except block that closes the overlay, or is there a Chrome option that disables the overlay?
Thanks in advance
Thank you for sharing your code and the website that you are having trouble with. I was able to close the login modal by using XPath. I took this challenge and broke up the code using class objects. One object is for the selenium.webdriver.chrome.webdriver and the other is for the page that you wanted to scrape ( https://www.goodreads.com/list/show/32339 ). In the following methods, I used the JavaScript call return arguments[0].scrollIntoView(); to scroll to the last book displayed on the page. After that, I was able to click the next button:
def scroll_to_element(self, xpath : str):
    element = self.chrome_driver.find_element(By.XPATH, xpath)
    self.chrome_driver.execute_script("return arguments[0].scrollIntoView();", element)
def get_book_count(self):
    return len(self.chrome_driver.find_elements(By.XPATH, "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr"))
def click_next_page(self):
    # Scroll to last record and click "next page"
    xpath = "//div[@id='all_votes']//table[contains(@class, 'tableList')]//tbody//tr[{0}]".format(self.get_book_count())
    self.scroll_to_element(xpath)
    self.chrome_driver.find_element(By.XPATH, "//div[@id='all_votes']//div[@class='pagination']//a[@class='next_page']").click()
Once I clicked on the "Next" button, I saw the modal display. I was able to find the xpath for the modal and was able to close the modal.
def is_displayed(self, xpath: str, timeout: int = 5):
    # DriverWait and DriverConditions are this answer's own names, presumably
    # aliases for WebDriverWait and expected_conditions.
    try:
        webElement = DriverWait(self.chrome_driver, timeout).until(
            DriverConditions.presence_of_element_located(locator = (By.XPATH, xpath))
        )
        return webElement is not None
    except Exception:
        return False
def is_modal_displayed(self):
    return self.is_displayed("//body[@class='modalOpened']")
def close_modal(self):
    self.chrome_driver.find_element(By.XPATH, "//div[@class='modal__content']//div[@class='modal__close']").click()
    if self.is_modal_displayed():
        raise Exception("Modal Failed To Close")
I hope this helps you to solve your problem.
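On the question of where to put the overlay handling so the loop cannot become infinite, one possible shape is a bounded for loop that closes the modal only when it is present. This is a sketch, not a tested scraper. scrape_all_pages and scrape_page are made-up names, the modal XPath is the one from the answer above, and "a.next_page" assumes the next link is an a element with the next_page class used in the question. The driver is assumed to accept find_elements(by, value) with the strategy strings "css selector" and "xpath", and the modal is assumed to be in the DOM by the time the next pass starts.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR
XPATH = "xpath"       # same string as selenium's By.XPATH
MODAL_CLOSE = "//div[@class='modal__content']//div[@class='modal__close']"
NEXT_PAGE = "a.next_page"  # assumption based on the class name in the question


def scrape_all_pages(driver, scrape_page, max_pages=10):
    """Call scrape_page(driver) on each page and follow the next link.

    Stops after `max_pages` pages or when no next link is left.
    """
    results = []
    for _ in range(max_pages):
        # Close the login overlay only if it is there: no exception, no continue.
        for button in driver.find_elements(XPATH, MODAL_CLOSE):
            if button.is_displayed():
                button.click()
        results.extend(scrape_page(driver))
        next_links = driver.find_elements(CSS, NEXT_PAGE)
        if not next_links:
            break  # last page reached
        next_links[0].click()
    return results
```

Here scrape_page would hold the three inner for loops from the question and return the rows it found. Because the overlay check never raises and never skips the rest of the pass, the next link is always reached.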

How to select a value from a drop-down using Selenium from a website with special setting- Python

Note: I particularly deal with this website
How can I use selenium with Python to get the reviews on this page to sort by 'Most recent'?
What I tried was:
driver.find_element_by_id('sort-order-dropdown').send_keys('Most recent')
This (taken from another answer) didn't cause any error, but it didn't work either.
Then I tried
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element_by_id('sort-order-dropdown'))
select.select_by_value('recent')
select.select_by_visible_text('Most recent')
select.select_by_index(1)
I've got: Message: Element <select id="sort-order-dropdown" class="a-native-dropdown" name=""> is not clickable at point (66.18333435058594,843.7999877929688) because another element <span class="a-dropdown-prompt"> obscures it
This one
element = driver.find_element_by_id('sort-order-dropdown')
element.click()
li = driver.find_elements_by_css_selector('#sort-order-dropdown > option:nth-child(2)')
li.click()
caused the same error message (this attempt also came from another answer).
This one, again from another answer, also caused the same error:
Select(driver.find_element_by_id('sort-order-dropdown')).select_by_value('recent').click()
So, I'm curious to know if there is any way that I can select the reviews to sort from the most recent first.
Thank you
This worked for me using Java:
@Test
public void amazonTest() throws InterruptedException {
    String URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews";
    String menuSelector = ".a-dropdown-prompt";
    String menuItemSelector = ".a-dropdown-common .a-dropdown-item";
    driver.get(URL);
    Thread.sleep(2000);
    WebElement menu = driver.findElement(By.cssSelector(menuSelector));
    menu.click();
    List<WebElement> menuItem = driver.findElements(By.cssSelector(menuItemSelector));
    menuItem.get(1).click();
}
You can reuse the element names and follow a similar path using Python.
The key points here are:
Click on the menu itself
Click on the second menu item
It is a better practice not to hard-code the item number but actually read the item names and select the correct one so it works even if the menu changes. This is just a note for future improvement.
EDIT
This is how the same can be done in Python.
import time
from selenium import webdriver
URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews"
menuSelector = ".a-dropdown-prompt"
menuItemSelector = ".a-dropdown-common .a-dropdown-item"
driver = webdriver.Chrome()
driver.get(URL)
elem = driver.find_element_by_css_selector(menuSelector)
elem.click()  # open the sort menu
time.sleep(1)
elemItems = driver.find_elements_by_css_selector(menuItemSelector)
elemItems[1].click()  # click the second menu item
time.sleep(5)
driver.close()
Keep in mind that CSS selectors are often a better alternative to XPath, as they tend to be faster, more robust and easier to read and change.
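Following the note above about not hard-coding the item number, the menu item can be picked by its visible text instead. A rough sketch: choose_menu_item is a made-up name, the two selectors are the ones from the answer, and the driver is assumed to accept find_elements(by, value) with the strategy string "css selector" (the value behind By.CSS_SELECTOR). It also assumes the items are already rendered right after the menu is clicked; if they are not, a short wait between the two steps would be needed.

```python
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR
MENU = ".a-dropdown-prompt"                         # selectors from the answer
MENU_ITEMS = ".a-dropdown-common .a-dropdown-item"


def choose_menu_item(driver, label):
    """Open the dropdown and click the item whose text equals `label`.

    Returns True if a matching item was clicked, False otherwise.
    """
    menus = driver.find_elements(CSS, MENU)
    if not menus:
        return False
    menus[0].click()  # open the menu itself
    wanted = label.strip().lower()
    for item in driver.find_elements(CSS, MENU_ITEMS):
        if item.text.strip().lower() == wanted:
            item.click()
            return True
    return False
```

It would be called as choose_menu_item(driver, "Most recent"), and it keeps working if the order of the menu entries changes.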
This is a simplified version of what I did to get the reviews sorted from the most recent ones. As "Eugene S" said above, the key point is to click on the button itself and then select/click the desired item from the list. However, my Python code uses XPath instead of a CSS selector.
# click on "Top rated" button
driver.find_element_by_xpath('//*[@id="a-autoid-4-announce"]').click()
# this one select the "Most recent"
driver.find_element_by_xpath('//*[@id="sort-order-dropdown_1"]').click()
