I have been looking EVERYWHERE for some form of help on any method on python to web scrape all nba props from app.prizepicks.com. I came down to 2 potential methods: API with pandas and selenium. I believe prizepicks recently shut down their api system to restrain from users from scraping the nba props, so to my knowledge using selenium-stealth is the only way possible to web scrape the prizepicks nba board. Can anyone please help me with, or provide a code that scrapes prizepicks for all nba props? The information needed would be the player name, prop type (such as points, rebounds, 3-Pt Made, Free throws made, fantasy, pts+rebs, etc.), prop line (such as 34.5, 8.5, which could belong to a prop type such as points and rebounds, respectively). I would need this to work decently quickly and refresh every set amount of minutes. I found something similar to what i would want provided in another thread by 'C. Peck'. Which I will provide (hopefully, i dont really know how to use stackoverflow). But the code that C. Peck provided does not work on my device and i was wondering if anyone here write a functional code/fix this code to work for me. I have a macbook pro so i dont know if that affects anything.
EDIT
After a lot of trial and error, and help from the thread, I have managed to complete the first step. I am able to webscrape from the "Points" tab on the nba league of prizepicks, but I want to scrape all the info from every tab, not just points. I honestly dont know why my code isnt fully working, but i basically want it to scrape points, rebounds, assists, fantasy, etc... Let me know any fixes i should do to be able to scrape for every stat_element in the stat_container, or other methods too! Ill update the code below:
EDIT AGAIN
it seems like the problem lies in the "stat-container" and "stat-elements". I checked to see what elements the "stat-elements" has, and it is only points. I checked to see what elements the "stat-container" has, and it gave me an error. I believe if someone helps me with that then the problem will be fixed. This is the error it gives when i try to see the elements inside of "stat-container": line 27, in
for element in stat_container:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'WebElement' object is not iterable
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://app.prizepicks.com/")
driver.find_element(By.CLASS_NAME, "close").click()
time.sleep(2)
driver.find_element(By.XPATH, "//div[#class='name'][normalize-space()='NBA']").click()
time.sleep(2)
# Wait for the stat-container element to be present and visible
stat_container = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, "stat-container")))
# Find all stat elements within the stat-container
stat_elements = driver.find_elements(By.CSS_SELECTOR, "div.stat")
# Initialize empty list to store data
nbaPlayers = []
# Iterate over each stat element
for stat in stat_elements:
# Click the stat element
stat.click()
projections = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".projection")))
for projection in projections:
names = projection.find_element(By.XPATH, './/div[#class="name"]').text
points= projection.find_element(By.XPATH, './/div[#class="presale-score"]').get_attribute('innerHTML')
text = projection.find_element(By.XPATH, './/div[#class="text"]').text
print(names, points, text)
players = {
'Name': names,
'Prop':points, 'Line':text
}
nbaPlayers.append(players)
df = pd.DataFrame(nbaPlayers)
print(df)
driver.quit()
Answer updated with the working code. I made some small changes
Replaced stat_elements with categories, which is a list containing the stat names.
Loop over categories and click the div button with label equal to the current category name
Add .replace('\n','') at the end of the text variable
.
driver.get("https://app.prizepicks.com/")
driver.find_element(By.CLASS_NAME, "close").click()
time.sleep(2)
driver.find_element(By.XPATH, "//div[#class='name'][normalize-space()='NBA']").click()
time.sleep(2)
# Wait for the stat-container element to be present and visible
stat_container = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, "stat-container")))
# Find all stat elements within the stat-container
# i.e. categories is the list ['Points','Rebounds',...,'Turnovers']
categories = driver.find_element(By.CSS_SELECTOR, ".stat-container").text.split('\n')
# Initialize empty list to store data
nbaPlayers = []
# Iterate over each stat element
for category in categories:
# Click the stat element
line = '-'*len(category)
print(line + '\n' + category + '\n' + line)
driver.find_element(By.XPATH, f"//div[text()='{category}']").click()
projections = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".projection")))
for projection in projections:
names = projection.find_element(By.XPATH, './/div[#class="name"]').text
points= projection.find_element(By.XPATH, './/div[#class="presale-score"]').get_attribute('innerHTML')
text = projection.find_element(By.XPATH, './/div[#class="text"]').text.replace('\n','')
print(names, points, text)
players = {'Name': names, 'Prop':points, 'Line':text}
nbaPlayers.append(players)
pd.DataFrame(nbaPlayers)
Related
I am only a hobbyist with Python, so please bare with me. I am trying to run this script to collect Trip Advisor reviews and write to excel.
But once it opens up the website it throws this error: NoSuchElementException no such element: Unable to locate element: {"method":"xpath","selector":".//q[#class='IRsGHoPm']"}
Any got any ideas on what is going wrong?
import csv #This package lets us save data to a csv file
from selenium import webdriver #The Selenium package we'll need
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
import time #This package lets us pause execution for a bit
path_to_file = "E:Desktop/Data/Reviews.csv"
pages_to_scrape = 3
url = "https://www.tripadvisor.com/Hotel_Review-g60982-d209422-Reviews-Hilton_Waikiki_Beach-Honolulu_Oahu_Hawaii.html"
# import the webdriver
driver = webdriver.Chrome()
driver.get(url)
# open the file to save the review
csvFile = open(path_to_file, 'a', encoding="utf-8")
csvWriter = csv.writer(csvFile)
# change the value inside the range to save the number of reviews we're going to grab
for i in range(0, pages_to_scrape):
# give the DOM time to load
time.sleep(5)
# Click the "expand review" link to reveal the entire review.
driver.find_element_by_xpath(".//div[contains(#data-test-target, 'expand-review')]").click()
# Now we'll ask Selenium to look for elements in the page and save them to a variable. First lets define a container that will hold all the reviews on the page. In a moment we'll parse these and save them:
container = driver.find_elements_by_xpath("//div[#data-reviewid]")
# Next we'll grab the date of the review:
dates = driver.find_elements_by_xpath(".//div[#class='_2fxQ4TOx']")
# Now we'll look at the reviews in the container and parse them out
for j in range(len(container)): # A loop defined by the number of reviews
# Grab the rating
rating = container[j].find_element_by_xpath(".//span[contains(#class, 'ui_bubble_rating bubble_')]").get_attribute("class").split("_")[3]
# Grab the title
title = container[j].find_element_by_xpath(".//div[contains(#data-test-target, 'review-title')]").text
#Grab the review
review = container[j].find_element_by_xpath(".//q[#class='IRsGHoPm']").text.replace("\n", " ")
#Grab the data
date = " ".join(dates[j].text.split(" ")[-2:])
#Save that data in the csv and then continue to process the next review
csvWriter.writerow([date, rating, title, review])
# When all the reviews in the container have been processed, change the page and repeat
driver.find_element_by_xpath('.//a[#class="ui_button nav next primary "]').click()
# When all pages have been processed, quit the driver
driver.quit()
enter image description here
I am trying to finding "_2fxQ4TOx" class name in the page but it came up with 0 results. May be these are the auto-generated classes by browser to keep up with compression and its own CSS reference.
Instead, try to find elements by these attributes (which actually exist in the frontend), try finding it like:
container = driver.find_element_by_css_selector('div[data-test-target="reviews-tab"]')
reviews = container.find_elements_by_css_selector('div[data-test-target="HR_CC_CARD"]')
for review in reviews:
# get date
# get descriptions
# ...
you can also refer to this link:
Is there a way to find an element by attributes in Python Selenium?
I'm trying to grab the individual video lengths for all videos on one channel and store it in a list or something.
So first I tried Beautiful Soup by using requests library and doing findAll("div") but I get nothing useful. None of the elements look at all like the inspect element on the youtube channel page. Apparently it's because YouTube loads dynamically or something. So you have to use selenium. Idk what that means, but anyway I tried selenium and got this error:
Unable to locate element: {"method":"css selector","selector":"[id="video-title"]"}
from this code:
from selenium import webdriver
from selenium.webdriver.common.by import By
PATH = (path\chromedriver.exe)
driver = webdriver.Chrome(PATH)
driver.get(r"https://www.youtube.com/c/0214mex/videos?view=0&sort=dd&shelf_id=0")
print(driver.title)
search = driver.find_element(By.ID,"video-title")
print(search)
driver.quit()
I get the feeling I don't really understand how web scraping works. Usually if I wanted to grab elements from a webpage I'd just do the soup thing, findAll on div and then keep going down until I reached the a tag or whatever I needed. But I'm having no luck with doing that on YT channel pages.
Is there an easy way of doing this? I can clearly see the hierarchy when I do inspect element on the YouTube page. It goes:
body -> div id=content -> ytd-browse class... -> ytd-two-column-browse-results... -> div id=primary -> div id=contents -> div id =items -> div id = dismissible -> div id =details -> div id=meta -> h3 class... -> and inside an a tag there's all the information I need.
I'm probably naive for thinking that if I simply findAll on "div" it would just show me all the divs, I'd then go to the last one div id=meta and then searchAll "h3" and then search "a" tags and I'd have my info. But searching for "div" with findAll (in BeautifulSoup) has none of those divs and actually the ones it comes up with I can't even find in the select element thing.
So yeah, I seem to be misunderstanding how the findAll thing works. Can anyone provide a simple step-by-step way of getting the information which I'm looking for? Is it impossible without using selenium?
Problem explanation
YouTube is dynamic in nature what it means is, basically the more you load by doing scroll down, the more content it will show. So yes that's dynamic in nature.
So even Selenium understand the same thing, scroll down and add more items title into the list. Also they are typical take few seconds to load, so having an explicit waits will definitely help you get all the title.
You need to maximize the windows, put time.sleep(.5) for visibility and bit of stability. I have put range to 200, meaning grab 200 title, you can put any sensible arbitrary number and script should do the magic.
Also, since it is dynamic, I have defined number_of_title_to_scrape = 100, you can try with your desired number as well.
Solution
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(50)
driver.get("https://www.youtube.com/c/0214mex/videos")
wait = WebDriverWait(driver, 20)
video_title = []
len_of_videos= len(driver.find_elements(By.XPATH, "//a[#id='video-title']"))
j = 0
number_of_title_to_scrape = 100
for i in range(number_of_title_to_scrape):
elements = driver.find_elements(By.XPATH, "//a[#id='video-title']")
driver.execute_script("arguments[0].scrollIntoView(true);", elements[j])
time.sleep(.5)
title = wait.until(EC.visibility_of((elements[j])))
print(title.text)
video_title.append(title.text)
j = j +1
if j == number_of_title_to_scrape:
break
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output :
【遊戯王】千年パズルが難易度最高レベルでマジで無理wwwww
【ルームツアー】3億円の豪邸の中ってどうなってるの?
コムドットと心霊スポット行ったらヤンキー達に道を塞がれた。
【事件】コムドットやまと。盗撮した人が変装したはじめしゃちょーでもキレ過ぎて分からない説wwwww
はじめしゃちょーで曲を作ってみた【2021年ver.】
3億円の豪邸を買いました!
【コアラのマーチ】新種のコアラ考えたら商品化されてしまいましたwwwww
ヨギボー100万円分買って部屋に全部置いてみた結果wwwww
自販機からコーラが止まらなくなりました
絶対にむせる焼きそばがあるらしいぞ!んなわけねえよ!
はじめしゃちょーがsumika「Lovers」歌ってみた【THE FIRST TAKE】
Mr.マリックのマジックを全部失敗させたら空気が...
ビビりの後輩を永遠にビックリさせてみたwwwwwwww
【泣くなはじめ】大家さん。今まで6年間ありがとうございました。
液体窒素を口の中に入れてみたwwwwww
ヒカルに1億円の掴み取りさせたら大変な事になってしまった。
ヒカルさんに縁切られました。電話してみます。
玄関に透明なアクリル板があったらぶつかる説
なんとかしてください。
【+1 FES 3rd STAGE】誰が最強?YouTuber瓦割り対決!
【実況付き】アナウンサーとニュースっぽく心霊スポット行ったら怖くない説wwwww
体重が軽過ぎる男がシーソーに乗った結果wwwww
糸電話を10年間待ち続けた結果…
愛車を人にあげる事になったので最後に改造しました。
【閲覧注意】ゴギブリを退治したいのに勢いだけで結局何もできないヤツ
【12Kg】巨大なアルミ玉の断面ってどうなってるの?切断します。
打つのに10年間かかる超スローボールを投げた結果…
人の家の前にジンオウガがいた結果wwwwwwww
【野球】高く打ち上げたボールを10年間待った結果…
シャンプー泡立ったままでイルカショー見に行ってみたwwwwwww
水に10年間潜り続けた男
ランニングマシン10年間走り続けた結果…
コロナ流行ってるけど親友の結婚式行ってきた。
10年間タピオカ吸い続けた結果...
【バスケ】スリーポイント10000回練習したぞ。
かくれんぼ10年間放置した結果...
【危険】24時間エスカレーター生活してみた。
1日ヒカキンさんの執事として働いてみたwwwww
人が食ってたパスタを液体窒素でカチカチにしといたwwwwwwwwww
1人だけ乗ってる観覧車が檻になってるドッキリwwwwwww
【検証】コカ・コーラが1番美味しいのはどんなシチュエーション?
はじめしゃちょーをアニメ風に描いてもらった結果wwwww
コーラの容器がブツブツになった結果wwwww #shorts
絶対にケガする重い扉を倒してみた結果wwwwww
ショートケーキの缶売ってたwwwww #shorts
ガチの事故物件で1日生活してみたら何か起こるの?
【初公開】はじめしゃちょーの1日密着動画。
コーラに油を混ぜてメントス入れたらすごい事になる?! #shorts
【拡散希望】河野大臣…コロナワクチンって本当に大丈夫なん…?
ヤバい服見つけたんだがwwwwwwwwww
ヒカキンさんにチャンネル登録者数抜かれました。
What if the classroom you were hiding in the locker was a women's changing room?
コーラがトゲトゲになった結果wwwwww #shorts
エヴァンゲリオンが家の前に立っていたら...?
夏の始まり。1人で豪華客船を貸し切ってみた。女と行きたかった。
【検証】大食いYouTuber VS オレと業務用調理器。どっちが早いの?
天気良いのにオレの家だけ雨降らせたらバレる?バレない?wwwww
3000円ガチャ発見!PS5当たるまで回したらヤバい金額にwwwwww
カラオケで入れた曲のラスサビ永遠ループドッキリwwwwwwww
【ラーメン】ペヤング超超超超超超大盛りペタマックスの新作出たwwwwwww食います
深夜に急に家に入ってくる配達員。
オレは社会不適合者なのか
巨大なクマさん買い過ぎちゃった!
GACKTさん。オレGACKTさんみたいになりたいっス。
100万円の世界最強のスピーカー買ったんやけど全てがヤバいwwwww
【奇妙】ヒカキンさんにしか見えない人が1日中めっちゃ倒れてたらどうする?
ヘリウムガス吸い過ぎたら一生声が戻らなくなるドッキリwwwwwwww
スマブラ世界最強の男 VS 何でもして良いはじめしゃちょー
山田孝之とはじめしゃちょーの質問コーナー!そして消えた200万円。
山田孝之さんにめちゃくちゃ怒られました。
ヒカキンじゃんけんで絶対チョキを出させる方法を発見wwwwwwww
6年ぶりに銅羅で起こされたら同じ反応するの?
バイト先の後輩だった女性と結婚しました。
フォーエイトはエイトフォーで捕まえられるの?
ジムが素敵な女の子だらけだったら限界超えてバーベルめっちゃ上がる説
同棲?
はじめしゃちょー。バイクを買う。
【実話】過去に女性化していた事を話します。
【近未来】自走するスーツケースを買いました。もう乗り物。
ジェットコースターのレールの上を歩いてみた。
バカな後輩ならパイの実がめっちゃ大きくなってても気づかねえよwwwwwwww
久しぶりに他人を怒鳴ったわ
【42万円】Amazonですごいモノが売ってたので買いました。そして人の家の前へ
ペヤングの最新作がヤバ過ぎて全部食べれませんでした。
人の家の前で日本刀を持ったクマさんがずっと待ってる動画
3Pシュートを10000回練習したらどれくらい上手くなるの?【〜5000回】
おい佐藤二朗。オレとやり合おうや。
ひとりぼっちの君へ。
これはオレのしたかった東京の生活じゃない。
巨大なクマのぬいぐるみを浮かせたい
バスケットボール100個で試合したらプロに勝てるんじゃね?
【閲覧注意】100デシベル以上でねるねるねるね作ったら日本1うるさい動画になったwwwwww
オレずっと筋トレ続けてたんスよ。
収録までさせて勝手にDr.STONEの世界にオレがいる話作ってみた
失禁マシーンってのがあるらしいぞwwwwwwwww
謎の部屋に閉じ込められました。おや?真ん中になにかあるぞ?
家に来てほしくないから看板作りました。
これが未来のサウナです。
【恐怖映像】オレの後輩がガチでクズすぎる
就活あるある【ゲストが豪華】
If you want a specific number of videos- go for for loop as mentioned in another answer. Below code will keep scrolling until manually close the browser.
from selenium import webdriver
driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.maximize_window()
driver.implicitly_wait(20)
driver.get("https://www.youtube.com/c/0214mex/videos")
j=0
try:
while True: # Infinite loop, to keep scrolling.
videos = driver.find_elements_by_id("dismissible") # Find all the videos. Initially the length of videos will be 30, keeps increasing as we scroll down.
driver.execute_script("arguments[0].scrollIntoView(true);", videos[j]) # Scroll to all videos, more videos will be loaded and list of videos will be updated everytime.
video_name = videos[j].find_element_by_tag_name("h3").get_attribute("innerText") # videos[j] - Get the name of jth video indiviually
video_length = videos[j].find_element_by_xpath(".//span[#class='style-scope ytd-thumbnail-overlay-time-status-renderer']").text # videos[j] - Get the length of jth video indiviually.
print("{}: {}-{}".format(j+1,video_name,video_length)) # To print in specific format.
j+=1 # Increment j
except Exception as e:
print(e)
driver.quit()
Output: (Manually closed the browser)
1: 【遊戯王】千年パズルが難易度最高レベルでマジで無理wwwww-12:26
2: 【ルームツアー】3億円の豪邸の中ってどうなってるの?-8:20
3: コムドットと心霊スポット行ったらヤンキー達に道を塞がれた。-18:08
...
294: これがwww世界1巨大なwww人をダメにするソファwwwwww-8:06
295: 皆さまにお願いがあります。-4:18
Message: no such window: target window already closed
from unknown error: web view not found
I am trying to obtain the electoral results of the 2016 elections in Peru disaggregated by district. As you can see in this link (https://www.web.onpe.gob.pe/modElecciones/elecciones/elecciones2016/PRPCP2016/Resultados-Ubigeo-Presidencial.html#posicion), when selecting the "scope", three boxes appear more with the title of "department", "province" and "district". I want to extract all the results from the image table at the "district" level and save them in a table, but to achieve this I must first select "department", "province" and "district" 1
So far, I have written this code that inserts options determined by me to dropdown lists, but I was wondering if I can automate the selection of options. There are 1874 districts so doing it manually is not an option.
I hope you can help me. Thanks!
import selenium
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
driver = webdriver.Chrome()
driver.get('https://www.web.onpe.gob.pe/modElecciones/elecciones/elecciones2016/PRPCP2016/Resultados-Ubigeo-Presidencial.html#posicion')
time.sleep(4)
selectAmbit = Select(driver.find_element_by_id("cdgoAmbito"))
selectAmbit.select_by_value('P')
time.sleep(2)
selectRegion = Select(driver.find_element_by_id("cdgoDep"))
selectRegion.select_by_value('010000')
time.sleep(2)
selectProvin = Select(driver.find_element_by_id("cdgoProv"))
selectProvin.select_by_value('010200')
time.sleep(2)
selectDistr = Select(driver.find_element_by_id("cdgoDist"))
selectDistr.select_by_value('010202')
A simple loop to get all selectDistr options. You'd just do it nested to select all options of every type .
selectDistr = Select(driver.find_element_by_id("cdgoDist"))
for option in selectDistr.options:
selectDistr.select_by_value(option.get_attribute('value'))
I want to extract all the fantasy teams that have been entered for past contests. To loop through the dates, I just change a small part of the URL as shown in my code below:
#Packages:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
# Driver
chromedriver =("C:/Users/Michel/Desktop/python/package/chromedriver_win32/chromedriver.exe")
driver = webdriver.Chrome(chromedriver)
# Dataframe that will be use later
results = pd.DataFrame()
best_lineups=pd.DataFrame()
opti_lineups=pd.DataFrame()
#For loop over all DATES:
calendar=[]
calendar.append("2019-01-10")
calendar.append("2019-01-11")
for d in calendar:
driver.get("https://rotogrinders.com/resultsdb/date/"+d+"/sport/4/")
Then, to access the different contests of that day, you need to click on the contest tab. I use the following code to locate and click on it.
# Find "Contest" tab
contest= driver.find_element_by_xpath("//*[#id='root']/div/main/main/div[2]/div[3]/div/div/div[1]/div/div/div/div/div[3]")
contest.click()
I simply inspect and copy the xpath of the tab. However, most of the times it is working, but sometimes I get an error message " Unable to locate element...". Moreover, it seems to work only for the first date in my calendar loop and always fails in the next iteration... I do not know why. I try to locate it differently, but I feel I am missing something such as:
contests=driver.find_element_by_xpath("//*[#role='tab']
Once, the contest tab is successfully clicked, all contests of that day are there and you can click on a link to access all the entries of that contest. I stored the contests in order to iterate throuhg all as follow:
list_links = driver.find_elements_by_tag_name('a')
hlink=[]
for ii in list_links:
hlink.append(ii.get_attribute("href"))
sub="https://rotogrinders.com/resultsdb"
con= "contest"
contest_list=[]
for text in hlink:
if sub in text:
if con in text:
contest_list.append(text)
# Iterate through all the entries(user) of a contest and extract the information of the team entered by the user
for c in contest_list:
driver.get(c)
Then, I want to extract all participants team entered in the contest and store it in a dataframe. I am able to do it successfully for the first page of the contest.
# Waits until tables are loaded and has text. Timeouts after 60 seconds
while WebDriverWait(driver, 60).until(ec.presence_of_element_located((By.XPATH, './/tbody//tr//td//span//a[text() != ""]'))):
# while ????:
# Get tables to get the user names
tables = pd.read_html(driver.page_source)
users_df = tables[0][['Rank','User']]
users_df['User'] = users_df['User'].str.replace(' Member', '')
# Initialize results dataframe and iterate through users
for i, row in users_df.iterrows():
rank = row['Rank']
user = row['User']
# Find the user name and click on the name
user_link = driver.find_elements(By.XPATH, "//a[text()='%s']" %(user))[0]
user_link.click()
# Get the lineup table after clicking on the user name
tables = pd.read_html(driver.page_source)
lineup = tables[1]
#print (user)
#print (lineup)
# Restructure to put into resutls dataframe
lineup.loc[9, 'Name'] = lineup.iloc[9]['Salary']
lineup.loc[10, 'Name'] = lineup.iloc[9]['Pts']
temp_df = pd.DataFrame(lineup['Name'].values.reshape(-1, 11),
columns=lineup['Pos'].iloc[:9].tolist() + ['Total_$', 'Total_Pts'] )
temp_df.insert(loc=0, column = 'User', value = user)
temp_df.insert(loc=0, column = 'Rank', value = rank)
temp_df["Date"]=d
results = results.append(temp_df)
#next_button = driver.find_elements_by_xpath("//button[#type='button']")
#next_button[2].click()
results = results.reset_index(drop=True)
driver.close()
However, there are other pages and to access it, you need to click on the small arrow next buttonat the bottom. Moreover, you can click indefinitely on that button; even if there are not more entries. Therefore, I would like to be able to loop through all pages with entries and stop when there are no more entries and change contest. I try to implement a while loop to do so, but my code did not work...
You must really make sure that page loads completely before you do anything on that page.
Moreover, it seems to work only for the first date in my calendar loop
and always fails in the next iteration
Usually when selenium loads a browser page it tries to look for the element even if it is not loaded all the way. I suggest you to recheck the xpath of the element you are trying to click.
Also try to see when the page loads completely and use time.sleep(number of seconds)
to make sure you hit the element or you can check for a particular element or a property of element that would let you know that the page has been loaded.
One more suggestion is that you can use driver.current_url to see which page are you targetting. I have had this issue while i was working on multiple tabs and I had to tell python/selenium to manually switch to that tab
Note: I particularly deal with this website
How can I use selenium with Python to get the reviews on this page to sort by 'Most recent'?
What I tried was:
driver.find_element_by_id('sort-order-dropdown').send_keys('Most recent')
from this didn't cause any error but didn't work.
Then I tried
from selenium.webdriver.support.ui import Select
select = Select(driver.find_element_by_id('sort-order-dropdown'))
select.select_by_value('recent')
select.select_by_visible_text('Most recent')
select.select_by_index(1)
I've got: Message: Element <select id="sort-order-dropdown" class="a-native-dropdown" name=""> is not clickable at point (66.18333435058594,843.7999877929688) because another element <span class="a-dropdown-prompt"> obscures it
This one
element = driver.find_element_by_id('sort-order-dropdown')
element.click()
li = driver.find_elements_by_css_selector('#sort-order-dropdown > option:nth-child(2)')
li.click()
from this caused the same error msg
This one from this caused the same error also
Select(driver.find_element_by_id('sort-order-dropdown')).select_by_value('recent').click()
So, I'm curious to know if there is any way that I can select the reviews to sort from the most recent first.
Thank you
This worked for me using Java:
#Test
public void amazonTest() throws InterruptedException {
String URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews";
String menuSelector = ".a-dropdown-prompt";
String menuItemSelector = ".a-dropdown-common .a-dropdown-item";
driver.get(URL);
Thread.sleep(2000);
WebElement menu = driver.findElement(By.cssSelector(menuSelector));
menu.click();
List<WebElement> menuItem = driver.findElements(By.cssSelector(menuItemSelector));
menuItem.get(1).click();
}
You can reuse the element names and follow a similar path using Python.
The key points here are:
Click on the menu itself
Click on the second menu item
It is a better practice not to hard-code the item number but actually read the item names and select the correct one so it works even if the menu changes. This is just a note for future improvement.
EDIT
This is how the same can be done in Python.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
URL = "https://www.amazon.com/Harry-Potter-Slytherin-Wall-Banner/product-reviews/B01GVT5KR6/ref=cm_cr_dp_d_show_all_top?ie=UTF8&reviewerType=all_reviews";
menuSelector = ".a-dropdown-prompt";
menuItemSelector = ".a-dropdown-common .a-dropdown-item";
driver = webdriver.Chrome()
driver.get(URL)
elem = driver.find_element_by_css_selector(menuSelector)
elem.click()
time.sleep(1)
elemItems = []
elemItems = driver.find_elements_by_css_selector(menuItemSelector)
elemItems[1].click()
time.sleep(5)
driver.close()
Just to keep in mind, css selectors are a better alternative to xpath as they are much faster, more robust and easier to read and change.
This is the simplified version of what I did to get the reviews sorted from the most recent ones. As "Eugene S" said above, the key point is to click on the button itself and select/click the desired item from the list. However, my Python code use XPath instead of selector.
# click on "Top rated" button
driver.find_element_by_xpath('//*[#id="a-autoid-4-announce"]').click()
# this one select the "Most recent"
driver.find_element_by_xpath('//*[#id="sort-order-dropdown_1"]').click()