I have code for getting the followers of a user on Instagram, but I have two problems. First, sometimes the scrolling ends before reaching the bottom, so I cannot get all of the followers. Secondly, when I try to get a large number of followers, scrolling becomes very slow after a while. How can I handle these two situations in Python?
followers_panel = self.driver.find_element(By.XPATH, '/html/body/div[2]/div/div/div/div[2]/div/div/div[1]/div/div[2]/div/div/div/div/div[2]/div/div/div[2]')
last_ht, ht = 0, 1
# keep scrolling the followers panel until the scroll height stops changing
while last_ht != ht:
    last_ht = ht
    ht = self.driver.execute_script("""
        arguments[0].scrollTo(0, arguments[0].scrollHeight);
        return arguments[0].scrollHeight;
    """, followers_panel)
# By.CLASS_NAME cannot take several space-separated classes, so use a CSS selector here
WebDriverWait(self.driver, 60).until(EC.invisibility_of_element_located((By.CSS_SELECTOR, "._ab8w._ab94._ab97._ab9f._ab9m._ab9p._abc0._abcm")))
WebDriverWait(self.driver, 45).until(EC.visibility_of_element_located((By.XPATH, '(//div[@class="_aano"])')))
WebDriverWait(self.driver, 45).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="_aanq"]')))
WebDriverWait(self.driver, 45).until(EC.visibility_of_element_located((By.XPATH, '(//div[@class="_aano"])')))
list_of_followers = list(map(lambda x: x.text, self.driver.find_elements(By.XPATH, '//div[@class="_aano"]//a/span/div')))
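A possible way to handle both issues (only a sketch that reuses followers_panel and self.driver from the code above, assumes import time, and treats the retry count, pause length, and follower cap as arbitrary choices): don't stop the first time the scroll height stops changing, retry a few times instead, and cap how many followers you collect so very long lists don't keep scrolling forever.

# Sketch: retry a few times when the height stops changing (handles the
# "ends before reaching the bottom" case) and stop once enough followers
# are loaded (handles the "takes too much time" case).
MAX_FOLLOWERS = 2000          # illustrative cap, adjust as needed
last_ht, retries = 0, 0
while retries < 3:
    ht = self.driver.execute_script(
        "arguments[0].scrollTo(0, arguments[0].scrollHeight);"
        "return arguments[0].scrollHeight;", followers_panel)
    time.sleep(1)             # give Instagram time to load the next batch
    if ht == last_ht:
        retries += 1          # unchanged height: maybe still loading, try again
    else:
        retries, last_ht = 0, ht
    loaded = self.driver.find_elements(By.XPATH, '//div[@class="_aano"]//a/span/div')
    if len(loaded) >= MAX_FOLLOWERS:
        break                 # enough names collected, stop scrolling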
I wanted to get an Instagram followers/following user list.
The Selenium Python script I used previously was working:
scroll_box = browser.find_element_by_xpath("//div[@class='isgrP']")
sleep(5)
# height variables
last_ht, ht = 0, 1
while last_ht != ht:
    last_ht = ht
    sleep(2)
    # scroll down and return the new scroll height
    ht = browser.execute_script("""
        arguments[0].scrollTo(0, arguments[0].scrollHeight);
        return arguments[0].scrollHeight; """, scroll_box)
I've also tried this solution but still get an error: https://stackoverflow.com/a/54174682/11727107
Has something changed on the Instagram dev side? And how should the scrolling be done now?
Thanks.
It works with this code:
scroll_box = browser.find_element_by_xpath("//div[@class='_9-49']")
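Those obfuscated class names (isgrP, _9-49, _aano, ...) change whenever Instagram ships a new frontend build, which is why the old selector stops working. A less brittle alternative (only a sketch, under the assumption that the followers popup is still rendered as a role="dialog" element) is to anchor on the dialog role and then pick the first scrollable div inside it:

# Assumption: the followers popup has role="dialog"; find the first div
# inside it that actually scrolls instead of relying on a generated class.
dialog = browser.find_element_by_xpath("//div[@role='dialog']")
scroll_box = browser.execute_script(
    "return Array.from(arguments[0].querySelectorAll('div'))"
    ".find(d => d.scrollHeight > d.clientHeight);", dialog)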
So I am trying to get Selenium to keep pulling the vertical scroll bar down inside an element (vertical scrolling).
This is the code I have been using. It does work; however, once most of the list is loaded it suddenly stops and does not keep loading further followers on Instagram.
def _get_names(self):
    sleep(2)
    scroll_box = self.driver.find_element(By.XPATH, "/html/body/div[6]/div/div/div/div[3]")
    last_ht, ht = 0, 1
    while last_ht != ht:
        last_ht = ht
        sleep(1)
        ht = self.driver.execute_script("""
            arguments[0].scrollTo(0, arguments[0].scrollHeight);
            return arguments[0].scrollHeight;
        """, scroll_box)
    links = scroll_box.find_elements_by_tag_name('a')
    names = [name.text for name in links if name.text != '']
    # close button
    self.driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/button").click()
    return names
Which means it is not collecting all of the followers, because it stops scrolling.
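One thing worth trying (just a sketch that reuses scroll_box, self.driver, and sleep from the code above): instead of comparing scrollHeight once, track how many follower links have loaded and only stop after the count has not grown for a few rounds, so a single slow response does not end the loop early.

# Sketch: stop only after the number of loaded <a> elements stops growing
# for several consecutive scrolls, rather than after one unchanged height.
seen, stalls = 0, 0
while stalls < 3:
    self.driver.execute_script(
        "arguments[0].scrollTo(0, arguments[0].scrollHeight);", scroll_box)
    sleep(1)                                  # let the next batch load
    count = len(scroll_box.find_elements_by_tag_name('a'))
    stalls = stalls + 1 if count == seen else 0
    seen = count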
I'm trying to scrape all the corner betting odds for a given game at skybet, but it looks like scrolling is messing things up in my loop. When I print section.text it looks like it's doing what I want, but then it clicks the wrong thing.
And when I don't scroll, it will only click on the first few odds sections before the code just freezes.
Any help would be really appreciated, thanks!
Also, I made odds_sections refresh itself at each iteration because I thought that might be the problem.
driver = webdriver.Safari()
driver.get("https://m.skybet.com/football/competitions")
driver.maximize_window()

# click accept cookie
try:
    button_cookie = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.XPATH, "//body/div[2]/div[1]/a[2]"))
    )
    button_cookie.click()
except:
    print("no cookie")

# find location of Premier League
pl = driver.find_elements_by_class_name("split__title")
locate_pl = 0
link_name = pl[locate_pl].text
while link_name != "Premier League":
    locate_pl += 1
    link_name = pl[locate_pl].text
pl[locate_pl].click()
N = locate_pl + 1
# use N now to find PL matches
time.sleep(2)

# click on first match
button_match = driver.find_element_by_xpath("//div[@id='competitions']/ul[1]/li[{}]/div[1]/table[2]/tbody[1]/tr[2]/td[1]/a[1]".format(N))
teams = driver.find_element_by_xpath("//div[@id='competitions']/ul[1]/li[{}]/div[1]/table[2]/tbody[1]/tr[2]/td[1]/a[1]/b/span".format(N))
button_match.send_keys(Keys.ENTER)
time.sleep(2)

# find and click corners button
try:
    button_corners = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "_1ouz2ki")))
    # button_corners = driver.find_elements_by_class_name("_1ouz2ki")
except:
    print("no corners")
n = 0
link_name = button_corners[n].text
while link_name != "Corners":
    n += 1
    link_name = button_corners[n].text
button_corners[n].click()

# Now we will scrape all corner odds for this game.
odds_sections = driver.find_elements_by_class_name('_t0tx82')
N_sections = len(odds_sections)
c = 0
scroll_to = 35
# the issue is within this loop (note: use < so the index stays in range)
while c < N_sections:
    odds_sections = driver.find_elements_by_class_name('_t0tx82')
    section = odds_sections[c]
    print(section.text)
    section.click()
    time.sleep(2)
    section.click()
    c += 1
    driver.execute_script("window.scrollTo(0,{})".format(scroll_to))
I'm writing a script to scrape product names from a website, filtered by brand. Some search results may contain more than one page, and this is where the problem comes in. I'm able to scrape the first page, but when the script clicks on the next page, the error selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: element is not attached to the page document shows up. Below is my code:
def scrape():
    resultList = []
    currentPage = 1
    while currentPage <= 2:
        titleResults = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h4.mt-0')))
        resultList.append(titleResults)
        checkNextPage = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div/nav/ul/li/a[@aria-label='Next']")))
        for cnp in checkNextPage:
            nextPageNumber = int(cnp.get_attribute("data-page"))
        currentPage += 1
        driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
    for result in resultList[0]:
        print("Result: {}".format(result.text))
I think the error got triggered when .click() was called. I've done a lot of searching on the internet before resorting to posting this question here because either I don't understand the solutions from other articles/posts or they don't apply to my case.
A stale element means an old element, one that is no longer available in the DOM.
I think the error is caused by the last line.
You should extract the elements' text before the elements become unavailable.
def scrape():
    resultList = []
    currentPage = 1
    while currentPage <= 2:
        titleResults = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'h4.mt-0')))
        # Extract the elements' text while they are still attached to the DOM
        results_text = [titleResults[i].text for i in range(0, len(titleResults))]
        resultList.extend(results_text)
        checkNextPage = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div/nav/ul/li/a[@aria-label='Next']")))
        for cnp in checkNextPage:
            nextPageNumber = int(cnp.get_attribute("data-page"))
        currentPage += 1
        driver.find_element_by_xpath("//div/nav/ul/li/a[@aria-label='Next']").click()
    print("Result: {}".format(resultList))
I am trying to scrape all job postings for the last 24 hours from Glassdoor and save them to a dictionary.
binary = FirefoxBinary('path_to_firebox_binary.exe')
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
driver = webdriver.Firefox(firefox_binary=binary, capabilities=cap, executable_path=GeckoDriverManager().install())
base_url = 'https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn' \
           '&typedKeyword=data+sc&sc.keyword=data+scientist&locT=C&locId=1154532&jobType= '
driver.get(url=base_url)
driver.implicitly_wait(20)
driver.maximize_window()
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "div#filter_fromAge>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((
    By.XPATH, "//div[@id='PrimaryDropdown']/ul//li//span[@class='label' and contains(., 'Last Day')]"))).click()
# find job listing elements on web page
listings = driver.find_elements_by_class_name("jl")
n_listings = len(listings)
results = {}
for index in range(n_listings):
    driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
    results[index] = {'title': title, 'company': emp_name, 'description': description}
I keep running into the error message
selenium.common.exceptions.StaleElementReferenceException: Message:
The element reference of is stale; either the element is no longer attached to the
DOM, it is not in the current frame context, or the document has been
refreshed
for the first line inside my for loop. Even if the for loop runs for a few iterations, it eventually raises this exception. I am new to Selenium and web scraping and will appreciate any help.
Every time a new post is selected, the clicked element is modified and therefore the DOM is refreshed. The change is slow, certainly in comparison to the actions in the loop, so what you want to do is slow it down a little. Instead of using a fixed sleep, you can wait for the changes to occur.
Every time you select a posting, a new class, selected, is added and the style attribute loses its content. You should wait for this to happen, get the information, and then click the next post.
wait = WebDriverWait(driver, 20)
for index in range(n_listings - 1):
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.selected:not([style="border-bottom:0"])')))
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name('empInfo.newDetails')
    emp = info.find_element_by_class_name('employerName')
    if index < n_listings - 1:
        driver.find_element_by_css_selector('.selected + .jl').click()
This error means the element you are trying to click on was not found. You have to first make sure the target element exists and then call click(), or wrap the call in a try/except block.
# ...
results = {}
for index in range(n_listings):
    try:
        driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    except:
        print('Listing not found, retrying in 1 second ...')
        time.sleep(1)
        continue
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
# ...
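If you would rather retry the same listing instead of skipping it, a small retry loop around the click also works (a sketch; it assumes from selenium.common.exceptions import StaleElementReferenceException and an arbitrary retry count of 3):

# Sketch: retry the click on the same listing a few times before giving up,
# re-finding the element on each attempt so a stale reference is replaced.
for attempt in range(3):
    try:
        driver.find_elements_by_class_name("jl")[index].click()
        break
    except StaleElementReferenceException:
        time.sleep(1)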