I'm using Python + Selenium to create a web crawler/scraper that notifies me when new homework is posted. I managed to log into the main website, but I need to click a link to select my course.
After searching through the HTML manually, I found this information about the link I usually click (the blue box is the link).
However, there is no button that seems clickable, so I searched the page for the URL I knew the link should redirect me to, and I found this:
It looks like a card, which is a new data structure/object for me. How can I click this link with an automated web crawler?
Try the following:
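from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".title.ellipsis"))).click()  # wait up to 10 s for the course title to be visible, then click it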
Hope it helps you!
Related
I have Python code that works for me on other sites, but unfortunately it does not work on the wix.com login page. (To see that page, open wix.com, click the Sign In button, and then click the Log In link.)
The problem is clicking the reCAPTCHA checkbox.
This is part of the relevant page's HTML:
Here's the part of the code that I am using for testing:
frame = driver.find_element(By.XPATH, "//iframe[@title='reCAPTCHA']")
driver.switch_to.frame(frame)
try:
    a = driver.find_element(By.CLASS_NAME, "recaptcha-checkbox-border")
    a.click()
except Exception as e:  # the error is raised by find_element above
    print(e)
The program enters the try block and then throws an error when trying to find the element.
I would appreciate it if someone could help me figure out why.
My guess is that it is related to the page I am accessing, since the same code works well on a different site.
I am working on scraping data from the Flashscore website.
https://www.flashscore.com/football/albania/superliga-2019-2020/results/
Although I can find the links for most of the matches that are visible once the above page loads, there are many matches that are hidden and can only be accessed by clicking on 'Show more matches'.
Snapshot of the page
I found the class for 'Show more matches' (event__more event__more--static) and used the '.click()' method of the Selenium library in Python, but the output is null. I also tried various other ways of clicking this link but couldn't get it working.
Is there any other way I can click on the link and extract the information in Python? Any help would be greatly appreciated.
Note: I also haven't found any classes where all of this information is hidden.
You can use the execute_script() driver method to achieve this. It's used for executing JavaScript in the current window/frame.
Here is a code snippet:
from selenium.webdriver.common.by import By
driver.get('https://www.flashscore.com/football/albania/superliga-2019-2020/results/')
show_more_button = driver.find_element(By.XPATH, '//*[@id="live-table"]/div[1]/div/div/a')  # find the show more results element
driver.execute_script("arguments[0].click();", show_more_button)  # click it through JavaScript instead of a native click
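If each click only reveals one more batch of matches, the link may need to be clicked several times. Below is a minimal sketch that keeps clicking through JavaScript until no visible link is left. It assumes the link still carries the `event__more` class quoted in the question (Flashscore's markup may have changed since), and `click_show_more_until_gone`, `max_clicks` and `pause` are hypothetical names. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, so the helper needs no Selenium import of its own.

```python
import time

# Selenium's By.CSS_SELECTOR constant is the plain string "css selector".
CSS_SELECTOR = "css selector"


def click_show_more_until_gone(driver, selector=".event__more", max_clicks=50, pause=1.0):
    """Click the visible 'Show more matches' link via JavaScript until none is left.

    Returns the number of clicks performed. The default selector is an
    assumption based on the class name quoted in the question.
    """
    clicks = 0
    while clicks < max_clicks:
        # find_elements returns an empty list instead of raising when nothing matches
        visible = [el for el in driver.find_elements(CSS_SELECTOR, selector) if el.is_displayed()]
        if not visible:
            break
        # Same JavaScript click as in the snippet, which bypasses overlays that block a native click
        driver.execute_script("arguments[0].click();", visible[0])
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch of matches
    return clicks
```

Usage: call `driver.get(url)` and then `click_show_more_until_gone(driver)`. Afterwards the hidden matches should be in the DOM, and their links can be collected as before. The `max_clicks` cap keeps the loop from running forever if the link never disappears.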
Using Selenium, I can browse the entire website except for the login screen on one specific site. When I check the page source, I see some JavaScript code.
In a normal Chrome browser I can access the expected login screen. Can anyone help me overcome this issue? Thanks in advance.
Chances are that the website detects that you are using a bot and blocks you from its login screen for that reason. I can't be sure this is the cause without seeing the issue myself, but many websites try to keep automated, non-human clients away from their pages, and a login page is exactly the kind of page they would protect.
You can try to change some things to make Selenium less detectable by the website, but it's quite difficult and inconsistent at best. You can find some more information about achieving this here, but I wouldn't expect too much.
I don't know exactly where to start here, and I have to admit my knowledge of Python and websites is limited. In the past I've made some API requests and accessed a file or two from a website, but I had examples to build on. In this case I have no written example to guide me through the process, so I don't really know where to start or whether "requests" is even the way to go.
What I have is a distributor's website that has a file with product information.
If I were to download this file manually, I would have to log in and navigate to the download section of the website. At that point a popup appears where I select the brand I want to download. It offers options for which data I want to gather, a text box to name the file, and a download button that has no URL.
I'm sure all this seems pretty vague, since I don't know what information would be helpful at this point.
A nudge in the right direction would be great!
Thanks
Screenshot of the popup
It sounds like there may not be an API. In cases like this, a web automation tool such as Selenium could get you the desired result.
In your case, it sounds like you will need to find the button elements and then click them.
From Selenium's basic example:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element(By.NAME, "q")  # the site's search box
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)  # press Enter to submit the search
Based on your example HTML, after you load the page you could use the following to find the button and click it:
from selenium.webdriver.common.by import By
elem = driver.find_element(By.ID, "downloadBtn")
elem.click()
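Because the download button has no URL, clicking it makes the browser itself save the file, so the script has to wait for that file to appear before using it. This is a minimal sketch under some assumptions. The browser is assumed to download into a known folder (for Chrome, via the `download.default_directory` preference). Partial downloads are assumed to carry Chrome's `.crdownload` or Firefox's `.part` suffix. `wait_for_download` is a hypothetical helper.

```python
import time
from pathlib import Path

# Suffixes browsers commonly use for files that are still being written (an assumption)
PARTIAL_SUFFIXES = (".crdownload", ".part", ".tmp")


def wait_for_download(directory, timeout=60.0, poll=0.5, existing=frozenset()):
    """Return the path of a finished file that newly appears in `directory`.

    `existing` holds file names present before the click, so they are ignored.
    Raises TimeoutError if no download completes within `timeout` seconds.
    """
    folder = Path(directory)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = [p for p in folder.iterdir() if p.is_file() and p.name not in existing]
        # Some browsers create an empty final-name placeholder next to the partial file,
        # so only accept a result once no partial file remains.
        still_writing = any(p.name.endswith(PARTIAL_SUFFIXES) for p in new_files)
        finished = [p for p in new_files if not p.name.endswith(PARTIAL_SUFFIXES)]
        if finished and not still_writing:
            return finished[0]
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {folder} after {timeout} s")
```

Usage: record `existing = {p.name for p in Path(download_dir).iterdir()}` before `elem.click()`, then call `path = wait_for_download(download_dir, existing=existing)` to get the saved file.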
You can use an HTTP library such as Requests to download this, but you will need to supply your username and password; its documentation includes examples you can study.
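Here is a minimal sketch of that approach with the Requests library (`requests`). It assumes the login form posts `username` and `password` fields to a login URL. It also assumes the file is served from a URL you can find in the browser's developer tools (Network tab) when you click the download button. Every URL and field name here is a hypothetical placeholder. Sites that use CSRF tokens or JavaScript-driven logins will need more than this, and if the popup submits your selected options, the real download request may be a POST carrying them. The `session` argument is expected to be a `requests.Session`, which keeps the login cookies between requests.

```python
from pathlib import Path


def download_after_login(session, login_url, file_url, username, password, dest):
    """Log in with a form POST, then save a file that the logged-in session can access.

    `session` is expected to behave like requests.Session; the form field
    names "username" and "password" are assumptions about the target site.
    """
    login = session.post(login_url, data={"username": username, "password": password})
    login.raise_for_status()  # stop early on HTTP errors such as 401 or 500
    response = session.get(file_url)  # cookies set at login are sent automatically
    response.raise_for_status()
    out = Path(dest)
    out.write_bytes(response.content)  # save the raw bytes of the file
    return out
```

Usage: `with requests.Session() as s: download_after_login(s, login_url, file_url, "me", "secret", "products.csv")`, after substituting the real URLs and field names for your site.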
If navigating to the file you want doesn't require running any JavaScript, consider using RoboBrowser. Selenium may be overkill for this.
Here is a basic example:
from robobrowser import RoboBrowser
robo = RoboBrowser(history=True, parser="html.parser")
robo.open("http://www.python.org")
search = robo.get_form(action="/search/")
search["q"].value = "Really awesome search query"
robo.submit_form(search)
Question: yikyak.com returns some sort of "browser not supported" landing page when I try to view the source code in Chrome (even for the page I'm logged in on) or when I print it to the Python terminal. Why is this, and what can I do to get around it?
Edit for clarification: I'm using the Chrome WebDriver. I can navigate around the Yik Yak website by clicking just fine, but whenever I try to see what HTML is on the page, I get the HTML for a "browser not supported" page.
Background: I'm trying to access yikyak.com with selenium for python to download yaks and do fun things with them. I know fairly little about web programming.
Thanks!
Secondary, less important question: If you're already here, are there particularly great free resources for a super-quick intro to the certification knowledge I need to store logins and stuff like that to use my logged in account? That would be awesome.
I figured it out. I was being dumb. I saved the HTML to a file, opened that file with Chrome, and it displayed the normal page. I just hadn't recognized it as the normal page when looking at the raw HTML directly. Thanks to all 15 people for your time.
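The check the asker describes, saving the HTML that Selenium sees to a file and opening that file in Chrome, can be sketched like this. `save_page_source` is a hypothetical helper. `driver.page_source` is the WebDriver property that returns the current page's HTML.

```python
from pathlib import Path


def save_page_source(driver, path):
    """Write the HTML currently loaded in `driver` to `path` and return the path.

    UTF-8 is used explicitly so non-ASCII text in the page survives on any OS.
    """
    out = Path(path)
    out.write_text(driver.page_source, encoding="utf-8")
    return out
```

Opening the saved file (for example with `webbrowser.open(path.resolve().as_uri())`) renders it the way a browser would. That makes it easier to judge than raw markup.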