Using Selenium I can browse through an entire specific website except for its login screen. When I checked the page source I saw some JS code.
Using a normal Chrome browser I can access the expected login screen. Can anyone help me overcome this issue? Thanks in advance.
Chances are that the website is detecting that you are using a bot and is blocking you from accessing its login screen for that reason. I can't know for sure that this is the reason because I haven't seen your issue in person, but most good websites don't like having non-human users interact with their pages, and a login page is exactly the type of page that the website would block you from accessing.
You can try to change some things to make Selenium less detectable by the website, but it's quite difficult and inconsistent at best. You can find some more information about achieving this here, but I wouldn't expect too much.
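If you want to experiment anyway, the usual starting point is hiding the most obvious automation fingerprints. Below is a minimal sketch assuming Chrome and the Python Selenium bindings; the flags and the navigator.webdriver override are common suggestions, not a guarantee that any particular site will stop blocking you, and the login URL is just a placeholder.

from selenium import webdriver

options = webdriver.ChromeOptions()
# hide the most obvious "this browser is automated" signals
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_argument("--disable-blink-features=AutomationControlled")

driver = webdriver.Chrome(options=options)
# override navigator.webdriver before any page script gets a chance to read it
driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
)
driver.get("https://example.com/login")  # placeholder URL

Even with all of that, sophisticated sites look at many more signals (timing, TLS fingerprints, behavioral patterns), so treat this as a best-effort tweak rather than a fix.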
Related
I have searched different forums, blogs, and docs, but I cannot find a way out.
My goal is to open a browser window to access a site, but this site needs a login in order to be accessed. Is it possible to log in with email and password before the actual browser window is opened, so that the new browser window opens the site where I am already logged in? (I hope that was clear.)
I have used Selenium with Python in the past and I am not that keen on using it again.
Does anyone have a suggestion, or know a different technique? (I was thinking about cookies, but I don't think that's possible.)
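For what it's worth, the cookie idea is workable in principle: log in once, save the session cookies, and load them into the fresh browser before visiting the site. Here is a rough sketch using Selenium (which you said you'd rather avoid, but it shows the mechanism); the file name and URLs are placeholders, and it assumes the site keeps its session in cookies rather than in localStorage.

import pickle
from selenium import webdriver

# First run: log in by hand in the opened window, then save the cookies
driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL
input("Log in in the browser window, then press Enter...")
pickle.dump(driver.get_cookies(), open("cookies.pkl", "wb"))
driver.quit()

# Later runs: open a new window already "logged in" by restoring the cookies
driver = webdriver.Chrome()
driver.get("https://example.com")  # must visit the domain before adding cookies
for cookie in pickle.load(open("cookies.pkl", "rb")):
    driver.add_cookie(cookie)
driver.get("https://example.com")  # reload now that the session cookies are set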
I have a website for work and I need to go through a list of numbers and determine whether the user associated with each number is still active. The website requires a sign-in, so I can't use requests. Is there a way I can run it through my Chrome browser to get the information I require?
If I can get the HTML then I am fine from there onward with the code.
Any help would be greatly appreciated
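One straightforward option is to drive Chrome with Selenium: log in through the real browser, navigate to each page, and hand driver.page_source to whatever parsing you already have. A minimal sketch follows; the URLs, field names, and selectors are hypothetical and will need to match your site.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")  # placeholder login URL
driver.find_element(By.NAME, "username").send_keys("my_user")      # field names are guesses
driver.find_element(By.NAME, "password").send_keys("my_password")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

driver.get("https://example.com/users/12345")  # placeholder page for one number
html = driver.page_source  # from here you can parse it however you like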
Can you share the webpage you are trying to access?
So, maybe I'm being paranoid.
I'm scraping my Facebook timeline for a hobby project using PhantomJS. Basically, I wrote a program that finds all of my ads by querying the page for the text Sponsored with XPath inside of PhantomJS's page.evaluate block. The text was being displayed as the innerHTML of HTML a elements.
Things were working great for a few days and it was finding tons of ads.
Then it stopped returning any results.
When I logged into Facebook manually to inspect the elements again, I found that the word Sponsored was now being injected through an ::after pseudo-element with the CSS property content: "sponsored". Because pseudo-element content never appears in the DOM text, an XPath query for the word no longer yields any results. No joke, Facebook seemed to have changed the way they render this word after being scraped for a couple of days.
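(To illustrate the difference: pseudo-element text is invisible to XPath and to an element's text/innerHTML, but it can still be read through getComputedStyle. A quick sketch, shown here with Python/Selenium only because it is quick to run; the same getComputedStyle call works inside a page.evaluate block, and the URL and element choice are placeholders, not the real timeline markup.)

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder page

# text injected via ::after never shows up in a text() XPath query:
print(driver.find_elements(By.XPATH, "//a[contains(text(), 'Sponsored')]"))  # -> []

# but the pseudo-element content can still be read via getComputedStyle:
el = driver.find_element(By.CSS_SELECTOR, "a")  # on the real page, the element carrying the label
print(driver.execute_script(
    "return window.getComputedStyle(arguments[0], '::after').getPropertyValue('content');", el))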
Paranoid. I told you.
So, I offer this question to the community of JavaScript, web-scraping, and PhantomJS developers out there. What the heck is going on? Can Facebook know what my PhantomJS program is doing inside of the page.evaluate block?
If so, how? Would my phantom commands appear in a key logger program embedded in the page, for instance?
What are some of your theories?
It is perfectly possible to detect PhantomJS even if the user agent is spoofed.
There are plenty of little ways in which it differs from other browsers, among them:
Wrong order of headers
Lack of media plugins and latest JS capabilities
PhantomJS-specific methods, like window.callPhantom
PhantomJS name in the stack trace
and many others.
Please refer to this excellent article and presentation linked there for details: https://blog.shapesecurity.com/2015/01/22/detecting-phantomjs-based-visitors/
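To give a flavour of the page-side checks that article describes: a site only needs a few lines of JavaScript to spot these artifacts. Here is a rough sketch that runs a couple of those checks through Selenium's execute_script so you can see what your own browser exposes; the property names are the commonly cited ones, not an exhaustive or guaranteed list.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # any page will do

# the same kind of checks a detection script would run inside the page:
fingerprint = driver.execute_script("""
    return {
        callPhantom: typeof window.callPhantom !== 'undefined',  // PhantomJS-specific hook
        _phantom: typeof window._phantom !== 'undefined',        // another PhantomJS global
        webdriver: navigator.webdriver === true                  // exposed by most automated browsers
    };
""")
print(fingerprint)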
Maybe puppeteer would be a better fit for your needs as it is based on a real cutting-edge Chromium browser.
I'm using Python + Selenium to create a web crawler/scraper that notifies me when new homework is posted. I managed to log into the main website, but you need to click a link to select your course.
After searching through the HTML manually, I found this information about the link I usually click (the blue box is the link).
However, there is no button that seems clickable. So I searched the page for the link I knew it should redirect me to, and I found this:
It looks like a card, which is a new data structure/object for me. How can I use an automated web crawler to click this link?
Try the following:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# wait for the course card's title to become visible, then click it
WebDriverWait(driver, timeout).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".title.ellipsis")))
driver.find_element(By.CSS_SELECTOR, ".title.ellipsis").click()
Hope it helps you!
Question: yikyak.com returns some sort of "browser not supported" landing page when I try to view the source code in Chrome (even for the page I'm logged in on) or when I write it out to the Python terminal. Why is this, and what can I do to get around it?
Edit for clarification: I'm using the Chrome webdriver. I can navigate around the Yik Yak website by clicking on it just fine. But whenever I try to see what HTML is on the page, I get the HTML for a "browser not supported" page.
Background: I'm trying to access yikyak.com with Selenium for Python to download yaks and do fun things with them. I know fairly little about web programming.
Thanks!
Secondary, less important question: if you're already here, are there particularly good free resources for a super-quick intro to the authentication knowledge I need to store logins and things like that so I can use my logged-in account? That would be awesome.
I figured it out. I was being dumb. I saved the HTML off to a file, opened that file in Chrome, and it displayed the normal page. I just didn't recognize that it was the normal page when looking at the raw source directly. Thanks to all 15 people for your time.
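For anyone else who hits the same confusion, this is roughly all it takes to dump what Selenium actually sees so you can eyeball it in a normal browser; the file name is just an example.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://yikyak.com")  # or whatever page you are inspecting

# write the page source Selenium sees to a file, then open that file in Chrome
with open("page_dump.html", "w", encoding="utf-8") as f:
    f.write(driver.page_source)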