I am trying to make a tool that performs actions on your website account. Some actions trigger reCAPTCHA after you log in, so I want to know how I could reuse the cookies that Firefox has saved in my normal browser profile inside Selenium, so that the site skips the reCAPTCHA and assumes I'm not a bot.
Related
I am trying to make a Python GUI application.
What I want to do is to open a web browser by clicking a button (Tkinter).
When the web browser is opened, I log in.
After logging in, it redirects to another page.
That page's URL will contain a code as a query parameter that I need to use later in my code.
I used webbrowser.open_new('') to open a web browser.
But the limitation is that it only opens the page; there was no way to get the final redirected URL I need.
Is there a way to open a web browser, do something on that page, and finally get that final URL?
I am using Python.
There are a few main approaches for automating interactions with browsers:
1. Telling a program how and what to click, like a human would, sometimes using desktop OS automation tools like AppleScript
2. Parsing files that contain browser data (this will vary from browser to browser; here is Firefox's)
3. Using a tool or library that drives the browser programmatically (e.g. Selenium via the WebDriver protocol, or Puppeteer via the DevTools protocol)
4. Accessing the local SQLite database of the browser and running queries against it
Sounds like 3 is what you need, assuming you're not against bringing in a new dependency.
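For the "get the final URL" part, a minimal Selenium sketch could look like this (the login URL, the timeout, and the assumption that the redirect carries a code query parameter are placeholders for your actual flow):
from urllib.parse import urlparse, parse_qs
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox()
driver.get('https://example.com/login')  # placeholder: your login page
# Wait (up to 5 minutes) until the login flow redirects to a URL containing the code,
# then parse that parameter out of driver.current_url.
WebDriverWait(driver, 300).until(lambda d: 'code=' in d.current_url)
params = parse_qs(urlparse(driver.current_url).query)
auth_code = params['code'][0]
print(auth_code)
driver.quit()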
I am new to session/cookie concepts. I am trying to browse https://finance.yahoo.com/portfolios to webscrape my portfolio details using python/selenium. When I normally try to access this page, it remembers my previous login details and goes straight to my portfolio page. However if I access the same page using selenium/webdriver, it does not show my portfolio details and it is blank.
Any suggestions/guidance for accomplishing this? Thanks.
To get the cookies from Selenium after accessing a page, you should use:
driver.get_cookies()
To add a cookie in Selenium you can use (it expects a dict with at least a name and a value):
driver.add_cookie({'name': 'cookie_name', 'value': 'cookie_value'})
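If you do want to try reusing cookies, the usual pattern is to log in once in the Selenium-controlled browser, persist the cookies, and re-add them on the next run; a rough sketch (the file name and URLs are placeholders, and add_cookie only works once the driver is already on the matching domain):
import pickle
from selenium import webdriver
driver = webdriver.Chrome()
# First run: log in by hand in the Selenium-controlled window, then save the cookies.
driver.get('https://finance.yahoo.com')
input('Log in in the opened browser, then press Enter...')
pickle.dump(driver.get_cookies(), open('cookies.pkl', 'wb'))
# Later run: visit the domain first, re-add the saved cookies, then open the protected page.
driver.get('https://finance.yahoo.com')
for cookie in pickle.load(open('cookies.pkl', 'rb')):
    driver.add_cookie(cookie)
driver.get('https://finance.yahoo.com/portfolios')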
One option to log in would be to go to the login page via Selenium, fill in the credentials, and log in using Selenium...
However, if you don't want that, I don't believe just setting the same cookies will be enough to fix your problem. What you should do instead is use the Google Chrome profile from the browser where you're logged in, using this:
option.add_argument('--user-data-dir=/path/to/your/logged-in/chrome/profile')
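Putting it together, a minimal sketch (the profile path is a placeholder for your own Chrome profile directory, and Chrome should not be running with that profile while Selenium uses it):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
# Placeholder path: point it at your real Chrome user-data directory.
options.add_argument('--user-data-dir=/path/to/your/logged-in/chrome/profile')
# Optional: pick a specific profile inside that directory.
options.add_argument('--profile-directory=Default')
driver = webdriver.Chrome(options=options)
driver.get('https://finance.yahoo.com/portfolios')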
So for my Python Selenium script I have to complete a lot of captchas. I noticed that when I get the captchas in my regular browser they're much easier and quicker. Is there a way for me to hide the fact that I'm using a web automation bot so I get the easier captchas?
I already tried randomizing the user agent, but without success.
You can go to your website, open the developer tools, and select the Network tab. Reload the page and select the request for the page you are accessing from the list. If you scroll down, you can see the user agent that your browser uses to access the page. Use that user agent in your scraper to mimic your browser exactly.
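In Selenium with Chrome, for example, the user agent can be overridden with a command-line argument; the string below is only an illustration, copy the exact one your browser reports:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
# Replace with the exact user-agent string copied from your browser's Network tab.
options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36')
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')
print(driver.execute_script('return navigator.userAgent'))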
From a generic perspective there are no proven ways to hide the fact that you are using a Selenium-driven web automation bot.
You can find a relevant detailed discussion in Can a website detect when you are using Selenium with chromedriver?
However, at certain times modifying the navigator.webdriver flag helps to prevent detection.
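As a rough illustration of that approach with Chrome (effectiveness varies, and sites can still detect automation in other ways):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
# Stops Blink from setting navigator.webdriver to true.
options.add_argument('--disable-blink-features=AutomationControlled')
# Removes the "Chrome is being controlled by automated test software" infobar.
options.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = webdriver.Chrome(options=options)
# Redefine navigator.webdriver before any page script runs.
driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
    'source': "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
})
driver.get('https://example.com')
print(driver.execute_script('return navigator.webdriver'))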
References
You can find a couple of relevant detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium Chrome gets detected
How does recaptcha 3 know I'm using selenium/chromedriver?
I am making a web scraper in Python that can bring back my YouTube channel stats, so I went to my YT Studio site, copied the link, and printed the soup using bs4. I took the whole text that was printed and created an HTML file, and when I looked at it, it was the YouTube login page.
So now I want to log into this (let's say I can provide the password and email ID in a text file) in order to scrape the YT Studio stats. I have no idea about this (I'm new to web scraping).
You can use the YouTube API; you don't need web scraping for this task.
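For example, with the YouTube Data API v3 and the google-api-python-client package, a minimal sketch (the API key and channel ID are placeholders you create/look up yourself):
from googleapiclient.discovery import build
API_KEY = 'YOUR_API_KEY'        # placeholder: create one in the Google Cloud console
CHANNEL_ID = 'YOUR_CHANNEL_ID'  # placeholder: your channel's ID
youtube = build('youtube', 'v3', developerKey=API_KEY)
response = youtube.channels().list(part='statistics', id=CHANNEL_ID).execute()
print(response['items'][0]['statistics'])  # viewCount, subscriberCount, videoCount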
You can use the YouTube API to perform your operation. If you are still looking for a method that works via web scraping, below is the code for it.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get('https://accounts.google.com/signin')
# Enter the email address and go to the password step.
driver.find_element(By.XPATH, '//*[@id="identifierId"]').send_keys('xxxxxxxx@gmail.com')
driver.find_element(By.XPATH, '//*[@id="identifierNext"]/div/button').click()
time.sleep(2)  # give the password field time to appear
# Enter the password (generic selector for the password input) and submit it.
driver.find_element(By.CSS_SELECTOR, 'input[type="password"]').send_keys('your-password')
driver.find_element(By.ID, 'passwordNext').click()
When doing this via web scraping, after entering the email address and trying to fill in the password field you may come across an error like the one below. It can happen for multiple reasons, such as two-factor auth being enabled or Google not treating the automated browser as secure.
You can disable two-factor auth for your login and give it a try with web scraping; it may help.
You likely log in via a POST request. So you'll want to use a browser and log in to YouTube while monitoring the network traffic in the browser. If you're using Firefox, that's the Network Monitor in the developer tools; if you're using another browser, it should have an equivalent. You'll want to find the form request it sends and then replicate that.
Although, if you're that new to web scraping, you might be better off starting with something easier or using YouTube's API.
I am currently working on real-estate data and wanted to scrape some data from StreetEasy, specifically the closing price hidden behind the "Register to see what it closed for" link (the listing closed about 2 months ago, below the listed price).
Example URL:
http://streeteasy.com/sale/1220187
The data I need requires login, but the login mechanism is pretty different: there is no login page and the login is a pop-up. Is there any way I can use Python to get the authentication and access the page after login, like in the image below?
With Selenium and PhantomJS, you get a powerful combination when it comes to scraping data.
from selenium import webdriver
host = "http://streeteasy.com/sale/1220187"
driver = webdriver.PhantomJS()
# Set the "window" wide enough so PhantomJS can "see" the right panel
driver.set_window_size(1280, 800)
driver.get(host)
driver.find_element_by_link_text("Register to see what it closed for").click()
driver.save_screenshot("output.jpg")
What you see is a small snippet of how Selenium can get you to the webpage login (verified via the JPG screencap). From there, it's a matter of toggling the login box, providing the credentials and click()ing your way in.
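Continuing from the snippet above and reusing the same driver, the last step might look roughly like this; the locators are hypothetical, so inspect the actual pop-up to find the real field names:
# Hypothetical locators: check StreetEasy's login pop-up for the real ones.
driver.find_element_by_css_selector("input[type='email']").send_keys("you@example.com")
driver.find_element_by_css_selector("input[type='password']").send_keys("your-password")
driver.find_element_by_css_selector("button[type='submit']").click()
driver.save_screenshot("after_login.jpg")  # verify the result the same way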
Oh, and be mindful of the TOS. Good luck!