how to avoid bot detection on websites using selenium python - python

We are trying to automate process using selenium python for a website but as we proceed with the process the bot gets detected every time and a captcha comes up. Even though a human solve that captcha the website does not allow to move forward and continuously keeps detecting the bot and again and again shows the captcha to solve. So far we have tried all methods whichever we have explored to overcome it but none of method worked. Can someone help on this issue ?
Some of the methods tried:
1.) using spoofing user agents
2.) using proxies
3.) turn-off useAutomationExtension
4.) changing the property value of navigator for web driver to undefined
5.) disable-blink-features
6.) exclude the collection of enable-automation switches

There is something called "Undetected ChromeDriver" you can check out!
Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io Automatically downloads the driver binary and patches it.
here is the link
Here is another useful website you can check out, this website shows if a site will detect you using selenium or not or anything like that:
LINK
Also for future reference on Stack Overflow you should steer away from opinion-based questions. Read this to learn more about asking a good question.

Related

How do I hide the fact I'm using a bot?

So for my python selenium script I have to complete a lot of Captcha's. I noticed that when I get the Captcha's on my regular browser they're much easier and quicker. Is there a way for me to hide the fact that I'm using a web automation bot so I get the easier Captcha's?
I already tried randomizing the User Agent but to no success.
You can go to your website and inspect the page. Then go to the network tab and select Network. Reload the page and the select the webpage you are accessing from the list. If you scroll down, you can see the user agent that your browser is using to access the page. Use that user agent in your scraper to exactly mimick your browser.
From a generic perspective there are no proven ways to hide the fact that you are using a Selenium driven web automation bot.
You can find a relevant detailed discussion in Can a website detect when you are using Selenium with chromedriver?
However at certain times modifying the navigator.webdriver flag helps to prevent detection.
References
You can find a couple of relevant detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium Chrome gets detected
How does recaptcha 3 know I'm using selenium/chromedriver?

Python Data extraction from a pop-up window

I'm trying to get a specific data from a website, but this is a little bit complicated to understand so here is some images.
So, first, I'm on this page,
Image1
then I click on the icon in the middle and something pop,
popup
then I have to click on this,
almost there
And finally I land here
arrival
And I want to get all the names of the people here
So, my question is, is there a way to get directly this list with a requests ?
If yes, how do i have do to ? I can't find the URL of this kind of pop up and I'm a complete beginner with requests and all this kind of things..
(To get the name, I have to be connected on my account by the way)
So, since I don't know how to access to the pop-up windows, this is the only code I got :
import requests
x = requests.get('https://www.tiktok.com/#programm___r?lang=en', headers={'User-Agent':'test'})
print(x.text)
I checked what it prints, and i didn't see a sign of the pop-up window
you can get some sort of network interception tool like Burpsuite and watch the network traffic that comes through each time you click on each link along the way to your final destination, this should give you an endpoint you may be able to send your request too. I think this network information should also be available in the browser tools but I'm not sure. A potential issue here is that usually tokens and other information has to be passed down the chain along the way, which might make scripting something like this too hard.
So aside from that, with browser automation software like selenium, you could automate the process of getting to that point on the page, and be able to pull out the list you want once you're there. I've used selenium myself and it's really usable and well documented!

Python-3.x-Selenium: Changing the used driver while staying logged in on a website

I'm currently testing a website with python-selenium and it works pretty well so far. I'm using webdriver.Firefox() because it makes the devolepment process much easier if you can see what the testing program actually does. However, the tests are very slow. At one point, the program has to click on 30 items to add them to a list, which takes roughly 40 seconds because the browser is responding so awfully slowly. So after googling how to make selenium faster I've thought about using a headless browser instead, for example webdriver.PhantomJS().
However, the problem is, that the website requires a login including a captcha at the beginning. Right now I enter the captcha manually in the Firefox-Browser. When switching to a headless browser, I cannot do this anymore.
So my idea was to open the website in Firefox, login and solve the captcha manually. Then I somehow continue the session in headless PhatomJS which allows me to run the code quickly. So basically it is about changing the used driver mid-code.
I know that a driver is completely clean when created. So if I create a new driver after logging in in Firefox, I'd be logged out in the other driver. So I guess I'd have to transfer some session-information between the two drivers.
Could this somehow work? If yes, how can I do it? To be honest I do not know a lot about the actual functionality of webhooks, cookies and storing the"logged-in" information in general. So how would you guys handle this problem?
Looking forward to hearing your answers,
Tobias
Note: I already asked a similar question, which got marked as a duplicate of this one. However, the other question discusses how to reconnect to the browser after quitting the script. This is not what I am intending to do. I want to change the used driver mid-script while staying logged in on the website. So I deleted my old question and created this new, more fitting one. I hope it is okay like that.
The real solution to this is to have your development team add a test mode (not available on Production) where the Captcha solution is either provided somewhere in the page code, or the Captcha is bypassed.
Your proposed solution does not sound like it would work, and having a manual step defeats the purpose of automation. Automation that requires manual steps to be taken will be abandoned.
The website "recognizes" the user via Cookies - a special HTTP Header which is being sent with each request so the website knows that the user is authenticated, has these or that permissions, etc.
Fortunately Selenium provides functions allowing cookies manipulation so all you need to do is to store cookies from the Firefox using WebDriver.get_cookies() method and once done add them to PhantomJS via WebDriver.add_cookie() method.
firefoxCookies = firefoxDriver.get_cookies()
for cookie in firefoxCookies:
phantomJSDriver.add_cookie(cookie)

Handle random ForeSee popup using Python and Selenium

I'm new to coding and trying to use Selenium with Python to click through a website and fill a shopping cart. I've got things working well except for the random ForeSee survey popup. When it appears (and it doesn't always appear in the same location), my code stops working at that point.
I read the ForeSee documentation and it says "...when the invitation is displayed, the fsr.r...cookie is dropped. This cookie prevents a user from being invited again for X days (default 90)."
Hoping for a quick fix, I created a separate Firefox profile and ran through the website and got the ForeSee pop up invitation--no more pop up when manually using that profile. But I still get the pop up when using Selenium.
I used this code:
fp = webdriver.FirefoxProfile('C:\path\to\profile')
browser = webdriver.Firefox(firefox_profile=fp)
EDIT: I got the cookie working. I was using the Local folder instead of the Roaming folder in C:\path\to\profile. Using the roaming folder solved the problem.
My question edited to delete the part about the cookie not working:
Can someone suggest code to permanently handle the ForeSee pop up that appears randomly and on random pages?
I'm using using Protractor with JS, so I can't give you actual code to handle the issue, but I can give you an idea how to approach this.
In a nutshell
When following script is executed in the browser's console -
window.FSR.setFSRVisibility(true);
it makes ForeSee popup appear behind the rest of HTML elements. And doesn't affect UI tests anymore
So my protractor script will look like so
await browser.executeScript(
`window.FSR.setFSRVisibility(true);`
);
Theory
So ForeSee is one of those services that can be integrated with any web app, and will be pulling js code from their API and changing HTML of your app, by executing the code on the scope of the website. Another example of such company is walkme
Obviously in modern world, if these guys can overlay a webpage, they should have a configuration to make it optional (at least for lower environments) and they actually do. What I mentioned as a solution came from this page. But assuming they didn't have such option, one could reach out their support and ask how to workaround their popups. Even if they didn't have such option they would gladly consider it as a feature for improvement.

Selenium: What functions would fire request?

I am new to Selenium and web applications. Please bear with me for a second if my question seems way too obvious. Here is my story.
I have written a scraper in Python that uses Selenium2.0 Webdriver to crawl AJAX web pages. One of the biggest challenge (and ethics) is that I do not want to burn down the website's server. Therefore I need a way to monitor the number of requests my webdriver is firing on each page parsed.
I have done some google-searches. It seems like only selenium-RC provides such a functionality. However, I do not want to rewrite my code just for this reason. As a compromise, I decided to limit the rate of method calls that potentially lead to the headless browser firing requests to the server.
In the script, I have the following kind of method calls:
driver.find_element_by_XXXX()
driver.execute_script()
webElement.get_attribute()
webElement.text
I use the second function to scroll to the bottom of the window and get the AJAX content, like the following:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
Based on my intuition, only the second function will trigger request firing, since others seem like parsing existing html content.
Is my intuition wrong?
Many many thanks
Perhaps I should elaborate more. I am automating a process of crawling on a website in Python. There is a subtantial amount of work done, and the script is running without large bugs.
My colleagues, however, reminded me that if in the process of crawling a page I made too many requests for the AJAX list within a short time, I may get banned by the server. This is why I started looking for a way to monitor the number of requests I am firing from my headless PhantomJS browswer in script.
Since I cannot find a way to monitor the number of requests in script, I made the compromise I mentioned above.
Therefore I need a way to monitor the number of requests my webdriver
is firing on each page parsed
As far as I know, the number of requests is depending on the webpage's design, i.e. the resources used by the webpage and the requests made by Javascript/AJAX. Webdriver will open a browser and load the webpage just like a normal user.
In Chrome, you can check the requests and responses using Developer Tools panel. You can refer to this post. The current UI design of Developer Tools is different but the basic functions are still the same. Alternatively, you can also use the Firebug plugin in Firefox.
Updated:
Another method to check the requests and responses is by using Wireshark. Please refer to these Wireshark filters.

Categories