I'm new to coding and trying to use Selenium with Python to click through a website and fill a shopping cart. I've got things working well except for the random ForeSee survey popup. When it appears (and it doesn't always appear in the same location), my code stops working at that point.
I read the ForeSee documentation and it says "...when the invitation is displayed, the fsr.r...cookie is dropped. This cookie prevents a user from being invited again for X days (default 90)."
Hoping for a quick fix, I created a separate Firefox profile and ran through the website and got the ForeSee pop up invitation--no more pop up when manually using that profile. But I still get the pop up when using Selenium.
I used this code:
fp = webdriver.FirefoxProfile('C:\path\to\profile')
browser = webdriver.Firefox(firefox_profile=fp)
EDIT: I got the cookie working. I was using the Local folder instead of the Roaming folder in C:\path\to\profile. Using the roaming folder solved the problem.
My question edited to delete the part about the cookie not working:
Can someone suggest code to permanently handle the ForeSee pop up that appears randomly and on random pages?
I'm using using Protractor with JS, so I can't give you actual code to handle the issue, but I can give you an idea how to approach this.
In a nutshell
When following script is executed in the browser's console -
window.FSR.setFSRVisibility(true);
it makes ForeSee popup appear behind the rest of HTML elements. And doesn't affect UI tests anymore
So my protractor script will look like so
await browser.executeScript(
`window.FSR.setFSRVisibility(true);`
);
Theory
So ForeSee is one of those services that can be integrated with any web app, and will be pulling js code from their API and changing HTML of your app, by executing the code on the scope of the website. Another example of such company is walkme
Obviously in modern world, if these guys can overlay a webpage, they should have a configuration to make it optional (at least for lower environments) and they actually do. What I mentioned as a solution came from this page. But assuming they didn't have such option, one could reach out their support and ask how to workaround their popups. Even if they didn't have such option they would gladly consider it as a feature for improvement.
Related
I'm trying to get a specific data from a website, but this is a little bit complicated to understand so here is some images.
So, first, I'm on this page,
Image1
then I click on the icon in the middle and something pop,
popup
then I have to click on this,
almost there
And finally I land here
arrival
And I want to get all the names of the people here
So, my question is, is there a way to get directly this list with a requests ?
If yes, how do i have do to ? I can't find the URL of this kind of pop up and I'm a complete beginner with requests and all this kind of things..
(To get the name, I have to be connected on my account by the way)
So, since I don't know how to access to the pop-up windows, this is the only code I got :
import requests
x = requests.get('https://www.tiktok.com/#programm___r?lang=en', headers={'User-Agent':'test'})
print(x.text)
I checked what it prints, and i didn't see a sign of the pop-up window
you can get some sort of network interception tool like Burpsuite and watch the network traffic that comes through each time you click on each link along the way to your final destination, this should give you an endpoint you may be able to send your request too. I think this network information should also be available in the browser tools but I'm not sure. A potential issue here is that usually tokens and other information has to be passed down the chain along the way, which might make scripting something like this too hard.
So aside from that, with browser automation software like selenium, you could automate the process of getting to that point on the page, and be able to pull out the list you want once you're there. I've used selenium myself and it's really usable and well documented!
I'm currently testing a website with python-selenium and it works pretty well so far. I'm using webdriver.Firefox() because it makes the devolepment process much easier if you can see what the testing program actually does. However, the tests are very slow. At one point, the program has to click on 30 items to add them to a list, which takes roughly 40 seconds because the browser is responding so awfully slowly. So after googling how to make selenium faster I've thought about using a headless browser instead, for example webdriver.PhantomJS().
However, the problem is, that the website requires a login including a captcha at the beginning. Right now I enter the captcha manually in the Firefox-Browser. When switching to a headless browser, I cannot do this anymore.
So my idea was to open the website in Firefox, login and solve the captcha manually. Then I somehow continue the session in headless PhatomJS which allows me to run the code quickly. So basically it is about changing the used driver mid-code.
I know that a driver is completely clean when created. So if I create a new driver after logging in in Firefox, I'd be logged out in the other driver. So I guess I'd have to transfer some session-information between the two drivers.
Could this somehow work? If yes, how can I do it? To be honest I do not know a lot about the actual functionality of webhooks, cookies and storing the"logged-in" information in general. So how would you guys handle this problem?
Looking forward to hearing your answers,
Tobias
Note: I already asked a similar question, which got marked as a duplicate of this one. However, the other question discusses how to reconnect to the browser after quitting the script. This is not what I am intending to do. I want to change the used driver mid-script while staying logged in on the website. So I deleted my old question and created this new, more fitting one. I hope it is okay like that.
The real solution to this is to have your development team add a test mode (not available on Production) where the Captcha solution is either provided somewhere in the page code, or the Captcha is bypassed.
Your proposed solution does not sound like it would work, and having a manual step defeats the purpose of automation. Automation that requires manual steps to be taken will be abandoned.
The website "recognizes" the user via Cookies - a special HTTP Header which is being sent with each request so the website knows that the user is authenticated, has these or that permissions, etc.
Fortunately Selenium provides functions allowing cookies manipulation so all you need to do is to store cookies from the Firefox using WebDriver.get_cookies() method and once done add them to PhantomJS via WebDriver.add_cookie() method.
firefoxCookies = firefoxDriver.get_cookies()
for cookie in firefoxCookies:
phantomJSDriver.add_cookie(cookie)
I want to webscraping from a website, where i have to log in first. The problem is that, there is a "robotprotection" too (so I have to verify that i am not a robot + a recaptcha-security.), but it's chances of success (passing the captcha) is ~30% and this is horrible for me.
There is another possibility maybe which one i am log in with my browser (for example chrome or firefox), and after im going to use this session ID in my python script to webscraping dataes automatically?
So, more simplier: I want to webscraping tables from a website, so i have to log in first. This 30% succes rate is not enough good for me, so i hope there is another possibilty : log in manually, and after use this session in python?!
After that, there is a textbox in this page, where i want to write what i want to search, and after it is navigate to the page, where i'll found the table and dataes.
Any ideas, or it is possible?
(now i have only a script which one i have to download the html code to this datapage, and after change some name in the code manually..it is a very big waste time, i hope i can automate it more.) - Python 2.7
My program opens a certain page on using
webbrowser.open(url)
How is it possible to reload the tab containing the url several times?
I could use sleep to set the time limit in which it has to wait before it has to reload.
But how do I refresh the tab after that? (Not open it in a new tab.)
I don't think it would be possible to implement a pure python solution for this which works with different browsers. A solution I would think of is using JavaScript. Vaguely the idea is to create a html file which has an iframe with the url you want and has javascript for reloading the iframe in regular interval. Then use webbrowser module to open that file.
This may sound ugly but this may be the only solution given the security concerns of a browser.
*If you are interested with this idea I can help you writing the code for this.
Hope this helps.
EDIT: below is my OLD answer, I'm not deleting it because it shows the ambiguity in the docs, and could possibly serve as a learning experience to someone.
If you read the docs, they make it sound like its possible. However, it is not possible to do with this module, further more, it seems like no matter what option you give to "new" it always opens in a new tab. Perhaps this behavior is specific to my system, or browser(IE9) but I believe it is more likely a bug in the program.
I investigated further, there is questions about this all over SO. you can't do it with webbrowser or anything built into python.
If you install selenium, you should be able to do what you want.
I am assuming you don't have access to the source code of this webpage, otherwise, you could just use html to do the refresh. If you don't want to install selenium and don't have source access, then you need to make a wrapper for the webpage, and use HTML/JS to refresh the wrapper.
the docs say:
webbrowser.open(url, new=0, autoraise=True)
Display url using the default browser. If new is 0, the url is opened in the same browser window if possible. If new is 1, a new browser window is opened if possible. If new is 2, a new browser page (“tab”) is opened if possible. If autoraise is True, the window is raised if possible (note that under many window managers this will occur regardless of the setting of this variable).
so...
to refresh the page, it would just be:
for i in range(refresh_limit):
time.sleep(wait_time)
webbrowser.open(url)
^^^ this does not actually work^^^
I'm a little new to web crawlers and such, though I've been programming for a year already. So please bear with me as I try to explain my problem here.
I'm parsing info from Yahoo! News, and I've managed to get most of what I want, but there's a little portion that has stumped me.
For example: http://news.yahoo.com/record-nm-blaze-test-forest-management-225730172.html
I want to get the numbers beside the thumbs up and thumbs down icons in the comments. When I use "Inspect Element" in my Chrome browser, I can clearly see the things that I have to look for - namely, an em tag under the div class 'ugccmt-rate'. However, I'm not able to find this in my python program. In trying to track down the root of the problem, I clicked to view source of the page, and it seems that this tag is not there. Do you guys know how I should approach this problem? Does this have something to do with the javascript on the page that displays the info only after it runs? I'd appreciate some pointers in the right direction.
Thanks.
The page is being generated via JavaScript.
Check if there is a mobile version of the website first. If not, check for any APIs or RSS/Atom feeds. If there's nothing else, you'll either have to manually figure out what the JavaScript is loading and from where, or use Selenium to automate a browser that renders the JavaScript for you for parsing.
Using the Web Console in Firefox you can pretty easily see what requests the page is actually making as it runs its scripts, and figure out what URI returns the data you want. Then you can request that URI directly in your Python script and tease the data out of it. It is probably in a format that Python already has a library to parse, such as JSON.
Yahoo! may have some stuff on their server side to try to prevent you from accessing these data files in a script, such as checking the browser (user-agent header), cookies, or referrer. These can all be faked with enough perseverance, but you should take their existence as a sign that you should tread lightly. (They may also limit the number of requests you can make in a given time period, which is impossible to get around.)