Getting current url after selenium navigation? - python

I have a Python script that runs Selenium and makes a search for me on YouTube. After my .send_keys() and .submit() commands I attempt to get the current URL of the search page with print(driver.current_url), but it only gives me the original URL from my driver.get('https://www.youtube.com') command.
How can I get the full current URL of the search page once I'm there? For example, https://www.youtube.com/results?search_query=election instead of https://www.youtube.com.
Thank you.

As you have not shared the code you tried, I am guessing the issue is with your page load: after clicking submit, you are not giving the page any time to load before you read the URL. Add some wait time. The simplest (not so good) way is:
import time

time.sleep(5)
print(driver.current_url)
The above waits for 5 seconds.
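A more robust alternative is an explicit wait on the URL itself. A sketch, assuming the driver object from the question and the original YouTube URL:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the URL to change away from the page we
# started on, then read the new one.
WebDriverWait(driver, 10).until(EC.url_changes('https://www.youtube.com/'))
print(driver.current_url)
This returns as soon as the navigation happens instead of always paying the full 5 seconds.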

Why are you automating social media sites in the first place?
For multiple reasons, logging into sites like Gmail and Facebook using WebDriver is not recommended. Aside from being against the usage terms for these sites (where you risk having the account shut down), it is slow and unreliable.
The ideal practice is to use the APIs that email providers offer, or in the case of Facebook the developer tools service which exposes an API for creating test accounts, friends, and so forth. Although using an API might seem like a bit of extra hard work, you will be paid back in speed, reliability, and stability. The API is also unlikely to change, whereas webpages and HTML locators change often and require you to update your test framework.
Logging in to third-party sites using WebDriver at any point of your test increases the risk of your test failing because it makes your test longer. A general rule of thumb is that longer tests are more fragile and unreliable.
WebDriver implementations that are W3C conformant also annotate the navigator object with a WebDriver property so that Denial of Service attacks can be mitigated.
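The navigator.webdriver flag mentioned above is easy to observe from a script; a quick check, assuming a live session with any W3C-conformant driver:
# W3C-conformant drivers set this flag, and sites can read it to detect automation.
print(driver.execute_script("return navigator.webdriver"))  # prints True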

You can also just wait for a period of time, but note that driver.implicitly_wait() only sets a timeout for element lookups; it does not pause the script or wait for the page to load, so the explicit wait shown above is more reliable.
driver.implicitly_wait(5)
print(driver.current_url)

To get the current URL after clicking on videos in a particular search, current_url is the right property.
The reason you are getting the previous URL may be that the page has not loaded yet; you can check for the page load by comparing the title of the page. For example:
expected_title = "demo class"
actual_title = driver.title
assert expected_title == actual_title
If the assertion passes, you can then read the current URL:
driver.current_url
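The same idea without a hard-coded title: wait until the title reflects the search, then read the URL. A sketch reusing the example search term from the question:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the results page title to contain the search
# term ("election" is the example query from the question).
WebDriverWait(driver, 10).until(EC.title_contains('election'))
print(driver.current_url)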

Related

Google returning different layouts for pagination

I am using Selenium and Chrome to search on Google, but it returns different layouts for pagination. I am using different proxies and different user agents via the fake_useragent library.
I only want the second image layout. Does anybody know how I can get it every time?
[Screenshots: first pagination layout, second pagination layout]
The issue was that the fake_useragent library sometimes returned old user agents, even after I updated its database. I tried this library (https://pypi.org/project/latest-user-agents/) and it returns newer user agents.
Here is the working code.
from latest_user_agents import get_latest_user_agents
import random
from selenium import webdriver
PATH = r'C:\Program Files (x86)\chromedriver.exe'  # raw string avoids backslash escapes
proxy = ''  # your proxy address
url = ''    # the URL to open
user_agent = random.choice(get_latest_user_agents())
options = webdriver.ChromeOptions()
options.add_argument(f'--proxy-server={proxy}')
options.add_argument(f'user-agent={user_agent}')
driver = webdriver.Chrome(PATH, options=options)
driver.get(url)
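To confirm the user agent actually took effect, you can ask the page itself; a small check, not part of the original answer:
# navigator.userAgent is what the site sees for this session.
print(driver.execute_script("return navigator.userAgent"))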
The difference between the two layouts: when JavaScript is disabled, Google shows the pagination of the first image layout.
To ensure you get the second layout every time, make sure JavaScript is enabled.
If you create your driver with options = webdriver.ChromeOptions(), the following is meant to keep JavaScript enabled:
options.add_argument("--enable-javascript")
Edit based on OP's comment
I got it working by using the latest_user_agents library. The fake_useragent library sometimes returned old user agents, which is why the old layout was showing.
Installing the latest_user_agents library: https://pypi.org/project/latest-user-agents/
Don't try to automate Google and Google products with automation tools; Google changes its web elements and page layouts every day.
The same guidance applies here as in the first answer above: logging into sites like Gmail and Facebook with WebDriver is against their usage terms, slow, and unreliable, and the official APIs are the better route.

How to access hidden instagram button using selenium python

I am making a project where I want to click a button, but the button is hidden. I am working in Python. I want to access the three-dot button that appears next to a comment when I hover over it.
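For hover-revealed elements in general, Selenium's ActionChains can move the pointer over the comment so the hidden button gets rendered. A sketch; both locators are hypothetical placeholders, not Instagram's real ones:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

# Hover over the comment so the hidden three-dot button appears, then
# click it. Both locators below are placeholders.
comment = driver.find_element(By.XPATH, '//div[@class="comment"]')
ActionChains(driver).move_to_element(comment).perform()
driver.find_element(By.XPATH, '//button[@aria-label="More options"]').click()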
This is considered a worst practice in Selenium.
As explained in the first answer above, driving sites like this through WebDriver is against their usage terms, slow, and unreliable; the official APIs are the better route.
Please check the SeleniumHQ website for details.

Selenium page source doesn't match the actual one

I was trying to parse tweets (say, https://twitter.com/Tesla), but I ran into a problem: once I download the source code using html = browser.page_source, it does not match what I see when inspecting the element (Ctrl+Shift+I). It shows some of the tweets, but not nearly all of them; moreover, when I save the code to a file and open it in Chrome, I get something incomprehensible. I have worked with Selenium before and have never run into such a problem. Maybe there is some other function to get the source?
By the way, I know that Twitter provides an API, but they declined my request without giving any reason, even though I do not plan to do anything against their terms.
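Tweets are rendered client-side as you scroll, so page_source only reflects what has been loaded into the DOM so far. A common workaround, sketched here rather than taken from the answers below, is to scroll and wait before grabbing the source:
import time

# Scroll a few times so the client-side app renders more tweets into the
# DOM before we snapshot it; the iteration count and delay are arbitrary.
for _ in range(5):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
html = browser.page_source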
This is one of the worst practices in Selenium.
The reasons are the same as in the first answer above: driving such sites through WebDriver is against their usage terms, slow, and unreliable compared to an official API.

Using Selenium for Python Scripting

I have written a Python code to open my gmail account. Here is the code that I am using:
from selenium import webdriver

browser = webdriver.Firefox()
browser.get('https://www.gmail.com')
# myemail and mypassword are assumed to be defined elsewhere.
emailElem = browser.find_element_by_id('email')
emailElem.send_keys(myemail)
passwordElem = browser.find_element_by_id('password')
passwordElem.send_keys(mypassword)
submitElem = browser.find_element_by_id('signInSubmit')
submitElem.submit()
Everything is working fine. I have also found that there are sites that let one log in only after entering a captcha, to prevent scripts from logging in.
Is there a way I can use my above code to get around this problem?
Experimentation. If the site does not show a captcha to normal users, you'll have to mimic being a human in your code. That could mean using time.sleep(x) so it seems like it takes a while before certain actions happen.
Otherwise, there are services out there that solve captchas for you.
If you perform the same actions repetitively, Gmail (or any other site that tries to block automation) will flag your actions as automated. To get around this, add random sleep times to your script. Switching between multiple credentials also helps.
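A minimal sketch of randomized pacing between actions; the bounds are arbitrary:
import random
import time

# Sleep a random 2-6 seconds so the timing between actions looks less scripted.
time.sleep(random.uniform(2, 6))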
For that, you would need a captcha-resolver API. Here is a website that provides the text of a captcha: https://2captcha.com/

Selenium: What functions would fire request?

I am new to Selenium and web applications. Please bear with me for a second if my question seems way too obvious. Here is my story.
I have written a scraper in Python that uses the Selenium 2.0 WebDriver to crawl AJAX web pages. One of the biggest challenges (and an ethical concern) is that I do not want to overwhelm the website's server, so I need a way to monitor the number of requests my webdriver fires on each page parsed.
I have done some Google searching. It seems like only Selenium RC provides such functionality, but I do not want to rewrite my code just for this. As a compromise, I decided to limit the rate of method calls that could lead to the headless browser firing requests at the server.
In the script, I have the following kind of method calls:
driver.find_element_by_XXXX()
driver.execute_script()
webElement.get_attribute()
webElement.text
I use the second function to scroll to the bottom of the window and get the AJAX content, like the following:
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
My intuition is that only the second call will trigger requests, since the others appear to parse existing HTML content.
Is my intuition wrong?
Many thanks.
Perhaps I should elaborate. I am automating a crawl of a website in Python. A substantial amount of work has been done, and the script runs without major bugs.
My colleagues, however, reminded me that if, while crawling a page, I make too many requests for the AJAX list within a short time, I may get banned by the server. This is why I started looking for a way to monitor, from the script, the number of requests my headless PhantomJS browser fires.
Since I could not find a way to monitor the number of requests in the script, I made the compromise mentioned above.
Therefore I need a way to monitor the number of requests my webdriver is firing on each page parsed
As far as I know, the number of requests depends on the webpage's design, i.e. the resources the page uses and the requests made by its JavaScript/AJAX. WebDriver opens a browser and loads the webpage just like a normal user would.
In Chrome, you can check the requests and responses using the Developer Tools panel; you can refer to this post. The current Developer Tools UI is different, but the basic functions are the same. Alternatively, you can use the Firebug plugin in Firefox.
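If you need the counts from within the script rather than by eye, Chrome's performance log can be enabled through Selenium. A sketch assuming Chrome rather than the PhantomJS mentioned in the question (goog:loggingPrefs is a Chrome-specific capability):
import json
from selenium import webdriver

options = webdriver.ChromeOptions()
options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")

# Each performance-log entry wraps a DevTools event as a JSON string;
# Network.requestWillBeSent marks an outgoing request.
events = [json.loads(entry["message"])["message"]
          for entry in driver.get_log("performance")]
sent = [e for e in events if e["method"] == "Network.requestWillBeSent"]
print(len(sent), "requests fired")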
Updated:
Another method to check the requests and responses is by using Wireshark. Please refer to these Wireshark filters.
