Get the current URL from chrome in python without using Selenium - python

I want to get the current URL from Chrome (or Firefox, for that matter) without using an automation tool like Selenium. I thought about reading the Chrome history DB and sorting by time, but that is only possible when the Chrome window is closed. I want to get the current URL while Chrome is open as well.
Any help is appreciated.
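One possible way around the locked history DB is to copy the file first and query the copy. The sketch below is only an illustration under assumptions not stated in the question: `history_db` is a path to Chrome's History file that you locate yourself (it varies by OS and profile), and the file is a SQLite database with a `urls` table that has `url` and `last_visit_time` columns. The function name `latest_history_url` is made up for this example. Note that the most recent history entry is not necessarily the tab that is currently focused, and recent visits may not be written to disk yet.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def latest_history_url(history_db):
    """Return the most recently visited URL found in a copy of the history DB.

    history_db: path to the browser's History file (assumed SQLite with a
    `urls` table; locate the real path for your OS and profile yourself).
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a snapshot so the query never touches the file Chrome holds open.
        snapshot = Path(tmp) / "History-copy"
        shutil.copy2(history_db, snapshot)

        conn = sqlite3.connect(str(snapshot))
        try:
            row = conn.execute(
                "SELECT url FROM urls ORDER BY last_visit_time DESC LIMIT 1"
            ).fetchone()
        finally:
            conn.close()

    # An empty history table yields no row.
    return row[0] if row else None
```

Calling `latest_history_url(path_to_history)` returns a string, or `None` if the table is empty. Whether the copy succeeds while the browser is running may depend on the platform.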

Related

Selenium page loads as blank unless browser is manually opened with same profile

I am using selenium for a crawling project, but I struggle with a specific webpage (both chrome and firefox).
I found 2 workarounds that work to an extent, but I want to know why this issue happens and how to avoid it.
1) Opening chrome manually and then opening selenium with my user profile.
If I manually start Chrome and then run:
from selenium import webdriver
options = webdriver.ChromeOptions()  # the options object was not defined in the original snippet
options.add_argument(r"user-data-dir=C:\Users\User\AppData\Local\Google\Chrome\User Data")
driver = webdriver.Chrome(options=options)
the page loads as intended.
2) Passing a variable in the request.
By appending /?anything to the URL, the page loads as intended in Selenium.
For some reason the webpage has a function in the header despite not loading... I suspect this could be a clue but I do not know enough to determine the cause.
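The second workaround (appending a throwaway query string) can be sketched with the standard library. This is only an illustration: the helper name `with_dummy_param` and the parameter name `nocache` are made up here, and the question does not say why the extra parameter helps.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_dummy_param(url, name="nocache", value="1"):
    """Return `url` with an extra query parameter appended.

    Existing query parameters and the fragment are preserved.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The result can then be passed to `driver.get(...)` in place of the bare URL.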

Why is a web page opened through selenium different from a normal browser?

I'm learning how to crawl data with Selenium, but I find that when a website is opened through Selenium, it looks different from what I get in a normal browser, even when I add headers. I'm very confused.
I would like to upload two screenshots for comparison, but I can't upload them to Stack Overflow at present. I even tried opening the browser through chromedriver and entering the web address manually, but the result is still different.
I use Python 3.6, selenium and chrome 75.0.3770.80
from selenium import webdriver
driver = webdriver.Chrome()  # create the driver instance
url = 'https://www.free-ss.ooo'
driver.get(url)
At present, I can't post pictures on stack overflow, but I just want to figure out how I can use selenium to get normal web pages.
Aha, I found the problem: the target site detected Selenium. The solution is to add this option:
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
I faced the same issue and was able to resolve it by removing or fixing the user-agent argument; after that it worked fine in both headless and non-headless mode.
The resolution was inspired by PDHide post

Python Selenium: Can selenium driver be identified via javascript?

This is the site that I want to login into: https://nid.naver.com/nidlogin.login
When I tried to log in to this site using Selenium WebDriver, it showed a CAPTCHA.
But when I typed the id/pw myself on the keyboard, the CAPTCHA didn't show up!
How can selenium driver be detected?
It depends on your driver. Chromedriver does set specific js variables when it starts the browser. I'm sure other driver vendors have something similar. So, in short, yes. There are different ways it can determine that you are running via webdriver.
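As one concrete illustration (not necessarily what this particular site checks), the WebDriver specification defines a `navigator.webdriver` property that page scripts can read. The sketch below asks the browser for that value from Python; the helper name `reports_webdriver` is made up here, and `driver` is assumed to be any object with Selenium's `execute_script` method.

```python
def reports_webdriver(driver):
    """Return True if the page's JavaScript sees navigator.webdriver as truthy.

    `driver` is assumed to be a Selenium WebDriver instance (anything that
    provides execute_script works).
    """
    # The same expression is available to any script running on the page.
    return bool(driver.execute_script("return navigator.webdriver"))
```

A site can evaluate the same expression in its own JavaScript, so a truthy result here means the page is able to tell that the browser is under automation. Sites may also rely on other signals.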

Python : Browser not able to browse URL using selenium

I am writing a Python script that opens a URL and performs some activity on it. When I execute the code below, the Firefox browser starts but it is not able to browse to the URL. What could be wrong here?
I also tried adding a proxy exception, but that didn't solve the issue.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('WEBSITE_URL')
Please suggest what is wrong here.

Selenium with Python, how do I get the page output after running a script?

I'm not sure how to find this information. I have found a few tutorials so far about using Python with Selenium, but none of them touch on this. I am able to run some basic test scripts through Python that automate Selenium, but they just show the browser window for a few seconds and then close it. I need to get the browser output into a string or variable (ideally), or at least save it to a file, so that Python can do other things with it (parse it, etc.). I would appreciate it if anyone could point me towards resources on how to do this. Thanks.
Using Selenium WebDriver and Python, you would simply access the .page_source property to get the source of the current page.
For example, using the Firefox() driver:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.example.com/')
print(driver.page_source)
driver.quit()
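Since the question also asks about saving the output to a file, here is a minimal sketch of that step. The helper name `save_page_source` and the choice of UTF-8 are assumptions made for this example; `driver` is assumed to be a WebDriver instance that has already loaded a page.

```python
from pathlib import Path


def save_page_source(driver, path):
    """Write the current page's HTML to `path` and return it as a string.

    `driver` is assumed to expose Selenium's page_source property.
    """
    html = driver.page_source
    # Encoding is fixed to UTF-8 here so the file reads back the same everywhere.
    Path(path).write_text(html, encoding="utf-8")
    return html
```

Call it before `driver.quit()`, for example `html = save_page_source(driver, "page.html")`, and then parse `html` or the saved file as needed.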
There's a Selenium.getHtmlSource() method in Java; most likely it is also available in Python. It returns the source of the current page as a string, so you can do whatever you want with it.
OK, so here is how I ended up doing this, for anyone who needs it in the future.
You have to use Firefox for this to work.
1) Create a new Firefox profile (not necessary, but ideal so as to separate this from normal Firefox usage). There is plenty of information online on how to do this; the steps depend on your OS.
2) Get the Firefox plugin: https://addons.mozilla.org/en-US/firefox/addon/2704/ (it automatically saves all pages for a given domain name). You need to configure it to save whichever domains you intend to auto-save.
3) Then just start the Selenium server with the profile you created (below is an example for Linux):
cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/
That's it. It will now save all the pages for a given domain name whenever Selenium visits them. Selenium does create a bunch of garbage pages too, so you could delete these with a simple regex match. From there, it is up to you how to manipulate the saved pages.
