Python: Getting all Urls from every open Google Chrome Tab - python

I need to get all the URLs from all open Google Chrome tabs in Python 3 without interfering with the user. I'm on Windows 10 using Microsoft Visual Studio with Python 3.
I've tried:
Opening it directly with open(path to Current Tabs) -- doesn't work because I have no permission. I think the file is locked because Chrome actively writes to it.
Current_Tabs_Source = open(r"C:\Users\Beni\AppData\Local\Google\Chrome\User Data\Default\Current Tabs", "r")
Current_Tabs_Raw = Current_Tabs_Source.read()
print(Current_Tabs_Raw)  # just for checking
PermissionError: [Errno 13] Permission denied
Opening it through sqlite3 -- doesn't work because the database is locked, and I can't find a password anywhere. I've also tried to open the History database for the URLs, but that doesn't work either.
import sqlite3
from os import path, listdir

data_path = path.expanduser('~') + r"\AppData\Local\Google\Chrome\User Data\Default"
files = listdir(data_path)
history_db = path.join(data_path, 'history')
c = sqlite3.connect(history_db)
cursor = c.cursor()
select_statement = "SELECT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
cursor.execute(select_statement)
results = cursor.fetchall()
print(results)  # just for checking
sqlite3.OperationalError: database is locked
Using Selenium and a third-party Chrome extension to copy all URLs to the clipboard -- doesn't work because these extensions only act on the active Selenium window, so the windows with the tabs I actually want don't get copied.
I've considered hacking together a Chrome extension that copies the URLs to a temp file every 30 seconds, but I only know minimal JavaScript, so this is driving me mad.
So does anyone know a way to do this in Python? Any other solution is greatly appreciated.

If you want to access the database, you should close all browsers.
(Source)

When we're accessing the SQLite database of any browser, we have to close that browser first.
Moreover, the SQL command being used here fetches duplicate rows. `select_statement` needs to change:
select_statement = "SELECT DISTINCT urls.url, urls.visit_count FROM urls, visits WHERE urls.id = visits.url;"
Further, we need to add a loop to print all the URLs from Chrome's history database:
for url, count in results:
    print(url)  # print URLs line by line
However, this gives us the whole history of the Chrome browser, not the URLs of the presently opened tabs.

You can use the Windows Volume Shadow Copy Service to make a copy of the locked SQLite database and then read the copy normally.
There is a Python module for that: https://github.com/sblosser/pyshadowcopy
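If a plain file copy happens to succeed despite the lock (the lock is SQLite's own; on many setups Windows will still let you copy the file while Chrome is running), the same idea works without shadow copies: copy the database first, then query the copy. A minimal sketch under that assumption; `read_history_copy` is a name I made up, and the path would be the `History` file from the question:

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

def read_history_copy(history_path):
    """Copy the History database to a temp file and query the copy,
    sidestepping the lock Chrome holds on the original."""
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = Path(tmp) / "History_copy"
        shutil.copy2(history_path, copy_path)  # plain file copy of the locked DB
        conn = sqlite3.connect(str(copy_path))
        try:
            return conn.execute(
                "SELECT DISTINCT urls.url, urls.visit_count "
                "FROM urls, visits WHERE urls.id = visits.url;"
            ).fetchall()
        finally:
            conn.close()
```

If the copy itself fails with a permission error, fall back to the shadow-copy approach above.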

Related

"with open as f" won't open and It doesn't even say there's a problem

import requests

res = requests.get("https://google.com")
res.raise_for_status()
print(len(res.text))
print(res.text)
with open('mygoogle.html', "w", encoding='utf-8') as f:
    f.write(res.text)
The `with open(...) as f` part doesn't seem to work. If it did, it would create a new file, but no file appears.
I have tested this code and it works properly - mygoogle.html gets written and contains the expected content. The issue is likely outside of the script then. Since it is using a relative path, the issue could be that the file is getting written somewhere where you don't expect it due to the working directory not being set how you expect - Windows often defaults to C:\Windows\system32 or your user profile directory (%userprofile% aka C:\users\<username>) for new cmd instances.
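To confirm where the file actually lands, printing the resolved path before writing makes the working-directory issue visible. A small diagnostic sketch:

```python
import os

# A relative filename like 'mygoogle.html' resolves against the current
# working directory, which may not be the folder the script lives in.
print("working directory:", os.getcwd())
print("mygoogle.html would be written to:",
      os.path.abspath("mygoogle.html"))
```

Using an absolute path in the `open()` call removes the ambiguity entirely.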

Is there a way to automatically/programatically export cookies from Chrome on Windows 10?

Chrome cookies are stored under C:\Users\<your_username>\AppData\Local\Google\Chrome\User Data\Default\Cookies but they are in an encrypted SQLite database file.
Is there any way to access the cookies without going through Chrome itself? Or, failing that, to automate Chrome exporting the plaintext cookies for you?
I need to get the cookies for a domain automatically saved in a certain directory on a regular schedule. I can use powershell/batch or python to do this.
I found this script that works for Firefox on Linux:
Is it at all possible on Windows with Chrome, given the file is encrypted? I'm aware of various extensions that let you save cookies for a domain as plaintext, but that must be done manually - I want a scriptable solution.
I do not have a Windows computer at hand, but would this work?
import sqlite3
import win32crypt  # obtained from `pip install pywin32`

cookies_filepath = '...'  # <--- CHANGE ME
with sqlite3.connect(cookies_filepath) as connection:
    result = connection.execute('select host_key, name, encrypted_value from cookies')  # add other fields if you want
    for host_key, name, encrypted_value in result:
        # see http://timgolden.me.uk/pywin32-docs/win32crypt__CryptUnprotectData_meth.html
        crypt_description, value = win32crypt.CryptUnprotectData(encrypted_value, None, None, None, 0)
        print(f"{host_key!r} {name!r} {value!r}")
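One caveat: Chrome 80 and later no longer DPAPI-encrypt each cookie value directly. Values are AES-256-GCM encrypted with a master key stored in the `Local State` JSON file next to the profile; that key is base64-encoded, prefixed with the bytes `DPAPI`, and itself DPAPI-protected. A hedged sketch of extracting the still-encrypted key (`get_encrypted_master_key` is my own helper name; decrypting the result still needs `win32crypt.CryptUnprotectData` on Windows):

```python
import base64
import json

def get_encrypted_master_key(local_state_text):
    """Pull the base64 master key out of Chrome's Local State JSON and
    strip the 5-byte b'DPAPI' prefix. The returned bytes are still
    DPAPI-encrypted and must be passed to CryptUnprotectData."""
    state = json.loads(local_state_text)
    key = base64.b64decode(state["os_crypt"]["encrypted_key"])
    if not key.startswith(b"DPAPI"):
        raise ValueError("unexpected key format")
    return key[5:]
```

With the decrypted AES key in hand, each `encrypted_value` beginning with `v10`/`v11` can be decrypted with AES-GCM (e.g. via pycryptodome).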

How can I get chrome version using python code without having to actually open up a browser?

I'm writing a bunch of programs using python selenium. In order to scrape content from different websites, I need to download chromedriver.exe that is compatible with my current version of chrome. However, chrome is constantly being updated, so I want to write a program that will first check if chrome and chromedriver versions are compatible before running my programs. So, I need a way to get my current chrome version without using chromewebdriver or actually opening up a browser. Any suggestions?
If you are working on Linux, try this in a terminal (if you want the result in a Python script, use subprocess.Popen()):
google-chrome --version
You may need to use which google-chrome to check whether it is installed. Hope it helps.
For Windows you could try the CMD command reg query "HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\Google Chrome" as follows:
import os
stream = os.popen('reg query "HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Google Chrome"')
output = stream.read()
print(output)
To extract the google_version from the output you could try the following:
import os

def extract_version(output):
    try:
        google_version = ''
        for letter in output[output.rindex('DisplayVersion    REG_SZ') + 24:]:
            if letter != '\n':
                google_version += letter
            else:
                break
        return google_version.strip()
    except ValueError:  # rindex raises ValueError when the token is absent
        return

stream = os.popen('reg query "HKLM\\SOFTWARE\\Wow6432Node\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Google Chrome"')
output = stream.read()
google_version = extract_version(output)
print(google_version)
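A regex makes the parse less sensitive to the exact spacing of the `reg query` output. A sketch, assuming the `DisplayVersion    REG_SZ    <version>` line format shown above (`extract_version_re` is my own name):

```python
import re

def extract_version_re(output):
    """Return the version string after 'DisplayVersion ... REG_SZ',
    tolerating any amount of whitespace, or None if not found."""
    match = re.search(r"DisplayVersion\s+REG_SZ\s+(\S+)", output)
    return match.group(1) if match else None
```

This replaces the manual index arithmetic and character loop with a single search.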

In Python, urllib.urlretrieve downloads a file which says "Go away"

I'm trying to download the (APK) files from links such as https://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id=215041. When you enter the link in your browser, it brings up a dialog to open or save the file.
I would like to save the file using a Python script. I've tried the following:
import urllib

download_link = 'https://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id=215041'
download_file = '/tmp/apkmirror_test/youtube.apk'

if __name__ == "__main__":
    urllib.urlretrieve(url=download_link, filename=download_file)
but the resulting youtube.apk contains only the words "Go away".
Since I am able to download the file by pasting the link in my browser's address bar, there must be some difference between that and urllib.urlretrieve that makes this not work. Can someone explain this difference and how to eliminate it?
You should not programmatically access that download page as it is disallowed in the robots.txt:
https://www.apkmirror.com/robots.txt
That being said, your request headers are different. Python by default sets the User-Agent to something like "Python-urllib/x.y". That is the most likely cause of detection.
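For illustration, here is how the User-Agent can be overridden on a Python 3 request before downloading. Whether the site then serves the file is another matter, and its robots.txt still disallows this, so this is only a sketch of the mechanism:

```python
import urllib.request

download_link = 'https://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id=215041'

# Build a request carrying a browser-like User-Agent instead of the
# default "Python-urllib/x.y" that servers can easily detect.
req = urllib.request.Request(
    download_link,
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'},
)
print(req.get_header('User-agent'))
```

The request object would then be passed to `urllib.request.urlopen(req)` and the response body written to a file.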

Inconsistent Download 'Failed - Path too long' Error using Python Selenium Chromedriver

Stack: Windows 8.1, Python 3.4.3.4, Selenium 2.46, Chromedriver 2.16
I'm using Selenium Python bindings with Chromedriver to automate some downloads from the following URL: r'http://financials.morningstar.com/'
I have set up the following chromedriver preferences:
chromeOptions = webdriver.ChromeOptions()
prefs = {'download.default_directory':symbol_dir}
chromeOptions.add_experimental_option('prefs', prefs)
chromeOptions.add_extension(self.chromeUBlock_path)
chromedriver_path = self.chromedriver_path
Furthermore I have set up the following selenium code to run the download which works without a problem and downloads to the correct file location etc.:
symbol = 'AAPL'  # test symbol
IS_anl_statement = 'income-statement'
IS_anl_abbrev = 'is'
url_anl = self.mstar_base_url + r'{}/{}.html?t={}&region=usa&culture=en-US'
IS_anl = webdriver.Chrome(executable_path=chromedriver_path, chrome_options=chromeOptions)
IS_anl.set_page_load_timeout(90)
try:
    IS_anl.get(url_anl.format(IS_anl_statement, IS_anl_abbrev, symbol))
    anl_element = WebDriverWait(IS_anl, 90).until(EC.element_to_be_clickable(
        (By.LINK_TEXT, 'Annual')))
    anl_csv_element = WebDriverWait(IS_anl, 90).until(EC.element_to_be_clickable(
        (By.CSS_SELECTOR, self.css_path_export)))
    anl_csv_element.click()
    for i in range(1, 10, 1):
        time.sleep(i/50)
        if os.path.isfile(anl_file_string):
            break
except Exception as e:
    print(e)
IS_anl.quit()
However, when running the download with the following abbreviation (simply substituting balance-sheet in for income-statement), like so:
BS_anl_statement = 'balance-sheet'
BS_anl_abbrev = 'bs'
and the remaining Selenium code exactly the same I get the dreaded download error:
Failed-Path Too Long
This is strange because the actual filepath is not too long. I in fact tested the download in 3 different directories, each with a shorter filepath than the last. The final example path is: r"C:\mstar_data\\"
I'm stuck. The only difference I can see between the two download attempts is the actual CSV link. But even in this case the Income-Statement download url is in fact longer than the Balance-Sheet so again I'm stuck. Here they are:
I/S CSV url:
http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:ZUMZ&region=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=573205&denominatorView=raw&number=3
B/S CSV url:
http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNYS:A&region=usa&culture=en-US&cur=&reportType=bs&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=558981&denominatorView=raw&number=3
Any help on this inconsistency would be a great help and definitely appreciated. Thanks.
I was able to find a workaround for this issue. @Skandigraun was right: there was a piece of code that was trying to use the downloaded file too soon.
I used Advanced Uninstaller to clear all temporary files, internet history and so on.
I restarted my computer.
I altered the following code block regarding Skandigraun comments from:
for i in range(1, 10, 1):
    time.sleep(i/50)
    if os.path.isfile(anl_file_string)==True:
        break
to:
for i in range(1, 10, 1):
    time.sleep(i/20)  # changed from 50 to 20
    if os.path.isfile(anl_file_string)==True:
        break
and I added:
finally:
    IS_anl.quit()
Now the code runs with much more stability. I believe the os call testing whether the file existed was happening too quickly/frequently, and it caused a download failure. I added the finally clause to the try/except block to close the browser no matter what happened. This prevented the script from hanging if there was a problem accessing the site, timeout errors, etc.
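The fixed-count polling loop can still race the download on a slow connection. A more robust pattern is to poll until a deadline; a small sketch (`wait_for_file` is my own helper name):

```python
import os
import time

def wait_for_file(path, timeout=90.0, poll_interval=0.5):
    """Poll until `path` exists and is non-empty, or until `timeout`
    seconds elapse. Returns True if the file showed up in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        time.sleep(poll_interval)
    return False
```

Conveniently, Chrome writes in-progress downloads to a `.crdownload` partial file and only renames it to the final name when done, so waiting for the final filename also waits for the download to complete.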
