How to open a custom protocol url with urllib? - python

I use the Momentum Chrome extension to customize my browser's new tab and would like to write a Python script to get its daily dashboard wallpaper.
So far I know I can reach the desired page through the URL
chrome-extension://laookkfknpbbblfpciffpaejjkokdgca/dashboard.html
However, when I try to call urllib.request.urlopen with this url the following error is raised:
urllib.error.URLError: <urlopen error unknown url type: chrome-extension>
Is it possible to include custom protocols to be opened by urllib?
Or would there be another way to get the page's HTML?

If the file exists locally and not on the web, then using urllib won't do you much good, since the path is not a URL.
Use the webbrowser module instead and provide a path to your file:
import webbrowser

def auto_open():
    """
    Takes the absolute path to the HTML file and opens it directly in the browser.
    """
    html_page = 'path/to/your/file'
    # open in a new tab
    new = 2
    webbrowser.open(html_page, new=new)
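If the goal is only to get the HTML itself rather than display it, another option is to read the extension's files straight from disk. A minimal sketch, assuming a Linux-style Chrome profile location (the extensions directory and the version subfolder are assumptions and vary by OS, profile and extension version):
import glob
import os

# Assumed location of Chrome's installed extensions (varies by OS and profile).
EXT_DIR = os.path.expanduser(
    '~/.config/google-chrome/Default/Extensions/laookkfknpbbblfpciffpaejjkokdgca'
)

# Each extension version lives in its own subfolder; pick the latest one.
version_dirs = sorted(glob.glob(os.path.join(EXT_DIR, '*')))
dashboard = os.path.join(version_dirs[-1], 'dashboard.html')

with open(dashboard, encoding='utf-8') as f:
    html = f.read()
print(html[:200])
Note that this gives the static HTML of the page; anything the extension injects via JavaScript (such as the daily wallpaper) would still need to be fetched separately.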

Related

Struggling to grab data from website using python

I'm trying to grab snowfall data from the National Weather Service at this site:
https://www.nohrsc.noaa.gov/snowfall/
The data can be downloaded via the webpage with a 'click' on the file type in the drop-down, but I can't seem to figure out how to automate this using Python. They have an FTP archive, but it requires a login and I can't access it for some reason.
However, since the files can be downloaded via a "click" on the webpage interface, I imagine there must be a way to grab them using wget or urlopen. But I can't seem to figure out what the exact URL would be in this case in order to use those functions. Does anyone have any ideas on how to download this data straight from the website listed above?
Thanks!
You can inspect the links with the Chrome DevTools console.
Press F12, then click on the file type:
Here is a URL that appears: https://www.nohrsc.noaa.gov/snowfall/data/202112/sfav2_CONUS_6h_2021122618_grid184.nc
You can download it with Python using the Requests library:
import requests
r = requests.get('https://www.nohrsc.noaa.gov/snowfall/data/202112/sfav2_CONUS_6h_2021122618_grid184.nc')
data = r.content  # file content (bytes)
Or you can save it straight to a file with urlretrieve:
from urllib.request import urlretrieve
url = 'https://www.nohrsc.noaa.gov/snowfall/data/202112/sfav2_CONUS_6h_2021122618_grid184.nc'
dst = 'data.nc'
urlretrieve(url, dst)
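To automate downloads for other dates, note that the filename embeds a year-month folder and a date-hour stamp. A small sketch that builds the URL from a datetime; the sfav2_CONUS_6h_<YYYYMMDDHH>_grid184.nc pattern is inferred from the single example link above, so treat it as an assumption:
from datetime import datetime
import requests

def snowfall_url(dt):
    # Filename pattern inferred from the example link above (assumption).
    return ('https://www.nohrsc.noaa.gov/snowfall/data/'
            '{ym}/sfav2_CONUS_6h_{stamp}_grid184.nc').format(
                ym=dt.strftime('%Y%m'), stamp=dt.strftime('%Y%m%d%H'))

url = snowfall_url(datetime(2021, 12, 26, 18))
r = requests.get(url, timeout=30)
r.raise_for_status()
with open(url.rsplit('/', 1)[-1], 'wb') as f:
    f.write(r.content)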

python requests: url with another sub-requests

When I open this URL manually in my web browser, I see in my network console that three other requests are executed, and it works.
call: www.my.url/publish_something
get this cmd
get that cmd
post that...
How can I do it in Python requests?
So that I only call the "main" URL once and all the sub-requests are executed, like in my web browser.
publish_url = "www.my.url/publish_something"
r = self.session.get(publish_url, verify=False, params=p)
It seems that when I call this URL with the Python requests module, it does not execute the sub-requests.
When you open a URL in your browser, the browser:
- issues a GET request to that URL,
- parses the content,
- issues GET requests for each image tag and for each script, style etc. tag mentioning an external source,
- executes the scripts, which may lead to more sub-requests and DOM modifications,
- and finally renders the final DOM.
When you send a GET request with Python (with python-requests, the urllib module or whatever), only the first of the above stages is performed, so if you want more you'll have to do it yourself (parsing the content, retrieving images, etc.).
Or you can use a headless browser like PhantomJS.
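If replaying those sub-requests yourself is acceptable, a rough sketch is to fetch the main page, parse it, and then request each referenced resource. This assumes the sub-requests come from ordinary script/img/link tags (requests made dynamically by JavaScript would still be missed), that BeautifulSoup is installed, and uses a placeholder URL from the question:
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

session = requests.Session()
main_url = 'https://www.my.url/publish_something'  # placeholder URL from the question

resp = session.get(main_url)
soup = BeautifulSoup(resp.text, 'html.parser')

# Collect resources referenced by static tags.
resources = [tag.get('src') or tag.get('href')
             for tag in soup.find_all(['script', 'img', 'link'])]

for ref in resources:
    if ref:
        sub = session.get(urljoin(main_url, ref))
        print(sub.status_code, sub.url)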

Downloading a pdf from link but server redirects to homepage

I am trying to download a pdf from a webpage using urllib. I used the source link that downloads the file in the browser but that same link fails to download the file in Python. Instead what downloads is a redirect to the main page.
import os
import urllib
os.chdir(r'/Users/file')
url = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
urllib.urlretrieve (url, "downloaded_file")
Please try downloading the file manually from the link provided or from the redirected site; the link on the main page is called 'sectionals'.
Your help is much appreciated.
It is because the given link redirects you to a "raw" PDF file. Examining the response headers via Firebug, I was able to get the filename sectionals/2014/2607RAND.pdf, and since it is relative to the current .aspx file, the required URL (the url variable in your case) should be switched to http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf
In Python 3:
import urllib.request
import shutil
local_filename, headers = urllib.request.urlretrieve('http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414')
shutil.move(local_filename, 'ret.pdf')
The shutil call is there because urlretrieve saves to a temp folder (in my case that's on another partition, so os.rename would give me an error).
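As an alternative to urlretrieve plus shutil, the same download can be done with Requests, streaming the response straight into the destination file. This is a sketch of a general-purpose pattern, not something specific to this server:
import requests

url = 'http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf'

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open('ret.pdf', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)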

Make selenium grab all cookies

I was told to do a cookie audit of our front-facing sites. We own a lot of domains, so I'm really not going to manually dig through each one extracting the cookies. I decided to go with Selenium. This works up until the point where I want to grab third-party cookies.
Currently (python) I can do
driver.get_cookies()
That gives me all the cookies that are set from my domain, but it doesn't give me any Google, Twitter, Vimeo, or other third-party cookies.
I have tried modifying the cookie permissions in the Firefox driver, but it doesn't help. Does anyone know how I can get hold of them?
Your question has been answered on StackOverflow here
Step 1: You need to download and install the "Get All Cookies in XML" extension for Firefox (don't forget to restart Firefox after installing the extension).
Step 2: Execute this Python code to have Selenium's Firefox WebDriver save all cookies to an XML file and then read this file:
from xml.dom import minidom
from selenium import webdriver
import os
import time

def determine_default_profile_dir():
    """
    Returns the path of Firefox's default profile directory.
    """
    appdata_location = os.getenv('APPDATA')
    profiles_path = appdata_location + "/Mozilla/Firefox/Profiles/"
    dirs_files_list = os.listdir(profiles_path)
    default_profile_dir = ""
    for item_name in dirs_files_list:
        if item_name.endswith(".default"):
            default_profile_dir = profiles_path + item_name
    if not default_profile_dir:
        raise AssertionError("did not find Firefox default profile directory")
    return default_profile_dir

# Load Firefox with the default profile, so that the "Get All Cookies in XML" addon is enabled.
default_firefox_profile = webdriver.FirefoxProfile(determine_default_profile_dir())
driver = webdriver.Firefox(default_firefox_profile)

# Trigger Firefox to save the value of all cookies into an XML file in the Firefox profile directory.
driver.get("chrome://getallcookies/content/getAllCookies.xul")

# Wait for a bit to give Firefox time to write all the cookies to the file.
time.sleep(40)

# The cookies file will not be saved into the directory with the default profile, but into a temp directory.
current_profile_dir = driver.profile.profile_dir
cookie_file_path = current_profile_dir + "/cookie.xml"
print("Reading cookie data from cookie file: " + cookie_file_path)

# Load the cookies file and do what you need with it.
cookie_file = open(cookie_file_path, 'r')
xmldoc = minidom.parse(cookie_file)
cookie_file.close()
driver.close()

# Process all cookies in the xmldoc object.
Selenium can only get the cookies of the current domain:
getCookies
java.util.Set getCookies()
Get all the cookies for the current domain. This is the equivalent of
calling "document.cookie" and parsing the result
Anyway, I heard somebody used a Firefox plugin that was able to save all the cookies in XML. As far as I know, it is your best option.
Yes, I don't believe Selenium allows you to interact with cookies other than your current domain.
If you know the domains in question, then you could navigate to each domain, but I assume this is unlikely.
It would be a massive security risk if you could access cookies cross-site.
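If the third-party domains are known, the workaround mentioned above can be scripted: visit each domain in turn and collect what get_cookies() returns there. A small sketch (the domain list is purely illustrative):
from selenium import webdriver

driver = webdriver.Firefox()

# Illustrative list; replace with the domains you actually need to audit.
domains = ['https://www.google.com', 'https://twitter.com', 'https://vimeo.com']

all_cookies = {}
for domain in domains:
    driver.get(domain)
    # get_cookies() only returns cookies visible to the current domain.
    all_cookies[domain] = driver.get_cookies()

driver.quit()

for domain, cookies in all_cookies.items():
    print(domain, [c['name'] for c in cookies])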
You can get any cookie out of the browser's SQLite database file in the profile folder.
I added a more complete answer here:
Selenium 2 get all cookies on domain
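As a sketch of that approach for Firefox: cookies live in cookies.sqlite inside the profile directory, in a table named moz_cookies. This assumes the same Selenium/Firefox setup as the answer above (driver.profile.profile_dir), and copies the database first because Firefox may hold a lock on it while running:
import os
import shutil
import sqlite3
import tempfile

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://example.com')

# Selenium's temporary Firefox profile; cookies.sqlite lives inside it.
profile_db = os.path.join(driver.profile.profile_dir, 'cookies.sqlite')

# Copy the database first: Firefox may keep it locked while it is open.
tmp_db = os.path.join(tempfile.mkdtemp(), 'cookies.sqlite')
shutil.copy2(profile_db, tmp_db)

conn = sqlite3.connect(tmp_db)
# moz_cookies is Firefox's cookie table; host, name and value are its standard columns.
for host, name, value in conn.execute('SELECT host, name, value FROM moz_cookies'):
    print(host, name, value)
conn.close()

driver.quit()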

Opening Local File Works with urllib but not with urllib2

I'm trying to open a local file using urllib2. How can I go about doing this? When I try the following line with urllib:
resp = urllib.urlopen(url)
it works correctly, but when I switch it to:
resp = urllib2.urlopen(url)
I get:
ValueError: unknown url type: /path/to/file
where that file definitely does exist.
Thanks!
Just put "file://" in front of the path
>>> import urllib2
>>> urllib2.urlopen("file:///etc/debian_version").read()
'wheezy/sid\n'
With the urllib.urlopen method, if the URL parameter does not have a scheme identifier, it opens a local file, but urllib2 doesn't behave like this.
So the urllib2 method can't process it.
It's always good to include the 'file://' scheme identifier in both method calls for the url parameter.
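If you need to build that file:// URL from an arbitrary local path, the standard library can do the conversion. A small sketch in Python 2 to match the urllib2 usage above (in Python 3 the same helpers live in urllib.request and urllib.parse, or you can use pathlib.Path(...).as_uri()):
import urllib
import urllib2
import urlparse

path = '/etc/debian_version'  # any local path
file_url = urlparse.urljoin('file:', urllib.pathname2url(path))

print(file_url)  # file:///etc/debian_version
print(urllib2.urlopen(file_url).read())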
I had the same issue, and I just realized that if you download the source of the page and then open it in Chrome, your browser will show you the exact local path in the URL bar. Good luck!
