I'm trying to open a local file using urllib2. How can I go about doing this? When I try the following line with urllib:
resp = urllib.urlopen(url)
it works correctly, but when I switch it to:
resp = urllib2.urlopen(url)
I get:
ValueError: unknown url type: /path/to/file
where that file definitely does exist.
Thanks!
Just put "file://" in front of the path
>>> import urllib2
>>> urllib2.urlopen("file:///etc/debian_version").read()
'wheezy/sid\n'
With urllib.urlopen, if the URL parameter has no scheme identifier, it opens a local file; urllib2 doesn't behave like this, so urllib2.urlopen can't process a bare path.
It's good practice to include the 'file://' scheme identifier in the URL for both method calls.
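In Python 3, where urllib and urllib2 were merged into urllib.request, the same rule applies. Here's a small sketch (using a throwaway temp file, not a path from the question) that builds a well-formed file:// URL with pathlib rather than string concatenation:

```python
import tempfile
import urllib.request
from pathlib import Path

# Create a throwaway file to read back through a file:// URL.
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
    f.write('hello')
    path = Path(f.name)

url = path.as_uri()  # e.g. file:///tmp/tmpa1b2c3.txt
data = urllib.request.urlopen(url).read()
print(data)  # b'hello'
path.unlink()
```

Path.as_uri() also takes care of percent-encoding and the drive-letter quirks on Windows, which naive 'file://' + path does not.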
I had the same issue and actually, I just realized that if you download the source of the page and then open it in Chrome, your browser will show you the exact local path in the URL bar. Good luck!
I have a URL of the following form: https://website.com/chill & relax/folder/file.txt?a=1&b=2 (the link is a dummy example; it's not meant to work)
When I paste this URL in Firefox, I can fetch the wanted file.txt, but when I try to retrieve the file using python and requests, it doesn't work:
>>> import requests
>>> url = "https://website.com/chill & relax/folder/file.txt?a=1&b=2"
>>> requests.get(url)
Traceback (most recent call last):
[...]
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x00982B70>:
Failed to establish a new connection: [WinError 10060] [...]
How come Firefox can request the file but not requests? How can I replace the spaces and "&" signs only in the path part of my URL?
EDIT: I now believe that requests can actually perform requests with URLs that contain spaces. I think this issue is linked to my proxy: Firefox works with my proxy, but requests calls executed within PyCharm are blocked by it.
For path encoding you can use this:
from requests.utils import requote_uri
url = requote_uri("https://website.com/chill & relax/folder/file.txt?a=1&b=2")
But the link still doesn't work for me, even in a browser. Is the link actually valid?
Simply URL-encode your path, then append the parameters. Note that quote() would also encode the ":" after the scheme unless you mark it as safe:
import urllib.parse
import requests
url = urllib.parse.quote("https://website.com/chill & relax/folder/file.txt", safe=":/")
url += "?a=1&b=2"
r = requests.get(url)
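Alternatively, if you'd rather not treat the scheme specially by hand, a sketch using only the standard library: urlsplit isolates the path, so only that component gets percent-encoded, while the scheme, host, and query pass through untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def encode_path(url):
    # Percent-encode only the path component; scheme, host and query
    # are reassembled as-is.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, quote(parts.path),
                       parts.query, parts.fragment))

print(encode_path("https://website.com/chill & relax/folder/file.txt?a=1&b=2"))
# https://website.com/chill%20%26%20relax/folder/file.txt?a=1&b=2
```

This leaves the "&" separators in the query string alone, which is what you want here: only the "&" inside the path is data rather than a delimiter.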
I'd like to view a Django template in the browser. This particular template is called in render_to_string, but is not connected to a view, so I can't just runserver and navigate to the URL at my localhost.
My idea was to simply call render_to_string in the Django shell and somehow pass the resulting string to a web browser such as Chrome to view it. However, as far as I can tell the webbrowser module only accepts url arguments and can't be used to render strings representing HTML.
Any idea how I could achieve this?
Use a data: URL:
import base64
import webbrowser
html = b"..."
url = "data:text/html;base64," + base64.b64encode(html).decode("ascii")
webbrowser.open(url)
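For example, with a tiny hypothetical page (in Python 3, b64encode returns bytes, so it needs a .decode() before string concatenation, and the URL needs the data: prefix):

```python
import base64

html = b"<h1>Hello</h1>"  # hypothetical page content
url = "data:text/html;base64," + base64.b64encode(html).decode("ascii")
print(url)  # the whole page packed into a single data: URL
```

Passing this to webbrowser.open renders the page without touching disk, though be aware that some browsers refuse top-level navigation to data: URLs.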
You could convert the HTML string to a URL:
https://docs.python.org/2/howto/urllib2.html
Following Launch HTML code in browser (that is generated by BeautifulSoup) straight from Python, I wrote a test case in which I write the HTML to a temporary file and use the file:// prefix to turn that into a url accepted by webbrowser.open():
import os
import tempfile
import webbrowser
from django.test import SimpleTestCase
from django.template.loader import render_to_string

class ViewEmailTemplate(SimpleTestCase):
    def test_view_email_template(self):
        html = render_to_string('ebay/activate_to_family.html')
        fd, path = tempfile.mkstemp(suffix='.html')
        os.close(fd)  # mkstemp returns an open OS-level file descriptor
        url = 'file://' + path
        with open(path, 'w') as fp:
            fp.write(html)
        webbrowser.open(url)
(Unfortunately, I found that the page does not contain images referenced by Django's static tag, but that's a separate issue).
Here's a more concise solution which gets around the possible ValueError: startfile: filepath too long for Windows error in the solution by @marat:
from pathlib import Path
Path("temp.html").write_text(html, encoding='utf-8')
webbrowser.open("temp.html")
Path("temp.html").unlink()
I am trying to download a pdf from a webpage using urllib. I used the source link that downloads the file in the browser but that same link fails to download the file in Python. Instead what downloads is a redirect to the main page.
import os
import urllib
os.chdir(r'/Users/file')
url = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
urllib.urlretrieve (url, "downloaded_file")
Please try downloading the file manually from the link provided or from the redirected site; the link on the main page is called 'sectionals'.
Your help is much appreciated.
It is because the given link redirects you to a "raw" PDF file. Examining the response headers via Firebug, I am able to get the filename sectionals/2014/2607RAND.pdf (see screenshot below). As it is relative to the current .aspx file, the required URI (in your case, the value of the url variable) should be http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf
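Since that filename is relative to the .aspx page, urljoin from the standard library (urllib.parse in Python 3) resolves it without hand-editing the string:

```python
from urllib.parse import urljoin

page = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
# Relative path observed in the response headers
pdf_url = urljoin(page, "sectionals/2014/2607RAND.pdf")
print(pdf_url)
# http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf
```

urljoin drops the query string and the last path segment of the base URL, exactly as a browser would when resolving a relative link.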
In Python 3:
import urllib.request
import shutil
local_filename, headers = urllib.request.urlretrieve('http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414')
shutil.move(local_filename, 'ret.pdf')
The shutil is there because Python saves to a temp folder (in my case, that's on another partition, so os.rename would give me an error).
A portion of code that I have that will parse a web site does not work.
I can trace the problem to the .read function of my urllib2.urlopen object.
page = urllib2.urlopen('http://magiccards.info/us/en.html')
data = page.read()
Until yesterday, this worked fine; but now the length of the data is always 69496 instead of 122989, however when I open smaller pages my code works fine.
I have tested this on Ubuntu, Linux Mint and windows 7. All have the same behaviour.
I'm assuming that something has changed on the web server; but the page is complete when I use a web browser. I have tried to diagnose the issue with wireshark but the page is received as complete.
Does anybody know why this may be happening or what I could try to determine the issue?
The page seems to be misbehaving unless you request the content encoded as gzip. Give this a shot:
import urllib2
import zlib
request = urllib2.Request('http://magiccards.info/us/en.html')
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
data = zlib.decompress(response.read(), 16 + zlib.MAX_WBITS)
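The 16 + zlib.MAX_WBITS argument tells zlib to expect a gzip header and trailer rather than a raw deflate stream. A quick offline check, fabricating a gzip body with the gzip module instead of hitting the server:

```python
import gzip
import zlib

payload = gzip.compress(b"hello gzip")  # stand-in for a gzip-encoded HTTP body
# 16 + MAX_WBITS makes zlib accept (and verify) the gzip wrapper
data = zlib.decompress(payload, 16 + zlib.MAX_WBITS)
print(data)  # b'hello gzip'
```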
As Nathan suggested, you could also use the great Requests library, which accepts gzip by default.
import requests
data = requests.get('http://magiccards.info/us/en.html').text
Yes, the server is closing the connection, and you need keep-alive to be sent. urllib2 does not have that facility. There used to be urlgrabber, which provided an HTTPHandler that works alongside urllib2's opener, but unfortunately I don't find that working either. At the moment, you could use other libraries, like requests as demonstrated in the other answer, or httplib2:
import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://magiccards.info/us/en.html", "GET")
print len(content)
Basically, I am trying to download a URL using urllib2 in python.
The code is the following:
import urllib2
req = urllib2.Request('www.tattoo-cover.co.uk')
req.add_header('User-agent','Mozilla/5.0')
result = urllib2.urlopen(req)
It outputs ValueError and the program crashes for the URL in the example.
When I access the url in a browser, it works fine.
Any ideas how to handle the problem?
UPDATE:
Thanks to Ben James and sth, the problem is identified => add 'http://'.
Now the question is refined:
Is it possible to handle such cases automatically with some built-in function, or do I have to do error handling with subsequent string concatenation?
When you enter a URL in a browser without the protocol, it defaults to HTTP. urllib2 won't make that assumption for you; you need to prefix it with http://.
You have to use a complete URL including the protocol, not just specify a host name.
The correct URL would be http://www.tattoo-cover.co.uk/.
You can use the urlparse method from urllib.parse (Python 3) to check for the presence of an addressing scheme (http, https, ftp) and prepend one in case it is not present:
In [1]: from urllib.parse import urlparse
   ...: url = 'www.myurl.com'
   ...: if not urlparse(url).scheme:
   ...:     url = 'http://' + url
   ...: url
Out[1]: 'http://www.myurl.com'
You can use the urlparse function for that, I think: Python User Documentation