Open URL or local file in Python 3

I have a string.
I want:
If the string is a URL, I want to open that URL.
Otherwise, I want to open a local file with that name.
If there is no such object, raise an exception.
What is the correct and reasonably easy way to do this in Python 3?
The main issue is how to correctly determine whether a string is a URL.

Depends on what you mean by URL. If it's a web address, it will most often start with http:// or https:// (usually, those are the cases you care about, anyway). However, it may also start with ftp:// or some other protocol. Note that most libraries accept URIs, which include file URIs: in that scheme, a file location looks like a URL starting with file://, so you can pass your string along without caring whether it's a web address or a file, and the library will take care of it. There is no straightforward way of knowing in advance whether the address is valid, but the library will throw an exception if it's not.
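A minimal sketch of that idea using only the standard library (the helper name open_url_or_file is made up, and the scheme check assumes POSIX-style paths, since a Windows path like C:\file.txt would parse as having scheme "c"):
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

def open_url_or_file(s):
    # Anything with a scheme (http, https, ftp, file, ...) is treated as a URL.
    if urlparse(s).scheme:
        return urlopen(s)  # raises urllib.error.URLError if invalid or unreachable
    # Otherwise treat s as a local path and convert it to a file:// URI.
    return urlopen(Path(s).resolve().as_uri())  # raises if the file doesn't exist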

Try this:
import os
import webbrowser
import requests

# requests.get itself raises for a malformed or unreachable URL
if os.path.isfile(s) or requests.get(s).ok:
    webbrowser.open(s)
else:
    raise Exception('Neither a local file nor a reachable URL: %s' % s)

This hasn't been tested, so treat it as pseudocode:
import os

if s.startswith('http'):
    # It's a web address; handle it here
    ...
elif os.path.isfile(s):
    with open(s, 'r') as file:
        ...  # work with the local file here
else:
    raise Exception('No such URL or file: %s' % s)

Related

Using URL of file as file path in Python in Lambda

I am trying to acquire a file from a URL on the web and then open that file for use in an application I'm making in Python on AWS Lambda. There doesn't seem to be a way for me to acquire the file in the form I need it, which I believe to be an os.PathLike object.
Here is what I am trying now, which doesn't work, since requests.get returns a response, not a path. (I'm posting from a phone right now, so I cannot use code tags; apologies.)
filename = requests.get("url.com/file.txt")
f = open(filename, 'rb')
I have also tried urlparse and urllib's urlretrieve on the URL, but those do not return a path-like object either. Note that I don't believe I can just use wget or something at the shell level, since I am using AWS Lambda.
import requests
url = 'http://url.com/file.txt'
r = requests.get(url, allow_redirects=True)
f = open(r, 'rb')
When you do this kind of operation, it's always good to look at the entire response to the request you are making. I usually dump the dict attribute, which works quite often:
print(response.__dict__)
On the responses I have inspected, there was a _content field in the response object holding the file's bytes; it is exposed publicly as response.content. Then you can simply use the io module to read this as a file:
import io

file = io.BytesIO(response.content)
This object can then be used like a file opened with the open() function.
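Put together, a minimal sketch for the original Lambda use case (url.com/file.txt is the placeholder URL from the question):
import io
import requests

url = 'http://url.com/file.txt'
response = requests.get(url, allow_redirects=True)
response.raise_for_status()  # fail loudly if the download didn't succeed

f = io.BytesIO(response.content)  # file-like object backed by the downloaded bytes
data = f.read()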

urllib2.open error in python

I can't get the URL:
base_url = "http://status.aws.amazon.com/"
socket.setdefaulttimeout(30)
htmldata = urllib2.urlopen(base_url)

for url in parser.url_list:
    get_rss_th = threading.Thread(target=parser.get_rss, name="get_rss_th", args=(url,))
    get_rss_th.start()

print htmldata
This prints:
<addinfourl at 140176301032584 whose fp = <socket._fileobject object at 0x7f7d56a09750>>
When I call htmldata.read() instead (as suggested in Python error when using urllib.open), I get a blank screen.
Python 2.7. Whole code: https://github.com/tech-sketch/zabbix_aws_template/blob/master/scripts/AWS_Service_Health_Dashboard.py
The problem is that from the URL (an RSS feed) I can't get any output; the data variable from data = zbx_client.recv(4096) is empty, with no status.
There's no real problem with your code (except for a bunch of indentation errors and syntax errors that apparently aren't in your real code), only with your attempts to debug it.
First, you did this:
print htmldata
That's perfectly fine, but since htmldata is a urllib2 response object, printing it just prints that response object. Which apparently looks like this:
<addinfourl at 140176301032584 whose fp = <socket._fileobject object at 0x7f7d56a09750>>
That doesn't look like particularly useful information, but that's the kind of output you get when you print something that's only really useful for debugging purposes. It tells you what type of object it is, some unique identifier for it, and the key members (in this case, the socket fileobject wrapped up by the response).
Then you apparently tried this:
print htmldata.read()
But you had already called read on the same object earlier:
parser.feed(htmldata.read())
When you read() the same file-like object twice, the first time gets everything in the file, and the second time gets everything after everything in the file—that is, nothing.
What you want to do is read() the contents once, into a string, and then you can reuse that string as many times as you want:
contents = htmldata.read()
parser.feed(contents)
print contents
It's also worth noting that, as the urllib2 documentation said right at the top:
See also The Requests package is recommended for a higher-level HTTP client interface.
Using urllib2 can be a big pain, in a lot of ways, and this is just one of the more minor ones. Occasionally you can't use requests because you have to dig deep under the covers of HTTP, or handle some protocol it doesn't understand, or you just can't install third-party libraries, so urllib2 (or urllib.request, as it's renamed in Python 3.x) is still there. But when you don't have to use it, it's better not to. Even pip, which Python bundles via the ensurepip bootstrapper, uses a vendored copy of requests rather than urllib2.
With requests, the normal way to access the contents of a response is with the content (for binary) or text (for Unicode text) properties. You don't have to worry about when to read(); it does it automatically for you, and lets you access the text over and over. So, you can just do this:
import requests
base_url = "http://status.aws.amazon.com/"
response = requests.get(base_url, timeout=30)
parser.feed(response.content) # assuming it wants bytes, not unicode
print response.text
If I use this code:
import urllib2
import socket
base_url = "http://status.aws.amazon.com/"
socket.setdefaulttimeout(30)
htmldata = urllib2.urlopen(base_url)
print(htmldata.read())
I get the page's HTML code.

read() from an ExFileObject always causes a StreamError exception

I am trying to read just one file from a tar.gz archive. All operations over the tarfile object work fine, but when I read from a concrete member, a StreamError is always raised. Check this code:
import tarfile

fd = tarfile.open('file.tar.gz', 'r|gz')
for member in fd.getmembers():
    if not member.isfile():
        continue
    cfile = fd.extractfile(member)
    print cfile.read()
    cfile.close()
fd.close()
cfile.read() always raises "tarfile.StreamError: seeking backwards is not allowed".
I need to read the contents into memory, not dump them to a file (extractall works fine).
Thank you!
The problem is this line:
fd = tarfile.open('file.tar.gz', 'r|gz')
You don't want 'r|gz', you want 'r:gz'.
If I run your code on a trivial tarball, I can even print out the member and see test/foo, and then I get the same error on read that you get.
If I fix it to use 'r:gz', it works.
From the docs:
mode has to be a string of the form 'filemode[:compression]'
...
For special purposes, there is a second format for mode: 'filemode|[compression]'. tarfile.open() will return a TarFile object that processes its data as a stream of blocks. No random seeking will be done on the file… Use this variant in combination with e.g. sys.stdin, a socket file object or a tape device. However, such a TarFile object is limited in that it does not allow to be accessed randomly, see Examples.
'r|gz' is meant for when you have a non-seekable stream, and it only provides a subset of the operations. Unfortunately, it doesn't seem to document exactly which operations are allowed—and the link to Examples doesn't help, because none of the examples use this feature. So, you have to either read the source, or figure it out through trial and error.
But, since you have a normal, seekable file, you don't have to worry about that; just use 'r:gz'.
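For reference, the fixed version of the code from the question, with 'r:gz' and a with statement:
import tarfile

with tarfile.open('file.tar.gz', 'r:gz') as fd:
    for member in fd.getmembers():
        if member.isfile():
            print(fd.extractfile(member).read())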
In my case the problem was not only the file mode; I was also trying to seek on a network stream. I had the same error when trying to requests.get the file, so I extracted everything to a tmp directory instead:
import os
import tarfile
from lzma import LZMAFile

# stream == requests.get
inputs = [tarfile.open(fileobj=LZMAFile(stream), mode='r|')]
t = "/tmp"
for tarfileobj in inputs:
    tarfileobj.extractall(path=t, members=None)
for fn in os.listdir(t):
    with open(os.path.join(t, fn)) as payload:
        print(payload.read())
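A sketch of an alternative that skips the temp directory: a non-seekable 'r|' stream can still be read in memory, as long as you iterate over members in archive order instead of calling getmembers() up front:
import tarfile

with tarfile.open('file.tar.gz', 'r|gz') as fd:
    for member in fd:  # members arrive in archive order
        if member.isfile():
            print(fd.extractfile(member).read())  # read before advancing to the next member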

hgweb raw view returns the wrong content-type

Background: I'm using https://bitbucket.org/mariocesar/django-hgwebproxy/wiki/Home to add a Mercurial browser to a Django site I'm building.
The problem I'm having: the particular files we're storing in the Hg repo are BIND zone files, which happen to be named like /some/path/somedomain.com. That causes hgweb to set the content-type to application/x-msdos-program (when the content is really text/plain) when returning the raw view of the file. The incorrect content-type causes hgwebproxy to dump the content into the page template rather than just return it; it does a test like this to skip templating:
if response['content-type'].split(';')[0] in ('application/octet-stream', 'text/plain'):
    return response
Some possible solutions are, of course:
Rename all the files to .zone (Lame and time-consuming)
Hack hgwebproxy to pass application/x-msdos-program (Lame and dirty)
Convince hgweb to use the correct content-type (Awesome! I hope you'll help)
hgweb uses the mimetypes module to detect the MIME type of a file. You might be able to override the ".com" suffix detection by adding one of the settings files the module reads at init time; see mimetypes.knownfiles:
>>> import mimetypes
>>> mimetypes.init()
>>> mimetypes.knownfiles
['/etc/mime.types', '/etc/httpd/mime.types', '/etc/httpd/conf/mime.types', '/etc/apache/mime.types', '/etc/apache2/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/lib/netscape/mime.types', '/usr/local/etc/httpd/conf/mime.types', '/usr/local/etc/mime.types']
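If you can run code before hgweb guesses any types (as a Django app wrapping it presumably can), a programmatic override is also possible; this is a sketch of the standard mimetypes API, not anything hgweb-specific:
import mimetypes

# Register ".com" as plain text so guess_type stops reporting
# application/x-msdos-program for files like somedomain.com.
mimetypes.add_type('text/plain', '.com')
print(mimetypes.guess_type('somedomain.com'))  # ('text/plain', None)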

How to allow dynamically created file to be downloaded in CherryPy?

I'm trying to use CherryPy for a simple website, having never done Python web programming before.
I'm stuck trying to allow the download of a file that is dynamically created. I can create a file and return it from the handler, or call serve_fileobj() on the file, but in either case the contents of the file are simply rendered to the screen, rather than downloaded.
Does CherryPy offer any useful methods here? How can this be accomplished?
If you set the correct content type, you won't have to worry about it rendering in the browser when you return it unless it's appropriate. Try:
response.headers['Content-Type'] = 'application/foo'
(or whatever the correct MIME type for your content is) before you return the content.
Add a 'Content-Disposition: attachment; filename="<file>"' header to the response.
Putting the previous answers from #varela and #JasonFruit together:
dynamic_content = "this was generated on the fly!"
cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="<file>"'
return dynamic_content
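The same thing as a self-contained CherryPy app, for context (the handler name, content type, and filename here are made up for illustration):
import cherrypy

class Root:
    @cherrypy.expose
    def download(self):
        dynamic_content = "this was generated on the fly!"
        cherrypy.response.headers['Content-Type'] = 'text/plain'
        cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="generated.txt"'
        return dynamic_content

if __name__ == '__main__':
    cherrypy.quickstart(Root())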
