I am trying to acquire a file from a URL on the web and then open that file for use in an application I’m making in Python on AWS Lambda. There doesn’t seem to be a way for me to acquire the file in the form I need it, which I believe to be an os.PathLike object.
Here is what I am trying now, which doesn’t work since requests.get returns a Response, not a path. I’m posting from a phone right now so I cannot use code tags. Apologies.
filename = requests.get("url.com/file.txt")
f = open(filename, 'rb')
I have also tried urlparse and urllib's urlretrieve on the URL, but those do not return a path-like object either. Note that I don't believe I can just use wget or something at the shell level, since I am using AWS Lambda.
import requests
url = 'http://url.com/file.txt'
r = requests.get(url, allow_redirects=True)
f = open(r, 'rb')
When you do an operation like this, it's always good to look at the entire response of the request you are making. I usually use the __dict__ attribute; it works quite often:
print(response.__dict__)
On the ones I have done, there was a _content field in the response object with the file bytes. Then you can simply use the io module to read this file:
file = io.BytesIO(response._content)
This can then be used just like a file object returned by the open() function.
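For completeness, a minimal sketch of that approach applied to the original question, using the public response.content attribute (which exposes the same bytes as _content) so nothing has to touch the Lambda filesystem:
import io
import requests

url = 'http://url.com/file.txt'  # placeholder URL from the question

response = requests.get(url)
response.raise_for_status()      # fail early on HTTP errors

file_like = io.BytesIO(response.content)   # behaves like a file opened with open(..., 'rb')
first_line = file_like.readline()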
I can't get URL
base_url = "http://status.aws.amazon.com/"
socket.setdefaulttimeout(30)
htmldata = urllib2.urlopen(base_url)
for url in parser.url_list:
get_rss_th = threading.Thread(target=parser.get_rss,name="get_rss_th", args=(url,))
get_rss_th.start()
print htmldata
<addinfourl at 140176301032584 whose fp = <socket._fileobject object at 0x7f7d56a09750>>
When I use htmldata.read() instead (see "Python error when using urllib.open"), I just get a blank screen.
Python 2.7.
Whole code: https://github.com/tech-sketch/zabbix_aws_template/blob/master/scripts/AWS_Service_Health_Dashboard.py
The problem is that from the URL link (RSS feed) I can't get any output: the data variable from data = zbx_client.recv(4096) is empty, with no status.
There's no real problem with your code (except for a bunch of indentation errors and syntax errors that apparently aren't in your real code), only with your attempts to debug it.
First, you did this:
print htmldata
That's perfectly fine, but since htmldata is a urllib2 response object, printing it just prints that response object, which apparently looks like this:
<addinfourl at 140176301032584 whose fp = <socket._fileobject object at 0x7f7d56a09750>>
That doesn't look like particularly useful information, but that's the kind of output you get when you print something that's only really useful for debugging purposes. It tells you what type of object it is, some unique identifier for it, and the key members (in this case, the socket fileobject wrapped up by the response).
Then you apparently tried this:
print htmldata.read()
But you had already called read() on the same object earlier:
parser.feed(htmldata.read())
When you read() the same file-like object twice, the first time gets everything in the file, and the second time gets everything after everything in the file—that is, nothing.
What you want to do is read() the contents once, into a string, and then you can reuse that string as many times as you want:
contents = htmldata.read()
parser.feed(contents)
print contents
It's also worth noting that, as the urllib2 documentation said right at the top:
See also: The Requests package is recommended for a higher-level HTTP client interface.
Using urllib2 can be a big pain, in a lot of ways, and this is just one of the more minor ones. Occasionally you can't use requests because you have to dig deep under the covers of HTTP, or handle some protocol it doesn't understand, or you just can't install third-party libraries, so urllib2 (or urllib.request, as it's renamed in Python 3.x) is still there. But when you don't have to use it, it's better not to. Even pip, which Python bundles via the ensurepip bootstrapper, vendors requests rather than relying on urllib2.
With requests, the normal way to access the contents of a response is with the content (for binary) or text (for Unicode text) properties. You don't have to worry about when to read(); it does it automatically for you, and lets you access the text over and over. So, you can just do this:
import requests
base_url = "http://status.aws.amazon.com/"
response = requests.get(base_url, timeout=30)
parser.feed(response.content) # assuming it wants bytes, not unicode
print response.text
If I use this code:
import urllib2
import socket
base_url = "http://status.aws.amazon.com/"
socket.setdefaulttimeout(30)
htmldata = urllib2.urlopen(base_url)
print(htmldata.read())
I get the page's HTML code.
I would like to download a zip file from internet and extract it.
I would rather use requests. I don't want to write to the disk.
I knew how to do that in Python 2 but I am clueless for Python 3.3. Apparently, zipfile.ZipFile wants a file-like object, but I don't know how to get that from what requests returns.
If you know how to do it with urllib.request, I would be curious to see how you do it too.
I found out how to do it:
request = requests.get(url)
file = zipfile.ZipFile(BytesIO(request.content))
What I was missing:
request.content should be used to access the bytes
io.BytesIO is the correct file-like object for bytes.
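Putting that together, a minimal runnable sketch (the URL is a placeholder), keeping everything in memory so nothing is written to disk:
import io
import zipfile

import requests

zip_url = 'http://example.com/archive.zip'  # placeholder URL

response = requests.get(zip_url)
response.raise_for_status()

with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
    print(archive.namelist())                    # list the members
    data = archive.read(archive.namelist()[0])   # read one member, still in memory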
Here's another approach that saves you having to install requests (assuming url holds the address of the zip file):
import urllib.request, zipfile
from io import BytesIO

r = urllib.request.urlopen(url)
with zipfile.ZipFile(BytesIO(r.read())) as z:
    print(z.namelist())
Using Requests, this can be done very simply.
import requests, zipfile, StringIO
response = requests.get(zip_file_url)
zipDocument = zipfile.ZipFile(StringIO.StringIO(response.content))
Using StringIO you can make a file-like object from the response's content attribute. (On Python 3, use io.BytesIO instead, since content is bytes.)
If you want to extract to a directory you can use ZipFile's extractall() function:
zipDocument.extractall()
The process I'm performing seems to make logical sense to me, but I keep getting an error. I have a binary file I'm trying to send to a server (Shapeways, to be exact; it's a binary 3D model file), so I go through this process to make it acceptable in a URL:
theFile = open(fileloc,'rb')
contents = theFile.read()
b64 = base64.urlsafe_b64encode(contents)
url = urllib.urlencode(b64) # error
The problem is that the last line always throws the error:
TypeError: not a valid non-string sequence or mapping object
Which doesn't make sense to me, as the data is supposed to be encoded for URLs. Is it possible it simply contains other characters that weren't encoded, or something like that?
urllib.urlencode converts a sequence of two-element tuples or a dictionary into a URL query string (that is basically an excerpt from its docstring), but you are passing it just a string as an argument.
You can try something like that:
theFile = open(fileloc,'rb')
contents = theFile.read()
b64 = base64.urlsafe_b64encode(contents)
url = urllib.urlencode({'shape': b64})
but all you get in the url variable is the encoded parameters, so you still need the actual URL. If you don't need low-level operations, it is better to use the requests library:
import requests
import base64
url = 'http://example.com'
r = requests.post(
    url=url,
    data={'shape': base64.urlsafe_b64encode(open(fileloc, 'rb').read())}
)
If you're just trying to send a file to your server, you shouldn't need to urlencode it. Send it using a POST request.
You can use urllib2 or you could use the requests lib which can simplify things a bit.
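For example, a minimal sketch with requests (the upload URL and form field name here are hypothetical; the real Shapeways API will expect its own endpoint and parameters):
import requests

upload_url = 'http://example.com/upload'  # hypothetical endpoint, not the real Shapeways API

with open(fileloc, 'rb') as f:
    # files= sends the raw bytes as a multipart/form-data upload, no urlencoding needed
    response = requests.post(upload_url, files={'model': f})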
This SO thread may help you as well.
I have an issue with Python.
My case: I have a gzipped file from a partner platform (i.e. h..p//....namesite.../xxx).
If I click the link in my browser, it downloads a file (i.e. namefile.xml.gz).
So... if I read this file locally with Python I can decompress and read it.
Code:
content = gzip.open('namefile.xml.gz', 'rb')
print content.read()
But I can't do this if I try to read the file from the remote source.
From the remote file I can read only the encoded (compressed) string, but I can't decode it.
Code:
response = urllib2.urlopen(url)
encoded =response.read()
print encoded
With this code I can read the encoded string... but I can't decode it with gzip or lzip.
Any advice?
Thanks a lot
Unfortunately the method @Aya suggests does not work, since GzipFile makes extensive use of the seek method of the file object (which is not supported by the response).
So you have basically two options:
Read the contents of the remote file into io.BytesIO, and pass that object into gzip.GzipFile (if the file is small); a minimal sketch follows below
Download the file into a temporary file on disk, and use gzip.open
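A minimal sketch of that first option (the URL is a placeholder; io.BytesIO supports seek, which is what GzipFile needs):
import gzip
import io
import urllib2

url = 'http://example.com/namefile.xml.gz'  # placeholder URL

response = urllib2.urlopen(url)
buf = io.BytesIO(response.read())        # whole compressed body held in memory
with gzip.GzipFile(fileobj=buf) as gz:   # BytesIO is seekable, so GzipFile is happy
    print gz.read()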
There is another option (which requires some coding): implement your own reader using the zlib module. It is rather easy, but you will need to know about a magic constant (see "How can I decompress a gzip stream with zlib?").
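A sketch of that zlib approach (same placeholder URL); the magic constant 16 + zlib.MAX_WBITS tells zlib to expect a gzip header, and decompressobj lets you feed the response in chunks:
import urllib2
import zlib

url = 'http://example.com/namefile.xml.gz'  # placeholder URL

response = urllib2.urlopen(url)
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)  # the "magic constant": gzip wrapper expected

chunks = []
while True:
    chunk = response.read(8192)
    if not chunk:
        break
    chunks.append(decompressor.decompress(chunk))
chunks.append(decompressor.flush())
data = ''.join(chunks)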
If you use Python 3.2 or later, the bug in GzipFile (requiring tell support) is fixed, but apparently they aren't going to backport the fix to Python 2.x.
For Python v3.2 or later, you can use the gzip.GzipFile class to wrap the file object returned by urllib2.urlopen(), with something like this...
import urllib2
import gzip
response = urllib2.urlopen(url)
gunzip_response = gzip.GzipFile(fileobj=response)
content = gunzip_response.read()
print content
...which will transparently decompress the response stream as you read it.
I have a program where I need to open many webpages and download information in them. The information, however, is in the middle of the page, and it takes a long time to get to it. Is there a way to have urllib only retrieve x lines? Or, if nothing else, don't load the information afterwards?
I'm using Python 2.7.1 on Mac OS 10.8.2.
The returned object is a file-like object, and you can use .readline() to only read a partial response:
import urllib

resp = urllib.urlopen(url)
for i in range(10):
    line = resp.readline()
would read only 10 lines, for example. Note that this won't guarantee a faster response.
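A slightly fuller sketch in the same Python 2.7 style, stopping after the first ten lines and closing the connection so the rest of the page is never downloaded (the URL is a placeholder):
import urllib

url = 'http://example.com/page.html'  # placeholder URL

resp = urllib.urlopen(url)
first_lines = []
for _ in range(10):
    line = resp.readline()
    if not line:               # page shorter than 10 lines
        break
    first_lines.append(line)
resp.close()                   # stop downloading the remainder

print ''.join(first_lines)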