When getting a file-like object from an API request, and copying it to a file, is there any way to determine what the file type was from the file-like object?
In the below example, I'm copying the response to testing.pdf because I just happen to know that the document in question was a PDF.
However, I'm trying to write a script that will take in a series of document IDs and run a bulk download from the source system (Oracle Aconex in this case).
This might be a question aimed more at the Aconex API, but I don't know much about Python's requests library or how it works in the background, so I thought I'd ask in case there's something I'm missing.
import requests
import shutil
# Set aconex username and password
user_pass = (username, password)
# Document ID
doc_id = "xxxxxxxxxxxx"
# API Connection String
download_docs_string = "https://uk1.aconex.co.uk/api/projects/xxxxxxxx/register/" + doc_id
# API request ~ content is returned as bytes
response = requests.get(download_docs_string, auth = user_pass, stream = True)
# Copy contents of file-like object to new file
with open("testing.pdf", "wb") as f:
    response.raw.decode_content = True
    shutil.copyfileobj(response.raw, f)
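One possible approach is to take the extension from `response.headers` rather than from the stream itself. This is only a sketch: it assumes the server sends a `Content-Disposition` and/or `Content-Type` header with the download, which I have not verified for Aconex. The name `extension_from_headers` and the `.bin` fallback are made up for illustration.

```python
import mimetypes
import os
from email.message import Message


def extension_from_headers(headers, default=".bin"):
    """Guess a file extension from HTTP response headers (hypothetical helper)."""
    # Reuse the standard library's header parser instead of splitting strings by hand.
    msg = Message()
    for name in ("Content-Disposition", "Content-Type"):
        value = headers.get(name)
        if value:
            msg[name] = value

    # Prefer the filename suggested by the server, if there is one.
    filename = msg.get_filename()
    if filename:
        ext = os.path.splitext(filename)[1]
        if ext:
            return ext

    # Otherwise map the MIME type to an extension.
    if "Content-Type" in msg:
        ext = mimetypes.guess_extension(msg.get_content_type())
        if ext:
            return ext

    # Assumption: callers are happy with a generic extension as a last resort.
    return default
```

With the request from the question, this could be called as `ext = extension_from_headers(response.headers)` before opening the output file, for example `open(doc_id + ext, "wb")`.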
I need a JSON file on the server to store some data; it won't be big enough to need a database. I want to read the file and, once I've finished using it, overwrite it to keep the data up to date.
I tried like this:
@app.route("/json")
def readwrite():
    SITE_ROOT = os.path.realpath(os.path.dirname(__file__))
    json_url = os.path.join(SITE_ROOT, 'static', 'test.json')
    token = open(json_url)
    return token
But I get a 404 error on that route. I'm not sure how to read the data back and then rewrite it. Please help if you see any problem in my code. Thanks!
You are returning the file handle to the client over HTTP. Read the JSON data from the file and return that instead.
stored_json = token.read()
token.close()
return stored_json
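For the "read it, then overwrite it" part of the question, here is a minimal sketch using only the standard library. The helper names `load_data` and `save_data` are hypothetical, and it assumes a single process is writing the file; concurrent writers would need locking on top of this.

```python
import json
import os
import tempfile


def load_data(path, default=None):
    """Return the parsed JSON stored at `path`, or `default` if the file is missing."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def save_data(path, data):
    """Overwrite `path` with `data`, writing to a temporary file first."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        # Swap the finished file into place so readers never see a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        os.remove(tmp_path)
        raise
```

Inside the view, the loaded dict could then be returned with Flask's `jsonify`, and `save_data(json_url, data)` called after the data has been changed.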
I'm using urllib2 to download a file from another server via a REST API (the URL can't be exposed to the user, which is why it needs to be done on the backend).
It gives me the following response:
(<addinfourl at 4365818480 whose fp = <google.appengine.dist27.socket._fileobject object at 0x1043883d0>>
I'm now trying to find a way to serve this file to the end user (a download). I did quite a bit of research tonight but had no luck. I tried using print .read() and that didn't help either.
Here's some additional information:
The platform is Google App Engine, and the relevant code is below:
In calltrunk.get_recording:
req = urllib2.Request(url, None, forward_headers)
print response[0].read()
stream = urllib2.urlopen(req)
In my main.py
response = calltrunk.get_recording(ConversationId=cId)
print response[0].read()
Could really use a hand here!
I am testing my web application by sending it requests from Python. I can send requests, receive responses, and parse the JSON. However, one option on the page is to download files. I send the download request and can confirm that the response headers contain what I expect (application/octet-stream and the appropriate filename), but the Content-Length is 0. If the length is 0, I assume the file was not actually sent. I can download the files by other means, so I know my software works, but I am having trouble getting it to work with Python.
I build up the request then do:
f = urllib.request.urlopen(request)
f.body = f.read()
I expect data to be in f.body but it is empty (I see "b''")
Is there a different way to access the file contents from an attachment in python?
This uses python-requests instead of urllib, since I'm more familiar with it.
import requests
url = "http://example.com/foobar.jpg"
#make request
r = requests.get(url)
attachment_data = r.content
#save to file
with open(r"C:/pictures/foobar.jpg", 'wb') as f:
    f.write(attachment_data)
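If the attachment might be large, or you want to detect an empty body like the one described in the question, a variation along these lines could help. It is a sketch only: `save_response` is a hypothetical helper, and it assumes the response was requested with `stream=True` so that `iter_content` reads from the network in pieces.

```python
def save_response(response, path, chunk_size=8192):
    """Write a streamed requests response to `path` and return the number of bytes written."""
    written = 0
    with open(path, "wb") as f:
        # iter_content yields the body in blocks of up to chunk_size bytes.
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


# Usage (needs the third-party requests package and a reachable URL):
#   import requests
#   r = requests.get(url, stream=True)
#   r.raise_for_status()
#   if save_response(r, "foobar.jpg") == 0:
#       print("server sent an empty body")
```

Returning the byte count makes it easy to tell "the download worked" apart from "the server answered with headers but no data".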
Turns out I needed to throw some data into the file in order to have something in the body. I should've noticed this much sooner.
I currently use WebFaction for my hosting with the basic package that gives us 80MB of RAM. This is more than adequate for our needs at the moment, apart from our backups. We do our own backups to S3 once a day.
The backup process is this: dump the database, tar.gz all the files into one backup named with the correct date of the backup, upload to S3 using the python library provided by Amazon.
Unfortunately, it appears (although I don't know this for certain) that either my code for reading the file or the S3 code is loading the entire file into memory. As the file is approximately 320MB (for today's backup), the backup alone uses about 320MB. This causes WebFaction to kill all our processes, meaning the backup doesn't happen and our site goes down.
So this is the question: is there any way to avoid loading the whole file into memory, or are there any other Python S3 libraries that are much better with RAM usage? Ideally it needs to use about 60MB at most! If this can't be done, how can I split the file and upload separate parts?
Thanks for your help.
This is the section of code (in my backup script) that caused the processes to be quit:
filedata = open(filename, 'rb').read()
content_type = mimetypes.guess_type(filename)[0]
if not content_type:
    content_type = 'text/plain'
print 'Uploading to S3...'
response = connection.put(BUCKET_NAME, 'daily/%s' % filename, S3.S3Object(filedata), {'x-amz-acl': 'public-read', 'Content-Type': content_type})
It's a little late but I had to solve the same problem so here's my answer.
Short answer: yes, in Python 2.6+! httplib has accepted file-like objects as the request body since v2.6, so all you need is...
fileobj = open(filename, 'rb')
content_type = mimetypes.guess_type(filename)[0]
if not content_type:
    content_type = 'text/plain'
print 'Uploading to S3...'
response = connection.put(BUCKET_NAME, 'daily/%s' % filename, S3.S3Object(fileobj), {'x-amz-acl': 'public-read', 'Content-Type': content_type})
Long answer...
The S3.py library uses python's httplib to do its connection.put() HTTP requests. You can see in the source that it just passes the data argument to the httplib connection.
From S3.py...
def _make_request(self, method, bucket='', key='', query_args={}, headers={}, data='', metadata={}):
    ...
    if (is_secure):
        connection = httplib.HTTPSConnection(host)
    else:
        connection = httplib.HTTPConnection(host)
    final_headers = merge_meta(headers, metadata);
    # add auth header
    self._add_aws_auth_header(final_headers, method, bucket, key, query_args)
    connection.request(method, path, data, final_headers) # <-- IMPORTANT PART
    resp = connection.getresponse()
    if resp.status < 300 or resp.status >= 400:
        return resp
    # handle redirect
    location = resp.getheader('location')
    if not location:
        return resp
    ...
If we take a look at the python httplib documentation we can see that...
HTTPConnection.request(method, url[, body[, headers]])
This will send a request to the server using the HTTP request method method and the selector url. If the body argument is present, it should be a string of data to send after the headers are finished. Alternatively, it may be an open file object, in which case the contents of the file is sent; this file object should support fileno() and read() methods. The header Content-Length is automatically set to the correct value. The headers argument should be a mapping of extra HTTP headers to send with the request.
Changed in version 2.6: body can be a file object.
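For anyone on Python 3, where httplib was renamed to `http.client`, the same idea might look roughly like the sketch below. `put_file` is a hypothetical helper, not part of S3.py, and it leaves out the AWS authentication headers that a real S3 request needs.

```python
import os


def put_file(connection, url, path, extra_headers=None):
    """PUT the file at `path` through an http.client-style connection.

    The open file object is passed as the body, so the connection reads and
    sends it block by block instead of us calling .read() on the whole file.
    """
    headers = dict(extra_headers or {})
    # Set the length explicitly: without it, Python 3.6+ switches to chunked
    # transfer encoding for file bodies, which not every server accepts.
    headers["Content-Length"] = str(os.path.getsize(path))
    with open(path, "rb") as fileobj:
        connection.request("PUT", url, body=fileobj, headers=headers)
        return connection.getresponse()


# Usage (host and URL are placeholders):
#   import http.client
#   conn = http.client.HTTPSConnection("example.com")
#   resp = put_file(conn, "/daily/backup.tar.gz", "backup.tar.gz")
```

The connection is passed in rather than created inside the function so the same code works with `HTTPConnection` and `HTTPSConnection`.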
Don't read the whole file into your filedata variable. You could use a loop that reads ~60 MB at a time and submits each part to Amazon.
backup = open(filename, 'rb')
while True:
    part_of_file = backup.read(60000000) # not exactly 60 MB....
    if not part_of_file:
        break # end of file reached
    response = connection.put() # submit part_of_file here to amazon
backup.close()
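The reading side of that loop can be wrapped in a small generator, sketched here with a hypothetical name (`read_in_chunks`). Note that a plain PUT to the same key replaces the object rather than appending to it, so each part would presumably need its own key or S3's multipart upload; that part is not shown.

```python
def read_in_chunks(fileobj, chunk_size=60 * 1024 * 1024):
    """Yield successive blocks of at most `chunk_size` bytes from `fileobj`."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # read() returns an empty bytes object at end of file
            break
        yield chunk


# Usage: only one chunk is held in memory at a time.
#   with open(filename, "rb") as backup:
#       for number, part in enumerate(read_in_chunks(backup)):
#           ...  # upload `part`, e.g. under a key that includes `number`
```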