First post here so go easy on me. I'm new to Python.
I'm trying to download a .zip file from a website that requires a login to access the link.
In the Zip file is a .csv.
I've tried a few different methods but can't get the file to download.
This doesn't throw any errors but the file doesn't appear.
Sorry, I can't give out the password and username.
Am I missing something? Do I need a different way to do it?
Thanks for your help.
UPDATE: still no joy. I managed to do it with VBA in Excel but not Python. Any ideas?
import os
from urllib.parse import urlparse

import requests
from requests.auth import HTTPBasicAuth

theurl = 'https://www.pencarrie.com/export/products.zip'
username = 'Secret'
password = 'Secret'
filename = os.path.basename(urlparse(theurl).path)
r = requests.get(theurl, auth=HTTPBasicAuth(username, password))
I've now switched to a different approach and am still getting the same result. I can't tell whether the downloaded file is actually complete or is just whatever the write command produced; the "downloaded successfully" print says it's complete. HELP PLEASE.
import requests
userid = '???????'
password = '??????'
site_url = 'https://www.pencarrie.com/marketing/website-options/data/enhanced-data'
file_url = 'https://www.pencarrie.com/export/products.zip'
o_file = 'products.zip'
# create session
s = requests.Session()
# GET request. This will generate cookie for you
s.get(site_url)
# login to site.
s.post(site_url, data={'_username': userid, '_password': password})
# Next thing will be to visit URL for file you would like to download.
r = s.get(file_url)
# Download file
with open(o_file, 'wb') as output:
    output.write(r.content)
print(f"requests:: File {o_file} downloaded successfully!")
# Close session once all work done
s.close()
I have it working with VBA, so I assume it's basic authentication, but in Python it just downloads HTML. I'm not sure though.
requests.get does not save the download to a file; it keeps it in the .content attribute of the Response object.
What you need is to write it to a file manually:
with open(filename, "wb") as f:
    f.write(r.content)
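If what you write out still turns out to be HTML rather than a zip, the login itself probably failed. Here is a rough sketch (reusing the s, file_url and o_file names from the second attempt above, so treat it as an assumption about your setup) of how you could check the response before writing, and stream the download instead of holding it all in memory:

r = s.get(file_url, stream=True)  # stream=True avoids loading the whole zip into memory
print(r.status_code, r.headers.get('Content-Type'))  # expect 200 and something like application/zip

if 'html' not in r.headers.get('Content-Type', ''):
    with open(o_file, 'wb') as output:
        for chunk in r.iter_content(chunk_size=8192):
            output.write(chunk)
else:
    # Getting HTML back usually means the login POST didn't succeed
    # (the form may need hidden fields such as a CSRF token as well).
    print("Server returned HTML instead of a zip - check the login step")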
I had a script to send a file to anonfiles.com for sharing using requests. It emulates the suggested curl command:
curl -F "file=@test.txt" https://api.anonfiles.com/upload
import json
import requests

with open(filename, 'rb') as file:
    req = requests.post(
        'https://api.anonfiles.com/upload',
        files={'file': file},
    )
res = json.loads(req.text)
This worked properly for files smaller than 2 GB, but when the file grew bigger, it started printing an error message about the file size.
I tried using requests-toolbelt to send the file, with no success.
import json
import requests
from requests_toolbelt import MultipartEncoder

with open(filename, 'rb') as file:
    encoder = MultipartEncoder({'file': file})
    req = requests.post(
        'https://api.anonfiles.com/upload',
        data=encoder,
        headers={'Content-Type': encoder.content_type}
    )
res = json.loads(req.text)
This returns the following error from anonfiles.com:
{
    "status": false,
    "error": {
        "message": "No file chosen.",
        "type": "ERROR_FILE_NOT_PROVIDED",
        "code": 10
    }
}
I'm probably missing something obvious, but can't figure it out right now.
P.S.: the use of requests-toolbelt.MultipartEncoder is not mandatory; I tried it and didn't manage to make it work. I will accept any answer even if it doesn't include it. A solution without requests is not optimal but would also work.
Using a tuple seems to work:
import json
import requests
from requests_toolbelt import MultipartEncoder

with open(filename, 'rb') as file:
    encoder = MultipartEncoder({'file': (filename, file)})
    req = requests.post(
        'https://api.anonfiles.com/upload',
        data=encoder,
        headers={'Content-Type': encoder.content_type}
    )
res = json.loads(req.text)
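As a side note (not part of the fix above): plain requests accepts the same (filename, file object) tuple in its files= mapping, but it builds the whole multipart body in memory, whereas MultipartEncoder streams it from disk, which is what matters for the multi-gigabyte uploads in the question. A minimal sketch of the plain-requests form, assuming the same filename variable:

import json
import requests

# Works, but requests assembles the multipart body in memory,
# so it only suits files that comfortably fit in RAM.
with open(filename, 'rb') as file:
    req = requests.post(
        'https://api.anonfiles.com/upload',
        files={'file': (filename, file)},  # (filename, file object) tuple
    )
res = json.loads(req.text)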
EDIT:
Upon further examination, I really am not sure what's going on. I originally thought you could fix it by sending json=json_file with the request, but then I realized you were using the encoder. I do not know if the MultipartEncoder will work the same way with the json object.
Anyway...good luck!
I think this is a simple case of json.load vs json.loads:
json.load is for files
json.loads is for strings
In your original code you never loaded the file because you used json.loads, which is for strings.
Although I can't explain why it worked for smaller files, so maybe I'm just off base! You might also try some sort of streaming for the larger file.
I haven't seen the syntax you used for the response before, so I rewrote it a tiny bit to simplify it and understand it a little better, but mostly I think the issue is what I wrote above:
url = 'www.url.com'
headers = 'header stuff'

with open(filename, 'rb') as file:
    json_file = json.load(file)
    encoder = MultipartEncoder({'file': json_file})
    response = requests.post(url, data=encoder, headers=headers).text
    # or instead of appending .text you could create
    # data = response.text for easier use in the rest of your code.
Writing a bot for a personal project, and the Bittrex API refuses to validate my content hash. I've tried everything I can think of and all the suggestions from similar questions, but nothing has worked so far. I tried hashing 'None', a blank string, the currency symbol, the whole URI, and the command and balance, plus a few other things that also didn't work. I reformatted the request a few times (bytes/string/dict), still nothing.
The documentation says to hash the request body (which seems synonymous with "payload" in similar questions about making transactions through the API), but this is a simple GET / check-balance request with no payload.
Problem is, I get a 'BITTREX ERROR: INVALID CONTENT HASH' response when I run it.
Any help would be greatly appreciated, this feels like a simple problem but it's been frustrating the hell out of me. I am very new to python, but the rest of the bot went very well, which makes it extra frustrating that I can't hook it up to my account :/
import hashlib
import hmac
import json
import os
import time
import requests
import sys
# Base Variables
Base_Url = 'https://api.bittrex.com/v3'
APIkey = os.environ.get('B_Key')
secret = os.environ.get('S_B_Key')
timestamp = str(int(time.time() * 1000))
command = 'balances'
method = 'GET'
currency = 'USD'
uri = Base_Url + '/' + command + '/' + currency
payload = ''
print(payload) # Payload Check
# Hashes Payload
content = json.dumps(payload, separators=(',', ':'))
content_hash = hashlib.sha512(bytes(json.dumps(content), "utf-8")).hexdigest()
print(content_hash)
# Presign
presign = (timestamp + uri + method + str(content_hash) + '')
print(presign)
# Create Signature
message = f'{timestamp}{uri}{method}{content_hash}'
sign = hmac.new(secret.encode('utf-8'), message.encode('utf-8'),
                hashlib.sha512).hexdigest()
print(sign)
headers = {
    'Api-Key': APIkey,
    'Api-Timestamp': timestamp,
    'Api-Signature': sign,
    'Api-Content-Hash': content_hash
}
print(headers)
req = requests.get(uri, json=payload, headers=headers)
tracker_1 = "Tracker 1: Response =" + str(req)
print(tracker_1)
res = req.json()
if req.ok is False:
    print('bullshit error #1')
    print("Bittrex response: %s" % res['code'], file=sys.stderr)
I can see two main problems:
You are serialising/encoding the payload separately for the hash (with json.dumps and then bytes) and for the request (with the json=payload parameter to requests.get). You don't have any way of knowing how the requests library will format your data, and if even one byte is different you will get a different hash. It is better to convert your data to bytes first, and then use the same bytes for the hash and for the request body.
GET requests do not normally have a body (see this answer for more details), so it might be that the API is ignoring the payload you are sending. You should check the API docs to see if you really need to send a request body with GET requests.
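Here is a minimal sketch of how both points could look together, assuming the Bittrex v3 convention of hashing the raw request body bytes (the empty string for a body-less GET) and signing timestamp + uri + method + contentHash; the key and secret values are placeholders:

import hashlib
import hmac
import time

import requests

api_key = 'YOUR_API_KEY'        # placeholder
api_secret = 'YOUR_API_SECRET'  # placeholder

uri = 'https://api.bittrex.com/v3/balances/USD'
method = 'GET'
body = b''  # GET request with no body: hash the empty byte string, exactly once

timestamp = str(int(time.time() * 1000))
content_hash = hashlib.sha512(body).hexdigest()
presign = timestamp + uri + method + content_hash
signature = hmac.new(api_secret.encode('utf-8'),
                     presign.encode('utf-8'),
                     hashlib.sha512).hexdigest()

headers = {
    'Api-Key': api_key,
    'Api-Timestamp': timestamp,
    'Api-Content-Hash': content_hash,
    'Api-Signature': signature,
}

# No json=/data= argument, so the body actually sent is empty and matches the hash.
r = requests.get(uri, headers=headers)
print(r.status_code, r.json())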
I'm performing a simple task of uploading a file using Python requests library. I searched Stack Overflow and no one seemed to have the same problem, namely, that the file is not received by the server:
import requests
url = 'http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files = {'files': open('file.txt', 'rb')}
values = {'upload_file': 'file.txt', 'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
I'm filling the value of 'upload_file' keyword with my filename, because if I leave it blank, it says
Error - You must select a file to upload!
And now I get
File file.txt of size bytes is uploaded successfully!
Query service results: There were 0 lines.
Which comes up only if the file is empty. So I'm stuck as to how to send my file successfully. I know that the file works because if I go to this website and manually fill in the form it returns a nice list of matched objects, which is what I'm after. I'd really appreciate all hints.
Some other threads related (but not answering my problem):
Send file using POST from a Python script
http://docs.python-requests.org/en/latest/user/quickstart/#response-content
Uploading files using requests and send extra data
http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow
If upload_file is meant to be the file, use:
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
and requests will send a multi-part form POST body with the upload_file field set to the contents of the file.txt file.
The filename will be included in the mime header for the specific field:
>>> import requests
>>> open('file.txt', 'wb') # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"
--c226ce13d09842658ffbd31e0563c6bd--
Note the filename="file.txt" parameter.
You can use a tuple for the files mapping value, with between 2 and 4 elements, if you need more control. The first element is the filename, followed by the contents, and an optional content-type header value and an optional mapping of additional headers:
files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}
This sets an alternative filename and content type, leaving out the optional headers.
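For completeness, a sketch of the 4-element form as well; the extra header name here is made up for illustration:

files = {
    'upload_file': (
        'foobar.txt',                  # filename reported to the server
        open('file.txt', 'rb'),        # the actual content
        'text/x-spam',                 # optional content type for this part
        {'X-Example-Header': 'spam'},  # optional extra headers for this part (hypothetical header)
    )
}
r = requests.post(url, files=files)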
If you are meaning the whole POST body to be taken from a file (with no other fields specified), then don't use the files parameter, just post the file directly as data. You then may want to set a Content-Type header too, as none will be set otherwise. See Python requests - POST data from a file.
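A short sketch of that variant, assuming the server expects the raw file bytes as the body and, say, a plain-text content type:

with open('file.txt', 'rb') as f:
    r = requests.post(url, data=f, headers={'Content-Type': 'text/plain'})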
(2018) The Python requests library has simplified this process; we can use the files parameter to signal that we want to upload a multipart-encoded file:
url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
r.text
Client Upload
If you want to upload a single file with the Python requests library, it supports streaming uploads, which allow you to send large files or streams without reading them into memory.
with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
Server Side
Then, on the server side, save the stream to a file without loading it into memory. The following is an example using Flask file uploads.
@app.route("/upload", methods=['POST'])
def upload_file():
    from werkzeug.datastructures import FileStorage
    # 'filename' needs to be defined (e.g. taken from the request) before this line
    FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
    return 'OK', 200
Or use werkzeug's form data parsing, as mentioned in a fix for the issue of "large file uploads eating up memory", to avoid using memory inefficiently on large file uploads (e.g. a 22 GiB file in ~60 seconds, with memory usage constant at about 13 MiB).
@app.route("/upload", methods=['POST'])
def upload_file():
    def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
        import tempfile
        tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
        app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
        return tmpfile

    import werkzeug, flask
    stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
    for fil in files.values():
        app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
        # Do whatever with stored file at `fil.stream.name`
    return 'OK', 200
You can send any file via a POST API call; you just need to pass files={'any_key': fobj}:
import requests

url = "https://request-url.com"

# Note: don't set a Content-Type header yourself here; requests generates the
# correct multipart/form-data header (including the boundary) when files= is used.
with open(filepath, 'rb') as fobj:
    response = requests.post(url, files={'file': fobj})

print("Status Code", response.status_code)
print("JSON Response ", response.json())
@Martijn Pieters' answer is correct; however, I wanted to add a bit of context about data= and also about the other side, in the Flask server, in the case where you are trying to upload files and JSON.
From the request side, this works as Martijn describes:
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
However, on the Flask side (the receiving webserver on the other side of this POST), I had to use request.form:
@app.route("/sftp-upload", methods=["POST"])
def upload_file():
    if request.method == "POST":
        # the mimetype here isn't application/json
        # see here: https://stackoverflow.com/questions/20001229/how-to-get-posted-json-in-flask
        body = request.form
        print(body)  # <- immutable dict
body = request.get_json() will return nothing. body = request.get_data() will return a blob containing lots of things like the filename etc.
Here's the bad part: on the client side, changing data={} to json={} results in this server not being able to read the KV pairs! As in, this will result in a {} body above:
r = requests.post(url, files=files, json=values)  # No!
This is bad because the server has no control over how the user formats the request, and json= is going to be the habit of requests users.
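If you do need structured JSON next to the file, a common workaround is to serialise it yourself into an ordinary form field, which the Flask side can then read back out of request.form. A sketch (the 'metadata' field name is just an example):

import json
import requests

files = {'upload_file': open('file.txt', 'rb')}
data = {'metadata': json.dumps({'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'})}
r = requests.post(url, files=files, data=data)

# On the Flask side (sketch):
#   meta = json.loads(request.form['metadata'])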
Upload:
with open('file.txt', 'rb') as f:
    files = {'upload_file': f.read()}

values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
Download (Django):
with open('file.txt', 'wb') as f:
    f.write(request.FILES['upload_file'].file.read())
Regarding the answers given so far, there was always something missing that prevented them from working on my side. So let me show you what worked for me:
import os
import requests

API_ENDPOINT = "http://localhost:80"
access_token = "sdfJHKsdfjJKHKJsdfJKHJKysdfJKHsdfJKHs"  # TODO: get fresh token here

def upload_engagement_file(filepath):
    url = API_ENDPOINT + "/api/files"  # add any URL parameters if needed
    hdr = {"Authorization": "Bearer %s" % access_token}
    with open(filepath, "rb") as fobj:
        file_obj = fobj.read()
        file_basename = os.path.basename(filepath)
        file_to_upload = {"file": (str(file_basename), file_obj)}
        finfo = {"fullPath": filepath}
        upload_response = requests.post(url, headers=hdr, files=file_to_upload, data=finfo)
    # the with-block closes fobj automatically, no explicit fobj.close() needed
    # print("Status Code ", upload_response.status_code)
    # print("JSON Response ", upload_response.json())
    return upload_response
Note that requests.post(...) needs
a url parameter, containing the full URL of the API endpoint you're calling, built from API_ENDPOINT, assuming we have an http://localhost:80/api/files endpoint to POST a file to
a headers parameter, containing at least the authorization (bearer token)
a files parameter taking the name of the file plus the entire file content
a data parameter taking just the path and file name
Installation required (console):
pip install requests
What you get back from the function call is a response object containing a status code and also the full error message in JSON format. The commented print statements at the end of upload_engagement_file show you how to access them.
Note: Some useful additional information about the requests library can be found here
Some may need to upload via a PUT request, and this is slightly different from posting data. It is important to understand how the server expects the data in order to form a valid request. A frequent source of confusion is sending multipart form data when it isn't accepted. This example uses basic auth and updates an image via a PUT request.
import requests

url = 'https://foobar.com/api/image-1'
basic = requests.auth.HTTPBasicAuth('someuser', 'password123')

# Setting the appropriate header is important and will vary based
# on what you upload
headers = {'Content-Type': 'image/png'}

with open('image-1.png', 'rb') as img_1:
    r = requests.put(url, auth=basic, data=img_1, headers=headers)
While the requests library makes working with http requests a lot easier, some of its magic and convenience obscures just how to craft more nuanced requests.
On Ubuntu you can do it this way: save the file to some (temporary) location, then open it and send it to the API.
# f1 is the uploaded file object and token the auth token (both assumed to come from the surrounding view)
path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)

data = {}  # can be anything you want to pass along with the file
header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}

with open(path12, 'rb') as file1:
    res = requests.post(url, data=data, files={'file': file1}, headers=header)
I would like to download a file over HTTP using urllib3.
I have managed to do this using following code:
url = 'http://url_to_a_file'
connection_pool = urllib3.PoolManager()
resp = connection_pool.request('GET',url )
f = open(filename, 'wb')
f.write(resp.data)
f.close()
resp.release_conn()
But I was wondering what the proper way of doing this is.
For example, will it work well for big files, and if not, what should I do to make this code more bug tolerant and scalable?
Note: it is important to me to use the urllib3 library (not urllib2, for example), because I want my code to be thread safe.
Your code snippet is close. Two things worth noting:
If you're using resp.data, it will consume the entire response and return the connection (you don't need to resp.release_conn() manually). This is fine if you're cool with holding the data in-memory.
You could use resp.read(amt) which will stream the response, but the connection will need to be returned via resp.release_conn().
This would look something like...
import urllib3

chunk_size = 65536  # any reasonable chunk size; not defined in the original snippet
http = urllib3.PoolManager()

r = http.request('GET', url, preload_content=False)
with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)
        if not data:
            break
        out.write(data)
r.release_conn()
The documentation might be a bit lacking on this scenario. If anyone is interested in making a pull-request to improve the urllib3 documentation, that would be greatly appreciated. :)
The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj as below:
import shutil
import urllib3

url = 'http://url_to_a_file'
c = urllib3.PoolManager()
with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)
resp.release_conn()  # not 100% sure this is required though
The easiest way with urllib3: you can use shutil to manage the copying for you.
import urllib3
import shutil

http = urllib3.PoolManager()

with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    shutil.copyfileobj(r, out)
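On the "bug tolerant and scalable" part of the original question: urllib3 also lets you configure retries and timeouts on the PoolManager itself, which is probably the first thing to add. A sketch (the retry and timeout numbers are arbitrary, and filename/url are as in the question):

import shutil
import urllib3
from urllib3.util.retry import Retry

http = urllib3.PoolManager(
    retries=Retry(total=5, backoff_factor=0.5, status_forcelist=[500, 502, 503, 504]),
    timeout=urllib3.Timeout(connect=5.0, read=30.0),
)

with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    shutil.copyfileobj(r, out)
r.release_conn()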