Related
I have a .html file downloaded and want to send a request to this file to grab it's content.
However, if I do the following:
import requests
html_file = "/user/some_html.html"
r = requests.get(html_file)
Gives the following error:
Invalid URL 'some_html.html': No schema supplied.
If I add a schema I get the following error:
HTTPConnectionPool(host='some_html.html', port=80): Max retries exceeded with url:
I want to know how to specifically send a request to a html file when it's downloaded.
You are accessing html file from local directory. get() method uses HTTPConnection and port 80 to access data from website not a local directory. To access file from local directory using get() method use Xampp or Wampp.
for accessing file from local directory you can use open() while requests.get() is for accessing file from Port 80 using http Connection in simple word from internet not local directory
import requests
html_file = "/user/some_html.html"
t=open(html_file, "r")
for v in t.readlines():
print(v)
Output:
You don't "send a request to a html file". Instead, you can send a request to a HTTP server on the internet which will return a response with the contents of a html file.
The file itself knows nothing about "requests". If you have the file stored locally and want to do something with it, then you can open it just like any other file.
If you are interested in learning more about the request and response model, I suggest you try a something like
response = requests.get("http://stackoverflow.com")
You should also read about HTTP and requests and responses to better understand how this works.
You can do it by setting up a local server to your html file.
If you use Visual Studio Code, you can install Live Server by Ritwick Dey.
Then you do as follows:
1 - Make the first request and save the html content into a .html file:
my_req.py
import requests
file_path = './'
file_name = 'my_file'
url = "https://www.qwant.com/"
response = requests.request("GET", url)
w = open(file_path + file_name + '.html', 'w')
w.write(response.text)
2 - With Live Server installed on Visual Studio Code, click on my_file.html and then click on Go Live.
and
3 - Now you can make a request to your local http schema:
second request
import requests
url = "http://127.0.0.1:5500/my_file.html"
response = requests.request("GET", url)
print(response.text)
And, tcharan!! do what you need to do.
On a crawler work, I had one situation where there was a difference between the content displayed on the website and the content retrieved with the response.text so the xpaths did not were the same as on the website, so I needed to download the content, making a local html file, and get the new ones xpaths to get the info that I needed.
You can try this:
from requests_html import HTML
with open("htmlfile.html") as htmlfile:
sourcecode = htmlfile.read()
parsedHtml = HTML(html=sourcecode)
print(parsedHtml)
I'm performing a simple task of uploading a file using Python requests library. I searched Stack Overflow and no one seemed to have the same problem, namely, that the file is not received by the server:
import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)
I'm filling the value of 'upload_file' keyword with my filename, because if I leave it blank, it says
Error - You must select a file to upload!
And now I get
File file.txt of size bytes is uploaded successfully!
Query service results: There were 0 lines.
Which comes up only if the file is empty. So I'm stuck as to how to send my file successfully. I know that the file works because if I go to this website and manually fill in the form it returns a nice list of matched objects, which is what I'm after. I'd really appreciate all hints.
Some other threads related (but not answering my problem):
Send file using POST from a Python script
http://docs.python-requests.org/en/latest/user/quickstart/#response-content
Uploading files using requests and send extra data
http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow
If upload_file is meant to be the file, use:
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
and requests will send a multi-part form POST body with the upload_file field set to the contents of the file.txt file.
The filename will be included in the mime header for the specific field:
>>> import requests
>>> open('file.txt', 'wb') # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"
--c226ce13d09842658ffbd31e0563c6bd--
Note the filename="file.txt" parameter.
You can use a tuple for the files mapping value, with between 2 and 4 elements, if you need more control. The first element is the filename, followed by the contents, and an optional content-type header value and an optional mapping of additional headers:
files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}
This sets an alternative filename and content type, leaving out the optional headers.
If you are meaning the whole POST body to be taken from a file (with no other fields specified), then don't use the files parameter, just post the file directly as data. You then may want to set a Content-Type header too, as none will be set otherwise. See Python requests - POST data from a file.
(2018) the new python requests library has simplified this process, we can use the 'files' variable to signal that we want to upload a multipart-encoded file
url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
r.text
Client Upload
If you want to upload a single file with Python requests library, then requests lib supports streaming uploads, which allow you to send large files or streams without reading into memory.
with open('massive-body', 'rb') as f:
requests.post('http://some.url/streamed', data=f)
Server Side
Then store the file on the server.py side such that save the stream into file without loading into the memory. Following is an example with using Flask file uploads.
#app.route("/upload", methods=['POST'])
def upload_file():
from werkzeug.datastructures import FileStorage
FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
return 'OK', 200
Or use werkzeug Form Data Parsing as mentioned in a fix for the issue of "large file uploads eating up memory" in order to avoid using memory inefficiently on large files upload (s.t. 22 GiB file in ~60 seconds. Memory usage is constant at about 13 MiB.).
#app.route("/upload", methods=['POST'])
def upload_file():
def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
import tempfile
tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
return tmpfile
import werkzeug, flask
stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
for fil in files.values():
app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
# Do whatever with stored file at `fil.stream.name`
return 'OK', 200
You can send any file via post api while calling the API just need to mention files={'any_key': fobj}
import requests
import json
url = "https://request-url.com"
headers = {"Content-Type": "application/json; charset=utf-8"}
with open(filepath, 'rb') as fobj:
response = requests.post(url, headers=headers, files={'file': fobj})
print("Status Code", response.status_code)
print("JSON Response ", response.json())
#martijn-pieters answer is correct, however I wanted to add a bit of context to data= and also to the other side, in the Flask server, in the case where you are trying to upload files and a JSON.
From the request side, this works as Martijn describes:
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
However, on the Flask side (the receiving webserver on the other side of this POST), I had to use form
#app.route("/sftp-upload", methods=["POST"])
def upload_file():
if request.method == "POST":
# the mimetype here isnt application/json
# see here: https://stackoverflow.com/questions/20001229/how-to-get-posted-json-in-flask
body = request.form
print(body) # <- immutable dict
body = request.get_json() will return nothing. body = request.get_data() will return a blob containing lots of things like the filename etc.
Here's the bad part: on the client side, changing data={} to json={} results in this server not being able to read the KV pairs! As in, this will result in a {} body above:
r = requests.post(url, files=files, json=values). # No!
This is bad because the server does not have control over how the user formats the request; and json= is going to be the habbit of requests users.
Upload:
with open('file.txt', 'rb') as f:
files = {'upload_file': f.read()}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
Download (Django):
with open('file.txt', 'wb') as f:
f.write(request.FILES['upload_file'].file.read())
Regarding the answers given so far, there was always something missing that prevented it to work on my side. So let me show you what worked for me:
import json
import os
import requests
API_ENDPOINT = "http://localhost:80"
access_token = "sdfJHKsdfjJKHKJsdfJKHJKysdfJKHsdfJKHs" # TODO: get fresh Token here
def upload_engagement_file(filepath):
url = API_ENDPOINT + "/api/files" # add any URL parameters if needed
hdr = {"Authorization": "Bearer %s" % access_token}
with open(filepath, "rb") as fobj:
file_obj = fobj.read()
file_basename = os.path.basename(filepath)
file_to_upload = {"file": (str(file_basename), file_obj)}
finfo = {"fullPath": filepath}
upload_response = requests.post(url, headers=hdr, files=file_to_upload, data=finfo)
fobj.close()
# print("Status Code ", upload_response.status_code)
# print("JSON Response ", upload_response.json())
return upload_response
Note that requests.post(...) needs
a url parameter, containing the full URL of the API endpoint you're calling, using the API_ENDPOINT, assuming we have an http://localhost:8000/api/files endpoint to POST a file
a headers parameter, containing at least the authorization (bearer token)
a files parameter taking the name of the file plus the entire file content
a data parameter taking just the path and file name
Installation required (console):
pip install requests
What you get back from the function call is a response object containing a status code and also the full error message in JSON format. The commented print statements at the end of upload_engagement_file are showing you how you can access them.
Note: Some useful additional information about the requests library can be found here
Some may need to upload via a put request and this is slightly different that posting data. It is important to understand how the server expects the data in order to form a valid request. A frequent source of confusion is sending multipart-form data when it isn't accepted. This example uses basic auth and updates an image via a put request.
url = 'foobar.com/api/image-1'
basic = requests.auth.HTTPBasicAuth('someuser', 'password123')
# Setting the appropriate header is important and will vary based
# on what you upload
headers = {'Content-Type': 'image/png'}
with open('image-1.png', 'rb') as img_1:
r = requests.put(url, auth=basic, data=img_1, headers=headers)
While the requests library makes working with http requests a lot easier, some of its magic and convenience obscures just how to craft more nuanced requests.
In Ubuntu you can apply this way,
to save file at some location (temporary) and then open and send it to API
path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
data={} #can be anything u want to pass along with File
file1 = open(path12, 'rb')
header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
res= requests.post(url,data,header)
I'm performing a simple task of uploading a file using Python requests library. I searched Stack Overflow and no one seemed to have the same problem, namely, that the file is not received by the server:
import requests
url='http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi/post'
files={'files': open('file.txt','rb')}
values={'upload_file' : 'file.txt' , 'DB':'photcat' , 'OUT':'csv' , 'SHORT':'short'}
r=requests.post(url,files=files,data=values)
I'm filling the value of 'upload_file' keyword with my filename, because if I leave it blank, it says
Error - You must select a file to upload!
And now I get
File file.txt of size bytes is uploaded successfully!
Query service results: There were 0 lines.
Which comes up only if the file is empty. So I'm stuck as to how to send my file successfully. I know that the file works because if I go to this website and manually fill in the form it returns a nice list of matched objects, which is what I'm after. I'd really appreciate all hints.
Some other threads related (but not answering my problem):
Send file using POST from a Python script
http://docs.python-requests.org/en/latest/user/quickstart/#response-content
Uploading files using requests and send extra data
http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow
If upload_file is meant to be the file, use:
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
and requests will send a multi-part form POST body with the upload_file field set to the contents of the file.txt file.
The filename will be included in the mime header for the specific field:
>>> import requests
>>> open('file.txt', 'wb') # create an empty demo file
<_io.BufferedWriter name='file.txt'>
>>> files = {'upload_file': open('file.txt', 'rb')}
>>> print(requests.Request('POST', 'http://example.com', files=files).prepare().body.decode('ascii'))
--c226ce13d09842658ffbd31e0563c6bd
Content-Disposition: form-data; name="upload_file"; filename="file.txt"
--c226ce13d09842658ffbd31e0563c6bd--
Note the filename="file.txt" parameter.
You can use a tuple for the files mapping value, with between 2 and 4 elements, if you need more control. The first element is the filename, followed by the contents, and an optional content-type header value and an optional mapping of additional headers:
files = {'upload_file': ('foobar.txt', open('file.txt','rb'), 'text/x-spam')}
This sets an alternative filename and content type, leaving out the optional headers.
If you are meaning the whole POST body to be taken from a file (with no other fields specified), then don't use the files parameter, just post the file directly as data. You then may want to set a Content-Type header too, as none will be set otherwise. See Python requests - POST data from a file.
(2018) the new python requests library has simplified this process, we can use the 'files' variable to signal that we want to upload a multipart-encoded file
url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
r.text
Client Upload
If you want to upload a single file with Python requests library, then requests lib supports streaming uploads, which allow you to send large files or streams without reading into memory.
with open('massive-body', 'rb') as f:
requests.post('http://some.url/streamed', data=f)
Server Side
Then store the file on the server.py side such that save the stream into file without loading into the memory. Following is an example with using Flask file uploads.
#app.route("/upload", methods=['POST'])
def upload_file():
from werkzeug.datastructures import FileStorage
FileStorage(request.stream).save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
return 'OK', 200
Or use werkzeug Form Data Parsing as mentioned in a fix for the issue of "large file uploads eating up memory" in order to avoid using memory inefficiently on large files upload (s.t. 22 GiB file in ~60 seconds. Memory usage is constant at about 13 MiB.).
#app.route("/upload", methods=['POST'])
def upload_file():
def custom_stream_factory(total_content_length, filename, content_type, content_length=None):
import tempfile
tmpfile = tempfile.NamedTemporaryFile('wb+', prefix='flaskapp', suffix='.nc')
app.logger.info("start receiving file ... filename => " + str(tmpfile.name))
return tmpfile
import werkzeug, flask
stream, form, files = werkzeug.formparser.parse_form_data(flask.request.environ, stream_factory=custom_stream_factory)
for fil in files.values():
app.logger.info(" ".join(["saved form name", fil.name, "submitted as", fil.filename, "to temporary file", fil.stream.name]))
# Do whatever with stored file at `fil.stream.name`
return 'OK', 200
You can send any file via post api while calling the API just need to mention files={'any_key': fobj}
import requests
import json
url = "https://request-url.com"
headers = {"Content-Type": "application/json; charset=utf-8"}
with open(filepath, 'rb') as fobj:
response = requests.post(url, headers=headers, files={'file': fobj})
print("Status Code", response.status_code)
print("JSON Response ", response.json())
#martijn-pieters answer is correct, however I wanted to add a bit of context to data= and also to the other side, in the Flask server, in the case where you are trying to upload files and a JSON.
From the request side, this works as Martijn describes:
files = {'upload_file': open('file.txt','rb')}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
However, on the Flask side (the receiving webserver on the other side of this POST), I had to use form
#app.route("/sftp-upload", methods=["POST"])
def upload_file():
if request.method == "POST":
# the mimetype here isnt application/json
# see here: https://stackoverflow.com/questions/20001229/how-to-get-posted-json-in-flask
body = request.form
print(body) # <- immutable dict
body = request.get_json() will return nothing. body = request.get_data() will return a blob containing lots of things like the filename etc.
Here's the bad part: on the client side, changing data={} to json={} results in this server not being able to read the KV pairs! As in, this will result in a {} body above:
r = requests.post(url, files=files, json=values). # No!
This is bad because the server does not have control over how the user formats the request; and json= is going to be the habbit of requests users.
Upload:
with open('file.txt', 'rb') as f:
files = {'upload_file': f.read()}
values = {'DB': 'photcat', 'OUT': 'csv', 'SHORT': 'short'}
r = requests.post(url, files=files, data=values)
Download (Django):
with open('file.txt', 'wb') as f:
f.write(request.FILES['upload_file'].file.read())
Regarding the answers given so far, there was always something missing that prevented it to work on my side. So let me show you what worked for me:
import json
import os
import requests
API_ENDPOINT = "http://localhost:80"
access_token = "sdfJHKsdfjJKHKJsdfJKHJKysdfJKHsdfJKHs" # TODO: get fresh Token here
def upload_engagement_file(filepath):
url = API_ENDPOINT + "/api/files" # add any URL parameters if needed
hdr = {"Authorization": "Bearer %s" % access_token}
with open(filepath, "rb") as fobj:
file_obj = fobj.read()
file_basename = os.path.basename(filepath)
file_to_upload = {"file": (str(file_basename), file_obj)}
finfo = {"fullPath": filepath}
upload_response = requests.post(url, headers=hdr, files=file_to_upload, data=finfo)
fobj.close()
# print("Status Code ", upload_response.status_code)
# print("JSON Response ", upload_response.json())
return upload_response
Note that requests.post(...) needs
a url parameter, containing the full URL of the API endpoint you're calling, using the API_ENDPOINT, assuming we have an http://localhost:8000/api/files endpoint to POST a file
a headers parameter, containing at least the authorization (bearer token)
a files parameter taking the name of the file plus the entire file content
a data parameter taking just the path and file name
Installation required (console):
pip install requests
What you get back from the function call is a response object containing a status code and also the full error message in JSON format. The commented print statements at the end of upload_engagement_file are showing you how you can access them.
Note: Some useful additional information about the requests library can be found here
Some may need to upload via a put request and this is slightly different that posting data. It is important to understand how the server expects the data in order to form a valid request. A frequent source of confusion is sending multipart-form data when it isn't accepted. This example uses basic auth and updates an image via a put request.
url = 'foobar.com/api/image-1'
basic = requests.auth.HTTPBasicAuth('someuser', 'password123')
# Setting the appropriate header is important and will vary based
# on what you upload
headers = {'Content-Type': 'image/png'}
with open('image-1.png', 'rb') as img_1:
r = requests.put(url, auth=basic, data=img_1, headers=headers)
While the requests library makes working with http requests a lot easier, some of its magic and convenience obscures just how to craft more nuanced requests.
In Ubuntu you can apply this way,
to save file at some location (temporary) and then open and send it to API
path = default_storage.save('static/tmp/' + f1.name, ContentFile(f1.read()))
path12 = os.path.join(os.getcwd(), "static/tmp/" + f1.name)
data={} #can be anything u want to pass along with File
file1 = open(path12, 'rb')
header = {"Content-Disposition": "attachment; filename=" + f1.name, "Authorization": "JWT " + token}
res= requests.post(url,data,header)
I am communicating with an API using HTTP.client in Python 3.6.2.
In order to upload a file it requires a three stage process.
I have managed to talk successfully using POST methods and the server returns data as I expect.
However, the stage that requires the actual file to be uploaded is a PUT method - and I cannot figure out how to syntax the code to include a pointer to the actual file on my storage - the file is an mp4 video file.
Here is a snippet of the code with my noob annotations :)
#define connection as HTTPS and define URL
uploadstep2 = http.client.HTTPSConnection("grabyo-prod.s3-accelerate.amazonaws.com")
#define headers
headers = {
'accept': "application/json",
'content-type': "application/x-www-form-urlencoded"
}
#define the structure of the request and send it.
#Here it is a PUT request to the unique URL as defined above with the correct file and headers.
uploadstep2.request("PUT", myUniqueUploadUrl, body="C:\Test.mp4", headers=headers)
#get the response from the server
uploadstep2response = uploadstep2.getresponse()
#read the data from the response and put to a usable variable
step2responsedata = uploadstep2response.read()
The response I am getting back at this stage is an
"Error 400 Bad Request - Could not obtain the file information."
I am certain this relates to the body="C:\Test.mp4" section of the code.
Can you please advise how I can correctly reference a file within the PUT method?
Thanks in advance
uploadstep2.request("PUT", myUniqueUploadUrl, body="C:\Test.mp4", headers=headers)
will put the actual string "C:\Test.mp4" in the body of your request, not the content of the file named "C:\Test.mp4" as you expect.
You need to open the file, read it's content then pass it as body. Or to stream it, but AFAIK http.client does not support that, and since your file seems to be a video, it is potentially huge and will use plenty of RAM for no good reason.
My suggestion would be to use requests, which is a way better lib to do this kind of things:
import requests
with open(r'C:\Test.mp4'), 'rb') as finput:
response = requests.put('https://grabyo-prod.s3-accelerate.amazonaws.com/youruploadpath', data=finput)
print(response.json())
I do not know if it is useful for you, but you can try to send a POST request with requests module :
import requests
url = ""
data = {'title':'metadata','timeDuration':120}
mp3_f = open('/path/your_file.mp3', 'rb')
files = {'messageFile': mp3_f}
req = requests.post(url, files=files, json=data)
print (req.status_code)
print (req.content)
Hope it helps .
I want to upload a file to an url. The file I want to upload is not on my computer, but I have the url of the file. I want to upload it using requests library. So, I want to do something like this:
url = 'http://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
But, only difference is, the file report.xls comes from some url which is not in my computer.
The only way to do this is to download the body of the URL so you can upload it.
The problem is that a form that takes a file is expecting the body of the file in the HTTP POST. Someone could write a form that takes a URL instead, and does the fetching on its own… but that would be a different form and request than the one that takes a file (or, maybe, the same form, with an optional file and an optional URL).
You don't have to download it and save it to a file, of course. You can just download it into memory:
urlsrc = 'http://example.com/source'
rsrc = requests.get(urlsrc)
urldst = 'http://example.com/dest'
rdst = requests.post(urldst, files={'file': rsrc.content})
Of course in some cases, you might always want to forward along the filename, or some other headers, like the Content-Type. Or, for huge files, you might want to stream from one server to the other without downloading and then uploading the whole file at once. You'll have to do any such things manually, but almost everything is easy with requests, and explained well in the docs.*
* Well, that last example isn't quite easy… you have to get the raw socket-wrappers off the requests and read and write, and make sure you don't deadlock, and so on…
There is an example in the documentation that may suit you. A file-like object can be used as a stream input for a POST request. Combine this with a stream response for your GET (passing stream=True), or one of the other options documented here.
This allows you to do a POST from another GET without buffering the entire payload locally. In the worst case, you may have to write a file-like class as "glue code", allowing you to pass your glue object to the POST that in turn reads from the GET response.
(This is similar to a documented technique using the Node.js request module.)
import requests
img_url = "http://...."
res_src = requests.get(img_url)
payload={}
files=[
('files',('image_name.jpg', res_src.content,'image/jpeg'))
]
headers = {"token":"******-*****-****-***-******"}
response = requests.request("POST", url, headers=headers, data=payload, files=files)
print(response.text)
above code is working for me.