Using Python requests for automatic file uploading to Flask environment - python

I am trying to set up a script that uploads a file (currently using the Python requests library) to a Flask application running inside a Docker (docker-compose) container. The Python script runs on the hypervisor, and I force it to use the same Python version (3.6) as well. I get responses from the server, but the file I upload is 12 KB while the file the Flask container receives is only 2 KB, and I have no idea what is going wrong.
When I capture the TCP stream with Wireshark, I also see only a 2 KB file, so my guess is that the requests library applies some compression, but I cannot find any documentation saying that this happens.
I have tried replacing the file tuple in the sending code with just the file handle, but this had no effect. Sending files of a different size results in a different file size in Flask / Docker.
Sending a string instead of the file handle ("1234567890") results in a file size equal to the string length (10 bytes).
Changing the file open mode from rb to r raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xdf in position 14: ordinal not in range(128) inside requests, in the encodings module.
Hypervisor: send.py
import requests

with open('file.docx', 'rb') as f:
    url = 'http://localhost:8081/file'
    r = requests.post(url, files={'file': ('file', f, 'multipart/form-data')})
    print(r.text)
Flask: file.py
import os
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route('/file', methods=['POST'])
def parse_from_post():
    file = request.files['file']
    fn = secure_filename("file.docx")  # did this manually instead of getting it from the request for testing reasons
    folder = "/app/files/"
    fl = os.path.join(folder, fn)
    # Remove the old file
    if os.path.exists(fl):
        os.remove(fl)
    file.save(fl)
    return ""
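To narrow down where the bytes go missing, it can help to compare the size and a checksum of the file on the hypervisor with those of the file Flask saved under /app/files/. The helper below is a hypothetical sketch using only the standard library; size_and_sha256 is an illustrative name, not part of Flask or requests.

```python
import hashlib
import os
import tempfile


def size_and_sha256(path, chunk_size=64 * 1024):
    """Return (size in bytes, SHA-256 hex digest) of the file at path."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:  # binary mode: no text decoding
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


# Demo on a throwaway 12 KB file standing in for file.docx
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(12 * 1024))
size, sha = size_and_sha256(tmp.name)
os.remove(tmp.name)
print(size, sha)
```

Running it on both sides (for example inside the container via docker-compose exec) should show whether the content already differs before it is sent or only after it is saved.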

The problem lies with the file size, which Python requests does not take care of directly. I used the MultipartEncoder from the requests_toolbelt package to wrap the file, instead of passing the file directly to the requests.post call.
Hypervisor: send.py
import requests
from requests_toolbelt import MultipartEncoder

with open('file.docx', 'rb') as f:
    url = 'http://localhost:8081/file'
    m = MultipartEncoder(fields={
        "file": ("file.docx", f)
    })
    r = requests.post(url, data=m, headers={'Content-Type': m.content_type})
    print(r.text)
I actually found this solution in another post on SO; see the first comment on the question, which links to https://toolbelt.readthedocs.io/....

Related

How to get file from url in python?

I want to download text files using python, how can I do so?
I used urlopen(url).read() from urllib.request, but it gives me the bytes representation of the file.
For me, I had to do the following (Python 3):
from urllib.request import urlopen
data = urlopen("[your url goes here]").read().decode('utf-8')
# Do what you need to do with the data.
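Rather than hard-coding utf-8, a variation is to decode with the charset the server declares in its Content-Type header and fall back to UTF-8 only when none is given. This is a minimal sketch; fetch_text is a hypothetical helper name, and the data: URL is used only so the demo runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Download url and decode it using the charset the server declares."""
    with urlopen(url) as resp:
        raw = resp.read()  # bytes, exactly as received
        charset = resp.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL lets the demo run offline; a real http(s) URL works the same way.
text = fetch_text("data:text/plain;charset=utf-8,caf%C3%A9")
print(text)
```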
You have multiple options.
For the simpler solution, you can use this:
import urllib.request

file_url = 'https://someurl.com/text_file.txt'
for line in urllib.request.urlopen(file_url):
    print(line.decode('utf-8'))
For an API solution:
import requests

file_url = 'https://someurl.com/text_file.txt'
response = requests.get(file_url)
if response.ok:  # True when the status code is below 400
    data = response.text
    for line in data.split('\n'):
        print(line)
When downloading text files with Python, I like to use the wget module:
import wget
remote_url = 'https://www.google.com/test.txt'
local_file = 'local_copy.txt'
wget.download(remote_url, local_file)
If that doesn't work, try using urllib:
from urllib import request
remote_url = 'https://www.google.com/test.txt'
file = 'copy.txt'
request.urlretrieve(remote_url, file)
When you use the requests module, you are reading the file directly from the internet, which is why you see the text in byte form. Try writing the content to a file and then viewing it by opening the file on your desktop:
import requests

remote_url = 'https://test.com/test.txt'  # requests needs the scheme (https://)
local_file = 'local_file.txt'
data = requests.get(remote_url)
with open(local_file, 'wb') as file:
    file.write(data.content)
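For larger files, a standard-library alternative (swapped in here for requests) streams the response to disk in chunks instead of holding it all in memory. This is a sketch; download_to_file is a hypothetical helper, and the data: URL stands in for remote_url so the demo runs offline.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def download_to_file(url, local_path, chunk_size=64 * 1024):
    """Stream the response body into local_path (binary mode); return the byte count."""
    with urlopen(url) as resp, open(local_path, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)  # copies in chunks, never decodes
        return out.tell()


# Offline demo with a data: URL standing in for remote_url
target = os.path.join(tempfile.mkdtemp(), "local_file.txt")
written = download_to_file("data:text/plain,hello%20world", target)
with open(target, "rb") as f:
    saved = f.read()
print(written, saved)
```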

Passing Binary file over HTTP POST

I have a local Python file that decodes binary files. It first opens the file in binary mode, reads it into a buffer and then interprets it. Reading it is simply:
with open(filepath, 'rb') as f:
    buff = f.read()
read_all(buff)
This works fine locally. Now I'd like to set up an Azure Python job where I can send the file (approx. 100 KB) over an HTTP POST and then read back the interpreted metadata, which my original Python script produces correctly.
I first removed the read function so that the code now works with the buffer only.
In my Azure Python job I have the following, triggered by an HttpRequest:
my_data = reader.read_file(req.get_body())
To test sending, I've tried the following in Python:
import requests

url = 'http://localhost:7071/api/HttpTrigger'
files = {'file': open('test', 'rb')}
with open('test', 'rb') as f:
    buff = f.read()
r = requests.post(url, files=files)  # Try using files
r = requests.post(url, data=buff)  # Try using data
I've also tried adding the file to the body in Postman as binary and setting the Content-Type header to application/octet-stream.
None of these send the binary file the same way the original f.read() reads it, so I get a wrong interpretation of the binary file.
What is f.read() doing differently from how I'm sending the data in the HTTP body?
Printing the first line of the file read locally in Python gives:
b'\n\n\xfe\xfe\x00\x00\x00\x00\\\x18,A\x18\x00\x00\x00(\x00\x00\x00\x1f\x00\x00\
whereas printing req.get_body() shows:
b'\n\n\xef\xbf\xbd\xef\xbf\xbd\x00\x00\x00\x00\\\x18,A\x18\x00\x00\x00(\x00\x00\x00\x1f\x00\
So something is clearly wrong. Why could these be different?
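The second dump is consistent with the body having been decoded as UTF-8 (with invalid bytes replaced) and then re-encoded somewhere between the sender and req.get_body(): 0xFE can never appear in valid UTF-8, and each such byte turns into EF BF BD, the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER. The snippet below reproduces exactly that transformation on the first bytes shown above; where in the pipeline it actually happens is an assumption the post does not confirm.

```python
# First bytes of the locally read file, copied from the dump above
local = b"\n\n\xfe\xfe\x00\x00\x00\x00\\\x18,A"

# Decode as UTF-8, replacing invalid bytes, then encode back to bytes
mangled = local.decode("utf-8", errors="replace").encode("utf-8")
print(mangled)

# EF BF BD is U+FFFD REPLACEMENT CHARACTER encoded as UTF-8
print("\ufffd".encode("utf-8"))
```

Because every invalid byte collapses to the same three bytes, the original values cannot be recovered from the mangled body afterwards.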
Thanks
EDIT:
I've implemented a similar function in Flask and it works well.
The code in Flask simply grabs the file from a POST, with no encoding/decoding:
if request.method == 'POST':
    f = request.files['file']
    # f.save(secure_filename(f.filename))
    my_data = reader.read_file(f.read())
Why is the Azure Function different?
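For comparison, the sketch below checks that a raw application/octet-stream POST body survives an HTTP round trip byte for byte. It uses only the standard library: a throwaway http.server handler on a free local port (an assumption, standing in for the Azure Function) reports the length and SHA-256 of what it read, and urllib.request sends the bytes. The handler reads self.rfile as bytes and never decodes it as text.

```python
import hashlib
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DigestHandler(BaseHTTPRequestHandler):
    """Reply with '<length> <sha256>' of the raw POST body."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        body = self.rfile.read(length)  # raw bytes, never decoded as text
        reply = "{} {}".format(len(body), hashlib.sha256(body).hexdigest()).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), DigestHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

payload = b"\n\n\xfe\xfe\x00\x00\x00\x00\\\x18,A" * 1000  # not valid UTF-8
request = urllib.request.Request(
    "http://127.0.0.1:{}/".format(server.server_address[1]),
    data=payload,
    headers={"Content-Type": "application/octet-stream"},
)
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass any proxy settings
with opener.open(request) as resp:
    received_size, received_digest = resp.read().decode("ascii").split()

server.shutdown()
server.server_close()

print(int(received_size) == len(payload))
print(received_digest == hashlib.sha256(payload).hexdigest())
```

If the same length and digest check against the function's endpoint gives different results, the bytes are being altered on the receiving side rather than by the sender.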
You can try decoding with UTF-16 and then do the further processing in your code.
Here is the code for that:
with open(path_to_file, 'rb') as f:
    contents = f.read()
contents = contents.rstrip(b"\n").decode("utf-16")
Basically, after calling req.get_body(), perform the following operation:
contents = contents.rstrip(b"\n").decode("utf-16")
See if it gives you the same output as you get from the local Python file.
Hope it helps.

upload a file using python requests module

I am trying to upload a file using the Python requests module, and I am not sure whether I can use both data and files in the post call.
fileobj = open(filename, 'rb')
upload_data = {
    'data': payload,
    'file': fileobj
}
resp = s.post(upload_url, data=upload_data, headers=upload_headers)
This is not working. Can anyone help me with this?
I think you should use the data and files keyword parameters of the post request to send the data and the file, respectively.
with open(filename, 'rb') as fileobj:
    files = {'file': fileobj}
    resp = s.post(upload_url, data=payload, files=files, headers=upload_headers)
I've also used a context manager because it closes the file for me, even if an exception is raised during the requests post.
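When both data= and files= are given, requests encodes the plain fields and the file as separate parts of one multipart/form-data body. The standard-library sketch below hand-builds such a body purely to show that structure; encode_multipart, the field names and the report.bin filename are hypothetical, and in practice you would let requests build it.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: dict mapping name -> str value (what requests takes as data=)
    files:  dict mapping name -> (filename, bytes) (what requests takes as files=)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # must not occur inside any part
    parts = []
    for name, value in fields.items():
        head = '--{}\r\nContent-Disposition: form-data; name="{}"\r\n\r\n'.format(boundary, name)
        parts.append(head.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, content) in files.items():
        head = (
            '--{}\r\nContent-Disposition: form-data; name="{}"; filename="{}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).format(boundary, name, filename)
        parts.append(head.encode("utf-8") + content + b"\r\n")  # file bytes copied verbatim
    body = b"".join(parts) + "--{}--\r\n".format(boundary).encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary


body, content_type = encode_multipart({"data": "payload"}, {"file": ("report.bin", b"\x00\x01\x02")})
print(content_type)
print(body.decode("latin-1"))
```

One related pitfall: requests only sets the multipart Content-Type header (which carries the boundary) when the headers you pass do not already contain a Content-Type, so an upload_headers dict that sets one can break the upload.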

Get arbitrary resources content in Python 3

I need to get the content of resources received on the command line. The user can give a relative path to a file or a URL. Is it possible to read from such a resource regardless of whether it is a file path or a URL?
In Ruby I have something like the following, but I'm having trouble finding a Python alternative:
content = open(path_or_url) { |io| io.read }
I don't know of a nice way to do it. However, urllib.request.urlopen() supports opening normal URLs (http, https, ftp, etc.) as well as files on the file system, so you could assume a file if the URL is missing a scheme component:
from urllib.parse import urlparse
from urllib.request import urlopen

resource = input('Enter a URL or relative file path: ')
if urlparse(resource).scheme == '':
    # assume that it is a file, use "file:" scheme
    resource = 'file:{}'.format(resource)
data = urlopen(resource).read()
This works for the following user input:
http://www.blah.com
file:///tmp/x/blah
file:/tmp/x/blah
file:x/blah # assuming cwd is /tmp
/tmp/x/blah
x/blah # assuming cwd is /tmp
Note that file: without slashes might not be a valid URI; however, this is the only way to open a file specified by a relative path, and urlopen() works with such URIs.
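If a well-formed URI is preferred, a variation is to resolve the relative path to an absolute one first and let pathlib build the file:/// URI. This is a sketch of that alternative rather than the answer's code; read_resource is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_resource(resource):
    """Return the bytes behind a URL, or behind a local path when no scheme is given."""
    if urlparse(resource).scheme == "":
        # resolve() makes the path absolute, so as_uri() yields a valid file:/// URI
        resource = Path(resource).resolve().as_uri()
    with urlopen(resource) as resp:
        return resp.read()


# Demo: read a file by relative path from a temporary working directory
old_cwd = os.getcwd()
workdir = tempfile.mkdtemp()
os.chdir(workdir)
try:
    with open("blah.txt", "wb") as f:
        f.write(b"hello from a local file")
    content = read_resource("blah.txt")
finally:
    os.chdir(old_cwd)
print(content)
```

On Windows, a path such as C:\tmp\x would be parsed as having the scheme c, so that case would need extra handling.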

Error while uploading gzip via HTTP POST

I'm POSTing a compressed file via httplib2 in Python 3.2 and get the following error:
io.UnsupportedOperation: fileno
I used to post just an XML file, but since those files are getting too big I want to compress them in memory first.
This is how I create the compressed file in memory:
import gzip
import io
import os

contentfile = open(os.path.join(r'path', os.path.basename(fname)), 'rb')
tempfile = io.BytesIO()
compressedFile = gzip.GzipFile(fileobj=tempfile, mode='wb')
compressedFile.write(contentfile.read())
compressedFile.close()
tempfile.seek(0)
and this is how I'm trying to POST it:
http.request(self.URL,'POST', tempfile, headers={"Content-Type": "application/x-gzip", "Connection": "keep-alive"})
Any ideas?
As I said, it worked well when posting the XML file itself, i.e. contentfile.
Solved by providing the Content-Length header, which removes the need for httplib2 to determine the length itself (the error shows that, without the header, the length was being looked up via fileno(), which a BytesIO object does not support).
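A sketch of the same idea with the standard library's urllib.request substituted for httplib2 (only so it has no third-party dependency): compress the XML in memory with gzip.compress, which returns bytes, and set Content-Length explicitly from len(body). The endpoint URL is a placeholder and the request is only built here, not sent.

```python
import gzip
from urllib.request import Request

# Stand-in for the XML file that has grown too large
xml = b"<?xml version='1.0'?><root>" + b"<item>data</item>" * 1000 + b"</root>"

body = gzip.compress(xml)  # compressed payload held in memory as bytes

req = Request(
    "http://example.com/upload",  # placeholder endpoint, never contacted here
    data=body,  # supplying data makes this a POST
    headers={
        "Content-Type": "application/x-gzip",
        "Content-Length": str(len(body)),  # known up front, nothing to look up
    },
)
print(len(xml), len(body), req.get_header("Content-length"))
```

Passing the compressed bytes (for example tempfile.getvalue()) instead of the BytesIO object would typically also let the library take len() of the body directly.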
