python-requests post with unicode filenames

I've read through several related questions here on SO but didn't manage to find a working solution.
I have a Flask server with this simplified code:
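from flask import Flask, request, Response
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

class SendMailAPI(Resource):
    def post(self):
        # Show which uploaded files Flask parsed out of the request
        print(request.files)
        return Response(status=200)

api.add_resource(SendMailAPI, '/')

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)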
Then in the client:
# coding:utf-8
import requests
eng_file_name = 'a.txt'
heb_file_name = u'א.txt'
requests.post('http://localhost:5000/', files={'file0': open(eng_file_name, 'rb')})
requests.post('http://localhost:5000/', files={'file0': open(heb_file_name, 'rb')})
When I send the first request, with the ASCII filename, the server receives the file and prints ImmutableMultiDict([('file0', <FileStorage: u'a.txt' (None)>)]). When I send the file with the non-ASCII (Hebrew) filename, the server doesn't seem to receive it: it prints ImmutableMultiDict([]).
I'm using requests 2.3.0, but the problem also occurs with the latest version (2.8.1). The Flask version is 0.10.1 and the Flask-RESTful version is 0.3.4.
I've done some digging in the requests code, and the request seems to be sent correctly (i.e. with the file). I printed the request right before it was sent and saw that the file name was indeed encoded according to RFC 2231:
--6ea257530b254861b71626f10a801726
Content-Disposition: form-data; name="file0"; filename*=utf-8''%D7%90.txt
To sum things up, I'm not entirely sure if the problem lies within requests that doesn't properly attach the file to the request or if Flask is having issues with picking up files with file names that are encoded according to RFC2231.
UPDATE: Came across this issue in requests GitHub: https://github.com/kennethreitz/requests/issues/2505
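For reference, the filename*=utf-8''%D7%90.txt value in the dump above is the UTF-8 bytes of the name, percent-encoded and prefixed with the charset and an empty language tag. A minimal Python 3 sketch using only the standard library reproduces it (the variable names are illustrative and not taken from the requests source):

```python
from email.utils import encode_rfc2231
from urllib.parse import quote

heb_file_name = "\u05d0.txt"  # the Hebrew letter alef followed by '.txt'

# Percent-encode the UTF-8 bytes of the name
percent_encoded = quote(heb_file_name, safe="", encoding="utf-8")

# Same encoding, plus the charset'language' prefix used by RFC 2231
rfc2231_value = encode_rfc2231(heb_file_name, charset="utf-8")

print(percent_encoded)
print("filename*=" + rfc2231_value)
```

This only shows where the header value comes from; it does not settle whether the server side is expected to parse the filename* form.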

I think there may be some confusion about encoding here:
eng_file_name = 'a.txt' # ASCII encoded, by default in Python 2
heb_file_name = u'א.txt' # NOT UTF-8 Encoded - just a unicode object
To send the second one to the server what you want to do is this:
requests.post('http://localhost:5000/', files={'file0': open(heb_file_name.encode('utf-8'), 'rb')})
I'm a little surprised that it doesn't throw an error on the client trying to open the file though - you see nothing on the client end indicating an error?
EDIT: An easy way to confirm or deny my idea is of course to print out the contents from inside the client to ensure it's being read properly.
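To make the distinction between a unicode object and its UTF-8 encoding concrete, here is a small sketch in Python 3 terms, where str plays the role of Python 2's unicode and bytes the role of Python 2's str (an assumption about how the original Python 2 code maps onto Python 3):

```python
heb_file_name = "\u05d0.txt"  # text: five code points

# Encoding turns the text into bytes; the Hebrew letter takes two bytes in UTF-8
utf8_name = heb_file_name.encode("utf-8")

print(type(heb_file_name).__name__, len(heb_file_name))
print(type(utf8_name).__name__, len(utf8_name))

# Decoding with the same codec recovers the original text
print(utf8_name.decode("utf-8") == heb_file_name)
```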

I worked around this issue by reading the file manually with read() and posting its contents under an explicit filename:
requests.post(upload_url, files={
    'file': ("photo.jpg", open(path_with_unicode_filename, 'rb').read())
})

Try this workaround:
filename.encode("utf-8").decode("iso-8859-1").
Example:
requests.post("https://example.com", files={
    "file": ("中文filename.txt".encode("utf-8").decode("iso-8859-1"), fobj, mimetype)
})
I'm posting this because this question is the first result when searching for python requests post filename encoding.
There are lots of RFC standards about Content-Disposition encoding.
And it seems that different programs implement this part differently.
See stackoverflow: lots of RFCs and application tests, RFC 2231 - 4, email.utils.encode_rfc2231.
Java version answer here.
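The encode("utf-8").decode("iso-8859-1") trick relies on ISO-8859-1 mapping every byte value to exactly one character, so the round trip is lossless. The sketch below (Python 3, standard library only) checks just that property. It assumes that whatever layer writes the header encodes it as ISO-8859-1; whether that holds depends on the library versions involved and is not verified here.

```python
filename = "\u4e2d\u6587filename.txt"  # '中文filename.txt'

# Step 1: the workaround, which reinterprets the UTF-8 bytes as ISO-8859-1 text
disguised = filename.encode("utf-8").decode("iso-8859-1")

# Step 2 (assumed): the HTTP layer encodes the header text as ISO-8859-1
on_the_wire = disguised.encode("iso-8859-1")

# The bytes that would be sent are the original UTF-8 bytes
print(on_the_wire == filename.encode("utf-8"))
print(on_the_wire.decode("utf-8") == filename)
```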

Related

How to send POST request with each payload on its own line using Python requests

I have to send a POST request to the /batch endpoint of : 'https://www.google-analytics.com'.
As mentioned in the Documentation I have to send the request to /batch endpoint and specify each payload on its own line.
I was able to achieve this using POSTMAN as follows:
My question is how to make this POST request using Python's requests library.
I tried something like this:
import requests
text = '''v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=bookmarks&ev=13
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=upvotes&ev=65
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=questions&ev=15
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=postviews&ev=95'''
response = requests.post('https://www.google-analytics.com/batch', data=text)
but it doesn't work.
UPDATE
I tried this and it works!
import http.client
conn = http.client.HTTPSConnection("www.google-analytics.com")
payload = "v=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=bookmarks&ev=13\r\nv=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=upvotes&ev=63\r\nv=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=questions&ev=11\r\nv=1&cid=43223523&tid=UA-200248207-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=postviews&ev=23"
headers = {
    'Content-Type': 'text/plain'
}
conn.request("POST", "/batch", payload, headers)
res = conn.getresponse()
But the question remains open: what is the issue with requests here?

You don't need to double-escape the newline symbol.
Moreover, you don't need the newline symbol at all for the multi-line string.
And also the indentations you put in your multi-line string are counted:
test = '''abc
def
ghi'''
print(test)
Here's an SO answer that explains this with some additional ways to make long strings: https://stackoverflow.com/a/10660443/4570170
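As a small illustration of the indentation point, the sketch below builds a triple-quoted string inside a function and shows that the leading spaces end up in the string. textwrap.dedent from the standard library is one way to strip them; the function name is hypothetical.

```python
import textwrap

def build_text():
    # Every line here carries the 8 spaces of source indentation
    text = '''\
        line one
        line two
        line three'''
    return text

raw = build_text()
cleaned = textwrap.dedent(raw)

print(repr(raw))
print(repr(cleaned))
```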
Now the request body.
The documentation says
payload_data – The BODY of the post request. The body must include exactly 1 URI encoded payload and must be no longer than 8192 bytes.
So try uri-encoding your payload:
text = '''v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=bookmarks&ev=13
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=upvotes&ev=65
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=questions&ev=15
v=1&cid=43223523&tid=UA-XXXXXX-1&t=event&ec=aggregated_stats&ea=daily_kpi&el=postviews&ev=95'''
text_final = requests.utils.quote(text)
response = requests.post('https://www.google-analytics.com/batch', data=text_final)

Finally, I figured out the solution myself.
I'm posting it here to help others.
The problem was I was working on AWS Cloud9 and as mentioned in the documentation
Some environments are not able to send hits to Google Analytics directly. Examples of this are older mobile phones that can't run JavaScript or corporate intranets behind a firewall.
So we just need to include the User Agent parameter
ua=Opera/9.80
in each of our payloads.
It works!
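A minimal sketch of building such a batch body, one payload per line and each with the ua parameter, using only the standard library. The helper name build_batch_body and the constant names are hypothetical; the parameter values are the placeholders from the question.

```python
from urllib.parse import urlencode

# Parameters shared by every hit, including the User Agent override
COMMON = {
    "v": 1, "cid": "43223523", "tid": "UA-XXXXXX-1", "t": "event",
    "ec": "aggregated_stats", "ea": "daily_kpi", "ua": "Opera/9.80",
}
EVENTS = [("bookmarks", 13), ("upvotes", 65), ("questions", 15), ("postviews", 95)]

def build_batch_body(common, events):
    """Return one URL-encoded payload per line."""
    lines = []
    for label, value in events:
        hit = dict(common, el=label, ev=value)
        lines.append(urlencode(hit))
    return "\n".join(lines)

body = build_batch_body(COMMON, EVENTS)
print(body)
```

The resulting string could then be passed as data= to requests.post, as in the question.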

Python SUDS - Getting Exception 415 when calling a SOAP method

from suds.client import Client
url = r'http://*********?singleWsdl'
c = Client(url)
The requests work fine till here, but when I execute the below statement, I get the error message shown at the end. Please help.
c.service.Method_Name('parameter1', 'parameter2')
The Error message is :
Exception: (415, u'Cannot process the message because the content type
\'text/xml; charset=utf-8\' was not the expected type
\'multipart/related; type="application/xop+xml"\'.')

A Content-Type header of multipart/related; type="application/xop+xml" is the type used by MTOM, a message format used to efficiently send attachments to/from web services.
I'm not sure why the error claims to be expecting it, because the solution I found for my situation was to override the Content-Type header with 'application/soap+xml;charset=UTF-8'.
Example:
soap_client.set_options(headers = {'Content-Type': 'application/soap+xml;charset=UTF-8'})
If you are able, you could also try checking for MTOM encoding in the web service's configuration and changing it.

Error -3 while decompressing data: incorrect header check - urllib2

I am checking for url status with this code:
h = httplib2.Http()
hdr = {'User-Agent': 'Mozilla/5.0'}
resp = h.request("http://" + url, headers=hdr)
if int(resp[0]['status']) < 400:
    return 'ok'
else:
    return 'bad'
and getting
Error -3 while decompressing data: incorrect header check
the url i am checking is:
http://www.sueddeutsche.de/wirtschaft/deutschlands-innovationsangst-wir-neobiedermeier-1.2117528
the Exception Location is:
Exception Location: C:\Python27\lib\site-packages\httplib2-0.9-py2.7.egg\httplib2\__init__.py in _decompressContent, line 403
try:
    encoding = response.get('content-encoding', None)
    if encoding in ['gzip', 'deflate']:
        if encoding == 'gzip':
            content = gzip.GzipFile(fileobj=StringIO.StringIO(new_content)).read()
        if encoding == 'deflate':
            content = zlib.decompress(content) ##<---- error line
        response['content-length'] = str(len(content))
        # Record the historical presence of the encoding in a way the won't interfere.
        response['-content-encoding'] = response['content-encoding']
        del response['content-encoding']
except IOError:
    content = ""
The HTTP status is 200, which is fine for my case, but I am still getting this error.
I actually need only http status, why is it reading the whole content?
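One common cause of this particular zlib message is a body sent as a raw deflate stream, without the zlib header that zlib.decompress expects by default. Whether that is what this server sends is an assumption. The sketch below (standard library only, with a hypothetical helper named inflate) reproduces the failure on a locally built raw stream and shows a fallback that passes a negative wbits value:

```python
import zlib

def inflate(data):
    """Decompress a 'deflate' body, accepting zlib-wrapped or raw streams."""
    try:
        # Standard case: deflate data wrapped in a zlib header and checksum
        return zlib.decompress(data)
    except zlib.error:
        # Fallback: raw deflate stream with no zlib header
        return zlib.decompress(data, -zlib.MAX_WBITS)

# Build a raw deflate stream to stand in for the server's response body
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_body = compressor.compress(b"hello") + compressor.flush()

try:
    zlib.decompress(raw_body)
    plain_call_failed = False
except zlib.error as exc:
    plain_call_failed = True
    print(exc)

print(inflate(raw_body))
```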

You may have any number of reasons for choosing httplib2, but it's far easier to get the status code of an HTTP request using the Python module requests. Install it with the following command:
$ pip install requests
See an extremely simple example below.
In [1]: import requests as rq
In [2]: url = "http://www.sueddeutsche.de/wirtschaft/deutschlands-innovationsangst-wir-neobiedermeier-1.2117528"
In [3]: r = rq.get(url)
In [4]: r
Out[4]: <Response [200]>
Unless you have a considerable constraint that needs httplib2 explicitly, this solves your problem.

This may be a bug (or just uncommon design decision) in httplib2. I don't get this problem with urllib2 or httplib in the 2.x stdlib, or urllib.request or http.client in the 3.x stdlib, or the third-party libraries requests, urllib3, or pycurl.
So, is there a reason you need to use this particular library?
If so:
I actually need only http status, why is it reading the whole content?
Well, most HTTP libraries are going to read and parse the whole content, or at least the headers, before returning control. That way they can respond to simple requests about the headers or chunked encoding or MIME envelope or whatever without any delay.
Also, many of them automate things like 100 continue, 302 redirect, various kinds of auth, etc., and there's no way they could do that if they didn't read ahead. In particular, according to the description for httplib2, handling these things automatically is one of the main reasons you should use it in the first place.
Also, the first TCP read is nearly always going to include the headers anyway, so why not read them?
This means that if the headers are invalid, you may get an exception immediately. They may still provide a way to get the status code (or the raw headers, or other information) anyway.
As a side note, if you only want the HTTP status, you should probably send a HEAD request rather than a GET. Unless you're writing and testing a server, you can almost always rely on the fact that, as the RFC says, the status and headers should be identical to what you'd get with GET. In fact, that would almost certainly solve things in this case—if there is no body to decompress, the fact that httplib2 has gotten confused into thinking the body is gzipped when it isn't won't matter anyway.
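A minimal sketch of the HEAD approach, written for Python 3's http.client rather than httplib2. To stay self-contained it talks to a throwaway local server instead of the real site; the handler class and the head_status helper are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HeadOnlyHandler(BaseHTTPRequestHandler):
    """Stand-in server that answers HEAD with status 200 and no body."""
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet

def head_status(host, port, path="/"):
    """Send a HEAD request and return only the numeric status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path, headers={"User-Agent": "Mozilla/5.0"})
        return conn.getresponse().status
    finally:
        conn.close()

server = HTTPServer(("127.0.0.1", 0), HeadOnlyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

status = head_status("127.0.0.1", server.server_port)
server.shutdown()
server.server_close()

print("ok" if status < 400 else "bad")
```

Because a HEAD response carries no body, there is nothing for the client to decompress.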

How to serve filetype object in python

I'm using urllib2 to download a file from another server via a REST API (the URL can't be exposed to the user, which is why it needs to be done on the backend).
It gives me the following response:
(<addinfourl at 4365818480 whose fp = <google.appengine.dist27.socket._fileobject object at 0x1043883d0>>
I'm now trying to find a way to serve this file to the end user (a download). I did quite a bit of research tonight but had no luck. I tried using print .read() and that didn't help either.
Here's some additional information:
The Platform is Google Appengine. And below is the relevant code:
In calltrunk.get_recording:
req = urllib2.Request(url, None, forward_headers)
print response[0].read()
stream = urllib2.urlopen(req)
In my main.py
response = calltrunk.get_recording(ConversationId=cId)
print response[0].read()
Could really use a hand here!

Python httplib POST request and proper formatting

I'm currently working on an automated way to interface with a database website that has RESTful web services installed. I am having trouble figuring out how to properly format and send the requests listed on the following site using Python.
https://neesws.neeshub.org:9443/nees.html
Particular example is this:
POST https://neesws.neeshub.org:9443/REST/Project/731/Experiment/1706/Organization
<Organization id="167"/>
The biggest problem is that I do not know where to put the XML formatted part of the above. I want to send the above as a python HTTPS request and so far I've been trying something of the following structure.
>>> import httplib
>>> conn = httplib.HTTPSConnection("neesws.neeshub.org:9443")
>>> conn.request("POST", "/REST/Project/731/Experiment/1706/Organization")
>>> conn.send('<Organization id="167"/>')
But this appears to be completely wrong. I've never used Python for web service interfaces, so my primary question is how exactly I am supposed to use httplib to send the POST request, particularly the XML-formatted part of it. Any help is appreciated.

You need to set some request headers before sending the data, for example Content-Type to 'text/xml'. Check out these examples:
Post-XML-Python-1
Which has this code as example:
# Python 2 code (httplib and the print statement)
import sys, httplib

HOST = "www.example.com"
API_URL = "/your/api/url"

def do_request(xml_location):
    """HTTP XML POST request"""
    request = open(xml_location, "r").read()
    webservice = httplib.HTTP(HOST)
    webservice.putrequest("POST", API_URL)
    webservice.putheader("Host", HOST)
    webservice.putheader("User-Agent", "Python post")
    webservice.putheader("Content-type", "text/xml; charset=\"UTF-8\"")
    webservice.putheader("Content-length", "%d" % len(request))
    webservice.endheaders()
    # The XML document is the request body, sent after the headers
    webservice.send(request)
    statuscode, statusmessage, header = webservice.getreply()
    result = webservice.getfile().read()
    print statuscode, statusmessage, header
    print result

do_request("myfile.xml")
Post-XML-Python-2
You may get some idea.
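In Python 3 the same idea can be written more compactly with http.client, passing the XML as the body argument of request() together with a Content-Type header. The sketch below is an illustration rather than a tested client for the NEES service: it posts to a throwaway local echo server so it can run anywhere, and EchoHandler and post_xml are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    """Stand-in server: echoes the received Content-Type and body."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        reply = self.headers.get("Content-Type", "").encode("ascii") + b"|" + body
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet

def post_xml(conn, path, xml):
    """POST an XML string as the request body and return (status, reply)."""
    body = xml.encode("utf-8")
    headers = {"Content-Type": "text/xml; charset=UTF-8"}
    # http.client adds the Content-Length header for a bytes body
    conn.request("POST", path, body=body, headers=headers)
    resp = conn.getresponse()
    return resp.status, resp.read()

server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
status, echoed = post_xml(
    conn, "/REST/Project/731/Experiment/1706/Organization", '<Organization id="167"/>'
)
conn.close()
server.shutdown()
server.server_close()

print(status, echoed)
```

Against the real service, the connection would presumably be an http.client.HTTPSConnection to the host and port given in the question.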
