I am using Python's requests module to post some data to my company's SharePoint site:
response = requests.post("https//my-sharepoint-site.com",
data={"myPost": "this is a test"},
auth=HttpNtlmAuth("\\MY_USER_NAME", MY_PASS))
The resulting post on my company's SharePoint site looks like:
myPost=this+is+a+test
How do I 1) remove "myPost=" and 2) stop the spaces from being filled with '+'?
Note: I do not have access to the company's server-side application logic.
Another note: curl does not encode white spaces with '+'. The result of:
curl -f -v --ntlm -u MY_USER_NAME --data "this is a test" https://my-sharepoint-site.com
is:
this is a test
You're passing a dict as the data parameter, which means the dict is interpreted as key-value pairs to be form-encoded as application/x-www-form-urlencoded. That encoding substitutes + for spaces instead of %20 because that is the behaviour of HTML form submission in most browsers.
If you want to pass a POST body literally, you need to encode it into bytes yourself.
In your case, simply passing data="this is a test".encode("utf-8") should give you the behavior you want.
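For example, a minimal sketch (assuming HttpNtlmAuth comes from the requests_ntlm package; the URL, user name and MY_PASS are placeholders carried over from the question):
import requests
from requests_ntlm import HttpNtlmAuth

# Passing bytes instead of a dict keeps requests from form-encoding the
# body, so no "myPost=" key is added and spaces stay as spaces.
response = requests.post(
    "https://my-sharepoint-site.com",
    data="this is a test".encode("utf-8"),
    auth=HttpNtlmAuth("\\MY_USER_NAME", MY_PASS),  # MY_PASS as in the question
)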
The client sends a bearer token to the server and it looks like so:
Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
Apparently I don't need the 'Bearer' prefix, so I have to get rid of it. I understand it is as simple as splitting the string and taking the right element, but what I do not understand is why the library function I'm using doesn't do it for me.
I also have to check if the token is actually of the right type (which is bearer in this case). It forces me to write additional lines of code and I don't like it.
So my question is "Are there any smarter ways of processing the token?"
I'm using PyJWT.
The Bearer ... string is usually found in an Authorization header on an HTTP request. Whether there is specific support for such headers then depends on the specific framework you are using to receive or send HTTP requests.
The format is not part of the JSON Web Token standard; the Authorization header, with or without Bearer, is a common place to find one, but a package like PyJWT only deals with the tokens, not the transport mechanism. So a library that focuses on handling JSON Web Tokens should not be expected to handle parsing tokens out of an HTTP header (though some may).
The HTTP 1.1 specification, which determines what headers on an HTTP request should look like, only standardizes that an Authorization header in a request should contain credentials, and the separate RFC 2617 standard on HTTP Authentication states that credentials should consist of at least a scheme and arbitrary parameters:
credentials = auth-scheme #auth-param
That's not much for a Python HTTP library to go on. The specific RFC only further specifies two different authorization schemes: Basic and Digest. Bearer is not part of this standard. So a framework like Werkzeug (which underpins Flask, among others) does support parsing Authorization headers, but only if one of those two standardized schemes is being used (see the Authorization class docs).
The Bearer scheme is instead part of the OAuth 2.0 standard. It just defines that a client can send a token, one given to them, that the server can accept to authorize the request. The Bearer scheme is just one of several ways to send the token, and the only limitation on the token is that it should be base64-encoded. Nothing more is said.
But it does say that if the Authorization header is used, then the credentials must follow a specific format:
b64token    = 1*( ALPHA / DIGIT /
                  "-" / "." / "_" / "~" / "+" / "/" ) *"="
credentials = "Bearer" 1*SP b64token
So Bearer, followed by 1 or more spaces, then followed by Base64 data with some added permitted characters (Base64 only uses letters, digits and + and / with = as padding at the end, so -, ., _ and ~ are extra here). That's it.
If you must have a library, find one that handles OAuth 2.0. But it is otherwise trivial to just split on whitespace, and (optionally) decode the string as Base64:
from base64 import b64decode

auth = header_string.split(maxsplit=2)  # only interested in the first two parts
token = b64token = None
if len(auth) > 1 and auth[0].lower() == 'bearer':
    b64token = auth[1]
    try:
        token = b64decode(b64token)
    except ValueError:
        pass
Now b64token and token are either None, or the first non-whitespace portion after Bearer, and the base64-decoded version of that string.
A JSON Web Token is actually three Base64-encoded strings joined with ., so decoding such a token as a single Base64-encoded value could easily fail. You'd pass the b64token string to PyJWT.
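With PyJWT that looks something like the following sketch (the secret and algorithm here are assumptions; they must match whatever signed the token):
import jwt  # PyJWT

# b64token is the string extracted above; the key and algorithm
# are hypothetical and depend on how the token was issued.
claims = jwt.decode(b64token, key="my-shared-secret", algorithms=["HS256"])
print(claims["name"])  # e.g. "John Doe" for the sample token above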
Why are the + characters not converted to spaces?
>>> import urllib
>>> url = 'Q=Who+am+I%3F'
>>> urllib.unquote(url)
'Q=Who+am+I?'
>>>
There are two variants; urllib.unquote() and urllib.unquote_plus(). Use the latter:
>>> import urllib
>>> url = 'Q=Who+am+I%3F'
>>> urllib.unquote_plus(url)
'Q=Who am I?'
That's because there are two variants of URL quoting: one for URL path segments, and one for URL query parameters; the latter follows a different specification. See Wikipedia:
When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20".
So forms using the application/x-www-form-urlencoded MIME type in a GET or POST request use slightly different rules, where spaces are encoded as +, but when encoding characters in a URL path, %20 is used. When decoding you need to pick the right variant. You have form data (from the query part of the URL), so you need to use unquote_plus().
Now, if you are parsing a query string, you may want to use the urlparse.parse_qs() or urlparse.parse_qsl() functions; these will not only use the right unquote*() function, but also parse the parameters into a dictionary or list of key-value pairs:
>>> import urlparse
>>> urlparse.parse_qs(url)
{'Q': ['Who am I?']}
>>> urlparse.parse_qsl(url)
[('Q', 'Who am I?')]
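(These examples are Python 2; on Python 3 the same functions live in urllib.parse:)
>>> from urllib.parse import unquote_plus, parse_qs, parse_qsl
>>> url = 'Q=Who+am+I%3F'
>>> unquote_plus(url)
'Q=Who am I?'
>>> parse_qs(url)
{'Q': ['Who am I?']}
>>> parse_qsl(url)
[('Q', 'Who am I?')]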
I am using the twitter API, and when I make a request to the API, something like
https://api.instagram.com/v1/tags/cats/media/recent?user_id=myUserId&count=1
I get the correct response back, JSON data, except all of the // characters are escaped and shown as \/\/.
This is true on the command line using curl, and when I type that URL directly into the browser.
If it makes any difference, I am ultimately going to be calling a function and navigating to that URL, so I need it to be unescaped.
Furthermore, I will be accessing that URL with Python, so a Python method would be good, but ideally I would just get the response back unchanged.
The JSON standard allows (though does not require) / to be escaped. If you use any standards-compliant JSON parser (i.e. pretty much any JSON parser), it will do the unescaping for you.
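For example, with Python's standard json module (the payload here is just an illustration):
import json

raw = '{"link": "http:\\/\\/api.instagram.com\\/v1\\/tags\\/cats"}'
data = json.loads(raw)
print(data['link'])  # http://api.instagram.com/v1/tags/cats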
I am sending a GET request to my Python server. My query string is:
"http://192.168.4.106:3333/xx/xx/xx/xx?excelReport&detail=&#tt=475&dee=475&empi=&qwer=&start_date=03/01/2014&end_date=03/13/2014&SearchVar=0&report_format=D"
The query string contains a # character, so when I call request.keys() on my server it does not show me any of the parameters passed. It works with other special characters.
I have been stuck on this problem for quite a long time.
I am using the Zope framework.
Please suggest a fix.
The # character cannot be used like that in a query string.
You should encode it with %23 and decode it when you parse the string.
The reasoning behind this can be found on the W3C site.
# marks the end of the 'query' part of a URL and the start of the 'fragment'. If you need to have a '#' inside your query (that is, the GET params that you get with request.keys()), you need to encode it (with the standard urllib.urlencode or with whatever your framework provides).
I'm not sure what the purpose of # in that URL is, though. Is it supposed to be a key #tt in your request.keys()? Is it in fact the start of the fragment?
Nowadays fragments are often used for routing on the client side of a webapp, since going from #a to #b inside a webpage does not reload the page. If that is the case here, you can't encode the #, since it would lose its meaning; you would instead need to extract the parameters you want from the fragment part manually.
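For example, urllib.urlencode percent-encodes # for you (a quick sketch, borrowing a key from the question's URL):
>>> import urllib
>>> urllib.urlencode([('#tt', 475), ('dee', 475)])
'%23tt=475&dee=475'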
You can use urllib.quote to solve your problem generally.
>>> import urllib
>>> urllib.quote('#')
'%23'
I am having trouble getting my Python script to pass Unicode data over a RESTful HTTP call.
I have a script that reads data from web site X using a REST interface and then pushes it into web site Y using its REST interface. Both systems are open source and run on our servers. Site X uses PHP, Apache and PostgreSQL. Site Y is Java, Tomcat and PostgreSQL. The script doing the processing is currently in Python.
In general, the script works very well. We do have a few international users, and when trying to process a user with Unicode characters in their name things break down. The original version of the script read the JSON data into Python, where the data was converted automagically into Unicode. I am pretty sure everything was working fine up to this point. To output the data I used subprocess.Popen() to call curl. This works for regular ASCII, but the Unicode was getting mangled somewhere in transit. I didn't get an error anywhere, but when viewing the results on site Y the data is no longer correctly encoded.
I know that Unicode is supported for these fields because I can craft a request using Firefox that correctly adds the data to site Y.
My next idea was to not use curl and instead do everything in Python. I experimented by passing a hand-constructed Unicode string to Python's urllib to make the REST call, but I received an error from urllib.urlopen():
UnicodeEncodeError: 'ascii' codec can't encode characters in position 103-105: ordinal not in range(128)
Any ideas on how to make this work? I would rather not re-write too much, but if there is another scripting language that would be better suited I wouldn't mind hearing about that also.
Here is my Python test script:
import urllib
uni = u"abc_\u03a0\u03a3\u03a9"
post = u"xdat%3Auser.login=unitest&"
post += u"xdat%3Auser.primary_password=nauihe4r93nf83jshhd83&"
post += u"xdat%3Auser.firstname=" + uni + "&"
post += u"xdat%3Auser.lastname=" + uni ;
url = u"http://localhost:8081/xnat/app/action/XDATRegisterUser"
data = urllib.urlopen(url,post).read()
With regard to your test script, it is failing because urllib expects a byte string for the POST body; when you pass a unicode object, it is implicitly encoded using the default codec, which is ascii. Since your data contains non-ASCII characters, that implicit encoding fails.
The simplest way to handle POSTing unicode objects is to be explicit: gather your data and build a dict, encode the unicode values with an appropriate charset, urlencode the dict (to get a POSTable ascii string), then initiate the request. Your example could be rewritten as:
import urllib
import urllib2
## Build our post data dict
data = {
    'xdat:user.login' : u'unitest',
    'xdat:user.primary_password' : u'nauihe4r93nf83jshhd83',
    'xdat:user.firstname' : u"abc_\u03a0\u03a3\u03a9",
    'xdat:user.lastname' : u"abc_\u03a0\u03a3\u03a9",
}
## Encode the unicode using an appropriate charset
data = dict([(key, value.encode('utf8')) for key, value in data.iteritems()])
## Urlencode it for POSTing
data = urllib.urlencode(data)
## Build a POST request, get the response
url = "http://localhost:8081/xnat/app/action/XDATRegisterUser"
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
EDIT: More generally, when you make an HTTP request with Python (say urllib2.urlopen), the content of the response is not decoded to unicode for you. That means you need to be aware of the encoding used by the server that sent it. Look at the Content-Type header; usually it includes a charset=xyz parameter.
It is always prudent to decode your input as early as possible, and encode your output as late as possible.
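A quick sketch of that pattern with the urllib2 response from the example above (the utf-8 fallback is an assumption for servers that do not declare a charset):
response = urllib2.urlopen(request)
# In Python 2 the headers object is a mimetools.Message; getparam()
# extracts the charset parameter from the Content-Type header, if present.
charset = response.headers.getparam('charset') or 'utf-8'
body = response.read().decode(charset)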