How to not convert or escape " // " characters in an HTTP response - python

I am using the Instagram API, and when I make a request to the API, something like
https://api.instagram.com/v1/tags/cats/media/recent?user_id=myUserId&count=1
I get the correct JSON response back, except that all of the // characters are escaped and shown as \/\/
This is true on the command line using curl, and when I type that URL directly into the browser.
If it makes any difference, I am ultimately going to be calling a function and navigating to that URL, so I need it unescaped.
Furthermore, I will be accessing that URL with Python, so a Python method would be fine, but ideally I would just get the response back unchanged.

The JSON standard allows (but does not require) / to be escaped as \/. Any standards-compliant JSON parser (which is pretty much any JSON parser) will do the unescaping for you.
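As a quick check, Python's built-in json module shows this behaviour. The payload below is a made-up stand-in for the API response (the field name and URL are assumptions, not taken from the real API):

```python
import json

# Hypothetical response body; note the escaped slashes as they appear on the wire.
raw = r'{"link": "http:\/\/example.com\/media\/1"}'

data = json.loads(raw)
print(data["link"])  # http://example.com/media/1
```

The escaping only exists in the serialized text; once parsed, the string holds plain slashes and can be used as a URL directly.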

Related

python request library giving wrong value single quotes

I am facing an issue when calling an API with the requests library. The problem is as follows.
The code:
r = requests.post(url, data=json.dumps(json_data), headers=headers)
When I look at r.text, the apostrophe in the string comes back
like this: Bachelor\u2019s Degree. It should actually be Bachelor's Degree.
I also tried json.loads, but the single quote problem remains the same.
How do I get the string value correctly?
What you see here ("Bachelor\u2019s Degree") is the string's escaped representation, where "\u2019" is the Unicode code point for "RIGHT SINGLE QUOTATION MARK". This is perfectly correct and there is nothing wrong here; if you print() this string you'll get what you expect:
>>> s = 'Bachelor\u2019s Degree'
>>> print(s)
Bachelor’s Degree
Learning about unicode and encodings might save you quite some time FWIW.
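The same applies when the response body is JSON and is parsed rather than read as raw text. A small sketch, assuming the body looks like the hypothetical payload below (the field name is invented for illustration):

```python
import json

# Hypothetical JSON body as it travels over the wire (pure ASCII, with \u2019 escaped).
body = r'{"degree": "Bachelor\u2019s Degree"}'

parsed = json.loads(body)
print(parsed["degree"])       # Bachelor’s Degree
print(len(parsed["degree"]))  # 17
```

The parsed string contains the single character U+2019, not a backslash sequence. With requests, r.json() performs the same parsing on the response body.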
EDIT:
When I save it in the db and then display it in HTML it will cause an issue,
right?
Have you tried?
Your database connector is supposed to encode it to the proper encoding (according to your field, table and client encoding settings).
As for "displaying it in HTML", it mostly depends on whether you're using Python 2.7.x or Python 3.x and on how you build your HTML, but if you're using a decent framework with a decent template engine (if not, you should reconsider your stack), chances are it will work out of the box.
As I already mentioned, learning about Unicode and encodings will save you a lot of time.
It's just a Unicode escape sequence for a non-ASCII character; it is not "wrong".
string = 'Bachelor\u2019s Degree'
print(string)
Bachelor’s Degree
You can decode and encode it again, but I can't see any reason why you would want to do that (this might not work in Python 2):
string = 'Bachelor\u2019s Degree'.encode().decode('utf-8')
print(string)
Bachelor’s Degree
From requests docs:
When you make a request, Requests makes educated guesses about the
encoding of the response based on the HTTP headers. The text encoding
guessed by Requests is used when you access r.text
On the response object, you can use .content instead of .text to get the raw response bytes and decode them yourself.
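A minimal sketch of the difference between bytes and text, using a byte string as a stand-in for r.content so that no network call is needed (the sample bytes are an assumption):

```python
# Stand-in for r.content: the raw, undecoded bytes of a response body.
content = "Bachelor\u2019s Degree".encode("utf-8")

# Decoding with the right charset gives the text that r.text would return
# when Requests guesses the encoding correctly.
text = content.decode("utf-8")

print(content)  # b'Bachelor\xe2\x80\x99s Degree'
print(text)     # Bachelor’s Degree
```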

Submitting a http post via python and seeing escape characters in the json posted

I am posting via Python (a totally new language for me, so I am sure I am overlooking something basic) and seeing escape characters in the JSON posted to the server, which obviously results in invalid JSON. Here is my code:
import requests
#try three different ways to escape json - all result in the backslash being submitted to the server in the post
json = """{"testId": "616fdb5e-40c1-326a-81a4-433051627e6d","testName": "nameHere"}"""
#json = '{"testId": "616fdb5e-40c1-326a-81a4-433051627e6d","testName": "nameHere"}'
#json = "{\"testId\": \"616fdb5e-40c1-326a-81a4-433051627e6d\",\"testName\": \"nameHere\"}"
response = requests.post("http://localhost:8888", data=None, json=json)
I am posting locally to fiddler and see that the escape characters are still there. Here is what is posted:
"{\"testId\": \"616fdb5e-40c1-326a-81a4-433051627e6d\",\"testName\": \"nameHere\"}"
I would expect the library to strip out escape characters. Is that not the case?
The other weird thing is that the characters aren't there when I am running the code, at least from what I can tell:
json
'{"testId": "616fdb5e-40c1-326a-81a4-433051627e6d","testName": "nameHere"}'
json.find("\\")
-1
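Short answer: don't submit a string. The json parameter of requests.post expects a dict (or another JSON-serializable object) and serializes it for you; a string that already contains JSON gets serialized a second time, which is where the backslashes come from. Code that works: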
import requests
json_dict = {"testId": "616fdb5e-40c1-326a-81a4-433051627e6d","testName": "nameHere"}
response = requests.post("http://localhost:8888", data=None, json=json_dict)
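To see where the backslashes come from: serializing a str that already holds JSON wraps it in quotes and escapes the inner ones. The sketch below reproduces that with json.dumps alone (it assumes requests serializes the json= value in an equivalent way, which matches the body seen in Fiddler):

```python
import json

payload = {"testId": "616fdb5e-40c1-326a-81a4-433051627e6d", "testName": "nameHere"}

once = json.dumps(payload)   # what the server should receive
twice = json.dumps(once)     # what is sent when a JSON string is passed as json=

print(once)
print(twice)

# Decoding the double-encoded body once yields a str, not a dict.
print(type(json.loads(twice)).__name__)  # str
```

If the JSON is already a string, passing it as data= together with a Content-Type: application/json header avoids the second encoding.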

request.keys() not having passed params when containing # in it

I am sending a GET request to a Python server. My query string is
"http://192.168.4.106:3333/xx/xx/xx/xx?excelReport**&detail=&#tt**=475&dee=475&empi=&qwer=&start_date=03/01/2014&end_date=03/13/2014&SearchVar=0&report_format=D"
The query string contains a # character, so when I call request.keys() on my server it does not show any of the passed params. It works with other special characters.
I have been stuck on this problem for quite a long time.
I am using the Zope framework.
Please suggest a solution.
The # character cannot be used like that in a query string.
You should encode it with %23 and decode it when you parse the string.
The reason behind that can be found on the W3 site.
# marks the end of the 'query' part of a URL and the start of the 'fragment'. If you need to have a '#' inside your query (that is, the GET params that you get with request.keys()), you need to encode it (with the standard urllib.urlencode or with whatever your framework provides).
I'm not sure what the purpose of # in that URL is, though. Is it supposed to be a key #tt** in your request.keys()? Is it in fact the start of the fragment?
Nowadays fragments are often used to have some routing in the client side of a webapp, since if you go from #a to #b inside a webpage, you don't need to reload the page. So if that may be the case then you can't encode the #, since it would lose its meaning. You would need then to extract the parameters you want from the fragment part manually.
You can use urllib.quote to solve your problem generally.
>>> import urllib
>>> urllib.quote('#')
'%23'
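urllib.quote is the Python 2 spelling; in Python 3 the same functions live in urllib.parse. A short sketch (the host and path are placeholders, not the original server) showing that an unencoded # ends the query and that encoding it keeps the parameter:

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

print(quote("#"))  # %23

# Unencoded: everything after '#' is parsed as the fragment, not the query.
broken = urlsplit("http://example.invalid/report?detail=&#tt=475&dee=475")
print(broken.query)     # detail=&
print(broken.fragment)  # tt=475&dee=475

# Encoded: urlencode escapes '#' inside the key, so the parameter survives.
query = urlencode({"detail": "", "#tt": "475", "dee": "475"})
fixed = urlsplit("http://example.invalid/report?" + query)
print(fixed.query)  # detail=&%23tt=475&dee=475
print(parse_qs(fixed.query, keep_blank_values=True))
```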

escaping forward slashes in json output

I have a Python server-side application that generates a simple HTML page with a big blurb of client-side JavaScript, which builds the DOM tree displayed to the user from a big blob of JSON data assigned to a JS variable. Some of that JSON data contains strings, some of which contain HTML tags. It all boils down to something like this:
<html>
...
var tmp = "<p>some text</p>";
...
</html>
Unsurprisingly, the above does not work since it should look like the following to make the browser HTML parser happy:
<html>
...
var tmp = "<p>some text<\/p>";
...
</html>
(notice the escaped forward slash)
The JSON inserted in the HTML is generated with Python's default json library, namely with json.dumps, which is explicitly designed not to escape the forward slash in strings.
I tried to subclass json.JSONEncoder to override its behavior for Python strings, but this does not work since it does not allow specialization of the serialization of basic Python types.
I tried to use a variety of other python json libraries without much luck: it seems that since most people hate the escaped forward slashes, most libraries do not generate them.
I could escape the strings by hand before stuffing them in my python data structures before calling json.dumps. I could also write a function to recursively iterate over the data structure, spot strings, and escape them automatically (nicer over the long run). I could maybe escape the string generated by json.dumps before stuffing it in the HTML (I am not sure that this could not lead to invalid JSON being inserted in the HTML).
Which leads me to my question: is there a JSON serialization library for Python that can be coerced to escape forward slashes in strings?
The best way I've found is to just do a replacement on the resulting string.
out = json.dumps(obj)
out = out.replace("/", "\\/")
Escaping forward slashes is optional within the JSON spec, and doing so ensures that you won't get bitten by "</script>" attacks in the string.
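A small sketch of that replacement, with a round-trip check that addresses the concern about producing invalid JSON. It relies on the fact that / only ever occurs inside string values in json.dumps output, and \/ is a legal escape there. The function name and sample data are illustrative only:

```python
import json

def dumps_for_script(obj):
    """Serialize obj to JSON with every '/' escaped as '\\/' (hypothetical helper)."""
    return json.dumps(obj).replace("/", "\\/")

data = {"html": "<p>some text</p>", "evil": "</script><script>alert(1)</script>"}
out = dumps_for_script(data)

print(out)
# The escaped text is still valid JSON and decodes to the original data.
print(json.loads(out) == data)  # True
# No literal closing tag is left for the HTML parser to trip over.
print("</" in out)  # False
```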

Script having trouble passing Unicode through a REST interface

I am having trouble getting my Python script to pass Unicode data over a RESTful HTTP call.
I have a script that reads data from web site X using a REST interface and then pushes it into web site Y using its REST interface. Both systems are open source and run on our servers. Site X uses PHP, Apache and PostgreSQL. Site Y uses Java, Tomcat and PostgreSQL. The script doing the processing is currently in Python.
In general, the script works very well. We do have a few international users, and when trying to process a user with Unicode characters in their name things break down. The original version of the script read the JSON data into Python, where it was automatically converted to Unicode. I am pretty sure everything was working fine up to this point. To output the data I used subprocess.Popen() to call curl. This works for regular ASCII, but the Unicode was getting mangled somewhere in transit. I didn't get an error anywhere, but when viewing the results on site Y the data is no longer correctly encoded.
I know that Unicode is supported for these fields because I can craft a request using Firefox that correctly adds the data to site Y.
The next idea was to not use curl, but to do everything in Python. I experimented by passing a hand-constructed Unicode string to Python's urllib to make the REST call, but I received an error from urllib.urlopen():
UnicodeEncodeError: 'ascii' codec can't encode characters in position 103-105: ordinal not in range(128)
Any ideas on how to make this work? I would rather not re-write too much, but if there is another scripting language that would be better suited I wouldn't mind hearing about that also.
Here is my Python test script:
import urllib
uni = u"abc_\u03a0\u03a3\u03a9"
post = u"xdat%3Auser.login=unitest&"
post += u"xdat%3Auser.primary_password=nauihe4r93nf83jshhd83&"
post += u"xdat%3Auser.firstname=" + uni + "&"
post += u"xdat%3Auser.lastname=" + uni ;
url = u"http://localhost:8081/xnat/app/action/XDATRegisterUser"
data = urllib.urlopen(url,post).read()
Your test script fails because you are passing a unicode object to urllib.urlencode() (it is called for you by urlopen()). It does not support unicode objects, so it implicitly encodes using the default charset, which is ascii. Obviously, that fails.
The simplest way to handle POSTing unicode objects is to be explicit: gather your data and build a dict, encode the unicode values with an appropriate charset, urlencode the dict (to get a POSTable ascii string), then initiate the request. Your example could be rewritten as:
import urllib
import urllib2
## Build our post data dict (keys match the field names in the question)
data = {
    'xdat:user.login': u'unitest',
    'xdat:user.primary_password': u'nauihe4r93nf83jshhd83',
    'xdat:user.firstname': u"abc_\u03a0\u03a3\u03a9",
    'xdat:user.lastname': u"abc_\u03a0\u03a3\u03a9",
}
## Encode the unicode using an appropriate charset
data = dict([(key, value.encode('utf8')) for key, value in data.iteritems()])
## Urlencode it for POSTing
data = urllib.urlencode(data)
## Build a POST request, get the response
url = "http://localhost:8081/xnat/app/action/XDATRegisterUser"
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
EDIT: More generally, when you make an HTTP request with Python (say with urllib2.urlopen), the content of the response is not decoded to unicode for you. That means you need to be aware of the encoding used by the server that sent it. Look at the Content-Type header; usually it includes a charset=xyz.
It is always prudent to decode your input as early as possible, and encode your output as late as possible.
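For reference, a rough Python 3 sketch of the same approach. In Python 3, urllib.parse.urlencode accepts str values and percent-encodes them as UTF-8 by default, so the manual .encode('utf8') step is not needed; only the final body must be bytes. The request is built but not sent here, a subset of the question's fields is used, and the utf-8 fallback in the commented part is an assumption:

```python
import urllib.parse
import urllib.request

fields = {
    "xdat:user.login": "unitest",
    "xdat:user.firstname": "abc_\u03a0\u03a3\u03a9",
    "xdat:user.lastname": "abc_\u03a0\u03a3\u03a9",
}

# urlencode yields an ASCII-only str; the POST body must be bytes.
body = urllib.parse.urlencode(fields).encode("ascii")
print(body)

url = "http://localhost:8081/xnat/app/action/XDATRegisterUser"
request = urllib.request.Request(url, data=body)
print(request.get_method())  # POST

# To send it and decode the reply using the charset from the Content-Type header
# (falling back to utf-8 is an assumption, not something the server guarantees):
# with urllib.request.urlopen(request) as response:
#     charset = response.headers.get_content_charset() or "utf-8"
#     text = response.read().decode(charset)
```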
