I am sending a GET request to a Python server. My query string is:
"http://192.168.4.106:3333/xx/xx/xx/xx?excelReport**&detail=&#tt**=475&dee=475&empi=&qwer=&start_date=03/01/2014&end_date=03/13/2014&SearchVar=0&report_format=D"
The query string contains a # character, so when I call request.keys() on my server, it does not show me the params I passed. It works with other special characters.
I have been stuck on this problem for quite a long time.
I am using the Zope framework.
Please suggest a solution.
The # character cannot be used like that in a query string.
You should encode it as %23 and decode it when you parse the string.
The reason for this can be found on the W3 site.
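To see why the parameters after # go missing, here is a minimal sketch using Python 3's urllib.parse (the thread itself is on Python 2, where the equivalent functions live in urlparse). The URLs are shortened versions of the one in the question and are for illustration only.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the URL from the question (illustrative only).
raw = "http://192.168.4.106:3333/xx?detail=&#tt=475&dee=475"
parts = urlsplit(raw)
print(parts.query)     # everything between ? and #
print(parts.fragment)  # everything after #, which is not part of the query

# With # written as %23 it stays inside the query string.
encoded = "http://192.168.4.106:3333/xx?detail=&%23tt=475&dee=475"
params = parse_qs(urlsplit(encoded).query, keep_blank_values=True)
print(params)
```

In the first URL only detail ends up in the query; in the second, the key #tt is decoded from %23tt and all three parameters are available.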
# marks the end of the 'query' part of a URL and the start of the 'fragment'. If you need to have a '#' inside your query (that is, the GET params that you get with request.keys()), you need to encode it (with the standard urllib.urlencode or with whatever your framework provides).
I'm not sure what the purpose of # in that URL is, though. Is it supposed to be a key #tt** in your request.keys()? Or is it in fact the start of the fragment?
Nowadays fragments are often used to have some routing in the client side of a webapp, since if you go from #a to #b inside a webpage, you don't need to reload the page. So if that may be the case then you can't encode the #, since it would lose its meaning. You would need then to extract the parameters you want from the fragment part manually.
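If the # really is the start of a fragment, extracting parameters from it manually could look like the sketch below (Python 3 urllib.parse). The helper name fragment_params and the example URL are hypothetical, and this assumes the fragment is formatted like a query string. It has to run wherever the full URL is available, since the fragment is normally not sent to the server.

```python
from urllib.parse import urlsplit, parse_qs

def fragment_params(url):
    """Parse the part after '#' as if it were a query string."""
    fragment = urlsplit(url).fragment
    return parse_qs(fragment, keep_blank_values=True)

# Hypothetical client-side route with parameters in the fragment.
result = fragment_params("http://example.com/app?page=1#view=report&id=475")
print(result)
```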
You can use urllib.quote to solve your problem generally.
>>> import urllib
>>> urllib.quote('#')
'%23'
I am facing an issue when calling an API with the requests library. The problem is as follows.
The code:
r = requests.post(url, data=json.dumps(json_data), headers=headers)
When I read r.text, the apostrophe in the string comes back
like this: Bachelor\u2019s Degree. It should actually give me Bachelor's Degree.
I also tried json.loads, but the single-quote problem remains the same.
How do I get the string value correctly?
What you see here ("Bachelor\u2019s Degree") is the string's inner representation, where "\u2019" is the unicode codepoint for "RIGHT SINGLE QUOTATION MARK". This is perfectly correct, there's nothing wrong here, if you print() this string you'll get what you expect:
>>> s = 'Bachelor\u2019s Degree'
>>> print(s)
Bachelor’s Degree
Learning about unicode and encodings might save you quite some time FWIW.
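As a small illustration of the difference between the escaped form and the actual character, here is a sketch in Python 3 using only the json module. The JSON payload is made up for the example; it is not the API response from the question.

```python
import json

# Raw JSON text as a server might send it; \u2019 is a JSON escape here.
raw = '{"degree": "Bachelor\\u2019s Degree"}'
value = json.loads(raw)["degree"]

print(raw)         # the escape is still six ASCII characters in the raw text
print(value)       # after parsing, it is one real character
print(len(value))

# json.dumps escapes non-ASCII again unless told otherwise.
escaped = json.dumps(value)
literal = json.dumps(value, ensure_ascii=False)
print(escaped)
print(literal)
```

Both forms carry the same string; only the serialized representation differs.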
EDIT:
When I save it in the db and then display it in HTML, it will cause an issue,
right?
Have you tried?
Your database connector is supposed to encode it to the proper encoding (according to your fields, tables and client encoding settings).
wrt/ "displaying it on HTML", it mostly depends on whether you're using Python 2.7.x or Python 3.x AND on how you build your HTML, but if you're using some decent framework with a decent template engine (if not you should reconsider your stack) chances are it will work out of the box.
As I already mentioned, learning about unicode and encodings will save you a lot of time.
It's just a Unicode escape sequence in the string's representation; it is not "wrong".
string = 'Bachelor\u2019s Degree'
print(string)
Bachelor’s Degree
You can decode and encode it again, but I can't see any reason why you would want to do that (this might not work in Python 2):
string = 'Bachelor\u2019s Degree'.encode().decode('utf-8')
print(string)
Bachelor’s Degree
From requests docs:
When you make a request, Requests makes educated guesses about the
encoding of the response based on the HTTP headers. The text encoding
guessed by Requests is used when you access r.text
On the response object, you may use .content instead of .text to get the raw bytes of the response, which you can then decode yourself (for example as UTF-8).
I have a problem.
I'm trying to use the urllib library in Python,
but I don't understand it.
a = 'http%3A%2F%2Ffile%2Efir%2Enet%2F40d55cecf9a3a47851b1d0ebda3e423993c837d3ca%2F20110909%5F52%5Fblogfile%2Folsscj25%5F1315512137967%5F5tAuGI%5Fzip%2F%255B%25C0%25A9%25B5%25B5%25BF%25ECxp%255D%2B%25C0%25A9%25B5%25B5%25BF%25ECxp%2B%25BD%25C3%25B8%25AE%25BE%25F3%25B3%25D1%25B9%25F6%5F%2Ezip'
aa = unquote(unquote(a))
'http://file.fir.net/40d55cecf9a3a47851b1d0ebda3e423993c837d3ca/20110909_52_blogfile/olsscj25_1315512137967_5tAuGI_zip/[\xc0\xa9\xb5\xb5\xbf\xecxp]+\xc0\xa9\xb5\xb5\xbf\xecxp+\xbd\xc3\xb8\xae\xbe\xf3\xb3\xd1\xb9\xf6_.zip'
a1 = quote(quote(aa))
'http%253A//file.fir.net/40d55cecf9a3a47851b1d0ebda3e423993c837d3ca/20110909_52_blogfile/olsscj25_1315512137967_5tAuGI_zip/%255B%25C0%25A9%25B5%25B5%25BF%25ECxp%255D%252B%25C0%25A9%25B5%25B5%25BF%25ECxp%252B%25BD%25C3%25B8%25AE%25BE%25F3%25B3%25D1%25B9%25F6_.zip'
Why are the two values (a and a1) not equal?
Please let me know.
Thanks.
I think you are conflating multiple problems into one.
First of all, the only reason you are asking this question is that you want to unquote the tail portion of the file name, which seems to be quoted twice.
Second, the file name, even when doubly unquoted, is not UTF-8 encoded data, so it is not printable as-is.
Third, you don't seem to understand the URL format.
And finally, you don't understand what quote and unquote are actually doing.
urllib.quote() and urllib.unquote() are intended only for the path_info portion of the URL, which is everything after http://file.fir.net/.
urllib.quote() replaces every character in its string parameter that is not "safe" in a URL with its percent-encoding. That means every character that could cause problems (e.g. :, ~, a space, etc.) is replaced with a %XX sequence giving its byte value in hex.
Since : is not safe in the URL's path portion, quote() replaces it with its percent-encoding.
All this means that you should not pass the entire URL straight into quote() unless you actually want to embed a URL in the path_info portion of another URL.
The steps to solve your problem are something like this:
Fix the file name encoding to use something printable to help you debug.
urllib.unquote() once to get back a normal URL.
When you get the unquoted URL, pass it to urlparse.urlparse() first to break the components into their appropriate portions.
urllib.unquote() the file name portion.
Now that you have the original file name, you can proceed to do whatever you need to do.
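The steps above can be sketched roughly as follows, in Python 3 (where unquote and urlsplit live in urllib.parse rather than urllib and urlparse). The URL is a shortened, hypothetical stand-in for the one in the question, and the choice of EUC-KR for the file name bytes is only an assumption; the question does not state the encoding.

```python
from urllib.parse import unquote, unquote_to_bytes, urlsplit

# Shortened, hypothetical stand-in for the URL in the question.
a = ("http%3A%2F%2Ffile%2Efir%2Enet%2Fdir%2F"
     "%255B%25C0%25A9%25B5%25B5%25BF%25ECxp%255D%5F%2Ezip")

url = unquote(a)                       # undo the outer quoting once
path = urlsplit(url).path              # split the URL into its components
filename = path.rsplit("/", 1)[-1]     # last path segment, still quoted once
raw_name = unquote_to_bytes(filename)  # undo the inner quoting, keep raw bytes

# ASSUMPTION: the bytes look like EUC-KR; the question does not say so.
name = raw_name.decode("euc_kr", errors="replace")
print(url)
print(raw_name)
print(name)
```

Keeping the second unquote step in bytes avoids guessing the encoding too early; the decode call is the only place where the assumed charset matters.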
References:
http://docs.python.org/library/urlparse.html
http://docs.python.org/library/urllib.html
The answer is in the documentation on quote method:
... Letters, digits, and the characters '_.-' are never quoted. ...
a and a1 differ because a probably wasn't produced by quote(), so more characters were quoted than required. a1 is still a valid quoted string, but some characters weren't quoted because they don't have to be.
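A short sketch of that difference, using Python 3's urllib.parse.quote (the thread uses Python 2's urllib.quote, which behaves similarly for these characters). The URL is hypothetical.

```python
from urllib.parse import quote, unquote

original = "http://file.fir.net/a_b-c.zip"  # hypothetical URL

minimal = quote(original)              # '/' is in the default safe set
aggressive = quote(original, safe="")  # no extra characters treated as safe

print(minimal)
print(aggressive)

# A hand-made string that also escapes '.', '_' and '-' is different text...
over_quoted = "http%3A%2F%2Ffile%2Efir%2Enet%2Fa%5Fb%2Dc%2Ezip"
# ...but it unquotes to the same value.
same = unquote(over_quoted) == unquote(minimal) == original
print(same)
```

Different quoted strings can therefore represent the same value, which is why comparing them directly is misleading; compare the unquoted forms instead.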
I am having trouble getting my Python script to pass Unicode data over a RESTful HTTP call.
I have a script that reads data from web site X using a REST interface and then pushes it into web site Y using its REST interface. Both systems are open source and are run on our servers. Site X uses PHP, Apache and PostgreSQL. Site Y is Java, Tomcat and PostgreSQL. The script doing the processing is currently in Python.
In general, the script works very well. We do have a few international users, and things break down when processing a user with Unicode characters in their name. The original version of the script read the JSON data into Python, where it was converted automatically into Unicode. I am pretty sure everything was working fine up to this point. To output the data I used subprocess.Popen() to call curl. This works for regular ASCII, but the Unicode was getting mangled somewhere in transit. I didn't get an error anywhere, but when viewing the results on site Y the data is no longer correctly encoded.
I know that Unicode is supported for these fields because I can craft a request using Firefox that correctly adds the data to site Y.
The next idea was to not use curl, but to do everything in Python. I experimented by passing a hand-constructed Unicode string to Python's urllib to make the REST call, but I received an error from urllib.urlopen():
UnicodeEncodeError: 'ascii' codec can't encode characters in position 103-105: ordinal not in range(128)
Any ideas on how to make this work? I would rather not re-write too much, but if there is another scripting language that would be better suited I wouldn't mind hearing about that also.
Here is my Python test script:
import urllib
uni = u"abc_\u03a0\u03a3\u03a9"
post = u"xdat%3Auser.login=unitest&"
post += u"xdat%3Auser.primary_password=nauihe4r93nf83jshhd83&"
post += u"xdat%3Auser.firstname=" + uni + "&"
post += u"xdat%3Auser.lastname=" + uni ;
url = u"http://localhost:8081/xnat/app/action/XDATRegisterUser"
data = urllib.urlopen(url,post).read()
With regard to your test script, it is failing because you are passing a unicode object as the POST data to urlopen(). urllib does not support unicode objects there, so the data is implicitly encoded using the default charset, which is ascii. With non-ASCII characters in the data, that fails.
The simplest way to handle POSTing unicode objects is to be explicit: gather your data and build a dict, encode the unicode values with an appropriate charset, urlencode the dict (to get a POSTable ascii string), then initiate the request. Your example could be rewritten as:
import urllib
import urllib2
## Build our post data dict
data = {
    'xdat:user.login' : u'unitest',
    'xdat:user.primary_password' : u'nauihe4r93nf83jshhd83',
    'xdat:user.firstname' : u"abc_\u03a0\u03a3\u03a9",
    'xdat:user.lastname' : u"abc_\u03a0\u03a3\u03a9",
}
## Encode the unicode using an appropriate charset
data = dict([(key, value.encode('utf8')) for key, value in data.iteritems()])
## Urlencode it for POSTing
data = urllib.urlencode(data)
## Build a POST request, get the response
url = "http://localhost:8081/xnat/app/action/XDATRegisterUser"
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
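For comparison, a rough Python 3 equivalent of the same idea is sketched below. It is not part of the original answer: it assumes Python 3's urllib.parse and urllib.request, uses only two of the fields to stay short, and builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {
    "xdat:user.login": "unitest",
    "xdat:user.firstname": "abc_\u03a0\u03a3\u03a9",
}

# In Python 3, urlencode() percent-encodes text values as UTF-8 by default.
body = urlencode(fields).encode("ascii")
print(body)

# Build the request without sending it; passing data makes it a POST.
url = "http://localhost:8081/xnat/app/action/XDATRegisterUser"
request = Request(url, data=body)
print(request.get_method())
```

The body is plain ASCII at this point, so nothing is left for an implicit ascii encode to trip over.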
EDIT: More generally, when you make an HTTP request with Python (say with urllib2.urlopen), the content of the response is not decoded to unicode for you. That means you need to be aware of the encoding used by the server that sent it. Look at the Content-Type header; usually it includes a charset=xyz.
It is always prudent to decode your input as early as possible, and encode your output as late as possible.
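A minimal sketch of "decode early, encode late", assuming Python 3 and a made-up header and body rather than a real response. The helper name charset_from_content_type is hypothetical; it leans on the standard email.message.Message class to parse the header value.

```python
from email.message import Message

def charset_from_content_type(header_value, default="utf-8"):
    """Pull the charset parameter out of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default

# Hypothetical response: header value and raw body bytes as received.
header = "text/html; charset=ISO-8859-1"
raw_body = b"caf\xe9"

# Decode as soon as the bytes arrive, using the declared charset.
text = raw_body.decode(charset_from_content_type(header))
print(text)

# Encode only at the point of output, in whatever encoding the target needs.
out = text.encode("utf-8")
print(out)
```

Falling back to UTF-8 when no charset is declared is an assumption of this sketch, not a rule; the right default depends on the server.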
On a website I have the word pluș, which is sent via POST to a Django view.
It is sent as plu%25C8%2599. So I took that string and tried to figure out how to turn %25C8%2599 back into ș.
I tried decoding the string like this:
from urllib import unquote_plus
s = "plu%25C8%2599"
print unquote_plus(unquote_plus(s).decode('utf-8'))
The result I get is pluÈ which actually has a length of 5, not 4.
How can I get the original string pluș back after it has been encoded?
edit:
I managed to do it like this
def js_unquote(quoted):
    quoted = quoted.encode('utf-8')
    quoted = unquote_plus(unquote_plus(quoted)).decode('utf-8')
    return quoted
It looks weird but works the way I needed it.
URL-decode twice, then decode as UTF-8.
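Spelled out in Python 3 (an assumption; the question uses Python 2), that could look like the sketch below. The second pass uses unquote_to_bytes so the UTF-8 decoding step stays explicit; note that unquote_to_bytes does not turn + into a space.

```python
from urllib.parse import unquote_plus, unquote_to_bytes

s = "plu%25C8%2599"

once = unquote_plus(s)        # first pass: %25 becomes a literal %
raw = unquote_to_bytes(once)  # second pass: keep the result as bytes
word = raw.decode("utf-8")    # finally interpret the bytes as UTF-8

print(once)
print(raw)
print(len(word))
```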
You can't unless you know what the encoding is. Unicode itself is not an encoding. You might try BeautifulSoup or UnicodeDammit, which might help you get the result you were hoping for.
http://www.crummy.com/software/BeautifulSoup/
I hope this helps!
Also take a look at:
http://www.joelonsoftware.com/articles/Unicode.html
unquote_plus(s).encode('your_lang_encoding')
I tried it like that. I sent a JSON POST request from an HTML form directly to a Django URI, including unicode characters like "şğüöçı+", and it works. I used the iso_8859-9 encoding in the encode() call.
I have a list that I need to send through a URL to a third party vendor. I don't know what language they are using.
The list prints out like this:
[u'1', u'6', u'5']
I know that the u encodes the string in utf-8 right? So a couple of questions.
Can I send a list through a URL?
Will the u's show up on the other end when going through the URL?
If so, how do I remove them?
I am not sure what keywords to search to help me out, so any resources would be helpful too.
Can I send a list through a URL?
No. A URL is just text. If you want a way to package structured information in it, you'll have to agree on that with the provider you're talking to.
One standard encoding for structure in URLs, that might or might not be what you need, is the use of multiple parameters with the same name in a query string. This format comes from HTML form submissions:
http://www.example.com/script?par=1&par=6&par=5
might be considered to represent a parameter par with a three-item list as its value. Or maybe not, it's up to the receiver to decide. For example in a PHP application you would have had to name the parameter par[] to get it to accept the array value.
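A small sketch of that repeated-parameter format, using Python 3's urllib.parse (an assumption; the thread is on Python 2). The parameter name par comes from the example URL above; whether the vendor accepts this format is something only they can confirm.

```python
from urllib.parse import urlencode, parse_qs

items = ["1", "6", "5"]

# doseq=True emits one name=value pair per list element.
query = urlencode({"par": items}, doseq=True)
print(query)

# The receiver decides how to read it; parse_qs collects repeats into a list.
parsed = parse_qs(query)
print(parsed)
```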
I know that the u encodes the string in utf-8 right?
No. A u'...' string is a native Unicode string, where each index represents a whole character and not a byte in any particular encoding. If you want UTF-8 bytes, write u'...'.encode('utf-8') before URL-encoding. UTF-8 is a good default choice, but again: what encoding the receiving side wants is up to that application.
Will the u's show up on the other end when going through the URL?
u is part of the literal representation of the string, just the same as the ' quotes themselves. They are not part of the string value and would not be echoed by print or when joined into other strings, unless you deliberately asked for the literal representation by calling repr.
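The distinction between the literal representation and the value can be seen in a few lines. This sketch runs on Python 3, where repr no longer shows a u prefix at all; on Python 2 the first line would print u'1'.

```python
items = ["1", "6", "5"]

literal_form = repr(items[0])  # what printing the list shows for each item
joined = ",".join(items)       # joining uses the values, not the literal forms

print(literal_form)
print(items[0])
print(joined)
```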
u'' is not UTF-8; it is a Python unicode string in Python 2.x.
To send it through a URL, you need to encode the strings as UTF-8 with .encode('utf-8') and then URL-encode them. A list cannot be sent through a URL directly; you need to turn it into a string first.
Basically, you need to do it in the following steps:
python list -> unicode string -> utf8 string -> url encode -> send it through proper urllib api
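That chain could be sketched as follows in Python 3 (an assumption; the answer is about Python 2). The comma separator, the endpoint and the parameter name ids are all hypothetical; the real format has to be agreed with the vendor.

```python
from urllib.parse import quote

items = ["1", "6", "5"]

joined = ",".join(items)              # list -> one text string (comma is an assumed separator)
utf8_bytes = joined.encode("utf-8")   # text -> UTF-8 bytes
encoded = quote(utf8_bytes, safe="")  # bytes -> percent-encoded text

# Hypothetical endpoint and parameter name.
url = "http://www.example.com/script?ids=" + encoded
print(url)
```

The final step, actually sending the request, is left out here; any HTTP client can take the resulting URL.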
Incorrect. unicode literals use Python's internal encoding, decided when it was compiled.
You can't send anything "through" URLs. Pick a protocol instead. And encode before sending, probably to UTF-8.