Fixing Python requests.exception.InvalidURL : Invalid percent-escape sequence 'u2' error? - python

My Python app accepts URLs that were encoded with JavaScript's escape() function and decodes them with urllib.unquote. This works fine for most URLs, but fails with the following error if the file name (which is also part of the URL) contains a & in it.
requests.exception.InvalidURL : Invalid percent-escape sequence 'u2'
Edit: Example code with error
import urllib,requests
url = "https%3A//r20---sn-cvh7zn76.googlevideo.com/videoplayback%3Fipbits%3D0%26ms%3Dau%26fexp%3D931328%2C931946%2C934804%2C914004%2C931818%2C937417%2C913434%2C923328%2C936916%2C934022%2C936923%26sparams%3Dclen%2Cdur%2Cgir%2Cid%2Cip%2Cipbits%2Citag%2Clmt%2Crequiressl%2Csource%2Cupn%2Cexpire%26source%3Dyoutube%26mv%3Dm%26dur%3D278.593%26id%3Do-AOxIEhMchATdRjU99Gveow8reeBWtxFaqwpWifXC9KwS%26expire%3D1397662367%26clen%3D4425254%26sver%3D3%26signature%3D9A0CFEC5F59C2C7FC35A8CF87491F4E7F9683C59.C46B3A3602A20611C73CC4228FCB8B287034F52D%26mt%3D1397639060%26upn%3DBknKrHPqCCw%26gir%3Dyes%26itag%3D140%26key%3Dyt5%26ip%3D117.200.252.163%26lmt%3D1386126879207085%26requiressl%3Dyes%26ratebypass%3Dyes%26title%3DHum%20Tuhmaray%20hain%20%u2022%20SRK%20_%20Madhuri%20Dixit%20%u2022%20HD%201080p%20%u2022%20Hindi%20%u2022%20Bollywood%20Songs"
url = urllib.unquote_plus(url).decode('utf-8')
resp = requests.head(url, verify=False, allow_redirects=True)
print resp
The string Hum Tuhmaray hain • SRK & Madhuri Dixit • causes the issue. The problem seems to be the Unicode bullet character, which escape() encodes as the non-standard sequence %u2022 in the URL.

Using encodeURIComponent() in javascript code instead of escape() fixed the issue.
Thanks
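If the JavaScript side cannot be changed, the %uXXXX sequences could instead be decoded in Python before the URL reaches requests. The following is only a minimal sketch for Python 3; unescape_js is a hypothetical helper name (it is not part of urllib or requests), and it assumes the input really came from escape(), which emits %XX for code points below 256 and %uXXXX for everything else. Characters outside the Basic Multilingual Plane arrive as two %uXXXX surrogate halves and are not recombined here.

```python
import re
from urllib.parse import unquote

# Matches the non-standard %uXXXX sequences produced by JavaScript's escape()
_PERCENT_U = re.compile(r'%u([0-9a-fA-F]{4})')

def unescape_js(value):
    """Hypothetical helper: decode a string encoded with JavaScript's escape()."""
    # Turn each %uXXXX into the character with that code point
    value = _PERCENT_U.sub(lambda m: chr(int(m.group(1), 16)), value)
    # escape() writes code points below 256 as single %XX bytes (Latin-1)
    return unquote(value, encoding='latin-1')

decoded = unescape_js("title%3DHum%20Tuhmaray%20hain%20%u2022%20SRK")
print(decoded)
```

The decoded string can then be passed to requests, which percent-encodes non-ASCII characters itself.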

Related

How To Remove (%0D) in python requests

import requests
nexmokey = 'mykey'
nexmosec = 'mysecretkey'
nexmoBal = 'https://rest.nexmo.com/account/get-balance?api_key={}&api_secret={}'.format(nexmokey,nexmosec)
rr = requests.get(nexmoBal)
print(rr.url)
I would like to send a request to
https://rest.nexmo.com/account/get-balance?api_key=mykey&api_secret=mysecretkey
but %0D appears in the URL that is actually requested. Why?
https://rest.nexmo.com/account/get-balance?api_key=mykey%0D&api_secret=mysecretkey%0D
requests.get accepts query parameters such as api_secret=my_secret through the params argument and URL-encodes them for you, so there is no need to format them into the URL yourself.
Use this:
nexmoBal = 'https://rest.nexmo.com/account/get-balance'
rr = requests.get(nexmoBal, params={'api_key': nexmokey, 'api_secret': nexmosec})
The fact that %0D ends up in the URL indicates that your values contain character 13 (0D hexadecimal), a carriage return, which is part of the line ending on Windows systems. This is probably because you are reading the key and secret from a file, a step you left out of the example code.
Also, note that you mention you want to post, but you're calling .get().
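The effect can be reproduced without any network access. This is a small sketch using only urllib.parse from the Python 3 standard library; the raw_key and raw_secret values are made-up stand-ins for lines read from a file with Windows line endings.

```python
from urllib.parse import urlencode

# Hypothetical values, as they might look after reading a CRLF-terminated
# file in a way that leaves the trailing carriage return in place
raw_key = "mykey\r"
raw_secret = "mysecretkey\r"

# The stray carriage return is percent-encoded as %0D
dirty_query = urlencode({"api_key": raw_key, "api_secret": raw_secret})
print(dirty_query)

# Stripping whitespace before building the query removes it
clean_query = urlencode({"api_key": raw_key.strip(),
                         "api_secret": raw_secret.strip()})
print(clean_query)
```

Calling .strip() on each value right after reading it from the file is usually the simplest place to apply the fix.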

Http Post data from Python to net core app

I am trying to send a file from a Python script to my .net core webserver.
In Python I am doing this using the requests library, and my code looks like so.
filePath = "run-1.csv"
with open(filePath, "rb") as postFile:
    file_dict = {filePath: postFile}
    response = requests.post(server_path + "/batchUpload/create", files=file_dict, verify=validate_sql)
print (response.text)
This code executes fine, and I can see the request fine in my webserver code which looks like so:
[HttpPost]
[Microsoft.AspNetCore.Authorization.AllowAnonymous]
public string Create(IFormFile file) //Dictionary<string, IFormFile>
{
    var ms = new MemoryStream();
    file.CopyTo(ms);
    var text = Encoding.ASCII.GetString(ms.ToArray());
    Debug.Print(text);
    return "s";
}
However, the file parameter is always null.
The file parameter is populated correctly when the data is posted from Postman.
I suspect that this problem has to do with how .NET Core model binding works, but I'm not sure.
Any suggestions on how to get my file to show up on the server?
Solved my issue. The problem was that in Python I was using the actual file name "./run1.csv" as the key in my upload dictionary rather than the literal string "file", which is the name of the controller's parameter.
Updating this fixed my issue.
file_dict = {"file": postFile}
This is what I believe @nalnpir mentioned above.
I figured this out by posting from Postman and also from my Python code to http://httpbin.org/post and comparing the responses.
The example from the requests docs is mostly correct, except that the key has to match the parameter of the controller method signature.
url = 'https://www.url.com/api/post'
files = {'parameterName': open('filename.extension', 'rb')}
r = requests.post(url, files=files)
So in this case the controller action should be
[HttpPost]
public string Post(IFormFile parameterName)

Python Requests URL with Unicode Parameters

I'm currently trying to hit the Google TTS URL, http://translate.google.com/translate_tts, with Japanese characters and phrases in Python using the requests library.
Here is an example:
http://translate.google.com/translate_tts?tl=ja&q=ひとつ
However, when I try to use the Python requests library to download the mp3 that the endpoint returns, the resulting mp3 is blank. I have verified that I can hit this URL in requests using non-unicode characters (via romaji) and get correct responses back.
Here is a part of the code I am using to make the request
langs = {'japanese': 'ja',
         'english': 'en'}
def get_sound_file_for_text(text, download=False, lang='japanese'):
    r = StringIO()
    glang = langs[lang]
    text = text.replace('*', '')
    text = text.replace('/', '')
    text = text.replace('x', '')
    url = 'http://translate.google.com/translate_tts'
    if download:
        result = requests.get(url, params={'tl': glang, 'q': text})
        r.write(result.content)
        r.seek(0)
        return r
    else:
        return url
Also, if I print text or url within this snippet, the kana/kanji is rendered correctly in my console.
Edit:
If I attempt to encode the unicode and quote it as such, I still get the same response.
# -*- coding: utf-8 -*-
from StringIO import StringIO
import urllib
import requests
__author__ = 'jacob'
langs = {'japanese': 'ja',
         'english': 'en'}
def get_sound_file_for_text(text, download=False, lang='japanese'):
    r = StringIO()
    glang = langs[lang]
    text = text.replace('*', '')
    text = text.replace('/', '')
    text = text.replace('x', '')
    text = urllib.quote(text.encode('utf-8'))
    url = 'http://translate.google.com/translate_tts?tl=%(glang)s&q=%(text)s' % locals()
    print url
    if download:
        result = requests.get(url)
        r.write(result.content)
        r.seek(0)
        return r
    else:
        return url
Which returns this:
http://translate.google.com/translate_tts?tl=ja&q=%E3%81%B2%E3%81%A8%E3%81%A4
Which seems like it should work, but doesn't.
Edit 2:
If I attempt to use urllib/urllib2, I get a 403 error.
Edit 3:
So it seems that this behavior is limited to this endpoint. If I try the following URL, which is a different endpoint:
http://www.kanjidamage.com/kanji/13-un-%E4%B8%8D
I get the same response from requests and from my browser (they match). If I send only ASCII characters to the TTS server, as in this URL:
http://translate.google.com/translate_tts?tl=ja&q=sayonara
the responses match as well. But if I send unicode characters to this URL, I get a correct audio file in my browser, while requests receives an audio file with no sound:
http://translate.google.com/translate_tts?tl=ja&q=%E3%81%B2%E3%81%A8%E3%81%A4
So, it seems like this behavior is limited to the Google TTS URL?
The user agent can be part of the problem, but it is not the root cause in this case. The translate_tts service rejects (with HTTP 403) some user agents, e.g. any that begin with Python, curl, wget, and possibly others. That is why you are seeing an HTTP 403 response when using urllib2.urlopen(): it sets the user agent to Python-urllib/2.7 (the version might vary).
You found that setting the user agent to Mozilla/5.0 fixed the problem, but that may only work because the API assumes a particular encoding based on the user agent.
What you should actually do is explicitly specify the URL character encoding with the ie field. Your URL request should look like this:
http://translate.google.com/translate_tts?ie=UTF-8&tl=ja&q=%E3%81%B2%E3%81%A8%E3%81%A4
Note the ie=UTF-8, which explicitly sets the URL character encoding. The spec does state that UTF-8 is the default, but that doesn't seem to be entirely true, so you should always set ie in your requests.
The API supports kanji, hiragana, and katakana (possibly others?). These URLs all produce "nihongo", although the audio produced for hiragana input has a slightly different inflection to the others.
import requests
one = u'\u3072\u3068\u3064'
kanji = u'\u65e5\u672c\u8a9e'
hiragana = u'\u306b\u307b\u3093\u3054'
katakana = u'\u30cb\u30db\u30f3\u30b4'
url = 'http://translate.google.com/translate_tts'
for text in one, kanji, hiragana, katakana:
    r = requests.get(url, params={'ie': 'UTF-8', 'tl': 'ja', 'q': text})
    print u"{} -> {}".format(text, r.url)
    open(u'/tmp/{}.mp3'.format(text), 'wb').write(r.content)
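To check what ends up in the query string without contacting the service, the same parameters can be encoded with the standard library alone. This is a sketch for Python 3 that only builds the URL; it makes no request and says nothing about how the service responds today.

```python
from urllib.parse import urlencode

base_url = "http://translate.google.com/translate_tts"

# Same parameters as in the requests call above; 'q' is "hitotsu" in hiragana
params = {"ie": "UTF-8", "tl": "ja", "q": "\u3072\u3068\u3064"}

# urlencode() encodes non-ASCII text as UTF-8 bytes and percent-escapes them
query = urlencode(params)
full_url = base_url + "?" + query
print(full_url)
```

The result matches the URL given earlier in this answer, which is also what requests builds from its params argument.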
I wrote this little function a while ago to help me with UTF-8 encoding. I was having issues writing Cyrillic and CJK text to CSV files, and this did the trick.
def assist(unicode_string):
    utf8 = unicode_string.encode('utf-8')
    read = utf8.decode('string_escape')
    return read ## UTF-8 encoded string
Also, make sure you have these two lines at the beginning of your .py.
#!/usr/bin/python
# -*- coding: utf-8 -*-
The first line is just a good Python habit: it tells the system which interpreter to run the .py file with (really only useful if you have more than one version of Python installed on your machine). The second line declares the source encoding of the Python file. A slightly longer answer for this is given here.
Setting the User-Agent to Mozilla/5.0 fixes this issue.
from StringIO import StringIO
import urllib
import requests
__author__ = 'jacob'
langs = {'japanese': 'ja',
         'english': 'en'}
def get_sound_file_for_text(text, download=False, lang='japanese'):
    r = StringIO()
    glang = langs[lang]
    text = text.replace('*', '')
    text = text.replace('/', '')
    text = text.replace('x', '')
    url = 'http://translate.google.com/translate_tts'
    if download:
        result = requests.get(url, params={'tl': glang, 'q': text}, headers={'User-Agent': 'Mozilla/5.0'})
        r.write(result.content)
        r.seek(0)
        return r
    else:
        return url

Python url construction: escape characters other than regular letters

I am using the Wikipedia API with the following request:
http://en.wikipedia.org/w/api.php?action=query&meta=globaluserinfo&guiuser='$cammer'&guiprop=groups|merged|unattached&format=json
The problem is that I am unable to escape the dollar sign and similar characters. I tried the following, but it didn't work:
r['guiprop'] = u'groups|merged|unattached'
r['guiuser'] = u'$cammer'
I found this reference on w3schools, but checking every single character by hand would be painful. What would be the best way to escape these characters? http://www.w3schools.com/tags/ref_urlencode.asp
You should take a look at using urlencode.
from urllib import urlencode
base_url = "http://en.wikipedia.org/w/api.php?"
arguments = dict(action="query",
                 meta="globaluserinfo",
                 guiuser="$cammer",
                 guiprop="groups|merged|unattached",
                 format="json")
url = base_url + urlencode(arguments)
If you don't need to build a complete url you can just use the quote function for a single string:
>>> import urllib
>>> urllib.quote("$cammer")
'%24cammer'
So you end up with:
r['guiprop'] = urllib.quote(u'groups|merged|unattached')
r['guiuser'] = urllib.quote(u'$cammer')
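The snippets above are Python 2. In Python 3 the same functions live in urllib.parse; the following is a minimal sketch of the equivalent calls, using the parameter values from the question.

```python
from urllib.parse import quote, urlencode

base_url = "http://en.wikipedia.org/w/api.php"
arguments = {
    "action": "query",
    "meta": "globaluserinfo",
    "guiuser": "$cammer",
    "guiprop": "groups|merged|unattached",
    "format": "json",
}

# urlencode() escapes every key and value, including '$' and '|'
url = base_url + "?" + urlencode(arguments)
print(url)

# quote() escapes a single string on its own
single = quote("$cammer")
print(single)
```

If the values are later handed to something that encodes them again, such as the params argument of requests, quoting them beforehand would double-encode the percent signs, so only one of the two steps should do the escaping.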

using unicode in Google translate url from python script

I'm trying to use my script to automatically translate text from Russian to English with the Google Translate API. Here is the code.
mytext = {some text in russian}
url = 'https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q='+ mytext +'&langpair=ru%7Cen'
request = urllib2.Request(url, None, {'Referer': 'http://www.mysite.org'})
I've tried various encodings for mytext, including unicode, utf-8 and windows-1251, but it never works. Either the urllib2 request complains about non-ASCII characters or Google returns an error code. Any idea which codec I need?
Use
url = 'https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=' \
+ urllib2.quote(mytext) + '&langpair=ru%7Cen'
to quote your text
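For reference, here is a sketch of the same idea in Python 3, where quote() lives in urllib.parse and encodes str input as UTF-8 by default. The Russian sample text is made up for illustration, and the sketch only builds the URL without sending anything. In Python 2, a unicode object would typically need .encode('utf-8') before being passed to quote().

```python
from urllib.parse import quote

# Hypothetical sample text ("privet", i.e. "hello")
mytext = "\u043f\u0440\u0438\u0432\u0435\u0442"

# quote() converts the text to UTF-8 bytes and percent-escapes each byte
quoted = quote(mytext)

url = ('https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q='
       + quoted + '&langpair=ru%7Cen')
print(url)
```

After quoting, the URL contains only ASCII characters, which avoids the non-ASCII complaint described in the question.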
