I am using Python requests to query a website, passing in strings to look up data about them.
The problem is that some strings contain a slash (/), so when they are interpolated into the URL I get a ValueError.
This is my URL:
https://api.flipkart.net/sellers/skus/%s/listings % string
When a string that does not contain a slash is passed, I get:
https://api.flipkart.net/sellers/skus/A35-Charry-228_39/listings
It returns a valid response. But when I pass a string that contains a slash:
string = "L20-ORG/BLUE-109(38)"
I get a URL like:
https://api.flipkart.net/sellers/skus/L20-ORG/BLUE-109(38)/listings
which throws the error.
How can I solve this?
Raw string literals in Python
string = r"L20-ORG/BLUE-109(38)"
You could find more info here and here.
urllib.quote_plus is your friend (in Python 3 it lives at urllib.parse.quote_plus). As urllib is a module from the standard library, you just have to import it with import urllib.
If you want to be conservative, just use it with the default value:
string = urllib.quote_plus("L20-ORG/BLUE-109(38)")
gives 'L20-ORG%2FBLUE-109%2838%29'
If you know that some characters are harmless for your use case (say parentheses):
string = urllib.quote_plus("L20-ORG/BLUE-109(38)", '()')
gives 'L20-ORG%2FBLUE-109(38)'
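For Python 3, the same idea can be sketched with urllib.parse.quote. This is only an illustration: the helper name listing_url is made up here, and whether the server accepts an encoded slash (%2F) in the path is an assumption that has to be checked against the actual API.

```python
from urllib.parse import quote

BASE = "https://api.flipkart.net/sellers/skus/%s/listings"


def listing_url(sku):
    """Hypothetical helper: build the listings URL for one SKU."""
    # quote() leaves "/" alone by default; safe="" forces it to %2F.
    return BASE % quote(sku, safe="")


url_plain = listing_url("A35-Charry-228_39")
url_slash = listing_url("L20-ORG/BLUE-109(38)")
print(url_plain)
print(url_slash)
```

Passing safe="()" instead would leave the parentheses unescaped, matching the second quote_plus call above.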
I have been searching all over the place for this, but I couldn't solve my issue.
I am using a local API to fetch some data, in that API, the wildcard is the percent character %.
The URL is like so: urlReq = 'http://myApiURL?ID=something&parameter=%w42'
And then I'm passing this to the get function:
req = requests.get(urlReq, auth=HTTPBasicAuth(user, password))
And get the following error: InvalidURL: Invalid percent-escape sequence: 'w4'
I have tried escaping the % character using %%, but in vain. I also tried the following:
urlReq = 'http://myApiURL?ID=something&parameter=%sw42' % '%' but that didn't work either.
Does anyone know how to solve this?
PS I'm using Python 2.7.8 :: Anaconda 1.9.1 (64-bit)
You should have a look at urllib.quote - that should do the trick. Have a look at the docs for reference.
To expand on this answer: the problem is that % followed by two hexadecimal digits is the escape sequence for special characters in URLs. If you want the server to interpret your % literally, you need to escape it as well, which is done by replacing it with %25. The aforementioned quote function does that for you.
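A minimal sketch of that escaping, assuming Python 3 (where the function is urllib.parse.quote rather than urllib.quote). The host myApiURL and the parameter names come from the question; the variable names are only illustrative.

```python
from urllib.parse import quote, urlencode

raw_value = "%w42"

# Escape a single value: the literal "%" becomes "%25".
escaped = quote(raw_value, safe="")

# Or build the whole query string at once; urlencode escapes each value.
query = urlencode({"ID": "something", "parameter": raw_value})
url = "http://myApiURL?" + query

print(escaped)
print(url)
```

Building the query with urlencode avoids escaping each value by hand and keeps the & and = separators intact.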
Let requests construct the query string for you by passing the parameters in the params argument to requests.get() (see documentation):
api_url = 'http://myApiURL'
params = {'ID': 'something', 'parameter': '%w42'}
r = requests.get(api_url, params=params, auth=(user, password))
requests should then percent encode the parameters in the query string for you. Having said that, at least with requests version 2.11.1 on my machine, I find that the % is encoded when passing it in the url, so perhaps you could check which version you are using.
Also for basic authentication you can simply pass the user name and password in a tuple as shown above.
In requests you can use requests.compat.quote_plus; take a look.
Example:
>>> requests.compat.quote_plus('example: parameter=%w42')
'example%3A+parameter%3D%25w42'
Credits to @Tryph:
The % is used to encode special characters in URLs. You can encode the % character itself with the sequence %25. See here for more detail: w3schools.com/tags/ref_urlencode.asp
I'm pretty new to python and programming in general. I'm currently working on a script to scrape stock quotes from Google finance. Here is my code:
import urllib.request as ur
import re

def getquote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = ur.urlopen(base_url + symbol).read()
    m = re.search(b'id="ref_(.*?)">(.*?)<', content)
    if m:
        quote = m.group(2)
    else:
        quote = 'no quote available for: ' + symbol
    return quote
which returns:
b'655.65'
(655.65 is the current price of Google stock which is the symbol I passed in)
My question is: is there a way for me to either scrub the return so I just get the price without the b or the quotations? Ideally I'd like to have it returned as a float but if need be I can have it return as a string and convert it to a float when I need it later.
I've referenced these other posts:
How to create a stock quote fetching app in python
Python TypeError on regex
How to convert between bytes and strings in Python 3?
Convert bytes to a Python string
Perhaps I've missed something in one of those but I believe I've tried everything I could find and it is still returning in the format shown above.
SOLVED
The problem I was having wasn't about displaying a string without quotes; it was that I had a bytes value that first needed to be decoded to a string and then converted to a float. I had tried this, but outside of the if statement (noob move). The solution was as v1k45 suggested:
add a line in the if statement
quote = float(quote.decode('utf-8'))
to decode it and convert to float.
thanks for the help!
Add a line in the if condition:
quote = float(quote.decode('utf-8'))
You have to decode the bytes to unicode to return a proper string. Use float() to convert it into a float.
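As a rough, offline illustration of that fix (Python 3): the sample bytes below are a made-up stand-in for the downloaded page, and parse_quote is a hypothetical name, not part of the original script.

```python
import re

# Hypothetical sample standing in for the bytes returned by urlopen().read().
content = b'<span id="ref_694653_l">655.65</span>'


def parse_quote(page):
    """Return the first quoted price as a float, or None if none is found."""
    m = re.search(rb'id="ref_(.*?)">(.*?)<', page)
    if m is None:
        return None
    # group(2) is bytes; decode to str before converting to float.
    return float(m.group(2).decode("utf-8"))


price = parse_quote(content)
print(price)
```

Returning None when nothing matches keeps the return type numeric, instead of mixing floats with an error string.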
In a python script I am parsing the return of
gsettings get org.gnome.system.proxy ignore-hosts
which looks like it should be properly formatted JSON
['localhost', '127.0.0.0/8']
however, when passing this output to json.loads it throws
ValueError: No JSON object could be decoded
I make the call to gsettings via:
import subprocess
proc = subprocess.Popen(["gsettings", "get", "org.gnome.system.proxy", "ignore-hosts"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout,stderr = proc.communicate()
which assigns "['localhost', '127.0.0.0/8']\n" to stdout.
Then I strip the newline and pass to json.loads:
ignore = json.loads(stdout.strip("\n"))
But, this throws a ValueError.
I've tracked the issue down to the string being defined by single-quotes or double-quotes as shown in the following snippet:
# tested in python 2.7.3
import json
ignore_hosts_works = '["localhost", "127.0.0.0/8"]'
ignore_hosts_fails = "['localhost', '127.0.0.0/8']"
json.loads(ignore_hosts_works) # produces list of unicode strings
json.loads(ignore_hosts_fails) # ValueError: No JSON object could be decoded
import string
table = string.maketrans("\"'", "'\"")
json.loads(string.translate(ignore_hosts_fails, table)) # produces list of unicode strings
Why is ignore_hosts_fails not successfully parsed by json.loads without swapping the quote types?
In case it might matter, I'm running Ubuntu 12.04 with Python 2.7.3.
From the JSON RFC 7159:
string = quotation-mark *char quotation-mark
[...]
quotation-mark = %x22 ; "
JSON strings must use " quotes.
You can parse that list as a Python literal instead, using ast.literal_eval():
>>> import ast
>>> ast.literal_eval("['localhost', '127.0.0.0/8']")
['localhost', '127.0.0.0/8']
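A small side-by-side sketch of the two parsers, assuming Python 3 (where json.loads raises json.JSONDecodeError, a subclass of ValueError). The raw string mimics the gsettings output quoted in the question.

```python
import ast
import json

raw = "['localhost', '127.0.0.0/8']\n"

try:
    json.loads(raw)
    json_ok = True
except ValueError:
    # Single-quoted strings are not valid JSON, so this branch runs.
    json_ok = False

# literal_eval only evaluates Python literals, so it is safe on untrusted text.
hosts = ast.literal_eval(raw.strip())

print(json_ok, hosts)
```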
Because RFC 7159 says so. Strings in JSON documents are enclosed in double quotes.
JSON is not just JavaScript.
JSON strings are double quoted according to the spec pdf or json.org.
JSON object keys are strings.
You must use double quotes for your strings and keys (to follow the spec). Many JSON parsers will be more permissive.
From object definition:
An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs.
A name is a string. A single colon token follows each name, separating the name from the value. A single comma token separates a value from a following name.
From string definition:
A string is a sequence of Unicode code points wrapped with quotation marks (U+0022).
That U+0022 is the (double) quotation mark: ".
As said before, that is invalid JSON. To parse it, there are two other possibilities: use either demjson or yaml (with recent PyYAML, use yaml.safe_load, since yaml.load requires an explicit Loader argument):
>>> demjson.decode(" ['localhost', '127.0.0.0/8']")
[u'localhost', u'127.0.0.0/8']
>>> yaml.safe_load(" ['localhost', '127.0.0.0/8']")
['localhost', '127.0.0.0/8']
Yes, json.loads insists on valid JSON. But you can tweak the simplejson code to parse unquoted and single-quoted JSON strings.
I have given my answer in this post:
Single versus double quotes in json loads in Python
params = {'token': 'JVFQ%2FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%2FKGS%2FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%2Bbpm80MCOE%3D'}
rsp = requests.get("http://xxxx/access", params=params)
print rsp.url
print params
When I print rsp.url, I get:
http://xxxx/access?token=JVFQ%252FFb5Ri2aKNtzTjOoErWvAaHRHsWHc8x%252FKGS%252FKAuoS4IRJI161l1rz2ab7rovBzGB86bGsh8pmDVaW8jj6AiJ2jT2rLIyt%252Bbpm80MCOE%253D
JVFQ%2FF
JVFQ%252FF
The value of the ?token= in the url is different from params['token'].
Why does it change?
You passed in a URL encoded value, but requests encodes the value for you. As a result, the value is encoded twice; the % character is encoded to %25.
Don't pass in a URL-encoded value. Decode it manually if you must:
from urllib import unquote
params['token'] = unquote(params['token'])
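The double encoding can be reproduced without a network call. The sketch below assumes Python 3 and uses urllib.parse.urlencode as a stand-in for the encoding step that requests performs on params; the token is a shortened, made-up value in the same style as the one in the question.

```python
from urllib.parse import unquote, urlencode

# Shortened, hypothetical token that is already percent-encoded.
token = "JVFQ%2FFb5%2Bbpm80MCOE%3D"

# Encoding an already-encoded value turns every "%" into "%25".
double_encoded = urlencode({"token": token})

# Decoding first means the value ends up encoded exactly once.
single_encoded = urlencode({"token": unquote(token)})

print(double_encoded)
print(single_encoded)
```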
URLs use a special syntax. The % character is a reserved character in URLs. It is used as an escape character to allow you to type other characters (such as space, #, and % itself).
Requests automatically encodes URLs to proper syntax when necessary. The % character had to be encoded as "%25". In other words, the URL parameters never changed. They are the same. The URL was just encoded to proper syntax. Everywhere you put "%" it was encoded to the proper form, "%25".
You can check out URL Syntax info here if you want:
http://en.wikipedia.org/wiki/Uniform_resource_locator#Syntax
And you can encode/decode URLs here. Try encoding "%" or try decoding "%25" to see what you get:
http://www.url-encode-decode.com/
I've got a string from an HTTP header, but it's been escaped. What function can I use to unescape it?
myemail%40gmail.com -> myemail@gmail.com
Would urllib.unquote() be the way to go?
I am pretty sure that urllib's unquote is the common way of doing this.
>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'
There's also unquote_plus:
Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.
In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.
The latter is used, for example, for query strings in HTTP URLs, where space characters ( ) are traditionally encoded as a plus character (+), and a literal + is percent-encoded as %2B.
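A short Python 3 comparison of the two functions. The input strings other than the email address from the question are made up for illustration.

```python
from urllib.parse import unquote, unquote_plus

# unquote only decodes percent-escapes; "+" is left untouched.
path_style = unquote("myemail%40gmail.com")
kept_plus = unquote("first+last")

# unquote_plus also turns "+" into a space, while "%2B" becomes a literal "+".
form_style = unquote_plus("first+last%2Bextra%40example.com")

print(path_style)
print(kept_plus)
print(form_style)
```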
In addition to these there is the unquote_to_bytes that converts the given encoded string to bytes, which can be used when the encoding is not known or the encoded data is binary data. However there is no unquote_plus_to_bytes, if you need it, you can do:
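from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Turn form-style "+" into spaces first, then percent-decode to bytes.
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    return unquote_to_bytes(s)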
More information on whether to use unquote or unquote_plus is available at URL encoding the space character: + or %20.
Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)