I am working on a REST API in Python. Take a GET request such as the sample below.
I am assuming that anyone who calls the API will URL-encode the URL. What is the correct way to decode and read the query parameters in Python?
'https://someurl.com/query_string_params?id=1&type=abc'
import requests
import urllib
def get():
    # parse query string parameters here
    pass
Here's an example of how to split a URL and get the query parameters:
import urllib.parse
url='https://someurl.com/query_string_params?id=1&type=abc'
url_parts = urllib.parse.urlparse(url)
print( f"{url_parts=}" )
query_parts = urllib.parse.parse_qs(url_parts.query)
print( f"{query_parts=}" )
Result:
url_parts=ParseResult(scheme='https', netloc='someurl.com', path='/query_string_params', params='', query='id=1&type=abc', fragment='')
query_parts={'id': ['1'], 'type': ['abc']}
Documentation is here https://docs.python.org/3/library/urllib.parse.html?highlight=url%20decode
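On the URL-encoding part of the question: parse_qs also percent-decodes keys and values and treats '+' as a space, so no separate decoding step should be needed. A minimal sketch, using a made-up URL (the name and city parameters are illustrative and not from the question):

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical URL with percent-encoded and '+'-encoded values.
encoded_url = "https://someurl.com/query_string_params?name=John%20Doe&city=New+York"

query = urlparse(encoded_url).query
decoded = parse_qs(query)

# Each value is a list because a key may appear more than once.
print(decoded)
```

If the client did not encode the URL at all, parse_qs leaves plain characters untouched, so the same call covers both cases.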
I am trying to use Python's urllib.parse.urlparse to parse some custom URIs.
I noticed that for some well-known schemes params are parsed correctly:
>>> from urllib.parse import urlparse
>>> urlparse("http://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='http', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
>>> urlparse("ftp://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='ftp', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
...but for custom schemes they are not. The params field remains empty, and the params are instead treated as part of the path:
>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint;param1=value1;param2=othervalue2', params='', query='query1=val1&query2=val2', fragment='fragment')
Why does the parsing differ depending on the scheme? How can I parse params with urlparse when using a custom scheme?
This is because urlparse assumes that only a fixed set of schemes use parameters in their URL format. You can see that check in the source code:
if scheme in uses_params and ';' in url:
    url, params = _splitparams(url)
else:
    params = ''
This means urlparse attempts to parse parameters only if the scheme is in uses_params, which is a list of known schemes:
uses_params = ['', 'ftp', 'hdl', 'prospero', 'http', 'imap',
               'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
               'mms', 'sftp', 'tel']
So to get the expected output, you can append your custom scheme to the uses_params list and call urlparse again.
>>> from urllib.parse import uses_params, urlparse
>>>
>>> uses_params.append('scheme')
>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
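Note that uses_params is a module-level list, so appending to it changes parsing for every caller in the same process. If that is a concern, one alternative is to split the params off the path yourself. The sketch below is only an approximation of what the standard library does (it splits at the first ';' in the last path segment), and split_path_params is a hypothetical helper name, not part of urllib:

```python
from urllib.parse import urlsplit


def split_path_params(url):
    """Return (path, params) without touching urllib.parse.uses_params.

    Hypothetical helper: splits at the first ';' found in the last
    path segment, and returns empty params if there is none.
    """
    path = urlsplit(url).path
    last_slash = path.rfind("/")
    semicolon = path.find(";", last_slash + 1)
    if semicolon == -1:
        return path, ""
    return path[:semicolon], path[semicolon + 1:]


custom_url = (
    "scheme://some.domain/some/nested/endpoint;"
    "param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment"
)
path, params = split_path_params(custom_url)
print(path)
print(params)
```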
Can you remove the custom scheme from the URL?
Without a scheme, urlparse will always return the params:
urlparse("//some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
I want to dynamically query Google Maps through the Google Directions API. As an example, this request calculates the route from Chicago, IL to Los Angeles, CA via two waypoints in Joplin, MO and Oklahoma City, OK:
http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false
It returns a result in the JSON format.
How can I do this in Python? I want to send such a request, receive the result and parse it.
I recommend using the awesome requests library:
import requests

url = 'http://maps.googleapis.com/maps/api/directions/json'

# requests URL-encodes these values itself, so use plain spaces rather than '+'
params = dict(
    origin='Chicago,IL',
    destination='Los Angeles,CA',
    waypoints='Joplin,MO|Oklahoma City,OK',
    sensor='false'
)

resp = requests.get(url=url, params=params)
data = resp.json()  # Check the JSON Response Content documentation below
JSON Response Content: https://requests.readthedocs.io/en/master/user/quickstart/#json-response-content
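One detail worth checking when values are passed through params: they get percent-encoded, so a literal '+' inside a value is sent as %2B rather than as a space. The sketch below uses the standard library's urlencode to illustrate the effect, on the assumption that requests encodes params in a comparable way:

```python
from urllib.parse import urlencode

# A literal '+' in the value is escaped, so the server sees a plus sign.
with_plus = urlencode({"destination": "Los+Angeles,CA"})

# A plain space is what gets encoded as '+'.
with_space = urlencode({"destination": "Los Angeles,CA"})

print(with_plus)   # destination=Los%2BAngeles%2CCA
print(with_space)  # destination=Los+Angeles%2CCA
```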
The requests Python module takes care of both retrieving JSON data and decoding it, thanks to its built-in JSON decoder. Here is an example taken from the module's documentation:
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
So there is no need to use a separate module for decoding JSON.
requests has a built-in .json() method:
import requests
requests.get(url).json()
import json
import urllib.request

url = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
# In Python 3, urlopen lives in urllib.request
result = json.load(urllib.request.urlopen(url))
Use the requests library, pretty-print the results so you can more easily locate the keys and values you want to extract, and then use nested for loops to parse the data. In this example I extract step-by-step driving directions.
import json, requests, pprint

url = 'http://maps.googleapis.com/maps/api/directions/json'

# requests URL-encodes these values itself, so use plain spaces rather than '+'
params = dict(
    origin='Chicago,IL',
    destination='Los Angeles,CA',
    waypoints='Joplin,MO|Oklahoma City,OK',
    sensor='false'
)

data = requests.get(url=url, params=params)
binary = data.content
output = json.loads(binary)

# test to see if the request was valid
# print(output['status'])

# output all of the results
# pprint.pprint(output)

# step-by-step directions
for route in output['routes']:
    for leg in route['legs']:
        for step in leg['steps']:
            print(step['html_instructions'])
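The nested traversal can be tried without calling the API. The sketch below uses a hand-written stand-in for the parsed response: only the routes, legs, steps and html_instructions keys come from the example above, and the instruction texts are made up. collect_instructions is a hypothetical helper name.

```python
# Hand-written stand-in for a parsed response; a real payload has many more fields.
sample = {
    "status": "OK",
    "routes": [
        {
            "legs": [
                {"steps": [{"html_instructions": "Head north"},
                           {"html_instructions": "Turn left"}]},
                {"steps": [{"html_instructions": "Continue straight"}]},
            ]
        }
    ],
}


def collect_instructions(output):
    """Return every step's html_instructions in route/leg/step order."""
    return [
        step["html_instructions"]
        for route in output.get("routes", [])
        for leg in route.get("legs", [])
        for step in leg.get("steps", [])
    ]


print(collect_instructions(sample))
```

Using .get() with an empty-list default means a response without routes yields an empty list instead of a KeyError.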
Just import requests and use its json() method:
import requests
source = requests.get("url").json()
print(source)
Or you can use this:
import json
import urllib.request
data = urllib.request.urlopen("url").read()
output = json.loads(data)
print(output)
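For the standard-library route, a slightly more defensive version might close the connection explicitly and decode using the charset the server reports. This is a sketch only: fetch_json is a hypothetical helper name, and the 10 second timeout and the UTF-8 fallback are assumptions.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch a URL and parse the body as JSON (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))
```

It would be called as fetch_json(url) with the full request URL, and it raises urllib.error.URLError if the request fails.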
Try this:
import requests
import json
# Google Maps API.
link = 'http://maps.googleapis.com/maps/api/directions/json?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false'
# Request data from link as 'str'
data = requests.get(link).text
# Convert 'str' to a Python object
data = json.loads(data)
# Now you can access the JSON data
for i in data['routes'][0]['legs'][0]['steps']:
    latitude = i['start_location']['lat']
    longitude = i['start_location']['lng']
    print('{}, {}'.format(latitude, longitude))
Also, to pretty-print JSON on the console, you can use json.dumps with indent (remember to import json):
print(json.dumps(response.json(), indent=2))
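A self-contained illustration of the indent option, using a small made-up payload in place of a real response:

```python
import json

# Made-up payload standing in for response.json().
payload = {"status": "OK", "routes": []}

# indent controls nesting width; sort_keys makes the output order stable.
pretty = json.dumps(payload, indent=2, sort_keys=True)
print(pretty)
```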
Write a function named variable_get that takes a string parameter representing part of the path of a URL and returns, as a string, the response of an HTTPS GET request to https://google.com/input, where input is the value of that parameter.
import urllib.request

def variable_get(input):
    x = "https://google.com/" + input
    response = urllib.request.urlopen(x)
    html = response.read()
    return html
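Two details may matter here: read() returns bytes, so a decode step is needed if the result has to be a str, and a path fragment containing spaces or other special characters should be quoted before it is appended. A sketch under those assumptions follows; build_url and variable_get_text are hypothetical names, and the timeout and UTF-8 fallback are my own choices:

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://google.com/"


def build_url(path_part):
    """Append a percent-encoded path fragment to the base URL."""
    return BASE_URL + quote(path_part)


def variable_get_text(path_part, timeout=10):
    """Fetch the page and return its body as str rather than bytes."""
    with urllib.request.urlopen(build_url(path_part), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


print(build_url("a b/c"))  # https://google.com/a%20b/c
```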
I am getting TypeError: b'lliks' is not JSON serializable. What am I doing wrong?
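import urllib.request
import json

def variable_get(input):
    # input is the path fragment, so append it directly to the base URL
    uri = "https://google.com/" + input
    # read() returns bytes; decode() turns them into a str
    response = urllib.request.urlopen(uri).read().decode()
    html = json.dumps(response)
    return html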
The main takeaway from this code is that the bytes returned by read() are decoded to a str first, and json.dumps then converts that str into a JSON string.
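The error can be reproduced without any network call, which makes the cause easier to see: json.dumps accepts str but not bytes. A minimal sketch using the value from the error message:

```python
import json

raw = b"lliks"  # what response.read() hands back: bytes, not str

error_message = None
try:
    json.dumps(raw)
except TypeError as exc:
    # bytes objects are rejected by the JSON encoder
    error_message = str(exc)

text = raw.decode("utf-8")  # bytes -> str
encoded = json.dumps(text)  # str -> JSON string
print(error_message)
print(encoded)
```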
I have a URL as follows:
https://some_url/vivi/v2/ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=/BE?category=PASSENGER&make=30&model=124&regmonth=3&regdate=2015-03&body=443,4781&facelift=252&seats=4&bodyHeight=443&bodyLength=443&weight=-1&engine=1394&wheeldrive=196&transmission=400
What I need is to get the string after v2/, thus ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=
I use furl to extract the parameter value. I do it as follows:
furl(url).args['category']  # gives PASSENGER
But here I do not have the name of the parameter.
How can I do that?
If you don't need a generalized solution, only one for the URL provided in the question, you can do the following:
url = "https://some_url/vivi/v2/ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=/BE?category=PASSENGER&make=30&model=124&regmonth=3&regdate=2015-03&body=443,4781&facelift=252&seats=4&bodyHeight=443&bodyLength=443&weight=-1&engine=1394&wheeldrive=196&transmission=400"
answer = url.split('/')[5]
Use the following code:
parts = url.split('/')
m = parts[parts.index('v2') + 1]
print(m)
You can get the desired output using re:
import re
url = "https://some_url/vivi/v2/ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0=/BE?category=PASSENGER&make=30&model=124&regmonth=3&regdate=2015-03&body=443,4781&facelift=252&seats=4&bodyHeight=443&bodyLength=443&weight=-1&engine=1394&wheeldrive=196&transmission=400"
re.findall(r'v2/(.*)/', url)
This results in ['ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0='].
But it's safer to use split() the way others mentioned, because when the API version changes to v3 this regex won't work anymore.
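If the regex route is still preferred, the version number can be matched generically so that a move to v3 does not break it. This is a sketch, assuming the version segment always looks like "v" followed by digits; segment_after_version is a hypothetical helper name and the query string is shortened for readability:

```python
import re

url = (
    "https://some_url/vivi/v2/"
    "ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0="
    "/BE?category=PASSENGER&make=30"
)


def segment_after_version(url):
    """Return the path segment after /v<digits>/, or None if absent."""
    match = re.search(r"/v\d+/([^/?#]+)", url)
    return match.group(1) if match else None


print(segment_after_version(url))
```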
The string that you are after is not a query parameter, it is part of the URL path.
In the general case you can use the urllib.parse module to parse the URL into its components, then access the path. Then extract the required part of the path:
import base64
from urllib.parse import urlparse, parse_qs
parsed_url = urlparse(url)
s = parsed_url.path.split('/')[-2] # second last component of path
>>> s
'ZUxOZmVrdzJqTURxV20wQ0RvRld6SytEQWNocThwMGVnbFJ4RDQrZzJMeGRBcnhPYnUzV1pRPT0='
>>> base64.b64decode(s)
b'eLNfekw2jMDqWm0CDoFWzK+DAchq8p0eglRxD4+g2LxdArxObu3WZQ=='
The keys and values of the query string can also be processed into a dictionary and accessed by key:
params = parse_qs(parsed_url.query)
>>> params
{'category': ['PASSENGER'], 'make': ['30'], 'model': ['124'], 'regmonth': ['3'], 'regdate': ['2015-03'], 'body': ['443,4781'], 'facelift': ['252'], 'seats': ['4'], 'bodyHeight': ['443'], 'bodyLength': ['443'], 'weight': ['-1'], 'engine': ['1394'], 'wheeldrive': ['196'], 'transmission': ['400']}
>>> params['category']
['PASSENGER']
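parse_qs wraps every value in a list because a key can repeat. When each key is known to appear only once, parse_qsl combined with dict gives plain string values instead. A short sketch, with a shortened query string for readability:

```python
from urllib.parse import parse_qsl

query = "category=PASSENGER&make=30&body=443,4781"

# parse_qsl yields (key, value) pairs; dict() keeps the last value per key.
flat = dict(parse_qsl(query))
print(flat["category"])  # PASSENGER
```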
I need to add custom parameters to a URL query string using Python.
Example:
This is the URL that the browser is fetching (GET):
/scr.cgi?q=1&ln=0
Then some Python commands are executed, and as a result I need to set the following URL in the browser:
/scr.cgi?q=1&ln=0&SOMESTRING=1
Is there some standard approach?
You can use urlsplit() and urlunsplit() to break apart and rebuild a URL, then use urlencode() on the parsed query string:
# Python 3: these functions all live in urllib.parse
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def set_query_parameter(url, param_name, param_value):
    """Given a URL, set or replace a query parameter and return the
    modified URL.

    >>> set_query_parameter('http://example.com?foo=bar&biz=baz', 'foo', 'stuff')
    'http://example.com?foo=stuff&biz=baz'
    """
    scheme, netloc, path, query_string, fragment = urlsplit(url)
    query_params = parse_qs(query_string)
    query_params[param_name] = [param_value]
    new_query_string = urlencode(query_params, doseq=True)
    return urlunsplit((scheme, netloc, path, new_query_string, fragment))
Use it as follows:
>>> set_query_parameter("/scr.cgi?q=1&ln=0", "SOMESTRING", 1)
'/scr.cgi?q=1&ln=0&SOMESTRING=1'
Use urlsplit() to extract the query string, parse_qsl() to parse it (or parse_qs() if you don't care about argument order), add the new argument, urlencode() to turn it back into a query string, urlunsplit() to fuse it back into a single URL, then redirect the client.
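Those steps might look like the sketch below (the redirect itself is left out, since it depends on the web framework). add_query_parameter is a hypothetical helper name, and keep_blank_values=True is my own choice so that parameters such as "flag=" survive the round trip:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_parameter(url, name, value):
    """Append a query parameter, keeping existing order and duplicates."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.append((name, value))
    # SplitResult is a namedtuple, so _replace swaps in the new query.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


print(add_query_parameter("/scr.cgi?q=1&ln=0", "SOMESTRING", 1))
```

Because parse_qsl returns a list of pairs rather than a dict, repeated keys and the original parameter order are preserved.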
You can use furl, a third-party URL manipulation library for Python:
import furl
f = furl.furl("/scr.cgi?q=1&ln=0")
f.args['SOMESTRING'] = 1
print(f.url)
from urllib.parse import urlencode

url = "/scr.cgi?q=1&ln=0"
# urlencode escapes the '&' inside the parameter name
param = urlencode({'SOME&STRING': 1})
url = url + param if url.endswith('&') else url + '&' + param
See the docs for urllib.parse.urlencode.