I have some big URLs that contain a lot of URL parameters.
For my specific case, I need to URL-encode the content of one specific URL parameter (q) whenever the content after "q=" starts with a slash ("/").
Example URL:
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"
How can I only URL encode that last part of the URL which is within the "q" parameter?
The output of this example should be:
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22%20
I already tried a few different things with urllib.parse, but it doesn't work the way I want it to.
Thanks for your help!
Split the string on the &q=/ part and only encode the last part:
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
encoded = parse.quote_plus(url.split("&q=/")[1])
encoded_url = f"{url.split('&q=/')[0]}&q=/{encoded}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22
Note that there's a difference between this and the requested output: the requested output has a URL-encoded space (%20) at the end.
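If that trailing %20 actually matters, note the difference between the two quoting functions: quote_plus() encodes a space as +, while quote() with safe='' produces exactly the requested string (a quick sketch):
from urllib import parse
value = '/"TEST"/"TEST" '                 # note the trailing space
print(parse.quote(value, safe=""))        # %2F%22TEST%22%2F%22TEST%22%20
print(parse.quote_plus(value))            # %2F%22TEST%22%2F%22TEST%22+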
EDIT
The comment shows a different need for the encoding, so the code needs to change a bit. The code below only encodes the part after &q=. Basically, first split the URL from its parameters, then iterate through the parameters to find the q= parameter and encode that part. With some f-string and join magic you get a URL that has the q parameter encoded. Note that this might have issues if an & is present in the part that needs to be encoded.
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?")
newparameters = []
for parameter in parameters.split("&"):
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123
EDIT 2
Trying to solve the edge case where there's a & character in the string to be encoded, as this messes up the string.split("&").
I tried using urllib.parse.parse_qs() but this has the same issue with the & character. Docs for reference.
This question is a nice example of how edge cases can mess up simple logic and make it overly complicated.
RFC 3986 also doesn't specify any limitations on query parameter names, otherwise that could have been used to narrow down possible errors even more.
updated code
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?")
# addition to handle & in the querystring.
# it reduces errors, but it can still mess up if there's a = in the part to be encoded.
split_parameters = []
for index, parameter in enumerate(parameters.split("&")):
    if "=" not in parameter:
        # add this part to the previous entry in split_parameters
        split_parameters[-1] += f"&{parameter}"
    else:
        split_parameters.append(parameter)
newparameters = []
for parameter in split_parameters:
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123
@EdoAkse has a good answer, and should get the credit for it.
But the purist in me would do the same thing slightly differently, because
(1) I don't like calling the same function on the same data twice (for efficiency), and
(2) I like the logical symmetry of using the join function to reverse a split.
My code would look more like this:
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
splitter = "&q=/"
unencoded, encoded = url.split(splitter)
encoded_url = splitter.join([unencoded, parse.quote_plus(encoded)])
print(encoded_url)
Edit: I couldn't resist posting my edited answer based on the commentary. You can see the virtually identical code developed independently, so this must be the right approach then, I guess.
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
base_url,arglist = url.split("?",1)
args = arglist.split("&")
new_args = []
for arg in args:
    if arg.lower().startswith("q="):
        new_args.append(arg[:2] + parse.quote_plus(arg[2:]))
    else:
        new_args.append(arg)
encoded_url = "?".join([base_url,"&".join(new_args)])
print(encoded_url)
Related
I'm trying to just input a URL and have Python pull the data needed from the URL for the script.
I only really need to isolate 1.2.3 into a variable, 7/8/9 into a variable, and 1.2.3/4/5/6/7 into a variable. The URL values change, and I want to be able to update my script easily.
Not really sure if that's possible.
Url=https://Thedog.big.com//red/house/large/1.2.3/4/5/6/7/8/9
x = 1.2.3
y = 7/8/9
z = 1.2.3/4/5/6/7
That really depends on what kind of URLs you have; the idea is just to find a regular pattern.
If all URLs have the same structure and only the values change, you can just take the parts out of the URL string by index:
a = str(url)[start:stop]
Otherwise you can use str(url).split('/') to split the URL into a list of pieces separated by /.
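For example, a minimal sketch against the example URL from the question, assuming the "https://host//red/house/large/" prefix always has the same number of segments (the names x, y, z follow the question):
url = "https://Thedog.big.com//red/house/large/1.2.3/4/5/6/7/8/9"
parts = url.split("/")          # split the whole URL on "/"
x = parts[7]                    # '1.2.3'
y = "/".join(parts[-3:])        # '7/8/9'
z = "/".join(parts[7:12])       # '1.2.3/4/5/6/7'
print(x, y, z)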
I am trying to map the path parameter from part of the request URL:
example
I am doing this but it doesn't work:
@app.route("/people/:<string:id>", methods=['GET'])
def api_search_a_person(id):
    return Id
Does anyone know how to get the value after the ":" (the string "123456-7" in the example)?
: is a special URI character because it's used to define the protocol or port, so it's likely that your browser URI encodes the character.
Do you really need this character anyway? Can't you simply remove it (http://host.net/people/123456-7), or use URI parameters instead (http://host.net/people/?123456-7)?
If you really want to use a :, escape it on both ends (the escape character is %3A).
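For illustration, a small sketch of that escaping with urllib.parse (the value 123456-7 comes from the question; the rest is just an assumption about how the URL is built):
from urllib.parse import quote, unquote

person_id = "123456-7"
# Client side: escape the ":" before putting it in the path.
path_segment = quote(":" + person_id, safe="")   # '%3A123456-7'
url = "http://host.net/people/" + path_segment
# Server side: decode the segment back to its original form.
print(unquote(path_segment))                     # ':123456-7'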
Try this:
@app.route("/people/:<string_id>", methods=['GET'])  # notice _ instead of :
def api_search_a_person(string_id):  # extracts string_id from the request URL
    return string_id  # returns the mapped string_id
It's unclear why you used <string:id> with : inside.
It's also unclear what Id represents in the return statement (maybe you meant the id from the method parameter...).
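If the literal ":" isn't actually needed, a minimal sketch of the more conventional route would look like this (the variable name person_id is illustrative):
from flask import Flask

app = Flask(__name__)

# Drop the ":" from the URL and let Flask's <string:...> converter capture the value.
@app.route("/people/<string:person_id>", methods=["GET"])
def api_search_a_person(person_id):
    return person_id  # GET /people/123456-7 returns "123456-7"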
I am trying to create a token for an API call in R. I have example code and output in Python, but I am unable to replicate in R. I know little to nothing about encoding, decoding, etc. Hoping someone can shed some light on what I can do to make these outputs match. Here is a toy example.
R Code:
library(RCurl)
library(digest)
api_key = "abcdefghijklmnopqrstuvwxyz123456789=="
decoded_api_key = base64Decode(api_key)
hmac_binary = hmac(decoded_api_key, "MySpecialString", "sha512")
hmac_encoded = base64Encode(digest(hmac_binary))
print(as.character(hmac_encoded))
# ZmZjZDBlMjkyNzg3NDNmYWM1ZDcyNjVkNmY4ZmM1OGQ=
Python:
import hmac
import hashlib
import base64
api_key = "abcdefghijklmnopqrstuvwxyz123456789=="
decoded_api_key = base64.b64decode(api_key)
hmac_binary = hmac.new(decoded_api_key, "MySpecialString", hashlib.sha512)
hmac_encoded = base64.b64encode(hmac_binary.digest())
print(hmac_encoded)
# MduxNfXVkwcOtCpBWJEl96S43boYVYTtHb4waR21ARCMo6iokKuxbwEJMTkuytbrCOxvBqKCYiaZiV/AyHTEcw==
The answers I obtain are given at the end of the code blocks. Clearly they don't match. I'd like someone to help me change my R code to match the Python output.
Thanks in advance.
The digest() function in R doesn't do the same thing as the .digest() method in Python: it doesn't extract the value, it computes a new digest of whatever you pass in. Also, the hmac() function by default returns the digest as a string, but you want to base64-encode the actual bytes, so you need to request the raw values. Finally, a base64 string should have a multiple of 4 characters; the extra padding seems to return a different value. So this should give the same value as the Python code:
api_key = "abcdefghijklmnopqrstuvwxyz123456789="
decoded_api_key = base64Decode(api_key)
hmac_binary = hmac(decoded_api_key, "MySpecialString", "sha512", raw=TRUE)
hmac_encoded = base64Encode(hmac_binary)
print(as.character(hmac_encoded))
# [1] "MduxNfXVkwcOtCpBWJEl96S43boYVYTtHb4waR21ARCMo6iokKuxbwEJMTkuytbrCOxvBqKCYiaZiV/AyHTEcw=="
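For reference, a Python 3 sketch of the same computation (the snippet in the question is Python 2, where plain strings are bytes); with the padding-corrected key it should print the same digest:
import base64
import hashlib
import hmac

api_key = "abcdefghijklmnopqrstuvwxyz123456789="
decoded_api_key = base64.b64decode(api_key)
# In Python 3 both the key and the message must be bytes.
mac = hmac.new(decoded_api_key, b"MySpecialString", hashlib.sha512)
print(base64.b64encode(mac.digest()).decode())
# expected: MduxNfXVkwcOtCpBWJEl96S43boYVYTtHb4waR21ARCMo6iokKuxbwEJMTkuytbrCOxvBqKCYiaZiV/AyHTEcw==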
I want to crawl a webpage for some information. What I've done so far is working, but I need to make a request to another URL on the website, and when I try to format it, it doesn't work. This is what I have so far:
import requests
from lxml import html

name = input("> ")
page = requests.get("http://www.mobafire.com/league-of-legends/champions")
tree = html.fromstring(page.content)
for index, champ in enumerate(champ_list):  # champ_list is defined elsewhere in the script
    if name == champ:
        y = tree.xpath(".//*[@id='browse-build']/a[{}]/@href".format(index + 1))
        print(y)
        guide = requests.get("http://www.mobafire.com{}".format(y))
        builds = html.fromstring(guide.content)
        print(builds)
        for title in builds.xpath(".//table[@class='browse-table']/tr[2]/td[2]/div[1]/a/text()"):
            print(title)
From the input, the user enters a name; if the name matches one from a list (champ_list), it prints a URL, which is then formatted into the guide request to get more information, but I'm getting errors such as "invalid IPv6 URL".
This is the output URL (one of them, but they're all similar anyway): ['/league-of-legends/champion/ivern-133']
I tried using slicing, but it doesn't do anything; probably I'm using it wrong, or it doesn't work in this case. I tried using replace as well, but it doesn't work on lists. I tried using it as:
y = [y.replace("'", "") for y in y] so I could see if it would at least remove the quotes, but that didn't work either. What would be another approach to format this properly?
I take it y is the list you want to insert into the string?
Try this:
"http://www.mobafire.com{}".format('/'.join(y))
I have an app that will show images from Reddit. Some images come like this: http://imgur.com/Cuv9oau, but I need to make them look like this: http://i.imgur.com/Cuv9oau.jpg. That is, just add an "i." at the beginning and ".jpg" at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on @jh314's response, made the code a little less compact, and checked that the URL starts with http://imgur.com, as the original code would cause issues with other URLs, like the Google search I included. It also only replaces the first instance, since replacing every occurrence could cause issues.
def fixImgurLinks(url):
    if url.lower().startswith("http://imgur.com"):
        url = url.replace("http://imgur", "http://i.imgur", 1)  # Only replace the first instance.
        if not url.endswith(".jpg"):
            url += ".jpg"
    return url

for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print fixImgurLinks(u)
Gives:
>>> http://i.imgur.com/Cuv9oau.jpg
>>> http://www.google.com/search?q=http://imgur
You should use Python's regular expressions to insert the "i.". As for the ".jpg", you can just append it.
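A minimal sketch of that idea, assuming plain http://imgur.com/<id> links like the one in the question (the function name is illustrative):
import re

def fix_imgur_link(url):
    # Insert "i." into the host, then append ".jpg" if it's missing.
    url = re.sub(r"^(https?://)imgur\.com", r"\1i.imgur.com", url)
    if not url.endswith(".jpg"):
        url += ".jpg"
    return url

print(fix_imgur_link("http://imgur.com/Cuv9oau"))  # http://i.imgur.com/Cuv9oau.jpg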