HMAC Encoding in R vs Python - python

I am trying to create a token for an API call in R. I have example code and output in Python, but I am unable to replicate in R. I know little to nothing about encoding, decoding, etc. Hoping someone can shed some light on what I can do to make these outputs match. Here is a toy example.
R Code:
library(RCurl)
library(digest)
api_key = "abcdefghijklmnopqrstuvwxyz123456789=="
decoded_api_key = base64Decode(api_key)
hmac_binary = hmac(decoded_api_key, "MySpecialString", "sha512")
hmac_encoded = base64Encode(digest(hmac_binary))
print(as.character(hmac_encoded))
# ZmZjZDBlMjkyNzg3NDNmYWM1ZDcyNjVkNmY4ZmM1OGQ=
Python:
import hmac
import hashlib
import base64
api_key = "abcdefghijklmnopqrstuvwxyz123456789=="
decoded_api_key = base64.b64decode(api_key)
hmac_binary = hmac.new(decoded_api_key, "MySpecialString", hashlib.sha512)
hmac_encoded = base64.b64encode(hmac_binary.digest())
print(hmac_encoded)
# MduxNfXVkwcOtCpBWJEl96S43boYVYTtHb4waR21ARCMo6iokKuxbwEJMTkuytbrCOxvBqKCYiaZiV/AyHTEcw==
The answers I obtain are given at the end of the code blocks. Clearly they don't match. I'd like someone to help me change my R code to match the Python output.
Thanks in advance.

The digest() function in R doesn't do the same thing as the .digest() method in Python. It doesn't extract the value of an existing HMAC; it computes a new digest of whatever you pass in. Also, hmac() returns a hex string by default, but you want to base64-encode the actual bytes, so you need to request the raw values with raw=TRUE. Finally, a base64 string should have a length that is a multiple of 4 characters. The extra padding character in your key seems to make the decoded value different, so I removed one "=". With those changes, this should give the same value as the Python code:
api_key = "abcdefghijklmnopqrstuvwxyz123456789="
decoded_api_key = base64Decode(api_key)
hmac_binary = hmac(decoded_api_key, "MySpecialString", "sha512", raw=TRUE)
hmac_encoded = base64Encode(hmac_binary)
print(as.character(hmac_encoded))
# [1] "MduxNfXVkwcOtCpBWJEl96S43boYVYTtHb4waR21ARCMo6iokKuxbwEJMTkuytbrCOxvBqKCYiaZiV/AyHTEcw=="
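The difference between encoding the raw HMAC bytes and encoding their hex representation can also be seen from the Python side. The following is a minimal sketch in Python 3 (so the message is passed as bytes), reusing the key and message from the example above; it only illustrates why the two encodings cannot match and is not meant as a reference implementation.

```python
import base64
import hashlib
import hmac

# Key and message taken from the example above (one "=" of padding).
api_key = "abcdefghijklmnopqrstuvwxyz123456789="
key = base64.b64decode(api_key)
mac = hmac.new(key, b"MySpecialString", hashlib.sha512)

# Encoding the 64 raw digest bytes (what the Python code in the question does).
raw_b64 = base64.b64encode(mac.digest())
# Encoding the 128-character hex string (what hmac() in R returns without raw=TRUE).
hex_b64 = base64.b64encode(mac.hexdigest().encode("ascii"))

print(len(raw_b64), len(hex_b64))  # 88 172
print(raw_b64 == hex_b64)          # False
```

The two results differ even in length, so no amount of re-encoding afterwards will make them agree; the raw bytes have to be requested up front.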

Related

Does JSON decoding method affect the validity of the token?

For example, take the same JSON object written in Dart and in Python:
final jsonData = {'access_key':'i-want-to-be-authorized','param1': 'parameter1'};
jsonData = {'access_key':'i-want-to-be-authorized','param1': 'parameter1'}
If I serialize these variables to strings like this:
import 'dart:convert';
final jsonData = {'access_key':'i-want-to-be-authorized','param1': 'parameter1'};
final dumpedJson = json.encode(jsonData);
print(dumpedJson);
import json
jsonData = {'access_key':'i-want-to-be-authorized','param1': 'parameter1'}
dumped_json = json.dumps(jsonData)
print(dumped_json)
When I execute the code above, I get the following results.
In Dart, {"access_key":"i-want-to-be-authorized","param1":"parameter1"}
In Python, {"access_key": "i-want-to-be-authorized", "param1": "parameter1"}
As you can see, Python's json library puts a space after every colon and comma, so the two strings differ and give a completely different result once they are encrypted or signed.
I believe that once they are decrypted and decoded on the server side they will be treated as the same object, but could this difference make the server unable to understand the request?
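To illustrate the difference from the Python side, here is a minimal sketch. It uses the separators argument of json.dumps to produce the same compact form that Dart prints, and a hypothetical shared secret (not from the question) to show that a signature computed over the string changes with the whitespace even though the decoded objects are equal.

```python
import hashlib
import hmac
import json

json_data = {"access_key": "i-want-to-be-authorized", "param1": "parameter1"}

# Default output: a space after every comma and colon.
default_json = json.dumps(json_data)
# Compact output: no whitespace between tokens.
compact_json = json.dumps(json_data, separators=(",", ":"))

print(default_json)
print(compact_json)

# Hypothetical shared secret, used only to show that the signatures differ.
secret = b"hypothetical-secret"
sig_default = hmac.new(secret, default_json.encode("utf-8"), hashlib.sha256).hexdigest()
sig_compact = hmac.new(secret, compact_json.encode("utf-8"), hashlib.sha256).hexdigest()

print(sig_default == sig_compact)                          # False
print(json.loads(default_json) == json.loads(compact_json))  # True
```

Whether the server cares depends on what it does: if it only parses the JSON, both forms are equivalent, but if it verifies a signature over the exact bytes, both sides must serialize identically.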

Python: How to only URL Encode a specific URL Parameter?

I have some big URLs that contain a lot of URL parameters.
For my specific case, I need to URL Encode the content of one specific URL Parameter (q) when the content after the "q=" starts with a slash ("/")
Example URL:
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"
How can I only URL encode that last part of the URL which is within the "q" parameter?
The output of this example should be:
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22%20
I already tried a few different things with urllib.parse, but it doesn't work the way I want.
Thanks for your help!
Split the string on the &q= part and encode only the last piece:
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
# split once on "&q=" so the leading slash is encoded along with the rest of the value
base, q_value = url.split("&q=", 1)
encoded_url = f"{base}&q={parse.quote_plus(q_value)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22
Note that this differs from the requested output, which has a URL-encoded space (%20) at the end.
EDIT
A comment shows a different need for the encoding, so the code has to change a bit. The code below encodes only the value of the q parameter. First split the URL into the base and the query string, then iterate through the parameters to find the one that starts with q= and encode its value. Finally, join everything back together with an f-string to get a URL with the q parameter encoded. Note that this can break if an & is present in the part that needs to be encoded.
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?")
newparameters = []
for parameter in parameters.split("&"):
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%22TEST%22&utm_source=test1&cpc=123&gclid=abc123
EDIT 2
Trying to solve the edge case where there's a & character in the string to be encoded, as this messes up the string.split("&").
I tried using urllib.parse.parse_qs() but this has the same issue with the & character. Docs for reference.
This question is a nice example of how edge cases can mess up simple logic and make it overly complicated.
The RFC3986 also didn't specify any limitations on the name of the query string, otherwise that could've been used to narrow down possible errors even more.
updated code
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/&"TE&eeST"&utm_source=test1&cpc=123&gclid=abc123'
# the first parameter is always delimited by a ?
baseurl, parameters = url.split("?")
# addition to handle & in the querystring.
# it reduces errors, but it can still mess up if there's a = in the part to be encoded.
split_parameters = []
for parameter in parameters.split("&"):
    if "=" not in parameter:
        # add this part to the previous entry in split_parameters
        split_parameters[-1] += f"&{parameter}"
    else:
        split_parameters.append(parameter)
newparameters = []
for parameter in split_parameters:
    # check if the parameter is the part that needs to be encoded
    if parameter.startswith("q="):
        # encode the parameter
        newparameters.append(f"q={parse.quote_plus(parameter[2:])}")
    else:
        # otherwise add the parameter unencoded
        newparameters.append(parameter)
# string magic to create the encoded url
encoded_url = f"{baseurl}?{'&'.join(newparameters)}"
print(encoded_url)
output
https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=%2F%22TEST%22%2F%26%22TE%26eeST%22&utm_source=test1&cpc=123&gclid=abc123
@EdoAkse has a good answer and should get the credit for it.
But the purist in me would do the same thing slightly differently, because
(1) I don't like calling the same function on the same data twice (for efficiency), and
(2) I like the logical symmetry of using join to reverse a split.
My code would look more like this:
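from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
# split on "&q=" so the leading slash is part of the value that gets encoded
splitter = "&q="
unencoded, encoded = url.split(splitter)
# str.join takes a single iterable, so pass the two pieces as a list
encoded_url = splitter.join([unencoded, parse.quote_plus(encoded)])
print(encoded_url)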
Edit: I couldn't resist posting my edited answer based on the comments. You can see that virtually identical code was developed independently. This must be the right approach then, I guess.
from urllib import parse
url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
base_url, arglist = url.split("?", 1)
args = arglist.split("&")
new_args = []
for arg in args:
    if arg.lower().startswith("q="):
        new_args.append(arg[:2] + parse.quote_plus(arg[2:]))
    else:
        new_args.append(arg)
encoded_url = "?".join([base_url, "&".join(new_args)])
print(encoded_url)
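The answers above can be wrapped into a small helper that also checks the condition from the question (encode only when the value starts with a slash). This is a sketch, not a definitive solution: the function name encode_param_if_slash is made up for illustration, it uses urlsplit/urlunsplit from urllib.parse to separate the query string, and like the code above it still assumes the value contains no raw & character.

```python
from urllib.parse import quote_plus, urlsplit, urlunsplit


def encode_param_if_slash(url: str, name: str = "q") -> str:
    """Percent-encode the value of one query parameter if it starts with "/".

    Hypothetical helper; assumes the value contains no raw "&".
    """
    parts = urlsplit(url)
    new_args = []
    for arg in parts.query.split("&"):
        key, sep, value = arg.partition("=")
        if key == name and value.startswith("/"):
            value = quote_plus(value)
        new_args.append(key + sep + value)
    # Rebuild the URL with only the query string replaced.
    return urlunsplit(parts._replace(query="&".join(new_args)))


url = 'https://www.exmple.com/test?test1=abc&test2=abc&test3=abc&q=/"TEST"/"TEST"'
print(encode_param_if_slash(url))
```

A URL whose q value does not start with a slash is returned unchanged.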

Python 3 - ascii to hex for hmac

I'm having an issue and not quite sure how to explain it but I will try my best.
So I'm attempting to authenticate with an API which requires grabbing a private key that is provided by the website in hex representation (e.g. an example token is "665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569") and decode to ascii to create a keyed hash so I may pass it in an http header which is part of the authentication process.
When it comes to python2 I can simply
import hashlib
import hmac
import requests
headers = {
    "custom header": hmac.new("665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569".decode("hex"),
                              msg="whatever",
                              digestmod=hashlib.sha256).hexdigest()
}
requests.get("my url", headers=headers)
However, I cannot get this working in python3 despite several hours of googling, various SO posts and looking at the official docs for hmac.
This seems to stem from the differences between how python2 and 3 handle strings.
In python2, running "665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569".decode("hex") returns the byte string 'f\\ \xb3\xc4Q~\x02S\x11\x16\x0b\x7f\xec?\xdb\x9bM\t\x1f\x14-0\x8cV\x8d\x0e\xecGE\xf5i' (shown here in escaped form), which is passed to hmac.new()
Some things I have tried in Python3 after searching around:
bytes.fromhex('665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569').decode('utf-8')
bytes.fromhex('665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569').decode('ascii')
import binascii
binascii.unhexlify(b"665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569")
But these all error or output different returns that hmac.new() won't accept. I'm assuming there's a simple fix that I'm just ignorant on since I'm not very knowledgeable about the nuances of how p2 and p3 handle strings.
One of your attempts is correct:
In [1]: import binascii
...: binascii.unhexlify(b"665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569")
...:
Out[1]: b'f\\ \xb3\xc4Q~\x02S\x11\x16\x0b\x7f\xec?\xdb\x9bM\t\x1f\x14-0\x8cV\x8d\x0e\xecGE\xf5i'
If you get a wrong result from hmac afterwards, you can post a question about that specific scenario, with some examples comparing python2/3.
You may be running into a problem with the message itself, which needs to explicitly use bytes, not a string. These two give the same values:
Python 3:
In [10]: hmac.new(binascii.unhexlify(b"665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569"),
...: msg="whatever".encode('utf-8'),
...: digestmod=hashlib.sha256).hexdigest()
Out[10]: '79ca98357629c22a094c67a02638076573ec41d2c5ce8996435656f8488552d0'
Python 2:
>>> hmac.new("665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569".decode("hex"),
... msg="whatever",
... digestmod=hashlib.sha256).hexdigest()
'79ca98357629c22a094c67a02638076573ec41d2c5ce8996435656f8488552d0'
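Putting the two points together, a Python 3 version might look like the sketch below. It uses bytes.fromhex, which yields the same bytes as binascii.unhexlify, and keeps both the key and the message as bytes. It also shows why the .decode('utf-8') attempt from the question fails: the key is arbitrary binary data, not text.

```python
import binascii
import hashlib
import hmac

hex_key = "665c20b3c4517e025311160b7fec3fdb9b4d091f142d308c568d0eec4745f569"

# Either call turns the hex representation into the 32 raw key bytes.
key = bytes.fromhex(hex_key)
same_key = binascii.unhexlify(hex_key)
print(key == same_key)  # True

# The key is not valid text, so decoding it to str raises an error.
try:
    key.decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode key as text:", exc.reason)

# Both the key and the message must be bytes in Python 3.
signature = hmac.new(key,
                     msg="whatever".encode("utf-8"),
                     digestmod=hashlib.sha256).hexdigest()
print(signature)
```

The key never needs to become a str at all; hmac.new() works directly on the bytes.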

How to make a request to the Intersango API

I'm trying to figure out what's the correct URL format for the Intersango API (which is poorly documented). I'm programming my client in C#, but I'm looking at the Python example and I'm a little confused as to what is actually being placed in the body of the request:
def make_request(self,call_name,params):
    params.append(('api_key',self.api_key))  # <-- How does this get serialized?
    body = urllib.urlencode(params)
    self.connect()
    try:
        self.connection.putrequest('POST','/api/authenticated/v'+self.version+'/'+call_name+'.php')
        self.connection.putheader('Connection','Keep-Alive')
        self.connection.putheader('Keep-Alive','30')
        self.connection.putheader('Content-type','application/x-www-form-urlencoded')
        self.connection.putheader('Content-length',len(body))
        self.connection.endheaders()
        self.connection.send(body)
        response = self.connection.getresponse()
        return json.load(response)
    # ...
I can't figure out this piece of code: params.append(('api_key',self.api_key))
Is it some kind of a dictionary, something that gets serialized to JSON, comma delimited, or exactly how does it get serialized? What would the body look like when the parameters are encoded and assigned to it?
P.S. I don't have anything that I can run the code with so I can debug it, but I'm just hoping that this is simple enough to understand for somebody that knows Python and they would be able to tell me what's happening on that line of code.
params is a list of 2-element tuples. The list would look like [(key1, value1), (key2, value2), ...]
params.append(('api_key',self.api_key)) adds another 2-element tuple to the existing params list.
Finally, urllib.urlencode takes this list and converts it into a proper urlencoded string. In this case, it will return the string key1=value1&key2=value2&api_key=23423. If there are any special characters in your keys or values, urlencode will %-encode them. See the documentation for urlencode.
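As a small illustration of that serialization, here is a sketch in Python 3, where the function lives in urllib.parse rather than directly in urllib as in the Python 2 example. The parameter names and values are placeholders, apart from the api_key value used in the explanation above.

```python
from urllib.parse import urlencode

# Placeholder parameters as a list of (key, value) tuples.
params = [("key1", "value1"), ("key2", "value 2")]
params.append(("api_key", "23423"))

body = urlencode(params)
print(body)  # key1=value1&key2=value+2&api_key=23423
```

The result is an ordinary form-encoded string (note the space encoded as +), which is what the request sends as its body with the application/x-www-form-urlencoded content type.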
I tried to get the C# code working, and it kept failing with the exception {"The remote server returned an error: (417) Expectation Failed."}. I finally found what the problem was. You can read about it in depth here
In short, the way to make C# access the Intersango API is to add the following code:
System.Net.ServicePointManager.Expect100Continue = false;
This code only needs to run once. It is a global setting that affects your whole application, so beware that something else could break as a result.
Here's some sample code:
System.Net.ServicePointManager.Expect100Continue = false;
var address = "https://intersango.com/api/authenticated/v0.1/listAccounts.php";
HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
var postBytes = Encoding.UTF8.GetBytes("api_key=aa75***************fd65785");
request.ContentLength = postBytes.Length;
var dataStream = request.GetRequestStream();
dataStream.Write(postBytes, 0, postBytes.Length);
dataStream.Close();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
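Piece of cake
instead of params.append(('api_key',self.api_key))
just write:
params['api_key']=self.api_key
Note that this only works if params is a dict rather than a list of tuples; urlencode accepts either.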

Why am I getting the wrong signature for the Amazon Product Advertising API?

I'm following the directions on the API documentation precisely, and after some frustration I finally put together something directly from their examples on http://docs.amazonwebservices.com/AWSECommerceService/2011-08-01/DG/rest-signature.html
I've tried this python script on a few machines and have gotten the same result on all of them.
import hmac
from base64 import b64encode
from hashlib import sha256
secret_key = '1234567890'
to_sign = """GET
webservices.amazon.com
/onca/xml
AWSAccessKeyId=AKIAI44QH8DHBEXAMPLE&ItemId=0679722769&Operation=ItemLookup&ResponseGroup=ItemAttributes%2COffers%2CImages%2CReviews&Service=AWSECommerceService&Timestamp=2009-01-01T12%3A00%3A00Z&Version=2009-01-06"""
print b64encode(hmac.new(secret_key, to_sign, sha256).digest())
The instructions say that the signature using this request, and this key, is Nace+U3Az4OhN7tISqgs1vdLBHBEijWcBeCqL5xN9xg= but I get O6UTkH+m4zAQUvB+WXUZJeA8bZcKAdkc4crKgHtbc6s=
(Before anyone says anything: The example page displays the requests wrapped at 65 characters; I've already tried it. This doesn't provide a solution, and is not stated in the instructions for signature creation.)
EDIT: I found the answer, see below.
Well, look at that... The docs were wrong.
I stumbled on an old (nearly) duplicate of this question: Calculating a SHA hash with a string + secret key in python
It looks like the AWSAccessKeyId value in the example request was changed from 00000000000000000000 to AKIAI44QH8DHBEXAMPLE, apparently without the example signature being updated.
Using the old value in the script prints the expected signature, Nace+U3Az4OhN7tISqgs1vdLBHBEijWcBeCqL5xN9xg=
import hmac
from base64 import b64encode
from hashlib import sha256
secret_key = '1234567890'
to_sign = """GET
webservices.amazon.com
/onca/xml
AWSAccessKeyId=00000000000000000000&ItemId=0679722769&Operation=ItemLookup&ResponseGroup=ItemAttributes%2COffers%2CImages%2CReviews&Service=AWSECommerceService&Timestamp=2009-01-01T12%3A00%3A00Z&Version=2009-01-06"""
print b64encode(hmac.new(secret_key, to_sign, sha256).digest())
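The script above is Python 2. A rough Python 3 equivalent is sketched below; the helper names sign and string_to_sign are made up for illustration, and the key and string to sign must be encoded to bytes before being passed to hmac.new(). If the findings in this thread hold, the first call should reproduce the documented signature and the second the mismatching one, but this sketch does not verify that.

```python
import hmac
from base64 import b64encode
from hashlib import sha256

# Query string from the example request, with the access key ID left as a placeholder.
QUERY = ("AWSAccessKeyId={key_id}&ItemId=0679722769&Operation=ItemLookup"
         "&ResponseGroup=ItemAttributes%2COffers%2CImages%2CReviews"
         "&Service=AWSECommerceService&Timestamp=2009-01-01T12%3A00%3A00Z"
         "&Version=2009-01-06")


def string_to_sign(key_id: str) -> str:
    """Build the four-line string to sign used in the example (hypothetical helper)."""
    return "\n".join(["GET", "webservices.amazon.com", "/onca/xml",
                      QUERY.format(key_id=key_id)])


def sign(secret_key: str, to_sign: str) -> str:
    """Return the base64-encoded HMAC-SHA256 of to_sign (hypothetical helper)."""
    mac = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"), sha256)
    return b64encode(mac.digest()).decode("ascii")


sig_old_id = sign("1234567890", string_to_sign("00000000000000000000"))
sig_new_id = sign("1234567890", string_to_sign("AKIAI44QH8DHBEXAMPLE"))
print(sig_old_id)
print(sig_new_id)
```

Because the access key ID is part of the signed string, changing it changes the signature, which is consistent with the mismatch described in the question.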
You might check out the Bottlenose library, https://github.com/dlo/bottlenose, I have found that it makes dealing with AWS Product API much more friendly.
