Why my python and objective-c code get different hmac-sha1 result? - python

I am writing a client/server project that need a signature. I use base64(hmac-sha1(key, data)) to generate a signature. But I got different signatures between python code and objective-c code:
get_signature('KEY', 'TEXT') //python get 'dAOnR2oXWP9xa4vUBdDvVXTpzQo='
[self hmacsha1:#"KEY" #"TEXT"] //obj-c get '7FH0NG0Ou4nb5luKUyjfrdWunos='
Not only the base64 values are different, the hmac-sha1 digest values are different too.
I'm trying to work it out with my friend for a few hours, still don't get it. Where is the problem of my code?
My python code:
import hmac
import hashlib
import base64
def get_signature(key, msg):
return base64.b64encode(hmac.new(key, msg, hashlib.sha1).digest())
My friend's objective-c code (copy from Objective-C sample code for HMAC-SHA1):
(NSString *)hmac_sha1:(NSString *)key text:(NSString *)text{
const char *cKey = [key cStringUsingEncoding:NSASCIIStringEncoding];
const char *cData = [text cStringUsingEncoding:NSASCIIStringEncoding];
unsigned char cHMAC[CC_SHA1_DIGEST_LENGTH];
CCHmac(kCCHmacAlgSHA1, cKey, strlen(cKey), cData, strlen(cData), cHMAC);
NSData *HMAC = [[NSData alloc] initWithBytes:cHMAC length:sizeof(cHMAC)];
NSString *hash = [GTMBase64 stringByEncodingData:HMAC];
return hash;
}
SOLVED: Thanks for everyone below. But I'm not gotta tell you that the real reason is I typed "TE S T" in my python IDE while typed "TE X T" in this post :P
For not wasting your time, I made some tests and got a nicer solution, base on your answers:
print get_signature('KEY', 'TEXT')
# 7FH0NG0Ou4nb5luKUyjfrdWunos=
print get_signature(bytearray('KEY'), bytearray('TEXT'))
# 7FH0NG0Ou4nb5luKUyjfrdWunos=
print get_signature('KEY', u'你好'.encode('utf-8')) # best solution, i think!
# PxEm7Oibj7ijZ55ko7V3isSkD1Q=
print get_signature('KEY', bytearray(u'你好'))
# TypeError: unicode argument without an encoding
print get_signature('KEY', u'你好')
# UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
print get_signature(u'KEY', 'TEXT')
# TypeError: character mapping must return integer, None or unicode
print get_signature(b'KEY', b'TEXT')
# 7FH0NG0Ou4nb5luKUyjfrdWunos=
Conclusion:
The message to be signature should be encoded to utf-8 string with both sides.
(Thanks to DJV)In python 3, strings are all unicode, so they should be used with 'b', or bytearray(thanks to Burhan Khalid), or encoded to utf-8 string.

Your friend is completely right, but so are you (sorta). Your function is completely right in both Python 2 and Python 3. However, your call is a little erroneous in Python 3. You see, in Python 3, strings are unicode, so in order to pass an ASCII string (as your Objective C friend does and as you would do in Python 2), you need to call your function with:
get_signature(b'KEY', b'TEXT')
in order to specify that those strings are bytes a.k.a. ASCII strings.
EDIT: As Burhan Khalid noted, the flexible way of doing this in Python 3 is to either call your function like this:
get_signature(key.encode('ascii'), test.encode('ascii'))
or define it as:
def get_signature(key, msg):
if(isinstance(key, str)):
key = key.encode('ascii')
if(isinstance(msg, str)):
msg = msg.encode('ascii')
return base64.b64encode(hmac.new(key, msg, hashlib.sha1).digest())

Related

Unicode to ASCII string in Python 2

Trying to make a wall display of current MET data for a selected airport.
This is my first use of a Raspberry Pi 3 and of Python.
The plan is to read from a net data provider and show selected data on a LCD display.
The LCD library seems to work only in Python2. Json data seems to be easier to handle in Python3.
This question python json loads unicode probably adresses my problem, but I do not anderstand what to do.
So, what shall I do to my code?
Minimal example demonstrating my problem:
#!/usr/bin/python
import I2C_LCD_driver
import urllib2
import urllib
import json
mylcd = I2C_LCD_driver.lcd()
mylcd.lcd_clear()
url = 'https://avwx.rest/api/metar/ESSB'
request = urllib2.Request(url)
response = urllib2.urlopen(request).read()
data = json.loads(response)
str1 = data["Altimeter"], data["Units"]["Altimeter"]
mylcd.lcd_display_string(str1, 1, 0)
The error is as follows:
$python Minimal.py
Traceback (most recent call last):
File "Minimal.py", line 18, in <module>
mylcd.lcd_display_string(str1, 1, 0)
File "/home/pi/I2C_LCD_driver.py", line 161, in lcd_display_string
self.lcd_write(ord(char), Rs)
TypeError: ord() expected a character, but string of length 4 found
It's a little bit hard to tell without seeing the inside of mylcd.lcd_display_string(), but I think the problem is here:
str1 = data["Altimeter"], data["Units"]["Altimeter"]
I suspect that you want str1 to contain something of type string, with a value like "132 metres". Try adding a print statement just after, so that you can see what str1 contains.
str1 = data["Altimeter"], data["Units"]["Altimeter"]
print( "str1 is: {0}, of type {1}.".format(str1, type(str1)) )
I think you will see a result like:
str1 is: ('foo', 'bar'), of type <type 'tuple'>.
The mention of "type tuple", the parentheses, and the comma indicate that str1 is not a string.
The problem is that the comma in the print statement does not do concatenation, which is perhaps what you are expecting. It joins the two values into a tuple. To concatenate, a flexible and sufficient method is to use the str.format() method:
str1 = "{0} {1}".format(data["Altimeter"], data["Units"]["Altimeter"])
print( "str1 is: {0}, of type {1}.".format(str1, type(str1)) )
Then I expect you will see a result like:
str1 is: 132 metres, of type <type 'str'>.
That value of type "str" should work fine with mylcd.lcd_display_string().
You are passing in a tuple, not a single string:
str1 = data["Altimeter"], data["Units"]["Altimeter"]
mylcd.lcd_display_string() expects a single string instead. Perhaps you meant to concatenate the two strings:
str1 = data["Altimeter"] + data["Units"]["Altimeter"]
Python 2 will implicitly encode Unicode strings to ASCII byte strings where a byte string is expected; if your data is not ASCII encodable you'd have to encode your data yourself, with a suitable codec, one that maps your Unicode codepoints to the right bytes for your LCD display (I believe you can have different character sets stored in ROM). This likely would involve a manual translation table as non-ASCII LCD ROM character sets seldom correspond with standard codecs.
If you just want to force the string to contain ASCII characters only, encode with the error handler set to 'ignore':
str1 = (data["Altimeter"] + data["Units"]["Altimeter"]).encode('ASCII', 'ignore')

Strings operation issue while switching from python 2.x to python 3

I am facing some problem with strings switching from python 2.x to python 3
Issue 1:
from ctypes import*
charBuffer=create_string_buffer(1000)
var = charBuffer.value # var contains like this "abc:def:ghi:1234"
a,b,c,d= var.split(':')
It works fine in python 2.x but not in 3.x it is throwing some errors like this
a,b,c,d= var.split(':')
TypeError: 'str' does not support the buffer interface
I got the links after doing some research in stackoverflow link link2
If I print, desired output would be
a= abc
b =def
c=ghi
d=1234
Issue2:
from ctypes import*
cdll = "Windll"
var = 0x1fffffffffffffffffffffff # I want to send this long variable to character pointer which is in cdll
charBuf =create_string_buffer(var.to_bytes(32,'little'))
cdll.createBuff (charBuf )
cdll function
int createBuff (char * charBuff){
print charBuff
return 0;
}
I want to send this long variable to character pointer which is in cdll, since its a character pointer its throwing errors.
Need your valuable inputs on how could I achieve this. Thanks in advance
In Python 3.x , '.value' on return of create_string_buffer() returns a byte string .
In your example you are trying to split the byte string using a Unicode string (which is the normal string in Python 3.x ) . This is what is causing your issue.
You would need to either split with byte string . Example -
a,b,c,d = var.split(b':')
Or you can decode the byte string to a Unicode string using '.decode()' method on it .
Example -
var = var.decode('<encoding>')
Split using b":" and you will be fine in both versions of python.
In py2 str is a bytestring, in py3 str is a unicode object. The object returned by the ctypes string buffer is a bytestring (str on py2 and bytes on py3). By writing the string literal as b"... you force it to be a bytestring in both version of python.

Python throws UnicodeEncodeError although I am doing str.decode(). Why?

Consider this function:
def escape(text):
print repr(text)
escaped_chars = []
for c in text:
try:
c = c.decode('ascii')
except UnicodeDecodeError:
c = '&{};'.format(htmlentitydefs.codepoint2name[ord(c)])
escaped_chars.append(c)
return ''.join(escaped_chars)
It should escape all non ascii characters by the corresponding htmlentitydefs. Unfortunately python throws
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
when the variable text contains the string whose repr() is u'Tam\xe1s Horv\xe1th'.
But, I don't use str.encode(). I only use str.decode(). Do I miss something?
It's a misleading error-report which comes from the way python handles the de/encoding process. You tried to decode an already decoded String a second time and that confuses the Python function which retaliates by confusing you in turn! ;-) The encoding/decoding process takes place as far as i know, by the codecs-module. And somewhere there lies the origin for this misleading Exception messages.
You may check for yourself: either
u'\x80'.encode('ascii')
or
u'\x80'.decode('ascii')
will throw a UnicodeEncodeError, where a
u'\x80'.encode('utf8')
will not, but
u'\x80'.decode('utf8')
again will!
I guess you are confused by the meaning of encoding and decoding.
To put it simple:
decode encode
ByteString (ascii) --------> UNICODE ---------> ByteString (utf8)
codec codec
But why is there a codec-argument for the decode method? Well, the underlying function can not guess which codec the ByteString was encoded with, so as a hint it takes codec as an argument. If not provided it assumes you mean the sys.getdefaultencoding() to be implicitly used.
so when you use c.decode('ascii') you a) have a (encoded) ByteString (thats why you use decode) b) you want to get a unicode-representation-object (thats what you use decode for) and c) the codec in which the ByteString is encoded is ascii.
See also:
https://stackoverflow.com/a/370199/1107807
http://docs.python.org/howto/unicode.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
http://www.stereoplex.com/blog/python-unicode-and-unicodedecodeerror
You're passing a string that's already unicode. So, before Python can call decode on it, it has to actually encode it - and it does so by default using the ASCII encoding.
Edit to add It depends on what you want to do. If you simply want to convert a unicode string with non-ASCII characters into an HTML-encoded representation, you can do it in one call: text.encode('ascii', 'xmlcharrefreplace').
Python has two types of strings: character-strings (the unicode type) and byte-strings (the str type). The code you have pasted operates on byte-strings. You need a similar function to handle character-strings.
Maybe this:
def uescape(text):
print repr(text)
escaped_chars = []
for c in text:
if (ord(c) < 32) or (ord(c) > 126):
c = '&{};'.format(htmlentitydefs.codepoint2name[ord(c)])
escaped_chars.append(c)
return ''.join(escaped_chars)
I do wonder whether either function is truly necessary for you. If it were me, I would choose UTF-8 as the character encoding for the result document, process the document in character-string form (without worrying about entities), and perform a content.encode('UTF-8') as the final step before delivering it to the client. Depending on the web framework of choice, you may even be able to deliver character-strings directly to the API and have it figure out how to set the encoding.
This answer always works for me when I have this problem:
def byteify(input):
'''
Removes unicode encodings from the given input string.
'''
if isinstance(input, dict):
return {byteify(key):byteify(value) for key,value in input.iteritems()}
elif isinstance(input, list):
return [byteify(element) for element in input]
elif isinstance(input, unicode):
return input.encode('utf-8')
else:
return input
from How to get string objects instead of Unicode ones from JSON in Python?
I found solution in this-site
reload(sys)
sys.setdefaultencoding("latin-1")
a = u'\xe1'
print str(a) # no exception
decode a str make no sense.
I think you can check ord(c)>127

Converting Unicode objects with non-ASCII symbols in them into strings objects (in Python)

I want to send Chinese characters to be translated by an online service, and have the resulting English string returned. I'm using simple JSON and urllib for this.
And yes, I am declaring.
# -*- coding: utf-8 -*-
on top of my code.
Now everything works fine if I feed urllib a string type object, even if that object contains what would be Unicode information. My function is called translate.
For example:
stringtest1 = '無與倫比的美麗'
print translate(stringtest1)
results in the proper translation and doing
type(stringtest1)
confirms this to be a string object.
But if do
stringtest1 = u'無與倫比的美麗'
and try to use my translation function I get this error:
File "C:\Python27\lib\urllib.py", line 1275, in urlencode
v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-8: ordinal not in range(128)
After researching a bit, it seems this is a common problem:
Problem: neither urllib2.quote nor urllib.quote encode the unicode strings arguments
urllib.quote throws exception on Unicode URL
Now, if I type in a script
stringtest1 = '無與倫比的美麗'
stringtest2 = u'無與倫比的美麗'
print 'stringtest1',stringtest1
print 'stringtest2',stringtest2
excution of it returns:
stringtest1 無與倫比的美麗
stringtest2 無與倫比的美麗
But just typing the variables in the console:
>>> stringtest1
'\xe7\x84\xa1\xe8\x88\x87\xe5\x80\xab\xe6\xaf\x94\xe7\x9a\x84\xe7\xbe\x8e\xe9\xba\x97'
>>> stringtest2
u'\u7121\u8207\u502b\u6bd4\u7684\u7f8e\u9e97'
gets me that.
My problem is that I don't control how the information to be translated comes to my function. And it seems I have to bring it in the Unicode form, which is not accepted by the function.
So, how do I convert one thing into the other?
I've read Stack Overflow question Convert Unicode to a string in Python (containing extra symbols).
But this is not what I'm after. Urllib accepts the string object but not the Unicode object, both containing the same information
Well, at least in the eyes of the web application I'm sending the unchanged information to, I'm not sure if they're are still equivalent things in Python.
When you get a unicode object and want to return a UTF-8 encoded byte string from it, use theobject.encode('utf8').
It seems strange that you don't know whether the incoming object is a str or unicode -- surely you do control the call sites to that function, too?! But if that is indeed the case, for whatever weird reason, you may need something like:
def ensureutf8(s):
if isinstance(s, unicode):
s = s.encode('utf8')
return s
which only encodes conditionally, that is, if it receives a unicode object, not if the object it receives is already a byte string. It returns a byte string in either case.
BTW, part of your confusion seems to be due to the fact that you don't know that just entering an expression at the interpreter prompt will show you its repr, which is not the same effect you get with print;-).

How to check if a string in Python is in ASCII?

I want to I check whether a string is in ASCII or not.
I am aware of ord(), however when I try ord('é'), I have TypeError: ord() expected a character, but string of length 2 found. I understood it is caused by the way I built Python (as explained in ord()'s documentation).
Is there another way to check?
I think you are not asking the right question--
A string in python has no property corresponding to 'ascii', utf-8, or any other encoding. The source of your string (whether you read it from a file, input from a keyboard, etc.) may have encoded a unicode string in ascii to produce your string, but that's where you need to go for an answer.
Perhaps the question you can ask is: "Is this string the result of encoding a unicode string in ascii?" -- This you can answer
by trying:
try:
mystring.decode('ascii')
except UnicodeDecodeError:
print "it was not a ascii-encoded unicode string"
else:
print "It may have been an ascii-encoded unicode string"
def is_ascii(s):
return all(ord(c) < 128 for c in s)
In Python 3, we can encode the string as UTF-8, then check whether the length stays the same. If so, then the original string is ASCII.
def isascii(s):
"""Check if the characters in string s are in ASCII, U+0-U+7F."""
return len(s) == len(s.encode())
To check, pass the test string:
>>> isascii("♥O◘♦♥O◘♦")
False
>>> isascii("Python")
True
New in Python 3.7 (bpo32677)
No more tiresome/inefficient ascii checks on strings, new built-in str/bytes/bytearray method - .isascii() will check if the strings is ascii.
print("is this ascii?".isascii())
# True
Vincent Marchetti has the right idea, but str.decode has been deprecated in Python 3. In Python 3 you can make the same test with str.encode:
try:
mystring.encode('ascii')
except UnicodeEncodeError:
pass # string is not ascii
else:
pass # string is ascii
Note the exception you want to catch has also changed from UnicodeDecodeError to UnicodeEncodeError.
Your question is incorrect; the error you see is not a result of how you built python, but of a confusion between byte strings and unicode strings.
Byte strings (e.g. "foo", or 'bar', in python syntax) are sequences of octets; numbers from 0-255. Unicode strings (e.g. u"foo" or u'bar') are sequences of unicode code points; numbers from 0-1112064. But you appear to be interested in the character é, which (in your terminal) is a multi-byte sequence that represents a single character.
Instead of ord(u'é'), try this:
>>> [ord(x) for x in u'é']
That tells you which sequence of code points "é" represents. It may give you [233], or it may give you [101, 770].
Instead of chr() to reverse this, there is unichr():
>>> unichr(233)
u'\xe9'
This character may actually be represented either a single or multiple unicode "code points", which themselves represent either graphemes or characters. It's either "e with an acute accent (i.e., code point 233)", or "e" (code point 101), followed by "an acute accent on the previous character" (code point 770). So this exact same character may be presented as the Python data structure u'e\u0301' or u'\u00e9'.
Most of the time you shouldn't have to care about this, but it can become an issue if you are iterating over a unicode string, as iteration works by code point, not by decomposable character. In other words, len(u'e\u0301') == 2 and len(u'\u00e9') == 1. If this matters to you, you can convert between composed and decomposed forms by using unicodedata.normalize.
The Unicode Glossary can be a helpful guide to understanding some of these issues, by pointing how how each specific term refers to a different part of the representation of text, which is far more complicated than many programmers realize.
Ran into something like this recently - for future reference
import chardet
encoding = chardet.detect(string)
if encoding['encoding'] == 'ascii':
print 'string is in ascii'
which you could use with:
string_ascii = string.decode(encoding['encoding']).encode('ascii')
How about doing this?
import string
def isAscii(s):
for c in s:
if c not in string.ascii_letters:
return False
return True
I found this question while trying determine how to use/encode/decode a string whose encoding I wasn't sure of (and how to escape/convert special characters in that string).
My first step should have been to check the type of the string- I didn't realize there I could get good data about its formatting from type(s). This answer was very helpful and got to the real root of my issues.
If you're getting a rude and persistent
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 263: ordinal not in range(128)
particularly when you're ENCODING, make sure you're not trying to unicode() a string that already IS unicode- for some terrible reason, you get ascii codec errors. (See also the Python Kitchen recipe, and the Python docs tutorials for better understanding of how terrible this can be.)
Eventually I determined that what I wanted to do was this:
escaped_string = unicode(original_string.encode('ascii','xmlcharrefreplace'))
Also helpful in debugging was setting the default coding in my file to utf-8 (put this at the beginning of your python file):
# -*- coding: utf-8 -*-
That allows you to test special characters ('àéç') without having to use their unicode escapes (u'\xe0\xe9\xe7').
>>> specials='àéç'
>>> specials.decode('latin-1').encode('ascii','xmlcharrefreplace')
'àéç'
To improve Alexander's solution from the Python 2.6 (and in Python 3.x) you can use helper module curses.ascii and use curses.ascii.isascii() function or various other: https://docs.python.org/2.6/library/curses.ascii.html
from curses import ascii
def isascii(s):
return all(ascii.isascii(c) for c in s)
You could use the regular expression library which accepts the Posix standard [[:ASCII:]] definition.
A sting (str-type) in Python is a series of bytes. There is no way of telling just from looking at the string whether this series of bytes represent an ascii string, a string in a 8-bit charset like ISO-8859-1 or a string encoded with UTF-8 or UTF-16 or whatever.
However if you know the encoding used, then you can decode the str into a unicode string and then use a regular expression (or a loop) to check if it contains characters outside of the range you are concerned about.
Like #RogerDahl's answer but it's more efficient to short-circuit by negating the character class and using search instead of find_all or match.
>>> import re
>>> re.search('[^\x00-\x7F]', 'Did you catch that \x00?') is not None
False
>>> re.search('[^\x00-\x7F]', 'Did you catch that \xFF?') is not None
True
I imagine a regular expression is well-optimized for this.
import re
def is_ascii(s):
return bool(re.match(r'[\x00-\x7F]+$', s))
To include an empty string as ASCII, change the + to *.
To prevent your code from crashes, you maybe want to use a try-except to catch TypeErrors
>>> ord("¶")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
For example
def is_ascii(s):
try:
return all(ord(c) < 128 for c in s)
except TypeError:
return False
I use the following to determine if the string is ascii or unicode:
>> print 'test string'.__class__.__name__
str
>>> print u'test string'.__class__.__name__
unicode
>>>
Then just use a conditional block to define the function:
def is_ascii(input):
if input.__class__.__name__ == "str":
return True
return False

Categories