Compare result from hexdigest() to a string - python

I've got a generated MD5 hash, which I would like to compare to another MD5 hash stored in a string. The statement below is False, even though the two values look the same when printed and should be equal:
hashlib.md5("foo").hexdigest() == "acbd18db4cc2f85cedef654fccc4a4d8"
Google told me that I should encode the result of hexdigest(), since it doesn't return a string. However, the code below doesn't seem to work either:
hashlib.md5("foo").hexdigest().encode("utf-8") == "foo".encode("utf-8")

In Python 2.7, .hexdigest() does return a str:
>>> hashlib.md5("foo").hexdigest() == "acbd18db4cc2f85cedef654fccc4a4d8"
True
>>> type(hashlib.md5("foo").hexdigest())
<type 'str'>
In Python 3.1, .md5() doesn't accept a Unicode str (which "foo" is), so the string must first be encoded to bytes:
>>> hashlib.md5("foo").hexdigest()
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
hashlib.md5("foo").hexdigest()
TypeError: Unicode-objects must be encoded before hashing
>>> hashlib.md5("foo".encode("utf8")).hexdigest()
'acbd18db4cc2f85cedef654fccc4a4d8'
>>> hashlib.md5("foo".encode("utf8")).hexdigest() == 'acbd18db4cc2f85cedef654fccc4a4d8'
True
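To see why encoding the result of hexdigest() doesn't help, it may be useful to contrast hexdigest() with digest() in Python 3. This sketch assumes Python 3 and uses only hashlib and the built-in bytes.fromhex(): hexdigest() gives a str of hex characters, digest() gives raw bytes, and a str never compares equal to bytes.

```python
import hashlib

data = "foo".encode("utf-8")                # md5() needs bytes in Python 3
hex_digest = hashlib.md5(data).hexdigest()  # str of 32 hexadecimal characters
raw_digest = hashlib.md5(data).digest()     # 16 raw bytes

expected_hex = "acbd18db4cc2f85cedef654fccc4a4d8"

print(hex_digest == expected_hex)                 # str vs str: True
print(raw_digest == expected_hex)                 # bytes vs str: always False
print(raw_digest == bytes.fromhex(expected_hex))  # bytes vs bytes: True
```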

Using == to compare hashes is likely a security vulnerability when one of the values is a secret (for example, an authentication token or MAC).
https://groups.google.com/forum/?fromgroups=#!topic/keyczar-discuss/VXHsoJSLKhM
Because == can return as soon as it finds the first mismatching character, an attacker may be able to measure timing differences, iterate through the keyspace efficiently, and find a value that passes the equality test.
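A common mitigation is the standard library's hmac.compare_digest(), which is designed so that its running time does not depend on where the inputs first differ. A minimal sketch, assuming both hashes are available as hex strings; hashes_match is a hypothetical helper name. According to the hmac documentation, differing lengths or types can still theoretically leak through timing, but not the contents.

```python
import hashlib
import hmac


def hashes_match(received_hex: str, expected_hex: str) -> bool:
    """Hypothetical helper: compare two hex digests without an early exit.

    hmac.compare_digest accepts two ASCII-only str objects or two
    bytes-like objects; mixing str and bytes raises TypeError.
    """
    return hmac.compare_digest(received_hex, expected_hex)


expected = hashlib.md5(b"foo").hexdigest()
print(hashes_match("acbd18db4cc2f85cedef654fccc4a4d8", expected))
print(hashes_match("0" * 32, expected))
```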

hexdigest() returns a string. Your first statement returns True in Python 2.x.
In Python 3.x you need to encode the argument to md5(); with that change the equality is also True. Without encoding, md5() raises a TypeError.

Related

str.isdigit() behaviour when handling strings

Assuming the following:
>>> square = '²' # Superscript Two (Unicode U+00B2)
>>> cube = '³' # Superscript Three (Unicode U+00B3)
Curiously:
>>> square.isdigit()
True
>>> cube.isdigit()
True
OK, let's convert those "digits" to integers:
>>> int(square)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '²'
>>> int(cube)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '³'
Oooops!
Could someone please explain what behavior I should expect from the str.isdigit() method when handling strings?
str.isdigit doesn't claim to be related to parsability as an int. It reports a simple Unicode property: whether each character is a decimal character or a digit of some sort:
str.isdigit()
Return True if all characters in the string are digits and there is at least one character, False otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
In short, str.isdigit is thoroughly useless for detecting valid numbers. The correct solution to checking if a given string is a legal integer is to call int on it, and catch the ValueError if it's not a legal integer. Anything else you do will be (badly) reinventing the same tests the actual parsing code in int() performs, so why not let it do the work in the first place?
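The try/except approach can be wrapped in a small predicate. This is a sketch; is_int is a hypothetical name, and it checks base-10 parsing only. Comparing it with isdigit() on a few inputs shows where the two disagree: superscripts pass isdigit() but not int(), while a sign or surrounding whitespace is accepted by int() but fails isdigit().

```python
def is_int(text: str) -> bool:
    """Hypothetical helper: True if int() can parse text in base 10."""
    try:
        int(text)
    except ValueError:
        return False
    return True


for s in ["42", "-7", " 12 ", "²", "³", "4.5"]:
    print(repr(s), "isdigit:", s.isdigit(), "is_int:", is_int(s))
```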
Side-note: You're using the term "utf-8" incorrectly. UTF-8 is a specific way of encoding Unicode, and only applies to raw binary data. Python's str is an "idealized" Unicode text type; it has no encoding (under the hood, it's stored encoded as one of ASCII, latin-1, UCS-2, UCS-4, and possibly also UTF-8, but none of that is visible at the Python layer outside of indirect measurements like sys.getsizeof, which only hints at the underlying encoding by letting you see how much memory the string consumes). The characters you're talking about are simple Unicode characters above the ASCII range, they're not specifically UTF-8.

f-string format specifier with None throws TypeError

Using plain f-strings with a NoneType object works:
>>> a = None
>>> f'{a}'
'None'
However, when using a format specifier, it breaks, and so does str.format():
>>> f'{a:>6}'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to NoneType.__format__
>>> '{:>6}'.format(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to NoneType.__format__
Unexpectedly (for me, at least), the old C-style string formatting works:
>>> '%10s' % a
'      None'
What is going on here? I don't understand why f'{a:>6}' doesn't evaluate to '  None'. Why should a format specifier break it?
Is this a bug in python? If it is a bug, how would I fix it?
None is not a string, so f'{None:>6}' makes no sense. You can convert it to a string with f'{None!s:>6}'. !a, !s, and !r call ascii(), str(), and repr() respectively on an object.
None doesn't support format specifiers. It's up to each object type to determine how it wants to handle format specifiers, and the default is to reject them:
The __format__ method of object itself raises a TypeError if passed any non-empty string.
None inherits this default.
You seem to be expecting None to handle format specifiers the same way strings do, where '{:>6}'.format('None') == '  None'. It kind of sounds like you expect all types to handle format specifiers the way strings do, or you expect the string behavior to be the default. The way strings handle format specifiers is specific to strings; other types have their own handling.
You might be thinking, hey, why doesn't %10s fail too? First, the s requests that the argument be converted to a string by str before any further processing. Second, all conversion specifier handling in printf-style string formatting is performed by str.__mod__; it never delegates to the arguments to figure out what a conversion specifier means.
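The points above can be checked directly with built-ins only: !s hands the spec to str.__format__, format(None, ">6") goes through object.__format__ and raises, and %10s calls str() itself. The Celsius class is a hypothetical example of a type that defines its own spec handling, here by delegating to float's.

```python
a = None

# !s converts to str first, so str.__format__ handles the alignment spec.
print(repr(f"{a!s:>6}"))
print(repr("{!s:>6}".format(a)))

# Without a conversion, None falls back to object.__format__,
# which rejects any non-empty spec.
try:
    format(a, ">6")
except TypeError as exc:
    print("TypeError:", exc)

# printf-style formatting applies str() itself because of the 's' conversion.
print(repr("%10s" % a))


class Celsius:
    """Hypothetical type that decides for itself what a format spec means."""

    def __init__(self, degrees: float) -> None:
        self.degrees = degrees

    def __format__(self, spec: str) -> str:
        # Reuse float's spec handling, then append a unit.
        return format(self.degrees, spec) + " C"


print(f"{Celsius(21.456):.1f}")
```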
The accepted answer above explains why. A solution I have used effectively is something along the lines of:
f"{value:.2f}" if value is not None else ""
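The same conditional can be packaged as a reusable helper. A sketch only: fmt_optional and its parameters are hypothetical, and the built-in format(value, spec) is used as the equivalent of f"{value:{spec}}".

```python
from typing import Optional


def fmt_optional(value: Optional[float], spec: str = ".2f", default: str = "") -> str:
    """Hypothetical helper: format value with spec, or return default for None."""
    return default if value is None else format(value, spec)


print(repr(fmt_optional(3.14159)))
print(repr(fmt_optional(None)))
print(repr(fmt_optional(None, default="n/a")))
```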

Python formatted string literals for objects

One very cool new feature of Python 3.6 is formatted string literals (https://docs.python.org/3.6/whatsnew/3.6.html#whatsnew36-pep498).
Unfortunately, they do not behave like the well-known format() function:
>>> a = "abcd"
>>> print(f"{a[:2]}")
ab
As you can see, slicing works (in fact, any Python expression on the string does).
But format() does not work with slicing:
>>> print("{a[:2]}".format(a="abcd"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
Is there a way to get the functionality of the new formatted string literals on string objects?
>>> string_object = "{a[:2]}"  # may also be coming from a file
>>> # some way to get the result 'ab' from string_object
The str.format syntax does not and will not support the full range of expressions that the newer f-strings will. You'll have to manually evaluate the slice expression outside of the string and supply it to the format function instead:
a = "abcd"
string_object = "{a}".format(a=a[:2])
It should also be noted there are subtle differences between the syntax allowed by f-strings and str.format, so that the former is not strictly a superset of the latter.
Nope. str.format treats anything inside [] that isn't a plain integer as a string key, so "{a[:2]}" looks up a[':2']. That's why you get that error: it tries to index the string with a str index:
>>> a = "abcd"
>>> a[':2']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
It really isn't meant for cases like that; "{a[::]}".format(a=a) would likewise look up a['::'] and fail the same way.
This is one of the reasons f-strings came about: to let any Python expression be formatted inline.
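The string-key behaviour of str.format can be observed directly. This sketch assumes Python 3 (bracket parsing in replacement fields differed in Python 2); the dictionary key ':2' is deliberately chosen to match what str.format extracts from "{d[:2]}", while all-digit indexes are still converted to int.

```python
# A non-integer index inside [] is passed to __getitem__ as a str.
lookup = {":2": "found the ':2' key"}
print("{d[:2]}".format(d=lookup))

# All-digit indexes are converted to int, so list indexing works.
print("{xs[1]}".format(xs=[10, 20, 30]))

# A str cannot be indexed with a str key, hence the TypeError in the question.
try:
    "{a[:2]}".format(a="abcd")
except TypeError as exc:
    print("TypeError:", exc)

# Workaround from the answer above: slice outside the template.
a = "abcd"
print("{a}".format(a=a[:2]))
```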

printing numpy timedelta64 with format()

I would like to print a numpy.timedelta64() value in a formatted way. The direct method works well:
>>> import numpy as np
>>> print np.timedelta64(10,'m')
10 minutes
Which I guess comes from the __str__() method:
>>> np.timedelta64(10,'m').__str__()
'10 minutes'
But when I try to print it with the format() function I get the following error:
>>> print "my delta is : {delta}".format(delta=np.timedelta64(10,'m'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: don't know how to convert scalar number to long
I would like to understand the underlying mechanism of the "string".format() function, and why it doesn't work in this particular case.
According to the Format String Syntax documentation:
The conversion field causes a type coercion before formatting. Normally, the job of formatting a value is done by the __format__() method of the value itself. However, in some cases it is desirable to force a type to be formatted as a string, overriding its own definition of formatting. By converting the value to a string before calling __format__(), the normal formatting logic is bypassed.
Calling __format__ directly reproduces the error:
>>> np.timedelta64(10,'m').__format__('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: don't know how to convert scalar number to long
By appending the !s conversion flag, you can force it to use str():
>>> "my delta is : {delta!s}".format(delta=np.timedelta64(10,'m'))
'my delta is : 10 minutes'
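Because numpy.timedelta64's formatting behaviour may differ across NumPy versions, this sketch uses a hypothetical stand-in class, Delta, whose __str__ works but whose __format__ raises TypeError, mirroring the traceback above. It shows that !s (and printf-style %s) route around the failing __format__.

```python
class Delta:
    """Hypothetical stand-in: str() works, but __format__ raises TypeError."""

    def __str__(self) -> str:
        return "10 minutes"

    def __format__(self, spec: str) -> str:
        raise TypeError("don't know how to convert scalar number to long")


d = Delta()
print("my delta is : {delta!s}".format(delta=d))  # str() first, then str.__format__
print(f"my delta is : {d!s}")                     # same conversion in an f-string
print("my delta is : %s" % d)                     # printf-style %s calls str() itself

try:
    "my delta is : {delta}".format(delta=d)       # goes through Delta.__format__
except TypeError as exc:
    print("TypeError:", exc)
```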
falsetru mentions one aspect of the problem. The other is why this errors at all.
Looking at the code for __format__, we see that it is a generic implementation.
The important part is:
else if (PyArray_IsScalar(self, Integer)) {
#if defined(NPY_PY3K)
obj = Py_TYPE(self)->tp_as_number->nb_int(self);
#else
obj = Py_TYPE(self)->tp_as_number->nb_long(self);
#endif
}
This branch is taken, and it effectively tries to run:
int(numpy.timedelta64(10, "m"))
but NumPy (rightly) says that you can't convert a number with units to a raw number.
This looks like a bug.
%s should be fine. It calls str() on the object.

How to format a write statement in Python?

I have data that I want to print to a file. For missing data, I wish to print the mean of the actual data. However, the mean is calculated to more than the required 4 decimal places. How can I write the mean to the file and format it at the same time?
I have tried the following, but keep getting errors:
outfile.write('{0:%.3f}'.format(str(mean))+"\n")
First, remove the % since it makes your format syntax invalid. See a demonstration below:
>>> '{:%.3f}'.format(1.2345)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Invalid conversion specification
>>> '{:.3f}'.format(1.2345)
'1.234'
>>>
Second, don't wrap mean in str(), because the f in the format spec requires a number, not a string. Below is a demonstration of this bug:
>>> '{:.3f}'.format('1.2345')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
>>> '{:.3f}'.format(1.2345)
'1.234'
>>>
Third, the +"\n" is unnecessary, since you can put the "\n" in the format string itself.
Finally, as shown in my demonstrations, you can remove the 0 since it is redundant.
In the end, the code should be like this:
outfile.write('{:.3f}\n'.format(mean))
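Putting the fixes together, here is a runnable sketch. The sample data, the None-for-missing convention, and the use of io.StringIO as a stand-in for a real file object are all assumptions for illustration; statistics.mean is from the standard library.

```python
import io
from statistics import mean as average

values = [1.23456, 2.34567, None, 3.45678]  # hypothetical data; None marks a missing value
mean = average(v for v in values if v is not None)

outfile = io.StringIO()  # stand-in for a file opened with open(path, "w")
for v in values:
    # Fill missing entries with the mean; every value is written to 3 decimal places.
    outfile.write('{:.3f}\n'.format(mean if v is None else v))

print(outfile.getvalue(), end="")
```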
You don't need to convert to string using str(). Also, the "%" is not required. Just use:
outfile.write('{0:.3f}'.format(mean)+"\n")
First of all, the formatting of your string has nothing to do with your write statement. You can reduce your problem to:
string = '{0:%.3f}'.format(str(mean))+"\n"
outfile.write(string)
Then, your string specification is incorrect and should be:
string = '{0:.3f}\n'.format(mean)
outfile.write('{:.3f}\n'.format(mean))
