I realized that the default hashlib.sha3_256 hasher does not calculate hashes the way other solutions do, for instance other Python modules. Below I compare the hashlib and sha3 implementations of the sha3_256 algorithm on Python 3.6.3.
The implementation from sha3 gives the correct result (according to other internet resources), while the hashlib.sha3_256 result is completely different. How can that be? Am I missing something?
import sha3
import hashlib

def test(s):
    k = sha3.keccak_256()
    k.update(s.encode())
    print('test string="{s}", sha3.keccak_256()="{h}"'.format(s=s, h=k.hexdigest()))
    h = hashlib.sha3_256()
    h.update(s.encode())
    print('test string="{s}", hashlib.sha3_256()="{h}"'.format(s=s, h=h.hexdigest()))
test('')
test('eth')
test('foo.bar')
Results:
test string="", sha3.keccak_256()="c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"
test string="", hashlib.sha3_256()="a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a"
test string="eth", sha3.keccak_256()="4f5b812789fc606be1b3b16908db13fc7a9adf7ca72641f84d75b47069d3d7f0"
test string="eth", hashlib.sha3_256()="4b3cdfda85c576e43c848d43fdf8e901d8d02553fec8ee56289d10b8dc47d997"
test string="foo.bar", sha3.keccak_256()="461e2b648c9a6c0c3e2cab45884ae0fcab21c655fcf588f2a45e6596e3e0e9a7"
test string="foo.bar", hashlib.sha3_256()="e55dea66e750540f599874a18596745b0c5705bc6873ca3ef1ccd2acbba88670"
I think they are different implementations:
>>> hashlib.sha3_256("asdf".encode()).hexdigest()
'dd2781f4c51bccdbe23e4d398b8a82261f585c278dbb4b84989fea70e76723a9'
>>> sha3.sha3_256("asdf".encode()).hexdigest()
'dd2781f4c51bccdbe23e4d398b8a82261f585c278dbb4b84989fea70e76723a9'
>>> sha3.keccak_256("asdf".encode()).hexdigest()
'4c8f18581c0167eb90a761b4a304e009b924f03b619a0c0e8ea3adfce20aee64'
After a search, I found that this is indeed the case.
https://github.com/status-im/nim-keccak-tiny/issues/1
Edit:
After further searching: keccak_256 is not the standardized algorithm. sha3.keccak_256 implements the original Keccak submission (the variant Ethereum uses), while hashlib.sha3_256 implements the finalized NIST FIPS 202 standard, which changed the padding. The two therefore legitimately produce different digests for the same input.
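As a quick sanity check (a sketch assuming the pysha3 package provides the sha3 module, as in the question), the well-known empty-input test vectors identify which algorithm each implementation computes:

import hashlib
import sha3  # pysha3

# Empty-input digests are published test vectors:
#   Keccak-256("") = c5d24601...5d85a470  (original Keccak, used by Ethereum)
#   SHA3-256("")   = a7ffc6f8...80f8434a  (NIST FIPS 202 standard)
assert sha3.keccak_256(b'').hexdigest().startswith('c5d24601')
assert hashlib.sha3_256(b'').hexdigest().startswith('a7ffc6f8')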
# This is my code
from sortedcontainers import SortedList, SortedSet, SortedDict
import timeit
import random

def test_speed1(data):
    SortedList(data)

def test_speed2(data):
    sorted_data = SortedList()
    for val in data:
        sorted_data.add(val)

data = []
numpts = 10 ** 5
for i in range(numpts):
    data.append(random.random())
print(f'Num of pts:{len(data)}')
sorted_data = SortedList()
n_runs=10
result = timeit.timeit(stmt='test_speed1(data)', globals=globals(), number=n_runs)
print(f'Speed1 is {1000*result/n_runs:0.0f}ms')
n_runs=10
result = timeit.timeit(stmt='test_speed2(data)', globals=globals(), number=n_runs)
print(f'Speed2 is {1000*result/n_runs:0.0f}ms')
The code for test_speed2 is supposed to take ~12 ms (I checked the setup they report). Why does it take 123 ms (10x slower)?
test_speed1 runs in 15 ms (which makes sense)
I am running in Conda.
This is where they outline the performance benchmarks:
https://grantjenks.com/docs/sortedcontainers/performance.html
You are presumably not executing your benchmark under the same conditions as they do:
you are not using the same benchmark code,
you don't use the same computer with the same performance characteristics,
you are not using the same Python version and environment,
you are not running the same OS,
etc.
Hence, the benchmark results are not comparable and you cannot conclude anything about the performance (and certainly not that "sortedcontainers is too slow").
Performance is only relative to a given execution context, and they only stated that their solution is faster relative to competing solutions.
If you really wish to execute the benchmark on your computer, follow the instructions they give in the documentation.
"init() uses Python’s highly optimized sorted() function while add() cannot.". This is why the speed2 is faster than the speed3.
This is the answer I got from the developers of the sortedcontainers library.
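Following that explanation, a small sketch of the bulk-loading options (my illustration, not the developers' code); update() is the documented way to add many items at once, and for large batches it can take the same sort-once path as the constructor:

from sortedcontainers import SortedList
import random

data = [random.random() for _ in range(10 ** 5)]

# Fast path: the constructor sorts the whole input once with sorted().
sl = SortedList(data)

# Bulk insert into an existing list; for large batches update()
# can also re-sort in one pass rather than inserting one by one.
sl2 = SortedList()
sl2.update(data)

# Slow path from the question: pays the per-element insertion cost.
sl3 = SortedList()
for val in data:
    sl3.add(val)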
When using the hypothesis library and performing stateful testing, how can I see or output the Bundle "services" the library is trying on my code?
Example
import hypothesis.strategies as st
from hypothesis.strategies import integers
from hypothesis.stateful import Bundle, RuleBasedStateMachine, rule, precondition

class test_servicediscovery(RuleBasedStateMachine):
    services = Bundle('services')

    @rule(target=services, s=st.integers(min_value=0, max_value=2))
    def add_service(self, s):
        return s
The question is: how do I print / see the Bundle "services" variable, generated by the library?
In the example you've given, the services bundle isn't being tried on your code - you're adding things to it, but never using them as inputs to another rule.
If you do use them as inputs to another rule, running Hypothesis in verbose mode will show all inputs as they happen; and even in normal mode, failing examples will print all the values used.
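A minimal sketch of both pieces (the ServiceMachine and use_service names are illustrative): a second rule draws from the bundle, and verbose settings make every step visible:

import hypothesis.strategies as st
from hypothesis import settings, Verbosity
from hypothesis.stateful import Bundle, RuleBasedStateMachine, rule

class ServiceMachine(RuleBasedStateMachine):
    services = Bundle('services')

    @rule(target=services, s=st.integers(min_value=0, max_value=2))
    def add_service(self, s):
        return s

    # Drawing from the bundle makes its values show up as rule inputs.
    @rule(s=services)
    def use_service(self, s):
        print('drew service:', s)

# Verbose mode prints each step, including the bundle values drawn.
TestServices = ServiceMachine.TestCase
TestServices.settings = settings(verbosity=Verbosity.verbose)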
I have a piece of code which computes the Helmholtz-Hodge Decomposition.
I've been running it on Mac OS Yosemite and it was working just fine. A month ago, however, my Mac got pretty slow (it was really old), and I opted to buy a new notebook (Windows 8.1, Dell).
After installing all the Python libs and so on, I continued my work by running this same code (versioned in Git). And then the result was pretty weird, completely different from the one obtained on the old notebook.
For instance, what I do is construct two matrices a and b (a really long calculation) and then call the solver:
s = numpy.linalg.solve(a, b)
This returned a wrong result for s, different from the one obtained on my Mac (which was right).
Then, I tried to use:
s = scipy.linalg.solve(a, b)
And the program exits with code 0, but in the middle of execution.
Then, I just made a simple test of:
print 'here1'
s = scipy.linalg.solve(a, b)
print 'here2'
And here2 is never printed.
I tried:
print 'here1'
x, info = numpy.linalg.cg(a, b)
print 'here2'
And the same happens.
I also tried to check the solution after using numpy.linalg.solve:
print numpy.allclose(numpy.dot(a, s), b)
And I got a False (?!).
I don't know what is happening or how to find a solution; I just know that the same code runs on my Mac, but it would be very good if I could run it on other platforms. Now I'm stuck on this problem (I don't have a Mac anymore) and have no clue about the cause.
The weirdest thing is that I don't receive any error or runtime warning, no feedback at all.
Thank you for any help.
EDIT:
NumPy test suite results: (screenshot not reproduced)
SciPy test suite results: (screenshot not reproduced)
Download the Anaconda package manager:
http://continuum.io/downloads
When you download this it will already have all the dependencies for numpy worked out for you. It installs locally and will work on most platforms.
This is not really an answer, but this blog discusses at length the problems of having a numpy ecosystem that evolves fast, at the expense of reproducibility.
By the way, which version of numpy are you using? The documentation for the latest 1.9 does not list any method called cg like the one you use; cg, the conjugate-gradient solver, lives in scipy.sparse.linalg, not numpy.linalg.
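For reference, a minimal sketch of that call signature, assuming SciPy is installed (the matrix construction is illustrative; cg expects a symmetric positive-definite system):

import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.RandomState(0)
m = rng.random_sample((100, 100))
a = m.dot(m.T) + 100 * np.eye(100)  # symmetric positive-definite by construction
b = rng.random_sample(100)

x, info = cg(a, b)  # info == 0 means the solver converged
print(np.allclose(a.dot(x), b, atol=1e-4))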
I suggest the use of the following example so that you (and others) can check the results:
>>> import numpy as np
>>> import scipy.linalg
>>> np.random.seed(123)
>>> a = np.random.random(size=(10000, 10000))
>>> b = np.random.random(size=(10000,))
>>> s_np = np.linalg.solve(a, b)
>>> s_sc = scipy.linalg.solve(a, b)
>>> np.allclose(s_np, s_sc)
>>> s_np
array([-15.59186559, 7.08345804, 4.48174646, ..., -16.43310046,
-8.81301553, -10.77509242])
I hope you can find the answer. One option in the future is to create an isolated, reproducible environment for each of your projects using Docker; this allows easy portability.
See a great article here discussing Docker for research.
I have a Go program
package main

import (
    "crypto/hmac"
    "crypto/sha1"
    "fmt"
)

func main() {
    val := []byte("nJ1m4Cc3")
    hasher := hmac.New(sha1.New, val)
    fmt.Printf("%x\n", hasher.Sum(nil))
    // f7c0aebfb7db2c15f1945a6b7b5286d173df894d
}
And a Python (2.7) program that is attempting to reproduce the Go code (using crypto/hmac)
import hashlib
val = u'nJ1m4Cc3'
hasher = hashlib.new("sha1", val)
print hasher.hexdigest()
# d67c1f445987c52bceb8d6475c30a8b0e9a3365d
Using the hmac module gives me a different result but still not the same as the Go code.
import hmac
val = 'nJ1m4Cc3'
h = hmac.new("sha1", val)
print h.hexdigest()
# d34435851209e463deeeb40cba7b75ef
Why do these print different values when they use the same hash on the same input?
You have to make sure that
the input in both scenarios is equivalent and that
the processing method in both scenarios is equivalent.
In both cases, the input should be the same binary blob. In your Python program you define a unicode object, and you do not take control of its binary representation. Replace the u prefix with a b and you are fine (this is the explicit way to define a byte sequence in Python 2.7 and 3). This is not the actual problem, but it is better to be explicit here.
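A quick illustration of that distinction in Python 2.7:

val_u = u'nJ1m4Cc3'  # unicode object; its byte representation depends on the encoding
val_b = b'nJ1m4Cc3'  # explicit byte sequence -- what the hash function should consume

print type(val_u)  # <type 'unicode'>
print type(val_b)  # <type 'str'> (a byte string in Python 2.7)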
The problem is that you apply different methods in your Go and Python implementations.
Given that Python is the reference
Your Python snippet just builds a SHA1 hash of your data, so in Go there is no need to import "crypto/hmac" at all. The equivalent would be:
package main

import (
    "crypto/sha1"
    "fmt"
)

func main() {
    data := []byte("nJ1m4Cc3")
    fmt.Printf("%x", sha1.Sum(data))
}
Test and output:
go run hashit.go
d67c1f445987c52bceb8d6475c30a8b0e9a3365d
This reproduces what your first Python snippet creates.
Edit: I have simplified the Go code a bit, to not make Python look more elegant. Go is quite elegant here, too :-).
Given that Go is the reference
import hmac
import hashlib
data = b'nJ1m4Cc3'
h = hmac.new(key=data, digestmod=hashlib.sha1)
print h.hexdigest()
Test & output:
python hashit.py
f7c0aebfb7db2c15f1945a6b7b5286d173df894d
This reproduces what your Go snippet creates. I am, however, not sure about the cryptographic significance of an HMAC when one uses an empty message.
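For completeness, the usual HMAC pattern feeds both a key and a message; a minimal Python sketch (the message value is illustrative):

import hmac
import hashlib

key = b'nJ1m4Cc3'
msg = b'payload to authenticate'

# hmac.new(key, msg, digestmod) keys the hash with `key` and digests `msg`;
# the Go counterpart would call hasher.Write(msg) before hasher.Sum(nil).
h = hmac.new(key=key, msg=msg, digestmod=hashlib.sha1)
print h.hexdigest()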
How do I generate a unique session id in Python?
UPDATE: 2016-12-21
A lot has happened in the last ~5 years. /dev/urandom has been updated and is now considered a high-entropy source of randomness on modern Linux kernels and distributions. In the last 6 months we've seen entropy starvation on a Linux 3.19 kernel using Ubuntu, so I don't think this issue is "resolved", but it's sufficiently difficult to end up with low-entropy randomness when asking for any amount of randomness from the OS.
I hate to say this, but none of the other solutions posted here are correct with regards to being a "secure session ID."
# pip install M2Crypto
import base64, M2Crypto
def generate_session_id(num_bytes=16):
    return base64.b64encode(M2Crypto.m2.rand_bytes(num_bytes))
Neither uuid() nor os.urandom() is a good choice for generating session IDs. Both may generate random results, but random does not mean secure, due to poor entropy. See "How to Crack a Linear Congruential Generator" by Haldir or NIST's resources on Random Number Generation. If you still want to use a UUID, then use one that was generated with a good initial random number:
import uuid, M2Crypto
uuid.UUID(bytes=M2Crypto.m2.rand_bytes(16))
# UUID('5e85edc4-7078-d214-e773-f8caae16fe6c')
or:
# pip install pyOpenSSL
import uuid, OpenSSL
uuid.UUID(bytes=OpenSSL.rand.bytes(16))
# UUID('c9bf635f-b0cc-d278-a2c5-01eaae654461')
M2Crypto is the best OpenSSL API available in Python at the moment; pyOpenSSL appears to be maintained only to support legacy applications.
You can use the uuid library like so:
import uuid
my_id = uuid.uuid1() # or uuid.uuid4()
Python 3.6 makes most other answers here a bit out of date. Versions including 3.6 and beyond include the secrets module, which is designed for precisely this purpose.
If you need to generate a cryptographically secure string for any purpose on the web, refer to that module.
https://docs.python.org/3/library/secrets.html
Example:
import secrets
def make_token():
    """
    Creates a cryptographically-secure, URL-safe string.
    """
    return secrets.token_urlsafe(16)
In use:
>>> make_token()
'B31YOaQpb8Hxnxv1DXG6nA'
import os, base64
def generate_session():
    return base64.b64encode(os.urandom(16))
It can be as simple as creating a random number. Of course, you'd have to store your session IDs in a database or something and check each one you generate to make sure it's not a duplicate, but odds are it never will be if the numbers are large enough.
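A minimal sketch of that approach, with an in-memory set standing in for the database uniqueness check:

import os, base64

seen = set()  # stands in for your session-ID table in the database

def generate_session():
    while True:
        sid = base64.b64encode(os.urandom(16))
        if sid not in seen:  # with 16 random bytes this loop virtually never repeats
            seen.add(sid)
            return sid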