I tried https://github.com/spaolacci/murmur3, roberson-io/mmh3, and several other Go implementations, but I get a different result from Python's mmh3.
Python example:
import mmh3
print(mmh3.hash("foods",45))
Go example:
package main

import (
    "fmt"

    "github.com/spaolacci/murmur3"
)

func main() {
    mHash := murmur3.New32WithSeed(45)
    mHash.Write([]byte("foods"))
    hashNum := mHash.Sum32()
    fmt.Println(hashNum)
    fmt.Printf("%d\n", murmur3.Sum64WithSeed([]byte("foods"), 45))
}
I want to get the same hash value as Python's mmh3 produces. How can I do that in Go?
All those hash methods return unsigned integers in Go:
func (Hash32) Sum32() uint32
func Sum64WithSeed(data []byte, seed uint32) uint64
While in Python they return signed integers by default:
hash(key[, seed=0, signed=True]) -> hash value
Return a 32 bit integer.
If you want the same decimal number, you either have to return an unsigned int from Python:
# 3049460612
print(mmh3.hash("foods",45,signed=False))
Or a signed int from Go:
mHash := murmur3.New32WithSeed(45)
mHash.Write([]byte("foods"))
hashNum := mHash.Sum32()
// -1245506684
fmt.Println(int32(hashNum))
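If you already have one representation and want the other, the two's-complement conversion is a one-liner in Python (a small sketch using the values from this example):

unsigned = 3049460612
signed = unsigned - 2**32 if unsigned >= 2**31 else unsigned
print(signed)          # -1245506684
print(signed % 2**32)  # 3049460612, back to unsigned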
I converted the Go code below to C and called it from Python, but instead of returning a number close to the one I passed in, the function returns a very different number.
main.py
import ctypes
library = ctypes.cdll.LoadLibrary('./maintain.so')
hello_world = library.helloWorld
numb = 5000000000
n = ctypes.c_int64(numb)
x = hello_world(n)
print(x)
Returned number: 705032703
The Go code that I converted to C:
main.go
package main

import "C"

//export helloWorld
func helloWorld(x int64) int64 {
    s := int64(1)
    for i := int64(1); i < x; i++ {
        s = i
    }
    return s
}

func main() {} // required for -buildmode=c-shared
You're making the mistake 99% of new ctypes users make: not declaring the argument types and return type of the function you call. Unless told otherwise, ctypes assumes c_int for scalar arguments, c_void_p for pointer arguments, and c_int for the return type. If you declare them, you don't have to wrap every parameter in the type you want to pass, because ctypes will already know.
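You can see the silent truncation the default c_int causes without any shared library (a quick sketch):

import ctypes

# 5_000_000_000 does not fit in 32 bits; ctypes keeps only the low 32 bits
print(ctypes.c_int(5_000_000_000).value)  # 705032704 == 5_000_000_000 % 2**32
# the Go loop then returns x - 1, hence the observed 705032703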
I'm not set up for Go, but here's a simple C implementation of the function with a 64-bit argument and return type:
#include <stdint.h>

#ifdef _WIN32
#  define API __declspec(dllexport)
#else
#  define API
#endif

API int64_t helloWorld(int64_t x) {
    return x + 1;
}
The Python code to call it:
import ctypes as ct
dll = ct.CDLL('./test')
dll.helloWorld.argtypes = ct.c_int64, # sequence of argument types
dll.helloWorld.restype = ct.c_int64 # return type
# Note you don't have to wrap the argument, e.g. c_int64(5000000000).
print(dll.helloWorld(5_000_000_000))
Output:
5000000001
I am translating NodeJS source code to Python, but there is a function, readUInt32BE, that I don't quite understand.
Original Code
const buf = Buffer.from("vgEAAwAAAA1kZXYubG9yaW90LmlvzXTUl6ESlOrvJST-gsL_xQ==", 'base64');
const appId = parseInt(buf.slice(0, 4).toString('hex'), 16);
const serverIdLength = buf.slice(4, 8).readUInt32BE(0);
Here is what I have tried so far in Python
encodeToken = base64.b64decode("vgEAAwAAAA1kZXYubG9yaW90LmlvzXTUl6ESlOrvJST-gsL_xQ==")
appId = encodeToken[:4]
appId = appId.hex()
serverIdLength = ......
If possible, can you write a function that works the same as readUInt32BE(0) and explain it for me? Thanks.
I'm assuming from the name that the function interprets an arbitrary sequence of 4 bytes as an unsigned 32-bit (big-endian) integer.
The corresponding Python function would be struct.unpack with an appropriate format string.
import struct

appId = struct.unpack(">I", encodeToken[:4])[0]
serverIdLength = struct.unpack(">I", encodeToken[4:8])[0]
# ">" means "big-endian"
# "I" means 4-byte unsigned integer
No need to get a hex representation of the bytes first. unpack always returns a tuple, even if the format string produces only one value, so you need to take the first element of that tuple as the final value.
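As an alternative to struct, Python 3's int.from_bytes does the same conversion (a minimal equivalent of readUInt32BE):

serverIdLength = int.from_bytes(encodeToken[4:8], byteorder="big")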
I'm writing a Python 3 script with some computationally heavy sections in C, using the Python C API. When dealing with int64s, I can't figure out how to ensure that an input number is an unsigned int64, that is, to reject values smaller than 0. As the official documentation suggests, I'm using PyArg_ParseTuple() with the format unit K, which does not check for overflow. Here is my C code:
static PyObject* from_uint64(PyObject* self, PyObject* args){
    uint64_t input;
    if (!PyArg_ParseTuple(args, "K", &input)){
        return PyErr_Format(PyExc_ValueError, "Wrong input: expected unsigned 64-bit integer.");
    }
    /* ... */
    return NULL;
}
However, calling the function with a negative argument raises no error, and the input number is cast to unsigned: e.g., from_uint64(-1) results in input = 2^64-1, as expected, since there's no overflow check.
What would be the correct way of determining whether the input number is negative, possibly before parsing it?
You should use
unsigned long long input = PyLong_AsUnsignedLongLong(args);
You can then check with
if (PyErr_Occurred()) {
// handle out of range here
}
if the number was unsuitable for an unsigned long long.
See also the Python 3 API documentation on Integer Objects
With a little modification to @Ctx's answer:
The solution is to first parse the input as an object (so, not directly from args), then check its type:
static PyObject* from_uint64(PyObject* self, PyObject* args){
    PyObject* input_obj;
    if (!PyArg_ParseTuple(args, "O", &input_obj)){
        return PyErr_Format(PyExc_TypeError, "Wrong input: expected a Python object.");
    }
    unsigned long long input = PyLong_AsUnsignedLongLong(input_obj);
    if (input == (unsigned long long)-1 && PyErr_Occurred()) {
        PyErr_Clear();
        return PyErr_Format(PyExc_TypeError,
                            "Parameter must be an unsigned integer type, but got %s",
                            Py_TYPE(input_obj)->tp_name);
    }
    /* ... use input here ... */
    return PyLong_FromUnsignedLongLong(input);
}
This code, as expected, works on any input in [0, 2^64-1] and raises an error on integers outside those bounds as well as on illegal types like float or string.
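For illustration, calling it might look like this (hypothetical usage; "myext" is an assumed name for the compiled extension module):

import myext                   # hypothetical module name

myext.from_uint64(2**64 - 1)   # accepted: fits in an unsigned 64-bit integer
myext.from_uint64(-1)          # raises TypeError
myext.from_uint64(1.5)         # raises TypeError (not an integer)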
I know Python doesn't have unsigned types, but I need to convert one from a program that embeds Python (Blender) to a Win32 application written in C++. I know I can convert an integer like so:
>>> i = -1
>>> i + 2**32
4294967295
How can I take a float like 0.2345 and convert it to a long type? I need to convert it to a long in Python and then back to a float in Win32 (C++).
Typically in C++ it is done with:
float f = 0.2345f;
DWORD dw = *reinterpret_cast<DWORD*>(&f);
This produces an unsigned long, and converting it back is simply the reverse:
FLOAT f = *reinterpret_cast<FLOAT*>(&dw);
You can use struct.pack and struct.unpack for this. Note, though, that it is not a cast (i.e., a reinterpretation of the same memory) but a conversion (a copy into a new piece of memory).
import struct

def to_float(int_):
    # reinterpret the 8 bytes of a signed 64-bit int as a double
    return struct.unpack('d', struct.pack('q', int_))[0]

def to_long(float_):
    # reinterpret the 8 bytes of a double as a signed 64-bit int
    return struct.unpack('q', struct.pack('d', float_))[0]

data = 0.2345
long_data = to_long(data)       # 4597616773191482474
new_data = to_float(long_data)  # 0.2345
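Note that 'q'/'d' round-trip a 64-bit double. To mirror the C++ float/DWORD pair exactly, you would use the 4-byte format codes instead (a sketch along the same lines, not taken from the original answer):

import struct

def float32_to_dword(f):
    # reinterpret the 4 bytes of an IEEE 754 float32 as an unsigned 32-bit int
    return struct.unpack('<I', struct.pack('<f', f))[0]

def dword_to_float32(dw):
    return struct.unpack('<f', struct.pack('<I', dw))[0]

dw = float32_to_dword(0.2345)
assert abs(dword_to_float32(dw) - 0.2345) < 1e-7  # round-trips within float32 precision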
i = 0.2345
converted = long(i)
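Note that long() simply truncates the fractional part; it does not reinterpret the bit pattern the way the reinterpret_cast in the question does:

>>> long(0.2345)  # int(0.2345) on Python 3
0L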
I am trying to implement a function in C (extending Python) that returns a numpy.float32. Is it possible to create such an object and return it, so that in Python the object returned from calling the function is an instance of numpy.float32?
(C Extension)
PyObject *numpyFloatFromFloat(float d)
{
    PyObject *ret = SomeAPICall(d);  /* placeholder for the API call in question */
    return ret;
}
(in python)
a = something_special()
type(a)
-> numpy.float32
Right now, all the examples I've found in the reference documentation illustrate how to make an array, which yields a numpy.ndarray, and so far using the scalar data types yields a C float that converts to a Python double. And for reasons beyond my control, I really need an actual IEEE 754 float32 at the end of this function.
Solution thus far:
something.pyx:
import numpy

cdef extern from "float_gen.h":
    float special_action()

def numpy_float_interface():
    return numpy.float32(special_action())
float_gen.h
static inline float special_action() { return 1.0; }
I don't see any loss of data here, but I can't be certain. I know a numpy.float32 is treated as a C float (float32_t), so assuming the call to special_action in the .pyx file doesn't promote the result to a double (as Python would), it should be lossless.
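A quick way to convince yourself the round-trip is lossless (a sketch checking it from Python with numpy directly):

import numpy as np

v = np.float32(0.1)
assert np.float32(float(v)) == v  # float32 -> double -> float32 is exact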
Edit
The ultimate solution was very different; I just had to understand how to properly extend Python in C with the numpy library.
The function below just returns np.float32(32):
static PyObject *get_float(PyObject *self, PyObject *args) {
    float v = 32;
    PyObject *np_float32_val = NULL;
    PyArray_Descr *descr = NULL;

    if (!PyArg_ParseTuple(args, ""))
        return NULL;

    /* Descriptor for a 32-bit IEEE 754 float */
    if (!(descr = PyArray_DescrFromType(NPY_FLOAT32))) {
        PyErr_SetString(PyExc_TypeError, "Improper descriptor");
        return NULL;
    }

    /* Build a numpy array scalar (np.float32) from the raw C float */
    np_float32_val = PyArray_Scalar(&v, descr, NULL);
    printf("%zd\n", np_float32_val->ob_refcnt);
    return np_float32_val;
}
This simple module returns np.float32 from a C float. The cdef float isn't really necessary, since np.float32() will coerce whatever you give it to a np.float32.
test_mod.pyx
import numpy as np

def func():
    cdef float a
    a = 1
    return np.float32(a)
tester.py
import pyximport
pyximport.install()
import test_mod
a = test_mod.func()
print type(a) # <type 'numpy.float32'>