I'm aware I could just do 0x0 - 9223372036854775807 - 1, but is there a bit shift operation I could do instead to make this faster? Context: I'm fed a uint64 number in hex string form but I want to store this number inside an 8 byte signed integer attr in PostgreSQL. Also, I would need a way to convert it back from signed integer to unsigned hex
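One possible approach, sketched under the assumption that the value should be reinterpreted as two's complement (so that hex ffffffffffffffff is stored as -1). The function names are hypothetical, chosen only for illustration:

```python
MASK64 = (1 << 64) - 1   # all 64 bits set
SIGN_BIT = 1 << 63       # weight of the top bit

def hex_u64_to_i64(hex_str):
    """Parse an unsigned 64-bit hex string and reinterpret it as signed."""
    value = int(hex_str, 16)
    if not 0 <= value <= MASK64:
        raise ValueError("value does not fit in 64 bits")
    # Flip the top bit, then subtract its weight (two's-complement reinterpretation)
    return (value ^ SIGN_BIT) - SIGN_BIT

def i64_to_hex_u64(value):
    """Map a signed 64-bit integer back to its 16-digit unsigned hex form."""
    if not -SIGN_BIT <= value < SIGN_BIT:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Masking with 64 one-bits yields the unsigned two's-complement pattern
    return format(value & MASK64, "016x")
```

The signed result fits an 8-byte signed column, and masking with MASK64 recovers the original unsigned value on the way back.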
I'm looking to get a hash value for string and integer inputs.
Using murmurhash3, I'm able to do it for strings but not integers:
pip install murmurhash3
import mmh3
mmh3.hash(34)
Returns the following error:
TypeError: a bytes-like object is required, not 'int'
I could convert it to bytes like this:
mmh3.hash(bytes(34))
But then I'll get an error message if the input is string
How do I overcome this without converting the integer to string?
You can't. Or more precisely, you need to convert it to bytes or str in some way, but it needn't be a human-readable text form like b'34'/'34'. A common approach on Python 3 would be:
my_int = 34 # Or some other value
my_int_as_bytes = my_int.to_bytes((my_int.bit_length() + 7) // 8, 'little')
which makes a minimalist raw bytes representation of the original int (regardless of length); for 34, you'd get b'"' (because it only takes one byte to store it, so you're basically getting a bytes object with its ordinal value), but for larger ints it still works (unlike mucking about with chr), and it's always as small as possible (getting 8 bits of data per byte, rather than a titch over 3 bits per byte as you'd get converting to a text string).
If you're on Python 2 (WHY?!? It's been end-of-life for nearly a year), int.to_bytes doesn't exist, but you can fake it with moderate efficiency in various ways, e.g. (only handling non-negative values, unlike to_bytes which handles signed values with a simple flag):
from binascii import unhexlify
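hex_digits = '%x' % (my_int,)
# unhexlify needs an even number of hex digits, so left-pad with '0' when needed
my_int_as_bytes = unhexlify('0' * (len(hex_digits) % 2) + hex_digits)  # big-endian byte order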
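To accept both strings and integers in one place, something along these lines might work on Python 3. This is only a sketch: the helper name to_hashable_bytes is hypothetical, and it uses signed=True so that negative integers are handled too.

```python
def to_hashable_bytes(value):
    """Return a bytes representation of a str, bytes, or int input."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, int):
        # One extra bit leaves room for the sign; always at least one byte
        length = (value.bit_length() + 8) // 8
        return value.to_bytes(length, "little", signed=True)
    raise TypeError("unsupported type: %r" % type(value))

# Intended use (requires the mmh3 package): mmh3.hash(to_hashable_bytes(34))
```

Note that with this scheme the integer 34 and the string '"' produce the same bytes, and therefore the same hash. If that matters, prefix the bytes with a type tag.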
I am trying to implement the OS2IP algorithm in Python. However I do not know how I can convert a character string, say "Men of few words are the best men." into the octet format.
Use the .encode() method of str. For example:
"öä and ü".encode("utf-8")
displays
b'\xc3\xb6\xc3\xa4 and \xc3\xbc'
If you then want to convert this to an int, you can just use the int.from_bytes() method, e.g.
the_bytes = "öä and ü".encode("utf-8")
the_int = int.from_bytes(the_bytes, 'big')
print(the_int)
displays
236603614466389086088250300
In preparing for RSA encryption, a padding algorithm is typically applied to the result of the first encoding step to pad the byte array out to the size of the RSA modulus, and then the padded byte array is converted to an integer. This padding step is critical to the security of RSA cryptography.
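As a rough sketch, OS2IP and its inverse I2OSP (as defined in RFC 8017) map directly onto int.from_bytes and int.to_bytes with big-endian order. The function names below are just illustrative, and no padding is applied here:

```python
def os2ip(octets):
    """Octet string to non-negative integer (big-endian)."""
    return int.from_bytes(octets, "big")

def i2osp(x, length):
    """Non-negative integer to an octet string of the given length."""
    if x < 0 or x >= 256 ** length:
        raise ValueError("integer too large")
    return x.to_bytes(length, "big")

message = "Men of few words are the best men.".encode("utf-8")
as_int = os2ip(message)
# Converting back with the same length recovers the original octets
assert i2osp(as_int, len(message)) == message
```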
I am trying to set up inter-process communication between a Python program and a C program via Winsock sockets. Sending a string works, but now I am trying to send an int array from the C socket to the Python socket.
I already found out that I have to use htonl() to convert the int array into a byte stream as the send function of winsock2 cannot send int arrays directly.
Now I want to use ntohl() in the python socket but the receive function returns bytes whereas ntohl() needs an integer value as input.
Here is my code
C-Side (just relevant parts):
uint32_t a[1] = {1231};
uint32_t a_converted[1]={0};
a_converted[0] = htonl(a[0]);
iResult = send( ConnectSocket, ( char *) a_converted, sizeof( a_converted), 0 );
Python Side (just relevant parts):
data = connection.recv(16)
data_i = socket.ntohl(data)
What you received is a string of bytes. Didn't ntohl raise an exception?
You can use the struct module to unpack it. For 16 bytes:
struct.unpack('!4I', data)
This means: unpack 4 unsigned 32-bit integers in network byte order.
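Since recv may return fewer bytes than requested, a slightly more general sketch derives the count from the data actually received. The helper name decode_uint32_array is made up for this example:

```python
import struct

def decode_uint32_array(data):
    """Unpack as many network-order unsigned 32-bit integers as data holds."""
    count = len(data) // 4
    # '!' selects network (big-endian) byte order, 'I' is an unsigned 32-bit int
    return struct.unpack("!%dI" % count, data[:count * 4])

# What the C side would send for htonl(1231)
wire = struct.pack("!I", 1231)
print(decode_uint32_array(wire))
```

On the Python side this would be called as decode_uint32_array(connection.recv(16)).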
RTM
(I cannot test it - try it on your own)
EDIT:
Oops, I did not read your comment through. According to the socket docs, recv should return an object of type bytes. If it returns an object of type str, you should convert it to bytes; in Python 3 that would be data.encode()
PS Which Python are you on?
You said you have managed to send strings over the connection. I assume you sent a char* and received it in python as a string. What you have done is sent a stream of bytes.
Now you want to send an array of integers. In the memory, the integers are again stored as bytes.
Each integer could occupy 4 or 8 bytes. You can check this beforehand by printing
printf("Size of integer is %zu", sizeof(int));
Okay, great now we know how many bytes we need to send. Say it is 4 for now.
We also need to know the endianness of the integers, but let's assume little endian for now.
This means the least significant byte will be first and the most significant byte at the end.
So now you can send the integer array exactly like you sent the string, by casting the array to char* and sending sizeof(array).
On the receiving side though, you just have a stream of bytes. To convert it to array of integers you need to get 4 bytes at a time and combine it into an integer.
We can do that as follows.
Say there are total 10 integers. You have to pass this information on separately somehow.
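data = connection.recv(10*4)  # may return fewer bytes; loop until all have arrived
array = []
for i in range(10):
    # Combine 4 bytes into one integer, least significant byte first
    x = data[i*4+0]
    x += data[i*4+1] << 8
    x += data[i*4+2] << 16
    x += data[i*4+3] << 24
    array.append(x)
print(array)
And you will be able to see your array of integers.
In Python 3, indexing a bytes object already gives an integer. In Python 2, recv returns a str, so each byte has to be wrapped in ord, which converts a character to its integer value.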
Side notes:
Now, if the size of an integer on your system is 8 instead of 4, you need to extend the body of the loop in Python: the shifts will go up to 56, and each index into the received bytes will be i*8+...
Similarly, if the endianness is different, the order of the elements will change. Basically the indices into the received bytes will go from i*4+3 down to i*4+0.
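Both side notes can be folded into one small helper built on int.from_bytes (Python 3), which takes the integer size and byte order as parameters. This is only a sketch, and decode_ints is a hypothetical name:

```python
def decode_ints(data, size=4, byteorder="little"):
    """Split data into size-byte chunks and decode each as an unsigned integer."""
    return [
        int.from_bytes(data[i:i + size], byteorder)
        # Stop before any incomplete trailing chunk
        for i in range(0, len(data) - size + 1, size)
    ]

# Two little-endian 32-bit integers: 1231 and 1
print(decode_ints(b"\xcf\x04\x00\x00\x01\x00\x00\x00"))
```

Passing byteorder="big" or size=8 covers the other cases without rewriting the loop.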
I need to get an int through the network. Is this the proper way to convert to bytes in big-endian?
pack("I",socket.htonl(integer_value))
I unpack it as:
socket.ntohl(unpack("I",data)[0])
I noticed that pack-unpack also have the <> to use for endian conversion so I am not sure if I could just directly use that instead or if htonl is safer.
You should use only the struct module for communicating with another system. By using the htonl first, you'll end up with an indeterminate order being transmitted.
Since you need to convert the integer into a string of bytes in order to send it to another system, you'll need to use struct.pack (because htonl just returns a different integer than the one passed as argument and you cannot directly send an integer). And in using struct.pack you must choose an endianness for that string of bytes (if you don't specify one, you'll get a default ordering which may not be the same on the receiving side so you really need to choose one).
Converting an integer to a sequence of bytes in a definite order is exactly what struct.pack("!I", integer_value) does and a sequence of bytes in a definite order is exactly what you need on the receiving end.
On the other hand, if you use struct.pack("!I", socket.htonl(integer_value)), what does that do? Well, first it puts the integer into big-endian order (network byte order), then it takes your already big-endian integer and converts it to bytes in "big-endian order". But, on a little endian machine, that will actually reverse the ordering again, and you will end up transmitting the integer in little-endian byte order if you do both those two operations.
But on a big-endian machine htonl is a no-op, and then you're converting the result into bytes in big-endian order.
So using htonl actually defeats the purpose, and a receiving machine would have to know the byte order used on the sending machine in order to properly decode it. Observe...
Little-endian box:
>>> print(socket.htonl(27))
452984832
>>> print(struct.pack("!I", 27))
b'\x00\x00\x00\x1b'
>>> print(struct.pack("!I", socket.htonl(27)))
b'\x1b\x00\x00\x00'
Big-endian box:
>>> print(socket.htonl(27))
27
>>> print(struct.pack("!I", 27))
b'\x00\x00\x00\x1b'
>>> print(struct.pack("!I", socket.htonl(27)))
b'\x00\x00\x00\x1b'
struct.unpack() uses '!' in the format specifiers for network byte order. But it's the same as '>'...
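A minimal round trip using only struct, with no htonl or ntohl involved, could look like this. The two function names are illustrative only:

```python
import struct

def encode_uint32(value):
    """Integer to 4 bytes in network (big-endian) order."""
    return struct.pack("!I", value)

def decode_uint32(data):
    """4 bytes in network order back to an integer."""
    return struct.unpack("!I", data)[0]

wire = encode_uint32(27)
print(wire)                 # same bytes regardless of the host's endianness
print(decode_uint32(wire))
# '!' and '>' produce identical bytes for this format
assert struct.pack("!I", 27) == struct.pack(">I", 27)
```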
I'm fetching a value (2 bytes) from a register using the AARDVARK Python module. The value is returned in hex format, so I convert it to decimal. Since the register value can be negative, I need to convert it to a signed integer if required.
I came across the following piece of code that does this and I'm unable to fathom the logic behind it.
myValue = int(myValue, 16)
if (myValue > 32768):
    myValue = ((myValue + 0x8000) & 0xFFFF) - 0x8000
For example if myValue read from register is 32769 , the corresponding signed representation after using the above piece of code is -32767
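The expression can be read as a 16-bit two's-complement reinterpretation: adding 0x8000 and masking with 0xFFFF shifts the range so that values with the top bit set wrap around to the low end, and subtracting 0x8000 shifts it back. A small sketch (the function name to_signed16 is made up here) shows that the formula also works without the if, which as written skips the boundary value 32768:

```python
def to_signed16(raw):
    """Reinterpret an unsigned 16-bit value (0..65535) as signed."""
    # Values below 0x8000 come back unchanged; values from 0x8000 up
    # overflow past 0xFFFF, lose the carry in the mask, and end up negative.
    return ((raw + 0x8000) & 0xFFFF) - 0x8000

for raw in (0, 32767, 32768, 32769, 65535):
    print(raw, to_signed16(raw))
```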