Integer to Unique String

There's probably someone else who asked a similar question, but I didn't take much time to search for this, so just point me to it if someone's already answered this.
I'm trying to take an integer (or long) and turn it into a string, in a very specific way.
The goal is essentially to split the integer into 8-bit segments, then take each of those segments and get the corresponding ASCII character for that chunk, then glue the chunks together.
This is easy to implement, but I'm not sure I'm going about it in the most efficient way.
>>> def stringify(integer):
...     output = ""
...     while integer != 0:
...         # Take the low 8 bits on every pass, then shift them away.
...         output += chr(integer & 255)
...         integer = integer >> 8
...     return output
>>> stringify(10)
'\n'
>>> stringify(10 << 8 | 10)
'\n\n'
>>> stringify(32)
' '
Is there a more efficient way to do this?
Is this built into Python?
EDIT:
Also, as this will be run sequentially in a tight loop, is there some way to streamline it for such use?
>>> for n in xrange(1000):  ## EXAMPLE!
...     print stringify(n)
...

struct can easily do this for integers up to 64 bits in size. Any larger will require you to carve the number up first.
>>> import struct
>>> struct.pack('>Q', 12345678901234567890)
'\xabT\xa9\x8c\xeb\x1f\n\xd2'
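For integers wider than 64 bits, Python 3's int.to_bytes may save you from carving the number up by hand. The following is a minimal sketch, assuming Python 3 and a non-negative input; the name int_to_bytes is made up for illustration. It emits the low byte first, which is the order the loop in the question uses, and it returns bytes rather than str.

```python
def int_to_bytes(n):
    """Split a non-negative int into 8-bit chunks, low byte first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Bytes needed to hold n; this is 0 for n == 0, which mirrors the
    # empty string that the loop in the question returns for 0.
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "little")


print(int_to_bytes(10))            # b'\n'
print(int_to_bytes(10 << 8 | 10))  # b'\n\n'
print(int_to_bytes(32))            # b' '
```

Passing "big" instead of "little" gives the same byte order as struct.pack with the '>' prefix, without the fixed 8-byte padding.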

Related

Is there a built-in python function to count bit flip in a binary string?

Given a binary string of arbitrary length, how can I return the number of bit flips in the string? For example, the bit-flip count of '01101' is 3, and that of '1110011' is 2.
The only way I can come up with is a for loop and a counter, but that seems too lengthy. Is there a faster way to do this? Or is there a built-in function in Python that does it directly? Thanks for the help!

There is a very fast way to do that without any explicit loops, using only Python builtins: convert the string to an integer, detect all the bit flips with an XOR-based integer trick, then convert the result back to a string to count the set bits. Here is the code:
# Convert the binary string `s` to an integer: "01101" -> 0b01101
n = int(s, 2)
# Build a binary mask to skip the most significant bit of n: 0b01101 -> 0b01111
mask = (1 << (len(s)-1)) - 1
# Check if the ith bit of n is different from the (i+1)th bit of n using a bit-wise XOR:
# 0b01101 & 0b01111 -> 0b1101 (discard the first bit)
# 0b01101 >> 1 -> 0b0110
# 0b1101 ^ 0b0110 -> 0b1011
bitFlips = (n & mask) ^ (n >> 1)
# Convert the integer back to a string and count the bit flips: 0b1011 -> "0b1011" -> 3
flipCount = bin(bitFlips).count('1')
This trick is much faster than the other methods because integer operations are highly optimized compared to interpreted loop-based code or code that works on iterables. Here are performance results for a string of size 1000 on my machine:
ljdyer's solution: 96 us x1.0
Karl's solution: 39 us x2.5
This solution: 4 us x24.0
If you are working with short bounded strings, then there are even faster ways to count the number of bits set in an integer.
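As a rough sketch, the XOR trick can be wrapped in a function and cross-checked against a plain loop. The names count_bit_flips and count_bit_flips_loop are made up for illustration, and the guard for strings shorter than two characters is an added assumption (int('', 2) would otherwise raise).

```python
def count_bit_flips(s):
    """Count positions where adjacent characters of a binary string differ."""
    if len(s) < 2:
        return 0  # nothing to compare; also avoids int('', 2)
    n = int(s, 2)
    # Mask that drops the most significant bit of the len(s)-bit number.
    mask = (1 << (len(s) - 1)) - 1
    # Bit i of the XOR is set when bit i and bit i+1 of n differ.
    return bin((n & mask) ^ (n >> 1)).count("1")


def count_bit_flips_loop(s):
    """Reference implementation with an explicit loop."""
    return sum(1 for a, b in zip(s, s[1:]) if a != b)


for sample in ("01101", "1110011", "0", "0000", "1010101"):
    assert count_bit_flips(sample) == count_bit_flips_loop(sample)
print(count_bit_flips("01101"), count_bit_flips("1110011"))  # 3 2
```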

I don't know of a built-in function, but here's a one-liner (x0 is the input string):
bit_flip_count = len([x for x in range(1, len(x0)) if x0[x] != x0[x-1]])

Given a sequence of values, you can find the number of times that the value changes by grouping contiguous values and then counting the groups. There will be one more group than the number of changes (since the elements before the first change are also in a group). (Of course, for an empty sequence, this gives you a result of -1; you may want to handle this case separately.)
Grouping in Python is built-in, via the standard library itertools.groupby. This tool only considers contiguous groups, which is often a drawback (if you want to make a histogram, for example, you have to sort the data first) but in our case is exactly what we want. The overall interface of this tool is a bit complex, but in our case we can use it simply:
from itertools import groupby
def changes_in(sequence):
    return len(list(groupby(sequence))) - 1
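Here is one possible variant that handles the empty sequence and avoids building an intermediate list. It is only a sketch: the name count_changes and the choice to return 0 for empty input are assumptions, not part of the answer above.

```python
from itertools import groupby


def count_changes(sequence):
    """Number of times consecutive values differ in `sequence`."""
    # groupby yields one (key, group) pair per run of equal values.
    runs = sum(1 for _ in groupby(sequence))
    # One fewer change than runs; clamp so an empty input gives 0, not -1.
    return max(runs - 1, 0)


print(count_changes("01101"))    # 3
print(count_changes("1110011"))  # 2
print(count_changes(""))         # 0
```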

Add one to end of alphanumeric string in Python

I'm trying to efficiently add one to the end of a string like this:
tt0000001 --> tt0000002 but I'm not sure how to accomplish this.
A complicated way of doing this is to remove the 2 t's at the beginning, count the number of non-zero digits (let's call that number z), make the string an int, add 1, and then create a string with 2 t's, 6 - z 0's, and then the int, but since I need to use many strings (ex: tt0000001, then tt0000002 then tt0000003, etc) many times, it would be great to have a more efficient way of doing this.
Would anyone know how to do this? A one-liner would be ideal if possible.
Thank you!

What you describe is essentially correct. It's not as difficult as you suggest, though, as creating a 0-padded string from an integer is supported.
As long as you know that the number is 7 digits, you can do something like
>>> x = 'tt0000001'
>>> x = f'tt{int(x.lstrip("t"))+1:07}'
>>> x
'tt0000002'
Even simpler, though, is to keep just an integer variable, and only (re)construct the label as necessary each time you increment the integer.
>>> x = 1
>>> x += 1
>>> f'tt{x:07}'
'tt0000002'
>>> x += 1
>>> f'tt{x:07}'
'tt0000003'
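If the prefix or the digit count is not fixed, something along these lines might work. It is a sketch that assumes the label is a non-digit prefix followed only by digits; the helper name increment_label is hypothetical.

```python
import re


def increment_label(label):
    """Add one to the trailing number of a label, keeping its zero padding."""
    match = re.fullmatch(r"(\D*)(\d+)", label)
    if match is None:
        raise ValueError(f"expected <prefix><digits>, got {label!r}")
    prefix, digits = match.groups()
    # Pad to the original digit count; the field grows if the number overflows it.
    return f"{prefix}{int(digits) + 1:0{len(digits)}d}"


print(increment_label("tt0000001"))  # tt0000002
print(increment_label("tt0000999"))  # tt0001000
```

For many consecutive labels, keeping the integer counter and formatting it on demand is still likely to be cheaper than parsing the string each time.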

Split integer into two concatenated hex strings- Python

I need to transmit a value that is larger than 65535 via two different hex strings so that when the strings are received, they can be concatenated to form the integer again. For example if the value was 70000 then the two strings would be 0x0001 and 0x1170.
I thought it would be as simple as converting the integer to hex then shifting it right by 4 to get the top string and removing all but the last 4 characters for the bottom.
I think I might be struggling with some syntax (fairly new to Python) and probably some of the logic too. Can anyone think of an easy way to do this?
Thanks

Use the built-in divmod function:
>>> [hex(x) for x in divmod(70000, 65536)]
['0x1', '0x1170']

Your algorithm can be implemented easily, as in Lev Levitsky's answer:
hex(big)[2:-4], hex(big)[-4:]
However, it will fail for numbers under 65536.
You could fix that, but you're probably better off splitting the number, then converting the two halves into hex, instead of splitting the hex string.
ecatmur's answer is probably the simplest way to do this:
[hex(x) for x in divmod(70000, 65536)]
Or you could translate your "shift right/truncate" algorithm on the numbers like this:
hex(x >> 16), hex(x & 0xFFFF)
If you need these to be strings like '0x0006' rather than '0x6', instead of calling hex on the parts, you can do this:
['%#06x' % (x,) for x in divmod(x, 65536)]
Or, using the more modern string formatting style:
['0x{:04x}'.format(x) for x in divmod(x, 65536)]
But on the other side, you again probably want to undo this by converting to ints first and then shifting and masking the numbers, instead of concatenating the strings. Because the halves arrive as hex strings, they have to be parsed with base 16. The inverse of ecatmur's answer is:
int(bighalf, 16) * 65536 + int(smallhalf, 16)
The (equivalent) inverse of the shift/mask implementation is:
(int(bighalf, 16) << 16) | int(smallhalf, 16)
And in that case, you don't need the extra 0s on the left.
It's also worth pointing out that none of these algorithms will work if the number can be negative, or greater than 4294967295, but only because the problem is impossible in those cases.
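Putting both directions together, a round trip could look like the sketch below. The function names split_u32 and join_u32 are invented for illustration, and the range check reflects the 0 to 4294967295 limit mentioned above.

```python
def split_u32(value):
    """Split an unsigned 32-bit int into two zero-padded 16-bit hex strings."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 unsigned bits")
    high, low = divmod(value, 0x10000)
    return f"0x{high:04x}", f"0x{low:04x}"


def join_u32(high_hex, low_hex):
    """Rebuild the integer from the two hex strings."""
    # int(..., 16) accepts the '0x' prefix.
    return (int(high_hex, 16) << 16) | int(low_hex, 16)


high, low = split_u32(70000)
print(high, low)            # 0x0001 0x1170
print(join_u32(high, low))  # 70000
```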

You mean like this?
In [1]: big = 12345678
In [2]: first, second = hex(big)[2:][:-4], hex(big)[2:][-4:]
In [3]: first, second
Out[3]: ('bc', '614e')
In [4]: int(first+second, 16)
Out[4]: 12345678

Being wary of big/little endians, what you could do to keep it simple is:
val = 70000
to_send = '{:08X}'.format(val) # '00011170'
decoded = int('00011170', 16) # 70000
EDIT: to be very clear then...
hex1, hex2 = to_send[:4], to_send[4:] # send these two and on receipt
my_number = int(hex1 + hex2, 16)

For numbers of 65536 or more (that is, numbers whose hex representation has at least five digits), you can use slicing:
>>> num=70000
>>> var1=hex(num)[:-4]
>>> var2='0x'+hex(num)[-4:]
>>> integ=int(var1+var2[2:],16)
>>> print(integ)
70000

Is there a way to pad to an even number of digits?

I'm trying to create a hex representation of some data that needs to be transmitted (specifically, in ASN.1 notation). At some points, I need to convert data to its hex representation. Since the data is transmitted as a byte sequence, the hex representation has to be padded with a 0 if the length is odd.
Example:
>>> hex2(3)
'03'
>>> hex2(45)
'2d'
>>> hex2(678)
'02a6'
The goal is to find a simple, elegant implementation for hex2.
Currently I'm using hex, stripping out the first two characters, then padding the string with a 0 if its length is odd. However, I'd like to find a better solution for future reference. I've looked in str.format without finding anything that pads to a multiple.

def hex2(n):
    x = '%x' % (n,)
    return ('0' * (len(x) % 2)) + x

To be totally honest, I am not sure what the issue is. A straightforward implementation of what you describe goes like this:
def hex2(v):
    s = hex(v)[2:]
    return s if len(s) % 2 == 0 else '0' + s
I would not necessarily call this "elegant" but I would certainly call it "simple."
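Another way to get the same result is to compute the padded width up front and let the format spec do the padding. This is a sketch assuming Python 3.6+ (for f-strings) and non-negative input.

```python
def hex2(n):
    """Hex digits of a non-negative int, zero-padded to an even length."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Hex digits needed for n (at least one, so that 0 becomes '00').
    digits = max(1, (n.bit_length() + 3) // 4)
    width = digits + digits % 2  # round up to the next even number
    return f"{n:0{width}x}"


print(hex2(3))    # 03
print(hex2(45))   # 2d
print(hex2(678))  # 02a6
```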

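Python's binascii module's b2a_hex is guaranteed to return an even-length string.
The trick then is to convert the integer into a bytestring. Python 3.2 and higher has that built into int. Note that b2a_hex returns bytes, so decode the result if you need a str:
from binascii import b2a_hex
def hex2(integer):
    # Round the bit length up to whole bytes, then hex-encode those bytes.
    return b2a_hex(integer.to_bytes((integer.bit_length() + 7) // 8, 'big')).decode('ascii')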

Might want to look at the struct module, which is designed for byte-oriented I/O.
>>> import struct
>>> struct.pack('>i', 678)
'\x00\x00\x02\xa6'
>>> # Use h instead of i for shorts
>>> struct.pack('>h', 1043)
'\x04\x13'

Python: Shorten ugly code?

I have a ridiculous code segment in one of my programs right now:
str(len(str(len(var_text)**255)))
Is there an easy way to shorten that? 'Cause, frankly, that's ridiculous.
An option to convert a number with more than 500 digits to scientific notation would also be helpful
(that's what I'm trying to do)
Full code:
print("Useless code rating:" , str(len(var_text)**255)[1] + "e" + str(len(str(len(var_text)**255))))

TL;DR: y = 2.408 * len(var_text)
Let's assume that your passkey is a string of characters with 256 characters available (0-255). Then, just as a 16-bit number holds 65536 numbers (2**16), the permutations of a string of equal length would be
n_perms = 256**len(passkey)
If you want the number of (decimal) digits in n_perms, consider the logarithm:
>>> from math import log10
>>> log10(1000)
3.0
>>> log10(9999)
3.9999565683801923
>>>
So we have length = floor(log10(n_perms)) + 1. In python, int rounds down anyway, so I'd say you want
n_perms = 256**len(var_text)
length = int(log10(n_perms)) + 1
I'd argue that 'shortening' ugly code isn't always the best way - you want it to be clear what you're doing.
Edit: On further consideration I realised that choosing base-10 to find the length of your permutations is really arbitrary anyway - so why not choose base-256!
length = log256(256**len(var_text))
length = len(var_text) # the log and exp cancel!
You are effectively just finding the length of your passkey in a different base...
Edit 2: Stand back, I'm going to attempt Mathematics!
If x = len(var_text), we want y such that
y = log10(256**x)
10**y = 256**x
10**y = (10**log10(256))**x
10**y = 10**(log10(256) * x)
y = log10(256) * x
So, how's this for short:
length = log10(256) * len(var_text) # or about (2.408 * x)
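A quick numerical check of that shortcut, as a sketch: the helper name digit_count is made up, and the floor-plus-one step is the same one used earlier for turning a logarithm into a digit count.

```python
from math import floor, log10


def digit_count(text_length):
    """Decimal digits in 256**text_length, without building the big number."""
    return floor(log10(256) * text_length) + 1


# Compare against the brute-force count for a range of lengths.
for n in range(1, 101):
    assert digit_count(n) == len(str(256 ** n))

print(digit_count(10))  # 25
```

Floating-point rounding could in principle matter if log10(256) * n landed extremely close to a whole number, so treat the shortcut as an estimate for very large lengths.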

This looks like it's producing a string version of the number of digits in the 255th power of the length of a string. Is that right? I'd be curious what that's used for.
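You could compute the number differently, but it's not shorter and I'm not sure it's prettier. This needs import math, and it truncates and adds one so that lengths that are exact powers of 10 are counted correctly:
str(int(math.log10(len(var_text)) * 255) + 1)
or:
"%d" % (int(math.log10(len(v)) * 255) + 1)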

Are you trying to determine the number of possible strings having the same length as var_text? If so, you have your base and exponent reversed. You want to use 255**len(var_text) instead of len(var_text)**255.
But, I have to ask ... how long do these passkeys get to be, and what are you using them for?
And, why not just use the length of the passkey as an indicator of its length?

Firstly, if your main problem is manipulating huge floating point expressions, use the bigfloat package:
>>> import bigfloat
>>> bigfloat.BigFloat('1e1000')
BigFloat.exact('1.0000000000000001e+1000', precision=53)
As for the details in your question: len(str(num)) is approximately equal to log(num, 10) + 1. This is not significantly shorter, but it's certainly a better way of expressing it in code (for the benefit of anyone who doesn't know that off the top of their head). You can then simplify it with log laws:
len(str(x**y))
= log(x**y, 10) + 1
= y * log(x, 10) + 1
So maybe you'll find:
"%i" % (log(len(var_text),10)*255 + 1)
... is better? It's not significantly shorter, but it's a much clearer mathematical relationship between input and output.
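For the scientific-notation part of the question, the standard library's decimal module might be enough, since Decimal can be built exactly from an arbitrarily large int. The following is a sketch; the helper name to_scientific and its digits parameter are invented for illustration.

```python
from decimal import Decimal


def to_scientific(n, digits=3):
    """Format a (possibly huge) int as d.ddde+X with `digits` decimals."""
    # Decimal(n) is exact for ints; rounding only happens while formatting.
    return format(Decimal(n), f".{digits}e")


print(to_scientific(10 ** 600))  # 1.000e+600
print(to_scientific(12345))      # 1.234e+4
```

Unlike slicing the first characters out of str(n), this rounds the mantissa instead of truncating it.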
