Short Unique Hexadecimal String in Python

I need to generate a unique hexadecimal string in Python 3 that meets the following requirements:
It should contain 6 characters.
It should not contain just digits. There must be at least one letter.
The generated strings should be random. They should not be in any order.
There should be a minimal probability of conflict.
I have considered uuid4(), but it generates strings with too many characters, and any substring of the generated string can consist entirely of digits (i.e., no letters).
Is there any other way to fulfill these conditions? Thanks in advance!
EDIT
Can we use a hash, for example SHA-1, to fulfill the above requirements?

Here's a simple method that samples evenly from all allowed strings. Sampling uniformly makes conflicts as rare as possible, short of keeping a log of previous keys or using a hash based on a counter (see below).
import random
digits = '0123456789'
letters = 'abcdef'
all_chars = digits + letters
length = 6
while True:
    val = ''.join(random.choice(all_chars) for i in range(length))
    # The following line might be faster if you only want hex digits.
    # It makes an int with 24 random bits, converts it to hex,
    # drops '0x' from the start, then pads
    # with zeros up to six places if needed
    # val = hex(random.getrandbits(4*length))[2:].zfill(length)
    # test whether it contains at least one letter
    if not val.isdigit():
        break
# now val is a suitable string
print(val)
# 5d1d81
Alternatively, here's a somewhat more complex approach that also samples uniformly, but doesn't use any open-ended loops:
import random, bisect
digits = '0123456789'
letters = 'abcdef'
all_chars = digits + letters
length = 6
# find how many valid strings there are with their first letter in position i
pos_weights = [10**i * 6 * 16**(length-1-i) for i in range(length)]
pos_c_weights = [sum(pos_weights[0:i+1]) for i in range(length)]
# choose a random slot among all the allowed strings
# (randrange excludes the upper bound, so every slot is equally likely)
r = random.randrange(pos_c_weights[-1])
# find the position for the first letter in the string
first_letter = bisect.bisect_right(pos_c_weights, r)
# generate a random string matching this pattern
val = ''.join(
    [random.choice(digits) for i in range(first_letter)]
    + [random.choice(letters)]
    + [random.choice(all_chars) for i in range(first_letter + 1, length)]
)
# now val is a suitable string
print(val)
# 4a99f0
And finally, here's an even more complex method that uses the random number r to index directly into the entire range of allowed values, i.e., this converts any number in the range 0 to 15,777,215 (there are 16^6 - 10^6 = 15,777,216 allowed strings) into a suitable hex string. This could be used to completely avoid conflicts (discussed more below).
import random, bisect
digits = '0123456789'
letters = 'abcdef'
all_chars = digits + letters
length = 6
# find how many valid strings there are with their first letter in position i
pos_weights = [10**i * 6 * 16**(length-1-i) for i in range(length)]
# cumulative counts, starting from 0: pos_c_weights[i] is the number of
# valid strings whose first letter comes before position i
pos_c_weights = [sum(pos_weights[0:i]) for i in range(length + 1)]
# choose a random slot among all the allowed strings
r = random.randrange(pos_c_weights[-1])
# find the position for the first letter in the string
first_letter = bisect.bisect_right(pos_c_weights, r) - 1
# choose the corresponding string from among all that fit this pattern
offset = r - pos_c_weights[first_letter]
val = ''
# convert the offset to a collection of indexes within the allowed strings
# the space of allowed strings has dimensions
# 10 x 10 x ... (for digits) x 6 (for first letter) x 16 x 16 x ... (for later chars)
# so we can index across it by dividing into appropriate-sized slices
for i in range(length):
    if i < first_letter:
        offset, v = divmod(offset, 10)
        val += digits[v]
    elif i == first_letter:
        offset, v = divmod(offset, 6)
        val += letters[v]
    else:
        offset, v = divmod(offset, 16)
        val += all_chars[v]
# now val is a suitable string
print(val)
# eb3493
Uniform Sampling
I mentioned above that this samples uniformly across all allowed strings. Some other answers here choose 5 characters completely at random and then force a letter into the string at a random position. That approach produces more strings with multiple letters than you would get randomly. For example, that method always produces a 6-letter string if letters are chosen for the first 5 slots; with uniform sampling, the sixth selection should only have a 6/16 chance of being a letter in that case. These approaches can't be fixed by forcing a letter into the sixth slot only if the first 5 slots are digits, either. In that case, every 5-digit prefix would automatically become 5 digits plus 1 letter, producing too many strings of that form. With uniform sampling, there should be a 10/16 chance of completely rejecting the string if the first 5 characters are digits.
Here are some examples that illustrate these sampling issues. Suppose you have a simpler problem: you want a string of two binary digits, with a rule that at least one of them must be a 1. Conflicts will be rarest if you produce 01, 10 or 11 with equal probability. You can do that by choosing random bits for each slot, and then throwing out the 00's (similar to my approach above).
But suppose you instead follow this rule: Make two random binary choices. The first choice will be used as-is in the string. The second choice will determine the location where an additional 1 will be inserted. This is similar to the approach used by the other answers here. Then you will have the following possible outcomes, where the first two columns represent the two binary choices:
0 0 -> 10
0 1 -> 01
1 0 -> 11
1 1 -> 11
This approach has a 0.5 chance of producing 11, or 0.25 for 01 or 10, so it will increase the risk of collisions among 11 results.
You could try to improve this as follows: Make three random binary choices. The first choice will be used as-is in the string. The second choice will be converted to a 1 if the first choice was a 0; otherwise it will be added to the string as-is. The third choice will determine the location where the second choice will be inserted. Then you have the following possible outcomes:
0 0 0 -> 10 (second choice converted to 1)
0 0 1 -> 01 (second choice converted to 1)
0 1 0 -> 10
0 1 1 -> 01
1 0 0 -> 01
1 0 1 -> 10
1 1 0 -> 11
1 1 1 -> 11
This gives 0.375 chance for 01 or 10, and 0.25 chance for 11. So this will slightly increase the risk of conflicts between duplicate 10 or 01 values.
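The three two-bit schemes above can be checked by enumerating every combination of random choices rather than sampling. The sketch below does that with exact fractions; the function names are made up for illustration, and it assumes that inserting at location 0 puts the new bit in front of the first one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def rejection():
    """Two random bits; throw away 00."""
    return ["%d%d" % bits for bits in product((0, 1), repeat=2) if bits != (0, 0)]


def force_one():
    """Keep the first bit, then insert a 1 at the location given by the second choice."""
    out = []
    for first, pos in product((0, 1), repeat=2):
        s = [str(first)]
        s.insert(pos, "1")
        out.append("".join(s))
    return out


def conditional_force():
    """Force the second bit to 1 only when the first bit is 0, then insert it."""
    out = []
    for first, second, pos in product((0, 1), repeat=3):
        if first == 0:
            second = 1
        s = [str(first)]
        s.insert(pos, str(second))
        out.append("".join(s))
    return out


def distribution(outcomes):
    """Exact probability of each result, given equally likely outcomes."""
    counts = Counter(outcomes)
    return {k: Fraction(v, len(outcomes)) for k, v in sorted(counts.items())}


for scheme in (rejection, force_one, conditional_force):
    print(scheme.__name__, distribution(scheme()))
```

Only the rejection scheme gives each of 01, 10 and 11 a probability of 1/3.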
Reducing Conflicts
If you are open to using all letters instead of just 'a' through 'f' (hexadecimal digits), you could alter the definition of letters as noted in the comments. This will give much more diverse strings and much less chance of conflict. If you generated 1,000 strings allowing all upper- and lowercase letters, you'd only have about a 0.0009% chance of generating any duplicates, vs. 3% chance with hex strings only. (This will also virtually eliminate double-passes through the loop.)
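The percentages quoted here can be reproduced with the standard birthday-problem calculation. This is a rough sketch that assumes keys are drawn uniformly and independently; the function name collision_probability is just illustrative.

```python
import math


def collision_probability(n_keys, n_possible):
    """Probability of at least one duplicate among n_keys uniform draws."""
    # log P(all distinct) = sum of log(1 - i/N) for i = 0 .. n_keys-1
    log_p_unique = sum(math.log1p(-i / n_possible) for i in range(n_keys))
    return -math.expm1(log_p_unique)


hex_strings = 16**6 - 10**6      # 6 hex characters, at least one letter
alnum_strings = 62**6 - 10**6    # 0-9, a-z, A-Z, at least one letter

print("hex:   %.4f%%" % (100 * collision_probability(1000, hex_strings)))
print("alnum: %.4f%%" % (100 * collision_probability(1000, alnum_strings)))
```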
If you really want to avoid conflicts between strings, you could store all the values you've generated previously in a set and check against that before breaking from the loop. This would be good if you are going to generate fewer than about 5 million keys. Beyond that, you'd need quite a bit of RAM to hold the old keys, and it might take a few runs through the loop to find an unused key.
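A minimal sketch of the set-based check might look like the following. The names make_key_generator and next_key are hypothetical, and the set lives only in memory, so keys handed out in earlier runs of the program are not remembered.

```python
import random

ALL_CHARS = "0123456789abcdef"


def make_key_generator(length=6, rng=random):
    """Return a function that hands out a new, never-repeated key on each call."""
    seen = set()

    def next_key():
        while True:
            val = "".join(rng.choice(ALL_CHARS) for _ in range(length))
            # reject all-digit strings and anything handed out before
            if not val.isdigit() and val not in seen:
                seen.add(val)
                return val

    return next_key


next_key = make_key_generator()
keys = [next_key() for _ in range(1000)]
print(keys[:3], len(set(keys)))
```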
If you need to generate more keys than that, you could encrypt a counter, as described at Generating non-repeating random numbers in Python. The counter and its encrypted version would both be ints in the range of 0 to 15,777,215. The counter would just count up from 0, and the encrypted version would look like a random number. Then you would convert the encrypted version to hex using the third code example above. If you do this, you should generate a random encryption key at the start, and change the encryption key each time the counter rolls past your maximum, to avoid producing the same sequence again.
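One way to sketch the "encrypt a counter" idea is a small Feistel network combined with cycle-walking, which maps the integers 0 to N-1 onto themselves one-to-one. Everything specific here is an assumption made for illustration: the names make_permutation and permute, the SHA-256 based round function, and the choice of four rounds. It is not a vetted cipher and is not necessarily the scheme the linked question describes.

```python
import hashlib
import secrets


def make_permutation(n_values, key):
    """Return a function mapping 0..n_values-1 onto itself in a scrambled order."""
    # split the smallest sufficient bit width into two equal halves
    half_bits = max(1, ((n_values - 1).bit_length() + 1) // 2)
    mask = (1 << half_bits) - 1

    def round_value(value, round_no):
        data = key + bytes([round_no]) + value.to_bytes(4, "big")
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & mask

    def feistel(x):
        # a Feistel network is invertible whatever the round function is
        left, right = x >> half_bits, x & mask
        for round_no in range(4):
            left, right = right, left ^ round_value(right, round_no)
        return (left << half_bits) | right

    def permute(counter):
        if not 0 <= counter < n_values:
            raise ValueError("counter out of range")
        x = feistel(counter)
        while x >= n_values:  # cycle-walk until the value is back in range
            x = feistel(x)
        return x

    return permute


key = secrets.token_bytes(16)  # generate a fresh key for each pass over the counter
permute = make_permutation(16**6 - 10**6, key)
print([permute(counter) for counter in range(5)])
```

Each integer produced this way would still need to be converted to a string with an index-to-string conversion like the one described above.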

The following approach works as follows: first pick one random letter to satisfy rule 2, then select 4 random entries from the list of all available characters, and shuffle the resulting list. Lastly, prepend one value taken from the list of all entries except 0 to ensure the string has 6 characters.
import random
all_chars = "0123456789abcdef"
result = [random.choice('abcdef')] + [random.choice(all_chars) for _ in range(4)]
random.shuffle(result)
result.insert(0, random.choice(all_chars[1:]))
print(''.join(result))
Giving you something like:
3b7a4e
This approach avoids having to repeatedly check the result to ensure that it satisfies the rules.

Note: Updated the answer for a unique hexadecimal string. Earlier I assumed an alphanumeric string.
You can create your own unique function using the uuid and random libraries:
>>> import uuid
>>> import random
# Step 1: Slice uuid with 5 i.e. new_id = str(uuid.uuid4())[:5]
# Step 2: Convert string to list of char i.e. new_id = list(new_id)
>>> uniqueval = list(str(uuid.uuid4())[:5])
# uniqueval = ['f', '4', '4', '4', '5']
# Step 3: Generate random number between 0-4 to insert new char i.e.
# random.randint(0, 4)
# Step 4: Get random char between a-f (for Hexadecimal char) i.e.
# chr(random.randint(ord('a'), ord('f')))
# Step 5: Insert random char to random index
>>> uniqueval.insert(random.randint(0, 4), chr(random.randint(ord('a'), ord('f'))))
# uniqueval = ['f', '4', '4', '4', 'f', '5']
# Step 6: Join the list
>>> uniqueval = ''.join(uniqueval)
# uniqueval = 'f444f5'

This function converts an integer into a string conforming to your requirements, so you can generate integers and convert them using this function. Distinct integers can still map to the same string (for example, when the inserted letter lands next to an identical letter), so it does not guarantee uniqueness on its own.
def inttohex(number, digits):
    # there must be at least one character:
    fullhex = 16**(digits - 1)*6
    assert number < fullhex
    partialnumber, remainder = divmod(number, digits*6)
    # the quotient selects the letter (0-5), the remainder its position (0 to digits-1)
    charindex, charposition = divmod(remainder, digits)
    char = ['a', 'b', 'c', 'd', 'e', 'f'][charindex]
    hexconversion = list("{0:0{1}x}".format(partialnumber, digits-1))
    hexconversion.insert(charposition, char)
    return ''.join(hexconversion)
Now you can get a particular one using for instance
import random
digits = 6
inttohex(random.randrange(6*16**(digits-1)), digits)
You can't have maximum randomness along with minimum probability of conflict. I recommend keeping track of which numbers you have handed out or if you are looping through all of them somehow, using a randomly sorted list.

Related

Time Complexity for LeetCode 3. Longest Substring Without Repeating Characters

Problem: Given a string s, find the length of the longest substring
without repeating characters.
Example: Input: s = "abcabcbb" Output: 3 Explanation: The answer is
"abc", with the length of 3.
My solution:
class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        seen = set()
        l = r = curr_len = max_len = 0
        n = len(s)
        while l < n:
            if r < n and s[r] not in seen:
                seen.add(s[r])
                curr_len += 1
                max_len = max(curr_len, max_len)
                r += 1
            else:
                l += 1
                r = l
                curr_len = 0
                seen.clear()
        return max_len
I know this is not an efficient solution, but I am having trouble figuring out its time complexity.
I visit every character in the string but, for each one of them, the window expands until it finds a repeated char. So every char ends up being visited multiple times, but I'm not sure whether that happens often enough to justify an O(n^2) time complexity or whether it is merely somewhat worse than O(n).

You could claim the algorithm to be O(n) if you know the size of the character set your input can be composed of, because the length your window can expand is limited by the number of different characters you could pass over before encountering a duplicate, and this is capped by the size of the character set you're working with, which itself is some constant independent of the length of the string. For example, if you are only working with lower case alphabetic characters, the algorithm is O(26n) = O(n).
To be more exact, you could say that it runs in O(n*min(m,n)), where n is the length of the string and m is the number of characters in the alphabet of the string. The reason for the min is that even if you're somehow working with an alphabet of unlimited unique characters, at worst you're doing a double for loop to the end of the string. That means, however, that if the number of possible characters you can encounter in the string exceeds the string's length, you have a worst case of O(n^2) performance (which occurs when every character of the string is unique).
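To see the O(n*min(m,n)) bound in practice, the loop from the question can be instrumented to count its iterations. This is only an illustrative sketch: count_steps is a made-up name, and the two inputs are chosen to represent a tiny alphabet and an all-unique string.

```python
def count_steps(s):
    """Run the restart-on-duplicate scan from the question and count loop iterations."""
    seen = set()
    l = r = curr_len = max_len = steps = 0
    n = len(s)
    while l < n:
        steps += 1
        if r < n and s[r] not in seen:
            seen.add(s[r])
            curr_len += 1
            max_len = max(curr_len, max_len)
            r += 1
        else:
            l += 1
            r = l
            curr_len = 0
            seen.clear()
    return max_len, steps


n = 1000
repeated = "ab" * (n // 2)                   # alphabet of size m = 2
unique = "".join(chr(i) for i in range(n))   # every character distinct

print(count_steps(repeated))  # step count grows linearly with n
print(count_steps(unique))    # step count grows quadratically with n
```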

How can I optimize this for-loop?

I need to check the occurrences of the letter "a" in a string s of size n.
Example:
s = "abcac"
n = 10
String to check for occurrences of letter "a": "abcacabcac".
Occurrences: 4
My code works, but I need it to work faster for larger values of n.
What can I do to optimize this code?
def repeatedString(s, n):
    a_count, word_iter = 0, 0
    for i in range(n):
        if s[word_iter] == "a":
            a_count+=1
        word_iter += 1
        if word_iter == (len(s)):
            word_iter = 0
    return a_count

You don't need to assemble the full repeated string to do it. Count the occurrences of the specified character in the whole string and multiply that by the number of times the string is fully repeated (n//len(s) times). Add to that the number of occurrences in the last (truncated) part at the end of the repetitions (i.e., the first n%len(s) characters).
def countChar(s,n,c):
    # parentheses matter: count * (number of full repetitions), not (count * n) // len(s)
    return s.count(c)*(n//len(s))+s[:n%len(s)].count(c)
output:
countChar("abcac",10,"a") # 4 times in 'abcacabcac'
countChar("abcac",17,"a") # 7 times in 'abcacabcacabcacab'

Count the number of times a appears in the string s repeated up to length n:
s = "abcac"
n = 10
# repeat s one extra time so the result is at least n characters long before slicing
(s * (n // len(s) + 1))[:n].count('a')

You can use regular expressions:
import re
a_count = len(re.findall(r'a',s))
re.findall returns a list of all matches, and we can just get the length of it. Using a regular expression allows for greater generalization and the ability to search for more complex patterns. Debra's original answer is better for a simple string search though:
a_count = s.count('a')

I'm unable to figure out what test cases am I failing here

I need to find the maximum occurring character in a string: a-z. It is 26 characters long i.e. 26 different types.
Even though the output is correct, I'm still failing. What am I doing wrong?
These are the conditions:
Note: If there are more than one type of equal maximum then the type with lesser ASCII value will be considered.
Input Format
The first line of input consists of number of test cases, T.
The second line of each test case consists of a string representing the type of each individual characters.
Constraints
1<= T <=10
1<= |string| <=100000
Output Format
For each test case, print the required output in a separate line.
Sample TestCase 1
Input
2
gqtrawq
fnaxtyyzz
Output
q
y
Explanation
Test Case 1: There are 2 q occurring the max while the rest all are present alone.
Test Case 2: There are 2 y and 2 z types. Since the maximum value is the same, the type with the lesser ASCII value is considered as output. Therefore, y is the correct type.
def testcase(str1):
    ASCII_SIZE = 256
    ctr = [0] * ASCII_SIZE
    max = -1
    ch = ''
    for i in str1:
        ctr[ord(i)] += 1
    for i in str1:
        if max < ctr[ord(i)]:
            max = ctr[ord(i)]
            ch = i
    return ch
print(testcase("gqtrawq"))
print(testcase("fnaxtyyzz"))
I'm passing the output i.e. I'm getting the correct output but failing the test cases.

Note the note:
Note: If there are more than one type of equal maximum then the type with lesser ASCII value will be considered.
But with your code, you return the character with highest count that appears first in the string. In case of ties, take the character itself into account in the comparison:
for i in str1:
    if max < ctr[ord(i)] or max == ctr[ord(i)] and i < ch:
        max = ctr[ord(i)]
        ch = i
Or shorter (but not necessarily clearer) comparing tuples of (count, char):
if (max, i) < (ctr[ord(i)], ch):
(Note that this is comparing (old_cnt, new_char) < (new_cnt, old_chr)!)
Alternatively, you could also iterate the characters in the string in sorted order:
for i in sorted(str1):
    if max < ctr[ord(i)]:
        ...
Having said that, you could simplify/improve your code by counting the characters directly instead of their ord (using a dict instead of list), and using the max function with an appropriate key function to get the most common character.
def testcase(str1):
    ctr = {c: 0 for c in str1}
    for c in str1:
        ctr[c] += 1
    return max(sorted(set(str1)), key=ctr.get)
You could also use collections.Counter, and most_common, but where's the fun in that?
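For completeness, a collections.Counter version might look like the sketch below. As far as I know, most_common() orders ties by first occurrence rather than by character, so this sketch breaks ties explicitly instead; the function name most_frequent_char is made up for the example.

```python
from collections import Counter


def most_frequent_char(text):
    """Most common character; ties go to the character with the smaller code point."""
    counts = Counter(text)
    # key: highest count first, then lowest character
    return min(counts, key=lambda c: (-counts[c], c))


print(most_frequent_char("gqtrawq"))
print(most_frequent_char("fnaxtyyzz"))
```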

What should be the output for this - print(testcase("fanaxtyfzyz"))?
IMO the output should be 'a' but your program writes 'f'.
The reason is you are iterating through the characters of the input string,
for i in str1:  # iterating through the values 'f','a','n','a','x','t',...
    # the count of 'f' is considered first;
    # since 'f' occurs first, the equal count of 'a' is not considered
    if max < ctr[ord(i)]:
        max = ctr[ord(i)]
        ch = i
Instead, you should iterate through the values of ctr. Or sort the input string and do the same.

lightly alter a hash programmatically

At the moment I frequently have to do something in unittests with hashes and cryptographic signatures. Sometimes they get generated, and I just need to alter one slightly and prove that something no longer works. They are strings of hex-digits 0-9 and a-f of specific length. Here is a sample 64 long:
h = '702b31faad0246cc89a5dc782cdf5235a885d0f529fb30a4e1e70e00938df91a'
I want to change just one character somewhere in there.
You can't be sure that every digit 0 - 9 and a - f will be in there, although I would guess it's at least 95% certain that they all are. If you could be sure, I would just run h = h.replace('a', 'b', 1) on it.
If you do it manually, you can just look at it and see the third digit is 2 and run:
new = list(h)
new[2] = '3'
h = ''.join(new)
But if you cannot see it and it needs to happen programmatically, what is a clean and certain way to change just one character in it somewhere?

from random import randrange
h = '702b31faad0246cc89a5dc782cdf5235a885d0f529fb30a4e1e70e00938df91a'
i = randrange(len(h))
new_h = h[:i] + hex(int(h[i], 16) + randrange(1, 16))[-1:] + h[i+1:]
In words:
choose a random index i in h
split the string into the part before the index, the char at the index, and the rest
replace the char at the index with its hex value incremented by a random int between 1 and 15, modulo 16 (i.e., its rightmost hex character)
build the new string from the above pieces
Note that an increment by a value between 1 and 15 (included), followed by a modulo 16, never maps a hex digit onto itself. An increment by 0 or 16 would map it exactly onto itself.
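The claim about increments can be checked exhaustively over all 16 hex digits. The helper name bump is hypothetical; it isolates the expression used in the one-liner above.

```python
HEX = "0123456789abcdef"


def bump(digit, step):
    """Add step to a hex digit and keep only the rightmost hex character."""
    return hex(int(digit, 16) + step)[-1:]


# every step from 1 to 15 changes the digit
assert all(bump(d, step) != d for d in HEX for step in range(1, 16))
# a step of 0 or 16 maps each digit onto itself
assert all(bump(d, 0) == d and bump(d, 16) == d for d in HEX)
print("all increments verified")
```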

You can just choose a random index
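import random
valid_chars = '0123456789abcdef'
def replace_hash(hash_digest):
    # pick a random position (randrange excludes the upper bound)
    idx_to_replace = random.randrange(len(hash_digest))
    char_to_replace = hash_digest[idx_to_replace]
    # the 15 hex digits that differ from the current one
    replacements = valid_chars.replace(char_to_replace, '')
    new_char = random.choice(replacements)
    # strings are immutable, so build a new one
    return hash_digest[:idx_to_replace] + new_char + hash_digest[idx_to_replace + 1:]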
The most efficient way is to just replace the first character with one of two fixed replacements: the original can only match one of them, so there's no need to choose randomly. But if you want a random change, the function above will work.
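The non-random variant could be sketched as follows. The function name tweak_first_char and the choice of "0" and "1" as the two candidate replacements are assumptions made for illustration.

```python
def tweak_first_char(hash_digest):
    """Return the digest with only its first hex digit changed."""
    # "0" works unless the digest already starts with "0"; then use "1"
    replacement = "1" if hash_digest[0] == "0" else "0"
    return replacement + hash_digest[1:]


h = '702b31faad0246cc89a5dc782cdf5235a885d0f529fb30a4e1e70e00938df91a'
print(tweak_first_char(h))
```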

I suggest you increment the last character of the hash (cycling to 0 after f). That way you are sure to get a different hash, only differing by one character.
You can easily extend this method to change a character at the position of your choosing, and not just the last one.
h = '702b31faad0246cc89a5dc782cdf5235a885d0f529fb30a4e1e70e00938df91a'
def change_hash(h, index=-1):
    digits = list(h)
    old_digit = digits[index]
    v = int(old_digit, 16)
    new_v = (v+1)%16
    new_digit = '{:x}'.format(new_v)
    digits[index] = new_digit
    return ''.join(digits)
print(change_hash(h))
# 702b31faad0246cc89a5dc782cdf5235a885d0f529fb30a4e1e70e00938df91b
#                                                                ^
print(change_hash(h, 2))
# 703b31faad0246cc89a5dc782cdf5235a885d0f529fb30a4e1e70e00938df91a
#   ^
EDIT:
added option to change a digit at an arbitrary position
formatting the digit using format() as it was proposed in another answer

h = chr(ord(h[0]) + ((-1) if (h[0] in "9f") else 1)) + h[1:]

Convert a long number to corresponding letter combinations

Given a number, translate it to all possible combinations of corresponding letters. For example, if given the number 1234, it should spit out abcd, lcd, and awd because the combinations of numbers corresponding to letters could be 1 2 3 4, 12 3 4, or 1 23 4.
I was thinking of ways to do this in Python and I was honestly stumped. Any hints?
I basically only setup a simple system to convert single digit to letters so far.

Make str.
Implement partition as in here.
Filter lists with a number over 26.
Write function that returns letters.
def alphabet(n):
    # return " abcde..."[n]
    return chr(n + 96)
def partition(lst):
    for i in range(1, len(lst)):
        for r in partition(lst[i:]):
            yield [lst[:i]] + r
    yield [lst]
def int2words(x):
    for lst in partition(str(x)):
        # skip parts with a leading zero ("0", "05") and values above 26
        if all(p[0] != "0" and int(p) <= 26 for p in lst):
            yield "".join(alphabet(int(p)) for p in lst)
x = 12121
print(list(int2words(x)))
# ['ababa', 'abau', 'abla', 'auba', 'auu', 'laba', 'lau', 'lla']

I'm not going to give you a complete solution, but an idea of where to start:
I would transform the number to a string and iterate over the string. As the alphabet has 26 characters, you only have to check one- and two-digit numbers.
As mentioned in a comment above, a recursive approach will do the trick, e.g.:
Number is 1234
*) Take first character -> number is 1
*) From there combine it with all remaining 1-digit numbers -->
1 2 3 4
*) Then combine it with the next 2 digit number (if <= 26) and the remaining 1 digit numbers -->
1 23 4
*) ...and so on
As I said, it's just an idea of where to start, but basically it's a recursive approach using combinatorics, including checks that two-digit numbers aren't greater than 26 and thus beyond the alphabet.
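The recursive idea outlined above can be sketched as follows. The function name decodings is made up, and the sketch assumes the mapping 1 -> a through 26 -> z, with chunks that start with 0 treated as invalid.

```python
def decodings(number_text):
    """Yield every letter string the digit string can stand for (1 -> a, ..., 26 -> z)."""
    if not number_text:
        yield ""
        return
    # try a one-digit chunk, then a two-digit chunk, at the front of the string
    for size in (1, 2):
        chunk = number_text[:size]
        if len(chunk) < size or chunk[0] == "0" or int(chunk) > 26:
            continue
        letter = chr(ord("a") + int(chunk) - 1)
        for rest in decodings(number_text[size:]):
            yield letter + rest


print(list(decodings("1234")))
```

For 1234 this yields the three combinations from the question: abcd, awd and lcd.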
