Me and a guy at work decided to try and make a basic python program that would 1) jumble a string & 2) un-jumble the string. The idea was that we can send each other absolute rubbish.
My first attempt (I am awful at this):
x = ("This is the text")
x1 = x.replace("a","r")
x2 = x1.replace("b","w")
x3 = x2.replace("c","z") #Do this for the whole alphabet
print(x3) #both upper and lower case
Then do the same but back to front to unscramble.... It works but it's also longggggg....
Code from this article: http://gomputor.wordpress.com/2008/09/27/search-replace-multiple-words-or-characters-with-python/ suggests creating a method as follows:
def replace_all(text,dic):
    for i,j in dic.iteritems():
        text = text.replace(i,j)
    return text
reps = {'a':'r','b':'w','c':'z'} #for the whole alphabet again (yawn)
x = ("This is the text")
txt = replace_all(x, reps)
print(txt)
Will this work? I have seen iteritems() get bad press elsewhere??
iteritems returns an iterator over the (key, value) pairs of the dictionary. The only reason to watch out for iteritems is that it was removed in Python 3.0, where dict.items now does the same job (it returns a view instead of a list).
As for the code, it's functionally correct (except that x should be my_text), though the encryption strength isn't very high.
There are many algorithms that use simpler encryption key methods (your reps dictionary) and produce higher quality encryption. If you are working in python why not use a simple library like https://pypi.python.org/pypi/simple-crypt to get a higher quality encryption/decryption?
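One caveat to the "functionally correct" remark above: chained replace calls can undo each other once the mapping covers letters that are also replacement targets. The sketch below is only an illustration; the two-letter mapping is a hypothetical one picked to expose the problem, and it uses items() so that it runs on Python 3.

```python
def replace_all(text, dic):
    # Same chained approach as in the question, with items() for Python 3.
    for old, new in dic.items():
        text = text.replace(old, new)
    return text

# Hypothetical mapping chosen to expose the problem: a -> b and b -> a.
swap = {"a": "b", "b": "a"}

# Chained: every 'a' becomes 'b', then every 'b' (old and new) becomes 'a'.
chained = replace_all("abba", swap)

# One pass: each character is looked up exactly once.
table = str.maketrans(swap)
translated = "abba".translate(table)

print(chained)     # aaaa
print(translated)  # baab
```

With a one-pass table the same mapping also decodes its own output here, which the chained version cannot do.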
If you're working in Python 2 rot13 is a built in codec:
>>> 'This is the text'.encode('rot13')
'Guvf vf gur grkg'
>>> 'Guvf vf gur grkg'.decode('rot13')
u'This is the text'
http://docs.python.org/2/library/codecs.html
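On Python 3, str no longer has a decode method and str.encode does not accept rot13, so the snippet above will not run as written. A rough equivalent, assuming the standard codecs module is acceptable, might be:

```python
import codecs

# rot13 is a text-to-text codec, so it goes through codecs.encode/decode
# rather than str.encode on Python 3.
encoded = codecs.encode("This is the text", "rot13")
decoded = codecs.decode(encoded, "rot13")

print(encoded)  # Guvf vf gur grkg
print(decoded)  # This is the text
```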
Drum roll please.....
>>> msg="The quick brown fox jumps over the lazy dog"
>>> cyphertext = ' '.join(w[1:]+'-'+w[0]+'ey' for w in msg.split())
>>> cyphertext
'he-Tey uick-qey rown-bey ox-fey umps-jey ver-oey he-tey azy-ley og-dey'
>>> ' '.join(w[-3] + w[:-4] for w in cyphertext.split())
'The quick brown fox jumps over the lazy dog'
Note that the non-standard treatment of "the" and "quick" along with the possible confusion as to the "ey" vs "ay" suffix enhances security.
why not use string.translate
import string
tab = string.maketrans(string.ascii_letters,string.ascii_letters[13:]+string.ascii_letters[:13])
message = "hello world!"
encoded = message.translate(tab)
tab_decode = string.maketrans(string.ascii_letters[13:]+string.ascii_letters[:13],string.ascii_letters)
print encoded.translate(tab_decode)
with string.translate you can define any mapping you want
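key = "lkj234oiu1563lop|.)(*&^%$##!" # note: the key must be exactly as long as your alphabet (52 characters in this case) and must not repeat characters; this sample is shorter and has repeats, so swap in a real 52-character key before running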
mapping = (string.ascii_letters,key)
tab_encode = string.maketrans(*mapping)
encoded = "hello world".translate(tab_encode)
tab_decode = string.maketrans(*list(reversed(mapping)))
decoded = encoded.translate(tab_decode)
Another thing you could do is base64 encoding (note this is not encryption; however, it is probably about as hard to crack as a simple Caesar cipher)
import base64
decoded = "Hello World!!!"
encoded = base64.b64encode(decoded)
print encoded
print base64.b64decode(encoded)
Here is a complete solution that you can use to learn from. You can use it to both encode and decode your messages. Just specify what you want to do and then type in your message. A blank line indicates that you are finished with your message and signals for the requested transformation to take place.
import string
PLAIN_TEXT = string.ascii_letters
CIPHER_TEXT = 'kWFUsDYeAxolSLTwChgNJtaMvQIzRZVrPEyfiKXGcdBunbqHjpOm'
CIPHER_TABLE = str.maketrans(PLAIN_TEXT, CIPHER_TEXT)
PLAIN_TABLE = str.maketrans(CIPHER_TEXT, PLAIN_TEXT)
def main():
    while True:
        choice = input('Encode or Decode? ').lower()
        if choice:
            if 'encode'.startswith(choice):
                process_text(CIPHER_TABLE)
                continue
            if 'decode'.startswith(choice):
                process_text(PLAIN_TABLE)
                continue
        print('Please try again.')
def process_text(table):
    buffer = []
    while True:
        line = input('> ')
        if not line:
            break
        buffer.append(line.translate(table))
    print('\n'.join(buffer))
if __name__ == '__main__':
    main()
If you want an even better solution, bookmark Wabol Talk and use that to create your messages. You can look at its source and see that it is implemented in Python as well. The program encrypts messages and encodes them in the "Wabol" language. I can also provide a GUI version of the program if you want.
Here are some basic, very insecure ciphers with which to have fun! By the way, I was googling around and I found some neat slides from Northeastern University on basic cryptosystems. Also, I apologize if my lettering habits differ from wiki or anyone else; I just go with my habits.
Shift Cipher
The shift cipher has an integer ring (e.g. the letters of the alphabet, A-Z, usually numbered 0-25; we typically call this Z26) and an integer key, let's call it k. To build this ring we use modular arithmetic, which basically means our numbers will only ever be between 0 and 25 if our modulus, let's call it n, is 26. For any given plaintext letter, let's call it m, we can encode it by adding k to m, modulo n - and voilà! We have our ciphertext letter, let's call it c.
To put it mathematically: c ≡ m + k (mod n). And logically, m ≡ c - k (mod n). To put it programmatically:
def encrypt(m, k, n):
    return (m + k) % n
def decrypt(c, k, n):
    return (c - k) % n
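To see the two functions above working on actual text rather than single numbers, here is a small sketch. The helper shift_text and the choice of an upper-case-only alphabet are my own assumptions for illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # the ring Z26: A=0, B=1, ..., Z=25

def encrypt(m, k, n):
    return (m + k) % n

def decrypt(c, k, n):
    return (c - k) % n

def shift_text(text, k, step=encrypt):
    """Apply `step` (encrypt or decrypt) to every upper-case letter of `text`."""
    n = len(ALPHABET)
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[step(ALPHABET.index(ch), k, n)])
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return "".join(out)

secret = shift_text("HELLO WORLD", 3)
print(secret)                          # KHOOR ZRUOG
print(shift_text(secret, 3, decrypt))  # HELLO WORLD
```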
Substitution Cipher
Substitution ciphers are much harder to crack than shift ciphers, but they are still easy to crack using many characteristics of the English language. Google them if you like - but I gotta say, a nice program to help you is freq.c.
A substitution cipher consists of the same ring, with a mapping of plaintext characters to ciphertext characters. For instance, A = G, B = Q, C = E, D = T and so on. A dictionary's really great for this: k = {'A' : 'G', 'B' : 'Q', ...}. If you want to decrypt, you can invert the dictionary.
def encrypt(m, k):
    return k[m]
def decrypt(c, k):
    for key, value in k.items():
        if value == c: return key
Affine Cipher
This cipher adds modular multiplication to the shift cipher. The key is now a pair a, b where a must be coprime to n, so that it has a modular multiplicative inverse. To encrypt, multiply m with a (modularly) and then add b (modularly). Choosing an a which is coprime isn't exactly obvious - basically, you need to make sure that the greatest common divisor of a and n is 1.
Mathematically: c ≡ am + b (mod n) and therefore m ≡ a^-1(c - b) (mod n). Programmatically:
def encrypt(m, a, b, n):
    return (a*m + b) % n
And I'm not going to write a clever solution for finding this modular multiplicative inverse... you can use the extended Euclidean algorithm for this, but for me it's just easier to bruteforce it when n is so small.
def getModularMultiplicativeInverse(a, b, n):
    # b is unused; the search starts at 1 so that a == 1 (its own inverse) is handled
    for x in range(1, n):
        if (a * x) % n == 1:
            return x
def decrypt(c, a, b, n):
    return (getModularMultiplicativeInverse(a, b, n) * (c - b)) % n
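As a quick sanity check of the affine formulas, the sketch below runs a full round trip over Z26. The key a = 5, b = 8 is a hypothetical choice, and the inverse comes from the built-in three-argument pow with exponent -1, which needs Python 3.8 or later; the brute-force search above works just as well.

```python
def encrypt(m, a, b, n):
    return (a * m + b) % n

def decrypt(c, a, b, n):
    a_inv = pow(a, -1, n)  # raises ValueError if a and n are not coprime
    return (a_inv * (c - b)) % n

a, b, n = 5, 8, 26  # hypothetical key; gcd(5, 26) == 1

ciphertext = [encrypt(m, a, b, n) for m in range(n)]
recovered = [decrypt(c, a, b, n) for c in ciphertext]

print(recovered == list(range(n)))  # True
```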
These were just some little examples... less for the code, more for the chatter.
I have an application that is kind of like a URL shortener and need to generate unique URL whenever a user requests.
For this I need a function to map an index/number to a unique string of length n with two requirements:
Two different numbers can not generate same string.
In other words as long as i,j<K: f(i) != f(j). K is the number of possible strings = 26^n. (26 is number of characters in English)
Two strings generated by number i and i+1 don't look similar most of the times. For example they are not abcdef1 and abcdef2. (So that users can not predict the pattern and the next IDs)
This is my current code in Python:
import itertools
n = 7
chars = "abcdefghijklmnopqrstuvwxyz"
for item in itertools.product(chars, repeat=n):
    print("".join(item))
# For n = 7 generates:
# aaaaaaa
# aaaaaab
# aaaaaac
# ...
The problem with this code is there is no index that I can use to generate unique strings on demand by tracking that index. For example generate 1 million unique strings today and 2 million tomorrow without looping through or collision with the first 1 million.
The other problem with this code is that the strings that are created after each other look very similar and I need them to look random.
One option is to populate a table/dictionary with millions of strings, shuffle them and keep track of index to that table but it takes a lot of memory.
An option is also to check the database of existing IDs after generating a random string to make sure it doesn't exist but the problem is as I get closer to the K (26^n) the chance of collision increases and it wouldn't be efficient to make a lot of check_if_exist queries against the database.
Also if n was long enough I could use UUID with small chance of collision but in my case n is 7.
I'm going to outline a solution for you that is going to resist casual inspection even by a knowledgeable person, though it probably IS NOT cryptographically secure.
First, your strings and numbers are in a one-to-one map. Here is some simple code for that.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
len_of_codes = 7
char_to_pos = {}
for i in range(len(alphabet)):
    char_to_pos[alphabet[i]] = i
def number_to_string(n):
    chars = []
    for _ in range(len_of_codes):
        chars.append(alphabet[n % len(alphabet)])
        n = n // len(alphabet)
    return "".join(reversed(chars))
def string_to_number(s):
    n = 0
    for c in s:
        n = len(alphabet) * n + char_to_pos[c]
    return n
So now your problem is how to take an ascending stream of numbers and get an apparently random stream of numbers out of it instead. (Because you know how to turn those into strings.) Well, there are lots of tricks for primes, so let's find a decent sized prime that fits in the range that you want.
def is_prime (n):
    for i in range(2, n):
        if 0 == n%i:
            return False
        elif n < i*i:
            return True
    if n == 2:
        return True
    else:
        return False
def last_prime_before (n):
    for m in range(n-1, 1, -1):
        if is_prime(m):
            return m
print(last_prime_before(len(alphabet)**len_of_codes))
With this we find that we can use the prime 8031810103. That's how many numbers we'll be able to handle.
Now there is an easy way to scramble them. Which is to use the fact that multiplication modulo a prime scrambles the numbers in the range 1..(p-1).
def scramble1 (p, k, n):
    return (n*k) % p
Picking a random number to scramble by, int(random.random() * 26**7) happened to give me 3661807866, we get a sequence we can calculate with:
for i in range(1, 5):
    print(number_to_string(scramble1(8031810103, 3661807866, i)))
Which gives us
lwfdjoc
xskgtce
jopkctb
vkunmhd
This looks random to casual inspection. But will be reversible for any knowledgeable someone who puts modest effort in. They just have to guess the prime and algorithm that we used, look at 2 consecutive values to get the hidden parameter, then look at a couple of more to verify it.
Before addressing that, let's figure out how to take a string and get the number back. Thanks to Fermat's little theorem we know for p prime and 1 <= k < p that (k * k^(p-2)) % p == 1.
def n_pow_m_mod_k (n, m, k):
    answer = 1
    while 0 < m:
        if 1 == m % 2:
            answer = (answer * n) % k
        m = m // 2
        n = (n * n) % k
    return answer
print(n_pow_m_mod_k(3661807866, 8031810103-2, 8031810103))
This gives us 3319920713. Armed with that we can calculate scramble1(8031810103, 3319920713, string_to_number("vkunmhd")) to find out that vkunmhd came from 4.
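As an aside, Python's built-in pow already does modular exponentiation, so the hand-written helper is not strictly needed. The sketch below uses a deliberately small prime (p = 101, my own choice, so the numbers are easy to check by hand) and assumes Python 3.8 or later for the exponent -1 form.

```python
p = 101  # small prime, chosen only for illustration
k = 37   # hypothetical scrambling key with 1 <= k < p

inverse_fermat = pow(k, p - 2, p)  # Fermat's little theorem: k**(p-2) is the inverse of k mod p
inverse_direct = pow(k, -1, p)     # Python 3.8+: modular inverse directly

def scramble1(p, k, n):
    return (n * k) % p

scrambled = scramble1(p, k, 4)
restored = scramble1(p, inverse_fermat, scrambled)

print(inverse_fermat == inverse_direct)  # True
print(restored)                          # 4
```

The same two calls should work unchanged with the large prime and key from the text.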
Now let's make it harder. Let's generate several keys to be scrambling with:
import random
p = 26**7
for i in range(5):
    p = last_prime_before(p)
    print((p, int(random.random() * p)))
When I ran this I happened to get:
(8031810103, 3661807866)
(8031810097, 3163265427)
(8031810091, 7069619503)
(8031809963, 6528177934)
(8031809917, 991731572)
Now let's scramble through several layers, working from smallest prime to largest (this requires reversing the sequence):
def encode (n):
    for p, k in [
              (8031809917, 991731572)
            , (8031809963, 6528177934)
            , (8031810091, 7069619503)
            , (8031810097, 3163265427)
            , (8031810103, 3661807866)
            ]:
        n = scramble1(p, k, n)
    return number_to_string(n)
This will give a sequence:
ehidzxf
shsifyl
gicmmcm
ofaroeg
And to reverse it just use the same trick that reversed the first scramble (reversing the primes so I am unscrambling in the order that I started with):
def decode (s):
    n = string_to_number(s)
    for p, k in [
              (8031810103, 3319920713)
            , (8031810097, 4707272543)
            , (8031810091, 5077139687)
            , (8031809963, 192273749)
            , (8031809917, 5986071506)
            ]:
        n = scramble1(p, k, n)
    return n
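To convince yourself that the layered scheme really is reversible and collision-free, it helps to try it on a toy scale first. The sketch below is not the code above: it uses two small primes and made-up keys (all hypothetical), works on plain numbers instead of strings, and derives the inverse keys with pow(k, -1, p), which assumes Python 3.8 or later.

```python
# Hypothetical toy layers, smallest prime first, as in the text.
LAYERS = [(101, 37), (103, 55)]

def scramble1(p, k, n):
    return (n * k) % p

def encode_number(n):
    # Valid inputs are 1 .. (smallest prime - 1).
    for p, k in LAYERS:
        n = scramble1(p, k, n)
    return n

def decode_number(n):
    # Undo the layers in reverse order, multiplying by each inverse key.
    for p, k in reversed(LAYERS):
        n = scramble1(p, pow(k, -1, p), n)
    return n

inputs = range(1, 101)
outputs = [encode_number(n) for n in inputs]

print(len(set(outputs)) == len(outputs))                  # no collisions
print(all(decode_number(encode_number(n)) == n for n in inputs))  # round trip
```

Note that the usable range is set by the smallest prime, since every later layer has to receive a value it can map back.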
TO BE CLEAR I do NOT promise that this is cryptographically secure. I'm not a cryptographer, and I'm aware enough of my limitations that I know not to trust it.
But I do promise that you'll have a sequence of over 8 billion strings that you are able to encode/decode with no obvious patterns.
Now take this code, scramble the alphabet, regenerate the magic numbers that I used, and choose a different number of layers to go through. I promise you that I personally have absolutely no idea how someone would even approach the problem of figuring out the algorithm. (But then again I'm not a cryptographer. Maybe they have some techniques to try. I sure don't.)
How about :
from random import Random
n = 7
def f(i):
    myrandom = Random()
    myrandom.seed(i)
    alphabet = "123456789"
    return "".join([myrandom.choice(alphabet) for _ in range(7)])
# same entry, same output
assert f(0) == "7715987"
assert f(0) == "7715987"
assert f(0) == "7715987"
# different entry, different output
assert f(1) == "3252888"
(change the alphabet to match your need)
This "emulates" a UUID, since you said you could accept a small chance of collision. If you want to avoid collisions entirely, what you really need is a perfect hash function (https://en.wikipedia.org/wiki/Perfect_hash_function).
you can try something based on the sha1 hash
#!/usr/bin/python3
import hashlib
def generate_link(i):
    n = 7
    a = "abcdefghijklmnopqrstuvwxyz0123456789"  # 36 distinct characters, matching the % 36 below
    return "".join(a[x%36] for x in hashlib.sha1(str(i).encode('ascii')).digest()[-n:])
This is a really simple example of what I outlined in this comment. It just offsets the number based on i. If you want "different" strings, don't use this, because if num is 0, then you will get abcdefg (with n = 7).
alphabet = "abcdefghijklmnopqrstuvwxyz"
# num is the num to convert, i is the "offset"
def num_to_char(num, i):
    return alphabet[(num + i) % 26]
# generate the link
def generate_link(num, n):
    return "".join([num_to_char(num, i) for i in range(n)])
generate_link(0, 7) # "abcdefg"
generate_link(0, 7) # still "abcdefg"
generate_link(0, 7) # again, "abcdefg"!
generate_link(1, 7) # now it's "bcdefgh"!
You would just need to change the num + i to some complicated and obscure math equation.
I have IDs from a database, and I want them to be short and easily differentiatable by eye (i.e., two close numbers look different).
Like this:
13892359163211 -> ALO2WE7
13992351216421 -> 52NBEK3
or similar, algorithmically. So kind of like a hash, except it needs to be reversible? An encryption algorithm like AES is almost ideal, except that its outputs are way too long. (and overkill).
I'm using Python (3), although I don't think that should really matter
New answer with 'close' numbers looking different
You could use RSA to encrypt (and later decrypt) your numbers. This is definitely overkill - but ... here is the example:
Install https://github.com/sybrenstuvel/python-rsa (pip install rsa)
import rsa
import rsa.core
# (pubkey, privkey) = rsa.newkeys(64) # Generate key pair
pubkey = rsa.PublicKey(n=9645943279888986023, e=65537)
privkey = rsa.PrivateKey(n=9645943279888986023, e=65537, d=7507666207464026273, p=9255782423, q=1042153201)
print("1st", rsa.core.encrypt_int(13892359163211, pubkey.e, pubkey.n))
print("2nd", rsa.core.encrypt_int(13992351216421, pubkey.e, pubkey.n))
print("1st", hex(rsa.core.encrypt_int(13892359163211, pubkey.e, pubkey.n))[2:])
print("2nd", hex(rsa.core.encrypt_int(13992351216421, pubkey.e, pubkey.n))[2:])
# If you want to compare a couple of numbers that are similar
for i in range (13892359163211, 13892359163251):
    encrypted = rsa.core.encrypt_int(i, pubkey.e, pubkey.n)
    # decrypted = rsa.core.decrypt_int(encrypted, privkey.d, privkey.n)
    print (i, hex(encrypted)[2:], encrypted)
Please note that you cannot encrypt numbers bigger than pubkey.n. This is an RSA-related limitation. By generating a different key pair with a higher n you can circumvent this issue. If you would like all generated numbers to have the same length, prefix them with leading zeroes. You could also consider making them uppercase for better readability. To make the displayed strings shorter, consider using the base62 encoding mentioned in my old answer below.
output
1st 5427392181794576250
2nd 7543432434424555966
1st 4b51f86f0c99177a
2nd 68afa7d5110929be
input hex(encrypted) encrypted
13892359163211 4b51f86f0c99177a 5427392181794576250
13892359163212 2039f9a3f5cf5d46 2322161565485194566
13892359163213 173997b57918a6c3 1673535542221383363
13892359163214 36644663653bbb4 244958435527080884
13892359163215 c2eeec0c054e633 877901489011746355
...
Old answer related to displaying the numbers a bit shorter, not being aware that they should look substantially different
You want to change the base of your number from 10 to something bigger to use fewer characters. See https://stackoverflow.com/a/1119769 for an example with base 62 (a-zA-Z0-9).
Or quick and dirty for base 16, (0-9A-F, hexadecimal).
hex(13892359163211)[2:] # -> 'ca291220d4b'
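For the base 62 route mentioned above, a minimal sketch might look like this. The function names and the digit order (0-9, then a-z, then A-Z) are my own assumptions and are not taken from the linked answer.

```python
import string

# Assumed digit order: 0-9, then a-z, then A-Z (62 symbols in total).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(num):
    """Encode a non-negative integer as a base 62 string."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, rem = divmod(num, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def base62_decode(text):
    """Inverse of base62_encode."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n

encoded = base62_encode(13892359163211)
print(encoded, base62_decode(encoded))
```

For a 14-digit ID like the one in the question this gives 8 characters instead of the 11 that hex needs, but neighbouring IDs still produce neighbouring strings.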
The problem is easier to state than it is to solve. One solution is to borrow some ideas from format-preserving encryption, but simplifying because security is not a goal. Using the Feistel cipher framework a very short and reversible "mixing" function can be written, followed by a short encoding function, to achieve something that appears to be what you want.
import hashlib
import string
mask = (1 << 22) - 1
alphabet = string.ascii_uppercase + string.digits
def func(x: int):
    return int.from_bytes(hashlib.sha256(x.to_bytes(3, 'big')).digest(), 'big') & mask
def mix(id_in: int):
    L, R = id_in >> 22, id_in & mask
    L ^= func(R)
    R ^= func(L)
    return (L << 22) | R
def unmix(mixed: int):
    L, R = mixed >> 22, mixed & mask
    R ^= func(L)
    L ^= func(R)
    return (L << 22) | R
def base_n_encode(value: int):
    digits = []
    for i in range(9):
        value, rem = divmod(value, len(alphabet))
        digits.insert(0, rem)
    return ''.join(alphabet[digit] for digit in digits)
def base_n_decode(encoded: str):
    digits = [alphabet.index(ch) for ch in encoded]
    result = 0
    for digit in digits:
        result = result * len(alphabet) + digit
    return result
def encode(id_in: int):
    return base_n_encode(mix(id_in))
def decode(encoded: str):
    return unmix(base_n_decode(encoded))
if __name__ == '__main__':
    e1 = encode(13892359163211)
    e2 = encode(13992351216421)
    print('13892359163211 -> ' + e1)
    print('13992351216421 -> ' + e2)
    print(e1 + ' -> ' + str(decode(e1)))
    print(e2 + ' -> ' + str(decode(e2)))
Output is:
13892359163211 -> BC33VXN8A
13992351216421 -> D1UOW6SLL
BC33VXN8A -> 13892359163211
D1UOW6SLL -> 13992351216421
Note the use of sha256. This is slow and most definitely overkill, but it has the advantage of being built-in to python and thus a one-liner. Unless you are converting millions of IDs speed shouldn't be an issue, but if it is you can replace func with something much, much faster, maybe Murmur3.
The code is written with hard-coded constants to make it a little easier to see what's going on, but it can be generalized to work with arbitrary length (in bits) IDs and arbitrary alphabets.
A more general version of this example is available on github.
How about finding crc32 for the input and showing the result in hex?
>>> n = 13892359163211
>>>
>>> import binascii
>>> hex(binascii.crc32(str(n).encode()))[2:]
'240a831a'
Convert the numeric ID's to binary form (3) and use an encoder (4, 5).
In [1]: import struct, base64
In [2]: i = 13892359163211
Out[2]: 13892359163211
In [3]: struct.pack('L', i)
Out[3]: b'K\r"\x91\xa2\x0c\x00\x00'
In [4]: base64.b85encode(struct.pack('L', i)).decode('ascii')
Out[4]: 'OAR8Cq6`24'
In [5]: base64.b64encode(struct.pack('L', i)).decode('ascii')[:-1]
Out[5]: 'Sw0ikaIMAAA'
Which encoder to use depends on which characters you want to allow.
You can use CrypII idea to convert from integer to base64. This will be the shortest
13892359163211 is 4LWL and
13992351216421 is 64yl
Google or Amazon asks the following question in an interview; would my solution be accepted?
Problem: find the index of the first occurrence of the given word in the given string.
Note: the problem above is from a website, and the following code passed all the test cases. However, I am not sure if this is the most optimal solution and so would be accepted by the big giants.
def strStr(A, B):
    if len(A) == 0 or len(B) == 0:
        return -1
    for i in range(len(A)):
        c = A[i:i+len(B)]
        if c == B:
            return i
    else:
        return -1
There are a few algorithms that you can learn on this topic, like the Rabin-Karp algorithm, the Z algorithm, and the KMP algorithm, which all run in O(n+m) time (expected time, in the case of Rabin-Karp), where n is the string length and m is the pattern length. Your algorithm runs in O(n*m) time. I would suggest starting with the Rabin-Karp algorithm; I personally found it the easiest to grasp.
There are also some advanced topics like searching many patterns in one string like the aho-corasick algorithm which is good to read. I think this is what grep uses when searching for multiple patterns.
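For reference, a bare-bones Rabin-Karp search could look like the sketch below. The base and modulus are arbitrary choices of mine, and the function returns -1 for an empty pattern only to match the convention in the question's code.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return -1

print(rabin_karp("hello i am homer simpson", "homer"))  # 11
```

The rolling update is what keeps the expected cost at O(n+m): each window's hash is derived from the previous one in constant time instead of being recomputed from scratch.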
Hope it helps :)
Python actually has a built in function for this, which is why this question doesn't seem like a great fit for interviews in python. Something like this would suffice:
def strStr(A, B):
    return A.find(B)
Otherwise, as commenters have mentioned, inputs/outputs and tests are important. You could add some checks that make it slightly more performant (i.e. check that B is smaller than A), but I think in general, you won't do better than O(n).
If you want to match the entire word to the words in the string, your code would not work.
E.g If my arguments are print(strStr('world hello world', 'wor')), your code would return 0, but it should return -1.
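If whole-word matching is what the interviewer has in mind, one possible approach is a regular expression with word boundaries. This is only a sketch under that assumption; the name whole_word_index is made up for the example.

```python
import re

def whole_word_index(text, word):
    """Index of the first occurrence of `word` as a whole word, or -1."""
    match = re.search(r"\b" + re.escape(word) + r"\b", text)
    return match.start() if match else -1

print(whole_word_index("world hello world", "wor"))    # -1
print(whole_word_index("world hello world", "hello"))  # 6
```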
I checked your function; it works well in Python 3.6.
print(strStr('abcdef', 'bcd')) # with your function: prints 1 (indexes start from 0)
print("adbcdef".find('bcd')) # built-in str.find, on a slightly different string: prints 2 (also 0-based)
first occurrence index, use index() or find()
text = 'hello i am homer simpson'
index = text.index('homer')
print(index)
index = text.find('homer')
print(index)
output:
11
11
It is always better to go for the built-in Python functions.
But sometimes in interviews they will ask you to implement it yourself. The best thing to do is to start with the simplest version, then think about corner cases and improvements.
Here you have a test with your version, a slightly improved one that avoids allocating a new string at each index, and the Python built-in:
A = "aaa foo baz fooz bar aaa"
B = "bar"
def strInStr1(A, B):
    if len(A) == 0 or len(B) == 0:
        return -1
    for i in range(len(A)):
        c = A[i:i+len(B)]
        if c == B:
            return i
    else:
        return -1
def strInStr2(A, B):
    size = len(B)
    for i in range(len(A)):
        if A[i] == B[0]:
            if A[i:i+size] == B:
                return i
    return -1
def strInStr3(A, B):
    return A.index(B)
import timeit
setup = '''from __main__ import strInStr1, strInStr2, strInStr3, A, B'''
for f in ("strInStr1", "strInStr2", "strInStr3"):
    result = timeit.timeit(f"{f}(A, B)", setup=setup)
    print(f"{f}: ", result)
The results speak for themselves (time in seconds):
strInStr1: 15.809420814999612
strInStr2: 7.687011377005547
strInStr3: 0.8342400040055509
Here you have the live version
I'm implementing a program that calculates an equation: F(n) = F(n-1) + 'a' + func1(func2(F(n-1))).
func1 takes every 'a' and makes it 'c' and every 'c' becomes 'a'.
func2 reverses the string (e.x. "xyz" becomes "zyx").
I want to calculate the Kth character of F(10**2017).
The basic rules are F(0) = "" (empty string), and examples are F(1) = "a", F(2) = "aac", and so on.
How do I do this efficiently?
The basic part of my code is this:
def op1 (str1):
    if str1 == 'a':
        return 'c'
    else:
        return 'a'
def op2 (str2):
    return str2[::-1]
sinitial = ''
while (counter < 10**2017):
    Finitial = Finitial + 'a' + op1(op2(Finitial))
    counter += 1
print Finitial
Let's start by fixing your original code and defining a function to compute F(n) for small n. We'll also print out the first few values of F. All code below is for Python 3; if you're using Python 2, you'll need to make some minor changes, like replacing str.maketrans with string.maketrans and range with xrange.
swap_ac = str.maketrans({ord('a'): 'c', ord('c'): 'a'})
def F(n):
    s = ''
    for _ in range(n):
        s = s + 'a' + s[::-1].translate(swap_ac)
    return s
for n in range(7):
    print("F({}) = {!r}".format(n, F(n)))
This gives the following output:
F(0) = ''
F(1) = 'a'
F(2) = 'aac'
F(3) = 'aacaacc'
F(4) = 'aacaaccaaaccacc'
F(5) = 'aacaaccaaaccaccaaacaacccaaccacc'
F(6) = 'aacaaccaaaccaccaaacaacccaaccaccaaacaaccaaaccacccaacaacccaaccacc'
A couple of observations at this point:
F(n) is a string of length 2**n-1. That means that F(n) grows fast. Computing F(50) would already require some serious hardware: even if we stored one character per bit, we'd need over 100 terabytes to store the full string. F(200) has more characters than there are estimated atoms in the solar system. So the idea of computing F(10**2017) directly is laughable: we need a different approach.
By construction, each F(n) is a prefix of F(n+1). So what we really have is a well-defined infinite string, where each F(n) merely gives us the first 2**n-1 characters of that infinite string, and we're looking to compute its kth character. And for any practical purpose, F(10**2017) might as well be that infinite string: for example, when we do our computation, we don't need to check that k < 2**(10**2017)-1, since a k exceeding this can't even be represented in normal binary notation in this universe.
Luckily, the structure of the string is simple enough that computing the kth character directly is straightforward. The major clue comes when we look at the characters at even and odd positions:
>>> F(6)[::2]
'acacacacacacacacacacacacacacacac'
>>> F(6)[1::2]
'aacaaccaaaccaccaaacaacccaaccacc'
The characters at even positions simply alternate between a and c (and it's straightforward to prove that this is true, based on the construction). So if our k is even, we can simply look at whether k/2 is odd or even to determine whether we'll get an a or a c.
What about the odd positions? Well F(6)[1::2] should look somewhat familiar: it's just F(5):
>>> F(6)[1::2] == F(5)
True
Again, it's straightforward to prove (e.g., by induction) that this isn't simply a coincidence, and that F(n+1)[1::2] == F(n) for all nonnegative n.
We now have an effective way to compute the kth character in our infinite string: if k is even, we just look at the parity of k/2. If k is odd, then we know that the character at position k is equal to that at position (k-1)/2. So here's a first solution to computing that character:
def char_at_pos(k):
    """
    Return the character at position k of the string F(n), for any
    n satisfying 2**n-1 > k.
    """
    while k % 2 == 1:
        k //= 2
    return 'ac'[k//2%2]
And a check that this does the right thing:
>>> ''.join(char_at_pos(i) for i in range(2**6-1))
'aacaaccaaaccaccaaacaacccaaccaccaaacaaccaaaccacccaacaacccaaccacc'
>>> ''.join(char_at_pos(i) for i in range(2**6-1)) == F(6)
True
But we can do better. We're effectively staring at the binary representation of k, removing all trailing '1's and the next '0', then simply looking at the next bit to determine whether we've got an 'a' or a 'c'. Identifying the trailing 1s can be done by bit-operation trickery. This gives us the following semi-obfuscated loop-free solution, which I leave it to you to unwind:
def char_at_pos2(k):
    """
    Return the character at position k of the string F(n), for any
    n satisfying 2**n-1 > k.
    """
    return 'ac'[k//(1+(k+1^k))%2]
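If you would rather not unwind it yourself, here is one way to read the expression, spelled out step by step. The intermediate names are mine, and the loop version is repeated only so that the two can be compared.

```python
def char_at_pos(k):
    # Loop version from above: strip trailing 1 bits, then look at the next bit.
    while k % 2 == 1:
        k //= 2
    return 'ac'[k // 2 % 2]

def char_at_pos2_unwound(k):
    # + binds tighter than ^, so k+1^k means (k+1) ^ k.
    # If k ends in t one-bits, (k+1) ^ k has exactly t+1 low bits set.
    trailing_mask = (k + 1) ^ k
    # Adding 1 turns that mask into the power of two 2**(t+1).
    divisor = trailing_mask + 1
    # Dividing drops the t trailing ones and the zero above them;
    # the parity of what is left is the bit we want.
    return 'ac'[(k // divisor) % 2]

print(''.join(char_at_pos2_unwound(i) for i in range(7)))   # aacaacc
print(char_at_pos(10**100) == char_at_pos2_unwound(10**100))  # True
```

Because Python integers are arbitrary precision, both versions handle astronomically large k without any special care.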
Again, let's check:
>>> F(20) == ''.join(char_at_pos2(i) for i in range(2**20-1))
True
Final comments: this is a very well-known and well-studied sequence: it's called the dragon curve sequence, or the regular paper-folding sequence, and is sequence A014577 in the online encyclopaedia of integer sequences. Some Google searches will likely give you many other ways to compute its elements. See also this codegolf question.
Based on what you have already coded, here's my suggestion:
def main_function(num):
    if num == 0:
        return ''
    previous = main_function(num-1)
    return previous + 'a' + op1(op2(previous))
print(main_function(10**2017))
P.S: I'm not sure of the efficiency.
When I submit the below code for testcases in HackerRank challenge "AND product"...
You will be given two integers A and B. You are required to compute the bitwise AND amongst all natural numbers lying between A and B, both inclusive.
Input Format:
First line of the input contains T, the number of testcases to follow.
Each testcase in a newline contains A and B separated by a single space.
from math import log
for case in range(int(raw_input())):
    l, u = map(int, (raw_input()).split())
    if log(l, 2) == log(u, 2) or int(log(l,2))!=int(log(l,2)):
        print 0
    else:
        s = ""
        l, u = [x for x in str(bin(l))[2:]], [x for x in str(bin(u))[2:]]
        while len(u)!=len(l):
            u.pop(0)
        Ll = len(l)
        for n in range(0, len(l)):
            if u[n]==l[n]:
                s+=u[n]
        while len(s)!=len(l):
            s+="0"
        print int(s, 2)
...it passes 9 of the test cases, Shows "Runtime error" in 1 test case and shows "Wrong Answer" in the rest 10 of them.
What's wrong in this?
It would be better for you to use the Bitwise operator in Python for AND. The operator is: '&'
Try this code:
def andProduct(a, b):
    j=a+1
    x=a
    while(j<=b):
        x = x&j
        j+=1
    return x
For more information on Bitwise operator you can see: https://wiki.python.org/moin/BitwiseOperators
Yeah you can do this much faster.
You are doing this very straightforward, calculating all ands in a for loop.
It should actually be possible to calculate this in O(1) (I think)
But here are some optimisations:
1) abort the for loop if you get the value 0, because it will stay 0 no matter what
2)If there is a power of 2 between l and u return 0 (you don't need a loop in that case)
My Idea for O(1) would be to think about which bits change between u and l.
Because every bit that changes somewhere between u and l becomes 0 in the answer.
EDIT 1: Here is an answer in O(same leading digits) time.
https://math.stackexchange.com/questions/1073532/how-to-find-bitwise-and-of-all-numbers-for-a-given-range
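The "bits that change become 0" idea can be sketched quite compactly: keep shifting both bounds right until they are equal, which leaves their common leading bits, then shift back. This is my own illustration of that idea rather than code from the linked answer, and the name and_product is made up.

```python
def and_product(a, b):
    """Bitwise AND of all integers in [a, b], assuming 0 <= a <= b."""
    shift = 0
    # Any bit position where a and b differ flips somewhere inside the
    # range, so only the common leading bits can survive the AND.
    while a != b:
        a >>= 1
        b >>= 1
        shift += 1
    return a << shift

print(and_product(12, 15))  # 12
print(and_product(10, 14))  # 8
```

The loop runs at most once per bit of b, so the cost depends on the size of the numbers and not on how many values lie between them.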
EDIT 2: Here is my code. I have not tested it extensively, but it seems to work (O(log(n))).
from math import log
for case in [[i+1,j+i+1] for i in range(30) for j in range(30)]:
    #Get input
    l, u = case
    invL=2**int(log(l,2)+1)-l
    invU=2**int(log(u,2)+1)-u
    #Calculate pseudo bitwise xnor of input and format to binary rep
    e=format((u&l | invL & invU),'010b')
    lBin=format(l,'010b')
    #output to zero
    res=0
    #boolean to check if we have found any zero
    anyZero=False
    #boolean to check the first one because we have leading zeros
    firstOne=False
    for ind,i in enumerate(e):
        #for every digit
        #if it is a leading one
        if i=='1' and (not anyZero):
            firstOne=True
            #leftshift result (multiply by 2)
            res=res<<1
            #and add 1
            res=res+int(lBin[ind])
        #else if we already had a one and find a zero this happens every time
        elif(firstOne):
            anyZero=True
            #leftshift
            res=res<<1
    #test if we are in the same power, if not there was a power between
    if(res!=0):
        #print "test",(int(log(res,2))!=int(log(l,2))) | ((log(res,2))!=int(log(u,2)))
        if((int(log(res,2))!=int(log(l,2))) or (int(log(res,2))!=int(log(u,2)))):
            res=0
    print res
Worked for all but a single test case. A small change is needed to get the last one. You'll have to find out what that small change is yourself. Seriously