How to make my Python code more time efficient?

The program I tried to execute has the following problem statement:
The program must accept N integers containing integers from 1 to N,
with duplicates, in any order. The program must print the integers
from 1 to N that are missing among the given integers, in ascending
order, as the output.
Example:
Input: 5
2 5 5 1 1
Output: 3 4
Explanation: The integers 3 and 4 are missing from the 5 integers 2 5 5 1 1.
Hence 3 and 4 are printed as the output.
My code:
def modusoperandi(n, t):
    if str(n) not in t:
        yield n
n = int(input())
t = tuple(sr for sr in input().split())
for i in range(1,n+1):
    for j in modusoperandi(i,t):
        print(j,end=' ')
My code, however, failed to pass all the test cases, since it takes a considerable amount of time to execute for test cases with huge input (more than 500 ms, which is the time limit).
I tried to measure the execution time using the timeit module. Oddly, when the number of elements in the tuple increases, the execution time also increases for a given N. I preferred a tuple over a list since it is supposed to be more efficient.

You'll want to convert the existing numbers into integers, then put them in a set; sets are very efficient for figuring out whether or not a given value is a member.
n = int(input())
extant = set(int(x) for x in input().split())
for i in range(1, n + 1):
    if i not in extant:
        print(i, end=" ")
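To see why the tuple version slows down as the input grows, here is a small sketch that compares tuple and set membership with timeit. The function names and the synthetic random input are illustrative assumptions, not part of the original problem.

```python
import random
import timeit

def missing_with_tuple(n, tokens):
    # 'x in tuple' scans the elements one by one: O(len(tokens)) per lookup
    t = tuple(tokens)
    return [i for i in range(1, n + 1) if str(i) not in t]

def missing_with_set(n, tokens):
    # 'x in set' is a hash lookup: O(1) on average per lookup
    s = set(tokens)
    return [i for i in range(1, n + 1) if str(i) not in s]

if __name__ == "__main__":
    print(missing_with_set(5, "2 5 5 1 1".split()))  # [3, 4]

    # Synthetic input (an assumption): n random tokens drawn from 1..n
    n = 2000
    tokens = [str(random.randint(1, n)) for _ in range(n)]
    assert missing_with_tuple(n, tokens) == missing_with_set(n, tokens)
    for func in (missing_with_tuple, missing_with_set):
        seconds = timeit.timeit(lambda: func(n, tokens), number=3)
        print(f"{func.__name__}: {seconds:.4f} s for 3 runs")
```

The exact timings depend on the machine; the point is that the tuple version does roughly N * len(t) comparisons, while the set version does about N hash lookups.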

The key is indeed to use a set for checking the presence of expected numbers in the input string. You don't need to convert the input to integers though. You can do this the other way around by generating sequential numbers as strings.
input()  # skip the first line (N); N equals the number of values on the next line
nums = input().split()
numSet = set(nums)
missing = " ".join(str(n) for n in range(1,len(nums)+1) if str(n) not in numSet)
print(missing) # 3 4
For this particular problem, there is a slightly faster alternative to using a set because you can afford to create an array of flags with a known (and reasonable) size:
input()  # skip the first line (N); N equals the number of values on the next line
numbers = input().split()
present = [False]*len(numbers)
for n in numbers: present[int(n)-1] = True
missing = " ".join(str(n+1) for n,seen in enumerate(present) if not seen)
print(missing)

n = '5'
i = '2 5 5 1 1'
def compute(n, i):
    s1 = set(range(1, n+1))
    yield from sorted(s1.difference(i))
for val in compute(int(n), map(int, i.split())):
    print(val, end=' ')
Prints:
3 4

You should think of the complexity of your solution (which is quite bad):
def modusoperandi(n, t):
    # Since 't' is a tuple, the complexity of 'not in t' is O(len(t))
    # This makes the overall complexity of this function O(len(t))
    if str(n) not in t:
        yield n
n = int(input())
t = tuple(sr for sr in input().split()) # O(len(t))
for i in range(1,n+1): # O(n) iterations
    # 0 or 1 iteration, but the call to 'modusoperandi' is O(len(t))
    for j in modusoperandi(i,t):
        print(j,end=' ')
The overall complexity is O(n * len(t)), which is not a very nice complexity. You'd like a complexity that is linear in the input. There are two ways:
Use a hash table to mark all visited integers; a set is such a hash table. Unfortunately, hash tables have some shortcomings, such as per-element memory and hashing overhead.
Since there are n entries and the numbers are in the range 1..n, it is very efficient to use a characteristic vector values_encountered, in which values_encountered[i] is True if and only if the value i was encountered. For big inputs of this kind, this solution is likely to run faster than a set and to consume less memory.
import numpy as np
n = int(input())
values_encountered = np.zeros(n+1, dtype=bool) # O(n)
values_encountered[[int(i) for i in input().split()]] = True # O(n)
# Or:
# values_encountered[list(map(int, input().split()))] = True
values_missing = ~values_encountered # O(n)
values_missing[0] = False
print(*list(*values_missing.nonzero())) # O(n)

generate unique string of length n without prefilled dictionary

I have an application that is kind of like a URL shortener, and I need to generate a unique URL whenever a user requests one.
For this I need a function that maps an index/number to a unique string of length n, with two requirements:
Two different numbers cannot generate the same string.
In other words, as long as i, j < K and i != j: f(i) != f(j). K is the number of possible strings = 26^n (26 is the number of letters in the English alphabet).
Two strings generated by numbers i and i+1 should not look similar most of the time. For example, they should not be abcdef1 and abcdef2. (So that users cannot predict the pattern and the next IDs.)
This is my current code in Python:
import itertools
n = 7
chars = "abcdefghijklmnopqrstuvwxyz"
for item in itertools.product(chars, repeat=n):
    print("".join(item))
# For n = 7 generates:
# aaaaaaa
# aaaaaab
# aaaaaac
# ...
The problem with this code is that there is no index I can track to generate unique strings on demand. For example, I want to generate 1 million unique strings today and 2 million tomorrow without looping through, or colliding with, the first 1 million.
The other problem with this code is that consecutive strings look very similar, and I need them to look random.
One option is to populate a table/dictionary with millions of strings, shuffle them and keep track of an index into that table, but that takes a lot of memory.
Another option is to check the database of existing IDs after generating a random string to make sure it doesn't exist, but as I get closer to K (26^n) the chance of collision increases, and it wouldn't be efficient to make a lot of check_if_exist queries against the database.
Also, if n were long enough I could use a UUID with a small chance of collision, but in my case n is 7.

I'm going to outline a solution for you that is going to resist casual inspection even by a knowledgeable person, though it probably IS NOT cryptographically secure.
First, your strings and numbers are in a one-to-one map. Here is some simple code for that.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
len_of_codes = 7
char_to_pos = {}
for i in range(len(alphabet)):
    char_to_pos[alphabet[i]] = i
def number_to_string(n):
    chars = []
    for _ in range(len_of_codes):
        chars.append(alphabet[n % len(alphabet)])
        n = n // len(alphabet)
    return "".join(reversed(chars))
def string_to_number(s):
    n = 0
    for c in s:
        n = len(alphabet) * n + char_to_pos[c]
    return n
So now your problem is how to take an ascending stream of numbers and get an apparently random stream of numbers out of it instead. (Because you know how to turn those into strings.) Well, there are lots of tricks with primes, so let's find a decent-sized prime that fits in the range that you want.
def is_prime (n):
    for i in range(2, n):
        if 0 == n%i:
            return False
        elif n < i*i:
            return True
    if n == 2:
        return True
    else:
        return False
def last_prime_before (n):
    for m in range(n-1, 1, -1):
        if is_prime(m):
            return m
print(last_prime_before(len(alphabet)**len_of_codes))
With this we find that we can use the prime 8031810103. That's how many numbers we'll be able to handle.
Now there is an easy way to scramble them, which is to use the fact that multiplication modulo a prime p, by any k that is not a multiple of p, scrambles (permutes) the numbers in the range 1..(p-1).
def scramble1 (p, k, n):
    return (n*k) % p
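One way to convince yourself that multiplication by a nonzero k modulo a prime p really is a permutation of 1..(p-1) is to check it exhaustively for a small prime. The values p = 11 and k = 7 below are arbitrary choices for illustration.

```python
def scramble1(p, k, n):
    # Same one-line scramble as above: multiply by k modulo p
    return (n * k) % p

p, k = 11, 7  # small illustrative prime and multiplier
images = [scramble1(p, k, n) for n in range(1, p)]
print(images)                          # [7, 3, 10, 6, 2, 9, 5, 1, 8, 4]
# Every value 1..p-1 appears exactly once, so the map is a bijection
print(sorted(images) == list(range(1, p)))  # True
```

This works because k has a multiplicative inverse modulo p, so two different inputs can never land on the same output. Note that 0 always maps to 0, and numbers >= p are not covered at all.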
Picking a random number to scramble by, int(random.random() * 26**7), happened to give me 3661807866. With it we get a sequence we can calculate with:
for i in range(1, 5):
    print(number_to_string(scramble1(8031810103, 3661807866, i)))
Which gives us
lwfdjoc
xskgtce
jopkctb
vkunmhd
This looks random to casual inspection. But it will be reversible by any knowledgeable person who puts in modest effort. They just have to guess the prime and algorithm that we used, look at 2 consecutive values to get the hidden parameter, then look at a couple more to verify it.
Before addressing that, let's figure out how to take a string and get the number back. Thanks to Fermat's little theorem we know for p prime and 1 <= k < p that (k * k^(p-2)) % p == 1.
def n_pow_m_mod_k (n, m, k):
    answer = 1
    while 0 < m:
        if 1 == m % 2:
            answer = (answer * n) % k
        m = m // 2
        n = (n * n) % k
    return answer
print(n_pow_m_mod_k(3661807866, 8031810103-2, 8031810103))
This gives us 3319920713. Armed with that we can calculate scramble1(8031810103, 3319920713, string_to_number("vkunmhd")) to find out that vkunmhd came from 4.
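Python's built-in three-argument pow already performs fast modular exponentiation, so the hand-written n_pow_m_mod_k can be replaced by it; on Python 3.8+ pow(k, -1, p) computes the modular inverse directly. This sketch reuses the prime and multiplier from the text.

```python
p = 8031810103       # prime from the text
k = 3661807866       # scrambling multiplier from the text

inv_fermat = pow(k, p - 2, p)   # k^(p-2) mod p, by Fermat's little theorem
inv_direct = pow(k, -1, p)      # modular inverse, Python 3.8+

assert inv_fermat == inv_direct
assert (k * inv_fermat) % p == 1

# Scrambling with k and then with its inverse recovers the original number
n = 4
scrambled = (n * k) % p
print((scrambled * inv_fermat) % p)  # 4
```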
Now let's make it harder. Let's generate several keys to be scrambling with:
import random
p = 26**7
for i in range(5):
    p = last_prime_before(p)
    print((p, int(random.random() * p)))
When I ran this I happened to get:
(8031810103, 3661807866)
(8031810097, 3163265427)
(8031810091, 7069619503)
(8031809963, 6528177934)
(8031809917, 991731572)
Now let's scramble through several layers, working from the smallest prime to the largest (this requires reversing the order of the list above):
def encode (n):
    for p, k in [
        (8031809917, 991731572)
      , (8031809963, 6528177934)
      , (8031810091, 7069619503)
      , (8031810097, 3163265427)
      , (8031810103, 3661807866)
    ]:
        n = scramble1(p, k, n)
    return number_to_string(n)
This will give a sequence:
ehidzxf
shsifyl
gicmmcm
ofaroeg
And to reverse it, just use the same trick that reversed the first scramble, applying the inverse keys with the primes in reverse order, so that the last layer applied is the first one undone:
def decode (s):
    n = string_to_number(s)
    for p, k in [
        (8031810103, 3319920713)
      , (8031810097, 4707272543)
      , (8031810091, 5077139687)
      , (8031809963, 192273749)
      , (8031809917, 5986071506)
    ]:
        n = scramble1(p, k, n)
    return n
TO BE CLEAR I do NOT promise that this is cryptographically secure. I'm not a cryptographer, and I'm aware enough of my limitations that I know not to trust it.
But I do promise that you'll have a sequence of over 8 billion strings that you are able to encode/decode with no obvious patterns.
Now take this code, scramble the alphabet, regenerate the magic numbers that I used, and choose a different number of layers to go through. I promise you that I personally have absolutely no idea how someone would even approach the problem of figuring out the algorithm. (But then again I'm not a cryptographer. Maybe they have some techniques to try. I sure don't.)
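Following that suggestion, here is a sketch that derives the decoding keys with pow(k, -1, p) (Python 3.8+) instead of hard-coding them, so the layers can be regenerated freely. The (prime, key) pairs are the ones printed above; the constant names and overall structure are assumptions for illustration.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CODE_LEN = 7

# (prime, key) layers from the text, ordered from smallest prime to largest
LAYERS = [
    (8031809917, 991731572),
    (8031809963, 6528177934),
    (8031810091, 7069619503),
    (8031810097, 3163265427),
    (8031810103, 3661807866),
]

def number_to_string(n):
    chars = []
    for _ in range(CODE_LEN):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def string_to_number(s):
    n = 0
    for c in s:
        n = len(ALPHABET) * n + ALPHABET.index(c)
    return n

def encode(n):
    # Valid for 1 <= n < smallest prime; each layer permutes 1..p-1
    for p, k in LAYERS:
        n = (n * k) % p
    return number_to_string(n)

def decode(s):
    n = string_to_number(s)
    # Undo the layers in reverse order with the modular inverse of each key
    for p, k in reversed(LAYERS):
        n = (n * pow(k, -1, p)) % p
    return n

for i in range(1, 5):
    code = encode(i)
    print(code, decode(code))
```

Changing LAYERS (or the alphabet) is then the only step needed to get a different sequence; the inverses follow automatically.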

How about:
from random import Random
n = 7
def f(i):
    myrandom = Random()
    myrandom.seed(i)
    alphabet = "123456789"
    return "".join([myrandom.choice(alphabet) for _ in range(n)])
# same entry, same output
assert f(0) == "7715987"
assert f(0) == "7715987"
assert f(0) == "7715987"
# different entry, different output
assert f(1) == "3252888"
(change the alphabet to match your needs)
This "emulates" a UUID, since you said you could accept a small chance of collision. If you want to avoid collisions entirely, what you really need is a perfect hash function (https://en.wikipedia.org/wiki/Perfect_hash_function).
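Because each seed produces an independent pseudo-random code, collisions follow the birthday bound: with 9 symbols and 7 positions there are 9**7 = 4,782,969 codes, so a repeat is expected after only a few thousand IDs. This sketch uses the same idea as f above (Random(i) seeds the generator exactly like Random().seed(i)) and searches for the first collision; first_collision is a hypothetical helper name.

```python
from random import Random

def f(i, n=7, alphabet="123456789"):
    # Seed a private generator with the index, then draw n symbols
    rng = Random(i)
    return "".join(rng.choice(alphabet) for _ in range(n))

def first_collision(limit):
    # Return (earlier_index, later_index) of the first repeated code, or None
    seen = {}
    for i in range(limit):
        code = f(i)
        if code in seen:
            return seen[code], i
        seen[code] = i
    return None

print(first_collision(100_000))
```

So at 7 characters, a seeded generator alone needs the database check the question wanted to avoid.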

You can try something based on the SHA-1 hash:
#!/usr/bin/python3
import hashlib
def generate_link(i):
    n = 7
    a = "abcdefghijklmnopqrstuvwxyz0123456789"  # 36 symbols, matching x % 36
    return "".join(a[x%36] for x in hashlib.sha1(str(i).encode('ascii')).digest()[-n:])

This is a really simple example of what I outlined in this comment. It just offsets the number based on i. If you want "different" strings, don't use this, because if num is 0, then you will get abcdefg (with n = 7).
alphabet = "abcdefghijklmnopqrstuvwxyz"
# num is the num to convert, i is the "offset"
def num_to_char(num, i):
    return alphabet[(num + i) % 26]
# generate the link
def generate_link(num, n):
    return "".join([num_to_char(num, i) for i in range(n)])
generate_link(0, 7) # "abcdefg"
generate_link(0, 7) # still "abcdefg"
generate_link(0, 7) # again, "abcdefg"!
generate_link(1, 7) # now it's "bcdefgh"!
You would just need to change the num + i to some complicated and obscure math equation.

How do I optimize my Python code to perform my calculation using less memory?

I have put together the following code in order to determine if a number has an odd or even number of divisors. The code works well with relatively small numbers, but once a larger number, like a 9-digit one, is entered, it hangs.
def divisors(n):
    num = len(set([x for x in range(1,n+1) if not divmod(n,x)[1]]))
    if (num != 0 and num % 2 == 0):
        return 'even'
    else:
        return 'odd'
What can I do to make this more efficient?

Here's your problem:
num = len(set([x for x in range(1,n+1) if not divmod(n,x)[1]]))
This constructs a list, then constructs a set out of that list, then takes the length of the set. You don't need to do any of that work (range(), or xrange() for that matter, does not produce repeated objects, so we don't need the set, and sum() works on any iterable object, so you don't need the list either). While we're on the subject, divmod(n, x)[1] is just a very elaborate way of writing n % x, and consumes a little bit of extra memory to construct a tuple (which is immediately reclaimed because you throw the tuple away). Here's the fixed version:
num = sum(1 for x in xrange(1,n+1) if not n % x)

You do not need to test every possible divisor; testing up to sqrt(n) is enough, because every divisor x <= sqrt(n) pairs with the divisor n // x >= sqrt(n). This makes your function O(sqrt(n)) instead of O(n).
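import math
def num_divisors(n):
    sqrt = math.sqrt(n)
    upper = int(sqrt)
    num = sum(1 for x in range(1, upper + 1) if not n % x)
    num *= 2  # each divisor found below sqrt(n) has a partner above it
    if upper == sqrt and num != 0:
        num -= 1  # for a perfect square, sqrt(n) was counted twice
    return num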
In my benchmarks using Python 2 this is 1000 times faster than sum(1 for x in range(1, n + 1) if not n % x) with n = int(1e6), and 10000 times faster for 1e8. For 1e9 the latter code gave me a memory error, because in Python 2 range() returns a list, so the whole sequence is stored in memory before the sum is computed; I should be using xrange() instead. In Python 3, range() is lazy, so it is fine.
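Since the question only asks whether the count is odd or even, the pairing argument gives a shortcut: divisors pair up as (d, n // d), and the only unpaired divisor is sqrt(n) when n is a perfect square. So n has an odd number of divisors exactly when n is a perfect square. A sketch using math.isqrt (Python 3.8+), which also avoids the float rounding that math.sqrt can suffer for very large n; the function names are illustrative:

```python
import math

def divisor_parity(n):
    # Divisors come in pairs (d, n // d); only a perfect square has an unpaired one
    root = math.isqrt(n)
    return "odd" if root * root == n else "even"

def divisor_parity_slow(n):
    # Brute-force count, kept only for comparison
    count = sum(1 for d in range(1, n + 1) if n % d == 0)
    return "odd" if count % 2 else "even"

print(divisor_parity(36))          # odd
print(divisor_parity(10**9 + 7))   # even
```

This runs in effectively constant time even for 9-digit inputs.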

How to make the length of a list "len(list)" a variable to be used for further calculations? Python3

Is it possible to make the result from len(factors) be assigned as a variable? What I have so far is h = int(len(factors)); however, I'm not sure if this actually does anything. My code below is attempting to take an integer 'r' and represent 'r' in the form (2^k)*t+1. This part of the code below deals with finding this product of a power of two and some other odd integer, (2^k)*t.
It could be that I am going about this the wrong way, but from my research and trial and error, I have finally got this to work so far. But now more issues arise when extracting certain values.
from math import *
def executeproth():
    r = input("Number to test:")
    n = int(r)-1
    d = 2
    factors = []
    while n % 2 == 0:
        factors.append(d)
        n = int(n/d)
        h = int(len(factors))
        print(n, factors, h)
        # k = eval(2**h)
    return factors
executeproth()
For example an input of 29 yields the following:
Number to test:29
14 [2] 1
7 [2, 2] 2
So in this instance, t=7, k=2, so we would have 29=(2^2)*7+1.
What I want to do now is take the third line's values, namely the '2', and use them for further calculations. But the commented-out line # k = eval(2**h) throws the following error:
TypeError: eval() arg 1 must be a string, bytes or code object
So from what I can understand, the thing I am trying to evaluate is not in the correct form. I also wonder if the problem arises from the nature of the while loop, which keeps feeding values back in and creating multiple lists, as shown, and hence multiple values of h = len(factors).
How would one print only the results of the 'final' iteration in the while loop? i.e. 7 [2,2] 2

This should fulfil your requirement; I don't think you really need to evaluate k.
This also addresses the second part of your question: printing only the final result of the loop.
And, as Gregory pointed out, convert explicitly to int only when needed; eval is for strings, and your expression was already an integer expression.
def executeproth():
    r = input("Number to test:")
    n = int(r) - 1
    d = 2
    factors = []
    h = 0  # defined even if the loop body never runs (n odd)
    while n % 2 == 0:
        factors.append(d)
        n = n // d
        h = len(factors)
        #print(n, factors, h)
    else:
        print("{} = ( 2 ^ {} ) * {} + 1".format(r,h,n))
    return factors
executeproth()

First of all, you don't need to explicitly convert a value to an int just to use it in an expression in general. You do need it when processing the input since input() returns a string.
It is more idiomatic to use integer division a // b instead of int(a/b) in python 3.
Finally, eval is for evaluating strings, not expressions. Expressions are always evaluated.
from math import *
def executeproth():
    r = input("Number to test:")
    n = int(r)-1
    d = 2
    factors = []
    while n % 2 == 0:
        factors.append(d)
        n = n // d
        h = len(factors)
        print(n, factors, h)
        k = 2**h
        # does the same thing but is less efficient
        # k = eval("2**h")
    return factors
executeproth()

As others have said you don't need eval here. In fact, you should generally avoid using eval since it can be dangerous. And in most situations where you do need to evaluate an expression in string form you can generally get by with the much safer ast.literal_eval. However, at this stage of your learning it's unlikely that you will encounter many situations where you need to work with such advanced features of the language.
Anyway, here are a few more improvements to your code.
You don't need to import the math module since you aren't using any of the functions or constants defined in it. But when you do need to import a module it's best to avoid the from module_name import * form since that pollutes your namespace with all of the names defined in the module.
You don't need to store those 2s in a list - just count them.
It's better to do your input (and input validation) in the outer layers of your program rather than doing it deep in the functions that perform your calculations.
Python provides various augmented assignment operators that you can use when you want to perform a simple operation on a value and store the result back under the original name. Eg count += 1 adds 1 to count, saving the result in count.
Python allows you to return multiple objects as a tuple, so you can return the final value of n and the count of the number of factors of 2 that you've found.
def executeproth(r):
    n = r - 1
    count = 0
    if n != 0:
        while n % 2 == 0:
            count += 1
            n //= 2
    return n, count
r = int(input("Number to test: "))
n, count = executeproth(r)
k = 2 ** count
print("{0} = {1} * 2 ** {2} + 1".format(r, n, count))
#v = n*k + 1
#if v != r:
#    print("Error!")
The if n != 0: guard prevents infinite looping when r is 1 (which makes n zero, and 0 % 2 == 0 forever).
I've also added a (commented-out) test at the end. It's a good idea to do simple tests like that to make sure we're getting what we expect. Writing useful tests is an important part of program development.
Typical output:
Number to test: 0
0 = -1 * 2 ** 0 + 1
Number to test: 29
29 = 7 * 2 ** 2 + 1
Number to test: 57
57 = 7 * 2 ** 3 + 1
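Building on the commented-out check, here is a sketch that turns it into an automatic test over a range of inputs. It is self-contained and guards on n != 0, so r = 1 does not loop forever; the range of r values is an arbitrary choice.

```python
def executeproth(r):
    # Split r - 1 into (remaining factor, number of factors of 2)
    n = r - 1
    count = 0
    if n != 0:  # n == 0 (r == 1) would otherwise loop forever
        while n % 2 == 0:
            count += 1
            n //= 2
    return n, count

# The commented-out check, applied to every r in a range
for r in range(-50, 1000):
    n, count = executeproth(r)
    assert n * 2 ** count + 1 == r, r
    assert n == 0 or n % 2 == 1, r  # remaining factor is odd (zero only when r == 1)
print("all checks passed")
```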

Number of multiples less than the max number

For the following problem on SingPath:
Given an input of a list of numbers and a high number,
return the number of multiples of each of
those numbers that are less than the maximum number.
For this case the list will contain a maximum of 3 numbers
that are all relatively prime to each
other.
Here is my code:
def countMultiples(l, max_num):
    counting_list = []
    for i in l:
        for j in range(1, max_num):
            if (i * j < max_num) and (i * j) not in counting_list:
                counting_list.append(i * j)
    return len(counting_list)
Although my algorithm works, it gets stuck when the maximum number is very big:
>>> countMultiples([3],30)
9 #WORKS GOOD
>>> countMultiples([3,5],100)
46 #WORKS GOOD
>>> countMultiples([13,25],100250)
Line 5: TimeLimitError: Program exceeded run time limit.
How to optimize this code?

3 and 5 share some multiples, like 15.
Those must be counted only once to get the right answer.
Also, you should check the inclusion-exclusion principle: https://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_integers
EDIT:
The problem can be solved in constant time (independent of the maximum number). As previously linked, the solution is the inclusion-exclusion principle.
Say you want the number of multiples of 3 less than 100. Since 100 itself must be excluded, that is floor(99/3) = 33; the same applies for 5: floor(99/5) = 19.
Now, to get the numbers less than 100 that are multiples of 3 or 5, you add these counts and subtract the ones that are multiples of both (here, the multiples of 15), because they were counted twice.
So the answer for multiples of 3 and 5 that are less than 100 is floor(99/3) + floor(99/5) - floor(99/15) = 33 + 19 - 6 = 46.
If you have more than 2 numbers it gets a bit more complicated, but the same approach applies; for more, check https://en.wikipedia.org/wiki/Inclusion-exclusion_principle#Counting_integers
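For more than two numbers, inclusion-exclusion adds the counts for every odd-sized subset and subtracts them for every even-sized subset. A sketch of that general case, using the least common multiple of each subset so it still works if the numbers are not pairwise coprime (for coprime numbers the lcm is just the product); count_multiples is an illustrative name:

```python
from functools import reduce
from itertools import combinations
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def count_multiples(nums, max_num):
    # Count integers in 1..max_num-1 divisible by at least one value in nums
    limit = max_num - 1  # "less than max_num" excludes max_num itself
    total = 0
    for size in range(1, len(nums) + 1):
        sign = 1 if size % 2 else -1  # add odd-sized subsets, subtract even-sized
        for subset in combinations(nums, size):
            total += sign * (limit // reduce(lcm, subset))
    return total

print(count_multiples([3], 30))            # 9
print(count_multiples([3, 5], 100))        # 46
print(count_multiples([13, 25], 100250))   # 11412
```

With at most 3 numbers there are only 7 subsets, so the cost does not depend on max_num at all.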
EDIT2:
Also, the loop variant can be sped up.
Your current algorithm appends multiples to a list and checks membership in that list, which is very slow.
You should swap the inner and outer for loops. That way you check, for each number, whether any of the divisors divides it; you only need to know whether at least one does.
So just add a boolean variable that records whether any of your divisors divides the number, and count how many times it is true.
It would look like this:
def countMultiples(l, max_num):
    nums = 0
    for j in range(1, max_num):
        isMultiple = False
        for i in l:
            if j % i == 0:
                isMultiple = True
                break  # one divisor is enough
        if isMultiple:
            nums += 1
    return nums
print(countMultiples([13,25],100250))

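If the length of the list is all you need, you'd be better off with a tally instead of creating another list. To keep the tally correct without the list, skip any multiple that is also a multiple of an earlier number in l, so it is not counted twice:
def countMultiples(l, max_num):
    count = 0
    for idx, i in enumerate(l):
        for m in range(i, max_num, i):  # the multiples of i below max_num
            # skip m if an earlier number in l already counted it
            if not any(m % prev == 0 for prev in l[:idx]):
                count += 1
    return count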

Finding digits in powers of 2 fast

The task is to search every power of two below 2^10000, returning the index of the first power in which a string is contained. For example if the given string to search for is "7" the program will output 15, as 2^15 is the first power to contain 7 in it.
I have approached this with a brute force attempt which times out on ~70% of test cases.
for i in range(1,9999):
    if search in str(2**i):
        print(i)
        break
How would one approach this with a time limit of 5 seconds?

Try not to compute 2^i at each step.
p = 2  # 2**1, matching the first exponent i = 1
for i in xrange(1,9999):
    if search in str(p):
        print(i)
        break
    p *= 2
You can compute it as you go along. This should save a lot of computation time.
Using xrange will prevent a list from being built, but that will probably not make much of a difference here.
in is probably implemented as a quadratic string search algorithm. It may (or may not, you'd have to test) be more efficient to use something like KMP for string searching.

A faster approach could be computing the numbers directly in decimal, storing 8 decimal digits per list element (least significant group first):
def double(x):
    carry = 0
    for i, v in enumerate(x):
        d = v*2 + carry
        if d > 99999999:
            x[i] = d - 100000000
            carry = 1
        else:
            x[i] = d
            carry = 0
    if carry:
        x.append(carry)
Then the search function can become
def p2find(s):
    x = [1]
    for y in range(10000):
        if s in str(x[-1])+"".join(("00000000"+str(limb))[-8:]
                                   for limb in x[::-1][1:]):
            return y
        double(x)
    return None
Note also that the digits of all powers of two up to 2^10000 add up to only about 15 million characters, and searching that static data is much faster. If the program does not have to be restarted for each query, then:
def p2find(s, digits = []):
    # the default list persists between calls and acts as a cache
    if len(digits) == 0:
        # This precomputation happens only ONCE
        p = 1
        for k in range(10000):
            digits.append(str(p))
            p *= 2
    for i, v in enumerate(digits):
        if s in v: return i
    return None
With this approach the first check will take some time, but the next ones will be very fast.

Compute every power of two and build a suffix tree using each string. This is linear time in the size of all the strings. Now, the lookups are basically linear time in the length of each lookup string.
I don't think you can beat this for computational complexity.

There are only 10000 numbers. You don't need any complex algorithms: simply calculate them in advance and then search. This should take merely 1 or 2 seconds.
powers_of_2 = [str(1<<i) for i in range(10000)]
def search(s):
    for i, digits in enumerate(powers_of_2):
        if s in digits:
            return i

Try this:
import sys
twos = []
twoslen = []
two = 1
for i in range(10000):
    twos.append(two)
    twoslen.append(len(str(two)))
    two *= 2
tens = []
ten = 1
for i in range(len(str(two))):
    tens.append(ten)
    ten *= 10
s = input()
l = len(s)
n = int(s)
for i in range(len(twos)):
    for j in range(twoslen[i]):
        k = twos[i] // tens[j]
        if k < n: continue
        if (k - n) % tens[l] == 0:
            print(i)
            sys.exit()
The idea is to precompute every power of 2 and of 10, and also the number of digits of every power of 2. In this way the problem is reduced to finding the minimum i for which there exists a j such that, after removing the last j digits from 2 ** i, you obtain a number that ends with n; expressed as a formula: (2 ** i // 10 ** j - n) % 10 ** len(str(n)) == 0.
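The arithmetic condition can be sanity-checked against a plain substring test. This sketch assumes the search string is a positive integer without leading zeros (for "0" or strings like "07", the arithmetic test would also match numbers that merely shrink to the right value); both function names are illustrative.

```python
def contains_by_arithmetic(power, s):
    # True if some k = power // 10**j ends with the digits of s
    n, l = int(s), len(s)
    k = power
    while k >= n:
        if k % 10**l == n:
            return True
        k //= 10  # drop the last decimal digit, i.e. increase j by one
    return False

def first_power_with(s, limit=10000):
    for i in range(limit):
        if contains_by_arithmetic(2**i, s):
            return i
    return None

print(first_power_with("7"))  # 15, since 2**15 = 32768
```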

A big problem here is that converting a binary integer to decimal notation takes time quadratic in the number of bits (at least in the straightforward way Python does it). It's actually faster to fake your own decimal arithmetic, as #6502 did in his answer.
But it's very much faster to let Python's decimal module do it - at least under Python 3.3.2 (I don't know how much C acceleration is built in to Python decimal versions before that). Here's code:
class S:
    def __init__(self):
        import decimal
        decimal.getcontext().prec = 4000 # way more than enough for 2**10000
        p2 = decimal.Decimal(1)
        full = []
        for i in range(10000):
            s = "%s<%s>" % (p2, i)
            ##assert s == "%s<%s>" % (str(2**i), i)
            full.append(s)
            p2 *= 2
        self.full = "".join(full)
    def find(self, s):
        import re
        pat = s + r"[^<>]*<(\d+)>"
        m = re.search(pat, self.full)
        if m:
            return int(m.group(1))
        else:
            print(s, "not found!")
and sample usage:
>>> s = S()
>>> s.find("1")
0
>>> s.find("2")
1
>>> s.find("3")
5
>>> s.find("65")
16
>>> s.find("7")
15
>>> s.find("00000")
1491
>>> s.find("666")
157
>>> s.find("666666")
2269
>>> s.find("66666666")
66666666 not found!
s.full is a string with a bit over 15 million characters. It looks like this:
>>> print(s.full[:20], "...", s.full[-20:])
1<0>2<1>4<2>8<3>16<4 ... 52396298354688<9999>
So the string contains each power of 2, followed by its exponent enclosed in angle brackets. The find() method constructs a regular expression that searches for the desired substring and then looks ahead to find the exponent.
Playing around with this, I'm convinced that just about any way of searching is "fast enough". It's getting the decimal representations of the large powers that sucks up the vast bulk of the time. And the decimal module solves that one.
