Python methods to reduce the running time of a KTNS algorithm

I have been trying to develop an algorithm called Keep the Tool Needed Soonest (KTNS), but during the simulations I realized that it takes too much time to run.
I want to decrease the running time, and while checking previous questions about speeding up Python code (such as "Is Python slower than Java/C#? [closed]") I found several suggestions, but I don't know how to implement them in my code.
On my computer one call takes about 0.005 seconds (0.004999876022338867 s measured), but the main problem is that the whole program calls this function 13,000 times.
Here I attach my whole code; if you have any suggestion to improve it, please don't hesitate to share it with me.
import sets
import numpy
import copy
import time

J = {1: [6, 7], 2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]}

def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10, capacity=4):
    t0 = time.time()
    Tools = {}
    Lin = {}
    n = len(sigma)
    # create an (initially empty) entry for every tool needed by a job in sigma
    for i in range(1, m + 1):
        for e in sigma:
            if i in Jobs[e]:
                Tools[i] = sets.Set([])
    # Tools[i] = set of positions in sigma whose job needs tool i
    count = 1
    available_tools = sets.Set()
    for e in sigma:
        for i in Jobs[e]:
            Tools[i].add(count)
            available_tools.add(i)
        count += 1
    Tin = copy.deepcopy(Tools)
    # Lin[i] = position of the next job that needs tool i
    for e in Tin:
        Lin[e] = min(Tin[e])
    count = 1
    J = numpy.array([0] * m)
    W = numpy.array([[0] * m] * n)
    while count <= len(sigma):
        for e in Tools:
            if len(available_tools) < capacity:
                reference = len(available_tools)
            else:
                reference = capacity
            # fill the magazine with the tools that are needed soonest
            while numpy.count_nonzero(J == 1) < reference:
                min_value = min(Lin.itervalues())
                min_keys = [k for k in Lin if Lin[k] == min_value]
                temp = min_keys[0]  # min(Lin, key=Lin.get)
                if min_value > count:
                    if len(min_keys) >= 2:
                        if count == 1:
                            J[temp - 1] = 1
                            Lin[temp] = '-'
                        else:
                            # tie-break: prefer tools that were loaded in the previous step
                            J0 = W[count - 2]
                            k = 0
                            for elements in min_keys:  # tested
                                if numpy.count_nonzero(J == 1) < reference:
                                    if J0[elements - 1] == 1:
                                        J[elements - 1] = 1
                                        Lin[elements] = '-'
                                        k = 1
                            if k == 0:
                                J[temp - 1] = 1
                                Lin[temp] = '-'
                    else:
                        J[temp - 1] = 1
                        Lin[temp] = '-'
                else:
                    J[temp - 1] = 1
                    Lin[temp] = '-'
            Tin[e].discard(count)
        # restore Lin with the next needed position of every tool
        for element in Tin:
            try:
                Lin[element] = min(Tin[element])
            except ValueError:
                Tin[element] = sets.Set([len(sigma) + 1])
                Lin[element] = len(sigma) + 1
        W[count - 1] = J
        J = numpy.array([0] * m)
        count += 1
    # cost = number of tool insertions over the sequence, plus the initial load
    Cost = 0
    for e in range(1, len(sigma)):
        temp = W[e] - W[e - 1]
        temp[temp < 0] = 0
        Cost += sum(temp)
    return Cost + capacity, time.time() - t0

One recommendation: try to minimize your use of dictionaries. It looks like many of your dictionaries could be lists instead, and dictionary access is noticeably slower than list access in Python.
It looks like you could simply make Tools, Lin and Tin all lists, e.g. Lin = [] instead of Lin = {}, and I expect you'll see a solid improvement in performance.
You know the sizes of your 3 dictionaries, so just initialize them to the size you need:
Lin = [None] * (m + 1)
Tools = [None] * (m + 1)
Tin = [None] * (m + 1)
This makes a list of m+1 elements (note the parentheses: [None] * m + 1 would raise a TypeError). Since you're doing 1-based indexing, it leaves an unused slot at Lin[0], Tools[0], etc., but you'll then be able to access Lin[1] through Lin[10], as you're currently doing.
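As a rough sketch of the rewrite (my adaptation, untested against the full program), the setup phase with lists instead of dictionaries could look like this:
# Hypothetical list-based setup; index 0 is unused because tools are numbered from 1.
Tools = [None] * (m + 1)
Lin = [None] * (m + 1)
for i in range(1, m + 1):
    Tools[i] = set()  # built-in set instead of the deprecated sets module
count = 1
for e in sigma:
    for i in Jobs[e]:
        Tools[i].add(count)  # positions in sigma where tool i is needed
    count += 1
for i in range(1, m + 1):
    # next position where tool i is needed; past the end if never needed
    Lin[i] = min(Tools[i]) if Tools[i] else len(sigma) + 1
Every Lin[k]/Tools[k] lookup in the inner loops then becomes a plain list index instead of a dict hash lookup.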
Simple example you can try for yourself:
python3 -m timeit -s 'foo = [x for x in range(10000)]' 'foo[500]'
100000000 loops, best of 3: 0.0164 usec per loop
python3 -m timeit -s 'foo = {x: x for x in range(10000)}' 'foo[500]'
10000000 loops, best of 3: 0.0254 usec per loop
Simply by changing your dictionaries to lists, you get roughly a 1.5x improvement in access time (0.0254 vs 0.0164 usec above), which would cut your 65-second total (0.005 s x 13,000 calls) to roughly 40 seconds.
By the way, check out the Python wiki for tips on improving speed, including references on how to profile your function.
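For example, a minimal profiling session with the standard library's cProfile (my sketch; it assumes the KTNS function above is defined in the same script) looks like this:
import cProfile
import pstats

# Profile 100 calls of KTNS and print the 10 most expensive functions.
cProfile.run("for _ in range(100): KTNS()", "ktns.prof")
stats = pstats.Stats("ktns.prof")
stats.sort_stats("cumulative").print_stats(10)
This will tell you whether the time really goes into the dictionary lookups or somewhere else entirely.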

Related

Random series where none are 75% alike

So, I've been trying to make a random series generator with the given numbers using an array;
the possibilities per position are: [0-9, 0-9, 0-9, 0-59, 0-9, 0-9, 0-9].
The only problem is that I don't want any two series to be even 75% the same (no more than 2 positions holding the same number).
So here are some examples:
Good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 2]
Not good:
[1, 1, 1, 1, 1, 1, 1]
[2, 2, 1, 2, 1, 2, 1]
So, if more than 2 positions have the same number, the new series is discarded.
And the second problem is that I want 10,000 of these series.
Sorry if I didn't explain it well, the code would probably explain what I tried to explain.
TRIGGER WARNING!! CODE ISN'T EFFICIENT AT ALL!!
import random

TOTAL_SERIES = 10000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []

def create_series():
    series = []
    for i in range(len(placement_amount)):
        series.append(random.randint(0, placement_amount[i]))
    # reject the candidate if it matches any existing series in more than 2 positions
    for i in all_series:
        count = 0
        for j in range(len(i)):
            if series[j] == i[j]:
                count += 1
        if count > 2:
            return
    all_series.append(series)

while len(all_series) < TOTAL_SERIES:
    create_series()
The code technically works, but it takes around 1 hour to generate 400 of these, since the longer it runs the harder it gets to find a series that follows the rules.
So, my question is: how do I make it efficient enough to generate 10,000 series as fast as possible?
What I've tried so far:
Tried adding CUDA so I could run the code on a GPU to make it faster (I have 32-bit Python, so I can't).
Tried creating a few threads, each generating 10,000/threads series, and then running code that deletes all the ones that don't follow the rules (the code just got stuck).
I'm open to hearing how I can try these again with correct code, or anything else that will make it efficient.
The answer for me wasn't code efficiency: it's simply impossible to generate 10,000 such series, because the first three positions can only take 10 x 10 x 10 = 1,000 distinct combinations, so among 10,000 series two must agree in at least three positions. So I changed the line:
if count > 2:
to
if count > 3:
Thanks everyone for the help, but if you have a way to make it more efficient, that would be nice :D
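A quick sketch of the pigeonhole argument behind that (my addition):
from itertools import product

# Any 3 fixed 0-9 positions admit only 10**3 = 1000 distinct value triples,
# so among 10,000 series some triple must repeat, i.e. two series would
# match in at least 3 positions and break the original rule.
print(len(set(product(range(10), repeat=3))))  # 1000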
Your original solution is O(P(N)*N); you can reduce it to O(P(N)) with dicts by precomputing the different index combinations:
- P(N) is the expected number of iterations to get N such series
- the constants are larger!
import itertools
import random

# every way to pick 3 of the 7 positions; a repeated value triple at any of
# these index combinations means two series share more than 2 positions
indexes = list(itertools.combinations(range(7), 3))
big_dict = {k: {} for k in indexes}
TOTAL_SERIES = 1000
placement_amount = [9, 9, 9, 59, 9, 9, 9]
all_series = []
loops = 0
while len(all_series) < TOTAL_SERIES:
    loops += 1
    candidate = tuple(random.randint(0, amount) for amount in placement_amount)
    if any((candidate[index[0]], candidate[index[1]], candidate[index[2]]) in
           big_dict[index] for index in indexes):
        continue
    else:
        for index in indexes:
            big_dict[index][(candidate[index[0]], candidate[index[1]], candidate[index[2]])] = True
        all_series.append(candidate)
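For scale (my note, supporting the "constants are larger" caveat): every candidate is checked against all C(7,3) = 35 position triples, so each rejection costs at most 35 set lookups and each accepted series 35 inserts.
import itertools
print(len(list(itertools.combinations(range(7), 3))))  # 35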
This must be the solution:
import random

def gen_series(pattern):
    return [random.randint(0, max_val) for max_val in pattern]

pattern = [9, 9, 9, 59, 9, 9, 9]
for i in range(100):
    print(gen_series(pattern))

How to improve time complexity of remove all multiplicands from array or list?

I am trying to find the elements of an integer array or list that are unique and not divisible by any other element of the same array or list.
You can answer in any language, like Python, Java, C, C++, etc.
I have tried this code in Python 3 and it works perfectly, but I am looking for a better, more optimal solution in terms of time complexity.
Assuming the array or list A is already sorted and has unique elements:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
i = 0
j = i + 1
while i < len(A) - 1:
    while j < len(A):
        if A[j] % A[i] == 0:
            A.pop(j)
        else:
            j += 1
    i += 1
    j = i + 1
For the given array A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] the answer would be ans = [2,3,5,7,11,13];
another example: for A = [4,5,15,16,17,23,39] the answer would be ans = [4,5,17,23,39].
ans contains unique numbers:
an element i of the array survives only if i % j != 0 for every other element j (i != j).
I think it's more natural to do it in reverse, by building a new list containing the answer instead of removing elements from the original list. If I'm thinking correctly, both approaches do the same number of mod operations, but you avoid the issue of removing an element from a list.
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
ans = []
for x in A:
    for y in ans:
        if x % y == 0:
            break
    else:
        ans.append(x)
Edit: promoted the loop's completion else (the for/else idiom) in the code above.
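In case the for/else idiom is unfamiliar, here is a minimal demo (my addition): the else clause of a for loop runs only when the loop completes without hitting break.
ans = []
for x in [2, 3, 4, 5]:
    for y in ans:
        if x % y == 0:
            break        # x has a divisor already in ans: skip it
    else:
        ans.append(x)    # runs only when the inner loop didn't break
print(ans)               # [2, 3, 5] -- 4 is dropped because 2 divides it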
This algorithm will perform much faster:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]

if (A[-1] - A[0]) / A[0] > len(A) * 2:
    # widely spread data: assemble the result with divisibility checks
    result = list()
    for v in A:
        for f in result:
            d, m = divmod(v, f)
            if m == 0:
                v = 0
                break
            if d < f:
                break
        if v:
            result.append(v)
else:
    # dense data: sieve out the multiples of each retained number
    retain = set(A)
    minMult = 1
    maxVal = A[-1]
    for v in A:
        if v not in retain:
            continue
        minMult = v * 2
        if minMult > maxVal:
            break
        if v * len(A) < maxVal:
            retain.difference_update([m for m in retain if m >= minMult and m % v == 0])
        else:
            retain.difference_update(range(minMult, maxVal + 1, v))
        if maxVal % v == 0:
            maxVal = max(retain)
    result = list(retain)

print(result)  # [2, 3, 5, 7, 11, 13]
In the spirit of the sieve of Eratosthenes, each number that is retained removes its multiples from the remaining eligible numbers. Depending on the magnitude of the highest value, it is sometimes more efficient to exclude multiples than to check for divisibility; the divisibility check takes several times longer for an equivalent number of factors to check.
At some point, when the data is widely spread out, assembling the result instead of removing multiples becomes faster (this last addition was inspired by Imperishable Night's post).
TEST RESULTS
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (100000 repetitions)
Original: 0.55 sec
New: 0.29 sec
A = list(range(2,5000))+[9697] (100 repetitions)
Original: 3.77 sec
New: 0.12 sec
A = list(range(1001,2000))+list(range(4000,6000))+[9697**2] (10 repetitions)
Original: 3.54 sec
New: 0.02 sec
I know that this is totally insane, but I want to know what you think about this:
A = [4, 5, 15, 16, 17, 23, 39]
prova = [[x for x in A if x != y and y % x == 0] for y in A]
print([A[idx] for idx, x in enumerate(prova) if len(prova[idx]) == 0])
And I think it's still O(n^2).
If you care about speed more than algorithmic efficiency, numpy would be the package to use here in python:
import numpy as np
# Note: doesn't have to be sorted
a = [2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 29, 29]
a = np.unique(a)
result = a[np.all((a % a[:, None] + np.diag(a)), axis=0)]
# array([2, 3, 5, 7, 11, 13, 29])
This divides all elements by all other elements and stores the remainder in a matrix, checks which columns contain only non-0 values (other than the diagonal), and selects all elements corresponding to those columns.
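To make the matrix trick concrete, here is a tiny worked example (my addition, with a 3-element input):
import numpy as np

a = np.array([2, 3, 4])
print(a % a[:, None])  # row i holds every element's remainder modulo a[i]
# [[0 1 0]
#  [2 0 1]
#  [2 3 0]]
# Adding np.diag(a) re-fills the zero diagonal, so a column is all-nonzero
# exactly when its element has no other divisor in the array:
mask = np.all(a % a[:, None] + np.diag(a), axis=0)
print(a[mask])  # [2 3] -- 4 is dropped because 2 divides it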
This is O(n*M), where M is the maximum value in your list. The integers are all assumed to be non-negative. This also assumes your input list is sorted (I came to that assumption since all the lists you provided are sorted).
a = [4, 7, 7, 8]
# a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
# a = [4, 5, 15, 16, 17, 23, 39]
M = max(a)
used = set()
final_list = []
for e in a:
    if e in used:
        continue
    else:
        used.add(e)
        # mark every multiple of e up to M as used
        for i in range(e, M + 1):
            if not (i % e):
                used.add(i)
        final_list.append(e)
print(final_list)
Maybe this can be optimized even further...
If the list is not sorted, then for the above method to work one must sort it first. The time complexity will then be O(n log n + Mn), which is O(n log n) when M grows no faster than log n.

Python - how to speed up a for loop creating a numpy array from another numpy array calculation

First off, apologies for the vague title, I couldn't think of an appropriate name for this issue.
I have 3 numpy arrays in the following formats:
N = [[13, 14, 15], [2, 5, 7], [4, 6, 8], ...]  # several hundred thousand rows
e1 = [1, 0, 0]
e2 = [0, 1, 0]
The idea is to create a fourth array, 'v', which shall have the same dimensions as 'N', but will be given values based on an if statement. Here is what I currently have which should better explain the issue:
v = np.zeros([len(N), 3])
for i in range(0, len(N)):
    if (N*e1)[i,0] != 0:
        v[i] = np.cross(N[i], e1)
    else:
        v[i] = np.cross(N[i], e2)
This code does what I require it to but does so in a longer than anticipated time (> 5 mins). Is there any form of list comprehension or similar concept I could use to increase the efficiency of the code?
You can use numpy.where to replace the if-else and vectorize the process with broadcasting:
import numpy as np
np.where(np.repeat(N[:,0] != 0, 3).reshape(1000,3), np.cross(N, e1), np.cross(N, e2))
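As a side note (my variant, not from the original answer), the repeat/reshape can also be written as a plain broadcast over a column mask, which avoids hardcoding the length 1000:
# Broadcast the (n,) condition to a column so it pairs with the (n, 3) results.
v = np.where((N[:, 0] != 0)[:, None], np.cross(N, e1), np.cross(N, e2))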
Some benchmarks here:
1) Data set up:
N = np.array([np.random.randint(0,10,3) for i in range(1000)])
N
#array([[3, 5, 0],
# [5, 0, 8],
# [4, 6, 0],
# ...,
# [9, 4, 2],
# [6, 9, 3],
# [2, 9, 2]])
e1 = np.array([1, 0, 0])
e2 = np.array([0, 1, 0])
2) Timing:
def forloop():
    v = np.zeros([len(N), 3])
    for i in range(0, len(N)):
        if (N*e1)[i,0] != 0:
            v[i] = np.cross(N[i], e1)
        else:
            v[i] = np.cross(N[i], e2)
    return v

def forloop2():
    v = np.zeros([len(N), 3])
    # Only calculate this one time.
    my_product = N*e1
    for i in range(0, len(N)):
        if my_product[i,0] != 0:
            v[i] = np.cross(N[i], e1)
        else:
            v[i] = np.cross(N[i], e2)
    return v
%timeit forloop()
10 loops, best of 3: 25.5 ms per loop
%timeit forloop2()
100 loops, best of 3: 12.7 ms per loop
%timeit np.where(np.repeat(N[:,0] != 0, 3).reshape(1000,3), np.cross(N, e1), np.cross(N, e2))
10000 loops, best of 3: 71.9 µs per loop
3) Result checking for all methods:
v1 = forloop()
v2 = np.where(np.repeat(N[:,0] != 0, 3).reshape(1000,3), np.cross(N, e1), np.cross(N, e2))
v3 = forloop2()
(v3 == v1).all()
# True
(v1 == v2).all()
# True
I'm not certain what it is you're trying to do, but I know why this specific code is so slow for you. The worst offender is (N*e1): that's a simple calculation, and it runs pretty fast with numpy, but you're executing it inside the loop, len(N) times!
I am able to execute your code with a million rows in N in less than 15 seconds on my machine by pulling that outside of the loop. Example below.
v = np.zeros([len(N), 3])

# Only calculate this one time.
my_product = N*e1

for i in range(0, len(N)):
    if my_product[i,0] != 0:
        v[i] = np.cross(N[i], e1)
    else:
        v[i] = np.cross(N[i], e2)
The other answer demonstrates how to avoid the for loop and if statements for a lot of extra speed at the cost of somewhat less readable code.

the path complexity (fastest route) to any given number in python

Today I went to a math competition and there was a question that was something like this:
You are given a number n, and you have to calculate the shortest route to that number, but there are rules.
You start with number 1
You end when you reach n
You can get to n either by doubling your previous number, or by adding two previous numbers.
Example: n = 25
Slowest route : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25
(You just keep adding 1)
Fastest route : 1,2,4,8,16,24,25, complexity = 6
Example: n = 8
Fastest route : 1,2,4,8, complexity = 3
Example : n = 15
Fastest route : 1,2,3,6,9,15, complexity = 5
How do I make a program that can calculate the complexity of a given number n (with n <= 32)?
I already know that for any given number n (n <= 32) the complexity is less than 1.45 x log2(n).
So now I only need to calculate all the routes with complexity below 1.45 x log2(n) and then compare them to see which one is the fastest 'route'.
But I have no idea how to enumerate all those routes in Python, because the number of routes changes with the given number n.
This is what I have for now:
number = int(raw_input('Enter your number here : '))
startnumber = 1
complexity = 0
while startnumber <= number:
I accept the challenge :)
The algorithm is relatively fast. It calculates the complexity of the first 32 numbers in 50ms on my computer, and I don't use multithreading. (or 370ms for the first 100 numbers.)
It is a recursive branch and cut algorithm. The _shortest function takes 3 arguments; the optimization lies in the max_len argument. E.g. if the function finds a solution with length=9, it stops considering any paths with a length > 9. The first path that is found is always a pretty good one, which directly follows from the binary representation of the number. E.g. in binary: 111001 => [1,10,100,1000,10000,100000,110000,111000,111001]. That's not always the fastest path, but if you only search for paths that are at least as fast, you can cut away most of the search tree.
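To illustrate that binary-representation path (my sketch, separate from the answer's code below): double up to the highest power of two not exceeding the target, then fold in the remaining set bits from high to low.
# Build an addition chain from n's binary representation.
def binary_chain(n):
    chain = [1]
    while chain[-1] * 2 <= n:
        chain.append(chain[-1] * 2)  # one doubling step per bit position
    current = chain[-1]
    for p in reversed(chain[:-1]):   # add earlier powers for the lower set bits
        if current + p <= n:
            current += p
            chain.append(current)
    return chain

print binary_chain(57)  # 57 = 0b111001 -> [1, 2, 4, 8, 16, 32, 48, 56, 57]
The search below then only has to look for chains at most this long.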
#!/usr/bin/env python
# Find the shortest addition chain...
# @param acc     List of integers, the "accumulator". A strictly monotonous
#                addition chain with at least two elements.
# @param target  An integer > 2. The number that should be reached.
# @param max_len An integer > 2. The maximum length of the addition chain.
# @return An addition chain starting with acc and ending with target, with
#         at most max_len elements. Or None if such an addition chain
#         does not exist. The solution is optimal! There is no addition
#         chain with these properties which can be shorter.
def _shortest(acc, target, max_len):
    length = len(acc)
    if length > max_len:
        return None
    last = acc[-1]
    if last == target:
        return acc
    if last > target:
        return None
    if length == max_len:
        return None
    last_half = (last / 2)
    solution = None
    potential_solution = None
    good_len = max_len
    # Quick check: can we make this work?
    # (this improves the performance considerably for target > 70)
    max_value = last
    for _ in xrange(length, max_len):
        max_value *= 2
        if max_value >= target:
            break
    if max_value < target:
        return None
    for i in xrange(length-1, -1, -1):
        a = acc[i]
        if a < last_half:
            break
        for j in xrange(i, -1, -1):
            b = acc[j]
            s = a+b
            if s <= last:
                break
            # modifying acc in-place has much better performance than copying
            # the list and doing
            #   new_acc = list(acc)
            #   potential_solution = _shortest(new_acc, target, good_len)
            acc.append(s)
            potential_solution = _shortest(acc, target, good_len)
            if potential_solution is not None:
                new_len = len(potential_solution)
                solution = list(potential_solution)
                good_len = new_len-1
            # since we didn't copy the list, we have to truncate it to its
            # original length now.
            del acc[length:]
    return solution

# Finds the shortest addition chain reaching to n.
# E.g. 9 => [1,2,3,6,9]
def shortest(n):
    if n > 3:
        # common case first
        return _shortest([1,2], n, n)
    if n < 1:
        raise ValueError("n must be >= 1")
    return list(xrange(1,n+1))

for i in xrange(1,33):
    s = shortest(i)
    c = len(s) - 1
    print ("complexity of %2d is %d: e.g. %s" % (i,c,s))
Output:
complexity of 1 is 0: e.g. [1]
complexity of 2 is 1: e.g. [1, 2]
complexity of 3 is 2: e.g. [1, 2, 3]
complexity of 4 is 2: e.g. [1, 2, 4]
complexity of 5 is 3: e.g. [1, 2, 4, 5]
complexity of 6 is 3: e.g. [1, 2, 4, 6]
complexity of 7 is 4: e.g. [1, 2, 4, 6, 7]
complexity of 8 is 3: e.g. [1, 2, 4, 8]
complexity of 9 is 4: e.g. [1, 2, 4, 8, 9]
complexity of 10 is 4: e.g. [1, 2, 4, 8, 10]
complexity of 11 is 5: e.g. [1, 2, 4, 8, 10, 11]
complexity of 12 is 4: e.g. [1, 2, 4, 8, 12]
complexity of 13 is 5: e.g. [1, 2, 4, 8, 12, 13]
complexity of 14 is 5: e.g. [1, 2, 4, 8, 12, 14]
complexity of 15 is 5: e.g. [1, 2, 4, 5, 10, 15]
complexity of 16 is 4: e.g. [1, 2, 4, 8, 16]
complexity of 17 is 5: e.g. [1, 2, 4, 8, 16, 17]
complexity of 18 is 5: e.g. [1, 2, 4, 8, 16, 18]
complexity of 19 is 6: e.g. [1, 2, 4, 8, 16, 18, 19]
complexity of 20 is 5: e.g. [1, 2, 4, 8, 16, 20]
complexity of 21 is 6: e.g. [1, 2, 4, 8, 16, 20, 21]
complexity of 22 is 6: e.g. [1, 2, 4, 8, 16, 20, 22]
complexity of 23 is 6: e.g. [1, 2, 4, 5, 9, 18, 23]
complexity of 24 is 5: e.g. [1, 2, 4, 8, 16, 24]
complexity of 25 is 6: e.g. [1, 2, 4, 8, 16, 24, 25]
complexity of 26 is 6: e.g. [1, 2, 4, 8, 16, 24, 26]
complexity of 27 is 6: e.g. [1, 2, 4, 8, 9, 18, 27]
complexity of 28 is 6: e.g. [1, 2, 4, 8, 16, 24, 28]
complexity of 29 is 7: e.g. [1, 2, 4, 8, 16, 24, 28, 29]
complexity of 30 is 6: e.g. [1, 2, 4, 8, 10, 20, 30]
complexity of 31 is 7: e.g. [1, 2, 4, 8, 10, 20, 30, 31]
complexity of 32 is 5: e.g. [1, 2, 4, 8, 16, 32]
There is a dynamic programming solution to your problem: since you either add two previous numbers or multiply a number by 2, we can try all those cases and choose the minimum. Also, if the complexity of 25 were 5 and the route contained 9, then we would already know the solution for 9 (which is 4) and could use it to generate the solution for 25. We also need to keep track of every possible minimum solution for m, to be able to use it when solving n, where m < n.
def solve(m):
    p = [set([frozenset([])]) for i in xrange(m+1)]  # contains all paths to reach n
    a = [9999 for _ in xrange(m+1)]  # contains all complexities, initialized with a big number
    a[1] = 0
    p[1] = set([frozenset([1])])
    for i in xrange(1, m+1):
        for pos in p[i]:
            for j in pos:  # try adding any two numbers and 2*any number
                for k in pos:
                    if j+k <= m:
                        if a[j+k] > a[i]+1:
                            a[j+k] = a[i] + 1
                            p[j+k] = set([frozenset(list(pos) + [j+k])])
                        elif a[j+k] == a[i]+1:
                            p[j+k].add(frozenset(list(pos) + [j+k]))
    return a[m], sorted(list(p[m].pop()))

n = int(raw_input())
print solve(n)
this can solve up to n = 100
For larger numbers you can get a 30% or more speedup by adding a couple lines to remove some redundant calculations from the inner loop. For this the pos2 variable is created and trimmed on each iteration:
def solve(m):
    p = [set([frozenset([])]) for i in xrange(m+1)]  # contains all paths to reach n
    a = [9999 for _ in xrange(m+1)]  # contains all complexities, initialized with a big number
    a[1] = 0
    p[1] = set([frozenset([1])])
    for i in xrange(1, m+1):
        for pos in p[i]:
            pos2 = set(pos)
            for j in pos:  # try adding any two numbers and 2*any number
                for k in pos2:
                    if j+k <= m:
                        if a[j+k] > a[i]+1:
                            a[j+k] = a[i] + 1
                            p[j+k] = set([frozenset(list(pos) + [j+k])])
                        elif a[j+k] == a[i]+1:
                            p[j+k].add(frozenset(list(pos) + [j+k]))
                pos2.remove(j)
    return a[m], sorted(list(p[m].pop()))
Brute forcing this
def solve(m, path):
    if path[-1] == m:
        return path
    if path[-1] > m:
        return False
    best_path = [i for i in range(m)]  # dummy: longer than any real route
    # the next number is the sum of any two previous numbers (doubling is
    # the k1 == k2 case); only keep steps that grow the chain
    for k1 in path:
        for k2 in path:
            if k1 + k2 <= path[-1]:
                continue
            test_path = solve(m, path + [k1 + k2])
            if test_path and len(test_path) < len(best_path):
                # retain best
                best_path = test_path
    return best_path

print (solve(19, [1,2]))  # e.g. [1, 2, 4, 8, 16, 18, 19]
print (solve(25, [1,2]))  # e.g. [1, 2, 4, 8, 16, 24, 25]
This runs fairly slowly; I'm pretty sure a smarter solution exists, but this looks semantically correct.
A brute-force search that simply explores all the possible paths, breadth-first, until it reaches the target. It is extremely fast because it does not re-evaluate numbers that have already been reached at a lower complexity, and it is guaranteed to find the optimal path.
# Use a node class that contains the number as well as a pointer to its parent
class Node:
    def __init__(self, number, parent):
        self.number = number
        self.parent = parent

    # get a list of all the numbers to reach this node
    def getPath(self):
        path = [self.number]
        parent = self.parent
        while parent is not None:
            path.append(parent.number)
            parent = parent.parent
        return path

def solve(start, target):
    currentList = []                # nodes visited in the previous round
    nextList = [Node(start, None)]  # nodes to visit in the next round (start with at least one number)
    seen = set([start])             # all numbers already seen in a previous round
    while nextList:  # continue until the target is reached; each iteration grows the complexity
        currentList = nextList  # swap the lists around
        nextList = []
        for n in currentList:
            path = n.getPath()  # fetch all the numbers needed to reach this point
            if n.number == target:  # check if we have reached our target
                return path[::-1]  # the path is in reverse at this point, so reverse it back
            # combine the last number with any previous number (including the
            # last itself, which is the same as multiplying it by 2)
            for a in path:
                newnumber = a + path[0]
                # only evaluate numbers not already seen at a lower complexity
                if newnumber <= target and newnumber not in seen:
                    nextList.append(Node(newnumber, n))
        for n in nextList:  # update the seen list
            seen.add(n.number)
    return []  # if the destination could not be reached

print "path to 25 = ", solve(1, 25)
print "path to 8 = ", solve(1, 8)
print "path to 15 = ", solve(1, 15)
print "path to 500 = ", solve(1, 500)
will output the following:
path to 25 = [1, 2, 4, 8, 16, 24, 25]
path to 8 = [1, 2, 4, 8]
path to 15 = [1, 2, 4, 5, 10, 15]
path to 500 = [1, 2, 4, 8, 16, 32, 64, 96, 100, 200, 400, 500]
I've tested this method for targets up to 500, and it was able to solve them in 0.36 seconds.

Find numbers within a range bisect python

I have a list of integer numbers and I want to write a function that returns a subset of numbers that are within a range. Something like NumbersWithinRange(list, interval) function name...
I.e.,
list = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
interval = [4,20]
results = NumbersWithinRange(list, interval) # [4,4,6,8,7,8]
Maybe I forgot to write one more number in results, but that's the idea...
The list can be as big as 10-20 million elements, and the range is normally a few hundred wide.
Any suggestions on how to do it efficiently with Python? I was thinking of using bisect.
Thanks.
I would use numpy for that, especially if the list is that long. For example:
In [101]: list = np.array([4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100])
In [102]: list
Out[102]:
array([ 4, 2, 1, 7, 9, 4, 3, 6, 8, 97, 7, 65, 3,
2, 2, 78, 23, 1, 3, 4, 5, 67, 8, 100])
In [103]: good = np.where((list > 4) & (list < 20))
In [104]: list[good]
Out[104]: array([7, 9, 6, 8, 7, 5, 8])
# %timeit says that numpy is MUCH faster than any list comprehension:
# create an array 10**6 random ints b/w 0 and 100
In [129]: arr = np.random.randint(0,100,1000000)
In [130]: interval = xrange(4,21)
In [126]: %timeit r = [x for x in arr if x in interval]
1 loops, best of 3: 14.2 s per loop
In [136]: %timeit good = np.where((list > 4) & (list < 20)) ; new_list = list[good]
100 loops, best of 3: 10.8 ms per loop
In [134]: %timeit r = [x for x in arr if 4 < x < 20]
1 loops, best of 3: 2.22 s per loop
In [142]: %timeit filtered = [i for i in ifilter(lambda x: 4 < x < 20, arr)]
1 loops, best of 3: 2.56 s per loop
The pure-Python sortedcontainers module has a SortedList type that can help you. It maintains the list in sorted order automatically and has been tested with tens of millions of elements. The SortedList type has bisect methods you can use.
from sortedcontainers import SortedList

data = SortedList(...)

def NumbersWithinRange(items, lower, upper):
    start = items.bisect_left(lower)  # first index >= lower, so values equal to lower are included
    end = items.bisect_right(upper)   # first index > upper
    return items[start:end]

subset = NumbersWithinRange(data, 4, 20)
Bisecting and indexing will be much faster this way than scanning the entire list. The sorted containers module is very fast and has a performance comparison page with benchmarks against alternative implementations.
If the list isn't sorted, you need to scan the entire list:
lst = [4, 2, 1, ...]
interval = [4, 20]
results = [x for x in lst if interval[0] <= x <= interval[1]]
If the list is sorted, you can use bisect to find the left and right indices that
bound your range.
import bisect

left = bisect.bisect_left(lst, interval[0])
right = bisect.bisect_right(lst, interval[1])
results = lst[left:right]
Since scanning the list is O(n) and sorting is O(n lg n), it probably is not worth sorting the list just to use bisect unless you plan on doing lots of range extractions.
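As a sketch of that amortization (my addition): sort once up front, and every extraction afterwards is just two bisects and a slice.
import bisect

lst_sorted = sorted(lst)  # O(n log n), paid once

def numbers_within_range(sorted_lst, lo, hi):
    # O(log n + k) per query, where k is the number of matches returned
    left = bisect.bisect_left(sorted_lst, lo)
    right = bisect.bisect_right(sorted_lst, hi)
    return sorted_lst[left:right]

print(numbers_within_range(lst_sorted, 4, 20))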
I think this should be sufficiently efficient:
>>> nums = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
>>> r = [x for x in nums if 4 <= x < 21]
>>> r
[4, 7, 9, 4, 6, 8, 7, 4, 5, 8]
Edit:
After J.F. Sebastian's excellent observation, modified the code.
Using iterators
>>> from itertools import ifilter
>>> A = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
>>> [i for i in ifilter(lambda x: 4 < x < 20, A)]
[7, 9, 6, 8, 7, 5, 8]
Let's create a list similar to what you described:
import random
l = [random.randint(-100000,100000) for i in xrange(1000000)]
Now test some possible solutions:
interval=range(400,800)
def v2():
""" return a list """
return [i for i in l if i in interval]
def v3():
""" return a generator """
return list((i for i in l if i in interval))
def v4():
def te(x):
return x in interval
return filter(te,l)
def v5():
return [i for i in ifilter(lambda x: x in interval, l)]
print len(v2()),len(v3()), len(v4()), len(v5())
cmpthese.cmpthese([v2,v3,v4,v5],micro=True, c=2)
Prints this:
rate/sec usec/pass v5 v4 v2 v3
v5 0 6929225.922 -- -0.4% -1.0% -1.6%
v4 0 6903028.488 0.4% -- -0.6% -1.2%
v2 0 6861472.487 1.0% 0.6% -- -0.6%
v3 0 6817855.477 1.6% 1.2% 0.6% --
HOWEVER, watch what happens if interval is a set instead of a list:
interval=set(range(400,800))
cmpthese.cmpthese([v2,v3,v4,v5],micro=True, c=2)
rate/sec usec/pass v5 v4 v3 v2
v5 5 201332.569 -- -20.6% -62.9% -64.6%
v4 6 159871.578 25.9% -- -53.2% -55.4%
v3 13 74769.974 169.3% 113.8% -- -4.7%
v2 14 71270.943 182.5% 124.3% 4.9% --
Now comparing with numpy:
na=np.array(l)
def v7():
""" assume you have to convert from list => numpy array and return a list """
arr=np.array(l)
tgt = np.where((arr >= 400) & (arr < 800))
return [arr[x] for x in tgt][0].tolist()
def v8():
""" start with a numpy list but return a python list """
tgt = np.where((na >= 400) & (na < 800))
return na[tgt].tolist()
def v9():
""" numpy all the way through """
tgt = np.where((na >= 400) & (na < 800))
return [na[x] for x in tgt][0]
# or return na[tgt] if you prefer that syntax...
cmpthese.cmpthese([v2,v3,v4,v5, v7, v8,v9],micro=True, c=2)
rate/sec usec/pass v5 v4 v7 v3 v2 v8 v9
v5 5 185431.957 -- -17.4% -24.7% -63.3% -63.4% -93.6% -93.6%
v4 7 153095.007 21.1% -- -8.8% -55.6% -55.7% -92.3% -92.3%
v7 7 139570.475 32.9% 9.7% -- -51.3% -51.4% -91.5% -91.5%
v3 15 67983.985 172.8% 125.2% 105.3% -- -0.2% -82.6% -82.6%
v2 15 67861.438 173.3% 125.6% 105.7% 0.2% -- -82.5% -82.5%
v8 84 11850.476 1464.8% 1191.9% 1077.8% 473.7% 472.6% -- -0.0%
v9 84 11847.973 1465.1% 1192.2% 1078.0% 473.8% 472.8% 0.0% --
Clearly numpy is faster than pure python as long as you can work with numpy all the way through. Otherwise, use a set for the interval to speed up a bit...
I think you are looking for something like this:
b = [i for i in a if 4 <= i < 90]
print sorted(set(b))
# [4, 5, 6, 7, 8, 9, 23, 65, 67, 78]
If your data set isn't too sparse, you could use "bins" to store and retrieve the data. For example:
a = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]

# Initialize a list of 0's: [0, 0, ...]
# This assumes that the minimum possible value is 0
bins = [0 for _ in range(max(a) + 1)]

# Update the bins with the frequency of each number
for i in a:
    bins[i] += 1

def NumbersWithinRange(data, interval):
    result = []
    for i in range(interval[0], interval[1] + 1):
        freq = data[i]
        if freq > 0:
            result += [i] * freq
    return result
This works for this test case:
print(NumbersWithinRange(bins, [4, 20]))
# [4, 4, 4, 5, 6, 7, 7, 8, 8, 9]
For simplicity, I omitted some bounds checking in the function.
To reiterate, this may work better in terms of space and time usage, but it depends heavily on your particular data set. The less sparse the data set, the better it will do.
