Find numbers within a range bisect python - python

I have a list of integer numbers and I want to write a function that returns a subset of numbers that are within a range. Something like NumbersWithinRange(list, interval) function name...
I.e.,
list = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
interval = [4,20]
results = NumbersWithinRange(list, interval) # [4,4,6,8,7,8]
maybe i forgot to write one more number in results, but that's the idea...
The list can be as big as 10/20 million length, and the range is normally of a few 100.
Any suggestions on how to do it efficiently with python - I was thinking to use bisect.
Thanks.

I would use numpy for that, especially if the list is that long. For example:
In [101]: list = np.array([4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100])
In [102]: list
Out[102]:
array([ 4, 2, 1, 7, 9, 4, 3, 6, 8, 97, 7, 65, 3,
2, 2, 78, 23, 1, 3, 4, 5, 67, 8, 100])
In [103]: good = np.where((list > 4) & (list < 20))
In [104]: list[good]
Out[104]: array([7, 9, 6, 8, 7, 5, 8])
# %timeit says that numpy is MUCH faster than any list comprehension:
# create an array 10**6 random ints b/w 0 and 100
In [129]: arr = np.random.randint(0,100,1000000)
In [130]: interval = xrange(4,21)
In [126]: %timeit r = [x for x in arr if x in interval]
1 loops, best of 3: 14.2 s per loop
In [136]: %timeit good = np.where((list > 4) & (list < 20)) ; new_list = list[good]
100 loops, best of 3: 10.8 ms per loop
In [134]: %timeit r = [x for x in arr if 4 < x < 20]
1 loops, best of 3: 2.22 s per loop
In [142]: %timeit filtered = [i for i in ifilter(lambda x: 4 < x < 20, arr)]
1 loops, best of 3: 2.56 s per loop

The pure-Python Python sortedcontainers module has a SortedList type that can help you. It maintains the list automatically in sorted order and has been tested passed tens of millions of elements. The sorted list type has a bisect function you can use.
from sortedcontainers import SortedList
data = SortedList(...)
def NumbersWithinRange(items, lower, upper):
start = items.bisect(lower)
end = items.bisect_right(upper)
return items[start:end]
subset = NumbersWithinRange(data, 4, 20)
Bisecting and indexing will be much faster this way than scanning the entire list. The sorted containers module is very fast and has a performance comparison page with benchmarks against alternative implementations.

If the list isn't sorted, you need to scan the entire list:
lst = [ 4,2,1,...]
interval=[4,20]
results = [ x for x in lst if interval[0] <= x <= interval[1] ]
If the list is sorted, you can use bisect to find the left and right indices that
bound your range.
left = bisect.bisect_left(lst, interval[0])
right = bisect.bisect_right(lst, interval[1])
results = lst[left+1:right]
Since scanning the list is O(n) and sorting is O(n lg n), it probably is not worth sorting the list just to use bisect unless you plan on doing lots of range extractions.

I think this should be sufficiently efficient:
>>> nums = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
>>> r = [x for x in nums if 4 <= x <21]
>>> r
[4, 7, 9, 4, 6, 8, 7, 4, 5, 8]
Edit:
After J.F. Sebastian's excellent observation, modified the code.

Using iterators
>>> from itertools import ifilter
>>> A = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
>>> [i for i in ifilter(lambda x: 4 < x < 20, A)]
[7, 9, 6, 8, 7, 5, 8]

Let's create a list similar to what you described:
import random
l = [random.randint(-100000,100000) for i in xrange(1000000)]
Now test some possible solutions:
interval=range(400,800)
def v2():
""" return a list """
return [i for i in l if i in interval]
def v3():
""" return a generator """
return list((i for i in l if i in interval))
def v4():
def te(x):
return x in interval
return filter(te,l)
def v5():
return [i for i in ifilter(lambda x: x in interval, l)]
print len(v2()),len(v3()), len(v4()), len(v5())
cmpthese.cmpthese([v2,v3,v4,v5],micro=True, c=2)
Prints this:
rate/sec usec/pass v5 v4 v2 v3
v5 0 6929225.922 -- -0.4% -1.0% -1.6%
v4 0 6903028.488 0.4% -- -0.6% -1.2%
v2 0 6861472.487 1.0% 0.6% -- -0.6%
v3 0 6817855.477 1.6% 1.2% 0.6% --
HOWEVER, watch what happens if interval is a set instead of a list:
interval=set(range(400,800))
cmpthese.cmpthese([v2,v3,v4,v5],micro=True, c=2)
rate/sec usec/pass v5 v4 v3 v2
v5 5 201332.569 -- -20.6% -62.9% -64.6%
v4 6 159871.578 25.9% -- -53.2% -55.4%
v3 13 74769.974 169.3% 113.8% -- -4.7%
v2 14 71270.943 182.5% 124.3% 4.9% --
Now comparing with numpy:
na=np.array(l)
def v7():
""" assume you have to convert from list => numpy array and return a list """
arr=np.array(l)
tgt = np.where((arr >= 400) & (arr < 800))
return [arr[x] for x in tgt][0].tolist()
def v8():
""" start with a numpy list but return a python list """
tgt = np.where((na >= 400) & (na < 800))
return na[tgt].tolist()
def v9():
""" numpy all the way through """
tgt = np.where((na >= 400) & (na < 800))
return [na[x] for x in tgt][0]
# or return na[tgt] if you prefer that syntax...
cmpthese.cmpthese([v2,v3,v4,v5, v7, v8,v9],micro=True, c=2)
rate/sec usec/pass v5 v4 v7 v3 v2 v8 v9
v5 5 185431.957 -- -17.4% -24.7% -63.3% -63.4% -93.6% -93.6%
v4 7 153095.007 21.1% -- -8.8% -55.6% -55.7% -92.3% -92.3%
v7 7 139570.475 32.9% 9.7% -- -51.3% -51.4% -91.5% -91.5%
v3 15 67983.985 172.8% 125.2% 105.3% -- -0.2% -82.6% -82.6%
v2 15 67861.438 173.3% 125.6% 105.7% 0.2% -- -82.5% -82.5%
v8 84 11850.476 1464.8% 1191.9% 1077.8% 473.7% 472.6% -- -0.0%
v9 84 11847.973 1465.1% 1192.2% 1078.0% 473.8% 472.8% 0.0% --
Clearly numpy is faster than pure python as long as you can work with numpy all the way through. Otherwise, use a set for the interval to speed up a bit...

I think you are looking for something like this..
b=[i for i in a if 4<=i<90]
print sorted(set(b))
[4, 5, 6, 7, 8, 9, 23, 65, 67, 78]

If your data set isn't too sparse, you could use "bins" to store and retrieve the data. For example:
a = [4,2,1,7,9,4,3,6,8,97,7,65,3,2,2,78,23,1,3,4,5,67,8,100]
# Initalize a list of 0's [0, 0, ...]
# This is assuming that the minimum possible value is 0
bins = [0 for _ in range(max(a) + 1)]
# Update the bins with the frequency of each number
for i in a:
bins[i] += 1
def NumbersWithinRange(data, interval):
result = []
for i in range(interval[0], interval[1] + 1):
freq = data[i]
if freq > 0:
result += [i] * freq
return result
This works for this test case:
print(NumbersWithinRange(bins, [4, 20]))
# [4, 4, 4, 5, 6, 7, 7, 8, 8, 9]
For simplicity, I omitted some bounds checking in the function.
To reiterate, this may work better in terms of space and time usage, but it depends heavily on your particular data set. The less sparse the data set, the better it will do.

Related

Update values in numpy array with other values in Python

Given the following array:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
[[1 2 3]
[4 5 6]
[7 8 9]]
How can I replace certain values with other values?
bad_vals = [4, 2, 6]
update_vals = [11, 1, 8]
I currently use:
for idx, v in enumerate(bad_vals):
a[a==v] = update_vals[idx]
Which gives:
[[ 1 1 3]
[11 5 8]
[ 7 8 9]]
But it is rather slow for large arrays with many values to be replaced. Is there any good alternative?
The input array can be changed to anything (list of list/tuples) if this might be necessary to access certain speedy black magic.
EDIT:
Based on the great answers from #Divakar and #charlysotelo did a quick comparison for my real use-case date using the benchit package. My input data array has more or less a of ratio 100:1 (rows:columns) where the length of array of replacement values are in order of 3 x rows size.
Functions:
# current approach
def enumerate_values(a, bad_vals, update_vals):
for idx, v in enumerate(bad_vals):
a[a==v] = update_vals[idx]
return a
# provided solution #Divakar
def map_values(a, bad_vals, update_vals):
N = max(a.max(), max(bad_vals))+1
mapar = np.empty(N, dtype=int)
mapar[a] = a
mapar[bad_vals] = update_vals
out = mapar[a]
return out
# provided solution #charlysotelo
def vectorize_values(a, bad_vals, update_vals):
bad_to_good_map = {}
for idx, bad_val in enumerate(bad_vals):
bad_to_good_map[bad_val] = update_vals[idx]
f = np.vectorize(lambda x: (bad_to_good_map[x] if x in bad_to_good_map else x))
a = f(a)
return a
# define benchit input functions
import benchit
funcs = [enumerate_values, map_values, vectorize_values]
# define benchit input variables to bench against
in_ = {
n: (
np.random.randint(0,n*10,(n,int(n * 0.01))), # array
np.random.choice(n*10, n*3,replace=False), # bad_vals
np.random.choice(n*10, n*3) # update_vals
)
for n in [300, 1000, 3000, 10000, 30000]
}
# do the bench
# btw: timing of bad approaches (my own function here) take time
t = benchit.timings(funcs, in_, multivar=True, input_name='Len')
t.plot(logx=True, grid=False)
Here's one way based on the hinted mapping array method for positive numbers -
def map_values(a, bad_vals, update_vals):
N = max(a.max(), max(bad_vals))+1
mapar = np.empty(N, dtype=int)
mapar[a] = a
mapar[bad_vals] = update_vals
out = mapar[a]
return out
Sample run -
In [94]: a
Out[94]:
array([[1, 2, 1],
[4, 5, 6],
[7, 1, 1]])
In [95]: bad_vals
Out[95]: [4, 2, 6]
In [96]: update_vals
Out[96]: [11, 1, 8]
In [97]: map_values(a, bad_vals, update_vals)
Out[97]:
array([[ 1, 1, 1],
[11, 5, 8],
[ 7, 1, 1]])
Benchmarking
# Original soln
def replacevals(a, bad_vals, update_vals):
out = a.copy()
for idx, v in enumerate(bad_vals):
out[out==v] = update_vals[idx]
return out
The given sample had the 2D input of nxn with n samples to be replaced. Let's setup input datasets with the same structure.
Using benchit package (few benchmarking tools packaged together; disclaimer: I am its author) to benchmark proposed solutions.
import benchit
funcs = [replacevals, map_values]
in_ = {n:(np.random.randint(0,n*10,(n,n)),np.random.choice(n*10,n,replace=False),np.random.choice(n*10,n)) for n in [3,10,100,1000,2000]}
t = benchit.timings(funcs, in_, multivar=True, input_name='Len')
t.plot(logx=True, save='timings.png')
Plot :
This really depends on the size of your array, and the size of your mappings from bad to good integers.
For a larger number of bad to good integers - the method below is better:
import numpy as np
import time
ARRAY_ROWS = 10000
ARRAY_COLS = 1000
NUM_MAPPINGS = 10000
bad_vals = np.random.rand(NUM_MAPPINGS)
update_vals = np.random.rand(NUM_MAPPINGS)
bad_to_good_map = {}
for idx, bad_val in enumerate(bad_vals):
bad_to_good_map[bad_val] = update_vals[idx]
# np.vectorize with mapping
# Takes about 4 seconds
a = np.random.rand(ARRAY_ROWS, ARRAY_COLS)
f = np.vectorize(lambda x: (bad_to_good_map[x] if x in bad_to_good_map else x))
print (time.time())
a = f(a)
print (time.time())
# Your way
# Takes about 60 seconds
a = np.random.rand(ARRAY_ROWS, ARRAY_COLS)
print (time.time())
for idx, v in enumerate(bad_vals):
a[a==v] = update_vals[idx]
print (time.time())
Running the code above it took less than 4 seconds for the np.vectorize(lambda) way to finish - whereas your way took almost 60 seconds. However, setting the NUM_MAPPINGS to 100, your method takes less than a second for me - faster than the 2 seconds for the np.vectorize way.

what is the fastest way to increment a slice of a list?

I have a list:
lst = [ 1,2,3,4,5,6,7,8]
I want to increment all numbers above index 4.
for i in range(4,len(lst)):
lst[i]+=2
Since this operation needs to be done many time, I want to do it the most efficient way possible.
How can I do this fast.
Use Numpy for fast array manipulations, check the example below:
import numpy as np
lst = np.array([1,2,3,4,5,6,7,8])
# add 2 at all indices from 4 till the end of the array
lst[4:] += 2
print(lst)
# array([ 1, 2, 3, 4, 7, 8, 9, 10])
If you are updating large ranges of a large list many times, use a more suitable data structure so that the updates don't take O(n) time each.
One such data structure is a segment tree, where each list element corresponds to a leaf node in a tree; the true value of the list element can be represented as the sum of the values on the path between the leaf node and the root node. This way, adding a number to a single internal node is effectively like adding it to all of the list elements represented by that subtree.
The data structure supports get/set operations by index in O(log n) time, and add-in-range operations also in O(log n) time. The solution below uses a binary tree, implemented using a list of length <= 2n.
class RangeAddList:
def __init__(self, vals):
# list length
self._n = len(vals)
# smallest power of 2 >= list length
self._m = 1 << (self._n - 1).bit_length()
# list representing binary tree; leaf nodes offset by _m
self._vals = [0]*self._m + vals
def __repr__(self):
return '{}({!r})'.format(self.__class__.__name__, list(self))
def __len__(self):
return self._n
def __iter__(self):
for i in range(self._n):
yield self[i]
def __getitem__(self, i):
if i not in range(self._n):
raise IndexError()
# add up values from leaf to root node
t = 0
i += self._m
while i > 0:
t += self._vals[i]
i >>= 1
return t + self._vals[0]
def __setitem__(self, i, x):
# add difference (new value - old value)
self._vals[self._m + i] += x - self[i]
def add_in_range(self, i, j, x):
if i not in range(self._n + 1) or j not in range(self._n + 1):
raise IndexError()
# add at internal nodes spanning range(i, j)
i += self._m
j += self._m
while i < j:
if i & 1:
self._vals[i] += x
i += 1
if j & 1:
j -= 1
self._vals[j] += x
i >>= 1
j >>= 1
Example:
>>> r = RangeAddList([0] * 10)
>>> r.add_in_range(0, 4, 10)
>>> r.add_in_range(6, 9, 20)
>>> r.add_in_range(3, 7, 100)
>>> r
RangeAddList([10, 10, 10, 110, 100, 100, 120, 20, 20, 0])
It turns out that NumPy is so well-optimized, you need to go up to lists of length 50,000 or so before the segment tree catches up. The segment tree is still only about twice as fast as NumPy's O(n) range updates for lists of length 100,000 on my machine. You may want to benchmark with your own data to be sure.
This is a fast way of doing it:
lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
new_list = [*lst[:4], *[x+2 for x in lst1[4:]]]
# or even better
new_list[4:] = [x+2 for x in lst1[4:]]
In terms of speed, numpy isn't faster for lists this small:
import timeit
import numpy as np
lst1 = [1, 2, 3, 4, 5, 6, 7, 8]
npa = np.array(lst)
def numpy_it():
global npa
npa[4:] += 2
def python_it():
global lst1
lst1 = [*lst1[:4], *[x+2 for x in lst1[4:]]]
print(timeit.timeit(numpy_it))
print(timeit.timeit(python_it))
For me gets:
1.7008036
0.6737076000000002
But for anything serious numpy beats generating a new list for the slice that needs replacing, which beats regenerating the entire list (which beats in-place replacement with a loop like in your example):
import timeit
import numpy as np
lst1 = list(range(0, 10000))
npa = np.array(lst1)
lst2 = list(range(0, 10000))
lst3 = list(range(0, 10000))
def numpy_it():
global npa
npa[4:] += 2
def python_it():
global lst1
lst1 = [*lst1[:4], *[x+2 for x in lst1[4:]]]
def python_it_slice():
global lst2
lst2[4:] = [x+2 for x in lst2[4:]]
def python_inplace():
global lst3
for i in range(4, len(lst3)):
lst3[i] = lst3[i] + 2
n = 10000
print(timeit.timeit(numpy_it, number=n))
print(timeit.timeit(python_it_slice, number=n))
print(timeit.timeit(python_it, number=n))
print(timeit.timeit(python_inplace, number=n))
Results:
0.057994199999999996
4.3747423
4.5193105000000005
9.949074000000001
Use assign to slice:
lst[4:] = [x+2 for x in lst[4:]]
Test (on my ancient ThinkPad i3-3110, Python 3.5.2):
import timeit
lst = [1, 2, 3, 4, 5, 6, 7, 8]
def python_it():
global lst
lst = [*lst[:4], *[x+2 for x in lst[4:]]]
def python_it2():
global lst
lst[4:] = [x+2 for x in lst[4:]]
print(timeit.timeit(python_it))
print(timeit.timeit(python_it2))
Prints:
1.2732834180060308
0.9285018060181756
use python builtin map function and lambda
lst = [1,2,3,4,5,6,7,8]
lst[4:] = map(lambda x:x+2, lst[4:])
print(lst)
# [1, 2, 3, 4, 7, 8, 9, 10]

How to improve time complexity of remove all multiplicands from array or list?

I am trying to find elements from array(integer array) or list which are unique and those elements must not divisible by any other element from same array or list.
You can answer in any language like python, java, c, c++ etc.
I have tried this code in Python3 and it works perfectly but I am looking for better and optimum solution in terms of time complexity.
assuming array or list A is already sorted and having unique elements
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
while i<len(A)-1:
while j<len(A):
if A[j]%A[i]==0:
A.pop(j)
else:
j+=1
i+=1
j=i+1
For the given array A=[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] answer would be like ans=[2,3,5,7,11,13]
another example,A=[4,5,15,16,17,23,39] then ans would be like, ans=[4,5,17,23,39]
ans is having unique numbers
any element i from array only exists if (i%j)!=0, where i!=j
I think it's more natural to do it in reverse, by building a new list containing the answer instead of removing elements from the original list. If I'm thinking correctly, both approaches do the same number of mod operations, but you avoid the issue of removing an element from a list.
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
ans = []
for x in A:
for y in ans:
if x % y == 0:
break
else: ans.append(x)
Edit: Promoting the completion else.
This algorithm will perform much faster:
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
if (A[-1]-A[0])/A[0] > len(A)*2:
result = list()
for v in A:
for f in result:
d,m = divmod(v,f)
if m == 0: v=0;break
if d<f: break
if v: result.append(v)
else:
retain = set(A)
minMult = 1
maxVal = A[-1]
for v in A:
if v not in retain : continue
minMult = v*2
if minMult > maxVal: break
if v*len(A)<maxVal:
retain.difference_update([m for m in retain if m >= minMult and m%v==0])
else:
retain.difference_update(range(minMult,maxVal,v))
if maxVal%v == 0:
maxVal = max(retain)
result = list(retain)
print(result) # [2, 3, 5, 7, 11, 13]
In the spirit of the sieve of Eratostenes, each number that is retained, removes its multiples from the remaining eligible numbers. Depending on the magnitude of the highest value, it is sometimes more efficient to exclude multiples than check for divisibility. The divisibility check takes several times longer for an equivalent number of factors to check.
At some point, when the data is widely spread out, assembling the result instead of removing multiples becomes faster (this last addition was inspired by Imperishable Night's post).
TEST RESULTS
A = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (100000 repetitions)
Original: 0.55 sec
New: 0.29 sec
A = list(range(2,5000))+[9697] (100 repetitions)
Original: 3.77 sec
New: 0.12 sec
A = list(range(1001,2000))+list(range(4000,6000))+[9697**2] (10 repetitions)
Original: 3.54 sec
New: 0.02 sec
I know that this is totally insane but i want to know what you think about this:
A = [4,5,15,16,17,23,39]
prova=[[x for x in A if x!=y and y%x==0] for y in A]
print([A[idx] for idx,x in enumerate(prova) if len(prova[idx])==0])
And i think it's still O(n^2)
If you care about speed more than algorithmic efficiency, numpy would be the package to use here in python:
import numpy as np
# Note: doesn't have to be sorted
a = [2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 29, 29]
a = np.unique(a)
result = a[np.all((a % a[:, None] + np.diag(a)), axis=0)]
# array([2, 3, 5, 7, 11, 13, 29])
This divides all elements by all other elements and stores the remainder in a matrix, checks which columns contain only non-0 values (other than the diagonal), and selects all elements corresponding to those columns.
This is O(n*M) where M is the max size of an integer in your list. The integers are all assumed to be none negative. This also assumes your input list is sorted (came to that assumption since all lists you provided are sorted).
a = [4, 7, 7, 8]
# a = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
# a = [4, 5, 15, 16, 17, 23, 39]
M = max(a)
used = set()
final_list = []
for e in a:
if e in used:
continue
else:
used.add(e)
for i in range(e, M + 1):
if not (i % e):
used.add(i)
final_list.append(e)
print(final_list)
Maybe this can be optimized even further...
If the list is not sorted then for the above method to work, one must sort it. The time complexity will then be O(nlogn + Mn) which equals to O(nlogn) when n >> M.

Python methods to reduce running times KTNS algorithm

I have been trying to develop an algorithm called keep the tool needed soonest but during the simulations, I have realized that it takes too much time to run.
I want to decrease the running times and checking other previous questions about how to fast python coding Is Python slower than Java/C#? [closed] I have found several solutions, but I don't know how to implement them in my code.
On my computer it takes 0.004999876022338867 seconds, but the main problem is that for the whole program this function is called 13,000 times.
Here I attach my whole code, if you have any suggestion to improve it please don't hesitate to share with me.
import sets
import numpy
import copy
import time
J= {1: [6, 7],2: [2, 3], 3: [1, 6, 9], 4: [1, 5, 9], 5: [5, 8, 10], 6: [1, 3, 6, 8], 7: [5, 6, 8, 9], 8: [5, 7, 8], 9: [1, 4, 5, 8], 10: [1, 2, 4, 10]}
def KTNS(sigma=[10, 9, 4, 1, 6, 3, 7, 5, 2, 8], Jobs=J, m=10 ,capacity=4 ):
t0=time.time()
Tools = {}
Lin={}
n=len(sigma)
for i in range(1,m+1):
for e in sigma:
if i in Jobs[e]:
Tools[i]=sets.Set([])
count = 1
available_tools=sets.Set()
for e in sigma:
for i in Jobs[e]:
Tools[i].add(count)
available_tools.add(i)
count+=1
Tin=copy.deepcopy(Tools)
for e in Tin:
Lin[e]=min(Tin[e])
count=1
J = numpy.array([0] *m)
W = numpy.array([[0] * m] * n)
while count<=len(sigma):
for e in Tools:
if len(available_tools)<capacity:
reference=len(available_tools)
else:
reference=capacity
while numpy.count_nonzero(J == 1) <reference:
min_value = min(Lin.itervalues())
min_keys = [k for k in Lin if Lin[k] == min_value]
temp = min_keys[0] #min(Lin, key=Lin.get)
if min_value>count:
if len(min_keys)>=2:
if count==1:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J0=W[count-2]
k=0
for elements in min_keys: #tested
if numpy.count_nonzero(J == 1) < reference:
if J0[elements-1]==1:
J[elements-1]=1
Lin[elements]='-'
k=1
else:
pass
else:
pass
if k==0:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp - 1] = 1
Lin[temp] = '-'
else:
J[temp-1]=1
Lin[temp] = '-'
Tin[e].discard(count)
for element in Tin:
try:
Lin[element] = min(Tin[element])
except ValueError:
Tin[element]=sets.Set([len(sigma)+1])
Lin[element]=len(sigma)+1
W[count-1]=J
J= numpy.array([0] *m)
count+=1
Cost=0
for e in range(1,len(sigma)):
temp=W[e]-W[e-1]
temp[temp < 0] = 0
Cost+=sum(temp)
return Cost+capacity,time.time()-t0
One recommendation - try to minimize your use of dictionaries. It looks like many of your dictionaries could instead be lists. Dictionary access is much slower than list access in python.
It looks like you could simply make Tools, Lin and Tin all be lists, e.g. Lin = [] instead of Lin = {}, and I expect you'll see a drastic improvement in performance.
You know the sizes of your 3 dictionaries, so just initialize them to the size you need. Create Lin and Tools as follows:
Lin = [None] * m+1
Tools = [None] * m+1
Tin = [None] * m+1
This will make a list of m+1 elements (which is what you'll get with your loop from 1 through m+1). Since you're doing 1-based indexing, it leaves an empty place in Lin[0], Tools[0], etc, but you'll then be able to access Lin[1] - Lin[10], as you're currently doing.
Simple example you can try for yourself:
python3 -m timeit -s 'foo = [x for x in range(10000)]' 'foo[500]'
100000000 loops, best of 3: 0.0164 usec per loop
python3 -m timeit -s 'foo = {x: x for x in range(10000)}' 'foo[500]'
10000000 loops, best of 3: 0.0254 usec per loop
Simply by changing your dictionaries to list, you get almost a 2x improvement. Your 65 second task would now take about 35 seconds.
By the way, check out the python wiki for tips on improving speed, including lots of references on how to profile your function.

group list of ints by continuous sequence

I have a list of integers...
[1,2,3,4,5,8,9,10,11,200,201,202]
I would like to group them into a list of lists where each sublist contains integers whose sequence has not been broken. Like this...
[[1,5],[8,11],[200,202]]
I have a rather clunky work around...
lSequenceOfNum = [1,2,3,4,5,8,9,10,11,200,201,202]
lGrouped = []
start = 0
for x in range(0,len(lSequenceOfNum)):
if x != len(lSequenceOfNum)-1:
if(lSequenceOfNum[x+1] - lSequenceOfNum[x]) > 1:
lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
start = x+1
else:
lGrouped.append([lSequenceOfNum[start],lSequenceOfNum[x]])
print lGrouped
It is the best I could do. Is there a more "pythonic" way to do this? Thanks..
Assuming the list will always be in ascending order:
from itertools import groupby, count
numberlist = [1,2,3,4,5,8,9,10,11,200,201,202]
def as_range(g):
l = list(g)
return l[0], l[-1]
print [as_range(g) for _, g in groupby(numberlist, key=lambda n, c=count(): n-next(c))]
I realised I had overcomplicated this a little, far easier to just count manually than use a slightly convoluted generator:
def ranges(seq):
start, end = seq[0], seq[0]
count = start
for item in seq:
if not count == item:
yield start, end
start, end = item, item
count = item
end = item
count += 1
yield start, end
print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))
Producing:
[(1, 5), (8, 11), (200, 202)]
This method is pretty fast:
This method (and the old one, they perform almost exactly the same):
python -m timeit -s "from test import ranges" "ranges([1,2,3,4,5,8,9,10,11,200,201,202])"
1000000 loops, best of 3: 0.47 usec per loop
Jeff Mercado's Method:
python -m timeit -s "from test import as_range; from itertools import groupby, count" "[as_range(g) for _, g in groupby([1,2,3,4,5,8,9,10,11,200,201,202], key=lambda n, c=count(): n-next(c))]"
100000 loops, best of 3: 11.1 usec per loop
That's over 20x faster - although, naturally, unless speed matters this isn't a real concern.
My old solution using generators:
import itertools
def resetable_counter(start):
while True:
for i in itertools.count(start):
reset = yield i
if reset:
start = reset
break
def ranges(seq):
start, end = seq[0], seq[0]
counter = resetable_counter(start)
for count, item in zip(counter, seq): #In 2.x: itertools.izip(counter, seq)
if not count == item:
yield start, end
start, end = item, item
counter.send(item)
end = item
yield start, end
print(list(ranges([1,2,3,4,5,8,9,10,11,200,201,202])))
Producing:
[(1, 5), (8, 11), (200, 202)]
You can do this efficiently in three steps
given
list1=[1,2,3,4,5,8,9,10,11,200,201,202]
Calculate the discontinuity
[1,2,3,4,5,8,9,10,11 ,200,201,202]
- [1,2,3,4,5,8,9 ,10 ,11 ,200,201,202]
----------------------------------------
[1,1,1,1,3,1,1 ,1 ,189,1 ,1]
(index) 1 2 3 4 5 6 7 8 9 10 11
* *
rng = [i+1 for i,e in enumerate((x-y for x,y in zip(list1[1:],list1))) if e!=1]
>>> rng
[5, 9]
Add the boundaries
rng = [0] + rng + [len(list1)]
>>> rng
[0, 5, 9,12]
now calculate the actual continuity ranges
[(list1[i],list1[j-1]) for i,j in zip(list2,list2[1:])]
[(1, 5), (8, 11), (200, 202)]
LB [0, 5, 9, 12]
UB [0, 5, 9, 12]
-----------------------
indexes (LB,UB-1) (0,4) (5,8) (9,11)
The question is quite old, but I thought I'll share my solution anyway
Assuming import numpy as np
a = [1,2,3,4,5,8,9,10,11,200,201,202]
np.split(a, array(np.add(np.where(diff(a)>1),1)).tolist()[0])
pseudo code (with off-by-one errors to fix):
jumps = new array;
for idx from 0 to len(array)
if array[idx] != array[idx+1] then jumps.push(idx);
I think this is actually a case where it makes sense to work with the indices (as in C, before java/python/perl/etc. improved upon this) instead of the objects in the array.
Here's a version that should be easy to read:
def close_range(el, it):
while True:
el1 = next(it, None)
if el1 != el + 1:
return el, el1
el = el1
def compress_ranges(seq):
iterator = iter(seq)
left = next(iterator, None)
while left is not None:
right, left1 = close_range(left, iterator)
yield (left, right)
left = left1
list(compress_ranges([1, 2, 3, 4, 5, 8, 9, 10, 11, 200, 201, 202]))
Similar questions:
Python - find incremental numbered sequences with a list comprehension
Pythonic way to convert a list of integers into a string of comma-separated ranges
input = [1, 2, 3, 4, 8, 10, 11, 12, 17]
i, ii, result = iter(input), iter(input[1:]), [[input[0]]]
for x, y in zip(i,ii):
if y-x != 1:
result.append([y])
else:
result[-1].append(y)
>>> result
[[1, 2, 3, 4], [8], [10, 11, 12], [17]]
>>> print ", ".join("-".join(map(str,(g[0],g[-1])[:len(g)])) for g in result)
1-4, 8, 10-12, 17
>>> [(g[0],g[-1])[:len(g)] for g in result]
[(1, 4), (8,), (10, 12), (17,)]

Categories