Finding closest match in collection of strings representing numbers - python

I have a sorted list of datetimes in text format. The format of each entry is '2009-09-10T12:00:00'.
I want to find the entry closest to a target. There are many more entries than the number of searches I would have to do.
I could convert each entry to a number and then search numerically (for example, these approaches), but that seems like excessive effort.
Is there a better way than this:
def mid(res, target):
    # res is a list of entries, sorted by dt (dateTtime)
    # each entry is a dict with a dt and some other info
    n = len(res)
    low = 0
    high = n - 1
    # find the first res entry not less than target
    while low < high:
        mid = (low + high) // 2
        t = res[mid]['dt']
        if t < target:
            low = mid + 1
        else:
            high = mid
    # check if the prior value is closer
    i = max(0, low - 1)
    a = dttosecs(res[i]['dt'])
    b = dttosecs(res[low]['dt'])
    t = dttosecs(target)
    if abs(a - t) < abs(b - t):
        return i
    else:
        return low
import time

def dttosecs(dt):
    # string to seconds since the epoch
    date, tim = dt.split('T')
    y, m, d = date.split('-')
    h, mn, s = tim.split(':')
    y = int(y)
    m = int(m)
    d = int(d)
    h = int(h)
    mn = int(mn)
    s = min(59, int(float(s) + 0.5))  # round to nearest second
    secs = time.mktime((y, m, d, h, mn, s, 0, 0, -1))
    return secs

"Copy and paste coding" (getting bisect's sources into your code) is not recommended as it carries all sorts of costs down the road (lot of extra source code for you to test and maintain, difficulties dealing with upgrades in the upstream code you've copied, etc, etc); the best way to reuse standard library modules is simply to import them and use them.
However, to do one pass transforming the dictionaries into meaningfully comparable entries is O(N), which (even though each step of the pass is simple) will eventually swamp the O(log N) time of the search proper. Since bisect can't support a key= key extractor like sort does, what the solution to this dilemma -- how can you reuse bisect by import and call, without a preliminary O(N) step...?
As quoted here, the solution is in David Wheeler's famous saying, "All problems in computer science can be solved by another level of indirection". Consider e.g....:
import bisect

listofdicts = [
    {'dt': '2009-%2.2d-%2.2dT12:00:00' % (m, d)}
    for m in range(4, 9) for d in range(1, 30)
]

class Indexer(object):
    def __init__(self, lod, key):
        self.lod = lod
        self.key = key
    def __len__(self):
        return len(self.lod)
    def __getitem__(self, idx):
        return self.lod[idx][self.key]

lookfor = listofdicts[len(listofdicts)//2]['dt']

def mid(res=listofdicts, target=lookfor):
    # key-extraction pass: builds an O(N) list of keys before searching
    keys = [r['dt'] for r in res]
    return res[bisect.bisect_left(keys, target)]

def midi(res=listofdicts, target=lookfor):
    # indirection: the wrapper extracts keys lazily, only as bisect probes
    wrap = Indexer(res, 'dt')
    return res[bisect.bisect_left(wrap, target)]

if __name__ == '__main__':
    print '%d dicts on the list' % len(listofdicts)
    print 'Looking for', lookfor
    print mid(), midi()
    assert mid() == midi()
The output (just running this indexer.py as a check, then with timeit, two ways):
$ python indexer.py
145 dicts on the list
Looking for 2009-06-15T12:00:00
{'dt': '2009-06-15T12:00:00'} {'dt': '2009-06-15T12:00:00'}
$ python -mtimeit -s'import indexer' 'indexer.mid()'
10000 loops, best of 3: 27.2 usec per loop
$ python -mtimeit -s'import indexer' 'indexer.midi()'
100000 loops, best of 3: 9.43 usec per loop
As you see, even in a modest task with 145 entries in the list, the indirection approach can perform three times better than the "key-extraction pass" approach. Since we're comparing O(N) against O(log N), the advantage of indirection grows without bound as N increases. (For very small N, the higher multiplicative constants of the indirection make the key-extraction approach faster, but that is soon swamped by the big-O difference.) Admittedly, the Indexer class is extra code -- however, it's reusable over ALL tasks of binary searching a list of dicts sorted by one entry in each dict, so having it in your "container-utilities bag of tricks" offers a good return on the investment.
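(An aside for readers on current Python, not part of the original tradeoff: since Python 3.10, bisect_left and friends accept a key= extractor directly, which dissolves the dilemma above. A minimal sketch, reusing the names from indexer.py:)

import bisect
# Python 3.10+ only: the key= parameter is applied to the list's items
# during the search itself, so the Indexer wrapper becomes unnecessary.
i = bisect.bisect_left(listofdicts, lookfor, key=lambda r: r['dt'])
print(listofdicts[i])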
So much for the main search loop. For the secondary task of converting two entries (the one just below and the one just above the target) and the target to a number of seconds, consider, again, a higher-reuse approach, namely:
import time

adt = '2009-09-10T12:00:00'

def dttosecs(dt=adt):
    # string to seconds since the epoch, parsed by hand
    date, tim = dt.split('T')
    y, m, d = date.split('-')
    h, mn, s = tim.split(':')
    y = int(y)
    m = int(m)
    d = int(d)
    h = int(h)
    mn = int(mn)
    s = min(59, int(float(s) + 0.5))  # round to nearest second
    secs = time.mktime((y, m, d, h, mn, s, 0, 0, -1))
    return secs

def simpler(dt=adt):
    return time.mktime(time.strptime(dt, '%Y-%m-%dT%H:%M:%S'))

if __name__ == '__main__':
    print adt, dttosecs(), simpler()
    assert dttosecs() == simpler()
Here there is no performance advantage to the reuse approach (indeed, on the contrary, dttosecs is faster) -- but then, you only need to perform three conversions per search, no matter how many entries are on your list of dicts, so it's not clear whether that performance issue is germane. Meanwhile, with simpler you only have to write, test and maintain one simple line of code, while dttosecs is a dozen lines; given this ratio, in most situations (i.e., excluding absolute bottlenecks) I would prefer simpler. The important thing is to be aware of both approaches and of the tradeoffs between them, so the choice is made wisely.
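One caveat worth hedging on (an assumption about data the question doesn't show): dttosecs tolerates fractional seconds such as '...T12:00:00.5' thanks to its float(s) rounding, while strptime's %S will raise ValueError on them. If such inputs can occur, a middle-ground sketch is to split the fraction off first:

def simpler_frac(dt=adt):
    # split off a fractional-seconds tail, if any, before strptime, then
    # round it back in roughly the way dttosecs does (sketch only -- it
    # doesn't reproduce dttosecs' capping of the seconds field at 59)
    head, _, frac = dt.partition('.')
    secs = time.mktime(time.strptime(head, '%Y-%m-%dT%H:%M:%S'))
    if frac:
        secs += int(float('0.' + frac) + 0.5)
    return secs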

You want the bisect module from the standard library. It will do a binary search and tell you the correct insertion point for a new value into an already sorted list. Here's an example that will print the place in the list where target would be inserted:
from bisect import bisect
dates = ['2009-09-10T12:00:00', '2009-09-11T12:32:00', '2009-09-11T12:43:00']
target = '2009-09-11T12:40:00'
print bisect(dates, target)
From there you can just compare the entries before and after your insertion point, which in this case would be dates[i-1] and dates[i], to see which one is closest to your target.
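A sketch of that comparison, with the edge cases at either end of the list handled (the helper names closest and to_secs are mine, not part of the original answer):

from bisect import bisect
import time

def to_secs(d):
    return time.mktime(time.strptime(d, '%Y-%m-%dT%H:%M:%S'))

def closest(dates, target):
    # dates is sorted; bisect gives the insertion point i, so the nearest
    # entry is either dates[i-1] or dates[i] (watch both ends of the list)
    i = bisect(dates, target)
    if i == 0:
        return dates[0]
    if i == len(dates):
        return dates[-1]
    t = to_secs(target)
    before, after = dates[i-1], dates[i]
    return before if t - to_secs(before) <= to_secs(after) - t else after

# closest(dates, target) -> '2009-09-11T12:43:00' for the example above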

import bisect

def mid(res, target):
    keys = [r['dt'] for r in res]
    return res[bisect.bisect_left(keys, target)]

First, change to this.
import datetime

def parse_dt(dt):
    # note: strptime lives on the datetime class inside the datetime module
    return datetime.datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S")
This removes much of the "effort".
Consider this as the search.
from bisect import bisect

def mid(res, target):
    """res is a list of entries, sorted by dt (dateTtime)
    each entry is a dict with a dt and some other info
    """
    times = [parse_dt(r['dt']) for r in res]
    index = bisect(times, parse_dt(target))
    return res[index]
This doesn't seem like very much "effort". And since the comparisons are done on datetime objects rather than raw strings, it doesn't depend on your timestamps being formatted so that they sort lexicographically: change to any timestamp format (adjusting parse_dt accordingly) and this will still work.

Related

My program can't run that fast even with memoization

I tried a problem on Project Euler where I needed to find the sum of the even Fibonacci terms under 4 million. It took me a long time, but then I found out that I can use memoization, yet it still seems to take a long time. After a lot of research I found a built-in decorator called lru_cache. My question is: why isn't my memoization as fast as lru_cache?
Here's my code:
from functools import lru_cache

@lru_cache(maxsize=1000000)
def fibonacci_memo(input_value):
    global value
    fibonacci_cache = {}
    if input_value in fibonacci_cache:
        return fibonacci_cache[input_value]
    if input_value == 0:
        value = 1
    elif input_value == 1:
        value = 1
    elif input_value > 1:
        value = fibonacci_memo(input_value - 1) + fibonacci_memo(input_value - 2)
    fibonacci_cache[input_value] = value
    return value

def sumOfFib():
    SUM = 0
    for n in range(500):
        if fibonacci_memo(n) < 4000000:
            if fibonacci_memo(n) % 2 == 0:
                SUM += fibonacci_memo(n)
    return SUM

print(sumOfFib())
The code works, by the way. It takes less than a second to run when I use the lru_cache decorator.
The other answer shows the correct way to calculate the Fibonacci sequence, indeed, but you should also know why your memoization wasn't working. To be specific:
fibonacci_cache = {}
This line being inside the function means you were emptying your cache every time fibonacci_memo was called.
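A minimal corrected sketch, with the cache hoisted to module level so it actually survives between calls (keeping the question's convention that F(0) = F(1) = 1):

fibonacci_cache = {0: 1, 1: 1}   # lives outside the function, so it persists

def fibonacci_memo(n):
    if n not in fibonacci_cache:
        fibonacci_cache[n] = fibonacci_memo(n - 1) + fibonacci_memo(n - 2)
    return fibonacci_cache[n]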
You shouldn't be computing the Fibonacci sequence at all, not even by dynamic programming. Since the Fibonacci sequence satisfies a linear recurrence relation with constant coefficients and constant order, the sequence of its sums does too.
Definitely don't cache all the values; that is an unnecessary consumption of memory. When the recurrence has constant order, you only need to remember as many previous terms as the order of the recurrence.
Furthermore, there is a way to turn recurrences of constant order into systems of recurrences of order one. The solution of the latter is given by a power of a matrix. This gives a faster algorithm for large values of n, though each step is more expensive. The best method therefore combines the two, choosing the first method for small values of n and the latter for large inputs.
O(n) using the recurrence for the sum
Denote by S_n = F_0 + F_1 + ... + F_n the sum of the first Fibonacci numbers F_0, F_1, ..., F_n.
Observe that
S_{n+1}-S_n=F_{n+1}
S_{n+2}-S_{n+1}=F_{n+2}
S_{n+3}-S_{n+2}=F_{n+3}
Since F_{n+3}=F_{n+2}+F_{n+1} we get that S_{n+3}-S_{n+2}=S_{n+2}-S_n. So
S_{n+3}=2S_{n+2}-S_n
with the initial conditions S_0=F_0=1, S_1=F_0+F_1=1+1=2, and S_2=S_1+F_2=2+2=4.
One thing that you can do is compute S_n bottom up, remembering the values of only the previous three terms at each step. You don't need to remember all of the values of S_k, from k=0 to k=n. This gives you an O(n) algorithm with O(1) amount of memory.
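A sketch of that bottom-up loop (the names are mine), carrying only the last three sums:

def sum_fib_linear(n):
    # S_k = F_0 + ... + F_k with F_0 = F_1 = 1, via S_{k+3} = 2*S_{k+2} - S_k;
    # O(n) time, O(1) memory
    s0, s1, s2 = 1, 2, 4   # S_0, S_1, S_2
    if n < 3:
        return (s0, s1, s2)[n]
    for _ in range(n - 2):
        s0, s1, s2 = s1, s2, 2*s2 - s0
    return s2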
O(ln(n)) by matrix exponentiation
You can also get an O(ln(n)) algorithm in the following way:
Let X_n be the column vector with components S_{n+2}, S_{n+1}, S_n.
Then the recurrence above gives the order-one recurrence
X_{n+1}=AX_n
where A is the matrix

A = [[2, 0, -1],
     [1, 0,  0],
     [0, 1,  0]]
Therefore, X_n = A^n X_0, and we have X_0. To multiply by A^n we can use exponentiation by squaring.
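A sketch of the whole scheme in plain integer Python (helper names are mine; a library matrix power would do just as well):

def mat_mul(A, B):
    # plain n x n matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, e):
    # exponentiation by squaring: O(log e) matrix multiplications
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while e:
        if e & 1:
            R = mat_mul(R, A)
        A = mat_mul(A, A)
        e >>= 1
    return R

def sum_fib_log(n):
    # X_n = A^n X_0 with X_0 = (S_2, S_1, S_0) = (4, 2, 1);
    # the third component of A^n X_0 is S_n
    A = [[2, 0, -1], [1, 0, 0], [0, 1, 0]]
    r = mat_pow(A, n)
    return r[2][0] * 4 + r[2][1] * 2 + r[2][2] * 1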
For the sake of completeness, here are implementations of the general ideas described in #NotDijkstra's answer, plus my humble optimizations, including the "closed form" solution implemented in integer arithmetic.
We can see that the "smart" methods are not only an order of magnitude faster but also seem to scale better, consistent with the fact (thanks #NotDijkstra) that Python big ints use better-than-naive multiplication.
import numpy as np
import operator as op
from simple_benchmark import BenchmarkBuilder, MultiArgument

B = BenchmarkBuilder()

def pow(b, e, mul=op.mul, unit=1):
    # generic exponentiation by squaring over any associative mul
    if e == 0:
        return unit
    res = b
    for bit in bin(e)[3:]:
        res = mul(res, res)
        if bit == "1":
            res = mul(res, b)
    return res

def mul_fib(a, b):
    # pair multiplication driving the integer "closed form"
    return (a[0]*b[0] + 5*a[1]*b[1]) >> 1, (a[0]*b[1] + a[1]*b[0]) >> 1

def fib_closed(n):
    return pow((1, 1), n+1, mul_fib)[1]

def fib_mat(n):
    return pow(np.array([[1, 1], [1, 0]], 'O'), n, op.matmul)[0, 0]

def fib_sequential(n):
    t1, t2 = 1, 1
    for i in range(n-1):
        t1, t2 = t2, t1+t2
    return t2

def sum_fib_direct(n):
    t1, t2, res = 1, 1, 1
    for i in range(n):
        t1, t2, res = t2, t1+t2, res+t2
    return res

def sum_fib(n, method="closed"):
    if method == "direct":
        return sum_fib_direct(n)
    return globals()[f"fib_{method}"](n+2) - 1

methods = "closed mat sequential direct".split()

def f(method):
    def f(n):
        return sum_fib(n, method)
    f.__name__ = method
    return f

for method in methods:
    B.add_function(method)(f(method))
B.add_arguments('N')(lambda: (2*(1<<k,) for k in range(23)))

r = B.run()
r.plot()

import matplotlib.pylab as P
P.savefig("fib.png")
I am not sure how it is taking you anywhere near a second. Here is the memoized version without fanciness:
class fibs(object):
    def __init__(self):
        self.thefibs = {0: 0, 1: 1}
    def __call__(self, n):
        if n not in self.thefibs:
            self.thefibs[n] = self(n-1) + self(n-2)
        return self.thefibs[n]

dog = fibs()
print(sum([dog(i) for i in range(40) if dog(i) < 4000000]))

Python iterate through unique commutative equations

Inspired by the math of this video, I was curious how long it takes to get a number n, using the number 1 and a few operations like addition and multiplication, in the binary base. Currently, I have things coded up like this:
import itertools as it

def brute(m):
    out = set()
    for combo in it.product(['+', '*', '|'], repeat=m):
        x = parse(combo)
        if type(x) == int and 0 < x-1:
            out.add(x)
    return out

def parse(ops):
    eq = ""
    last = 1
    for op in ops:
        if op == "|":
            last *= 2
            last += 1
        else:
            eq += str(last)
            eq += op
            last = 1
    eq += str(last)
    return eval(eq)
Here, "|" refers to concatenation, thus combo = ("+","|","*","|","|") would parse as 1+11*111 = 1+3*7 = 22. I'm using this to build a table of how many numbers are m operations away and study how this grows. Currently I am using the itertools product function to search all possible operation combos, despite this being a bit repetitive, as "+" and "*" are both commutative operations. Is there a clean way to only generate the commutatively unique expressions?
Currently, my best idea is to have a second brute force program, brute2(k) which only uses the operations "*" and "|", and then have:
def brute(m):
    out = set()
    for k in range(m):
        for a, b in it.product(brute(k), brute2(m-k)):
            out.add(a + b)
    return out
If I memoize things, this would be pretty decent, but I'm not sure whether there's a more Pythonic or efficient way than this. Furthermore, it fails to cleanly generalize if I decide to add more operations, like subtraction.
What I'm hoping for is insight on whether there is some module or simple method that will efficiently iterate through the operation combos. Currently, itertools.product has complexity O(3^m). However, with memoization and the brute2 method, it seems I could get the complexity down to O(|brute(m-1)|), which seems to be asymptotically ~O(1.5^m). While I am moderately happy with this second method, it would be nice if there were a more generalizable method that extends to arbitrary sets of operations, some of which are commutative.
Update:
I've now gotten my second idea coded up. With it, I was able to quickly get all numbers reachable in 42 operations, whereas my old code got stuck for hours after 20 operations. Here is the new code:
memo1 = {0: {1}}
def brute1(m):
    if m not in memo1:
        out = set(brute2(m+1))
        for i in range(m):
            for a, b in it.product(brute1(i), brute2(m-i)):
                out.add(a + b)
        memo1[m] = set(out)
    return memo1[m]

memo2 = {0: {1}}
def brute2(m):
    if m not in memo2:
        out = set()
        for i in range(m):
            for x in brute2(i):
                out.add(x * (2**(m-i) - 1))
        memo2[m] = set(out)
    return memo2[m]
If I were to generalize this, you order all your commutative operations, [op_1, op_2, ..., op_x], have brute(n, 0) return all numbers reachable with n non-commutative operations, and then have:
memo = {}
def brute(n, i):
    if (n, i) not in memo:
        out = brute(n, i-1)  # in case I don't use op_i
        # x is the number of operations before my last use of op_i; if
        # combo = (op_i,"|","+","+","*","+",op_i,"*","|"), this case is
        # covered when x = 6
        for x in range(1, n):
            for a, b in it.product(brute(x, i), brute(n-x-1, i-1)):
                out.add(a op_i b)  # pseudocode: apply op_i to a and b
        memo[(n, i)] = out
    return memo[(n, i)]
However, you have to be careful about the order of operations. If I did addition and then multiplication, things would be totally different, and this would say different things. If there's a hierarchy like PEMDAS and you don't want to consider parentheses, I believe you just list the operations in decreasing order of priority, i.e. op_1 is the first operation you do, etc. If you allow general parenthesization, then I believe you need something like this:
memo = {0: {1}}
def brute(m, ops):
    if m not in memo:
        out = set()
        for i in range(m):
            for a, b in it.product(brute(i, ops), brute(m-i-1, ops)):
                for op in ops:
                    out.add(op(a, b))
        memo[m] = out
    return memo[m]
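For what it's worth, a quick usage sketch of that last version with concrete callables (note that memo is shared across calls, so it only makes sense for one fixed ops set per run):

import operator
# numbers reachable from 1 with two binary ops, any parenthesization
print(sorted(brute(2, (operator.add, operator.mul))))   # -> [1, 2, 3]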
I'm more interested in the case where we have some sort of PEMDAS system in place and the parentheses are implied by the sequence of operations, but feedback on the validity/efficiency of either case is still welcome.

Efficient keys for dictionaries

I'm new to posting here, so I hope I provide enough detail. I'm trying to find out whether key selection affects the efficiency of dictionaries in Python. Some comparisons I'm thinking of are:
numbers vs strings (e.g. would my_dict[20] be faster than my_dict['twenty'])
len of strings (e.g. would my_dict['a'] be faster than my_dict['abcdefg'])
mixing key types within a dictionary, for instance using numbers, strings, and/or tuples (e.g. would my_dict = {0: 'zero', 2: 'two'} perform faster than {0: 'zero', 'two': 2})
I haven't been able to find this topic from a google search, so I thought maybe someone here might know.
First of all, I'd recommend understanding How are Python's Built In Dictionaries Implemented.
Now, let's make a little random experiment to prove the theory (at least partially):
import timeit
import string
import random
import time

def random_str(N):
    return ''.join(
        random.choice(string.ascii_uppercase + string.digits) for _ in range(N)
    )

def experiment1(dct, keys):
    s = time.time()
    {dct[k] for k in keys}
    return time.time() - s

if __name__ == "__main__":
    N = 10000
    K = 200
    S = 50000
    dct1 = {random_str(K): None for k in range(N)}
    dct2 = {i: None for i in range(N)}
    keys1 = list(dct1.keys())
    keys2 = list(dct2.keys())
    samples1 = []
    samples2 = []
    for i in range(S):
        samples1.append(experiment1(dct1, keys1))
        samples2.append(experiment1(dct2, keys2))
    print(sum(samples1), sum(samples2))
    # print(
    #     timeit.timeit('{dct1[k] for k in keys1}', setup='from __main__ import dct1, keys1')
    # )
    # print(
    #     timeit.timeit('{dct2[k] for k in keys2}', setup='from __main__ import dct2, keys2')
    # )
The results I've got with different sampling sizes on my box were:
N=10000, K=200, S=100 => 0.08300423622131348 0.07200479507446289
N=10000, K=200, S=1000 => 0.885051965713501 0.7120392322540283
N=10000, K=200, S=10000 => 8.88549256324768 7.005417346954346
N=10000, K=200, S=50000 => 43.57453536987305 34.82594871520996
So as you can see, whether you use big random strings or integers to look up the dictionary, the performance remains almost the same. The only "real" difference you'd want to consider is the memory consumption of the two dictionaries. That may be relevant when dumping/loading huge dictionaries to/from disk; in that case it may be worth having a compact form for your data structures, so you can shave off a few seconds when caching/reading them.
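If you want to eyeball that memory side, a rough sketch (sys.getsizeof is shallow, counting the dict's table and the key objects separately, so treat the numbers as indicative only):

import sys

def rough_size(dct):
    # the dict's own hash table plus its key objects; values ignored
    return sys.getsizeof(dct) + sum(sys.getsizeof(k) for k in dct)

int_keys = {i: None for i in range(10000)}
str_keys = {'k%0199d' % i: None for i in range(10000)}   # 200-char keys
print(rough_size(int_keys), rough_size(str_keys))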
NB: If anyone can explain why I was getting really huge times when using timeit (the commented-out parts), please let me know... with small experiment constants I'd get really high values... that's why I left it commented out. Add a comment if you know the reason ;D
I don't know the answer either but it's easy enough to test.
from timeit import default_timer as timer
import string

# Create a few dictionaries; all the values are None
nums = range(1, 27)
dict_numbers = dict.fromkeys(nums)

letters = string.ascii_lowercase
dict_singleLetter = dict.fromkeys(letters)

long_names = []
for letter in letters:
    long_names.append(''.join([letter, letters]))  # 27-char keys
dict_longNames = dict.fromkeys(long_names)

# Function to time thousands of runs to average out discrepancies
def testSpeed(dictionary, keys):
    x = None
    start = timer()
    for _ in range(1, 100000):
        for i in keys:
            x = dictionary[i]
    end = timer()
    return str(end - start)

# Time the different dictionaries
print("Number took " + testSpeed(dict_numbers, nums) + " seconds")
print("Single letters took " + testSpeed(dict_singleLetter, letters) + " seconds")
print("Long keys took " + testSpeed(dict_longNames, long_names) + " seconds")
All of these dictionaries are the same length and contain the same value for each key. When I ran this, the dictionary with the long keys was actually always the fastest, albeit by only maybe 5%, which could possibly be accounted for by other minor differences I am unaware of. Numbers and single letters were pretty close in speed, but numbers were generally barely faster than single letters. Hopefully this answers your question; this code should be easy enough to expand to test mixed cases, but I'm out of time at the moment.
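One CPython implementation detail that may help explain why the long keys held their own (my observation, not something the test above proves): str objects cache their hash after it's first computed, so a long key is hashed once and thereafter looked up via the cached value. A quick sketch:

import timeit

s = 'x' * 100000
hash(s)   # first call computes and caches the hash on the str object
# repeated hashing is then effectively free, regardless of key length
print(timeit.timeit(lambda: hash(s), number=1000000))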

Optimizing the run time of the nested for loop

I am just getting started with competitive programming, and after writing the solution to a certain problem I got a "runtime exceeded" error. I need to compute

max(|a[i] - a[j]| + |i - j|)

where a is a list of elements and i, j are indices; I need the max() of the above expression over all pairs.
Here is a short but complete code snippet.
t = int(input())  # number of test cases
for i in range(t):
    n = int(input())  # size of list
    a = list(map(int, input().split()))  # space-separated input
    res = []
    for s in range(n):  # these two nested loops dominate the run time
        for d in range(n):
            res.append(abs(a[s] - a[d]) + abs(s - d))
    print(max(res))
Input file: this link may expire (hope it works).
1<=t<=100
1<=n<=10^5
0<=a[i]<=10^5
The run-time limit on the leaderboard is 5 sec for C and 35 sec for Python, while this code takes 80 sec.
It is an online judge, so the timing is machine-independent. numpy is not available.
Please keep it simple; I am new to Python.
Thanks for reading.
For a given j<=i, |a[i]-a[j]|+|i-j| = max(a[i]-a[j]+i-j, a[j]-a[i]+i-j).
Thus for a given i, the value of j<=i that maximizes |a[i]-a[j]|+|i-j| is either the j that maximizes a[j]-j or the j that minimizes a[j]+j.
Both these values can be computed as you run along the array, giving a simple O(n) algorithm:
def maxdiff(xs):
    # mp tracks max(a[j] - j) and mn tracks min(a[j] + j) over j <= i
    mp = mn = xs[0]
    best = 0
    for i, x in enumerate(xs):
        mp = max(mp, x - i)
        mn = min(mn, x + i)
        best = max(best, x + i - mn, -x + i + mp)
    return best
And here's some simple testing against a naive but obviously correct algorithm:
def maxdiff_naive(xs):
    best = 0
    for i in xrange(len(xs)):
        for j in xrange(i+1):
            best = max(best, abs(xs[i]-xs[j]) + abs(i-j))
    return best

import random
for _ in xrange(500):
    r = [random.randrange(1000) for _ in xrange(50)]
    md1 = maxdiff(r)
    md2 = maxdiff_naive(r)
    if md1 != md2:
        print "%d != %d\n%s" % (md1, md2, r)
        exit()
It takes a fraction of a second to run maxdiff on an array of size 10^5, which is significantly better than your reported leaderboard scores.
"Competitive programming" is not about saving a few milliseconds by using a different kind of loop; it's about being smart about how you approach a problem, and then implementing the solution efficiently.
Still, one thing that jumps out is that you are wasting time building a list only to scan it for its maximum. Your double loop can be transformed into the following (ignoring other possible improvements):
print(max(abs(a[s] - a[d]) + abs(s - d) for s in range(n) for d in range(n)))
But that's small fry. Worry about your algorithm first, and only then turn to even obvious time-wasters like this. You can cut the number of comparisons in half, as #Brett showed you, but I would first study the problem and ask myself: do I really need to calculate this quantity n^2 times, or even 0.5*n^2 times? That's how you get the times down, not by shaving off milliseconds.

Cost of list functions in Python

Based on this older thread, it looks like the cost of list functions in Python is:
Random access: O(1)
Insertion/deletion to front: O(n)
Insertion/deletion to back: O(1)
Can anyone confirm whether this is still true in Python 2.6/3.x?
Take a look here. It's a PEP for a different kind of list. The version specified is 2.6/3.0.
Append (insertion to back) is O(1), while insertion (everywhere else) is O(n). So yes, it looks like this is still true.
Operation...Complexity
Copy........O(n)
Append......O(1)
Insert......O(n)
Get Item....O(1)
Set Item....O(1)
Del Item....O(n)
Iteration...O(n)
Get Slice...O(k)
Del Slice...O(n)
Set Slice...O(n+k)
Extend......O(k)
Sort........O(n log n)
Multiply....O(nk)
Python 3 is mostly an evolutionary change, no big changes in the datastructures and their complexities.
The canonical source for Python collections is TimeComplexity on the Wiki.
That's correct: inserting at the front forces a move of all the elements to make room.
collections.deque offers similar functionality, but is optimized for insertion on both sides.
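A quick sketch of the deque behavior:

from collections import deque

d = deque([1, 2, 3])
d.appendleft(0)   # O(1), unlike list.insert(0, ...), which is O(n)
d.append(4)       # O(1) at the back, like list.append
print(d)          # deque([0, 1, 2, 3, 4])
# note: indexing near the middle of a deque is O(n), unlike a list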
I know this post is old, but I recently did a little test myself. The complexity of list.insert() appears to be O(n).
Code:
'''
Independent study: timing the insert list method in Python
'''
import time

def make_list_of_size(n):
    ret_list = []
    for i in range(n):
        ret_list.append(n)
    return ret_list

# Estimate overhead of the timing loop
def get_overhead(niters):
    '''
    Returns the time it takes to iterate a for loop niters times
    '''
    tic = time.time()
    for i in range(niters):
        pass  # just blindly iterate, niters times
    toc = time.time()
    overhead = toc - tic
    return overhead

def tictoc_midpoint_insertion(list_size_initial, list_size_final, niters, file):
    overhead = get_overhead(niters)
    list_size = list_size_initial
    # insertion_pt = list_size//2  # <-- INSERTION POINT ASSIGNMENT LOCATION 1
    # insertion_pt = 0             # <-- INSERTION POINT ASSIGNMENT LOCATION 4 (insert at beginning)
    delta = 100
    while list_size <= list_size_final:
        # insertion_pt = list_size//2  # <-- INSERTION POINT ASSIGNMENT LOCATION 2
        x = make_list_of_size(list_size)
        tic = time.time()
        for i in range(niters):
            insertion_pt = len(x)//2  # <-- INSERTION POINT ASSIGNMENT LOCATION 3
            # insertion_pt = len(x)   # <-- INSERTION POINT ASSIGNMENT LOCATION 5 (insert at true end)
            x.insert(insertion_pt, 0)
        toc = time.time()
        cost_per_iter = (toc - tic) / niters  # overall time cost per iteration
        # the time per iteration without the overhead cost of pure iteration
        cost_per_iter_no_overhead = (toc - tic - overhead) / niters
        print("list size = {:d}, cost without overhead = {:f} sec/iter".format(
            list_size, cost_per_iter_no_overhead))
        file.write(str(list_size) + ',' + str(cost_per_iter_no_overhead) + '\n')
        if list_size >= 10 * delta:
            delta *= 10
        list_size += delta

def main():
    fname = input()
    file = open(fname, 'w')
    niters = 10000
    tictoc_midpoint_insertion(100, 10000000, niters, file)
    file.close()

main()
See the 5 marked positions where the insertion point can be assigned. The cost is of course a function of how large the list is and of how close the insertion point is to the beginning of the list (i.e., how many memory locations have to be reorganized).
Fwiw, there is a faster list implementation (for some ops... insert is O(log n)) called blist, if you need it.
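Usage is meant to be a drop-in list replacement (sketched from memory; blist is a third-party package, so verify against its docs):

# pip install blist   (third-party package)
from blist import blist

x = blist(range(1000000))
x.insert(500000, -1)   # documented as O(log n), vs O(n) for a plain list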
