Fast delta encoding for increasing sequence of integers in Python

Given a = [1, 2, 3, 4, 5]
After encoding, a' = [1, 1, 1, 1, 1]; each element represents the difference compared to its previous element.
I know this can be done with
for i in range(len(a) - 1, 0, -1):
    a[i] = a[i] - a[i - 1]
Is there a faster way? I am working with 2 billion numbers here, the process is taking about 30 minutes.

One way using itertools.starmap, islice and operator.sub:
from operator import sub
from itertools import starmap, islice
l = list(range(1, 10000000))
[l[0], *starmap(sub, zip(islice(l, 1, None), l))]
Output:
[1, 1, 1, ..., 1]
Benchmark:
l = list(range(1, 100000000))
# OP's method
%timeit [l[i] - l[i - 1] for i in range(len(l) - 1, 0, -1)]
# 14.2 s ± 373 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# numpy approach by @ynotzort
%timeit np.diff(l)
# 8.52 s ± 301 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# zip approach by @Nick
%timeit [nxt - cur for cur, nxt in zip(l, l[1:])]
# 7.96 s ± 243 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# itertools and operator approach by @Chris
%timeit [l[0], *starmap(sub, zip(islice(l, 1, None), l))]
# 6.4 s ± 255 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

You could use zip to put together the list with an offset version and subtract those values
a = [1, 2, 3, 4, 5]
a[1:] = [nxt - cur for cur, nxt in zip(a, a[1:])]
print(a)
Output:
[1, 1, 1, 1, 1]
Out of interest, I ran this, the original code and @ynotzort's answer through timeit, and this was much faster than the numpy code for short lists, remaining faster up to about 10M values; both were about 30% faster than the original code. As the list size increases beyond 10M values, the numpy code gains more of a speed-up and is eventually faster from about 20M values onward.
Update
Also tested the starmap code, and that is about 40% faster than the numpy code at 20M values...
Update 2
@Chris has some more comprehensive performance data in their answer. This answer can be sped up further (by about 10%) by using itertools.islice to generate the offset list:
from itertools import islice

a = [a[0], *[nxt - cur for cur, nxt in zip(a, islice(a, 1, None))]]
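For reference, here is a minimal stand-alone harness (a sketch, not from any of the answers; the list size and repeat counts are illustrative) for reproducing such comparisons with the stdlib timeit module instead of IPython's %timeit:
import timeit

setup = (
    "from operator import sub\n"
    "from itertools import starmap, islice\n"
    "l = list(range(1, 1_000_000))"
)
candidates = {
    "zip":     "[l[0], *[nxt - cur for cur, nxt in zip(l, l[1:])]]",
    "starmap": "[l[0], *starmap(sub, zip(islice(l, 1, None), l))]",
}
for name, stmt in candidates.items():
    # best of 3 repeats of 10 runs each, reported per single run
    best = min(timeit.repeat(stmt, setup=setup, number=10, repeat=3)) / 10
    print(f"{name}: {best:.3f} s per run")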

You could use numpy.diff. For example:
import numpy as np
a = [1, 2, 3, 4, 5]
npa = np.array(a)
a_diff = np.diff(npa)
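If the goal is to reproduce the OP's a' exactly (first element kept, followed by the deltas) and the data can be held in a NumPy array from the start (converting a 2-billion-element Python list to an array would dominate the runtime), a minimal sketch using np.ediff1d:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
# ediff1d computes consecutive differences; to_begin prepends the first value,
# matching the [a[0], a[1] - a[0], ...] layout from the question.
encoded = np.ediff1d(a, to_begin=a[0])
print(encoded)  # [1 1 1 1 1]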

Related

Partitioning an integer with pattern

I am aware that the problem of partitioning an integer is old and there are many questions and answers about it here on SO, but after searching extensively I haven't found exactly what I am looking for. To be fair, my solution is not too too bad, but I'd like to know if there is a faster/better way of doing the following:
I need to partition an integer into a fixed-length partition that may include the value 0, and where each "position" in the partition is subject to a max possible value. For example:
>>> list(partition(number = 5, max_vals = (1,0,3,4)))
[(1, 0, 3, 1),
(1, 0, 2, 2),
(1, 0, 0, 4),
(1, 0, 1, 3),
(0, 0, 1, 4),
(0, 0, 2, 3),
(0, 0, 3, 2)]
My solution is the following:
from collections import Counter
from itertools import combinations
def partition(number:int, max_vals:tuple):
    S = set(combinations((k for i,val in enumerate(max_vals) for k in [i]*val), number))
    for s in S:
        c = Counter(s)
        yield tuple([c[n] for n in range(len(max_vals))])
Essentially I first create "tokens" for each slot, then I combine the right number of them and finally I count how many per slot there are.
I don't particularly like having to instantiate a Counter for each partition, but the thing I dislike the most is that combinations generates many more tuples than what is needed and then I discard all of the duplicates with set(), which seems quite inefficient. Is there a better way?
Even though there must be better algorithms, a relatively simple and faster solution using itertools.product is:
>>> from itertools import product
>>> def partition_2(number:int, max_vals:tuple):
...     return (comb for comb in
...             product(*(range(min(number, i) + 1) for i in max_vals))
...             if sum(comb)==number)
>>> list(partition_2(number = 5, max_vals = (1,0,3,4)))
[(0, 0, 1, 4),
(0, 0, 2, 3),
(0, 0, 3, 2),
(1, 0, 0, 4),
(1, 0, 1, 3),
(1, 0, 2, 2),
(1, 0, 3, 1)]
Performance:
>>> %timeit list(partition(number = 15, max_vals = (1,0,3,4)*3))
155 ms ± 681 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit list(partition_2(number = 15, max_vals = (1,0,3,4)*3))
14.7 ms ± 763 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
################################################################################
>>> %timeit list(partition(number = 5, max_vals = (10,20,30,10,10)))
1.17 s ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit list(partition_2(number = 5, max_vals = (10,20,30,10,10)))
1.21 ms ± 28.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#################################################################################
>>> %timeit list(partition_2(number = 35, max_vals = (8,9,10,11,12)))
23.2 ms ± 697 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit list(partition(number = 35, max_vals = (8,9,10,11,12)))
# Will update when/if it finishes :)
A recursive function is usually an elegant way of approaching this kind of problem:
def partition(N,slots):
    if len(slots)==1:
        if slots[0]>=N: yield [N]
        return
    for s in range(min(N,slots[0])+1):
        yield from ([s]+p for p in partition(N-s,slots[1:]))
for part in partition(5,[1,0,3,4]): print(part)
[0, 0, 1, 4]
[0, 0, 2, 3]
[0, 0, 3, 2]
[1, 0, 0, 4]
[1, 0, 1, 3]
[1, 0, 2, 2]
[1, 0, 3, 1]
This can be further optimized by checking the remaining space at each recursion level and short-circuiting the traversal when the remaining slots are insufficient to hold the number:
def partition(N,slots,space=None):
    if space is None: space = sum(slots)
    if N>space: return
    if len(slots)==1:
        if slots[0]>=N: yield [N]
        return
    for s in range(min(N,slots[0])+1):
        yield from ([s]+p for p in partition(N-s,slots[1:],space-slots[0]))
This optimization improves performance in scenarios where the number of solutions is much smaller than the full product of all slot ranges. It is slower than the iterative approach in cases where most slot combinations are valid.
from timeit import timeit
t = timeit(lambda:list(partition(45,(8,9,10,11,12))),number=1)
print(t) # 0.000679596
t = timeit(lambda:list(partition_2(45,(8,9,10,11,12))),number=1)
print(t) # 0.027492302 (Sayandip's)
t = timeit(lambda:list(partition(15,(1,0,3,4)*3)),number=1)
print(t) # 0.024383259
t = timeit(lambda:list(partition_2(15,(1,0,3,4)*3)),number=1)
print(t) # 0.018362536
To get systematically better performance from the recursive approach, we need to limit the depth of recursion. This can be done by approaching the problem differently: if we split the slots into two groups and determine how the number is distributed between the two combined halves (left and right), we can then apply the partition to each side and combine the results. This only recurses to a depth of log2 of the number of slots, and it combines large chunks together instead of adding values one at a time:
from itertools import product
def partition(N,slots,space=None):
    if space is not None and N>space: return
    if len(slots)==1:
        if slots[0]>=N: yield [N]
        return
    if len(slots)==2:
        for left in range(max(0,N-slots[1]),min(N,slots[0])+1):
            yield [left,N-left]
        return
    leftSlots = slots[:len(slots)//2]
    rightSlots = slots[len(slots)//2:]
    leftSpace,rightSpace = sum(leftSlots),sum(rightSlots)
    for leftN,rightN in partition(N,[leftSpace,rightSpace],leftSpace+rightSpace):
        partLeft  = partition(leftN, leftSlots, leftSpace)
        partRight = partition(rightN, rightSlots, rightSpace)
        for leftSide,rightSide in product(partLeft,partRight):
            yield leftSide+rightSide
The performance improvement is then systematic, in all scenarios:
t = timeit(lambda:list(partition(45,(8,9,10,11,12))),number=1)
print(t) # 0.00017742
t = timeit(lambda:list(partition_2(45,(8,9,10,11,12))),number=1)
print(t) # 0.02895038
t = timeit(lambda:list(partition(15,(1,0,3,4)*3)),number=1)
print(t) # 0.00338676
t = timeit(lambda:list(partition_2(15,(1,0,3,4)*3)),number=1)
print(t) # 0.02025453
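As a quick cross-check (a sketch, assuming both the divide-and-conquer partition above and partition_2 from earlier are defined in the same session), the two approaches should agree on the set of results:
# partition yields lists and partition_2 yields tuples, and their orderings
# differ, so compare them as sets of tuples.
expected = set(partition_2(number=5, max_vals=(1, 0, 3, 4)))
got = {tuple(p) for p in partition(5, (1, 0, 3, 4))}
assert got == expected
print(len(got))  # 7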

Python numpy split with indices

I'm looking for a numpy equivalent of my suboptimal Python code. The calculation I want to do can be summarized by:
The average of the peak of each section for each row.
Here is the code with a sample array and list of indices. Sections can be of different sizes.
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
indices = [2]
result = np.empty((1, x.shape[0]))
for i, row in enumerate(x):
    splited = np.array_split(row, indices)
    peak = [np.amax(a) for a in splited]
    result[0, i] = np.average(peak)
Which gives: result = array([[3., 7.]])
What is the optimized numpy way to get rid of both loops?
You could just drop the for loop and use the axis argument instead:
result2 = np.mean([np.max(arr, 1) for arr in np.array_split(x, indices, 1)], axis=0)
Output:
array([3., 7.])
Benchmark:
x_large = np.array([[1, 2, 3, 4],
                    [5, 6, 7, 8]] * 1000)
%%timeit
result = []
for row in x_large:
    splited = np.array_split(row, indices)
    peak = [np.amax(a) for a in splited]
    result.append(np.average(peak))
# 29.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.mean([np.max(arr, 1) for arr in np.array_split(x_large, indices, 1)], axis=0)
# 37.4 µs ± 499 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Validation:
np.array_equal(result, result2)
# True
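As an additional sketch (not from the answers above; it assumes indices is a sorted, strictly increasing list of split points within the number of columns), np.maximum.reduceat can compute the per-section peaks without splitting the array at all:
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])
indices = [2]

# reduceat takes the maximum over each column slice ([0:2] and [2:4]) per row;
# the mean over sections then gives the average peak per row.
section_max = np.maximum.reduceat(x, np.r_[0, indices], axis=1)
result3 = section_max.mean(axis=1)
print(result3)  # [3. 7.]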

Find largest row in a matrix with numpy (row with highest length)

I have a massive array with rows and columns. Some rows are larger than others. I need to get the maximum row length, that is, the length of the longest row. I wrote a simple function for this, but I want it to be as fast as possible, like numpy fast. Currently, it looks like this:
Example array:
values = [
    [1, 2, 3],
    [4, 5, 6, 7, 8, 9],
    [10, 11, 12, 13]
]

def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width
Is there any way to accomplish this with numpy?
In [261]: values = [
...: [1,2,3],
...: [4,5,6,7,8,9],
...: [10,11,12,13]
...: ]
...:
In [262]:
In [262]: values
Out[262]: [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10, 11, 12, 13]]
In [263]: def values_max_width(values):
...:     max_width = 1
...:     for row in values:
...:         if len(row) > max_width:
...:             max_width = len(row)
...:     return max_width
...:
In [264]: values_max_width(values)
Out[264]: 6
In [265]: [len(v) for v in values]
Out[265]: [3, 6, 4]
In [266]: max([len(v) for v in values])
Out[266]: 6
In [267]: np.max([len(v) for v in values])
Out[267]: 6
Your loop and the list comprehension are similar in speed; np.max is much slower, since it first has to turn the list into an array.
In [268]: timeit max([len(v) for v in values])
656 ns ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [269]: timeit np.max([len(v) for v in values])
13.9 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [271]: timeit values_max_width(values)
555 ns ± 13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
If you are starting with a list, it's a good idea to thoroughly test the list implementation. numpy is fast when it is doing compiled array stuff, but creating an array from a list is time consuming.
Making an array directly from values isn't much help. The result is an object dtype array:
In [272]: arr = np.array(values)
In [273]: arr
Out[273]:
array([list([1, 2, 3]), list([4, 5, 6, 7, 8, 9]), list([10, 11, 12, 13])],
dtype=object)
Math on such an array is hit-or-miss, and always slower than math on pure numeric arrays. We can iterate on such an array, but that iteration is slower than on a list.
In [275]: values_max_width(arr)
Out[275]: 6
In [276]: timeit values_max_width(arr)
1.3 µs ± 8.27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
I'm not sure how you can make it faster. I've tried using np.max over the lengths of the items, but that takes even longer:
import numpy as np
import time

values = []
for k in range(100000):
    values.append(list(np.random.randint(100, size=np.random.randint(1000))))

def timeit(func):
    def wrapper(*args, **kwargs):
        now = time.time()
        retval = func(*args, **kwargs)
        print('{} took {:.5f}s'.format(func.__name__, time.time() - now))
        return retval
    return wrapper

@timeit
def values_max_width(values):
    max_width = 1
    for row in values:
        if len(row) > max_width:
            max_width = len(row)
    return max_width

@timeit
def value_max_width_len(values):
    return np.max([len(l) for l in values])

values_max_width(values)
value_max_width_len(values)
values_max_width took 0.00598s
value_max_width_len took 0.00994s
Edit:
As @Mstaino suggested, using map does make this code faster:
@timeit
def value_max_width_len(values):
    return max(map(len, values))
values_max_width took 0.00598s
value_max_width_len took 0.00499s

How can I improve the efficiency of matrix multiplication in python?

I have written code to do matrix multiplication for matrices of different sizes, but it takes a lot of time to execute.
code:
# Program to multiply two matrices using nested loops
import time

print("Enter the size of matrix A")
m = int(input())
n = int(input())
print("Enter the size of matrix B")
p = int(input())
q = int(input())

if n == p:
    print('enter matrix A')
else:
    print("invalid entry")
    exit()

our_list1 = []
A = []
for i in range(m):
    for j in range(n):
        number = int(input('Please enter an element '))
        our_list1.append(number)
    A.append(our_list1)
    our_list1 = []
print(A)

print('enter matrix B')
our_list1 = []
B = []
for i in range(p):
    for j in range(q):
        number = int(input('Please enter an element '))
        our_list1.append(number)
    B.append(our_list1)
    our_list1 = []
print(B)

start_time = time.time()

# initialise the m x q result matrix R with zeros
our_list1 = []
R = []
for i in range(m):
    for j in range(q):
        number = 0
        our_list1.append(number)
    R.append(our_list1)
    our_list1 = []
print(R)

for i in range(len(A)):
    # iterating by columns of B
    for j in range(len(B[0])):
        # iterating by rows of B
        for k in range(len(B)):
            R[i][j] += A[i][k] * B[k][j]
print(R)
print("--- %s seconds ---" % (time.time() - start_time))
This method of matrix multiplication takes a long time to execute. How can I choose an efficient way to multiply matrices with large dimensions, so that higher-dimensional arrays can be handled smoothly and quickly?
Sample output:
Matrix A[[3, 3, 3], [3, 3, 3], [3, 3, 3]]
Matrix B[[3, 3, 3], [3, 3, 3], [3, 3, 3]]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
[[27, 27, 27], [27, 27, 27], [27, 27, 27]]
--- 0.00014400482177734375 seconds ---
It takes 0.00014400482177734375 seconds; can I improve these timings when doing higher-dimension multiplication?
The timings in your comments have some significant drawbacks:
print() is comparatively expensive and has nothing to do with the calculation. Including it in the timings could take up a big chunk of the overall time.
Using wallclock (time.time()) is not a good way of getting stable timings; you get one run and anything could be happening on your system.
This should give a better test case for comparison:
import numpy as np

def python_lists():
    A = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
    B = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]
    our_list1 = []
    R = []
    for i in range(3):
        for j in range(3):
            number = 0
            our_list1.append(number)
        R.append(our_list1)
        our_list1 = []
    for i in range(len(A)):
        # iterating by columns of B
        for j in range(len(B[0])):
            # iterating by rows of B
            for k in range(len(B)):
                R[i][j] += A[i][k] * B[k][j]

def numpy_array():
    A = np.full((3, 3), 3)
    B = np.full((3, 3), 3)
    result = np.dot(A, B)
And the timings:
%timeit python_lists()
15 µs ± 45.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit numpy_array()
5.57 µs ± 44.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
So, NumPy is ~3 times faster for this example. But this would be more significant if you had bigger arrays.
EDIT:
And actually, you could argue that creating A and B inside the function is not helpful for timing the actual matrix multiplication, so if I instead create the lists/arrays first and pass them, the new timings are:
%timeit python_lists(A, B)
14.4 µs ± 98.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit numpy_array(A, B)
1.2 µs ± 13.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
And, for the sake of completeness, for an array with shape (200, 200):
%timeit python_lists()
6.99 s ± 128 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit numpy_array()
5.77 ms ± 43.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
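The parameterized variants used for the EDIT timings above are not shown; a minimal sketch of what they might look like (hypothetical, reconstructed for illustration, with A and B built once outside the functions):
import numpy as np

def python_lists(A, B):
    # plain nested-loop multiplication on lists of lists
    R = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                R[i][j] += A[i][k] * B[k][j]
    return R

def numpy_array(A, B):
    # here A and B are assumed to already be NumPy arrays
    return np.dot(A, B)

A = [[3] * 3 for _ in range(3)]
B = [[3] * 3 for _ in range(3)]
A_np, B_np = np.array(A), np.array(B)
# %timeit python_lists(A, B)
# %timeit numpy_array(A_np, B_np)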

How to find the index of the element in a list that first appears in another given list?

a = [3, 4, 2, 1, 7, 6, 5]
b = [4, 6]
The answer should be 1, because 4 is the first element of a that appears in list b, and its index is 1.
Is there any fast code in Python to achieve this?
PS: Actually a is a random permutation and b is a subset of a, but it's represented as a list.
If b is to be seen as a subset (order doesn't matter, all values are present in a), then use min() with a map():
min(map(a.index, b))
This returns the lowest index. This is an O(NK) solution (where N is the length of a, K that of b), but all looping is executed in C code.
Another option is to convert a to a set and use next() on a loop over enumerate():
bset = set(b)
next(i for i, v in enumerate(a) if v in bset)
This is an O(N) solution, but it has a higher constant cost (Python bytecode to execute). Which one is faster depends heavily on the sizes of a and b.
For the small input example in the question, min(map(...)) wins:
In [86]: a = [3, 4, 2, 1, 7, 6, 5]
...: b = [4, 6]
...:
In [87]: %timeit min(map(a.index, b))
...:
608 ns ± 64.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [88]: bset = set(b)
...:
In [89]: %timeit next(i for i, v in enumerate(a) if v in bset)
...:
717 ns ± 30.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In one line:
print("".join([str(index) for item in b for index,item1 in enumerate(a) if item==item1][:1]))
output:
1
In detail:
a = [3, 4, 2, 1, 7, 6, 5]
b = [4, 6]
new = []
for item in b:
    for index, item1 in enumerate(a):
        if item == item1:
            new.append(index)
print("".join([str(x) for x in new[:1]]))
For a small b, the runtime of the set approach depends on the output: execution time grows linearly with the index of the first match. NumPy can provide a better solution in this case.
import numpy as np

N = 10**6
A = np.unique(np.random.randint(0, N, N))
np.random.shuffle(A)
B = A[:3].copy()
np.random.shuffle(A)

def find(A, B):
    pos = np.in1d(A, B).nonzero()[0]
    return pos[A[pos].argsort()][B.argsort().argsort()].min()

def findset(A, B):
    bset = set(B)
    return next(i for i, v in enumerate(A) if v in bset)
# In [29]: find(A,B) == findset(A,B)
# Out[29]: True
# In [30]: %timeit findset(A,B)
# 63.5 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# In [31]: %timeit find(A,B)
# 2.24 ms ± 52.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
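A possibly simpler NumPy variant (a sketch, not from the answer above; it assumes at least one element of B occurs in A, since argmax on an all-False mask would wrongly return 0):
import numpy as np

def find_first(A, B):
    # np.in1d builds a boolean mask over A marking elements that also occur in B;
    # argmax returns the position of the first True, i.e. the first match.
    return np.in1d(A, B).argmax()

a = np.array([3, 4, 2, 1, 7, 6, 5])
b = np.array([4, 6])
print(find_first(a, b))  # 1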
