I want to modify block elements of a 3D array without a for loop, because the loop is the bottleneck of my code.
To illustrate what I want, I drew a figure:
The code with the for loop:
import numpy as np
# Create 3d array with 2x4x4 elements
a = np.arange(2*4*4).reshape(2,4,4)
b = np.zeros(np.shape(a))
# Change Block Elements
for it1 in range(2):
    b[it1] = np.block([[a[it1, 0:2, 0:2], a[it1, 2:4, 0:2]], [a[it1, 0:2, 2:4], a[it1, 2:4, 2:4]]])
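The figure is not reproduced here; as a stand-in, here is what the loop produces for the first 4x4 slice (the off-diagonal 2x2 blocks are swapped; the dtype is set only so the printout shows integers):

import numpy as np

a = np.arange(2*4*4).reshape(2, 4, 4)
b = np.zeros(a.shape, dtype=a.dtype)
for it1 in range(2):
    b[it1] = np.block([[a[it1, 0:2, 0:2], a[it1, 2:4, 0:2]],
                       [a[it1, 0:2, 2:4], a[it1, 2:4, 2:4]]])

print(a[0])
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]]
print(b[0])
# [[ 0  1  8  9]
#  [ 4  5 12 13]
#  [ 2  3 10 11]
#  [ 6  7 14 15]]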
First let's see if there's a way to do what you want for a 2D array using only indexing, reshape, and transpose operations. If there is, then there's a good chance that you can extend it to a larger number of dimensions.
x = np.arange(2 * 3 * 2 * 5).reshape(2 * 3, 2 * 5)
Clearly you can reshape this into an array that has the blocks along a separate dimension:
x.reshape(2, 3, 2, 5)
Then you can transpose the resulting blocks:
x.reshape(2, 3, 2, 5).transpose(2, 1, 0, 3)
So far, none of the data has been copied. To make the copy happen, reshape back into the original shape:
x.reshape(2, 3, 2, 5).transpose(2, 1, 0, 3).reshape(2 * 3, 2 * 5)
Adding additional leading dimensions is as simple as increasing the number of the dimensions you want to swap:
b = a.reshape(a.shape[0], 2, a.shape[1] // 2, 2, a.shape[2] // 2).transpose(0, 3, 2, 1, 4).reshape(a.shape)
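As a quick sanity check (a minimal sketch using the question's 2x4x4 array), the reshape/transpose result matches the loop-based np.block output:

import numpy as np

a = np.arange(2*4*4).reshape(2, 4, 4)

# loop-based reference from the question
b_loop = np.zeros(np.shape(a))
for it1 in range(2):
    b_loop[it1] = np.block([[a[it1, 0:2, 0:2], a[it1, 2:4, 0:2]],
                            [a[it1, 0:2, 2:4], a[it1, 2:4, 2:4]]])

# reshape/transpose version from above
b_fast = a.reshape(a.shape[0], 2, a.shape[1] // 2, 2, a.shape[2] // 2).transpose(0, 3, 2, 1, 4).reshape(a.shape)

assert np.array_equal(b_loop, b_fast)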
Here is a quick benchmark of the other implementations with your original array:
a = np.arange(2*4*4).reshape(2,4,4)
%%timeit
b = np.zeros(np.shape(a))
for it1 in range(2):
    b[it1] = np.block([[a[it1, 0:2, 0:2], a[it1, 2:4, 0:2]], [a[it1, 0:2, 2:4], a[it1, 2:4, 2:4]]])
27.7 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
b = a.copy()
b[:,0:2,2:4], b[:,2:4,0:2] = b[:,2:4,0:2].copy(), b[:,0:2,2:4].copy()
2.22 µs ± 3.89 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit b = np.block([[a[:,0:2,0:2], a[:,2:4,0:2]],[a[:,0:2,2:4], a[:,2:4,2:4]]])
13.6 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit b = a.reshape(a.shape[0], 2, a.shape[1] // 2, 2, a.shape[2] // 2).transpose(0, 3, 2, 1, 4).reshape(a.shape)
1.27 µs ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
For small arrays, the differences can sometimes be attributed to overhead. Here is a more meaningful comparison with arrays of size 10x1000x1000, where each 1000x1000 slice is split into four 500x500 blocks:
a = np.arange(10*1000*1000).reshape(10, 1000, 1000)
%%timeit
b = np.zeros(np.shape(a))
for it1 in range(10):
    b[it1] = np.block([[a[it1, 0:500, 0:500], a[it1, 500:1000, 0:500]], [a[it1, 0:500, 500:1000], a[it1, 500:1000, 500:1000]]])
58 ms ± 904 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
b = a.copy()
b[:,0:500,500:1000], b[:,500:1000,0:500] = b[:,500:1000,0:500].copy(), b[:,0:500,500:1000].copy()
41.2 ms ± 688 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit b = np.block([[a[:,0:500,0:500], a[:,500:1000,0:500]],[a[:,0:500,500:1000], a[:,500:1000,500:1000]]])
27.5 ms ± 569 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit b = a.reshape(a.shape[0], 2, a.shape[1] // 2, 2, a.shape[2] // 2).transpose(0, 3, 2, 1, 4).reshape(a.shape)
20 ms ± 161 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
So it seems that using numpy's own reshaping and transposition mechanism is fastest on my computer. Also, notice that the overhead of np.block becomes less important than copying the temporary arrays as size gets bigger, so the other two implementations change places.
You can directly replace it1 by a slice over the whole first dimension:
b = np.block([[a[:,0:2,0:2], a[:,2:4,0:2]],[a[:,0:2,2:4], a[:,2:4,2:4]]])
Will it make it faster?
import numpy as np
a = np.arange(2*4*4).reshape(2,4,4)
b = a.copy()
b[:,0:2,2:4], b[:,2:4,0:2] = b[:,2:4,0:2].copy(), b[:,0:2,2:4].copy()
Comparison with the np.block() alternative from another answer.
Option 1:
%timeit b = a.copy(); b[:,0:2,2:4], b[:,2:4,0:2] = b[:,2:4,0:2].copy(), b[:,0:2,2:4].copy()
Output:
5.44 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Option 2:
%timeit b = np.block([[a[:,0:2,0:2], a[:,2:4,0:2]],[a[:,0:2,2:4], a[:,2:4,2:4]]])
Output:
30.6 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
I am trying to implement code that retrieves the millionth Fibonacci number or beyond. I am using matrix multiplication with NumPy for faster calculation.
According to my understanding it should take O(log N) time, and the worst case for a million should be nearly 6 seconds, which should be alright.
Following is my implementation:
def fib(n):
    import numpy as np
    matrix = np.matrix([[1, 1], [1, 0]]) ** abs(n)
    if n % 2 == 0 and n < 0:
        return -matrix[0, 1]
    return matrix[0, 1]
However, never mind a million, it is not even generating a correct response for 1000.
Actual response:
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875
My Response:
817770325994397771
Why is Python truncating the responses? From the docs it should be capable of calculating values even as large as 10**1000. Where did I go wrong?
NumPy handles numbers and calculations in a high-performance way (both in memory efficiency and computing time), and to do so it uses fixed-size machine integers. So while Python can process arbitrarily large integers, NumPy can't. You can let Python do the calculation and get the correct result, in exchange for reduced performance.
Sample code:
import numpy as np
def fib(n):
    # the difference is dtype=object, it will let Python do the calculation
    matrix = np.matrix([[1, 1], [1, 0]], dtype=object) ** abs(n)
    if n % 2 == 0 and n < 0:
        return -matrix[0, 1]
    return matrix[0, 1]
print(fib(1000))
Output:
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875
PS: Warning
The millionth Fibonacci number is extremely large, so you should make sure that Python can handle it. If not, you will have to implement or find some module to handle such large numbers.
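Python's built-in int does handle numbers this large. A hedged sketch (assuming the fib() defined above; the sys.set_int_max_str_digits call exists only on Python 3.11+, where printing a number with hundreds of thousands of digits needs a raised conversion limit):

import sys

# on Python 3.11+, converting a ~209k-digit int to a string needs a higher limit
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(300_000)

f = fib(1_000_000)        # fib() as defined above, with dtype=object
print(len(str(f)))        # number of digits, about 208988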
I'm not convinced that numpy is much help here, as it does not directly support Python's very large integers in vectorized operations. A basic Python implementation of an O(log N) algorithm gets the 1 millionth Fibonacci number in 0.15 sec on my laptop. An iterative (slow) approach gets it in 12 seconds:
# Nth Fibonacci number, simple iteration, O(N) time (N>=0)
def slowFibo(N):
    a = 0
    b = 1
    for _ in range(N): a, b = b, a+b
    return a

# Nth Fibonacci number (exponential iterations) O(log(N)) time (N>=0)
def fastFibo(N):
    a, b = 1, 1
    f0, f1 = 0, 1
    r, s = (1, 1) if N & 1 else (0, 1)
    N //= 2
    while N > 0:
        a, b = f0*a+f1*b, f0*b+f1*(a+b)
        f0, f1 = b-a, a
        if N & 1: r, s = f0*r+f1*s, f0*s+f1*(r+s)
        N //= 2
    return r
output:
f1K = slowFibo(1000) # 0.00009 sec
f1K = fib(1000) # 0.00011 sec (tandat's)
f1K = fastFibo(1000) # 0.00002 sec
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875
f1M = slowFibo(1_000_000) # 12.52 sec
f1M = fib(1_000_000) # 0.2769 sec (tandat's)
f1M = fastFibo(1_000_000) # 0.14718 sec
19532821287077577316...68996526838242546875
len(str(f1M)) # 208988 digits
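As a quick cross-check (a sketch assuming fastFibo above is defined in the session; the reference table is mine), the fast implementation agrees with the plain recurrence F(0)=0, F(1)=1 for small N:

# build a small reference table with the plain recurrence
ref = [0, 1]
for _ in range(30):
    ref.append(ref[-1] + ref[-2])

for n in range(30):
    assert fastFibo(n) == ref[n], n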
The core of your function is
np.matrix([[1, 1], [1, 0]]) ** abs(n)
which is discussed in the Wiki article
np.matrix implements ** as __pow__, which in turn uses np.linalg.matrix_power. Essentially that's repeated dot matrix multiplication, with a modest enhancement: grouping the products by powers of 2.
In [319]: M=np.matrix([[1, 1], [1, 0]])
In [320]: M**10
Out[320]:
matrix([[89, 55],
[55, 34]])
The use of np.matrix is discouraged, so I can do the same with
In [321]: A = np.array(M)
In [322]: A
Out[322]:
array([[1, 1],
[1, 0]])
In [323]: np.linalg.matrix_power(A,10)
Out[323]:
array([[89, 55],
[55, 34]])
Using the (newish) @ matrix multiplication operator, that's the same as:
In [324]: A@A@A@A@A@A@A@A@A@A
Out[324]:
array([[89, 55],
[55, 34]])
matrix_power does something more like:
In [325]: A2=A@A; A4=A2@A2; A8=A4@A4; A8@A2
Out[325]:
array([[89, 55],
[55, 34]])
And some comparative times:
In [326]: timeit np.linalg.matrix_power(A,10)
16.2 µs ± 58.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [327]: timeit M**10
33.5 µs ± 38.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [328]: timeit A@A@A@A@A@A@A@A@A@A
25.6 µs ± 914 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [329]: timeit A2=A@A; A4=A2@A2; A8=A4@A4; A8@A2
10.2 µs ± 97.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy integers are implemented as int64, in C, so they are limited in size. Thus we get overflow with a modest power of 100:
In [330]: np.linalg.matrix_power(A,100)
Out[330]:
array([[ 1298777728820984005, 3736710778780434371],
[ 3736710778780434371, -2437933049959450366]])
We can get around this by changing the dtype to object. The values are then Python ints and can grow indefinitely:
In [331]: Ao = A.astype(object)
In [332]: Ao
Out[332]:
array([[1, 1],
[1, 0]], dtype=object)
Fortunately matrix_power can cleanly handle object dtype:
In [333]: np.linalg.matrix_power(Ao,100)
Out[333]:
array([[573147844013817084101, 354224848179261915075],
[354224848179261915075, 218922995834555169026]], dtype=object)
Usually math on object dtype is slower, but not in this case:
In [334]: timeit np.linalg.matrix_power(Ao,10)
14.9 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I'm guessing it's because of the small (2,2) size of the array, where fast compiled methods aren't useful. This is basically an iterative task, where numpy doesn't have any advantages.
Scaling isn't bad - increase n by 10, and only get a 3-4x increase in time.
In [337]: np.linalg.matrix_power(Ao,1000)
Out[337]:
array([[70330367711422815821835254877183549770181269836358732742604905087154537118196933579742249494562611733487750449241765991088186363265450223647106012053374121273867339111198139373125598767690091902245245323403501,
43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875],
[43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875,
26863810024485359386146727202142923967616609318986952340123175997617981700247881689338369654483356564191827856161443356312976673642210350324634850410377680367334151172899169723197082763985615764450078474174626]],
dtype=object)
In [338]: timeit np.linalg.matrix_power(Ao,1000)
53.8 µs ± 83 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
With object dtype np.matrix:
In [340]: Mo = M.astype(object)
In [344]: timeit Mo**1000
86.1 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
And for a million, the times aren't as bad as I anticipated:
In [352]: timeit np.linalg.matrix_power(Ao,1_000_000)
423 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For comparison, the fastFibo times on my machine are:
In [354]: fastFibo(100)
Out[354]: 354224848179261915075
In [355]: timeit fastFibo(100)
3.91 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [356]: timeit fastFibo(1000)
9.37 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [357]: timeit fastFibo(1_000_000)
226 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
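Putting the pieces of this answer together, a minimal sketch of a matrix-power Fibonacci using the object dtype (the function name fib_matrix is mine, not from the original code):

import numpy as np

def fib_matrix(n):
    # object dtype -> Python ints, so no int64 overflow
    A = np.array([[1, 1], [1, 0]], dtype=object)
    M = np.linalg.matrix_power(A, abs(n))
    # same sign convention as the question's fib() for negative n
    if n < 0 and n % 2 == 0:
        return -M[0, 1]
    return M[0, 1]

print(fib_matrix(100))   # 354224848179261915075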
Let's say I have this numpy array:
[[3 2 1 5]
[3 2 1 5]
[3 2 1 5]
[3 2 1 5]]
How do I merge the values of the last column into the first column (or, in general, any column into any other column)? Expected output:
[[8 2 1]
[8 2 1]
[8 2 1]
[8 2 1]]
I've found this solution. But is there any better way to do it?
As per the comment, you need to create a view or a copy of the array in order to get a new array with a different size. Here is a short comparison of the performance of view vs. copy:
import numpy as np

x = np.tile([1,3,2,4],(4,1))

def f(x):
    # calculation + view
    x[:,0] = x[:,0] + x[:,-1]
    return x[:,:-1]

def g(x):
    # calculation + copy
    x[:,0] = x[:,0] + x[:,-1]
    return np.delete(x, -1, 1)

def h(x):
    # calculation only
    x[:,0] = x[:,0] + x[:,-1]
%timeit f(x)
%timeit g(x)
%timeit h(x)
9.16 µs ± 1.1 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
35 µs ± 7.35 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.81 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And if len(x) were 1M:
6.13 ms ± 623 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
18 ms ± 2.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
5.83 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So the solution in the link is very economical: it applies the calculation plus an instant view.
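For reference, a minimal end-to-end sketch of the view-based approach on the question's array:

import numpy as np

x = np.array([[3, 2, 1, 5]] * 4)
x[:, 0] += x[:, -1]      # merge the last column into the first (in place)
result = x[:, :-1]       # view that drops the last column
print(result)
# [[8 2 1]
#  [8 2 1]
#  [8 2 1]
#  [8 2 1]]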
I don't know if this is the best, but it's kind of clever.
In [66]: np.add.reduceat(arr[:,[0,3,1,2]], [0,2,3], axis=1)
Out[66]:
array([[8, 2, 1],
[8, 2, 1],
[8, 2, 1]])
reduceat applies add to groups of columns (axis 1). I first reordered the columns to put the ones to be added next to each other.
I have a dataframe with a column whose values are lists. I want to extract the individual elements of every list in the column. So given this input dataframe:
A
0 [5, 4, 3, 6]
1 [7, 8, 9, 6]
The intended output should be a list:
[5, 4, 3, 6, 7, 8, 9, 6]
You can use a list comprehension to flatten:
a = [y for x in df.A for y in x]
Or use itertools.chain:
from itertools import chain
a = list(chain.from_iterable(df.A))
Or use numpy.concatenate:
a = np.concatenate(df.A).tolist()
Or Series.explode, working for pandas 0.25+:
a = df.A.explode().tolist()
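All of these produce the flat list from the question; for example, with the list comprehension (a small sketch using the sample frame):

import pandas as pd

df = pd.DataFrame({'A': [[5, 4, 3, 6], [7, 8, 9, 6]]})
print([y for x in df.A for y in x])
# [5, 4, 3, 6, 7, 8, 9, 6]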
Performance with sample data for 100k rows:
import pandas as pd

df = pd.DataFrame({
    'A': [[5, 4, 3, 6], [7, 8, 9, 6]] * 50000})
print(df)
In [263]: %timeit [y for x in df.A for y in x]
37.7 ms ± 3.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [264]: %timeit list(chain.from_iterable(df.A))
27.3 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [265]: %timeit np.concatenate(df.A).tolist()
1.71 s ± 86.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [266]: %timeit df.A.explode().tolist()
207 ms ± 3.48 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#ansev1
In [267]: %timeit np.hstack(df['A']).tolist()
328 ms ± 6.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I need to merge lists. I have a function that works, but when the number of merges is large it is very slow and unbearable, so I wonder if there is a more efficient way.
Consolidation condition: sub-lists that contain an identical number are merged with each other. Thank you.
Simple Association:
[7,8,9] = [7,8]+[8,9] #The same number 8
Cascading merges:
[1,2,3] = [1,2,3]+[3,4] #The same number 3
[3,4,5,6] = [3,4]+[4,5,6] #The same number 4
[1,2,3,4,5,6] = [1,2,3]+[3,4,5,6] #The same number 3
Function:
a = [ [1,2,3],[4,5,6],[3,4],[7,8],[8,9],[6,12,13] ]
b = len(a)
for i in range(b):
    for j in range(b):
        x = list(set(a[i]+a[j]))
        y = len(a[j])+len(a[i])
        if i == j or a[i] == 0 or a[j] == 0:
            break
        elif len(x) < y:
            a[i] = x
            a[j] = [0]
print(a)
print([i for i in a if i != [0]])
result:
[[8, 9, 7], [1, 2, 3, 4, 5, 6, 12, 13]]
The above is just an example; in my actual calculation each sub-list has a length of only 2:
a = [[1,3],[5,6],[3,4],[7,8],[8,9],[12,13]]
I have left out most of the data; here is some simulated data:
import numpy as np

a = np.random.rand(150,150)>0.99
a[np.tril_indices(a.shape[1], -1)] = 0
a[np.diag_indices(a.shape[1])] = 0
a = [list(x) for x in np.c_[np.where(a)]]
consolidate(a)
I think your algorithm is close to optimal, except that the inner loop can be shortened because the intersection operation is symmetric, i.e. if you check that (A, B) intersect, there is no need to check for (B, A).
This way you roughly halve the work: from n² pair checks to about n²/2.
However, I would rewrite the piece of code more cleanly and I would also avoid modifying the input.
Note also that, since sets do not guarantee ordering, it is a good idea to do some sorting before converting back to lists.
Here is my proposed code (EDITED to reduce the number of castings and sortings):
def consolidate(items):
    items = [set(item.copy()) for item in items]
    for i, x in enumerate(items):
        for j, y in enumerate(items[i + 1:]):
            if x & y:
                items[i + j + 1] = x | y
                items[i] = None
    return [sorted(x) for x in items if x]
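For example, on the question's input this returns the two merged groups (the exact order of the groups may differ from the original function's output):

in_list = [[1, 2, 3], [4, 5, 6], [3, 4], [7, 8], [8, 9], [6, 12, 13]]
print(consolidate(in_list))
# [[7, 8, 9], [1, 2, 3, 4, 5, 6, 12, 13]]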
Encapsulating your code in a function, I would get:
def consolidate_orig(a):
    a = [x.copy() for x in a]
    b = len(a)
    for i in range(b):
        for j in range(b):
            x = list(set(a[i]+a[j]))
            y = len(a[j])+len(a[i])
            if i == j or a[i] == 0 or a[j] == 0:
                break
            elif len(x) < y:
                a[i] = x
                a[j] = [0]
    return [i for i in a if i != [0]]
This would allow us to do some clean micro-benchmarking (for completeness I have also included @zipa's merge()):
EDIT:
@zipa's code is not properly encapsulated; here is an equivalent version with proper encapsulation:
def merge(iterable, base=None):
    if base is None:
        base = iterable
    merged = set([tuple(set(i).union(
        *[j for j in base if set(i).intersection(j)])) for i in iterable])
    if merged == iterable:
        return merged
    else:
        return merge(merged, base)
and updated timings:
in_list = [[1,2,3], [4,5,6], [3,4], [7,8], [8,9], [6,12,13]]
%timeit consolidate_orig(in_list)
# 17.9 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit consolidate(in_list)
# 6.15 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit merge(in_list)
# 53.6 µs ± 718 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
in_list = [[1, 3], [5, 6], [3, 4], [7, 8], [8, 9], [12, 13]]
%timeit consolidate_orig(in_list)
# 16.1 µs ± 159 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit consolidate(in_list)
# 5.87 µs ± 71.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit merge(in_list)
# 27 µs ± 701 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Showing that, at least for this input, the proposed solution is consistently faster.
Since it is not too straightforward to generate large meaningful inputs, I'll leave it to you to check that this approach is more efficient than yours for the larger inputs you have in mind.
EDIT
With larger, but probably meaningless inputs, the timings are still favorable for the proposed version:
in_list = [[1,2,3], [4,5,6], [3,4], [7,8], [8,9], [6,12,13]] * 300
%timeit consolidate_orig(in_list)
# 1.04 s ± 14.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit consolidate(in_list)
# 724 ms ± 7.51 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit merge(in_list)
# 1.04 s ± 7.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
in_list = [[1, 3], [5, 6], [3, 4], [7, 8], [8, 9], [12, 13]] * 300
%timeit consolidate_orig(in_list)
# 1.03 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit consolidate(in_list)
# 354 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit merge(in_list)
# 967 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This approach should perform faster on larger nested lists:
def merge(iterable):
    merged = set([tuple(set(i).union(*[j for j in a if set(i).intersection(j)])) for i in iterable])
    if merged == iterable:
        return merged
    else:
        return merge(merged)

merge(a)
#set([(1, 2, 3, 4, 5, 6, 12, 13), (8, 9, 7)])
It recursively combines lists until all the combinations are exhausted.