I am trying to write out a covariance calculation for the following example, and I know there has to be a better way than a for loop. I've looked into np.dot and np.einsum, and I feel like np.einsum has the capability, but I am just missing something in the implementation.
import numpy as np
# this is mx3
a = np.array([[1,2,3],[4,5,6]])
# this is x3
mean = a.mean(axis=0)
# result should be 3x3
b = np.zeros((3,3))
for i in range(a.shape[0]):
    b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
b
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
So this is fine for a two-data-point sample, but for large m it is super slow. There has to be a better way. Any suggestions?
In [108]: a = np.array([[1,2,3],[4,5,6]])
...: # this is x3
...: mean = a.mean(axis=0)
...:
...: # result should be 3x3
...: b = np.zeros((3,3))
...: for i in range(a.shape[0]):
...:     b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
...:
In [109]: b
Out[109]:
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
In [110]: a.mean(axis=0)
Out[110]: array([2.5, 3.5, 4.5])
Since the mean is subtracted in two places, let's define a new variable. Here the 2D and 1D shapes broadcast, so we can simply write:
In [111]: a1= a - a.mean(axis=0)
In [112]: a1
Out[112]:
array([[-1.5, -1.5, -1.5],
[ 1.5, 1.5, 1.5]])
The rest is a normal dot product:
In [113]: a1.T@a1
Out[113]:
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
np.einsum and np.dot can also do this matrix multiplication.
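For reference, here is a minimal sketch of those two equivalents, using the centered array a1 from above:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
a1 = a - a.mean(axis=0)                      # centered data, shape (m, 3)

b_dot = np.dot(a1.T, a1)                     # same as a1.T @ a1
b_einsum = np.einsum('ij,ik->jk', a1, a1)    # result[j,k] = sum_i a1[i,j] * a1[i,k]

# both give the 3x3 matrix of summed outer products from the loop version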
I have a NumPy array([1.0, 2.0, 3.0]), which is actually a mesh in one dimension in my problem. What I want to do is refine the mesh to get this: array([0.8, 0.9, 1, 1.1, 1.2, 1.8, 1.9, 2, 2.1, 2.2, 2.8, 2.9, 3, 3.1, 3.2]).
The actual array is very large and this procedure costs a lot of time. How can I do this quickly (maybe vectorized) in Python?
Here's a vectorized approach -
(a[:,None] + np.arange(-0.2,0.3,0.1)).ravel() # a is input array
Sample run -
In [15]: a = np.array([1.0, 2.0, 3.0]) # Input array
In [16]: (a[:,None] + np.arange(-0.2,0.3,0.1)).ravel()
Out[16]:
array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
2.9, 3. , 3.1, 3.2])
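One caveat worth adding (not part of the original answer): np.arange with a float step can gain or lose an element due to rounding, so a variant that fixes the number of offsets explicitly with np.linspace may be more robust:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
offsets = np.linspace(-0.2, 0.2, 5)       # exactly 5 offsets, no float-step surprises
refined = (a[:, None] + offsets).ravel()
# array([0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8, 2.9, 3. , 3.1, 3.2])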
Here are a few options (Python 3):
Option 1:
np.array([j for i in arr for j in np.arange(i - 0.2, i + 0.25, 0.1)])
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Option 2:
np.array([j for x, y in zip(arr - 0.2, arr + 0.25) for j in np.arange(x,y,0.1)])
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Option 3:
np.array([arr + i for i in np.arange(-0.2, 0.25, 0.1)]).T.ravel()
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Timing on a larger array:
arr = np.arange(100000)
arr
# array([ 0, 1, 2, ..., 99997, 99998, 99999])
%timeit np.array([j for i in arr for j in np.arange(i-0.2, i+0.25, 0.1)])
# 1 loop, best of 3: 615 ms per loop
%timeit np.array([j for x, y in zip(arr - 0.2, arr + 0.25) for j in np.arange(x,y,0.1)])
# 1 loop, best of 3: 250 ms per loop
%timeit np.array([arr + i for i in np.arange(-0.2, 0.25, 0.1)]).T.ravel()
# 100 loops, best of 3: 1.93 ms per loop
I have the following array:
X
array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
For each value of X I know its recording time T. I want to find the indices where two consecutive values change sign (positive to negative or vice versa) and take the time difference across each such pair. In the end I would like an array like
Y = array([ T(1)-T(0), T(2)-T(1), T(5)-T(4), T(7)-T(6)])
Perhaps iterating over the array in a list comprehension would work for you:
In [35]: x=np.array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
In [36]: y=np.array([b-a for a,b in zip(x, x[1:]) if (a<0) != (b<0)])
In [37]: y
Out[37]: array([ -6.5, 8.4, -22.7, 5.6])
Edit
I apparently didn't understand the question completely. Try this instead:
In [38]: X=np.array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
In [39]: T=np.array([ 0, 0.1, 2, 3.5, 5, 22, 25, 50])
In [40]: y=np.array([t1-t0 for x0,x1,t0,t1 in zip(X, X[1:], T, T[1:]) if (x0<0) != (x1<0)])
In [41]: y
Out[41]: array([ 0.1, 1.9, 17. , 25. ])
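For very long arrays, a fully vectorized variant is also possible; this is a sketch of the same idea using a boolean sign-change mask instead of the list comprehension above:

import numpy as np

X = np.array([3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
T = np.array([0, 0.1, 2, 3.5, 5, 22, 25, 50])

# True wherever consecutive elements differ in sign
sign_change = (X[:-1] < 0) != (X[1:] < 0)
y = np.diff(T)[sign_change]
# array([ 0.1,  1.9, 17. , 25. ])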
I have an array A (variable) of the form:
A = [1, 3, 7, 9, 15, 20, 24]
Now I want to create 10 (variable) equally spaced values in between values of array A so that I get array B of the form:
B = [1, 1.2, 1.4, ... 2.8, 3, 3.4, 3.8, ... , 6.6, 7, 7.2, ..., 23.6, 24]
In essence B should always have the values of A and equally spaced values in between values of A.
I did solve this by using the code:
import numpy as np
A = np.array([1, 3, 7, 9, 15, 20, 24])
B = []
for i in range(len(A) - 1):
    B = np.append(B, np.linspace(A[i], A[i + 1], 11))
print (B)
But does NumPy already have a function for this, or is there another, better method to create such an array?
Alternative method using interpolation instead of concatenation:
n = 10
x = np.arange(0, n * len(A), n) # 0, 10, .., 50, 60
xx = np.arange((len(A) - 1) * n + 1) # 0, 1, .., 59, 60
B = np.interp(xx, x, A)
Result:
In [31]: B
Out[31]:
array([ 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6,
2.8, 3. , 3.4, 3.8, 4.2, 4.6, 5. , 5.4, 5.8,
6.2, 6.6, 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2,
8.4, 8.6, 8.8, 9. , 9.6, 10.2, 10.8, 11.4, 12. ,
12.6, 13.2, 13.8, 14.4, 15. , 15.5, 16. , 16.5, 17. ,
17.5, 18. , 18.5, 19. , 19.5, 20. , 20.4, 20.8, 21.2,
21.6, 22. , 22.4, 22.8, 23.2, 23.6, 24. ])
This should be faster than the other solutions, since it does not use a Python for-loop, and does not do the many calls to linspace. Quick timing comparison:
In [58]: timeit np.interp(np.arange((len(A) - 1) * 10 + 1), np.arange(0, 10*len(A), 10), A)
100000 loops, best of 3: 10.3 µs per loop
In [59]: timeit np.append(np.concatenate([np.linspace(i, j, 10, False) for i, j in zip(A, A[1:])]), A[-1])
10000 loops, best of 3: 94.2 µs per loop
In [60]: timeit np.unique(np.hstack(np.linspace(a, b, 10 + 1) for a, b in zip(A[:-1], A[1:])))
10000 loops, best of 3: 140 µs per loop
You can use the zip function within a list comprehension together with np.concatenate. But since you want the last element too, you can append it with np.append:
>>> np.append(np.concatenate([np.linspace(i, j, 10, False) for i,j in zip(A,A[1:])]),A[-1])
array([ 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6,
2.8, 3. , 3.4, 3.8, 4.2, 4.6, 5. , 5.4, 5.8,
6.2, 6.6, 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2,
8.4, 8.6, 8.8, 9. , 9.6, 10.2, 10.8, 11.4, 12. ,
12.6, 13.2, 13.8, 14.4, 15. , 15.5, 16. , 16.5, 17. ,
17.5, 18. , 18.5, 19. , 19.5, 20. , 20.4, 20.8, 21.2,
21.6, 22. , 22.4, 22.8, 23.2, 23.6, 24. ])
Also you can use retstep=True to return (samples, step), where step is the spacing between samples.
>>> np.concatenate([np.linspace(i, j, 10, False,retstep=True) for i,j in zip(A,A[1:])])
array([array([ 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8]),
0.2,
array([ 3. , 3.4, 3.8, 4.2, 4.6, 5. , 5.4, 5.8, 6.2, 6.6]),
0.4,
array([ 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2, 8.4, 8.6, 8.8]),
0.2,
array([ 9. , 9.6, 10.2, 10.8, 11.4, 12. , 12.6, 13.2, 13.8, 14.4]),
0.6,
array([ 15. , 15.5, 16. , 16.5, 17. , 17.5, 18. , 18.5, 19. , 19.5]),
0.5,
array([ 20. , 20.4, 20.8, 21.2, 21.6, 22. , 22.4, 22.8, 23.2, 23.6]),
0.4], dtype=object)
Basically a slightly condensed version of your original approach:
print np.hstack(np.linspace(a, b, 10, endpoint=False) for a, b in zip(A[:-1], A[1:]))
Output:
[ 1. 1.2 1.4 1.6 1.8 2. 2.2 2.4 2.6 2.8 3. 3.4
3.8 4.2 4.6 5. 5.4 5.8 6.2 6.6 7. 7.2 7.4 7.6
7.8 8. 8.2 8.4 8.6 8.8 9. 9.6 10.2 10.8 11.4 12.
12.6 13.2 13.8 14.4 15. 15.5 16. 16.5 17. 17.5 18. 18.5
19. 19.5 20. 20.4 20.8 21.2 21.6 22. 22.4 22.8 23.2 23.6]
The endpoint parameter controls whether you have 9 or 10 equally spaced values in between two original values.
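A quick illustration of that parameter (my own example, using the first interval of A):

import numpy as np

np.linspace(1, 3, 10, endpoint=False)   # 10 values, 3 excluded: 1. , 1.2, ..., 2.8
np.linspace(1, 3, 10 + 1)               # 11 values, 3 included: 1. , 1.2, ..., 3.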
Edit
Since you want the 24 at the very end, you can either append it like Kasra does or -- to bring up some variation ;) -- forget the endpoint argument and generate 10 + 1 values from a to b. This will include the 24 automatically (since endpoint is True by default).
(Update: As Bas Swinckels indicates, you need to wrap it with unique now...)
print np.unique(np.hstack(np.linspace(a, b, 10 + 1) for a, b in zip(A[:-1], A[1:])))
[ 1. 1.2 1.4 1.6 1.8 2. 2.2 2.4 2.6 2.8 3.
3.4 3.8 4.2 4.6 5. 5.4 5.8 6.2 6.6 7. 7.2
7.4 7.6 7.8 8. 8.2 8.4 8.6 8.8 9. 9.6 10.2
10.8 11.4 12. 12.6 13.2 13.8 14.4 15. 15.5 16. 16.5
17. 17.5 18. 18.5 19. 19.5 20. 20.4 20.8 21.2 21.6
22. 22.4 22.8 23.2 23.6 24. ]
Solution Code
This solution suggests a vectorized approach using broadcasting and elementwise multiplication.
The basic steps are:
Divide the unit interval [0, 1) (excluding 1) into steps equally spaced values.
Then, multiply each of those values by the successive differences of A (np.diff(A)) to get a 2D array of interpolation offsets.
Finally, add the elements of A to get the actual interpolated values.
Here's the implementation -
out2D = (np.diff(A)[:,None]*np.arange(steps)/steps) + A[:-1,None]
out = np.append(out2D,A[-1])
Benchmarking
The proposed approach seems to be faster than the interpolation-based approach suggested in the other solution for medium to large input arrays, since we are working with a regular pattern of interpolated values. Here are some runtime tests to confirm that -
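The timed functions interp_based and broadcasting_based are not defined in the post; presumably they wrap the two approaches roughly like this (a sketch under that assumption):

import numpy as np

def interp_based(A, steps):
    # interpolation approach from the earlier answer, generalized to `steps`
    x = np.arange(0, steps * len(A), steps)
    xx = np.arange((len(A) - 1) * steps + 1)
    return np.interp(xx, x, A)

def broadcasting_based(A, steps):
    # broadcasting approach from this answer
    out2D = (np.diff(A)[:, None] * np.arange(steps) / steps) + A[:-1, None]
    return np.append(out2D, A[-1])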
Case #1: A of length 100 and steps = 10
In [42]: A = np.sort(np.random.randint(1,100000,(1,100))).ravel()
In [43]: steps = 10
In [44]: %timeit interp_based(A,steps)
100000 loops, best of 3: 18.3 µs per loop
In [45]: %timeit broadcasting_based(A,steps)
100000 loops, best of 3: 19.7 µs per loop
Case #2: A of length 500 and steps = 10
In [46]: A = np.sort(np.random.randint(1,100000,(1,500))).ravel()
In [47]: steps = 10
In [48]: %timeit interp_based(A,steps)
10000 loops, best of 3: 101 µs per loop
In [49]: %timeit broadcasting_based(A,steps)
10000 loops, best of 3: 48.8 µs per loop
Case #3: A of length 1000 and steps = 20
In [50]: A = np.sort(np.random.randint(1,100000,(1,1000))).ravel()
In [51]: steps = 20
In [52]: %timeit interp_based(A,steps)
1000 loops, best of 3: 345 µs per loop
In [53]: %timeit broadcasting_based(A,steps)
10000 loops, best of 3: 139 µs per loop
I have the following matrices:
m1:
1 2 3
4 5 6
7 8 9
m2:
2 3 4
5 6 7
8 9 10
I want to average the two to get:
1.5 2.5 3.5
4.5 5.5 6.5
7.5 8.5 9.5
What is the best way of doing this?
Thanks
List comprehensions and the zip function are your friends:
>>> from __future__ import division
>>> m1 = [[1,2,3], [4,5,6], [7,8,9]]
>>> m2 = [[2,3,4], [5,6,7], [8,9,10]]
>>> [[(x+y)/2 for x,y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
[[1.5, 2.5, 3.5], [4.5, 5.5, 6.5], [7.5, 8.5, 9.5]]
Of course, the numpy package makes these kind of computations trivially easy:
>>> from numpy import array
>>> m1 = array([[1,2,3], [4,5,6], [7,8,9]])
>>> m2 = array([[2,3,4], [5,6,7], [8,9,10]])
>>> (m1 + m2) / 2
array([[ 1.5, 2.5, 3.5],
[ 4.5, 5.5, 6.5],
[ 7.5, 8.5, 9.5]])
The obvious answer would be:
m1 = np.arange(1,10,dtype=np.double).reshape((3,3))
m2 = 1. + m1
m_average = 0.5 * (m1 + m2)
print m_average
array([[ 1.5, 2.5, 3.5],
[ 4.5, 5.5, 6.5],
[ 7.5, 8.5, 9.5]])
Perhaps a more elegant way (although probably a bit slower) to do it would be to use the numpy.mean function on a stacked version of the two arrays:
m_average = np.dstack([m1,m2]).mean(axis=2)
print m_average
array([[ 1.5, 2.5, 3.5],
[ 4.5, 5.5, 6.5],
[ 7.5, 8.5, 9.5]])
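A closely related variant (not mentioned in the answer above) passes the two arrays as a sequence and averages along the new leading axis, letting NumPy do the stacking:

import numpy as np

m1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
m2 = np.array([[2, 3, 4], [5, 6, 7], [8, 9, 10]])

m_average = np.mean([m1, m2], axis=0)
# array([[1.5, 2.5, 3.5],
#        [4.5, 5.5, 6.5],
#        [7.5, 8.5, 9.5]])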