I am trying to write out a covariance calculation for the following example, and I know there has to be a better way than a for loop. I've looked into np.dot and np.einsum, and I feel like np.einsum has the capability, but I am just missing something in the implementation.
import numpy as np
# this is mx3
a = np.array([[1,2,3],[4,5,6]])
# this is 1x3
mean = a.mean(axis=0)
# result should be 3x3
b = np.zeros((3,3))
for i in range(a.shape[0]):
    b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
b
array([[4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5]])
So this is fine for a two-data-point sample, but for large m it is super slow. There has to be a better way. Any suggestions?
In [108]: a = np.array([[1,2,3],[4,5,6]])
...: # this is 1x3
...: mean = a.mean(axis=0)
...:
...: # result should be 3x3
...: b = np.zeros((3,3))
...: for i in range(a.shape[0]):
...:     b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
...:
In [109]: b
Out[109]:
array([[4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5]])
In [110]: a.mean(axis=0)
Out[110]: array([2.5, 3.5, 4.5])
Since the mean is subtracted twice, let's define a new variable. The 2-d and 1-d arrays broadcast, so we can simply write:
In [111]: a1= a - a.mean(axis=0)
In [112]: a1
Out[112]:
array([[-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5]])
The rest is a normal dot product:
In [113]: a1.T @ a1
Out[113]:
array([[4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5],
       [4.5, 4.5, 4.5]])
np.einsum and np.dot can also do this matrix multiplication.
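For reference, here is a minimal sketch of the equivalent np.dot and np.einsum calls (the einsum subscripts sum over the sample axis i); np.cov(a, rowvar=False) computes the same matrix normalized by m-1 if a built-in is preferred:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
a1 = a - a.mean(axis=0)              # centered data, shape (m, 3)
b1 = a1.T @ a1                       # matrix product
b2 = np.dot(a1.T, a1)                # same result with np.dot
b3 = np.einsum('ij,ik->jk', a1, a1)  # explicit sum over the sample axis i
print(np.allclose(b1, b2), np.allclose(b1, b3))
# True True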
I am having some difficulties achieving the following. Let's say I have two sets of data obtained from a test:
import numpy as np
a = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]).T
b = np.array([[0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]]).T
where the data in the 0th column represents (in my case) displacement and the data in the 1st column represents the respective measured force values.
(Given data represents two lines with slopes of 2 and 1, both with a y-intercept of 0.)
Now I am trying to program a script that averages those two arrays despite the mismatched x-values, such that it will yield
c = np.array([[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
              [0.0, 0.75, 1.5, 2.25, 3.0, 3.75, 4.5, 5.25]]).T
(A line with a slope of 1.5 and a y-intercept of 0.)
I tried my best using slicing and linear interpolation; however, it seems like I cannot get my head around it (I am a beginner).
I'd be very glad for any input and tips and hope the information I gave to you is sufficient!
Thanks in advance,
Robert
You can get the coefficients (slope and intercept) of each dataset, take their mean, and evaluate the averaged coefficients on a new array of x-values.
Step by Step:
Fit a degree-1 polynomial to each array a and b using polyfit to get the coefficients (slope and intercept) of each:
coef_a = np.polyfit(a[:,0], a[:,1], deg=1)
coef_b = np.polyfit(b[:,0], b[:,1], deg=1)
>>> coef_a
array([ 2.00000000e+00, 2.22044605e-16])
>>> coef_b
array([ 1.00000000e+00, 1.33226763e-15])
Get the mean of those coefficients to use as the coefficients of c:
coef_c = np.mean(np.stack([coef_a,coef_b]), axis=0)
>>> coef_c
array([ 1.50000000e+00, 7.77156117e-16])
Create new x-values for c using np.arange:
c_x = np.arange(0,4,0.5)
>>> c_x
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5])
Use polyval to evaluate your new c coefficients at your new x-values:
c_y = np.polyval(coef_c, c_x)
>>> c_y
array([  7.77156117e-16,   7.50000000e-01,   1.50000000e+00,
         2.25000000e+00,   3.00000000e+00,   3.75000000e+00,
         4.50000000e+00,   5.25000000e+00])
Put your c_x and c_y values together using stack:
c = np.stack([c_x, c_y])
>>> c
array([[  0.00000000e+00,   5.00000000e-01,   1.00000000e+00,
          1.50000000e+00,   2.00000000e+00,   2.50000000e+00,
          3.00000000e+00,   3.50000000e+00],
       [  7.77156117e-16,   7.50000000e-01,   1.50000000e+00,
          2.25000000e+00,   3.00000000e+00,   3.75000000e+00,
          4.50000000e+00,   5.25000000e+00]])
If you round that to 2 decimals, you'll see it's the same as your desired outcome:
>>> np.round(c, 2)
array([[ 0.  ,  0.5 ,  1.  ,  1.5 ,  2.  ,  2.5 ,  3.  ,  3.5 ],
       [ 0.  ,  0.75,  1.5 ,  2.25,  3.  ,  3.75,  4.5 ,  5.25]])
In a single statement:
c = np.stack([np.arange(0, 4, 0.5),
              np.polyval(np.mean(np.stack([np.polyfit(a.T[0], a.T[1], 1),
                                           np.polyfit(b.T[0], b.T[1], 1)]),
                                 axis=0),
                         np.arange(0, 4, 0.5))])
>>> c
array([[  0.00000000e+00,   5.00000000e-01,   1.00000000e+00,
          1.50000000e+00,   2.00000000e+00,   2.50000000e+00,
          3.00000000e+00,   3.50000000e+00],
       [  7.77156117e-16,   7.50000000e-01,   1.50000000e+00,
          2.25000000e+00,   3.00000000e+00,   3.75000000e+00,
          4.50000000e+00,   5.25000000e+00]])
Hello :) I am a Python beginner and I started working with numpy lately. Basically I have an nd-array with data.shape = (55000, 784), filled with float32 values. Based on a condition I made, I want to append specific rows, with all of their columns, to a new array; it's important that the formatting stays the same. E.g. I want data[5][0:784] appended to an empty array. I heard about something called fancy indexing but still couldn't figure out how to use it; an example would help me out big time. I would appreciate every help from you guys! - Greets
I'd recommend skimming through the documentation for Indexing. But, here is an example to demonstrate.
import numpy as np
data = np.array([[0, 1, 2], [3, 4, 5]])
print(data.shape)
(2, 3)
print(data)
[[0 1 2]
[3 4 5]]
selection = data[1, 1:3]
print(selection)
[4 5]
Fancy indexing is an advanced indexing feature which allows indexing using integer arrays. Here is an example.
fancy_selection = data[[0, 1], [0, 2]]
print(fancy_selection)
[0 5]
Since you also asked about appending, have a look at Append a NumPy array to a NumPy array. Here is an example anyway.
data_two = np.array([[6, 7, 8]])
appended_array = np.concatenate((data, data_two))
print(appended_array)
[[0 1 2]
[3 4 5]
[6 7 8]]
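Since the original question is about pulling out whole rows that satisfy a condition, here is a small sketch of that pattern; the shape and the threshold condition below are made up for illustration, not taken from the question. A boolean mask over the rows keeps the 2-d formatting and avoids any explicit appending.
import numpy as np
# stand-in for the (55000, 784) float32 data; shape and condition are illustrative
data = np.random.rand(6, 4).astype(np.float32)
# boolean mask with one True/False per row, e.g. rows whose first column exceeds 0.5
row_mask = data[:, 0] > 0.5
# selects whole rows at once; result keeps the 2-d (n_selected, 4) layout
selected = data[row_mask]
# equivalent fancy indexing with the integer row positions
selected_too = data[np.where(row_mask)[0]]
print(np.array_equal(selected, selected_too))
# True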
As @hpaulj recommends in his comment, appending to arrays is possible but inefficient and should be avoided. Let's turn to your example but make the numbers a bit smaller.
a = np.add(*np.ogrid[1:5, 0.1:0.39:0.1])  # broadcast a column of ints against a row of decimals
a
# array([[ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3],
# [ 3.1, 3.2, 3.3],
# [ 4.1, 4.2, 4.3]])
a.shape
# (4, 3)
Selecting an element:
a[1,2]
# 2.3
Selecting an entire row:
a[2, :] # or a[2] or a[2, ...]
# array([ 3.1, 3.2, 3.3])
or column:
a[:, 1] # or a[..., 1]
# array([ 1.2, 2.2, 3.2, 4.2])
Fancy indexing; observe that the first index is not a slice but a list or array:
a[[3,0,0,1], :] # or a[[3,0,0,1]]
# array([[ 4.1, 4.2, 4.3],
# [ 1.1, 1.2, 1.3],
# [ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3]])
Fancy indexing can be used on multiple axes to select arbitrary elements and assemble them into a new shape. For example, you could make a 2x2x2 array like so:
a[ [[[0,1], [1,2]], [[3,3], [3,2]]], [[[2,1], [1,1]], [[2,1], [0,0]]] ]
# array([[[ 1.3, 2.2],
# [ 2.2, 3.2]],
#
# [[ 4.3, 4.2],
# [ 4.1, 3.1]]])
There is also logical (boolean) indexing:
mask = np.isclose(a%1.1, 1.0)
mask
# array([[False, False, False],
# [ True, False, False],
# [False, True, False],
# [False, False, True]], dtype=bool)
a[mask]
# array([ 2.1, 3.2, 4.3])
To combine arrays, collect them in a list and use concatenate:
np.concatenate([a[1:, :2], a[:0:-1, [2,0]]], axis=1)
# array([[ 2.1, 2.2, 4.3, 4.1],
# [ 3.1, 3.2, 3.3, 3.1],
# [ 4.1, 4.2, 2.3, 2.1]])
Hope that helps getting you started.
I have the following array:
X
array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
For each value of X I know its recording time T. I want to find the time differences between two consecutive values whose sign flips from positive to negative or vice versa. Concluding, I would like an array like
Y = array([ T(1)-T(0), T(2)-T(1), T(5)-T(4), T(7)-T(6)])
Perhaps iterating over the array in a list comprehension would work for you:
In [35]: x=np.array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
In [36]: y=np.array([b-a for a,b in zip(x, x[1:]) if (a<0) != (b<0)])
In [37]: y
Out[37]: array([ -6.5, 8.4, -22.7, 5.6])
Edit
I apparently didn't understand the question completely. Try this instead:
In [38]: X=np.array([ 3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
In [39]: T=np.array([ 0, 0.1, 2, 3.5, 5, 22, 25, 50])
In [40]: y=np.array([t1-t0 for x0,x1,t0,t1 in zip(X, X[1:], T, T[1:]) if (x0<0) != (x1<0)])
In [41]: y
Out[41]: array([ 0.1, 1.9, 17. , 25. ])
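For larger arrays, a vectorized variant of the same idea is possible; this is a sketch, not part of the original answer, comparing the signs of neighbouring elements and differencing T at those positions:
import numpy as np
X = np.array([3.5, -3, 5.4, 3.7, 14.9, -7.8, -3.5, 2.1])
T = np.array([0, 0.1, 2, 3.5, 5, 22, 25, 50])
# True wherever consecutive elements have opposite signs
sign_change = (X[:-1] < 0) != (X[1:] < 0)
# time differences across each sign change
y = T[1:][sign_change] - T[:-1][sign_change]
print(y)
# [ 0.1  1.9 17.  25. ]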
What is the most efficient way to remove negative elements from an array? I have tried numpy.delete, the approach from Remove all specific value from array, and code of the form x[x != i].
For:
import numpy as np
x = np.array([-2, -1.4, -1.1, 0, 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10, 14, 16.2])
I want to end up with an array:
[0, 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10, 14, 16.2]
In [2]: x[x >= 0]
Out[2]: array([ 0. , 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10. , 14. , 16.2])
If performance is important, you could take advantage of the fact that your np.array is sorted and use numpy.searchsorted
For example:
In [8]: x[np.searchsorted(x, 0) :]
Out[8]: array([ 0. , 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10. , 14. , 16.2])
In [9]: %timeit x[np.searchsorted(x, 0) :]
1000000 loops, best of 3: 1.47 us per loop
In [10]: %timeit x[x >= 0]
100000 loops, best of 3: 4.5 us per loop
The difference in performance will increase as the size of the array increases because np.searchsorted does a binary search that is O(log n) vs. O(n) linear search that x >= 0 is doing.
In [11]: x = np.arange(-1000, 1000)
In [12]: %timeit x[np.searchsorted(x, 0) :]
1000000 loops, best of 3: 1.61 us per loop
In [13]: %timeit x[x >= 0]
100000 loops, best of 3: 9.87 us per loop
In numpy:
b = array[array>=0]
Example:
>>> import numpy as np
>>> arr = np.array([-2, -1.4, -1.1, 0, 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10, 14, 16.2])
>>> arr = arr[arr>=0]
>>> arr
array([ 0. , 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10. , 14. , 16.2])
There's probably a cool way to do this in numpy, because numpy is magic to me, but:
x = np.array( [ num for num in x if num >= 0 ] )
I have the following matrices:
m1:
1 2 3
4 5 6
7 8 9
m2:
2 3 4
5 6 7
8 9 10
I want to average the two to get:
1.5 2.5 3.5
4.5 5.5 6.5
7.5 8.5 9.5
What is the best way of doing this?
Thanks
List comprehensions and the zip function are your friends:
>>> from __future__ import division
>>> m1 = [[1,2,3], [4,5,6], [7,8,9]]
>>> m2 = [[2,3,4], [5,6,7], [8,9,10]]
>>> [[(x+y)/2 for x,y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]
[[1.5, 2.5, 3.5], [4.5, 5.5, 6.5], [7.5, 8.5, 9.5]]
Of course, the numpy package makes this kind of computation trivially easy:
>>> from numpy import array
>>> m1 = array([[1,2,3], [4,5,6], [7,8,9]])
>>> m2 = array([[2,3,4], [5,6,7], [8,9,10]])
>>> (m1 + m2) / 2
array([[ 1.5,  2.5,  3.5],
       [ 4.5,  5.5,  6.5],
       [ 7.5,  8.5,  9.5]])
The obvious answer would be:
m1 = np.arange(1,10,dtype=np.double).reshape((3,3))
m2 = 1. + m1
m_average = 0.5 * (m1 + m2)
m_average
array([[ 1.5,  2.5,  3.5],
       [ 4.5,  5.5,  6.5],
       [ 7.5,  8.5,  9.5]])
Perhaps a more elegant way (although probably a bit slower) to do it would be to use the numpy.mean function on a stacked version of the two arrays:
m_average = np.dstack([m1,m2]).mean(axis=2)
m_average
array([[ 1.5,  2.5,  3.5],
       [ 4.5,  5.5,  6.5],
       [ 7.5,  8.5,  9.5]])
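If there are more than two matrices, the same stacking idea generalizes; here is a small sketch where m3 is just an illustrative third matrix, not part of the original question:
import numpy as np
m1 = np.arange(1, 10, dtype=np.double).reshape((3, 3))
m2 = m1 + 1.0
m3 = m1 + 2.0  # illustrative third matrix
# stack along a new first axis and average over it
m_average = np.mean(np.stack([m1, m2, m3]), axis=0)
m_average
# array([[ 2.,  3.,  4.],
#        [ 5.,  6.,  7.],
#        [ 8.,  9., 10.]])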