python arrays: averaging slope and intercept of datasets

I am having some difficulties achieving the following. Let's say I have two sets of data obtained from a test:
import numpy as np
a = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]).T
b = np.array([[0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]]).T
where the data in the 0th column represents (in my case) displacement and the data in the 1st column represents the corresponding measured force values.
(Given data represents two lines with slopes of 2 and 1, both with a y-intercept of 0.)
Now I am trying to program a script that averages those two arrays despite the mismatched x-values, such that it will yield
c = np.array([[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [0.0, 0.75, 1.5, 2.25, 3.0, 3.75, 4.5, 5.25]]).T
(A line with a slope of 1.5 and a y-intercept of 0.)
I tried my best using slicing and linear interpolation, however it seems like I cannot get my head around it (I am a beginner).
I'd be very glad for any input and tips and hope the information I gave to you is sufficient!
Thanks in advance,
Robert

You can get the coefficients (slope and intercept) of each dataset, obtain the mean, and fit that data to a new array of x values.
Step by Step:
Fit a degree-1 polynomial to each of the arrays a and b using polyfit to get the coefficients of each (slope and intercept):
coef_a = np.polyfit(a[:,0], a[:,1], deg=1)
coef_b = np.polyfit(b[:,0], b[:,1], deg=1)
>>> coef_a
array([ 2.00000000e+00, 2.22044605e-16])
>>> coef_b
array([ 1.00000000e+00, 1.33226763e-15])
Get the mean of those coefficients to use as the coefficients of c:
coef_c = np.mean(np.stack([coef_a,coef_b]), axis=0)
>>> coef_c
array([ 1.50000000e+00, 7.77156117e-16])
Create new x-values for c using np.arange
c_x = np.arange(0,4,0.5)
>>> c_x
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5])
Use polyval to evaluate your new c coefficients at your new x values:
c_y = np.polyval(coef_c, c_x)
>>> c_y
array([ 7.77156117e-16, 7.50000000e-01, 1.50000000e+00,
2.25000000e+00, 3.00000000e+00, 3.75000000e+00,
4.50000000e+00, 5.25000000e+00])
Put your c_x and c_y values together using stack:
c = np.stack([c_x, c_y])
>>> c
array([[ 0.00000000e+00, 5.00000000e-01, 1.00000000e+00,
1.50000000e+00, 2.00000000e+00, 2.50000000e+00,
3.00000000e+00, 3.50000000e+00],
[ 7.77156117e-16, 7.50000000e-01, 1.50000000e+00,
2.25000000e+00, 3.00000000e+00, 3.75000000e+00,
4.50000000e+00, 5.25000000e+00]])
If you round that to 2 decimals, you'll see it matches your desired outcome (use c.T if you want the two-column layout):
>>> np.round(c, 2)
array([[ 0. , 0.5 , 1. , 1.5 , 2. , 2.5 , 3. , 3.5 ],
[ 0. , 0.75, 1.5 , 2.25, 3. , 3.75, 4.5 , 5.25]])
In a single statement:
c = np.stack([np.arange(0, 4, 0.5),
np.polyval(np.mean(np.stack([np.polyfit(a.T[0], a.T[1], 1),
np.polyfit(b.T[0], b.T[1], 1)]),
axis=0),
np.arange(0, 4, 0.5))])
>>> c
array([[ 0.00000000e+00, 5.00000000e-01, 1.00000000e+00,
1.50000000e+00, 2.00000000e+00, 2.50000000e+00,
3.00000000e+00, 3.50000000e+00],
[ 7.77156117e-16, 7.50000000e-01, 1.50000000e+00,
2.25000000e+00, 3.00000000e+00, 3.75000000e+00,
4.50000000e+00, 5.25000000e+00]])
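The steps above can also be wrapped in a small reusable function. This is only a sketch under the same assumption as the answer (each dataset is well described by a straight line); the name average_linear_fit is made up for illustration and is not a NumPy function.

```python
import numpy as np

def average_linear_fit(datasets, x_new):
    """Average the straight-line fits of several 2-column (x, y) datasets.

    Hypothetical helper: fits a line to each dataset, averages the
    [slope, intercept] pairs, and evaluates the mean line at x_new.
    """
    # One [slope, intercept] row per dataset
    coefs = np.array([np.polyfit(d[:, 0], d[:, 1], deg=1) for d in datasets])
    mean_coef = coefs.mean(axis=0)
    y_new = np.polyval(mean_coef, x_new)
    # Two columns (x, y), matching the layout of a and b in the question
    return np.column_stack([x_new, y_new])

a = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]).T
b = np.array([[0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]]).T
c = average_linear_fit([a, b], np.arange(0, 4, 0.5))
```

Because the coefficients are averaged rather than the raw points, the mismatched x-values of a and b do not matter here.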

Related

Python numpy matrix multiplication mismatch in core dimension

I am trying to matrix multiply a 2x2 matrix with a 2x1 matrix. Both matrices have entries which are linspaces such that the resulting 2x1 matrix gives me a value for each value of the linspace.
I get this dimensionality error however.
matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)
For readability I am not posting the whole code but what's necessary.
I have also replaced linspace values with indicative text.
Matrix "L" is a result of other 2x2 multiplications which contain constants, thus no errors there.
The matrix B (2x2) gives the desired result, so the problem comes down to the multiplication between B and C.
import numpy as np
from sympy import *
# Defining range of values
z = np.linspace(initial, final, 10)
g = np.linspace(initial, final, 10)
y = np.linspace(initial, final, 10)
# Matrix operations
A = np.array([[1, z], [0, 1]], dtype=object)
B = np.matmul(L,A)
C = np.array([[y],[g]])
D = np.matmul(B, C)
print(total)
Another way to describe what I am trying to do: the matrix "B" is multiplied by the 2x1 matrix "C", which contains the unknowns, in order to calculate those unknowns "y" and "g".
Many thanks,
P.S. For an array "C" with single-value entries, the multiplication runs as expected.
Edit: As per mozway's suggestion, I am providing the prints of arrays "A" and "M", which should make things clearer; let M = B.

In [66]: initial, final = 0,1
In [67]: z = np.linspace(initial,final,11)
In [68]: z
Out[68]: array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
A is (2,2), but contains a mix of array and scalars
In [69]: A = np.array([[1,z],[0,1]], object)
In [70]: A
Out[70]:
array([[1,
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])],
[0, 1]], dtype=object)
In [71]: A.shape
Out[71]: (2, 2)
Now make a (2,2) numeric array:
In [72]: L = np.eye(2)
In [75]: L[1,1] = 2
In [76]: np.matmul(L,A)
Out[76]:
array([[1.0,
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])],
[0.0, array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])]],
dtype=object)
matmul does work with object dtype arrays, provided the elements implement the necessary + and *. The result is still (2,2), but the (1,1) element is now an array (0*z + 2*1, which broadcasts to the shape of z).
Now for the C:
In [77]: C = np.array([[z],[z]])
In [78]: C
Out[78]:
array([[[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]],
[[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]]])
In [79]: C.shape
Out[79]: (2, 1, 11)
This is float dtype, 3d array.
In [81]: B=Out[76]
In [82]: np.matmul(B,C)
Traceback (most recent call last):
File "<ipython-input-82-5eababb7341e>", line 1, in <module>
np.matmul(B,C)
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1 is different from 2)
In [83]: B.shape
Out[83]: (2, 2)
In [84]: C.shape
Out[84]: (2, 1, 11)
There's a mismatch in shapes. But change C definition so it is a 2d array:
In [85]: C = np.array([z,z])
In [86]: C.shape
Out[86]: (2, 11)
In [87]: np.matmul(B,C)
Out[87]:
array([[array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
array([0.1 , 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 ]),
...
array([1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8]),
array([2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])]],
dtype=object)
In [88]: _.shape
Out[88]: (2, 11)
Here the (2,2) B matmuls with (2,11) just fine producing (2,11). But each element is itself a (11,) array - because of the z used in defining A.
But you say you want a (2,1) C. To get that we have to use:
In [91]: C = np.empty((2,1), object)
In [93]: C[:,0]=[z,z]
In [94]: C
Out[94]:
array([[array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])],
[array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])]],
dtype=object)
Be very careful when trying to create object dtype arrays. Things might not be what you expect.
Now matmul of (2,2) with (2,1) => (2,1), object dtype
In [95]: D = np.matmul(B,C)
In [96]: D.shape
Out[96]: (2, 1)
In [99]: D
Out[99]:
array([[array([0. , 0.11, 0.24, 0.39, 0.56, 0.75, 0.96, 1.19, 1.44, 1.71, 2. ])],
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ])]],
dtype=object)
Keep in mind that matmul is very fast when working with numeric dtype arrays. It does work with object dtype arrays, but it is much slower, more like using list comprehensions.
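If the goal is one 2x2 by 2x1 product per value of z, a numeric alternative is to stack the matrices along a leading axis and let matmul broadcast over it. The following is a sketch, not the asker's actual code: it reuses the z and the L from the session above, and the (n,2,2) / (n,2,1) layout is my assumption about what is wanted.

```python
import numpy as np

z = np.linspace(0, 1, 11)
n = z.size

# One 2x2 matrix [[1, z_k], [0, 1]] per value z_k, stored as shape (n, 2, 2)
A = np.zeros((n, 2, 2))
A[:, 0, 0] = 1.0
A[:, 0, 1] = z
A[:, 1, 1] = 1.0

L = np.array([[1.0, 0.0], [0.0, 2.0]])
B = L @ A                                  # (2,2) broadcasts against (n,2,2)

# One 2x1 column [[z_k], [z_k]] per value z_k, stored as shape (n, 2, 1)
C = np.stack([z, z], axis=1)[:, :, None]

D = B @ C                                  # shape (n, 2, 1), float dtype
```

Here D[:, 0, 0] and D[:, 1, 0] hold the same numbers as the two object-array elements in Out[99], but everything stays in a plain float array.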

Not sure what you're trying to do (you should provide a reproducible example, as there are currently many missing variables, along with the expected output).
Nevertheless, the definition of A is fundamentally wrong. I imagine you expect a 2x2 array, but as z is a (10,) shaped array, you will end up with A being a weird object array whose element (0,1) is an array.
This prevents you from doing any further mathematical operations.

Matrix created from a function, and concatenated column vector of the matrix

We have a function f(x,y). We want to calculate the matrix Bij = f(xi,xj) = f(ih,jh) for 1 <= i,j <= n and h=1/(n+1).
If f(x,y)=x+y, then Bij = ih+jh and the matrix becomes (here, n=3):
I would like to write a function that calculates the column vector b that concatenates all the columns of Bij. For example, with the previous example, we would have:
Here is what I have so far; the function and n can be changed, here f(x,y)=x+y:
import numpy as np
n=3
def f(i,j):
    h=1.0/(n+1)
    a=((i+1)*h)+((j+1)*h)
    return a
B = np.fromfunction(f,(n,n))
print(B)
But I don't know how to build the vector b. With
np.concatenate((B[:,0],B[:,1],B[:,2]))
I get a row vector, not a column vector. Could you help me? Sorry for my bad English; I'm a beginner in Python.

The ravel function along with a new axis should do the trick:
import numpy as np
x = np.array([[0.5, 0.75, 1],
[0.75, 1, 1.25],
[1, 1.25, 1.5]])
x.T.ravel()[:, np.newaxis]
# array([[ 0.5 ],
# [ 0.75],
# [ 1. ],
# [ 0.75],
# [ 1. ],
# [ 1.25],
# [ 1. ],
# [ 1.25],
# [ 1.5 ]])
Ravel stitches together all the rows, so we first transpose the matrix (with .T) to get the columns instead. The result is a 1-D array, and we change it to a column vector by adding a new axis.
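Putting the question's fromfunction call and the column-wise flattening together might look like the sketch below. The helper name column_stacked_vector is made up for illustration, and reshape with order="F" is used here as an alternative to the transpose-then-ravel approach; both read the matrix column by column.

```python
import numpy as np

def column_stacked_vector(f, n):
    """Build B[i, j] = f((i+1)*h, (j+1)*h) with h = 1/(n+1) and return
    its columns stacked into a single (n*n, 1) column vector."""
    h = 1.0 / (n + 1)
    # fromfunction passes 0-based index arrays, hence the +1
    B = np.fromfunction(lambda i, j: f((i + 1) * h, (j + 1) * h), (n, n))
    # order="F" reads B column by column (column-major order)
    return B.reshape(-1, 1, order="F")

b = column_stacked_vector(lambda x, y: x + y, 3)
print(b.shape)  # (9, 1)
```

With f(x,y)=x+y the matrix is symmetric, so row-wise and column-wise stacking happen to agree; for a non-symmetric f the order argument matters.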

import numpy as np
# create sample matrix `m`
m = np.matrix([[0.5, 0.75, 1], [0.75, 1, 1.25], [1, 1.25, 1.5]])
# convert matrix `m` to a 'flat' matrix
m_flat = m.flatten()
print(m_flat)
# `m_flat` is still a matrix, in case you need an array:
m_flat_arr = np.squeeze(np.asarray(m_flat))
print(m_flat_arr)
The snippet uses .flatten(), np.asarray() and np.squeeze() to convert the original matrix m
matrix([[ 0.5 , 0.75, 1. ],
[ 0.75, 1. , 1.25],
[ 1. , 1.25, 1.5 ]])
into an array m_flat_arr of:
array([ 0.5 , 0.75, 1. , 0.75, 1. , 1.25, 1. , 1.25, 1.5 ])

Creating a Pandas rolling-window series of arrays

Suppose I have the following code:
import numpy as np
import pandas as pd
x = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
s = pd.Series(x, index=[1, 2, 3, 4, 5])
This produces the following s:
1 1.0
2 1.1
3 1.2
4 1.3
5 1.4
Now what I want to create is a rolling window of size n, but I don't want to take the mean or standard deviation of each window, I just want the arrays. So, suppose n = 3. I want a transformation that outputs the following series given the input s:
1 array([1.0, nan, nan])
2 array([1.1, 1.0, nan])
3 array([1.2, 1.1, 1.0])
4 array([1.3, 1.2, 1.1])
5 array([1.4, 1.3, 1.2])
How do I do this?

Here's one way to do it
In [294]: arr = [s.shift(x).values[::-1][:3] for x in range(len(s))[::-1]]
In [295]: arr
Out[295]:
[array([ 1., nan, nan]),
array([ 1.1, 1. , nan]),
array([ 1.2, 1.1, 1. ]),
array([ 1.3, 1.2, 1.1]),
array([ 1.4, 1.3, 1.2])]
In [296]: pd.Series(arr, index=s.index)
Out[296]:
1 [1.0, nan, nan]
2 [1.1, 1.0, nan]
3 [1.2, 1.1, 1.0]
4 [1.3, 1.2, 1.1]
5 [1.4, 1.3, 1.2]
dtype: object

Here's a vectorized approach using NumPy broadcasting -
n = 3 # window length
idx = np.arange(n)[::-1] + np.arange(len(s))[:,None] - n + 1
out = s.values[idx]  # Series.get_values() was removed in pandas 1.0
out[idx<0] = np.nan
This gets you the output as a 2D array.
To get a series with each element holding each window as a list -
In [40]: pd.Series(out.tolist())
Out[40]:
0 [1.0, nan, nan]
1 [1.1, 1.0, nan]
2 [1.2, 1.1, 1.0]
3 [1.3, 1.2, 1.1]
4 [1.4, 1.3, 1.2]
dtype: object
If you wish to have a list of split arrays, one per window, you can use np.split on the output, like so -
out_split = np.split(out,out.shape[0],axis=0)
Sample run -
In [100]: s
Out[100]:
1 1.0
2 1.1
3 1.2
4 1.3
5 1.4
dtype: float64
In [101]: n = 3
In [102]: idx = np.arange(n)[::-1] + np.arange(len(s))[:,None] - n + 1
...: out = s.values[idx]
...: out[idx<0] = np.nan
...:
In [103]: out
Out[103]:
array([[ 1. , nan, nan],
[ 1.1, 1. , nan],
[ 1.2, 1.1, 1. ],
[ 1.3, 1.2, 1.1],
[ 1.4, 1.3, 1.2]])
In [104]: np.split(out,out.shape[0],axis=0)
Out[104]:
[array([[ 1., nan, nan]]),
array([[ 1.1, 1. , nan]]),
array([[ 1.2, 1.1, 1. ]]),
array([[ 1.3, 1.2, 1.1]]),
array([[ 1.4, 1.3, 1.2]])]
Memory-efficiency with strides
For memory efficiency, we can use a strided helper, strided_axis0, similar to @B. M.'s solution but a bit more generic.
So, to get a 2D array of values with NaNs preceding the first element -
In [35]: strided_axis0(s.values, fillval=np.nan, L=3)
Out[35]:
array([[nan, nan, 1. ],
[nan, 1. , 1.1],
[1. , 1.1, 1.2],
[1.1, 1.2, 1.3],
[1.2, 1.3, 1.4]])
To get 2D array of values with NaNs as fillers coming after the original elements in each row and the order of elements being flipped, as stated in the problem -
In [36]: strided_axis0(s.values, fillval=np.nan, L=3)[:,::-1]
Out[36]:
array([[1. , nan, nan],
[1.1, 1. , nan],
[1.2, 1.1, 1. ],
[1.3, 1.2, 1.1],
[1.4, 1.3, 1.2]])
To get a series with each element holding each window as a list, simply wrap the earlier methods with pd.Series(out.tolist()) with out being the 2D array outputs -
In [38]: pd.Series(strided_axis0(s.values, fillval=np.nan, L=3)[:,::-1].tolist())
Out[38]:
0 [1.0, nan, nan]
1 [1.1, 1.0, nan]
2 [1.2, 1.1, 1.0]
3 [1.3, 1.2, 1.1]
4 [1.4, 1.3, 1.2]
dtype: object
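The strided_axis0 helper is not shown in this thread. As a rough stand-in, the same padded windows can be produced with NumPy's sliding_window_view (available in NumPy 1.20 and later). This is a sketch only: padded_windows is a hypothetical name, and it is not the implementation the answer refers to.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def padded_windows(values, window, fill=np.nan):
    """Hypothetical helper: one row per element of `values`, holding that
    element followed by the previous window-1 elements, padded with `fill`."""
    values = np.asarray(values, dtype=float)
    # Prepend window-1 fill values so that the first element gets a full window
    padded = np.concatenate([np.full(window - 1, fill), values])
    # The view has shape (len(values), window); reverse each row so the
    # newest value comes first, as requested in the question
    return sliding_window_view(padded, window)[:, ::-1]

x = np.array([1.0, 1.1, 1.2, 1.3, 1.4])
out = padded_windows(x, 3)
```

The result is a read-only view, so copy it (out.copy()) before modifying it. As in the answer, pd.Series(out.tolist()) turns it into a Series of lists.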

Your data look like a strided array:
data=np.lib.stride_tricks.as_strided(np.concatenate(([np.nan]*2,s))[2:],(5,3),(8,-8))
"""
array([[ 1. , nan, nan],
[ 1.1, 1. , nan],
[ 1.2, 1.1, 1. ],
[ 1.3, 1.2, 1.1],
[ 1.4, 1.3, 1.2]])
"""
Then transform it into a Series:
pd.Series(list(map(list,data)))
"""
0 [1.0, nan, nan]
1 [1.1, 1.0, nan]
2 [1.2, 1.1, 1.0]
3 [1.3, 1.2, 1.1]
4 [1.4, 1.3, 1.2]
dtype: object
"""

If you attach the missing NaNs at the beginning and the end of the series, you can use a simple sliding window:
def wndw(s,size=3):
    stretched = np.hstack([
        np.array([np.nan]*(size-1)),
        s.values.T,
        np.array([np.nan]*size)
    ])
    for begin in range(len(stretched)-size):
        end = begin+size
        yield stretched[begin:end][::-1]
for arr in wndw(s, 3):
    print(arr)

Normalize values between -1 and 1 inclusive

I am trying to generate a .wav file in python using Numpy. I have voltages ranging between 0-5V and I need to normalize them between -1 and 1 to use them in a .wav file.
I have seen this website which uses numpy to generate a wav file, but the algorithm used to normalize is no longer available.
Can anyone explain how I would go about generating these values in Python on my Raspberry Pi?

Isn't this just a simple calculation? Divide by half the maximum value and subtract 1:
In [12]: data=np.linspace(0,5,21)
In [13]: data
Out[13]:
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ,
2.25, 2.5 , 2.75, 3. , 3.25, 3.5 , 3.75, 4. , 4.25,
4.5 , 4.75, 5. ])
In [14]: data/2.5-1.
Out[14]:
array([-1. , -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0. ,
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

The following function should do what you want, irrespective of the range of the input data, i.e., it works also if you have negative values.
import numpy as np
def my_norm(a):
    # you want your data between -1 and 1, so the full range is scaled to 2;
    # for other desired limits, replace 2 with your_max - your_min
    ratio = 2/(np.max(a)-np.min(a))
    # shift the midpoint of the range to zero (this is not the average of the values)
    shift = (np.max(a)+np.min(a))/2
    return (a - shift)*ratio
my_norm(data)
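The comment about replacing 2 with your_max - your_min can be made explicit with a target range as parameters. The following is a minimal sketch; rescale, new_min and new_max are names made up for illustration. Note that, like my_norm, it scales by the minimum and maximum actually present in the data, not by the nominal 0-5 V range.

```python
import numpy as np

def rescale(a, new_min=-1.0, new_max=1.0):
    """Linearly map the observed range of `a` onto [new_min, new_max]."""
    a = np.asarray(a, dtype=float)
    old_min, old_max = a.min(), a.max()
    if old_max == old_min:
        raise ValueError("cannot rescale constant data")
    # Map to [0, 1] first, then stretch and shift to the target range
    return (a - old_min) / (old_max - old_min) * (new_max - new_min) + new_min

volts = np.linspace(0, 5, 21)
scaled = rescale(volts)  # observed 0..5 mapped to -1..1
```

If the recording never reaches 0 V or 5 V, dividing by the fixed range (data/2.5 - 1, as in the answer above) keeps the scaling independent of the particular recording.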

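You can use the fit_transform method in sklearn.preprocessing.StandardScaler. This method removes the mean from each column of your data and scales it to unit variance. Note that this does not confine the values to [-1, 1], as the output below shows; for a fixed range, sklearn.preprocessing.MinMaxScaler with feature_range=(-1, 1) is the closer match.
import numpy as np
from sklearn.preprocessing import StandardScaler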
data = np.asarray([[0, 0, 0],
[1, 1, 1],
[2,1, 3]])
data = StandardScaler().fit_transform(data)
And if you print out data, you will now have:
[[-1.22474487 -1.41421356 -1.06904497]
[ 0. 0.70710678 -0.26726124]
[ 1.22474487 0.70710678 1.33630621]]

Combining two numpy arrays to form an array with the largest value from each array

I want to combine two numpy arrays to produce an array with the largest values from each array.
import numpy as np
a = np.array([[ 0., 0., 0.5],
[ 0.1, 0.5, 0.5],
[ 0.1, 0., 0.]])
b = np.array([[ 0., 0., 0.0],
[ 0.5, 0.1, 0.5],
[ 0.5, 0.1, 0.]])
I would like to produce
array([[ 0., 0., 0.5],
[ 0.5, 0.5, 0.5],
[ 0.5, 0.1, 0.]])
I know you can do
a += b
which results in
array([[ 0. , 0. , 0.5],
[ 0.6, 0.6, 1. ],
[ 0.6, 0.1, 0. ]])
This is clearly not what I'm after. It seems like such an easy problem and I assume it most probably is.

You can use np.maximum to compute the element-wise maximum of the two arrays:
>>> np.maximum(a, b)
array([[ 0. , 0. , 0.5],
[ 0.5, 0.5, 0.5],
[ 0.5, 0.1, 0. ]])
This works with any two arrays, as long as they're the same shape or one can be broadcast to the shape of the other.
To modify the array a in-place, you can redirect the output of np.maximum back to a:
np.maximum(a, b, out=a)
There is also np.minimum for calculating the element-wise minimum of two arrays.
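If more than two arrays need to be combined, the same ufunc can be reduced over a list of arrays. This is a small sketch: a and b are the arrays from the question, while c is a made-up third array added only for illustration.

```python
import numpy as np

a = np.array([[0.0, 0.0, 0.5],
              [0.1, 0.5, 0.5],
              [0.1, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.1, 0.5],
              [0.5, 0.1, 0.0]])
# Hypothetical third array, not part of the question
c = np.array([[0.9, 0.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.2]])

pairwise = np.maximum(a, b)             # element-wise maximum of two arrays
overall = np.maximum.reduce([a, b, c])  # element-wise maximum over all three
```

Neither call modifies a, b or c; pass out=a as shown above only if overwriting a is intended.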

You are looking for the element-wise maximum.
Example:
>>> np.maximum([2, 3, 4], [1, 5, 2])
array([2, 5, 4])
http://docs.scipy.org/doc/numpy/reference/generated/numpy.maximum.html

inds = b > a
a[inds] = b[inds]
This modifies the original array a, which is what += is doing in your example, and which may or may not be what you want.
