numpy.diff returning an empty array?

#python 3.6.3
import numpy as np
time_C0002A/1000
array([[-0.99925 ],
[-0.99925 ],
[-0.99925 ],
...,
[ 0.0181095],
[ 0.0195675],
[ 0.0205931]])
Fs_log = 1 / np.diff(time_C0002A/1000)
When I evaluate it to see what it returns, I get an empty array:
Fs_log
array([], shape=(9063, 0), dtype=float64)
I expected an array to be returned, and I have confirmed that this works with a different example. Any idea what could be occurring and how I should remedy it? I believe it is an issue with the axis along which diff is taken, but I am not sure what it should be set to, for example:
np.diff(time_C0002A/1000, axis = 0)
But I am not sure. Input appreciated!

Your time_C0002A array has a shape of (n, 1). np.diff takes the difference over the last axis by default, which in your case has length 1, so the result has shape (n, 0). You can specify the axis as an argument:
np.diff(time_C0002A/1000, axis=0)
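A minimal sketch of the fix, assuming a small hypothetical column of timestamps in milliseconds in place of time_C0002A (the real data is not shown). It reproduces the empty (n, 0) result and then computes Fs_log along axis 0, or on a flattened copy:

```python
import numpy as np

# Hypothetical stand-in for time_C0002A: a column vector of shape (n, 1), in ms
time_ms = np.array([[0.0], [2.0], [4.0], [6.0], [8.0]])
t = time_ms / 1000  # seconds, still shape (5, 1)

# Default axis=-1 is the length-1 column axis, so nothing is left to difference
empty = np.diff(t)  # shape (5, 0), the same kind of result as in the question

# Differencing down the rows gives the n-1 sample intervals
dt = np.diff(t, axis=0)  # shape (4, 1)
Fs_log = 1 / dt          # sampling frequency per interval, shape (4, 1)

# Alternatively, flatten first to get a plain 1-D result
Fs_flat = 1 / np.diff(t.ravel())  # shape (4,)
```

Flattening is convenient when the column shape is not needed downstream; axis=0 keeps the (n-1, 1) layout.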

Related

Masking a 2D array and operating on second array based off masked indices

I have a function that reads in and outputs a 2D array. I want the output to be a constant (pi in this case) for every index where the input equals 0; otherwise I perform some maths on it. E.g.:
import numpy as np
import numpy.ma as ma
def my_func(x):
    mask = ma.where(x==0,x)
    # make an array of pi's the same size and shape as the input
    y = np.pi * np.ones(np.shape(x))
    # pseudo-code bit I can't figure out
    y.not_masked = y**2
    return y
my_array = [[0,1,2],[1,0,2],[1,2,0]]
result_array = my_func(my_array)
This should give me the following:
result_array = [[3.14, 1, 4],[1, 3.14, 4], [1, 4, 3.14]]
I.e. it has applied y**2 to each element in the 2D list that doesn't equal zero, and replaced all the zeros with pi.
I need this because my function will include division, and I don't know the indices beforehand. I'm trying to convert a MATLAB tutorial from a textbook into Python, and this function is stumping me!
Thanks

Just use np.where() directly:
y = np.where(x, x**2, np.pi)
Example:
>>> x = np.asarray([[0,1,2],[1,0,2],[1,2,0]])
>>> y = np.where(x, x**2, np.pi)
>>> print(y)
[[ 3.14159265 1. 4. ]
[ 1. 3.14159265 4. ]
[ 1. 4. 3.14159265]]
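np.where evaluates both branches for every element, so if the real operation is a division (as the question mentions), 1/x would still be computed at the zeros, with a divide-by-zero warning, before being discarded. A sketch, assuming the operation is 1/x as a hypothetical stand-in for the real maths, that uses the where= and out= arguments of np.divide so the zeros are never divided:

```python
import numpy as np

def my_func(x):
    # Accept nested lists as well as arrays; use floats so pi is not truncated
    x = np.asarray(x, dtype=float)
    nonzero = x != 0
    # Start from an array full of pi; np.divide only writes where nonzero is True,
    # so the zero entries are never divided and keep the value pi
    out = np.full_like(x, np.pi)
    return np.divide(1.0, x, out=out, where=nonzero)

result = my_func([[0, 1, 2], [1, 0, 2], [1, 2, 0]])
```

Passing out= matters here: without it, the entries where the condition is False would be left uninitialized.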

Try this:
my_array = np.array([[0,1,2],[1,0,2],[1,2,0]]).astype(float)
def my_func(x):
    mask = x == 0
    x[mask] = np.pi
    x[~mask] = x[~mask]**2  # or some other operation on x...
    return x
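This version assigns into x through boolean indexing, so it overwrites the array that is passed in. A short sketch showing that side effect, and passing a copy when the original values must be kept:

```python
import numpy as np

def my_func(x):
    mask = x == 0
    x[mask] = np.pi
    x[~mask] = x[~mask] ** 2
    return x

my_array = np.array([[0, 1, 2], [1, 0, 2], [1, 2, 0]]).astype(float)
original = my_array.copy()

# Working on a copy leaves my_array untouched
result = my_func(my_array.copy())
unchanged = np.array_equal(my_array, original)

# Calling it directly modifies my_array in place and returns the same object
result2 = my_func(my_array)
same_object = result2 is my_array
```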

Rather than using numpy.ma masked arrays, I would suggest using a plain boolean array to achieve what you want.
def my_func(x):
    # Create a boolean matrix, a, that is True where x == 0
    # and False where x != 0
    a = x == 0
    x[a] = np.pi
    # Use ~ (np.invert) to flip where a is True and False so we can
    # operate on the non-zero values of the array
    x[~a] = x[~a]**2
    return x  # return the transformed array
my_array = np.array([[0.,1.,2.],[1.,0.,2.],[1.,2.,0.]])
result_array = my_func(my_array)
This gives the output:
array([[ 3.14159265, 1. , 4. ],
[ 1. , 3.14159265, 4. ],
[ 1. , 4. , 3.14159265]])
Notice that I specifically passed a NumPy array to the function; originally you passed a list, which will cause problems when you attempt mathematical operations on it. Also notice that I defined the array with 1. rather than just 1, to make sure it is an array of floats rather than integers: in an integer array, assigning pi truncates it to 3.
It may be worth adding a check to the function that the input is a NumPy array rather than a list or other object, and that it contains floats, converting it if not.
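The check suggested above could look like the following sketch. The name my_func_checked is hypothetical, and converting with np.array(..., dtype=float) (which always makes a new array) is one reasonable choice among several. The last lines show the integer truncation described above:

```python
import numpy as np

def my_func_checked(x):
    # Convert lists, or arrays with a non-float dtype, to a new float array;
    # copy float arrays too, so the caller's data is never modified
    if not isinstance(x, np.ndarray) or not np.issubdtype(x.dtype, np.floating):
        x = np.array(x, dtype=float)
    else:
        x = x.copy()
    a = x == 0
    x[a] = np.pi
    x[~a] = x[~a] ** 2
    return x

from_list = my_func_checked([[0, 1, 2], [1, 0, 2], [1, 2, 0]])
int_input = np.array([[0, 1, 2], [1, 0, 2], [1, 2, 0]])
from_ints = my_func_checked(int_input)

# The pitfall being guarded against: pi stored in an integer array becomes 3
truncated = np.array([0, 1, 2])
truncated[truncated == 0] = np.pi
```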
EDIT:
Changed to using ~a rather than invert(a), as per Scotty1's suggestion.

python, tensorflow, how to get a tensor shape with half the features

I need the shape of a tensor, except that instead of feature_size as the last (-1) dimension, I need feature_size//2.
The code I'm currently using is
_, half_output = tf.split(output,2,axis=-1)
half_shape = tf.shape(half_output)
This works, but it's incredibly inelegant. I don't need an extra copy of half the tensor, I just need its shape. I've tried to do this in other ways, but nothing besides this clumsy solution has worked yet.
Anyone know a simple way to do this?

A simple way to get the shape with the last value halved:
half_shape = tf.shape(output[..., 1::2])
This slices output along its last dimension with a step of 2, starting from the second element (index 1).
The ... leaves the other dimensions untouched. As a result, output[..., 1::2] has the same dimensions as output except for the last one, which is sampled as in the following example and ends up with half the original length (rounded down when the length is odd).
>>> a = np.random.rand(5,5)
>>> a
array([[ 0.21553665, 0.62008421, 0.67069869, 0.74136913, 0.97809012],
[ 0.70765302, 0.14858418, 0.47908281, 0.75706245, 0.70175868],
[ 0.13786186, 0.23760233, 0.31895335, 0.69977537, 0.40196103],
[ 0.7601455 , 0.09566717, 0.02146819, 0.80189659, 0.41992885],
[ 0.88053697, 0.33472285, 0.84303012, 0.10148065, 0.46584882]])
>>> a[..., 1::2]
array([[ 0.62008421, 0.74136913],
[ 0.14858418, 0.75706245],
[ 0.23760233, 0.69977537],
[ 0.09566717, 0.80189659],
[ 0.33472285, 0.10148065]])
This half_shape prints the following Tensor:
Tensor("Shape:0", shape=(3,), dtype=int32)
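The slice keeps len(range(1, n, 2)) elements, which equals n // 2 for both even and odd n, so it matches feature_size//2 from the question. A quick check of that arithmetic, using NumPy as a stand-in for TensorFlow (tensor indexing accepts the same ... and step syntax):

```python
import numpy as np

# The 1::2 slice keeps feature_size // 2 entries of the last axis, for even and odd sizes
kept = {}
for feature_size in range(1, 9):
    a = np.zeros((3, 4, feature_size))
    kept[feature_size] = a[..., 1::2].shape

# By contrast, 0::2 keeps the rounded-up half (5 -> 3)
ceil_half = np.zeros((2, 5))[..., 0::2].shape[-1]
```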
Alternatively, you could get the static shape of output and build the shape you want manually:
s = output.get_shape().as_list()
half_shape = tf.TensorShape(s[:-1] + [s[-1] // 2])
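This half_shape prints a TensorShape showing the shape halved in the last dimension. Note that this relies on the static shape, so the last dimension must be known in advance: if s[-1] is None, s[-1] // 2 raises a TypeError.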

Error with Padlen in signal.filtfilt in Python

I am working with the scipy.signal library in Python and have the following code:
from scipy import signal
b = [0.001016, 0.00507999, 0.01015998, 0.01015998, 0.00507999, 0.001016]
a = [1., -3.0820186, 4.04351697, -2.76126457, 0.97291013, -0.14063199]
data = [[1.],
        [1.],
        [1.],
        ...]
# length = 264
y = signal.filtfilt(b, a, data)
But when I execute the code, I get the following error message:
The length of the input vector x must be at least padlen, which is 18.
What could I do?

It appears that data is a two-dimensional array with shape (264, 1). By default, filtfilt filters along the last axis of the input array, so in your case it is trying to filter along an axis where the length of the data is 1, which is not long enough for the default padding method (the default padlen is 3 * max(len(a), len(b)), which is 18 for these coefficients).
I assume you meant to interpret data as a one-dimensional array. You can add the argument axis=0:
y = signal.filtfilt(b, a, data, axis=0)
to filter along the first dimension (i.e. down the column), in which case the output y will also have shape (264, 1). Alternatively, you can convert the input to a one-dimensional array by flattening it with np.ravel(data) or by using indexing to select the first (and only) column, data[:, 0]. (The latter only works if data is, in fact, a NumPy array and not a list of lists.) E.g.:
y = signal.filtfilt(b, a, np.ravel(data))
In that case, the output y will also be a one-dimensional array, with shape (264,).
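A sketch of both fixes, using the b and a coefficients from the question and a hypothetical noisy sine of length 264 in place of the real data (which is not shown). It also confirms that the default axis raises the padlen error:

```python
import numpy as np
from scipy import signal

b = [0.001016, 0.00507999, 0.01015998, 0.01015998, 0.00507999, 0.001016]
a = [1., -3.0820186, 4.04351697, -2.76126457, 0.97291013, -0.14063199]

# Hypothetical input: a column vector of shape (264, 1), like data in the question
rng = np.random.default_rng(0)
n = np.arange(264)
data = (np.sin(2 * np.pi * n / 64) + 0.2 * rng.standard_normal(264)).reshape(-1, 1)

# Default axis=-1 has length 1, which is shorter than padlen=18
try:
    signal.filtfilt(b, a, data)
    failed = False
except ValueError:
    failed = True

y_col = signal.filtfilt(b, a, data, axis=0)     # filter down the column, shape (264, 1)
y_flat = signal.filtfilt(b, a, np.ravel(data))  # 1-D input, shape (264,)
```

Both routes run the same filter over the same 264 samples; they differ only in the shape of the result.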

Assuming you have a two-dimensional array with shape (264, 2), you can also use np.hsplit() to split data into two separate arrays like so:
import numpy as np
arr1, arr2 = np.hsplit(data,2)
You can view the shape of each individual array, for example:
print(arr1.shape)
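Each piece returned by np.hsplit still has shape (264, 1), so you still need to filter along axis=0. Your code will then look something like this:
y1 = signal.filtfilt(b, a, arr1, axis=0)
y2 = signal.filtfilt(b, a, arr2, axis=0)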
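If data really has two columns, filtering with axis=0 treats each column as an independent signal in a single call, so splitting is optional. A sketch with a hypothetical random (264, 2) input, comparing the split and unsplit routes:

```python
import numpy as np
from scipy import signal

b = [0.001016, 0.00507999, 0.01015998, 0.01015998, 0.00507999, 0.001016]
a = [1., -3.0820186, 4.04351697, -2.76126457, 0.97291013, -0.14063199]

rng = np.random.default_rng(1)
data = rng.standard_normal((264, 2))  # hypothetical two-column input

# Split route: each half keeps shape (264, 1), so axis=0 is still required
arr1, arr2 = np.hsplit(data, 2)
y1 = signal.filtfilt(b, a, arr1, axis=0)
y2 = signal.filtfilt(b, a, arr2, axis=0)

# Single call: every column is filtered independently along axis 0
y_both = signal.filtfilt(b, a, data, axis=0)  # shape (264, 2)
```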

Scikit/Numpy/Pandas ValueError: setting an array element with sequence

I had a pandas DataFrame with the strings '0' to '9' as column names:
working_df = pd.DataFrame(np.random.rand(5,10),index=range(0,5), columns=[str(x) for x in range(10)])
working_df.loc[:,'outcome'] = [0,1,1,0,1]
I then wanted to gather all of these numbers into a single column of arrays, so I did:
array_list = [Y for Y in working_df[[str(num) for num in range(10)]].values]
which gave me:
[array([ 0.0793451 , 0.3288617 , 0.75887129, 0.01128641, 0.64105905,
0.78789297, 0.69673768, 0.20354558, 0.48976411, 0.72848541]),
array([ 0.53511388, 0.08896322, 0.10302786, 0.08008444, 0.18218731,
0.2342337 , 0.52622153, 0.65607384, 0.86069294, 0.8864577 ]),
array([ 0.82878026, 0.33986175, 0.25707122, 0.96525733, 0.5897311 ,
0.3884232 , 0.10943644, 0.26944414, 0.85491211, 0.15801284]),
array([ 0.31818888, 0.0525836 , 0.49150727, 0.53682492, 0.78692193,
0.97945708, 0.53181293, 0.74330327, 0.91364064, 0.49085287]),
array([ 0.14909577, 0.33959452, 0.20607263, 0.78789116, 0.41780657,
0.0437907 , 0.67697385, 0.98579928, 0.1487507 , 0.41682309])]
I then attached it to my dataframe using:
working_df.loc[:,'array_list'] = pd.Series(array_list)
I then set up rf_clf = RandomForestClassifier() and tried rf_clf.fit(working_df['array_list'][1:].values, working_df['outcome'][1:].values), which raises ValueError: setting an array element with a sequence.
Is it a problem with the array of arrays in the fitting? Thanks for any insight.

The problem is that scikit-learn expects a two-dimensional array of values as input. You're passing a one-dimensional array of objects (each object itself being a one-dimensional array).
A quick fix would be to do this:
X = np.array(list(working_df['array_list'][1:]))
y = working_df['outcome'][1:].values
rf_clf.fit(X, y)
A better fix would be to not store your two-dimensional feature array within a one-dimensional pandas column.
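A sketch of both fixes on a small random DataFrame built like the one in the question (the random values are hypothetical). It shows that the object column has shape (n,), that stacking it gives the (n, 10) matrix scikit-learn expects, and that reading the feature columns directly gives the same matrix without the object column:

```python
import numpy as np
import pandas as pd

feature_cols = [str(i) for i in range(10)]
rng = np.random.default_rng(0)
working_df = pd.DataFrame(rng.random((5, 10)), columns=feature_cols)
working_df["outcome"] = [0, 1, 1, 0, 1]

# Reproduce the layout from the question: one object column whose cells are arrays
working_df["array_list"] = pd.Series(list(working_df[feature_cols].to_numpy()),
                                     index=working_df.index)

col = working_df["array_list"].iloc[1:].to_numpy()  # shape (4,), dtype object
X = np.vstack(list(col))                             # shape (4, 10), float64
y = working_df["outcome"].iloc[1:].to_numpy()

# Better: skip the object column and read the feature columns directly
X_direct = working_df[feature_cols].iloc[1:].to_numpy()
```

X (or X_direct) together with y can then be passed to rf_clf.fit as in the answer above.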

derivative with numpy.diff problems

I have this problem:
I have an array of 7 elements:
vector = [array([ 76.27789424]), array([ 76.06870298]), array([ 75.85016864]), array([ 75.71155968]), array([ 75.16982466]), array([ 73.08832948]), array([ 68.59935515])]
(this array is the result of a lot of operations)
Now I want to calculate the derivative with numpy.diff(vector), but I know that the input must be a NumPy array.
For this, I type:
vector=numpy.array(vector);
If I print vector now, the result is:
[[ 76.27789424]
[ 76.06870298]
[ 75.85016864]
[ 75.71155968]
[ 75.16982466]
[ 73.08832948]
[ 68.59935515]]
But if I try to calculate the derivative, the result is [].
Can You help me, please?
Thanks a lot!

vector is a list of arrays; to get a 1-D NumPy array, use a list comprehension and pass the result to numpy.array:
>>> vector = numpy.array([x[0] for x in vector])
>>> numpy.diff(vector)
array([-0.20919126, -0.21853434, -0.13860896, -0.54173502, -2.08149518,
-4.48897433])
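Since each element of vector is a one-element array, np.concatenate gives the same 1-D array without the list comprehension. A sketch using the values from the question:

```python
import numpy as np

vector = [np.array([76.27789424]), np.array([76.06870298]), np.array([75.85016864]),
          np.array([75.71155968]), np.array([75.16982466]), np.array([73.08832948]),
          np.array([68.59935515])]

flat = np.concatenate(vector)  # join the one-element arrays into shape (7,)
d = np.diff(flat)              # 6 first differences, shape (6,)

# Same result as building the (7, 1) array and flattening it
same = np.allclose(flat, np.array(vector).ravel())
```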

vector = numpy.array(vector);
gives you a two-dimensional array with seven rows and one column:
>>> vector.shape
(7, 1)
The shape reads as (length of axis 0, length of axis 1, length of axis 2, ...).
As you can see, the last axis is axis 1, and its length is 1.
From the docs:
numpy.diff(a, n=1, axis=-1)
...
axis : int, optional
The axis along which the difference is taken, default is the last axis.
There is no way to take the difference of a single value, so let's use the first axis instead, which has a length of 7. Since axis counting starts at zero, the first axis is axis 0:
>>> np.diff(vector, axis=0)
array([[-0.20919126],
[-0.21853434],
[-0.13860896],
[-0.54173502],
[-2.08149518],
[-4.48897433]])
Note that each order of differencing makes the result one element shorter along that axis, so the new shape is (7-1, 1), which is (6, 1). Let's verify that:
>>> np.diff(vector, axis=0).shape
(6, 1)
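Each further order of differencing removes one more element along the chosen axis. A sketch using the n argument of np.diff on the same (7, 1) array, alongside the default-axis call that produced the empty result:

```python
import numpy as np

vector = np.array([[76.27789424], [76.06870298], [75.85016864], [75.71155968],
                   [75.16982466], [73.08832948], [68.59935515]])  # shape (7, 1)

default = np.diff(vector)              # axis=-1 has length 1 -> shape (7, 0)
first = np.diff(vector, axis=0)        # shape (6, 1)
second = np.diff(vector, n=2, axis=0)  # shape (5, 1)

# n=2 is the same as differencing the first differences again
same = np.allclose(second, np.diff(first, axis=0))
```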

We Keep Coding
