numpy split doesn't work on float array - python

I was trying to split a float array into sub-arrays using numpy.split, but the results are not correct:
import numpy as np
x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
np.split(x, [1, 2, 3])
Out[127]: [array([ 1.2]), array([ 1.3]), array([ 1.5]), array([ 2. , 2.1, 2.5])]
1.2, 1.3 and 1.5 should be put into one sub array but they are separated, whereas it seems it splits the 2, 2.1 and 2.5 correctly.

I guess you want to split the array into the elements that are smaller than 1, between 1 and 2, between 2 and 3 and greater than 3 (4 bins). If we assume the array is sorted then the following will work:
>>> x = np.array([0.4, 1.2, 1.3, 1.5, 2, 2.1, 2.5, 3.4])
>>> np.split(x, np.bincount(np.digitize(x, [1, 2, 3])).cumsum())[:-1]
[array([ 0.4]),
array([ 1.2, 1.3, 1.5]),
array([ 2. , 2.1, 2.5]),
array([ 3.4])]
With np.digitize we get the index of the bin for each array element. With np.bincount we get the number of elements in each bin. With np.cumsum we turn those counts into the split indices between the bins of the sorted array. Finally, we have what np.split needs.
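The same pipeline can be written step by step with named intermediates. This is a sketch that still assumes x is sorted. One variation is not from the answer above: it passes minlength to np.bincount so that every bin gets a count (possibly 0), and it drops the last cumsum entry instead of dropping the trailing empty array, which gives exactly one group per bin.

```python
import numpy as np

x = np.array([0.4, 1.2, 1.3, 1.5, 2, 2.1, 2.5, 3.4])  # assumed to be sorted
bins = [1, 2, 3]  # edges -> 4 bins: < 1, [1, 2), [2, 3), >= 3

bin_ids = np.digitize(x, bins)                          # bin index of each element
counts = np.bincount(bin_ids, minlength=len(bins) + 1)  # number of elements per bin
boundaries = counts.cumsum()[:-1]                       # positions where a new bin starts
groups = np.split(x, boundaries)

for g in groups:
    print(g)
```

With minlength set, an empty bin shows up as an empty array, so the number of groups always equals the number of bins.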

Quoted from the docs:
numpy.split(ary, indices_or_sections, axis=0)
indices_or_sections : int or 1-D array
    If indices_or_sections is an integer, N, the array will be divided into N equal arrays along axis. If such a split is not possible, an error is raised.
    If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in ary[:2], ary[2:3], ary[3:].
    If an index exceeds the dimension of the array along axis, an empty sub-array is returned correspondingly.
So, if you want to split after the third element along the axis, you need to do something like this:
In [1]: import numpy as np
In [2]: x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
In [3]: np.split(x, [3])
Out[3]: [array([ 1.2, 1.3, 1.5]), array([ 2. , 2.1, 2.5])]
If you would rather split the array x into two equal sub-arrays:
In [4]: np.split(x, 2)
Out[4]: [array([ 1.2, 1.3, 1.5]), array([ 2. , 2.1, 2.5])]
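Both forms of the second argument, plus the error case the quoted docs mention, can be checked in one short script. This is a minimal sketch using the question's array; the choice of 4 sections for the failing call is just an illustrative value that does not divide 6.

```python
import numpy as np

x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])

# A list of indices: cut the array before index 3
left, right = np.split(x, [3])

# An integer N: cut into N equal pieces (6 elements -> 2 pieces of 3)
halves = np.split(x, 2)

# 6 elements cannot be divided into 4 equal pieces, so np.split raises ValueError
try:
    np.split(x, 4)
    raised = False
except ValueError:
    raised = True

print(left, right, len(halves), raised)
```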

np.split(x, [1, 2, 3]) gives you x[:1], x[1:2], x[2:3], x[3:], which is obviously not what you want. It seems what you want is np.split(x, [3]).
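To see why [1, 2, 3] produces four pieces, one can rebuild the split by hand: the cut points are 0, each listed index, and len(x), and each piece is the slice between two consecutive cut points. A small sketch comparing that manual slicing with np.split:

```python
import numpy as np

x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
indices = [1, 2, 3]

parts = np.split(x, indices)

# The same pieces by hand: cut points are 0, each index, and len(x)
bounds = [0] + indices + [len(x)]
manual = [x[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for p, m in zip(parts, manual):
    print(p, m)
```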

Related

Get a list from numpy ndarray in Python?

I have a numpy.ndarray here which I am trying to convert to a list.
>>> a=np.array([[[0.7]], [[0.3]], [[0.5]]])
I am using hstack for it. However, I am getting a list of a list. How can I get a list instead? I am expecting to get [0.7, 0.3, 0.5].
>>> b = np.hstack(a)
>>> b
array([[0.7, 0.3, 0.5]])
Do you understand what you have?
In [46]: a=np.array([[[0.7]], [[0.3]], [[0.5]]])
In [47]: a
Out[47]:
array([[[0.7]],
[[0.3]],
[[0.5]]])
In [48]: a.shape
Out[48]: (3, 1, 1)
That's a 3d array - count the []
You can convert it to 1d with:
In [49]: a.ravel()
Out[49]: array([0.7, 0.3, 0.5])
tolist converts the array to a list:
In [50]: a.ravel().tolist()
Out[50]: [0.7, 0.3, 0.5]
You could also use a[:,0,0]. If you use hstack, that partially flattens it - but not all the way to 1d.
In [51]: np.hstack(a)
Out[51]: array([[0.7, 0.3, 0.5]])
In [52]: _.shape
Out[52]: (1, 3)
In [53]: np.hstack(a)[0]
Out[53]: array([0.7, 0.3, 0.5])
Alternatively, numpy.ndarray.flatten can be used:
a.flatten().tolist()
And yet another possibility:
a.reshape(-1).tolist()
Output:
[0.7, 0.3, 0.5]
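All of the approaches mentioned in these answers give the same list. One practical difference between them is memory: flatten() always returns a copy, while ravel() returns a view of the original data when it can (here the array is contiguous, so it does). A short sketch that checks this with np.shares_memory:

```python
import numpy as np

a = np.array([[[0.7]], [[0.3]], [[0.5]]])  # shape (3, 1, 1)

via_ravel = a.ravel().tolist()
via_flatten = a.flatten().tolist()
via_reshape = a.reshape(-1).tolist()
via_index = a[:, 0, 0].tolist()
via_hstack = np.hstack(a)[0].tolist()

# ravel() gives a view of contiguous data; flatten() always copies
print(np.shares_memory(a, a.ravel()))
print(np.shares_memory(a, a.flatten()))
```

For a one-off conversion to a Python list the difference does not matter, since tolist() copies the values anyway.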

How to index elements from a column of a ndarray such that the output is a column vector?

I have an n x 2 array of points represented as an ndarray. I want to index some of the elements (the indices are given in an ndarray as well) of one of the two column vectors such that the output is a column vector. If, however, the index array contains only one index, a (1,)-shaped array should be returned.
I already tried the following things without success:
import numpy as np
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0, 1, 2])
points[index, [0]] -> array([0. , 1. , 2.5]) -> shape (3,)
points[[index], 0] -> array([[0. , 1. , 2.5]]) -> shape (1, 3)
points[[index], [0]] -> array([[0. , 1. , 2.5]]) -> shape (1, 3)
points[index, 0, np.newaxis] -> array([[0. ], [1. ], [2.5]]) -> shape (3, 1) # desired
np.newaxis works for this scenario; however, if the index array contains only one value, it does not deliver the right shape:
import numpy as np
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
index = np.array([0])
points[index, 0, np.newaxis] -> array([[0.]]) -> shape (1, 1)
points[index, [0]] -> array([0.]) -> shape (1,) # desired
Is there a way to index the ndarray such that the output has shape (3,1) for the first example and (1,) for the second example, without case distinctions based on the size of the index array?
Thanks in advance for your help!
In [329]: points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
...: index = np.array([0, 1, 2])
We can select 3 rows with:
In [330]: points[index,:]
Out[330]:
array([[0. , 1. ],
[1. , 1.5],
[2.5, 0.5]])
However, if we select a column as well, the result is 1d, even if we use [0]. That's because the (3,) row index is broadcast against the (1,) column index, resulting in a (3,) result:
In [331]: points[index,0]
Out[331]: array([0. , 1. , 2.5])
In [332]: points[index,[0]]
Out[332]: array([0. , 1. , 2.5])
If we make the row index (3,1) shape, the result is also (3,1):
In [333]: points[index[:,None],[0]]
Out[333]:
array([[0. ],
[1. ],
[2.5]])
In [334]: points[index[:,None],0]
Out[334]:
array([[0. ],
[1. ],
[2.5]])
We get the same thing if we use a row slice:
In [335]: points[0:3,[0]]
Out[335]:
array([[0. ],
[1. ],
[2.5]])
Using [index] doesn't help because it makes the row index (1,3) shape, resulting in a (1,3) result. Of course you could transpose it to get (3,1).
With a 1 element index:
In [336]: index1 = np.array([0])
In [337]: points[index1[:,None],0]
Out[337]: array([[0.]])
In [338]: _.shape
Out[338]: (1, 1)
In [339]: points[index1,0]
Out[339]: array([0.])
In [340]: _.shape
Out[340]: (1,)
If the row index was a scalar, as opposed to 1d:
In [341]: index1 = np.array(0)
In [342]: points[index1[:,None],0]
...
IndexError: too many indices for array
In [343]: points[index1[...,None],0] # use ... instead
Out[343]: array([0.])
In [344]: points[index1, 0] # scalar result
Out[344]: 0.0
I think handling the np.array([0]) case separately requires an if test. At least I can't think of a built-in numpy way of hiding it.
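If the if test is acceptable, it can at least be wrapped up once. A minimal sketch of such a wrapper; the name select_column and its col parameter are hypothetical, not part of NumPy. It uses the (n, 1) row-index trick from this answer for several indices and plain 1d indexing for a single index.

```python
import numpy as np

def select_column(points, index, col=0):
    """Hypothetical helper: shape (n, 1) for n > 1 indices, shape (1,) for one index."""
    index = np.asarray(index)
    if index.size == 1:
        # single index: keep the result 1-D, shape (1,)
        return points[index.reshape(1), col]
    # several indices: an (n, 1) row index broadcasts to an (n, 1) result
    return points[index.reshape(-1, 1), col]

points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
print(select_column(points, np.array([0, 1, 2])).shape)
print(select_column(points, np.array([0])).shape)
```

Because index.reshape(1) also accepts a 0-d array, the scalar case np.array(0) is handled by the same branch.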
I'm not certain I understand the wording in your question, but it seems as though you may be after the ndarray.swapaxes method (see https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.ndarray.swapaxes.html#numpy.ndarray.swapaxes)
For your snippet:
points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
swapped = points.swapaxes(0,1)
print(swapped)
gives
[[0. 1. 2.5 4. 5. ]
[1. 1.5 0.5 1. 2. ]]
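For a 2-D array, swapaxes(0, 1) is the same as the transpose .T, so each row of the swapped array is one of the original columns. A short sketch; note that picking several entries from one row of the swapped array still gives a 1-D result, so swapaxes alone does not produce the (3, 1) shape asked about.

```python
import numpy as np

points = np.array([[0, 1], [1, 1.5], [2.5, 0.5], [4, 1], [5, 2]])
swapped = points.swapaxes(0, 1)  # shape (2, 5)

# For a 2-D array, swapping axes 0 and 1 is the same as transposing
print(np.array_equal(swapped, points.T))

# Row 0 of the swapped array is the original first column
index = np.array([0, 1, 2])
print(swapped[0, index])  # 1-D, shape (3,)
```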

python arrays: averaging slope and intercept of datasets

I am having some difficulties achieving the following. Let's say I have two sets of data obtained from a test:
import numpy as np
a = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]).T
b = np.array([[0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]]).T
where the data in the 0th column represents (in my case) displacement and the data in the 1st column represents the respective measured force values.
(Given data represents two lines with slopes of 2 and 1, both with a y-intercept of 0.)
Now I am trying to program a script that averages those two arrays despite the mismatched x-values, such that it will yield
c = np.array([[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [0.0, 0.75, 1.5, 2.25, 3.0, 3.75, 4.5, 5.25]]).T
(A line with a slope of 1.5 and a y-intercept of 0.)
I tried my best using slicing and linear interpolation, but I cannot seem to get my head around it (I am a beginner).
I'd be very glad for any input and tips and hope the information I gave to you is sufficient!
Thanks in advance,
Robert
You can get the coefficients (slope and intercept) of each dataset, obtain the mean, and fit that data to a new array of x values.
Step by Step:
Fit a degree-1 polynomial to each of the arrays a and b using polyfit to get the coefficients (slope and intercept) of each:
coef_a = np.polyfit(a[:,0], a[:,1], deg=1)
coef_b = np.polyfit(b[:,0], b[:,1], deg=1)
>>> coef_a
array([ 2.00000000e+00, 2.22044605e-16])
>>> coef_b
array([ 1.00000000e+00, 1.33226763e-15])
Get the mean of those coefficients to use as the coefficients of c:
coef_c = np.mean(np.stack([coef_a,coef_b]), axis=0)
>>> coef_c
array([ 1.50000000e+00, 7.77156117e-16])
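Averaging the coefficients works here because evaluating a polynomial is linear in its coefficients: the line with the mean slope and mean intercept gives, at every x, the mean of the two fitted lines' values. A sketch that checks this numerically on the question's data:

```python
import numpy as np

a = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]).T
b = np.array([[0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]]).T

coef_a = np.polyfit(a[:, 0], a[:, 1], deg=1)
coef_b = np.polyfit(b[:, 0], b[:, 1], deg=1)
coef_c = (coef_a + coef_b) / 2  # same as np.mean(np.stack([coef_a, coef_b]), axis=0)

x_new = np.arange(0, 4, 0.5)
# Averaging the two fitted lines point by point ...
avg_of_lines = (np.polyval(coef_a, x_new) + np.polyval(coef_b, x_new)) / 2
# ... matches evaluating the single line with averaged coefficients
line_of_avg = np.polyval(coef_c, x_new)
print(np.allclose(avg_of_lines, line_of_avg))
```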
Create new x-values for c using np.arange:
c_x = np.arange(0,4,0.5)
>>> c_x
array([ 0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5])
Use polyval to evaluate the new c coefficients at your new x values:
c_y = np.polyval(coef_c, c_x)
>>> c_y
array([ 7.77156117e-16, 7.50000000e-01, 1.50000000e+00,
2.25000000e+00, 3.00000000e+00, 3.75000000e+00,
4.50000000e+00, 5.25000000e+00])
Put your c_x and c_y values together using stack:
c = np.stack([c_x, c_y])
>>> c
array([[ 0.00000000e+00, 5.00000000e-01, 1.00000000e+00,
1.50000000e+00, 2.00000000e+00, 2.50000000e+00,
3.00000000e+00, 3.50000000e+00],
[ 7.77156117e-16, 7.50000000e-01, 1.50000000e+00,
2.25000000e+00, 3.00000000e+00, 3.75000000e+00,
4.50000000e+00, 5.25000000e+00]])
If you round that to 2 decimals, you'll see it's the same as your desired outcome, apart from being transposed (shape (2, 8) rather than (8, 2)):
>>> np.round(c, 2)
array([[ 0. , 0.5 , 1. , 1.5 , 2. , 2.5 , 3. , 3.5 ],
[ 0. , 0.75, 1.5 , 2.25, 3. , 3.75, 4.5 , 5.25]])
In a single statement:
c = np.stack([np.arange(0, 4, 0.5),
np.polyval(np.mean(np.stack([np.polyfit(a.T[0], a.T[1], 1),
np.polyfit(b.T[0], b.T[1], 1)]),
axis=0),
np.arange(0, 4, 0.5))])
>>> c
array([[ 0.00000000e+00, 5.00000000e-01, 1.00000000e+00,
1.50000000e+00, 2.00000000e+00, 2.50000000e+00,
3.00000000e+00, 3.50000000e+00],
[ 7.77156117e-16, 7.50000000e-01, 1.50000000e+00,
2.25000000e+00, 3.00000000e+00, 3.75000000e+00,
4.50000000e+00, 5.25000000e+00]])
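The question stores a and b with x in column 0 and y in column 1, shape (N, 2). A sketch of the same computation that returns c in that layout, using np.column_stack instead of np.stack (a substitution, not something the answer above uses):

```python
import numpy as np

a = np.array([[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]).T
b = np.array([[0.5, 1.5, 2.5, 3.5], [0.5, 1.5, 2.5, 3.5]]).T

coef_c = np.mean([np.polyfit(a[:, 0], a[:, 1], 1),
                  np.polyfit(b[:, 0], b[:, 1], 1)], axis=0)
c_x = np.arange(0, 4, 0.5)
# column_stack gives shape (8, 2): x in column 0, y in column 1, like a and b
c = np.column_stack([c_x, np.polyval(coef_c, c_x)])
print(np.round(c, 2))
```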

Append selected values from a multi dimensional array to a new array

Hello :) I am a Python beginner and I started working with numpy lately. Basically, I have an nd-array with data.shape = (55000, 784), filled with float32 values. Based on a condition I made, I want to append specific rows (with all their columns) to a new array; it's important that the formatting stays the same. E.g. I want data[5] (all 784 columns) appended to an empty array. I heard about something called fancy indexing but still couldn't figure out how to use it; an example would help me out big time. I would appreciate any help from you guys! - Greets
I'd recommend skimming through the documentation for Indexing. But here is an example to demonstrate.
import numpy as np
data = np.array([[0, 1, 2], [3, 4, 5]])
print(data.shape)
(2, 3)
print(data)
[[0 1 2]
[3 4 5]]
selection = data[1, 1:3]
print(selection)
[4 5]
Fancy indexing is a form of advanced indexing that lets you index with integer arrays. The index lists are paired up element by element, so the example below picks data[0, 0] and data[1, 2].
fancy_selection = data[[0, 1], [0, 2]]
print(fancy_selection)
[0 5]
Since you also asked about appending, have a look at Append a NumPy array to a NumPy array. Here is an example anyway.
data_two = np.array([[6, 7, 8]])
appended_array = np.concatenate((data, data_two))
print(appended_array)
[[0 1 2]
[3 4 5]
[6 7 8]]
As @hpaulj recommends in his comment, appending to arrays is possible but inefficient and should be avoided. Let's turn to your example, but make the numbers a bit smaller.
rows, cols = np.ogrid[1:5, 0.1:0.39:0.1]  # open grids of shape (4, 1) and (1, 3)
a = rows + cols  # broadcasting builds the (4, 3) array
a
# array([[ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3],
# [ 3.1, 3.2, 3.3],
# [ 4.1, 4.2, 4.3]])
a.shape
# (4, 3)
Selecting an element:
a[1,2]
# 2.3
Selecting an entire row:
a[2, :] # or a[2] or a[2, ...]
# array([ 3.1, 3.2, 3.3])
or column:
a[:, 1] # or a[..., 1]
# array([ 1.2, 2.2, 3.2, 4.2])
Fancy indexing: observe that the first index is not a slice but a list or array:
a[[3,0,0,1], :] # or a[[3,0,0,1]]
# array([[ 4.1, 4.2, 4.3],
# [ 1.1, 1.2, 1.3],
# [ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3]])
Fancy indexing can be used on multiple axes to select arbitrary elements and assemble them into a new shape. For example, you could make a 2x2x2 array like so:
a[ [[[0,1], [1,2]], [[3,3], [3,2]]], [[[2,1], [1,1]], [[2,1], [0,0]]] ]
# array([[[ 1.3, 2.2],
# [ 2.2, 3.2]],
#
# [[ 4.3, 4.2],
# [ 4.1, 3.1]]])
There is also logical (boolean) indexing:
mask = np.isclose(a%1.1, 1.0)
mask
# array([[False, False, False],
# [ True, False, False],
# [False, True, False],
# [False, False, True]], dtype=bool)
a[mask]
# array([ 2.1, 3.2, 4.3])
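Applied to the question's use case, a boolean mask over rows (or a list of row indices) selects whole rows at once, so no appending is needed and the column count and float32 dtype are preserved. A sketch under stated assumptions: the random stand-in data, its 10-row size, and the row-mean condition are hypothetical placeholders for the asker's real (55000, 784) array and condition.

```python
import numpy as np

# Hypothetical stand-in for the (55000, 784) float32 array in the question
rng = np.random.default_rng(0)
data = rng.random((10, 784)).astype(np.float32)

# Hypothetical condition: keep rows whose mean is above 0.5
row_mask = data.mean(axis=1) > 0.5
selected = data[row_mask]     # shape (k, 784), dtype float32

# Selecting rows by a list of indices also keeps the 2-D layout
picked = data[[5, 2, 7]]      # shape (3, 784)
single = data[[5]]            # shape (1, 784); data[5] would be 1-D, shape (784,)

print(selected.shape, selected.dtype)
print(picked.shape, single.shape)
```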
To combine arrays, collect them in a list and use concatenate:
np.concatenate([a[1:, :2], a[:0:-1, [2,0]]], axis=1)
# array([[ 2.1, 2.2, 4.3, 4.1],
# [ 3.1, 3.2, 3.3, 3.1],
# [ 4.1, 4.2, 2.3, 2.1]])
Hope that helps get you started.

Using Python numpy where condition to change entries below a certain value

Here's my array:
import numpy as np
a = np.array([0, 5.0, 0, 5.0, 5.0])
Is it possible to use numpy.where in some way to add a value x to all those entries in a that are less than l?
So something like:
a = a[np.where(a < 5).add(2.5)]
Should return:
array([2.5, 5.0, 2.5, 5.0, 5.0])
a = np.array([0., 5., 0., 5., 5.])
a[np.where(a < 5)] += 2.5
in case you really want to use where, or just
a[a < 5] += 2.5
which is what I usually use for this kind of operation.
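The two in-place forms select the same elements: with a single argument, np.where(cond) returns a tuple of index arrays (like np.nonzero), while a[a < 5] uses the boolean array directly. A quick sketch comparing them on copies of the question's array:

```python
import numpy as np

a = np.array([0., 5., 0., 5., 5.])

b = a.copy()
b[np.where(a < 5)] += 2.5   # one-argument np.where returns index arrays

c = a.copy()
c[a < 5] += 2.5             # the boolean mask selects the same elements

print(b, c)
```

Both update the array in place, which only works because a already has a float dtype.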
You could use np.where to create the array of additions and then simply add it to a:
a + np.where(a < l, 2.5,0)
Sample run -
In [16]: a = np.array([1, 5, 4, 5, 5])
In [17]: l = 5
In [18]: a + np.where(a < l, 2.5,0)
Out[18]: array([ 3.5, 5. , 6.5, 5. , 5. ])
Given that you probably need to change the dtype (from int to float), you need to create a new array. A simple way without explicit .astype or np.where calls is multiplication with a mask:
>>> b = a + (a < 5) * 2.5
>>> b
array([ 2.5, 5. , 2.5, 5. , 5. ])
With np.where this can be written as a simple expression (using the else-value, the third argument of where):
>>> a = np.where(a < 5, a + 2.5, a)
>>> a
array([ 2.5, 5. , 2.5, 5. , 5. ])
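The dtype point matters for integer input such as the sample run's [1, 5, 4, 5, 5]: the out-of-place sum produces a new float64 array, but an in-place += cannot store float results back into an int array, and current NumPy versions raise an error (a UFuncTypeError, which is a subclass of TypeError). A sketch of both cases:

```python
import numpy as np

a_int = np.array([1, 5, 4, 5, 5])   # integer dtype

# Out-of-place: the result is a new float64 array
b = a_int + np.where(a_int < 5, 2.5, 0)

# In-place: float results cannot be cast back into the int array
try:
    a_int += np.where(a_int < 5, 2.5, 0)
    in_place_failed = False
except TypeError:
    in_place_failed = True

print(b, b.dtype, in_place_failed)
```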
a += np.where(a < 1, 2.5, 0)
where will return the second argument wherever the condition (first argument) is satisfied and the third argument otherwise.
You can use a boolean array (a mask) as an index. Comparisons such as a < 1 return such an array.
>>> a<1
array([ True, False,  True, False, False], dtype=bool)
you can use it as
>>> a[a<1] += 1
The a<1 part selects only the items in a that match the condition. You can then operate on just this part.
If you want to keep track of your selection, you can proceed in two steps.
>>> mask = a>1
>>> a[mask] += 1
Also, you can count the items matching the condition:
>>> print(np.sum(mask))
