Hello :) I am a python beginner and i started working with numpy lately, basically i got a nd-array: data.shape = {55000, 784} filled with float32 values. Based on a condition i made, i want to append specific rows and their columns to a new array, its important that the formating stays the same. e.g. i want data[5][0-784] appended to an empty array.. i heard about something called fancy indexing, still couldn't figure out how to use it, an example would help me out big time. I would appreciate every help from you guys! - Greets
I'd recommend skimming through the documentation for Indexing. But, here is an example to demonstrate.
import numpy as np
data = np.array([[0, 1, 2], [3, 4, 5]])
print(data.shape)
(2, 3)
print(data)
[[0 1 2]
[3 4 5]]
selection = data[1, 1:3]
print(selection)
[4 5]
Fancy indexing is an advanced indexing function which allows indexing using integer arrays. Here is an example.
fancy_selection = data[[0, 1], [0, 2]]
print(fancy_selection)
[0 5]
Since you also asked about appending, have a look at Append a NumPy array to a NumPy array. Here is an example anyway.
data_two = np.array([[6, 7, 8]])
appended_array = np.concatenate((data, data_two))
print(appended_array)
[[0 1 2]
[3 4 5]
[6 7 8]]
As #hpaulj recommends in his comment appending to arrays is possible but inefficient and should be avoided. Let's turn to your example but make the numbers a bit smaller.
a = np.sum(np.ogrid[1:5, 0.1:0.39:0.1])
a
# array([[ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3],
# [ 3.1, 3.2, 3.3],
# [ 4.1, 4.2, 4.3]])
a.shape
# (4, 3)
Selecting an element:
a[1,2]
# 2.3
Selecting an entire row:
a[2, :] # or a[2] or a 2[, ...]
# array([ 3.1, 3.2, 3.3])
or column:
a[:, 1] # or a[..., 1]
# array([ 1.2, 2.2, 3.2, 4.2])
fancy indexing, observe that the first index is not a slice but a list or array:
a[[3,0,0,1], :] # or a[[3,0,0,1]]
# array([[ 4.1, 4.2, 4.3],
# [ 1.1, 1.2, 1.3],
# [ 1.1, 1.2, 1.3],
# [ 2.1, 2.2, 2.3]])
fancy indexing can be used on multiple axes to select arbitrary elements and assemble them to a new shape for example you could make a 2x2x2 array like so:
a[ [[[0,1], [1,2]], [[3,3], [3,2]]], [[[2,1], [1,1]], [[2,1], [0,0]]] ]
# array([[[ 1.3, 2.2],
# [ 2.2, 3.2]],
#
# [[ 4.3, 4.2],
# [ 4.1, 3.1]]])
There is also logical indexing
mask = np.isclose(a%1.1, 1.0)
mask
# array([[False, False, False],
# [ True, False, False],
# [False, True, False],
# [False, False, True]], dtype=bool)
a[mask]
# array([ 2.1, 3.2, 4.3])
To combine arrays, collect them in a list and use concatenate
np.concatenate([a[1:, :2], a[:0:-1, [2,0]]], axis=1)
# array([[ 2.1, 2.2, 4.3, 4.1],
# [ 3.1, 3.2, 3.3, 3.1],
# [ 4.1, 4.2, 2.3, 2.1]])
Hope that help getting you started.
Related
I am trying to write out a covariance calculation for the following example, and i know there has to be a better way than a for loop. I've looked into np.dot, np.einsum and i feel like np.einsum has the capability but i am just missing something for implementing it.
import numpy as np
# this is mx3
a = np.array([[1,2,3],[4,5,6]])
# this is x3
mean = a.mean(axis=0)
# result should be 3x3
b = np.zeros((3,3))
for i in range(a.shape[0]):
b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
b
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
so this is fine for a 2 data point sample but for a m=large number this is super slow. There has to be a better way. Any suggestions?
In [108]: a = np.array([[1,2,3],[4,5,6]])
...: # this is x3
...: mean = a.mean(axis=0)
...:
...: # result should be 3x3
...: b = np.zeros((3,3))
...: for i in range(a.shape[0]):
...: b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
...:
In [109]: b
Out[109]:
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
In [110]: a.mean(axis=0)
Out[110]: array([2.5, 3.5, 4.5])
Since the mean is subtracted twice, lets define a new variable. In this case the 2d and 1d dimensions broadcast, so we can simply:
In [111]: a1= a - a.mean(axis=0)
In [112]: a1
Out[112]:
array([[-1.5, -1.5, -1.5],
[ 1.5, 1.5, 1.5]])
The rest is a normal dot product:
In [113]: a1.T#a1
Out[113]:
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
np.einsum and np.dot can also do this matrix multiplication.
I tried to make a function and inside it there is a code to divides a column with its column sum and here I come up with.
A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
print(A)
A = A.T
Asum = A.sum(axis=1)
print(Asum)
for i in range(len(Asum)):
A[:,i] = A[:,i]/Asum[i]
I'm hoping some decimal matrix but it automatically turn into integer. It gives me a zero matrix. Where do I go wrong?
You must change:
Asum = A.sum(axis=1)
by:
Asum = A.sum(axis=0)
To get the column by column sum.
Also you can get the division easily with numpy.divide:
np.divide(A, Asum)
#array([[0.1, 0.1, 0.1],
# [0.2, 0.2, 0.2],
# [0.3, 0.3, 0.3],
# [0.4, 0.4, 0.4]])
Or simply with:
A/Asum
Your A is integer dtype; assigned floats get truncated. If A started as a float array your iteration would work. But you don't need to iterate to perform this calculation:
In [108]: A = np.array([[1,2,3,4],[1,2,3,4],[1,2,3,4]]).T
In [109]: A
Out[109]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
In [110]: Asum = A.sum(axis=1)
In [111]: Asum
Out[111]: array([ 3, 6, 9, 12])
A is (4,3), Asum is (4,). If we make it (4,1):
In [114]: Asum[:,None]
Out[114]:
array([[ 3],
[ 6],
[ 9],
[12]])
we can perform the divide without iteration (review broadcasting if necessary):
In [115]: A/Asum[:,None]
Out[115]:
array([[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333],
[0.33333333, 0.33333333, 0.33333333]])
sum has keepdims parameter that makes this kind of calculation easier:
In [117]: Asum = A.sum(axis=1, keepdims=True)
In [118]: Asum
Out[118]:
array([[ 3],
[ 6],
[ 9],
[12]])
I'm calculating an aggregate value over smaller blocks in a 2D numpy array. I'd like to exclude values 0 from the aggregation operation in an efficient manner (rather than for and if statements).
I'm using skimage.measure.block_reduce and numpy.ma.masked_equal, but it looks like block_reduce ignores the mask.
import numpy as np
import skimage
a = np.array([[2,4,0,12,5,7],[6,0,8,4,3,9]])
zeros_included = skimage.measure.block_reduce(a,(2,2),np.mean)
includes 0s and (correctly) produces
zeros_included
array([[3., 6., 6.]])
I was hoping
masked = np.ma.masked_equal(a,0)
zeros_excluded = skimage.measure.block_reduce(masked,(2,2),np.mean)
would do the trick, but still produces
zeros_excluded
array([[3., 6., 6.]])
The desired result would be:
array([[4., 8., 6.]])
I'm looking for a pythonesque way to achieve the correct result, use of skimage is optional. Of course my actual arrays and blocks are much bigger than in this example, hence the need for efficiency.
Thanks for your interest.
You could use np.nanmean, but you'll have to modify original array or create a new one:
import numpy as np
import skimage
a = np.array([[2,4,0,12,5,7],[6,0,8,4,3,9]])
b = a.astype("float")
b[b==0] = np.nan
zeros_excluded = skimage.measure.block_reduce(b,(2,2), np.nanmean)
zeros_excluded
# array([[4., 8., 6.]])
The core code of block_reduce is
blocked = view_as_blocks(image, block_size)
return func(blocked, axis=tuple(range(image.ndim, blocked.ndim)))
view_as_blocks uses as_strided to create a different view of the array:
In [532]: skimage.util.view_as_blocks(a,(2,2))
Out[532]:
array([[[[ 2, 4],
[ 6, 0]],
[[ 0, 12],
[ 8, 4]],
[[ 5, 7],
[ 3, 9]]]])
When applied to the masked array it produces the same thing. In effect it works with masked.data, or np.asarray(masked). Some actions preserve subclasses, this does not.
In [533]: skimage.util.view_as_blocks(masked,(2,2))
Out[533]:
array([[[[ 2, 4],
[ 6, 0]],
...
That's why the np.mean applied to the (2,3) axes does not respond to the masking.
np.mean applied to a masked array delegates the action to the arrays own method, so is sensitive to the masking:
In [544]: np.mean(masked[:,:2])
Out[544]: 4.0
In [545]: masked[:,:2].mean()
Out[545]: 4.0
In [547]: [masked[:,i:i+2].mean() for i in range(0,6,2)]
Out[547]: [4.0, 8.0, 6.0]
np.nanmean works with view_as_blocks because it doesn't depend on the array being a special subclass.
I can define a function that applies masking to the block view:
def foo(arr,axis):
return np.ma.masked_equal(arr,0).mean(axis)
In [552]: skimage.measure.block_reduce(a,(2,2),foo)
Out[552]:
masked_array(data=[[4.0, 8.0, 6.0]],
mask=[[False, False, False]],
fill_value=1e+20)
====
Since your blocks aren't overlapping, I create the blocks with reshaping and swapping axes.
In [554]: masked.reshape(2,3,2).transpose(1,0,2)
Out[554]:
masked_array(
data=[[[2, 4],
[6, --]],
[[--, 12],
[8, 4]],
[[5, 7],
[3, 9]]],
mask=[[[False, False],
[False, True]],
[[ True, False],
[False, False]],
[[False, False],
[False, False]]],
fill_value=0)
and then apply mean to the last 2 axes:
In [555]: masked.reshape(2,3,2).transpose(1,0,2).mean((1,2))
Out[555]:
masked_array(data=[4.0, 8.0, 6.0],
mask=[False, False, False],
fill_value=1e+20)
Are there alternative or better ways to convert a numpy matrix to a python array than this?
>>> import numpy
>>> import array
>>> b = numpy.matrix("1.0 2.0 3.0; 4.0 5.0 6.0", dtype="float16")
>>> print(b)
[[ 1. 2. 3.]
[ 4. 5. 6.]]
>>> a = array.array("f")
>>> a.fromlist((b.flatten().tolist())[0])
>>> print(a)
array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
You could convert to a NumPy array and generate its flattened version with .ravel() or .flatten(). This could also be achieved by simply using the function np.ravel itself as it does both these takes under the hood. Finally, use array.array() on it, like so -
a = array.array('f',np.ravel(b))
Sample run -
In [107]: b
Out[107]:
matrix([[ 1., 2., 3.],
[ 4., 5., 6.]], dtype=float16)
In [108]: array.array('f',np.ravel(b))
Out[108]: array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
here is an example :
>>> x = np.matrix(np.arange(12).reshape((3,4))); x
matrix([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> x.tolist()
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
I was trying to split a float array into sub arrays using numpy split, however the results are not correct:
import numpy as np
x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
np.split(x, [1, 2, 3])
Out[127]: [array([ 1.2]), array([ 1.3]), array([ 1.5]), array([ 2. , 2.1, 2.5])]
1.2, 1.3 and 1.5 should be put into one sub array but they are separated, whereas it seems it splits the 2, 2.1 and 2.5 correctly.
I guess you want to split the array into the elements that are smaller than 1, between 1 and 2, between 2 and 3 and greater than 3 (4 bins). If we assume the array is sorted then the following will work:
>>> x = np.array([0.4, 1.2, 1.3, 1.5, 2, 2.1, 2.5, 3.4])
>>> np.split(x, np.bincount(np.digitize(x, [1, 2, 3])).cumsum())[:-1]
[array([ 0.4]),
array([ 1.2, 1.3, 1.5]),
array([ 2. , 2.1, 2.5]),
array([ 3.4])]
With np.digitize we get the index of the bin for each array element. With np.bincount we get the number of elements in each bin. With np.cumsum we can take the splitting indexes of each bin in the sorted array. Finally, we have what np.split needs.
Quoted from the docs:
numpy.split(ary, indices_or_sections, axis=0)
indices_or_sections : int or 1-D array If indices_or_sections is an
integer, N, the array will be divided into N equal arrays along axis.
If such a split is not possible, an error is raised. If
indices_or_sections is a 1-D array of sorted integers, the entries
indicate where along axis the array is split. For example, [2, 3]
would, for axis=0, result in ary[:2] ary[2:3] ary[3:] If an index
exceeds the dimension of the array along axis, an empty sub-array is
returned correspondingly.
So, if you want to split a the third element on the axis you need to do something like this:
In [1]: import numpy as np
In [2]: x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
In[3]: np.split(x, [3])
Out[3]: [array([ 1.2, 1.3, 1.5]), array([ 2. , 2.1, 2.5])]
If you rather want to split the array x into two equal sub-arrays:
In [4]: np.split(x, 2)
Out[4]: [array([ 1.2, 1.3, 1.5]), array([ 2. , 2.1, 2.5])]
np.split(x, [1, 2, 3]) gives you x[:1], x[1:2], x[3:] which obviously is not what you want. It seems what you want is np.split(x, [3]).