Index two sets of columns in an array - python

I am trying to slice columns out of an array and assign to a new variable, like so.
array1 = array[:,[0,1,2,3,15,16,17,18,19,20]]
Is there a short cut for something like this?
I tried this, but it threw an error:
array1 = array[:,[0:3,15:20]]
This is probably really simple but I can't find it anywhere.

Use np.r_:
Translates slice objects to concatenation along the first axis.
import numpy as np
arr = np.arange(100).reshape(5, 20)
cols = np.r_[:3, 15:20]
print(arr[:, cols])
[[ 0 1 2 15 16 17 18 19]
[20 21 22 35 36 37 38 39]
[40 41 42 55 56 57 58 59]
[60 61 62 75 76 77 78 79]
[80 81 82 95 96 97 98 99]]
At the end of the day, probably only a little less verbose than what you have now, but could come in handy for more complex cases.

For most simple cases like this, the most straightforward way is to concatenate the two column slices, e.g. with np.hstack:
array1 = np.hstack((array[:, 0:4], array[:, 15:21]))
For more complicated cases, you can build the index with NumPy's index helpers such as np.r_ (shown above) and np.s_, which let you combine multiple slices with gaps; see the NumPy indexing documentation for details.
Also, if your slice follows a pattern (i.e. get 5, skip 10, get 5 etc), you can use itertools.compress, as explained by user ncoghlan in this answer.
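As a rough sketch of the itertools.compress idea (the get-5/skip-10 pattern and the array shape below are invented for illustration, not taken from the question):
import itertools
import numpy as np

array = np.arange(100).reshape(5, 20)

# Repeating pattern over the column indices: keep 5, skip 10, keep 5, ...
pattern = itertools.cycle([True] * 5 + [False] * 10)
cols = list(itertools.compress(range(array.shape[1]), pattern))
print(cols)                   # [0, 1, 2, 3, 4, 15, 16, 17, 18, 19]
print(array[:, cols].shape)   # (5, 10)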

You could use list(range(0, 4)) + list(range(15, 21)) to build the index list (range(15, 21) so that column 20 is included, as in your example).
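To make that concrete, here is a minimal sketch of plugging such an index list into the fancy-indexing expression from the question (the array shape is made up so that column 20 exists):
import numpy as np

array = np.arange(105).reshape(5, 21)

cols = list(range(0, 4)) + list(range(15, 21))   # [0, 1, 2, 3, 15, 16, 17, 18, 19, 20]
array1 = array[:, cols]
print(array1.shape)   # (5, 10)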

Related

How do I sort columns of numerical file data in python

I'm trying to write a piece of code in python to graph some data from a tab separated file with numerical data.
I'm very new to Python so I would appreciate it if any help could be dumbed down a little bit.
Basically, I have this file and I would like to take two columns from it, sort them each in ascending order, and then graph those sorted columns against each other.
First of all, you should not post code as images, since the editor here has functionality to insert and format code.
It's as simple as calling x.sort() and y.sort(), since both of them are slices of the data (assuming they are 1-dimensional arrays). Keep in mind that ndarray.sort() sorts in place and returns None.
Here is an example:
import numpy as np
array = np.random.randint(0, 100, size=20)
print(array)
Output:
[89 47 4 10 29 21 91 95 32 12 97 66 59 70 20 20 36 79 23 4]
So if we use the method mentioned before:
array.sort()
print(array)
Output:
[ 4 4 10 12 20 20 21 23 29 32 36 47 59 66 70 79 89 91 95 97]
Easy as that :)
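For the original task (two columns from a tab-separated file, each sorted and then graphed against the other), a minimal sketch could look like the following; the file name, the column positions, and the use of matplotlib are assumptions, not details from the question:
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("data.tsv", delimiter="\t")   # hypothetical file name

x = np.sort(data[:, 0])   # np.sort returns a sorted copy
y = np.sort(data[:, 1])

plt.plot(x, y)
plt.xlabel("column 0 (sorted)")
plt.ylabel("column 1 (sorted)")
plt.show()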

Unexpected results with functools reduce in python

While executing the code below, I'm simply getting the first row (split) of the matrix back, not the sum of its elements as I'm expecting. Is my understanding incorrect, or did I do something stupid?
My objective is to get the sum of all elements in each row.
import numpy as np
from functools import reduce
matrix = 100*np.random.rand(4,4)
matrix=matrix.astype(int)
print(matrix)
s_matrix = np.vsplit(matrix, 4)
sum_test = reduce((lambda a,b : a+b), list(s_matrix[0]))
print(sum_test)
Output:
[[79 75 33 26]
[49 45 16 19]
[58 33 83 55]
[40 14 2 93]]
[79 75 33 26]
Expected:
[213, 129, 229, 149]
Check the expression you're passing to reduce: print(list(s_matrix[0])). You'll find that s_matrix[0] is a (1, 4) array, so list(s_matrix[0]) is a list containing a single row:
[array([79, 75, 33, 26])]
reduce over a one-element sequence simply returns that element unchanged, so no summing ever takes place.
You can use reduce() for this by continually adding the results to a list in the accumulator:
import numpy as np
from functools import reduce
matrix = 100*np.random.rand(4,4)
matrix=matrix.astype(int)
sum_test = reduce(lambda a,b : a+[sum(b)], list(matrix), [])
print(sum_test)
...but you really shouldn't. One of the major points of Numpy is that you can avoid explicit python loops. Instead you should just use the array's sum function. You can pass it an axis to tell it to sum rows instead of the whole thing:
import numpy as np
matrix = np.random.randint(0, 100, [4,4])
print(matrix)
print(matrix.sum(axis = 1))
Result
[[64 89 97 15]
[12 47 81 31]
[52 81 37 78]
[27 64 79 50]]
[265 171 248 220]
sum_test = reduce((lambda a, b: a + b), list(s_matrix[0]))
The line above is your problem: you are only passing the first row of your matrix instead of the whole matrix.
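If you do want to keep reduce, a minimal sketch of getting the expected per-row sums (not taken from either answer above) is to apply it to each row separately:
import numpy as np
from functools import reduce

matrix = (100 * np.random.rand(4, 4)).astype(int)
print(matrix)

# Each row is a 1-D array of four scalars, so reduce adds them up.
row_sums = [reduce(lambda a, b: a + b, row) for row in matrix]
print(row_sums)

# Note: reduce(lambda a, b: a + b, list(matrix)) over the whole matrix adds
# the rows element-wise, which yields column totals rather than row totals.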

Argument must be a string, a bytes-like object or a number, not 'slice'

I am having troubles with deleting slices from a numpy array.
x_train[:,:,0]
returns the data I want to delete
but
np.delete(x_train, np.s_[:,:,0])
throws the exception
TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'
But in the documentation it is written
Return a new array with sub-arrays along an axis deleted. For a one dimensional array, this returns those entries not returned by arr[obj].
obj : slice, int or array of ints
Indicate which sub-arrays to remove.
First, in this case, np.s_ returns a tuple, not a slice.
The documentation says you can pass a slice as the argument, but it means Python's built-in slice class (see the Python docs).
A valid code would be:
x = [[[1,2,3],[4,5,6]],[[1,1,1],[2,2,2]],[[5,5,5],[7,7,7]]]
np.delete(x, slice(1,1,1))
But let's take a look at the output of np.s_.
print(np.s_[:,:,0])
returns
(slice(None,None,None), slice(None,None,None), 0)
The output of np.s_ is a tuple of objects, some of which are slices and some of which are indexes; read the documentation of np.s_ for more information on how to use it.
A slice is the object that lets you write mylist[0:3]; that code is just mylist[slice(0, 3)].
mylist[:] is a special case: the bare : is slice(None, None, None), which covers the whole list (indices 0 to len(mylist)-1).
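A quick illustration of those points (plain Python and NumPy, nothing specific to the original x_train):
import numpy as np

mylist = [10, 20, 30, 40, 50]
print(mylist[0:3])           # [10, 20, 30]
print(mylist[slice(0, 3)])   # [10, 20, 30] -- exactly the same indexing
print(mylist[slice(None)])   # [10, 20, 30, 40, 50] -- same as mylist[:]

# With commas, np.s_ builds a tuple of per-axis indices, not a single slice:
print(np.s_[:, :, 0])        # (slice(None, None, None), slice(None, None, None), 0)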
You can try this:
arr1 = np.delete(arr1, 0, axis=-1)
Testing it out:
import numpy as np
arr1 = np.arange(48).reshape(2,3,8)
print (arr1)
arr1 = np.delete(arr1, 0, axis=-1)
print (arr1)
Output:
# Before delete
[[[ 0 1 2 3 4 5 6 7]
[ 8 9 10 11 12 13 14 15]
[16 17 18 19 20 21 22 23]]
[[24 25 26 27 28 29 30 31]
[32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47]]]
# After delete
[[[ 1 2 3 4 5 6 7]
[ 9 10 11 12 13 14 15]
[17 18 19 20 21 22 23]]
[[25 26 27 28 29 30 31]
[33 34 35 36 37 38 39]
[41 42 43 44 45 46 47]]]
I think the problem is the slice itself, which np.delete does not accept in that form. Try
np.delete(x_train, np.s_[1,1,1])

Remove Specific Indices From 2D Numpy Array

If I have a set of data that's of shape (1000,1000) and I know that the values I need from it are contained within the indices (25:888,11:957), how would I go about separating the two sections of data from one another?
I couldn't figure out how to get np.delete() to like the specific 2D case and I also need both the good and the bad sections of data for analysis, so I can't just specify my array bounds to be within the good indices.
I feel like there's a simple solution I'm missing here.
Is this how you want to divide the array?
In [364]: arr = np.ones((1000,1000),int)
In [365]: beta = arr[25:888, 11:957]
In [366]: beta.shape
Out[366]: (863, 946)
In [367]: arr[:25,:].shape
Out[367]: (25, 1000)
In [368]: arr[888:,:].shape
Out[368]: (112, 1000)
In [369]: arr[25:888,:11].shape
Out[369]: (863, 11)
In [370]: arr[25:888,957:].shape
Out[370]: (863, 43)
I'm imagining a square with a rectangle cut out of the middle. It's easy to specify that rectangle, but the frame has to be viewed as 4 rectangles - unless it is described via a mask of what is missing.
Checking that I got everything:
In [376]: x = np.array([_366,_367,_368,_369,_370])
In [377]: np.multiply.reduce(x, axis=1).sum()
Out[377]: 1000000
Let's say your original numpy array is my_arr
Extracting the "Good" Section:
This is easy because the good section has a rectangular shape.
good_arr = my_arr[25:888, 11:957]
Extracting the "Bad" Section:
The "bad" section doesn't have a rectangular shape. Rather, it has the shape of a rectangle with a rectangular hole cut out of it.
So, you can't really store the "bad" section alone, in any array-like structure, unless you're ok with wasting some extra space to deal with the cut out portion.
What are your options for the "Bad" Section?
Option 1:
Be happy and content with having extracted the good section. Let the bad section remain part of the original my_arr. While iterating through my_arr, you can always discriminate between good and bad items based on the indices. The disadvantage is that, whenever you want to process only the bad items, you have to do it through a nested double loop rather than use the vectorized features of numpy.
Option 2:
Suppose we want to perform some operations such as row-wise totals or column-wise totals on only the bad items of my_arr, and suppose you don't want the overhead of the nested for loops. You can create something called a numpy masked array. With a masked array, you can perform most of your usual numpy operations, and numpy will automatically exclude masked out items from the calculations. Note that internally, there will be some memory wastage involved, just to store an item as "masked"
The code below illustrates how you can create a masked array called masked_arr from your original array my_arr:
import numpy as np
my_size = 10 # In your case, 1000
r_1, r_2 = 2, 8 # In your case, r_1 = 25, r_2 = 888 (the slice end, exclusive, matching my_arr[25:888, ...])
c_1, c_2 = 3, 5 # In your case, c_1 = 11, c_2 = 957 (the slice end, exclusive)
# Using nested list comprehension, build a boolean mask as a list of lists, of shape (my_size, my_size).
# The mask will have False everywhere, except in the sub-region [r_1:r_2, c_1:c_2], which will have True.
mask_list = [[True if ((r in range(r_1, r_2)) and (c in range(c_1, c_2))) else False
              for c in range(my_size)] for r in range(my_size)]
# Your original, complete 2d array. Let's just fill it with some "toy data"
my_arr = np.arange((my_size * my_size)).reshape(my_size, my_size)
print (my_arr)
masked_arr = np.ma.masked_where(mask_list, my_arr)
print ("masked_arr is:\n", masked_arr, ", and its shape is:", masked_arr.shape)
The output of the above is:
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]
masked_arr is:
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 -- -- 25 26 27 28 29]
[30 31 32 -- -- 35 36 37 38 39]
[40 41 42 -- -- 45 46 47 48 49]
[50 51 52 -- -- 55 56 57 58 59]
[60 61 62 -- -- 65 66 67 68 69]
[70 71 72 -- -- 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]] , and its shape is: (10, 10)
Now that you have a masked array, you will be able to perform most of the numpy operations on it, and numpy will automatically exclude the masked items (the ones that appear as "--" when you print the masked array)
Some examples of what you can do with the masked array:
# Now, you can print column-wise totals, of only the bad items.
print (masked_arr.sum(axis=0))
# Or row-wise totals, for that matter.
print (masked_arr.sum(axis=1))
The output of the above is:
[450 460 470 192 196 500 510 520 530 540]
[45 145 198 278 358 438 518 598 845 945]
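As a side note, the same mask can be built without the nested list comprehension by slicing a boolean array; a minimal sketch with the same toy sizes as above:
import numpy as np

my_size = 10
r_1, r_2 = 2, 8
c_1, c_2 = 3, 5

my_arr = np.arange(my_size * my_size).reshape(my_size, my_size)

# True marks the rectangular "good" region, which is excluded from calculations.
mask = np.zeros(my_arr.shape, dtype=bool)
mask[r_1:r_2, c_1:c_2] = True

masked_arr = np.ma.masked_array(my_arr, mask=mask)
print(masked_arr.sum(axis=0))   # same column-wise totals as above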

numpy: take multiple range subsets of the same size

What I'm looking for
# I have an array
x = np.arange(0, 100)
# I have a size n
n = 10
# I have a random set of numbers
indexes = np.random.randint(n, 100, 10)
# What I want is a matrix where row i holds the n elements of x preceding indexes[i]
res = np.empty((len(indexes), n), int)
for (i, v) in np.ndenumerate(indexes):
    res[i] = x[v-n:v]
To reformulate, as I wrote in the title, what I am looking for is a way to take multiple subsets (of the same size) of an initial array.
Just to add a detail: this loopy version works; I just want to know whether there is a more elegant, numpyish way to achieve this.
The following does what you are asking for. It uses numpy.lib.stride_tricks.as_strided to create a special view on the data which can be indexed in the desired way.
import numpy as np
from numpy.lib import stride_tricks
x = np.arange(100)
k = 10
i = np.random.randint(k, len(x)+1, size=(5,))
xx = stride_tricks.as_strided(x, strides=np.repeat(x.strides, 2), shape=(len(x)-k+1, k))
print(i)
print(xx[i-k])
Sample output:
[ 69 85 100 37 54]
[[59 60 61 62 63 64 65 66 67 68]
[75 76 77 78 79 80 81 82 83 84]
[90 91 92 93 94 95 96 97 98 99]
[27 28 29 30 31 32 33 34 35 36]
[44 45 46 47 48 49 50 51 52 53]]
A bit of explanation. Arrays store not only data but also a small "header" with layout information. Among this are the strides, which tell how to translate linear memory into n dimensions. There is one stride per dimension, which is simply the offset at which the next element along that dimension is found. So the strides of a 2D array are (row offset, element offset). as_strided lets you manipulate an array's strides directly; by setting the row offset equal to the element offset we create a view that looks like
0 1 2 ...
1 2 3 ...
2 3 4 ...
...
Note that no data are copied at this stage; for example, all the 2s refer to the same memory location in the original array. That is why this solution should be quite efficient.
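On NumPy 1.20 and newer, the same kind of view can be obtained without setting strides by hand, using numpy.lib.stride_tricks.sliding_window_view; a rough equivalent of the snippet above:
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(100)
k = 10
i = np.random.randint(k, len(x) + 1, size=(5,))

# Row j of windows is x[j:j+k], so rows i-k are x[v-k:v] for each v in i.
windows = sliding_window_view(x, k)
print(windows[i - k])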
