2D array column slicing in pure Python without for loops

Is it possible to slice a column off a 2d array in pure Python without a for loop or list comprehension? Say for instance you have a 4x4 array of ints:
grid = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]
and let's say you'd like to return the grid without the first row and the last column [[5,6,7],[9,10,11],[13,14,15]]
Is there a slicing syntax that allows you to do this? Excluding the first row is easily achieved with grid = grid[1:4]
However doing something like grid = grid[1:4][0:2] seems like it should work but results in [[5, 6, 7, 8], [9, 10, 11, 12]]. If at all possible, I'd like to avoid having to iterate through it in a for loop/list comprehension. I know that would work, but I'm wondering if there's a more elegant syntax.

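To ddejohn's point, this can't be done with slicing notation alone. An approach that avoids both list comprehensions and for loops is list(zip(*matrix)), where matrix is the input list. It transposes the grid so that columns become rows, which can then be sliced like any other rows. Note that it produces tuples rather than lists.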
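As a minimal sketch of how the transpose trick could be applied to the grid from the question: drop the first row with an ordinary slice, transpose, drop what is now the last row, and transpose back. The second variant, using operator.itemgetter with a slice object, is an alternative not mentioned in the answer; it keeps the rows as lists.

```python
from operator import itemgetter

grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

# Drop the first row, then transpose so that columns become rows.
columns = list(zip(*grid[1:]))

# Drop the last "row" (originally the last column) and transpose back.
trimmed = list(zip(*columns[:-1]))
print(trimmed)  # [(5, 6, 7), (9, 10, 11), (13, 14, 15)]

# Alternative: apply the same slice to every remaining row via map().
drop_last = itemgetter(slice(None, -1))
trimmed_lists = list(map(drop_last, grid[1:]))
print(trimmed_lists)  # [[5, 6, 7], [9, 10, 11], [13, 14, 15]]
```

Both variants still iterate internally; they only hide the loop behind zip() and map().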

Related

Grouping elements of a NumPy array by sum of indices

I have several large NumPy arrays of dimensions 30*30*30. I need to traverse each array, compute the sum of each element's index triplet, and bin the elements by this sum. For example, consider this simple 2*2 array:
test = np.array([[2,3],[0,1]])
This array has the indices [0,0],[0,1],[1,0] and [1,1]. This routine would return the list: [2,[3,0],1], because 2 in array test has index sum 0, 3 and 0 have index sum 1 and 1 has index sum 2. I know the brute force method of iterating through the NumPy array and checking the sum would work, but it is far too inefficient for my actual case with large N(=30) and several arrays. Any inputs on using NumPy routines to accomplish this grouping would be appreciated. Thank you in advance.
Here is one way that should be reasonably fast, but not super-fast: 30x30x30 takes 20 ms on my machine.
import numpy as np
# make example
dims = 2,3,4
a = np.arange(np.prod(dims),0,-1).reshape(dims)
# create and sort indices
idx = sum(np.ogrid[tuple(map(slice,dims))])
srt = idx.ravel().argsort(kind='stable')
# use order to arrange and split data
asrt = a.ravel()[srt]
spltpts = idx.ravel().searchsorted(np.arange(1,np.sum(dims)-len(dims)+1),sorter=srt)
out = np.split(asrt,spltpts)
# admire
out
# [array([24]), array([23, 20, 12]), array([22, 19, 16, 11, 8]), array([21, 18, 15, 10, 7, 4]), array([17, 14, 9, 6, 3]), array([13, 5, 2]), array([1])]
You could procedurally create a list of index tuples and use that, but you may end up with a code constant that's too large to be efficient.
[(0,0),[(1,0),(0,1)],(1,1)],
So you need a function to generate these indexes on the fly for an n-dimensional array.
For one dimension, it is a trivial count/increment:
[(0),(1),(2),...]
For the second, use the one-dimension strategy for the first dimension, then decrement the first and increment the second to fill in.
[(0...)...,(1...)...,(2...)...,...]
[[(0,0)],[(1,0),(0,1)],[(2,0),(1,1),(0,2)],[...],...]
Notice that some of these would be outside the example array, so your generator would need to include a bounds check.
Then for three dimensions, give the first two dimensions the same treatment as above, but at the end, decrement the first dimension, increment the third, and repeat until done.
[[(0,0,0),...],[(1,0,0),(0,1,0),...],[(2,0,0),(1,1,0),(0,2,0),...],[...],...]
[[(0,0,0)],[(1,0,0),(0,1,0),(0,0,1)],[(2,0,0),(1,1,0),(0,2,0),(1,0,1),(0,1,1),(0,0,2)],...]
Again, you need bounds checks or cleverer start/end points to avoid accessing elements outside the array, but this general algorithm is how you'd go about generating the indexes on the fly rather than having two large arrays compete for cache and I/O.
Generating the Python or NumPy equivalent is left as an exercise for the reader.
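One possible way to write the generator described above, as a sketch in pure Python rather than the answerer's own code. The function names index_tuples_with_sum and grouped_indices are made up for illustration. The bounds check is folded into the loop limits and the one-dimensional base case, so no out-of-range tuple is ever produced.

```python
def index_tuples_with_sum(shape, total):
    """Yield every index tuple inside `shape` whose components add up to `total`."""
    if len(shape) == 1:
        # Base case: a single axis can only contribute `total` itself,
        # and only if that index is in bounds.
        if 0 <= total < shape[0]:
            yield (total,)
        return
    # Start with the largest valid first index and decrement it,
    # letting the remaining axes absorb the difference.
    for first in range(min(total, shape[0] - 1), -1, -1):
        for rest in index_tuples_with_sum(shape[1:], total - first):
            yield (first,) + rest


def grouped_indices(shape):
    """Yield one list of index tuples per possible index sum, in increasing order."""
    max_total = sum(shape) - len(shape)
    for total in range(max_total + 1):
        yield list(index_tuples_with_sum(shape, total))


test = [[2, 3], [0, 1]]  # the 2*2 example from the question, as a nested list
groups = list(grouped_indices((2, 2)))
print(groups)  # [[(0, 0)], [(1, 0), (0, 1)], [(1, 1)]]

binned = [[test[i][j] for i, j in group] for group in groups]
print(binned)  # [[2], [0, 3], [1]]
```

Within each bin the order follows the decrement-first scheme, so the middle bin comes out as [0, 3] rather than [3, 0].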

Python: fast subscription of the elements in 2D array that satisfy a condition

I am new to Python and my question might be too obvious, but I did not find a sufficiently good answer.
Suppose I have a 2D array a=np.array([[1,2,3],[4,5,6],[7,8,9]]). How can I index the elements that satisfy a condition? Suppose I want to increase by one those elements of a that are greater than 3. In Matlab, I would do it in one line:
a(a>3)=a(a>3)+1. What about Python?
The result should be [[1,2,3],[5,6,7],[8,9,10]].
I am aware that there are functions that can return the indices I need, like np.where. I am also aware that there is a way of indexing a 2D array with two 1D arrays. I was not able to combine the two.
Of course, I am able to do it with a for loop. What I am interested in is whether there is a convenient Matlab-like way of doing this.
Thanks
If you already know how boolean indexing works then just an in-place addition is all you need to do:
In [6]: a=np.array([[1,2,3],[4,5,6],[7,8,9]])
In [7]: a[a>3] += 1 # roughly equivalent to a[a>3] = a[a>3] + 1
In [8]: a
Out[8]:
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 8,  9, 10]])
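For the part of the question about combining np.where with indexing by two 1D arrays, here is a small sketch, assuming NumPy is installed. Called with only a condition, np.where returns one index array per axis, and those can be used together as an index. The three-argument form instead builds a new array and leaves the original untouched. The variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# With a single argument, np.where returns a tuple of index arrays (rows, cols).
rows, cols = np.where(a > 3)
a[rows, cols] += 1  # same effect as a[a > 3] += 1
print(a.tolist())  # [[1, 2, 3], [5, 6, 7], [8, 9, 10]]

# With three arguments, np.where picks element-wise between two alternatives
# and returns a new array instead of modifying b in place.
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
c = np.where(b > 3, b + 1, b)
print(c.tolist())  # [[1, 2, 3], [5, 6, 7], [8, 9, 10]]
```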

Python list or pandas dataframe arbitrary indexing and slicing

I have used both R and Python extensively in my work, and at times I get the syntax between them confused.
In R, if I wanted to create a model from only some features of my data set, I can do something like this:
subset = df[1:1000, c(1,5,14:18,24)]
This would take the first 1000 rows (yes, R starts on index 1), and it would take the 1st, 5th, 14th through 18th, and 24th columns.
I have tried to do any combination of slice, range, and similar sorts of functions, and have not been able to duplicate this sort of flexibility. In the end, I just enumerated all of the values.
How can this be done in Python?
Pick an arbitrary subset of elements from a list, some of which are selected individually (as in the commas shown above) and some selected sequentially (as in the colons shown above)?
In its index_tricks module, NumPy defines a class instance, r_, that converts scalars and slices into an enumerated array:
In [560]: np.r_[1,5,14:18,24]
Out[560]: array([ 1, 5, 14, 15, 16, 17, 24])
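It's an instance with a __getitem__ method, so it uses the indexing syntax. It expands 14:18 into np.arange(14,18). A slice with an imaginary step, such as 0:1:5j, is expanded with np.linspace instead.
So I think you'd rewrite
subset = df[1:1000, c(1,5,14:18,24)]
as follows (R's 14:18 includes both endpoints, so the zero-based Python slice is 13:18):
df.iloc[:1000, np.r_[0,4,13:18,23]]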
You can use iloc for integer indexing in pandas (Python 2 syntax):
df.iloc[0:1000, [0, 4] + range(13,18) + [23]]
As commented by @root, in Python 3 you need to explicitly convert range() to a list: df.iloc[0:1000, [0, 4] + list(range(13,18)) + [23]]
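Try this. The first set of square brackets selects the columns (by label, so this assumes the columns are labeled with integers); the second set slices the rows. In Python 3, range() must be wrapped in list():
df[[0,4] + list(range(13,18)) + [23]][:1000]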
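For the plain-list half of the question, a standard-library sketch: itertools.chain can stitch individual positions and ranges together, and operator.itemgetter can pull those positions out of a list in one call. The sample list data and the variable names are hypothetical, chosen only for illustration.

```python
from itertools import chain
from operator import itemgetter

data = list("abcdefghijklmnopqrstuvwxyz")  # hypothetical sample list

# Zero-based equivalent of R's c(1, 5, 14:18, 24).
positions = list(chain([0, 4], range(13, 18), [23]))
print(positions)  # [0, 4, 13, 14, 15, 16, 17, 23]

# itemgetter with several positions returns a tuple of the selected items.
subset = list(itemgetter(*positions)(data))
print(subset)  # ['a', 'e', 'n', 'o', 'p', 'q', 'r', 'x']
```

Note that itemgetter returns a bare item rather than a tuple when given a single position, so this pattern is best kept for two or more positions.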

Easier way to produce a list out of a nested list?

Using some numerical algorithm, my code produces a list of matrices, which is stored in a nested list as follows:
A = [matrix([[1,2],[3,4]]), matrix([[5,6],[7,8]]), ...]
Subsequently, I want to plot the values 1,5,9,... against some other list, say 'x', with the same length. At the moment I loop over the values I want like this:
wanted_sol = []
for i in range(0,len(A)):
    wanted_sol.append(A[i][0,0])
and then I plot 'wanted_sol'. I was wondering if there is a shorter way to do this? I tried several things like
plot(x, A[:][0,0])
plot(x, A[0:len(A)][0,0]),
but I cannot get it to work.
You can convert A to numpy.ndarray and use numpy slice notation:
>>> A = np.array([np.matrix([[1,2],[3,4]]), np.matrix([[5,6],[7,8]])])
>>> A[:,0,0]
array([1, 5])

Finding the dimension of a nested list in python

We can create multi-dimensional arrays in python by using nested list, such as:
A = [[1,2,3],
[2,1,3]]
etc.
In this case, it is simply nRows = len(A) and nCols = len(A[0]). However, with more than three dimensions it becomes complicated.
A = [[[1,1,[1,2,3,4]],2,[3,[2,[3,4]]]],
[2,1,3]]
etc.
These lists are legal in Python, and the number of dimensions is not known a priori.
In this case, how can I determine the number of dimensions and the number of elements in each dimension?
I'm looking for an algorithm and if possible implementation. I believe it has something similar to DFS. Any suggestions?
P.S.: I'm not looking for any existing packages, though I would like to know about them.
I believe I have solved the problem myself.
It is just a simple DFS.
For the example given above: A = [[[1,1,[1,2,3,4]],2,[3,[2,[3,4]]]],
[2,1,3]]
the answer is as follows:
[[3, 2, 2, 2, 3, 4], [3]]
The total number of dimensions is 7.
I guess I was overthinking... thanks anyway!
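The answer does not include its code, so the following is only a guess at what such a DFS might look like. It is a stack-based traversal that records the length of every list it visits, and it happens to reproduce the output quoted above. The function name nested_lengths is made up for illustration.

```python
def nested_lengths(lst):
    """Return the lengths of `lst` and of every list nested inside it (stack-based DFS)."""
    lengths = []
    stack = [lst]
    while stack:
        current = stack.pop()
        lengths.append(len(current))
        # Only list items are descended into; scalars are leaves.
        stack.extend(item for item in current if isinstance(item, list))
    return lengths


A = [[[1, 1, [1, 2, 3, 4]], 2, [3, [2, [3, 4]]]],
     [2, 1, 3]]

# One result list per top-level element of A.
per_branch = [nested_lengths(sub) for sub in A]
print(per_branch)  # [[3, 2, 2, 2, 3, 4], [3]]

total = sum(len(branch) for branch in per_branch)
print(total)  # 7
```

Under this reading, the 7 counts the lists nested inside A rather than a depth; the order within each branch depends on the stack popping the most recently pushed sublist first.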
