List of lists equivalent for numpy 2D indexing [:,0] - python

I need to access the first element of each list within a list. Usually I do this via numpy arrays by indexing:
import numpy as np
nparr=np.array([[1,2,3],[4,5,6],[7,8,9]])
first_elements = nparr[:,0]
b/c:
print(nparr[0,:])
[1 2 3]
print(nparr[:,0])
[1 4 7]
Unfortunately I have to tackle non-rectangular dynamic arrays now so numpy won't work.
But Pythons standard lists behave strangely (at least for me):
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0][:])
[1, 2, 3]
print(pylist[:][0])
[1, 2, 3]
I guess either lists doesn't support this (which would lead to a second question: What to use instead) or I got the syntax wrong?

You have a few options. Here's one.
pylist=[[1,2,3],[4,5,6],[7,8,9]]
print(pylist[0]) # [1, 2, 3]
print([row[0] for row in pylist]) # [1, 4, 7]
Alternatively, if you want to transpose pylist (make its rows into columns), you could do the following.
pylist_transpose = [*zip(*pylist)]
print(pylist_transpose[0]) # [1, 4, 7]
pylist_transpose will always be a rectangular array with a number of rows equal to the length of the shortest row in pylist.

Related

Python: Obtain a matrix containing all differences between all elements from another matrix

I need to calculate all differences (in absolute value) between all combinations of the elements from a given matrix. For example, given some matrix M such as
M = [[1 2]
[5 8]]
I need to obtain a matrix X defined as
X = [[1 4 7]
[1, 3, 6]
[4, 3, 3]
[7, 6, 3]]
where I associated each row to each element in M and each column with its substraction from every other element (except with itself). I've been trying to make some for cicles, but I haven't been able to get something close to what I want.
In my code, I used numpy, thus defining all the matrices as
M = np.zeros([nx, ny])
and then replacing the values as the code progresses such as
M[i, j] = 5
While this can be easily done by looping over elements, here is a faster and more pythonic way.
import numpy as np
M = np.array([[1,2],[5,8]])
# flatten M matrix
flatM = M.reshape(-1)
# get all pairwise differences
X = flatM[:,None] - flatM[None,:]
# remove diagonal elements, since you don't want differences between same elements
mask = np.where(~np.eye(X.shape[0],dtype=bool))
X = X[mask]
# reshape X into desired form
X = X.reshape(len(flatM),-1)
# take absolute values
X = np.abs(X)
print(X)
Removal of diagonal elements was done using approach suggested here How to get indices of non-diagonal elements of a numpy array?
Sensibly the same approach as the very good answer of #wizzzz1, but with a simpler syntax and faster execution:
a = M.ravel()
X = (abs(a-a[:, None])
[~np.eye(M.size, dtype=bool)]
.reshape(-1, M.size-1)
)
Output:
array([[1, 4, 7],
[1, 3, 6],
[4, 3, 3],
[7, 6, 3]])

Numpy array indexing syntax

I am learning numpy newly and confused about syntax used in indexing of arrays. For example:
arr[2, 3]
This means element at intersection of 3nd row and 4th column. What confuses me separation of different indices by comma inside square brackets (like in function arguments). Doing so with python lists is not valid:
l = [[1, 2], [3, 4]]
l[1, 1]
Traceback (most recent call last):
File "", line 1, in
TypeError: list indices must be integers or slices, not tuple
So, if this not a valid python syntax, how numpy arrays work?
Use Colon ':' instead of commas ','.
In slicing or indexing is done using colon ':'
In your above example,
l = [[1, 2], [3, 4]]
->l[0] is [1,2] and -> l[1] is [3,4]
Read further documentation for better understanding.
Thank You
In your given example, you're comparing a numpy array to a list of lists. The main difference between the two is that a numpy array is predictable in terms of shape, data type of its elements, and so on, while a list can contain an arbitrary combination of any other python objects (lists, tuples, strings, etc.)
Take this as an example, say you create a numpy array like so:
arr = np.array([[0, 1], [2, 3], [4, 5]])
Here, the shape of arr is known right after instantiation "arr.shape returns (3,2)", so you can easily index the array with only a comma separated square bracket. On the other hand, take the list example:
l = [[0, 1], [2, 3], [4, 5]]
l[0] # This returns the list [0, 1]
l[0].append("HELLO")
l[0] # This returns the list [0, 1, "HELLO"]
A list is very unpredictable, as there's no way to know what each list element will return to you. So, the way we index a specific element in a list of lists is by using 2 square brackets "e.g. l[0][0]"
What if we created a non-uniform numpy array? Well, you get a similar behaviour to a list of lists:
arr = np.array([[0, 1], [2, 3], [4]]) # Here, you get a Warning!
print(arr) # Returns: array([list([0, 1]), list([2, 3]), list([4])], dtype=object)
In this case, you can't index the numpy array using [0, 0]. Instead, you have to use two square brackets, just like a list of lists
You can also check the documentation of ndarray for more info.

array[row][col] vs array[row,col] in Python

What is the difference between indexing a 2D array row/col with [row][col] vs [row, col] in numpy/pandas? Is there any implications of using either of these two?
For example:
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(arr[1][0])
print(arr[1, 0])
Both give 3.
Single-element indexing
For single elements indexing as in your example, the result is indeed the same. Although as stated in the docs:
So note that x[0,2] = x[0][2] though the second case is more
inefficient as a new temporary array is created after the first index
that is subsequently indexed by 2.
emphasis mine
Array indexing
In this case, not only that double-indexing is less efficient - it simply gives different results. Let's look at an example:
>>> arr = np.array([[1, 2], [3, 4], [5, 6]])
>>> arr[1:][0]
[3 4]
>>> arr[1:, 0]
[3 5]
In the first case, we create a new array after the first index which is all rows from index 1 onwards:
>>> arr[1:]
[[3 4]
[5 6]]
Then we simply take the first element of that new array which is [3 4].
In the second case, we use numpy indexing which doesn't index the elements but indexes the dimensions. So instead of taking the first row, it is actually taking the first column - [3 5].
Using [row][col] is one more function call than using [row, col]. When you are indexing an array (in fact, any object, for that matter), you are calling obj.__getitem__ under the hook. Since Python wraps the comma in a tuple, doing obj[row][col] is the equivalent of calling obj.__getitem__(row).__getitem__(col), whereas obj[row, col] is simply obj.__getitem__((row,col)). Therefore, indexing with [row, col] is more efficient because it has one fewer function call (plus some namespace lookups but they can normally be ignored).

Python: Complex for-loops

I am working through some code trying to understand some Python mechanics, which I just do not get. I guess it is pretty simple and I also now, what it does, but i do not know how it works. I understand the normal use of for-loops but this here... I do not know.
Remark: I know some Python, but I am not an expert.
np.array([[[S[i,j]] for i in range(order+1)] for j in range(order+1)])
The second piece of code, I have problems with is this one:
for i in range(len(u)):
for j in range(len(v)):
tmp+=[rm[i,j][k]*someFuction(name,u[i],v[j])[k] for k in range(len(rm[i,j])) if rm[i,j][k]]
How does the innermost for-loop work? And also what does the if do here?
Thank you for your help.
EDIT: Sorry that the code is so unreadable, I just try to understand it myself. S, rm are numpy matrices, someFunction returns an array with scalar entries, andtmp is just a help variable
There are quite a few different concepts inside your code. Let's start with the most basic ones. Python lists and numpy arrays have different methodologies for indexation. Also you can build a numpy array by providing it a list:
S_list = [[1,2,3], [4,5,6], [7,8,9]]
S_array = np.array(S_list)
print(S_list)
print(S_array)
print(S_list[0][2]) # indexing element 2 from list 0
print(S_array[0,2]) # indexing element at position 0,2 of 2-dimensional array
This results in:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
[[1 2 3]
[4 5 6]
[7 8 9]]
3
3
So for your first line of code:
np.array([[[S[i,j]] for i in range(order+1)] for j in range(order+1)])
You are building a numpy array by providing it a list. This list is being built with the concept of list comprehension. So the code inside the np.array(...) method:
[[[S[i,j]] for i in range(order+1)] for j in range(order+1)]
... is equivalent to:
order = 2
full_list = []
for j in range(order+1):
local_list = []
for i in range(order+1):
local_list.append(S_array[i, j])
full_list.append(local_list)
print(full_list)
This results in:
[[1, 4, 7], [2, 5, 8], [3, 6, 9]]
As for your second snippet its important to notice that although typically numpy arrays have very specific and constant (for all the array) cell types you can actually give the data type object to a numpy array. So creating a 2-dimensional array of lists is possible. It is also possible to create a 3-dimensional array. Both are compatible with the indexation rm[i,j][k]. You can check this in the following example:
rm = np.array(["A", 3, [1,2,3]], dtype="object")
print(rm, rm[2][0]) # Acessing element 0 of list at position 2 of the array
rm2 = np.zeros((3, 3, 3))
print(rm2[0, 1][2]) # This is also valid
The following code:
[rm[i,j][k]*someFuction(name,u[i],v[j])[k] for k in range(len(rm[i,j])) if rm[i,j][k]]
... could be written as such:
some_list = []
for k in range(len(rm[i,j])):
if rm[i, j][k]: # Expecting a boolean value (or comparable)
a_list = rm[i,j][k]*someFuction(name,u[i],v[j])
some_list.append(a_list[k])
The final detail is the tmp+=some_list. When you sum two list they'll be concatenated as can been seen in this simple example:
tmp = []
tmp += [1, 2, 3]
print(tmp)
tmp += [4, 5, 6]
print(tmp)
Which results in this:
[1, 2, 3]
[1, 2, 3, 4, 5, 6]
Also notice that multiplying a list by a number will effectively be the same as summing the list several times. So 2*[1,2] will result in [1,2,1,2].
Its a list comprehension, albeit a pretty unreadable one. That was someome doing something very 'pythonic' in spite of readablity. Just look up list comprehensions and try to rewrite it yourself as a traditional for loop. list comprehensions are very useful, not sure I would have gone that route here.
The syntax for a list comprehension is
[var for var in iterable if optional condition]
So this bottom line can be rewritten like so:
for k in range(len(rm[i,j]):
if rm[i,j][k]:
tmp+= rm[i,j][k]*someFunction(name,u[i],v[j])[k]

Index a numpy array with another array

I feel silly, because this is such a simple thing, but I haven't found the answer either here or anywhere else.
Is there no straightforward way of indexing a numpy array with another?
Say I have a 2D array
>> A = np.asarray([[1, 2], [3, 4], [5, 6], [7, 8]])
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
if I want to access element [3,1] I type
>> A[3,1]
8
Now, say I store this index in an array
>> ind = np.array([3,1])
and try using the index this time:
>> A[ind]
array([[7, 8],
[3, 4]])
the result is not A[3,1]
The question is: having arrays A and ind, what is the simplest way to obtain A[3,1]?
Just use a tuple:
>>> A[(3, 1)]
8
>>> A[tuple(ind)]
8
The A[] actually calls the special method __getitem__:
>>> A.__getitem__((3, 1))
8
and using a comma creates a tuple:
>>> 3, 1
(3, 1)
Putting these two basic Python principles together solves your problem.
You can store your index in a tuple in the first place, if you don't need NumPy array features for it.
That is because by giving an array you actually ask
A[[3,1]]
Which gives the third and first index of the 2d array instead of the first index of the third index of the array as you want.
You can use
A[ind[0],ind[1]]
You can also use (if you want more indexes at the same time);
A[indx,indy]
Where indx and indy are numpy arrays of indexes for the first and second dimension accordingly.
See here for all possible indexing methods for numpy arrays: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.indexing.html

Categories