How to fix IndexError from numpy?

import numpy as np
from scipy.io import mmread
from scipy import linalg
A = mmread('bcspwr02.mtx')
A =np.transpose(A)+A+np.identity(A.shape[0])
#A = np.array([[20, 18, 1], [2, 3, 1], [1, 2, 1]])
def get_b(A):
    n = A.shape[0]
    b = np.ones(n)
    return b

def Jacobi(A, b, numIter):
    n = A.shape[0]
    x = np.zeros(n)
    x0 = np.zeros(n)
    for numItr in range(numIter):
        print("Iteration " + str(numItr) + ": " + str(x))
        for i in range(len(A)):
            temp = 0
            for j in range(len(A)):
                if i != j:
                    temp = x0[j] * A[i][j]
                    x[i] = float((b[i] - temp) / A[i][i])
                else:
                    x0 = x.copy()
numIter = 4
Jacobi(A, get_b(A), numIter)
Result:
Iteration 0: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0.]
Traceback (most recent call last):
File "/Users/cxf/Desktop/test.py", line 36, in <module>
Jacobi(A, get_b(A), numIter)
File "/Users/cxf/Desktop/test.py", line 29, in Jacobi
temp = x0[j] * A[i][j]
File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/numpy/matrixlib/defmatrix.py", line 193, in __getitem__
out = N.ndarray.__getitem__(self, index)
IndexError: index 1 is out of bounds for axis 0 with size 1

What exactly does mmread return?
A = mmread('bcspwr02.mtx')
A =np.transpose(A)+A+np.identity(A.shape[0])
The docs say "Dense or sparse matrix depending on the matrix format in the Matrix Market file."
Let's experiment with a sparse matrix:
In [52]: A = sparse.coo_matrix([[1,0,1],[0,0,1],[0,1,0]])
In [53]: A
Out[53]:
<3x3 sparse matrix of type '<class 'numpy.int64'>'
with 4 stored elements in COOrdinate format>
In [54]: A.A
Out[54]:
array([[1, 0, 1],
[0, 0, 1],
[0, 1, 0]])
In [58]: A1 = np.transpose(A)+A+np.identity(A.shape[0])
In [59]: A1
Out[59]:
matrix([[3., 0., 1.],
[0., 1., 2.],
[1., 2., 1.]])
In [60]: A1[0]
Out[60]: matrix([[3., 0., 1.]]) # shape (1,3)
In [61]: A1[0][0]
Out[61]: matrix([[3., 0., 1.]]) # still (1,3)
In [62]: A1[0][1]
Traceback (most recent call last):
File "<ipython-input-62-c6007014201d>", line 1, in <module>
A1[0][1]
File "/usr/local/lib/python3.8/dist-packages/numpy/matrixlib/defmatrix.py", line 193, in __getitem__
out = N.ndarray.__getitem__(self, index)
IndexError: index 1 is out of bounds for axis 0 with size 1
If A is a COO sparse matrix, then the transpose expression creates an np.matrix. A1[i][j] indexing does not work the same way as for a regular numpy array. Instead you need to use the A1[i, j] syntax.
In [63]: A1[0,1]
Out[63]: 0.0
Note that the traceback tells me the error is in the defmatrix file. I should have read your traceback more carefully. The nature of the problem was hidden in plain sight!
Initial answer:
Evidently, in
x0[j] * A[i][j]
either j or i is too large. Why? We have to look at how they are set, and what the shapes of x0 and A are.
Try to understand the error before asking how to fix it.
With the commented-out A the shape is (3,3), so n=3. Then x0 has shape (3,), and i and j iterate over range(3). With those shapes
x0[j] * A[i][j]
x0[j] * A[i,j]  # better
should work.
But the error says one of the arrays has shape (1,?) or (1,).
You need to check the array shapes. Don't just assume the shapes are right; when there's an error, you must verify them.
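A simple way to sidestep the np.matrix pitfall entirely (a sketch, not part of the original answer) is to convert the result to a plain 2-D ndarray before indexing:

import numpy as np
from scipy.io import mmread

A = mmread('bcspwr02.mtx')                          # coo_matrix for a sparse Matrix Market file
A = np.transpose(A) + A + np.identity(A.shape[0])   # as shown above, this yields an np.matrix
A = np.asarray(A)                                   # convert to a plain 2-D ndarray
# Now A[i][j] works, and the preferred A[i, j] works as well.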


How to vectorize increments in Python

I have a 2d array, and I have some numbers to add to some cells. I want to vectorize the operation in order to save time. The problem is when I need to add several numbers to the same cell. In this case, the vectorized code only adds the last.
'a' is my array, 'x' and 'y' are the coordinates of the cells I want to increment, and 'z' contains the numbers I want to add.
import numpy as np
a=np.zeros((4,4))
x=[1,2,1]
y=[0,1,0]
z=[2,3,1]
a[x,y]+=z
print(a)
As you see, a[1,0] should be incremented twice: once by 2 and once by 1. So the expected array should be:
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
but instead I get:
[[0. 0. 0. 0.]
[1. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
The problem would be easy to solve with a for loop, but I wonder if I can correctly vectorize this operation.
Use np.add.at for that:
import numpy as np
a = np.zeros((4,4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
np.add.at(a, (x, y), z)
print(a)
# [[0. 0. 0. 0.]
# [3. 0. 0. 0.]
# [0. 3. 0. 0.]
# [0. 0. 0. 0.]]
When you're doing a[x,y] += z, we can decompose the operation as:
a[1, 0], a[2, 1], a[1, 0] = [a[1, 0] + 2, a[2, 1] + 3, a[1, 0] + 1]
# Since a starts as all zeros, this is equivalent to:
a[1, 0] = 2
a[2, 1] = 3
a[1, 0] = 1
The right-hand side is evaluated once with the original values of a, so the two writes to a[1, 0] are not accumulated and only the last one survives. That's why it doesn't work.
But if you increment the array with a plain loop over the indices, it does work, as sketched below.
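A minimal sketch of that loop-based approach (assuming the same a, x, y, z as above):

import numpy as np
a = np.zeros((4, 4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
for xi, yi, zi in zip(x, y, z):
    a[xi, yi] += zi      # each duplicate index gets its own increment
print(a)
# [[0. 0. 0. 0.]
#  [3. 0. 0. 0.]
#  [0. 3. 0. 0.]
#  [0. 0. 0. 0.]]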
You could create a three-dimensional array of size 3x4x4, add z across the three separate slices, and then sum them along the first axis:
import numpy as np
x = [1,2,1]
y = [0,1,0]
z = [2,3,1]
a = np.zeros((3,4,4))
n = range(a.shape[0])
a[n,x,y] += z
print(sum(a))
which will result in
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
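Note that the builtin sum(a) adds up the three 4x4 slices along the first axis; the numpy-native equivalent (a small aside, not part of the original answer) is:

print(a.sum(axis=0))   # same 4x4 result as sum(a)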
Approach #1: Bincount-based method for performance
We can use np.bincount for efficient bin-based summation; this is basically inspired by this post -
def accumulate_arr(x, y, z, out):
    # Get output array shape
    shp = out.shape
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index((x, y), shp)
    # Or lidx = coords[0]*(coords[1].max()+1) + coords[1]
    # Accumulate arr with IDs from lidx
    out += np.bincount(lidx, z, minlength=out.size).reshape(out.shape)
    return out
If you are working with a zeros-initialized output array, you can instead pass the output shape into the function and use the bincount result directly as the final output.
Output on given sample -
In [48]: accumulate_arr(x,y,z,a)
Out[48]:
array([[0., 0., 0., 0.],
[3., 0., 0., 0.],
[0., 3., 0., 0.],
[0., 0., 0., 0.]])
Approach #2: Using sparse-matrix for memory-efficiency
In [54]: from scipy.sparse import coo_matrix
In [56]: coo_matrix((z,(x,y)), shape=(4,4)).toarray()
Out[56]:
array([[0, 0, 0, 0],
[3, 0, 0, 0],
[0, 3, 0, 0],
[0, 0, 0, 0]])
If you are okay with a sparse-matrix, skip the .toarray() part for a memory-efficient solution.

What does the rcond parameter of numpy.linalg.pinv do?

While looking up how to calculate pseudo-inverses in numpy (1.15.4) I noticed that numpy.linalg.pinv has a parameter rcond for which the description reads:
rcond : (…) array_like of float
Cutoff for small singular values. Singular values smaller (in
modulus) than rcond * largest_singular_value (again, in modulus)
are set to zero. Broadcasts against the stack of matrices
From my understanding, if rcond is a scalar float, all entries in the output of pinv that would have been smaller than rcond should be set to zero instead (which would be really useful), but this is not what happens, e.g.:
>>> A = np.array([[ 0., 0.3, 1., 0.],
[ 0., 0.4, -0.3, 0.],
[ 0., 1., -0.1, 0.]])
>>> np.linalg.pinv(A, rcond=1e-3)
array([[ 8.31963531e-17, -4.52584594e-17, -5.09901252e-17],
[ 1.82668420e-01, 3.39032588e-01, 8.09586439e-01],
[ 8.95805933e-01, -2.97384188e-01, -1.49788105e-01],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])
What does this parameter actually do? And can I only get the behaviour I actually want by iterating over the whole output matrix again?
Under the hood, a pseudoinverse is calculated using a singular value decomposition. An initial matrix A=UDV^T is inverted as A^+=VD^+U^T, where D is a diagonal matrix with positive real values (singular values). rcond is used to zero out small entries in D. For example:
import numpy as np
# Initial matrix
a = np.array([[1, 0],
[0, 0.1]])
# SVD with diagonal entries in D = [1. , 0.1]
print(np.linalg.svd(a))
# (array([[1., 0.],
# [0., 1.]]),
# array([1. , 0.1]),
# array([[1., 0.],
# [0., 1.]]))
# Pseudoinverse
c = np.linalg.pinv(a)
print(c)
# [[ 1. 0.]
# [ 0. 10.]]
# Reconstruction is perfect
print(np.dot(a, np.dot(c, a)))
# [[1. 0. ]
# [0. 0.1]]
# Zero out all entries in D below rcond * largest_singular_value = 0.2 * 1
# Not entries of the initial or inverse matrices!
d = np.linalg.pinv(a, rcond=0.2)
print(d)
# [[1. 0.]
# [0. 0.]]
# Reconstruction is imperfect
print(np.dot(a, np.dot(d, a)))
# [[1. 0.]
# [0. 0.]]
To just zero out small values of a matrix:
a = np.array([[1, 2],
[3, 0.1]])
a[a < 0.5] = 0
print(a)
# [[1. 2.]
# [3. 0.]]
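For completeness, here's a rough sketch (not from the original answer) of what pinv does with rcond, written out with an explicit SVD:

import numpy as np

def pinv_sketch(a, rcond=1e-15):
    # SVD: a = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Zero out singular values below rcond * largest_singular_value
    cutoff = rcond * s.max()
    s_inv = np.zeros_like(s)
    keep = s > cutoff
    s_inv[keep] = 1.0 / s[keep]
    # Pseudoinverse: v @ diag(1/s) @ u^T
    return vt.T @ np.diag(s_inv) @ u.T

a = np.array([[1, 0],
              [0, 0.1]])
print(pinv_sketch(a, rcond=0.2))
# [[1. 0.]
#  [0. 0.]]   -- matches np.linalg.pinv(a, rcond=0.2) above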

numpy: multiply arbitrary shape array along first axis

I want to multiply an array along its first axis by some vector.
For instance, if a is 2D, b is 1D, and a.shape[0] == b.shape[0], we can do:
a *= b[:, np.newaxis]
What if a has an arbitrary shape? In numpy, the ellipsis "..." can be interpreted as "fill the remaining indices with ':'". Is there an equivalent for filling the remaining axes with None/np.newaxis?
The code below generates the desired result, but I would prefer a general vectorized way to accomplish this without falling back to a for loop.
from __future__ import print_function
import numpy as np
def foo(a, b):
    """
    Multiply a along its first axis by b
    """
    if len(a.shape) == 1:
        a *= b
    elif len(a.shape) == 2:
        a *= b[:, np.newaxis]
    elif len(a.shape) == 3:
        a *= b[:, np.newaxis, np.newaxis]
    else:
        n = a.shape[0]
        for i in range(n):
            a[i, ...] *= b[i]
n = 10
b = np.arange(n)
a = np.ones((n, 3))
foo(a, b)
print(a)
a = np.ones((n, 3, 3))
foo(a, b)
print(a)
Just reverse the order of the axes:
transpose = a.T
transpose *= b
a.T is a transposed view of a, where "transposed" means reversing the order of the dimensions for arbitrary-dimensional a. We assign a.T to a separate variable so the *= doesn't try to set the a.T attribute; the results still apply to a, since the transpose is a view.
Demo:
In [55]: a = numpy.ones((2, 2, 3))
In [56]: a
Out[56]:
array([[[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.]]])
In [57]: transpose = a.T
In [58]: transpose *= [2, 3]
In [59]: a
Out[59]:
array([[[2., 2., 2.],
[2., 2., 2.]],
[[3., 3., 3.],
[3., 3., 3.]]])
Following the idea of the accepted answer, you could skip the variable assignment to the transpose as follows:
arr = np.tile(np.arange(10, dtype=float), 3).reshape(3, 10)
print(arr)
factors = np.array([0.1, 1, 10])
arr.T[:, :] *= factors
print(arr)
Which would print
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
[[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. ]
[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90. ]]
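Another option (a sketch, not from either answer above) is to reshape b so that it broadcasts against the first axis of a, which also answers the "fill the remaining axes with np.newaxis" part of the question directly:

import numpy as np

def scale_first_axis(a, b):
    # Append one length-1 axis to b for every trailing dimension of a,
    # so b broadcasts along a's first axis.
    a *= b.reshape(b.shape[0], *([1] * (a.ndim - 1)))

n = 10
b = np.arange(n)
a = np.ones((n, 3, 3))
scale_first_axis(a, b)
print(a[2, 0, 0], a[9, 0, 0])   # 2.0 9.0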

How to initialise a Numpy array of numpy arrays

I have a numpy array D of dimensions 4x4
I want a new numpy array based on an user defined value v
If v=2, the new numpy array should be [D D].
If v=3, the new numpy array should be [D D D]
How do I initialise such a numpy array, as numpy.zeros(v) doesn't allow me to place arrays as elements?
If I understand correctly, you want to take a 2D array and tile it v times in the first dimension? You can use np.repeat:
# a 2D array
D = np.arange(4).reshape(2, 2)
print D
# [[0 1]
# [2 3]]
# tile it 3 times in the first dimension
x = np.repeat(D[None, :], 3, axis=0)
print x.shape
# (3, 2, 2)
print x
# [[[0 1]
# [2 3]]
# [[0 1]
# [2 3]]
# [[0 1]
# [2 3]]]
If you wanted the output to be kept two-dimensional, i.e. (6, 2), you could omit the [None, :] indexing (see this page for more info on numpy's broadcasting rules).
print np.repeat(D, 3, axis=0)
# [[0 1]
# [0 1]
# [0 1]
# [2 3]
# [2 3]
# [2 3]]
Another alternative is np.tile, which behaves slightly differently in that it will always tile over the last dimension:
print np.tile(D, 3)
# [[0, 1, 0, 1, 0, 1],
# [2, 3, 2, 3, 2, 3]])
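Note that np.tile can also produce the stacked 3-D result directly if you pass a reps tuple (a small addition, not part of the original answer):

import numpy as np
D = np.arange(4).reshape(2, 2)
x = np.tile(D, (3, 1, 1))   # 3 copies of D along a new first axis
print(x.shape)              # (3, 2, 2)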
You can do that as follows:
import numpy as np
v = 3
x = np.array([np.zeros((4,4)) for _ in range(v)])
>>> print x
[[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]]
Here you go, see if this works for you.
import numpy as np
v = raw_input('Enter: ')
To initialize the numpy array of arrays from user input (obviously it can be whatever shape you want here):
b = np.zeros(shape=(int(v),int(v)))
I know this isn't initializing a numpy array, but since you mentioned wanting an array like [D D] when v is 2, I thought I'd throw this in as another option as well.
new_array = []
for x in range(0, int(v)):
    new_array.append(D)
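A more direct way to build the [D, D, ...] stack as a real numpy array (a sketch, not from the answers above) is np.stack:

import numpy as np
D = np.zeros((4, 4))
v = 3
x = np.stack([D] * v)   # stacks v copies of D along a new first axis
print(x.shape)          # (3, 4, 4)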

Arrays to Matrix numpy

I have a function that is giving multiple arrays and I need to put these into a matrix.
def equations(specie, elements):
    for x in specie:
        formula = parse_formula(x)
        print extracting_columns(formula, elements)
What I'm getting:
equations(['OH', 'CO2','C3O3','H2O3','CO','C3H1'], ['H', 'C', 'O'])
[ 1. 0. 1.]
[ 0. 1. 2.]
[ 0. 3. 3.]
[ 2. 0. 3.]
[ 0. 1. 1.]
[ 1. 3. 0.]
I need it to give me:
[[ 1.  0.  1.]
 [ 0.  1.  2.]
 [ 0.  3.  3.]
 [ 2.  0.  3.]
 [ 0.  1.  1.]
 [ 1.  3.  0.]]
I have been messing with this for a while and can't figure it out.
If you need my past functions they are below:
def extracting_columns(specie, elements):
    species_vector = zeros(len(elements))
    for (el, mul) in specie:
        species_vector[elements.index(el)] = mul
    return species_vector
Instead of printing out each row, collect them into a list (e.g. result):
def equations(specie, elements):
    result = []
    for x in specie:
        formula = parse_formula(x)
        result.append(extracting_columns(formula, elements))
    return np.array(result)
For example,
import numpy as np
import re
def equations(specie, elements):
    result = []
    for x in specie:
        formula = parse_formula(x)
        result.append(extracting_columns(formula, elements))
    return np.array(result)

def extracting_columns(formula, elements):
    return [formula.get(e, 0) for e in elements]

def parse_formula(formula):
    elts = iter(re.split(r'([A-Z][a-z]*)', formula)[1:])
    return {element: toint(num) for element, num in zip(*[elts]*2)}

def toint(num):
    try:
        return int(num)
    except ValueError:
        return 1
print(equations(['OH', 'CO2','C3O3','H2O3','CO','C3H1'], ['H', 'C', 'O']))
yields
[[1 0 1]
[0 1 2]
[0 3 3]
[2 0 3]
[0 1 1]
[1 3 0]]
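If the helper returns plain Python lists (as extracting_columns does here), np.vstack is an equally valid way to build the final matrix (a sketch, not part of the original answer):

import numpy as np
rows = [[1, 0, 1], [0, 1, 2], [0, 3, 3]]   # e.g. per-species rows from extracting_columns
M = np.vstack(rows)
print(M.shape)   # (3, 3)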
