I'm trying to learn how to work with NumPy arrays in Python, and I'm working on a task where the goal is to append values from a square function to a NumPy array.
To be specific, I'm trying to append to the array so that the result looks like this:
[[0, 0], [1, 1], [2, 4], [3, 9], [4, 16], [5, 25], ...]
In other words, much like using a for loop to append to a nested list:
N = 101
def f(x):
    return x**2
list1 = []
for i in range(N+1):
    list1.append([i])
    list1[i].append(f(i))
print(list1)
When I try to do the same thing with NumPy arrays, like below:
import numpy as np
N = 101
x_min = 1
x_max = 10
y = np.zeros(N)
x = np.linspace(x_min,x_max, N)
def f(x):
    return x**2
for i in y:
    np.append(y,f(x))
print(y)
I get the following output:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0.]
... which is obviously wrong
Arrays as a datatype are quite new to me, so I would massively appreciate it if anyone could help me out.
Best regards from a rookie who is motivated to learn and welcomes all help.
It is kind of un-numpy-thonic (if that's a thing) to mix and match NumPy arrays with vanilla Python operations like for loops and appending. If I were to do this in pure NumPy, I would first start with your original array:
>>> import numpy as np
>>> N = 101
>>> values = np.arange(N)
>>> values
array([ 0, 1, 2, ..., 99, 100])
Then I would generate your squared array and stack it with the original to create your 2D result:
>>> values = np.array([values, values**2])
>>> values.T
array([[ 0, 0],
[ 1, 1],
[ 2, 4],
...
[ 98, 9604],
[ 99, 9801],
[ 100, 10000]])
Numpy gets its speed advantages in two primary ways:
Faster execution of large numbers of repeated operations (i.e. without Python for loops)
Avoiding moving data in memory (i.e. re-allocating memory space).
It's impossible to implement an indefinite append operation with Numpy arrays and still get both of these advantages. So don't do it!
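This is also why the loop in the question printed only zeros: np.append never modifies its argument. It allocates and returns a new, larger array, which the loop then discards. Even writing y = np.append(y, ...) inside the loop would copy the whole array on every iteration, which is exactly the re-allocation cost mentioned above. A minimal demonstration (the array sizes are arbitrary):

```python
import numpy as np

y = np.zeros(3)
result = np.append(y, [1.0, 4.0])  # builds and returns a brand-new array

print(y)       # [0. 0. 0.]  (the original is unchanged)
print(result)  # [0. 0. 0. 1. 4.]
```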
I can't see in your example why an append is necessary because you know the size of the result array in advance (N).
Perhaps what you are looking for instead is vectorized function execution and assignment:
y[:] = f(x)
print(y)
(Instead of your for loop.)
This produces:
[ 1. 1.1881 1.3924 1.6129 1.8496 2.1025 2.3716 2.6569
2.9584 3.2761 3.61 3.9601 4.3264 4.7089 5.1076 5.5225
5.9536 6.4009 6.8644 7.3441 7.84 8.3521 8.8804 9.4249
9.9856 10.5625 11.1556 11.7649 12.3904 13.0321 13.69 14.3641
...etc.
Or, to get a similar output to your first bit of code:
y = np.zeros((N, 2))
y[:, 0] = x
y[:, 1] = f(x)
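Put together with the setup from the question, a runnable sketch of this preallocate-then-fill approach could look like the following. N, x_min, x_max and f are taken from the question; no append is needed because the final shape (N, 2) is known up front.

```python
import numpy as np

N = 101
x_min, x_max = 1, 10

def f(x):
    return x**2

x = np.linspace(x_min, x_max, N)

# Allocate the final shape once, then fill whole columns with vectorized operations.
y = np.zeros((N, 2))
y[:, 0] = x      # first column: the x values
y[:, 1] = f(x)   # second column: f applied to every x at once

print(y[:3])
```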
You could simply broadcast the operation and combine the columns with np.column_stack:
import numpy as np
N = 101
col1 = np.arange(N)
col2 = col1**2
list1 = np.column_stack((col1, col2))
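To check that the vectorized versions really reproduce the nested-list loop from the question, here is a small comparison. It uses range(N) rather than the question's range(N+1) so that all three results have exactly N rows; that choice is an assumption made only for this comparison.

```python
import numpy as np

N = 101

# Loop-and-append version, in the spirit of the question (plain Python lists).
nested = []
for i in range(N):
    nested.append([i, i**2])

# Vectorized versions from the answers above.
col1 = np.arange(N)
stacked = np.column_stack((col1, col1**2))
transposed = np.array([col1, col1**2]).T

print(np.array_equal(stacked, nested), np.array_equal(stacked, transposed))  # True True
```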
Can someone please help me generate a weighted adjacency matrix from a NumPy array based on the Euclidean distance between all pairs of rows, i.e. rows 0 and 1, 0 and 2, ..., 1 and 2, ...?
Given the following example input matrix of shape (5, 4):
matrix = [[2,10,9,6],
[5,1,4,7],
[3,2,1,0],
[10, 20, 1, 4],
[17, 3, 5, 18]]
I would like to obtain a (5, 5) weighted adjacency matrix that keeps only the smallest distances between nodes. For example,
if dist(row0, row1) = 10.77 and dist(row0, row2) = 12.84,
--> the output matrix keeps the first distance as a column value.
I have already solved the first part, generating the full adjacency (distance) matrix, with the following code:
from scipy.spatial.distance import cdist
dist = cdist( matrix, matrix, metric='euclidean')
and I get the following result:
array([[ 0. , 10.77032961, 12.84523258, 15.23154621, 20.83266666],
[10.77032961, 0. , 7.93725393, 20.09975124, 16.43167673],
[12.84523258, 7.93725393, 0. , 19.72308292, 23.17326045],
[15.23154621, 20.09975124, 19.72308292, 0. , 23.4520788 ],
[20.83266666, 16.43167673, 23.17326045, 23.4520788 , 0. ]])
But I don't know yet how to specify the number of neighbors, e.g. selecting 2 neighbors for each node. For example, if we define the number of neighbors N = 2, then for each row we keep only the two neighbors with the two smallest distances, and we get as a result:
[[ 0. , 10.77032961, 12.84523258, 0, 0],
[10.77032961, 0. , 7.93725393, 0, 0],
[12.84523258, 7.93725393, 0. , 0, 0],
[15.23154621, 0, 19.72308292, 0. , 0 ],
[20.83266666, 16.43167673, 0, 0 , 0. ]]
You can use this cleaner solution to get the n smallest values per row of a matrix. Try the following:
dist.argsort(1).argsort(1) computes the rank of each element within its row along axis=1 (the smallest is 0 and the largest is 4), and <= 2 sets how many of the smallest values you keep: ranks 0, 1 and 2, i.e. the zero on the diagonal plus the two nearest neighbors. np.where keeps those values and replaces everything else with 0.
np.where(dist.argsort(1).argsort(1) <= 2, dist, 0)
array([[ 0. , 10.77032961, 12.84523258, 0. , 0. ],
[10.77032961, 0. , 7.93725393, 0. , 0. ],
[12.84523258, 7.93725393, 0. , 0. , 0. ],
[15.23154621, 0. , 19.72308292, 0. , 0. ],
[20.83266666, 16.43167673, 0. , 0. , 0. ]])
This works along any axis, and for the n largest as well as the n smallest values of a matrix.
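The rank trick can be wrapped into a reusable function that takes the number of neighbors as a parameter. This is a sketch under stated assumptions: knn_adjacency is a hypothetical name, the pairwise distances are computed with NumPy broadcasting instead of cdist (it gives the same values for the Euclidean metric, and avoids the SciPy dependency), and it assumes there are no duplicate points, so the diagonal zero is the unique minimum of each row. Ties between equal distances are broken by argsort order.

```python
import numpy as np

def knn_adjacency(points, n_neighbors):
    """Keep each row's n_neighbors smallest off-diagonal distances and zero the rest.

    Hypothetical helper: Euclidean distances via broadcasting (same values as
    cdist(points, points)), then the double-argsort rank trick.
    Assumes no duplicate points, so the diagonal zero always has rank 0.
    """
    X = np.asarray(points, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # rank[i, j] is the position of dist[i, j] when row i is sorted ascending.
    rank = dist.argsort(axis=1).argsort(axis=1)
    # Rank 0 is the diagonal zero, so keep ranks 0..n_neighbors.
    return np.where(rank <= n_neighbors, dist, 0.0)

matrix = [[2, 10, 9, 6],
          [5, 1, 4, 7],
          [3, 2, 1, 0],
          [10, 20, 1, 4],
          [17, 3, 5, 18]]

adj = knn_adjacency(matrix, n_neighbors=2)
print(np.round(adj, 2))
```

Note that the result is generally not symmetric (row 3 keeps node 0, but row 0 does not keep node 3); if you need an undirected graph, you could combine it with its transpose, e.g. with np.maximum(adj, adj.T).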
Assuming a is your Euclidean distance matrix, you can use np.argpartition to choose the n smallest/largest values per row. Keep in mind that the diagonal is always 0 and Euclidean distances are non-negative, so to keep the two closest points in each row you need to keep the three smallest values per row (including the 0 on the diagonal). This does not hold if you want the largest values, however.
a[np.arange(a.shape[0])[:,None],np.argpartition(a, 3, axis=1)[:,3:]] = 0
output:
array([[ 0. , 10.77032961, 12.84523258, 0. , 0. ],
[10.77032961, 0. , 7.93725393, 0. , 0. ],
[12.84523258, 7.93725393, 0. , 0. , 0. ],
[15.23154621, 0. , 19.72308292, 0. , 0. ],
[20.83266666, 16.43167673, 0. , 0. , 0. ]])
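The same argpartition idea can be generalized to any number of kept entries and made non-destructive. In this sketch, keep_smallest_per_row is a hypothetical name, and np.put_along_axis (available in NumPy 1.15 and later) replaces the fancy-indexing assignment; it works on a copy, so the original distance matrix survives. The distance values are the cdist output from the question.

```python
import numpy as np

def keep_smallest_per_row(a, n_keep):
    """Return a copy of a in which only the n_keep smallest entries of each row survive."""
    out = np.array(a, dtype=float)  # copy, so the caller's matrix is left untouched
    if n_keep >= out.shape[1]:
        return out  # every entry is kept; argpartition would reject kth == n_cols
    # After argpartition with kth=n_keep, positions [:, :n_keep] hold the indices of
    # the n_keep smallest entries per row, and [:, n_keep:] hold all the others.
    drop = np.argpartition(out, n_keep, axis=1)[:, n_keep:]
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out

# Distance matrix from the question (cdist output).
dist = np.array([
    [0.0, 10.77032961, 12.84523258, 15.23154621, 20.83266666],
    [10.77032961, 0.0, 7.93725393, 20.09975124, 16.43167673],
    [12.84523258, 7.93725393, 0.0, 19.72308292, 23.17326045],
    [15.23154621, 20.09975124, 19.72308292, 0.0, 23.4520788],
    [20.83266666, 16.43167673, 23.17326045, 23.4520788, 0.0],
])

# Two nearest neighbors per node -> keep three entries (the diagonal zero included).
pruned = keep_smallest_per_row(dist, n_keep=3)
print(pruned)
```

argpartition only guarantees which indices fall on each side of kth, not their order, which is all this task needs and is why it can be cheaper than a full argsort on large rows.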
I want to generate a modified version of the identity matrix, call it C, such that C_kk is zero up to some index i and the rest of the diagonal is still 1.
I can use brute force to set each C_kk to 0, but I don't think that is a good approach.
Are there any efficient functions I can use? This is hard to search for.
Example below:
the original identity matrix for 3 * 3 is
1 0 0
0 1 0
0 0 1
and I want to change it into:
0 0 0
0 1 0
0 0 1
So i is 0 in this case: I want to set C_kk to 0 for k in [0, i].
np.diag makes a 2d array from a 1d diagonal:
In [97]: np.diag((np.arange(6)>2).astype(int))
Out[97]:
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1]])
Basically the same as PPanzer's, but generating the diagonal in a different way. Similar speed.
Here is one possibility:
N = 5
k = 2
np.diag(np.bincount([k],None,N).cumsum())
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]])
Update: fast solution:
out = np.zeros((N,N))
out.reshape(-1)[(N+1)*k::N+1] = 1
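The fast solution relies on where diagonal entries live in memory. Below is a sketch that wraps it in a function (identity_from is a hypothetical name) and checks it against the other approaches in this thread. Here k is the first diagonal index that stays 1, so the question's "zero for k in [0, i]" corresponds to k = i + 1; hpaulj's > 2 is written as >= k. The flat-view trick assumes a freshly created, C-contiguous array, for which reshape(-1) returns a view rather than a copy.

```python
import numpy as np

def identity_from(N, k):
    """N x N float array with 0 on the diagonal before index k and 1 from k onwards."""
    out = np.zeros((N, N))
    # In a C-contiguous N x N array, diagonal entries are N + 1 elements apart in
    # the flattened buffer, and entry (k, k) sits at flat index (N + 1) * k.
    # reshape(-1) returns a view here, so the assignment writes into out.
    out.reshape(-1)[(N + 1) * k :: N + 1] = 1
    return out

N, k = 5, 2
fast = identity_from(N, k)
via_bincount = np.diag(np.bincount([k], minlength=N).cumsum())
via_compare = np.diag((np.arange(N) >= k).astype(int))
via_slice = np.identity(N)
via_slice[:k, :k] = 0

print(fast.diagonal())  # [0. 0. 1. 1. 1.]
```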
You can build an NxN identity matrix and assign zero to the top left KxK corner:
N,K = 10,3
im = np.identity(N)
im[:K,:K] = 0
print(im)
output:
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
40% faster than hpaulj's, but not as fast as Paul Panzer's fast solution (which is 3x faster than this).
columns = np.shape(lines)[0] # Gets x-axis dimension of array lines (to get numbers of columns)
lengths = np.zeros(shape=(2,1)) # Create a 2D array
# lengths = [[ 0.]
# [ 0.]]
lengths = np.arange(columns).reshape((columns)) # Makes array have the same number of columns as columns and fills it with elements going up from zero <--- This line seems to be turning it into a 1D array
Output after printing lengths array:
print(lengths)
[0 1 2]
Expected Output Example:
print(lengths)
[[0 1 2]] # Notice the double square bracket
This means I can't enter data into the 2D part of the array, because it no longer exists:
np.append(lengths, 65, axis=1)
AxisError: axis 1 is out of bounds for array of dimension 1
I want the array to be 2D so I can store "IDs" in the first row and values in the second (at a later point in the program). I'm also aware that I could add another row to the array later instead of at initialization, but I'd rather not, since I've heard that's inefficient and this program's success depends heavily on performance.
Thank you.
Since you eventually want a 2d array with ids in one row and values in the second, I'd suggest starting with the right size:
In [535]: arr = np.zeros((2,10),int)
In [536]: arr
Out[536]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
In [537]: arr[0,:]=np.arange(10)
In [538]: arr
Out[538]:
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Sure, you could start with a 1-row array of ids, but adding that 2nd row later requires making a new array anyway. np.append is just a variation on np.concatenate.
But to make a 2d array from arange I like:
In [539]: np.arange(10)[None,:]
Out[539]: array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
reshape also works, but has to be given the correct shape, e.g. (1,10).
In:
lengths = np.zeros(shape=(2,1)) # Create a 2D array
lengths = np.arange(columns).reshape((columns))
the 2nd lengths assignment replaces the first. You have to do an indexed assignment as I did with arr[0,:] to modify an existing array. lengths[0,:] = np.arange(10) wouldn't work because lengths only has 1 column, not 10. Assignments like this require correct pairing of dimensions.
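Applied to the question's variables, a sketch of the preallocate-and-fill approach might look like this. columns = 3 is a hypothetical stand-in for np.shape(lines)[0], since lines isn't shown in the question; the dtype=int choice is also an assumption.

```python
import numpy as np

columns = 3  # hypothetical stand-in for np.shape(lines)[0]

# Allocate the final 2-row shape once: row 0 for IDs, row 1 for values.
lengths = np.zeros((2, columns), dtype=int)
lengths[0, :] = np.arange(columns)  # indexed assignment modifies lengths and keeps it 2D

# Alternatively, a 1 x columns 2D array straight from arange:
ids_row = np.arange(columns)[None, :]  # same as np.arange(columns).reshape(1, columns)

print(lengths)
print(ids_row.shape)  # (1, 3)
```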
You don't need 2D data to put into a column of a 2D array; 1D data is enough.
You can put the data into the 0th row instead of the 0th column if you change the memory layout (swap the buffer's dimensions). Then you are copying data into contiguous memory (memory without gaps), which is faster.
Program:
import numpy as np
data = np.arange(12)
#method 1
buf = np.zeros((12, 6))
buf[:,0] = data
print(buf)
#method 2
buf = np.zeros((6, 12))
buf[0] = data
print(buf)
Result:
[[ 0. 0. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0.]
[ 2. 0. 0. 0. 0. 0.]
[ 3. 0. 0. 0. 0. 0.]
[ 4. 0. 0. 0. 0. 0.]
[ 5. 0. 0. 0. 0. 0.]
[ 6. 0. 0. 0. 0. 0.]
[ 7. 0. 0. 0. 0. 0.]
[ 8. 0. 0. 0. 0. 0.]
[ 9. 0. 0. 0. 0. 0.]
[ 10. 0. 0. 0. 0. 0.]
[ 11. 0. 0. 0. 0. 0.]]
[[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
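One way to see the layout difference is through the array flags and strides. In NumPy's default C (row-major) order, a column of the (12, 6) buffer is strided, with its elements 6 floats (48 bytes) apart, while a row of the (6, 12) buffer is one unbroken block. How much faster the contiguous copy actually is depends on array sizes and hardware, so treat this as a rule of thumb:

```python
import numpy as np

buf_cols = np.zeros((12, 6))   # method 1 layout
buf_rows = np.zeros((6, 12))   # method 2 layout

# A column of a C-ordered array is strided: consecutive elements are 6 * 8 bytes apart.
print(buf_cols[:, 0].flags['C_CONTIGUOUS'])          # False
# A row is one unbroken block of memory.
print(buf_rows[0].flags['C_CONTIGUOUS'])             # True
print(buf_cols[:, 0].strides, buf_rows[0].strides)   # (48,) (8,)
```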
When using scipy.sparse.spdiags or scipy.sparse.diags, I have noticed what I consider to be a bug in the routines, e.g.
scipy.sparse.spdiags([1.1,1.2,1.3],1,4,4).toarray()
returns
array([[ 0. , 1.2, 0. , 0. ],
[ 0. , 0. , 1.3, 0. ],
[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. ]])
That is, for positive diagonals it drops the first k values. One might argue that there is some grand programming reason for this and that I just need to pad with zeros. OK, annoying as that may be, one can use scipy.sparse.diags, which gives the correct result. However, this routine has a bug that can't be worked around:
scipy.sparse.diags([1.1,1.2],0,(4,2)).toarray()
gives
array([[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ],
[ 0. , 0. ]])
nice, and
scipy.sparse.diags([1.1,1.2],-2,(4,2)).toarray()
gives
array([[ 0. , 0. ],
[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2]])
but
scipy.sparse.diags([1.1,1.2],-1,(4,2)).toarray()
gives an error saying ValueError: Diagonal length (index 0: 2 at offset -1) does not agree with matrix size (4, 2). Obviously the answer is
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
and for extra random behaviour we have
scipy.sparse.diags([1.1],-1,(4,2)).toarray()
giving
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.1],
[ 0. , 0. ]])
Anyone know if there is a function for constructing diagonal sparse matrices that actually works?
Executive summary: spdiags works correctly, even if the matrix input isn't the most intuitive. diags has a bug that affects some offsets in rectangular matrices. There is a bug fix on the scipy GitHub.
The example for spdiags is:
>>> data = array([[1,2,3,4],[1,2,3,4],[1,2,3,4]])
>>> diags = array([0,-1,2])
>>> spdiags(data, diags, 4, 4).todense()
matrix([[1, 0, 3, 0],
[1, 2, 0, 4],
[0, 2, 3, 0],
[0, 0, 3, 4]])
Note that the 3rd column of data always appears in the 3rd column of the sparse matrix. The other columns also line up, but values are omitted where they 'fall off the edge'.
The input to this function is a matrix, while the input to diags is a ragged list. The diagonals of the sparse matrix all have different numbers of values, so the specification has to accommodate this one way or another: spdiags does it by ignoring some values, diags by taking a list input.
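To make the column alignment concrete, here is a small pure-NumPy sketch that applies the placement rule described above to a dense array. It is an illustration only: spdiags_dense is a hypothetical name, not SciPy code, and SciPy actually stores the result in DIA format. It reproduces the documentation example and the question's offset-1 case, where the first data value falls off the edge.

```python
import numpy as np

def spdiags_dense(data, offsets, m, n):
    """Dense illustration of spdiags' placement rule (hypothetical helper, not SciPy code).

    Row i of data fills diagonal offsets[i]; the value in column j of data
    lands in column j of the result (row j - offset), and values that would
    fall outside the m x n matrix are simply dropped.
    """
    data = np.atleast_2d(np.asarray(data))
    out = np.zeros((m, n), dtype=data.dtype)
    for row, offset in zip(data, np.atleast_1d(offsets)):
        for j in range(min(n, row.size)):
            i = j - offset
            if 0 <= i < m:
                out[i, j] = row[j]
    return out

data = np.array([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(spdiags_dense(data, [0, -1, 2], 4, 4))
# The question's "surprising" case: data column 0 would land in row -1, so it is dropped.
print(spdiags_dense([1.1, 1.2, 1.3], 1, 4, 4))
```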
The sparse.diags([1.1,1.2],-1,(4,2)) error is puzzling.
The spdiags equivalent does work:
In [421]: sparse.spdiags([[1.1,1.2]],-1,4,2).A
Out[421]:
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
The error is raised in this block of code:
for j, diagonal in enumerate(diagonals):
    offset = offsets[j]
    k = max(0, offset)
    length = min(m + offset, n - offset)
    if length <= 0:
        raise ValueError("Offset %d (index %d) out of bounds" % (offset, j))
    try:
        data_arr[j, k:k+length] = diagonal
    except ValueError:
        if len(diagonal) != length and len(diagonal) != 1:
            raise ValueError(
                "Diagonal length (index %d: %d at offset %d) does not "
                "agree with matrix size (%d, %d)." % (
                j, len(diagonal), offset, m, n))
        raise
The actual matrix construction in diags is:
dia_matrix((data_arr, offsets), shape=(m, n))
This is the same constructor that spdiags uses, but without any manipulation.
In [434]: sparse.dia_matrix(([[1.1,1.2]],-1),shape=(4,2)).A
Out[434]:
array([[ 0. , 0. ],
[ 1.1, 0. ],
[ 0. , 1.2],
[ 0. , 0. ]])
In dia format, the inputs are stored exactly as given by spdiags (complete with that matrix with extra values):
In [436]: M.data
Out[436]: array([[ 1.1, 1.2]])
In [437]: M.offsets
Out[437]: array([-1], dtype=int32)
As @user2357112 points out, length = min(m + offset, n - offset) is wrong, producing 3 in the test case. Changing it to length = min(m + k, n - k) makes all cases for this (4,2) matrix work, but it fails with the transpose: diags([1.1,1.2], 1, (2, 4)).
The correction for this issue, as of Oct 5, is:
https://github.com/pv/scipy-work/commit/529cbde47121c8ed87f74fa6445c05d71353eb6c
length = min(m + offset, n - offset, min(m,n))
With this fix, diags([1.1,1.2], 1, (2, 4)) works.
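The length formulas can be compared against the true diagonal length with a quick brute-force check. This sketch is not SciPy code: the function names are hypothetical, and the "true" length is taken from NumPy's ndarray.diagonal, which uses the same sign convention as diags (positive offsets above the main diagonal, negative below).

```python
import numpy as np

def length_original(m, n, offset):
    return min(m + offset, n - offset)  # formula in the buggy diags code

def length_patched(m, n, offset):
    return min(m + offset, n - offset, min(m, n))  # formula from the fix

def length_true(m, n, offset):
    # Number of entries on diagonal `offset` of an m x n array.
    return np.zeros((m, n)).diagonal(offset).size

for m, n, offset in [(4, 2, -1), (4, 2, -2), (2, 4, 1)]:
    print((m, n, offset), length_original(m, n, offset),
          length_patched(m, n, offset), length_true(m, n, offset))
# (4, 2, -1) 3 2 2
# (4, 2, -2) 2 2 2
# (2, 4, 1) 3 2 2
```

For offset >= 0 the patched formula reduces to min(m, n - offset), and for offset < 0 to min(m + offset, n), which are exactly the lengths of those diagonals.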