Say that I have a sparse matrix in scipy.sparse format. How can I extract a diagonal other than the main diagonal? For a numpy array, you can use numpy.diag. Is there a scipy sparse equivalent?
For example:
import numpy as np
from scipy import sparse
A = sparse.diags(np.ones(5), 1)
How would I get back the vector of ones without converting to a numpy array?
When the sparse array is in dia format, the data along the diagonals is recorded in the offsets and data attributes:
import scipy.sparse as sparse
import numpy as np
def make_sparse_array():
    # Build a fully populated nrow x ncol array and store it in dia format
    A = np.arange(ncol*nrow).reshape(nrow, ncol)
    row, col = zip(*np.ndindex(nrow, ncol))
    val = A.ravel()
    A = sparse.coo_matrix(
        (val, (row, col)), shape=(nrow, ncol), dtype='float')
    A = A.todia()
    # A = sparse.diags(np.ones(5), 1)
    # A = sparse.diags([np.ones(4),np.ones(3)*2,], [2,3])
    print(A.toarray())
    return A
nrow, ncol = 10, 5
A = make_sparse_array()
# Each row of A.data is padded to a common length; slice out the valid entries
diags = {offset:(diag[offset:nrow+offset] if 0<=offset<=ncol else
                 diag if offset+nrow-ncol>=0 else
                 diag[:offset+nrow-ncol])
         for offset, diag in zip(A.offsets, A.data)}
for offset, diag in sorted(diags.items()):
    print('{o}: {d}'.format(o=offset, d=diag))
Thus for the array
[[ 0. 1. 2. 3. 4.]
[ 5. 6. 7. 8. 9.]
[ 10. 11. 12. 13. 14.]
[ 15. 16. 17. 18. 19.]
[ 20. 21. 22. 23. 24.]
[ 25. 26. 27. 28. 29.]
[ 30. 31. 32. 33. 34.]
[ 35. 36. 37. 38. 39.]
[ 40. 41. 42. 43. 44.]
[ 45. 46. 47. 48. 49.]]
the code above yields
-9: [ 45.]
-8: [ 40. 46.]
-7: [ 35. 41. 47.]
-6: [ 30. 36. 42. 48.]
-5: [ 25. 31. 37. 43. 49.]
-4: [ 20. 26. 32. 38. 44.]
-3: [ 15. 21. 27. 33. 39.]
-2: [ 10. 16. 22. 28. 34.]
-1: [ 5. 11. 17. 23. 29.]
0: [ 0. 6. 12. 18. 24.]
1: [ 1. 7. 13. 19.]
2: [ 2. 8. 14.]
3: [ 3. 9.]
4: [ 4.]
The output above is printing the offset followed by the diagonal at that offset.
The code above should work for any sparse array. I used a fully populated sparse array only to make it easier to check that the output is correct.
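For a single diagonal there may be a shorter route. As far as I know, since SciPy 1.0 the diagonal method of sparse matrices accepts an offset k, mirroring numpy.diag(A, k). A minimal sketch, assuming such a SciPy version is installed (the variable names are only illustrative):

```python
import numpy as np
from scipy import sparse

# The matrix from the question: ones on the first superdiagonal (6 x 6)
A = sparse.diags(np.ones(5), 1)

# k > 0 selects a diagonal above the main one, k < 0 one below it
upper = A.diagonal(k=1)
lower = A.diagonal(k=-1)

print(upper)
print(lower)
```

Here upper is the vector of ones from the question, and lower is all zeros because nothing is stored below the main diagonal.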
Suppose that A is a three-dimensional matrix like the following:
A = [np.zeros((3, 8)) for _ in range(20)]
and B is a two-dimensional matrix that has 60 rows and 8 columns containing numbers. What should I do if I want to put numbers from matrix B into matrix A and use a loop to write code?
A[0][0] = B[0]
A[0][1] = B[1]
A[0][2] = B[2]
A[1][0] = B[3]
A[1][1] = B[4]
A[1][2] = B[5]
...
A[19][0] = B[57]
A[19][1] = B[58]
A[19][2] = B[59]
Thanks
I hope I've understood your question right. You can use .flat + indexing:
import numpy as np
A = [np.zeros((3, 8)) for _ in range(20)]
B = np.arange(60 * 8).reshape(60, 8)
for i, subl in enumerate(A):
    # each sub-array takes the next 3 * 8 = 24 values of B
    subl.flat[:] = B.flat[i * 3 * 8 : (i + 1) * 3 * 8]
print(*A, sep="\n\n")
Prints:
[[ 0. 1. 2. 3. 4. 5. 6. 7.]
[ 8. 9. 10. 11. 12. 13. 14. 15.]
[16. 17. 18. 19. 20. 21. 22. 23.]]
[[24. 25. 26. 27. 28. 29. 30. 31.]
[32. 33. 34. 35. 36. 37. 38. 39.]
[40. 41. 42. 43. 44. 45. 46. 47.]]
...
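If a single 3-D array is acceptable instead of a list of 2-D arrays, the loop can probably be dropped altogether. A minimal sketch, assuming B always has exactly 20 * 3 rows:

```python
import numpy as np

B = np.arange(60 * 8).reshape(60, 8)

# Rows 0-2 of B become block 0, rows 3-5 become block 1, and so on
A = B.reshape(20, 3, 8).astype(float)

print(A[1])
```

With this layout A[i][j] is the same row as B[3 * i + j], which is the mapping spelled out in the question.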
I have 4 point groups, each of which contains 5 different 3D positions. My goal is to brute-force all possible combinations that take one point from each group, without repeating a combination, and print each one as a (4x3) array. E.g. for input data:
1,2,3
4,5,6
7,8,9
10,11,12
13,14,15
16,17,18
19,20,21
22,23,24
25,26,27
28,29,30
31,32,33
34,35,36
37,38,39
40,41,42
43,44,45
46,47,48
49,50,51
52,53,54
55,56,57
58,59,60
I read the file:
def read_file(name):
    with open(name, 'r') as f:
        data = []
        for line in f:
            l = line.strip()
            cols = [float(i) for i in l.split(',')]
            data.append(cols)
    return np.array(data)
and reshape it to have 4x(5x3) arrays to be brute-forced:
def main():
    filePath = 'C:/Users/retw/input.txt'
    data = read_file(filePath)
    print('data:', data, type(data), data.shape)
    reshapedData = data.reshape(4, 5, 3)
    print('reshapedData :', reshapedData, type(reshapedData), reshapedData.shape)
The current output looks like:
reshapedData : [[[ 1. 2. 3.]
[ 4. 5. 6.]
[ 7. 8. 9.]
[10. 11. 12.]
[13. 14. 15.]]
[[16. 17. 18.]
[19. 20. 21.]
[22. 23. 24.]
[25. 26. 27.]
[28. 29. 30.]]
[[31. 32. 33.]
[34. 35. 36.]
[37. 38. 39.]
[40. 41. 42.]
[43. 44. 45.]]
[[46. 47. 48.]
[49. 50. 51.]
[52. 53. 54.]
[55. 56. 57.]
[58. 59. 60.]]] <class 'numpy.ndarray'> (4, 5, 3)
After brute-forcing, the combinations, as arrays or lists, should look like:
[[1,2,3]
[16,17,18]
[31,32,33]
[46,47,48]]
[[1,2,3]
[19,20,21]
[31,32,33]
[46,47,48]]
[[1,2,3]
[22,23,24]
[31,32,33]
[46,47,48]]
etc,
until
[[13,14,15]
[28,29,30]
[43,44,45]
[58,59,60]]
Edit
For two 2x3 arrays given as input:
[[[1,2,3]
[4,5,6]]
[[7,8,9]
[10,11,12]]]
The output after brute force should be:
[[1,2,3]
[7,8,9]]
[[1,2,3]
[10,11,12]]
[[4,5,6]
[7,8,9]]
[[4,5,6]
[10,11,12]]
Here is a solution using numpy and a generator that appears to work, generates the correct number of combos (625), and sequences them as you are looking for...
import numpy as np
f_in = 'data.csv'
data = []
with open(f_in, 'r') as f:
    for line in f:
        l = line.strip()
        cols = [float(i) for i in line.split(',')]
        data.append(cols)
data = np.array(data).reshape((4,5,3))
#print(data)
def result_gen(data):
    odometer = [0, 0, 0, 0]
    roll_seq = [1, 2, 3, 0]  # the sequence of positions by which to roll the odometer
    expired = False
    while not expired:
        res = data[[0, 1, 2, 3], [odometer]]
        for i in roll_seq:
            if odometer[i] < 4:
                odometer[i] += 1
                break
            else:
                if i == 0:  # we have exhausted all combos
                    expired = True
                odometer[i] = 0
        yield res
my_gen = result_gen(data)
a = list(my_gen)
print(len(a))
for t in a[:6]:
    print(t)
Yields:
625
[[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]]
[[[ 1. 2. 3.]
[19. 20. 21.]
[31. 32. 33.]
[46. 47. 48.]]]
[[[ 1. 2. 3.]
[22. 23. 24.]
[31. 32. 33.]
[46. 47. 48.]]]
[[[ 1. 2. 3.]
[25. 26. 27.]
[31. 32. 33.]
[46. 47. 48.]]]
[[[ 1. 2. 3.]
[28. 29. 30.]
[31. 32. 33.]
[46. 47. 48.]]]
[[[ 1. 2. 3.]
[16. 17. 18.]
[34. 35. 36.]
[46. 47. 48.]]]
[Finished in 0.2s]
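The same enumeration can also be sketched with itertools.product from the standard library, which takes one item from each group in turn. This is an alternative to the odometer above, not a rewrite of it: the helper name all_group_combinations is made up for illustration, the input is generated with np.arange instead of being read from a file, and the ordering differs (the last group varies fastest, rather than the second).

```python
import itertools
import numpy as np

def all_group_combinations(groups):
    """Yield one (n_groups, 3) array per way of picking one point per group."""
    for picks in itertools.product(*groups):
        yield np.array(picks)

# Stand-in for the file contents: 4 groups of 5 points with 3 coordinates
data = np.arange(1, 61, dtype=float).reshape(4, 5, 3)

combos = list(all_group_combinations(data))
print(len(combos))
print(combos[0])
print(combos[-1])
```

Each yielded array has shape (4, 3), so there is no extra leading dimension to squeeze out.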
Looks like you want to create something like this.
import numpy as np
a = np.arange(1,61).reshape(4,5,3)
print(a)
b = np.zeros((20,4,3))
for k in range(4):
    for i in range(4):
        for j in range(5):
            if i == k:
                b[5*k + j][i] = a[i][j]
            else:
                b[5*k + j][i] = a[i][0]
print(b)
The output of this will be:
[[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 4. 5. 6.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 7. 8. 9.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[10. 11. 12.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[13. 14. 15.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[19. 20. 21.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[22. 23. 24.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[25. 26. 27.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[28. 29. 30.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[34. 35. 36.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[37. 38. 39.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[40. 41. 42.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[43. 44. 45.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[46. 47. 48.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[49. 50. 51.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[52. 53. 54.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[55. 56. 57.]]
[[ 1. 2. 3.]
[16. 17. 18.]
[31. 32. 33.]
[58. 59. 60.]]]
Looping this way gives a total of 20 arrays of shape 4 x 3, because only one group is varied at a time while the other three stay at their first point.
I want to create a copy of a DataFrame and replace its values based on the original ones. I have used this technique before and it worked just fine; however, it doesn't work here.
Does anyone know if I am missing something?
Code:
df2 = df1.copy()
for index, row in df2.iterrows():
    flowers_num = int(row["flowers_num"])
    if flowers_num >= 100:
        flowers_num = 10
    elif flowers_num >= 10:
        flowers_num = 8
    else:
        flowers_num = 6
    row["flowers_num"] = flowers_num
Unique values on df2 before loop:
[ 0. 1. 10. 15. 6. 2. 4. 3. 44. 8. 9. 7. 22. 5.
11. 19. 12. 13. 21. 20. 14. 23. 16. 18. 24. 17. 35. 32.
25. 30. 28. 57. 45. 27. 42. 38. 43. 37. 34. 26. 29. 41.
52. 31. 39. 46. 51. 131. 36. 61. 53. 33. 48. 40. 58. 49.
76. 50. 119. 55. 91. 59. 106. 56. 65. 54. 47. 63. 64. 67.
75. 102. 74. 70. 60.]
Unique values on df2 after loop (should be just 6, 8 or 10):
[ 0. 1. 10. 15. 6. 2. 4. 3. 44. 8. 9. 7. 22. 5.
11. 19. 12. 13. 21. 20. 14. 23. 16. 18. 24. 17. 35. 32.
25. 30. 28. 57. 45. 27. 42. 38. 43. 37. 34. 26. 29. 41.
52. 31. 39. 46. 51. 131. 36. 61. 53. 33. 48. 40. 58. 49.
76. 50. 119. 55. 91. 59. 106. 56. 65. 54. 47. 63. 64. 67.
75. 102. 74. 70. 60.]
Thanks in advance!
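Your code worked for me; however, the "pandas" way to do this is to use pd.cut. Pass right=False so that the bins are closed on the left and match the >= comparisons in your loop:
df2['flowers_num'] = pd.cut(df1['flowers_num'], [0,10,100,np.inf], labels=[6,8,10], right=False)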
It would be much faster if you use apply on the column rather than iterrows.
Create a function to change the values:
def change_num(x):
    if x>=100:
        return 10
    elif x>=10:
        return 8
    else:
        return 6
Dummy DataFrame:
df_ex = pd.DataFrame({'flowers_num': np.random.randint(1,1000,20)})
Using apply:
df_ex["flowers_num"]=df_ex["flowers_num"].apply(change_num)
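As for why the loop in the question can leave the values untouched: the pandas documentation warns that iterrows may hand back a copy of each row, in which case assigning to row has no effect on the DataFrame. A vectorized sketch that avoids the issue, using np.select with the same thresholds (the small df1 below is made-up sample data):

```python
import numpy as np
import pandas as pd

# Made-up sample data standing in for the real df1
df1 = pd.DataFrame({"flowers_num": [0.0, 5.0, 10.0, 44.0, 100.0, 131.0]})
df2 = df1.copy()

# Conditions are checked in order, so >= 100 wins over >= 10
conditions = [df2["flowers_num"] >= 100, df2["flowers_num"] >= 10]
df2["flowers_num"] = np.select(conditions, [10, 8], default=6)

print(df2["flowers_num"].unique())
```

Because the whole column is assigned at once, the result lands in df2 itself and df1 is left unchanged.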
I have the following arrays:
from mxnet import nd
A=nd.array([[1,1,1,1],[2,2,2,2]])
B=nd.array([[11,11,11,11],[22,22,22,22]])
Y=nd.array([[91,91,91,91],[92,92,92,92]])
Imagine that each list within each array corresponds to a client.
So [1,1,1,1] is the result of applying operation A to client 1 and [2,2,2,2] is the result of applying operation A to client 2.
Then I have another array with a different operation that is applied to all the clients. [11,11,11,11] is the result of applying operation B to client 1, and so on.
And I need to get the following result:
D=nd.array( [ [[1,1,1,1],[11,11,11,11]],[[2,2,2,2],[22,22,22,22]] ])
list([D,Y])
This returns:
[
[[[ 1. 1. 1. 1.]
[11. 11. 11. 11.]]
[[ 2. 2. 2. 2.]
[22. 22. 22. 22.]]]
<NDArray 2x2x4 #cpu(0)>,
[[91. 91. 91. 91.]
[92. 92. 92. 92.]]
<NDArray 2x4 #cpu(0)>]
As you can see, the operations (A and B) are grouped for each client.
I tried:
list([list(zip(A,B)),Y])
And I get:
[[(
[1. 1. 1. 1.]
<NDArray 4 #cpu(0)>,
[11. 11. 11. 11.]
<NDArray 4 #cpu(0)>), (
[2. 2. 2. 2.]
<NDArray 4 #cpu(0)>,
[22. 22. 22. 22.]
<NDArray 4 #cpu(0)>)],
[[91. 91. 91. 91.]
[92. 92. 92. 92.]]
<NDArray 2x4 #cpu(0)>]
Which is not what I need. Plus, arrays A and B are really big, so I don't want to use a loop or something that will take too long.
Thanks.
This is typically an operation you can do with mxnet.ndarray.concat, but you need to expand the dimensions of the inputs before concatenating so that each client's rows stay in separate arrays.
This command will get exactly the output you ask for:
C = nd.concat(A.expand_dims(axis=1), B.expand_dims(axis=1), dim=1)
print(C)
which returns:
[[[ 1. 1. 1. 1.]
[11. 11. 11. 11.]]
[[ 2. 2. 2. 2.]
[22. 22. 22. 22.]]]
<NDArray 2x2x4 #cpu(0)>
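To make the shape bookkeeping concrete, here is the same expand-then-concatenate idea written in NumPy rather than MXNet. NumPy is swapped in only so that the sketch runs without MXNet installed; the shapes behave the same way.

```python
import numpy as np

A = np.array([[1, 1, 1, 1], [2, 2, 2, 2]], dtype=float)
B = np.array([[11, 11, 11, 11], [22, 22, 22, 22]], dtype=float)

# (2, 4) -> (2, 1, 4) for each input, then join along the new axis
C = np.concatenate([np.expand_dims(A, axis=1), np.expand_dims(B, axis=1)], axis=1)

# np.stack does the expand and the concatenate in one call
C_stacked = np.stack([A, B], axis=1)

print(C.shape)
print(C[0])
```

C[0] holds both operations for client 1 and C[1] both operations for client 2, with no Python-level loop over clients.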
Problem:
The input is an (i,j)-matrix M. The desired output is an (i^n,j^n) matrix K, where n is the number of factors in each product. The verbose way to get the desired output is as follows:
Generate all length-n arrays of row indices I, with repetition allowed (total of i**n n-arrays)
Generate all length-n arrays of column indices J, with repetition allowed (total of j**n n-arrays)
K[I,J] = M[I[0],J[0]] * ... * M[I[n-1],J[n-1]] for every pair of index arrays I and J
The straightforward way I've done this is by generating a list of labels of all n-permutations (with repetition) of the numbers in range(np.shape(m)[0]) and range(np.shape(m)[1]) for rows and columns, respectively. Afterwards you can multiply them as in the last bullet point above. This, however, is not practical for large input matrices, so I'm looking for ways to optimize the above. Thank you in advance.
Example:
Input
np.array([[1,2,3],[4,5,6]])
Output for n = 3
[[ 1. 2. 3. 2. 4. 6. 3. 6. 9. 2. 4. 6.
4. 8. 12. 6. 12. 18. 3. 6. 9. 6. 12. 18.
9. 18. 27.]
[ 4. 5. 6. 8. 10. 12. 12. 15. 18. 8. 10. 12.
16. 20. 24. 24. 30. 36. 12. 15. 18. 24. 30. 36.
36. 45. 54.]
[ 4. 8. 12. 5. 10. 15. 6. 12. 18. 8. 16. 24.
10. 20. 30. 12. 24. 36. 12. 24. 36. 15. 30. 45.
18. 36. 54.]
[ 16. 20. 24. 20. 25. 30. 24. 30. 36. 32. 40. 48.
40. 50. 60. 48. 60. 72. 48. 60. 72. 60. 75. 90.
72. 90. 108.]
[ 4. 8. 12. 8. 16. 24. 12. 24. 36. 5. 10. 15.
10. 20. 30. 15. 30. 45. 6. 12. 18. 12. 24. 36.
18. 36. 54.]
[ 16. 20. 24. 32. 40. 48. 48. 60. 72. 20. 25. 30.
40. 50. 60. 60. 75. 90. 24. 30. 36. 48. 60. 72.
72. 90. 108.]
[ 16. 32. 48. 20. 40. 60. 24. 48. 72. 20. 40. 60.
25. 50. 75. 30. 60. 90. 24. 48. 72. 30. 60. 90.
36. 72. 108.]
[ 64. 80. 96. 80. 100. 120. 96. 120. 144. 80. 100. 120.
100. 125. 150. 120. 150. 180. 96. 120. 144. 120. 150. 180.
144. 180. 216.]]
Partial solution:
The best I've found is a function to create the cartesian product of matrices proposed here: https://stackoverflow.com/a/1235363/4003747
The problem is that the output is not a matrix but an array of arrays. Multiplying the element of each array gives the values I'm after, but in an unordered fashion. I've tried for a while but I have no idea how to sensibly reorder them.
Inefficient solution for n = 3:
import numpy as np
import itertools
m=np.array([[1,2,3],[4,5,6]])
def f(m):
    labels_i = [list(p) for p in itertools.product(range(np.shape(m)[0]),repeat=3)]
    labels_j = [list(p) for p in itertools.product(range(np.shape(m)[1]),repeat=3)]
    out = np.zeros([len(labels_i),len(labels_j)])
    for i in range(len(labels_i)):
        for j in range(len(labels_j)):
            out[i,j] = m[labels_i[i][0],labels_j[j][0]] * m[labels_i[i][1],labels_j[j][1]] * m[labels_i[i][2],labels_j[j][2]]
    return out
Here's a vectorized approach using a combination of broadcasting and linear indexing -
from itertools import product
# Get input array's shape
r,c = A.shape
# Setup arrays corresponding to labels i and j
arr_i = np.array(list(product(range(r), repeat=n)))
arr_j = np.array(list(product(range(c), repeat=n)))
# Use linear indexing with ".ravel()" to extract elements.
# Perform elementwise product along the rows for the final output
out = A.ravel()[(arr_i*c)[:,None,:] + arr_j].prod(2)
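The linear-indexing step relies on the fact that, for a C-ordered array with c columns, element (i, j) sits at position i*c + j of the flattened array. A small sketch of just that identity, with arbitrarily chosen index arrays:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
r, c = A.shape

# Arbitrary (row, column) pairs to look up
rows = np.array([0, 1, 1])
cols = np.array([2, 0, 2])

# Flattened lookup versus ordinary 2-D fancy indexing
flat_lookup = A.ravel()[rows * c + cols]
direct_lookup = A[rows, cols]

print(flat_lookup)
print(direct_lookup)
```

In the one-liner above, (arr_i*c)[:,None,:] + arr_j builds these flat positions for every pair of row and column labels at once via broadcasting.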
Runtime test and output verification -
In [167]: # Inputs
...: n = 4
...: A = np.array([[1,2,3],[4,5,6]])
...:
...: def f(m):
...:     labels_i = [list(p) for p in product(range(np.shape(m)[0]),repeat=n)]
...:     labels_j = [list(p) for p in product(range(np.shape(m)[1]),repeat=n)]
...:
...:     out = np.zeros([len(labels_i),len(labels_j)])
...:     for i in range(len(labels_i)):
...:         for j in range(len(labels_j)):
...:             out[i,j] = m[labels_i[i][0],labels_j[j][0]] \
...:                 * m[labels_i[i][1],labels_j[j][1]] \
...:                 * m[labels_i[i][2],labels_j[j][2]] \
...:                 * m[labels_i[i][3],labels_j[j][3]]
...:     return out
...:
...: def f_vectorized(A,n):
...:     r,c = A.shape
...:     arr_i = np.array(list(product(range(r), repeat=n)))
...:     arr_j = np.array(list(product(range(c), repeat=n)))
...:     return A.ravel()[(arr_i*c)[:,None,:] + arr_j].prod(2)
...:
In [168]: np.allclose(f_vectorized(A,n),f(A))
Out[168]: True
In [169]: %timeit f(A)
100 loops, best of 3: 2.37 ms per loop
In [170]: %timeit f_vectorized(A,n)
1000 loops, best of 3: 202 µs per loop
This should work:
import numpy as np
import itertools
m=np.array([[1,2,3],[4,5,6]])
n=3 # change your n here
def f(m):
    labels_i = [list(p) for p in itertools.product(range(np.shape(m)[0]),repeat=n)]
    labels_j = [list(p) for p in itertools.product(range(np.shape(m)[1]),repeat=n)]
    out = np.zeros([len(labels_i),len(labels_j)])
    for i in range(len(labels_i)):
        for j in range(len(labels_j)):
            out[i,j] = np.prod([m[labels_i[i][k],labels_j[j][k]] for k in range(n)])
    return out
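One observation that is not made in the answers above: with the index tuples in the lexicographic order that itertools.product generates, the matrix described in the question appears to be the n-fold Kronecker product of m with itself. If that reading is right, np.kron can build it without any label lists. A minimal sketch, assuming n >= 1; kron_power and brute_force are made-up helper names, and brute_force just evaluates the definition as a cross-check:

```python
from functools import reduce
from itertools import product

import numpy as np

def kron_power(m, n):
    """n-fold Kronecker product of m with itself (assumes n >= 1)."""
    return reduce(np.kron, [m] * n)

def brute_force(m, n):
    """Direct evaluation of the definition, used only as a cross-check."""
    rows = list(product(range(m.shape[0]), repeat=n))
    cols = list(product(range(m.shape[1]), repeat=n))
    out = np.empty((len(rows), len(cols)))
    for a, I in enumerate(rows):
        for b, J in enumerate(cols):
            out[a, b] = np.prod([m[I[k], J[k]] for k in range(n)])
    return out

m = np.array([[1, 2, 3], [4, 5, 6]])
K = kron_power(m, 3)
print(K.shape)
print(np.array_equal(K, brute_force(m, 3)))
```

For the 2 x 3 example this gives an 8 x 27 matrix whose first row starts 1, 2, 3, 2, 4, 6, matching the output listed in the question.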