Accumulated sum of 2D array [duplicate] - python

This question already has answers here:
Multidimensional cumulative sum in numpy
(3 answers)
Closed 2 years ago.
Suppose I have a 2D numpy array like below
dat = np.array([[1,2],[3,4],[5,6],[7,8]])
I want to get a new array in which each row equals the sum of that row and all the rows before it, like the following
first row: [1,2]
second row: [1,2] + [3,4] = [4,6]
third row: [4,6] + [5,6] = [9,12]
fourth row: [9,12] + [7,8] = [16,20]
So the array would be like
dat = np.array([[1,2],[4,6],[9,12],[16,20]])

np.cumsum is what you are looking for:
dat = np.array([[1,2],[3,4],[5,6],[7,8]])
result = np.cumsum(dat, axis=0)
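As a quick check of what the axis argument does, here is a minimal sketch using the array from the question. The variable names by_rows, by_cols and flat are only illustrative.

```python
import numpy as np

dat = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# axis=0 accumulates down the rows, which is what the question asks for.
by_rows = np.cumsum(dat, axis=0)

# axis=1 accumulates across the columns of each row instead.
by_cols = np.cumsum(dat, axis=1)

# With no axis given, the array is flattened before accumulating.
flat = np.cumsum(dat)

print(by_rows)
print(by_cols)
print(flat)
```

Only axis=0 reproduces the rows listed in the question; the other two calls are shown for contrast.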


histogram indexed by month and day [duplicate]

This question already has an answer here:
Plotting histogram of list of tuplets matplotlib
(1 answer)
Closed 4 years ago.
I'm trying to create a histogram of the number of month / day pairs. So, I have an array which consists of the following:
date_patterns = [(12,1,1992), (1,4,1993), (1,5,1993),
(1,6,1993), (1,4,1994), (1,5,1994),
(2,9,1995), (3,4,1995), (1,4,1996)]
I'd like this histogram indexed by just the month and day so:
(12,1) = 1
(1,4) = 3
(1,5) = 2
(1,6) = 1
(2,9) = 1
(3, 4) = 1
import itertools
date_patterns = [(12,1,1992), (1,4,1993), (1,5,1993),
(1,6,1993), (1,4,1994), (1,5,1994),
(2,9,1995), (3,4,1995), (1,4,1996)]
# Sort the dates, group them by (month, day), then count the size of each group
groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(date_patterns), lambda x: (x[0], x[1]))]
print(groups)
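An alternative that avoids sorting is collections.Counter from the standard library. This is a sketch of a different technique from the groupby approach above, not a replacement for it; the name counts is illustrative.

```python
from collections import Counter

date_patterns = [(12, 1, 1992), (1, 4, 1993), (1, 5, 1993),
                 (1, 6, 1993), (1, 4, 1994), (1, 5, 1994),
                 (2, 9, 1995), (3, 4, 1995), (1, 4, 1996)]

# Count each (month, day) pair, ignoring the year.
counts = Counter((month, day) for month, day, _year in date_patterns)

for pair, n in counts.items():
    print(pair, "=", n)
```

Counter keeps the pairs in first-seen order, whereas the groupby version returns them in sorted order.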

Creating a 2D array from a single list of input integers separated by space

I was solving some problems at geeksforgeeks and I came across a particular question where the inputs are provided in the test case as shown:
2 2 # denotes row, column of the matrix
1 0 0 0 # all the elements of the matrix in a single line separated by a single space.
I can't figure out how to initialize my 2D array when the inputs are given in this manner.
P.S. I can't use split, as it will put all the elements into a single array, from which I would then have to read each element again. I am looking for a simpler, more Pythonic way.
You should use .split. And you also need to convert the split string items to int. But you can do that very compactly, if you want to:
rows, cols = map(int, input('rows cols: ').split())
data = map(int, input('data: ').split())
mat = [*map(list, zip(*[data] * cols))]
print(rows, cols)
print(mat)
demo
rows cols: 2 2
data: 1 2 3 4
2 2
[[1, 2], [3, 4]]
If you get a SyntaxError on mat = [*map(list, zip(*[data] * cols))] change it to
mat = list(map(list, zip(*[data] * cols)))
Or upgrade to a newer Python 3. ;)
After using split on both strings:
n_rows, n_cols = [int(x) for x in matrix_info_str.split(' ')]
split_str = matrix_str.split(' ')
I'd go with:
matrix = [[int(x) for x in split_str[i : i + n_cols]] for i in range(0, n_rows * n_cols, n_cols)]
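The slicing idea above can be wrapped in a small helper. This is a sketch only: parse_matrix and its parameter names are hypothetical, and the two input lines are passed as strings rather than read with input().

```python
def parse_matrix(dims_line, data_line):
    """Build a list-of-lists matrix from a 'rows cols' line and a flat data line."""
    n_rows, n_cols = map(int, dims_line.split())
    values = [int(x) for x in data_line.split()]
    if len(values) != n_rows * n_cols:
        raise ValueError("expected %d values, got %d" % (n_rows * n_cols, len(values)))
    # Cut the flat list into consecutive chunks of n_cols elements.
    return [values[i:i + n_cols] for i in range(0, len(values), n_cols)]


matrix = parse_matrix("2 2", "1 0 0 0")
print(matrix)
```

The length check is an addition for illustration; it catches test cases where the data line does not match the stated dimensions.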

python when add lists using array setting elements error

I am writing a Python program to add two lists using numpy. The first is a 10*3*11 (three-dimensional) list, and the second is also a 10*3*11 list whose elements are all 0:
data_split_count = 10
cluster_number = 3
total_center_list = [[[[0] for i in range(11)] for j in range(cluster_number)] for kj in range(data_split_count)]
print("1 len total center list")
print(len(total_center_list))
total_center_data_list = minibatchkmeansClustering_no_gender(data_list)
print("total center list")
print(len(total_center_data_list))
print("total center list 0")
print(len(total_center_data_list[0]))
print("total center list 0 0")
print(len(total_center_data_list[0][0]))
print(total_center_data_list[0][1])
print("sum total center list")
temp_test = numpy.array([total_center_data_list,total_center_list])
total_center_list = temp_test.sum(axis = 0)
print(len(total_center_list))
When running, it shows:
1 len total center list
10
total center list
10
total center list 0
3
total center list 0 0
11
[ 0.07459313 0.05333272 0.01219305 0.32307944 0.16194127 0.00409273
0.34603601 0.33625275 0.06253664 0.1693817 0.08579227]
sum total center list
File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_stability.py", line 36, in dist_cal
temp_test = numpy.array([total_center_data_list,total_center_list])
ValueError: setting an array element with a sequence
Could you please tell me the reason and how to solve it?
If you would like to use numpy, it operates on arrays of data. You have to convert your lists to arrays using asarray. Then you can just add two arrays together element-wise, using "+".
import numpy as np
list1 = range(3*5*11)  # list1 = your total_center_list
a1 = np.asarray(list1).reshape((3,5,11))  # converted to numpy array, reshaped to match your dimensions
print(a1)
list2 = range(3*5*11)  # list2 = your total_center_data_list
a2 = np.asarray(list2).reshape(3,5,11)
a3 = a1 + a2  # your sum array
print(a3.shape)  # checks dimensions
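A likely cause of the ValueError, as far as can be told from the question's code, is that the zero list is built from [0] elements, which adds a fourth dimension of length 1, so the two inputs no longer have the same shape. The sketch below assumes that is the problem. The random array is only a stand-in for the clustering output, and the names n_features, nested and total are hypothetical.

```python
import numpy as np

data_split_count = 10
cluster_number = 3
n_features = 11  # hypothetical name for the innermost length

# Stand-in for the clustering result, assumed to have shape (10, 3, 11).
rng = np.random.default_rng(0)
total_center_data = rng.random((data_split_count, cluster_number, n_features))

# The nested comprehension from the question: each element is the list [0],
# so the result has a trailing axis of length 1.
nested = [[[[0] for i in range(n_features)]
           for j in range(cluster_number)]
          for k in range(data_split_count)]
nested_shape = np.array(nested).shape
print(nested_shape)

# Building the zeros directly gives matching shapes, so plain + works.
total_center = np.zeros((data_split_count, cluster_number, n_features))
total = total_center + total_center_data
print(total.shape)
```

Replacing [0] with 0 in the original comprehension would have the same effect as np.zeros here.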

Transforming a 3 Column Matrix into an N x N Matrix in Numpy

I have a 2D numpy array with 3 columns. Columns 1 and 2 are a list of connections between IDs. Column 3 is the strength of that connection. I would like to transform this 3 column matrix into a weighted adjacency matrix (an N x N matrix where cells represent the strength of connection between each ID).
I have already done this in my code below. matrix is the 3 column 2D array and t1 is the weighted adjacency matrix. My problem is this code is very slow because I am using nested for loops. I am familiar with the pandas function melt which does this, but I am not able to use pandas. Is there a faster implementation not using pandas?
import numpy as np
a = np.arange(2000)
np.random.shuffle(a)
b = np.arange(2000)
np.random.shuffle(b)
c = np.random.rand(2000,1)
matrix = np.column_stack((a,b,c))
#get unique value list of nm
flds = list(np.unique(matrix[:,0]))
flds.extend(list(np.unique(matrix[:,1])))
flds = np.asarray(flds)
flds = np.unique(flds)
#make lookup dict
lookup = dict(zip(np.arange(0,len(flds)), flds))
lookup_rev = dict(zip(flds, np.arange(0,len(flds))))
#make empty n by n matrix with unique lists
t1 = np.zeros([len(flds) , len(flds)])
#map values into the n by n matrix and make the rest 0
'''this takes a long time to run'''
#iterate through rows
for i in np.arange(0,len(lookup)):
    #iterate through columns
    for k in np.arange(0,len(lookup)):
        val = matrix[(matrix[:,0] == lookup[i]) & (matrix[:,1] == lookup[k])][:,2]
        if val:
            t1[i,k] = sum(val)
Assuming that I understood the question correctly and that val is a scalar, you could use a vectorized approach that involves initializing with zeros and then indexing, like so -
out = np.zeros((len(flds),len(flds)))
out[matrix[:,0].astype(int),matrix[:,1].astype(int)] = matrix[:,2]
Please note that by my observation it looks like you can avoid using lookup.
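If the IDs are not already 0..N-1, or if the same pair can appear more than once and the strengths should be summed as in the question's loop, plain indexed assignment is not enough because repeated indices overwrite each other. A sketch of one way to handle both cases uses np.searchsorted and np.add.at. The small edge list and the names ids, rows, cols and adj are made up for illustration.

```python
import numpy as np

# Hypothetical edge list: source ID, target ID, weight.
matrix = np.array([
    [10.0, 20.0, 0.5],
    [20.0, 30.0, 1.5],
    [10.0, 20.0, 0.25],  # the pair (10, 20) appears twice
])

# Sorted unique IDs; searchsorted maps each ID to its position in that list.
ids = np.unique(matrix[:, :2])
rows = np.searchsorted(ids, matrix[:, 0])
cols = np.searchsorted(ids, matrix[:, 1])

adj = np.zeros((len(ids), len(ids)))
# np.add.at accumulates over repeated (row, col) pairs instead of overwriting.
np.add.at(adj, (rows, cols), matrix[:, 2])
print(adj)
```

Here the two (10, 20) entries end up summed in a single cell, which matches what sum(val) does in the original nested loops.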
You need to iterate your matrix only once:
import numpy as np
size = 2000
a = np.arange(size)
np.random.shuffle(a)
b = np.arange(size)
np.random.shuffle(b)
c = np.random.rand(size,1)
matrix = np.column_stack((a,b,c))
#get unique value list of nm
fields = np.unique(matrix[:,:2])
n = len(fields)
#make reverse lookup dict
lookup = dict(zip(fields, range(n)))
#make empty n by n matrix
t1 = np.zeros([n, n])
for src, dest, val in matrix:
    i = lookup[src]
    j = lookup[dest]
    t1[i, j] += val
The main speedup comes from not iterating over every element of the NxN matrix but instead iterating through your connection list, which is much smaller.
I tried to simplify your code a bit. It uses the list.index method, which can be slow, but it should still be faster than what you had.
import numpy as np
a = np.arange(2000)
np.random.shuffle(a)
b = np.arange(2000)
np.random.shuffle(b)
c = np.random.rand(2000,1)
matrix = np.column_stack((a,b,c))
lookup = np.unique(matrix[:,:2]).tolist() # You can call unique only once
t1 = np.zeros((len(lookup),len(lookup)))
for i,j,val in matrix:
    t1[lookup.index(i),lookup.index(j)] = val # Fill the matrix

Numpy Error using 'np.arange': Index Error [duplicate]

This question already has answers here:
First row of numpy.ones is still populated after referencing another matrix
(5 answers)
Closed 6 years ago.
I am working with the code shown below and am getting an Index Error:
index 8 is out of bounds for axis 1 with size 8 and
index 6 is out of bounds for axis 0 with size 6.
When changing np.arange(1,9) and np.arange(2,8) to np.arange(8) and np.arange(6) respectively, the code runs with no errors. However, the output matrix C differs from what I expect. What if I want i and j to take the integer values (1,2,3,4,5,6,7,8) instead of (0,1,2,3,4,5,6,7), and k and l to take the integer values (2,3,4,5,6,7)? I am creating a matrix C which looks at the inner 6x6 square of B (ignoring the border) and uses the matrix D as a 'weight' when determining the values of matrix C.
import numpy
import numpy as np
A = np.matrix([[8,8,8,7,7,6,8,2],
[8,8,7,7,7,6,6,7],
[1,8,8,7,7,6,6,6],
[1,1,8,7,7,6,7,7],
[1,1,1,1,8,7,7,6],
[1,1,2,1,8,7,7,6],
[2,2,2,1,1,8,7,7],
[2,1,2,1,1,8,8,7]])
B = np.ones((8,8),dtype=int)
for i in np.arange(1,9):
    for j in np.arange(1,9):
        B[i,j] = A[i,j]
C = np.zeros((6,6),dtype=int)
print(C)
D = np.matrix([[1,1,2,3,3,2,2,1],
[1,2,1,2,3,3,3,2],
[1,1,2,1,1,2,2,3],
[2,2,3,2,2,2,1,3],
[1,2,2,3,2,3,1,3],
[1,2,3,3,2,3,2,3],
[1,2,2,3,2,3,1,2],
[2,2,3,2,2,3,2,2]])
print(D)
for k in np.arange(2,8):
    for l in np.arange(2,8):
        B[k,l] # point in middle
        b = B[(k-1),(l-1)]
        if b == 8:
            # Matrix C is smaller than Matrix B
            C[(k-1),(l-1)] = C[(k-1),(l-1)] + 1*D[(k-1),(l-1)]
Remember that Python indexing starts at 0, not 1, so the valid indices along an axis of size 8 run from 0 to 7, not 1 to 8. Likewise, the 6x6 matrix C only accepts indices 0 to 5.
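To make the offsets concrete, here is a small sketch that walks the interior of an 8x8 array with 0-based indices and writes into a 6x6 array. The array B below is a hypothetical stand-in filled with 0..63, not the data from the question.

```python
import numpy as np

B = np.arange(64).reshape(8, 8)  # hypothetical 8x8 stand-in

# Rows and columns 0 and 7 are the border, so the interior is 1..6.
inner = B[1:-1, 1:-1]

C = np.zeros((6, 6), dtype=int)
for k in range(1, 7):        # interior rows of B
    for l in range(1, 7):    # interior columns of B
        # Shift by one so that B's index 1..6 maps to C's index 0..5.
        C[k - 1, l - 1] = B[k, l]

print(np.array_equal(C, inner))
```

The same k - 1, l - 1 shift applies when accumulating weighted values into C; only the loop ranges need to stay within 1..6.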
