Summing array values by repeating index for an array - python

I want to sum the values in vals into elements of a smaller array a specified in an index list idx.
import numpy as np
a = np.zeros((1,3))
vals = np.array([1,2,3,4])
idx = np.array([0,1,2,2])
a[0,idx] += vals
This produces [[ 1. 2. 4.]], but I want [[ 1. 2. 7.]], because both the 3 and the 4 from vals should be added into a[0, 2] (the element at index 2).
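Why does += give 4 rather than 7? Below is a minimal sketch of what happens, based on NumPy's buffered fancy-indexing behaviour: a[0, idx] += vals gathers the selected elements once, adds vals, and scatters the result back once. Index 2 appears twice, so it is written twice, and in practice the later write (4.) is what remains. The intermediate names gathered and updated are hypothetical, only there to make the steps visible.

```python
import numpy as np

a = np.zeros((1, 3))
vals = np.array([1, 2, 3, 4])
idx = np.array([0, 1, 2, 2])

# a[0, idx] += vals behaves like a gather, an add and a single scatter:
gathered = a[0, idx]         # [0., 0., 0., 0.]; index 2 is read twice
updated = gathered + vals    # [1., 2., 3., 4.]
a[0, idx] = updated          # index 2 is written twice; the later value (4.) remains
print(a)
```

Nothing is accumulated between the two writes to index 2, which is why a loop or an unbuffered operation is needed to get 7.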
I can achieve what I want with:
import numpy as np
a = np.zeros((1,3))
vals = np.array([1,2,3,4])
idx = np.array([0,1,2,2])
for i in np.unique(idx):
    fidx = (idx==i).astype(int)
    psum = (vals * fidx).sum()
    a[0,i] = psum
print(a)
Is there a way to do this with numpy without using a for loop?

This is possible with np.add.at, which, unlike +=, accumulates the values at repeated indices. The indices have to match the shape of a, so the simplest option here is to make a 1-D:
a = a.squeeze()
np.add.at(a, idx, vals)
a
array([1., 2., 7.])
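Two other ways to get the same result, sketched under the assumption that idx only contains valid column indices for a. np.add.at also accepts a tuple index, so a can stay 2-D; and np.bincount with weights= sums the values that share an index. The minlength=3 argument is an assumption chosen to match the length of a, so trailing indices that never occur still get a 0.

```python
import numpy as np

vals = np.array([1, 2, 3, 4])
idx = np.array([0, 1, 2, 2])

# Option 1: keep a as shape (1, 3) and index it with a (row, columns) tuple;
# np.add.at is unbuffered, so repeated entries in idx accumulate.
a = np.zeros((1, 3))
np.add.at(a, (0, idx), vals)
print(a)

# Option 2: np.bincount adds up the weights that fall into each bin;
# minlength guarantees one output entry per element of a.
b = np.bincount(idx, weights=vals, minlength=3)
print(b)
```

np.bincount only works for non-negative integer indices into a 1-D result, while np.add.at works for any ufunc and any indexable shape.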

Related

How do I remove rows in a list containing numpy arrays based on a condition?

I have the following list of numpy arrays, arr_split:
import numpy as np
arr1 = np.array([[1.,2,3], [4,5,6], [7,8,9]])
arr_split = np.array_split(arr1,
indices_or_sections = 4,
axis = 0)
arr_split
Output:
[array([[1., 2., 3.]]),
array([[4., 5., 6.]]),
array([[7., 8., 9.]]),
array([], shape=(0, 3), dtype=float64)]
How do I remove the "empty" arrays (in the example above, the last one)? arr_split can contain any number of empty arrays; this example just happens to have only one.
I have tried using list comprehension, as per below:
arr_split[[(arr_split[i].shape[0] != 0) for i in range(len(arr_split))]]
but this doesn't work, because arr_split is a plain Python list, and a list can't be indexed with the list of booleans that the comprehension [(arr_split[i].shape[0] != 0) for i in range(len(arr_split))] returns.
Does anyone know how I could fix this, or is there another way of doing it? If possible, I'm looking for the easiest way, without too many loops or if statements.
You can set indices_or_sections to the length of the first axis. As long as the number of sections is not larger than the number of rows, array_split won't produce any empty arrays:
import numpy as np
arr1 = np.array([[1.,2,3], [4,5,6], [7,8,9]])
arr_split = np.array_split(arr1,
indices_or_sections = arr1.shape[0],
axis = 0)
arr_split
>>> [
array([[1., 2., 3.]]),
array([[4., 5., 6.]]),
array([[7., 8., 9.]])
]
Just loop through and check the size. Only add them to the new list if they have a size greater than 0.
arr_split_new = [arr for arr in arr_split if arr.size > 0]
If you want the indices of the non-empty arrays instead, use enumerate and check each array's size:
indexes = [idx for idx, v in enumerate(arr_split) if v.size != 0]
# [0, 1, 2]
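The boolean-mask idea from the question can also be made to work without indexing the list directly. A minimal sketch using itertools.compress from the standard library, which keeps the items whose mask entry is true; the names mask and non_empty are only for illustration.

```python
import numpy as np
from itertools import compress

arr1 = np.array([[1., 2, 3], [4, 5, 6], [7, 8, 9]])
arr_split = np.array_split(arr1, indices_or_sections=4, axis=0)

# Boolean mask built the same way as in the question.
mask = [a.shape[0] != 0 for a in arr_split]

# A list can't be indexed by a list of booleans,
# but itertools.compress yields the items whose mask entry is True.
non_empty = list(compress(arr_split, mask))
print(len(non_empty))
```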

Sum 2-D arrays in Python

I have two 2-D arrays, and I tried to sum them element-wise:
A = array([[-0.31326169, -0., -3.23995333],
[-0.26328247, -0., -0.64439666]])
B = array([[-0 , -0.28733533, -0.],
[-0 , -2.12692801, -0]])
sum(A + B)
array([-0.57654415, -2.41426334, -3.88434999])
Why does it result in a 1-D array?
What you are looking for is numpy.add
import numpy as np
arr1 = np.array([[-0.31326169, -0., -3.23995333],[-0.26328247, -0., -0.64439666]])
arr2 = np.array([[-0., -0.28733533, -0.],[-0., -2.12692801, -0.]])
arr3=np.add(arr1,arr2)
print(arr3)
Output
[[-0.31326169 -0.28733533 -3.23995333]
[-0.26328247 -2.12692801 -0.64439666]]
This happens because A + B is a 2 by 3 array, and it's then summed using the built-in sum function (np.sum would've returned a single number).
The built-in sum iterates over the given array, and iterating over a 2-D array yields its rows, so the individual rows are added up (I called your arrays X and Y):
>>> X + Y
array([[-0.31326169, -0.28733533, -3.23995333],
[-0.26328247, -2.12692801, -0.64439666]])
Then sum(X + Y) does roughly the following:
def builtin_sum(rows):
    total = 0
    for row in rows:
        total += row
    return total
builtin_sum(X + Y)
So, individual rows will be summed:
>>> X + Y
array([[-0.31326169, -0.28733533, -3.23995333],
[-0.26328247, -2.12692801, -0.64439666]])
>>> _[0] + _[1]
array([-0.57654416, -2.41426334, -3.88434999])
If you want to sum X and Y element-wise, then... just sum them: result = X + Y.
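To make the difference concrete, here is a small sketch comparing the built-in sum with np.sum on the same 2-D result, using the arrays from the question. The variable names are only for illustration.

```python
import numpy as np

X = np.array([[-0.31326169, -0., -3.23995333], [-0.26328247, -0., -0.64439666]])
Y = np.array([[-0., -0.28733533, -0.], [-0., -2.12692801, -0.]])

S = X + Y                        # element-wise sum: still shape (2, 3)
builtin_result = sum(S)          # built-in sum adds the rows: 0 + S[0] + S[1]
column_sums = np.sum(S, axis=0)  # the explicit NumPy equivalent
total = np.sum(S)                # no axis: everything collapsed to one scalar
print(S.shape, builtin_result.shape, total)
```

So the 1-D result came from the extra sum() call, not from the addition itself.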

get the column from 2d array to calculate the normalization and cross product in python

I have a 2-D matrix A with dimensions (3, n). For each column (the first column, then the second, and so on), I want to normalize it and compute the cross product of two arrays, b and z (see the code below).
Let's say A is:
A=[[-0.00022939 -0.04265404 0.00022939]
[ 0. -0.2096513 0. ]
[ 0.00026388 0.00465183 0.00026388]]
How can I take the first column (-0.00022939, 0., 0.00026388) of A and use it in the function below, then take the second column, and so on up to column n?
def vectors(b):
    b = b/np.sqrt(np.sum(b**2.,axis=0))
    b = b/np.linalg.norm(b)
    z = np.array([0.,0.,1.])
    n1 = np.cross(z,b,axis=0)
    n1 = n1/np.linalg.norm(n1) ## normalize n
    return [n1]
n1 = vectors(A)
How can I write a loop that picks the first column and does the calculation, then the second column, and so on? Thanks in advance.
It depends on how you set up your array to start with. I like to use numpy arrays, as I find the indexing easier to get my head around. I think the code below is what you are after. A[:, idx] slices out column idx as a 1-D array; in this example A has 3 columns, so the loop runs over range(3), and the number of rows doesn't matter.
import numpy as np
A=np.array([[-0.00022939, -0.04265404, 0.00022939],
[-0.00022939, -0.04265404, 0.00022939],
[0., -0.2096513, 0.],
[0.00026388, 0.00465183, 0.00026388]])
for idx in range(3):
    b = A[:, idx]
    print(b)  # call your function here
EDIT: Full implementation showing the code and the output:
import numpy as np
def vectors(b):
    b = b/np.sqrt(np.sum(b**2.,axis=0))
    b = b/np.linalg.norm(b)
    z = np.array([0.,0.,1.])
    n1 = np.cross(z,b,axis=0)
    n1 = n1/np.linalg.norm(n1) ## normalize n
    return [n1]
A=np.array([[-0.00022939, -0.04265404, 0.00022939],
[ 0., -0.2096513, 0. ],
[ 0.00026388, 0.00026388, 0.00026388]])
for idx in range(A.shape[1]):  # one pass per column of A
    b = A[:, idx]
    n1 = vectors(b)
    print('idx', idx, '\nb ', b, '\nn1 ', n1, '\n')
Output:
idx 0
b [-0.00022939 0. 0.00026388]
n1 [array([ 0., -1., 0.])]
idx 1
b [-0.04265404 -0.2096513 0.00026388]
n1 [array([ 0.9799247 , -0.19936794, 0. ])]
idx 2
b [ 0.00022939 0. 0.00026388]
n1 [array([ 0., 1., 0.])]
You can try this:
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
def getColumn(m):
    # collect element m of every row of A
    res = []
    for x in A:
        res.append(x[m])
    return res
def countSomething(x):
    # counting code here
    print(x)
def looper(n):  # n is the second dimension size
    for x in range(0, n):
        countSomething(getColumn(x))
looper(3)
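For a (3, n) array there is no need to hard-code range(3): iterating over A.T yields the columns directly, and the whole calculation can also be vectorised by working along axis 0. A minimal sketch, assuming no column of A is parallel to z (otherwise the cross product is zero and normalising it divides by zero); the names loop_result, B and N are only for illustration. It uses the A from the full implementation above.

```python
import numpy as np

A = np.array([[-0.00022939, -0.04265404, 0.00022939],
              [ 0.,         -0.2096513,  0.        ],
              [ 0.00026388,  0.00026388, 0.00026388]])
z = np.array([0., 0., 1.])

# Loop form: iterating over A.T yields the columns of A one at a time.
loop_result = []
for b in A.T:
    b = b / np.linalg.norm(b)
    n1 = np.cross(z, b)
    loop_result.append(n1 / np.linalg.norm(n1))
loop_result = np.array(loop_result).T   # shape (3, n): one result column per input column

# Vectorised form: normalise every column, then cross z with all columns at once.
B = A / np.linalg.norm(A, axis=0)       # unit-length columns
N = np.cross(z[:, None], B, axis=0)     # vectors live along axis 0; z broadcasts over columns
N = N / np.linalg.norm(N, axis=0)       # normalise each result column
print(N)
```

The vectorised form scales to large n without a Python-level loop; the loop form is easier to extend if each column needs more involved processing.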

how to use sparse vectors and matrices in Python?

I am trying to do something very simple, but confused by the abundance of information about sparse matrices and vectors in Python.
I want to create two sparse vectors, x and y, one of length 5 and one of length 6. Then I want to set one coordinate in each of them. Then I want to create a sparse 5 x 6 matrix A and add to it the outer product of x and y. Finally, I want to do an SVD of A.
Here is what I tried, and it goes wrong in many ways.
from scipy import sparse;
import numpy as np;
import scipy.sparse.linalg as ssl;
x = sparse.bsr_matrix(np.zeros(5));
x[1] = 1;
y = sparse.bsr_matrix(np.zeros(6));
y[1] = 2;
A = sparse.coo_matrix(5, 6);
A = A + np.outer(x,y.transpose())
svdresult = ssl.svds(A,1);
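First, you should decide which data you want to store in a sparse matrix before constructing it. Otherwise, use a format that supports item assignment, such as sparse.csc_matrix or sparse.csr_matrix (sparse.lil_matrix is designed for building a matrix incrementally). Note that even a vector is a 2-D sparse matrix, so you assign or change data with two indices, like this: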
x[0, 1] = 1
Second, the outer product of the row vectors x and y is the matrix product of x.transpose() and y (x.transpose() * y for scipy sparse matrices, or x.transpose().dot(y)).
Here is working code:
from scipy import sparse
import numpy as np
import scipy.sparse.linalg as ssl
x = np.zeros(5)
x[1] = 1
x_bsr = sparse.bsr_matrix(x)
y = np.zeros(6)
y[1] = 2
y_bsr = sparse.bsr_matrix(y)
A = sparse.coo_matrix((5, 6)) # Sparse matrix 5 x 6
B = x_bsr.transpose().dot(y_bsr) # Outer product of x and y
svdresult = ssl.svds((A + B), 1)
Output:
(array([[ 5.55111512e-17],
[ -1.00000000e+00],
[ 0.00000000e+00],
[ -2.77555756e-17],
[ 1.11022302e-16]]), array([ 2.]), array([[ 0., -1., 0., 0., 0., 0.]]))
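If you do want to set the coordinates after construction, here is a minimal sketch that builds the vectors in LIL format (which supports item assignment) and converts to CSR for the arithmetic. The choice of lil_matrix and the @ operator is an alternative to the bsr_matrix/.dot code above, not what that answer used.

```python
import numpy as np
from scipy import sparse
import scipy.sparse.linalg as ssl

# 1 x 5 and 1 x 6 row vectors in LIL format, which supports item assignment.
x = sparse.lil_matrix((1, 5))
x[0, 1] = 1
y = sparse.lil_matrix((1, 6))
y[0, 1] = 2

# Outer product: (5 x 1) @ (1 x 6) -> 5 x 6, computed in CSR format.
B = x.T.tocsr() @ y.tocsr()
A = sparse.csr_matrix((5, 6)) + B   # empty 5 x 6 matrix plus the outer product

# Largest singular triplet; k must be smaller than min(A.shape).
u, s, vt = ssl.svds(A, k=1)
print(s)
```

Since A is a rank-1 matrix whose only non-zero entry is 2, its single non-zero singular value is 2.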

How to add column to numpy array

I am trying to add one column to the array created from recfromcsv. In this case it's an array: [210,8] (rows, cols).
I want to add a ninth column. Empty or with zeroes doesn't matter.
from numpy import genfromtxt
from numpy import recfromcsv
import numpy as np
import time
if __name__ == '__main__':
    print("testing")
    my_data = recfromcsv('LIAB.ST.csv', delimiter='\t')
    array_size = my_data.size
    #my_data = np.append(my_data[:array_size],my_data[9:],0)
    new_col = np.sum(x,1).reshape((x.shape[0],1))
    np.append(x,new_col,1)
I think that your problem is that you expect np.append to add the column in place, but because of how numpy data is stored, it actually creates a copy of the joined arrays. From the docstring:
Returns
-------
append : ndarray
A copy of `arr` with `values` appended to `axis`. Note that `append`
does not occur in-place: a new array is allocated and filled. If
`axis` is None, `out` is a flattened array.
So you need to save the output, e.g. all_data = np.append(...):
my_data = np.random.random((210,8)) #recfromcsv('LIAB.ST.csv', delimiter='\t')
new_col = my_data.sum(1)[...,None] # None keeps (n, 1) shape
new_col.shape
#(210,1)
all_data = np.append(my_data, new_col, 1)
all_data.shape
#(210,9)
Alternative ways:
all_data = np.hstack((my_data, new_col))
#or
all_data = np.concatenate((my_data, new_col), 1)
As far as I know, the main difference between these three functions (and np.vstack) is which axis they join along when you don't specify one (hstack and vstack don't take an axis argument at all):
concatenate assumes axis = 0
hstack assumes axis = 1, unless the inputs are 1-D, then axis = 0
vstack assumes axis = 0, after turning 1-D inputs into rows
append, with axis unspecified, flattens both inputs
Based on your comment, and looking more closely at your example code, I now believe that what you are probably looking to do is add a field to a record array. You imported both genfromtxt, which can return a structured array, and recfromcsv, which returns the subtly different record array (recarray). You used recfromcsv, so right now my_data is actually a recarray, which means that most likely my_data.shape = (210,), since recarrays are 1-D arrays of records, where each record is a tuple with the given dtype.
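A quick way to check these default behaviours is to look at the output shapes. A small sketch on toy arrays (the array sizes are arbitrary, chosen only to make the shapes distinguishable):

```python
import numpy as np

a = np.zeros((2, 3))
col = np.ones((2, 1))
row = np.ones((1, 3))

cat = np.concatenate((a, row))             # joins along axis 0 -> (3, 3)
h2 = np.hstack((a, col))                   # 2-D inputs: axis 1 -> (2, 4)
h1 = np.hstack((np.zeros(3), np.ones(2)))  # 1-D inputs: axis 0 -> (5,)
v1 = np.vstack((np.zeros(3), np.ones(3)))  # 1-D inputs become rows -> (2, 3)
ap = np.append(a, col)                     # no axis: both flattened -> (8,)
print(cat.shape, h2.shape, h1.shape, v1.shape, ap.shape)
```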
So you could try this:
import numpy as np
from numpy.lib.recfunctions import append_fields
x = np.random.random(10)
y = np.random.random(10)
z = np.random.random(10)
data = np.array( list(zip(x,y,z)), dtype=[('x',float),('y',float),('z',float)])
data = np.recarray(data.shape, data.dtype, buf=data)
data.shape
#(10,)
tot = data['x'] + data['y'] + data['z'] # sum(axis=1) won't work on recarray
tot.shape
#(10,)
all_data = append_fields(data, 'total', tot, usemask=False)
all_data
#array([(0.4374783740738456 , 0.04307289878861764, 0.021176067323686598, 0.5017273401861498),
# (0.07622262416466963, 0.3962146058689695 , 0.27912715826653534 , 0.7515643883001745),
# (0.30878532523061153, 0.8553768789387086 , 0.9577415585116588 , 2.121903762680979 ),
# (0.5288343561208022 , 0.17048864443625933, 0.07915689716226904 , 0.7784798977193306),
# (0.8804269791375121 , 0.45517504750917714, 0.1601389248542675 , 1.4957409515009568),
# (0.9556552723429782 , 0.8884504475901043 , 0.6412854758843308 , 2.4853911958174133),
# (0.0227638618687922 , 0.9295332854783015 , 0.3234597575660103 , 1.275756904913104 ),
# (0.684075052174589 , 0.6654774682866273 , 0.5246593820025259 , 1.8742119024637423),
# (0.9841793718333871 , 0.5813955915551511 , 0.39577520705133684 , 1.961350170439875 ),
# (0.9889343795296571 , 0.22830104497714432, 0.20011292764078448 , 1.4173483521475858)],
# dtype=[('x', '<f8'), ('y', '<f8'), ('z', '<f8'), ('total', '<f8')])
all_data.shape
#(10,)
all_data.dtype.names
#('x', 'y', 'z', 'total')
If you have an array a of, say, 210 rows by 8 columns:
import numpy
a = numpy.empty([210,8])
and want to add a ninth column of zeros you can do this:
b = numpy.append(a,numpy.zeros([len(a),1]),1)
The easiest solution is to use numpy.insert().
The advantage of np.insert() over np.append is that you can insert the new columns at arbitrary positions.
import numpy as np
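X = np.arange(20.0).reshape(10,2)  # float array, so the inserted random values are not truncated to ints
# one random column goes before column 0, another after the last column
X = np.insert(X, [0,2], np.random.rand(X.shape[0]*2).reshape(-1,2)*10, axis=1)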
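np.insert also broadcasts a scalar value, so inserting a single constant column needs no reshaping at all. A minimal sketch on a small toy array (the sizes and positions are arbitrary):

```python
import numpy as np

X = np.arange(6).reshape(3, 2)           # [[0, 1], [2, 3], [4, 5]]

Y = np.insert(X, 1, 0, axis=1)           # column of zeros before column 1
Z = np.insert(X, X.shape[1], 0, axis=1)  # index == number of columns: append at the end
print(Y)
print(Z)
```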
np.append or np.hstack expects the appended column to be the proper shape, that is N x 1. We can use np.zeros to create this zeros column (or np.ones to create a ones column) and append it to our original matrix (2D array).
import numpy as np
def append_zeros(x):
    zeros = np.zeros((len(x), 1)) # zeros column as 2D array
    return np.hstack((x, zeros)) # append column
I add a new column with ones to a matrix array in this way:
Z = np.append([[1 for _ in range(0,len(Z))]], Z.T,0).T
Maybe it is not that efficient?
It can be done like this:
import numpy as np
# create a random matrix:
A = np.random.normal(size=(5,2))
# add a column of zeros to it:
print(np.hstack((A,np.zeros((A.shape[0],1)))))
In general, if A is an m*n matrix and you need to add a column, you create an m*1 matrix of zeros, then use hstack to add it to the right of A.
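If you'd rather not build the m*1 matrix explicitly, np.column_stack treats 1-D arrays as columns, so a plain length-m vector of zeros works. A minimal sketch reusing the random A from the previous answer:

```python
import numpy as np

A = np.random.normal(size=(5, 2))
# the 1-D zeros array becomes the new last column
B = np.column_stack((A, np.zeros(A.shape[0])))
print(B.shape)
```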
Similar to some of the other answers suggesting using numpy.hstack, but more readable:
import numpy as np
# declare 10 rows x 3 cols integer array of all 1s
arr = np.ones((10, 3), dtype=np.int64)
# get the number of rows in the original array (as if we didn't know it was 10 or it could be different in other cases)
numRows = arr.shape[0]
# declare the new array which will be the new column, integer array of all 0s so it's visually distinct from the original array
additionalColumn = np.zeros((numRows, 1), dtype=np.int64)
# use hstack to tack on the additional column
result = np.hstack((arr, additionalColumn))
print(result)
result:
$ python3 scratchpad.py
[[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]
[1 1 1 0]]
Here's a shorter one-liner:
import numpy as np
data = np.random.rand(210, 8)
data = np.c_[data, np.zeros(len(data))]
I often use this with np.ones instead of np.zeros to convert points to homogeneous coordinates.
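A minimal sketch of that homogeneous-coordinates use, with a few made-up 2-D points: append a column of ones, and recover the Cartesian points by dividing by the last column and dropping it.

```python
import numpy as np

points = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # N x 2 Cartesian points
homog = np.c_[points, np.ones(len(points))]             # N x 3, last column all ones

# dividing by the last column and dropping it gives back the original points
back = homog[:, :-1] / homog[:, -1:]
print(homog)
```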
