How to broadcast the sum of list of list in an efficient way?
Below is working code, but it is not efficient when list1 has many elements (say, 30).
Any improvement on this?
from numpy import sum
import numpy as np
list1 = [[4,8],[8,16]]
list2 = [2]
elemSum=[sum(list1[0]),sum(list1[1])]
print((np.array(elemSum)/np.array(list2)))
prints:
[ 6. 12.] # expected output
I want a single line like the one below, eliminating the elemSum variable, but it yields incorrect output because it collapses the two row sums into one value:
print(sum(np.array(list1)/np.array(list2)))
prints:
18.0 # not expected: the two row sums are collapsed into one value
numpy.sum takes an optional axis argument, which can be used for partial sums along a single axis:
>>> list1 = np.array([[4,8],[8,16]])
>>> list2 = np.array([2])
>>> np.sum(list1)
36
>>> np.sum(list1, axis=1)
array([12, 24])
>>> np.sum(list1, axis=1) / list2
array([ 6., 12.])
Just use numpy the entire time; don't mess with lists if you want arrays:
list1 = [[4,8],[8,16]]
list2 = [2]
import numpy as np
arr1 = np.array(list1)
arr2 = np.array(list2)
Then simply:
result = arr1.sum(axis=1) / arr2
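A minimal sketch of the one-liner the question asks for, assuming every inner list has the same length so that numpy can treat list1 as a 2-D array. The helper name row_sums_over is hypothetical and only there for illustration:

```python
import numpy as np

def row_sums_over(rows, divisor):
    """Sum each inner list, then divide every row sum by the divisor."""
    # axis=1 sums across each row; the division broadcasts the divisor.
    return np.sum(rows, axis=1) / np.asarray(divisor)

list1 = [[4, 8], [8, 16]]
list2 = [2]
result = row_sums_over(list1, list2)
print(result)  # [ 6. 12.]
```

No intermediate elemSum is needed, and the same call works unchanged when list1 has 30 rows.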
Related
I've a numpy array
[
[1,5,6],
[10,12,20]
]
I want to apply an operation to every element of the array, computing a new value from the current one: a sort of bulk operation, like the np.square function.
i.e. x = (x + 1) * 2
the result would be:
[
[4,12,14],
[22,26,42]
]
I know I can do a for loop for every element and apply the operation but I want more compact syntax.
You can use numpy:
import numpy as np
lst = [
[1, 5, 6],
[10, 12, 20]
]
lst = np.array(lst)
lst = (lst + 1) * 2
print(lst)
Which outputs, as desired:
[[ 4 12 14]
 [22 26 42]]
If you need a list at the end, you can convert it back with lst.tolist(), but numpy arrays are usually more convenient to work with.
Solution using numpy vectorize:
import numpy as np
# define numpy array
arr = np.array([
[1,5,6],
[10,12,20]
])
# create function
func = np.vectorize(lambda x: (x + 1) * 2)
# apply function to array
func(arr)
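A note on the np.vectorize variant: according to the numpy documentation, vectorize is provided primarily for convenience, not for performance, and is essentially a for loop under the hood. A small sketch showing that it gives the same result as the plain array expression (the variable names are illustrative):

```python
import numpy as np

arr = np.array([[1, 5, 6], [10, 12, 20]])

# Plain array arithmetic: applied elementwise by numpy itself.
direct = (arr + 1) * 2

# np.vectorize: calls the Python lambda once per element.
via_vectorize = np.vectorize(lambda x: (x + 1) * 2)(arr)

same = np.array_equal(direct, via_vectorize)
print(same)  # True
```

For simple arithmetic like this, the direct expression is usually the better choice; vectorize is mainly useful when the per-element function cannot be written with array operations.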
I'm trying to convert this numpy.ndarray to a list
[[105.53518731]
[106.45317529]
[107.37373843]
[108.00632646]
[108.56373502]
[109.28813113]
[109.75593207]
[110.57458371]
[111.47960639]]
I'm using this function to convert it.
conver = conver.tolist()
The output is below. I'm not sure whether it's a list and, if so, whether I can access its elements with conver[0], etc.
[[105.5351873125], [106.45317529411764], [107.37373843478261], [108.00632645652173], [108.56373502040816], [109.28813113157895], [109.75593206666666], [110.57458370833334], [111.47960639393939]]
Finally, after converting it to a list, I try to multiply the list members by 1.05 and get this error:
TypeError: can't multiply sequence by non-int of type 'float'
You start with a 2d array, with shape (n,1), like this:
In [342]: arr = np.random.rand(5,1)*100
In [343]: arr
Out[343]:
array([[95.39049043],
[19.09502087],
[85.45215423],
[94.77657561],
[32.7869103 ]])
tolist produces a list - but it contains lists; each [] layer denotes a list. Notice that the [] nesting matches the array's:
In [344]: arr.tolist()
Out[344]:
[[95.39049043424225],
[19.095020872584335],
[85.4521542296349],
[94.77657561477125],
[32.786910295446425]]
To get a number you have to index through each list layer:
In [345]: arr.tolist()[0]
Out[345]: [95.39049043424225]
In [346]: arr.tolist()[0][0]
Out[346]: 95.39049043424225
In [347]: arr.tolist()[0][0]*1.05
Out[347]: 100.16001495595437
If you first turn the array into a 1d one, the list indexing is simpler:
In [348]: arr.ravel()
Out[348]: array([95.39049043, 19.09502087, 85.45215423, 94.77657561, 32.7869103 ])
In [349]: arr.ravel().tolist()
Out[349]:
[95.39049043424225,
19.095020872584335,
85.4521542296349,
94.77657561477125,
32.786910295446425]
In [350]: arr.ravel().tolist()[0]
Out[350]: 95.39049043424225
But if your primary goal is to multiply the elements, doing it with the array is simpler:
In [351]: arr * 1.05
Out[351]:
array([[100.16001496],
[ 20.04977192],
[ 89.72476194],
[ 99.5154044 ],
[ 34.42625581]])
You can access elements of the array with:
In [352]: arr[0,0]
Out[352]: 95.39049043424225
But if you do need to iterate, the tolist() option is good to know. Iterating on lists is usually faster than iterating on an array. With an array you should try to use the fast whole-array methods.
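To make the error from the question concrete, here is a small sketch using the first three values from the question (the names scaled and flat_list are just for illustration). Multiplying a Python list by a float raises the TypeError, while multiplying the array first and converting afterwards works:

```python
import numpy as np

conver = np.array([[105.53518731], [106.45317529], [107.37373843]])

# Each element of the nested list is itself a list, and list * float
# is not defined in Python.
try:
    conver.tolist()[0] * 1.05
    raised = False
except TypeError:
    raised = True

# Multiply while it is still an array, then flatten and convert.
scaled = conver * 1.05                 # shape (3, 1)
flat_list = scaled.ravel().tolist()    # plain list of 3 floats

print(raised)          # True
print(len(flat_list))  # 3
```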
You converted to a list of lists, and plain lists do not broadcast. Flatten the data back into an array instead:
import numpy as np
x = [[105.53518731],
[106.45317529],
[107.37373843],
[108.00632646],
[108.56373502],
[109.28813113],
[109.75593207],
[110.57458371],
[111.47960639],]
x = np.hstack(x)
x * 1.05
array([110.81194668, 111.77583405, 112.74242535, 113.40664278,
113.99192177, 114.75253769, 115.24372867, 116.1033129 ,
117.05358671])
Yes, it's a list. You can check the type of a variable with:
type(a)
To multiply each element by 1.05, run the code below:
x = [float(i[0]) * 1.05 for i in a]
print(x)
Try this:
import numpy as np
a = [[105.53518731],
[106.45317529],
[107.37373843],
[108.00632646],
[108.56373502],
[109.28813113],
[109.75593207],
[110.57458371],
[111.47960639]]
b = [elem[0] for elem in a]
b = np.array(b)
print(b*1.05)
How do you add the elements in sub-lists according to the index of the values? For example, how do you turn this:
nested_list = [[1,2],[3,4],[5,6]]
into this?
sublist_sums = [9,12] # [1 + 3 + 5, 2 + 4 + 6]
Sorry if the title wasn't very clear, I wasn't really sure how to put it.
If using NumPy is allowed, then you can use numpy.sum() along axis=0:
In [11]: np.sum(nested_list, axis=0)
Out[11]: array([ 9, 12])
On the other hand, if you want a plain Python solution, then summing the zipped sub-lists in a list comprehension would suffice:
In [32]: [sum(l) for l in zip(*nested_list)]
Out[32]: [9, 12]
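Both approaches above assume every sub-list has the same length. As a hedged sketch for the ragged case (which the question does not ask about, so treat ragged_list as a hypothetical input), itertools.zip_longest with fillvalue=0 treats missing entries as zero, whereas plain zip stops at the shortest sub-list:

```python
from itertools import zip_longest

nested_list = [[1, 2], [3, 4], [5, 6]]
ragged_list = [[1, 2], [3, 4], [5]]   # hypothetical ragged input

# Equal-length sub-lists: zip pairs up values by index.
equal_sums = [sum(col) for col in zip(*nested_list)]

# Ragged sub-lists: zip silently drops the unmatched tail.
truncated = [sum(col) for col in zip(*ragged_list)]

# zip_longest pads the short sub-lists with 0 instead.
padded = [sum(col) for col in zip_longest(*ragged_list, fillvalue=0)]

print(equal_sums)  # [9, 12]
print(truncated)   # [9]
print(padded)      # [9, 6]
```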
An answer has already been accepted, but the following can also be used for your requirement. Let me know whether this answers your question.
import pandas as pd
import numpy as np
c = ['Val1','Val2']
v = [
[1,1.0],
[2,1.0],
[1,1.0],
[2,0.98],
[3,0.78],
[4,0.70],
[9,0.97],
[6,0.67],
[12,0.75],
]
n = len(v)
df = pd.DataFrame(v,columns=c)
#Take top N ie all elements in this case and sum it.
print(list(df.groupby('Val1').head(n).sum()))
#### Output ####
[40.0, 7.85]
#Alternatively you can create a column where the value is same for all
#In my case column is 'id' and value is 1
#Then apply group-by-sum on 'id'
df['id'] = [1]*n
print(df.groupby('id').sum())
#### Output ####
    Val1  Val2
id
1     40  7.85
I would like to build up a numpy matrix using rows I get in a loop. But how do I initialize the matrix? If I write
A = []
A = numpy.vstack((A, [1, 2]))
I get
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What's the best practice for this?
NOTE: I do not know the number of rows in advance. The number of columns is known.
Unknown number of rows
One way is to form a list of lists, and then convert to a numpy array in one operation:
final = []
# x is some generator
for item in x:
    final.append(item)
A = np.array(final)
Or, more elegantly, given a generator x:
A = np.array(list(x))
This solution is time-efficient but memory-inefficient.
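A runnable sketch of the list-then-convert approach, assuming a hypothetical generator row_source that yields rows of two values. It also shows one way to seed np.vstack as the question attempted: start from an empty array of shape (0, 2) rather than a plain []. Note that vstack copies the whole array on every call, so the list approach is usually preferable when there are many rows.

```python
import numpy as np

def row_source():
    """Hypothetical generator; the row count is not known up front."""
    for i in range(4):
        yield [i, i * 10]

# Collect rows in a Python list, then convert once at the end.
rows = []
for item in row_source():
    rows.append(item)
A = np.array(rows)

# Seeding vstack with a (0, 2) array keeps the column count consistent.
B = np.empty((0, 2))
for item in row_source():
    B = np.vstack((B, item))

print(A.shape)  # (4, 2)
print(B.shape)  # (4, 2)
```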
Known number of rows
Append operations on numpy arrays are expensive and not recommended. If you know the size of the final array in advance, you can instantiate an empty (or zero) array of your desired size, and then fill it with values. For example:
A = np.zeros((10, 2))
A[0] = [1, 2]
Or in a loop, with a trivial assignment to demonstrate syntax:
A = np.zeros((2, 2))
# in reality, x will be some generator whose length you know in advance
x = [[1, 2], [3, 4]]
for idx, item in enumerate(x):
    A[idx] = item
print(A)
[[1. 2.]
 [3. 4.]]
Maybe this has been asked before, but I can't find it.
Sometimes I have an index array I, and I want to accumulate values from another array into a numpy array at the positions given by that index, including repeated positions. For example:
A = np.array([1,2,3])
B = np.array([10,20,30])
I = np.array([0,1,1])
for i in range(len(I)):
    A[I[i]] += B[i]
print(A)
prints the expected (correct) value:
[11 52 3]
while
A[I] += B
print(A)
results in the expected (wrong) answer
[11 32 3].
Is there any way to do what I want in a vectorized way, without the loop?
If not, which is the fastest way to do this?
Use numpy.add.at:
>>> import numpy as np
>>> A = np.array([1,2,3])
>>> B = np.array([10,20,30])
>>> I = np.array([0,1,1])
>>>
>>> np.add.at(A, I, B)
>>> A
array([11, 52, 3])
Alternatively, np.bincount:
>>> A = np.array([1,2,3])
>>> B = np.array([10,20,30])
>>> I = np.array([0,1,1])
>>>
>>> A += np.bincount(I, B, minlength=A.size).astype(int)
>>> A
array([11, 52, 3])
Which is faster?
Depends. In this concrete example add.at seems marginally faster, presumably because we need to convert types in the bincount solution.
If, on the other hand, A and B had a float dtype, then bincount would be faster.
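A short sketch of the float case mentioned above, where no astype conversion is needed because np.bincount with weights already returns floats. The float versions of A and B are assumptions for illustration, and bincount requires the indices to be non-negative integers:

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([10.0, 20.0, 30.0])
I = np.array([0, 1, 1])

# bincount sums the weights B per index in I; minlength pads the
# result to A's length so the in-place addition lines up.
A += np.bincount(I, weights=B, minlength=A.size)
print(A.tolist())  # [11.0, 52.0, 3.0]
```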
You need to use np.add.at:
A = np.array([1,2,3])
B = np.array([10,20,30])
I = np.array([0,1,1])
np.add.at(A, I, B)
print(A)
prints
[11 52  3]
This is noted in the doc:
ufunc.at(a, indices, b=None)
Performs unbuffered in place operation on operand ‘a’ for elements specified by ‘indices’. For addition ufunc, this method is equivalent to a[indices] += b, except that results are accumulated for elements that are indexed more than once. For example, a[[0,0]] += 1 will only increment the first element once because of buffering, whereas add.at(a, [0,0], 1) will increment the first element twice.
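The example from the documentation excerpt above can be run directly. A minimal sketch (the array names a and b are illustrative) comparing buffered fancy-index assignment with np.add.at:

```python
import numpy as np

# Buffered: the repeated index 0 is only incremented once.
a = np.zeros(3, dtype=int)
a[[0, 0]] += 1

# Unbuffered: np.add.at accumulates for every occurrence of the index.
b = np.zeros(3, dtype=int)
np.add.at(b, [0, 0], 1)

print(a.tolist())  # [1, 0, 0]
print(b.tolist())  # [2, 0, 0]
```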