I have five two-dimensional arrays
A = [[1, 2, 3, 4, 5, 6],
     [3, 4, 5, 6, 7, 8],
     [5, 6, 7, 8, 9, 0]]

B = [[11, 12, 13, 14, 15, 16],
     [21, 22, 23, 24, 25, 26],
     [13, 14, 15, 16, 17, 18]]

C = [[31, 32, 33, 34, 35, 36],
     [12, 13, 14, 15, 16, 17],
     [20, 21, 22, 23, 24, 25]]

D = [[2, 3, 4, 5, 6, 7],
     [3, 4, 5, 6, 7, 8],
     [6, 7, 8, 9, 0, 11]]

Base = [[11, 22, 33, 44, 55, 66],
        [12, 23, 34, 45, 56, 67],
        [33, 44, 55, 66, 77, 88],
        [1, 2, 3, 4, 5, 6]]
So I want to multiply the arrays A, B, C, and D by the Base array, which should produce output like the following:
Output = [ [A[0]*Base[0] + B[0]*Base[1] + C[0]*Base[2] + D[0]*Base[3] ], (this is summed)
[A[1]*Base[0] + B[1]*Base[1] + C[1]*Base[2] + D[1]*Base[3] ], (this is summed)
[A[2]*Base[0] + B[2]*Base[1] + C[2]*Base[2] + D[2]*Base[3] ], (this is summed)
[A[3]*Base[0] + B[3]*Base[1] + C[3]*Base[2] + D[3]*Base[3] ] (this is summed)
]
What I've been trying to do is use an algorithm like the following.
Multiply them one by one:
Output1 = A*Base[0]
Output2 = B*Base[1]
Output3 = C*Base[2]
Output4 = D*Base[3]
then add them up one by one
Sum1 = sum(Output1[0])
Sum2 = sum(Output1[1])
Sum3 = sum(Output1[2])
Sum4 = sum(Output2[0])
Sum5 = sum(Output2[1])
...
So I get the output by adding them up: [Sum1 + Sum4 + ...], [Sum2 + Sum5 + ...], ...
Is there a simpler and shorter way to do this, apart from multiplying them one by one?
It's not entirely clear what result you want from multiplying the arrays, but numpy makes this kind of operation much easier:
import numpy as np

A, B, C, D, Base = np.array(A), np.array(B), np.array(C), np.array(D), np.array(Base)
For instance, the following operation multiplies every row of A with every row of Base and sums each product, resulting in 4 * 3 = 12 sums:
[sum(list(base_list * A_list)) for base_list in Base for A_list in A]
[1001, 1463, 1265, 1022, 1496, 1300, 1463, 2189, 2035, 91, 133, 115]
Then just sum the results to obtain the full sum of A*Base in one or two lines (and do the same with the remaining arrays):
sum([1001, 1463, 1265, 1022, 1496, 1300, 1463, 2189, 2035, 91, 133, 115])
13573
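For completeness, here is a minimal sketch of the same multiply-and-sum applied to all four matrices (assuming the A, B, C, D and Base lists from the question):
import numpy as np

# convert the question's lists; each total is the sum of all pairwise row products
A, B, C, D, Base = (np.array(m) for m in (A, B, C, D, Base))
for name, M in (("A", A), ("B", B), ("C", C), ("D", D)):
    total = sum((base_row * m_row).sum() for base_row in Base for m_row in M)
    print(name, total)  # for A this prints 13573, matching the sum above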
numpy is really useful for this type of operation. Here's a code snippet for what you want to do:
import numpy as np
A = [[1, 2, 3, 4, 5, 6],
[3, 4, 5, 6, 7, 8],
[5, 6, 7, 8, 9, 0]]
B = [[11, 12, 13, 14, 15, 16],
[21, 22, 23, 24, 25, 26],
[13, 14, 15, 16, 17, 18]]
C = [[31, 32, 33, 34, 35, 36],
[12, 13, 14, 15, 16, 17],
[20, 21, 22, 23, 24, 25]]
D = [[2, 3, 4, 5, 6, 7],
[3, 4, 5, 6, 7, 8],
[6, 7, 8, 9, 0, 11]]
Base = [
[11, 22, 33, 44, 55, 66],
[12, 23, 34, 45, 56, 67],
[33, 44, 55, 66, 77, 88],
[1, 2, 3, 4, 5, 6]
]
A = np.asarray(A)
B = np.asarray(B)
C = np.asarray(C)
D = np.asarray(D)
Base = np.asarray(Base)
Output = np.asarray([
[A[0] * Base[0] + B[0] * Base[1] + C[0] * Base[2] + D[0] * Base[3]],
[A[1] * Base[0] + B[1] * Base[1] + C[1] * Base[2] + D[1] * Base[3]],
[A[2] * Base[0] + B[2] * Base[1] + C[2] * Base[2] + D[2] * Base[3]],
# [A[3] * Base[0] + B[3] * Base[1] + C[3] * Base[2] + D[3] * Base[3]] # IndexError: index 3 is out of bounds for axis 0 with size 3 -> the matrices A, B, C and D have only 3 rows
]).squeeze()
print(f"{A.shape = } {B.shape = } {C.shape = } {D.shape = } {Base.shape = }, {Output.shape = }")
# simple math
results_simple = A * Base[0] + B * Base[1] + C * Base[2] + D * Base[3]
print(f"{results_simple.shape = }")
print(f"{np.allclose(results_simple, Output) = }")
# result with more algebra
M = np.stack([A, B, C, D], axis=1)
print(f"{M.shape = }")
results_algebra = np.sum(M * Base, axis=1)
print(f"{results_algebra.shape = }")
print(f"{np.allclose(results_algebra, Output) = }")
# results with einsum
results_einsum = np.einsum("ijk,jk->ik", M, Base)
print(f"{results_einsum.shape = }")
print(f"{np.allclose(results_einsum, Output) = }")
out:
A.shape = (3, 6) B.shape = (3, 6) C.shape = (3, 6) D.shape = (3, 6) Base.shape = (4, 6), Output.shape = (3, 6)
results_simple.shape = (3, 6)
np.allclose(results_simple, Output) = True
M.shape = (3, 4, 6)
results_algebra.shape = (3, 6)
np.allclose(results_algebra, Output) = True
results_einsum.shape = (3, 6)
np.allclose(results_einsum, Output) = True
After this, you can get the sum of the output along a given axis using numpy.sum. You will have something like this:
out_sum = np.sum(Output, axis=-1)
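For instance, assuming the (3, 6) Output computed above, the different axis choices give:
row_sums = np.sum(Output, axis=-1)  # shape (3,): one total per output row
col_sums = np.sum(Output, axis=0)   # shape (6,): one total per column
grand_total = Output.sum()          # a single scalar: the sum of everything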
num_pixels_per_cell_one_axis = 4
num_cells_per_module_one_axis = 4
inter_cell_sep = 2
max_items_in_list = num_cells_per_module_one_axis * num_pixels_per_cell_one_axis + (num_cells_per_module_one_axis-1) * inter_cell_sep
print(max_items_in_list)
indices_to_retain = list(range(max_items_in_list))
indices_to_remove = indices_to_retain[num_pixels_per_cell_one_axis :: num_pixels_per_cell_one_axis + inter_cell_sep]
The result I'm trying to get is the list indices_to_retain = [0, 1, 2, 3, 6, 7, 8, 9, 12, 13, 14, 15, 18, 19, 20, 21]
IIUC you want to keep 4 items, then skip 2?
You could use:
keep, skip = 4, 2
indices_to_retain = [i for i in range(max_items_in_list) if i % (skip + keep) < keep]
output:
>>> indices_to_retain
[0, 1, 2, 3, 6, 7, 8, 9, 12, 13, 14, 15, 18, 19, 20, 21]
NB. amusingly, I answered a very similar question just a few hours ago
using numpy:
import numpy as np

indices_to_retain = np.arange(max_items_in_list)
indices_to_retain = indices_to_retain[indices_to_retain % (skip + keep) < keep]
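Put together as a runnable sketch, with max_items_in_list = 22 taken from the question's formula:
import numpy as np

keep, skip = 4, 2
max_items_in_list = 4 * 4 + (4 - 1) * 2  # 22, per the question

indices_to_retain = np.arange(max_items_in_list)
indices_to_retain = indices_to_retain[indices_to_retain % (skip + keep) < keep]
print(indices_to_retain)
# [ 0  1  2  3  6  7  8  9 12 13 14 15 18 19 20 21]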
If I have a list test
test = [i for i in range(20)]
print(test)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
and I want to get the last 3 numbers out of every 5, such that I get a list that looks like:
[2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
I can do it with a modulo condition like
[i for i in test if i % 5 > 1]
but I'm wondering if there is a way to do this with list slicing instead? Thanks
Use the filter function:
list(filter(lambda x: x % 5 > 1, test)) # [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
If ordering does not matter, you can try the following:
test[2::5] + test[3::5] + test[4::5]
Or, more generally speaking:
start = 2  # number of leading indices to skip in each group of n
n = 5
new_test = []
while start < n:
    new_test.extend(test[start::n])
    start += 1
# new_test == [2, 7, 12, 17, 3, 8, 13, 18, 4, 9, 14, 19] (grouped by offset, not in the original order)
Yes, but I very much doubt it will be faster than a simple list comprehension:
from itertools import chain, zip_longest as zipl
def offset_modulo(l, x, n):
sentinel = object()
slices = (l[i::n] for i in range(x, n))
iterable = chain.from_iterable(zipl(*slices, fillvalue=sentinel))
return list(filter(lambda x: x is not sentinel, iterable))
print(offset_modulo(range(20), 2, 5))
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
print(offset_modulo(range(24), 2, 5))
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19, 22, 23]
Basically, this approach takes the list slices representing each index i such that i % n >= x. It then uses zip and chain to flatten those slices into the output.
Edit:
A simpler way
def offset(l, x, n):
diff = n-x
slices = (l[i:i+diff] for i in range(x, len(l), n))
return list(chain.from_iterable(slices))
offset(range(20), 2, 5)
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19]
offset(range(24), 2, 5)
# [2, 3, 4, 7, 8, 9, 12, 13, 14, 17, 18, 19, 22, 23]
Here we take the slices of the adjacent elements we want, then chain those together.
I propose this solution:
from functools import reduce
reduce(lambda x, y: x + y, zip(test[2::5], test[3::5], test[4::5]))
Testing with timeit, it is faster than filter and the list comprehension (at least on my PC). Note that reduce over the zipped slices returns a tuple; wrap it in list() if you need a list.
Here is the code to carry out an execution-time comparison:
import numpy as np
import timeit
a = timeit.repeat('list(filter(lambda x: x % 5 > 1, test))',
setup='from functools import reduce; test = list(range(20))',
repeat=20,
number=100000)
b = timeit.repeat('[i for i in test if i % 5 > 1]',
repeat=20,
setup='test = list(range(20))',
number=100000)
c = timeit.repeat('reduce(lambda x, y: x + y, zip(test[2::5], test[3::5], test[4::5]))',
repeat=20,
setup='from functools import reduce;test = list(range(20))',
number=100000)
list(map(lambda x: print("{}:\t\t {} ({})".format(x[0], np.mean(x[1]), np.std(x[1]))),
[("filter list", a),
('comprehension', b),
('reduce + zip', c)]))
The previous code produces the following results:
filter list: 0.2983790061000036 (0.007463432805174629)
comprehension: 0.15115660065002884 (0.004455055805853705)
reduce + zip: 0.11976779574997636 (0.002553487341208172)
I hope this can help :)
I have a simple R script for running FactoMineR's PCA on a tiny dataframe in order to find the cumulative percentage of variance explained for each variable:
library(FactoMineR)
a <- c(1, 2, 3, 4, 5)
b <- c(4, 2, 9, 23, 3)
c <- c(9, 8, 7, 6, 6)
d <- c(45, 36, 74, 35, 29)
df <- data.frame(a, b, c, d)
df_pca <- PCA(df, ncp = 4, graph=F)
print(df_pca$eig$`cumulative percentage of variance`)
Which returns:
> print(df_pca$eig$`cumulative percentage of variance`)
[1] 58.55305 84.44577 99.86661 100.00000
I'm trying to do the same in Python using scikit-learn's decomposition package as follows:
import pandas as pd
from sklearn import decomposition, linear_model
a = [1, 2, 3, 4, 5]
b = [4, 2, 9, 23, 3]
c = [9, 8, 7, 6, 6]
d = [45, 36, 74, 35, 29]
df = pd.DataFrame({'a': a,
'b': b,
'c': c,
'd': d})
pca = decomposition.PCA(n_components = 4)
pca.fit(df)
transformed_pca = pca.transform(df)
# sum cumulative variance from each var
cum_explained_var = []
for i in range(0, len(pca.explained_variance_ratio_)):
if i == 0:
cum_explained_var.append(pca.explained_variance_ratio_[i])
else:
cum_explained_var.append(pca.explained_variance_ratio_[i] +
cum_explained_var[i-1])
print(cum_explained_var)
But this results in:
[0.79987089715487936, 0.99224337624509307, 0.99997254568237226, 1.0]
As you can see, both correctly add up to 100%, but it seems the contributions of each variable differ between the R and Python versions. Does anyone know where these differences are coming from or how to correctly replicate the R results in Python?
EDIT: Thanks to Vlo, I now know that the differences stem from the FactoMineR PCA function scaling the data by default. By using the sklearn preprocessing package (pca_data = preprocessing.scale(df)) to scale my data before running PCA, my results match the R output.
Thanks to Vlo, I learned that the difference between the FactoMineR PCA function and the sklearn PCA function is that FactoMineR scales the data by default. By simply adding a scaling step to my Python code, I was able to reproduce the results.
import pandas as pd
from sklearn import decomposition, preprocessing
a = [1, 2, 3, 4, 5]
b = [4, 2, 9, 23, 3]
c = [9, 8, 7, 6, 6]
d = [45, 36, 74, 35, 29]
df = pd.DataFrame({'a': a,
'b': b,
'c': c,
'd': d})
pca_data = preprocessing.scale(df)
pca = decomposition.PCA(n_components = 4)
pca.fit(pca_data)
transformed_pca = pca.transform(pca_data)
cum_explained_var = []
for i in range(0, len(pca.explained_variance_ratio_)):
if i == 0:
cum_explained_var.append(pca.explained_variance_ratio_[i])
else:
cum_explained_var.append(pca.explained_variance_ratio_[i] +
cum_explained_var[i-1])
print(cum_explained_var)
Output:
[0.58553054049052267, 0.8444577483783724, 0.9986661265687754, 0.99999999999999978]
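As a side note, the explicit accumulation loop can be shortened with numpy's cumulative sum; a small sketch, reusing the fitted pca from above:
import numpy as np

# running total of the explained-variance ratios, equivalent to the loop above
cum_explained_var = np.cumsum(pca.explained_variance_ratio_)
print(cum_explained_var)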
I have two lists of numbers, say [1, 2, 3, 4, 5] and [7, 8, 9, 10, 11], and I would like to form a new list which consists of the products of each member in the first list with each member in the second list. In this case, there would be 5*5 = 25 elements in the new list.
I have been unable to do this so far with a while loop.
This is what I have so far:
x = 0
y = 99
results = []
while x < 5:
x = x + 1
results.append(x*y)
while y < 11:
y = y + 1
results.append(x*y)
Use itertools.product to generate all possible 2-tuples, then compute the product of each pair:
import itertools

[x * y for (x, y) in itertools.product([1, 2, 3, 4, 5], [7, 8, 9, 10, 11])]
The problem is an example of an outer product. The answer already posted with itertools.product is the way I would do this as well.
But here's an alternative with numpy, which is usually more efficient than working in pure python for crunching numeric data.
>>> import numpy as np
>>> x1 = np.array([1,2,3,4,5])
>>> x2 = np.array([7,8,9,10,11])
>>> np.outer(x1,x2)
array([[ 7, 8, 9, 10, 11],
[14, 16, 18, 20, 22],
[21, 24, 27, 30, 33],
[28, 32, 36, 40, 44],
[35, 40, 45, 50, 55]])
>>> np.ravel(np.outer(x1,x2))
array([ 7, 8, 9, 10, 11, 14, 16, 18, 20, 22, 21, 24, 27, 30, 33, 28, 32,
36, 40, 44, 35, 40, 45, 50, 55])
Why don't you try the good old way:
list1 = range(1, 100)
list2 = range(10, 50, 5)
new_values = []
for x in list1:
for y in list2:
new_values.append(x*y)
Without any importing, you can do:
[x * y for x in range(1, 6) for y in range(7, 12)]
or alternatively:
[[x * y for x in range(1, 6)] for y in range(7, 12)]
to split out the different multiples; it depends which order you want the results in.
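For instance, the two orderings look like this:
>>> [x * y for x in range(1, 6) for y in range(7, 12)]
[7, 8, 9, 10, 11, 14, 16, 18, 20, 22, 21, 24, 27, 30, 33, 28, 32, 36, 40, 44, 35, 40, 45, 50, 55]
>>> [[x * y for x in range(1, 6)] for y in range(7, 12)]
[[7, 14, 21, 28, 35], [8, 16, 24, 32, 40], [9, 18, 27, 36, 45], [10, 20, 30, 40, 50], [11, 22, 33, 44, 55]]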
from functools import partial

mult = lambda x, y: x * y
l1 = [2, 3, 4, 5, 5]
l2 = [5, 3, 23, 4, 4]
results = []
for n in l1:
    # partial(mult, n) fixes the first argument, so map multiplies n by each element of l2
    results.extend(map(partial(mult, n), l2))
print(results)
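A small variant of the same idea, sketched with operator.mul in place of the hand-rolled lambda:
from functools import partial
from operator import mul

results = []
for n in [2, 3, 4, 5, 5]:
    results.extend(map(partial(mul, n), [5, 3, 23, 4, 4]))
print(results)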