I am working with two lists of vectors, A and B. Each vector has dimension n, but A and B contain different numbers of vectors. For each vector in A, I would like to compute the product of its dot products with all the vectors in B. Here is an example with n=2:
import numpy as np
A = np.random.rand(10,2)
B = np.random.rand(5,2)
for a in A:
    PRODUCT = 1
    for b in B:
        PRODUCT = PRODUCT * np.matmul(a, b)
This does what I want, but I was wondering if there are faster methods that avoid nested for loops. One idea I had was to build the Cartesian product with from itertools import product, and then somehow handle the computation whenever i % len(B) == 0. But I was not able to make that work.
Is there a way to improve this, or are nested for loops the way to go?
There might be a cleaner way to do this, but this works:
import numpy as np
from itertools import product
A = np.random.rand(10,2)
B = np.random.rand(5,2)
one = []
two = []
curr = A[0]
PRODUCT = 1
for a, b in product(A, B):
    if not np.array_equal(a, curr):
        # moved on to the next vector of A: restart the running product
        curr = a
        PRODUCT = 1
    PRODUCT = PRODUCT * np.matmul(a, b)
    one.append(PRODUCT)
# reference: the nested loops from the question
for a in A:
    PRODUCT = 1
    for b in B:
        PRODUCT = PRODUCT * np.matmul(a, b)
        two.append(PRODUCT)
print(one == two)
Results
True
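The nested loops from the question can also be written without any Python-level loop. The following is a minimal sketch, not taken from the answer above: it assumes the goal is one final product per vector in A, and the names dots, products and expected are introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 2))   # 10 vectors of dimension 2
B = rng.random((5, 2))    # 5 vectors of dimension 2

# dots[i, j] is the dot product of A[i] and B[j]
dots = A @ B.T
# multiply across each row: one product per vector in A
products = np.prod(dots, axis=1)

# reference: the nested loops from the question
expected = []
for a in A:
    p = 1
    for b in B:
        p = p * np.matmul(a, b)
    expected.append(p)

print(np.allclose(products, expected))
```

The matrix product does all len(A) * len(B) dot products in one call, so the work is the same but it runs in compiled code rather than in the interpreter.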
a and b are two arrays of floats of length n each. a can have both negative and positive entries.
b is the cumulative sum of a, offset by a constant k.
So b[0] != a[0]; in fact, b[0] = a[0] + k.
Both a and b are shuffled such that the relative order between them is maintained, i.e., if say a[0] becomes a[6] then b[0] will become b[6] and so on.
Can someone suggest an algorithm to find k, given a and b shuffled so that their relative order is maintained?
My naive attempt is below (it takes forever for n>=10):
import numpy as np
import itertools
def get_starting_point(a, b):
    for msk in itertools.permutations(range(len(a))):  # NOTE: Takes forever for n>=10.
        new_a = a[list(msk)]
        new_b = b[list(msk)]
        k = new_b[0] - new_a[0]
        new_a = np.cumsum(new_a) + k
        if np.nansum(np.abs(new_b - new_a)) < 0.001:
            return k
    return None
Generate samples of a, b and expected k to try your solution:
def get_a_b_k(n=14):
    a = np.round(np.random.uniform(low=-10, high=10, size=(n,)), 2)
    b = np.cumsum(a)
    prob = np.random.uniform(0,1)
    if prob < 0.4:
        k = np.round(np.random.uniform(-10,10), 2)
    # NOTE: this elif can be removed as its just sub-case of else block.
    elif prob < 0.6:  # k same as the last b.
        k = b[n-1]
        a[n-2] -= k
    else:  # k same as one of b's
        idx = np.random.choice(n, size=1)
        k = b[idx]
        a[idx] -= k
    b = np.cumsum(a)
    msk = np.random.choice(n, size=n, replace=False)  # Randomly generated mask of size n.
    return a[msk], b[msk] + k, k
We have:
b = np.cumsum(a) + k
Computing b-a gives, for each position, the value of b that preceded it in the original order. The only element of b-a that does not appear in b therefore marks the start of the sequence, and its value is k.
As we are working with floating point numbers, we need a function that matches values within a tolerance. I used isin_tolerance, which is defined in a separate linked answer.
def solve(a, b):
    m = isin_tolerance(b-a, b, 1e-8)
    return (b[~m] - a[~m])[0]
np.random.seed(0)
for i in range(1_000_000):
    a, b, k = get_a_b_k()
    assert np.isclose(k, solve(a, b))
This takes a few minutes to run on 1M attempts but did not fail. On 10k tests with n=200 this runs in ~2s.
NB. This could fail if coincidentally, k is equal to one of the values in b, but this is fairly unlikely and did not happen in my random tests.
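The definition of isin_tolerance is not shown in the answer. Below is a minimal stand-in, assuming it takes two arrays and a tolerance and returns a boolean mask that is True where an element of the first array lies within the tolerance of some element of the second. This version is an assumption for illustration rather than the original helper; it compares all pairs, so it uses O(len(x) * len(y)) memory and is only meant for small inputs.

```python
import numpy as np

def isin_tolerance(x, y, tol):
    """Hypothetical stand-in: True where x[i] is within tol of some y[j]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # compare every element of x with every element of y
    return (np.abs(x[:, None] - y[None, :]) <= tol).any(axis=1)

def solve(a, b):
    # b - a is the previous running total, except at the start where it is k
    m = isin_tolerance(b - a, b, 1e-8)
    return (b[~m] - a[~m])[0]

# small hand-made case with a known offset, shuffled consistently
a = np.array([1.5, -2.0, 3.25, 0.5])
k = 7.0
b = np.cumsum(a) + k
order = np.array([2, 0, 3, 1])
a_shuffled, b_shuffled = a[order], b[order]

print(solve(a_shuffled, b_shuffled))
```

For large n, a sort-based lookup (for example with np.searchsorted) would avoid the quadratic memory cost.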
I'd like to know how to make the following code shorter and/or more efficient. Could I (or should I) get rid of the for loop by using a functional approach, or is there a numpy method I should be using?
The code calculates the expected value of an array of integers.
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = np.ones(len(vals))
for i in range(len(vals)):
    parr[i] *= self.prob(vals[i])
return np.dot(vals,parr)
As requested in comments, the implementation of the method prob():
def prob(self, x):
    """Computes probability of removing x items
    :param x: number of items to remove
    :returns: probability of removing x items
    """
    # p is the probability of removing an item
    # sl.choose computes n choose x
    return sl.choose(self.n, x) * (self.p**x) * \
        (1-self.p)**(self.n-x)
I think this will be the fastest:
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = self.prob(vals)
return np.dot(vals,parr)
and the function:
def prob(self, list_of_x):
    """Computes probability of removing x items
    :param list_of_x: numbers of items to remove
    :returns: array of probabilities, one per entry of list_of_x
    """
    # p is the probability of removing an item
    # sl.choose computes n choose x
    return np.asarray([sl.choose(self.n, e) for e in list_of_x]) * (self.p ** list_of_x) * \
        (1-self.p)**(self.n - list_of_x)
Because numpy is faster:
import timeit
import numpy as np
list_a = [1, 2, 3] * 1000
list_b = [4, 5, 6] * 1000
np_list_a = np.asarray(list_a)
np_list_b = np.asarray(list_b)
print(timeit.timeit('[a * b for a, b in zip(list_a, list_b)]', 'from __main__ import list_a, list_b', number=1000))
print(timeit.timeit('np_list_a * np_list_b', 'from __main__ import np_list_a, np_list_b', number=1000))
Result:
0.19378583212707723
0.004333830584755033
The loop can be reduced to a list comprehension:
vals = np.arange(self.n+1)
# array of probability of each value in vals
parr = [self.prob(v) for v in vals]
return np.dot(vals, parr)
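For readers who want to try the idea outside the class, here is a self-contained sketch. It is not the asker's code: expected_removed, n and p are hypothetical stand-ins for the method and its attributes, and math.comb replaces sl.choose on the assumption, taken from the comment in the question, that sl.choose computes n choose x. Since prob is the binomial probability mass function, the result should equal n * p, which gives a quick sanity check.

```python
import math
import numpy as np

def expected_removed(n, p):
    """Expected number of removed items when each of n items is removed with probability p."""
    vals = np.arange(n + 1)
    # binomial coefficients n choose x, computed with the standard library
    coeffs = np.array([math.comb(n, int(x)) for x in vals], dtype=float)
    # probability of removing exactly x items, for every x at once
    parr = coeffs * p**vals * (1 - p)**(n - vals)
    return np.dot(vals, parr)

n, p = 20, 0.3
print(np.isclose(expected_removed(n, p), n * p))
```

Only the binomial coefficients still go through a Python-level loop here; the powers and the final dot product are evaluated on whole arrays.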
I want to replace two nested for loops with a single for loop. Is there any way to do this? The list is very large.
A = [1,4,2,6,9,10,80] # length of list is very large
B = []
for x in A:
    for y in A:
        if x != y:
            B.append(abs(x-y))
print(B)
Not any better, but more Pythonic:
B = [abs(x-y) for x in A for y in A if x!=y]
Unless you absolutely need the duplicates (abs(a-b) == abs(b-a)), you can halve your list (and thus the computation):
B = [abs(A[i]-A[j]) for i in range(len(A)) for j in range(i+1, len(A))]
Finally, you can use the power of numpy to run the loops in compiled code:
import numpy as np
A = np.array(A)
A.shape = -1,1 # make it a column vector
diff = np.abs(A - A.T) # diff is the matrix of abs differences
# grab upper triangle of order 1 (i.e. less the diagonal)
B = diff[np.triu_indices(len(A), k=1)]
But this will always be O(n^2) no matter what...
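As a further variation, the index-based comprehension can be written with itertools.combinations, which yields each unordered pair exactly once. This is a small sketch on the list from the question (with the missing comma restored); B_np is a name introduced here only to compare against the numpy upper-triangle approach.

```python
from itertools import combinations
import numpy as np

A = [1, 4, 2, 6, 9, 10, 80]

# each unordered pair (x, y) is visited exactly once
B = [abs(x - y) for x, y in combinations(A, 2)]

# same pairs, in the same order, via the numpy upper triangle
col = np.array(A).reshape(-1, 1)
diff = np.abs(col - col.T)
B_np = diff[np.triu_indices(len(A), k=1)]

print(B == B_np.tolist())
```

Both versions still visit n * (n - 1) / 2 pairs, so the quadratic cost remains.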
I have an ndarray, A, and I want to multiply this ndarray element-wise by another 1D array b, where I assume that A.shape[i] = len(b) for some i. I need this generality in my application.
I can do this using np.tile as follows:
A = np.random.rand(2,3,5,9)
b = np.random.rand(5)
i = 2
b_shape = np.ones(len(A.shape), dtype=int)  # np.int was removed in NumPy 1.24; the builtin int works everywhere
b_shape[i] = len(b)
b_reps = list(A.shape)
b_reps[i] = 1
B = np.tile(b.reshape(b_shape), b_reps)
# Here B.shape = A.shape and
# B[i,j,:,k] = b for all i,j,k
This strikes me as ugly. Is there a better way to do this?
For this particular example, the following code would do the trick:
result = A*b[:, np.newaxis]
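For any value of i, reshape b so that its length sits on axis i and every other axis has length 1; broadcasting then does the rest (np.broadcast_arrays(A, b) on its own only works when i is the last axis):
b_shape = [1] * A.ndim
b_shape[i] = len(b)
result = A * b.reshape(b_shape)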
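One way to gain confidence in a broadcasting solution is to compare it with the explicit np.tile construction from the question for every axis. The sketch below does that; multiply_along_axis is a hypothetical helper name, and it assumes that len(b) equals A.shape[i].

```python
import numpy as np

def multiply_along_axis(A, b, i):
    """Multiply A element-wise by the 1D array b along axis i (hypothetical helper)."""
    shape = [1] * A.ndim
    shape[i] = len(b)          # b occupies axis i; all other axes have length 1
    return A * b.reshape(shape)

rng = np.random.default_rng(0)
A = rng.random((2, 3, 5, 9))

for i in range(A.ndim):
    b = rng.random(A.shape[i])
    # reference: explicit tiling as in the question
    reps = list(A.shape)
    reps[i] = 1
    shape = [1] * A.ndim
    shape[i] = len(b)
    B = np.tile(b.reshape(shape), reps)
    assert np.allclose(multiply_along_axis(A, b, i), A * B)
```

Unlike np.tile, the reshape creates no full-size copy of b; broadcasting expands it virtually during the multiplication.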
How can I calculate this product without a loop? I think I need to use numpy.tensordot but I can't seem to set it up correctly. Here's the loop version:
import numpy as np
a = np.random.rand(5,5,3,3)
b = np.random.rand(5,5,3,3)
c = np.zeros(a.shape[:2])
for i in range(c.shape[0]):
    for j in range(c.shape[1]):
        c[i,j] = np.sum(a[i,j,:,:] * b[i,j,:,:])
(The result is a numpy array c of shape (5,5))
I've lost the plot. The answer is simply
c = a * b
c = np.sum(c,axis=3)
c = np.sum(c,axis=2)
or on one line
c = np.sum(np.sum(a*b,axis=2),axis=2)
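Two other spellings of the same reduction may be worth knowing; the sketch below checks both against the two-step sum. The names c_ref, c_axes and c_einsum are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 5, 3, 3))
b = rng.random((5, 5, 3, 3))

# reference: the two-step sum from the answer
c_ref = np.sum(np.sum(a * b, axis=2), axis=2)

# sum over both trailing axes in one call
c_axes = np.sum(a * b, axis=(2, 3))

# einsum: k and l appear in the inputs but not in the output, so they are summed over
c_einsum = np.einsum('ijkl,ijkl->ij', a, b)

print(c_einsum.shape)
print(np.allclose(c_ref, c_axes), np.allclose(c_ref, c_einsum))
```

The einsum form avoids materialising the intermediate a * b array, which can matter when the inputs are large.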
Maybe this will help you with the syntax:
>>> from numpy import *
>>> a = arange(60.).reshape(3,4,5)
>>> b = arange(24.).reshape(4,3,2)
>>> c = tensordot(a,b, axes=([1,0],[0,1])) # sum over the 1st and 2nd dimensions
>>> c.shape
(5, 2)
>>> # A slower but equivalent way of computing the same:
>>> c = zeros((5,2))
>>> for i in range(5):
...     for j in range(2):
...         for k in range(3):
...             for n in range(4):
...                 c[i,j] += a[k,n,i] * b[n,k,j]
...
(from http://www.scipy.org/Numpy_Example_List#head-a46c9c520bd7a7b43e0ff166c01b57ec76eb96c7)