I was working on part of a program in which I'm trying to input a list of numbers and return all groups of 3 numbers which sum to 0, without double or triple counting each number. Here's where I'm up to:
def threeSumZero2(array):
    sums = []
    apnd = [sorted([x, y, z]) for x in array for y in array for z in array if x + y + z == 0]
    for sets in apnd:
        if sets not in sums:
            sums.append(sets)
    return sums
Is there any code I can put in the third line to make sure I don't return [0,0,0] as an answer?
This is my test list:
[-1,0,1,2,-1,4]
Thank you
*Edit: I should have clarified for repeated input values: the result expected for this test list is:
[[-1,-1,2],[-1,0,1]]
You want combinations without replacement, which itertools offers. Your sums can then be stored in a set to remove duplicates that differ only in ordering.
from itertools import combinations
def threeSumZero2(array):
    sums = set()
    for comb in combinations(array, 3):
        if sum(comb) == 0:
            sums.add(tuple(sorted(comb)))
    return sums
print(threeSumZero2([-1,0,1,2,-1,4]))
Output
{(-1, -1, 2), (-1, 0, 1)}
This solution can also be written more concisely using a set-comprehension.
def threeSumZero2(nums):
    return {tuple(sorted(comb)) for comb in combinations(nums, 3) if sum(comb) == 0}
More efficient algorithm
However, the algorithm above traverses all combinations of three items, which makes it O(n^3).
A general strategy used for this kind of n-sum problem is to traverse the (n-1)-combinations and hash their sums, which allows testing them efficiently against the numbers in the list.
The complexity then drops by one order, to roughly O(n^2), as long as few pairs share the same sum (the second loop iterates over every stored pair for a matching sum).
from itertools import combinations
def threeSumZero2(nums, r=3):
    two_sums = {}
    for (i_x, x), (i_y, y) in combinations(enumerate(nums), r - 1):
        two_sums.setdefault(x + y, []).append((i_x, i_y))
    sums = set()
    for i, n in enumerate(nums):
        if -n in two_sums:
            sums |= {tuple(sorted([nums[idx[0]], nums[idx[1]], n]))
                     for idx in two_sums[-n] if i not in idx}
    return sums
print(threeSumZero2([-1,0,1,2,-1,4]))
Output
{(-1, -1, 2), (-1, 0, 1)}
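For comparison, here is a minimal sketch of a different O(n^2) technique, the sorted two-pointer scan. It is not the hashing method shown above, and the name three_sum_zero_sorted is only illustrative. It assumes the input values can be sorted.

```python
def three_sum_zero_sorted(nums):
    """Return the set of sorted triplets (by position) that sum to zero."""
    values = sorted(nums)
    found = set()
    for i in range(len(values) - 2):
        # Two pointers scan the part of the list to the right of i.
        lo, hi = i + 1, len(values) - 1
        while lo < hi:
            total = values[i] + values[lo] + values[hi]
            if total == 0:
                found.add((values[i], values[lo], values[hi]))
                lo += 1
                hi -= 1
            elif total < 0:
                lo += 1   # sum too small: move to a larger value
            else:
                hi -= 1   # sum too large: move to a smaller value
    return found

print(three_sum_zero_sorted([-1, 0, 1, 2, -1, 4]))
```

Because the three positions are always distinct, [0, 0, 0] is only reported when the input really contains three zeros.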
You could do this with itertools (see Oliver's answer), but you can also achieve the result with three nested for-loops:
def threeSumZero2(lst):
    groups = []
    for i in range(len(lst)-2):
        for j in range(i + 1, len(lst)-1):
            for k in range(j + 1, len(lst)):
                if lst[i] + lst[j] + lst[k] == 0:
                    groups.append((lst[i], lst[j], lst[k]))
    return groups
and your test:
>>> threeSumZero2([-1, 0, 1, 2, -1, 4])
[(-1, 0, 1), (-1, 2, -1), (0, 1, -1)]
Oh and list != array!
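Note that the nested-loop output lists (-1, 0, 1) twice, once in a different order. A small sketch of how the groups could be normalised to match the expected result from the question; unique_triplets is a hypothetical helper name, not part of the answer above.

```python
def unique_triplets(groups):
    """Sort each triplet, then drop triplets that differ only in ordering."""
    return sorted({tuple(sorted(group)) for group in groups})

groups = [(-1, 0, 1), (-1, 2, -1), (0, 1, -1)]
print(unique_triplets(groups))  # [(-1, -1, 2), (-1, 0, 1)]
```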
I want to write a function Rem(a, b) that returns a new tuple like a, but with the first occurrence of element b removed. For example,
Rem((0, 1, 9, 1, 4), 1) will return (0, 9, 1, 4).
I am only allowed to use higher order functions such as lambda, filter, map, and reduce.
I am thinking about using filter, but this will delete all of the matching elements:
def myRem(T, E):
    return tuple(filter(lambda x: (x!=E), T))
With myRem((0, 1, 9, 1, 4), 1) I get (0, 9, 4).
The following works (Warning: hacky code):
tuple(map(lambda y: y[1], filter(lambda x: (x[0]!=T.index(E)), enumerate(T))))
But I would never recommend doing this unless the requirements are rigid
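The one-liner above reads T and E from the enclosing scope and calls T.index(E) once per element. A minimal sketch of the same idea wrapped in a function, with the index computed once; the name rem_by_index and the choice to return the tuple unchanged when the element is absent are assumptions, not part of the original answer.

```python
def rem_by_index(t, e):
    """Remove the first occurrence of e from tuple t using filter and map."""
    if e not in t:
        return t  # assumption: nothing to remove, return the tuple as-is
    first = t.index(e)  # position of the first occurrence, computed once
    kept = filter(lambda pair: pair[0] != first, enumerate(t))
    return tuple(map(lambda pair: pair[1], kept))

print(rem_by_index((0, 1, 9, 1, 4), 1))  # (0, 9, 1, 4)
```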
Trick with temporary list:
def removeFirst(t, v):
    tmp_lst = [v]
    return tuple(filter(lambda x: (x != v or (not tmp_lst or v != tmp_lst.pop(0))), t))
print(removeFirst((0, 1, 9, 1, 4), 1))
tmp_lst.pop(0) - will be called only once (thus, excluding the 1st occurrence of the crucial value v)
not tmp_lst - all remaining/potential occurrences will be included due to this condition
The output:
(0, 9, 1, 4)
For fun, using itertools, you can sorta use mostly higher-order functions...
>>> from itertools import *
>>> data = (0, 1, 9, 1, 4)
>>> not1 = (1).__ne__
>>> tuple(chain(takewhile(not1, data), islice(dropwhile(not1, data), 1, None)))
(0, 9, 1, 4)
BTW, here's some timings comparing different approaches for dropping a particular index in a tuple:
>>> timeit.timeit("t[:i] + t[i+1:]", "t = tuple(range(100000)); i=50000", number=10000)
10.42419078599778
>>> timeit.timeit("(*t[:i], *t[i+1:])", "t = tuple(range(100000)); i=50000", number=10000)
20.06185237201862
>>> timeit.timeit("(*islice(t,None, i), *islice(t, i+1, None))", "t = tuple(range(100000)); i=50000; from itertools import islice", number=10000)
>>> timeit.timeit("tuple(chain(islice(t,None, i), islice(t, i+1, None)))", "t = tuple(range(100000)); i=50000; from itertools import islice, chain", number=10000)
19.71128663700074
>>> timeit.timeit("it = iter(t); tuple(chain(islice(it,None, i), islice(it, 1, None)))", "t = tuple(range(100000)); i=50000; from itertools import islice, chain", number=10000)
17.6895881179953
Looks like it is hard to beat the straightforward: t[:i] + t[i+1:], which is not surprising.
Note, this one is shockingly less performant:
>>> timeit.timeit("tuple(j for i, j in enumerate(t) if i != idx)", "t = tuple(range(100000)); idx=50000", number=10000)
111.66658291200292
Which makes me think all these solutions using takewhile, filter and lambda will suffer pretty badly...
Although:
>>> timeit.timeit("not1 = (i).__ne__; tuple(chain(takewhile(not1, t), islice(dropwhile(not1, t), 1, None)))", "t = tuple(range(100000)); i=50000; from itertools import chain, takewhile,dropwhile, islice", number=10000)
62.22159145199112
Almost twice as fast as the generator expression, which goes to show that generator overhead can be quite large. takewhile and dropwhile are implemented in C, although this approach does redundant work (takewhile and dropwhile both scan the leading part of the tuple).
Another interesting observation: if we simply substitute a list comprehension for the generator expression, it is significantly faster, despite the fact that the list comprehension plus tuple call iterates over the result twice, compared to only once with the generator expression:
>>> timeit.timeit("tuple([j for i, j in enumerate(t) if i != idx])", "t = tuple(range(100000)); idx=50000", number=10000)
82.59887028901721
Goes to show how steep the generator-expression price can be...
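The absolute numbers above depend on the machine and interpreter, so they are best re-measured locally. Before timing, it can help to confirm that the variants return the same tuple. A small sketch, with hypothetical function names wrapping the expressions timed above:

```python
from itertools import chain, dropwhile, islice, takewhile

def drop_by_slicing(t, i):
    return t[:i] + t[i + 1:]

def drop_by_islice(t, i):
    it = iter(t)
    # First i items, then skip one item and take the rest of the same iterator.
    return tuple(chain(islice(it, None, i), islice(it, 1, None)))

def drop_by_genexp(t, i):
    return tuple(v for k, v in enumerate(t) if k != i)

def drop_first_match(t, value):
    differs = lambda x: x != value
    return tuple(chain(takewhile(differs, t),
                       islice(dropwhile(differs, t), 1, None)))

t = tuple(range(10))
results = [drop_by_slicing(t, 5), drop_by_islice(t, 5),
           drop_by_genexp(t, 5), drop_first_match(t, 5)]
print(all(r == results[0] for r in results))  # True
```

Each function can then be passed to timeit.timeit with a lambda or a setup string to repeat the comparison.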
Here is a solution that only uses lambda, filter(), map(), reduce() and tuple().
from functools import reduce  # needed on Python 3; also valid on Python 2.6+
def myRem(T, E):
    # map the tuple into a list of tuples (value, indicator)
    M = map(lambda x: [(x, 1)] if x == E else [(x,0)], T)
    # make the indicator 0 once the first instance of E is found
    # think of this as a boolean mask of items to remove
    # here the second reduce can be changed to the sum function
    R = reduce(
        lambda x, y: x + (y if reduce(lambda a, b: a+b, map(lambda z: z[1], x)) < 1
                          else [(y[0][0], 0)]),
        M
    )
    # filter the reduced output based on the indicator
    F = filter(lambda x: x[1]==0, R)
    # map the output back to the desired format
    O = map(lambda x: x[0], F)
    return tuple(O)
Explanation
A good way to understand what's going on is to print the outputs of the intermediate steps.
Step 1: First Map
For each value in the tuple, we return a tuple with the value and a flag to indicate if it's the value to remove. These tuples are encapsulated in a list because it makes combining easier in the next step.
# original example
T = (0, 1, 9, 1, 4)
E = 1
M = map(lambda x: [(x, 1)] if x == E else [(x,0)], T)
print(M)
#[[(0, 0)], [(1, 1)], [(9, 0)], [(1, 1)], [(4, 0)]]
Step 2: Reduce
This returns a list of tuples in a similar structure to the contents of M, but the flag variable is set to 1 for the first instance of E, and 0 for all subsequent instances. This is achieved by calculating the sum of the indicator up to that point (implemented as another reduce()).
R = reduce(
    lambda x, y: x + (y if reduce(lambda a, b: a+b, map(lambda z: z[1], x)) < 1
                      else [(y[0][0], 0)]),
    M
)
print(R)
#[(0, 0), (1, 1), (9, 0), (1, 0), (4, 0)]
Now the output is in the form of (value, to_be_removed).
Step 3: Filter
Filter out the value to be removed.
F = filter(lambda x: x[1]==0, R)
print(F)
#[(0, 0), (9, 0), (1, 0), (4, 0)]
Step 4: Second map and conversion to tuple
Extract the value from the filtered list, and convert it to a tuple.
O = map(lambda x: x[0], F)
print(tuple(O))
#(0, 9, 1, 4)
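The same masking idea can probably be expressed with a single reduce that carries a "already removed" flag in the accumulator. This is only a sketch of an alternative formulation, not the answer's method; rem_reduce is a hypothetical name, and the initial value ((), False) is what lets it handle an empty tuple.

```python
from functools import reduce

def rem_reduce(t, e):
    """Remove the first occurrence of e from tuple t with one reduce."""
    # acc is (kept_items, removed_flag); skip x only for the first match.
    step = lambda acc, x: ((acc[0], True) if (x == e and not acc[1])
                           else (acc[0] + (x,), acc[1]))
    return reduce(step, t, ((), False))[0]

print(rem_reduce((0, 1, 9, 1, 4), 1))  # (0, 9, 1, 4)
```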
This violates your requirement for "only using higher order functions" - but since it's not clear why this is a requirement, I include the below solution.
def myRem(tup, n):
    idx = tup.index(n)
    return tuple(j for i, j in enumerate(tup) if i != idx)
myRem((0, 1, 9, 1, 4), 1)
# (0, 9, 1, 4)
Here is a numpy solution (still not using higher-order functions):
import numpy as np
def myRem(tup, n):
    tup_arr = np.array(tup)
    return tuple(np.delete(tup_arr, np.min(np.nonzero(tup_arr == n)[0])))
myRem((0, 1, 9, 1, 4), 1)
# (0, 9, 1, 4)
I'm currently stuck with an algorithm problem in which I want to optimize the complexity.
I have two lists of intervals S = [[s1, s2], [s3, s4], ..., [sn-1, sn]] and W = [[w1, w2], [w3, w4], ..., [wm-1, wm]] that I want to merge respecting ordinal order, and intervals of S have priority over those of W. (S for strong, W for weak)
For example, that priority implies:
S = [[5,8]] and W = [[1, 5], [7, 10]] will give: res = [[1, 4, W], [5, 8, S], [9, 10, W]]. Here intervals from W are cropped because intervals of S have priority.
S = [[5, 8]] and W = [[2, 10]] will give: res = [[2, 4, W], [5, 8, S], [9, 10, W]]. Here the interval of W is split into two parts because S has priority.
While merging those lists, I need to keep track of the strong or weak nature of those intervals by writing a third element beside each interval, which we can call the symbol. That's why the result is something like: [[1, 4, W], [5, 8, S], [9, 10, W]].
Finally, as the union of all intervals does not cover all integers in a certain range, we have a third symbol, let's say B for blank, which fills the missing intervals: [[1, 2, W], [5, 8, S], [9, 10, W], [16, 20, S]] will be filled in to become: [[1, 2, W], [3, 4, B], [5, 8, S], [9, 10, W], [11, 15, B], [16, 20, S]]
My first attempt was very naive and lazy (because I first wanted it to work) :
If the greatest integer covered by these two lists of intervals is M, then I created a list of size M filled with B symbols : res = [B]*M = [B, B, B ..., B]
Then I first take interval from W one by one and rewrite elements from res of index in this interval to change its symbol to W. Next, I do the same with intervals of S, and the priority is respected because I overwrite with the symbol S in the last step.
It gives something like :
[B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B]
[B, B, B, W, W, W, W, B, W, W, W, W, B, W, W, B, B]
[B, B, S, S, W, W, W, B, S, S, W, W, B, S, W, B, B]
Finally, I go through the big list one last time to factorize and recreate intervals with its corresponding symbols. Previous example gives :
[[1, 2, B], [3, 4, S], [5, 7, W], [8, 8, B], [9, 10, S], [11, 12, W], [13, 13, B], [14, 14, S], [15, 15, W], [16, 17, B]]
Unfortunately but predictably, this algorithm is not usable in practice: M is around 1000000 in my application and this algorithm is O(n^2) if I'm not mistaken.
So, I would like some advice and directions to solve this algorithmic complexity problem. I'm sure that this problem looks like a well-known algorithmic problem but I don't know where to go.
My few ideas for now could be used to optimize the algorithm, but they are quite complex to implement, so I think there are better ideas. Here they are:
Do the same kind of overwrite process to respect priority: in list W, insert intervals of S, overwriting when necessary to respect priority. Then fill in this list to insert the missing intervals with the B symbol. But we would have a heavy use of if to compare intervals because of the great number of cases.
Construct a new list while browsing S and W step by step. In this idea we would have one cursor per list to go from interval to interval until the end of one of the two lists. Again we use a lot of if and cases when we insert intervals in the new list with respect to priority. But it raises the same complex problem with the great number of cases.
I hope I made myself clear, if not I can explain in other way.
Please teach me with experience and cleverness :)
Thanks
EDIT: here is my "naive" algorithm code:
def f(W, S, size):
    # We first write one symbol per sample
    int_result = ['B'] * size
    for interval in W:
        for i in range(interval[0], interval[1]+1):
            int_result[i] = 'W'
    for interval in S:
        for i in range(interval[0], interval[1]+1):
            int_result[i] = 'S'
    # We then factorize: we store one symbol for an interval of the same symbol.
    symbols_intervals = []
    sym = int_result[0]
    start = 0
    for j in range(len(int_result)):
        if int_result[j] != sym:
            symbols_intervals.append([start, j-1, sym])
            sym = int_result[j]  # was all_symbols[j], an undefined name
            start = j
        if j == len(int_result)-1:
            symbols_intervals.append([start, j, sym])  # the last run ends at j, not j-1
    return symbols_intervals
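The final "factorize" step of the naive approach is a run-length encoding, which could also be written with itertools.groupby. A minimal sketch, assuming a non-empty list of one-character symbols indexed from 0; factorize is a hypothetical name.

```python
from itertools import groupby

def factorize(symbols):
    """Collapse runs of equal symbols into [start, end, symbol] intervals."""
    intervals = []
    position = 0
    for sym, run in groupby(symbols):
        length = sum(1 for _ in run)  # number of consecutive equal symbols
        intervals.append([position, position + length - 1, sym])
        position += length
    return intervals

print(factorize(list("BBSSWWWB")))
# [[0, 1, 'B'], [2, 3, 'S'], [4, 6, 'W'], [7, 7, 'B']]
```

This only tidies the last pass; the per-sample list still costs time and memory proportional to M.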
Your naive method sounds very reasonable; I think the time complexity of it is O(NM), where N is the number of intervals you're trying to resolve, and M is the range over which you're trying to resolve them. The difficulty you might have is that you also have space complexity of O(M), which might use up a fair bit of memory.
Here's a method for merging without building a "master list", which may be faster; because it treats intervals as objects, complexity is no longer tied to M.
I'll represent an interval (or list of intervals) as a set of tuples (a,b,p), each of which indicates the time points from a to b, inclusively, with the integer priority p (W can be 1, and S can be 2). In each interval, it must be the case that a < b. Higher priorities are preferred.
We need a predicate to define the overlap between two intervals:
def has_overlap(i1, i2):
    '''Returns True if the intervals overlap each other.'''
    (a1, b1, p1) = i1
    (a2, b2, p2) = i2
    A = (a1 - a2)
    B = (b2 - a1)
    C = (b2 - b1)
    D = (b1 - a2)
    return max(A * B, D * C, -A * D, B * -C) >= 0
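The product test above checks whether any endpoint of one interval falls inside the other. For closed intervals with a <= b, the usual two-comparison form should give the same answer. A sketch under that assumption; overlaps_simple is a hypothetical name, and the priority field is ignored.

```python
def overlaps_simple(i1, i2):
    """True if closed intervals (a1, b1) and (a2, b2) share at least one point."""
    (a1, b1, _p1) = i1
    (a2, b2, _p2) = i2
    # Each interval must start no later than the other one ends.
    return a1 <= b2 and a2 <= b1

print(overlaps_simple((1, 5, 2), (4, 8, 1)))  # True
print(overlaps_simple((1, 3, 2), (4, 8, 2)))  # False
```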
When we find overlaps, we need to resolve them. This method takes care of that, respecting priority:
def join_intervals(i1, i2):
    '''
    Joins two intervals, fusing them if they are of the same priority,
    and trimming the lower priority one if not.
    Invariant: the interval(s) returned by this function will not
    overlap each other.
    >>> join_intervals((1,5,2), (4,8,2))
    {(1, 8, 2)}
    >>> join_intervals((1,5,2), (4,8,1))
    {(1, 5, 2), (6, 8, 1)}
    >>> join_intervals((1,3,2), (4,8,2))
    {(1, 3, 2), (4, 8, 2)}
    '''
    if has_overlap(i1, i2):
        (a1, b1, p1) = i1
        (a2, b2, p2) = i2
        if p1 == p2:
            # UNION
            return set([(min(a1, a2), max(b1, b2), p1)])
        # DIFFERENCE
        if p2 < p1:
            (a1, b1, p1) = i2
            (a2, b2, p2) = i1
        retval = set([(a2, b2, p2)])
        if a1 < a2 - 1:
            retval.add((a1, a2 - 1, p1))
        if b1 > b2 + 1:
            retval.add((b2 + 1, b1, p1))
        return retval
    else:
        return set([i1, i2])
Finally, merge_intervals takes an iterable of intervals and joins them together until there are no more overlaps:
import itertools
def merge_intervals(intervals):
    '''Resolve overlaps in an iterable of interval tuples.'''
    # invariant: retval contains no mutually overlapping intervals
    retval = set()
    for i in intervals:
        # filter out the set of intervals in retval that overlap the
        # new interval to add O(N)
        overlaps = set([i2 for i2 in retval if has_overlap(i, i2)])
        retval -= overlaps
        overlaps.add(i)
        # members of overlaps can potentially all overlap each other;
        # loop until all overlaps are resolved O(N^3)
        while True:
            # find elements of overlaps which overlap each other O(N^2)
            found = False
            for i1, i2 in itertools.combinations(overlaps, 2):
                if has_overlap(i1, i2):
                    found = True
                    break
            if not found:
                break
            overlaps.remove(i1)
            overlaps.remove(i2)
            overlaps.update(join_intervals(i1, i2))
        retval.update(overlaps)
    return retval
I think this has worst-case time complexity of O(N^4), although the average case should be fast. In any case, you may want to time this solution against your simpler method, to see what works better for your problem.
As far as I can see, my merge_intervals works for your examples:
# example 1
assert (merge_intervals({(5, 8, 2), (1, 5, 1), (7, 10, 1)}) ==
{(1, 4, 1), (5, 8, 2), (9, 10, 1)})
# example 2
assert (merge_intervals({(5, 8, 2), (2, 10, 1)}) ==
{(2, 4, 1), (5, 8, 2), (9, 10, 1)})
To cover the case with blank (B) intervals, simply add another interval tuple which covers the whole range with priority 0: (1, M, 0):
# example 3 (with B)
assert (merge_intervals([(1, 2, 1), (5, 8, 2), (9, 10, 1),
(16, 20, 2), (1, 20, 0)]) ==
{(1, 2, 1), (3, 4, 0), (5, 8, 2),
(9, 10, 1), (11, 15, 0), (16, 20, 2)})
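To get back to the [start, end, symbol] lists used in the question, the resulting set can be sorted and the integer priorities mapped to symbols. A small sketch; the SYMBOLS mapping (0 for B, 1 for W, 2 for S) follows the priorities chosen above, and label_intervals is a hypothetical name.

```python
SYMBOLS = {0: 'B', 1: 'W', 2: 'S'}  # priority -> symbol, as assumed above

def label_intervals(intervals):
    """Sort (a, b, priority) tuples and replace each priority by its symbol."""
    return [[a, b, SYMBOLS[p]] for a, b, p in sorted(intervals)]

merged = {(1, 2, 1), (3, 4, 0), (5, 8, 2),
          (9, 10, 1), (11, 15, 0), (16, 20, 2)}
print(label_intervals(merged))
```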
The below solution has O(n + m) complexity, where n and m are the lengths of S and W lists. It assumes that S and W are internally sorted.
def combine(S, W):
    s, w = 0, 0 # indices of S and W
    common = []
    while s < len(S) or w < len(W):
        # only weak intervals remain, so append them to common
        if s == len(S):
            common.append((W[w][0], W[w][1], 'W'))
            w += 1
        # only strong intervals remain, so append them to common
        elif w == len(W):
            common.append((S[s][0], S[s][1], 'S'))
            s += 1
        # assume that the strong interval starts first
        elif S[s][0] <= W[w][0]:
            W[w][0] = max(W[w][0], S[s][1]+1)
            if W[w][0] > W[w][1]: # drop the weak interval
                w += 1
            common.append((S[s][0], S[s][1], 'S'))
            s += 1
        # assume that the weak interval starts first
        elif S[s][0] > W[w][0]:
            # end point of weak interval before the start of the strong
            if W[w][1] < S[s][0]:
                # append takes a single tuple argument
                common.append((W[w][0], W[w][1], 'W'))
                w += 1
            # end point of the weak interval between a strong interval
            elif S[s][0] <= W[w][1] <= S[s][1]:
                W[w][1] = S[s][0] - 1
                common.append((W[w][0], W[w][1], 'W'))
                w += 1
            # end point of the weak interval after the end point of the strong
            elif W[w][1] > S[s][1]:
                common.append((W[w][0], S[s][0]-1, 'W'))
                W[w][0] = S[s][1] + 1
    return common
print(combine(S=[[5,8]], W=[[1, 5],[7, 10]]))
print(combine(S=[[5,8]], W=[[2,10]]))
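The merged list still lacks the blank (B) intervals the question asks for. A minimal sketch of a linear pass that fills the gaps, assuming the input is sorted, non-overlapping, and made of (start, end, symbol) tuples with integer bounds; fill_blanks is a hypothetical helper, not part of the answer above.

```python
def fill_blanks(intervals, blank='B'):
    """Insert a blank interval wherever consecutive intervals leave a gap."""
    filled = []
    previous_end = None
    for start, end, sym in intervals:
        if previous_end is not None and start > previous_end + 1:
            filled.append((previous_end + 1, start - 1, blank))
        filled.append((start, end, sym))
        previous_end = end
    return filled

print(fill_blanks([(1, 2, 'W'), (5, 8, 'S'), (9, 10, 'W'), (16, 20, 'S')]))
```

Blanks before the first interval or after the last one are not added here, since the sketch does not know the overall range.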
Sorry for what seems like a basic question, but I could not find it anywhere. In Python 2, I would like to apply a 1-variable function to its own output, storing the list of all steps, i.e. if f(x) returns x*x then iterating from 2, I need to get
[2, 4, 16, 256, 65536, ...]
Ideally, I would need to pass in my function f, the first input 1, and the number of iterations I would like to keep.
I guess this is, in some sense, the opposite of reduce and somewhat similar to unfold from functional programming.
A naive way to do this is to write
out = [2]
for x in xrange(5):
    out.append(f(out[-1]))
What is a good Pythonic way to do this?
Thank you very much.
What you need is a "Generator". For example,
def f(x, n):
    for _ in range(n):
        yield x
        x = x * x
l = list(f(2, 5))
print(l) # [2, 4, 16, 256, 65536]
Or
def f(x):
    while True:
        yield x
        x = x * x
for v in f(2):
    if v > 100000:
        break
    print(v), # 2 4 16 256 65536
Ideally, I would need to pass in my function f, the first input 1, and
the number of iterations I would like to keep.
Here is an unfold function that accepts a function, a starting value, and an iteration count.
def unfold(function, start, iterations):
    results = []
    for _ in range(iterations):
        results.append(start)
        start = function(start)
    return results
Which you can use as expected:
>>> print unfold(lambda x: x*x, 2, 5)
[2, 4, 16, 256, 65536]
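The two ideas above (a generator and a function argument) can be combined into a lazy version that is cut to length with itertools.islice. A small sketch; the name iterate is borrowed from functional programming and is only illustrative.

```python
from itertools import islice

def iterate(function, start):
    """Yield start, function(start), function(function(start)), ... forever."""
    while True:
        yield start
        start = function(start)

print(list(islice(iterate(lambda x: x * x, 2), 5)))  # [2, 4, 16, 256, 65536]
```

Keeping the generator infinite leaves the stopping rule to the caller, who can use islice for a count or takewhile for a condition.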
I need to perform a special type of tensor contraction. I want something of this kind:
A_{bg} = Sum_{a,a',a''} ( B_{a} C_{a'b} D_{a''g} )
where all the indices can have values 0,1 and the sum over a, a' and a'' is carried out over all cases where a+a'+a'' = 1 or a+a'+a'' = 2. So it is like the reverse of the Einstein summation convention: I want to sum only when one of the three indices is different from the others.
Moreover, I want some flexibility with the number of indices that are not being summed: in the example the resulting tensor has 2 indices, and the sum is over products of elements of 3 tensors, one with one index, the other two with two indices. These numbers of indices are going to vary, so in general I would like to be able to write something like this:
A_{...} = Sum_{a,a',a''} ( B_{a...} C_{a'...} D_{a''...} )
I want to point that the number of indices is not fixed, but it is controlled: I can know and specify how many indices every tensor has in each step.
I tried np.einsum(), but then apparently I am forced to sum over repeated indices in the standard Einstein convention, and I don't know how to implement the condition I exposed here.
And I cannot write everything with various for because, as I said, the number of indices of the tensors involved is not fixed.
Does anyone have an idea?
From comments:
I would write what I put here in programming language like this:
tensa = np.zeros((2,2))
for be in range(2):
    for ga in range(2):
        for al in range(2):
            for alp in range(2):
                for alpp in range(res(al,alp),prod(al,alp)):
                    tensa[be,ga] += tensb[al] * tensc[alp,be] * tensd[alpp,ga]
where res and prod are two functions that ensure that al+alp+alpp = 1 or 2. The problem with this is that I need to specify all the indices involved, and I cannot do that in the general calculation for all the lattice.
First, let's write your example out in Python loops, to have a baseline for comparisons. If I understood you correctly, this is what you want to do:
import numpy as np
b, g = 4, 5
B = np.random.rand(2)
C = np.random.rand(2, b)
D = np.random.rand(2, g)
out = np.zeros((b, g))
for j in (0, 1):
    for k in (0, 1):
        for l in (0, 1):
            if j + k + l in (1, 2):
                out += B[j] * C[k, :, None] * D[l, None, :]
When I run this, I get this output:
>>> out
array([[ 1.27679643, 2.26125361, 1.32775173, 1.5517918 , 0.47083151],
[ 0.84302586, 1.57516142, 1.1335904 , 1.14702252, 0.34226837],
[ 0.70592576, 1.34187278, 1.02080112, 0.99458563, 0.29535054],
[ 1.66907981, 3.07143067, 2.09677013, 2.20062463, 0.65961165]])
You can't get at this directly with np.einsum, but you can run it twice and get your result as the difference of these two:
>>> np.einsum('i,jk,lm->km', B, C, D) - np.einsum('i,ik,im->km', B, C, D)
array([[ 1.27679643, 2.26125361, 1.32775173, 1.5517918 , 0.47083151],
[ 0.84302586, 1.57516142, 1.1335904 , 1.14702252, 0.34226837],
[ 0.70592576, 1.34187278, 1.02080112, 0.99458563, 0.29535054],
[ 1.66907981, 3.07143067, 2.09677013, 2.20062463, 0.65961165]])
The first call to np.einsum is adding everything up, regardless of what the indices add up to. The second only adds up those where all three indices are the same. So obviously your result is the difference of the two.
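The identity relies on the indices taking only the values 0 and 1: the sum j + k + l is 0 or 3 exactly when all three indices are equal, so "sum is 1 or 2" is the complement of "all equal". A minimal sketch that checks the two computations against each other; the function names and the fixed seed are assumptions made for the example.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the check is repeatable
B = rng.rand(2)
C = rng.rand(2, 4)
D = rng.rand(2, 5)

def masked_sum_loops(B, C, D):
    """Explicit loops over the three summed indices."""
    out = np.zeros((C.shape[1], D.shape[1]))
    for j in (0, 1):
        for k in (0, 1):
            for l in (0, 1):
                if j + k + l in (1, 2):
                    out += B[j] * C[k, :, None] * D[l, None, :]
    return out

def masked_sum_einsum(B, C, D):
    """Sum over everything, minus the terms where all three indices agree."""
    return (np.einsum('i,jk,lm->km', B, C, D) -
            np.einsum('i,ik,im->km', B, C, D))

print(np.allclose(masked_sum_loops(B, C, D), masked_sum_einsum(B, C, D)))
```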
Ideally, you could now go on to write something like:
>>> (np.einsum('i...,j...,k...->...', B, C, D) -
... np.einsum('i...,i...,i...->...', B, C, D))
and get your result regardless of the dimensions of your C and D arrays. If you try the first, you will get the following error message:
ValueError: operands could not be broadcast together with remapped shapes
[original->remapped]: (2)->(2,newaxis,newaxis) (2,4)->(4,newaxis,2,newaxis)
(2,5)->(5,newaxis,newaxis,2)
The problem is that, since you are not specifying what you want to do with the b and g dimensions of your tensors, it tries to broadcast them together, and since they are different, it fails. You can get it to work by adding extra dimensions of size 1:
>>> (np.einsum('i...,j...,k...->...', B, C, D[:, None]) -
... np.einsum('i...,i...,i...->...', B, C, D[:, None]))
array([[ 1.27679643, 2.26125361, 1.32775173, 1.5517918 , 0.47083151],
[ 0.84302586, 1.57516142, 1.1335904 , 1.14702252, 0.34226837],
[ 0.70592576, 1.34187278, 1.02080112, 0.99458563, 0.29535054],
[ 1.66907981, 3.07143067, 2.09677013, 2.20062463, 0.65961165]])
If you wanted all the axes of B to be placed before all the axes of C, and these before all the axes of D, the following seems to work, at least as far as creating an output of the right shape, although you may want to double check that the result is really what you want:
>>> B = np.random.rand(2, 3)
>>> C = np.random.rand(2, 4, 5)
>>> D = np.random.rand(2, 6)
>>> C_idx = (slice(None),) + (None,) * (B.ndim - 1)
>>> D_idx = C_idx + (None,) * (C.ndim - 1)
>>> (np.einsum('i...,j...,k...->...', B, C[C_idx], D[D_idx]) -
... np.einsum('i...,i...,i...->...', B, C[C_idx], D[D_idx])).shape
(3L, 4L, 5L, 6L)
EDIT From the comments, if instead of just the first axis of each tensor having to be reduced over, it is the first two, then the above could be written as:
>>> B = np.random.rand(2, 2, 3)
>>> C = np.random.rand(2, 2, 4, 5)
>>> D = np.random.rand(2, 2, 6)
>>> C_idx = (slice(None),) * 2 + (None,) * (B.ndim - 2)
>>> D_idx = C_idx + (None,) * (C.ndim - 2)
>>> (np.einsum('ij...,kl...,mn...->...', B, C[C_idx], D[D_idx]) -
... np.einsum('ij...,ij...,ij...->...', B, C[C_idx], D[D_idx])).shape
(3L, 4L, 5L, 6L)
More generally, if reducing over d indices, C_idx and D_idx would look like:
>>> C_idx = (slice(None),) * d + (None,) * (B.ndim - d)
>>> D_idx = C_idx + (None,) * (C.ndim - d)
and the calls to np.einsum would need to have d letters in the indexing, unique in the first call, repeating in the second.
EDIT 2 So what exactly goes on with C_idx and D_idx? Take the last example, with B, C and D with shapes (2, 2, 3), (2, 2, 4, 5) and (2, 2, 6). C_idx is made up of two empty slices, plus as many Nones as the number of dimensions of B minus 2, so when we take C[C_idx] the result has shape (2, 2, 1, 4, 5). Similarly D_idx is C_idx plus as many Nones as the number of dimensions of C minus 2, so the result of D[D_idx] has shape (2, 2, 1, 1, 1, 6). These three arrays don't broadcast together, but np.einsum adds additional dimensions of size 1, i.e. the "remapped" shapes of the error above, so the resulting arrays turn out to have extra trailing ones, and the shapes match as follows:
(2, 2, 3, 1, 1, 1)
(2, 2, 1, 4, 5, 1)
(2, 2, 1, 1, 1, 6)
The first two axes are reduced, so they disappear from the output, and in the other cases, broadcasting applies, where a dimension of size 1 is replicated to match a larger one, so the output is (3, 4, 5, 6) as we wanted.
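The effect of the index tuples can be checked without calling np.einsum at all. A short sketch that only inspects the shapes produced by C[C_idx] and D[D_idx] for d = 2; the zero-filled arrays are placeholders chosen for the example.

```python
import numpy as np

d = 2  # number of leading axes that are reduced over
B = np.zeros((2, 2, 3))
C = np.zeros((2, 2, 4, 5))
D = np.zeros((2, 2, 6))

# d full slices keep the reduced axes, each None inserts an axis of size 1.
C_idx = (slice(None),) * d + (None,) * (B.ndim - d)
D_idx = C_idx + (None,) * (C.ndim - d)

print(C[C_idx].shape)  # (2, 2, 1, 4, 5)
print(D[D_idx].shape)  # (2, 2, 1, 1, 1, 6)
```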
@hpaulj proposes a method using "Levi-Civita like" tensors, that should in theory be faster, see my comments to the original question. Here's some code for comparison:
b, g = 5000, 2000
B = np.random.rand(2)
C = np.random.rand(2, b)
D = np.random.rand(2, g)
def calc1(b, c, d):
    return (np.einsum('i,jm,kn->mn', b, c, d) -
            np.einsum('i,im,in->mn', b, c, d))
def calc2(b, c, d):
    return np.einsum('ijk,i,jm,kn->mn', calc2.e, b, c, d)
calc2.e = np.ones((2,2,2))
calc2.e[0, 0, 0] = 0
calc2.e[1, 1, 1] = 0
But when running it:
%timeit calc1(B, C, D)
1 loops, best of 3: 361 ms per loop
%timeit calc2(B, C, D)
1 loops, best of 3: 643 ms per loop
np.allclose(calc1(B, C, D), calc2(B, C, D))
Out[48]: True
A surprising result, which I can't explain...