Please see this image before reading :)
I am finding a centroid coordinate based on the biggest product of the left/right/top/down arm lengths.
The code below works, but it never finishes with a bigger array.
How can I optimize this?
(If numpy matters, I am passing the region to find_centroid with region=region_coordinates.tolist().)
def find_centroid(region):
    centroid = region[0]
    coord_weight = 0
    for coord in region:
        new_coord_weight = weight_calc(region, coord, -1, 0)*weight_calc(
            region, coord, 1, 0)*weight_calc(region, coord, 0, -1)*weight_calc(region, coord, 0, 1)
        if new_coord_weight > coord_weight:
            coord_weight = new_coord_weight
            centroid = coord
    return centroid

def weight_calc(region, coord, xinc, yinc):
    weight = 1
    x = coord[0]
    y = coord[1]
    while True:
        if [x, y] in region:
            weight += 1
            x += xinc
            y += yinc
        else:
            break
    return weight
And for a test:
array_test = ([[0, 0], [1, 0], [1, 1], [0, 1], [2, 1], [2, 2], [3, 1], [3, 0], [2, 0], [3, 2]])
print(find_centroid(array_test))
The infinite loop, explained
This part of your code will get you stuck in an infinite loop if region is a numpy array:
while True:
if [x, y] in region:
...
That's because, when used on a numpy array, the in operator returns True if any element of the list matches any element of the array's sublists.
Instead, you can use numpy's all and any methods:
if (np.array(region)==[x,y]).all(axis=1).any(axis=0):
The all(axis=1) checks, for every sublist, whether both values are equal, in the correct order.
That gives an array of booleans. If any of them is True, then there is at least one match.
Casting either of the two operands into a numpy array is enough to make this test possible.
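For illustration, here is how that test behaves on a tiny, made-up region (the values are arbitrary):
import numpy as np

region = np.array([[0, 0], [1, 0], [1, 1]])
# Row-wise comparison: True where both coordinates match.
print((region == [1, 0]).all(axis=1))         # [False  True False]
# Collapse to a single boolean: is [1, 0] one of the rows?
print((region == [1, 0]).all(axis=1).any())   # True
print((region == [5, 5]).all(axis=1).any())   # False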
But it will work if...
If both operands are lists, the in operator works as expected, but in that case you should make sure that region and each of its sublists are all lists, not numpy arrays. Casting them won't work. Here is why:
import numpy as np
array_test = [[0, 0], [1, 0]]
print([1,1] in array_test)                  # prints False, as expected
# numpy always compares element-wise when both operands have the same length
print([1,1] == np.array([1,0]))             # prints [ True False]
print(np.array([1,1]) == np.array([1,0]))   # prints [ True False]
# Ambiguous "in" tests
print([1,1] == np.array(array_test))        # prints [[False False] [ True False]]
print([1,1] in np.array(array_test))        # prints True as explained, because we have at least one True
print([1,1] in list(np.array(array_test)))  # error: numpy can't reduce the element-wise comparison to a single boolean
Another version
Here is my way of doing this. There might be better ways, it's just my two cents.
Pre-filtering the potential centroids
First, I'd enumerate all possible intersections (I'll call them "centers" from now on) in the region. To do that, I count every x coordinate and every y coordinate. To make it easy, I will use numpy.
import numpy as np
from itertools import product

array_test = np.array(array_test)  # the code below expects a numpy array
# We count every x value. We keep those that are present at least twice.
x_counts = dict(zip(*np.unique(array_test[:,0], return_counts=True)))
y_counts = dict(zip(*np.unique(array_test[:,1], return_counts=True)))
# If an x is present only once, then there cannot be any center in this column.
x_inter = [coord for coord, count in x_counts.items() if count>=2]
# Same with y and rows.
y_inter = [coord for coord, count in y_counts.items() if count>=2]
# Next, we create all combinations of (x, y)
# and filter in the combinations present in our region.
possible_centroids = np.array([(x,y) for x,y in product(x_inter, y_inter)
                               if (array_test==np.array([x,y])).all(axis=1).any()])
Measuring arm lengths
To calculate the power of our centers, we first use a function to measure the arm length. Let's make it a bit parameterizable, with a direction argument.
# Since we are in 2D and we have no diagonal, there are four possible directions.
directions = np.array([[0,1], [0, -1], [1, 0], [-1, 0]])
def get_arm_length(center, direction):
    position = center+direction  # going one step in the direction
    # We keep track of the length in the direction.
    length = 0
    # adding 1 as long as the next step in direction is in region
    while (region==position).all(axis=1).any():
        position += direction
        length += 1
    return length
Measuring every potential centroid
Now we can test the four directions for each of our potential centroids (previously selected), keeping the best one along the way.
best_center = (0, [-1, -1])  # => (power, center_coords)
for center in centers:
    # Setting to 1, which is the identity element of the product (x * 1 == x)
    power = 1
    for direction in directions:
        # We multiply by the arm length along the four axes.
        power *= get_arm_length(center, direction)
    # If a more powerful one is found, we store its power and coords.
    if power > best_center[0]:
        best_center = power, center
# At this point, we have found the most powerful center, which is our centroid.
Putting it all together
Here is the full code.
import numpy as np
from itertools import product

def find_centroid2(region):
    region = np.array(region)
    # Directions:
    directions = np.array([[0,1], [0, -1], [1, 0], [-1, 0]])
    def get_arm_length(center, direction):
        position = center+direction
        length = 1
        while (region==position).all(axis=1).any(axis=0):
            position += direction
            length += 1
        return length
    # Intersections:
    x_counts = dict(zip(*np.unique(region[:,0], return_counts=True)))
    y_counts = dict(zip(*np.unique(region[:,1], return_counts=True)))
    x_inter = [coord for coord, count in x_counts.items() if count>=2]
    y_inter = [coord for coord, count in y_counts.items() if count>=2]
    centers = np.array([(x,y) for x,y in product(x_inter, y_inter)
                        if (region==np.array([x,y])).all(axis=1).any()])
    # Measuring each center's "power":
    best_center = (0, [-1, -1])  # => (power, center_coords)
    for center in centers:
        power = 1
        for direction in directions:
            power *= get_arm_length(center, direction)
        if power > best_center[0]:
            best_center = power, center
    return best_center[1]
An optimisation's optimisation
Instead of generating all virtual centers and keeping the ones that belong to our region, we can filter our region directly and keep the cells that have a coordinate counted twice or more.
def find_centroid3(region):
    region = np.array(region)
    # Directions:
    directions = np.array([[0,1], [0, -1], [1, 0], [-1, 0]])
    def get_arm_length(center, direction):
        position = center+direction
        length = 1
        while (region==position).all(axis=1).any(axis=0):
            position += direction
            length += 1
        return length
    # Intersections:
    # It's better to filter the cells instead of computing and testing all combinations.
    x_counts = [x[0] for x in zip(*np.unique(region[:,0], return_counts=True)) if x[1]>=2]
    y_counts = [y[0] for y in zip(*np.unique(region[:,1], return_counts=True)) if y[1]>=2]
    centers = [[x,y] for x,y in region if x in x_counts or y in y_counts]
    # Measuring each center's "power":
    best_center = (0, [-1, -1])  # => (power, center_coords)
    for center in centers:
        power = 1
        for direction in directions:
            power *= get_arm_length(center, direction)
        if power > best_center[0]:
            best_center = power, center
    return best_center[1]
Comparing V2
The random region preparation, with a lot of cells.
# Keeping the grid fairly big and filled:
# a 150*150 grid (22'500 cells) with 15'000 filled cells max.
array_test = np.random.randint(150, size=(15000, 2))  # => len = 15'000
# Getting rid of duplicates, else they will mess with the counting.
# Assuming your own grids also don't have any.
new_array = [list(array_test[0])]
for elem in array_test[1:]:
    if (elem != np.array(new_array)).any(axis=1).all():
        new_array.append(elem)
array_test = np.array(new_array)  # => len = 10'959, all are unique cells
Results:
find_centroid(array_test) # Original version. Result = [64 127]
# 16 s ± 117 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
find_centroid2(array_test) # Proposed version 1. Result = [61 127]
# 13.1 s ± 87.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
find_centroid3(array_test) # Proposed version 2. Result = [61, 127]
# 9.49 s ± 47.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I tried several grid sizes, keeping it half filled at max.
Comparing V1
[Obsolete]
Your original code (corrected for dealing with the infinite loop):
%%timeit
find_centroid(array_test) # Result => array([73, 16])
# 21.4 s ± 397 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The proposed code:
%%timeit
find_centroid2(array_test) # Result => array([73, 16])
# 17.2 s ± 76.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
That's not a tremendous optimisation, but it's an optimisation anyway.
Maybe some other reviews and other ideas can make it better.
For anybody else who needs a better (maybe perfect) answer: use a small structuring element passed through an erosion over the array in a while loop, which removes one border layer per iteration until you find:
-one cross : that's the center coordinates
-many crosses : compare them separately to pick the best one
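As a rough sketch of that idea (assuming scipy is available, and that the coordinate list is first rasterised onto a boolean grid; the function name and details here are mine, not the answerer's):
import numpy as np
from scipy.ndimage import binary_erosion

def erode_to_center(region):
    region = np.asarray(region)
    # Rasterise the coordinate list onto a boolean grid.
    grid = np.zeros(tuple(region.max(axis=0) + 1), dtype=bool)
    grid[region[:, 0], region[:, 1]] = True
    # A plus-shaped structuring element matches the left/right/top/down arms.
    cross = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]], dtype=bool)
    # Peel off one border layer per iteration until the next erosion would be empty.
    while True:
        eroded = binary_erosion(grid, structure=cross)
        if not eroded.any():
            break
        grid = eroded
    # Whatever survives are the candidate center(s); several cells may remain.
    return np.argwhere(grid)
Note that the most-eroded cell is related to, but not necessarily identical to, the arm-length-product criterion used above.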
Related
I have a 2x2 reference tensor and a batch of candidate 2x2 tensors. I would like to find the closest candidate tensor to the reference tensor by summed Euclidean distance over the identically indexed (except for the batch index) elements.
For example:
ref = torch.as_tensor([[1, 2], [3, 4]])
candidates = torch.rand(100, 2, 2)
I would like to find the 2x2 tensor index in candidates that minimizes:
(ref[0][0] - candidates[index][0][0])**2 +
(ref[0][1] - candidates[index][0][1])**2 +
(ref[1][0] - candidates[index][1][0])**2 +
(ref[1][1] - candidates[index][1][1])**2
Ideally, this solution would work for an arbitrary-dimension reference tensor of size (b, c, d, ..., z) and an arbitrary batch_size of candidate tensors with dimensions equal to those of the reference tensor (batch_size, b, c, d, ..., z).
Elaborating on @ndrwnaguib's answer, it should rather be:
dist = torch.cdist(ref.float().flatten().unsqueeze(0), candidates.flatten(start_dim=1))
print(torch.square(dist))
print(torch.argmin(dist))
tensor([[23.3516, 21.8078, 25.5247, 26.3465, 21.3161, 17.7537, 24.1075, 22.4388,
22.7513, 16.8489]])
tensor(9)
Other options worth noting:
torch.square(ref.float() - candidates).sum(dim=(1, 2))
tensor([[23.3516, 21.8078, 25.5247, 26.3465, 21.3161, 17.7537, 24.1075, 22.4388,
22.7513, 16.8489]])
diff = ref.float() - candidates
torch.einsum("abc,abc->a", diff, diff)
tensor([[23.3516, 21.8078, 25.5247, 26.3465, 21.3161, 17.7537, 24.1075, 22.4388,
22.7513, 16.8489]])
The following line returns the index of the tensor in candidates that minimizes the summation of the element-wise Euclidean distance to ref
In [1]: import torch
In [2]: ref = torch.as_tensor([[1, 2], [3, 4]])
...: candidates = torch.rand(100, 2, 2)
In [3]: %timeit torch.argmin(((ref - candidates) ** 2).sum((1, 2)))
16.9 µs ± 350 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
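For the arbitrary-dimension case asked about in the question, one option (a sketch, with shapes chosen arbitrarily and not benchmarked) is to flatten everything after the batch axis and reuse the same idea:
import torch

ref = torch.rand(2, 3, 4)               # reference of shape (b, c, d, ...)
candidates = torch.rand(100, 2, 3, 4)   # batch with the same trailing shape

# Flatten all non-batch dimensions, then sum squared differences per candidate.
diff = candidates.flatten(start_dim=1) - ref.flatten().unsqueeze(0)
print(torch.argmin((diff ** 2).sum(dim=1)))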
Let's consider two arrays, I and J, which determine the neighbor pairs:
I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])
Which means element 0 has two neighbors, 1 and 2. Element 1 has only 0 as a neighbor, and so on.
What is the most efficient way to create arrays of all neighbor triples I', J', K' such that j is a neighbor of i and k is a neighbor of j, given the condition that i, j, and k are different elements (i != j != k)?
Ip = np.array([0, 0, 2, 3])
Jp = np.array([2, 2, 0, 2])
Kp = np.array([0, 3, 1, 0])
Of course, one way is to loop over each element. Is there a more efficient algorithm? (working with 10-500 million elements)
I would go with a very simple approach and use pandas (I and J are your numpy arrays):
import pandas as pd
df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})
result = pd.merge(df2, df1, on='I').query('K != J')
The advantage is that pandas.merge relies on a very fast underlying numerical implementation. Also, you can make the computation even faster, for example by merging on indexes.
To reduce the memory that this approach needs, it would be probably very useful to reduce the size of df1 and df2 before merging them (for example, by changing the dtype of their columns to something that suits your need).
Here is an example of how to optimize speed and memory of the computation:
from timeit import timeit
import numpy as np
import pandas as pd
I = np.random.randint(0, 10000, 1000000)
J = np.random.randint(0, 10000, 1000000)
df1_64 = pd.DataFrame({'I': I, 'J': J})
df1_32 = df1_64.astype('int32')
df2_64 = df1_64.rename(columns={'I': 'K', 'J': 'I'})
df2_32 = df1_32.rename(columns={'I': 'K', 'J': 'I'})
timeit(lambda: pd.merge(df2_64, df1_64, on='I').query('K != J'), number=1)
# 18.84
timeit(lambda: pd.merge(df2_32, df1_32, on='I').query('K != J'), number=1)
# 9.28
There is no particularly magic algorithm to generate all of the triples. You can avoid re-fetching a node's neighbors by an orderly search, but that's about it.
Make an empty list, N, of nodes to check.
Add some start node, S, to N.
While N is not empty:
    Pop a node off the list; call it A.
    Make a set of its neighbors, A'.
    For each neighbor B of A:
        For each element a of A':
            Generate the triple (a, A, B).
        Add B to the list of nodes to check, if it has not already been checked.
Does that help? There are still several details to handle in the algorithm above, such as avoiding duplicate generation, and fine points of moving through cliques.
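A minimal sketch of that search in Python (the variable names are mine; it adds the i != k condition from the question and only visits the component that contains the start node):
from collections import defaultdict, deque

import numpy as np

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

# Build an adjacency mapping from the neighbor pairs (treated as undirected here).
neighbors = defaultdict(set)
for i, j in zip(I, J):
    neighbors[i].add(j)
    neighbors[j].add(i)

triples = []
checked = set()
to_check = deque([I[0]])                # some start node S
while to_check:
    A = to_check.popleft()              # pop a node off the list
    checked.add(A)
    A_prime = neighbors[A]              # the set of its neighbors
    for B in A_prime:
        for a in A_prime:
            if a != B:                  # keep i != k
                triples.append((a, A, B))
        if B not in checked and B not in to_check:
            to_check.append(B)

print(triples)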
What you are looking for is all paths of length 3 in the graph. You can achieve this simply with the following recursive algorithm:
import networkx as nx
def findPaths(G, u, n):
    """Returns a list of all paths of length `n` starting at vertex `u`."""
    if n == 1:
        return [[u]]
    paths = [[u]+path for neighbor in G.neighbors(u)
             for path in findPaths(G, neighbor, n-1) if u not in path]
    return paths
# Generating graph
vertices = np.unique(I)
edges = list(zip(I,J))
G = nx.Graph()
G.add_edges_from(edges)
# Grabbing all 3-paths
paths = [path for v in vertices for path in findPaths(G,v,3)]
paths
>>> [[0, 2, 3], [1, 0, 2], [2, 0, 1], [3, 2, 0]]
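If you then want the three flat arrays from the question, the list of paths can simply be transposed (assuming at least one path was found):
Ip, Jp, Kp = np.array(paths).T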
This is an initial solution to your problem using networkx, an optimized library for graph computations:
import numpy as np
import networkx as nx
I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])
I_, J_, K_ = [], [], []
num_nodes = np.max(np.concatenate([I,J])) + 1
A = np.zeros((num_nodes, num_nodes))
A[I,J] = 1
print("Adjacency Matrix:")
print(A)
G = nx.from_numpy_matrix(A)
for i in range(num_nodes):
    first_neighbors = list(G.neighbors(i))
    for j in first_neighbors:
        second_neighbor = list(G.neighbors(j))
        second_neighbor_no_circle = list(filter(lambda node: node != i, second_neighbor))
        num_second_neighbors = len(second_neighbor_no_circle)
        if num_second_neighbors > 0:
            I_.extend(num_second_neighbors * [i])
            J_.extend(num_second_neighbors * [j])
            K_.extend(second_neighbor_no_circle)
I_, J_, K_ = np.array(I_), np.array(J_), np.array(K_)
print("result:")
print(I_)
print(J_)
print(K_)
####### Output #######
Adjacency Matrix:
[[0. 1. 1. 0.]
[1. 0. 0. 0.]
[1. 0. 0. 1.]
[0. 0. 1. 0.]]
result:
[0 1 2 3]
[2 0 0 2]
[3 2 1 0]
I used %%timeit on the code above without print statements to check the running time:
49 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Complexity analysis:
Finding all the neighbors of all the neighbors is essentially taking 2 steps in a Depth First Search algorithm. This could take, depending on the graph's topology, up to O(|V| + |E|) where |E| is the number of edges in the graph and |V| is the number of vertices.
To the best of my knowledge, there is no better algorithm on a general graph.
However, if you do know some special properties about the graph, the running time could be more tightly bounded or perhaps alter the current algorithm based on this knowledge.
For instance, if you know all the vertices have at most d edges, and the graph has one connected component, the bound of this implementation becomes O(2d), which is considerably better when d << |E|.
Let me know if you have any questions.
Assume we have a numpy array A with shape (N,), a matrix D with shape (M, 3) that holds data, and another matrix I with shape (M, 3) that holds the corresponding index of each data element in D. How can we construct A given D and I such that repeated element indices are added up?
Example:
############# A[I] := D ###################################
A = [0.5, 0.6] # Final Reduced Data Vector
D = [[0.1, 0.1 0.2], [0.2, 0.4, 0.1]] # Data
I = [[0, 1, 0], [0, 1, 1]] # Indices
For example:
A[0] = D[0][0] + D[0][2] + D[1][0] # 0.5 = 0.1 + 0.2 + 0.2
Since in index matrix we have:
I[0][0] = I[0][2] = I[1][0] = 0
Target is to avoid looping over all elements to be efficient for large N, M (10^6-10^9).
I doubt you can get much faster than np.bincount - and notice how the official documentation provides this exact use case.
# Your example
A = [0.5, 0.6]
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
# Solution
import numpy as np
D, I = np.array(D).flatten(), np.array(I).flatten()
print(np.bincount(I, D)) #[0.5 0.6]
The shape of I and D doesn't matter: you can clearly ravel the arrays without changing the outcome:
index = np.ravel(I)
data = np.ravel(D)
Now you can sort both arrays according to I:
sorter = np.argsort(index)
index = index[sorter]
data = data[sorter]
This is helpful because now index looks like this:
0, 0, 0, 1, 1, 1
And data is this:
0.1, 0.2, 0.2, 0.1, 0.4, 0.1
Adding together runs of consecutive numbers should be easier than processing random locations. Let's start by finding the indices where the runs start:
runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
Now you can use the fact that ufuncs like np.add have a partial reduce operation called reduceat. This allows you to sum regions of an array:
a = np.add.reduceat(data, runs)
If I is guaranteed to contain all indices in [0, A.size) at least once, you're done: just assign to A instead of a. If not, you can make the mapping using the fact that the start of each run in index is the target index:
A = np.zeros(n)
A[index[runs]] = a
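To make those steps concrete, here are the intermediate values for the D and I from the question (a small worked example, nothing beyond the code above):
import numpy as np

D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]

index = np.ravel(I)                                   # [0 1 0 0 1 1]
data = np.ravel(D)                                    # [0.1 0.1 0.2 0.2 0.4 0.1]
sorter = np.argsort(index)
index, data = index[sorter], data[sorter]             # index -> [0 0 0 1 1 1]
runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]   # [0 3]
print(np.add.reduceat(data, runs))                    # [0.5 0.6]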
Algorithmic complexity analysis:
ravel is O(1) in time and space if the data is in an array. If it's a list, this is O(MN) in time and space
argsort is O(MN log MN) in time and O(MN) in space
Indexing by sorter is O(MN) in time and space
Computing runs is O(MN) in time and O(MN + M) = O(MN) in space
reduceat is a single pass: O(MN) in time, O(M) in space
Reassigning A is O(M) in time and space
Total: O(MN log MN) time, O(MN) space
TL;DR
def make_A(D, I, M):
    index = np.ravel(I)
    data = np.ravel(D)
    sorter = np.argsort(index)
    index = index[sorter]
    if index[0] < 0 or index[-1] >= M:
        raise ValueError('Bad indices')
    data = data[sorter]
    runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
    a = np.add.reduceat(data, runs)
    if a.size == M:
        return a
    A = np.zeros(M)
    A[index[runs]] = a
    return A
If you know the size of A beforehand, as it seems you do, you can simply use add.at:
import numpy as np
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
arr_D = np.array(D)
arr_I = np.array(I)
A = np.zeros(2)
np.add.at(A, arr_I, arr_D)
print(A)
Output
[0.5 0.6]
If you don't know the size of A, you can use max to compute it:
A = np.zeros(arr_I.max() + 1)
np.add.at(A, arr_I, arr_D)
print(A)
Output
[0.5 0.6]
The time complexity of this algorithm is O(N), with also space complexity O(N).
The expression
arr_I.max() + 1
is what bincount does under the hood. From the documentation:
The result of binning the input array. The length of out is equal to
np.amax(x)+1.
That being said, bincount is at least one order of magnitude faster:
I = np.random.choice(1000, size=(1000, 3), replace=True)
D = np.random.random((1000, 3))
%timeit make_A_with_at(I, D, 1000)
213 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit make_A_with_bincount(I, D)
11 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I have an 8x8x25000 array W and an 8 x 25000 array r. I want to multiply each 8x8 slice of W by each column (8x1) of r and save the result in Wres, which will end up being an 8x25000 matrix.
I am accomplishing this using a for loop as such:
for i in range(0, 25000):
    Wres[:,i] = np.matmul(W[:,:,i], r[:,i])
But this is slow and I am hoping there is a quicker way to accomplish this.
Any ideas?
matmul can broadcast as long as the two arrays share the same leading axis length. From the docs:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
Thus, you have to perform 2 operations prior to matmul:
import numpy as np
a = np.random.rand(8,8,100)
b = np.random.rand(8, 100)
transpose a and b so that the first axis is the 100 slices
add an extra dimension to b so that b.shape = (100, 8, 1)
Then:
at = a.transpose(2, 0, 1) # swap to shape 100, 8, 8
bt = b.T[..., None] # swap to shape 100, 8, 1
c = np.matmul(at, bt)
c is now 100, 8, 1, reshape back to 8, 100:
c = np.squeeze(c).swapaxes(0, 1)
or
c = np.squeeze(c).T
And last, a one-liner just for convenience:
c = np.squeeze(np.matmul(a.transpose(2, 0, 1), b.T[..., None])).T
An alternative to np.matmul is np.einsum, which accomplishes this in one shorter and arguably more palatable line of code, with no method chaining.
Example arrays:
np.random.seed(123)
w = np.random.rand(8,8,25000)
r = np.random.rand(8,25000)
wres = np.einsum('ijk,jk->ik',w,r)
# a quick check on result equivalency to your loop
print(np.allclose(np.matmul(w[:, :, 1], r[:, 1]), wres[:, 1]))
True
Timing is equivalent to @Imanol's solution, so take your pick of the two. Both are 30x faster than looping. Here, einsum is competitive because of the size of the arrays; with larger arrays it would likely win out, and with smaller ones it would lose. See this discussion for more.
def solution1():
    return np.einsum('ijk,jk->ik', w, r)

def solution2():
    return np.squeeze(np.matmul(w.transpose(2, 0, 1), r.T[..., None])).T

def solution3():
    Wres = np.empty((8, 25000))
    for i in range(0, 25000):
        Wres[:,i] = np.matmul(w[:,:,i], r[:,i])
    return Wres
%timeit solution1()
100 loops, best of 3: 2.51 ms per loop
%timeit solution2()
100 loops, best of 3: 2.52 ms per loop
%timeit solution3()
10 loops, best of 3: 64.2 ms per loop
Credit to: @Divakar
I perform the cross product of contiguous segments of a trajectory (xy coordinates) using the following script:
In [129]:
def func1(xy, s):
    size = xy.shape[0]-2*s
    out = np.zeros(size)
    for i in range(size):
        p1, p2 = xy[i], xy[i+s]        # segment 1
        p3, p4 = xy[i+s], xy[i+2*s]    # segment 2
        out[i] = np.cross(p1-p2, p4-p3)
    return out

def func2(xy, s):
    size = xy.shape[0]-2*s
    p1 = xy[0:size]
    p2 = xy[s:size+s]
    p3 = p2
    p4 = xy[2*s:size+2*s]
    tmp1 = p1-p2
    tmp2 = p4-p3
    return tmp1[:, 0] * tmp2[:, 1] - tmp2[:, 0] * tmp1[:, 1]
In [136]:
xy = np.array([[1,2],[2,3],[3,4],[5,6],[7,8],[2,4],[5,2],[9,9],[1,1]])
func2(xy, 2)
Out[136]:
array([ 0, -3, 16, 1, 22])
func1 is particularly slow because of the inner loop, so I rewrote the cross-product myself (func2), which is orders of magnitude faster.
Is it possible to use the numpy einsum function to make the same calculation?
einsum computes sums of products only, but you could shoehorn the cross-product into a sum of products by reversing the columns of tmp2 and changing the sign of its second column:
def func3(xy, s):
    size = xy.shape[0]-2*s
    tmp1 = xy[0:size] - xy[s:size+s]
    tmp2 = xy[2*s:size+2*s] - xy[s:size+s]
    tmp2 = tmp2[:, ::-1]
    tmp2[:, 1] *= -1
    return np.einsum('ij,ij->i', tmp1, tmp2)
But func3 is slower than func2.
In [80]: xy = np.tile(xy, (1000, 1))
In [104]: %timeit func1(xy, 2)
10 loops, best of 3: 67.5 ms per loop
In [105]: %timeit func2(xy, 2)
10000 loops, best of 3: 73.2 µs per loop
In [106]: %timeit func3(xy, 2)
10000 loops, best of 3: 108 µs per loop
Sanity check:
In [86]: np.allclose(func1(xy, 2), func3(xy, 2))
Out[86]: True
I think the reason why func2 is beating einsum here is that the cost of setting up the loop in einsum for just 2 iterations is too expensive compared to just manually writing out the sum, and the reversing and multiplying eat up some time as well.
np.cross is a smart little beast, that can handle broadcasting without any issue. So you can rewrite your func2 as:
def func2(xy, s):
    size = xy.shape[0]-2*s
    p1 = xy[0:size]
    p2 = xy[s:size+s]
    p3 = p2
    p4 = xy[2*s:size+2*s]
    return np.cross(p1-p2, p4-p3)
and it will produce the correct result:
>>> func2(xy, 2)
array([ 0, -3, 16, 1, 22])
In the latest numpy it will likely run a tad faster than your code, as it was rewritten to minimize intermediate array creation. You can look at the source code (pure Python) here.
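As a quick sanity check on that rewrite (the function names here are mine, reusing the tiled test array from above), the broadcasting np.cross version and the hand-written product should agree:
import numpy as np

xy = np.tile(np.array([[1,2],[2,3],[3,4],[5,6],[7,8],[2,4],[5,2],[9,9],[1,1]]), (1000, 1))

def func2_manual(xy, s):
    size = xy.shape[0] - 2*s
    tmp1 = xy[0:size] - xy[s:size+s]
    tmp2 = xy[2*s:size+2*s] - xy[s:size+s]
    return tmp1[:, 0] * tmp2[:, 1] - tmp2[:, 0] * tmp1[:, 1]

def func2_cross(xy, s):
    size = xy.shape[0] - 2*s
    return np.cross(xy[0:size] - xy[s:size+s], xy[2*s:size+2*s] - xy[s:size+s])

print(np.allclose(func2_manual(xy, 2), func2_cross(xy, 2)))  # True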