Average Number of Repeating Patterns with Numpy - python

I have an arbitrary array with only binary values, say:
a = np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0,])
What would be the most efficient way to compute the average length of the sequences of 1s in this array? E.g. in this example it would be (1 + 8 + 2)/3.

For an all-numpy solution, you can adapt Alex Martelli's approach like so:
import numpy as np

def runs_of_ones_array(bits):
    # make sure all runs of ones are well-bounded
    bounded = np.hstack(([0], bits, [0]))
    # get 1 at run starts and -1 at run ends
    difs = np.diff(bounded)
    run_starts, = np.where(difs > 0)
    run_ends, = np.where(difs < 0)
    return run_ends - run_starts
>>> a=np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0,])
>>> b=runs_of_ones_array(a)
>>> float(sum(b))/len(b)
3.66666666667
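Since runs_of_ones_array returns a NumPy array, the average can also be taken directly with the array's own mean method:
>>> print(b.mean())
3.6666666666666665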

I'm not sure it's the easiest, but one alternative (using itertools) is:
import itertools
np.mean([len(list(v)) for k, v in itertools.groupby(a) if k])
3.6666666666666665
Explanation
groupby groups adjacent equal values together; if k keeps only the runs of ones; the list comprehension [...] builds the list of the lengths of those runs, i.e. [1, 8, 2]; and mean computes the average value.
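For illustration, here is a small self-contained sketch that inspects the intermediate run-length list for the same array a:
import itertools
import numpy as np

a = np.array([1,0,0,0,0,0,1,1,1,1,1,1,1,1,0,0,1,1,0])
run_lengths = [len(list(v)) for k, v in itertools.groupby(a) if k]
print(run_lengths)            # [1, 8, 2]
print(np.mean(run_lengths))   # 3.6666666666666665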

Related

What is the math problem I'm trying to solve in Python?

I am trying to solve this math problem in python, and I'm not sure what it is called:
The answer X is always 100
Given a list of 5 integers, their sum would equal X
Each integer has to be between 1 and 25
The integers can appear one or more times in the list
I want to find all the possible unique lists of 5 integers that match.
These would match:
20,20,20,20,20
25,25,25,20,5
10,25,19,21,25
along with many more.
I looked at itertools.permutations, but I don't think that handles duplicate integers in the list. I'm thinking there must be a standard math algorithm for this, but my search queries must be poor.
The only other thing to mention, in case it matters, is that the list size could change from 5 integers to some other length (6, 24, etc.).
This is a constraint satisfaction problem. Such problems can often be solved recursively: you fix one part of the solution and then solve the remaining, smaller subproblem. In Python, we can implement this approach with a recursive function:
def csp_solutions(target_sum, n, i_min=1, i_max=25):
    domain = range(i_min, i_max + 1)
    if n == 1:
        if target_sum in domain:
            return [[target_sum]]
        else:
            return []
    solutions = []
    for i in domain:
        # Check if a solution is still possible when i is picked:
        if (n - 1) * i_min <= target_sum - i <= (n - 1) * i_max:
            # Construct solutions recursively:
            solutions.extend([[i] + sol
                              for sol in csp_solutions(target_sum - i, n - 1)])
    return solutions
all_solutions = csp_solutions(100, 5)
This yields 23746 solutions, in agreement with the answer by Alex Reynolds.
Another approach with Numpy:
#!/usr/bin/env python
import numpy as np
start = 1
end = 25
entries = 5
total = 100
a = np.arange(start, end + 1)
c = np.array(np.meshgrid(a, a, a, a, a)).T.reshape(-1, entries)
assert(len(c) == pow(end, entries))
s = c.sum(axis=1)
#
# filter all combinations for those that meet sum criterion
#
valid_combinations = c[np.where(s == total)]
print(len(valid_combinations)) # 23746
#
# filter those combinations for unique permutations
#
unique_permutations = set(tuple(sorted(x)) for x in valid_combinations)
print(len(unique_permutations)) # 376
You want combinations_with_replacement from the itertools library. Here is what the code would look like:
from itertools import combinations_with_replacement

values = [i for i in range(1, 26)]
candidates = []
for tuple5 in combinations_with_replacement(values, 5):
    if sum(tuple5) == 100:
        candidates.append(tuple5)
For me on this problem I get 376 candidates. As mentioned in the comments above, if these are to be counted once for each arrangement of the 5-tuple, then you'd want to look at all permutations of each candidate, which may not all be distinct. For example, (20,20,20,20,20) is the same regardless of how you arrange the indices. However, (21,20,20,20,19) is not; it has several distinct arrangements (a sketch of this counting step follows below).
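As a sketch of that counting step (reusing the candidates list built above), the distinct arrangements can be generated with itertools.permutations and collected in a set:
from itertools import permutations

ordered = set()
for tuple5 in candidates:
    ordered.update(permutations(tuple5))
print(len(ordered))  # 23746, matching the recursive and meshgrid counts above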
I think this could be what you are searching for: given a target number SUM, a lower threshold L, an upper threshold R and a size K, find all the possible lists of K elements between L and R whose sum is SUM. As far as I was able to find, there isn't a specific name for this problem.
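Note that the recursive csp_solutions function shown above already solves this general formulation: target_sum plays the role of SUM, i_min and i_max are the thresholds L and R, and n is the size K.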

Efficiently adding two different sized one dimensional arrays

I want to add two numpy arrays of different sizes starting at a specific index. As I need to do this a couple of thousand times with large arrays, it needs to be efficient, and I am not sure how to do it efficiently without iterating through each cell.
a = [5,10,15]
b = [0,0,10,10,10,0,0]
res = add_arrays(b,a,2)
print(res) => [0,0,15,20,25,0,0]
Naive approach:
# b is the bigger array
def add_arrays(b, a, i):
    for j in range(len(a)):
        b[i+j] += a[j]
    return b
You might assign the smaller array into a zero array of the same shape as the bigger one and then add. I would do it the following way:
import numpy as np
a = np.array([5,10,15])
b = np.array([0,0,10,10,10,0,0])
z = np.zeros(b.shape,dtype=int)
z[2:2+len(a)] = a # 2 is offset
res = z+b
print(res)
output
[ 0  0 15 20 25  0  0]
Disclaimer: I assume that offset + len(a) is always less than or equal to len(b).
Nothing wrong with your approach. You cannot get better asymptotic time or space complexity. If you want to reduce code lines (which is not an end in itself), you could use slice assignment and some other utils:
def add_arrays(b, a, i):
    # wrap in list() so the result can also be assigned back into a NumPy slice
    b[i:i+len(a)] = list(map(sum, zip(b[i:i+len(a)], a)))
But the functional overhead should make this less performant, if anything.
Some docs:
map
sum
zip
It should be faster than Daweo's answer, 1.5-5x (depending on the size ratio between a and b).
result = b.copy()
result[offset: offset+len(a)] += a
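A minimal usage sketch with the arrays from the question (assuming, as in the answers above, that offset + len(a) <= len(b)):
import numpy as np

a = np.array([5, 10, 15])
b = np.array([0, 0, 10, 10, 10, 0, 0])
offset = 2

result = b.copy()
result[offset: offset + len(a)] += a
print(result)  # [ 0  0 15 20 25  0  0]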

How do I create a data frame with every possible permutation of integers with certain constraints?

I have ten columns: n1, n2, n3, n4, n5, n6, n7, n8, n9, n10.
The values in a single row must add up to exactly 10, and all of the values must be non-negative integers less than or equal to 5.
I'd like to make a DataFrame with every possible permutation according to the constraints that I've just described. The order matters (i.e., [5,5,0,0,0,0,0,0,0,0] and [5,0,5,0,0,0,0,0,0,0] should both be separate rows).
Here's my attempt:
import itertools as it
import pandas as pd

permutations = [i for i in it.permutations(range(0,6), 10) if sum(i)==10]
df = pd.DataFrame(data=permutations, columns=['x1','x2','x3','x4','x5','x6','x7','x8','x9','x10'])
The problem is that there are zero rows in df: the list permutations is empty, and I don't see why. If I replace it.permutations with it.combinations_with_replacement, the resulting list has length 30. Why does it.permutations return nothing?
It's an easy fix!
Since order matters, you're actually looking for itertools.product (I know, it's a weird name). Here's the documentation: https://docs.python.org/3/library/itertools.html#itertools.product.
Solution:
import itertools as it
permutations = [i for i in it.product(range(6), repeat=10) if sum(i) == 10]
You can't get a permutation of 10 items from a list of 6 items. (Maybe "permutation" doesn't mean what you think it means.)
Here's one way to get what you want (though it's going to take a while to run):
permutations = []
for p in [i for i in it.combinations_with_replacement(range(0,6), 10) if sum(i)==10]:
    permutations += [x for x in set(it.permutations(p))]
(Explanation: each p is a way of choosing 10 values with the proper sum. Then we use permutations to find all the ways to order that set of values.)
You chose the incorrect way to repeat things.
import itertools as it
permutations = [i for i in it.product(*it.repeat(range(6),10)) if sum(i)==10]
df = pd.DataFrame(data=permutations,columns=['x1','x2','x3','x4','x5','x6','x7','x8','x9','x10'])
This should give 85228 results.
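For reference, it.product(*it.repeat(iterable, n)) and it.product(iterable, repeat=n) build the same Cartesian product, so this answer enumerates exactly the same tuples as the it.product answer above. A tiny check on a smaller case:
import itertools as it

print(list(it.product(*it.repeat(range(3), 2))) == list(it.product(range(3), repeat=2)))  # True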

Finding maximum sum of occurrences of one element in two attempts from a list

This is best explained by example. Say a Python list is:
[[0,1,2,0,4],
[0,1,2,0,2],
[1,0,0,0,1],
[1,0,0,1,0]]
I want to select the two sub-lists that yield the maximum sum of occurrences of zeros, where the sum is calculated as follows:
SUM = number of zeros present in the first selected sub-list + number of zeros present in the second selected sub-list at positions that were not zeros in the first selected sub-list.
In this case the answer is 5 (the first or second sub-list together with the last sub-list). Note that the third sub-list should not be selected: its zero at index 3 coincides with a zero of the first/second sub-list, so that pairing only amounts to a sum of 4, which is less than what we get with the last sub-list.
What kind of algorithm is best suited if we were to apply this to a big input? Is there a way to do better than O(N²) time?
Binary operations are fairly useful for this task:
1. Convert each sublist to a binary number, where a 0 is turned into a 1 bit and other numbers are turned into a 0 bit. For example, [0,1,2,0,4] would be turned into 10010, which is 18.
2. Eliminate duplicate numbers.
3. Form all pairs of the remaining numbers.
4. Combine each pair with a binary OR.
5. Find the number with the most 1 bits.
The code:
import itertools

lists = [[0,1,2,0,4],
         [0,1,2,0,2],
         [1,0,0,0,1],
         [1,0,0,1,0]]

def to_binary(lst):
    num = ''.join('1' if n == 0 else '0' for n in lst)
    return int(num, 2)

def count_ones(num):
    return bin(num).count('1')

# Step 1 & 2: Convert to binary and remove duplicates
binary_numbers = {to_binary(lst) for lst in lists}
# Step 3: Create pairs
combinations = itertools.combinations(binary_numbers, 2)
# Step 4 & 5: Compute binary OR and count 1 digits
zeros = (count_ones(a | b) for a, b in combinations)
print(max(zeros))  # output: 5
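The question also asks which two sub-lists achieve the maximum. A small sketch (reusing lists, to_binary and count_ones from the code above) that keeps the original indices instead of deduplicating:
indexed = list(enumerate(to_binary(lst) for lst in lists))
best = max(itertools.combinations(indexed, 2),
           key=lambda pair: count_ones(pair[0][1] | pair[1][1]))
(i, a), (j, b) = best
print(i, j, count_ones(a | b))  # 0 3 5: the first and last sub-lists, covering 5 zeros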
The efficiency of the naive algorithm is O(n(n-1)·m) ~ O(n^2·m), where n is the number of lists and m is the length of each list. When n and m are comparable in magnitude, this equates to O(n^3).
It might be helpful to observe that naive matrix multiplication is also O(n^3). This might lead us to the following algorithm:
1. Rewrite each list with only 1's and 0's, where a 1 indicates a non-zero entry.
2. Arrange these lists as the rows of a matrix A.
3. Compute the product M = A·Aᵀ.
4. Find the minimum element in M; its row and column indices correspond to the pair of lists which produce the maximum number of non-overlapping zeros.
Here, (3) is the limiting step of the algorithm. Asymptotically, depending on your matrix multiplication algorithm, you can achieve a complexity down to roughly O(n^2.4).
An example Python implementation would look like:
import numpy as np
lists = [[0,1,2,0,4],
         [0,1,2,0,2],
         [1,0,0,0,1],
         [1,0,0,1,0]]
filtered = list(set(tuple(1 if e else 0 for e in sub) for sub in lists))
A = np.mat(filtered)
D = np.einsum('ik,jk->ij', A, A)
indices = np.unravel_index(np.argmin(D), D.shape)
print(f'{indices}: {len(lists[0]) - D[indices]}')  # e.g. (1, 2): 5 (indices refer to rows of filtered, whose order may vary)
Note that this algorithm on its own has the fundamental inefficiency that it calculates both the lower-triangular and upper-triangular halves of the dot product matrix. However, the numpy speed-up will probably offset this compared to the combinations approach. See the timing results below; a small sketch that restricts the search to the upper triangle follows after them.
import itertools
import random
from time import time

def numpy_approach(lists):
    filtered = list(set(tuple(1 if e else 0 for e in sub) for sub in lists))
    A = np.mat(filtered, dtype=bool).astype(int)
    D = np.einsum('ik,jk->ij', A, A)
    return len(lists[0]) - D.min()

def itertools_approach(lists):
    binary_numbers = {int(''.join('1' if n == 0 else '0' for n in lst), 2)
                      for lst in lists}
    combinations = itertools.combinations(binary_numbers, 2)
    zeros = (bin(a | b).count('1') for a, b in combinations)
    return max(zeros)

N = 1000
lists = [[random.randint(0, 5) for _ in range(10)] for _ in range(100)]

for name, function in {
        'numpy approach': numpy_approach,
        'itertools approach': itertools_approach}.items():
    start = time()
    for _ in range(N):
        function(lists)
    print(f'{name}: {time() - start}')

# numpy approach: 0.2698099613189697
# itertools approach: 0.9693171977996826
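As a minimal sketch of the upper-triangle remark above (an added variant, not part of the original answer): restricting the minimum search to the strict upper triangle with np.triu_indices skips the duplicated half and also keeps each list from being compared with itself, although the full matrix product is still computed:
def numpy_approach_triu(lists):
    # 1 marks a non-zero entry, 0 marks a zero, as before
    filtered = list(set(tuple(1 if e else 0 for e in sub) for sub in lists))
    A = np.array(filtered)
    D = A @ A.T                                 # D[i, j] = positions where both rows are non-zero
    iu = np.triu_indices(len(filtered), k=1)    # strict upper triangle: skip diagonal and mirror
    return len(lists[0]) - D[iu].min()

example = [[0,1,2,0,4], [0,1,2,0,2], [1,0,0,0,1], [1,0,0,1,0]]
print(numpy_approach_triu(example))  # 5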
The algorithm should look something like this (with Haskell code as an example, so as not to make the process trivial for you in Python):
turn each sublist into "Is zero" or "Isn't zero"
map (map (\x -> if x==0 then 1 else 0)) bigList
Enumerate the list so you can keep indices
enumList = zip [0..] bigList
Compare each sublist with its successive sublists
myCompare = concat . go
  where
    go [] = []
    go ((ix, xs):xss) = [((ix, iy), zipWith (.|.) xs ys) | (iy, ys) <- xss] : go xss
Calculate your maxes
best = maximumBy (compare `on` (sum . snd)) $ myCompare enumList
Pull out the indices
result = fst best

Algorithm - Grouping List in unique pairs

I'm having difficulties with an assignment I've received, and I am pretty sure the problem's text is flawed. I've translated it to this:
Consider a list x[1..2n] with elements from {1,2,..,m}, m < n. Propose and implement in Python an algorithm with a complexity of O(n) that groups the elements into pairs (pairs (x[i], x[j]) with i < j) such that every element is present in exactly one pair. For each such set of pairs, calculate the maximum sum over its pairs, then compare it with the other sets. Return the set for which that maximum is the minimum.
For example, x = [1,5,9,3] can be paired in three ways:
(1,5),(9,3) => Sums: 6, 12 => Maximum 12
(1,9),(5,3) => Sums: 10, 8 => Maximum 10
(1,3),(5,9) => Sums: 4, 14 => Maximum 14
----------
Minimum 10
Solution to be returned: (1,9),(5,3)
The things that strike me oddly are as follows:
List contents definition: It says the list x[1..2n] has elements from {1,..,m} with m < n. But if m < n, there aren't enough distinct values to populate the list without duplicating some, which is not allowed. So I would assume m >= 2n. Also, the example has n = 2 but uses elements greater than 1, so I assume that's what they meant.
O(n) complexity: Is there a way to combine them in a single loop? I can't think of anything.
My Calculations:
For n = 4:
Number of ways to combine: 6
Valid ways: 3
For n = 6
Number of ways to combine: 910
Valid ways: 15
For n = 8
Number of ways to combine: >30 000
Valid ways: ?
So obviously, I cannot use brute force and then check validity afterwards. The formula I used to calculate the total possible ways is
C(C(n,2),n/2)
Question:
Is this problem wrongly written and impossible to solve? If so, what conditions should be added or removed to make it feasible? If you are going to suggest some code in python, remember I cannot use any prebuilt functions of any kind. Thank you
Assuming a sorted list:
def answer(L):
    return list(zip(L[:len(L)//2], L[len(L)//2:][::-1]))
Or if you want to do it more manually:
def answer(L):
    answer = []
    for i in range(len(L)//2):
        answer.append((L[i], L[len(L)-i-1]))
    return answer
Output:
In [3]: answer([1,3,5,9])
Out[3]: [(1, 9), (3, 5)]
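If the input is not already sorted (as with the question's x = [1,5,9,3]), sort it first; for example:
In [4]: answer(sorted([1,5,9,3]))
Out[4]: [(1, 9), (3, 5)]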
