Longest Snake Sequence in an Array


Question: A set of numbers separated by spaces is passed as input. The program must print the largest snake sequence present in the numbers. A snake sequence is made up of adjacent numbers such that, for each number, the number on its right or left is +1 or -1 of its value. If multiple snake sequences of maximum length are possible, print the snake sequence appearing in the natural input order.
Example Input/Output 1:
Input:
9 8 7 5 3 0 1 -2 -3 1 2
Output:
3 2 1 0 1
Example Input/Output 2:
Input:
-5 -4 -3 -1 0 1 4 6 5 4 3 4 3 2 1 0 2 -3 9
Output:
6 5 4 3 4 3 2 1 0 -1 0 1 2
Example Input/Output 3:
Input:
5 6 7 9 8 8
Output:
5 6 7 8 9 8
I have searched online & have only found references to find a snake sequence when a grid of numbers is given & not an array.
My Solution so far :
Create a 2D Array containing all the numbers from input as 1 value and the 2nd value being the max length sequence that can be generated starting from that number. But this doesn't always generate the max length sequence and doesn't work at all when there are 2 snakes of max length.

Assuming that the order in the original set of numbers does not matter, as seems to be the case in your question, this seems to be an instance of the Longest Path Problem, which is NP-hard.
Think of it that way: You can create a graph from your numbers, with edges between all pairs of nodes that have a difference of one. Now, the longest simple (acyclic) path in this graph is your solution. Your first example would correspond to this graph and path. (Note that there are two 1 nodes for the two ones in the input set.)
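The graph described above can be sketched as follows. This is only an illustration: the name build_snake_graph is made up here, and nodes are input positions rather than values, so that duplicate values become separate nodes.

```python
from collections import defaultdict

def build_snake_graph(numbers):
    """Return adjacency lists keyed by position in `numbers`.

    Two positions are connected when their values differ by exactly 1.
    Positions without any neighbour do not appear in the result.
    """
    graph = defaultdict(list)
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i != j and abs(a - b) == 1:
                graph[i].append(j)
    return dict(graph)

numbers = [9, 8, 7, 5, 3, 0, 1, -2, -3, 1, 2]
graph = build_snake_graph(numbers)
for i in sorted(graph):
    # Print each value together with the values it is connected to.
    print(numbers[i], "->", [numbers[j] for j in graph[i]])
```

A snake sequence is then a simple path in this graph, and the longest snake is the longest simple path.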
While this in itself does not solve your problem, it should help you get started on finding an algorithm to solve (or approximate) it, now that you know a better, more common name for the problem.
One algorithm works like this: Starting from each of the numbers, determine the "adjacent" numbers and do a kind of depth-first search through the graph to determine the longest path. Remember to temporarily remove the visited nodes from the graph. This has a worst-case complexity of O(2^n) (see note 1 below), but apparently it's sufficient for your examples.
def longest_snake(numbers, counts, path):
    best = path
    for n in sorted(counts, key=numbers.index):
        if counts[n] > 0 and (path == [] or abs(path[-1] - n) == 1):
            counts[n] -= 1
            res = longest_snake(numbers, counts, path + [n])
            if len(res) > len(best):
                best = res
            counts[n] += 1
    return best
Example:
>>> from collections import Counter
>>> numbers = list(map(int, "9 8 7 5 3 0 1 -2 -3 1 2".split()))
>>> longest_snake(numbers, Counter(numbers), [])
[3, 2, 1, 0, 1]
Note that this algorithm will reliably find a maximum "snake" sequence, using no number more often than allowed. However, it may not find the specific sequence that's expected as the output, i.e. "the snake sequence appearing in the natural input order", whatever that's supposed to mean.
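To check a candidate answer against the two properties just mentioned (neighbours differ by exactly 1, and no number is used more often than it occurs in the input), a small helper can be used. The name is_valid_snake is hypothetical, and it only checks validity, not maximality.

```python
from collections import Counter

def is_valid_snake(sequence, numbers):
    """Return True if `sequence` is a snake built from `numbers`."""
    available = Counter(numbers)
    used = Counter(sequence)
    # No value may be used more often than it occurs in the input.
    if any(used[v] > available[v] for v in used):
        return False
    # Every pair of neighbours must differ by exactly 1.
    return all(abs(a - b) == 1 for a, b in zip(sequence, sequence[1:]))

numbers = [9, 8, 7, 5, 3, 0, 1, -2, -3, 1, 2]
print(is_valid_snake([3, 2, 1, 0, 1], numbers))
print(is_valid_snake([5, 6, 7, 8, 9, 8], [5, 6, 7, 9, 8, 8]))
```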
To get closer to the "natural order", you might try the numbers in the same order as they appear in the input (as I did with sorted), but that does not work perfectly, either. Anyway, I'm sure you can figure out the rest by yourself.
1) In this special case, the graph has a branching factor of 2, thus O(2^n); in the more general case, the complexity would be closer to O(n!).

Related

Sorting a random array using permutation

I tried to sort an array by permuting it with itself
(the array contains all the numbers in the range from 0 to its length - 1).
To test it I used random.shuffle, but it had some unexpected results:
import random
import numpy as np
a = np.array(range(10))
random.shuffle(a)
a = a[a]
a = a[a]
print(a)
# not a sorted array
# [9 5 2 3 1 7 6 8 0 4]
a = np.array([2,1,4,7,6,5,0,3,8,9])
a = a[a]
a = a[a]
print(a)
# [0 1 2 3 4 5 6 7 8 9]
For some reason, applying the permutation to the second (hand-written) unsorted array returns the sorted array as expected, but the shuffled array does not behave the same way.
Does anyone know why? If there is an easier way to sort using a permutation or something similar, that would be great.

TL;DR
There is no reason to expect a = a[a] to sort the array. In most cases it won't. In case of a coincidence it might.
What is the operation c = b[a]? or Applying a permutation
When you use an array a obtained by shuffling range(n) as an index array for an array b of the same size n, you are applying a permutation, in the mathematical sense, to the elements of b. For instance:
a = [2,0,1]
b = np.array(['Alice','Bob','Charlie'])
print(b[a])
# ['Charlie' 'Alice' 'Bob']
In this example, array a represents the permutation (2 0 1), which is a cycle of length 3. Since the length of the cycle is 3, if you apply it three times, you will end up where you started:
a = [2,0,1]
b = np.array(['Alice','Bob','Charlie'])
c = b
for i in range(3):
    c = c[a]
    print(c)
# ['Charlie' 'Alice' 'Bob']
# ['Bob' 'Charlie' 'Alice']
# ['Alice' 'Bob' 'Charlie']
Note that I used strings for the elements of b to avoid confusing them with indices. Of course, I could have used numbers from range(n):
a = [2,0,1]
b = np.array([0,1,2])
c = b
for i in range(3):
    c = c[a]
    print(c)
# [2 0 1]
# [1 2 0]
# [0 1 2]
You might see an interesting, but unsurprising fact: The first line is equal to a; in other words, the first result of applying a to b is equal to a itself. This is because b was initialised to [0 1 2], which represents the identity permutation id; thus, the permutations that we find by repeatedly applying a to b are:
id == a^0
a
a^2
a^3 == id
Can we always go back where we started? or The rank of a permutation
It is a well-known result of algebra that if you apply the same permutation again and again, you will eventually end up on the identity permutation. In algebraic notation: for every permutation a, there exists an integer k >= 1 such that a^k == id.
Can we guess the value of k?
The minimum value of k is called the rank of the permutation (most algebra texts call it the order).
If a is a cycle, then the minimum possible k is the length of the cycle. In our previous example, a was a cycle of length 3, so it took three applications of a before we found the identity permutation again.
How about a cycle of length 2? A cycle of length 2 is just "swapping two elements". For instance, swapping elements 0 and 1:
a = [1,0,2]
b = np.array([0,1,2])
c = b
for i in range(2):
    c = c[a]
    print(c)
# [1 0 2]
# [0 1 2]
We swap 0 and 1, then we swap them back.
How about two disjoint cycles? Let's try a cycle of length 3 on the first three elements, simultaneously with swapping the last two elements:
a = [2,0,1,3,4,5,7,6]
b = np.array([0,1,2,3,4,5,6,7])
c = b
for i in range(6):
    c = c[a]
    print(c)
# [2 0 1 3 4 5 7 6]
# [1 2 0 3 4 5 6 7]
# [0 1 2 3 4 5 7 6]
# [2 0 1 3 4 5 6 7]
# [1 2 0 3 4 5 7 6]
# [0 1 2 3 4 5 6 7]
As you can see by carefully examining the intermediary results, there is a period of length 3 on the first three elements, and a period of length 2 on the last two elements. The overall period is the least common multiple of the two periods, which is 6.
What is k in general? A well-known theorem of algebra states: every permutation can be written as a product of disjoint cycles. The rank of a cycle is the length of the cycle. The rank of a product of disjoint cycles is the least common multiple of the ranks of cycles.
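The theorem translates directly into a short computation. The sketch below uses plain Python lists instead of numpy arrays, and the helper names cycle_lengths and permutation_rank are made up for illustration.

```python
from math import gcd

def cycle_lengths(perm):
    """Return the lengths of the disjoint cycles of a permutation of range(n)."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Follow the orbit of `start` until it closes.
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths

def permutation_rank(perm):
    """Least common multiple of the cycle lengths."""
    rank = 1
    for length in cycle_lengths(perm):
        rank = rank * length // gcd(rank, length)
    return rank

print(permutation_rank([2, 0, 1]))                 # 3
print(permutation_rank([1, 0, 2]))                 # 2
print(permutation_rank([2, 0, 1, 3, 4, 5, 7, 6]))  # 6
```

Fixed points count as cycles of length 1, which do not change the least common multiple.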
A coincidence in your code: sorting [2,1,4,7,6,5,0,3,8,9]
Let us go back to your python code.
a = np.array([2,1,4,7,6,5,0,3,8,9])
a = a[a]
a = a[a]
print(a)
# [0 1 2 3 4 5 6 7 8 9]
How many times did you apply permutation a? Note that because of the assignment a =, array a changed between the first and the second lines a = a[a]. Let us dispel some confusion by using a different variable name for every different value. Your code is equivalent to:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a2 = a[a]
a4 = a2[a2]
print(a4)
Or equivalently:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = (a[a])[a[a]]
This last line looks a little bit complicated. However, a cool result of algebra is that composition of permutations is associative. You already knew that addition and multiplication were associative: x+(y+z) == (x+y)+z and x(yz) == (xy)z. Well, it turns out that composition of permutations is associative as well! In terms of numpy index arrays, this means that:
a[b[c]] == (a[b])[c]
Thus your python code is equivalent to:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = ((a[a])[a])[a]
print(a4)
Or without the unneeded parentheses:
a = np.array([2,1,4,7,6,5,0,3,8,9])
a4 = a[a][a][a]
print(a4)
Since a4 is the identity permutation, this tells us that the rank of a divides 4. Thus the rank of a is 1, 2 or 4. This tells us that a can be written as a product of swaps and length-4 cycles. The only permutation of rank 1 is the identity itself. Permutations of rank 2 are products of disjoint swaps, and we can see that this is not the case of a. Thus the rank of a must be exactly 4.
You can find the cycles by choosing an element, and following its orbit: what values is that element successively transformed into? Here we see that:
0 is transformed into 2; 2 is transformed into 4; 4 is transformed into 6; 6 is transformed into 0;
1 remains untouched;
3 becomes 7; 7 becomes 3;
5 is untouched; 8 and 9 are untouched.
Conclusion: Your numpy array represents the permutation (0 -> 2 -> 4 -> 6 -> 0)(3 <-> 7), and its rank is the least common multiple of 4 and 2, lcm(4,2) == 4.
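As for sorting with a permutation: the permutation that always brings a back to sorted order in one step is its inverse, not a itself. A minimal sketch with plain Python lists follows; inverse_permutation is a hypothetical helper name. For a numpy array holding a permutation of range(n), np.argsort(a) gives the same inverse, so a[np.argsort(a)] is sorted.

```python
def inverse_permutation(perm):
    """Return inv such that inv[perm[i]] == i for every position i."""
    inverse = [0] * len(perm)
    for position, value in enumerate(perm):
        inverse[value] = position
    return inverse

# The unsorted array printed in the question.
a = [9, 5, 2, 3, 1, 7, 6, 8, 0, 4]
inv = inverse_permutation(a)
restored = [a[i] for i in inv]  # same as a[inv] with numpy arrays
print(restored)
```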

It took some time, but I figured out a way to do it.
I did not find a direct way to do this in numpy, but pandas has one.
By using df.reindex I can sort a data frame by its index:
import pandas as pd
import numpy as np
train_df = pd.DataFrame(range(10))
train_df = train_df.reindex(np.random.permutation(train_df.index))
print(train_df) # randomly ordered data frame containing all values from 0 to 9
train_df = train_df.reindex(range(10))
print(train_df) # data frame sorted by index

Why does the last element reflect the number of non-negative solutions?

Please excuse my naivete as I don't have much programming experience. While googling something for an unrelated question, I stumbled upon this:
https://www.geeksforgeeks.org/find-number-of-solutions-of-a-linear-equation-of-n-variables/
I completely understand the first (extremely inefficient) bit of code. But the second:
def countSol(coeff, n, rhs):
    # Create and initialize a table
    # to store results of subproblems
    dp = [0 for i in range(rhs + 1)]
    dp[0] = 1
    # Fill table in bottom up manner
    for i in range(n):
        for j in range(coeff[i], rhs + 1):
            dp[j] += dp[j - coeff[i]]
    return dp[rhs]
confuses me. My question being: why does this second program count the number of non-negative integer solutions?
I have written out several examples, including the one given in the article, and I understand that it does indeed do this. And I understand how it is populating the list. But I don't understand exactly why this works.
Please excuse what must be, to some, an ignorant question. But I would quite like to understand the logic, as I think it rather clever that such a little snippet is able to answer a question as general as "How many non-negative integer solutions exist" (for some general equation).

This algorithm is pretty cool and demonstrates the power of looking for a solution from a different perspective.
Let's take an example: 3x + 2y + z = 6, where LHS is the left-hand side and RHS is the right-hand side.
dp[k] will keep track of the number of unique ways to arrive at a RHS value of k by substituting non-negative integer values for LHS variables.
The i loop iterates over the variables in the LHS. The algorithm begins with setting all the variables to zero. So, the only possible k value is zero, hence
k     = 0 1 2 3 4 5 6
dp[k] = 1 0 0 0 0 0 0
For i = 0, we will update dp to reflect what happens if x is 1 or 2. We don't care about x > 2 because the solutions are all non-negative and 3x would be too big. The j loop is responsible for updating dp and dp[k] gets incremented by dp[k - 3] because we can arrive at RHS value k by adding one copy of the coefficient 3 to k-3. The result is
k     = 0 1 2 3 4 5 6
dp[k] = 1 0 0 1 0 0 1
Now the algorithm continues with i = 1, updating dp to reflect all possible RHS values where x is 0, 1, or 2 and y is 0, 1, 2, or 3. This time the j loop increments dp[k] by dp[k-2] because we can arrive at RHS value k by adding one copy of the coefficient 2 to k-2, resulting in
k     = 0 1 2 3 4 5 6
dp[k] = 1 0 1 1 1 1 2
Finally, the algorithm incorporates z = 1, 2, 3, 4, 5, or 6, resulting in
k     = 0 1 2 3 4 5 6
dp[k] = 1 1 2 3 4 5 7
In addition to computing the answer in pseudo-polynomial time, dp encodes the answer for every RHS <= the input right hand side.
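The walk-through can be reproduced with an instrumented variant of the function that records dp after each variable has been processed. This is a sketch; count_solutions_trace is a made-up name and not part of the linked article. A brute-force count over all (x, y, z) is included as a cross-check.

```python
from itertools import product

def count_solutions_trace(coeff, rhs):
    """Same recurrence as countSol, but keep a copy of dp after each variable."""
    dp = [0] * (rhs + 1)
    dp[0] = 1  # one way to reach 0: set every variable to zero
    snapshots = [dp.copy()]
    for c in coeff:
        for j in range(c, rhs + 1):
            # Reach j by adding one more copy of coefficient c to j - c.
            dp[j] += dp[j - c]
        snapshots.append(dp.copy())
    return snapshots

snapshots = count_solutions_trace([3, 2, 1], 6)
for row in snapshots:
    print(row)

# Cross-check for 3x + 2y + z = 6 by trying every candidate triple.
brute = sum(1 for x, y, z in product(range(7), repeat=3)
            if 3 * x + 2 * y + z == 6)
print(brute)
```

The last entry of the final row and the brute-force count agree.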

Find Repeating Sublist Within Large List

I have a large list of sub-lists (approx. 16000) in which I want to find where the repeating pattern starts and ends. I am not 100% sure that there is a repeat, but I have strong reason to believe so because of the diagonals that appear within the sub-list sequence. The structure of a list of sub-lists is preferred, as it is used that way for other things in this script. The data looks like this:
data = ['1100100100000010',
'1001001000000110',
'0010010000001100',
'0100100000011011', etc
I do not have any time constraints, but the fastest method would not be frowned upon. The code should be able to return the starting/ending sequence and its location within the list, to be called upon in the future. If there is an arrangement of the data that would be more useful, I can try to reformat it if necessary. Python is something that I have been learning for the past few months, so I am not quite able to create my own algorithms from scratch just yet. Thank you!

Here's some fairly simple code that scans a string for adjacent repeating subsequences. Set minrun to the length of the smallest subsequences that you want to check. For each match, the code prints the starting index of the first subsequence, the length of the subsequence, and the subsequence itself.
data = [
    '1100100100000010',
    '1001001000000110',
    '0010010000001100',
    '0100100000011011',
]
data = ''.join(data)
minrun = 3
lendata = len(data)
for runlen in range(minrun, lendata // 2):
    i = 0
    while i < lendata - runlen * 2:
        s1 = data[i:i + runlen]
        s2 = data[i + runlen:i + runlen * 2]
        if s1 == s2:
            print(i, runlen, s1)
            i += runlen
        else:
            i += 1
output
1 3 100
4 3 100
8 3 000
15 3 010
18 3 010
23 3 000
32 3 001
38 3 000
47 3 001
53 3 000
17 15 001001000000110
32 15 001001000000110
Note that we get the same sequence of length 3 at index 15 and 18 = 15 + 3 : 010; that indicates that there are 3 adjacent copies of 010. Similarly, there are 3 adjacent copies of the sequence at index 17 of length 15.
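To turn a reported match into a number of adjacent copies, one can count how often the subsequence repeats back to back from its starting index. The helper below is a sketch with a made-up name, count_adjacent_copies, and it assumes runlen is at least 1.

```python
def count_adjacent_copies(data, start, runlen):
    """Count back-to-back copies of data[start:start + runlen]."""
    pattern = data[start:start + runlen]
    if runlen < 1 or len(pattern) < runlen:
        return 0  # no complete pattern at this position
    copies = 1
    pos = start + runlen
    # Keep stepping by runlen while the next chunk equals the pattern.
    while data[pos:pos + runlen] == pattern:
        copies += 1
        pos += runlen
    return copies

data = ''.join([
    '1100100100000010',
    '1001001000000110',
    '0010010000001100',
    '0100100000011011',
])
print(count_adjacent_copies(data, 15, 3))
print(count_adjacent_copies(data, 17, 15))
```

Both calls report 3 copies, matching the reading of the output above.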

In Python Dictionaries, how does ( (j*5)+1 ) % 2**i cycle through all 2**i

I am researching how python implements dictionaries. One of the equations in the python dictionary implementation relates the pseudo random probing for an empty dictionary slot using the equation
j = ((j*5) + 1) % 2**i
which is explained here.
I have read this question, How are Python's Built In Dictionaries Implemented?, and basically understand how dictionaries are implemented.
What I don't understand is why/how the equation:
j = ((j*5) + 1) % 2**i
cycles through all the remainders of 2**i. For instance, if i = 3 for a total starting size of 8. j goes through the cycle:
0 1 6 7 4 5 2 3 0
if the starting size is 16, it would go through the cycle:
0 1 6 15 12 13 2 11 8 9 14 7 4 5 10 3 0
This is very useful for probing all the slots in the dictionary. But why does it work? Why does j = ((j*5)+1) work but not j = ((j*6)+1) or j = ((j*3)+1), both of which get stuck in smaller cycles?
I am hoping to get a more intuitive understanding of this than the equation just works and that's why they used it.

This is the same principle that pseudo-random number generators use, as Jasper hinted at, namely linear congruential generators. A linear congruential generator is a sequence that follows the relationship X_(n+1) = (a * X_n + c) mod m. From the wiki page,
The period of a general LCG is at most m, and for some choices of factor a much less than that. The LCG will have a full period for all seed values if and only if:
m and c are relatively prime.
a - 1 is divisible by all prime factors of m.
a - 1 is divisible by 4 if m is divisible by 4.
It's clear to see that 5 is the smallest a greater than 1 to satisfy these requirements (a = 1 satisfies them trivially, but that is just plain linear probing), namely
2^i and 1 are relatively prime.
4 is divisible by 2.
4 is divisible by 4.
Also interestingly, 5 is not the only number that satisfies these conditions. 9 will also work. Taking m to be 16, using j=(9*j+1)%16 yields
0 1 10 11 4 5 14 15 8 9 2 3 12 13 6 7
The proof for these three conditions can be found in the original Hull-Dobell paper on page 5, along with a bunch of other PRNG-related theorems that also may be of interest.
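The conditions can also be checked empirically by simply running the recurrence. The sketch below is a brute-force check, not how CPython does it, and has_full_period is a made-up helper name; it starts at j = 0 and tests whether all m residues are visited before the sequence returns to 0.

```python
def has_full_period(a, c, m):
    """Brute-force check that j -> (a*j + c) % m visits all m residues."""
    seen = set()
    j = 0
    for _ in range(m):
        seen.add(j)
        j = (a * j + c) % m
    # Full period: m distinct values, and we are back at the start.
    return len(seen) == m and j == 0

print(has_full_period(5, 1, 8))    # True
print(has_full_period(5, 1, 16))   # True
print(has_full_period(9, 1, 16))   # True
print(has_full_period(3, 1, 16))   # False
print(has_full_period(6, 1, 16))   # False

# Multipliers below 16 that give a full period for m = 16 and c = 1.
print([a for a in range(1, 16) if has_full_period(a, 1, 16)])
```

For m = 16 the multipliers that pass are exactly those with a - 1 divisible by 4, as the third condition predicts.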

calculating a frequency band around the mode in pandas/numpy

I have a pandas series of value_counts for a data set. I would like to plot the data with a color band (I'm using bokeh, but calculating the data band is the important part):
I hesitate to use the word standard deviation since all the references I use calculate that based on the mean value, and I specifically want to use the mode as the center.
So, basically, I'm looking for a way in pandas to start at the mode and return a new series of value counts that includes 68.2% of the sum of the value_counts. If I had this series:
val count
1 0
2 0
3 3
4 1
5 2
6 5 <-- mode
7 4
8 3
9 2
10 1
total = sum(count) # example value 21
band1_count = 21 * 0.682 # example value ~ 14.3
This is the order in which they would be added, based on an algorithm that walks the value counts on each side of the mode and includes the higher of the two until the sum of the counts is greater than 14.3.
band1_values = [6, 7, 8, 5, 9]
Here are the steps:
val count step
1 0
2 0
3 3
4 1
5 2 <-- 4) add to list -- eq (9,2), closer to (6,5)
6 5 <-- 1) add to list -- mode
7 4 <-- 2) add to list -- gt (5,2)
8 3 <-- 3) add to list -- gt (5,2)
9 2 <-- 5) add to list -- gt (4,1), stop since sum of counts > 14.3
10 1
Is there a native way to do this calculation in pandas or numpy? If there is a formal name for this study, I would appreciate knowing what it's called.
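For reference, the walk described in the steps above can be sketched in plain Python. This is not a built-in pandas or numpy routine; mode_band is a made-up name, and the rule for a tie in both count and distance (take the left side) is an assumption, since the steps do not cover that case. With a pandas Series s, the inputs could be list(s.index) and list(s.values).

```python
def mode_band(values, counts, fraction=0.682):
    """Walk outwards from the mode until the band holds more than
    `fraction` of the total count. Returns values in the order added."""
    n = len(counts)
    target = sum(counts) * fraction
    mode_pos = max(range(n), key=lambda i: counts[i])  # first mode on ties
    band = [values[mode_pos]]
    total = counts[mode_pos]
    left, right = mode_pos - 1, mode_pos + 1
    while total <= target and (left >= 0 or right < n):
        if right >= n:
            take_left = True
        elif left < 0:
            take_left = False
        elif counts[left] != counts[right]:
            take_left = counts[left] > counts[right]
        else:
            # Equal counts: prefer the side closer to the mode
            # (assumption: left wins if the distances are equal too).
            take_left = (mode_pos - left) <= (right - mode_pos)
        if take_left:
            band.append(values[left])
            total += counts[left]
            left -= 1
        else:
            band.append(values[right])
            total += counts[right]
            right += 1
    return band

values = list(range(1, 11))
counts = [0, 0, 3, 1, 2, 5, 4, 3, 2, 1]
print(mode_band(values, counts))
```

On the example series this reproduces the order given in the question, 6, 7, 8, 5, 9.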

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
