So I have a pretty simple dynamic programming solution to the "longest increasing subsequence" problem (find the longest subsequence of increasing elements in a given sequence; for instance, for [1, 7, 2, 6, 4] it would be [1, 2, 4]). It can also recover the actual subsequence (as opposed to just its length):
sequence = [1, 8, 6, 4, 9, 8, 3, 5, 2, 7, 1, 9, 5, 7]
listofincreasing = [[] for _ in range(len(sequence))]
listofincreasing[0].append(sequence[0])
for right in range(1, len(sequence)):
    for left in range(right):
        if (sequence[left] < sequence[right]) and (len(listofincreasing[right]) < len(listofincreasing[left])):
            listofincreasing[right] = [] + listofincreasing[left]
    listofincreasing[right].append(sequence[right])
print(max(listofincreasing, key=len))
These sorts of brainteasers are pretty manageable for me, but I don't really know the hard theory behind them. My question is this: how would I go about creating a cost function that formally describes "how I am filling the list", so to speak? Could someone show me how to approach these problems using this example? Thanks in advance.
Edit - some people asked for a clarification. In the most succinct way possible, I would need to create a mathematical function in the exact same way as it is created here: https://medium.com/#pp7954296/change-making-problem-dynamic-method-4954a446a511 in the "formula to solve coin change using dynamic method:" section, but not for the change-making problem, but for my solution of the longest increasing subsequence problem.
You are looking for a recursive formulation of the overlapping subproblems in your dynamic programming solution.
Let LONGEST(S,x) be the longest increasing subsequence of the first x elements of the sequence S that ends with S[x]; this is exactly what your listofincreasing[x] holds. The solution to the problem is then the longest of LONGEST(S,x) over all 1 <= x <= |S|. (Defining the subproblem as "longest ending at position x" rather than "longest within the first x elements" is what makes the recursion correct: the longest subsequence of a prefix may not be extendable by S[x] even when a shorter subsequence ending on a smaller element is.)
Recursively (using 1-based indexing):
LONGEST(S,x) = S[1] if x = 1. Otherwise,
LONGEST(S,x) = the longest of:
S[x] alone, or
LONGEST(S,y) + S[x], where 1 <= y < x and LAST_ELEMENT(LONGEST(S,y)) < S[x] (equivalently, S[y] < S[x])
Since LONGEST(S,x) depends only on the values at smaller indices, we can produce the values iteratively in order of increasing x, and that is what your program does.
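To make the recursion concrete, here is a minimal memoized sketch of it (a direct transcription for illustration, not the asker's iterative code; the helper names are made up):

from functools import lru_cache

def lis(S):
    @lru_cache(maxsize=None)
    def longest(x):
        # Longest increasing subsequence of S[1..x] that ends with S[x] (1-based x).
        best = (S[x - 1],)
        for y in range(1, x):
            prev = longest(y)
            if prev[-1] < S[x - 1] and len(prev) + 1 > len(best):
                best = prev + (S[x - 1],)
        return best
    # The overall answer is the longest subsequence ending at any position.
    return list(max((longest(x) for x in range(1, len(S) + 1)), key=len))

print(lis([1, 7, 2, 6, 4]))  # [1, 2, 6] - one longest increasing subsequence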
Related
I have been tasked to write an algorithm for a project. Basically, I scan the data to get the unique items and store their positions in an array, so I end up with multiple arrays of variable length. Now I have to do element-wise operations on ALL of these arrays and their elements. Note that they will always be sorted (if that matters):
a = [0, 7, 13, 18]
b = [1, 2, 8, 10]
c = [0, 3, 5, 6, 7]
The current solution I have is a pretty basic loop solution where I loop through every array and compare its elements with every other array and its elements. It works for a small number of arrays and, as you can imagine, doesn't work well when I have a lot of unique items, each with their own array/list.
def add(a, b):
    result = []
    for i in range(len(a)):
        for j in range(len(b)):
            result.append(a[i] + b[j])
    return result

a = [0, 7, 13, 18]
b = [1, 2, 8, 10]
c = [0, 3, 5, 6, 7]

total_unique_items = [a, b, c]
calc = []
for i in range(len(total_unique_items)):
    for j in range(i + 1, len(total_unique_items)):
        calc.append(add(total_unique_items[i], total_unique_items[j]))
print(calc)
I know there are Pythonic solutions like zip, but my teacher is asking for a generic, language-independent solution here.
I am not really sure how to tackle this problem. One way would be to use a data structure like a tree or a graph and traverse through it. The other way would be to find a way to perform the operation on all the arrays' ith elements in the ith iteration of the loop; this way, my main loop would run for the length of the longest array (see the sketch below). I am just really confused about it and would love to get an idea of the direction I should go from here.
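For illustration, a minimal sketch of that last idea, assuming the operation only needs to see every array's ith element together (the required operation itself isn't fully specified in the question):

arrays = [[0, 7, 13, 18], [1, 2, 8, 10], [0, 3, 5, 6, 7]]

longest = max(len(arr) for arr in arrays)
for i in range(longest):
    # Collect the ith element of every array that is long enough.
    ith_elements = [arr[i] for arr in arrays if i < len(arr)]
    print(i, ith_elements)  # apply the element-wise operation here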
I have 6 test questions that I want to randomize, together with their correct answers. Questions #1 and #2, #3 and #4, and #5 and #6 are of the same type. In order not to make the test too easy, I don't want to show #1 and #2 in a row (nor #3 and #4, or #5 and #6, for that matter).
For this purpose, I think I should shuffle the list [1, 2, 3, 4, 5, 6] with this constraint: 1 and 2, 3 and 4, 5 and 6 are not adjacent. For example, [1, 2, 4, 6, 3, 5] is not acceptable because 1 and 2 are next to each other. Then, I want to apply the new order to both the question list and the answer list.
As someone new to programming, I only know how to shuffle a list without constraint, like so:
import random

question = [1, 3, 5, 2, 4, 6]
answer = ['G', 'Y', 'G', 'R', 'Y', 'R']
order = list(zip(question, answer))
random.shuffle(order)
question, answer = zip(*order)
Any help would be appreciated!
Here's a "brute force" approach. It just shuffles the list repeatedly until it finds a valid ordering:
import random

def is_valid(sequence):
    similar_pairs = [(1, 2), (3, 4), (5, 6)]
    return all(
        abs(sequence.index(a) - sequence.index(b)) != 1
        for a, b in similar_pairs
    )

sequence = list(range(1, 7))
while not is_valid(sequence):
    random.shuffle(sequence)
print(sequence)
# One output: [6, 2, 4, 5, 3, 1]
For inputs this small, this is fine. (Computers are fast.) For longer inputs, you'd want to think about doing something more efficient, but it sounds like you're after a simple practical approach, not a theoretically optimal one.
I see two simple ways:
Shuffle the list and accept the shuffle if it satisfies the constraints, else repeat.
Iteratively sample numbers and use the constraints to limit the possible numbers. For example, if you first draw 1 then the second draw can be 3..6. This could also result in a solution that is infeasible so you'll have to account for that.
Draw a graph with your list elements as vertices. If elements u and v can be adjacent in the output list, draw an edge (u,v) between them, otherwise do not.
Now you have to find a Hamiltonian path in this graph. This problem is intractable in general (NP-complete), but if the graph is almost complete (there are few constraints, i.e. few missing edges), it can be solved effectively by DFS with backtracking.
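A minimal sketch of that DFS, assuming the (1,2)/(3,4)/(5,6) constraints from the question:

import random

forbidden = [{1, 2}, {3, 4}, {5, 6}]

def can_be_adjacent(u, v):
    # An edge exists unless the pair is one of the constrained pairs.
    return {u, v} not in forbidden

def hamiltonian_path(path, remaining):
    # Extend the path one vertex at a time; backtrack on dead ends.
    if not remaining:
        return path
    random.shuffle(remaining)  # randomize so repeated runs give different orders
    for v in remaining:
        if not path or can_be_adjacent(path[-1], v):
            rest = [w for w in remaining if w != v]
            result = hamiltonian_path(path + [v], rest)
            if result is not None:
                return result
    return None

print(hamiltonian_path([], [1, 2, 3, 4, 5, 6]))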
For a small input set like the one in your example, it could be easier to just generate all permutations and then filter out those that violate one of the constraints, as sketched below.
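A sketch of that filtering approach, again assuming the pairs from the question (for 6 elements there are only 720 permutations, so this is cheap):

import random
from itertools import permutations

forbidden = [{1, 2}, {3, 4}, {5, 6}]
valid = [p for p in permutations(range(1, 7))
         if not any({p[i], p[i + 1]} in forbidden for i in range(5))]
print(random.choice(valid))  # a uniformly random acceptable ordering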
You can try this; it should work fine for small lists. As you can see below, I used a list of Python sets for the constraints. The code builds the permutation you require element by element.
Building element by element can lead to an invalid permutation if, at some point, all the remaining elements in the list are ruled out by the constraint.
Example: if the code builds 4, 1, 3, 2, 6, it is forced to try 5 as the last element, but that is invalid, so the function tries to make another permutation.
It is better (in terms of performance) than the brute-force approach of generating a random shuffle and checking if it's valid (the answer given by smarx).
Note: the function would result in an infinite loop if no permutation satisfying the constraints is possible.
import random

def shuffler(dataList, constraints):
    my_data_list = list(dataList)
    shuffledList = [random.choice(dataList)]
    my_data_list.remove(shuffledList[0])
    for i in range(1, len(dataList)):
        prev_ele = shuffledList[i - 1]
        # Union of all constraint sets containing the previous element.
        prev_constraint = set()
        for sublist in constraints:
            if prev_ele in sublist:
                prev_constraint = set.union(prev_constraint, sublist)
        choices = [choice for choice in my_data_list if choice not in prev_constraint]
        if len(choices) == 0:
            print('Trying once more...')
            return shuffler(dataList, constraints)
        curr_ele = random.choice(choices)
        my_data_list.remove(curr_ele)
        shuffledList.append(curr_ele)
    return shuffledList

if __name__ == '__main__':
    dataList = [1, 2, 3, 4, 5, 6]
    constraints = [{1, 2}, {3, 4}, {5, 6}]
    print(shuffler(dataList, constraints))
You could try something like:
shuffle the list
while (list is not good)
    find first invalid question
    swap first invalid question with a different random question
endwhile
I haven't done any timings, but it might run faster than reshuffling the whole list: it partly preserves the valid portion before the first invalid question, so it should reach a good ordering faster.
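A small Python sketch of that repair loop, assuming the same pairs as in the question:

import random

forbidden = [{1, 2}, {3, 4}, {5, 6}]

def first_invalid(seq):
    # Index of the first question that clashes with its predecessor, or None.
    for i in range(1, len(seq)):
        if {seq[i - 1], seq[i]} in forbidden:
            return i
    return None

seq = [1, 2, 3, 4, 5, 6]
random.shuffle(seq)
i = first_invalid(seq)
while i is not None:
    # Swap the offender with a different random position and re-check.
    j = random.choice([x for x in range(len(seq)) if x != i])
    seq[i], seq[j] = seq[j], seq[i]
    i = first_invalid(seq)
print(seq)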
I'm looking at getting values from a list with a non-constant increment.
l = [0,1,2,3,4,5,6,7]
and I want something like:
[0,4,6,7]
At the moment I am using l[0::2], but I would like the sampling to be sparse at the beginning and become denser towards the end of the list.
The reason I want this is that the list represents the points along a line from the center of a circle to a point on its circumference. At the moment I iterate every 10 points along the line and draw a circle with a small radius on each. As a result, my circles close to the center tend to overlap, and I have gaps as I get close to the circle's edge. I hope this provides a bit of context.
Thank you !
This can be more complicated than it sounds... You need a list of indices starting at zero and ending at the final element position in your list, presumably with no duplication (i.e. you don't want to get the same point twice). A generic way to do this is to decide how many points you want first, and then use a generator (scaled_series) that produces the required number of indices based on a function. We need a second generator (unique_ints) to ensure we get integer indices with no duplication.
def scaled_series(length, end, func):
    """Generate a scaled series based on y = func(i), for an increasing
    function func, starting at 0, of the specified length, and ending at end.
    """
    scale = float(end) / (func(float(length)) - func(1.0))
    intercept = -scale * func(1.0)
    print('scale', scale, 'intercept', intercept)
    for i in range(1, length + 1):
        yield scale * func(float(i)) + intercept

def unique_ints(iterable):
    last_n = None
    for n in iterable:
        if last_n is None or round(n) != round(last_n):
            yield int(round(n))
        last_n = n

L = [0, 1, 2, 3, 4, 5, 6, 7]
print([L[i] for i in unique_ints(scaled_series(4, 7, lambda x: 1 - 1 / (2 * x)))])
In this case, the function is 1 - 1/(2x), which gives a series close to the one you want (with these exact parameters the second index rounds to 5 rather than 4, printing [0, 5, 6, 7]). You can play with the length (4) and the function to get the kind of spacing between the circles you are looking for.
I am not sure what exact algorithm you want to use, but if it is non-constant, as your example appears to be, then you should consider creating a generator function to yield values:
https://wiki.python.org/moin/Generators
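For instance, a minimal generator sketch (parameters chosen so it happens to reproduce [0, 4, 6, 7] for this list):

def decreasing_steps(start, end, first_step):
    # Yield indices whose step shrinks toward the end of the range.
    i, step = start, first_step
    while i <= end:
        yield i
        i += step
        step = max(1, step // 2)  # halve the step each time, never below 1

l = [0, 1, 2, 3, 4, 5, 6, 7]
print([l[i] for i in decreasing_steps(0, 7, 4)])  # [0, 4, 6, 7]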
Depending on what your desire here is, you may want to consider a built in interpolator like scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html#scipy.interpolate.interp1d
Basically, given your question, you can't do it with the basic slice operator. Without more information this is the best answer I can give you :-)
Use the slice function to create a range of indices. You can then extend your sliced list with other slices.
k = [0, 1, 2, 3, 4, 5, 6, 7]
r = slice(0, len(k) // 2, 4)
t = slice(r.stop, None, 1)
j = k[r]
j.extend(k[t])
print(j)  # outputs: [0, 4, 5, 6, 7]
What I would do is just use a list comprehension to retrieve the values; it is not possible to do it just by indexing. This is what I came up with:
l = [0, 1, 2, 3, 4, 5, 6, 7]
m = [l[0]] + [l[1+sum(range(3, s-1, -1))] for s in [x for x in range(3, 0, -1)]]
and here is a breakdown of the code into loops:
# Start the list with the first value of l (the loop does not include it)
m = [l[0]]

# Descend from 3 to 1 ([3, 2, 1])
for s in range(3, 0, -1):
    # append 1 + sum of [3], [3, 2] and [3, 2, 1]
    m.append(l[1 + sum(range(3, s - 1, -1))])
Both will give you the same answer:
>>> m
[0, 4, 6, 7]
I also made a graphic that I hope will help you understand the process.
While this question is formulated using the Python programming language, I believe it is more of a programming logic problem.
I have a list of all possible combinations, i.e.: n choose k
I can prepare such a list using
import itertools
bits_list = list(itertools.combinations(range(n), k))
If n is 100 and k is 5, then the length of bits_list will be 75,287,520.
Now, I want to prune this list so that certain numbers appear in groups, or not at all. Let's use the following sets as an example:
Set 1: [0, 1, 2]
Set 2: [57, 58]
Set 3: [10, 15, 20, 25]
Set 4: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Here, each set needs to appear in a member of bits_list in its entirety, or not at all.
So far, I have only been able to think of a brute-force if-else method of solving this problem, but the number of if-else conditions will be very large this way.
Here's what I have:
bits_list = [x for x in list(itertools.combinations(range(n), k))
             if all(y in x for y in [0, 1, 2]) or
                all(y not in x for y in [0, 1, 2])]
Now, this only covers Set 1. I would like to do this for many sets. If the length of a set is greater than the value of k, we can ignore the set (for example, k = 5 and Set 4).
Note that the ultimate aim is to have k iterate over a range, say [5:25], and work on the appended list. The size of the list grows exponentially here and, computationally speaking, becomes very expensive!
With k as 10, the Python interpreter interrupts the process before completion on any average laptop with 16 GB of RAM. I need to find a solution that fits in the memory of a relatively modern server (not a cluster or a server farm).
Any help is greatly appreciated!
P.S.: Intuitively, think of this problem as generating all the possible cases for people boarding a public bus or train system. Usually, you board an entire group or you don't board anyone.
UPDATE:
For the given sets above, if k = 5, then a valid member of bits_list would be [0, 1, 2, 57, 58], i.e. a combination of Set 1 and Set 2. If k = 10, then we could have built Set 1 + Set 2 + Set 3 + one element that is in no set as a possible member. @DonkeyKong's solution made me realize I haven't mentioned this explicitly in my question.
I have a lot of sets; I intend to use enough sets to prune the full list of combinations such that the bits_list eventually fits into memory.
@9000's suggestion is perfectly valid here: during each iteration, I can save the combinations as actual bits.
This still gets crushed by a memory error (which I don't see how you're getting away from if you insist on a list) at a certain point (around n=90, k=5), but it is much faster than your current implementation. For n=80 and k=5, my rudimentary benchmarking had my solution at 2.6 seconds and yours at around 52 seconds.
The idea is to construct the disjoint and subset parts of your filter separately. The disjoint part is trivial, and the subset part is calculated by taking the itertools.product of all disjoint combinations of length k - set_len and the individual elements of your set.
from itertools import combinations, product, chain

n = 80
k = 5

set1 = {0, 1, 2}
nots = set(range(n)) - set1

disj_part = list(combinations(nots, k))
subs_part = [tuple(chain(x, els)) for x, *els in
             product(combinations(nots, k - len(set1)), *([e] for e in set1))]

full_l = disj_part + subs_part
If you actually represented your bits as bits, that is, 0/1 values in a binary representation of an integer n bits long with exactly k bits set, the amount of RAM you'd need to store the data would be drastically smaller.
Also, you'd be able to use bit operations to check whether all bits in a mask are set (value & mask == mask) or all unset (value & mask == 0).
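A minimal sketch of that check, with illustrative values:

set1 = {0, 1, 2}
mask = sum(1 << b for b in set1)  # bits 0, 1 and 2 set

def keep(value):
    # Keep a combination (encoded as an int) only if the set's bits
    # are all present or all absent.
    return value & mask == mask or value & mask == 0

combo = (1 << 0) | (1 << 1) | (1 << 2) | (1 << 57) | (1 << 58)
print(keep(combo))  # True: it contains all of {0, 1, 2}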
The brute-force approach will probably take less time than you'd spend thinking about a cleverer algorithm, so it's totally OK for a one-off filtering.
If you must execute this often and quickly, and your n is in the small hundreds or less, I'd rather use Cython to express the brute-force algorithm efficiently than look for algorithmic improvements. Modern CPUs can efficiently operate on 64-bit numbers; you won't benefit much from not comparing a part of the number.
OTOH if your n is really large, and the number of sets to compare to is also large, you could partition your bits for efficient comparison.
Let's suppose you can efficiently compare a chunk of 64 bits, and your bit lists contain e.g. 100 chunks each. Then you can do the same thing you'd do with strings: compare chunk by chunk, and if one of the chunks fails to match, do not compare the rest, as in the sketch below.
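A sketch of that early-exit chunk comparison (all() short-circuits on the first failing chunk):

def contains_mask(value_chunks, mask_chunks):
    # True if every bit set in the mask is also set in the value,
    # where both are split into equal-length lists of 64-bit chunks.
    return all(v & m == m for v, m in zip(value_chunks, mask_chunks))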
A faster implementation would be to replace the if and all() statements in:
bits_list = [x for x in list(itertools.combinations(range(n), k))
             if all(y in x for y in [0, 1, 2]) or
                all(y not in x for y in [0, 1, 2])]
with Python's set methods isdisjoint() and issuperset() (issuperset, since the all(y in x ...) test asks whether the combination contains the whole set):
import itertools

# n and k as defined in the question

bits_generator = (set(x) for x in itertools.combinations(range(n), k))
first_set = set([0, 1, 2])
filter_bits = (x for x in bits_generator
               if x.issuperset(first_set) or
                  x.isdisjoint(first_set))
answer_for_first_set = list(filter_bits)
I can keep going using generators, and with generators you won't run out of memory, but you will be waiting and hastening the heat death of the universe. Not because of Python's runtime or other implementation details, but because some problems are just not feasible even in computer time if you pick large n and k values.
Based on the ideas from @Mitch's answer, I created a solution with slightly different thinking than originally presented in the question. Instead of creating the list (bits_list) of all combinations and then pruning the combinations that do not match the sets listed, I built bits_list from the sets.
import itertools
all_sets = [[0, 1, 2], [3, 4, 5], [6, 7], [8], [9, 19, 29], [10, 20, 30],
            [11, 21, 31], [12, 22, 32], ... [57, 58], ... [95], [96], [97]]

bits_list = [list(itertools.chain.from_iterable(x)) for y in [1, 2, 3, 4, 5]
             for x in itertools.combinations(all_sets, y)]
Here, instead of finding n choose k, looping over all k, and finding combinations that match the sets, I started from the sets, and even included the individual members as sets by themselves, thereby removing the need for the two components - the disjoint and subset parts - discussed in @Mitch's answer.
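A tiny concrete version of the same construction (hypothetical small sets, groups of up to 2):

import itertools

all_sets = [[0, 1, 2], [3, 4], [5]]
bits_list = [list(itertools.chain.from_iterable(x))
             for y in [1, 2]
             for x in itertools.combinations(all_sets, y)]
print(bits_list)
# [[0, 1, 2], [3, 4], [5], [0, 1, 2, 3, 4], [0, 1, 2, 5], [3, 4, 5]]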
In Python you can get the intersection of two sets by doing:
>>> s1 = {1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> s2 = {0, 3, 5, 6, 10}
>>> s1 & s2
{3, 5, 6}
>>> s1.intersection(s2)
{3, 5, 6}
Does anybody know the complexity of this intersection (&) algorithm?
EDIT: In addition, does anyone know what data structure is behind a Python set?
The data structure behind the set is a hash table where the typical performance is an amortized O(1) lookup and insertion.
The intersection algorithm loops exactly min(len(s1), len(s2)) times. It performs one lookup per loop, and if there is a match, performs an insertion. In pure Python, it looks like this:
def intersection(self, other):
    if len(self) <= len(other):
        little, big = self, other
    else:
        little, big = other, self
    result = set()
    for elem in little:
        if elem in big:
            result.add(elem)
    return result
The answer appears to be a search engine query away. You can also use this direct link to the Time Complexity page at python.org: https://wiki.python.org/moin/TimeComplexity. Quick summary:
Average: O(min(len(s), len(t)))
Worst case: O(len(s) * len(t))
EDIT: As Raymond points out below, the "worst case" scenario isn't likely to occur. I included it originally to be thorough, and I'm leaving it to provide context for the discussion below, but I think Raymond's right.
Set intersection of two sets of sizes m, n can be achieved in O(max{m,n} * log(min{m,n})) in the following way:
Assume m << n.
1. Represent the two sets as lists/arrays (something sortable).
2. Sort the **smaller** list/array (cost: m*log m).
3. Until all elements in the bigger list have been checked:
   3.1 Sort the next **m** items of the bigger list (cost: m*log m).
   3.2 With a single pass, compare the smaller list and the m items you just sorted, and take the ones that appear in both (cost: m).
4. Return the new set.
The loop in step 3 runs for n/m iterations, and each iteration takes O(m*log m), so the total time complexity is O(n*log m) for m << n.
I think that is the best bound achievable here.
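A rough Python sketch of that procedure, under the stated m << n assumption (illustrative only):

def chunked_intersection(small, big):
    if not small:
        return set()
    small_sorted = sorted(small)              # cost: O(m log m)
    m = len(small_sorted)
    result = set()
    for start in range(0, len(big), m):       # about n/m iterations
        chunk = sorted(big[start:start + m])  # cost: O(m log m) each
        # Single merge-style pass over the two sorted lists: O(m).
        i = j = 0
        while i < len(small_sorted) and j < len(chunk):
            if small_sorted[i] == chunk[j]:
                result.add(small_sorted[i])
                i += 1
                j += 1
            elif small_sorted[i] < chunk[j]:
                i += 1
            else:
                j += 1
    return result

print(chunked_intersection([3, 5, 6], [1, 2, 3, 4, 5, 6, 7, 8, 9]))  # {3, 5, 6}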