Identify overlap sequence - python

I am building a program in which I want to identify the overlapping sequence of two strings.
sequ_1 = 'blablablaaaabla'
seque_2 = 'aaablaccbla'
The expected output would be: 'aaabla'
I wrote this function to tell me whether sequ_2 occurs within sequ_1:
def overlap(sequ_1, sequ_2):
    count_overlapp = 0
    count_nooverlapp = 0
    # slide a window of len(sequ_2) characters along sequ_1
    for i in range(len(sequ_1) - len(sequ_2) + 1):
        if sequ_1[i:i + len(sequ_2)] == sequ_2:
            count_overlapp = count_overlapp + 1
        else:
            count_nooverlapp = count_nooverlapp + 1
    return print(f'The number of overlapped sequence:{count_overlapp}\nThe number of not overlapped sequence:{count_nooverlapp}')
But it only works if sequ_2 lies entirely within sequ_1; it fails if sequ_2 only partially overlaps sequ_1.
Ideally I would also like to identify the overlapping sequence itself.
Thank you very much in advance.

Maybe this solution is what you need:
sequ_1 = 'blablablaaaabla'
seque_2 = 'aaablaccbla'
overlap = ""
for i in range(len(seque_2)):
    for j in range(i + 1, len(seque_2)):
        tmp = seque_2[i:j+1]
        if len(tmp) > len(overlap) and tmp in sequ_1:
            overlap = tmp
print(overlap)
Output: aaabla
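Two other ways to read "overlap" can be sketched with the standard library. This is only an illustration, and the function names longest_common_block and suffix_prefix_overlap are made up for it. The first asks difflib for the longest block the two strings share anywhere. The second assumes the overlap must be a suffix of the first string that is also a prefix of the second.

```python
from difflib import SequenceMatcher

def longest_common_block(a, b):
    # autojunk=False switches off difflib's heuristic that ignores
    # very frequent characters in long inputs
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]

def suffix_prefix_overlap(a, b):
    # try the longest candidate first, then shrink
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return b[:size]
    return ''

sequ_1 = 'blablablaaaabla'
seque_2 = 'aaablaccbla'
print(longest_common_block(sequ_1, seque_2))   # aaabla
print(suffix_prefix_overlap(sequ_1, seque_2))  # aaabla
```

For the strings in the question both readings give the same result. They differ when the shared block sits in the middle of both strings.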

Related

List Comprehensions method to generate a sequence of random and unique numbers

I am writing a programme to generate 6 numbers (lotto style). I then want to generate a second set of numbers, compare the two, and see how long it takes (in terms of counts) before the two sets of numbers match.
This is my code :
import random
range_of_numbers = [i for i in range(1,60)]
def draw_a_ticket():
    total_numbers = range_of_numbers = [i for i in range(1,60)]
    draw = []
    i = 0
    while i < 6:
        num = random.choice(total_numbers)
        total_numbers.remove(num)
        draw.append(num)
        i += 1
    return draw
draw = draw_a_ticket()
draw1 = draw_a_ticket()
counter = 0
while draw[0:2] != draw1[0:2]: # I am using [0:2] to reduce the complexity/find match sooner
    counter += 1
    draw = draw1
    draw1 = draw_a_ticket()
print(f"{counter} : Draw:{draw} - Draw1:{draw1}")
The code above works fine, but I am trying to be more pythonic and use list comprehensions to generate the number sets.
I've tried the following, but I get an invalid syntax error:
draw = [i = set(random.randint(1,60)) in range(1,7)]
print(draw)
The key features I am trying to achieve in a list comprehension are to:
generate 6 unique random integers between 1 and 59
store these in a list.
Thanks for any help.
To generate 6 unique random integers between 1 and 59 and store them in a list, you can use random.sample(). From the documentation:
Return a k length list of unique elements chosen from the population
sequence or set. Used for random sampling without replacement.
Try this (note that range(1,60) covers 1 to 59, because the upper bound is excluded):
draw=random.sample(range(1,60),6)
The whole program can then be written like this:
import random
def draw_a_ticket():
    return random.sample(range(1,60),6)
draw = draw_a_ticket()
draw1 = draw_a_ticket()
counter = 0
while draw[0:2] != draw1[0:2]: # I am using [0:2] to reduce the complexity/find match sooner
    counter += 1
    draw = draw1
    draw1 = draw_a_ticket()
print(f"{counter} : Draw:{draw} - Draw1:{draw1}")
If you want each draw to be selected only once, you can append every generated draw to a list of selected draws and redraw on a repeat,
like this:
import random
selected_draw=[]
def draw_a_ticket():
    draw=random.sample(range(1,60),6)
    if draw in selected_draw:
        return draw_a_ticket()  # already seen: draw again and return that result
    selected_draw.append(draw)
    return draw
draw = draw_a_ticket()
draw1 = draw_a_ticket()
counter = 0
while draw[0:2] != draw1[0:2]: # I am using [0:2] to reduce the complexity/find match sooner
    counter += 1
    draw = draw1
    draw1 = draw_a_ticket()
print(f"{counter} : Draw:{draw} - Draw1:{draw1}")
Your second approach is close, except that you are trying to do an assignment inside a list comprehension and to convert an int returned by random.randint into a set. Your draw_ticket function could look like this:
def draw_ticket():
    numbers = list(range(1, 60))  # no need for a list comprehension to get a list from a
                                  # range - just convert it directly
    # pop() removes the chosen number, so it cannot be drawn twice
    draw = [numbers.pop(numbers.index(random.choice(numbers))) for n in range(6)]
    return draw
The code above is not easy to understand (like many list comprehensions), so here is an easier-to-follow version:
def draw_ticket():
    numbers = list(range(1, 60))
    draw = []
    for n in range(6): # iterate 6 times
        index = numbers.index(random.choice(numbers))
        number = numbers.pop(index)
        draw.append(number)
    return draw
So that's what the above list comprehension does.
Thanks to everyone who responded, I really appreciated the suggestions.
I've pieced all the answers together and come up with a version of the programme that doesn't use the function but seems to work OK:
import random
draw = list(set([(random.randint(1,60)) for i in range(1,7)]))
draw1 = list(set([(random.randint(1,60)) for i in range(1,7)]))
counter = 0
while draw[0:2] != draw1[0:2]: # using [0:2] reduces the complexity
    counter += 1
    draw = draw1
    draw1 = list(set([(random.randint(1,60)) for i in range(1,7)]))
print(f"{counter} : Draw:{draw} - Draw1:{draw1}")
Thanks again.
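One caveat on the set-based version: random.randint(1,60) can return 60, and set() silently drops duplicates, so a draw can end up with fewer than six numbers. A minimal sketch that keeps the same loop but builds each ticket with random.sample is shown below. The helper names draw_ticket and draws_until_match and the prefix and seed parameters are made up for this illustration.

```python
import random

def draw_ticket(rng):
    # sample() picks without replacement, so the six numbers are always distinct
    return rng.sample(range(1, 60), 6)

def draws_until_match(prefix=2, seed=None):
    # prefix: how many leading numbers must agree (the question uses 2)
    rng = random.Random(seed)
    draw, draw1 = draw_ticket(rng), draw_ticket(rng)
    counter = 0
    while draw[:prefix] != draw1[:prefix]:
        counter += 1
        draw, draw1 = draw1, draw_ticket(rng)
    return counter, draw, draw1

counter, draw, draw1 = draws_until_match(prefix=1, seed=42)
print(f"{counter} : Draw:{draw} - Draw1:{draw1}")
```

Passing a seed makes a run repeatable, which helps when comparing counts between experiments.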

Find thresholds for binary classification

I have an array of ordered numeric values and a corresponding array of classes, in the form of yes/no. I need to find thresholds that meet this criterion, quoting the paper I'm studying:
"T is a threshold if it falls between two consecutive examples that do not belong to the same class. In the special case when a group of two or more examples have the same value but belong to more than one class, then the cut points on either side of the examples are also thresholds. The examples with identical values cannot be separated."
If I understood correctly, if I have:
vals = [10,12, 22, 28, 28, 40, 41]
classes = ['y','y','n','y','n','y','n']
the thresholds must be: [17,25,34,40.5]
This is the code I wrote:
thresholds = []
for i in range(len(vals)-1):
    if vals[i] != vals[i+1]:
        if classes[i] != classes[i+1]:
            thresholds.append((vals[i] + vals[i+1]) / 2)
    else:
        j = i
        while vals[i] == vals[i+1]:
            i = i+1
        if j != 0:
            thresholds.append((vals[j] + vals[j-1]) / 2)
        thresholds.append((vals[i] + vals[i+1]) / 2)
But 1) I really don't like it and would like it to be more compact, and 2) although it works for the example above, it does not always give the right result. For example, if I have
vals = [2,2,5,5,7,11,18]
out = ['y','y','y','y','n','n','n']
I'd like the only threshold to be [6], but this code also produces 3.5
How can I make this prettier and more generic?
Update:
This is the new code for now (which can be refactored further and I will post that in my next edit). I have tested it on a good number of test cases. Below is just the code, you can find the detailed code with comments and unit tests in these links:
Detailed Code With Comments.
Detailed Tests for this code.
Plain code:
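from itertools import groupby

# Marker for a group of equal values that spans more than one class.
# Its definition is not shown in the original; 'mixed' is a placeholder.
case2_label = 'mixed'

def compress_group(vls):
    # vls is a (value, group_iterator) pair produced by groupby
    val0, lab0 = next(vls[1])
    if all(lab == lab0 for val, lab in vls[1]):
        return val0, lab0
    return val0, case2_label

# The enclosing function's name is not shown in the original;
# find_thresholds is a placeholder.
def find_thresholds(values, labels):
    vl_combined = [(v, l) for v, l in zip(values, labels)]
    vl_groups = groupby(vl_combined, lambda vc: vc[0])
    vl_groups = map(lambda vl_group: compress_group(vl_group), vl_groups)
    thresholds = []
    prev_value, prev_label = next(vl_groups)
    for curr_value, curr_label in vl_groups:
        if prev_label == case2_label or curr_label != prev_label:
            threshold = (prev_value + curr_value) / 2
            thresholds.append(threshold)
        prev_value, prev_label = curr_value, curr_label
    return thresholds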
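For checking against the two examples in the question, here is a compact, self-contained variant of the same grouping idea. It is a sketch, not the code from the links above: the name thresholds_from_sorted and the MIXED marker are made up, and it assumes the values are already sorted so that equal values sit next to each other.

```python
from itertools import groupby

MIXED = object()  # hypothetical marker: equal values with more than one class

def thresholds_from_sorted(values, labels):
    # Collapse each run of equal values into (value, label or MIXED).
    groups = []
    for value, run in groupby(zip(values, labels), key=lambda pair: pair[0]):
        seen = {label for _, label in run}
        groups.append((value, seen.pop() if len(seen) == 1 else MIXED))
    # A cut point lies between neighbouring groups whose classes differ,
    # or on either side of a mixed group.
    thresholds = []
    for (prev_value, prev_label), (curr_value, curr_label) in zip(groups, groups[1:]):
        if prev_label is MIXED or curr_label is MIXED or prev_label != curr_label:
            thresholds.append((prev_value + curr_value) / 2)
    return thresholds

print(thresholds_from_sorted([10, 12, 22, 28, 28, 40, 41],
                             ['y', 'y', 'n', 'y', 'n', 'y', 'n']))
# [17.0, 25.0, 34.0, 40.5]
print(thresholds_from_sorted([2, 2, 5, 5, 7, 11, 18],
                             ['y', 'y', 'y', 'y', 'n', 'n', 'n']))
# [6.0]
```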

Minimum number of steps to reach a given number

I need to calculate the minimum number of steps needed to reach a value x from a value n by adding or subtracting values from a list l.
For example: n = 100, x = 45
List l: 50, 6, 1
The best way to do this is:
100-50-6+1 = 45
I want a programme to work this out for any values of x and n, given a list l.
I am really struggling to outline how I would write this.
I am confused about how to overcome the following issues:
How do I tell the programme whether to attempt an addition or a subtraction, and how many times? For example, I might need to subtract, then add, then subtract again to reach a solution.
How do I include enough for/while loops to ensure I can provide a solution for all possible input values?
Has anyone come across a problem like this before, and do you have any ideas how I could outline the code for such a solution? (I am using Python, if that helps point me towards particular functions that could assist me.)
Thanks
This is my attempt so far but I am stuck
inputA = ""
while inputA == "":
    inputA = input("""Please enter two numbers, separated by a comma.
The first value should indicate the number of jugs:
The second value should indicate the volume to be measured
""")
itemList = list(inputA.split(","))
valueToMeasure = int(itemList[1])
inputB = ""
while inputB == "":
    inputB = input("Please enter the volumes for the {} jug(s) listed: ".format((itemList[0])))
    if len(inputB.split(",")) != int(itemList[0]):
        inputB = ""
TargetVolume = itemList[1]
jugSizes = inputB.split(",")
print("Calculating: smallest number of steps to get", TargetVolume, "ml using jugs of sizes:", jugSizes)
# the sizes are still strings here, so sort them by numeric value, largest first
jugSizes.sort(key=int, reverse=True)
largestJug = int(jugSizes[0])
ratioTable = {}
for item in jugSizes:
    firstVal = int(jugSizes[0])
    itemV = int(item)
    valueToAssign = firstVal/itemV
    ratioTable[int(item)] = int(valueToAssign)
taskPossible = True
if valueToMeasure > largestJug:
    print ("Impossible task")
    taskPossible = False
newList = jugSizes
if taskPossible == True:
    for item in jugSizes:
        # compare as numbers, not as strings
        if int(item) < valueToMeasure: break
        newList = newList[1:]
    newDict = {}
    for itemA in ratioTable:
        if int(itemA) < int(item):
            newDict[itemA]= ratioTable[itemA]
    print ("Do work with these numbers:", newDict)
This is how I would approach the problem if I understand correctly.
X = 45
largest_jug = measured = 100
jug_sizes = [50, 6, 1]
steps = []
jug_to_use = 0
while measured != X:
    if jug_to_use < len(jug_sizes) - 1: # we have smaller jugs in reserve
        error_with_large_jug = min([abs(measured - jug_sizes[jug_to_use] - X), abs(measured + jug_sizes[jug_to_use] - X)])
        error_with_small_jug = min([abs(measured - jug_sizes[jug_to_use + 1] - X), abs(measured + jug_sizes[jug_to_use + 1] - X)])
        if error_with_small_jug < error_with_large_jug:
            jug_to_use += 1
    if measured > X:
        measured -= jug_sizes[jug_to_use]
        steps.append(('-', jug_sizes[jug_to_use]))
    else:
        measured += jug_sizes[jug_to_use]
        steps.append(('+', jug_sizes[jug_to_use]))
print(steps)
Yielding
[('-', 50), ('-', 6), ('+', 1)]
It basically starts by using the largest jug, until it's in range of the next size and so on. We can test it with randomly sized jugs of [30, 7, 1] and see it again results in an accurate answer of [('-', 30), ('-', 30), ('+', 7), ('-', 1), ('-', 1)].
Important notes:
jug_sizes should be ordered largest to smallest
This solution assumes that X can be reached with the numbers provided in jug_sizes (otherwise it will loop forever)
This doesn't take into account that a jug size can make the target unreachable (e.g. [50, 12, 5], where the 12 size should be skipped, otherwise the solution is unreachable)
This assumes every jug should be used (related to the point above)
I'm sure you could figure out solutions for all these problems based on your specific circumstances though
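If a guaranteed minimum is needed, one option is a breadth-first search over the reachable values instead of the greedy loop above. The sketch below is an illustration only: the name min_steps is made up, and it assumes integer values. It keeps the search inside a window around n and x, relying on the fact that the steps of any solution can be reordered so the running value never strays more than the largest step beyond either end.

```python
from collections import deque

def min_steps(n, x, steps):
    """Return a shortest list of signed steps turning n into x, or None."""
    lo = min(n, x) - max(steps)
    hi = max(n, x) + max(steps)
    parent = {n: None}  # value -> (previous value, signed step taken)
    queue = deque([n])
    while queue:
        value = queue.popleft()
        if value == x:
            # walk back to the start to recover the steps
            path = []
            while parent[value] is not None:
                value, step = parent[value]
                path.append(step)
            return path[::-1]
        for s in steps:
            for signed in (-s, s):
                nxt = value + signed
                if lo <= nxt <= hi and nxt not in parent:
                    parent[nxt] = (value, signed)
                    queue.append(nxt)
    return None  # x cannot be reached with these step sizes

print(min_steps(100, 45, [50, 6, 1]))
print(min_steps(100, 45, [50, 12]))  # None: 55 is odd, both steps are even
```

Because breadth-first search explores all one-step results before any two-step result, the first time it reaches x it has used the fewest possible steps. It also terminates with None when the target is unreachable, rather than looping forever.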

How to reduce a collection of ranges to a minimal set of ranges [duplicate]

I'm trying to remove overlapping values from a collection of ranges.
The ranges are represented by a string like this:
499-505 100-115 80-119 113-140 500-550
I want the above to be reduced to two ranges: 80-140 499-550. That covers all the values without overlap.
Currently I have the following code.
cr = "100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250".split(" ")
ar = []
br = []
for i in cr:
    (left,right) = i.split("-")
    ar.append(left)
    br.append(right)
inc = 0
for f in br:
    i = int(f)
    vac = []
    jnc = 0
    for g in ar:
        j = int(g)
        if(i >= j):
            vac.append(j)
            del br[jnc]
        jnc += jnc
    print(vac)
    inc += inc
I split the array by - and store the range limits in ar and br. I iterate over these limits pairwise and if the i is at least as great as the j, I want to delete the element. But the program doesn't work. I expect it to produce this result: 80-125 500-550 200-250 180-185
For a quick and short solution,
from operator import itemgetter
from itertools import groupby
cr = "499-505 100-115 80-119 113-140 500-550".split(" ")
fullNumbers = []
for i in cr:
    a = int(i.split("-")[0])
    b = int(i.split("-")[1])
    fullNumbers+=range(a,b+1)
# Remove duplicates and sort it
fullNumbers = sorted(list(set(fullNumbers)))
# Taken From http://stackoverflow.com/questions/2154249
def convertToRanges(data):
    result = []
    # consecutive numbers share the same (index - value) difference
    for k, g in groupby(enumerate(data), lambda pair: pair[0]-pair[1]):
        group = list(map(itemgetter(1), g))
        result.append(str(group[0])+"-"+str(group[-1]))
    return result
print(convertToRanges(fullNumbers))
#Output: ['80-140', '499-550']
For the given set in your program, output is ['80-125', '180-185', '200-250', '500-550']
Main possible drawback of this solution: it expands every range into its individual integers, so it may not scale to large ranges.
Let me offer another solution whose running time does not depend on the sizes of the ranges. It depends only on the number of ranges: sorting them takes O(n log n) time, and the merging pass after that is linear.
def reduce(range_text):
    parts = range_text.split()
    if parts == []:
        return ''
    ranges = [ tuple(map(int, part.split('-'))) for part in parts ]
    ranges.sort()
    new_ranges = []
    left, right = ranges[0]
    for rng in ranges[1:]:
        next_left, next_right = rng
        if right + 1 < next_left: # Is the next range to the right?
            new_ranges.append((left, right)) # Close the current range.
            left, right = rng # Start a new range.
        else:
            right = max(right, next_right) # Extend the current range.
    new_ranges.append((left, right)) # Close the last range.
    return ' '.join([ '-'.join(map(str, rng)) for rng in new_ranges ])
This function works by sorting the ranges, then looking at them in order and merging consecutive ranges that intersect or are directly adjacent (for example, 1-5 and 6-10).
Examples:
print(reduce('499-505 100-115 80-119 113-140 500-550'))
# => 80-140 499-550
print(reduce('100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250'))
# => 80-125 180-185 200-250 500-550

Dijkstra's Shortest Path Algorithm

When I run this, the end output is a table with columns:
Vertex - DisVal - PrevVal - Known.
The two nodes connected to my beginning node show the correct values, but none of the others end up getting updated. I can include the full program code if anyone wants to see it, but I know the problem is isolated here. I think it may have to do with not changing the index the right way. This is a simple Dijkstra's, by the way, not the heap/queue version.
Here's the rest of the code: http://ideone.com/UUOUn8
The adjList looks like this: [1: 2, 4, 2: 6, 3, ...] where it shows each node connected to a vertex. DV = distance value (weight), PV = previous value (node), known = whether it has been visited
def dijkstras(graph, adjList):
    pv = [None] * len(graph.nodes)
    dv = [999]*len(graph.nodes)
    known = [False] * len(graph.nodes)
    smallestV = 9999
    index = 0
    dv[0] = 0
    known[0] = True
    for i in xrange(len(dv)):
        if dv[i] < smallestV and known[i]:
            smallestV = dv[i]
            index = i
        known[index] = True
        print smallestV
        print index
        for edge in adjList[index]:
            if (dv[index]+graph.weights[(index, edge)] < dv[edge]):
                dv[edge] = dv[index] + graph.weights[(index, edge)]
                pv[edge] = index
    printTable(dv, pv, known)
The first iteration sets smallestV and index to 0 unconditionally, and they never change afterwards (assuming non-negative weights).
Hard to tell what you are trying to do here.
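For comparison, here is a minimal sketch of the simple (no heap) version that re-selects the closest vertex not yet known on every pass, which is the step the loop above never performs. It is written for Python 3 and keeps the dv/pv/known names from the question. The function name dijkstra_simple and its inputs (a node count, a dict of neighbour lists, and a dict of weights keyed by (from, to)) are assumptions, since the question's graph class is not shown.

```python
import math

def dijkstra_simple(num_nodes, adj_list, weights, source=0):
    dv = [math.inf] * num_nodes   # best distance found so far
    pv = [None] * num_nodes       # previous vertex on that path
    known = [False] * num_nodes   # True once a vertex is finalised
    dv[source] = 0
    for _ in range(num_nodes):
        # pick the not-yet-known vertex with the smallest tentative distance
        index = None
        for i in range(num_nodes):
            if not known[i] and (index is None or dv[i] < dv[index]):
                index = i
        if index is None or dv[index] == math.inf:
            break  # everything left is unreachable
        known[index] = True
        # relax the edges leaving the chosen vertex
        for edge in adj_list.get(index, []):
            candidate = dv[index] + weights[(index, edge)]
            if candidate < dv[edge]:
                dv[edge] = candidate
                pv[edge] = index
    return dv, pv, known

adj_list = {0: [1, 2], 1: [3], 2: [1, 3], 3: []}
weights = {(0, 1): 4, (0, 2): 1, (2, 1): 2, (1, 3): 1, (2, 3): 5}
print(dijkstra_simple(4, adj_list, weights))
# ([0, 3, 1, 4], [None, 2, 0, 1], [True, True, True, True])
```

The key difference from the question's loop is that the smallest distance is reset and searched again among vertices that are not known on each pass, and a vertex is marked known only when it is selected.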
