comparing each sequence created in the loop to the previous

comparing each sequence created in the loop to the previous - python

I have created a function in python which randomly generates nucleotide sequence:
import random
selection60 = {"A":20, "T":20, "G":30, "C":30}
sseq60=[]
for k in selection60:
sseq60 = sseq60 + [k] * int(selection60[k])
random.shuffle(sseq60)
for i in range(100):
random.shuffle(sseq60)
def generateSequence(self, length):
length = int(length)
sequence = ""
while len(sequence) < length:
sequence="".join(random.sample(self, length))
return sequence[:length]
Now, I would like to check that while I apply this function, if a newly created sequence has a similarity of > 10% to the previous sequences, the sequence is eliminated and a new one is created:
I wrote something like this:
lst60=[]
newSeq=[]
for i in range(5):
while max_identity < 10:
newSeq=generateSequence(sseq60,100)
identity[i] = [newSeq[i] == newSeq[i] for i in range(len(newSeq[i]))]
max_identity[I]=100*sum(identity[i]/length(identity[i])
lst60.append(newSeq)
print(len(lst60))
However, it seems I get an empty list

You have to use a nested for loop if you want to compare ith sequence with jth sequence for all 1 <= j < i.
Further, I created a separate getSimilarity function for easier code readability. Pass it an old and new sequence to get the similarity.
def getSimilarity(old_seq, new_seq):
similarity = [old_seq[i] == new_seq[i] for i in range(len(new_seq))]
return 100*sum(similarity)/len(similarity)
lst60=[generateSequence(sseq60,100)]
for i in range(1,5):
newSeq = ""
max_identity = 0
while True:
newSeq = generateSequence(sseq60,100)
for j in range(0,i):
max_identity = max(max_identity, getSimilarity(lst60[j], newSeq))
if max_identity < 10:
break
lst60.append(newSeq)
print(len(lst60))

Related

Rolling a roulette with chances and counting the maximum same-side sequences of a number of rolls

I am trying to generate a random sequence of numbers, with each "result" having a chance of {a}48.6%/{b}48.6%/{c}2.8%.
Counting how many times in a sequence of 6 or more {a} occurred, same for {b}.
{c} counts as neutral, meaning that if an {a} sequence is happening, {c} will count as {a}, additionally if a {b} sequence is happening, then {c} will count as {b}.
The thing is that the results seem right, but every "i" iteration seems to give results that are "weighted" either on the {a} side or the {b} side. And I can't seem to figure out why.
I would expect for example to have a result of :
{a:6, b:7, a:8, a:7, b:9} but what I am getting is {a:7, a:9, a:6, a:8} OR {b:7, b:8, b:6} etc.
Any ideas?
import sys
import random
from random import seed
from random import randint
from datetime import datetime
import time
loopRange = 8
flips = 500
median = 0
for j in range(loopRange):
random.seed(datetime.now())
sequenceArray = []
maxN = 0
flag1 = -1
flag2 = -1
for i in range(flips):
number = randint(1, 1000)
if(number <= 486):
flag1 = 0
sequenceArray.append(number)
elif(number > 486 and number <= 972):
flag1 = 1
sequenceArray.append(number)
elif(number > 972):
sequenceArray.append(number)
if(flag1 != -1 and flag2 == -1):
flag2 = flag1
if(flag1 != flag2):
sequenceArray.pop()
if(len(sequenceArray) > maxN):
maxN = len(sequenceArray)
if(len(sequenceArray) >= 6):
print(len(sequenceArray))
# print(sequenceArray)
# print(sequenceArray)
sequenceArray = []
median += maxN
print("Maximum sequence is %d " % maxN)
print("\n")
time.sleep(random.uniform(0.1, 1))
median = float(median/loopRange)
print("\n")
print(median)

I would implement something with two cursors prev (previous) and curr (current) since you need to detect a change between the current and the previous state.
I just write the code of the inner loop on i since the external loop adds complexity without focusing on the source of the problem. You can then include this in your piece of code. It seems to work for me, but I am not sure to understand perfectly how you want to manage all the behaviours (especially at start).
prev = -1
curr = -1
seq = []
maxN = 0
for i in range(flips):
number = randint(1, 1000)
if number<=486:
curr = 0 # case 'a'
elif number<=972:
curr = 1 # case 'b'
else:
curr = 2 # case 'c', the joker
if (prev==-1) and (curr==2):
# If we start with the joker, don't do anything
# You can add code here to change this behavior
pass
else:
if (prev==-1):
# At start, it is like the previous element is the current one
prev=curr
if (prev==curr) or (curr==2):
# We continue the sequence
seq.append(number)
else:
# Break the sequence
maxN = max(maxN,len(seq)) # save maximum length
if len(seq)>=6:
print("%u: %r"%(prev,seq))
seq = [] # reset sequence
seq.append(number) # don't forget to append the new number! It breaks the sequence, but starts a new one
prev=curr # We switch case 'a' for 'b', or 'b' for 'a'

How to check if all elements of array are filled in Python

I am making a To Do List app in Python using Arrays or Lists. I want to check if the array containing all "to-do tasks" is full or not. If it is full I would then inform the user that the list is full. I am still a beginner.
todo_list = ["1.)", "2.)", "3.)", "4.)", "5.)", "6.)", "7.)", "8.)", "9.)", "10.)"]
def addTask(taskName):
'''
this is a global variable to keep track of what index the last task was
placed in.
'''
global x
x = int(x)
num = x + 1
num = int(num)
taskName = str(taskName)
'''
This is what I tried to make the program start from the beginning if the
list was full.
'''
if x > list_length:
x = 0
todo_list[0] = None
todo_list[x] = str(num) + ".) " + taskName
x = x+1
print("Done!")
main()

Are you saying that you've limited the number of possible tasks to 10? and if each one has a task attached then the list should notify the user it's full?
If so, you know that an empty task is "10.)" so at max it's length is 4 (4 characters), so if any items length is less than or equal to 4 then it's empty
for task in todo_list:
if len(task) > 4:
print('Todo list is full')
break
else:
print('Todo list is full')
Can I also advise of a better way to create the todo list? Use a dictionary.
todo_list = {'clean cat':'incomplete', 'buy milk':'complete'}
and to add a new task is easy!
todo_list['learn python'] = 'incomplete'
and to update a task is even easier!
todo_list['clean cat'] = 'complete'
Here is how I would do it:
todo_list = {}
if len(todo_list) == 10:
print('Sorry, list is full!')
else:
task_name = input('Task name: ')
todo_list[task_name] = 'incomplete'
print(todo_list)

You don't have to define todo_list with numbering. Just define the maximum size of todo_list and check if length of todo_list is bigger than maximum size.
todo_list = list()
MAX_SIZE = 10
add_task(taskName:str):
if len(todo_list) >= MAX_SIZE:
# code you want to run when todo_list is full
else:
todo_list.append("{}.) {}".format(len(todo_list)+1, taskName))

Make my Nested loops Works simpler (Operating Time is Higher)

I am a learner in nested loops in python.
Below I have written my code. I want to make my code simpler, since when I run the code it takes so much time to produce the result.
I have a list which contains 1000 values:
Brake_index_values = [ 44990678, 44990679, 44990680, 44990681, 44990682, 44990683,
44997076, 44990684, 44997077, 44990685,
...
44960673, 8195083, 8979525, 100107546, 11089058, 43040161,
43059162, 100100533, 10180192, 10036189]
I am storing the element no 1 in another list
original_top_brake_index = [Brake_index_values[0]]
I created a temporary list called temp and a numpy array for iteration through Loop:
temp =[]
arr = np.arange(0,1000,1)
Loop operation:
for i in range(1, len(Brake_index_values)):
if top_15_brake <= 15:
a1 = Brake_index_values[i]
#a2 = Brake_index_values[j]
a3 = arr[:i]
for j in a3:
a2 = range(Brake_index_values[j] - 30000, Brake_index_values[j] + 30000)
if a1 in a2:
pass
else:
temp.append(a1)
if len(temp)== len(a3):
original_top_brake_index.append(a1)
top_15_brake += 1
del temp[:]
else:
del temp[:]
continue
I am comparing the Brake_index_values[1] element available between the range of 30000 before and after Brake_index_values[0] element, that is `range(Brake_index_values[0]-30000, Brake_index_values[0]+30000).
If the Brake_index_values[1] available between the range, I should ignore that element and go for the next element Brake_index_values[2] and follow the same process as before for Brake_index_values[0] & Brake_index_values[1]
If it is available, store the Value, in original_top_brake_index thorough append operation.
In other words :
(Lets take 3 values a,b & c. I am checking whether the value b is in range between (a-30000 to a+30000). Possibility 1: If b is in between (a-30000 to a+30000) , neglect that element (Here I am storing inside a temporary list). Then the same process continues with c (next element) Possibility 2: If b is not in b/w those range put b in another list called original_top_brake_index
(this another list is the actual result what i needed)
The result I get:
It is working, but it takes so much time to complete the operation and sometimes it shows MemoryError.
I just want my code to work simpler and efficient with simple operations.

Try this code (with numpy):
import numpy as np
original_top_brake_index = [Brake_index_values[0]]
top_15_brake = 0
Brake_index_values = np.array(Brake_index_values)
for i, a1 in enumerate(Brake_index_values[0:]):
if top_15_brake > 15:
break
m = (Brake_index_values[:i] - a1)
if np.logical_or(m > 30000, m < - 30000).all():
original_top_brake_index.append(a1)
top_15_brake += 1
Note: you can probably make it even more efficient, but this already should reduce the number of operations significantly (and doesn't change much the logic of your original code)

We can use the bisect module to shorten the elements we actually have to lookup by finding the smallest element that's greater or less than the current value. We will use recipes from here
Let's look at this example:
from bisect import bisect_left, bisect_right
def find_lt(a, x):
'Find rightmost value less than x'
i = bisect_left(a, x)
if i:
return a[i-1]
return
def find_gt(a, x):
'Find leftmost value greater than x'
i = bisect_right(a, x)
if i != len(a):
return a[i]
return
vals = [44990678, 44990679, 44990680, 44990681, 44990682, 589548954, 493459734, 3948305434, 34939349534]
vals.sort() # we have to sort the values for bisect to work
passed = []
originals = []
for val in vals:
passed.append(val)
l = find_lt(passed, val)
m = find_gt(passed, val)
cond1 = (l and l + 30000 >= val)
cond2 = (m and m - 30000 <= val)
if not l and not m:
originals.append(val)
continue
elif cond1 or cond2:
continue
else:
originals.append(val)
Which gives us:
print(originals)
[44990678, 493459734, 589548954, 3948305434, 34939349534]
There might be another, more mathematical way to do this, but this should at least simplify your code.

Most frequently overlapping range - Python3.x

I'm a beginner, trying to write code listing the most frequently overlapping ranges in a list of ranges.
So, input is various ranges (#1 through #7 in the example figure; https://prntscr.com/kj80xl) and I would like to find the most common range (in the example 3,000- 4,000 in 6 out of 7 - 86 %). Actually, I would like to find top 5 most frequent.
Not all ranges overlap. Ranges are always positive and given as integers with 1 distance (standard range).
What I have now is only code comparing one sequence to another and returning the overlap, but after that I'm stuck.
def range_overlap(range_x,range_y):
x = (range_x[0], (range_x[-1])+1)
y = (range_y[0], (range_y[-1])+1)
overlap = (max(x[0],y[0]),min(x[-1],(y[-1])))
if overlap[0] <= overlap[1]:
return range(overlap[0], overlap[1])
else:
return "Out of range"
I would be very grateful for any help.

Better solution
I came up with a simpler solution (at least IMHO) so here it is:
def get_abs_min(ranges):
return min([min(r) for r in ranges])
def get_abs_max(ranges):
return max([max(r) for r in ranges])
def count_appearances(i, ranges):
return sum([1 for r in ranges if i in r])
def create_histogram(ranges):
keys = [str(i) for i in range(len(ranges) + 1)]
histogram = dict.fromkeys(keys)
results = []
min = get_abs_min(range_list)
max = get_abs_max(range_list)
for i in range(min, max):
count = str(count_appearances(i, ranges))
if histogram[count] is None:
histogram[count] = dict(start=i, end=None)
elif histogram[count]['end'] is None:
histogram[count]['end'] = i
elif histogram[count]['end'] == i - 1:
histogram[count]['end'] = i
else:
start = histogram[count]['start']
end = histogram[count]['end']
results.append((range(start, end + 1), count))
histogram[count]['start'] = i
histogram[count]['end'] = None
for count, d in histogram.items():
if d is not None and d['start'] is not None and d['end'] is not None:
results.append((range(d['start'], d['end'] + 1), count))
return results
def main(ranges, top):
appearances = create_histogram(ranges)
return sorted(appearances, key=lambda t: t[1], reverse=True)[:top]
The idea here is as simple as iterating through a superposition of all the ranges and building a histogram of appearances (e.g. the number of original ranges this current i appears in)
After that just sort and slice according to the chosen size of the results.
Just call main with the ranges and the top number you want (or None if you want to see all results).
OLDER EDITS BELOW
I (almost) agree with #Kasramvd's answer.
here is my take on it:
from collections import Counter
from itertools import combinations
def range_overlap(x, y):
common_part = list(set(x) & set(y))
if common_part:
return range(common_part[0], common_part[-1] +1)
else:
return False
def get_most_common(range_list, top_frequent):
overlaps = Counter(range_overlap(i, j) for i, j in
combinations(list_of_ranges, 2))
return [(r, i) for (r, i) in overlaps.most_common(top_frequent) if r]
you need to input the range_list and the number of top_frequent you want.
EDIT
the previous answer solved this question for all 2's combinations over the range list.
This edit is tested against your input and results with the correct answer:
from collections import Counter
from itertools import combinations
def range_overlap(*args):
sets = [set(r) for r in args]
common_part = list(set(args[0]).intersection(*sets))
if common_part:
return range(common_part[0], common_part[-1] +1)
else:
return False
def get_all_possible_combinations(range_list):
all_combos = []
for i in range(2, len(range_list)):
all_combos.append(combinations(range_list, i))
all_combos = [list(combo) for combo in all_combos]
return all_combos
def get_most_common_for_combo(combo):
return list(filter(None, [range_overlap(*option) for option in combo]))
def get_most_common(range_list, top_frequent):
all_overlaps = []
combos = get_all_possible_combinations(range_list)
for combo in combos:
all_overlaps.extend(get_most_common_for_combo(combo))
return [r for (r, i) in Counter(all_overlaps).most_common(top_frequent) if r]
And to get the results just run get_most_common(range_list, top_frequent)
Tested on my machine (ubunut 16.04 with python 3.5.2) with your input range_list and top_frequent = 5 with the results:
[range(3000, 4000), range(2500, 4000), range(1500, 4000), range(3000, 6000), range(1, 4000)]

You can first change your function to return a valid range in both cases so that you can use it in a set of comparisons. Also, since Python's range objects are not already created iterables but smart objects that only get start, stop and step attributes of a range and create the range on-demand, you can do a little change on your function as well.
def range_overlap(range_x,range_y):
rng = range(max(range_x.start, range_y.start),
min(range_x.stop, range_y.stop)+1)
if rng.start < rng.stop:
return rng.start, rng.stop
Now, if you have a set of ranges and you want to compare all the pairs you can use itertools.combinations to get all the pairs and then using range_overlap and collections.Counter you can find the number of overlapped ranges.
from collections import Counter
from itertools import combinations
overlaps = Counter(range_overlap(i,j) for i, j in
combinations(list_of_ranges, 2))

Changing this Python program to have function def()

The following Python program flips a coin several times, then reports the longest series of heads and tails. I am trying to convert this program into a program that uses functions so it uses basically less code. I am very new to programming and my teacher requested this of us, but I have no idea how to do it. I know I'm supposed to have the function accept 2 parameters: a string or list, and a character to search for. The function should return, as the value of the function, an integer which is the longest sequence of that character in that string. The function shouldn't accept input or output from the user.
import random
print("This program flips a coin several times, \nthen reports the longest
series of heads and tails")
cointoss = int(input("Number of times to flip the coin: "))
varlist = []
i = 0
varstring = ' '
while i < cointoss:
r = random.choice('HT')
varlist.append(r)
varstring = varstring + r
i += 1
print(varstring)
print(varlist)
print("There's this many heads: ",varstring.count("H"))
print("There's this many tails: ",varstring.count("T"))
print("Processing input...")
i = 0
longest_h = 0
longest_t = 0
inarow = 0
prevIn = 0
while i < cointoss:
print(varlist[i])
if varlist[i] == 'H':
prevIn += 1
if prevIn > longest_h:
longest_h = prevIn
print("",longest_h,"")
inarow = 0
if varlist[i] == 'T':
inarow += 1
if inarow > longest_t:
longest_t = inarow
print("",longest_t,"")
prevIn = 0
i += 1
print ("The longest series of heads is: ",longest_h)
print ("The longest series of tails is: ",longest_t)
If this is asking too much, any explanatory help would be really nice instead. All I've got so far is:
def flip (a, b):
flipValue = random.randint
but it's barely anything.

import random
def Main():
numOfFlips=getFlips()
outcome=flipping(numOfFlips)
print(outcome)
def getFlips():
Flips=int(input("Enter number if flips:\n"))
return Flips
def flipping(numOfFlips):
longHeads=[]
longTails=[]
Tails=0
Heads=0
for flips in range(0,numOfFlips):
flipValue=random.randint(1,2)
print(flipValue)
if flipValue==1:
Tails+=1
longHeads.append(Heads) #recording value of Heads before resetting it
Heads=0
else:
Heads+=1
longTails.append(Tails)
Tails=0
longestHeads=max(longHeads) #chooses the greatest length from both lists
longestTails=max(longTails)
return "Longest heads:\t"+str(longestHeads)+"\nLongest tails:\t"+str(longestTails)
Main()
I did not quite understand how your code worked, so I made the code in functions that works just as well, there will probably be ways of improving my code alone but I have moved the code over to functions

First, you need a function that flips a coin x times. This would be one possible implementation, favoring random.choice over random.randint:
def flip(x):
result = []
for _ in range(x):
result.append(random.choice(("h", "t")))
return result
Of course, you could also pass from what exactly we are supposed to take a choice as a parameter.
Next, you need a function that finds the longest sequence of some value in some list:
def longest_series(some_value, some_list):
current, longest = 0, 0
for r in some_list:
if r == some_value:
current += 1
longest = max(current, longest)
else:
current = 0
return longest
And now you can call these in the right order:
# initialize the random number generator, so we get the same result
random.seed(5)
# toss a coin a hundred times
series = flip(100)
# count heads/tails
headflips = longest_series('h', series)
tailflips = longest_series('t', series)
# print the results
print("The longest series of heads is: " + str(headflips))
print("The longest series of tails is: " + str(tailflips))
Output:
>> The longest series of heads is: 8
>> The longest series of heads is: 5
edit: removed the flip implementation with yield, it made the code weird.

Counting the longest run
Let see what you have asked for
I'm supposed to have the function accept 2 parameters: a string or list,
or, generalizing just a bit, a sequence
and a character
again, we'd speak, generically, of an item
to search for. The function should return, as the value of the
function, an integer which is the longest sequence of that character
in that string.
My implementation of the function you are asking for, complete of doc
string, is
def longest_run(i, s):
'Counts the longest run of item "i" in sequence "s".'
c, m = 0, 0
for el in s:
if el==i:
c += 1
elif c:
m = m if m >= c else c
c = 0
return m
We initialize c (current run) and m (maximum run so far) to zero,
then we loop, looking at every element el of the argument sequence s.
The logic is straightforward but for elif c: whose block is executed at the end of a run (because c is greater than zero and logically True) but not when the previous item (not the current one) was not equal to i. The savings are small but are savings...
Flipping coins (and more...)
How can we simulate flipping n coins? We abstract the problem and recognize that flipping n coins corresponds to choosing from a collection of possible outcomes (for a coin, either head or tail) for n times.
As it happens, the random module of the standard library has the exact answer to this problem
In [52]: random.choices?
Signature: choices(population, weights=None, *, cum_weights=None, k=1)
Docstring:
Return a k sized list of population elements chosen with replacement.
If the relative weights or cumulative weights are not specified,
the selections are made with equal probability.
File: ~/lib/miniconda3/lib/python3.6/random.py
Type: method
Our implementation, aimed at hiding details, could be
def roll(n, l):
'''Rolls "n" times a dice/coin whose face values are listed in "l".
E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedron dices.
'''
from random import choices
return choices(l, k=n)
Putting this together
def longest_run(i, s):
'Counts the longest run of item "i" in sequence "s".'
c, m = 0, 0
for el in s:
if el==i:
c += 1
elif c:
m = m if m >= c else c
c = 0
return m
def roll(n, l):
'''Rolls "n" times a dice/coin whose face values are listed in "l".
E.g., roll(2, range(1,21)) -> [12, 4] simulates rolling 2 icosahedron dices.
'''
from random import choices
return choices(l, k=n)
N = 100 # n. of flipped coins
h_or_t = ['h', 't']
random_seq_of_h_or_t = flip(N, h_or_t)
max_h = longest_run('h', random_seq_of_h_or_t)
max_t = longest_run('t', random_seq_of_h_or_t)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

comparing each sequence created in the loop to the previous - python

Related

Rolling a roulette with chances and counting the maximum same-side sequences of a number of rolls

How to check if all elements of array are filled in Python

Make my Nested loops Works simpler (Operating Time is Higher)

Most frequently overlapping range - Python3.x

Changing this Python program to have function def()

Categories

Resources