I'd appreciate some help debugging this code:
testing = """There is something unique about this line
in that it can span across several lines, which is unique and
useful in python."""
listofthings = []
i = 0
while i < len(testing):
    if testing[i] == " ":
        listofthings.append(i + 1)
    i = i + 1
listofthings.insert(0, 0)
listofthings.append(len(testing))
print listofthings
word_list = []
i = 0
while i < len(listofthings):
    l = i + 1
    x = listofthings[i]
    y = listofthings[l]
    word = testing[x:y]
    word_list.append(word)
    i = l
print word_list
I am not sure why I am getting the index out of range error. I understand what the error means obviously, but am not sure what I am doing wrong. Weirdly enough, this only happens when I run the above code. It doesn't give me any errors when I run this:
word = testing[x:y]
print word
I am fairly new to Python (going on three days), so I am sure it is a stupid, overlooked syntactical error...
l = i + 1
x = listofshit[i]
y = listofshit[l]
word = testing[x:y]
word_list.append(word)
When i = length - 1, l = length, so listofshit[l] is out of range. Python list indexing starts at 0, so the highest valid index is length - 1.
The list listofshit has 21 elements, with indexes from 0 to 20. On the final pass through the loop, i is 20 and l is 21, so there is an out-of-range error. I think the following code is what you want:
testing = """There is something unique about this line
in that it can span across several lines, which is unique and
useful in python."""
listofshit = []
i = 0
while i < len(testing):
    if testing[i] == " ":
        listofshit.append(i)
    i = i + 1
listofshit.insert(0, 0)
listofshit.append(len(testing))
word_list = []
i = 0
while i < len(listofshit) - 1:
    l = i + 1
    x = listofshit[i]
    y = listofshit[l]
    word = testing[x:y]
    word_list.append(word)
    i = l
print word_list
while i < len(listofshit):
    l = i + 1
    x = listofshit[i]
    y = listofshit[l]
When i corresponds to the last element,
y = listofshit[l]
you are trying to access the element after the last one. That's why it throws the error.
On the last iteration of the second while loop, l is set to len(listofshit). This is past the end of listofshit; the last valid index is len(listofshit) - 1.
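For what it's worth, here is a minimal sketch (Python 3) of one way to sidestep the index arithmetic altogether: pair each boundary with the next one using zip, which stops as soon as the shorter input runs out. The name boundaries is just an illustrative choice, and the offsets follow the question's i + 1 convention, so each piece keeps its trailing space.

```python
testing = """There is something unique about this line
in that it can span across several lines, which is unique and
useful in python."""

# Start offset of every space-delimited piece, plus the end of the string.
boundaries = [0] + [i + 1 for i, ch in enumerate(testing) if ch == " "] + [len(testing)]

# zip stops at the shorter input, so the last pair is
# (boundaries[-2], boundaries[-1]) and nothing is read past the end.
word_list = [testing[x:y] for x, y in zip(boundaries, boundaries[1:])]
print(word_list)
```

If the goal is only to get the words, testing.split() does the same job without any manual indexing.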
I am making a function which takes a tuple containing strings, selects one string, then selects a start and end index within that string. Within this index, the string is chopped up into pieces and permuted randomly and then the tuple is returned with the modified string. However, there appears to be some sort of error when defining the start and end index in the function and then splicing the string based on it. In some runs it works as expected, and in others it does not and hence throws an error which I avoid in the while loop. Can anyone tell me what could be going on here:
import random
import numpy as np

def chromothripsist(seqs, keep_frequency = 0.9):
    seqs = list(seqs)
    c = list(range(len(seqs)))
    chrom = random.sample(c, 1)
    orig_seq = seqs[chrom[0]]
    n = len(orig_seq)
    NotLongEnough = True
    while(NotLongEnough):
        splits = random.randint(5,10)
        ##IF SPLITS > len(seq) then this bugs
        print(orig_seq)
        lowbd = int(0.1*n)
        upbd = int(0.9*n)
        distance = int(np.random.uniform(lowbd, upbd))
        stidx = random.randint(0,n-1)
        edidx = max(stidx + distance, n-1)
        dist = edidx - stidx
        print(splits, stidx, edidx)
        if(dist > splits):
            NotLongEnough = False
            break
    first_part = orig_seq[:stidx]
    last_part = orig_seq[edidx:]
    seq = orig_seq[stidx:edidx]
    # THE ABOVE LINES ARE NOT WORKING AS EXPECTED
    print(seq)
    print(stidx, edidx)
    print(len(seq), splits)
    breakpoints = np.random.choice(len(seq), splits-1, replace = False)
    breakpoints.sort()
    subseq = []
    curridx = 0
    for i in breakpoints:
        subseq.append(seq[curridx:i])
        curridx = i
    subseq.append(seq[breakpoints[-1]:])
    rearrange = np.random.permutation(splits)
    #n_to_select= int(splits *keep_frequency)
    n_to_select = len(rearrange)
    rearrange = random.sample(list(rearrange), n_to_select)
    build_seq = ''
    for i in rearrange:
        build_seq += subseq[i]
    seqs[chrom[0]] = first_part + build_seq + last_part
    breakpoints = list(breakpoints)
    return tuple(seqs), [stidx, edidx, breakpoints, rearrange, chrom[0]]
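Not a full answer, but one thing that may be worth checking, assuming the intent was to keep edidx inside the string: max(stidx + distance, n-1) never returns less than n-1, so edidx can run past the end while stidx sits close to it. Slicing silently clamps the end index, so edidx - stidx can be larger than len(seq), and the dist > splits check passes even though the slice is too short. The sketch below illustrates this; pick_window is a hypothetical helper, not part of the original code.

```python
import random

seq = "ACGTACGTACGTACGTACGT"
n = len(seq)

# With max(), the end index is never below n - 1, so the slice is short
# whenever the start index lands near the end of the string.
stidx = n - 2
edidx = max(stidx + 5, n - 1)
dist = edidx - stidx
actual = len(seq[stidx:edidx])
print(dist, actual)  # the two numbers disagree

def pick_window(n, distance, rng=random):
    """Hypothetical helper: pick [start, end) that stays inside a length-n sequence."""
    start = rng.randint(0, n - 1)
    end = min(start + distance, n)  # clamp to the sequence length
    return start, end

start, end = pick_window(n, 5)
print(end - start, len(seq[start:end]))  # these always agree
```

With min() the window can still come out shorter than splits, so a length check like the one in the while loop is still needed, but it would then be checking the real slice length.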
yamxxopd
yndfyamxx
Output: 5
I am not sure how to find the length of the longest run of characters shared by two strings. For example, for the strings above, the longest shared run is "yamxx", which is 5 characters long.
"xx" would not be a solution because it is not the longest shared run. In this case the longest is "yamxx", which is 5 characters long, so the output would be 5.
I am quite new to Python and Stack Overflow, so any help would be much appreciated!
Note: The characters should be in the same order in both strings.
Here is a simple, efficient solution using dynamic programming.
def longest_substring(X, Y):
    m, n = len(X), len(Y)
    # LCSuff[i][j] is the length of the longest common suffix of X[:i] and Y[:j]
    LCSuff = [[0 for k in range(n+1)] for l in range(m+1)]
    result = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if (i == 0 or j == 0):
                LCSuff[i][j] = 0
            elif (X[i-1] == Y[j-1]):
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1
                result = max(result, LCSuff[i][j])
            else:
                LCSuff[i][j] = 0
    print(result)
longest_substring("abcd", "arcd") # prints 2
longest_substring("yammxdj", "nhjdyammx") # prints 5
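If the shared run itself is wanted rather than just its length, the same table idea can be adapted. This is only a sketch: the function name longest_common_substring is my own, and it keeps two rows of the table instead of the full grid, which is a memory-saving variation and not something the answer above requires.

```python
def longest_common_substring(a, b):
    """Return the longest run of characters that appears in both a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    # Remember where the best run ends in a.
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

print(longest_common_substring("yamxxopd", "yndfyamxx"))  # yamxx
```

Taking len() of the result gives the number the question asks for.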
This solution starts with sub-strings of longest possible lengths. If, for a certain length, there are no matching sub-strings of that length, it moves on to the next lower length. This way, it can stop at the first successful match.
s_1 = "yamxxopd"
s_2 = "yndfyamxx"
l_1, l_2 = len(s_1), len(s_2)
found = False
sub_length = l_1 # Let's start with the longest possible sub-string
while (not found) and sub_length: # Loop, over decreasing lengths of sub-string
    for start in range(l_1 - sub_length + 1): # Loop, over all start-positions of sub-string
        sub_str = s_1[start:(start+sub_length)] # Get the sub-string at that start-position
        if sub_str in s_2: # If found a match for the sub-string, in s_2
            found = True # Stop trying with smaller lengths of sub-string
            break # Stop trying with this length of sub-string
    else: # If no matches found for this length of sub-string
        sub_length -= 1 # Let's try a smaller length for the sub-strings
print (f"Answer is {sub_length}" if found else "No common sub-string")
Output:
Answer is 5
s1 = "yamxxopd"
s2 = "yndfyamxx"
# initializing counter
counter = 0
# creating and initializing a string without repetition
s = ""
for x in s1:
    if x not in s:
        s = s + x
for x in s:
    if x in s2:
        counter = counter + 1
# display the number of distinct characters of s1 that also occur in s2
# (position and adjacency are ignored, so in general this is not the longest shared run)
print(counter) # display 5
I'm working on this python problem:
Given a sequence of the DNA bases {A, C, G, T}, stored as a string, returns a conditional probability table in a data structure such that one base (b1) can be looked up, and then a second (b2), to get the probability p(b2 | b1) of the second base occurring immediately after the first. (Assumes the length of seq is >= 3, and that the probability of any b1 and b2 which have never been seen together is 0. Ignores the probability that b1 will be followed by the end of the string.)
You may use the collections module, but no other libraries.
However I'm running into a roadblock:
word = 'ATCGATTGAGCTCTAGCG'
def dna_prob2(seq):
    tbl = dict()
    levels = set(word)
    freq = dict.fromkeys(levels, 0)
    for i in seq:
        freq[i] += 1
    for i in levels:
        tbl[i] = {x:0 for x in levels}
    lastlevel = ''
    for i in tbl:
        if lastlevel != '':
            tbl[lastlevel][i] += 1
        lastlevel = i
    for i in tbl:
        print(i,tbl[i][i] / freq[i])
    return tbl
tbl['T']['T'] / freq[i]
Basically, the end result is supposed to be the final tbl line you see above. However, when I try to do that with print(i,tbl[i][i] / freq[i]) and run dna_prob2(word), I get 0.0 for everything.
Wondering if anyone here can help out.
Thanks!
I am not sure what it is your code is doing, but this works:
def makeprobs(word):
    singles = {}
    probs = {}
    thedict={}
    ll = len(word)
    for i in range(ll-1):
        x1 = word[i]
        x2 = word[i+1]
        singles[x1] = singles.get(x1, 0)+1.0
        thedict[(x1, x2)] = thedict.get((x1, x2), 0)+1.0
    for i in thedict:
        probs[i] = thedict[i]/singles[i[0]]
    return probs
I finally got back to my professor. This is what it was trying to accomplish:
word = 'ATCGATTGAGCTCTAGCG'
def dna_prob2(seq):
    tbl = dict()
    levels = set(seq)
    freq = dict.fromkeys(levels, 0)
    for i in seq:
        freq[i] += 1
    for i in levels:
        tbl[i] = {x:0 for x in levels}
    lastlevel = ''
    for i in seq:
        if lastlevel != '':
            tbl[lastlevel][i] += 1
        lastlevel = i
    return tbl, freq
condfreq, freq = dna_prob2(word)
# condfreq[b1][b2] counts b2 right after b1, so divide by the count of b1
print(condfreq['T']['T']/freq['T'])
print(condfreq['G']['A']/freq['G'])
print(condfreq['C']['G']/freq['C'])
Hope this helps.
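Since the problem statement allows the collections module, here is a rough sketch of how the nested lookup table could be built with Counter and defaultdict. The name dna_prob_table is my own, and dropping the final base from the denominators is my reading of "ignores the probability that b1 will be followed by the end of the string", so treat that as an assumption.

```python
from collections import Counter, defaultdict

def dna_prob_table(seq):
    """Return table[b1][b2] = p(b2 | b1) for adjacent bases in seq."""
    # Count every adjacent pair (b1, b2).
    pair_counts = Counter(zip(seq, seq[1:]))
    # Count how often each base appears as b1; the last base has no successor.
    first_counts = Counter(seq[:-1])
    # Pairs that were never seen default to probability 0.0.
    table = defaultdict(lambda: defaultdict(float))
    for (b1, b2), count in pair_counts.items():
        table[b1][b2] = count / first_counts[b1]
    return table

table = dna_prob_table('ATCGATTGAGCTCTAGCG')
print(table['T']['T'])
print(table['A']['C'])
```

Because each row is divided by the number of times b1 is followed by something, the probabilities in a row add up to 1.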
I can print the following list of lists fine, but when I append to an empty list, it skips the last on each iteration or gives me an index out of range error when I add one more.
This works:
ordered_results = []
temp = []
A = len(results[1])-2
i = 1
while i < len(results):
    x = 0
    y = 1
    while x < A:
        temp = [results[i][0], results[0][x], results[i][y]]
        print(temp)
        x+=1
        y+=1
    temp = [results[i][0], results[0][x], results[i][y]]
    print(temp)
    i+=1
ordered_results
Note: len(results[0]) = 240 and len(results[1]) = 241
If you replace "print" with ordered_results.append(temp) it skips:
results[i][0], results[0][239], results[i][240]
each iteration.
(Note the code was expanded as I am messing around trying to figure this out, it was more compact before).
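A sketch of one way to avoid the index bookkeeping entirely, under an assumption the lengths above suggest but do not confirm: results[0] is a header row of column labels, and every later row starts with a row label followed by one value per column. The small results list here is made-up stand-in data, not the real one.

```python
# Hypothetical stand-in for the real data: a header row of column labels,
# then rows that start with a row label followed by one value per column.
results = [
    ["c1", "c2", "c3"],
    ["r1", 1, 2, 3],
    ["r2", 4, 5, 6],
]

ordered_results = []
header = results[0]
for row in results[1:]:
    row_label, values = row[0], row[1:]
    for column_label, value in zip(header, values):
        # Build a new list per entry so earlier entries are never overwritten.
        ordered_results.append([row_label, column_label, value])

print(ordered_results[-1])
```

Pairing the header with row[1:] through zip means the last column is handled by the same loop as all the others, so there is no separate step after the loop that can be forgotten when switching from print to append.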
So I've written a bit of code to stack integers in a list from the zeroth position. For some reason I cannot decipher, the while loop below is not being processed. I have followed all good style and syntax requirements that I know, and the while loop works when run by itself.
def row(line):
    """
    Function that merges a single row or column.
    """
    result_length = len(line)
    print result_length
    # Create a list of zeros the same length as the 'line' argument
    pts_alloc = 0
    dummy = 0
    result = line
    result[0:] = [pts_alloc for dummy in range(len(result))]
    print result
    #Iterate over the 'line' list looking for non-zero entries and
    #stack them from 'result[0]'
    line_count = 0
    result_place = 0
    while (line_count <= (len(line)-1)):
        if (line[line_count] > 0):
            result[result_place] = line[line_count]
            print result
            result_place += 1
        line_count += 1
    return result
print row([4, 0, 0, 5])
Is there a major error in this code that I've missed? Is there some syntax requirement that I am unaware of?
The problem seems to be this part:
result = line
result[0:] = [pts_alloc for dummy in range(len(result))]
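Because of result = line, result is just another reference to the same list, not a copy. So when you replace a slice of result, you are replacing that same slice in line, too. By the time the while loop runs, line holds only zeros, which is why the loop appears to do nothing.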
Since the slice is the entire list, anyway, just do:
result = [pts_alloc for dummy in range(len(result))]
Also, you are declaring a lot of unnecessary variables. You could shorten your code to this:
def row(line):
    result = [0] * len(line)
    result_place = 0
    for x in line:
        if x > 0:
            result[result_place] = x
            result_place += 1
    return result
Or even this:
def row(line):
    non_zero = [x for x in line if x > 0] # take non-zero values
    return non_zero + [0] * (len(line) - len(non_zero)) # pad with zeros
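To see the aliasing effect in isolation, here is a small sketch (Python 3; the variable names are only illustrative). Assigning a list to a second name does not copy it, while list(...) makes a new list that can be changed independently.

```python
# Plain assignment: both names refer to the same list object.
aliased_source = [4, 0, 0, 5]
alias = aliased_source
alias[0:] = [0] * len(alias)   # slice assignment changes the shared list in place
print(aliased_source)          # the original was zeroed as well

# list(...) makes a shallow copy, so the original is left alone.
copied_source = [4, 0, 0, 5]
copy = list(copied_source)
copy[0:] = [0] * len(copy)
print(copied_source)           # still holds the original values
```

Rebinding the name, as in result = [0] * len(line), also avoids the problem because it creates a new list instead of modifying the one that line refers to.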