Make program continue action till line has been created - python

Currently my program takes line1, such as "taaaaaaaaaaNataggggggggggNccc", and cuts 1 character off the end until it matches line2, such as "taaaaaaaaaaNcccggggggggggNccc". Once they match, it concatenates them to form line3; if they don't match, it should cut another character off. How can I make it repeat the cutting action until they match and line3 has been made? I have thought about for and while loops but am unsure how to express this. Everything else about this program works as it should, but when a match attempt fails it just stops and won't go back to try trimming again.
I have tried the code below, where magic(matching) is essentially the counting code used to identify how much the 2 lines match; if the result is below 8, it should repeat the cutting. However, when used, it requires matching and magic to be defined before the while loop, which is right at the start, and this messes up the rest of the code.
while magic(matching) >=8:
    line3=line1+line2
    print ("Matching and merging has occured as shown below")
    print (line3)
The code of interest is below:
n = 0
consec_matches = []
chars = defaultdict(int)
for k, group in groupby(zip(line1_u_i, line2_u_rev_comp_join_i), class_chars):
    elems = len(list(group))
    chars[k] += elems
    if k == 'match':
        consec_matches.append((n, n+elems-1))
    n += elems
print ("Print chars below")
print (chars)
print ("Print consec_matches below")
print (consec_matches)
print ([x for x in consec_matches if x[1]-x[0] >= 9])
print (" Matches longer than 10 below")
list = [x for x in consec_matches if x[1]-x[0] >= 9]
flatten_list= [x for y in list for x in y]
print (flatten_list)
print ("Flatterend list")
matching=[y[1] for y in list for x in y if x ==0 ]
print ("Matching list below")
print (matching)
magic = lambda matching: int(''.join(str(i) for i in matching) or 0)
print (" Print magic matching below")
print (magic(matching))
line2_u_rev_comp_join_i_l = line2_u_rev_comp_join_i[magic(matching):]
print ("Print line2_u_rev_comp_join_i_l type below")
print (type(line2_u_rev_comp_join_i_l))
print ("Print line2_u_rev_comp_join_i_l sequence below")
print (line2_u_rev_comp_join_i_l)
line2_u_rev_comp_join_i_l_str = ''.join(line2_u_rev_comp_join_i_l)
print ('List of line2 converted to string')
print ("List2 before as list below")
print (line2_u_rev_comp_join_i_l)
print ("Line 2 reprinted when string as below")
print (line2_u_rev_comp_join_i_l_str)
print (line1_u_i)
print ("Magic below")
print (magic)
if magic(matching) >=8:
    line3=line1_u_i+line2_u_rev_comp_join_i_l_str
    print ("Matching and merging has occured as shown below")
    print (line3)
else:
    continue
The cutting code is:
line2_u_rev_comp_join_i = line2_u_rev_comp_join[1:]
line1_u_i = line1_u[:-1]

l1 = "taaaaaaaaaaNataggggggggggNccc"
l2 = "taaaaaaaaaaNcccggggggggggNccc"
l1_ten=l1[0:10] # first ten chars
l2_ten=l2[0:10]
if l1_ten==l2_ten:
    print l1_ten+l2_ten
taaaaaaaaataaaaaaaaa
If you want the characters that are equal at the same index in each string:
l1 = "taaaaaaaaaaNataggggggggggNccc"
l2 = "taaaaaaaaaaNcccggggggggggNccc"
count = 0
slice = 0
new_s=''
while count < len(l1):
    if l1[slice]==l2[slice]:
        new_s+= l1[slice]
    count += 1
    slice += 1
new_s
In [13]: new_s
Out[13]: 'taaaaaaaaaaNggggggggggNccc'
You can just use a for loop to achieve the same:
new_s1=""
for i in range(len(l1)):
    if l1[i] ==l2[i]:
        new_s1+=l1[i]  # append to new_s1, the string initialised above
I have assumed you are using strings of equal length.
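The "keep trimming until it matches" part of the question can be expressed as a while loop that recomputes the match score on every pass. The following is only a minimal sketch, not the asker's groupby-based scoring: overlap_length, trim_and_merge and min_overlap are hypothetical names, and it assumes that "matching" means the end of line1 equals the start of line2.

```python
def overlap_length(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def trim_and_merge(line1, line2, min_overlap=8):
    """Cut characters off the end of line1 until it overlaps line2, then merge.

    Returns the merged string, or None if line1 becomes too short to match.
    """
    while len(line1) >= min_overlap:
        size = overlap_length(line1, line2)  # recomputed on every pass
        if size >= min_overlap:
            # Keep line1 and append only the non-overlapping part of line2.
            return line1 + line2[size:]
        line1 = line1[:-1]  # no match yet: trim one more character and retry
    return None


print(trim_and_merge("xxACGTACGTACzz", "ACGTACGTACyy"))
```

Because the score is computed inside the loop, nothing about the match has to be defined before the loop starts, which is the problem described in the question.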

Related

Find string in list of splitted string

I have a string teststring and a list of substrings s, into which the teststring was accidentally split. Now I would like to know the indexes within the list which, if put together, would recreate the teststring.
teststring = "Hi this is a test!"
s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
The expected output would be (the strings in the list s that would make up the teststring need to appear consecutively -> [0,4,5] would be wrong):
[3,4,5]
Does anyone know how to do that?
I tried to come up with a decent solution, but found nothing that worked.
So far I just record, for every substring in s, whether it appears in the teststring:
test_list = []
for si in s:
    if si in teststring:
        flag = True
    else:
        flag = False
    test_list.append(flag)
Then you would get: [True, True, False, True, True, True, False, False]
...and then one would have to take the indexes of the longest consecutive run of True. Does anyone know how to get those indexes?
If what you want is a list of consecutive indices that form the string when concatenated, I think this will do what you're looking for:
teststring = "Hi this is a test!"
s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
test_list = []
i = 0 # the index of the current element si
for si in s:
    if si in teststring:
        # add the index to the list
        test_list.append(i)
        # check to see if the concatenation of the elements at these
        # indices form the string. if so, this is the list we want, so exit the loop
        if ' '.join(str(s[t]) for t in test_list) == teststring:
            break
    else:
        # if we've hit a substring not in our teststring, clear the list because
        # we only want consecutive indices
        test_list = []
    i += 1
This is a little convoluted, but it does the job:
start_index = ' '.join(s).index(teststring)
s_len = 0
t_len = 0
indices = []
found = False
for i, sub in enumerate(s):
    s_len += len(sub) + 1 # To account for the space
    if s_len > start_index:
        found = True
    if found:
        t_len += len(sub)
        if t_len > len(teststring):
            break
        indices.append(i)
Join the list into a large string, find the target string in the large string, then determine the starting and ending indices by checking the length of each string in the list.
>>> teststring = "Hi this is a test!"
>>> s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
>>> joined = ' '.join(s)
>>> index = joined.index(teststring)
>>> lengths = list(map(len, s))
>>> loc = 0
>>> for start,ln in enumerate(lengths):
...     if loc == index:
...         break
...     loc += ln + 1
...
>>> dist = 0
>>> for end,ln in enumerate(lengths[start:], start=start):  # slice so lengths line up with the indices
...     if dist >= len(teststring):
...         break
...     dist += ln + 1
...
>>> list(range(start, end))
[3, 4, 5]
This is how I would approach the problem, hope it helps:
def rebuild_string(teststring, s):
    for i in range(len(s)): # loop through our whole list
        if s[i] in teststring:
            index_list = [i] # reset each time
            temp_string = teststring
            temp_string = temp_string.replace(s[i], "").strip()
            while i < len(s) - 1: # loop until end of list for each run through for loop
                if len(temp_string) == 0: # we've eliminated all characters
                    return index_list # all matches are found, so we'll break all our loops and exit
                i += 1 # we need to manually increment i inside while loop, but reuse variable because we need initial i from for loop
                if s[i] in temp_string: # the next item in list is also in our string
                    index_list.append(i)
                    temp_string = temp_string.replace(s[i], "").strip()
                else:
                    break # go back to for loop and try again
    return None # no match exists in the list
my_test = "Hi this is a test!"
list_of_strings = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
print(rebuild_string(my_test, list_of_strings))
Result:
[3, 4, 5]
Basically I just found where the list item exists in the main string, and then the next successive list items must also exist in the string, until there is nothing left to match (stripping white spaces along the way). This would match strings that are put in the list out of order too, so long as when they are combined they recreate the entire string. Not sure if that's what you were going for though...
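If the pieces must be consecutive and in order, another option is to try every start position and extend the window while the joined pieces are still a prefix of the target. This is a minimal sketch under that assumption; find_consecutive_indices and the sep parameter are hypothetical names, and it assumes the pieces were separated by single spaces.

```python
def find_consecutive_indices(teststring, parts, sep=" "):
    """Return consecutive indices of `parts` that join to `teststring`, or None."""
    for start in range(len(parts)):
        for end in range(start + 1, len(parts) + 1):
            candidate = sep.join(parts[start:end])
            if candidate == teststring:
                return list(range(start, end))
            if not teststring.startswith(candidate):
                break  # this window can no longer grow into a match
    return None


teststring = "Hi this is a test!"
s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
print(find_consecutive_indices(teststring, s))  # [3, 4, 5]
```

Unlike the replace-based approach, this does not accept pieces that appear out of order.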

Find elements from list and store nearby elements in a list - Python

I have a list of tokenized words. I search it for certain words and want to store the 3 elements on either side of each word found. The code is:
Words_to_find -- List of words to find
tokens -- large list from which I have to find from words_to_find
for x in words_to_find:
    if x in tokens:
        print "Matched word is", x
        indexing = tokens.index(x)
        print "This is index :", indexing
        count = 0
        lower_limit = indexing - 3
        upper_limit = indexing + 3
        print "Limits are", lower_limit,upper_limit
        for i in tokens:
            if count >= lower_limit and count <= upper_limit:
                print "I have entered the if condition"
                print "Count is : ",count
                wording = tokens[count]
                neighbours.append(wording)
            else:
                count +=1
                break
            count +=1
        final_neighbour.append(neighbours)
print "I am in access here", final_neighbour
I am not able to find what is wrong in this code. I take a lower and an upper limit, try to save the elements between them in a list, and build a list of lists out of it (final_neighbour).
Please help me find the issue. Thanks in advance.
We can use slicing to get the neighbours rather than iterating using counts.
tokens = [u'प्रीमियम',u'एंड',u'गिव',u'फ्रॉम',u'महाराष्ट्रा',u'मुंबई',u'इंश्योरेंस',u'कंपन‌​ी',u'फॉर',u'दिस']
words_to_find = [u'फ्रॉम',u'महाराष्ट्रा']
final_neighbours = {}
for i in words_to_find:
    if i in tokens:
        print "Matched word : ",i
        idx = tokens.index(i)
        print "this is index : ",idx
        idx_lb = idx-3
        idx_ub = idx+4
        print "Limits : ",idx_lb,idx_ub
        only_neighbours = tokens[idx_lb : idx_ub]
        only_neighbours.remove(i)
        final_neighbours[i]= only_neighbours
for k,v in final_neighbours.items():
    print "\nKey:",k
    print "Values:"
    for i in v:
        print i,
Output:
Matched word : फ्रॉम
this is index : 3
Limits : 0 7
Matched word : महाराष्ट्रा
this is index : 4
Limits : 1 8
Key: महाराष्ट्रा
Values:
एंड गिव फ्रॉम मुंबई इंश्योरेंस कंपन‌​ी
Key: फ्रॉम
Values:
प्रीमियम एंड गिव महाराष्ट्रा मुंबई इंश्योरेंस
The neighbours are different for each word, so reset the list to empty for every word. Also, count should start at indexing-3 (that is, lower_limit) if that is >= 0 and at 0 otherwise, because what you need are the three words before and after the word found.
for x in words_to_find:
    neighbours=[] # the neighbour for the new word will change, therefore make it null!
    if x in tokens:
        print "Matched word is", x
        indexing = tokens.index(x)
        print "This is index :", indexing
        lower_limit = indexing - 3
        upper_limit = indexing + 3
        count = lower_limit if lower_limit >=0 else 0 # lower_limit starts from the index-3 of the word found!
        print "Limits are", lower_limit,upper_limit,count
        for i in tokens:
            if count >= lower_limit and count <= upper_limit:
                print "I have entered the if condition"
                print "Count is : ",count
                wording = tokens[count]
                neighbours.append(wording)
            else:
                count +=1
                break
            count +=1
        final_neighbour.append(neighbours)
print "I am in access here", final_neighbour
Sample IO(some random token and words_to_find for testing purpose):
tokens='hi this is hi keerthana hello world hey hi hello'.split()
words_to_find=['hi','hello']
I am in access here [['hi', 'this', 'is', 'hi'], ['is', 'hi', 'keerthana', 'hello', 'world', 'hey', 'hi']]
Suggestion
You can use list slicing to get the 3 words before and after the matched word. This will also give the desired output!
lower_limit = lower_limit if lower_limit >=0 else 0
neighbours.append(tokens[lower_limit:upper_limit+1])
That is,
final_neighbour=[]
for x in words_to_find:
    neighbours=[] # the neighbour for the new word will change, therefore make it null!
    if x in tokens:
        print "Matched word is", x
        indexing = tokens.index(x)
        print "This is index :", indexing
        lower_limit = indexing - 3
        upper_limit = indexing + 3
        lower_limit = lower_limit if lower_limit >=0 else 0 # lower_limit starts from the index-3 of the word found!
        print "Limits are", lower_limit,upper_limit
        neighbours.append(tokens[lower_limit:upper_limit+1])
        final_neighbour.append(neighbours)
print "I am in access here", final_neighbour
Hope it helps!
You have the line below in the for loop:
neighbours.append(wording)
What is "neighbours"?
You should initialize it before the append statement (outside the loop, preferably at the start of the code where you defined tokens and words_to_find) as below:
neighbours = []
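The slicing idea from the answers above can be wrapped in a small helper that clamps the lower bound, so a match near the start of the list does not produce a negative slice index. This is a minimal Python 3 sketch; neighbours_of and radius are hypothetical names, and like tokens.index() it only looks at the first occurrence of the word.

```python
def neighbours_of(tokens, word, radius=3):
    """Return up to `radius` tokens on each side of the first occurrence of `word`."""
    if word not in tokens:
        return []
    idx = tokens.index(word)
    start = max(0, idx - radius)  # clamp so the slice never wraps around
    # Slicing past the end of a list is safe, so no upper clamp is needed.
    return tokens[start:idx] + tokens[idx + 1:idx + radius + 1]


tokens = 'hi this is hi keerthana hello world hey hi hello'.split()
final_neighbour = [neighbours_of(tokens, w) for w in ['hi', 'hello']]
print(final_neighbour)
```

Each call builds a fresh list, so there is no shared neighbours list to reset between words.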

Finding a longest string in for loop

I have been given the task of writing a program in which the user inputs 8 words, after which the program prints the longest word entered and its length. I'm having problems finding the longest string entered. Here's my code:
counter = 0
for i in range(8):
    x = str(input('Enter a word: '))
    counter = len(x)
    if counter == max(counter):
        print('The longest word is: ', counter)
which of course doesn't work.
max can take an argument key which is applied to each element:
words = [raw_input('Enter a word: ') for _ in xrange(8)]
max_word = max(words, key=len)
This ought to do it: it sorts the list by the length of each string (using len as the key), and [-1] picks the last (the longest) one.
words = []
for i in range(8):
    words.append(raw_input('Enter a word: '))
longestWord = sorted(words, key=len)[-1]
print 'The longest word is %s (%s character%s)' % (longestWord, len(longestWord), len(longestWord) != 1 and 's' or '')
Mind you it is somewhat inefficient in that it stores all inputs in the array until the loop is over. Maybe better would be this:
longestWord = ''
for i in range(8):
    word = raw_input('Enter a word: ')
    if len(word) > len(longestWord):
        longestWord = word
print 'The longest word is %s (%s character%s)' % (longestWord, len(longestWord), len(longestWord) != 1 and 's' or '')
Consider keeping the lengths in a list and finding the max value in that list.
counter=""
for i in range(8):
    x=str(input('Enter a word: '))
    if len(counter) < len(x):
        counter = x
print('The longest word is: ',counter)  # counter holds the longest word; x is only the last one entered
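For Python 3, which the question uses, the max(..., key=len) idea can be sketched as a small function that takes the words as a list, so it can be tried without typing input. longest_word is a hypothetical name; when several words share the maximum length, max returns the first of them.

```python
def longest_word(words):
    """Return the longest string in `words`, or '' if the list is empty."""
    return max(words, key=len) if words else ""


# In the real program the list would come from input(), for example:
# words = [input('Enter a word: ') for _ in range(8)]
words = ["pear", "banana", "fig", "cherry"]
word = longest_word(words)
print('The longest word is:', word, 'with', len(word), 'characters')
```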

How do I display an item which is in the list multiple times?

Using .index() always gives me only the first position at which the word appears. I want index to hold every position in the list where the word appears. For example, if the user typed chicken, the output would be: This word appears in the places 0, 4.
dlist = ["chicken","potato","python","hammer","chicken","potato","hammer","hammer","potato"]
x=None
while x != "":
    print ("\nPush enter to exit")
    x = input("\nGive me a word from this list: Chicken, Potato, Python, or Hammer")
    y = x.lower()
    if y in dlist:
        count = dlist.count(y)
        index = dlist.index(y)
        print ("\nThis word appears",count,"times.")
        print ("\nThis word appears in the places",index)
    elif y=="":
        print ("\nGood Bye")
    else:
        print ("\nInvalid Word or Number")
You can use
r = [i for i, w in enumerate(dlist) if w == y]
print ("\nThis word appears",len(r),"times.")
print ("\nThis word appears in the places", r)
instead of
count = dlist.count(y)
index = dlist.index(y)
print ("\nThis word appears",count,"times.")
print ("\nThis word appears in the places",index)
all_indexes = [idx for idx, value in enumerate(dlist) if value == y]
Something along these lines should work:
index_list = [i for i in xrange(len(dlist)) if dlist[i] == "hammer"]
That gives the list [3, 6, 7] in your example...
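Since the program asks for words repeatedly, the positions can also be collected once up front instead of rescanning the list on every query. This is a minimal sketch; build_positions is a hypothetical helper name.

```python
from collections import defaultdict


def build_positions(items):
    """Map each item to the list of indices at which it occurs."""
    positions = defaultdict(list)
    for i, item in enumerate(items):
        positions[item].append(i)
    return positions


dlist = ["chicken", "potato", "python", "hammer", "chicken",
         "potato", "hammer", "hammer", "potato"]
positions = build_positions(dlist)
print("This word appears in the places", positions["chicken"])  # [0, 4]
```

The count is then simply len(positions[y]). Looking a word up with positions.get(y, []) avoids adding empty entries for words that are not in the list.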

Loop Issue with Local Variable

I'm using Python (3.x) to create a simple program for an assignment. It takes a multiline input, and if there is more than one consecutive whitespace it strips them out and replaces it with one whitespace. [That's the easy part.] It must also print the value of the most consecutive whitespaces in the entire input.
Example:
input = ("This is the input.")
Should print:
This is the input.
3
My code is below:
def blanks():
    #this function works wonderfully!
    all_line_max= []
    while True:
        try:
            strline= input()
            if len(strline)>0:
                z= (maxspaces(strline))
                all_line_max.append(z)
                y= ' '.join(strline.split())
                print(y)
                print(z)
            if strline =='END':
                break
        except:
            break
        print(all_line_max)
def maxspaces(x):
    y= list(x)
    count = 0
    #this is the number of consecutive spaces we've found so far
    counts=[]
    for character in y:
        count_max= 0
        if character == ' ':
            count= count + 1
            if count > count_max:
                count_max = count
            counts.append(count_max)
        else:
            count = 0
    return(max(counts))
blanks()
I understand that this is probably horribly inefficient, but it seems to almost work. My issue is this: once the loop has finished appending to all_line_max, I would like to print the largest value of that list. However, there doesn't seem to be a way to print the max of that list without doing it on every line, if that makes sense. Any ideas on my convoluted code?
Just print the max of all_line_max, right where you currently print the whole list:
print(max(all_line_max))
but leave it at the top level (so dedent once):
def blanks():
    all_line_max = []
    while True:
        try:
            strline = input()
            if strline:
                z = maxspaces(strline)
                all_line_max.append(z)
                y = ' '.join(strline.split())
                print(y)
            if strline == 'END':
                break
        except Exception:
            break
    print(max(all_line_max))
and remove the print(z) call, which prints the maximum whitespace count per line.
Your maxspaces() function adds count_max to your counts list each time a space is found; not the most efficient method. You don't even need to keep a list there; count_max needs to be moved out of the loop and will then correctly reflect the maximum space count. You also don't have to turn the sentence into a list, you can directly loop over a string:
def maxspaces(x):
    max_count = count = 0
    for character in x:
        if character == ' ':
            count += 1
            if count > max_count:
                max_count = count
        else:
            count = 0
    return max_count
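As an alternative to counting character by character, the runs of spaces can be found with a regular expression. This is a minimal sketch, not the approach from the answer above; collapse_and_measure is a hypothetical name. It only treats the space character as whitespace and, unlike ' '.join(line.split()), does not strip leading or trailing spaces.

```python
import re


def collapse_and_measure(line):
    """Return (line with runs of spaces collapsed, length of the longest run)."""
    runs = re.findall(r" +", line)  # every maximal run of spaces
    longest = max((len(run) for run in runs), default=0)
    collapsed = re.sub(r" +", " ", line)
    return collapsed, longest


text, longest = collapse_and_measure("This   is the  input.")
print(text)
print(longest)
```

For multi-line input, the per-line maxima can be collected in a list and the overall maximum printed once after the loop, as described above.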
