Find string in list of splitted string

I have a string teststring and a list of substrings s in which the teststring was accidentally split into pieces. I would like to know the indexes within the list whose elements, put together, recreate the teststring.
teststring = "Hi this is a test!"
s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
The expected output would be (the strings in the list s that make up the teststring need to appear consecutively, so [0,4,5] would be wrong):
[3,4,5]
Does anyone know how to do that?
I tried to come up with a decent solution, but found nothing that worked.
So far I just record, for every substring in s, whether it appears in the teststring:
test_list = []
for si in s:
    if si in teststring:
        flag = True
    else:
        flag = False
    test_list.append(flag)
This gives: [True, True, False, True, True, True, False, False]
...and then one would have to take the indexes of the longest consecutive run of True. Does anyone know how to get those indexes?
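A minimal sketch of that last step, assuming a list of flags like the one above: itertools.groupby collapses the list into runs of equal values, and the longest run of True gives the indexes. The helper name longest_true_run is hypothetical. Note that this only finds the longest run; it does not verify that the run actually rebuilds teststring.

```python
from itertools import groupby

def longest_true_run(flags):
    """Return the indexes of the longest run of consecutive True values."""
    best = []
    pos = 0  # index at which the current run starts
    for value, group in groupby(flags):
        length = len(list(group))
        if value and length > len(best):
            best = list(range(pos, pos + length))
        pos += length
    return best

flags = [True, True, False, True, True, True, False, False]
print(longest_true_run(flags))  # [3, 4, 5]
```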

If what you want is a list of consecutive indices that form the string when concatenated, I think this will do what you're looking for:
teststring = "Hi this is a test!"
s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
test_list = []
i = 0  # the index of the current element si
for si in s:
    if si in teststring:
        # add the index to the list
        test_list.append(i)
        # check to see if the concatenation of the elements at these
        # indices form the string. if so, this is the list we want, so exit the loop
        if ' '.join(str(s[t]) for t in test_list) == teststring:
            break
    else:
        # if we've hit a substring not in our teststring, clear the list because
        # we only want consecutive indices
        test_list = []
    i += 1

This is a little convoluted, but it does the job:
start_index = ' '.join(s).index(teststring)
s_len = 0
t_len = 0
indices = []
found = False
for i, sub in enumerate(s):
    s_len += len(sub) + 1  # To account for the space
    if s_len > start_index:
        found = True
    if found:
        t_len += len(sub)
        if t_len > len(teststring):
            break
        indices.append(i)

Join the list into a large string, find the target string in the large string, then determine the starting and ending indices by checking the length of each string in the list.
>>> teststring = "Hi this is a test!"
>>> s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
>>> joined = ' '.join(s)
>>> index = joined.index(teststring)
>>> lengths = list(map(len, s))
>>> loc = 0
>>> for start, ln in enumerate(lengths):
...     if loc == index:
...         break
...     loc += ln + 1
...
>>> dist = 0
>>> for end, ln in enumerate(lengths[start:], start=start):
...     if dist == len(teststring) + 1:  # +1: each piece is counted with one trailing space
...         break
...     dist += ln + 1
...
>>> list(range(start, end))
[3, 4, 5]

This is how I would approach the problem, hope it helps:
def rebuild_string(teststring, s):
    for i in range(len(s)):  # loop through our whole list
        if s[i] in teststring:
            index_list = [i]  # reset each time
            temp_string = teststring
            temp_string = temp_string.replace(s[i], "").strip()
            while i < len(s) - 1:  # loop until end of list for each run through for loop
                if len(temp_string) == 0:  # we've eliminated all characters
                    return index_list  # all matches are found, so we'll break all our loops and exit
                i += 1  # we need to manually increment i inside while loop, but reuse variable because we need initial i from for loop
                if s[i] in temp_string:  # the next item in list is also in our string
                    index_list.append(i)
                    temp_string = temp_string.replace(s[i], "").strip()
                else:
                    break  # go back to for loop and try again
    return None  # no match exists in the list
my_test = "Hi this is a test!"
list_of_strings = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
print(rebuild_string(my_test, list_of_strings))
Result:
[3, 4, 5]
Basically I just found where the list item exists in the main string, and then the next successive list items must also exist in the string, until there is nothing left to match (stripping white spaces along the way). This would match strings that are put in the list out of order too, so long as when they are combined they recreate the entire string. Not sure if that's what you were going for though...
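For comparison, here is a minimal brute-force sketch of the same task, assuming the pieces were produced by splitting on single spaces. It tries every consecutive slice of the list and compares the joined result with the target. The function name find_consecutive_indices and the sep parameter are hypothetical, not taken from any of the answers above.

```python
def find_consecutive_indices(teststring, pieces, sep=" "):
    """Return indexes of the first consecutive slice that joins to teststring."""
    for start in range(len(pieces)):
        for end in range(start + 1, len(pieces) + 1):
            candidate = sep.join(pieces[start:end])
            if candidate == teststring:
                return list(range(start, end))
            if len(candidate) >= len(teststring):
                break  # longer slices from this start cannot match
    return None  # no consecutive slice rebuilds the string

teststring = "Hi this is a test!"
s = ["Hi", "this is", "Hello,", "Hi", "this is", "a test!", "How are", "you?"]
print(find_consecutive_indices(teststring, s))  # [3, 4, 5]
```

Unlike the substring-based checks, this only accepts slices that reproduce the string exactly and in order, at the cost of a nested loop.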

Related

Is there any way to split string into array by spaces in another string in Python?

I have two input strings. The first contains words separated by spaces. The second is a single word with the same number of characters, but without spaces. The task is to split the second string into an array at the positions of the spaces in the first one.
I tried to do it with loops, but I get an index out of range error and I can't find another solution.
a = str(input())
b = str(input())
b_word = str()
b_array = list()
for i in range(len(a)):
    if a[i] != " ":
        b_word += b[i]
    else:
        b_array += b_word
        b_word = str()
print(b_array)
Input:
>>head eat
>>aaabbbb
Output:
Traceback (most recent call last):
  File "main.py", line 29, in <module>
    b_word += b[i]
IndexError: string index out of range
Expected output:
>> ["aaab", "bbb"]
Thanks in advance!

Consider a solution based on an iterator and the itertools.islice function:
import itertools
def split_by_space(s1, s2):
    chunks = s1.split()
    it = iter(s2)  # second string as iterator
    return [''.join(itertools.islice(it, len(c))) for c in chunks]
print(split_by_space('head eat or', 'aaaabbbcc'))  # ['aaaa', 'bbb', 'cc']

a = input() # you don't need to wrap these in str() since in python3 input always returns a string
b = input()
output = list()
for i in a.split(' '):  # split input a by spaces
    output.append(b[:len(i)])  # split input b
    b = b[len(i):]  # update b
print(output)
Output:
['aaab', 'bbb']

You can do something like this:
a = input()
b = input()
splitted_b = []
idx = 0
for word in a.split():
    w_len = len(word)
    splitted_b.append(b[idx:idx+w_len])
    idx += w_len
print(splitted_b)
The idea is to take consecutive substrings from b, each as long as the corresponding word in a.
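The same idea can be sketched with itertools.accumulate, which turns the word lengths into running end offsets. This is only an illustration, assuming b has at least as many characters as a has non-space characters; the function name split_like is hypothetical.

```python
from itertools import accumulate

def split_like(a, b):
    """Cut b into chunks whose lengths match the words of a."""
    lengths = [len(word) for word in a.split()]
    ends = list(accumulate(lengths))   # running totals: end offset of each chunk
    starts = [0] + ends[:-1]           # each chunk starts where the previous one ended
    return [b[i:j] for i, j in zip(starts, ends)]

print(split_like("head eat", "aaabbbb"))  # ['aaab', 'bbb']
```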

Instead of using indices, you can iterate over each character of a. If the character is not a space, add the next character of b to your b_word. If it is a space, add b_word to the b_array.
b_iter = iter(b)  # Create an iterator from b so we can get the next character when needed
b_word = []
b_array = []
for char in a:
    # If char is a space, and b_word isn't empty, append it to the result
    if char == " " and b_word:
        b_array.append("".join(b_word))
        b_word = []
    else:
        b_word.append(next(b_iter))  # Append the next character from b to b_word
if b_word:  # If anything left over in b_word, append it to the result
    b_array.append("".join(b_word))
Which gives b_array = ['aaab', 'bbb']
Note that I changed b_word to a list that I .append to every time I add a character. This prevents the entire string from being recreated every time you append a character.
Then join all the characters using "".join(b_word) before appending it to b_array.

To accommodate any number of spaces in the input, it gets a bit more complex, because the indexes of the letters shift with each space that is added. To gather all of the spaces in the string, I wrote this loop, which accounts for multiple spaces and adjusts the index for each space already found in the first string.
indexs = []
new = ''
for i in range(len(a)):
    if len(indexs) > 0:
        if a[i] == ' ':
            indexs.append(i-len(indexs))
    else:
        if a[i] == ' ':
            indexs.append(i)
Then we simply build a new string from b that includes spaces at the predetermined indexes.
for i in range(len(b)):
    if i in indexs:
        print(i)
        new += " "
        new += b[i]
    else:
        new += b[i]
print(new)
Hope this helps.

Code
sone = input()
stwo = 'zzzzzxxxyyyyy'
nwz = []
wrd = ''
cnt = 0
idx = 0
spc = sone.split(' ')  # split by whitespace
a = [len(i) for i in spc]  # list word lengths w/out ws
for i in stwo:
    if cnt == a[idx]:  # if current iter eq. word length w/out ws
        nwz.append(wrd)  # append the word
        wrd = ''  # clear old word
        wrd = wrd + i  # start new word
        idx = idx + 1
        cnt = 0
    else:
        wrd = wrd + i  # building word
    cnt = cnt + 1
nwz.append(wrd)  # append remaining word
print(nwz)
Result
>'split and match'
['zzzzz', 'xxx', 'yyyyy']

Python : how to decode the text in s-language (strings)

The input is a sentence or text in s-language: Isis thasat yousour sisisteser?
The output is the same sentence without s-language: Is that your sister?
Problems:
I have the following code but some things are not working. For example, I cannot append after the if statements, and my if statements are too literal. Also, the print(decoded_text) is not working.
Method:
I iterate through the positions of the text with two variables ("vowelgroup" to store the vowels, and "decoded_text" to store the consonants and, when an "s" is found, to replace it with the vowelgroup).
text = input('Enter a text: ')
vowelgroup = []
decoded_text = []
sentence = []
vowel = 'aeiou'
count = 0
for i in text:
    if i is not vowel and not "s":
        sentence = decoded_text.append(i)
    if i is vowel:
        vowelgroup = vowelgroup.append(vowel)
    if i is "s":
        decoded_text = sentence.append(vowelgroup)
    count += 1
print(decoded_text)

You look a little confused about how append works. append just adds its argument to the end of the list, and it returns None. E.g.:
l = [1,2]
a = l.append(3)
print(a)  # None
print(l)  # [1, 2, 3]
So you just use it like:
l = [1,2]
l.append(3)
print(l)  # [1, 2, 3]
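As for the decoding itself, here is a minimal sketch under an assumption inferred from the single example in the question: a vowel group, followed by "s", followed by the same vowel group again, collapses to one copy of the vowel group. This rule and the function name decode_s_language are assumptions, not something the question states explicitly.

```python
import re

def decode_s_language(text):
    # ([aeiou]+) captures a vowel group; "s" and a repeat of that group (\1) follow it.
    # The whole match is replaced by the captured group alone.
    return re.sub(r"([aeiou]+)s\1", r"\1", text, flags=re.IGNORECASE)

print(decode_s_language("Isis thasat yousour sisisteser?"))  # Is that your sister?
```

Ordinary words that happen to contain such a pattern would be shortened too, so this is only suitable for text known to be fully encoded.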

How do I find all indexes that have same string in a list?

In a game of Hangman, if the hidden word is hello and the player guesses l then I need to find the index of both locations.
Example:
word = "hello"
guess = "l"
position = word.index(guess) #this helps me find the first one
I couldn't come up with any way to find the second. How am I able to do that?

Well, you could use enumerate and a list comprehension:
>>> s = "hello"
>>> indexes = [i for i, v in enumerate(s) if v == "l"]
>>> indexes
[2, 3]

Specifically for hangman:
>>> word = 'hello'
>>> guess = 'l'
>>> puzzle = ''.join(i if i == guess else '_' for i in word)
>>> print(puzzle)
__ll_

Another thing you could do is preprocess the word and keep the list of indexes for each letter in a dictionary, so that you iterate through the string only once instead of on every guess.
word = "hello"
positions = {}  # named positions rather than map, to avoid shadowing the built-in
for i, c in enumerate(word):
    if c in positions:
        positions[c].append(i)
    else:
        positions[c] = [i]
Then check whether the guessed letter is in positions. If it is, the letter occurs in the word at those indexes; otherwise it does not.
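The same preprocessing can be sketched with collections.defaultdict, which removes the explicit membership test. The function name index_letters is hypothetical.

```python
from collections import defaultdict

def index_letters(word):
    """Map each letter of word to the list of indexes where it occurs."""
    positions = defaultdict(list)
    for i, c in enumerate(word):
        positions[c].append(i)  # a missing key starts out as an empty list
    return dict(positions)

lookup = index_letters("hello")
print(lookup.get("l", []))  # [2, 3]
print(lookup.get("z", []))  # []
```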

Make program continue action till line has been created

Currently my program takes line1, such as "taaaaaaaaaaNataggggggggggNccc", and cuts one character off the end until it matches line2, such as "taaaaaaaaaaNcccggggggggggNccc". Once they match, it concatenates them to form line3; if they don't match, it should cut another character off. How can I make it repeat the cutting action until they match and line3 has been made? I have thought about for and while loops but am unsure how to express this. Everything else in the program works as it should, but when matching fails it just stops and won't go back to try trimming again.
I have tried the code below, where magic(matching) is essentially the counting code used to identify how much the two lines match; if the result is below 8, the cutting should be repeated. However, when used, it requires matching and magic to be defined before the while loop, which is right at the start, and this messes up the rest of the code.
while magic(matching) >=8:
    line3=line1+line2
    print ("Matching and merging has occured as shown below")
    print (line3)
The code of interest is below:
n = 0
consec_matches = []
chars = defaultdict(int)
for k, group in groupby(zip(line1_u_i, line2_u_rev_comp_join_i), class_chars):
    elems = len(list(group))
    chars[k] += elems
    if k == 'match':
        consec_matches.append((n, n+elems-1))
    n += elems
print ("Print chars below")
print (chars)
print ("Print consec_matches below")
print (consec_matches)
print ([x for x in consec_matches if x[1]-x[0] >= 9])
print (" Matches longer than 10 below")
list = [x for x in consec_matches if x[1]-x[0] >= 9]
flatten_list= [x for y in list for x in y]
print (flatten_list)
print ("Flatterend list")
matching=[y[1] for y in list for x in y if x ==0 ]
print ("Matching list below")
print (matching)
magic = lambda matching: int(''.join(str(i) for i in matching) or 0)
print (" Print magic matching below")
print (magic(matching))
line2_u_rev_comp_join_i_l = line2_u_rev_comp_join_i[magic(matching):]
print ("Print line2_u_rev_comp_join_i_l type below")
print (type(line2_u_rev_comp_join_i_l))
print ("Print line2_u_rev_comp_join_i_l sequence below")
print (line2_u_rev_comp_join_i_l)
line2_u_rev_comp_join_i_l_str = ''.join(line2_u_rev_comp_join_i_l)
print ('List of line2 converted to string')
print ("List2 before as list below")
print (line2_u_rev_comp_join_i_l)
print ("Line 2 reprinted when string as below")
print (line2_u_rev_comp_join_i_l_str)
print (line1_u_i)
print ("Magic below")
print (magic)
if magic(matching) >=8:
    line3=line1_u_i+line2_u_rev_comp_join_i_l_str
    print ("Matching and merging has occured as shown below")
    print (line3)
else:
    continue
The cutting code is:
line2_u_rev_comp_join_i = line2_u_rev_comp_join[1:]
line1_u_i = line1_u[:-1]

l1 = "taaaaaaaaaaNataggggggggggNccc"
l2 = "taaaaaaaaaaNcccggggggggggNccc"
l1_ten = l1[0:10]  # first ten chars
l2_ten = l2[0:10]
if l1_ten == l2_ten:
    print(l1_ten + l2_ten)
taaaaaaaaataaaaaaaaa
If you want the chars that are equal at the same index in each string:
l1 = "taaaaaaaaaaNataggggggggggNccc"
l2 = "taaaaaaaaaaNcccggggggggggNccc"
count = 0
slice = 0
new_s = ''
while count < len(l1):
    if l1[slice] == l2[slice]:
        new_s += l1[slice]
    count += 1
    slice += 1
In [13]: new_s
Out[13]: 'taaaaaaaaaaNggggggggggNccc'
You can use a for loop to achieve the same:
new_s1 = ""
for i in range(len(l1)):
    if l1[i] == l2[i]:
        new_s1 += l1[i]
I have assumed you are using strings of equal length.
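The "repeat the cutting until they match" part of the question can be sketched as a while loop that tests first and trims only on failure. This is a simplified stand-in, not the question's groupby-based matching: it assumes a match means the two lines agree on at least min_run leading characters, and the names leading_matches, trim_until_match and min_run are all hypothetical.

```python
def leading_matches(a, b):
    """Count how many characters a and b share from the start."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def trim_until_match(line1, line2, min_run=8):
    """Trim line1 from the end and line2 from the front until they match."""
    while line1 and line2:
        if leading_matches(line1, line2) >= min_run:
            return line1 + line2   # matched: build line3
        line1 = line1[:-1]         # cut one character off the end of line1
        line2 = line2[1:]          # cut one character off the start of line2
    return None                    # ran out of characters without a match

print(trim_until_match("abcdefghij", "xabcdefghi"))  # abcdefghiabcdefghi
```

Because the test sits inside the loop, nothing has to be computed before the loop starts, which is the problem described in the question.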

how to reverse two characters in a string python

I was wondering how to reverse two characters in a string.
Here are some examples:
'wing' => 'iwng', 'inwg', 'ingw'
'west' => 'ewst', 'eswt', 'estw'
I was going to use any answers given and put it in a while loop so I can get all the possible combinations of a string while swapping two characters at a time.
ex.
counter = 0
while (counter <= len(str1)):
    if str1 == reverse(str2):
        return str2
    elif str1 == str2:
        return str2
    else:
        str1 = *some code that would swap the characters m and n*
        str1 =
        n += 1
        m += 1
return False
This code compares two strings, str1 to str2, and checks to see if they are the same by swapping the characters around.
ALSO, is there a way i can get this to produce a list of the results instead of printing them?
THANKS!

Try this:
s = 'wing'  # or 'west'
l = [x for x in s]
for i in range(len(s)-1):
    l[i], l[i+1] = l[i+1], l[i]
    print("".join(l))
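To collect the results in a list instead of printing them, as the question also asks, the same loop can be wrapped in a function. This is a minimal sketch; the function name bubble_first_char is hypothetical.

```python
def bubble_first_char(s):
    """Move the first character one position to the right per step, keeping each result."""
    chars = list(s)
    results = []
    for i in range(len(chars) - 1):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # swap neighbours in place
        results.append("".join(chars))
    return results

print(bubble_first_char("wing"))  # ['iwng', 'inwg', 'ingw']
print(bubble_first_char("west"))  # ['ewst', 'eswt', 'estw']
```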

In order to generate all possibilities, we can use:
s = "yourstring"
for i in range(len(s) - 1):  # every adjacent pair, including the last one
    print(s[:i] + s[i+1] + s[i] + s[i+2:])

Since you wish to actually compare two strings to see if they "are the same by swapping two characters around," you do not actually need to generate all possible combinations, instead you can iterate through each of the characters in each of the strings and ensure that no more than two of them are not equal.
This can be done as follows:
def twoCharactersDifferent(str1,str2):
    if sorted(str1) != sorted(str2):  # they must contain the same letters, exactly!
        return False
    numDifferent = 0
    for i in range(len(str1)):
        numDifferent += (str1[i] != str2[i])
        if numDifferent > 2:
            return False
    return True
print(twoCharactersDifferent('wings','winxg'))
