How to make index read entire list? - python

I'm working on a project for my Python course and I'm still pretty new to coding in general. I'm having issues with one of the snippets of my code. I'm trying to make Python find every instance of the word "the" (or any input word really, it doesn't matter.) and return the word immediately after it. I am able to make it return the word after "the", but it stops after one instance when I need it to scan the entire list.
Here is my code:
the_list=['the']
animal_list=['the', 'cat', 'the', 'dog', 'the', 'axolotl']
for the_list in animal_list:
    nextword=animal_list[animal_list.index("the")+1]
    continue
print(nextword)
All I'm returning is cat whereas dog and axolotl should pop up as well. I tried using a for loop and a continue in order to make the code go through the same process for dog and axolotl, but it didn't work.

I am not sure what you are asking, but I think you want to get the animals that are in the list animal_list. Assuming that the word 'the' is at the even indices, you can use this:
animals = [animal for animal in animal_list if animal != 'the']
Since you are a beginner: the previous code uses a list comprehension, which is a Pythonic way to build a list without writing an explicit for loop. The equivalent code using a for loop is:
animals = []
for animal in animal_list:
    if animal != 'the':
        animals.append(animal)

list.index only returns the first instance.
The typical pythonic way is to use a list comprehension:
[animal_list[i+1] for i,val in enumerate(animal_list) if val=='the']

list.index will only find the first occurrence, however you can specify a start and stop value to skip over other indexes.
Now we also need to use a try/except block because list.index will raise a ValueError in the case that it doesn't find a match.
animal_list=['the', 'cat', 'the', 'dog', 'the', 'axolotl']
match = 'the'
i = 0
while True:
    try:
        i = animal_list.index(match, i) + 1 # start search at index i
    except ValueError:
        break
    # can remove this check if certain that your list won't end with 'the'
    # otherwise could raise IndexError
    if i < len(animal_list):
        print(animal_list[i])
However, if you don't have to use list.index, I would suggest the following instead. (Again, you can remove the check if the list won't end with 'the'.)
for i, item in enumerate(animal_list):
    if item == match and i + 1 < len(animal_list):
        print(animal_list[i + 1])
Or, more compactly, use a list comprehension, which produces a list of all items that follow 'the'.
animals = [ animal_list[i + 1] for i, v in enumerate(animal_list)
if v == match and i + 1 < len(animal_list) ]
print(animals)
Note: The use of continue in your code is not correct. continue ends the current iteration of the loop and moves on to the next one. For example:
for i in range(5):
    print(i)
    if i == 2:
        continue
    print(i)
# Output
0
0
1
1
2 # Notice that 2 is printed only once
3
3
4
4

One approach is to zip the list to a shifted version of itself:
keyword = 'the'
animal_list=['the', 'cat', 'the', 'dog', 'the', 'axolotl']
zipped = zip(animal_list, animal_list[1:])
# zipped yields ('the', 'cat'), ('cat', 'the'), ('the', 'dog') etc.
found_words = [after for before, after in zipped if before == keyword]
This will deal with a list that ends in 'the' without raising an error (the final 'the' will simply be ignored).
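As a rough sketch, the zip approach above could be wrapped in a small helper so that it works for any keyword. The function name words_after is made up for this illustration and is not part of the answer:

```python
def words_after(words, keyword):
    """Return every item that directly follows `keyword` in `words`."""
    # Pair each item with its successor. zip stops at the shorter input,
    # so a keyword in the last position has no partner and is skipped.
    return [after for before, after in zip(words, words[1:]) if before == keyword]


animal_list = ['the', 'cat', 'the', 'dog', 'the', 'axolotl']
print(words_after(animal_list, 'the'))            # ['cat', 'dog', 'axolotl']
print(words_after(animal_list + ['the'], 'the'))  # ['cat', 'dog', 'axolotl']
```

Note that animal_list[1:] copies the list, which is fine for short lists like this one.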

the_word = 'the'
animal_list = ['the', 'cat', 'the', 'dog', 'the', 'axolotl']
# Iterate through animal_list by index, so it is easy to get the next element when we find the_word
for i in range(len(animal_list) - 1):
    if animal_list[i] == the_word: # if the current word == the word we want to find
        print(animal_list[i+1]) # print the next word
We don't want to check the last element in animal_list, because nothing follows it. That is why I subtract 1 from the length of animal_list. That way i will have values of 0, 1, 2, 3, 4.

Try this:
the_list=['the']
animal_list=['the', 'cat', 'the', 'dog', 'the', 'axolotl']
# stop one short of the end so that animal_list[i+1] always exists
for i in range(len(animal_list) - 1):
    if animal_list[i] in the_list:
        nextword=animal_list[i+1]
        print(nextword)

This is a very UN-PYTHONIC way of doing this...but perhaps it'll help you understand indexes:
animal_list = ['the', 'cat', 'the', 'dog', 'the', 'axolotl']
index=0
for x in animal_list:
    if x == "the":
        print(animal_list[(index + 1)])
    index +=1

Related

How to remove a word from a list with a specific character in a specific index position

this is what I have so far:
wlist = [word for word in wlist if not any(map(lambda x: x in word, 'c'))]
This code works, but in its current state it removes every string in wlist that contains 'c'. I would like to be able to specify an index position. For example, if
wlist = ['snake', 'cat', 'shock']
wlist = [word for word in wlist if not any(map(lambda x: x in word, 'c'))]
and I select index position 3, then only 'shock' will be removed, since 'shock' is the only string with 'c' at index 3. The current code removes both 'cat' and 'shock'. I have no idea how to integrate this; I would appreciate any help, thanks.
Simply use slicing:
out = [w for w in wlist if w[3:4] != 'c']
Output: ['snake', 'cat']
You could probably use regular expressions. However, I don't know them, so this just iterates through the list of words.
for i in wlist[:]: # iterate over a copy, so removing items does not skip elements
    try:
        if i[3] == 'c':
            wlist.remove(i)
    except IndexError:
        continue
You should check only the character at index 3 of each string. If you use the [<selected_char>:<selected_char>+1] slice, only that one character is checked, and a string that is too short simply yields an empty string instead of raising an error. More about slicing: https://stackoverflow.com/a/509295/11502612
Example code:
checking_char_index = 3
wlist = ["snake", "cat", "shock", "very_long_string_with_c_char", "x", "c"]
out = [word for word in wlist if word[checking_char_index : checking_char_index + 1] != "c"]
print(out)
I have extended your list with some corner-case strings, and it works as expected, as you can see below.
Output:
$ python3 test.py
['snake', 'cat', 'very_long_string_with_c_char', 'x', 'c']
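To illustrate why slicing is used here rather than plain indexing, here is a minimal sketch. The helper name char_at is hypothetical and only serves the demonstration:

```python
def char_at(word, index):
    """Return the character at `index`, or '' if the word is too short."""
    # A slice never raises for an out-of-range position; it returns ''.
    return word[index:index + 1]


print(repr(char_at("shock", 3)))  # 'c'
print(repr(char_at("cat", 3)))    # ''

try:
    "cat"[3]                      # plain indexing raises instead
except IndexError as exc:
    print("IndexError:", exc)
```

This is why the slicing answers need no try/except block, while the indexing answer does.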

for loop exit if condition is not met

I am trying to write a Shiritori game in Python. In the game you are given a word (e.g. dog) and you must add another word that starts with the last letter of the previous word (e.g. doG, Goose).
So given a list words = ['dog', 'goose', 'elephant', 'tiger', 'rhino', 'orc', 'cat'] it must return all the values, but if "elephant" is missing it must return:
["dog","goose"] because "dog" and "goose" match, but "goose" and "tiger" do not.
I am running into a bug where it either goes out of range when checking the next index in the list, or returns only "dog" and not "goose", or returns ["dog","goose"] and then exits the loop without iterating through the rest of the list.
What am I doing wrong?
def game():
words = ['dog', 'goose', 'tiger', 'rhino', 'orc', 'cat']
check_words = ['goose', 'tiger', 'rhino', 'orc', 'cat']
# check words has one less element to avoid index out or range in the for loop
# example = if word[-1] != words[index+1][0]: # index+1 gives error
good_words = []
for index, word in enumerate(words):
for index2, word2 in enumerate(check_words):
# I want to add the correct pair and keep looping if True
if word[-1] == word2[0]:
good_words.extend([word,word2])
return good_words # break out of the loop ONLY when this condition is not met
print(game())
Your code needs an indent after "def game():".
I am not sure why you needed the second for loop.
Here is a solution.
def game():
    words = ['dog', 'goose', 'elephant', 'utiger', 'rhino', 'orc', 'cat']
    good_words = []
    for index in range(0, len(words)):
        if index+1 < len(words):
            previous_word = words[index][-1]
            next_word = words[index+1][0]
            if previous_word == next_word:
                # appends the new word if not in list
                if words[index] in good_words:
                    good_words.append(words[index+1])
                else:
                    # only used for the first time to append the current and the next word
                    good_words.append(words[index])
                    good_words.append(words[index+1])
            else:
                return good_words # break out of the loop ONLY when this condition is not met
    return good_words
print(game())
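For comparison, the same idea can be sketched more compactly by pairing each word with its successor. This is an alternative illustration, not the answer's code; the name shiritori_chain is made up, and it assumes every word is a non-empty string:

```python
def shiritori_chain(words):
    """Return the leading run of `words` in which each word starts with
    the last letter of the word before it."""
    if not words:
        return []
    chain = [words[0]]
    for previous, current in zip(words, words[1:]):
        if previous[-1] != current[0]:
            break  # the chain is broken, so stop at the first mismatch
        chain.append(current)
    return chain


print(shiritori_chain(['dog', 'goose', 'elephant', 'tiger', 'rhino', 'orc', 'cat']))
# ['dog', 'goose', 'elephant', 'tiger', 'rhino', 'orc', 'cat']
print(shiritori_chain(['dog', 'goose', 'tiger', 'rhino', 'orc', 'cat']))
# ['dog', 'goose']
```

One design difference: when even the first pair does not match, this sketch returns the first word alone, whereas the solution above returns an empty list. Pick whichever fits the rules of your game.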

Find difference between two list and print their position and value in list

I'm trying to find the difference between two lists, but I would also like to know the position of the diff items.
My script isn't producing the results I want.
For example:
Here are the lists.
lst1 = ['dog', 'cat', 'plant', 'book', 'lamp']
lst2 = ['dog', 'mouse', 'plant', 'sock', 'lamp']
Here I am getting the position and value.
new_lst1 = [f"{i}, {v}" for i, v in enumerate(lst1)]
new_lst2 = [f"{i}, {v}" for i, v in enumerate(lst2)]
Then I want to find the difference between the two new lists.
def Diff(new_lst1, new_lst2):
    (list(set(new_lst1) - set(new_lst2)))
Afterwards, I want to print the results.
print(new_lst1)
However, I'm getting:
['0, dog', '1, cat', '2, plant', '3, book', '4, lamp']
Sorry for the long explanation!
You split new_lst1, but left new_lst2 intact. First of all, this gets a run-time error, not the output you mention. If it did work, it gives you semantically incompatible elements to compare. Get rid of the split:
def Diff(new_lst1, new_lst2):
    return list(set(new_lst1) - set(new_lst2))
# Afterwards, I want to print the results.
print(Diff(new_lst1, new_lst2))
Output:
['1, cat', '3, book']
You now have the correct information; format to taste.
It seems you're looking for the symmetric_difference of these lists:
>>> set(enumerate(lst1)) ^ set(enumerate(lst2))
{(1, 'mouse'), (1, 'cat'), (3, 'book'), (3, 'sock')}
Unless you're only looking for just the positions:
>>> [i for i, word in enumerate(lst1) if lst2[i] != word]
[1, 3]
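If you want the position together with both differing values, the comparison above can be extended along these lines. This is only a sketch; the function name list_differences is hypothetical:

```python
def list_differences(first, second):
    """Return (index, value_in_first, value_in_second) for each position
    where the two lists disagree."""
    # zip stops at the shorter list, so extra trailing items are ignored.
    return [(i, a, b) for i, (a, b) in enumerate(zip(first, second)) if a != b]


lst1 = ['dog', 'cat', 'plant', 'book', 'lamp']
lst2 = ['dog', 'mouse', 'plant', 'sock', 'lamp']
print(list_differences(lst1, lst2))
# [(1, 'cat', 'mouse'), (3, 'book', 'sock')]
```

Unlike the set-based versions, this keeps the results in list order.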
You can refactor the following code into a function if you like, but this should accomplish what you're trying to do. Remember that list positions start at 0 in Python, so a difference at position 1 means the second item.
lst1 = ['dog', 'cat', 'plant', 'book', 'lamp']
lst2 = ['dog', 'mouse', 'plant', 'sock', 'lamp']
varying_pos = []
for index, item in enumerate(lst1):
    if item != lst2[index]:
        message = str(index) + ', ' + item
        varying_pos.append(message)
print("The 2 lists vary at position:")
for value in varying_pos:
    print(value)

How to check for words in a string even if the order is different - PYTHON

I am trying to make as many words as possible from the letters of a given word, e.g. 'workbook'.
So the results should be like: work, workbook, book, bookwork, bow, row etc.
This is one method I tried, but this won't find words that are spelled in different order. (e.g. it won't append 'bow' even though you could rearrange letters within 'workbook' to write 'bow')
f = open('/usr/share/dict/words', 'r')
test = "workbook"
anagramlist = []
for word in f:
    if word[:-1] in test and len(word[:-1]) > 2:
        anagramlist.append(word[:-1])
        # this won't append 'bookwork', 'row' etc
print(anagramlist) #outputs ['boo', 'book', 'work', 'workbook']
Another method I tried uses sets. But this doesn't work entirely either, because it appends words that, for example, have more than one 'w', like 'wow' or 'wowwow', even though I want it to use only the letters in 'workbook', each no more often than it appears there.
f = open('/usr/share/dict/words', 'r')
test = "workbook"
anagramlist = []
for word in f:
    if len(word) > 2 and set(word[:-1]) == set(test) & set(word[:-1]):
        anagramlist.append(word[:-1])
print(anagramlist)
The output for this one is below. I'm hoping I can fix something in the condition, or maybe this is a completely wrong approach.
['bo', 'bob', 'bobo', 'boo', 'boob', 'boobook', 'book', 'bookwork', 'boor', 'bor', 'boro', 'borrow', 'bow', 'bowk', 'bowwow', 'brob', 'broo', 'brook', 'brow', 'ko', 'kob', 'koko', 'kor', 'or', 'orb', 'ow', 'owk', 'rob', 'rook', 'row', 'wo', 'wob', 'woo', 'work', 'workbook', 'wow', 'wro']
I would really appreciate your help.
First generate all the potential anagrams by computing every permutation of the word and taking every prefix length of each permutation. Then filter potential_anagrams against your word collection f (here a list of words; if you read them from a file, strip the trailing newlines first).
import itertools
def compute_anagrams(word):
    n = len(word) + 1
    permutations = {''.join(p) for p in itertools.permutations(word)}
    potential_anagrams = {p[:i] for i in range(n) for p in permutations}
    return [anagram for anagram in potential_anagrams if anagram in f]
Demonstration:
>>> f = ['book', 'bookwork', 'bow', 'row', 'work', 'workbook']
>>> word = 'workbook'
>>> compute_anagrams(word)
['work', 'bow', 'workbook', 'row', 'bookwork', 'book']
You additionally need to test that for each letter in the dictionary word, it does not appear more times in the dictionary word than it does in "workbook". You could do this for example using the method count() of str.
Of course there are other approaches that in the end might be more efficient, but it's not necessary to start from scratch in order to fix what you have.
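One possible way to express that letter-count test is with collections.Counter instead of calling str.count() per letter. The following is a sketch under that substitution; the function name can_build and the small sample word list are made up for the example:

```python
from collections import Counter


def can_build(candidate, source):
    """True if `candidate` uses no letter more often than `source` has it."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means every letter of `candidate` is covered by `source`.
    return not (Counter(candidate) - Counter(source))


words = ['bow', 'row', 'book', 'bookwork', 'wow', 'borrow', 'workbook']
print([w for w in words if len(w) > 2 and can_build(w, 'workbook')])
# ['bow', 'row', 'book', 'bookwork', 'workbook']
```

Here 'wow' and 'borrow' are rejected because 'workbook' contains only one 'w' and one 'r'.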

Checking superset of list in given order

I have a list of tuples in format (float,string) sorted in descending order.
print sent_scores
[(0.10507038451969995,'Deadly stampede in Shanghai - Emergency personnel help victims.'),
(0.078586381821416265,'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
(0.072031446647399661, '- Emergency personnel help victims.')]
If two sentences in the list share four consecutive words, I want to remove the tuple with the lower score from the list. The new list should also preserve order.
The output of above:
[(0.10507038451969995,'Deadly stampede in Shanghai - Emergency personnel help victims.')]
This will certainly first involve tokenizing the words, which can be done with the code below:
from nltk.tokenize import TreebankWordTokenizer
def tokenize_words(text):
    tokens = TreebankWordTokenizer().tokenize(text)
    contractions = ["n't", "'ll", "'m","'s"]
    fix = []
    for i in range(len(tokens)):
        for c in contractions:
            if tokens[i] == c: fix.append(i)
    fix_offset = 0
    for fix_id in fix:
        idx = fix_id - 1 - fix_offset
        tokens[idx] = tokens[idx] + tokens[idx+1]
        del tokens[idx+1]
        fix_offset += 1
    return tokens
tokenized_sents=[tokenize_words(sentence) for score,sentence in sent_scores]
Earlier I tried converting the words of each sentence into groups of 4 stored in a set, and then using issuperset against the other sentences. But that doesn't check continuity.
I suggest taking sequences of 4 tokens in a row from your tokenized list, and making a set of those tokens. By using Python's itertools module, this can be done rather elegantly:
import itertools
my_list = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
i1 = itertools.islice(my_list, 0, None)
i2 = itertools.islice(my_list, 1, None)
i3 = itertools.islice(my_list, 2, None)
i4 = itertools.islice(my_list, 3, None)
print zip(i1, i2, i3, i4)
Output of the above code (nicely formatted for you):
[('The', 'quick', 'brown', 'fox'),
('quick', 'brown', 'fox', 'jumps'),
('brown', 'fox', 'jumps', 'over'),
('fox', 'jumps', 'over', 'the'),
('jumps', 'over', 'the', 'lazy'),
('over', 'the', 'lazy', 'dog')]
Actually, even more elegant would be:
my_list = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
iterators = [itertools.islice(my_list, x, None) for x in range(4)]
print zip(*iterators)
Same output as before.
Now that you have your list of four consecutive tokens (as 4-tuples) for each list, you can stick those tokens in a set, and check whether the same 4-tuple appears in two different sets:
my_list = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
set1 = set(zip(*[itertools.islice(my_list, x, None) for x in range(4)]))
other_list = ['The', 'quick', 'red', 'fox', 'goes', 'home']
set2 = set(zip(*[itertools.islice(other_list, x, None) for x in range(4)]))
print set1.intersection(set2) # Empty set
if set1.intersection(set2):
    print "Found something in common"
else:
    print "Nothing in common"
# Prints "Nothing in common"
third_list = ['The', 'quick', 'brown', 'fox', 'goes', 'to', 'school']
set3 = set(zip(*[itertools.islice(third_list, x, None) for x in range(4)]))
print set1.intersection(set3) # Set containing ('The', 'quick', 'brown', 'fox')
if set1.intersection(set3):
    print "Found something in common"
else:
    print "Nothing in common"
# Prints "Found something in common"
NOTE: If you're using Python 3, replace all the print "Something" statements with print("Something"): in Python 3, print became a function rather than a statement. Also wrap zip(...) in list(...) before printing it, because in Python 3 zip returns an iterator rather than a list. But if you're using NLTK, I suspect you're using Python 2.
IMPORTANT NOTE: Any itertools.islice objects you create will iterate through their original list once, and then become "exhausted" (they've returned all their data, so putting them in a second for loop will produce nothing, and the for loop just won't do anything). If you want to iterate through the same list multiple times, create multiple iterators (as you see I did in my examples).
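A small demonstration of that exhaustion behaviour, written for Python 3 (the variable names are only for illustration):

```python
import itertools

my_list = ['The', 'quick', 'brown', 'fox']
it = itertools.islice(my_list, 1, None)

first_pass = list(it)   # consumes the iterator
second_pass = list(it)  # nothing is left to yield

print(first_pass)   # ['quick', 'brown', 'fox']
print(second_pass)  # []
```

The original list is untouched, so creating a fresh islice object gives you the items again.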
Update: Here's how to eliminate the lesser-scoring sentences. First, replace this line:
tokenized_sents=[tokenize_words(sentence) for score,sentence in sent_scores]
with:
tokenized_sents=[(score,tokenize_words(sentence)) for score,sentence in sent_scores]
Now what you have is a list of (score, sentence) tuples, where each sentence is a list of tokens. Then we'll construct a list called scores_and_sentences_and_sets that will be a list of (score, sentence, set_of_four_word_groups) tuples (where set_of_four_word_groups is a set of four-word slices like in the example above):
scores_and_sentences_and_sets = [(score, sentence, set(zip(*[itertools.islice(sentence, x, None) for x in range(4)]))) for score,sentence in tokenized_sents]
That one-liner may be a bit too clever, actually, so let's unpack it to be a bit more readable:
scores_and_sentences_and_sets = []
for score, sentence in tokenized_sents:
    set_of_four_word_groups = set(zip(*[itertools.islice(sentence, x, None) for x in range(4)]))
    score_sentence_and_sets_tuple = (score, sentence, set_of_four_word_groups)
    scores_and_sentences_and_sets.append(score_sentence_and_sets_tuple)
Go ahead and experiment with those two code snippets, and you'll find that they do exactly the same thing.
Okay, so now we have a list of (score, sentence, set_of_four_word_groups) tuples. So we'll go through the list in order, and build up a result list consisting of ONLY the sentences we want to keep. Since the list is already sorted in descending order, that makes things a little easier, because it means that at any point in the list, we only have to look at the items that have already been "accepted" to see if any of them have a duplicate; if any of the accepted items are a duplicate of the one we've just looked at, we don't even need to look at the scores, because we know the accepted item came earlier than the one we're looking at, and therefore it must have a higher score than the one we're looking at.
So here's some code that should do what you want:
accepted_items = []
for current_tuple in scores_and_sentences_and_sets:
    score, sentence, set_of_four_words = current_tuple
    found = False
    for accepted_tuple in accepted_items:
        accepted_score, accepted_sentence, accepted_set = accepted_tuple
        if set_of_four_words.intersection(accepted_set):
            found = True
            break
    if not found:
        accepted_items.append(current_tuple)
print accepted_items # Prints a whole bunch of tuples
sentences_only = [sentence for score, sentence, word_set in accepted_items]
print sentences_only # Prints just the sentences
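For readers on Python 3, the whole procedure could be condensed roughly as follows. This is a sketch, not the code above: the function names are made up, and it assumes plain str.split() as a stand-in for the NLTK tokenizer, so punctuation stays attached to words.

```python
def four_word_groups(tokens, size=4):
    """Return the set of all runs of `size` consecutive tokens."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def drop_overlapping(scored_sentences, size=4):
    """Keep a sentence only if it shares no `size`-word run with a
    higher-scoring sentence that was already kept.

    Assumes `scored_sentences` is sorted by score, highest first.
    """
    kept = []            # (score, sentence) pairs to return, in order
    seen_groups = set()  # word runs from every sentence kept so far
    for score, sentence in scored_sentences:
        groups = four_word_groups(sentence.split(), size)  # naive tokenizer
        if groups & seen_groups:
            continue     # overlaps a higher-scoring sentence, so drop it
        kept.append((score, sentence))
        seen_groups |= groups
    return kept


sent_scores = [
    (0.10507038451969995, 'Deadly stampede in Shanghai - Emergency personnel help victims.'),
    (0.078586381821416265, 'Deadly stampede in Shanghai - Police and medical staff help injured people after the stampede.'),
    (0.072031446647399661, '- Emergency personnel help victims.'),
]
print(drop_overlapping(sent_scores))
```

With the sample data from the question, only the first tuple survives. Sentences with fewer than four tokens produce no groups and are therefore always kept.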
