Explanation about split in python - python

I have this task.
st = 'print only the words that sstart with an s in the sstatement'
and the solution would be
for word in st.split():
    if word[0] == 's':
        print word
Why won't it work with
for word in st.split():
    if word[1] == 's':
        print word
I kind of understand what that zero stands for, but how can I print the words whose second letter is 's'?

The problem is that a word is not guaranteed to be long enough to have an index 1. Here the one-character word 's' ends up in the word list, so word[1] raises an IndexError. (An empty string would cause the same problem, although split() without arguments never produces one.)
A quick fix is to use a length check:
for word in st.split():
    if len(word) > 1 and word[1] == 's':
        print word
Or you can, as @idjaw says, use slicing, which gives an empty string if the index is out of range:
for word in st.split():
    if word[1:2] == 's':
        print word
If you have a string, you can obtain a substring with st[i:j], with st the string, i the first index (inclusive) and j the last index (exclusive). If the indices are out of range, that is not a problem: you simply obtain the empty string. So we construct a slice that starts at index 1 and ends just before index 2. If no such index exists, we obtain the empty string (which is not equal to 's'); otherwise we obtain a string with exactly one character: the one at index 1.
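To make the out-of-range behaviour of slicing concrete, here is a minimal Python 3 sketch. The helper name second_letter_is is made up for illustration and is not part of the original answer.

```python
def second_letter_is(word, letter):
    """Return True if the character at index 1 of `word` equals `letter`."""
    # word[1:2] is '' when the word has fewer than two characters,
    # so no IndexError can occur here (unlike word[1]).
    return word[1:2] == letter


st = 'print only the words that sstart with an s in the sstatement'
matches = [word for word in st.split() if second_letter_is(word, 's')]
print(matches)  # ['sstart', 'sstatement']
```

The one-character word 's' is simply skipped, because its slice is the empty string.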
In the case however you will check against more complicated patterns, you can use a regex:
import re
rgx = re.compile(r'\b\ws\w*\b')
rgx.findall('print only the words that sstart with an s in the sstatement')
Here the pattern matches, between word boundaries \b, one word character \w followed by an s and then any number of further word characters \w*:
>>> rgx.findall('print only the words that sstart with an s in the sstatement')
['sstart', 'sstatement']

Related

How to limit the results of a Python if-in statement when checking if a string is found in another string?

I wrote a Python for loop that goes through each word in the English language (from nltk.corpus import words), and prints words made only of 6 letters provided by the user. The 6 user inputs are stored in a list named characters, so the for loop compares the items from the list to each string (english words).
The problem is that words are printed that use the same character more than once. For example, if the characters are 'u, l, c, i, e, n', words with repeated letters such as "icicle" are returned. How do I prevent the script from returning words with duplicate letters?
characters = [input1, input2, input3, input4, input5, input6]
for word in word_list:
    word = word.lower()
    if len(word) == 3:
        if word[0] in characters and word[1] in characters and word[2] in characters:
            print(word)
    elif len(word) == 4:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters:
            print(word)
    elif len(word) == 5:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters:
            print(word)
    elif len(word) == 6:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters and word[5] in characters:
            print(word)
I know the code is inefficiently written, so I'd appreciate tips on improvement as well. An example of the results of the above script is:
eel
eileen
eli
ell
elle
ellen
ellice
encell
ennui
eunice
ice
iceni
icicle
ilicic
ilicin
ill
inn
inulin
This is untested since I have no test data, but should do:
characters = [input1, input2, input3, input4, input5, input6]
for word in word_list:
    word = word.lower()
    isIn = True
    for c in word:
        if c not in characters or word.count(c) != 1:
            isIn = False
    if isIn:
        print(word)
I don't know this package, but it sounds like your word list is big.
You should use a keyword tree instead of looping through the whole list every time new letters are given. It is possible that this package contains better data structures for accessing those words; if not, you should transform the list into a trie. It is a one-time task, and after it lookups become faster for every input.
To answer your question, you can make a dictionary that maps the input letters to their quantities. For example:
input = {'a':1, 'b':2, 'c':1}
Then, if you loop over each word, you can count each letter, but that is costly. If you are using a trie, you only need to go over the children of the current node and make a recursive call if
input[children's letter] != 0
Before the recursive call you need to decrement that value, and after the call increment it again.
This way, you only go over the words that start with your letters instead of going over every word, every time.
Hope it helps :)
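The trie idea described above could be sketched roughly as follows in Python 3. This is only an illustration under assumptions: the trie is a plain nested dict, and the names END, build_trie and find_words are invented here, not taken from nltk or from the answer.

```python
END = object()  # hypothetical sentinel key marking the end of a complete word


def build_trie(words):
    """Build a nested-dict trie from an iterable of words (one-time cost)."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_words(node, available, prefix=""):
    """Collect every word below `node` that can be spelled from `available`.

    `available` maps each letter to how many times it may still be used.
    """
    found = []
    if END in node:
        found.append(prefix)
    for ch, child in node.items():
        if ch is END:
            continue
        if available.get(ch, 0) != 0:
            available[ch] -= 1   # use the letter before the recursive call
            found.extend(find_words(child, available, prefix + ch))
            available[ch] += 1   # give it back afterwards
    return found


trie = build_trie(["ice", "icicle", "nice", "line", "eel"])
available = {'u': 1, 'l': 1, 'c': 1, 'i': 1, 'e': 1, 'n': 1}
print(sorted(find_words(trie, available)))  # ['ice', 'line', 'nice']
```

Only branches whose next letter is still available are visited, so "icicle" and "eel" are abandoned as soon as a letter runs out.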
You can use collections.Counter.
from collections import Counter
Then, to get Counter objects (essentially multisets) which count how many times each character occurs in the word and in the inputted allowed characters:
word_counter = Counter(word)
characters_counter = Counter(characters)
To check that the word is a subset of the characters, and print if so, do
if word_counter & characters_counter == word_counter:
    print(word)
(& means intersection)
Very simple. It is also quick, because Counter is a standard-library hash map that is optimized and probably written in C, instead of costly multiple-level list loops with finds, additions and removals. It has the added benefit that if a user enters the same character multiple times, words with that character repeated are allowed, up to however many times the user entered it.
For example, if the user entered "i, i, c, c, l, e" then the word "icicle" would still be printed, whereas if they entered "i, i, c, z, l, e" then "icicle" would not be printed.
from collections import Counter
# input characters, get words...
characters_counter = Counter(characters)
for word in word_list:
    word_counter = Counter(word)
    if word_counter & characters_counter == word_counter:
        print(word)
Done!
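The same multiset check can be wrapped in a small function so it is easy to try on the examples mentioned above. This is a Python 3 sketch; the name can_build is hypothetical, and it uses Counter subtraction rather than the & intersection from the answer, which should be equivalent for this purpose.

```python
from collections import Counter


def can_build(word, characters):
    """Return True if `word` uses each character no more often than `characters` provides it."""
    # Counter subtraction keeps only positive counts, so the result is empty
    # exactly when every letter of `word` is covered by `characters`.
    return not (Counter(word) - Counter(characters))


print(can_build("icicle", ["i", "i", "c", "c", "l", "e"]))  # True
print(can_build("icicle", ["i", "i", "c", "z", "l", "e"]))  # False
print(can_build("ice", ["u", "l", "c", "i", "e", "n"]))     # True
```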
My first thought about the efficiency is:
def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters: # Does everything in 2 lines :)
            return False
    return True
This function returns False if the word has letters not in the list "characters", and True otherwise.
The reason I used a function is simply because it is neater and you can run the code from any point in your program easily. Make sure you use a copy of the list "characters" if you need to use it in the future:
copy_of_chars = characters.copy()
test_word(word, copy_of_chars)
About the duplicate letters- I would delete any letter in the list that has been "found":
def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters:
            return False
        characters.pop(characters.index(word[i])) # Removes the letter from the list "characters"
    return True
This function will return False if the word has characters not in the list characters, or if it has multiple letters when only one can be found in the list "characters". Otherwise it will return True.
Hope this helps!
Didn't test it:
for word in word_list:
    if len(word) <= 6:
        if all(letter in characters for letter in word.lower()):
            print(word)

How to check for duplicate letter in a array of strings. python

So if I have a list
word = ['cat','laap','cabb']
How can I check if the adjacent letter is a duplicate? I was thinking that with i position I can go to the first index of the array and with j to check every individual letter? Is this the right approach?
Note: the indentation in my code might be wrong; I find Stack Overflow's code formatting a little weird.
words = ['cat','laap','cabb']
for i in range(len(words)):
    for j in range(len(words)):
        if(words[i][j]==word[i][j+1]):
            print('dup')
I think that what you wanted to write is more like
words = ['cat','laap','cabb']
for word in words:
    for index, character in enumerate(word):
        if word[index + 1] == character:
            print('dup')
But beware, this fails at the last letter (even your code does). To avoid this, you can write
for word in words:
    for index, character in enumerate(word[:-1]):
        if word[index + 1] == character:
            print('dup')
Use zip(word, word[1:]) to create 2 element tuples of consecutive letters in a word, convert the tuple to set and check if its length is 1 to check if both the elements are same, and use any to check that for any letter pair
>>> words = ['cat','laap','cabb']
>>> any(any(len(set(t))==1 for t in zip(w, w[1:])) for w in words)
True
Maybe you can also try using zip with a set comprehension:
results = {w for w in word for ch1, ch2 in zip(w, w[1:]) if ch1 == ch2}
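As a small Python 3 sketch of the zip-based idea, the pairwise comparison can be wrapped in a helper so the matching words themselves are reported. The function name has_adjacent_duplicate is invented for this example.

```python
def has_adjacent_duplicate(word):
    """Return True if any two neighbouring characters of `word` are equal."""
    # zip(word, word[1:]) pairs each character with its successor and
    # stops before running past the end, so no index check is needed.
    return any(a == b for a, b in zip(word, word[1:]))


words = ['cat', 'laap', 'cabb']
print([w for w in words if has_adjacent_duplicate(w)])  # ['laap', 'cabb']
```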

How to get left and right most indexes from matching word inside of string

With:
phrase = "this is string example....wow!!!"
word = "example"
I want to know what the left-most and right-most indexes are for a matching word inside of the phrase. How to do it with a minimum coding?
The output should be two integer values. The first is the index of the first letter of word: "e". The second is the index of the last letter of word: the same character "e" (in "example" the first and the last letters are the same). We need to find where in the phrase "this is string example....wow!!!" the word "example" is.
This is a way to do it without using the re package. It will also return the beginning/end indices of all occurrences of word:
phrase = "this is string example....wow!!!"
word = "example"
word_len = len(word) # word length is 7
phrase_len = len(phrase) # phrase length is 32
#We loop through the phrase using a "window" size equal to the word length
#If we find a match, we print the first and last index of the "current" window
for i in range(phrase_len - word_len+1):
    current = phrase[i:i+word_len]
    if current == word:
        print i,i+word_len-1
#prints 15 21
I assume you only want to find the first occurrence of word in phrase. If that's the case, just use str.index to get the position of the first character. Then, add len(word) - 1 to it to get the position of the last character.
start = phrase.index(word) # 15
end = start + len(word) - 1 # 21
If you need to find indexes of all occurrences of word, it's much easier to use the re module:
import re
for m in re.finditer(word, "example example"):
    print(m.start(), m.end() - 1)
Prints
0 6
8 14
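One caveat with the re approach: re.finditer treats word as a regular expression, so a word containing characters such as "." or "!" may not be matched literally. A minimal Python 3 sketch that escapes the word first is shown here; the helper name find_spans is hypothetical.

```python
import re


def find_spans(word, phrase):
    """Return (first_index, last_index) for each non-overlapping occurrence of `word`."""
    # re.escape makes every character of `word` match literally.
    return [(m.start(), m.end() - 1) for m in re.finditer(re.escape(word), phrase)]


phrase = "this is string example....wow!!!"
print(find_spans("example", phrase))  # [(15, 21)]
print(find_spans("....", phrase))     # [(22, 25)]
```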
How about this:
import re
phrase = "this example is example string example....wow example!!!"
word = "example"
start, end = min([(m.start(0), m.end(0)) for m in re.finditer(word, phrase)])
print start, end - 1 # 5 11
print phrase[start:end] # example

Returning list of sorted words

I am trying to return a list of sorted words where the letters are alphabetically sorted. For example:
>>> sorted_words(["bet", "abacus", "act", "celebration", "door"])
['act', 'bet', 'door']
My function should return a new list of sorted values, but it must keep out any words where the first letter has lower or equal unicode than the following letters. For example, "door" is appended to a new list because 'd' <= 'o' and 'o' <= 'o' and 'o' <= 'r'. This is what I have written so far, but I'm having no luck.
def sorted_words[wordlist]:
    result = []
    for word in wordlist:
        if word[0] <= word[1:]:
            result.append(word)
    print(word)
I know this isn't right, I just don't know how to compare the first letter of each word with the rest of the letters. Any help would be greatly appreciated. I also have to use the sorted() method but I am unsure how to use that.
You should divide your code in two sections: the first to select the words (according to the criteria), the second to sort the selected words.
There is a discrepancy between the description of the criteria and the example. If the criterion is "the first letter has lower or equal unicode than the following letters", then the word "abacus" should be included. However, the explanation you provide for "door" seems to be that "the unicode values of the characters should be in (not strictly) ascending order"
("strictly ascending" means < while "not strictly" means <= between an element and its following one).
For the first criterion, use the following code
WORDS = ["bet", "abacus", "act", "celebration", "door"]
list_of_words = []
for word in WORDS:
    if all(word[0] <= c for c in word[1:]):
        list_of_words.append(word)
list_of_words.sort()
print (list_of_words)
For the second criterion, use
list2_of_words = []
for word in WORDS:
    if all(c1 <= c2 for c1,c2 in zip(word[0:-1],word[1:])):
        list2_of_words.append(word)
list2_of_words.sort()
print (list2_of_words)
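Since the question mentions having to use sorted(), the second criterion can also be sketched with sorted() doing both jobs: checking that a word's letters are already in order, and ordering the result. This Python 3 version is one possible reading of the task, not the only one.

```python
def sorted_words(wordlist):
    """Return, in alphabetical order, the words whose letters are in non-decreasing order."""
    # sorted(word) returns the word's characters as a sorted list, so a word
    # qualifies exactly when its own character list is already sorted.
    return sorted(word for word in wordlist if list(word) == sorted(word))


print(sorted_words(["bet", "abacus", "act", "celebration", "door"]))
# ['act', 'bet', 'door']
```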
You need another loop in there, so:
def sorted_words(wordlist):
    result = []
    for word in wordlist:
        keepWord = True
        for letter in word[1:]:
            if word[0] >= letter:
                keepWord = False
                break
        if keepWord:
            result.append(word)
    return result
print(sorted_words(["bet", "abacus", "act", "celebration", "door"]))
['bet', 'act', 'door']

How do I match vowels?

I am having trouble with a small component of a bigger program I am working on. Basically, I need to have a user input a word, and I need to print the index of the first vowel.
word= raw_input("Enter word: ")
vowel= "aeiouAEIOU"
for index in word:
    if index == vowel:
        print index
However, this isn't working. What's wrong?
Try:
word = raw_input("Enter word: ")
vowels = "aeiouAEIOU"
for index,c in enumerate(word):
    if c in vowels:
        print index
        break
for .. in will iterate over actual characters in a string, not indexes. enumerate will return indexes as well as characters and make referring to both easier.
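For readers on Python 3, where raw_input and the print statement no longer exist, the same enumerate idea might look like the sketch below. The function name first_vowel_index and the -1 return value for "no vowel" are choices made for this example.

```python
def first_vowel_index(word, vowels="aeiouAEIOU"):
    """Return the index of the first vowel in `word`, or -1 if there is none."""
    # enumerate yields (index, character) pairs; next() stops at the first
    # match and falls back to -1 when the generator is exhausted.
    return next((i for i, ch in enumerate(word) if ch in vowels), -1)


print(first_vowel_index("string"))  # 3
print(first_vowel_index("rhythm"))  # -1
```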
Just to be different:
import re
def findVowel(s):
    match = re.match('([^aeiou]*)', s, flags=re.I)
    if match:
        index = len(match.group(1))
        if index < len(s):
            return index
    return -1 # not found
The same idea using list comprehension:
word = raw_input("Enter word: ")
res = [i for i,ch in enumerate(word) if ch.lower() in "aeiou"]
print(res[0] if res else None)
index == vowel asks if the letter index is equal to the entire vowel list. What you want to know is if it is contained in the vowel list. See some of the other answers for how in works.
One alternative solution, and arguably a more elegant one, is to use the re library.
import re
word = raw_input('Enter a word:')
try:
    print re.search('[aeiou]', word, re.I).start()
except AttributeError:
    print 'No vowels found in word'
In essence, the re library implements a regular expression matching engine. re.search() searches for the regular expression specified by the first string in the second one and returns the first match. [aeiou] means "match a or e or i or o or u" and re.I tells re.search() to make the search case-insensitive.
for i in range(len(word)):
    if word[i] in vowel:
        print i
        break
will do what you want.
"for index in word" loops over the characters of word rather than the indices. (You can loop over the indices and characters together using the "enumerate" function; I'll let you look that up for yourself.)
