So I was exploring on coderbyte.com and one of the challenges is to find the longest word in a string. My code to do so is the following:
def LongestWord(sen):
    current="";
    currentBest=0
    numberOfLettersInWord=0
    longestWord=0
    temp=sen.split()
    for item in temp:
        listOfCharacters=list(item)
        for currentChar in listOfCharacters:
            if currentChar.isalpha():
                numberOfLettersInWord+=1
        if numberOfLettersInWord>longestWord:
            longestWord=numberOfLettersInWord
            numberOfLettersInWord=0
            currentBest=item
    z = list(currentBest)
    x=''
    for item in z:
        if item.isalpha(): x+=item
    return x

testCase="a confusing /:sentence:/ this"
print(LongestWord(testCase))
When testCase is "a confusing /:sentence:/", the code returns 'confusing', which is the correct answer. But when the test case is the one in the code above, it returns 'this' instead of 'confusing'.
Any ideas as to why this is happening?
I know this doesn't answer your question directly, but here is how I would find the longest word, and keeping it to myself wouldn't help you either:
import re

def func(text: str) -> str:
    words = re.findall(r"[\w]+", text)
    return max(words, key=len)

print(func('a confusing /:sentence:/ this'))
Let me suggest another approach, which is more modular and more Pythonic.
Let's make a function to measure word length:
def word_length(w):
    return sum(ch.isalpha() for ch in w)
So it will count (using sum()) how many characters there are for which .isalpha() is True:
>>> word_length('hello!!!')
5
>>> word_length('/:sentence:/')
8
Now, from a list of words, create a list of lengths. This is easily done with map() (in Python 3, map() returns an iterator, so wrap it in list() to see the values):
>>> sen = 'a confusing /:sentence:/ this'.split()
>>> list(map(word_length, sen))
[1, 9, 8, 4]
Another builtin useful to find the maximum value in a list is max():
>>> max(map(word_length, sen))
9
But you want to know the word which maximizes the length, which in mathematical terms is called argument of the maximum.
To solve this, zip() the lengths with the words, and take the second element of the (length, word) pair returned by max().
Since this is useful in many cases, make it a function:
def arg_max(func, values):
    return max(zip(map(func, values), values))[1]
Now the longest word is easily found with:
>>> arg_max(word_length, sen)
'confusing'
Note: PEP-0008 (Style Guide for Python Code) suggests that function names be lower case and with words separated by underscore.
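As a side note, the built-in max() also accepts a key function, which picks the word directly without the zip() step. A minimal sketch, reusing word_length from above (on ties it returns the first longest word, whereas the zip() version returns the tied word that sorts last):

```python
def word_length(w):
    # Count only the alphabetic characters of a word
    return sum(ch.isalpha() for ch in w)

words = 'a confusing /:sentence:/ this'.split()
# key= tells max() to compare words by their letter count
longest = max(words, key=word_length)
print(longest)  # confusing
```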
You loop through the words that make up the sentence. However, numberOfLettersInWord is only reset when a new longest word is found, so otherwise it keeps increasing as you iterate over the words.
You have to set the counter to 0 each time you start a new word.
for item in temp:
    numberOfLettersInWord = 0
It solves your issue as you can see: https://ideone.com/y1cmHX
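For reference, here is a sketch of the whole function with the counter reset at the start of every word. The name longest_word_fixed and the shorter variable names are mine, not from the question; the logic is otherwise the same.

```python
def longest_word_fixed(sentence):
    """Return the word with the most letters (the first one wins ties)."""
    best_word = ""
    best_count = 0
    for item in sentence.split():
        letter_count = 0  # reset for every word: this is the actual fix
        for ch in item:
            if ch.isalpha():
                letter_count += 1
        if letter_count > best_count:
            best_count = letter_count
            best_word = item
    # Strip the non-letters from the winner, as the original code does
    return "".join(ch for ch in best_word if ch.isalpha())

print(longest_word_fixed("a confusing /:sentence:/ this"))  # confusing
```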
Here's a little function I just wrote that will return the longest word, using a regular expression to remove non-alphanumeric characters:
import re

def longest_word(input):
    words = input.split()
    longest = ''
    for word in words:
        word = re.sub(r'\W+', '', word)
        if len(word) > len(longest):
            longest = word
    return longest

print(longest_word("a confusing /:sentence:/ this"))
I am new to Python and I am not sure what is wrong with my syntax or logic here as this seems fairly straightforward. Do I need to split the words into chars?
Count how many words in a list have length 5.
This is what I have so far:
def countWords(lst):
    total=0
    for word in lst:
        if len(word)==5:
            total+=1
return total
Update: There are great answers and explanations here, thank you! Unfortunately, I think the activecode is just not working on this site: https://runestone.academy/runestone/books/published/thinkcspy/Lists/Exercises.html: Question 10.
First you have to fix your indentation, and then you probably want to use another name for your sum variable. I've changed it to found below for you.
def countWords(lst):
    found = 0
    for word in lst:
        if len(word) == 5:
            found += 1
    return found
Then you'll have to call the function, so
countWords(lst)
where lst is the list of words.
First, indentation is very important in Python. Also, avoid using built-in names like sum, len, etc. as variable names. Function names should be lower case, with words separated by underscores. Here is the multiline solution:
def count_words(lst):
    word_count = 0
    for word in lst:
        if len(word) == 5:
            word_count += 1
    return word_count
And here is the one-liner solution:
def count_words(lst):
    return len([word for word in lst if len(word) == 5])
The logic of your code is correct and will give you the total number of words with a length of 5.
You don't need to count the individual characters of a word, as len(word) gives the total number of characters in the word.
To make this solution more flexible and testable for words of different lengths, you can pass the length as a function argument with a default of 5 and check against it inside the function. Here is the code for it:
def countWords(lst,word_length=5):
    total=0
    for word in lst:
        if len(word)==word_length:
            total+=1
    return total
If you want the solution in a single line:
def countWords(lst, word_length=5):
    return sum(1 for word in lst if len(word)==word_length)
You can do this more directly by using a list comprehension to flag the matching words and adding up the flags.
def countWords(lst):
    return sum([int(len(word) == 5) for word in lst])
This iterates through all of the words, checking the length, and adding up the resulting Booleans: True is 1, False is 0, by definition. Actually, you don't *need* the int conversion, but some people prefer it for clarity.
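A quick sketch of why summing Booleans works: bool is a subclass of int in Python, so True and False behave as 1 and 0 in arithmetic. The sample list is made up for illustration.

```python
words = ["12345", "123", "abcde"]
# One Boolean per word: does it have exactly five characters?
flags = [len(word) == 5 for word in words]
print(flags)                  # [True, False, True]
print(sum(flags))             # 2
print(isinstance(True, int))  # True
```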
You can also achieve the same using a map + lambda to single out the words in the list of length 5.
lst = ["12345", "123", "1234", "abcde", "123", "1234", "abcde"]
def countWords(lst):
    return sum(map(lambda word: len(word) == 5, lst))
print(countWords(lst))
outputs:
3
You must remember that in Python, the indentation is important. In your case, since return sum is not indented, it is considered outside of your countWords() function.
The valid code is:
def countWords(lst):
    count=0
    for word in lst:
        if len(word)==5:
            count+=1
    return count
It seems to be an indentation problem; check your indentation in IDLE (it makes it more obvious).
Since you are new to Python, I will show you some cool ways to do this in Python style.
# using a list comprehension
def countWords(lst):
    return len([word for word in lst if len(word) == 5])
# using the filter function
def isWord(word):
    return len(word) == 5
def countWordsFilter(lst):
    #an_iterator = filter(isWord, lst)
    return len(list(filter(isWord, lst)))
# using lambda
def countWordsLambda(lst):
    #an_iterator = filter(lambda word: len(word) == 5, lst)
    return len(list(filter(lambda word: len(word) == 5, lst)))
Let's say we have a string 'abc' and another string 'bcd'. If I do 'abc' in 'bcd' it will return False. I want to say: if any character of 'abc' is in the string 'bcd', then return True. (Python)
edit: thank you for the spelling changes. It makes me feel dumb. They were typos though.
I have tried iterating through the string using for loops, but this is clunky and I am assuming it is not good practice. Anyway I couldn't make it flexible enough for my needs.
import random
symb1 = random.choice('abc#')  # I am trying to test if it chose at least one symbol
symb2 = random.choice('abc!')
mystring = (symb1+symb2)  # let's say mystring is 'a!'
if mystring in '#!':  # I want to test here somehow if part of mystring is in #!
    ...
I want it to output true, and the output is false. I understand why, I just need help creating a way to test for the symbol in mystring
Iterate over one of the strings, doing an in check for each character:
any(c in "#!" for c in mystring)
"Is there any c from mystring in '#!'?"
You could use a list comprehension or a generator expression.
Example:
>>> a = 'abc'
>>> b = 'bcd'
>>> [letter for letter in a if letter in b] # list comprehensions
['b', 'c']
>>> any(letter for letter in a if letter in b) # generator expression
True
As mentioned by @asikorski, I changed this to use a generator expression so the loop stops on the first match.
Okay the comprehensions made more sense but I decided to do a more expanded for loop. I found a way to make it less terrible. I'm just teaching python to a friend and I want him to be able to read the code more easily. here's the section of code in the program.
import random
import string

def punc():
    characters = int(input('How long would you like your password to be? : '))
    passlist = []
    for x in range(characters):
        add = random.choice(string.ascii_letters + '#####$$$$$~~~~!!!!!?????')
        passlist.append(add)
    l = []
    for x in passlist:
        if x in ['#','$','~','!','?']:
            l.append(0)
    if len(l) == 0:
        punc()
    else:
        print(''.join(passlist))
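For comparison, here is a sketch of the same idea with any() and a loop instead of recursion. The names make_password and SYMBOLS are hypothetical, and the length is passed as an argument rather than read with input() so the function is easy to test.

```python
import random
import string

SYMBOLS = '#$~!?'  # hypothetical constant: the characters that count as symbols

def make_password(length):
    """Keep drawing random passwords until one contains at least one symbol."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Same weighted alphabet as in the code above
    alphabet = string.ascii_letters + '#####$$$$$~~~~!!!!!?????'
    while True:
        candidate = ''.join(random.choice(alphabet) for _ in range(length))
        # any() stops at the first symbol it finds
        if any(ch in SYMBOLS for ch in candidate):
            return candidate

print(make_password(12))
```

The while loop retries without growing the call stack, which the recursive version does each time a draw contains no symbol.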
There are a few solutions:
The best one (both efficient and Pythonic):
if any(char in 'bcd' for char in 'abc'):
    ...
It uses a generator expression plus the built-in any(). It doesn't have to build a list with a list comprehension, and it doesn't waste time gathering all the letters that appear in both strings, so it's memory efficient and stops at the first match.
The boring one (~ C style):
def check_letters(word1, word2):
    for char in word1:
        if char in word2:
            return True
    return False

if check_letters('abc', 'bcd'):
    ...
Quite obvious.
The fancy one:
if set(list('abc')) & set(list('bcd')):
    ...
This one uses some tricks. list converts a string to a list of letters, and set creates a set of letters from that list. Then the intersection of the two sets is computed with the & operator, and the if condition evaluates to True if there's any element in the intersection.
It's not very efficient, though; it has to create two lists and two sets.
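A lighter variant of the set approach, as a sketch: set() accepts a string directly, and set.isdisjoint() takes any iterable, so only one set needs to be built. The helper name shares_a_character is made up for this example.

```python
def shares_a_character(first, second):
    """Return True if any character of first also occurs in second."""
    # isdisjoint() is True when the two have nothing in common, so negate it
    return not set(first).isdisjoint(second)

print(shares_a_character('abc', 'bcd'))  # True
print(shares_a_character('a!', '#!'))    # True
print(shares_a_character('abc', 'xyz'))  # False
```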
I'm trying to write an algorithm that, given a bunch of letters, returns all the words that can be constructed from those letters. For instance, given 'car' it should return a list containing [arc, car, a, etc...], and out of that list it should return the best Scrabble word. The problem is in finding the list that contains all the words.
I've got a giant txt file dictionary, line delimited and I've tried this so far:
from collections import Counter

def find_optimal(bunch_of_letters: str):
    words_to_check = []
    c1 = Counter(bunch_of_letters.lower())
    for word in load_words():
        c2 = Counter(word.lower())
        if c2 & c1 == c2:
            words_to_check.append(word)
    max_word = max_word_value(words_to_check)
    return max_word,calc_word_value(max_word)
max_word_value - returns the word with the maximum value of the list given
calc_word_value - returns the word's score in scrabble.
load_words - return a list of the dictionary.
I'm currently using counters to do the trick but, the problem is that I'm currently on about 2.5 seconds per search and I don't know how to optimize this, any thoughts?
Try this:
def find_optimal(bunch_of_letters):
    bunch_of_letters = ''.join(sorted(bunch_of_letters))
    words_to_check = [word for word in load_words() if ''.join(sorted(word)) in bunch_of_letters]
    max_word = max_word_value(words_to_check)
    return max_word, calc_word_value(max_word)
I've just used (or at least tried to use) a list comprehension. Essentially, words_to_check will (hopefully!) be a list of the words in your text file whose sorted letters appear in the sorted input. Be aware that in is a substring test here, so a word like 'ac' is not found for the letters 'abc'.
On a side note, if you don't want to use a gigantic text file for the words, check out enchant!
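Here is a sketch of a letter-count check that handles non-adjacent letters too. It keeps the Counter idea from the question, builds the Counter for the input letters only once, and skips words that are too long before counting them. The names can_build and buildable_words and the sample word list are hypothetical; the real list would come from load_words().

```python
from collections import Counter

def can_build(word, available):
    """True if the Counter `available` covers every letter of word."""
    # Counter subtraction keeps only positive counts, so an empty
    # result means no letter of word is missing from available
    return not (Counter(word) - available)

def buildable_words(letters, dictionary):
    available = Counter(letters.lower())  # built once, not per word
    max_len = len(letters)
    return [w for w in dictionary
            if len(w) <= max_len and can_build(w.lower(), available)]

sample = ['arc', 'car', 'a', 'cart', 'race', 'ra']
print(buildable_words('car', sample))  # ['arc', 'car', 'a', 'ra']
```

How much this helps on a full dictionary would need to be measured; the length pre-filter mainly saves work when the input is short.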
from itertools import permutations
theword = 'car'  # or we can use input('Type in a word: ')
mylist = [permutations(theword, i) for i in range(1, len(theword)+1)]
for generator in mylist:
    for word in generator:
        print(''.join(word))
        # instead of .join just print (word) for tuple
Output:
c
a
r
ca
cr
...
ar rc ra car cra acr arc rca rac
This will give us all the possible combinations (i.e. permutations) of a word.
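If you want to check whether a generated word is an actual word in the English dictionary, we can use enchant:
import enchant
d = enchant.Dict("en_US")
# mylist holds generators of letter tuples (rebuild it if it was already consumed above)
for generator in mylist:
    for letters in generator:
        word = ''.join(letters)
        print(d.check(word), word)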
Conclusion:
If we want to generate all the permutations of the letters of a word, we use this code:
from itertools import permutations
word = 'word'  # or we can use input('Type in a word: ')
solution = permutations(word, 4)
for i in solution:
    print(''.join(i))  # just print(i) if you want a tuple
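If you already have a word list, the permutations can also be filtered against a set instead of enchant. This is only a sketch: words_from_letters is a hypothetical helper and the small dictionary stands in for the real word file. Note that the number of permutations grows very quickly with the number of letters.

```python
from itertools import permutations

def words_from_letters(letters, dictionary):
    """Return the dictionary words that can be spelled from letters."""
    valid = set(dictionary)  # set membership tests are fast
    found = set()            # a set also removes duplicate candidates
    for size in range(1, len(letters) + 1):
        for combo in permutations(letters, size):
            candidate = ''.join(combo)
            if candidate in valid:
                found.add(candidate)
    return sorted(found)

print(words_from_letters('car', ['arc', 'car', 'a', 'cart']))  # ['a', 'arc', 'car']
```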
I'm learning python from Think Python by Allen Downey and I'm stuck at Exercise 6 here. I wrote a solution to it, and at first look it seemed to be an improvement over the answer given here. But upon running both, I found that my solution took a whole day (~22 hours) to compute the answer, while the author's solution only took a couple seconds.
Could anyone tell me how the author's solution is so fast, when it iterates over a dictionary containing 113,812 words and applies a recursive function to each to compute a result?
My solution:
known_red = {'sprite': 6, 'a': 1, 'i': 1, '': 0} #Global dict of known reducible words, with their length as values

def compute_children(word):
    """Returns a list of all valid words that can be constructed from the word by removing one letter from the word"""
    from dict_exercises import words_dict
    wdict = words_dict() #Builds a dictionary containing all valid English words as keys
    wdict['i'] = 'i'
    wdict['a'] = 'a'
    wdict[''] = ''
    res = []
    for i in range(len(word)):
        child = word[:i] + word[i+1:]
        if child in wdict:
            res.append(child)
    return res

def is_reducible(word):
    """Returns true if a word is reducible to ''. Recursively, a word is reducible if any of its children are reducible"""
    if word in known_red:
        return True
    children = compute_children(word)
    for child in children:
        if is_reducible(child):
            known_red[word] = len(word)
            return True
    return False

def longest_reducible():
    """Finds the longest reducible word in the dictionary"""
    from dict_exercises import words_dict
    wdict = words_dict()
    reducibles = []
    for word in wdict:
        if 'i' in word or 'a' in word: #Word can only be reducible if it is reducible to either 'I' or 'a', since they are the only one-letter words possible
            if word not in known_red and is_reducible(word):
                known_red[word] = len(word)
    for word, length in known_red.items():
        reducibles.append((length, word))
    reducibles.sort(reverse=True)
    return reducibles[0][1]
wdict = words_dict() #Builds a dictionary containing all valid English words...
Presumably, this takes a while.
However, you regenerate this same, unchanging dictionary many times for every word you try to reduce. What a waste! If you make this dictionary once, and then re-use that dictionary for every word you try to reduce like you do for known_red, the computation time should be greatly reduced.
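A minimal sketch of that idea: build the word set once and share it, together with a memo of results, across every call. The name make_reducer and the tiny sample word list are assumptions for illustration; the real words would come from words_dict().

```python
def make_reducer(words):
    """Return an is_reducible function that shares one word set and one memo."""
    valid = set(words) | {'a', 'i', ''}  # built once, reused for every call
    memo = {'': True}                    # remembers failures as well as successes

    def is_reducible(word):
        if word in memo:
            return memo[word]
        result = False
        if word in valid:
            for i in range(len(word)):
                child = word[:i] + word[i + 1:]  # drop the letter at position i
                if child in valid and is_reducible(child):
                    result = True
                    break
        memo[word] = result
        return result

    return is_reducible

sample_words = ['sprite', 'spite', 'spit', 'pit', 'it', 'tree']
is_reducible = make_reducer(sample_words)
print(is_reducible('sprite'))  # True
print(is_reducible('tree'))    # False
```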
First of all, I want to mention that there might not be any real-life applications for this simple script I created, but I did it because I'm learning and I couldn't find anything similar here on SO. I wanted to know what could be done to "arbitrarily" change characters in an iterable like a list.
Sure, title() is a handy tool I learned relatively quickly, but then I got to thinking: what if, just for kicks, I wanted to format (upper case) the last character instead? Or the third, the middle one, etc.? What about lower case? Replacing specific characters with others?
Like I said, this is surely not perfect, but it could give some food for thought to other noobs like myself. Plus, I think this can be modified in hundreds of ways to achieve all kinds of different formatting.
How about helping me improve what I just did? How about making it more lean and mean? Checking for style, methods, efficiency, etc.
Here it goes:
words = ['house', 'flower', 'tree']  #string list
counter = 0  #counter to iterate over the items in list
chars = 4  #character position in string (0,1,2...)
for counter in range (0,len(words)):
    while counter < len(words):
        z = list(words[counter])  # z is a temp list created to slice words
        if len(z) > chars:  # to compare char position and z length
            upper = [k.upper() for k in z[chars]]  # string formatting EX: uppercase
            z[chars] = upper [0]  # replace formatted character with original
            words[counter] = ("".join(z))  # convert and replace temp list back into original word str list
            counter +=1
        else:
            break
print (words)
['housE', 'flowEr', 'tree']
This is somewhat of a combination of both (so +1 to both of them :) ). The main function accepts a list, an arbitrary function and the character to act on:
In [47]: def RandomAlter(l, func, char):
   ....:     return [''.join([func(w[x]) if x == char else w[x] for x in xrange(len(w))]) for w in l]
   ....:
In [48]: RandomAlter(words, str.upper, 4)
Out[48]: ['housE', 'flowEr', 'tree']
In [49]: RandomAlter([str.upper(w) for w in words], str.lower, 2)
Out[49]: ['HOuSE', 'FLoWER', 'TReE']
In [50]: RandomAlter(words, lambda x: '_', 4)
Out[50]: ['hous_', 'flow_r', 'tree']
The function RandomAlter can be rewritten like this, which may make it a bit clearer (the one-line version above takes advantage of a feature called list comprehensions to reduce the lines of code needed).
def RandomAlter(l, func, char):
    # For each word in our list
    main_list = []
    for w in l:
        # Create a container that is going to hold our new 'word'
        new_word = []
        # Iterate over a range that is equal to the number of chars in the word
        # xrange is a more memory efficient 'range' - same behavior
        for x in xrange(len(w)):
            # If the current position is the character we want to modify
            if x == char:
                # Apply the function to the character and append to our 'word'
                # This is a cool Python feature - you can pass around functions
                # just like any other variable
                new_word.append(func(w[x]))
            else:
                # Just append the normal letter
                new_word.append(w[x])
        # Now we append the 'word' to our main_list. However since the 'word' is
        # a list of letters, we need to 'join' them together to form a string
        main_list.append(''.join(new_word))
    # Now just return the main_list, which will be a list of altered words
    return main_list
There are much better Pythonistas than me, but here's one attempt:
[''.join(
    [a[x].upper() if x == chars else a[x]
     for x in xrange(0,len(a))]
    )
 for a in words]
Also, we're talking about the programmer's 4th, right? What everyone else calls 5th, yes?
Some comments on your code:
for counter in range (0,len(words)):
while counter < len(words):
This won't compile unless you indent the while loop under the for loop. And, if you do that, the inner loop will completely screw up the loop counter for the outer loop. And finally, you almost never want to maintain an explicit loop counter in Python. You probably want this:
for counter, word in enumerate(words):
Next:
z = list(words[counter]) # z is a temp list created to slice words
You can already slice strings, in exactly the same way you slice lists, so this is unnecessary.
Next:
upper = [k.upper() for k in z[chars]] # string formatting EX: uppercase
This is a bad name for the variable, since there's a function with the exact same name—which you're calling on the same line.
Meanwhile, the way you defined things, z[chars] is a single character, a copy of words[counter][4]. You can iterate over a single character in Python, because each character is itself a string, but it's generally pointless: [k.upper() for k in z[chars]] is the same thing as [z[chars].upper()].
z[chars] = upper [0] # replace formatted character with original
So you only wanted the list of 1 character to get the first character out of it… why make it a list in the first place? Just replace the last two lines with z[chars] = z[chars].upper().
else:
    break
This is going to stop on the first string shorter than length 4, rather than just skip strings shorter than length 4, which is what it seems like you want. The way to say that is continue, not break. Or, better, just fall off the end of the list. In some cases, it's hard to write things without a continue, but in this case, it's easy—it's already at the end of the loop, and in fact it's inside an else: that has nothing else in it, so just remove both lines.
It's hard to tell with upper that your loops are wrong, because if you accidentally call upper twice, it looks the same as if you called it once. Change the upper to chr(ord(k)+1), which replaces any letter with the next letter. Then try it with:
words = ['house', 'flower', 'tree', 'a', 'abcdefgh']
You'll notice that, e.g., you get 'flowgr' instead of 'flowfr'.
You may also want to add a variable that counts up the number of times you run through the inner loop. It should only be len(words) times, but it's actually len(words) * len(words) if you have no short words, or len(words) * len(<up to the first short word>) if you have any. You're making the computer do a whole lot of extra work—if you have 1000 words, it has to do 1000000 loops instead of 1000. In technical terms, your algorithm is O(N^2), even though it only needs to be O(N).
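The extra work is easy to measure. The sketch below wraps the original nested loops in a hypothetical helper, count_inner_iterations, that counts how often the inner loop body runs. For three words it reports 6 passes where 3 would do, and the gap grows quadratically with the length of the list.

```python
def count_inner_iterations(words, chars):
    """Run the original nested loops and count passes through the inner body."""
    words = list(words)  # work on a copy so the caller's list is untouched
    iterations = 0
    for counter in range(len(words)):
        while counter < len(words):
            iterations += 1
            z = list(words[counter])
            if len(z) > chars:
                z[chars] = z[chars].upper()
                words[counter] = "".join(z)
                counter += 1
            else:
                break
    return iterations

print(count_inner_iterations(['house', 'flower', 'spring'], 4))  # 6
```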
Putting it all together:
words = ['house', 'flower', 'tree', 'a', 'abcdefgh']  #string list
chars = 4  #character position in string (0,1,2...)
for counter, word in enumerate(words):
    if len(word) > chars:  # to compare char position and word length
        z = list(word)
        z[chars] = chr(ord(z[chars]) + 1)  # replace character with next character
        words[counter] = "".join(z)  # convert and replace temp list back into original word str list
print (words)
That does the same thing as your original code (except using "next character" instead of "uppercase character"), without the bugs, with much less work for the computer, and much easier to read.
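Since strings can be sliced directly, the temporary list can be dropped altogether. A sketch with a hypothetical helper, upper_at, which assumes a non-negative index and uses uppercase as in the original question:

```python
def upper_at(word, index):
    """Return word with the character at index upper-cased, if it exists."""
    if index >= len(word):
        return word  # word is too short: leave it unchanged
    # Strings are immutable, so build a new one from three slices
    return word[:index] + word[index].upper() + word[index + 1:]

words = ['house', 'flower', 'tree']
print([upper_at(w, 4) for w in words])  # ['housE', 'flowEr', 'tree']
```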
I think the general case of what you're talking about is a method that, given a string and an index, returns that string, with the indexed character transformed according to some rule.
def transform_string(strng, index, transform):
    lst = list(strng)
    if index < len(lst):
        lst[index] = transform(lst[index])
    return ''.join(lst)

words = ['house', 'flower', 'tree']
output = [transform_string(word, 4, str.upper) for word in words]
To make it even more abstract, you could have a factory that returns a method, like so:
def transformation_factory(index, transform):
    def inner(word):
        lst = list(word)
        if index < len(lst):
            lst[index] = transform(lst[index])
        return ''.join(lst)  # return the rebuilt string
    return inner

transform = transformation_factory(4, lambda x: x.upper())
output = list(map(transform, words))  # list() so the result is a list in Python 3 too