Python word counter - python

I'm taking a Python 2.7 course at school, and they told us to create the following program:
Assume s is a string of lower case characters.
Write a program that prints the longest substring of s in which the letters occur in alphabetical order.
For example, if s = azcbobobegghakl , then your program should print
Longest substring in alphabetical order is: beggh
In the case of ties, print the first substring.
For example, if s = 'abcbcd', then your program should print
Longest substring in alphabetical order is: abc
I wrote the following code:
s = 'czqriqfsqteavw'
string = ''
tempIndex = 0
prev = ''
curr = ''
index = 0
while index < len(s):
curr = s[index]
if index != 0:
if curr < prev:
if len(s[tempIndex:index]) > len(string):
string = s[tempIndex:index]
tempIndex=index
elif index == len(s)-1:
if len(s[tempIndex:index]) > len(string):
string = s[tempIndex:index+1]
prev = curr
index += 1
print 'Longest substring in alphabetical order is: ' + string
The teacher also gave us a series of test strings to try:
onyixlsttpmylw
pdxukpsimdj
yamcrzwwgquqqrpdxmgltap
dkaimdoviquyazmojtex
abcdefghijklmnopqrstuvwxyz
evyeorezmslyn
msbprjtwwnb
laymsbkrprvyuaieitpwpurp
munifxzwieqbhaymkeol
lzasroxnpjqhmpr
evjeewybqpc
vzpdfwbbwxpxsdpfak
zyxwvutsrqponmlkjihgfedcba
vzpdfwbbwxpxsdpfak
jlgpiprth
czqriqfsqteavw
All of them work fine, except the last one, which produces the following answer:
Longest substring in alphabetical order is: cz
But it should say:
Longest substring in alphabetical order is: avw
I've checked the code a thousand times, and found no mistake. Could you please help me?

These lines:
if len(s[tempIndex:index]) > len(string):
string = s[tempIndex:index+1]
don't agree with each other. If the new best string is s[tempIndex:index+1] then that's the string you should be comparing the length of in the if condition. Changing them to agree with each other fixes the problem:
if len(s[tempIndex:index+1]) > len(string):
string = s[tempIndex:index+1]

Indices are your friend.
below is a simple code for the problem.
longword = ''
for x in range(len(s)-1):
for y in range(len(s)+1):
word = s[x:y]
if word == ''.join(sorted(word)):
if len(word) > len(longword):
longword = word
print ('Longest substring in alphabetical order is: '+ longword)

I see that user5402 has nicely answered your question, but this particular problem intrigued me, so I decided to re-write your code. :) The program below uses essentially the same logic as your code with a couple of minor changes.
It is considered more Pythonic to avoid using indices when practical, and to iterate directly over the contents of strings (or other container objects). This generally makes the code easier to read since we don't have to keep track of both the indices and the contents.
In order to get access to both the current & previous character in the string we zip together two copies of the input string, with one of the copies offset by inserting a space character at the start. We also append a space character to the end of the other copy so that we don't have to do special handling when the longest ordered sub-sequence occurs at the end of the input string.
#! /usr/bin/env python
''' Find longest ordered substring of a given string
From http://stackoverflow.com/q/27937076/4014959
Written by PM 2Ring 2015.01.14
'''
data = [
"azcbobobegghakl",
"abcbcd",
"onyixlsttpmylw",
"pdxukpsimdj",
"yamcrzwwgquqqrpdxmgltap",
"dkaimdoviquyazmojtex",
"abcdefghijklmnopqrstuvwxyz",
"evyeorezmslyn",
"msbprjtwwnb",
"laymsbkrprvyuaieitpwpurp",
"munifxzwieqbhaymkeol",
"lzasroxnpjqhmpr",
"evjeewybqpc",
"vzpdfwbbwxpxsdpfak",
"zyxwvutsrqponmlkjihgfedcba",
"vzpdfwbbwxpxsdpfak",
"jlgpiprth",
"czqriqfsqteavw",
]
def longest(s):
''' Return longest ordered substring of s
s consists of lower case letters only.
'''
found, temp = [], []
for prev, curr in zip(' ' + s, s + ' '):
if curr < prev:
if len(temp) > len(found):
found = temp[:]
temp = []
temp += [curr]
return ''.join(found)
def main():
msg = 'Longest substring in alphabetical order is:'
for s in data:
print s
print msg, longest(s)
print
if __name__ == '__main__':
main()
output
azcbobobegghakl
Longest substring in alphabetical order is: beggh
abcbcd
Longest substring in alphabetical order is: abc
onyixlsttpmylw
Longest substring in alphabetical order is: lstt
pdxukpsimdj
Longest substring in alphabetical order is: kps
yamcrzwwgquqqrpdxmgltap
Longest substring in alphabetical order is: crz
dkaimdoviquyazmojtex
Longest substring in alphabetical order is: iquy
abcdefghijklmnopqrstuvwxyz
Longest substring in alphabetical order is: abcdefghijklmnopqrstuvwxyz
evyeorezmslyn
Longest substring in alphabetical order is: evy
msbprjtwwnb
Longest substring in alphabetical order is: jtww
laymsbkrprvyuaieitpwpurp
Longest substring in alphabetical order is: prvy
munifxzwieqbhaymkeol
Longest substring in alphabetical order is: fxz
lzasroxnpjqhmpr
Longest substring in alphabetical order is: hmpr
evjeewybqpc
Longest substring in alphabetical order is: eewy
vzpdfwbbwxpxsdpfak
Longest substring in alphabetical order is: bbwx
zyxwvutsrqponmlkjihgfedcba
Longest substring in alphabetical order is: z
vzpdfwbbwxpxsdpfak
Longest substring in alphabetical order is: bbwx
jlgpiprth
Longest substring in alphabetical order is: iprt
czqriqfsqteavw
Longest substring in alphabetical order is: avw

I have come across this question myself and thought I would share my answer.
My solution works 100% of the time.
The question is to help new Python coders understand loops without having to dig deep into other complex solutions. This bit of code is flatter and uses variable names to make easy reading for new coders.
I added comments to explain the code steps. Without the comments it is very clean and readable.
s = 'czqriqfsqteavw'
test_char = s[0]
temp_str = str('')
longest_str = str('')
for character in s:
if temp_str == "": # if empty = we are working with a new string
temp_str += character # assign first char to temp_str
longest_str = test_char # it will be the longest_str for now
elif character >= test_char[-1]: # compare each char to the previously stored test_char
temp_str += character # add char to temp_str
test_char = character # change the test_char to the 'for' looping char
if len(temp_str) > len(longest_str): # test if temp_char stores the longest found string
longest_str = temp_str # if yes, assign to longest_str
else:
test_char = character # DONT SWAP THESE TWO LINES.
temp_str = test_char # OR IT WILL NOT WORK.
print("Longest substring in alphabetical order is: {}".format(longest_str))

My solution is similar to Nim J's, but it performs less iterations.
res = ""
for n in range(len(s)):
for i in range(1, len(s)-n+1):
if list(s[n:n+i]) == sorted(s[n:n+i]):
if len(list(s[n:n+i])) > len(res):
res = s[n:n+i]
print("Longest substring in alphabetical order is:", res)

Related

A Python program to print the longest consecutive chain of words of the same length from a sentence

I got tasked with writing a Python script that would output the longest chain of consecutive words of the same length from a sentence. For example, if the input is "To be or not to be", the output should be "To, be, or".
text = input("Enter text: ")
words = text.replace(",", " ").replace(".", " ").split()
x = 0
same = []
same.append(words[x])
for i in words:
if len(words[x]) == len(words[x+1]):
same.append(words[x+1])
x += 1
elif len(words[x]) != len(words[x+1]):
same = []
x += 1
else:
print("No consecutive words of the same length")
print(words)
print("Longest chain of words with similar length: ", same)
In order to turn the string input into a list of words and to get rid of any punctuation, I used the replace() and split() methods. The first word of this list would then get appended to a new list called "same", which would hold the words with the same length. A for-loop would then compare the lengths of the words one by one, and either append them to this list if their lengths match, or clear the list if they don't.
if len(words[x]) == len(words[x+1]):
~~~~~^^^^^
IndexError: list index out of range
This is the problem I keep getting, and I just can't understand why the index is out of range.
I will be very grateful for any help with solving this issue and fixing the program. Thank you in advance.
using groupby you can get the result as
from itertools import groupby
string = "To be or not to be"
sol = ', '.join(max([list(b) for a, b in groupby(string.split(), key=len)], key=len))
print(sol)
# 'To, be, or'
len() function takes a string as an argument, for instance here in this code according to me first you have to convert the words variable into a list then it might work.
Thank You !!!

How can find the substrings that are in alphabetical order within a sorted string list? Python

I should create a program that will find the characters that are in alphabetical order in a given input and find how many characters are in that particular substring or substrings.
For example
Input: cabin
Output: abc, 3
Input: sightfulness
Output: ghi, 3
OUtput: stu, 3
Here is what I have coded so far. I am stuck in the part of checking whether the two consecutive letters in my sorted list is in alphabetical order.
I have converted that string input to a list of characters and removed the duplicates. I already sorted the updated list so far.
import string
a = input("Input A: ")
#sorted_a is the sorted letters of the string input a
sorted_a = sorted(a)
print(sorted_a)
#to remove the duplicate letters in sorted_a
#make a temporary list to contain the filtered elements
temp = []
for x in sorted_a:
if x not in temp:
temp.append(x)
#pass the temp list to sorted_a, sorted_a list updated
sorted_a = temp
joined_a = "".join(sorted_a)
print(sorted_a)
alphabet = list(string.ascii_letters)
print(alphabet)
def check_list_order(sorted_a):
in_order_list = []
for i in sorted_a:
if any((match := substring) in i for substring in alphabet):
print(match)
#this should be the part
#that i would compare the element
#in sorted_a with the elements in alphabet
#to know the order of both of them
#and to put them ordered characters
#to in_order_list
if ord(i)+1 == ord(i)+1:
in_order_list.append(i)
return in_order_list
print(check_list_order(sorted_a))
You could try something like this:
string = input("Input string: ")
chars = sorted(set(string.strip().casefold()))
parts, part = [], ""
for a, b in zip(chars, chars[1:] + ["-"]):
part += a
if ord(a) + 1 != ord(b):
if len(part) > 1:
parts.append(part)
part = ""
print(parts)
The result parts for the input-string "sightfulness" would be
['efghi', 'stu']
so I'm not quite sure why your output differs: Is there a requirement that you haven't mentioned?
If - could be part of the string then replace ... + ["-"] with something more suitable. If you want to exclude any characters that are not in the alphabet then you could do:
from string import ascii_lowercase as alphabet
string = input("Input string: ")
chars = sorted(set(string.strip().casefold()).intersection(alphabet))
...

Python: Find the longest word in a string

I'm preparing for an exam but I'm having difficulties with one past-paper question. Given a string containing a sentence, I want to find the longest word in that sentence and return that word and its length. Edit: I only needed to return the length but I appreciate your answers for the original question! It helps me learn more. Thank you.
For example: string = "Hello I like cookies". My program should then return "Cookies" and the length 7.
Now the thing is that I am not allowed to use any function from the class String for a full score, and for a full score I can only go through the string once. I am not allowed to use string.split() (otherwise there wouldn't be any problem) and the solution shouldn't have too many for and while statements. The strings contains only letters and blanks and words are separated by one single blank.
Any suggestions? I'm lost i.e. I don't have any code.
Thanks.
EDIT: I'm sorry, I misread the exam question. You only have to return the length of the longest word it seems, not the length + the word.
EDIT2: Okay, with your help I think I'm onto something...
def longestword(x):
alist = []
length = 0
for letter in x:
if letter != " ":
length += 1
else:
alist.append(length)
length = 0
return alist
But it returns [5, 1, 4] for "Hello I like cookies" so it misses "cookies". Why? EDIT: Ok, I got it. It's because there's no more " " after the last letter in the sentence and therefore it doesn't append the length. I fixed it so now it returns [5, 1, 4, 7] and then I just take the maximum value.
I suppose using lists but not .split() is okay? It just said that functions from "String" weren't allowed or are lists part of strings?
You can try to use regular expressions:
import re
string = "Hello I like cookies"
word_pattern = "\w+"
regex = re.compile(word_pattern)
words_found = regex.findall(string)
if words_found:
longest_word = max(words_found, key=lambda word: len(word))
print(longest_word)
Finding a max in one pass is easy:
current_max = 0
for v in values:
if v>current_max:
current_max = v
But in your case, you need to find the words. Remember this quote (attribute to J. Zawinski):
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Besides using regular expressions, you can simply check that the word has letters. A first approach is to go through the list and detect start or end of words:
current_word = ''
current_longest = ''
for c in mystring:
if c in string.ascii_letters:
current_word += c
else:
if len(current_word)>len(current_longest):
current_longest = current_word
current_word = ''
else:
if len(current_word)>len(current_longest):
current_longest = current_word
A final way is to split words in a generator and find the max of what it yields (here I used the max function):
def split_words(mystring):
current = []
for c in mystring:
if c in string.ascii_letters:
current.append(c)
else:
if current:
yield ''.join(current)
max(split_words(mystring), key=len)
Just search for groups of non-whitespace characters, then find the maximum by length:
longest = len(max(re.findall(r'\S+',string), key = len))
For python 3. If both the words in the sentence is of the same length, then it will return the word that appears first.
def findMaximum(word):
li=word.split()
li=list(li)
op=[]
for i in li:
op.append(len(i))
l=op.index(max(op))
print (li[l])
findMaximum(input("Enter your word:"))
It's quite simple:
def long_word(s):
n = max(s.split())
return(n)
IN [48]: long_word('a bb ccc dddd')
Out[48]: 'dddd'
found an error in a previous provided solution, he's the correction:
def longestWord(text):
current_word = ''
current_longest = ''
for c in text:
if c in string.ascii_letters:
current_word += c
else:
if len(current_word)>len(current_longest):
current_longest = current_word
current_word = ''
if len(current_word)>len(current_longest):
current_longest = current_word
return current_longest
I can see imagine some different alternatives. Regular expressions can probably do much of the splitting words you need to do. This could be a simple option if you understand regexes.
An alternative is to treat the string as a list, iterate over it keeping track of your index, and looking at each character to see if you're ending a word. Then you just need to keep the longest word (longest index difference) and you should find your answer.
Regular Expressions seems to be your best bet. First use re to split the sentence:
>>> import re
>>> string = "Hello I like cookies"
>>> string = re.findall(r'\S+',string)
\S+ looks for all the non-whitespace characters and puts them in a list:
>>> string
['Hello', 'I', 'like', 'cookies']
Now you can find the length of the list element containing the longest word and then use list comprehension to retrieve the element itself:
>>> maxlen = max(len(word) for word in string)
>>> maxlen
7
>>> [word for word in string if len(word) == maxlen]
['cookies']
This method uses only one for loop, doesn't use any methods in the String class, strictly accesses each character only once. You may have to modify it depending on what characters count as part of a word.
s = "Hello I like cookies"
word = ''
maxLen = 0
maxWord = ''
for c in s+' ':
if c == ' ':
if len(word) > maxLen:
maxWord = word
word = ''
else:
word += c
print "Longest word:", maxWord
print "Length:", len(maxWord)
Given you are not allowed to use string.split() I guess using a regexp to do the exact same thing should be ruled out as well.
I do not want to solve your exercise for you, but here are a few pointers:
Suppose you have a list of numbers and you want to return the highest value. How would you do that? What information do you need to track?
Now, given your string, how would you build a list of all word lengths? What do you need to keep track of?
Now, you only have to intertwine both logics so computed word lengths are compared as you go through the string.
My proposal ...
import re
def longer_word(sentence):
word_list = re.findall("\w+", sentence)
word_list.sort(cmp=lambda a,b: cmp(len(b),len(a)))
longer_word = word_list[0]
print "The longer word is '"+longer_word+"' with a size of", len(longer_word), "characters."
longer_word("Hello I like cookies")
import re
def longest_word(sen):
res = re.findall(r"\w+",sen)
n = max(res,key = lambda x : len(x))
return n
print(longest_word("Hey!! there, How is it going????"))
Output : there
Here I have used regex for the problem. Variable "res" finds all the words in the string and itself stores them in the list after splitting them.
It uses split() to store all the characters in a list and then regex does the work.
findall keyword is used to find all the desired instances in a string. Here \w+ is defined which tells the compiler to look for all the words without any spaces.
Variable "n" finds the longest word from the given string which is now free of any undesired characters.
Variable "n" uses lambda expressions to define the key len() here.
Variable "n" finds the longest word from "res" which has removed all the non-string charcters like %,&,! etc.
>>>#import regular expressions for the problem.**
>>>import re
>>>#initialize a sentence
>>>sen = "fun&!! time zone"
>>>res = re.findall(r"\w+",sen)
>>>#res variable finds all the words and then stores them in a list.
>>>res
Out: ['fun','time','zone']
>>>n = max(res)
Out: zone
>>>#Here we get "zone" instead of "time" because here the compiler
>>>#sees "zone" with the higher value than "time".
>>>#The max() function returns the item with the highest value, or the item with the highest value in an iterable.
>>>n = max(res,key = lambda x:len(x))
>>>n
Out: time
Here we get "time" because lambda expression discards "zone" as it sees the key is for len() in a max() function.
list1 = ['Happy', 'Independence', 'Day', 'Zeal']
listLen = []
for i in list1:
listLen.append(len(i))
print list1[listLen.index(max(listLen))]
Output - Independence

Return Boolean depending on whether or not string is in alphabetical order

This is what I have so far in python:
def Alphaword():
alphabet = "abcdefghijklmnopqrstuvwxyz"
x = alphabet.split()
i = 0
word = input("Enter a word: ").split()
I'm planning on using a for loop for this problem but not sure how to start it.
Think about it this way - a word containing letters (if they're in alphabetical order) should be equal to itself when forced to be in alphabetical order, so:
def alpha_word():
word = list(input('Enter a word: '))
return word == sorted(word)
That's the naive approach anyway... if you had massive iterables of sequences, there are other techniques, but for strings typed via input, it's practical enough.
There are two ways:
You just loop over each char in the string, use a test like if a[i]<a[i+1]. This works because 'a' < 'b' is true.
You can split string to char list, sort it and compare it to the original list.
Python supports direct comparison of characters. For instance, 'a' < 'b' < 'c' and so on. Use a for loop to go through each letter in the word and compare it to the previous letter:
def is_alphabetical(word):
lowest = word[0]
for letter in word:
if letter >= lowest:
lowest = letter
else:
return False
return True

Finding the largest repeating substring

Here is a function I wrote that will take a very long text file. Such as a text file containing an entire textbook. It will find any repeating substrings and output the largest string. Right now it doesn't work however, it just outputs the string I put in
For example, if there was a typo in which an entire sentence was repeated. It would output that sentence; given it is the largest in the whole file. If there was a typo where an entire paragraph was typed twice, it would output the paragraph.
This algorithm takes the first character, finds any matches, and if it does and if the length is the largest, store the substring. Then it takes the first 2 characters and repeats. Then the first 3 characters. etc.. Then it will start over except starting at the 2nd character instead of the 1st. Then all the way through and back, starting at the 3rd character.
def largest_substring(string):
length = 0
x,y=0,0
for y in range(len(string)): #start at string[0, ]
for x in range(len(string)): #start at string[ ,0]
substring = string[y:x] #substring is [0,0] first, then [0,1], then [0.2]... then [1,1] then [1,2] then [1,3]... then [2,2] then [2,3]... etc.
if substring in string: #if substring found and length is longest so far, save the substring and proceed.
if len(substring) > length:
match = substring
length = len(substring)
I think your logic is flawed here because it will always return the entire string as it checks whether a substring is in whole string which is always true so the statement if substring in string will be always true. Instead you need to find if the substring occurs more than once in the entire string and then update the count.
Here is example of brute force algorithm that solves it :-
import re
def largest_substring(string):
length = 0
x=0
y=0
for y in range(len(string)):
for x in range(len(string)):
substring = string[y:x]
if len(list(re.finditer(substring,string))) > 1 and len(substring) > length:
match = substring
length = len(substring)
return match
print largest_substring("this is repeated is repeated is repeated")

Categories