Python - Finding the longest word in a text file error - python

I am trying to find the longest word in a text file and it keeps on saying:
ValueError: max() arg is an empty sequence
def find_longest_word(filename):
with open(filename,'r+') as f:
words = f.read().split()
max_len_word = max(words,key=len)
print('maximum length word in file :',max_len_word)
print('length is : ',max_len_word)
print(find_longest_word('data1.txt'))
What did I do wrong?

This should work for you:
from functools import reduce
def find_longest_word(filename):
f = open(filename, "r")
s = [y for x in f.readlines() for y in x.split()]
longest_word = reduce(lambda x, y: y if len(x) < len(y) else x, s, "")
print("The longest word is", longest_word, "and it is", len(longest_word),"characters long")
return longest_word
print(find_longest_word('input.txt'))

I have tested the code inside the function and it works but without the function declaration.
I see that your code is missing indentation on line 2. Also, you want to print a return value from the function but your function does not return anything. So maybe your code should be like this.
def find_longest_word(filename):
with open(filename,'r+') as f:
words = f.read().split()
max_len_word = max(words,key=len)
max_len = len(max(words,key=len))
return max_len_word, max_len
And the usage of the function should be like this.
word, length = find_longest_word('data1.txt')
print("max length word in file: ", word)
print("length is: ", length)

Related

How to find the some word in the text file

I have a file text, I want filter some word in the text with condition:
1) same the length and starting with same a letter
2) find words with the at least 2 correctly placed letters
For example:
word = bubal
text
byres
brits
blurb
bulks
bible
debug
debut
and want to output: ['bulks', 'bible'] with bulks have 'b' and 'u' correctly placed and bible have 2 b correctly placed with bubal
My ideal to find the word with starting a lettre and so find the word same length and then find the word correct 2nd condition
But I write the code find the word starting by using re and it don't run good
import re
with open('words.txt','r') as file:
liste = file.read()
word = re.findall('[b]\w+',liste)
print(word)
My code return the ['byres','brits','bulks','but','bug']
How to fix it and find word flows condition
Edited based on your comment.
This may be what you're after:
#!/usr/bin/env python
def find_best_letter_matches(lines, target):
m = []
m_count = 0
for line in lines:
count = sum(map(lambda x: x[0] == x[1], zip(line, target)))
if count > m_count:
m = []
m_count = count
if count == m_count:
m.append(line)
return m
def find_n_letter_matches(lines, target, n):
m = []
for line in lines:
count = sum(map(lambda x: x[0] == x[1], zip(line, target)))
if count >= n:
m.append(line)
return m
if __name__ == '__main__':
with open('text.txt', 'r') as f:
lines = f.read().split('\n')
best_matches = find_best_letter_matches(lines, 'bubal')
n_matches = find_n_letter_matches(lines, 'bubal', 2)
print('Best letter matches', best_matches)
print('At least 2 letters match', n_matches)
The functions compare each line to the target, letter by letter, and counts the number of matches. The first then returns the list of the highest matching lines, and the second returns all that match with n or more letters.
The output with your example text (with bubal added) is:
Best letter matches ['bubal']
At least 2 letters match ['bulks', 'bible', 'bubal']
Try this
wordToSearch = "bubal"
singlesChar = list(wordToSearch)
finalArray = []
with open('words.txt','r') as file:
liste = file.readlines()
for each in liste:
each = each.rstrip()
fn = list(each)
flag = 0
for i in range(0,len(singlesChar)):
if(fn[i] == singlesChar[i]):
flag+=1
if(flag >= 2): finalArray.append(each)
print(finalArray)

How to use string methods on text files?

I have to write a program where I need to find
the number of uppercase letters
the number of lowercase letters
the number of digits
the number of whitespace characters
in a text file and my current code is
def lowercase(line_list):
print("Lower case Letters: ", sum(1 for x in line_list if x.islower))
def uppercase(line_list):
print("Upper case Letters: ", sum(1 for c in line_list if c.isupper())
def numbers(line_list):
print("Numbers: ", sum(1 for b in line_list if b.isdigit())
def whitespace(line_list):
print("Spaces: ", sum(1 for y in line_list if y.isspace())
def main():
in_file = open("text.txt", "r")
line = in_file.readline()
line_list = line.split()
lowercase(line_list)
uppercase(line_list)
numbers(line_list)
whitespace(line_list)
in_file.close()
main()
However whenever I try to run the script it gives me a syntax error. Is there something I am doing wrong?
Right now, you have a syntax error in your lowercase function (you're missing the parens for the function call islower). However, your main function also has some problems. Right now, you are only reading in one line of the file. Also, you're splitting that line (split splits using space by default, so you will lose the spaces you are trying to count). If you're trying to read the whole thing, not just one line. Try this:
def main():
lower_case = 0
upper_case = 0
numbers = 0
whitespace = 0
with open("text.txt", "r") as in_file:
for line in in_file:
lower_case += sum(1 for x in line if x.islower())
upper_case += sum(1 for x in line if x.isupper())
numbers += sum(1 for x in line if x.isdigit())
whitespace += sum(1 for x in line if x.isspace())
print 'Lower case Letters: %s' % lower_case
print 'Upper case Letters: %s' % upper_case
print 'Numbers: %s' % numbers
print 'Spaces: %s' % spaces
main()
Here it is code where syntax errors resolved:
You have missed closing parenthesis in several places.
def lowercase(line_list):
print("Lower case Letters: ", sum(1 for x in line_list if x.islower))
def uppercase(line_list):
print("Upper case Letters: ", sum(1 for c in line_list if c.isupper()))
def numbers(line_list):
print("Numbers: ", sum(1 for b in line_list if b.isdigit()))
def whitespace(line_list):
print("Spaces: ", sum(1 for y in line_list if y.isspace()))
def main():
in_file = open("text.txt", "r")
line = in_file.readline()
line_list = line.split()
lowercase(line_list)
uppercase(line_list)
numbers(line_list)
whitespace(line_list)
in_file.close()
main()
Note: This is only solution for error you faced, there may be any other errors occurring due to the logic issues you have to check for the same.

Longest word in a txt file python code?

This is my version of this program. Is there a shorter and simpler way to do this?
d = open('txt.txt','r')
l = d.readlines()
string = l[0]
stringsplit = string.split()
d = []
for i in stringsplit:
d.append(len(i))
e = max(d)
for j in stringsplit:
if len(j) == e:
print("The longest word is", j, "and it is", len(j),"characters long")
You can use list comprehensions to turn the input into a single list and then use reduce on the resulting list to find the longest string.
f = open("input.txt", "r")
s = [y for x in f.readlines() for y in x.split()]
longest = reduce(lambda x, y: y if len(x) < len(y) else x, s, "")
print("The longest word is", longest, "and it is", len(longest),"characters long")
You can use the max() built-in function:
longest = max(stringsplit, key=len)
print("The longest word is {} and it is {} characters long".format(longest, len(longest)))
It is worth noting, however, that stringsplit is a list of the words in just the first line of the file, not the whole file itself. If you want it to be all of the words, use stringsplit = d.read().split()

Python v3 Find The Longest Word (Error Message)

I'm using Python 3.4 and am getting an error message " 'wordlist is not defined' " in my program. What am I doing wrong? Please respond with code.
The program is to find the longest word:
def find_longest_word(a):
length = len(a[0])
word = a[0]
for i in wordlist:
word = (i)
length = len(i)
return word, length
def main():
wordlist = input("Enter a list of words seperated by spaces ".split()
word, length = find_longestest_word(wordlist)
print (word, "is",length,"characters long.")
main()
Apart from the problems with your code indentation, your find_longest_word() function doesn't really have any logic in it to find the longest word. Also, you pass it a parameter named a, but you never use a in the function, instead you use wordlist...
The code below does what you want. The len() function in Python is very efficient because all Python container objects store their current length, so it's rarely worth bothering to store length in a separate variable. So my find_longest_word() simply stores the longest word it's encountered so far.
def find_longest_word(wordlist):
longest = ''
for word in wordlist:
if len(word) > len(longest):
longest = word
return longest
def main():
wordlist = input("Enter a list of words separated by spaces: ").split()
word = find_longest_word(wordlist)
print(word, "is" ,len(word), "characters long.")
if __name__ == '__main__':
main()
The line "return word, length" is outside any function. The closest function is "find_longest_word(a)", so if you want it to be a part of that function, you need to indent lines 4-7.
Indentation matters in Python. As the error says, you have the return outside the function. Try:
def find_longest_word(a):
length = len(a[0])
word = a[0]
for i in wordlist:
word = (i)
length = len(i)
return word, length
def main():
wordlist = input("Enter a list of words seperated by spaces ".split()
word, length = find_longestest_word(wordlist)
print (word, "is",length,"characters long.")
main()
In python the indentation is very important. It should be:
def find_longest_word(a):
length = len(a[0])
word = a[0]
for i in wordlist:
word = (i)
length = len(i)
return word, length
But because of the function name, I think the implementation is wrong.

Why is this not correct? (codeeval challenge)PYTHON

This is what I have to do https://www.codeeval.com/open_challenges/140/
I've been on this challenge for three days, please help. It it is 85-90 partially solved. But not 100% solved... why?
This is my code:
import sys
test_cases = open(sys.argv[1], 'r')
for test in test_cases:
saver=[]
text=""
textList=[]
positionList=[]
num=0
exists=int()
counter=0
for l in test.strip().split(";"):
saver.append(l)
for i in saver[0].split(" "):
textList.append(i)
for j in saver[1].split(" "):
positionList.append(j)
for i in range(0,len(positionList)):
positionList[i]=int(positionList[i])
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
if n not in positionList:
accomodator[n]=textList[len(textList)-1]
exists=n
for item in positionList:
accomodator[item-1]=textList[counter]
counter+=1
if counter>item:
accomodator[exists-1]=textList[counter]
for word in accomodator:
text+=str(word) + " "
print text
test_cases.close()
This code works for me:
import sys
def main(name_file):
_file = open(name_file, 'r')
text = ""
while True:
try:
line = _file.next()
disordered_line, numbers_string = line.split(';')
numbers_list = map(int, numbers_string.strip().split(' '))
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
if missing_number == 0:
missing_number = len(disordered_line)
numbers_list.append(missing_number)
disordered_list = disordered_line.split(' ')
string_position = zip(disordered_list, numbers_list)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
except StopIteration:
break
_file.close()
print text.strip()
if __name__ == '__main__':
main(sys.argv[1])
I'll try to explain my code step by step so maybe you can see the difference between your code and mine one:
while True
A loop that breaks when there are no more lines.
try:
I put the code inside a try and catch the StopIteracion exception, because this is raised when there are no more items in a generator.
line = _file.next()
Use a generator, so that way you do not put all the lines in memory from once.
disordered_line, numbers_string = line.split(';')
Get the unordered phrase and the numbers of every string's position.
numbers_list = map(int, numbers_string.strip().split(' '))
Convert every number from string to int
missing_number = sum(xrange(sorted(numbers_list)[0],sorted(numbers_list)[-1]+1)) - sum(numbers_list)
Get the missing number from the serial of numbers, so that missing number is the position of the last string in the phrase.
if missing_number == 0:
missing_number = len(unorder_line)
Check if the missing number is equal to 0 if so then the really missing number is equal to the number of the strings that make the phrase.
numbers_list.append(missing_number)
Append the missing number to the list of numbers.
disordered_list = disordered_line.split(' ')
Conver the disordered phrase into a list.
string_position = zip(disordered_list, numbers_list)
Combine every string with its respective position.
ordered = sorted(string_position, key = lambda x: x[1])
Order the combined list by the position of the string.
text += " ".join([x[0] for x in ordered])
Concatenate the ordered phrase, and the reamining code it's easy to understand.
UPDATE
By looking at your code here is my opinion tha might solve your problem.
split already returns a list so you do not have to loop over the splitted content to add that content to another list.
So these six lines:
for l in test.strip().split(";"):
saver.append(l)
for i in saver[0].split(" "):
textList.append(i)
for j in saver[1].split(" "):
positionList.append(j)
can be converted into three:
splitted_test = test.strip().split(';')
textList = splitted_test[0].split(" ")
positionList = map(int, splitted_test[1].split(" "))
In this line positionList = map(int, splitted_test[0].split(" ")) You already convert numbers into int, so you save these two lines:
for i in range(0,len(positionList)):
positionList[i]=int(positionList[i])
The next lines:
accomodator=[None]*len(textList)
for n in range(1,len(textList)):
if n not in positionList:
accomodator[n]=textList[len(textList)-1]
exists=n
can be converted into the next four:
missing_number = sum(xrange(sorted(positionList)[0],sorted(positionList)[-1]+1)) - sum(positionList)
if missing_number == 0:
missing_number = len(textList)
positionList.append(missing_number)
Basically what these lines do is calculate the missing number in the serie of numbers so the len of the serie is the same as textList.
The next lines:
for item in positionList:
accomodator[item-1]=textList[counter]
counter+=1
if counter>item:
accomodator[exists-1]=textList[counter]
for word in accomodator:
text+=str(word) + " "
Can be replaced by these ones:
string_position = zip(textList, positionList)
ordered = sorted(string_position, key = lambda x: x[1])
text += " ".join([x[0] for x in ordered])
text += "\n"
From this way you can save, lines and memory, also use xrange instead of range.
Maybe the factors that make your code pass partially could be:
Number of lines of the script
Number of time your script takes.
Number of memory your script uses.
What you could do is:
Use Generators. #You save memory
Reduce for's, this way you save lines of code and time.
If you think something could be made it easier, do it.
Do not redo the wheel, if something has been already made it, use it.

Categories