How to use string methods on text files?

I have to write a program that finds
the number of uppercase letters
the number of lowercase letters
the number of digits
the number of whitespace characters
in a text file. My current code is:
def lowercase(line_list):
    print("Lower case Letters: ", sum(1 for x in line_list if x.islower))

def uppercase(line_list):
    print("Upper case Letters: ", sum(1 for c in line_list if c.isupper())

def numbers(line_list):
    print("Numbers: ", sum(1 for b in line_list if b.isdigit())

def whitespace(line_list):
    print("Spaces: ", sum(1 for y in line_list if y.isspace())

def main():
    in_file = open("text.txt", "r")
    line = in_file.readline()
    line_list = line.split()
    lowercase(line_list)
    uppercase(line_list)
    numbers(line_list)
    whitespace(line_list)
    in_file.close()

main()
However whenever I try to run the script it gives me a syntax error. Is there something I am doing wrong?

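Right now, the syntax error comes from the missing closing parentheses on the print calls in your uppercase, numbers and whitespace functions. Your lowercase function has a different problem: you're missing the parens for the method call islower, so the test is always true and every item gets counted. Your main function also has some problems. You are only reading in one line of the file. You are also splitting that line (split splits on whitespace by default, so you will lose the spaces you are trying to count), and the resulting list holds whole words rather than single characters. If you're trying to read the whole file, not just one line, try this: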
def main():
    lower_case = 0
    upper_case = 0
    numbers = 0
    whitespace = 0
    with open("text.txt", "r") as in_file:
        # iterate over every line, and over every character in each line
        for line in in_file:
            lower_case += sum(1 for x in line if x.islower())
            upper_case += sum(1 for x in line if x.isupper())
            numbers += sum(1 for x in line if x.isdigit())
            whitespace += sum(1 for x in line if x.isspace())
    print('Lower case Letters: %s' % lower_case)
    print('Upper case Letters: %s' % upper_case)
    print('Numbers: %s' % numbers)
    print('Spaces: %s' % whitespace)

main()
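The same string methods can be tried out without a file on disk. This is only a minimal sketch: the helper name count_char_classes is made up for illustration, and it assumes the text has already been read into a string (for a file you could pass in_file.read()).

```python
def count_char_classes(text):
    """Count lowercase, uppercase, digit and whitespace characters in text."""
    counts = {"lower": 0, "upper": 0, "digits": 0, "whitespace": 0}
    for ch in text:
        # Each character falls into at most one of these classes.
        if ch.islower():
            counts["lower"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.isdigit():
            counts["digits"] += 1
        elif ch.isspace():
            counts["whitespace"] += 1
    return counts


print(count_char_classes("Hello World 42\n"))
```

Note that the newline at the end of each line counts as whitespace, which is why the text is iterated character by character instead of being split first.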

Here is the code with the syntax errors resolved.
You had missed the closing parenthesis in several places, and islower was missing its call parentheses.
def lowercase(line_list):
    print("Lower case Letters: ", sum(1 for x in line_list if x.islower()))

def uppercase(line_list):
    print("Upper case Letters: ", sum(1 for c in line_list if c.isupper()))

def numbers(line_list):
    print("Numbers: ", sum(1 for b in line_list if b.isdigit()))

def whitespace(line_list):
    print("Spaces: ", sum(1 for y in line_list if y.isspace()))

def main():
    in_file = open("text.txt", "r")
    line = in_file.readline()
    line_list = line.split()
    lowercase(line_list)
    uppercase(line_list)
    numbers(line_list)
    whitespace(line_list)
    in_file.close()

main()
Note: This only fixes the error you reported. There may be other errors caused by logic issues, which you will have to check for yourself.

Related

Python - Finding the longest word in a text file error

I am trying to find the longest word in a text file and it keeps on saying:
ValueError: max() arg is an empty sequence
def find_longest_word(filename):
with open(filename,'r+') as f:
words = f.read().split()
max_len_word = max(words,key=len)
print('maximum length word in file :',max_len_word)
print('length is : ',max_len_word)
print(find_longest_word('data1.txt'))
What did I do wrong?

This should work for you:
from functools import reduce

def find_longest_word(filename):
    # the with block closes the file once the words have been collected
    with open(filename, "r") as f:
        s = [y for x in f for y in x.split()]
    # keep the longer of the two words; the initial "" covers an empty file
    longest_word = reduce(lambda x, y: y if len(x) < len(y) else x, s, "")
    print("The longest word is", longest_word, "and it is", len(longest_word), "characters long")
    return longest_word

print(find_longest_word('input.txt'))

I tested the body of your function outside of a function declaration and it works.
I see that your code is missing indentation on line 2. Also, you want to print a return value from the function, but your function does not return anything. So maybe your code should look like this:
def find_longest_word(filename):
    with open(filename, 'r') as f:
        words = f.read().split()
    max_len_word = max(words, key=len)
    max_len = len(max_len_word)
    return max_len_word, max_len
And the function should be used like this:
word, length = find_longest_word('data1.txt')
print("max length word in file: ", word)
print("length is: ", length)
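The ValueError itself means that the list passed to max was empty, i.e. the file contained no words. One way to guard against that is sketched below. It assumes Python 3.4 or later, where max accepts a default argument, and the helper name longest_word is made up for illustration; it works on a string so it can be tried without a file.

```python
def longest_word(text):
    """Return (word, length) for the longest whitespace-separated word in text."""
    words = text.split()
    # default="" avoids "ValueError: max() arg is an empty sequence"
    # when the text contains no words at all.
    word = max(words, key=len, default="")
    return word, len(word)


print(longest_word("a quick brown fox"))
print(longest_word(""))
```

When several words share the maximum length, max returns the first one it encounters.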

reading a text file and counting how many times a word is repeated. Using .split function. Now wants it to ignore case sensitive

I am getting the desired output so far.
The program prompts the user for a word to search for.
The user enters it, and the program reads the file and prints the count, e.g.
'ashwin: 2'
Now I want it to ignore case. For example, "Ashwin" and "ashwin" should both return 2, as the text file contains two occurrences of ashwin.
def word_count():
    file = "test.txt"
    word = input("Enter word to be searched:")
    k = 0
    with open(file, 'r') as f:
        for line in f:
            words = line.split()
            for i in words:
                if i == word:
                    k = k + 1
    print(word + ": " + str(k))

word_count()

You could use lower() on both strings in the comparison: if i.lower() == word.lower():
For example:
def word_count():
    file = "test.txt"
    word = input("Enter word to be searched:")
    k = 0
    with open(file, 'r') as f:
        for line in f:
            words = line.split()
            for i in words:
                # compare lowercase copies so that case is ignored
                if i.lower() == word.lower():
                    k = k + 1
    print(word + ": " + str(k))

word_count()

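You can either call .lower() on both the line and the word to eliminate case, or use the built-in re module:
len(re.findall(word, text, flags=re.IGNORECASE))
Note that this treats word as a regular expression and also counts matches inside longer words.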
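If only whole words should be counted, the pattern can be built from the user's input. The following is a sketch and not the answerer's code: the helper name count_word and the sample text are made up, re.escape makes the input match literally, and \b restricts matches to word boundaries (which behaves as expected only when the word starts and ends with a letter, digit or underscore).

```python
import re


def count_word(text, word):
    """Count case-insensitive whole-word occurrences of word in text."""
    # re.escape keeps characters such as "." or "*" in the input from being
    # interpreted as regex syntax; \b anchors the match at word boundaries.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "Ashwin met ashwin. Ashwinder was not there."
print(count_word(sample, "ashwin"))
```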

Use the Counter class from collections. It builds a dictionary-like object of word counts, and looking up a key takes O(1) time on average.
from collections import Counter

def word_count():
    file = "test.txt"
    with open(file, 'r') as f:
        # split() already splits on newlines; removing them first would
        # glue the last word of a line to the first word of the next one
        words = f.read().lower().split()
    count = Counter(words)
    word = input("Enter word to be searched:")
    # a Counter returns 0 for keys it has never seen
    print(word, ":", count[word.lower()])

word_count()

python argument of type 'int' is not iterable when using for in

I am trying to write code that will accept a filename from the command line and
print out the following properties:
number of lines
number of characters
number of words
number of "the"
number of "a/an"
I keep getting the error message
"argument of type 'int' is not iterable"
for the line if 'the' in words:.
How do I fix this?
import sys
import string

file_name=sys.argv[0]
char= words = lines = theCount = aCount= 0
with open(file_name,'r') as in_file:
    for line in in_file:
        lines +=1
        words +=len(line.split())
        char +=len(line)
        if 'the' in words:
            theCount +=1
        if 'a' in words:
            a +=1
        if 'an' in words:
            a +=1
print("Filename:", file_name)
print("Number of lines:", lines)
print("Number of characters:", char)
print("Number of 'the'", theCount)
print("Number of a/an:", aCount)

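If you are trying to collect the actual words, rather than just the count of them, then perhaps you need to initialize words to an empty list:
words = []
and change
words += len(line.split())
to
words += line.split()
The number of words is then len(words). Note that 'the' in words only tells you whether the word has occurred so far; words.count('the') after the loop gives the number of occurrences.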

There are some errors in your code; read the comments in this snippet:
import sys
# import string  # not needed here

file_name = sys.argv[1]  # argv[0] is the script itself; the first argument is argv[1]
char = words = lines = theCount = aCount = 0
with open(file_name, 'r') as in_file:
    for line in in_file:
        lines += 1
        x = line.split()  # use a variable to hold the split words
                          # so that you can search in it
        words += len(x)
        char += len(line)
        # your original code used the "words" variable, which holds the
        # number of words (an int), and ints are not iterable;
        # count() adds every occurrence on the line, not just 1 per line
        theCount += x.count('the')
        # your original code incremented "a", which was never initialized;
        # the variable you initialized is "aCount"
        aCount += x.count('a') + x.count('an')
print("Filename:", file_name)
print("Number of lines:", lines)
print("Number of characters:", char)
print("Number of 'the'", theCount)
print("Number of a/an:", aCount)
https://repl.it/Mnwz/0
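For comparison, the same properties can be collected with collections.Counter. This is only a sketch under some assumptions that go beyond the question: the helper name summarize is made up, words are lowercased, and surrounding punctuation is stripped, so "The" and "the," are both counted as "the". It works on a string so it can be tried without a file.

```python
from collections import Counter


def summarize(text):
    """Return line, character and word counts plus article counts for text."""
    words = text.split()
    # Assumption: case and surrounding punctuation should be ignored.
    freq = Counter(w.strip('.,;:!?"\'()').lower() for w in words)
    return {
        "lines": len(text.splitlines()),
        "characters": len(text),
        "words": len(words),
        "the": freq["the"],
        "a/an": freq["a"] + freq["an"],
    }


sample = "The cat sat on a mat.\nAn owl saw the cat.\n"
print(summarize(sample))
```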

find all spaces, newlines and tabs in a python file

def count_spaces(filename):
    input_file = open(filename,'r')
    file_contents = input_file.read()
    space = 0
    tabs = 0
    newline = 0
    for line in file_contents == " ":
        space +=1
    return space
    for line in file_contents == '\t':
        tabs += 1
    return tabs
    for line in file_contents == '\n':
        newline += 1
    return newline
    input_file.close()
I'm trying to write a function which takes a filename as a parameter and returns the total number of spaces, newlines and tab characters in the file. I want to use a basic for loop and if statement, but I'm struggling at the moment. Any help would be great, thanks.

Your current code doesn't work because you're combining loop syntax (for x in y) with a conditional test (x == y) in a single muddled statement. You need to separate those.
You also need to use just a single return statement, as otherwise the first one you reach will stop the function from running and the other values will never be returned.
Try:
for character in file_contents:
    if character == " ":
        space +=1
    elif character == '\t':
        tabs += 1
    elif character == '\n':
        newline += 1
return space, tabs, newline
The code in Joran Beasley's answer is a more Pythonic approach to the problem. Rather than having separate conditions for each kind of character, you can use the collections.Counter class to count the occurrences of all characters in the file, and just extract the counts of the whitespace characters at the end. A Counter works much like a dictionary.
from collections import Counter

def count_spaces(filename):
    with open(filename) as in_f:
        text = in_f.read()
    count = Counter(text)
    return count[" "], count["\t"], count["\n"]
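One difference from a plain dictionary is worth seeing in action: a Counter returns 0 for a key it has never seen instead of raising KeyError, so the lookups are safe even when a file has no tabs at all. A small sketch on strings (the helper name whitespace_counts is made up for illustration):

```python
from collections import Counter


def whitespace_counts(text):
    """Return (spaces, tabs, newlines) found in text."""
    count = Counter(text)
    # Missing keys yield 0, so no membership test is needed.
    return count[" "], count["\t"], count["\n"]


print(whitespace_counts("a b\tc\n"))
print(whitespace_counts("abc"))
```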

To support large files, you could read a fixed number of bytes at a time:
#!/usr/bin/env python
from collections import namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb') as file:
        chunk = file.read(chunk_size)
        while chunk:
            nspaces += chunk.count(b' ')
            ntabs += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
            chunk = file.read(chunk_size)
    return Count(nspaces, ntabs, nnewlines)

if __name__ == "__main__":
    print(count_spaces(__file__))
Output
Count(nspaces=150, ntabs=0, nnewlines=20)
mmap allows you to treat a file as a bytestring without actually loading the whole file into memory; e.g., you could search for a regex pattern in it:
#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple

Count = namedtuple('Count', 'nspaces ntabs nnewlines')

def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb', 0) as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
    return Count(c[b' '], c[b'\t'], c[b'\n'])

if __name__ == "__main__":
    print(count_spaces(__file__))
Output
Count(nspaces=107, ntabs=0, nnewlines=18)

from collections import Counter
with open(afile) as f:
    C = Counter(f.read())
C[' ']  # number of spaces

In my case a tab (\t) is converted to "    " (four spaces), so I have modified the
logic a bit to take care of that.
def count_spaces(filename):
    with open(filename,"r") as f1:
        contents=f1.readlines()
    total_tab=0
    total_space=0
    for line in contents:
        total_tab += line.count("    ")  # a run of four spaces counts as one tab
        total_tab += line.count("\t")
        total_space += line.count(" ")   # also includes the spaces inside those runs
    print("Space count = ",total_space)
    print("Tab count = ",total_tab)
    print("New line count = ",len(contents))
    return total_space,total_tab,len(contents)

Print out the character, word, and line amounts using Python

This is what I have so far:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    infile = open(filename)
    lines = infile.readlines()
    words = infile.read()
    chars = infile.read()
    infile.close()
    print("line count:", len(lines))
    print("word count:", len(words.split()))
    print("character counter:", len(chars))
When executed, it returns the number of lines properly, but returns 0 for the word and character counts. Not sure why...

You can iterate through the file once and count lines, words and chars without seeking back to the beginning multiple times, which you would need to do with your approach because you exhaust the iterator when counting lines:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = chars = 0
    words = []
    with open(filename) as infile:
        for line in infile:
            lines += 1
            words.extend(line.split())
            chars += len(line)
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words
Or alternatively, use the side effect that the file position is at the end once the loop has finished. Note that tell() typically reports a byte offset, so it matches the character count only for single-byte encodings and untranslated line endings:
def stats(filename):
    ' prints the number of lines, words, and characters in file filename'
    lines = 0  # stays 0 if the file is empty
    words = []
    with open(filename) as infile:
        for lines, line in enumerate(infile, 1):
            words.extend(line.split())
        chars = infile.tell()
    print("line count:", lines)
    print("word count:", len(words))
    print("character counter:", chars)
    return len(words) > len(set(words))  # Returns True if duplicate words

You need to go back to the beginning of the file with infile.seek(0). After a read, the position is at the end of the file; seek(0) resets it to the start so that you can read again.
infile = open('data')
lines = infile.readlines()
infile.seek(0)
print(lines)
words = infile.read()
infile.seek(0)
chars = infile.read()
infile.close()
print("line count:", len(lines))
print("word count:", len(words.split()))
print("character counter:", len(chars))
Output:
line count: 2
word count: 19
character counter: 113
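The effect of the file position can be reproduced without a file on disk. In this sketch io.StringIO stands in for an open text file (an assumption made purely for the demonstration); it shows that a second read returns an empty string until seek(0) rewinds the stream.

```python
import io

# io.StringIO behaves like an open text file for the purposes of this demo.
infile = io.StringIO("one two\nthree\n")

lines = infile.readlines()   # consumes the stream up to the end
exhausted = infile.read()    # nothing is left to read, so this is ''
infile.seek(0)               # rewind to the start
text = infile.read()         # the full contents are available again

print(len(lines), repr(exhausted), len(text.split()), len(text))
```

This is why the original code reported 0 words and 0 characters: both read() calls started at the end of the file.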
Another way of doing it:
from collections import Counter
from itertools import chain

with open('data') as infile:
    lines = infile.readlines()
cnt_lines = len(lines)
words = list(chain.from_iterable(x.split() for x in lines))
cnt_words = len(words)
# characters inside words only; whitespace is not counted
cnt_chars = sum(len(word) for word in words)
# show word frequencies
print(Counter(words))

You have exhausted the iterator after your call to readlines. You can seek back to the start, but really you don't need to read the whole file into memory at all:
def stats(filename):
    lines, chars, words, dupes = 0, 0, 0, False
    seen = set()
    with open(filename) as f:
        for lines, line in enumerate(f, 1):
            chars += len(line)
            spl = line.split()
            words += len(spl)
            if not dupes:
                # a duplicate is a word seen on an earlier line
                # or one repeated within this line
                if not seen.isdisjoint(spl) or len(set(spl)) < len(spl):
                    dupes = True
                else:
                    seen.update(spl)
    return lines, chars, words, dupes
Then assign the values by unpacking:
no_lines, no_chars, no_words, has_dupes = stats("your_file")
You may want to use chars += len(line.rstrip("\n")) if you don't want to include the line endings. The code stores only the data it needs; using readlines, read, or dicts of the full data means your code won't be very practical for large files.

File_Name = 'file.txt'
line_count = 0
word_count = 0
char_count = 0
with open(File_Name,'r') as fh:
    # This will produce a list of lines.
    # Each line of the file will be an element of the list.
    data = fh.readlines()
# Count of total number for list elements == total number of lines.
line_count = len(data)
for line in data:
    word_count = word_count + len(line.split())
    char_count = char_count + len(line)
print('Line Count : ' , line_count )
print('Word Count : ', word_count)
print('Char Count : ', char_count)
