find all spaces, newlines and tabs in a python file

find all spaces, newlines and tabs in a python file - python

def count_spaces(filename):
input_file = open(filename,'r')
file_contents = input_file.read()
space = 0
tabs = 0
newline = 0
for line in file_contents == " ":
space +=1
return space
for line in file_contents == '\t':
tabs += 1
return tabs
for line in file_contents == '\n':
newline += 1
return newline
input_file.close()
I'm trying to write a function which takes a filename as a parameter and returns the total number of all spaces, newlines and also tab characters in the file. I want to try use a basic for loop and if statement but I'm struggling at the moment :/ any help would be great thanks.

Your current code doesn't work because you're combining loop syntax (for x in y) with a conditional test (x == y) in a single muddled statement. You need to separate those.
You also need to use just a single return statement, as otherwise the first one you reach will stop the function from running and the other values will never be returned.
Try:
for character in file_contents:
if character == " ":
space +=1
elif character == '\t':
tabs += 1
elif character == '\n':
newline += 1
return space, tabs, newline
The code in Joran Beasley's answer is a more Pythonic approach to the problem. Rather than having separate conditions for each kind of character, you can use the collections.Counter class to count the occurrences of all characters in the file, and just extract the counts of the whitespace characters at the end. A Counter works much like a dictionary.
from collections import Counter
def count_spaces(filename):
with open(filename) as in_f:
text = in_f.read()
count = Counter(text)
return count[" "], count["\t"], count["\n"]

To support large files, you could read a fixed number of bytes at a time:
#!/usr/bin/env python
from collections import namedtuple
Count = namedtuple('Count', 'nspaces ntabs nnewlines')
def count_spaces(filename, chunk_size=1 << 13):
"""Count number of spaces, tabs, and newlines in the file."""
nspaces = ntabs = nnewlines = 0
# assume ascii-based encoding and b'\n' newline
with open(filename, 'rb') as file:
chunk = file.read(chunk_size)
while chunk:
nspaces += chunk.count(b' ')
ntabs += chunk.count(b'\t')
nnewlines += chunk.count(b'\n')
chunk = file.read(chunk_size)
return Count(nspaces, ntabs, nnewlines)
if __name__ == "__main__":
print(count_spaces(__file__))
Output
Count(nspaces=150, ntabs=0, nnewlines=20)
mmap allows you to treat a file as a bytestring without actually loading the whole file into memory e.g., you could search for a regex pattern in it:
#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple
Count = namedtuple('Count', 'nspaces ntabs nnewlines')
def count_spaces(filename, chunk_size=1 << 13):
"""Count number of spaces, tabs, and newlines in the file."""
nspaces = ntabs = nnewlines = 0
# assume ascii-based encoding and b'\n' newline
with open(filename, 'rb', 0) as file, \
mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
return Count(c[b' '], c[b'\t'], c[b'\n'])
if __name__ == "__main__":
print(count_spaces(__file__))
Output
Count(nspaces=107, ntabs=0, nnewlines=18)

C=Counter(open(afile).read())
C[' ']

In my case tab(\t) is converted to " "(four spaces). So i have modified the
logic a bit to take care of that.
def count_spaces(filename):
with open(filename,"r") as f1:
contents=f1.readlines()
total_tab=0
total_space=0
for line in contents:
total_tab += line.count(" ")
total_tab += line.count("\t")
total_space += line.count(" ")
print("Space count = ",total_space)
print("Tab count = ",total_tab)
print("New line count = ",len(contents))
return total_space,total_tab,len(contents)

Related

Removing leading + trailing whitespace and newlines from a Python string, .strip() not working?

I'm currently in my second Python course and in our review warm-ups I'm getting stumped by a seemingly simple problem. Here's how it's stated:
In this exercise, your function will receive 1 parameter, the name of a text file. The function will return a string created by concatenating the fifth character from each line into one string. If the line has fewer than 5 characters, then the line should be skipped. All lines should have leading and trailing whitespace removed before looking for the fifth character.
CodeGrinder then grades it based off of randomly generated .txt files. Here's the code I currently have:
def fifthchar(filename):
file = open(filename)
fifthstring = ''
for x in file:
x.strip('\n ')
if len(x) >= 5:
fifthstring += x[4]
else:
pass
fifthstring.strip('\n ')
return fifthstring
And the error in return:
AssertionError: False is not true : fifthchar(rprlrhya.txt) returned mylgcwdnbi
dmou. It should have returned mylgcwdnbidmou. Check your logic and try again.
It seems that newlines are sneaking in through my .strip(), and I'm not sure how to remove them. I thought that .strip() would remove \n, and I've tried everything from .rstrip() to fifthstring.join(fifthstring.split()) to having redundancy in stripping both fifthstring and x in the loop. How are these newlines getting through?

Your solution is not taking in consideration several things:
empty lines where its fifth char is the '\n' char.
every line's leading and trailing spaces should be removed.
strip() doesn't mutate x, you need to re-assign the stripped string.
Here is your solution:
def fifthchar(filename):
file = open(filename)
fifthstring = ''
for x in file:
x = x.strip()
if len(x) >= 5:
fifthstring += x[4]
else:
pass
fifthstring.strip('\n ')
return fifthstring
Here is another:
def fifth(filename):
with open(filename) as f:
string = ''
for line in f.readlines():
l = line.strip()
string += l[4] if len(l) >= 5 else ''
return ''.join(string)
The same as before using list comprehension:
def fifth(filename):
with open(filename) as f:
string = [line.strip()[4] if len(line.strip()) >= 5 else '' for line in f.readlines()]
return ''.join(string)

This should work:
def fifthchar(filename):
with open(filename) as fin :
return ''.join( line.strip()[4] if len(line.strip()) > 4 else '' for line in fin.readlines() )

How to use string methods on text files?

I have to write a program where I need to find
the number of uppercase letters
the number of lowercase letters
the number of digits
the number of whitespace characters
in a text file and my current code is
def lowercase(line_list):
print("Lower case Letters: ", sum(1 for x in line_list if x.islower))
def uppercase(line_list):
print("Upper case Letters: ", sum(1 for c in line_list if c.isupper())
def numbers(line_list):
print("Numbers: ", sum(1 for b in line_list if b.isdigit())
def whitespace(line_list):
print("Spaces: ", sum(1 for y in line_list if y.isspace())
def main():
in_file = open("text.txt", "r")
line = in_file.readline()
line_list = line.split()
lowercase(line_list)
uppercase(line_list)
numbers(line_list)
whitespace(line_list)
in_file.close()
main()
However whenever I try to run the script it gives me a syntax error. Is there something I am doing wrong?

Right now, you have a syntax error in your lowercase function (you're missing the parens for the function call islower). However, your main function also has some problems. Right now, you are only reading in one line of the file. Also, you're splitting that line (split splits using space by default, so you will lose the spaces you are trying to count). If you're trying to read the whole thing, not just one line. Try this:
def main():
lower_case = 0
upper_case = 0
numbers = 0
whitespace = 0
with open("text.txt", "r") as in_file:
for line in in_file:
lower_case += sum(1 for x in line if x.islower())
upper_case += sum(1 for x in line if x.isupper())
numbers += sum(1 for x in line if x.isdigit())
whitespace += sum(1 for x in line if x.isspace())
print 'Lower case Letters: %s' % lower_case
print 'Upper case Letters: %s' % upper_case
print 'Numbers: %s' % numbers
print 'Spaces: %s' % spaces
main()

Here it is code where syntax errors resolved:
You have missed closing parenthesis in several places.
def lowercase(line_list):
print("Lower case Letters: ", sum(1 for x in line_list if x.islower))
def uppercase(line_list):
print("Upper case Letters: ", sum(1 for c in line_list if c.isupper()))
def numbers(line_list):
print("Numbers: ", sum(1 for b in line_list if b.isdigit()))
def whitespace(line_list):
print("Spaces: ", sum(1 for y in line_list if y.isspace()))
def main():
in_file = open("text.txt", "r")
line = in_file.readline()
line_list = line.split()
lowercase(line_list)
uppercase(line_list)
numbers(line_list)
whitespace(line_list)
in_file.close()
main()
Note: This is only solution for error you faced, there may be any other errors occurring due to the logic issues you have to check for the same.

Counting each DIFFERENT character in text file

I'm relatively new to python and coding and I'm trying to write a code that counts the number of times each different character comes out in a text file while disregarding the case of the characters.
What I have so far is
letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o',
'p','q','r','s','t','u','v','w','x','y','z']
prompt = "Enter filename: "
titles = "char count\n---- -----"
itemfmt = "{0:5s}{1:10d}"
totalfmt = "total{0:10d}"
whiteSpace = {' ':'space', '\t':'tab', '\n':'nline', '\r':'crtn'}
filename = input(prompt)
fname = filename
numberCharacters = 0
fname = open(filename, 'r')
for line in fname:
linecount +=1
word = line.split()
word += words
for word in words:
for char in word:
numberCharacters += 1
return numberCharacters
Somethings seems wrong about this. Is there a more efficient way to do my desired task?
Thanks!

from collections import Counter
frequency_per_character = Counter(open(filename).read().lower())
Then you can display them as you wish.

A better way would be to use the str methods such as isAlpha
chars = {}
for l in open('filename', 'rU'):
for c in l:
if not c.isalpha(): continue
chars[c] = chars.get(c, 0) + 1
And then use the chars dict to write the final histogram.

You over-complicating it, you can just convert your file content to a set in order to eliminate duplicated characters:
number_diff_value = len(set(open("file_path").read()))

Python Count not resetting?

I'm trying to insert an increment after the occurance of ~||~ in my .txt. I have this working, however I want to split it up, so after each semicolon, it starts back over at 1.
So Far I have the following, which does everything except split up at semicolons.
inputfile = "output2.txt"
outputfile = "/output3.txt"
f = open(inputfile, "r")
words = f.read().split('~||~')
f.close()
count = 1
for i in range(len(words)):
if ';' in words [i]:
count = 1
words[i] += "~||~" + str(count)
count = count + 1
f2 = open(outputfile, "w")
f2.write("".join(words))

Why not first split the file based on the semicolon, then in each segment count the occurences of '~||~'.
import re
count = 0
with open(inputfile) as f:
semicolon_separated_chunks = f.read().split(';')
count = len(re.findall('~||~', semicolon_separated_chunks))
# if file text is 'hello there ~||~ what is that; what ~||~ do you ~|| mean; nevermind ~||~'
# then count = 4

Instead of resetting the counter the way you are now, you could do the initial split on ;, and then split the substrings on ~||~. You'd have to store your words another way, since you're no longer doing words = f.read().split('~||~'), but it's safer to make an entirely new list anyway.
inputfile = "output2.txt"
outputfile = "/output3.txt"
all_words = []
f = open(inputfile, "r")
lines = f.read().split(';')
f.close()
for line in lines:
count = 1
words = line.split('~||~')
for word in words:
all_words.append(word + "~||~" + str(count))
count += 1
f2 = open(outputfile, "w")
f2.write("".join(all_words))
See if this works for you. You also may want to put some strategically-placed newlines in there, to make the output more readable.

Python - I read a file but it shows me erroneous values?

def showCounts(fileName):
lineCount = 0
wordCount = 0
numCount = 0
comCount = 0
dotCount = 0
with open(fileName, 'r') as f:
for line in f:
for char in line:
if char.isdigit() == True:
numCount+=1
elif char == '.':
dotCount+=1
elif char == ',':
comCount+=1
#i know formatting below looks off but it's right
words = line.split()
lineCount += 1
wordCount += len(words)
for word in words:
# text = word.translate(string.punctuation)
exclude = set(string.punctuation)
text = ""
text = ''.join(ch for ch in text if ch not in exclude)
try:
if int(text) >= 0 or int(text) < 0:
numCount += 1
except ValueError:
pass
print("Line count: " + str(lineCount))
print("Word count: " + str(wordCount))
print("Number count: " + str(numCount))
print("Comma count: " + str(comCount))
print("Dot count: " + str(dotCount) + "\n")
I have it read a .txt file containing words, lines, dots, commas, and numbers. It will give me the correct number of dots commas and numbers, but the words and lines values will be each much much higher than they actually are. Any one know why? Thanks guys.

I don't know if this is actually the answer, but my reputation isn't high enough to comment, so I'm putting it here. You obviously don't need to accept it as the final answer if it doesn't solve the issue.
So, I think it might have something to do with the fact that all of your print statements are actually outside of the showCounts() function. Try indenting the print statements.
I hope this helps.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

find all spaces, newlines and tabs in a python file - python

C=Counter(open(afile).read()) C[' ']

Related

Removing leading + trailing whitespace and newlines from a Python string, .strip() not working?

How to use string methods on text files?

Counting each DIFFERENT character in text file

Python Count not resetting?

Python - I read a file but it shows me erroneous values?

Categories

Resources