I am not able to strip the spaces and newlines. Any idea what might have gone wrong?
line_count = 0
word_count = 0
for fline in fh:
    line = repr(fline)
    line = line.strip()
    print line
    line_count += 1
    word_count += len(line.split())
result['size'] = filesize
result['line'] = line_count
result['words'] = word_count
output
'value of $input if it is\n'
' larger than or equal to ygjhg\n'
' that number. Otherwise assigns the value of \n'
' \n'
' '
repr() wraps your string in literal quote characters and escapes the newline, so there is no trailing whitespace left for strip() to remove:
>>> x = 'hello\n'
>>> repr(x)
"'hello\\n'"
>>> repr(x).strip()
"'hello\\n'"
>>>
Here is your edited code:
line_count = 0
word_count = 0
for fline in fh:
    line = repr(fline.strip())
    print line
    line_count += 1
    word_count += len(line.split())
result['size'] = filesize
result['line'] = line_count
result['words'] = word_count
If fline is a string, then calling repr with it as the argument would enclose it in literal quotes. Thus:
foo\n
becomes
"foo\n"
Since the newline isn't at the end of the string anymore, strip won't remove it. Maybe consider not calling repr unless you desperately need to, or calling it after calling strip.
From what the others have mentioned, just change
line = repr(fline)
line = line.strip()
to
line = fline.strip()
line = repr(line)
Note that you might be wanting .rstrip() or even .rstrip("\n") instead.
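To make the difference between these calls concrete, here is a minimal sketch. The sample string is made up for illustration, and the snippet uses Python 3 print() rather than the Python 2 print statement from the question.

```python
# Hypothetical sample line with leading spaces, a trailing space and a newline.
fline = "  value of $input \n"

print(repr(fline.strip()))        # 'value of $input'
print(repr(fline.rstrip()))       # '  value of $input'
print(repr(fline.rstrip("\n")))   # '  value of $input '

# Calling repr() first leaves nothing for strip() to remove,
# because the string now starts and ends with a quote character.
print(repr(fline).strip() == repr(fline))  # True
```

So strip() is the one to use if leading whitespace should go too, while rstrip("\n") only drops the line terminator.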
Good day!
I have the following snippets:
words_count = 0
lines_count = 0
line_max = None
lines = []
file = open("alice.txt", "r")
for line in file:
    line = line.rstrip("\n")
    words = line.split()
    words_count += len(words)
    if line_max is None or len(words) > len(line_max.split()):
        line_max = line
    lines.append(line)
file.close()
This uses the rstrip method to remove the trailing newline from each line, but my exam unit does not allow rstrip because it has not been introduced yet. My question is: is there any other way to get the same result (Total number of words: 26466) without using rstrip?
Thank you guys!
Interestingly, this works for me without using str.rstrip:
import requests
wc = 0
content = requests.get('https://files.catbox.moe/dz39pw.txt').text
for line in content.split('\n'):
    # line = line.rstrip("\n")
    words = line.split()
    wc += len(words)
assert wc == 26466
Note that a one-liner way of doing that in Python could be:
wc = sum(len(line.split()) for line in content.split('\n'))
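The reason this works without rstrip is that str.split() with no argument splits on any run of whitespace and ignores leading and trailing whitespace, the newline included. A small sketch, using an in-memory sample instead of alice.txt (the function name and sample text are just for illustration):

```python
import io

def count_words(lines):
    """Count whitespace-separated words without calling rstrip()."""
    total = 0
    for line in lines:
        # split() with no argument already discards the trailing '\n'
        total += len(line.split())
    return total

# Iterating a StringIO yields lines that end in '\n', like a real file does.
sample = io.StringIO("Alice was beginning\nto get very tired\n\n")
print(count_words(sample))  # 7
```

The blank line at the end contributes nothing, because splitting a whitespace-only string gives an empty list.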
I'm currently in my second Python course and in our review warm-ups I'm getting stumped by a seemingly simple problem. Here's how it's stated:
In this exercise, your function will receive 1 parameter, the name of a text file. The function will return a string created by concatenating the fifth character from each line into one string. If the line has fewer than 5 characters, then the line should be skipped. All lines should have leading and trailing whitespace removed before looking for the fifth character.
CodeGrinder then grades it based off of randomly generated .txt files. Here's the code I currently have:
def fifthchar(filename):
    file = open(filename)
    fifthstring = ''
    for x in file:
        x.strip('\n ')
        if len(x) >= 5:
            fifthstring += x[4]
        else:
            pass
    fifthstring.strip('\n ')
    return fifthstring
And the error in return:
AssertionError: False is not true : fifthchar(rprlrhya.txt) returned mylgcwdnbi
dmou. It should have returned mylgcwdnbidmou. Check your logic and try again.
It seems that newlines are sneaking in through my .strip(), and I'm not sure how to remove them. I thought that .strip() would remove \n, and I've tried everything from .rstrip() to fifthstring.join(fifthstring.split()) to having redundancy in stripping both fifthstring and x in the loop. How are these newlines getting through?
Your solution does not take several things into consideration:
Lines with exactly four characters, whose fifth char is the '\n' char.
Every line's leading and trailing whitespace should be removed.
strip() doesn't mutate x; you need to re-assign the stripped string.
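A quick way to see the first and last points is the sketch below. The sample strings are made up; the point is only that Python strings are immutable, so strip() returns a new string and leaves the original alone.

```python
line = "  abcde  \n"

line.strip()                 # the result is discarded; line is unchanged
assert line == "  abcde  \n"

stripped = line.strip()      # keep the returned string instead
print(repr(stripped))        # 'abcde'
print(stripped[4])           # e

# A four-character line still has a "fifth character": the newline itself.
short = "abcd\n"
print(len(short) >= 5, repr(short[4]))  # True '\n'
```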
Here is your solution:
def fifthchar(filename):
    file = open(filename)
    fifthstring = ''
    for x in file:
        x = x.strip()
        if len(x) >= 5:
            fifthstring += x[4]
        else:
            pass
    return fifthstring
Here is another:
def fifth(filename):
    with open(filename) as f:
        string = ''
        for line in f.readlines():
            l = line.strip()
            string += l[4] if len(l) >= 5 else ''
    return string
The same as before, using a list comprehension:
def fifth(filename):
    with open(filename) as f:
        string = [line.strip()[4] if len(line.strip()) >= 5 else '' for line in f.readlines()]
    return ''.join(string)
This should work:
def fifthchar(filename):
    with open(filename) as fin:
        return ''.join(line.strip()[4] if len(line.strip()) > 4 else '' for line in fin.readlines())
How do you count characters without spaces? I am not getting the right number. The right value of num_charsx is 1761.
num_words = 0
num_chars = 0
with open("C:/Python33/fire.txt",'r') as f:
    for line in f:
        words = line.split('\n')
        num_words += len(words)
        num_chars += len(line)
num_charsx = num_chars - line.count(' ')
print(num_charsx)
2064
words = line.split('\n')
num_words += len(words)
doesn't do what you think it does. In the loop
for line in f:
line is a string that ends in '\n', so line.split('\n') is a two-item list, with the first item containing all the characters of the line apart from the terminating '\n'; the second item in that list is the empty string. Example:
line = 'This is a test\n'
words = line.split('\n')
print(words, len(words))
output
['This is a test', ''] 2
So your num_words += len(words) doesn't actually count words, it just gets twice the count of the number of lines.
To get an actual list of the words in line you need
words = line.split()
Your penultimate line
num_charsx = num_chars - line.count(' ')
is outside the for loop so it subtracts the space count of the last line of the file from the total num_chars, but I assume you really want to subtract the total space count of the whole file from num_chars.
Here's a repaired version of your code.
num_words = 0
num_chars = 0
num_spaces = 0
with open(fname, 'r') as f:
    for num_lines, line in enumerate(f, 1):
        num_words += len(line.split())
        num_chars += len(line) - 1
        num_spaces += line.count(' ')
num_charsx = num_chars - num_spaces
print(num_lines, num_words, num_chars, num_spaces, num_charsx)
I've modified the line reading loop to use enumerate. That's an efficient way to get the line number and the line contents without having to maintain a separate line counter.
In num_chars += len(line) - 1 the -1 is so we don't include the terminating '\n' of each line in the char count.
Note that on Windows text file lines are (normally) terminated with '\r\n' but that terminator gets converted to '\n' when you read a file opened in text mode. So on Windows the actual byte size of the file is num_chars + 2 * num_lines, assuming the last line has a '\r\n' terminator; it may not, in which case the actual size will be 2 bytes less than that.
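The size relationship can be checked on any platform by writing a file with explicit '\r\n' terminators. This is only a sketch: the two-line sample and the temporary file name are made up, and it assumes ASCII text so that one character is one byte.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    # newline="\r\n" makes Python write every '\n' as '\r\n', as on Windows.
    with open(path, "w", newline="\r\n") as f:
        f.write("ab cd\nef\n")

    num_chars = 0
    num_lines = 0
    with open(path, "r") as f:              # text mode: '\r\n' is read back as '\n'
        for num_lines, line in enumerate(f, 1):
            num_chars += len(line) - 1      # drop the terminating '\n'

    byte_size = os.path.getsize(path)

print(num_chars, num_lines, byte_size)      # 7 2 11
print(byte_size == num_chars + 2 * num_lines)  # True
```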
You may want to try splitting the lines on ' ' instead of '\n', since splitting on '\n' is pretty much already done by the for loop.
The other option, if you just want a character count, is to use the replace method to remove ' ' and then take the length of the string.
num_chars = len(line.replace(' ', ''))
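Note that the line above handles a single line and still counts its '\n'. A possible way to extend it to a whole file is sketched below; the function name and the in-memory sample are made up for illustration.

```python
import io

def count_chars_without_spaces(lines):
    """Sum the characters of every line, ignoring spaces and the line terminator."""
    total = 0
    for line in lines:
        line = line.split('\n')[0]            # drop the terminating '\n', if any
        total += len(line.replace(' ', ''))   # drop the spaces
    return total

sample = io.StringIO("ab cd\n e f \nxyz")
print(count_chars_without_spaces(sample))  # 9
```

Any iterable of lines works here, so an open file object can be passed in place of the StringIO.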
You could also try this:
num_chars = 0
num_spaces = 0
with open("C:/Python33/fire.txt",'r') as f:
    for line in f:
        num_chars += len(line.split('\n')[0])
        num_spaces += line.count(' ')
num_charsx = num_chars - num_spaces
print(num_charsx)
def count_spaces(filename):
    input_file = open(filename,'r')
    file_contents = input_file.read()
    space = 0
    tabs = 0
    newline = 0
    for line in file_contents == " ":
        space +=1
        return space
    for line in file_contents == '\t':
        tabs += 1
        return tabs
    for line in file_contents == '\n':
        newline += 1
        return newline
    input_file.close()
I'm trying to write a function which takes a filename as a parameter and returns the total number of all spaces, newlines and tab characters in the file. I want to use a basic for loop and if statement, but I'm struggling at the moment. Any help would be great, thanks.
Your current code doesn't work because you're combining loop syntax (for x in y) with a conditional test (x == y) in a single muddled statement. You need to separate those.
You also need to use just a single return statement, as otherwise the first one you reach will stop the function from running and the other values will never be returned.
Try:
for character in file_contents:
    if character == " ":
        space +=1
    elif character == '\t':
        tabs += 1
    elif character == '\n':
        newline += 1
return space, tabs, newline
The code in Joran Beasley's answer is a more Pythonic approach to the problem. Rather than having separate conditions for each kind of character, you can use the collections.Counter class to count the occurrences of all characters in the file, and just extract the counts of the whitespace characters at the end. A Counter works much like a dictionary.
from collections import Counter
def count_spaces(filename):
    with open(filename) as in_f:
        text = in_f.read()
    count = Counter(text)
    return count[" "], count["\t"], count["\n"]
To support large files, you could read a fixed number of bytes at a time:
#!/usr/bin/env python
from collections import namedtuple
Count = namedtuple('Count', 'nspaces ntabs nnewlines')
def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb') as file:
        chunk = file.read(chunk_size)
        while chunk:
            nspaces += chunk.count(b' ')
            ntabs += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
            chunk = file.read(chunk_size)
    return Count(nspaces, ntabs, nnewlines)
if __name__ == "__main__":
    print(count_spaces(__file__))
Output
Count(nspaces=150, ntabs=0, nnewlines=20)
mmap allows you to treat a file as a bytestring without actually loading the whole file into memory e.g., you could search for a regex pattern in it:
#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple
Count = namedtuple('Count', 'nspaces ntabs nnewlines')
def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb', 0) as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
    return Count(c[b' '], c[b'\t'], c[b'\n'])
if __name__ == "__main__":
    print(count_spaces(__file__))
Output
Count(nspaces=107, ntabs=0, nnewlines=18)
C=Counter(open(afile).read())
C[' ']
In my case a tab (\t) is converted to "    " (four spaces), so I have modified the logic a bit to take care of that.
def count_spaces(filename):
    with open(filename,"r") as f1:
        contents=f1.readlines()
    total_tab=0
    total_space=0
    for line in contents:
        total_tab += line.count("    ")
        total_tab += line.count("\t")
        total_space += line.count(" ")
    print("Space count = ",total_space)
    print("Tab count = ",total_tab)
    print("New line count = ",len(contents))
    return total_space,total_tab,len(contents)
I'm trying to insert an incrementing number after each occurrence of ~||~ in my .txt file. I have this working, but I want to split it up so that after each semicolon the count starts over at 1.
So far I have the following, which does everything except restart at semicolons.
inputfile = "output2.txt"
outputfile = "/output3.txt"
f = open(inputfile, "r")
words = f.read().split('~||~')
f.close()
count = 1
for i in range(len(words)):
    if ';' in words[i]:
        count = 1
    words[i] += "~||~" + str(count)
    count = count + 1
f2 = open(outputfile, "w")
f2.write("".join(words))
Why not first split the file on the semicolons, then count the occurrences of '~||~' in each segment?
with open(inputfile) as f:
    semicolon_separated_chunks = f.read().split(';')
# str.count is used here because '|' is a special character in regular expressions
counts = [chunk.count('~||~') for chunk in semicolon_separated_chunks]
# if file text is 'hello there ~||~ what is that; what ~||~ do you ~|| mean; nevermind ~||~'
# then counts = [1, 1, 1]
Instead of resetting the counter the way you are now, you could do the initial split on ;, and then split the substrings on ~||~. You'd have to store your words another way, since you're no longer doing words = f.read().split('~||~'), but it's safer to make an entirely new list anyway.
inputfile = "output2.txt"
outputfile = "/output3.txt"
all_words = []
f = open(inputfile, "r")
lines = f.read().split(';')
f.close()
for line in lines:
    count = 1
    words = line.split('~||~')
    for word in words:
        all_words.append(word + "~||~" + str(count))
        count += 1
f2 = open(outputfile, "w")
f2.write("".join(all_words))
See if this works for you. You also may want to put some strategically-placed newlines in there, to make the output more readable.
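One thing to watch for with the split-and-rejoin approach: splitting on ~||~ yields one more piece than there are markers, and splitting on ; removes the semicolons. Here is a minimal sketch that keeps both in mind. The function name number_markers and the sample string are made up for illustration, and it works on a string rather than on output2.txt.

```python
def number_markers(text, marker="~||~", reset=";"):
    """Append a counter after each marker, restarting at 1 after every reset character."""
    numbered_segments = []
    for segment in text.split(reset):
        pieces = segment.split(marker)
        # pieces[0] precedes the first marker; every later piece follows one.
        out = pieces[0]
        for count, piece in enumerate(pieces[1:], 1):
            out += marker + str(count) + piece
        numbered_segments.append(out)
    # Put the semicolons back that split() removed.
    return reset.join(numbered_segments)

print(number_markers("a~||~b~||~c;d~||~e"))  # a~||~1b~||~2c;d~||~1e
```

To use it on a file, read the whole text, pass it through the function, and write the returned string out.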