This program I wrote in Python is supposed to count the number of uppercase letters, lowercase letters, digits, and whitespace characters in a text file. It keeps coming back with a syntax error, and I'm having trouble finding where my error is.
infile = open("text.txt", "r")
uppercasecount, lowercasecount, digitcount = (0, 0, 0)
for character in infile.readlines():
    if character.isupper() == True:
        uppercasecount += 1
    if character.islower() == True:
        lowercasecount += 1
    if character.isdigit() == True:
        digitcount += 1
print(uppercasecount),(lowercasecount),(digitcount)
print "Total count is %d Upper case, %d Lower case and %d Digit(s)" %(uppercasecount, lowercasecount, digitcount)
Change this:
print(uppercasecount),(lowercasecount),(digitcount)
to:
print uppercasecount,lowercasecount,digitcount
Instead of readlines, use read:
for character in infile.read():
readlines reads the whole file into a list of lines, so the loop tests whole lines rather than single characters.
read reads the whole file into a single string, so the loop visits one character at a time.
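To make the difference concrete, here is a small sketch. The sample file and its contents are made up for the illustration; it only shows what each call returns and why the per-line loop counts nothing.

```python
import os
import tempfile

# Write a small sample file so the example is self-contained (contents are made up).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Ab1\ncD2\n")
    sample_path = tmp.name

with open(sample_path) as infile:
    as_lines = infile.readlines()   # a list with one string per line

with open(sample_path) as infile:
    as_text = infile.read()         # one string holding the whole file

os.remove(sample_path)

# Iterating over the list yields whole lines, so isupper() tests a line at a time.
upper_by_line = sum(1 for item in as_lines if item.isupper())
# Iterating over the string yields single characters, which is what the count needs.
upper_by_char = sum(1 for ch in as_text if ch.isupper())

print(upper_by_line, upper_by_char)
```

With mixed-case lines, str.isupper() is False for every line, so the first count stays at zero while the second counts each uppercase letter.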
Complete answer for Python 2 and 3. Try this if you want to count letters rather than words:
infile = open("text.txt", "r")
uppercasecount, lowercasecount, digitcount = (0, 0, 0)
for character in infile.read():
    if character.isupper():
        uppercasecount += 1
    if character.islower():
        lowercasecount += 1
    if character.isdigit():
        digitcount += 1
print(uppercasecount,lowercasecount,digitcount)
print("Total count is %d Upper case, %d Lower case and %d Digit(s)" %(uppercasecount, lowercasecount, digitcount))
I'm writing a program to encode, decode and crack with the Caesar Cipher.
I have this function that shifts the letters in a string along by a specified amount:
def shift(data, shifter):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    data = list(data)
    counter = 0 # we will use this to modify the list while we iterate over it
    for letter in data:
        letter = letter.lower()
        if letter not in alphabet:
            counter += 1
            continue
        lPos = alphabet.find(letter)
        if shifter >= 0:
            shiftedPos = lPos + (0 - shifter)
        else:
            shiftedPos = lPos + abs(shifter)
        if shiftedPos >= len(alphabet) - 1: shiftedPos -= len(alphabet)
        data[counter] = alphabet[shiftedPos] # update the letter
        counter += 1 # advance
    data = ''.join(data) # make it into a string again
    return data
And I have this function to crack a ciphered string:
def crack(decryptor=None, tries=None):
    if decryptor is None and tries is None:
        task = getValidInput("Get data from a [f]ile or [s]tdin? >", "Please give either 'f' or 's'.", 'f', 's')
        if task == "f": # it's a file
            dataFile = getValidFile() # get an open file object
            data = dataFile.read() # get the data from the text file. hopefully it's ascii text!
        elif task == "s": # we need to get data from stdin
            data = input("Enter data to crack >")
        tries = getValidInt("Enter tries per sweep >")
    else:
        data = decryptor
    retry = True
    shifter = 0
    while retry:
        for i in range(0, tries):
            oput = "Try " + str(i) + ": "
            posData = shift(data, shifter)
            negData = shift(data, 0 - shifter)
            # semitry 1 - positive
            oput += posData + ", "
            # semitry 2 - negative
            negData = ''.join(negData) # make it into a string again
            oput += negData
            print(oput)
            shifter += 1
        doRetry = getValidInput("Keep trying (y/n)? > ", "Invalid!", 'y', 'n')
        if doRetry == 'n': retry = False
However, after selecting 'y' to continue a few times, I get the following IndexError:
Traceback (most recent call last):
  File "CeaserCypher.py", line 152, in <module>
    crack()
  File "CeaserCypher.py", line 131, in crack
    negData = shift(data, 0 - shifter)
  File "CeaserCypher.py", line 60, in shift
    print(alphabet[shiftedPos])
IndexError: string index out of range
Why am I getting this error and how can I fix it?
IndexError means that the index you are trying to access does not exist. For a string, that means you are asking for the character at a position that lies outside the string.
"0123456"[7] asks for the character at index 7 (the eighth character), but the string only has indexes 0 to 6, so IndexError is raised.
All valid non-negative indexes on a string are less than its length, len(string). In your case, alphabet[shiftedPos] raises IndexError because shiftedPos is greater than or equal to the length of the string alphabet.
To my understanding, you want to wrap around to the start of the string when you go out of bounds like this. "z" (index 25) incremented by, say, 2 becomes index 27. You want that to become index 1 (the letter "b") in this case. Hence, you should use modulo: replace alphabet[shiftedPos] with alphabet[shiftedPos % len(alphabet)], and I believe this will solve the problem.
Modulo, by the way, divides a number by n and gives you the remainder. For a non-negative number, that is the same as subtracting n until the result is less than n, so the index always lands in the range 0 to n - 1. In Python the result also lands in that range for negative numbers when n is positive.
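A minimal sketch of the wrap-around, separate from the shift function in the question. The names shift_letter and shift_text are made up for this illustration, and a positive amount is assumed to move forward through the alphabet, which may be the opposite of the sign convention in the original code.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def shift_letter(letter, amount):
    """Shift one lowercase letter by `amount` positions, wrapping around."""
    position = ALPHABET.find(letter)
    if position == -1:
        return letter  # leave anything that is not a lowercase letter untouched
    # The modulo keeps the index inside 0..25 for positive and negative amounts.
    return ALPHABET[(position + amount) % len(ALPHABET)]

def shift_text(text, amount):
    """Lowercase the text and shift every letter in it."""
    return "".join(shift_letter(ch, amount) for ch in text.lower())

print(shift_letter("z", 2))    # b
print(shift_text("xyz!", 3))   # abc!
print(shift_text("abc", -3))   # xyz
```

Because the modulo is applied to the final index, the shift amount can grow past 26 in either direction without raising IndexError.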
I am trying to count the number of paragraphs and the most frequent words in a text file (any text file for that matter) but seem to have zero output when I run my code, no errors either. Any tips on where I'm going wrong?
filename = input("enter file name: ")
inf = open(filename, 'r')
#frequent words
wordcount={}
for word in inf.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
for key in wordcount.keys():
    print ("%s %s " %(key , wordcount[key]))
#Count Paragraph(s)
linecount = 0
for i in inf:
    paragraphcount = 0
    if '\n' in i:
        linecount += 1
    if len(i) < 2: paragraphcount *= 0
    elif len(i) > 2: paragraphcount = paragraphcount + 1
    print('%-4d %4d %s' % (paragraphcount, linecount, i))
inf.close()
filename = raw_input("enter file name: ")
wordcount={}
paragraphcount = 0
linecount = 0
with open(filename, 'r') as ftext:
    for line in ftext.readlines():
        if line in ('\n', '\r\n'):
            if linecount == 0:
                paragraphcount = paragraphcount + 1
            linecount = linecount + 1
        else:
            linecount = 0
            #frequent words
            for word in line.split():
                wordcount[word] = wordcount.get(word,0) + 1
print wordcount
print paragraphcount
When you read a file, a cursor tracks the byte you are currently reading. Your code tries to read the file twice and gets strange behavior, which should have been a hint that something is wrong.
What is the correct way?
You should read the file once, store every line, and then compute the word count and the paragraph count from that same store, rather than trying to read the file twice.
What is happening in the current code?
When you first read the file, the cursor ends up at the end of the file. When you then iterate over the lines, nothing is returned because there is nothing left to read. You can correct this by resetting the file pointer (the cursor).
Call inf.seek(0) just before you try to read the lines. But instead of this, you should focus on implementing the approach described in the first section.
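A minimal sketch of the read-once approach, working on a string that has already been read from the file. The function name analyse and the sample text are made up, and it assumes a paragraph is a block of non-blank lines separated from the next block by at least one blank line.

```python
from collections import Counter

def analyse(text):
    """Return (word_counts, paragraph_count) for text that was read once."""
    word_counts = Counter(text.split())
    # Assumption: paragraphs are runs of non-blank lines separated by blank lines.
    paragraph_count = 0
    in_paragraph = False
    for line in text.splitlines():
        if line.strip():
            if not in_paragraph:
                paragraph_count += 1
                in_paragraph = True
        else:
            in_paragraph = False
    return word_counts, paragraph_count

sample = "the cat sat\non the mat\n\nthe end\n"
counts, paragraphs = analyse(sample)
print(counts.most_common(1), paragraphs)
```

With a real file you would read the contents once, for example text = inf.read(), and pass that string to the function, so the cursor position never matters.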
def count_spaces(filename):
    input_file = open(filename,'r')
    file_contents = input_file.read()
    space = 0
    tabs = 0
    newline = 0
    for line in file_contents == " ":
        space +=1
    return space
    for line in file_contents == '\t':
        tabs += 1
    return tabs
    for line in file_contents == '\n':
        newline += 1
    return newline
    input_file.close()
I'm trying to write a function that takes a filename as a parameter and returns the total number of spaces, newlines, and tab characters in the file. I want to use a basic for loop and if statements, but I'm struggling at the moment. Any help would be great, thanks.
Your current code doesn't work because you're combining loop syntax (for x in y) with a conditional test (x == y) in a single muddled statement. You need to separate those.
You also need to use just a single return statement, as otherwise the first one you reach will stop the function from running and the other values will never be returned.
Try:
for character in file_contents:
    if character == " ":
        space +=1
    elif character == '\t':
        tabs += 1
    elif character == '\n':
        newline += 1
return space, tabs, newline
The code in Joran Beasley's answer is a more Pythonic approach to the problem. Rather than having separate conditions for each kind of character, you can use the collections.Counter class to count the occurrences of all characters in the file, and just extract the counts of the whitespace characters at the end. A Counter works much like a dictionary.
from collections import Counter
def count_spaces(filename):
    with open(filename) as in_f:
        text = in_f.read()
    count = Counter(text)
    return count[" "], count["\t"], count["\n"]
To support large files, you could read a fixed number of bytes at a time:
#!/usr/bin/env python
from collections import namedtuple
Count = namedtuple('Count', 'nspaces ntabs nnewlines')
def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb') as file:
        chunk = file.read(chunk_size)
        while chunk:
            nspaces += chunk.count(b' ')
            ntabs += chunk.count(b'\t')
            nnewlines += chunk.count(b'\n')
            chunk = file.read(chunk_size)
    return Count(nspaces, ntabs, nnewlines)
if __name__ == "__main__":
    print(count_spaces(__file__))
Output
Count(nspaces=150, ntabs=0, nnewlines=20)
mmap allows you to treat a file as a bytestring without actually loading the whole file into memory e.g., you could search for a regex pattern in it:
#!/usr/bin/env python3
import mmap
import re
from collections import Counter, namedtuple
Count = namedtuple('Count', 'nspaces ntabs nnewlines')
def count_spaces(filename, chunk_size=1 << 13):
    """Count number of spaces, tabs, and newlines in the file."""
    nspaces = ntabs = nnewlines = 0
    # assume ascii-based encoding and b'\n' newline
    with open(filename, 'rb', 0) as file, \
         mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as s:
        c = Counter(m.group() for m in re.finditer(br'[ \t\n]', s))
    return Count(c[b' '], c[b'\t'], c[b'\n'])
if __name__ == "__main__":
    print(count_spaces(__file__))
Output
Count(nspaces=107, ntabs=0, nnewlines=18)
C=Counter(open(afile).read())
C[' ']
In my case a tab (\t) is converted to "    " (four spaces), so I have modified the logic a bit to take care of that.
def count_spaces(filename):
    with open(filename,"r") as f1:
        contents=f1.readlines()
    total_tab=0
    total_space=0
    for line in contents:
        total_tab += line.count("    ")
        total_tab += line.count("\t")
        total_space += line.count(" ")
    print("Space count = ",total_space)
    print("Tab count = ",total_tab)
    print("New line count = ",len(contents))
    return total_space,total_tab,len(contents)
def showCounts(fileName):
    lineCount = 0
    wordCount = 0
    numCount = 0
    comCount = 0
    dotCount = 0
    with open(fileName, 'r') as f:
        for line in f:
            for char in line:
                if char.isdigit() == True:
                    numCount+=1
                elif char == '.':
                    dotCount+=1
                elif char == ',':
                    comCount+=1
                #i know formatting below looks off but it's right
                words = line.split()
                lineCount += 1
                wordCount += len(words)
                for word in words:
                    # text = word.translate(string.punctuation)
                    exclude = set(string.punctuation)
                    text = ""
                    text = ''.join(ch for ch in text if ch not in exclude)
                    try:
                        if int(text) >= 0 or int(text) < 0:
                            numCount += 1
                    except ValueError:
                        pass
print("Line count: " + str(lineCount))
print("Word count: " + str(wordCount))
print("Number count: " + str(numCount))
print("Comma count: " + str(comCount))
print("Dot count: " + str(dotCount) + "\n")
I have it read a .txt file containing words, lines, dots, commas, and numbers. It will give me the correct number of dots commas and numbers, but the words and lines values will be each much much higher than they actually are. Any one know why? Thanks guys.
I don't know if this is actually the answer, but my reputation isn't high enough to comment, so I'm putting it here. You obviously don't need to accept it as the final answer if it doesn't solve the issue.
So, I think it might have something to do with the fact that all of your print statements are actually outside of the showCounts() function. Try indenting the print statements.
I hope this helps.
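Another possible cause, offered as a guess rather than a diagnosis: if the line and word counting really sits inside the per-character loop, as the comment in the question hints, those counters are incremented once for every character instead of once per line. The sketch below moves them out of the character loop. The function name count_stats, the dictionary keys, and the sample lines are made up for the illustration, and it takes any iterable of lines rather than a filename.

```python
def count_stats(lines):
    """Count lines, words, digit characters, commas and dots in an iterable of lines."""
    stats = {"lines": 0, "words": 0, "digits": 0, "commas": 0, "dots": 0}
    for line in lines:
        # Per-line counters are updated once per line, outside the character loop.
        stats["lines"] += 1
        stats["words"] += len(line.split())
        for char in line:
            if char.isdigit():
                stats["digits"] += 1
            elif char == ",":
                stats["commas"] += 1
            elif char == ".":
                stats["dots"] += 1
    return stats

print(count_stats(["Hello, world.\n", "42 is 6 times 7.\n"]))
```

An open file object is itself an iterable of lines, so it can be passed to the function directly inside a with block.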
I'm writing a Python function that adds up the numbers on each line so that I can then find the average. This is what my file looks like:
-2.7858521
-2.8549764
-2.8881847
2.897689
1.6789098
-0.07865
1.23589
2.532461
0.067825
-3.0373958
Basically, I've written a program that loops over each line, incrementing a line counter and converting each line to a float.
counterTot = 0
with open('predictions2.txt', 'r') as infile:
    for line in infile:
        counterTot += 1
        i = float(line.strip())
Now this is the part where I get a little stuck:
totalSum =
mean = totalSum / counterTot
print(mean)
As you can tell, I'm new to Python, but I find it very handy for text analysis work, so I'm getting into it.
Extra function
I was also looking into an extra feature, but it should be a separate function from the one above.
counterTot = 0
with open('predictions2.txt', 'r') as infile:
    for line in infile:
        counterTot += 1
        i = float(line.strip())
        if i > 3:
            i = 3
        elif i < -3:
            i = -3
As you can see from the code, the function checks whether a number is bigger than 3 and, if so, makes it 3. If the number is smaller than -3, it makes it -3. But I'm trying to output this to a new file so that it keeps its structure intact. In both situations I would like to keep the decimal places. I can always round the output numbers myself; I just need the numbers intact.
You can do this without loading the elements into a list by cheekily using fileinput and retrieving the line count from it:
import fileinput
fin = fileinput.input('your_file')
total = sum(float(line) for line in fin)
print total / fin.lineno()
You can use enumerate here:
with open('predictions2.txt') as f:
    tot_sum = 0
    for i,x in enumerate(f, 1):
        val = float(x)
        #do something with val
        tot_sum += val #add val to tot_sum
    print tot_sum/i #print average
#prints -0.32322842
I think you want something like this:
with open('numbers.txt') as f:
    numbers = f.readlines()
    average = sum([float(n) for n in numbers]) / len(numbers)
print average
Output:
-0.32322842
It reads your numbers from numbers.txt, splits them by newline, casts them to a float, adds them all up and then divides the total by the length of your list.
Do you mean you need to change 5.1234 to 3.1234 and -8.5432 to -3.5432 ?
line = " -5.123456 "
i = float(line.strip())
if i > 3:
    n = int(i)
    i = i - (n - 3)
elif i < -3:
    n = int(i)
    i = i - (n + 3)
print(i)
It gives you:
-3.123456
Edit:
shorter version
line = " -5.123456 "
i = float(line.strip())
if i >= 4:
    i -= int(i) - 3
elif i <= -4:
    i -= int(i) + 3
print(i)
Edit 2:
If you need to change 5.1234 to 3.0000 ("3" and 4x "0") and -8.7654321 to -3.0000000 ("-3" and 7x "0")
line = " -5.123456 "
line = line.strip()
i = float(line)
if i > 3:
    length = len(line.split(".")[1])
    i = "3.%s" % ("0" * length) # now it will be string again
elif i < -3:
    length = len(line.split(".")[1])
    i = "-3.%s" % ("0" * length) # now it will be string again
print(i)
Here is a more verbose version. You could decide to replace invalid lines (if any) with a neutral value instead of ignoring them:
numbers = []
with open('myFile.txt', 'r') as myFile:
    for line in myFile:
        try:
            value = float(line)
        except ValueError:
            print line, "is not a valid float" # or numbers.append(defaultValue) if an exception occurred
        else:
            numbers.append(value)
print sum(numbers) / len(numbers)
For your second request, here is the most straightforward solution:
def clamp(value, lowBound, highBound):
    return max(min(highBound, value), lowBound)
Applying it to our list:
clampedValues = map(lambda x: clamp(x, -3.0, 3.0), numbers)
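The question also asks to write the clamped numbers to a new file without losing decimal places. A minimal sketch of one way to do that, assuming one number per line: lines whose value is already in range are passed through as their original text, and only out-of-range lines are rewritten. The name clamp_lines and the sample values are made up for the illustration.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))

def clamp_lines(lines, low=-3.0, high=3.0):
    """Yield one output string per non-blank input line."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        value = float(text)
        clamped = clamp(value, low, high)
        # Only rewrite the text when the value actually changed, so that
        # in-range numbers keep every decimal place from the input.
        yield text if clamped == value else repr(clamped)

sample = ["-2.7858521\n", "3.5\n", "-3.0373958\n", "0.067825\n"]
result = list(clamp_lines(sample))
print(result)
```

To produce the new file, the generator can be fed an open input file and each yielded string written to an output file followed by a newline. The output filename is up to you.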