How to count number of replacements made in string - python

I am currently working on a beginner problem (https://www.reddit.com/r/beginnerprojects/comments/1i6sax/challenge_count_and_fix_green_eggs_and_ham/).
The challenge is to read through a file, replace lowercase 'i' with 'I', and write a new, corrected file.
I am at a point where the program reads the input file, replaces the relevant lowercase characters, and writes a new corrected file. However, I also need to count the number of corrections.
I have looked through the .replace() documentation and I cannot see any way to find out how many replacements were made. Is it possible to count the corrections using the replace method?
def capitalize_i(file):
    file = file.replace('i ', 'I ')
    file = file.replace('-i-', '-I-')
    return file
with open("green_eggs.txt", "r") as f_open:
    file_1 = f_open.read()
file_2 = open("result.txt", "w")
file_2.write(capitalize_i(file_1))

You can use the str.count() method to count each pattern just before you replace it:
i_count = file.count('i ')
file = file.replace('i ', 'I ')
i_count += file.count('-i-')
file = file.replace('-i-', '-I-')
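i_count will then hold the total number of replacements made, because count() and replace() both work on the same non-overlapping occurrences of a substring. If you want separate totals, store each count in its own variable.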
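Another option is re.subn() from the standard library, which performs the substitution and returns the number of substitutions in a single call. The sketch below swaps the two literal patterns for a regular expression that matches any standalone lowercase "i" (so it also fixes cases such as "i'm"); that broader rule is an assumption about what should count as a correction, and the function name capitalize_i_counted is hypothetical.

```python
import re

def capitalize_i_counted(text):
    """Capitalize every standalone lowercase 'i' and report how many were changed."""
    # \b is a word boundary, so the 'i' inside words like "think" is left alone.
    # re.subn returns a tuple: (new_string, number_of_substitutions_made)
    new_text, count = re.subn(r"\bi\b", "I", text)
    return new_text, count
```

To use it on the file, read green_eggs.txt into a string as before, call new_text, n = capitalize_i_counted(file_1), write new_text to result.txt, and print n.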

Related

Is there a way to reverse the order of lines within a text file using a function in python?

def encrypt():
    while True:
        try:
            userinp = input("Please enter the name of a file: ")
            file = open(f"{userinp}.txt", "r")
            break
        except:
            print("That File Does Not Exist!")
    second = open("encoded.txt", "w")
    for line in file:
        reverse_word(line)
def reverse_word(line):
    data = line.read()
    data_1 = data[::-1]
    print(data_1)
    return data_1
encrypt()
I'm supposed to write a program that encrypts a text file in some way, and one method I'm trying to use is reversing the sequence of the lines in the text file. All of my other functions use "for line in file", passing each line to a separate function that then changes it for the purpose of encryption. When I try to do the same thing here to reverse the order of the lines in the file, I get this error:
AttributeError: 'str' object has no attribute 'read'
I've tried passing the whole file to the function instead, which works, but I want it to work on the individual lines from the file, just like my other functions do (put simply, I want to call this function inside the for loop).
Any Suggestions? Thanks!
Are you trying to reverse the order of the lines or the order of the words in each line?
To reverse the order of the lines, read them into a list and call the list's built-in reverse() method (fp here is the open file object):
lines = fp.readlines()
lines.reverse()
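Applied to a whole file, that idea might look like the sketch below. The function name reverse_file_lines and its two path parameters are hypothetical. One detail worth handling: the last line of a file often has no trailing newline, and once it moves to the top it would run into the next line unless one is added.

```python
def reverse_file_lines(src_path, dst_path):
    """Write the lines of src_path to dst_path in reverse order."""
    with open(src_path) as infile:
        lines = infile.readlines()  # one string per line, newlines included
    # Give the last line a newline so it does not merge with the line after it.
    if lines and not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    lines.reverse()  # reverses the list in place
    with open(dst_path, "w") as outfile:
        outfile.writelines(lines)
```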
If you're trying to reverse the words (actual words, not just the string of characters in each line), you'll need a regular expression that matches on word boundaries.
Otherwise, reversing the characters of each line can be done like this:
lines = fp.readlines()
for line in lines:
    chars = list(line)
    chars.reverse()
    reversed_line = "".join(chars)  # turn the list of characters back into a string
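For the regular-expression route mentioned above, one possible sketch uses re.sub() with a function as the replacement, so each run of word characters is reversed while spaces and punctuation stay where they are. The name reverse_each_word is hypothetical, and "word" here means a run matched by \w+ (letters, digits and underscores), so a contraction such as "don't" is treated as two words.

```python
import re

def reverse_each_word(line):
    """Reverse the letters of every word, leaving spaces and punctuation in place."""
    # \w+ matches a run of letters/digits/underscores; the lambda reverses each match.
    return re.sub(r"\w+", lambda m: m.group(0)[::-1], line)
```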
I think the bug you're referring to is in this function:
def reverse_word(line):
    data = line.read()
    data_1 = data[::-1]
    print(data_1)
    return data_1
You don't need to call read() on line because it's already a string; read() is a method of file objects that returns their contents as a string. Just do:
def reverse_line(line):
    return line[::-1]
and it will reverse the entire line.
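One side effect of line[::-1] is that the newline at the end of each line read from a file moves to the front of the reversed string. The sketch below keeps the newline at the end and shows the function called once per line inside a for loop, the way the question's other functions are used. The names reverse_line and encrypt_file and the path parameters are assumptions for illustration.

```python
def reverse_line(line):
    """Reverse one line's characters but keep its newline at the end."""
    body = line.rstrip("\n")
    newline = line[len(body):]  # "\n" if the line had one, otherwise ""
    return body[::-1] + newline

def encrypt_file(src_path, dst_path):
    """Write every line of src_path, reversed, to dst_path."""
    with open(src_path) as infile, open(dst_path, "w") as outfile:
        for line in infile:  # line is already a str, so no read() is needed
            outfile.write(reverse_line(line))
```

Calling encrypt_file(f"{userinp}.txt", "encoded.txt") after the input loop would then produce the encoded file.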
If you want to reverse each word individually while keeping the words in their original order (e.g. turn "the cat sat on a hat" into "eht tac tas no a tah"), that would be something like:
def reverse_words(line):
    return ' '.join(word[::-1] for word in line.split())
If you want to reverse the order of the words but not the words themselves (e.g. turn "the cat sat on a hat" into "hat a on sat cat the"), that would be:
def reverse_word_order(line):
    return ' '.join(line.split()[::-1])

Troubles averaging all the grades from a "CSV" file

So I am new to Python and I am having a hard time figuring out this code. I am trying to use a CSV file called exam_grades.csv and write a function that reads in all the values in the file, using the string method split() to split each long string into a list of strings, where each string represents a grade. The function should then return the average of all the grades.
So far this is what I have. I can open the .csv file just fine, but I'm having trouble averaging all the grades. I have commented some lines out because I am not sure where to go from here :(
def fileSearch():
    'Problem 4'
    readfile = open('exam_grades.csv', "r")
    for line in readfile:
        l = line.split(str(","))
        #num_grades = len(l)
        #averageAllGrades = l * 500
        #return num_grades
        print(l)
fileSearch()
Any advice?
Thanks!
Most CSV files have a header row at the top that you'll want to skip, but for simplicity's sake let's ignore that here.
Here's some code that works:
def fileSearch():
    'Problem 4'
    grade_sum = 0
    grade_count = 0
    with open('exam_grades.csv', "r") as readfile:
        for line in readfile:
            for grade in line.split(","):
                grade_sum += int(grade)  # int() ignores surrounding whitespace such as "\n"
                grade_count += 1
    return grade_sum / grade_count
print(fileSearch())
This handles any number of lines, each holding one or more comma-separated grades; it assumes every value is a whole number and that there are no blank lines or trailing commas.
We keep track of two variables here: the sum of all the grades and the number of grades we've added up. We also cast each value to an integer, since what we read from the file are strings.
When you add all the grades up and divide by the number of grades, you get an average.
Hope this helped.
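If the file does have a header row, or contains decimal grades or empty cells, the standard csv module can take care of the splitting. The sketch below is one way to do it under those assumptions; the function name average_grades and the has_header parameter are hypothetical.

```python
import csv

def average_grades(path, has_header=False):
    """Return the mean of every non-empty field in a CSV file of numbers."""
    total = 0.0
    count = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)  # yields each row as a list of strings
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            for field in row:
                field = field.strip()
                if field:  # ignore empty cells, e.g. from a trailing comma
                    total += float(field)
                    count += 1
    return total / count if count else 0.0
```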

Count number of sentences ending with punctuation mark in a text file

I'm attempting to make a function that counts the number of sentences in a textfile. In this case, a sentence refers to any string ending with either a '.', '?', or a '!'.
I'm new to Python and I'm having trouble figuring out how to do this. I keep getting the error 'UnboundLocalError: local variable 'numberofSentences' referenced before assignment.' Any help would be appreciated!
def countSentences(filename):
    endofSentence =['.', '!', '?']
    for sentence in filename:
        for fullStops in endofSentence:
            if numberofSentences.find(fullStops) == true:
                numberofSentences = numberofSentences+1
    return numberofSentences
print(countSentences('paragraph.txt'))
You need to initialize the variable before incrementing it. You also need to open the file itself, read its text and evaluate that, rather than looping over the letters of the filename string.
# create file
with open("paragraph.txt","w") as f:
    f.write("""
Some text. More text. Even
more text? No, dont need more of that!
Nevermore.""")
def countSentences(filename):
    """Count the number of !.? in a file. Return it."""
    numberofSentences = 0  # init variable here before using it
    with open(filename) as f:  # read file
        for line in f:  # process the file line by line, each line char by char
            for char in line:
                if char in {'.', '!', '?'}:
                    numberofSentences += 1
    return numberofSentences
print(countSentences('paragraph.txt'))
Output:
5
Documentation:
Reading and Writing Files (Python tutorial)
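It would work if you actually checked whether the full stop was in the sentence in the first place, and if you initialized numberofSentences beforehand. Also note that Python's boolean constant is spelled True; a lowercase true raises a NameError.
Instead of using find(), which returns a number (the index of the match, or -1 if there is none) rather than a bool, I think the best method is to loop over the characters of the file's text and write:
if sentence in endofSentence:
    numberofSentences += 1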
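Counting every '.', '!' and '?' separately means an ellipsis ("...") or "?!" is counted as several sentences. If that matters, one option is to count runs of consecutive ending characters with a regular expression, as in the sketch below (the function name count_sentences is hypothetical). Abbreviations such as "e.g." would still be over-counted.

```python
import re

def count_sentences(filename):
    """Count sentences, treating a run such as '...' or '?!' as one ending."""
    with open(filename) as f:
        text = f.read()
    # [.!?]+ matches one or more consecutive sentence-ending characters.
    return len(re.findall(r"[.!?]+", text))
```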

How to convert a list into float for using the '.join' function?

I have to compress a file into a list of words and list of positions to recreate the original file. My program should also be able to take a compressed file and recreate the full text, including punctuation and capitalization, of the original file. I have everything correct apart from the recreation, using the map function my program can't convert my list of positions into floats because of the '[' as it is a list.
My code is:
text = open("speech.txt")
CharactersUnique = []
ListOfPositions = []
DownLine = False
while True:
    line = text.readline()
    if not line:
        break
    TwoList = line.split()
    for word in TwoList:
        if word not in CharactersUnique:
            CharactersUnique.append(word)
        ListOfPositions.append(CharactersUnique.index(word))
    if not DownLine:
        CharactersUnique.append("\n")
        DownLine = True
    ListOfPositions.append(CharactersUnique.index("\n"))
w = open("List_WordsPos.txt", "w")
for c in CharactersUnique:
    w.write(c)
w.close()
x = open("List_WordsPos.txt", "a")
x.write(str(ListOfPositions))
x.close()
with open("List_WordsPos.txt", "r") as f:
    NewWordsUnique = f.readline()
f.close()
h = open("List_WordsPos.txt", "r")
lines = h.readlines()
NewListOfPositions = lines[1]
NewListOfPositions = map(float, NewListOfPositions)
print("Recreated Text:\n")
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions))
print(recreation)
The error I get is:
Task 3 Code.py", line 42, in <genexpr>
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions))
ValueError: could not convert string to float: '['
I am using Python IDLE 3.5 (32-bit). Does anyone have any ideas on how to fix this?
Why do you want to turn the position values in the list into floats? They are list indices, and those must be integers. I suspected this might be an instance of what is called the XY Problem.
I also found your code difficult to understand because you haven't followed PEP 8, the Style Guide for Python Code. In particular, many (although not all) of the variable names are CamelCased, a style the guidelines reserve for class names.
In addition some of your variables had misleading names, like CharactersUnique, which actually [mostly] contained unique words.
So, one of the first things I did was transform all the CamelCased variables into lowercase underscore-separated words, like camel_case. In several instances I also gave them better names to reflect their actual contents or role: For example: CharactersUnique became unique_words.
The next step was to improve the handling of files by using Python's with statement to ensure they all would be closed automatically at the end of the block. In other cases I consolidated multiple file open() calls into one.
After all that I had it almost working, but that's when I discovered a problem with the approach of treating newline "\n" characters as separate words of the input text file. This caused a problem when the file was being recreated by the expression:
" ".join(NewWordsUnique[pos] for pos in (NewListOfPositions))
because it adds a space before and after every "\n" character, and those spaces aren't in the original file. To work around that, I ended up writing out the loop that recreates the file explicitly instead of using a generator expression, because that allows the newline "words" to be handled properly.
At any rate, here's the resulting rewritten (and working) code:
input_filename = "speech.txt"
compressed_filename = "List_WordsPos.txt"
# Two lists to represent contents of input file.
unique_words = ["\n"]  # preload with newline "word"
word_positions = []
with open(input_filename, "r") as input_file:
    for line in input_file:
        for word in line.split():
            if word not in unique_words:
                unique_words.append(word)
            word_positions.append(unique_words.index(word))
        word_positions.append(unique_words.index("\n"))  # add newline at end of each line
# Write representations of the two data-structures to compressed file.
with open(compressed_filename, "w") as compr_file:
    words_repr = " ".join(repr(word) for word in unique_words)
    compr_file.write(words_repr + "\n")
    positions_repr = " ".join(repr(posn) for posn in word_positions)
    compr_file.write(positions_repr + "\n")
def strip_quotes(word):
    """Strip the first and last characters from the string (assumed to be quotes)."""
    tmp = word[1:-1]
    return tmp if tmp != "\\n" else "\n"  # newline "words" are special case
# Recreate input file from data in compressed file.
with open(compressed_filename, "r") as compr_file:
    line = compr_file.readline()
    new_unique_words = list(map(strip_quotes, line.split()))
    line = compr_file.readline()
    new_word_positions = map(int, line.split())  # using int, not float here
words = []
lines = []
for posn in new_word_positions:
    word = new_unique_words[posn]
    if word != "\n":
        words.append(word)
    else:
        lines.append(" ".join(words))
        words = []
print("Recreated Text:\n")
recreation = "\n".join(lines)
print(recreation)
I created my own speech.txt test file from the first paragraph of your question and ran the script on it with these results:
Recreated Text:
I have to compress a file into a list of words and list of positions to recreate
the original file. My program should also be able to take a compressed file and
recreate the full text, including punctuation and capitalization, of the
original file. I have everything correct apart from the recreation, using the
map function my program can't convert my list of positions into floats because
of the '[' as it is a list.
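If you would rather keep the original file layout, the ValueError itself comes from map(float, ...) iterating over the characters of the string "[0, 1, 2, ...]". Assuming the second line of List_WordsPos.txt holds exactly the output of str(ListOfPositions), the standard library's ast.literal_eval() can parse it back into a list, as in this sketch (parse_positions is a hypothetical name). The first line would still need a separator between the words, since w.write(c) writes them back to back.

```python
import ast

def parse_positions(line):
    """Turn a line like "[0, 1, 2, 0]" (the output of str(a_list)) into a list of ints."""
    # ast.literal_eval evaluates only Python literals, so it is safe on file data.
    return [int(p) for p in ast.literal_eval(line.strip())]
```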
Per your question in the comments:
You will want to split the input on spaces. You will also likely want to use different data structures.
# we'll map the words to a list of positions
all_words = {}
with open("speech.txt") as f:
    data = f.read()
# since we need to be able to re-create the file, we'll want
# line breaks
lines = data.split("\n")
for i, line in enumerate(lines):
    words = line.split(" ")
    for j, word in enumerate(words):
        if word in all_words:
            all_words[word].append((i, j))  # line and pos
        else:
            all_words[word] = [(i, j)]
Note that this does not give maximum compression, because "foo" and "foo." count as separate words. If you want more compression, you'll have to go character by character; hopefully you can now use a similar approach to do so if desired.
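Recreating the text from that dictionary means placing each word back at its (line, position) pair. The sketch below wraps the mapping loop in a function and adds the reverse step; both function names are hypothetical. Because the text is split on single "\n" and " " characters, joining with the same characters restores it exactly, including blank lines and repeated spaces.

```python
def compress(data):
    """Map each word to the list of (line, position) pairs where it occurs."""
    all_words = {}
    for i, line in enumerate(data.split("\n")):
        for j, word in enumerate(line.split(" ")):
            all_words.setdefault(word, []).append((i, j))
    return all_words

def decompress(all_words):
    """Rebuild the original text from a {word: [(line, pos), ...]} mapping."""
    rows = {}  # line number -> {position: word}
    for word, places in all_words.items():
        for i, j in places:
            rows.setdefault(i, {})[j] = word
    lines = []
    for i in sorted(rows):
        row = rows[i]
        lines.append(" ".join(row[j] for j in sorted(row)))
    return "\n".join(lines)
```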

Same value in list keeps getting repeated when writing to text file

I'm a total noob to Python and need some help with my code.
The code is meant to take Input.txt [http://pastebin.com/bMdjrqFE], split it into separate Pokemon (in a list), and then split each of those into separate values, which I use to reformat the data and write it to Output.txt.
However, when I run the program, only the last Pokemon gets output, 386 times. [http://pastebin.com/wkHzvvgE]
Here's my code:
f = open("Input.txt", "r")#opens the file (input.txt)
nf = open("Output.txt", "w")#opens the file (output.txt)
pokeData = []
for line in f:
#print "%r" % line
pokeData.append(line)
num = 0
tab = """ """
newl = """NEWL
"""
slash = "/"
while num != 386:
current = pokeData
current.append(line)
print current[num]
for tab in current:
words = tab.split()
print words
for newl in words:
nf.write('%s:{num:%s,species:"%s",types:["%s","%s"],baseStats:{hp:%s,atk:%s,def:%s,spa:%s,spd:%s,spe:%s},abilities:{0:"%s"},{1:"%s"},heightm:%s,weightkg:%s,color:"Who cares",eggGroups:["%s"],["%s"]},\n' % (str(words[2]).lower(),str(words[1]),str(words[2]),str(words[3]),str(words[4]),str(words[5]),str(words[6]),str(words[7]),str(words[8]),str(words[9]),str(words[10]),str(words[12]).replace("_"," "),str(words[12]),str(words[14]),str(words[15]),str(words[16]),str(words[16])))
num = num + 1
nf.close()
f.close()
There are quite a few problems with your program, starting with how it reads the file.
To read the lines of a file into a list, you can use file.readlines().
So instead of
f = open("Input.txt", "r")#opens the file (input.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
You can just do this:
with open("Input.txt", "r") as f:
    pokeData = f.readlines()  # a list containing one string per line
Next, you are misunderstanding how for and while loops work.
A for loop in Python iterates over the items of a list (or any other iterable), as shown below. I'm not sure what you were trying to do with for newl in words: a for loop assigns each item in turn to its loop variable, so the newl string you defined earlier is simply overwritten by each word (the same happens to tab in for tab in current).
array = ["one", "two", "three"]
for i in array:  # i is assigned each item in turn
    print(i)
The output will be:
one
two
three
So to fix a lot of this code, you can replace the whole while loop with something like this.
(The code below assumes your input file is formatted so that the fields on each line are separated by tabs.)
for line in pokeData:
    words = line.split("\t")  # split the line on tabs
    nf.write('your very long and complicated string')  # built from words, once per line
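Put together, the whole conversion can be a single pass that writes exactly one record per input line. The sketch below assumes a hypothetical layout where the first tab-separated field is the number and the second is the species name; the real column indices from the question's format string would go in their place, and convert is a hypothetical name. It also assumes every non-blank line has at least two fields.

```python
def convert(in_path, out_path):
    """Write one formatted record per tab-separated input line."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            words = line.split("\t")
            # Hypothetical layout: words[0] is the number, words[1] the species.
            num, species = words[0], words[1]
            outfile.write('%s:{num:%s,species:"%s"},\n' % (species.lower(), num, species))
```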
Other helpers
The formatted string that you write to the output file looks very similar to the JSON format. There is a built-in Python module called json that can convert a native Python dict to a JSON string. This will probably make things a lot easier for you, but either way works. (Note that json.dumps() puts double quotes around every key, which your current format does not.)
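As a rough illustration of the json route, the sketch below builds a dict for one Pokemon and serializes it with json.dumps(). The four-field layout (number, species, two types) and the function name record_to_json are assumptions; it also assumes Python 3.7 or later so that the keys keep their insertion order.

```python
import json

def record_to_json(fields):
    """Build one Pokemon entry as a JSON string from a list of split fields."""
    # Hypothetical column layout: [num, species, type1, type2]; adjust to your file.
    num, species, type1, type2 = fields[:4]
    entry = {
        "num": int(num),
        "species": species,
        "types": [type1, type2],
    }
    return json.dumps({species.lower(): entry})
```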
Hope this helps
