Replacing a word in a text file - python

Does anyone know how to replace a word in a text file?
Here's one line from my stock file:
bread 0.99 12135479 300 200 400
I want to be able to replace my 4th word (in this instance '300') when I print 'productline' with a new number created by the nstock part of this code:
for line in details: #for every line in the file:
    if digits in line: #if the barcode is in the line
        productline=line #it stores the line as 'productline'
        itemsplit=productline.split(' ') #separates into different words
        price=float(itemsplit[1]) #the price is the second part of the line
        current=int(itemsplit[3]) #the current stock level is the fourth part of the line
        quantity=int(input("How much of the product do you wish to purchase?\n"))
        if quantity<current:
            total=(price)*(quantity) #this works out the price
            print("Your total spent on this product is:\n" + "£" +str(total)+"\n") #this tells the user, in total how much they have spent
            with open("updatedstock.txt","w") as f:
                f.writelines(productline) #writes the line with the product in
                nstock=int(current-quantity) #the new stock level is the current level minus the quantity
My code does not replace the 4th word (which is the current stock level) with the new stock level (nstock).

Actually you can use regular expressions for that purpose.
import re
string1='bread 0.99 12135479 300 200 400'
pattern='300'
to_replace_with="youyou"
string2=re.sub(pattern, to_replace_with, string1)
You will get the output below:
'bread 0.99 12135479 youyou 200 400'
Hope this was what you were looking for ;)
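One caveat with a plain pattern like '300' is that it matches those digits anywhere in the line, including inside the price or the barcode. A minimal sketch of an alternative, assuming the fields are separated by single spaces in the order name, price, barcode, current stock, is to replace the fourth field by position. The function names and the file path parameters are hypothetical, not part of the original code:

```python
def update_stock_line(line, barcode, quantity):
    """Return the line with its 4th field reduced by quantity if the barcode matches."""
    fields = line.split()
    # Assumed layout: name price barcode current_stock ...
    if len(fields) < 4 or fields[2] != barcode:
        return line  # not the product we are looking for
    current = int(fields[3])
    if quantity >= current:
        return line  # mirrors the question's `quantity < current` check
    fields[3] = str(current - quantity)
    newline = "\n" if line.endswith("\n") else ""
    return " ".join(fields) + newline


def update_stock_file(source_path, target_path, barcode, quantity):
    """Copy every line to target_path, updating only the matching product line."""
    with open(source_path) as source:
        lines = source.readlines()
    with open(target_path, "w") as target:
        target.writelines(update_stock_line(line, barcode, quantity) for line in lines)
```

Writing every line back, not just the matching one, keeps the other products in the updated stock file.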

Related

Counting word occurrences from specific part of txt files using python3

I have a folder with a number of txt files.
I want to count the number of occurrences of a set of words in a certain part of each txt file and export the results to a new Excel file.
Specifically, I want to look for the occurrences of words only in part of text that begins after the word "Company A" and ends in the word "Company B."
For example:
I want to look for the words "Corporation" and "Board" in the bold part of the following text:
...the Board of Company A oversees the management of risks inherent in the operation of the Corporation businesses and the implementation of its strategic plan. The Board reviews the risks associated with the Corporation strategic plan at an annual strategic planning session and periodically throughout the year as part of its consideration of the strategic direction of Company B. In addition, the Board addresses the primary risks associated with...
I have managed to count the occurrences of the set of words but from the whole txt file and not the part from Company A up to Company B.
import os
import sys
import glob
for filename in glob.iglob('file path' + '**/*', recursive=True):
    def countWords(filename, list_words):
        try:
            reading = open(filename, "r+", encoding="utf-8")
            check = reading.readlines()
            reading.close()
            for each in list_words:
                lower = each.lower()
                count = 0
                for string in check:
                    word_check = string.split()
                    for word in word_check:
                        lowerword = word.lower()
                        line = lowerword.strip("!##$%^&*()_+?><:.,-'\\ ")
                        if lower == line:
                            count += 1
                print(lower, ":", count)
        except FileNotFoundError:
            print("This file doesn't exist.")
            for zero in list_words:
                if zero != "":
                    print(zero, ":", "0")
                else:
                    pass
    print('----')
    print(os.path.basename(filename))
    countWords(filename, ["Corporation", "Board"])
The final output for the example text should be like this:
txtfile1
Corporation: 2
Board: 1
The above process should be repeated for all txt files in the folder and the results exported as an Excel file.
Thanks for the consideration and I apologize in advance for the length of the question.
You might try a regular expression, assuming you want the whole string even if you see repetitions of company a before you see company b.
re.findall('company a.*?company b', 'company a did some things in agreement with company b')
That will provide a list of all the text strings starting with company a and ending with company b.
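Two details matter when applying this to the files in the question: the pattern above is lowercase while the text says "Company A", and `.` does not match newlines unless `re.DOTALL` is set. A minimal sketch that extracts each section and counts the target words inside it might look like this. The function name `count_between` and its default markers are assumptions for illustration:

```python
import re


def count_between(text, words, start="Company A", end="Company B"):
    """Count each word only inside sections running from start to end."""
    pattern = re.compile(
        re.escape(start) + r"(.*?)" + re.escape(end),
        re.DOTALL | re.IGNORECASE,  # span line breaks, ignore case of the markers
    )
    counts = {word: 0 for word in words}
    for section in pattern.findall(text):
        # Split the section into lowercase runs of letters
        tokens = re.findall(r"[a-z]+", section.lower())
        for word in words:
            counts[word] += tokens.count(word.lower())
    return counts
```

The markers themselves are excluded from the counted section because only the capture group between them is returned by `findall`.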

Python3 - Spellcheck txt file - replace values and preserve formatting

Consider the .txt file below, myfile.txt:
Box-No.: DK10-95794
Total Discounts USD 1,360.80
Totat: usp 529.20
As you can see, the above text file contains two errors: Totat and usp (should be Total and USD).
Now, I am using a Python package built upon SymSpell, called SymSpellPy. This can check a word and determine if it's spelled incorrectly.
This is my Python script:
import os
import re
from symspellpy import SymSpell, Verbosity
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7
# create object
sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)
# load dictionary
dictionary_path = os.path.join(
    os.path.dirname(__file__), "Dictionaries/eng.dictionary.txt")
term_index = 0 # column of the term in the dictionary text file
count_index = 1 # column of the term frequency in the dictionary text file
with open("myfile.txt", "r") as file:
    for line in file:
        for word in re.findall(r'\w+', line):
            # word by word
            input_term = word
            # max edit distance per lookup
            max_edit_distance_lookup = 2
            suggestion_verbosity = Verbosity.CLOSEST # TOP, CLOSEST, ALL
            suggestions = sym_spell.lookup(input_term, suggestion_verbosity,
                                           max_edit_distance_lookup)
            # display suggestion term, term frequency, and edit distance
            for suggestion in suggestions:
                word = word.replace(input_term, suggestion.term)
                print("{}, {}". format(input_term, word))
Running the above script on my text file gives me this output:
Total, Total
USD, USD
Totat, Total
As you can see, it correctly catches the last word totat => total.
My question is: how can I find misspelled words and correct them in a txt file?
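One possible direction, sketched under assumptions: substituting words in place with `re.sub` and a replacement function leaves all punctuation, digits, and spacing untouched, so the formatting of each line is preserved. In the sketch below, the `CORRECTIONS` table and the `correct_word` helper are hypothetical stand-ins for a real spell-checker lookup, kept as a plain dictionary so the example is self-contained:

```python
import re

# Hypothetical stand-in for a spell-checker lookup; a real script would
# ask the spell checker for its best suggestion instead.
CORRECTIONS = {"totat": "total", "usp": "usd"}


def correct_word(word):
    """Return the corrected word, mirroring the casing of the original."""
    fixed = CORRECTIONS.get(word.lower(), word)
    if fixed == word:
        return word  # nothing to correct
    if word.isupper():
        return fixed.upper()
    if word[0].isupper():
        return fixed.capitalize()
    return fixed


def correct_text(text):
    """Replace each run of letters in place; everything else is left as-is."""
    return re.sub(r"[A-Za-z]+", lambda match: correct_word(match.group(0)), text)
```

Because only runs of letters are rewritten, a line such as the box number or an amount with digits and separators passes through unchanged.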

How do I delete the title from a text file in Python?

I have around 2,000 text files containing summaries of news articles and I want to remove the title from all the files that have titles (some don't have titles for some reason) using Python.
Here's an example:
Ad sales boost Time Warner profit
Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier.Its profits were buoyed by one-off gains which offset a profit dip at Warner Bros, and less users for AOL.It lost 464,000 subscribers in the fourth quarter profits were lower than in the preceding three quarters.However, the company said AOL's underlying profit before exceptional items rose 8% on the back of stronger internet advertising revenues.Time Warner's fourth quarter profits were slightly better than analysts' expectations.For the full-year, TimeWarner posted a profit of $3.36bn, up 27% from its 2003 performance, while revenues grew 6.4% to $42.09bn.For 2005, TimeWarner is projecting operating earnings growth of around 5%, and also expects higher revenue and wider profit margins.
My question is how to remove the line, "Ad sales boost Time Warner profit" ?
Edit: I basically want to remove everything before a line break.
TIA.
If it's (as you say) just a simple matter of removing the first line, when followed by \n\n, you could use a simple regex like this:
import re
with open('testing.txt', 'r') as fin:
    doc = fin.read()
doc = re.sub(r'^.+?\n\n', '', doc)
Try this:
It splits the text at the first line break "\n\n" and selects only the last element (the body).
line.split('\n\n', 1)[-1]
This also works when there is no line break in the text.
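Since the question involves around 2,000 files, the split approach could be wrapped in a small helper and applied to a whole folder. This is a minimal sketch: the function names, the `*.txt` pattern, and the UTF-8 encoding are assumptions, and it would also cut an untitled file whose body happens to contain a blank line, so it is worth trying on a copy of the data first:

```python
from pathlib import Path


def strip_title(text):
    """Drop everything up to and including the first blank line, if there is one."""
    return text.split("\n\n", 1)[-1]


def strip_titles_in_folder(folder):
    """Rewrite each .txt file in folder without its title line."""
    for path in Path(folder).glob("*.txt"):
        original = path.read_text(encoding="utf-8")
        stripped = strip_title(original)
        if stripped != original:  # only touch files that actually changed
            path.write_text(stripped, encoding="utf-8")
```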
A file opened in 'r' mode cannot be written to, and opening it in 'w' mode truncates it. Therefore the solution in this case is to read the lines into a variable, modify them, and write them back to the file.
lines = []
# open the text file in read mode and readlines (returns a list of lines)
with open('textfile.txt', 'r') as file:
    lines = file.readlines()
# open the text file in write mode and write lines
with open('textfile.txt', 'w') as file:
    # if the number of lines is bigger than 1 (assumption) skip the title and the blank line, else write all lines
    file.writelines(lines[2:] if len(lines) > 1 else lines)
The above is a simple example of how you can achieve what you're after. Keep in mind that edge cases might be present.
This will remove everything before the first line break ('\n\n').
with open('text.txt', 'r') as file:
    f = file.read()
idx = f.find('\n\n') # Search for a line break
if idx > 0: # If found, return everything after it
    g = f[idx+2:]
else: # Otherwise, return the original text file
    g = f
print(g)
# Save the file
with open('text.txt', 'w') as file:
    file.write(g)
"Quarterly profits at US media giant TimeWarner jumped 76% to $1.13bn (£600m) for the three months to December, from $639m year-earlier.Its profits were buoyed by one-off gains which offset a profit dip at Warner Bros, and less users for AOL.It lost 464,000 subscribers in the fourth quarter profits were lower than in the preceding three quarters.However, the company said AOL's underlying profit before exceptional items rose 8% on the back of stronger internet advertising revenues.Time Warner's fourth quarter profits were slightly better than analysts' expectations.For the full-year, TimeWarner posted a profit of $3.36bn, up 27% from its 2003 performance, while revenues grew 6.4% to $42.09bn.For 2005, TimeWarner is projecting operating earnings growth of around 5%, and also expects higher revenue and wider profit margins.\n"

Python help. Finding largest value in a file and printing out value w name

I need to create a program that opens a file, reads the values inside it, and then prints out the name with the largest value.
The file contains the following info:
Juan,27
Joe,16
Mike,29
Roy,10
Now the code I have is as follows:
UserFile=input('enter file name')
FileOpen=open(UserFile,'r')
LHRS = 0
for line in FileOpen:
    data=line.split(",")
    name=data[0]
    hrs=data[1]
    hrs=int(hrs)
    if hrs > LHRS:
        LHRS = hrs
        if LHRS == LHRS:
            print('Person with largest hours is',name)
The following prints out :
Person with the largest hours is Juan
Person with the largest hours is Mike
How can I make it so it only prints out the true largest?
Your effort is pretty impressive for a first-timer. What the code fails to do is keep track of the name while keeping track of the max value. I'm sure it can be done your way, but might I suggest an alternative?
import operator
Let's read in the file like this. It is good practice: the with statement handles closing the file, which can cause many problems if not done properly.
with open('/Users/abhishekbabuji/Desktop/example.txt', 'r') as fh:
    lines = fh.readlines()
Now each line is in a list called lines, but each one still has the annoying \n in it. Let's replace that with the empty string ''.
lines = [line.replace("\n", "") for line in lines]
Now we have a list like this: ['Name1,Value1', 'Name2,Value2', ...]. For each string in the list, I take the first part as the key and the integer value of the second part as the value in a dictionary called example_dict. So for 'Name1,Value1', after splitting on the comma, Name1 is the item at index 0 and Value1 is the item at index 1; the loop below adds each key-value pair to the dictionary.
example_dict = {}
for text in lines:
    example_dict[text.split(",")[0]] = int(text.split(",")[1])
print(example_dict)
Gives:
{'Juan': 27, 'Joe': 16, 'Mike': 29, 'Roy': 10}
Now, obtain the key whose value is max and print it.
largest_hour = max(example_dict.items(), key=operator.itemgetter(1))[1]
highest_key = []
for person, hours in example_dict.items():
    if hours == largest_hour:
        highest_key.append((person, hours))
for pair in highest_key:
    print('Person with largest hours is:', pair[0])
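For comparison, the loop-based approach from the question can also work if the name is stored together with the running maximum and the result is reported only after the loop ends. This is a minimal sketch, with a hypothetical function name, that takes the lines as a list so it can be tried without a file; on a tie it keeps the first name seen, whereas the dictionary version above prints every name that shares the maximum:

```python
def person_with_most_hours(lines):
    """Return (name, hours) for the entry with the largest hours, or (None, None)."""
    best_name, best_hours = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, hours_text = line.split(",", 1)
        hours = int(hours_text)
        # Update the name and the maximum together
        if best_hours is None or hours > best_hours:
            best_name, best_hours = name, hours
    return best_name, best_hours
```

With an open file, the same function could be called as `person_with_most_hours(fh)`, since iterating over a file object yields its lines.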

Word & Line Concordance Program

I originally posted this question here but was then told to post it to code review; however, they told me that my question needed to be posted here instead. I will try to better explain my problem so hopefully there is no confusion. I am trying to write a word-concordance program that will do the following:
1) Read the stop_words.txt file into a dictionary (use the same type of dictionary that you’re timing) containing only stop words, called stopWordDict. (WARNING: Strip the newline(‘\n’) character from the end of the stop word before adding it to stopWordDict)
2) Process the WarAndPeace.txt file one line at a time to build the word-concordance dictionary(called wordConcordanceDict) containing “main” words for the keys with a list of their associated line numbers as their values.
3) Traverse the wordConcordanceDict alphabetically by key to generate a text file containing the concordance words printed out in alphabetical order along with their corresponding line numbers.
I tested my program on a small file with a short list of stop words and it worked correctly (an example is provided below). The outcome was what I expected: a list of the main words with their line numbers, not including words from the stop_words_small.txt file. The only difference between the small file I tested and the main file I am actually trying to process is that the main file is much longer and contains punctuation. The problem is that when I run my program with the main file, I get far more results than expected, because the punctuation is not being removed from the file.
For example, below is a section of the outcome where my code counted the word Dmitri as four separate words because of the different capitalization and punctuation that follows the word. If my code were to remove the punctuation correctly, the word Dmitri would be counted as one word followed by all the locations found. My output is also separating upper and lower case words, so my code is not making the file lower case either.
What my code currently displays:
Dmitri : [2528, 3674, 3687, 3694, 4641, 41131]
Dmitri! : [16671, 16672]
Dmitri, : [2530, 3676, 3685, 13160, 16247]
dmitri : [2000]
What my code should display:
dmitri : [2000, 2528, 2530, 3674, 3676, 3685, 3687, 3694, 4641, 13160, 16671, 16672, 41131]
Words are defined to be sequences of letters delimited by any non-letter. No distinction should be made between upper and lower case letters, but my program treats them as different words. Blank lines are to be counted in the line numbering.
Below is my code and I would appreciate it if anyone could take a look at it and give me any feedback on what I am doing wrong. Thank you in advance.
import re
def main():
    stopFile = open("stop_words.txt","r")
    stopWordDict = dict()
    for line in stopFile:
        stopWordDict[line.lower().strip("\n")] = []
    hwFile = open("WarAndPeace.txt","r")
    wordConcordanceDict = dict()
    lineNum = 1
    for line in hwFile:
        wordList = re.split(" |\n|\.|\"|\)|\(", line)
        for word in wordList:
            word.strip(' ')
            if (len(word) != 0) and word.lower() not in stopWordDict:
                if word in wordConcordanceDict:
                    wordConcordanceDict[word].append(lineNum)
                else:
                    wordConcordanceDict[word] = [lineNum]
        lineNum = lineNum + 1
    for word in sorted(wordConcordanceDict):
        print (word," : ",wordConcordanceDict[word])
if __name__ == "__main__":
    main()
As another example and for reference, here is the small file I tested with, along with the small list of stop words; it worked perfectly.
stop_words_small.txt file
a, about, be, by, can, do, i, in, is, it, of, on, the, this, to, was
small_file.txt
This is a sample data (text) file to
be processed by your word-concordance program.

The real data file is much bigger.
correct output
bigger: 4
concordance: 2
data: 1 4
file: 1 4
much: 4
processed: 2
program: 2
real: 4
sample: 1
text: 1
word: 2
your: 2
You can do it like this:
import re
from collections import defaultdict
wordConcordanceDict = defaultdict(list)
with open('stop_words_small.txt') as sw:
    words = (line.strip() for line in sw)
    stop_words = set(words)
with open('small_file.txt') as f:
    for line_number, line in enumerate(f, 1):
        words = (re.sub(r'[^\w\s]','',word).lower() for word in line.split())
        good_words = (word for word in words if word not in stop_words)
        for word in good_words:
            wordConcordanceDict[word].append(line_number)
for word in sorted(wordConcordanceDict):
    print('{}: {}'.format(word, ' '.join(map(str, wordConcordanceDict[word]))))
Output:
bigger: 4
data: 1 4
file: 1 4
much: 4
processed: 2
program: 2
real: 4
sample: 1
text: 1
wordconcordance: 2
your: 2
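Note that this output contains wordconcordance, while the expected output in the question lists word and concordance separately. Removing punctuation after splitting on whitespace glues hyphenated words together. A minimal sketch that follows the question's definition more literally (words are runs of letters delimited by any non-letter) could extract the words with `re.findall` on the lowercased line. The function name and the in-memory inputs are assumptions for illustration:

```python
import re
from collections import defaultdict


def build_concordance(lines, stop_words):
    """Map each non-stop word to the list of line numbers where it occurs."""
    concordance = defaultdict(list)
    for line_number, line in enumerate(lines, 1):  # blank lines still count
        # Lowercase first, then take every run of letters as one word
        for word in re.findall(r"[a-z]+", line.lower()):
            if word not in stop_words:
                concordance[word].append(line_number)
    return dict(concordance)
```

With this definition, "Dmitri", "Dmitri!", "Dmitri," and "dmitri" all land under the single key dmitri. A word that occurs twice on one line is recorded twice; whether that is wanted depends on the assignment.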

I will add explanations tomorrow, it's getting late here ;). Meanwhile, you can ask in the comments if some part of the code isn't clear to you.
