I have a txt file and a dictionary whose keys are adjectives and whose values are their synonyms. I need to replace the common adjectives from the dictionary that appear in a given txt file with their synonyms, chosen randomly, and save both versions (with changed and unchanged adjectives), line by line, in a new file (task3_edited_text). My code:
import sys
import random  # necessary for random choice

# get an English text as an additional input
filename_eng = sys.argv[2]
infile_eng = open(filename_eng, "r")
task3_edited_text = open("task3_edited_text.txt", "w")
# look for adjectives in English text, line by line
for line in infile_eng:
    task3_edited_text.write(line)
    line_list = line.split()
    # for each word in line
    for word in line_list:
        # if we find common adjectives, change them into synonym, randomly
        if word in dict.keys(dictionary):
            word.replace(word, str(random.choice(list(dictionary.values()))))
        else:
            pass
    task3_edited_text.write(line)
The problem is that in the output, the adjectives are not substituted by their values.
line_list = line.split()
...
task3_edited_text.write(line)
The issue is that you try to modify line_list, which you created from line. However, line_list is simply a new list built from the words in line; modifying it doesn't change line in the slightest. So writing line to the file writes the unmodified line and doesn't take your changes into account.
You probably want to build a line_to_write from line_list and write that to the file instead, like so:
line_to_write = " ".join(line_list)
task3_edited_text.write(line_to_write + "\n")  # split() dropped the newline, so add it back
Also, line_list isn't even modified in your code. word is just a loop variable, so nothing you do with it changes the element stored in line_list; and since strings are immutable, replace returns a new string rather than modifying the one you call it on (your code discards that return value). You probably want to modify line_list through the index of each element, like so:
for idx, word in enumerate(line_list):
    # if we find common adjectives, change them into synonym, randomly
    # (note: this picks from all of the dictionary's values, not only this word's own synonyms)
    if word in dictionary:
        line_list[idx] = str(random.choice(list(dictionary.values())))
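Putting both fixes together, a minimal sketch of the whole task might look like the following. It assumes (the question doesn't say) that dictionary maps each adjective to a list of its synonyms, so random.choice(synonyms[word]) picks one of that word's own synonyms instead of a value belonging to some other adjective. The function names replace_adjectives and write_both_versions are hypothetical.

```python
import random


def replace_adjectives(line, synonyms):
    """Return `line` with each word found in `synonyms` replaced by a random synonym.

    Assumes (hypothetically) that `synonyms` maps each adjective to a list of synonyms.
    Words are matched exactly, so "big," or "Big" will not match "big".
    """
    words = line.split()
    for idx, word in enumerate(words):
        if word in synonyms:
            words[idx] = random.choice(synonyms[word])  # assign back by index
    return " ".join(words)


def write_both_versions(in_path, out_path, synonyms):
    """Write each original line, followed by its edited version, to `out_path`."""
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for line in infile:
            original = line if line.endswith("\n") else line + "\n"
            outfile.write(original)
            outfile.write(replace_adjectives(line, synonyms) + "\n")
```

With the question's variables this would be called as write_both_versions(sys.argv[2], "task3_edited_text.txt", dictionary). Because line.split() collapses runs of whitespace, the edited lines come out single-spaced.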
The problem I came across is that I need to read each word of a line one by one, and repeat this for the whole file. The words are separated from each other by the # sign, e.g.:
2016/2017#Southeast_Kootenay#Mount_Baker_Secondary#STANDARD#COURSE_MARKS#99.0#71.0#88.0#49.0
After that, I need to assign each value to the appropriate attribute of a class, for example:
school_years would be 2016/2017, district_name would be Southeast_Kootenay, etc.
The thing is that I have no clue how to do it. I managed to extract the first word from a file but couldn't do it for the whole line, let alone the whole file. This is the code I used:
def word_return():
    for lines in file:
        for word in lines.split('#'):
            return word
Any kind of help would be appreciated.
You're returning a single word. Remove the inner for loop and return the entire list, like this, if you only want the first line:
(If file is already a list of lines, e.g. from file = open("file.txt", "r").readlines(), you can loop over it instead of opening the file again.)
def word_return():
    with open("yourFile.txt", "r") as f:
        for line in f:
            return line.rstrip("\n").split('#')  # strip the newline so the last field is clean
If you want to return a list that will contain a list for each line, check the following:
def word_return():
    allLines = []
    with open("yourFile.txt", "r") as f:
        for line in f:
            allLines.append(line.rstrip("\n").split('#'))
    return allLines
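To go one step further and assign the fields to a class, as the question asks, one option is a dataclass (Python 3.7+). This is only a sketch: school_years and district_name come from the question, while school_name and other_fields are hypothetical names for the remaining columns, whose meaning the question doesn't spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchoolRecord:
    school_years: str
    district_name: str
    school_name: str  # hypothetical name for the third column
    other_fields: List[str] = field(default_factory=list)  # remaining columns as raw strings


def parse_record(line: str) -> SchoolRecord:
    """Split one '#'-separated line and map its fields onto a SchoolRecord."""
    fields = line.rstrip("\n").split("#")
    return SchoolRecord(fields[0], fields[1], fields[2], fields[3:])


def read_records(path: str) -> List[SchoolRecord]:
    """Return one SchoolRecord per non-blank line of the file."""
    with open(path, "r") as f:
        return [parse_record(line) for line in f if line.strip()]
```

For the sample line, parse_record(...).district_name is "Southeast_Kootenay". If some columns are numeric, you could convert them (e.g. with float) inside parse_record.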
I want to read lines like these from a file, for example:
hello I live in London.
hello I study.
Then, based on the first word, I want to remove the line from the file.
Can I put each sentence in an array?
You can read the entire contents of the file into memory (into a list), choose which lines you wish to keep, and write those to a new file (you can replace the old one if you wish).
For example:
with open("input.txt", 'r') as f:
    old_lines = f.readlines()
new_lines = []
for line in old_lines:
    words = line.split()
    if words and words[0] == 'hello':  # if the first word is "hello", keep it (blank lines are skipped).
        new_lines.append(line)
with open("output.txt", 'w') as f:
    for line in new_lines:
        f.write(line)
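If you'd rather replace the old file than write a new one, the same idea can be wrapped in a small function that reads everything into memory and then writes the kept lines back to the same path. This is a sketch; remove_lines_starting_with is a hypothetical name, and it drops (rather than keeps) the lines whose first word matches, which is what the question asked for.

```python
def remove_lines_starting_with(path, first_word):
    """Rewrite the file at `path` in place, dropping lines whose first word is `first_word`.

    Returns the number of lines removed. Blank lines are kept.
    """
    with open(path, "r") as f:
        lines = f.readlines()
    # line.split()[:1] is [] for a blank line, so blank lines never match
    kept = [line for line in lines if line.split()[:1] != [first_word]]
    with open(path, "w") as f:  # safe: the whole file is already in memory
        f.writelines(kept)
    return len(lines) - len(kept)
```

For very large files you would instead write to a temporary file and then rename it over the original, but for small text files reading everything at once is simpler.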
I am using Python 3 and I am reading a text file which can have multiple paragraphs separated by '\n'. I want to split all those paragraphs into a separate list. There can be any number of paragraphs in the input file.
This split and list creation should happen dynamically, allowing me to view a particular paragraph just by entering its number, as in list[2] or list[3], etc.
So far I have tried the following:
input = open("input.txt", "r")  # Reading the input file
lines = input.readlines()  # Creating a list with separate sentences
str = ''  # Declaring an empty string
for i in range(len(lines)):
    if len(lines[i]) > 2:  # If the length of a line is < 2, it means it can be a new paragraph
        str += lines[i]
This method does not store the paragraphs in a new list (as I am not sure how to do that). It just skips the lines that contain only '\n' and stores all the other input lines in the str variable. When I display the contents of str, the output is shown as words, but I need it as sentences.
My code should store all the sentences up to the first occurrence of '\n' in a separate list, and so on.
Any ideas on this?
UPDATE
I found a way to print all the lines that come before the '\n'. But when I try to store them in a list, they are stored as letters, not as whole sentences. Below is the code snippet for reference:
input = open("input.txt", "r")
lines = input.readlines()
input_ = []
for i in range(len(lines)):
    if len(lines[i]) <= 2:
        for j in range(i):
            input_.append(lines[j])  # This line is storing as letters.
Even input_ += lines stores letters, not sentences.
Any idea how to modify this code to get the desired output?
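Don't forget to call input.close() when you're done with the file. (It matters most for files opened for writing, whose buffered data may not be saved until the file is closed.)
Alternatively, you can use a with statement: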
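# Using "with" closes the file automatically, so you don't need to write file.close()
with open("input.txt", "r") as file:
    file_ = file.read().split("\n\n")  # a blank line ("\n\n") separates paragraphs
file_ is now a list with each paragraph as a separate item, assuming paragraphs are separated by a single blank line.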
It's as simple as 2 lines.
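A line-by-line version is more forgiving when the blank lines between paragraphs contain stray spaces or come several in a row. The sketch below (read_paragraphs is a hypothetical name) treats any whitespace-only line as a separator and appends each finished paragraph as one string. As a side note on the "letters" symptom: if a string (rather than a list of strings) ends up on the right of +=, as in list += some_string, the list is extended with the string's individual characters, whereas append adds the whole string as a single item.

```python
def read_paragraphs(path):
    """Return a list of paragraphs, each being its lines joined with "\n".

    Paragraphs are assumed to be separated by one or more blank
    (or whitespace-only) lines.
    """
    paragraphs = []
    current = []
    with open(path, "r") as f:
        for line in f:
            if line.strip():  # non-blank line: part of the current paragraph
                current.append(line.rstrip("\n"))
            elif current:  # blank line ends the current paragraph
                paragraphs.append("\n".join(current))
                current = []
    if current:  # the last paragraph may not be followed by a blank line
        paragraphs.append("\n".join(current))
    return paragraphs
```

After paragraphs = read_paragraphs("input.txt"), paragraphs[2] is the third paragraph.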
I have to compress a file into a list of words and a list of positions from which the original file can be recreated. My program should also be able to take a compressed file and recreate the full text of the original file, including punctuation and capitalization. I have everything working except the recreation: using the map function, my program can't convert my list of positions into floats because of the '[' character, since the positions were written out as a list.
My code is:
text = open("speech.txt")
CharactersUnique = []
ListOfPositions = []
DownLine = False
while True:
    line = text.readline()
    if not line:
        break
    TwoList = line.split()
    for word in TwoList:
        if word not in CharactersUnique:
            CharactersUnique.append(word)
        ListOfPositions.append(CharactersUnique.index(word))
    if not DownLine:
        CharactersUnique.append("\n")
        DownLine = True
    ListOfPositions.append(CharactersUnique.index("\n"))
w = open("List_WordsPos.txt", "w")
for c in CharactersUnique:
    w.write(c)
w.close()
x = open("List_WordsPos.txt", "a")
x.write(str(ListOfPositions))
x.close()
with open("List_WordsPos.txt", "r") as f:
    NewWordsUnique = f.readline()
f.close()
h = open("List_WordsPos.txt", "r")
lines = h.readlines()
NewListOfPositions = lines[1]
NewListOfPositions = map(float, NewListOfPositions)
print("Recreated Text:\n")
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions))
print(recreation)
The error I get is:
Task 3 Code.py", line 42, in <genexpr>
recreation = " " .join(NewWordsUnique[pos] for pos in (NewListOfPositions))
ValueError: could not convert string to float: '['
I am using Python IDLE 3.5 (32-bit). Does anyone have any ideas on how to fix this?
Why do you want to turn the position values in the list into floats, since they are list indices, and those must be integers? I suspect this might be an instance of what is called the XY Problem.
I also found your code difficult to understand because you haven't followed the PEP 8 - Style Guide for Python Code. In particular, many (although not all) of the variable names are CamelCased, a style which, according to the guidelines, should be reserved for class names.
In addition, some of your variables had misleading names, like CharactersUnique, which actually contained (mostly) unique words.
So one of the first things I did was transform all the CamelCased variables into lowercase underscore-separated words, like camel_case. In several instances I also gave them better names to reflect their actual contents or role; for example, CharactersUnique became unique_words.
The next step was to improve the handling of files by using Python's with statement to ensure they would all be closed automatically at the end of the block. In other places, I consolidated multiple open() calls on the same file into one.
After all that, I had it almost working, but then I discovered a problem with the approach of treating newline "\n" characters as separate words of the input text file. It caused a problem when the file was being recreated by the expression:
" ".join(NewWordsUnique[pos] for pos in (NewListOfPositions))
because it adds a space before and after every "\n" character, and those spaces aren't in the original file. To work around that, I ended up writing out the for loop that recreates the file instead of using a generator expression, because doing so allows the newline "words" to be handled properly.
At any rate, here's the resulting rewritten (and working) code:
input_filename = "speech.txt"
compressed_filename = "List_WordsPos.txt"

# Two lists to represent contents of input file.
unique_words = ["\n"]  # preload with newline "word"
word_positions = []

with open(input_filename, "r") as input_file:
    for line in input_file:
        for word in line.split():
            if word not in unique_words:
                unique_words.append(word)
            word_positions.append(unique_words.index(word))
        word_positions.append(unique_words.index("\n"))  # add newline at end of each line

# Write representations of the two data-structures to compressed file.
with open(compressed_filename, "w") as compr_file:
    words_repr = " ".join(repr(word) for word in unique_words)
    compr_file.write(words_repr + "\n")
    positions_repr = " ".join(repr(posn) for posn in word_positions)
    compr_file.write(positions_repr + "\n")

def strip_quotes(word):
    """Strip the first and last characters from the string (assumed to be quotes)."""
    tmp = word[1:-1]
    return tmp if tmp != "\\n" else "\n"  # newline "words" are special case

# Recreate input file from data in compressed file.
with open(compressed_filename, "r") as compr_file:
    line = compr_file.readline()
    new_unique_words = list(map(strip_quotes, line.split()))
    line = compr_file.readline()
    new_word_positions = map(int, line.split())  # using int, not float here

words = []
lines = []
for posn in new_word_positions:
    word = new_unique_words[posn]
    if word != "\n":
        words.append(word)
    else:
        lines.append(" ".join(words))
        words = []

print("Recreated Text:\n")
recreation = "\n".join(lines)
print(recreation)
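If the exact layout of List_WordsPos.txt isn't a requirement, another option is to let the standard json module handle the quoting, which avoids the repr()/strip_quotes round trip and its edge cases (e.g. words containing backslashes). This is an alternative sketch, not the code above; the function names are hypothetical, and it also uses a dict for word lookups instead of list.index().

```python
import json


def compress(text):
    """Return (unique_words, positions); "\n" (index 0) marks the end of each line."""
    unique_words = ["\n"]
    index_of = {"\n": 0}  # word -> index in unique_words, avoids repeated list.index()
    positions = []
    for line in text.splitlines():
        for word in line.split():
            if word not in index_of:
                index_of[word] = len(unique_words)
                unique_words.append(word)
            positions.append(index_of[word])
        positions.append(0)  # newline "word" at the end of each line
    return unique_words, positions


def decompress(unique_words, positions):
    """Rebuild the text, joining words with spaces and lines with "\n"."""
    lines, words = [], []
    for posn in positions:
        word = unique_words[posn]
        if word == "\n":
            lines.append(" ".join(words))
            words = []
        else:
            words.append(word)
    return "\n".join(lines)


def save(path, unique_words, positions):
    with open(path, "w") as f:
        json.dump({"words": unique_words, "positions": positions}, f)


def load(path):
    with open(path, "r") as f:
        data = json.load(f)
    return data["words"], data["positions"]  # positions come back as ints
```

Like the version above, this collapses runs of spaces into single spaces and does not preserve a trailing newline at the end of the file.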
I created my own speech.txt test file from the first paragraph of your question and ran the script on it with these results:
Recreated Text:
I have to compress a file into a list of words and list of positions to recreate
the original file. My program should also be able to take a compressed file and
recreate the full text, including punctuation and capitalization, of the
original file. I have everything correct apart from the recreation, using the
map function my program can't convert my list of positions into floats because
of the '[' as it is a list.
Per your question in the comments:
You will want to split the input on spaces. You will also likely want to use different data structures.
# we'll map the words to a list of positions
all_words = {}
with open("speech.txt") as f:
    data = f.read()
# since we need to be able to re-create the file, we'll want
# line breaks
lines = data.split("\n")
for i, line in enumerate(lines):
    words = line.split(" ")
    for j, word in enumerate(words):
        if word in all_words:
            all_words[word].append((i, j))  # line and pos
        else:
            all_words[word] = [(i, j)]
Note that this does not yield maximum compression, as "foo" and "foo." count as separate words. If you want more compression, you'll have to go character by character. Hopefully you can now use a similar approach to do so if desired.
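The reverse step isn't shown above. Because split(" ") and split("\n") with an explicit separator keep empty strings, joining the pieces back with the same separators reproduces the text exactly, so a mapping built this way can be turned back into the original. A minimal sketch, with hypothetical function names build_positions and recreate:

```python
def build_positions(text):
    """Map each word to a list of (line, position) pairs, as in the answer above."""
    all_words = {}
    for i, line in enumerate(text.split("\n")):
        for j, word in enumerate(line.split(" ")):
            all_words.setdefault(word, []).append((i, j))
    return all_words


def recreate(all_words):
    """Rebuild the original text from a {word: [(line, pos), ...]} mapping."""
    rows = {}  # line index -> {position: word}
    for word, positions in all_words.items():
        for i, j in positions:
            rows.setdefault(i, {})[j] = word
    lines = []
    # every line has at least one (possibly empty) word, so indices are contiguous
    for i in range(len(rows)):
        row = rows[i]
        lines.append(" ".join(row[j] for j in range(len(row))))
    return "\n".join(lines)
```

Unlike splitting with no argument, this round trip also preserves blank lines, double spaces and a trailing newline.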
What I'd like to do is make all the lines lowercase, then use my part_list to search for all matching words in frys.txt and append them to items. I'm having a lot of trouble creating a loop that goes through each word in the list and actually finds the words in frys.txt. I'm even trying to find duplicates, if that is at all possible. But the main thing I want is to find out whether each word exists and append it to items if it does.
Any suggestions would be great!
items = []
part_list = ['ccs', 'fcex', '8-12', '8-15', '8-15b', '80ha3']
f = open("C:/Users/SilenX/Desktop/python/frys.txt", "r+")
searchlines = f.readlines()
f.close()
for n, line in enumerate(searchlines):
    p = 0
    if part_list[p] in line.split():
        part_list[p] = part_list[p + 1]
        parts = searchlines[n]
        parts = parts.strip('\n')
        items.append(parts)
print(items)
You're doing some complex stuff with enumeration that I really don't think is necessary, and it definitely looks like your inner "loop" isn't doing what you want (because as you've written it, it isn't a loop). Try this:
part_list = ['ccs', 'fcex', '8-12', '8-15', '8-15b', '80ha3']
items = []
with open("C:/Users/SilenX/Desktop/python/frys.txt", "r") as f:  # Open the file (closed automatically)
    for line in f:
        for token in line.lower().split():  # Loop over lowercase words in the line
            if token in part_list:  # If it's one of the words you're looking for,
                items.append(token)  # append it to your list.
print(items)
This will find all the words in the file that appear in your list. It will not identify words in your file that are attached to something else, like "ccs." or "fcex8-12". If you want that, you'll have to reverse the way the search works, so that you count how many times each word in part_list appears in the line rather than counting how many words in the line are in part_list.
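A minimal sketch of that reversed search, which also counts repeated occurrences (the duplicates from the question), using collections.Counter and str.count. count_parts and find_parts are hypothetical names, and the sketch assumes the entries in part_list are already lowercase, as in the question. Substring matching means "8-15" is also found inside "8-15b", so you may need extra rules if your part numbers overlap like that.

```python
from collections import Counter


def count_parts(lines, part_list):
    """Count occurrences of each part number inside the lowercased lines.

    Matching is by substring, so "ccs." and "fcex8-12" are found too.
    str.count counts non-overlapping occurrences within each line.
    """
    counts = Counter({part: 0 for part in part_list})  # start every part at zero
    for line in lines:
        lowered = line.lower()
        for part in part_list:
            counts[part] += lowered.count(part)
    return counts


def find_parts(path, part_list):
    """Return the parts that occur at least once in the file at `path`."""
    with open(path, "r") as f:
        counts = count_parts(f, part_list)
    return [part for part in part_list if counts[part] > 0]
```

counts[part] > 1 then tells you which parts appear more than once.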