Does the for keyword understand that a string is an iterable? - Python

Question :
" Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order."
Code:
fname = input("Enter file name: ")
fh = open(fname)
hh = list()
for sen in fh:
    sen=sen.split()
    for element in sen:
        if element not in hh:
            hh.append(element)
            hh.sort()
print(hh)
I want to make sure that I understand the code. First we take the file name, then open the file, then create an empty list. For each line we split the string into a list of words, check whether each element of sen is already in the list we created, append it if it is not, and finally print the list.
Also, I have a question about the for keyword: does it understand that each word in the file is one iteration, even before the line is split?

Python str.split() documentation: https://docs.python.org/3/library/stdtypes.html#str.split
fname = input("Enter file name: ") #-- user enters name of a file
fh = open(fname) #-------------------- open the file
hh = list() #------------------------- create an empty list
for sen in fh: #---------------------- loop through lines in the file
    sen=sen.split() #----------------- split the line into words
    for element in sen: #------------- loop through words in the line
        if element not in hh: #------- if word is not in the list of unique words
            hh.append(element) #------ add the word to the list
            hh.sort() #--------------- organize the list
print(hh) #--------------------------- print the list of unique words
hh will be a list of all the unique words in the file.
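On the second question: for does not know anything about words. It simply asks the object for its items one at a time. A file object yields lines, a string yields single characters, and only the list returned by split() yields words. A minimal sketch, using io.StringIO as a stand-in for the opened file (the sample text is made up for illustration):

```python
import io

# io.StringIO stands in for an opened text file (sample text is made up)
fh = io.StringIO("But soft what light\nIt is the east\n")

lines = [line for line in fh]          # iterating a file yields whole lines
chars = [ch for ch in "east"]          # iterating a string yields characters
words = [w for w in lines[0].split()]  # iterating the split() result yields words

print(lines)
print(chars)
print(words)
```

So before the split, each iteration of the outer loop is one line of the file, not one word.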
The best way to work with files in Python is to use a context manager (the with statement), which closes the file automatically, even if an error occurs.
Python Context Manager documentation: https://docs.python.org/3/library/contextlib.html
You should probably use:
filename = input("Enter file name: ")
unique_words = list()
with open(filename, "r") as file: # 'with' Context Manager
    for line in file:
        line = line.split()
        for word in line:
            if word not in unique_words:
                unique_words.append(word)
unique_words.sort() # sort once, after all words are collected
print(unique_words)
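If the exercise did not ask for the explicit "is it already in the list" check, a set could do the deduplication instead. This is a different technique from the one the exercise describes, shown only for comparison; the function name unique_sorted_words and the sample text are hypothetical.

```python
import io

def unique_sorted_words(lines):
    """Return the distinct whitespace-separated words in lines, sorted."""
    seen = set()
    for line in lines:
        seen.update(line.split())  # a set silently ignores duplicates
    return sorted(seen)            # sorted() returns a new sorted list

# io.StringIO stands in for the opened file (sample text is made up)
sample = io.StringIO("the sun is the east\nthe moon\n")
result = unique_sorted_words(sample)
print(result)
```

A membership test on a set does not scan the whole collection the way `word not in unique_words` does on a list, which may matter for large files.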

Related

How to split words in each line?

I wrote a code to get each word in a line of a text file. Please see below.
fname = input('Enter your file name:')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
word=[]
for word in fhand:
    word = word.split()
    print(word)
I ran this code and got what I wanted. But when I run print(word) afterwards, it only shows the words of the last line. I thought I hadn't defined word at the beginning, so I added word = [], but the result is the same.
In your loop
for word in fhand:
you reassign the variable word on every iteration, so after the loop it holds only the last line. The loop variable also reuses the name of the list you created with word = [], which discards that list. Keep the list under its own name and append each line's words to it instead of reassigning it. Try something like:
word = []
fname = input('Enter your file name:')
with open(fname, 'r') as file:
    for line in file:
        word.append(line.split())
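Note that append adds each line's word list as a single item, so the result is a list of lists. If one flat list of words is wanted, extend is the usual alternative. A small sketch of the difference, with io.StringIO standing in for the file and made-up sample text (the names words_per_line and all_words are illustrative):

```python
import io

# io.StringIO stands in for the opened file (sample text is made up)
file = io.StringIO("one two\nthree four five\n")

words_per_line = []  # one inner list per line
all_words = []       # one flat list of words

for line in file:
    parts = line.split()
    words_per_line.append(parts)  # append adds the list itself as one item
    all_words.extend(parts)       # extend adds each word separately

print(words_per_line)
print(all_words)
```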

read words from file, line by line and concatenate to paragraph

I have a really long list of words that are on each line. How do I make a program that takes in all that and print them all side by side?
I tried making the word an element of a list, but I don't know how to proceed.
Here's the code I've tried so far:
def convert(lst):
    return([i for item in lst for i in item.split()])
lst = [''' -The list of words come here- ''']
print(convert(lst))
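If you already have the words in a list, you can use the str.join() method to concatenate them. See https://docs.python.org/3/library/stdtypes.html#str.join
Note that readlines() keeps the trailing newline on every line, so split the file contents on whitespace instead:
with open('your_file.txt') as f:
    words = f.read().split()
separator = ' '
print(separator.join(words))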
Another, slightly more cumbersome method is to print the words one at a time with the built-in print() function, replacing the newline that print() normally appends with a space. Strip each line first, because it still carries its own newline from the file.
with open('your_file.txt') as f:
    for word in f:
        print(word.strip(), end=' ')
Try this; example.txt just contains a list of words, one per line.
with open("example.txt", "r") as a_file:
    sentence = ""
    for line in a_file:
        stripped_line = line.strip()
        sentence = sentence + f"{stripped_line} "
print(sentence)
If your input file is really large and you can't fit it all in memory, you can read the words lazily and write them to disk instead of holding the whole output in memory.
# create a generator that yields each individual line
lines = (l for l in open('words'))
with open("output", "w+") as writer:
    # read the file line by line to avoid memory issues
    while True:
        try:
            line = next(lines)
            # add to the paragraph in the out file
            writer.write(line.replace('\n', ' '))
        except StopIteration:
            break
You can check the working example here: https://replit.com/#bluebrown/readwritewords#main.py
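Since a file object is already a lazy line iterator, the same streaming idea can probably be written with a plain for loop. A minimal sketch, assuming any two file-like objects; the helper name join_lines is hypothetical, and io.StringIO stands in for the real input and output files:

```python
import io

def join_lines(src, dst, sep=" "):
    """Copy src to dst line by line, replacing each newline with sep."""
    for line in src:  # only one line is held in memory at a time
        dst.write(line.rstrip("\n") + sep)

# io.StringIO objects stand in for the input and output files
src = io.StringIO("alpha\nbeta\ngamma\n")
dst = io.StringIO()
join_lines(src, dst)
print(dst.getvalue())
```

With real files, src and dst would come from two open() calls inside a with statement.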

how to read a specific set of words in .txt file and generate a randomizer of those words

I want to open a file named words.txt and generate words randomly, from 1 to 10 of them depending on which number the user enters.
library:la biblioteca
school:el colegio,la escuela
restaurant:el restaurante
movie theater:el cine
airport:el aeropuerto
museum:el museo
park:el parque
university:la universidad
office:la oficina,el despacho
house:la casa
Is there a way to read only the "second" part of each line? For the first line, for example, skip "library:" and read "la biblioteca", without hard-coding the words.
with open("words.txt", "r") as infile:
    words = infile.readline().split() #This is the line that needs improvement
    random_word = random.choice(words)
newKeys = False
for i in range(10):
    a = random.choice(1, 10)
This is how far I got but I know my second line is what I gotta change (probably)
Sorry for bad english
The following snippet does what you described:
import random
with open("words.txt", "r") as infile:
    words = [line.rstrip().split(":")[1] for line in infile]
for i in range(10):
    print (random.choice(words))
The rstrip() call is necessary to remove the newline character at the end of each line, and split(":") splits the line on the colon character, so [1] will return the second part. The whole expression is inside a list comprehension, so it will be repeated for every line of the file, and the result is collected in the list words.
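The snippet above always prints ten picks. If the intent is to print as many words as the user asks for, something along these lines might work. It is only a sketch: it assumes the number entered means "how many words", it clamps that number to what is available, and the helper names load_translations and pick_random are hypothetical. io.StringIO stands in for words.txt.

```python
import io
import random

def load_translations(lines):
    """Return the text after the first ':' on each non-empty line."""
    translations = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _, _, spanish = line.partition(":")  # text after the first colon
        translations.append(spanish)
    return translations

def pick_random(translations, count):
    """Pick count distinct entries, clamped to the number available."""
    count = max(1, min(count, len(translations)))
    return random.sample(translations, count)

# io.StringIO stands in for words.txt (three lines from the question)
sample = io.StringIO(
    "library:la biblioteca\nschool:el colegio,la escuela\npark:el parque\n"
)
words = load_translations(sample)
chosen = pick_random(words, 2)  # 2 stands in for int(input(...))
print(chosen)
```

random.sample never repeats an entry, unlike calling random.choice in a loop. Entries with several translations, such as "el colegio,la escuela", are kept whole here; a further split(",") would separate them.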
This is close. After reading the words.txt file, you can use the split() method on a string, for example:
print("example:1".split(":"))
# ['example', '1']
If you want to print out "la biblioteca" and skip past "library":
import random
with open("words.txt", "r") as infile:
    # Read a list of words from the file
    words = infile.read().splitlines()
    # Replace word strings with lists where they've been split on ":"
    words = [word.split(":") for word in words]
for i in range(10):
    # Choose a random word from the words
    a = random.choice(words)
    # Only print the right portion
    print(a[1])

Random substitution

I have a txt file and a dictionary, where keys are adjectives, values are their synonyms. I need to replace the common adjectives from the dictionary which I meet in a given txt file with their synonyms - randomly! and save both versions - with changed and unchanged adjectives - line by line - in a new file(task3_edited_text). My code:
#get an English text as a additional input
filename_eng = sys.argv[2]
infile_eng = open(filename_eng, "r")
task3_edited_text = open("task3_edited_text.txt", "w")
#necessary for random choice
import random
#look for adjectives in English text
#line by line
for line in infile_eng:
    task3_edited_text.write(line)
    line_list = line.split()
    #for each word in line
    for word in line_list:
        #if we find common adjectives, change them into synonym, randomly
        if word in dict.keys(dictionary):
            word.replace(word, str(random.choice(list(dictionary.values()))))
        else:
            pass
    task3_edited_text.write(line)
The problem is that in the output the adjectives are not replaced by their synonyms.
line_list = line.split()
...
task3_edited_text.write(line)
The issue is that you try to modify line_list, which you created from line. However, line_list is just a list of copies of the words in line; modifying it doesn't change line at all. So writing line to the file writes the unmodified line and ignores your changes.
You probably want to build a line_to_write from line_list and write that to the file instead, like so (split() discards the trailing newline, so add it back):
line_to_write = " ".join(line_list) + "\n"
task3_edited_text.write(line_to_write)
Also, line_list isn't even modified in your code: word is just a name bound to each element in turn, and strings are immutable, so nothing done to word changes the list. Moreover, replace returns a new string and doesn't modify the string you call it on. You probably want to assign back into line_list by index, like so:
for idx, word in enumerate(line_list):
    #if we find common adjectives, change them into synonym, randomly
    if word in dictionary:
        line_list[idx] = str(random.choice(list(dictionary.values())))
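Putting the two fixes together, a self-contained sketch of the per-line substitution could look like the following. It makes one assumption the question does not confirm: that each key maps to a list of its own synonyms, so the replacement is drawn from dictionary[word] rather than from all values. The sample dictionary and the function name substitute_adjectives are hypothetical.

```python
import random

# Hypothetical dictionary: each adjective maps to a list of its synonyms
dictionary = {"big": ["large", "huge"], "nice": ["pleasant"]}

def substitute_adjectives(line, synonyms):
    """Return line with each known adjective replaced by a random synonym."""
    words = line.split()
    for idx, word in enumerate(words):
        if word in synonyms:
            # assign back by index so the list itself changes
            words[idx] = random.choice(synonyms[word])
    return " ".join(words)

original = "a big dog in a nice park"
edited = substitute_adjectives(original, dictionary)
print(original)  # unchanged version
print(edited)    # version with substituted adjectives
```

Writing both original and edited to the output file, each followed by "\n", would give the two versions line by line.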

How to search a text file for a specific word in Python

I want to find words in a text file that match words stored in an existing list called items. The list is created in a previous function, and I want to be able to use it in the next function as well, but I'm unsure how to do that. I tried using classes for that but I couldn't get it right, and I can't figure out what the problem is with the rest of the code. I tried running it without the class and the list, replacing the list 'items[]' in line 8 with a word from the text file being opened, and it still didn't do anything, even though no errors came up. When the code below is run it prints "Please entre a valid textfile name: " and stops there.
class searchtext():
    textfile = input("Please entre a valid textfile name: ")
    items = []
    def __init__search(self):
        with open("textfile") as openfile:
            for line in openfile:
                for part in line.split():
                    if ("items[]=") in part:
                        print (part)
                    else:
                        print("not found")
The list is created from another text file containing words in a previous function that looks like this and it works as it should, if it is to any help:
def createlist():
    items = []
    with open('words.txt') as input:
        for line in input:
            items.extend(line.strip().split(','))
    return items
print(createlist())
You can use regexp the following way:
>>> import re
>>> words=['car','red','woman','day','boston']
>>> word_exp='|'.join(words)
>>> re.findall(word_exp,'the red car driven by the woman',re.M)
['red', 'car', 'woman']
The third line joins the acceptable words with "|" into a single alternation pattern. To run this on a file, replace the string 'the red car driven by the woman' with open(your_file,'r').read().
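One caveat with a bare alternation is that it also matches inside longer words ("car" in "scared"). A hedged variant that escapes each word and anchors the pattern on word boundaries is sketched below; the sample sentence is made up to show the difference.

```python
import re

words = ['car', 'red', 'woman', 'day', 'boston']

# Escape each word and require word boundaries on both sides
pattern = re.compile(
    r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
)

text = 'the red car scared the woman yesterday'

loose = re.findall('|'.join(words), text)  # bare alternation
strict = pattern.findall(text)             # whole words only

print(loose)
print(strict)
```

Here the bare pattern also picks up "car" inside "scared" and "day" inside "yesterday", while the bounded pattern returns only the three whole-word matches.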
This may be a bit cleaner. I feel a class is overkill here.
def createlist():
    items = []
    with open('words.txt') as infile:
        for line in infile:
            items.extend(line.strip().split(','))
    return items
print(createlist())
# store the list
word_list = createlist()
with open('file.txt') as f:
    # split the file content into words (split() with no argument splits on any whitespace, newlines included)
    for word in f.read().split():
        # check if each word is in the list
        if word in word_list:
            # do something with word
            print(word + " is in the list")
        else:
            # word not in list
            print(word + " is NOT in the list")
There is nothing like regular expressions for matching: https://docs.python.org/3/howto/regex.html
import re
items=['one','two','three','four','five'] #your items list created previously
with open('text.txt','r') as file: #load your file
    content=file.read() #read once and reuse the text for every search
for i in items:
    lis=re.findall(re.escape(i),content) #escape so the word is matched literally
    if len(lis)==0:
        print('Not found')
    elif len(lis)==1:
        print('Found Once')
    elif len(lis)==2:
        print('Found Twice')
    else:
        print('Found',len(lis),'times')
