How to load a word list into Python

I'm working through an introductory Python programming course on MIT OCW. On this problem set I've been given some code to work on and a text file. The code and the text file are in the same folder. The code looks like this:
import random
import string
def load_words():
    print "Loading word list from file..."
    inFile = open(WORDLIST_FILENAME, 'r', 0)
    line = inFile.readline()
    wordlist = string.split(line)
    print " ", len(wordlist), "words loaded."
    return wordlist
def choose_word(wordlist):
    return random.choice(wordlist)
wordlist = load_words()
When I run the code as it is, the problem set instructions say I should get this:
Loading word list from file...
55900 words loaded.
For some reason though, when I run the code I get:
Loading word list from file...
1 words loaded
I've tried omitting the 2nd and 3rd parameters from the input to the open function but to no avail. What could the problem be?
Moreover, when I try to print the value of wordlist I get
['AA']
When I print the value of line within the context of the relevant function I get:
AA
The text file does begin with 'AA', but what about all of the letters that follow?

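line = inFile.readline( ) should be readlines(), plural.
readline() reads only a single line, which is why only one word is loaded.
readlines() returns a list with one string per line of the input file, and each string keeps its trailing newline character. You then need to strip or split each line rather than calling string.split on the whole list.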
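To make the difference concrete, here is a minimal Python 3 sketch (the question's code is Python 2). The file name and the three sample words are made up for illustration; the script writes a small one-word-per-line file into a temporary directory and reads it back three ways.

```python
import os
import tempfile

# Hypothetical sample data: one word per line, like the file in the question.
sample = "AA\nAAH\nAAHED\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")
    with open(path, "w") as f:
        f.write(sample)

    with open(path) as f:
        first_line = f.readline()      # only the first line, newline included
    with open(path) as f:
        all_lines = f.readlines()      # one string per line, newlines included
    with open(path) as f:
        all_words = f.read().split()   # split on any whitespace, newlines dropped

print(repr(first_line))
print(all_lines)
print(all_words)
```

read().split() is often the simplest choice for a word list, since it works whether the words are separated by spaces or by newlines.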

Suppose the raw file looks like this:
cat wordlist.txt
aa
bb
cc
dd
ee
The Python file looks like this:
import random
def load_words(WORDLIST_FILENAME):
    print "Loading word list from file..."
    wordlist = list()
    # 'with' closes the file automatically when the block ends
    with open(WORDLIST_FILENAME) as f:
        # iterate one line at a time; each line includes its trailing '\n'
        for line in f:
            # strip the '\n', then append the word to wordlist
            wordlist.append(line.rstrip('\n'))
    print " ", len(wordlist), "words loaded."
    print '\n'.join(wordlist)
    return wordlist
def choose_word(wordlist):
    return random.choice(wordlist)
wordlist = load_words('wordlist.txt')
The result:
python load_words.py
Loading word list from file...
5 words loaded.
aa
bb
cc
dd
ee

The function you have written can only read words from a single line. It assumes all the words are on one line of the text file, so it reads that line and creates a list by splitting it. However, it appears your text file also contains newlines. You can therefore replace the following:
line = inFile.readline( )
wordlist = string.split (line)
with:
wordlist = []
for line in inFile:
    line = line.split()
    wordlist.extend(line)
print " ", len(wordlist), "words loaded."
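For readers on Python 3, the same loop can be sketched as a complete function. This is an illustrative port, not the course's code: load_words takes the file name as a parameter here, and the demo file contents are made up.

```python
import os
import random
import tempfile


def load_words(filename):
    """Return every whitespace-separated word in the file as a list."""
    wordlist = []
    with open(filename) as in_file:
        for line in in_file:
            # split() with no argument drops the newline and handles
            # several words on one line as well as one word per line
            wordlist.extend(line.split())
    return wordlist


def choose_word(wordlist):
    return random.choice(wordlist)


# Demo on a throwaway file mixing both layouts (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "words.txt")
    with open(demo_path, "w") as f:
        f.write("aa bb cc\ndd\nee\n")
    words = load_words(demo_path)

print(len(words), "words loaded.")
print(choose_word(words))
```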

Related

read words from file, line by line and concatenate to paragraph

I have a really long list of words, one per line. How do I write a program that takes them all in and prints them side by side?
I tried making the word an element of a list, but I don't know how to proceed.
Here's the code I've tried so far:
def convert(lst):
    return([i for item in lst for i in item.split()])
lst = [''' -The list of words come here- ''']
print(convert(lst))
If you already have the words in a list, you can use the join() method to concatenate them. See https://docs.python.org/3/library/stdtypes.html#str.join
Note that readlines() keeps the newline at the end of each line, so the file is read here with read().split(), which splits on any whitespace and drops the newlines.
words = open('your_file.txt').read().split()
separator = ' '
print(separator.join(words))
Another, slightly more cumbersome method is to print the words one at a time with the built-in print() function, suppressing the newline that print() normally adds to the end of its output.
words = open('your_file.txt').read().split()
for word in words:
    print(word, end=' ')
Try this, where example.txt just has a list of words, one per line.
with open("example.txt", "r") as a_file:
    sentence = ""
    for line in a_file:
        stripped_line = line.strip()
        sentence = sentence + f"{stripped_line} "
print(sentence)
If your input file is really large and you can't fit it all in memory, you can read the words lazily and write them to disk instead of holding the whole output in memory.
# create a generator that yields each individual line
lines = (l for l in open('words'))
with open("output", "w+") as writer:
    # read the file line by line to avoid memory issues
    while True:
        try:
            line = next(lines)
            # add to the paragraph in the out file
            writer.write(line.replace('\n', ' '))
        except StopIteration:
            break
You can check the working example here: https://replit.com/#bluebrown/readwritewords#main.py
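The same streaming idea can be written without an explicit generator or next() calls, because a file object already yields one line at a time when iterated. A minimal sketch, assuming a hypothetical helper name lines_to_paragraph and made-up sample data written to a temporary directory:

```python
import os
import tempfile


def lines_to_paragraph(src_path, dst_path):
    """Copy src to dst, replacing each line break with a single space."""
    with open(src_path) as reader, open(dst_path, "w") as writer:
        for line in reader:            # the file object yields one line at a time
            word = line.strip()
            if word:                   # skip blank lines
                writer.write(word + " ")


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "words")
    dst = os.path.join(tmp, "output")
    with open(src, "w") as f:
        f.write("alpha\nbeta\n\ngamma\n")
    lines_to_paragraph(src, dst)
    with open(dst) as f:
        paragraph = f.read()

print(paragraph)
```

Opening both files in one with statement also guarantees that the input file is closed, which the bare open('words') in a generator expression does not.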

how to read a specific set of words in .txt file and generate a randomizer of those words

I want to open a file named words.txt and print between 1 and 10 random words from it, depending on the number the user enters.
library:la biblioteca
school:el colegio,la escuela
restaurant:el restaurante
movie theater:el cine
airport:el aeropuerto
museum:el museo
park:el parque
university:la universidad
office:la oficina,el despacho
house:la casa
Is there a way to read only the "second" part of each line? For the first line, for example, skip "library:" and read "la biblioteca", without hardcoding the words.
with open("words.txt", "r") as infile:
    words = infile.readline().split() #This is the line that needs improvement
random_word = random.choice(words)
newKeys = False
for i in range(10):
    a = random.choice(1, 10)
This is how far I got, but I know the second line is (probably) what I need to change.
Sorry for my bad English.
The following snippet does what you described:
import random
with open("words.txt", "r") as infile:
    words = [line.rstrip().split(":")[1] for line in infile]
for i in range(10):
    print (random.choice(words))
The rstrip() call is necessary to remove the newline character at the end of each line, and split(":") splits the line on the colon character, so [1] will return the second part. The whole expression is inside a list comprehension, so it will be repeated for every line of the file, and the result is collected in the list words.
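As a variation, here is a sketch that also separates the comma-delimited alternatives (such as "el colegio,la escuela") and picks a user-chosen number of distinct words. Whether the alternatives should be split is an assumption, not something the question states; parse_translations and count are hypothetical names, and the sample lines are copied from the question's file.

```python
import random


def parse_translations(lines):
    """Return the text after the first ':' on each line, split on ','."""
    translations = []
    for line in lines:
        _, sep, right = line.strip().partition(":")
        if not sep:
            continue                      # no colon on this line, skip it
        translations.extend(part.strip() for part in right.split(","))
    return translations


# Three lines copied from the question's words.txt
sample_lines = [
    "library:la biblioteca\n",
    "school:el colegio,la escuela\n",
    "restaurant:el restaurante\n",
]

words = parse_translations(sample_lines)
count = 2                                 # stands in for the number the user enters
picked = random.sample(words, count)      # `count` distinct random entries
print(words)
print(picked)
```

random.sample never repeats an entry, whereas calling random.choice in a loop can return the same word more than once. With a real file, pass the open file object to parse_translations in place of sample_lines.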
This is close. After reading the words.txt file, you can use the split() method on a string, for example:
print("example:1".split(":"))
# ['example', '1']
If you want to print out "la biblioteca" and skip past "library":
import random
with open("words.txt", "r") as infile:
    # Read a list of words from the file
    words = infile.read().splitlines()
# Replace word strings with lists where they've been split on ":"
words = [word.split(":") for word in words]
for i in range(10):
    # Choose a random word from the words
    a = random.choice(words)
    # Only print the right portion
    print(a[1])

Print output to text file (.txt) using Python

I want to print my output to a text file, but the result differs from what I get when I print to the terminal. My code:
...
words = keywords.split("makan","Rina")
sentences = text.split(".")
for itemIndex in range(len(sentences)):
    for word in words:
        if word in sentences[itemIndex]:
            print('"' + sentences[itemIndex] + '."')
            break
The output looks like this:
"Semalam saya makan nasi padang."
" Saya makan bersama Rina."
" Rina pesan ayam goreng."
If I write to a text file instead:
words = ["makan","Rina"]
sentences = text.split(".")
for itemIndex in range(len(sentences)):
    for word in words:
        if word in sentences[itemIndex]:
            with open("corpus.txt",'w+') as f:
                f.write(sentences[itemIndex])
                f.close()
The output is just:
Rina pesan ayam goreng
Why? How can I write the same output to the text file that I get in the terminal?
You are reopening the file on each iteration of the loop, so every write overwrites what is already there, and when the loop finishes only the last line is left in the file. You need to open the file outside of all the loops, or open it in append mode, denoted by a.
If you open the file without a with statement, remember to close it using f.close() when you are done with it.
You have to reorder the lines of your code, by moving opening/closing the file outside of the loop:
with open("corpus.txt",'w+') as f:
    words = ["makan","Rina"]
    sentences = text.split(".")
    for itemIndex in range(len(sentences)):
        for word in words:
            if word in sentences[itemIndex]:
                f.write(sentences[itemIndex])
Also, print normally adds a newline character after its output. If you want your sentences to be written on separate lines in the file, add f.write('\n') after every sentence.
Because you are calling open inside the loop, and you're using 'w+' mode, your program overwrites the file each time, so you end up with only the last line written to the file. Try it with 'a' instead, or move the with open outside of the loop.
You don't need to call close on a file handle that you have opened using the with syntax. The closing of the file is handled for you.
I would open the file just once before for loops (the for loops should be within the with statement) instead of opening it multiple times. You are overwriting the file each time you are opening it to write a new line.
Your code should be:
words = ["makan","Rina"]
sentences = text.split(".")
with open("corpus.txt",'w+') as f:
    for itemIndex in range(len(sentences)):
        for word in words:
            if word in sentences[itemIndex]:
                f.write(sentences[itemIndex] + '\n')
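The effect of reopening in write mode can be reproduced in isolation. The sketch below uses made-up sentences and two hypothetical file names in a temporary directory: one file is reopened with 'w' on every iteration, the other is opened once before the loop.

```python
import os
import tempfile

sentences = ["first", "second", "third"]

with tempfile.TemporaryDirectory() as tmp:
    reopened = os.path.join(tmp, "reopened.txt")
    opened_once = os.path.join(tmp, "opened_once.txt")

    # Reopening in 'w' mode inside the loop truncates the file every time
    for s in sentences:
        with open(reopened, "w") as f:
            f.write(s + "\n")

    # Opening once, outside the loop, keeps everything that is written
    with open(opened_once, "w") as f:
        for s in sentences:
            f.write(s + "\n")

    with open(reopened) as f:
        reopened_content = f.read()
    with open(opened_once) as f:
        opened_once_content = f.read()

print(repr(reopened_content))
print(repr(opened_once_content))
```

Only the last sentence survives in the first file, while the second file holds all three.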

Splitting a line, then searching for string

Sorry, I'm still new to Python.
My complete code so far:
for line in file:
    line = line.split("\t")
    if my_var in line[1]:
        print line[13]
The program should read lines from a file.
The lines have the following format:
"word" \t "word" \t "word" ...
The program should split each line into a list of strings containing the words
==> list = (word1, word2, word3, ...)
Then I want to test whether the word at index 1 matches a given word, and if so, print the word at index 13 (each line has the same number of elements).
What I don't understand is that writing:
for line in file:
    line = line.split("\t")
    word = line[1]
    print word
works, while
for line in file:
    line = line.split("\t")
    word = line[1]
    if my_var in word:
        print line[13]
does not work.
I'm pretty sure there is an easy solution to this problem and that I simply can't find it.
Your error comes from the following line:
print line[16]
Your split list does not have that many items. It contains only 4, and you are trying to access index 16.
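One way to guard against short lines is to check the number of fields before indexing. The sketch below is in Python 3 and assumes a hypothetical helper name, matching_fields, with the indices from the question (1 and 13) as defaults; the two sample lines are made up.

```python
def matching_fields(lines, my_var, match_index=1, print_index=13):
    """Yield field `print_index` of each tab-separated line whose
    field `match_index` contains `my_var`; skip lines that are too short."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(match_index, print_index):
            continue                      # not enough columns on this line
        if my_var in fields[match_index]:
            yield fields[print_index]


# Made-up data: one line with 14 fields, one with only 4
long_line = "\t".join("w%d" % i for i in range(14)) + "\n"
short_line = "a\tw1\tb\tc\n"
result = list(matching_fields([long_line, short_line], "w1"))
print(result)
```

Printing len(fields) for a few lines of the real file is a quick way to confirm how many columns the split actually produces.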

How to search a text file for a specific word in Python

I want to find words in a text file that match words stored in an existing list called items. The list is created in a previous function, and I want to be able to use it in the next function as well, but I'm unsure how to do that. I tried using a class for this but couldn't get it right, and I can't figure out what the problem is with the rest of the code. I also tried running it without the class and the list, replacing the list 'items[]' in line 8 with a word from the text file being opened, and it still didn't do anything, even though no errors come up. When the code below is run, it prints "Please entre a valid textfile name: " and stops there.
class searchtext():
    textfile = input("Please entre a valid textfile name: ")
    items = []
    def __init__search(self):
        with open("textfile") as openfile:
            for line in openfile:
                for part in line.split():
                    if ("items[]=") in part:
                        print (part)
                    else:
                        print("not found")
The list is created from another text file containing words, in a previous function that works as it should. It looks like this, in case it helps:
def createlist():
    items = []
    with open('words.txt') as input:
        for line in input:
            items.extend(line.strip().split(','))
    return items
print(createlist())
You can use a regular expression as follows:
>>> import re
>>> words=['car','red','woman','day','boston']
>>> word_exp='|'.join(words)
>>> re.findall(word_exp,'the red car driven by the woman',re.M)
['red', 'car', 'woman']
The second command builds a pattern string of the acceptable words separated by "|". To run this on a file, just replace the string 'the red car driven by the woman' with open(your_file,'r').read().
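One caveat with the plain alternation is that it also matches inside longer words. A hedged refinement, using only standard re features, is to escape each word and wrap the alternation in \b word boundaries; the sample sentence below is made up to show the difference.

```python
import re

words = ['car', 'red', 'woman', 'day', 'boston']

# re.escape protects any regex metacharacters in the words;
# \b...\b makes each alternative match whole words only
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b')

text = 'the red car scared the woman on a daydream'
plain = re.findall('|'.join(words), text)
bounded = pattern.findall(text)
print(plain)    # also picks up 'car' inside 'scared' and 'day' inside 'daydream'
print(bounded)  # whole-word matches only
```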
This may be a bit cleaner. I feel a class is overkill here.
def createlist():
    items = []
    with open('words.txt') as input:
        for line in input:
            items.extend(line.strip().split(','))
    return items
print(createlist())
# store the list
word_list = createlist()
with open('file.txt') as f:
    # split the file content into words (first into lines, then each line into its words)
    for word in (sum([x.split() for x in f.read().split('\n')], [])):
        # check if each word is in the list
        if word in word_list:
            # do something with word
            print(word + " is in the list")
        else:
            # word not in list
            print(word + " is NOT in the list")
Nothing beats regular expressions for matching: https://docs.python.org/3/howto/regex.html
items=['one','two','three','four','five'] #your items list created previously
import re
file=open('text.txt','r') #load your file
content=file.read() #save the read output so every search starts from the beginning
for i in items:
    lis=re.findall(i,content)
    if len(lis)==0:
        print('Not found')
    elif len(lis)==1:
        print('Found Once')
    elif len(lis)==2:
        print('Found Twice')
    else:
        print('Found',len(lis),'times')
