Splitting a line, then searching for string - python

Sorry, I'm still new to Python.
My complete code so far:
for line in file:
    line = line.split("\t")
    if my_var in line[1]:
        print line[13]
The program should read lines from a file.
The lines have the following format:
"word" \t "word" \t "word" ...
The program should split each line into a list of strings containing the words
==> list = [word1, word2, word3, ...]
Then I want to test whether the word at index 1 matches a given word, and if so, print the word at index 13 (each line has the same number of elements).
What I don't understand is that writing:
for line in file:
    line = line.split("\t")
    word = line[1]
    print word
works, while
for line in file:
    line = line.split("\t")
    word = line[1]
    if my_var in word:
        print line[13]
does not work.
I'm pretty sure there is an easy solution to this problem and I simply can't find it.

Your error comes from the following line:
print line[13]
Your split list doesn't have 14 items; it contains only 4, and you tried to access index 13, which raises an IndexError. The first snippet works because it only reads line[1].
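A minimal Python 3 sketch of the same loop that skips lines with too few fields instead of raising an IndexError. The function name `find_column`, its parameters, and the choice to silently skip short lines are assumptions for illustration, not part of the question.

```python
def find_column(lines, my_var, key_index=1, value_index=13):
    """Return the field at value_index for every line whose field at
    key_index contains my_var. Lines with too few fields are skipped."""
    results = []
    for line in lines:
        # rstrip("\n") so the last field does not carry the newline
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(key_index, value_index):
            # Too few tab-separated fields (e.g. a blank or malformed line)
            continue
        if my_var in fields[key_index]:
            results.append(fields[value_index])
    return results
```

Because iterating over an open file yields its lines, you could call it as `find_column(f, my_var)` inside a `with open(...) as f:` block.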

Related

An item of a list is not equal to its reverse

I am creating a game project that will include palindrome words.
I have a list of all the words in English, and I want to check every word in the list and find the ones that are equal to their reverse.
file1 = open ('words.txt')
file2reversed = open ('words.txt')
words = file1.readlines()
print(words[3][::-1])
print()
if words[3][::-1] == words[3]:
    print("equal")
else:
    print("not")
This is my code. I made the word at index 3 a palindrome to check whether it works, and the output looks like this:
aaa
aaa
not
Why is words[3][::-1] not equal to words[3] even though it is a palindrome?
Use file.read().splitlines() instead. file.readlines() returns lines with a newline appended to each string at the end, so when reversed, '\naaa' != 'aaa\n'.
More cleanly, with a with block that also closes the file:
with open('words.txt') as f:
    text = f.read()
words = text.splitlines()
# words is a list of strings without '\n' at the end of each line.
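Building on that, here is a small sketch for the actual goal of finding the palindromes in the word list. It assumes words.txt holds one word per line; the helper names, the case-insensitive comparison, and ignoring empty lines are choices made for illustration.

```python
def load_words(path):
    """Read one word per line, without the trailing newlines."""
    with open(path) as f:
        return f.read().splitlines()


def is_palindrome(word):
    # strip() guards against stray whitespace; lower() makes "Level" count
    w = word.strip().lower()
    return w != "" and w == w[::-1]


def find_palindromes(words):
    """Return the words that read the same forwards and backwards."""
    return [w for w in words if is_palindrome(w)]
```

For example, `find_palindromes(load_words("words.txt"))` returns every palindrome in the file.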

How to read a specific set of words from a .txt file and pick random words from them

I want to open a file named words.txt and generate between 1 and 10 random words, depending on which number the user enters.
library:la biblioteca
school:el colegio,la escuela
restaurant:el restaurante
movie theater:el cine
airport:el aeropuerto
museum:el museo
park:el parque
university:la universidad
office:la oficina,el despacho
house:la casa
Is there a way to read only the "second" part of each line? For example, on the first line, skip "library:" and read "la biblioteca", without hardcoding the words.
with open("words.txt", "r") as infile:
    words = infile.readline().split() #This is the line that needs improvement
    random_word = random.choice(words)
newKeys = False
for i in range(10):
    a = random.choice(1, 10)
This is how far I got, but I know the second line is (probably) what I have to change.
Sorry for my bad English.
The following snippet does what you described:
import random
with open("words.txt", "r") as infile:
    words = [line.rstrip().split(":")[1] for line in infile]
for i in range(10):
    print(random.choice(words))
The rstrip() call is necessary to remove the newline character at the end of each line, and split(":") splits the line on the colon character, so [1] will return the second part. The whole expression is inside a list comprehension, so it will be repeated for every line of the file, and the result is collected in the list words.
This is close. After reading the words.txt file, you can use the split() method on a string, for example:
print("example:1".split(":"))
# ['example', '1']
If you want to print out "la biblioteca" and skip past "library":
import random
with open("words.txt", "r") as infile:
    # Read a list of words from the file
    words = infile.read().splitlines()
# Replace word strings with lists where they've been split on ":"
words = [word.split(":") for word in words]
for i in range(10):
    # Choose a random word from the words
    a = random.choice(words)
    # Only print the right portion
    print(a[1])
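Neither answer uses the number the user enters. Here is a sketch that does, assuming the count should be clamped to the number of available words and that repeated words are not wanted. The helper names are hypothetical, and `split(":", 1)` splits only on the first colon, so blank lines and lines without a colon are skipped.

```python
import random


def parse_translations(lines):
    """Return the part after the first ':' of every 'english:spanish' line."""
    translations = []
    for line in lines:
        line = line.strip()
        if ":" not in line:
            # Skip blank or malformed lines
            continue
        _english, spanish = line.split(":", 1)
        translations.append(spanish.strip())
    return translations


def pick_random(translations, count):
    """Return `count` distinct random entries, clamped to 1..len(translations)."""
    count = min(max(count, 1), len(translations))
    return random.sample(translations, count)
```

You would call `parse_translations(infile)` inside the `with open("words.txt") as infile:` block and then print each item of `pick_random(translations, int(input()))`. `random.sample` never repeats an entry; if repeats are acceptable, `random.choices(translations, k=count)` (Python 3.6+) is the alternative.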

Split a .txt at each period instead of by line?

I am attempting to split a .txt file by sentence into a list, but my coding efforts can only split by line.
Example of .txt contents:
This is line 1 of txt file,
it is now on line 2. Here is the
second sentence between line 2 and 3.
Code
listed = []
with open("example.txt","r") as text:
    Line = text.readline()
    while Line!="":
        Line1 = Line.split(".")
        for sentence in Line1:
            listed.append(sentence)
        Line = text.readline()
print(listed)
This would print something like: ['This is line 1 of txt file,\n','it is now on line 2\n', 'Here is the\n','second sentence between line 2 and 3\n']
If the entire document was on one line, this would work correctly, except for cases like "Mr." and "Mrs." and such. However, that's a future worry. Does anyone out there know how to use split in the above scenario?
Assuming every sentence ends with a period (.), you can:
1. read the whole file: fic.read()
2. replace each newline with a space: replace('\n', ' ') (replacing it with '' would glue "file," and "it" together)
3. split on the period
4. apply strip() to each sentence to remove leading and trailing spaces
5. keep only the non-empty sentences (the text after the final period is an empty string)
with open("data.txt", "r") as fic:
    content = fic.read().replace('\n', ' ')
sentences = [s for s in map(str.strip, content.split(".")) if s]
A more detailed version:
with open("data.txt", "r") as fic:
    content = fic.read()
content = content.replace('\n', ' ')
sentences = content.split(".")
sentences = list(map(str.strip, sentences))
# same as
sentences = [s.strip() for s in sentences]
# drop the empty string left after the final period
sentences = [s for s in sentences if s]
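The same idea can be wrapped in a function. This sketch goes slightly beyond the question by also treating "!" and "?" as sentence ends and keeping the punctuation attached; the function name is hypothetical. Like the answer above, it still splits after abbreviations such as "Mr.".

```python
import re


def split_sentences(text):
    """Split text into sentences ending in '.', '!' or '?'.

    Line breaks and runs of whitespace are first collapsed to single
    spaces, so a sentence spanning several lines comes back as one string.
    """
    normalized = " ".join(text.split())
    # Split on whitespace that directly follows a terminator; the
    # lookbehind keeps the punctuation with its sentence.
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [p for p in parts if p]
```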
split() on a string will split on whatever you ask it to, without regard to line breaks, so just use read() to pull in the whole file instead of reading it line by line. The issue then becomes whether that is too much text to handle in a single read; if so, you'll need to be more clever. You'll probably also want to filter out the actual line breaks to get one string per sentence.
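For the "too much text for a single read" case, one option is a generator that consumes the file line by line and yields each sentence as soon as its period has been seen, so memory is bounded by the longest sentence rather than the whole file. This is a sketch assuming sentences end with "."; the name `iter_sentences` is hypothetical.

```python
def iter_sentences(lines):
    """Yield sentences one at a time from an iterable of lines (e.g. an open file)."""
    buffer = ""
    for line in lines:
        # Treat each line break as a space so words on adjacent lines stay separate
        buffer += line.replace("\n", " ")
        # Everything before the last "." is complete sentences; keep the rest
        *complete, buffer = buffer.split(".")
        for sentence in complete:
            sentence = sentence.strip()
            if sentence:
                yield sentence
    # Whatever remains had no terminating period
    tail = buffer.strip()
    if tail:
        yield tail
```

With a real file you would write `for sentence in iter_sentences(f):` inside a `with open("data.txt") as f:` block.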

How to return lines that have a specific word only

I'm running a simple program that accepts two inputs: an input file and a word to search for. It should then print out all lines that contain the word. For example, my input file contains these 5 sentences:
My cat is named garfield
He is my first Cat
My mom is named cathy
This is a catastrophe
Hello how are you
The word I want to check for is "cat".
This is the code I wrote:
import sys
input_file = sys.argv[1]
input_file = open(input_file, "r")
wordCheck = sys.argv[2]
for line in input_file:
    if wordCheck in line:
        print line
input_file.close()
Now obviously, this will return lines 1, 3, and 4, because they all contain "cat" somewhere. My question is: how would I make it so that only line 1 (the only line containing "cat" as a whole word) is printed?
My second question: what would be the best way to get all lines that contain the word "cat", ignoring case? In this example, that would return lines 1 and 2, because they contain "cat" and "Cat" respectively. Thanks in advance.
You can use regular expressions for that:
import re
# '\b': word boundary, re.I: case insensitive
# re.escape() makes any special characters in wordCheck match literally
pat = re.compile(r'\b{}\b'.format(re.escape(wordCheck)), flags=re.I)
for line in input_file:
    if pat.search(line):
        print line
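Here is a Python 3 sketch of the regex approach as a reusable function, with case-insensitivity as an optional flag so both questions can be answered with one helper. The function name and the `ignore_case` parameter are assumptions for illustration.

```python
import re


def lines_with_word(lines, word, ignore_case=True):
    r"""Yield the lines that contain `word` as a whole word.

    re.escape() makes characters such as '+' or '.' in `word` match
    literally, and \b is a word-boundary assertion.
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"\b{}\b".format(re.escape(word)), flags)
    for line in lines:
        if pattern.search(line):
            yield line
```

Passing the open input file as `lines` works, since iterating over a file yields its lines.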
Here's a short way of doing it: use `in` on a list of the line's words instead of on the string directly.
word = 'cat'
for line in lines:  # lines: e.g. the open input file
    if word in line.split():  # use `in` on a list of all the words of that line
        print(line)
Outputs:
My cat is named garfield
For your first question, you can use a break statement to stop the loop after the first match:
for line in input_file:
    if wordCheck in line.split():
        print line
        break # add break here
For your second question, you could use the lower() method to convert each line to lower case, so that both "Cat" and "cat" are detected (lower-case wordCheck as well, in case the search word contains capitals):
for line in input_file:
    if wordCheck.lower() in line.lower().split():
        print line
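A regex-free Python 3 sketch that combines the split() and lower-case ideas above and prints every matching line rather than stopping at the first. It also strips punctuation stuck to a word (so "Cat." still matches) and uses casefold(), a stricter form of lower(); both choices, and the function name, are assumptions for illustration.

```python
import string


def lines_containing_word(lines, word):
    """Yield lines that contain `word` as a separate word, ignoring case.

    str.split() with no argument splits on any whitespace (including the
    trailing newline), and strip(string.punctuation) removes punctuation
    attached to a word, such as the period in "Cat.".
    """
    target = word.casefold()
    for line in lines:
        tokens = (t.strip(string.punctuation).casefold() for t in line.split())
        if target in tokens:
            yield line
```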

How to load a word list into Python

I'm working through an introductory Python programming course on MIT OCW. On this problem set I've been given some code to work on and a text file. The code and the text file are in the same folder. The code looks like this:
import random
import string
def load_words( ):
    print "Loading word list from file..."
    inFile = open (WORDLIST_FILENAME, 'r', 0)
    line = inFile.readline( )
    wordlist = string.split (line)
    print " ", len(wordlist), "words loaded."
    return wordlist
def choose_word (wordlist):
    return random.choice (wordlist)
wordlist = load_words ( )
When I run the code as it is, the problem set instructions say I should get this:
Loading word list from file...
55900 words loaded.
For some reason though, when I run the code I get:
Loading word list from file...
1 words loaded
I've tried omitting the second and third arguments to open(), but to no avail. What could the problem be?
Moreover, when I try to print the value of wordlist I get
['AA']
When I print the value of line within the context of the relevant function I get:
AA
The text file does begin with 'AA', but what about all of the letters that follow?
line = inFile.readline( ) should be readlines(), plural.
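readline() reads only a single line, which is why only one word is loaded.
readlines() returns a list with one string per line of the input file (each still ending in a newline character). Because string.split() expects a single string, you would then need to split each line separately, or use inFile.read(), which returns the whole file as one string.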
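To see the difference between the three reading methods, here is a small Python 3 sketch that uses io.StringIO as a stand-in for the word-list file. The sample contents and the helper names are made up for illustration.

```python
import io

# Hypothetical stand-in for the word-list file: one word per line
SAMPLE = "AA\nBB\nCC\n"


def first_line_words(f):
    # readline() returns only the first line, so only one word is found
    return f.readline().split()


def all_lines(f):
    # readlines() returns every line, each still ending in "\n"
    return f.readlines()


def all_words(f):
    # read() returns the whole file as one string; split() with no
    # argument splits on any whitespace, newlines included
    return f.read().split()
```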
Given a raw file like this:
cat wordlist.txt
aa
bb
cc
dd
ee
and a Python file like this:
import random
def load_words(WORDLIST_FILENAME):
    print "Loading word list from file..."
    wordlist = list()
    # 'with' closes the file automatically when the block ends
    with open(WORDLIST_FILENAME) as f:
        # read one line at a time; each line still ends with '\n'
        for line in f:
            # strip the '\n', then append the word to wordlist
            wordlist.append(line.rstrip('\n'))
    print " ", len(wordlist), "words loaded."
    print '\n'.join(wordlist)
    return wordlist
def choose_word (wordlist):
    return random.choice (wordlist)
wordlist = load_words('wordlist.txt')
the result is:
python load_words.py
Loading word list from file...
5 words loaded.
aa
bb
cc
dd
ee
The function you have written only reads words from a single line. It assumes all the words are on one line of the text file, so it reads that line and creates a list by splitting it. However, your text file appears to contain newlines as well (one word per line). You can therefore replace the following:
line = inFile.readline( )
wordlist = string.split (line)
with:
wordlist = []
for line in inFile:
    line = line.split()
    wordlist.extend(line)
print " ", len(wordlist), "words loaded."
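For anyone running the problem set under Python 3, here is a sketch of the same extend-based fix. The course code defines WORDLIST_FILENAME elsewhere; here it is passed as a parameter instead, and `words_from_lines` is a hypothetical helper split out so the logic can be tried without a file.

```python
import random


def words_from_lines(lines):
    """Collect whitespace-separated words from an iterable of lines."""
    wordlist = []
    for line in lines:
        # split() with no argument also discards the trailing newline
        wordlist.extend(line.split())
    return wordlist


def load_words(wordlist_filename):
    print("Loading word list from file...")
    with open(wordlist_filename) as in_file:
        wordlist = words_from_lines(in_file)
    print(" ", len(wordlist), "words loaded.")
    return wordlist


def choose_word(wordlist):
    return random.choice(wordlist)
```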
