reading and checking the consecutive words in a file - python

I want to read the words in a file and, for example, check whether a word is "1"; if it is, I have to check whether the next word is "two". After that I have to do some other task. Can you help me check for "1" and "two" occurring consecutively?
I have used
filne = raw_input("name of existing file to be processed:")
f = open(filne, 'r+')
for word in f.read().split():
    for i in xrange(len(word)):
        print word[i]
        print word[i+1]
but it's not working.

The easiest way to deal with consecutive items is with zip:
with open(filename, 'r') as f:  # better way to open file
    for line in f:  # for each line
        words = line.strip().split()  # all words on the line
        for word1, word2 in zip(words, words[1:]):  # iterate through pairs
            if word1 == '1' and word2 == 'two':  # test the pair
                pass  # do the other task here
At the moment, your indices (i and i+1) are within each word (i.e. characters) not for words within the list.
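One limitation of the per-line loop is that a pair split across a line break ("1" at the end of one line, "two" at the start of the next) is never seen. A minimal sketch that pairs words across the whole text instead; the function name find_pairs and the sample text are made up for illustration:

```python
def find_pairs(text, first, second):
    """Return the word positions where `first` is directly followed by `second`."""
    words = text.split()  # splits on any whitespace, including newlines
    return [i for i, (w1, w2) in enumerate(zip(words, words[1:]))
            if w1 == first and w2 == second]


# The second "1 two" pair is split across a line break on purpose.
sample = "I have 1 two and 1\ntwo but not 1 three"
positions = find_pairs(sample, "1", "two")
```

For a real file you would pass f.read() as the text, which is fine as long as the file fits in memory.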

I think you want to print each pair of consecutive words from the file.
In your code you are iterating over each character of a word instead of over each word in the file, which is probably not what you intended.
You can do it the following way:
f = open('yourFileName')
str1 = f.read().split()
for i in xrange(len(str1)-1):  # -1, otherwise str1[i+1] raises an IndexError on the last word
    print str1[i]
    print str1[i+1]
If you want to check whether some word is present and then check the word next to it, use
if 'wordYouWantToCheck' in str1:
    index = str1.index('wordYouWantToCheck')
Now you have the index of the word you are looking for, and you can check the word next to it using str1[index+1].
But the index method returns only the first occurrence of the word. To find every occurrence, you can use the enumerate function:
indices = [i for i,x in enumerate(str1) if x == "1"]
This returns a list containing the indices of all occurrences of the word '1'.
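One thing to watch with str1[index+1] is that it raises an IndexError when the match is the very last word. A rough sketch that collects the word following each match and skips that edge case; words_after is a hypothetical helper name, not something from the question:

```python
def words_after(words, target):
    """Return the word that follows each occurrence of `target`."""
    last = len(words) - 1
    # `i < last` skips a match in the final position, which has no successor.
    return [words[i + 1] for i, w in enumerate(words) if w == target and i < last]


str1 = "1 two 3 1 crore 1".split()
followers = words_after(str1, "1")
```

You can then test each entry of followers against "two" or whatever word you expect.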

Related

How to search words from txt file to python

How can I show the words whose length is 20 in a text file?
To list all the words, I know I can use the following code:
# Program for finding words that are 20 letters long in words.txt
def main():
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    for line in lines:
        print (line)
    return
main()
But I'm not sure how to pick out and show only the words with 20 letters.
Big thanks
If your lines contain several words and not just a single word per line, you first have to split them, which returns a list of the words:
words = line.split()
Then you can iterate over each word in this list and check whether its length is 20.
for word in words:
    if len(word) == 20:
        pass  # Do what you want to do here
If each line has a single word, you can just operate on line directly and skip the for loop. You may need to strip the trailing end-of-line character though: word = line.strip('\n'). If you just want to collect all the matching words, you can do this:
words_of_length_20 = []
for word in words:
    if len(word) == 20:
        words_of_length_20.append(word)
If your file has only one word per line, and you want only the words with 20 letters, you can simply use:
with open("words.txt", "r") as f:
    words = f.read().splitlines()
found = [x for x in words if len(x) == 20]
You can then print the list or print each word separately.
You can try this:
f = open('file.txt')
new_file = f.read().splitlines()
words = [i for i in new_file if len(i) == 20]  # iterate over the lines already read, not the exhausted file
f.close()
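If you are not sure whether the file has one word per line or several, a small helper can cover both cases. This is only a sketch; words_of_length is a made-up name, and the demo uses length 5 with made-up lines to keep it short (pass 20 for the case in the question):

```python
def words_of_length(lines, n):
    """Collect every whitespace-separated word of exactly n characters."""
    found = []
    for line in lines:
        # split() with no argument also discards the trailing newline
        for word in line.split():
            if len(word) == n:
                found.append(word)
    return found


lines = ["hello big world\n", "  tiny\n", "seven apple\n"]
five_letter_words = words_of_length(lines, 5)
```

An open file object can be passed as lines directly, since iterating over it yields one line at a time.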

Python make a list of words from a file

I'm trying to make a list of words from a file that includes only words that do not contain any duplicate letters, so 'hello' would be excluded but 'helo' would be included.
My code works perfectly when I use a list that I create by just typing in words; however, when I try to do it with the list built from the file, it just prints all the words even if they include duplicate letters.
words = []
length = 5
file = open('dictionary.txt')
for word in file:
    if len(word) == length+1:
        words.insert(-1, word.rstrip('\n'))
alpha = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
x = 0
while x in range(0, len(alpha)):
    i = 0
    while i in range(0, len(words)):
        if words[i].count(alpha[x]) > 1:
            del(words[i])
            i = i - 1
        else:
            i = i + 1
    x = x + 1
print(words)
This snippet strips each word, keeps those of the requested length, and removes duplicated letters before adding them to the list:
words = []
length = 5
file = open('dictionary.txt')
for word in file:
    clean_word = word.strip('\n')
    if len(clean_word) == length:  # the newline is already stripped, so no +1
        words.append(''.join(set(clean_word)))
We convert the string to a set, which removes duplicates, and then join the set into a string again. Sets are unordered, so the letters may come back in a different order:
>>> word = "helloool"
>>> set(word)
set(['h', 'e', 'l', 'o'])
>>> ''.join(set(word))
'helo'
I am not 100% sure how you want to remove duplicates, so I've assumed no letter can appear more than once in the word (as your question specifies "duplicate letter" and not "double letter").
What does your dictionary.txt look like? Your code should work so long as each word is on a separate line (for x in file iterates through lines) and at least some of the words have 5 non-repeating letters.
Also, a couple of tips:
You can read lines from a file into a list by calling file.readlines()
You can check for repeats in a list or string by using sets. Sets remove all duplicate elements, so checking if len(word) == len(set(word)) will tell you if there are duplicate letters in much less code :)
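The set-based check from the last tip could look roughly like this. The helper names and the sample lines are made up for illustration; with the real dictionary.txt you would pass the open file instead of the list:

```python
def has_unique_letters(word):
    """True when no letter occurs more than once in the word."""
    return len(word) == len(set(word))


def unique_letter_words(lines, length):
    """Keep words of the given length that contain no repeated letters."""
    result = []
    for line in lines:
        word = line.strip()  # drop the trailing newline before measuring
        if len(word) == length and has_unique_letters(word):
            result.append(word)
    return result


sample_lines = ["hello\n", "helio\n", "world\n", "apple\n", "cat\n"]
kept = unique_letter_words(sample_lines, 5)
```

Unlike the ''.join(set(word)) approach, this drops words with repeated letters instead of rewriting them.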

Python - Checking if all and only the letters in a list match those in a string?

I'm creating an Anagram Solver in Python 2.7.
The solver takes a user inputted anagram, converts each letter to a list item and then checks the list items against lines of a '.txt' file, appending any words that match the anagram's letters to a possible_words list, ready for printing.
It works... almost!
# Anagram_Solver.py
anagram = list(raw_input("Enter an Anagram: ").lower())
possible_words = []
with file('wordsEn.txt', 'r') as f:
    for line in f:
        if all(x in line + '\n' for x in anagram) and len(line) == len(anagram) + 1:
            line = line.strip()
            possible_words.append(line)
print "\n".join(possible_words)
For anagrams with no duplicate letters it works fine, but for words such as 'hello', the output contains words such as 'helio, whole, holes', etc, as the solver doesn't seem to count the letter 'L' as being 2 separate entries?
What am I doing wrong? I feel like there is a simple solution that I'm missing?
Thanks!
This is probably easiest to solve using a collections.Counter
>>> from collections import Counter
>>> Counter('Hello') == Counter('loleH')
True
>>> Counter('Hello') == Counter('loleHl')
False
The Counter will check that the letters and the number of times that each letter is present are the same.
Your code does exactly what it was written to do. You haven't actually made it check whether a letter appears twice (or 3+ times); it just checks 'l' in word twice, which will always be True for any word with at least one l.
One method would be to count the letters of each word. If the letter counts are equal, then it is an anagram. This can be achieved easily with the collections.Counter class:
from collections import Counter
anagram = raw_input("Enter an Anagram: ").lower()
possible_words = []
with file('wordsEn.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if Counter(anagram) == Counter(line):
            possible_words.append(line)
print "\n".join(possible_words)
Another method would be to use sorted() function, as suggested by Chris in the other answer's comments. This sorts the letters in both the anagram and line into alphabetical order, and then checks to see if they match. This process runs faster than the collections method.
anagram = raw_input("Enter an Anagram: ").lower()
possible_words = []
with file('wordsEn.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if sorted(anagram) == sorted(line):
            possible_words.append(line)
print "\n".join(possible_words)
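To see that the two checks agree on the 'hello' case from the question, here is a small side-by-side sketch. The function names and the candidate list are made up, and the list stands in for the lines of wordsEn.txt:

```python
from collections import Counter


def is_anagram_counter(a, b):
    """Same letters with the same multiplicities."""
    return Counter(a) == Counter(b)


def is_anagram_sorted(a, b):
    """Same letters once both words are put in alphabetical order."""
    return sorted(a) == sorted(b)


candidates = ["hello\n", "helio\n", "whole\n", "holes\n", "olleh\n"]
words = [c.strip() for c in candidates]
by_counter = [w for w in words if is_anagram_counter("hello", w)]
by_sorted = [w for w in words if is_anagram_sorted("hello", w)]
```

Both checks reject 'helio', 'whole' and 'holes', which the original all(...) test let through.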

How to read a text file in Python

I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = [‘Emma’, ‘Woodhouse’, ‘father’, ‘Taylor’, ‘Miss’, ‘been’, ‘she’, ‘her’]
My code is below.
All it does is create the new file 'Repeated_word.txt' however it doesn't write the number of times the word from the wordlist appears in the file.
#obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
#create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
def repeatedWords(fin, word_list):
    #open the file
    fin = open(filename, "r")
    #create output file
    fout = open("Repeated_word.txt", "w")
    #loop through each word of the file
    for line in fin:
        #split the lines into words
        words = line.split()
        for word in words:
            #check if word in words is equal to a word from word_list
            for i in range(len(word_list)):
                if word == i:
                    #count number of times word is in word
                    count = words.count(word)
                    fout.write(word, count)
    fout.close
repeatedWords(fin, word_list)
These lines,
for i in range(len(word_list)):
    if word == i:
should be
for i in range(len(word_list)):
    if word == word_list[i]:
or
for i in word_list:
    if word == i:
word is a string, whereas i is an integer, the way you have it right now. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do the latter:
words = fin.read().split()
for word in word_list:
    fout.write('%s %d\n' % (word, words.count(word)))  # write() takes a single string
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
Seems like you are making several mistakes here:
[1] for i in range(len(word_list)):
[2]     if word == i:
[3]         #count number of times word is in word
[4]         count = words.count(word)
[5]         fout.write(word, count)
First, you are comparing the word from fin with an integer from the range. [line 2]
Then you are writing the count to fout upon every match per line. [line 5] I guess you should keep the counts (e.g. in a dict) and write them all at the end of parsing input file.
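The keep-the-counts-in-a-dict idea might be sketched like this. The function names and the two sample lines are invented for illustration, and the sketch assumes words are separated by whitespace with no attached punctuation (so "father," would not match "father"):

```python
def count_listed_words(lines, word_list):
    """Count how often each word from word_list occurs in the lines."""
    counts = dict((word, 0) for word in word_list)
    for line in lines:
        for word in line.split():
            if word in counts:
                counts[word] += 1
    return counts


def format_counts(counts, word_list):
    """One 'word count' line per listed word, in word_list order."""
    return ["%s %d\n" % (word, counts[word]) for word in word_list]


word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
sample = ["Emma Woodhouse and her father\n", "she said her father had been\n"]
counts = count_listed_words(sample, word_list)
report = format_counts(counts, word_list)
```

With a real input file you would pass the open file object as lines and finish with fout.writelines(report) once the whole input has been read.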

Anagram Finder Python

I want to return a list of the words in 'listofwords.txt' that are anagrams of some string 'b'
def find_anagrams(a,b): ##a is the listofwords.txt
    f=open('listofwords.txt', 'r')
    for line in f:
        word=line.strip()
        wordsorted= ''.join(sorted(line))
        for word in f:
            if wordsorted == ''.join(sorted(word)):
                print word
Why is it just giving me anagrams of the first word in the list?
Also how can I return a message if no anagrams are found?
The second for is incorrect. And you are comparing wordsorted with ''.join(sorted(word)), which are the same thing. This should work better:
def find_anagrams(a, b):
    f = open(a, 'r')
    for line in f:
        word = line.strip()
        wordsorted = ''.join(sorted(word))
        if wordsorted == ''.join(sorted(b)):
            print word
Now, make sure you close the file (or, better, use a with statement).
Edit: about returning a message, the best thing to do is actually to return a list of the anagrams found. Then you decide what to do with the words (either print them, or print a message when the list is empty, or whatever you want). So it could be like
def find_anagrams(a, b):
    anagrams = []
    with open(a, 'r') as infile:
        for line in infile:  # iterate over the file opened above
            word = line.strip()
            wordsorted = ''.join(sorted(word))
            if wordsorted == ''.join(sorted(b)):
                anagrams.append(word)
    return anagrams
Then you can use it as
anagrams = find_anagrams('words.txt', 'axolotl')
if len(anagrams) > 0:
    for anagram in anagrams:
        print anagram
else:
    print "no anagrams found"
You are reusing the file iterator f in the inner loop. Once the inner loop is finished, f will be exhausted and you exit the outer loop immediately, so you don't actually get past the first line.
If you want to have two independent loops over all the lines in your file, one solution (I'm sure this problem could be solved more efficiently) would be to first read the lines into a list and then iterate over the list:
with open('listofwords.txt') as f:  # note: 'r' is the default mode
    lines = f.readlines()  # also: using `with` is good practice
for line in lines:
    word = line.strip()
    wordsorted = ''.join(sorted(line))
    for word in lines:
        if word == ''.join(sorted(word)):
            print word
Edit: My code doesn't solve the problem you stated (I misunderstood it first, see matiasg's answer for the correct code), but my answer still explains why you only get the anagrams for the first word in the file.
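The iterator exhaustion described above is easy to reproduce without a real file. In this sketch io.StringIO stands in for the opened file, and the three words are made up:

```python
import io

# An in-memory text stream behaves like an open file when iterated.
f = io.StringIO(u"listen\nsilent\nenlist\n")

outer_seen = []
inner_seen = []
for line in f:  # the outer loop reads the first line
    outer_seen.append(line.strip())
    for other in f:  # the inner loop consumes every remaining line
        inner_seen.append(other.strip())
# When the inner loop ends, f is exhausted, so the outer loop stops as well.
```

Afterwards outer_seen holds only the first word, which matches the behaviour reported in the question.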
