Searching for reversed strings in a file - Python

I am trying to write a script that reads a file containing all the words of my language (one word per line) and checks, for every word, whether its reverse is also in the file, basically finding palindromes and semi-palindromes.
words = open('AllWords.txt', 'r')
for line in words:
    reverse = line[::-1]
    if reverse in words:
        print(reverse)
    if reverse not in words:
        continue
However, it seems that after the first word in the file (whose reverse is not in words) it stops iterating.
Does anyone know how I could fix this?

The problem is that words is an iterator (a file object) and the check reverse in words exhausts it. So for the next iteration of the for loop there is no further element available (the iterator is exhausted) and the loop stops.
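A minimal sketch of that exhaustion, assuming an in-memory file (io.StringIO) as a stand-in for AllWords.txt; the sample words are made up for illustration:

```python
import io

# Stand-in for open('AllWords.txt'); the contents are illustrative only.
words = io.StringIO("abc\nlevel\ncba\n")

first = next(words)        # what the for loop fetches on its first pass
found = "zzz\n" in words   # the membership test reads every remaining line
remaining = list(words)    # nothing is left for the loop to continue with

print(repr(first), found, remaining)
```

Because a file object has no dedicated membership test, Python answers `in` by iterating over it, which is why the loop has nothing left to read afterwards.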
You could use a list or set instead:
words = set(map(str.rstrip, open(...).readlines()))
Then perform the rest of the code as you've already indicated.
If order matters then you can use a list for the iteration and a set for the check (membership tests for sets are O(1)):
with open(...) as fh:
    words = [x.rstrip() for x in fh]
word_set = set(words)
for word in words:
    if word[::-1] in word_set:
        print(word)
You can also use two sets since the palindromes are the intersection between two sets, one for words and one for reversed words:
with open(...) as fh:
    words = set(map(str.rstrip, fh))
words_reversed = set(x[::-1] for x in words)
palindromes = words & words_reversed
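As a rough illustration of the intersection idea, here is a sketch on a small hand-written word set (an assumption standing in for the file contents). It also separates true palindromes from words whose reverse is a different word:

```python
# Illustrative word set; in practice this would be read from the file.
words = {"level", "stressed", "desserts", "python", "noon"}

words_reversed = {w[::-1] for w in words}
matches = words & words_reversed

# True palindromes read the same both ways; the rest are reversible pairs.
palindromes = sorted(w for w in matches if w == w[::-1])
semi_palindromes = sorted(w for w in matches if w != w[::-1])

print(palindromes)
print(semi_palindromes)
```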

words = open('AllWords.txt', 'r').read().splitlines()  # list of words without trailing newlines
for line in words:
    reverse = line[::-1]
    if reverse in words:
        print(reverse)

The process is going to be quite slow if you have a large number of words in your file. You could get results much faster using set operations:
words = open("AllWords.txt").read().split("\n")
palindromes = set(words).intersection(w[::-1] for w in words)
for palindrome in palindromes: print(palindrome)

Related

What's the difference between "word = line.split()" and "for word in line.split()"?

I'm new to programming and this is my first question here. I feel it might be a very silly beginner doubt, but here goes.
On multiple occasions, I've typed out the whole code right except for this one line, on which I make the same mistake every time.
Could someone please explain to me what the computer understands when I type each of the following lines, and what the difference is?
word = line.split()
for word in line.split()
The difference between the expected and my actual output is just because I typed the former instead of the latter.
word = line.split()
This will split the line variable (using the default "any amount of white space" separator) and give you back a list of words built from it. You then bind the variable word to that list.
On the other hand:
for word in line.split()
initially does the same thing the previous command did (splitting the line to get a list) but, instead of binding the word variable to that entire list, it iterates over the list, binding word to each string in the list in turn.
The following transcript hopefully makes this clearer:
>>> line = 'pax is good-looking'
>>> word = line.split() ; print(word)
['pax', 'is', 'good-looking']
>>> for word in line.split(): print(word)
...
pax
is
good-looking
split() is a string method that breaks a string into pieces around a separator.
word = line.split() returns a list, built by splitting line into words wherever whitespace occurs (any run of whitespace is the default separator).
for word in line.split() iterates over that list (line.split()), one word at a time.
Here is an example for clarification.
line = "Stackoverflow is amazing"
word = line.split()
print(word)
# prints: ['Stackoverflow', 'is', 'amazing']
for word in line.split():
    print(word)
# prints:
# Stackoverflow
# is
# amazing

Duplicates within a sentence of a text file in Python

Hi, I want to write code that reads a text file and identifies the sentences that contain duplicated words. I was thinking of putting each sentence of the file in a dictionary and finding which sentences have duplicates. Since I am new to Python, I need some help in writing the code.
This is what I have so far:
def Sentences():
    def Strings():
        l = string.split('.')
        for x in range(len(l)):
            print('Sentence', x + 1, ': ', l[x])
        return
    text = open('Rand article.txt', 'r')
    string = text.read()
    Strings()
    return
The code above converts files to sentences.
Suppose you have a file where each line is a sentence, e.g. "sentences.txt":
I contain unique words.
This sentence repeats repeats a word.
The strategy could be to split the sentence into its constituent words, then use set to find the unique words in the sentence. If the resulting set is shorter than the list of all words, then you know that the sentence contains at least one duplicated word:
sentences_with_dups = []
with open("sentences.txt") as fh:
    for sentence in fh:
        words = sentence.split(" ")
        if len(set(words)) != len(words):
            sentences_with_dups.append(sentence)
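Splitting on a single space treats "The" and "the", or "word" and "word.", as different words. If that matters, one possible variation is sketched below. It assumes case and punctuation should be ignored, and the helper name repeated_words is hypothetical. It uses collections.Counter so the repeated words themselves can be reported:

```python
import re
from collections import Counter

def repeated_words(sentence):
    """Return the words that occur more than once, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sorted(w for w, n in Counter(words).items() if n > 1)

# Illustrative sentences; in practice these would be read from the file.
sentences = [
    "I contain unique words.",
    "This sentence repeats repeats a word.",
    "The end is the END.",
]
for s in sentences:
    dups = repeated_words(s)
    if dups:
        print(s, "->", dups)
```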

Automatically separating words into letters?

So I have this code:
import sys  ## The 'sys' module lets us read command line arguments
words1 = open(sys.argv[2],'r')  ##sys.argv[2] is your dictionary text file
words = str((words1.read()))
def main():
    # Get the dictionary to search
    if (len(sys.argv) != 3) :
        print("Proper format: python filename.py scrambledword filename.txt")
        exit(1)  ## the non-zero return code indicates an error
    scrambled = sys.argv[1]
    print(sys.argv[1])
    unscrambled = sorted(scrambled)
    print(unscrambled)
    for line in words:
        print(line)
When I print words, it prints the words in the dictionary, one word at a time, which is great. But as soon as I try to do anything with those words, as in my last two lines, it separates the words into letters and prints one letter per line. Is there any way to keep the words together? My end goal is to do ordered = sorted(line) and then, if ordered == unscrambled, print the original word from the dictionary.
Your words is an instance of str. You should use split to iterate over words:
for word in words.split():
    print(word)
A for-loop takes one element at a time from the "sequence" you pass it. You have read the contents of your file into a single string, so Python treats it as a sequence of characters. What you need is to convert it into a list yourself, splitting it into strings that are as large as you like:
lines = words.splitlines()  # Makes a list of lines
for line in lines:
    ....
Or
wordlist = words.split()  # Makes a list of "words", by splitting at whitespace
for word in wordlist:
    ....
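Putting this together with the goal stated in the question (comparing sorted letters), a minimal sketch might look like the following. The function name unscramble and the sample dictionary text are assumptions for illustration, not part of the original script:

```python
def unscramble(scrambled, wordlist):
    """Return every word in wordlist that uses exactly the letters of scrambled."""
    target = sorted(scrambled)
    return [word for word in wordlist if sorted(word) == target]

# Illustrative dictionary text; the real one would be read from sys.argv[2].
words = "listen\nsilent\nenlist\ngoogle\n"
print(unscramble("tinsel", words.split()))
```

Splitting the text first means each loop item is a whole word, so sorted(word) sorts the letters of one word rather than a single character.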

Searching in Python

I am trying to search a file to find all words which use any or all of the letters of a person's first name and are the same length as their first name. I have imported the file and it can be opened and read, but now I want to be able to search the file for any words which contain the specified letters. The words have to be the same length as the person's first name.
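You can use a regular expression to split each line into words, then check each word of the right length letter by letter (itertools permutations are another option, but the code below does not need them):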
def find_anagrams_in_file(filename, searchword):
    import re
    searchword = searchword.lower()
    found_words = []
    for line in open(filename, 'rt'):
        words = re.split(r'\W', line)
        for word in words:
            if len(word) == len(searchword):
                tmp = word.lower()
                try:
                    for letter in searchword:
                        idx = tmp.index(letter)
                        tmp = tmp[:idx] + tmp[idx+1:]
                    found_words += [word]
                except ValueError:
                    pass
    return found_words
Run as so (Python 3):
>>> print(find_anagrams_in_file('apa.txt', 'Urne'))
['Rune', 'NurE', 'ERUN']
I would approach this problem this way:
filter out the words whose length differs from the length of the first name,
iterate over the remaining words, checking whether the intersection of the first name's letters and the word's letters is non-empty (a set might be useful here).
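The two steps above could be sketched roughly as follows. The function name candidate_words and the sample words are hypothetical, and "any or all of the letters" is read here as "shares at least one letter", ignoring case:

```python
def candidate_words(first_name, words):
    """Words of the same length as first_name that share at least one letter with it."""
    letters = set(first_name.lower())
    return [
        w for w in words
        if len(w) == len(first_name) and letters & set(w.lower())
    ]

print(candidate_words("Urne", ["Rune", "tool", "user", "urn", "Zzzz"]))
```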
P.S. Is that your homework?

Python - remove string from words in an array

#!/usr/bin/python
#this looks for words in dictionary that begin with 'in' and the suffix is a real word
wordlist = [line.strip() for line in open('/usr/share/dict/words')]
newlist = []
for word in wordlist:
    if word.startswith("in"):
        newlist.append(word)
for word in newlist:
    word = word.split('in')
print newlist
How would I get the program to remove the string "in" from the start of all the words that begin with it? Right now it does not work.
#!/usr/bin/env python
# Look for all words beginning with 'in'
# such that the rest of the word is also
# a valid word.
# load the dictionary:
with open('/usr/share/dict/words') as inf:
    allWords = set(word.strip() for word in inf)  # one word per line
Using 'with' ensures the file is always properly closed.
I make allWords a set; this makes each membership test an O(1) operation on average.
Then we can do
# get the remainder of all words beginning with 'in'
inWords = [word[2:] for word in allWords if word.startswith("in")]
# filter to get just those which are valid words
inWords = [word for word in inWords if word in allWords]
or roll it into a single statement, like
inWords = [word for word in (word[2:] for word in allWords if word.startswith("in")) if word in allWords]
Doing it the second way also lets us use a generator for the inside loop, reducing memory requirements.
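To see the idea on something small, here is a sketch using a hand-written set as a stand-in for the dictionary file (the words are assumptions chosen for illustration):

```python
# Illustrative stand-in for the contents of /usr/share/dict/words.
all_words = {"in", "input", "put", "inside", "side", "index", "ink", "to"}

# Strip the 'in' prefix lazily, then keep only remainders that are real words.
in_words = sorted(
    rest
    for rest in (w[2:] for w in all_words if w.startswith("in"))
    if rest in all_words
)
print(in_words)
```

The word "in" itself yields an empty remainder, which is dropped because the empty string is not in the set.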
split() returns a list of the segments obtained by splitting. Furthermore,
word = word.split('in')
doesn't modify your list; it only rebinds the loop variable to a new value.
Try replacing your second loop with this:
for i in range(len(newlist)):
    word = newlist[i].split('in', 1)
    newlist[i] = word[1]
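The difference between rebinding the loop variable and assigning through an index can be seen in a short sketch (the sample list is made up for illustration):

```python
newlist = ["inside", "input"]

# Rebinding the loop variable leaves the list untouched.
for word in newlist:
    word = word[2:]
unchanged = list(newlist)

# Assigning through the index replaces the element in the list.
for i, word in enumerate(newlist):
    newlist[i] = word[2:]

print(unchanged, newlist)
```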
It's difficult to tell from your question what you want in newlist. If you just want the words that start with "in", but with "in" removed, then you can use a slice:
newlist = [word[2:] for word in wordlist if word.startswith('in')]
If you want the words that start with "in" and are still in wordlist once they've had "in" removed (is that what you meant by "real" in your comment?) then you need something a little different:
newlist = [word for word in wordlist if word.startswith('in') and word[2:] in wordlist]
Note that in Python we use a list, not an "array".
Suppose that wordlist is the list of words. The following code should do the trick:
for i in range(len(wordlist)):
    if wordlist[i].startswith("in"):
        wordlist[i] = wordlist[i][2:]
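In Python 2, range(len(wordlist)) builds a full list of indices, so if the number of words is very large it is better to use xrange or a while loop; in Python 3, range is lazy and this is not a concern.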
