I have a set of tuples of the form
ref_set = [(a1,b1),(a2,b2),(a3,b3)...]
and so on. I need to compare the words in a list of sentences against a1, a2, a3, ...: if a word == a1, replace it with b1; if it == a2, replace it with b2; and so on.
Here's my code:
def replace_words(x): #function
    for line in x: #iterate over lines in list
        for word in line.split(): #iterate over words in list
            for i,j in ref_set: #iterate over each tuple
                if word == i: #if word is equal to first element
                    word = j #replace it with 2nd one.
I'm getting None as a result; I know I need to return something.
Don't use a list of tuples. Use a dictionary:
ref_map = dict(ref_set)
for line in x:
    line = ' '.join([ref_map.get(word, word) for word in line.split()])
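Otherwise you have an N×M loop: every word in your text is compared against every pair in ref_set, so each extra word adds another full pass over ref_set, and each extra pair adds another comparison for every word. A dictionary lookup takes roughly constant time, so with ref_map the work grows only with the number of words.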
Your code only rebinds the local name word; it does not replace the word in the line. The list comprehension above produces a new line value instead. That still doesn't replace the line in x, though; you need another list comprehension for that:
x = [' '.join([ref_map.get(word, word) for word in line.split()]) for line in x]
It appears from the comments that x is not a list of sentences but a single sentence. In that case, just process that one line with a single list comprehension, as in the loop body above, and return the result:
def corrected(line):
    return ' '.join([ref_map.get(word, word) for word in line.split()])
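Putting the pieces together, here is a minimal sketch of a complete function that returns the rewritten lines instead of falling off the end and returning None. Unlike the question's version it takes ref_set as a parameter, and the sample pairs below are made up for illustration. Note that split() followed by ' '.join() collapses runs of whitespace into single spaces.

```python
def replace_words(lines, ref_set):
    """Return a new list of lines with every word mapped through ref_set.

    ref_set is a list of (old, new) pairs; words without a mapping are kept.
    """
    ref_map = dict(ref_set)  # build the lookup table once
    return [' '.join(ref_map.get(word, word) for word in line.split())
            for line in lines]


# Hypothetical sample data
ref_set = [('cat', 'dog'), ('red', 'blue')]
sentences = ['the red cat', 'a cat sat']
print(replace_words(sentences, ref_set))  # ['the blue dog', 'a dog sat']
```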
Write a function called word_freq(text) which takes one string argument. This string will not have any punctuation. Perform a count of the number of 'n' character words in this string and return a list of tuples of the form [(n, count), (n-1, count) ...] in descending order of the counts. For example:
Example: word_freq('a aaa a aaaa')
Result: [(4, 1), (3, 1), (1, 2)]
Note that this does not show anything for the 2-character words.
str1 = 'a aaa a aaa'
str.split(str1)
str.count(str1)
def word_freq(str): Python code to find frequency of each word
I tried this:
text = 'a aaa a aaaa'
def word_freq(str):
    tuple = ()
    count = {}
    for x in str:
        if x in count.keys():
            count[x] += 1
        else:
            count[x] = 1
    print(count)
def count_letters(word):
    char = "a"
    count = 0
    for c in word:
        if char == c:
            count += 1
    return count
word_freq(text)
The code below does what you want; here is how it works. First, we create a dictionary called wc that will hold the count of each n-character word in the sentence. The program reads a string from the user, and split() turns that string into a list of words. For each word, the code checks its length: if the length is 2, the word is skipped; otherwise, 1 is added to the count for that length in the dictionary.
After every word has been checked, wc.items() gives us the dictionary's contents as (key, value) tuples. Each tuple has two elements: the first is a word length (number of characters) and the second is how many words of that length appear in the sentence. Now all we need to do is sort these tuples by length in reverse order (from the longest words to the shortest). We do that with the sorted() function and key=lambda x: x[0], which sorts on the first element of each tuple, i.e. the word length. Finally, we return this list of tuples, which you can print.
If anything is unclear, let me know. You can also put print() statements after each line to better understand what is happening.
Here's the code, I hope it helps:
inp = input("Enter your text: ")
def word_count(inp_str):
    wc = {}
    for item in inp_str.strip().split():
        if len(item) == 2:
            continue
        wc[len(item)] = wc.get(len(item), 0) + 1
    return sorted(wc.items(), key=lambda x: x[0], reverse=True)
print(word_count(inp))
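The assignment's note about 2-character words most likely just explains why the example output has no (2, count) tuple: the example string contains no 2-character words. If that reading is right (an assumption), the len(item) == 2 filter above would wrongly drop real 2-letter words. Here is a minimal sketch without that filter, using collections.Counter, named word_freq as in the assignment and ordered by word length, which is what the example output shows:

```python
from collections import Counter


def word_freq(text):
    """Return (length, count) tuples, longest word length first."""
    # Count how many words have each length; split() handles any whitespace.
    lengths = Counter(len(word) for word in text.split())
    # Each length appears only once as a key, so sorting the tuples in
    # reverse orders them by length, from longest to shortest.
    return sorted(lengths.items(), reverse=True)


print(word_freq('a aaa a aaaa'))  # [(4, 1), (3, 1), (1, 2)]
```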
I have this code that is supposed to remove all words under 4 characters long from a list, but it only removes some of them (I'm not sure which), definitely not all:
#load in the words from the original text file
def load_words():
    with open('words_alpha.txt') as word_file:
        valid_words = [word_file.read().split()]
    return valid_words
english_words = load_words()
print("loading...")
print(len(english_words[0]))
#remove words under 4 letters
for word in english_words[0]:
    if len(word) < 4:
        english_words[0].remove(word)
print("done")
print(len(english_words[0]))
#save the remaining words to a new text file
new_words = open("english_words_v3.txt","w")
for word in english_words[0]:
    new_words.write(word)
    new_words.write("\n")
new_words.close()
It outputs this:
loading...
370103
done
367945
There are 67000 words from the English language in words_alpha.txt.
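You want to iterate over a copy of the list, which you can take with english_words[0][:]. Right now you are iterating over the same list you are modifying, which is causing the weird behaviour: each remove() shifts the later items one position to the left, so the loop skips the word that immediately follows every removed word. So the for loop will look like this: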
for word in english_words[0][:]:
    if len(word) < 4:
        english_words[0].remove(word)
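To see the skipping effect on a small scale, here is a sketch that runs both loops on a short made-up list (the function names are hypothetical). In the first loop, removing 'a' shifts 'an' into the slot the iterator has already visited, so 'an' is never checked.

```python
def remove_short_in_place(words, min_len=4):
    """Buggy pattern: removes items from the list being iterated over."""
    for word in words:
        if len(word) < min_len:
            words.remove(word)
    return words


def remove_short_from_copy(words, min_len=4):
    """Fixed pattern: iterate over a copy (words[:]) and mutate the original."""
    for word in words[:]:
        if len(word) < min_len:
            words.remove(word)
    return words


print(remove_short_in_place(['a', 'an', 'the', 'word']))   # ['an', 'word']
print(remove_short_from_copy(['a', 'an', 'the', 'word']))  # ['word']
```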
You can also replace the removal loop with a list comprehension, and you don't need to wrap word_file.read().split() in a list, since split() already returns a list.
So your code will look like this:
#load in the words from the original text file
def load_words():
    with open('words_alpha.txt') as word_file:
        #No need to wrap this into a list since it already returns a list
        valid_words = word_file.read().split()
    return valid_words
english_words = load_words()
#remove words under 4 letters using list comprehension
english_words = [word for word in english_words if len(word) >= 4]
print("done")
print(len(english_words))
#save the remaining words to a new text file
new_words = open("english_words_v3.txt","w")
for word in english_words:
    new_words.write(word)
    new_words.write("\n")
new_words.close()
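The same load, filter and save steps can also be wrapped in one function that uses with for both files, so the output file is closed even if an error occurs part-way through. This is only a sketch: the name filter_word_file and its min_len parameter are hypothetical, and it keeps the one-word-per-line output format of the code above.

```python
def filter_word_file(in_path, out_path, min_len=4):
    """Copy words of at least min_len characters from in_path to out_path.

    Returns the number of words written.
    """
    with open(in_path) as src:
        # split() already returns a list of whitespace-separated words
        words = [word for word in src.read().split() if len(word) >= min_len]
    with open(out_path, "w") as dst:
        # one word per line, each followed by a newline
        for word in words:
            dst.write(word + "\n")
    return len(words)


# Usage with the file names from the question:
# print(filter_word_file("words_alpha.txt", "english_words_v3.txt"))
```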
Try this with a list comprehension:
print([word for word in english_words[0] if len(word) >= 4])
The problem in your script is that you are modifying a list while iterating over it. You could also avoid this problem by instantiating and populating a new list, but list comprehensions are ideal for this kind of situation.
I'm trying to write an algorithm that, given a bunch of letters, gives you all the words that can be constructed from those letters (for instance, given 'car' it should return a list containing [arc, car, a, etc...]) and then returns the best Scrabble word out of that list. The problem is finding the list that contains all those words.
I've got a giant dictionary text file with one word per line, and this is what I've tried so far:
from collections import Counter

def find_optimal(bunch_of_letters: str):
    words_to_check = []
    c1 = Counter(bunch_of_letters.lower())
    for word in load_words():
        c2 = Counter(word.lower())
        if c2 & c1 == c2:
            words_to_check.append(word)
    max_word = max_word_value(words_to_check)
    return max_word,calc_word_value(max_word)
max_word_value - returns the word with the maximum value of the list given
calc_word_value - returns the word's score in scrabble.
load_words - return a list of the dictionary.
I'm currently using Counters to do the trick, but the problem is that each search takes about 2.5 seconds and I don't know how to optimize it. Any thoughts?
Try this:
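def is_subsequence(small, big):
    # True if the characters of small appear in big in the same order
    it = iter(big)
    return all(ch in it for ch in small)

def find_optimal(bunch_of_letters):
    bunch_of_letters = ''.join(sorted(bunch_of_letters.lower()))
    words_to_check = [word for word in load_words()
                      if is_subsequence(''.join(sorted(word.lower())), bunch_of_letters)]
    max_word = max_word_value(words_to_check)
    return max_word, calc_word_value(max_word)
I've used a list comprehension over the sorted letters. A plain substring test such as ''.join(sorted(word)) in bunch_of_letters is not enough, because it only matches when the word's letters are adjacent in the sorted rack (for 'car' the rack is 'acr', so 'ar' would be missed); that is why the check walks the sorted rack as a subsequence instead. words_to_check is then the list of words from your text file that can be built from the given letters.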
On a side note, if you don't want to use a gigantic text file for the words, check out enchant!
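Neither answer changes the fact that every search scans the whole dictionary, and if load_words() re-reads the file on each call, that alone may account for much of the 2.5 seconds. One common alternative (not what the answers above do) is to pay the dictionary cost once: index every word by its sorted letters, then for each rack look up only the sorted letter subsets of the rack. A rack of k letters has at most 2^k - 1 non-empty subsets, so a 7-tile rack needs at most 127 lookups however large the dictionary is. This is a sketch; build_index and words_from_rack are hypothetical names, and the word list passed to build_index would be what load_words() returns, loaded once rather than per search.

```python
from collections import defaultdict
from itertools import combinations


def build_index(words):
    """Map each sorted-letter key to the words spelled with exactly those letters."""
    index = defaultdict(list)
    for word in words:
        index[''.join(sorted(word.lower()))].append(word)
    return index


def words_from_rack(rack, index):
    """Return every indexed word that can be built from the letters in rack."""
    letters = sorted(rack.lower())
    keys = set()
    # combinations() keeps the input order, so each combo is already sorted;
    # the set drops duplicate keys caused by repeated letters.
    for size in range(1, len(letters) + 1):
        for combo in combinations(letters, size):
            keys.add(''.join(combo))
    found = []
    for key in keys:
        found.extend(index.get(key, []))
    return found


index = build_index(['car', 'arc', 'a', 'ar', 'cat', 'rack'])  # build once
print(sorted(words_from_rack('car', index)))  # ['a', 'ar', 'arc', 'car']
```

The result can then be passed to max_word_value and calc_word_value exactly as before.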
from itertools import permutations
theword = 'car' # or we can use input('Type in a word: ')
mylist = [permutations(theword, i) for i in range(1, len(theword)+1)]
for generator in mylist:
    for word in generator:
        print(''.join(word))
        # instead of ''.join, just print(word) to see the tuple
Output:
c
a
r
ca
cr
...
ar
rc
ra
car
cra
acr
arc
rca
rac
This gives us all the possible orderings (permutations) of the word's letters, for every length from 1 up to the length of the word.
If you're looking to see whether a generated string is an actual word in the English dictionary, we can use pyenchant, as described in this answer:
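import enchant
from itertools import permutations
d = enchant.Dict("en_US")
# regenerate the strings: mylist holds iterators, which the loop above exhausted
for i in range(1, len(theword) + 1):
    for letters in permutations(theword, i):
        word = ''.join(letters)
        print(d.check(word), word)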
Conclusion:
If you want to generate all the orderings of the whole word, use this code:
from itertools import permutations
word = 'word' # or we can use input('Type in a word: ')
solution = permutations(word, len(word))
for i in solution:
    print(''.join(i)) # just print(i) if you want a tuple
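If you would rather not depend on enchant, a plain Python set of dictionary words works for the membership check. This is a sketch under that assumption: words_from_letters is a hypothetical name and sample_words stands in for your real word list. Repeated letters produce duplicate permutations, so results are collected into a set; the number of permutations also grows factorially with the number of letters, so this approach only suits short inputs.

```python
from itertools import permutations


def words_from_letters(letters, known_words):
    """Return the set of known words spelled by some ordering of some letters."""
    found = set()
    for length in range(1, len(letters) + 1):
        for combo in permutations(letters, length):
            candidate = ''.join(combo)
            if candidate in known_words:  # O(1) average lookup for a set
                found.add(candidate)
    return found


sample_words = {'a', 'ar', 'arc', 'car', 'cat'}  # hypothetical word list
print(sorted(words_from_letters('car', sample_words)))  # ['a', 'ar', 'arc', 'car']
```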
Given a name such as Aberdeen Scotland,
I need to get the result Adbnearldteoecns:
keep the first word as it is, reverse the last word, and interleave its letters with the letters of the first word.
This is what I have done so far:
coordinatesf = "Aberdeen Scotland"
for line in coordinatesf:
    separate = line.split()
    for i in separate [0:-1]:
        lastw = separate[1][::-1]
        print(i)
A bit dirty but it works:
coordinatesf = "Aberdeen Scotland"
new_word=[]
#split the two words
words = coordinatesf.split(" ")
#reverse the second and put to lowercase
words[1]=words[1][::-1].lower()
#populate the new string
for index in range(0,len(words[0])):
    new_word.insert(2*index,words[0][index])
for index in range(0,len(words[1])):
    new_word.insert(2*index+1,words[1][index])
outstring = ''.join(new_word)
print(outstring)
Note that what you want to do is only well-defined if the input string is composed of two words with the same length.
I use assertions to make sure that is true but you can leave them out.
def scramble(s):
    words = s.split(" ")
    assert len(words) == 2
    assert len(words[0]) == len(words[1])
    scrambledLetters = zip(words[0], reversed(words[1]))
    return "".join(x[0] + x[1] for x in scrambledLetters)
>>> print(scramble("Aberdeen Scotland"))
AdbnearldteoecnS
You could replace the x[0] + x[1] part with ''.join(x) (sum() won't work here, because it refuses to add strings), but I think that makes it less readable.
This splits the input, zips the first word with the reversed second word, joins the pairs, then joins the list of pairs.
coordinatesf = "Aberdeen Scotland"
a,b = coordinatesf.split()
print(''.join(map(''.join, zip(a,b[::-1]))))
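The zip-based answers silently drop letters when the two words differ in length, because zip() stops at the shorter input. If you need to handle that case, one possible convention (an assumption, not something the question specifies) is to append the leftover letters at the end using itertools.zip_longest. A sketch, with interleave_reversed as a hypothetical name:

```python
from itertools import zip_longest


def interleave_reversed(name):
    """Interleave the first word with the reversed last word.

    If the words differ in length, the leftover letters of the longer one
    end up at the end (fillvalue='' pads the shorter word).
    """
    words = name.split()
    first, last = words[0], words[-1]
    pairs = zip_longest(first, last[::-1], fillvalue='')
    return ''.join(a + b for a, b in pairs)


print(interleave_reversed("Aberdeen Scotland"))  # AdbnearldteoecnS
print(interleave_reversed("Ab Xyz"))             # AzbyX
```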
I'm trying to extract the first few elements of a tab-delimited file using the following:
words = []
name_elements = []
counter = 0
for line in f:
    words = line.split()
    for element in words:
        counter = counter + 1
        if words[element].isupper():
            name_elements = words[0:counter-1]
print type(counter)
When I run this code, I get this error:
TypeError: list indices must be integers, not str
Even though when I run type(counter) it says it's an integer.
What's the issue?
You are trying to index words with element. element is a string; it is already the item you wanted to get.
The for loop is giving you each element from words in turn, assigning it to the element variable. element is not an integer index into the words list.
Note that your counter is never reset between lines, so it will soon be larger than the number of words on the current line; if you want an index into the words list along with the element, use the enumerate() function. You are also replacing the name_elements list with a slice from words; perhaps you wanted to extend the list instead:
for line in f:
    words = line.split()
    for counter, element in enumerate(words):
        if element.isupper():
            # enumerate() counts from 0, so words[:counter] is everything before element
            name_elements.extend(words[:counter])
although it is not clear exactly what you wanted to do with the words list in this case.
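Assuming the goal is to collect, for each line, the fields that come before the first all-uppercase field (an interpretation, since the question doesn't say), here is a runnable sketch. It splits on '\t' because the file is tab-delimited, whereas split() with no argument would also break fields that contain spaces; it stops at the first uppercase field with break. The function name and sample data are hypothetical, and io.StringIO stands in for the open file f.

```python
import io


def fields_before_upper(f):
    """For each line, collect the fields before the first all-uppercase field."""
    result = []
    for line in f:
        fields = line.rstrip('\n').split('\t')
        for index, element in enumerate(fields):
            if element.isupper():
                result.append(fields[:index])
                break  # only the first uppercase field matters
    return result


sample = io.StringIO("John\tSmith\tNY\t42\nAda\tLovelace\tUK\n")
print(fields_before_upper(sample))  # [['John', 'Smith'], ['Ada', 'Lovelace']]
```

Note that '42'.isupper() is False because digits have no case, and lines with no uppercase field contribute nothing.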