Is there a way to obtain a random word from PyEnchant's dictionaries?
I've tried doing the following:
enchant.Dict("<language>").keys() #Type 'Dict' has no attribute 'keys'
list(enchant.Dict("<language>")) #Type 'Dict' is not iterable
I've also tried looking into the module to see where it gets its wordlist from but haven't had any success.
Using the separate "Random-Words" module is a workaround, but as it doesn't follow the same wordlist as PyEnchant, not all words will match. It is also quite a slow method. It is, however, the best alternative I've found so far.
Your question really got me curious so I thought of some way to make a random word using enchant.
import enchant
import random
import string
# here I am getting hold of all the letters
letters = string.ascii_lowercase
# crating a string with a random length with random letters
word = "".join([random.choice(letters) for _ in range(random.randint(3, 8))])
d = enchant.Dict("en_US")
# using the `enchant` to suggest a word based on the random string we provide
random_word = d.suggest(word)
Sometimes the suggest method will not return any suggestion so you will need to make a loop to check if random_word has any value.
With the help of #furas this question has been resolved.
Using the dict-en text file in furas' PyWordle, I wrote a short code that filters out invalid words in pyenchant's wordlist.
import enchant
wordlist = enchant.Dict("en_US")
baseWordlist = open("dict-en.txt", "r")
lines = baseWordlist.readlines()
baseWordlist.close()
newWordlist = open("dict-en_NEW.txt", "w") #write to new text file
for line in lines:
word = line.strip("\n")
if wordList.check(word) == True: #if word exists in pyenchant's dictionary
print(line + " is valid.")
newWordlist.write(line)
else:
print(line + " is invalid.")
newWordlist.close()
Afterwards, calling the text file will allow you to gather the information in that line.
validWords = open("dict-en_NEW", "r")
wordList = validWords.readlines()
myWord = wordList[<line>]
#<line> can be any int (max is .txt length), either a chosen one or a random one.
#this will return the word located at line <line>.
Related
I'm trying to get all the words made from the letters, 'crbtfopkgevyqdzsh' from a file called web2.txt. The posted cell below follows a block of code which improperly returned the whole run up to a full word e.g. for the word shocked it would return s, sh, sho, shoc, shock, shocke, shocked
So I tried a trie (know pun intended).
web2.txt is 2.5 MB in size, and contains 2,493,838 words of varying length. The trie in the cell below is breaking my Google Colab notebook. I even upgraded to Google Colab Pro, and then to Google Colab Pro+ to try and accommodate the block of code, but it's still too much. Any more efficient ideas besides trie to get the same result?
# Find the words3 word list here: svnweb.freebsd.org/base/head/share/dict/web2?view=co
trie = {}
with open('/content/web2.txt') as words3:
for word in words3:
cur = trie
for l in word:
cur = cur.setdefault(l, {})
cur['word'] = True # defined if this node indicates a complete word
def findWords(word, trie = trie, cur = '', words3 = []):
for i, letter in enumerate(word):
if letter in trie:
if 'word' in trie[letter]:
words3.append(cur)
findWords(word, trie[letter], cur+letter, words3 )
# first example: findWords(word[:i] + word[i+1:], trie[letter], cur+letter, word_list )
return [word for word in words3 if word in words3]
words3 = findWords("crbtfopkgevyqdzsh")
I'm using Pyhton3
A trie is overkill. There's about 200 thousand words, so you can just make one pass through all of them to see if you can form the word using the letters in the base string.
This is a good use case for collections.Counter, which gives us a clean way to get the frequencies (i.e. "counters") of the letters of an arbitrary string:
from collections import Counter
base_counter = Counter("crbtfopkgevyqdzsh")
with open("data.txt") as input_file:
for line in input_file:
line = line.rstrip()
line_counter = Counter(line.lower())
# Can use <= instead if on Python 3.10
if line_counter & base_counter == line_counter:
print(line)
i am new to python so i hope you guys can help me out. Currently i am using the import function to get my output, but i wish to include def function to this set of code that count the top 10 most frequent word. But i can't figure it out. Hope you guys can help me. Thanks in advance!!!
import collections
import re
file = open('partA', 'r')
file = file.read()
stopwords = set(line.strip() for line in open('stopwords.txt'))
stopwords = stopwords.union(set(['it', 'is']))
wordcount = collections.defaultdict(int)
"""
the next paragraph does all the counting and is the main point of difference from the original article. More on this is explained later.
"""
pattern = r"\W"
for word in file.lower().split():
word = re.sub(pattern, '', word)
if word not in stopwords:
wordcount[word] += 1
to_print = int(input("How many top words do you wish to print?"))
print(f"The most common {to_print} words are:")
mc = sorted(wordcount.items(), key=lambda k_v: k_v[1], reverse=True) [:to_print]
for word, count in mc:
print(word, ":", count)
The output:
How many top words do you wish to print?30
The most common 30 words are:
hey : 1
there : 1
this : 1
joey : 1
how : 1
going : 1
'def' is used to create a user defined function to later call in a script. For example:
def printme(str):
print(str)
printme('Hello')
I've now created a function called 'printme' that I call later to print the string. This is obviously a pointless function as it just does what the 'print' function does, but I hope this clears up what the purpose of 'def' is!
Hi so i have 2 text files I have to read the first text file count the frequency of each word and remove duplicates and create a list of list with the word and its count in the file.
My second text file contains keywords I need to count the frequency of these keywords in the first text file and return the result without using any imports, dict, or zips.
I am stuck on how to go about this second part I have the file open and removed punctuation etc but I have no clue how to find the frequency
I played around with the idea of .find() but no luck as of yet.
Any suggestions would be appreciated this is my code at the moment seems to find the frequency of the keyword in the keyword file but not in the first text file
def calculateFrequenciesTest(aString):
listKeywords= aString
listSize = len(listKeywords)
keywordCountList = []
while listSize > 0:
targetWord = listKeywords [0]
count =0
for i in range(0,listSize):
if targetWord == listKeywords [i]:
count = count +1
wordAndCount = []
wordAndCount.append(targetWord)
wordAndCount.append(count)
keywordCountList.append(wordAndCount)
for i in range (0,count):
listKeywords.remove(targetWord)
listSize = len(listKeywords)
sortedFrequencyList = readKeywords(keywordCountList)
return keywordCountList;
EDIT- Currently toying around with the idea of reopening my first file again but this time without turning it into a list? I think my errors are somehow coming from it counting the frequency of my list of list. These are the types of results I am getting
[[['the', 66], 1], [['of', 32], 1], [['and', 27], 1], [['a', 23], 1], [['i', 23], 1]]
You can try something like:
I am taking a list of words as an example.
word_list = ['hello', 'world', 'test', 'hello']
frequency_list = {}
for word in word_list:
if word not in frequency_list:
frequency_list[word] = 1
else:
frequency_list[word] += 1
print(frequency_list)
RESULT: {'test': 1, 'world': 1, 'hello': 2}
Since, you have put a constraint on dicts, I have made use of two lists to do the same task. I am not sure how efficient it is, but it serves the purpose.
word_list = ['hello', 'world', 'test', 'hello']
frequency_list = []
frequency_word = []
for word in word_list:
if word not in frequency_word:
frequency_word.append(word)
frequency_list.append(1)
else:
ind = frequency_word.index(word)
frequency_list[ind] += 1
print(frequency_word)
print(frequency_list)
RESULT : ['hello', 'world', 'test']
[2, 1, 1]
You can change it to how you like or re-factor it as you wish
I agree with #bereal that you should use Counter for this. I see that you have said that you don't want "imports, dict, or zips", so feel free to disregard this answer. Yet, one of the major advantages of Python is its great standard library, and every time you have list available, you'll also have dict, collections.Counter and re.
From your code I'm getting the impression that you want to use the same style that you would have used with C or Java. I suggest trying to be a little more pythonic. Code written this way may look unfamiliar, and can take time getting used to. Yet, you'll learn way more.
Claryfying what you're trying to achieve would help. Are you learning Python? Are you solving this specific problem? Why can't you use any imports, dict, or zips?
So here's a proposal utilizing built in functionality (no third party) for what it's worth (tested with Python 2):
#!/usr/bin/python
import re # String matching
import collections # collections.Counter basically solves your problem
def loadwords(s):
"""Find the words in a long string.
Words are separated by whitespace. Typical signs are ignored.
"""
return (s
.replace(".", " ")
.replace(",", " ")
.replace("!", " ")
.replace("?", " ")
.lower()).split()
def loadwords_re(s):
"""Find the words in a long string.
Words are separated by whitespace. Only characters and ' are allowed in strings.
"""
return (re.sub(r"[^a-z']", " ", s.lower())
.split())
# You may want to read this from a file instead
sourcefile_words = loadwords_re("""this is a sentence. This is another sentence.
Let's write many sentences here.
Here comes another sentence.
And another one.
In English, we use plenty of "a" and "the". A whole lot, actually.
""")
# Sets are really fast for answering the question: "is this element in the set?"
# You may want to read this from a file instead
keywords = set(loadwords_re("""
of and a i the
"""))
# Count for every word in sourcefile_words, ignoring your keywords
wordcount_all = collections.Counter(sourcefile_words)
# Lookup word counts like this (Counter is a dictionary)
count_this = wordcount_all["this"] # returns 2
count_a = wordcount_all["a"] # returns 1
# Only look for words in the keywords-set
wordcount_keywords = collections.Counter(word
for word in sourcefile_words
if word in keywords)
count_and = wordcount_keywords["and"] # Returns 2
all_counted_keywords = wordcount_keywords.keys() # Returns ['a', 'and', 'the', 'of']
Here is a solution with no imports. It uses nested linear searches which are acceptable with a small number of searches over a small input array, but will become unwieldy and slow with larger inputs.
Still the input here is quite large, but it handles it in reasonable time. I suspect if your keywords file was larger (mine has only 3 words) the slow down would start to show.
Here we take an input file, iterate over the lines and remove punctuation then split by spaces and flatten all the words into a single list. The list has dupes, so to remove them we sort the list so the dupes come together and then iterate over it creating a new list containing the string and a count. We can do this by incrementing the count as long the same word appears in the list and moving to a new entry when a new word is seen.
Now you have your word frequency list and you can search it for the required keyword and retrieve the count.
The input text file is here and the keyword file can be cobbled together with just a few words in a file, one per line.
python 3 code, it indicates where applicable how to modify for python 2.
# use string.punctuation if you are somehow allowed
# to import the string module.
translator = str.maketrans('', '', '!"#$%&\'()*+,-./:;<=>?#[\\]^_`{|}~')
words = []
with open('hamlet.txt') as f:
for line in f:
if line:
line = line.translate(translator)
# py 2 alternative
#line = line.translate(None, string.punctuation)
words.extend(line.strip().split())
# sort the word list, so instances of the same word are
# contiguous in the list and can be counted together
words.sort()
thisword = ''
counts = []
# for each word in the list add to the count as long as the
# word does not change
for w in words:
if w != thisword:
counts.append([w, 1])
thisword = w
else:
counts[-1][1] += 1
for c in counts:
print('%s (%d)' % (c[0], c[1]))
# function to prevent need to break out of nested loop
def findword(clist, word):
for c in clist:
if c[0] == word:
return c[1]
return 0
# open keywords file and search for each word in the
# frequency list.
with open('keywords.txt') as f2:
for line in f2:
if line:
word = line.strip()
thiscount = findword(counts, word)
print('keyword %s appear %d times in source' % (word, thiscount))
If you were so inclined you could modify findword to use a binary search, but its still not going to be anywhere near a dict. collections.Counter is the right solution when there are no restrictions. Its quicker and less code.
I made a function to make a dictionary out of the words in a file and the value is the frequency of the word, named read_dictionary. I'm trying to now make another function that prints the top three words of the file by count so it looks like this:
"The top three words in fileName are:
word : number
word : number
word : number"
This is what my code is:
def top_three_by_count(fileName):
freq_words = sorted(read_dictionary(f), key = read_dictionary(f).get,
reverse = True)
top_3 = freq_words[:3]
print top_3
print top_three_by_count(f)
You can use collections.Counter.
from collections import Counter
def top_three_by_count(fileName):
return [i[0] for i in Counter(read_dictionary(fileName)).most_common(3)]
Actually you don't need the read_dictionary() function at all. The following code snippet do the whole thing for you.
with open('demo.txt','r') as f:
print Counter([i.strip() for i in f.read().split(' ')]).most_common(3)
I want to randomly retrieve and print a whole line from a text file.
The text file is basically a list and so each item on that list needs to be searched.
import random
a= random.random
prefix = ["CYBER-", "up-", "down-", "joy-"]
suprafix = ["with", "in", "by", "who", "thus", "what"]
suffix = ["boy", "girl", "bread", "hippy", "box", "christ"]
print (random.choice(prefix), random.choice(suprafix), random.choice(prefix), random.choice(suffix))
this is my code for if i were to just manually input it into a list but i can't seem to find how to use an array or index to capture line by line the text and use that
I'm not sure I completely understand what you're asking, but I'll try to help.
If you're trying to choose a random line from a file, you can use open(), then readlines(), then random.choice():
import random
line = random.choice(open("file").readlines())
If you're trying to choose a random element from each of three lists, you can use random.choice():
import random
choices=[random.choice(i) for i in lists]
lists is a list of lists to choose from here.
Use Python's file.readLines() method:
with open("file_name.txt") as f:
prefix = f.readlines()
Now you should be able to iterate through the list prefix.
Those answers have helped me grab things from a list in a text file. as you can see in my code below. But I have three lists as text files and I am trying to generate a 4 word message randomly, choosing from the 'prefix' and 'suprafix' list for the first 3 words and 'suffix' file for the fourth word but I want to prevent it, when printing them, from picking a word that was already chosen by the random.choice function
import random
a= random.random
prefix = open('prefix.txt','r').readlines()
suprafix = open('suprafix.txt','r').readlines()
suffix = open('suffix.txt','r').readlines()
print (random.choice(prefix + suprafix), random.choice(prefix + suprafix), random.choice(prefix + suprafix), random.choice(suffix))
as you can see it chooses randomly from those 2 lists for 3 words