I should divide the alphabet into lists of letters. I done that, my solution is good but my mentor said to me that I should a little improve this solution.
This is my code:
import string
import random
def generate_list():
list_of_letters=list(string.ascii_lowercase)
number_of_letter = len(list_of_letters)
main_list = []
while number_of_letter > 0:
a = random.randint(4, 7)
number_of_letter -= a
main_list.append(list_of_letters[0:a])
del list_of_letters[0:a]
print(main_list)
generate_list()
My mentor said me that I should take and remove lists of letters in one function, not to manually delete these pieces with lists of all letters manually using the del function So he would like to replace this fragment of code in one line.
main_list.append(list_of_letters[0:a])
del list_of_letters[0:a]
Can someone help me? Thank you in advance :)
You can use the pop() function of lists. It returns one item of the list and removes it from the list.
As it removes from the right side, in your case you have to specifically tell to take the list item at index 0 by calling pop(0).
So replacing your two lines from above with the following snippet should do everything in one step:
main_list.append([list_of_letters.pop(0) for _ in range(min(len(list_of_letters), a))])
Please note, that I stop popping elements from list_of_letters if a is larger then the remaining items in it, hence the min(len(list_of_letters), a).
I just finished learning how to do lists in python from the book,Python Programming for the Absolute Beginner, and came across a challenge asking to list out words randomly without repeating them. I have been trying to do it since the book doesn't gives you the answer to it. So far this is my code:
WORDS = ("YOU","ARE","WHO","THINK")
for word in WORDS:
newword=random.choice(WORDS)
while newword==word is False:
newword=random.choice(WORDS)
word=newword
print(word)
As obvious as it seems, the code didn't work out as the words repeat in lists.
You could use shuffle with a list instead of a tuple.
import random
lst = ['WHO','YOU','THINK','ARE']
random.shuffle(lst)
for x in lst:
print x
See this Q/A here also.
To convert a tuple to list: Q/A
The whole code if you insist on having a tuple:
import random
tuple = ('WHO','YOU','THINK','ARE')
lst = list(tuple)
random.shuffle(lst)
for x in lst:
print x
Add the printed word to a different array (e.g. 'usedwords') and loop through that array every time before you print another word.
Thats not perfomant but its a small list... so it should work just fine
(no code example, it should be in a beginners range to do that)
How to modify program of generation for saving words in list and choosing them with random.choice() previously do import random?
where is mistake? its not working correct
def generate_model(cfdist,word,num=15):
for i in range(num):
word=random.choice.im_class(cfdist
[word].keys())
>>> generate_model(cfd,'living')
There is all kinds of weird stuff going on with that code:
def generate_model(cfdist,word,num=15):
You use word as the key to look up in the dictionary.
word=
Then you change it? Are you intentionally chaining the result of one random lookup as the key for the next lookup?
random.choice
If you're intentionally chaining, this is right, but if you want a bunch of words from the same dict you want random.sample.
.im_class(
This is completely unnecessary. Just call it like random.choice(. Look at the examples in the random docs
cfdist[word]
You're getting the value in cfdist with the key equal to the value of word passed in (in this case, living) the first time, then the key is equal to the result of the choice after that. Is that what you intended?
.keys())
This will work, if each value in cfdist is a dict.
Now, you say you want
for saving words in list
But since I'm not sure what exactly you want, I'll give two examples. The first, I'll just aggregate the words, but not change anything else:
def generate_model(cfdist,word,num=15):
words = []
# you can add the starting word with words.append(word) here if you want
for i in range(num):
word=random.choice(cfdist[word].keys())
words.append(word)
return words
The second, I'll assume you just want 15 random words from the dict with no repetitions:
def generate_model(cfdist,word,num=15):
return random.sample(cfdist[word].keys(), num)
Then either way call it as
>>> words = generate_model(cfd,'living')
to get the list of words.
I've done a lot of Googling, but haven't found anything, so I'm really sorry if I'm just searching for the wrong things.
I am writing an implementation of the Ghost for MIT Introduction to Programming, assignment 5.
As part of this, I need to determine whether a string of characters is the start of any valid word. I have a list of valid words ("wordlist").
Update: I could use something that iterated through the list each time, such as Peter's simple suggestion:
def word_exists(wordlist, word_fragment):
return any(w.startswith(word_fragment) for w in wordlist)
I previously had:
wordlist = [w for w in wordlist if w.startswith(word_fragment)]
(from here) to narrow the list down to the list of valid words that start with that fragment and consider it a loss if wordlist is empty. The reason that I took this approach was that I (incorrectly, see below) thought that this would save time, as subsequent lookups would only have to search a smaller list.
It occurred to me that this is going through each item in the original wordlist (38,000-odd words) checking the start of each. This seems silly when wordlist is ordered, and the comprehension could stop once it hits something that is after the word fragment. I tried this:
newlist = []
for w in wordlist:
if w[:len(word_fragment)] > word_fragment:
# Take advantage of the fact that the list is sorted
break
if w.startswith(word_fragment):
newlist.append(w)
return newlist
but that is about the same speed, which I thought may be because list comprehensions run as compiled code?
I then thought that more efficient again would be some form of binary search in the list to find the block of matching words. Is this the way to go, or am I missing something really obvious?
Clearly it isn't really a big deal in this case, but I'm just starting out with programming and want to do things properly.
UPDATE:
I have since tested the below suggestions with a simple test script. While Peter's binary search/bisect would clearly be better for a single run, I was interested in whether the narrowing list would win over a series of fragments. In fact, it did not:
The totals for all strings "p", "py", "pyt", "pyth", "pytho" are as follows:
In total, Peter's simple test took 0.175472736359
In total, Peter's bisect left test took 9.36985015869e-05
In total, the list comprehension took 0.0499348640442
In total, Neil G's bisect took 0.000373601913452
The overhead of creating a second list etc clearly took more time than searching the longer list. In hindsight, this was likely the best approach regardless, as the "reducing list" approach increased the time for the first run, which was the worst case scenario.
Thanks all for some excellent suggestions, and well done Peter for the best answer!!!
Generator expressions are evaluated lazily, so if you only need to determine whether or not your word is valid, I would expect the following to be more efficient since it doesn't necessarily force it to build the full list once it finds a match:
def word_exists(wordlist, word_fragment):
return any(w.startswith(word_fragment) for w in wordlist)
Note that the lack of square brackets is important for this to work.
However this is obviously still linear in the worst case. You're correct that binary search would be more efficient; you can use the built-in bisect module for that. It might look something like this:
from bisect import bisect_left
def word_exists(wordlist, word_fragment):
try:
return wordlist[bisect_left(wordlist, word_fragment)].startswith(word_fragment)
except IndexError:
return False # word_fragment is greater than all entries in wordlist
bisect_left runs in O(log(n)) so is going to be considerably faster for a large wordlist.
Edit: I would guess that the example you gave loses out if your word_fragment is something really common (like 't'), in which case it probably spends most of its time assembling a large list of valid words, and the gain from only having to do a partial scan of the list is negligible. Hard to say for sure, but it's a little academic since binary search is better anyway.
You're right that you can do this more efficiently given that the list is sorted.
I'm building off of #Peter's answer, which returns a single element. I see that you want all the words that start with a given prefix. Here's how you do that:
from bisect import bisect_left
wordlist[bisect_left(wordlist, word_fragment):
bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1])+1))]
This returns the slice from your original sorted list.
As Peter suggested I would use the Bisect module. Especially if you're reading from a large file of words.
If you really need speed you could make a daemon ( How do you create a daemon in Python? ) that has a pre-processed data structure suited for the task
I suggest you could use "tries"
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries
There are many algorithms and data structures to index and search
strings inside a text, some of them are included in the standard
libraries, but not all of them; the trie data structure is a good
example of one that isn't.
Let word be a single string and let dictionary be a large set of
words. If we have a dictionary, and we need to know if a single word
is inside of the dictionary the tries are a data structure that can
help us. But you may be asking yourself, "Why use tries if set
and hash tables can do the same?" There are two main reasons:
The tries can insert and find strings in O(L) time (where L represent
the length of a single word). This is much faster than set , but is it
a bit faster than a hash table.
The set and the hash tables
can only find in a dictionary words that match exactly with the single
word that we are finding; the trie allow us to find words that have a
single character different, a prefix in common, a character missing,
etc.
The tries can be useful in TopCoder problems, but also have a
great amount of applications in software engineering. For example,
consider a web browser. Do you know how the web browser can auto
complete your text or show you many possibilities of the text that you
could be writing? Yes, with the trie you can do it very fast. Do you
know how an orthographic corrector can check that every word that you
type is in a dictionary? Again a trie. You can also use a trie for
suggested corrections of the words that are present in the text but
not in the dictionary.
an example would be:
start={'a':nodea,'b':nodeb,'c':nodec...}
nodea={'a':nodeaa,'b':nodeab,'c':nodeac...}
nodeb={'a':nodeba,'b':nodebb,'c':nodebc...}
etc..
then if you want all the words starting with ab you would just traverse
start['a']['b'] and that would be all the words you want.
to build it you could iterate through your wordlist and for each word, iterate through the characters adding a new default dict where required.
In case of binary search (assuming wordlist is sorted), I'm thinking of something like this:
wordlist = "ab", "abc", "bc", "bcf", "bct", "cft", "k", "l", "m"
fragment = "bc"
a, m, b = 0, 0, len(wordlist)-1
iterations = 0
while True:
if (a + b) / 2 == m: break # endless loop = nothing found
m = (a + b) / 2
iterations += 1
if wordlist[m].startswith(fragment): break # found word
if wordlist[m] > fragment >= wordlist[a]: a, b = a, m
elif wordlist[b] >= fragment >= wordlist[m]: a, b = m, b
if wordlist[m].startswith(fragment):
print wordlist[m], iterations
else:
print "Not found", iterations
It will find one matched word, or none. You will then have to look to the left and right of it to find other matched words. My algorithm might be incorrect, its just a rough version of my thoughts.
Here's my fastest way to narrow the list wordlist down to a list of valid words starting with a given fragment :
sect() is a generator function that uses the excellent Peter's idea to employ bisect, and the islice() function :
from bisect import bisect_left
from itertools import islice
from time import clock
A,B = [],[]
iterations = 5
repetition = 10
with open('words.txt') as f:
wordlist = f.read().split()
wordlist.sort()
print 'wordlist[0:10]==',wordlist[0:10]
def sect(wordlist,word_fragment):
lgth = len(word_fragment)
for w in islice(wordlist,bisect_left(wordlist, word_fragment),None):
if w[0:lgth]==word_fragment:
yield w
else:
break
def hooloo(wordlist,word_fragment):
usque = len(word_fragment)
for w in wordlist:
if w[:usque] > word_fragment:
break
if w.startswith(word_fragment):
yield w
for rep in xrange(repetition):
te = clock()
for i in xrange(iterations):
newlistA = list(sect(wordlist,'VEST'))
A.append(clock()-te)
te = clock()
for i in xrange(iterations):
newlistB = list(hooloo(wordlist,'VEST'))
B.append(clock() - te)
print '\niterations =',iterations,' number of tries:',repetition,'\n'
print newlistA,'\n',min(A),'\n'
print newlistB,'\n',min(B),'\n'
result
wordlist[0:10]== ['AA', 'AAH', 'AAHED', 'AAHING', 'AAHS', 'AAL', 'AALII', 'AALIIS', 'AALS', 'AARDVARK']
iterations = 5 number of tries: 30
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.0286089433154
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.415578236899
sect() is 14.5 times faster than holloo()
PS:
I know the existence of timeit, but here, for such a result, clock() is fully sufficient
Doing binary search in the list is not going to guarantee you anything. I am not sure how that would work either.
You have a list which is ordered, it is a good news. The algorithmic performance complexity of both your cases is O(n) which is not bad, that you just have to iterate through the whole wordlist once.
But in the second case, the performance (engineering performance) should be better because you are breaking as soon as you find that rest cases will not apply. Try to have a list where 1st element is match and rest 38000 - 1 elements do not match, you will the second will beat the first.