Python anonymization using random.shuffle on a list in a list

Here is my code:
word = ma['vals']
shuffled = list(word)
random.shuffle(word)
shuffled = ''.join(random.sample(word, len(word)))
newval = shuffled
The result will be BALLOONSFLOWERSGIFTSFLOWERSCANDYFLOWERSBALLOONSBALLOONS, when what I want is the letters of a single word shuffled; for example, if I'm shuffling "gifts", the result should be something like "stgfi".

This is too long for a comment, so I'll put it in as an answer, although it isn't.
I'm sorry, please don't take this the wrong way, but this is the worst code I've seen in 15 years. You should probably go through a basic tutorial at least once more to get a better grip on what is happening, because this feels to me like you are just randomly typing things without trying to understand what they do.
Let's start from the beginning:
ma['vals'] = [balloons, flowers, gifts, candy]
OK, so I assume ma is a dictionary. You use that dictionary nowhere in the code. Why is it there?
word = ma['vals']
Now you just set word to [balloons, flowers, gifts, candy]. Why not do that directly? Also, don't call a list of words "word". That implies that it is one word, but you made it a list.
shuffled = list(word)
Why do you do list(word)? It's already a list. All you have done now is set:
shuffled = [balloons, flowers, gifts, candy]
And you call it shuffled, when it's not.
random.shuffle(word)
And now you shuffle it. But you didn't use the shuffled variable, you used the word variable.
shuffled = ''.join(random.sample(word, len(word)))
And now you set shuffled to another thing, so you never used the first shuffled. Besides, taking a random sample from a list that is as long as the list is the same thing as shuffling it, and the list is already shuffled.
newval = shuffled
Why did you do this?
All your code can in fact be compressed into:
newval = [balloons, flowers, gifts, candy]
random.shuffle(newval)
This will have the same end result: You will have a randomly shuffled list of words.
So two thirds of your code actually end up not doing anything. The above also makes it quite clear why your code doesn't behave like you think. You shuffle a list of words, when you want to shuffle a word.

If you want to randomly choose a word, and then randomly shuffle the letters of that word:
In [27]: letters = list(random.choice(word))
In [28]: random.shuffle(letters)
In [29]: ''.join(letters)
Out[29]: 'blanolos'
Here, word is the same variable as in your script (i.e. the list of words).
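If the goal is to anonymize every entry in ma['vals'], the same idea can be applied to each word in turn. A rough sketch (the scramble helper and the example data are mine, not from the question):
import random

ma = {'vals': ['balloons', 'flowers', 'gifts', 'candy']}   # example data

def scramble(word):
    letters = list(word)       # turn the string into a list of characters
    random.shuffle(letters)    # shuffle the characters in place
    return ''.join(letters)    # join them back into one string

newvals = [scramble(w) for w in ma['vals']]
print(newvals)   # e.g. ['lonsolba', 'welfsro', 'stgfi', 'ydcan']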

You must create a list from the string. In your case, for example, the string "balloons" would be accessed by ma["vals"][0]. You can then convert it to a list by calling list() and passing in the string. The optional keyword parameter random of random.shuffle is the random-number function used to drive the shuffle. random.shuffle modifies the list passed to it in place, hence why you call join on the list and not on the result of the call to random.shuffle.
>>> wordList = list(ma["vals"][0])  # "balloons"
>>> random.shuffle(wordList, random=random.random)
>>> ''.join(wordList)
'oboanlls'
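Note that the random= argument is optional here; a plain random.shuffle(wordList) does the same thing. (The random parameter was deprecated in Python 3.9 and removed in 3.11, so on recent versions only the one-argument form works.)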

Related

How to divide the alphabet into lists of letters in a different way

I should divide the alphabet into lists of letters. I have done that, and my solution works, but my mentor said I should improve it a little.
This is my code:
import string
import random
def generate_list():
    list_of_letters = list(string.ascii_lowercase)
    number_of_letter = len(list_of_letters)
    main_list = []
    while number_of_letter > 0:
        a = random.randint(4, 7)
        number_of_letter -= a
        main_list.append(list_of_letters[0:a])
        del list_of_letters[0:a]
    print(main_list)

generate_list()
My mentor said that I should take and remove the slices of letters in one step, not manually delete these pieces from the list of all letters using del. So he would like this fragment of code to be replaced by one line:
main_list.append(list_of_letters[0:a])
del list_of_letters[0:a]
Can someone help me? Thank you in advance :)
You can use the pop() method of lists. It returns one item of the list and removes it from the list.
By default pop() removes from the right side, so in your case you have to specifically tell it to take the list item at index 0 by calling pop(0).
So replacing your two lines from above with the following snippet should do everything in one step:
main_list.append([list_of_letters.pop(0) for _ in range(min(len(list_of_letters), a))])
Please note that I stop popping elements from list_of_letters if a is larger than the number of remaining items, hence the min(len(list_of_letters), a).
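Putting that back into the original function, a possible version might look like this (just a sketch of how the replacement fits in; I also loop while letters remain instead of keeping a separate counter):
import random
import string

def generate_list():
    list_of_letters = list(string.ascii_lowercase)
    main_list = []
    while list_of_letters:
        a = random.randint(4, 7)
        # pop(0) both returns and removes each letter, so no separate del is needed
        main_list.append([list_of_letters.pop(0) for _ in range(min(len(list_of_letters), a))])
    print(main_list)

generate_list()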

Listing words randomly without repeating them

I just finished learning how to do lists in Python from the book Python Programming for the Absolute Beginner, and came across a challenge asking me to list words in random order without repeating them. I have been trying to do it, since the book doesn't give you the answer. So far this is my code:
WORDS = ("YOU", "ARE", "WHO", "THINK")
for word in WORDS:
    newword = random.choice(WORDS)
    while newword == word is False:
        newword = random.choice(WORDS)
    word = newword
    print(word)
As obvious as it may seem, the code didn't work out, as the words still repeat in the list.
You could use shuffle with a list instead of a tuple.
import random
lst = ['WHO','YOU','THINK','ARE']
random.shuffle(lst)
for x in lst:
    print x
See this Q/A here also.
To convert a tuple to list: Q/A
The whole code if you insist on having a tuple:
import random
tuple = ('WHO','YOU','THINK','ARE')
lst = list(tuple)
random.shuffle(lst)
for x in lst:
    print x
Add each printed word to a separate list (e.g. usedwords) and loop through that list every time before you print another word.
That's not very performant, but it's a small list... so it should work just fine.
(No code example; it should be within a beginner's range to do that.)
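For reference, a rough sketch of that suggestion (not from the original answer):
import random

WORDS = ("YOU", "ARE", "WHO", "THINK")
usedwords = []                    # words already printed
while len(usedwords) < len(WORDS):
    word = random.choice(WORDS)
    if word not in usedwords:     # only print words we have not seen yet
        usedwords.append(word)
        print(word)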

Splitting a python list into multiple lists

For example, if I have a list like:
one = [1,2,3]
what function or method can I use to split each element into its own separate list, like:
one = [1]
RANDOM_DYNAMIC_NAME = [2]
RANDOM_DYNAMIC_NAME_AGAIN = [3]
At any given time the unsplit list called one may have more than one element; it's dynamic, and this algorithm is needed for a hangman game I am coding as self-given homework.
The algorithm is needed for this example purpose:
pick a word: mississippi
guess a letter: s
['_','_','s','s','_','s','s','_','_','_','_']
Here is my code:
http://pastebin.com/gcCZv67D
Looking at your code, if the part you're trying to solve is the comments in lines 24-26, you definitely don't need dynamically-created variables for that at all, and in fact I can't even imagine how they could help you.
You've got this:
enum = [i for i,x in enumerate(letterlist) if x == word]
The names of your variables are very confusing—something called word is the guessed letter, while you've got a different variable letterguess that's something else, and then a variable called letter that's the whole word… But I think I get what you're aiming for.
enum is a list of all of the indices of word within letterlist. For example, if letterlist is 'letter' and word is t, it will be [2, 3].
Then you do this:
bracketstrip = (str(w) for w in enum)
So now bracketstrip is ['2', '3']. I'm not sure why you want that.
z = int(''.join(bracketstrip))
And ''.join(bracketstrip) is '23', so z is 23.
letterguess[z] = word
And now you get an IndexError, because you're trying to set letterguess[23] instead of setting letterguess[2] and letterguess[3].
Here's what I think you want to replace that with:
enum = [i for i, x in enumerate(letterlist) if x == word]
for i in enum:
    letterguess[i] = word
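For instance, applied to the example from the question (a standalone sketch with made-up variable values; here I index the word string directly instead of building letterlist):
letter = 'mississippi'               # the secret word
word = 's'                           # the guessed letter
letterguess = ['_'] * len(letter)    # the current display

enum = [i for i, x in enumerate(letter) if x == word]
for i in enum:
    letterguess[i] = word

print(letterguess)   # ['_', '_', 's', 's', '_', 's', 's', '_', '_', '_', '_']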
A few hints about some other parts of your code:
You've got a few places where you do things like this:
letterlist = []
for eachcharacter in letter:
    letterlist.append(eachcharacter)
This is the same as letterlist = list(letter). But really, you don't need that list at all. The only thing you do with that is for i, x in enumerate(letterlist), and you could have done the exact same thing with letter in the first place. You're generally making things much harder for yourself than you have to. Make sure you actually understand why you've written each line of code.
"Because I couldn't get it to work any other way" isn't a reason—what were you trying to get to work? Why did you think you needed a list of letters? Nobody can keep all of those decisions in their head at once. The more skill you have, the more of your code will be so obvious to you that it doesn't need comments, but you'll never get to the point where you don't need any. When you're just starting out, every time you figure out how to do something, add a comment reminding yourself what you were trying to do, and why it works. You can always remove comments later; you can never get back comments that you didn't write.
For the first question, a simple list comprehension is enough; it will return each element as a separate list:
[[x] for x in one]
As for a literal answer to your question, here's how you do it, though I can't imagine why you would want to do this. Generally, dynamic variable names are poor design. You probably just want a single list, or a list of lists.
import random

for x in one:
    name = 'x' + str(random.getrandbits(10))
    globals()[name] = [x]

modify program of generation for saving words in list and choosing them with random.choice()

How do I modify this generation program so that it saves the words in a list and chooses them with random.choice()? (import random has already been done.)
Where is the mistake? It's not working correctly.
def generate_model(cfdist, word, num=15):
    for i in range(num):
        word = random.choice.im_class(cfdist[word].keys())
>>> generate_model(cfd,'living')
There are all kinds of weird things going on in that code:
def generate_model(cfdist,word,num=15):
You use word as the key to look up in the dictionary.
word=
Then you change it? Are you intentionally chaining the result of one random lookup as the key for the next lookup?
random.choice
If you're intentionally chaining, this is right, but if you want a bunch of words from the same dict you want random.sample.
.im_class(
This is completely unnecessary. Just call it as random.choice(...). Look at the examples in the random docs.
cfdist[word]
You're getting the value in cfdist with the key equal to the value of word passed in (in this case, living) the first time, then the key is equal to the result of the choice after that. Is that what you intended?
.keys())
This will work, if each value in cfdist is a dict.
Now, you say you want
for saving words in list
But since I'm not sure what exactly you want, I'll give two examples. In the first, I'll just aggregate the words, without changing anything else:
def generate_model(cfdist, word, num=15):
    words = []
    # you can add the starting word with words.append(word) here if you want
    for i in range(num):
        word = random.choice(cfdist[word].keys())
        words.append(word)
    return words
In the second, I'll assume you just want 15 random words from the dict with no repetitions:
def generate_model(cfdist, word, num=15):
    return random.sample(cfdist[word].keys(), num)
Then either way call it as
>>> words = generate_model(cfd,'living')
to get the list of words.
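For illustration, here is a small self-contained run of the first version, adapted slightly for Python 3 (where random.choice needs a list rather than a dict keys view); the toy cfd below is made up:
import random

def generate_model(cfdist, word, num=15):
    words = []
    for i in range(num):
        word = random.choice(list(cfdist[word].keys()))  # list() needed on Python 3
        words.append(word)
    return words

# made-up toy data standing in for an NLTK-style conditional frequency dict
cfd = {'living': {'in': 3, 'with': 1},
       'in': {'the': 5},
       'the': {'city': 2, 'country': 1},
       'city': {'living': 1},
       'country': {'living': 1},
       'with': {'friends': 2},
       'friends': {'living': 1}}

print(generate_model(cfd, 'living', num=5))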

Fastest way in Python to find a 'startswith' substring in a long sorted list of strings

I've done a lot of Googling, but haven't found anything, so I'm really sorry if I'm just searching for the wrong things.
I am writing an implementation of the game Ghost for the MIT Introduction to Programming course, assignment 5.
As part of this, I need to determine whether a string of characters is the start of any valid word. I have a list of valid words ("wordlist").
Update: I could use something that iterated through the list each time, such as Peter's simple suggestion:
def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)
I previously had:
wordlist = [w for w in wordlist if w.startswith(word_fragment)]
(from here) to narrow the list down to the list of valid words that start with that fragment and consider it a loss if wordlist is empty. The reason that I took this approach was that I (incorrectly, see below) thought that this would save time, as subsequent lookups would only have to search a smaller list.
It occurred to me that this is going through each item in the original wordlist (38,000-odd words) checking the start of each. This seems silly when wordlist is ordered, and the comprehension could stop once it hits something that is after the word fragment. I tried this:
newlist = []
for w in wordlist:
    if w[:len(word_fragment)] > word_fragment:
        # Take advantage of the fact that the list is sorted
        break
    if w.startswith(word_fragment):
        newlist.append(w)
return newlist
but that is about the same speed, which I thought may be because list comprehensions run as compiled code?
I then thought that more efficient again would be some form of binary search in the list to find the block of matching words. Is this the way to go, or am I missing something really obvious?
Clearly it isn't really a big deal in this case, but I'm just starting out with programming and want to do things properly.
UPDATE:
I have since tested the below suggestions with a simple test script. While Peter's binary search/bisect would clearly be better for a single run, I was interested in whether the narrowing list would win over a series of fragments. In fact, it did not:
The totals for all strings "p", "py", "pyt", "pyth", "pytho" are as follows:
In total, Peter's simple test took 0.175472736359
In total, Peter's bisect left test took 9.36985015869e-05
In total, the list comprehension took 0.0499348640442
In total, Neil G's bisect took 0.000373601913452
The overhead of creating a second list etc clearly took more time than searching the longer list. In hindsight, this was likely the best approach regardless, as the "reducing list" approach increased the time for the first run, which was the worst case scenario.
Thanks all for some excellent suggestions, and well done Peter for the best answer!!!
Generator expressions are evaluated lazily, so if you only need to determine whether or not your word is valid, I would expect the following to be more efficient since it doesn't necessarily force it to build the full list once it finds a match:
def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)
Note that the lack of square brackets is important for this to work.
However this is obviously still linear in the worst case. You're correct that binary search would be more efficient; you can use the built-in bisect module for that. It might look something like this:
from bisect import bisect_left

def word_exists(wordlist, word_fragment):
    try:
        return wordlist[bisect_left(wordlist, word_fragment)].startswith(word_fragment)
    except IndexError:
        return False  # word_fragment is greater than all entries in wordlist
bisect_left runs in O(log(n)) so is going to be considerably faster for a large wordlist.
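For example, a quick interactive check on a small sorted list:
>>> words = ['apple', 'banana', 'bandit', 'cherry']
>>> word_exists(words, 'ban')
True
>>> word_exists(words, 'zoo')
False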
Edit: I would guess that the example you gave loses out if your word_fragment is something really common (like 't'), in which case it probably spends most of its time assembling a large list of valid words, and the gain from only having to do a partial scan of the list is negligible. Hard to say for sure, but it's a little academic since binary search is better anyway.
You're right that you can do this more efficiently given that the list is sorted.
I'm building off of Peter's answer, which returns a single element. I see that you want all the words that start with a given prefix. Here's how you do that:
from bisect import bisect_left
wordlist[bisect_left(wordlist, word_fragment):
         bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1]) + 1))]
This returns the slice from your original sorted list.
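For instance, wrapping the slice in a small helper and trying it on a toy list (the prefix_slice name is mine):
from bisect import bisect_left

def prefix_slice(wordlist, word_fragment):
    # every entry of the sorted wordlist that starts with word_fragment
    lo = bisect_left(wordlist, word_fragment)
    hi = bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1]) + 1))
    return wordlist[lo:hi]

words = ['apple', 'apply', 'banana', 'band', 'bandit', 'bank', 'cat']
print(prefix_slice(words, 'ban'))   # ['banana', 'band', 'bandit', 'bank']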
As Peter suggested, I would use the bisect module, especially if you're reading from a large file of words.
If you really need speed, you could make a daemon (How do you create a daemon in Python?) that has a pre-processed data structure suited for the task.
I suggest you could use "tries"
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries
There are many algorithms and data structures to index and search strings inside a text; some of them are included in the standard libraries, but not all of them. The trie data structure is a good example of one that isn't.
Let word be a single string and let dictionary be a large set of words. If we have a dictionary and we need to know if a single word is inside the dictionary, tries are a data structure that can help us. But you may be asking yourself, "Why use tries if set and hash tables can do the same?" There are two main reasons:
Tries can insert and find strings in O(L) time (where L represents the length of a single word). This is much faster than set, and only a bit faster than a hash table.
The set and the hash tables can only find in a dictionary words that match exactly with the single word that we are finding; the trie allows us to find words that have a single character different, a prefix in common, a character missing, etc.
Tries can be useful in TopCoder problems, but also have a great number of applications in software engineering. For example, consider a web browser. Do you know how the web browser can auto-complete your text or show you many possibilities of the text that you could be writing? Yes, with the trie you can do it very fast. Do you know how an orthographic corrector can check that every word that you type is in a dictionary? Again, a trie. You can also use a trie for suggested corrections of the words that are present in the text but not in the dictionary.
An example would be:
start = {'a': nodea, 'b': nodeb, 'c': nodec, ...}
nodea = {'a': nodeaa, 'b': nodeab, 'c': nodeac, ...}
nodeb = {'a': nodeba, 'b': nodebb, 'c': nodebc, ...}
etc.
Then if you want all the words starting with ab, you would just traverse start['a']['b'] and that would be all the words you want.
To build it, you could iterate through your wordlist and, for each word, iterate through the characters, adding a new defaultdict where required.
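A minimal sketch of that idea, using nested defaultdicts with an end-of-word marker (the names and the '$' marker are my own choices, not from the tutorial):
from collections import defaultdict

def make_trie():
    return defaultdict(make_trie)

def insert(trie, word):
    node = trie
    for ch in word:
        node = node[ch]          # descend, creating nodes as needed
    node['$'] = word             # end-of-word marker storing the full word

def words_with_prefix(trie, prefix):
    node = trie
    for ch in prefix:            # walk down to the node for the prefix
        if ch not in node:
            return []
        node = node[ch]
    found = []
    def collect(n):
        for key, child in n.items():
            if key == '$':
                found.append(child)
            else:
                collect(child)
    collect(node)
    return found

trie = make_trie()
for w in ['ab', 'abc', 'bc', 'bcf', 'bct']:
    insert(trie, w)
print(words_with_prefix(trie, 'ab'))   # ['ab', 'abc']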
In case of binary search (assuming wordlist is sorted), I'm thinking of something like this:
wordlist = "ab", "abc", "bc", "bcf", "bct", "cft", "k", "l", "m"
fragment = "bc"
a, m, b = 0, 0, len(wordlist)-1
iterations = 0
while True:
if (a + b) / 2 == m: break # endless loop = nothing found
m = (a + b) / 2
iterations += 1
if wordlist[m].startswith(fragment): break # found word
if wordlist[m] > fragment >= wordlist[a]: a, b = a, m
elif wordlist[b] >= fragment >= wordlist[m]: a, b = m, b
if wordlist[m].startswith(fragment):
print wordlist[m], iterations
else:
print "Not found", iterations
It will find one matched word, or none. You will then have to look to the left and right of it to find the other matched words. My algorithm might be incorrect; it's just a rough version of my thoughts.
Here's my fastest way to narrow the list wordlist down to a list of valid words starting with a given fragment:
sect() is a generator function that uses Peter's excellent idea of employing bisect, together with the islice() function:
from bisect import bisect_left
from itertools import islice
from time import clock

A, B = [], []
iterations = 5
repetition = 10

with open('words.txt') as f:
    wordlist = f.read().split()

wordlist.sort()
print 'wordlist[0:10]==', wordlist[0:10]

def sect(wordlist, word_fragment):
    lgth = len(word_fragment)
    for w in islice(wordlist, bisect_left(wordlist, word_fragment), None):
        if w[0:lgth] == word_fragment:
            yield w
        else:
            break

def hooloo(wordlist, word_fragment):
    usque = len(word_fragment)
    for w in wordlist:
        if w[:usque] > word_fragment:
            break
        if w.startswith(word_fragment):
            yield w

for rep in xrange(repetition):
    te = clock()
    for i in xrange(iterations):
        newlistA = list(sect(wordlist, 'VEST'))
    A.append(clock() - te)

    te = clock()
    for i in xrange(iterations):
        newlistB = list(hooloo(wordlist, 'VEST'))
    B.append(clock() - te)

print '\niterations =', iterations, ' number of tries:', repetition, '\n'
print newlistA, '\n', min(A), '\n'
print newlistB, '\n', min(B), '\n'
result
wordlist[0:10]== ['AA', 'AAH', 'AAHED', 'AAHING', 'AAHS', 'AAL', 'AALII', 'AALIIS', 'AALS', 'AARDVARK']
iterations = 5 number of tries: 30
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.0286089433154
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.415578236899
sect() is 14.5 times faster than hooloo().
PS: I know about timeit, but here, for such a result, clock() is quite sufficient.
Doing a binary search in the list is not going to guarantee you anything. I am not sure how that would work, either.
You have a list which is ordered, which is good news. The algorithmic complexity of both your approaches is O(n), which is not bad: you just have to iterate through the whole wordlist once.
But in the second case, the practical (engineering) performance should be better, because you break as soon as you know the remaining entries cannot match. Try a list where the first element is a match and the remaining 38,000-odd elements are not; you will see the second beat the first.
